PoC: InferenceClient is also a MCPClient (#2986)
Conversation
mcp_client.py

```python
self.exit_stack = AsyncExitStack()
self.available_tools: List[ChatCompletionInputTool] = []


async def add_mcp_server(self, command: str, args: List[str]):
```
Why not name this method `connect_to_server`? We can't add multiple MCP servers to a single instance of the client, can we?
yes we can
I mean we would need to store a map of sessions, but there's nothing preventing us from doing it, conceptually.
Let me know if the following question is out of scope.

Question: how would I connect to multiple MCP servers? Would it look like option 1 or option 2?

Option 1:

```python
client1 = MCPClient()
client1.add_mcp_server()

client2 = MCPClient()
client2.add_mcp_server()
```

or Option 2:

```python
client = MCPClient()
client.add_mcp_server(server1)
client.add_mcp_server(server2)
```

?
Another design question: `class MCPClient(AsyncInferenceClient)` vs `class AsyncInferenceClient(...args, mcp_clients: MCPClient[])`.
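For concreteness, a hedged sketch of the two shapes being weighed here (the alternatives are mutually exclusive; names and the `mcp_clients` parameter are illustrative, not this PR's API):

```python
from typing import List

# Shape 1: inheritance -- the MCP client *is* an inference client
class MCPClient(AsyncInferenceClient):
    async def add_mcp_server(self, command: str, args: List[str]) -> None: ...

# Shape 2: composition -- the inference client *takes* MCP clients
# (`mcp_clients` is a hypothetical constructor parameter)
client = AsyncInferenceClient(
    model="Qwen/Qwen2.5-72B-Instruct",
    mcp_clients=[MCPClient(...), MCPClient(...)],
)
```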
Sorry to chime in unannounced, but from a very removed external-user standpoint, I find this all very confusing. I just don't think what you coded should be called `MCPClient` 😄

When I came to this PR I was fully expecting `MCPClient` to be passed as a parameter to `InferenceClient`, though I hear @Wauplin above, so why not a wrapper. But the end result is really more of an `InferenceClientWithEmbeddedMCP` to me, not an `MCPClient`.

That being said, it's just about semantics, but I'm kind of a semantics extremist, sorry about that (and feel free to completely disregard this message, as is very likely XD)
> I was fully expecting `MCPClient` to be passed as a parameter to `InferenceClient`

What do you mean as a parameter? Do you have an example signature?
second option of #2986 (comment)
ah yes, sure, we can probably add this I guess
Ah, actually with the async/await stuff I'm not so sure.
A few comments off the top of my head: SSE support would not be so hard to add and could really be a nice addition.
```python
self.exit_stack = AsyncExitStack()
self.available_tools: List[ChatCompletionInputTool] = []


async def add_mcp_server(self, command: str, args: List[str], env: Dict[str, str]):
```
You would need to loosen the requirements on your args a bit if you want to make it work with SSE, or is the intent just to support stdio? I see the rest seems to focus on stdio, so maybe it's by design.
For now, just stdio, but in the future Streaming HTTP should be the way to go, from what I've understood?
Yes, this is the new spec, but it is backward compatible, and at the level you are working with in this PR I wouldn't expect much to change: the internals of the client will probably change, but the client interface would remain the same. Which means that if today you accept something like `add_mcp_server(StdioParameters | dict)`, with the dict being the arguments of the `sse_client` from the Python SDK, you could already support all the SSE servers, plus potentially future Streaming HTTP servers with minor adjustments at most.
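A hedged sketch of that suggestion, using the `stdio_client`/`sse_client` helpers and `ClientSession` from the official `mcp` Python SDK; the dispatch logic is illustrative, not this PR's implementation:

```python
from typing import Dict, Union

from mcp import ClientSession, StdioServerParameters
from mcp.client.sse import sse_client
from mcp.client.stdio import stdio_client


async def add_mcp_server(self, params: Union[StdioServerParameters, Dict]):
    if isinstance(params, StdioServerParameters):
        # stdio transport: spawn the server as a subprocess
        read, write = await self.exit_stack.enter_async_context(stdio_client(params))
    else:
        # SSE transport: the dict carries the sse_client kwargs (url, headers, ...)
        read, write = await self.exit_stack.enter_async_context(sse_client(**params))
    # Both transports yield the same (read, write) stream pair
    session = await self.exit_stack.enter_async_context(ClientSession(read, write))
    await session.initialize()
```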
"function": { | ||
"name": tool.name, | ||
"description": tool.description, | ||
"parameters": tool.inputSchema, |
Just a note that I have seen some MCP servers with jsonref in their tool schemas, which sometimes confuses the model. In mcpadapt I had to resolve the jsonref before passing it to the model. Might be minor for now.
Confused, or sometimes plain unsupported by the model SDK, like google genai...
Interesting, does the spec mention anything about whether jsonref is allowed or not?
I don't think the spec mentions it; however, it gets auto-generated if you use pydantic models with the official MCP Python SDK's FastMCP syntax. I had the case with one of my MCP servers that I use to test things: https://github.com/grll/pubmedmcp
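For reference, a minimal sketch of resolving such references before handing the schema to the model, assuming the third-party `jsonref` package (what mcpadapt does internally may differ):

```python
import jsonref


def resolve_schema_refs(schema: dict) -> dict:
    # Inline every {"$ref": ...} so the model sees a flat, self-contained schema.
    # proxies=False materializes plain dicts instead of lazy proxy objects.
    return jsonref.replace_refs(schema, proxies=False)
```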
```python
ToolName: TypeAlias = str


class MCPClient:
```
Could potentially also support two syntaxes like in the `MCPClient` for smolagents:
- like you have, with try and finally
- one as a context manager, where you directly pass the MCP client settings plus the MCP servers with the tools you want; this would allow something like `with MCPClient(...) as client:` and then just running `client.chat.completion` with the tools (see the sketch below)
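A hedged sketch of that second syntax, using an async context manager since the client's methods are coroutines (class and method names follow this PR's draft; the `__aenter__`/`__aexit__` wiring is illustrative):

```python
class MCPClient:
    async def __aenter__(self) -> "MCPClient":
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        # Close all sessions/subprocesses registered on the exit stack.
        await self.exit_stack.aclose()


# Usage: servers are connected inside the block, cleanup is automatic.
async with MCPClient(provider="together", model="Qwen/Qwen2.5-72B-Instruct") as client:
    await client.add_mcp_server(command="npx", args=["-y", "some-mcp-server"])  # hypothetical server
    response = await client.process_query("...")
```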
Thanks for the review @grll!
FYI, on what I was saying about `sse_client` being easy to add: this is a current proposal for a streaming HTTP client. While not compatible and not officially approved or commented on, you can see that it takes the same arguments as before, even falls back to the SSE client if detected, and yields the same as before (a read and a write stream). Same as the stdio client.
@grll indeed, thanks. I think I'll merge this PR almost as-is, but we'd welcome further PRs to add SSE/HTTP and other stuff!
```python
# Map tool names to their server for later lookup
for tool in tools:
    self.sessions[tool.name] = session
```
You might have some funny situations happening here if multiple MCP servers have identically named tools. Almost missed that one!
(nothing prevents two MCP servers from having conflicting tool names at the moment)
As a first solution, I would just raise a warning to let the user know we are overwriting a tool from another MCP server. We can always do more fancy work in a follow-up PR.

```python
if tool.name in self.sessions:
    warnings.warn(
        f"Overwriting existing tool '{tool.name}'. This can lead to unexpected behavior."
        " It is recommended to make sure MCP servers have different tool names."
    )
self.sessions[tool.name] = session
```

Having several tools with the same name would be a problem anyway, since we can't distinguish which tool the LLM chose.
In a follow-up PR we can define a namespace per MCP client to avoid conflicts (low prio).
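A minimal sketch of what such namespacing could look like (entirely hypothetical follow-up material, not part of this PR):

```python
# Prefix each tool name with a per-server namespace to avoid collisions,
# e.g. "filesystem.read_file" vs "github.read_file".
qualified_name = f"{server_name}.{tool.name}"  # server_name is hypothetical
self.sessions[qualified_name] = session
```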
```python
provider: PROVIDER_T,
model: str,
```
```python
model: str,
provider: Optional[PROVIDER_T] = None,
```

Let's make `provider` optional for auto selection.
"""Connect to an MCP server | ||
|
||
Args: | ||
todo |
^ (the `todo` in the docstring above still needs to be filled in)
extras["mcp"] = [ | ||
"mcp>=1.6.0", | ||
"aiohttp", # for AsyncInferenceClient | ||
] |
extras["mcp"] = [ | |
"mcp>=1.6.0", | |
"aiohttp", # for AsyncInferenceClient | |
] | |
extras["mcp"] = [ | |
"mcp>=1.6.0", | |
] + extras["inference"] |
let's reuse the existing "inference"
extra
```python
if tool_calls is None or len(tool_calls) == 0:
    return response

for tool_call in tool_calls:
```
Since we are in an async context, we should run the tools in coroutines, in parallel (can be done in a later PR).
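A hedged sketch of that parallelization using `asyncio.gather`, assuming tool-call arguments arrive as a JSON string and each tool name maps to an MCP `ClientSession` as elsewhere in this PR (illustrative, not this PR's code):

```python
import asyncio
import json

# Dispatch all tool calls concurrently instead of awaiting them one by one.
results = await asyncio.gather(
    *(
        self.sessions[tool_call.function.name].call_tool(
            tool_call.function.name,
            json.loads(tool_call.function.arguments),
        )
        for tool_call in tool_calls
    )
)
```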
```python
for tool in tools:
    self.sessions[tool.name] = session

self.available_tools += [
```
I would make `available_tools` a mapping of Tools instead, like this:

```python
# in __init__
(...)
self.sessions: Dict[ToolName, ClientSession] = {}
self.available_tools: Dict[ToolName, Tool] = {}
```

then to add tools:

```python
# in add_mcp_server
(...)
self.available_tools[tool.name] = tool
```

and finally format them to `List[ChatCompletionInputTool]` only when you use them:

```python
# in `process_query`:
tools = [
    {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.inputSchema,
        },
    }
    for tool in self.available_tools.values()
]
```
It feels more idiomatic to me to format only when needed. It also prevents having the same tool multiple times in the list sent to the LLM.
## Required reading

https://modelcontextprotocol.io/quickstart/client

TL;DR: MCP is a standard API to expose sets of Tools that can be hooked to LLMs

## Summary of how to use this

You can either use McpClient, or you can run an example Agent directly:

# Tiny Agent

We now have a tiny Agent (a while loop, really) in this PR, built on top of the MCP Client. You can run it like this:

```bash
cd packages/mcp-client
pnpm run agent
```

# `McpClient` i.e. the underlying class

```ts
const client = new McpClient({
	provider: "together",
	model: "Qwen/Qwen2.5-72B-Instruct",
	apiKey: process.env.HF_TOKEN,
});

await client.addMcpServer({
	// Filesystem "official" mcp-server with access to your Desktop
	command: "npx",
	args: ["-y", "@modelcontextprotocol/server-filesystem", join(homedir(), "Desktop")],
});
```

## Variant where we call a custom, local MCP server

```ts
await client.addMcpServer(
	"node",
	["--disable-warning=ExperimentalWarning", join(homedir(), "Desktop/hf-mcp/index.ts")],
	{
		HF_TOKEN: process.env.HF_TOKEN,
	}
);

const response = await client.processQuery(`
	find an app that generates 3D models from text, and also get the best paper about transformers
`);
```

#### Where to find the MCP Server used here as an example

https://gist.github.com/julien-c/0500ba922e1b38f2dc30447fb81f7dc6

(Note that you can replace it with any MCP Server, from this doc for instance: https://modelcontextprotocol.io/examples)

## Python version

Python version will be implemented in `huggingface_hub` in this PR: huggingface/huggingface_hub#2986

Contributions are welcome!

---------

Co-authored-by: Eliott C. <coyotte508@gmail.com>
Co-authored-by: SBrandeis <simon@huggingface.co>
Co-authored-by: Simon Brandeis <33657802+SBrandeis@users.noreply.github.com>
## Required reading

https://modelcontextprotocol.io/quickstart/client

TL;DR: MCP is a standard API to expose sets of Tools that can be hooked to LLMs

## Summary of how to use this
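A hedged Python sketch of the intended usage, mirroring the TypeScript example above with the names this PR's diff introduces (`MCPClient`, `add_mcp_server`, `process_query`); the import path and exact signatures are assumptions, not final API:

```python
import asyncio
import os

# `MCPClient` is the class added by this PR; import path is an assumption.
from huggingface_hub import MCPClient


async def main():
    client = MCPClient(provider="together", model="Qwen/Qwen2.5-72B-Instruct")
    # Filesystem "official" mcp-server with access to your Desktop
    await client.add_mcp_server(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", os.path.expanduser("~/Desktop")],
        env={"HF_TOKEN": os.environ["HF_TOKEN"]},
    )
    response = await client.process_query(
        "find an app that generates 3D models from text, and also get the best paper about transformers"
    )
    print(response)


asyncio.run(main())
```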
## Open questions

- On `(Async)InferenceClient` directly? Or on a distinct class, like here?

## Where to find the MCP Server used here as an example

https://gist.github.com/julien-c/0500ba922e1b38f2dc30447fb81f7dc6

(Note that you can replace it with any MCP Server, from this doc for instance: https://modelcontextprotocol.io/examples)
## Script output

Generation from LLM with tools:

**3D Model Generation from Text**

Here are some of the best apps that can generate 3D models from text:

- Shap-E
- Hunyuan3D-1
- LGM
- 3D-Adapter
- Fictiverse-Voxel_XL_Lora

**Best Paper on Transformers**

One of the most influential and highly cited papers on transformers is:

If you are looking for more recent advancements, here are a few other notable papers:

- **Title:** "RoFormer: Enhanced Transformer with Rotary Position Embedding"
  **Link:** huggingface.co/papers/2104.09864
  **Description:** This paper introduces RoFormer, which improves the Transformer by using rotary position embeddings.
- **Title:** "Performer: Generalized Attention with Gaussian Kernels for Sequence Modeling"
  **Link:** huggingface.co/papers/2009.14794
  **Description:** This paper presents Performer, a more efficient and scalable version of the Transformer.
- **Title:** "Longformer: The Long-Document Transformer"
  **Link:** huggingface.co/papers/2004.05150
  **Description:** This paper introduces Longformer, which extends the Transformer to handle very long documents.

These resources should provide you with a solid foundation for both generating 3D models from text and understanding the latest advancements in transformer models.