PoC: InferenceClient is also a MCPClient #2986

Open · wants to merge 8 commits into main

Conversation

julien-c (Member) commented Apr 8, 2025

Required reading

https://modelcontextprotocol.io/quickstart/client

TL;DR: MCP is a standard API to expose sets of Tools that can be hooked to LLMs

Summary of how to use this

```python
import os

client = MCPClient(
    provider="together",
    model="Qwen/Qwen2.5-72B-Instruct",
    api_key=os.environ["HF_TOKEN"],
)
await client.add_mcp_server(
    "node", ["--disable-warning=ExperimentalWarning", f"{os.path.expanduser('~')}/Desktop/hf-mcp/index.ts"]
    # in the future we'll (very likely) be able to just pass a remote URL,
    # but for now, we need the MCP server to run locally
)

response = await client.process_query(
    """
    find an app that generates 3D models from text,
    and also get the best paper about transformers
    """
)
```

Open questions

  • Should we implement this in (Async)InferenceClient directly? Or on a distinct class, like here?

Where to find the MCP Server used here as an example

Note that you can replace it with any MCP Server, from this doc for instance: https://modelcontextprotocol.io/examples

https://gist.github.com/julien-c/0500ba922e1b38f2dc30447fb81f7dc6

Script output

Generation from LLM with tools

3D Model Generation from Text

Here are some of the best apps that can generate 3D models from text:

  1. Shap-E

  2. Hunyuan3D-1

  3. LGM

  4. 3D-Adapter

  5. Fictiverse-Voxel_XL_Lora

Best Paper on Transformers

One of the most influential and highly cited papers on transformers is:

  • Title: "Attention Is All You Need"
  • Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
  • Link: huggingface.co/papers/1706.03762
  • Description: This paper introduced the Transformer architecture, which has become a cornerstone in natural language processing and many other areas of deep learning.

If you are looking for more recent advancements, here are a few other notable papers:

  • Title: "RoFormer: Enhanced Transformer with Rotary Position Embedding"

  • Link: huggingface.co/papers/2104.09864

  • Description: This paper introduces RoFormer, which improves the Transformer by using rotary position embeddings.

  • Title: "Performer: Generalized Attention with Gaussian Kernels for Sequence Modeling"

  • Link: huggingface.co/papers/2009.14794

  • Description: This paper presents Performer, a more efficient and scalable version of the Transformer.

  • Title: "Longformer: The Long-Document Transformer"

  • Link: huggingface.co/papers/2004.05150

  • Description: This paper introduces Longformer, which extends the Transformer to handle very long documents.

These resources should provide you with a solid foundation for both generating 3D models from text and understanding the latest advancements in transformer models.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@julien-c julien-c requested a review from mishig25 April 8, 2025 12:51
mcp_client.py (Outdated)

```python
        self.exit_stack = AsyncExitStack()
        self.available_tools: List[ChatCompletionInputTool] = []

    async def add_mcp_server(self, command: str, args: List[str]):
```
mishig25 (Contributor) commented Apr 8, 2025

Why not name this method `connect_to_server`?
We can't add multiple MCP servers to a single instance of the client, can we?

julien-c (Member Author)

Yes, we can.

julien-c (Member Author)
I mean, we would need to store a map of sessions, but there's nothing preventing us from doing it, conceptually.
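For illustration, the map-of-sessions idea could look roughly like this (a plain-Python sketch with stand-in session objects; the class and method names here are hypothetical, not the PR's actual API):

```python
from typing import Dict, List


class FakeSession:
    """Stand-in for an MCP ClientSession, for illustration only."""

    def __init__(self, server_name: str, tools: List[str]):
        self.server_name = server_name
        self.tools = tools


class MultiServerClient:
    def __init__(self):
        # One entry per tool name, pointing back at the session that owns it
        self.sessions: Dict[str, FakeSession] = {}

    def add_mcp_server(self, session: FakeSession):
        for tool_name in session.tools:
            self.sessions[tool_name] = session

    def session_for(self, tool_name: str) -> FakeSession:
        # Later, when the LLM picks a tool, we look up which server to call
        return self.sessions[tool_name]


client = MultiServerClient()
client.add_mcp_server(FakeSession("filesystem", ["read_file", "write_file"]))
client.add_mcp_server(FakeSession("hf-api", ["search_spaces", "search_papers"]))

assert client.session_for("search_papers").server_name == "hf-api"
assert client.session_for("read_file").server_name == "filesystem"
```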

mishig25 (Contributor) commented Apr 8, 2025

Let me know if the following question is out of scope.
Question: how would I connect to multiple MCP servers? Would it look like Option 1 or Option 2?

Option 1

```python
client1 = MCPClient()
client1.add_mcp_server()

client2 = MCPClient()
client2.add_mcp_server()
```

or

Option 2

```python
client = MCPClient()
client.add_mcp_server(server1)
client.add_mcp_server(server2)
```

?

Contributor

Another design question: `class MCPClient(AsyncInferenceClient)` vs `class AsyncInferenceClient(...args, mcp_clients: MCPClient[])`

Member

Sorry to chime in unannounced, but from a very removed external user standpoint, I find this all very confusing - I just don't think what you coded should be called MCPClient 😄

When I came to this PR I was fully expecting MCPClient to be passed as parameter to InferenceClient, though I hear @Wauplin above, so why not a wrapper. But the end result is really more of an InferenceClientWithEmbeddedMCP to me, not an MCPClient.

That being said, it's just about semantics, but I'm kind of a semantics extremist, sorry about that (and feel free to completely disregard this message, as is very likely XD)

julien-c (Member Author)

> I was fully expecting MCPClient to be passed as parameter to InferenceClient

What do you mean as parameter? Do you have an example signature?

Member

second option of #2986 (comment)

julien-c (Member Author)

ah yes, sure, we can probably add this I guess

julien-c (Member Author)

Ah, actually, with the async/await stuff I'm not so sure.

grll left a comment

A few comments off the top of my head. SSE support would not be so hard to add and could really be a nice addition.

```python
        self.exit_stack = AsyncExitStack()
        self.available_tools: List[ChatCompletionInputTool] = []

    async def add_mcp_server(self, command: str, args: List[str], env: Dict[str, str]):
```
grll

You would need to loosen the requirements on your args a bit if you want to make this work with SSE. Or is the intent just to support stdio? The rest seems to focus on stdio, so maybe it's by design.

julien-c (Member Author)

For now, just stdio. But in the future, Streamable HTTP should be the way to go, from what I've understood?

grll

Yes, this is the new spec, but it is backward compatible, and at the level you are working with in this PR I wouldn't expect much to change: the internals of the client will probably change, but the client interface would remain the same. This means that if today you do something like `add_mcp_server(StdioParameters | dict)`, the dict being the arguments of `sse_client` from the Python SDK, you could already support all the SSE servers, plus potentially future Streamable HTTP servers, with minor adjustments at most.
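The shape being described, accepting either stdio parameters or an SSE-style dict and dispatching on type, could be sketched as follows. This is a pure-Python illustration: `StdioParams` here is a stand-in dataclass, not the real `StdioServerParameters` from the MCP SDK, and `open_transport` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class StdioParams:
    """Stand-in for the MCP SDK's stdio server parameters."""

    command: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


def open_transport(params: Union[StdioParams, dict]) -> str:
    # Dispatch on the parameter type: stdio params mean "spawn a subprocess",
    # while a plain dict is treated as kwargs for an SSE/HTTP client.
    if isinstance(params, StdioParams):
        return f"stdio:{params.command}"
    if isinstance(params, dict):
        return f"sse:{params['url']}"
    raise TypeError(f"Unsupported server parameters: {type(params)!r}")


assert open_transport(StdioParams("node", ["index.ts"])) == "stdio:node"
assert open_transport({"url": "https://example.com/sse"}) == "sse:https://example.com/sse"
```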

```python
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.inputSchema,
```
grll

Just a note that I have seen some MCP servers with jsonref in their descriptions, which sometimes confuses the model. In mcpadapt I had to resolve the jsonref before passing it to the model; might be minor for now.

grll

Confused, or sometimes plainly unsupported by the model SDK (e.g. Google GenAI).

julien-c (Member Author) commented Apr 17, 2025

Interesting, does the spec mention anything about whether jsonref is allowed or not?

grll

I don't think the spec mentions it. However, it gets auto-generated by the official MCP Python SDK if you use Pydantic models with the FastMCP syntax. I had the case for one of the MCP servers I use to test things: https://github.com/grll/pubmedmcp
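To illustrate the issue: a schema generated from nested Pydantic models can contain `$ref` entries pointing into a `$defs` section, and some models choke on those. A minimal resolver that inlines local references (a sketch only, not what mcpadapt actually does, and without cycle handling) could look like:

```python
from typing import Any


def resolve_local_refs(schema: dict) -> dict:
    """Inline local '#/...' JSON references so the schema is self-contained."""

    def lookup(ref: str) -> Any:
        # '#/$defs/Foo' -> walk ['$defs', 'Foo'] from the schema root
        node: Any = schema
        for key in ref.lstrip("#/").split("/"):
            node = node[key]
        return node

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            if set(node) == {"$ref"}:
                return walk(lookup(node["$ref"]))
            # Drop '$defs' from the output: everything it held gets inlined
            return {k: walk(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(schema)


schema = {
    "type": "object",
    "properties": {"query": {"$ref": "#/$defs/Query"}},
    "$defs": {"Query": {"type": "string", "description": "search terms"}},
}
resolved = resolve_local_refs(schema)
assert resolved["properties"]["query"] == {"type": "string", "description": "search terms"}
assert "$defs" not in resolved
```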

```python
ToolName: TypeAlias = str


class MCPClient:
```
grll

Could potentially also support two syntaxes, like in the MCPClient for smolagents:

  1. like you have, with try/finally
  2. as a context manager, where you directly pass the MCP client settings plus the MCP servers with the tools you want. This would allow something like `with MCPClient(...) as client:` and then just running `client.chat.completion` with the tools

julien-c (Member Author)

Thanks for the review @grll!

grll commented Apr 18, 2025

FYI, on what I was saying about `sse_client` being easy to add:

https://github.com/modelcontextprotocol/python-sdk/pull/416/files#diff-3223f09c8047aee59a2209f564b9bbf0258d4dabe30be98478b9cc8c599144aeR23-R30

This is a current proposal for a Streamable HTTP client. While not yet compatible, and not officially approved or commented on, you can see that it takes the same arguments as before, even falls back to the SSE client if detected, and yields the same as before (a read stream and a write stream). Same as the stdio client.

julien-c (Member Author)

@grll indeed, thanks. I think I'll merge this PR almost as-is, but we'd welcome further PRs to add SSE/HTTP support and other things!


```python
        # Map tool names to their server for later lookup
        for tool in tools:
            self.sessions[tool.name] = session
```
grll

You might have some funny situations happening here if multiple MCP servers have similarly named tools. Almost missed that one!

grll

(Nothing prevents two MCP servers from having conflicting tool names at the moment.)

Contributor

As a first solution, I would just raise a warning to the user to let them know we are overwriting a tool from another MCP server. We can always do more fancy work in a follow-up PR.

Suggested change

```diff
-self.sessions[tool.name] = session
+if tool.name in self.sessions:
+    warnings.warn(
+        f"Overwriting existing tool '{tool.name}'. This can lead to unexpected behavior. "
+        "It is recommended to make sure MCP servers have different tool names."
+    )
+self.sessions[tool.name] = session
```

Having several tools with same names would be a problem anyway since we can't distinguish which tool the LLM chose.

Contributor

in a follow-up PR we can define a namespace per mcp client to avoid conflicts (low prio)
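That namespacing could be as simple as prefixing each tool name with a per-server identifier before it goes into the sessions map. A sketch (the `namespaced` helper and the server labels are hypothetical):

```python
def namespaced(server_name: str, tool_name: str) -> str:
    # e.g. 'filesystem.read_file' vs 'hf-api.read_file'
    return f"{server_name}.{tool_name}"


sessions = {}
for server, tools in [("filesystem", ["read_file"]), ("hf-api", ["read_file"])]:
    for tool in tools:
        # Keys can no longer collide across servers
        sessions[namespaced(server, tool)] = server

# Both 'read_file' tools survive under distinct keys
assert sorted(sessions) == ["filesystem.read_file", "hf-api.read_file"]
```

The trade-off is that the namespaced name is also what the LLM sees and must emit in its tool calls, which is why this can wait for a follow-up PR.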

Comment on lines +22 to +23
```python
        provider: PROVIDER_T,
        model: str,
```
Contributor

Suggested change

```diff
-        provider: PROVIDER_T,
-        model: str,
+        model: str,
+        provider: Optional[PROVIDER_T] = None,
```

Let's make `provider` optional, for auto-selection.

```python
        """Connect to an MCP server

        Args:
            todo
```
Contributor

^

Comment on lines +60 to +63
```python
extras["mcp"] = [
    "mcp>=1.6.0",
    "aiohttp",  # for AsyncInferenceClient
]
```
Contributor

Suggested change

```diff
-extras["mcp"] = [
-    "mcp>=1.6.0",
-    "aiohttp",  # for AsyncInferenceClient
-]
+extras["mcp"] = [
+    "mcp>=1.6.0",
+] + extras["inference"]
```

Let's reuse the existing "inference" extra.

```python
        if tool_calls is None or len(tool_calls) == 0:
            return response

        for tool_call in tool_calls:
```
Contributor

Since we are in an async context, we should run the tools concurrently, as coroutines.

(can be done in later PR)
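Concurrent tool execution could look like this with `asyncio.gather` (a sketch with stand-in tool functions rather than the PR's `session.call_tool`):

```python
import asyncio


async def call_tool(name: str, delay: float) -> str:
    # Stand-in for an MCP session.call_tool(...): pretend each call takes `delay` seconds
    await asyncio.sleep(delay)
    return f"{name}:done"


async def run_tool_calls():
    # gather() awaits all coroutines concurrently instead of one after another,
    # so total latency is roughly the slowest call, not the sum of all calls
    results = await asyncio.gather(
        call_tool("search_spaces", 0.01),
        call_tool("search_papers", 0.01),
    )
    return results


results = asyncio.run(run_tool_calls())
# gather preserves argument order in its results
assert results == ["search_spaces:done", "search_papers:done"]
```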

```python
        for tool in tools:
            self.sessions[tool.name] = session

        self.available_tools += [
```
Contributor

I would make `available_tools` a mapping of Tools instead, like this:

```python
# in __init__
    (...)
    self.sessions: Dict[ToolName, ClientSession] = {}
    self.available_tools: Dict[ToolName, Tool] = {}
```

then to add tools:

```python
# in add_mcp_server
    (...)
    self.available_tools[tool.name] = tool
```

and finally format them to `List[ChatCompletionInputTool]` only when you use them:

```python
# in process_query
    tools = [
        {
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.inputSchema,
            },
        }
        for tool in self.available_tools.values()
    ]
```

Contributor

It feels more idiomatic to me to format only when needed. It also prevents having the same tool multiple times in the list sent to the LLM.



julien-c added a commit to huggingface/huggingface.js that referenced this pull request Apr 25, 2025
## Required reading

https://modelcontextprotocol.io/quickstart/client

TL;DR: MCP is a standard API to expose sets of Tools that can be hooked
to LLMs

## Summary of how to use this

You can either use McpClient, or you can run an example Agent directly:

# Tiny Agent

We now have a tiny Agent (a while loop, really) in this PR, built on top
of the MCP Client. You can run it like this:

```bash
cd packages/mcp-client
pnpm run agent
```

# `McpClient` i.e. the underlying class

```ts
const client = new McpClient({
	provider: "together",
	model: "Qwen/Qwen2.5-72B-Instruct",
	apiKey: process.env.HF_TOKEN,
});

await client.addMcpServer({
	// Filesystem "official" mcp-server with access to your Desktop
	command: "npx",
	args: ["-y", "@modelcontextprotocol/server-filesystem", join(homedir(), "Desktop")],
});
```

## Variant where we call a custom, local MCP server

```ts
await client.addMcpServer(
	"node",
	["--disable-warning=ExperimentalWarning", join(homedir(), "Desktop/hf-mcp/index.ts")],
	{
		HF_TOKEN: process.env.HF_TOKEN,
	}
);

const response = await client.processQuery(`
	find an app that generates 3D models from text,
	and also get the best paper about transformers
`);
```
#### Where to find the MCP Server used here as an example

https://gist.github.com/julien-c/0500ba922e1b38f2dc30447fb81f7dc6

(Note that you can replace it with any MCP Server, from this doc for
instance: https://modelcontextprotocol.io/examples)

## Python version

Python version will be implemented in `huggingface_hub` in this PR:
huggingface/huggingface_hub#2986

Contributions are welcome!

---------

Co-authored-by: Eliott C. <coyotte508@gmail.com>
Co-authored-by: SBrandeis <simon@huggingface.co>
Co-authored-by: Simon Brandeis <33657802+SBrandeis@users.noreply.github.com>