PoC: InferenceClient is also a MCPClient #2986

Open · wants to merge 8 commits into main

Conversation

julien-c (Member) commented Apr 8, 2025

Required reading

https://modelcontextprotocol.io/quickstart/client

TL;DR: MCP is a standard API to expose sets of Tools that can be hooked to LLMs

Summary of how to use this

```python
import os

client = MCPClient(
    provider="together",
    model="Qwen/Qwen2.5-72B-Instruct",
    api_key=os.environ["HF_TOKEN"],
)
await client.add_mcp_server(
    "node", ["--disable-warning=ExperimentalWarning", f"{os.path.expanduser('~')}/Desktop/hf-mcp/index.ts"]
    # in the future we'll (very likely) be able to just pass a remote URL,
    # but for now, we need the MCP server to run locally
)

response = await client.process_query(
    """
    find an app that generates 3D models from text,
    and also get the best paper about transformers
    """
)
```

Open questions

  • Should we implement this in (Async)InferenceClient directly? Or on a distinct class, like here?

Where to find the MCP Server used here as an example

Note that you can replace it with any MCP Server, from this doc for instance: https://modelcontextprotocol.io/examples

https://gist.github.com/julien-c/0500ba922e1b38f2dc30447fb81f7dc6

Script output

Generation from LLM with tools

3D Model Generation from Text

Here are some of the best apps that can generate 3D models from text:

  1. Shap-E

  2. Hunyuan3D-1

  3. LGM

  4. 3D-Adapter

  5. Fictiverse-Voxel_XL_Lora

Best Paper on Transformers

One of the most influential and highly cited papers on transformers is:

  • Title: "Attention Is All You Need"
  • Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
  • Link: huggingface.co/papers/1706.03762
  • Description: This paper introduced the Transformer architecture, which has become a cornerstone in natural language processing and many other areas of deep learning.

If you are looking for more recent advancements, here are a few other notable papers:

  • Title: "RoFormer: Enhanced Transformer with Rotary Position Embedding"

  • Link: huggingface.co/papers/2104.09864

  • Description: This paper introduces RoFormer, which improves the Transformer by using rotary position embeddings.

  • Title: "Performer: Generalized Attention with Gaussian Kernels for Sequence Modeling"

  • Link: huggingface.co/papers/2009.14794

  • Description: This paper presents Performer, a more efficient and scalable version of the Transformer.

  • Title: "Longformer: The Long-Document Transformer"

  • Link: huggingface.co/papers/2004.05150

  • Description: This paper introduces Longformer, which extends the Transformer to handle very long documents.

These resources should provide you with a solid foundation for both generating 3D models from text and understanding the latest advancements in transformer models.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@julien-c julien-c requested a review from mishig25 April 8, 2025 12:51
mcp_client.py (Outdated)

```python
        self.exit_stack = AsyncExitStack()
        self.available_tools: List[ChatCompletionInputTool] = []

    async def add_mcp_server(self, command: str, args: List[str]):
```
mishig25 (Contributor) commented Apr 8, 2025

Why not name this method `connect_to_server`?
We can't add multiple MCP servers to a single instance of the client, can we?

julien-c (Member Author)

Yes, we can.

julien-c (Member Author)
I mean, we would need to store a map of sessions, but there's nothing preventing us from doing it, conceptually.
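For illustration, the map-of-sessions idea could look roughly like this (a plain-Python sketch with stand-in session objects; the class and method names here are hypothetical, not the PR's actual API):

```python
from typing import Dict, List


class FakeSession:
    """Stand-in for an MCP ClientSession, for illustration only."""

    def __init__(self, server_name: str, tools: List[str]):
        self.server_name = server_name
        self.tools = tools


class MultiServerClient:
    def __init__(self):
        # One entry per tool name, pointing back at the session that owns it
        self.sessions: Dict[str, FakeSession] = {}

    def add_mcp_server(self, session: FakeSession):
        for tool_name in session.tools:
            self.sessions[tool_name] = session

    def session_for(self, tool_name: str) -> FakeSession:
        # Later, when the LLM picks a tool, we look up which server to call
        return self.sessions[tool_name]


client = MultiServerClient()
client.add_mcp_server(FakeSession("filesystem", ["read_file", "write_file"]))
client.add_mcp_server(FakeSession("hf-api", ["search_spaces", "search_papers"]))

assert client.session_for("search_papers").server_name == "hf-api"
assert client.session_for("read_file").server_name == "filesystem"
```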

mishig25 (Contributor) commented Apr 8, 2025

Let me know if the following question is out of scope.
Question: how would I connect to multiple MCP servers? Would it look like Option 1 or Option 2?

Option 1

```python
client1 = MCPClient()
client1.add_mcp_server()

client2 = MCPClient()
client2.add_mcp_server()
```

or

Option 2

```python
client = MCPClient()
client.add_mcp_server(server1)
client.add_mcp_server(server2)
```

?

Contributor

Another design question: `class MCPClient(AsyncInferenceClient)` vs `class AsyncInferenceClient(...args, mcp_clients: MCPClient[])`

Member

Sorry to chime in unannounced, but from a very removed external user standpoint, I find this all very confusing - I just don't think what you coded should be called MCPClient 😄

When I came to this PR I was fully expecting MCPClient to be passed as parameter to InferenceClient, though I hear @Wauplin above, so why not a wrapper. But the end result is really more of an InferenceClientWithEmbeddedMCP to me, not an MCPClient.

That being said, it's just about semantics, but I'm kind of a semantics extremist, sorry about that (and feel free to completely disregard this message, as is very likely XD)

julien-c (Member Author)

> I was fully expecting MCPClient to be passed as parameter to InferenceClient

What do you mean as parameter? Do you have an example signature?

Member

second option of #2986 (comment)

julien-c (Member Author)

ah yes, sure, we can probably add this I guess

julien-c (Member Author)

Ah, actually, with the async/await stuff I'm not so sure.

grll left a comment

A few comments off the top of my head. SSE support would not be so hard to add and could really be a nice addition.

```python
        self.exit_stack = AsyncExitStack()
        self.available_tools: List[ChatCompletionInputTool] = []

    async def add_mcp_server(self, command: str, args: List[str], env: Dict[str, str]):
```
grll

You would need to loosen the requirements on your args a bit if you want to make this work with SSE. Or is the intent just to support stdio? The rest seems to focus on stdio, so maybe it's by design.

julien-c (Member Author)

For now, just stdio. But in the future, Streamable HTTP should be the way to go, from what I've understood?

grll

Yes, this is the new spec, but it is backward compatible, and at the level you are working with in this PR I wouldn't expect much to change: the internals of the client will probably change, but the client interface would remain the same. This means that if today you do something like `add_mcp_server(StdioParameters | dict)`, the dict being the arguments of `sse_client` from the Python SDK, you could already support all the SSE servers, plus potentially future Streamable HTTP servers, with minor adjustments at most.
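The shape being described, accepting either stdio parameters or an SSE-style dict and dispatching on type, could be sketched as follows. This is a pure-Python illustration: `StdioParams` here is a stand-in dataclass, not the real `StdioServerParameters` from the MCP SDK, and `open_transport` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class StdioParams:
    """Stand-in for the MCP SDK's stdio server parameters."""

    command: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


def open_transport(params: Union[StdioParams, dict]) -> str:
    # Dispatch on the parameter type: stdio params mean "spawn a subprocess",
    # while a plain dict is treated as kwargs for an SSE/HTTP client.
    if isinstance(params, StdioParams):
        return f"stdio:{params.command}"
    if isinstance(params, dict):
        return f"sse:{params['url']}"
    raise TypeError(f"Unsupported server parameters: {type(params)!r}")


assert open_transport(StdioParams("node", ["index.ts"])) == "stdio:node"
assert open_transport({"url": "https://example.com/sse"}) == "sse:https://example.com/sse"
```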

```python
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.inputSchema,
```
grll

Just a note that I have seen some MCP servers with jsonref in their descriptions, which sometimes confuses the model. In mcpadapt I had to resolve the jsonref before passing it to the model; might be minor for now.

grll

Confused, or sometimes plainly unsupported by the model SDK (e.g. Google GenAI).

julien-c (Member Author) commented Apr 17, 2025

Interesting, does the spec mention anything about whether jsonref is allowed or not?

grll

I don't think the spec mentions it. However, it gets auto-generated by the official MCP Python SDK if you use Pydantic models with the FastMCP syntax. I had the case for one of the MCP servers I use to test things: https://github.com/grll/pubmedmcp
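To illustrate the issue: a schema generated from nested Pydantic models can contain `$ref` entries pointing into a `$defs` section, and some models choke on those. A minimal resolver that inlines local references (a sketch only, not what mcpadapt actually does, and without cycle handling) could look like:

```python
from typing import Any


def resolve_local_refs(schema: dict) -> dict:
    """Inline local '#/...' JSON references so the schema is self-contained."""

    def lookup(ref: str) -> Any:
        # '#/$defs/Foo' -> walk ['$defs', 'Foo'] from the schema root
        node: Any = schema
        for key in ref.lstrip("#/").split("/"):
            node = node[key]
        return node

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            if set(node) == {"$ref"}:
                return walk(lookup(node["$ref"]))
            # Drop '$defs' from the output: everything it held gets inlined
            return {k: walk(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(schema)


schema = {
    "type": "object",
    "properties": {"query": {"$ref": "#/$defs/Query"}},
    "$defs": {"Query": {"type": "string", "description": "search terms"}},
}
resolved = resolve_local_refs(schema)
assert resolved["properties"]["query"] == {"type": "string", "description": "search terms"}
assert "$defs" not in resolved
```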

```python
ToolName: TypeAlias = str


class MCPClient:
```
grll

Could potentially also support two syntaxes, like in the MCPClient for smolagents:

  1. like you have, with try/finally
  2. as a context manager, where you directly pass the MCP client settings plus the MCP servers with the tools you want. This would allow something like `with MCPClient(...) as client:` and then just running `client.chat.completion` with the tools

julien-c (Member Author)

Thanks for the review @grll!

grll commented Apr 18, 2025

FYI, on what I was saying about `sse_client` being easy to add:

https://github.com/modelcontextprotocol/python-sdk/pull/416/files#diff-3223f09c8047aee59a2209f564b9bbf0258d4dabe30be98478b9cc8c599144aeR23-R30

This is a current proposal for a Streamable HTTP client. While not yet compatible, and not officially approved or commented on, you can see that it takes the same arguments as before, even falls back to the SSE client if detected, and yields the same as before (a read stream and a write stream). Same as the stdio client.

julien-c (Member Author)

@grll indeed, thanks. I think I'll merge this PR almost as-is, but we'd welcome further PRs to add SSE/HTTP support and other things!


```python
        # Map tool names to their server for later lookup
        for tool in tools:
            self.sessions[tool.name] = session
```
grll

You might have some funny situations happening here if multiple MCP servers have similarly named tools. Almost missed that one!

grll

(Nothing prevents two MCP servers from having conflicting tool names at the moment.)

Contributor

As a first solution, I would just raise a warning to the user to let them know we are overwriting a tool from another MCP server. We can always do more fancy work in a follow-up PR.

Suggested change

```diff
-self.sessions[tool.name] = session
+if tool.name in self.sessions:
+    warnings.warn(
+        f"Overwriting existing tool '{tool.name}'. This can lead to unexpected behavior. "
+        "It is recommended to make sure MCP servers have different tool names."
+    )
+self.sessions[tool.name] = session
```

Having several tools with same names would be a problem anyway since we can't distinguish which tool the LLM chose.

Contributor

in a follow-up PR we can define a namespace per mcp client to avoid conflicts (low prio)
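That namespacing could be as simple as prefixing each tool name with a per-server identifier before it goes into the sessions map. A sketch (the `namespaced` helper and the server labels are hypothetical):

```python
def namespaced(server_name: str, tool_name: str) -> str:
    # e.g. 'filesystem.read_file' vs 'hf-api.read_file'
    return f"{server_name}.{tool_name}"


sessions = {}
for server, tools in [("filesystem", ["read_file"]), ("hf-api", ["read_file"])]:
    for tool in tools:
        # Keys can no longer collide across servers
        sessions[namespaced(server, tool)] = server

# Both 'read_file' tools survive under distinct keys
assert sorted(sessions) == ["filesystem.read_file", "hf-api.read_file"]
```

The trade-off is that the namespaced name is also what the LLM sees and must emit in its tool calls, which is why this can wait for a follow-up PR.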

Comment on lines +22 to +23
```python
        provider: PROVIDER_T,
        model: str,
```
Contributor

Suggested change

```diff
-        provider: PROVIDER_T,
-        model: str,
+        model: str,
+        provider: Optional[PROVIDER_T] = None,
```

Let's make `provider` optional, for auto-selection.

```python
        """Connect to an MCP server

        Args:
            todo
```
Contributor

^

Comment on lines +60 to +63
```python
extras["mcp"] = [
    "mcp>=1.6.0",
    "aiohttp",  # for AsyncInferenceClient
]
```
Contributor

Suggested change

```diff
-extras["mcp"] = [
-    "mcp>=1.6.0",
-    "aiohttp",  # for AsyncInferenceClient
-]
+extras["mcp"] = [
+    "mcp>=1.6.0",
+] + extras["inference"]
```

Let's reuse the existing "inference" extra.

```python
        if tool_calls is None or len(tool_calls) == 0:
            return response

        for tool_call in tool_calls:
```
Contributor

Since we are in an async context, we should run the tools concurrently, as coroutines.

(can be done in later PR)
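Concurrent tool execution could look like this with `asyncio.gather` (a sketch with stand-in tool functions rather than the PR's `session.call_tool`):

```python
import asyncio


async def call_tool(name: str, delay: float) -> str:
    # Stand-in for an MCP session.call_tool(...): pretend each call takes `delay` seconds
    await asyncio.sleep(delay)
    return f"{name}:done"


async def run_tool_calls():
    # gather() awaits all coroutines concurrently instead of one after another,
    # so total latency is roughly the slowest call, not the sum of all calls
    results = await asyncio.gather(
        call_tool("search_spaces", 0.01),
        call_tool("search_papers", 0.01),
    )
    return results


results = asyncio.run(run_tool_calls())
# gather preserves argument order in its results
assert results == ["search_spaces:done", "search_papers:done"]
```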

```python
        for tool in tools:
            self.sessions[tool.name] = session

        self.available_tools += [
```
Contributor

I would make `available_tools` a mapping of Tools instead, like this:

```python
# in __init__
    (...)
    self.sessions: Dict[ToolName, ClientSession] = {}
    self.available_tools: Dict[ToolName, Tool] = {}
```

then to add tools:

```python
# in add_mcp_server
    (...)
    self.available_tools[tool.name] = tool
```

and finally format them to `List[ChatCompletionInputTool]` only when you use them:

```python
# in process_query
    tools = [
        {
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.inputSchema,
            },
        }
        for tool in self.available_tools.values()
    ]
```

Contributor

It feels more idiomatic to me to format only when needed. It also prevents having the same tool multiple times in the list sent to the LLM.



julien-c added a commit to huggingface/huggingface.js that referenced this pull request Apr 25, 2025
## Required reading

https://modelcontextprotocol.io/quickstart/client

TL;DR: MCP is a standard API to expose sets of Tools that can be hooked
to LLMs

## Summary of how to use this

You can either use McpClient, or you can run an example Agent directly:

# Tiny Agent

We now have a tiny Agent (a while loop, really) in this PR, built on top
of the MCP Client. You can run it like this:

```bash
cd packages/mcp-client
pnpm run agent
```

# `McpClient` i.e. the underlying class

```ts
const client = new McpClient({
	provider: "together",
	model: "Qwen/Qwen2.5-72B-Instruct",
	apiKey: process.env.HF_TOKEN,
});

await client.addMcpServer({
	// Filesystem "official" mcp-server with access to your Desktop
	command: "npx",
	args: ["-y", "@modelcontextprotocol/server-filesystem", join(homedir(), "Desktop")],
});
```

## Variant where we call a custom, local MCP server

```ts
await client.addMcpServer(
	"node",
	["--disable-warning=ExperimentalWarning", join(homedir(), "Desktop/hf-mcp/index.ts")],
	{
		HF_TOKEN: process.env.HF_TOKEN,
	}
);

const response = await client.processQuery(`
	find an app that generates 3D models from text,
	and also get the best paper about transformers
`);
```
#### Where to find the MCP Server used here as an example

https://gist.github.com/julien-c/0500ba922e1b38f2dc30447fb81f7dc6

(Note that you can replace it with any MCP Server, from this doc for
instance: https://modelcontextprotocol.io/examples)

## Python version

Python version will be implemented in `huggingface_hub` in this PR:
huggingface/huggingface_hub#2986

Contributions are welcome!

---------

Co-authored-by: Eliott C. <coyotte508@gmail.com>
Co-authored-by: SBrandeis <simon@huggingface.co>
Co-authored-by: Simon Brandeis <33657802+SBrandeis@users.noreply.github.com>