Commit cadb7a9

[MCP] Add documentation (#3102)

* mcp documentation
* wording
* style
* title
* Update src/huggingface_hub/inference/_mcp/mcp_client.py
* Update src/huggingface_hub/inference/_mcp/agent.py
* Update docs/source/en/package_reference/mcp.md
* Update docs/source/en/package_reference/mcp.md
* Update docs/source/en/guides/inference.md
* Update docs/source/en/guides/inference.md
* Update docs/source/en/guides/inference.md
* nit

Co-authored-by: Julien Chaumond <[email protected]>

1 parent 417ad89 commit cadb7a9

File tree

- docs/source/en/_toctree.yml
- docs/source/en/guides/inference.md
- docs/source/en/package_reference/mcp.md
- src/huggingface_hub/__init__.py
- src/huggingface_hub/inference/_mcp/agent.py
- src/huggingface_hub/inference/_mcp/mcp_client.py

6 files changed: +145 −19 lines

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -66,6 +66,8 @@
     title: Inference Client
   - local: package_reference/inference_endpoints
     title: Inference Endpoints
+  - local: package_reference/mcp
+    title: MCP Client
   - local: package_reference/hf_file_system
     title: HfFileSystem
   - local: package_reference/utilities
```

docs/source/en/guides/inference.md

Lines changed: 63 additions & 0 deletions
@@ -443,6 +443,69 @@ strictly the same as the sync-only version.

For more information about the `asyncio` module, please refer to the [official documentation](https://docs.python.org/3/library/asyncio.html).

## MCP Client

The `huggingface_hub` library now includes an experimental [`MCPClient`], designed to empower Large Language Models (LLMs) with the ability to interact with external Tools via the [Model Context Protocol](https://modelcontextprotocol.io) (MCP). This client extends an [`AsyncInferenceClient`] to seamlessly integrate Tool usage.

The [`MCPClient`] connects to MCP servers (either local `stdio` scripts or remote `http`/`sse` services) that expose tools. It feeds these tools to an LLM (via [`AsyncInferenceClient`]). If the LLM decides to use a tool, [`MCPClient`] manages the execution request to the MCP server and relays the Tool's output back to the LLM, often streaming results in real time.

In the following example, we use the [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) model via the [Nebius](https://nebius.com/) inference provider. We then add a remote MCP server, in this case an SSE server that makes the Flux image generation tool available to the LLM.

```python
import asyncio
import os

from huggingface_hub import ChatCompletionInputMessage, ChatCompletionStreamOutput, MCPClient


async def main():
    async with MCPClient(
        provider="nebius",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
    ) as client:
        await client.add_mcp_server(type="sse", url="https://evalstate-flux1-schnell.hf.space/gradio_api/mcp/sse")

        messages = [
            {
                "role": "user",
                "content": "Generate a picture of a cat on the moon",
            }
        ]

        async for chunk in client.process_single_turn_with_tools(messages):
            # Log streamed text content
            if isinstance(chunk, ChatCompletionStreamOutput):
                delta = chunk.choices[0].delta
                if delta.content:
                    print(delta.content, end="")

            # Or tool calls
            elif isinstance(chunk, ChatCompletionInputMessage):
                print(
                    f"\nCalled tool '{chunk.name}'. Result: '{chunk.content if len(chunk.content) < 1000 else chunk.content[:1000] + '...'}'"
                )


if __name__ == "__main__":
    asyncio.run(main())
```

For even simpler development, we offer a higher-level [`Agent`] class. This 'Tiny Agent' simplifies creating conversational Agents by managing the chat loop and state: it is essentially a simple while loop built right on top of an [`MCPClient`]. You can run these Agents directly from the command line:

```bash
# Install the latest version of huggingface_hub with the mcp extra
pip install -U "huggingface_hub[mcp]"

# Run an agent that uses the Flux image generation tool
tiny-agents run julien-c/flux-schnell-generator
```

When launched, the Agent will load, list the Tools it has discovered from its connected MCP servers, and then it's ready for your prompts!
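You can also drive the [`Agent`] programmatically. Below is a minimal sketch based on the constructor arguments and `run()` signature documented in `agent.py` later in this commit; the explicit `load_tools()` call and the exact `servers` config keys are assumptions about the setup flow, not part of this diff:

```python
import asyncio
import os

from huggingface_hub import Agent


async def main():
    agent = Agent(
        provider="nebius",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
        # Each server is a dict with a `type` key and a `config` key,
        # per the `Agent` docstring in this commit.
        servers=[
            {
                "type": "sse",
                "config": {"url": "https://evalstate-flux1-schnell.hf.space/gradio_api/mcp/sse"},
            }
        ],
    )
    async with agent:
        # Assumption: tools are connected explicitly before the first turn.
        await agent.load_tools()
        async for chunk in agent.run("Generate a picture of a cat on the moon"):
            print(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```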
## Advanced tips

In the above section, we saw the main aspects of [`InferenceClient`]. Let's dive into some more advanced tips.
docs/source/en/package_reference/mcp.md

Lines changed: 17 additions & 0 deletions

# MCP Client

The `huggingface_hub` library now includes an [`MCPClient`], designed to empower Large Language Models (LLMs) with the ability to interact with external Tools via the [Model Context Protocol](https://modelcontextprotocol.io) (MCP). This client extends an [`AsyncInferenceClient`] to seamlessly integrate Tool usage.

The [`MCPClient`] connects to MCP servers (local `stdio` scripts or remote `http`/`sse` services) that expose tools. It feeds these tools to an LLM (via [`AsyncInferenceClient`]). If the LLM decides to use a tool, [`MCPClient`] manages the execution request to the MCP server and relays the Tool's output back to the LLM, often streaming results in real time.

We also provide a higher-level [`Agent`] class. This 'Tiny Agent' simplifies creating conversational Agents by managing the chat loop and state, acting as a wrapper around [`MCPClient`].

## MCP Client

[[autodoc]] MCPClient

## Agent

[[autodoc]] Agent

src/huggingface_hub/__init__.py

Lines changed: 5 additions & 0 deletions

```diff
@@ -443,6 +443,9 @@
         "ZeroShotObjectDetectionOutputElement",
         "ZeroShotObjectDetectionParameters",
     ],
+    "inference._mcp.agent": [
+        "Agent",
+    ],
     "inference._mcp.mcp_client": [
         "MCPClient",
     ],
@@ -525,6 +528,7 @@
 # ```

 __all__ = [
+    "Agent",
     "AsyncInferenceClient",
     "AudioClassificationInput",
     "AudioClassificationOutputElement",
@@ -1415,6 +1419,7 @@ def __dir__():
         ZeroShotObjectDetectionOutputElement,  # noqa: F401
         ZeroShotObjectDetectionParameters,  # noqa: F401
     )
+    from .inference._mcp.agent import Agent  # noqa: F401
     from .inference._mcp.mcp_client import MCPClient  # noqa: F401
     from .inference_api import InferenceApi  # noqa: F401
     from .keras_mixin import (
```

src/huggingface_hub/inference/_mcp/agent.py

Lines changed: 30 additions & 2 deletions

```diff
@@ -11,8 +11,27 @@

 class Agent(MCPClient):
     """
-    Python implementation of a Simple Agent
-    i.e. just a basic while loop on top of an Inference Client with MCP-powered tools
+    Implementation of a Simple Agent, which is a simple while loop built right on top of an [`MCPClient`].
+
+    <Tip warning={true}>
+
+    This class is experimental and might be subject to breaking changes in the future without prior notice.
+
+    </Tip>
+
+    Args:
+        model (`str`):
+            The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct`,
+            or a URL to a deployed Inference Endpoint or other local or remote endpoint.
+        servers (`Iterable[Dict]`):
+            MCP servers to connect to. Each server is a dictionary containing a `type` key and a `config` key. The `type` key can be
+            `"stdio"` or `"sse"`, and the `config` key is a dictionary of arguments for the server.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Defaults to `"auto"`, i.e. the first of the providers available for the model,
+            sorted by the user's order in https://hf.co/settings/inference-providers. If `model` is a URL or `base_url` is passed,
+            then `provider` is not used.
+        api_key (`str`, *optional*):
+            Token to use for authentication. Will default to the locally saved Hugging Face token if not provided. You can also use
+            your own provider API key to interact directly with the provider's service.
+        prompt (`str`, *optional*):
+            The system prompt to use for the agent. Defaults to the default system prompt in `constants.py`.
     """

     def __init__(
@@ -40,6 +59,15 @@ async def run(
         *,
         abort_event: Optional[asyncio.Event] = None,
     ) -> AsyncGenerator[Union[ChatCompletionStreamOutput, ChatCompletionInputMessage], None]:
+        """
+        Run the agent with the given user input.
+
+        Args:
+            user_input (`str`):
+                The user input to run the agent with.
+            abort_event (`asyncio.Event`, *optional*):
+                An event that can be used to abort the agent. If the event is set, the agent will stop running.
+        """
         self.messages.append({"role": "user", "content": user_input})

         num_turns: int = 0
```
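The `abort_event` parameter documented above enables cooperative cancellation of a running agent. A minimal sketch of how it might be used (the timeout wrapper and chunk handling below are illustrative, not part of this commit):

```python
import asyncio

from huggingface_hub import Agent


async def run_with_timeout(agent: Agent, prompt: str, timeout: float = 30.0):
    """Stop an in-flight `Agent.run()` once `timeout` seconds have elapsed."""
    abort_event = asyncio.Event()

    async def watchdog():
        await asyncio.sleep(timeout)
        abort_event.set()  # signals the agent loop to stop running

    watchdog_task = asyncio.create_task(watchdog())
    try:
        async for chunk in agent.run(prompt, abort_event=abort_event):
            print(chunk)  # ChatCompletionStreamOutput or ChatCompletionInputMessage
    finally:
        watchdog_task.cancel()
```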

src/huggingface_hub/inference/_mcp/mcp_client.py

Lines changed: 28 additions & 17 deletions

```diff
@@ -61,6 +61,16 @@ class MCPClient:
     This class is experimental and might be subject to breaking changes in the future without prior notice.

     </Tip>
+
+    Args:
+        model (`str`, *optional*):
+            The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct`,
+            or a URL to a deployed Inference Endpoint or other local or remote endpoint.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Defaults to `"auto"`, i.e. the first of the providers available for the model,
+            sorted by the user's order in https://hf.co/settings/inference-providers. If `model` is a URL or `base_url` is passed,
+            then `provider` is not used.
+        api_key (`str`, *optional*):
+            Token to use for authentication. Will default to the locally saved Hugging Face token if not provided. You can also use
+            your own provider API key to interact directly with the provider's service.
     """

     def __init__(
@@ -107,23 +117,24 @@ async def add_mcp_server(self, type: ServerType, **params: Any):
                 - "stdio": Standard input/output server (local)
                 - "sse": Server-sent events (SSE) server
                 - "http": StreamableHTTP server
-            **params: Server parameters that can be either:
-                - For stdio servers:
-                    - command (str): The command to run the MCP server
-                    - args (List[str], optional): Arguments for the command
-                    - env (Dict[str, str], optional): Environment variables for the command
-                    - cwd (Union[str, Path, None], optional): Working directory for the command
-                - For SSE servers:
-                    - url (str): The URL of the SSE server
-                    - headers (Dict[str, Any], optional): Headers for the SSE connection
-                    - timeout (float, optional): Connection timeout
-                    - sse_read_timeout (float, optional): SSE read timeout
-                - For StreamableHTTP servers:
-                    - url (str): The URL of the StreamableHTTP server
-                    - headers (Dict[str, Any], optional): Headers for the StreamableHTTP connection
-                    - timeout (timedelta, optional): Connection timeout
-                    - sse_read_timeout (timedelta, optional): SSE read timeout
-                    - terminate_on_close (bool, optional): Whether to terminate on close
+            **params (`Dict[str, Any]`):
+                Server parameters that can be either:
+                - For stdio servers:
+                    - command (str): The command to run the MCP server
+                    - args (List[str], optional): Arguments for the command
+                    - env (Dict[str, str], optional): Environment variables for the command
+                    - cwd (Union[str, Path, None], optional): Working directory for the command
+                - For SSE servers:
+                    - url (str): The URL of the SSE server
+                    - headers (Dict[str, Any], optional): Headers for the SSE connection
+                    - timeout (float, optional): Connection timeout
+                    - sse_read_timeout (float, optional): SSE read timeout
+                - For StreamableHTTP servers:
+                    - url (str): The URL of the StreamableHTTP server
+                    - headers (Dict[str, Any], optional): Headers for the StreamableHTTP connection
+                    - timeout (timedelta, optional): Connection timeout
+                    - sse_read_timeout (timedelta, optional): SSE read timeout
+                    - terminate_on_close (bool, optional): Whether to terminate on close
         """
         from mcp import ClientSession, StdioServerParameters
         from mcp import types as mcp_types
```
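Based on the parameters documented above, connecting a local `stdio` server might look like the following sketch (`my_mcp_server.py` is a hypothetical local MCP server script, and the provider/model choices simply mirror the guide example):

```python
import asyncio
import os

from huggingface_hub import MCPClient


async def main():
    async with MCPClient(
        provider="nebius",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
    ) as client:
        # A stdio server is spawned locally from a command, per the
        # `add_mcp_server` parameters documented in this commit.
        await client.add_mcp_server(
            type="stdio",
            command="python",
            args=["my_mcp_server.py"],
            env={"LOG_LEVEL": "info"},
            cwd=".",
        )


if __name__ == "__main__":
    asyncio.run(main())
```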
