Commit cadb7a9

[MCP] Add documentation (#3102)

* mcp documentation
* wording
* style
* title
* Update src/huggingface_hub/inference/_mcp/mcp_client.py
* Update src/huggingface_hub/inference/_mcp/agent.py
* Update docs/source/en/package_reference/mcp.md
* Update docs/source/en/package_reference/mcp.md
* Update docs/source/en/guides/inference.md
* Update docs/source/en/guides/inference.md
* Update docs/source/en/guides/inference.md
* nit

Co-authored-by: Julien Chaumond <[email protected]>

1 parent 417ad89 commit cadb7a9

File tree

- docs/source/en/_toctree.yml
- docs/source/en/guides/inference.md
- docs/source/en/package_reference/mcp.md
- src/huggingface_hub/__init__.py
- src/huggingface_hub/inference/_mcp/agent.py
- src/huggingface_hub/inference/_mcp/mcp_client.py

6 files changed: +145 −19 lines

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -66,6 +66,8 @@
     title: Inference Client
   - local: package_reference/inference_endpoints
     title: Inference Endpoints
+  - local: package_reference/mcp
+    title: MCP Client
   - local: package_reference/hf_file_system
     title: HfFileSystem
   - local: package_reference/utilities
```

docs/source/en/guides/inference.md

Lines changed: 63 additions & 0 deletions
@@ -443,6 +443,69 @@ strictly the same as the sync-only version.

For more information about the `asyncio` module, please refer to the [official documentation](https://docs.python.org/3/library/asyncio.html).

## MCP Client

The `huggingface_hub` library now includes an experimental [`MCPClient`], designed to empower Large Language Models (LLMs) with the ability to interact with external Tools via the [Model Context Protocol](https://modelcontextprotocol.io) (MCP). This client extends an [`AsyncInferenceClient`] to seamlessly integrate Tool usage.

The [`MCPClient`] connects to MCP servers (either local `stdio` scripts or remote `http`/`sse` services) that expose tools. It feeds these tools to an LLM (via [`AsyncInferenceClient`]). If the LLM decides to use a tool, [`MCPClient`] manages the execution request to the MCP server and relays the Tool's output back to the LLM, often streaming results in real time.

In the following example, we use the [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) model via the [Nebius](https://nebius.com/) inference provider. We then add a remote MCP server, in this case an SSE server that makes the Flux image generation tool available to the LLM.

```python
import asyncio
import os

from huggingface_hub import ChatCompletionInputMessage, ChatCompletionStreamOutput, MCPClient


async def main():
    async with MCPClient(
        provider="nebius",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
    ) as client:
        await client.add_mcp_server(type="sse", url="https://evalstate-flux1-schnell.hf.space/gradio_api/mcp/sse")

        messages = [
            {
                "role": "user",
                "content": "Generate a picture of a cat on the moon",
            }
        ]

        async for chunk in client.process_single_turn_with_tools(messages):
            # Log streamed text content
            if isinstance(chunk, ChatCompletionStreamOutput):
                delta = chunk.choices[0].delta
                if delta.content:
                    print(delta.content, end="")

            # Or tool calls
            elif isinstance(chunk, ChatCompletionInputMessage):
                print(
                    f"\nCalled tool '{chunk.name}'. Result: '{chunk.content if len(chunk.content) < 1000 else chunk.content[:1000] + '...'}'"
                )


if __name__ == "__main__":
    asyncio.run(main())
```

For even simpler development, we offer a higher-level [`Agent`] class. This 'Tiny Agent' simplifies creating conversational Agents by managing the chat loop and state: it is essentially a simple while loop built right on top of an [`MCPClient`]. You can run these Agents directly from the command line:

```bash
# Install the latest version of huggingface_hub with the mcp extra
pip install -U "huggingface_hub[mcp]"

# Run an agent that uses the Flux image generation tool
tiny-agents run julien-c/flux-schnell-generator
```

When launched, the Agent will load, list the Tools it has discovered from its connected MCP servers, and then it's ready for your prompts!
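You can also drive the [`Agent`] programmatically. Below is a minimal sketch based on the constructor arguments and `run()` signature documented in `agent.py` later in this commit; the explicit `load_tools()` call and the exact `servers` config keys are assumptions about the setup flow, not part of this diff:

```python
import asyncio
import os

from huggingface_hub import Agent


async def main():
    agent = Agent(
        provider="nebius",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
        # Each server is a dict with a `type` key and a `config` key,
        # per the `Agent` docstring in this commit.
        servers=[
            {
                "type": "sse",
                "config": {"url": "https://evalstate-flux1-schnell.hf.space/gradio_api/mcp/sse"},
            }
        ],
    )
    async with agent:
        # Assumption: tools are connected explicitly before the first turn.
        await agent.load_tools()
        async for chunk in agent.run("Generate a picture of a cat on the moon"):
            print(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```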
## Advanced tips

In the above section, we saw the main aspects of [`InferenceClient`]. Let's dive into some more advanced tips.
docs/source/en/package_reference/mcp.md

Lines changed: 17 additions & 0 deletions

# MCP Client

The `huggingface_hub` library now includes an [`MCPClient`], designed to empower Large Language Models (LLMs) with the ability to interact with external Tools via the [Model Context Protocol](https://modelcontextprotocol.io) (MCP). This client extends an [`AsyncInferenceClient`] to seamlessly integrate Tool usage.

The [`MCPClient`] connects to MCP servers (local `stdio` scripts or remote `http`/`sse` services) that expose tools. It feeds these tools to an LLM (via [`AsyncInferenceClient`]). If the LLM decides to use a tool, [`MCPClient`] manages the execution request to the MCP server and relays the Tool's output back to the LLM, often streaming results in real time.

We also provide a higher-level [`Agent`] class. This 'Tiny Agent' simplifies creating conversational Agents by managing the chat loop and state, acting as a wrapper around [`MCPClient`].

## MCP Client

[[autodoc]] MCPClient

## Agent

[[autodoc]] Agent

src/huggingface_hub/__init__.py

Lines changed: 5 additions & 0 deletions

```diff
@@ -443,6 +443,9 @@
         "ZeroShotObjectDetectionOutputElement",
         "ZeroShotObjectDetectionParameters",
     ],
+    "inference._mcp.agent": [
+        "Agent",
+    ],
     "inference._mcp.mcp_client": [
         "MCPClient",
     ],
@@ -525,6 +528,7 @@
 # ```

 __all__ = [
+    "Agent",
     "AsyncInferenceClient",
     "AudioClassificationInput",
     "AudioClassificationOutputElement",
@@ -1415,6 +1419,7 @@ def __dir__():
         ZeroShotObjectDetectionOutputElement,  # noqa: F401
         ZeroShotObjectDetectionParameters,  # noqa: F401
     )
+    from .inference._mcp.agent import Agent  # noqa: F401
     from .inference._mcp.mcp_client import MCPClient  # noqa: F401
     from .inference_api import InferenceApi  # noqa: F401
     from .keras_mixin import (
```

src/huggingface_hub/inference/_mcp/agent.py

Lines changed: 30 additions & 2 deletions

```diff
@@ -11,8 +11,27 @@

 class Agent(MCPClient):
     """
-    Python implementation of a Simple Agent
-    i.e. just a basic while loop on top of an Inference Client with MCP-powered tools
+    Implementation of a Simple Agent, which is a simple while loop built right on top of an [`MCPClient`].
+
+    <Tip warning={true}>
+
+    This class is experimental and might be subject to breaking changes in the future without prior notice.
+
+    </Tip>
+
+    Args:
+        model (`str`):
+            The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct`,
+            or a URL to a deployed Inference Endpoint or other local or remote endpoint.
+        servers (`Iterable[Dict]`):
+            MCP servers to connect to. Each server is a dictionary containing a `type` key and a `config` key. The `type` key can be
+            `"stdio"` or `"sse"`, and the `config` key is a dictionary of arguments for the server.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Defaults to `"auto"`, i.e. the first of the providers available for the model,
+            sorted by the user's order in https://hf.co/settings/inference-providers. If `model` is a URL or `base_url` is passed,
+            then `provider` is not used.
+        api_key (`str`, *optional*):
+            Token to use for authentication. Will default to the locally saved Hugging Face token if not provided. You can also use
+            your own provider API key to interact directly with the provider's service.
+        prompt (`str`, *optional*):
+            The system prompt to use for the agent. Defaults to the default system prompt in `constants.py`.
     """

     def __init__(
@@ -40,6 +59,15 @@ async def run(
         *,
         abort_event: Optional[asyncio.Event] = None,
     ) -> AsyncGenerator[Union[ChatCompletionStreamOutput, ChatCompletionInputMessage], None]:
+        """
+        Run the agent with the given user input.
+
+        Args:
+            user_input (`str`):
+                The user input to run the agent with.
+            abort_event (`asyncio.Event`, *optional*):
+                An event that can be used to abort the agent. If the event is set, the agent will stop running.
+        """
         self.messages.append({"role": "user", "content": user_input})

         num_turns: int = 0
```
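The `abort_event` parameter documented above enables cooperative cancellation of a running agent. A minimal sketch of how it might be used (the timeout wrapper and chunk handling below are illustrative, not part of this commit):

```python
import asyncio

from huggingface_hub import Agent


async def run_with_timeout(agent: Agent, prompt: str, timeout: float = 30.0):
    """Stop an in-flight `Agent.run()` once `timeout` seconds have elapsed."""
    abort_event = asyncio.Event()

    async def watchdog():
        await asyncio.sleep(timeout)
        abort_event.set()  # signals the agent loop to stop running

    watchdog_task = asyncio.create_task(watchdog())
    try:
        async for chunk in agent.run(prompt, abort_event=abort_event):
            print(chunk)  # ChatCompletionStreamOutput or ChatCompletionInputMessage
    finally:
        watchdog_task.cancel()
```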

src/huggingface_hub/inference/_mcp/mcp_client.py

Lines changed: 28 additions & 17 deletions

```diff
@@ -61,6 +61,16 @@ class MCPClient:
     This class is experimental and might be subject to breaking changes in the future without prior notice.

     </Tip>
+
+    Args:
+        model (`str`, *optional*):
+            The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct`,
+            or a URL to a deployed Inference Endpoint or other local or remote endpoint.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Defaults to `"auto"`, i.e. the first of the providers available for the model,
+            sorted by the user's order in https://hf.co/settings/inference-providers. If `model` is a URL or `base_url` is passed,
+            then `provider` is not used.
+        api_key (`str`, *optional*):
+            Token to use for authentication. Will default to the locally saved Hugging Face token if not provided. You can also use
+            your own provider API key to interact directly with the provider's service.
     """

     def __init__(
@@ -107,23 +117,24 @@ async def add_mcp_server(self, type: ServerType, **params: Any):
                 - "stdio": Standard input/output server (local)
                 - "sse": Server-sent events (SSE) server
                 - "http": StreamableHTTP server
-            **params: Server parameters that can be either:
-                - For stdio servers:
-                    - command (str): The command to run the MCP server
-                    - args (List[str], optional): Arguments for the command
-                    - env (Dict[str, str], optional): Environment variables for the command
-                    - cwd (Union[str, Path, None], optional): Working directory for the command
-                - For SSE servers:
-                    - url (str): The URL of the SSE server
-                    - headers (Dict[str, Any], optional): Headers for the SSE connection
-                    - timeout (float, optional): Connection timeout
-                    - sse_read_timeout (float, optional): SSE read timeout
-                - For StreamableHTTP servers:
-                    - url (str): The URL of the StreamableHTTP server
-                    - headers (Dict[str, Any], optional): Headers for the StreamableHTTP connection
-                    - timeout (timedelta, optional): Connection timeout
-                    - sse_read_timeout (timedelta, optional): SSE read timeout
-                    - terminate_on_close (bool, optional): Whether to terminate on close
+            **params (`Dict[str, Any]`):
+                Server parameters that can be either:
+                - For stdio servers:
+                    - command (str): The command to run the MCP server
+                    - args (List[str], optional): Arguments for the command
+                    - env (Dict[str, str], optional): Environment variables for the command
+                    - cwd (Union[str, Path, None], optional): Working directory for the command
+                - For SSE servers:
+                    - url (str): The URL of the SSE server
+                    - headers (Dict[str, Any], optional): Headers for the SSE connection
+                    - timeout (float, optional): Connection timeout
+                    - sse_read_timeout (float, optional): SSE read timeout
+                - For StreamableHTTP servers:
+                    - url (str): The URL of the StreamableHTTP server
+                    - headers (Dict[str, Any], optional): Headers for the StreamableHTTP connection
+                    - timeout (timedelta, optional): Connection timeout
+                    - sse_read_timeout (timedelta, optional): SSE read timeout
+                    - terminate_on_close (bool, optional): Whether to terminate on close
         """
         from mcp import ClientSession, StdioServerParameters
         from mcp import types as mcp_types
```
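Based on the parameters documented above, connecting a local `stdio` server might look like the following sketch (`my_mcp_server.py` is a hypothetical local MCP server script, and the provider/model choices simply mirror the guide example):

```python
import asyncio
import os

from huggingface_hub import MCPClient


async def main():
    async with MCPClient(
        provider="nebius",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
    ) as client:
        # A stdio server is spawned locally from a command, per the
        # `add_mcp_server` parameters documented in this commit.
        await client.add_mcp_server(
            type="stdio",
            command="python",
            args=["my_mcp_server.py"],
            env={"LOG_LEVEL": "info"},
            cwd=".",
        )


if __name__ == "__main__":
    asyncio.run(main())
```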
