diff --git a/examples/partners/mcp_powered_voice_agents/database.db b/examples/partners/mcp_powered_voice_agents/database.db
new file mode 100644
index 0000000000..372e89dd8e
Binary files /dev/null and b/examples/partners/mcp_powered_voice_agents/database.db differ
diff --git a/examples/partners/mcp_powered_voice_agents/mcp_powered_agents_cookbook.ipynb b/examples/partners/mcp_powered_voice_agents/mcp_powered_agents_cookbook.ipynb
new file mode 100644
index 0000000000..72aa39fc2a
--- /dev/null
+++ b/examples/partners/mcp_powered_voice_agents/mcp_powered_agents_cookbook.ipynb
@@ -0,0 +1,976 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "2LgIIWiQ_zS4"
+   },
+   "source": [
+    "# MCP-Powered Agentic Voice Framework\n",
+    "\n",
+    "### Agents\n",
+    "Agents are becoming the de facto framework for orchestrating multiple, often specialized, LLM applications to work with one another. Many practical applications require external tools to build complex workflows for LLM-based agents.\n",
+    "\n",
+    "Model Context Protocol (MCP) has quickly become the open standard for building agentic systems. The protocol provides easy integration of common tool services and interoperability between models across the AI ecosystem.\n",
+    "\n",
+    "### What is MCP?\n",
+    "Model Context Protocol (MCP) is an open protocol designed to standardize how AI models - especially large language models (LLMs) - interface with external tools, data sources, and context providers in a secure, modular, and composable way. MCP provides a unified framework for sending structured requests from an agent or application to a set of “tool services,” such as databases, APIs, or custom logic modules. By adopting MCP, developers can:\n",
+    "* Decouple agent logic from tool implementations: Agents can call out to tools (like a database or search service) using a standard protocol, rather than relying on hardcoded integrations.\n",
+    "* Enforce consistent security and governance: MCP defines authentication, authorization, and data boundary controls between the model and external resources.\n",
+    "* Support modular, reusable agent architectures: Tools can be swapped, updated, or extended without changing the agent code, making it easy to evolve complex workflows.\n",
+    "* Run tools locally or remotely: The same protocol works whether a tool is running in the customer’s environment or in the cloud, supporting privacy and data residency requirements.\n",
+    "\n",
+    "MCP acts as the “middleware” that bridges AI models and the external world, enabling secure, flexible, and maintainable integration of real-world context and capabilities into conversational or autonomous agents.\n",
+    "\n",
+    "### Agents in the enterprise\n",
+    "In today’s enterprise landscape, conversational agents - especially voice-powered ones - are quickly becoming a standard for customer support, internal helpdesks, and task automation. Yet building robust, scalable voice agents remains challenging due to fragmented tooling, integration complexity, and the need for reliable orchestration of backend systems. A common pattern across the enterprise landscape is to develop agents backed by knowledge bases (both structured and unstructured). These bots fall into two categories:\n",
+    " - copilots for internal use, and\n",
+    " - customer-facing assistants.\n",
+    "The latter use case, customer-facing assistants, tends to carry higher requirements for accuracy, usability, and design.
 Additionally, one common requirement for customer-facing chatbots is the need to add voice as a user-interface modality (e.g. for phone-call automation).\n",
+    "\n",
+    "These Q&A chatbots apply to a wide range of industries - healthcare, government, legal, and others - that require an easy way to put knowledge retrieval at a user's fingertips.\n",
+    "\n",
+    "One such industry is insurance, where we've seen tremendous value for the customers we work with in this space. Insurance policies are complex, and navigating the system can often be difficult for policy holders.\n",
+    "\n",
+    "### What's in this Cookbook?\n",
+    "In this cookbook, we provide an end-to-end modular recipe leveraging MCP for building voice-enabled agents using the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/). In particular, we show how to use it for dynamic context management and agentic tool calling, and we demonstrate the capabilities of such a system on the aforementioned insurance use case, illustrating how MCP can back the various tools you may want for your application. Specifically, we showcase custom MCP servers (for text retrieval and web search) as well as a predefined MCP server (for SQLite)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "pItv9wdaOJfL"
+   },
+   "source": [
+    "### End-to-end Flow\n",
+    "\n",
+    "This section outlines a straightforward setup for deploying microservices for tools within the MCP framework, specifically focusing on RAG, database lookup, and web search functionalities. The MCP servers are responsible not only for hosting these services but also for performing RAG indexing to support backend operations.\n",
+    "\n",
+    "We employ a \"chained\" approach for voice input and output throughout the system. During inference, the workflow begins by capturing a user's voice input, which is transcribed to text using a speech-to-text system. This transcribed text is then sent to the Planner agent, which determines which tools to invoke and makes requests to the appropriate microservices. After retrieving tool outputs, the Planner agent synthesizes a cohesive, contextually appropriate response. This textual response is subsequently converted to audio using a text-to-speech system, delivering the final voice response to the user.\n",
+    "\n",
+    "The end-to-end workflow is summarized in the diagram below:\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "fQYezWo2C5t0"
+   },
+   "source": [
+    "![Cookbook_image](./../../../images/partner_mcp_Cookbook.svg)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "1-WOUoRKNdZG"
+   },
+   "source": [
+    "### Installing dependencies\n",
+    "First, we install the library dependencies for the project.\n",
+    "\n",
+    "> Note: One specific dependency that may be needed on your machine is `ffmpeg`.
 If you are on a Mac, you will need to install it separately with `brew install ffmpeg`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "id": "0YKzEa44ODbP"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.1.1\u001b[0m\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
+      "Note: you may need to restart the kernel to use updated packages.\n",
+      "\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.1.1\u001b[0m\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
+      "Note: you may need to restart the kernel to use updated packages.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# install dependencies\n",
+    "%pip install asyncio ffmpeg ffprobe mcp openai openai-agents pydub scipy sounddevice uv --quiet\n",
+    "%pip install \"openai-agents[voice]\" --quiet"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "UOrxjtbL3T8X"
+   },
+   "source": [
+    "### Setup\n",
+    "\n",
+    "To execute this cookbook, you'll need the packages installed above, which provide access to OpenAI's API, the Agents SDK, MCP, and libraries for audio processing. Additionally, you can set your OpenAI API key for use by the agents via the `set_default_openai_key` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "id": "aMsySrYz1rIL"
+   },
+   "outputs": [],
+   "source": [
+    "import socket\n",
+    "import time\n",
+    "import warnings\n",
+    "from typing import List, Optional, AsyncGenerator\n",
+    "\n",
+    "from numpy.typing import NDArray\n",
+    "\n",
+    "\n",
+    "warnings.filterwarnings(\"ignore\", category=SyntaxWarning)\n",
+    "\n",
+    "\n",
+    "async def wait_for_server_ready(port: int = 8000, timeout: float = 10) -> None:\n",
+    "    \"\"\"Wait for the SSE server to be ready\"\"\"\n",
+    "    start = time.time()\n",
+    "    while time.time() - start < timeout:\n",
+    "        try:\n",
+    "            with socket.create_connection((\"localhost\", port), timeout=1):\n",
+    "                print(\"✅ SSE server TCP port is accepting connections.\")\n",
+    "                return\n",
+    "        except OSError as e:\n",
+    "            if time.time() - start > timeout - 1:  # Only print on last attempt\n",
+    "                print(f\"Waiting for server... ({e})\")\n",
+    "            time.sleep(0.5)\n",
+    "    raise RuntimeError(\"❌ SSE server did not become ready in time.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "KJMNqVCLDVNC"
+   },
+   "source": [
+    "### Defining Tool-use Agents through custom MCP services\n",
+    "\n",
+    "First, we define a custom MCP service that hosts the RAG and web search tools using the `FastMCP` interface. Specifically, we add `@mcp.tool` functions for:\n",
+    "\n",
+    "1. Retrieving information from a RAG service\n",
+    "2. Searching the broader internet for information using OpenAI's `web_search`\n",
+    "\n",
+    "\n",
+    "For the purposes of this cookbook, we'll run both tools under the same service.\n",
+    "\n",
+    "The code below is provided in `search_server.py` within the same directory.
 Run the code to start the server. As the server runs, your files will be indexed and stored in the vector store.\n",
+    "\n",
+    "You can run the `search_server.py` file with the following command:\n",
+    "\n",
+    "   ```bash\n",
+    "   uv run python search_server.py \n",
+    "   ```\n",
+    "\n",
+    "Once the server is running, you can view the uploaded files and the vector store at https://platform.openai.com/storage/files and https://platform.openai.com/storage/vector_stores, respectively, and continue with running the next cells in the notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```python\n",
+    "# search_server.py\n",
+    "import os\n",
+    "from mcp.server.fastmcp import FastMCP\n",
+    "from openai import OpenAI\n",
+    "from agents import set_tracing_export_api_key\n",
+    "\n",
+    "# Create server\n",
+    "mcp = FastMCP(\"Search Server\")\n",
+    "_vector_store_id = \"\"\n",
+    "\n",
+    "def _run_rag(query: str) -> str:\n",
+    "    \"\"\"Do a search for answers within the knowledge base and internal documents of the user.\n",
+    "    Args:\n",
+    "        query: The user query\n",
+    "    \"\"\"\n",
+    "    results = client.vector_stores.search(\n",
+    "        vector_store_id=_vector_store_id,\n",
+    "        query=query,\n",
+    "        rewrite_query=True,  # Query rewriting generally improves results\n",
+    "    )\n",
+    "    return results.data[0].content[0].text\n",
+    "\n",
+    "\n",
+    "def _summarize_rag_response(rag_output: str) -> str:\n",
+    "    \"\"\"Summarize the RAG response using gpt-4.1-mini\n",
+    "    Args:\n",
+    "        rag_output: The RAG response\n",
+    "    \"\"\"\n",
+    "    response = client.responses.create(\n",
+    "        model=\"gpt-4.1-mini\",\n",
+    "        tools=[{\"type\": \"web_search_preview\"}],\n",
+    "        input=\"Summarize the following text concisely: \\n\\n\" + rag_output,\n",
+    "    )\n",
+    "    return response.output_text\n",
+    "\n",
+    "\n",
+    "@mcp.tool()\n",
+    "def generate_rag_output(query: str) -> str:\n",
+    "    \"\"\"Generate a summarized RAG output for a given query.\n",
+    "    Args:\n",
+    "        query: The user query\n",
+    "    \"\"\"\n",
+    "    print(\"[debug-server] generate_rag_output: \", query)\n",
+    "    rag_output = _run_rag(query)\n",
+    "    return _summarize_rag_response(rag_output)\n",
+    "\n",
+    "\n",
+    "@mcp.tool()\n",
+    "def run_web_search(query: str) -> str:\n",
+    "    \"\"\"Run a web search for the given query.\n",
+    "    Args:\n",
+    "        query: The user query\n",
+    "    \"\"\"\n",
+    "    print(\"[debug-server] run_web_search:\", query)\n",
+    "    response = client.responses.create(\n",
+    "        model=\"gpt-4.1-mini\",\n",
+    "        tools=[{\"type\": \"web_search_preview\"}],\n",
+    "        input=query,\n",
+    "    )\n",
+    "    return response.output_text\n",
+    "\n",
+    "\n",
+    "def index_documents(directory: str):\n",
+    "    \"\"\"Index the documents in the given directory to the vector store\n",
+    "    Args:\n",
+    "        directory: The directory to index the documents from\n",
+    "    \"\"\"\n",
+    "    # OpenAI supported file extensions for retrieval (see docs)\n",
+    "    SUPPORTED_EXTENSIONS = {'.pdf', '.txt', '.md', '.docx', '.pptx', '.csv', '.rtf', '.html', '.json', '.xml'}\n",
+    "    # Collect all files in the specified directory\n",
+    "    files = [os.path.join(directory, f) for f in os.listdir(directory)]\n",
+    "    # Filter files for supported extensions only\n",
+    "    supported_files = []\n",
+    "    for file_path in files:\n",
+    "        _, ext = os.path.splitext(file_path)\n",
+    "        if ext.lower() in SUPPORTED_EXTENSIONS:\n",
+    "            supported_files.append(file_path)\n",
+    "        else:\n",
+    "            print(f\"[warning] Skipping unsupported file for retrieval: {file_path}\")\n",
+    "\n",
+    "    vector_store = client.vector_stores.create(  # Create 
vector store\n",
+    "        name=\"Support FAQ\",\n",
+    "    )\n",
+    "    global _vector_store_id\n",
+    "    _vector_store_id = vector_store.id\n",
+    "\n",
+    "    for file_path in supported_files:\n",
+    "        # Upload each file to the vector store, ensuring the file handle is closed\n",
+    "        with open(file_path, \"rb\") as fp:\n",
+    "            client.vector_stores.files.upload_and_poll(\n",
+    "                vector_store_id=vector_store.id,\n",
+    "                file=fp\n",
+    "            )\n",
+    "        print(f\"[debug-server] uploading file: {file_path}\")\n",
+    "\n",
+    "\n",
+    "if __name__ == \"__main__\":\n",
+    "    oai_api_key = os.environ.get(\"OPENAI_API_KEY\")\n",
+    "    if not oai_api_key:\n",
+    "        raise ValueError(\"OPENAI_API_KEY environment variable is not set\")\n",
+    "    set_tracing_export_api_key(oai_api_key)\n",
+    "    client = OpenAI(api_key=oai_api_key)\n",
+    "\n",
+    "    current_dir = os.path.dirname(os.path.abspath(__file__))\n",
+    "    samples_dir = os.path.join(current_dir, \"sample_files\")\n",
+    "    index_documents(samples_dir)\n",
+    "\n",
+    "    mcp.run(transport=\"sse\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "uWYXigXVGg-w"
+   },
+   "source": [
+    "As seen above, we also include the RAG indexing as part of this workflow. In real-world applications, this will not be necessary on every run, and if you have a large corpus of data, you may want to move it to a separate process.\n",
+    "\n",
+    "In addition to simple RAG retrieval, we add an extra step to summarize the RAG output. This step is not always necessary, though we've found it yields more succinct responses for the planner. Whether to include it depends on your system and your latency requirements.\n",
+    "\n",
+    "\n",
+    "### Using Pre-defined MCP Servers\n",
+    "\n",
+    "While implementing custom MCP servers is relatively straightforward, the power of MCP lies in the ability to use pre-defined servers that others have built and maintain. Using existing implementations enables more rapid development, provides a consistent interface across tools, and makes data integration more seamless.\n",
+    "\n",
+    "For our database lookup tool, we use the prebuilt [SQLite server](https://github.com/modelcontextprotocol/servers-archived/tree/main/src/sqlite) implementation. As you will see below, we can wire this in with a single command-line invocation, pointing it at a `*.db` file containing the data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "HJKqscj87Jg5"
+   },
+   "source": [
+    "### Defining the Planner Agent\n",
+    "\n",
+    "Next, we define how the agent will generate meaningful responses. The planner agent is a key component within MCP’s agent orchestration pipeline. Its primary function is to decompose user requests into actionable steps and decide which tools, APIs, or agents should be called at each stage. Given the input as text, the planner parses and analyzes the request, maintaining context across multiple turns. Based on the conversation state, it invokes MCP tool services by dispatching tool calls via the MCP server’s orchestration layer. The agent then collects intermediate results, synthesizes responses, and guides the conversation toward resolution.\n",
+    "\n",
+    "A key design consideration is the model selection for the planner. While larger models like `gpt-4.1` offer superior reasoning, low end-to-end latency is critical in voice-driven applications. For this reason, we select the `gpt-4.1-mini` model, which achieves a strong balance between reasoning ability and response speed.\n",
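+    "\n",
+    "Before wiring the planner into the voice pipeline, it can help to smoke-test its tool selection in plain text. Below is a minimal sketch (assuming the MCP servers from the later cells are connected and passed to `create_insurance_agents`, defined in the next cell; `Runner` is the Agents SDK's text-mode entry point):\n",
+    "\n",
+    "```python\n",
+    "from agents import Runner\n",
+    "\n",
+    "async def smoke_test(agent):\n",
+    "    # Run one query through the planner and print the synthesized answer;\n",
+    "    # no STT/TTS is involved, so tool-routing issues surface quickly.\n",
+    "    result = await Runner.run(agent, \"What is the deductible on the Platinum plan?\")\n",
+    "    print(result.final_output)\n",
+    "```"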
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "u1kIMV2AAaAW"
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "[non-fatal] Tracing client error 400: {\n",
+      "  \"error\": {\n",
+      "    \"message\": \"Invalid type for 'data[2].span_data.result': expected an array of strings, but got null instead.\",\n",
+      "    \"type\": \"invalid_request_error\",\n",
+      "    \"param\": \"data[2].span_data.result\",\n",
+      "    \"code\": \"invalid_type\"\n",
+      "  }\n",
+      "}\n"
+     ]
+    }
+   ],
+   "source": [
+    "from agents import Agent, trace\n",
+    "from agents.mcp import MCPServer, MCPServerSse, MCPServerStdio\n",
+    "from agents.extensions.handoff_prompt import prompt_with_handoff_instructions\n",
+    "\n",
+    "voice_system_prompt = \"\"\"[Voice Output Guidelines]\n",
+    "Your responses will be delivered via voice, so please:\n",
+    "1. Use conversational, natural language that sounds good when spoken\n",
+    "2. Keep responses concise - ideally 1-2 sentences per point\n",
+    "3. Avoid technical jargon unless necessary, and explain terms simply\n",
+    "4. Pause naturally between topics using brief sentences\n",
+    "5. Be warm and personable in tone\n",
+    "\"\"\"\n",
+    "\n",
+    "\n",
+    "async def create_insurance_agents(mcp_servers: list[MCPServer]) -> Agent:\n",
+    "    \"\"\"Create the insurance agent workflow with voice optimization\"\"\"\n",
+    "\n",
+    "    # Main insurance agent with MCP tools\n",
+    "    insurance_agent = Agent(\n",
+    "        name=\"InsuranceAssistant\",\n",
+    "        instructions=voice_system_prompt + prompt_with_handoff_instructions(\"\"\"\n",
+    "        # Identity\n",
+    "        You are a helpful chatbot that answers questions about our insurance plans.\n",
+    "        # Task\n",
+    "        Use the tools provided to answer the questions.\n",
+    "        # Instructions\n",
+    "        * Information about plans and policies is best answered with the sqlite or rag_output tools.\n",
+    "        * web_search should be used for answering generic health questions that are not directly related to our insurance plans.\n",
+    "        * Evaluate the quality of the answer after the tool call.\n",
+    "        * Assess whether you are confident in the answer generated.\n",
+    "        * If your confidence is low, try using another tool.\n",
+    "        \"\"\"),\n",
+    "        mcp_servers=mcp_servers,\n",
+    "        model=\"gpt-4.1-mini\",\n",
+    "    )\n",
+    "\n",
+    "    return insurance_agent"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the agent definition, we clearly specify when each tool should be used. This ensures better control over responses and improves answer relevance. We also provide the Voice Agent with guidelines to set the desired tone and level of precision in its replies."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "afUsni7W7L2M"
+   },
+   "source": [
+    "### Defining configurations for voice\n",
+    "\n",
+    "Next, we define the configurations for our voice module, both for speech-to-text (STT) and text-to-speech (TTS). We use the OpenAI Agents SDK's voice support to handle both voice input and output. By default, this pipeline uses `gpt-4o-transcribe` and `gpt-4o-mini-tts` for STT and TTS, respectively.\n",
+    "\n",
+    "For more content on defining voice assistants, see [this Cookbook](https://cookbook.openai.com/examples/agents_sdk/app_assistant_voice_agents).\n",
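+    "\n",
+    "If you need to tune transcription itself, a sketch of overriding the STT defaults is shown below; this assumes your SDK version exposes `STTModelSettings` and the `stt_settings` field on `VoicePipelineConfig`, so check the SDK reference before relying on it:\n",
+    "\n",
+    "```python\n",
+    "from agents.voice import STTModelSettings, VoicePipelineConfig\n",
+    "\n",
+    "config = VoicePipelineConfig(\n",
+    "    # Bias the transcriber toward English and our domain vocabulary\n",
+    "    stt_settings=STTModelSettings(language=\"en\", prompt=\"Insurance policy Q&A\"),\n",
+    ")\n",
+    "```"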
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "id": "D4J2SEKq5_WB"
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import sounddevice as sd\n",
+    "\n",
+    "\n",
+    "from agents.voice import (\n",
+    "    AudioInput,\n",
+    "    SingleAgentVoiceWorkflow,\n",
+    "    VoicePipeline,\n",
+    "    VoicePipelineConfig,\n",
+    "    TTSModelSettings\n",
+    ")\n",
+    "\n",
+    "AudioBuffer = List[NDArray[np.int16]]\n",
+    "\n",
+    "AUDIO_CONFIG = {\n",
+    "    \"samplerate\": 24000,\n",
+    "    \"channels\": 1,\n",
+    "    \"dtype\": \"int16\",\n",
+    "    \"blocksize\": 2400,\n",
+    "    \"silence_threshold\": 500,\n",
+    "    \"silence_duration\": 1.5,\n",
+    "    \"min_speech_duration\": 0.5,\n",
+    "}\n",
+    "\n",
+    "# Note: adjacent string literals are concatenated, so each line below ends\n",
+    "# with a separator to keep the instructions readable for the model.\n",
+    "insurance_tts_settings = TTSModelSettings(\n",
+    "    instructions=(\n",
+    "        \"Personality: Professional, knowledgeable, and helpful insurance advisor. \"\n",
+    "        \"Tone: Friendly, clear, and reassuring, making customers feel confident about their insurance choices. \"\n",
+    "        \"Pronunciation: Clear and articulate, ensuring insurance terms are easily understood. \"\n",
+    "        \"Tempo: Moderate pace with natural pauses, especially when explaining complex insurance concepts. \"\n",
+    "        \"Emotion: Warm and supportive, conveying trust and expertise in insurance matters\"\n",
+    "    )\n",
+    ")\n",
+    "\n",
+    "class AudioStreamManager:\n",
+    "    \"\"\"Context manager for handling audio streams\"\"\"\n",
+    "    def __init__(self, input_stream: sd.InputStream, output_stream: sd.OutputStream):\n",
+    "        self.input_stream = input_stream\n",
+    "        self.output_stream = output_stream\n",
+    "\n",
+    "    async def __aenter__(self):\n",
+    "        try:\n",
+    "            self.input_stream.start()\n",
+    "            self.output_stream.start()\n",
+    "            return self\n",
+    "        except sd.PortAudioError as e:\n",
+    "            raise RuntimeError(f\"Failed to start audio streams: {e}\")\n",
+    "\n",
+    "    async def __aexit__(self, exc_type, exc_val, exc_tb):\n",
+    "        try:\n",
+    "            if self.input_stream:\n",
+    "                self.input_stream.stop()\n",
+    "                self.input_stream.close()\n",
+    "            if self.output_stream:\n",
+    "                self.output_stream.stop()\n",
+    "                self.output_stream.close()\n",
+    "        except Exception as e:\n",
+    "            print(f\"Warning: Error during audio stream cleanup: {e}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In enterprise scenarios, the tone and style of audio responses are critical to system usability. Speech output should consistently reflect professionalism and align with the company's brand identity. For most applications, this means generating a realistic voice that mirrors the courteous, approachable demeanor typical of call-center representatives. With TTS, we can leverage prompt engineering to guide the model toward producing audio that better matches specific customer use cases and brand values."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "1ZVYg3SMENdj"
+   },
+   "source": [
+    "### Processing Voice I/O\n",
+    "\n",
+    "After configuring the voice settings, the next step is to implement functions for processing incoming audio and generating spoken responses. Pay particular attention to the `silence_threshold` parameter in your configuration - it plays a crucial role in accurately detecting when a user has finished speaking and helps with speech endpoint detection.\n",
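+    "\n",
+    "A fixed threshold of 500 will not suit every microphone or room. One practical option is to calibrate it against ambient noise at startup; the sketch below is a hypothetical helper built on the `AUDIO_CONFIG` above, with a multiplier you would tune for your environment:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import sounddevice as sd\n",
+    "\n",
+    "def calibrate_silence_threshold(seconds: float = 1.0, multiplier: float = 4.0) -> int:\n",
+    "    # Record a short sample of ambient room noise\n",
+    "    ambient = sd.rec(int(seconds * AUDIO_CONFIG[\"samplerate\"]),\n",
+    "                     samplerate=AUDIO_CONFIG[\"samplerate\"],\n",
+    "                     channels=AUDIO_CONFIG[\"channels\"], dtype=\"int16\")\n",
+    "    sd.wait()  # block until the recording completes\n",
+    "    # Set the threshold a safe margin above the ambient noise floor\n",
+    "    return int(np.abs(ambient).mean() * multiplier)\n",
+    "```"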
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "id": "rA-OY3HuENEi"
+   },
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "\n",
+    "async def continuous_voice_conversation(agent: Agent):\n",
+    "    \"\"\"Run a continuous voice conversation with automatic speech detection\"\"\"\n",
+    "\n",
+    "    voice_config = VoicePipelineConfig(\n",
+    "        tts_settings=insurance_tts_settings,\n",
+    "    )\n",
+    "\n",
+    "    pipeline = VoicePipeline(\n",
+    "        workflow=SingleAgentVoiceWorkflow(agent),\n",
+    "        config=voice_config\n",
+    "    )\n",
+    "\n",
+    "    audio_queue: asyncio.Queue[NDArray[np.int16]] = asyncio.Queue()\n",
+    "    is_agent_speaking = False\n",
+    "\n",
+    "    def audio_callback(indata: NDArray[np.int16], frames: int, time_info: dict, status: sd.CallbackFlags) -> None:\n",
+    "        \"\"\"Callback for continuous audio input\"\"\"\n",
+    "        if status:\n",
+    "            print(f\"Audio input status: {status}\")\n",
+    "        if not is_agent_speaking:  # Only record when agent isn't speaking\n",
+    "            audio_queue.put_nowait(indata.copy())\n",
+    "\n",
+    "    input_stream = sd.InputStream(\n",
+    "        samplerate=AUDIO_CONFIG[\"samplerate\"],\n",
+    "        channels=AUDIO_CONFIG[\"channels\"],\n",
+    "        dtype=AUDIO_CONFIG[\"dtype\"],\n",
+    "        callback=audio_callback,\n",
+    "        blocksize=AUDIO_CONFIG[\"blocksize\"]\n",
+    "    )\n",
+    "\n",
+    "    output_stream = sd.OutputStream(\n",
+    "        samplerate=AUDIO_CONFIG[\"samplerate\"],\n",
+    "        channels=AUDIO_CONFIG[\"channels\"],\n",
+    "        dtype=AUDIO_CONFIG[\"dtype\"]\n",
+    "    )\n",
+    "\n",
+    "    print(\"🎙️ Insurance Voice Assistant Ready!\")\n",
+    "    print(\"Start speaking at any time. Say 'goodbye' to exit.\")\n",
+    "    print(\"-\" * 50)\n",
+    "\n",
+    "    async with AudioStreamManager(input_stream, output_stream):\n",
+    "        silence_threshold = AUDIO_CONFIG[\"silence_threshold\"]\n",
+    "        silence_duration = 0\n",
+    "        max_silence = AUDIO_CONFIG[\"silence_duration\"]\n",
+    "        audio_buffer: AudioBuffer = []\n",
+    "\n",
+    "        while True:\n",
+    "            try:\n",
+    "                chunk = await asyncio.wait_for(audio_queue.get(), timeout=0.1)\n",
+    "\n",
+    "                if np.abs(chunk).mean() > silence_threshold:\n",
+    "                    audio_buffer.append(chunk)\n",
+    "                    silence_duration = 0\n",
+    "                elif audio_buffer:\n",
+    "                    silence_duration += 0.1\n",
+    "                    audio_buffer.append(chunk)\n",
+    "\n",
+    "                    if silence_duration >= max_silence:\n",
+    "                        try:\n",
+    "                            full_audio = np.concatenate(audio_buffer, axis=0)\n",
+    "\n",
+    "                            if len(full_audio) > AUDIO_CONFIG[\"samplerate\"] * AUDIO_CONFIG[\"min_speech_duration\"]:\n",
+    "                                print(\"\\n🤔 Processing speech...\")\n",
+    "\n",
+    "                                is_agent_speaking = True\n",
+    "\n",
+    "                                audio_input = AudioInput(buffer=full_audio)\n",
+    "\n",
+    "                                with trace(\"Insurance Voice Query\"):\n",
+    "                                    result = await pipeline.run(audio_input)\n",
+    "\n",
+    "                                print(\"💬 Assistant responding...\")\n",
+    "                                async for event in result.stream():\n",
+    "                                    if event.type == \"voice_stream_event_audio\":\n",
+    "                                        output_stream.write(event.data)\n",
+    "                                    elif event.type == \"voice_stream_event_transcript\":\n",
+    "                                        print(f\"  > {event.text}\", end=\"\", flush=True)\n",
+    "\n",
+    "                                print(\"\\n\")\n",
+    "\n",
+    "                        except Exception as e:\n",
+    "                            print(f\"\\n❌ Error processing speech: {e}\")\n",
+    "                        finally:\n",
+    "                            is_agent_speaking = False\n",
+    "                            audio_buffer = []\n",
+    "                            silence_duration = 0\n",
+    "\n",
+    "            except asyncio.TimeoutError:\n",
+    "                continue\n",
+    "            except KeyboardInterrupt:\n",
+    "                print(\"\\n\\n👋 Goodbye!\")\n",
+    "                break\n",
+    "            except Exception as e:\n",
+    "                print(f\"\\n❌ Unexpected error: {e}\")\n",
+    "                if isinstance(e, 
(sd.PortAudioError, RuntimeError)):\n",
+    "                    raise\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setting up the server process\n",
+    "\n",
+    "Next, we add a simple convenience context manager for bringing up the tool server locally: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import shutil\n",
+    "import subprocess\n",
+    "import nest_asyncio\n",
+    "\n",
+    "\n",
+    "class ServerProcess:\n",
+    "    \"\"\"Context manager for handling the SSE server process\"\"\"\n",
+    "    def __init__(self, server_file: str):\n",
+    "        self.server_file = server_file\n",
+    "        self.process: Optional[subprocess.Popen] = None\n",
+    "\n",
+    "    async def __aenter__(self):\n",
+    "        if not shutil.which(\"uv\"):\n",
+    "            raise RuntimeError(\n",
+    "                \"uv is not installed. Please install it: https://docs.astral.sh/uv/getting-started/installation/\"\n",
+    "            )\n",
+    "\n",
+    "        print(\"Starting SSE server at http://localhost:8000/sse ...\")\n",
+    "        self.process = subprocess.Popen([\"uv\", \"run\", self.server_file])\n",
+    "        try:\n",
+    "            await wait_for_server_ready()\n",
+    "            nest_asyncio.apply()\n",
+    "            print(\"SSE server started. Starting voice assistant...\\n\")\n",
+    "            return self\n",
+    "        except Exception as e:\n",
+    "            if self.process:\n",
+    "                self.process.terminate()\n",
+    "            raise RuntimeError(f\"Failed to start SSE server: {e}\")\n",
+    "\n",
+    "    async def __aexit__(self, exc_type, exc_val, exc_tb):\n",
+    "        if self.process:\n",
+    "            try:\n",
+    "                self.process.terminate()\n",
+    "                self.process.wait(timeout=5)\n",
+    "                if self.process.poll() is None:\n",
+    "                    self.process.kill()\n",
+    "            except Exception as e:\n",
+    "                print(f\"Warning: Error during server shutdown: {e}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "--UKG_qD6aRM"
+   },
+   "source": [
+    "### Specifying the MCP tool services\n",
+    "\n",
+    "In our `main` function, we can bring up the various tool-use services we're interested in.\n",
+    "\n",
+    "For our custom server (for RAG and web search), we use `MCPServerSse` to connect to the server (in this case running locally). To bring up the standard MCP SQLite service, we call `MCPServerStdio` with a few simple arguments - in this case, the path to the local `database.db` file.\n",
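+    "\n",
+    "When debugging, it can also help to confirm what each connected server actually exposes before handing it to the agent. A small hypothetical helper you could call inside `main` (assuming your SDK version's `MCPServer.list_tools()` can be called directly):\n",
+    "\n",
+    "```python\n",
+    "async def print_tools(server: MCPServer) -> None:\n",
+    "    # List every tool the connected MCP server advertises\n",
+    "    for tool in await server.list_tools():\n",
+    "        print(f\"{server.name}: {tool.name}\")\n",
+    "```"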
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "id": "6zsMnqsw6bko"
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "async def main():\n",
+    "    \"\"\"Main function to run the voice assistant\"\"\"\n",
+    "    this_dir = os.getcwd()  # in a script, use os.path.dirname(os.path.abspath(__file__))\n",
+    "    server_file = os.path.join(this_dir, \"search_server.py\")\n",
+    "\n",
+    "    async with ServerProcess(server_file):\n",
+    "        # Initialize MCP servers\n",
+    "        async with MCPServerSse(\n",
+    "            name=\"SSE Python Server\",\n",
+    "            params={\n",
+    "                \"url\": \"http://localhost:8000/sse\",\n",
+    "                \"timeout\": 15.0,\n",
+    "            },\n",
+    "            client_session_timeout_seconds=15.0,\n",
+    "        ) as search_server:\n",
+    "            async with MCPServerStdio(\n",
+    "                cache_tools_list=True,\n",
+    "                params={\"command\": \"uvx\", \"args\": [\"mcp-server-sqlite\", \"--db-path\", \"./database.db\"]},\n",
+    "            ) as sql_server:\n",
+    "                # Create insurance agent with MCP tools\n",
+    "                agent = await create_insurance_agents([search_server, sql_server])\n",
+    "\n",
+    "                # Run the voice assistant\n",
+    "                try:\n",
+    "                    await continuous_voice_conversation(agent)\n",
+    "                except Exception as e:\n",
+    "                    print(f\"\\nError in voice conversation: {e}\")\n",
+    "                    raise\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Summarizing the flow\n",
+    "\n",
+    "Now that we have the various pieces in place, we can take a step back and visualize the overall workflow of our system:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Cookbook_image](./../../../images/System_flow_partner_mcp.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "iNbJ3n2qB-vT"
+   },
+   "source": [
+    "### Tying it all together\n",
+    "Finally, we can instantiate the custom tool-use server and bring up the service:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "\n",
+    "try:\n",
+    "    # If an event loop is already running (as in Jupyter), schedule main() on it\n",
+    "    asyncio.get_running_loop().create_task(main())\n",
+    "except RuntimeError:\n",
+    "    # Otherwise, use nest_asyncio and run main as a task\n",
+    "    import nest_asyncio\n",
+    "    nest_asyncio.apply()\n",
+    "    task = asyncio.create_task(main())\n",
+    "    try:\n",
+    "        await task\n",
+    "    except KeyboardInterrupt:\n",
+    "        print(\"\\nShutting down gracefully...\")\n",
+    "    except Exception as e:\n",
+    "        print(f\"\\nFatal error: {e}\")\n",
+    "        raise"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "nZHzDV8Y9JwB"
+   },
+   "source": [
+    "## Example outputs\n",
+    "\n",
+    "Now that we have built the system end-to-end, we can use it to answer questions. Here, we use our system to answer a few common insurance questions based on the policy information docs.
 Below are some sample voice outputs from our agents, responding to some common user questions:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**How are prescription drugs covered under this plan?** (uses retrieval)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       "        \n",
+       "      "
+      ],
+      "text/plain": [
+       ""
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "from IPython.display import display, Audio\n",
+    "import os\n",
+    "\n",
+    "# Get the absolute path to the audio file\n",
+    "audio_path = os.path.join(os.getcwd(), \"sample_output\", \"rag.mp3\")\n",
+    "\n",
+    "# Check if the file exists before trying to play it\n",
+    "if os.path.exists(audio_path):\n",
+    "    display(Audio(audio_path))\n",
+    "else:\n",
+    "    print(f\"Audio file not found at: {audio_path}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Which policies have a monthly premium of less than $300?** (uses DB lookup with SQL)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       "        \n",
+       "      "
+      ],
+      "text/plain": [
+       ""
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "display(Audio(\"sample_output/sqlite.mp3\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**What are effective treatments for diabetes?** (uses Web Search)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       "        \n",
+       "      "
+      ],
+      "text/plain": [
+       ""
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "display(Audio(\"sample_output/web_search.mp3\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Examining Traces\n",
+    "Model and tool calls made by our application are added to the [Traces](https://platform.openai.com/traces) dashboard out of the box. These traces provide meaningful insight into what users experience as they use our agents. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Cookbook_image](./../../../images/trace-sk1_partner.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Beyond agent performance, one critical aspect of building voice agents is the latency of responses. With the Traces dashboard, we are able to view the breakdown of wall time for each step, to help debug and find areas of improvement for latency: "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Cookbook_image](./../../../images/Traces-2_partner.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Explore individual traces to see each function call and its output, as shown below.\n",
+    "\n",
+    "![image](../../../images/traces_partner_granular.png)\n",
+    "\n",
+    "Traces offer granular visibility into function calls and their execution times, making it easy to identify sources of latency (for example, the web search tool above). Analyzing response time variability for each tool invocation helps you pinpoint bottlenecks and opportunities for optimization in production systems.\n",
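+    "\n",
+    "For multi-turn conversations, it can also be useful to group all of a session's traces together. A sketch (assuming the `group_id` parameter on the SDK's `trace()` helper; generate one id per user session):\n",
+    "\n",
+    "```python\n",
+    "import uuid\n",
+    "\n",
+    "session_id = uuid.uuid4().hex  # one id per conversation\n",
+    "with trace(\"Insurance Voice Query\", group_id=session_id):\n",
+    "    result = await pipeline.run(audio_input)\n",
+    "```"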
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "xzpded4L8ecZ"
+   },
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "This cookbook has guided you through building a complete agent solution that harnesses the flexibility and strength of MCP. By integrating the Agents SDK's voice capabilities, we illustrated how to develop a consumer-ready product powered by these technologies. We've shown how OpenAI’s tools and the Agents SDK can be effectively combined with MCP to deliver impactful applications.\n",
+    "\n",
+    "We hope this guide has offered both practical instruction and inspiration, helping you create your own MCP-powered voice agents tailored to your specific needs."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Contributors\n",
+    "\n",
+    "This cookbook is a joint collaboration between OpenAI and [Brain Co](https://www.braincompany.ai/en/).\n",
+    "\n",
+    "- [Cece Z](https://www.linkedin.com/in/cecez/)\n",
+    "- [Sibon Li](https://www.linkedin.com/in/sibon-li-9a9bba34/)\n",
+    "- [Shikhar Kwatra](https://www.linkedin.com/in/shikharkwatra/)"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/examples/partners/mcp_powered_voice_agents/sample_files/Evergreen_Health_Platinum_Plan.pdf b/examples/partners/mcp_powered_voice_agents/sample_files/Evergreen_Health_Platinum_Plan.pdf
new file mode 100644
index 0000000000..4e4e4be74f
Binary files /dev/null and b/examples/partners/mcp_powered_voice_agents/sample_files/Evergreen_Health_Platinum_Plan.pdf differ
diff --git a/examples/partners/mcp_powered_voice_agents/sample_output/rag.mp3 b/examples/partners/mcp_powered_voice_agents/sample_output/rag.mp3
new file mode 100644
index 0000000000..434eeb7071
Binary files /dev/null and b/examples/partners/mcp_powered_voice_agents/sample_output/rag.mp3 differ
diff --git a/examples/partners/mcp_powered_voice_agents/sample_output/sqlite.mp3 b/examples/partners/mcp_powered_voice_agents/sample_output/sqlite.mp3
new file mode 100644
index 0000000000..cef9a00c39
Binary files /dev/null and b/examples/partners/mcp_powered_voice_agents/sample_output/sqlite.mp3 differ
diff --git a/examples/partners/mcp_powered_voice_agents/sample_output/web_search.mp3 b/examples/partners/mcp_powered_voice_agents/sample_output/web_search.mp3
new file mode 100644
index 0000000000..3564d92bae
Binary files /dev/null and b/examples/partners/mcp_powered_voice_agents/sample_output/web_search.mp3 differ
diff --git a/examples/partners/mcp_powered_voice_agents/search_server.py b/examples/partners/mcp_powered_voice_agents/search_server.py
new file mode 100755
index 0000000000..7f599855eb
--- /dev/null
+++ b/examples/partners/mcp_powered_voice_agents/search_server.py
@@ -0,0 +1,107 @@
+import os
+from mcp.server.fastmcp import FastMCP
+from openai import OpenAI
+from agents import set_tracing_export_api_key
+
+# Create server
+mcp = FastMCP("Search Server")
+_vector_store_id = ""
+
+def _run_rag(query: str) -> str:
+    """Do a search for answers within the knowledge base and internal documents of the user.
+    Args:
+        query: The user query
+    """
+    results = client.vector_stores.search(
+        vector_store_id=_vector_store_id,
+        query=query,
+        rewrite_query=True,  # Query rewriting generally improves results
+    )
+    return results.data[0].content[0].text
+
+
+def _summarize_rag_response(rag_output: str) -> str:
+    """Summarize the RAG response using gpt-4.1-mini
+    Args:
+        rag_output: The RAG response
+    """
+    response = client.responses.create(
+        model="gpt-4.1-mini",
+        tools=[{"type": "web_search_preview"}],
+        input="Summarize the following text concisely: \n\n" + rag_output,
+    )
+    return response.output_text
+
+
+@mcp.tool()
+def generate_rag_output(query: str) -> str:
+    """Generate a summarized RAG output for a given query.
+    Args:
+        query: The user query
+    """
+    print("[debug-server] generate_rag_output: ", query)
+    rag_output = _run_rag(query)
+    return _summarize_rag_response(rag_output)
+
+
+@mcp.tool()
+def run_web_search(query: str) -> str:
+    """Run a web search for the given query.
+    Args:
+        query: The user query
+    """
+    print("[debug-server] run_web_search:", query)
+    response = client.responses.create(
+        model="gpt-4.1-mini",
+        tools=[{"type": "web_search_preview"}],
+        input=query,
+    )
+    return response.output_text
+
+
+def index_documents(directory: str):
+    """Index the documents in the given directory to the vector store
+    Args:
+        directory: The directory to index the documents from
+    """
+    # OpenAI supported file extensions for retrieval (see docs)
+    SUPPORTED_EXTENSIONS = {'.pdf', '.txt', '.md', '.docx', '.pptx', '.csv', '.rtf', '.html', '.json', '.xml'}
+    # Collect all files in the specified directory
+    files = [os.path.join(directory, f) for f in os.listdir(directory)]
+    # Filter files for supported extensions only
+    supported_files = []
+    for file_path in files:
+        _, ext = os.path.splitext(file_path)
+        if ext.lower() in SUPPORTED_EXTENSIONS:
+            supported_files.append(file_path)
+        else:
+            print(f"[warning] Skipping unsupported file for retrieval: {file_path}")
+
+    vector_store = client.vector_stores.create(  # Create vector store
+        name="Support FAQ",
+    )
+    global _vector_store_id
+    _vector_store_id = vector_store.id
+
+    for file_path in supported_files:
+        # Upload each file to the vector store, ensuring the file handle is closed
+        with open(file_path, "rb") as fp:
+            client.vector_stores.files.upload_and_poll(
+                vector_store_id=vector_store.id,
+                file=fp
+            )
+        print(f"[debug-server] uploading file: {file_path}")
+
+
+if __name__ == "__main__":
+    oai_api_key = os.environ.get("OPENAI_API_KEY")
+    if not oai_api_key:
+        raise ValueError("OPENAI_API_KEY environment variable is not set")
+    set_tracing_export_api_key(oai_api_key)
+    client = OpenAI(api_key=oai_api_key)
+
+    current_dir = os.path.dirname(os.path.abspath(__file__))
+    samples_dir = os.path.join(current_dir, "sample_files")
+    index_documents(samples_dir)
+
+    mcp.run(transport="sse")
diff --git a/images/System_flow_partner_mcp.png b/images/System_flow_partner_mcp.png
new file mode 100644
index 0000000000..55ec0465c2
Binary files /dev/null and b/images/System_flow_partner_mcp.png differ
diff --git a/images/Traces-1_partner.png b/images/Traces-1_partner.png
new file mode 100644
index 0000000000..0466e085e8
Binary files /dev/null and b/images/Traces-1_partner.png differ
diff --git a/images/Traces-2_partner.png b/images/Traces-2_partner.png
new file mode 100644
index 0000000000..1a9f34c61e
Binary files /dev/null and b/images/Traces-2_partner.png differ
diff --git a/images/partner_mcp_Cookbook.svg b/images/partner_mcp_Cookbook.svg
new file mode
 100644
index 0000000000..79ea75ed0a
--- /dev/null
+++ b/images/partner_mcp_Cookbook.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/images/trace-sk1_partner.png b/images/trace-sk1_partner.png
new file mode 100644
index 0000000000..d7269873c6
Binary files /dev/null and b/images/trace-sk1_partner.png differ
diff --git a/images/traces_partner_granular.png b/images/traces_partner_granular.png
new file mode 100644
index 0000000000..d2ebbeb8ae
Binary files /dev/null and b/images/traces_partner_granular.png differ
diff --git a/registry.yaml b/registry.yaml
index 6d8b6ae325..6b535bb520 100644
--- a/registry.yaml
+++ b/registry.yaml
@@ -4,6 +4,20 @@
 # should build pages for, and indicates metadata such as tags, creation date and
 # authors for each page.
 
+- title: MCP Powered Voice Agents
+  path: examples/partners/mcp_powered_voice_agents/mcp_powered_agents_cookbook.ipynb
+  date: 2025-06-12
+  authors:
+    - shikhar-cyber
+    - Cece Z
+    - Sibon Li
+  tags:
+    - mcp
+    - voice
+    - agents-sdk
+    - functions
+    - tracing
+
 - title: Eval Driven System Design - From Prototype to Production
   path: examples/partners/eval_driven_system_design/receipt_inspection.ipynb
   date: 2025-06-02