Commit 7299c12

Deep research api (#1921)
1 parent b35868e commit 7299c12

File tree

7 files changed

+1731
-0
lines changed


authors.yaml

Lines changed: 5 additions & 0 deletions
@@ -376,3 +376,8 @@ alexl-oai:
   name: "Alex Lowden"
   website: "https://www.linkedin.com/in/alex-lowden01/"
   avatar: "https://avatars.githubusercontent.com/u/215167546"
+
+glojain:
+  name: "Glory Jain"
+  website: "https://www.linkedin.com/in/gloryjain/"
+  avatar: "https://media.licdn.com/dms/image/v2/C4E03AQH72n6Sm5q69Q/profile-displayphoto-shrink_400_400/profile-displayphoto-shrink_400_400/0/1557995338725?e=1756339200&v=beta&t=FGTXiCZwTZvqHCY-wd8It15EDf11Rex1oLlBKRGHNtY"
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# MCP for Deep Research

This is a minimal example of a Deep Research-style MCP server for searching and fetching files from the OpenAI file storage service.

For a reference on _how_ to call this service from the Responses API with Deep Research, see [this cookbook](https://cookbook.openai.com/examples/deep_research_api/introduction_to_deep_research_api). To see how to call the MCP server with the Agents SDK, check out [this cookbook](https://cookbook.openai.com/examples/deep_research_api/how_to_use_deep_research_API_agents)!

The Deep Research agent relies specifically on Search and Fetch tools. Search should look through your object store and return a set of specific, top-k IDs. Fetch takes object IDs as arguments and pulls back the relevant resources.
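Deep Research expects each tool to return a predictable shape; a minimal sketch of the Search and Fetch payloads, mirroring the sample server in this commit (field names here are illustrative assumptions, not a formal spec):

```python
# Hypothetical result shapes for the two tools, modeled on the sample
# server in this commit. Treat the field names as assumptions.

def search_result(file_id: str, title: str, snippet: str) -> dict:
    """One Search hit: an ID Deep Research can later pass to Fetch."""
    return {
        "id": file_id,
        "title": title,
        "text": snippet,  # short preview, not the full document
        "url": f"https://platform.openai.com/storage/files/{file_id}",
    }

def fetch_result(file_id: str, title: str, full_text: str) -> dict:
    """One Fetch response: the complete document for citation."""
    return {
        "id": file_id,
        "title": title,
        "text": full_text,
        "url": f"https://platform.openai.com/storage/files/{file_id}",
        "metadata": None,
    }

hits = {"results": [search_result("file-abc123", "report.pdf", "GLP-1 spend rose...")]}
doc = fetch_result("file-abc123", "report.pdf", "Full report text...")
```

Search returns lightweight previews so the agent can decide what is worth a full Fetch.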
## Set up & run

Store your internal file(s) in [OpenAI Vector Storage](https://platform.openai.com/storage/vector_stores/).

Python setup:

```shell
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
```

Run the server:

```shell
python main.py
```

The server will start on `http://0.0.0.0:8000/sse/` using SSE transport. If you want to reach the server from the public internet, there are a variety of ways to do that, including ngrok:

```shell
brew install ngrok
ngrok config add-authtoken <your_token>
ngrok http 8000
```

You should now be able to reach your local server from your client.
## Files

- `main.py`: Main server code

## Example Flow diagram for MCP Server

```mermaid
flowchart TD
    subgraph Connection_Setup
        A1[MCP Server starts up<br/>listening on /sse/] --> A2[Client opens SSE connection]
        A2 --> A3[Server confirms SSE connection]
    end

    subgraph Tool_Discovery
        A3 --> B1[Client asks 'What tools do you support?']
        B1 --> B2[Server replies with Search & Fetch schemas]
        B2 --> B3[Client stores schemas in context]
    end

    subgraph Search_Fetch_Loop
        B3 --> C1[Client issues search call]
        C1 --> C2[MCP Server routes to Search Tool]
        C2 --> C3[Search Tool queries Data Store<br/>returns one hit]
        C3 --> C4[Client issues fetch call]
        C4 --> C5[MCP Server routes to Fetch Tool]
        C5 --> C6[Fetch Tool retrieves document text]
        C6 --> C7[Client refines/repeats search<br/>cost-effectiveness, market revenue…]
        C7 --> C1
    end
```
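The Search_Fetch_Loop above can be sketched with plain functions standing in for the MCP tools; the in-memory corpus and function bodies here are hypothetical stand-ins for the real vector store:

```python
# Hypothetical in-memory corpus standing in for the vector store.
CORPUS = {
    "file-1": "Semaglutide reduced projected hospitalization costs in several payer models.",
    "file-2": "Global GLP-1 market revenue grew sharply between 2021 and 2024.",
}

def search(query: str) -> list[str]:
    """Toy Search tool: return IDs of documents containing any query word."""
    words = query.lower().split()
    return [doc_id for doc_id, text in CORPUS.items()
            if any(w in text.lower() for w in words)]

def fetch(doc_id: str) -> str:
    """Toy Fetch tool: return the full text for a previously returned ID."""
    return CORPUS[doc_id]

# One turn of the Search -> Fetch loop from the diagram.
retrieved = {}
for doc_id in search("hospitalization costs"):
    retrieved[doc_id] = fetch(doc_id)  # fetch each hit exactly once
```

The agent repeats this turn with refined queries until it has the sources it needs.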
## Example request

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# system_message includes reference to internal file lookups for MCP.
system_message = """
You are a professional researcher preparing a structured, data-driven report on behalf of a global health economics team. Your task is to analyze the health question the user poses.

Do:
- Focus on data-rich insights: include specific figures, trends, statistics, and measurable outcomes (e.g., reduction in hospitalization costs, market size, pricing trends, payer adoption).
- When appropriate, summarize data in a way that could be turned into charts or tables, and call this out in the response (e.g., "this would work well as a bar chart comparing per-patient costs across regions").
- Prioritize reliable, up-to-date sources: peer-reviewed research, health organizations (e.g., WHO, CDC), regulatory agencies, or pharmaceutical earnings reports.
- Use the internal file lookup tool to retrieve information from our own internal data sources. If you've already retrieved a file, do not call fetch again for that same file. Prioritize inclusion of that data.
- Include inline citations and return all source metadata.

Be analytical, avoid generalities, and ensure that each section supports data-backed reasoning that could inform healthcare policy or financial modeling.
"""

user_query = "Research the economic impact of semaglutide on global healthcare systems."

response = client.responses.create(
    model="o3-deep-research-2025-06-26",
    input=[
        {
            "role": "developer",
            "content": [
                {
                    "type": "input_text",
                    "text": system_message,
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": user_query,
                }
            ]
        }
    ],
    reasoning={
        "summary": "auto"
    },
    tools=[
        {
            "type": "web_search_preview"
        },
        {   # ADD MCP TOOL SUPPORT
            "type": "mcp",
            "server_label": "internal_file_lookup",
            "server_url": "http://0.0.0.0:8000/sse/",  # Update to the location of *your* MCP server
            "require_approval": "never"
        }
    ]
)
```
Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
#!/usr/bin/env python3
"""
Sample MCP Server for Deep Research API Integration

This server implements the Model Context Protocol (MCP) with search and fetch
capabilities designed to work with ChatGPT's deep research feature.
"""

import logging
from typing import Dict, List, Any

from fastmcp import FastMCP
from openai import OpenAI

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# OpenAI configuration
OPENAI_API_KEY = ""
VECTOR_STORE_ID = ""  # OpenAI Vector Store ID https://platform.openai.com/storage/vector_stores/

# Initialize OpenAI client
openai_client = OpenAI(api_key=OPENAI_API_KEY) if OPENAI_API_KEY else None

# No local data storage needed - using OpenAI Vector Store only


def create_server():
    """Create and configure the MCP server with search and fetch tools."""

    # Initialize the FastMCP server
    mcp = FastMCP(
        name="Sample Deep Research MCP Server",
        instructions="""
        This MCP server provides search and document retrieval capabilities for deep research.
        Use the search tool to find relevant documents based on keywords, then use the fetch
        tool to retrieve complete document content with citations.
        """,
    )

    @mcp.tool()
    async def search(query: str) -> Dict[str, List[Dict[str, Any]]]:
        """
        Search for documents using OpenAI Vector Store search.

        This tool searches through the vector store to find semantically relevant matches.
        Returns a list of search results with basic information. Use the fetch tool to get
        complete document content.

        Args:
            query: Search query string. Natural language queries work best for semantic search.

        Returns:
            Dictionary with 'results' key containing list of matching documents.
            Each result includes id, title, text snippet, and optional URL.
        """
        if not query or not query.strip():
            return {"results": []}

        if not openai_client:
            logger.error("OpenAI client not initialized - API key missing")
            raise ValueError("OpenAI API key is required for vector store search")

        # Search the vector store using OpenAI API
        logger.info(f"Searching vector store {VECTOR_STORE_ID} for query: '{query}'")

        response = openai_client.vector_stores.search(
            vector_store_id=VECTOR_STORE_ID, query=query)

        results = []

        # Process the vector store search results
        if hasattr(response, 'data') and response.data:
            for i, item in enumerate(response.data):
                # Extract file_id, filename, and content from the VectorStoreSearchResponse
                item_id = getattr(item, 'file_id', f"vs_{i}")
                item_filename = getattr(item, 'filename', f"Document {i+1}")

                # Extract text content from the content array
                content_list = getattr(item, 'content', [])
                text_content = ""
                if content_list and len(content_list) > 0:
                    # Get text from the first content item
                    first_content = content_list[0]
                    if hasattr(first_content, 'text'):
                        text_content = first_content.text
                    elif isinstance(first_content, dict):
                        text_content = first_content.get('text', '')

                if not text_content:
                    text_content = "No content available"

                # Create a snippet from content
                text_snippet = text_content[:200] + "..." if len(text_content) > 200 else text_content

                result = {
                    "id": item_id,
                    "title": item_filename,
                    "text": text_snippet,
                    "url": f"https://platform.openai.com/storage/files/{item_id}"
                }

                results.append(result)

        logger.info(f"Vector store search returned {len(results)} results")
        return {"results": results}

    @mcp.tool()
    async def fetch(id: str) -> Dict[str, Any]:
        """
        Retrieve complete document content by ID for detailed analysis and citation.

        This tool fetches the full document content from OpenAI Vector Store or local storage.
        Use this after finding relevant documents with the search tool to get complete
        information for analysis and proper citation.

        Args:
            id: File ID from vector store (file-xxx) or local document ID

        Returns:
            Complete document with id, title, full text content, optional URL, and metadata

        Raises:
            ValueError: If the specified ID is not found
        """
        if not id:
            raise ValueError("Document ID is required")

        if not openai_client:
            logger.error("OpenAI client not initialized - API key missing")
            raise ValueError("OpenAI API key is required for vector store file retrieval")

        logger.info(f"Fetching content from vector store for file ID: {id}")

        # Fetch file content from vector store
        content_response = openai_client.vector_stores.files.content(
            vector_store_id=VECTOR_STORE_ID, file_id=id)

        # Get file metadata
        file_info = openai_client.vector_stores.files.retrieve(
            vector_store_id=VECTOR_STORE_ID, file_id=id)

        # Extract content from paginated response
        file_content = ""
        if hasattr(content_response, 'data') and content_response.data:
            # Combine all content chunks from FileContentResponse objects
            content_parts = []
            for content_item in content_response.data:
                if hasattr(content_item, 'text'):
                    content_parts.append(content_item.text)
            file_content = "\n".join(content_parts)
        else:
            file_content = "No content available"

        # Use filename as title and create proper URL for citations
        filename = getattr(file_info, 'filename', f"Document {id}")

        result = {
            "id": id,
            "title": filename,
            "text": file_content,
            "url": f"https://platform.openai.com/storage/files/{id}",
            "metadata": None
        }

        # Add metadata if available from file info
        if hasattr(file_info, 'attributes') and file_info.attributes:
            result["metadata"] = file_info.attributes

        logger.info(f"Successfully fetched vector store file: {id}")
        return result

    return mcp


def main():
    """Main function to start the MCP server."""
    # Verify OpenAI client is initialized
    if not openai_client:
        logger.error(
            "OpenAI API key not found. Please set OPENAI_API_KEY environment variable."
        )
        raise ValueError("OpenAI API key is required")

    logger.info(f"Using vector store: {VECTOR_STORE_ID}")

    # Create the MCP server
    server = create_server()

    # Configure and start the server
    logger.info("Starting MCP server on 0.0.0.0:8000")
    logger.info("Server will be accessible via SSE transport")
    logger.info("Connect this server to ChatGPT Deep Research for testing")

    try:
        # Use FastMCP's built-in run method with SSE transport
        server.run(transport="sse", host="0.0.0.0", port=8000)
    except KeyboardInterrupt:
        logger.info("Server stopped by user")
    except Exception as e:
        logger.error(f"Server error: {e}")
        raise


if __name__ == "__main__":
    main()
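The 200-character snippet rule inside the `search` tool above can be exercised on its own; `make_snippet` is a hypothetical helper name extracted here purely for illustration:

```python
def make_snippet(text: str, limit: int = 200) -> str:
    # Same truncation rule as the search tool: cap previews at `limit`
    # characters and mark the cut with an ellipsis.
    return text[:limit] + "..." if len(text) > limit else text

short = make_snippet("brief note")   # unchanged: under the limit
long = make_snippet("x" * 250)       # truncated to 200 chars plus "..."
```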
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# Core dependencies for the Deep Research MCP Server
fastmcp>=2.9.0
openai>=1.88.0
uvicorn>=0.34.3

# Additional dependencies that may be required
pydantic>=2.0.0
typing-extensions>=4.0.0
httpx>=0.23.0
python-multipart>=0.0.9
sse-starlette>=1.6.1
starlette>=0.27.0

# Optional but recommended for production
python-dotenv>=1.0.0
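The optional `python-dotenv` dependency above is there so credentials can live in a `.env` file rather than in source; even without it, the hardcoded `OPENAI_API_KEY` and `VECTOR_STORE_ID` constants in `main.py` could be read from the environment. A minimal sketch (the `OPENAI_VECTOR_STORE_ID` variable name is an assumption, not an established convention):

```python
import os

def load_config() -> tuple[str, str]:
    """Read the API key and vector store ID from the environment.

    With python-dotenv installed, calling load_dotenv() first would
    populate os.environ from a local .env file.
    """
    return (
        os.environ.get("OPENAI_API_KEY", ""),
        os.environ.get("OPENAI_VECTOR_STORE_ID", ""),  # hypothetical name
    )
```

Keeping secrets out of the source file also makes the sample safe to commit as-is.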
