
Commit 4a5af48

OpenAI responses endpoint (#320)
1 parent 6447570 commit 4a5af48

9 files changed (+875, -167 lines)

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [0.88.0]

### Added

- - Added support for OpenAI's Responses API (`/responses` endpoint) with reasoning traces and streaming support via `OpenAIResponseSchema`. Use `airespond(OpenAIResponseSchema(), prompt)` for models like `gpt-5.1-codex` that require this endpoint.
+ - Added support for OpenAI's Responses API (`/responses` endpoint) via `OpenAIResponseSchema`. Supports reasoning traces, multi-turn conversations with `previous_response_id`, and structured extraction with `aiextract`. Use `aigenerate(OpenAIResponseSchema(), prompt; model="o4-mini")` for reasoning models (access via `result.extras[:reasoning_content]`). See `examples/working_with_responses_api.jl`. Note: many features are not supported yet, eg, streaming and built-in tools.

## [0.87.0]

README.md

Lines changed: 41 additions & 1 deletion
@@ -101,6 +101,7 @@ For more practical examples, see the `examples/` folder and the [Advanced Exampl

- [Experimental Agent Workflows / Output Validation with `airetry!`](#experimental-agent-workflows--output-validation-with-airetry)
- [Using Ollama models](#using-ollama-models)
- [Using MistralAI API and other OpenAI-compatible APIs](#using-mistralai-api-and-other-openai-compatible-apis)
+ - [Using OpenAI Responses API](#using-openai-responses-api)
- [Using Anthropic Models](#using-anthropic-models)
- [More Examples](#more-examples)
- [Package Interface](#package-interface)
@@ -587,13 +588,52 @@ As you can see, it also works for any local models that you might have running o
Note: At the moment, we only support `aigenerate` and `aiembed` functions for MistralAI and other OpenAI-compatible APIs. We plan to extend the support in the future.

### Using OpenAI Responses API

PromptingTools.jl supports OpenAI's **Responses API** (`/responses` endpoint) in addition to the traditional Chat Completions API. The Responses API offers several advantages for agentic workflows and reasoning models:

**Key Benefits:**
- **Server-side state management**: No need to send the full conversation history with each request
- **Better cache utilization**: 40-80% improved cache hits, reducing latency and costs
- **Built-in tools**: Native web search, file search, and code interpreter without round-trips
- **Reasoning model support**: Better preservation of reasoning traces for models like o1, o3, and GPT-5
- **Multimodal-first design**: Text, images, and tools as first-class citizens

```julia
# Use the Responses API with any compatible model
schema = OpenAIResponseSchema()
msg = aigenerate(schema, "What is Julia?"; model="gpt-5-mini")

# Enable web search (built-in tool)
msg = aigenerate(schema, "What are the latest Julia releases?";
    model="gpt-5-mini", enable_websearch=true)

# With reasoning enabled (for reasoning models)
msg = aigenerate(schema, "Solve: What is 15% of 80?";
    model="o3-mini",
    api_kwargs = (reasoning = Dict("effort" => "medium", "summary" => "auto"),))

# Access the reasoning summary
println(msg.extras[:reasoning_content])

# Continue conversations using previous_response_id
msg2 = aigenerate(schema, "Tell me more";
    model="gpt-5-mini", previous_response_id=msg.extras[:response_id])
```

**When to use which API:**
- **Chat Completions API** (default): Straightforward conversations, established integrations, maximum compatibility
- **Responses API**: Complex agent workflows, tool use, reasoning models, state-heavy applications

See the [FAQ](https://svilupp.github.io/PromptingTools.jl/dev/frequently_asked_questions/#Why-use-the-Responses-API-instead-of-Chat-Completions?) for more details.

### Using Anthropic Models

Make sure the `ANTHROPIC_API_KEY` environment variable is set to your API key.

```julia
# claudeh is an alias for Claude 3 Haiku
ai"Say hi!"claudeh
```

Preset model aliases are `claudeo`, `claudes`, and `claudeh`, for Claude 3 Opus, Sonnet, and Haiku, respectively.

docs/src/coverage_of_model_providers.md

Lines changed: 4 additions & 1 deletion
@@ -10,7 +10,8 @@ Below is an overview of the model providers supported by PromptingTools.jl, alon
| Abstract Schema | Schema | Model Provider | aigenerate | aiembed | aiextract | aiscan | aiimage | aiclassify |
|-------------------------|---------------------------|----------------------------------------|------------|---------|-----------|--------|---------|------------|
- | AbstractOpenAISchema | OpenAISchema | OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+ | AbstractOpenAISchema | OpenAISchema | OpenAI (Chat Completions) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+ | AbstractResponseSchema | OpenAIResponseSchema*** | OpenAI (Responses API) | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| AbstractOpenAISchema | CustomOpenAISchema* | Any OpenAI-compatible API (eg, vLLM)* | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| AbstractOpenAISchema | LocalServerOpenAISchema** | Any OpenAI-compatible Local server** | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| AbstractOpenAISchema | MistralOpenAISchema | Mistral AI | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
@@ -28,6 +29,8 @@ Below is an overview of the model providers supported by PromptingTools.jl, alon
\*\* This schema is a flavor of CustomOpenAISchema with a `url` key preset by global preference key `LOCAL_SERVER`. It is specifically designed for seamless integration with Llama.jl and utilizes an ENV variable for the URL, making integration easier in certain workflows, such as when nested calls are involved and passing `api_kwargs` is more challenging.
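For illustration, a minimal sketch of the local-server flavor described above (an assumption-laden example: it presumes an OpenAI-compatible server, eg, one started via Llama.jl, is already listening at the URL stored in the `LOCAL_SERVER` preference):

```julia
using PromptingTools
const PT = PromptingTools

# LocalServerOpenAISchema picks up its target URL from the LOCAL_SERVER
# preference / ENV variable, so no explicit url is passed here
# (assumption: the local server is already running)
schema = PT.LocalServerOpenAISchema()
msg = aigenerate(schema, "Say hi!")
```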
\*\*\* The Responses API (`OpenAIResponseSchema`) is OpenAI's newer API designed for agentic workflows and reasoning models. Key features include server-side state management (no need to send the full conversation history), built-in tools (web search, file search, code interpreter), and better support for reasoning models (o1, o3, GPT-5). Use the `previous_response_id` kwarg to continue conversations. See the [FAQ](frequently_asked_questions.md#Why-use-the-Responses-API-instead-of-Chat-Completions) for details.
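As a quick, minimal sketch of that continuation pattern (the model name is illustrative; the kwargs mirror the examples elsewhere in this commit):

```julia
using PromptingTools

schema = OpenAIResponseSchema()
msg = aigenerate(schema, "My name is Alice."; model="gpt-5-mini")

# Continue server-side: only the new turn is sent,
# the rest of the conversation state lives with OpenAI
msg2 = aigenerate(schema, "What is my name?";
    model="gpt-5-mini", previous_response_id=msg.extras[:response_id])
```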
**Note 1:** `aitools` has identical support to `aiextract` for all providers, as it has the same API requirements.
**Note 2:** The `aiscan` and `aiimage` functions rely on specific endpoints being implemented by the provider. Ensure that the provider you choose supports these functionalities.

docs/src/frequently_asked_questions.md

Lines changed: 101 additions & 0 deletions
@@ -8,6 +8,107 @@ There will be situations not or cannot use it (eg, privacy, cost, etc.). In that

Note: To get started with [Ollama.ai](https://ollama.ai/), see the [Setup Guide for Ollama](#setup-guide-for-ollama) section below.

## Why use the Responses API instead of Chat Completions?

OpenAI offers two main APIs for interacting with their models:
- **Chat Completions API** (`/v1/chat/completions`) - The traditional, widely-adopted approach
- **Responses API** (`/v1/responses`) - A newer API designed for agentic workflows and reasoning models

### Key Advantages of the Responses API

**1. Server-Side State Management**

With Chat Completions, you must maintain conversation history yourself, sending the full message array with each request. This becomes unwieldy with long conversations, attachments, and tools.

The Responses API manages state server-side. You simply reference a `previous_response_id` to continue conversations:

```julia
schema = OpenAIResponseSchema()
msg1 = aigenerate(schema, "What is Julia?"; model="gpt-5-mini")

# Continue the conversation without resending history
msg2 = aigenerate(schema, "Tell me more about its type system";
    model="gpt-5-mini", previous_response_id=msg1.extras[:response_id])
```

**2. Better Performance and Cost Efficiency**

OpenAI reports 40-80% better cache utilization with the Responses API, leading to:
- Reduced latency (cached tokens are processed faster)
- Lower costs (cached tokens are cheaper)
- More efficient multi-turn conversations

**3. Built-in Tools**

The Responses API provides hosted tools that execute server-side:

| Tool | Cost | Description |
|------|------|-------------|
| Web Search | \$25-50 per 1,000 queries | Integrated search capability |
| File Search | \$2.50 per 1,000 queries | Vector store integration for RAG |
| Code Interpreter | Included | Sandbox for code execution |
| Computer Use | Varies | Agent automation tasks |

```julia
# Enable built-in web search
msg = aigenerate(schema, "What are the latest developments in Julia 1.11?";
    model="gpt-5-mini", enable_websearch=true)
```

**4. Better Reasoning Model Support**

Reasoning models (o1, o3, o4-mini, GPT-5) use an internal "chain-of-thought" that isn't directly exposed. The Responses API:
- Preserves reasoning traces across multi-turn conversations server-side
- Provides reasoning summaries in responses
- Enables control over reasoning effort and verbosity

```julia
msg = aigenerate(schema, "Solve: A train travels 120 km in 2 hours...";
    model="o3-mini",
    api_kwargs = (reasoning = Dict("effort" => "high", "summary" => "detailed"),))

# Access the reasoning summary
println(msg.extras[:reasoning_content])
```

**5. Structured Outputs**

The Responses API supports JSON schema output natively:

```julia
struct CalendarEvent
    name::String
    date::String
    participants::Vector{String}
end

result = aiextract(schema, "Alice and Bob are meeting on Friday for lunch.";
    return_type=CalendarEvent, model="gpt-5-mini")
```

### When to Use Each API

| Use Case | Recommended API |
|----------|-----------------|
| Simple, one-off queries | Chat Completions |
| Existing integrations | Chat Completions |
| Multi-turn conversations | Responses API |
| Agentic workflows with tools | Responses API |
| Reasoning models (o1, o3) | Responses API |
| Web search or file search | Responses API |
| Maximum compatibility | Chat Completions |

### Important Notes

- **Assistants API Deprecation**: OpenAI's Assistants API (launched in 2023) will sunset in H1 2026 in favor of the Responses API
- **Chat Completions Stability**: The Chat Completions API is not going away and remains fully supported
- **Schema Selection**: Use `OpenAIResponseSchema()` to explicitly use the Responses API

### Further Reading

- [OpenAI: Responses vs Chat Completions](https://platform.openai.com/docs/guides/responses-vs-chat-completions)
- [Why We Built the Responses API](https://developers.openai.com/blog/responses-api/)

### What if I cannot access OpenAI?

There are many alternatives:

docs/src/how_it_works.md

Lines changed: 150 additions & 1 deletion
@@ -377,4 +377,153 @@ food = JSON3.read(last_output(result), Food)

It took 1 retry (see `result.config.retries`) and we have the correct output from an open-source model!

If you're interested in the `result` object, it's a struct (`AICall`) with a field `conversation`, which holds the conversation up to this point.
AIGenerate is an alias for AICall using the `aigenerate` function. See `?AICall` (the underlying struct type) for more details on the fields and methods available.

## Walkthrough Example for the Responses API

The Responses API is OpenAI's newer API endpoint designed for agentic workflows and reasoning models. Unlike the Chat Completions API, which requires you to manage conversation state client-side, the Responses API can manage state server-side.

### When to Use the Responses API

Use the Responses API when you need:
- **Server-side state management**: Avoid sending the full conversation history with each request
- **Built-in tools**: Web search, file search, code interpreter without implementing them yourself
- **Reasoning models**: Better support for o1, o3, o4-mini, and GPT-5 models
- **Better caching**: 40-80% improved cache utilization for cost and latency benefits

### Basic Usage

```julia
using PromptingTools
const PT = PromptingTools

# Explicitly use the Responses API with OpenAIResponseSchema
schema = PT.OpenAIResponseSchema()

msg = aigenerate(schema, "What is the capital of France?"; model="gpt-5-mini")
```

### How It Works Under the Hood

Let's trace through what happens when you make a Responses API call:

```julia
# Step 1: Render the prompt for the Responses API
prompt = "What is Julia programming language?"
rendered = PT.render(schema, prompt)
```

The `render` function for `OpenAIResponseSchema` produces a different output than for `OpenAISchema`:

```plaintext
(input = "What is Julia programming language?", instructions = nothing)
```

Notice that instead of a vector of messages with "role" and "content" keys, we get a named tuple with `input` and `instructions` fields. This matches the Responses API specification.

If we pass a conversation with a system message:

```julia
conversation = [
    PT.SystemMessage("You are a helpful Julia programming assistant."),
    PT.UserMessage("What is Julia?")
]
rendered = PT.render(schema, conversation)
```

```plaintext
(input = "What is Julia?", instructions = "You are a helpful Julia programming assistant.")
```

### Server-Side State Management

One of the key advantages of the Responses API is server-side state management:

```julia
# First message
msg1 = aigenerate(schema, "My name is Alice."; model="gpt-5-mini")

# Continue the conversation using previous_response_id
# No need to send the full conversation history!
msg2 = aigenerate(schema, "What is my name?";
    model="gpt-5-mini",
    previous_response_id=msg1.extras[:response_id])

# The model remembers: "Your name is Alice."
```

With Chat Completions, you would need to send all previous messages with each request. The Responses API handles this server-side.

### Built-in Web Search

The Responses API provides hosted tools that execute server-side:

```julia
msg = aigenerate(schema, "What are the latest Julia 1.11 features?";
    model="gpt-5-mini",
    enable_websearch=true)
```

This uses OpenAI's built-in web search tool without any additional setup.

### Reasoning Models

For reasoning models like o1, o3, and o4-mini, you can control the reasoning effort:

```julia
msg = aigenerate(schema, "Solve: If a train travels 120 km in 2 hours, and then 180 km in 3 hours, what is its average speed for the entire journey?";
    model="o3-mini",
    api_kwargs = (reasoning = Dict("effort" => "high", "summary" => "detailed"),))

# Access the reasoning summary
println(msg.extras[:reasoning_content])
```

Reasoning options:
- `effort`: "low", "medium", or "high" - controls how much reasoning effort the model applies
- `summary`: "auto", "concise", or "detailed" - controls the verbosity of the reasoning summary

### Structured Data Extraction

The Responses API supports structured output via JSON schema:

```julia
struct WeatherInfo
    location::String
    temperature::Float64
    conditions::String
end

result = aiextract(schema, "The weather in Paris is 22°C and sunny.";
    return_type=WeatherInfo,
    model="gpt-5-mini")

result.content
# WeatherInfo("Paris", 22.0, "sunny")
```

### Response Extras

The `AIMessage` returned by the Responses API includes additional information in the `extras` field:

```julia
msg = aigenerate(schema, "Hello!"; model="gpt-5-mini")

msg.extras[:response_id]        # ID for continuing conversations
msg.extras[:reasoning_content]  # Vector of reasoning summaries (for reasoning models)
msg.extras[:usage]              # Token usage details
msg.extras[:full_response]      # Complete API response
```

### Chat Completions vs Responses API Comparison

| Aspect | Chat Completions | Responses API |
|--------|------------------|---------------|
| State Management | Client-side (send all messages) | Server-side (`previous_response_id`) |
| Built-in Tools | None | Web search, file search, code interpreter |
| Reasoning Models | Limited | Full support with effort/summary controls |
| Cache Efficiency | Standard | 40-80% better cache hits |
| Endpoint | `/v1/chat/completions` | `/v1/responses` |
| Schema | `OpenAISchema()` | `OpenAIResponseSchema()` |

For more details on when to use each API, see the [FAQ section on the Responses API](frequently_asked_questions.md#Why-use-the-Responses-API-instead-of-Chat-Completions).
