
Commit 4a5af48

OpenAI responses endpoint (#320)
1 parent 6447570 commit 4a5af48

9 files changed (+875, -167 lines)

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [0.88.0]

### Added

- - Added support for OpenAI's Responses API (`/responses` endpoint) with reasoning traces and streaming support via `OpenAIResponseSchema`. Use `airespond(OpenAIResponseSchema(), prompt)` for models like `gpt-5.1-codex` that require this endpoint.
+ - Added support for OpenAI's Responses API (`/responses` endpoint) via `OpenAIResponseSchema`. Supports reasoning traces, multi-turn conversations with `previous_response_id`, and structured extraction with `aiextract`. Use `aigenerate(OpenAIResponseSchema(), prompt; model="o4-mini")` for reasoning models (access via `result.extras[:reasoning_content]`). See `examples/working_with_responses_api.jl`. Note: many features are not supported yet, eg, streaming and built-in tools.

## [0.87.0]

README.md

Lines changed: 41 additions & 1 deletion
@@ -101,6 +101,7 @@ For more practical examples, see the `examples/` folder and the [Advanced Exampl

- [Experimental Agent Workflows / Output Validation with `airetry!`](#experimental-agent-workflows--output-validation-with-airetry)
- [Using Ollama models](#using-ollama-models)
- [Using MistralAI API and other OpenAI-compatible APIs](#using-mistralai-api-and-other-openai-compatible-apis)
+ - [Using OpenAI Responses API](#using-openai-responses-api)
- [Using Anthropic Models](#using-anthropic-models)
- [More Examples](#more-examples)
- [Package Interface](#package-interface)
@@ -587,13 +588,52 @@ As you can see, it also works for any local models that you might have running o
Note: At the moment, we only support `aigenerate` and `aiembed` functions for MistralAI and other OpenAI-compatible APIs. We plan to extend the support in the future.

### Using OpenAI Responses API

PromptingTools.jl supports OpenAI's **Responses API** (`/responses` endpoint) in addition to the traditional Chat Completions API. The Responses API offers several advantages for agentic workflows and reasoning models:

**Key Benefits:**
- **Server-side state management**: No need to send the full conversation history with each request
- **Better cache utilization**: 40-80% improved cache hits, reducing latency and costs
- **Built-in tools**: Native web search, file search, and code interpreter without round-trips
- **Reasoning model support**: Better preservation of reasoning traces for models like o1, o3, and GPT-5
- **Multimodal-first design**: Text, images, and tools as first-class citizens

```julia
# Use the Responses API with any compatible model
schema = OpenAIResponseSchema()
msg = aigenerate(schema, "What is Julia?"; model="gpt-5-mini")

# Enable web search (built-in tool)
msg = aigenerate(schema, "What are the latest Julia releases?";
    model="gpt-5-mini", enable_websearch=true)

# With reasoning enabled (for reasoning models)
msg = aigenerate(schema, "Solve: What is 15% of 80?";
    model="o3-mini",
    api_kwargs = (reasoning = Dict("effort" => "medium", "summary" => "auto"),))

# Access the reasoning summary
println(msg.extras[:reasoning_content])

# Continue conversations using previous_response_id
msg2 = aigenerate(schema, "Tell me more";
    model="gpt-5-mini", previous_response_id=msg.extras[:response_id])
```

**When to use which API:**
- **Chat Completions API** (default): Straightforward conversations, established integrations, maximum compatibility
- **Responses API**: Complex agent workflows, tool use, reasoning models, state-heavy applications

See the [FAQ](https://svilupp.github.io/PromptingTools.jl/dev/frequently_asked_questions/#Why-use-the-Responses-API-instead-of-Chat-Completions?) for more details.

### Using Anthropic Models

Make sure the `ANTHROPIC_API_KEY` environment variable is set to your API key.

```julia
# claudeh is an alias for Claude 3 Haiku
ai"Say hi!"claudeh
```

Preset model aliases are `claudeo`, `claudes`, and `claudeh`, for Claude 3 Opus, Sonnet, and Haiku, respectively.

docs/src/coverage_of_model_providers.md

Lines changed: 4 additions & 1 deletion
@@ -10,7 +10,8 @@ Below is an overview of the model providers supported by PromptingTools.jl, alon
| Abstract Schema | Schema | Model Provider | aigenerate | aiembed | aiextract | aiscan | aiimage | aiclassify |
|-------------------------|---------------------------|----------------------------------------|------------|---------|-----------|--------|---------|------------|
- | AbstractOpenAISchema | OpenAISchema | OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+ | AbstractOpenAISchema | OpenAISchema | OpenAI (Chat Completions) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+ | AbstractResponseSchema | OpenAIResponseSchema*** | OpenAI (Responses API) | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| AbstractOpenAISchema | CustomOpenAISchema* | Any OpenAI-compatible API (eg, vLLM)* | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| AbstractOpenAISchema | LocalServerOpenAISchema** | Any OpenAI-compatible Local server** | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| AbstractOpenAISchema | MistralOpenAISchema | Mistral AI | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
@@ -28,6 +29,8 @@ Below is an overview of the model providers supported by PromptingTools.jl, alon
\*\* This schema is a flavor of CustomOpenAISchema with a `url` key preset by global preference key `LOCAL_SERVER`. It is specifically designed for seamless integration with Llama.jl and utilizes an ENV variable for the URL, making integration easier in certain workflows, such as when nested calls are involved and passing `api_kwargs` is more challenging.
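For illustration, a minimal sketch of the local-server flavor described above (an assumption-laden example: it presumes an OpenAI-compatible server, eg, one started via Llama.jl, is already listening at the URL stored in the `LOCAL_SERVER` preference):

```julia
using PromptingTools
const PT = PromptingTools

# LocalServerOpenAISchema picks up its target URL from the LOCAL_SERVER
# preference / ENV variable, so no explicit url is passed here
# (assumption: the local server is already running)
schema = PT.LocalServerOpenAISchema()
msg = aigenerate(schema, "Say hi!")
```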
\*\*\* The Responses API (`OpenAIResponseSchema`) is OpenAI's newer API designed for agentic workflows and reasoning models. Key features include server-side state management (no need to send the full conversation history), built-in tools (web search, file search, code interpreter), and better support for reasoning models (o1, o3, GPT-5). Use the `previous_response_id` kwarg to continue conversations. See the [FAQ](frequently_asked_questions.md#Why-use-the-Responses-API-instead-of-Chat-Completions) for details.
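As a quick, minimal sketch of that continuation pattern (the model name is illustrative; the kwargs mirror the examples elsewhere in this commit):

```julia
using PromptingTools

schema = OpenAIResponseSchema()
msg = aigenerate(schema, "My name is Alice."; model="gpt-5-mini")

# Continue server-side: only the new turn is sent,
# the rest of the conversation state lives with OpenAI
msg2 = aigenerate(schema, "What is my name?";
    model="gpt-5-mini", previous_response_id=msg.extras[:response_id])
```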
**Note 1:** `aitools` has identical support to `aiextract` for all providers, as it has the same API requirements.
**Note 2:** The `aiscan` and `aiimage` functions rely on specific endpoints being implemented by the provider. Ensure that the provider you choose supports these functionalities.

docs/src/frequently_asked_questions.md

Lines changed: 101 additions & 0 deletions
@@ -8,6 +8,107 @@ There will be situations not or cannot use it (eg, privacy, cost, etc.). In that

Note: To get started with [Ollama.ai](https://ollama.ai/), see the [Setup Guide for Ollama](#setup-guide-for-ollama) section below.

## Why use the Responses API instead of Chat Completions?

OpenAI offers two main APIs for interacting with their models:
- **Chat Completions API** (`/v1/chat/completions`) - The traditional, widely-adopted approach
- **Responses API** (`/v1/responses`) - A newer API designed for agentic workflows and reasoning models

### Key Advantages of the Responses API

**1. Server-Side State Management**

With Chat Completions, you must maintain conversation history yourself, sending the full message array with each request. This becomes unwieldy with long conversations, attachments, and tools.

The Responses API manages state server-side. You simply reference a `previous_response_id` to continue conversations:

```julia
schema = OpenAIResponseSchema()
msg1 = aigenerate(schema, "What is Julia?"; model="gpt-5-mini")

# Continue the conversation without resending history
msg2 = aigenerate(schema, "Tell me more about its type system";
    model="gpt-5-mini", previous_response_id=msg1.extras[:response_id])
```

**2. Better Performance and Cost Efficiency**

OpenAI reports 40-80% better cache utilization with the Responses API, leading to:
- Reduced latency (cached tokens are processed faster)
- Lower costs (cached tokens are cheaper)
- More efficient multi-turn conversations

**3. Built-in Tools**

The Responses API provides hosted tools that execute server-side:

| Tool | Cost | Description |
|------|------|-------------|
| Web Search | \$25-50 per 1,000 queries | Integrated search capability |
| File Search | \$2.50 per 1,000 queries | Vector store integration for RAG |
| Code Interpreter | Included | Sandbox for code execution |
| Computer Use | Varies | Agent automation tasks |

```julia
# Enable built-in web search
msg = aigenerate(schema, "What are the latest developments in Julia 1.11?";
    model="gpt-5-mini", enable_websearch=true)
```

**4. Better Reasoning Model Support**

Reasoning models (o1, o3, o4-mini, GPT-5) use an internal "chain-of-thought" that isn't directly exposed. The Responses API:
- Preserves reasoning traces across multi-turn conversations server-side
- Provides reasoning summaries in responses
- Enables control over reasoning effort and verbosity

```julia
msg = aigenerate(schema, "Solve: A train travels 120 km in 2 hours...";
    model="o3-mini",
    api_kwargs = (reasoning = Dict("effort" => "high", "summary" => "detailed"),))

# Access the reasoning summary
println(msg.extras[:reasoning_content])
```

**5. Structured Outputs**

The Responses API supports JSON schema output natively:

```julia
struct CalendarEvent
    name::String
    date::String
    participants::Vector{String}
end

result = aiextract(schema, "Alice and Bob are meeting on Friday for lunch.";
    return_type=CalendarEvent, model="gpt-5-mini")
```

### When to Use Each API

| Use Case | Recommended API |
|----------|-----------------|
| Simple, one-off queries | Chat Completions |
| Existing integrations | Chat Completions |
| Multi-turn conversations | Responses API |
| Agentic workflows with tools | Responses API |
| Reasoning models (o1, o3) | Responses API |
| Web search or file search | Responses API |
| Maximum compatibility | Chat Completions |

### Important Notes

- **Assistants API Deprecation**: OpenAI's Assistants API (launched in 2023) will sunset in H1 2026 in favor of the Responses API
- **Chat Completions Stability**: The Chat Completions API is not going away and remains fully supported
- **Schema Selection**: Use `OpenAIResponseSchema()` to explicitly use the Responses API

### Further Reading

- [OpenAI: Responses vs Chat Completions](https://platform.openai.com/docs/guides/responses-vs-chat-completions)
- [Why We Built the Responses API](https://developers.openai.com/blog/responses-api/)

### What if I cannot access OpenAI?

There are many alternatives:

docs/src/how_it_works.md

Lines changed: 150 additions & 1 deletion
@@ -377,4 +377,153 @@ food = JSON3.read(last_output(result), Food)

It took 1 retry (see `result.config.retries`) and we have the correct output from an open-source model!

If you're interested in the `result` object, it's a struct (`AICall`) with a field `conversation`, which holds the conversation up to this point.
AIGenerate is an alias for AICall using the `aigenerate` function. See `?AICall` (the underlying struct type) for more details on the fields and methods available.

## Walkthrough Example for the Responses API

The Responses API is OpenAI's newer API endpoint designed for agentic workflows and reasoning models. Unlike the Chat Completions API, which requires you to manage conversation state client-side, the Responses API can manage state server-side.

### When to Use the Responses API

Use the Responses API when you need:
- **Server-side state management**: Avoid sending the full conversation history with each request
- **Built-in tools**: Web search, file search, code interpreter without implementing them yourself
- **Reasoning models**: Better support for o1, o3, o4-mini, and GPT-5 models
- **Better caching**: 40-80% improved cache utilization for cost and latency benefits

### Basic Usage

```julia
using PromptingTools
const PT = PromptingTools

# Explicitly use the Responses API with OpenAIResponseSchema
schema = PT.OpenAIResponseSchema()

msg = aigenerate(schema, "What is the capital of France?"; model="gpt-5-mini")
```

### How It Works Under the Hood

Let's trace through what happens when you make a Responses API call:

```julia
# Step 1: Render the prompt for the Responses API
prompt = "What is Julia programming language?"
rendered = PT.render(schema, prompt)
```

The `render` function for `OpenAIResponseSchema` produces a different output than for `OpenAISchema`:

```plaintext
(input = "What is Julia programming language?", instructions = nothing)
```

Notice that instead of a vector of messages with "role" and "content" keys, we get a named tuple with `input` and `instructions` fields. This matches the Responses API specification.

If we pass a conversation with a system message:

```julia
conversation = [
    PT.SystemMessage("You are a helpful Julia programming assistant."),
    PT.UserMessage("What is Julia?")
]
rendered = PT.render(schema, conversation)
```

```plaintext
(input = "What is Julia?", instructions = "You are a helpful Julia programming assistant.")
```

### Server-Side State Management

One of the key advantages of the Responses API is server-side state management:

```julia
# First message
msg1 = aigenerate(schema, "My name is Alice."; model="gpt-5-mini")

# Continue the conversation using previous_response_id
# No need to send the full conversation history!
msg2 = aigenerate(schema, "What is my name?";
    model="gpt-5-mini",
    previous_response_id=msg1.extras[:response_id])

# The model remembers: "Your name is Alice."
```

With Chat Completions, you would need to send all previous messages with each request. The Responses API handles this server-side.

### Built-in Web Search

The Responses API provides hosted tools that execute server-side:

```julia
msg = aigenerate(schema, "What are the latest Julia 1.11 features?";
    model="gpt-5-mini",
    enable_websearch=true)
```

This uses OpenAI's built-in web search tool without any additional setup.

### Reasoning Models

For reasoning models like o1, o3, and o4-mini, you can control the reasoning effort:

```julia
msg = aigenerate(schema, "Solve: If a train travels 120 km in 2 hours, and then 180 km in 3 hours, what is its average speed for the entire journey?";
    model="o3-mini",
    api_kwargs = (reasoning = Dict("effort" => "high", "summary" => "detailed"),))

# Access the reasoning summary
println(msg.extras[:reasoning_content])
```

Reasoning options:
- `effort`: "low", "medium", or "high" - controls how much reasoning effort the model applies
- `summary`: "auto", "concise", or "detailed" - controls the verbosity of the reasoning summary

### Structured Data Extraction

The Responses API supports structured output via JSON schema:

```julia
struct WeatherInfo
    location::String
    temperature::Float64
    conditions::String
end

result = aiextract(schema, "The weather in Paris is 22°C and sunny.";
    return_type=WeatherInfo,
    model="gpt-5-mini")

result.content
# WeatherInfo("Paris", 22.0, "sunny")
```

### Response Extras

The `AIMessage` returned by the Responses API includes additional information in the `extras` field:

```julia
msg = aigenerate(schema, "Hello!"; model="gpt-5-mini")

msg.extras[:response_id]        # ID for continuing conversations
msg.extras[:reasoning_content]  # Vector of reasoning summaries (for reasoning models)
msg.extras[:usage]              # Token usage details
msg.extras[:full_response]      # Complete API response
```

### Chat Completions vs Responses API Comparison

| Aspect | Chat Completions | Responses API |
|--------|------------------|---------------|
| State Management | Client-side (send all messages) | Server-side (`previous_response_id`) |
| Built-in Tools | None | Web search, file search, code interpreter |
| Reasoning Models | Limited | Full support with effort/summary controls |
| Cache Efficiency | Standard | 40-80% better cache hits |
| Endpoint | `/v1/chat/completions` | `/v1/responses` |
| Schema | `OpenAISchema()` | `OpenAIResponseSchema()` |

For more details on when to use each API, see the [FAQ section on the Responses API](frequently_asked_questions.md#Why-use-the-Responses-API-instead-of-Chat-Completions).
