CHANGELOG.md (1 addition, 1 deletion)
@@ -13,7 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  ## [0.88.0]

  ### Added

-- Added support for OpenAI's Responses API (`/responses` endpoint) with reasoning traces and streaming support via `OpenAIResponseSchema`. Use `airespond(OpenAIResponseSchema(), prompt)` for models like `gpt-5.1-codex` that require this endpoint.
+- Added support for OpenAI's Responses API (`/responses` endpoint) via `OpenAIResponseSchema`. Supports reasoning traces, multi-turn conversations with `previous_response_id`, and structured extraction with `aiextract`. Use `aigenerate(OpenAIResponseSchema(), prompt; model="o4-mini")` for reasoning models (access via `result.extras[:reasoning_content]`). See `examples/working_with_responses_api.jl`. Note: many features are not yet supported, e.g. streaming and built-in tools.
@@ -587,13 +588,52 @@ As you can see, it also works for any local models that you might have running o
Note: At the moment, we only support `aigenerate` and `aiembed` functions for MistralAI and other OpenAI-compatible APIs. We plan to extend the support in the future.

### Using OpenAI Responses API

PromptingTools.jl supports OpenAI's **Responses API** (`/responses` endpoint) in addition to the traditional Chat Completions API. The Responses API offers several advantages for agentic workflows and reasoning models:

**Key Benefits:**

- **Server-side state management**: No need to send full conversation history with each request

See the [FAQ](https://svilupp.github.io/PromptingTools.jl/dev/frequently_asked_questions/#Why-use-the-Responses-API-instead-of-Chat-Completions?) for more details.
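The continuation flow can be sketched as follows (a minimal sketch, assuming `OPENAI_API_KEY` is set; the `previous_response_id` kwarg and `extras[:response_id]` field follow the package's Responses API walkthrough):

```julia
using PromptingTools

schema = OpenAIResponseSchema()

# First turn: a normal call through the Responses API
msg1 = aigenerate(schema, "What is Julia?"; model="gpt-5-mini")

# Second turn: reference the stored response instead of resending history
msg2 = aigenerate(schema, "Tell me more about its type system";
    model="gpt-5-mini",
    previous_response_id=msg1.extras[:response_id])
```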
### Using Anthropic Models

Make sure the `ANTHROPIC_API_KEY` environment variable is set to your API key.

```julia
# claudeh is an alias for Claude 3 Haiku
ai"Say hi!"claudeh
```

Preset model aliases are `claudeo`, `claudes`, and `claudeh`, for Claude 3 Opus, Sonnet, and Haiku, respectively.
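The aliases also plug into the regular keyword interface; a hedged sketch (assumes `ANTHROPIC_API_KEY` is set and that alias resolution via the `model` keyword applies here as it does for other models):

```julia
using PromptingTools

# "claudes" resolves to Claude 3 Sonnet via the preset alias table
msg = aigenerate("Write a one-line haiku about Julia"; model="claudes")
```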
@@ -28,6 +29,8 @@ Below is an overview of the model providers supported by PromptingTools.jl, alon
\*\* This schema is a flavor of CustomOpenAISchema with a `url` key preset by global preference key `LOCAL_SERVER`. It is specifically designed for seamless integration with Llama.jl and utilizes an ENV variable for the URL, making integration easier in certain workflows, such as when nested calls are involved and passing `api_kwargs` is more challenging.

\*\*\* The Responses API (`OpenAIResponseSchema`) is OpenAI's newer API designed for agentic workflows and reasoning models. Key features include server-side state management (no need to send full conversation history), built-in tools (web search, file search, code interpreter), and better support for reasoning models (o1, o3, GPT-5). Use the `previous_response_id` kwarg to continue conversations. See the [FAQ](frequently_asked_questions.md#Why-use-the-Responses-API-instead-of-Chat-Completions) for details.
**Note 1:** `aitools` has identical support to `aiextract` for all providers, as it has the same API requirements.

**Note 2:** The `aiscan` and `aiimage` functions rely on specific endpoints being implemented by the provider. Ensure that the provider you choose supports these functionalities.
docs/src/frequently_asked_questions.md (101 additions, 0 deletions)
@@ -8,6 +8,107 @@ There will be situations when you may not or cannot use it (eg, privacy, cost, etc.). In that
Note: To get started with [Ollama.ai](https://ollama.ai/), see the [Setup Guide for Ollama](#setup-guide-for-ollama) section below.

## Why use the Responses API instead of Chat Completions?

OpenAI offers two main APIs for interacting with their models:

- **Chat Completions API** (`/v1/chat/completions`) - The traditional, widely-adopted approach
- **Responses API** (`/v1/responses`) - A newer API designed for agentic workflows and reasoning models

### Key Advantages of the Responses API

**1. Server-Side State Management**

With Chat Completions, you must maintain conversation history yourself, sending the full message array with each request. This becomes unwieldy with long conversations, attachments, and tools.

The Responses API manages state server-side. You simply reference a `previous_response_id` to continue conversations:

```julia
schema = OpenAIResponseSchema()
msg1 = aigenerate(schema, "What is Julia?"; model="gpt-5-mini")

# Continue the conversation without resending history
msg2 = aigenerate(schema, "Tell me more about its type system";
    model="gpt-5-mini",
    previous_response_id=msg1.extras[:response_id])
```
It took 1 retry (see `result.config.retries`) and we have the correct output from an open-source model!
If you're interested in the `result` object, it's a struct (`AICall`) with a field `conversation`, which holds the conversation up to this point.

AIGenerate is an alias for AICall using the `aigenerate` function. See `?AICall` (the underlying struct type) for more details on the fields and methods available.
## Walkthrough Example for the Responses API
The Responses API is OpenAI's newer API endpoint designed for agentic workflows and reasoning models. Unlike the Chat Completions API, which requires you to manage conversation state client-side, the Responses API can manage state server-side.

### When to Use the Responses API

Use the Responses API when you need:

- **Server-side state management**: Avoid sending full conversation history with each request
- **Built-in tools**: Web search, file search, code interpreter without implementing them yourself
- **Reasoning models**: Better support for o1, o3, o4-mini, and GPT-5 models
- **Better caching**: 40-80% improved cache utilization for cost and latency benefits

### Basic Usage

```julia
using PromptingTools
const PT = PromptingTools

# Explicitly use the Responses API with OpenAIResponseSchema
schema = PT.OpenAIResponseSchema()

msg = aigenerate(schema, "What is the capital of France?"; model="gpt-5-mini")
```
### How It Works Under the Hood
Let's trace through what happens when you make a Responses API call:

```julia
# Step 1: Render the prompt for the Responses API
prompt = "What is Julia programming language?"
rendered = PT.render(schema, prompt)
```

The `render` function for `OpenAIResponseSchema` produces a different output than `OpenAISchema`:

```plaintext
(input = "What is Julia programming language?", instructions = nothing)
```

Notice that instead of a vector of messages with "role" and "content" keys, we get a named tuple with `input` and `instructions` fields. This matches the Responses API specification.
423
+
424
+
If we use a template with a system message:
425
+
426
+
```julia
427
+
conversation = [
428
+
PT.SystemMessage("You are a helpful Julia programming assistant."),
429
+
PT.UserMessage("What is Julia?")
430
+
]
431
+
rendered = PT.render(schema, conversation)
432
+
```
433
+
434
+
```plaintext
435
+
(input = "What is Julia?", instructions = "You are a helpful Julia programming assistant.")
436
+
```
437
+
438
+
### Server-Side State Management
One of the key advantages of the Responses API is server-side state management:

```julia
# First message
msg1 = aigenerate(schema, "My name is Alice."; model="gpt-5-mini")

# Continue the conversation using previous_response_id
# No need to send the full conversation history!
msg2 = aigenerate(schema, "What is my name?";
    model="gpt-5-mini",
    previous_response_id=msg1.extras[:response_id])

# The model remembers: "Your name is Alice."
```

With Chat Completions, you would need to send all previous messages with each request. The Responses API handles this server-side.
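For contrast, a sketch of the same two-turn exchange over Chat Completions, where the client resends the history each time (uses the standard `return_all`/`conversation` keyword pattern from PromptingTools; treat the details as illustrative):

```julia
using PromptingTools

# Turn 1: keep the full message vector with `return_all=true`
conv = aigenerate("My name is Alice."; model="gpt-5-mini", return_all=true)

# Turn 2: the whole `conv` history travels with the request
conv = aigenerate("What is my name?"; model="gpt-5-mini",
    conversation=conv, return_all=true)
```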
### Built-in Web Search
458
+
459
+
The Responses API provides hosted tools that execute server-side:
460
+
461
+
```julia
462
+
msg =aigenerate(schema, "What are the latest Julia 1.11 features?";
463
+
model="gpt-5-mini",
464
+
enable_websearch=true)
465
+
```
466
+
467
+
This uses OpenAI's built-in web search tool without any additional setup.
468
+
469
+
### Reasoning Models
For reasoning models like o1, o3, and o4-mini, you can control the reasoning effort:

```julia
msg = aigenerate(schema, "Solve: If a train travels 120 km in 2 hours, and then 180 km in 3 hours, what is its average speed for the entire journey?";
    model="o4-mini",
    # Note: the exact reasoning-effort kwarg is truncated in this diff; the
    # OpenAI Responses API itself accepts a `reasoning = {effort: ...}` field,
    # passed here via the generic `api_kwargs` passthrough as an assumption.
    api_kwargs=(reasoning=(effort="high",),))
```
For more details on when to use each API, see the [FAQ section on Responses API](frequently_asked_questions.md#Why-use-the-Responses-API-instead-of-Chat-Completions).