Commit 39d8402

update streaming for responses (#321)
1 parent 4a5af48 commit 39d8402

13 files changed (+302 -52 lines changed)

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -13,7 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [0.88.0]

 ### Added
-- Added support for OpenAI's Responses API (`/responses` endpoint) via `OpenAIResponseSchema`. Supports reasoning traces, multi-turn conversations with `previous_response_id`, and structured extraction with `aiextract`. Use `aigenerate(OpenAIResponseSchema(), prompt; model="o4-mini")` for reasoning models (access via `result.extras[:reasoning_content]`). See `examples/working_with_responses_api.jl`. Note: Many features are not supported yet, eg, streaming, built-in tools, etc.
+- Added support for OpenAI's Responses API (`/responses` endpoint) via `OpenAIResponseSchema`. Supports reasoning traces, multi-turn conversations with `previous_response_id`, and structured extraction with `aiextract`. Use `aigenerate(OpenAIResponseSchema(), prompt; model="o4-mini")` for reasoning models (access via `result.extras[:reasoning_content]`). See `examples/working_with_responses_api.jl`. Note: Many features are not supported yet, eg, built-in tools, etc.
+- Added support for streaming responses with `OpenAIResponseSchema` via a dedicated `StreamCallback` flavor. See `examples/working_with_responses_api.jl`.

 ## [0.87.0]
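The new streaming entry can be exercised with a minimal call. This is a sketch mirroring the usage shown elsewhere in this commit; it assumes a configured `OPENAI_API_KEY` and that the `gpt-5-mini` model is available to your account:

```julia
using PromptingTools
const PT = PromptingTools

schema = PT.OpenAIResponseSchema()

# Passing `streamcallback = stdout` prints tokens as they arrive;
# the returned message still carries the full final content.
msg = aigenerate(schema, "Count from 1 to 10, one number per line.";
    model = "gpt-5-mini",
    streamcallback = stdout,
    verbose = false)
```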

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ REPL = "<0.0.1, 1"
 Random = "<0.0.1, 1"
 SparseArrays = "<0.0.1, 1"
 Statistics = "<0.0.1, 1"
-StreamCallbacks = "0.6.2"
+StreamCallbacks = "0.7"
 Test = "<0.0.1, 1"
 Unicode = "<0.0.1, 1"
 julia = "1.9, 1.10, 1.11"

README.md

Lines changed: 6 additions & 0 deletions
@@ -619,6 +619,12 @@ println(msg.extras[:reasoning_content])
 # Continue conversations using previous_response_id
 msg2 = aigenerate(schema, "Tell me more";
     model="gpt-5-mini", previous_response_id=msg.extras[:response_id])
+
+# Streaming responses
+msg = aigenerate(schema, "Count from 1 to 10, one number per line.";
+    model = "gpt-5-mini",
+    streamcallback = stdout,
+    verbose = false)
 ```

 **When to use which API:**

docs/src/coverage_of_model_providers.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Below is an overview of the model providers supported by PromptingTools.jl, alon
 | Abstract Schema | Schema | Model Provider | aigenerate | aiembed | aiextract | aiscan | aiimage | aiclassify |
 |-------------------------|---------------------------|----------------------------------------|------------|---------|-----------|--------|---------|------------|
 | AbstractOpenAISchema | OpenAISchema | OpenAI (Chat Completions) |||||||
-| AbstractResponseSchema | OpenAIResponseSchema*** | OpenAI (Responses API) |||||||
+| AbstractOpenAIResponseSchema | OpenAIResponseSchema*** | OpenAI (Responses API) |||||||
 | AbstractOpenAISchema | CustomOpenAISchema* | Any OpenAI-compatible API (eg, vLLM)* |||||||
 | AbstractOpenAISchema | LocalServerOpenAISchema** | Any OpenAI-compatible Local server** |||||||
 | AbstractOpenAISchema | MistralOpenAISchema | Mistral AI |||||||

examples/working_with_responses_api.jl

Lines changed: 50 additions & 2 deletions
@@ -215,8 +215,8 @@ println("Usage: ", response.extras[:usage])
 # ## 10. Using with Templates
 #
 # Works with PromptingTools templates:
-
-response = aigenerate(schema, :BlankSystemUser;
+tpl = PT.render(AITemplate(:BlankSystemUser))
+response = aigenerate(schema, tpl;
     system = "You are a helpful coding assistant specialized in Julia.",
     user = "How do I read a CSV file?",
     model = "gpt-5-mini",
@@ -225,6 +225,53 @@ response = aigenerate(schema, :BlankSystemUser;
 println("\n=== Template Usage ===")
 println("Template response: ", response.content)

+# ## 11. Streaming Responses
+#
+# Stream responses in real-time for better interactivity.
+# Uses `OpenAIResponsesStream` flavor from StreamCallbacks.jl.
+
+using PromptingTools: StreamCallback
+
+# Basic streaming to stdout - see tokens appear as they're generated
+println("\n=== Streaming to stdout ===")
+response = aigenerate(schema, "Count from 1 to 10, one number per line.";
+    model = "gpt-5-mini",
+    streamcallback = stdout,
+    verbose = false)
+
+# Streaming with custom StreamCallback to capture chunks
+println("\n\n=== Streaming with StreamCallback ===")
+cb = StreamCallback() # captures all chunks for inspection
+response = aigenerate(schema, "What is Julia in one sentence?";
+    model = "gpt-5-mini",
+    streamcallback = cb,
+    verbose = false)
+
+println("Final content: ", response.content)
+println("Number of chunks received: ", length(cb.chunks))
+
+# Streaming to an IOBuffer for programmatic capture
+output = IOBuffer()
+cb = StreamCallback(; out = output)
+response = aigenerate(schema, "Say hello in 3 languages.";
+    model = "gpt-5-mini",
+    streamcallback = cb,
+    verbose = false)
+
+streamed_text = String(take!(output))
+println("Captured streamed text: ", streamed_text)
+
+# Streaming with reasoning models - see reasoning and output streamed
+println("\n=== Streaming with Reasoning ===")
+cb = StreamCallback(; out = stdout)
+response = aigenerate(schema, "What is 15 * 7? Think step by step.";
+    model = "o4-mini",
+    api_kwargs = (reasoning = Dict("effort" => "medium", "summary" => "auto"),),
+    streamcallback = cb,
+    verbose = false)
+
+println("\nReasoning content: ", response.extras[:reasoning_content])
+
 # ## Summary of Key Features
 #
 # | Feature | How to Use |
@@ -236,4 +283,5 @@ println("Template response: ", response.content)
 # | Multi-turn (efficient) | `previous_response_id = response.extras[:response_id]` |
 # | Structured extraction | `aiextract(schema, prompt; return_type=MyStruct)` |
 # | Web search | `enable_websearch = true` |
+# | Streaming | `streamcallback = stdout` or `StreamCallback()` |
 # | Access reasoning | `response.extras[:reasoning_content]` |

src/PromptingTools.jl

Lines changed: 3 additions & 4 deletions
@@ -13,10 +13,9 @@ import Preferences
 using Preferences: @load_preference, @set_preferences!
 using PrecompileTools
 using StreamCallbacks
-using StreamCallbacks: OpenAIStream, AnthropicStream, OllamaStream, StreamCallback,
-                       StreamChunk, AbstractStreamCallback
-# ResponseStream will be available in a future StreamCallbacks release
-# For now, we use OpenAIStream as a fallback for the Responses API
+using StreamCallbacks: OpenAIStream, OpenAIResponsesStream, AnthropicStream, OllamaStream,
+                       StreamCallback, StreamChunk, AbstractStreamCallback,
+                       streamed_request!, build_response_body
 using Test, Pkg
 ## Added REPL because it extends methods in Base.docs for extraction of docstrings
 using REPL

src/llm_interface.jl

Lines changed: 7 additions & 7 deletions
@@ -587,19 +587,19 @@ isextracted(x) = x isa AbstractExtractedData
 # which is used by models like gpt-5.1-codex that don't support the standard chat completions API.

 """
-    AbstractResponseSchema
+    AbstractOpenAIResponseSchema

-Abstract type for all response-based schemas that use the `/responses` endpoint instead of `/chat/completions`.
+Abstract type for all OpenAI response-based schemas that use the `/responses` endpoint instead of `/chat/completions`.
 """
-abstract type AbstractResponseSchema <: AbstractPromptSchema end
+abstract type AbstractOpenAIResponseSchema <: AbstractPromptSchema end

 """
-    OpenAIResponseSchema <: AbstractResponseSchema
+    OpenAIResponseSchema <: AbstractOpenAIResponseSchema

 A schema for OpenAI's Responses API (`/responses` endpoint).

 This schema is used for models that only support the Responses API, such as `gpt-5.1-codex`.
-Unlike the standard chat completions API, the Responses API uses `input` and `instructions`
+Unlike the standard chat completions API, the Responses API uses `input` and `instructions`
 fields instead of a messages array.

 # Example
@@ -608,10 +608,10 @@ schema = OpenAIResponseSchema()
 response = aigenerate(schema, "What is Julia?"; model="gpt-5.1-codex")
 ```
 """
-struct OpenAIResponseSchema <: AbstractResponseSchema end
+struct OpenAIResponseSchema <: AbstractOpenAIResponseSchema end

 "Echoes the user's input back to them. Used for testing the Responses API implementation"
-@kwdef mutable struct TestEchoOpenAIResponseSchema <: AbstractResponseSchema
+@kwdef mutable struct TestEchoOpenAIResponseSchema <: AbstractOpenAIResponseSchema
     response::AbstractDict = Dict(
         "id" => "resp_test123",
         "object" => "response",

src/llm_openai_responses.jl

Lines changed: 25 additions & 26 deletions
@@ -15,7 +15,7 @@ function create_response(schema::TestEchoOpenAIResponseSchema, api_key::Abstract
 end

 """
-    create_response(schema::AbstractResponseSchema, api_key::AbstractString,
+    create_response(schema::AbstractOpenAIResponseSchema, api_key::AbstractString,
         model::AbstractString,
         input;
         instructions::Union{Nothing, AbstractString} = nothing,
@@ -29,7 +29,7 @@ end
 Creates a response using the OpenAI Responses API with streaming support.

 # Arguments
-- `schema::AbstractResponseSchema`: The response schema to use
+- `schema::AbstractOpenAIResponseSchema`: The response schema to use
 - `api_key::AbstractString`: The API key to use for the OpenAI API
 - `model::AbstractString`: The model to use for generating the response
 - `input`: The input for the model, can be a string or structured input
@@ -46,7 +46,7 @@ Creates a response using the OpenAI Responses API with streaming support.
 # Returns
 - `response`: The response from the OpenAI API
 """
-function create_response(schema::AbstractResponseSchema, api_key::AbstractString,
+function create_response(schema::AbstractOpenAIResponseSchema, api_key::AbstractString,
         model::AbstractString,
         input;
         instructions::Union{Nothing, AbstractString} = nothing,
@@ -73,23 +73,19 @@ function create_response(schema::AbstractResponseSchema, api_key::AbstractString
         body["stream"] = true
     end

-    # Add all parameters from api_kwargs
+    # Add all parameters from api_kwargs (except url which is used for testing)
     # Supports: reasoning, text, temperature, max_output_tokens, etc.
     for (key, value) in pairs(api_kwargs)
+        key == :url && continue # url is used for testing, not sent to API
         body[string(key)] = value
     end

-    # Make the API request
-    url = OpenAI.build_url(OpenAI.DEFAULT_PROVIDER, "responses")
+    # Make the API request (url can be overridden via api_kwargs for testing)
+    url = get(api_kwargs, :url, OpenAI.build_url(OpenAI.DEFAULT_PROVIDER, "responses"))
     headers = OpenAI.auth_header(OpenAI.DEFAULT_PROVIDER, api_key)

     if !isnothing(streamcallback)
-        # Streaming is not yet supported for the Responses API
-        # The Responses API uses a different SSE format than Chat Completions,
-        # requiring a dedicated ResponseStream flavor in StreamCallbacks.jl
-        throw(ArgumentError("Streaming is not yet supported for OpenAI Responses API (OpenAIResponseSchema). Use non-streaming requests for now."))
-
-        # Configure streaming callback - only pass schema, no extra kwargs
+        # Configure streaming callback
         streamcallback, stream_kwargs = configure_callback!(streamcallback, schema)

         # Convert body dict to IOBuffer for streaming (streamed_request! expects IOBuffer)
@@ -99,7 +95,10 @@ function create_response(schema::AbstractResponseSchema, api_key::AbstractString

         # Use streaming request
         resp = streamed_request!(streamcallback, url, headers, input; http_kwargs...)
-        return OpenAI.OpenAIResponse(resp.status, JSON3.read(resp.body))
+
+        # Build response body from chunks using StreamCallbacks
+        response_body = build_response_body(streamcallback.flavor, streamcallback)
+        return OpenAI.OpenAIResponse(resp.status, response_body)
     else
         # Convert the body to JSON for non-streaming
         json_body = JSON3.write(body)
@@ -111,7 +110,7 @@ function create_response(schema::AbstractResponseSchema, api_key::AbstractString
 end

 """
-    render(schema::AbstractResponseSchema, messages::Vector{<:AbstractMessage};
+    render(schema::AbstractOpenAIResponseSchema, messages::Vector{<:AbstractMessage};
         conversation::AbstractVector{<:AbstractMessage} = AbstractMessage[],
         no_system_message::Bool = false,
         kwargs...)
@@ -124,7 +123,7 @@ The Responses API expects:
 - `instructions`: System-level instructions (from SystemMessage, optional)

 # Arguments
-- `schema::AbstractResponseSchema`: The response schema
+- `schema::AbstractOpenAIResponseSchema`: The response schema
 - `messages::Vector{<:AbstractMessage}`: Messages to render
 - `conversation`: Previous conversation history (currently limited support)
 - `no_system_message`: If true, don't add default system message
@@ -133,7 +132,7 @@ The Responses API expects:
 # Returns
 - `NamedTuple{(:input, :instructions), Tuple{String, Union{Nothing, String}}}`: Rendered input and instructions
 """
-function render(schema::AbstractResponseSchema,
+function render(schema::AbstractOpenAIResponseSchema,
         messages::Vector{<:AbstractMessage};
         conversation::AbstractVector{<:AbstractMessage} = AbstractMessage[],
         no_system_message::Bool = false,
@@ -165,23 +164,23 @@ function render(schema::AbstractResponseSchema,
 end

 # Render for string prompts - wrap in UserMessage and process
-function render(schema::AbstractResponseSchema, prompt::AbstractString;
+function render(schema::AbstractOpenAIResponseSchema, prompt::AbstractString;
         no_system_message::Bool = true, kwargs...)
     render(schema, [UserMessage(prompt)]; no_system_message, kwargs...)
 end

 # Render for single message
-function render(schema::AbstractResponseSchema, msg::AbstractMessage; kwargs...)
+function render(schema::AbstractOpenAIResponseSchema, msg::AbstractMessage; kwargs...)
     render(schema, [msg]; kwargs...)
 end

 # Render for AITemplate
-function render(schema::AbstractResponseSchema, template::AITemplate; kwargs...)
+function render(schema::AbstractOpenAIResponseSchema, template::AITemplate; kwargs...)
     render(schema, render(template); kwargs...)
 end

 # Render for Symbol (template name)
-function render(schema::AbstractResponseSchema, template::Symbol; kwargs...)
+function render(schema::AbstractOpenAIResponseSchema, template::Symbol; kwargs...)
     render(schema, AITemplate(template); kwargs...)
 end

@@ -225,7 +224,7 @@ function extract_response_content(response)
 end

 """
-    aigenerate(schema::AbstractResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
+    aigenerate(schema::AbstractOpenAIResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
         previous_response_id::Union{Nothing, AbstractString} = nothing,
         enable_websearch::Bool = false,
         model::AbstractString = MODEL_CHAT,
@@ -238,7 +237,7 @@ Generate an AI response using the OpenAI Responses API with streaming support.
 Returns an AIMessage with the response content and additional information in the extras field.

 # Arguments
-- `schema::AbstractResponseSchema`: The schema to use (e.g., `OpenAIResponseSchema()`)
+- `schema::AbstractOpenAIResponseSchema`: The schema to use (e.g., `OpenAIResponseSchema()`)
 - `prompt`: The prompt to send to the API, can be:
     - A string (sent as user input)
     - A vector of AbstractMessages (SystemMessage becomes instructions, UserMessage becomes input)
@@ -285,7 +284,7 @@ response = aigenerate(schema, "Solve 2+2*3";
 println(response.extras[:reasoning_content])
 ```
 """
-function aigenerate(schema::AbstractResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
+function aigenerate(schema::AbstractOpenAIResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
         previous_response_id::Union{Nothing, AbstractString} = nothing,
         enable_websearch::Bool = false,
         model::AbstractString = MODEL_CHAT,
@@ -367,7 +366,7 @@ function aigenerate(schema::AbstractResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
 end

 """
-    aiextract(schema::AbstractResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
+    aiextract(schema::AbstractOpenAIResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
         return_type::Union{Type, AbstractTool},
         model::AbstractString = MODEL_CHAT,
         api_key::AbstractString = "",
@@ -383,7 +382,7 @@ Note: Unlike the Chat Completions API, the Responses API `text.format` only supp
 JSON schema. For multi-type extraction (union of structs), use the Chat Completions API instead.

 # Arguments
-- `schema::AbstractResponseSchema`: The schema to use
+- `schema::AbstractOpenAIResponseSchema`: The schema to use
 - `prompt`: The input prompt
 - `return_type`: A Julia struct type or AbstractTool to extract (single type only)
 - `model`: The model to use
@@ -424,7 +423,7 @@ result = aiextract(schema, "Solve: What is 15% of 80?";
 println(result.extras[:reasoning_content])
 ```
 """
-function aiextract(schema::AbstractResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
+function aiextract(schema::AbstractOpenAIResponseSchema, prompt::ALLOWED_PROMPT_TYPE;
         return_type::Union{Type, AbstractTool},
         model::AbstractString = MODEL_CHAT,
        api_key::AbstractString = "",
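The `api_kwargs` handling added to `create_response` can be illustrated standalone. This sketch mirrors only the filtering logic from the diff; the kwarg values and the default URL below are illustrative, not taken from the package:

```julia
# Every api_kwarg except :url is forwarded into the request body;
# :url only overrides the endpoint (used by the test suite).
api_kwargs = (; temperature = 0.7, url = "http://localhost:8080/responses")

body = Dict{String, Any}("model" => "gpt-5-mini")
for (key, value) in pairs(api_kwargs)
    key == :url && continue  # test-only override, never sent to the API
    body[string(key)] = value
end

default_url = "https://api.openai.com/v1/responses"  # illustrative default
url = get(api_kwargs, :url, default_url)

@assert !haskey(body, "url")
@assert body["temperature"] == 0.7
@assert url == "http://localhost:8080/responses"
```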

src/streaming.jl

Lines changed: 3 additions & 4 deletions
@@ -22,18 +22,17 @@ function configure_callback!(cb::T, schema::AbstractPromptSchema;
         api_kwargs = (;
             api_kwargs..., stream = true, stream_options = (; include_usage = true))
         flavor = OpenAIStream()
-    elseif schema isa AbstractResponseSchema
+    elseif schema isa AbstractOpenAIResponseSchema
         ## Enable streaming for Response API
-        ## Note: Using OpenAIStream until ResponseStream is available in StreamCallbacks
         api_kwargs = (; api_kwargs..., stream = true)
-        flavor = OpenAIStream()
+        flavor = OpenAIResponsesStream()
     elseif schema isa Union{AbstractAnthropicSchema, AbstractOllamaSchema}
         api_kwargs = (; api_kwargs..., stream = true)
         flavor = schema isa AbstractOllamaSchema ? OllamaStream() : AnthropicStream()
     elseif schema isa AbstractOllamaManagedSchema
         throw(ErrorException("OllamaManagedSchema is not supported for streaming. Use OllamaSchema instead."))
     else
-        error("Unsupported schema type: $(typeof(schema)). Currently supported: OpenAISchema, AbstractResponseSchema, and AnthropicSchema.")
+        error("Unsupported schema type: $(typeof(schema)). Currently supported: OpenAISchema, AbstractOpenAIResponseSchema, and AnthropicSchema.")
     end
     cb.flavor = flavor
 end
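The flavor selection in `configure_callback!` boils down to dispatch on the schema type. A toy reduction of that idea (the types below are stand-ins defined here for illustration; the real flavors come from StreamCallbacks.jl and `pick_flavor` is hypothetical):

```julia
# Stand-ins for the StreamCallbacks flavors and PromptingTools schemas:
struct OpenAIStream end
struct OpenAIResponsesStream end

abstract type AbstractPromptSchema end
struct OpenAISchema <: AbstractPromptSchema end
abstract type AbstractOpenAIResponseSchema <: AbstractPromptSchema end
struct OpenAIResponseSchema <: AbstractOpenAIResponseSchema end

# After this commit, Responses-API schemas get their own SSE flavor
# instead of falling back to OpenAIStream:
pick_flavor(::OpenAISchema) = OpenAIStream()
pick_flavor(::AbstractOpenAIResponseSchema) = OpenAIResponsesStream()

@assert pick_flavor(OpenAISchema()) isa OpenAIStream
@assert pick_flavor(OpenAIResponseSchema()) isa OpenAIResponsesStream
```

Dispatching on the abstract supertype means any future `AbstractOpenAIResponseSchema` subtype (such as the `TestEchoOpenAIResponseSchema` in this commit) picks up the Responses streaming flavor automatically.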
File renamed without changes.
