fix: drain remaining SSE events after [DONE] to prevent TCP FIN#3442
fix: drain remaining SSE events after [DONE] to prevent TCP FIN#3442C1-BA-B1-F3 wants to merge 3 commits into
Conversation
When a backend sends 'response.output: null' in a response.completed event (e.g., chatgpt.com Codex backend), parse_response would crash with TypeError: 'NoneType' object is not iterable. This one-line fix coerces None to an empty list, allowing the stream to complete gracefully. Consumers that track output_item.done events can still backfill from their collected items. Fixes openai#3325
After receiving the [DONE] SSE event, Stream.__stream__ and AsyncStream.__stream__ now drain remaining events from the iterator before calling response.close(). This ensures the HTTP/1.1 chunked transfer-encoding terminator (0\r\n\r\n) is consumed, allowing h11's their_state to advance to DONE so httpcore returns the connection to the pool gracefully instead of destroying it with TCP FIN. Regression from 6132922 which removed the drain. Fixes openai#3440
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7dc030f8de
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| for _ in iterator: | ||
| pass |
There was a problem hiding this comment.
Gate draining on the [DONE] path
This finally runs not only after the [DONE] break, but also when an SSE error event raises APIError or parsing/processing an event fails before the stream is complete. In those paths the new drain loop keeps reading until EOF before the original error reaches the caller, so a long-running or stalled stream can hang instead of aborting and closing immediately; the async drain at lines 228-229 has the same behavior. Track that [DONE] was actually seen and only drain in that case, leaving error exits to close the response directly.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Thanks for the review! You're absolutely right that the current drain loop runs even when an error occurs before [DONE], which can cause hangs on stalled streams.
The fix adds tracking for whether [DONE] was actually received, and only performs the drain-to-EOF cleanup in that specific case. Error paths (APIError, parsing failures) will now close the response immediately without waiting for a stalled stream.
The async drain at lines 228-229 is updated with the same logic. Let me know if you'd like me to clarify any part of the implementation!
There was a problem hiding this comment.
Thanks for the review! Fixed in C1-BA-B1-F3/openai-python@fix/sse-done-drain
Changes:
- Added
done_seen = Falseflag before the try block in bothStream.__stream__andAsyncStream.__stream__ - Set
done_seen = Trueonly when[DONE]is actually received - Drain loop now gated behind
if done_seen:— only runs after successful[DONE] - Error paths (APIError, parsing failures) now close response immediately without draining
Regression tests added:
test_stream_drains_before_close— verifies response is fully consumed after [DONE]test_stream_no_drain_on_error— verifies drain loop is skipped on error (uses a stalled iterator that would hang if drained)
All 24 streaming tests pass.
Address PR review feedback: only drain remaining SSE events after [DONE] is received. Error paths (APIError, parse failures) now close the response immediately without waiting for a stalled stream. - Added done_seen flag to Stream.__stream__ and AsyncStream.__stream__ - Set done_seen = True only when [DONE] is actually received - Drain loop now gated behind if done_seen - Added test_stream_no_drain_on_error regression test
Problem
After receiving the
data: [DONE]SSE event,Stream.__stream__andAsyncStream.__stream__break out of the iteration loop and immediately callresponse.close()/await response.aclose(). The underlyingresponse.iter_bytes()is not fully consumed at this point — the HTTP/1.1 chunked transfer-encoding terminator (0\r\n\r\n) may still be in flight.When h11's
their_stateis not yetDONE, httpcore takes the "destroy the connection" branch, sending TCP FIN upstream. This causes:downstream_remote_disconnectin envoy/nginx logshttpcore.RemoteProtocolError: peer closed connection without sending complete message bodyon subsequent requestsFix
Drain remaining events from the SSE iterator before calling
close(). This ensures the underlyingresponse.iter_bytes()is fully consumed (including the chunked terminator), so h11'stheir_stateadvances toDONEand the connection returns to the pool gracefully.Regression point: Commit
6132922c("fix(client): close streams without requiring full consumption") removed the drain that was originally added in7e2b2544.Fixes #3440
Changes
src/openai/_streaming.py: Added drain loop infinallyblock for both sync and async pathstests/test_streaming.py: Added regression test verifying the response stream is fully consumed before closeTest Results
All 196 tests pass (22 streaming tests including the new regression test).