fix: merge duplicate-index entries in streaming delta accumulator (#3201)#3446
fix: merge duplicate-index entries in streaming delta accumulator (#3201)#3446C1-BA-B1-F3 wants to merge 1 commit into
Conversation
When the first streamed chunk contains multiple tool_calls entries with the same (e.g. id/name + start of arguments, both at index 0), the accumulator stored them as separate list entries. Subsequent chunks only merged into the first one at that position, leaving the duplicate stranded with incomplete arguments. Fix: add that merges list entries sharing the same integer key before first storage. Applied to both (chat completions) and (assistants API). Fixes openai#3201
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75b895d3d4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if is_list(delta_value) and delta_value: | ||
| acc[key] = _normalize_indexed_list(delta_value) |
There was a problem hiding this comment.
Normalize duplicate tool calls in the initial chunk
When the duplicate tool_calls entries arrive in the very first chat streaming chunk, this branch is never reached: ChatCompletionStreamState._accumulate_chunk() returns _convert_initial_chunk_into_snapshot() while the snapshot is None, and that path copies choice.delta.to_dict() directly. In that provider scenario, the first chunk still stores two entries with index == 0, so the next chunk merges into tool_calls[0] and leaves the argument prefix stranded in tool_calls[1]; apply the same normalization in the initial conversion path or route the first chunk through the accumulator.
Useful? React with 👍 / 👎.
Problem
When the first streamed chunk contains multiple
tool_callsentries with the sameindex(e.g. id/name + start of arguments, both at index 0), the accumulator stores them as separate list entries. Subsequent chunks only merge into the first one at that position, leaving the duplicate stranded with incomplete arguments.This produces invalid final JSON for tool call arguments — some argument fragments end up in a second list entry that nobody reads.
Root cause: The
accumulate_deltafunction stores the first encounter of a list value directly viaacc[key] = delta_valuewithout checking for duplicate indices.Fix
Add
_normalize_indexed_list()that merges list entries sharing the same integerindexkey before first storage. Applied to both:src/openai/lib/streaming/_deltas.py(chat completions)src/openai/lib/streaming/_assistants.py(assistants API)Test plan
tests/test_accumulate_delta.pywith 6 test cases covering:All 6 new tests pass. All 15 existing streaming tests pass.
Fixes #3201