Skip to content

Conversation

@RohanDisa
Copy link

@RohanDisa RohanDisa commented Jan 24, 2026

Purpose

Fix a bug where reasoning_content was incorrectly flushed into content in the final streamed chunk when finish_reason='tool_calls'.

Root Cause & Solution

Stream finalization did not clear content/reasoning buffers when finish_reason='tool_calls', allowing leftover reasoning_content (especially from speculative decoding) to leak.
Added guards to clear content/reasoning fields before final chunk creation and after tool call extraction.

Test Plan

  • Added regression test test_no_content_leak_when_finish_reason_tool_calls in tests/entrypoints/openai/test_chat_with_tool_reasoning.py.
  • The test simulates streaming with reasoning + tool_choice="auto" and asserts that the final chunk has no content/reasoning leaks while earlier reasoning is preserved.

Test Result

  • Regression test passes ✅
  • Final chunk with finish_reason="tool_calls" has content=null and no reasoning/reasoning_content

Fixes: #32921

…ool_calls

This fixes a bug where reasoning_content was incorrectly flushed into
the content field in the final streamed chunk when finish_reason='tool_calls'.

The issue occurred when:
- stream=true
- OpenAI tool-call parser enabled
- tool_choice='auto'
- reasoning fields enabled (reasoning, reasoning_content)
- speculative decoding enabled

Per OpenAI's schema contract, when finish_reason='tool_calls', the
response must only contain tool_calls and finish_reason, never content.

Changes:
1. Add guard before final chunk creation to clear content/reasoning
   when finish_reason='tool_calls'
2. Add guards after tool call extraction in all paths (auto, required,
   named, harmony) to prevent content leakage during streaming
3. Ensure reasoning_content is never flushed into content when
   tool calls are present
4. Add test to verify no content leak when finish_reason=tool_calls

Signed-off-by: RohanDisa <105740583+RohanDisa@users.noreply.github.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses a bug where reasoning_content could leak into the content field during streaming with tool calls. The solution, which involves clearing content and reasoning fields at various points, is sound and is well-supported by a new regression test. My review includes a suggestion to ensure consistency in the fix across different code paths and a recommendation to refactor duplicated code to improve long-term maintainability.

# in harmony path. Per OpenAI spec, tool call deltas
# must not contain content.
if tools_streamed_flag and delta_message:
delta_message.content = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

For consistency with the other fixes in this file and to fully prevent data leakage, reasoning and reasoning_content should also be cleared here, not just content. This ensures that no residual reasoning data is present in tool call deltas in the harmony path.

Suggested change
delta_message.content = None
delta_message.content = None
delta_message.reasoning = None
delta_message.reasoning_content = None

Comment on lines +988 to +993
# Ensure no content/reasoning leaks when tool calls
# are present. Per OpenAI spec, tool call deltas
# must not contain content.
delta_message.content = None
delta_message.reasoning = None
delta_message.reasoning_content = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This block of code for clearing content and reasoning fields is duplicated in several places within this function when handling tool calls. This code duplication can make future maintenance more difficult and increases the risk of inconsistent fixes.

To improve maintainability and reduce redundancy, I recommend refactoring this logic into a private helper method. For example:

def _clear_non_tool_call_fields(self, delta_message: DeltaMessage):
    """Ensures no content/reasoning leaks when tool calls are present."""
    delta_message.content = None
    delta_message.reasoning = None
    delta_message.reasoning_content = None

You could then call this helper method in all the relevant locations.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

# in harmony path. Per OpenAI spec, tool call deltas
# must not contain content.
if tools_streamed_flag and delta_message:
delta_message.content = None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harmony path misses clearing reasoning fields

Medium Severity

In the harmony path, when tools_streamed_flag is true, only delta_message.content is set to None, but reasoning and reasoning_content are not cleared. The extract_harmony_streaming_delta function can return a DeltaMessage with both reasoning and tool_calls set simultaneously (see stream_harmony.py lines 163-166). This allows reasoning to leak in intermediate streaming chunks for GPT-OSS models, which contradicts the PR's purpose. All other paths in this PR correctly clear all three fields.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working frontend tool-calling

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[Bug]: gpt-oss-20b streaming last reasoning content part into content

1 participant