Bugfix: Prevent reasoning_content leak #32997

RohanDisa · 2026-01-24T07:09:30Z

Purpose

Fix a bug where reasoning_content was incorrectly flushed into content in the final streamed chunk when finish_reason='tool_calls'.

Root Cause & Solution

Stream finalization did not clear content/reasoning buffers when finish_reason='tool_calls', allowing leftover reasoning_content (especially from speculative decoding) to leak.
Added guards to clear content/reasoning fields before final chunk creation and after tool call extraction.

Test Plan

Added regression test test_no_content_leak_when_finish_reason_tool_calls in tests/entrypoints/openai/test_chat_with_tool_reasoning.py.
The test simulates streaming with reasoning + tool_choice="auto" and asserts that the final chunk has no content/reasoning leaks while earlier reasoning is preserved.

Test Result

Regression test passes ✅
Final chunk with finish_reason="tool_calls" has content=null and no reasoning/reasoning_content

Fixes: #32921

…ool_calls This fixes a bug where reasoning_content was incorrectly flushed into the content field in the final streamed chunk when finish_reason='tool_calls'. The issue occurred when: - stream=true - OpenAI tool-call parser enabled - tool_choice='auto' - reasoning fields enabled (reasoning, reasoning_content) - speculative decoding enabled Per OpenAI's schema contract, when finish_reason='tool_calls', the response must only contain tool_calls and finish_reason, never content. Changes: 1. Add guard before final chunk creation to clear content/reasoning when finish_reason='tool_calls' 2. Add guards after tool call extraction in all paths (auto, required, named, harmony) to prevent content leakage during streaming 3. Ensure reasoning_content is never flushed into content when tool calls are present 4. Add test to verify no content leak when finish_reason=tool_calls Signed-off-by: RohanDisa <105740583+RohanDisa@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request effectively addresses a bug where reasoning_content could leak into the content field during streaming with tool calls. The solution, which involves clearing content and reasoning fields at various points, is sound and is well-supported by a new regression test. My review includes a suggestion to ensure consistency in the fix across different code paths and a recommendation to refactor duplicated code to improve long-term maintainability.

gemini-code-assist · 2026-01-24T07:11:21Z

vllm/entrypoints/openai/chat_completion/serving.py

+                        # in harmony path. Per OpenAI spec, tool call deltas
+                        # must not contain content.
+                        if tools_streamed_flag and delta_message:
+                            delta_message.content = None


For consistency with the other fixes in this file and to fully prevent data leakage, reasoning and reasoning_content should also be cleared here, not just content. This ensures that no residual reasoning data is present in tool call deltas in the harmony path.

Suggested change

delta_message.content = None

delta_message.content = None

delta_message.reasoning = None

delta_message.reasoning_content = None

gemini-code-assist · 2026-01-24T07:11:21Z

vllm/entrypoints/openai/chat_completion/serving.py

+                            # Ensure no content/reasoning leaks when tool calls
+                            # are present. Per OpenAI spec, tool call deltas
+                            # must not contain content.
+                            delta_message.content = None
+                            delta_message.reasoning = None
+                            delta_message.reasoning_content = None


This block of code for clearing content and reasoning fields is duplicated in several places within this function when handling tool calls. This code duplication can make future maintenance more difficult and increases the risk of inconsistent fixes.

To improve maintainability and reduce redundancy, I recommend refactoring this logic into a private helper method. For example:

def _clear_non_tool_call_fields(self, delta_message: DeltaMessage): """Ensures no content/reasoning leaks when tool calls are present.""" delta_message.content = None delta_message.reasoning = None delta_message.reasoning_content = None

You could then call this helper method in all the relevant locations.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

Comment @cursor review or bugbot run to trigger another review on this PR

cursor · 2026-01-24T07:18:49Z

vllm/entrypoints/openai/chat_completion/serving.py

+                        # in harmony path. Per OpenAI spec, tool call deltas
+                        # must not contain content.
+                        if tools_streamed_flag and delta_message:
+                            delta_message.content = None


Harmony path misses clearing reasoning fields

Medium Severity

In the harmony path, when tools_streamed_flag is true, only delta_message.content is set to None, but reasoning and reasoning_content are not cleared. The extract_harmony_streaming_delta function can return a DeltaMessage with both reasoning and tool_calls set simultaneously (see stream_harmony.py lines 163-166). This allows reasoning to leak in intermediate streaming chunks for GPT-OSS models, which contradicts the PR's purpose. All other paths in this PR correctly clear all three fields.

RohanDisa requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang and robertgshaw2-redhat as code owners January 24, 2026 07:09

mergify bot added frontend tool-calling bug Something isn't working labels Jan 24, 2026

github-project-automation bot added this to Tool Calling Jan 24, 2026

gemini-code-assist bot reviewed Jan 24, 2026

View reviewed changes

cursor bot reviewed Jan 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bugfix: Prevent reasoning_content leak #32997

Bugfix: Prevent reasoning_content leak #32997

RohanDisa commented Jan 24, 2026 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jan 24, 2026

Uh oh!

gemini-code-assist bot Jan 24, 2026

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Jan 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Bugfix: Prevent reasoning_content leak #32997

Are you sure you want to change the base?

Bugfix: Prevent reasoning_content leak #32997

Conversation

RohanDisa commented Jan 24, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Root Cause & Solution

Test Plan

Test Result

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 24, 2026

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Jan 24, 2026

Choose a reason for hiding this comment

Harmony path misses clearing reasoning fields

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

RohanDisa commented Jan 24, 2026 •

edited by github-actions bot

Loading