Add reasoning → tool state machine transition#1160
Open
christiangenco wants to merge 1 commit intoml-explore:mainfrom
Open
Add reasoning → tool state machine transition#1160christiangenco wants to merge 1 commit intoml-explore:mainfrom
christiangenco wants to merge 1 commit intoml-explore:mainfrom
Conversation
Thinking models that emit tool calls without first closing the <think> block (e.g. Kimi K2.5) never triggered the tool parser because there was no reasoning -> tool edge in the state machine. Add that transition when both has_thinking and has_tool_calling are enabled on the tokenizer.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When a tokenizer has both
has_thinkingandhas_tool_callingset, the server-side state machine in_make_state_machine()has no edge fromreasoningtotool. If the model emits the tool-call start token while still inside a<think>block (i.e. without first emitting</think>), the tool-call markers are accumulated as reasoning text and the tool parser never runs.This affects models like Kimi K2.5 (
mlx-community/Kimi-K2.5), which opens<think>at the start of every response and will happily emit<|tool_calls_section_begin|>from inside the think block.Fix
In
_make_state_machine(), when bothhas_thinkingandhas_tool_callingare true, add a(tool_call_start_tokens, "tool")edge to thereasoningstate. This lets the state machine transition from reasoning directly to tool on the tool start token, the same way it does from normal.The fix reuses the
ts = tokenizer.tool_call_start_tokensbinding from the existinghas_tool_callingblock — no re-encoding needed.Testing
Added
test_state_machine_reasoning_to_tool_transitionintests/test_generate.py. It constructs aSequenceStateMachinestarting inreasoning, feeds reasoning-state tokens, then the tool start token, and asserts the machine transitionsreasoning → tool → normalcorrectly.Ran the full suite locally on Apple Silicon (Python 3.10,
pip install -e ".[test]",HF_HOME=.with the release test_data bundle):Pre-commit (black + isort) passes.
Note on Kimi K2.5 specifically
This fix alone does not fully resolve Kimi K2.5 tool calling, because that model also emits the function identifier (
functions.name:idx) before<|tool_calls_section_begin|>— so the function name is still consumed as reasoning text before the tool state begins, and thekimi_k2parser only sees the bare JSON arguments. A separate parser enhancement would be needed for full Kimi K2.5 support.However, the missing
reasoning → tooltransition is a correctness gap in its own right that affects any thinking + tool-calling model where tool calls may be emitted from inside a think block. This PR closes that gap.