
Auto-discover tool-call markers from tokenizer config fields #1163

Open
michaelstingl wants to merge 1 commit into ml-explore:main from michaelstingl:feat/auto-discover-tool-markers

Conversation

michaelstingl commented Apr 18, 2026

Summary

Google's Gemma 4 publishes structured special-token fields in tokenizer_config.json (stc_token / etc_token for tool calls, soc_token / eoc_token for the thinking channel). HuggingFace's AutoTokenizer exposes these as attributes on the tokenizer object, but nothing in mlx-lm currently consumes them. Tool-call marker discovery relies entirely on the hardcoded _infer_tool_parser() pattern chain.

This PR adds a small, additive layer that reads stc_token / etc_token directly from the tokenizer and feeds the discovered markers into the existing SequenceStateMachine plumbing in server.py. The intent is that if other model authors adopt the same convention, they will get marker detection for free without requiring a new entry in _infer_tool_parser().

Reference: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
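For orientation, a hedged illustration of the attribute pattern (placeholder token strings and a stand-in object, not Gemma 4's real values or a real tokenizer):

```python
from types import SimpleNamespace

# Placeholder values — NOT Gemma 4's actual token strings. AutoTokenizer
# surfaces tokenizer_config.json fields like these as plain attributes on
# the tokenizer object, so a stand-in object is enough to show the pattern:
tok = SimpleNamespace(stc_token="<start_tool_call>", etc_token="<end_tool_call>")
print(getattr(tok, "stc_token", None))  # tool-call start marker, or None
print(getattr(tok, "etc_token", None))  # tool-call end marker, or None
```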

What changes

  • mlx_lm/tokenizer_utils.py
    • New _infer_markers_from_config(tokenizer) — reads stc_token / etc_token via getattr, returns a dict of marker strings (or None); see the sketch after this list.
    • TokenizerWrapper.__init__ gains optional think_start / think_end kwargs. When passed, they bypass _infer_thinking(); otherwise the existing fallback runs unchanged.
    • load() calls _infer_markers_from_config() once. When a tool parser module is matched via _infer_tool_parser(), its markers still win (unchanged behaviour). When no parser matches but the tokenizer exposes stc_token / etc_token, those markers are used — the state machine can then segregate tool-call content from normal content even without a structured parser.
  • tests/test_tokenizers.py
    • New TestMarkerDiscovery class (6 tests) covering the discovery function, the new wrapper kwargs, and the parser-precedence guarantee.

No changes to server.py, generate.py, or any tool-parser module. This is purely a discovery-layer addition.
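For concreteness, a minimal sketch of the discovery helper described in the list above (the dict keys here are illustrative assumptions; the actual code in mlx_lm/tokenizer_utils.py may differ):

```python
def _infer_markers_from_config(tokenizer):
    """Read Gemma-4-style tool-call marker fields off the tokenizer.

    Sketch only: returns a dict of marker strings when both fields are
    present, otherwise None, mirroring the behaviour described in this PR.
    """
    start = getattr(tokenizer, "stc_token", None)
    end = getattr(tokenizer, "etc_token", None)
    if start and end:
        return {"tool_call_start": start, "tool_call_end": end}
    return None
```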

Why tool markers only (thinking deferred)

The stc_token / etc_token pair is semantically unambiguous: anything between them is a tool call. soc_token / eoc_token are trickier — Gemma 4's soc_token is <|channel>, but the actual thinking-start sequence is the multi-token <|channel>thought (channels can have other labels too). _infer_thinking() already handles that case correctly. Auto-discovering thinking markers from soc_token would require model-specific heuristics, which defeats the purpose. The think_start / think_end kwargs on TokenizerWrapper are there as infrastructure so a future PR can plug in a cleaner convention once more models adopt one.
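For illustration, the bypass behaviour of the new kwargs might look roughly like this (function and attribute names are stand-ins, not the real wrapper internals):

```python
def _infer_thinking(tokenizer):
    # Stand-in for mlx-lm's existing heuristic, which already handles
    # multi-token sequences such as Gemma 4's "<|channel>thought" correctly.
    return None, None

def resolve_thinking_markers(tokenizer, think_start=None, think_end=None):
    # Explicit kwargs win and skip inference entirely, mirroring the
    # think_start / think_end behaviour added to TokenizerWrapper.
    if think_start is not None and think_end is not None:
        return think_start, think_end
    return _infer_thinking(tokenizer)
```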

Backwards compatibility

The new elif branch in load() is only reachable when _infer_tool_parser() returns None and the tokenizer exposes the Gemma-4-style fields. For every currently-supported parser model this branch is a no-op; the parser's markers continue to win. test_parser_markers_take_precedence (using Qwen3-4B-4bit) exercises this explicitly.
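Schematically, the precedence guarantee looks like this (a simplified sketch, not the actual load() code; it reuses the _infer_markers_from_config sketch from earlier):

```python
def resolve_tool_markers(tokenizer, matched_parser_markers=None):
    # Sketch of the load() precedence described above. A matched parser
    # module's markers always win (unchanged behaviour); config-discovered
    # markers are only a fallback, so this branch is a no-op for every
    # currently-supported parser model.
    if matched_parser_markers is not None:
        return matched_parser_markers
    return _infer_markers_from_config(tokenizer)  # sketch defined earlier
```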

Test plan

  • pre-commit run --files mlx_lm/tokenizer_utils.py tests/test_tokenizers.py — black + isort clean
  • python -m pytest tests/test_tool_parsing.py tests/test_tokenizers.py — 14 passed (8 existing + 6 new)
  • python -m pytest tests/test_server.py tests/test_prompt_cache.py — all passed
  • python -m pytest tests/test_generate.py — identical pass/fail set to main (the 4 failures in test_generate.py::TestGenerate::test_many_batches / test_batch_continued_generation* already occur on main in the same local environment, likely due to the missing test_data.zip artifact that CI downloads)
  • TestMarkerDiscovery tests use a plain stub tokenizer (no MagicMock patching) to keep them hermetic
  • Qwen3 integration test (test_parser_markers_take_precedence) confirms parser-module markers still take precedence
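A flavour of the stub-based tests (hypothetical class and token names, reusing the _infer_markers_from_config sketch from above; not the actual test code):

```python
import unittest

class StubTokenizer:
    """Plain stub, no MagicMock: just the two Gemma-4-style attributes."""
    stc_token = "<tool_start>"  # placeholder, not a real Gemma 4 token
    etc_token = "<tool_end>"

class TestMarkerDiscoverySketch(unittest.TestCase):
    def test_discovers_markers(self):
        self.assertIsNotNone(_infer_markers_from_config(StubTokenizer()))

    def test_absent_fields_return_none(self):
        self.assertIsNone(_infer_markers_from_config(object()))

if __name__ == "__main__":
    unittest.main()
```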

🤖 drafted with Claude Code, reviewed before submitting.

Gemma 4 publishes structured token fields in tokenizer_config.json
(stc_token / etc_token for tool calls) that HuggingFace's
AutoTokenizer exposes as attributes. Previously mlx-lm could only
reach these markers via hardcoded pattern matching in
_infer_tool_parser().

Add _infer_markers_from_config() which reads stc_token / etc_token
directly from the tokenizer. Parser-module markers still take
precedence when a parser is matched; config-discovered markers
fall back to enable state-machine streaming for models that ship
the fields but do not yet have a dedicated parser module.

Also add think_start / think_end parameters to TokenizerWrapper
as infrastructure for a future thinking-marker auto-discovery
pass (soc_token / eoc_token semantics are deferred until more
models adopt the convention).

Ref: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4