Auto-discover tool-call markers from tokenizer config fields #1163
Open
michaelstingl wants to merge 1 commit into ml-explore:main from
Conversation
Gemma 4 publishes structured token fields in tokenizer_config.json (stc_token / etc_token for tool calls) that HuggingFace's AutoTokenizer exposes as attributes. Previously mlx-lm could only reach these markers via hardcoded pattern matching in _infer_tool_parser(). Add _infer_markers_from_config(), which reads stc_token / etc_token directly from the tokenizer. Parser-module markers still take precedence when a parser is matched; config-discovered markers serve as a fallback, enabling state-machine streaming for models that ship the fields but do not yet have a dedicated parser module.

Also add think_start / think_end parameters to TokenizerWrapper as infrastructure for a future thinking-marker auto-discovery pass (soc_token / eoc_token semantics are deferred until more models adopt the convention).

Ref: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
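For context, a minimal sketch of how such config fields surface once `AutoTokenizer` has loaded them; the model id and token strings below are placeholders, not real Gemma 4 values:

```python
# Hedged sketch: fields declared in tokenizer_config.json surface as plain
# attributes on the loaded tokenizer object. The model id and token strings
# here are placeholders, not real Gemma 4 values.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("org/some-gemma4-model")  # placeholder id
print(getattr(tok, "stc_token", None))  # tool-call start marker, or None
print(getattr(tok, "etc_token", None))  # tool-call end marker, or None
```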
Summary
Google's Gemma 4 publishes structured special-token fields in `tokenizer_config.json` (`stc_token`/`etc_token` for tool calls, `soc_token`/`eoc_token` for the thinking channel). HuggingFace's `AutoTokenizer` exposes these as attributes on the tokenizer object, but nothing in mlx-lm currently consumes them. Tool-call marker discovery relies entirely on the hardcoded `_infer_tool_parser()` pattern chain.

This PR adds a small, additive layer that reads `stc_token`/`etc_token` directly from the tokenizer and feeds the discovered markers into the existing `SequenceStateMachine` plumbing in `server.py`. The intent is that if other model authors adopt the same convention, they get marker detection for free, without requiring a new entry in `_infer_tool_parser()`.

Reference: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
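A minimal sketch of what the discovery helper might look like; the actual implementation in `mlx_lm/tokenizer_utils.py` may differ in names and return shape:

```python
def _infer_markers_from_config(tokenizer):
    """Sketch: read Gemma-4-style marker fields off the tokenizer, if present."""
    start = getattr(tokenizer, "stc_token", None)  # tool-call start marker
    end = getattr(tokenizer, "etc_token", None)    # tool-call end marker
    if start and end:
        # The dict key names here are illustrative assumptions.
        return {"tool_call_start": start, "tool_call_end": end}
    return None
```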
What changes
`mlx_lm/tokenizer_utils.py`

- New `_infer_markers_from_config(tokenizer)`: reads `stc_token`/`etc_token` via `getattr`, returns a dict of marker strings (or `None`).
- `TokenizerWrapper.__init__` gains optional `think_start`/`think_end` kwargs. When passed, they bypass `_infer_thinking()`; otherwise the existing fallback runs unchanged.
- `load()` calls `_infer_markers_from_config()` once. When a tool-parser module is matched via `_infer_tool_parser()`, its markers still win (unchanged behaviour). When no parser matches but the tokenizer exposes `stc_token`/`etc_token`, those markers are used, and the state machine can then segregate tool-call content from normal content even without a structured parser (see the sketch after this list).

`tests/test_tokenizers.py`

- New `TestMarkerDiscovery` class (6 tests) covering the discovery function, the new wrapper kwargs, and the parser-precedence guarantee.

No changes to `server.py`, `generate.py`, or any tool-parser module; this is purely a discovery-layer addition.
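Roughly, the precedence logic in `load()` behaves like the following sketch; names such as `parser` and `markers` are illustrative, not the PR's exact code:

```python
# Hedged sketch of the marker-precedence wiring described above.
parser = _infer_tool_parser(tokenizer)                   # existing hardcoded chain
config_markers = _infer_markers_from_config(tokenizer)   # new discovery pass

if parser is not None:
    markers = parser.markers       # a matched parser module still wins
elif config_markers is not None:
    markers = config_markers       # fallback: config-discovered markers feed
                                   # the state machine without a parser module
else:
    markers = None                 # unchanged: no markers discovered
```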
Why tool markers only (thinking deferred)
The `stc_token`/`etc_token` pair is semantically unambiguous: anything between them is a tool call. `soc_token`/`eoc_token` are trickier. Gemma 4's `soc_token` is `<|channel>`, but the actual thinking-start sequence is the multi-token `<|channel>thought` (channels can have other labels too), and `_infer_thinking()` already handles that case correctly. Auto-discovering thinking markers from `soc_token` would require model-specific heuristics, which defeats the purpose. The `think_start`/`think_end` kwargs on `TokenizerWrapper` are there as infrastructure so a future PR can plug in a cleaner convention once more models adopt one.
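The kwargs are plain pass-throughs; hypothetical usage might look like this (marker strings are placeholders, and `hf_tokenizer` stands in for an already-loaded HF tokenizer):

```python
from mlx_lm.tokenizer_utils import TokenizerWrapper

# When think_start/think_end are supplied, _infer_thinking() is bypassed;
# when omitted, the existing fallback runs unchanged.
wrapper = TokenizerWrapper(
    hf_tokenizer,             # assumption: an already-loaded HF tokenizer
    think_start="<think>",    # placeholder thinking-start marker
    think_end="</think>",     # placeholder thinking-end marker
)
```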
Backwards compatibility

The new `elif` branch in `load()` is only reachable when `_infer_tool_parser()` returns `None` and the tokenizer exposes the Gemma-4-style fields. For every currently supported parser model this branch is a no-op; the parser's markers continue to win. `test_parser_markers_take_precedence` (using Qwen3-4B-4bit) exercises this explicitly.

Test plan
- `pre-commit run --files mlx_lm/tokenizer_utils.py tests/test_tokenizers.py`: black + isort clean
- `python -m pytest tests/test_tool_parsing.py tests/test_tokenizers.py`: 14 passed (8 existing + 6 new)
- `python -m pytest tests/test_server.py tests/test_prompt_cache.py`: all passed
- `python -m pytest tests/test_generate.py`: identical pass/fail set as `main` (the 4 failing tests in `test_generate.py::TestGenerate::test_many_batches`/`test_batch_continued_generation*` already fail on `main` in the same local environment, likely due to the missing `test_data.zip` artifact that CI downloads)
- The `TestMarkerDiscovery` tests use a plain stub tokenizer (no `MagicMock` patching) to keep them hermetic; a sketch of the pattern follows below.
- `test_parser_markers_take_precedence` confirms parser-module markers still take precedence.

🤖 drafted with Claude Code, reviewed before submitting.
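For reference, a hedged sketch of the stub-tokenizer pattern described above; the real `TestMarkerDiscovery` cases may be structured differently:

```python
import unittest

from mlx_lm.tokenizer_utils import _infer_markers_from_config


class StubTokenizer:
    # Placeholder marker strings; only the attribute names matter here.
    stc_token = "<start_of_tool_call>"
    etc_token = "<end_of_tool_call>"


class TestMarkerDiscoverySketch(unittest.TestCase):
    def test_discovers_markers_when_fields_present(self):
        self.assertIsNotNone(_infer_markers_from_config(StubTokenizer()))

    def test_returns_none_when_fields_absent(self):
        self.assertIsNone(_infer_markers_from_config(object()))
```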