fix(server): build xtc_special_tokens as a flat list by devYRPauli · Pull Request #1408 · ml-explore/mlx-lm

devYRPauli · 2026-06-15T21:00:10Z

Bug

_make_sampler in mlx_lm/server.py builds xtc_special_tokens as a nested list:

xtc_special_tokens=[
    tokenizer.eos_token_id,   # int
    tokenizer.encode("\n"),   # list, e.g. [198]
],

That produces [int, [int]] (e.g. [50256, [198]]). apply_xtc expects a flat List[int] because it uses the list directly as an index: mask[..., xtc_special_tokens] = False. With the nested list, that indexing raises:

ValueError: Initialization encountered extra dimension.

So any server request with temperature > 0 and xtc_probability > 0 fails. At temperature == 0, make_sampler short-circuits to argmax and never applies XTC, which is why the bug is not hit on every request.
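The control flow described above can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_make_sampler is a hypothetical stand-in written for this illustration, not mlx_lm's make_sampler, and its error message is invented. The real failure is the MLX ValueError quoted above.

```python
import math
import random


def toy_make_sampler(temp, xtc_probability, xtc_special_tokens):
    """Toy model of the control flow only; not mlx_lm's make_sampler."""
    if temp == 0:
        # Greedy path: XTC is never consulted, so a malformed list goes unnoticed.
        return lambda logits: max(range(len(logits)), key=lambda i: logits[i])

    def sampler(logits):
        if xtc_probability > 0:
            # Stand-in for the mask indexing in apply_xtc, which needs plain ints.
            if not all(isinstance(t, int) for t in xtc_special_tokens):
                raise ValueError("xtc_special_tokens must be a flat list of ints")
        # Plain temperature sampling over the toy logits.
        weights = [math.exp(x / temp) for x in logits]
        return random.choices(range(len(logits)), weights=weights)[0]

    return sampler


logits = [0.1, 2.0, 0.3]
nested = [2, [1]]  # same shape problem as [50256, [198]]
flat = [1, 2]

print(toy_make_sampler(0.0, 1.0, nested)(logits))  # greedy path -> 1
try:
    toy_make_sampler(0.6, 1.0, nested)(logits)
except ValueError as err:
    print("sampling path failed:", err)
print(toy_make_sampler(0.6, 1.0, flat)(logits) in range(3))  # True
```

The nested list only causes a failure on the sampling path, which matches the observation that requests with temperature == 0 never surface the bug.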

generate.py and chat.py already build this correctly:

xtc_special_tokens=tokenizer.encode("\n") + list(tokenizer.eos_token_ids)

Fix

Build the list the same flat way in server.py.
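As a minimal sketch of the difference between the two constructions, the following compares them without MLX or a real model. StubTokenizer and is_flat_int_list are hypothetical names introduced for this illustration; the token values mirror the gpt2 numbers quoted in this description, and eos_token_ids is assumed to be an iterable of ints.

```python
class StubTokenizer:
    """Hypothetical stand-in for the real tokenizer; values mirror the gpt2 example."""

    eos_token_id = 50256
    eos_token_ids = [50256]  # assumption: any iterable of ints

    def encode(self, text):
        # Only the newline case matters for this sketch.
        return [198] if text == "\n" else []


def is_flat_int_list(tokens):
    """True when every element is a plain int, as a List[int] index requires."""
    return all(isinstance(t, int) for t in tokens)


tokenizer = StubTokenizer()

# Old server.py construction: an int followed by a list, so the result is nested.
nested = [tokenizer.eos_token_id, tokenizer.encode("\n")]

# Construction used by generate.py and chat.py: list + list, so the result is flat.
flat = tokenizer.encode("\n") + list(tokenizer.eos_token_ids)

print(nested, is_flat_int_list(nested))  # [50256, [198]] False
print(flat, is_flat_int_list(flat))      # [198, 50256] True
```

Concatenating two lists also handles tokenizers with more than one EOS id, which a single eos_token_id entry would not cover.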

Verification

A real tokenizer (gpt2) reproduces the exact construction and failure:

tokenizer.encode("\n")  -> [198]   (list)
tokenizer.eos_token_id  -> 50256   (int)
server builds           -> [50256, [198]]
apply_xtc(..., [50256, [198]])  -> ValueError: Initialization encountered extra dimension.
apply_xtc(..., [198, 50256])    -> OK

Added a network-free regression test (tests/test_server.py::TestMakeSampler) that calls _make_sampler with temperature=0.6 and xtc_probability=1.0, then runs the returned sampler. It fails on the current code with the ValueError above and passes with the fix. tests/test_sample_utils.py still passes (7 tests), and black and isort are clean.

_make_sampler built xtc_special_tokens as [tokenizer.eos_token_id, tokenizer.encode("\n")], i.e. [int, list] -> a nested list. apply_xtc expects a flat List[int] (it does mask[..., xtc_special_tokens] = False), so any request with temperature > 0 and xtc_probability > 0 raised "ValueError: Initialization encountered extra dimension." and failed. Build it the same way generate.py and chat.py already do: tokenizer.encode("\n") + list(tokenizer.eos_token_ids). Adds a network-free regression test.

devYRPauli wants to merge 1 commit into ml-explore:main from devYRPauli:fix/server-xtc-special-tokens-flat
