
fix(server): build xtc_special_tokens as a flat list#1408

Open
devYRPauli wants to merge 1 commit into ml-explore:main from devYRPauli:fix/server-xtc-special-tokens-flat

@devYRPauli

Bug

_make_sampler in mlx_lm/server.py builds xtc_special_tokens as a nested list:

xtc_special_tokens=[
    tokenizer.eos_token_id,   # int
    tokenizer.encode("\n"),   # list, e.g. [198]
],

That produces [int, [int]] (e.g. [50256, [198]]). apply_xtc expects a flat List[int], because it does mask[..., xtc_special_tokens] = False. With the nested list it raises:

ValueError: Initialization encountered extra dimension.

So any server request with temperature > 0 and xtc_probability > 0 fails. (At temperature == 0, make_sampler short-circuits to argmax and never applies XTC, which is why the bug does not show up on every request.)
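To see why the list has to be flat, here is a minimal pure-Python sketch of the XTC masking step. It is written from the description above and the usual XTC rule (keep only the least likely of the tokens above the threshold), not copied from mlx_lm. The name xtc_mask and the use of plain lists instead of mx.array are assumptions for illustration, and the random xtc_probability gate and the final -inf replacement are omitted. With plain lists a nested entry fails with TypeError rather than MLX's ValueError, but the cause is the same: one "token ID" is a list, not an int.

```python
from typing import List, Sequence


def xtc_mask(
    probs: Sequence[float], threshold: float, special_tokens: List[int]
) -> List[bool]:
    """Return True for each token XTC would exclude (hypothetical helper).

    XTC keeps only the least likely of the tokens whose probability exceeds
    `threshold`; every more likely token is marked for exclusion.
    """
    above = [p for p in probs if p > threshold]
    if not above:
        # No token clears the threshold, so nothing is excluded.
        return [False] * len(probs)
    cutoff = min(above)
    mask = [p > cutoff for p in probs]
    # Special tokens (EOS, newline) are never excluded. Each entry is used
    # directly as an index, so a nested entry such as [198] fails here.
    for token_id in special_tokens:
        mask[token_id] = False
    return mask


probs = [0.5, 0.3, 0.15, 0.05]
print(xtc_mask(probs, 0.1, [0]))  # [False, True, False, False]
try:
    xtc_mask(probs, 0.1, [3, [1]])  # nested, like [eos_id, encode("\n")]
except TypeError as err:
    print("nested list:", err)
```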

generate.py and chat.py already build this correctly:

xtc_special_tokens=tokenizer.encode("\n") + list(tokenizer.eos_token_ids)

Fix

Build the list the same flat way in server.py.
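The difference between the two constructions can be checked without downloading a model. The sketch below uses a hypothetical StubTokenizer that returns the gpt2 values quoted in the Verification section (encode("\n") -> [198], EOS ID 50256). It only mimics the encode, eos_token_id and eos_token_ids members that the snippets above use; eos_token_ids is assumed to be a set, which is why it is wrapped in list(). The two helper function names are likewise made up for illustration.

```python
from typing import List, Set


class StubTokenizer:
    """Hypothetical stand-in for a gpt2-like tokenizer wrapper."""

    eos_token_id: int = 50256
    eos_token_ids: Set[int] = {50256}

    def encode(self, text: str) -> List[int]:
        # gpt2 encodes "\n" as the single token 198; other text is ignored here.
        return [198] if text == "\n" else []


def special_tokens_before(tokenizer) -> list:
    # Old server.py construction: an int followed by a list -> nested.
    return [tokenizer.eos_token_id, tokenizer.encode("\n")]


def special_tokens_after(tokenizer) -> List[int]:
    # Construction shared with generate.py and chat.py: list + list -> flat.
    return tokenizer.encode("\n") + list(tokenizer.eos_token_ids)


tok = StubTokenizer()
print(special_tokens_before(tok))  # [50256, [198]]
print(special_tokens_after(tok))   # [198, 50256]
```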

Verification

A real tokenizer (gpt2) reproduces the exact construction and failure:

tokenizer.encode("\n")  -> [198]   (list)
tokenizer.eos_token_id  -> 50256   (int)
server builds           -> [50256, [198]]
apply_xtc(..., [50256, [198]])  -> ValueError: Initialization encountered extra dimension.
apply_xtc(..., [198, 50256])    -> OK

Added a network-free regression test (tests/test_server.py::TestMakeSampler) that calls _make_sampler with temperature=0.6 and xtc_probability=1.0 and then runs the returned sampler. It fails on the current code with the ValueError above and passes with the fix. tests/test_sample_utils.py still passes (7 tests), and the change is black- and isort-clean.

_make_sampler built xtc_special_tokens as [tokenizer.eos_token_id,
tokenizer.encode("\n")], i.e. [int, list] -> a nested list. apply_xtc
expects a flat List[int] (it does mask[..., xtc_special_tokens] = False),
so any request with temperature > 0 and xtc_probability > 0 raised
"ValueError: Initialization encountered extra dimension." and failed.

Build it the same way generate.py and chat.py already do:
tokenizer.encode("\n") + list(tokenizer.eos_token_ids). Adds a
network-free regression test.