fix: merge config.json eos_token_id with tokenizer-derived value instead of replacing it #1414
Open
EmreCelenli wants to merge 1 commit into
Summary
`TokenizerWrapper.__init__` treats `eos_token_ids` passed in from `config.json` as a hard replacement for whatever the tokenizer itself correctly derives.
When `config.json` has a stale value (common in third-party MLX-quantized repos), the correct stop token is silently discarded, causing generation to never stop at the right boundary. The real stop token (e.g. `<|im_end|>`) leaks into output as plain text and the model keeps generating until max tokens.
Root Cause
In `utils.py`, both `load()` and `sharded_load()` do:
And `TokenizerWrapper.__init__` treats this as a strict replacement:
So if `config.json` has a stale value, the tokenizer's own correct derivation is silently thrown away with no warning.
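The replacement pattern described above can be illustrated with a minimal sketch. `resolve_eos_strict` and its parameters are hypothetical stand-ins, not the actual `TokenizerWrapper` code; the token IDs are the ones reported under Verified.

```python
def resolve_eos_strict(tokenizer_eos_id, config_eos_ids=None):
    """Hypothetical stand-in for strict-replacement behavior (not the real code)."""
    # Start from the ID the tokenizer derives on its own.
    eos_ids = {tokenizer_eos_id}
    # Any value supplied from config.json replaces the derived set entirely.
    if config_eos_ids is not None:
        if isinstance(config_eos_ids, int):
            config_eos_ids = [config_eos_ids]
        eos_ids = set(config_eos_ids)
    return eos_ids


derived_id = 151645  # <|im_end|>, derived by the tokenizer
stale_id = 151643    # <|endoftext|>, stale value in config.json

eos_ids = resolve_eos_strict(derived_id, stale_id)
print(derived_id in eos_ids)  # False: the correct stop token is gone
```

Under these assumptions, nothing signals that the derived ID was dropped, which matches the "no warning" symptom.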
Fix
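The behavior named in the title, merging rather than replacing, could look roughly like the following. This is a sketch under assumptions, not the diff from this PR; `merge_eos_token_ids` and its parameter names are hypothetical.

```python
def merge_eos_token_ids(tokenizer_eos_id, config_eos_ids=None):
    """Hypothetical helper: union of tokenizer-derived and config-supplied EOS IDs."""
    eos_ids = set()
    # Keep whatever the tokenizer derived on its own.
    if tokenizer_eos_id is not None:
        eos_ids.add(tokenizer_eos_id)
    # Add config.json values on top instead of overwriting.
    if config_eos_ids is not None:
        if isinstance(config_eos_ids, int):
            config_eos_ids = [config_eos_ids]
        eos_ids.update(config_eos_ids)
    return eos_ids


merged = merge_eos_token_ids(151645, 151643)
print(sorted(merged))  # [151643, 151645]
```

With a union, a stale config value can at worst add an extra stop token; it can no longer remove the one the tokenizer derived.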
Verified
Model: `mlx-community/Qwen2.5-Coder-7B-Instruct-4bit`, 16GB M1 Mac.
`config.json` has `eos_token_id: 151643` (`<|endoftext|>`, stale pretraining EOS).
Tokenizer correctly derives `151645` (`<|im_end|>`).
Real-world symptom: `<|im_end|>` no longer leaks into output when using `mlx_lm.server` with this model. Generation speed returns to normal since the model stops at the correct boundary instead of running to max tokens.
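The effect on stopping can be shown with a toy loop. This is only an illustration: `generate_until_stop` is a hypothetical function, and the token stream is invented, with only the two EOS IDs taken from the report above.

```python
def generate_until_stop(token_stream, eos_ids, max_tokens):
    """Toy generation loop: stop at an EOS ID or after max_tokens tokens."""
    out = []
    for tok in token_stream:
        if len(out) >= max_tokens or tok in eos_ids:
            break
        out.append(tok)
    return out


# Invented stream; 151645 (<|im_end|>) marks where the model should stop.
stream = [10, 11, 12, 151645, 13, 14, 15, 16]

stale_only = generate_until_stop(stream, {151643}, max_tokens=6)
merged_set = generate_until_stop(stream, {151643, 151645}, max_tokens=6)

print(stale_only)  # [10, 11, 12, 151645, 13, 14]
print(merged_set)  # [10, 11, 12]
```

With only the stale ID, `151645` is emitted as ordinary output and the loop runs to the token limit; with both IDs in the set it stops at the intended boundary.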
Tests
203 tests pass. Black formatting check passed.