minimax: validate head_dim against checkpoint, drop unused shared_intermediate_size #1204
Open
adurham wants to merge 1 commit into ml-explore:main from
Conversation
…te_size

ModelArgs.head_dim was Optional[int] = None with a silent fallback to hidden_size // num_attention_heads (= 64): correct for older MiniMax variants but wrong against the released M2, which ships head_dim=128. Current mlx-community configs set the field explicitly, so we load correctly today, but a config sanitizer that strips unknown fields would silently produce 3072-wide projections against a 6144-wide checkpoint and fail with a cryptic shape error at load time.

Adds a check in Model.sanitize() that compares the effective head_dim × num_attention_heads against the actual q_proj weight shape in the checkpoint and raises a targeted ValueError pointing at the config if they disagree.

Also drops the shared_intermediate_size field from ModelArgs. M2 has no shared expert (intermediate_size=0 in the released config), the field is never referenced in the model body, and BaseModelArgs.from_dict already filters unknown keys, so configs that still contain the field will load.
Summary
Two small `mlx_lm/models/minimax.py` changes:

1. Validate `head_dim` against the checkpoint in `Model.sanitize()`. `ModelArgs.head_dim` is `Optional[int] = None` with a silent fallback to `hidden_size // num_attention_heads` (e.g. 64): correct for older MiniMax variants but wrong for the released M2 checkpoint, which ships `head_dim=128`. Today's mlx-community configs set the field explicitly, so loads succeed, but any config sanitizer that strips unknown fields would silently produce 3072-wide projections against a 6144-wide checkpoint and fail with a cryptic shape error deep in load. Compare the effective `head_dim × num_attention_heads` against the actual `q_proj` weight shape and raise a targeted `ValueError` pointing at `config.json` (see the sketch after this list).

2. Drop `shared_intermediate_size` from `ModelArgs`. The field is required (`int`, no default) but never referenced in the model body, and the released M2 config ships `intermediate_size=0` for it (no shared expert). `BaseModelArgs.from_dict` filters unknown keys, so existing configs containing the field still load.
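A minimal sketch of what the proposed check could look like, assuming mlx-lm's `sanitize(self, weights)` hook; the layer-0 `q_proj` weight key and the error message text are illustrative, not the verbatim diff:

```python
def sanitize(self, weights):
    args = self.args
    # Effective head_dim after the Optional[int] fallback.
    head_dim = args.head_dim or args.hidden_size // args.num_attention_heads
    expected = head_dim * args.num_attention_heads
    # Any layer's q_proj works for the comparison; layer 0 is used here.
    q = weights.get("model.layers.0.self_attn.q_proj.weight")
    if q is not None and q.shape[0] != expected:
        raise ValueError(
            f"head_dim ({head_dim}) x num_attention_heads "
            f"({args.num_attention_heads}) = {expected} does not match the "
            f"q_proj output dim {q.shape[0]} in the checkpoint; "
            "check head_dim in config.json."
        )
    return weights
```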
Verification

- `tests/test_models.py::test_all_models` passes (the test fixture at line 2717 includes `"shared_intermediate_size": 128`; `from_dict`'s parameter filter drops the unknown key cleanly, the model builds, and forward pass + cache + batch + deepcopy are all OK; the filter is sketched below).
- `black==25.1.0` and `isort==6.0.0 --profile=black`: clean.

Happy to split this into two PRs if you'd prefer to review the validation and the field removal separately.
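For reference, a self-contained sketch of the key filtering the fixture relies on; the `from_dict` body paraphrases `BaseModelArgs` in `mlx_lm/models/base.py`, and the toy `ModelArgs` fields here are illustrative:

```python
import inspect
from dataclasses import dataclass


@dataclass
class BaseModelArgs:
    @classmethod
    def from_dict(cls, params):
        # Keep only keys that match the dataclass's __init__ parameters.
        return cls(
            **{
                k: v
                for k, v in params.items()
                if k in inspect.signature(cls).parameters
            }
        )


@dataclass
class ModelArgs(BaseModelArgs):
    hidden_size: int
    num_attention_heads: int


# "shared_intermediate_size" is no longer a field, so it is dropped silently:
args = ModelArgs.from_dict(
    {"hidden_size": 3072, "num_attention_heads": 48, "shared_intermediate_size": 128}
)
assert args.hidden_size == 3072
```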