
minimax: validate head_dim against checkpoint, drop unused shared_intermediate_size#1204

Open
adurham wants to merge 1 commit into ml-explore:main from adurham:minimax-validate-head-dim

Conversation

Contributor

@adurham commented Apr 26, 2026

Summary

Two small mlx_lm/models/minimax.py changes:

  1. Validate head_dim against the checkpoint in Model.sanitize(). ModelArgs.head_dim is Optional[int] = None with a silent fallback to hidden_size // num_attention_heads (e.g. 64), which is correct for older MiniMax variants but wrong for the released M2 checkpoint, which ships head_dim=128. Today's mlx-community configs set the field explicitly so loads succeed, but any config sanitizer that strips unknown fields would silently produce 3072-wide projections against a 6144-wide checkpoint and fail with a cryptic shape error deep in the load path. The fix compares the effective head_dim × num_attention_heads against the actual q_proj weight shape and raises a targeted ValueError pointing at config.json (a sketch of this check follows the list).

  2. Drop shared_intermediate_size from ModelArgs. The field is required (int, no default) but never referenced in the model body, and the released M2 config ships intermediate_size=0 for it (no shared expert). BaseModelArgs.from_dict filters unknown keys, so existing configs containing the field still load.
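For illustration, here is a minimal sketch of what the check in Model.sanitize() could look like. The q_proj weight key and the exact error message are assumptions for this sketch, not the literal patch:

```python
def sanitize(self, weights):
    # Effective head_dim: explicit config value, else the silent fallback.
    args = self.args
    head_dim = args.head_dim or args.hidden_size // args.num_attention_heads
    expected = head_dim * args.num_attention_heads

    # Hypothetical key name for the first layer's query projection weight;
    # its shape is (out_features, in_features).
    q_proj = weights.get("model.layers.0.self_attn.q_proj.weight")
    if q_proj is not None and q_proj.shape[0] != expected:
        raise ValueError(
            f"head_dim ({head_dim}) x num_attention_heads "
            f"({args.num_attention_heads}) = {expected} does not match the "
            f"q_proj output width {q_proj.shape[0]} in the checkpoint; "
            "check head_dim in config.json."
        )
    return weights
```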

Verification

  • tests/test_models.py::test_all_models passes (the test fixture at line 2717 includes "shared_intermediate_size": 128; from_dict's parameter filter drops the unknown key cleanly, the model builds, and forward pass + cache + batch + deepcopy are all OK). A sketch of that filtering follows this list.
  • black==25.1.0 and isort==6.0.0 --profile=black: clean.
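As a self-contained reference for the filtering behavior mentioned above, this sketch shows the BaseModelArgs.from_dict pattern of keeping only known constructor parameters (field names here are illustrative):

```python
import inspect
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelArgs:
    hidden_size: int
    num_attention_heads: int
    head_dim: Optional[int] = None

    @classmethod
    def from_dict(cls, params):
        # Keep only keys that are real constructor parameters, so configs
        # still carrying shared_intermediate_size load without error.
        allowed = inspect.signature(cls).parameters
        return cls(**{k: v for k, v in params.items() if k in allowed})

args = ModelArgs.from_dict(
    {"hidden_size": 6144, "num_attention_heads": 48,
     "head_dim": 128, "shared_intermediate_size": 0}
)
print(args.head_dim)  # 128; the unknown key was dropped cleanly
```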

Happy to split into two PRs if you'd prefer reviewing the validation and the field removal separately.

minimax: validate head_dim against checkpoint, drop unused shared_intermediate_size

ModelArgs.head_dim was Optional[int] = None with a silent fallback to
hidden_size // num_attention_heads (= 64) — correct for older MiniMax
variants but wrong against the released M2, which ships head_dim=128.
Current mlx-community configs set the field explicitly so we load
correctly today, but a config sanitizer that strips unknown fields
would silently produce 3072-wide projections against a 6144-wide
checkpoint and fail with a cryptic shape error at load time.

Adds a validation in Model.sanitize() that compares the effective
head_dim × num_attention_heads against the actual q_proj weight shape
in the checkpoint and raises a targeted ValueError pointing at the
config if they disagree.

Also drops the shared_intermediate_size field from ModelArgs. M2 has
no shared expert (intermediate_size=0 in the released config), the
field is never referenced in the model body, and BaseModelArgs.from_dict
already filters unknown keys — so configs with the field will still
load.
