
minimax: validate head_dim against checkpoint, drop unused shared_intermediate_size#1204

Open
adurham wants to merge 1 commit into ml-explore:main from adurham:minimax-validate-head-dim

Conversation

Contributor

@adurham commented Apr 26, 2026

Summary

Two small mlx_lm/models/minimax.py changes:

  1. Validate head_dim against the checkpoint in Model.sanitize(). ModelArgs.head_dim is Optional[int] = None with a silent fallback to hidden_size // num_attention_heads (e.g. 64), which is correct for older MiniMax variants but wrong for the released M2 checkpoint, which ships head_dim=128. Today's mlx-community configs set the field explicitly so loads succeed, but any config sanitizer that strips unknown fields would silently produce 3072-wide projections against a 6144-wide checkpoint and fail with a cryptic shape error deep in the load path. The fix compares the effective head_dim × num_attention_heads against the actual q_proj weight shape and raises a targeted ValueError pointing at config.json (a sketch of this check follows the list).

  2. Drop shared_intermediate_size from ModelArgs. The field is required (int, no default) but never referenced in the model body, and the released M2 config ships intermediate_size=0 for it (no shared expert). BaseModelArgs.from_dict filters unknown keys, so existing configs containing the field still load.
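For illustration, here is a minimal sketch of what the check in Model.sanitize() could look like. The q_proj weight key and the exact error message are assumptions for this sketch, not the literal patch:

```python
def sanitize(self, weights):
    # Effective head_dim: explicit config value, else the silent fallback.
    args = self.args
    head_dim = args.head_dim or args.hidden_size // args.num_attention_heads
    expected = head_dim * args.num_attention_heads

    # Hypothetical key name for the first layer's query projection weight;
    # its shape is (out_features, in_features).
    q_proj = weights.get("model.layers.0.self_attn.q_proj.weight")
    if q_proj is not None and q_proj.shape[0] != expected:
        raise ValueError(
            f"head_dim ({head_dim}) x num_attention_heads "
            f"({args.num_attention_heads}) = {expected} does not match the "
            f"q_proj output width {q_proj.shape[0]} in the checkpoint; "
            "check head_dim in config.json."
        )
    return weights
```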

Verification

  • tests/test_models.py::test_all_models passes (the test fixture at line 2717 includes "shared_intermediate_size": 128; from_dict's parameter filter drops the unknown key cleanly, the model builds, and forward pass + cache + batch + deepcopy are all OK). A sketch of that filtering follows this list.
  • black==25.1.0 and isort==6.0.0 --profile=black: clean.
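As a self-contained reference for the filtering behavior mentioned above, this sketch shows the BaseModelArgs.from_dict pattern of keeping only known constructor parameters (field names here are illustrative):

```python
import inspect
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelArgs:
    hidden_size: int
    num_attention_heads: int
    head_dim: Optional[int] = None

    @classmethod
    def from_dict(cls, params):
        # Keep only keys that are real constructor parameters, so configs
        # still carrying shared_intermediate_size load without error.
        allowed = inspect.signature(cls).parameters
        return cls(**{k: v for k, v in params.items() if k in allowed})

args = ModelArgs.from_dict(
    {"hidden_size": 6144, "num_attention_heads": 48,
     "head_dim": 128, "shared_intermediate_size": 0}
)
print(args.head_dim)  # 128; the unknown key was dropped cleanly
```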

Happy to split into two PRs if you'd prefer reviewing the validation and the field removal separately.

minimax: validate head_dim against checkpoint, drop unused shared_intermediate_size

ModelArgs.head_dim was Optional[int] = None with a silent fallback to
hidden_size // num_attention_heads (= 64) — correct for older MiniMax
variants but wrong against the released M2, which ships head_dim=128.
Current mlx-community configs set the field explicitly so we load
correctly today, but a config sanitizer that strips unknown fields
would silently produce 3072-wide projections against a 6144-wide
checkpoint and fail with a cryptic shape error at load time.

Adds a validation in Model.sanitize() that compares the effective
head_dim × num_attention_heads against the actual q_proj weight shape
in the checkpoint and raises a targeted ValueError pointing at the
config if they disagree.

Also drops the shared_intermediate_size field from ModelArgs. M2 has
no shared expert (intermediate_size=0 in the released config), the
field is never referenced in the model body, and BaseModelArgs.from_dict
already filters unknown keys — so configs with the field will still
load.
