
fix(utils): skip already-quantized layers in load_model._quantize predicate #1216

Open
adurham wants to merge 1 commit into ml-explore:main from adurham:mlx-lm-quantize-skip-already-quantized

Conversation

Contributor

adurham commented Apr 27, 2026

Summary

Models that pre-quantize specific layers in their __init__ (for example DeepSeek V4's DeepseekV4MoE calling SwitchLinear.to_quantized(..., mode="mxfp4") on its expert projections so the experts have a non-default quantization mode) trip load_model._quantize's walker if those same layer paths also appear in config["quantization"] as per-layer overrides. The predicate returns the override dict for the path, so nn.quantize tries to re-quantize the already-Quantized* module and raises:

ValueError: Unable to quantize model of type
<class 'mlx_lm.models.switch_layers.QuantizedSwitchLinear'>

The existing not hasattr(m, \"to_quantized\") clause already encodes the "module is already quantized, skip" intent — moving it ahead of the per-layer-override check makes that intent take effect even when the override map covers a pre-quantized path. End state for non-pre-quantized layers is unchanged.
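
To make the new decision order concrete, here is a minimal, self-contained sketch of the predicate exercised against stand-in modules; DummyQuantized, DummyLinear, and the config/weights dicts below are hypothetical illustrations, not mlx-lm code:

# Stand-in modules (hypothetical, for illustration only): a pre-quantized module
# has no `to_quantized`, a regular one does.
class DummyQuantized:
    pass

class DummyLinear:
    def to_quantized(self, **kwargs):
        return self

config = {"quantization": {"ffn.gate_proj": {"mode": "mxfp4", "bits": 4, "group_size": 32}}}
weights = {"mlp.up_proj.scales": None}

def class_predicate(p, m):
    # Skip layers already quantized at construction time...
    if not hasattr(m, "to_quantized"):
        return False
    # Handle custom per layer quantizations
    if p in config["quantization"]:
        return config["quantization"][p]
    return f"{p}.scales" in weights

print(class_predicate("ffn.gate_proj", DummyQuantized()))  # False: pre-quantized path is skipped despite the override
print(class_predicate("ffn.gate_proj", DummyLinear()))     # override dict: per-layer settings still apply to regular modules
print(class_predicate("mlp.up_proj", DummyLinear()))       # True: quantize with the global settings

Before the change, the first call would have returned the override dict, which is what sends nn.quantize after the already-quantized module.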

Repro

from mlx_lm import load

model, tok = load("mlx-community/DeepSeek-V4-Flash-6bit")
# ValueError: Unable to quantize model of type <class 'mlx_lm.models.switch_layers.QuantizedSwitchLinear'>

The 6bit checkpoint declares model.layers.<i>.ffn.switch_mlp.{gate,up,down}_proj overrides at mode="mxfp4", bits=4, group_size=32, matching what the model code pre-applies, and that is what triggers the re-quantize attempt.
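
For reference, such per-layer overrides live in the checkpoint config's quantization section keyed by module path, which is what the p in config["quantization"] lookup matches against. A hypothetical excerpt with illustrative values (not copied from the actual checkpoint):

config = {
    "quantization": {
        "group_size": 64,  # global default (illustrative)
        "bits": 6,         # global default (illustrative)
        # per-layer override duplicating what the model pre-applies in __init__:
        "model.layers.0.ffn.switch_mlp.gate_proj": {
            "mode": "mxfp4",
            "bits": 4,
            "group_size": 32,
        },
    }
}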

Diff

A two-line clause moved ahead of the override check, plus a one-line comment:

 def class_predicate(p, m):
+    # Skip layers already quantized at construction time...
+    if not hasattr(m, "to_quantized"):
+        return False
     # Handle custom per layer quantizations
     if p in config["quantization"]:
         return config["quantization"][p]
-    if not hasattr(m, "to_quantized"):
-        return False
     return f"{p}.scales" in weights
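
For context, this predicate is what load_model._quantize hands to mlx.nn.quantize. A simplified, self-contained sketch of that call site follows; the real mlx-lm code does more, and the surrounding variable names here are assumptions:

import mlx.nn as nn

def _quantize(model, config, weights):
    # Simplified sketch, not the full mlx-lm implementation.
    quantization = config["quantization"]

    def class_predicate(p, m):
        # Skip layers already quantized at construction time...
        if not hasattr(m, "to_quantized"):
            return False
        # Handle custom per layer quantizations
        if p in config["quantization"]:
            return config["quantization"][p]
        return f"{p}.scales" in weights

    # nn.quantize walks the model, calls class_predicate(path, module) for each
    # leaf module, skips it on False, and otherwise quantizes it (a returned
    # dict overrides the global parameters for that layer).
    nn.quantize(
        model,
        group_size=quantization["group_size"],
        bits=quantization["bits"],
        class_predicate=class_predicate,
    )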

🤖 Generated with Claude Code
