
fix(gemma4): drop KV-shared layer projections in sanitize #1205

Open

Fox13 wants to merge 1 commit into ml-explore:main from Fox13:fix/gemma4-kv-shared-sanitize

Conversation


Fox13 commented on Apr 26, 2026

Problem

Quantized Gemma 4 safetensors files ship k_proj/v_proj/k_norm weights for layers >= num_hidden_layers - num_kv_shared_layers, even though those layers reuse KV projections from an earlier layer and have no corresponding model attributes. Loading any quantized Gemma 4 model fails with:

ValueError: Received 126 parameters not in model:
model.layers.24.self_attn.k_norm.weight,
model.layers.24.self_attn.k_proj.biases,
...
model.layers.41.self_attn.v_proj.weight

This affects mlx-community/gemma-4-e4b-it-4bit (52k downloads) and other quantized variants. The count checks out: 126 stray parameters is exactly 18 KV-shared layers (24 through 41) times 7 tensors each, i.e. k_norm.weight plus the quantized weight/scales/biases triple for each of k_proj and v_proj.
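One way to confirm the stray tensors directly, sketched against a single-file checkpoint (the filename is illustrative since real checkpoints may be sharded, and the cutoff of 24 is specific to e4b, taken from the error message above):

import re

import mlx.core as mx

# Illustrative check: count checkpoint tensors addressed to KV-shared layers.
weights = mx.load("model.safetensors")
pat = re.compile(r"\.layers\.(\d+)\.self_attn\.(k_proj|v_proj|k_norm)\.")
stray = [k for k in weights if (m := pat.search(k)) and int(m.group(1)) >= 24]
print(len(stray))  # 126 for gemma-4-e4b-it-4bit, matching the ValueError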

Fix

Drop those tensors in Gemma4TextModel.sanitize(), the complement of #1158: that PR stopped mlx-lm from creating unused projection slots on KV-shared layers, while this PR drops the incoming checkpoint weights aimed at those slots. Together they make quantized Gemma 4 models load cleanly.

# Skip any k_proj/v_proj/k_norm tensor addressed to a KV-shared layer;
# the model has no attribute to receive it.
m = _re.search(r"\.layers\.(\d+)\.self_attn\.(k_proj|v_proj|k_norm)\.", k)
if m and int(m.group(1)) >= first_kv_shared:
    continue
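For context, a minimal sketch of how the check sits inside sanitize(). The _re alias and the args.num_hidden_layers / args.num_kv_shared_layers fields follow the names used above; the surrounding loop structure is an assumption, not the exact mlx-lm code.

import re as _re

def sanitize(self, weights):
    # First layer index whose KV projections are reused from an earlier
    # layer, per the config fields described in the problem statement.
    first_kv_shared = self.args.num_hidden_layers - self.args.num_kv_shared_layers
    sanitized = {}
    for k, v in weights.items():
        m = _re.search(r"\.layers\.(\d+)\.self_attn\.(k_proj|v_proj|k_norm)\.", k)
        if m and int(m.group(1)) >= first_kv_shared:
            continue  # drop tensors with no matching model attribute
        sanitized[k] = v
    return sanitized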

Testing

Tested on an M4 MacBook Air (24 GB) with mlx-community/gemma-4-e4b-it-4bit and the custom-quantized FakeRockert543/gemma-4-e4b-it-MLX-4bit. Both now load cleanly; the former generates at 18–25 tok/s across 500–4K-token prompts. Before this fix, loading failed immediately with the ValueError above.
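For reference, the load path can be exercised with the standard mlx_lm API (the prompt and max_tokens here are arbitrary):

from mlx_lm import load, generate

# Before this fix, load() raised the ValueError above while applying
# checkpoint weights; with the sanitize() change it completes normally.
model, tokenizer = load("mlx-community/gemma-4-e4b-it-4bit")
print(generate(model, tokenizer, prompt="Hello", max_tokens=32))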
