fix(gemma4): drop KV-shared layer projections in sanitize #1205
Open

Fox13 wants to merge 1 commit into ml-explore:main
Conversation
Quantized Gemma 4 safetensors files include k_proj/v_proj/k_norm weights for layers >= num_hidden_layers - num_kv_shared_layers, even though those layers reuse KV from an earlier layer and have no corresponding model attributes. This causes a strict-load failure: ValueError: Received 126 parameters not in model: ... Drop those tensors in sanitize(), the complement of ml-explore#1158 (which stopped mlx-lm from creating unused projection slots on those layers). Together the two fixes make standard and custom-quantized Gemma 4 models load cleanly. Tested with mlx-community/gemma-4-e4b-it-4bit and FakeRockert543/gemma-4-e4b-it-MLX-4bit on an M4 MacBook Air (24 GB).
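For concreteness, a small sketch of which layer indices carry the spurious tensors. The two config fields are the ones named above; the numeric values are made up purely for illustration.

```python
# Illustrative values only; read the real ones from the model's config.json.
num_hidden_layers = 35
num_kv_shared_layers = 5

# Layers at or above this index reuse KV from an earlier layer, so the
# k_proj / v_proj / k_norm tensors the checkpoint ships for them have no
# destination in the model.
first_shared = num_hidden_layers - num_kv_shared_layers  # 30
shared_layers = list(range(first_shared, num_hidden_layers))  # [30, ..., 34]
```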
Problem
Quantized Gemma 4 safetensors files ship k_proj/v_proj/k_norm weights for layers >= num_hidden_layers - num_kv_shared_layers, even though those layers reuse KV projections from an earlier layer and have no corresponding model attributes. Loading any quantized Gemma 4 model fails with:

ValueError: Received 126 parameters not in model: ...

This affects mlx-community/gemma-4-e4b-it-4bit (52k downloads) and other quantized variants.
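A minimal reproduction, using the standard mlx-lm loading path:

```python
from mlx_lm import load

# Before this fix, the strict weight load raises because the checkpoint's
# KV-shared-layer projections have no matching model parameters:
model, tokenizer = load("mlx-community/gemma-4-e4b-it-4bit")
# ValueError: Received 126 parameters not in model: ...
```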
Fix

Drop those tensors in Gemma4TextModel.sanitize(), the complement of #1158. That PR stopped mlx-lm from creating unused projection slots for KV-shared layers; this PR drops the incoming checkpoint weights that would otherwise target those slots. Together they make quantized Gemma 4 models load cleanly.
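A minimal sketch of the change, not the exact diff. It assumes sanitize receives the flat weights dict keyed like model.layers.<idx>.self_attn.<name>.<suffix>, and that the two layer counts are reachable from the model; the config attribute names below are assumptions.

```python
def sanitize(self, weights):
    # Hypothetical config attributes; the real field names may differ.
    first_shared = (
        self.config.num_hidden_layers - self.config.num_kv_shared_layers
    )
    dropped = ("k_proj", "v_proj", "k_norm")

    def keep(key):
        parts = key.split(".")
        if "layers" not in parts:
            return True  # embeddings, final norm, etc.
        layer_idx = int(parts[parts.index("layers") + 1])
        # Drop k_proj/v_proj/k_norm tensors (including .scales/.biases on
        # quantized checkpoints) for KV-shared layers, which have no matching
        # module attributes after #1158.
        return not (layer_idx >= first_shared and any(n in parts for n in dropped))

    return {k: v for k, v in weights.items() if keep(k)}
```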
Testing

Tested on an M4 MacBook Air (24 GB) with mlx-community/gemma-4-e4b-it-4bit. The model loads cleanly and runs at 18–25 tok/s across 500–4K-token prompts. Before this fix it failed immediately with the ValueError above.