feat: implement gemma3n text model in MLXLLM #346
davidkoski merged 6 commits into main from unknown repository
Conversation
* added to LLMModelFactory
* added to MLXService
* added to MLXChatExample
* 4 model references from HF (a registration sketch follows below)
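For illustration only, here is a hedged sketch of what one of those HF model references can look like as a `ModelConfiguration` entry; the property name is hypothetical and the repo id is one of the models referenced in this PR, not code copied from it:

```swift
import MLXLMCommon

// Hypothetical registry entry in the style used by the LLM registry in MLXLLM.
// Only the repo id comes from this PR; the rest is illustrative.
extension ModelConfiguration {
    static let gemma3n_E2B_it_lm_4bit = ModelConfiguration(
        id: "mlx-community/gemma-3n-E2B-it-lm-4bit",
        defaultPrompt: "Why is the sky blue?"
    )
}
```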
Nice! Did you base this on #340, or did you start from scratch based on the Python implementation?
Hey, good question. This implementation is my third attempt, and it was made from scratch based on the Python source from mlx-lm. #340 was a great inspiration, since I am new to this, but sometimes it was misleading. Also, in my initial attempt I was using the mlx-vlm language model, but it wasn't a good reference either. It all worked out once the mlx-lm reference was ready. The key to a successful transpilation is to prompt it piece by piece and verify, and also to feed it this at the end while re-verifying the whole thing: https://swiftpackageindex.com/ml-explore/mlx-swift/main/documentation/mlx/converting-python
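To give a flavor of the kind of line-by-line conversion that guide covers, here is an illustrative sketch (not code from this PR) of a Python mlx-lm expression and a Swift mlx-swift equivalent:

```swift
import MLX

// Python (mlx-lm style):
//   h = x * mx.rsqrt(x.square().mean(-1, keepdims=True) + eps)
// Swift (mlx-swift):
let eps: Float = 1e-6
let x = MLXArray([1.0, 2.0, 3.0] as [Float])
let h = x * rsqrt(x.square().mean(axis: -1, keepDims: true) + eps)
```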
It looks like it needs a swift-format run.
@davidkoski please take another look, I've run swift-format.
davidkoski left a comment
Changes look good, thank you!
Xcode 16 fails to build this. It is caused by a trailing comma in a call, e.g.

```swift
self._routerNorm.wrappedValue = RMSNorm(
    dimensions: config.hiddenSize,
    eps: config.rmsNormEps, // <--- here
)
```
Hey, I have one question after trying mlx-community/gemma-3n-E2B-it-lm-4bit with your implementation (huge thanks for it). How do you resolve the missing chat template?
@davidkoski very interesting. I was confused about why this happens until I found proposal SE-0439, which enables trailing commas in Swift; according to the release page for Apple Swift 6.1.2, it's available in Xcode 16.3+. Anyway, for the sake of compatibility I've removed the trailing commas in the latest commit. I cannot test with Xcode 16, but it should be fine now!
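For reference, the Xcode 16.0-compatible form is simply the same call with the final comma dropped:

```swift
self._routerNorm.wrappedValue = RMSNorm(
    dimensions: config.hiddenSize,
    eps: config.rmsNormEps  // no trailing comma, so pre-Swift-6.1 toolchains accept it
)
```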
Yeah, I am on 16.3 myself -- the 16.0 CI builder has been very useful :-)
@tseylerd I don't have use cases where a missing chat template is a problem. If you have an example of how it should look, you can attach it here and I'll update the repos on HF with a new one.
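In the meantime, one possible workaround is to build the prompt by hand. This sketch assumes the standard Gemma turn markers and is not taken from this PR or from the HF repos:

```swift
// Assumes the standard Gemma chat format; adjust if the official template differs.
func gemmaPrompt(user: String) -> String {
    "<start_of_turn>user\n\(user)<end_of_turn>\n<start_of_turn>model\n"
}

let prompt = gemmaPrompt(user: "Why is the sky blue?")
```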
Thank you for the contribution! |
Implementation of Gemma 3n model for MLXLLM, text only. Based on the reference implementation in mlx-lm:
ml-explore/mlx-lm#258
This code can actually help with building the VLM version there: #340
cc @DePasqualeOrg
Models
The original MLX weights from `mlx-vlm` are not supported; only weights converted by `mlx-lm` are supported. I've made a new collection with text-only MLX models, i.e. `bf16` and `4bit` quantized using this new support: https://huggingface.co/collections/mlx-community/gemma-3n-text-only-lm-6861cf66ddc9a13102996308
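A hedged sketch of loading one of these converted repos through MLXLLM follows; the API names are as used elsewhere in this repository's examples, so treat it as illustrative rather than part of this PR:

```swift
import MLXLLM
import MLXLMCommon

// Illustrative load-and-generate flow; only the repo id comes from the collection above.
func run() async throws {
    let container = try await LLMModelFactory.shared.loadContainer(
        configuration: ModelConfiguration(id: "mlx-community/gemma-3n-E2B-it-lm-4bit"))

    let output = try await container.perform { context in
        let input = try await context.processor.prepare(
            input: UserInput(prompt: "Why is the sky blue?"))
        let result = try MLXLMCommon.generate(
            input: input, parameters: GenerateParameters(), context: context
        ) { _ in .more }
        return result.output
    }
    print(output)
}
```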
Naive benchmarks
Apple M4 Max
* mlx-community/gemma-3n-E4B-it-lm-bf16
* mlx-community/gemma-3n-E2B-it-lm-bf16
* mlx-community/gemma-3n-E4B-it-lm-4bit
* mlx-community/gemma-3n-E2B-it-lm-4bit

iPhone 16 Pro
* mlx-community/gemma-3n-E4B-it-lm-4bit
* mlx-community/gemma-3n-E2B-it-lm-4bit

Notes
* Some functions are compiled (`gelu_topk`, `logit_softcap`) to improve performance
* `RMSNoScale` can be improved when `MLXFast.rmsNorm` is fixed (allows nil weights); a sketch of the layer follows below
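Regarding the `RMSNoScale` note, a minimal sketch of what such a no-scale RMS norm can look like with plain ops; this is illustrative, not the code from this PR:

```swift
import MLX
import MLXNN

// Illustrative RMS normalization without a learned scale, written with basic ops
// because MLXFast.rmsNorm currently expects a weight array.
class RMSNoScale: Module {
    let eps: Float

    init(eps: Float = 1e-6) {
        self.eps = eps
        super.init()
    }

    func callAsFunction(_ x: MLXArray) -> MLXArray {
        x * rsqrt(x.square().mean(axis: -1, keepDims: true) + eps)
    }
}
```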
Misc

Demos