Releases: ml-explore/mlx-lm
v0.31.3
Highlights
- Lots of bugfixes
- Thread local generation stream to accompany MLX v0.31.2
What's Changed
- Bump the patch version by @angeloskath in #1124
- Fix batch dimension mismatch in BatchKVCache and BatchRotatingKVCache extend() by @razorback16 in #1141
- Fix parallel tool call handling in server by @kernelpool in #1170
- Fix MiniMax M2 parallel tool calling by @kernelpool in #1171
- Fix missing tree_reduce import in models/cache.py by @siiea-ai in #1165
- Apertus tie_word_embeddings fix by @BlackSamorez in #1143
- Fix batch dimension mismatch in ArraysCache extend() by @techtoboggan in #1169
- Fix dwq: check for actual safetensors in target_dir by @micuentadecasa in #1173
- fix: handle NoneType check for think tokens in TokenizerWrapper by @yuetyeelo2855 in #1167
- Fix Gemma4 tool parser: support hyphenated function names and braces in string args by @AkashKhamkar in #1150
- Fix empty tool_call_end breaking Mistral tool calls by @eyupcanakman in #1151
- Fix ArraysCache extend by @angeloskath in #1177
- Fix Gemma 4 KV-shared layers creating unused projections by @glyphVault in #1158
- Thread local generation stream by @angeloskath in #1090
New Contributors
- @razorback16 made their first contribution in #1141
- @siiea-ai made their first contribution in #1165
- @BlackSamorez made their first contribution in #1143
- @techtoboggan made their first contribution in #1169
- @micuentadecasa made their first contribution in #1173
- @yuetyeelo2855 made their first contribution in #1167
- @AkashKhamkar made their first contribution in #1150
- @glyphVault made their first contribution in #1158
Full Changelog: v0.31.2...v0.31.3
v0.31.2
Highlights
- Caching system prompt and user messages for non-trimmable caches
- Batch generator refactoring
What's Changed
- Bump the patch version by @angeloskath in #959
- Presence and frequency penalties by @angeloskath in #971
- Eval self.left_padding whenever it is updated in BatchRotatingKVCache by @rltakashige in #960
- Late binding caused incorrect cache checkpoint by @angeloskath in #976
- Move to metal agnostic device_info by @angeloskath in #979
- Fix CompletionsDataset mask_prompt crash by @eyupcanakman in #967
- Bump the patch version by @angeloskath in #981
- Fix test after latest MLX update by @angeloskath in #996
- Clear cache trainer memory by @N8python in #986
- feat(server): add --allowed-origins by @nwtgck in #987
- Delta net precision by @angeloskath in #997
- avoid mutating input in SuScaledRoPE and YarnRoPE by @mm65x in #1003
- handle missing content-length header in server by @mm65x in #1001
- fall back to ast.literal_eval for malformed JSON in qwen3_coder tool parser by @mm65x in #1004
- Nemotron super support by @angeloskath in #992
- Supporting delay in mlx_lm benchmark by @AndreasPlt in #1010
- Fix flaky test by @angeloskath in #1020
- Fix missing cache advance from qwen 3.5 by @angeloskath in #1024
- Refactor LRUPromptCache by @angeloskath in #1019
- Fix SSM dt clamp default for Nemotron-H by @kernelpool in #1026
- Inserting logits processors into BatchGenerator in batch_generate by @arthurhjorth in #1008
- fix: break shared-buffer memory leak in GatedDeltaNet cache by @adurham in #1077
- Fix PromptTrie.pop_prefixes() off-by-one when pruning immediate prefixes by @LxYuan0420 in #1078
- Batch generation refactoring and various fixes by @angeloskath in #1072
- perf: use max instead of argsort in apply_min_p sampling by @matteocelani in #1083
- Add gemma 4 by @Blaizzy in #1093
- Bring back max-kv-size to the batch generator by @angeloskath in #1106
- Add Gemma 4 tool call parser by @nicdavidson in #1105
- Fix Gemma 4 quantized per-layer projection loading by @spicyneuron in #1112
- Fix output corruption in speculative decoding by @kernelpool in #1109
- Gemma4 final fixes and multi-token think/tool start/end by @angeloskath in #1114
- Align batch logits processor token contract by @neilmehta24 in #1115
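The presence and frequency penalties added in #971 follow the common OpenAI-style formulation: a flat penalty for any token that has already appeared, plus a penalty scaled by its repetition count. A minimal framework-agnostic sketch (plain Python, illustrative only; not mlx-lm's actual implementation):

```python
# Sketch of presence/frequency logit penalties (illustrative only).
from collections import Counter

def apply_penalties(logits, generated_tokens,
                    presence_penalty=0.0, frequency_penalty=0.0):
    """Subtract penalties from the logits of already-generated tokens.

    presence_penalty: flat penalty for any token that appeared at least once.
    frequency_penalty: penalty scaled by how often the token appeared.
    """
    counts = Counter(generated_tokens)
    out = list(logits)
    for tok, n in counts.items():
        out[tok] -= presence_penalty + frequency_penalty * n
    return out

# Token 2 appeared twice, token 0 once; unseen tokens are untouched.
logits = [1.0, 1.0, 1.0, 1.0]
adjusted = apply_penalties(logits, [2, 0, 2],
                           presence_penalty=0.5, frequency_penalty=0.1)
```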
New Contributors
- @rltakashige made their first contribution in #960
- @eyupcanakman made their first contribution in #967
- @nwtgck made their first contribution in #987
- @mm65x made their first contribution in #1003
- @AndreasPlt made their first contribution in #1010
- @arthurhjorth made their first contribution in #1008
- @adurham made their first contribution in #1077
- @LxYuan0420 made their first contribution in #1078
- @matteocelani made their first contribution in #1083
- @nicdavidson made their first contribution in #1105
Full Changelog: v0.31.0...v0.31.2
v0.31.0
What's Changed
- Fix save/load of CacheList by @angeloskath in #886
- Share model by @angeloskath in #871
- Fix mixed quant predicates for MLA models by @spicyneuron in #892
- Add JoyAI LLM Flash by @kernelpool in #894
- perplexity: add --trust-remote-code option by @ivanfioravanti in #896
- server: add usage.prompt_tokens_details.cached_tokens to json response by @percontation in #849
- Fix qwen3.5 casting to fp32 by @awni in #902
- Fix sharded rms norm in MiniMax M2.5 by @angeloskath in #898
- Bump for next version by @awni in #904
- Add tie_word_embeddings modulars in mistral and qwen3 moe by @Goekdeniz-Guelmez in #889
- Allow reading LFM2 models nested rope params by @ykhrustalev in #908
- Improve the cache size limits by @angeloskath in #906
- Make the cache limits more friendly by @angeloskath in #910
- Add 'mx.clear_cache()' to piecewise prompt processing in server. by @N8python in #917
- Add filter guard to ArraysCache.nbytes property by @f1yn in #918
- Add tokens to eval to avoid large graphs when they are not used by @awni in #924
- Clear the cache during batch generation by @awni in #926
- Fix qwen3.5 sanitize by @awni in #928
- step3p5: use rotating cache for sliding attention layers by @lyonsno in #949
- Proposal: --prefill-step-size as cmd line argument for speed/memory usage trade-off by @Abioy in #943
- fix: convert() uses incorrect defaults for quantization mode by @spicyneuron in #935
- Bump minor by @angeloskath in #954
- Ensure normalization does not promote to fp32 by @angeloskath in #951
- Better caching in the server by @angeloskath in #911
- Adds tensor parallelism for Qwen 3.5 by @angeloskath in #957
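The cached-token accounting from #849 surfaces in the OpenAI-compatible usage object returned by the server; a hypothetical response fragment (field path taken from the PR title, token counts illustrative):

```json
{
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 40,
    "total_tokens": 168,
    "prompt_tokens_details": {
      "cached_tokens": 96
    }
  }
}
```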
New Contributors
- @spicyneuron made their first contribution in #892
- @ykhrustalev made their first contribution in #908
- @f1yn made their first contribution in #918
- @lyonsno made their first contribution in #949
- @Abioy made their first contribution in #943
Full Changelog: v0.30.7...v0.31.0
v0.30.7
What's Changed
- Fix Kimi Linear by @kernelpool in #853
- Bump version for next release by @awni in #865
- Pythonic tool calling for LFM2 models by @viktike in #864
- Fix DeepSeek V3.2 indexer and weight loading by @kernelpool in #866
- Make validation set optional in training process by @Goekdeniz-Guelmez in #857
- Mistral tool parser by @awni in #874
- LongCat MLA by @kernelpool in #868
- [MODEL] support qwen3.5 series w/o vision by @JJJYmmm in #869
- Faster DSV32 generation by @kernelpool in #885
- Add GLM5 by @Goekdeniz-Guelmez in #867
Full Changelog: v0.30.6...v0.30.7
v0.30.6
What's Changed
- Transformers v5 by @awni in #811
- Add LongCat Flash tool parser by @kernelpool in #810
- Add Kimi-K2.5 by @kernelpool in #813
- Bump mlx version and version by @awni in #816
- Fix NemotronH config compatibility with HuggingFace format by @LuqDaMan in #820
- Fix for Exception - MultiLinear.to_quantized() missing 'mode' by @inferencers in #809
- Fix Kimi K2.5 tool call handling by @kernelpool in #821
- Actually add cli by @awni in #823
- Add LongCat Flash Lite by @kernelpool in #819
- Fix mixed quant by @awni in #825
- Support distributed inference in the server by @angeloskath in #741
- fix cli by @solarpunkin in #827
- Enable loading custom models by @awni in #830
- Allow default creation of BatchRotatingKVCache instead of BatchKVCache in batch mode by @christian-lms in #834
- Add Step 3.5 Flash by @kernelpool in #836
- server: support chat_template_kwargs and top_logprobs by @percontation in #829
- fix: handle GLM 4.7 tool call fallbacks by @jalehman in #792
- Deepseek V3.2 implementation fixes by @sjug in #838
- Fix Step 3.5 Flash model conversion by @kernelpool in #840
- Fix batch mamba by @awni in #842
- Fix sliding window mask during generation by @kernelpool in #843
- DSV3 MLA by @awni in #839
Full Changelog: v0.30.5...v0.30.6
v0.30.5
What's Changed
- import logging as it throws no logging error in place of actual error by @Maanas-Verma in #778
- server: use OpenAI compatible finish_reason by @percontation in #782
- move Xielu Activation in Apertus to activations.py by @Goekdeniz-Guelmez in #772
- bump transformers by @awni in #746
- Update glm4_moe_lite to store KV latent in cache by @N8python in #780
- Adding TeleChat3 by @Goekdeniz-Guelmez in #773
- add kimi tool parser by @Evanev7 in #791
- Allow qq ops with activation quantization by @awni in #749
- fix: use correct variable for logprobs in batch generation by @LuqDaMan in #800
- Sync random seed across ranks in distributed chat by @kernelpool in #801
- Fix ArraysCache.from_state not initializing left_padding and lengths by @lpalbou in #807
New Contributors
- @Maanas-Verma made their first contribution in #778
- @percontation made their first contribution in #782
- @LuqDaMan made their first contribution in #800
- @lpalbou made their first contribution in #807
Full Changelog: v0.30.4...v0.30.5
v0.30.4
What's Changed
- Add AWQ/GPTQ weight transformation utilities by @ericcurtin in #730
- Add IQuest Coder V1 Loop variant by @kernelpool in #716
- Fix sliding window batching by @awni in #738
- Fix Batch Generation: Add extract method to ArraysCache for item retrieval by @Goekdeniz-Guelmez in #740
- Make MambaCache compatible with batch generation for nemotron-h by @nikhilmitrax in #690
- Add a server benchmark for continuous batching by @awni in #728
- Fix tools parameter in apply_chat_template call by @kernelpool in #747
- Refactor tokenizer error handling to use warnings instead of exceptio… by @cubist38 in #744
- Make cache list batchable by @awni in #743
- Fix batch generation for IQuestLoopCoder model by @kernelpool in #748
- Fix type hint and pydoc for batch_generate by @tibbes in #745
- Handle empty caches during batch merge by @ivanfioravanti in #755
- Update for latest mlx by @awni in #759
- Use compiled Swiglu by @awni in #753
- Adds support for Nemotron Super 49b v1.5 by @lazarust in #756
- fix(falcon_h1): support tied embeddings and correct muP scaling by @solarpunkin in #764
- Fix swiglu parameter order by @kernelpool in #767
- Fix CacheList batching by @kernelpool in #769
- fix: unused batch_size parameter for mlx_lm.evaluate by @AndrewTan517 in #762
- Add gpt-oss sharding by @Evanev7 in #761
- Fix LongCat Flash extended context support by @kernelpool in #768
- Add minimax tensor sharding by @Evanev7 in #760
- Shard LongCat Flash by @kernelpool in #771
- Add glm4 moe lite model by @ivanfioravanti in #776
New Contributors
- @ericcurtin made their first contribution in #730
- @nikhilmitrax made their first contribution in #690
- @tibbes made their first contribution in #745
- @solarpunkin made their first contribution in #764
- @AndrewTan517 made their first contribution in #762
- @Evanev7 made their first contribution in #761
Full Changelog: v0.30.2...v0.30.4
v0.30.2
v0.30.1
What's Changed
- custom dsv32 chat template by @awni in #693
- shard glm by @awni in #698
- support minimax m2 by @awni in #700
- Enhance load_config function to check for config file existence and i… by @cubist38 in #701
- batch_generate fails with Phi3 (LongRoPE) when prompts have different lengths by @vyaivanove in #707
- Fix GIL starvation in _generate thread when batch is idle by @sjug in #706
- Ignore generation_config decode errors by @will-lms in #708
- Allow mxfp8 and nvfp4 by @awni in #709
- Fix chat template detection for models with custom tokenizers by @kernelpool in #712
- chore: add model-path param flag for convert API for better clarity by @jaycoolslm in #702
- Add RWKV7 by @MollySophia in #580
- Fix empty /v1/models response for locally loaded models by @cxl-git-hub in #713
- Add IQuest Coder V1 by @kernelpool in #714
- Add YoutuLLM by @johnmai-dev in #720
- Add logits_processors support to batch_generate by @lazarust in #635
- Add Solar Open by @kernelpool in #721
- Add K-EXAONE MoE by @kernelpool in #719
- Improve reasoning and tool call parsing in server by @awni in #711
- Patch bump by @awni in #731
New Contributors
- @cubist38 made their first contribution in #701
- @vyaivanove made their first contribution in #707
- @sjug made their first contribution in #706
- @jaycoolslm made their first contribution in #702
- @MollySophia made their first contribution in #580
- @cxl-git-hub made their first contribution in #713
- @lazarust made their first contribution in #635
Full Changelog: v0.30.0...v0.30.1
v0.30.0
What's Changed
- fix: server busy-waiting during idle request polling by @zenyr in #674
- Fixes for transformers v5 by @awni in #684
- Add mimo v2 flash by @awni in #685
- More useful error message for unsupported batching by @awni in #687
- Model parallel generation by @angeloskath in #676
- Bump to transformer v5 by @awni in #689
- Revert return dict and wrap apply_chat_template by @awni in #691
- Bump the version by @angeloskath in #692
Full Changelog: v0.29.0...v0.30.0