v0.31.0
Highlights
- Initial version of quantized matmuls (QMMs) for CUDA (#3160)
- JACCL mesh bandwidth improvements (#3174)
- Faster 3D convolutions on Metal through an implicit matmul pathway for mx.conv3d (#3147)
- Continued improvements to qqmm (#3106, #3022)
What's Changed
- Patch bump by @angeloskath in #3102
- is_available() should check the device index too by @andresy in #3107
- Fix residency set with user provided buffer by @awni in #3108
- Cleanup test_fast_sdpa.py by @zcbenz in #3112
- [CUDA] Set current device before allocating memory by @zcbenz in #3110
- Quantize module to QQLinear by @nastya236 in #3106
- [CUDA] Use cuDNN SDPA for decoding when using fixed-size KV cache by @zcbenz in #3113
- register pressure by @nastya236 in #3116
- Fix precision in Metal fused attention by @awni in #3119
- [CUDA] Attention sinks in cuDNN SDPA by @zcbenz in #3118
- Fix donation in sdpa vector by @angeloskath in #3121
- Manage stream placement in import function by @awni in #3127
- fix: propagate quantization mode in QuantizedAllToShardedLinear / QuantizedShardedToAllLinear by @vskiwi in #3133
- [featuring] - add hanning window function by @Vlor999 in #3124
- feat: adding the hamming function by @Vlor999 in #3135
- Tensor scale nvfp4 by @nastya236 in #3022
- Fix fence synchronization across command buffers by @awni in #3144
- Export: preserve Dtype state values in export callback arguments by @skryl in #3145
- [Metal] Fix 32-bit integer overflow in conv3d unfold kernel by @kellen-sun in #3143
- [Metal][Performance] Add implicit matmul pathway for mx.conv3d by @belkakari in #3147
- [Metal] Fix event leak by @awni in #3159
- [CUDA] FPxINT quantized matmul for Hopper by @zcbenz in #3160
- feat: implement mlx.core.blackman by @Vlor999 in #3136
- Enable setting thread block cluster for Hopper and later by @zcbenz in #3168
- [CUDA][NCCL] group split by @nastya236 in #3172
- JACCL refactor and small update by @angeloskath in #3174
- [CUDA] Heuristics for Hopper QMM by @zcbenz in #3173
- Fix compile_fuse broadcast split aliasing bug by @robert-johansson in #3166
- Enable passing in a GPU architecture string via env var by @angeloskath in #3176
- Bump the minor version by @angeloskath in #3183
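For readers unfamiliar with the window functions added in #3124, #3135 and #3136, the sketch below shows the textbook cosine-sum definitions of the Hann, Hamming and Blackman windows in plain Python. It assumes the symmetric convention NumPy uses (denominator M - 1); these notes do not state the exact signatures or conventions of the MLX functions, so the helper names here (`cosine_sum_window`, `hanning_sketch`, `hamming_sketch`, `blackman_sketch`) are hypothetical and are not the MLX API.

```python
import math


def cosine_sum_window(m, coeffs):
    """Return a symmetric cosine-sum window of length m as a list of floats.

    w[n] = a0 - a1*cos(2*pi*n/(m-1)) + a2*cos(4*pi*n/(m-1)) - ...
    Assumption: NumPy-style symmetric windows; MLX may differ in details.
    """
    if m < 1:
        return []
    if m == 1:
        return [1.0]
    window = []
    for n in range(m):
        x = 2.0 * math.pi * n / (m - 1)
        # Alternating signs: +a0, -a1*cos(x), +a2*cos(2x), ...
        value = sum(((-1) ** k) * a * math.cos(k * x) for k, a in enumerate(coeffs))
        window.append(value)
    return window


# Hypothetical helper names, for illustration only.
def hanning_sketch(m):
    return cosine_sum_window(m, (0.5, 0.5))


def hamming_sketch(m):
    return cosine_sum_window(m, (0.54, 0.46))


def blackman_sketch(m):
    return cosine_sum_window(m, (0.42, 0.5, 0.08))


if __name__ == "__main__":
    for name, fn in [("hanning", hanning_sketch),
                     ("hamming", hamming_sketch),
                     ("blackman", blackman_sketch)]:
        print(name, [round(v, 4) for v in fn(5)])
```

All three windows peak at the center sample and taper symmetrically toward the edges. The only difference between them is the coefficient tuple passed to the shared helper.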
New Contributors
- @vskiwi made their first contribution in #3133
- @Vlor999 made their first contribution in #3124
- @skryl made their first contribution in #3145
- @kellen-sun made their first contribution in #3143
- @belkakari made their first contribution in #3147
- @robert-johansson made their first contribution in #3166
Full Changelog: v0.30.6...v0.31.0