Releases · Blaizzy/mlx-audio

v0.4.0 (Latest)

Released 07 Mar 19:21 by Blaizzy · commit 5fac1de

What's Changed

add example of qwen3-asr with forced alignment by @eschmidbauer in #463
Restore Qwen3-TTS encoder_config to preserve accents in voice clones. by @orbitalquark in #461
Ensure TTS audio player plays a trailing, partially-filled audio frame. by @orbitalquark in #465
Fix source separation issue with shape mismatch: noise shape for separate_long by @mnoukhov in #467
Enable streaming for Qwen3-TTS when ICL mode is enabled. by @orbitalquark in #466
feat(stt): add support for MedASR (Lasr architecture) by @sigjhl in #376
Formatting fix and add to pyproject.toml by @mnoukhov in #475
Use shared model cache resolution for SAM-Audio by @mnoukhov in #468
Fix voice matching for Pocket TTS by @lucasnewman in #477
Fix ALMs max tokens and chunking by @Blaizzy in #474
[Soprano] Fix decoder and config loading (v1 and v1.1) by @Blaizzy in #480
Add audio separation UI & Server by @Blaizzy in #347
Add Parakeet v3 multilingual support with language detection by @andimarafioti in #481
Revert "Do not discard the last unfilled audio frame. (#465)" by @orbitalquark in #473
Fix longform generation for Pocket TTS by @lucasnewman in #486
Enable streaming /v1/audio/speech server endpoint with raw/pcm data. by @orbitalquark in #484
feat: Add support for Voxtral Mini 4B Realtime by @shreyaskarnik in #487
fix(kokoro): Chinese TTS crashes with ValueError in g2p pipeline by @smartchainark in #489
fix: update STT transcription parameters and preserve original audio format by @shreyaskarnik in #488
fix(docs): update outdated model links by @joaopalmeiro in #495
[VAD / Diarization] Add sortformer by @Blaizzy in #493
feat: add streaming support and toggle for realtime STT by @Cold-A-Muse in #494
chore: update protobuf dependency to version 6.33.5 by @Blaizzy in #497
fix(ui): update footer to display the current year dynamically and link to the repo by @shreyaskarnik in #509
Add Smart Turn v3 semantic VAD by @lucasnewman in #511
fix(vibevoice-asr): add audio resampling and normalization to preprocessing by @bellkjtt in #510
Refactor(whisper): update model instantiation and loading process by @Blaizzy in #514
fix: replace 4 bare excepts with except Exception by @haosenwang1018 in #521
feat(stt): add system_prompt parameter to Qwen3ASR generation methods by @chris-schra in #522
fix(medasr): collapse CTC tokens manually to prevent raw output by @sigjhl in #519
Add Echo TTS by @lucasnewman in #525
Allow printing transcriptions to stdout when output path is "-". by @orbitalquark in #527
stt: Keep uploaded file extension to avoid unnecessary conversions. by @orbitalquark in #528
feat(lid): Add spoken language identification (MMS-LID) by @beshkenadze in #529
Set max tokens to a more reasonable value by default for STT by @lucasnewman in #533
[Qwen3-TTS] Improve inference, TTFB and add batch support by @Blaizzy in #534
refactor(codec): extract shared ECAPA-TDNN backbone by @beshkenadze in #532
Add KittenTTS support and ONNX parity fixes by @Reza2kn in #517
Fix Qwen3-TTS streaming decoder throttling with incremental decoding by @Blaizzy in #537
Add ming omni tts (MoE and Dense) by @Blaizzy in #515
[Qwen3-ASR] Fix auto lang detection by @Blaizzy in #547
Add nvfp4, mxfp4 and mxfp8 quants by @Blaizzy in #543
Fix duplicate audio_samples field in GenerationResult dataclass by @Blaizzy in #548
[Whisper] Fix lang code assignment by @Blaizzy in #549

New Contributors

@eschmidbauer made their first contribution in #463
@orbitalquark made their first contribution in #461
@mnoukhov made their first contribution in #467
@sigjhl made their first contribution in #376
@andimarafioti made their first contribution in #481
@shreyaskarnik made their first contribution in #487
@smartchainark made their first contribution in #489
@joaopalmeiro made their first contribution in #495
@Cold-A-Muse made their first contribution in #494
@bellkjtt made their first contribution in #510
@haosenwang1018 made their first contribution in #521
@chris-schra made their first contribution in #522
@Reza2kn made their first contribution in #517

Full Changelog: v0.3.1...v0.4.0

Contributors

beshkenadze, shreyaskarnik, and 14 other contributors

v0.3.1

Released 29 Jan 22:39 by Blaizzy · commit f7328a4

What's Changed

Update uv.lock to reflect dependency version changes by @Blaizzy in #432
v0.3.1: Update STT API docs and fix default output path by @Blaizzy in #433
Qwen3-TTS: Add streaming and optimise peak usage by @Blaizzy in #435
Fix: Use single quotes in README examples to avoid Bash history expansion. by @reinexworldc in #440
Fix: improve import error handling by @reinexworldc in #443
[Qwen3-TTS] Fix some Custom Voices producing silence with 0.6B by @Blaizzy in #444
Refactor audio load by @Blaizzy in #445
Update pyproject.toml for poetry support by @lucasnewman in #446
Add Qwen3-ASR by @Blaizzy in #454
Fix chatterbox load by @Blaizzy in #455
Update README to remove basic usage section by @Blaizzy in #456
Update README with output path for ASR commands by @rahimnathwani in #458
Update package dependencies in uv.lock to include new extras by @Blaizzy in #457
Fix server (STT, TTS) by @Blaizzy in #460

New Contributors

@reinexworldc made their first contribution in #440

Full Changelog: v0.3.0...v0.3.1

Contributors

rahimnathwani, Blaizzy, and 2 other contributors

v0.3.0

Released 25 Jan 21:43 by Blaizzy · commit 02ada37

What's Changed

Fix speaker embedding extraction in Qwen3-TTS model by @Blaizzy in #390
Fix Qwen3-TTS tail artifacts by @Blaizzy in #391
Fix Qwen3-TTS Base Voice Cloning by @Blaizzy in #394
Add Vibevoice ASR by @Blaizzy in #389
Qwen3 speaker embedding tests by @Blaizzy in #396
Update TTS commands in README to include language code option by @rudolfolah in #401
Unify Mimi implementation for Pocket TTS by @lucasnewman in #403
Fix issue of ref_audio not loading prior to inference with server. by @BuffMcBigHuge in #406
Enhance README with installation and usage examples by @rahimnathwani in #404
Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #418
Upgrade GitHub Actions to latest versions by @salmanmkc in #419
[VibeVoice-ASR] Fix Metal kernel crash and optimize memory for long audio by @Blaizzy in #417
fix: Allowing quantization of Qwen3-TTS! Adding model_quant_predicate to Qwen3-TTS to exclude embedding layers by @kyr0 in #398
Fix qwen3 tts quants (silence in VC and word precision) by @Blaizzy in #407
Fix stt array io by @Blaizzy in #426
Update MANIFEST.in to remove leading dot from requirements.txt path by @Blaizzy in #428
Move audio path/format prints under verbose flag by @wladpaiva in #429
Update pyproject.toml and GitHub Actions workflow for package publishing by @Blaizzy in #431

New Contributors

@rudolfolah made their first contribution in #401
@BuffMcBigHuge made their first contribution in #406
@rahimnathwani made their first contribution in #404
@salmanmkc made their first contribution in #418
@kyr0 made their first contribution in #398
@wladpaiva made their first contribution in #429

Full Changelog: v0.2.10...v0.3.0

Contributors

kyr0, rahimnathwani, and 6 other contributors

v0.3.0rc1 (Pre-release)

Released 22 Jan 21:39 by Blaizzy · commit 185a0d7

What's Changed

Remove extra deps by @Blaizzy in #373
Refactor load by @Blaizzy in #374
Add lfm2 audio by @Blaizzy in #370
update lfm readme by @Blaizzy in #377
Fix lang codes kokoro by @Blaizzy in #380
Replace soundfile with miniaudio + ffmpeg by @Blaizzy in #379
Add Pocket TTS model by @lucasnewman in #381
Fix STT stream by @Blaizzy in #382
Migrate swift to https://github.com/Blaizzy/mlx-audio-swift by @Blaizzy in #363
Refactor model path retrieval in get_model_path function by @Blaizzy in #383
Add streaming decoding to snac and orpheus by @Blaizzy in #384
Update generate output path by @Blaizzy in #385
Add Qwen3-TTS by @Blaizzy in #388

Full Changelog: v0.2.10...v0.3.0rc1

Contributors

Blaizzy and lucasnewman

v0.2.10

Released 06 Jan 20:44 by Blaizzy · commit 67326f3

What's Changed

Refactor GLMASR and improve LM style ASR logging by @Blaizzy in #332
Remove actual issue ID reference from PR template by @mootari in #334
Add maya1 fixes to Llama by @Blaizzy in #340
Fix marvis, chatterbox and args by @Blaizzy in #342
Add Sam Audio by @Blaizzy in #338
fix: Add missing mlx-lm dependency by @joshwhiton in #344
feat(swift): add Kokoro-82M-v1.1-zh MLX Support by @Alex-Wengg in #341
Remove loguru reconfiguration on Kokoro import by @joshwhiton in #348
feat(stt): add AlignAtt streaming transcription for Whisper by @beshkenadze in #321
Fix stft args by @Blaizzy in #354
Allow using DACVAE as a codec independent of SAM Audio model by @lucasnewman in #357
chore: update Python version requirement and dependencies by @Blaizzy in #355
Add MossFormer2 SE (Speech Enhancement) by @starkdmi in #351
use chatterbox MTLTokenizer for multilingual. by @litmudoc in #362
Add streaming and refactor Sam Audio API by @Blaizzy in #360
Add Soprano by @Blaizzy in #359
Fix model type, refactor orpheus style models by @Blaizzy in #358
revert default response format to mp3 by @Blaizzy in #356
Refactor voice loading in KokoroPipeline to support .safetensors files by @Blaizzy in #364
Add uv.lock and pin all deps as core by @Blaizzy in #366

New Contributors

@mootari made their first contribution in #334
@Alex-Wengg made their first contribution in #341
@starkdmi made their first contribution in #351
@litmudoc made their first contribution in #362

Full Changelog: v0.2.9...v0.2.10

Contributors

beshkenadze, mootari, and 6 other contributors

v0.2.9

Released 20 Dec 20:16 by Blaizzy · commit 4bc1d0c

What's Changed

Add GLM ASR by @Blaizzy in #320
Simplify convert API for TTS and STT by @Blaizzy in #324
[Chatterbox-turbo] Add speaker embedding by @Blaizzy in #322
[Chatterbox-turbo] Add in-place cache by @Blaizzy in #322
[Chatterbox-turbo] Add audio streaming by @Blaizzy in #322
[Chatterbox-turbo] Add audio chunking by @Blaizzy in #322

Full Changelog: v0.2.6...v0.2.9

Contributors

Blaizzy

v0.2.8

Released 17 Dec 19:30 by Blaizzy · commit 22a2bb9

What's Changed

fix(server): use lowercase for default response_format by @beshkenadze in #301
Add Chatterbox and Chatterbox Turbo by @Blaizzy in #302
Add Chatterbox [VC only] by @DePasqualeOrg in #282
feat: add lazy imports for TTS/STT modules by @beshkenadze in #290
Pin tfms dep <5.0.0 by @Blaizzy in #303
feat: migrate from setup.py to pyproject.toml with optional deps by @beshkenadze in #291
fix(test): use case-insensitive content-type comparison by @beshkenadze in #300
ci: add modular installation tests for pyproject.toml extras by @beshkenadze in #298
Fix build by @Blaizzy in #304

Full Changelog: v0.2.7...v0.2.8

Contributors

beshkenadze, Blaizzy, and DePasqualeOrg

v0.2.7

Released 16 Dec 20:35 by Blaizzy · commit f3cd320

What's Changed

Refactor Marvis TTS API: Make public methods accessible by @rudrankriyam in #259
Add Marvis model selection to TTS UI by @rudrankriyam in #261
Add Marvis quant selection to TTS Web UI by @adrgrondin in #264
Fix security vulnerabilities in Next.js and brace-expansion dependencies by @Copilot in #265
make the methods more useful by @pritamsoni-hsr in #246
Update to mlx-swift-lm and remove redundant mlx-swift dependency by @rudrankriyam in #269
Fix voxtral segments by @Blaizzy in #273
Add UI startup option to Server by @Blaizzy in #274
Add preemphasis preprocessing support for Parakeet models to match NeMo training config by @joshwhiton in #286
feat: Add support for VoxCPM (w/ voice cloning) by @voxmenthe in #293
Fix spark decoding by @Blaizzy in #296
feat: extract DSP utilities to dedicated module by @beshkenadze in #289
Feat: add response format option to SpeechRequest by @Blaizzy in #297
Add VibeVoice by @Blaizzy in #295

New Contributors

@pritamsoni-hsr made their first contribution in #246
@joshwhiton made their first contribution in #286
@voxmenthe made their first contribution in #293
@beshkenadze made their first contribution in #289

Full Changelog: v0.2.6...v0.2.7

Contributors

beshkenadze, voxmenthe, and 5 other contributors

v0.2.6

Released 07 Nov 17:08 by Blaizzy · commit bcd5ccf

What's Changed

fix wav2vec by @josharian in #222
Fix RTF calculation in kokoro model by @davidxifeng in #227
Fix Unnecessary Audio Transcription for the IndexTTS Model by @bytefer in #231
Add Sesame TTS Integration for Swift Audio Package by @rudrankriyam in #223
Use batched vocoding to reduce peak memory usage with Sesame arch models by @lucasnewman in #236
Cache RoPE by dtype for Sesame arch models for improved generation performance by @lucasnewman in #232
Install Metal toolchain for Swift tests by @lucasnewman in #233
Adopt interface changes from mlx-lm to fix Sesame-arch models by @lucasnewman in #242
Update swift-transformers dependency to 1.1.0 by @Liam1506 in #247
Improve Swift TTS app UX by @rudrankriyam in #248
Add quality selection and streaming controls to Marvis with UI support for macOS & iOS by @rudrankriyam in #249
Fix Swift compiler warnings by @rudrankriyam in #250
Refactor MarvisModel to handle optional backbone and decoder flavors by @rudrankriyam in #251
Fix iOS 16 compatibility and ESpeakNG framework linking for iOS app by @rudrankriyam in #252
Add memory increase limit for iOS by @rudrankriyam in #253
Update audio playback management in Marvis TTS by @rudrankriyam in #254
Bump version and add new copy files by @Blaizzy in #255
Add UI v2 by @Blaizzy in #154

New Contributors

@josharian made their first contribution in #222
@davidxifeng made their first contribution in #227
@bytefer made their first contribution in #231
@Liam1506 made their first contribution in #247

Full Changelog: v0.2.5...v0.2.6

Contributors

josharian, davidxifeng, and 5 other contributors

v0.2.5

Released 26 Aug 18:06 by Blaizzy · commit cc6bdb4

What's Changed

Use indeterminate progress for CSM models by @lucasnewman in #216
Bump version to 0.2.5 by @Blaizzy in #219

Full Changelog: v0.2.4...v0.2.5

Contributors

Blaizzy and lucasnewman
