Releases: Blaizzy/mlx-audio

v0.4.0

07 Mar 19:21
5fac1de

What's Changed

  • add example of qwen3-asr with forced alignment by @eschmidbauer in #463
  • Restore Qwen3-TTS encoder_config to preserve accents in voice clones. by @orbitalquark in #461
  • Ensure TTS audio player plays a trailing, partially-filled audio frame. by @orbitalquark in #465
  • Fix source separation issue with shape mismatch: noise shape for separate_long by @mnoukhov in #467
  • Enable streaming for Qwen3-TTS when ICL mode is enabled. by @orbitalquark in #466
  • feat(stt): add support for MedASR (Lasr architecture) by @sigjhl in #376
  • Formatting fix and add to pyproject.toml by @mnoukhov in #475
  • Use shared model cache resolution for SAM-Audio by @mnoukhov in #468
  • Fix voice matching for Pocket TTS by @lucasnewman in #477
  • Fix ALMs max tokens and chunking by @Blaizzy in #474
  • [Soprano] Fix decoder and config loading (v1 and v1.1) by @Blaizzy in #480
  • Add audio separation UI & Server by @Blaizzy in #347
  • Add Parakeet v3 multilingual support with language detection by @andimarafioti in #481
  • Revert "Do not discard the last unfilled audio frame. (#465)" by @orbitalquark in #473
  • Fix longform generation for Pocket TTS by @lucasnewman in #486
  • Enable streaming /v1/audio/speech server endpoint with raw/pcm data. by @orbitalquark in #484
  • feat: Add support for Voxtral Mini 4B Realtime by @shreyaskarnik in #487
  • fix(kokoro): Chinese TTS crashes with ValueError in g2p pipeline by @smartchainark in #489
  • fix: update STT transcription parameters and preserve original audio format by @shreyaskarnik in #488
  • fix(docs): update outdated model links by @joaopalmeiro in #495
  • [VAD / Diarization] Add sortformer by @Blaizzy in #493
  • feat: add streaming support and toggle for realtime STT by @Cold-A-Muse in #494
  • chore: update protobuf dependency to version 6.33.5 by @Blaizzy in #497
  • fix(ui): update footer to display the current year dynamically and link to the repo by @shreyaskarnik in #509
  • Add Smart Turn v3 semantic VAD by @lucasnewman in #511
  • fix(vibevoice-asr): add audio resampling and normalization to preprocessing by @bellkjtt in #510
  • Refactor(whisper): update model instantiation and loading process by @Blaizzy in #514
  • fix: replace 4 bare excepts with except Exception by @haosenwang1018 in #521
  • feat(stt): add system_prompt parameter to Qwen3ASR generation methods by @chris-schra in #522
  • fix(medasr): collapse CTC tokens manually to prevent raw output by @sigjhl in #519
  • Add Echo TTS by @lucasnewman in #525
  • Allow printing transcriptions to stdout when output path is "-". by @orbitalquark in #527
  • stt: Keep uploaded file extension to avoid unnecessary conversions. by @orbitalquark in #528
  • feat(lid): Add spoken language identification (MMS-LID) by @beshkenadze in #529
  • Set max tokens to a more reasonable value by default for STT by @lucasnewman in #533
  • [Qwen3-TTS] Improve inference, TTFB and add batch support by @Blaizzy in #534
  • refactor(codec): extract shared ECAPA-TDNN backbone by @beshkenadze in #532
  • Add KittenTTS support and ONNX parity fixes by @Reza2kn in #517
  • Fix Qwen3-TTS streaming decoder throttling with incremental decoding by @Blaizzy in #537
  • Add ming omni tts (MoE and Dense) by @Blaizzy in #515
  • [Qwen3-ASR] Fix auto lang detection by @Blaizzy in #547
  • Add nvfp4, mxfp4 and mxfp8 quants by @Blaizzy in #543
  • Fix duplicate audio_samples field in GenerationResult dataclass by @Blaizzy in #548
  • [Whisper] Fix lang code assignment by @Blaizzy in #549
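For context on the MedASR fix in #519: a CTC model emits one token per audio frame, so the raw output contains repeated tokens and blank symbols until it is collapsed. The following is a minimal sketch of the standard greedy CTC collapse, not the code from that PR. The function name `ctc_collapse` and the blank id of 0 are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_collapse(token_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks.

    `blank_id=0` is an assumed default; real models define their own blank id.
    """
    # Step 1: merge runs of identical frame-level tokens into one token.
    merged = [token for token, _ in groupby(token_ids)]
    # Step 2: remove the blank symbol that separates genuine repeats.
    return [token for token in merged if token != blank_id]


# Frame-level predictions: the blank (0) between the two 7s keeps them distinct.
frames = [0, 7, 7, 0, 7, 3, 3, 0, 0, 5]
print(ctc_collapse(frames))  # [7, 7, 3, 5]
```

The order of the two steps matters: dropping blanks first would merge the two separate 7s into one.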

New Contributors

Full Changelog: v0.3.1...v0.4.0

v0.3.1

29 Jan 22:39
f7328a4

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.3.1

v0.3.0

25 Jan 21:43
02ada37

What's Changed

  • Fix speaker embedding extraction in Qwen3-TTS model by @Blaizzy in #390
  • Fix Qwen3-TTS tail artifacts by @Blaizzy in #391
  • Fix Qwen3-TTS Base Voice Cloning by @Blaizzy in #394
  • Add Vibevoice ASR by @Blaizzy in #389
  • Qwen3 speaker embedding tests by @Blaizzy in #396
  • Update TTS commands in README to include language code option by @rudolfolah in #401
  • Unify Mimi implementation for Pocket TTS by @lucasnewman in #403
  • Fix issue of ref_audio not loading prior to inference with server. by @BuffMcBigHuge in #406
  • Enhance README with installation and usage examples by @rahimnathwani in #404
  • Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #418
  • Upgrade GitHub Actions to latest versions by @salmanmkc in #419
  • [VibeVoice-ASR] Fix Metal kernel crash and optimize memory for long audio by @Blaizzy in #417
  • fix: Allowing quantization of Qwen3-TTS! Adding model_quant_predicate to Qwen3-TTS to exclude embedding layers by @kyr0 in #398
  • Fix qwen3 tts quants (silence in VC and word precision) by @Blaizzy in #407
  • Fix stt array io by @Blaizzy in #426
  • Update MANIFEST.in to remove leading dot from requirements.txt path by @Blaizzy in #428
  • Move audio path/format prints under verbose flag by @wladpaiva in #429
  • Update pyproject.toml and GitHub Actions workflow for package publishing by @Blaizzy in #431

New Contributors

Full Changelog: v0.2.10...v0.3.0

v0.3.0rc1

22 Jan 21:39
185a0d7

Pre-release

What's Changed

Full Changelog: v0.2.10...v0.3.0rc1

v0.2.10

06 Jan 20:44
67326f3

What's Changed

New Contributors

Full Changelog: v0.2.9...v0.2.10

v0.2.9

20 Dec 20:16
4bc1d0c

What's Changed

Full Changelog: v0.2.6...v0.2.9

v0.2.8

17 Dec 19:30
22a2bb9

What's Changed

Full Changelog: v0.2.7...v0.2.8

v0.2.7

16 Dec 20:35
f3cd320

What's Changed

New Contributors

Full Changelog: v0.2.6...v0.2.7

v0.2.6

07 Nov 17:08
bcd5ccf

What's Changed

New Contributors

Full Changelog: v0.2.5...v0.2.6

v0.2.5

26 Aug 18:06
cc6bdb4

What's Changed

Full Changelog: v0.2.4...v0.2.5