speaker-diarization #520

denmrnngp-cloud · 2026-02-23T09:31:13Z


Hi! Thank you for the streaming_sortformer 4spk-v2.1 port. It has a problem with fake (spurious) speakers. I found a way to fix it: build a cosine-similarity matrix of the speaker embedding vectors and merge speakers that are similar. However, the number of iterations and the resource cost were frustrating. I also tested https://huggingface.co/FluidInference/speaker-diarization-coreml and, after tuning its parameters, started getting perfect results for 2 to 6 speakers. Maybe you could port it to MLX? Or is it not worth it?
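The cosine-merge workaround described above can be sketched roughly as follows. This is a minimal sketch, not the code from the author's bundle: it assumes one non-zero embedding vector per predicted speaker (for example, an average of that speaker's frame embeddings), and the function name `merge_similar_speakers` and the 0.8 threshold are hypothetical. It computes the full cosine-similarity matrix once and merges every above-threshold pair with union-find, so no repeated re-clustering passes are needed.

```python
import numpy as np


def merge_similar_speakers(
    embeddings: dict[str, np.ndarray], threshold: float = 0.8
) -> dict[str, str]:
    """Map each speaker label to a canonical label (hypothetical helper).

    Speakers whose embeddings have cosine similarity >= threshold are merged
    transitively (single linkage, via union-find). The canonical label of a
    group is its alphabetically smallest member.
    """
    if not embeddings:
        return {}
    labels = sorted(embeddings)
    mat = np.stack([embeddings[lab] for lab in labels]).astype(np.float64)
    # L2-normalise each row so a dot product equals cosine similarity.
    # Assumes no embedding is all zeros.
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    sim = mat @ mat.T  # full pairwise cosine-similarity matrix, computed once

    parent = list(range(len(labels)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # The matrix is symmetric, so only the strict upper triangle is scanned.
    rows, cols = np.nonzero(np.triu(sim >= threshold, k=1))
    for i, j in zip(rows.tolist(), cols.tolist()):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Attach the larger root under the smaller one, so each root is
            # the smallest index (= smallest sorted label) in its group.
            parent[max(ri, rj)] = min(ri, rj)

    return {lab: labels[find(i)] for i, lab in enumerate(labels)}


if __name__ == "__main__":
    # Toy 2-D embeddings: spk0/spk1 and spk2/spk3 are near-duplicates.
    demo = {
        "spk0": np.array([1.0, 0.0]),
        "spk1": np.array([0.99, 0.1]),
        "spk2": np.array([0.0, 1.0]),
        "spk3": np.array([0.1, 0.99]),
    }
    print(merge_similar_speakers(demo, threshold=0.8))
    # {'spk0': 'spk0', 'spk1': 'spk0', 'spk2': 'spk2', 'spk3': 'spk2'}
```

Single linkage is the simplest choice here, but it can chain distinct speakers together through an intermediate one, so the threshold would need tuning per setup.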

Blaizzy · 2026-02-23T10:17:02Z · Maintainer

Could you provide more context?

How many speakers were you testing? Please also share a reproducible example (audio + code) and the version of MLX-Audio you are using.

I personally find Sortformer to be very accurate, and I have tested it on up to 3 h of audio.
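For the version question, one way to report the installed MLX-Audio version is the standard library's `importlib.metadata`. This is a small sketch; it assumes MLX-Audio was installed from PyPI under the distribution name `mlx-audio`, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mlx-audio") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumes the PyPI distribution name is "mlx-audio".
    print("mlx-audio:", installed_version() or "not installed")
```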

1 reply

denmrnngp-cloud Feb 23, 2026
Author

I prepared a reproducible test bundle and attached it here (archive). It includes:

- code (run_diarization_matrix.py + minimal examples),
- models,
- test audio,
- raw diarization outputs (json/txt),
- a summary with reference-vs-predicted speaker counts.
I compared 3 modes:

- sortformer_no_merge
- sortformer_with_merge (cosine embedding merge with tuned thresholds)
- fluid_coreml_no_merge (FluidInference/speaker-diarization-coreml)
Reference speaker counts:

- Ilya.mp3 -> 2
- final interviy pm.mp3 -> 3
- how_to_raise_a_star_child_podcast_for_women_4.mp3 (alias for my Russian-language file) -> 5
Observed results (predicted speaker counts, in the same file order):

- sortformer_no_merge: 4 / 4 / 4 (none matched)
- sortformer_with_merge: 2 / 3 / 4 (2 of 3 matched)
- fluid_coreml_no_merge: 2 / 3 / 5 (3 of 3 matched)
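The match counts above can be reproduced from the reported numbers with a short script. This is a minimal sketch, not code from run_diarization_matrix.py: it assumes each predicted list follows the same file order as the reference counts, and that a diarization output can be reduced to `(start, end, speaker)` segments; `count_speakers` and `matched` are hypothetical helpers.

```python
# Speaker counts reported above; each predicted list follows REFERENCE's order.
REFERENCE = {
    "Ilya.mp3": 2,
    "final interviy pm.mp3": 3,
    "how_to_raise_a_star_child_podcast_for_women_4.mp3": 5,
}
PREDICTED = {
    "sortformer_no_merge": [4, 4, 4],
    "sortformer_with_merge": [2, 3, 4],
    "fluid_coreml_no_merge": [2, 3, 5],
}


def count_speakers(segments: list[tuple[float, float, str]]) -> int:
    """Number of distinct speaker labels in (start, end, speaker) segments."""
    return len({speaker for _start, _end, speaker in segments})


def matched(reference_counts: list[int], predicted_counts: list[int]) -> int:
    """How many files have a predicted speaker count equal to the reference."""
    return sum(r == p for r, p in zip(reference_counts, predicted_counts))


if __name__ == "__main__":
    ref = list(REFERENCE.values())
    for mode, counts in PREDICTED.items():
        print(f"{mode}: {matched(ref, counts)} of {len(ref)} matched")
```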
Fluid CoreML was also faster in this setup (real-time factor, RTF; lower is faster):

- SortFormer no merge: ~0.0075–0.0076
- SortFormer with merge: ~0.0098–0.0118
- Fluid CoreML no merge: ~0.0039–0.0041
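RTF is usually defined as processing time divided by audio duration; assuming that is the definition used here, it can be measured with a sketch like the one below. `real_time_factor` is a hypothetical helper, and the timed callable stands in for whatever diarization call the bundle makes.

```python
import time
from typing import Callable


def real_time_factor(run: Callable[[], object], audio_seconds: float) -> float:
    """Wall-clock processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    run()  # e.g. a closure that diarizes one file
    return (time.perf_counter() - start) / audio_seconds


if __name__ == "__main__":
    # Stand-in workload: sleeping 0.05 s to "process" a 10 s clip (RTF ~0.005).
    print(f"RTF: {real_time_factor(lambda: time.sleep(0.05), 10.0):.4f}")
```

At the reported values, one hour (3600 s) of audio would take about 3600 × 0.0075 ≈ 27 s with SortFormer without merging versus about 3600 × 0.004 ≈ 14.4 s with Fluid CoreML; whether model loading is included depends on what sits inside the timed call.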

Archive: https://we.tl/t-5KZOAD4744

Blaizzy · 2026-02-23T10:23:29Z · Maintainer

That model is actually a pyannote model. I can add it to the backlog 👌🏽

3 replies

denmrnngp-cloud Mar 2, 2026
Author

@Blaizzy
Hi! Is this not a priority yet?

Blaizzy Mar 2, 2026
Maintainer

It's been done for a couple of weeks now, in #493.

Blaizzy Mar 2, 2026
Maintainer

Pyannote models will be next
