Multi-language capabilities for MatchaTTS#166

Open
fzalkow wants to merge 1 commit into shivammehta25:main from fzalkow:main
Conversation


@fzalkow fzalkow commented Dec 19, 2025

This pull request adds multi-language support to MatchaTTS by concatenating language embeddings to the encoder and decoder inputs.
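As a rough illustration of the conditioning mechanism described above, here is a minimal numpy sketch (not the PR's actual PyTorch code; table sizes, dimensions, and function names are made up for the example) of looking up speaker and language embeddings, L2-normalizing them, and concatenating them to every frame of an encoder input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding tables; the real model learns these.
n_spks, n_langs, emb_dim = 4, 2, 8
spk_table = rng.normal(size=(n_spks, emb_dim))
lang_table = rng.normal(size=(n_langs, emb_dim))

def l2_normalize(v, eps=1e-8):
    """L2-normalize an embedding vector."""
    return v / (np.linalg.norm(v) + eps)

def condition_input(x, spk_id, lang_id):
    """Concatenate speaker and language embeddings to every frame of x.

    x: (T, d) sequence of encoder (or decoder) input features.
    Returns an array of shape (T, d + 2 * emb_dim).
    """
    spk = l2_normalize(spk_table[spk_id])
    lang = l2_normalize(lang_table[lang_id])
    cond = np.concatenate([spk, lang])                  # (2 * emb_dim,)
    cond = np.broadcast_to(cond, (x.shape[0], cond.size))
    return np.concatenate([x, cond], axis=1)

x = rng.normal(size=(10, 16))                           # 10 frames, 16-dim features
y = condition_input(x, spk_id=1, lang_id=0)
print(y.shape)  # (10, 32)
```

The same conditioning vector is repeated across time, so the model sees the speaker and language identity at every frame.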

Speaker and Language Disentanglement:
When training with only a few monolingual speakers, speaker and language IDs are highly correlated, so it is likely (though not tested) that the model will have difficulty disentangling them. With multilingual speakers, or a sufficiently large number of speakers, the model can better separate speaker and language information.

How to Use:
Specify the number of languages using the n_langs key in the data config and annotate your data CSV as follows:

filepath spk lang text
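For concreteness, a hypothetical setup might look like this (the `|` delimiter and the exact key names besides `n_langs` are assumptions based on Matcha-TTS filelist conventions, not taken from the PR):

```yaml
# data config fragment: only the n_langs key is new in this PR
n_spks: 2
n_langs: 2
```

```
wavs/de_0001.wav|0|0|Guten Morgen.
wavs/en_0001.wav|1|1|Good morning.
```

Here the second and third columns are integer speaker and language IDs in `[0, n_spks)` and `[0, n_langs)` respectively.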

Backward Compatibility:
If you set n_langs: 1, the system behaves as before this pull request, with one exception: both speaker and language embeddings are now L2-normalized (if used). In my experience, this helps maintain a balance between speaker and language information during training.
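The balance point above can be seen in a toy example: without normalization, one embedding table can dominate simply by growing larger in magnitude during training, while L2 normalization forces both to contribute with unit magnitude (a sketch, not the PR's code):

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    return v / (np.linalg.norm(v) + eps)

# A raw speaker embedding that has grown large during training,
# next to a much smaller language embedding.
spk_raw = np.array([30.0, 40.0])   # norm 50
lang_raw = np.array([0.3, 0.4])    # norm 0.5

spk = l2_normalize(spk_raw)
lang = l2_normalize(lang_raw)
print(np.linalg.norm(spk), np.linalg.norm(lang))  # both ~1.0
```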

Note:
This pull request does not include any text processing functions for additional languages. You will need to ensure that your text preprocessing pipeline supports the languages you intend to use.

@scott-parkhill

Would this PR allow us to train a multi-lingual model on typologically similar languages to potentially reduce the amount of training data we would need for these languages? I.e. if we have a lot of training data for one language in a family, and there is a typologically similar language for which we do not have much data, could the contents of this PR allow us to train a voice leveraging the model trained with the larger dataset?

@fzalkow
Author

fzalkow commented Mar 11, 2026

> Would this PR allow us to train a multi-lingual model on typologically similar languages to potentially reduce the amount of training data we would need for these languages? I.e. if we have a lot of training data for one language in a family, and there is a typologically similar language for which we do not have much data, could the contents of this PR allow us to train a voice leveraging the model trained with the larger dataset?

I have not tried this with this particular PR, but I have with other models using similar techniques, and there was a cross-language benefit. So my guess is that the answer to your question is yes.
