Multi-language capabilities for MatchaTTS #166
fzalkow wants to merge 1 commit into shivammehta25:main
Conversation
Would this PR allow us to train a multi-lingual model on typologically similar languages to reduce the amount of training data needed for those languages? I.e., if we have a lot of training data for one language in a family, and there is a typologically similar language for which we do not have much data, could the contents of this PR allow us to train a voice for it by leveraging the model trained on the larger dataset?
I have not tried this with this particular PR, but I have with other models using similar techniques, and there was a cross-language benefit. So my guess is that the answer to your question is yes.
This pull request adds multi-language support to MatchaTTS by concatenating language embeddings to the encoder and decoder inputs.
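As a rough sketch of the idea (the names, shapes, and embedding dimension here are illustrative assumptions, not the PR's actual code), concatenating an L2-normalized language embedding to the per-frame encoder input could look like this:

```python
import numpy as np

def add_language_embedding(x, lang_emb_table, lang_id):
    """Sketch only (not the PR's code): concatenate an L2-normalized
    language embedding to an utterance's input features.

    x:              (channels, time) feature matrix for one utterance
    lang_emb_table: (n_langs, emb_dim) learned language embedding table
    lang_id:        integer language index for this utterance
    """
    emb = lang_emb_table[lang_id]
    emb = emb / np.linalg.norm(emb)  # L2-normalize, as the PR describes
    # Broadcast the embedding over every time frame, then stack it
    # below the original channels.
    emb_tiled = np.repeat(emb[:, None], x.shape[1], axis=1)
    return np.concatenate([x, emb_tiled], axis=0)
```

The same vector is repeated at every frame, so the encoder (and, analogously, the decoder) sees a constant language-identity signal alongside the acoustic/phonetic channels.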
Speaker and Language Disentanglement:
When training with only a few monolingual speakers, speaker and language IDs are highly correlated, so the model will likely (though this is untested) have difficulty disentangling them. With multilingual speakers, or a sufficiently large number of speakers, the model can better separate speaker and language information.
How to Use:
Specify the number of languages using the `n_langs` key in the data config and annotate each row of your data CSV with its language ID.
Backward Compatibility:
If you set `n_langs: 1`, the system behaves as before this pull request, with one exception: both speaker and language embeddings (if used) are now L2-normalized. In my experience, this helps maintain a balance between speaker and language information during training.
Note:
This pull request does not include any text processing functions for additional languages. You will need to ensure that your text preprocessing pipeline supports the languages you intend to use.
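For concreteness, a setup along the lines described above might look as follows. The file path, key placement, and filelist column order are assumptions for illustration only; check the PR's diff for the actual format.

```yaml
# hypothetical data config, e.g. configs/data/multilingual.yaml
n_spks: 4
n_langs: 2   # number of languages, as introduced by this PR
```

A filelist row would then carry a language ID alongside the speaker ID, e.g. (column order assumed):

```
wavs/utt0001.wav|0|1|Hello world.
```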