WenetSpeech is a family of speech datasets. "We" stands for co-creation and win-win, "Net" for the internet, and "Speech" for speech data.
The following datasets are currently available, and more will be added in the future.
| Dataset | Year | Hours | Paper | GitHub | Hugging Face | WeChat Article |
|---|---|---|---|---|---|---|
| WenetSpeech | 2021 | 10,000 | paper | github | data | blog |
| WenetSpeech4TTS | 2024 | 12,800 | paper | / | data | blog |
| WenetSpeech-Yue | 2025 | 21,800 | paper | github | data | blog |
| WenetSpeech-Chuan | 2025 | 10,000 | paper | github | data | blog |