[examples] add libritts recipe #56
```python
def extract(self, item):
    import s3tokenizer
    waveform, sample_rate = torchaudio.load(item['wav'])
    item['wav'] = waveform
```
These two lines are no longer needed; the code below can use `waveform` and `sample_rate` directly:

```python
audio = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
```
```python
def extract(self, item):
    import s3tokenizer
    IGNORE_TOKEN_ID = LabelSmoother.ignore_index
    waveform, sample_rate = torchaudio.load(item['wav'])
```
Same as above.
```python
en_tn_model = EnNormalizer(overwrite_cache=False)
# ASR model
model = whisper.load_model(
    "/jfs-hdfs/user/xingchen.song/share/whisper/large-v3-turbo.pt"
```
Change this to `large-v3-turbo` so that the model is downloaded automatically when users run the recipe.
```python
import wespeaker


model = wespeaker.load_model(
    model_dir="/jfs-hdfs/user/binbin.zhang/models/wespeaker/chinese"
```
Please also change this to a model name that wespeaker can download automatically. CC @cdliang11
```
@@ -0,0 +1,7 @@
{
  "llm_model_name_or_path": "/bucket/output/jfs-hdfs/user/binbin.zhang/github/west/examples/aishell/tts/model/Qwen2.5-0.5B-Audio",
```
This model is Qwen2.5-0.5B with 4096 speech tokens added to its vocabulary; we will also add the steps for producing this model later.
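To make the vocabulary extension concrete: if the 4096 speech tokens are appended after the base Qwen2.5-0.5B vocabulary, speech token `k` gets LLM id `base_vocab_size + k`. The sketch below assumes that layout; the token spelling `<|speech_k|>` and all helper names are hypothetical, and `base_vocab_size` should be read from the tokenizer actually used rather than hard-coded. With Hugging Face `transformers`, the matching steps would be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
NUM_SPEECH_TOKENS = 4096  # speech tokens added on top of Qwen2.5-0.5B


def speech_token_strings(num_tokens: int = NUM_SPEECH_TOKENS) -> list:
    """Hypothetical spellings for the added speech tokens."""
    return [f"<|speech_{k}|>" for k in range(num_tokens)]


def speech_to_llm_id(k: int, base_vocab_size: int,
                     num_tokens: int = NUM_SPEECH_TOKENS) -> int:
    """Map speech-token index k (0..num_tokens-1) to its id in the extended vocab."""
    if not 0 <= k < num_tokens:
        raise ValueError(f"speech token index out of range: {k}")
    return base_vocab_size + k


def llm_to_speech_id(token_id: int, base_vocab_size: int,
                     num_tokens: int = NUM_SPEECH_TOKENS) -> int:
    """Inverse mapping; raises for ids that belong to the original text vocab."""
    k = token_id - base_vocab_size
    if not 0 <= k < num_tokens:
        raise ValueError(f"not a speech token id: {token_id}")
    return k
```

The inverse mapping is what turns generated LLM ids back into speech-token indices; keeping both directions in one place avoids off-by-offset bugs between training and inference.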
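In a follow-up, please add a README with two main sections, Tutorial and Results, modeled on the aishell recipe. We'll merge this PR first and address some of the remaining comments in follow-up PRs.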
Add Libritts recipe