Skip to content

Commit 6d932da

Browse files
committed
whisper-large-v3-turbo
1 parent 2330e58 commit 6d932da

File tree

7 files changed

+14
-8
lines changed

7 files changed

+14
-8
lines changed

README.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,7 @@
2929

3030
<a name="whats-new"></a>
3131
## What's new:
32+
- 2024/10/10:Added support for the Whisper-large-v3-turbo model, a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. It can be downloaded from the[modelscope](examples/industrial_data_pretraining/whisper/demo.py), and [openai](examples/industrial_data_pretraining/whisper/demo_from_openai.py).
3233
- 2024/09/26: Offline File Transcription Service 4.6, Offline File Transcription Service of English 1.7,Real-time Transcription Service 1.11 released,fix memory leak & Support the SensevoiceSmall onnx model;File Transcription Service 2.0 GPU released, Fix GPU memory leak; ([docs](runtime/readme.md));
3334
- 2024/09/25:keyword spotting models are new supported. Supports fine-tuning and inference for four models: [fsmn_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [fsmn_kws_mt](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [sanm_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-offline), [sanm_kws_streaming](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online).
3435
- 2024/07/04:[SenseVoice](https://github.com/FunAudioLLM/SenseVoice) is a speech foundation model with multiple speech understanding capabilities, including ASR, LID, SER, and AED.
@@ -95,18 +96,18 @@ FunASR has open-sourced a large number of pre-trained models on industrial data.
9596

9697
| Model Name | Task Details | Training Data | Parameters |
9798
|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------:|:--------------------------------:|:----------:|
98-
| SenseVoiceSmall <br> ([](https://www.modelscope.cn/models/iic/SenseVoiceSmall) [🤗](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) ) | multiple speech understanding capabilities, including ASR, ITN, LID, SER, and AED, support languages such as zh, yue, en, ja, ko | 300000 hours | 234M |
99+
| SenseVoiceSmall <br> ([](https://www.modelscope.cn/models/iic/SenseVoiceSmall) [🤗](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) ) | multiple speech understanding capabilities, including ASR, ITN, LID, SER, and AED, support languages such as zh, yue, en, ja, ko | 300000 hours | 234M |
99100
| paraformer-zh <br> ([](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗](https://huggingface.co/funasr/paraformer-zh) ) | speech recognition, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
100101
| <nobr>paraformer-zh-streaming <br> ( [](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) )</nobr> | speech recognition, streaming | 60000 hours, Mandarin | 220M |
101102
| paraformer-en <br> ( [](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) ) | speech recognition, without timestamps, non-streaming | 50000 hours, English | 220M |
102103
| conformer-en <br> ( [](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | speech recognition, non-streaming | 50000 hours, English | 220M |
103104
| ct-punc <br> ( [](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | punctuation restoration | 100M, Mandarin and English | 290M |
104105
| fsmn-vad <br> ( [](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | voice activity detection | 5000 hours, Mandarin and English | 0.4M |
105-
| fsmn-kws <br> ( [](https://modelscope.cn/models/iic/speech_charctc_kws_phone-xiaoyun/summary) ) | keyword spotting,streaming | 5000 hours, Mandarin | 0.7M |
106+
| fsmn-kws <br> ( [](https://modelscope.cn/models/iic/speech_charctc_kws_phone-xiaoyun/summary) ) | keyword spotting,streaming | 5000 hours, Mandarin | 0.7M |
106107
| fa-zh <br> ( [](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | timestamp prediction | 5000 hours, Mandarin | 38M |
107108
| cam++ <br> ( [](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | speaker verification/diarization | 5000 hours | 7.2M |
108-
| Whisper-large-v2 <br> ([](https://www.modelscope.cn/models/iic/speech_whisper-large_asr_multilingual/summary) [🍀](https://github.com/openai/whisper) ) | speech recognition, with timestamps, non-streaming | multilingual | 1550 M |
109109
| Whisper-large-v3 <br> ([](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary) [🍀](https://github.com/openai/whisper) ) | speech recognition, with timestamps, non-streaming | multilingual | 1550 M |
110+
| Whisper-large-v3-turbo <br> ([](https://www.modelscope.cn/models/iic/Whisper-large-v3-turbo/summary) [🍀](https://github.com/openai/whisper) ) | speech recognition, with timestamps, non-streaming | multilingual | 1550 M |
110111
| Qwen-Audio <br> ([](examples/industrial_data_pretraining/qwen_audio/demo.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio) ) | audio-text multimodal models (pretraining) | multilingual | 8B |
111112
| Qwen-Audio-Chat <br> ([](examples/industrial_data_pretraining/qwen_audio/demo_chat.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio-Chat) ) | audio-text multimodal models (chat) | multilingual | 8B |
112113
| emotion2vec+large <br> ([](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary) [🤗](https://huggingface.co/emotion2vec/emotion2vec_plus_large) ) | speech emotion recongintion | 40000 hours | 300M |

README_zh.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ FunASR希望在语音识别的学术研究和工业应用之间架起一座桥
3333

3434
<a name="最新动态"></a>
3535
## 最新动态
36+
- 2024/10/10:新增加Whisper-large-v3-turbo模型支持,多语言语音识别/翻译/语种识别,支持从 [modelscope](examples/industrial_data_pretraining/whisper/demo.py)仓库下载,也支持从 [openai](examples/industrial_data_pretraining/whisper/demo_from_openai.py)仓库下载模型。
3637
- 2024/09/26: 中文离线文件转写服务 4.6、英文离线文件转写服务 1.7、中文实时语音听写服务 1.11 发布,修复ONNX内存泄漏、支持SensevoiceSmall onnx模型;中文离线文件转写服务GPU 2.0 发布,修复显存泄漏; 详细信息参阅([部署文档](runtime/readme_cn.md))
3738
- 2024/09/25:新增语音唤醒模型,支持[fsmn_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [fsmn_kws_mt](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [sanm_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-offline), [sanm_kws_streaming](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online) 4个模型的微调和推理。
3839
- 2024/07/04:[SenseVoice](https://github.com/FunAudioLLM/SenseVoice) 是一个基础语音理解模型,具备多种语音理解能力,涵盖了自动语音识别(ASR)、语言识别(LID)、情感识别(SER)以及音频事件检测(AED)。
@@ -102,17 +103,18 @@ FunASR开源了大量在工业数据上预训练模型,您可以在[模型许
102103

103104
| 模型名字 | 任务详情 | 训练数据 | 参数量 |
104105
|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:--------------:|:------:|
105-
| SenseVoiceSmall <br> ([](https://www.modelscope.cn/models/iic/SenseVoiceSmall) [🤗](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) ) | 多种语音理解能力,涵盖了自动语音识别(ASR)、语言识别(LID)、情感识别(SER)以及音频事件检测(AED) | 400000小时,中文 | 330M |
106+
| SenseVoiceSmall <br> ([](https://www.modelscope.cn/models/iic/SenseVoiceSmall) [🤗](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) ) | 多种语音理解能力,涵盖了自动语音识别(ASR)、语言识别(LID)、情感识别(SER)以及音频事件检测(AED) | 400000小时,中文 | 330M |
106107
| paraformer-zh <br> ([](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗](https://huggingface.co/funasr/paraformer-zh) ) | 语音识别,带时间戳输出,非实时 | 60000小时,中文 | 220M |
107108
| paraformer-zh-streaming <br> ( [](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) ) | 语音识别,实时 | 60000小时,中文 | 220M |
108109
| paraformer-en <br> ( [](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) ) | 语音识别,非实时 | 50000小时,英文 | 220M |
109110
| conformer-en <br> ( [](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | 语音识别,非实时 | 50000小时,英文 | 220M |
110111
| ct-punc <br> ( [](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | 标点恢复 | 100M,中文与英文 | 290M |
111112
| fsmn-vad <br> ( [](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | 语音端点检测,实时 | 5000小时,中文与英文 | 0.4M |
112-
| fsmn-kws <br> ( [](https://modelscope.cn/models/iic/speech_charctc_kws_phone-xiaoyun/summary) ) | 语音唤醒,实时 | 5000小时,中文 | 0.7M |
113+
| fsmn-kws <br> ( [](https://modelscope.cn/models/iic/speech_charctc_kws_phone-xiaoyun/summary) ) | 语音唤醒,实时 | 5000小时,中文 | 0.7M |
113114
| fa-zh <br> ( [](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | 字级别时间戳预测 | 50000小时,中文 | 38M |
114115
| cam++ <br> ( [](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | 说话人确认/分割 | 5000小时 | 7.2M |
115116
| Whisper-large-v3 <br> ([](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary) [🍀](https://github.com/openai/whisper) ) | 语音识别,带时间戳输出,非实时 | 多语言 | 1550 M |
117+
| Whisper-large-v3-turbo <br> ([](https://www.modelscope.cn/models/iic/Whisper-large-v3-turbo/summary) [🍀](https://github.com/openai/whisper) ) | 语音识别,带时间戳输出,非实时 | 多语言 | 809 M |
116118
| Qwen-Audio <br> ([](examples/industrial_data_pretraining/qwen_audio/demo.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio) ) | 音频文本多模态大模型(预训练) | 多语言 | 8B |
117119
| Qwen-Audio-Chat <br> ([](examples/industrial_data_pretraining/qwen_audio/demo_chat.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio-Chat) ) | 音频文本多模态大模型(chat版本) | 多语言 | 8B |
118120
| emotion2vec+large <br> ([](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary) [🤗](https://huggingface.co/emotion2vec/emotion2vec_plus_large) ) | 情感识别模型 | 40000小时,4种情感类别 | 300M |

examples/industrial_data_pretraining/whisper/demo.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
from funasr import AutoModel
99

1010
model = AutoModel(
11-
model="iic/Whisper-large-v3",
11+
model="Whisper-large-v3-turbo",
1212
vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
1313
vad_kwargs={"max_single_segment_time": 30000},
1414
)

examples/industrial_data_pretraining/whisper/demo_from_openai.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
# model = AutoModel(model="Whisper-medium", hub="openai")
1212
# model = AutoModel(model="Whisper-large-v2", hub="openai")
1313
model = AutoModel(
14-
model="Whisper-large-v3",
14+
model="Whisper-large-v3-turbo",
1515
vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
1616
vad_kwargs={"max_single_segment_time": 30000},
1717
hub="openai",

funasr/download/name_maps_from_hub.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@
3636
"iic/emotion2vec_plus_base": "emotion2vec/emotion2vec_plus_base",
3737
"emotion2vec_plus_seed": "emotion2vec/emotion2vec_plus_seed",
3838
"iic/emotion2vec_plus_seed": "emotion2vec/emotion2vec_plus_seed",
39+
"Whisper-large-v3-turbo": "iic/Whisper-large-v3-turbo",
3940
}
4041

4142
name_maps_openai = {
@@ -51,4 +52,5 @@
5152
"Whisper-large-v2": "large-v2",
5253
"Whisper-large-v3": "large-v3",
5354
"Whisper-large": "large",
55+
"Whisper-large-v3-turbo": "turbo",
5456
}

funasr/models/whisper/model.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
@tables.register("model_classes", "Whisper-large-v1")
2929
@tables.register("model_classes", "Whisper-large-v2")
3030
@tables.register("model_classes", "Whisper-large-v3")
31+
@tables.register("model_classes", "Whisper-large-v3-turbo")
3132
@tables.register("model_classes", "WhisperWarp")
3233
class WhisperWarp(nn.Module):
3334
def __init__(self, *args, **kwargs):

funasr/version.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
1.1.11
1+
1.1.12

0 commit comments

Comments
 (0)