Feature request: Recognize Latin words when transcribing in various languages #181
Replies: 1 comment
-
It is a general limitation of the Whisper model that terms from a foreign language are often not transcribed correctly. Widely used Latin or English terms should still be recognized. But more specialized vocabulary, for example from the medical field, would require additional training ("fine-tuning") of the models. Unfortunately, this is beyond my capabilities. People who do this kind of training often share their models on Hugging Face (search for Whisper models). If you find an interesting model, I can help you integrate it into noScribe. However, my experience with such fine-tuned models has been mixed: improvements in one domain often come with downsides in another.
-
Hi!
I often work with audio recordings in Polish that contain many Latin terms (especially in medical and scientific contexts). Currently, when I select Polish as the transcription language, Latin words are often not recognized or are transcribed incorrectly. This issue is not unique to Polish—Latin terminology is frequently used in many languages, particularly in medicine, biology, law, and academia.
Would it be possible to add an option or improve the model so that, when transcribing in languages such as Polish, English, German, etc., Latin words (especially standard medical/scientific/legal terms) are also recognized and transcribed correctly?
Alternatively, could the interface offer an option to combine the selected language with Latin recognition, or a "multilingual" mode for such cases?
This feature would be extremely helpful for users working with mixed-language (e.g., Polish-Latin, English-Latin) audio, especially in professional and academic fields.
Thank you very much for considering this improvement!
Dzimma