Feature request: Recognize Latin words when transcribing in various languages #181

Dzimma-git · 2025-05-25T17:31:02Z

Hi!

I often work with audio recordings in Polish that contain many Latin terms (especially in medical and scientific contexts). Currently, when I select Polish as the transcription language, Latin words are often not recognized or are transcribed incorrectly. This issue is not unique to Polish: Latin terminology is frequently used in many languages, particularly in medicine, biology, law, and academia.

Would it be possible to add an option or improve the model so that, when transcribing in languages such as Polish, English, German, etc., Latin words (especially standard medical/scientific/legal terms) are also recognized and transcribed correctly?
Alternatively, maybe an option to combine the selected language with Latin recognition, or to use a "multilingual" mode for such cases, could be suggested in the interface?

This feature would be extremely helpful for users working with mixed-language (e.g., Polish-Latin, English-Latin) audio, especially in professional and academic fields.

Thank you very much for considering this improvement!

Dzimma

kaixxx (Maintainer) · 2025-05-26T08:31:06Z

It is a general limitation of the Whisper model that terms from a foreign language are often not transcribed correctly. Widely used Latin or English terms should still be recognized. But for more specialized vocabulary, such as medical terminology, the models would need specific training ("fine-tuning"). This is beyond my capabilities, unfortunately. People who do this kind of training often share their models on Hugging Face (search for Whisper models). If you find an interesting model, I can help you integrate it into noScribe. However, my experience with such fine-tuned models has been mixed. Often, improvements in one domain come with downsides in another.
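
For anyone who wants to try the model search mentioned above, here is a minimal sketch using only the Python standard library. It assumes the public Hugging Face Hub listing endpoint (`https://huggingface.co/api/models`) with its `search`, `filter`, and `limit` query parameters, and that each result carries an `id` field. The function names `build_search_url` and `search_models` are made up for illustration and are not part of noScribe.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Hub listing endpoint; not something noScribe itself uses.
HUB_API = "https://huggingface.co/api/models"


def build_search_url(query, limit=10):
    """Build a Hub search URL restricted to speech-recognition models."""
    params = {
        "search": query,                           # free-text search, e.g. "whisper polish"
        "filter": "automatic-speech-recognition",  # assumed task tag for ASR models
        "limit": limit,
    }
    return HUB_API + "?" + urllib.parse.urlencode(params)


def search_models(query, limit=10):
    """Return the ids of matching models (requires network access)."""
    url = build_search_url(query, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        models = json.load(resp)
    # Each entry is assumed to be a JSON object with an "id" such as "user/model-name".
    return [m.get("id") for m in models]
```

Calling `search_models("whisper polish medical")` would return a list of candidate model ids to look at by hand. Whether any of them handles Latin terminology well, or comes in a format noScribe can load, still has to be checked model by model.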
