Synthalingua v1.2.1: Multi-Backend Support, Intelligent Mode, and Performance Overhaul #171

cyberofficial · 2025-08-14T09:19:06Z

cyberofficial
Aug 14, 2025
Maintainer

This is a landmark release for Synthalingua, introducing a completely new multi-backend architecture that provides massive performance gains, broader hardware compatibility, and new intelligent features to dramatically improve transcription accuracy. The core application has been refactored to be more modular, stable, and user-friendly.

Pick up the portable version on itch!

Key Highlights

Choose Your Engine: Select from three powerful transcription backends: the original whisper, the blazing-fast faster-whisper, or the hardware-accelerated openvino.
Intel Hardware Support: Native support for Intel iGPUs, discrete GPUs (Arc), and NPUs via the OpenVINO backend.
Intelligent Mode: Automatic quality analysis and model upscaling for subtitle generation to ensure the best possible accuracy.
Massive Performance Gains: Experience up to 4x faster transcription and 2x less VRAM usage with the faster-whisper backend.
New Automated Setup: A new SourceSetUp.py script handles the entire environment creation process, from installing Python to configuring dependencies.
Critical Stability Fix: An isolated worker process now prevents VRAM memory leaks, making long transcription sessions and model comparisons stable.

New Features & Major Enhancements

Multiple Backend Support (`--model_source`)

Synthalingua now integrates three distinct transcription backends. This allows users to select the optimal engine based on their specific hardware and performance requirements.

--model_source whisper: The default, original OpenAI Whisper implementation. It serves as a reliable baseline.
--model_source fasterwhisper: A complete re-implementation of Whisper using CTranslate2. This backend provides a significant performance boost and is the recommended choice for users with NVIDIA GPUs or those seeking better performance on a CPU.
--model_source openvino: A new backend leveraging Intel's OpenVINO toolkit for high-performance inference.

# Example: Use the faster-whisper backend for a significant speed-up
python synthalingua.py --model_source fasterwhisper --ram 6gb --file_input "video.mp4"

Intel Hardware Acceleration (via OpenVINO)

With the new OpenVINO backend, Synthalingua now officially supports and is optimized for a range of Intel hardware. The --device flag has been expanded with new options:

--device intel-igpu: For Intel integrated GPUs.
--device intel-dgpu: For Intel discrete GPUs, such as the Arc series.
--device intel-npu: For Intel Neural Processing Units, found in modern Intel Core Ultra processors, offering extremely efficient AI inference.

# Example: Run transcription on an Intel Arc GPU
python synthalingua.py --model_source openvino --device intel-dgpu --ram 3gb --file_input "video.mp4"

Model Quantization (`--compute_type`)

For the faster-whisper and openvino backends, users can now utilize quantized models. Quantization reduces the precision of the model's weights (e.g., from 32-bit floats to 8-bit integers), which dramatically decreases memory consumption and increases processing speed. This makes it feasible to run larger, more accurate models on consumer-grade hardware with limited VRAM.

# Example: Run the large-v3 model with int8 quantization to save memory
python synthalingua.py --model_source fasterwhisper --ram 11gb-v3 --compute_type int8 --file_input "video.mp4"

Intelligent Mode for Subtitle Generation (`--intelligent_mode`)

A new "intelligent mode" has been introduced for the subtitle generation feature (--makecaptions). When enabled, Synthalingua will automatically analyze the quality of a transcribed audio segment. If the confidence score is low or if the model produces repetitive, hallucinatory text, the tool will automatically retry that specific segment with the next-largest, more powerful model. This process continues until a satisfactory result is achieved or the largest model has been tried, ensuring the best possible accuracy without manual intervention.

# Example: Generate subtitles with automatic quality improvement
python synthalingua.py --makecaptions --intelligent_mode --file_input "lecture.mp4"

Performance & Stability Improvements

Isolated Transcription Worker Process for Stability

A critical stability issue has been resolved where GPU memory (VRAM) was not being fully released after a model was used. The subtitle generation process has been re-engineered to run each transcription model in an isolated child process (transcribe_worker.py).

After each segment is processed, the worker process terminates, guaranteeing that all of its memory is freed. This prevents memory leaks and crashes, especially when using the --makecaptions compare mode or the new --intelligent_mode.

Setup & User Experience

New Comprehensive Source Setup Script (`SourceSetUp.py`)

To handle the increased complexity of the new backends, a new, more robust setup script has been introduced. This script automates the entire environment configuration process:

It can download and perform a clean, isolated installation of the correct Python version (3.12.10) to prevent conflicts.
It creates a dedicated virtual environment to manage dependencies cleanly.
It handles the installation of all necessary packages, including backend-specific libraries like PyTorch for CUDA or OpenVINO for Intel.
It automatically downloads and configures FFmpeg and yt-dlp.
It generates a launcher batch file (ffmpeg_path.bat) that correctly sets up the system PATH and activates the virtual environment, simplifying the process of running the application.

Other Usability Improvements

Greatly Enhanced Command-Line Help (--help): The help descriptions for nearly all command-line arguments have been rewritten to be far more detailed and user-friendly.
Robust Vocal Isolation (Demucs) Path Handling: A new helper module (demucs_path_helper.py) has been added to reliably locate the correct Python executable for running Demucs, making the --isolate_vocals feature much more robust.

Full Changelog

Refactoring & Code Quality

Core Application Renamed: The main entry point transcribe_audio.py has been renamed to synthalingua.py.
Modular Backend Architecture: The transcription logic has been refactored into a modular, class-based system. Each backend is now encapsulated in its own class (BaseWhisper.py, FasterWhisper.py, OpenVINOWhisper.py), making the codebase cleaner and easier to extend.
Automatic Model File Migration: The BaseWhisper.py module now includes a one-time, automatic migration feature that moves any existing model files from the root models/ directory into the new, more organized models/Whisper/ subdirectory.

Build System

Overhauled Portable Executable Build: The build process for the portable Windows executable has been significantly improved with a new PyInstaller spec file and runtime hooks, resolving common runtime errors with the transformers library.
Automated Metadata Discovery: New helper scripts (find_metadata_packages*.py) now automatically generate the required metadata flags for Nuitka builds, ensuring all dependencies are correctly packaged.

Bug Fixes & Minor Changes

FasterWhisper Language Detection: Corrected the audio processing logic for language detection in the FasterWhisper.py module to ensure accurate language identification.
Robust Stream Transcription: The stream_transcription_module.py is now more resilient to errors and can handle corrupted or empty audio segments from HLS streams without crashing.
Browser Cookie Fix: Resolved an issue where cookies extracted from a browser were not being correctly used during interactive stream selection.
Unified Transcription Worker Logic: The transcribe_worker.py now correctly handles the different APIs and output formats of all three backends.
Improved Blocklist Filtering: The blocklist is now cached in memory and only re-read if the file is modified, making filtering more efficient.
Documentation: The README.md and modules/about.py files have been extensively updated to reflect all new features and to credit new contributors and backend creators.
Setup Script (setup.bat): The Windows setup script has been improved to allow users to reuse an existing virtual environment.

Automated Notes

What's Changed

Added support for different Whisper models (FasterWhisper and OpenVINO) by @YuumiPie in Added support for different Whisper models (FasterWhisper and OpenVINO) #165

New Contributors

@YuumiPie made their first contribution in Added support for different Whisper models (FasterWhisper and OpenVINO) #165

Full Changelog: 1.2.0...1.2.1

This discussion was created from the release Synthalingua v1.2.1: Multi-Backend Support, Intelligent Mode, and Performance Overhaul.

DonLaud · 2025-08-18T18:12:47Z

DonLaud
Aug 18, 2025

Hello, i've followed your project for a while now, i'm using it on linux with an AMD gpu, is in your intentions to handle the installation of PyTorch for ROCm too with the script?
Thank you for your work

5 replies

cyberofficial Aug 18, 2025
Maintainer Author

Hello 0/

Yes I plan to make the setup more dynamic, I plan on making set up based on OS and Graphics Provider. Initially, the project was meant for Windows + Nvidia, but over time it evolved to officially support other things. Keep in mind, I'm just a solo dev, so stuff is coming at a slow pace. I'll manage to get to it at some point. I also don't have an AMD device, so pushing out AMD based updates will require bug reports from the community to better mold it. I can only go off documentation, and hopefully it works and if not, maybe someone will report it.

My current progress is tracked here: #164

cyberofficial Aug 20, 2025
Maintainer Author

Hello again, I updated the 1.2.x branch with better install setups; https://github.com/cyberofficial/Synthalingua/tree/1.2.x

Please let me know if you encounter any issues, and i'll do my best to address them.

DonLaud Oct 13, 2025

Hello again, I updated the 1.2.x branch with better install setups; https://github.com/cyberofficial/Synthalingua/tree/1.2.x

Please let me know if you encounter any issues, and i'll do my best to address them.

Sorry i totally forgot to answer(I didn't log on the account), the new install setups is working very well. I have only problems with isolate_vocals because the script doesn't see the python.exe inside data_whisper. I tried to set the path myself but it's like the os library doesn't see the path(it's a full path) as a real path, the if even return false. Maybe is a problem with fedora idk, didn't work even with miniconda because on the last part of the installation it doesn't find an anaconda exe.

cyberofficial Oct 13, 2025
Maintainer Author

Hello, I have been working on making a new and improved setup for 1.2.5
It /should/ have better support with different environments. I'm trying to streamline stuff to where it can be better to work between different operating systems, the way I currently check stuff doesn't always work and really hard to handle most edge cases. 1.2.5 is around the corner if everything works out.

DonLaud Oct 13, 2025

I did a little more testing with a friend, the solution was creating a symlink ln -s python python.exe, now it recognize the path and the script works.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Synthalingua v1.2.1: Multi-Backend Support, Intelligent Mode, and Performance Overhaul #171

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 5 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

Synthalingua v1.2.1: Multi-Backend Support, Intelligent Mode, and Performance Overhaul #171

Uh oh!

cyberofficial Aug 14, 2025 Maintainer

Key Highlights

New Features & Major Enhancements

Multiple Backend Support (--model_source)

Intel Hardware Acceleration (via OpenVINO)

Model Quantization (--compute_type)

Intelligent Mode for Subtitle Generation (--intelligent_mode)

Performance & Stability Improvements

Isolated Transcription Worker Process for Stability

Setup & User Experience

New Comprehensive Source Setup Script (SourceSetUp.py)

Other Usability Improvements

Full Changelog

Refactoring & Code Quality

Build System

Bug Fixes & Minor Changes

Automated Notes

What's Changed

New Contributors

Replies: 1 comment · 5 replies

Uh oh!

DonLaud Aug 18, 2025

Uh oh!

cyberofficial Aug 18, 2025 Maintainer Author

Uh oh!

cyberofficial Aug 20, 2025 Maintainer Author

Uh oh!

DonLaud Oct 13, 2025

Uh oh!

Uh oh!

cyberofficial Oct 13, 2025 Maintainer Author

Uh oh!

DonLaud Oct 13, 2025

cyberofficial
Aug 14, 2025
Maintainer

Multiple Backend Support (`--model_source`)

Model Quantization (`--compute_type`)

Intelligent Mode for Subtitle Generation (`--intelligent_mode`)

New Comprehensive Source Setup Script (`SourceSetUp.py`)

Replies: 1 comment 5 replies

DonLaud
Aug 18, 2025

cyberofficial Aug 18, 2025
Maintainer Author

cyberofficial Aug 20, 2025
Maintainer Author

cyberofficial Oct 13, 2025
Maintainer Author