Synthalingua v1.2.1: Multi-Backend Support, Intelligent Mode, and Performance Overhaul #171
Pinned · cyberofficial announced in Announcements
Hello, I've followed your project for a while now. I'm using it on Linux with an AMD GPU. Do you intend to have the script handle installing PyTorch for ROCm as well?
This is a landmark release for Synthalingua, introducing a completely new multi-backend architecture that provides massive performance gains, broader hardware compatibility, and new intelligent features to dramatically improve transcription accuracy. The core application has been refactored to be more modular, stable, and user-friendly.
Pick up the portable version on itch!
## Key Highlights

- Choose between the classic `whisper` backend, the blazing-fast `faster-whisper` backend, or the hardware-accelerated `openvino` (faster-whisper) backend.
- The new `SourceSetUp.py` script handles the entire environment creation process, from installing Python to configuring dependencies.

## New Features & Major Enhancements
### Multiple Backend Support (`--model_source`)

Synthalingua now integrates three distinct transcription backends, allowing users to select the optimal engine for their specific hardware and performance requirements.
- `--model_source whisper`: The default, original OpenAI Whisper implementation. It serves as a reliable baseline.
- `--model_source fasterwhisper`: A complete re-implementation of Whisper using CTranslate2. This backend provides a significant performance boost and is the recommended choice for users with NVIDIA GPUs or those seeking better CPU performance.
- `--model_source openvino`: A new backend leveraging Intel's OpenVINO toolkit for high-performance inference.
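As a rough sketch of how a `--model_source` value might map onto the backend modules named later in these notes, here is a minimal dispatch table. The function name `resolve_backend` and the error handling are illustrative assumptions, not Synthalingua's actual API:

```python
# Hypothetical sketch: map a --model_source value to the module implementing it.
# Module names come from the release notes; the function itself is illustrative.

BACKENDS = {
    "whisper": "BaseWhisper",          # original OpenAI Whisper
    "fasterwhisper": "FasterWhisper",  # CTranslate2 re-implementation
    "openvino": "OpenVINOWhisper",     # Intel OpenVINO toolkit
}

def resolve_backend(model_source: str) -> str:
    """Return the backend module name for a --model_source value."""
    try:
        return BACKENDS[model_source]
    except KeyError:
        valid = ", ".join(sorted(BACKENDS))
        raise ValueError(
            f"unknown --model_source {model_source!r}; expected one of: {valid}"
        )

print(resolve_backend("fasterwhisper"))  # FasterWhisper
```

A table-driven dispatch like this keeps adding a fourth backend down to one new entry rather than another `if/elif` branch.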
### Intel Hardware Acceleration (via OpenVINO)

With the new OpenVINO backend, Synthalingua now officially supports and is optimized for a range of Intel hardware. The `--device` flag has been expanded with new options:

- `--device intel-igpu`: For Intel integrated GPUs.
- `--device intel-dgpu`: For Intel discrete GPUs, such as the Arc series.
- `--device intel-npu`: For Intel Neural Processing Units, found in modern Intel Core Ultra processors, offering extremely efficient AI inference.
### Model Quantization (`--compute_type`)

For the `faster-whisper` and `openvino` backends, users can now utilize quantized models. Quantization reduces the precision of the model's weights (e.g., from 32-bit floats to 8-bit integers), which dramatically decreases memory consumption and increases processing speed. This makes it feasible to run larger, more accurate models on consumer-grade hardware with limited VRAM.
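To make the memory savings concrete, here is a toy symmetric int8 quantization scheme. This is purely conceptual: it is not how faster-whisper or OpenVINO quantize internally, and the per-tensor scale is a simplification:

```python
# Toy illustration of 8-bit weight quantization (symmetric, per-tensor).
# Conceptual only; real backends use more sophisticated schemes.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map float weights into the int8 range [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

w = [0.5, -1.27, 0.003, 1.0]
q, s = quantize_int8(w)
w_approx = dequantize(q, s)
# Each int8 weight takes 1 byte instead of the 4 bytes of a float32,
# at the cost of a small rounding error bounded by scale / 2.
```

The 4x size reduction is exactly why quantized large models fit into VRAM budgets that would otherwise only hold a medium model.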
### Intelligent Mode for Subtitle Generation (`--intelligent_mode`)

A new "intelligent mode" has been introduced for the subtitle generation feature (`--makecaptions`). When enabled, Synthalingua automatically analyzes the quality of each transcribed audio segment. If the confidence score is low, or if the model produces repetitive, hallucinatory text, the tool automatically retries that segment with the next-largest, more powerful model. This process continues until a satisfactory result is achieved or the largest model has been tried, ensuring the best possible accuracy without manual intervention.
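The escalation logic described above can be sketched as a simple ladder of models. The model names are standard Whisper sizes, but the ladder, thresholds, and stand-in quality check are illustrative assumptions, not Synthalingua's actual implementation:

```python
# Conceptual sketch of intelligent mode's escalating retry.
# The is_acceptable callback stands in for the real confidence/repetition check.

MODEL_LADDER = ["base", "small", "medium", "large"]

def transcribe_with_escalation(segment, transcribe, is_acceptable):
    """Try models smallest-to-largest until the result passes the check."""
    result = None
    for model in MODEL_LADDER:
        result = transcribe(segment, model)
        if is_acceptable(result):
            return model, result
    # Largest model reached: keep its result even if still imperfect.
    return MODEL_LADDER[-1], result

# Usage with stand-in functions: confidence improves with model size.
fake_conf = {"base": 0.3, "small": 0.55, "medium": 0.8, "large": 0.9}
model, out = transcribe_with_escalation(
    "audio.wav",
    transcribe=lambda seg, m: {"text": "...", "confidence": fake_conf[m]},
    is_acceptable=lambda r: r["confidence"] >= 0.7,
)
print(model)  # medium
```

Starting small and escalating only on failure keeps the common case fast: most segments never pay for the large model.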
## Performance & Stability Improvements

### Isolated Transcription Worker Process for Stability
A critical stability issue has been resolved where GPU memory (VRAM) was not being fully released after a model was used. The subtitle generation process has been re-engineered to run each transcription model in an isolated child process (`transcribe_worker.py`). After each segment is processed, the worker process terminates, guaranteeing that all of its memory is freed. This prevents memory leaks and crashes, especially when using the `--makecaptions compare` mode or the new `--intelligent_mode`.
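The isolation pattern itself (do memory-hungry work in a short-lived child process so the OS reclaims everything when it exits) can be sketched with the standard library. The child's payload here is a stand-in, not the real transcription code:

```python
# Sketch of per-segment process isolation: the child allocates "model memory",
# does its work, and exits, so the OS frees every byte it held.

import subprocess
import sys

def transcribe_isolated(segment: str) -> str:
    child_code = (
        "import sys\n"
        "buf = bytearray(10_000_000)  # simulate loaded model weights\n"
        "print('transcribed:' + sys.argv[1])\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", child_code, segment],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

print(transcribe_isolated("segment_001"))  # transcribed:segment_001
```

Leaking frameworks cannot hold VRAM past process exit, which is why terminating the worker after each segment is a stronger guarantee than any in-process cleanup call.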
## Setup & User Experience

### New Comprehensive Source Setup Script (`SourceSetUp.py`)

To handle the increased complexity of the new backends, a new, more robust setup script has been introduced. This script automates the entire environment configuration process:
- Creates a launcher script (`ffmpeg_path.bat`) that correctly sets up the system PATH and activates the virtual environment, simplifying the process of running the application.

### Other Usability Improvements
- Improved help text (`--help`): The help descriptions for nearly all command-line arguments have been rewritten to be far more detailed and user-friendly.
- A new helper module (`demucs_path_helper.py`) has been added to reliably locate the correct Python executable for running Demucs, making the `--isolate_vocals` feature much more robust.
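Locating the right interpreter for a helper tool is the kind of job `demucs_path_helper.py` is described as doing; a minimal sketch of that idea, with an assumed search order that may differ from the real module:

```python
# Illustrative sketch of finding a usable Python executable for a subtool.
# The candidate names and preference order are assumptions.

import os
import shutil
import sys

def find_python() -> str:
    """Prefer the running interpreter, else search PATH for a python binary."""
    if sys.executable and os.path.exists(sys.executable):
        return sys.executable
    for name in ("python3", "python"):
        path = shutil.which(name)
        if path:
            return path
    raise FileNotFoundError("no Python interpreter found on PATH")
```

Preferring `sys.executable` keeps the subtool inside the same virtual environment as the main application, which avoids the classic "works in one venv, fails in another" failure mode.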
## Full Changelog

### Refactoring & Code Quality
- `transcribe_audio.py` has been renamed to `synthalingua.py`.
- Backend logic has been split into dedicated modules (`BaseWhisper.py`, `FasterWhisper.py`, `OpenVINOWhisper.py`), making the codebase cleaner and easier to extend.
- The `BaseWhisper.py` module now includes a one-time, automatic migration feature that moves any existing model files from the root `models/` directory into the new, more organized `models/Whisper/` subdirectory.
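A one-time migration like the one described above can be sketched with `pathlib`. The directory names match the release notes, but the file-matching rule (`*.pt`) is an assumption for illustration:

```python
# Sketch of moving loose model files from models/ into models/Whisper/.
# Top-level *.pt files only; subdirectories are left untouched.

from pathlib import Path

def migrate_models(root: str = "models") -> list[str]:
    """Move model files into the Whisper/ subdirectory; return moved names."""
    src = Path(root)
    dst = src / "Whisper"
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in src.glob("*.pt"):  # glob("*.pt") matches only direct children
        f.rename(dst / f.name)
        moved.append(f.name)
    return sorted(moved)
```

Because the move is idempotent (a second run finds nothing left to match), the migration is safe to trigger on every startup.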
### Build System

- Changes affecting the `transformers` library.
- New helper scripts (`find_metadata_packages*.py`) now automatically generate the required metadata flags for Nuitka builds, ensuring all dependencies are correctly packaged.
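Generating per-package flags from the installed environment might look like the following sketch. The flag name used below is an assumption for illustration; the actual `find_metadata_packages*.py` scripts and the exact Nuitka option they emit may differ:

```python
# Hypothetical sketch: emit one Nuitka metadata flag per installed
# distribution we care about. Flag name is assumed, not verified.

from importlib.metadata import distributions

def nuitka_metadata_flags(wanted: set[str]) -> list[str]:
    """Return one flag per installed distribution whose name is in `wanted`."""
    flags = []
    for dist in distributions():
        name = dist.metadata["Name"]
        if name and name in wanted:
            flags.append(f"--include-distribution-metadata={name}")
    return sorted(flags)
```

Deriving the flag list from `importlib.metadata` at build time means the flags can never drift out of sync with what is actually installed in the build environment.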
### Bug Fixes & Minor Changes

- A language detection fix has been applied to the `FasterWhisper.py` module to ensure accurate language identification.
- `stream_transcription_module.py` is now more resilient to errors and can handle corrupted or empty audio segments from HLS streams without crashing.
- `transcribe_worker.py` now correctly handles the different APIs and output formats of all three backends.
- The `README.md` and `modules/about.py` files have been extensively updated to reflect all new features and to credit new contributors and backend creators.
- Improved setup script (`setup.bat`): The Windows setup script now allows users to reuse an existing virtual environment.

## Automated Notes
### What's Changed

### New Contributors

**Full Changelog**: 1.2.0...1.2.1
This discussion was created from the release Synthalingua v1.2.1: Multi-Backend Support, Intelligent Mode, and Performance Overhaul.