Synthalingua v1.2.0 (Now in Beta) #162

cyberofficial · 2025-07-16T22:57:25Z

cyberofficial
Jul 16, 2025
Maintainer

Disclaimer: This is a Beta Release

This update represents a significant architectural overhaul of Synthalingua. While it has been tested, some features are experimental and you may encounter bugs or unexpected behavior. Your feedback is crucial during this phase. If you find something that isn't working right, please report it on the GitHub Issues page so I can make this version as stable as possible.

Beyond the refactor itself, this release brings significant new features, major performance and stability improvements, and a much-improved user experience. My primary goals were to make the codebase easier to maintain and to introduce powerful new capabilities like vocal isolation and intelligent subtitle generation.

Primary Repository: https://github.com/cyberofficial/Synthalingua
Beta Branch: https://github.com/cyberofficial/Synthalingua/tree/1.2.x
Bug Reports: Submit an Issue
Feedback & Suggestions: Start a Discussion

Portable + GUI is at itch.io! Pick it up there.

TL;DR Version

Welcome to the biggest Synthalingua update yet! I've packed this release with powerful new features and quality-of-life improvements that make transcribing and translating audio easier, faster, and more accurate than ever before.

Awesome New Toys

Vocal Isolation Has Arrived! - Got a song or a noisy video? The new --isolate_vocals feature uses advanced AI (Demucs) to strip out music and background noise, leaving you with just the voice for a crystal-clear transcription.
Super-Smart Subtitle Generator - The subtitle creator (--makecaptions) is now a genius. It automatically detects and skips silent parts of your audio, saving you a ton of processing time. If it's not confident about a transcription, it will even automatically try a bigger, smarter model to get it right!
Pick the Perfect Stream - No more guessing which stream link to use! The new --selectsource command shows you all available audio qualities for a stream and even lets you listen to a 10-second preview to make sure you're grabbing the live feed.
The Ultimate "Compare Mode" - Can't decide which model gives the best results? Use --makecaptions compare to generate subtitles with every single model at once. Just compare the files and pick your favorite!

Making Life Easier

The Smoothest Setup Ever - I completely rebuilt the setup script. It's now interactive, smarter, and saves all the tools it downloads so you don't have to re-download them every time. It even sets up the complex vocal isolation tools for you!
No-Fuss Cookie Handling - Need to log in to a stream? You can now pull cookies directly from your browser (--cookies-from-browser chrome) without having to find and export the file yourself.
Better Live Transcriptions - I added a new --paddedaudio feature that gives the AI more context, resulting in smoother and more consistent live transcriptions from your mic or a stream.
See Your Mic's Audio Levels - When using remote_microphone.py, you now get a live audio meter right in your console, so you can see if your mic is picking up audio correctly.
A Fresh Coat of Paint - The whole app is now easier on the eyes, with stylish, colored, and icon-based console messages. The Discord notifications are also cleaner and more informative!
Final Output in Console - When making captions, you can now choose to have the final SRT file shown in the console window.

Squashing Annoying Bugs

Fixed: The subtitle generator no longer eats all your memory (VRAM/RAM) when processing multiple files.
Fixed: The web server won't get "stuck" on exit anymore. I gave it a special off-switch (kill_server.py).
Fixed: Live transcriptions are way less likely to get stuck in a repetitive loop, printing the same phrase over and over.

The Technical Deep Dive (For Nerds)

This release focused on a major architectural refactor to improve modularity, stability, and to enable a new suite of advanced features.

Architectural Changes

Isolated Worker Process for Transcription: To combat memory leaks, sub_gen now spawns a separate transcribe_worker.py process for each transcription task. This guarantees that all model-related resources (especially VRAM) are released upon completion, enabling features like compare mode and improving stability during batch processing.
Background Queue for Microphone Processing: The TranscriptionCore now uses a threaded background queue to process microphone audio chunks. This decouples audio capture from processing, leading to a more responsive system that can better handle high-frequency input and prevent audio data loss.
PID-Based Server Management: The Flask server (api_backend.py) now creates a server.pid file on startup, and a watchdog thread monitors that file. Running the new kill_server.py utility removes the file, which triggers an immediate and safe shutdown. This resolves long-standing issues with the server process hanging on exit.
Centralized Temporary File Management: A new singleton TempFileManager in sub_gen.py manages all temporary files and directories, ensuring they are created in a central temp/ subfolder and reliably cleaned up on exit (unless --keep_temp is used).
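
The worker-process isolation described above can be sketched roughly as follows. This is not the project's actual transcribe_worker.py: the inline child script, the `run_in_worker` helper, and the JSON-over-stdin/stdout protocol are all assumptions made for illustration. The only point it demonstrates is that whatever the child allocates is returned to the OS once the child exits.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a worker script: it reads a JSON job from stdin,
# pretends to transcribe, and writes a JSON result to stdout.
WORKER_SOURCE = """
import json, sys
job = json.load(sys.stdin)
# A real worker would load the model and transcribe job["audio"] here.
result = {"audio": job["audio"], "model": job["model"], "text": "(transcript)"}
json.dump(result, sys.stdout)
"""


def run_in_worker(audio_path: str, model: str, timeout: float = 60.0) -> dict:
    """Run one job in a fresh Python process and return its parsed result."""
    job = json.dumps({"audio": audio_path, "model": model})
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=job,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the worker crashes
    )
    # The child has exited by now, so everything it allocated is released.
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # One fresh process per model, similar in spirit to a compare-style run.
    for model in ("small", "medium"):
        print(run_in_worker("example.wav", model))
```

Running each model in its own process costs some startup time per task, but it means a leak in one run cannot accumulate across a batch.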

Module-Specific Updates

`set_up_env.py` - Environment Setup Overhaul

Complete Rewrite: The script is now interactive and state-aware. It checks for existing tools before downloading.
Asset Caching: All downloads (FFmpeg, yt-dlp, 7zr, Miniconda) are now stored locally in a downloaded_assets/ directory for reuse.
Vocal Isolation Setup:
- Adds --using_vocal_isolation flag.
- Automates the download and silent installation of Miniconda.
- Creates a data_whisper conda environment with Python 3.12.
- Installs the correct PyTorch version (with CUDA if selected) and demucs.
Improved User Experience: Prompts users for custom paths, reinstallation choices, and default settings, providing a much friendlier setup process.
Path Management: The final ffmpeg_path.bat is now intelligently generated to include conda activation scripts if vocal isolation is set up.
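
The "check before downloading" behaviour of the asset cache can be illustrated with a short sketch. The `fetch_asset` helper, its parameters, and the `.part` temporary suffix are hypothetical and not taken from set_up_env.py; only the downloaded_assets/ directory name comes from the notes above.

```python
from pathlib import Path
from typing import Callable


def fetch_asset(
    name: str,
    url: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> tuple[Path, bool]:
    """Return (path, downloaded). The download runs only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists() and target.stat().st_size > 0:
        return target, False  # cache hit: reuse the earlier download

    # Write to a temporary name first so an interrupted download is never
    # mistaken for a complete file on the next run.
    partial = target.with_name(target.name + ".part")
    download(url, partial)
    partial.replace(target)
    return target, True
```

In a real script the `download` argument could be something like `lambda url, dest: urllib.request.urlretrieve(url, dest)`, with `cache_dir` set to `Path("downloaded_assets")`.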

`sub_gen.py` & `transcribe_worker.py` - Subtitle Generation

New Feature: Vocal Isolation: Implemented --isolate_vocals flag which runs the demucs CLI on the input file. Supports model selection (--demucs_model) and multi-threading (--isolate_vocals [jobs]).
New Feature: Silence Detection: Added --silent_detect to analyze the audio file, identify speech regions, and process them in batches, skipping silence. This dramatically reduces processing time on sparse audio tracks.
New Feature: Intelligent Mode: When using silence detection, if a transcribed region has low confidence or contains repetitions, the system can automatically re-process it with the next-higher quality model.
New Feature: Media Segmentation: For any file with a detectable duration, the script now offers to split it into segments before processing to prevent memory overload. Supports both suggested and custom split points.
New Feature: Compare Mode: Added --makecaptions compare to iterate through all RAM models and generate a separate SRT file for each, allowing for direct quality comparison.
Word-Level Timestamps: The --word_timestamps flag is now fully integrated, allowing for more precise subtitle chunking when available.
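
To make the silence-detection idea concrete, here is a minimal sketch of finding speech regions from per-frame loudness values. The notes above do not say how sub_gen.py actually detects speech, so the energy-threshold approach, the `find_speech_regions` name, and every parameter here are assumptions for illustration only.

```python
def find_speech_regions(levels, frame_sec, threshold, min_silence_sec=0.5):
    """Return (start, end) times in seconds where levels stay at or above threshold.

    levels: one loudness value per fixed-length audio frame.
    Regions separated by less than min_silence_sec of silence are merged.
    """
    regions = []
    start = None
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = i  # speech begins at this frame
        elif start is not None:
            regions.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:
        # Audio ended while still in speech.
        regions.append((start * frame_sec, len(levels) * frame_sec))

    # Merge regions split by very short pauses.
    merged = []
    for begin, end in regions:
        if merged and begin - merged[-1][1] < min_silence_sec:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((begin, end))
    return merged


if __name__ == "__main__":
    levels = [0, 0, 0.5, 0.6, 0, 0.7, 0, 0, 0, 0, 0.4, 0.4]
    print(find_speech_regions(levels, frame_sec=0.25, threshold=0.1))
```

Only the returned regions would then be sent for transcription, which is where the time saving on sparse audio comes from.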

`remote_microphone.py` - HLS Streaming

Refactored for Stability: Rewrote the entire script to improve reliability.
Enhanced Audio Pipeline: Increased default CHUNK size to 4096 and RATE to 48000 Hz for higher fidelity input.
FFmpeg Command Optimization: The FFmpeg command is now simpler and more robust, using a lower bitrate (96k) and flags like -avoid_negative_ts make_zero for stability.
Robust Device Selection: Now lists detailed device info (Host API, Sample Rate) and performs a test on the selected device to ensure it's working before starting the stream.
Live Audio Monitoring: Added a real-time RMS audio level meter to the console output.
Graceful Shutdown: Now uses a shutdown_event and signal_handler for cleaner exits on Ctrl+C.
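
For readers curious how an RMS level meter works, here is a minimal sketch. It assumes 16-bit little-endian mono PCM chunks, which the notes above do not confirm, and the `rms_level` and `render_meter` helpers are hypothetical names, not functions from remote_microphone.py.

```python
import math
import struct


def rms_level(chunk: bytes) -> float:
    """Root-mean-square level of 16-bit little-endian PCM, scaled to 0.0-1.0."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0


def render_meter(level: float, width: int = 30) -> str:
    """Draw a fixed-width text bar for a level between 0.0 and 1.0."""
    filled = max(0, min(width, round(level * width)))
    return "[" + "#" * filled + "-" * (width - filled) + "]"


if __name__ == "__main__":
    # A half-amplitude square wave as stand-in microphone data.
    chunk = struct.pack("<4h", 16384, -16384, 16384, -16384)
    # "\r" returns to the start of the line so a live meter redraws in place.
    print("\r" + render_meter(rms_level(chunk)), end="\n")
```

A meter stuck at the empty bar while you are speaking is a quick hint that the wrong input device is selected.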

`stream_handler.py` - Stream Processing

New Feature: Interactive Stream Selection: Implemented --selectsource flag, which calls yt-dlp -F to list all available formats and prompts the user for a selection.
New Feature: Stream Preview: When using interactive selection, a 10-second preview of the chosen stream is downloaded and converted to WAV for verification.
New Feature: Padded Audio: The --paddedaudio flag now works for streams, carrying over the last few segments from the previous batch to provide context for the current one, improving transcription continuity.
New Feature: Browser Cookie Integration: Added --cookies-from-browser flag, which uses yt-dlp's functionality to extract cookies from a specified browser into a temporary file.
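
The padded-audio carry-over can be pictured with a small sketch. The `PaddedBatcher` class and its `pad_segments` parameter are hypothetical; stream_handler.py may implement this quite differently. The sketch only shows the idea of prepending the tail of the previous batch to the current one.

```python
from collections import deque


class PaddedBatcher:
    """Prepend the last few segments of the previous batch to each new batch."""

    def __init__(self, pad_segments: int):
        # A bounded deque keeps only the most recent pad_segments items.
        self._tail = deque(maxlen=pad_segments)

    def next_batch(self, new_segments):
        batch = list(self._tail) + list(new_segments)
        self._tail.extend(new_segments)  # remember context for the next call
        return batch


if __name__ == "__main__":
    batcher = PaddedBatcher(pad_segments=2)
    print(batcher.next_batch(["s1", "s2", "s3"]))
    print(batcher.next_batch(["s4", "s5"]))
```

The trade-off is that padded segments are transcribed more than once, so the caller has to drop or reconcile the overlapping text.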

Core Modules & General Quality of Life

transcription_core.py: Refactored to use a background processing queue for microphone input. Integrated similarity detection (is_similar) and an auto-blocklist feature to prevent repetitive output.
parser_args.py: Added validation for new arguments (--isolate_vocals, --demucs_jobs, --cookies-from-browser) and improved help text.
version_checker.py: Now uses the GitHub API's /releases/latest endpoint instead of parsing a file from a specific branch, making version checks more reliable and accurate.
api_backend.py: Added the PID file watchdog system for robust server termination.
discord.py: Webhook messages are now richly formatted with markdown for better readability.
UI/UX: Consistent, styled console output across all modules for a more polished and professional user experience.
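
The PID file watchdog mentioned for api_backend.py follows a simple pattern that can be sketched as below. This is an illustration under assumptions, not the project's code: the `start_pid_watchdog` and `request_shutdown` helpers, the polling interval, and the callback are all hypothetical. Only the server.pid file name and the idea of a kill utility removing it come from the notes.

```python
import os
import threading
import time
from pathlib import Path


def start_pid_watchdog(pid_file: Path, on_missing, poll_sec: float = 0.5) -> threading.Thread:
    """Write our PID to pid_file, then call on_missing() once the file disappears."""
    pid_file.write_text(str(os.getpid()))

    def watch():
        while pid_file.exists():
            time.sleep(poll_sec)
        on_missing()

    # Daemon thread: it never keeps the process alive on its own.
    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread


def request_shutdown(pid_file: Path) -> bool:
    """What a kill utility would do: remove the PID file. True if it existed."""
    try:
        pid_file.unlink()
        return True
    except FileNotFoundError:
        return False
```

In a server, `on_missing` would be whatever stops the process cleanly. Because the signal is just a file deletion, the kill utility does not need to know how the server was started or share any state with it.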

--- Automated Notes ---

What's Changed

Refactor 1.2 by @cyberofficial in #159

Full Changelog: 1.1.0...1.2.0

This discussion was created from the release Synthalingua v1.2.0 (Now in Beta).
