Synthalingua v1.2.0 (Now in Beta) #162
cyberofficial announced in Announcements
This update is a massive overhaul of Synthalingua's core architecture, bringing significant new features, major performance and stability improvements, and a vastly improved user experience. My primary goal was to refactor the codebase for better maintainability and to introduce powerful new capabilities like vocal isolation and intelligent subtitle generation.
Portable + GUI is at itch.io! Pick it up there.

TL;DR Version
Welcome to the biggest Synthalingua update yet! I've packed this release with powerful new features and quality-of-life improvements that make transcribing and translating audio easier, faster, and more accurate than ever before.
Awesome New Toys
- The --isolate_vocals feature uses advanced AI (Demucs) to strip out music and background noise, leaving you with just the voice for a crystal-clear transcription.
- Subtitle generation (--makecaptions) is now a genius. It automatically detects and skips silent parts of your audio, saving you a ton of processing time. If it's not confident about a transcription, it will even automatically try a bigger, smarter model to get it right!
- The --selectsource command shows you all available audio qualities for a stream and even lets you listen to a 10-second preview to make sure you're grabbing the live feed.
- Use --makecaptions compare to generate subtitles with every single model at once. Just compare the files and pick your favorite!

Making Life Easier
- Use cookies straight from your browser (e.g. --cookies-from-browser chrome) without having to find and export the file yourself.
- The --paddedaudio feature gives the AI more context, resulting in smoother and more consistent live transcriptions from your mic or a stream.
- With remote_microphone.py, you now get a live audio meter right in your console, so you can see if your mic is picking up audio correctly.

Squashing Annoying Bugs
- The server no longer hangs on exit and can be shut down safely (kill_server.py).

The Technical Deep Dive (For Nerds)
This release focused on a major architectural refactor to improve modularity and stability and to enable a new suite of advanced features.
Architectural Changes
- sub_gen now spawns a separate transcribe_worker.py process for each transcription task. This guarantees that all model-related resources (especially VRAM) are released upon completion, enabling features like compare mode and improving stability during batch processing.
- TranscriptionCore now uses a threaded background queue to process microphone audio chunks. This decouples audio capture from processing, leading to a more responsive system that can better handle high-frequency input and prevent audio data loss.
- The server (api_backend.py) now creates a server.pid file on startup. A new kill_server.py utility can remove this file, which is monitored by a watchdog thread, triggering an immediate and safe shutdown. This resolves long-standing issues with the server process hanging on exit.
- TempFileManager in sub_gen.py manages all temporary files and directories, ensuring they are created in a central temp/ subfolder and reliably cleaned up on exit (unless --keep_temp is used).

Module-Specific Updates
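The PID-file shutdown described under Architectural Changes (create server.pid on startup, let a watchdog thread notice when kill_server.py removes it) can be sketched roughly as follows. This is a minimal illustration and not Synthalingua's actual code: the function names start_pid_watchdog and request_shutdown, the polling interval, and the use of threading.Event are all assumptions made for the example.

```python
import os
import tempfile
import threading
import time
from pathlib import Path


def start_pid_watchdog(pid_file: Path, shutdown_event: threading.Event,
                       poll_seconds: float = 0.1) -> threading.Thread:
    """Write this process's PID to pid_file, then watch for the file to vanish.

    Hypothetical helper: the real api_backend.py may be structured differently.
    """
    pid_file.write_text(str(os.getpid()))

    def watch() -> None:
        # Poll until the file is removed or shutdown was requested elsewhere.
        while not shutdown_event.is_set():
            if not pid_file.exists():
                shutdown_event.set()  # tell the server loop to exit
                return
            time.sleep(poll_seconds)

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread


def request_shutdown(pid_file: Path) -> bool:
    """What a kill utility could do: remove the PID file if it exists."""
    try:
        pid_file.unlink()
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pid_path = Path(tmp) / "server.pid"
        stop = threading.Event()
        watcher = start_pid_watchdog(pid_path, stop)
        request_shutdown(pid_path)
        stop.wait(timeout=2.0)
        watcher.join(timeout=2.0)
        print("shutdown requested:", stop.is_set())
```

Using a file as the signal means the kill utility needs no sockets, signals, or process handles, which may be why it behaves consistently across platforms. The server's main loop would check the event and exit cleanly once it is set.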
set_up_env.py - Environment Setup Overhaul
- downloaded_assets/ directory for reuse.
- --using_vocal_isolation flag.
- data_whisper conda environment with Python 3.12.
- demucs.
- ffmpeg_path.bat is now intelligently generated to include conda activation scripts if vocal isolation is set up.

sub_gen.py & transcribe_worker.py - Subtitle Generation
- --isolate_vocals flag which runs the demucs CLI on the input file. Supports model selection (--demucs_model) and multi-threading (--isolate_vocals [jobs]).
- --silent_detect to analyze the audio file, identify speech regions, and process them in batches, skipping silence. This dramatically reduces processing time on sparse audio tracks.
- --makecaptions compare to iterate through all RAM models and generate a separate SRT file for each, allowing for direct quality comparison.
- --word_timestamps flag is now fully integrated, allowing for more precise subtitle chunking when available.

remote_microphone.py - HLS Streaming
- CHUNK size to 4096 and RATE to 48000 Hz for higher fidelity input.
- Bitrate setting (96k) and flags like -avoid_negative_ts make_zero for stability.
- shutdown_event and signal_handler for cleaner exits on Ctrl+C.

stream_handler.py - Stream Processing
- --selectsource flag, which calls yt-dlp -F to list all available formats and prompts the user for a selection.
- --paddedaudio flag now works for streams, carrying over the last few segments from the previous batch to provide context for the current one, improving transcription continuity.
- --cookies-from-browser flag, which uses yt-dlp's functionality to extract cookies from a specified browser into a temporary file.

Core Modules & General Quality of Life
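The --paddedaudio carry-over described for stream_handler.py (reusing the last few segments of the previous batch as context for the current one) can be illustrated with a small sketch. The class name PaddedBatcher, the pad_segments parameter, and the use of raw bytes for segments are assumptions for illustration only; the actual stream handler may track segments differently.

```python
from collections import deque
from typing import Deque, Iterable, List


class PaddedBatcher:
    """Prepends the tail of the previous batch to each new batch.

    Hypothetical helper, not taken from the Synthalingua source.
    """

    def __init__(self, pad_segments: int) -> None:
        if pad_segments < 0:
            raise ValueError("pad_segments must be non-negative")
        # A bounded deque keeps only the most recent pad_segments items.
        self._tail: Deque[bytes] = deque(maxlen=pad_segments)

    def next_batch(self, segments: Iterable[bytes]) -> List[bytes]:
        new_segments = list(segments)
        # Context from the previous batch goes first, then the new audio.
        padded = list(self._tail) + new_segments
        # Remember the newest segments for the following batch.
        self._tail.extend(new_segments)
        return padded


if __name__ == "__main__":
    batcher = PaddedBatcher(pad_segments=2)
    print(batcher.next_batch([b"a", b"b", b"c"]))
    print(batcher.next_batch([b"d", b"e"]))
```

With pad_segments=2, the second call returns the last two segments of the first batch followed by the new ones. A consumer of these batches would still need to discard or de-duplicate the text produced from the padded portion so it is not emitted twice.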
- transcription_core.py: Refactored to use a background processing queue for microphone input. Integrated similarity detection (is_similar) and an auto-blocklist feature to prevent repetitive output.
- parser_args.py: Added validation for new arguments (--isolate_vocals, --demucs_jobs, --cookies-from-browser) and improved help text.
- version_checker.py: Now uses the GitHub API's /releases/latest endpoint instead of parsing a file from a specific branch, making version checks more reliable and accurate.
- api_backend.py: Added the PID file watchdog system for robust server termination.
- discord.py: Webhook messages are now richly formatted with markdown for better readability.

--- Automated Notes ---
What's Changed
Full Changelog: 1.1.0...1.2.0
This discussion was created from the release Synthalingua v1.2.0 (Now in Beta).