output_filtering
These arguments control output formatting, captions, and filtering of unwanted content.
| Flag | Description |
|---|---|
| `--ignorelist` | Path to a blacklist file for filtering words/phrases. |
| `--auto_blocklist` | Auto-add frequently blocked phrases to the blocklist file. |
| `--debug` | Print debug output for blocked/suppressed messages. |
| `--save_transcript` | Save the transcript to a file. |
| `--save_folder` | Folder to save the transcript to (default: `out`). Used with `--save_transcript`. |
| `--makecaptions` | Enable captions mode with intelligent model progression and quality detection. Use `--makecaptions compare` to generate captions with all RAM models (11gb-v3, 11gb-v2, 7gb, 6gb, 3gb, 2gb, 1gb). Features automatic confidence scoring, repetition detection, and a "try all models" option for optimal quality. `compare` is the only valid argument. |
| `--print_srt_to_console` | Display the final generated SRT subtitle content in the console/terminal after the subtitle file is created (captions mode only). |
| `--word_timestamps` | Enable word-level timestamps in subtitle output (sub_gen only). May make subtitle generation slower because it requires more processing power; if you notice slowdowns, remove this flag next time. Has no effect in microphone or HLS/stream modes. |
| `--file_input` | Path to the input file for captioning. |
| `--file_output` | Folder to save generated captions (SRT) to. Used with `--makecaptions`. |
| `--file_output_name` | Output file name for captions (without extension, e.g. `MyCaptionsFile`). The program adds `.srt` automatically. |
| `--subtype` | Process the video with subtitles after generation. `burn` overlays subtitles permanently onto the video; `embed` adds a subtitle track to the video container. Only works with `--makecaptions` and video files. |
| `--substyle` | Customize burned subtitle appearance (only with `--subtype burn`). Format: `'font,size,color'` in any order. Font files go in the `fonts/` folder. Use `--substyle help` for detailed examples and available options. |
| `--isolate_vocals [jobs]` | Attempt to isolate vocals from the input audio before generating subtitles (sub_gen only). Requires the `demucs` package. Optionally accepts a value: `all` (use all CPU cores), a number (set parallel jobs), or nothing (default, single job). |
| `--demucs_model` | Demucs model to use for vocal isolation (default: `htdemucs`). Choices: htdemucs, htdemucs_ft, htdemucs_6s, hdemucs_mmi, mdx, mdx_extra, mdx_q, mdx_extra_q, hdemucs, demucs. Only used when `--isolate_vocals` is enabled. |
| `--silent_detect` | Skip processing silent audio chunks during caption generation (sub_gen only). Improves processing speed for files with long silent periods. Note: only works with `--makecaptions`; not supported for HLS/streaming or microphone modes. |
| `--silent_threshold` | dB threshold for silence detection (default: -35.0). Lower values (e.g., -45.0) detect quieter speech such as whispers; higher values (e.g., -25.0) only detect louder speech. Only used with `--silent_detect`. |
| `--silent_duration` | Minimum duration in seconds for audio to be classified as silence (default: 0.5). Higher values treat brief pauses as speech; lower values detect shorter silent periods. Only used with `--silent_detect`. |
| `--batchmode` | Number of speech regions to process in parallel for faster transcription (default: 1). Only works with `--makecaptions` mode. |
When enabled, this flag prints the final, fully combined SRT subtitles to the console after the SRT file is created (captions mode). This is useful for quickly viewing the generated subtitles without opening the SRT file manually. It only prints the final combined SRT (not per-segment SRTs) and works with `--makecaptions`.
Example:
```
python synthalingua.py --makecaptions --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile" --print_srt_to_console
```
This saves the SRT file as usual and also prints its contents to the console at the end of processing.
When enabled, the program will attempt to extract vocals from the input audio file before generating subtitles. This can improve subtitle accuracy for music or noisy audio, but may take additional time and requires the `demucs` package. If `demucs` is not installed, a warning will be shown.
VRAM Warning: If you use `--isolate_vocals` with large models or many parallel jobs on a GPU (CUDA), substantial VRAM (GPU memory) is required. For best results, use a GPU with at least 12GB of VRAM; longer files and advanced models may require even more. If VRAM is insufficient, you may encounter crashes or out-of-memory errors. For CPU-only processing, system RAM is the main requirement. Most systems will fall back to shared memory, which may slow processing down.
Parallel Processing (NEW):
- You can now specify an optional value for `--isolate_vocals` to control the number of parallel jobs Demucs uses:
  - `--isolate_vocals all`: use all available CPU cores for maximum speed
  - `--isolate_vocals N`: use N parallel jobs (where N is a number, up to your CPU core count)
  - `--isolate_vocals` (no value): use the default (single job, no parallelism)
- If you specify a number greater than your CPU core count, it will be capped automatically.
- This can greatly speed up vocal isolation on multi-core systems.
Examples:
```
# Use all CPU cores for Demucs (fastest, recommended for powerful systems)
python synthalingua.py --makecaptions --isolate_vocals all --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"

# Use 4 parallel jobs for Demucs
python synthalingua.py --makecaptions --isolate_vocals 4 --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"

# Use default (single job)
python synthalingua.py --makecaptions --isolate_vocals --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
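Under the hood, the jobs value is passed along to Demucs. As a point of reference, the sketch below shows a rough standalone equivalent of the separation step using Demucs's own command line; the paths, model, and job count are placeholders, and Synthalingua's internal invocation may differ:
```python
import subprocess

# Rough standalone equivalent of vocal isolation with 4 parallel jobs.
# Synthalingua drives Demucs internally; this only illustrates which knobs
# the jobs value and --demucs_model ultimately control.
subprocess.run(
    [
        "demucs",
        "--two-stems", "vocals",   # separate vocals from everything else
        "-n", "htdemucs",          # model name (see --demucs_model)
        "-j", "4",                 # number of parallel jobs
        "-o", "separated",         # output folder (placeholder)
        "input_audio.wav",         # input file (placeholder)
    ],
    check=True,
)
```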
Model Selection:
- By default, the program will prompt you to select which Demucs model to use.
- Use `--demucs_model` to specify the model directly and skip the interactive prompt.
- Available models: htdemucs (default), htdemucs_ft, htdemucs_6s, hdemucs_mmi, mdx, mdx_extra, mdx_q, mdx_extra_q, hdemucs, demucs
Note: This flag only affects subtitle generation (sub_gen/captions mode). It has no effect in microphone or HLS/stream modes.
Specifies which Demucs model to use for vocal isolation. Only used when `--isolate_vocals` is enabled. If not specified, the program will prompt you to select a model interactively.
Available models:
- `htdemucs` (default): Latest Hybrid Transformer model
- `htdemucs_ft`: Fine-tuned version for better quality (slower)
- `htdemucs_6s`: 6-source separation (includes piano/guitar)
- `hdemucs_mmi`: Hybrid v3 trained on an expanded dataset
- `mdx`: Frequency-domain model, MDX winner
- `mdx_extra`: Enhanced MDX with extra training data
- `mdx_q`: Quantized MDX (faster, smaller)
- `mdx_extra_q`: Quantized MDX Extra (faster, smaller)
- `hdemucs`: Original Hybrid Demucs v3
- `demucs`: Original time-domain Demucs
Warning: The longer your video or audio file, the more RAM is required for processing, especially when using advanced models. For example, the `htdemucs_ft` model may require up to 24GB of RAM (not to be confused with VRAM) to process a 1-hour video. If you encounter memory errors or segmentation faults, try using a shorter file, a less demanding model, processing your media in smaller segments, or increasing your system's page file (virtual memory) size to help prevent crashes.
When enabled, the program will intelligently skip silent regions in audio files during caption generation. This uses advanced audio analysis to detect speech vs. silence boundaries, resulting in faster processing and better transcription quality by avoiding unnecessary processing of silent segments.
Benefits:
- Faster processing for files with long silent periods
- Reduced resource usage
- Better transcription quality (no processing of noise/silence)
- Natural speech boundaries (no mid-word cuts)
Best used with `--isolate_vocals` for maximum efficiency and quality.
Note: This flag only works with `--makecaptions` (caption generation mode). It is not supported for HLS/streaming or microphone input modes.
Controls the dB threshold used for silence detection. This allows fine-tuning the sensitivity of `--silent_detect` for different types of audio content.
Default: -35.0 dB (suitable for normal speech levels)
Common adjustments:
- Quiet speech/whispers: Use -45.0 dB or lower for more sensitive detection
- Noisy environments: Use -30.0 dB or higher to avoid false speech detection
- Loud speech only: Use -25.0 dB or higher for less sensitive detection
Examples:
```
# Default threshold
--silent_detect

# More sensitive (detects quieter speech)
--silent_detect --silent_threshold -45.0

# Less sensitive (only loud speech)
--silent_detect --silent_threshold -25.0
```
Controls the minimum duration for a region to be considered silence versus a brief pause in speech. This helps distinguish between natural speaking pauses and actual silent periods.
Default: 0.5 seconds (brief pauses are treated as part of speech)
Common adjustments:
- Ignore brief pauses: Use 2.0s or higher to only consider longer gaps as silence
- Conversational speech: Use 1.0-1.5s for natural conversation with pauses
- Rapid speech: Use 0.1-0.3s to detect even brief silent moments
- Podcast intros/outros: Use 3.0s+ to skip only major silent sections
Examples:
```
# Default duration (0.5s minimum)
--silent_detect

# Only consider 2+ second gaps as silence (ignore brief pauses)
--silent_detect --silent_duration 2.0

# Very sensitive to short gaps
--silent_detect --silent_duration 0.1

# Combined with custom threshold
--silent_detect --silent_threshold -40.0 --silent_duration 1.5
```
Note: This argument only has an effect when used with `--silent_detect`.
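As a rough illustration of how the threshold and duration interact, the simplified sketch below (assuming audio samples normalized to [-1, 1]; this is not Synthalingua's actual implementation) treats a region as silence only when its level stays below `--silent_threshold` for at least `--silent_duration` seconds:
```python
import numpy as np

def find_silent_regions(samples, sample_rate, threshold_db=-35.0,
                        min_duration=0.5, frame_ms=50):
    """Return (start_sec, end_sec) spans whose level stays below threshold_db
    for at least min_duration seconds. Illustrative sketch only."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    spans, span_start = [], None
    for i in range(0, len(samples), frame_len):
        frame = np.asarray(samples[i:i + frame_len], dtype=np.float64)
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12     # avoid log10(0)
        level_db = 20 * np.log10(rms)                  # dBFS for samples in [-1, 1]
        t = i / sample_rate
        if level_db < threshold_db:
            if span_start is None:
                span_start = t                         # silence begins
        else:
            if span_start is not None and t - span_start >= min_duration:
                spans.append((span_start, t))          # long enough to count as silence
            span_start = None                          # shorter spans are treated as speech pauses
    end = len(samples) / sample_rate
    if span_start is not None and end - span_start >= min_duration:
        spans.append((span_start, end))
    return spans
```
Lowering `threshold_db` (e.g., to -45.0) classifies quieter audio as speech, while raising `min_duration` makes brief pauses count as part of the surrounding speech, mirroring the behavior of the two flags described above.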
The captions mode now features advanced quality detection and intelligent model progression with multiple user-friendly options:
Quality Detection Features:
- Confidence Scoring: Automatically calculates confidence scores for each transcription region (90%+ = excellent, 75-89% = good, <75% = needs improvement)
- Repetition Detection: Detects two types of model hallucinations (a sketch follows this list):
  - Consecutive repetitions: the same text repeated across multiple segments (e.g., "Hello" → "Hello" → "Hello")
  - Internal repetitions: repeated phrases within a single segment (e.g., "the one who is the one who is the one who is...")
- Turbo Model Handling: Special handling for the 7GB Turbo model, with translation compatibility warnings
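A minimal sketch of the two repetition checks (hypothetical helpers with illustrative thresholds, not Synthalingua's actual detector):
```python
from collections import Counter

def consecutive_repetitions(segment_texts, min_repeats=3):
    """Count runs where the same (normalized) text appears in consecutive segments."""
    runs, run_len = 0, 1
    for prev, cur in zip(segment_texts, segment_texts[1:]):
        if cur.strip().lower() == prev.strip().lower():
            run_len += 1
        else:
            runs += run_len >= min_repeats
            run_len = 1
    return runs + (run_len >= min_repeats)

def internal_repetitions(text, ngram=3, min_repeats=3):
    """Count n-grams that repeat suspiciously often inside a single segment."""
    words = text.lower().split()
    grams = Counter(tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1))
    return sum(1 for count in grams.values() if count >= min_repeats)
```
A non-zero count from either helper would mark the region as a likely hallucination and trigger the model-upgrade prompts described below.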
Smart Model Testing:
- Higher Models Only: Auto mode intelligently tests only higher models in the hierarchy (never lower/weaker models)
- Efficiency Optimization: If already using the highest available model, auto mode skips testing entirely
- Clear Communication: Shows exactly which higher models will be tested (e.g., "Available higher models to test: 6gb, 7gb, 11gb-v2, 11gb-v3")
- Original vs Tested: Clearly distinguishes the original model results from newly tested model results in performance summaries
User-Friendly Model Progression Options:
- Try Next Model Only: Test just the next higher model (traditional approach)
- Try All Remaining Models: Automatically test all higher models and show comprehensive comparison
- Skip Model Upgrades: Use current results and continue
Enhanced User Experience:
- Shows current transcription before asking for upgrades (no more "Current transcription above" with nothing shown)
- Displays exactly which models will be tested (e.g., "Try all remaining models (6gb, 7gb, 11gb-v2, 11gb-v3)")
- Comprehensive comparison screens showing all attempts with confidence scores and repetition indicators
- Intelligent auto-continue options to save time on multiple regions
Example Workflow:
```
🤔 Low confidence (83.9%) for region 3
Current model: 2gb | Available upgrade: 3gb
Region: 15.9s - 18.8s (2.9s)
📝 Current transcription (2gb model):
📝 15.9s-18.8s: "I'm not sure if I can eat it all"
Model upgrade options:
1. Try 3gb model only
2. Try all remaining models (3gb, 6gb, 7gb, 11gb-v2, 11gb-v3) and compare
n. Skip model upgrades for this region
Enter your choice (1/2/n):
```
After trying multiple models, users get a comprehensive comparison with clear original vs tested model indicators:
```
🤔 Which transcription do you prefer?
A. Use Version 1 (2gb model - 83.9% confidence) [original]
B. Use Version 2 (3gb model - 91.6% confidence)
C. Use Version 3 (6gb model - 94.2% confidence)
D. Use Version 4 (7gb model - 95.1% confidence)
E. Use Version 5 (11gb-v2 model - 96.8% confidence) [3 internal repetitions]
F. Continue trying higher models (one by one)
G. Try all remaining models (11gb-v3) and compare
Enter your choice (A/B/C/D/E/F/G):
```
Auto Mode Example (when using automatic model testing):
```
🤖 Auto mode: Low confidence (88.3%) or repetitions detected for region 18
Issues: Low confidence (88.3%)
Region: 253.7s - 255.6s (2.0s)
🚀 Automatically trying all available models to find best result...
Available higher models to test: 6gb, 7gb, 11gb-v2, 11gb-v3
⚠️ Skipping 7gb (Turbo model - does not support translation to English)
🔄 Testing 6gb model...
Confidence: 91.9% ← New best!
🔄 Testing 11gb-v2 model...
Confidence: 91.5%
🔄 Testing 11gb-v3 model...
Confidence: 86.1%
📊 Model Performance Summary:
🟡 3gb (original): 88.3%
🟢 6gb: 91.9% ← SELECTED
🟢 11gb-v2: 91.5%
🟡 11gb-v3: 86.1%
🎯 Auto mode results:
Best model: 6gb
Best confidence: 91.9%
ℹ️ Selection prioritizes: 1) No repetitions, 2) High confidence, 3) Lower repetition counts
✅ Excellent results achieved!
```
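The selection priority shown in the last line can be expressed as a simple sort key; a minimal sketch, assuming each result exposes a confidence percentage and a repetition count (the field names are hypothetical):
```python
def pick_best(results):
    """results: objects with .model, .confidence (percent), and .repetitions (count).
    Prefer no repetitions, then higher confidence, then fewer repetitions."""
    return min(
        results,
        key=lambda r: (r.repetitions > 0, -r.confidence, r.repetitions),
    )
```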
When enabled, subtitles will include word-level timestamps for more precise alignment. This may make subtitle generation a bit slower as it requires more processing power. If you notice any unusual slowdowns, try removing the `--word_timestamps` flag next time you run this command.
Note: This flag only affects subtitle generation (sub_gen/captions mode). It has no effect in microphone or HLS/stream modes, and a warning will be shown if it is used there.
After generating SRT subtitle files with `--makecaptions`, you can automatically process the video file to add subtitles using FFmpeg. This parameter controls how subtitles are integrated with the video.
Available options:
- `burn`: Permanently overlay subtitles onto the video (hardcoded subtitles). The text becomes part of the video image and cannot be turned off. Best for sharing or when subtitle support is uncertain.
- `embed`: Add subtitles as a separate track in the video container (soft subtitles). Viewers can toggle subtitles on/off. Requires a media player with subtitle support.
Requirements:
- Only works with `--makecaptions` mode
- Requires video input files (not audio-only)
- FFmpeg must be installed and available in PATH
- Input file must be a video format (MP4, AVI, MKV, etc.)
Examples:
```
# Burn subtitles permanently into video
python synthalingua.py --makecaptions --subtype burn --file_input video.mp4 --file_output output/ --file_output_name result

# Embed subtitles as toggleable track
python synthalingua.py --makecaptions --subtype embed --file_input video.mp4 --file_output output/ --file_output_name result
```
Output files:
- Original SRT file: `result.srt`
- Processed video: `result_subtitled.mp4` (burn) or `result_embedded.mp4` (embed)
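For context, burning and embedding correspond to two different FFmpeg operations. The sketch below shows roughly equivalent standalone invocations via Python's subprocess; the exact commands Synthalingua builds may differ, and the file names are placeholders:
```python
import subprocess

srt, video = "result.srt", "video.mp4"   # placeholder paths

# burn: re-encode the video with the subtitles rendered into the picture
subprocess.run(
    ["ffmpeg", "-i", video, "-vf", f"subtitles={srt}", "-c:a", "copy",
     "result_subtitled.mp4"],
    check=True,
)

# embed: copy the streams and add the SRT as a soft subtitle track (mov_text for MP4)
subprocess.run(
    ["ffmpeg", "-i", video, "-i", srt, "-c", "copy", "-c:s", "mov_text",
     "result_embedded.mp4"],
    check=True,
)
```
Burning requires re-encoding the video stream (slower), while embedding is essentially a fast stream copy plus a converted subtitle track.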
Customize the visual appearance of burned subtitles when using `--subtype burn`. This allows you to control font, size, and color to match your preferences or branding.
Format: `"font,size,color"` (comma-separated, any order)
Parameters:
- Font: filename of a font file in the `fonts/` directory (e.g., `FiraSans-Bold.otf`)
- Size: font size in points (e.g., `24`; typical range: 12-72)
- Color: color name (e.g., `yellow`, `red`, `cyan`)
Supported font formats: `.ttf`, `.otf`, `.woff`, `.woff2`
Available colors: `white`, `black`, `red`, `green`, `blue`, `yellow`, `cyan`, `magenta`, `orange`
Flexible usage:
- All parameters are optional (defaults will be used)
- Parameters can be in any order
- Mix and match as needed
Font setup:
- Create a `fonts/` folder in your Synthalingua directory
- Place your font files (`.ttf`, `.otf`, etc.) in that folder
- Reference the exact filename in `--substyle`
Examples:
```
# Custom font, size, and color
--substyle "FiraSans-Bold.otf,24,yellow"

# Size and color only (system default font)
--substyle "20,red"

# Font and size (default color)
--substyle "FiraSans-UltraLightItalic.otf,18"

# Color only (default font and size)
--substyle "cyan"

# Parameters in different order
--substyle "24,FiraSans-Bold.otf,green"
```
Get help and see available fonts:
```
python synthalingua.py --substyle help
```
This displays:
- Complete usage instructions
- All available fonts in your `fonts/` directory
- Color options
- Practical copy-paste examples
- Technical notes and requirements
Features:
- Automatic fallback: If font not found, uses system default with warning
- Smart font detection: Dynamically shows available fonts in help
- Black outline: Automatically added for better text readability
- Cross-platform: Works on Windows, Linux, and macOS
Technical notes:
- Only works with `--subtype burn` (not with `embed`)
- Font files must be placed in the `fonts/` directory relative to the script location
- Uses FFmpeg's subtitle filter with proper font directory handling
- Fallback behavior ensures video processing continues even with font issues
Complete styling example:
```
python synthalingua.py --makecaptions --subtype burn --substyle "FiraSans-Bold.otf,26,orange" --file_input video.mp4 --file_output output/ --file_output_name styled_video
```
`--ignorelist`: Load a blacklist file (one word/phrase per line) to filter unwanted content from all outputs.
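A minimal sketch of what the blocklist file and filtering step could look like (the file contents and helper below are illustrative, not Synthalingua's actual code):
```python
from pathlib import Path

# Example ignore.txt contents (one word or phrase per line):
#   thanks for watching
#   please subscribe
#   [music]

def load_blocklist(path):
    return {line.strip().lower()
            for line in Path(path).read_text(encoding="utf-8").splitlines()
            if line.strip()}

def is_blocked(text, blocklist):
    """True if the transcription contains any blocked word/phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in blocklist)

blocklist = load_blocklist("ignore.txt")                        # path is a placeholder
print(is_blocked("Thanks for watching everyone!", blocklist))   # True
```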
`--auto_blocklist`: When enabled (together with `--ignorelist`), phrases blocked 3+ times within the last 10 are automatically added to your blocklist file.
`--debug`: Print debug information about blocked or suppressed messages.
`--save_transcript` / `--save_folder`: Save transcriptions to a file in the specified folder (always use both flags together):
```
python synthalingua.py --save_transcript --save_folder "C:/transcripts"
```
Basic caption generation with intelligent model progression:
```
python synthalingua.py --makecaptions --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
What happens: The system automatically detects low-confidence regions and offers to try higher models. You can test models one by one or use the "try all models" option for a comprehensive comparison.
Basic burned subtitles (permanent overlay):
```
python synthalingua.py --makecaptions --subtype burn --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyVideo"
```
Creates: `MyVideo.srt` + `MyVideo_subtitled.mp4` (with burned subtitles)
Embedded subtitles (toggleable track):
```
python synthalingua.py --makecaptions --subtype embed --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyVideo"
```
Creates: `MyVideo.srt` + `MyVideo_embedded.mp4` (with subtitle track)
Custom styled burned subtitles:
```
# Large yellow text with custom font
python synthalingua.py --makecaptions --subtype burn --substyle "FiraSans-Bold.otf,28,yellow" --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyVideo"

# System font with custom size and color
python synthalingua.py --makecaptions --subtype burn --substyle "20,cyan" --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyVideo"

# Red text only (default font and size)
python synthalingua.py --makecaptions --subtype burn --substyle "red" --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyVideo"
```
Complete workflow with vocal isolation and custom styling:
```
python synthalingua.py --makecaptions --isolate_vocals --silent_detect --subtype burn --substyle "FiraSans-UltraLightItalic.otf,22,orange" --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="ProfessionalVideo"
```
This creates the highest quality subtitles with:
- Vocal isolation (removes background music/noise)
- Silence detection (faster processing)
- Custom styled burned subtitles (italic orange font)
Get styling help and see available fonts:
```
python synthalingua.py --substyle help
```
For maximum efficiency and quality, combine vocal isolation with silence detection:
```
python synthalingua.py --makecaptions --isolate_vocals --silent_detect --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
With a specific Demucs model (skip the interactive prompt):
```
python synthalingua.py --makecaptions --isolate_vocals --demucs_model htdemucs_ft --silent_detect --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
For fastest processing (quantized model):
```
python synthalingua.py --makecaptions --isolate_vocals --demucs_model mdx_q --silent_detect --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
For quiet speech or whispered content:
```
python synthalingua.py --makecaptions --isolate_vocals --silent_detect --silent_threshold -45.0 --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
For content with brief speaking pauses (ignore pauses under 2 seconds):
```
python synthalingua.py --makecaptions --isolate_vocals --silent_detect --silent_duration 2.0 --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
For precise control over both threshold and duration:
```
python synthalingua.py --makecaptions --isolate_vocals --silent_detect --silent_threshold -40.0 --silent_duration 1.5 --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
This combination:
- Extracts clean vocals (removes background music/noise)
- Skips silent regions in the cleaned audio
- Allows an adjustable threshold for different speech volumes
- Results in faster processing and higher accuracy
Generate captions with all available RAM models for quality comparison (automated batch processing):
```
python synthalingua.py --makecaptions compare --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
What happens: Captions are generated automatically with every model, without user intervention, creating separate files for comparison.
With advanced features for optimal quality and efficiency:
```
python synthalingua.py --makecaptions compare --isolate_vocals --silent_detect --file_input="C:/path/video.mp4" --file_output="C:/output" --file_output_name="MyCaptionsFile"
```
This will create files like:
- `MyCaptionsFile.11gb-v3.srt` (highest quality)
- `MyCaptionsFile.11gb-v2.srt`
- `MyCaptionsFile.7gb.srt` (Turbo model)
- `MyCaptionsFile.6gb.srt`
- `MyCaptionsFile.3gb.srt`
- `MyCaptionsFile.2gb.srt`
- `MyCaptionsFile.1gb.srt` (fastest)
Compare Mode vs. Interactive Mode:
- Compare Mode: Batch processes with all models automatically, creates multiple SRT files
- Interactive Mode: Smart progression with user choices, creates single optimized SRT file
For Best Results (Interactive Mode):
- Start with a lower RAM model (2gb or 3gb)
- When prompted with low confidence, choose "Try all remaining models"
- Review the comprehensive comparison and select the best version
- The system will remember your preferences for subsequent regions
For Auto Mode Efficiency:
- Auto mode only tests higher models (never wastes time on lower/weaker models)
- If already using the highest model (11gb-v3), auto mode skips testing entirely
- Clear performance summaries show original vs tested model results
- Intelligent selection prioritizes: 1) No repetitions, 2) High confidence, 3) Lower repetition counts
For Efficiency:
- Use `--silent_detect` to skip processing silent regions
- Combine with `--isolate_vocals` for cleaner audio input
- Use the "try all models" option instead of testing one by one
For Different Content Types:
- Music/Noisy Audio: Always use `--isolate_vocals`
- Quiet Speech: Use `--silent_threshold -45.0`
- Fast-Paced Content: Use `--silent_duration 0.1`
- Long Files: Use `--silent_detect` for significant time savings