For contributor guidelines see AGENTS.md.
A powerful desktop application that converts video files into accurate text transcripts using OpenAI's Whisper AI model. Features a modern GUI built with PyQt6, batch processing capabilities, and advanced text post-processing with filler word removal.
- **Multi-Format Support**: Process MP4, AVI, MKV, and MOV video files
- **GPU Acceleration**: Automatic CUDA detection for 10-20x faster processing
- **Batch Processing**: Queue multiple videos for automated transcription
- **Advanced Text Processing**:
  - Automatic filler word removal ("um", "uh", "like", "you know")
  - Smart punctuation and capitalization
  - Paragraph formatting for readability
- **Flexible Model Selection**: Choose from tiny, base, small, medium, or large Whisper models
- **Custom Model Loading**: Load pre-downloaded models to work offline
- **Real-time Progress**: Track processing with time estimates and progress bars
- **Pause/Resume**: Control processing without losing progress
- **Modern UI**: Clean, intuitive interface with drag-and-drop support
For first-time setup:
```bash
# 1. Clone the repository
git clone https://github.com/yourusername/video-transcriber.git
cd video-transcriber

# 2. Create virtual environment
python -m venv venv

# 3. Activate and install dependencies (Windows; on macOS/Linux use `source venv/bin/activate`)
venv\Scripts\activate
pip install -r requirements.txt

# 4. Install PyTorch (choose one based on your setup)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118  # For CUDA 11.8
# OR
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121  # For CUDA 12.1
# OR
pip install torch torchvision torchaudio  # For CPU only
```

To run the app (after setup):

```bash
run_app.bat
```

That's it! The batch file handles activation and running automatically.
- Download the latest `VideoTranscriber.exe` from the Releases page
- Download Whisper model files (see Model Setup below)
- Run `VideoTranscriber.exe`
- Python 3.11 or higher
- NVIDIA GPU (optional, for faster processing)
- CUDA 11.8 or 12.1 (if using GPU)
```bash
git clone https://github.com/yourusername/video-transcriber.git
cd video-transcriber
```

```bash
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate
```

```bash
pip install -r requirements.txt
```

```bash
# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For CPU only
pip install torch torchvision torchaudio
```

Option A: Use the Batch File (Windows - Recommended)

```bash
run_app.bat
```

The batch file will automatically:
- Check if virtual environment exists
- Activate the virtual environment
- Launch the application
- Display helpful error messages if something goes wrong
Option B: Manual Run
```bash
# Windows
venv\Scripts\activate
python run.py

# macOS/Linux
source venv/bin/activate
python run.py
```

The app uses OpenAI's Whisper models for transcription. Each model size offers different trade-offs:
| Model | Parameters | Speed | Quality | Download Size |
|---|---|---|---|---|
| tiny | 39M | Very Fast | Basic | ~39 MB |
| base | 74M | Fast | Good | ~74 MB |
| small | 244M | Moderate | Better | ~244 MB |
| medium | 769M | Slow | Very Good | ~769 MB |
| large | 1550M | Very Slow | Best | ~1.5 GB |
On first use, the app will automatically download the selected model from OpenAI (requires internet connection).
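The app builds on the openai-whisper package, which handles this download-and-cache step. A minimal sketch of what that looks like in plain Python (the file name is a placeholder, and the app's actual wiring differs):

```python
import whisper  # pip install openai-whisper

# Downloaded and cached on first use (~/.cache/whisper by default).
model = whisper.load_model("base")

# Transcribe a media file; Whisper uses ffmpeg to decode the audio.
result = model.transcribe("example_audio.wav")
print(result["text"])
```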
1. Download Model Files
   - Download `.pt` files from OpenAI Whisper
   - Or from Hugging Face

2. Place Models in a Folder

   ```
   C:\WhisperModels\
   ├── tiny.pt
   ├── base.pt
   ├── small.pt
   ├── medium.pt
   └── large-v3.pt
   ```

3. Load in Application
   - Click "Load Model Folder" button
   - Navigate to your models folder
   - Select the folder containing `.pt` files
   - The app will remember this location (a sketch of this folder scan follows the list)
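Folder-based loading can work along these lines, since `whisper.load_model` also accepts a path to a `.pt` checkpoint. A minimal sketch (the helper name and folder are hypothetical, not the app's actual code):

```python
from pathlib import Path

import whisper

def load_from_folder(folder: str, size: str = "base"):
    """Pick the first .pt checkpoint whose name contains the requested size."""
    for ckpt in sorted(Path(folder).glob("*.pt")):
        if size in ckpt.stem:  # e.g. "base" matches base.pt
            return whisper.load_model(str(ckpt))
    raise FileNotFoundError(f"no '{size}' model found in {folder}")

model = load_from_folder(r"C:\WhisperModels", "base")  # hypothetical folder
```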
1. Start the Application
   - Run `VideoTranscriber.exe` (if using the pre-built executable)
   - Or run `run_app.bat` (Windows; automatically activates the venv)
   - Or run `python run.py` (after manually activating the virtual environment)

2. Configure Settings
   - Select output directory for transcripts
   - Choose Whisper model size (larger = better quality, slower)
   - (Optional) Load custom model folder

3. Add Videos
   - Click "Add Files" to select videos
   - Or "Add Directory" to process entire folders
   - Or drag and drop files directly

4. Process Videos
   - Click "Start Processing"
   - Monitor progress in real-time
   - Pause/resume as needed

5. Get Results
   - Transcripts saved as `.txt` files
   - Same filename as the video, with a `.txt` extension (the naming rule is sketched below)
   - Located in your selected output directory
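The output naming rule is easy to reproduce; a short sketch with example paths (not the app's actual code):

```python
from pathlib import Path

video = Path("C:/Videos/lecture01.mp4")  # example input
output_dir = Path("C:/Transcripts")      # selected output directory

transcript = output_dir / video.with_suffix(".txt").name
print(transcript)  # C:/Transcripts/lecture01.txt
```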
The app automatically detects and uses NVIDIA GPUs. Check status in console output:
- `Model loaded successfully on cuda` = GPU active
- `Model loaded successfully on cpu` = CPU only
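You can confirm what PyTorch will report before launching the app. The device-selection line below mirrors the console message, though the app's exact code may differ:

```python
import torch

# Prefer CUDA when available, as the app does.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Model loaded successfully on {device}")
if device == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))
```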
- Queue processes videos in order (FIFO; see the sketch after this list)
- Each video's transcript is saved immediately upon completion
- Failed videos don't stop the queue
- Time estimates improve as more videos are processed
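A minimal sketch of that queue behavior (`transcribe` stands in for the real extraction-plus-Whisper pipeline; this is not the app's actual code):

```python
from collections import deque

def process_queue(videos, transcribe):
    """FIFO queue: save each transcript immediately; failures don't stop the run."""
    queue = deque(videos)  # first in, first out
    while queue:
        video = queue.popleft()
        try:
            text = transcribe(video)  # placeholder for the real pipeline
            out_path = video.rsplit(".", 1)[0] + ".txt"
            with open(out_path, "w", encoding="utf-8") as f:
                f.write(text)  # saved immediately upon completion
        except Exception as err:  # a failed video doesn't stop the queue
            print(f"Skipping {video}: {err}")
```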
The app automatically:
- Removes filler words while preserving meaning (see the sketch after this list)
- Adds proper punctuation and capitalization
- Creates readable paragraphs
- Fixes common transcription errors
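The real post-processor is more careful about context, but the filler-word step can be approximated with a naive regex pass. A deliberately simple sketch (`remove_fillers` is hypothetical; blindly dropping words like "like" would be too aggressive in practice):

```python
import re

# Common fillers; real code must avoid false positives ("I like this").
FILLERS = re.compile(r"\b(um+|uh+|you know)\b[,\s]*", re.IGNORECASE)

def remove_fillers(text: str) -> str:
    """Strip filler words, collapse spacing, restore the leading capital."""
    cleaned = FILLERS.sub(" ", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned[:1].upper() + cleaned[1:]

print(remove_fillers("um, so the model, you know, runs locally"))
# -> "So the model, runs locally"
```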
1. Install PyInstaller

   ```bash
   pip install pyinstaller
   ```

2. Run Build Script

   ```bash
   # Windows
   build_exe.bat

   # Or manually
   pyinstaller VideoTranscriber.spec --clean
   ```

3. Find Executable
   - Located in `dist/VideoTranscriber.exe`
   - Single file, ready for distribution
Edit `VideoTranscriber.spec` to:
- Add custom icon
- Include additional files
- Modify build options (an illustrative spec excerpt follows this list)
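For orientation, a PyInstaller spec file is ordinary Python. The shape below follows PyInstaller's generated spec format, but the icon path and bundled folder are hypothetical examples, not the project's shipped spec:

```python
# Illustrative spec, not the actual VideoTranscriber.spec.
a = Analysis(
    ['run.py'],
    datas=[('assets', 'assets')],  # include additional files (hypothetical folder)
    hiddenimports=[],
)
pyz = PYZ(a.pure)
exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.datas,
    name='VideoTranscriber',
    console=False,       # windowed GUI app, no console
    icon='app.ico',      # custom icon (hypothetical path)
)
```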
"No model found" error
- Ensure `.pt` files are in the selected folder
- File names should contain the model size (e.g., `large.pt`, `large-v3.pt`)
Slow processing on CPU
- Install CUDA-enabled PyTorch (see installation)
- Use smaller model (base or small)
- Check GPU is detected in console output
"CUDA out of memory" error
- Use a smaller model (check available memory with the snippet below)
- Close other GPU applications
- Process shorter videos
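To see how much GPU memory is actually free before choosing a model size, a quick check (requires a CUDA-enabled PyTorch build):

```python
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes
    print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
else:
    print("No CUDA device detected")
```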
Transcription has repeated text
- App includes automatic repetition removal
- Update to latest version
- Report persistent issues
- For Speed: Use GPU + smaller models (base/small)
- For Quality: Use large model with GPU
- For Long Videos: Videos auto-split into segments
- For Batch Processing: Queue overnight with large model
```
video-transcriber/
├── src/
│   ├── ui/                # GUI components
│   ├── transcription/     # Whisper integration
│   ├── audio_processing/  # Video/audio conversion
│   ├── post_processing/   # Text enhancement
│   ├── input_handling/    # File management
│   └── config/            # Settings management
├── run.py                 # Application entry point
├── run_app.bat            # Windows launcher (auto-activates venv)
├── requirements.txt       # Python dependencies
├── VideoTranscriber.spec  # PyInstaller configuration
└── build_exe.bat          # Build script
```
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for the amazing transcription model
- PyQt6 for the GUI framework
- MoviePy for video processing
- PyTorch for the ML framework
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Support for more video formats
- Real-time transcription preview
- Speaker diarization
- Multiple language support
- Cloud processing option
- Export to SRT/VTT subtitles
- Integration with video editing software
Made with care by [Your Name]