For contributor guidelines see AGENTS.md.
A powerful desktop application that converts video files into accurate text transcripts using OpenAI's Whisper AI model. Features a modern GUI built with PyQt6, batch processing capabilities, and advanced text post-processing with filler word removal.
- **Multi-Format Support**: Process MP4, AVI, MKV, and MOV video files
- **GPU Acceleration**: Automatic CUDA detection for 10-20x faster processing
- **Batch Processing**: Queue multiple videos for automated transcription
- **Advanced Text Processing**:
  - Automatic filler word removal ("um", "uh", "like", "you know")
  - Smart punctuation and capitalization
  - Paragraph formatting for readability
- **Flexible Model Selection**: Choose from tiny, base, small, medium, or large Whisper models
- **Custom Model Loading**: Load pre-downloaded models to work offline
- **Real-time Progress**: Track processing with time estimates and progress bars
- **Pause/Resume**: Control processing without losing progress
- **Modern UI**: Clean, intuitive interface with drag-and-drop support
For first-time setup:
```bash
# 1. Clone the repository
git clone https://github.com/yourusername/video-transcriber.git
cd video-transcriber

# 2. Create virtual environment
python -m venv venv

# 3. Activate and install dependencies (Windows; on macOS/Linux use `source venv/bin/activate`)
venv\Scripts\activate
pip install -r requirements.txt

# 4. Install PyTorch (choose one based on your setup)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118  # For CUDA 11.8
# OR
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121  # For CUDA 12.1
# OR
pip install torch torchvision torchaudio  # For CPU only
```

To run the app (after setup):

```bash
run_app.bat
```

That's it! The batch file handles activation and running automatically.
- Download the latest `VideoTranscriber.exe` from the Releases page
- Download Whisper model files (see Model Setup below)
- Run `VideoTranscriber.exe`
- Python 3.11 or higher
- NVIDIA GPU (optional, for faster processing)
- CUDA 11.8 or 12.1 (if using GPU)
```bash
git clone https://github.com/yourusername/video-transcriber.git
cd video-transcriber
```

```bash
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate
```

```bash
pip install -r requirements.txt
```

```bash
# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For CPU only
pip install torch torchvision torchaudio
```

Option A: Use the Batch File (Windows - Recommended)

```bash
run_app.bat
```

The batch file will automatically:
- Check if virtual environment exists
- Activate the virtual environment
- Launch the application
- Display helpful error messages if something goes wrong
Option B: Manual Run
```bash
# Windows
venv\Scripts\activate
python run.py

# macOS/Linux
source venv/bin/activate
python run.py
```

The app uses OpenAI's Whisper models for transcription. Each model size offers different trade-offs:
| Model | Parameters | Speed | Quality | Download Size |
|---|---|---|---|---|
| tiny | 39M | Very Fast | Basic | ~39 MB |
| base | 74M | Fast | Good | ~74 MB |
| small | 244M | Moderate | Better | ~244 MB |
| medium | 769M | Slow | Very Good | ~769 MB |
| large | 1550M | Very Slow | Best | ~1.5 GB |
On first use, the app will automatically download the selected model from OpenAI (requires internet connection).
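The app builds on the openai-whisper package, which handles this download-and-cache step. A minimal sketch of what that looks like in plain Python (the file name is a placeholder, and the app's actual wiring differs):

```python
import whisper  # pip install openai-whisper

# Downloaded and cached on first use (~/.cache/whisper by default).
model = whisper.load_model("base")

# Transcribe a media file; Whisper uses ffmpeg to decode the audio.
result = model.transcribe("example_audio.wav")
print(result["text"])
```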
1. Download Model Files
   - Download `.pt` files from OpenAI Whisper
   - Or from Hugging Face

2. Place Models in a Folder

   ```
   C:\WhisperModels\
   ├── tiny.pt
   ├── base.pt
   ├── small.pt
   ├── medium.pt
   └── large-v3.pt
   ```

3. Load in Application
   - Click "Load Model Folder" button
   - Navigate to your models folder
   - Select the folder containing `.pt` files
   - The app will remember this location (a sketch of this folder scan follows the list)
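Folder-based loading can work along these lines, since `whisper.load_model` also accepts a path to a `.pt` checkpoint. A minimal sketch (the helper name and folder are hypothetical, not the app's actual code):

```python
from pathlib import Path

import whisper

def load_from_folder(folder: str, size: str = "base"):
    """Pick the first .pt checkpoint whose name contains the requested size."""
    for ckpt in sorted(Path(folder).glob("*.pt")):
        if size in ckpt.stem:  # e.g. "base" matches base.pt
            return whisper.load_model(str(ckpt))
    raise FileNotFoundError(f"no '{size}' model found in {folder}")

model = load_from_folder(r"C:\WhisperModels", "base")  # hypothetical folder
```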
1. Start the Application
   - Run `VideoTranscriber.exe` (if using the pre-built executable)
   - Or run `run_app.bat` (Windows; automatically activates the venv)
   - Or run `python run.py` (after manually activating the virtual environment)

2. Configure Settings
   - Select output directory for transcripts
   - Choose Whisper model size (larger = better quality, slower)
   - (Optional) Load custom model folder

3. Add Videos
   - Click "Add Files" to select videos
   - Or "Add Directory" to process entire folders
   - Or drag and drop files directly

4. Process Videos
   - Click "Start Processing"
   - Monitor progress in real-time
   - Pause/resume as needed

5. Get Results
   - Transcripts saved as `.txt` files
   - Same filename as the video, with a `.txt` extension (the naming rule is sketched below)
   - Located in your selected output directory
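The output naming rule is easy to reproduce; a short sketch with example paths (not the app's actual code):

```python
from pathlib import Path

video = Path("C:/Videos/lecture01.mp4")  # example input
output_dir = Path("C:/Transcripts")      # selected output directory

transcript = output_dir / video.with_suffix(".txt").name
print(transcript)  # C:/Transcripts/lecture01.txt
```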
The app automatically detects and uses NVIDIA GPUs. Check status in console output:
- `Model loaded successfully on cuda` = GPU active
- `Model loaded successfully on cpu` = CPU only
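You can confirm what PyTorch will report before launching the app. The device-selection line below mirrors the console message, though the app's exact code may differ:

```python
import torch

# Prefer CUDA when available, as the app does.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Model loaded successfully on {device}")
if device == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))
```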
- Queue processes videos in order (FIFO; see the sketch after this list)
- Each video's transcript is saved immediately upon completion
- Failed videos don't stop the queue
- Time estimates improve as more videos are processed
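A minimal sketch of that queue behavior (`transcribe` stands in for the real extraction-plus-Whisper pipeline; this is not the app's actual code):

```python
from collections import deque

def process_queue(videos, transcribe):
    """FIFO queue: save each transcript immediately; failures don't stop the run."""
    queue = deque(videos)  # first in, first out
    while queue:
        video = queue.popleft()
        try:
            text = transcribe(video)  # placeholder for the real pipeline
            out_path = video.rsplit(".", 1)[0] + ".txt"
            with open(out_path, "w", encoding="utf-8") as f:
                f.write(text)  # saved immediately upon completion
        except Exception as err:  # a failed video doesn't stop the queue
            print(f"Skipping {video}: {err}")
```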
The app automatically:
- Removes filler words while preserving meaning (see the sketch after this list)
- Adds proper punctuation and capitalization
- Creates readable paragraphs
- Fixes common transcription errors
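The real post-processor is more careful about context, but the filler-word step can be approximated with a naive regex pass. A deliberately simple sketch (`remove_fillers` is hypothetical; blindly dropping words like "like" would be too aggressive in practice):

```python
import re

# Common fillers; real code must avoid false positives ("I like this").
FILLERS = re.compile(r"\b(um+|uh+|you know)\b[,\s]*", re.IGNORECASE)

def remove_fillers(text: str) -> str:
    """Strip filler words, collapse spacing, restore the leading capital."""
    cleaned = FILLERS.sub(" ", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned[:1].upper() + cleaned[1:]

print(remove_fillers("um, so the model, you know, runs locally"))
# -> "So the model, runs locally"
```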
1. Install PyInstaller

   ```bash
   pip install pyinstaller
   ```

2. Run Build Script

   ```bash
   # Windows
   build_exe.bat

   # Or manually
   pyinstaller VideoTranscriber.spec --clean
   ```

3. Find Executable
   - Located in `dist/VideoTranscriber.exe`
   - Single file, ready for distribution
Edit `VideoTranscriber.spec` to:
- Add custom icon
- Include additional files
- Modify build options (an illustrative spec excerpt follows this list)
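For orientation, a PyInstaller spec file is ordinary Python. The shape below follows PyInstaller's generated spec format, but the icon path and bundled folder are hypothetical examples, not the project's shipped spec:

```python
# Illustrative spec, not the actual VideoTranscriber.spec.
a = Analysis(
    ['run.py'],
    datas=[('assets', 'assets')],  # include additional files (hypothetical folder)
    hiddenimports=[],
)
pyz = PYZ(a.pure)
exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.datas,
    name='VideoTranscriber',
    console=False,       # windowed GUI app, no console
    icon='app.ico',      # custom icon (hypothetical path)
)
```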
"No model found" error
- Ensure `.pt` files are in the selected folder
- File names should contain the model size (e.g., `large.pt`, `large-v3.pt`)
Slow processing on CPU
- Install CUDA-enabled PyTorch (see installation)
- Use smaller model (base or small)
- Check GPU is detected in console output
"CUDA out of memory" error
- Use a smaller model (check available memory with the snippet below)
- Close other GPU applications
- Process shorter videos
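To see how much GPU memory is actually free before choosing a model size, a quick check (requires a CUDA-enabled PyTorch build):

```python
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes
    print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
else:
    print("No CUDA device detected")
```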
Transcription has repeated text
- App includes automatic repetition removal
- Update to latest version
- Report persistent issues
- For Speed: Use GPU + smaller models (base/small)
- For Quality: Use large model with GPU
- For Long Videos: Videos auto-split into segments
- For Batch Processing: Queue overnight with large model
```
video-transcriber/
├── src/
│   ├── ui/                # GUI components
│   ├── transcription/     # Whisper integration
│   ├── audio_processing/  # Video/audio conversion
│   ├── post_processing/   # Text enhancement
│   ├── input_handling/    # File management
│   └── config/            # Settings management
├── run.py                 # Application entry point
├── run_app.bat            # Windows launcher (auto-activates venv)
├── requirements.txt       # Python dependencies
├── VideoTranscriber.spec  # PyInstaller configuration
└── build_exe.bat          # Build script
```
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for the amazing transcription model
- PyQt6 for the GUI framework
- MoviePy for video processing
- PyTorch for the ML framework
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Support for more video formats
- Real-time transcription preview
- Speaker diarization
- Multiple language support
- Cloud processing option
- Export to SRT/VTT subtitles
- Integration with video editing software
Made with care by [Your Name]