A real-time face recognition application built with Gradio that allows you to recognize faces in live webcam video streams directly in your browser. Upload reference images to build a face gallery, then watch as the system identifies matching faces in real-time with confidence scores and visual annotations.
Key Features: Real-time processing • Web-based interface • Multi-face detection • Adjustable similarity thresholds • Live confidence scoring
- Example
- Quick Start
- How to Use
- Features
- Machine Learning Models
- Troubleshooting
- Development Status & Roadmap
- About This Project
- Resources & References
As an example, instead of using a webcam I fed a clip from The Big Bang Theory into the app, and it recognized the characters in real time:
- Python 3.12 (exactly - not 3.11 or 3.13)
- 4GB RAM (8GB recommended)
- Webcam (built-in or USB)
- Modern web browser (Chrome, Firefox, Safari, Edge)
- 8GB+ RAM for optimal performance
- Multi-core CPU (Intel i5/AMD Ryzen 5 or better)
- Good lighting conditions for best recognition accuracy
- uv for fastest dependency management
- macOS (Intel & Apple Silicon)
- Linux (Ubuntu 20.04+, CentOS 8+)
- Windows 10/11 (with proper Python setup)
- Chrome/Chromium 90+
- Firefox 88+
- Safari 14+
- Edge 90+
The fastest and most reliable way to get started:
```sh
# Clone the repository
git clone https://github.com/Martlgap/livefaceidapp.git
cd livefaceidapp

# Install dependencies and run
uv sync
uv run python main.py
```

Or use the convenience script:

```sh
chmod +x run.sh
./run.sh
```

If you prefer traditional Python package management:
```sh
# Clone and setup virtual environment
git clone https://github.com/Martlgap/livefaceidapp.git
cd livefaceidapp
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install gradio scikit-image scikit-learn opencv-python-headless onnxruntime mediapipe numpy pillow watchdog memory-profiler

# Run the application
python main.py
```

Pro Tip: The uv approach is significantly faster for dependency resolution and installation, especially on fresh setups.
The app runs on http://localhost:7860 by default with share=True enabled, automatically generating a public shareable link for easy testing and demonstration.
For production deployment on a server with HTTPS support:
```sh
# Download and setup ssl-proxy
wget https://github.com/suyashkumar/ssl-proxy/releases/download/v0.2.7/ssl-proxy-linux-amd64.tar.gz
tar -xzf ssl-proxy-linux-amd64.tar.gz

# Run with HTTPS proxy
./ssl-proxy-linux-amd64 -from 0.0.0.0:8502 -to 0.0.0.0:7860
```

Then access your app securely at https://your-server-ip:8502
The app automatically provides a public shareable link when launched, perfect for:
- Remote demonstrations
- Cross-device testing
- Sharing with team members
- Mobile device access
This implementation leverages Gradio's strengths for computer vision applications:
- Native Webcam Support: Seamless browser-based camera integration
- Real-time Streaming: Optimized for live video processing
- Zero Configuration: Works across platforms without WebRTC setup
- Mobile Friendly: Responsive design for various devices
- Simple Deployment: No complex server configurations needed
- Click "Upload Reference Images" in the left panel
- Select one or more photos containing faces you want to recognize
- The system automatically detects faces and adds them to your gallery
- Each image filename becomes the person's name label
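The labeling convention above can be sketched as follows. This is an illustrative reconstruction, not the app's actual code: `embed_face` is a stand-in for the real MobileNetV2/ONNX embedding model, and only the filename-to-label mapping mirrors the documented behavior.

```python
from pathlib import Path
import numpy as np

def embed_face(face: np.ndarray) -> np.ndarray:
    """Placeholder for the app's real embedding model (MobileNetV2 via ONNX).
    Returns a fixed-size vector purely for illustration."""
    rng = np.random.default_rng(0)
    return rng.standard_normal(512).astype(np.float32)

def build_gallery(image_paths):
    """Map each reference image to a {name: embedding} entry.
    The filename stem becomes the person's label, as described above."""
    gallery = {}
    for p in map(Path, image_paths):
        face = np.zeros((112, 112, 3), dtype=np.uint8)  # stand-in for a cropped face
        gallery[p.stem] = embed_face(face)
    return gallery

gallery = build_gallery(["refs/Sheldon.jpg", "refs/Penny.png"])
print(sorted(gallery))  # ['Penny', 'Sheldon']
```

In practice this means one reference photo named `Sheldon.jpg` is enough to label that person in the live stream; uploading additional photos under the same name would strengthen the gallery.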
- Allow browser access to your webcam when prompted
- Position faces in front of the camera
- Watch as the system detects and recognizes faces in real-time
- Adjust the "Similarity Threshold" slider:
- Lower values (0.3-0.7): Stricter matching, fewer false positives but more missed matches
- Higher values (0.8-1.2): More lenient matching, catches more faces at the risk of false positives
- Monitor live match scores and confidence percentages in the right panel
- Green boxes: Successfully matched faces
- Blue boxes: Detected but unmatched faces
- Distance scores: Lower = better match (typically < 1.0 for good matches)
- Confidence %: Higher = more certain match
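The relationship between distance and confidence can be illustrated with a small sketch. The exact formula the app uses is not documented here, so the `confidence` mapping below is an assumption: 100% at distance 0, falling linearly to 0% at the threshold.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity: 0 means identical direction, 2 means opposite."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b))

def confidence(distance: float, threshold: float) -> float:
    """Hypothetical distance-to-confidence mapping, in percent."""
    return max(0.0, 1.0 - distance / threshold) * 100.0

# Two nearly identical embeddings yield a tiny distance and high confidence.
d = cosine_distance(np.array([1.0, 0.0]), np.array([1.0, 0.1]))
print(round(d, 3))                            # 0.005
print(round(confidence(d, threshold=0.67), 1))  # 99.3
```

Whatever the precise mapping, the ordering holds: smaller distances always translate to higher confidence, which is why "distance < 1.0" is a useful rule of thumb for a good match.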
The application processes video streams through three optimized stages:
- Face Detection: MediaPipe's FaceMesh identifies up to 7 faces per frame with precise landmark detection
- Face Recognition: A MobileNetV2-based neural network extracts 512-dimensional face embeddings via ONNX Runtime
- Face Matching: Cosine-distance similarity matching against your gallery with configurable thresholds
The entire pipeline is optimized for real-time performance, processing frames at up to 10 FPS depending on hardware capabilities.
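The matching stage can be sketched in a few lines. This is a minimal reconstruction under stated assumptions (the default threshold value and the tie-breaking behavior are illustrative, not taken from the app's source): for each detected face, compare its embedding against every gallery entry and accept the nearest one only if it falls under the threshold.

```python
import numpy as np

def match_face(probe: np.ndarray, gallery: dict, threshold: float = 0.67):
    """Return (name, distance) of the closest gallery entry,
    or (None, distance) if nothing falls under the threshold."""
    def cos_dist(a, b):
        return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        d = cos_dist(probe, emb)
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist > threshold:
        return None, best_dist  # detected but unmatched -> blue box
    return best_name, best_dist  # matched -> green box with name label

# Toy 3-D embeddings stand in for the real 512-D vectors.
gallery = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}
name, dist = match_face(np.array([0.9, 0.1, 0.0]), gallery)
print(name)  # alice
```

This nearest-neighbor scan is linear in the gallery size, which is why it stays fast for the small galleries typical of this app.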
A detailed description of the implementation can be found here:
The app uses the following machine learning models:
- Face Detection: FaceMesh (MediaPipe)
- Face Recognition: MobileNetV2 Architecture, trained with MS1M dataset and ArcFace Loss using Tensorflow, converted to ONNX
- Real-time Face Recognition: Process live webcam feeds with minimal latency
- Dynamic Gallery Management: Upload multiple reference images to build your face database
- Adjustable Similarity Threshold: Fine-tune recognition sensitivity from strict to lenient
- Multi-face Detection: Simultaneously detect and recognize up to 7 faces per frame
- Live Confidence Scoring: Real-time distance metrics and confidence percentages
- Visual Annotations: Dynamic bounding boxes, name labels, and match indicators
- Browser-based Interface: Zero installation for end users - runs in any modern browser
- Responsive Design: Optimized for desktop, tablet, and mobile devices
- High Performance: ONNX Runtime optimization for fast inference
- Streaming Architecture: Efficient frame processing with Gradio's streaming capabilities
- Interactive Controls: Real-time threshold adjustment and gallery updates
- One-click Setup: Simple script execution to get started
- Drag & Drop Upload: Intuitive image gallery management
- Live Feedback: Instant visual and numerical match results
- Share Ready: Built-in public link generation for demonstrations
This application was developed during research at the Chair of Human-Machine Communication at TUM (Technical University of Munich). The primary goal was to evaluate and compare live face recognition systems across different platforms and frameworks.
The project explores the practical implementation of real-time face recognition in web-based environments, focusing on:
- Performance benchmarking of different ML frameworks
- User experience optimization for browser-based computer vision
- Accessibility of advanced AI technologies through simple interfaces
- Privacy considerations in live video processing applications
This implementation represents a significant evolution from earlier versions:
- V1: Original WebRTC-based streaming (complex setup, platform limitations)
- V2: Streamlit implementation (good but limited real-time capabilities)
- V3: Current Gradio-based solution (optimal balance of performance and usability)
This project serves as both a research tool and an educational resource. Contributions, suggestions, and improvements are welcome! Whether you're interested in:
- Performance optimizations
- Feature enhancements
- Platform compatibility
- Documentation improvements
Feel free to open issues or submit pull requests.
Released under MIT License - feel free to use, modify, and distribute for both academic and commercial purposes. If you use this work in research, please consider citing the associated publications.
- Camera not detected: Ensure browser permissions are granted and no other apps are using the webcam
- Poor video quality: Check lighting conditions and camera resolution settings
- Laggy performance: Close other browser tabs/applications using camera resources
- No faces detected: Ensure adequate lighting and face visibility
- False matches: Decrease similarity threshold (try 0.5-0.7)
- Missing matches: Increase similarity threshold (try 0.9-1.2)
- Low confidence scores: Add more/better quality reference images
- Slow processing: Check system resources, close unnecessary applications
- High CPU usage: Normal behavior - face recognition is computationally intensive
- Memory warnings: Restart the application if processing many large images
- Python version: Ensure Python 3.12 is installed (not 3.11 or 3.13)
- uv not found: Install uv with `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Dependencies fail: Try the pip installation method as a fallback
- Optimal conditions: Good lighting, clear face visibility, minimal background motion
- Image quality: Use high-resolution reference images with clear, front-facing photos
- Hardware: Better performance on systems with dedicated GPUs (future CUDA support planned)
- Gradio - Modern ML app framework powering the interface
- MediaPipe - Production-ready face detection
- ONNX Runtime - Cross-platform ML inference optimization
- OpenCV - Computer vision processing library
- uv - Fast Python package management
- Gradio Documentation - Complete framework reference
- MediaPipe Face Mesh - Face detection details
- ONNX Model Zoo - ML model optimization guides
- ssl-proxy - HTTPS deployment tool
- MobileNetV2 - Efficient neural architecture
- ArcFace Loss - Face recognition training methodology
- MS1M Dataset - Large-scale face recognition dataset
