⚠️ Internal Alpha - This project is in early development and not ready for production use.
⚠️ CRITICAL: Documentation is out of sync with code. Whisper STT has been removed; Parakeet doesn't compile. See `criticalActionPlan.md` for current status. Only Moonshine STT works (requires `uv sync` first).
- Install Rust (stable) and required system dependencies for your platform.
- Use the provided scripts in `scripts/` to help with local environment setup.
This project uses a "Zero-Latency" git hook standard powered by mise and lint-staged.

- Install mise: `curl https://mise.run | sh` (or see the mise docs)
- Install dependencies: `mise install`
- Activate hooks: `mise run prepare` (runs automatically on `npm install`)

Hooks will now run automatically on `git commit`. To run them manually:

```bash
mise run pre-commit
```
Minimal root README. Full developer & architecture guide: see CLAUDE.md.
ColdVox is a modular Rust workspace providing real‑time audio capture, VAD, STT (Faster-Whisper), and cross‑platform text injection.
For voice dictation (recommended):

```bash
# Run with default Faster-Whisper STT and text injection (model auto-discovered)
cargo run --features text-injection

# With a specific microphone device
cargo run --features text-injection -- --device "HyperX QuadCast"

# TUI dashboard with controls
cargo run --bin tui_dashboard --features tui
```

Other usage:

```bash
# VAD-only mode (no speech recognition)
cargo run

# Test microphone setup
cargo run --bin mic_probe -- list-devices
```

Audio dumps: The TUI dashboard now records raw audio to `logs/audio_dumps/` by default. Pass `--dump-audio=false` to disable persistent capture.
Note on defaults: Faster-Whisper STT is the default feature (enabled automatically), ensuring real speech recognition in the app and tests. This prevents fallback to the mock plugin, which skips transcription. If you need the mock plugin for testing, override the default with `--stt-preferred mock` or the environment variable `COLDVOX_STT_PREFERRED=mock`. For other STT backends, enable their features and set the preferred plugin accordingly.
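The text above does not say which override wins when both are given. The sketch below assumes the `--stt-preferred` flag takes precedence over `COLDVOX_STT_PREFERRED`, which in turn takes precedence over the built-in default. `resolve_stt_preferred`, `non_empty`, and the `"faster-whisper"` id are hypothetical names for illustration, not ColdVox's actual API.

```rust
// Hypothetical sketch: pick the preferred STT plugin id.
// ASSUMPTIONS (not confirmed by the ColdVox source):
//   1. --stt-preferred on the command line beats COLDVOX_STT_PREFERRED;
//   2. "faster-whisper" stands in for the real default plugin id.
use std::env;

const DEFAULT_STT: &str = "faster-whisper"; // hypothetical default id

/// Treat a missing or blank value as "not set".
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.filter(|s| !s.trim().is_empty())
}

/// CLI flag first, then the environment variable, then the default.
fn resolve_stt_preferred(cli_flag: Option<&str>, env_value: Option<&str>) -> String {
    non_empty(cli_flag)
        .or(non_empty(env_value))
        .unwrap_or(DEFAULT_STT)
        .to_string()
}

fn main() {
    let args: Vec<String> = env::args().collect();
    // Naive scan for "--stt-preferred <id>"; a real CLI would use a parser.
    let cli_flag = args
        .windows(2)
        .find(|pair| pair[0] == "--stt-preferred")
        .map(|pair| pair[1].as_str());
    let env_value = env::var("COLDVOX_STT_PREFERRED").ok();

    let chosen = resolve_stt_preferred(cli_flag, env_value.as_deref());
    println!("preferred STT plugin: {chosen}");
}
```

Treating blank values as unset keeps an exported-but-empty variable from selecting a plugin that does not exist.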
- Canonical STT selection config lives at `config/plugins.json`.
- Any legacy duplicates such as `./plugins.json` or `crates/app/plugins.json` are deprecated and ignored at runtime. A warning is logged on startup if they exist. Please migrate changes into `config/plugins.json` only.
- Some defaults can also be set in `config/default.toml`, but `config/plugins.json` is the source of truth for STT plugin selection.
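The lookup rule above can be sketched as follows. This is a hypothetical illustration that assumes paths are resolved relative to the workspace root; `locate_plugin_config` and the warning wording are invented for the example, and the real app may log through a different mechanism.

```rust
use std::path::{Path, PathBuf};

// From the README: the canonical location and the deprecated duplicates.
const CANONICAL: &str = "config/plugins.json";
const LEGACY: [&str; 2] = ["plugins.json", "crates/app/plugins.json"];

/// Hypothetical helper: returns the canonical config path under `root`
/// plus any legacy copies that exist (these are reported, never loaded).
fn locate_plugin_config(root: &Path) -> (PathBuf, Vec<PathBuf>) {
    let canonical = root.join(CANONICAL);
    let legacy: Vec<PathBuf> = LEGACY
        .iter()
        .map(|rel| root.join(rel))
        .filter(|path| path.exists())
        .collect();
    (canonical, legacy)
}

fn main() {
    let (canonical, legacy) = locate_plugin_config(Path::new("."));
    for path in &legacy {
        // Documented behavior: legacy files are ignored but trigger a warning.
        eprintln!(
            "warning: ignoring deprecated {}; move its settings into {}",
            path.display(),
            canonical.display()
        );
    }
    println!("STT plugin selection is read from {}", canonical.display());
}
```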
- Python package: Install the `faster-whisper` Python package via pip (`pip install faster-whisper`).
- Models: Whisper models are downloaded automatically on first use.
- Model identifiers: Use standard Whisper model names (e.g., "tiny.en", "base.en", "small.en", "medium.en").
- Manual path: Set `WHISPER_MODEL_PATH` to specify a model identifier or a custom model directory.
- Common models:
  - "tiny.en" (~75MB) - Fastest, lower accuracy
  - "base.en" (~142MB) - Good balance of speed and accuracy
  - "small.en" (~466MB) - Better accuracy
  - "medium.en" (~1.5GB) - High accuracy
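One way to interpret a `WHISPER_MODEL_PATH` value is sketched below. It roughly mirrors how the faster-whisper library treats its `model_size_or_path` argument: an existing directory is loaded as-is, and anything else is treated as a model name to fetch. `ModelSource` and `classify_model` are hypothetical names, and ColdVox's own resolution logic may differ.

```rust
use std::env;
use std::path::Path;

/// Hypothetical classification of a WHISPER_MODEL_PATH value.
#[derive(Debug, PartialEq)]
enum ModelSource {
    /// An existing local directory holding a converted model.
    Directory(String),
    /// Anything else, treated as a model name such as "base.en".
    Identifier(String),
}

/// Common English-only names listed in this README.
const COMMON_MODELS: [&str; 4] = ["tiny.en", "base.en", "small.en", "medium.en"];

fn classify_model(value: &str) -> ModelSource {
    // An existing directory wins, so a local model folder is used directly.
    if Path::new(value).is_dir() {
        ModelSource::Directory(value.to_string())
    } else {
        ModelSource::Identifier(value.to_string())
    }
}

fn main() {
    match env::var("WHISPER_MODEL_PATH") {
        Ok(value) => match classify_model(&value) {
            ModelSource::Directory(dir) => println!("loading model from directory {dir}"),
            ModelSource::Identifier(id) if COMMON_MODELS.contains(&id.as_str()) => {
                println!("using standard model {id}")
            }
            ModelSource::Identifier(id) => {
                println!("using model name {id} (not in the common list above)")
            }
        },
        Err(_) => println!("WHISPER_MODEL_PATH not set; relying on auto-discovery"),
    }
}
```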
- Always-on pipeline: Audio capture, VAD, STT, and text-injection buffering run continuously by default. Raw 16 kHz mono audio is recorded to `logs/audio_dumps/` for later review.
- Voice activation (default): The Silero VAD segments speech automatically; no hotkey is required.
- Push-to-talk (preview inject): Hold `Super+Ctrl` to stream buffered text into the preview/injection window when you need manual control. Release to stop feeding new text.
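Because the dumps are on by default, it helps to estimate their disk usage. The sample format is not stated above; assuming 16-bit PCM (2 bytes per sample; 32-bit float would double every figure), 16 kHz mono works out to 16,000 × 2 = 32,000 bytes per second. The constant and function names here are illustrative only.

```rust
// Estimate the size of raw audio dumps in logs/audio_dumps/.
// From the README: 16 kHz, mono.
// ASSUMPTION: samples are stored as 16-bit PCM (2 bytes each).
const SAMPLE_RATE_HZ: u64 = 16_000;
const CHANNELS: u64 = 1;
const BYTES_PER_SAMPLE: u64 = 2; // hypothetical; f32 samples would be 4

/// Raw bytes produced by `seconds` of continuous capture (file headers ignored).
fn dump_bytes(seconds: u64) -> u64 {
    SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * seconds
}

fn main() {
    let per_hour = dump_bytes(60 * 60);
    println!("~{:.1} MB per hour", per_hour as f64 / 1_000_000.0);
}
```

Under the same 16-bit assumption that is about 115.2 MB per hour of capture, or roughly 0.92 GB for an eight-hour session.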
More detail: see CLAUDE.md for the full developer guide.
If your system default Python is 3.13, current pyo3 versions may warn about an unsupported Python version during the build. Two options:

- Prefer Python 3.12 for development tools, or
- Build using the stable Python ABI by exporting `PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1`:

```shell
set -gx PYO3_USE_ABI3_FORWARD_COMPATIBILITY 1   # fish shell
# export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1  # bash/zsh equivalent
cargo check
```

We plan to upgrade pyo3 in a follow-up to remove this requirement.
- We're actively exploring an always-on intelligent listening architecture that keeps a lightweight listener running continuously and spins up tiered STT engines on demand.
- This speculative work includes decoupled listening/processing threads, dynamic STT memory management, and context-aware activation.
- Read the full experimental plan in `docs/architecture.md`. Treat it as research guidance, not a committed roadmap.
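As a rough illustration of the "spin up on demand" idea (dynamic STT memory management only; tiering and threading are left out), a cheap listener can construct an expensive engine the first time speech arrives and drop it after a run of silent frames. Everything here (`Listener`, `SttEngine`, `idle_limit`) is hypothetical and not taken from `docs/architecture.md`.

```rust
/// Placeholder for an expensive STT engine (imagine a slow model load in `load`).
struct SttEngine;

impl SttEngine {
    fn load() -> Self {
        SttEngine
    }
    fn transcribe(&self) -> String {
        "partial transcript".to_string()
    }
}

/// Hypothetical always-on listener that owns the engine only while needed.
struct Listener {
    engine: Option<SttEngine>,
    loads: u32,       // how many times the engine has been spun up
    idle_frames: u32, // consecutive non-speech frames seen
    idle_limit: u32,  // drop the engine after this many idle frames
}

impl Listener {
    fn new(idle_limit: u32) -> Self {
        Listener { engine: None, loads: 0, idle_frames: 0, idle_limit }
    }

    /// Feed one VAD decision; returns text only for speech frames.
    fn on_frame(&mut self, is_speech: bool) -> Option<String> {
        if is_speech {
            self.idle_frames = 0;
            // Spin up the heavy engine lazily on the first speech frame.
            if self.engine.is_none() {
                self.engine = Some(SttEngine::load());
                self.loads += 1;
            }
            self.engine.as_ref().map(SttEngine::transcribe)
        } else {
            self.idle_frames = self.idle_frames.saturating_add(1);
            if self.idle_frames >= self.idle_limit {
                // Free the model's memory while nobody is talking.
                self.engine = None;
            }
            None
        }
    }
}

fn main() {
    let mut listener = Listener::new(3);
    // Hypothetical VAD output: speech, a long pause, then speech again.
    let frames = [true, true, false, false, false, false, true];
    for is_speech in frames {
        if let Some(text) = listener.on_frame(is_speech) {
            println!("{text}");
        }
    }
    println!("engine loaded {} time(s)", listener.loads);
}
```

With an idle limit of 3, the pause in `main` is long enough to unload the engine, so it is loaded twice; a real design would tune the limit against model load time.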
Some end‑to‑end tests exercise real injection and STT. Gate them locally by setting an environment variable (planned):

```bash
export COLDVOX_SLOW_TESTS=1
cargo test -- --ignored
```

Headless behavior notes: see `docs/text_injection_headless.md`.
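Since the environment-variable gate is still planned, the following is only a sketch of how such a test could be written. It combines Rust's standard `#[ignore]` attribute (so the test runs only under `cargo test -- --ignored`) with an early return when `COLDVOX_SLOW_TESTS` is not `"1"`. The helper `slow_tests_enabled` and the test name are hypothetical.

```rust
/// Hypothetical helper: the planned gate is "COLDVOX_SLOW_TESTS=1".
fn slow_tests_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

#[test]
#[ignore = "slow: exercises real STT and text injection"]
fn end_to_end_injection() {
    // Second gate: even with --ignored, bail out unless the env var opts in.
    if !slow_tests_enabled(std::env::var("COLDVOX_SLOW_TESTS").ok().as_deref()) {
        eprintln!("skipping: set COLDVOX_SLOW_TESTS=1 to run");
        return;
    }
    // Real end-to-end steps (capture, STT, injection) would go here.
}

fn main() {
    let enabled = slow_tests_enabled(std::env::var("COLDVOX_SLOW_TESTS").ok().as_deref());
    println!("slow tests enabled: {enabled}");
}
```

With this pattern, `cargo test -- --ignored` alone still skips the body; it runs only when `COLDVOX_SLOW_TESTS=1` is also exported.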
Dual-licensed under MIT or Apache-2.0. See `LICENSE-MIT` and `LICENSE-APACHE` if present; otherwise, see the crate-level manifests.
- Review the Master Documentation Playbook.
- Follow the repository Documentation Standards.
- Coordinate work through the Documentation Todo Backlog.
- Assistants should read the Assistant Interaction Index.