ColdVox

⚠️ Internal Alpha - This project is in early development and not ready for production use.

⚠️ CRITICAL: This documentation is out of sync with the code. Whisper STT has been removed and Parakeet does not compile; see criticalActionPlan.md for current status. Only the Moonshine STT backend currently works, and it requires running uv sync first.

Development

Install Rust (stable) and required system dependencies for your platform.
Use the provided scripts in scripts/ to help with local environment setup.

Developer Git Hooks

This project uses a "Zero-Latency" git hook standard powered by mise and lint-staged.

Setup

Install mise: curl https://mise.run | sh (or see the mise documentation for other install methods)
Install dependencies: mise install
Activate hooks: mise run prepare (runs automatically on npm install)

Hooks will now run automatically on git commit. To run manually:

mise run pre-commit


Minimal root README. Full developer & architecture guide: see CLAUDE.md.

Overview

ColdVox is a modular Rust workspace providing real-time audio capture, voice activity detection (VAD), speech-to-text (STT, via Faster-Whisper), and cross-platform text injection.

Quick Start

For Voice Dictation (Recommended):

# Run with default Faster-Whisper STT and text injection (model auto-discovered)
cargo run --features text-injection

# With specific microphone device
cargo run --features text-injection -- --device "HyperX QuadCast"

# TUI Dashboard with controls
cargo run --bin tui_dashboard --features tui

Other Usage:

# VAD-only mode (no speech recognition)
cargo run

# Test microphone setup
cargo run --bin mic_probe -- list-devices
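The --device flag takes a device name, and mic_probe list-devices shows the names available on your system. As a hypothetical sketch of how a requested name might be matched against that list, assuming an exact case-insensitive match is preferred and a substring match is the fallback (ColdVox's actual matching rule is not documented here, and select_device is an illustrative name):

```rust
/// Hypothetical helper: pick an input device by a user-supplied name.
/// Assumption: exact (case-insensitive) matches win over substring matches;
/// ColdVox's real `--device` matching rules may differ.
fn select_device<'a>(available: &[&'a str], requested: &str) -> Option<&'a str> {
    let wanted = requested.to_lowercase();

    // Prefer an exact, case-insensitive match.
    if let Some(exact) = available.iter().find(|name| name.to_lowercase() == wanted) {
        return Some(*exact);
    }

    // Otherwise fall back to the first device whose name contains the request.
    available
        .iter()
        .find(|name| name.to_lowercase().contains(&wanted))
        .copied()
}
```

With this rule, passing "HyperX QuadCast" would still select a device reported as "HyperX QuadCast: USB Audio".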

Audio dumps: The TUI dashboard now records raw audio to logs/audio_dumps/ by default. Pass --dump-audio=false to disable persistent capture.
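Because capture is persistent, it helps to estimate how quickly logs/audio_dumps/ grows. The README states the dumps are raw 16 kHz mono audio but not the sample format; the sketch below assumes 16-bit signed PCM (2 bytes per sample), and dump_bytes is a hypothetical helper.

```rust
// Assumed dump format: 16 kHz, mono, 16-bit signed PCM.
// Only the rate and channel count come from the README; the sample width is an assumption.
const SAMPLE_RATE_HZ: u64 = 16_000;
const CHANNELS: u64 = 1;
const BYTES_PER_SAMPLE: u64 = 2; // assumed i16 samples

/// Bytes written for `seconds` of continuous capture.
fn dump_bytes(seconds: u64) -> u64 {
    SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * seconds
}
```

Under that assumption, one second of capture is 32,000 bytes and one hour is 115,200,000 bytes (about 115 MB), or roughly 2.8 GB per day of continuous recording; 32-bit float samples would double these figures.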

Note on Defaults: Faster-Whisper STT is the default feature (enabled automatically), so the app and tests perform real speech recognition instead of falling back to the mock plugin, which skips transcription. For testing, override this with --stt-preferred mock or the environment variable COLDVOX_STT_PREFERRED=mock. To use another STT backend, enable its Cargo feature and set the preferred plugin accordingly.
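The override options above imply a precedence order. A minimal sketch, assuming (not confirmed by this README) that an explicit --stt-preferred value wins over COLDVOX_STT_PREFERRED, which wins over the built-in default, and that empty values count as unset; resolve_preferred, preferred_from_env, and non_empty are hypothetical helpers:

```rust
use std::env;

/// Treats empty or whitespace-only values as "not set".
fn non_empty(v: &&str) -> bool {
    !v.trim().is_empty()
}

/// Hypothetical precedence: CLI flag, then environment variable, then default.
fn resolve_preferred(cli_flag: Option<&str>, env_value: Option<&str>, default: &str) -> String {
    cli_flag
        .filter(non_empty)
        .or(env_value.filter(non_empty))
        .unwrap_or(default)
        .to_string()
}

/// Reads COLDVOX_STT_PREFERRED from the process environment and applies the order above.
fn preferred_from_env(cli_flag: Option<&str>, default: &str) -> String {
    let env_value = env::var("COLDVOX_STT_PREFERRED").ok();
    resolve_preferred(cli_flag, env_value.as_deref(), default)
}
```

Keeping resolve_preferred free of environment access makes the ordering easy to unit-test without mutating process state.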

Configuration (Canonical Path)

Canonical STT selection config lives at config/plugins.json.
Any legacy duplicates like ./plugins.json or crates/app/plugins.json are deprecated and ignored at runtime. A warning is logged on startup if they exist. Please migrate changes into config/plugins.json only.
Some defaults can also be set in config/default.toml, but config/plugins.json is the source of truth for STT plugin selection.
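The canonical-path rule above amounts to: read config/plugins.json, and warn about (but never read) either legacy file. A minimal sketch of that check, assuming paths are resolved against the repository root; locate_plugin_config, warn_on_legacy, and the warning text are hypothetical, not ColdVox's actual code or log message:

```rust
use std::path::{Path, PathBuf};

/// Canonical STT plugin config, relative to the repository root.
const CANONICAL: &str = "config/plugins.json";
/// Deprecated locations named in the README; they are ignored at runtime.
const LEGACY: [&str; 2] = ["plugins.json", "crates/app/plugins.json"];

/// Returns the canonical path plus any legacy files that still exist on disk.
fn locate_plugin_config(root: &Path) -> (PathBuf, Vec<PathBuf>) {
    let canonical = root.join(CANONICAL);
    let stale: Vec<PathBuf> = LEGACY
        .iter()
        .map(|rel| root.join(rel))
        .filter(|p| p.exists())
        .collect();
    (canonical, stale)
}

/// Logs one warning per leftover legacy file and returns the only path to load.
fn warn_on_legacy(root: &Path) -> PathBuf {
    let (canonical, stale) = locate_plugin_config(root);
    for path in &stale {
        eprintln!(
            "warning: ignoring deprecated {}; migrate settings to {}",
            path.display(),
            canonical.display()
        );
    }
    canonical
}
```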

Whisper Model Setup

Python Package: Install the faster-whisper Python package via pip (pip install faster-whisper)
Models: Whisper models are automatically downloaded on first use
Model Identifiers: Use standard Whisper model names (e.g., "tiny.en", "base.en", "small.en", "medium.en")
Manual Path: Set WHISPER_MODEL_PATH to specify a model identifier or custom model directory
Common Models:
- "tiny.en" (~75MB) - Fastest, lower accuracy
- "base.en" (~142MB) - Good balance of speed and accuracy
- "small.en" (~466MB) - Better accuracy
- "medium.en" (~1.5GB) - High accuracy
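WHISPER_MODEL_PATH accepts either a model identifier or a custom model directory. One plausible way to tell them apart, shown here as a hypothetical sketch (classify_model and ModelSource are illustrative names, not ColdVox APIs), is to treat any value that names an existing directory as a local model and pass everything else through as an identifier such as "base.en":

```rust
use std::path::Path;

/// Where a Whisper model should come from.
#[derive(Debug, PartialEq)]
enum ModelSource {
    /// A standard Whisper identifier, downloaded on first use.
    Identifier(String),
    /// A local directory holding a model.
    Directory(String),
}

/// Hypothetical interpretation of WHISPER_MODEL_PATH: an existing directory
/// is used as-is; anything else is treated as a model identifier.
fn classify_model(value: &str) -> ModelSource {
    if Path::new(value).is_dir() {
        ModelSource::Directory(value.to_string())
    } else {
        ModelSource::Identifier(value.to_string())
    }
}
```

Checking the filesystem first means a directory that happens to be named like a model (for example, a local folder called small.en) is treated as a path; if that ambiguity matters, requiring an explicit path separator would be a stricter alternative.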

How It Works

Always-on pipeline: Audio capture, VAD, STT, and text-injection buffering run continuously by default. Raw 16 kHz mono audio is recorded to logs/audio_dumps/ for later review.
Voice activation (default): The Silero VAD segments speech automatically; no hotkey is required.
Push-to-talk (preview inject): Hold Super+Ctrl to stream buffered text into the preview/injection window when you need manual control. Release to stop feeding new text.
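The push-to-talk behavior described above can be pictured as a gate between the always-on text buffer and the preview/injection window. The sketch below is hypothetical: PushToTalkGate and its methods are illustrative names, and it assumes text that arrives while the key is released stays buffered until the next press (the README does not say whether such text is kept or discarded).

```rust
/// Hypothetical push-to-talk gate: transcripts are always buffered, but they
/// are only forwarded to the preview/injection window while the hotkey is held.
#[derive(Default)]
struct PushToTalkGate {
    held: bool,
    buffer: Vec<String>,
}

impl PushToTalkGate {
    /// Called by the always-on STT stage for every new transcript fragment.
    fn push_text(&mut self, text: &str) {
        self.buffer.push(text.to_string());
    }

    /// Updates the key state and returns any fragments to forward immediately.
    fn set_held(&mut self, held: bool) -> Vec<String> {
        self.held = held;
        self.drain_if_held()
    }

    /// Forwards everything buffered so far, but only while the key is held.
    fn drain_if_held(&mut self) -> Vec<String> {
        if self.held {
            std::mem::take(&mut self.buffer)
        } else {
            Vec::new()
        }
    }
}
```

Separating buffering from forwarding keeps the STT stage unaware of the hotkey, which matches the README's description of an always-on pipeline with manual control layered on top.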

More detail: See CLAUDE.md for the full developer guide.

Python 3.13 and PyO3

If your system default Python is 3.13, the pyo3 version currently used by the workspace may report it as an unsupported Python version during the build. There are two options:

Prefer Python 3.12 for development tools, or
Build using the stable Python ABI by exporting:

set -gx PYO3_USE_ABI3_FORWARD_COMPATIBILITY 1  # fish shell
export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1  # bash/zsh equivalent
cargo check

We plan to upgrade pyo3 in a follow-up to remove this requirement.

Future Vision (Experimental)

We're actively exploring an always-on intelligent listening architecture that keeps a lightweight listener running continuously and spins up tiered STT engines on demand.
This speculative work includes decoupled listening/processing threads, dynamic STT memory management, and context-aware activation.
Read the full experimental plan in docs/architecture.md. Treat it as research guidance, not a committed roadmap.

Slow / Environment-Sensitive Tests

Some end-to-end tests exercise real text injection and STT. Run them locally with --ignored; an environment variable to gate them is planned:

export COLDVOX_SLOW_TESTS=1
cargo test -- --ignored
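Because cargo test -- --ignored runs only tests marked #[ignore], a slow test could combine that attribute with the planned COLDVOX_SLOW_TESTS check. This is a sketch under assumptions: the accepted values ("1" or "true") and the helper names slow_tests_enabled_from and slow_tests_enabled are hypothetical, since the gate is still only planned.

```rust
use std::env;

/// Hypothetical interpretation of the planned gate: "1" or "true" enables
/// slow tests; any other value, or no value, leaves them skipped.
fn slow_tests_enabled_from(value: Option<&str>) -> bool {
    matches!(value.map(str::trim), Some("1") | Some("true"))
}

/// Reads COLDVOX_SLOW_TESTS from the process environment.
fn slow_tests_enabled() -> bool {
    slow_tests_enabled_from(env::var("COLDVOX_SLOW_TESTS").ok().as_deref())
}

#[test]
#[ignore = "slow: exercises real STT and text injection"]
fn end_to_end_dictation() {
    if !slow_tests_enabled() {
        eprintln!("skipping: set COLDVOX_SLOW_TESTS=1 to run");
        return;
    }
    // Drive the real capture -> VAD -> STT -> injection pipeline here (omitted).
}
```

The #[ignore] attribute keeps the test out of a plain cargo test run, while the environment check adds a second, explicit opt-in for machines with real audio and display access.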

Headless behavior notes: see docs/text_injection_headless.md.

License

Dual-licensed under MIT or Apache-2.0. See LICENSE-MIT and LICENSE-APACHE if present; otherwise, see the license field in each crate's Cargo.toml manifest.

Contributing

Review the Master Documentation Playbook.
Follow the repository Documentation Standards.
Coordinate work through the Documentation Todo Backlog.
Assistants should read the Assistant Interaction Index.
