---
title: RAG Bible
emoji: 📖
colorFrom: yellow
colorTo: red
sdk: docker
app_port: 7860
pinned: false
preload_from_hub:
startup_duration_timeout: 10m
---
# RAG Bible

Semantic search through the French Bible, powered by retrieval-augmented generation.
recherche-biblique.com
- About
- Features
- Quick Start
- Usage
- Architecture
- Project Structure
- Tech Stack
- Development
- Deployment
- Contributing
- Supporting the Project
- License
- Acknowledgments
## About

Finding a half-remembered verse in a 35,000-verse corpus is hard when you only remember the idea, not the exact words. Traditional keyword search falls short because biblical language is rich with synonyms, metaphors, and paraphrase.
RAG Bible solves this with semantic search: describe a concept in plain French and get the most relevant verses back, ranked by meaning — not just keyword overlap. It combines a multilingual sentence transformer for broad recall with a cross-encoder reranker for precision, delivering results in under two seconds.
The entire system runs locally with no external API calls, no paid dependencies, and no GPU required. It ships as a single Docker container and is deployed on Hugging Face Spaces for anyone to try.
## Features

- Semantic search — find verses by meaning, not just keywords
- Two-stage retrieval — FAISS vector search for recall, cross-encoder reranking for precision
- Contextual results — each match includes surrounding verses for readable context
- Instant startup — background pipeline loading; UI available in < 1s, search auto-retries until models are ready
- Fast — sub-2s response times on CPU
- 35,000+ verses — complete French Bible (AELF translation)
- PWA-ready — offline support via service worker, installable on mobile
- Per-verse feedback — thumbs up/down on results, synced to HuggingFace Dataset
- Self-contained — no external APIs, runs entirely on your machine
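The per-verse feedback flow described above can be pictured as a small in-memory buffer that is flushed in batches. This is an illustrative sketch only: the class name `FeedbackBuffer`, the `flush_threshold` parameter, and the event shape are assumptions, not the project's actual API, and the real `flush` pushes to a HuggingFace Dataset rather than returning the batch.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackBuffer:
    """Illustrative buffer: collect thumbs up/down events, flush in batches."""

    flush_threshold: int = 10
    _events: list[dict] = field(default_factory=list)

    def add(self, verse_id: str, positive: bool) -> None:
        # Record one thumbs up/down event; flush once the buffer is full.
        self._events.append({"verse_id": verse_id, "positive": positive})
        if len(self._events) >= self.flush_threshold:
            self.flush()

    def flush(self) -> list[dict]:
        # In the real app this step would sync the batch to a HuggingFace
        # Dataset; here we just drain the buffer and return the batch.
        batch, self._events = self._events, []
        return batch
```

Buffering keeps the `/feedback` endpoint fire-and-forget: the HTTP handler only appends to the buffer, so the slow Dataset sync never blocks a response.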
## Quick Start

- Python 3.12+
- uv package manager
- `bible.db` placed in `data/` (SQLite database with AELF verses)

```shell
make install  # install dependencies + pre-commit hooks
make ingest   # build FAISS index from bible.db (~1 min)
make serve    # start dev server at http://localhost:8000
```

Open http://localhost:8000 in your browser.
## Usage

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Main SPA entry point |
| POST | `/search` | Search (accepts `query` form field, returns HTML fragment) |
| GET | `/health` | Health check (200 `ok` or 503 `loading`) |
| GET | `/robots.txt` | Robots.txt for crawlers |
| GET | `/sitemap.xml` | XML sitemap for crawlers |
| POST | `/feedback` | Per-verse feedback (fire-and-forget, returns 204) |
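Calling the API from a script might look like the following. The `query` form field and the HTML-fragment response follow the endpoint table; the helper names and the `localhost:8000` base URL (the dev server from Quick Start) are illustrative assumptions.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # dev server address; adjust as needed


def build_search_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # /search expects a `query` form field and returns an HTML fragment.
    data = urllib.parse.urlencode({"query": query}).encode()
    return urllib.request.Request(f"{base_url}/search", data=data, method="POST")


def search(query: str) -> str:
    # Send the request and return the rendered HTML fragment as text.
    with urllib.request.urlopen(build_search_request(query)) as resp:
        return resp.read().decode()
```

With the dev server running, `search("la brebis perdue")` would return the same HTML fragment the SPA swaps into the page via HTMX.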
## Architecture

Ingestion reads `bible.db`, filters short/non-content verses (< 10 chars or < 3 words), encodes them with a multilingual sentence transformer, L2-normalizes the embeddings, and stores them in a FAISS `IndexFlatIP` index alongside a JSON mapping of verse metadata.
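The filtering step can be expressed as a simple predicate. This is a sketch of the rule as described above; the exact constants and function name in `ingest.py` may differ.

```python
def is_content_verse(text: str, min_chars: int = 10, min_words: int = 3) -> bool:
    """Keep only verses long enough to carry searchable meaning.

    Mirrors the rule described above: drop verses shorter than 10
    characters or with fewer than 3 words. Thresholds here are
    illustrative defaults, not necessarily the project's constants.
    """
    stripped = text.strip()
    return len(stripped) >= min_chars and len(stripped.split()) >= min_words
```

Filtering before embedding keeps headings, verse numbers, and one-word fragments out of the index, so every FAISS hit is a verse worth returning.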
Search sanitizes the user query, encodes it with the same model, retrieves the top-K candidates via FAISS inner product (equivalent to cosine similarity for normalized vectors), then reranks with a cross-encoder. Raw reranker scores are sigmoid-normalized so 0.5 maps to the decision boundary. Each result is returned with surrounding context verses, bounded by book.
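The scoring math behind both stages can be illustrated with plain NumPy. This is a self-contained sketch with random vectors standing in for real embeddings; the actual pipeline uses the FAISS index and a cross-encoder model, and the dimensions/top-K values here are assumptions.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    # Scale vectors to unit length so inner product == cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def sigmoid(x: np.ndarray) -> np.ndarray:
    # Squash raw reranker logits into (0, 1); logit 0 maps to 0.5.
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)
query = l2_normalize(rng.standard_normal(384))            # encoded query
corpus = l2_normalize(rng.standard_normal((1000, 384)))   # encoded verses

# Stage 1: inner product over unit vectors is exactly cosine similarity,
# which is what IndexFlatIP computes on L2-normalized embeddings.
scores = corpus @ query
top_k = np.argsort(-scores)[:50]

# Stage 2 would rescore these 50 candidates with the cross-encoder and
# sigmoid-normalize the logits, so 0.5 marks the decision boundary.
assert sigmoid(np.array(0.0)) == 0.5
```

The two stages trade off differently: the bi-encoder scores all 35,000 verses in one matrix product (cheap, broad recall), while the cross-encoder reads each query-verse pair jointly (expensive, high precision), which is why it only sees the top-K candidates.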
## Project Structure

```
config.py                 # Central configuration (paths, models, thresholds)
app.py                    # FastAPI application + background pipeline loading
rag/                      # Core package
    embeddings.py         # Model loading and text encoding
    ingest.py             # Ingestion: filter, embed, index
    retrieve.py           # Two-stage retrieval: FAISS + cross-encoder
    feedback.py           # Per-verse feedback buffer + HF Dataset flush
templates/                # Jinja2 HTML fragments
    results.html          # Search results (Embla Carousel)
    context_verses.html   # Surrounding context verses
    loading.html          # Loading state with HTMX auto-retry
    error.html            # Error display
    no_results.html       # No results feedback
static/                   # Frontend assets (no build step)
    index.html            # SPA entry point (HTMX)
    styles.css            # Design system (CSS custom properties)
    app.js                # Component initializers
    service-worker.js     # Cache-first PWA worker
    manifest.json         # PWA manifest
    favicon.svg           # Inline SVG favicon
tests/                    # Test suite
    conftest.py           # Shared fixtures (mock_pipeline, etc.)
    test_app.py           # App endpoint tests
    test_embeddings.py    # Embedding model tests
    test_feedback.py      # Feedback pipeline tests
    test_ingest.py        # Ingestion pipeline tests
    test_integration.py   # Integration tests (requires models + data/)
    test_retrieve.py      # Retrieval logic tests
data/                     # Generated artifacts (gitignored)
    bible.db              # SQLite source database
    index.faiss           # FAISS vector index
    mapping.json          # Verse metadata mapping
```
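`config.py` centralizes paths, model names, and thresholds. A hypothetical sketch of its shape follows; every value below is an illustrative assumption inferred from the structure and tech stack described here, not the project's actual configuration.

```python
from pathlib import Path

# All values are illustrative; check the real config.py for the actual ones.
DATA_DIR = Path("data")
DB_PATH = DATA_DIR / "bible.db"
INDEX_PATH = DATA_DIR / "index.faiss"
MAPPING_PATH = DATA_DIR / "mapping.json"

# Model identifiers (Hugging Face Hub names assumed from the tech stack table).
EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
RERANKER_MODEL = "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"

# Retrieval thresholds (example values).
TOP_K_RETRIEVE = 50    # FAISS candidates passed to the reranker
TOP_K_RESULTS = 10     # final results shown to the user
SCORE_THRESHOLD = 0.5  # sigmoid-normalized reranker decision boundary
```

Keeping these in one module means ingestion, retrieval, and the app all read the same paths and thresholds, which is what makes the generated `data/` artifacts portable between `make ingest` and `make serve`.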
## Tech Stack

| Component | Technology | Role |
|---|---|---|
| Embeddings | `paraphrase-multilingual-MiniLM-L12-v2` | 384-dim multilingual sentence encoder |
| Reranker | `mmarco-mMiniLMv2-L12-H384-v1` | Cross-encoder for precision reranking |
| Inference | ONNX Runtime | Optimized CPU inference backend |
| Vector index | FAISS (`IndexFlatIP`) | Fast inner-product similarity search |
| Backend | FastAPI + Uvicorn | Async HTTP server |
| Frontend | HTMX + Embla Carousel + vanilla CSS/JS | No-build interactive UI |
| Templating | Jinja2 | Server-rendered HTML fragments |
| Package manager | uv | Fast Python dependency management |
| Linting | Ruff | Lint + format |
| Type checking | mypy (strict) | Static type analysis |
| CI/CD | GitHub Actions | Quality gate + deploy to HF Spaces |
## Development

All commands use uv. See the `Makefile` for details.
```shell
make test-unit         # unit tests (fast, no models needed)
make test-integration  # integration tests (requires models + data/)
make test-all          # all tests
make lint              # ruff check + format check
make typecheck         # mypy strict
make check             # lint + typecheck
make format            # auto-fix formatting and lint issues
```

Run a single test:

```shell
uv run pytest tests/test_app.py::test_name -v
```

## Deployment

```shell
make docker-build  # build image
make docker-serve  # run on port 7860
```

The project deploys automatically via GitHub Actions. On every push to `master`:
- Quality gate — runs lint and unit tests
- Deploy — syncs code to the HF Space, which builds the Docker image and serves the app
The `data/` directory (FAISS index, mapping, database) is stored on the HF Space via Git LFS and is not re-uploaded on each deploy.
## Contributing

Contributions are welcome. Please read CONTRIBUTING.md for setup instructions, coding conventions, and the pull request process.
## Supporting the Project

If you find this tool useful, consider supporting its development:
## License

This project is licensed under the MIT License. See LICENSE for details.
## Acknowledgments

- AELF for the French Bible translation
- Sentence Transformers for multilingual embedding models
- FAISS for efficient similarity search
- FastAPI and HTMX for the web stack
- Hugging Face for free Spaces hosting
