AI-Powered GitHub Repository Analysis Platform
🚀 Live Demo → rexplain.vercel.app
RExplain automatically explains any GitHub repository using static analysis, architecture extraction, and Retrieval-Augmented Generation (RAG), so you can understand a large codebase without reading every file manually.
Simply paste a GitHub URL and get an instant, AI-powered breakdown of the codebase.
| Feature | Description |
|---|---|
| 🧠 Framework Detection | Detects FastAPI, Flask, Django, React, Vue, Angular, SQLAlchemy, PostgreSQL, MongoDB, and more |
| 🏗️ Architecture Diagrams | Generates dependency graphs and visual architecture diagrams |
| 📄 README Extraction | Fetches and renders the repository README with syntax-highlighted code blocks |
| 📊 Commit Analytics | Visualizes commit history, frequency, and contribution graphs |
| 🤖 AI Chatbot (RAG) | Ask natural-language questions about any repository |
| ⚡ Instant Reloads | PostgreSQL-backed cache skips re-analysis on unchanged repositories |
- FastAPI — REST API framework
- PostgreSQL + SQLAlchemy — persistent storage and ORM
- SentenceTransformers (`all-MiniLM-L6-v2`) — embedding generation
- Groq API — LLM inference
- Graphviz — architecture diagram rendering
- GitHub REST API — repository metadata and tree fetching
- Docker — containerized deployment
- React + Vite — fast frontend build
- TailwindCSS — utility-first styling
- Framer Motion — animations
- React Markdown — README rendering
- Axios — HTTP client
GitHub URL
↓
Repository metadata extraction
↓
Selective file fetching (GitHub Tree API)
↓
Framework detection
↓
Architecture generation
↓
README + intelligence extraction
↓
RAG indexing
↓
Interactive AI chat
Instead of cloning entire repositories (slow for large repositories and wasteful when they contain binaries, node_modules, or datasets), RExplain:
- Fetches the repository tree structure via the GitHub API
- Identifies and selectively retrieves only the important files: `README`, `package.json`, `requirements.txt`, route files, configs, and manifests
- Falls back to a shallow clone (`depth=1`) only if the API is unavailable
Result: fast analysis even for large repositories, with no waiting on a full clone.
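The tree-first strategy can be sketched with the standard library alone. `GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1` is GitHub's Git Trees endpoint, which returns every path in one response along with a `truncated` flag for oversized trees. The selection rules (`IMPORTANT_NAMES`, `IMPORTANT_HINTS`, `SKIP_DIRS`, `MAX_BYTES`) and the function names are assumptions for illustration, not RExplain's actual filter.

```python
import json
import urllib.request

# Assumed selection rules; RExplain's real list of "important" files may differ.
IMPORTANT_NAMES = {"readme.md", "readme", "package.json", "requirements.txt",
                   "pyproject.toml", "dockerfile", "docker-compose.yml"}
IMPORTANT_HINTS = ("route", "router", "config", "settings")
SKIP_DIRS = {"node_modules", ".git", "dist", "build", "venv", "__pycache__"}
MAX_BYTES = 200_000  # hypothetical cap that keeps binaries and datasets out


def tree_url(owner: str, repo: str, branch: str) -> str:
    """Git Trees endpoint; recursive=1 flattens the whole tree into one list."""
    return f"https://api.github.com/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"


def select_important(entries: list[dict]) -> list[str]:
    """Keep small manifest/route/config files; skip vendored dirs and large blobs."""
    selected = []
    for entry in entries:
        if entry.get("type") != "blob":  # directories appear as type "tree"
            continue
        path = entry["path"]
        parts = path.split("/")
        if any(part in SKIP_DIRS for part in parts[:-1]):
            continue
        if entry.get("size", 0) > MAX_BYTES:
            continue
        name = parts[-1].lower()
        if name in IMPORTANT_NAMES or any(hint in name for hint in IMPORTANT_HINTS):
            selected.append(path)
    return selected


def fetch_tree(owner: str, repo: str, branch: str) -> dict:
    """Network call: returns the JSON payload with "tree" and "truncated" keys."""
    req = urllib.request.Request(tree_url(owner, repo, branch),
                                 headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

A caller would pass `fetch_tree(...)["tree"]` to `select_important`; a `truncated: true` payload or a failed request is one reasonable trigger for the shallow-clone fallback.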
Embeddings and chunk metadata are stored in PostgreSQL so repeated loads are instant:
- Fetch the latest commit SHA and compare with the cached SHA
- If unchanged → restore embeddings and chunks, skip full analysis
- If changed → run the full pipeline and persist the new state
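The SHA comparison above can be sketched as follows. `AnalysisCache` is an in-memory stand-in for the PostgreSQL table, and `load_or_analyze`, `latest_sha`, and `run_pipeline` are hypothetical names. In practice `latest_sha` might call GitHub's `GET /repos/{owner}/{repo}/commits/{ref}` endpoint and read the `sha` field of the response.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedAnalysis:
    commit_sha: str
    chunks: list[str]
    embeddings: list[list[float]]


class AnalysisCache:
    """In-memory stand-in for the PostgreSQL cache, keyed by 'owner/repo'."""

    def __init__(self) -> None:
        self._rows: dict[str, CachedAnalysis] = {}

    def get(self, repo: str) -> Optional[CachedAnalysis]:
        return self._rows.get(repo)

    def put(self, repo: str, analysis: CachedAnalysis) -> None:
        self._rows[repo] = analysis


def load_or_analyze(
    repo: str,
    cache: AnalysisCache,
    latest_sha: Callable[[str], str],
    run_pipeline: Callable[[str, str], CachedAnalysis],
) -> tuple[CachedAnalysis, bool]:
    """Return (analysis, cache_hit). Only runs the pipeline when the SHA moved."""
    sha = latest_sha(repo)
    cached = cache.get(repo)
    if cached is not None and cached.commit_sha == sha:
        return cached, True  # unchanged: restore chunks and embeddings as-is
    fresh = run_pipeline(repo, sha)  # changed or never seen: full analysis
    cache.put(repo, fresh)
    return fresh, False
```

Keying on the commit SHA rather than a timestamp means any push invalidates the cache, while repeat visits between pushes cost one small API call.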
The embedding model loads once at backend startup rather than per-request, eliminating timeout issues during AI chat.
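A minimal sketch of the load-once pattern, assuming a hypothetical `ModelHolder` wrapper. The loader is injected so the pattern works with any model; a lock ensures that concurrent first requests still trigger only one load.

```python
import threading
from typing import Any, Callable, Optional


class ModelHolder:
    """Loads an expensive model once, then returns the same instance on every call."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so concurrent first callers load only once.
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

In a FastAPI app, the holder could wrap `lambda: SentenceTransformer("all-MiniLM-L6-v2")`, and `.get()` could be called from a lifespan handler passed as `FastAPI(lifespan=...)`, so the load happens at startup instead of inside the first chat request.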
Users can ask natural-language questions about any analyzed repository:
- "What frameworks are used?"
- "Where is the database set up?"
- "How does authentication work?"
- "What are the main API routes?"
How it works:
- Repository files are chunked and embedded using `all-MiniLM-L6-v2`
- On each question, relevant chunks are retrieved via semantic similarity search
- Retrieved context is passed to an LLM (Groq) for a grounded, repository-specific answer
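The three steps above can be sketched in plain Python. The line-window chunking, cosine top-k retrieval, and prompt wording are assumptions, and `chunk_text`, `top_k`, and `build_prompt` are hypothetical names. Vectors are passed in directly so the sketch runs without the model. In the real pipeline they would come from `SentenceTransformer.encode`, and the prompt would be sent to Groq.

```python
import math
from typing import Sequence

Vector = list[float]


def chunk_text(text: str, max_lines: int = 40, overlap: int = 5) -> list[str]:
    """Split a file into overlapping windows of lines (assumed chunking scheme)."""
    if not 0 <= overlap < max_lines:
        raise ValueError("overlap must be smaller than max_lines")
    lines = text.splitlines()
    step = max_lines - overlap
    chunks = []
    for start in range(0, max(len(lines), 1), step):
        window = lines[start:start + max_lines]
        if window:
            chunks.append("\n".join(window))
        if start + max_lines >= len(lines):
            break  # the last window already reached the end of the file
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(question_vec: Vector, chunk_vecs: list[Vector],
          chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the question."""
    scored = sorted(zip(chunk_vecs, chunks),
                    key=lambda pair: cosine(question_vec, pair[0]), reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the LLM in retrieved excerpts only."""
    joined = "\n---\n".join(context)
    return ("Answer using only the repository excerpts below. "
            "If the answer is not in them, say so.\n\n"
            f"Excerpts:\n{joined}\n\nQuestion: {question}")
```

A linear scan is fine for one repository's chunks. The overlap keeps a function that straddles a window boundary visible in at least one chunk.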
┌─────────────────────────────────┐
│ React Frontend │ ← Vercel
│ (Analysis Panel + AI Chat) │
└────────────────┬────────────────┘
│ HTTPS
┌────────────────▼────────────────┐
│ FastAPI Backend │ ← Render (Docker)
│ Analysis · RAG · Cache Layer │
└──────┬──────────────────┬───────┘
│ │
┌──────▼──────┐ ┌───────▼──────┐
│ PostgreSQL │ │ GitHub API │
│ (Neon) │ │ Groq API │
└─────────────┘ └──────────────┘
| Layer | Platform |
|---|---|
| Frontend | Vercel |
| Backend | Render (Docker) |
| Database | Neon PostgreSQL |
Docker is used for the backend because Graphviz requires system-level installation, which Render's standard Python runtime does not support natively.
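The split behind that requirement can be seen in a small sketch. Producing DOT source is plain text generation that Python handles alone, while rendering it to SVG or PNG shells out to the Graphviz `dot` executable, which has to be installed at the OS level (for example in the Docker image). `to_dot` and `dot_available` are hypothetical helpers, not RExplain's code.

```python
import shutil


def to_dot(deps: dict[str, list[str]], name: str = "architecture") -> str:
    """Serialize a module dependency map into Graphviz DOT source text."""
    lines = [f'digraph "{name}" {{', "  rankdir=LR;", "  node [shape=box];"]
    for src in sorted(deps):
        lines.append(f'  "{src}";')
        for dst in sorted(deps[src]):
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


def dot_available() -> bool:
    """Rendering needs the system `dot` binary on PATH, not just a pip package."""
    return shutil.which("dot") is not None
```

With the binary present, a saved graph can be rendered with `dot -Tsvg architecture.dot -o architecture.svg`.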
- GitHub API Timeouts — Very large repositories may be slower to analyze; retry logic and graceful fallbacks are in place
- Cold Starts — Render's free tier sleeps inactive services; the first request after inactivity may take a few seconds longer
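The retry logic is not described in detail here. One common shape for it is exponential backoff with jitter, sketched below with a hypothetical `with_retries` helper. Which exceptions count as retryable is an assumption, and `sleep` is injectable so the behavior can be tested without real waits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 4, base_delay: float = 0.5,
                 retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a flaky call with exponential backoff plus jitter; re-raise on the last try."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # 0.5s, 1s, 2s, ... plus up to 100 ms of jitter to avoid synchronized retries
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Doubling the delay gives a rate-limited or overloaded API room to recover, and the jitter spreads out retries from concurrent requests.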
- Interactive Diagrams — Replace Graphviz with React Flow for clickable, interactive dependency graphs
- Advanced RAG — Hybrid retrieval, AST-aware chunking, reranking, and source citations
- Streaming Responses — Real-time streaming AI chat
- Multi-Repository Analysis — Analyze microservice ecosystems and org-wide dependency relationships
- Background Workers — Celery + Redis for async job pipelines