The open-source AI gateway that intelligently routes between free and paid LLMs.
# The Problem
- Free AI tiers are fragmented. Groq, Google AI Studio, OpenRouter, Together, Mistral — all have free tiers with different formats, limits, and reliability.
- Rate limits break your app. You hit a 429 and your entire pipeline stops.
- No smart routing. Simple tasks waste premium credits, complex tasks fail on free tiers.
# The Solution
FreeRelay is a self-hosted AI gateway that automatically chooses the best provider for each request.
- Free mode: Uses only free providers (Groq, Google, OpenRouter, etc.)
- Paid mode: Uses OpenAI, Anthropic for maximum quality
- Auto mode: Free by default, intelligently switches to paid for complex tasks
```
┌────────────────┐        ┌────────────────────────────────────────┐
│    Your App    │        │           FreeRelay Gateway            │
│                │        │                                        │
│   OpenAI SDK   │───────▶│     Task Complexity Detection          │
│   LangChain    │        │     Smart Provider Routing             │
│   raw HTTP     │        │     Circuit Breakers + Fallback        │
│                │        │     Budget Forecasting                 │
└────────────────┘        └─────────────┬──────────────────────────┘
                                        │
           ┌────────────────────────────┼────────────────────────────┐
           │                            │                            │
           ▼                            ▼                            ▼
    ┌─────────────┐              ┌─────────────┐              ┌─────────────┐
    │    FREE     │              │    FREE     │              │    PAID     │
    │    tier     │              │    tier     │              │    tier     │
    │    Groq     │              │ OpenRouter  │              │    GPT-4    │
    │   Google    │              │  Together   │              │   Claude    │
    └─────────────┘              └─────────────┘              └─────────────┘
```
# ⚡ Quick Start

```bash
# Install & run - works out of the box!
pip install -e .
freerelay
```

That's it! FreeRelay runs in auto mode at http://localhost:8000.
## Guided Setup

```bash
# Interactive wizard to add API keys
freerelay setup
```

## Test it
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

## Use with OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="freerelay-auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

# Modes
| Mode | Description | Use Case |
|---|---|---|
| `free` | Only free providers | Budget-conscious apps |
| `paid` | Only OpenAI/Anthropic | Maximum quality |
| `auto` | Free + paid routing | Recommended - smart switching |
Auto mode automatically routes complex tasks (deep analysis, coding, large context) to paid providers while keeping simple tasks on free tier.
# Supported Providers
## Free Tier
| Provider | Models | RPM | Best For |
|---|---|---|---|
| Groq | llama-3.1, mixtral-8x7b | 30 | ⚡ Speed |
| Google | gemini-1.5-flash | 15 | 🌐 Large context |
| OpenRouter | llama-3.1, mistral-7b | 20 | 🔄 Most models |
| Together AI | llama-3.1, qwen2 | 60 | 📦 Batch |
| Mistral | mistral-small | — | 🇫🇷 Multilingual |
| NVIDIA | llama-3.1, mixtral | 40 | 🎮 GPU optimized |
## Paid Tier
| Provider | Models | Best For |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | 🌟 Best overall |
| Anthropic | claude-3.5-sonnet | 📝 Long context |
# 🔑 How to Get API Keys (Step by Step)
## Groq (Free)
- Go to https://console.groq.com/keys
- Click Sign Up (or Log In if you have an account)
- Verify your email
- Click Create API Key
- Copy the key (starts with `gsk_...`)
- Add to `.env`: `GROQ_API_KEY=gsk_your_key_here`
## Google AI Studio (Free)
- Go to https://aistudio.google.com/apikey
- Sign in with your Google account
- Click Create API Key
- Select a project (or create a new one)
- Copy the key
- Add to `.env`: `GOOGLE_AI_KEY=your_key_here`
## OpenRouter (Free)
- Go to https://openrouter.ai/keys
- Click Sign Up (or Log In)
- Click Create Key
- Give it a name (e.g., "FreeRelay")
- Copy the key (starts with `sk-or-...`)
- Add to `.env`: `OPENROUTER_API_KEY=sk-or-your_key_here`
## Together AI (Free)
- Go to https://api.together.xyz
- Click Sign Up or Log In
- Go to Settings → API Keys
- Click Create new API key
- Copy the key
- Add to `.env`: `TOGETHER_API_KEY=your_key_here`
## Mistral AI (Free)
- Go to https://console.mistral.ai/api-keys/
- Sign up or log in
- Click Create new key
- Give it a name
- Copy the key
- Add to `.env`: `MISTRAL_API_KEY=your_key_here`
## NVIDIA Build (Free)
- Go to https://build.nvidia.com/explore/recommended
- Click Sign Up (or Log In)
- Go to Settings → API Keys
- Click Generate API Key
- Copy the key (starts with `nvapi-...`)
- Add to `.env`: `NVIDIA_API_KEY=nvapi-your_key_here`
## OpenAI (Paid)
- Go to https://platform.openai.com/api-keys
- Sign up or log in
- Click Create new secret key
- Name it (e.g., "FreeRelay")
- Copy the key (starts with `sk-...`)
- Add to `.env`: `OPENAI_API_KEY=sk-your_key_here`
## Anthropic (Paid)
- Go to https://console.anthropic.com/settings/keys
- Sign up or log in
- Click Create Key
- Name it (e.g., "FreeRelay")
- Copy the key (starts with `sk-ant-...`)
- Add to `.env`: `ANTHROPIC_API_KEY=sk-ant-your_key_here`
# Configuration
After getting your API keys, edit `.env`:

```bash
# Mode: free, paid, or auto
FREERELAY_MODE=auto

# Free providers
GROQ_API_KEY=gsk_your_key_here
GOOGLE_AI_KEY=your_key_here
OPENROUTER_API_KEY=sk-or-your_key_here
TOGETHER_API_KEY=your_key_here
MISTRAL_API_KEY=your_key_here
NVIDIA_API_KEY=nvapi-your_key_here

# Paid providers (optional)
OPENAI_API_KEY=sk-your_key_here
ANTHROPIC_API_KEY=sk-ant-your_key_here
```

# Features That Set FreeRelay Apart
FreeRelay implements the v3 MAX inference specification documented in `docs/free_relay_v3_max_spec.md` (originally authored as `FreeRelay_v3_MAX.zip`). The spec describes an inference operating system that profiles every request, routes on expected outcomes, orchestrates declarative DAGs, validates and repairs responses, and runs a policy-grade control plane behind the scenes.
## 🧠 Workload Profiling & Context Engineering

Every request is profiled on ten axes (task family, depth, precision, latency class, context topology, tools, determinism, safety, output contract, and economics) in under 5ms, without any LLM calls. A context optimizer salience-ranks history, packs the highest-value lanes (instructions, memory, facts, tools, scratch), and rewrites prompts per provider signature before execution.
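To make the ten axes concrete, here is a minimal sketch of a profile as a data structure; the field types and example values are assumptions, and the authoritative schema lives in the spec:

```python
from dataclasses import dataclass, field

@dataclass
class WorkloadProfile:
    # Ten profiling axes; names follow the list above, types are illustrative.
    task_family: str           # "coding", "math", "creative", "multilingual", "chat"
    depth: str                 # shallow lookup vs. deep analysis
    precision: str             # tolerance for approximation in the output
    latency_class: str         # interactive vs. batch
    context_topology: str      # flat prompt vs. multi-lane context
    tools: list[str] = field(default_factory=list)  # tools the request may invoke
    determinism: bool = False  # reproducible decoding required?
    safety: str = "standard"   # safety tier
    output_contract: str = "text"  # free text, JSON schema, AST, ...
    economics: str = "free-first"  # budget sensitivity of the request
```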
## ⚖️ Outcome-Aware Routing & Policy Engine

The router scores every provider-model on an expected utility formula that blends learned success probabilities, judge-derived quality scores, schema-compliance estimates, latency/cost/safety utilities, tenant policy weights, circuit state, budget health, and a UCB exploration bonus. Policy DSL rules can prefer/require/exclude providers, cap temperature, enable hedging, or fuse validators before the highest-utility decision is made.
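In code, the blend might read like the sketch below; the weights, field names, and the placement of the hard exclusions are assumptions for illustration, not the spec's exact schema:

```python
import math

def expected_utility(stats: dict, weights: dict, total_pulls: int) -> float:
    """Hedged sketch of the routing score for one provider-model pair."""
    if stats["circuit_open"] or stats["budget_exhausted"]:
        return float("-inf")  # hard-excluded by circuit state / budget health
    exploit = (
        weights["success"] * stats["p_success"]          # learned success probability
        + weights["quality"] * stats["judge_quality"]    # judge-derived quality score
        + weights["schema"] * stats["p_schema_ok"]       # schema-compliance estimate
        + weights["latency"] * stats["latency_utility"]  # higher = faster
        + weights["cost"] * stats["cost_utility"]        # higher = cheaper
        + weights["safety"] * stats["safety_utility"]
    )
    # UCB exploration bonus: favor provider-model pairs with few observations
    explore = math.sqrt(2 * math.log(max(total_pulls, 2)) / max(stats["pulls"], 1))
    return exploit + weights["explore"] * explore
```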
## 🧵 Multi-Step Execution DAG & Validation

Execution graphs replace one-shot requests. Workflows chain classifiers, generators, validators, judges, repair FSMs, tool nodes, speculative decomposers, and hedging strategies with conditional transitions (verification_failed, tool_error, etc.). Validation happens in tiers—structural (JSON/AST/schema), semantic (heuristics, spaCy), and asynchronous judges—and failures trigger repair attempts (stronger prompts, deterministic decoding, provider escalation) before the response leaves the system.
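A workflow of this shape could be declared as plain data; in the minimal sketch below, node kinds and most labels are assumptions, while `verification_failed` and `tool_error` are the transition names mentioned above:

```python
# Hedged sketch of a declarative execution DAG, expressed as a dict.
workflow = {
    "classify": {"kind": "classifier", "next": "generate"},
    "generate": {"kind": "generator", "next": "validate"},
    "validate": {
        "kind": "validator",  # structural -> semantic -> async judge tiers
        "on_success": "done",
        "on": {
            "verification_failed": "repair",  # escalate into the repair FSM
            "tool_error": "generate",         # retry generation after a tool failure
        },
    },
    "repair": {
        "kind": "repair_fsm",  # stronger prompt, deterministic decoding, provider escalation
        "next": "validate",
        "max_attempts": 2,
    },
}
```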
## 🛡️ Correctness, Resilience & Streaming

Circuit breakers (Lua-backed CLOSED/HALF_OPEN/OPEN), EWMA budget forecasting, AIMD concurrency, brownout, and chaos-mode resilience protect downstream clients. Streaming uses backpressured SSE proxies with bounded queues and deterministic resume for long-running jobs. Semantic caching (datasketch MinHash + LSH) dedupes prompts, while observability (Prometheus + OpenTelemetry + structured logs) surfaces schema pass rates, retry taxonomies, hallucination signals, and provider drift.
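As one concrete slice of that resilience layer, here is a minimal circuit-breaker sketch covering the three named states; the production breaker is Lua-backed in Redis, and the thresholds below are assumptions:

```python
import time

class CircuitBreaker:
    """Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN cycle."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.state = "CLOSED"
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.opened_at = 0.0

    def allow(self) -> bool:
        if self.state == "OPEN" and time.monotonic() - self.opened_at >= self.cooldown_s:
            self.state = "HALF_OPEN"  # probe with a single trial request
        return self.state != "OPEN"

    def record(self, success: bool) -> None:
        if success:
            self.state, self.failures = "CLOSED", 0
        else:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "OPEN", time.monotonic()
```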
## 🛰️ Control Plane, Economics & Leaderboard

The control plane owns tenant policy objects, capability registry, benchmark catalog, experiments (shadowing, A/B routing, replay simulators, what-if scoring), and the economic engine. Policies cover allowed providers/geographies, cost/latency ceilings, tool restrictions, and fallback chains. Economics optimize cost-per-success, reserve premium budgets, arbitrage bursts, enforce SLA tiers, and forecast token futures. A public leaderboard (hourly aggregates) spots the best provider per task family and keeps privacy intact.
# Feature Comparison
| Feature | FreeRelay | OpenRouter | Portkey | Helicone |
|---|---|---|---|---|
| Outcome-aware routing | ✓ | Partial | – | – |
| Multi-step execution DAGs | ✓ | – | – | – |
| Validation & repair loops | ✓ | – | – | – |
| Policy DSL + experimentation | ✓ | – | – | – |
| Streaming backpressure | ✓ | ✓ | ✓ | N/A |
| OpenAI SDK compatible | ✓ | ✓ | ✓ | ✓ |
| OpenCode/Codex CLI backends | ✓ | – | – | – |
| Skills (coding-supervisor) | ✓ | – | – | – |
# Use With Your Favorite Tools
### Continue.dev (VS Code)
```json
{
  "models": [{
    "title": "FreeRelay",
    "provider": "openai",
    "model": "freerelay-auto",
    "apiBase": "http://localhost:8000/v1"
  }]
}
```

### LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model="freerelay-auto",
)
```

### Node.js / TypeScript
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8000/v1',
  apiKey: 'not-needed',
});
```

### Open WebUI
Set the OpenAI API base to http://localhost:8000/v1. No API key needed.
### OpenClaw
FreeRelay has built-in OpenClaw integration. Start FreeRelay, then fetch the config:
```bash
# Start FreeRelay
python -m freerelay.main

# Fetch the config snippet
curl http://localhost:8000/openclaw/config
```

Option A — Use the onboard wizard (recommended):

```bash
openclaw onboard --install-daemon
# When prompted: Manual → Custom → Base URL: http://localhost:8000/v1 → Model: freerelay/auto
```

Option B — Non-interactive:

```bash
openclaw onboard --non-interactive --accept-risk \
  --auth-choice apiKey --token-provider custom \
  --custom-base-url "http://localhost:8000/v1" \
  --install-daemon --skip-channels --skip-skills
```

Option C — Manual config (`~/.openclaw/openclaw.json`):
```json
{
  "models": {
    "providers": {
      "freerelay": {
        "baseUrl": "http://localhost:8000/v1",
        "apiKey": "not-needed",
        "api": "openai-completions",
        "models": [
          { "id": "auto", "name": "FreeRelay Auto" },
          { "id": "freerelay-groq", "name": "FreeRelay → Groq" },
          { "id": "freerelay-google", "name": "FreeRelay → Google" }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "freerelay/auto" }
    }
  }
}
```

Then run:

```bash
openclaw gateway run
```

Use `freerelay/auto` as the model for workload-aware routing across all free providers.
For more details, see `docs/openclaw-integration.md`.
### OpenCode & Codex
FreeRelay integrates with OpenCode as both an API proxy and CLI backend, plus Codex as a CLI backend.
OpenCode API Proxy (Zen + Go catalogs):
```bash
# Add your OpenCode API key
echo "OPENCODE_API_KEY=your_key_here" >> .env

# Zen catalog models (Claude, GPT, Gemini)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"freerelay/opencode-claude-sonnet","messages":[{"role":"user","content":"Hello"}]}'

# Go catalog models (Kimi, GLM, MiniMax)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"freerelay/opencode-kimi-k2","messages":[{"role":"user","content":"Write a function"}]}'
```

List OpenCode models:

```bash
curl http://localhost:8000/opencode/models
```

CLI Backend (spawn OpenCode/Codex as subprocess):
```bash
# See which backends are available
curl http://localhost:8000/opencode/cli-backends

# Run a task via OpenCode CLI
curl -X POST http://localhost:8000/opencode/cli-run \
  -H "Content-Type: application/json" \
  -d '{"backend":"opencode-cli","prompt":"Write a Python hello world","model":"opencode-claude-sonnet"}'

# Run the same task via Codex CLI
curl -X POST http://localhost:8000/opencode/cli-run \
  -H "Content-Type: application/json" \
  -d '{"backend":"codex-cli","prompt":"Write a Python hello world"}'
```

Skills:

```bash
# List available skills
curl http://localhost:8000/skills

# Skills config for OpenClaw
curl http://localhost:8000/skills/config
```

| Model ID | Catalog | Upstream |
|---|---|---|
| `freerelay/opencode-claude-sonnet` | Zen | Claude Sonnet |
| `freerelay/opencode-claude-haiku` | Zen | Claude Haiku |
| `freerelay/opencode-gpt-4o` | Zen | GPT-4o |
| `freerelay/opencode-gemini-flash` | Zen | Gemini Flash |
| `freerelay/opencode-kimi-k2` | Go | Kimi K2 |
| `freerelay/opencode-glm-4` | Go | GLM-4 |
| `freerelay/opencode-minimax-01` | Go | MiniMax |
CLI backends communicate via JSONL subprocess with API keys cleared from the environment for security.
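That pattern is simple to sketch: spawn the CLI with API keys stripped from the child environment and exchange newline-delimited JSON over stdin/stdout. The function and payload shape below are assumptions for illustration; the real implementation lives in `freerelay/cli_backend/`:

```python
import json
import os
import subprocess

def run_cli_task(command: list[str], prompt: str) -> dict:
    """Hedged sketch of the JSONL subprocess pattern."""
    # Clear API keys from the child environment for security
    env = {k: v for k, v in os.environ.items() if not k.endswith("_API_KEY")}
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
        text=True,
    )
    out, _ = proc.communicate(json.dumps({"prompt": prompt}) + "\n")
    # Each stdout line is one JSON event; return the final one
    return json.loads(out.strip().splitlines()[-1])
```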
# Docker

```bash
cd docker
docker compose up -d
```

Starts: FreeRelay + Redis + Jaeger + Prometheus + Grafana
| Service | URL |
|---|---|
| FreeRelay API | http://localhost:8000 |
| Dashboard | http://localhost:8000/dashboard |
| Jaeger UI | http://localhost:16686 |
| Prometheus | http://localhost:9091 |
| Grafana | http://localhost:3000 (admin/freerelay) |
# 🚀 Deployment
## Railway (Recommended)
- Fork this repository.
- Create a new project on Railway and link your fork.
- Add the required environment variables (see `.env.example`).
- Railway will automatically detect `railway.json` and `docker/Dockerfile` and deploy the gateway.
## Supabase Setup (Authentication & Usage Tracking)
FreeRelay supports Supabase for managing API keys and tracking usage.
- Create a new Supabase project.
- Run the SQL in `supabase_schema.sql` in the SQL Editor to create the necessary tables and indices.
- Set `FREERELAY_ENABLE_SUPABASE_AUTH=true` and provide `SUPABASE_URL`, `SUPABASE_KEY`, and `FREERELAY_SUPABASE_SERVICE_ROLE_KEY` (for admin tasks like registration) in your environment.
## Stripe Integration (Payments)
FreeRelay includes basic Stripe integration for user upgrades.
- Set `STRIPE_SECRET_KEY` and `STRIPE_WEBHOOK_SECRET` in your environment.
- Configure `STRIPE_SUCCESS_URL` and `STRIPE_CANCEL_URL`.
- Use the `/v1/billing/checkout` endpoint to create a checkout session (see the sketch below).
- Set up a Stripe webhook pointing to `https://your-domain.com/v1/billing/webhook` listening for `checkout.session.completed` events.
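For example, creating a session from Python might look like the snippet below; the request body field is a placeholder, so consult the endpoint's actual schema:

```python
import requests  # third-party HTTP client

# Hypothetical payload; the real field names depend on the endpoint's schema.
resp = requests.post(
    "http://localhost:8000/v1/billing/checkout",
    json={"plan": "pro"},  # illustrative field
)
print(resp.json())  # should include the Stripe-hosted checkout URL
```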
# CLI

```bash
# Install the CLI tool
pip install -e .

# Start the gateway
freerelay start

# Chaos mode
freerelay start --chaos

# Check status
freerelay status

# Run a benchmark
freerelay benchmark --requests 50 --concurrent 10
```

# Project Structure
```
freerelay/
├── freerelay/
│   ├── main.py                     # App entry point
│   ├── config/
│   │   ├── settings.py             # Settings
│   │   ├── capability_matrix.yaml  # Provider/model capability DB
│   │   └── routing_rules.yaml      # Routing rules
│   ├── core/
│   │   ├── models/openai.py        # OpenAI wire format (Pydantic v2)
│   │   ├── routing/engine.py       # Routing engine
│   │   ├── routing/classifier.py   # Intent classifier
│   │   ├── execution/hedging.py    # Request hedging
│   │   ├── streaming/backpressure.py
│   │   └── resilience/
│   │       ├── circuit_breaker.py  # CLOSED→OPEN→HALF_OPEN
│   │       ├── budget.py           # Budget forecasting
│   │       └── chaos.py            # Chaos mode
│   ├── providers/                  # Groq, Google, OpenRouter, Together, Mistral, OpenCode
│   ├── middleware/                 # Auth, audit
│   ├── observability/              # Prometheus, structlog, health probes
│   ├── openclaw/                   # OpenClaw integration
│   ├── cli_backend/                # OpenCode/Codex CLI subprocess backends
│   ├── skills/                     # Skills (OpenCode, Codex, Supervisor)
│   └── cli/                        # CLI commands
├── tests/                          # Unit + integration tests
├── docker/                         # Dockerfile + compose stack
├── dashboard/index.html            # Real-time monitoring dashboard
└── docs/                           # Documentation
```
# How Routing Works
- Request arrives → Validated against OpenAI schema
- Intent classified → coding / math / creative / multilingual / chat (< 5ms)
- Providers scored → `capability × budget × circuit_state × (1 / (1 + p95_latency))` (sketched after this list)
- Best provider selected → Request forwarded
- On failure → Circuit breaker updated, next provider tried automatically
- After response → Tokens tracked, budget updated, metrics emitted
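The scoring step in isolation, as a minimal sketch; treating the first three factors as values normalized to [0, 1] is my assumption:

```python
def provider_score(capability: float, budget: float,
                   circuit_state: float, p95_latency_s: float) -> float:
    # capability, budget, and circuit_state assumed in [0, 1]; latency in
    # seconds. Higher score = better candidate.
    return capability * budget * circuit_state * (1 / (1 + p95_latency_s))

# A capable provider (0.9) with a full budget, closed circuit, and 0.4 s p95 latency:
print(provider_score(0.9, 1.0, 1.0, 0.4))  # ≈ 0.64
```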
# FreeRelay v3 MAX Specification
FreeRelay is grounded in the v3 MAX inference operating system documented in `docs/free_relay_v3_max_spec.md` and the bundled `FreeRelay_v3_MAX.zip`. The spec lays out the complete control/data-plane split, Redis schema, workload profile schema, routing decision audit trail, expected utility math, DAG engine, validators/repair loops, capability benchmarking, and the 14-day build plan that drives the repo roadmap.
Key capabilities the spec demands:
- Workload profiling (10 axes + context lanes) that feeds routing, elevators, and observability.
- Outcome-aware routing with expected utility, UCB exploration, policy DSL, validation directives, and hedge signals.
- Multi-step execution DAGs (classification → generation → validators → judges → repairs) plus tool-aware agents and speculative decomposition.
- Resilience: circuit breakers, EWMA budget forecasting, AIMD concurrency, brownout, chaos mode, deterministic resume, and streaming backpressure.
- Control-plane economics, experiments, tenant policy controls, signed audit trails, and the privacy-preserving public leaderboard.
# Roadmap
The v3 MAX spec embeds a 14-day build plan that keeps every merge focused on the same outcome: a workload-aware control plane with intelligent routing, validation, and experiments.
- Days 1-5 — Land the OpenAI wire format, provider adapters, streaming/backpressure, circuit breakers, budget forecasting, and multi-provider execution so requests reliably reach the best backend.
- Days 6-10 — Deliver the profiler (all ten axes), expected utility routing, semantic cache, context pipeline, validation layers, and repair FSMs so every response is intent-aware and correct.
- Days 11-14 — Ship the execution DAG engine, control-plane learner/benchmark/anomaly systems, observability/dashboards, Docker + compose stack, and final docs/CI/packaging polish.
Refer to `docs/free_relay_v3_max_spec.md` for the full day-by-day checklist and done criteria.
# Contributing
Contributions welcome. Start with good first issues.
```bash
git clone https://github.com/HrachShah/FreeRelay.git
cd FreeRelay
pip install -e ".[dev]"
pytest tests/ -v
```

# License
MIT — use it however you want.
If this saved you money, star the repo ⭐
Built by @HrachShah