The open-source AI gateway that intelligently routes between free and paid LLMs.
# The Problem
- Free AI tiers are fragmented. Groq, Google AI Studio, OpenRouter, Together, Mistral — all have free tiers with different formats, limits, and reliability.
- Rate limits break your app. You hit a 429 and your entire pipeline stops.
- No smart routing. Simple tasks waste premium credits, complex tasks fail on free tiers.
# The Solution
FreeRelay is a self-hosted AI gateway that automatically chooses the best provider for each request.
- Free mode: Uses only free providers (Groq, Google, OpenRouter, etc.)
- Paid mode: Uses OpenAI, Anthropic for maximum quality
- Auto mode: Free by default, intelligently switches to paid for complex tasks
```
┌────────────────┐        ┌────────────────────────────────────────┐
│    Your App    │        │           FreeRelay Gateway            │
│                │        │                                        │
│   OpenAI SDK   │───────▶│     Task Complexity Detection          │
│   LangChain    │        │     Smart Provider Routing             │
│   raw HTTP     │        │     Circuit Breakers + Fallback        │
│                │        │     Budget Forecasting                 │
└────────────────┘        └─────────────┬──────────────────────────┘
                                        │
           ┌────────────────────────────┼────────────────────────────┐
           │                            │                            │
           ▼                            ▼                            ▼
    ┌─────────────┐              ┌─────────────┐              ┌─────────────┐
    │    FREE     │              │    FREE     │              │    PAID     │
    │    tier     │              │    tier     │              │    tier     │
    │    Groq     │              │ OpenRouter  │              │    GPT-4    │
    │   Google    │              │  Together   │              │   Claude    │
    └─────────────┘              └─────────────┘              └─────────────┘
```
# ⚡ Quick Start

```bash
# Install & run - works out of the box!
pip install -e .
freerelay
```

That's it! FreeRelay runs in auto mode at http://localhost:8000.
## Guided Setup

```bash
# Interactive wizard to add API keys
freerelay setup
```

## Test it
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

## Use with OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="freerelay-auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

# Modes
| Mode | Description | Use Case |
|---|---|---|
| `free` | Only free providers | Budget-conscious apps |
| `paid` | Only OpenAI/Anthropic | Maximum quality |
| `auto` | Free + paid routing | Recommended - smart switching |
Auto mode automatically routes complex tasks (deep analysis, coding, large context) to paid providers while keeping simple tasks on free tier.
# Supported Providers
## Free Tier
| Provider | Models | RPM | Best For |
|---|---|---|---|
| Groq | llama-3.1, mixtral-8x7b | 30 | ⚡ Speed |
| Google | gemini-1.5-flash | 15 | 🌐 Large context |
| OpenRouter | llama-3.1, mistral-7b | 20 | 🔄 Most models |
| Together AI | llama-3.1, qwen2 | 60 | 📦 Batch |
| Mistral | mistral-small | — | 🇫🇷 Multilingual |
| NVIDIA | llama-3.1, mixtral | 40 | 🎮 GPU optimized |
## Paid Tier
| Provider | Models | Best For |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | 🌟 Best overall |
| Anthropic | claude-3.5-sonnet | 📝 Long context |
# 🔑 How to Get API Keys (Step by Step)
## Groq (Free)
- Go to https://console.groq.com/keys
- Click Sign Up (or Log In if you have an account)
- Verify your email
- Click Create API Key
- Copy the key (starts with `gsk_...`)
- Add to `.env`: `GROQ_API_KEY=gsk_your_key_here`
## Google AI Studio (Free)
- Go to https://aistudio.google.com/apikey
- Sign in with your Google account
- Click Create API Key
- Select a project (or create a new one)
- Copy the key
- Add to `.env`: `GOOGLE_AI_KEY=your_key_here`
## OpenRouter (Free)
- Go to https://openrouter.ai/keys
- Click Sign Up (or Log In)
- Click Create Key
- Give it a name (e.g., "FreeRelay")
- Copy the key (starts with `sk-or-...`)
- Add to `.env`: `OPENROUTER_API_KEY=sk-or-your_key_here`
## Together AI (Free)
- Go to https://api.together.xyz
- Click Sign Up or Log In
- Go to Settings → API Keys
- Click Create new API key
- Copy the key
- Add to `.env`: `TOGETHER_API_KEY=your_key_here`
## Mistral AI (Free)
- Go to https://console.mistral.ai/api-keys/
- Sign up or log in
- Click Create new key
- Give it a name
- Copy the key
- Add to `.env`: `MISTRAL_API_KEY=your_key_here`
## NVIDIA Build (Free)
- Go to https://build.nvidia.com/explore/recommended
- Click Sign Up (or Log In)
- Go to Settings → API Keys
- Click Generate API Key
- Copy the key (starts with `nvapi-...`)
- Add to `.env`: `NVIDIA_API_KEY=nvapi-your_key_here`
## OpenAI (Paid)
- Go to https://platform.openai.com/api-keys
- Sign up or log in
- Click Create new secret key
- Name it (e.g., "FreeRelay")
- Copy the key (starts with `sk-...`)
- Add to `.env`: `OPENAI_API_KEY=sk-your_key_here`
## Anthropic (Paid)
- Go to https://console.anthropic.com/settings/keys
- Sign up or log in
- Click Create Key
- Name it (e.g., "FreeRelay")
- Copy the key (starts with `sk-ant-...`)
- Add to `.env`: `ANTHROPIC_API_KEY=sk-ant-your_key_here`
# Configuration
After getting your API keys, edit `.env`:

```bash
# Mode: free, paid, or auto
FREERELAY_MODE=auto

# Free providers
GROQ_API_KEY=gsk_your_key_here
GOOGLE_AI_KEY=your_key_here
OPENROUTER_API_KEY=sk-or-your_key_here
TOGETHER_API_KEY=your_key_here
MISTRAL_API_KEY=your_key_here
NVIDIA_API_KEY=nvapi-your_key_here

# Paid providers (optional)
OPENAI_API_KEY=sk-your_key_here
ANTHROPIC_API_KEY=sk-ant-your_key_here
```

# Features That Set FreeRelay Apart
FreeRelay implements the v3 MAX inference specification documented in `docs/free_relay_v3_max_spec.md` (originally authored as `FreeRelay_v3_MAX.zip`). The spec describes an inference operating system that profiles every request, routes on expected outcomes, orchestrates declarative DAGs, validates and repairs responses, and runs a policy-grade control plane behind the scenes.
## 🧠 Workload Profiling & Context Engineering

Every request is profiled on ten axes (task family, depth, precision, latency class, context topology, tools, determinism, safety, output contract, and economics) in under 5ms, without any LLM calls. A context optimizer salience-ranks history, packs the highest-value lanes (instructions, memory, facts, tools, scratch), and rewrites prompts per provider signature before execution.
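To make the ten axes concrete, here is a minimal sketch of a profile as a data structure; the field types and example values are assumptions, and the authoritative schema lives in the spec:

```python
from dataclasses import dataclass, field

@dataclass
class WorkloadProfile:
    # Ten profiling axes; names follow the list above, types are illustrative.
    task_family: str           # "coding", "math", "creative", "multilingual", "chat"
    depth: str                 # shallow lookup vs. deep analysis
    precision: str             # tolerance for approximation in the output
    latency_class: str         # interactive vs. batch
    context_topology: str      # flat prompt vs. multi-lane context
    tools: list[str] = field(default_factory=list)  # tools the request may invoke
    determinism: bool = False  # reproducible decoding required?
    safety: str = "standard"   # safety tier
    output_contract: str = "text"  # free text, JSON schema, AST, ...
    economics: str = "free-first"  # budget sensitivity of the request
```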
## ⚖️ Outcome-Aware Routing & Policy Engine

The router scores every provider-model on an expected utility formula that blends learned success probabilities, judge-derived quality scores, schema-compliance estimates, latency/cost/safety utilities, tenant policy weights, circuit state, budget health, and a UCB exploration bonus. Policy DSL rules can prefer/require/exclude providers, cap temperature, enable hedging, or fuse validators before the highest-utility decision is made.
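In code, the blend might read like the sketch below; the weights, field names, and the placement of the hard exclusions are assumptions for illustration, not the spec's exact schema:

```python
import math

def expected_utility(stats: dict, weights: dict, total_pulls: int) -> float:
    """Hedged sketch of the routing score for one provider-model pair."""
    if stats["circuit_open"] or stats["budget_exhausted"]:
        return float("-inf")  # hard-excluded by circuit state / budget health
    exploit = (
        weights["success"] * stats["p_success"]          # learned success probability
        + weights["quality"] * stats["judge_quality"]    # judge-derived quality score
        + weights["schema"] * stats["p_schema_ok"]       # schema-compliance estimate
        + weights["latency"] * stats["latency_utility"]  # higher = faster
        + weights["cost"] * stats["cost_utility"]        # higher = cheaper
        + weights["safety"] * stats["safety_utility"]
    )
    # UCB exploration bonus: favor provider-model pairs with few observations
    explore = math.sqrt(2 * math.log(max(total_pulls, 2)) / max(stats["pulls"], 1))
    return exploit + weights["explore"] * explore
```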
## 🧵 Multi-Step Execution DAG & Validation

Execution graphs replace one-shot requests. Workflows chain classifiers, generators, validators, judges, repair FSMs, tool nodes, speculative decomposers, and hedging strategies with conditional transitions (verification_failed, tool_error, etc.). Validation happens in tiers—structural (JSON/AST/schema), semantic (heuristics, spaCy), and asynchronous judges—and failures trigger repair attempts (stronger prompts, deterministic decoding, provider escalation) before the response leaves the system.
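A workflow of this shape could be declared as plain data; in the minimal sketch below, node kinds and most labels are assumptions, while `verification_failed` and `tool_error` are the transition names mentioned above:

```python
# Hedged sketch of a declarative execution DAG, expressed as a dict.
workflow = {
    "classify": {"kind": "classifier", "next": "generate"},
    "generate": {"kind": "generator", "next": "validate"},
    "validate": {
        "kind": "validator",  # structural -> semantic -> async judge tiers
        "on_success": "done",
        "on": {
            "verification_failed": "repair",  # escalate into the repair FSM
            "tool_error": "generate",         # retry generation after a tool failure
        },
    },
    "repair": {
        "kind": "repair_fsm",  # stronger prompt, deterministic decoding, provider escalation
        "next": "validate",
        "max_attempts": 2,
    },
}
```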
## 🛡️ Correctness, Resilience & Streaming

Circuit breakers (Lua-backed CLOSED/HALF_OPEN/OPEN), EWMA budget forecasting, AIMD concurrency, brownout, and chaos-mode resilience protect downstream clients. Streaming uses backpressured SSE proxies with bounded queues and deterministic resume for long-running jobs. Semantic caching (datasketch MinHash + LSH) dedupes prompts, while observability (Prometheus + OpenTelemetry + structured logs) surfaces schema pass rates, retry taxonomies, hallucination signals, and provider drift.
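As one concrete slice of that resilience layer, here is a minimal circuit-breaker sketch covering the three named states; the production breaker is Lua-backed in Redis, and the thresholds below are assumptions:

```python
import time

class CircuitBreaker:
    """Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN cycle."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.state = "CLOSED"
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.opened_at = 0.0

    def allow(self) -> bool:
        if self.state == "OPEN" and time.monotonic() - self.opened_at >= self.cooldown_s:
            self.state = "HALF_OPEN"  # probe with a single trial request
        return self.state != "OPEN"

    def record(self, success: bool) -> None:
        if success:
            self.state, self.failures = "CLOSED", 0
        else:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "OPEN", time.monotonic()
```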
## 🛰️ Control Plane, Economics & Leaderboard

The control plane owns tenant policy objects, capability registry, benchmark catalog, experiments (shadowing, A/B routing, replay simulators, what-if scoring), and the economic engine. Policies cover allowed providers/geographies, cost/latency ceilings, tool restrictions, and fallback chains. Economics optimize cost-per-success, reserve premium budgets, arbitrage bursts, enforce SLA tiers, and forecast token futures. A public leaderboard (hourly aggregates) spots the best provider per task family and keeps privacy intact.
# Feature Comparison
| Feature | FreeRelay | OpenRouter | Portkey | Helicone |
|---|---|---|---|---|
| Outcome-aware routing | ✓ | Partial | – | – |
| Multi-step execution DAGs | ✓ | – | – | – |
| Validation & repair loops | ✓ | – | – | – |
| Policy DSL + experimentation | ✓ | – | – | – |
| Streaming backpressure | ✓ | ✓ | ✓ | N/A |
| OpenAI SDK compatible | ✓ | ✓ | ✓ | ✓ |
| OpenCode/Codex CLI backends | ✓ | – | – | – |
| Skills (coding-supervisor) | ✓ | – | – | – |
# Use With Your Favorite Tools
### Continue.dev (VS Code)
```json
{
  "models": [{
    "title": "FreeRelay",
    "provider": "openai",
    "model": "freerelay-auto",
    "apiBase": "http://localhost:8000/v1"
  }]
}
```

### LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model="freerelay-auto",
)
```

### Node.js / TypeScript
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8000/v1',
  apiKey: 'not-needed',
});
```

### Open WebUI
Set the OpenAI API base to http://localhost:8000/v1. No API key needed.
### OpenClaw
FreeRelay has built-in OpenClaw integration. Start FreeRelay, then fetch the config:
```bash
# Start FreeRelay
python -m freerelay.main

# Fetch the config snippet
curl http://localhost:8000/openclaw/config
```

Option A — Use the onboard wizard (recommended):

```bash
openclaw onboard --install-daemon
# When prompted: Manual → Custom → Base URL: http://localhost:8000/v1 → Model: freerelay/auto
```

Option B — Non-interactive:

```bash
openclaw onboard --non-interactive --accept-risk \
  --auth-choice apiKey --token-provider custom \
  --custom-base-url "http://localhost:8000/v1" \
  --install-daemon --skip-channels --skip-skills
```

Option C — Manual config (`~/.openclaw/openclaw.json`):
```json
{
  "models": {
    "providers": {
      "freerelay": {
        "baseUrl": "http://localhost:8000/v1",
        "apiKey": "not-needed",
        "api": "openai-completions",
        "models": [
          { "id": "auto", "name": "FreeRelay Auto" },
          { "id": "freerelay-groq", "name": "FreeRelay → Groq" },
          { "id": "freerelay-google", "name": "FreeRelay → Google" }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "freerelay/auto" }
    }
  }
}
```

Then run:

```bash
openclaw gateway run
```

Use `freerelay/auto` as the model for workload-aware routing across all free providers.
For more details, see `docs/openclaw-integration.md`.
### OpenCode & Codex
FreeRelay integrates with OpenCode as both an API proxy and CLI backend, plus Codex as a CLI backend.
OpenCode API Proxy (Zen + Go catalogs):
```bash
# Add your OpenCode API key
echo "OPENCODE_API_KEY=your_key_here" >> .env

# Zen catalog models (Claude, GPT, Gemini)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"freerelay/opencode-claude-sonnet","messages":[{"role":"user","content":"Hello"}]}'

# Go catalog models (Kimi, GLM, MiniMax)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"freerelay/opencode-kimi-k2","messages":[{"role":"user","content":"Write a function"}]}'
```

List OpenCode models:

```bash
curl http://localhost:8000/opencode/models
```

CLI Backend (spawn OpenCode/Codex as subprocess):
```bash
# See which backends are available
curl http://localhost:8000/opencode/cli-backends

# Run a task via OpenCode CLI
curl -X POST http://localhost:8000/opencode/cli-run \
  -H "Content-Type: application/json" \
  -d '{"backend":"opencode-cli","prompt":"Write a Python hello world","model":"opencode-claude-sonnet"}'

# Run the same task via Codex CLI
curl -X POST http://localhost:8000/opencode/cli-run \
  -H "Content-Type: application/json" \
  -d '{"backend":"codex-cli","prompt":"Write a Python hello world"}'
```

Skills:

```bash
# List available skills
curl http://localhost:8000/skills

# Skills config for OpenClaw
curl http://localhost:8000/skills/config
```

| Model ID | Catalog | Upstream |
|---|---|---|
| `freerelay/opencode-claude-sonnet` | Zen | Claude Sonnet |
| `freerelay/opencode-claude-haiku` | Zen | Claude Haiku |
| `freerelay/opencode-gpt-4o` | Zen | GPT-4o |
| `freerelay/opencode-gemini-flash` | Zen | Gemini Flash |
| `freerelay/opencode-kimi-k2` | Go | Kimi K2 |
| `freerelay/opencode-glm-4` | Go | GLM-4 |
| `freerelay/opencode-minimax-01` | Go | MiniMax |
CLI backends communicate via JSONL subprocess with API keys cleared from the environment for security.
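That pattern is simple to sketch: spawn the CLI with API keys stripped from the child environment and exchange newline-delimited JSON over stdin/stdout. The function and payload shape below are assumptions for illustration; the real implementation lives in `freerelay/cli_backend/`:

```python
import json
import os
import subprocess

def run_cli_task(command: list[str], prompt: str) -> dict:
    """Hedged sketch of the JSONL subprocess pattern."""
    # Clear API keys from the child environment for security
    env = {k: v for k, v in os.environ.items() if not k.endswith("_API_KEY")}
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
        text=True,
    )
    out, _ = proc.communicate(json.dumps({"prompt": prompt}) + "\n")
    # Each stdout line is one JSON event; return the final one
    return json.loads(out.strip().splitlines()[-1])
```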
# Docker

```bash
cd docker
docker compose up -d
```

Starts: FreeRelay + Redis + Jaeger + Prometheus + Grafana
| Service | URL |
|---|---|
| FreeRelay API | http://localhost:8000 |
| Dashboard | http://localhost:8000/dashboard |
| Jaeger UI | http://localhost:16686 |
| Prometheus | http://localhost:9091 |
| Grafana | http://localhost:3000 (admin/freerelay) |
# 🚀 Deployment
## Railway (Recommended)
- Fork this repository.
- Create a new project on Railway and link your fork.
- Add the required environment variables (see `.env.example`).
- Railway will automatically detect `railway.json` and `docker/Dockerfile` and deploy the gateway.
## Supabase Setup (Authentication & Usage Tracking)
FreeRelay supports Supabase for managing API keys and tracking usage.
- Create a new Supabase project.
- Run the SQL in `supabase_schema.sql` in the SQL Editor to create the necessary tables and indices.
- Set `FREERELAY_ENABLE_SUPABASE_AUTH=true` and provide `SUPABASE_URL`, `SUPABASE_KEY`, and `FREERELAY_SUPABASE_SERVICE_ROLE_KEY` (for admin tasks like registration) in your environment.
## Stripe Integration (Payments)
FreeRelay includes basic Stripe integration for user upgrades.
- Set `STRIPE_SECRET_KEY` and `STRIPE_WEBHOOK_SECRET` in your environment.
- Configure `STRIPE_SUCCESS_URL` and `STRIPE_CANCEL_URL`.
- Use the `/v1/billing/checkout` endpoint to create a checkout session (see the sketch below).
- Set up a Stripe webhook pointing to `https://your-domain.com/v1/billing/webhook` listening for `checkout.session.completed` events.
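For example, creating a session from Python might look like the snippet below; the request body field is a placeholder, so consult the endpoint's actual schema:

```python
import requests  # third-party HTTP client

# Hypothetical payload; the real field names depend on the endpoint's schema.
resp = requests.post(
    "http://localhost:8000/v1/billing/checkout",
    json={"plan": "pro"},  # illustrative field
)
print(resp.json())  # should include the Stripe-hosted checkout URL
```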
# CLI

```bash
# Install the CLI tool
pip install -e .

# Start the gateway
freerelay start

# Chaos mode
freerelay start --chaos

# Check status
freerelay status

# Run a benchmark
freerelay benchmark --requests 50 --concurrent 10
```

# Project Structure
```
freerelay/
├── freerelay/
│   ├── main.py                     # App entry point
│   ├── config/
│   │   ├── settings.py             # Settings
│   │   ├── capability_matrix.yaml  # Provider/model capability DB
│   │   └── routing_rules.yaml      # Routing rules
│   ├── core/
│   │   ├── models/openai.py        # OpenAI wire format (Pydantic v2)
│   │   ├── routing/engine.py       # Routing engine
│   │   ├── routing/classifier.py   # Intent classifier
│   │   ├── execution/hedging.py    # Request hedging
│   │   ├── streaming/backpressure.py
│   │   └── resilience/
│   │       ├── circuit_breaker.py  # CLOSED→OPEN→HALF_OPEN
│   │       ├── budget.py           # Budget forecasting
│   │       └── chaos.py            # Chaos mode
│   ├── providers/                  # Groq, Google, OpenRouter, Together, Mistral, OpenCode
│   ├── middleware/                 # Auth, audit
│   ├── observability/              # Prometheus, structlog, health probes
│   ├── openclaw/                   # OpenClaw integration
│   ├── cli_backend/                # OpenCode/Codex CLI subprocess backends
│   ├── skills/                     # Skills (OpenCode, Codex, Supervisor)
│   └── cli/                        # CLI commands
├── tests/                          # Unit + integration tests
├── docker/                         # Dockerfile + compose stack
├── dashboard/index.html            # Real-time monitoring dashboard
└── docs/                           # Documentation
```
# How Routing Works
- Request arrives → Validated against OpenAI schema
- Intent classified → coding / math / creative / multilingual / chat (< 5ms)
- Providers scored → `capability × budget × circuit_state × (1 / (1 + p95_latency))` (sketched after this list)
- Best provider selected → Request forwarded
- On failure → Circuit breaker updated, next provider tried automatically
- After response → Tokens tracked, budget updated, metrics emitted
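The scoring step in isolation, as a minimal sketch; treating the first three factors as values normalized to [0, 1] is my assumption:

```python
def provider_score(capability: float, budget: float,
                   circuit_state: float, p95_latency_s: float) -> float:
    # capability, budget, and circuit_state assumed in [0, 1]; latency in
    # seconds. Higher score = better candidate.
    return capability * budget * circuit_state * (1 / (1 + p95_latency_s))

# A capable provider (0.9) with a full budget, closed circuit, and 0.4 s p95 latency:
print(provider_score(0.9, 1.0, 1.0, 0.4))  # ≈ 0.64
```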
# FreeRelay v3 MAX Specification
FreeRelay is grounded in the v3 MAX inference operating system documented in `docs/free_relay_v3_max_spec.md` and the bundled `FreeRelay_v3_MAX.zip`. The spec lays out the complete control/data-plane split, Redis schema, workload profile schema, routing decision audit trail, expected utility math, DAG engine, validators/repair loops, capability benchmarking, and the 14-day build plan that drives the repo roadmap.
Key capabilities the spec demands:
- Workload profiling (10 axes + context lanes) that feeds routing, elevators, and observability.
- Outcome-aware routing with expected utility, UCB exploration, policy DSL, validation directives, and hedge signals.
- Multi-step execution DAGs (classification → generation → validators → judges → repairs) plus tool-aware agents and speculative decomposition.
- Resilience: circuit breakers, EWMA budget forecasting, AIMD concurrency, brownout, chaos mode, deterministic resume, and streaming backpressure.
- Control-plane economics, experiments, tenant policy controls, signed audit trails, and the privacy-preserving public leaderboard.
# Roadmap
The v3 MAX spec embeds a 14-day build plan that keeps every merge focused on the same outcome: a workload-aware control plane with intelligent routing, validation, and experiments.
- Days 1-5 — Land the OpenAI wire format, provider adapters, streaming/backpressure, circuit breakers, budget forecasting, and multi-provider execution so requests reliably reach the best backend.
- Days 6-10 — Deliver the profiler (all ten axes), expected utility routing, semantic cache, context pipeline, validation layers, and repair FSMs so every response is intent-aware and correct.
- Days 11-14 — Ship the execution DAG engine, control-plane learner/benchmark/anomaly systems, observability/dashboards, Docker + compose stack, and final docs/CI/packaging polish.
Refer to `docs/free_relay_v3_max_spec.md` for the full day-by-day checklist and done criteria.
# Contributing
Contributions welcome. Start with good first issues.
```bash
git clone https://github.com/HrachShah/FreeRelay.git
cd FreeRelay
pip install -e ".[dev]"
pytest tests/ -v
```

# License
MIT — use it however you want.
If this saved you money, star the repo ⭐
Built by @HrachShah