# codex-toolloop

Local-first coding-agent workflows built on OpenAI Codex and Vercel AI SDK v6.

codex-toolloop is a Node.js TypeScript CLI and runtime for building reproducible, multi-step coding-agent workflows: plan, research, implement, test, review, and produce audit-grade run artifacts (traces, diffs, reports). It is designed to integrate tightly with Codex CLI authentication (ChatGPT account usage) while using AI SDK agent patterns for orchestration and future UI integration.
## Why

If you already use Codex daily, you typically want two things that plain "chat in a terminal" does not reliably provide:
- Reusable workflows: named, parameterized pipelines you can run again (feature dev, audits, migrations).
- Tooling discipline: controlled tool access, consistent context gathering, and inspectable evidence for every action taken.
codex-toolloop adds a workflow engine, typed handoffs, policy controls, and an artifact trail around Codex-backed engineering runs.
## Features

- Workflow engine for multi-step engineering runs (feature-dev, code review, audit).
- Multi-agent roles (planner, researcher, implementer, verifier, reviewer) with strict, typed handoffs.
- Codex-first execution with support for:
- persistent sessions and mid-run course correction (app-server mode)
- non-interactive automation with JSONL traces and structured output schemas (exec mode)
- programmatic control via the Codex TypeScript SDK (optional)
- Type-safe schemas everywhere (Zod v4) for tool inputs, step outputs, and structured reports.
- Local-first artifacts: every run produces a directory of logs and outputs that can be inspected, diffed, and shared.
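The typed-handoff idea can be sketched without any dependencies. The `PlanOutput`, `ImplementOutput`, and `Step` names below are illustrative stand-ins, not the runtime's real API; in the actual runtime these shapes would be defined as Zod v4 schemas.

```typescript
// A dependency-free sketch of a typed handoff between roles
// (hypothetical names; the real runtime validates shapes with Zod v4).
interface PlanOutput {
  goal: string;
  steps: string[];
}

interface ImplementOutput {
  filesChanged: string[];
}

// A step consumes the previous role's typed output and returns its own.
type Step<In, Out> = (input: In) => Promise<Out>;

const implement: Step<PlanOutput, ImplementOutput> = async (plan) => {
  // ...in the real runtime this would call a Codex backend with a
  // context pack built from `plan`...
  return { filesChanged: plan.steps.map((_step, i) => `src/step-${i}.ts`) };
};

// Handoffs are checked at compile time: a planner that omits `steps`
// fails typecheck instead of failing mid-run.
const out = await implement({
  goal: "add doctor command",
  steps: ["scaffold CLI", "wire environment checks"],
});
console.log(out.filesChanged); // ["src/step-0.ts", "src/step-1.ts"]
```

The point of the pattern is that a malformed handoff is a compile-time (or schema-validation) failure at the step boundary, not a silent mid-run drift.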
## Architecture

```mermaid
flowchart LR
  Dev[Developer] -->|CLI commands| CLI[toolloop CLI]
  CLI --> RT["Workflow runtime<br/>(roles + steps + policies)"]
  RT -->|stream + events| Codex["Codex backend<br/>(app-server / exec / sdk)"]
  Codex --> Repo[(Repo workspace)]
  RT --> Artifacts["Run artifacts<br/>(JSONL traces, reports, diffs)"]
  RT <-->|discover + call tools| Tools[Tool substrate]
  Tools -->|local HTTP or stdio| ToolServers["Tool servers<br/>(first-party + third-party)"]
```
```mermaid
sequenceDiagram
  participant Dev as Developer
  participant TL as toolloop
  participant P as Planner
  participant C as Codex backend
  participant T as Tool servers
  participant R as Repo workspace
  Dev->>TL: toolloop run workflow feature-dev --spec specs/...
  TL->>P: Produce plan (typed)
  P-->>TL: PlanOutput
  TL->>C: Implement step (context pack + policies)
  C->>R: Read/write files, run commands
  C->>T: Call tools (docs, repo, etc.)
  T-->>C: Tool results
  C-->>TL: Stream events + final output
  TL->>TL: Verify + Review
  TL-->>Dev: Final report + artifact path
```
## Prerequisites

- Node.js v24 LTS (runtime; required for AI SDK MCP STDIO transport)
- pnpm (recommended via Corepack)
- Codex CLI installed and authenticated (ChatGPT login or API key)
- Git (recommended)
Codex CLI references:
- CLI reference: https://developers.openai.com/codex/cli/reference/
- Non-interactive mode (`codex exec`): https://developers.openai.com/codex/noninteractive/
- Codex SDK: https://developers.openai.com/codex/sdk/
## Installation

```bash
git clone https://github.com/BjornMelin/codex-toolloop
cd codex-toolloop
pnpm install

# verify environment
pnpm dev:cli -- doctor
```

> Note: pnpm v10 blocks dependency lifecycle scripts by default. This repo uses a minimal, audited allowlist in `pnpm-workspace.yaml` (currently `@biomejs/biome` and `esbuild`). If you add dependencies that require build scripts, update the allowlist (or run `pnpm approve-builds`).
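For reference, that allowlist corresponds to pnpm v10's `onlyBuiltDependencies` field. A minimal sketch of the relevant fragment (assuming the stock `pnpm-workspace.yaml` layout; check your actual file before editing):

```yaml
# pnpm-workspace.yaml (fragment)
onlyBuiltDependencies:
  - "@biomejs/biome"
  - esbuild
```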
## Usage

```bash
# Planned (SPEC 010 + SPEC 040)
# pnpm dev:cli -- mcp start

# Planned (SPEC 030 + SPEC 040)
# pnpm dev:cli -- run spec ./docs/specs/040-cli.md
# pnpm dev:cli -- run workflow feature-dev --approval on-failure --sandbox workspace-write
```

> Current implementation status (SPEC 000): only `doctor` is implemented. The other commands below are part of the roadmap and are tracked by their corresponding SPECs.
| Command | Purpose |
|---|---|
| `codex-toolloop doctor` | Validate environment (node, pnpm, codex) |
| `codex-toolloop mcp start` | Start local tool servers (planned) |
| `codex-toolloop run spec <path>` | Execute a SPEC-driven run (planned) |
| `codex-toolloop run workflow <name>` | Execute a named workflow (planned) |
| `codex-toolloop session list` | List runs, thread IDs, and statuses (planned) |
| `codex-toolloop session inspect <runId>` | Inspect run artifacts and report (planned) |
## Workflows

- `feature-dev`: plan -> research -> implement -> verify -> review -> finalize
- `review`: analyze current branch or diff, produce findings and recommendations
- `audit`: deep check for deprecations, API mismatches, and documentation gaps
## Configuration

codex-toolloop reads a single config file (plus environment variables).

Default path (configurable): `./toolloop.config.toml`
```toml
[toolloop]
artifacts_dir = "~/.toolloop/runs"
default_backend = "app-server"

[policies]
approval_mode = "on-failure"     # untrusted | on-failure | on-request | never
sandbox_mode = "workspace-write" # read-only | workspace-write | danger-full-access

[tools]
# domain allowlist for network fetching tools (if enabled)
allowed_domains = ["ai-sdk.dev", "developers.openai.com", "vercel.com", "github.com"]
# tool allowlist at the workflow level (optional)
allowed_tools = ["repo.readFile", "repo.listDir", "repo.ripgrep", "docs.fetch"]

[codex]
# codex model alias used by backends
model = "gpt-5.2-codex"
# optional: configure MCP-compatible tool servers here (see next section)
```

## Tooling (MCP)

Tooling is implemented as a shared tool substrate (MCP) that can be discovered and called at runtime. This keeps workflows extensible without hardcoding every capability into the CLI.
Codex ToolLoop uses AI SDK v6 primitives for MCP and dynamic tooling:
- MCP clients via `createMCPClient()` (`@ai-sdk/mcp`)
- Dynamic tools via `dynamicTool()` (for large or evolving tool catalogs)
You can attach MCP servers to:
- provide repo utilities (search, read, directory listing)
- fetch documentation from allowlisted domains
- connect to third-party tool ecosystems
Key design point: avoid tool-definition context bloat.
- Tools are grouped into small bundles and loaded on-demand per workflow/role/step (ADR 0009, SPEC 011).
- Default transport is HTTP (deployable); stdio is local-only and may require Node.js.
- For huge catalogs, we expose only a small set of MCP meta-tools implemented with `dynamicTool()` (SPEC 011) rather than injecting every tool schema.
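The meta-tool pattern can be sketched without the AI SDK itself. `listTools`, `callTool`, and the catalog entries below are hypothetical stand-ins for the SPEC 011 meta-tools, not the project's actual implementation:

```typescript
// Dependency-free sketch of the meta-tool pattern (names illustrative):
// only two schemas -- list and call -- ever reach the model context.
type ToolImpl = (args: Record<string, unknown>) => Promise<unknown>;

const catalog = new Map<string, { description: string; run: ToolImpl }>([
  [
    "repo.readFile",
    { description: "Read a file from the workspace", run: async (a) => `contents of ${String(a.path)}` },
  ],
  [
    "docs.fetch",
    { description: "Fetch allowlisted documentation", run: async () => "..." },
  ],
]);

// Meta-tool 1: discovery returns only names and descriptions (cheap in tokens).
function listTools(): Array<{ name: string; description: string }> {
  return [...catalog].map(([name, t]) => ({ name, description: t.description }));
}

// Meta-tool 2: dispatch by name, so the catalog can grow without
// re-injecting every tool schema into the model context.
async function callTool(name: string, args: Record<string, unknown>): Promise<unknown> {
  const tool = catalog.get(name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.run(args);
}

console.log(listTools().map((t) => t.name)); // ["repo.readFile", "docs.fetch"]
```

In the real substrate the two meta-tools would be registered via `dynamicTool()` so the model can discover and invoke catalog entries at runtime.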
See:
- ADR 0003: `docs/adr/0003-mcp-tooling.md`
- ADR 0009: `docs/adr/0009-dynamic-tool-loading.md`
- SPEC 010: `docs/specs/010-mcp-platform.md`
- SPEC 011: `docs/specs/011-dynamic-tool-loading.md`
## Codex backends

codex-toolloop supports three Codex execution backends behind a single interface.
| Backend | Best for | Notes |
|---|---|---|
| `app-server` (default) | Long, interactive multi-step runs | Persistent threads, mid-run injection, event streaming |
| `exec` | Scriptable automation | JSONL traces and `--output-schema` for strict structured output |
| `sdk` (optional) | Programmatic integration | Uses the Codex TypeScript SDK for thread control |
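One way such a single interface could look, as a sketch: `CodexBackend`, `RunEvent`, and the stub backend below are hypothetical, not the package's real API.

```typescript
// Sketch of a shared backend interface (member names hypothetical)
// that the three execution modes could implement.
interface RunEvent {
  type: "tool-call" | "output" | "error";
  payload: unknown;
}

interface CodexBackend {
  name: "app-server" | "exec" | "sdk";
  run(prompt: string, onEvent: (e: RunEvent) => void): Promise<string>;
}

// A stub exec-style backend: emits one event, then resolves a final answer.
const execBackend: CodexBackend = {
  name: "exec",
  async run(prompt, onEvent) {
    onEvent({ type: "output", payload: `running: ${prompt}` });
    return "done";
  },
};

// Callers stay identical when swapping app-server / exec / sdk
// implementations behind the same interface.
const seen: string[] = [];
const answer = await execBackend.run("implement step", (e) => seen.push(e.type));
console.log(seen, answer); // ["output"] "done"
```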
References:
- App-server provider: https://ai-sdk.dev/providers/community-providers/codex-app-server
- Exec non-interactive mode: https://developers.openai.com/codex/noninteractive/
- Codex SDK: https://developers.openai.com/codex/sdk/
- AI SDK ToolLoopAgent: https://ai-sdk.dev/docs/reference/ai-sdk-core/tool-loop-agent
## Run artifacts

Every run produces a directory (default: `~/.toolloop/runs/<runId>/`) with:

- `meta.json` (backend, policies, timing, thread id)
- `events.jsonl` (normalized event stream)
- `tool-calls.jsonl` (tool invocations and redacted results)
- `final-report.md` (human-readable report)
- `step-outputs/` (typed JSON outputs per step)
- `diff-summary.md` (what changed)
This is designed for:
- debugging failures
- sharing a run with a teammate
- comparing runs over time
- creating golden tests for workflows
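Because the traces are plain JSONL, post-processing them needs nothing beyond the Node.js standard library. This sketch fabricates a tiny trace in a temp directory and counts tool calls; the field names (`type`, `tool`) are illustrative, not the runtime's real event schema:

```typescript
import { mkdtempSync, writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Fabricate a tiny trace so the example is self-contained.
const dir = mkdtempSync(join(tmpdir(), "toolloop-"));
const trace = join(dir, "events.jsonl");
writeFileSync(
  trace,
  [
    JSON.stringify({ type: "tool-call", tool: "repo.ripgrep" }),
    JSON.stringify({ type: "output", text: "patched file" }),
    JSON.stringify({ type: "tool-call", tool: "docs.fetch" }),
  ].join("\n"),
);

// JSONL = one JSON object per line, so traces diff cleanly and can be
// scanned line by line without loading the whole run into memory.
const events = readFileSync(trace, "utf8")
  .split("\n")
  .map((line) => JSON.parse(line) as { type: string });
const toolCalls = events.filter((e) => e.type === "tool-call").length;
console.log(toolCalls); // 2
```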
## Repository layout

```text
apps/
  cli/              # CLI entrypoint and streaming UX
packages/
  codex-toolloop/   # runtime: workflows, steps, artifacts, policies
  codex/            # codex backends (app-server, exec, sdk)
  mcp/              # tool substrate + local tool servers
  workflows/        # named workflows and role definitions
  testkit/          # fixtures, temp dirs, mocks
docs/               # PRD, architecture, ADRs
  specs/            # implementation specs
```
## Development

Build:

```bash
pnpm build
```

Vitest is used for:
- unit tests (pure TypeScript)
- integration tests (mocked Codex and tool servers)
- type-level tests (`expectTypeOf`)
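A type-level test pins down a shape at compile time rather than at runtime. Here is a dependency-free sketch of the idea behind `expectTypeOf`; the `PlanOutput` shape is hypothetical, and in the real codebase such types would be inferred from Zod v4 schemas:

```typescript
// Hypothetical step-output type; real ones are inferred from Zod schemas.
interface PlanOutput {
  steps: string[];
}

// Minimal compile-time assertion in the spirit of Vitest's expectTypeOf:
// if PlanOutput["steps"] drifts away from string[], the assignment below
// fails typecheck rather than failing at runtime.
type IsStringArray<T> = T extends string[] ? true : false;
const stepsAreStrings: IsStringArray<PlanOutput["steps"]> = true;

console.log(stepsAreStrings); // true
```

Vitest's `expectTypeOf` provides the same guarantee with far better diagnostics; this sketch only illustrates what "testing a type" means.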
Run tests:

```bash
pnpm test
```

Run typecheck:

```bash
pnpm typecheck
```

## Documentation

- PRD: `docs/PRD.md`
- Architecture: `docs/architecture.md`
- ADRs: `docs/adr/`
- Specs: `docs/specs/`
## Roadmap

- Stable workflow pack: feature-dev, review, audit
- Tool server catalog and discovery UX
- Stronger structured outputs for step boundaries (schema-first)
- Optional local UI (Next.js) for browsing run artifacts and starting runs
- Recording and replay of runs (golden traces)
## Contributing

Contributions are welcome.
Recommended flow:
- Open an issue describing the use case and expected behavior.
- Add or update a SPEC under `docs/specs/` for non-trivial changes.
- Add tests (unit or integration) for behavior changes.
- Keep changes small and focused.
## Security

- Default policies aim to be safe for local development:
  - sandbox: `workspace-write`
  - approvals: `on-failure`
- Avoid running with dangerous sandbox modes unless you fully understand the risk.
- Tool outputs are redacted before being written to artifacts where possible.
If you discover a security issue, please open a private report or create a GitHub security advisory (preferred).
## License

See LICENSE.
## Citation

If you use codex-toolloop in academic work or technical reports, cite it as software.

```bibtex
@software{melin_codex_toolloop_2026,
  author = {Bjorn Melin},
  title  = {codex-toolloop: Local-first coding-agent workflows built on OpenAI Codex and Vercel AI SDK},
  year   = {2026},
  url    = {https://github.com/BjornMelin/codex-toolloop}
}
```