RFC: Deterministic Session Checkpoint v1 (DSC) — Compaction Without Summarization #8573

@AmirTlinov


What feature would you like to see?

Status: Draft
Target: Codex CLI + IDE extension (shared session format)

0. Summary

Replace lossy "conversation summarization" compaction with a deterministic, host-generated checkpoint.

Key idea:

  • The Codex CLI already writes a structured per-session event log: $CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl.
  • We add a tiny derived projection: checkpoint_v1.json.
  • Compaction becomes a local operation: reset context + inject view_v1(checkpoint, caps).

No LLM is required to produce the checkpoint.


1. Motivation / Problem

After compaction (manual or auto), Codex often:

  • re-reads the same files,
  • re-derives already-known facts,
  • loses awareness of recent edits and the current task pointer.

Users report that auto-compaction can reset the model's working state, retaining only a lossy "memento" summary instead of the tool-call history and concrete actions.

This wastes tokens/time and degrades UX.

Related reports:


2. Goals

  • G1. Make compaction a "state checkpoint", not a "narrative summary".
  • G2. Compaction must succeed even when context is full (no extra model call required).
  • G3. Deterministic + testable: given the same inputs, checkpoint and view bytes are identical.
  • G4. Dramatically reduce redundant file re-reads after compaction when artifacts are unchanged.
  • G5. No silent wrongness: stale derived facts must become SUSPECT automatically.
  • G6. Keep it bounded and cheap: stable caps, stable formatting.

3. Non-Goals

  • N1. Cross-session long-term memory / personal preferences.
  • N2. Storing "agent behavior policy" persistently (prompt-injection risk).
  • N3. Semantic search / retrieval ranking inside the checkpoint (can be an optional later layer).
  • N4. A new logging system: we reuse rollout JSONL.

4. Architecture (high-level)

Existing:

rollout-*.jsonl  (already produced by CLI)

New:

checkpoint_v1.json = deterministic projection of rollout + small validated semantic updates

Used at runtime:

view_v1(checkpoint, caps) -> injected as a "SESSION_CHECKPOINT v1" message after compaction/resume

ASCII diagram:

rollout.jsonl (events) ──► reduce() ──► checkpoint_v1.json ──► view_v1(caps) ──► LLM context
     ^  already exists             ^ new, tiny                 ^ stable text

5. Data Model (Checkpoint v1)

We keep a single JSON file as the "single source of truth" after compaction.

5.1 Types

type FactStatus = "VALID" | "SUSPECT"

type Artifact = {
  uri: string              // e.g. "src/auth.py"
  kind: "file" | "tool_output" | "command"
  hash?: string            // if available
  lastObservedSeq: number
}

type EvidenceRef = {
  source: "user" | "file" | "tool_output"
  ref: string              // message id, file path, tool call id
  hash?: string
}

type FactRecord = {
  value: string
  evidence: EvidenceRef
  dependsOn: Array<{ uri: string, hash?: string }>
  status: FactStatus
}

type DecisionRecord = {
  decisionId: string
  topic?: string
  decision: string
  rationale: string
  supersedes?: string
  evidence: EvidenceRef
}

type Plan = {
  steps: Array<{ id: string, text: string }>
  done: Record<string, boolean>
  evidence?: EvidenceRef
}

type CheckpointV1 = {
  schemaVersion: 1
  seq: number                         // last applied seq
  task: { text: string, evidence: EvidenceRef } | null
  plan: Plan
  decisions: DecisionRecord[]
  artifacts: Record<string, Artifact> // key = uri
  facts: Record<string, FactRecord>
  recentArtifacts: string[]           // URIs, stable bounded list
}

5.2 Boundedness

Hard limits (example defaults):

  • maxFactsTotal = 64
  • maxDecisionsTotal = 32
  • maxPlanStepsTotal = 32
  • maxRecentArtifacts = 16
  • maxValueChars = 160 (truncate deterministically)

Eviction is deterministic:

  • facts: evict oldest by lastTouchedSeq, tie-break lexicographically by key
  • decisions: evict oldest by seq
  • recentArtifacts: keep most recent unique URIs, bounded
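The fact-eviction rule above can be written as a pure helper. `lastTouchedSeq` follows the RFC's naming; the exact record shape and tie-break direction are illustrative assumptions:

```typescript
// Sketch: deterministic fact eviction. Keep the `max` most recently touched facts;
// ties break lexicographically by key, so results are identical across runs/platforms.
type FactEntry = { key: string; lastTouchedSeq: number };

function evictFacts(facts: FactEntry[], max: number): FactEntry[] {
  const sorted = [...facts].sort((a, b) => {
    // Newest first; among equal seq, smaller key survives longer (assumption).
    if (b.lastTouchedSeq !== a.lastTouchedSeq) return b.lastTouchedSeq - a.lastTouchedSeq;
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return sorted.slice(0, max);
}
```

Because the comparator is total (seq, then key), the surviving set never depends on input order or sort stability.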

6. Event Sources (no new logging layer)

We reuse existing rollout JSONL (already written by CLI).

We derive:

  • artifact observations from tool calls (host truth)
  • task pointer from last user message

Optionally, we accept validated semantic updates from the model (facts/plan/decisions) via a strict JSON tool.

6.1 Host-owned updates (always)

From tool calls / file ops:

  • artifact observed: {uri, kind, hash?, seq}
  • update recentArtifacts

Critical rule: hashes are computed/recorded by the host, not by the model.
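A minimal sketch of this host-side reduce step, using the §5.1 field names (the reducer function itself is an assumption, not existing CLI code):

```typescript
// Host-owned update (§6.1): record an artifact observation and refresh recentArtifacts.
type Artifact = {
  uri: string;
  kind: "file" | "tool_output" | "command";
  hash?: string;
  lastObservedSeq: number;
};

function observeArtifact(
  artifacts: Record<string, Artifact>,
  recent: string[],
  obs: { uri: string; kind: Artifact["kind"]; hash?: string; seq: number },
  maxRecent: number
): { artifacts: Record<string, Artifact>; recent: string[] } {
  // Host truth: overwrite the artifact record; the hash comes from the host, never the model.
  const next = {
    ...artifacts,
    [obs.uri]: { uri: obs.uri, kind: obs.kind, hash: obs.hash, lastObservedSeq: obs.seq },
  };
  // Most-recent-first, unique, bounded list of URIs.
  const nextRecent = [obs.uri, ...recent.filter((u) => u !== obs.uri)].slice(0, maxRecent);
  return { artifacts: next, recent: nextRecent };
}
```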

6.2 Model-proposed semantic updates (optional MVP, but recommended)

Provide a tool:

memory.apply(payload: { kind: "plan"|"decision"|"fact", ... })

The reducer validates:

  • must include evidence that references an existing artifact/tool output id or user message id
  • must include dependsOn for facts (URIs), when applicable
  • cannot write "behavior policies" (see Security)

If invalid: reject tool call (no state update).
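An illustrative validator for a fact payload, mirroring the bullet list above. The payload shape, helper name, and the crude "behavior policy" heuristic are all assumptions; a real implementation would be stricter:

```typescript
// Sketch: validate a model-proposed fact update from memory.apply (§6.2).
// Returns null if valid, or a rejection reason.
type EvidenceRef = { source: "user" | "file" | "tool_output"; ref: string; hash?: string };
type FactPayload = {
  kind: "fact";
  key: string;
  value: string;
  evidence: EvidenceRef;
  dependsOn: { uri: string; hash?: string }[];
};

function validateFact(p: FactPayload, knownAnchors: Set<string>): string | null {
  // Evidence must anchor to something the host has already seen
  // (artifact URI, tool-call id) or a user message id.
  if (!p.evidence || !p.evidence.ref) return "missing evidence";
  if (p.evidence.source !== "user" && !knownAnchors.has(p.evidence.ref))
    return "evidence does not reference a known anchor";
  // File-derived facts must declare dependencies so staleness can be derived (§7).
  if (p.evidence.source === "file" && p.dependsOn.length === 0)
    return "file-derived fact needs dependsOn";
  // Crude guard against storing normative agent policies (§11 S2) — assumption.
  if (/\b(always|never)\b/i.test(p.value)) return "looks like a behavior policy";
  return null;
}
```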


7. Staleness Derivation (no invalidate event)

Fact.status is derived:

A fact is VALID iff:

  • for every dep in fact.dependsOn:
    • checkpoint.artifacts[dep.uri].hash exists
    • and matches dep.hash (when dep.hash exists)

Otherwise SUSPECT.

If the current artifact hash is unknown → SUSPECT.

This guarantees no silent wrongness.
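The VALID/SUSPECT rule above, written as a pure function (type names follow §5.1; the function itself is a sketch):

```typescript
// Derive FactStatus from dependencies vs. currently observed artifact hashes (§7).
type Dep = { uri: string; hash?: string };
type ArtifactView = { hash?: string };

function factStatus(dependsOn: Dep[], artifacts: Record<string, ArtifactView>): "VALID" | "SUSPECT" {
  for (const dep of dependsOn) {
    const current = artifacts[dep.uri];
    // Unknown current hash => we cannot vouch for the fact.
    if (!current || current.hash === undefined) return "SUSPECT";
    // Recorded hash that no longer matches => stale.
    if (dep.hash !== undefined && dep.hash !== current.hash) return "SUSPECT";
  }
  return "VALID";
}
```

Note that a fact with no dependencies is trivially VALID, which is why the reducer should require `dependsOn` for file-derived facts (§6.2).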


8. view_v1(checkpoint, caps) — deterministic, jitter-free

view_v1 must be:

  • pure
  • byte-for-byte stable for the same (checkpoint, caps)
  • no token counting, no heuristics, no randomness

8.1 ViewCaps

type ViewCaps = {
  maxOpenPlanSteps: number
  maxDonePlanSteps: number
  maxDecisions: number
  maxFactsValid: number
  maxFactsSuspect: number
  maxRecentArtifacts: number
  maxValueChars: number
}

8.2 Output format (stable)

[SESSION_CHECKPOINT v1]

[TASK]
- ...

[PLAN]
- [ ] ... (id=...)
- [x] ... (id=...)

[RECENT_ARTIFACTS]
- file: src/auth.py (hash=...)
- cmd:  pytest -q
- file: tests/test_auth.py (hash=...)

[DECISIONS]
- ... — ... (id=... supersedes=... evidence=source:ref)

[FACTS_VALID]
- key: value (evidence=source:ref deps=n)

[FACTS_SUSPECT]
- key: value (why=SUSPECT dep=uri)

Selection ordering rules are fixed: facts lexicographic by key; plan steps in stable plan order; decisions are the most recent non-superseded entries.

Truncation is deterministic:

  • if value.length > maxValueChars => take the first (maxValueChars - 1) chars + "…"
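As a pure helper, the truncation rule is one line (sketch):

```typescript
// Deterministic truncation (§8.2): no heuristics, byte-for-byte stable output.
function truncateValue(value: string, maxValueChars: number): string {
  if (value.length <= maxValueChars) return value;
  // First (maxValueChars - 1) characters plus a single ellipsis character,
  // so the result is exactly maxValueChars long.
  return value.slice(0, maxValueChars - 1) + "…";
}
```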

9. Compaction Behavior Change

Current compaction tries to summarize conversation.

New behavior:

  1. Persist checkpoint_v1.json (projection).
  2. Start a fresh model context and inject:
    • standard base instructions / AGENTS.md (unchanged)
    • a fixed short "how to use checkpoint" instruction
    • view_v1(checkpoint, caps)

No LLM call is required to generate the checkpoint.

This also makes /compact usable even at 100% context usage.


10. Resume Behavior

On resume (from rollout JSONL):

  • load checkpoint_v1.json if present (or rebuild from rollout)
  • inject view_v1 similarly
  • bias toward completing open plan steps

11. Security / Integrity

  • S1. Actor constraints:
    • Only "user/system" can set task pointer.
    • Only host can set artifact hashes/observations.
  • S2. Facts/decisions are NOT "behavior policies":
    • Reducer rejects updates that attempt to store normative "always do X" agent policies.
  • S3. Evidence gating:
    • facts derived from file/tool output must reference stable anchors (path/tool-call id).
  • S4. Avoid persistent injection:
    • tool outputs are never treated as instructions; they are only evidence.

12. Metrics / Acceptance Criteria

  • A1. After compaction, unchanged artifacts do not get re-read purely to "remember what we learned".
  • A2. Compaction always reduces context usage to a predictable bounded range.
  • A3. Staleness flips facts to SUSPECT automatically on hash mismatch/unknown.
  • A4. view_v1 is deterministic and unit-tested (golden tests).
  • A5. Checkpoint is bounded on disk and stable across platforms.

13. Implementation Plan (incremental)

Phase 0 (1 PR):

  • Write checkpoint_v1.json projection:
    • task pointer from last user message
    • artifacts + recentArtifacts from tool calls / file ops
  • Implement view_v1 + inject on compaction/resume
  • No model semantic updates yet

Phase 1:

  • Add memory.apply tool for plan/decision/fact updates with strict validation
  • Add staleness derivation + SUSPECT rendering

Phase 2:

  • Drift guardrails (optional): if next action edits unrelated file, ask confirmation
  • UX: /checkpoint show (optional)

14. Open Questions

  • Q1. Where to store checkpoint_v1.json (same folder as rollout? recommended).
  • Q2. Hashing strategy: cheap vs strong (git blob hash, content hash, or mtime+size fallback).
  • Q3. Default caps per model/tool-output limits.
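For Q2, one cheap-but-strong candidate is git's blob hash, which matches `git hash-object` output and so composes well with repositories Codex already works in. A minimal sketch (Node's crypto module is assumed here):

```typescript
// Git blob hash: SHA-1 over "blob <byte-length>\0" followed by the raw content.
import { createHash } from "node:crypto";

function gitBlobHash(content: Buffer): string {
  const header = Buffer.from(`blob ${content.length}\0`, "utf8");
  return createHash("sha1").update(header).update(content).digest("hex");
}
```

The mtime+size fallback would trade this integrity for speed; the reducer treats an unknown or mismatched hash as SUSPECT either way (§7).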

Related Issues

Additional information

I'm willing to implement Phase 0 (checkpoint_v1.json projection + view_v1 + inject on compaction/resume) as a single PR. Can deliver working code with golden tests in a few days. Happy to iterate based on maintainer feedback.

Labels: enhancement (New feature or request)