What feature would you like to see?
Status: Draft
Target: Codex CLI + IDE extension (shared session format)
0. Summary
Replace lossy "conversation summarization" compaction with a deterministic, host-generated checkpoint.
Key idea:
- The Codex CLI already writes a structured per-session event log:
$CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl.
- We add a tiny derived projection: checkpoint_v1.json.
- Compaction becomes a local operation: reset context + inject view_v1(checkpoint, caps).
No LLM is required to produce the checkpoint.
1. Motivation / Problem
After compaction (manual or auto), Codex often:
- re-reads the same files,
- re-derives already known facts,
- loses awareness of recent edits or task pointer.
Users report that auto-compaction can reset the model's working state, leaving only a lossy "memento" summary instead of tool-call history and concrete actions.
This wastes tokens/time and degrades UX.
Related reports:
- "memento summary instead of full tool call history" and the model forgetting edits mid-task: Auto compaction causes GPT-5-Codex to lose the plot. It forgets it is mid-task, forgets it has edited files and stops. #5957
- compaction fails or does not help: /compact does not work. #4813, /compact doesn't correctly optimize the context (v0.63.0) #7232
- compaction loop / hangs: Context compaction loop leaves ~5% context and stalls session (eventually 0% / interrupted) #8365, Codex agent is stuck in compaction loop #8481, Context compaction stalls and the session hangs after large tool outputs (codex-cli 0.74.0) #8402
- resume loses task intent; request for explicit task pointer + resume checkpoint: Bug report: Session resume after rate limit loses task intent and continues on wrong context #8310
2. Goals
- G1. Make compaction a "state checkpoint", not a "narrative summary".
- G2. Compaction must succeed even when context is full (no extra model call required).
- G3. Deterministic + testable: given the same inputs, checkpoint and view bytes are identical.
- G4. Dramatically reduce redundant file re-reads after compaction when artifacts are unchanged.
- G5. No silent wrongness: stale derived facts must become SUSPECT automatically.
- G6. Keep it bounded and cheap: stable caps, stable formatting.
3. Non-Goals
- N1. Cross-session long-term memory / personal preferences.
- N2. Storing "agent behavior policy" persistently (prompt-injection risk).
- N3. Semantic search / retrieval ranking inside the checkpoint (can be an optional later layer).
- N4. A new logging system: we reuse rollout JSONL.
4. Architecture (high-level)
Existing:
rollout-*.jsonl (already produced by CLI)
New:
checkpoint_v1.json = deterministic projection of rollout + small validated semantic updates
Used at runtime:
view_v1(checkpoint, caps) -> injected as a "SESSION_CHECKPOINT v1" message after compaction/resume
ASCII diagram:
rollout.jsonl (events) ──► reduce() ──► checkpoint_v1.json ──► view_v1(caps) ──► LLM context
^ already exists ^ new, tiny ^ stable text
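The reduce() step above can be sketched as a pure fold over rollout events. This is an illustrative shape only: the Event variants and Checkpoint fields here are simplified placeholders, not the final schema.

```typescript
// Simplified rollout event variants (illustrative, not the real rollout schema).
type Event =
  | { type: "user_message"; seq: number; text: string }
  | { type: "artifact_observed"; seq: number; uri: string; hash?: string };

// Simplified checkpoint: just the last-applied seq, task pointer, and artifacts.
type Checkpoint = {
  seq: number;
  task: string | null;
  artifacts: Record<string, { uri: string; hash?: string; lastObservedSeq: number }>;
};

// Pure fold: same events in, same checkpoint out. No LLM call anywhere.
function reduce(events: Event[]): Checkpoint {
  const cp: Checkpoint = { seq: 0, task: null, artifacts: {} };
  for (const ev of events) {
    cp.seq = ev.seq;
    if (ev.type === "user_message") {
      cp.task = ev.text; // task pointer = last user message
    } else {
      cp.artifacts[ev.uri] = { uri: ev.uri, hash: ev.hash, lastObservedSeq: ev.seq };
    }
  }
  return cp;
}
```

Because reduce() is deterministic, the checkpoint can always be rebuilt from the rollout if the derived file is lost.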
5. Data Model (Checkpoint v1)
We keep a single JSON file as the "single source of truth" after compaction.
5.1 Types
type FactStatus = "VALID" | "SUSPECT"
type Artifact = {
uri: string // e.g. "src/auth.py"
kind: "file" | "tool_output" | "command"
hash?: string // if available
lastObservedSeq: number
}
type EvidenceRef = {
source: "user" | "file" | "tool_output"
ref: string // message id, file path, tool call id
hash?: string
}
type FactRecord = {
value: string
evidence: EvidenceRef
dependsOn: Array<{ uri: string, hash?: string }>
status: FactStatus
}
type DecisionRecord = {
decisionId: string
topic?: string
decision: string
rationale: string
supersedes?: string
evidence: EvidenceRef
}
type Plan = {
steps: Array<{ id: string, text: string }>
done: Record<string, boolean>
evidence?: EvidenceRef
}
type CheckpointV1 = {
schemaVersion: 1
seq: number // last applied seq
task: { text: string, evidence: EvidenceRef } | null
plan: Plan
decisions: DecisionRecord[]
artifacts: Record<string, Artifact> // key = uri
facts: Record<string, FactRecord>
recentArtifacts: string[] // URIs, stable bounded list
}
5.2 Boundedness
Hard limits (example defaults):
- maxFactsTotal = 64
- maxDecisionsTotal = 32
- maxPlanStepsTotal = 32
- maxRecentArtifacts = 16
- maxValueChars = 160 (truncate deterministically)
Eviction is deterministic:
- facts: evict oldest by lastTouchedSeq, tie-break by key (lexicographic)
- decisions: evict oldest by seq
- recentArtifacts: keep most recent unique URIs, bounded
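A minimal sketch of the deterministic fact-eviction rule. The FactEntry shape (a key plus a lastTouchedSeq) is an assumption for illustration; the comparator avoids localeCompare so ordering is identical across platforms.

```typescript
// Hypothetical fact entry carrying the eviction bookkeeping field.
type FactEntry = { key: string; lastTouchedSeq: number };

// Keep the most recently touched entries up to maxFactsTotal;
// ties on lastTouchedSeq are broken by lexicographic key so the
// result is byte-for-byte reproducible (no locale-dependent compare).
function evictFacts(facts: FactEntry[], maxFactsTotal: number): FactEntry[] {
  return [...facts]
    .sort(
      (a, b) =>
        b.lastTouchedSeq - a.lastTouchedSeq ||
        (a.key < b.key ? -1 : a.key > b.key ? 1 : 0),
    )
    .slice(0, maxFactsTotal);
}
```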
6. Event Sources (no new logging layer)
We reuse existing rollout JSONL (already written by CLI).
We derive:
- artifact observations from tool calls (host truth)
- task pointer from last user message
Optionally, we accept validated semantic updates from the model (facts/plan/decisions) via a strict JSON tool.
6.1 Host-owned updates (always)
From tool calls / file ops:
- artifact observed: {uri, kind, hash?, seq}
- update recentArtifacts
Critical rule: hashes are computed/recorded by the host, not by the model.
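A sketch of a host-side observation helper, assuming Node's built-in crypto module and SHA-256 content hashing (the actual hashing strategy is Open Question Q2). The point is that the hash comes from the host reading bytes, never from model output.

```typescript
import { createHash } from "node:crypto";

type Artifact = {
  uri: string;
  kind: "file" | "tool_output" | "command";
  hash?: string;
  lastObservedSeq: number;
};

// Record a file observation: the host hashes the bytes it actually read,
// so the model cannot forge or drift the recorded hash.
function observeFile(
  artifacts: Record<string, Artifact>,
  uri: string,
  contents: Buffer,
  seq: number,
): void {
  const hash = createHash("sha256").update(contents).digest("hex");
  artifacts[uri] = { uri, kind: "file", hash, lastObservedSeq: seq };
}
```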
6.2 Model-proposed semantic updates (optional MVP, but recommended)
Provide a tool:
memory.apply(payload: { kind: "plan"|"decision"|"fact", ... })
The reducer validates:
- must include evidence that references an existing artifact/tool output id or user message id
- must include dependsOn for facts (URIs), when applicable
- cannot write "behavior policies" (see Security)
If invalid: reject tool call (no state update).
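The validation rules above can be sketched as follows. The ApplyPayload shape and the knownRefs set (ids of artifacts/tool outputs/user messages already in the log) are illustrative assumptions.

```typescript
// Hypothetical memory.apply payload; fields simplified for illustration.
type ApplyPayload = {
  kind: "plan" | "decision" | "fact";
  evidence?: { source: "user" | "file" | "tool_output"; ref: string };
  dependsOn?: Array<{ uri: string }>;
  value?: string;
};

// Validate a model-proposed update; an invalid payload is rejected
// outright and produces no state change.
function validateApply(
  payload: ApplyPayload,
  knownRefs: Set<string>,
): { ok: true } | { ok: false; reason: string } {
  if (!payload.evidence) return { ok: false, reason: "missing evidence" };
  if (!knownRefs.has(payload.evidence.ref))
    return { ok: false, reason: "evidence ref not found in log" };
  if (payload.kind === "fact" && !payload.dependsOn?.length)
    return { ok: false, reason: "fact without dependsOn" };
  return { ok: true };
}
```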
7. Staleness Derivation (no invalidate event)
Fact.status is derived:
A fact is VALID iff, for every dep in fact.dependsOn:
- checkpoint.artifacts[dep.uri].hash exists,
- and it matches dep.hash (when dep.hash exists).
Otherwise SUSPECT.
If the current artifact hash is unknown → SUSPECT.
This guarantees no silent wrongness.
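The derivation rule can be expressed directly (types trimmed to the fields the rule reads; this is a sketch, not the final implementation):

```typescript
type Artifact = { uri: string; hash?: string };
type FactRecord = { value: string; dependsOn: Array<{ uri: string; hash?: string }> };

// A fact is VALID iff every dependency's current artifact hash is known
// and matches the recorded dep.hash (when one was recorded).
// Unknown current hash => SUSPECT, never silently VALID.
function deriveFactStatus(
  fact: FactRecord,
  artifacts: Record<string, Artifact>,
): "VALID" | "SUSPECT" {
  for (const dep of fact.dependsOn) {
    const current = artifacts[dep.uri]?.hash;
    if (current === undefined) return "SUSPECT";
    if (dep.hash !== undefined && dep.hash !== current) return "SUSPECT";
  }
  return "VALID";
}
```

Because status is derived on read, no explicit "invalidate" event is ever needed in the log.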
8. view_v1(checkpoint, caps) — deterministic, jitter-free
view_v1 must be:
- pure
- byte-for-byte stable for the same (checkpoint, caps)
- no token counting, no heuristics, no randomness
8.1 ViewCaps
type ViewCaps = {
maxOpenPlanSteps: number
maxDonePlanSteps: number
maxDecisions: number
maxFactsValid: number
maxFactsSuspect: number
maxRecentArtifacts: number
maxValueChars: number
}
8.2 Output format (stable)
[SESSION_CHECKPOINT v1]
[TASK]
- ...
[PLAN]
- [ ] ... (id=...)
- [x] ... (id=...)
[RECENT_ARTIFACTS]
- file: src/auth.py (hash=...)
- cmd: pytest -q
- file: tests/test_auth.py (hash=...)
[DECISIONS]
- ... — ... (id=... supersedes=... evidence=source:ref)
[FACTS_VALID]
- key: value (evidence=source:ref deps=n)
[FACTS_SUSPECT]
- key: value (why=SUSPECT dep=uri)
Selection ordering rules are fixed (facts: lexicographic by key; plan: stable step order; decisions: latest non-superseded).
Truncation is deterministic:
- if value.length > maxValueChars, keep the first (maxValueChars - 1) characters and append "…"
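The truncation rule as code (treating maxValueChars as a count of UTF-16 code units, which is an assumption this sketch makes):

```typescript
// Deterministic truncation: output never exceeds maxValueChars, and the
// same input always yields the same bytes (no token counting, no heuristics).
function truncateValue(value: string, maxValueChars: number): string {
  if (value.length <= maxValueChars) return value;
  return value.slice(0, maxValueChars - 1) + "…";
}
```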
9. Compaction Behavior Change
Current compaction tries to summarize the conversation.
New behavior:
- Persist checkpoint_v1.json (projection).
- Start a fresh model context and inject:
  - standard base instructions / AGENTS.md (unchanged)
  - a fixed short "how to use checkpoint" instruction
  - view_v1(checkpoint, caps)
No LLM call is required to generate the checkpoint.
This also makes /compact usable even at 100% context usage.
10. Resume Behavior
On resume (from rollout JSONL):
- load checkpoint_v1.json if present (or rebuild it from the rollout)
- inject view_v1 similarly
- bias toward completing open plan steps
11. Security / Integrity
- S1. Actor constraints:
- Only "user/system" can set task pointer.
- Only host can set artifact hashes/observations.
- S2. Facts/decisions are NOT "behavior policies":
- Reducer rejects updates that attempt to store normative "always do X" agent policies.
- S3. Evidence gating:
- facts derived from file/tool output must reference stable anchors (path/tool-call id).
- S4. Avoid persistent injection:
- tool outputs are never treated as instructions; they are only evidence.
12. Metrics / Acceptance Criteria
- A1. After compaction, unchanged artifacts do not get re-read purely to "remember what we learned".
- A2. Compaction always reduces context usage to a predictable bounded range.
- A3. Staleness flips facts to SUSPECT automatically on hash mismatch/unknown hash.
- A4. view_v1 is deterministic and unit-tested (golden tests).
- A5. Checkpoint is bounded on disk and stable across platforms.
13. Implementation Plan (incremental)
Phase 0 (1 PR):
- Write the checkpoint_v1.json projection:
  - task pointer from last user message
  - artifacts + recentArtifacts from tool calls / file ops
- Implement view_v1 + injection on compaction/resume
- No model semantic updates yet
Phase 1:
- Add the memory.apply tool for plan/decision/fact updates with strict validation
- Add staleness derivation + SUSPECT rendering
Phase 2:
- Drift guardrails (optional): if next action edits unrelated file, ask confirmation
- UX: /checkpoint show (optional)
14. Open Questions
- Q1. Where to store checkpoint_v1.json (same folder as the rollout? recommended).
- Q2. Hashing strategy: cheap vs strong (git blob hash, content hash, or mtime+size fallback).
- Q3. Default caps per model/tool-output limits.
Related Issues
- Support Task-Scoped Context with Session-Level Code Container to Reduce Redundant Token Usage #6102 — task-scoped context containers
- Control over auto-compaction parameters #4106 — control over auto-compaction parameters
- Proposal: Implement intelligent context selection for improved code generation #668 — intelligent context selection (closed, but related)
- Auto compaction causes GPT-5-Codex to lose the plot. It forgets it is mid-task, forgets it has edited files and stops. #5957 — memento summary loses tool call history
- /compact does not work. #4813, /compact doesn't correctly optimize the context (v0.63.0) #7232 — compaction fails or doesn't help
- Context compaction loop leaves ~5% context and stalls session (eventually 0% / interrupted) #8365, Codex agent is stuck in compaction loop #8481, Context compaction stalls and the session hangs after large tool outputs (codex-cli 0.74.0) #8402 — compaction loop/hangs
- Bug report: Session resume after rate limit loses task intent and continues on wrong context #8310 — resume loses task intent
Additional information
I'm willing to implement Phase 0 (checkpoint_v1.json projection + view_v1 + inject on compaction/resume) as a single PR. Can deliver working code with golden tests in a few days. Happy to iterate based on maintainer feedback.