[pull] main from openai:main#58

Open
pull[bot] wants to merge 3510 commits into kontext-security:main from openai:main

@pull pull Bot commented Mar 12, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

@pull pull Bot locked and limited conversation to collaborators Mar 12, 2026
@pull pull Bot added ⤵️ pull merge-conflict Sync PR has merge conflicts labels Mar 12, 2026
cconger and others added 27 commits June 21, 2026 13:53
## Summary

- Restore yielded output when an observation receiver disappears before
delivery.
- Preserve pending-frontier output and tool IDs across failed delivery.
- Add dropped-observer coverage for yield and pending observations.

## Why

Canceling a wait must not consume output or a pending frontier that the
caller never received.

## Impact

A later observation can recover undelivered incremental output without
duplication.
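
A minimal sketch of the restore-on-failed-delivery idea, assuming a plain `std::sync::mpsc` channel stands in for the observation receiver. `PendingOutput` and its methods are hypothetical names for illustration, not the crate's actual types.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for a cell's undelivered incremental output.
#[derive(Debug, Default)]
pub struct PendingOutput {
    chunks: Vec<String>,
}

impl PendingOutput {
    /// Buffer one more chunk of output.
    pub fn push(&mut self, chunk: &str) {
        self.chunks.push(chunk.to_string());
    }

    /// Try to hand everything buffered so far to an observer.
    /// Returns `true` only when the observer actually received it.
    pub fn deliver(&mut self, observer: &Sender<Vec<String>>) -> bool {
        let batch = std::mem::take(&mut self.chunks);
        match observer.send(batch) {
            Ok(()) => true,
            // The receiver was dropped before delivery. `SendError` hands
            // the batch back, so put it back instead of losing it.
            Err(err) => {
                self.chunks = err.0;
                false
            }
        }
    }

    /// Output that has not reached any observer yet.
    pub fn buffered(&self) -> &[String] {
        &self.chunks
    }
}
```

With this shape, a wait whose receiver has gone away leaves the buffer untouched, and the next successful `deliver` sends each chunk exactly once.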

## Validation

- Stack-tip validation: `just test -p codex-code-mode -p
codex-code-mode-protocol` (70 passed).
- Parent branch:
`cconger/code-mode-runtime-compact-03e-shutdown-hierarchy`.
## Summary

- Retain the first pre-observation `yield_control()` boundary when a
cell completes before observation.
- Deliver the preserved yield before the buffered completion.
- Keep later unattached yields as no-ops.

## Why

Create followed by the initial wait must preserve the former execute
response boundary even when the script runs to completion first.

## Impact

The first wait observes the same initial yield boundary as before create
and observe were decoupled.

## Validation

- Focused initial-yield signature regression passed.
- Stack-tip validation: `just test -p codex-code-mode -p
codex-code-mode-protocol` (70 passed).
- Parent branch:
`cconger/code-mode-runtime-compact-03e2-observation-delivery`.
## Summary

- add a default-on `auto_compaction` feature flag as an internal escape
hatch
- skip pre-turn, model-switch/hash, and mid-turn automatic compaction
when the flag is disabled
- preserve manual `/compact` behavior and surface the existing
context-window error when the provider runs out of room
- add integration coverage for disabled pre-turn and mid-turn compaction

## Motivation

Long-running SPO optimization rollouts need the option to preserve their
full context and fail on context exhaustion instead of entering another
compaction window. This deliberately uses the existing feature-flag
mechanism rather than adding a dedicated public config or app-server
API.

Disable it with:

```sh
codex --disable auto_compaction
```

## Testing

- `just test -p codex-features` — 51 passed
- `just test -p codex-core auto_compaction_feature_disabled` — 2 passed
- `just fix -p codex-core -p codex-features`
- `just write-config-schema`
- `just test -p codex-core` — the new compaction tests passed; the
overall local run had 54 unrelated environment failures, primarily
missing first-party test binaries and shell-snapshot timeouts
Responses API safety buffering metadata currently stops at the transport
boundary, so app-server clients cannot render the in-progress safety
review state.

This change:
- decodes and deduplicates `safety_buffering` metadata from Responses
API SSE and WebSocket events without suppressing the original response
event
- emits a typed core event containing the requested model plus backend
use cases and reasons
- forwards that event as `turn/safetyBuffering/updated` through
app-server v2 and updates generated protocol schemas
- keeps the side-channel event out of persisted rollouts and turn timing

This supports the Codex Apps buffering UX and depends on the Responses
API backend work in openai/openai#1044569 and
openai/openai#1044571.

Validation:
- focused `codex-core` safety-buffering integration test passes
- `cargo check -p codex-core -p codex-app-server -p
codex-app-server-protocol`
- `just fix -p codex-api -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-rollout -p
codex-rollout-trace -p codex-otel`
- `just fmt`
- broad package test run: 4,430/4,492 passed; 62 unrelated
local-environment/concurrency failures involved unavailable test
binaries, MCP subprocess setup, and app-server timeouts
## Summary

- read the `AutoCompaction` feature flag through `TurnContext::config`
- fix both the mid-turn and pre-sampling compaction checks

## Why

#28260 was validated against an older base where `TurnContext` exposed a
direct `features` field. It was then merged after that field had moved
under `config`, leaving the merge result unable to compile with `E0609`
on `turn_context.features`.

This restores compilation for Bazel, SDK, and argument-comment-lint jobs
that build `codex-core`. Behavior is unchanged: disabling
`auto_compaction` still skips automatic compaction.

## Validation

- `just fmt`
- `CODEX_HOME=/private/tmp/codex-fix-auto-compaction-test-home just test
-p codex-core auto_compaction_feature_disabled` — 4 passed
- `just test -p codex-core` — `codex-core` compiled; 2,722 passed and 89
unrelated local-environment failures remained because the sandbox could
not write the default Codex SQLite/proxy paths and some first-party test
binaries were unavailable
## Summary

A cold-resumed subagent kept its durable thread ID but could receive a
new session ID, splitting one agent tree across multiple sessions after
a restart.

Persist the root session ID in every rollout `SessionMeta`, carry it
through thread creation, and restore it before initializing the resumed
`Session` and `AgentControl`.

## Behavior

For a nested agent tree:

```text
root session R
  parent thread P
    child thread C
```

The child rollout stores:

```text
session_id:       R
parent_thread_id: P
id:               C
```

After a cold resume, the child still belongs to root session `R` while
its immediate parent remains `P`. The integration coverage uses distinct
values for all three IDs so it catches restoring the session from
`parent_thread_id`.

## Legacy rollouts

Previous rollouts have `id` but no `session_id`. `SessionMetaLine`
deserialization treats a missing `session_id` as `id`, keeping those
files readable, listable, and resumable. When a legacy subagent is
resumed through its root, that synthesized child ID no longer overrides
the inherited root-scoped `AgentControl`. New rollouts always persist
the explicit root session ID.
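
The legacy fallback can be sketched as follows. This is a simplified illustration that assumes string IDs and a hypothetical `RawSessionMeta` struct; the real `SessionMetaLine` deserialization is not shown in this description.

```rust
/// Hypothetical, simplified view of the rollout fields discussed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawSessionMeta {
    /// Durable thread ID (`id` in the rollout).
    pub id: String,
    /// Root session ID; absent in legacy rollouts.
    pub session_id: Option<String>,
    /// Immediate parent thread, if this is a subagent.
    pub parent_thread_id: Option<String>,
}

/// Resolve the session a rollout belongs to.
/// A missing `session_id` falls back to `id`, never to `parent_thread_id`.
pub fn effective_session_id(meta: &RawSessionMeta) -> &str {
    meta.session_id.as_deref().unwrap_or(&meta.id)
}
```

Using distinct values for all three IDs, as the integration coverage does, makes an accidental fallback to `parent_thread_id` visible immediately.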
## Why

Multi-agent delegation policy was split across `multiAgentMode`,
`features.multi_agent_mode`, and `usage_hint_enabled`. These controls
could disagree: a requested mode could be downgraded by the feature
flag, and disabling usage hints also disabled mode instructions.

Some clients also need multi-agent tools without adding
delegation-policy text to model context. The previous two-mode API could
not express that directly.

## What changed

`multiAgentMode` is now the only live delegation-policy control:

| Mode | Behavior |
| --- | --- |
| `none` | Keep multi-agent tools available without adding mode instructions. |
| `explicitRequestOnly` | Only delegate after an explicit user request. |
| `proactive` | Delegate when parallel work materially improves speed or quality. |

- new threads default to `explicitRequestOnly`; omitting the mode on
later turns keeps the current value
- thread start, resume, fork, and settings responses always report the
concrete current mode instead of `null`
- mode selection remains sticky across turns and resume
- usage-hint text no longer controls whether mode instructions apply
- `features.multi_agent_mode` and `usage_hint_enabled` remain accepted
as ignored compatibility settings so existing configs continue to load
- app-server documentation and generated schemas describe the three-mode
API
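
As a rough illustration of the sticky three-mode behavior, the sketch below models the mode as an enum whose default is `explicitRequestOnly`. The Rust type and function names are assumptions for this example; only the three wire names come from the table above.

```rust
/// Illustrative model of the three delegation modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum MultiAgentMode {
    /// Tools stay available; no mode instructions are added.
    None,
    /// New threads start here.
    #[default]
    ExplicitRequestOnly,
    Proactive,
}

impl MultiAgentMode {
    /// Wire name as listed in the mode table.
    pub fn wire_name(self) -> &'static str {
        match self {
            MultiAgentMode::None => "none",
            MultiAgentMode::ExplicitRequestOnly => "explicitRequestOnly",
            MultiAgentMode::Proactive => "proactive",
        }
    }
}

/// Sticky selection: omitting the mode on a later turn keeps the current one.
pub fn mode_for_turn(
    current: MultiAgentMode,
    requested: Option<MultiAgentMode>,
) -> MultiAgentMode {
    requested.unwrap_or(current)
}
```

Because the resolved value is always a concrete variant, responses can report the current mode without ever needing `null`.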

## Tests

- `just test -p codex-core multi_agent_mode`
- `just test -p codex-core multi_agent_v2_config_from_feature_table`
- `just test -p codex-core spawn_agent_description`
- `just test -p codex-features`
- `just test -p codex-app-server-protocol`
- `just test -p codex-app-server multi_agent_mode`
## Why

PR #29108 lets the orchestrator send sandbox intent with `process/start`
without wrapping the command for its own operating system.

This PR completes that boundary by making the executor interpret and
enforce the intent using its own filesystem paths and sandbox
implementation.

For example, a macOS TUI targeting a Linux devbox sends `/bin/bash -lc
pwd`. The Linux executor turns that into its own `codex-linux-sandbox
... /bin/bash -lc pwd` launch.

## What changes

- Keep `process/start` unchanged when no sandbox intent is present.
- Convert sandbox `PathUri` values into native paths on the executor.
- Bind symbolic `:workspace_roots` permissions to the executor's native
sandbox cwd.
- Select the sandbox implementation on the executor and wrap the
original command immediately before spawning it.
- Reject sandbox-required execution before spawning when the executor
cannot enforce the intent.
- Pass exec-server runtime paths into process creation so Linux can
locate `codex-linux-sandbox`.

The boundary is therefore:

```text
orchestrator                         executor
original argv + sandbox intent  ->  select and enforce local sandbox
```

This PR intentionally treats a denied remote command as an ordinary
command failure. Draft follow-up #29424 carries a semantic
`sandboxDenied` result back to unified exec for the existing approval
and retry flow.

## Platform scope

Linux and macOS use their existing direct-spawn sandbox transforms.

Windows sandboxed remote process launch is intentionally unsupported in
this PR. The current Windows direct-spawn wrapper does not correctly
preserve arbitrary argv or TTY behavior, and it does not pass the full
child environment out of band. The executor rejects the request instead
of running it incorrectly or unsandboxed.
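
The executor-side decision can be sketched roughly as below. All names here are hypothetical, the macOS wrapper name is a placeholder, and the real wrapper flags (the `...` in the example above) are deliberately left out.

```rust
/// Hypothetical description of the executor's own platform.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutorOs {
    Linux,
    MacOs,
    Windows,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LaunchError {
    /// The executor cannot enforce the requested sandbox intent.
    SandboxUnsupported,
}

/// Decide what the executor actually spawns for a `process/start` request.
pub fn prepare_launch(
    os: ExecutorOs,
    argv: &[&str],
    sandbox_requested: bool,
) -> Result<Vec<String>, LaunchError> {
    let original: Vec<String> = argv.iter().map(|s| s.to_string()).collect();
    if !sandbox_requested {
        // No sandbox intent: the request is left unchanged.
        return Ok(original);
    }
    let wrapper = match os {
        ExecutorOs::Linux => "codex-linux-sandbox",
        // Placeholder name; the real macOS transform is not described here.
        ExecutorOs::MacOs => "hypothetical-macos-sandbox",
        // Reject before spawning rather than run unsandboxed.
        ExecutorOs::Windows => return Err(LaunchError::SandboxUnsupported),
    };
    // Wrapper-specific flags are omitted in this sketch.
    let mut wrapped = vec![wrapper.to_string()];
    wrapped.extend(original);
    Ok(wrapped)
}
```

The key property is that the orchestrator only ever sends the original argv plus intent, and a platform that cannot enforce the intent fails closed.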

## Known follow-ups

- The transported permission profile can still contain
orchestrator-materialized helper or explicit paths. A `TODO(jif)` marks
where the executor boundary should receive pre-host-materialization
permission intent.
- The sandbox wrapper currently replaces a requested custom inner
`arg0`. A `TODO(jif)` marks where this must be preserved or rejected
explicitly.
- Draft PR #29424 contains the deferred sandbox-denial classification
and approval/retry behavior.

## Rollout assumption

This executor-sandbox stack is unreleased and its client and executor
are expected to move together. This PR does not add mixed-version
negotiation with older exec servers.
## Summary

- Add backend-client types and fetch support for active workspace
messages.
- Add the app-server v2 `account/workspaceMessages/read` method,
generated schemas, and README documentation.
- Delegate workspace-message eligibility to the Codex backend feature
gate; map a backend 404 to `featureEnabled: false`.
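
A small sketch of the 404 mapping, under the assumption that the backend reply can be reduced to either a message list or a bare status code. `BackendReply` and `WorkspaceMessagesRead` are hypothetical shapes; only the `featureEnabled` field name comes from the summary above.

```rust
/// Hypothetical reduction of the backend reply.
pub enum BackendReply {
    Messages(Vec<String>),
    Status(u16),
}

/// Hypothetical shape of the app-server response.
#[derive(Debug, PartialEq, Eq)]
pub struct WorkspaceMessagesRead {
    /// Serialized as `featureEnabled` on the wire.
    pub feature_enabled: bool,
    pub messages: Vec<String>,
}

/// Treat a backend 404 as "feature gate is off" rather than as an error.
pub fn map_backend_reply(reply: BackendReply) -> Result<WorkspaceMessagesRead, String> {
    match reply {
        BackendReply::Messages(messages) => Ok(WorkspaceMessagesRead {
            feature_enabled: true,
            messages,
        }),
        BackendReply::Status(404) => Ok(WorkspaceMessagesRead {
            feature_enabled: false,
            messages: Vec::new(),
        }),
        BackendReply::Status(code) => Err(format!("backend returned status {code}")),
    }
}
```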

## Testing

- `just write-app-server-schema`
- `just test -p codex-backend-client`
- `just test -p codex-app-server-protocol`
- `just test -p codex-app-server workspace_messages`
- `just fix -p codex-backend-client -p codex-app-server-protocol -p
codex-app-server`
- `just fmt`

## Stack

- Base PR for #28232, which adds the TUI status-line integration.
## Why

Every successful Responses WebSocket event currently produces three
local log records: the full payload at TRACE, an OpenTelemetry log
event, and an OpenTelemetry trace event.

On busy threads these records fill the 1,000-row log partition in
seconds and cause continuous SQLite insert-and-prune churn.

Related to
https://openai.slack.com/archives/C095U48JNL9/p1782128972644209

## What changed

- Stop logging each successful Responses WebSocket payload at TRACE.
- Stop emitting `codex.websocket_event` as OpenTelemetry log and trace
events.
- Keep WebSocket event counters, duration metrics, response timing
metrics, parsing, and error handling.
## Why

Nonblocking environment snapshots allow a turn to reach the model while
a remote environment is still starting. The initial context can describe
that environment as still loading, but nothing currently refreshes the
model-visible environment context when startup finishes during the same
turn.

This adds the first request-scoped reconciliation slice on top of
#28683. It is gated by `DeferredExecutor` and intentionally updates only
model-visible environment context; tools and other environment-derived
state will migrate separately.

## What

- Add a minimal `StepContext` containing the environment snapshot
captured before each sampling request.
- Render attached environments with their resolved shell and starting
environments with `still loading`.
- Track the latest environment state recorded in model history and
append a bounded update only when it changes.
- Seed that baseline from full initial context so ready-at-start
environments are not duplicated.
- Clear the in-memory baseline when history is rewritten so replacement
history can be refreshed safely.
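
The baseline tracking described in the list above might look roughly like this. It is a simplified sketch with hypothetical names (`EnvState`, `EnvBaseline`); the actual `StepContext` and history types are not reproduced here.

```rust
/// Hypothetical model-visible state of one environment.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EnvState {
    StillLoading,
    Ready { shell: String, cwd: String },
}

/// Tracks the latest environment state already recorded in model history.
#[derive(Debug, Default)]
pub struct EnvBaseline {
    last_recorded: Option<EnvState>,
}

impl EnvBaseline {
    /// Seed from the full initial context so a ready-at-start
    /// environment is not injected a second time.
    pub fn seed(&mut self, initial: EnvState) {
        self.last_recorded = Some(initial);
    }

    /// Called with the snapshot captured before each sampling request.
    /// Returns an update to append only when the state changed.
    pub fn update_for_request(&mut self, snapshot: &EnvState) -> Option<EnvState> {
        if self.last_recorded.as_ref() == Some(snapshot) {
            return None;
        }
        self.last_recorded = Some(snapshot.clone());
        Some(snapshot.clone())
    }

    /// History was rewritten: forget the baseline so the replacement
    /// history can be refreshed.
    pub fn clear(&mut self) {
        self.last_recorded = None;
    }
}
```

In a three-request turn this yields no update while the environment is still loading, one update when it becomes ready, and nothing afterwards.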

## Testing

- `just test -p codex-core deferred_executor`
- `just test -p codex-core
environment_context_baseline_deduplicates_until_history_is_replaced`

The integration coverage verifies that a pending environment reaches the
first request, the ready state reaches the next request, later requests
do not duplicate it, and ready-at-start environments remain
single-injected.

<details>
<summary>Live verification</summary>

- Connected to a real remote executor with startup deliberately delayed
and forced three sampling requests in one turn.
- Inspected the raw model inputs: request 1 showed the remote
environment as `still loading`, request 2 appended its ready shell and
cwd, and request 3 contained no duplicate ready update.
- With the feature disabled, startup waited for the delayed executor and
the first request contained only the ready environment.
- With a synchronously ready environment and the feature enabled, the
first request contained one environment context with no duplicate.
- Executed `pwd` and read a marker file through the remote process
runner; the command exited successfully and returned the remote cwd and
marker contents.

</details>
## Description

Restore `thread_source` in `x-codex-turn-metadata`.

#27122 inadvertently removed `thread_source` from
`x-codex-turn-metadata` on the assumption that it was passed in
`responsesapi_client_metadata`. It is actually a top-level thread
app-server API field.

This also reserves the key so `responsesapi_client_metadata` cannot
override it.
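
One way to picture the reserved key is a merge that skips client-supplied entries for reserved names. This is an illustrative sketch with string-valued metadata and a hypothetical `build_turn_metadata` helper, not the actual implementation.

```rust
use std::collections::BTreeMap;

/// Keys that client metadata is not allowed to override.
const RESERVED_KEYS: &[&str] = &["thread_source"];

/// Merge client metadata into the turn metadata, keeping reserved keys
/// under the control of the thread itself.
pub fn build_turn_metadata(
    thread_source: &str,
    client_metadata: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for (key, value) in client_metadata {
        if RESERVED_KEYS.contains(&key.as_str()) {
            // Ignore attempts to override a reserved key.
            continue;
        }
        merged.insert(key.clone(), value.clone());
    }
    merged.insert("thread_source".to_string(), thread_source.to_string());
    merged
}
```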
## Why

The local SQLite log sink currently enables TRACE for every target. This
persists high-volume dependency logs bridged through `target=log` and
duplicates OpenTelemetry mirror events in `codex_otel.log_only` and
`codex_otel.trace_safe`.

These records rapidly consume the per-partition log budget and cause
unnecessary SQLite insert-and-prune churn.

## What changed

- Keep TRACE persistence for other targets.
- Exclude bridged `target=log` events from the SQLite sink.
- Exclude the two `codex_otel` mirror targets from the SQLite sink.
- Share the same filter between app-server and TUI.

Remote OpenTelemetry export and metrics are unchanged.
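
The filter rule amounts to a small predicate on the event target. The sketch below assumes exact target matches and a hypothetical function name; the real filter may be expressed differently, for example as a tracing filter directive.

```rust
/// Returns true when an event with this target should be written to the
/// local SQLite log sink. Exact matching is an assumption of this sketch.
pub fn persists_to_sqlite(target: &str) -> bool {
    !matches!(
        target,
        // Dependency logs bridged from the `log` crate.
        "log"
        // OpenTelemetry mirror events, already exported remotely.
        | "codex_otel.log_only"
        | "codex_otel.trace_safe"
    )
}
```

Sharing one predicate like this between app-server and TUI keeps the two sinks from drifting apart.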
## What

- make Fjord's centralized response-item image preparation unconditional
for new and resumed history
- have local user images and `view_image` outputs always defer decoding
and resizing to that path
- retain `resize_all_images` as an ignored, removed compatibility key
for released clients
- delete the flag-off producer paths and obsolete policy-specific tests

## Why

Centralized preparation is now the intended image path. Keeping the
runtime feature checks also kept two image-processing implementations
alive and allowed client config to select the legacy behavior.

This is a clean replacement for #28975, rebuilt from the latest `main`.

## How

`prepare_response_items` now runs whenever items enter history and
whenever persisted history is reconstructed. Producers emit deferred
image data, so malformed images become the existing model-visible
placeholder instead of failing the session at the producer.

## Test plan

- `just fmt`
- `just fix -p codex-core -p codex-features`
- `just test -p codex-features` — 52 passed
- focused affected `codex-core` set — 20 passed
- `just test -p codex-core handle_accepts_explicit_high_detail` — 1
passed
- full `just test -p codex-core` attempt — 2,723 passed; 88 unrelated
environment failures from read-only `~/.codex` SQLite state and
unavailable integration helper binaries
The custom Windows argument-comment-lint job was temporarily moved to
`windows-2022` in #28940 after hermetic LLVM source extraction failed on
the newer runner. This takes the upstream extraction fix so the job can
return to the intended custom runner.

This upgrades `llvm` to `0.7.9` and `rules_cc` to `0.2.18`, refreshes
the module lock, rebases the remaining Windows and custom libc++
patches, drops the obsolete symlink-extraction workaround, and restores
the `windows-x64` runner configuration.

Validation:

- Verified all LLVM patches apply cleanly against the `0.7.9` source.
- Built `@llvm-project//compiler-rt:clang_rt.builtins.static`.
This PR moves construction of `PluginTelemetryMetadata` from loader and
model helpers into `PluginsManager`, which already owns installed plugin
state and will eventually perform remote identity enrichment. The
metadata type remains in `codex-plugin`, and serialized analytics events
remain unchanged.

## Before

```mermaid
flowchart LR
    subgraph Events["Analytics event paths"]
        direction TB
        Lifecycle["Local install / uninstall"]
        Config["Enable / disable"]
        Remote["Remote install"]
        Used["Plugin used"]
    end

    subgraph Construction["Metadata construction"]
        direction TB
        Loader["Loader telemetry helpers"]
        Summary["PluginCapabilitySummary::telemetry_metadata"]
        Override["Caller adds remote_plugin_id"]
    end

    Metadata["PluginTelemetryMetadata"]

    Lifecycle --> Loader
    Config --> Loader
    Remote --> Loader
    Loader -->|"local events"| Metadata
    Loader -->|"remote install"| Override
    Override --> Metadata
    Used --> Summary
    Summary --> Metadata
```

Telemetry metadata was constructed through loader helpers, a
capability-summary method, and a remote-install call-site override.

## After

```mermaid
flowchart LR
    subgraph Events["Analytics event paths"]
        direction TB
        Lifecycle["Local install / uninstall"]
        Config["Enable / disable"]
        Remote["Remote install"]
        Used["Plugin used"]
    end

    Manager["PluginsManager — single construction owner"]
    Metadata["PluginTelemetryMetadata"]

    Lifecycle --> Manager
    Config --> Manager
    Remote -->|"authoritative remote ID"| Manager
    Used -->|"capability summary"| Manager
    Manager --> Metadata
```

Every analytics path delegates metadata construction to
`PluginsManager`. Remote install still supplies its authoritative
backend ID explicitly.

## What Changes

- Make loader code return a focused plugin capability summary instead of
constructing analytics metadata.
- Centralize immutable plugin telemetry metadata construction in
`PluginsManager`.
- Route local install/uninstall, remote install, enable/disable, and
plugin-used emitters through the manager.
- Preserve the current serialized analytics contract exactly.

Normal metadata still has no remote override. Remote install continues
to provide its authoritative backend ID explicitly, so the existing
serializer continues reporting that ID through `plugin_id`.
Snapshot-based enrichment is intentionally deferred to the final PR.

## Testing

- `just test -p codex-core-plugins` (238 tests passed)
- `just test -p codex-plugin` (3 tests passed)
- Scoped Clippy/compile checks passed for `codex-plugin`,
`codex-core-plugins`, `codex-app-server`, and `codex-core`.

## Split Overview

```text
main
├── #27093  Debug analytics capture                 (merged)
├── #27099  Non-mutating plugin smoke               (merged)
├── #27100  Remote install/uninstall smoke          (merged)
└── #27102  Plugin telemetry metadata refactor      ← you are here
    └── #27669  Persist remote plugin identity

After #27102 and #27669 merge:
└── Final PR: add explicit local and remote IDs to plugin analytics
```

Review order and dependencies:

1. [#27093 Add debug-only analytics event
capture](#27093) (merged)
2. [#27099 Add a plugin analytics smoke
workflow](#27099) (merged)
3. [#27100 Add a remote plugin analytics mutation smoke
workflow](#27100) (merged)
4. This metadata refactor, independent and based on `main`
5. [#27669 Persist remote plugin
identity](#27669), stacked on this
PR
6. Final remote-ID behavior PR, created after the prerequisites merge

The original [#26281](#26281)
remains open as the aggregate reference until the final replacement PR
is published.
## Summary

[#26701](#26701) added remote plugin
identity support, [#26702](#26702)
added remote-section fetching and state, and
[#28768](#28768) extracted the
catalog rendering module. This PR builds the product-facing `/plugins`
catalog on that foundation so remote records appear as OpenAI Curated,
Workspace, and Shared with me sections rather than backend marketplace
implementation details.

Plugin details remain read-only for sharing metadata. This PR does not
add share-authoring actions or change the app-server protocol.

## Changes

- Renders OpenAI Curated, Workspace, and Shared with me sections with
loading, empty, and error states.
- Preserves section selection and stable tab ordering as remote sections
transition between fallback and populated states.
- Shows OpenAI Curated loading only when the explicit vertical fallback
request was issued.
- Centralizes remote marketplace identity matching around the existing
marketplace constants.
- Uses product labels for remote marketplaces and identifies the
personal marketplace as Local by its path.
- Shows read-only source, authentication, version, and sharing metadata
in plugin detail views.
- Applies narrow display deduplication for local and remote records
sharing a remote plugin ID:
  - installed records take precedence;
  - local mapped sources are preferred for details only when their
    installed state matches the selected record.
- Returns from detail and confirmation views through the current plugin
cache so newly loaded remote sections are not overwritten by an older
captured response.
- Keeps admin-disabled plugins view-only and labels default-installed
plugins as Available by default.

## Tests

New tests:

- `plugins_popup_admin_disabled_available_plugin_has_view_only_hint`
- `plugins_popup_remote_section_fallback_states_snapshot`
- `plugins_popup_installed_remote_row_keeps_remote_detail_when_local_share_is_uninstalled`

Updated existing plugin catalog tests and snapshots for product labels,
detail metadata, personal-marketplace labeling, and stable tab ordering.

Verification:

- `cargo clippy -p codex-tui --all-targets -- -D warnings`

## Follow-ups

- Local/remote duplicate normalization should eventually move into
app-server. This PR intentionally keeps the compatibility behavior
narrow and display-only.
- PR5 will sanitize sensitive components before displaying Git source
URLs.
## Why

#29113 moved remote sandbox setup and enforcement to the exec server.
That gives the executor ownership of the platform-specific work: a Linux
executor chooses and runs a Linux sandbox even when the Codex
orchestrator is running on macOS or Windows.

It also means the orchestrator no longer knows which concrete sandbox
the executor selected. When that sandbox blocks a remote command, the
orchestrator currently sees only a failed process and can treat the
denial as an ordinary command failure. The existing sandbox approval and
retry path is then skipped.

This PR lets the executor report one portable fact:

> This command probably failed because the executor sandbox blocked it.

The executor keeps its concrete sandbox type private. The protocol sends
only the semantic result.

## Example

Suppose a local macOS Codex session asks a Linux devbox to write outside
the allowed workspace.

Before this PR:

```text
Linux sandbox blocks the write
    -> remote process exits with "Permission denied"
    -> local orchestrator sees an ordinary command failure
    -> the normal sandbox approval and retry path can be skipped
```

With this PR:

```text
Linux sandbox blocks the write
    -> executor reports sandboxDenied: true
    -> unified exec returns UnifiedExecError::SandboxDenied
    -> the existing approval prompt is shown
    -> an approved retry runs through the existing unsandboxed retry path
```

## What changes

### The executor remembers its selected sandbox

The prepared remote process now retains the executor-selected
`SandboxType`. This value never crosses the executor boundary.

Commands started without a sandbox retain `SandboxType::None` and are
never reported as sandbox denials.

### The executor uses the existing denial heuristic

The existing local denial heuristic moves from `codex-core` into the
shared `codex-sandboxing` crate.

When a sandboxed remote process exits, the executor:

1. waits the same short output grace period used by local unified exec;
2. reads the output currently available in the existing retained output
buffer;
3. runs the existing heuristic using the exit code and common denial
messages;
4. stores the yes/no result before publishing the process exit.
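
A heavily simplified sketch of steps 3 and 4 follows. The marker list is an assumption: only "Permission denied" appears in this description, and the real heuristic in `codex-sandboxing` may check other exit codes and messages.

```rust
/// Hypothetical marker list; only "permission denied" is taken from the
/// example in this description.
pub const DENIAL_MARKERS: &[&str] = &["permission denied"];

/// Guess whether a finished process was blocked by the sandbox.
/// `sandboxed` is false for commands started with `SandboxType::None`.
pub fn looks_like_sandbox_denial(sandboxed: bool, exit_code: i32, output: &str) -> bool {
    // Unsandboxed commands are never reported as denials, and this sketch
    // assumes a successful exit is never a denial either.
    if !sandboxed || exit_code == 0 {
        return false;
    }
    let lowered = output.to_lowercase();
    DENIAL_MARKERS.iter().any(|marker| lowered.contains(marker))
}
```

The result is a plain boolean, which is what allows it to be stored before the process exit is published.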

This deliberately matches the old local unified-exec behavior. It does
not add a new streaming classifier, another output buffer, or stronger
output-retention guarantees.

### The protocol reports a portable boolean

`process/read` gains `sandboxDenied`:

```json
{
  "exited": true,
  "exitCode": 1,
  "closed": false,
  "sandboxDenied": true
}
```

The field defaults to `false` when an older executor omits it. The
response does not expose the executor sandbox implementation or
executor-native paths.

### Unified exec uses the existing error path

The exec-server client carries `sandboxDenied` into the unified process
state. If it is true, unified exec returns the existing `SandboxDenied`
error instead of trying to classify remote output using an
orchestrator-side sandbox type.
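
On the client side, the handling can be sketched as below. The struct mirrors the JSON fields shown earlier, with `Option<bool>` standing in for a field that an older executor may omit; the type and function names are illustrative, not the crate's definitions.

```rust
/// Illustrative mirror of the `process/read` fields shown above.
#[derive(Debug, Clone)]
pub struct ProcessReadResponse {
    pub exited: bool,
    pub exit_code: Option<i32>,
    pub closed: bool,
    /// `None` models an older executor that omits `sandboxDenied`.
    pub sandbox_denied: Option<bool>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UnifiedExecError {
    SandboxDenied,
}

/// Map a read response to the unified-exec outcome.
pub fn classify_exit(resp: &ProcessReadResponse) -> Result<Option<i32>, UnifiedExecError> {
    // A missing flag defaults to false.
    let denied = resp.sandbox_denied.unwrap_or(false);
    if resp.exited && denied {
        return Err(UnifiedExecError::SandboxDenied);
    }
    Ok(resp.exit_code)
}
```

Note that `closed` plays no part in the decision, which matches the statement that exit is reported without waiting for stdout or stderr to close.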

Remote process exit remains visible as soon as the process exits. This
PR does not wait for stdout or stderr to close and does not change the
existing process lifecycle.

## Scope

This PR is intentionally limited to matching the existing local
unified-exec behavior for the initial command execution path.

It does not add:

- incremental denial tracking across the full output stream;
- new denial handling for commands completed later through
`write_stdin`;
- new guarantees for preserving the semantic flag during the narrow
reconnect-recovery race.

Those can be considered separately if the same behavior is added for
local execution.

## Test coverage

One remote end-to-end integration test covers the complete intended
flow:

```text
remote read-only sandbox
    -> denied write
    -> executor reports the denial
    -> Codex requests approval
    -> user approves
    -> retry succeeds on the remote executor
```

Existing lifecycle coverage continues to verify that remote process exit
is reported before late output streams close.
…28968)

## Description
This PR cuts Codex over from the generic `ResponseItem.metadata`
(introduced in #28355) to
`ResponseItem.internal_chat_message_metadata_passthrough`, which is the
blessed path and has strongly typed keys.

For now, the MAv2 usage of `metadata` from #28561 has to be dropped
until we figure out where it should live.
## Summary

- use generated image data URLs in the Python SDK examples and notebook
- document HTTP and HTTPS image URLs as deprecated and recommend
`LocalImageInput`
- replace the remote-URL integration test with data-URL coverage

`ImageInput` remains available for data URLs. The SDK does not duplicate
app-server URL validation.

## Testing

- `uv run --frozen --no-sync ruff check --output-format=full .`
- `uv run --frozen --no-sync ruff format --check .`
- full Python SDK test suite with an isolated writable
`CODEX_SQLITE_HOME` (119 passed, 38 skipped)
## Why

The reset flow introduced in #28154 still describes earned reset credits
as "rate-limit resets" and uses generic reset-scope copy. It can also
retain a stale available-credit count after redemption or an account
change, leaving the reset action enabled after the last credit is used.

This follow-up updates terminology only within that reset feature.
Existing rate-limit wording elsewhere in the CLI and TUI is unchanged.

## What changed

- Rename reset-specific `/usage` menu items, startup hints, and reset
dialogs to "usage limit reset."
- Describe monthly resets for Free, Go, and accounts that report a
monthly usage window; otherwise describe the current 5-hour and weekly
limits.
- Recheck a cached zero balance when `/usage` is reopened, and refresh
the balance after redemption so the final reset immediately disables the
action.
- Correlate async refresh results before updating snapshots and clear
account-derived reset state, warnings, prompts, and status surfaces when
the account changes.

## Validation

- `just test -p codex-tui chatwidget::tests::usage` — 29 passed.
- `just test -p codex-tui chatwidget::tests::status_command_tests` — 7
passed.
- Account-boundary prompt and plan-mode prompt regression tests passed.
- `cargo insta pending-snapshots` from `codex-rs/tui` — no pending
snapshots.

<img width="814" height="318" alt="image"
src="https://github.com/user-attachments/assets/2a460e96-458b-4805-8d9f-c759382d21a4"
/>
Monthly view:
<img width="905" height="243" alt="image"
src="https://github.com/user-attachments/assets/179f88e3-08fb-4af5-8dc6-ce6a944ed681"
/>
…ed (#27982)

## Why

The first auto-review currently creates its Guardian child session on
demand, adding avoidable latency before the review can begin. Creating
the ordinary Guardian child during parent-session initialization lets
that child use the existing session startup WebSocket prewarm before the
first escalation. This does not introduce a Guardian-specific prewarm
mechanism.

## What changed

- initialize the existing Guardian review-session manager owned by
`Session` when a thread starts with auto-review enabled and an approval
policy that routes to Guardian
- use the standard Guardian child-session construction and the existing
session startup WebSocket prewarm
- preserve the existing reuse-key invalidation and lazy creation
fallback when startup initialization fails or the effective review
configuration changes
- add an integration test that verifies normal root-session startup
emits a Guardian `generate=false` prewarm request

## Benchmark

I compared release builds against main. Each prompt first ran a
non-escalated `sleep 3`, then requested an escalated marker command.

| binary | count | avg Guardian duration | median Guardian duration | avg Guardian TTFT |
|---|---:|---:|---:|---:|
| origin-main | 10 | 4008.7 ms | 3949.5 ms | 3746.5 ms |
| session-fix | 10 | 2865.0 ms | 2594.0 ms | 2492.7 ms |

Guardian duration fell by 28.5% and Guardian TTFT fell by 33.5%. These
measurements cover Guardian review latency; they do not measure parent
thread-start latency.
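
The quoted percentages follow from the averages in the table. A short worked check, using a hypothetical helper name:

```rust
/// Relative reduction from `before_ms` to `after_ms`, in percent.
pub fn percent_reduction(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}

/// Round to one decimal place for reporting.
pub fn round1(value: f64) -> f64 {
    (value * 10.0).round() / 10.0
}
```

For the average Guardian duration, `(4008.7 - 2865.0) / 4008.7` is about 0.2853, or 28.5%. For the average TTFT, `(3746.5 - 2492.7) / 3746.5` is about 0.3347, or 33.5%.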
## Why

`compile_scoped_filesystem_pattern()` accepted a `_policy_cwd` parameter
even though scoped glob compilation no longer uses the policy working
directory. Keeping that unused argument forced the surrounding
permissions compilation path to keep forwarding `policy_cwd` through
call sites that did not need it, making the API look more dependent on
cwd resolution than it is.

## What changed

Removed the unused cwd parameter from
`compile_scoped_filesystem_pattern()` and the callers that only
forwarded it: `compile_filesystem_permission()`,
`compile_permission_profile()`, and
`compile_permission_profile_selection()`. Workspace root resolution
still keeps `policy_cwd`, because that path still resolves relative
roots against the active policy cwd.

Relevant code:
[`codex-rs/core/src/config/permissions.rs`](https://github.com/openai/codex/blob/b8b9816102e064dae4488ec130cf560f63c1ab78/codex-rs/core/src/config/permissions.rs#L346).

## Verification

- `just test -p codex-core config::permissions`
- `just test -p codex-core` was also run after building
`test_stdio_server`; it passed the touched permissions coverage but
still reported unrelated existing failures in `cli_stream` and shell
snapshot tests.
## Summary

Stacked on #26706.

Adds the shared auth/system-proxy contract that later platform resolver
PRs plug into. This PR moves Codex-owned auth and startup HTTP clients
through a common route-aware boundary, but does not yet add Windows or
macOS system proxy resolution.

The default path remains unchanged when `respect_system_proxy` is absent
or disabled.

## Implementation

- Adds `codex-client/src/outbound_proxy.rs` with the shared
route-selection model:
  - `OutboundProxyConfig`;
  - `ClientRouteClass`;
  - `RouteFailureClass`;
  - `build_reqwest_client_for_route`.
- Preserves the existing reqwest/default-client behavior when no route
config is supplied.
- Uses the fixed MVP routing policy when route config is supplied:
platform system/PAC/WPAD discovery, then explicit env proxy variables,
then direct connection.
- Keeps platform-specific system discovery behind the shared client
boundary. This PR provides the contract and fallback behavior; later
resolver PRs plug in Windows and macOS discovery.
- Adds `login::AuthRouteConfig` so auth call sites depend on a small
policy type instead of platform resolver details.
- Maps the resolved `Config.respect_system_proxy` boolean into
`AuthRouteConfig` for auth-owned clients.
- Wires the route config through browser login, device-code login,
access-token login, login status, logout/revoke, token refresh, API-key
exchange, app-server account login, TUI/app startup, cloud-config
bootstrap, cloud tasks, plugin auth, and exec startup config loading.
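
A minimal sketch of the fallback order described above (system discovery, then explicit env proxy, then direct). `Route` and `select_route` are hypothetical names for illustration; they are not the types in `outbound_proxy.rs`.

```rust
/// Hypothetical outcome of route selection; not the real `codex-client` types.
#[derive(Debug, PartialEq)]
enum Route {
    System(String),
    EnvProxy(String),
    Direct,
}

/// Applies the fixed order: system discovery, then env proxy, then direct.
/// `system` is `None` on platforms that have no resolver yet.
fn select_route(system: Option<String>, env_proxy: Option<String>) -> Route {
    system
        .map(Route::System)
        .or_else(|| env_proxy.map(Route::EnvProxy))
        .unwrap_or(Route::Direct)
}
```

With no resolver available, `select_route(None, env_proxy)` reduces to the env-proxy-then-direct behavior this PR ships.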

## End-user behavior

- No behavior changes by default.
- When `respect_system_proxy = true`, auth-owned clients opt into the
shared route-aware client path.
- On platforms without a resolver implementation in this PR, system
discovery is unavailable and the route-aware path falls back to explicit
env proxy handling, then direct connection.
- Custom CA handling remains separate from proxy route selection and
still runs through the shared client builder.
- No proxy URLs, PAC contents, or resolved platform details are exposed
through the public config surface introduced here.

## Tests

Adds or updates coverage for:

- preserving default auth-client fallback behavior when no route config
is provided;
- injected environment-proxy fallback without mutating process
environment;
- existing login-server E2E flows using explicit `auth_route_config:
None` to guard unchanged default behavior;
- updated auth manager, login, logout, cloud-config, startup, and
plugin-auth call sites passing route config explicitly.
# Summary

Codex required every ChatGPT account to have an email address. A
service-account personal access token can return valid account metadata
without one, so PAT login failed while decoding the metadata response.

This change makes email optional in the account metadata type that owns
it and preserves that absence through authentication, provider account
state, the app-server API, generated clients, and TUI bootstrap.
Existing accounts with email addresses keep the same behavior.

## Behavior-changing call sites

| Call site | Behavior after this change |
| --- | --- |
| `login/src/auth/personal_access_token.rs` | PAT metadata accepts a
missing or null email and retains `None`. |
| `agent-identity/src/lib.rs` | Agent Identity JWT claims accept an
omitted email. |
| `login/src/auth/storage.rs` and `login/src/auth/agent_identity.rs` |
Stored and managed Agent Identity records carry `Option<String>`.
Deserialization maps the legacy empty-string sentinel to `None`. |
| `login/src/auth/manager.rs` | `get_account_email` returns the stored
option, and managed identity bootstrap no longer converts `None` to an
empty string. |
| `model-provider/src/provider.rs` and `protocol/src/account.rs` | A
ChatGPT provider account requires a plan type but may carry no email. |
| `app-server-protocol/src/protocol/v2/account.rs` | `account/read`
keeps the `email` field on the wire and returns `null` when the account
has no email. Generated TypeScript and JSON schemas describe a required,
nullable field. |
| `sdk/python/src/openai_codex/generated/v2_all.py` | The generated
Python `ChatgptAccount` model accepts `None` for email. |
| `tui/src/app_server_session.rs` | Email-less ChatGPT accounts
bootstrap normally, keep external feedback routing, omit account-email
telemetry, and display the plan in account status. |

## Design decisions

- Missing email remains `None` at every layer. The code never uses an
empty string as a substitute.
- The app-server response includes `"email": null` instead of omitting
the field. Clients retain a stable response shape.
- Plan type remains required for provider account state. This change
relaxes only the email assumption.
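
The two email decisions can be sketched with the standard library alone. Both function names are hypothetical, and the JSON rendering is simplified (no escaping); the real code uses the existing serialization layer.

```rust
/// Maps a stored email value to an option, treating the legacy
/// empty-string sentinel as absent.
fn normalize_email(raw: Option<&str>) -> Option<String> {
    raw.filter(|s| !s.is_empty()).map(str::to_owned)
}

/// Renders the `email` member so the key is always present on the wire.
/// Simplified: assumes the address needs no JSON escaping.
fn email_json_member(email: &Option<String>) -> String {
    match email {
        Some(addr) => format!("\"email\": \"{}\"", addr),
        None => "\"email\": null".to_string(),
    }
}
```

Keeping the key present with a `null` value is what lets generated clients describe the field as required and nullable.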

## Testing

Affected test targets compile, and scoped Clippy and formatting pass. A
focused TUI snapshot covers plan-only account status, a real before/after
PAT login smoke test covers metadata without email, an app-server smoke
test covers `account/read` with `email: null`, and a regression smoke
test covers an existing email-bearing PAT. Unit tests run in CI.

## Evidence

Visual smoke evidence will be attached here.
## Summary

Instead of:

    reminder_interval_tokens = 65_536

allow users to configure explicit remaining-token reminder thresholds:

    reminder_at_remaining_tokens = [65_536, 32_768, 16_384, 8_192, 4_096,
    2_048, 1_024, 512]
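
One way such thresholds could be evaluated is sketched below. It assumes a reminder fires when the remaining-token count drops from above a threshold to at or below it; that rule and the function name are assumptions, not taken from the change.

```rust
/// Returns the thresholds crossed when remaining tokens drop from `prev`
/// to `now`. Assumed rule: a threshold `t` is crossed when `prev > t >= now`.
fn crossed_thresholds(thresholds: &[u64], prev: u64, now: u64) -> Vec<u64> {
    thresholds
        .iter()
        .copied()
        .filter(|&t| prev > t && now <= t)
        .collect()
}
```

Under that assumption, a single large drop can cross several thresholds at once, so the caller decides whether to emit one reminder or one per threshold.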

## Validation

- CARGO_INCREMENTAL=0 just test -p codex-core rollout_budget: 9 passed
- just fix -p codex-core
- just fmt
## Why

`permissionProfile/list` currently advertises every built-in and
configured profile even when effective enterprise requirements prevent
selecting it. That forces each client to reconstruct policy from
lower-level requirement fields, which is easy to miss and difficult to
keep consistent.

The catalog should remain complete so clients can explain that an option
was disabled by an administrator, while also reporting whether each
profile is selectable.

## What

- Add an `allowed` field to each permission profile summary.
- Build a shared catalog from the effective config and current
requirements, including `allowed_sandbox_modes`, `allowed_permissions`,
and filesystem restrictions.
- Use the shared catalog in app-server and the TUI so disallowed
profiles remain visible but cannot be selected.
- Use the canonical `:danger-full-access` profile ID in the TUI.
- Update the app-server schemas, API documentation, behavioral tests,
and TUI snapshots.
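
A rough sketch of the catalog idea: every profile stays in the list, and `allowed` records whether requirements permit selecting it. All type and function names here are hypothetical, `None` is assumed to mean "no restriction", and filesystem restrictions are left out.

```rust
/// Hypothetical profile kinds; not the real codex-core types.
enum ProfileKind {
    /// Built-in profile tied to a sandbox mode name.
    BuiltIn { sandbox_mode: String },
    /// Profile defined in configuration.
    Configured,
}

struct ProfileSummary {
    id: String,
    allowed: bool,
}

struct Requirements {
    allowed_sandbox_modes: Option<Vec<String>>,
    allowed_permissions: Option<Vec<String>>,
}

/// `None` is assumed to mean that the requirement is not set.
fn is_permitted(allow_list: &Option<Vec<String>>, value: &str) -> bool {
    match allow_list {
        None => true,
        Some(list) => list.iter().any(|v| v.as_str() == value),
    }
}

/// Keeps every profile and marks whether it can be selected.
fn build_profile_catalog(
    profiles: &[(&str, ProfileKind)],
    reqs: &Requirements,
) -> Vec<ProfileSummary> {
    profiles
        .iter()
        .map(|(id, kind)| {
            let allowed = match kind {
                // Built-in profiles are constrained by sandbox-mode requirements.
                ProfileKind::BuiltIn { sandbox_mode } => {
                    is_permitted(&reqs.allowed_sandbox_modes, sandbox_mode)
                }
                // `allowed_permissions` applies to configured profiles.
                ProfileKind::Configured => is_permitted(&reqs.allowed_permissions, id),
            };
            ProfileSummary { id: id.to_string(), allowed }
        })
        .collect()
}
```

Because disallowed entries are kept rather than filtered out, a client can render them as disabled and explain why.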

## Scope

This PR targets `main` directly and is independent of #24852. It
preserves the current behavior where built-in profiles are constrained
by sandbox-mode requirements and `allowed_permissions` applies to
configured profiles.

## Testing

- `just test -p codex-core
permission_profile_catalog_marks_profiles_disallowed_by_requirements`
- `just test -p codex-app-server permission_profile_list`
- `just test -p codex-app-server-protocol`
- `just test -p codex-tui profile_permissions`
- `just fix -p codex-core`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
- `just fmt`

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Joey Trasatti <joey.trasatti@openai.com>
anp-oai and others added 30 commits June 26, 2026 08:47
## Summary

- initialize `selected_capability_roots` in the new
`attach_in_memory_thread_store` test helper
- restore `codex-core` test compilation on `main`

## Root cause

[#30144](#30144) added the helper
from commit `0c3d0742`, whose parent was `c38b2e9b`. That branch was
based before [#29856](#29856) added
`selected_capability_roots` as a required field on `CreateThreadParams`.

The PR's Rust and Bazel workflows both passed against the stale branch
head `0c3d0742`. When #30144 was squashed onto newer `main`, its
initializer was integrated alongside the required field from #29856,
producing `E0063` in `core/src/session/tests.rs`. Because those
workflows tested the branch head rather than the integrated merge
result, they did not see the version-skew failure before merge.

## Impact

Any job that compiles the `codex-core` library tests fails, which turned
the main-branch `rust-ci-full` and `Bazel` workflows red across
platforms and blocks unrelated focused core tests. This change only
completes the test initializer; it does not alter production behavior or
workflow configuration.

## Validation

- `just fmt`
- `just test -p codex-core
turn_complete_flushes_terminal_event_after_delivery` (1 passed, 2909
skipped)
- `git diff --check`
## Why

MCP runtime reuse was keyed by every ready selected-capability
environment, even when an environment contributed no MCP servers or
connectors.

For example:

1. a global stdio MCP is running;
2. a selected remote environment contains only a skill;
3. that environment becomes ready;
4. the MCP and connector projection stays exactly the same;
5. Codex nevertheless rebuilds the MCP manager and restarts the global
stdio process.

That restart can interrupt active calls and discard process-local state
even though nothing about MCP changed.

## What changes

When selected-environment availability changes, Codex now resolves the
candidate MCP and connector projection before deciding whether to
replace the runtime:

- if the winning MCP servers or their ownership change, rebuild as
before;
- if the selected connector snapshot changes, rebuild as before;
- if an enabled MCP is explicitly bound to an environment whose
availability changed, rebuild as before;
- otherwise, keep the exact live manager and processes, and update only
the availability input remembered by the snapshot.

```text
ready selected environments:  [] -> [skills-env]
resolved MCP servers:          {global_probe} -> {global_probe}
resolved connectors:           {} -> {}
result:                         reuse manager; keep the same process
```

The comparison uses the resolved winning servers and their sources, so
plugin/config ownership remains part of the runtime identity.
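
The decision above can be approximated as a comparison of two resolved projections. `Projection` and `must_rebuild` are hypothetical stand-ins; the real snapshot carries more state than this sketch.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical resolved projection: server name -> (owning source,
/// environment the server is explicitly bound to, if any).
#[derive(Clone, PartialEq)]
struct Projection {
    servers: BTreeMap<String, (String, Option<String>)>,
    connectors: BTreeSet<String>,
}

/// Returns true when the live MCP manager has to be replaced.
fn must_rebuild(old: &Projection, new: &Projection, changed_envs: &BTreeSet<String>) -> bool {
    // Winning servers, their ownership, or the connector snapshot changed.
    if old.servers != new.servers || old.connectors != new.connectors {
        return true;
    }
    // A server is bound to an environment whose availability changed.
    new.servers.values().any(|(_, bound_env)| {
        bound_env
            .as_ref()
            .map_or(false, |env| changed_envs.contains(env))
    })
}
```

When `must_rebuild` returns false, only the remembered availability input needs updating, which matches the `skills-env` example in the text block above.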

## Existing stack coverage

The integration PR directly below this one already covers both rebuild
boundaries: a selected MCP becomes callable and a selected connector
tool becomes model-visible when their environment becomes available. It
also verifies that an unchanged selected MCP runtime keeps its process.

This PR does not add another remote-attachment integration scenario for
the no-change optimization. `environment/add` returns before readiness,
and app-server does not currently expose a deterministic readiness
signal for an environment that contributes only skills. Keeping a
fixed-delay test would add flake risk; adding a new readiness API would
be outside this fix.

## Scope and assumptions

- This does not change skill discovery, World State rendering, or plugin
metadata caching.
- This does not add file watching or hot reload behavior.
- This does not change disconnect/reconnect handling.
- Selected environment IDs and their capability contents retain the
stack's existing stability assumption.
- Delayed `required = true` executor MCP behavior remains out of scope.
## Why

The selected-capability integration test already covers initial
attachment and cold resume, but it resumes while the selected executor
is still reachable.

That leaves an important World State transition untested: a thread
remembers its selected capability root, resumes while that environment
is unavailable, and later sees the same stable environment return.

## What this tests

This extends the existing end-to-end scenario:

```text
selected executor available
        ↓
app-server stops and the executor goes away
        ↓
thread resumes with the executor unavailable
        ↓
skills, selected MCP tools, and connector attribution are absent
        ↓
the same environment ID is attached again
        ↓
skills, MCP tools, and connector attribution return
```

The test also checks that the unavailable snapshot explicitly tells the
model that no selected-environment skills are currently available. After
reattachment, it invokes the selected skill again and verifies that a
new executor-owned MCP process starts.

## Scope

This is test-only. It keeps the existing assumption that an environment
ID refers to stable capability contents. It does not add package-file
invalidation or live transport reconnect behavior.
## Summary

- stop publicly re-exporting the internally used
`SKILLS_INTRO_WITH_ALIASES` constant
- keep the constant and all skills rendering behavior unchanged
- preserve every integration helper, API, fixture, assertion, and module
used by tests

## Scope guardrails

This revision keeps all remote/network-facing functionality and every
line introduced by `jif <jif@openai.com>`.

Following the test-preservation audit, it also restores the in-process
RMCP test transport, the original `codex-mcp` fixture,
`PluginLoadOutcome::effective_skill_roots` and its assertions, the
`EffectiveSkillRoots` API family, the test-only apps renderer, and the
TUI dead-code annotation. Those files now match the PR base exactly.

No test imports or directly references the remaining public skills
export being narrowed.

## Validation

- repository-wide test-reference audit: no test-used code remains
deleted or narrowed
- deleted-line `git blame` audit: zero Jif-authored deletions
- `cargo test -p codex-core-plugins -p codex-mcp -p codex-rmcp-client
--lib`: 467 passed
- `cargo test -p codex-core --lib apps::render`: 2 passed
- `cargo test -p codex-core-skills --lib render::tests`: 19 passed
- `cargo check -p codex-core-skills --all-targets`: passed
- `just fix -p codex-core-skills`: passed
- `just fmt`: passed
- `git diff --check`: passed

The full local `codex-core-skills` suite passed 106/108 tests; two
loader tests detected an ambient repository skills root outside the
package and failed their isolation assertions. The scoped renderer suite
and all-target compile pass, and CI runs in an isolated environment.

Final code delta: 1 insertion, 2 deletions across 2 files.
## Summary
- Allow a top-level `description` string in `hooks.json`.
- Continue rejecting unknown top-level keys and root-level hook events;
events must remain under `hooks`.
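
A minimal sketch of the top-level key check, assuming `hooks` and `description` are the only accepted keys. The function name and the error text are hypothetical, and the check that `description` is a string is omitted.

```rust
/// Checks the top-level keys of a parsed `hooks.json` object.
/// Assumption: only `hooks` and `description` are accepted at the root.
fn validate_top_level_keys(keys: &[&str]) -> Result<(), String> {
    for key in keys {
        match *key {
            "hooks" | "description" => {}
            other => return Err(format!("unknown top-level key: {}", other)),
        }
    }
    Ok(())
}
```

An event name placed at the root falls into the `other` arm, so it is rejected the same way as any unknown key.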

## Testing
- `just test -p codex-config`
## Description

This PR adds a new `historyMode = "legacy" | "paginated"` field to `Thread`.
It will be stored in `SessionMeta` in the JSONL rollout file and as a
new column in the SQLite thread_metadata table, and exposed on
`thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored
metadata, local/in-memory stores, rollout metadata extraction, and the
existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and
`thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported
for legacy full-history reads, `load_history`, live resume, and create
paths.
- Regenerated app-server schema fixtures and added
protocol/state/thread-store/app-server coverage for persistence and
fail-closed behavior.

## Compatibility floor
Because users may be running various versions of Codex binaries on the
same machine (TUI, Codex App, etc.), we will need to establish a
compatibility floor for upcoming paginated threads, which will change
how thread storage reads and writes work.

The overall plan here:
```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that
encounters a persisted `paginated` thread before it knows how to fully
support paginated history. In app-server, if a thread is `paginated`, we
will:

- allow metadata-only discovery paths like `thread/list` and
`thread/read(includeTurns=false)`, so clients can still see the thread
and inspect its `historyMode`
- reject legacy full-history/live-thread paths like
`thread/read(includeTurns=true)` and `thread/resume` with an unsupported
JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations
that would need to load or replay the full thread history for a
paginated thread. That gives us the behavior we want for Release N:
future paginated threads are visible, but this binary fails closed
instead of trying to operate on them as if they were legacy threads.
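
The fail-closed rules above can be sketched as two small checks. The enum and function names are hypothetical and the error strings are placeholders; the real code returns a JSON-RPC error.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum HistoryMode {
    Legacy,
    Paginated,
}

/// Parses a persisted value. A missing field defaults to legacy, but an
/// unknown value is an error and is never treated as legacy.
fn parse_history_mode(raw: Option<&str>) -> Result<HistoryMode, String> {
    match raw {
        None | Some("legacy") => Ok(HistoryMode::Legacy),
        Some("paginated") => Ok(HistoryMode::Paginated),
        Some(other) => Err(format!("unsupported historyMode: {}", other)),
    }
}

/// Hypothetical subset of app-server requests.
enum Request {
    List,
    Read { include_turns: bool },
    Resume,
}

/// Allows metadata-only paths and rejects paths that need full history.
fn check_supported(mode: HistoryMode, request: &Request) -> Result<(), String> {
    let needs_full_history = match request {
        Request::List => false,
        Request::Read { include_turns } => *include_turns,
        Request::Resume => true,
    };
    if mode == HistoryMode::Paginated && needs_full_history {
        Err("paginated threads are not supported by this operation".to_string())
    } else {
        Ok(())
    }
}
```

The split between parsing and the per-request check mirrors the two layers described above: unknown modes fail early, and known paginated threads stay discoverable.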
Introduced by a merge race around thread.history_mode.
## Why

Admins need persistent defaults for the model, reasoning effort, and
service tier shown when the Desktop App creates a new thread. These are
initialization defaults rather than runtime constraints: the App should
use them to initialize its draft while still allowing a user to make an
explicit selection.

The app-server therefore needs to expose the managed values before
thread creation without changing `thread/start` behavior for other
clients.

## What changed

- Parse `model`, `model_reasoning_effort`, and `service_tier` from
`[models.new_thread]` in `requirements.toml`.
- Compose the `models` requirements through the existing
requirements-layer precedence rules.
- Expose the resolved values through `configRequirements/read` as
`requirements.models.newThread`.
- Add the corresponding app-server protocol types and regenerate the
JSON and TypeScript schema fixtures.
- Document the new `configRequirements/read` fields in the app-server
README.

## Scope

This PR is data plumbing only. It does not apply these values during
`thread/start` and does not change thread creation for existing
app-server clients, resumed or forked sessions, internal or subagent
sessions, `codex exec`, or the TUI. A companion Desktop App change owns
draft initialization, sends the effective settings for ordinary and
prewarmed starts, and preserves explicit user changes.

## Validation

- Requirements deserialization coverage for `[models.new_thread]`
- Requirements-layer precedence coverage
- App-server API mapping coverage
- `configRequirements/read` integration coverage
- Regenerated app-server JSON and TypeScript schema fixtures
## Why

Environment skill discovery needs two independent pieces of information:

- plugin namespaces from `plugin.json` files; and
- skill metadata from each `SKILL.md` file.

Today these happen in sequence. Codex waits for every plugin namespace
lookup to finish before it starts reading any skill files. On a remote
executor, that creates an avoidable network-latency barrier.

```text
before: walk -> namespace lookups -> skill reads -> build catalog
after:  walk -> namespace lookups ─┐
             -> skill reads ───────┴-> build catalog
```

## What changes

- Read and parse skill files without waiting for plugin namespace
discovery.
- Resolve root and nested plugin namespaces concurrently.
- Join both results only when constructing the final qualified skill
names.
- Keep the existing 64-skill concurrency bound, output ordering,
warnings, metadata behavior, and namespace rules.
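
The shape of the change can be illustrated with scoped threads from the standard library. This is only a sketch: the real code is asynchronous and bounded, the two stand-in functions do no I/O, and the `namespace:skill` name format is an assumption.

```rust
use std::collections::HashMap;
use std::thread;

/// Stand-in for resolving plugin namespaces from `plugin.json` files.
fn lookup_namespaces(roots: &[&str]) -> HashMap<String, String> {
    roots
        .iter()
        .map(|r| (r.to_string(), format!("{}-ns", r)))
        .collect()
}

/// Stand-in for reading and parsing `SKILL.md` files: (plugin root, skill name).
fn read_skills(paths: &[(&str, &str)]) -> Vec<(String, String)> {
    paths
        .iter()
        .map(|(root, name)| (root.to_string(), name.to_string()))
        .collect()
}

fn build_skill_catalog(roots: &[&str], paths: &[(&str, &str)]) -> Vec<String> {
    // Both tasks start immediately; neither waits for the other.
    let (namespaces, skills) = thread::scope(|s| {
        let ns = s.spawn(|| lookup_namespaces(roots));
        let sk = s.spawn(|| read_skills(paths));
        (
            ns.join().expect("namespace task panicked"),
            sk.join().expect("skill task panicked"),
        )
    });
    // Join point: qualified names need both results. Input order is kept.
    skills
        .into_iter()
        .map(|(root, name)| match namespaces.get(&root) {
            Some(ns) => format!("{}:{}", ns, name),
            None => name,
        })
        .collect()
}
```

The only synchronization point is the final `map`, which is where the qualified names are assembled.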

## Testing

The regression test makes plugin manifest lookup wait until a `SKILL.md`
read has started. The old serialized pipeline would time out; the new
pipeline completes and still returns the correctly namespaced skill.

`just test -p codex-core-skills` passes all 111 tests.

## Out of scope

This does not add an exec-server endpoint, batch filesystem calls, or
reduce the number of files transferred. A frontmatter-only read or
server-side skill catalog can remain a separate follow-up if benchmarks
show that transferred bytes are the next bottleneck.
Update the MAv2 prompt to include agents.md and skills more explicitly.

Should mimic #27919.
## Why

#29683 exposes managed defaults for new-thread model settings through
`configRequirements/read` without applying them server-wide. The TUI is
an app-server client, so it should explicitly consume those defaults
when it creates a fresh thread.

This lets plain `codex` start on the managed model while preserving the
existing ability to change model settings within the thread.

## What changed

- Read `requirements.models.newThread` during TUI app-server bootstrap.
- Apply the managed model, reasoning effort, and service tier to the
initial fresh thread and subsequent `/new` or `/clear` threads.
- Keep explicit launch overrides above the managed defaults.
- Normalize the managed `fast` service tier to the `priority` request
value.
- Leave resumed and forked threads unchanged.

The application logic lives in a small TUI-only module; app-server
`thread/start` behavior remains unchanged for other clients.
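
A minimal sketch of the precedence and normalization rules listed above. The function names and the built-in fallback argument are hypothetical; the TUI module may structure this differently.

```rust
/// Picks a setting for a fresh thread: explicit launch override first,
/// then the managed default, then a built-in fallback.
fn resolve_setting(launch_override: Option<&str>, managed: Option<&str>, fallback: &str) -> String {
    launch_override.or(managed).unwrap_or(fallback).to_string()
}

/// Maps the managed `fast` service tier to the `priority` request value.
/// Other values pass through unchanged.
fn normalize_service_tier(tier: &str) -> &str {
    if tier == "fast" {
        "priority"
    } else {
        tier
    }
}
```

Resumed and forked threads would skip `resolve_setting` entirely, since the managed values apply only to fresh threads.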

## User experience

- Plain `codex` starts with the managed new-thread settings.
- A user can still change settings with `/model` or the existing
service-tier controls.
- Starting another fresh thread reapplies the managed defaults.
- Explicit launch choices such as `codex -m <model>` continue to win.

## Validation

- `just test -p codex-tui managed_new_thread_defaults`
- `just fix -p codex-tui`

Depends on #29683.
## Description

This PR makes `thread.history_mode` immutable after the thread's
canonical first `SessionMeta` has been written. Later same-thread
`SessionMeta` lines are compatibility metadata writes, not a new thread
definition.

Without this, an older binary could append a `SessionMeta` that omits
`history_mode`; when a newer binary replays it, serde defaults that
missing field to `legacy` and SQLite could downgrade a paginated thread.

## Why

`history_mode` is the persisted thread storage contract.
Paginated-thread fail-closed behavior and SQLite memory filtering depend
on it staying aligned with canonical rollout metadata, especially when
multiple Codex binary versions can touch the same local rollout.

## What changed

- Stop generic rollout metadata replay from overwriting `history_mode`
from later `SessionMeta` items.
- Remove `history_mode` from `ThreadMetadataPatch`, so mutable metadata
sync and app-server metadata updates cannot rewrite it.
- When local metadata sync has to recreate a missing SQLite row, recover
`history_mode` from the rollout's canonical first `SessionMeta` instead
of from a mutable patch.
- Keep the in-memory thread store using the created thread's canonical
`history_mode` instead of metadata patches.
- Fill the one remaining core test `CreateThreadParams` initializer with
the new `history_mode` field; Bazel CI caught this after the parent
history-mode PR landed.
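
The downgrade and its fix can be shown side by side. The types below are simplified stand-ins for the rollout records, and both function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum HistoryMode {
    Legacy,
    Paginated,
}

/// A `SessionMeta` line; `history_mode` is `None` when an older binary
/// wrote the line without the field.
struct SessionMeta {
    history_mode: Option<HistoryMode>,
}

/// Old behavior: the last line wins, so a later line that omits the field
/// downgrades the thread to legacy.
fn last_write_wins(metas: &[SessionMeta]) -> Option<HistoryMode> {
    metas
        .last()
        .map(|m| m.history_mode.unwrap_or(HistoryMode::Legacy))
}

/// New behavior: only the canonical first line defines the mode.
fn first_meta_wins(metas: &[SessionMeta]) -> Option<HistoryMode> {
    metas
        .first()
        .map(|m| m.history_mode.unwrap_or(HistoryMode::Legacy))
}
```

For a rollout whose first line is paginated and whose later line omits the field, the two functions disagree, which is the case this PR closes.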

## Validation

- `just fmt`
- `just test -p codex-thread-store`
- `just test -p codex-state
session_meta_does_not_set_model_or_reasoning_effort`
## Description

This adds stable optional `turnId` support to `thread/fork`. When
supplied, the fork copies persisted history through that terminal turn,
inclusive, and drops later turns from the new thread.

Omitting or passing `null` preserves the existing full-history fork
behavior, including the interruption marker when the stored source
history ends mid-turn.
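
The inclusive cut can be sketched as a slice operation. `Turn` and `fork_history` are hypothetical, the error for an unknown ID is an assumption, and turn status and the interruption marker are not modeled.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Turn {
    id: String,
}

/// Copies history for a fork. With `turn_id`, keeps turns up to and
/// including that turn; without it, keeps everything.
fn fork_history(turns: &[Turn], turn_id: Option<&str>) -> Result<Vec<Turn>, String> {
    match turn_id {
        None => Ok(turns.to_vec()),
        Some(id) => {
            let idx = turns
                .iter()
                .position(|t| t.id == id)
                .ok_or_else(|| format!("turn not found: {}", id))?;
            Ok(turns[..=idx].to_vec())
        }
    }
}
```

Forking at the last turn ID yields the same turns as omitting the ID in this sketch.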

## Why

We're deprecating `thread/rollback`, and this lets the UX use cases that
relied on it use `thread/fork` with `turnId` instead.
## Why

I use the `$code-review` skill a lot and it'd be nice to add my own
additional review criteria in `$CODEX_HOME/skills/code-review-*`.

## What

Removes phrasing about "code-review-* skills in this repository" which
in practice seems like enough to get Codex to consult my user-level code
review skills in addition to the repo-level ones.
## Summary

- add Sol (`openai.gpt-5.6-sol`), Terra (`openai.gpt-5.6-terra`), and
Luna (`openai.gpt-5.6-luna`) to the Amazon Bedrock static model catalog
- derive all three entries from the bundled GPT-5.5 metadata and add the
Bedrock-only `max` reasoning effort
- keep the new entries below the current GPT-5.5 and GPT-5.4 models at
priorities 2, 3, and 4, preserving GPT-5.5 as the default
- add deep-equality coverage for inherited model configuration, catalog
ordering, context windows, and service-tier behavior
### Summary

Release live thread persistence when a session ends because its
submission channel closes. This prevents a later same-process resume
from failing with `thread ... already has a live local writer`.

### Details

The issue is in the `codex-core` session teardown path used by Codex
hosts, rather than in Managed Agents API or exec-server itself.

Explicit shutdown already closes the `LiveThread`, which releases the
process-scoped writer held by `LocalThreadStore`. The
submission-channel-close fallback ran runtime and extension teardown but
skipped that persistence shutdown, leaving the thread ID registered as
having a live writer.

This change:

- closes the `LiveThread` on the channel-close fallback path;
- preserves the existing teardown order used by explicit shutdowns;
- extends the lifecycle regression test to assert that the thread store
receives `shutdown_thread`.
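
The failure mode is easy to reproduce with a toy registry. `WriterRegistry` is a hypothetical stand-in for the process-scoped writer tracking in `LocalThreadStore`.

```rust
use std::collections::HashSet;

/// Toy stand-in for process-scoped live-writer tracking.
#[derive(Default)]
struct WriterRegistry {
    live: HashSet<String>,
}

impl WriterRegistry {
    /// Registers a live writer; fails if one is already registered.
    fn open(&mut self, thread_id: &str) -> Result<(), String> {
        if !self.live.insert(thread_id.to_string()) {
            return Err(format!(
                "thread {} already has a live local writer",
                thread_id
            ));
        }
        Ok(())
    }

    /// Releases the writer so a later resume in the same process can succeed.
    fn shutdown_thread(&mut self, thread_id: &str) {
        self.live.remove(thread_id);
    }
}
```

Skipping `shutdown_thread` on one teardown path leaves the ID registered, so the next `open` for that thread fails even though the session is gone.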

Context: [original
report](https://openai.slack.com/archives/C0B4NBHQGTV/p1782136364948039),
[recent occurrence
1](https://openai.slack.com/archives/C0B4NBHQGTV/p1782434817895839?thread_ts=1782136364.948039&cid=C0B4NBHQGTV),
[recent occurrence
2](https://openai.slack.com/archives/C0B4NBHQGTV/p1782335107474429?thread_ts=1782136364.948039&cid=C0B4NBHQGTV)

### Testing

- `just test -p codex-core
submission_loop_channel_close_runs_full_thread_teardown`
- `just test -p codex-core --lib` (1,989 passed; 3 skipped)
- `just fix -p codex-core`
- `just fmt`
- Native code review: no findings

I also attempted `just test -p codex-core`. The new regression passed;
79 unrelated integration tests failed in the local harness, primarily
because helper binaries such as `test_stdio_server` were unavailable,
plus local proxy/shell timing failures.
## Summary

- classify authentication-required RMCP startup failures, including
errors nested inside `ClientInitializeError::TransportError`
- let `codex-mcp` consume that classification so the existing
`reauthenticationRequired` startup failure reason is emitted
- add a regression test that performs real startup with an expired
persisted OAuth token and no refresh token

## Why

Follow-up to #29877.

RMCP stores streamable HTTP initialization failures inside a dynamic
transport error whose payload is not exposed through the standard Rust
error source chain. The original `anyhow::Error::chain()` check
therefore missed the nested `AuthError::AuthorizationRequired` seen
during real MCP startup and emitted `failureReason: null`.

The transport-specific inspection now lives in `codex-rmcp-client`,
while `codex-mcp` consumes only the domain-level authentication-required
result. This classifier does not distinguish first-time login from
reauthentication; the existing auth-state logic remains responsible for
that distinction.
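
The missed detection can be reproduced with standard-library errors alone. `AuthRequired` and `TransportError` are hypothetical stand-ins, not the RMCP types; the point is only that a cause stored as a payload is invisible to a `source()` walk.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct AuthRequired;

impl fmt::Display for AuthRequired {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "authorization required")
    }
}

impl Error for AuthRequired {}

/// Holds its cause as an opaque payload and does not report it via `source()`.
#[derive(Debug)]
struct TransportError {
    payload: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for TransportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "transport error")
    }
}

// `source()` keeps its default implementation, which returns `None`.
impl Error for TransportError {}

/// Walks the standard source chain looking for `AuthRequired`.
fn chain_has_auth(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if e.is::<AuthRequired>() {
            return true;
        }
        current = e.source();
    }
    false
}

/// Transport-specific inspection: looks inside the payload explicitly.
fn transport_has_auth(err: &TransportError) -> bool {
    chain_has_auth(err.payload.as_ref())
}
```

Keeping `transport_has_auth` next to the transport type is the same layering choice described above: callers see only a yes-or-no answer.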

## User impact

When stored MCP OAuth credentials are expired and cannot be refreshed,
app clients now receive `failureReason: "reauthenticationRequired"` on
the failed startup update and can show the reconnect action. First-time
login and unrelated startup failures remain unchanged.

## Validation

- `just test -p codex-rmcp-client --test streamable_http_oauth_startup
identifies_expired_unrefreshable_token_startup_error`
- `just test -p codex-mcp
startup_outcome_error_identifies_authentication_required`
- `just test -p codex-mcp
mcp_startup_failure_reason_requires_existing_oauth_and_auth_failure`
- `cargo build -p codex-cli --bin codex`
- local app-server probe emitted `failureReason:
"reauthenticationRequired"`
- manual end-to-end reconnect flow confirmed
- `just fmt`
## Why

Marketplace source deserialization treated `{"source":"npm", ...}` as
unsupported. The loader logged and skipped the entry, so npm-backed
plugins never appeared in `plugin list --available` and `plugin add`
returned "plugin not found".

Codex plugins are installed from a plugin root, not from an npm
dependency tree. For npm-backed marketplace entries, Codex should fetch
the published package contents without running package scripts or
installing unrelated dependencies.

## What changed

- Add `npm` marketplace plugin sources with `package`, optional semver
`version` or version range, and optional HTTPS `registry`.
- Reject unsafe npm source fields before materialization, including
invalid package names, non-semver version selectors, plaintext or
credential-bearing registry URLs, and registry query/fragment data.
- Materialize npm plugins with `npm pack --ignore-scripts`, then unpack
the resulting tarball through the existing hardened plugin bundle
extractor.
- Enforce npm archive and extracted-size limits, require the standard
npm `package/` archive root, and verify the extracted `package.json`
name matches the requested package before installing.
- Keep plugin listings, install-source descriptions, CLI JSON/human
output, app-server v2 `PluginSource`, TUI source summaries, regenerated
schema fixtures, and app-server documentation in sync.
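
The registry rules can be approximated with plain string checks. This is a simplified sketch with a hypothetical function name; production code would use a real URL parser and may enforce further rules.

```rust
/// Rejects registry URLs that are not HTTPS, embed credentials, or carry
/// query or fragment data. Simplified string checks, not a full URL parser.
fn validate_registry(url: &str) -> Result<(), String> {
    let rest = url
        .strip_prefix("https://")
        .ok_or("registry must use https")?;
    if rest.contains('?') || rest.contains('#') {
        return Err("registry must not carry query or fragment data".to_string());
    }
    // The authority is everything before the first path separator.
    let authority = rest.split('/').next().unwrap_or("");
    if authority.is_empty() {
        return Err("registry must name a host".to_string());
    }
    if authority.contains('@') {
        return Err("registry must not embed credentials".to_string());
    }
    Ok(())
}
```

Running these checks before materialization means `npm pack` is never invoked with a URL that failed validation.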

## Impact

Marketplaces can distribute Codex plugins from public or configured
private HTTPS npm registries using the same install flow as existing
materialized plugin sources. `npm` must be available on `PATH` when an
npm-backed plugin is installed.

Fixes #27831

## Validation

- `just write-app-server-schema`
- `just test -p codex-core-plugins -p codex-app-server-protocol -p
codex-app-server -p codex-cli`
  - npm/schema/core-plugin coverage passed in the run.
- The full focused command finished with `1739 passed`, `11 failed`, and
`6 timed out`; the failures were unrelated local app-server environment
failures from `sandbox-exec: sandbox_apply: Operation not permitted`
plus one missing `test_stdio_server` helper binary.
- Installed an npm-published Codex plugin package through a throwaway
local marketplace and throwaway `CODEX_HOME` to exercise the real npm
materialization path end to end.
## Why

It's hard to change the set of required jobs when they're managed in the
GitHub UI, and when each workflow is responsible for choosing its own
scheduling, it's easy to end up with skew between what we enforce on PRs
vs. on main.

## What

- add a `blocking-ci` caller workflow, triggered by pull requests and
pushes to `main`, for Bazel, blob size, cargo-deny, Codespell,
`repo-checks`, rust CI, and SDK CI
- add an `always()` terminal job named `CI required` that fails unless
every called workflow succeeds
- add a `postmerge-ci` caller workflow for `rust-ci-full` and
`v8-canary`, with a terminal `Postmerge CI results` job
- centralize V8 relevance detection in `v8_canary_changes.py`; unrelated
PR and postmerge runs execute metadata only and skip the expensive build
matrices
- leave `v8-canary` outside the blocking gate and leave the external
`cla` check independent

## Rollout

A repository admin must replace the existing required GitHub Actions
contexts with `CI required` in the main-branch ruleset. Retain `cla` as
a separate required check. Until that change is coordinated, this PR
cannot satisfy the old standalone check names. In-flight PRs will need
to be rebased after this lands.
## Description

This PR adds canonical core `TurnItem` shapes for command execution,
dynamic tool calls, collab agent tool calls, and sub-agent activity, to
be stored in the rollout file soon.

It also teaches app-server protocol / `ThreadHistoryBuilder` how to
render those items, and adds the small legacy fanout helpers needed for
existing event-based consumers. No core producer or rollout persistence
behavior changes here; that will be done in a follow-up.

## Making ThreadHistoryBuilder stateless

This is the first PR in a stack to make `ThreadHistoryBuilder` stateless
enough that we can materialize app-server `ThreadItem`s from only a
given slice of `RolloutItem` history, without ever needing to replay the
whole thread from the beginning.

The persisted legacy `RolloutItem::EventMsg` records are mostly shaped
like live UI events, not like materialized `ThreadItem`s. They work if
we replay the full rollout in order, but they often do not contain
enough stable identity or complete item state to project an arbitrary
suffix on its own.

A few examples:

- `UserMessageEvent` and `AgentMessageEvent` have content, but
historically do not carry the persisted app-server item ID that should
become the SQLite primary key.
- `AgentReasoningEvent` and `AgentReasoningRawContentEvent` are
fragments. `ThreadHistoryBuilder` currently merges them into the last
reasoning item, which means a slice starting in the middle of reasoning
cannot know whether to append to an earlier item or create a new one.
- `WebSearchEndEvent`, `McpToolCallEndEvent`, collab end events, and
similar legacy events can often render a final-looking item, but they
usually rely on prior replay state to know which turn owns the item.
- Begin/end legacy events are partial views of one logical item. The
builder correlates them by `call_id` and mutates prior state to
synthesize the final `ThreadItem`.

That is the problem this direction fixes. A persisted canonical
lifecycle record looks much closer to the read model we actually want
later:

```rust
ItemCompletedEvent {
    turn_id,
    item: TurnItem { id, ...full snapshot... },
    completed_at_ms,
}
```

Once rollout has explicit `turn_id`, stable `item.id`, and a canonical
completed item snapshot, the future SQLite projector can reduce only the
new rollout suffix and upsert the affected `thread_items` rows. It no
longer needs to synthesize `item-N`, infer item ownership from the
active turn, or replay earlier events just to reconstruct the current
item snapshot.
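
As a rough illustration of the suffix-only projection described above, here is a minimal sketch. The types `ItemCompleted` and `ThreadItemRow`, the `project_suffix` function, and the in-memory `HashMap` standing in for a `thread_items` table are all hypothetical simplifications, not the actual codex types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a canonical completed-item record.
#[derive(Clone, Debug, PartialEq)]
struct ItemCompleted {
    turn_id: String,
    item_id: String,
    /// Stands in for the full `TurnItem` snapshot.
    snapshot: String,
    completed_at_ms: u64,
}

/// Hypothetical row in a `thread_items` read model, keyed by item ID.
#[derive(Clone, Debug, PartialEq)]
struct ThreadItemRow {
    turn_id: String,
    snapshot: String,
    completed_at_ms: u64,
}

/// Reduce only a suffix of the rollout. Each record carries its own turn ID,
/// item ID, and full snapshot, so the projector can upsert by `item_id`
/// without replaying anything that came before the suffix.
fn project_suffix(table: &mut HashMap<String, ThreadItemRow>, suffix: &[ItemCompleted]) {
    for ev in suffix {
        table.insert(
            ev.item_id.clone(),
            ThreadItemRow {
                turn_id: ev.turn_id.clone(),
                snapshot: ev.snapshot.clone(),
                completed_at_ms: ev.completed_at_ms,
            },
        );
    }
}
```

Because the write is an upsert keyed by a stable ID, applying the same suffix twice leaves the table unchanged, which is the property that makes partial replay safe in this sketch.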

## What changed

- Added core `TurnItem` variants and item structs for command execution,
dynamic tool calls, collab agent tool calls, and sub-agent activity.
- Added conversions from those canonical items back into the legacy
event shapes where current consumers still need them.
- Added app-server v2 `ThreadItem` conversion for the new core item
variants.
- Taught `ThreadHistoryBuilder` and rollout persistence metrics to
recognize the new item variants.

## Follow-up

The next PR, #30283, switches the live
core producers for these item families onto canonical `ItemStarted` /
`ItemCompleted` events.
## Why

Remote-control websocket reconnects and pairing requests proactively
refresh their server token. When `/server/refresh` returns a transient
error such as `502`, the still-valid token was discarded as a usable
connection path, causing reconnect failures and repeated refresh
attempts that could amplify an upstream incident.

## What Changed

- Start proactive refresh five minutes before token expiry and
distinguish it from a required refresh for missing or expired tokens.
- Continue websocket and pairing operations with the existing valid
token after `429`, `5xx`, or timeout failures.
- Share an in-memory `next_refresh_at` throttle across websocket and
pairing callers, honoring both `Retry-After` formats and otherwise using
a jittered 24–36 second delay.
- Keep required refreshes strict, preserve `404` enrollment replacement,
and clear token/throttle state for `401` and `403` auth recovery.
- Preserve refresh response metadata internally and add focused
wire-level and integration coverage.
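
The refresh policy in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the enum and function names are hypothetical, time is modeled as plain seconds, `Retry-After` is assumed to be already parsed into seconds, and the jitter is passed in as a fraction so the sketch needs no random-number crate.

```rust
/// Proactive refresh starts five minutes before expiry.
const PROACTIVE_WINDOW_SECS: u64 = 5 * 60;

#[derive(Debug, PartialEq, Clone, Copy)]
enum RefreshNeed {
    NotNeeded,
    Proactive,
    Required,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum FailureKind {
    Transient,
    Auth,
    EnrollmentMissing,
    Other,
}

/// A missing or expired token requires a refresh; a near-expiry token only
/// triggers a proactive one.
fn refresh_need(now_secs: u64, expires_at_secs: Option<u64>) -> RefreshNeed {
    match expires_at_secs {
        None => RefreshNeed::Required,
        Some(exp) if now_secs >= exp => RefreshNeed::Required,
        Some(exp) if exp - now_secs <= PROACTIVE_WINDOW_SECS => RefreshNeed::Proactive,
        Some(_) => RefreshNeed::NotNeeded,
    }
}

/// `None` models a request or response-body timeout.
fn classify_failure(status: Option<u16>) -> FailureKind {
    match status {
        None | Some(429) | Some(500..=599) => FailureKind::Transient,
        Some(401) | Some(403) => FailureKind::Auth,
        Some(404) => FailureKind::EnrollmentMissing,
        Some(_) => FailureKind::Other,
    }
}

/// Only a transient failure lets the caller continue, and only if the token
/// is still valid at decision time (it may have expired during the refresh).
fn may_proceed_with_existing(
    failure: FailureKind,
    now_secs: u64,
    expires_at_secs: Option<u64>,
) -> bool {
    failure == FailureKind::Transient && matches!(expires_at_secs, Some(exp) if now_secs < exp)
}

/// Honor a parsed `Retry-After`; otherwise wait 24 to 36 seconds.
/// `jitter_fraction` is expected in `[0.0, 1.0]` and is clamped.
fn next_refresh_delay_secs(retry_after_secs: Option<u64>, jitter_fraction: f64) -> u64 {
    match retry_after_secs {
        Some(secs) => secs,
        None => 24 + (jitter_fraction.clamp(0.0, 1.0) * 12.0).round() as u64,
    }
}
```

Re-checking expiry inside `may_proceed_with_existing`, rather than reusing the earlier `refresh_need` result, is what covers the case of a token that expires while the refresh request is in flight.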

## Verification

Added behavioral coverage proving that:

- a valid near-expiry token still completes websocket and pairing
requests after transient refresh failures;
- `Retry-After` suppresses a subsequent refresh across websocket and
pairing callers;
- request and response-body timeouts are classified as transient;
- an expired token, including one that expires during refresh, cannot
proceed to websocket connection;
- auth failures clear the attempted token without overwriting a
concurrently rotated token.
## Summary

- complete unified-exec processes from the ordered event stream instead
of issuing a final zero-wait `process/read`
- add optional executor sandbox-denial state to `process/exited`
- retain `process/read` as a retained-output and compatibility fallback
for receiver lag, sequence gaps, and legacy servers
- recover sandbox-denial state across transport reconnection
- cover the real `TestCodex` remote-exec path without adding a public
test-only event constructor

## Why

A successful one-shot tool call currently receives its output and
terminal notifications, then pays another wide-area `process/read` round
trip before returning. Staging traces showed that remote response wait
accounted for more than 99.8% of RPC time; local serialization,
queueing, and deserialization were below 0.6 ms.

## Measured impact

A direct staging A/B used the same build and route and changed only
completion mode. Each arm ran three times with 30 one-shot
`/usr/bin/true` calls per run. The table reports the median of the three
per-run percentiles.

| Metric | Final `process/read` | Pushed events | Change |
| --- | ---: | ---: | ---: |
| End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
| End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
| Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
| Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |

TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The
successful, complete, in-order event path issued zero final
`process/read` calls.

## Compatibility and recovery

- new servers send `sandboxDenied` on `process/exited`
- legacy servers omit it, which triggers one compatibility
`process/read`
- broadcast lag or a sequence gap triggers a retained-output read
- recovery remains bounded by the server's existing 1 MiB
retained-output window
- complete, in-order event streams issue no completion read
- sandbox denial is attached to the exit event before consumers can
observe process completion
- server-first and client-first rollouts remain wire-compatible;
server-first realizes the latency win immediately
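
The recovery rules above amount to a small decision function. The sketch below is illustrative only: `ExitEvent`, `CompletionRead`, the `seq` field, and the `lagged` flag are hypothetical names, and the real client presumably tracks this state across the whole event stream rather than at a single point.

```rust
/// Which kind of `process/read`, if any, is needed to finish a process.
#[derive(Debug, PartialEq)]
enum CompletionRead {
    /// Complete, in-order stream from a new server: no read at all.
    NotNeeded,
    /// Legacy server omitted `sandboxDenied`: one compatibility read.
    Compatibility,
    /// Broadcast lag or a sequence gap: recover from retained output.
    RetainedOutput,
}

/// Hypothetical view of a `process/exited` notification.
struct ExitEvent {
    /// Sequence number of this event in the ordered stream.
    seq: u64,
    /// `None` models a legacy server that does not send the field.
    sandbox_denied: Option<bool>,
}

fn completion_read(expected_next_seq: u64, lagged: bool, exit: &ExitEvent) -> CompletionRead {
    if lagged || exit.seq != expected_next_seq {
        // Some events were missed, so pushed output cannot be trusted alone.
        CompletionRead::RetainedOutput
    } else if exit.sandbox_denied.is_none() {
        CompletionRead::Compatibility
    } else {
        CompletionRead::NotNeeded
    }
}
```

Checking for gaps before checking the denial field means a lagged stream always goes through the retained-output path, even when the exit event itself is complete.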

## Integration coverage

The `TestCodex` suite exercises four distinct remote-exec contracts:

- complete pushed output/exit/close with zero reads
- direct pushed sandbox denial with zero reads
- legacy missing denial metadata with exactly one compatibility read
- count-bounded replay eviction recovered from retained output without
duplication

## Validation

- `just test -p codex-core
exec_command_consumes_pushed_remote_process_events`: 4 passed
- `just test -p codex-core unified_exec::process_tests::`: 4 passed
- `just test -p codex-exec-server`: 294 passed, 2 skipped
- `just test -p codex-exec-server-protocol`: 5 passed
- `just test -p codex-rmcp-client`: 89 passed, 2 skipped
- focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
- scoped `just fix` passed for core and exec-server
- `just fmt` passed

The complete workspace suite was not rerun; focused Cargo and Bazel
coverage passed for the changed behavior.
## Why

Remote diff-root discovery is independent of world-state construction,
but it ran afterward and added filesystem metadata latency before the
first model request. Overlap the independent work so thread-cold turns
do not pay those waits serially.

## What

- Run `record_context_updates_and_set_reference_context_item` and
`turn_diff_display_roots` with `tokio::join!`.
- Reuse the same resolved display roots when constructing
`TurnDiffTracker`; no cache or behavior lifecycle changes are
introduced.

## Validation

A synthetic executor-skill benchmark with artificial network delay:
thread-cold model-request p50 improved from about 1.79 s to 1.58 s.
## Why

`LOG_FORMAT=json` and `RUST_LOG` are supported by app-server, but the
behavior was only covered indirectly. We should verify the actual JSONL
written by both user-facing entry points: `codex app-server` and the
standalone `codex-app-server` binary.

The existing processor shutdown message also always said the channel
closed, even though the processor can exit for several different
reasons. Structured fields make that event more accurate and useful to
log consumers.

## What changed

- Record the processor `exit_reason`, remaining connection count, and
forced-shutdown state as structured tracing fields.
- Add a shared process-test helper that enables JSON logging, validates
every stderr line as JSON, and verifies the top-level timestamp is RFC
3339.
- Cover both `codex app-server` and `codex-app-server`, asserting the
stable `level`, `fields`, and `target` payload.

## Test plan

- `just test -p codex-app-server
standalone_app_server_emits_json_info_events`
- `just test -p codex-cli app_server_emits_json_info_events`
## Summary

- Preserve the optional namespace on custom tool calls during response
deserialization and app-server replay.
- Use the namespaced tool identifier for streaming argument handling and
tool dispatch.
- Regenerate app-server protocol schemas.
- Add regression tests covering namespace serialization and routing.

## Testing

- Ran affected protocol and app-server test suites.
- Ran the full core test suite; two load-sensitive timing tests passed
when rerun individually.
- Ran Clippy and formatting checks.
- Verified with a local end-to-end app-server replay that the namespace
is preserved through the complete request/response flow.
## Why

Response item IDs represent stable conversation identity.
`ContextManager::for_prompt` repairs an unmatched call by synthesizing
an `"aborted"` output in the disposable prompt projection, but that
output previously had no ID. Assigning a fresh ID on every prompt build
would make retries and resumes change otherwise identical model context
and reduce prompt-cache reuse.

The concrete bug is that these normalization-created outputs bypass the
regular item-ID allocation path. Even with item IDs enabled, a prompt
could therefore contain an identified call paired with a synthetic
output whose `id` was missing. This change closes that gap by deriving
the output ID from the source call's item ID. For legacy calls that have
no item ID, the output remains ID-less because there is no stable source
identity to derive from.

The originating call already has a stable item ID under the item-ID
model introduced in #28814. A prompt-only output can therefore derive
stable identity from that call without mutating canonical history or
persisted rollouts. This addresses the failure exposed by #30311 while
keeping normalization read-only outside its detached prompt snapshot.

UUIDv5 is intentional here because it is the standard namespaced,
deterministic UUID construction. Using the output kind and source call
ID as the name produces the same UUID on every projection while keeping
output kinds in separate name domains. UUIDv7 would introduce randomness
and time, so keeping it stable would require persisting the synthetic
repair. UUIDv5 uses SHA-1 internally, but this is only an identity
mapping—not an authenticity or security boundary.

## What changed

- Derive a deterministic UUIDv5 ID for each synthesized call output from
the source call item ID.
- Use the Responses API prefix appropriate for function, custom-tool,
tool-search, and local-shell outputs.
- Preserve the existing insertion position immediately after the
unmatched call.
- Keep synthesized outputs prompt-only; no rollout, task-lifecycle,
compaction, or raw-response behavior changes.
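
To make the determinism argument concrete, here is a dependency-free sketch of a UUIDv5 derivation. Everything specific in it is an assumption for illustration: the namespace constant (the RFC 4122 DNS namespace is reused as a placeholder), the `kind:call_id` name format, and the `synthetic_output_id` helper are hypothetical, and the Responses API prefixes are omitted. A real implementation would more likely use the `uuid` crate than a hand-written SHA-1.

```rust
/// Minimal SHA-1, included only so the sketch has no dependencies.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64) * 8;
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());
    for chunk in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for i in 0..80 {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *slot = slot.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// RFC 4122 version 5: SHA-1 over the namespace bytes followed by the name.
fn uuid_v5(namespace: &[u8; 16], name: &[u8]) -> [u8; 16] {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name);
    let digest = sha1(&input);
    let mut out = [0u8; 16];
    out.copy_from_slice(&digest[..16]);
    out[6] = (out[6] & 0x0f) | 0x50; // version 5
    out[8] = (out[8] & 0x3f) | 0x80; // RFC 4122 variant
    out
}

fn format_uuid(bytes: &[u8; 16]) -> String {
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
}

/// Placeholder namespace (the RFC 4122 DNS namespace); a real implementation
/// would presumably define its own.
const EXAMPLE_NAMESPACE: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];

/// Hypothetical helper: legacy calls without an item ID yield no output ID.
fn synthetic_output_id(kind: &str, call_item_id: Option<&str>) -> Option<String> {
    let call_id = call_item_id?;
    let name = format!("{}:{}", kind, call_id);
    Some(format_uuid(&uuid_v5(&EXAMPLE_NAMESPACE, name.as_bytes())))
}
```

Since the name includes the output kind, two different output kinds derived from the same call ID land in separate name domains, and repeated projections of the same call always yield the same ID.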

## Testing

- `just test -p codex-core
for_prompt_assigns_stable_id_to_synthetic_output_without_reordering_history`
- `just test -p codex-core
synthetic_call_output_id_is_stable_across_resumes`
- `just test -p codex-core normalize_adds_missing_output`
- `just test -p codex-core response_item_ids`
## Why

App-server clients that configure named execution environments need to
discover an environment's shell and working directory before selecting
it for a thread or turn. Because the environment can run on a different
operating system than app-server, its working directory is represented
as a canonical `file:` URI rather than a host-local path string. The
probe also needs a bounded response time: an exec-server that completes
initialization but never answers `environment/info` must not hold the
environment serialization queue indefinitely.

## What changed

- Add an experimental `environment/info` app-server RPC for named
environments.
- Route the probe through the managed environment connection and return
target-native shell metadata plus the default working directory as a
`PathUri`.
- Return connection and protocol failures as JSON-RPC errors.
- Bound the exec-server probe response to 30 seconds and remove
timed-out calls from the pending-request table so later environment
mutations can proceed.
- Cover successful responses, omitted working directories, unknown
environments, connection failures, and pending-call cleanup.

## Protocol examples

Request:

```json
{
  "id": 42,
  "method": "environment/info",
  "params": {
    "environmentId": "remote-a"
  }
}
```

Successful response:

```json
{
  "id": 42,
  "result": {
    "shell": {
      "name": "zsh",
      "path": "/bin/zsh"
    },
    "cwd": "file:///workspace"
  }
}
```

If the exec-server initializes but does not answer the probe within 30
seconds:

```json
{
  "id": 42,
  "error": {
    "code": -32603,
    "message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s"
  }
}
```
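
The bounded wait and the pending-request cleanup can be sketched with the standard library alone. This is a synchronous analog under stated assumptions: the `Pending` table, `register`, `deliver`, and `wait_for_response` are hypothetical names, and the real exec-server client is presumably asynchronous.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical pending-request table keyed by request ID.
type Pending = Arc<Mutex<HashMap<u64, Sender<String>>>>;

/// Record an in-flight request and hand back the channel its reply arrives on.
fn register(pending: &Pending, id: u64) -> Receiver<String> {
    let (tx, rx) = mpsc::channel();
    pending.lock().unwrap().insert(id, tx);
    rx
}

/// Route a reply to its waiter. Returns `false` when nobody is waiting,
/// for example because the call already timed out and was removed.
fn deliver(pending: &Pending, id: u64, body: String) -> bool {
    match pending.lock().unwrap().remove(&id) {
        Some(tx) => tx.send(body).is_ok(),
        None => false,
    }
}

/// Wait for a reply, but never longer than `timeout`. On failure the entry is
/// removed so the timed-out call cannot block later work.
fn wait_for_response(
    pending: &Pending,
    id: u64,
    rx: Receiver<String>,
    timeout: Duration,
) -> Result<String, String> {
    match rx.recv_timeout(timeout) {
        Ok(body) => Ok(body),
        Err(_) => {
            pending.lock().unwrap().remove(&id);
            Err(format!("timed out waiting for response after {:?}", timeout))
        }
    }
}
```

Removing the entry on the timeout path is the important step: without it, the table would keep a sender for a call nobody is waiting on.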

## Testing

- App-server integration coverage for successful info (including omitted
`cwd`), unknown environments, and connection failures.
- Exec-server RPC coverage verifying a timed-out call is removed from
the pending-request table.

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
## Summary

- project effective marketplace/plugin config through the enterprise
source policy so blocked installed plugins become inactive
- filter plugin list/read/discovery and CLI marketplace source/snapshot
reporting using the same policy
- enforce source admission for background marketplace cache refreshes
- continue refreshing/upgrading independent marketplaces and plugins
when one entry fails, returning per-entry errors
- include policy-projected plugin state in cache and refresh keys so
requirement changes invalidate stale results
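
A minimal sketch of two of the behaviors above: source admission before refresh, and continuing past a failed entry while collecting per-entry errors. The `SourcePolicy` prefix allow-list, the `Marketplace` struct, and `refresh_all` are hypothetical stand-ins; the actual admission model lives in #29690 and is not reproduced here.

```rust
/// Hypothetical policy: an allow-list of source prefixes.
struct SourcePolicy {
    allowed_prefixes: Vec<String>,
}

impl SourcePolicy {
    fn admits(&self, source: &str) -> bool {
        self.allowed_prefixes
            .iter()
            .any(|prefix| source.starts_with(prefix.as_str()))
    }
}

/// Hypothetical marketplace entry.
struct Marketplace {
    name: String,
    source: String,
}

/// Refresh every marketplace independently. A blocked or failing entry is
/// reported in its own slot and does not stop the remaining entries.
fn refresh_all<F>(
    marketplaces: &[Marketplace],
    policy: &SourcePolicy,
    mut refresh: F,
) -> Vec<(String, Result<(), String>)>
where
    F: FnMut(&Marketplace) -> Result<(), String>,
{
    marketplaces
        .iter()
        .map(|m| {
            let outcome = if policy.admits(&m.source) {
                refresh(m)
            } else {
                Err(format!("source `{}` is blocked by policy", m.source))
            };
            (m.name.clone(), outcome)
        })
        .collect()
}
```

Returning one `Result` per entry, instead of short-circuiting on the first error, is what lets callers report partial success.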

## Stack

This is PR 2 of 2 and is based on #29690. Review the admission model and
source matcher in #29690 first; this PR contains only runtime
enforcement.

## Test plan

- `just test -p codex-core-plugins` (287 tests)
- `just test -p codex-cli
plugin_list_ignores_implicit_system_marketplace_roots_without_manifests`
- `cargo check -p codex-cli -p codex-app-server --tests`
## Summary

Increase the external `currentTime/read` request timeout from 5 seconds
to 10 seconds.

## Validation

- `just fmt`
- Focused app-server test build was stopped to defer validation to CI.