| 1 | +--- |
| 2 | +name: temporal-diagnose |
| 3 | +description: > |
| 4 | + Two modes for working on the Temporal worker library loading bug. |
| 5 | + DIAGNOSE mode: run the 3-terminal Temporal dev setup (server + worker + job), |
| 6 | + observe the failure, interpret errors. Use when the user says "test temporal", |
| 7 | + "run temporal", "diagnose temporal", "temporal dev", "reproduce the temporal bug", |
| 8 | + "check if temporal works", or pastes Temporal worker/submitter output to interpret. |
| 9 | + FIX mode: discuss architecture, design the solution, plan implementation, make |
| 10 | + code changes. Use when the user says "fix temporal", "let's discuss a fix", |
| 11 | + "design the temporal fix", "implement the temporal fix", "plan the temporal |
| 12 | + solution", or wants to iterate on the worker library loading solution. |
| 13 | + Always use this skill when the conversation touches the Temporal worker library |
| 14 | + problem, get_required_pipe failures on the worker, or mthds_contents not reaching |
| 15 | + the worker. |
| 16 | +--- |
| 17 | + |
| 18 | +# Temporal Worker Library — Diagnose & Fix |
| 19 | + |
| 20 | +This skill has two modes. Determine which one from the user's prompt: |
| 21 | + |
| 22 | +- **"diagnose"**, **"test"**, **"run"**, **"reproduce"**, **"check"** → DIAGNOSE mode |
| 23 | +- **"fix"**, **"discuss"**, **"design"**, **"implement"**, **"plan"**, **"solution"** → FIX mode |
| 24 | + |
| 25 | +If ambiguous, ask the user: "Do you want to diagnose (run the setup and observe) or discuss the fix?" |
| 26 | + |
| 27 | +Read `references/temporal-worker-problem.md` before proceeding — it explains the |
| 28 | +root cause, code paths, and expected error patterns. |
| 29 | + |
| 30 | +## How Claude Code runs everything |
| 31 | + |
| 32 | +Claude Code handles all three processes. Do NOT ask the user to open terminals |
| 33 | +or run commands — do it yourself. |
| 34 | + |
| 35 | +Use **tmux** to manage the long-running processes (server and worker) in named |
| 36 | +sessions. This lets you start them, run the job submitter, and then capture |
| 37 | +output from all three to diagnose. |
| 38 | + |
| 39 | +| Process | tmux session | Raw command | Lifecycle | |
| 40 | +|---------|-------------|-------------|-----------| |
| 41 | +| Temporal server | `temporal-server` | `temporal server start-dev` | Long-running, stays up across iterations | |
| 42 | +| Temporal worker | `temporal-worker` | `PIPELEXPATH=<bundle_dir> .venv/bin/python -m pipelex.temporal.worker_cli --is-not-sandboxed` | Long-running, restart after code changes | |
| 43 | +| Job submitter | (inline Bash) | `make trund` / `make trun` | Runs and exits | |
| 44 | + |
| 45 | +**Important**: The server and worker are **long-running processes that never exit**.
| 46 | +They block the shell they run in, which is why they run inside tmux sessions, not
| 47 | +inline. The submitter is the only process that runs to completion and exits: run
| 48 | +`make trund` / `make trun` directly via Bash, or detach it as in Step 3 when it may hang.
| 49 | + |
| 50 | +**Why raw commands in tmux**: tmux sessions run in a bare shell without the |
| 51 | +Makefile's variable resolution (`$(VENV_PYTHON)`, `$(call PRINT_TITLE,...)`). |
| 52 | +Using `make ts` or `make tw` inside tmux will fail. Always use the raw commands |
| 53 | +shown above for tmux sessions. The `make` targets are only for the job submitter |
| 54 | +which runs in Claude Code's own shell. |
| 55 | + |
| 56 | +### tmux cheatsheet |
| 57 | + |
| 58 | +**Start a session:** |
| 59 | +```bash |
| 60 | +tmux new-session -d -s temporal-server 'temporal server start-dev' |
| 61 | +``` |
| 62 | + |
| 63 | +**Check if running:** |
| 64 | +```bash |
| 65 | +tmux has-session -t temporal-server 2>/dev/null && echo "running" || echo "not running" |
| 66 | +``` |
| 67 | + |
| 68 | +**Read output** (last N lines): |
| 69 | +```bash |
| 70 | +tmux capture-pane -t temporal-worker -p -S -100 |
| 71 | +``` |
| 72 | + |
| 73 | +**Kill and restart** (e.g., to pick up code changes): |
| 74 | +```bash |
| 75 | +tmux kill-session -t temporal-worker |
| 76 | +tmux new-session -d -c "$PWD" -s temporal-worker 'PIPELEXPATH=tests/integration/pipelex/pipes/controller/pipe_sequence .venv/bin/python -m pipelex.temporal.worker_cli --is-not-sandboxed' |
| 77 | +``` |
| 78 | + |
| 79 | +If tmux is not installed, fall back to asking the user to run the server and |
| 80 | +worker in separate terminals. |
| 81 | + |
| 82 | +--- |
| 83 | + |
| 84 | +## DIAGNOSE Mode |
| 85 | + |
| 86 | +Run the 3-process Temporal development setup and interpret results. |
| 87 | + |
| 88 | +### Prerequisites |
| 89 | + |
| 90 | +Verify these yourself (via Bash): |
| 91 | +1. `tmux` installed: `which tmux` |
| 92 | +2. `temporal` CLI installed: `which temporal` |
| 93 | + |
| 94 | +### Step 1: Start the Temporal server |
| 95 | + |
| 96 | +First check if a server is already running (possibly outside tmux from a previous |
| 97 | +session or another terminal): |
| 98 | +```bash |
| 99 | +curl -s http://localhost:8233 > /dev/null && echo "running" || echo "not running" |
| 100 | +``` |
| 101 | + |
| 102 | +If **running**: skip to step 2. The server is already up — no need to start it again. |
| 103 | + |
| 104 | +If **not running**: start it in a tmux session: |
| 105 | +```bash |
| 106 | +tmux new-session -d -s temporal-server 'temporal server start-dev' |
| 107 | +``` |
| 108 | +Sleep **3 seconds**, then verify: |
| 109 | +```bash |
| 110 | +sleep 3 && curl -s http://localhost:8233 > /dev/null && echo "running" || echo "not running" |
| 111 | +``` |
| 112 | + |
| 113 | +Do NOT start a second server if port 7233 (the gRPC port; 8233 above is the Web UI) is
| 114 | +already in use: it will fail with a bind error, the tmux session will exit immediately,
| 115 | +and subsequent `capture-pane` calls will fail.
| 116 | + |
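A fixed `sleep 3` can be too short on a slow machine. As a minimal sketch of a polling alternative, assuming bash: `wait_until` is a hypothetical helper introduced here for illustration, not something the repository or its Makefile provides.

```bash
# wait_until RETRIES DELAY CMD...
# Run CMD up to RETRIES times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
# Hypothetical helper: not part of the skill or of any make target.
wait_until() {
  local retries=$1 delay=$2
  shift 2
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Usage with the health check from Step 1 (up to 10 tries, 1 second apart):
#   wait_until 10 1 curl -s http://localhost:8233 && echo "running" || echo "not running"
```

The retry count and delay are arbitrary; the same helper could wrap the worker check in Step 2 if a fixed sleep proves flaky there too.
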
| 117 | +### Step 2: Start the worker |
| 118 | + |
| 119 | +```bash |
| 120 | +tmux has-session -t temporal-worker 2>/dev/null || \
| 121 | + tmux new-session -d -c "$PWD" -s temporal-worker \
| 122 | + 'PIPELEXPATH=tests/integration/pipelex/pipes/controller/pipe_sequence .venv/bin/python -m pipelex.temporal.worker_cli --is-not-sandboxed'
| 123 | +``` |
| 124 | + |
| 125 | +The worker is also long-running and never exits. Sleep **4 seconds** (no more), |
| 126 | +then capture the pane: |
| 127 | +```bash |
| 128 | +sleep 4 && tmux capture-pane -t temporal-worker -p -S -30 |
| 129 | +``` |
| 130 | +Look for `Temporal Worker started for 'temporal_task_queue'`. |
| 131 | + |
| 132 | +### Step 3: Submit a job |
| 133 | + |
| 134 | +Run the job submitter. It connects to Temporal, submits the workflow, and **waits |
| 135 | +for the result**. If the worker fails to process the job (e.g., deserialization |
| 136 | +error), the submitter may hang for a long time waiting for a response that never |
| 137 | +comes. Run it in the background so you can check worker output while it's waiting. |
| 138 | + |
| 139 | +Dry run (no real LLM calls): |
| 140 | +```bash |
| 141 | +TEMPORAL_BUNDLE="tests/integration/pipelex/pipes/controller/pipe_sequence/pipe_sequence_1.mthds" |
| 142 | +tmux has-session -t temporal-submitter 2>/dev/null || \ |
| 143 | + tmux new-session -d -s temporal-submitter \ |
| 144 | + "cd $PWD && .venv/bin/pipelex run bundle $TEMPORAL_BUNDLE --temporal --dry-run --mock-inputs --no-logo" |
| 145 | +``` |
| 146 | + |
| 147 | +Or for real LLM execution: |
| 148 | +```bash |
| 149 | +TEMPORAL_BUNDLE="tests/integration/pipelex/pipes/controller/pipe_sequence/pipe_sequence_1.mthds" |
| 150 | +tmux has-session -t temporal-submitter 2>/dev/null || \ |
| 151 | + tmux new-session -d -s temporal-submitter \ |
| 152 | + "cd $PWD && .venv/bin/pipelex run bundle $TEMPORAL_BUNDLE --temporal --mock-inputs --no-logo" |
| 153 | +``` |
| 154 | + |
| 155 | +Both commands use `pipe_sequence_1.mthds`. To target a specific pipe, add `--pipe <pipe_code>`.
| 156 | +To run a different bundle, change `TEMPORAL_BUNDLE`.
| 157 | + |
| 158 | +### Step 4: Diagnose the output |
| 159 | + |
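| 160 | +Read the submitter output from step 3 (`tmux capture-pane -t temporal-submitter -p -S -200`; tmux closes that session once the command exits, so capture it while the submitter is still running) AND the worker output: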
| 161 | +```bash |
| 162 | +tmux capture-pane -t temporal-worker -p -S -200 |
| 163 | +``` |
| 164 | + |
| 165 | +**Expected failure (bug not yet fixed):** |
| 166 | + |
| 167 | +There are two failure layers, both caused by the missing library on the worker. |
| 168 | +See `references/temporal-worker-problem.md` for details. |
| 169 | + |
| 170 | +**Layer 1 — Deserialization failure** (hits first): |
| 171 | +1. The PipeJob's WorkingMemory contains Stuff objects with dynamically-generated |
| 172 | + concept content classes (e.g., `RawText` inheriting from `TextContent`) |
| 173 | +2. These classes are generated during library loading by `ConceptFactory` / |
| 174 | + `StructureGenerator` and registered with Kajson's class registry |
| 175 | +3. On the worker, the library was never loaded → these classes don't exist → |
| 176 | + Kajson fails with `KajsonDecoderError: Class 'RawText' not found in module 'builtins'` |
| 177 | +4. Temporal wraps this as `RuntimeError: Failed decoding arguments` |
| 178 | +5. The submitter may hang waiting for a result that never comes |
| 179 | + |
| 180 | +**Layer 2 — Library resolution failure** (would hit after Layer 1 is fixed): |
| 181 | +1. `WfPipeRouter.run()` receives the PipeJob with the top-level PipeSequence |
| 182 | +2. `PipeSequence.run_pipe()` calls `get_required_pipe("clean_text")` |
| 183 | +3. `library_manager` singleton is empty on the worker → error |
| 184 | +4. Propagates as `TemporalError` / `ActivityError` to the submitter |
| 185 | + |
| 186 | +The submitter output will show a Temporal workflow failure (or hang indefinitely |
| 187 | +for Layer 1 failures). |
| 188 | + |
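The two failure layers above can be told apart mechanically from captured output. The following is a rough sketch, assuming bash and assuming the error names quoted in this skill appear literally in the logs; `classify_temporal_output` and its labels are hypothetical, and the patterns may need adjusting to the real log wording.

```bash
# classify_temporal_output: read captured output on stdin, print a coarse label.
# Layer 1 is checked first because it hits first and masks Layer 2.
# The patterns are the error names quoted in this skill; whether they appear
# verbatim in the worker or submitter logs is an assumption.
classify_temporal_output() {
  local output
  output=$(cat)
  if printf '%s\n' "$output" | grep -Eq 'KajsonDecoderError|Failed decoding arguments'; then
    echo "layer1-deserialization"
  elif printf '%s\n' "$output" | grep -Eq 'TemporalError|ActivityError'; then
    echo "layer2-library-resolution"
  else
    echo "unclassified"
  fi
}

# Usage:
#   tmux capture-pane -t temporal-worker -p -S -200 | classify_temporal_output
```

An `unclassified` result says only that none of the known patterns matched. It does not mean the run succeeded, so the captured output still needs reading.
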
| 189 | +**After fix is applied (success looks like):** |
| 190 | +- Submitter: successful pipeline result printed to stdout |
| 191 | +- Worker (`tmux capture-pane`): logs showing pipe execution steps |
| 192 | +- Temporal UI (http://localhost:8233): completed workflow with result |
| 193 | + |
| 194 | +### Step 5: Iterate |
| 195 | + |
| 196 | +1. Make code changes
| 197 | +2. Kill and restart the worker (to pick up the changes):
| 198 | + ```bash
| 199 | + tmux kill-session -t temporal-worker
| 200 | + tmux new-session -d -c "$PWD" -s temporal-worker 'PIPELEXPATH=tests/integration/pipelex/pipes/controller/pipe_sequence .venv/bin/python -m pipelex.temporal.worker_cli --is-not-sandboxed'
| 201 | + sleep 5
| 202 | + ```
| 203 | +3. Run `make trund` again and read the result |
| 204 | +4. Capture worker output: `tmux capture-pane -t temporal-worker -p -S -200` |
| 205 | +5. Repeat |
| 206 | + |
| 207 | +The server session (`temporal-server`) stays running across iterations. |
| 208 | + |
| 209 | +### Cleanup |
| 210 | + |
| 211 | +When done with the entire session (also kill `temporal-submitter` if a hung submitter left it behind):
| 212 | +```bash |
| 213 | +tmux kill-session -t temporal-worker 2>/dev/null |
| 214 | +tmux kill-session -t temporal-server 2>/dev/null |
| 215 | +``` |
| 216 | + |
| 217 | +### Test bundles for different pipe controllers |
| 218 | + |
| 219 | +| Controller | Bundle path | |
| 220 | +|------------|-------------| |
| 221 | +| PipeSequence | `tests/integration/pipelex/pipes/controller/pipe_sequence/pipe_sequence_1.mthds` | |
| 222 | +| PipeCondition | `tests/integration/pipelex/pipes/controller/pipe_condition/pipe_condition_1.mthds` | |
| 223 | +| PipeBatch | `tests/integration/pipelex/pipes/controller/pipe_batch/uppercase_transformer.mthds` | |
| 224 | +| PipeParallel | `tests/integration/pipelex/pipes/controller/pipe_parallel/pipe_parallel_1.mthds` | |
| 225 | + |
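Switching to another controller's bundle also means restarting the worker with a matching `PIPELEXPATH`. The sketch below assumes bash, and assumes `<bundle_dir>` in the process table is always the bundle's parent directory, as it is in the PipeSequence example. `worker_cmd_for_bundle` is a hypothetical helper, and it expects paths without spaces.

```bash
# worker_cmd_for_bundle BUNDLE
# Print the raw worker command with PIPELEXPATH set to the bundle's parent
# directory. Assumption: <bundle_dir> is the directory containing the bundle.
# Hypothetical helper; paths are expected to contain no spaces.
worker_cmd_for_bundle() {
  local bundle=$1
  local bundle_dir
  bundle_dir=$(dirname "$bundle")
  printf 'PIPELEXPATH=%s .venv/bin/python -m pipelex.temporal.worker_cli --is-not-sandboxed\n' "$bundle_dir"
}

# Usage: restart the worker for the PipeCondition bundle.
#   bundle="tests/integration/pipelex/pipes/controller/pipe_condition/pipe_condition_1.mthds"
#   tmux kill-session -t temporal-worker 2>/dev/null
#   tmux new-session -d -c "$PWD" -s temporal-worker "$(worker_cmd_for_bundle "$bundle")"
```
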
| 226 | +--- |
| 227 | + |
| 228 | +## FIX Mode |
| 229 | + |
| 230 | +Discuss architecture, design choices, and implementation for solving the worker |
| 231 | +library loading problem. Stay in discussion/planning territory — do NOT jump to |
| 232 | +code changes unless the user explicitly says to implement. |
| 233 | + |
| 234 | +### What you must understand first |
| 235 | + |
| 236 | +Read `references/temporal-worker-problem.md` thoroughly. The core tension: |
| 237 | +- `pipeline_run_setup()` loads the library into `library_manager` — but only in the API process |
| 238 | +- PipeJob carries the serialized top-level pipe, but child pipes are resolved by code at runtime |
| 239 | +- On the worker, `get_required_pipe()` finds an empty library |
| 240 | + |
| 241 | +### Design dimensions to discuss with the user |
| 242 | + |
| 243 | +1. **Where does the library load on the worker?** |
| 244 | + - At worker startup (base library from PIPELEXPATH)? |
| 245 | + - Per-workflow in an Activity (custom bundles from mthds_contents)? |
| 246 | + - Both (two-tier cache)? |
| 247 | + |
| 248 | +2. **What travels with the workflow input?** |
| 249 | + - Today: a pre-resolved `PipeJob` with the top-level pipe object |
| 250 | + - Option A: send `mthds_contents` + `pipe_code` instead, resolve on worker |
| 251 | + - Option B: send `PipeJob` but also include `mthds_contents` for the worker to load |
| 252 | + |
| 253 | +3. **Replay safety** |
| 254 | + - Library loading is I/O — it belongs in Activities, not workflow code |
| 255 | + - Side-effect state (loading into a singleton) is lost on replay |
| 256 | + - Activities re-execute cleanly on replay |
| 257 | + |
| 258 | +4. **Caching strategy** |
| 259 | + - Tier 1: base library at worker startup (same for all executions) |
| 260 | + - Tier 2: per-request overlay cached by content hash of mthds_contents |
| 261 | + |
| 262 | +### Key files to read and discuss |
| 263 | + |
| 264 | +| What | Where | |
| 265 | +|------|-------| |
| 266 | +| Library loading (API-side) | `pipelex/pipeline/pipeline_run_setup.py` | |
| 267 | +| Hub singleton + get_required_pipe | `pipelex/hub.py` | |
| 268 | +| Workflow definition | `pipelex/temporal/tprl_pipe/wf_pipe_router.py` | |
| 269 | +| Router (Temporal) | `pipelex/temporal/tprl_pipe/pipe_router_top.py` | |
| 270 | +| Router (local, works fine) | `pipelex/pipe_run/pipe_router.py` | |
| 271 | +| Worker startup | `pipelex/temporal/worker_cli.py` | |
| 272 | +| All controllers that break | `pipelex/pipe_controllers/` (sequence, condition, batch, parallel, sub_pipe) | |
| 273 | +| Library manager | `pipelex/libraries/library_manager.py` | |
| 274 | + |
| 275 | +### When the user says "implement" |
| 276 | + |
| 277 | +Only then shift to making code changes. Use the diagnose loop to verify each change: |
| 278 | +1. Make code changes yourself |
| 279 | +2. Restart the worker: |
| 280 | + ```bash |
| 281 | + tmux kill-session -t temporal-worker |
| 282 | + tmux new-session -d -c "$PWD" -s temporal-worker 'PIPELEXPATH=tests/integration/pipelex/pipes/controller/pipe_sequence .venv/bin/python -m pipelex.temporal.worker_cli --is-not-sandboxed' |
| 283 | + sleep 5 |
| 284 | + ``` |
| 285 | +3. Run `make trund` via Bash and read the output |
| 286 | +4. Capture worker output: `tmux capture-pane -t temporal-worker -p -S -200` |
| 287 | +5. Repeat |