fix(queue): cap dead-letter manual retries to prevent infinite retry loops#223
fix(queue): cap dead-letter manual retries to prevent infinite retry loops#223jcenters wants to merge 1 commit intoTinyAGI:mainfrom
Conversation
…loops retryDeadMessage() was resetting retry_count to 0 on every manual retry, giving each retried message a full 5 new attempts. An agent with access to the tinyclaw-admin skill could call POST /api/queue/dead/:id/retry indefinitely, creating an infinite token burn loop on any persistently failing message. Changes: - Add MAX_MANUAL_RETRIES = 3 constant - Add manual_retry_count column to messages table (with migration for existing databases) - retryDeadMessage() returns 'cap_exceeded' after 3 manual retries instead of resetting — no bypass - On retry, set retry_count = MAX_RETRIES - 1 (not 0) so the message gets exactly one more automatic attempt rather than five - POST /api/queue/dead/:id/retry returns HTTP 429 when cap is exceeded - GET /api/queue/dead now includes manualRetriesUsed and manualRetriesRemaining in the response Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Greptile SummaryThis PR closes an infinite-retry vulnerability in the dead-letter queue by introducing a Key changes:
Issue found:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Agent as Admin Agent
participant API as POST /api/queue/dead/:id/retry
participant Core as retryDeadMessage()
participant DB as SQLite messages
Agent->>API: POST /api/queue/dead/42/retry
API->>Core: retryDeadMessage(42)
Core->>DB: SELECT manual_retry_count WHERE id=42 AND status='dead'
DB-->>Core: { manual_retry_count: N }
alt message not found
Core-->>API: false
API-->>Agent: 404 dead message not found
else manual_retry_count >= MAX_MANUAL_RETRIES (3)
Core-->>API: 'cap_exceeded'
API-->>Agent: 429 Manual retry limit (3) reached
else manual_retry_count < 3
Core->>DB: UPDATE SET status='pending', retry_count=4, manual_retry_count=N+1
DB-->>Core: changes=1
Core-->>API: true
API-->>Agent: 200 { ok: true }
Note over DB: Message re-enters queue with retry_count=4
DB->>DB: failMessage → retry_count=5 ≥ MAX_RETRIES → status='dead'
end
Last reviewed commit: 85c3aab |
| failedReason: m.last_error, | ||
| attemptsMade: m.retry_count, | ||
| manualRetriesUsed: m.manual_retry_count, | ||
| manualRetriesRemaining: Math.max(0, 3 - m.manual_retry_count), |
There was a problem hiding this comment.
Magic number instead of shared constant
manualRetriesRemaining is computed with the literal 3 rather than a shared constant. Because MAX_MANUAL_RETRIES is defined (but not exported) in queues.ts, this file has no choice but to hardcode the value. If the cap is ever adjusted in queues.ts, the API response will silently report a stale limit, making manualRetriesRemaining wrong for any client that relies on it.
The same problem appears on line 124 where the error message uses ${3} — syntactically a template-literal interpolation that always evaluates to the literal 3.
Fix: Export MAX_MANUAL_RETRIES from queues.ts and import it here:
In packages/core/src/queues.ts:
export const MAX_MANUAL_RETRIES = 3;In packages/server/src/routes/queue.ts:
import { ..., MAX_MANUAL_RETRIES } from '@tinyclaw/core';Then replace both occurrences:
| manualRetriesRemaining: Math.max(0, 3 - m.manual_retry_count), | |
| manualRetriesRemaining: Math.max(0, MAX_MANUAL_RETRIES - m.manual_retry_count), |
| if (!ok) return c.json({ error: 'dead message not found' }, 404); | ||
| const result = retryDeadMessage(id); | ||
| if (result === false) return c.json({ error: 'dead message not found' }, 404); | ||
| if (result === 'cap_exceeded') return c.json({ error: `Manual retry limit (${3}) reached for this message. Delete it if you want to discard it.` }, 429); |
There was a problem hiding this comment.
Hardcoded literal ${3} in error message
The template literal ${3} embeds a bare numeric literal inside interpolation braces. It evaluates correctly today, but it will diverge from the real cap the moment MAX_MANUAL_RETRIES is changed in queues.ts. Once MAX_MANUAL_RETRIES is exported (see comment on line 114), use it here as well:
| if (result === 'cap_exceeded') return c.json({ error: `Manual retry limit (${3}) reached for this message. Delete it if you want to discard it.` }, 429); | |
| if (result === 'cap_exceeded') return c.json({ error: `Manual retry limit (${MAX_MANUAL_RETRIES}) reached for this message. Delete it if you want to discard it.` }, 429); |
Problem
retryDeadMessage()resetretry_countto 0 on every manual retry, giving each retried message a full 5 new automatic attempts. Since thetinyclaw-adminskill documentsPOST /api/queue/dead/:id/retry, agents can call it repeatedly on any persistently failing message — creating an infinite retry loop with no upper bound on token usage.Changes
packages/core/src/queues.tsMAX_MANUAL_RETRIES = 3manual_retry_countcolumn to themessagestable with a migration for existing databasesretryDeadMessage()returns'cap_exceeded'(notfalse) after 3 manual retries — no bypassretry_count = MAX_RETRIES - 1instead of 0, so the message gets exactly one more automatic attempt before going dead again rather than fivepackages/server/src/routes/queue.tsPOST /api/queue/dead/:id/retryreturns HTTP 429 when the manual retry cap is exceededGET /api/queue/deadnow includesmanualRetriesUsedandmanualRetriesRemainingin each dead message objectTest plan
manualRetriesRemaining: 3retry_count = 4manualRetriesRemaining: 2manual_retry_countcolumn is added automatically on startup🤖 Generated with Claude Code