Skip to content

fix(queue): cap dead-letter manual retries to prevent infinite retry loops#223

Open
jcenters wants to merge 1 commit intoTinyAGI:mainfrom
jcenters:fix/dead-letter-retry-cap
Open

fix(queue): cap dead-letter manual retries to prevent infinite retry loops#223
jcenters wants to merge 1 commit intoTinyAGI:mainfrom
jcenters:fix/dead-letter-retry-cap

Conversation

@jcenters
Copy link

Problem

retryDeadMessage() reset retry_count to 0 on every manual retry, giving each retried message a full 5 new automatic attempts. Since the tinyclaw-admin skill documents POST /api/queue/dead/:id/retry, agents can call it repeatedly on any persistently failing message — creating an infinite retry loop with no upper bound on token usage.

Changes

packages/core/src/queues.ts

  • Add MAX_MANUAL_RETRIES = 3
  • Add manual_retry_count column to the messages table with a migration for existing databases
  • retryDeadMessage() returns 'cap_exceeded' (not false) after 3 manual retries — no bypass
  • On retry, set retry_count = MAX_RETRIES - 1 instead of 0, so the message gets exactly one more automatic attempt before going dead again rather than five

packages/server/src/routes/queue.ts

  • POST /api/queue/dead/:id/retry returns HTTP 429 when the manual retry cap is exceeded
  • GET /api/queue/dead now includes manualRetriesUsed and manualRetriesRemaining in each dead message object

Test plan

  • New message fails 5 times → goes dead → manualRetriesRemaining: 3
  • Retry 1: returns 200, message re-enters queue with retry_count = 4
  • Message fails again → goes dead → manualRetriesRemaining: 2
  • Retry 2, Retry 3: same behavior
  • Retry 4: returns HTTP 429, message stays dead
  • Existing databases: manual_retry_count column is added automatically on startup

🤖 Generated with Claude Code

…loops

retryDeadMessage() was resetting retry_count to 0 on every manual retry,
giving each retried message a full 5 new attempts. An agent with access
to the tinyclaw-admin skill could call POST /api/queue/dead/:id/retry
indefinitely, creating an infinite token burn loop on any persistently
failing message.

Changes:
- Add MAX_MANUAL_RETRIES = 3 constant
- Add manual_retry_count column to messages table (with migration for
  existing databases)
- retryDeadMessage() returns 'cap_exceeded' after 3 manual retries
  instead of resetting — no bypass
- On retry, set retry_count = MAX_RETRIES - 1 (not 0) so the message
  gets exactly one more automatic attempt rather than five
- POST /api/queue/dead/:id/retry returns HTTP 429 when cap is exceeded
- GET /api/queue/dead now includes manualRetriesUsed and
  manualRetriesRemaining in the response

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@greptile-apps
Copy link

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR closes an infinite-retry vulnerability in the dead-letter queue by introducing a MAX_MANUAL_RETRIES = 3 cap enforced both at the data layer (queues.ts) and the HTTP layer (queue.ts), and by resetting retry_count to MAX_RETRIES - 1 on each manual retry so the message gets exactly one additional automatic attempt rather than a full reset to zero.

Key changes:

  • manual_retry_count column added to messages with a backward-compatible ALTER TABLE migration for existing databases
  • retryDeadMessage() returns the new discriminated value 'cap_exceeded' (instead of false) once the cap is reached, allowing the route to distinguish "not found" (404) from "cap hit" (429)
  • GET /api/queue/dead now surfaces manualRetriesUsed and manualRetriesRemaining per message
  • POST /api/queue/dead/:id/retry returns HTTP 429 with a descriptive message when the cap is exceeded

Issue found:

  • MAX_MANUAL_RETRIES is not exported from queues.ts, forcing the route file to hardcode the literal 3 in two places — once in the manualRetriesRemaining computation (line 114) and once inside the 429 error message body using the unusual ${3} template expression (line 124). If the cap is ever changed in queues.ts, the API will silently report stale/incorrect values. The fix is to export the constant and import it in the route file.

Confidence Score: 3/5

  • Safe to merge with the caveat that the hardcoded magic number 3 in the route file will silently diverge if the cap is ever changed — export MAX_MANUAL_RETRIES before shipping to production.
  • The core logic (cap enforcement, migration, single re-attempt strategy) is correct and well-reasoned. The risk is limited to the duplicated literal 3 in the route file; it doesn't cause a bug today, but it is a maintenance hazard that can cause subtle API contract breakage in future changes.
  • packages/server/src/routes/queue.ts — two hardcoded 3 literals at lines 114 and 124 should reference the exported MAX_MANUAL_RETRIES constant instead.

Important Files Changed

Filename Overview
packages/core/src/queues.ts Adds MAX_MANUAL_RETRIES cap, manual_retry_count column with safe migration, and updates retryDeadMessage to enforce the cap and set retry_count=MAX_RETRIES-1 for a single re-attempt. Logic is correct but MAX_MANUAL_RETRIES is not exported, causing the route layer to duplicate the magic number.
packages/server/src/routes/queue.ts Adds HTTP 429 for cap-exceeded retries and exposes manualRetriesUsed/manualRetriesRemaining on the dead-letter listing, but hardcodes the literal 3 in two places instead of importing the MAX_MANUAL_RETRIES constant, creating a silent drift risk if the cap is ever changed.

Sequence Diagram

sequenceDiagram
    participant Agent as Admin Agent
    participant API as POST /api/queue/dead/:id/retry
    participant Core as retryDeadMessage()
    participant DB as SQLite messages

    Agent->>API: POST /api/queue/dead/42/retry
    API->>Core: retryDeadMessage(42)
    Core->>DB: SELECT manual_retry_count WHERE id=42 AND status='dead'
    DB-->>Core: { manual_retry_count: N }

    alt message not found
        Core-->>API: false
        API-->>Agent: 404 dead message not found
    else manual_retry_count >= MAX_MANUAL_RETRIES (3)
        Core-->>API: 'cap_exceeded'
        API-->>Agent: 429 Manual retry limit (3) reached
    else manual_retry_count < 3
        Core->>DB: UPDATE SET status='pending', retry_count=4, manual_retry_count=N+1
        DB-->>Core: changes=1
        Core-->>API: true
        API-->>Agent: 200 { ok: true }
        Note over DB: Message re-enters queue with retry_count=4
        DB->>DB: failMessage → retry_count=5 ≥ MAX_RETRIES → status='dead'
    end
Loading

Last reviewed commit: 85c3aab

failedReason: m.last_error,
attemptsMade: m.retry_count,
manualRetriesUsed: m.manual_retry_count,
manualRetriesRemaining: Math.max(0, 3 - m.manual_retry_count),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic number instead of shared constant

manualRetriesRemaining is computed with the literal 3 rather than a shared constant. Because MAX_MANUAL_RETRIES is defined (but not exported) in queues.ts, this file has no choice but to hardcode the value. If the cap is ever adjusted in queues.ts, the API response will silently report a stale limit, making manualRetriesRemaining wrong for any client that relies on it.

The same problem appears on line 124 where the error message uses ${3} — syntactically a template-literal interpolation that always evaluates to the literal 3.

Fix: Export MAX_MANUAL_RETRIES from queues.ts and import it here:

In packages/core/src/queues.ts:

export const MAX_MANUAL_RETRIES = 3;

In packages/server/src/routes/queue.ts:

import { ..., MAX_MANUAL_RETRIES } from '@tinyclaw/core';

Then replace both occurrences:

Suggested change
manualRetriesRemaining: Math.max(0, 3 - m.manual_retry_count),
manualRetriesRemaining: Math.max(0, MAX_MANUAL_RETRIES - m.manual_retry_count),

if (!ok) return c.json({ error: 'dead message not found' }, 404);
const result = retryDeadMessage(id);
if (result === false) return c.json({ error: 'dead message not found' }, 404);
if (result === 'cap_exceeded') return c.json({ error: `Manual retry limit (${3}) reached for this message. Delete it if you want to discard it.` }, 429);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded literal ${3} in error message

The template literal ${3} embeds a bare numeric literal inside interpolation braces. It evaluates correctly today, but it will diverge from the real cap the moment MAX_MANUAL_RETRIES is changed in queues.ts. Once MAX_MANUAL_RETRIES is exported (see comment on line 114), use it here as well:

Suggested change
if (result === 'cap_exceeded') return c.json({ error: `Manual retry limit (${3}) reached for this message. Delete it if you want to discard it.` }, 429);
if (result === 'cap_exceeded') return c.json({ error: `Manual retry limit (${MAX_MANUAL_RETRIES}) reached for this message. Delete it if you want to discard it.` }, 429);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant