Add circuit breaker for database connectivity failures during sync #397

Open
joschiv1977 wants to merge 1 commit into s1t5:main from joschiv1977:fix/circuit-breaker-db-connectivity

Summary

When PostgreSQL becomes unreachable during email sync (e.g. container crash, OOM kill, DNS failure), the sync loop currently keeps processing every single email and logs a DB error for each one without ever stopping. This causes massive log spam, wasted Graph API / IMAP calls, and high memory/CPU usage for no benefit.

This PR adds a circuit breaker pattern to both GraphEmailService and ImapEmailService:

  • Message-level circuit breaker: After 5 consecutive DB connectivity errors within a folder, abort the folder sync immediately
  • Folder-level circuit breaker: After 2 consecutive folders fail due to DB issues, check database health. Abort entire account sync if DB is unreachable
  • Pre-pagination health check: Before fetching the next page from Graph API, verify DB connectivity to avoid wasting API quota
  • IsDbConnectivityError() helper: Distinguishes DB connectivity errors (SocketException, DNS failures, transient failures) from application-level errors (UTF-8 issues, parse errors, etc.)
  • IsDatabaseReachableAsync() helper: Lightweight connectivity check via CanConnectAsync()

Non-DB errors are not affected by the circuit breaker and continue to be handled as before.
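
To make the message-level behaviour concrete, here is a minimal, language-neutral sketch in Python. It is not the code in this PR (the services are presumably C#). The names `is_db_connectivity_error`, `sync_folder`, `save_message` and `DbUnavailableError` are hypothetical stand-ins, the exception types checked are only an approximation of what `IsDbConnectivityError()` looks for, and the choice to reset the counter only on a successful save is an assumption.

```python
import socket

MAX_CONSECUTIVE_DB_ERRORS = 5  # threshold stated in the PR description


class DbUnavailableError(Exception):
    """Hypothetical signal that the message-level breaker has tripped."""


def is_db_connectivity_error(exc):
    """Rough stand-in for IsDbConnectivityError(): walk the exception
    chain looking for socket, DNS, or timeout failures."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if isinstance(exc, (socket.gaierror, ConnectionError, TimeoutError)):
            return True
        exc = exc.__cause__ or exc.__context__
    return False


def sync_folder(messages, save_message, threshold=MAX_CONSECUTIVE_DB_ERRORS):
    """Save each message; abort the folder once `threshold` connectivity
    errors occur in a row. Returns the number of messages saved."""
    consecutive_db_errors = 0
    saved = 0
    for message in messages:
        try:
            save_message(message)
        except Exception as exc:
            if not is_db_connectivity_error(exc):
                # Application-level error (parse, encoding, ...): skip this
                # message as before; it does not count toward the breaker.
                continue
            consecutive_db_errors += 1
            if consecutive_db_errors >= threshold:
                raise DbUnavailableError(
                    f"aborting folder after {consecutive_db_errors} "
                    "consecutive DB connectivity errors"
                ) from exc
        else:
            saved += 1
            # Assumption: only a successful save resets the counter.
            consecutive_db_errors = 0
    return saved
```

With a `save_message` that always raises `ConnectionError`, this sketch stops after the fifth message instead of attempting all of them, while a `save_message` that always raises `ValueError` is attempted for every message and never trips the breaker.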

Context

This was discovered when the PostgreSQL container died during a large sync operation (800+ emails across multiple folders). The mail archiver continued running, spamming thousands of identical "Name or service not known" errors in the logs — one for every email it tried to process — while also continuing to fetch pages from the Graph API (wasting API calls). The container had to be manually stopped.

Related issues: #382, #388, #363

Test plan

  • Verify normal sync still works without DB issues (circuit breaker should not trigger)
  • Simulate DB outage during sync (stop postgres container) — sync should abort after 5 consecutive DB errors per folder
  • Verify non-DB errors (e.g. UTF-8 encoding) don't trigger the circuit breaker
  • Verify folder-level circuit breaker aborts account sync after 2 folder failures
  • Check that sync resumes normally after DB comes back
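
The folder-level items in this plan can be exercised without a real database by injecting fake callbacks. The following Python sketch is an illustration only, not the PR's code: `sync_account`, its callback parameters, and the convention that a folder sync returns `False` when it was aborted by DB errors are all hypothetical, and resetting the failure counter after a successful health check is an assumption.

```python
FOLDER_FAILURE_THRESHOLD = 2  # threshold stated in the PR description


def sync_account(folders, sync_folder, is_db_reachable,
                 threshold=FOLDER_FAILURE_THRESHOLD):
    """Sync folders in order. After `threshold` consecutive folders fail
    because of DB errors, probe the database and abort if it is down.
    Returns (synced_folders, aborted)."""
    synced = []
    consecutive_failures = 0
    for folder in folders:
        if sync_folder(folder):
            synced.append(folder)
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= threshold:
            if not is_db_reachable():
                return synced, True  # abort the whole account sync
            # Assumption: a passing health check resets the counter.
            consecutive_failures = 0
    return synced, False


folders = ["Inbox", "Sent", "Archive", "Drafts"]

# Scenario 1: the database stays down, so the sync aborts after two folders.
attempted = []

def folder_fails(folder):
    attempted.append(folder)
    return False

outage_result = sync_account(folders, folder_fails, lambda: False)

# Scenario 2: the database is back by the time the health check runs.
recovered_result = sync_account(
    folders,
    lambda folder: folder not in ("Inbox", "Sent"),
    lambda: True,
)
```

In the first scenario only `Inbox` and `Sent` are attempted before the account sync is abandoned. In the second, the health check passes after the two failed folders and the remaining folders sync normally.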

🤖 Generated with Claude Code

When PostgreSQL becomes unreachable during email sync (e.g. container crash,
OOM kill, DNS failure), the sync loop previously continued processing every
single email, logging a DB error for each one without ever stopping. This
caused massive log spam and wasted API calls to Graph/IMAP servers.

This adds a circuit breaker pattern to both GraphEmailService and
ImapEmailService:

- Message-level: After 5 consecutive DB connectivity errors, abort the
  current folder sync immediately
- Folder-level: After 2 consecutive folder failures due to DB issues,
  check database health before continuing. Abort entire account sync
  if DB is unreachable.
- Pre-pagination health check: Before fetching the next page from
  Graph API, verify DB is still reachable to avoid wasting API calls
- New IsDbConnectivityError() helper distinguishes DB connectivity
  errors (SocketException, DNS failures, transient failures) from
  application-level errors
- New IsDatabaseReachableAsync() performs lightweight connectivity
  check via CanConnectAsync()

Non-DB errors (e.g. UTF-8 encoding issues, parse errors) do not trigger
the circuit breaker and are handled as before.

Relates to s1t5#382, s1t5#388, s1t5#363

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>