Description
Proposed change
Add reconciliation logic for R1 (single replica) consumers when the stream's stored data has reverted due to data loss (e.g., VM crash, corrupted index.db, truncated .blk files). This prevents consumers from being in an inconsistent state where their sequence numbers are ahead of the stream.
The attached zip file contains stream and consumer data in which the stream has no block files and the index file is empty, so the stream sequences are reset to 0, 0. The consumer, however, still retains its state data, with its sequence at 1844.
When the server is restarted, NATS recovers with the correct stream sequences, but since the consumer sequence is ahead of the stream sequence, messages continue to accumulate instead of being consumed.
Use case
Consumer state should always be consistent with the actual stream state.
When a stream loses data (VM crash, corrupted storage, truncated files), the consumer's stored sequence numbers can become ahead of the stream's actual last sequence. This causes the consumer to become stuck - it won't deliver any messages until the stream accumulates enough new messages to "catch up" to the consumer's sequence.
Example
- Stream has 1000 messages (lseq=1000)
- Consumer has delivered and acked all 1000 (sseq=1001, asflr=1000)
- VM crash occurs - stream data is lost/corrupted
- On recovery: the stream has lseq=0 (or any value lower than the consumer's sequence), but the consumer still has sseq=1001. The consumer is now stuck: it is waiting for message 1001, but the stream is empty. Even if you publish 100 new messages (lseq=100), the consumer won't see them because it is still waiting for seq 1001. You would need to publish 1001 messages before the consumer starts working again.
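The arithmetic in the example above can be checked with a small sketch. It assumes the stream assigns sequences contiguously starting at lseq+1 after recovery; `publishesUntilDeliverable` is a hypothetical helper for illustration, not a nats-server function.

```go
package main

import "fmt"

// publishesUntilDeliverable (hypothetical helper) returns how many more
// messages must be published before a consumer waiting on stream sequence
// sseq has something to deliver, given the stream's last sequence lseq.
// Assumption: new messages get sequences lseq+1, lseq+2, ... with no gaps.
func publishesUntilDeliverable(lseq, sseq uint64) uint64 {
	if sseq <= lseq {
		// The message the consumer is waiting for already exists.
		return 0
	}
	return sseq - lseq
}

func main() {
	fmt.Println(publishesUntilDeliverable(1000, 1001)) // healthy: 1
	fmt.Println(publishesUntilDeliverable(0, 1001))    // after data loss: 1001
	fmt.Println(publishesUntilDeliverable(100, 1001))  // after 100 new publishes: 901
}
```

In the healthy case the consumer resumes on the very next publish; after the data loss it stays idle for the next 1001 publishes.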
Consumers should always reconcile themselves with the stream state so that their sequences are valid. The stream's sequences should be the source of truth for consumers as well.
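A minimal sketch of what such a reconciliation check could look like on recovery. The `ConsumerState` and `StreamState` types and the `reconcile` function are hypothetical simplifications for illustration, not the actual nats-server structures, and clamping to the stream's last sequence is only one possible policy.

```go
package main

import "fmt"

// ConsumerState is a hypothetical, simplified view of a consumer's stored state.
type ConsumerState struct {
	StreamSeq uint64 // sseq: next stream sequence the consumer expects to deliver
	AckFloor  uint64 // asflr: stream sequence at or below which everything is acked
}

// StreamState is a hypothetical, simplified view of the recovered stream state.
type StreamState struct {
	FirstSeq uint64
	LastSeq  uint64 // lseq
}

// reconcile clamps consumer sequences that are ahead of the stream.
// It returns the adjusted state and whether anything changed.
// Assumption: the stream is the source of truth, as proposed in this issue.
func reconcile(c ConsumerState, s StreamState) (ConsumerState, bool) {
	changed := false
	if c.StreamSeq > s.LastSeq+1 {
		// The consumer is waiting for a sequence the stream will not
		// produce next; point it at the next sequence the stream will assign.
		c.StreamSeq = s.LastSeq + 1
		changed = true
	}
	if c.AckFloor > s.LastSeq {
		// The ack floor cannot exceed what the stream actually holds.
		c.AckFloor = s.LastSeq
		changed = true
	}
	return c, changed
}

func main() {
	stale := ConsumerState{StreamSeq: 1001, AckFloor: 1000}
	fixed, changed := reconcile(stale, StreamState{FirstSeq: 0, LastSeq: 0})
	fmt.Printf("%+v %v\n", fixed, changed) // {StreamSeq:1 AckFloor:0} true
}
```

With the example values, a consumer at sseq=1001 against an empty recovered stream is moved back to sseq=1, so the next published message is deliverable. A healthy consumer (sseq=1001 against lseq=1000) is left untouched.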
PS: Sync is default
Contribution
Yes.