Description
In a 5-replica replace member process, the replace member operation is allowed to start even when there are only 2 active peers. However, once an active member is taken out and flipped to a learner, it no longer participates in elections, so the commit quorum requirement can no longer be met. Consequently, subsequent logs (e.g., set_priority) cannot be committed, and the replace member operation is canceled on timeout, returning an INVALID_ARG error.
Consider the following example:
- Five replicas, M1 through M5, exist. Replicas M1, M2, and M3 have a committed LSN of 1000. Replicas M4 and M5 have a committed LSN of 100. M1 is the leader.
- M2 is selected for removal and set as a learner with LSN 1001. After the flip is complete, M1, M2, and M3 have LSN 1001, while M4 and M5 have LSN 100.
- M2's priority is set to 0 with LSN 1002, but this change cannot be committed due to an insufficient quorum of only two active peers. (Only M1 and M3 are active. M4 and M5 are behind, and M2 is a learner.)
- Learner status checks fail as the learner's priority remains non-zero during the timeout, causing the member replacement operation to be cancelled.
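The quorum arithmetic behind this example can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names `majority` and `can_commit` are hypothetical, and it assumes the commit quorum is a simple majority of voting members, with learners excluded from the voter set.

```python
def majority(n: int) -> int:
    """Smallest number of acknowledgements that forms a majority of n voters."""
    return n // 2 + 1


def can_commit(voters, caught_up) -> bool:
    """Assumption: a log entry commits once a majority of voters have it.

    Learners are assumed not to be in `voters`, so their acks do not count.
    """
    acks = len(set(voters) & set(caught_up))
    return acks >= majority(len(voters))


# Replicas whose logs are up to date in the example (M4 and M5 lag at LSN 100).
caught_up = ["M1", "M2", "M3"]

# Before the flip: M2 still votes, so 3 of 5 voters can acknowledge LSN 1001.
voters_before = ["M1", "M2", "M3", "M4", "M5"]
before = can_commit(voters_before, caught_up)

# After the flip: M2 is a learner, leaving only M1 and M3 as caught-up voters.
voters_after = ["M1", "M3", "M4", "M5"]
after = can_commit(voters_after, caught_up)

print(before, after)
```

Under these assumptions the flip itself (LSN 1001) can commit, but the following set_priority entry (LSN 1002) cannot, because only 2 of the 3 required acknowledgements are available. A pre-check along these lines, run before the flip, could reject the operation up front instead of letting it time out.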
More details in issue51