Description
During the GC test (38:sdsnuobj03-prod), we observed that a learner was elected leader during a member replacement, in the following sequence:
- T1: The leader, sm1, flips the outgoing member (out_member) sm4 to a learner.
- T2: The leader completes the addition of a new member, and the new member catches up.
- T3: The leader attempts to remove sm4 from the cluster but fails due to a network issue.
- T4: The SH test script restarts followers sm2 and sm3, prompting sm1 and sm4 to initiate a vote.
- T5: sm4 is elected as the leader, forming a group with sm2, sm3, and sm4.
Although HS includes the NuRaft fix that prevents a learner from being elected leader, the learner sm4 still took part in the leader election and won it, as shown in the logs below.
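For reference, the guard the NuRaft fix is meant to provide can be sketched as two checks: a learner should not start an election, and a voter should not grant its vote to a server that its configuration lists as a learner. This is a minimal Python sketch under a simplified membership model; `Member`, `may_start_election`, `should_grant_vote` and `CONFIG_AFTER_T1` are hypothetical names, not NuRaft's API, and the term and log-freshness checks of a real vote handler are omitted. The server IDs come from the learner's config log at index 52903.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical, simplified entry in a Raft cluster configuration."""
    server_id: int
    learner: bool


def may_start_election(my_id: int, config: dict[int, Member]) -> bool:
    """Only a server listed as a voting member in its own config may campaign."""
    me = config.get(my_id)
    return me is not None and not me.learner


def should_grant_vote(candidate_id: int, config: dict[int, Member]) -> bool:
    """Reject vote requests from learners or from servers not in the config.

    Term and log up-to-date checks of a real vote handler are omitted here.
    """
    candidate = config.get(candidate_id)
    return candidate is not None and not candidate.learner


# Membership as logged by the learner at log idx 52903 (sm4 = 591629310).
CONFIG_AFTER_T1 = {
    m.server_id: m
    for m in (
        Member(1855058611, learner=False),
        Member(2010849527, learner=False),
        Member(591629310, learner=True),
        Member(1675859187, learner=False),
    )
}
```

Both checks consult the node's own view of the configuration, so in this sketch they only take effect if that view already marks sm4 as a learner.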
More logs:
Leader (sm-long-running1-14):
[07/07/25 04:07:29.618009] [I] [62] [handle_join_leave.cxx:536:rm_srv_from_cluster] removed server 591629310 from configuration and save the configuration to log store at 52904 [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:29.618198] [I] [62] [handle_join_leave.cxx:555:rm_srv_from_cluster] set srv_to_leave_, server 591629310 will be removed from cluster, config 52904 [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:29.924649] [I] [64] [handle_timeout.cxx:117:handle_hb_timeout] peer 591629310 is not responding for 1 HBs since leave request [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:31.125037] [I] [65] [handle_timeout.cxx:117:handle_hb_timeout] peer 591629310 is not responding for 2 HBs since leave request [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:32.325414] [I] [65] [handle_timeout.cxx:117:handle_hb_timeout] peer 591629310 is not responding for 3 HBs since leave request [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:33.525693] [I] [65] [handle_timeout.cxx:117:handle_hb_timeout] peer 591629310 is not responding for 4 HBs since leave request [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:34.726059] [I] [65] [handle_timeout.cxx:117:handle_hb_timeout] peer 591629310 is not responding for 5 HBs since leave request [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:34.726077] [E] [65] [handle_timeout.cxx:122:handle_hb_timeout] force remove peer 591629310 [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:34.726085] [I] [65] [handle_join_leave.cxx:565:handle_join_leave_rpc_err] rpc failed for removing server (591629310), will remove this server directly [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
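The leader log above shows the leave request escalating to a forced removal once five consecutive heartbeats go unanswered. A minimal sketch of that counting logic, assuming a simple per-server counter on the leader; `PendingLeave` and `FORCE_REMOVE_AFTER_HBS` are hypothetical names, and the threshold of 5 is read off the log rather than taken from NuRaft's configuration.

```python
from dataclasses import dataclass

# Read off the leader log above (force removal after the 5th missed HB);
# an assumption for this sketch, not a documented NuRaft setting.
FORCE_REMOVE_AFTER_HBS = 5


@dataclass
class PendingLeave:
    """Hypothetical leader-side state for a server that was asked to leave."""
    server_id: int
    missed_hbs: int = 0

    def on_hb_timeout(self, peer_responded: bool) -> bool:
        """Process one heartbeat tick.

        Returns True once the leader should stop waiting and remove the
        server from the configuration directly.
        """
        if peer_responded:
            # The normal leave handshake is not modeled in this sketch.
            return False
        self.missed_hbs += 1
        return self.missed_hbs >= FORCE_REMOVE_AFTER_HBS
```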
Learner (sm-long-running4-14):
[07/07/25 04:04:19.630858] [I] [75] [handle_commit.cxx:970:reconfigure] new configuration: log idx 52903, prev log idx 52901
peer 1855058611, DC ID 0, 605838b9-c1df-4f4e-b31b-496b5ad16ae9, voting member, regular member, 100
peer 2010849527, DC ID 0, b9ba2b61-38f0-4312-a87e-2fe8c1997d88, voting member, regular member, 66
peer 591629310, DC ID 0, 0937248c-efc7-4a02-a89e-6efd3acc67f3, learner, regular member, 66
peer 1675859187, DC ID 0, 73a4b538-a7e7-4a96-885d-17736e012e3f, voting member, regular member, 66
my id: 591629310, leader: 1855058611, term: 4 [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:29.619450] [I] [61] [handle_append_entries.cxx:923:handle_append_entries] receive a config change from leader at 52904 [group=28b69f78-5cbc-4631-ab2f-4c6182667a0b]
[07/07/25 04:07:33.197262] [I] [60] [handle_timeout.cxx:300:handle_election_timeout] [ELECTION TIMEOUT] current role: follower, log last term 4, state term 4, target p 100, my p 100, hb alive, pre-vote NOT done [group=3b79acd2-7802-4d3c-acaa-d70f3d63e542]
...
[07/07/25 04:09:36.793844] [I] [57] [handle_vote.cxx:264:request_vote] [VOTE INIT] my id 591629310, my role candidate, term 6, log idx 52809, log term 5, priority (target 1 / mine 100) [group=78a581d9-3003-4fc3-9bd8-25fd3d18e47c]
[07/07/25 04:09:36.795399] [I] [57] [handle_vote.cxx:415:handle_vote_resp] [VOTE RESP] peer 2010849527 (O), resp term 6, my role candidate, granted 2, responded 2, num voting members 3, quorum 2 [group=78a581d9-3003-4fc3-9bd8-25fd3d18e47c]
[07/07/25 04:09:36.795411] [I] [57] [handle_vote.cxx:424:handle_vote_resp] Server is elected as leader for term 6 [group=78a581d9-3003-4fc3-9bd8-25fd3d18e47c]
[07/07/25 04:09:36.795424] [I] [57] [raft_server.cxx:1030:become_leader] number of pending commit elements: 0 [group=78a581d9-3003-4fc3-9bd8-25fd3d18e47c]
[07/07/25 04:09:36.795434] [I] [57] [raft_server.cxx:1043:become_leader] state machine commit index 52809, precommit index 52809, last log index 52809 [group=78a581d9-3003-4fc3-9bd8-25fd3d18e47c]
[07/07/25 04:09:36.795629] [I] [57] [raft_server.cxx:1092:become_leader] [BECOME LEADER] appended new config at 52810 [group=78a581d9-3003-4fc3-9bd8-25fd3d18e47c]
[07/07/25 04:09:36.796029] [I] [57] [handle_vote.cxx:427:handle_vote_resp] === LEADER (term 6) === [group=78a581d9-3003-4fc3-9bd8-25fd3d18e47c]
[07/07/25 04:09:36.797356] [I] [81] [handle_commit.cxx:414:commit_conf] config at index 52810 is committed, prev config log idx 52809 [group=78a581d9-3003-4fc3-9bd
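The vote-response line reports `num voting members 3, quorum 2`, which is consistent with a plain majority over the voting members with learners excluded; the four-member config logged by the learner at 04:04:19 (three voting members plus one learner) yields the same count. A minimal sketch of that arithmetic, assuming a plain majority rule; the function names are hypothetical and do not come from NuRaft.

```python
from __future__ import annotations


def count_voting_members(members: dict[int, bool]) -> int:
    """members maps server id -> is_learner; learners do not vote."""
    return sum(1 for is_learner in members.values() if not is_learner)


def quorum_size(num_voting_members: int) -> int:
    """Plain majority of the voting members."""
    return num_voting_members // 2 + 1


def is_elected(votes_granted: int, num_voting_members: int) -> bool:
    """A candidate wins once its granted votes reach the quorum."""
    return votes_granted >= quorum_size(num_voting_members)
```

With three voting members, two granted votes are enough, matching the `granted 2 ... quorum 2` line at which sm4 declared itself leader for term 6.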