[Questions] Re-adding a wiped node to a cluster with Khepri vs. Mnesia #14283

dormanze · 2025-07-25T09:18:51Z

Community Support Policy

I have read RabbitMQ's Community Support Policy
I run RabbitMQ 4.x, the only series currently covered by community support
I promise to provide all relevant information (versions, logs from all nodes, rabbitmq-diagnostics output, detailed reproduction steps)

RabbitMQ version used

4.1.2

Erlang version used

26.2.x

Operating system (distribution) used

Linux

How is RabbitMQ deployed?

Generic binary package

rabbitmq-diagnostics status output

See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
nothing

Logs from node 1 (with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

2025-07-25 15:47:24.064248+08:00 [notice] <0.1261.0> RabbitMQ metadata store: candidate -> leader in term: 1 machine version: 1, last applied 0
2025-07-25 15:47:24.064723+08:00 [info] <0.248.0> DB: joining cluster using remote nodes:
2025-07-25 15:47:24.064723+08:00 [info] <0.248.0> ['rabbit@rabbitmqservice-0','rabbit@rabbitmqservice-2']
2025-07-25 15:47:24.067536+08:00 [info] <0.248.0> Khepri clustering: Attempt to add node 'rabbit@rabbitmqservice-1' to cluster ['rabbit@rabbitmqservice-0','rabbit@rabbitmqservice-2'] through node 'rabbit@rabbitmqservice-0'
2025-07-25 15:47:24.067956+08:00 [info] <0.248.0> ra: starting system coordination
2025-07-25 15:47:24.068040+08:00 [info] <0.248.0> starting Ra system: coordination in directory: /usr/local/rabbitmq_server-4.1.2/var/lib/rabbitmq/mnesia/rabbit@rabbitmqservice-1/coordination/rabbit@rabbitmqservice-1
2025-07-25 15:47:54.072405+08:00 [info] <0.248.0> Khepri clustering: Failed to add node 'rabbit@rabbitmqservice-1' to cluster "rabbitmq_metadata" through 'rabbit@rabbitmqservice-0': {error,
2025-07-25 15:47:54.072405+08:00 [info] <0.248.0> timeout}

Logs from node 3 (if applicable, with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

2025-07-25 15:47:24.064248+08:00 [notice] <0.1261.0> RabbitMQ metadata store: candidate -> leader in term: 1 machine version: 1, last applied 0
2025-07-25 15:47:24.064723+08:00 [info] <0.248.0> DB: joining cluster using remote nodes:
2025-07-25 15:47:24.064723+08:00 [info] <0.248.0> ['rabbit@rabbitmqservice-0','rabbit@rabbitmqservice-2']
2025-07-25 15:47:24.067536+08:00 [info] <0.248.0> Khepri clustering: Attempt to add node 'rabbit@rabbitmqservice-1' to cluster ['rabbit@rabbitmqservice-0','rabbit@rabbitmqservice-2'] through node 'rabbit@rabbitmqservice-0'
2025-07-25 15:47:24.067956+08:00 [info] <0.248.0> ra: starting system coordination
2025-07-25 15:47:24.068040+08:00 [info] <0.248.0> starting Ra system: coordination in directory: /usr/local/rabbitmq_server-4.1.2/var/lib/rabbitmq/mnesia/rabbit@rabbitmqservice-1/coordination/rabbit@rabbitmqservice-1
2025-07-25 15:47:54.072405+08:00 [info] <0.248.0> Khepri clustering: Failed to add node 'rabbit@rabbitmqservice-1' to cluster "rabbitmq_metadata" through 'rabbit@rabbitmqservice-0': {error,
2025-07-25 15:47:54.072405+08:00 [info] <0.248.0> timeout}

Steps to deploy RabbitMQ cluster

A 3-node RabbitMQ cluster.

Steps to reproduce the behavior in question

One node lost all of its data and was then restarted.

What problem are you trying to solve?

I understand that this might be a characteristic of Khepri, but I want to know how a node can rejoin the cluster after losing its data.
Before switching to Khepri, when using Mnesia, I was able to start the server even if one node had lost its data, and then join it to the cluster with rabbitmqctl.

Answered by michaelklishin

Jul 25, 2025

michaelklishin (Maintainer) · 2025-07-25T15:01:58Z

@dormanze the right thing to do has always been to remove the node that no longer has any prior data.
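
For readers who want to see what "remove the node" could look like in practice, here is a minimal sketch and not an official procedure. It assumes the node names from the logs above, that `rabbitmqctl` can reach both nodes with the shared Erlang cookie, and that the wiped node is running when the join steps are issued. The `run` wrapper and the `SURVIVOR`, `WIPED`, and `DRY_RUN` variables are hypothetical helpers added for illustration. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only: node names are taken from the logs in this thread;
# SURVIVOR, WIPED, DRY_RUN and run() are illustrative, not part of RabbitMQ.
SURVIVOR="${SURVIVOR:-rabbit@rabbitmqservice-0}"   # a healthy cluster member
WIPED="${WIPED:-rabbit@rabbitmqservice-1}"         # the node that lost its data
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. From a surviving member, remove the stale membership of the wiped node.
run rabbitmqctl -n "$SURVIVOR" forget_cluster_node "$WIPED"

# 2. On the wiped node, join the cluster again as a brand new member.
#    (Assumption: the node is running; with peer discovery it may rejoin
#    on its own after a restart, in which case these steps are unnecessary.)
run rabbitmqctl -n "$WIPED" stop_app
run rabbitmqctl -n "$WIPED" reset
run rabbitmqctl -n "$WIPED" join_cluster "$SURVIVOR"
run rabbitmqctl -n "$WIPED" start_app

# 3. Check the resulting membership from the surviving node.
run rabbitmqctl -n "$SURVIVOR" cluster_status
```

Running the script as is prints the planned commands; setting `DRY_RUN=0` executes them. Review each step against the clustering guide for your RabbitMQ version before running it on a production cluster.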

dormanze Jul 28, 2025
Author

Thank you for your response. It seems that for this scenario, we can only discard the old cluster data.
