[Questions] Re-adding a wiped node to a cluster with Khepri vs. Mnesia #14283
RabbitMQ version used: 4.1.2
Erlang version used: 26.2.x
Operating system (distribution) used: Linux
How is RabbitMQ deployed? Generic binary package
rabbitmq-diagnostics status output: (not provided)
Logs from node 1 (with sensitive values edited out):
2025-07-25 15:47:24.064248+08:00 [notice] <0.1261.0> RabbitMQ metadata store: candidate -> leader in term: 1 machine version: 1, last applied 0
Logs from node 2 (if applicable, with sensitive values edited out): (not provided)
Logs from node 3 (if applicable, with sensitive values edited out):
2025-07-25 15:47:24.064248+08:00 [notice] <0.1261.0> RabbitMQ metadata store: candidate -> leader in term: 1 machine version: 1, last applied 0
2025-07-25 15:47:24.064723+08:00 [info] <0.248.0> DB: joining cluster using remote nodes:
2025-07-25 15:47:24.064723+08:00 [info] <0.248.0> ['rabbit@rabbitmqservice-0','rabbit@rabbitmqservice-2']
2025-07-25 15:47:24.067536+08:00 [info] <0.248.0> Khepri clustering: Attempt to add node 'rabbit@rabbitmqservice-1' to cluster ['rabbit@rabbitmqservice-0','rabbit@rabbitmqservice-2'] through node 'rabbit@rabbitmqservice-0'
2025-07-25 15:47:24.067956+08:00 [info] <0.248.0> ra: starting system coordination
2025-07-25 15:47:24.068040+08:00 [info] <0.248.0> starting Ra system: coordination in directory: /usr/local/rabbitmq_server-4.1.2/var/lib/rabbitmq/mnesia/rabbit@rabbitmqservice-1/coordination/rabbit@rabbitmqservice-1
2025-07-25 15:47:54.072405+08:00 [info] <0.248.0> Khepri clustering: Failed to add node 'rabbit@rabbitmqservice-1' to cluster "rabbitmq_metadata" through 'rabbit@rabbitmqservice-0': {error,
2025-07-25 15:47:54.072405+08:00 [info] <0.248.0> timeout}
rabbitmq.conf: (not provided)
Steps to deploy RabbitMQ cluster: 3 RabbitMQ nodes
Steps to reproduce the behavior in question: lost one node's data and restarted it
advanced.config: (not provided)
What problem are you trying to solve? I understand that this might be a characteristic of Khepri, but I want to know how to rejoin a node to the cluster after it has lost its data.
Replies: 1 comment 1 reply
@dormanze the right thing to do has always been to remove from the cluster the node that no longer has any prior data.
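A minimal sketch of that remove-then-rejoin sequence, written in Python only to list the `rabbitmqctl` calls in order. It is adapted from the node-removal procedure in the RabbitMQ clustering guide rather than taken from this thread. The helper names `rejoin_steps` and `run` are hypothetical, the node names come from the log excerpt above, and it assumes `rabbitmqctl` is on `PATH`, shares the Erlang cookie with the cluster, and that the wiped node's runtime is up so `stop_app` can reach it. Check each command against the CLI reference for your version before running it for real.

```python
import shlex
import subprocess

# Hypothetical node names, taken from the log excerpt in the question.
WIPED_NODE = "rabbit@rabbitmqservice-1"
SURVIVOR = "rabbit@rabbitmqservice-0"


def rejoin_steps(wiped: str, survivor: str) -> list[list[str]]:
    """Return, in order, the rabbitmqctl invocations that remove a node
    which lost its data and then add it back as a fresh member."""
    return [
        # 1. Stop the RabbitMQ application on the wiped node.
        ["rabbitmqctl", "-n", wiped, "stop_app"],
        # 2. From a surviving member, remove the stale membership entry.
        ["rabbitmqctl", "-n", survivor, "forget_cluster_node", wiped],
        # 3. Reset the wiped node, discarding the blank single-node
        #    metadata store it created when it restarted.
        ["rabbitmqctl", "-n", wiped, "reset"],
        # 4. Join the remaining cluster through a surviving member.
        ["rabbitmqctl", "-n", wiped, "join_cluster", survivor],
        # 5. Start the application again.
        ["rabbitmqctl", "-n", wiped, "start_app"],
    ]


def run(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first command that exits non-zero.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(rejoin_steps(WIPED_NODE, SURVIVOR))
```

By default `run` only prints the commands; passing `dry_run=False` executes them and stops at the first failure, so a step such as `forget_cluster_node` is never followed by `join_cluster` if it did not succeed.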