Replies: 2 comments 1 reply
-
Unlike 3.3, modern RabbitMQ versions use a pair of message stores per virtual host. If you have many virtual hosts, their message stores will take longer to shut down.
-
Enabling debug logging might help you see more of the shutdown steps, e.g. for individual message stores.
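Once the log is more detailed, it can help to locate exactly where the node goes quiet. Below is a minimal sketch of one way to do that, not an official RabbitMQ tool. It assumes every log entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp, like the ones quoted in the question. The name `find_gaps`, the 60-second threshold, and the placeholder lines are illustrative assumptions.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous, current) timestamp pairs spaced further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        try:
            # Assumption: the first 19 characters are "YYYY-MM-DD HH:MM:SS".
            current = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            # Continuation lines or lines in another format are skipped.
            continue
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Placeholder lines: only the two timestamps come from the question,
# the rest of each line is a stand-in for the real log text.
sample = [
    "2022-07-07 12:17:40 <placeholder>",
    "2022-07-07 12:27:40 <placeholder>",
]
for before, after in find_gaps(sample):
    print(f"silent from {before} to {after} ({after - before})")
```

To run it against a real file, pass the open file object instead of `sample`, e.g. `find_gaps(open(path))`, where `path` is wherever your node writes its log. The last entries before each gap are the shutdown steps worth looking at first.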
-
Hi,
We are currently using RabbitMQ version 3.9.20 with Erlang 23.3.4.10 on CentOS 7.
We have a cluster of 3 brokers with (classic) mirroring enabled using: "ha-mode":"all","ha-sync-mode":"automatic". All queues are declared using: durable=true; exclusive=false; auto-delete=true.
When the system is up and running we have an average of 200 client connections and 111 queues and all queues are mirrored to all brokers (messages published: 15/s; received 99/s).
When we shut down one broker using "systemctl stop rabbitmq-server", we notice that the broker starts going down and all its client connections are immediately closed. The clients connect to the next available broker, but the other two brokers do not respond to subsequent AMQP requests, i.e. Queue Declare calls from the clients to these brokers get no response.
This state lasts about 3 minutes. Using 'netstat' we still see established connections between the broker process (beam.smp) that is shutting down and the other 2 brokers. After 3 minutes a systemd timeout finally kills the broker process (beam.smp), and only then do the other 2 brokers notice that the broker is down ("node rabbit@xxxx down: connection_closed" is printed in their logs).
At the same time the clients finally get an OK response to their Queue Declare calls, and message processing continues.
When we increased the systemd timeout to 15 minutes, the broker eventually shut down by itself after 10 minutes.
In the RabbitMQ log file we see this (note the time gap between 2022-07-07 12:17:40 and 2022-07-07 12:27:40):
Can anybody explain why the shutdown of the broker takes so long that in the end the OS (systemd) just decides to kill the process?
When we were still using RabbitMQ 3.3.5 we didn't seem to have this problem, so can this have been introduced by a later version of RabbitMQ?
Any suggestions on how we can analyse this problem further?
Thanks!