Slow Import of vHosts/RabbitMQ Definitions in RabbitMQ Cluster (Kubernetes & VMs) #5672
-
Hi, I am seeing slow import of vHosts/RabbitMQ definitions in a 3-node RabbitMQ cluster, both on Kubernetes and on VMs.

In our test environment, I created a 3-node RabbitMQ cluster on Kubernetes running RabbitMQ 3.10.5. Importing the definitions (~1000 vHosts) takes more than half an hour. The initial vHosts load quickly, e.g. the first 100 are imported very fast, but as the import proceeds from 100 to 1000 it gets progressively slower. I tried importing definitions via the RabbitMQ Management UI, via the HTTP API in a single-threaded for loop from a bash script (sketched below), and by going inside the pod and loading definitions with rabbitmqctl. The Management UI also slows down while the import is running (especially the vHosts page). On a single-node RabbitMQ server (no clustering), the same import takes ~10 minutes.

Is there any RabbitMQ configuration parameter that affects the bulk vHost definition import process that we can tune?

In our production environment we have 1000+ vHosts on AWS EC2 VMs (a 3-node RabbitMQ cluster) and we see the same issue there: large definition imports take a long time and slow down progressively. We use vHosts to separate tenant communications. Can anyone help?
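For reference, here is a minimal sketch of the single-threaded loop I mentioned. The host, port, credentials, and `tenant-N` vhost naming are placeholders, not our actual setup:

```bash
#!/usr/bin/env bash
# Minimal sketch of the single-threaded import loop (placeholders throughout:
# localhost:15672, guest credentials, and tenant-N names are hypothetical).
for i in $(seq 1 1000); do
  # PUT /api/vhosts/{name} creates a vhost via the management HTTP API
  curl -fsS -u guest:guest -X PUT "http://localhost:15672/api/vhosts/tenant-${i}"
done

# Alternative tried inside the pod: import a full definitions file at once
rabbitmqctl import_definitions /tmp/definitions.json
```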
-
Starting with 3.7.0 or so, every virtual host is a separate tree of processes plus two message stores. This means a failure of a single message store does not affect other virtual hosts, but it also means that starting hundreds or thousands of virtual hosts takes much longer. In a cluster, a virtual host is only considered ready after all nodes report that they have started the entire tree of dependent processes.

I do not expect this virtual host design to change significantly even in 4.0. So the right thing to do is to use fewer virtual hosts and/or more focused clusters (built for a specific purpose or set of applications) instead of sharing a single cluster for everything. One way to carve a per-tenant definitions file out of a full export is sketched below.
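A hedged sketch of that split using jq; the file names and vhost name are hypothetical, and it assumes a complete definitions export where all of these top-level keys are present:

```bash
# Hypothetical helper: extract one tenant's objects from a full definitions
# export so it can be imported into a smaller, dedicated cluster.
# Assumes a complete export containing all of these top-level keys.
VHOST="tenant-42"   # placeholder vhost name
jq --arg v "$VHOST" '
  .vhosts      |= map(select(.name  == $v)) |
  .permissions |= map(select(.vhost == $v)) |
  .queues      |= map(select(.vhost == $v)) |
  .exchanges   |= map(select(.vhost == $v)) |
  .bindings    |= map(select(.vhost == $v)) |
  .policies    |= map(select(.vhost == $v))
' all-definitions.json > "definitions-${VHOST}.json"
```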
-
I've looked into this a few months back when it was reported on the mailing list. I believe the problem is not really about the processes being started (that would affect Khepri-enabled deployments as well, yet importing 1000 vhosts is much faster with Khepri). I believe the culprit is Mnesia locking around exchange declarations: each vhost has 7 default exchanges, so 1000 vhosts means 7000 exchanges to declare. Could that code path be optimised? Almost certainly yes (PRs welcome), but we are focused on migrating from Mnesia to Khepri to avoid this whole class of issues going forward; a sketch of opting into Khepri on versions that support it follows below.

Having said that, as Michael said, having hundreds of tenants or more in a single RabbitMQ cluster is probably not a good idea for other reasons.
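For completeness, a sketch only: Khepri is an opt-in metadata store starting with RabbitMQ 3.13 (behind the `khepri_db` feature flag), so this does not apply to the 3.10.x cluster discussed here:

```bash
# Check which feature flags are available and their state
rabbitmqctl list_feature_flags
# On 3.13+, migrate the metadata store from Mnesia to Khepri
rabbitmqctl enable_feature_flag khepri_db
```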