How do you deploy RabbitMQ on Kubernetes?
- Deploying RabbitMQ as a StatefulSet with proper pod naming and stable network identities
- Managing PersistentVolumeClaims (PVCs) for each cluster node's data storage
- Handling cluster membership, node discovery, and the Erlang cookie
- Providing a RabbitmqCluster custom resource for declarative cluster configuration
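To make the declarative approach concrete, here is a minimal sketch of a RabbitmqCluster manifest, wrapped in a shell function so it can be piped to kubectl. It assumes the RabbitMQ Cluster Operator and its rabbitmq.com/v1beta1 CRD are already installed. The function name, the example cluster name and namespace, and the 20Gi volume size are illustrative placeholders rather than values from this article.

```shell
# Print a minimal RabbitmqCluster manifest to stdout.
# $1 = cluster name, $2 = namespace, $3 = replica count (defaults to 3).
# The 20Gi storage size is an illustrative assumption; size it for your workload.
render_rabbitmq_cluster() {
  cat <<EOF
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: ${1}
  namespace: ${2}
spec:
  replicas: ${3:-3}
  persistence:
    storage: 20Gi
EOF
}

# Usage (hypothetical names):
#   render_rabbitmq_cluster my-rabbit messaging 3 | kubectl apply -f -
```

The operator then creates the StatefulSet, one PVC per node, and the supporting Services and Secrets from this single resource.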
What is the termination grace period and why does it matter?
"What the pre-stop hook does is it checks to make sure that none of the queues are in a quorum-critical status before it allows the pod to exit. It's protecting quorum. But when you go from three to zero replicas, it creates a deadlock — because taking one pod down makes quorum critical, which prevents that pod from terminating, which prevents the cluster from coming down."
— Scott Sternloff, AceMQ Principal Architect, Adeptia engagement session, April 2026
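Because the grace period bounds how long the pre-stop hook can hold a pod open, it helps to know its current value before changing anything. The helpers below are a small sketch for reading and setting spec.terminationGracePeriodSeconds on a RabbitmqCluster; the function names are hypothetical, and an empty result from the read should be taken to mean the field is not set explicitly on the resource.

```shell
# Read the termination grace period set on a RabbitmqCluster.
# $1 = cluster name, $2 = namespace.
get_grace_period() {
  kubectl get rabbitmqcluster "$1" -n "$2" \
    -o jsonpath='{.spec.terminationGracePeriodSeconds}'
}

# Build the JSON merge patch for a given number of seconds.
# $1 = grace period in seconds.
grace_period_patch() {
  printf '{"spec": {"terminationGracePeriodSeconds": %d}}' "$1"
}

# Apply a new grace period to the cluster.
# $1 = cluster name, $2 = namespace, $3 = grace period in seconds.
set_grace_period() {
  kubectl patch rabbitmqcluster "$1" -n "$2" \
    --type merge \
    -p "$(grace_period_patch "$3")"
}

# Usage (hypothetical names):
#   get_grace_period my-rabbit messaging
#   set_grace_period my-rabbit messaging 30
```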
How do you safely scale a RabbitMQ cluster to zero on Kubernetes?
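# Step 1: shorten the termination grace period so a pod held by the pre-stop hook is killed after 30 seconds
kubectl patch rabbitmqcluster <cluster-name> -n <namespace> \
  --type merge \
  -p '{"spec": {"terminationGracePeriodSeconds": 30}}'

# Step 2: scale the cluster to zero replicas
kubectl patch rabbitmqcluster <cluster-name> -n <namespace> \
  --type merge \
  -p '{"spec": {"replicas": 0}}'

# Step 3: when bringing the cluster back, restore the replica count and the one-week (604800 s) grace period
kubectl patch rabbitmqcluster <cluster-name> -n <namespace> \
  --type merge \
  -p '{"spec": {"replicas": 3, "terminationGracePeriodSeconds": 604800}}'

What happens when a Kubernetes node gets drained during a cluster upgrade?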
"Your clients have an automated process to upgrade their cluster. At a time, one node can go down. And because the RabbitMQ pod running on that node refuses to terminate due to quorum protection, it won't allow the node to be drained. Their upgrade process gets stuck."
— Scott Sternloff, AceMQ Principal Architect, Adeptia session, April 2026
- Reduce the default termination grace period before automated upgrade windows, and restore it afterward
- Set a watchdog timer: if a pod hasn't terminated within N seconds of receiving a SIGTERM, force-delete it (use cautiously)
- Use pod disruption budgets (PDBs): a PDB set to allow at most one unavailable pod at a time ensures Kubernetes respects quorum constraints during node drains
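As an illustration of the last option, the sketch below prints a PodDisruptionBudget that allows at most one RabbitMQ pod to be unavailable at a time. It assumes the pods carry the label app.kubernetes.io/name set to the cluster name, which is worth confirming with kubectl get pods --show-labels before relying on it. The function name and the "-pdb" suffix are hypothetical.

```shell
# Print a PodDisruptionBudget that permits one voluntary disruption at a time.
# $1 = cluster name, $2 = namespace.
# Assumption: pods are labelled app.kubernetes.io/name=<cluster name>.
render_rabbitmq_pdb() {
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ${1}-pdb
  namespace: ${2}
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ${1}
EOF
}

# Usage (hypothetical names):
#   render_rabbitmq_pdb my-rabbit messaging | kubectl apply -f -
```

A PDB only governs voluntary disruptions such as kubectl drain; it does not protect against node failures or force deletes.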
What should you never do with RabbitMQ pods on Kubernetes?
Force-deleting a pod (kubectl delete pod --force --grace-period=0) bypasses the pre-stop hook entirely. If the pod being force-deleted is quorum-critical for any queue, those queues lose quorum immediately.

Before removing a pod, check that the node is running and is not quorum-critical:

kubectl exec -n <namespace> <rabbitmq-pod> -- rabbitmq-diagnostics check_running
kubectl exec -n <namespace> <rabbitmq-pod> -- rabbitmq-queues check_if_node_is_quorum_critical

Storage and monitoring considerations
- Use a StorageClass with volumeBindingMode: WaitForFirstConsumer to ensure pods and their volumes land in the same availability zone
- SSD or NVMe storage is strongly recommended — quorum queue WAL writes are latency-sensitive
- Set the reclaim policy to Retain (on the StorageClass or PersistentVolume) so that data survives PVC deletion
- Do not use shared storage (NFS, shared file systems) for RabbitMQ data volumes
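The storage recommendations above can be sketched as a StorageClass manifest. In a StorageClass, reclaimPolicy: Retain is the field that keeps the underlying volume when its claim is deleted. The class name and the provisioner are placeholders: ebs.csi.aws.com is used in the usage line only as an example of a CSI driver, so substitute whichever provisioner your cluster actually runs.

```shell
# Print a StorageClass with delayed binding and a Retain reclaim policy.
# $1 = StorageClass name, $2 = provisioner (cluster-specific; placeholder here).
render_storage_class() {
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${1}
provisioner: ${2}
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
EOF
}

# Usage (hypothetical names):
#   render_storage_class rabbitmq-ssd ebs.csi.aws.com | kubectl apply -f -
```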
- rabbitmq_quorum_queue_stat_voters: verify quorum is maintained across nodes
- Pod restart counts: frequent restarts indicate quorum or resource issues
- PVC capacity: monitor disk utilization on each pod's PVC
- rabbitmq_node_disk_free: alert before the disk alarm triggers
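For the pod restart signal, a quick check from the command line might look like the sketch below. It assumes the pods are labelled app.kubernetes.io/name=<cluster name> and that the RabbitMQ container is the first container in each pod; both should be verified against your deployment. The function names and the threshold are illustrative.

```shell
# List "<pod name><TAB><restart count>" for every pod in the cluster.
# $1 = cluster name, $2 = namespace.
# Assumption: the RabbitMQ container is containerStatuses[0].
list_restart_counts() {
  kubectl get pods -n "$2" -l "app.kubernetes.io/name=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'
}

# Print only the pods whose restart count exceeds the threshold.
# $1 = cluster name, $2 = namespace, $3 = maximum acceptable restarts.
pods_over_restart_threshold() {
  list_restart_counts "$1" "$2" |
    awk -F '\t' -v max="$3" '$2 + 0 > max + 0 { print $1 }'
}

# Usage (hypothetical names):
#   pods_over_restart_threshold my-rabbit messaging 5
```

In practice this kind of check usually belongs in an alerting rule rather than a script, but the same comparison applies.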