The company's IoT platform relies on RabbitMQ to handle messaging for over 300,000 connected devices with plans for significant scaling. Weekly out-of-memory crashes and performance bottlenecks were threatening platform stability and growth plans.
RabbitMQ nodes were crashing weekly due to out-of-memory errors caused by improper vertical scaling approach, low prefetch counts (set to 1), and missing TTL configurations on queues. Producer blocking from slow consumers was creating cascading failures. The team was using classic queues without mirroring, leaving them vulnerable to data loss during node failures.
Kubernetes deployment, RabbitMQ handling 300,000+ device connections, Spring Boot consumers, plans for horizontal scaling with Prometheus/Grafana monitoring.
AceMQ conducted a multi-day intensive remediation with live environment review, real-time configuration changes, and hands-on performance optimization. The team analyzed metrics, reviewed topology, and provided detailed architectural recommendations for scaling.
The client eliminated weekly RabbitMQ crashes, achieved significantly improved consumer throughput, and has a clear scaling roadmap to support growing device counts. The platform now handles 300,000+ devices without stability issues.
Throughput, latency, and resource utilization optimization including queue design, publisher confirms, replication settings, and concurrency tuning.
Hardening RabbitMQ in Kubernetes environments with StatefulSet tuning, quorum queue optimization, storage isolation, and memory/network configuration.
Whether you need architecture advisory, 24/7 support, or full managed services, AceMQ has the expertise to help.