Telecommunications | Remediation | Cloud / Kubernetes

Scaling RabbitMQ to support hundreds of thousands of connected devices for a global telecom leader

Global Telecom Leader

Overview

The company's IoT platform relies on RabbitMQ to handle messaging for over 300,000 connected devices with plans for significant scaling. Weekly out-of-memory crashes and performance bottlenecks were threatening platform stability and growth plans.

Challenge

RabbitMQ nodes were crashing weekly with out-of-memory errors caused by an improper vertical scaling approach, a low consumer prefetch count (set to 1), and missing TTL configuration on queues. Slow consumers were causing producers to be blocked, which created cascading failures. The team was using classic queues without mirroring, leaving them vulnerable to data loss during node failures.

Environment

Kubernetes deployment, RabbitMQ handling 300,000+ device connections, Spring Boot consumers, plans for horizontal scaling with Prometheus/Grafana monitoring.

Approach

AceMQ conducted a multi-day intensive remediation with live environment review, real-time configuration changes, and hands-on performance optimization. The team analyzed metrics, reviewed topology, and provided detailed architectural recommendations for scaling.

Solution

  • Recommended horizontal scaling strategy (more nodes) instead of vertical scaling
  • Increased prefetch count from 1 to 20 for dramatically improved consumer throughput
  • Configured lazy queues to reduce producer blocking by enabling direct-to-disk writes
  • Implemented per-queue memory limits and TTL policies to prevent OOM crashes
  • Raised the memory high watermark from 0.7 to 0.9 for better memory utilization
  • Designed migration path from classic queues to quorum queues for high availability
  • Provided Excel-based server requirements calculator for capacity planning
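
As an illustration of the TTL, memory-limit, and lazy-queue items above, the sketch below builds the kind of policy that could be applied through the RabbitMQ management HTTP API (a PUT to /api/policies/{vhost}/{name}). It is a minimal sketch, not the configuration used in this engagement: the queue name pattern, the policy name, the 60-second TTL, and the 100 MiB byte limit are all hypothetical values chosen for the example. The snippet only constructs the request path and body, and sends nothing to a broker.

```python
import json
from urllib.parse import quote


def build_queue_policy(pattern, ttl_ms, max_bytes, lazy=True):
    """Build the JSON body for a RabbitMQ queue policy.

    All values passed in are illustrative assumptions, not the
    client's real settings.
    """
    definition = {
        "message-ttl": ttl_ms,          # expire messages after ttl_ms milliseconds
        "max-length-bytes": max_bytes,  # cap the total size of ready messages per queue
    }
    if lazy:
        # Classic-queue setting that keeps message bodies on disk.
        definition["queue-mode"] = "lazy"
    return {
        "pattern": pattern,      # regex matched against queue names
        "apply-to": "queues",
        "priority": 0,
        "definition": definition,
    }


def policy_path(vhost, name):
    """Return the management API path, percent-encoding the vhost and name."""
    return "/api/policies/{}/{}".format(quote(vhost, safe=""), quote(name, safe=""))


# Hypothetical policy for queues whose names start with "device."
policy = build_queue_policy(r"^device\.", ttl_ms=60_000, max_bytes=100 * 1024 * 1024)
print(policy_path("/", "device-queue-limits"))
print(json.dumps(policy, indent=2, sort_keys=True))
```

Policies are matched by queue name, so limits like these can be changed later without redeclaring queues. With the default overflow behaviour, a queue that reaches its byte limit drops its oldest messages, and whether the queue-mode key has any effect depends on the RabbitMQ version and queue type in use.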

Outcome

The client eliminated weekly RabbitMQ crashes, achieved significantly improved consumer throughput, and has a clear scaling roadmap to support growing device counts. The platform now handles 300,000+ devices without stability issues.

Technologies

RabbitMQ, Kubernetes, Quorum Queues, Prometheus, Grafana
