Design of the RobustMQ Rate Limiting Module
Rate limiting is a fundamental capability of a message broker. Without it, a single misbehaving client can exhaust the resources of an entire cluster.
Why Two Layers?
A message broker typically serves multiple business tenants simultaneously, and the core tension in a multi-tenant deployment is isolation: a traffic spike from any one tenant should not affect the others.
Rate limiting only at the node level (the limits set in the cluster configuration) doesn't solve this problem: if one tenant maxes out the shared quota, everyone else is stuck. Rate limiting only at the tenant level doesn't protect the node itself: in extreme cases the combined traffic of all tenants can still overwhelm it.
So both layers are needed — the node layer for node self-protection, the tenant layer for isolation between tenants.
Request arrives
│
▼
Node layer (protects total node resources)
│
▼
Tenant layer (ensures resource isolation between tenants)
│
▼
Process request
Two Different Rate Limiting Approaches
There are two completely different semantics for rate limiting, and RobustMQ uses both.
Rate Limiting
Limits the maximum number of requests processed per second, implemented with a token bucket algorithm. Requests that exceed the rate are not immediately rejected — they wait for tokens to be replenished.
This is suitable for scenarios like connection establishment and message publishing — when traffic suddenly spikes, smoothing it out with a queue is better than immediately disconnecting clients.
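The waiting behaviour described above can be sketched as a token bucket whose balance is allowed to go negative, so that each over-limit request is told how long to wait rather than being rejected. This is a minimal single-threaded illustration, not RobustMQ's implementation: the class name `TokenBucket`, its methods, and the injectable `clock` and `sleep` parameters are all hypothetical.

```python
import time


class TokenBucket:
    """Token bucket that makes callers wait instead of rejecting them (illustrative only)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens replenished per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # the bucket starts full
        self.clock = clock               # injectable so the bucket can be tested without sleeping
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def reserve(self, n=1):
        """Take n tokens and return how many seconds the caller should wait."""
        self._refill()
        self.tokens -= n  # a negative balance represents requests already queued
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate

    def acquire(self, n=1, sleep=time.sleep):
        """Blocking variant: wait until the reserved tokens have been replenished."""
        delay = self.reserve(n)
        if delay > 0:
            sleep(delay)
        return delay
```

Letting the balance go negative is one simple way to get queue-like smoothing: each waiting request is assigned a later slot, so a burst is spread out at the configured rate instead of being dropped.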
Count Limiting
Limits the total number of a certain type of resource; requests are rejected outright when the limit is exceeded, with no waiting.
This is suitable for scenarios like connection count, session count, and topic count — if the resource hasn't been released, waiting longer won't help.
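By contrast, a count limiter has no notion of time. A rough sketch, assuming a hypothetical `CountLimiter` class (the name and methods are illustrative, not taken from RobustMQ), looks like this:

```python
import threading


class CountLimiter:
    """Caps the number of live resources; over-limit requests fail immediately."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Return True and count the resource, or False if the limit is reached."""
        with self._lock:
            if self.in_use >= self.limit:
                return False  # reject outright, no waiting
            self.in_use += 1
            return True

    def release(self):
        """Give a slot back, for example when a connection closes."""
        with self._lock:
            if self.in_use > 0:
                self.in_use -= 1
```

A slot only becomes available again when `release` is called, which is why waiting would not help a rejected request.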
Scope of Rate Limiting
Rate limiting is applied in two places: a global layer and the MQTT protocol layer.
The global layer does not distinguish between protocols; it controls speed at the network level:
- New TCP connections per second: prevents connection storms
- Management API requests per second, limited per URI: prevents management operations from overwhelming the Broker
The MQTT layer has controls at both the node and tenant dimensions:
| Limiting Dimension | Node Layer | Tenant Layer |
|---|---|---|
| Message publish rate | Maximum messages published per second on the node | Maximum messages published per second per tenant |
| Connection establishment rate | Maximum new connections per second on the node | Maximum new connections per second per tenant |
Tenant rate limiters are lazily initialized — on the first request they are initialized with the tenant's configuration, and subsequent requests hit the cache directly. Node-layer rates can be hot-updated at runtime without restarting the service.
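The lazy initialization and hot update described above can be sketched as a small registry. Everything here is an assumption made for illustration: `RateLimiter` is a placeholder that only stores a rate, and `LimiterRegistry`, `for_tenant`, `update_node_rate`, and the `load_tenant_rate` callback are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RateLimiter:
    """Placeholder for a real limiter; only the configured rate matters here."""
    rate: int


class LimiterRegistry:
    def __init__(self, node_rate, load_tenant_rate):
        self.node = RateLimiter(node_rate)
        self._load_tenant_rate = load_tenant_rate  # hypothetical tenant config lookup
        self._tenants = {}                         # cache: tenant name -> limiter

    def for_tenant(self, tenant):
        """Create the tenant's limiter on first use, then serve it from the cache."""
        limiter = self._tenants.get(tenant)
        if limiter is None:
            limiter = RateLimiter(self._load_tenant_rate(tenant))
            self._tenants[tenant] = limiter
        return limiter

    def update_node_rate(self, new_rate):
        """Hot update: change the node rate in place, with no restart."""
        self.node.rate = new_rate
```

Because the node limiter is mutated in place, every caller that already holds a reference to it sees the new rate on its next check.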
Scope of Count Limiting
Connection Count
When a new connection is established, two things are checked: whether the node's current total connection count exceeds the cluster-configured max_connections_per_node, and whether that tenant's connection count exceeds its own quota. If either is exceeded, the CONNECT request is rejected.
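As a sketch of this two-level check, the following assumes in-memory counters and a hypothetical `ConnectionAdmission` class. The returned strings are illustrative, not real Broker responses, and the order of the two checks is an arbitrary choice for the example.

```python
class ConnectionAdmission:
    """Checks the node total and the tenant quota before accepting a CONNECT."""

    def __init__(self, max_connections_per_node, tenant_quotas):
        self.node_limit = max_connections_per_node
        self.tenant_quotas = tenant_quotas  # tenant name -> max connections
        self.node_count = 0
        self.tenant_counts = {}

    def on_connect(self, tenant):
        used = self.tenant_counts.get(tenant, 0)
        if self.node_count >= self.node_limit:
            return "rejected: node connection limit"
        if used >= self.tenant_quotas[tenant]:
            return "rejected: tenant connection quota"
        # Both checks passed, so count the connection at both levels.
        self.node_count += 1
        self.tenant_counts[tenant] = used + 1
        return "accepted"

    def on_disconnect(self, tenant):
        used = self.tenant_counts.get(tenant, 0)
        if used > 0:
            self.tenant_counts[tenant] = used - 1
            self.node_count -= 1
```

The counters are only incremented after both checks pass, so a rejected CONNECT never consumes quota at either level.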
Session Count
Sessions and connections are two independent concepts. A persistent session (Clean Start = 0) is retained after the connection is closed, so session count and connection count are not necessarily equal. Both dimensions must be checked separately.
Topic Count
Topics are global resources. When a new topic is created, the node total and the tenant quota are both checked.
QoS In-Flight Window
This limit is unrelated to multi-tenancy — it applies to individual connections.
An MQTT client can declare via the Receive Maximum attribute at connect time how many QoS 1/QoS 2 in-flight messages (sent but not yet acknowledged) it can handle simultaneously. When the Broker pushes to that client and the number of in-flight messages reaches this limit, the Broker must stop and wait for existing messages to be acknowledged before continuing.
This mechanism protects the client from the Broker pushing too fast and overflowing the client's buffer.
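The window can be pictured as a set of unacknowledged packet identifiers capped at Receive Maximum, with a queue behind it. This is a simplified sketch under stated assumptions: `InflightWindow` and its methods are hypothetical names, and `ack` stands for whichever acknowledgement completes the delivery for the message's QoS level.

```python
from collections import deque


class InflightWindow:
    """Per-connection window of QoS 1/QoS 2 messages sent but not yet acknowledged."""

    def __init__(self, receive_maximum):
        self.receive_maximum = receive_maximum
        self.inflight = set()   # packet ids awaiting acknowledgement
        self.pending = deque()  # packet ids waiting for a free slot

    def push(self, packet_id):
        """Return True if the message is sent now, False if it has to wait."""
        if len(self.inflight) < self.receive_maximum:
            self.inflight.add(packet_id)
            return True
        self.pending.append(packet_id)
        return False

    def ack(self, packet_id):
        """Free a slot and return the next packet id sent, or None."""
        self.inflight.discard(packet_id)
        if self.pending and len(self.inflight) < self.receive_maximum:
            next_id = self.pending.popleft()
            self.inflight.add(next_id)
            return next_id
        return None
```

Each acknowledgement frees exactly one slot, so the Broker never has more than Receive Maximum messages outstanding for that client.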
Order of Rate Limiting Checks in Request Processing
PUBLISH messages:
Client → PUBLISH
│
├─ Is the topic count limit exceeded?
├─ Is the QoS in-flight window full?
├─ Is the publish rate limit exceeded? (wait for token)
│
▼
Message processing (storage, routing, pushing to subscribers)
CONNECT requests:
Client → CONNECT
│
├─ Is the connection rate limit exceeded? (wait for token)
├─ Is the total connection count limit exceeded?
├─ Is the total session count limit exceeded?
│
▼
Establish connection, load session
Configuration
Cluster Layer (broker.toml)
[mqtt_limit.cluster]
# Maximum connections per node
max_connections_per_node = 10000000
# Maximum new connections per second
max_connection_rate = 100000
# Maximum topics per node
max_topics = 5000000
# Maximum sessions per node
max_sessions = 50000000
# Maximum published messages per second
max_publish_rate = 10000
These are node-level upper bounds, shared by all tenants.
Tenant Layer (Admin API)
| Field | Description |
|---|---|
| max_connections_per_node | Maximum connections per node for this tenant |
| max_create_connection_rate_per_second | Maximum new connections per second for this tenant |
| max_topics | Maximum topics for this tenant |
| max_sessions | Maximum sessions for this tenant |
| max_publish_rate | Maximum published messages per second for this tenant |
Tenant quotas are typically set lower than cluster quotas. The cluster quota is the hard upper bound; tenant quotas are subdivisions within that upper bound. The two are meant to be used together.
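One way to keep the two layers consistent is to flag any tenant quota that exceeds the corresponding cluster bound. The sketch below is illustrative only. The pairing of tenant fields to cluster fields in `FIELD_MAP` is an assumption inferred from the field descriptions, and `quotas_exceeding_cluster` is a hypothetical helper, not part of the Admin API.

```python
# Tenant field -> cluster field. This pairing is an assumption based on the
# descriptions in the two configuration listings.
FIELD_MAP = {
    "max_connections_per_node": "max_connections_per_node",
    "max_create_connection_rate_per_second": "max_connection_rate",
    "max_topics": "max_topics",
    "max_sessions": "max_sessions",
    "max_publish_rate": "max_publish_rate",
}


def quotas_exceeding_cluster(tenant, cluster):
    """Return the tenant fields whose value is above the cluster upper bound."""
    return [
        tenant_field
        for tenant_field, cluster_field in FIELD_MAP.items()
        if tenant_field in tenant and tenant[tenant_field] > cluster[cluster_field]
    ]


# Cluster values copied from the broker.toml example.
cluster = {
    "max_connections_per_node": 10000000,
    "max_connection_rate": 100000,
    "max_topics": 5000000,
    "max_sessions": 50000000,
    "max_publish_rate": 10000,
}

# Hypothetical tenant quota with one value above the cluster bound.
tenant = {"max_topics": 1000, "max_publish_rate": 20000}
violations = quotas_exceeding_cluster(tenant, cluster)
```

A tenant quota above the cluster bound is not necessarily an error, since the node layer still enforces the hard limit, but it can never be reached in practice and usually signals a misconfiguration.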
