Design of the RobustMQ Rate Limiting Module

Rate limiting is a fundamental capability of a message broker. Without it, a single misbehaving client can exhaust the resources of an entire cluster.

Why Two Layers?

A message broker typically serves multiple business tenants at once, and that is the core tension in a multi-tenant deployment: a traffic spike from any one tenant should not affect the others.

Rate limiting only at the node level doesn't solve this problem: if one tenant maxes out the shared quota, everyone else is stuck. Rate limiting only at the tenant level doesn't protect the node itself: in extreme cases, for example when many tenants peak at the same time, the node can still be overwhelmed.

So both layers are needed: the node layer for node self-protection, the tenant layer for isolation between tenants.

```text
Request arrives
  │
  ▼
Node layer (protects total node resources)
  │
  ▼
Tenant layer (ensures resource isolation between tenants)
  │
  ▼
Process request
```

Two Different Rate Limiting Approaches

There are two completely different semantics for rate limiting, and RobustMQ uses both.

Rate Limiting

Limits the maximum number of requests processed per second, implemented with a token bucket algorithm. Requests that exceed the rate are not immediately rejected — they wait for tokens to be replenished.

This is suitable for scenarios like connection establishment and message publishing — when traffic suddenly spikes, smoothing it out with a queue is better than immediately disconnecting clients.
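
The waiting behavior can be illustrated with a minimal token bucket sketch. This is not RobustMQ's implementation; the class name `TokenBucket`, its parameters, and the injectable `clock` and `sleep` hooks are assumptions made for illustration.

```python
import time


class TokenBucket:
    """Illustrative token bucket that makes callers wait instead of rejecting them."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable so the logic can be tested
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add the tokens earned since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Take one token, sleeping until one is available. Returns seconds waited."""
        waited = 0.0
        while True:
            self._refill()
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return waited
            # Not enough tokens: sleep for the time the shortfall takes to refill.
            delay = (1.0 - self.tokens) / self.rate
            self.sleep(delay)
            waited += delay
```

With `rate=4` and `capacity=2`, the first two calls to `acquire()` return at once and the third waits about a quarter of a second, which is the smoothing effect described above.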

Count Limiting

Limits the total number of a certain type of resource; requests are rejected outright when the limit is exceeded, with no waiting.

This is suitable for scenarios like connection count, session count, and topic count — if the resource hasn't been released, waiting longer won't help.
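
By contrast, a count limit never waits. A minimal sketch, assuming a hypothetical `CountLimiter` class (the name and methods are not taken from RobustMQ):

```python
import threading


class CountLimiter:
    """Illustrative counter for a bounded resource such as connections or sessions."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Claim one slot. Returns False immediately when the limit is reached."""
        with self._lock:
            if self.current >= self.limit:
                return False
            self.current += 1
            return True

    def release(self):
        """Give a slot back, for example when a connection closes."""
        with self._lock:
            if self.current > 0:
                self.current -= 1
```

A rejected caller only succeeds after some other holder calls `release()`, which is why waiting inside `try_acquire` would not help.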

Scope of Rate Limiting

Rate limiting is applied at two levels: a global layer and the MQTT protocol layer.

The global layer does not distinguish between protocols; it controls speed at the network level:

New TCP connections per second — prevents connection storms
Management API rate limiting by URI dimension — prevents management operations from overwhelming the Broker

The MQTT layer has controls at both the node and tenant dimensions:

Limiting Dimension	Node Layer	Tenant Layer
Message publish rate	Maximum messages published per second on the node	Maximum messages published per second per tenant
Connection establishment rate	Maximum new connections per second on the node	Maximum new connections per second per tenant

Tenant rate limiters are lazily initialized — on the first request they are initialized with the tenant's configuration, and subsequent requests hit the cache directly. Node-layer rates can be hot-updated at runtime without restarting the service.
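
The lazy initialization and hot update described above might look roughly like the following sketch. `RateLimit`, `LimiterRegistry`, and the `load_tenant_rate` callback are hypothetical names standing in for a real limiter and a real configuration lookup; the sketch is single-threaded for brevity.

```python
class RateLimit:
    """Placeholder for a real limiter; it only stores the configured rate."""

    def __init__(self, rate):
        self.rate = rate


class LimiterRegistry:
    """Illustrative registry: one node limiter plus lazily created tenant limiters."""

    def __init__(self, node_rate, load_tenant_rate):
        self.node = RateLimit(node_rate)
        self._load_tenant_rate = load_tenant_rate  # hypothetical config lookup
        self._tenants = {}                         # tenant name -> cached limiter

    def for_tenant(self, tenant):
        limiter = self._tenants.get(tenant)
        if limiter is None:
            # First request from this tenant: build the limiter from its config.
            limiter = RateLimit(self._load_tenant_rate(tenant))
            self._tenants[tenant] = limiter
        return limiter

    def update_node_rate(self, rate):
        # Hot update: change the rate in place, no restart and no new object.
        self.node.rate = rate
```

The configuration lookup runs once per tenant; every later call to `for_tenant` returns the cached object.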

Scope of Count Limiting

Connection Count

When a new connection is established, two things are checked: whether the node's current total connection count exceeds the cluster-configured max_connections_per_node, and whether that tenant's connection count exceeds its own quota. If either is exceeded, the CONNECT request is rejected.
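
The two checks can be sketched as follows. The `ConnectionQuota` class is hypothetical, and treating a tenant with no configured quota as having a quota of zero is an assumption of this sketch, not documented behavior.

```python
class ConnectionQuota:
    """Illustrative two-layer connection count check (node first, then tenant)."""

    def __init__(self, node_limit, tenant_limits):
        self.node_limit = node_limit
        self.tenant_limits = tenant_limits  # tenant name -> max connections
        self.node_count = 0
        self.tenant_counts = {}

    def try_connect(self, tenant):
        """Return None on success, or the name of the layer that rejected."""
        if self.node_count >= self.node_limit:
            return "node"
        used = self.tenant_counts.get(tenant, 0)
        # Assumption: tenants without a configured quota get a quota of 0.
        if used >= self.tenant_limits.get(tenant, 0):
            return "tenant"
        # Both checks passed, so update both counters together.
        self.node_count += 1
        self.tenant_counts[tenant] = used + 1
        return None

    def disconnect(self, tenant):
        used = self.tenant_counts.get(tenant, 0)
        if used > 0:
            self.tenant_counts[tenant] = used - 1
            self.node_count -= 1
```

Checking both layers before touching either counter avoids having to roll back the node counter when the tenant layer rejects.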

Session Count

Sessions and connections are two independent concepts. A persistent session (Clean Session = 0 in MQTT 3.1.1, or a non-zero Session Expiry Interval in MQTT 5) is retained after the connection is closed, so session count and connection count are not necessarily equal. Both dimensions must be checked separately.

Topic Count

Topics are global resources. When a new topic is created, the node total and the tenant quota are both checked.

QoS In-Flight Window

This limit is unrelated to multi-tenancy — it applies to individual connections.

An MQTT 5 client can declare, via the Receive Maximum property in its CONNECT packet, how many QoS 1 and QoS 2 messages it can have in flight (sent but not yet acknowledged) at the same time. When the Broker pushes to that client and the number of in-flight messages reaches this limit, the Broker must stop sending QoS 1 and QoS 2 messages and wait for existing ones to be acknowledged before continuing.

This mechanism protects the client from the Broker pushing too fast and overflowing the client's buffer.
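
A minimal sketch of such a window on the Broker side follows. The `InflightWindow` class is hypothetical, and packet identifier reuse (real MQTT identifiers are 16-bit) is left out to keep the example short.

```python
from collections import deque


class InflightWindow:
    """Illustrative per-connection window bounded by the client's Receive Maximum."""

    def __init__(self, receive_maximum):
        self.receive_maximum = receive_maximum
        self.inflight = set()   # packet ids sent but not yet acknowledged
        self.pending = deque()  # messages waiting for a free slot
        self._next_id = 1       # simplified: real packet ids wrap at 65535

    def push(self, message):
        """Queue a message; return the (packet_id, message) pairs that can be sent now."""
        self.pending.append(message)
        return self._drain()

    def ack(self, packet_id):
        """Record an acknowledgement; return any messages the freed slot lets through."""
        self.inflight.discard(packet_id)
        return self._drain()

    def _drain(self):
        sent = []
        while self.pending and len(self.inflight) < self.receive_maximum:
            packet_id = self._next_id
            self._next_id += 1
            self.inflight.add(packet_id)
            sent.append((packet_id, self.pending.popleft()))
        return sent
```

With a Receive Maximum of 2, a third pushed message stays in `pending` until one of the first two is acknowledged.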

Order of Rate Limiting Checks in Request Processing

PUBLISH messages:

```text
Client → PUBLISH
  │
  ├─ Is the topic count limit exceeded?
  ├─ Is the QoS in-flight window full?
  ├─ Is the publish rate limit exceeded? (wait for token)
  │
  ▼
Message processing (storage, routing, pushing to subscribers)
```

CONNECT requests:

```text
Client → CONNECT
  │
  ├─ Is the connection rate limit exceeded? (wait for token)
  ├─ Is the total connection count limit exceeded?
  ├─ Is the total session count limit exceeded?
  │
  ▼
Establish connection, load session
```
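
The CONNECT ordering above can be written as a short function. The three callables are hypothetical hooks standing in for the rate limiter and the two count checks; only the order and the wait-versus-reject behavior come from the flow above.

```python
def admit_connect(wait_for_rate_token, connection_slot_free, session_slot_free):
    """Run the CONNECT checks in order.

    Returns "accepted", or the name of the check that rejected the request.
    """
    # Rate check: may block until a token is available, but never rejects.
    wait_for_rate_token()
    # Count checks: reject outright, with no waiting.
    if not connection_slot_free():
        return "connection_count"
    if not session_slot_free():
        return "session_count"
    return "accepted"
```

One consequence of this order is that a request rejected by a count check has already consumed a rate token.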

Configuration

Cluster Layer (broker.toml)

```toml
[mqtt_limit.cluster]
# Maximum connections per node
max_connections_per_node = 10000000
# Maximum new connections per second
max_connection_rate = 100000
# Maximum topics per node
max_topics = 5000000
# Maximum sessions per node
max_sessions = 50000000
# Maximum published messages per second
max_publish_rate = 10000
```

Although the section is named `cluster`, these are node-level upper bounds, shared by all tenants on the node.

Tenant Layer (Admin API)

Field	Description
`max_connections_per_node`	Maximum connections per node for this tenant
`max_create_connection_rate_per_second`	Maximum new connections per second for this tenant
`max_topics`	Maximum topics for this tenant
`max_sessions`	Maximum sessions for this tenant
`max_publish_rate`	Maximum published messages per second for this tenant

Tenant quotas are typically set lower than cluster quotas. The cluster quota is the hard upper bound; tenant quotas are subdivisions within that upper bound. The two are meant to be used together.
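
As an illustration of that relationship, the sketch below flags tenant fields that exceed the matching cluster value. The pairing of tenant fields with cluster fields is an assumption based on their descriptions, and nothing here implies that RobustMQ enforces this check itself; it is the kind of sanity check an operator could run before applying a tenant configuration.

```python
# Cluster values copied from the broker.toml example above.
CLUSTER_LIMITS = {
    "max_connections_per_node": 10_000_000,
    "max_connection_rate": 100_000,
    "max_topics": 5_000_000,
    "max_sessions": 50_000_000,
    "max_publish_rate": 10_000,
}

# Tenant field -> cluster field it is compared against (assumed pairing).
TENANT_TO_CLUSTER = {
    "max_connections_per_node": "max_connections_per_node",
    "max_create_connection_rate_per_second": "max_connection_rate",
    "max_topics": "max_topics",
    "max_sessions": "max_sessions",
    "max_publish_rate": "max_publish_rate",
}


def quota_violations(tenant, cluster=CLUSTER_LIMITS):
    """Return the tenant fields whose values exceed the matching cluster limit."""
    return sorted(
        field
        for field, cluster_field in TENANT_TO_CLUSTER.items()
        if field in tenant and tenant[field] > cluster[cluster_field]
    )
```

For example, a tenant configured with `max_publish_rate = 20000` would be flagged, because the cluster value in the example above is 10000.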
