
Storage Layer Storage Adapter Architecture

Design Philosophy

RobustMQ's storage layer uses a pluggable architecture. A cluster can run on several underlying storage engines, and users choose among them according to their needs. The core of this design is the Storage Adapter layer, which uses Rust traits to define the capabilities an underlying storage engine must provide. Any storage system that implements these traits can serve as RobustMQ's underlying storage engine, and engines can be selected and switched dynamically at runtime.
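As a minimal sketch of the trait-based plug-in idea, the following shows one trait, one toy engine, and a factory that picks an engine by name at runtime. The names `StorageEngine`, `MemoryEngine`, and `build_engine` are assumptions made for illustration and are not taken from the RobustMQ code base.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for an adapter trait.
pub trait StorageEngine {
    /// Human-readable engine name.
    fn name(&self) -> &'static str;
    /// Append a record to a shard and return its offset.
    fn append(&mut self, shard: &str, record: Vec<u8>) -> u64;
}

/// Toy in-memory engine: one growable list of records per shard.
#[derive(Default)]
pub struct MemoryEngine {
    shards: HashMap<String, Vec<Vec<u8>>>,
}

impl StorageEngine for MemoryEngine {
    fn name(&self) -> &'static str {
        "Memory"
    }

    fn append(&mut self, shard: &str, record: Vec<u8>) -> u64 {
        let log = self.shards.entry(shard.to_string()).or_default();
        log.push(record);
        (log.len() - 1) as u64
    }
}

/// Pick an engine at runtime; unknown names yield None.
pub fn build_engine(kind: &str) -> Option<Box<dyn StorageEngine>> {
    match kind {
        "Memory" => Some(Box::new(MemoryEngine::default())),
        _ => None,
    }
}
```

Because callers only hold a `Box<dyn StorageEngine>`, adding another engine in this sketch means adding one `impl` and one match arm, without touching the calling code.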

Supported Storage Engines

Storage engines are divided into two major categories: local storage and remote storage.

Local Storage Engines

Local storage is mainly used for single-node deployments and for development and testing. Memory and RocksDB are currently supported. The Memory engine keeps all data in memory and suits temporary data and high-speed caching. RocksDB is a persistent key-value storage engine that suits development and testing environments that need persistence. The Local File engine is still being planned.

Local storage is simple to deploy and well suited to single-node and test modes, but it does not support multi-node deployment or multi-replica disaster recovery. Data reliability therefore depends entirely on the reliability of a single machine.

Remote Storage Engines

Remote storage is used for cluster mode and production environments. Journal Engine is RobustMQ's self-developed distributed storage engine and the default storage engine; it is currently in development. It is designed specifically for message queue scenarios and aims to provide low-latency, high-throughput persistent storage.

The MySQL engine is already supported and is suitable for business scenarios that need to store message data directly in MySQL. Engines such as Redis, MinIO, and S3 are still in planning. MinIO and S3 are primarily aimed at large-scale data scenarios and can provide low-cost object storage capabilities.

Remote storage supports distributed deployment and multi-replica storage, making it suitable for production environments that require high reliability and persistence.

Storage Adapter Architecture

The following diagram shows the position of the Storage Adapter in the overall system:

[Figure: position of the Storage Adapter in the overall system]

Core Capability Definitions

The Storage Adapter is a code module within the Broker that defines the capabilities required by underlying storage engines through Traits. It mainly includes the following aspects:

Shard management covers the shard lifecycle, including creating, listing, and deleting shards. Data writes can be single or batched, and an offset is returned after each write.

Query capabilities are flexible and support multiple dimensions. Data can be read by offset, by key, or by tag; it also supports finding the corresponding offset by timestamp. This design can meet the query requirements of different message protocols.

Consumer group management tracks consumption progress and supports committing and retrieving consumption offsets. Message expiration manages the data lifecycle and supports both time-based and size-based eviction policies.
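The two eviction policies can be illustrated with a small sketch over an in-memory log. The `Record` and `Retention` types and the `evict` function are hypothetical and only show how a time limit and a size limit might be applied together; they do not describe how any RobustMQ engine implements expiration.

```rust
use std::collections::VecDeque;

/// Hypothetical record: a timestamp in milliseconds plus a payload.
pub struct Record {
    pub timestamp_ms: u64,
    pub payload: Vec<u8>,
}

/// Hypothetical retention policy; either limit can be disabled with None.
pub struct Retention {
    pub max_age_ms: Option<u64>,
    pub max_bytes: Option<usize>,
}

/// Drop the oldest records until both limits hold; returns how many were evicted.
pub fn evict(log: &mut VecDeque<Record>, policy: &Retention, now_ms: u64) -> usize {
    let mut evicted = 0;

    // Time-based: remove records older than max_age_ms.
    if let Some(max_age) = policy.max_age_ms {
        while log
            .front()
            .map_or(false, |r| now_ms.saturating_sub(r.timestamp_ms) > max_age)
        {
            log.pop_front();
            evicted += 1;
        }
    }

    // Size-based: remove the oldest records until the total payload size fits.
    if let Some(max_bytes) = policy.max_bytes {
        let mut total: usize = log.iter().map(|r| r.payload.len()).sum();
        while total > max_bytes {
            match log.pop_front() {
                Some(r) => {
                    total -= r.payload.len();
                    evicted += 1;
                }
                None => break,
            }
        }
    }
    evicted
}
```

Eviction always starts from the front of the log, which keeps the remaining data a contiguous suffix of what was written.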

Any storage system that implements these capabilities can serve as RobustMQ's underlying storage engine.
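To make the capability list concrete, here is a sketch of what such a trait could look like, together with a toy in-memory implementation. Every name and signature below (`StorageAdapter`, `Record`, `MemoryAdapter`, and each method) is an assumption chosen for illustration; the actual RobustMQ trait may be organized quite differently, for example around async methods and richer error types.

```rust
use std::collections::HashMap;

/// Hypothetical record shape; the field names are illustrative.
#[derive(Clone, Debug, PartialEq)]
pub struct Record {
    pub key: String,
    pub tags: Vec<String>,
    pub timestamp_ms: u64,
    pub payload: Vec<u8>,
}

/// Illustrative capability surface, not RobustMQ's actual trait.
pub trait StorageAdapter {
    // Shard management.
    fn create_shard(&mut self, shard: &str);
    fn list_shards(&self) -> Vec<String>;
    fn delete_shard(&mut self, shard: &str);

    // Writes return the assigned offset, or None if the shard does not exist.
    fn write(&mut self, shard: &str, record: Record) -> Option<u64>;
    fn batch_write(&mut self, shard: &str, records: Vec<Record>) -> Option<Vec<u64>> {
        records.into_iter().map(|r| self.write(shard, r)).collect()
    }

    // Queries by offset, key, tag, and timestamp.
    fn read_by_offset(&self, shard: &str, offset: u64) -> Option<Record>;
    fn read_by_key(&self, shard: &str, key: &str) -> Vec<Record>;
    fn read_by_tag(&self, shard: &str, tag: &str) -> Vec<Record>;
    fn offset_by_timestamp(&self, shard: &str, timestamp_ms: u64) -> Option<u64>;

    // Consumer group progress.
    fn commit_offset(&mut self, group: &str, shard: &str, offset: u64);
    fn committed_offset(&self, group: &str, shard: &str) -> Option<u64>;
}

/// Toy in-memory implementation used only to exercise the trait.
#[derive(Default)]
pub struct MemoryAdapter {
    shards: HashMap<String, Vec<Record>>,
    progress: HashMap<(String, String), u64>,
}

impl StorageAdapter for MemoryAdapter {
    fn create_shard(&mut self, shard: &str) {
        self.shards.entry(shard.to_string()).or_default();
    }
    fn list_shards(&self) -> Vec<String> {
        let mut names: Vec<String> = self.shards.keys().cloned().collect();
        names.sort();
        names
    }
    fn delete_shard(&mut self, shard: &str) {
        self.shards.remove(shard);
    }
    fn write(&mut self, shard: &str, record: Record) -> Option<u64> {
        let log = self.shards.get_mut(shard)?;
        log.push(record);
        Some((log.len() - 1) as u64)
    }
    fn read_by_offset(&self, shard: &str, offset: u64) -> Option<Record> {
        self.shards.get(shard)?.get(offset as usize).cloned()
    }
    fn read_by_key(&self, shard: &str, key: &str) -> Vec<Record> {
        let log = self.shards.get(shard).into_iter().flatten();
        log.filter(|r| r.key == key).cloned().collect()
    }
    fn read_by_tag(&self, shard: &str, tag: &str) -> Vec<Record> {
        let log = self.shards.get(shard).into_iter().flatten();
        log.filter(|r| r.tags.iter().any(|t| t == tag)).cloned().collect()
    }
    fn offset_by_timestamp(&self, shard: &str, timestamp_ms: u64) -> Option<u64> {
        // Offset of the first record written at or after the timestamp.
        let log = self.shards.get(shard)?;
        log.iter().position(|r| r.timestamp_ms >= timestamp_ms).map(|p| p as u64)
    }
    fn commit_offset(&mut self, group: &str, shard: &str, offset: u64) {
        self.progress.insert((group.to_string(), shard.to_string()), offset);
    }
    fn committed_offset(&self, group: &str, shard: &str) -> Option<u64> {
        self.progress.get(&(group.to_string(), shard.to_string())).copied()
    }
}
```

In this sketch `batch_write` is a default method built on `write`, so an engine only overrides it when it has a cheaper bulk path.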

How to Use

Configuration

The storage engine is selected through startup configuration. The message_storage section in the configuration file is used to specify the storage engine type and related configuration:

```json
"message_storage": {
  "storage_type": "Memory",
  "journal_config": null,
  "memory_config": null,
  "minio_config": null,
  "mysql_config": null,
  "rocksdb_config": null,
  "s3_config": null
}
```

The storage_type field specifies the storage engine type, such as Memory, RocksDB, MySQL, etc. The other *_config fields are the specific configuration options for each storage engine.
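A sketch of how the `storage_type` string might be turned into a typed value follows. The `StorageType` enum and its `config_field` helper are hypothetical; only the three type names given in the text are handled, and the field names are copied from the configuration fragment above.

```rust
use std::str::FromStr;

/// Hypothetical typed form of the `storage_type` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageType {
    Memory,
    RocksDB,
    MySQL,
}

impl StorageType {
    /// Name of the `*_config` field that belongs to this engine.
    pub fn config_field(self) -> &'static str {
        match self {
            StorageType::Memory => "memory_config",
            StorageType::RocksDB => "rocksdb_config",
            StorageType::MySQL => "mysql_config",
        }
    }
}

impl FromStr for StorageType {
    type Err = String;

    /// Parse the exact spellings used in the text; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Memory" => Ok(StorageType::Memory),
            "RocksDB" => Ok(StorageType::RocksDB),
            "MySQL" => Ok(StorageType::MySQL),
            other => Err(format!("unsupported storage_type: {other}")),
        }
    }
}
```

Rejecting unknown values at startup, as this sketch does, surfaces a misspelled engine name before any Topic is created.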

Multi-Engine Support

RobustMQ supports configuring multiple storage engines simultaneously. When the cluster starts, a default storage engine must be specified. When creating a Topic, you can specify which storage engine to use; different Topics can use different storage engines. If not specified when creating a Topic, the cluster's default storage engine is used.
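The fallback rule can be sketched as a small registry that records the engine chosen for each Topic at creation time. `EngineRegistry` and its methods are hypothetical names used only to illustrate the default-versus-explicit choice.

```rust
use std::collections::HashMap;

/// Hypothetical engine identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageType {
    Memory,
    RocksDB,
    MySQL,
}

/// Hypothetical registry that maps each Topic to its storage engine.
pub struct EngineRegistry {
    default_engine: StorageType,
    per_topic: HashMap<String, StorageType>,
}

impl EngineRegistry {
    /// The cluster-wide default must be supplied up front.
    pub fn new(default_engine: StorageType) -> Self {
        Self { default_engine, per_topic: HashMap::new() }
    }

    /// Record the engine for a new Topic, using the default when none is given.
    pub fn create_topic(&mut self, topic: &str, engine: Option<StorageType>) {
        let chosen = engine.unwrap_or(self.default_engine);
        self.per_topic.insert(topic.to_string(), chosen);
    }

    /// Look up the engine of an existing Topic.
    pub fn engine_for(&self, topic: &str) -> Option<StorageType> {
        self.per_topic.get(topic).copied()
    }
}
```

Resolving the default when the Topic is created, rather than on every lookup, is a design choice of this sketch: it keeps an existing Topic on its engine even if the cluster default changes later.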

Storage Engine Change

A Topic's storage engine can be changed, but the change strategy must be specified manually. One strategy is a direct switchover that does not retain old data, which suits test environments or cases where historical data is unimportant. The other is data migration, which copies data from the old engine to the new one and suits production environments where historical data matters. Changing the storage engine is a high-impact operation, so choose the strategy carefully.
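The difference between the two strategies can be sketched with a toy engine that is just a list of records. `ChangeStrategy`, `ToyEngine`, and `change_engine` are hypothetical names; a real migration would also have to deal with concurrent writes, consumer offsets, and failures, none of which is modeled here.

```rust
/// The two strategies described above; the names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeStrategy {
    /// Switch immediately and discard data held by the old engine.
    DirectSwitch,
    /// Copy existing records to the new engine before switching.
    Migrate,
}

/// Toy engine: an append-only list of records.
#[derive(Debug, Default, PartialEq)]
pub struct ToyEngine {
    pub records: Vec<Vec<u8>>,
}

/// Replace `old` with `new` and return the engine the Topic uses afterwards.
pub fn change_engine(old: ToyEngine, mut new: ToyEngine, strategy: ChangeStrategy) -> ToyEngine {
    match strategy {
        // The old engine and its data are simply dropped.
        ChangeStrategy::DirectSwitch => new,
        ChangeStrategy::Migrate => {
            // Copy records in their original append order.
            new.records.extend(old.records);
            new
        }
    }
}
```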

Shard Concept

Design Background

Different message queue systems use different storage units, but they all essentially adopt an Append Only model. MQTT and AMQP use Topic and Queue respectively as storage units, Kafka uses Partition, RocketMQ uses MessageQueue, and Pulsar also uses Partition. Although the names differ, the storage model is the same.

Shard Unified Abstraction

The Storage Adapter introduces the Shard concept as a single abstraction for message data storage across protocols. Its purpose is to hide the storage differences between message systems, provide a unified storage interface and semantics, and reduce the complexity of implementing a storage engine.

Shard is RobustMQ's unified data storage organization unit, similar to Kafka's Partition, RocketMQ's MessageQueue, and Pulsar's Partition. It supports distributed deployment and multi-replica storage, provides an Append Only write model, and supports queries across multiple dimensions such as offset, key, tag, and timestamp.

Through this unified Shard abstraction, RobustMQ can handle the message storage requirements of MQTT, AMQP, Kafka, and other protocols in the same way, and storage engines only need to implement one set of interfaces to support all protocols.
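One way to picture the unification is a mapping from each protocol's storage unit to a single shard identifier. The `StorageUnit` enum and the naming scheme in `shard_name` are purely hypothetical; the text does not specify how RobustMQ actually names shards.

```rust
/// Protocol-specific storage units mentioned in the text.
pub enum StorageUnit {
    MqttTopic(String),
    AmqpQueue(String),
    KafkaPartition { topic: String, partition: u32 },
}

impl StorageUnit {
    /// Hypothetical naming scheme that maps every unit to one shard name.
    pub fn shard_name(&self) -> String {
        match self {
            StorageUnit::MqttTopic(topic) => format!("mqtt/{topic}"),
            StorageUnit::AmqpQueue(queue) => format!("amqp/{queue}"),
            StorageUnit::KafkaPartition { topic, partition } => {
                format!("kafka/{topic}/{partition}")
            }
        }
    }
}
```

Once every unit resolves to a shard name, the storage engine below only ever sees shards, which is the sense in which it needs a single set of interfaces.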