02: RobustMQ: Overview of Technical Design Philosophy
RobustMQ is a new-generation, high-performance, multi-protocol message queue built in Rust. Its vision is to become the next-generation cloud-native and AI-native messaging infrastructure. It is not simply "yet another message queue", but a rethinking and redesign of the message queue for the AI era and cloud-native requirements.
In "RobustMQ: Redefining the Future of Cloud-Native Message Queues with Rust", we identified six characteristics of RobustMQ: high performance, Serverless, pluggable storage, minimalist high-cohesion architecture, compute/storage/scheduling separation, and multi-protocol support.
Together, these six characteristics form the core technical philosophy behind RobustMQ's aspiration to become a new-generation, high-performance, multi-protocol message queue. Below, we explain the thinking behind each of them.
In terms of architectural design philosophy, RobustMQ draws on the strengths and weaknesses of today's mainstream message queues and of other mainstream infrastructure software (such as databases and distributed storage). Combined with our understanding of how businesses use message middleware, we hope to create an open-source message queue that can serve a wide range of scenarios. This article represents only our current summary of the architecture; in the long term, the architecture will continue to evolve.
High Performance
When RobustMQ's design began, the first keyword was high performance. Most mainstream message queues today are written in Java, which pays a performance cost relative to C/C++, both in GC pauses and in raw language performance. For this reason, several message queues in the industry have been implemented in C++, with Redpanda being a relatively successful example.
RobustMQ's developers previously built message queues in C++, but C++'s low development efficiency and its safety risks, such as memory leaks, made the project take too long to mature, and the results were less than ideal. Rust offered a way out. Based on industry practice, we believe that Rust addresses the performance, development-efficiency, and safety problems we ran into with C++, and that it can deliver significant performance gains.
Therefore, RobustMQ aims to achieve high performance by combining Rust's language advantages with sound architectural design and careful implementation. As a practical data point, Iggy, a message streaming engine built in Rust, claims to achieve nanosecond-level latency, which supports this choice of language.
Serverless
The second keyword is Serverless. Cloud technology, K8s/Docker, cloud disks, and distributed storage (such as cloud object storage and MinIO) are now widespread. Traditional message queues such as Kafka, RocketMQ, and RabbitMQ, however, couple compute with storage and rely on local disks, which prevents them from fully exploiting cloud and K8s technologies: they cannot offer true on-demand usage or elastic scaling. Pulsar addressed this by building a compute-storage separation architecture on Apache BookKeeper, so that the cluster could scale elastically where traditional message queues could not. At the time (around 2019), this design was an excellent innovation in the message queue field; it pointed in the right direction and is well worth learning from.
Therefore, RobustMQ's first priority is to implement a compute-storage separation architecture.
Pluggable Storage
The third keyword is pluggable storage. Much as Pulsar uses BookKeeper as its storage engine, a current trend in the message queue industry is to use distributed storage (such as cloud object storage, data lakes, or HDFS) as the underlying storage engine. This achieves compute-storage separation and reduces storage cost. The approach takes advantage of public cloud architecture and greatly reduces both the effort of developing a storage layer and the cost of storing data. However, it brings the following problems:
- Degraded read-write performance. The industry has ideas for mitigating this, such as warming data on local high-performance disks or using a WAL, but performance still cannot reach its peak. This approach therefore suits business scenarios with large data volumes, high cost sensitivity, and tolerance for latency. Many scenarios (such as financial ones), however, require stable latency at the 10-millisecond level or lower from the message queue.
- Not all businesses are allowed to use cloud services. In many scenarios, data must be stored locally, and an approach that makes cloud storage mandatory cannot serve them.
RobustMQ hopes to be a message queue that fits all of these scenarios, meeting both cost and performance requirements and supporting multiple deployment environments.
Therefore, RobustMQ makes the storage layer pluggable and ships with built-in engines for memory, local segmented multi-replica storage, and remote distributed storage (object storage, HDFS, etc.). Businesses can choose a storage engine at the cluster level and at the Topic level to match their scenario.
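To make the idea of a pluggable storage layer concrete, here is a minimal Rust sketch. It is an illustration under stated assumptions, not RobustMQ's actual code: the names `StorageEngine`, `MemoryEngine`, `EngineKind`, and `StorageConfig` are hypothetical, and only the in-memory engine is implemented. It shows one trait that every engine would implement, plus a lookup in which a Topic-level choice overrides the cluster-level default.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction; every engine implements the same trait.
pub trait StorageEngine {
    /// Appends a record to a topic's log and returns its offset.
    fn append(&mut self, topic: &str, payload: Vec<u8>) -> u64;
    /// Reads up to `max` records starting at `offset`.
    fn read(&self, topic: &str, offset: u64, max: usize) -> Vec<Vec<u8>>;
}

/// In-memory engine: lowest latency, no durability.
#[derive(Default)]
pub struct MemoryEngine {
    logs: HashMap<String, Vec<Vec<u8>>>,
}

impl StorageEngine for MemoryEngine {
    fn append(&mut self, topic: &str, payload: Vec<u8>) -> u64 {
        let log = self.logs.entry(topic.to_string()).or_default();
        log.push(payload);
        (log.len() - 1) as u64
    }

    fn read(&self, topic: &str, offset: u64, max: usize) -> Vec<Vec<u8>> {
        self.logs
            .get(topic)
            .map(|log| log.iter().skip(offset as usize).take(max).cloned().collect())
            .unwrap_or_default()
    }
}

/// The three built-in engine families named in the text (assumed names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EngineKind {
    Memory,
    LocalSegment,
    RemoteObject,
}

/// Hypothetical configuration: a cluster-wide default plus per-Topic overrides.
pub struct StorageConfig {
    pub cluster_default: EngineKind,
    pub topic_overrides: HashMap<String, EngineKind>,
}

impl StorageConfig {
    /// A Topic-level setting wins; otherwise the cluster default applies.
    pub fn engine_for(&self, topic: &str) -> EngineKind {
        self.topic_overrides
            .get(topic)
            .copied()
            .unwrap_or(self.cluster_default)
    }
}
```

With this shape, a latency-sensitive Topic could be pinned to `LocalSegment` while the rest of the cluster defaults to `RemoteObject` for cost, and the code that reads and writes messages only ever sees the `StorageEngine` trait.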
High Cohesion Architecture
The fourth keyword is high-cohesion architecture. A major performance bottleneck and stability risk in current mainstream message queues is metadata storage (for example, Topic and node metadata). Both Kafka and Pulsar initially relied on ZooKeeper for metadata storage and distributed coordination. Kafka has since spent enormous effort removing ZooKeeper and building internal metadata storage on the Raft protocol, and Pulsar has spent significant effort supporting multiple metadata storage services. From a technical perspective, when one distributed service depends on another, it takes on significant risks and bottlenecks in functional compatibility, stability, and performance. This is why Kafka and Pulsar have been working to replace or remove their third-party metadata components.
Therefore, from the beginning of its design, RobustMQ has aimed for a high-cohesion architecture: a single binary can start a cluster without any external dependencies. This benefits long-term architectural evolution and stability, and it reduces operational costs.
Compute, Storage, and Scheduling Separation
The fifth keyword is compute, storage, and scheduling separation. A distributed cluster involves a lot of internal coordination work. Mainstream message queues that rely on a separate metadata storage service typically pick one node in the compute cluster to do this work. In RobustMQ, we moved coordination into the metadata storage service, so Brokers are pure compute nodes that are responsible only for reading and writing message data. This yields a three-layer separation of compute, storage, and scheduling. The benefit is that each layer can scale elastically, quickly and independently, without affecting the others. For example, when a cluster is deployed on K8s, whichever layer hits a bottleneck can be scaled out quickly, without the scaling problems caused by architectural coupling.
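As a rough illustration of what moving coordination out of the Brokers can look like, the Rust sketch below keeps all placement decisions in a scheduling component. Everything here is an assumption made for the example: the `Scheduler` type, the notion of named shards, and the round-robin policy are hypothetical and are not taken from RobustMQ.

```rust
use std::collections::BTreeMap;

/// Hypothetical scheduling layer. In this sketch it would live in the
/// metadata service, so no Broker has to act as a coordinator.
#[derive(Default)]
pub struct Scheduler {
    brokers: Vec<String>,
}

impl Scheduler {
    /// Registers a Broker; registering the same id twice has no effect.
    pub fn register_broker(&mut self, id: &str) {
        if !self.brokers.iter().any(|b| b == id) {
            self.brokers.push(id.to_string());
        }
    }

    /// Assigns each shard to a Broker round-robin (an assumed policy).
    /// Returns an empty plan when no Broker is registered.
    pub fn assign(&self, shards: &[&str]) -> BTreeMap<String, String> {
        let mut plan = BTreeMap::new();
        if self.brokers.is_empty() {
            return plan;
        }
        for (i, shard) in shards.iter().enumerate() {
            let broker = self.brokers[i % self.brokers.len()].clone();
            plan.insert(shard.to_string(), broker);
        }
        plan
    }
}
```

In this sketch, scaling the compute layer amounts to calling `register_broker` for the new node and recomputing the plan. The Brokers themselves hold no coordination state, which is the property that lets each layer scale without involving the others.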
Multi-Protocol
The sixth keyword is multi-protocol. The message queue industry has two distinct directions. One is messaging, which focuses on performance, functionality, and reliability and is represented by products such as RocketMQ and RabbitMQ. The other is streaming, which focuses on throughput and is represented by Kafka. As a result, businesses must choose different products for different scenarios. The key issue is that each product has its own protocol and client SDKs, so switching products means modifying business code, which greatly increases the cost of switching. With the concepts of stream-batch integration and message-stream fusion proposed in the industry in recent years, a message queue that is compatible with multiple protocols and serves different scenarios would be a great boost to enterprise efficiency, cost, and stability.
Therefore, RobustMQ will not design a new communication protocol; instead, it will be compatible with mainstream message queue protocols such as MQTT, Kafka, AMQP, RocketMQ, and even Pulsar. This lets RobustMQ focus on the kernel instead of spending effort building an ecosystem, and it lets businesses migrate at zero cost, without changing their code. In addition, once RobustMQ is mature enough, protocol compatibility can greatly accelerate its adoption.
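One way to picture protocol compatibility on top of a shared kernel is sketched below in Rust. This is a simplified illustration, not RobustMQ's implementation: `Record`, `MqttPublish`, `KafkaProduce`, and `Kernel` are hypothetical types, and the request structs stand in for already-decoded protocol messages rather than real MQTT or Kafka wire formats.

```rust
/// Protocol-neutral record kept by the kernel (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub topic: String,
    pub key: Option<Vec<u8>>,
    pub payload: Vec<u8>,
}

/// Stand-in for a decoded MQTT PUBLISH; MQTT messages carry no key.
pub struct MqttPublish {
    pub topic: String,
    pub payload: Vec<u8>,
}

/// Stand-in for a decoded Kafka produce request for a single record.
pub struct KafkaProduce {
    pub topic: String,
    pub key: Option<Vec<u8>>,
    pub value: Vec<u8>,
}

impl From<MqttPublish> for Record {
    fn from(m: MqttPublish) -> Self {
        Record { topic: m.topic, key: None, payload: m.payload }
    }
}

impl From<KafkaProduce> for Record {
    fn from(k: KafkaProduce) -> Self {
        Record { topic: k.topic, key: k.key, payload: k.value }
    }
}

/// The kernel only understands `Record`; it knows nothing about protocols.
#[derive(Default)]
pub struct Kernel {
    log: Vec<Record>,
}

impl Kernel {
    /// Accepts any protocol request that converts into a `Record`
    /// and returns the offset at which it was stored.
    pub fn publish(&mut self, request: impl Into<Record>) -> u64 {
        self.log.push(request.into());
        (self.log.len() - 1) as u64
    }

    /// Returns every record stored so far, in arrival order.
    pub fn records(&self) -> &[Record] {
        &self.log
    }
}
```

Under this design, supporting another protocol means adding one more request type with a `From` conversion, while the kernel stays unchanged. That is the sense in which the project can focus on the kernel while existing clients keep their own SDKs.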
Summary
In summary, RobustMQ's core technical philosophy is:
- Implement a high-performance message queue kernel based on Rust.
- Based on a three-layer architecture that separates compute, storage, and scheduling, let each layer scale elastically on its own, giving the system complete Serverless capabilities.
- Based on pluggable storage capabilities, meet the storage engine needs of different scenarios, such as cost or performance.
- Based on multi-protocol adaptation, be compatible with current ecosystems, focus on the kernel, and reduce business switching costs.
- High cohesion kernel, no external dependencies, one binary, one-click cluster startup.