Self-Reflection: Can We Really Become the Next Generation Messaging Infrastructure?

From the day the project started, we set a big goal for RobustMQ: to become the next-generation cloud-native and AI-native messaging infrastructure. Sounds impressive, right? But honestly, I often wonder: Can our technical roadmap really support this goal?

So in this article, I want to put RobustMQ alongside Kafka, Pulsar, NATS, Redpanda, and Iggy, and take a calm look at whether we can actually deliver. No boasting, no self-deprecation, just honest assessment.

After thinking about it for a long time, my conclusion is: The direction is probably right, but whether we can pull it off is really hard to say.

Specifically:

  • The cloud-native path — I think we can make it work, but it will take at least 2-3 years to see results
  • AI-native — to be frank, we haven't even figured out what that should look like
  • Multi-protocol unification is the core selling point, and success or failure hinges on whether we can do the Kafka protocol well
  • On performance, Rust does have advantages, but language alone isn't enough — we need data to prove it

If I had to name a critical path: First, nail the Kafka protocol → find a few early adopters → spend 3-5 years slowly building the ecosystem → focus on differentiated scenarios like IoT and big data integration. Sounds long, but that's how infrastructure software works.


I. Why We Want to Do This

Let me start with why we have this "next generation" idea.

Cloud-Native Is Indeed the Trend

Kafka's compute-storage coupled architecture makes scaling really painful, and it still depends on ZooKeeper. Pulsar did storage-compute separation, but look at its architecture — Broker, BookKeeper, ZooKeeper, that triple setup isn't easy to operate either.

In the cloud, compute and storage have different pricing and scaling models. Separating them — scale compute when needed, scale storage when needed — that makes sense. So we believe storage-compute separation is definitely the future direction.

Multi-Protocol Unification — Does Anyone Actually Want It?

We've been asked this many times. My answer: Yes, people really want it, and the demand is quite strong.

Look at enterprises today: IoT devices run MQTT, big data uses Kafka, microservices use AMQP. Maintaining so many systems is costly, and data ends up in silos everywhere.

The problem is that most existing message queues are single-protocol. Pulsar has plugins (KoP/MoP/AoP), but they still require extra maintenance. Multi-protocol unification is technically hard — different protocol semantics, performance overhead to control — but we think this direction has value.

The Rust Choice

We really deliberated on the language choice:

  • Java/JVM (Kafka, Pulsar): GC pauses cause latency spikes
  • C++ (Redpanda): great performance, but memory safety relies entirely on programmer discipline; one slip and you're in trouble
  • Go (NATS): decent, with low GC latency, but it still has GC
  • Rust: memory safety guaranteed at compile time, performance on par with C, but ecosystem and hiring are real issues

In the end we chose Rust. Not because it's perfect, but because we think it strikes a good balance between safety and performance. Of course, choosing Rust means bearing the cost of an immature ecosystem and recruiting difficulties — we're prepared for that.

"AI Native"... Honestly I Haven't Figured It Out Either

This is what keeps me up at night right now.

What AI scenarios need from message queues is clear: high throughput, low latency, flexible storage, rich connectors. But what does "AI native" actually mean? What role should message queues play in AI? Just a data pipeline? Or deeper integration with stream processing, feature engineering, model inference?

I've looked around — the industry doesn't really have clear success cases. Kafka supports AI scenarios through peripheral components like Streams and Connect, and that path has worked.


II. A Calm Look at Ourselves

Having talked about external trends, let's turn back and look at ourselves.

What We're Doing Okay On

Rust + Storage-Compute Separation

Rust gives us zero GC and memory safety, but hiring is really hard and the ecosystem is still building.

Architecturally we designed three layers: Broker (protocol routing), Journal Server (persistence), Meta Service (metadata). Compared to Pulsar, we use Raft instead of ZooKeeper, one less dependency. But as for how this architecture performs in production, its resource efficiency, and its operational complexity — honestly, we'll only know once it's really running.

For plugin-based storage, we support memory, SSD, S3, HDFS, and several other backends. Sounds flexible, but the implementation has plenty of pitfalls:

  • Performance: abstraction layers add overhead
  • Consistency: different backends expose different consistency and durability semantics (an object store behaves differently from a local disk); how do we unify them at the upper layer?
  • Failure handling: different storages fail in different ways
  • Operations: supporting this many backends multiplies testing and debugging complexity

We've gained flexibility, but the cost is real.
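To make the trade-off concrete, here is a minimal sketch of what a pluggable storage abstraction can look like. The trait and type names below are hypothetical illustrations, not RobustMQ's actual API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of a pluggable storage layer; not RobustMQ's real API.
// Each backend (memory, SSD, S3, HDFS) would implement the same trait, so the
// upper layers never need to know where the bytes actually live.
trait StorageBackend {
    /// Append a payload to a topic's log; returns the offset it was written at.
    fn append(&mut self, topic: &str, payload: Vec<u8>) -> u64;
    /// Read the payload at a given offset, if it exists.
    fn read(&self, topic: &str, offset: u64) -> Option<&[u8]>;
}

// The simplest possible backend: everything in memory.
// Useful for tests and edge scenarios.
struct MemoryBackend {
    logs: HashMap<String, Vec<Vec<u8>>>,
}

impl MemoryBackend {
    fn new() -> Self {
        Self { logs: HashMap::new() }
    }
}

impl StorageBackend for MemoryBackend {
    fn append(&mut self, topic: &str, payload: Vec<u8>) -> u64 {
        let log = self.logs.entry(topic.to_string()).or_default();
        log.push(payload);
        (log.len() - 1) as u64
    }

    fn read(&self, topic: &str, offset: u64) -> Option<&[u8]> {
        self.logs
            .get(topic)
            .and_then(|log| log.get(offset as usize))
            .map(|v| v.as_slice())
    }
}

fn main() {
    let mut store = MemoryBackend::new();
    let off = store.append("telemetry", b"hello".to_vec());
    println!("offset {} -> {:?}", off, store.read("telemetry", off));
}
```

The flexibility comes from the trait boundary: swapping SSD for S3 is a different impl behind the same interface. The cost, as noted above, is that every impl brings its own consistency, failure, and testing story.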

Single Binary

This we're quite happy with. One file does it all — convenient for development and testing, runs in edge scenarios, simple to deploy in production.

The Challenges We Face

Multi-Protocol Unification Is Really Hard

Protocol semantic mapping is the hardest part: MQTT's QoS, Kafka's partitions, AMQP's routing — these concepts are completely different. How do we unify them? How do we control the performance overhead? How do we balance flexibility with high performance?
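As one small illustration of the mapping problem, delivery guarantees from different protocols can be normalized onto a single internal model. This is a hedged sketch with made-up type names, not RobustMQ's real mapping — and note that real Kafka exactly-once additionally involves transactions, which this simplification glosses over:

```rust
// Hypothetical internal delivery-guarantee model; names are illustrative only.
#[derive(Debug, PartialEq)]
enum Guarantee {
    AtMostOnce,  // fire and forget
    AtLeastOnce, // may duplicate, never lose
    ExactlyOnce, // requires dedup / transactional machinery
}

// MQTT expresses the guarantee per message via QoS 0/1/2.
fn from_mqtt_qos(qos: u8) -> Option<Guarantee> {
    match qos {
        0 => Some(Guarantee::AtMostOnce),
        1 => Some(Guarantee::AtLeastOnce),
        2 => Some(Guarantee::ExactlyOnce),
        _ => None, // invalid QoS value
    }
}

// Kafka expresses it per producer via acks + idempotence (simplified:
// full exactly-once also needs the transactional producer).
fn from_kafka_producer(acks: i16, idempotent: bool) -> Guarantee {
    match (acks, idempotent) {
        (0, _) => Guarantee::AtMostOnce,
        (_, false) => Guarantee::AtLeastOnce,
        (_, true) => Guarantee::ExactlyOnce,
    }
}

fn main() {
    println!("{:?}", from_mqtt_qos(1));
    println!("{:?}", from_kafka_producer(-1, true));
}
```

The mapping is easy to state and hard to honor end to end: the internal model is only meaningful if the storage and replication layers actually uphold each guarantee.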

Current progress: MQTT is done, Kafka is in development, AMQP and RocketMQ are still planned.

But honestly, the Kafka protocol is way more complex than we imagined. Consumer Group, Rebalance, transactions, idempotence... each one is non-trivial. If we don't do this protocol well, multi-protocol unification is basically empty talk.

Ecosystem Gap — This Is the Most Brutal Reality

The gap becomes clear when you compare:

  • Kafka: 300+ connectors, Streams API, all kinds of tools
  • RobustMQ: 8 connectors, no stream processing yet, toolchain just starting

Pulsar took 6-7 years to build its ecosystem. We estimate 3-5 years for us. How do we keep the project alive during that time? How do we attract contributors? How do we find early adopters? These are all open questions.


III. Is the "Cloud-Native + AI-Native" Goal Realistic?

Now we get to the key part.

Cloud-Native

Architecturally we should be aligned — storage-compute separation, K8s support, single binary for fast cold starts in Serverless scenarios.

But ecosystem integration is still far behind. Service Mesh, Observability, GitOps — Kafka and Pulsar are deeply integrated, we're just getting started. And whether this architecture actually works in production still needs validation.

My assessment: The direction is probably right, but reaching Kafka/Pulsar level maturity will take at least 2-3 more years.

AI Native... I Really Don't Know What to Do

This is where I'm most lost.

What we have now are some data connectors (MySQL, MongoDB, etc.), but what's the connection to "AI native"? These connectors are essentially generic data integration, not optimized for AI scenarios.

And what should "AI native" even be? What should message queues do in AI? The industry doesn't have clear success cases either.

I'm wondering if we should change the positioning. Instead of "AI native," say "high-performance message queue optimized for AI scenarios" — focus on low latency, high throughput, flexible storage, connectors, these foundational capabilities. That would make the goal clearer.

Multi-Protocol Unification — Kafka Is Make or Break

This is our core selling point and our biggest risk.

The architecture supports multiple protocols; MQTT proves the approach works. But the Kafka protocol is far more complex than MQTT. If we don't do Kafka well, multi-protocol unification is basically dead.

If we can nail Kafka by 2026, this direction will be validated. If we only deliver a half-baked implementation, that would be awkward.

Performance — Rust Alone Isn't Enough

Rust's zero GC is an advantage; latency should theoretically be more stable. But message queue performance isn't just about the language — network, disk, concurrency model, serialization all matter.

What we're missing:

  • Benchmark comparisons with Kafka/Pulsar on the same hardware
  • Long-running P99, P999 latency data

Talking about language advantages isn't enough; we need data.
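On the latency side, the percentile math itself is simple; a real harness would use an HDR histogram and correct for coordinated omission rather than sorting raw samples. A minimal nearest-rank sketch (illustrative only, not our benchmark code) looks like this:

```rust
// Minimal nearest-rank percentile over recorded latency samples.
// Illustrative only: a production harness would use an HDR histogram
// and coordinated-omission correction instead of sorting raw samples.
fn percentile(samples: &mut [u64], p: f64) -> u64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    // Nearest-rank method: the smallest sample at or above the p-th percentile.
    let rank = (p * samples.len() as f64 / 100.0).ceil() as usize;
    samples[rank.saturating_sub(1)]
}

fn main() {
    // Pretend these are per-message end-to-end latencies in microseconds.
    let mut latencies: Vec<u64> = (1..=1000).collect();
    println!("P99  = {}us", percentile(&mut latencies, 99.0));
    println!("P999 = {}us", percentile(&mut latencies, 99.9));
}
```

The hard part isn't this arithmetic; it's collecting honest samples over long runs on the same hardware as the systems we compare against.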


IV. Mountains Still Ahead

With all these issues, what's next?

Protocol Completeness

The Kafka protocol is really complex. Consumer Group, Rebalance, transactions, idempotence... If we do a subset, power users won't be satisfied; if we do full compatibility, the cost is too high. AMQP is the same — routing model mapping, multi-language SDK compatibility, all hard problems.

Our current approach: iterate, get core functionality right first, build automated tests, put R&D resources here. But honestly, this will be a long process.

Ecosystem Building

Connectors, monitoring (Prometheus/Grafana/Jaeger), management tools (Dashboard/CLI)... all need to be built one by one.

Kafka has Confluent backing it, Pulsar has the Apache Foundation. If we can enter Apache incubation, the brand would help, but ultimately the project itself needs to be attractive.

The plan: get core functionality right first, then gradually expand, build community incentives, sustained investment. But this cycle will be long.

Production Case Studies

This is a chicken-and-egg problem: no case studies means enterprises won't dare use it; no usage means no case studies.

We're planning an early adopter program — find some startups, non-critical business, academic institutions to try it first. We'll provide full support, quick response to needs, and promote success stories actively.

The goal is to land 1-2 benchmark scenarios first — like IoT platform or real-time data pipeline — build a user community, then rely on word of mouth.

Competitive Pressure

The message queue market is already mature; Kafka is the leader. Redpanda has good performance and simple deployment, growing fast. Iggy is also written in Rust and actively developing.

We certainly can't go head-to-head with Kafka. We need to find differentiated angles: multi-protocol, IoT, edge computing.


V. What Really Matters Most?

After all this thinking, I think the key points are:

Technical Execution

This is the foundation. Get the Kafka protocol right, optimize performance, improve stability.

MQTT is done, Kafka is in development. If we don't get this right, everything else is wasted.

Market Positioning

We can't do everything. We need to find our stronghold — focus on IoT + big data integration, edge to cloud scenarios.

Going up against Kafka in every scenario would be suicide.

Ecosystem Building

Target 50+ connectors; monitoring tools, management console — all need to be built. We only have 8 connectors now, toolchain just starting.

This is a long war; 3-5 years of investment.

Production Case Studies

This is critical. With 1-2 benchmark cases, enterprise concerns will ease significantly.

The project is still early; case studies are scarce. We need to find a few willing users quickly.


Closing Thoughts

After writing this article, my feelings are mixed.

The comforting part: The technical direction is probably right. Rust, storage-compute separation, plugin-based storage, multi-protocol unification — all align with industry evolution. And as far as I can see, among open-source projects with "multi-protocol," "storage-compute separation," "Rust," "single binary," and "plugin-based storage" — we might be the only one. The differentiation is clear.

But the anxiety is also obvious: The execution difficulty is really high. Finishing MQTT was just 0 to 1; going from 1 to 10 requires completing protocols, validating performance, and finding benchmark cases — all three are essential. Going from 10 to 100 requires sustained ecosystem building and branding — at least 3-5 years — with lots of uncertainty.

For the "AI native" positioning, I think we really need to reconsider. Rather than empty claims, let's be practical: "high-performance message queue optimized for AI scenarios," focus on doing the basics well.

If you want to try RobustMQ:

  • 2025: Non-critical business can try it, especially MQTT scenarios
  • 2026: If the Kafka protocol is done, small-to-medium production environments should be viable
  • 2027+: Then consider large-scale replacement of existing systems

This isn't false modesty — it's the truth. Technical innovation takes time; ecosystem building even more so.

Kafka took a decade to become the industry standard; Pulsar took seven years to build its ecosystem. We've just started; there are countless pitfalls ahead. But precisely because it's hard, succeeding would mean something.

Every line of code, every protocol implemented, every architecture optimization — we're moving forward. The journey will be long, it will be hard, but with the right direction, perseverance, and community support, we should get there.

Thanks to everyone who follows RobustMQ, thanks to contributors, thanks to users who give feedback. Let's walk this road together.

Honestly, we're climbing the first mountain now, still at the base. But at least we've started climbing.


Project Links

If you found this article helpful, we'd appreciate a Star.


2025-01-29 / RobustMQ Team