
The Next Decade of Message Queues: Forget Kafka

I've been working in the message queue space for a long time.

Over the past few years I've watched S3 storage, Kafka compatibility, and storage-compute separation emerge one after another. Each felt promising, but each felt incomplete. I couldn't quite articulate what was missing — the engineering quality was fine, the direction was reasonable — but after reading through them I never had that feeling of "yes, this is the future."

I've recently figured out why. What's missing isn't the implementation; it's the starting point of the thinking. Our understanding of what a "message" is has been too narrow.

It's not that people haven't thought about it — it's that the scenarios hadn't arrived yet. When the scenarios aren't there, you can't think about them. Now they're here.


Kafka Defined How We Think

Kafka was open-sourced by LinkedIn in 2011 to solve a very specific problem: large-scale log collection and real-time data pipelines. Append-only log, multi-replica, partition, consumer group — this model perfectly matched the core demand of the big data era: moving massive amounts of data from A to B, without loss, without disorder, fast enough.

And then this model won. It won completely.

It didn't just become one product's design choice; it became the default mental model for the entire industry — what a messaging system "should" look like. Today, when you talk to any engineer about message queues, the first image that comes to mind is almost certainly: topic, partition, offset, consumer group.

There's nothing inherently wrong with this. Kafka's success is well-deserved; it truly reached the limit of what's possible in its own scenario. The problem is that when a model wins too completely, it calcifies into a way of thinking that locks everyone — including later competitors — into the same framework.


S3 and the Data Lake Narrative: Running Out of Road in the Right Framework?

What has been the hottest direction in the message queue space over the past few years? S3 storage.

AutoMQ moved Kafka's storage layer to object storage, dramatically cutting costs. WarpStream went further, writing data directly to S3 without local disk at all. Confluent itself is working on Freight, taking the same path. StreamNative is doing Pulsar + Iceberg integration, landing message data directly in a Lakehouse to eliminate intermediate movement — a different posture, but at its core still optimizing the efficiency and cost of moving data from A to B. Redpanda rewrote Kafka in C++, pushing performance to the limit. Pulsar did storage-compute separation early on, with BookKeeper and tiered storage.

The engineering quality of these products is high, and the problems they solve are real. But if you step back and look, you notice something interesting: everyone is competing under the same assumption, just optimizing along different dimensions.

Redpanda optimizes performance — same model, faster with C++. AutoMQ optimizes cost — same model, cheaper with S3. Pulsar optimizes elasticity — same model, more flexible with storage-compute separation. Confluent optimizes ecosystem — same model, better upstream and downstream integration.

Nobody questions the model itself.

Why is everyone focused on S3? Because if you accept Kafka's mental model — messages are an append-only log, and the core capability is throughput and reliability — then the only optimization path left is storage cost. Local disk is expensive; S3 is cheap; the logic is complete.

But this is also the ceiling of the S3 narrative: it can only tell a story about cost and latency.

"We're 80% cheaper than Kafka" — not bad. But the trade-off is tail latency going from 50ms to 100ms. Cheaper, but slower. You finish reading and still feel like something is missing. What's missing? An answer about the future. The S3 narrative is always caught in a trade-off between cost and latency. An 80% cost reduction solves today's problems; it doesn't answer what tomorrow's communication needs will look like.

This isn't to say the S3 direction is wrong. Within Kafka's evaluation framework, this path has been walked very correctly. But precisely because it has been walked so correctly, it exposes a deeper problem: when you've optimized cost and performance to their limits within a fixed model, where do you go next?

The answer is: within this model, there is no next step.


The Logic of Selling Iron

There's a deeper layer of lock-in.

Kafka's business model is essentially selling resources. More traffic, more partitions, more brokers, more disk — more revenue. Confluent Cloud charges by throughput and storage. Redpanda does too. AutoMQ has brought the unit price down with S3, but the commercial logic hasn't changed — when users scale up and consume more, you make money.

This is the "selling iron" logic: your revenue scales proportionally with the physical resources users consume. High volume, high traffic — that's how you earn.

In the internet scenarios of the past decade, this logic was entirely valid. Enterprise log volumes, transaction flows, real-time data warehouses — data only ever grew, and Kafka's revenue moved in lockstep with the internet growth curve.

But this logic locked the entire industry's attention on "volume." Everyone was asking: how do we handle more throughput? How do we store more data? How do we reduce the cost per GB? Because those questions directly corresponded to revenue.

The result: we kept optimizing the efficiency of moving data, while fewer and fewer people asked what we were actually moving, who we were moving it for, and what the essential needs of communication actually are.

I don't think this is wrong. In the age of "volume," these were the right things to do. But I don't think this is the future.


Communication Comes in More Than One Form

If you step outside Kafka's mental model and take a fresh look at "communication," you find its forms are far richer than an append-only log.

Mailbox. I send you a message; you're offline; the message waits; when you come online you collect it yourself. You can delete it after reading, or keep it. This is the oldest async communication model — email works exactly this way. But in Kafka's world, to implement this semantic you need to create a topic per person, configure retention, manage offsets — fine for 10 people, a disaster for 100,000.

Broadcast. I shout something out; whoever hears it handles it; I don't care who heard. No persistence needed, no acknowledgment, no replay. MQTT's QoS 0 is this semantic; NATS's core pub/sub is too. But if all you have is Kafka, you need to create a topic, configure a consumer group — even if the message's lifetime is only a few seconds.

Latest value. I only care about the current value of some state; historical values mean nothing to me. Sensor temperature, device online status, service health — what's needed is a key-value overwrite, not an append-only write. Kafka has compacted topics as an approximation, but that's a patch on the log model.
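The difference between the two semantics fits in a few lines of Python (an illustration only, not any real system's API):

```python
# Latest-value semantics: a key-value overwrite keeps exactly one
# current value per key, unlike an append-only log that keeps history.

append_log = []   # Kafka-style: every write is retained
latest = {}       # latest-value: each write overwrites the previous one

for key, value in [("sensor-1", 21.5), ("sensor-1", 21.7), ("sensor-1", 21.9)]:
    append_log.append((key, value))  # the log grows with every update
    latest[key] = value              # the map holds only current state

print(len(append_log))     # 3 entries to store (and compact later)
print(latest["sensor-1"])  # 21.9 — the only value the reader cares about
```

A compacted topic eventually converges to the second shape, but only after background compaction over the first.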

Request-response. I send a request; I wait for a response; timeout means retry. NATS natively supports request-reply; AMQP can implement it with reply-to. Kafka is inherently poor at this — it was designed for async streams, not sync interaction.

Ephemeral channel. Two participants temporarily establish a communication link, discard it when done. No persistent infrastructure needed — what's needed is extremely low creation cost and automatic lifecycle management.

These communication forms have always existed; they were just marginalized in the Kafka-dominated narrative.


AMQP and NATS: Two Underrated Paths

It's worth looking back at two protocols that were undervalued in Kafka's shadow.

AMQP got one thing right in 2006: separating routing from storage. Exchanges handle routing logic — direct, fanout, topic, headers, each an independent routing strategy. Queues handle storage and consumption. The value of this separation is that routing strategies can be flexibly combined: the same message can be delivered to different places based on different rules, without copying it multiple times.
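The value of the split can be seen in a toy in-process model (not real AMQP, and not the RabbitMQ client API; the class and method names here are invented): the producer publishes once, bindings decide where the message lands.

```python
# Toy model of AMQP's exchange/queue separation: the exchange owns
# routing logic, queues own storage, and bindings connect the two.

class Exchange:
    def __init__(self, kind):
        self.kind = kind       # "direct" or "fanout"
        self.bindings = []     # (routing_key, queue) pairs

    def bind(self, queue, routing_key=""):
        self.bindings.append((routing_key, queue))

    def publish(self, message, routing_key=""):
        for key, queue in self.bindings:
            # fanout ignores keys; direct requires an exact match
            if self.kind == "fanout" or key == routing_key:
                queue.append(message)

errors, everything = [], []
ex = Exchange("direct")
ex.bind(errors, "error")
ex.bind(everything, "error")
ex.bind(everything, "info")

ex.publish("disk full", "error")  # delivered to both queues
ex.publish("started", "info")     # delivered to `everything` only

print(errors)      # ['disk full']
print(everything)  # ['disk full', 'started']
```

The producer never copied anything; the same publish reached different places because the routing rules, not the sender, decided delivery.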

RabbitMQ lost on performance against Kafka, but its model is far richer in flexibility. Unfortunately, Erlang's performance ceiling and the limitations of a single-machine architecture led many to conflate the value of the AMQP model with the implementation limitations of RabbitMQ. The protocol's design ideas were sound; the implementation just didn't keep up.

NATS took a different path: extreme simplicity. Core NATS is pub/sub + request/reply, no persistence — messages are gone once sent. In Kafka's evaluation framework, this is practically a "toy" — no message persistence? What kind of message queue is that?

But the problem NATS set out to solve was never data pipelines; it was service-to-service communication. It wanted extremely low latency, an extremely simple protocol, and extremely low resource footprint. JetStream was added later to fill in persistence, but NATS's soul isn't in JetStream — it's in the lightness and directness of core pub/sub.
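The request-reply pattern NATS makes native can be mimicked with a minimal in-process bus (a sketch under invented names, not the NATS client API): the requester subscribes to a throwaway reply subject, sends, and waits with a timeout.

```python
import queue
import uuid

# Minimal in-process bus with NATS-style subjects; all names invented.
subscriptions = {}  # subject -> list of handlers

def subscribe(subject, handler):
    subscriptions.setdefault(subject, []).append(handler)

def publish(subject, data, reply_to=None):
    for handler in subscriptions.get(subject, []):
        handler(data, reply_to)

def request(subject, data, timeout=1.0):
    inbox = "_INBOX." + uuid.uuid4().hex     # throwaway reply subject
    box = queue.Queue()
    subscribe(inbox, lambda d, _r: box.put(d))
    publish(subject, data, reply_to=inbox)
    return box.get(timeout=timeout)          # raises queue.Empty on timeout

# A responder service: echoes requests back uppercased.
subscribe("greet", lambda d, r: publish(r, d.upper()))

print(request("greet", "hello"))  # HELLO
```

Note what is absent: no topic creation, no persistence, no offsets. The channel costs one generated subject name and disappears with the caller.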

These two protocols — flexible routing and extreme simplicity — didn't get the attention they deserved in the Kafka-dominated decade. But the changing landscape of scenarios is bringing them back to center stage.


The Agent Era: The Scenario Changed, the Model Must Follow

Huang Dongxu recently wrote an article about AI Agent infrastructure software with a striking data point: more than 90% of the new TiDB Cloud clusters created each day are created directly by AI Agents.

This is not a prediction; it is already happening.

The way Agents use infrastructure is completely different from human developers. Agents are temporary — they may live for only a few seconds. They are numerous — an orchestration system runs thousands of Agents simultaneously. They are autonomous — they don't deliver messages to pre-defined topic structures; they need to build communication relationships dynamically.

Put these characteristics into Kafka's model and every one of them is friction. Creating a topic for an Agent that will live for 5 seconds, configuring 3 replicas, allocating partitions — the Agent dies and you still have to clean up. Ten Agents is fine; 100,000 Agents is a systemic burden.

The deeper change is the "value density" of messages. In a data pipeline, every message is a business event; if it's lost, data is lost. In Agent communication, a large proportion of messages are temporary, retryable, one-time. Agents can resend, retry with a different Agent, or try a different strategy.

High reliability, multi-replica, zero loss — the core metrics of the last era become unnecessary cost in this scenario.

This isn't to say reliability is no longer important; rather: not every communication needs the same level of reliability. Choice matters more than absolute guarantees.

Some messages need three-replica persistence — critical event logs, audit records. Some messages only need to live in memory for a few seconds — coordination signals between Agents. Some messages need temporary persistence — wait for the other party to come online for delivery, then auto-clean. A good communication infrastructure should let users choose based on their needs, not treat every message at the highest specification.
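Sketched as an API surface (every name here is hypothetical), the choice could be a per-message parameter rather than a cluster-wide default:

```python
from enum import Enum

class Durability(Enum):
    MEMORY = "memory"           # coordination signals: gone on restart
    TEMP = "temporary"          # mailbox delivery: persist until read or TTL
    REPLICATED = "replicated"   # audit events: three-replica persistence

def send(message, durability=Durability.MEMORY, ttl=None):
    # A real system would pick a storage path here;
    # this sketch only records the caller's choice.
    return {"message": message, "durability": durability.value, "ttl": ttl}

print(send("agent-42 finished step 3"))                       # in-memory default
print(send("order 1007 paid", Durability.REPLICATED))         # audit trail
print(send("wake up when online", Durability.TEMP, ttl=3600)) # mailbox with TTL
```

The point is not this particular API shape; it's that the durability decision moves from the system's configuration to the message itself.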


The Underlying Layer Must Be Atomic

At this point, we can discuss a deeper judgment.

The messaging systems of the past decade have followed "large model" thinking — one append-only log solves everything. Where the scenario doesn't fit, you adapt with configuration and parameters. The S3 storage direction is also performing surgery on this large model — swapping out the disk underneath, leaving everything above unchanged.

I believe the direction of the next decade is the opposite: the underlying layer must be atomic and fine-grained; the upper layer composes and packages capabilities by scenario.

What does "atomic" mean?

Storage is atomic — a message's storage can be in-memory, on local disk, or in object storage, determined by the scenario, not mandated by the system.

Routing is atomic — point-to-point delivery, fan-out broadcast, rule matching: each routing logic exists independently and can be combined on demand.

Lifecycle is atomic — a message's TTL, cleanup policy, and persistence level can differ per message.

Tenancy is atomic — the cost of creating a communication entity approaches zero, as does destroying one.

How does the upper layer compose?

Need Kafka semantics — compose with persistent storage + ordered routing + offset management. Need mailbox semantics — compose with temporary persistence + point-to-point routing + TTL. Need broadcast semantics — compose with in-memory routing + subscription matching. Need request-response semantics — compose with ephemeral channels + timeout mechanism.

Not one model to rule them all — a fine enough lower layer, assembled on demand at the top.
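One way to picture the composition (pure illustration; every name below is hypothetical) is primitives as small objects that the upper layer wires together per scenario:

```python
import time

class MemoryStore:
    """Atomic storage primitive: in-memory, with optional per-item TTL."""
    def __init__(self):
        self.items = []

    def put(self, msg, ttl=None):
        expiry = time.time() + ttl if ttl else None
        self.items.append((msg, expiry))

    def drain(self):
        # Collect live messages, silently drop expired ones.
        now = time.time()
        live = [m for m, e in self.items if e is None or e > now]
        self.items.clear()
        return live

def mailbox(store):
    """Mailbox semantics = temporary persistence + point-to-point + TTL."""
    return {
        "send": lambda msg: store.put(msg, ttl=60.0),
        "collect": store.drain,
    }

def broadcast(subscribers):
    """Broadcast semantics = in-memory routing + subscription matching."""
    return lambda msg: [s(msg) for s in subscribers]

box = mailbox(MemoryStore())
box["send"]("hi, read when online")
print(box["collect"]())  # ['hi, read when online']

heard = []
shout = broadcast([heard.append, heard.append])
shout("deploy done")
print(heard)             # ['deploy done', 'deploy done']
```

Two different semantics, zero new infrastructure: both are thin compositions over the same primitives, which is the whole argument for keeping the bottom layer atomic.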

This is the same direction as the evolution of the database space: from "one Oracle solves everything" to Redis for caching, Elasticsearch for search, ClickHouse for analytics, TiDB for HTAP. The difference is that for messaging, the better path may not be building a separate product for each scenario, but using an atomic-enough underlying capability to compose solutions for different scenarios.


The Evaluation Framework Must Change Too

If the underlying assumptions change, the evaluation framework must change with them.

Over the past decade, messaging systems competed on: how many messages per second? What's the end-to-end latency? How many nines of availability? These metrics measure "volume" capability — how much data you can move.

The next decade may need to add another set of metrics: how many independent communication entities can the system simultaneously support? What is the cost of creating and destroying one communication entity? How much work does it take to switch from "three-replica persistence" to "pure in-memory"? How many communication modes can the system serve simultaneously without downtime?

The first set measures "volume"; the second measures "variety."

My judgment is: the core competitiveness of the next generation of messaging infrastructure lies in "variety," not "volume."

The "volume" problem has been solved well enough by Kafka and its successors. On the path of append-only log plus multi-replica, S3 storage is probably the last major optimization space; the marginal returns after that will be increasingly small.

The real incremental space lies in "variety" — in those communication scenarios that Kafka's model doesn't cover, that are growing rapidly, and that have no good answer today.


In Closing

This post didn't provide specific technical solutions, because what I wanted to say wasn't "what should be done" but "how to think about it."

The progress in the message queue space over the past decade has been enormous; the maturity of the Kafka ecosystem is impressive. The S3 storage direction also solves real problems. But when scenarios undergo fundamental change, the success of the previous generation often becomes the cognitive shackle of the next.

Append-only log is not the whole of communication. High throughput, high reliability, and low latency are not the only metrics that matter. Optimizing storage cost to its absolute minimum is not the same as finding the future.

The more atomic the underlying layer, the freer the upper layer.

Getting these things straight matters far more than choosing which product to use.
