The State and Future of Agent Communication: Where mq9 Fits
What mq9 is trying to do, define a standard protocol for asynchronous Agent communication, sounds valuable. But a few questions need answers first: why can't existing infrastructure evolve to cover it, and not just message queues but every other category of infrastructure too? How real is the developer pain today? And is this a need that only exists in the Agent era?
This post works through all of these angles.
Is Mailbox Communication a New Problem?
No. This need has always existed.
What came before Agents? Microservices. The problem of asynchronous communication between microservices is fundamentally the same as Agent communication — sender and receiver don't need to be online at the same time, messages can't be lost, there needs to be priority, there needs to be broadcast support.
How did the microservices era handle it? Every team stitched their own solution — a bit of RabbitMQ here, a bit of Kafka there, some Redis, some HTTP callbacks. It worked, but there was never a unified, simple standard.
Why did no one build one? Because microservices had long enough lifecycles and stable enough uptime that HTTP plus retries mostly sufficed. The pain wasn't painful enough to demand a dedicated solution. Teams gritted their teeth, stitched things together, and moved on.
Agents amplified that pain by ten. Lifecycle went from months to seconds. Uptime went from stable to unpredictable. Numbers went from dozens to tens of thousands. The "grit and stitch" approach that barely worked for microservices collapses entirely with Agents — HTTP calls fail, Redis messages drop, Kafka is too heavy to spin up on demand.
So it's not "mailbox communication is only needed in the Agent era." It's "the Agent era made mailbox communication impossible to ignore." The need was always there. It just wasn't painful enough before. Now it's painful enough that someone needs to solve it specifically.
mq9 is not solving a brand-new problem. It's solving an old problem that has finally reached a breaking point.
Can Message Queues Evolve Into an Agent Mailbox?
Let's go through them one by one.
Kafka / Confluent
For Kafka to handle Agent mailbox communication, it would need to layer on top of its append-only log: automatic creation and teardown of temporary topics, priority queues (which don't exist today), TTL-based auto-cleanup, and lightweight identity management.
Someone will say: Kafka evolved from log collection to stream processing, event sourcing, and CQRS. Each time, people said "Kafka isn't suited for this." Each time, Kafka proved them wrong. Why not Agent communication?
The difference is that every previous evolution was a natural extension of the same model — high-throughput persistent data streams, the append-only log assumption unchanged. But an Agent mailbox requires temporary channels, priority queues, and auto-cleanup. That's not an extension; it conflicts with the existing model. Kafka can add these features, but after doing so, Kafka would have two mental models: data streams and mailboxes. Two mental models in one product is a burden for users and a maintenance cost for the engineering team.
The more concrete problem is implementation cost. Adding mailbox semantics isn't a few lines of code — it means changing the storage model, adding priority queues, adding dynamic topic lifecycle management, adding TTL cleanup. The result would be two near-independent systems coexisting in one process. Confluent's engineering team won't take that on. Not because it's technically impossible, but because it's the wrong thing to do, and doing it right would mean building something essentially new.
The more likely path is Confluent building an Agent communication framework on top, wrapping Kafka underneath. It would work, but it would be heavy — and Kafka's business model is selling throughput and storage. Lightweight Agent communication doesn't fit that revenue model.
NATS
NATS is the closest to mq9. Core pub/sub is already there. Subject-based addressing is already there. Queue groups are already there.
NATS folks will say: add an Agent communication feature set on top of JetStream, ship an official nats-agent SDK that wraps the four command words, and the user experience is just as simple as mq9. You say we "won't deeply customize for one scenario" — but if the scenario is big enough, why wouldn't we?
That pushback has merit. But the question is how, specifically. JetStream streams are pre-created, long-lived, and offset-based. Mailboxes are dynamically created, temporary, and have no offsets. Building mailboxes on JetStream means either one stream per mailbox (expensive to create, and JetStream is not designed for millions of streams) or all mailboxes sharing a stream with filter subjects (which loses isolation and makes TTL and cleanup complex). An SDK can wrap an API, but it can't abstract away a mismatch in the underlying model.
To do it right, NATS would also need to open a new path alongside JetStream — storage and lifecycle management designed specifically for mailbox semantics. That's roughly the same cost as building it from scratch.
The second practical issue is that NATS is general-purpose infrastructure. It serves all users. The more likely outcome is generic features (more flexible TTL, priority support) rather than a purpose-built namespace like $mq9.AI.* with deep mailbox semantics.
Pulsar
Pulsar has delayed messages, TTL, and multi-tenancy — a better foundation than Kafka. But its core abstraction is still topic/subscription/cursor, designed for persistent message streams.
It faces the same "wrong thing to do, and doing it right means starting over" problem. StreamNative's current direction is Kafka compatibility and Iceberg integration. Agent communication is not on the roadmap. Pulsar's ecosystem is contracting, and the motivation is low.
Redis
Redis actually has the best shot from a different angle. It has pub/sub, Streams, TTL, and lightweight data structures.
Someone could define an Agent mailbox convention on top of Redis — Streams for persistent mailboxes, pub/sub for broadcast, Sorted Sets for priority. Technically, you can piece it together.
But what you end up with is a collection of Redis commands, not a unified semantic, not a protocol-level agreement. Every team's implementation would differ. And Redis is an in-memory database — the cost structure is wrong. As Agent communication scales, memory won't hold.
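A taste of that patchwork. One plausible convention (entirely illustrative, not from any real codebase) packs priority into a Sorted Set score as `priority * 10^13 + timestamp`, so that ZPOPMIN returns the most urgent, oldest message. Simulated here with a TreeMap standing in for Redis; a real implementation would issue ZADD and ZPOPMIN, and the next team would invent a different score encoding, which is exactly the "no unified semantic" problem.

```java
import java.util.Map;
import java.util.TreeMap;

// Simulated Sorted-Set priority mailbox. The score packs priority and arrival
// time so that iteration order matches ZPOPMIN order. The encoding is one
// team's invention; nothing in Redis itself makes it a shared semantic.
public class RedisStyleMailbox {
    private final TreeMap<Long, String> sortedSet = new TreeMap<>(); // score -> member
    private long clock = 0; // stand-in for a millisecond timestamp

    // ZADD inbox <score> <payload>  (lower priority value = more urgent)
    public void send(int priority, String payload) {
        long score = priority * 10_000_000_000_000L + (clock++);
        sortedSet.put(score, payload);
    }

    // ZPOPMIN inbox
    public String receive() {
        Map.Entry<Long, String> e = sortedSet.pollFirstEntry();
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        RedisStyleMailbox box = new RedisStyleMailbox();
        box.send(2, "later");
        box.send(0, "urgent");
        System.out.println(box.receive()); // the lower priority band drains first
    }
}
```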
RabbitMQ
RabbitMQ's AMQP model is the most flexible in terms of routing — exchanges for routing, queues for storage, naturally layered. It has priority queues, TTL, and reply-to.
But performance ceilings and single-node architecture are hard limits, and Erlang's constraints are real. The community's attention is on chasing Kafka compatibility (AMQP 1.0, Stream plugin), not the Agent scenario.
Not Just Message Queues: Can Other Infrastructure Grow a Mailbox?
Mailbox communication requires four capabilities simultaneously: persistent storage, real-time push, broadcast/subscribe, and lifecycle management. Let's look across all categories of infrastructure.
Databases
A table can serve as a mailbox — mail_id, priority, payload, created_at, ttl. Writing a message is INSERT, receiving is SELECT + DELETE, priority is ORDER BY, TTL is a scheduled cleanup job. Many teams do exactly this today — polling a database.
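The table-as-mailbox pattern in miniature. This is a minimal in-memory sketch (plain Java standing in for the SQL table; the class and method names are illustrative): receive pops the highest-priority, oldest row, mirroring `SELECT ... ORDER BY priority, created_at LIMIT 1` followed by `DELETE`, and skips expired rows the way the scheduled TTL job would.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.Optional;
import java.util.PriorityQueue;

// In-memory stand-in for the mail table: mail_id, priority, payload, created_at, ttl.
public class TableMailbox {
    record Mail(long id, int priority, String payload, Instant createdAt, long ttlSeconds) {}

    // Lower priority value = more urgent; ties broken by age (ORDER BY priority, created_at).
    private final PriorityQueue<Mail> table = new PriorityQueue<>(
            Comparator.comparingInt(Mail::priority).thenComparing(Mail::createdAt));
    private long nextId = 0;

    // INSERT
    public void send(int priority, String payload, long ttlSeconds) {
        table.add(new Mail(nextId++, priority, payload, Instant.now(), ttlSeconds));
    }

    // SELECT ... ORDER BY priority, created_at LIMIT 1, then DELETE the row.
    public Optional<String> receive() {
        Mail m;
        while ((m = table.poll()) != null) {
            boolean expired = Instant.now().isAfter(m.createdAt().plusSeconds(m.ttlSeconds()));
            if (!expired) return Optional.of(m.payload());
            // expired rows are what the scheduled TTL cleanup job would sweep away
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TableMailbox box = new TableMailbox();
        box.send(5, "routine report", 3600);
        box.send(1, "urgent alert", 3600);
        System.out.println(box.receive().orElse("empty")); // the urgent message first
        System.out.println(box.receive().orElse("empty"));
    }
}
```

The storage half really is this simple; what the sketch cannot show is the missing push half, which is the point of the next paragraphs.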
Someone will say: Postgres has LISTEN/NOTIFY, Supabase built real-time subscriptions on top — you can't say databases have "no push."
That's fair, and deserves a precise answer. Postgres LISTEN/NOTIFY is connection-scoped and not persistent: if the listener's connection drops, notifications sent in the meantime are simply lost, the same problem as Redis pub/sub. Supabase's real-time subscriptions add a WebSocket layer plus change data capture on top, which is essentially bolting a pub/sub engine onto the database, and that proves the point: databases can't do it alone; you have to add something. Once you've added all that, the result is no simpler than just using a message queue.
Broadcast and wildcard subscriptions also lack native support. Databases can grow the storage half of a mailbox, but not the real-time push and broadcast half.
etcd / ZooKeeper / Nacos
Distributed coordination services. Core capability is strongly consistent KV storage plus Watch change notifications. The Watch mechanism is close to a mailbox's "real-time push" — subscribers are notified immediately when a key changes, and etcd Watch even supports prefix matching.
Someone will say that etcd Watch plus Lease TTL together cover more scenarios than you'd expect. True — but there are a few hard limits you can't get around. These systems are designed for strong consistency; every write goes through consensus, write throughput is low, and they can't handle high-frequency messages. They are optimized for "low-write, high-read" configuration management. A key holds only the latest value — there's no concept of "five unread messages in the mailbox." No broadcast, no queue groups, no priority. Storage capacity is limited; etcd's official guidance is to keep data under a few GB.
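The "latest value only" limit is easiest to see side by side. This sketch (plain Java, no real etcd client; key names invented) simulates what happens when five messages arrive for an offline Agent: the KV store keeps only the last write, while a mailbox accumulates all five unread messages.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Simulation of the semantic gap: a KV store holds only the latest value per
// key, while a mailbox retains every unread message. (No real etcd involved.)
public class KvVsMailbox {
    static Map<String, String> kv = new HashMap<>();   // etcd-style: last write wins
    static Queue<String> mailbox = new ArrayDeque<>(); // mailbox: nothing is dropped

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            kv.put("agent/42/inbox", "msg-" + i); // each put overwrites the previous value
            mailbox.add("msg-" + i);              // each send is retained until read
        }
        System.out.println("KV sees:      " + kv.get("agent/42/inbox")); // only msg-5 survives
        System.out.println("Mailbox sees: " + mailbox.size() + " unread"); // all five
    }
}
```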
Having a capability and having enough of it are different questions. Systems like etcd can grow "state awareness" but not a complete mailbox.
Time-Series Databases
InfluxDB, TDengine, TimescaleDB. Core capability is efficient write and query by timestamp. The storage part is there — messages written by time, TTL natively supported via retention policies, write throughput typically high.
But like databases, there's no real-time push — purely pull-based. And the query model doesn't match: time-series databases aggregate over time ranges; a mailbox needs "take the earliest unread message by priority."
Time-series databases can serve the archival and analytics half of a mailbox, not the real-time communication layer.
S3 / Object Storage
S3 can store messages, using prefixes to simulate mailbox directory structure, TTL via lifecycle policies. But there's no real-time notification (Event Notification latency is too high), no pub/sub, no priority. Suitable as an archival layer, not a communication layer.
RPC Frameworks (gRPC, Dubbo)
RPC is synchronous and point-to-point — both parties must be online. That's the exact opposite of "async, offline-deliverable" mailbox semantics. To build a mailbox on top of RPC, you'd add a persistent queue as an intermediary — at which point RPC is just a serialization protocol and all the communication capability comes from that queue. Backwards.
Enterprise Messaging (IBM MQ, etc.)
IBM MQ has been around since 1993. It has the most complete feature set — persistent queues, priority, dead-letter queues, transactional messages, point-to-point, pub/sub, TTL, security. IBM MQ's queue model is naturally a mailbox.
So why doesn't anyone use it for Agent mailboxes? Too heavy — deployment and operations require dedicated MQ administrators. Too expensive — commercial licensing priced per CPU core. Too closed — completely out of place with Docker, Kubernetes, and cloud-native tooling. TIBCO, Oracle AQ, and Azure Service Bus follow similar logic: capable, but locked into specific enterprise ecosystems or cloud platforms.
IBM MQ can do mailboxes. It solves problems for banks and airlines, not for AI Agents.
Application Frameworks (Spring, Django)
Frameworks don't do communication themselves — they integrate communication components. Mailbox capability depends entirely on what's integrated underneath. A framework can wrap a mailbox API to hide implementation details, but the underlying problems remain.
The Full Picture
| Category | Storage | Push | Broadcast | Lifecycle Management | Can Do Mailbox? |
|---|---|---|---|---|---|
| Relational DB | ✓ | Limited (LISTEN/NOTIFY) | ✗ | Manual | Storage yes, push unreliable |
| Redis | ✓ | ✓ | ✓ | ✓ | Patchwork works, unreliable, costly |
| S3 / Object Storage | ✓ | ✗ | ✗ | ✓ | Archival only |
| etcd / ZK / Nacos | ✓ (small) | ✓ (Watch) | ✗ | ✓ | State awareness only |
| Time-Series DB | ✓ | ✗ | ✗ | ✓ | Archival and analytics only |
| RPC Frameworks | ✗ | ✓ (sync) | ✗ | ✗ | Completely unsuited |
| IBM MQ et al. | ✓ | ✓ | ✓ | ✓ | Capable, but too heavy and expensive |
| Kafka | ✓ | ✓ | Requires config | Requires ops | Model mismatch, high migration cost |
| NATS / RabbitMQ | ✓ | ✓ | ✓ | ✓ | Closest, lacks scenario focus |
This table simplifies — each system's capability is a spectrum, not a switch. But the position on that spectrum doesn't change the conclusion: nothing sits at "production-ready and cost-appropriate" across all four capabilities mailbox communication requires.
Message queues are the only infrastructure category that naturally has storage plus push plus subscribe together. Mailboxes emerging from message queues is not a coincidence — it's because the capability model of message queues overlaps most closely with the needs of a mailbox.
Someone will challenge this: you chose the mailbox metaphor, then used mailbox requirements to evaluate all infrastructure, and of course concluded that message queues fit best. That's circular reasoning.
That's an interesting challenge. Mailbox is not the only possible abstraction for Agent communication — the blackboard model (shared state), event streams (Event Sourcing), and graph structures (relationship as communication) are all viable directions. But the mailbox doesn't need to be the only correct abstraction. It just needs to be good enough. The blackboard model introduces consistency problems. Event streams are Kafka's approach — too heavy for ephemeral Agents. Graph structures describe relationships, not communication mechanisms. The mailbox covers the basic communication needs: async, offline-deliverable, priority, broadcast. Agents can compose eight distinct scenarios on top of a mailbox, which shows the abstraction has enough expressive power. Good enough is enough. It doesn't need to be perfect.
The Common Conclusion Across All the Analysis
Whether we're talking about message queues or other infrastructure, solving Agent mailbox communication runs into the same dilemma: either the model doesn't match and migration cost is prohibitively high (Kafka / Pulsar / databases), or capabilities are closest but there's no scenario focus (NATS / RabbitMQ), or the cost structure is fundamentally wrong (Redis), or it's too heavy and expensive (IBM MQ), or capabilities are incomplete (etcd / time-series / S3).
When I say "can't do it," I don't mean technically impossible. I mean the cost of building it on an existing architecture is high enough that you might as well build it from scratch. Adding mailbox semantics to Kafka would result in two near-independent systems coexisting in one codebase. NATS would need to open a new path alongside JetStream at roughly the same cost as starting fresh. Not impossible. Just the wrong thing to do, and doing it right means building something essentially new.
No one will bear the cost of building something essentially new specifically for Agent communication. Because they are all general-purpose infrastructure serving all scenarios. Before this scenario is validated, no mature product will deeply customize for it.
When Agent communication becomes an undeniable need, they will follow. But the likely form is patches on existing architecture, not a redesign. Patches work. They won't be better than something built for the purpose from the start.
If I had to name the biggest technical possibility, it would be NATS officially supporting this scenario — the foundation is the best, the distance is the shortest, and if the NATS team decided to do it, they would have the best shot.
But even that would be a good thing. That's what makes the technology world interesting — different people pushing the same problem from different angles benefits everyone. mq9 has defined this direction first. If NATS follows, that means the direction is right. If NATS does it better, developers win. If mq9 fits the scenario better, developers also win. Competition itself is progress. What's worrying is not others coming to solve the same problem — it's the problem not mattering to anyone.
How Developers Solve Agent Communication Today
Stepping away from the infrastructure vendor perspective — here is what the real experience looks like for developers in the field.
A typical multi-Agent project unfolds something like this.
Day one, HTTP calls. Agents need to communicate. The most natural choice is writing a few endpoints — A calls B's API. It runs, but soon you realize that when B is offline, requests just fail. You add retries. Retries need idempotency. Idempotency needs timeout handling. A simple "send a message" requirement turns into three hundred lines of glue code.
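The three hundred lines start out looking like this (a schematic with invented names, where `deliver` stands in for the actual HTTP POST): a retry loop on the sender, an idempotency key so retries don't double-process, and a dedup store on the receiver. Real code would further accrete timeouts, backoff jitter, and a dead-letter path.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.function.Predicate;

// Schematic of the glue code an HTTP-only design accretes around "send a message".
public class HttpGlue {
    private final Set<String> seen = new HashSet<>(); // receiver-side dedup store

    // Receiver: idempotent handler. A retried message must not be processed twice.
    boolean handle(String idempotencyKey) {
        return seen.add(idempotencyKey); // false = duplicate, skip processing
    }

    // Sender: retry loop around a possibly-failing call, same key on every retry.
    boolean sendWithRetry(Predicate<String> deliver, int maxAttempts) {
        String key = UUID.randomUUID().toString();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (deliver.test(key)) return true; // delivered (possibly as a duplicate)
        }
        return false; // receiver offline the whole time: the message is simply lost
    }

    public static void main(String[] args) {
        HttpGlue glue = new HttpGlue();
        int[] calls = {0};
        // Simulated flaky endpoint: fails twice, then succeeds.
        boolean ok = glue.sendWithRetry(k -> ++calls[0] >= 3 && glue.handle(k), 5);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```

Note the last return: even with all this machinery, a receiver that stays offline past the retry budget loses the message, which is precisely what a mailbox exists to prevent.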
Week two, bring in Redis. Fed up, you use pub/sub for real-time communication, a List for the task queue, key+TTL for state storage. Three Redis data structures duct-taped into a barely functional communication layer. Then you discover that pub/sub doesn't persist — if an Agent restarts, messages are gone. You switch to Redis Streams, and now you're managing consumer groups, ACKs, and backlogs. Redis has gone from a cache to the most complex component in the project.
Week three, someone says "use Kafka." Kafka comes in. You create five or six topics — task-requests, task-results, agent-status, alerts, approvals. Each topic needs partition count, retention policy, and consumer group configuration. A month later, half the team's ops effort is managing Kafka. And Agents are ephemeral, but topics are permanent — that mismatch keeps generating garbage topics.
End state. Most teams' Agent communication layer looks like this: HTTP + Redis + a bit of Kafka + a pile of custom glue code. It works, but every project's implementation is different. A new person joining needs to re-learn the whole thing from scratch. Cross-team collaboration is impossible.
The core pain is not a lack of tools — it's a lack of standards. The tools exist: Redis, Kafka, HTTP. But none of them were designed for Agent communication specifically. Developers are constantly hand-rolling a purpose-built solution out of general-purpose tools.
What the Developer Experience Looks Like With mq9
Same project, day one:
```java
import io.nats.client.Connection;
import io.nats.client.Message;
import io.nats.client.Nats;
import java.time.Duration;

Connection nc = Nats.connect("nats://localhost:4222");
// Create a mailbox
Message reply = nc.request("$mq9.AI.MAILBOX.CREATE",
        "{\"type\":\"standard\",\"ttl\":3600}".getBytes(), Duration.ofSeconds(3));
// Send a message
nc.publish("$mq9.AI.INBOX.{mail_id}.normal", result.getBytes());
// Broadcast a task
nc.publish("$mq9.AI.BROADCAST.task.available", task.getBytes());
```

No HTTP endpoints to write. No Redis data structures to choose. No Kafka topics to create. Four command words, NATS SDK, done.
Agent is offline? Message waits in the mailbox. Need priority? Change the subject suffix. Need competing consumers? Add a queue group. Need state awareness? Create a latest mailbox. Worried about gaps? QUERY to pull a batch as a fallback.
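"Change the subject suffix" really is the whole mechanism. Following the subject shapes in the snippet above (the exact `$mq9.AI.*` layout is taken from this post's example, not a published spec, and the priority names here are illustrative), the variations reduce to string construction:

```java
// Building mq9-style subjects as plain strings, following the patterns in the
// example above: $mq9.AI.INBOX.{mail_id}.{priority}, $mq9.AI.BROADCAST.{topic}.
// The priority names (urgent/normal) are illustrative assumptions.
public class Mq9Subjects {
    static String inbox(String mailId, String priority) {
        return "$mq9.AI.INBOX." + mailId + "." + priority;
    }

    static String broadcast(String topic) {
        return "$mq9.AI.BROADCAST." + topic;
    }

    public static void main(String[] args) {
        // Need priority? Only the suffix changes; nothing else about the send does.
        System.out.println(inbox("agent-7", "urgent"));
        System.out.println(inbox("agent-7", "normal"));
        System.out.println(broadcast("task.available"));
    }
}
```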
What developers no longer need to do:
No selection decisions — "should this scenario use Redis or Kafka?" No assembly — "how do I combine pub/sub, List, and Stream?" No glue code — "how do I handle retries, idempotency, timeouts, cleanup?" No operations — "who creates the topic? How many partitions? How do I configure consumer groups?"
What developers only need to do:
Decide the communication pattern — point-to-point or broadcast, persistent or not, what priority. Then use the corresponding command word to send. That's it.
What mq9 Actually Is
Not a better Redis. Not a lighter Kafka. A standard protocol for Agent communication.
The analogy is HTTP and the Web — before HTTP, every networked application had its own communication style. After HTTP, everyone spoke the same protocol, and an ecosystem became possible.
Agent communication today is what the internet looked like before HTTP: every team doing their own thing, nothing interoperating. What mq9 wants to do is give this chaos a standard answer — $mq9.AI.*, four command words, all Agents on the same protocol. That's what enables cross-team Agent collaboration, cross-system Agent interoperability, and a real Agent ecosystem.
An Honest Assessment
There is value here, but it needs to be separated into two layers.
What is definitely valuable: There is no standard answer today for Agent asynchronous communication. Developers really are stitching together HTTP + Redis + Kafka. Every team really is reinventing the same wheel. This pain is not new to the Agent era — it existed in the microservices era too, just not painful enough to demand a dedicated solution. Agents have amplified it to the point where it can no longer be ignored. A simple, purpose-built protocol designed for this scenario has real value for developers who feel this pain. This doesn't require a bet.
What is uncertain: How widespread and urgent that pain is today. Teams running multi-Agent systems are still a minority. Most people are still in the single-Agent stage, where HTTP calls are sufficient. mq9's value scales directly with how common multi-Agent systems become — more Agents means more communication pain means more value. The direction is right, but the timing window is unclear.
The direction has value; the timing requires patience. But RobustMQ's three years of foundational architecture are already there, which makes waiting cheap. The work that needs doing — getting the code solid, the protocol well-defined, the documentation clear — doesn't depend on timing. It can be done now.
When that day comes, mq9 will be ready.
