mq9 Protocol Design: Q&A

Design Philosophy

Before designing mq9, I asked myself one question: What's different between an Agent and a human?

Humans go offline intermittently. Humans have identities. Humans have memory. When humans receive duplicate messages, they decide for themselves whether to act on them. Humans send emails without needing confirmation that the other side received them before moving on. In each of these respects, an Agent behaves much like a human.

Today's messaging infrastructure is designed for machines, not for Agents. Kafka assumes consumers are always online, processing high throughput in batches. NATS JetStream is powerful but conceptually dense — streams, consumers, durables, ack policies — Agent developers spend enormous time learning concepts unrelated to their business logic. None of these systems treat "Agents may be offline, may crash, may run as multiple instances" as a first-class concern.

mq9's starting point: Agent communication should be as simple as sending email.

Email has no ack. If you know the address, you can send. Everyone has their own inbox. When you're offline, mail waits for you; when you come back, you keep reading. Nobody thinks Email is "unreliable" — it has run in global production for decades.

From this starting point, five design principles follow:

1. Mailbox as the core abstraction. Mailbox is the only abstraction that naturally carries all three characteristics: identity, memory, and intermittent connectivity. mail_address is the Agent's identity, message persistence is memory, and offline buffering + ordered replay handles intermittent connectivity. No new concepts need to be invented.

2. Complexity belongs to the protocol, not the Agent. Priority scheduling, offset management, TTL cleanup — these are things messaging infrastructure should handle. The interface an Agent sees should be minimal: a single queue name parameter covers broadcast, shared subscription, and independent offset semantics.

3. Accurate responsibility assignment. Idempotency is a business-layer concern, not a messaging-layer concern. Exactly-once fundamentally doesn't exist in distributed systems — every system claiming it is doing at-least-once plus deduplication underneath. Rather than creating a false idempotency guarantee in the protocol, expose the capability (optional dedup_id) and let the Agent decide whether to use it. This isn't laziness; it's correct responsibility assignment.

4. Progressive capability, never forced. Normal usage requires no knowledge of advanced features. Need delayed delivery? Add a send_at header. Need idempotency? Include a dedup_id. Need priority? Append .critical or .urgent to the subject. Each capability layer is an optional addition that doesn't affect basic usage.

5. Reuse, don't rebuild. NATS is the foundation. The subject suffix mechanism naturally supports priority extension. The authentication system is reused directly. Request-reply is not redesigned. Energy is focused on what actually needs design: mailbox semantics, offset management, Agent communication primitives.

These five principles run through every design decision below. After reading this Q&A, come back and check each answer against these principles.

mq9 Is Not a Message Queue

mq9 looks a lot like a message queue: it has messages, sending, subscribing, and offsets. But it wasn't designed with a message-queue mindset. This distinction is worth stating clearly, because it determines the logic behind many specific design choices.

The message-queue mental model is a pipeline. Producers push data into the pipe, consumers pull data out. The pipe is anonymous and neutral — it doesn't care who the data belongs to or who is waiting for it. Kafka's topic is a data stream, not anyone's inbox. A consumer group is a carrier of consumption progress, not an identity. The core of the entire system is throughput and ordering — data flows from A to B, as fast as possible, without loss or disorder.

mq9's mental model is an inbox. A mailbox belongs to a specific Agent; mail_address is that Agent's identity. Messages aren't data flowing through a pipe — they're letters addressed to someone. The sender knows who the recipient is. The recipient has their own read position. When they're offline, the letters wait. When they come back, they keep reading. The core of the entire system is identity and state — who sent what to whom, and where each party has read to.

These two mental models lead to completely different design judgments:

Pipeline thinking cares about "was the message processed," so ack is needed — the consumer tells the system when processing is complete, and the system advances the offset. Without ack, messages may be re-delivered, which is a "bug" in the pipeline.

Inbox thinking cares about "was the message delivered." Ack isn't necessary — a letter is complete once it's in the inbox. When the recipient reads it and what they do with it is not the email system's responsibility. Re-delivery isn't a bug; it's a reality the Agent needs to handle itself, just as you might receive two identical emails.

In pipeline thinking, offset is a global resource managed by the system, shared across the consumer group. Multiple consumers must coordinate — either broadcast or compete.

In inbox thinking, offset is personal state. Each Agent has its own read position, independent of others. Queue name is the declaration of "who I am" — with the same mailbox, different Agents use different queue names, each advancing independently, with no need to create additional consumer objects.

Pipeline idempotency is a system promise — exactly-once is a commitment where the system handles deduplication, and consumers don't need to worry about it.

Inbox idempotency is an Agent capability. mq9 provides an optional dedup_id field as a deduplication key; the Agent decides whether to use it. Not forced, because Agents have judgment — just like you decide for yourself whether to reply once or twice to two identical emails.

Understanding this distinction explains why mq9 has no ack, why idempotency is optional, and why a single queue name parameter is sufficient. These aren't missing features — they're different answers under a different mental model.

Looking at mq9 through a pipeline lens makes it seem "unreliable" or "incomplete." Looking at it through an inbox lens, it's exactly sufficient with no unnecessary complexity.

Why Start from a Message Queue Perspective for Agent Communication

This question deserves a direct answer, because it wasn't accidental.

We started from message queues not out of habit, but because the needs of Agent communication overlap structurally with the problems message queues already solve. Persistence, asynchrony, decoupling, and multiple consumers are core message queue capabilities and also core Agent communication needs. That made message queues a reasonable starting point.

But Agents have several characteristics that message queues have never seriously addressed. These are what ultimately led us to a different design.

Agents have short lifecycles. Traditional message queues assume consumers are long-running services — a Kafka consumer, once started, consumes continuously; going offline is an exception requiring rebalance. But Agents are naturally short-lived: a task starts, completes, exits. The next start may be the same Agent, a different instance, or hours later. Message queues were never designed for consumers that "may disappear at any moment, may return at any moment." Mailbox offline buffering exists precisely for this — when an Agent is absent, its messages wait; when it returns, it resumes reading, without the system needing to track Agent liveness.

Agents are naturally one-to-many. Broadcasting a command to all relevant Agents, notifying multiple downstream parties of an event — this is standard in Agent systems, not an edge case. Message queues handle one-to-many clumsily: create multiple consumers, fan out through topics, or maintain a separate queue per downstream. mq9 uses a single queue name parameter to distinguish broadcast from shared subscription, without Agent developers needing to understand the underlying mechanism.

Agents are like people, not services. This is the most fundamental difference. Traditional message queue consumers are stateless processing units — they have no identity, no memory, they just need to process messages. Agents have identity (mail_address), memory (message history in the mailbox), and autonomous judgment (deciding what to do when receiving duplicate messages). This requires the communication layer to carry not just data but identity and state.

Agents need simplified communication, not increased complexity. Agents are already complex enough — they must understand context, make decisions, call tools. If the communication layer then introduces consumer groups, ack policies, and durable names, the cognitive burden compounds. mq9's goal is to let Agents communicate with the minimum number of concepts: know the address to send, have a queue name to subscribe, and the protocol handles the rest.

So we started from message queues and moved beyond them. The aim is not to reject message queues, but to acknowledge that a pipe designed for machines is not suitable for direct use as a communication layer for Agents. What's needed is the same solid foundational capabilities under a completely different abstraction model.

This is the complete discussion record of the mq9 protocol design. Every question has a reason; every answer has a trade-off. It is compiled here as the foundation document for ongoing protocol refinement.

Design Decisions at a Glance

Topic	Decision	Core Rationale
Write idempotency	Optional: include a string `dedup_id` in the header to enable idempotent dedup; without it, duplicate writes are allowed	Capability built-in, not forced; idempotency is a business-layer concern
Ack semantics	No ack; no guarantee of non-duplicate delivery	Push is a notification, not a delivery guarantee; reliability backed by mailbox persistence + pull/list fallback; Agents have human-level judgment
Consumption idempotency	Optional `dedup_id` in header; server deduplicates on send and passes `dedup_id` through to consumer; consumer decides whether to use it for dedup	Capability built-in, not forced; both sides independent
Multi-subscriber offset	Keyed by queue name, each independent	One parameter solves the durable consumer problem
Priority	Three-level enum: high / normal / low; strict ordering within each level	Enums are more intuitive than numbers; no cross-priority disorder
Priority transmission	Subject suffix: `.critical` / `.urgent`; no suffix = normal	Zero protocol changes, backward compatible
Delay / TTL	Headers: `send_at` + `expire_at`	Zero extra concepts; omitting = regular message
mail_address lifecycle	mq9 only handles allocation; Agent decides fixed vs. rotating	Separation of concerns; mq9 doesn't track Agent state
Message retention	Mailbox-level TTL + message-level `expire_at`, two independent layers	Flexible and sufficient; different granularities
Message size	Configurable upper limit (e.g., 10MB); rejected if exceeded; no chunking	Chunking is complex; large files should use the storage layer
Access control	No ACL; mail_address is the credential; unguessable	Analogous to Email address; security via unguessability
Message-level tag	Optional `tags` field in header at send time; list supports tag filter parameter; server returns matching subset	Bulletin-board scenarios: senders cannot be controlled; full retrieval becomes a bottleneck as messages accumulate
Message list query	Supports tag filter parameter; max_count limit recommended	Server-side filtering, backward compatible; no-parameter behavior unchanged
Message deletion	Single-message delete by message_id; batch deletion extensible	Sufficient for now
Mailbox deletion	Immediately inaccessible; messages naturally expire with TTL	No cascading delete; simple implementation
Subscription reconnect recovery	With queue name → offset persisted; without → cleared on disconnect	Durable / ephemeral naturally distinguished
Shared subscription	No queue name → broadcast; same queue name → round-robin	One parameter covers both semantics
reply_to	Uses mail_address directly; mq9 does no extra processing	No overreach; address changes are the Agent's concern
mail_address format	`{name}`; name limited to lowercase letters + digits + dots	Unified namespace; no ad-hoc suffix invention
message_id format	u64, mailbox-unique monotonically increasing, server-generated	Simple and efficient; mailbox-scoped uniqueness is sufficient
Message ordering	Three priority levels each maintain independent ordered queues	Clear model; no global ordering required
Mailbox count limit	No limit; storage layer scales	Protocol doesn't bear storage concerns
Multi-tenancy isolation	Implemented at the connection layer; protocol layer zero-aware	Protocol carries no tenant identifier; Agent unaware
Broadcast to all mailboxes	Currently unsupported; admin functionality	Operational capability; doesn't pollute client protocol
Connection authentication	Aligned with NATS: username/password, token, JWT, nkey	Reuses NATS ecosystem; low learning cost
Flow control	No application-layer control; relies on TCP backpressure	Sufficient for now; extensible for high-throughput scenarios
public mailbox discovery	Specify `public=true` at creation; server auto-writes to `public`; tag filtering reserved for large-scale scenarios	Create = register; server auto-maintains; client read-only
request-reply	Directly reuses NATS native semantics	Don't reinvent the wheel

Protocol Q&A

Write Idempotency

Why it matters: Duplicate writes are common in distributed systems. At-least-once vs. exactly-once is a fundamental protocol design fork.

Answer: Optionally supported. Without a dedup_id in the header, the mq9 server generates its own message_id and each send creates a new message — same semantics as Email. With a dedup_id in the header, the server performs idempotent deduplication: the same dedup_id is stored only once, and subsequent resends return the original message_id. Callers decide whether they need idempotency; it is never forced.

Assessment: Reasonable. Exactly-once fundamentally doesn't exist in distributed systems — every system claiming it is doing at-least-once plus deduplication underneath, with the deduplication key still provided by the business. mq9 builds the capability into the protocol without forcing it — Agents that don't need idempotency send directly; those that do include dedup_id. Both usage patterns are clean with no added cognitive burden.
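The write path described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Mailbox` class, its `send` method, and the `dedup_index` map are hypothetical names, not the mq9 server's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Mailbox:
    """Hypothetical in-memory mailbox; not the real mq9 server."""
    next_id: int = 1
    messages: Dict[int, bytes] = field(default_factory=dict)
    dedup_index: Dict[str, int] = field(default_factory=dict)

    def send(self, payload: bytes, dedup_id: Optional[str] = None) -> int:
        # A dedup_id seen before returns the original message_id, stores nothing.
        if dedup_id is not None and dedup_id in self.dedup_index:
            return self.dedup_index[dedup_id]
        message_id = self.next_id
        self.next_id += 1
        self.messages[message_id] = payload
        if dedup_id is not None:
            self.dedup_index[dedup_id] = message_id
        return message_id


box = Mailbox()
first = box.send(b"charge order 42", dedup_id="order-42")
retry = box.send(b"charge order 42", dedup_id="order-42")  # resend, deduplicated
plain_a = box.send(b"hello")  # no dedup_id: every send is a new message
plain_b = box.send(b"hello")
```

In this sketch the retry returns the same message_id as the first send, while the two sends without a dedup_id produce two distinct messages, matching the Email-like default.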

Ack Semantics

Why it matters: Ack determines the reliable delivery guarantee. Having ack enables a "messages won't be lost" guarantee, which is often considered the threshold for production readiness.

Answer: No ack and no ack semantics, and no guarantee of non-duplicate delivery. With a queue name specified, delivery follows offset advancement, so each message is normally delivered once. Subscribing with a new queue name consumes the mailbox from the beginning. Retrieving all messages in one call is also supported. mq9 therefore offers both mailbox semantics and message-queue semantics; what it does not promise is that a message will never be delivered twice.

Assessment: Reasonable. "No ack means it can't go to production" is wrong.

The core reason is that push in mq9 is a notification, not a delivery guarantee. Once a message is written successfully, it is already persisted in the mailbox. Push merely tells the Agent "there are new messages"; if a push is lost, at worst the notification is delayed, and the message itself is not lost. Agents can retrieve unread messages at any time via list or pull. Reliability is backed by mailbox persistence + pull/list fallback, not by push reliability. This differs from a JetStream consumer running with the AckNone policy, where the server treats a message as acknowledged on delivery and does not redeliver it if the push never reaches the consumer.

Introducing ack would actually cause harm: the broker would need to wait for ack timeouts and re-deliver, potentially causing the Agent to receive the same message twice — creating a problem that otherwise wouldn't exist, while adding state-machine complexity on both sides.

Email has no ack, and Email is one of the most successful asynchronous communication protocols ever, running in production for decades. Agents act on behalf of humans and should have human-level judgment — receiving a duplicate message, they decide for themselves whether to act. Pushing idempotency to the Agent isn't laziness; it's correct responsibility assignment.

Consumption Idempotency

Why it matters: mq9 doesn't guarantee non-duplicate delivery, but Agents may need consumption idempotency. Who bears this complexity, and how does the protocol provide support without forcing it?

Answer: Senders can optionally include a dedup_id field in the header:

Without it: The mq9 server generates a message_id; sending multiple times creates multiple messages, same semantics as Email.
With it: The mq9 server performs idempotent deduplication; the same dedup_id is stored only once, and subsequent sends return the original message_id.

On the consumer side, the server passes dedup_id through to the message header; the Agent decides whether to use it for consumer-side deduplication. Both sides are optional; Agents are fully autonomous in choosing to use it.

Assessment: This design is clean. The idempotency capability is built into the protocol but not forced. Agents that need idempotency include dedup_id in the header at send time; the server deduplicates on write and passes the same dedup_id through to the consumer, which can use it for its own dedup with consistent semantics. The dedup_id should be a caller-generated, meaningful ID (such as a business transaction number) so that idempotency holds across retries.

dedup_id lifetime is tied to the message — when a message expires or is deleted, its associated dedup_id record expires with it. No separate TTL management for dedup_id is needed; the implementation is simple and the semantics are consistent.
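The consumer-side half of this design might look like the following sketch. It assumes the Agent receives headers as a plain dictionary with a `dedup_id` key; the `DedupingConsumer` class and that header layout are hypothetical and only illustrate the idea that the Agent, not the server, decides whether to skip a redelivered message.

```python
from typing import Callable, Dict, List, Set


class DedupingConsumer:
    """Hypothetical Agent-side wrapper; the headers layout is an assumption."""

    def __init__(self, handler: Callable[[bytes], None]) -> None:
        self.handler = handler
        self.seen: Set[str] = set()

    def on_message(self, headers: Dict[str, str], payload: bytes) -> bool:
        dedup_id = headers.get("dedup_id")
        if dedup_id is not None and dedup_id in self.seen:
            return False  # already handled: skip the redelivery
        self.handler(payload)
        if dedup_id is not None:
            self.seen.add(dedup_id)  # recorded only after the handler succeeds
        return True


handled: List[bytes] = []
consumer = DedupingConsumer(handled.append)
consumer.on_message({"dedup_id": "order-42"}, b"charge order 42")
consumer.on_message({"dedup_id": "order-42"}, b"charge order 42")  # redelivery
consumer.on_message({}, b"ping")
consumer.on_message({}, b"ping")  # no dedup_id: processed every time
```

Messages without a dedup_id are always processed, so an Agent that does not care about duplicates pays no cost for the feature.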

Independent Multi-Subscriber Offset

Why it matters: When multiple Agents subscribe to the same mailbox, whether each has an independent position determines whether mq9 is a true Agent primitive or just a NATS wrapper.

Answer: Queue name serves as the offset key; each is independent. Different Agents use different queue names and maintain their own consumption progress independently.

Assessment: This is mq9's greatest highlight, bar none. In NATS JetStream, achieving independent offsets for multiple consumers requires creating multiple durable consumers, each tracked separately. mq9 solves this with a single parameter, naturally corresponding to the "inbox" mental model: each person subscribes to the same bulletin board and reads at their own pace.
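A minimal model of "queue name as the offset key" is sketched below. The `OffsetMailbox` class and its `pull` signature are hypothetical; the sketch only shows that one shared message log plus one offset per queue name is enough to give each Agent its own read position.

```python
from typing import Dict, List


class OffsetMailbox:
    """Hypothetical model: one message log, one offset per queue name."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.offsets: Dict[str, int] = {}

    def send(self, message: str) -> None:
        self.log.append(message)

    def pull(self, queue_name: str, max_count: int = 10) -> List[str]:
        # A queue name seen for the first time starts from the beginning.
        start = self.offsets.get(queue_name, 0)
        batch = self.log[start:start + max_count]
        self.offsets[queue_name] = start + len(batch)
        return batch


board = OffsetMailbox()
for text in ["m1", "m2", "m3"]:
    board.send(text)

a_first = board.pull("agent-a", max_count=2)  # agent-a reads two messages
b_first = board.pull("agent-b")               # agent-b is unaffected by agent-a
a_second = board.pull("agent-a")              # agent-a resumes where it stopped
```

No consumer object is created anywhere in this sketch; the queue name alone identifies the reader.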

Priority Design

Why it matters: Numbers vs. enums, strict priority vs. weighted scheduling — these affect the Agent user experience and implementation complexity.

Answer: Three-level enum: high / normal / low. Strict offset order within each level; across levels, high precedes normal precedes low.

Assessment: Reasonable. Enums are more intuitive than numbers; Agents don't need to understand scheduling algorithms. The "ordered within priority" constraint is important — it prevents messages at the same priority from becoming disordered. Transmitted via subject suffix: $mq9.AI.MAILBOX.MSG.{mail_address}.critical / .urgent; no suffix = normal. Zero protocol changes, backward compatible.
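As an illustration of priority carried in the subject, the helpers below build and inspect subjects using the prefix and suffixes quoted above. The function names are hypothetical, and the parsing rule (look only at the last dot-separated token) is an assumption of this sketch; an address whose own last token is `critical` or `urgent` would be ambiguous under it.

```python
SUBJECT_PREFIX = "$mq9.AI.MAILBOX.MSG"      # prefix as written in the text above
PRIORITY_SUFFIXES = ("critical", "urgent")  # no suffix means normal


def build_subject(mail_address: str, priority: str = "normal") -> str:
    """Return the publish subject for a mailbox at the given priority."""
    base = f"{SUBJECT_PREFIX}.{mail_address}"
    if priority == "normal":
        return base
    if priority not in PRIORITY_SUFFIXES:
        raise ValueError(f"unknown priority: {priority}")
    return f"{base}.{priority}"


def priority_of(subject: str) -> str:
    """Infer the priority from the last token of a subject (sketch only)."""
    last_token = subject.rsplit(".", 1)[-1]
    return last_token if last_token in PRIORITY_SUFFIXES else "normal"


normal_subject = build_subject("payment.agent")
critical_subject = build_subject("payment.agent", "critical")
```

A sender that never thinks about priority simply publishes to the base subject, which is what keeps the feature backward compatible.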

Delayed Delivery / Message TTL

Why it matters: Agent scenarios have time-sensitivity — some messages should only take effect in the future; some messages lose meaning if they expire.

Answer: Two fields transmitted via headers:

send_at: When the message becomes visible; if omitted, immediately visible.
expire_at: When the message expires; if omitted, never expires.

The combination of these two fields defines the message's valid window.

Implementation semantics: send_at is implemented via server-side hold — the message is persisted immediately upon arrival at the server, but remains invisible to everyone (both list queries and consumer-side push/pull return nothing) until send_at is reached, at which point it automatically becomes deliverable. The Agent sends and moves on; no local timer, no awareness of the message's current visibility state needed.

Note: Cancelling a held message is not currently supported. Once sent, the message cannot be retrieved via list before send_at is reached, so there is no way to obtain its message_id for deletion. To cancel a delayed message, the only current options are to wait until it becomes visible and then delete it, or to set an appropriate expire_at so it expires automatically.

Assessment: This is a very clean design. Transmitted via headers, zero extra concepts, backward compatible. Omitting both gives a regular message; including them adds extra capability. send_at and expire_at together express "this message is valid between T1 and T2," with zero cognitive burden on the Agent. Server-side hold makes the delayed delivery semantics fully transparent to the Agent — the decoupling of send time and delivery time is handled by the protocol.
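The valid window defined by send_at and expire_at can be expressed as a single predicate. The sketch assumes both headers are Unix timestamps in seconds and that a message is visible from send_at (inclusive) until expire_at (exclusive); the text above fixes neither the encoding nor the boundary behavior, so both are assumptions.

```python
from typing import Optional


def is_visible(now: float,
               send_at: Optional[float] = None,
               expire_at: Optional[float] = None) -> bool:
    """Return True if a message should be deliverable at time `now`."""
    if send_at is not None and now < send_at:
        return False  # still held by the server
    if expire_at is not None and now >= expire_at:
        return False  # past its expiry
    return True       # omitted fields place no restriction


regular = is_visible(now=100)
held = is_visible(now=100, send_at=200)
in_window = is_visible(now=250, send_at=200, expire_at=300)
expired = is_visible(now=300, send_at=200, expire_at=300)
```

A server would apply a check like this on every list, pull, and push, which is why the sending Agent needs no local timer.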

mail_address Lifecycle

Why it matters: Whether the address is stable after an Agent crashes and restarts determines whether the core scenario of "Agent continues working after crash-restart" holds.

Answer: mq9 is only responsible for creating and allocating addresses, guaranteed globally unique. The Agent itself decides which address to use after restart — fixed or rotating, as desired.

Assessment: Reasonable. Clear separation of concerns. mq9 doesn't manage Agent lifecycle, only messages during the address's lifetime. This design means mq9 doesn't need to track Agent liveness.

Mailbox Message Retention Policy

Why it matters: When messages are deleted determines storage cost and how long an offline Agent can still receive messages.

Answer: Messages follow the mailbox lifecycle. At mailbox creation, a TTL can be set — permanent, or a specific duration. Individual messages also support independent expire_at (via header).

Assessment: Reasonable. Mailbox-level TTL controls the overall policy; message-level TTL controls individual messages. Two independent layers, flexible and sufficient.

Message Size Limit

Why it matters: Agent scenarios may involve large files, images, or long contexts. The protocol layer needs clear boundaries.

Answer: Large messages are supported with a configurable upper limit (e.g., 10MB, 30MB). Exceeding the limit results in immediate rejection with an error. Chunked transfer is not supported.

Assessment: Reasonable. Not chunking is the right choice — chunking is complex to implement, hard to debug when things go wrong, and Agent communication messages shouldn't be oversized payloads. When large files truly need transferring, use a dedicated storage layer and put only references in the mailbox. This should be clearly documented.
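The reject-instead-of-chunk rule is simple enough to sketch directly. The limit value, the `MessageTooLarge` exception, and the `check_size` helper are hypothetical names chosen for illustration; the real limit is whatever the deployment configures.

```python
MAX_MESSAGE_BYTES = 10 * 1024 * 1024  # example limit from the text; configurable


class MessageTooLarge(Exception):
    """Raised when a payload exceeds the configured limit (hypothetical)."""


def check_size(payload: bytes, limit: int = MAX_MESSAGE_BYTES) -> None:
    # Oversized payloads are rejected outright; there is no chunked fallback.
    if len(payload) > limit:
        raise MessageTooLarge(f"{len(payload)} bytes exceeds limit of {limit}")


check_size(b"small message")  # passes silently

try:
    check_size(b"12345", limit=4)
    rejected = False
except MessageTooLarge:
    rejected = True
```

A sender that hits the limit would upload the large object elsewhere and send only a reference to it in the mailbox message.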

Access Control

Why it matters: Does mailbox security rely on ACL or on the address itself?

Answer: No ACL. Knowing the mail_address is sufficient to send and subscribe. The mail_address functions as a password — possession equals authorization. System-generated addresses ({uuid}) are globally unique and unguessable; user-defined addresses ({name}) are human-readable, and security depends on the name itself not being disclosed.

Assessment: Reasonable. Analogous to Email — knowing the address lets you send; the protocol is minimal with no permission layer. Worth noting: custom addresses are semantically clear but also predictable. For security-sensitive mailboxes, use system-generated addresses to avoid guessing. mail_address leakage equals permission leakage; Agents should guard their addresses carefully.

Message List Query

Why it matters: How Agents browse a mailbox, and whether server-side filtering is needed, affects protocol complexity.

Answer: Viewing the message list in a mailbox is supported; conditional filtering and pagination are currently not supported.

Assessment: A max_count limit is recommended (defaulting to the latest N messages) to avoid performance issues from full retrieval when a mailbox accumulates many messages. Filtering conditions should be handled by the Agent; the protocol doesn't need to bear this. This embodies the "minimal protocol, sufficient capability" principle.

Message Deletion

Why it matters: The granularity and atomicity of deletion affects the Agent user experience.

Answer: Currently supports single-message deletion by message_id (u64). Batch deletion can be added later.

Assessment: Reasonable. Single deletion is sufficient; batch deletion on demand.

Mailbox Deletion Behavior

Why it matters: After a mailbox is deleted, do its messages clear immediately or expire naturally? This affects implementation complexity.

Answer: After a mailbox is deleted, it becomes immediately inaccessible. Messages expire and are cleaned up naturally following the mailbox TTL; immediate reclamation is not required.

Assessment: Reasonable. No cascading deletion; simple implementation; storage layer naturally reclaims space.

Subscription Disconnect Recovery

Why it matters: Whether the offset is preserved after an Agent reconnects determines whether the core scenario of "Agent continues working after going offline" holds.

Answer: Subscriptions with a queue name have their offset persistently preserved, with a configurable TTL (e.g., 7 days). Delivery progress is automatically restored upon reconnection. Subscriptions without a queue name are cleared on disconnect; reconnecting results in full consumption from the beginning.

Assessment: Reasonable. Durable and ephemeral semantics are naturally distinguished by the presence or absence of a queue name; Agents don't need to understand any additional concepts.
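The durable versus ephemeral distinction can be modeled as follows. The `Broker` and `Session` classes are hypothetical, and the sketch leaves out the configurable offset TTL; it only shows that a named subscription resumes after reconnecting while an unnamed one starts over.

```python
from typing import Dict, List, Optional


class Broker:
    """Hypothetical model of offset handling across reconnects."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.durable_offsets: Dict[str, int] = {}

    def connect(self, queue_name: Optional[str] = None) -> "Session":
        return Session(self, queue_name)


class Session:
    def __init__(self, broker: Broker, queue_name: Optional[str]) -> None:
        self.broker = broker
        self.queue_name = queue_name
        # Named subscriptions resume from the saved offset; unnamed start at 0.
        self.offset = broker.durable_offsets.get(queue_name, 0) if queue_name else 0

    def receive(self) -> Optional[str]:
        if self.offset >= len(self.broker.log):
            return None
        message = self.broker.log[self.offset]
        self.offset += 1
        if self.queue_name:
            self.broker.durable_offsets[self.queue_name] = self.offset
        return message


broker = Broker(["m1", "m2", "m3"])
before_crash = broker.connect("worker").receive()
after_restart = broker.connect("worker").receive()  # new session, same queue name
ephemeral_first = broker.connect().receive()
ephemeral_again = broker.connect().receive()        # no queue name: starts over
```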

Shared Subscription

Why it matters: When multiple Agents subscribe to the same mailbox, broadcast vs. load balancing — both semantics are useful and need clear distinction.

Answer: No queue name → broadcast; every Agent receives a copy. Same queue name → shared subscription; round-robin delivery; the same message is delivered only once.

Assessment: This design is exceptionally elegant. A single parameter covers two completely different semantics; Agents understand it intuitively.

Priority and shared subscription interaction: When multiple Agents share a subscription under the same queue name, each delivery takes one message from the current highest-priority queue, then rotates to the next Agent. Priority determines which message to deliver; round-robin determines which Agent receives it — the two are independent. The result is that all Agents process high-priority messages first, then normal, then low; priority semantics are fully preserved under shared subscription.
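The interaction described above can be sketched with three ordered queues and a rotating Agent index. The `SharedSubscription` class is hypothetical, and the level names follow the high / normal / low wording used in this section.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

PRIORITY_ORDER = ("high", "normal", "low")  # highest first


class SharedSubscription:
    """Hypothetical model of one queue name shared by several Agents."""

    def __init__(self, agents: List[str]) -> None:
        self.agents = agents
        self.next_agent = 0
        self.queues: Dict[str, Deque[str]] = {p: deque() for p in PRIORITY_ORDER}

    def publish(self, message: str, priority: str = "normal") -> None:
        self.queues[priority].append(message)

    def deliver_one(self) -> Optional[Tuple[str, str]]:
        # Priority picks the message; round-robin picks the receiving Agent.
        for priority in PRIORITY_ORDER:
            if self.queues[priority]:
                message = self.queues[priority].popleft()
                agent = self.agents[self.next_agent]
                self.next_agent = (self.next_agent + 1) % len(self.agents)
                return agent, message
        return None


sub = SharedSubscription(["agent-1", "agent-2"])
sub.publish("n1")
sub.publish("l1", "low")
sub.publish("h1", "high")
sub.publish("h2", "high")
deliveries = [sub.deliver_one() for _ in range(4)]
```

In this run both high-priority messages are handed out before the normal and low ones, and the two Agents alternate regardless of priority.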

reply_to

Why it matters: An Agent's mail_address may change after crash-restart; does reply_to need special handling?

Answer: Uses mail_address directly; reply_to reuses the existing NATS field; mq9 does no extra processing. If an Agent's address changes, that's the Agent's problem.

Assessment: Reasonable. mq9 doesn't overreach; responsibility boundaries are clear.

mail_address Format

Why it matters: Address format determines readability, identity semantics, and extension paths.

Answer: Uniformly adopts the {name} format. The part after the @ is a fixed suffix; users only need to care about the part before the @. Three forms:

User-defined: {name}; name may only contain lowercase letters + digits + dots, e.g., lobo.robustmq, payment.agent
System-generated: {uuid}; randomly generated, unguessable
Public mailbox: public; system-reserved address; user attempts to create it will be rejected

Addresses do not carry tenant information. Tenancy is a broker-layer concept; the mq9 protocol is unaware of it. Agents specify their tenant at connection time via header or account; the address itself remains pure. public is tenant-scoped; each tenant has its own independent instance, and Agents can only see public mailboxes within their own tenant.

Assessment: The fixed suffix is the right decision. The namespace is unified under one domain, preventing users from inventing ad-hoc suffixes chaotically. Users only need to think about the prefix, which keeps the cognitive burden minimal. public is more intuitive than the old mail@public: subject first, domain second, fully consistent with Email. System-reserved addresses are a fixed whitelist, not distinguished by naming rules; the server rejects them at creation time. When new reserved addresses are added, simply update the whitelist and documentation.
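The naming rule and the reserved-address whitelist could be checked along these lines. The helper names are hypothetical, the sketch validates only the character set stated above (lowercase letters, digits, dots), and rendering the system-generated form with `uuid.uuid4()` is an assumption about what {uuid} means.

```python
import re
import uuid

NAME_PATTERN = re.compile(r"[a-z0-9.]+")
RESERVED_ADDRESSES = {"public"}  # fixed whitelist, extended as needed


def validate_user_address(name: str) -> None:
    """Raise ValueError if a user-defined address is not acceptable."""
    if name in RESERVED_ADDRESSES:
        raise ValueError(f"{name!r} is a system-reserved address")
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"{name!r} may only contain lowercase letters, digits, dots")


def generate_address() -> str:
    """Return a random, unguessable system-generated address (sketch)."""
    return str(uuid.uuid4())


def is_valid(name: str) -> bool:
    try:
        validate_user_address(name)
        return True
    except ValueError:
        return False


results = {name: is_valid(name)
           for name in ["lobo.robustmq", "payment.agent", "Payment_Agent", "public"]}
generated = generate_address()
```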

message_id Format

Why it matters: How to balance uniqueness, readability, and traceability.

Answer: Server-generated; type is u64, monotonically increasing and unique within the mailbox. Each message within the same mailbox has a unique message_id; uniqueness across different mailboxes is not guaranteed. Simple, efficient, and sufficient for message identification.

Assessment: Reasonable. u64 monotonically increasing is sufficient at mailbox granularity — global uniqueness is not needed. Implementation is simple with low storage and comparison overhead.
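Mailbox-scoped IDs amount to one counter per mailbox. In the sketch below the `MessageIdAllocator` class is hypothetical and starting each counter at 1 is an assumption; the only properties taken from the text are the u64 range, monotonic increase, and uniqueness within a mailbox.

```python
from collections import defaultdict
from typing import DefaultDict

U64_MAX = 2**64 - 1


class MessageIdAllocator:
    """Hypothetical per-mailbox counter; IDs repeat across mailboxes."""

    def __init__(self) -> None:
        self.last_id: DefaultDict[str, int] = defaultdict(int)

    def allocate(self, mail_address: str) -> int:
        next_id = self.last_id[mail_address] + 1
        if next_id > U64_MAX:
            raise OverflowError("message_id exceeds the u64 range")
        self.last_id[mail_address] = next_id
        return next_id


allocator = MessageIdAllocator()
a1 = allocator.allocate("payment.agent")
a2 = allocator.allocate("payment.agent")
b1 = allocator.allocate("lobo.robustmq")  # separate mailbox, separate sequence
```

Because the same number can appear in two mailboxes, a message is only identified by the pair of mail_address and message_id.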

Message Ordering Guarantee

Why it matters: There's a tension between priority preemption and strict ordering; the semantics need to be clear.

Answer: Strict offset order within each priority level; across levels, high precedes normal precedes low. This is equivalent to three independent ordered queues; delivery selects the queue by priority.

Assessment: Reasonable. This model is clear and the implementation is direct — three priority levels each with their own queue and offset tracking, no global ordering required.

Mailbox Message Count Limit

Why it matters: Will unlimited message accumulation exhaust storage?

Answer: No limit; storage can scale.

Assessment: Reasonable. Storage problems belong to the storage layer; the protocol doesn't need to bear them. However, operations documentation should clearly explain storage scaling approaches to prevent users from hitting issues unexpectedly.

Multi-Tenancy Isolation

Why it matters: When multiple teams share a single mq9 cluster, how are mailboxes isolated?

Answer: Already implemented, resolved at the connection layer. At connection time, tenant information is retrieved from headers, accounts, etc. If found, the connection is associated with that tenant; if not, it is assigned to the default tenant. The protocol layer is unaware of tenancy; messages don't need to carry tenant identifiers. mail_address is unique within a tenant; different tenants can have identically named addresses without interfering with each other.

Assessment: This is the correct implementation approach. Isolation at the connection layer means the protocol layer has zero overhead, and Agents don't need to be aware of which tenant they belong to. The default tenant is no different from any other tenant — it's simply a pre-created tenant during system initialization to avoid requiring manual tenant creation in single-tenant scenarios. Other tenants must be manually created in the admin backend.
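One way to picture connection-layer isolation is a store keyed by tenant and address, where the tenant comes from the connection and never from the message. The `TenantBroker` and `TenantConnection` classes are hypothetical and do not describe how the broker actually resolves tenants.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

DEFAULT_TENANT = "default"


class TenantBroker:
    """Hypothetical store keyed by (tenant, mail_address)."""

    def __init__(self) -> None:
        self.mailboxes: DefaultDict[Tuple[str, str], List[bytes]] = defaultdict(list)

    def connect(self, tenant: Optional[str] = None) -> "TenantConnection":
        # Connections without tenant information use the default tenant.
        return TenantConnection(self, tenant or DEFAULT_TENANT)


class TenantConnection:
    def __init__(self, broker: TenantBroker, tenant: str) -> None:
        self.broker = broker
        self.tenant = tenant

    def send(self, mail_address: str, payload: bytes) -> None:
        # The message itself carries no tenant identifier.
        self.broker.mailboxes[(self.tenant, mail_address)].append(payload)

    def list_messages(self, mail_address: str) -> List[bytes]:
        return list(self.broker.mailboxes.get((self.tenant, mail_address), []))


tenant_broker = TenantBroker()
team_a = tenant_broker.connect("team-a")
team_b = tenant_broker.connect("team-b")
anonymous = tenant_broker.connect()

team_a.send("payment.agent", b"from a")
team_b.send("payment.agent", b"from b")
```

The same address exists independently in both tenants, and the send and list calls look identical to an Agent in either one.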

Broadcast to All Mailboxes

Why it matters: In system notification scenarios, administrators need to notify all Agents.

Answer: Currently not supported; this is internal broker admin functionality, not exposed in the client protocol.

Assessment: Reasonable. This is an operational capability, not a protocol primitive. Put it in the admin API; don't pollute the client protocol.

Connection Authentication

Why it matters: The method of secure access affects deployment complexity and security.

Answer: Aligned with NATS: supports username/password, token, JWT, nkey; extensible as needed.

Assessment: Reasonable. Reuses the NATS ecosystem; low learning cost for users.

Flow Control

Why it matters: When an Agent processes slowly, unlimited mq9 pushing may overwhelm it.

Answer: Currently no application-layer flow control; relies on NATS TCP backpressure for natural rate limiting.

Assessment: Sufficient for now. TCP backpressure handles most scenarios. If high-throughput scenarios emerge later, application-layer credits/window mechanisms can be added as needed.
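
For reference, an application-layer credit mechanism could look roughly like the sketch below. This is not implemented in mq9; CreditWindow and its methods are hypothetical names used only to illustrate the idea of delivering messages only while the consumer has granted capacity.

```python
from collections import deque


class CreditWindow:
    """Hypothetical credit-based delivery: the broker pushes only while credits remain."""

    def __init__(self):
        self.pending = deque()  # messages waiting on the broker side
        self.credits = 0        # how many more messages the consumer will accept
        self.delivered = []     # stands in for the consumer's receive buffer

    def publish(self, msg):
        self.pending.append(msg)
        self._drain()

    def grant(self, n):
        # The consumer signals that it has capacity for n more messages.
        self.credits += n
        self._drain()

    def _drain(self):
        while self.credits > 0 and self.pending:
            self.delivered.append(self.pending.popleft())
            self.credits -= 1


window = CreditWindow()
for i in range(5):
    window.publish(f"msg-{i}")
window.grant(2)  # only two messages are released; three stay queued
```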

public mailbox Discovery Mechanism

Why it matters: How do Agents discover capabilities offered by other Agents?

Answer: public is the system-reserved discovery address. At mailbox creation, specify public=true and the server automatically writes that mailbox into public — no additional client action required. Agents subscribe to or pull from public to discover public mailboxes. When a mailbox is deleted, the server automatically cleans up the corresponding record; clients need not be aware.

Assessment: Minimal design, zero extra concepts. Specifying public=true at creation is the registration; the server auto-writes and auto-cleans on mailbox deletion; clients are unaware. public is a system-reserved address, tenant-scoped, independently isolated per tenant; Agents can only see public mailboxes within their own tenant.
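
The "creation is registration" behavior can be sketched as follows, assuming a single tenant's view of the server. The Broker class, the discover method, and the description argument are illustrative names, not the actual mq9 interface.

```python
class Broker:
    """In-memory stand-in for one tenant's view of the server."""

    def __init__(self):
        self.mailboxes = {}       # mail_address -> list of messages
        self.public_records = {}  # mail_address -> description, maintained by the server

    def create(self, mail_address, public=False, description=""):
        self.mailboxes[mail_address] = []
        if public:
            # Creation is registration: the server writes the record itself.
            self.public_records[mail_address] = description

    def delete(self, mail_address):
        del self.mailboxes[mail_address]
        # The server cleans up the discovery record; clients take no action.
        self.public_records.pop(mail_address, None)

    def discover(self):
        # What an Agent would see when pulling from the public address.
        return dict(self.public_records)


broker = Broker()
broker.create("payment-agent", public=True, description="handles payments")
broker.create("scratch-123")  # private: never appears in discovery
visible_before = broker.discover()
broker.delete("payment-agent")
visible_after = broker.discover()
```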

Reserved extension point for large-scale scenarios: tag filtering. When a tenant has hundreds of public mailboxes, returning the full list becomes a performance bottleneck. The reserved solution is a tag mechanism:

Mailboxes can carry multiple tags at creation time, e.g., payment-agent tags: [finance, payment, v2]
Subscribing to public supports tag-based filtering; the server filters and returns only the matching subset
Subscriptions without tags behave exactly as today, backward compatible, progressively upgradable

Tags are transmitted as header fields — zero protocol changes. An extension direction: messages can also specify tags; only Agents subscribed to that tag receive them, and public evolves into a lightweight pub/sub routing layer. This capability is not implemented at this stage; wait until the discovery mechanism runs at scale, then extend based on actual needs.

Message-Level Tag Filtering

Why it matters: Should messages carry tags in their headers at send time, with list supporting tag-based filtering?

Answer: Supported. Two scenarios make this necessary:

Scenario 1: Bulletin-board mailboxes. public or a shared notification mailbox where multiple senders write in and receivers only care about specific types. This is inherently a "one mailbox, mixed message types" scenario — receivers cannot control which mailbox senders write to, so multiple mailboxes cannot solve this. Server-side tag filtering is the only clean solution.

Scenario 2: List queries on mailboxes with accumulated messages. Full retrieval followed by client-side filtering becomes a performance bottleneck as message volume grows; server-side filtering is necessary.

Protocol design:

At send time, an optional tags field is carried in the header as a string list, e.g., tags: [finance, payment, v2]
list supports an optional tag filter parameter; the server returns only messages containing that tag
Messages without tags and list calls without filters behave exactly as before — fully backward compatible
The server indexes tags on write and scans by tag on list

Assessment: Multiple mailboxes remain the preferred isolation granularity — when a scenario can be isolated with multiple mailboxes, that is still the cleaner approach. Tag filtering is for scenarios where the sender cannot be controlled, or where message volume within a single mailbox is large enough to require precise querying. The two complement each other and are not mutually exclusive.
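
A rough sketch of "index on write, scan by tag on list", assuming an in-memory mailbox. TaggedMailbox and list_messages are hypothetical names; the real server-side index structure is not described in this document.

```python
from collections import defaultdict


class TaggedMailbox:
    def __init__(self):
        self.messages = []                  # (message_id, payload) in arrival order
        self.tag_index = defaultdict(list)  # tag -> message_ids, built on write

    def send(self, payload, tags=()):
        message_id = len(self.messages) + 1
        self.messages.append((message_id, payload))
        for tag in dict.fromkeys(tags):  # drop repeated tags, keep their order
            self.tag_index[tag].append(message_id)
        return message_id

    def list_messages(self, tag=None):
        if tag is None:
            # No filter: behaves exactly like an untagged mailbox.
            return [payload for _, payload in self.messages]
        # Filtered: walk only the ids recorded for this tag.
        return [self.messages[mid - 1][1] for mid in self.tag_index.get(tag, [])]


board = TaggedMailbox()
board.send("refund issued", tags=["finance", "payment"])
board.send("deploy finished", tags=["ops"])
board.send("untagged note")
```

Untagged sends and unfiltered lists go through the same two methods, which is what keeps the extension backward compatible.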

request-reply

Why it matters: Does the synchronous question-answer scenario between Agents need special support?

Answer: Directly reuses NATS native request-reply semantics.

Assessment: Reasonable. Don't reinvent the wheel.

How mailboxes Communicate with Each Other

Why it matters: Is the basic unit of communication a message or a mailbox? This determines whether the protocol has exceptional paths.

Answer: There is no "direct message" mode. To communicate, you must send to the other party's mailbox, and they go read it. If they're online and read in real time, it's synchronous. If they're offline, the message waits in the mailbox — that's asynchronous. The communication mode is determined by whether the other party is online, not by the protocol.

Assessment: This constraint is correct. All communication goes through the mailbox; protocol behavior is uniform with no exceptional paths. "Online equals synchronous, offline equals asynchronous" is a naturally emergent property, not an additional design element.

How Many Mailboxes Can One Agent Have

Why it matters: Can Agents create mailboxes per task? This determines isolation granularity and lifecycle management flexibility.

Answer: No limit; Agents can create as needed. For example, creating a temporary mailbox per dispatched task, letting it lapse after the task completes, with the system auto-reclaiming via TTL. An Agent can also have one permanent "identity mailbox" plus multiple temporary "task mailboxes," each independent.

Assessment: This design is very flexible and well-suited to Agent short-lifecycle characteristics. One mailbox per task provides natural isolation without needing to distinguish task context within messages. The mailbox disappears automatically when the task ends, leaving no residual state.

Mailbox Creation

Why it matters: Whether a registry is needed determines Agent autonomy and deployment complexity.

Answer: Agents call the create API directly and can choose permanent or temporary (with TTL). There's no "registration" concept, no centralized address allocation — Agents create on demand.

Assessment: Reasonable. Decentralized creation gives Agents full autonomy with no dependency on any registry. Permanent vs. temporary maps cleanly to two use cases: identity uses permanent, tasks use temporary.
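
The permanent-identity plus temporary-task pattern can be sketched like this. MailboxStore, ttl_seconds, and reclaim_expired are hypothetical names, and the clock is passed in explicitly so the example stays deterministic; how the server actually schedules TTL cleanup is not specified here.

```python
import uuid


class MailboxStore:
    def __init__(self):
        self.expires_at = {}  # mail_address -> expiry timestamp, or None for permanent

    def create(self, now, mail_address=None, ttl_seconds=None):
        # No registry involved: the caller names the address or gets a generated one.
        address = mail_address or str(uuid.uuid4())
        self.expires_at[address] = None if ttl_seconds is None else now + ttl_seconds
        return address

    def reclaim_expired(self, now):
        expired = [a for a, t in self.expires_at.items() if t is not None and t <= now]
        for address in expired:
            del self.expires_at[address]
        return expired


store = MailboxStore()
identity = store.create(now=0, mail_address="payment-agent")  # permanent identity mailbox
task = store.create(now=0, ttl_seconds=60)                    # temporary task mailbox
reclaimed = store.reclaim_expired(now=61)
```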

Message Payload Format

Why it matters: Whether the protocol mandates a format determines how tightly mq9 couples with the application ecosystem.

Answer: The protocol layer mandates nothing. However, A2A (Agent-to-Agent protocol) specifies the message format; Agents can send A2A messages via mq9, and the ecosystem closes naturally. mq9 is the transport layer; A2A is the application layer — each handles its own responsibilities.

Assessment: The layering here is very clean. mq9 doesn't overstep by mandating application-layer formats. A2A handles semantic conventions; any Agent that implements A2A can communicate via mq9. This also means mq9 is not bound to any specific Agent framework.

Message Status Query

Why it matters: Whether senders can know if a message was processed is a common requirement in Agent workflows; whether the protocol supports this has large implications.

Answer: Not introduced. No protocol primitive for "message was read" is defined. If an Agent needs to confirm the other party processed something, it relies on the other party actively sending a response, or uses request-reply.

Why not introduce it: "Was read" has no clear definition in Agent scenarios. Pulled but not processed, processed halfway before a crash, processed with no further action — these three are indistinguishable at the protocol layer. "Received" and "processed" are two different things; "processed" and "processed successfully" are also two different things. Defining any one of them in the protocol would introduce new state machines and new interaction rounds; the mailbox would need to track per-message state for every subscriber, increasing storage and implementation complexity by an order of magnitude — and once introduced, it's very hard to roll back. The definition of "processed" is business semantics, not protocol semantics. Email's answer to this problem is to reply; mq9 follows the same pattern.

Recommended pattern: To confirm processing completion, use request-reply or have B actively send a message to A's mailbox. For deduplication, include dedup_id at send time; the server passes it through to the consumer side for consistent dedup semantics.
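
A sketch of the recommended pattern, with Agent B deduplicating on dedup_id and confirming completion by sending a message to A's mailbox. The header keys and the dictionary-based message shape are assumptions made for illustration; only dedup_id and reply_to are named in this document.

```python
mailboxes = {"agent-a": [], "agent-b": []}
seen_dedup_ids = set()  # B's own record of work it has already done


def send(address, payload, headers=None):
    mailboxes[address].append({"headers": headers or {}, "payload": payload})


def process_next(address):
    """Agent B's loop body: dedupe, do the work, then confirm by replying."""
    message = mailboxes[address].pop(0)
    headers = message["headers"]
    dedup_id = headers.get("dedup_id")
    if dedup_id is not None and dedup_id in seen_dedup_ids:
        return "skipped"
    if dedup_id is not None:
        seen_dedup_ids.add(dedup_id)
    # ... business logic would run here ...
    if "reply_to" in headers:
        # "Processed" is signalled the way email does it: by replying.
        send(headers["reply_to"], f"done: {message['payload']}")
    return "processed"


request_headers = {"dedup_id": "task-42", "reply_to": "agent-a"}
send("agent-b", "resize image", request_headers)
send("agent-b", "resize image", request_headers)  # duplicate delivery
results = [process_next("agent-b"), process_next("agent-b")]
```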

Mailbox Permission Granularity

Why it matters: mail_address is full permission — are there scenarios that need separate read vs. write permissions, or is that never necessary?

Answer: Currently no fine-grained permissions. Possessing mail_address grants full permissions — read, write, delete. The design semantics: messages grow naturally; whoever holds the address is inherently trusted. mail_address can be thought of as a password; possession equals authorization. Introducing a permission system would make the protocol complex, turning it from a communication primitive into an application platform — not the direction mq9 wants to go.

Assessment: Reasonable. Consistent with the overall design of mail_address as credential. For scenarios requiring access control, the upper-layer application manages mail_address distribution and confidentiality; the protocol doesn't intervene.

Can mailbox Addresses Be Renamed

Why it matters: As an externally published identity, can mail_address be updated as an Agent's responsibilities change?

Answer: No rename capability. mail_address is immutable once created, whether user-defined or system-generated. The semantic meaning of the address is carried by the description written at registration time in PUBLIC.LIST; descriptions can be updated at any time while the address itself remains stable.

Assessment: Reasonable. An immutable address can serve as a stable identity anchor. When semantics need adjustment, update the description — no need to change the address, and no impact on other Agents that already know this address.

Message Forwarding

Why it matters: When an Agent receives a message and forwards it to another Agent, does the protocol layer need to support this, or should the Agent handle it itself?

Answer: The protocol layer has no forwarding primitive. This is the Agent's job — receive the message, read the payload, decide whether to forward, to whom, and whether to preserve original sender information (write it into the new message's payload or header). Under the current semantics, Agents can fully implement this capability themselves.

Assessment: Reasonable. Forwarding is business semantics, not transport semantics. Adding a protocol-layer forwarding primitive would introduce ambiguity about "who is the real sender" and burden mq9 with unnecessary complexity.
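
Agent-side forwarding might look like the sketch below. The message shape and the original_sender header are hypothetical; the protocol defines neither, which is exactly why the forwarding Agent decides what to preserve.

```python
def forward(message, forwarder, new_recipient, outbox):
    """Copy the payload to a new recipient and record the original sender in a header."""
    headers = dict(message["headers"])  # copy so the received message is not mutated
    # setdefault keeps the first sender across multiple hops.
    headers.setdefault("original_sender", message["sender"])
    forwarded = {"sender": forwarder, "headers": headers, "payload": message["payload"]}
    outbox.setdefault(new_recipient, []).append(forwarded)
    return forwarded


outbox = {}  # stands in for sending to each recipient's mailbox
original = {"sender": "agent-a", "headers": {}, "payload": "quarterly report"}
hop1 = forward(original, "agent-b", "agent-c", outbox)
hop2 = forward(hop1, "agent-c", "agent-d", outbox)
```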

Relationship Between mq9 and A2A

Why it matters: Is the dependency truly one-directional, or will A2A eventually impose new protocol requirements on mq9?

Answer: Currently one-directional — mq9 is the transport layer, A2A is the application layer, A2A runs on top of mq9, and mq9 is unaware of A2A semantics. As the two deepen their integration, A2A as a standard protocol may impose new requirements on the transport layer, at which point mq9 may need targeted evolution. But boundaries are clear for now; no premature design.

Assessment: The current one-directional layering is correct — don't couple prematurely. Wait until A2A integration scenarios actually run at scale, then see what real requirements feed back to the transport layer. Designing based on concrete scenarios then is far more accurate than guessing now.

Many-to-One Topology (Aggregator Mailbox)

Why it matters: Task collection, voting, result aggregation — multiple Agents writing to the same mailbox while one consumer processes and aggregates. Is this natively supported?

Answer: Natively supported. Multiple Agents that know the same mail_address can all write to it; a consumer that knows the address can subscribe and read. Mailboxes have no "who can write" restriction — knowing the address is sufficient. Many-to-one topology requires no additional design.

Assessment: This is a natural dividend of the mail_address-as-credential design. Broadcast (one-to-many), point-to-point (one-to-one), and aggregation (many-to-one) topologies are all determined by the distribution scope of the address; the protocol is unaware.

Consumption Order in Aggregator Mailboxes

Why it matters: When multiple Agents write concurrently to the same mailbox, is the order the consumer sees deterministic?

Answer: The kernel records messages in write-arrival order with monotonically increasing offsets, and the consumer sees them in that same order. First-come-first-served, deterministic. Worth noting: offset is an internal kernel concept not exposed at the mq9 protocol layer; Agents don't see offsets, only the delivery order of messages.

Assessment: This design is clean. Offsets ensure ordering internally; offsets are not exposed externally; Agents don't need to understand this concept. Ordering is naturally determined by write timing — no extra protocol-level convention required.
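
A small sketch of how concurrent writers end up in one deterministic order, assuming the kernel serializes appends. AggregatorMailbox is a hypothetical stand-in, and the offsets method exists only so the example can inspect internal state that the protocol itself never exposes.

```python
import threading


class AggregatorMailbox:
    def __init__(self):
        self._lock = threading.Lock()
        self._log = []  # (offset, sender, payload) in arrival order

    def write(self, sender, payload):
        with self._lock:
            # Arrival order is fixed here, under the lock.
            self._log.append((len(self._log) + 1, sender, payload))

    def read_all(self):
        # Consumers see delivery order only, never offsets.
        with self._lock:
            return [(sender, payload) for _, sender, payload in self._log]

    def offsets(self):
        with self._lock:
            return [offset for offset, _, _ in self._log]


box = AggregatorMailbox()


def vote(name):
    for n in range(50):
        box.write(name, n)


workers = [threading.Thread(target=vote, args=(f"agent-{i}",)) for i in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

How writes from different Agents interleave depends on timing, but every consumer reads the same single sequence, and each writer's own messages stay in the order it sent them.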

Write Permissions for PUBLIC.LIST

Why it matters: Can anyone freely register to PUBLIC.LIST? This determines the trustworthiness of the public discovery mechanism.

Answer: Clients cannot write directly to PUBLIC.LIST. The correct approach is to specify public=true at mailbox creation; the server automatically writes that mailbox into PUBLIC.LIST. Agents have only consume permissions on PUBLIC.LIST, not write permissions. This eliminates the two-step "create then register" process — creation is registration.

public is tenant-scoped; each tenant has its own independent public, completely isolated between tenants. Subscribing to public only shows public mailboxes within your own tenant — other tenants' mailboxes are invisible. This behavior is naturally guaranteed by the connection-layer tenant isolation, requiring no extra protocol-layer handling.

public is a system-reserved address; user attempts to create a mailbox with this name are directly rejected. System-reserved addresses must be listed in the protocol documentation so users know which names are unavailable.

Assessment: This design is excellent. The server controls writes; clients cannot fabricate registrations; every record in PUBLIC.LIST is inherently trustworthy — each corresponds to a real, existing mailbox. The server synchronously cleans up records when mailboxes are deleted, resolving the stale-record problem. The tenant-scoped isolation of public is mandatory and must be clearly documented.
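
The two server-side checks implied here, rejecting creation of a reserved name and rejecting client writes to it, can be sketched as follows. The function names and exception types are hypothetical, and the reserved set contains only the one address this document names.

```python
RESERVED_ADDRESSES = {"public"}  # per the text, this list is expected to grow


class ReservedAddressError(Exception):
    pass


def validate_create(mail_address):
    if mail_address in RESERVED_ADDRESSES:
        raise ReservedAddressError(f"{mail_address!r} is system-reserved")
    return mail_address


def validate_client_write(mail_address):
    # Clients may consume from reserved addresses but never write to them.
    if mail_address in RESERVED_ADDRESSES:
        raise PermissionError(f"clients cannot write to {mail_address!r}")
    return mail_address


def attempt(check, mail_address):
    try:
        check(mail_address)
        return "accepted"
    except (ReservedAddressError, PermissionError) as exc:
        return f"rejected: {exc}"


outcomes = [
    attempt(validate_create, "public"),
    attempt(validate_create, "payment-agent"),
    attempt(validate_client_write, "public"),
]
```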

Mailbox Identity Semantics and Address Naming

Why it matters: An Agent can have multiple mailboxes. How do external Agents structurally find its primary identity without guessing from descriptive text?

Answer: At mailbox creation, the Agent may specify the address; system auto-generation is not required. Agents can use meaningful names as their primary address (e.g., payment-agent), while other temporary mailboxes use system-generated {uuid}. Identity semantics are carried by the address naming convention; the protocol doesn't introduce type or role fields, keeping semantics simple.

Assessment: This direction is right. Custom addresses make an Agent's primary identity immediately apparent without additional structured fields. Naming conventions can be documented in the A2A layer or usage documentation; mq9 protocol doesn't need to be aware. Uniqueness is enforced within the tenant — creating a duplicate-named address within the same tenant results in an error; different tenants may have identically named addresses with clear isolation boundaries.

Consistency Model Under Failure Scenarios

Why it matters: When the underlying NATS cluster experiences a network partition, does mq9 prioritize availability or consistency? What can Agents rely on?

Answer: Consistency first — similar to Kafka's data consistency semantics. A successful write means the data is persisted; there's no scenario where partitioned writes cause data to diverge. During a network partition, writes fail rather than accumulating independently on two sides and causing offset conflicts. Agents can trust the "write succeeded" confirmation.

Assessment: This is the right choice. The CP model makes Agent behavior predictable: either the write succeeds or it explicitly fails, with no ambiguous "maybe succeeded, maybe not" state. For Agent communication, predictability matters more than high availability. An Agent that receives a failure can retry, but an Agent that receives a false success has no signal that anything went wrong, which is far harder to recover from.
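
One common way to get "fail rather than diverge" is a majority-quorum write. The toy sketch below shows that idea and the retry loop it allows on the Agent side. It is an assumption for illustration only: this document does not state how mq9 or the underlying cluster achieves its consistency, and QuorumLog, WriteFailed, and send_with_retry are hypothetical names. Replica catch-up after a partition is deliberately omitted.

```python
class WriteFailed(Exception):
    pass


class QuorumLog:
    """Toy CP log: a write succeeds only if a majority of replicas are reachable."""

    def __init__(self, replica_count):
        self.replicas = [[] for _ in range(replica_count)]
        self.reachable = set(range(replica_count))

    def write(self, payload):
        if len(self.reachable) * 2 <= len(self.replicas):
            # Minority side of a partition: reject instead of accepting a divergent write.
            raise WriteFailed("no quorum: write rejected, nothing persisted")
        for i in self.reachable:
            self.replicas[i].append(payload)


def send_with_retry(log, payload, attempts, on_retry=lambda: None):
    for attempt in range(attempts):
        try:
            log.write(payload)
            return attempt + 1  # number of attempts it took
        except WriteFailed:
            on_retry()  # e.g. back off; here it lets the example heal the partition
    raise WriteFailed(f"gave up after {attempts} attempts")


log = QuorumLog(3)
log.reachable = {0}  # partition: only a minority is reachable


def heal():
    log.reachable = {0, 1, 2}


attempts_used = send_with_retry(log, "hello", attempts=3, on_retry=heal)
```

Because the failed attempt persisted nothing, the retry cannot create a duplicate in this sketch. In a real system, a timeout would still leave the outcome unknown to the sender, which is where dedup_id remains useful.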

Overall Assessment

Positioning

mq9's positioning is very clear: complete infrastructure for Agent asynchronous communication. The protocol is minimal, capabilities are complete, and complexity is absorbed by mq9 itself rather than transferred to Agents.

This positioning fills a genuine gap. Today's Agent frameworks either use MQ (too heavy, not designed for Agents), hand-write async logic (every project reinvents the wheel), or lack persistent communication capabilities entirely. mq9 is the first infrastructure designed with "Agent communication" as a first-class citizen.

Strengths

1. The right abstraction was chosen. Mailbox is the only abstraction that naturally carries all three characteristics: identity, memory, and intermittent connectivity. mail_address is identity, message persistence is memory, and offline buffering + ordered replay handles intermittent connectivity.

2. Queue name design is the stroke of genius. One parameter covers three semantics: broadcast, shared subscription, and independent offset. No additional concepts; Agents understand it intuitively.

3. Subject suffix extends priority, header extends delay and TTL — zero protocol changes. Regular messages send directly; add a subject suffix for priority; add a header for delay — backward compatible, progressively upgradable.

4. Minimal protocol, built-in complexity. Priority scheduling, offset management, TTL cleanup — all this complexity is absorbed inside mq9; the interface Agents see is minimal.

5. The Email analogy is accurate. mail_address as credential, no ACL, no forced idempotency, resendable and deletable. This mental model is friendly to Agent developers, with no message-queue cognitive overhead.

Areas Needing Further Refinement

1. Message list query lacks max_count. Currently returns everything; a large mailbox is a performance risk. A default upper limit (e.g., latest 100 messages) is recommended. The protocol doesn't need pagination, but a protective limit is needed.

2. Message-level header query not yet supported. In large-message scenarios (10MB–30MB), Agents cannot inspect headers before deciding whether to pull the full payload. Worth supporting later.

3. Protocol documentation lacks "this is a design choice, not a defect" statements. No ack, no forced idempotency, no guarantee against duplicate delivery: if these aren't stated clearly and positively in the documentation, users will assume they're unfinished features rather than intentional decisions. This is critical to mq9 adoption.

4. message_id type is finalized as u64, mailbox-unique and monotonically increasing, and must be formally documented. Write idempotency is separately achieved by including a string dedup_id in the message header. The two concepts must be clearly distinguished in the documentation to avoid confusion.

5. The tenant-isolation semantics of public must be explicitly stated in documentation. Each tenant has its own independent public; Agents can only see public mailboxes within their own tenant. This behavior is not intuitive to users and must be stated clearly and positively in the documentation.

6. The system-reserved address list must be maintained as the protocol evolves. Currently public is a known reserved address; more system-reserved addresses will be added. Each time one is added, the reserved address list in the protocol documentation must be updated in sync.
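
The protective limit suggested in item 1 could be as small as the sketch below. list_messages and DEFAULT_MAX_COUNT are hypothetical names, and the default of 100 is taken from the example figure in that item rather than from any finalized value.

```python
DEFAULT_MAX_COUNT = 100  # the example upper limit suggested in item 1


def list_messages(messages, max_count=DEFAULT_MAX_COUNT):
    """Return at most max_count of the latest messages, oldest first."""
    if max_count <= 0:
        return []  # guard: messages[-0:] would otherwise return everything
    return messages[-max_count:]


large_mailbox = list(range(250))
latest = list_messages(large_mailbox)
```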

This document serves as the baseline for protocol design. Each future refinement can come back here to check: which questions have been resolved, which answers have changed — all worth updating here.
