RobustMQ Supporting Vector Search and Full-Text Search: Some Thoughts
The Starting Point
Working on mq9 recently, I ran into a problem I couldn't avoid — the Agent registry needs to support semantic search.
In the A2A protocol, Agents describe their capabilities in natural language (the description and skills fields of an AgentCard are all natural language). When a client looks for an Agent, it also uses natural language ("I want an Agent that can write Rust"). Both sides speak natural language, so you can't rely solely on exact tag matching in the middle — semantic vector matching is necessary.
This requirement put vector search on RobustMQ's agenda.
But thinking it through more carefully, the problem isn't limited to the mq9 registry. RobustMQ's design principle has always been "protocol-neutral + no message content parsing" — the broker doesn't know the message format and has no built-in search capability. If users want to search message content, they connect to Elasticsearch, Qdrant, or something else themselves.
So should we take this opportunity to build search capability as a component within RobustMQ?
I went back and forth on this over several rounds, and my judgment evolved each time. This post documents that process.
Round One: Should We Do This At All?
My initial judgment was no. The reasoning was straightforward — infrastructure projects are most at risk from scope creep. Elasticsearch and Qdrant are mature solutions already. RobustMQ reinventing the wheel makes no sense. Focus on core message broker capabilities; let users choose their own search tools.
This judgment was correct before mq9 came along. But the mq9 registry requirement changed the situation.
The Agent registry must have vector search. Agents describe capabilities in natural language, clients query in natural language — without semantic matching in the middle, the whole thing falls apart. This isn't a nice-to-have enhancement; it's a problem mq9 fundamentally must solve.
To support it, RobustMQ has to bring in a vector search technology stack. Once that stack is introduced, it becomes part of RobustMQ.
So the question shifted from "should we do this" to "since we have to, what's the most sensible way to do it."
Round Two: Where Does It Live?
The stack has to come in. Next question: where does it go?
The most direct option is inside the mq9 module — LanceDB + fastembed as implementation details of the mq9 registry, invisible to other modules.
But that approach has problems.
First, a technology stack introduced once should be reused multiple times. The engineering cost of bringing in LanceDB and fastembed is real — learning the API, handling edge cases, version management, documentation. Once that cost is paid, it should benefit every place that needs it, not just one module.
Second, similar requirements will come up repeatedly. Beyond the Agent registry, topic content search, Connector content routing, Bridge content filtering, and future scenarios may all need content-based search. If search capability is locked inside mq9, every scenario has to reinvent the wheel.
Third, mixing it into mq9 makes the mq9 module more complex. mq9 would be responsible for both protocol handling and search indexing, with logic tangled together.
The right direction is to extract search as an independent internal module within RobustMQ.
Round Three: How to Abstract It
Extracting it as an independent module is correct. But how should the abstraction be designed?
A common trap is over-abstraction — adding extension points, configuration options, and plugin mechanisms for hypothetical future users, resulting in a module that can't even serve its first use case well.
The right approach is the opposite: make the current requirements work well, and clean abstraction will emerge naturally.
Concretely:
- The interface is just three basic operations: save / search / delete
- Use namespaces to isolate data from different callers
- Internally use LanceDB + fastembed, but keep a trait abstraction for swappability
- Don't assume anything about how other scenarios will use it
Once this abstraction is done well, it naturally serves multiple scenarios over time — not because we pursued "generality," but because clean abstraction decouples itself from specific use cases.
Generality is a result of good abstraction, not a design goal. That distinction matters.
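To make the shape of that interface concrete, here is a minimal sketch. The trait name, method signatures, `Record` type, and `InMemoryEngine` backend are all assumptions for illustration, not RobustMQ's actual API. The in-memory backend matches by substring only and stands in for a LanceDB + fastembed implementation behind the same trait.

```rust
use std::collections::HashMap;

/// A record handed to the search module: opaque content plus caller-defined metadata.
#[derive(Clone, Debug, PartialEq)]
pub struct Record {
    pub id: String,
    pub content: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical trait: three operations, each scoped by a namespace.
pub trait SearchEngine {
    fn save(&mut self, namespace: &str, record: Record);
    fn search(&self, namespace: &str, query: &str, limit: usize) -> Vec<Record>;
    fn delete(&mut self, namespace: &str, id: &str) -> bool;
}

/// Stand-in backend that matches by case-insensitive substring.
/// A real backend would implement the same trait.
#[derive(Default)]
pub struct InMemoryEngine {
    namespaces: HashMap<String, Vec<Record>>,
}

impl SearchEngine for InMemoryEngine {
    fn save(&mut self, namespace: &str, record: Record) {
        let records = self.namespaces.entry(namespace.to_string()).or_default();
        // Saving an existing id replaces the old record.
        records.retain(|r| r.id != record.id);
        records.push(record);
    }

    fn search(&self, namespace: &str, query: &str, limit: usize) -> Vec<Record> {
        let needle = query.to_lowercase();
        self.namespaces
            .get(namespace)
            .map(|records| {
                records
                    .iter()
                    .filter(|r| r.content.to_lowercase().contains(&needle))
                    .take(limit)
                    .cloned()
                    .collect()
            })
            .unwrap_or_default()
    }

    fn delete(&mut self, namespace: &str, id: &str) -> bool {
        match self.namespaces.get_mut(namespace) {
            Some(records) => {
                let before = records.len();
                records.retain(|r| r.id != id);
                records.len() != before
            }
            None => false,
        }
    }
}
```

A caller such as the mq9 registry would pick its own namespace (say, "mq9.registry") and never see records saved under another one.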
Design Anchor: Serving Agent Discover
Let me state the priorities clearly.
Agent Discover is something that must be done now. Without it, mq9 doesn't work. Its characteristics are:
- Small data volume (tens of thousands to hundreds of thousands of AgentCards)
- Low-frequency writes (Agent registration is infrequent)
- Medium query frequency (DISCOVER calls won't be thousands per second)
- No significant performance or cost pressure
Every design decision is made around this target. A design that serves Agent Discover well is a good design; complexity beyond what Agent Discover requires is over-engineering.
Topic content search is another real application scenario, but it's not something for the current engineering phase. Its challenges are in a completely different league — millions of records, high-frequency writes, non-trivial embedding inference costs, doubled storage overhead. These challenges are real and need to be addressed eventually, but after Agent Discover is running smoothly.
Short-term focus on Agent Discover; long-term capable of serving topic search — these two are not in conflict. The key is getting the abstraction right and the implementation solid.
A Critical Design Decision: Don't Parse Messages
The most important design decision in this whole thing deserves its own section.
One of RobustMQ's core principles has always been "the broker does not parse message content" — message bodies are byte arrays, and the broker doesn't know whether they're JSON, Protobuf, or anything else. This is the foundation of RobustMQ's protocol neutrality.
After adding search capability, should we break this principle?
One approach would be to break it — let users declare in topic configuration "messages are JSON, index the $.payload.text field." RobustMQ would parse messages according to schema and extract fields for indexing.
But this approach has problems: it assumes messages are JSON, requires users to maintain schemas, adds "semi-parsing" logic to the broker, and requires different adapters for different message formats.
The other approach is to not break it — index message content as a whole:
- Full-text index: take payload bytes, tokenize them, build a BM25 inverted index
- Vector index: treat the entire payload as a text passage, pass it to an embedding model to generate a vector
RobustMQ doesn't know what fields are in the payload, and doesn't need to. JSON, Protobuf, plain text — they're all treated the same.
The trade-off is precision — the semantics of structured fields get submerged in the overall text. But that trade-off is acceptable:
- Full-text search is fuzzy matching by nature; if the keyword appears, it's a hit
- Vector search looks for semantic similarity; the overall meaning of the text is roughly right
- If you need exact field matching ("find messages where status=created"), that's SQL or application logic — not the job of SearchEngine
For the infrastructure layer, generality is more important than precision. This judgment determines why the SearchEngine interface is simple — no schema concept, no field paths, no type inference. The caller passes "content to index + metadata," and SearchEngine handles it.
This decision also keeps RobustMQ's core principle of protocol neutrality intact. Search capability is layered on top of the principle, not a modification of it.
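As a rough illustration of whole-payload full-text indexing, the sketch below tokenizes raw bytes without knowing their format and scores them with the standard BM25 formula. Everything here (`tokenize`, `Bm25Index`, the constants k1 = 1.2 and b = 0.75) is a hypothetical stand-in written for this post. In the real module the tokenization and scoring would come from LanceDB's FTS, not hand-written code.

```rust
use std::collections::HashMap;

/// Split an opaque payload into lowercase alphanumeric tokens.
/// Invalid UTF-8 is replaced rather than rejected; no format is assumed.
pub fn tokenize(payload: &[u8]) -> Vec<String> {
    String::from_utf8_lossy(payload)
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Toy BM25 index over whole payloads.
#[derive(Default)]
pub struct Bm25Index {
    docs: Vec<Vec<String>>,
    doc_freq: HashMap<String, usize>,
    total_len: usize,
}

impl Bm25Index {
    pub fn new() -> Self {
        Self::default()
    }

    /// Index one payload and return its document id.
    pub fn add(&mut self, payload: &[u8]) -> usize {
        let tokens = tokenize(payload);
        let mut unique = tokens.clone();
        unique.sort();
        unique.dedup();
        for term in unique {
            *self.doc_freq.entry(term).or_insert(0) += 1;
        }
        self.total_len += tokens.len();
        self.docs.push(tokens);
        self.docs.len() - 1
    }

    /// BM25 score of one document for a query. Panics if `doc_id` is unknown.
    pub fn score(&self, doc_id: usize, query: &str) -> f64 {
        const K1: f64 = 1.2;
        const B: f64 = 0.75;
        if self.total_len == 0 {
            return 0.0;
        }
        let doc = &self.docs[doc_id];
        let n_docs = self.docs.len() as f64;
        let avg_len = self.total_len as f64 / n_docs;
        let doc_len = doc.len() as f64;
        tokenize(query.as_bytes())
            .iter()
            .map(|term| {
                let tf = doc.iter().filter(|t| *t == term).count() as f64;
                let df = *self.doc_freq.get(term).unwrap_or(&0) as f64;
                let idf = (1.0 + (n_docs - df + 0.5) / (df + 0.5)).ln();
                idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * doc_len / avg_len))
            })
            .sum()
    }
}
```

A JSON payload and a plain-text payload that both contain "shipped" each get a positive score for that query. JSON keys such as "status" become ordinary tokens, which is exactly the precision trade-off described above.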
Capability Scope: Four Basic Search Types
To make "good enough" concrete, there are four search capabilities:
- By key: exact match on an identifier
- By tag: exact match on a tag or tag set
- Full-text: BM25-based tokenized inverted index
- Vector: embedding-based semantic matching
These four can be combined (filter by tag, then do vector search). But nothing beyond these four will be built:
- No complex query syntax (nested bool, function score, etc.)
- No aggregation analysis (group by, statistics, histograms)
- No advanced scoring (reranking, cross-encoder, custom scoring)
- No large-scale indexing (tens of millions of records and above)
For those needs, use Elasticsearch, Solr, or other specialized search engines.
The four basic capabilities cover most search scenarios in a messaging system. Doing them well is enough; trying to do more is not the goal.
This scope is permanent. Even as SearchEngine evolves, capability expansion will only deepen within the four basic types (supporting more tokenizers, tuning vector index parameters, supporting nested structures in tag search, etc.) — no new capability categories will be added.
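The "filter by tag, then do vector search" combination can be sketched in a few lines. This is a brute-force version under assumed names (`Entry`, `search_by_tag_then_vector`), with cosine similarity as the assumed metric. It is meant to show the order of operations, not how LanceDB executes a filtered query.

```rust
/// One indexed item: a key, exact-match tags, and an embedding vector.
pub struct Entry {
    pub key: String,
    pub tags: Vec<String>,
    pub vector: Vec<f32>,
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Keep only entries carrying `tag`, then rank the survivors by similarity.
pub fn search_by_tag_then_vector<'a>(
    entries: &'a [Entry],
    tag: &str,
    query: &[f32],
    limit: usize,
) -> Vec<(&'a str, f32)> {
    let mut hits: Vec<(&'a str, f32)> = entries
        .iter()
        .filter(|e| e.tags.iter().any(|t| t == tag))
        .map(|e| (e.key.as_str(), cosine(&e.vector, query)))
        .collect();
    // Highest similarity first.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits.truncate(limit);
    hits
}
```

The exact-match filter runs first, so the vector comparison only touches entries that already satisfy the tag condition.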
Technology Choices
I won't go through the full technology selection here, but a few key decisions stand out:
LanceDB as the vector store and FTS engine.
LanceDB is an embedded vector database, pure Rust implementation. One line in Cargo and it's available, data stored in a local directory, no external processes needed. This aligns with RobustMQ's overall philosophy of "single binary, zero external dependencies."
LanceDB supports both vector search and full-text search (based on Tantivy, BM25 scoring) in a single component. The reason for not choosing Qdrant, Weaviate, or Milvus is that they're all server-based, requiring a separate deployed service that adds operational complexity.
fastembed-rs as the default embedding model.
A Rust embedding library with built-in mainstream open-source models (BGE, Jina, E5) that runs CPU inference through ONNX Runtime. No external API calls.
Default is BGE-small (130MB, 384 dimensions). Configuration override is supported (use a larger model or an external embedding service).
The default implementation aims for "good enough," not optimal.
Small embedding model (BGE-small rather than BGE-M3). Simple indexing strategy (no index for small data volumes, IVF_FLAT only above a threshold). No complex query optimization or caching strategies.
The reason: SearchEngine serves RobustMQ's internal routine search needs, not the high-intensity scenarios of a specialized search engine. Good enough is fine. If someone needs higher precision or larger scale, they can override the defaults through configuration or plug in a specialized component.
Index failures do not affect primary storage writes.
Embedding inference can fail (model loading failure, insufficient memory, etc.). Index writes can fail (LanceDB issues, etc.).
The principle: indexing failures are secondary errors — log them, possibly retry, but the message write itself cannot be rolled back because of them. This principle ensures that RobustMQ's core capability (reliable message storage) is never dragged down by the enhanced capability (search).
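A minimal sketch of that principle on the write path, assuming hypothetical `PrimaryStore` and `Indexer` traits (neither is a RobustMQ type). A storage failure propagates to the caller; an index failure is logged and reported but leaves the write in place.

```rust
/// Hypothetical stand-in for primary message storage.
pub trait PrimaryStore {
    fn append(&mut self, payload: &[u8]) -> Result<u64, String>;
}

/// Hypothetical stand-in for the search index.
pub trait Indexer {
    fn index(&mut self, offset: u64, payload: &[u8]) -> Result<(), String>;
}

#[derive(Debug)]
pub struct WriteOutcome {
    pub offset: u64,
    /// Set when indexing failed; the message itself is still stored.
    pub index_error: Option<String>,
}

pub fn write_message(
    store: &mut dyn PrimaryStore,
    indexer: &mut dyn Indexer,
    payload: &[u8],
) -> Result<WriteOutcome, String> {
    // A primary storage failure is a real write failure and propagates.
    let offset = store.append(payload)?;
    // An indexing failure is logged and recorded, never rolled back into the write.
    let index_error = indexer.index(offset, payload).err();
    if let Some(err) = &index_error {
        eprintln!("index write failed for offset {}: {}", offset, err);
    }
    Ok(WriteOutcome { offset, index_error })
}

/// In-memory store used only to exercise the sketch.
#[derive(Default)]
pub struct MemoryStore {
    pub messages: Vec<Vec<u8>>,
}

impl PrimaryStore for MemoryStore {
    fn append(&mut self, payload: &[u8]) -> Result<u64, String> {
        self.messages.push(payload.to_vec());
        Ok(self.messages.len() as u64 - 1)
    }
}

/// Indexer that always fails, simulating a model that could not load.
pub struct FailingIndexer;

impl Indexer for FailingIndexer {
    fn index(&mut self, _offset: u64, _payload: &[u8]) -> Result<(), String> {
        Err("embedding model failed to load".to_string())
    }
}
```

With `FailingIndexer`, `write_message` still returns `Ok` and the message stays in `MemoryStore`. A retry policy could hang off `index_error`, but that part is left open here.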
Index writes are async by default.
FTS indexing is fast (tokenization + inverted index, milliseconds). Vector indexing is slow (embedding inference: 30–150ms). Synchronous indexing would multiply write latency several times over.
Default is async — messages are written to primary storage and returned immediately; indexing happens in the background. The trade-off is a window (seconds to tens of seconds) where newly saved messages aren't yet searchable. Scenarios requiring strictly real-time search can configure synchronous mode.
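The async default can be pictured as a queue between the write path and a background worker. The sketch below uses a standard-library channel and thread; `AsyncIndexer` and `IndexJob` are hypothetical names, and the shared `indexed` list merely stands in for the real index so the effect is observable.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Work item queued after primary storage has accepted a message.
pub struct IndexJob {
    pub offset: u64,
    pub payload: Vec<u8>,
}

pub struct AsyncIndexer {
    sender: mpsc::Sender<IndexJob>,
    worker: thread::JoinHandle<()>,
}

impl AsyncIndexer {
    /// Spawn the background worker. `indexed` records which offsets were processed.
    pub fn start(indexed: Arc<Mutex<Vec<u64>>>) -> Self {
        // Unbounded queue for simplicity; a bounded one would add backpressure.
        let (sender, receiver) = mpsc::channel::<IndexJob>();
        let worker = thread::spawn(move || {
            // The loop ends once every sender has been dropped.
            for job in receiver {
                // Tokenization and embedding inference would run here.
                let _payload_len = job.payload.len();
                indexed.lock().unwrap().push(job.offset);
            }
        });
        Self { sender, worker }
    }

    /// Called on the write path; returns immediately without waiting for indexing.
    pub fn enqueue(&self, job: IndexJob) -> bool {
        self.sender.send(job).is_ok()
    }

    /// Close the queue and wait for the backlog to drain.
    pub fn shutdown(self) {
        drop(self.sender);
        let _ = self.worker.join();
    }
}
```

The gap between `enqueue` returning and the worker pushing the offset is the window in which a newly saved message is not yet searchable.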
SearchEngine's Place in RobustMQ's Architecture
Putting this capability back in the context of the overall RobustMQ architecture:
RobustMQ
├─ Protocol Layer (MQTT, Kafka, AMQP, NATS, mq9)
├─ Meta Service (metadata management, including mq9 registry)
├─ Storage Layer (File Segment, RocksDB, Memory)
└─ Search Engine (newly added capability layer)
SearchEngine is an independent module, called by internal modules like Meta Service and Storage Layer. It doesn't belong to the protocol layer (it isn't bound to a specific protocol) or to the storage layer (it doesn't only serve message storage); it's a capability layer at the same level as both.
External users don't interact with SearchEngine directly. What they experience is:
- mq9's AGENT.DISCOVER can do semantic search
- A topic configured with search enabled can query messages through a search API
SearchEngine's existence makes these capabilities natural, but users don't need to face it directly.
First Wave of Use Cases
mq9 registry. Meta Service manages AgentCard metadata. When an AgentCard is registered, Meta Service takes the relevant fields from each skill (name, description, examples, tags), concatenates them into a text passage, and calls SearchEngine.save with that text and metadata (agent_id, mailbox, raw_card, etc.). When a client calls DISCOVER, Meta Service calls SearchEngine.search to get matching records and looks up the raw_card to return.
This is the primary goal for the current phase. Getting it solid is the short-term priority.
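The registration step can be sketched as a pure function from an AgentCard to "content + metadata". The struct layouts below are simplified assumptions, not the A2A schema. Joining all skills into one passage per card is also an assumption; one record per skill would be an equally plausible reading.

```rust
use std::collections::HashMap;

/// Simplified skill entry; field names follow the ones listed above.
pub struct Skill {
    pub name: String,
    pub description: String,
    pub examples: Vec<String>,
    pub tags: Vec<String>,
}

/// Simplified AgentCard as Meta Service might hold it (hypothetical layout).
pub struct AgentCard {
    pub agent_id: String,
    pub mailbox: String,
    pub raw_card: String,
    pub skills: Vec<Skill>,
}

/// Concatenate one skill's fields into a text passage, skipping empty parts.
pub fn skill_to_text(skill: &Skill) -> String {
    let mut parts = vec![skill.name.clone(), skill.description.clone()];
    parts.extend(skill.examples.iter().cloned());
    parts.extend(skill.tags.iter().cloned());
    parts.retain(|p| !p.trim().is_empty());
    parts.join("\n")
}

/// Build the content and metadata that would be passed to SearchEngine.save.
pub fn card_to_index_input(card: &AgentCard) -> (String, HashMap<String, String>) {
    let content = card
        .skills
        .iter()
        .map(skill_to_text)
        .collect::<Vec<_>>()
        .join("\n\n");
    let mut metadata = HashMap::new();
    metadata.insert("agent_id".to_string(), card.agent_id.clone());
    metadata.insert("mailbox".to_string(), card.mailbox.clone());
    metadata.insert("raw_card".to_string(), card.raw_card.clone());
    (content, metadata)
}
```

On DISCOVER, the search result carries the metadata back, so Meta Service can return `raw_card` without SearchEngine ever interpreting it.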
Topic content search. Topics can opt in to search via configuration. Once enabled, Storage Layer writes to both primary storage and SearchEngine when writing a message, passing the payload directly to SearchEngine as content. Search is exposed through a new API.
In the engineering roadmap, this comes after the mq9 registry is stable. It's not the core focus of the current phase, but it's a real scenario that SearchEngine should be able to serve long-term.
Potential future scenarios. Metadata search, Connector content routing, Bridge content filtering, etc. SearchEngine is a foundational capability, called by whoever needs it. These aren't things to do now, but with the right abstraction, they can reuse it directly later.
Questions Still Not Fully Resolved
Being honest — there are a few things about this that aren't fully worked out:
Binary payloads. Plain text can be tokenized and embedded directly. Binary data (images, video, Protobuf) treated as text produces meaningless indices. Possible approach: use heuristics like UTF-8 decode failures or high ratio of non-printable characters to detect binary, and skip indexing when detected. This detection logic lives in the caller; SearchEngine just receives "content to index."
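One way the caller-side heuristic could look, as a sketch: treat a payload as binary if it fails UTF-8 decoding or if too many of its characters are control characters. The function name and the ratio threshold are assumptions. Compact binary formats made mostly of printable bytes would slip through, so this is a filter, not a guarantee.

```rust
/// Returns true when the payload should be skipped for indexing.
/// `max_non_printable_ratio` is a caller-chosen threshold, e.g. 0.1.
pub fn looks_binary(payload: &[u8], max_non_printable_ratio: f64) -> bool {
    // Anything that is not valid UTF-8 is treated as binary.
    let text = match std::str::from_utf8(payload) {
        Ok(t) => t,
        Err(_) => return true,
    };
    let total = text.chars().count();
    if total == 0 {
        return false;
    }
    // Control characters other than common whitespace count as non-printable.
    let non_printable = text
        .chars()
        .filter(|c| c.is_control() && !matches!(*c, '\n' | '\r' | '\t'))
        .count();
    (non_printable as f64 / total as f64) > max_non_printable_ratio
}
```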
Long text. Embedding models have token length limits (BGE-small is 512 tokens). Oversized messages either get truncated or split into chunks. Simple approach is truncation; complex approach is chunking (one message maps to multiple vectors). Default is truncation, with an option to enable chunking.
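The two policies differ only in what happens past the limit, as this sketch shows. It counts whitespace-separated words as "tokens", which is an approximation: real embedding models use subword tokenizers, so actual token counts are usually higher. `LongTextPolicy` and `prepare_for_embedding` are hypothetical names.

```rust
/// How to handle text longer than the embedding model's limit.
pub enum LongTextPolicy {
    /// Keep only the first `max_tokens` tokens (one vector per message).
    Truncate,
    /// Split into consecutive pieces (one message maps to several vectors).
    Chunk,
}

/// Returns the text pieces that would each be embedded separately.
pub fn prepare_for_embedding(
    text: &str,
    max_tokens: usize,
    policy: LongTextPolicy,
) -> Vec<String> {
    // Whitespace words approximate tokens here.
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.is_empty() || max_tokens == 0 {
        return Vec::new();
    }
    match policy {
        LongTextPolicy::Truncate => {
            let end = words.len().min(max_tokens);
            vec![words[..end].join(" ")]
        }
        LongTextPolicy::Chunk => words
            .chunks(max_tokens)
            .map(|chunk| chunk.join(" "))
            .collect(),
    }
}
```

With chunking, every piece needs its own vector and a pointer back to the original message, which is where the extra complexity comes from.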
Replicas. RobustMQ storage uses replicas — does the index also have replicas? Simple approach: each replica builds its own index (N× write overhead). Complex approach: primary builds the index, replicas copy index files. Which path to take depends on RobustMQ's overall high-availability model.
Embedding model upgrades. Once a model is in production, all data's vectors are generated with it. Switching models requires regenerating all vectors — expensive. A likely approach: RobustMQ won't proactively upgrade embedding models; let users decide. The specific mechanism still needs to be figured out.
These questions don't affect the overall direction — they're all implementation details, to be sorted out when we actually start building.
On Boundaries
One last thing to be clear about —
Search is an enhancement to RobustMQ, not a core capability.
RobustMQ's core is communication. MQTT, Kafka, AMQP, NATS, mq9 — five protocols on a unified storage layer — that's the reason RobustMQ exists. Search, rule engines, Connectors are local enhancements around communication that make RobustMQ more useful in certain scenarios.
This boundary matters. If SearchEngine tries to become a general AI data platform, or tries to replace Elasticsearch, RobustMQ becomes something that does everything and masters nothing. That's the most dangerous direction for an infrastructure project — scattering energy across capability expansion while the core degrades.
Holding the boundary means:
Communication is the core. The MQTT, Kafka, AMQP, NATS, and mq9 protocols themselves, the storage layer beneath them, and the protocol-neutral design are the parts that must be done well. Performance, stability, and protocol compatibility cannot be compromised.
SearchEngine is an enhancement. It exists not to replace specialized search engines, but to provide basic search capability for scenarios where "I don't want to deploy multiple components": edge deployments, small and medium businesses, embedded scenarios, development and test environments. In these scenarios, the operational cost of a dedicated component matters more than the capability gap. Users would rather accept passable search quality, roughly 60 out of 100, than deploy a separate system for simple needs.
Rule engines and Connectors are also enhancements. They make RobustMQ usable for data integration and lightweight stream processing, but without trying to beat Flink or Airbyte.
This boundary won't change long-term. Even as SearchEngine evolves and users ask for more search capabilities, expansion will only deepen within the four basic types (by key, by tag, full-text, vector) — not attempt to become a search engine. Rule engines and Connectors follow the same constraint.
This restraint isn't a lack of ambition; it's how infrastructure projects stay alive long-term. Kafka added Streams, Connect, and Schema Registry — but Kafka remains a streaming message platform. PostgreSQL added vector, JSON, and geospatial — but PostgreSQL remains a relational database. Redis added modules, streams, and vector — but Redis remains an in-memory data structure service. All of these projects expanded their capabilities while never changing their fundamental identity. RobustMQ takes the same path — excel at the core, do enhancements appropriately, don't try to do everything.
Overall Cadence
Putting everything above together, the engineering cadence:
Short-term: Build SearchEngine with the goal of getting the mq9 registry's Agent Discover working. Performance, scale, and stability targets are set by Agent Discover's requirements.
Medium-term: After validating that the basic capabilities are solid, expand to topic search. This may require optimizations for the higher load of topic search (async queues, batch processing, etc.).
Long-term: Let actual usage feedback drive the evolution direction. But the capability scope boundary (four basic search types) doesn't change.
As for how far this ultimately goes — whether it only serves the mq9 registry, or actually becomes a stable foundational search component within RobustMQ — that depends on real-world usage. The current judgment is that it's worth doing and the thinking is sound. The rest will be figured out as we build.
