Meta Service Architecture

Technology Stack

gRPC + Multi Raft (openraft) + RocksDB

  • All communication, both between nodes and with external clients, uses gRPC
  • Multi Raft is implemented using openraft, providing multi-node data consistency
  • RocksDB persists all data, including Raft logs and snapshots

Responsibilities

Responsibility | Description
Cluster coordination | Node discovery, join/leave management, inter-node data distribution
Metadata storage | Broker node info, Topic configuration, Schema, Connector configuration, Storage Engine shard metadata
KV business data | MQTT Session, retained messages, will messages, subscriptions, ACL, blocklist, and other runtime data
Consumer offsets | Consumer group offset commit and management
Controller | Session expiry cleanup, Last Will delayed delivery, Storage Engine GC, Connector task scheduling

Multi Raft Architecture

Meta Service runs three independent Raft Groups. Each Group has its own Leader and storage, and they operate in parallel without blocking each other:

Raft Group | Stored Data
metadata | Cluster node info, general KV storage, Schema, resource configuration, Storage Engine Shard/Segment metadata
offset | Consumer group offset commits and management
mqtt | Users, Topics, Sessions, retained messages, will messages, subscriptions, ACL, blocklist, Connectors, shared subscription group Leaders
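
As a rough illustration of how the three groups partition data, the sketch below maps a kind of data to the Raft Group that stores it. The names `DATA_KIND_TO_GROUP` and `group_for`, and the data-kind keys, are hypothetical and chosen for this example only; they are not RobustMQ APIs. The groupings themselves follow the table above.

```python
# Hypothetical lookup table: data kind -> Raft Group name.
# The assignments mirror the "Raft Group | Stored Data" table; the key
# strings themselves are made up for this sketch.
DATA_KIND_TO_GROUP = {
    "cluster_node": "metadata",
    "kv": "metadata",
    "schema": "metadata",
    "resource_config": "metadata",
    "shard": "metadata",
    "segment": "metadata",
    "consumer_offset": "offset",
    "user": "mqtt",
    "topic": "mqtt",
    "session": "mqtt",
    "retained_message": "mqtt",
    "will_message": "mqtt",
    "subscription": "mqtt",
    "acl": "mqtt",
    "blocklist": "mqtt",
    "connector": "mqtt",
    "shared_sub_leader": "mqtt",
}


def group_for(data_kind: str) -> str:
    """Return the Raft Group that would own the given kind of data."""
    try:
        return DATA_KIND_TO_GROUP[data_kind]
    except KeyError:
        raise ValueError(f"unknown data kind: {data_kind!r}") from None
```

Because each group has its own Leader and storage, a write routed to `offset` does not wait behind writes queued for `mqtt` or `metadata`.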

[Figure: Meta Service Multi Raft Architecture]

Raft Parameters:

Parameter | Value
heartbeat_interval | 250ms
election_timeout_min | 299ms
write_timeout | 30s (configurable)
Slow write warning threshold | 1000ms

Write Path

A write that takes longer than write_timeout (default 30s) returns an error. A write that takes longer than 1000ms is logged at warn level.
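
The two thresholds can be sketched as a wrapper around a write call. This is a minimal illustration and not the Meta Service implementation: `timed_write`, `WriteTimeoutError`, and the `propose` callable are hypothetical names, and a thread pool stands in for the real asynchronous Raft write.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

log = logging.getLogger("meta_service_sketch")

DEFAULT_WRITE_TIMEOUT_SECS = 30.0  # write_timeout default from the table
SLOW_WRITE_WARN_SECS = 1.0         # 1000ms slow write warning threshold


class WriteTimeoutError(Exception):
    """Raised when a write does not finish within write_timeout."""


# Stand-in for the real async runtime (assumption of this sketch).
_executor = ThreadPoolExecutor(max_workers=4)


def timed_write(propose, payload,
                write_timeout=DEFAULT_WRITE_TIMEOUT_SECS,
                slow_warn_secs=SLOW_WRITE_WARN_SECS):
    """Run propose(payload); fail on timeout, warn when it was slow."""
    start = time.monotonic()
    future = _executor.submit(propose, payload)
    try:
        result = future.result(timeout=write_timeout)
    except FutureTimeout:
        # The caller gets an error; the background call is not cancelled here.
        raise WriteTimeoutError(
            f"write did not complete within {write_timeout}s") from None
    elapsed = time.monotonic() - start
    if elapsed > slow_warn_secs:
        log.warning("slow write: %.0f ms", elapsed * 1000)
    return result
```

A write that finishes in 2s would succeed but leave a warn entry; one that is still pending after 30s would surface as an error to the caller.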

[Figure: Meta Service Write Path]

Data Storage

  • Raft logs: Stored in RocksDB, fully recovered after node restart
  • Raft snapshots: Generated periodically to compact logs and accelerate node recovery
  • Business data: Written to the corresponding RocksDB Column Family via DataRoute
  • In-memory cache: CacheManager maintains a hot data cache to reduce RocksDB read pressure; cold data is read from and written to RocksDB directly
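
The hot/cold split can be illustrated with a small read-through cache. This is a sketch under assumptions: the class `HotCache` is hypothetical, a plain dict stands in for a RocksDB Column Family, and LRU eviction is an assumed policy, since the text above does not say how CacheManager decides what counts as hot.

```python
from collections import OrderedDict


class HotCache:
    """LRU read-through cache in front of a slower key-value store."""

    def __init__(self, store, capacity=1024):
        self._store = store        # dict standing in for a RocksDB Column Family
        self._capacity = capacity  # max number of hot entries (assumed policy)
        self._hot = OrderedDict()  # least recently used entry comes first
        self.store_reads = 0       # counts reads that reached the store

    def get(self, key):
        if key in self._hot:
            # Hot path: serve from memory and mark as recently used.
            self._hot.move_to_end(key)
            return self._hot[key]
        # Cold path: read from the store, then keep the value hot.
        self.store_reads += 1
        value = self._store.get(key)
        if value is not None:
            self._hot[key] = value
            while len(self._hot) > self._capacity:
                self._hot.popitem(last=False)  # evict least recently used
        return value

    def put(self, key, value):
        # Persist first, then refresh any cached copy so reads stay consistent.
        self._store[key] = value
        if key in self._hot:
            self._hot[key] = value
            self._hot.move_to_end(key)
```

Repeated reads of the same key hit the store only once; keys that fall out of the hot set are served from the store again on their next read.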

Controller (BrokerController)

After the Leader node starts, it runs BrokerController, which handles background scheduling:

Background Task | Description
Session expiry cleanup | Periodically scans for expired Sessions and cleans up associated data
Last Will delayed delivery | Detects due will messages and triggers delivery to the Broker
Storage Engine GC | Cleans up residual data from deleted Shards / Segments
Connector scheduling | Creates, assigns, and tracks the status of Connector tasks
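
One of these tasks, session expiry cleanup, might look like the periodic sweep below. It is a sketch only: the `Session` fields, `sweep_expired_sessions`, `controller_tick`, and the `on_expire` callback are hypothetical names, and an in-memory dict stands in for the stored session data.

```python
from dataclasses import dataclass


@dataclass
class Session:
    client_id: str
    expires_at: float  # Unix timestamp in seconds (assumed representation)


def sweep_expired_sessions(sessions, now, on_expire):
    """Remove sessions whose expiry time has passed; return their client ids."""
    expired = [cid for cid, s in sessions.items() if s.expires_at <= now]
    for cid in expired:
        # on_expire is where associated data would be cleaned up.
        on_expire(sessions.pop(cid))
    return expired


def controller_tick(is_leader, sessions, now, on_expire):
    """Run one scheduling pass; background tasks only run on the Leader."""
    if not is_leader:
        return []
    return sweep_expired_sessions(sessions, now, on_expire)
```

Calling `controller_tick` on a timer gives the "periodically scans" behavior; a non-Leader node returns immediately without touching any session.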

Startup Sequence

  1. Read meta_addrs from configuration to obtain all Meta Node addresses
  2. Initialize MultiRaftManager, creating the metadata, offset, and mqtt Raft Groups in sequence
  3. Establish gRPC connections to all nodes, completing cluster initialization and leader election
  4. Leader node starts BrokerController
  5. Meta Service is ready and begins serving gRPC requests to Broker and Storage Engine
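
The ordering of these steps can be sketched as a single function. Everything here is illustrative: `start_meta_service`, `connect`, and `elect_leader` are hypothetical names, `meta_addrs` is assumed to be a mapping from node id to address, and leader election is reduced to one callback even though each Raft Group elects its own Leader.

```python
RAFT_GROUP_ORDER = ("metadata", "offset", "mqtt")


def start_meta_service(meta_addrs, local_node_id, connect, elect_leader):
    """Walk through the five startup steps in order and return an event log."""
    events = []

    # 1. Read the Meta Node addresses from configuration.
    if not meta_addrs:
        raise ValueError("meta_addrs must list at least one Meta Node")
    events.append(f"loaded {len(meta_addrs)} meta node address(es)")

    # 2. Create the Raft Groups in sequence.
    for group in RAFT_GROUP_ORDER:
        events.append(f"created raft group {group}")

    # 3. Connect to every node, then run leader election (simplified).
    for node_id, addr in sorted(meta_addrs.items()):
        connect(node_id, addr)
    leader_id = elect_leader(sorted(meta_addrs))
    events.append(f"leader elected: node {leader_id}")

    # 4. Only the Leader starts BrokerController.
    if leader_id == local_node_id:
        events.append("started BrokerController")

    # 5. Ready to serve gRPC requests.
    events.append("serving gRPC requests")
    return events
```

On a follower the log simply omits the BrokerController entry; every other step is the same on all nodes.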

Comparison with ZooKeeper / etcd

Dimension | ZooKeeper | etcd | Meta Service
Consensus protocol | ZAB (single Leader) | Single Raft | Multi Raft
Storage | All in-memory | BoltDB | RocksDB
Scalability | Limited by memory | Limited by single Raft | Each Raft Group scales independently
Feature scope | Metadata coordination | Metadata coordination | Metadata + KV storage + Controller
External dependency | Yes | Yes | No (built-in)