What Is StreamNative Orca Agent Engine

In September 2025, StreamNative released Orca Agent Engine, which is currently in private preview. Orca is not an Agent framework, but an infrastructure layer designed for deploying AI Agents in production. In short, it addresses the question: "How do we take an Agent from a lab prototype to production?"

There are already many Agent development frameworks on the market—LangChain, OpenAI Agents SDK, Google ADK, and others. These help developers quickly build Agent prototypes, but when you want to deploy them to production, you hit many infrastructure questions: How do you keep an Agent online? How do you manage Agent state? How do you coordinate multiple Agents? How do you monitor and govern these autonomous systems? Orca exists to solve these problems.

Core Architecture

Orca's architecture can be understood as event bus + Agent runtime. At the bottom it uses Pulsar or Kafka as a shared event bus; all Agents communicate through it. Agents subscribe to topics they care about, process events, and publish results to other topics. This forms an event-driven interaction network.
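The subscribe-process-publish pattern can be sketched with a toy in-memory bus. This is a minimal illustration of the event-driven interaction network described above, not Orca's API; the topic names and "agents" are hypothetical.

```python
# Toy stand-in for a Pulsar/Kafka event bus: agents subscribe to topics,
# process events, and publish results to other topics.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
results = []

# "Agent" that enriches signup events and republishes them downstream.
def enrich_agent(event):
    bus.publish("signups.enriched", {**event, "plan": "free"})

# Downstream "agent" consuming the enriched events.
def welcome_agent(event):
    results.append(f"welcome {event['user']} ({event['plan']})")

bus.subscribe("signups", enrich_agent)
bus.subscribe("signups.enriched", welcome_agent)
bus.publish("signups", {"user": "alice"})

print(results)  # → ['welcome alice (free)']
```

In a real deployment the bus would be a Pulsar or Kafka cluster and each handler a separately deployed Agent; the topology, however, is the same: agents know topics, not each other.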

On top of this event bus, Orca provides a managed Agent runtime. Developers package their Agent code and deploy it; Orca handles lifecycle management, state persistence, tool invocation, monitoring, and governance. Agent code focuses on business logic and doesn't need to deal with message subscription, deserialization, retries, and other infrastructure details.

Compared to using Kafka or Pulsar directly, Orca adds several Agent-specific optimizations. Persistent streaming memory lets Agents maintain context across multiple turns of conversation. MCP tool integration lets Agents dynamically discover and use external tools. Asynchronous LLM call support avoids blocking the event loop when Agents call LLMs. General-purpose message queues don't offer these.

Orca evolved from Apache Pulsar's serverless compute foundation (Pulsar Functions). Pulsar Functions have proven reliable in production; Orca extends this with AI Agent-specific features. Agents running on Orca automatically inherit StreamNative Cloud's elastic scaling, multi-tenancy isolation, and high availability.

From Stateless to Stateful

Traditional Agents are mostly stateless request-response services. A user sends a request, the Agent processes it and returns a response, then forgets everything. On the next request, the Agent has no memory of past interactions.

Orca turns Agents into always-on stateful services. Agents can maintain persistent streaming memory and keep context across conversations and interactions. This is crucial for truly autonomous Agents. Consider a support Agent that must remember prior questions, preferences, and purchase history; or a monitoring Agent that must track trends in system state.

State management is deeply integrated with StreamNative Cloud's scaling and tenant capabilities. Agent state persists in the event stream, so it can resume after restarts, migrations, or fault recovery. This lets Agents run as reliably as microservices.
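One way to picture state persisting in the event stream is event sourcing: the agent's working state is derived entirely from its events, so a restarted instance recovers by replaying them. This is a hedged sketch of that idea with hypothetical event shapes, not Orca's actual state API.

```python
# Illustrative sketch: a support agent whose context (prior questions,
# preferences) is rebuilt by replaying the persisted event stream after
# a restart, migration, or fault recovery.
class SupportAgent:
    def __init__(self):
        self.history = []          # prior customer questions
        self.preferences = {}      # remembered preferences

    def apply(self, event):
        if event["type"] == "question":
            self.history.append(event["text"])
        elif event["type"] == "preference":
            self.preferences[event["key"]] = event["value"]

# Events persisted in the stream before the agent went down.
stream = [
    {"type": "question", "text": "How do I reset my password?"},
    {"type": "preference", "key": "language", "value": "en"},
]

# A fresh agent instance replays the stream to recover its context.
agent = SupportAgent()
for event in stream:
    agent.apply(event)

print(agent.preferences["language"])  # → en
print(len(agent.history))             # → 1
```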

MCP Tool Integration

Orca deeply integrates the Model Context Protocol (MCP), an open protocol from Anthropic. Through MCP, Agents can safely use external tools: call REST APIs, query databases, read real-time data streams, invoke cloud services, even manage infrastructure in natural language.

StreamNative built its own MCP Server (an open-source component) to bridge Pulsar/Kafka event streams with external tools and APIs. Developers define tool interfaces, authentication, and parameter schemas; any Agent can dynamically discover and use these tools when needed. This avoids writing custom glue code for every integration and keeps credentials out of Agent code.

A typical scenario is managing a Pulsar cluster in natural language. A user can tell the Agent "increase the retention of topic 'user-signups' to 3 days," and the Agent uses MCP to call Pulsar's admin API. Or "show consumer lag for all topics," and the Agent queries cluster state and returns the results.
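The tool-dispatch pattern behind such a request can be sketched as a registry of named tools with declared parameter schemas. Everything here is hypothetical for illustration (the registry, the schema shape, the `set_retention` tool); it mimics the MCP-style flow of "discover a tool, validate arguments, invoke it," not StreamNative's MCP Server API or the MCP SDK.

```python
# Hypothetical MCP-style tool registry: tools are declared with a name,
# a parameter schema, and a handler; an agent that has parsed a natural-
# language request dispatches a structured call by tool name.
TOOLS = {}

def register_tool(name, schema, handler):
    TOOLS[name] = {"schema": schema, "handler": handler}

def call_tool(name, args):
    tool = TOOLS[name]
    # Minimal validation against the declared parameter schema.
    for param in tool["schema"]["required"]:
        if param not in args:
            raise ValueError(f"missing required parameter: {param}")
    return tool["handler"](**args)

# Illustrative tool wrapping an admin operation such as setting retention.
def set_retention(topic, days):
    return f"retention for {topic} set to {days}d"

register_tool(
    "set_retention",
    {"required": ["topic", "days"]},
    set_retention,
)

# Having parsed "increase the retention of topic 'user-signups' to 3 days",
# the agent issues the structured call:
print(call_tool("set_retention", {"topic": "user-signups", "days": 3}))
# → retention for user-signups set to 3d
```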

Framework-Agnostic and Orchestration

Orca supports multiple Agent frameworks, including Google ADK, OpenAI Agents SDK, and plain Python code. Developers can package and deploy existing Agent code without changes. LangGraph and other frameworks are on the roadmap. This means developers aren't locked into one framework.

Orca provides flexible orchestration. Multiple Agents can run in sequence, with one's output as another's input. Multiple Agents can run in parallel for different tasks. Execution paths can be adjusted dynamically by rules or events. This simplifies building complex event-driven architectures.

An order processing system might need several Agents working together: inventory check, payment processing, logistics scheduling, notification. Orca can coordinate them based on order state and events. Each Agent focuses on its domain; they collaborate via the event bus.
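The order-processing coordination above can be sketched as a pipeline with a rule that adjusts the path based on order state. All names here are hypothetical and the agents are trivial stubs; the point is the shape of the orchestration, not Orca's API.

```python
# Illustrative orchestration: agents run in sequence, each taking the
# previous output, with dynamic routing based on the order's state.
def inventory_agent(order):
    order["in_stock"] = order["qty"] <= 10
    return order

def payment_agent(order):
    order["paid"] = True
    return order

def notify_agent(order):
    order["message"] = "shipped" if order["paid"] else "backordered"
    return order

def orchestrate(order):
    order = inventory_agent(order)
    # Dynamic routing: skip payment when the item is out of stock.
    if order["in_stock"]:
        order = payment_agent(order)
    else:
        order["paid"] = False
    return notify_agent(order)

print(orchestrate({"qty": 2})["message"])   # → shipped
print(orchestrate({"qty": 50})["message"])  # → backordered
```

In the event-driven version, each step would be a separate Agent reacting to topics rather than a direct function call, but the routing logic is the same.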

Governance and Observability

For production AI Agents, governance and observability are essential. Orca provides comprehensive monitoring: end-to-end tracing of event flows, Agent latency metrics, anomaly capture, and visualization of Agent interactions and decision paths.

For governance, Orca supports RBAC, secure secret management, audit logs, versioning, and canary rollout. In emergencies, you can pause Agent autonomy or disable event subscriptions with a single action. For debugging, you can replay Agent event inputs to reproduce behavior, or inspect Agent state logs to understand decisions.

These capabilities let enterprises deploy autonomous Agents while keeping necessary oversight and control.

Agent Mesh

Orca's longer-term vision is the "Agent Mesh." In this mesh, multiple autonomous Agents connect via the event bus, dynamically discover and invoke each other's capabilities, share context and state, and collaborate on complex tasks. The system is self-organizing and self-healing.

This is similar to Service Mesh for microservices, but designed for AI Agents. Agent Mesh provides unified infrastructure for communication, discovery, governance, and observability, so developers can focus on business logic instead of building infrastructure.

A typical use case is autonomous incident response. A monitoring Agent continually listens to metrics events; when it detects anomalies, it triggers a diagnostic Agent to analyze logs and find root cause; then a remediation Agent restarts affected services and adjusts resource quotas; finally a notification Agent sends a report to on-call. The whole flow completes in seconds, with multiple Agents coordinating automatically over the event bus.
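The chain above can be sketched as handlers that each react to one event type and emit the next, so the sequence self-assembles over the bus rather than being hard-coded as a pipeline. Event types, the diagnosed cause, and all names are hypothetical.

```python
# Sketch of the incident-response chain as chained event handlers:
# anomaly -> root_cause -> resolved, with each step emitting the event
# that triggers the next agent.
from collections import deque

log = []
queue = deque([{"type": "anomaly", "service": "checkout"}])

def handle(event):
    t = event["type"]
    if t == "anomaly":           # monitoring agent detected an anomaly
        log.append("diagnosing")
        return {"type": "root_cause", "service": event["service"],
                "cause": "memory leak"}
    if t == "root_cause":        # remediation agent restarts the service
        log.append(f"restarting {event['service']}")
        return {"type": "resolved", "report": event["cause"]}
    if t == "resolved":          # notification agent pages on-call
        log.append(f"paging on-call: {event['report']}")
        return None

while queue:
    nxt = handle(queue.popleft())
    if nxt:
        queue.append(nxt)

print(log)
# → ['diagnosing', 'restarting checkout', 'paging on-call: memory leak']
```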

Current State

Orca entered private preview on September 30, 2025, first in BYOC (Bring Your Own Cloud) deployments. It currently supports Google ADK and OpenAI Agents SDK; support will expand to all deployment modes in the coming months. Organizations can join the preview via StreamNative's customer team.

As a new product, Orca's production readiness will take time to validate. Whether the market really needs dedicated Agent infrastructure, or whether generic serverless compute plus a message queue is enough, also remains to be seen. But at least technically, Orca offers a clear answer: on top of an event-driven message queue, build an infrastructure layer specifically for AI Agents.

The Essence

Orca essentially positions the message queue (Pulsar/Kafka) as the communication infrastructure for AI Agents, instead of each Agent working in isolation. Through a unified event-driven architecture, multiple Agents form a collaborative network. Orca adds a dedicated runtime for Agents on top of the event bus, providing state management, tool integration, orchestration, governance, and more.

This is StreamNative's most aggressive AI product and represents a shift from "supporting AI" to "redesigning infrastructure for AI." It attempts to address core issues in AI Agent deployment: scattered data, brittle pipelines, isolated Agent processes. Whether it succeeds is an open question.