Architecture

How Agent as a Backend processes every request

Request Flow

1. Frontend: user sends a natural language intent.
2. Auth & Security: JWT validation, PALADIN injection defense.
3. Intent Router: kernel classifies the intent via a small model (<10ms).
4. Capability Registry: dispatches to the matching registered Capability.
5. Context Bus: progressive memory hydration across 5 tiers.
6. Worker Fleet: typed workers execute with the LiteLLM backend.
7. AG-UI Event Stream: 13 event types streamed via SSE to the frontend.
8. Mem0 Consolidation: async fact extraction (ADD / UPDATE / DELETE / NOOP).
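The eight steps can be sketched as a single pipeline. This is a minimal illustration under assumed names (`Request`, `handle`, the stub workers, and the `CAPABILITIES` registry are all hypothetical, not the actual Agent as a Backend API):

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    user_id: str
    intent_text: str
    context: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

def authenticate(req: Request) -> None:
    # Step 2: JWT validation and injection defense would live here.
    if not req.user_id:
        raise PermissionError("unauthenticated request")

def classify(req: Request) -> str:
    # Step 3: stand-in for the small, fast intent classifier.
    return "summarize" if "summar" in req.intent_text.lower() else "generate"

def run_summarize(req: Request) -> None:
    # Steps 6-7: a typed worker executes and emits stream events.
    req.events.append({"type": "text_delta", "content": "summary text"})

def run_generate(req: Request) -> None:
    req.events.append({"type": "text_delta", "content": "draft text"})

CAPABILITIES = {"summarize": run_summarize, "generate": run_generate}  # Step 4

def hydrate(req: Request) -> None:
    # Step 5: Context Bus loads memory tiers into the request context.
    req.context["session_history"] = []

def consolidate(req: Request) -> None:
    # Step 8: Mem0-style fact extraction (ADD / UPDATE / DELETE / NOOP)
    # would run asynchronously after the response has streamed.
    req.context["consolidated"] = True

def handle(req: Request) -> Request:
    authenticate(req)                          # 2. Auth & Security
    capability = CAPABILITIES[classify(req)]   # 3-4. route and dispatch
    hydrate(req)                               # 5. Context Bus
    capability(req)                            # 6-7. Worker Fleet + events
    consolidate(req)                           # 8. Mem0 Consolidation
    return req
```

The real system runs steps 7 and 8 concurrently with execution (streaming, then async consolidation); the sketch linearizes them for readability.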

Memory Tiers

| Tier | Store | Latency | Purpose |
| --- | --- | --- | --- |
| Working | Python dict | <1ms | Current request context |
| Session | Redis | 1-5ms | Conversation history within session |
| Semantic | pgvector | 5-50ms | Long-term facts, similar queries |
| Core Blocks | System prompt | 0ms | Agent identity, pinned knowledge |
| Consolidation | Mem0 | async | Fact extraction after request |
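Progressive hydration means checking tiers cheapest-first and stopping at the first hit. A minimal sketch, assuming in-memory dicts as stand-ins for Redis and pgvector (the `ContextBus` class and its method names are illustrative):

```python
class ContextBus:
    def __init__(self):
        self.working = {}    # Working tier: per-request Python dict, <1ms
        self.session = {}    # Session tier: stands in for Redis, 1-5ms
        self.semantic = {}   # Semantic tier: stands in for pgvector, 5-50ms

    def hydrate(self, key):
        """Check tiers cheapest-first; on a hit, promote the value to the
        working tier so later lookups in this request are free."""
        for tier in (self.working, self.session, self.semantic):
            if key in tier:
                self.working[key] = tier[key]
                return tier[key]
        return None
```

Core Blocks sit outside this lookup path: they are pinned into the system prompt, which is why the table lists them at 0ms.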

Protocol Triangle

- AG-UI (user-facing): SSE event streaming to frontends
- MCP (tool-facing): aggregate external tools or expose the agent as a tool
- A2A (agent-facing): agent card discovery and task delegation
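On the wire, each streamed event is one Server-Sent Events frame: an `event:` line, a `data:` line, and a blank-line terminator. A minimal framing sketch (the `text_delta` event name is a placeholder, not one of the 13 actual AG-UI event types):

```python
import json

def sse_frame(event_type: str, payload: dict) -> str:
    """Serialize one event in SSE wire format."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"

frame = sse_frame("text_delta", {"content": "Hello"})
```

Because SSE is plain HTTP, any browser `EventSource` (or a simple fetch-and-parse loop) can consume the stream without a custom client library.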

Fleet Worker Slots

| Slot | Primary model | p50 | Budget | Local | Purpose |
| --- | --- | --- | --- | --- | --- |
| classify | claude-haiku-4-5 | 180ms | gemini-2.0-flash | qwen2.5-7b | Intent classification, fast and cheap |
| extract | claude-sonnet-4-6 | 600ms | gemini-2.5-flash | qwen2.5-72b | Structured data extraction, JSON output |
| verify | claude-opus-4-6 | 2000ms | gemini-2.5-flash | llama-4-maverick | Fact checking, validation, highest reasoning |
| summarize | claude-sonnet-4-6 | 600ms | gemini-2.0-flash | phi-4-14b | Content summarization, condensation |
| generate | claude-opus-4-6 | 2000ms | gemini-2.5-flash | qwen2.5-coder-32b | Long-form text and code generation |
| tool_call | claude-sonnet-4-6 | 600ms | gemini-2.5-flash | llama-4-maverick | Tool selection and execution, 97.2% success rate |
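Each slot pairs a primary model with budget and local fallbacks, which suggests a per-slot fallback chain. A sketch of that routing, with two of the six slots shown; `call(model, prompt)` is a hypothetical stand-in for the LiteLLM completion call (LiteLLM has its own fallback support, so this wrapper is illustrative only):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    primary: str
    budget: str
    local: str

# Two slots from the table; the remaining four follow the same shape.
SLOTS = {
    "classify": Slot("claude-haiku-4-5", "gemini-2.0-flash", "qwen2.5-7b"),
    "verify": Slot("claude-opus-4-6", "gemini-2.5-flash", "llama-4-maverick"),
}

def complete_with_fallback(slot_name: str, prompt: str, call):
    """Try the slot's models in order: primary, then budget, then local."""
    slot = SLOTS[slot_name]
    last_err = None
    for model in (slot.primary, slot.budget, slot.local):
        try:
            return call(model, prompt)
        except Exception as err:   # e.g. rate limit or provider outage
            last_err = err
    raise RuntimeError(f"all models failed for slot {slot_name!r}") from last_err
```

Typing workers by slot rather than by model is what makes the swap cheap: callers name the task (`classify`, `verify`), and the registry decides which model, at which price point, actually serves it.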