Memory
Agentend implements a 5-tier context system called the ContextBus. Memory is progressively hydrated — fast, always-available tiers load first, while slower, richer tiers load asynchronously. If Redis or PostgreSQL is unavailable, the system degrades gracefully rather than failing.
5-tier overview
| Tier | Backend | Latency | Use Case |
|---|---|---|---|
| Working | In-process dict | <1ms | Current request state, scratchpad values |
| Session | Redis | 1-5ms | Conversation history within a session |
| Semantic | pgvector | 5-50ms | Similarity search over past interactions |
| Core Blocks | System prompt | 0ms | Domain context, capability instructions |
| Consolidation | Mem0 / Built-in | Async | Long-term memory extraction and archival |
Progressive hydration
When a request arrives, the ContextBus hydrates memory in 4 stages. Each stage runs only if the previous one has completed and the backend is available:
- Stage 1 — Core blocks + Working memory. Always available, <1ms. Loads system prompt blocks and current request state.
- Stage 2 — Session history. Redis-backed, ~10ms. Loads conversation history for the active session.
- Stage 3 — Semantic search. pgvector-backed, ~100ms. Finds relevant past interactions via embedding similarity.
- Stage 4 — Agent-driven retrieval. The agent can call a retrieve_context tool to pull in additional context on demand.
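The staged loading and its graceful fallbacks can be sketched as follows. This is an illustrative model, not the framework's actual ContextBus class: the loader callables stand in for the Redis and pgvector backends, and a missing loader represents an unavailable backend.

```python
class ContextBusSketch:
    """Illustrative staged hydration; names are hypothetical, not Agentend's API."""

    def __init__(self, core_blocks, session_loader=None, semantic_loader=None):
        self.core_blocks = core_blocks          # Stage 1: loaded at startup
        self.working = {}                       # Stage 1: in-process dict
        self.session_loader = session_loader    # Stage 2: None if Redis is down
        self.semantic_loader = semantic_loader  # Stage 3: None if pgvector is down

    def hydrate(self, session_id, query):
        # Stage 1 is always present, whatever else fails.
        context = {"core_blocks": self.core_blocks, "working": self.working}
        # Stage 2: session history, skipped when the backend is unavailable.
        if self.session_loader is not None:
            context["session"] = self.session_loader(session_id)
        # Stage 3: semantic search, skipped when the backend is unavailable.
        if self.semantic_loader is not None:
            context["semantic"] = self.semantic_loader(query)
        # Stage 4 is agent-driven (the retrieve_context tool) and happens later.
        return context
```

With no loaders configured, hydration still succeeds and returns only the always-available tiers.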
Working memory
Working memory is an in-process Python dictionary that lives for the duration of the request. It stores scratchpad values, intermediate computation results, and per-request state. There is no network overhead.
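The store itself can be as simple as a thin dict wrapper. A minimal sketch, assuming a `WorkingMemory` class shaped like the snippet below uses — not Agentend's actual implementation:

```python
class WorkingMemory:
    """Per-request scratchpad backed by a plain dict, so access is sub-millisecond."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def clear(self):
        # Called when the request ends; nothing persists beyond it.
        self._data.clear()
```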
```python
# Working memory is automatically available in capabilities
self.working_memory.set("extracted_total", 1250.00)
total = self.working_memory.get("extracted_total")
```

Session memory
Session memory persists conversation history in Redis, keyed by session ID. It supports configurable TTL, max size limits, and FIFO eviction. If Redis is unavailable, the system continues with working memory only.
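A minimal sketch of such a tier, assuming the redis-py client and hypothetical class and method names. It stores each session as a Redis list with a TTL and FIFO trimming, and silently falls back to an in-process dict when Redis is unreachable:

```python
import json

try:
    import redis  # optional dependency
except ImportError:
    redis = None

class SessionMemory:
    """Illustrative Redis-backed session history with graceful fallback."""

    def __init__(self, url="redis://localhost:6379", ttl=3600, max_messages=200):
        self.ttl = ttl
        self.max_messages = max_messages
        self._fallback = {}  # in-process store used when Redis is unavailable
        self._client = None
        if redis is not None:
            try:
                client = redis.Redis.from_url(url)
                client.ping()
                self._client = client
            except Exception:
                pass  # degrade gracefully: keep the in-memory fallback

    def append(self, session_id, message):
        if self._client is not None:
            key = f"session:{session_id}"
            self._client.rpush(key, json.dumps(message))
            self._client.ltrim(key, -self.max_messages, -1)  # FIFO eviction
            self._client.expire(key, self.ttl)
        else:
            history = self._fallback.setdefault(session_id, [])
            history.append(message)
            del history[:-self.max_messages]  # FIFO eviction

    def history(self, session_id):
        if self._client is not None:
            raw = self._client.lrange(f"session:{session_id}", 0, -1)
            return [json.loads(m) for m in raw]
        return list(self._fallback.get(session_id, []))
```

The TTL and eviction parameters correspond to the session settings in fleet.yaml.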
```yaml
# fleet.yaml
memory:
  session:
    enabled: true
    type: redis
    ttl: 3600  # 1 hour
    max_size_mb: 10
    strategy: fifo
```

Semantic memory
Semantic memory stores embeddings of past interactions in PostgreSQL with the pgvector extension. When a new request arrives, the ContextBus performs a similarity search to find relevant historical context. Assistant messages are automatically embedded and stored after each interaction.
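In production this is a pgvector query (pgvector's `<=>` operator gives cosine distance, so ordering by it returns the nearest neighbors). The threshold-and-rank logic can be sketched in pure Python over an in-memory store of (text, vector) pairs:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query_vec, store, threshold=0.7, limit=5):
    """Return stored (text, score) pairs above the similarity threshold,
    best match first. `store` is a list of (text, vector) pairs; in the real
    tier this is a single SQL query against pgvector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored = [(t, s) for t, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

The 0.7 default mirrors the similarity_threshold setting in fleet.yaml.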
```yaml
# fleet.yaml
memory:
  semantic:
    enabled: true
    type: pgvector
    vector_size: 1536
    similarity_threshold: 0.7
    consolidation_schedule: daily
```

Core blocks
Core blocks are static system prompt fragments that are always included. They come from two sources: the framework (security instructions, output format rules) and the capability (get_domain_context()). Core blocks have zero latency because they are loaded at startup.
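Assembly is just string concatenation of strings known at startup, which is why it adds no per-request latency. A sketch with a hypothetical stub capability (only get_domain_context comes from the framework's interface):

```python
class InvoiceCapability:
    """Hypothetical example capability supplying its domain context."""
    def get_domain_context(self):
        return "You handle invoice extraction for the finance team."

def build_system_prompt(framework_blocks, capability):
    """Concatenate static core blocks: framework-supplied rules first,
    then the capability's domain context."""
    blocks = list(framework_blocks)                 # e.g. security, output-format rules
    blocks.append(capability.get_domain_context())  # capability-supplied context
    return "\n\n".join(blocks)
```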
Consolidation
The consolidation tier runs asynchronously after request completion. It extracts important facts and patterns from conversations and stores them as long-term memories. Agentend supports two consolidation engines:
- Mem0 — External memory service with managed embeddings and retrieval. Requires a Mem0 API key.
- Built-in engine — Uses the configured LLM to extract and summarize memories. No external dependency.
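The built-in engine's extraction step can be sketched like this. The `summarize` callable stands in for the configured LLM; the function name and prompt are illustrative, not Agentend's actual internals:

```python
def consolidate(messages, summarize, max_facts=5):
    """Distill a finished conversation into durable facts for long-term
    storage. Runs asynchronously after the request completes."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    facts = summarize(
        "Extract the most important durable facts from this conversation, "
        "one per line:\n" + transcript
    )
    # One fact per non-empty line, capped to keep stored memories small.
    return [line.strip() for line in facts.splitlines() if line.strip()][:max_facts]
```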
```yaml
# fleet.yaml
memory:
  consolidation:
    enabled: true
    schedule: daily
    archive_after_days: 30
    compression_ratio: 0.8
```

Graceful degradation
The ContextBus is designed to never crash due to a missing backend. If Redis is unreachable, session memory is silently disabled. If PostgreSQL is down, semantic memory is skipped. The agent continues operating with whatever tiers are available — at minimum, working memory and core blocks are always present.