Memory

Agentend implements a 5-tier context system called the ContextBus. Memory is progressively hydrated — fast, always-available tiers load first, while slower, richer tiers load asynchronously. If Redis or PostgreSQL is unavailable, the system degrades gracefully rather than failing.

5-tier overview

Tier          | Backend         | Latency | Use case
Working       | In-process dict | <1ms    | Current request state, scratchpad values
Session       | Redis           | 1-5ms   | Conversation history within a session
Semantic      | pgvector        | 5-50ms  | Similarity search over past interactions
Core Blocks   | System prompt   | 0ms     | Domain context, capability instructions
Consolidation | Mem0 / Built-in | Async   | Long-term memory extraction and archival

Progressive hydration

When a request arrives, the ContextBus hydrates memory in four stages. Each stage runs only if the previous one has completed and its backend is available:

  1. Stage 1 — Core blocks + Working memory
    Always available, <1ms. Loads system prompt blocks and current request state.
  2. Stage 2 — Session history
    Redis-backed, ~10ms. Loads conversation history for the active session.
  3. Stage 3 — Semantic search
    pgvector-backed, ~100ms. Finds relevant past interactions via embedding similarity.
  4. Stage 4 — Agent-driven retrieval
    The agent can call a retrieve_context tool to pull in additional context on demand.
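As a rough sketch, the staged hydration can be modeled like this. The `HydratedContext` and `hydrate` names are illustrative, not the actual Agentend API:

```python
from dataclasses import dataclass, field

@dataclass
class HydratedContext:
    core_blocks: list = field(default_factory=list)      # Stage 1
    working: dict = field(default_factory=dict)          # Stage 1
    session_history: list = field(default_factory=list)  # Stage 2
    semantic_matches: list = field(default_factory=list) # Stage 3

def hydrate(request, core_blocks, session_store=None, vector_store=None):
    """Run the stages in order; skip a stage when its backend is missing."""
    ctx = HydratedContext(core_blocks=list(core_blocks))
    ctx.working["request"] = request                      # Stage 1: always available
    if session_store is not None:                         # Stage 2: Redis-backed
        ctx.session_history = session_store.get(request["session_id"], [])
    if vector_store is not None:                          # Stage 3: pgvector-backed
        ctx.semantic_matches = vector_store.search(request["text"])
    # Stage 4 happens later: the agent calls retrieve_context on demand.
    return ctx
```

Passing `None` for a backend simply leaves that tier empty, which mirrors the graceful-degradation behavior described later.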

Working memory

Working memory is an in-process Python dictionary that lives for the duration of the request. It stores scratchpad values, intermediate computation results, and per-request state. There is no network overhead.

# Working memory is automatically available in capabilities
self.working_memory.set("extracted_total", 1250.00)
total = self.working_memory.get("extracted_total")

Session memory

Session memory persists conversation history in Redis, keyed by session ID. It supports configurable TTL, max size limits, and FIFO eviction. If Redis is unavailable, the system continues with working memory only.

# fleet.yaml
memory:
  session:
    enabled: true
    type: redis
    ttl: 3600        # 1 hour
    max_size_mb: 10
    strategy: fifo
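The TTL and FIFO behavior configured above can be sketched with an in-memory stand-in. This is illustrative only: the real tier is Redis-backed and sized in megabytes rather than by entry count.

```python
import time
from collections import deque

class SessionMemory:
    """In-memory stand-in for the Redis-backed session tier (illustrative)."""

    def __init__(self, ttl=3600, max_entries=100):
        self.ttl = ttl                  # seconds, like the `ttl` config key
        self.max_entries = max_entries  # stand-in for max_size_mb
        self._sessions = {}

    def append(self, session_id, message):
        entries = self._sessions.setdefault(session_id, deque())
        entries.append((time.time(), message))
        while len(entries) > self.max_entries:  # FIFO eviction: drop oldest
            entries.popleft()

    def history(self, session_id):
        now = time.time()
        entries = self._sessions.get(session_id, deque())
        return [m for (t, m) in entries if now - t < self.ttl]  # expire by TTL
```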

Semantic memory

Semantic memory stores embeddings of past interactions in PostgreSQL with the pgvector extension. When a new request arrives, the ContextBus performs a similarity search to find relevant historical context. Assistant messages are automatically embedded and stored after each interaction.

# fleet.yaml
memory:
  semantic:
    enabled: true
    type: pgvector
    vector_size: 1536
    similarity_threshold: 0.7
    consolidation_schedule: daily
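A minimal sketch of the similarity filter follows. In production this would be a pgvector SQL query over stored embeddings; the pure-Python version below only illustrates the threshold and ranking logic, with made-up data:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, stored, threshold=0.7, top_k=5):
    """Return stored texts whose similarity to the query clears the threshold,
    best matches first. `stored` is a list of (vector, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for vec, text in stored]
    hits = [(score, text) for score, text in scored if score >= threshold]
    return [text for _, text in sorted(hits, reverse=True)[:top_k]]
```

The `threshold` parameter plays the role of `similarity_threshold` in the config above.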

Core blocks

Core blocks are static system prompt fragments that are always included. They come from two sources: the framework (security instructions, output format rules) and the capability (get_domain_context()). Core blocks have zero latency because they are loaded at startup.
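For example, a capability might supply its domain blocks via get_domain_context(). The capability class and block contents below are made up; only the hook name comes from the framework:

```python
class InvoiceCapability:
    # Hypothetical capability; only get_domain_context() is the real hook.
    def get_domain_context(self):
        return [
            "You process vendor invoices.",
            "Report all amounts in USD unless told otherwise.",
        ]

def build_core_blocks(framework_blocks, capability):
    # Assembled once at startup, which is why this tier costs 0ms per request.
    return list(framework_blocks) + capability.get_domain_context()
```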

Consolidation

The consolidation tier runs asynchronously after request completion. It extracts important facts and patterns from conversations and stores them as long-term memories. Agentend supports two consolidation engines:

  • Mem0 — External memory service with managed embeddings and retrieval. Requires a Mem0 API key.
  • Built-in engine — Uses the configured LLM to extract and summarize memories. No external dependency.

# fleet.yaml
memory:
  consolidation:
    enabled: true
    schedule: daily
    archive_after_days: 30
    compression_ratio: 0.8
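The built-in engine path can be sketched as a post-request hook that hands the transcript to the configured LLM. Function names and the prompt here are illustrative:

```python
def consolidate(conversation, summarize, archive):
    """Run after the request completes: extract durable facts and archive them.
    `summarize` stands in for a call to the configured LLM."""
    prompt = "Extract durable facts from this conversation:\n" + "\n".join(conversation)
    memory = summarize(prompt)
    archive.append(memory)
    return memory
```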

Graceful degradation

The ContextBus is designed to never crash due to a missing backend. If Redis is unreachable, session memory is silently disabled. If PostgreSQL is down, semantic memory is skipped. The agent continues operating with whatever tiers are available — at minimum, working memory and core blocks are always present.
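The degradation pattern amounts to wrapping each optional tier in a guard. A sketch, assuming connection failures surface as standard exceptions; the exact types depend on the client library:

```python
def safe_load(loader, default):
    """Load a tier, or silently disable it when its backend is unreachable."""
    try:
        return loader()
    except (ConnectionError, TimeoutError):
        return default

def broken_redis():
    # Simulates an unreachable Redis instance.
    raise ConnectionError("redis unreachable")
```

With this shape, a dead backend yields an empty tier instead of a crashed request.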