Memory

Agentend implements a 5-tier context system called the ContextBus. Memory is progressively hydrated — fast, always-available tiers load first, while slower, richer tiers load asynchronously. If Redis or PostgreSQL is unavailable, the system degrades gracefully rather than failing.

5-tier overview

Tier          | Backend         | Latency | Use case
Working       | In-process dict | <1ms    | Current request state, scratchpad values
Session       | Redis           | 1-5ms   | Conversation history within a session
Semantic      | pgvector        | 5-50ms  | Similarity search over past interactions
Core Blocks   | System prompt   | 0ms     | Domain context, capability instructions
Consolidation | Mem0 / Built-in | Async   | Long-term memory extraction and archival

Progressive hydration

When a request arrives, the ContextBus hydrates memory in four stages. Each stage runs only if the previous one has completed and its backend is available:

  1. Stage 1 — Core blocks + Working memory
    Always available, <1ms. Loads system prompt blocks and current request state.
  2. Stage 2 — Session history
    Redis-backed, ~10ms. Loads conversation history for the active session.
  3. Stage 3 — Semantic search
    pgvector-backed, ~100ms. Finds relevant past interactions via embedding similarity.
  4. Stage 4 — Agent-driven retrieval
    The agent can call a retrieve_context tool to pull in additional context on demand.
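As a rough sketch, the staged hydration can be modeled like this. The `HydratedContext` and `hydrate` names are illustrative, not the actual Agentend API:

```python
from dataclasses import dataclass, field

@dataclass
class HydratedContext:
    core_blocks: list = field(default_factory=list)      # Stage 1
    working: dict = field(default_factory=dict)          # Stage 1
    session_history: list = field(default_factory=list)  # Stage 2
    semantic_matches: list = field(default_factory=list) # Stage 3

def hydrate(request, core_blocks, session_store=None, vector_store=None):
    """Run the stages in order; skip a stage when its backend is missing."""
    ctx = HydratedContext(core_blocks=list(core_blocks))
    ctx.working["request"] = request                      # Stage 1: always available
    if session_store is not None:                         # Stage 2: Redis-backed
        ctx.session_history = session_store.get(request["session_id"], [])
    if vector_store is not None:                          # Stage 3: pgvector-backed
        ctx.semantic_matches = vector_store.search(request["text"])
    # Stage 4 happens later: the agent calls retrieve_context on demand.
    return ctx
```

Passing `None` for a backend simply leaves that tier empty, which mirrors the graceful-degradation behavior described later.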

Working memory

Working memory is an in-process Python dictionary that lives for the duration of the request. It stores scratchpad values, intermediate computation results, and per-request state. There is no network overhead.

# Working memory is automatically available in capabilities
self.working_memory.set("extracted_total", 1250.00)
total = self.working_memory.get("extracted_total")

Session memory

Session memory persists conversation history in Redis, keyed by session ID. It supports configurable TTL, max size limits, and FIFO eviction. If Redis is unavailable, the system continues with working memory only.

# fleet.yaml
memory:
  session:
    enabled: true
    type: redis
    ttl: 3600        # 1 hour
    max_size_mb: 10
    strategy: fifo
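The TTL and FIFO behavior configured above can be sketched with an in-memory stand-in. This is illustrative only: the real tier is Redis-backed and sized in megabytes rather than by entry count.

```python
import time
from collections import deque

class SessionMemory:
    """In-memory stand-in for the Redis-backed session tier (illustrative)."""

    def __init__(self, ttl=3600, max_entries=100):
        self.ttl = ttl                  # seconds, like the `ttl` config key
        self.max_entries = max_entries  # stand-in for max_size_mb
        self._sessions = {}

    def append(self, session_id, message):
        entries = self._sessions.setdefault(session_id, deque())
        entries.append((time.time(), message))
        while len(entries) > self.max_entries:  # FIFO eviction: drop oldest
            entries.popleft()

    def history(self, session_id):
        now = time.time()
        entries = self._sessions.get(session_id, deque())
        return [m for (t, m) in entries if now - t < self.ttl]  # expire by TTL
```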

Semantic memory

Semantic memory stores embeddings of past interactions in PostgreSQL with the pgvector extension. When a new request arrives, the ContextBus performs a similarity search to find relevant historical context. Assistant messages are automatically embedded and stored after each interaction.

# fleet.yaml
memory:
  semantic:
    enabled: true
    type: pgvector
    vector_size: 1536
    similarity_threshold: 0.7
    consolidation_schedule: daily
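A minimal sketch of the similarity filter follows. In production this would be a pgvector SQL query over stored embeddings; the pure-Python version below only illustrates the threshold and ranking logic, with made-up data:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, stored, threshold=0.7, top_k=5):
    """Return stored texts whose similarity to the query clears the threshold,
    best matches first. `stored` is a list of (vector, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for vec, text in stored]
    hits = [(score, text) for score, text in scored if score >= threshold]
    return [text for _, text in sorted(hits, reverse=True)[:top_k]]
```

The `threshold` parameter plays the role of `similarity_threshold` in the config above.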

Core blocks

Core blocks are static system prompt fragments that are always included. They come from two sources: the framework (security instructions, output format rules) and the capability (get_domain_context()). Core blocks have zero latency because they are loaded at startup.
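For example, a capability might supply its domain blocks via get_domain_context(). The capability class and block contents below are made up; only the hook name comes from the framework:

```python
class InvoiceCapability:
    # Hypothetical capability; only get_domain_context() is the real hook.
    def get_domain_context(self):
        return [
            "You process vendor invoices.",
            "Report all amounts in USD unless told otherwise.",
        ]

def build_core_blocks(framework_blocks, capability):
    # Assembled once at startup, which is why this tier costs 0ms per request.
    return list(framework_blocks) + capability.get_domain_context()
```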

Consolidation

The consolidation tier runs asynchronously after request completion. It extracts important facts and patterns from conversations and stores them as long-term memories. Agentend supports two consolidation engines:

  • Mem0 — External memory service with managed embeddings and retrieval. Requires a Mem0 API key.
  • Built-in engine — Uses the configured LLM to extract and summarize memories. No external dependency.

# fleet.yaml
memory:
  consolidation:
    enabled: true
    schedule: daily
    archive_after_days: 30
    compression_ratio: 0.8
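The built-in engine path can be sketched as a post-request hook that hands the transcript to the configured LLM. Function names and the prompt here are illustrative:

```python
def consolidate(conversation, summarize, archive):
    """Run after the request completes: extract durable facts and archive them.
    `summarize` stands in for a call to the configured LLM."""
    prompt = "Extract durable facts from this conversation:\n" + "\n".join(conversation)
    memory = summarize(prompt)
    archive.append(memory)
    return memory
```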

Graceful degradation

The ContextBus is designed to never crash due to a missing backend. If Redis is unreachable, session memory is silently disabled. If PostgreSQL is down, semantic memory is skipped. The agent continues operating with whatever tiers are available — at minimum, working memory and core blocks are always present.
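The degradation pattern amounts to wrapping each optional tier in a guard. A sketch, assuming connection failures surface as standard exceptions; the exact types depend on the client library:

```python
def safe_load(loader, default):
    """Load a tier, or silently disable it when its backend is unreachable."""
    try:
        return loader()
    except (ConnectionError, TimeoutError):
        return default

def broken_redis():
    # Simulates an unreachable Redis instance.
    raise ConnectionError("redis unreachable")
```

With this shape, a dead backend yields an empty tier instead of a crashed request.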