Architecture

How am-server works under the hood — from request routing to LadybugDB graph retrieval, pheromone dynamics, and GCP deployment topology.

1. Overview

Agentverse Memory is a persistent memory service for LLM agents. Agents write memories during interactions; the service stores them in a per-agent knowledge graph and retrieves them on demand. Unlike flat vector stores, retrieval quality improves with use — every successful retrieval deposits a pheromone trace that biases future traversal toward proven paths.

Zero LLM at write time

Memories are ingested via TF-IDF entity extraction and direct graph insertion — no API calls, no variable latency, no per-write token cost.

Graph-first storage

Every agent's memory lives in a LadybugDB embedded graph database, enabling multi-hop relational retrieval that flat vector search cannot do.

Pheromone-guided traversal

Every retrieval deposits typed, decaying traces on the nodes and edges it traverses. The A* cost function treats those traces as first-class signals.

2. Single Binary (ARCH-001)

Decision: MCP server and REST API run in the same am-server binary. There is one Cloud Run service, not two.

Why: LadybugDB is single-process

LadybugDB uses file-based locking and a write-ahead log (WAL). Opening the same database directory from multiple processes is not supported and causes WAL corruption. This is not a configuration issue; it follows from LadybugDB's single-process embedded design. Running MCP and REST in the same process eliminates this risk entirely.

POST /mcp    ──┐
               │   am-server (single Axum binary, one Cloud Run service)
GET  /health ──┤   ├── am-core       domain types, error handling
GET  /ready  ──┘   ├── am-config     env-var config (serde + envy)
                   ├── am-retrieval  A* pheromone traversal, TF-IDF, BM25+HNSW RRF
                   └── am-storage    LadybugDB FFI, palace pool LRU, Redis working memory

Cloud Run concurrency model

With containerConcurrency=80, up to 80 concurrent HTTP requests reach a single process, where Tokio's async runtime handles them. The palace pool is guarded internally by RwLock<PalacePool>, so there is no cross-process coordination and no corruption risk.

3. Deployment Topology

┌─────────────────────────────────────────────────────────┐
│  LLM Agent (Python / TypeScript / Claude / GPT / any)   │
└──────────────────────────┬──────────────────────────────┘
                           │  MCP tools or REST API
                           │  Authorization: Bearer avmem_sk_...
      ┌─────────────────────────────────────────┐
      │  Cloud Load Balancer (GCP, HTTPS)       │
      │  memory.agentverse.ai                   │
      └─────────────────────────────────────────┘
      ┌─────────────────────────────────────────┐
      │  am-server (Cloud Run, us-central1)     │
      │  project: fetch-coder                   │
      │  2 vCPU · 4 GiB · concurrency=80        │
      │  scale-to-zero · min-instances=1        │
      │                                         │
      │  POST /mcp       ← JSON-RPC 2.0         │
      │  GET|POST /v1/*  ← REST API             │
      │  GET /health     ← liveness probe       │
      │  GET /metrics    ← Prometheus scrape    │
      └──────────────┬──────────────────────────┘
         ┌───────────┴────────────┐
         ▼                        ▼
┌────────────────────┐  ┌────────────────────────┐
│ GCP Filestore NFS  │  │ Memorystore Redis      │
│ /data/palaces/     │  │ Working memory         │
│   {agent_id}/      │  │ Auth cache             │
│     palace.kuzu    │  │ Rate limiting          │
└────────────────────┘  └────────────────────────┘
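As a rough illustration of the request path in the diagram, the sketch below builds (but does not send) a JSON-RPC 2.0 request for POST /mcp with the Bearer header. The helper name build_mcp_request is hypothetical, the https scheme is inferred from the load balancer label, and tools/list is the generic MCP discovery method rather than anything this page documents. A real MCP session normally begins with an initialize exchange, which is omitted here.

```python
import json
import urllib.request


def build_mcp_request(api_key, method, params=None, request_id=1,
                      base_url="https://memory.agentverse.ai"):
    """Build (without sending) a JSON-RPC 2.0 request for POST /mcp.

    Hypothetical helper: only the host, the /mcp path, and the Bearer
    scheme come from the topology diagram.
    """
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        url=f"{base_url}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "avmem_sk_..." is the elided placeholder from the diagram, not a real key.
req = build_mcp_request("avmem_sk_...", "tools/list")
```

Sending the request would be a matter of passing req to urllib.request.urlopen with a real key.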

4. Memory Types & Storage

Five memory types are supported. Four are stored in LadybugDB; Working memory is Redis-primary.

| Type | Node Table | Primary Store | Default τ | Tools prefix |
|---|---|---|---|---|
| Episodic | EpisodeNode | LadybugDB | 24 h | memory_*_episode* |
| Semantic | FactNode | LadybugDB | 7 d | memory_*_fact* |
| Procedural | ProcedureNode | LadybugDB | 30 d | memory_*_procedure* |
| Working | WorkingMemoryNode | Redis | 1 h | memory_*_working* |
| Shared | SharedSpaceNode | LadybugDB (shared) | | memory_*_space* |

Palace Pool

am-server maintains an LRU pool of up to 100 concurrently open LadybugDB databases (PALACE_MAX_OPEN=100). When an agent's request arrives and their palace is not in the pool, it is opened from NFS and the least-recently-used palace is evicted if the pool is at capacity. New agents get a fresh LadybugDB database with schema initialized on first open.
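The eviction behaviour described above can be sketched in a few lines. This is a simplified, single-threaded Python model, not the Rust implementation: the class shape, the open_fn and close_fn callbacks, and the open_ids helper are all assumptions made for illustration, and locking is left out.

```python
from collections import OrderedDict


class PalacePool:
    """Toy LRU pool of open per-agent databases (illustrative only)."""

    def __init__(self, open_fn, close_fn, max_open=100):
        self._open_fn = open_fn        # hypothetical: opens a palace for an agent
        self._close_fn = close_fn      # hypothetical: closes an evicted palace
        self._max_open = max_open      # mirrors PALACE_MAX_OPEN
        self._palaces = OrderedDict()  # least recently used entry comes first

    def get(self, agent_id):
        if agent_id in self._palaces:
            # Cache hit: mark this palace as most recently used.
            self._palaces.move_to_end(agent_id)
            return self._palaces[agent_id]
        if len(self._palaces) >= self._max_open:
            # Pool is full: evict the least recently used palace first.
            evicted_id, evicted = self._palaces.popitem(last=False)
            self._close_fn(evicted_id, evicted)
        handle = self._open_fn(agent_id)
        self._palaces[agent_id] = handle
        return handle

    def open_ids(self):
        return list(self._palaces)


closed = []
pool = PalacePool(
    open_fn=lambda agent_id: f"db:{agent_id}",
    close_fn=lambda agent_id, handle: closed.append(agent_id),
    max_open=2,
)
for agent in ["agent_a", "agent_b", "agent_a", "agent_c"]:
    pool.get(agent)
```

With a capacity of 2, the second access to agent_a makes agent_b the least recently used entry, so agent_b is the one closed when agent_c arrives.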

5. Retrieval Pipeline

Every query runs a 5-stage hybrid pipeline. Stages 1 and 2 run in parallel; stage 3 merges their results, and stages 4–5 then run sequentially.

1. BM25 full-text search (All tiers)
   LadybugDB native FTS index over episode content. Returns a ranked list of episode IDs.

2. HNSW vector search (All tiers)
   Approximate nearest-neighbour search over TF-IDF embeddings (FLOAT[1536] on Pro+, PCA-reduced FLOAT[512] on Explorer/Builder).

3. RRF merge (All tiers)
   Reciprocal Rank Fusion combines the BM25 and HNSW ranked lists into a single merged ranking. No threshold tuning required.

4. Pheromone reranking (All tiers)
   Each result score is multiplied by its current pheromone weight w(t) = w₀·exp(−Δt/τ), computed lazily at query time. No background daemon.

5. A* graph traversal (Builder+; 403 on Explorer)
   Results are expanded along high-pheromone graph edges using A* pathfinding, with cost(edge) = 1 − pheromone_weight(edge). Max hops: 6.
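To make the merge in stage 3 concrete, here is a minimal sketch of standard Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The function name rrf_merge, the example IDs, and the constant k = 60 (a common default in the literature) are assumptions; this page does not state which k am-server uses.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    k=60 is an assumed default, not a documented am-server setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["ep_a", "ep_b", "ep_c"]  # hypothetical stage 1 output
hnsw_hits = ["ep_b", "ep_d", "ep_a"]  # hypothetical stage 2 output
merged = rrf_merge([bm25_hits, hnsw_hits])
# merged == ["ep_b", "ep_a", "ep_d", "ep_c"]
```

Because only ranks enter the formula, the raw BM25 and vector scores never need to be put on a common scale, which is why no threshold tuning is involved.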

6. Pheromone Model

Agentverse Memory uses a stigmergic retrieval model inspired by ant colony optimization (ACO). Every retrieval deposits a pheromone trace; every trace decays exponentially over time. Memories that are frequently retrieved accumulate weight faster than they decay — making them progressively more retrievable. Memories that become irrelevant decay to near-zero without requiring explicit deletion.

Decay (lazy, at query time)

w(t) = w₀ · exp(−Δt / τ)

Computed in Rust with no background daemon, no write amplification.
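A small Python sketch of the decay formula, for intuition only (the production code is Rust, and the names below are illustrative). It also shows what τ means numerically: after one τ the weight has fallen to 1/e, about 37% of its starting value, and it halves after τ · ln 2.

```python
import math


def decayed_weight(w0, elapsed_s, tau_s):
    """Return w0 * exp(-elapsed / tau), evaluated lazily at read time."""
    if tau_s <= 0:
        raise ValueError("tau_s must be positive")
    # Clamp negative elapsed time (clock skew) so weights never grow here.
    return w0 * math.exp(-max(elapsed_s, 0.0) / tau_s)


EPISODE_TAU_S = 86400.0  # 1 day, the episode default in the schema

after_one_tau = decayed_weight(1.0, EPISODE_TAU_S, EPISODE_TAU_S)  # about 0.368
half_life_s = EPISODE_TAU_S * math.log(2)                          # about 16.6 hours
after_half_life = decayed_weight(1.0, half_life_s, EPISODE_TAU_S)  # about 0.5
```

Nothing is written back during decay; the stored weight and last-access timestamp are enough to recompute the current value whenever a query needs it.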

Deposit (on successful retrieval)

w_new = clamp(w + α, 0, 1)

α = 0.1 (configurable). Deposit happens post-retrieval, asynchronously.
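The deposit rule composes with lazy decay roughly as sketched below. The Trace dataclass and the touch function are hypothetical, and the ordering (decay the stored weight to the current time, add α, clamp, then reset the last-access time) is an assumption about how the two rules fit together rather than a description of the actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    w: float                # stored pheromone weight
    tau_s: float            # decay constant in seconds
    last_accessed_s: float  # timestamp of the last deposit


def touch(trace, now_s, alpha=0.1):
    """Decay to now, deposit alpha, clamp to [0, 1] (assumed ordering)."""
    elapsed = max(now_s - trace.last_accessed_s, 0.0)
    decayed = trace.w * math.exp(-elapsed / trace.tau_s)
    new_w = min(max(decayed + alpha, 0.0), 1.0)  # clamp(w + alpha, 0, 1)
    return Trace(w=new_w, tau_s=trace.tau_s, last_accessed_s=now_s)


working = Trace(w=1.0, tau_s=3600.0, last_accessed_s=0.0)
one_hour_later = touch(working, now_s=3600.0)  # exp(-1) + 0.1, about 0.468
immediately = touch(working, now_s=0.0)        # 1.0 + 0.1, clamped to 1.0
```

The clamp keeps a heavily used memory from growing without bound, so its advantage over rarely used memories is capped at the maximum weight of 1.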

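| Memory Type | τ (decay time constant) | Rationale |
|---|---|---|
| Episode | 1 day | Conversation context fades quickly |
| Entity / Fact | 7 days | Facts persist through a work week |
| Procedure | 30 days | How-to knowledge is durable |
| Working | 1 hour | Current task context, intentionally ephemeral |
| Graph Edge | 3 days | Relationships have medium persistence |

τ is the time constant in w(t) = w₀ · exp(−Δt / τ): a weight falls to about 37% of its value after one τ and halves after τ · ln 2.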

Research: The pheromone model is being formalized in PheromGraph (ICLR 2027 target) — the first paper to apply stigmergic dynamics to LLM knowledge graph retrieval.

7. LadybugDB Schema (Key Tables)

Note: LadybugDB is our active fork of Kùzu, optimized for agent memory workloads — including per-agent database isolation, pheromone column support, and LRU palace pool management. Cypher syntax and file format remain compatible.

Each agent gets an isolated LadybugDB database at /data/palaces/{agent_id}/palace.kuzu on GCP Filestore. Each shared space (Builder+) gets its own dedicated LadybugDB database.

create_nodes.cypher

    -- Episodic memory: time-stamped events, verbatim storage
    CREATE NODE TABLE EpisodeNode (
        id             STRING PRIMARY KEY,     -- UUID v4: "ep_550e8400-..."
        agent_id       STRING NOT NULL,
        content        STRING NOT NULL,        -- verbatim text, no summarization
        embedding      FLOAT[1536],            -- TF-IDF + optional dense embedding
        valid_at       TIMESTAMP NOT NULL,     -- when episode occurred (temporal)
        invalid_at     TIMESTAMP,              -- when superseded (NULL = still valid)
        session_id     STRING,
        pheromone_w    FLOAT DEFAULT 1.0,      -- current weight (decayed at query time)
        pheromone_tau  FLOAT DEFAULT 86400.0,  -- decay time constant τ in seconds (1 day)
        last_accessed  TIMESTAMP               -- for lazy decay computation
    );

    -- Semantic memory: knowledge triples
    CREATE NODE TABLE FactNode (
        id             STRING PRIMARY KEY,
        agent_id       STRING NOT NULL,
        subject        STRING NOT NULL,
        predicate      STRING NOT NULL,
        object_        STRING NOT NULL,
        valid_at       TIMESTAMP NOT NULL,
        invalid_at     TIMESTAMP,              -- temporal validity
        confidence     FLOAT DEFAULT 1.0,
        pheromone_w    FLOAT DEFAULT 1.0,
        pheromone_tau  FLOAT DEFAULT 604800.0  -- 7-day default
    );

    -- Graph edges carry pheromone weight for A* traversal
    CREATE REL TABLE FACT_RELATES_FACT (
        FROM FactNode TO FactNode,
        relation_type  STRING NOT NULL,
        pheromone_w    FLOAT DEFAULT 1.0       -- edge-level traversal weight
    );
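The edge-level pheromone_w column feeds the traversal cost from stage 5 of the retrieval pipeline, cost(edge) = 1 − pheromone_weight(edge), with a 6-hop limit. The sketch below shows one way such an expansion could work over an in-memory adjacency list. It uses a zero heuristic, which reduces A* to uniform-cost search, because this page does not say which heuristic am-server uses; the function name, graph layout, and node IDs are hypothetical.

```python
import heapq


def expand_by_pheromone(graph, seeds, max_hops=6):
    """Cheapest cost from any seed to each node reachable within max_hops.

    graph maps node -> list of (neighbour, pheromone_w) with w in [0, 1].
    Edge cost is 1 - pheromone_w, so well-used edges are cheap to follow.
    Zero heuristic assumed: this is A* degenerated to uniform-cost search.
    """
    best_cost = {}
    fewest_hops_expanded = {}
    heap = [(0.0, 0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        cost, hops, node = heapq.heappop(heap)
        # The first pop of a node carries its minimum cost within the hop limit.
        best_cost.setdefault(node, cost)
        # Re-expand only when this path reaches the node in strictly fewer
        # hops, since that can unlock nodes the cheaper path could not reach.
        if hops >= fewest_hops_expanded.get(node, max_hops + 1):
            continue
        fewest_hops_expanded[node] = hops
        if hops == max_hops:
            continue
        for neighbour, pheromone_w in graph.get(node, []):
            heapq.heappush(heap, (cost + (1.0 - pheromone_w), hops + 1, neighbour))
    return best_cost


facts = {
    "fact_a": [("fact_b", 0.9), ("fact_c", 0.2)],
    "fact_b": [("fact_c", 0.9)],
    "fact_c": [],
}
costs = expand_by_pheromone(facts, seeds=["fact_a"])
one_hop = expand_by_pheromone(facts, seeds=["fact_a"], max_hops=1)
```

In this toy graph the two-hop route to fact_c through fact_b (cost about 0.2) beats the direct low-pheromone edge (cost 0.8), so frequently reinforced paths win even when they are longer. With max_hops=1 only the direct edge is available.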

Full schema: github.com/fetchai/agentverse-memory

8. Performance Targets

Phase 1 (in-memory am-local) verified 2026-05-12. Phase 2 targets (Cloud Run + Filestore) are design targets, not yet measured.

| Metric | Phase 1 (am-local) | Phase 2 Target | Comparison |
|---|---|---|---|
| Write op (engine, no LLM) | 0.035 ms ✅ | <5 ms | Mem0: ~2,590 ms LLM-extraction write step |
| Query op (engine) | 0.069 ms ✅ | <20 ms | Zep: ~800 ms (deployed) |
| Graph traversal (A*) | N/A (in-memory) | <50 ms | Mem0: graph behind $249/mo |
| Max agents (Explorer) | | 3 | Mem0: 3 (same) |
| Ops/mo (Explorer) | | 50K | Mem0: 100 ops |
| LLM calls at write | 0 ✅ | 0 | Mem0: 1 LLM call/write |

Phase 1 latencies measured on am-local (2.3 MB stripped binary). 5 episodes ingested, 2 queries. 35/35 tests pass. Ablation modes: full pheromone, --no-pheromone, --no-astar.
