Architecture
How am-server works under the hood — from request routing to LadybugDB graph retrieval, pheromone dynamics, and GCP deployment topology.
1. Overview
Agentverse Memory is a persistent memory service for LLM agents. Agents write memories during interactions; the service stores them in a per-agent knowledge graph and retrieves them on demand. Unlike flat vector stores, retrieval quality improves with use — every successful retrieval deposits a pheromone trace that biases future traversal toward proven paths.
- Memories are ingested via TF-IDF entity extraction and direct graph insertion: no API calls, no variable latency, and no per-write token cost.
- Every agent's memory lives in a LadybugDB embedded graph database, enabling multi-hop relational retrieval that flat vector search cannot provide.
- Every retrieval deposits typed decay traces on the nodes and edges it traverses, and the A* cost function treats those traces as first-class signals.
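To make the ingestion bullet concrete, here is a minimal sketch of TF-IDF term scoring in Rust. It is not am-server's extractor: the tokenizer (lowercase alphanumeric runs longer than two bytes), the smoothed IDF formula, the top-k cutoff, and the names tokenize and top_terms are all illustrative assumptions. It only shows why this step needs no model call: scoring a new memory is a handful of hash-map lookups against the agent's earlier memories.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tokenizer: lowercase alphanumeric runs longer than two bytes.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| t.len() > 2)
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical extractor: rank the terms of `doc` by TF-IDF against the
/// agent's earlier memories in `corpus` and keep the top `k` as entity candidates.
fn top_terms(doc: &str, corpus: &[&str], k: usize) -> Vec<(String, f64)> {
    let tokens = tokenize(doc);
    if tokens.is_empty() {
        return Vec::new();
    }
    // Raw term counts within the new memory.
    let mut counts: HashMap<String, f64> = HashMap::new();
    for t in &tokens {
        *counts.entry(t.clone()).or_insert(0.0) += 1.0;
    }
    // Distinct terms per stored memory, used for document frequency.
    let doc_sets: Vec<HashSet<String>> = corpus
        .iter()
        .map(|d| tokenize(d).into_iter().collect::<HashSet<String>>())
        .collect();
    let n = corpus.len() as f64;
    let total = tokens.len() as f64;
    let mut scored: Vec<(String, f64)> = counts
        .into_iter()
        .map(|(term, count)| {
            let df = doc_sets.iter().filter(|s| s.contains(&term)).count() as f64;
            // Smoothed IDF: terms never seen before get the largest boost.
            let idf = ((n + 1.0) / (df + 1.0)).ln() + 1.0;
            (term, (count / total) * idf)
        })
        .collect();
    // Highest score first; ties broken alphabetically so output is deterministic.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored.truncate(k);
    scored
}

fn main() {
    let corpus = ["the user prefers dark mode", "the user lives in berlin"];
    for (term, score) in top_terms("the user deployed postgres to berlin", &corpus, 3) {
        println!("{term}: {score:.3}");
    }
}
```

With the corpus in main, deployed and postgres have never appeared before and outrank berlin, which appeared once; the and user occur in every stored memory and score lowest.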
2. Single Binary (ARCH-001)
Decision: MCP server and REST API run in the same am-server binary. There is one Cloud Run service, not two.
Why: LadybugDB is single-process
LadybugDB uses file-based locking and a write-ahead log (WAL). Opening the same database directory from multiple processes is not supported and causes WAL corruption. This is not a configuration issue; it follows from LadybugDB's embedded, in-process design. Running MCP and REST in the same process eliminates this risk entirely.
```
POST /mcp   ───┐
GET /health ───┼──▶ am-server (single Axum binary, one Cloud Run service)
GET /ready  ───┘    ├── am-core        domain types, error handling
                    ├── am-config      env-var config (serde + envy)
                    ├── am-retrieval   A* pheromone traversal, TF-IDF, BM25+HNSW RRF
                    └── am-storage     LadybugDB FFI, palace pool LRU, Redis working memory
```

Cloud Run concurrency model
containerConcurrency=80 means up to 80 concurrent HTTP requests are served by a single process, scheduled on Tokio's async runtime. The palace pool is shared through an internal RwLock<PalacePool>, so there is no cross-process coordination and no corruption risk.
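The concurrency model can be sketched with the standard library alone. In this hypothetical version, OS threads stand in for Tokio tasks so no dependencies are needed, PalacePool is reduced to a map from agent_id to a placeholder handle, and serve is an illustrative name; only the RwLock<PalacePool> shape comes from the text. Requests share the read lock, and only a request that must open a palace takes the lock exclusively.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

/// Hypothetical stand-in for PalacePool: agent_id -> opaque handle number.
#[derive(Default)]
struct PalacePool {
    open: HashMap<String, u64>,
}

/// Serve every request concurrently against one shared, in-process pool.
/// OS threads stand in for Tokio tasks so the sketch needs no dependencies.
fn serve(pool: &RwLock<PalacePool>, requests: &[&str]) -> usize {
    thread::scope(|s| {
        for agent in requests {
            s.spawn(move || {
                // Fast path: many requests can hold the read lock at once.
                // The read guard is a temporary and is dropped after this condition.
                if pool.read().unwrap().open.contains_key(*agent) {
                    return;
                }
                // Slow path: exclusive write lock. entry() re-checks, because another
                // request may have opened this palace between the two lock acquisitions.
                let mut guard = pool.write().unwrap();
                let handle = guard.open.len() as u64;
                guard.open.entry(agent.to_string()).or_insert(handle);
            });
        }
    });
    pool.read().unwrap().open.len()
}

fn main() {
    let pool = RwLock::new(PalacePool::default());
    let agents = ["agent-a", "agent-b", "agent-c"];
    // 80 concurrent requests, matching containerConcurrency=80.
    let requests: Vec<&str> = (0..80).map(|i| agents[i % agents.len()]).collect();
    println!("open palaces: {}", serve(&pool, &requests));
}
```

Because every request lives in one address space, the lock is the only coordination needed; two separate processes would have no shared lock and would each open the database files directly.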
3. Deployment Topology
```
┌─────────────────────────────────────────────────────────┐
│  LLM Agent (Python / TypeScript / Claude / GPT / any)   │
└────────────────────────────┬────────────────────────────┘
                             │  MCP tools or REST API
                             │  Authorization: Bearer avmem_sk_...
                             ▼
        ┌─────────────────────────────────────────┐
        │  Cloud Load Balancer (GCP, HTTPS)       │
        │  memory.agentverse.ai                   │
        └─────────────────────────────────────────┘
                             │
                             ▼
        ┌─────────────────────────────────────────┐
        │  am-server (Cloud Run, us-central1)     │
        │  project: fetch-coder                   │
        │  2 vCPU · 4 GiB · concurrency=80        │
        │  scale-to-zero · min-instances=1        │
        │                                         │
        │  POST /mcp      ← JSON-RPC 2.0          │
        │  GET|POST /v1/* ← REST API              │
        │  GET /health    ← liveness probe        │
        │  GET /metrics   ← Prometheus scrape     │
        └────────────────────┬────────────────────┘
                             │
           ┌─────────────────┴─────────────────┐
           ▼                                   ▼
┌────────────────────┐            ┌────────────────────────┐
│ GCP Filestore NFS  │            │ Memorystore Redis      │
│ /data/palaces/     │            │ Working memory         │
│   {agent_id}/      │            │ Auth cache             │
│   palace.kuzu      │            │ Rate limiting          │
└────────────────────┘            └────────────────────────┘
```

4. Memory Types & Storage
Five memory types are supported. Four are stored in LadybugDB; Working memory is Redis-primary.
| Type | Node Table | Primary Store | Default τ | Tool name pattern |
|---|---|---|---|---|
| Episodic | EpisodeNode | LadybugDB | 24 h | memory_*_episode* |
| Semantic | FactNode | LadybugDB | 7 d | memory_*_fact* |
| Procedural | ProcedureNode | LadybugDB | 30 d | memory_*_procedure* |
| Working | WorkingMemoryNode | Redis | 1 h | memory_*_working* |
| Shared | SharedSpaceNode | LadybugDB (shared) | — | memory_*_space* |
Palace Pool
am-server maintains an LRU pool of up to 100 concurrently open LadybugDB databases (PALACE_MAX_OPEN=100). When a request arrives for an agent whose palace is not in the pool, the palace is opened from NFS; if the pool is at capacity, the least-recently-used palace is evicted first. A new agent gets a fresh LadybugDB database, with the schema initialized on first open.
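A minimal LRU sketch of the pool behaviour described above, assuming a tick counter for recency and a linear scan on eviction (cheap at PALACE_MAX_OPEN=100). Palace, the PalacePool fields, and get_or_open are hypothetical names; opening a database is simulated by building its path from the layout given in the schema section.

```rust
use std::collections::HashMap;

/// Hypothetical handle for an open LadybugDB database (opening is simulated).
#[derive(Debug)]
struct Palace {
    path: String,
}

/// Minimal LRU sketch: at most `max_open` palaces stay open at once
/// (PALACE_MAX_OPEN=100 in production). Field names are illustrative.
struct PalacePool {
    max_open: usize,
    clock: u64,
    open: HashMap<String, (Palace, u64)>, // agent_id -> (handle, last-use tick)
}

impl PalacePool {
    fn new(max_open: usize) -> Self {
        Self { max_open, clock: 0, open: HashMap::new() }
    }

    /// Return the agent's palace, opening it on a miss. If the pool is full,
    /// the least-recently-used palace is evicted first and its agent_id returned.
    fn get_or_open(&mut self, agent_id: &str) -> (&Palace, Option<String>) {
        self.clock += 1;
        let now = self.clock;
        let mut evicted = None;
        if !self.open.contains_key(agent_id) {
            if self.open.len() >= self.max_open {
                // Linear scan for the oldest tick; cheap enough for ~100 entries.
                let lru = self
                    .open
                    .iter()
                    .min_by_key(|(_, slot)| slot.1)
                    .map(|(id, _)| id.clone());
                if let Some(id) = lru {
                    self.open.remove(&id); // a real pool would close/flush here
                    evicted = Some(id);
                }
            }
            // Path layout from the schema section; a new agent would get a fresh database.
            let palace = Palace { path: format!("/data/palaces/{agent_id}/palace.kuzu") };
            self.open.insert(agent_id.to_string(), (palace, now));
        }
        let slot = self.open.get_mut(agent_id).expect("palace is open at this point");
        slot.1 = now; // mark as most recently used
        (&slot.0, evicted)
    }
}

fn main() {
    let mut pool = PalacePool::new(2);
    pool.get_or_open("agent-a");
    pool.get_or_open("agent-b");
    pool.get_or_open("agent-a"); // agent-b is now least recently used
    let (palace, evicted) = pool.get_or_open("agent-c");
    println!("opened {} (evicted {:?})", palace.path, evicted);
}
```

A production pool would more likely keep an ordered structure so eviction is O(1); at 100 entries the scan is unlikely to matter next to the cost of opening a database from NFS.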
5. Retrieval Pipeline
Every query runs a 5-stage hybrid pipeline. Stages 1 and 2 run in parallel; stages 3–5 run sequentially, because each consumes the previous stage's output.
- Stage 1 (BM25 full-text search): LadybugDB's native FTS index over episode content returns a ranked list of episode IDs.
- Stage 2 (HNSW vector search): approximate nearest-neighbour search over TF-IDF embeddings (FLOAT[1536] on Pro+, PCA-reduced FLOAT[512] on Explorer/Builder).
- Stage 3 (Reciprocal Rank Fusion): combines the BM25 and HNSW ranked lists into a single merged ranking, with no threshold tuning required.
- Stage 4 (pheromone reweighting): each result score is multiplied by its current pheromone weight w(t) = w₀·exp(−Δt/τ), computed lazily at query time with no background daemon.
- Stage 5 (A* graph expansion): results are expanded along high-pheromone graph edges using A* pathfinding, with cost(edge) = 1 − pheromone_weight(edge) and a maximum of 6 hops.
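Stages 3 to 5 can be sketched in a few functions. This is an illustration, not am-server's code: rrf, reweight, expand and State are hypothetical names; the RRF constant k = 60 is a commonly used default rather than a documented am-server value; and expand uses a zero heuristic, so it is A* reduced to Dijkstra's algorithm, since the real heuristic is not described here. Edge cost follows the text (1 − w) and the hop limit is a parameter (6 in the text).

```rust
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashMap};

/// Stage 3: Reciprocal Rank Fusion. score(d) = sum over lists of 1 / (k + rank), rank from 1.
fn rrf(lists: &[Vec<&str>], k: f64) -> HashMap<String, f64> {
    let mut scores = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    scores
}

/// Stage 4: scale each fused score by the node's already-decayed pheromone weight.
/// Nodes without a recorded trace use 1.0, the schema's DEFAULT for pheromone_w.
fn reweight(scores: &mut HashMap<String, f64>, pheromone: &HashMap<String, f64>) {
    for (id, s) in scores.iter_mut() {
        *s *= pheromone.get(id).copied().unwrap_or(1.0);
    }
}

/// Priority-queue entry, ordered so BinaryHeap (a max-heap) pops the cheapest path first.
struct State {
    cost: f64,
    hops: usize,
    node: String,
}
impl PartialEq for State {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for State {}
impl PartialOrd for State {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for State {
    fn cmp(&self, other: &Self) -> Ordering {
        other.cost.total_cmp(&self.cost) // reversed: lower cost = higher priority
    }
}

/// Stage 5: expand from the seed results along edges with cost(edge) = 1 - w, up to `max_hops`.
/// Zero heuristic, so this is A* reduced to Dijkstra. Each node keeps only its cheapest cost,
/// so near the hop limit a cheaper but longer path can shadow a shorter one.
fn expand(seeds: &[&str], edges: &HashMap<&str, Vec<(&str, f64)>>, max_hops: usize) -> HashMap<String, f64> {
    let mut best: HashMap<String, f64> = HashMap::new();
    let mut heap = BinaryHeap::new();
    for seed in seeds {
        best.insert(seed.to_string(), 0.0);
        heap.push(State { cost: 0.0, hops: 0, node: seed.to_string() });
    }
    while let Some(State { cost, hops, node }) = heap.pop() {
        // Skip stale queue entries and paths that have used their hop budget.
        if cost > best[&node] || hops == max_hops {
            continue;
        }
        for &(next, w) in edges.get(node.as_str()).into_iter().flatten() {
            let c = cost + (1.0 - w.clamp(0.0, 1.0));
            if best.get(next).map_or(true, |&old| c < old) {
                best.insert(next.to_string(), c);
                heap.push(State { cost: c, hops: hops + 1, node: next.to_string() });
            }
        }
    }
    best
}

fn main() {
    let mut fused = rrf(&[vec!["ep_a", "ep_b"], vec!["ep_b", "ep_c"]], 60.0);
    reweight(&mut fused, &HashMap::from([("ep_a".to_string(), 0.1)]));
    let edges = HashMap::from([("ep_b", vec![("ep_d", 0.9)])]);
    let reached = expand(&["ep_b"], &edges, 6);
    println!("fused = {fused:?}\nreached = {reached:?}");
}
```

RRF only uses ranks, so BM25 scores and vector distances never need to be put on a common scale; this is why no threshold tuning is required.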
6. Pheromone Model
Agentverse Memory uses a stigmergic retrieval model inspired by ant colony optimization (ACO). Every retrieval deposits a pheromone trace; every trace decays exponentially over time. Memories that are frequently retrieved accumulate weight faster than they decay — making them progressively more retrievable. Memories that become irrelevant decay to near-zero without requiring explicit deletion.
Decay (lazy, at query time)
w(t) = w₀ · exp(−Δt / τ)
Computed in Rust with no background daemon and no write amplification.
Deposit (on successful retrieval)
w_new = clamp(w + α, 0, 1)
α = 0.1 (configurable). The deposit is applied asynchronously, after retrieval completes.
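Decay and deposit fit in one small struct. This sketch assumes times in Unix seconds, and assumes that a deposit is added to the decayed weight and then restarts the decay clock; lazy decay requires something like this, but the text does not spell it out. Trace and its methods are hypothetical names whose fields mirror the pheromone_w, pheromone_tau and last_accessed columns. The real deposit runs asynchronously after retrieval; here it is called inline.

```rust
/// A pheromone trace as stored on a node or edge. Field names mirror the schema's
/// pheromone_w, pheromone_tau and last_accessed columns; times are Unix seconds (assumption).
#[derive(Debug, Clone, Copy)]
struct Trace {
    w: f64,
    tau: f64,
    last_accessed: f64,
}

impl Trace {
    /// Lazy decay, w(t) = w0 * exp(-dt / tau). Pure read: nothing is written back.
    fn weight_at(&self, now: f64) -> f64 {
        let dt = (now - self.last_accessed).max(0.0);
        self.w * (-dt / self.tau).exp()
    }

    /// Deposit after a successful retrieval: w_new = clamp(w + alpha, 0, 1), applied to
    /// the decayed weight; the decay clock then restarts at `now`.
    fn deposit(&mut self, now: f64, alpha: f64) {
        self.w = (self.weight_at(now) + alpha).clamp(0.0, 1.0);
        self.last_accessed = now;
    }
}

fn main() {
    const ALPHA: f64 = 0.1;
    const DAY: f64 = 86_400.0;
    // An episode trace (tau = 1 day) last touched at t = 0.
    let mut trace = Trace { w: 1.0, tau: DAY, last_accessed: 0.0 };
    let before = trace.weight_at(DAY); // after one tau: 1/e of the stored weight
    trace.deposit(DAY, ALPHA);
    println!("before deposit: {before:.3}, after: {:.3}", trace.w);
}
```

Because weight_at never writes, a read costs no storage write; only a deposit updates the stored row, which is one way to read the no-write-amplification claim.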
| Memory Type | τ (decay time constant) | Rationale |
|---|---|---|
| Episode | 1 day | Conversation context fades quickly |
| Entity / Fact | 7 days | Facts persist through a work week |
| Procedure | 30 days | How-to knowledge is durable |
| Working | 1 hour | Current task context — intentionally ephemeral |
| Graph Edge | 3 days | Relationships have medium persistence |
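Under w(t) = w₀·exp(−Δt/τ), τ is the e-folding time: after Δt = τ a trace keeps 1/e ≈ 37% of its weight, not half. The half-life follows from setting w(t) = w₀/2:

```latex
w_0 \, e^{-t_{1/2}/\tau} = \tfrac{1}{2} w_0
\;\Longrightarrow\;
t_{1/2} = \tau \ln 2 \approx 0.693\,\tau
```

For the τ values in the table this gives half-lives of roughly 16.6 h (Episode), 4.85 d (Entity / Fact), 20.8 d (Procedure), 41.6 min (Working), and 2.08 d (Graph Edge).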
Research: The pheromone model is being formalized in PheromGraph (ICLR 2027 target) — the first paper to apply stigmergic dynamics to LLM knowledge graph retrieval.
7. LadybugDB Schema (Key Tables)
Each agent gets an isolated LadybugDB database at /data/palaces/{agent_id}/palace.kuzu on GCP Filestore. Shared spaces (Builder+) get their own dedicated LadybugDB database.
```
-- Episodic memory: time-stamped events, verbatim storage
CREATE NODE TABLE EpisodeNode (
    id            STRING PRIMARY KEY,      -- UUID v4: "ep_550e8400-..."
    agent_id      STRING NOT NULL,
    content       STRING NOT NULL,         -- verbatim text, no summarization
    embedding     FLOAT[1536],             -- TF-IDF + optional dense embedding
    valid_at      TIMESTAMP NOT NULL,      -- when episode occurred (temporal)
    invalid_at    TIMESTAMP,               -- when superseded (NULL = still valid)
    session_id    STRING,
    pheromone_w   FLOAT DEFAULT 1.0,       -- current weight (decayed at query time)
    pheromone_tau FLOAT DEFAULT 86400.0,   -- decay time constant in seconds (1 day)
    last_accessed TIMESTAMP                -- for lazy decay computation
);

-- Semantic memory: knowledge triples
CREATE NODE TABLE FactNode (
    id            STRING PRIMARY KEY,
    agent_id      STRING NOT NULL,
    subject       STRING NOT NULL,
    predicate     STRING NOT NULL,
    object_       STRING NOT NULL,
    valid_at      TIMESTAMP NOT NULL,
    invalid_at    TIMESTAMP,               -- temporal validity
    confidence    FLOAT DEFAULT 1.0,
    pheromone_w   FLOAT DEFAULT 1.0,
    pheromone_tau FLOAT DEFAULT 604800.0   -- 7-day default
);

-- Graph edges carry pheromone weight for A* traversal
CREATE REL TABLE FACT_RELATES_FACT (
    FROM FactNode TO FactNode,
    relation_type STRING NOT NULL,
    pheromone_w   FLOAT DEFAULT 1.0        -- edge-level traversal weight
);
```

Full schema: github.com/fetchai/agentverse-memory
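A sketch of how the valid_at / invalid_at pair supports as-of queries, assuming timestamps in Unix seconds. The Fact struct mirrors only the FactNode columns it needs, and the supersession rule (a new object for the same subject and predicate closes the current fact) is a hypothetical policy for illustration; the schema says only that invalid_at records when a row was superseded.

```rust
/// Hypothetical in-memory mirror of the FactNode columns used for temporal validity
/// (confidence and pheromone columns omitted). Timestamps are Unix seconds here.
#[derive(Debug)]
struct Fact {
    subject: String,
    predicate: String,
    object_: String,
    valid_at: i64,
    invalid_at: Option<i64>, // None plays the role of NULL: still valid
}

impl Fact {
    /// The fact held at time t if valid_at <= t < invalid_at.
    fn valid_as_of(&self, t: i64) -> bool {
        self.valid_at <= t && self.invalid_at.map_or(true, |end| t < end)
    }
}

/// Assumed supersession policy: a new object for the same (subject, predicate)
/// closes the currently valid fact instead of deleting it, so history stays queryable.
fn supersede(facts: &mut Vec<Fact>, subject: &str, predicate: &str, object_: &str, now: i64) {
    for f in facts.iter_mut() {
        if f.subject == subject && f.predicate == predicate && f.invalid_at.is_none() {
            f.invalid_at = Some(now);
        }
    }
    facts.push(Fact {
        subject: subject.to_string(),
        predicate: predicate.to_string(),
        object_: object_.to_string(),
        valid_at: now,
        invalid_at: None,
    });
}

/// Objects that held for (subject, predicate) at time t.
fn lookup<'a>(facts: &'a [Fact], subject: &str, predicate: &str, t: i64) -> Vec<&'a str> {
    facts
        .iter()
        .filter(|f| f.subject == subject && f.predicate == predicate && f.valid_as_of(t))
        .map(|f| f.object_.as_str())
        .collect()
}

fn main() {
    let mut facts = Vec::new();
    supersede(&mut facts, "user", "lives_in", "Berlin", 100);
    supersede(&mut facts, "user", "lives_in", "Munich", 200);
    println!("at t=150: {:?}", lookup(&facts, "user", "lives_in", 150));
    println!("at t=250: {:?}", lookup(&facts, "user", "lives_in", 250));
}
```

Keeping superseded rows with a closed validity interval lets a query ask what was true at a past moment, rather than only what is true now.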
8. Performance Targets
Phase 1 (in-memory am-local) verified 2026-05-12. Phase 2 targets (Cloud Run + Filestore) are design targets, not yet measured.
| Metric | Phase 1 (am-local) | Phase 2 Target | Comparison |
|---|---|---|---|
| Write op (engine, no LLM) | 0.035 ms ✅ | <5 ms | Mem0: ~2,590 ms LLM-extraction write step |
| Query op (engine) | 0.069 ms ✅ | <20 ms | Zep: ~800 ms (deployed) |
| Graph traversal (A*) | N/A (in-memory) | <50 ms | Mem0: graph behind $249/mo |
| Max agents (Explorer) | — | 3 | Mem0: 3 (same) |
| Ops/mo (Explorer) | — | 50K | Mem0: 100 ops (!!) |
| LLM calls at write | 0 ✅ | 0 | Mem0: 1 LLM call/write |
Phase 1 latencies were measured on am-local (2.3 MB stripped binary) with 5 episodes ingested and 2 queries; 35/35 tests pass. Ablation modes: full pheromone, --no-pheromone, --no-astar.