Architecture

How am-server works under the hood — from request routing to LadybugDB graph retrieval, pheromone dynamics, and GCP deployment topology.

1. Overview

Agentverse Memory is a persistent memory service for LLM agents. Agents write memories during interactions; the service stores them in a per-agent knowledge graph and retrieves them on demand. Unlike a flat vector store, the service is built so that retrieval quality can improve with use: every successful retrieval deposits a pheromone trace that biases future traversal toward proven paths.

Zero LLM at write time

Memories are ingested via TF-IDF entity extraction and direct graph insertion — no API calls, no variable latency, no per-write token cost.

Graph-first storage

Every agent's memory lives in a LadybugDB embedded graph database, enabling multi-hop relational retrieval that flat vector search cannot do.

Pheromone-guided traversal

Every retrieval deposits typed, decaying traces on the nodes and edges it traverses. The A* cost function treats those traces as first-class signals.

2. Single Binary (ARCH-001)

Decision: MCP server and REST API run in the same am-server binary. There is one Cloud Run service, not two.

Why: LadybugDB is single-process

LadybugDB uses file-based locking and a write-ahead log (WAL). Opening the same database directory from multiple processes is not supported and causes WAL corruption. This is not a configuration issue; it is inherent to LadybugDB's embedded, single-process design. Running MCP and REST in the same process eliminates this risk entirely.

POST /mcp   ───┐
               │  am-server   (single Axum binary, one Cloud Run service)
GET /health  ──┤  ├── am-core        domain types, error handling
GET /ready   ──┘  ├── am-config      env-var config (serde + envy)
                  ├── am-retrieval   TF-IDF + dense RRF hybrid, pheromone, A*
                  └── am-storage     LadybugDB FFI, palace pool LRU, Redis working memory

Cloud Run concurrency model

containerConcurrency=80 means up to 80 concurrent HTTP requests can reach one process, where Tokio's async runtime handles them. The palace pool is guarded internally by an RwLock<PalacePool>, so there is no cross-process coordination and no corruption risk.

3. Deployment Topology

┌─────────────────────────────────────────────────────────┐
│  LLM Agent (Python / TypeScript / Claude / GPT / any)   │
└──────────────────────────┬──────────────────────────────┘
                           │  MCP tools or REST API
                           │  Authorization: Bearer am_...
                           ▼
          ┌─────────────────────────────────────────┐
          │  Cloud Load Balancer (GCP, HTTPS)        │
          │  memory.agentverse.ai                   │
          └─────────────────────────────────────────┘
                           │
                           ▼
          ┌─────────────────────────────────────────┐
          │  am-server  (Cloud Run, us-central1)     │
          │  project: fetch-coder                   │
          │  2 vCPU · 4 GiB · concurrency=80        │
          │  scale-to-zero · min-instances=1        │
          │                                         │
          │  POST /mcp      ← JSON-RPC 2.0          │
          │  GET|POST /v1/* ← REST API              │
          │  GET /health    ← liveness probe        │
          │  GET /metrics   ← Prometheus scrape     │
          └──────────────┬──────────────────────────┘
                         │
             ┌───────────┴────────────┐
             ▼                        ▼
┌────────────────────┐   ┌────────────────────────┐
│  GCP Filestore NFS │   │  Memorystore Redis      │
│  /data/palaces/    │   │  Working memory         │
│    {agent_id}/     │   │  Auth cache             │
│      palace.kuzu   │   │  Rate limiting          │
└────────────────────┘   └────────────────────────┘

4. Memory Types & Storage

Five memory types are supported. Four are stored in LadybugDB; Working memory is Redis-primary.

Type	Node Table	Primary Store	Default τ	Tools prefix
Episodic	EpisodeNode	LadybugDB	24 h	memory__episode
Semantic	FactNode	LadybugDB	7 d	memory__fact
Procedural	ProcedureNode	LadybugDB	30 d	memory__procedure
Working	WorkingMemoryNode	Redis	1 h	memory__working
Shared	SharedSpaceNode	LadybugDB (shared)	—	memory__space

Palace Pool

am-server maintains an LRU pool of up to 100 concurrently open LadybugDB databases (PALACE_MAX_OPEN=100). When an agent's request arrives and their palace is not in the pool, it is opened from NFS and the least-recently-used palace is evicted if the pool is at capacity. New agents get a fresh LadybugDB database with schema initialized on first open.
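
The eviction behaviour described above can be sketched in a few lines of Python. This is only an illustration of LRU pooling, not the am-server code (which is Rust): the `PalacePool` class, its `opener` callable, and the `evicted` list are hypothetical names introduced here, and the "handle" is just the palace path rather than a real database connection.

```python
from collections import OrderedDict


class PalacePool:
    """Illustrative LRU pool of open per-agent databases."""

    def __init__(self, max_open=100, opener=None):
        self.max_open = max_open
        # Hypothetical opener: returns the palace path instead of a real handle.
        self.opener = opener or (lambda agent_id: f"/data/palaces/{agent_id}/palace.kuzu")
        self._open = OrderedDict()  # agent_id -> handle, least recently used first
        self.evicted = []           # agent_ids evicted so far (for illustration)

    def get(self, agent_id):
        if agent_id in self._open:
            self._open.move_to_end(agent_id)  # mark as most recently used
            return self._open[agent_id]
        if len(self._open) >= self.max_open:
            lru_id, _handle = self._open.popitem(last=False)  # drop the LRU entry
            self.evicted.append(lru_id)
        handle = self.opener(agent_id)
        self._open[agent_id] = handle
        return handle

    def open_ids(self):
        return list(self._open)


# Tiny pool so that eviction is visible: "b" is least recently used when "c" arrives.
pool = PalacePool(max_open=2)
for agent in ["a", "b", "a", "c"]:
    pool.get(agent)
```

With a capacity of 2, the request for "c" evicts "b" because "a" was touched more recently. A production pool would also need to close the evicted handle and guard the structure with a lock.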

5. Retrieval Pipeline

memory_search_episodes defaults to hybrid retrieval (stages 1–3). Pheromone reweighting and graph expansion (stages 4–5) are optional, opt-in via per-call toggles. All embedding work happens on the read path only and is cached per agent — the write path stays zero-LLM.

Stage 1: TF-IDF lexical retrieval · Default · all tiers

Classical TF-IDF term-frequency scoring over episode content. Deterministic, no model call. Returns a ranked list of episode IDs.
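
As a rough illustration of this kind of lexical scoring, the sketch below ranks episodes by the summed TF-IDF of the query terms. The whitespace tokenizer, the smoothed IDF variant, and the function names are assumptions made for the example; the source does not specify which TF-IDF weighting am-retrieval uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, purely for illustration.
    return text.lower().split()


def tfidf_rank(query, episodes):
    """Return episode IDs ranked by summed TF-IDF of the query terms."""
    docs = {eid: Counter(tokenize(text)) for eid, text in episodes.items()}
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scores = {}
    for eid, counts in docs.items():
        total = sum(counts.values())
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in docs.values() if term in c)  # document frequency
            if df == 0 or total == 0:
                continue
            tf = counts[term] / total
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumed variant)
            score += tf * idf
        scores[eid] = score
    # Drop non-matching episodes; break ties by ID so the output is deterministic.
    return sorted((e for e in scores if scores[e] > 0), key=lambda e: (-scores[e], e))


episodes = {
    "ep_1": "deploy failed on cloud run",
    "ep_2": "user prefers dark mode",
    "ep_3": "cloud run deploy succeeded after retry",
}
ranked = tfidf_rank("deploy cloud run", episodes)
```

Here "ep_1" ranks above "ep_3" because the same three query terms make up a larger share of a shorter episode, and "ep_2" is dropped because it matches nothing.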

Stage 2: Dense embedding retrieval · Default · all tiers

Nearest-neighbour search over dense embeddings (text-embedding-3-small via OpenRouter), computed lazily on the read path and cached per agent — so writes never call an embedding model.
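
The lazy, cached embedding behaviour can be sketched as follows. Everything here is illustrative: `LazyEmbeddingIndex` and `toy_embed` are hypothetical, the toy embedding is a three-word count vector standing in for a real embedding model, and brute-force cosine similarity stands in for whatever nearest-neighbour index the service actually uses.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def toy_embed(text):
    # Stand-in for a real embedding model: counts of three fixed words.
    words = text.lower().split()
    return [float(words.count(w)) for w in ("deploy", "cloud", "dark")]


class LazyEmbeddingIndex:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}        # episode_id -> vector, filled on first read
        self.embed_calls = 0   # number of episode embeddings actually computed

    def _vector(self, episode_id, text):
        if episode_id not in self.cache:
            self.cache[episode_id] = self.embed_fn(text)
            self.embed_calls += 1
        return self.cache[episode_id]

    def search(self, query, episodes, top_k=3):
        q = self.embed_fn(query)
        scored = [(cosine(q, self._vector(eid, text)), eid) for eid, text in episodes.items()]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [eid for _, eid in scored[:top_k]]


episodes = {"ep_1": "deploy failed on cloud run", "ep_2": "user prefers dark mode"}
index = LazyEmbeddingIndex(toy_embed)
first = index.search("cloud deploy", episodes, top_k=1)
second = index.search("dark theme", episodes, top_k=1)
```

Both searches touch the same two episodes, but each episode is embedded only once; the second search is served from the cache.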

Stage 3: RRF fusion (k=60) · Default · all tiers

Reciprocal Rank Fusion merges the TF-IDF and dense ranked lists into one ranking (k=60). No threshold tuning required. This is the default returned ranking (retrieval: "hybrid").
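
For reference, standard Reciprocal Rank Fusion gives each item a score of 1 / (k + rank) per list and sums across lists. The sketch below shows that computation with k=60; the function name, the tie-breaking rule, and the sample episode IDs are assumptions for the example.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):  # ranks are 1-based
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    fused = sorted(scores, key=lambda item: (-scores[item], item))
    return fused, scores


lexical = ["ep_1", "ep_3", "ep_7"]   # e.g. TF-IDF ranking
dense = ["ep_3", "ep_9", "ep_1"]     # e.g. dense-embedding ranking
fused, fused_scores = rrf_fuse([lexical, dense])
```

"ep_3" ends up first because it is near the top of both lists (1/62 + 1/61), narrowly ahead of "ep_1" (1/61 + 1/63). Only ranks are used, which is why no score threshold needs tuning.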

Stage 4: Pheromone reweighting (optional) · Opt-in · all tiers

When enabled (use_pheromone), each result score is multiplied by its current pheromone weight w(t) = w₀·exp(−Δt/τ), computed lazily at query time — no background daemon. Off by default; benefits repeated-access and multi-agent workloads.
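
A minimal sketch of this reweighting step, assuming each result carries the pheromone_w, pheromone_tau, and last_accessed values from its node. The dict layout, the use of epoch seconds, and the `reweight` function are hypothetical and only illustrate the multiplication.

```python
import math


def decayed_weight(w0, elapsed_s, tau_s):
    # w(t) = w0 * exp(-dt / tau), evaluated lazily at query time.
    return w0 * math.exp(-elapsed_s / tau_s)


def reweight(results, now_s):
    """Multiply each fused score by the result's current pheromone weight."""
    rescored = []
    for r in results:
        elapsed = now_s - r["last_accessed"]
        w = decayed_weight(r["pheromone_w"], elapsed, r["pheromone_tau"])
        rescored.append((r["id"], r["score"] * w))
    rescored.sort(key=lambda pair: (-pair[1], pair[0]))
    return rescored


NOW = 1_000_000.0  # arbitrary "current time" in seconds
results = [
    # Strong match, but last accessed two days ago.
    {"id": "ep_old", "score": 0.9, "pheromone_w": 1.0, "pheromone_tau": 86400.0,
     "last_accessed": NOW - 2 * 86400.0},
    # Weaker match, accessed one hour ago.
    {"id": "ep_new", "score": 0.5, "pheromone_w": 1.0, "pheromone_tau": 86400.0,
     "last_accessed": NOW - 3600.0},
]
reranked = reweight(results, NOW)
```

In this example the recently accessed episode overtakes the stronger but stale match, which illustrates why the stage is opt-in: it deliberately trades raw relevance for recency of use.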

Stage 5: Graph expansion / A* (optional) · Opt-in · Builder+ (memory_find_path returns 403 on Explorer)

Results can be expanded along graph edges, or paths found with A* via the memory_find_path tool. cost(edge) = 1 − pheromone_weight(edge). Off by default for QA-style retrieval.
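
The edge cost above can be plugged into a textbook A* search, sketched below. The source does not say which heuristic memory_find_path uses, so this example defaults to a zero heuristic (which makes A* behave like uniform-cost search); the graph layout, node names, and `find_path` function are likewise assumptions.

```python
import heapq


def find_path(edges, start, goal, heuristic=lambda node: 0.0):
    """edges maps node -> [(neighbor, pheromone_w), ...]. Returns (cost, path) or None."""
    frontier = [(heuristic(start), 0.0, start, [start])]  # (f, g, node, path)
    best = {start: 0.0}
    while frontier:
        _f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, pheromone_w in edges.get(node, []):
            new_g = g + (1.0 - pheromone_w)  # cost(edge) = 1 - pheromone_weight(edge)
            if new_g < best.get(neighbor, float("inf")):
                best[neighbor] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + heuristic(neighbor), new_g, neighbor, path + [neighbor]),
                )
    return None


edges = {
    "fact_a": [("fact_b", 0.9), ("fact_d", 0.1)],  # direct edge to fact_d is rarely used
    "fact_b": [("fact_d", 0.8)],
}
cost, path = find_path(edges, "fact_a", "fact_d")
```

The two-hop route through "fact_b" costs about 0.3, while the direct but low-pheromone edge costs 0.9, so the well-trodden path wins. Because weights are clamped to [0, 1], edge costs are never negative, which the search relies on.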

6. Pheromone Model

Agentverse Memory uses a stigmergic retrieval model inspired by ant colony optimization (ACO). Every retrieval deposits a pheromone trace; every trace decays exponentially over time. Memories that are frequently retrieved accumulate weight faster than they decay — making them progressively more retrievable. Memories that become irrelevant decay to near-zero without requiring explicit deletion.

Decay (lazy, at query time)

w(t) = w₀ · exp(−Δt / τ)

Computed in Rust with no background daemon, no write amplification.
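
A short numeric check of the decay formula, written in Python for illustration (the service computes this in Rust). Under w(t) = w₀ · exp(−Δt / τ), τ is the time for the weight to fall to 1/e (about 37%) of its starting value, and the corresponding half-life is τ · ln 2. The function names are hypothetical.

```python
import math


def decayed_weight(w0, elapsed_s, tau_s):
    # w(t) = w0 * exp(-dt / tau)
    return w0 * math.exp(-elapsed_s / tau_s)


def half_life(tau_s):
    # Solve w0 * exp(-t / tau) = w0 / 2 for t.
    return tau_s * math.log(2)


DAY = 86400.0  # the episodic tau, in seconds
after_one_tau = decayed_weight(1.0, DAY, DAY)                # falls to 1/e
after_half_life = decayed_weight(1.0, half_life(DAY), DAY)   # falls to 1/2
half_life_hours = half_life(DAY) / 3600.0                    # about 16.6 hours
```

So an episode with τ = 24 h has lost half its weight after roughly 16.6 hours and is at about 37% after a full day.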

Deposit (on successful retrieval)

w_new = clamp(w + α, 0, 1)

α = 0.1 (configurable). Deposit happens post-retrieval, asynchronously.
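
The deposit rule can be sketched as below. One detail is an assumption rather than something the source states: the stored weight is first decayed to the current time and the deposit is added to that decayed value. The function names are hypothetical.

```python
import math

ALPHA = 0.1  # deposit amount (the documented, configurable default)


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def decay_then_deposit(w_stored, elapsed_s, tau_s, alpha=ALPHA):
    # Assumption: decay the stored weight to "now" before adding alpha.
    w_now = w_stored * math.exp(-elapsed_s / tau_s)
    return clamp(w_now + alpha)


DAY = 86400.0
stale = decay_then_deposit(1.0, DAY, DAY)   # retrieved one tau after last access
fresh = decay_then_deposit(1.0, 0.0, DAY)   # retrieved again immediately
```

A memory retrieved after one τ comes back at about 0.47 (1/e + 0.1), while an immediate repeat retrieval stays pinned at the upper bound of 1.0 because of the clamp.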

Memory Type	τ (decay time constant)	Rationale
Episode	1 day	Conversation context fades quickly
Entity / Fact	7 days	Facts persist through a work week
Procedure	30 days	How-to knowledge is durable
Working	1 hour	Current task context — intentionally ephemeral
Graph Edge	3 days	Relationships have medium persistence

Research: The pheromone model is being formalized in PheromGraph (ICLR 2027 target) — the first paper to apply stigmergic dynamics to LLM knowledge graph retrieval.

7. LadybugDB Schema (Key Tables)

Note: LadybugDB is our active fork of Kùzu, optimized for agent memory workloads — including per-agent database isolation, pheromone column support, and LRU palace pool management. Cypher syntax and file format remain compatible.

Each agent gets an isolated LadybugDB database at /data/palaces/{agent_id}/palace.kuzu on GCP Filestore. Each shared space (Builder+) gets its own dedicated LadybugDB database.

create_nodes.cypher

// Episodic memory: time-stamped events, verbatim storage
CREATE NODE TABLE EpisodeNode (
    id             STRING    PRIMARY KEY,   // UUID v4: "ep_550e8400-..."
    agent_id       STRING    NOT NULL,
    content        STRING    NOT NULL,      // verbatim text, no summarization
    embedding      FLOAT[1536],             // dense embedding (text-embedding-3-small), read-path only
    valid_at       TIMESTAMP NOT NULL,      // when episode occurred (temporal)
    invalid_at     TIMESTAMP,               // when superseded (NULL = still valid)
    session_id     STRING,
    pheromone_w    FLOAT     DEFAULT 1.0,   // stored weight (decayed at query time)
    pheromone_tau  FLOAT     DEFAULT 86400.0, // decay time constant τ in seconds (24 h)
    last_accessed  TIMESTAMP                // for lazy decay computation
);

// Semantic memory: knowledge triples
CREATE NODE TABLE FactNode (
    id             STRING    PRIMARY KEY,
    agent_id       STRING    NOT NULL,
    subject        STRING    NOT NULL,
    predicate      STRING    NOT NULL,
    object_        STRING    NOT NULL,
    valid_at       TIMESTAMP NOT NULL,
    invalid_at     TIMESTAMP,               // temporal validity
    confidence     FLOAT     DEFAULT 1.0,
    pheromone_w    FLOAT     DEFAULT 1.0,
    pheromone_tau  FLOAT     DEFAULT 604800.0  // decay time constant τ in seconds (7 days)
);

// Graph edges carry pheromone weight for A* traversal
CREATE REL TABLE FACT_RELATES_FACT (
    FROM FactNode TO FactNode,
    relation_type  STRING NOT NULL,
    pheromone_w    FLOAT  DEFAULT 1.0       // edge-level traversal weight
);

Full schema: github.com/fetchai/agentverse-memory

8. Performance Targets

Phase 1 (in-memory am-local) verified 2026-05-12. Phase 2 targets (Cloud Run + Filestore) are design targets, not yet measured.

Metric	Phase 1 (am-local)	Phase 2 Target	Comparison
Write op (engine, no LLM)	0.035 ms ✅	<5 ms	Mem0/Zep: LLM-extraction call on every write
Query op (engine)	0.069 ms ✅	<20 ms	Mem0 published: ~2.59 s p95 query (graph variant)
Graph traversal (A*)	N/A (in-memory)	<50 ms	Mem0: graph behind $249/mo
Max agents (Explorer)	—	3	Mem0: 3 (same)
Ops/mo (Explorer)	—	50K	Mem0: 100 ops (!!)
LLM calls at write	0 ✅	0	Mem0: 1 LLM call/write

Phase 1 latencies are in-process engine micro-benchmarks measured on am-local (2.3 MB stripped binary; 5 episodes ingested, 2 queries; 35/35 tests pass; ablation modes: full pheromone, --no-pheromone, --no-astar). They are not deployed end-to-end latencies. On the current Cloud Run deployment, warm writes take ~150 ms, TF-IDF search ~150 ms, and hybrid search ~350–400 ms.

Explore Further

PheromGraph Research

The paper formalizing pheromone retrieval

API Reference

All 35 MCP tools

MCP Integration

Connect Claude Desktop, Cursor

Pricing

Graph at every tier including free