
The Ultimate Guide to OpenClaw Memory



Author: @lijiuer92 | Source: X/Twitter | Date: 2026-02-22



Every time your OpenClaw agent loses its memory, it burns your money and kills your productivity.

You're afraid to even restart it.

I've reviewed more than 10 agent-memory research papers and 6 open-source community projects totaling 77K GitHub stars to break down every layer of OpenClaw's memory pain points: from current state to solutions, from academia to engineering.

Part 1: The Brutal Reality — Your Agent Has Goldfish Memory

Let's start with a number: 45 hours.

GitHub Issue #5429 reporter EmpireCreator lost 45 hours of accumulated agent context: skill configurations, integration parameters, task priorities. The cause was a silent compaction that wiped all conversation history — no warning, no recovery option.

This isn't an isolated case.

Issue #2624 reports agents randomly resetting, forgetting conversations from just 2 messages ago. Issue #8723 reports a memory flush triggering an infinite loop, locking the agent for 72 minutes.

What does OpenClaw's current memory architecture look like? In one sentence: Markdown files + vector search.

Memory is stored in Markdown files under the ~/.openclaw/workspace/ directory.

Daily Logs are short-term journals. MEMORY.md is long-term memory. SOUL.md defines personality. Retrieval uses vector embeddings + BM25 hybrid search.

A Medium blogger captured this design perfectly: "Deliberately uncool — treating memory as Markdown files and retrieval as tool calls."

Where's the problem? Three words: flat, undifferentiated, and passive.

All memories have equal weight — a casual chat from a year ago is treated the same as a major decision from yesterday. Forgetting mechanism? None — you can only delete manually. Auto-organization? Entirely manual curation. Retrieval only looks at semantic similarity, doesn't evaluate importance, and can't express relationships like "A is B's friend." Data stays data forever and never becomes knowledge.

The community put it most bluntly: "Everyone complains their OpenClaw has amnesia."

Part 2: What OpenClaw Is Doing Officially — QMD Backend and Hybrid Search

The official team hasn't been idle.

Release timeline for January–February 2026:

v2026.1.12 (Jan 13): Vector search infrastructure launched — SQLite indexing + chunking + lazy sync + file watching, supporting local and remote embeddings. This is the foundation of the entire memory search system.

v2026.1.29 (Jan 29): L2 normalization fix. Local embedding vectors weren't normalized, causing cosine similarity calculations to be distorted — meaning previous semantic search accuracy was compromised. Additional index path support also added.

v2026.2.2 (Feb 4): QMD memory backend merged (PR #3160) — the most important architectural upgrade. 30 commits, adding QMD backend support for BM25 + vector + Reranking three-way hybrid search.

What does QMD do? It replaces the built-in SQLite indexer with a local search sidecar process. Each agent/config combination caches a sidecar, supports multiple named collections, and session transcripts can be exported and indexed into dedicated collections. For privacy, session data is anonymized before indexing. QMD automatically falls back to SQLite when unavailable.

Known issues: Query time on CPU-only systems is ~3 minutes 40 seconds, exceeding the 12-second timeout (Issue #8786). The paths config doesn't take effect (Issue #8750). And the fallback is silent — users don't know QMD isn't working.
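The silent-fallback complaint is fixable with a one-line change at the point where the backend is chosen. A minimal sketch, assuming a hypothetical health check and backend names (this is not OpenClaw's actual API):

```python
import logging

logger = logging.getLogger("memory.backend")

def pick_search_backend(qmd_available: bool) -> str:
    """Choose QMD when reachable, else fall back to SQLite -- loudly.

    `qmd_available` stands in for a real sidecar health check; all
    names here are hypothetical illustrations.
    """
    if qmd_available:
        return "qmd"
    # The issue described above: the fallback happens silently.
    # Emitting a warning makes the degraded mode visible to the user.
    logger.warning("QMD sidecar unavailable; falling back to SQLite index")
    return "sqlite"
```

The point is not the logging call itself but the contract: a degraded mode should always be observable.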

PR #6060 attempts to solve the "discoverability" problem: OpenClaw's memory system has powerful features that users can't find. The proposal adds a "memory optimization" step to the onboarding guide, exposing four hidden features that are off by default: pre-compaction memory flush, hybrid search, embedding cache, and session transcript search.

The core problem with the official direction: these are all "retrieval layer" optimizations. Search is more accurate, faster, more discoverable.

But the six fundamental gaps in memory architecture — forgetting, importance, graph, reflection, temporal reasoning, promotion — remain completely unaddressed.

Part 3: How the Community Is Helping Itself — Five DIY Solutions

The community didn't wait for the official team. At least 7 third-party memory projects appeared in January–February 2026.

1⃣ Mem0: The most well-known memory layer SDK. Auto-Recall searches relevant memories before each response and injects them into context. Auto-Capture extracts facts after responses and stores them. Session + User dual-layer memory. Claims 91% latency improvement and 90% token savings.

2⃣ Hindsight: Local long-term memory. Core insight: traditional systems give agents a search_memory tool, but models don't always use it. Auto-Recall automatic injection solves this. Fully local, PostgreSQL backend, supports multi-instance sharing.

3⃣ MoltBrain (365 Stars): SQLite + ChromaDB semantic search, lifecycle hooks for automatic context capture, Web UI for timeline viewing.

4⃣ NOVA Memory System: PostgreSQL structured memory, Claude API parses natural language into JSON, 8 database tables (entities, relationships, locations, projects, events, lessons, preferences).

5⃣ Penfield Skill: Hybrid search BM25 + vector + graph — the community is already doing three-way hybrid search.

Also worth a mention: Memory Template (Git-backed), SuperMemory (very early stage), and MemoryPlugin (a Chrome extension for cross-platform sync).
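The pattern shared by Mem0 and Hindsight is the same loop: recall relevant memories before responding, inject them into context, then capture new facts afterwards. A toy sketch of that loop (word-overlap ranking stands in for real embedding search; all names are illustrative):

```python
def auto_recall(store: list[str], query: str, k: int = 3) -> list[str]:
    """Rank stored facts by word overlap with the query, return top-k."""
    qwords = set(query.lower().split())
    scored = sorted(store, key=lambda m: -len(qwords & set(m.lower().split())))
    return scored[:k]

def auto_capture(store: list[str], fact: str) -> None:
    """Store a fact after the response, skipping exact duplicates."""
    if fact not in store:
        store.append(fact)

store: list[str] = []
auto_capture(store, "user prefers dark mode")
auto_capture(store, "user prefers dark mode")  # exact duplicate -> no-op
recalled = auto_recall(store, "what theme does the user prefer?")
```

Real implementations replace the overlap ranking with vector similarity and the dedup check with an LLM or embedding-distance judgment, but the recall-inject-capture cycle is the same.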

What directions do community "best practices" validate?

  1. Daily Log → MEMORY.md promotion pattern
  2. Heartbeat repurposed as memory consolidation trigger
  3. 70/30 hybrid search weights (vector 70% + keyword 30%)
  4. Session Transcript indexing
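The 70/30 weighting in point 3 is just a linear blend of two normalized scores. A minimal sketch, assuming both scores are already normalized to [0, 1] (the 70/30 split is the community-reported value, not an official OpenClaw default):

```python
def hybrid_score(vector_score: float, keyword_score: float,
                 vector_weight: float = 0.7) -> float:
    """Blend a vector-similarity score with a BM25/keyword score.

    Both inputs are assumed pre-normalized to [0, 1]; vector_weight=0.7
    gives the community's 70/30 split.
    """
    return vector_weight * vector_score + (1 - vector_weight) * keyword_score

# A memory that is semantically close but keyword-poor still ranks well:
s = hybrid_score(0.9, 0.1)  # 0.7 * 0.9 + 0.3 * 0.1 = 0.66
```

Note that this only works if the two score distributions are actually comparable; skipping normalization is one source of the ranking distortion described later in Part 6.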

But there are six blind spots the community hasn't touched at all: forgetting/decay mechanisms, importance scoring, knowledge graphs, automatic reflection/consolidation, temporal reasoning, and memory promotion.

One-line summary: the community is using manual operations to compensate for architectural deficiencies. The direction is right, but everything stays at the manual operation level.

Part 4: Academic Explosion — 10+ Papers in February 2026

In February 2026, agent memory suddenly became the main battleground in academia. Over 10 papers appeared on arXiv in a single month, including xMemory accepted at ICML 2026 and A-MEM from NeurIPS 2025. A survey paper with 59 authors systematically reviewed the entire field.

Key papers:

  • xMemory (ICML 2026): Decouples memory into semantic components organized into hierarchical structures. Uses Sparsity-Semantics objectives to guide memory splitting and merging.
  • A-MEM (NeurIPS 2025): Manages agent memory using the Zettelkasten method. When new memories are added, generates structured notes containing contextual descriptions, keywords, and tags.
  • InfMem: Implements System-2 style active memory control through a PreThink-Retrieve-Write protocol. 10–12% accuracy improvement on QA benchmarks from 32K to 1M tokens.
  • TAME: Discovers "Agent Memory Misevolution" — memory can accumulate "toxic shortcuts." Proposes an Executor/Evaluator dual-memory framework.
  • ALMA: Meta-learning framework that lets AI automatically discover memory designs. Learned designs outperform hand-crafted baselines by 6–13%.
  • MemSkill: Reconstructs memory operations as learnable "memory skills."
  • BudgetMem: Runtime memory framework using reinforcement learning to train a lightweight router for budget-tier routing.
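InfMem's PreThink-Retrieve-Write protocol, as the article describes it, splits memory access into an explicit decide/fetch/persist cycle rather than hoping the model calls a search tool. A loose toy sketch of that shape (nothing here is the paper's code; keyword extraction stands in for the model's "pre-think" step):

```python
def prethink(question: str) -> list[str]:
    """Decide, before answering, which memory keys to look up."""
    return [w for w in question.lower().split() if len(w) > 3]

def retrieve(memory: dict[str, str], keys: list[str]) -> dict[str, str]:
    """Fetch only the entries PreThink asked for."""
    return {k: memory[k] for k in keys if k in memory}

def write(memory: dict[str, str], key: str, value: str) -> None:
    """Persist a new fact produced while answering."""
    memory[key] = value

memory = {"deadline": "Friday"}
facts = retrieve(memory, prethink("when is the deadline due?"))
write(memory, "status", "answered deadline question")
```

The contrast with passive RAG is that retrieval is gated by a deliberate planning step, which is what "System-2 style" refers to.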

The 59-author survey paper provides a three-dimensional taxonomy: memory Substrate, cognitive Mechanism, and memory Subject.

Two key warnings from industry:

  1. Serial Collapse (Moonshot AI Kimi K2.5): Agents degrade to not using memory at all
  2. Memory Misevolution (TAME): Toxic shortcuts accumulate during normal iteration

Part 5: The Open-Source Memory Ecosystem — A Full Scan of 6 Projects

Six agent memory open-source projects, totaling 77K+ stars, representing three memory philosophies:

  1. State layer first — mem0 (46.6K), Memori (12K): memory = state management
  2. Knowledge layer first — cognee (11.7K), MemOS (4.9K): memory = structured knowledge
  3. Learning layer first — Hindsight (1.3K): memory = learning process (retain/recall/reflect three-operation loop)

No single project covers all three layers simultaneously.

Part 6: Lessons from 200+ Issues — Pitfalls Others Have Fallen Into

Five common problems across projects:

  1. Silent failure (present in 6/6 projects): features don't work but don't tell the user
  2. Memory deduplication: duplicate memories trigger DELETE instead of NOOP; LLM interprets duplicates as "contradictions" causing incorrect deletions
  3. Unreliable LLM judgment: first-person reference loss, unstable JSON formatting, prompt language bias
  4. Database connection/migration issues: SQLite connections not closed, Docker migration failures, telemetry memory leaks
  5. Search ranking distortion: cross-collection normalization causes ranking distortion; retrieval has no temporal dimension
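Pitfall 2 in the list above has a cheap structural fix: short-circuit exact duplicates before any LLM judgment is invoked, so a duplicate can never be misread as a contradiction and routed to DELETE. A hypothetical guard, not any project's actual API:

```python
def classify_memory_op(existing: list[str], candidate: str) -> str:
    """Route an incoming memory to ADD or NOOP before any LLM judgment.

    Exact / case-insensitive duplicates are resolved deterministically;
    only genuinely novel or ambiguous candidates should ever reach the
    (unreliable) LLM classifier.
    """
    norm = candidate.strip().lower()
    if any(norm == m.strip().lower() for m in existing):
        return "NOOP"  # duplicate: never DELETE, never re-add
    return "ADD"
```

Near-duplicates (paraphrases) still need an embedding-distance or LLM check, but pushing the easy cases out of the LLM's path removes one whole failure mode.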

Part 7: What Game AI Teaches Us

Dwarf Fortress's three-layer memory architecture:

  • Short-Term Memory (STM) — 8-slot circular buffer queue; new memories compete by emotional intensity
  • Long-Term Memory (LTM) — STM memories that stay long enough without being displaced attempt promotion
  • Core Memory — qualitative change; permanently modifies character personality parameters
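The STM/LTM mechanics above can be sketched in a few lines: a fixed-size buffer where new memories displace the emotionally weakest, and survivors are promoted. The slot count matches the article's description of Dwarf Fortress; the promotion threshold is made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    intensity: float  # emotional intensity drives slot competition
    age: int = 0

def remember(stm: list[Memory], ltm: list[Memory], new: Memory,
             slots: int = 8, promote_age: int = 5) -> None:
    """Insert into the fixed-size STM; the weakest memory is displaced.

    Memories that survive `promote_age` insertions are promoted to LTM.
    A toy version of the STM -> LTM -> Core pipeline; promote_age is a
    hypothetical number, and the Core tier is omitted.
    """
    for m in stm:
        m.age += 1
    for m in [m for m in stm if m.age >= promote_age]:  # promote survivors
        stm.remove(m)
        ltm.append(m)
    stm.append(new)
    if len(stm) > slots:
        stm.remove(min(stm, key=lambda m: m.intensity))  # displaced
```

The interesting property is that the buffer creates scarcity: a memory only becomes long-term by out-competing its peers for long enough, which is exactly the importance signal flat Markdown storage lacks.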

Stanford Generative Agents' three-dimensional retrieval:

  • Retrieval score = Recency × Importance × Relevance
  • Reflection mechanism: take the most recent 100 trivial memories → LLM distills 3 high-level insights
  • Long-term conversation fact recall improved from 41% to 87%
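The three-dimensional score is easy to state in code. A sketch following the article's multiplicative summary (the original paper actually combines the three terms as a weighted sum of normalized scores; 0.995 is the exponential decay factor it reports, and importance/relevance are assumed pre-normalized to [0, 1] here):

```python
def retrieval_score(hours_since_access: float, importance: float,
                    relevance: float, decay: float = 0.995) -> float:
    """Recency x Importance x Relevance, after Generative Agents.

    Recency decays exponentially with time since last access; the
    product form follows this article's summary, while the paper itself
    uses a weighted sum of the normalized terms.
    """
    recency = decay ** hours_since_access
    return recency * importance * relevance
```

Either combination fixes the core defect of similarity-only retrieval: a trivial but recent memory and an important but older one can both outrank a merely similar match.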

The Sims 4's emotional solidification: short-term emotions that repeatedly appear → converted into permanent traits

Nemesis System's event-driven evolution: event tags → trigger parameter mutations → propagate through social relationship networks

Part 8: Two Types of Memory — User Memory vs Agent Memory

ByteDance OpenViking project's classification system:

  • 6 memory types: profile, preference, entity, event, case, pattern
  • L0/L1/L2 three-tier content model: L0 summary ~100 tokens for indexing, L1 overview ~500 tokens for structured presentation, L2 full text
  • Merge strategy: profiles always merge; preferences/entities/patterns support merging; events/cases cannot be merged
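The classification above maps naturally onto a typed record plus a merge rule. A sketch of that shape, assuming illustrative field names and the article's token budgets (this is not OpenViking's actual schema, and it simplifies "profiles always merge" down to "profiles are mergeable"):

```python
from dataclasses import dataclass

MERGEABLE = {"profile", "preference", "entity", "pattern"}
IMMUTABLE = {"event", "case"}  # point-in-time records: never merged

@dataclass
class MemoryRecord:
    kind: str          # one of the 6 memory types
    l0_summary: str    # ~100 tokens, used for indexing
    l1_overview: str   # ~500 tokens, structured presentation
    l2_full_text: str  # complete content

def can_merge(a: MemoryRecord, b: MemoryRecord) -> bool:
    """Profiles/preferences/entities/patterns merge; events/cases don't."""
    return a.kind == b.kind and a.kind in MERGEABLE
```

The L0/L1/L2 split is the practical trick: retrieval ranks over cheap L0 summaries and only pays for L2 full text once a record is actually selected.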

Part 9: Why Memory Is Core Infrastructure

Whoever solves the memory problem first wins the 24/7 Agent war.

Three forces in February 2026 — academic paper density, open-source project explosion, official architectural upgrades — all point to one signal: AI memory is transitioning from "nice to have" to core infrastructure.

Part 10: memX and ePro

Based on the above research, the author has built two systems: memX (User Memory) and ePro (Agent Memory), both live and iterating.

References

  • [1] Hu et al., "xMemory: Beyond RAG for Agent Memory," ICML 2026. arXiv:2602.02007
  • [2] Xu et al., "A-MEM: Agentic Memory for LLM Agents," NeurIPS 2025. arXiv:2502.12110
  • [3] Huang et al., "Rethinking Memory Mechanisms of Foundation Agents," 2026. arXiv:2602.06052
  • [4] Wang et al., "InfMem: Learning System-2 Memory Control," 2026. arXiv:2602.02704
  • [5] Cheng et al., "TAME: Trustworthy Agent Memory Evolution," 2026. arXiv:2602.03224
  • [6] "ALMA: Automated Meta-Learning of Memory Designs," 2026. arXiv:2602.07755
  • [7] Zhang et al., "MemSkill: Learning and Evolving Memory Skills," 2026. arXiv:2602.02474
  • [8] Zhang et al., "BudgetMem: Budget-Tier Routing for Runtime Agent Memory," 2026. arXiv:2602.06025
  • [9] Kimi Team, "Kimi K2.5: Scaling Reinforcement Learning with LLMs," 2026. arXiv:2602.02276
  • [10] Park et al., "Generative Agents: Interactive Simulacra of Human Behavior," 2023. arXiv:2304.03442

Co-created by @lijiuer92 with Claude Max, Manus, and Google Gemini. Primary contributor: @lijiuer92. Data based on snapshot from February 23, 2026.
