This is a follow-on to the source-level analysis of eight agent memory architectures. OpenMemory (CaviraOSS, 3,300+ stars — not Mem0’s product of the same name) is a TypeScript memory engine that attempts things no other system in that analysis does: multi-tier decay with progressive forgetting, Hebbian coactivation learning, temporal fact supersession, and a five-sector cognitive architecture. All without requiring a single LLM call. We pulled the source and traced the code paths. The ideas are genuinely interesting. The integration between them is where it falls apart.

The forgetting system doesn’t work

The headline feature is a three-layer decay engine. Memories age through hot/warm/cold tiers with sector-specific rates (emotional fades in ~35 days, reflective persists ~693 days). As memories decay, the system progressively compresses their embedding vectors via mean-pooling — shrinking a 1536-dim vector to, say, 768 dims. When a memory decays further, it gets fingerprinted down to 32 dimensions. The idea: old memories lose fidelity but stay findable, and get rebuilt at full resolution if accessed again.
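The tiering mechanic can be sketched in a few lines. This is an illustrative model, not the source: the decay curve shape (Ebbinghaus-style exponential), the tier thresholds, and the function names here are assumptions; only the ~35/~693-day rates and the 1536/768/32 dimension ladder come from the article above.

```typescript
// Illustrative sketch of sector-specific decay. Thresholds and the
// exponential form are assumptions for demonstration, not decay.ts values.
type Tier = "hot" | "warm" | "cold";

// Ebbinghaus-style exponential decay: strength = exp(-ageDays / rate).
function decayScore(ageDays: number, sectorRateDays: number): number {
  return Math.exp(-ageDays / sectorRateDays);
}

function tierFor(score: number): Tier {
  if (score > 0.6) return "hot";   // full 1536-dim vector
  if (score > 0.2) return "warm";  // mean-pooled to 768 dims
  return "cold";                   // fingerprinted to 32 dims
}

const EMOTIONAL_RATE = 35;   // ~35-day fade
const REFLECTIVE_RATE = 693; // ~693-day persistence

// After 90 days an emotional memory has gone cold...
console.log(tierFor(decayScore(90, EMOTIONAL_RATE)));  // "cold"
// ...while a reflective one is still hot.
console.log(tierFor(decayScore(90, REFLECTIVE_RATE))); // "hot"
```

The point of the sketch is the ratio: with a 20x difference in decay rate, the same 90-day-old memory lands in opposite tiers depending purely on which sector the classifier assigned it to.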

The problem: cosineSimilarity (embed.ts:656) checks if (a.length !== b.length) return 0. Query vectors are always 1536-dim. A compressed 768-dim memory scores zero. It’s invisible to search. And the reconsolidation path that’s supposed to rescue it only triggers for vectors ≤ 64 dimensions — so the mean-pooled middle tier (the graduated part of “progressive”) is a dead zone where memories can’t be found AND can’t be rebuilt.
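The interaction is easy to reproduce. The snippet below paraphrases the guard described above and a pair-wise mean-pool (function names and details are illustrative, not copied from embed.ts); the outcome is the bug in miniature:

```typescript
// Paraphrase of the guard described in the text: mismatched lengths
// short-circuit to 0 instead of raising an error.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) return 0; // compressed vectors land here
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Mean-pool adjacent pairs: 1536 dims -> 768 dims (the warm-tier compression).
function meanPoolHalve(v: number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i < v.length; i += 2) out.push((v[i] + v[i + 1]) / 2);
  return out;
}

const query = Array.from({ length: 1536 }, () => Math.random());
const memory = meanPoolHalve(query); // 768 dims after warm-tier decay
console.log(cosineSimilarity(query, memory)); // 0 — the memory is invisible to search
```

Even a memory that is a compressed copy of the query itself scores zero, because the comparison never gets past the length check.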

There’s also a deeper conceptual issue. Mean-pooling adjacent embedding dimensions treats them like image pixels — as if nearby values are related. In neural embeddings (or even in their synthetic TF-IDF hashed embeddings), dimensions are arbitrary learned features with no spatial locality. Averaging dimensions 500 and 501 doesn’t produce a “blurrier” version of the meaning. It produces noise. The right technique for embedding compression would be something like PCA, Matryoshka truncation, or random projection — methods that actually preserve distance relationships.
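To make the alternative concrete, here is a minimal random-projection sketch (Johnson–Lindenstrauss style). Nothing in it comes from OpenMemory; the seeded PRNG, matrix construction, and dimensions are all assumptions. The key property is that a fixed projection applied to both query and memory keeps them comparable and approximately preserves pairwise angles:

```typescript
// Random projection as an alternative to mean-pooling: a fixed random matrix
// maps 1536 -> 256 dims. Projecting BOTH query and memory with the SAME
// matrix keeps lengths equal and roughly preserves cosine similarity.
// mulberry32 is a small seeded PRNG so the matrix is reproducible.
function mulberry32(seed: number): () => number {
  return () => {
    seed |= 0; seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function projectionMatrix(inDim: number, outDim: number, seed = 42): number[][] {
  const rand = mulberry32(seed);
  // Roughly Gaussian entries via a sum of uniforms, scaled by 1/sqrt(outDim).
  return Array.from({ length: outDim }, () =>
    Array.from({ length: inDim }, () => (rand() + rand() + rand() - 1.5) / Math.sqrt(outDim))
  );
}

function project(v: number[], m: number[][]): number[] {
  return m.map(row => row.reduce((sum, w, i) => sum + w * v[i], 0));
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Two correlated 1536-dim vectors keep roughly the same similarity at 256 dims.
const rand = mulberry32(7);
const a = Array.from({ length: 1536 }, () => rand() - 0.5);
const b = a.map(x => x + 0.3 * (rand() - 0.5)); // noisy copy of a
const M = projectionMatrix(1536, 256);
console.log(cosine(a, b), cosine(project(a, M), project(b, M)));
```

Unlike mean-pooling, this makes no assumption about adjacent dimensions being related: every output dimension is a weighted mix of all input dimensions, which is what preserves distances.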

Ironically, the code does compress the text content alongside the vector (compress_summary in decay.ts): full text → extractive summary → top keywords. That’s the right idea — degrade the content, then re-embed the degraded content at full dimensionality so it’s still findable. But instead it degrades both the content and the index independently, breaking the index.
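The fix the paragraph describes can be sketched directly. The embedder here is a stand-in hashed bag-of-words at full dimensionality, and the "summary"/"keyword" steps are crude truncations; none of this mirrors the actual compress_summary implementation. The point is the invariant: whatever happens to the text, the vector stays 1536-dim.

```typescript
// Sketch of the alternative: degrade the CONTENT, then re-embed the degraded
// text at FULL dimensionality so cosine search still works. embedText is a
// toy hashed bag-of-words, not OpenMemory's embedder.
const DIM = 1536;

function embedText(text: string): number[] {
  const v = new Array(DIM).fill(0);
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (const c of word) h = (h * 31 + c.charCodeAt(0)) >>> 0;
    v[h % DIM] += 1;
  }
  return v;
}

// Decay degrades the text, not the vector: full -> "summary" -> "keywords".
function degradedEmbedding(fullText: string, tier: "hot" | "warm" | "cold"): number[] {
  const words = fullText.split(/\s+/);
  const text =
    tier === "hot"  ? fullText :
    tier === "warm" ? words.slice(0, Math.ceil(words.length / 2)).join(" ") // crude "summary"
                    : words.slice(0, 5).join(" ");                          // crude "keywords"
  return embedText(text); // always DIM-dimensional, so the index stays intact
}

console.log(degradedEmbedding("the quick brown fox jumps over the lazy dog", "cold").length); // 1536
```

Storage shrinks because the text shrinks, but every tier remains searchable against a full-resolution query vector.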

Hebbian learning as a rarely-activated tiebreaker

When memories co-occur in query results, the system strengthens waypoint connections between them using a Hebbian rule modulated by temporal proximity: new_weight = min(1, current + η · (1 - current) · exp(-time_diff / τ)). The graph self-organizes from retrieval patterns. It’s a genuinely interesting mechanic.
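The update rule translates directly into code. The formula is from the article above; the η and τ values below are placeholders, since the source's actual constants aren't shown here:

```typescript
// The coactivation rule new_weight = min(1, w + eta*(1-w)*exp(-dt/tau)),
// as a standalone function. eta and tau values are assumed, not the source's.
function hebbianUpdate(
  current: number,      // existing waypoint weight in [0, 1]
  timeDiffMs: number,   // temporal gap between the two co-occurring memories
  eta = 0.1,            // learning rate (assumed value)
  tau = 60_000          // temporal-proximity decay constant (assumed value)
): number {
  return Math.min(1, current + eta * (1 - current) * Math.exp(-timeDiffMs / tau));
}

// Near-simultaneous co-occurrence strengthens the link much more than a
// distant one, and the (1 - current) factor makes weights saturate toward 1.
console.log(hebbianUpdate(0.5, 0));        // 0.55
console.log(hebbianUpdate(0.5, 600_000));  // ~0.5000023 — ten tau apart, barely moves
```

The (1 − current) term is what keeps repeated reinforcement from blowing past 1: the closer a weight is to saturation, the smaller each additional boost.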

In practice: waypoint weights feed into the composite score at 15% weight, only when vector search confidence is below 0.55 (otherwise expansion is skipped entirely). And the waypoints table has a primary key of (src_id, user_id) instead of (src_id, dst_id) — so each memory can only maintain one outbound link. Every new learned association overwrites the last. The graph never develops multi-link structure. It’s a lot of machinery for a single-link tiebreaker that rarely activates.
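The schema consequence is easy to model with two in-memory tables that differ only in their key (an illustrative model of primary-key semantics, not the actual SQL):

```typescript
// Minimal model of the schema issue: a primary key means "same key replaces
// the row". Keyed on (src_id, user_id), a second learned association
// overwrites the first; keyed on (src_id, dst_id), links accumulate.
type Waypoint = { src: string; dst: string; user: string; weight: number };

function upsert(table: Map<string, Waypoint>, keyOf: (w: Waypoint) => string, w: Waypoint) {
  table.set(keyOf(w), w); // primary-key semantics
}

const bySrcUser = new Map<string, Waypoint>(); // current schema: (src_id, user_id)
const bySrcDst = new Map<string, Waypoint>();  // fixed schema: (src_id, dst_id)

// Memory m1 co-occurs with m2, then m3, then m4 across three queries.
for (const dst of ["m2", "m3", "m4"]) {
  const w = { src: "m1", dst, user: "u1", weight: 0.3 };
  upsert(bySrcUser, x => `${x.src}|${x.user}`, w);
  upsert(bySrcDst, x => `${x.src}|${x.dst}`, w);
}

console.log(bySrcUser.size); // 1 — only the last link (m1 -> m4) survives
console.log(bySrcDst.size);  // 3 — m1 keeps links to m2, m3, and m4
```

Under the current key, all the Hebbian weight accumulated on m1→m2 is destroyed the moment m1 co-occurs with anything else, which is why the graph never develops structure.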

35 regex patterns deciding everything

The five-sector architecture (episodic, semantic, procedural, emotional, reflective) drives differential decay, cross-sector resonance scoring, and multi-vector fusion. The classification feeding all of it: about six hardcoded regex patterns per sector. “I hate how the algorithm forgets data” classifies as emotional (matches “hate”). “I think we should use Postgres” classifies as reflective (matches “think”) and gets near-permanent 693-day storage. No matches at all? Default to semantic, confidence 0.2.
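A toy reconstruction shows why keyword regexes can't carry this weight. The patterns and the 0.8 match confidence below are invented for illustration (only the semantic/0.2 fallthrough and the two example sentences come from the analysis above):

```typescript
// Toy reconstruction of first-match regex sector classification.
// Patterns and match confidence are illustrative, not the source's tables.
const SECTOR_PATTERNS: Record<string, RegExp[]> = {
  emotional:  [/\bhate\b/i, /\blove\b/i, /\bfrustrat/i],
  reflective: [/\bthink\b/i, /\brealize\b/i, /\bshould have\b/i],
  procedural: [/\bhow to\b/i, /\bstep\b/i],
  episodic:   [/\byesterday\b/i, /\blast week\b/i],
};

function classify(text: string): { sector: string; confidence: number } {
  for (const [sector, patterns] of Object.entries(SECTOR_PATTERNS)) {
    if (patterns.some(p => p.test(text))) return { sector, confidence: 0.8 };
  }
  return { sector: "semantic", confidence: 0.2 }; // the default fallthrough
}

// Keyword matching has no notion of context, so topic words hijack the sector:
console.log(classify("I hate how the algorithm forgets data").sector); // "emotional"
console.log(classify("I think we should use Postgres").sector);        // "reflective"
```

Because the sector then selects the decay rate, a single stray "think" is the difference between a memory that fades in weeks and one that persists for nearly two years.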

What reads as genuinely good

The temporal knowledge graph is clean: SPO triples with validity windows, automatic fact supersession when contradicting facts are inserted, point-in-time queries, and a compare_time_points() that diffs knowledge across dates. It’s wired into the MCP server for unified contextual + temporal queries. The gap is that it runs parallel to the HSG engine rather than feeding into retrieval scoring.
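The supersession mechanic is worth sketching, since it's the part other systems in this space usually lack. This is a from-scratch illustration of the pattern described above (field and method names are mine, not the source's): inserting a contradicting fact closes the old fact's validity window rather than deleting it, so point-in-time queries still work.

```typescript
// Sketch of SPO triples with validity windows and automatic supersession.
type Triple = {
  subject: string; predicate: string; object: string;
  validFrom: number; validTo: number | null; // null = still current
};

class TemporalStore {
  private triples: Triple[] = [];

  insert(subject: string, predicate: string, object: string, at: number) {
    // Supersede any currently-valid fact with the same subject + predicate.
    for (const t of this.triples) {
      if (t.subject === subject && t.predicate === predicate && t.validTo === null) {
        t.validTo = at;
      }
    }
    this.triples.push({ subject, predicate, object, validFrom: at, validTo: null });
  }

  // Point-in-time query: what did we believe at time `at`?
  at(subject: string, predicate: string, at: number): string | undefined {
    return this.triples.find(t =>
      t.subject === subject && t.predicate === predicate &&
      t.validFrom <= at && (t.validTo === null || at < t.validTo)
    )?.object;
  }
}

const store = new TemporalStore();
store.insert("project", "uses_db", "SQLite", 100);
store.insert("project", "uses_db", "Postgres", 200); // supersedes the SQLite fact

console.log(store.at("project", "uses_db", 150)); // "SQLite"
console.log(store.at("project", "uses_db", 250)); // "Postgres"
```

A compare_time_points-style diff falls out of this for free: query every (subject, predicate) at two timestamps and report the pairs whose objects differ.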

The VS Code extension tracks file diffs on save (not just chat — actual code changes), auto-configures Cursor, Claude Desktop, Windsurf, Copilot, and Codex to use OpenMemory on install, and has a clean product flow. The connector system (GitHub, Notion, Google Drive, OneDrive, web crawler) has proper rate limiting, retry logic, and error hierarchies. The migration tools import from Mem0, Zep, and Supermemory with provenance tracking. These are production-quality components.

The pattern

This reads as an ambitious solo developer (or very small team) who’s strong on theory and building fast. The ideas are sourced from real literature — Ebbinghaus forgetting curves, Hebbian learning, cognitive sector models. Each component works in isolation. The bugs are all integration-level: compression produces vectors that search can’t compare, coactivation writes to a table whose schema prevents multi-link graphs, temporal facts live in a store that retrieval doesn’t consult. These are the kind of issues you’d catch with end-to-end testing that exercises the interactions between subsystems over time — exactly the testing that’s hardest to do when you’re moving fast.

It’s clearly targeting individual developer use rather than production infrastructure, and that’s fine — the VS Code extension, the zero-config SQLite default, and the auto-linking to coding tools all point that direction. The decay rates, the temporal graph, and the Hebbian coactivation are ideas worth refining. A lightweight local classifier replacing the regexes, proper dimensionality reduction on the vectors, and a fixed waypoint primary key would transform this from interesting-but-broken into something genuinely useful. The architecture is ready for it.