Cross-Session Memory: How Hermes Fixes the 'Claude Forgets' Problem
Every Claude user has lived through session amnesia. Here's why it happens, why CLAUDE.md is a partial fix, and how Hermes solves it at the runtime layer.
There is a conversation every Claude Code user has had with themselves at some point: "I already told it this last week." You open a new session, the context is empty, and you start over. Claude did not fail. The runtime you were using simply was not asked to remember. This article is about the problem, the partial fixes we have built around it, and how Hermes Agent solves it at the layer where it actually belongs.
Key Takeaways
- "Session amnesia" is a runtime problem, not a model problem. Claude itself does not remember across sessions because nothing is persisting context for it.
- Claude Code's CLAUDE.md is a partial fix: static, hand-edited, loaded every session. It works for stable facts but not for evolving knowledge.
- Vector databases solve the scale problem but introduce an auditability problem — you cannot read them the way you can read a markdown file.
- Chat transcripts preserve everything but are too noisy for reliable recall without summarization.
- Hermes puts memory at the runtime layer: markdown files, FTS5 search, agent-curated updates, loaded on demand during the session.
- The four approaches (static CLAUDE.md, vector DB, raw transcript, Hermes markdown-plus-FTS5) each optimize for different properties.
- For the underlying implementation, see the Hermes memory deep dive.
The Failure Mode
You spend an hour on Monday walking Claude Code through your codebase's peculiarities. The auth module uses a weird dual-token scheme because of a historical migration. The CI pipeline has a flaky test you always have to re-run. Your deployment script only works on a specific Node version. By the end of the session, the agent knows all of this and uses it to produce good code.
Friday you open a new session. None of it carries over. You re-explain the dual-token scheme. You re-explain the flaky test. You copy-paste the same CLAUDE.md snippets you pasted Monday.
This is not a Claude bug. It is a property of stateless request-response architectures. The model is given a context window for each call. When the session ends, that context is discarded. Unless the runtime persists something, the next session starts blank.
Why CLAUDE.md Is a Partial Fix
Claude Code addressed this with CLAUDE.md: a static markdown file auto-loaded into every session. It works, and it works well for things that do not change: coding conventions, architectural overviews, always-relevant constraints.
It is a partial fix for three reasons.
It is static. You write it by hand. When the agent learns something new on Monday, CLAUDE.md does not update unless you remember to go edit it.
It is monolithic. Everything loads every session, whether relevant or not. Large CLAUDE.md files burn context on information the current task does not need.
It does not scale to multi-project knowledge. A CLAUDE.md per repo is fine for code. Personal context ("I prefer TypeScript over JavaScript for frontend work"), cross-project context ("client X uses Bitbucket not GitHub"), and time-bound context ("the deploy pipeline is broken until Wednesday") have no natural home.
The Alternatives
Before we look at what Hermes does, it is worth being honest about the other approaches.
Vector Databases
The industry's default answer for agent memory. Embed everything, store in a vector DB, retrieve semantically at query time. Solves the scale problem cleanly — you can have gigabytes of history and still retrieve relevantly. It introduces two new problems: opacity and drift. You cannot read a vector store. When retrieval goes wrong you have no easy way to debug it. And embeddings from today's model may age poorly when you swap models six months from now.
Raw Chat Transcripts
Keep everything. Log every session verbatim. On demand, retrieve by date or keyword. Complete and auditable. Too noisy to be useful as live context: a year of transcripts includes every mistake, every retry, every interruption.
Manual Notes
The low-tech fallback. Copy useful things into a personal notes system (Obsidian, Notion, a markdown folder). Works, does not scale, creates its own maintenance burden.
How Hermes Solves It
Hermes treats memory as a runtime responsibility. The key design choices:
- Memory is a directory of markdown files under
~/.hermes/. Human-readable, diffable, portable. - SQLite FTS5 indexes everything for fast full-text retrieval with BM25 ranking.
- The agent curates its own memory. New information merges into existing sections. Outdated claims are edited in place. Files get rewritten when structure drifts.
- Retrieval is dynamic. At session start Hermes pulls the top-ranked memory files for your first message. Mid-session, it can search further on demand.
- You can edit it. When the agent remembered something wrong, you open the file and fix it.
This is the topic of the Hermes memory deep dive in architectural detail.
A Side-by-Side Comparison
| Property | Static CLAUDE.md | Vector DB | Raw Transcripts | Hermes Memory |
|---|---|---|---|---|
| Auditable by a human | Yes | No | Partial (too noisy) | Yes |
| Updates automatically | No | Yes | Yes | Yes |
| Editable by a human | Yes | No | No | Yes |
| Scales past a few KB | Poorly | Yes | Yes | Yes |
| Supports targeted deletion | Yes | Partial | Yes | Yes |
| Good for evolving facts | No | Yes | Partial | Yes |
| Good for stable conventions | Yes | Overkill | Noisy | Yes |
| Works without external services | Yes | Typically no | Yes | Yes |
The Hermes column wins on most rows because it borrows the good properties of each alternative: readability from markdown files, dynamic retrieval from vector-DB thinking, completeness from transcripts, and deliberate curation from manual notes.
What This Looks Like in Practice
Monday morning, you start a Hermes chat. You tell it about the dual-token auth scheme. You explain the flaky test. At session close the agent writes:
# projects/client-foo.md
## Auth
Uses a dual-token scheme: short-lived access tokens in the header,
refresh tokens in an HTTP-only cookie. Historical migration from the
legacy session-based system.
## CI
The `auth.integration.test.ts` test is flaky — re-run on failure
before investigating. Tracked in issue #427.
Friday morning, you start a new session. You ask it to add an endpoint that requires authentication. Hermes's retrieval step pulls projects/client-foo.md into context. The agent applies the dual-token pattern without being told again.
That is the entire point.
When to Still Use CLAUDE.md
Hermes memory does not replace CLAUDE.md. It complements it. CLAUDE.md still earns its keep for project-local conventions that every session needs: "use Tailwind, not CSS modules." "Never use force-dynamic on listing pages." Those are stable rules where always-loaded context is correct.
Use Hermes memory for the evolving layer: what you learned yesterday, what a specific client needs, why a decision was made. The two work together.
For Claude Code Users Specifically
If you stay in Claude Code, there is still value in structuring your CLAUDE.md files more thoughtfully, splitting global from project-specific, and committing them to git. But you will hit the ceiling of the static approach eventually. When you do, Hermes is the runtime-level fix: put a persistent agent on a VPS, let it accumulate memory, and use its delegate-to-Claude-Code capability to run specific coding tasks through your existing CLI.
We cover that integration pattern in spawning Claude Code as a Hermes subagent and the broader tradeoffs in Hermes vs Claude Code: when to use which.