Cross-Session Memory: How Hermes Fixes the 'Claude Forgets' Problem

There is a conversation every Claude Code user has had with themselves at some point: "I already told it this last week." You open a new session, the context is empty, and you start over. Claude did not fail. The runtime you were using simply was not asked to remember. This article is about the problem, the partial fixes we have built around it, and how Hermes Agent solves it at the layer where it actually belongs.

Key Takeaways

"Session amnesia" is a runtime problem, not a model problem. Claude itself does not remember across sessions because nothing is persisting context for it.
Claude Code's CLAUDE.md is a partial fix: static, hand-edited, loaded every session. It works for stable facts but not for evolving knowledge.
Vector databases solve the scale problem but introduce an auditability problem — you cannot read them the way you can read a markdown file.
Chat transcripts preserve everything but are too noisy for reliable recall without summarization.
Hermes puts memory at the runtime layer: markdown files, FTS5 search, agent-curated updates, retrieved at session start and on demand during the session.
The four approaches (static CLAUDE.md, vector DB, raw transcript, Hermes markdown-plus-FTS5) each optimize for different properties.
For the underlying implementation, see the Hermes memory deep dive.

The Failure Mode

You spend an hour on Monday walking Claude Code through your codebase's peculiarities. The auth module uses a weird dual-token scheme because of a historical migration. The CI pipeline has a flaky test you always have to re-run. Your deployment script only works on a specific Node version. By the end of the session, the agent knows all of this and uses it to produce good code.

On Friday you open a new session. None of it carries over. You re-explain the dual-token scheme. You re-explain the flaky test. You copy-paste the same CLAUDE.md snippets you pasted on Monday.

This is not a Claude bug. It is a property of stateless request-response architectures. The model is given a context window for each call. When the session ends, that context is discarded. Unless the runtime persists something, the next session starts blank.
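A minimal sketch of that property, under stated assumptions: `Session` and `PersistingRuntime` are hypothetical names invented for illustration, not part of Claude Code or Hermes. The first class models a session whose context dies with it; the second shows the only way the next session can start non-blank, which is a runtime that saves something at close and reloads it at open.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    """One session: the context exists only for the lifetime of this object."""

    context: list[str] = field(default_factory=list)

    def tell(self, fact: str) -> None:
        self.context.append(fact)

    def knows(self, keyword: str) -> bool:
        return any(keyword in fact for fact in self.context)


@dataclass
class PersistingRuntime:
    """Hypothetical runtime that saves facts at session close and reloads them."""

    store: list[str] = field(default_factory=list)

    def open_session(self) -> Session:
        # Seed the new session with whatever earlier sessions persisted.
        return Session(context=list(self.store))

    def close_session(self, session: Session) -> None:
        # Persist anything the store does not already hold.
        for fact in session.context:
            if fact not in self.store:
                self.store.append(fact)
```

A bare `Session()` created on Friday knows nothing Monday's session was told. A session opened through `PersistingRuntime` does, because the runtime, not the model, carried the facts across.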

Why CLAUDE.md Is a Partial Fix

Claude Code addressed this with CLAUDE.md: a static markdown file auto-loaded into every session. It works, and it works well for things that do not change: coding conventions, architectural overviews, always-relevant constraints.

It is a partial fix for three reasons.

It is static. You write it by hand. When the agent learns something new on Monday, CLAUDE.md does not update unless you remember to go edit it.

It is monolithic. Everything loads every session, whether relevant or not. Large CLAUDE.md files burn context on information the current task does not need.

It does not scale to multi-project knowledge. A CLAUDE.md per repo is fine for code. Personal context ("I prefer TypeScript over JavaScript for frontend work"), cross-project context ("client X uses Bitbucket not GitHub"), and time-bound context ("the deploy pipeline is broken until Wednesday") have no natural home.
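To make the "monolithic" cost concrete, here is a rough sketch of the alternative: split a markdown file on its `## ` headings and keep only the sections whose heading shares a word with the task. The function names are hypothetical and the keyword match is deliberately crude. Claude Code does not do this with CLAUDE.md; the point is only to show how much of an always-loaded file a given task may not need.

```python
import re


def split_sections(markdown: str) -> dict[str, str]:
    """Split a markdown document into {heading: body} on '## ' headings."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.*)", line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {heading: "\n".join(body).strip() for heading, body in sections.items()}


def relevant_sections(markdown: str, task: str) -> dict[str, str]:
    """Keep only sections whose heading shares at least one word with the task."""
    task_words = set(re.findall(r"[a-z0-9]+", task.lower()))
    picked = {}
    for heading, body in split_sections(markdown).items():
        heading_words = set(re.findall(r"[a-z0-9]+", heading.lower()))
        if task_words & heading_words:
            picked[heading] = body
    return picked
```

For a file with Auth, CI, and Deploy sections, a task like "add an auth endpoint" selects only the Auth section. A static file has no such step: all three sections load every time.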

The Alternatives

Before we look at what Hermes does, it is worth being honest about the other approaches.

Vector Databases

The industry's default answer for agent memory. Embed everything, store it in a vector DB, and retrieve semantically at query time. This solves the scale problem cleanly: you can have gigabytes of history and still retrieve the relevant pieces. It introduces two new problems, opacity and drift. You cannot read a vector store, so when retrieval goes wrong you have no easy way to debug it. And embeddings from today's model may age poorly when you swap models six months from now.
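The embed-store-retrieve loop can be sketched in a few lines. This is a toy under loud assumptions: `ToyVectorStore` is a hypothetical class, and plain word counts stand in for the learned embeddings a real vector database would store. The mechanics are the same, though: every document becomes a list of numbers, and retrieval is a similarity ranking over those numbers.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical store: word-count vectors stand in for real embeddings."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = list(documents)
        vocab = sorted({word for doc in self.documents for word in tokenize(doc)})
        self.index = {word: i for i, word in enumerate(vocab)}
        self.vectors = [self.embed(doc) for doc in self.documents]

    def embed(self, text: str) -> list[float]:
        # One dimension per known word; unknown words are ignored.
        vector = [0.0] * len(self.index)
        for word, count in Counter(tokenize(text)).items():
            if word in self.index:
                vector[self.index[word]] = float(count)
        return vector

    def search(self, query: str, k: int = 1) -> list[tuple[float, str]]:
        query_vector = self.embed(query)
        scored = [(cosine(query_vector, v), d) for v, d in zip(self.vectors, self.documents)]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Printing `store.vectors[0]` shows the opacity problem directly: what is stored is a row of floats, not a sentence you can proofread. Swapping `embed` for a different model would also change every stored vector, which is the drift problem in miniature.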

Raw Chat Transcripts

Keep everything. Log every session verbatim. On demand, retrieve by date or keyword. Complete and auditable. Too noisy to be useful as live context: a year of transcripts includes every mistake, every retry, every interruption.

Manual Notes

The low-tech fallback. Copy useful things into a personal notes system (Obsidian, Notion, a markdown folder). Works, does not scale, creates its own maintenance burden.

How Hermes Solves It

Hermes treats memory as a runtime responsibility. The key design choices:

Memory is a directory of markdown files under ~/.hermes/. Human-readable, diffable, portable.
SQLite FTS5 indexes everything for fast full-text retrieval with BM25 ranking.
The agent curates its own memory. New information merges into existing sections. Outdated claims are edited in place. Files get rewritten when structure drifts.
Retrieval is dynamic. At session start Hermes pulls the top-ranked memory files for your first message. Mid-session, it can search further on demand.
You can edit it. When the agent remembers something wrong, you open the file and fix it.
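The FTS5-plus-BM25 choice can be sketched with Python's built-in `sqlite3` module. This is not Hermes's actual schema or code: the table name `memory`, the column layout, and the helper names are assumptions made for illustration, and the sketch assumes your SQLite build includes the FTS5 extension. It indexes markdown bodies by path and ranks matches with FTS5's `bm25()` function, where lower scores mean better matches.

```python
import re
import sqlite3


def build_index(files: dict[str, str]) -> sqlite3.Connection:
    """Index {path: markdown body} pairs in an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # path is stored but not tokenized; only the body is searchable.
    conn.execute("CREATE VIRTUAL TABLE memory USING fts5(path UNINDEXED, body)")
    conn.executemany(
        "INSERT INTO memory (path, body) VALUES (?, ?)", list(files.items())
    )
    conn.commit()
    return conn


def to_match_query(message: str) -> str:
    """Turn free text into an FTS5 query: quoted terms joined with OR."""
    terms = re.findall(r"[a-z0-9]+", message.lower())
    # Quoting each term avoids FTS5 syntax errors on input like 'dual-token'.
    return " OR ".join(f'"{term}"' for term in dict.fromkeys(terms))


def search(conn: sqlite3.Connection, message: str, limit: int = 3) -> list[str]:
    """Return paths of the best-matching memory files, best first."""
    query = to_match_query(message)
    if not query:
        return []
    rows = conn.execute(
        "SELECT path FROM memory WHERE memory MATCH ? "
        "ORDER BY bm25(memory) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [path for (path,) in rows]
```

Because the source of truth stays in the markdown files, an index like this is disposable: if a file is edited by hand, the index can be rebuilt from the directory rather than repaired.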

The Hermes memory deep dive covers these design choices in architectural detail.

A Side-by-Side Comparison

Property	Static CLAUDE.md	Vector DB	Raw Transcripts	Hermes Memory
Auditable by a human	Yes	No	Partial (too noisy)	Yes
Updates automatically	No	Yes	Yes	Yes
Editable by a human	Yes	No	No	Yes
Scales past a few KB	Poorly	Yes	Yes	Yes
Supports targeted deletion	Yes	Partial	Yes	Yes
Good for evolving facts	No	Yes	Partial	Yes
Good for stable conventions	Yes	Overkill	Noisy	Yes
Works without external services	Yes	Typically no	Yes	Yes

The Hermes column reads Yes on every row because it borrows the good properties of each alternative: readability from markdown files, dynamic retrieval from vector-DB thinking, completeness from transcripts, and deliberate curation from manual notes.

What This Looks Like in Practice

Monday morning, you start a Hermes chat. You tell it about the dual-token auth scheme. You explain the flaky test. At session close the agent writes:

# projects/client-foo.md

## Auth
Uses a dual-token scheme: short-lived access tokens in the header,
refresh tokens in an HTTP-only cookie. Historical migration from the
legacy session-based system.

## CI
The `auth.integration.test.ts` test is flaky — re-run on failure
before investigating. Tracked in issue #427.
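The write that produces a file like the one above can be sketched as a section-level update: replace the body under an existing heading, or append a new section if the heading is missing. `upsert_section` is a hypothetical helper, not Hermes's actual curation logic, which the source describes only as merging new information and editing outdated claims in place.

```python
def upsert_section(markdown: str, heading: str, body: str) -> str:
    """Replace the body of '## heading' if present, else append a new section."""
    lines = markdown.splitlines()
    out: list[str] = []
    i = 0
    replaced = False
    while i < len(lines):
        line = lines[i]
        if line.strip() == f"## {heading}":
            # Keep the heading, swap in the new body.
            out.append(line)
            out.extend(body.strip().splitlines())
            out.append("")
            i += 1
            # Skip the old body up to the next '## ' heading.
            while i < len(lines) and not lines[i].startswith("## "):
                i += 1
            replaced = True
        else:
            out.append(line)
            i += 1
    if not replaced:
        if out and out[-1] != "":
            out.append("")
        out.append(f"## {heading}")
        out.extend(body.strip().splitlines())
    return "\n".join(out).rstrip("\n") + "\n"
```

Editing in place keeps one current claim per section, so the file stays short enough to read and diff instead of growing into a transcript.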

Friday morning, you start a new session. You ask it to add an endpoint that requires authentication. Hermes's retrieval step pulls projects/client-foo.md into context. The agent applies the dual-token pattern without being told again.

That is the entire point.

When to Still Use CLAUDE.md

Hermes memory does not replace CLAUDE.md. It complements it. CLAUDE.md still earns its keep for project-local conventions that every session needs: "Use Tailwind, not CSS modules." "Never use force-dynamic on listing pages." Those are stable rules where always-loaded context is correct.

Use Hermes memory for the evolving layer: what you learned yesterday, what a specific client needs, why a decision was made. The two work together.
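One way to picture the two layers working together is a context builder that always includes the stable rules and then adds retrieved memory files, best match first, until a size budget runs out. Everything here is an assumption for illustration: `build_context`, the `[memory: ...]` label, and the character budget are invented, and neither tool is documented in this article as working this way.

```python
def build_context(
    claude_md: str,
    ranked_memory: list[tuple[str, str]],
    budget_chars: int,
) -> str:
    """Combine always-loaded rules with ranked (path, body) memory files."""
    # The stable layer is always included, regardless of the task.
    parts = [claude_md.strip()]
    used = len(parts[0])
    # The evolving layer is added best-first until the budget would be exceeded.
    for path, body in ranked_memory:
        block = f"[memory: {path}]\n{body.strip()}"
        if used + len(block) > budget_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)
```

The asymmetry is the design point: conventions are cheap and always relevant, so they load unconditionally, while memory competes for the remaining space on relevance.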

For Claude Code Users Specifically

If you stay in Claude Code, there is still value in structuring your CLAUDE.md files more thoughtfully, splitting global from project-specific, and committing them to git. But you will hit the ceiling of the static approach eventually. When you do, Hermes is the runtime-level fix: put a persistent agent on a VPS, let it accumulate memory, and use its delegate-to-Claude-Code capability to run specific coding tasks through your existing CLI.

We cover that integration pattern in spawning Claude Code as a Hermes subagent and the broader tradeoffs in Hermes vs Claude Code: when to use which.
