CLAUDE.md Decay: The 3–6 Month Review Cadence Anthropic Just Made Official
Instructions that helped your old model can hurt your new one. Anthropic's Applied AI team just put a number on how often you should audit your CLAUDE.md, hooks, and skills — and gave two concrete examples of rules that aged badly.
There's a category of tech debt nobody named until Anthropic's Applied AI team named it this week: configuration decay.
The setup: your CLAUDE.md files, hooks, and skills were written to compensate for the current model's limitations. The next model release lifts those limitations. The instructions remain. They are now actively making your tooling worse.
The Applied AI team gives two examples that landed for me, because I've seen variants of both:
"A CLAUDE.md rule that tells Claude to break every refactor into single-file changes may have helped an earlier model stay on track but would prevent a newer one from making coordinated cross-file edits it handles well."
And:
"A hook that intercepted file writes to enforce
p4 editin a Perforce codebase, for example, became redundant once Claude Code added native Perforce mode."
Both rules made sense the day they were written. Both rules are now subtracting capability.
The Pattern: Your Config Has Expiration Dates Nobody Tracks
When a developer writes a CLAUDE.md rule like "always run tests after each change" or "use try/except defensively around every external call," they're encoding a workaround for behavior they observed at the time. Six months later:
- The model has improved at testing intuition. The "always run tests" rule now causes test-runs after every typo fix, wasting context.
- The model has improved at error-handling judgment. The "use try/except defensively" rule now wraps internal calls that don't need wrapping.
Neither of these is catastrophically wrong. Both are quiet drag. The model is being constrained by instructions that no longer match its capability surface.
This is the configuration analogue of dead code, with one crucial difference: dead code is harmless. Dead configuration is a tax. It costs context tokens to load. It biases the model toward outdated patterns. It compounds across sessions.
The 3–6 Month Cadence, Stated Officially
The recommendation in the article is concrete:
"Teams should expect to do a meaningful configuration review every three to six months, but it's also worth doing one whenever performance feels like it's plateaued after major model releases."
That's the first time I've seen Anthropic put a number on it. Three to six months is the operational rhythm.
Practical implication: if your CLAUDE.md hasn't been audited since you wrote it, it's probably already out of date. If it hasn't been audited in a year, it's almost certainly load-bearing for an older model that doesn't exist anymore.
What An Audit Actually Looks Like
The Applied AI team doesn't spell out the audit process. Here's what I'd actually do, in priority order:
1. Take every rule that starts with "always" or "never" and stress-test it
Rules with absolute quantifiers are the most likely to age badly. The model that needed "always do X" is rarely the same model still benefiting from it. Run the same task with the rule and without. If outputs are identical, the rule is doing nothing. If they're different, decide which one is actually better — and don't assume the rule wins.
2. Audit your hooks for what they prevent
Every hook intercepts a behavior. Ask: is this behavior still a problem? If your hook prevents the model from writing to a directory because the old model used to spam it with junk, but the current model has never tried to write there in your last 100 sessions, the hook is overhead.
3. Audit your skills for compensation patterns
A skill that exists to compensate for a specific model limitation should die when the limitation does. The Perforce hook example is the canonical case. Your equivalent is: any skill whose entire purpose is "remind Claude to do X" rather than "give Claude expertise in domain Y."
Domain-expertise skills compound. Reminder skills decay.
4. Read the model release notes
This sounds obvious. It's almost never done. When Anthropic ships a new Sonnet, the release notes tell you what the new model does well that the old one didn't. That's exactly the surface area of capability your configuration might be suppressing. Treat each release note as a configuration-audit checklist.
The Failure Mode I See Most
Teams add to their CLAUDE.md. They almost never remove from it.
There's a psychology to this — removing a rule feels like undoing a learned lesson. But the lesson was provisional. It was true given the model at the time. When the model changes, the lesson's truth status is provisional again, and the right move is to re-test, not to keep accumulating.
The net effect across a hundred teams: CLAUDE.md files become geological. Layers of historical reactions to historical model behaviors. The current model is buried under instructions written for models two generations dead.
The Light Touch That Compounds
There's a self-improving version of all this, mentioned obliquely in the Applied AI piece in the section on hooks:
"A stop hook can reflect on what happened during a session and propose CLAUDE.md updates while the context is fresh."
A stop hook that removes outdated rules is at least as valuable as one that adds new ones. Few teams have implemented the removal side. It's the natural complement: every quarter, the hook flags rules whose triggering behavior hasn't occurred in N sessions. Those are candidates for deletion.
I'd bet good money that a "configuration garbage collector" hook would be among the highest-leverage skills published in the next six months. Someone is going to package it and the rest of us will install it.
The Practical Reframe
CLAUDE.md is not a config file. It's a living document with expiration dates, and the dates aren't written on the rules.
Three to six months. After every major model release. Especially when performance feels like it's plateaued — because plateaus usually mean the model has lapped the configuration, and the configuration is now the bottleneck.
This is the unglamorous half of skill-based engineering: not writing the rule, but knowing when to retire it.
Part of the Claude Code at Scale series. Previous: Agentic search vs. RAG. Next: LSP — the highest-ROI investment most teams skip.