Context Compaction: Building Resilient AI Skills
Context compaction silently reshapes your skill's output. Learn token budgets, auto-compact triggers, and design patterns for skills that survive compression.
Every skill builder eventually hits the same invisible wall. Their skill works perfectly in short sessions but starts producing degraded output in longer ones. Instructions get forgotten. Previously generated code loses consistency. The agent seems to lose track of what the skill was doing.
The culprit is almost always context compaction -- the process by which the AI agent compresses its conversation history to free up space. Understanding how compaction works -- alongside Claude Code's tool architecture and permission model -- is not optional knowledge for serious skill builders. It is the difference between skills that work reliably and skills that silently degrade.
Disclaimer: The technical details in this article draw on CCLeaks material -- AI-generated content that analyses Claude Code's architecture. This content may contain inaccuracies, is not affiliated with Anthropic, and should be treated as signal reading rather than confirmed specifications. The design patterns we recommend, however, are sound engineering regardless of the exact implementation details.
Key Takeaways
- Claude Code's default context window is 200K tokens (1M with specific configurations), with system prompts consuming 5-25K before the user types anything.
- Auto-compaction triggers at approximately 187K tokens (the 200K window minus a ~13K buffer, roughly 93% of capacity), compressing old messages and freeing 40-60% of context.
- MicroCompact incrementally compresses individual tool outputs (Bash, FileRead, Grep) before full compaction triggers.
- Skills that write critical state to files survive compaction; skills that rely on conversation history silently degrade as sessions get longer.
- Seven design patterns make skills compaction-resilient: concise output, file-based state, structured formatting, memory file usage, resumability, chunked operations, and context-volume awareness.
The Token Budget Reality
The signals from CCLeaks suggest Claude Code operates with a 200K default context window, with a 1M option available through specific model configurations. For most users, 200K is the working constraint, and it fills up faster than you might expect.
System prompts typically consume 5-15K tokens. Memory files add another 2-10K. That is up to 25K tokens spent before the user types a single character. In a long coding session with file reads, tool outputs, and back-and-forth conversation, the remaining 175K tokens can be consumed in 20-30 minutes of active work.
The key number is the auto-compact trigger: approximately 187K tokens -- the 200K window minus a 13K safety buffer, or roughly 93% of capacity. When usage crosses this threshold, compaction fires automatically. The agent does not ask permission. It does not warn you. It compresses the conversation history and continues.
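The budget arithmetic is easy to sketch. The constants below come from the unverified CCLeaks figures, not from any published API, and the helper function is purely illustrative:

```python
# Assumed figures from the CCLeaks signals -- not confirmed by Anthropic.
CONTEXT_WINDOW = 200_000      # default window, in tokens
COMPACT_BUFFER = 13_000       # headroom kept below the ceiling
AUTO_COMPACT_TRIGGER = CONTEXT_WINDOW - COMPACT_BUFFER  # ~187K

def tokens_until_compaction(system_prompt: int, memory: int, conversation: int) -> int:
    """How many tokens remain before auto-compaction would fire."""
    used = system_prompt + memory + conversation
    return max(AUTO_COMPACT_TRIGGER - used, 0)

# A session with a 15K system prompt and 10K of memory files starts
# with only 162K of usable headroom before the trigger.
headroom = tokens_until_compaction(15_000, 10_000, 0)  # 162000
```

Under these assumptions, a heavyweight system prompt and memory setup quietly spends close to 15% of the window before any work begins.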
What Compaction Actually Does
Compaction is not deletion -- it is summarisation. The agent reviews the conversation history and produces a compressed version that preserves the key information while reducing token count. The signals suggest it frees 40-60% of the context window, which is substantial.
But summarisation is lossy by nature. Details that the compaction process judges to be less important get condensed or dropped. Specific code snippets might be summarised as "the user wrote a React component for user authentication." Exact error messages might become "there was a type error in the authentication module." The gist survives. The precision often does not.
A CompactBoundaryMessage marker gets inserted at the compaction point, which means the agent knows compaction happened. But knowing it happened does not restore what was lost.
MicroCompact: The Incremental Squeeze
Beyond the full auto-compact, there is a more subtle mechanism: MicroCompact. This performs incremental compression on the output of specific tools -- Bash, FileRead, and Grep in particular. If your skill generates a 500-line file read, MicroCompact might compress that output before it even settles into the context window.
This means that large tool outputs from skills are especially vulnerable. A skill that reads an entire configuration file, processes it, and produces verbose output is consuming context at a rate that accelerates its own compaction. The tool output gets compressed, then eventually the broader conversation gets compacted, and your skill's work gets compressed twice.
Post-Compact Restoration
The system has a partial recovery mechanism. After compaction, up to 5 files can be restored into context, with a maximum of 5K tokens each (25K total). Background extraction may also preserve key insights into memory files.
This is important for skill design. If your skill produces output that gets stored in files, those files can be restored after compaction. If your skill produces output that only exists in the conversation stream, it vanishes when compacted.
The restoration mechanism is not automatic for all file types. The signals suggest it targets files that were recently read or modified -- the files the agent was actively working with. This creates a survival heuristic: skills that write their important state to files are more resilient than skills that keep everything in conversation context.
The Circuit Breaker
There is a safety mechanism worth knowing about: a circuit breaker that allows a maximum of 3 consecutive auto-compact failures. If compaction fails three times in a row, the system presumably takes a more aggressive action -- possibly resetting context or alerting the user.
For skill builders, this means that skills which generate enormous amounts of context quickly can push the system into a failure mode. If your skill triggers compaction, and its post-compaction output immediately re-fills the window, and compaction fires again, you are consuming the circuit breaker budget. Three cycles and you are in trouble.
Design Patterns for Compaction Resilience
Understanding compaction mechanics leads directly to better skill design. Here are the patterns that work.
Pattern 1: Be Ruthlessly Concise
Every token your skill outputs is a token that accelerates compaction. The most impactful design decision is to produce less output, not more.
Bad pattern: a code review skill that quotes entire functions, explains each issue in a paragraph, provides the fix inline, and adds context about why the pattern is problematic.
Better pattern: a code review skill that lists file, line number, issue category, and a one-line description, then provides detailed fixes only when the user asks for them.
The difference is not just about token efficiency. Concise output survives compaction better because there is less to lose. A summary of "3 critical issues found in auth module" is more compaction-resistant than three paragraphs of detailed analysis that might get compressed into exactly that summary anyway.
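The compact-by-default shape can be sketched as code. Everything here is hypothetical -- the `Finding` structure and `render` helper are illustrations of the pattern, not any real skill API:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    line: int
    category: str
    summary: str       # one line, always emitted
    detail: str = ""   # full explanation, emitted only on request

def render(findings: list[Finding], verbose: bool = False) -> str:
    """Compact by default: one 'file:line [category] summary' line per issue."""
    out = []
    for f in findings:
        out.append(f"{f.file}:{f.line} [{f.category}] {f.summary}")
        if verbose and f.detail:
            out.append(f"    {f.detail}")
    return "\n".join(out)

findings = [
    Finding("auth.py", 42, "security",
            "token compared with ==",
            "Use hmac.compare_digest to avoid timing attacks."),
]
```

The default path costs one line per issue; the expensive detail is only spent when the user explicitly asks for it.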
Pattern 2: Write State to Files, Not Conversation
This is the single most important pattern for long-running skills. Any information your skill needs to persist across compaction should be written to a file, not left in conversation context.
A project scaffolding skill should write its plan to a file before executing it. A multi-step refactoring skill should maintain a progress tracker in a file. An audit skill should write findings incrementally to a report file rather than accumulating them in conversation.
The files persist on disk. The conversation gets compacted. Design accordingly.
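A minimal checkpoint sketch, assuming a simple JSON file for state (the file name and state shape are invented for illustration):

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(path: Path, state: dict) -> None:
    """Persist skill state to disk so it survives compaction."""
    path.write_text(json.dumps(state, indent=2))

def load_checkpoint(path: Path) -> dict:
    """Recover state from disk; no conversation history required."""
    return json.loads(path.read_text()) if path.exists() else {}

with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "refactor-progress.json"
    save_checkpoint(progress, {"steps_done": ["rename module", "update imports"],
                               "next": "fix tests"})
    # Later -- possibly after compaction -- the skill re-reads its plan:
    state = load_checkpoint(progress)
```

Because the plan lives on disk, the skill can answer "what was I doing?" without trusting a summarised conversation.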
Pattern 3: Structure Output for Summarisation
If your skill's output will be summarised (and it will, eventually), design it so the summary is useful. This means front-loading the most important information.
Put conclusions before evidence. Put action items before analysis. Put the critical finding in the first sentence, not the last. When compaction summarises your output, it is more likely to preserve the opening than the closing.
Use clear headers and structured formats. Compaction handles "## Critical Issues" followed by a bullet list better than it handles a narrative paragraph where the critical finding is buried in the middle of sentence seven.
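One way to enforce front-loading is to make the report renderer put conclusions first by construction. A sketch, with invented section names:

```python
def render_report(critical: list[str], analysis: list[str]) -> str:
    """Put conclusions before evidence so a lossy summary keeps the headline."""
    lines = [f"## Critical Issues ({len(critical)})"]
    lines += [f"- {item}" for item in critical]
    lines += ["", "## Supporting Analysis"]
    lines += [f"- {item}" for item in analysis]
    return "\n".join(lines)

report = render_report(
    ["SQL injection in search endpoint"],
    ["Query string concatenated at search.py:88",
     "No parameterised queries anywhere in the module"],
)
```

If compaction keeps only the opening of this output, it keeps the issue count and the critical finding -- the parts worth keeping.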
Pattern 4: Leverage Memory Files for Persistence
The signals suggest Claude Code supports background extraction of key insights into memory files. Skills can work with this mechanism rather than against it.
Design your skill to produce output that is memory-file friendly. Key decisions, architectural choices, configuration values, and important context should be clearly labelled and separated from routine output. This makes it easier for the extraction process to identify what matters.
Some skills explicitly write to memory files as part of their workflow. A project initialisation skill might write project conventions to a memory file so they survive any number of compactions throughout the development session.
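A sketch of an explicit memory-file write. The section header and the idempotency check are assumptions about a sensible layout, not a documented format, though `CLAUDE.md` is a real Claude Code memory file name:

```python
import tempfile
from pathlib import Path

def record_convention(memory_file: Path, convention: str) -> None:
    """Append a clearly labelled decision to a memory file (e.g. CLAUDE.md)
    so it outlives any number of compactions. Idempotent: skips duplicates."""
    existing = memory_file.read_text() if memory_file.exists() else ""
    line = f"- {convention}"
    if line in existing:
        return  # already recorded; avoid bloating the memory file
    with memory_file.open("a") as f:
        if "## Project Conventions" not in existing:
            f.write("## Project Conventions\n")
        f.write(line + "\n")
```

Duplicate-checking matters here: memory files are re-read every session, so an append-only skill that repeats itself slowly eats the 2-10K token memory budget.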
Pattern 5: Design for Resumability
Assume your skill's previous output has been compacted. Design interactions so the skill can recover gracefully.
A multi-step deployment skill should check the current state of the deployment rather than relying on memory of what it did in step 3. A code generation skill should read the existing files to understand what has been generated rather than assuming the conversation history is complete.
Idempotent operations naturally handle compaction well. If your skill can re-derive its state from the current state of the project (files on disk, database state, git history), it does not need the conversation history to function correctly.
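The re-derivation idea can be sketched as a plan whose progress is read off the filesystem. The plan format and artifact names are invented for illustration:

```python
import tempfile
from pathlib import Path

def remaining_steps(project: Path, plan: list[tuple[str, str]]) -> list[str]:
    """Derive progress from what exists on disk, not from conversation
    memory: a step counts as done if its output artifact is present."""
    return [step for step, artifact in plan if not (project / artifact).exists()]

plan = [("scaffold config", "config.toml"),
        ("generate client", "client.py")]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config.toml").touch()        # step 1 already done
    todo = remaining_steps(root, plan)    # only "generate client" remains
```

Even if every earlier message has been summarised away, the skill can reconstruct exactly where it is by checking the artifacts.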
Pattern 6: Chunk Large Operations
Instead of processing an entire codebase in one pass, chunk the work into smaller operations with intermediate file outputs.
A codebase analysis skill should not read 50 files and produce a monolithic report in a single turn. It should read files in batches, write intermediate findings to a file, and produce a final synthesis that references the intermediate files. Each chunk is smaller, survives compaction better, and the intermediate files persist regardless of context compression.
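The batching loop looks like this in sketch form. The batch size, file naming scheme, and `analyse` callback are all assumptions for illustration:

```python
import tempfile
from pathlib import Path

BATCH_SIZE = 5  # assumed batch size; tune to your token budget

def analyse_in_chunks(files: list[str], workdir: Path, analyse) -> list[Path]:
    """Process files in small batches, writing findings to disk after each
    batch so partial progress survives compaction."""
    reports = []
    for i in range(0, len(files), BATCH_SIZE):
        batch = files[i:i + BATCH_SIZE]
        findings = [analyse(f) for f in batch]
        out = workdir / f"findings-{i // BATCH_SIZE:03d}.txt"
        out.write_text("\n".join(findings))
        reports.append(out)
    return reports  # the final synthesis reads these files, not the conversation
```

Twelve files become three small intermediate reports; if compaction fires mid-run, the completed batches are already safe on disk.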
Pattern 7: Use the Early Warning Buffer
The signals suggest a 20K token early warning threshold. Skills that are context-aware can monitor their own output volume and behave differently when approaching limits.
A verbose skill might switch to a compact output mode when context is running low. A multi-step skill might checkpoint its progress more frequently as it approaches the compaction zone. This is not easy to implement at the skill level today, but designing skills with output volume awareness is a good practice regardless.
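The mode switch can be sketched with a crude token estimate. Both the characters-per-token ratio and the threshold constant are assumptions (the 20K figure comes from the unverified signals), not real tokeniser behaviour:

```python
def approx_tokens(text: str) -> int:
    """Crude heuristic: ~4 characters per token for English prose.
    An assumption for illustration, not the real tokeniser."""
    return len(text) // 4

EARLY_WARNING = 20_000  # assumed early-warning threshold from the signals

def choose_mode(tokens_remaining: int) -> str:
    """Drop to compact output when the remaining budget is thin."""
    return "compact" if tokens_remaining < EARLY_WARNING else "verbose"
```

A skill cannot read the real context gauge today, but tracking its own cumulative output with a heuristic like this is cheap insurance.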
The Custom Compaction Lever
One signal worth noting: custom compaction instructions appear to be configurable. This means that projects can influence how compaction prioritises information. If your skill operates in a project with custom compaction instructions, you can potentially ensure that skill-specific context is preserved during compression.
This is a project-level configuration, not a skill-level one. But if you are building skills for a specific project or team, setting up compaction instructions that protect critical context is a high-leverage move.
What This Means for Skill Builders
Context management is the hidden quality dimension of skill design. Two skills with identical logic and identical prompts will produce different results depending on how they handle context. The skill that writes state to files, produces concise output, structures information for summarisation, and designs for resumability will work reliably in hour-long sessions. The skill that dumps verbose output into the conversation stream will degrade as the session progresses.
The compaction system is not your enemy. It is what makes long sessions possible at all. But it operates on its own logic, and skills that work with that logic outperform skills that ignore it.
Build skills that assume their output will be compressed. Build skills that persist critical state outside the conversation. Build skills that can resume from file state rather than conversation history. These are not optimisations -- they are requirements for production-quality skill design.
The context window is not infinite, and compaction is not transparent. The skill builders who internalise this constraint -- alongside understanding persistent memory and the broader feature flag landscape -- will build tools that work reliably in the real world, not just in short demo sessions.
Compaction Thresholds at a Glance
| Metric | Value | Implication for Skills |
|---|---|---|
| Default context window | 200K tokens | Your skill shares this with system prompt, memory, and conversation |
| System prompt size | 5-15K tokens | Leaves ~185K for everything else |
| Memory files | 2-10K tokens | Persistent state is cheap in context terms |
| Auto-compact trigger | ~187K tokens (~93%) | Plan for compaction in any session over 20 minutes |
| Context freed by compaction | 40-60% | Old output will be summarised, not preserved verbatim |
| Post-compact file restoration | 5 files, 5K each (25K total) | Only the most recent file reads survive intact |
| Circuit breaker | 3 consecutive failures | Compaction can fail -- your skill must handle degraded context |
Frequently Asked Questions
What triggers auto-compaction in Claude Code?
Auto-compaction triggers when the conversation reaches approximately 187K tokens (the default 200K context window minus a 13K buffer, roughly 93% of capacity). The system summarises older messages, inserts a CompactBoundaryMessage marker, and frees 40-60% of the context. A circuit breaker stops after 3 consecutive compaction failures.
How many tokens does the context window hold?
The default context window is 200K tokens. A 1M option is available for Opus and Sonnet 4.6 models via a specific model suffix. System prompts consume 5-15K tokens and memory files add 2-10K, so the effective space for conversation and tool output is approximately 175-190K tokens.
How do I build a skill that survives compaction?
Write critical state to files rather than relying on conversation history. Keep output concise. Structure information with clear headings and lists so summaries preserve key points. Design for resumability -- your skill should be able to pick up from file state if conversation context is lost.
What is MicroCompact?
MicroCompact is an incremental compression system that compresses individual large tool outputs (from Bash, FileRead, and Grep) before full auto-compaction triggers. It reduces context consumption from verbose tool results without waiting for the 80% threshold.
How does compaction affect skill output quality?
Compaction is lossy -- older conversation messages are summarised by Claude, not preserved verbatim. Specific numbers, code snippets, and detailed instructions may be simplified during summarisation. Skills that persist important data to files are unaffected; skills that rely on earlier conversation output may produce inconsistent results after compaction.
Explore production-ready AI skills at aiskill.market/browse or submit your own skill to the marketplace.