Build a Self-Improving AI Agent with ClawHub Skills
Learn how the self-improving-agent skill works, how to install and configure it, and how to customize its feedback loops for your specific development workflow.
Most AI agents are static. You prompt them, they respond, the interaction ends. The next session starts from zero.
The self-improving-agent skill breaks that pattern. It observes the outcomes of its own actions, writes reflections to a persistent memory store, and loads those reflections at the start of every new session. The agent gets better the more you use it — automatically, without any fine-tuning or manual prompting.
This tutorial covers how the skill works internally, how to install and configure it, and how to customize the feedback loops for your specific workflow.
What the Self-Improving Agent Actually Does
Before installing anything, understand the mechanism:
- Action logging — Every significant action the agent takes (code written, command run, decision made) is logged with context
- Outcome tracking — After a task completes, the agent evaluates whether the outcome matched the goal
- Reflection writing — The agent generates a structured reflection: what worked, what didn't, what to do differently
- Memory persistence — Reflections are written to a local store (.claude/memory/reflections.json by default)
- Session loading — At the start of each new session, relevant reflections are injected into the system prompt
The result: an agent that recognizes patterns in its own failures, avoids repeating mistakes, and doubles down on approaches that work.
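The cycle above can be sketched roughly in code. The Reflection shape below mirrors the JSON example shown later in this tutorial; the function name, the in-memory store, and the cap parameter are illustrative assumptions, not the skill's actual implementation.

```typescript
// A minimal sketch of the log -> evaluate -> persist cycle (assumed shapes,
// not the skill's real code). The Reflection fields follow the example
// reflection JSON shown later in this tutorial.
interface Reflection {
  session_id: string;
  task: string;
  outcome: "completed" | "partial" | "failed";
  evaluation: Record<string, string>; // goal -> "pass" | "partial ..." | "fail ..."
  lessons: string[];
  next_session_guidance: string;
}

// Append a reflection to a store, keeping only the most recent entries,
// in the spirit of the max_reflections config option.
function appendReflection(
  store: Reflection[],
  r: Reflection,
  max = 50,
): Reflection[] {
  const next = [...store, r];
  return next.length > max ? next.slice(next.length - max) : next;
}
```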
Installation
clawhub install self-improving-agent
Verify it installed:
clawhub list | grep self-improving
# self-improving-agent v2.3.0 agent /self-improve
The skill installs two components:
- /self-improve — the slash command to trigger reflection sessions
- An automatic session hook that loads relevant memories on startup
First Run: Establishing a Baseline
Start Claude Code in your project and run the setup command:
/self-improve setup
This initializes the memory store and prompts you to define your agent's core goals:
Setting up self-improving-agent...
What is this agent's primary purpose?
> Senior full-stack developer focusing on TypeScript and Next.js
What are the top 3 things this agent should always optimize for?
> 1. Type safety — never use `any`, always define proper interfaces
> 2. Performance — prefer Server Components, minimize client bundle
> 3. Test coverage — write tests alongside every feature
Baseline established. The agent will now track and improve against these criteria.
These criteria become the evaluation rubric for every reflection cycle.
How Reflection Cycles Work
After the agent completes a significant task, trigger a reflection:
/self-improve reflect
The agent analyzes its recent actions and writes a structured reflection:
{
  "session_id": "2026-03-20-14:32",
  "task": "Implement user authentication with NextAuth",
  "outcome": "completed",
  "evaluation": {
    "type_safety": "pass",
    "performance": "partial — used useEffect for session check instead of Server Component",
    "test_coverage": "fail — no tests written for auth callbacks"
  },
  "lessons": [
    "Auth state can be checked server-side via getServerSession — avoid client-side useEffect for this",
    "NextAuth callbacks need dedicated test coverage with mock providers"
  ],
  "next_session_guidance": "When implementing auth: check session server-side first, write callback tests before implementation"
}
The next time you open Claude Code in this project, that guidance is loaded automatically. The agent won't repeat the same mistakes.
Viewing the Memory Store
Check what the agent has learned:
/self-improve show
Output:
Self-improvement memory store (12 reflections)
Recent lessons:
→ Auth state: prefer getServerSession over client-side useEffect
→ TypeScript: define Zod schemas before writing handler logic
→ Testing: write test file skeleton before implementing feature
→ Performance: check Lighthouse score after every major UI change
Patterns identified:
→ Test coverage failures: 4 of 12 sessions (33%)
→ Most common gap: async error handling
Run `/self-improve insights` for deeper analysis.
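The pattern summary in that output comes down to simple counting across stored reflections. This sketch shows one way such a rate could be computed; the Evaluation shape and function name are assumptions, not the skill's internals.

```typescript
// Illustrative sketch of the pattern analysis behind `/self-improve show`:
// count how many sessions failed a given goal. Shapes and names are
// assumptions, not the skill's actual code.
type Evaluation = Record<string, string>; // goal -> verdict string

function failureRate(evals: Evaluation[], goal: string): string {
  // A verdict like "fail — no tests written" counts as a failure.
  const fails = evals.filter((e) => (e[goal] ?? "").startsWith("fail")).length;
  const pct = Math.round((fails / evals.length) * 100);
  return `${fails} of ${evals.length} sessions (${pct}%)`;
}
```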
Customizing the Feedback Loop
The default configuration works well, but you can tune it for your workflow.
Configuration File
The skill creates .claude/self-improve.config.json in your project:
{
  "goals": [
    "Type safety — never use any",
    "Performance — prefer Server Components",
    "Test coverage — write tests alongside features"
  ],
  "reflection_trigger": "manual",
  "memory_store": ".claude/memory/reflections.json",
  "max_reflections": 50,
  "relevance_threshold": 0.7,
  "inject_top_n": 5
}
Reflection Triggers
Change when reflections happen:
{
  "reflection_trigger": "auto"
}
With auto, the agent reflects after every session automatically. With manual (default), you control timing with /self-improve reflect.
Relevance Threshold
The relevance_threshold controls which past reflections get injected into new sessions. A score of 0.7 means only reflections that are 70%+ relevant to the current task context are loaded. Lower this for broader memory injection, raise it for more targeted recall.
{
  "relevance_threshold": 0.5,
  "inject_top_n": 3
}
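The interaction between these two settings can be sketched as a filter-then-rank step: drop anything below the threshold, then keep only the top N by score. This is an illustrative model of the documented behavior, not the skill's actual selection code, and the scoring itself (how relevance is measured) is left abstract here.

```typescript
// Sketch of how relevance_threshold and inject_top_n could combine when
// choosing which past reflections to inject into a new session.
// The scored input and function name are assumptions for illustration.
interface ScoredReflection {
  lesson: string;
  score: number; // relevance to the current task, 0..1
}

function selectReflections(
  scored: ScoredReflection[],
  relevanceThreshold: number,
  injectTopN: number,
): string[] {
  return scored
    .filter((r) => r.score >= relevanceThreshold) // drop low-relevance memories
    .sort((a, b) => b.score - a.score)            // most relevant first
    .slice(0, injectTopN)                         // cap the injected count
    .map((r) => r.lesson);
}
```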
Custom Evaluation Criteria
Replace the default goals with domain-specific criteria:
{
  "goals": [
    "Security — never expose secrets in client code",
    "Accessibility — all interactive elements need ARIA labels",
    "Documentation — every exported function needs JSDoc"
  ]
}
The agent will evaluate every action against these criteria and build a targeted memory store around them.
Combining with the Elite Long-Term Memory Skill
For even deeper persistence, combine self-improving-agent with elite-longterm-memory:
clawhub install elite-longterm-memory
While self-improving-agent tracks task-level reflections, elite-longterm-memory stores factual knowledge: your project architecture, your team's conventions, decisions you've made and why.
Configure them to share a memory directory:
// .claude/self-improve.config.json
{
  "memory_store": ".claude/memory/reflections.json"
}
// .claude/longterm-memory.config.json
{
  "memory_store": ".claude/memory/knowledge.json",
  "cross_reference": ".claude/memory/reflections.json"
}
With both active, the agent has two memory systems: episodic (what happened in past sessions) and semantic (what it knows about your project). Combined, they produce an agent that behaves like a senior developer who's been working in your codebase for months.
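One way to picture the combined effect: at session start, lessons from the episodic store and facts from the semantic store are merged into a single context preamble. The section headers and function name below are illustrative; the real injection format isn't documented here.

```typescript
// Hypothetical sketch of assembling a session preamble from both memory
// systems: episodic lessons (reflections.json) plus semantic project facts
// (knowledge.json). Formatting is an assumption for illustration.
function buildSessionContext(lessons: string[], facts: string[]): string {
  return [
    "Lessons from past sessions:",
    ...lessons.map((l) => `- ${l}`),
    "Project knowledge:",
    ...facts.map((f) => `- ${f}`),
  ].join("\n");
}
```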
Practical Example: A Week of Improvement
Here's what the improvement curve looks like in practice over a week of daily use:
Day 1: Agent writes boilerplate auth code. Missing test coverage. Reflection logged.
Day 2: Reflection loaded. Agent proactively suggests test skeleton before implementing auth. Test coverage improves.
Day 3: Agent spots a performance issue in its own output before you do. References day 1 reflection: "I learned to check for Server Component alternatives."
Day 5: You ask for a completely different feature. Agent applies cross-domain lessons: "Based on past patterns, I'll write the interface definition first, then the implementation."
Day 7: Agent catches a type safety issue it would have missed on day 1. No prompting required.
Resetting the Memory Store
If the agent has learned bad habits or you're starting a new project:
/self-improve reset
This clears all reflections and starts fresh. You can also archive the store before resetting:
/self-improve export --output ./agent-memory-backup.json
/self-improve reset
Troubleshooting
Reflections not loading on session start — Check that .claude/memory/reflections.json exists and is valid JSON. Run /self-improve validate.
Too many irrelevant memories loading — Increase relevance_threshold to 0.8 or higher.
Agent not improving on a specific issue — The issue may not be captured in reflections. Trigger a manual reflection immediately after the problem occurs: /self-improve reflect --note "Focus on X pattern".
Next Steps
- Pair this skill with persistent memory skills for a complete memory architecture
- Read about proactive agent behavior to complement the self-improvement loop
- Explore the full skill catalog for skills that feed data into the agent's reflections