Build a Self-Improving AI Agent with ClawHub Skills
Learn how the self-improving-agent skill works, how to install and configure it, and how to customize its feedback loops for your specific development workflow.
Most AI agents are static. You prompt them, they respond, the interaction ends. The next session starts from zero.
The self-improving-agent skill breaks that pattern. It observes the outcomes of its own actions, writes reflections to a persistent memory store, and loads those reflections at the start of every new session. The agent gets better the more you use it — automatically, without any fine-tuning or manual prompting.
This tutorial covers how the skill works internally, how to install and configure it, and how to customize the feedback loops for your specific workflow.
What the Self-Improving Agent Actually Does
Before installing anything, understand the mechanism:
- Action logging — Every significant action the agent takes (code written, command run, decision made) is logged with context
- Outcome tracking — After a task completes, the agent evaluates whether the outcome matched the goal
- Reflection writing — The agent generates a structured reflection: what worked, what didn't, what to do differently
- Memory persistence — Reflections are written to a local store (.claude/memory/reflections.json by default)
- Session loading — At the start of each new session, relevant reflections are injected into the system prompt
The result: an agent that recognizes patterns in its own failures, avoids repeating mistakes, and doubles down on approaches that work.
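The cycle above can be sketched roughly in code. The Reflection shape below mirrors the JSON example shown later in this tutorial; the function name, the in-memory store, and the cap parameter are illustrative assumptions, not the skill's actual implementation.

```typescript
// A minimal sketch of the log -> evaluate -> persist cycle (assumed shapes,
// not the skill's real code). The Reflection fields follow the example
// reflection JSON shown later in this tutorial.
interface Reflection {
  session_id: string;
  task: string;
  outcome: "completed" | "partial" | "failed";
  evaluation: Record<string, string>; // goal -> "pass" | "partial ..." | "fail ..."
  lessons: string[];
  next_session_guidance: string;
}

// Append a reflection to a store, keeping only the most recent entries,
// in the spirit of the max_reflections config option.
function appendReflection(
  store: Reflection[],
  r: Reflection,
  max = 50,
): Reflection[] {
  const next = [...store, r];
  return next.length > max ? next.slice(next.length - max) : next;
}
```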
Installation
clawhub install self-improving-agent
Verify it installed:
clawhub list | grep self-improving
# self-improving-agent v2.3.0 agent /self-improve
The skill installs two components:
- /self-improve — the slash command to trigger reflection sessions
- An automatic session hook that loads relevant memories on startup
First Run: Establishing a Baseline
Start Claude Code in your project and run the setup command:
/self-improve setup
This initializes the memory store and prompts you to define your agent's core goals:
Setting up self-improving-agent...
What is this agent's primary purpose?
> Senior full-stack developer focusing on TypeScript and Next.js
What are the top 3 things this agent should always optimize for?
> 1. Type safety — never use `any`, always define proper interfaces
> 2. Performance — prefer Server Components, minimize client bundle
> 3. Test coverage — write tests alongside every feature
Baseline established. The agent will now track and improve against these criteria.
These criteria become the evaluation rubric for every reflection cycle.
How Reflection Cycles Work
After the agent completes a significant task, trigger a reflection:
/self-improve reflect
The agent analyzes its recent actions and writes a structured reflection:
{
  "session_id": "2026-03-20-14:32",
  "task": "Implement user authentication with NextAuth",
  "outcome": "completed",
  "evaluation": {
    "type_safety": "pass",
    "performance": "partial — used useEffect for session check instead of Server Component",
    "test_coverage": "fail — no tests written for auth callbacks"
  },
  "lessons": [
    "Auth state can be checked server-side via getServerSession — avoid client-side useEffect for this",
    "NextAuth callbacks need dedicated test coverage with mock providers"
  ],
  "next_session_guidance": "When implementing auth: check session server-side first, write callback tests before implementation"
}
The next time you open Claude Code in this project, that guidance is loaded automatically. The agent won't repeat the same mistakes.
Viewing the Memory Store
Check what the agent has learned:
/self-improve show
Output:
Self-improvement memory store (12 reflections)
Recent lessons:
→ Auth state: prefer getServerSession over client-side useEffect
→ TypeScript: define Zod schemas before writing handler logic
→ Testing: write test file skeleton before implementing feature
→ Performance: check Lighthouse score after every major UI change
Patterns identified:
→ Test coverage failures: 4 of 12 sessions (33%)
→ Most common gap: async error handling
Run `/self-improve insights` for deeper analysis.
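The pattern summary in that output comes down to simple counting across stored reflections. This sketch shows one way such a rate could be computed; the Evaluation shape and function name are assumptions, not the skill's internals.

```typescript
// Illustrative sketch of the pattern analysis behind `/self-improve show`:
// count how many sessions failed a given goal. Shapes and names are
// assumptions, not the skill's actual code.
type Evaluation = Record<string, string>; // goal -> verdict string

function failureRate(evals: Evaluation[], goal: string): string {
  // A verdict like "fail — no tests written" counts as a failure.
  const fails = evals.filter((e) => (e[goal] ?? "").startsWith("fail")).length;
  const pct = Math.round((fails / evals.length) * 100);
  return `${fails} of ${evals.length} sessions (${pct}%)`;
}
```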
Customizing the Feedback Loop
The default configuration works well, but you can tune it for your workflow.
Configuration File
The skill creates .claude/self-improve.config.json in your project:
{
  "goals": [
    "Type safety — never use any",
    "Performance — prefer Server Components",
    "Test coverage — write tests alongside features"
  ],
  "reflection_trigger": "manual",
  "memory_store": ".claude/memory/reflections.json",
  "max_reflections": 50,
  "relevance_threshold": 0.7,
  "inject_top_n": 5
}
Reflection Triggers
Change when reflections happen:
{
  "reflection_trigger": "auto"
}
With auto, the agent reflects after every session automatically. With manual (default), you control timing with /self-improve reflect.
Relevance Threshold
The relevance_threshold controls which past reflections get injected into new sessions. A score of 0.7 means only reflections that are 70%+ relevant to the current task context are loaded. Lower this for broader memory injection, raise it for more targeted recall.
{
  "relevance_threshold": 0.5,
  "inject_top_n": 3
}
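The interaction between these two settings can be sketched as a filter-then-rank step: drop anything below the threshold, then keep only the top N by score. This is an illustrative model of the documented behavior, not the skill's actual selection code, and the scoring itself (how relevance is measured) is left abstract here.

```typescript
// Sketch of how relevance_threshold and inject_top_n could combine when
// choosing which past reflections to inject into a new session.
// The scored input and function name are assumptions for illustration.
interface ScoredReflection {
  lesson: string;
  score: number; // relevance to the current task, 0..1
}

function selectReflections(
  scored: ScoredReflection[],
  relevanceThreshold: number,
  injectTopN: number,
): string[] {
  return scored
    .filter((r) => r.score >= relevanceThreshold) // drop low-relevance memories
    .sort((a, b) => b.score - a.score)            // most relevant first
    .slice(0, injectTopN)                         // cap the injected count
    .map((r) => r.lesson);
}
```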
Custom Evaluation Criteria
Replace the default goals with domain-specific criteria:
{
  "goals": [
    "Security — never expose secrets in client code",
    "Accessibility — all interactive elements need ARIA labels",
    "Documentation — every exported function needs JSDoc"
  ]
}
The agent will evaluate every action against these criteria and build a targeted memory store around them.
Combining with the Elite Long-Term Memory Skill
For even deeper persistence, combine self-improving-agent with elite-longterm-memory:
clawhub install elite-longterm-memory
While self-improving-agent tracks task-level reflections, elite-longterm-memory stores factual knowledge: your project architecture, your team's conventions, decisions you've made and why.
Configure them to share a memory directory:
// .claude/self-improve.config.json
{
  "memory_store": ".claude/memory/reflections.json"
}
// .claude/longterm-memory.config.json
{
  "memory_store": ".claude/memory/knowledge.json",
  "cross_reference": ".claude/memory/reflections.json"
}
With both active, the agent has two memory systems: episodic (what happened in past sessions) and semantic (what it knows about your project). Combined, they produce an agent that behaves like a senior developer who's been working in your codebase for months.
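One way to picture the combined effect: at session start, lessons from the episodic store and facts from the semantic store are merged into a single context preamble. The section headers and function name below are illustrative; the real injection format isn't documented here.

```typescript
// Hypothetical sketch of assembling a session preamble from both memory
// systems: episodic lessons (reflections.json) plus semantic project facts
// (knowledge.json). Formatting is an assumption for illustration.
function buildSessionContext(lessons: string[], facts: string[]): string {
  return [
    "Lessons from past sessions:",
    ...lessons.map((l) => `- ${l}`),
    "Project knowledge:",
    ...facts.map((f) => `- ${f}`),
  ].join("\n");
}
```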
Practical Example: A Week of Improvement
Here's what the improvement curve looks like in practice over a week of daily use:
Day 1: Agent writes boilerplate auth code. Missing test coverage. Reflection logged.
Day 2: Reflection loaded. Agent proactively suggests test skeleton before implementing auth. Test coverage improves.
Day 3: Agent spots a performance issue in its own output before you do. References day 1 reflection: "I learned to check for Server Component alternatives."
Day 5: You ask for a completely different feature. Agent applies cross-domain lessons: "Based on past patterns, I'll write the interface definition first, then the implementation."
Day 7: Agent catches a type safety issue it would have missed on day 1. No prompting required.
Resetting the Memory Store
If the agent has learned bad habits or you're starting a new project:
/self-improve reset
This clears all reflections and starts fresh. You can also archive the store before resetting:
/self-improve export --output ./agent-memory-backup.json
/self-improve reset
Troubleshooting
Reflections not loading on session start — Check that .claude/memory/reflections.json exists and is valid JSON. Run /self-improve validate.
Too many irrelevant memories loading — Increase relevance_threshold to 0.8 or higher.
Agent not improving on a specific issue — The issue may not be captured in reflections. Trigger a manual reflection immediately after the problem occurs: /self-improve reflect --note "Focus on X pattern".
Next Steps
- Pair this skill with persistent memory skills for a complete memory architecture
- Read about proactive agent behavior to complement the self-improvement loop
- Explore the full skill catalog for skills that feed data into the agent's reflections