# Feedback Loop Pattern: Self-Improving Skills

Learn the feedback loop pattern for AI skills that iteratively refine their outputs through evaluation and improvement cycles for quality-critical tasks.
Some tasks cannot be done well in a single pass. Writing polished prose, generating bug-free code, or crafting effective prompts often requires iteration—produce something, evaluate it, identify improvements, and try again. The feedback loop pattern captures this iterative refinement process.
Unlike single-shot skills that produce output once, or chain patterns that move forward through stages, feedback loops cycle back. They generate, evaluate, refine, and repeat until quality criteria are met. This pattern is essential for quality-critical tasks where "good enough" is not good enough.
In this guide, we will explore how to design and implement feedback loop skills that progressively improve their outputs through intelligent self-evaluation.
## Understanding the Feedback Loop Pattern
The feedback loop follows a cyclical flow:
```
Input → [Generate] → [Evaluate] → Quality OK? ──Yes──→ Output
              ↑                        │
              │           No           │
              └────────────────────────┘
```
Each cycle:
- Generates or refines output
- Evaluates against quality criteria
- Decides whether to continue or stop
- If continuing, feeds improvements back for next cycle
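Sketched in TypeScript, the cycle reduces to a small driver loop. Everything here is illustrative: `generate` and `evaluate` stand in for model calls and are stubbed so the control flow runs end to end, and the threshold and iteration cap are assumed defaults.

```typescript
type Result = { output: string; score: number; iterations: number };

// Minimal feedback-loop driver: generate, evaluate, check the quality
// gate, and feed the evaluation back into the next generation pass.
function runLoop(
  input: string,
  generate: (input: string, feedback: string[]) => string,
  evaluate: (output: string) => { score: number; feedback: string[] },
  threshold = 0.85,
  maxIterations = 5
): Result {
  let feedback: string[] = [];
  let output = "";
  let score = 0;
  let i = 0;
  for (; i < maxIterations; i++) {
    output = generate(input, feedback);   // generate or refine
    const result = evaluate(output);      // score against quality criteria
    score = result.score;
    feedback = result.feedback;           // feeds the next cycle
    if (score >= threshold) break;        // quality gate met
  }
  return { output, score, iterations: Math.min(i + 1, maxIterations) };
}

// Stub generator/evaluator: each pass appends one "!", and the score
// grows with output length, so the loop converges after two passes.
let passes = 0;
const result = runLoop(
  "draft",
  (input) => {
    passes++;
    return input + "!".repeat(passes);
  },
  (out) => ({ score: Math.min(1, out.length / 8), feedback: ["more polish"] })
);
```

The driver is agnostic to what "generate" and "evaluate" actually do; the rest of this guide is about designing those two phases well.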
### Core Characteristics
Feedback loop skills have distinct properties:

- Iterative Refinement: Output improves through multiple passes rather than one attempt.
- Quality Gates: Clear criteria determine when output is good enough.
- Self-Evaluation: The skill can assess its own work against standards.
- Bounded Iteration: Limits prevent infinite loops even when quality goals are not met.
- Progressive Improvement: Each iteration should measurably improve the output.
### When to Use a Feedback Loop
The feedback loop pattern excels when:
- Quality cannot be guaranteed in a single pass
- Clear evaluation criteria exist
- Improvements can be identified from evaluation
- The cost of iteration is worth the quality gain
- Tasks involve creative or nuanced judgment
### When Not to Use a Feedback Loop
Avoid feedback loops when:
- Single-pass quality is acceptable
- Evaluation criteria are unclear or subjective
- Improvements are not actionable
- Time constraints prohibit iteration
- The task is fundamentally one-shot (like data extraction)
## Designing Feedback Loop Skills
Effective feedback loops require careful design of generation, evaluation, and improvement logic.
### The Generate Phase
The generator produces or refines content:
#### Initial Generation

The first pass creates a baseline output:

- Use available context fully
- Follow all explicit constraints
- Apply known best practices
- Aim for completeness over perfection

#### Refinement Generation

Subsequent passes improve the existing output:

- Address specific feedback points
- Preserve what is already good
- Make targeted improvements
- Track what changed and why

#### Generator State

Maintain generation context across iterations:
```typescript
interface GeneratorState {
  iteration: number;
  currentOutput: string;
  history: Array<{
    iteration: number;
    output: string;
    evaluation: Evaluation;
    changes: string[];
  }>;
  focusAreas: string[];
}
```
### The Evaluate Phase
The evaluator assesses output quality:
#### Quality Dimensions
Define what "good" means across multiple dimensions:
```yaml
dimensions:
  accuracy:
    description: "Factually correct, no errors"
    weight: 0.3
    threshold: 0.9
  completeness:
    description: "Covers all required aspects"
    weight: 0.25
    threshold: 0.85
  clarity:
    description: "Easy to understand"
    weight: 0.2
    threshold: 0.8
  style:
    description: "Matches requested tone and format"
    weight: 0.15
    threshold: 0.75
  conciseness:
    description: "No unnecessary content"
    weight: 0.1
    threshold: 0.7
```
#### Evaluation Output
```typescript
interface Evaluation {
  overallScore: number; // 0.0 to 1.0
  passesThreshold: boolean;
  dimensions: {
    [dimension: string]: {
      score: number;
      threshold: number;
      passes: boolean;
      feedback: string;
    };
  };
  improvements: Array<{
    priority: "high" | "medium" | "low";
    dimension: string;
    issue: string;
    suggestion: string;
    location?: string;
  }>;
  strengths: string[];
}
```
#### Evaluation Criteria
Criteria should be:
- Measurable: Can assign scores objectively
- Actionable: Low scores suggest specific fixes
- Relevant: Tied to actual quality goals
- Balanced: Not overly emphasizing one aspect
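One way to make such criteria concrete is a weighted aggregate with per-dimension thresholds. This is a minimal sketch using the dimension names and weights from the YAML above; the sample scores are made up:

```typescript
interface DimensionSpec {
  weight: number;    // contribution to the overall score (weights sum to 1.0)
  threshold: number; // minimum acceptable score for this dimension
}

// Weighted average of dimension scores, plus an all-dimensions pass flag.
function aggregate(
  specs: Record<string, DimensionSpec>,
  scores: Record<string, number>
): { overallScore: number; passesAll: boolean } {
  let overall = 0;
  let passesAll = true;
  for (const [name, spec] of Object.entries(specs)) {
    const score = scores[name] ?? 0;
    overall += score * spec.weight;
    if (score < spec.threshold) passesAll = false;
  }
  return { overallScore: overall, passesAll };
}

const specs = {
  accuracy: { weight: 0.3, threshold: 0.9 },
  completeness: { weight: 0.25, threshold: 0.85 },
  clarity: { weight: 0.2, threshold: 0.8 },
  style: { weight: 0.15, threshold: 0.75 },
  conciseness: { weight: 0.1, threshold: 0.7 },
};

// Hypothetical evaluator output: completeness misses its threshold,
// so the overall result cannot pass even though the average is decent.
const result = aggregate(specs, {
  accuracy: 0.9,
  completeness: 0.8,
  clarity: 0.8,
  style: 0.8,
  conciseness: 0.7,
});
```

Requiring every dimension to pass its own threshold, rather than relying on the weighted average alone, prevents one strong dimension from masking a weak one.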
### The Improve Phase
The improver translates evaluation into action:
#### Prioritization
Address issues in order of impact:
1. Critical errors that make output unusable
2. High-priority issues affecting core quality
3. Medium-priority issues affecting polish
4. Low-priority nice-to-haves
#### Improvement Actions
For each identified issue:
```typescript
interface Improvement {
  issue: string;
  action: "rewrite" | "add" | "remove" | "modify" | "restructure";
  target: string;
  approach: string;
  expectedImpact: number; // Expected score improvement
}
```
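The prioritization order above can be sketched as a sort over planned fixes. The `PlannedFix` shape and the per-iteration cap are illustrative assumptions, not part of the interface above:

```typescript
type Priority = "high" | "medium" | "low";

interface PlannedFix {
  priority: Priority;
  expectedImpact: number; // expected score improvement
  issue: string;
}

const RANK: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

// Order fixes by priority, then by expected impact, and cap how many
// are attempted per iteration to keep changes reviewable and convergent.
function prioritize(fixes: PlannedFix[], limit = 3): PlannedFix[] {
  return [...fixes]
    .sort(
      (a, b) =>
        RANK[a.priority] - RANK[b.priority] ||
        b.expectedImpact - a.expectedImpact
    )
    .slice(0, limit);
}

const ordered = prioritize([
  { priority: "low", expectedImpact: 0.01, issue: "rename temp var" },
  { priority: "high", expectedImpact: 0.05, issue: "fix O(n^2) loop" },
  { priority: "high", expectedImpact: 0.1, issue: "validate input" },
  { priority: "medium", expectedImpact: 0.03, issue: "use map()" },
]);
```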
#### Convergence Strategy
Ensure improvements lead to convergence:
- Fix the most impactful issues first
- Don't over-optimize on one dimension
- Stop if improvements become marginal
- Preserve previously achieved quality
### Termination Conditions
Define when the loop should stop:
#### Success Conditions
Stop when any of these are met:
- Overall score exceeds threshold
- All dimensions pass their thresholds
- User-defined quality goal achieved
#### Failure Conditions
Stop (with partial result) when:
- Maximum iterations reached
- No improvement for N consecutive iterations
- Critical error that cannot be fixed
- Resource budget exhausted
#### Configuration

```yaml
termination:
  maxIterations: 5
  qualityThreshold: 0.85
  minImprovement: 0.02
  stallIterations: 2
  tokenBudget: 50000
```
#### Termination Result

```typescript
interface TerminationResult {
  reason:
    | "quality_achieved"
    | "max_iterations"
    | "no_improvement"
    | "budget_exhausted"
    | "critical_error";
  finalScore: number;
  iterationsUsed: number;
  improvement: number; // Difference between the first and last scores
}
```
## Implementing Feedback Loop Skills
Let us build a complete feedback loop skill.
### Example: Code Quality Improver
```markdown
---
name: code-quality-improver
description: Iteratively improves code quality through feedback cycles
version: 1.0.0
---

# Code Quality Improver

Improve code through iterative refinement based on quality evaluation.
```

#### Overview

```
Code → [Improve] ←──────┐
           ↓            │
       [Evaluate]       │
           ↓            │
        Passes? ──No────┘
           │ Yes
           ↓
        [Output]
```
#### Input
```json
{
  "code": "Source code to improve",
  "language": "Programming language",
  "goals": {
    "readability": true,
    "performance": true,
    "security": true,
    "bestPractices": true
  },
  "constraints": {
    "preserveLogic": true,
    "maxIterations": 5,
    "minQuality": 0.85
  }
}
```
#### Phase 1: Generate/Improve

##### Initial Generation
For iteration 1, analyze and improve:
- Identify improvement opportunities
- Apply straightforward fixes
- Add missing documentation
- Improve naming
- Restructure for clarity
##### Refinement Generation
For subsequent iterations, address specific feedback:
```markdown
## Refinement Instructions

Given the evaluation feedback:

{evaluation.improvements}

Current code:

{currentCode}

Apply improvements in priority order:

1. Address high-priority issues first
2. Make minimal changes to fix each issue
3. Preserve working code
4. Document significant changes

For each improvement:

- Identify the exact location
- Apply the fix
- Verify it addresses the feedback
- Check it doesn't break other aspects
```
##### Improvement Categories

**Readability Improvements:**

- Better variable/function names
- Clearer code structure
- Helpful comments
- Consistent formatting

**Performance Improvements:**

- Algorithm optimization
- Reduce redundant operations
- Efficient data structures
- Lazy evaluation where appropriate

**Security Improvements:**

- Input validation
- Safe defaults
- Proper error handling
- Remove vulnerabilities

**Best Practice Improvements:**

- Language idioms
- Design patterns
- Error handling
- Testing considerations
#### Phase 2: Evaluate

##### Quality Dimensions
```yaml
readability:
  weight: 0.3
  checks:
    - naming: "Variables and functions have clear, descriptive names"
    - structure: "Code is logically organized"
    - comments: "Complex logic is explained"
    - formatting: "Consistent, readable formatting"
performance:
  weight: 0.25
  checks:
    - algorithms: "Efficient algorithms used"
    - operations: "No unnecessary computations"
    - memory: "Memory used efficiently"
    - complexity: "Acceptable time complexity"
security:
  weight: 0.25
  checks:
    - input: "All inputs validated"
    - errors: "Errors handled safely"
    - secrets: "No hardcoded secrets"
    - vulnerabilities: "Common vulnerabilities avoided"
bestPractices:
  weight: 0.2
  checks:
    - idioms: "Language idioms followed"
    - patterns: "Appropriate patterns used"
    - modularity: "Code is modular and reusable"
    - testability: "Code is testable"
```
##### Evaluation Process
```markdown
## Evaluate the improved code

For each quality dimension:

1. **Score each check** (0.0 to 1.0)
   - 1.0: Excellent, no issues
   - 0.8: Good, minor issues
   - 0.6: Adequate, some issues
   - 0.4: Poor, significant issues
   - 0.2: Very poor, major issues
   - 0.0: Critical failure

2. **Aggregate the dimension score**
   Average of check scores.

3. **Identify specific issues**
   For scores below 0.8, describe:
   - What the issue is
   - Where it occurs
   - How to fix it
   - Priority (high/medium/low)

4. **Note strengths**
   What is done well and should be preserved.

5. **Calculate the overall score**
   Weighted average of dimension scores.
```
##### Evaluation Output
```json
{
  "iteration": 2,
  "overallScore": 0.78,
  "passesThreshold": false,
  "dimensions": {
    "readability": {
      "score": 0.85,
      "passes": true,
      "feedback": "Good naming, clear structure"
    },
    "performance": {
      "score": 0.65,
      "passes": false,
      "feedback": "Nested loop could be optimized"
    },
    "security": {
      "score": 0.80,
      "passes": true,
      "feedback": "Input validation present"
    },
    "bestPractices": {
      "score": 0.75,
      "passes": false,
      "feedback": "Could use more idiomatic patterns"
    }
  },
  "improvements": [
    {
      "priority": "high",
      "dimension": "performance",
      "issue": "O(n^2) nested loop in processItems()",
      "suggestion": "Use a Set for O(1) lookup instead of array.includes()",
      "location": "lines 15-20"
    },
    {
      "priority": "medium",
      "dimension": "bestPractices",
      "issue": "Manual iteration where map() would be clearer",
      "suggestion": "Replace for loop with array.map()",
      "location": "lines 25-30"
    }
  ],
  "strengths": [
    "Clear function names",
    "Good error messages",
    "Comprehensive input validation"
  ]
}
```
#### Phase 3: Decide

##### Continue or Stop
```typescript
function shouldContinue(state: LoopState): Decision {
  // Note: `eval` is a reserved word in strict mode, so use a longer name.
  const evaluation = state.currentEvaluation;
  const prevScore =
    state.history[state.iteration - 1]?.evaluation.overallScore ?? 0;

  // Success: quality achieved
  if (evaluation.overallScore >= state.config.minQuality) {
    return { continue: false, reason: "quality_achieved" };
  }

  // Failure: max iterations reached
  if (state.iteration >= state.config.maxIterations) {
    return { continue: false, reason: "max_iterations" };
  }

  // Failure: no improvement for consecutive iterations
  const improvement = evaluation.overallScore - prevScore;
  if (improvement < state.config.minImprovement && state.iteration > 1) {
    state.stallCount++;
    if (state.stallCount >= 2) {
      return { continue: false, reason: "no_improvement" };
    }
  } else {
    state.stallCount = 0;
  }

  // Continue: more improvement possible
  return {
    continue: true,
    focus: evaluation.improvements.slice(0, 3), // Top 3 issues
  };
}
```
#### Loop Execution

##### Full Cycle
```
Iteration 1:
  Generate: Create improved version
  Evaluate: Score 0.65
  Decide:   Continue, focus on performance

Iteration 2:
  Improve:  Fix performance issues
  Evaluate: Score 0.78
  Decide:   Continue, focus on best practices

Iteration 3:
  Improve:  Apply idiomatic patterns
  Evaluate: Score 0.87
  Decide:   Stop, quality achieved
```
##### State Tracking
```json
{
  "iterations": 3,
  "scoreProgression": [0.65, 0.78, 0.87],
  "improvementsMade": [
    "Replaced nested loop with Set lookup",
    "Added input validation to processItems()",
    "Converted for loops to map/filter",
    "Added JSDoc comments to public functions"
  ],
  "finalQuality": 0.87,
  "terminationReason": "quality_achieved"
}
```
#### Output
```json
{
  "success": true,
  "improvedCode": "... the final improved code ...",
  "summary": {
    "iterations": 3,
    "initialScore": 0.45,
    "finalScore": 0.87,
    "improvement": 0.42,
    "terminationReason": "quality_achieved"
  },
  "changes": [
    {
      "type": "performance",
      "description": "Optimized nested loop to O(n)",
      "location": "processItems()",
      "impact": "high"
    },
    {
      "type": "bestPractices",
      "description": "Used array methods instead of loops",
      "location": "multiple",
      "impact": "medium"
    }
  ],
  "qualityReport": {
    "readability": { "score": 0.90, "status": "excellent" },
    "performance": { "score": 0.85, "status": "good" },
    "security": { "score": 0.85, "status": "good" },
    "bestPractices": { "score": 0.88, "status": "good" }
  },
  "metadata": {
    "processingTime": 4500,
    "tokensUsed": 12000,
    "version": "1.0.0"
  }
}
```
## Advanced Feedback Loop Techniques
### Multi-Evaluator Approach
Use multiple evaluators for comprehensive assessment:
#### Specialized Evaluators

```yaml
evaluators:
  syntactic:
    focus: "Code correctness and syntax"
    weight: 0.2
  semantic:
    focus: "Logic and behavior"
    weight: 0.3
  stylistic:
    focus: "Code style and readability"
    weight: 0.2
  practical:
    focus: "Real-world usability"
    weight: 0.3
```
#### Aggregation
Combine evaluator outputs:
- Run all evaluators in parallel
- Weight and combine scores
- Merge improvement suggestions
- Resolve conflicting feedback
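A minimal sketch of that aggregation, assuming each evaluator reports a weighted score plus a list of suggestions (the names and weights mirror the YAML above; the sample scores are invented):

```typescript
interface EvaluatorOutput {
  name: string;
  weight: number;
  score: number;
  suggestions: string[];
}

// Combine parallel evaluator results into one weighted score and a
// de-duplicated suggestion list.
function combine(outputs: EvaluatorOutput[]): {
  overallScore: number;
  suggestions: string[];
} {
  const totalWeight = outputs.reduce((sum, o) => sum + o.weight, 0);
  const overallScore =
    outputs.reduce((sum, o) => sum + o.score * o.weight, 0) / totalWeight;
  const suggestions = [...new Set(outputs.flatMap((o) => o.suggestions))];
  return { overallScore, suggestions };
}

const combined = combine([
  { name: "syntactic", weight: 0.2, score: 1.0, suggestions: [] },
  { name: "semantic", weight: 0.3, score: 0.8, suggestions: ["guard nulls"] },
  { name: "stylistic", weight: 0.2, score: 0.6, suggestions: ["rename x"] },
  { name: "practical", weight: 0.3, score: 0.8, suggestions: ["guard nulls"] },
]);
```

Dividing by the summed weights keeps the score normalized even if an evaluator is disabled; de-duplicating suggestions avoids asking the generator to make the same fix twice.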
#### Conflict Resolution
When evaluators disagree:
- Higher priority evaluator wins
- Semantic > Stylistic for logic issues
- Stylistic > Semantic for pure formatting
- Flag conflicts for human review
### Adaptive Iteration
Adjust strategy based on progress:
#### Progress Monitoring
Track improvement rate:
```typescript
// Average score gain per iteration over the recorded history.
function calculateImprovementRate(history: History): number {
  if (history.length < 2) return 0;
  const first = history[0].evaluation.overallScore;
  const last = history[history.length - 1].evaluation.overallScore;
  return (last - first) / (history.length - 1);
}

function getStrategy(history: History): Strategy {
  const rate = calculateImprovementRate(history);
  if (rate > 0.1) {
    return "aggressive"; // Big improvements possible
  } else if (rate > 0.03) {
    return "standard"; // Normal progress
  } else if (rate > 0) {
    return "careful"; // Diminishing returns
  } else {
    return "stop"; // No improvement
  }
}
```
#### Strategy Adjustment
```yaml
aggressive:
  changesPerIteration: 5-7
  riskTolerance: high
  focusOn: "biggest issues"
standard:
  changesPerIteration: 3-5
  riskTolerance: medium
  focusOn: "balanced improvement"
careful:
  changesPerIteration: 1-2
  riskTolerance: low
  focusOn: "safe wins only"
```
### Rollback Capability
Support reverting bad changes:
#### Checkpoint System
Save state at each iteration:
```typescript
interface Checkpoint {
  iteration: number;
  output: string;
  evaluation: Evaluation;
  timestamp: number;
}
```
#### Rollback Triggers
- Score decreased significantly
- Critical dimension failed that previously passed
- Introduced new errors
#### Rollback Process

1. Detect the regression
2. Identify the last good checkpoint
3. Restore from that checkpoint
4. Log what caused the regression
5. Adjust strategy to avoid the same mistake
6. Continue from the good state
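The detect-and-restore steps might look like this sketch, using a simplified checkpoint shape with a flat score and an assumed regression margin of 0.05:

```typescript
interface ScoredCheckpoint {
  iteration: number;
  output: string;
  score: number; // simplified: overall score only
}

// Return the checkpoint to restore if the latest iteration regressed
// by more than `margin` below the best earlier result; otherwise null.
function detectRollback(
  history: ScoredCheckpoint[],
  margin = 0.05
): ScoredCheckpoint | null {
  if (history.length < 2) return null;
  const latest = history[history.length - 1];
  // Compare against the best previous checkpoint, not just the last one.
  const best = history
    .slice(0, -1)
    .reduce((a, b) => (b.score > a.score ? b : a));
  return latest.score < best.score - margin ? best : null;
}

const restore = detectRollback([
  { iteration: 1, output: "v1", score: 0.7 },
  { iteration: 2, output: "v2", score: 0.8 },
  { iteration: 3, output: "v3", score: 0.6 }, // regression
]);
```

Comparing against the best checkpoint seen so far, rather than only the previous iteration, prevents a slow slide downward from going unnoticed.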
#### Rollback Limits
- Maximum 2 rollbacks per session
- After limit, return best-seen result
## Testing Feedback Loop Skills
Testing loops requires verifying convergence and quality.
### Convergence Testing
#### Guaranteed Convergence

- **Input:** Any valid code
- **Expected:** Terminates within `maxIterations`

#### Quality Convergence

- **Input:** Code with known issues
- **Expected:** Score increases each iteration (until threshold or stall)

#### Stall Detection

- **Input:** Already optimal code
- **Expected:** Detects no improvement and stops early

#### Regression Prevention

- **Input:** Code where a naive fix breaks other aspects
- **Expected:** Does not regress, or rolls back
### Quality Testing

#### Known Issues

- **Input:** Code with specific planted issues
- **Expected:** Identifies and fixes the issues

#### Score Accuracy

- **Input:** A range of code quality levels
- **Expected:** Scores correlate with actual quality

#### Improvement Validity

- **Input:** Code with known improvements
- **Expected:** Suggestions match the expected improvements
### Performance Testing

#### Token Efficiency

Measure tokens spent per quality point gained.

#### Iteration Efficiency

Track the iterations needed to close various quality gaps.

#### Resource Bounds

Verify the loop stays within configured limits.
## Real-World Feedback Loop Examples

### Documentation Improver
```markdown
---
name: doc-improver
description: Iteratively improves documentation quality
---

# Documentation Improver

## Loop Structure

Generate → Evaluate → Improve → Repeat

## Quality Dimensions

- Completeness: All features documented
- Clarity: Easy to understand
- Accuracy: Matches actual behavior
- Examples: Helpful code samples
- Structure: Logical organization

## Termination

- Quality > 0.9 across all dimensions
- Maximum 4 iterations
- Stall after 2 iterations with < 0.05 improvement
```
### Prompt Optimizer
```markdown
---
name: prompt-optimizer
description: Iteratively refines prompts for better results
---

# Prompt Optimizer

## Loop Structure

Test → Evaluate → Refine → Repeat

## Quality Dimensions

- Effectiveness: Achieves intended result
- Consistency: Produces reliable outputs
- Efficiency: Uses tokens wisely
- Robustness: Handles edge cases

## Testing Phase

Run the prompt against test cases and collect results.

## Evaluation Phase

Compare results to expected outcomes.

## Refinement Phase

Adjust the prompt based on failures.
```
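The testing and evaluation phases of such an optimizer could be driven by a small harness like this sketch. The `runPrompt` function is a hypothetical stub standing in for a model call:

```typescript
interface TestCase {
  input: string;
  expected: string;
}

// Run a prompt against test cases and report the pass rate, which can
// feed the effectiveness and consistency dimensions of the evaluator.
function scorePrompt(
  runPrompt: (prompt: string, input: string) => string,
  prompt: string,
  cases: TestCase[]
): { passRate: number; failures: TestCase[] } {
  const failures = cases.filter(
    (c) => runPrompt(prompt, c.input) !== c.expected
  );
  return { passRate: 1 - failures.length / cases.length, failures };
}

// Stub model: uppercases the input, which matches only some expectations.
const report = scorePrompt(
  (_prompt, input) => input.toUpperCase(),
  "Uppercase the input.",
  [
    { input: "ok", expected: "OK" },
    { input: "a b", expected: "A B" },
    { input: "1x", expected: "1x" }, // stub returns "1X", so this fails
  ]
);
```

The failures list is the refinement phase's input: each failing case tells the optimizer what behavior the next prompt revision needs to fix.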
## Conclusion
The feedback loop pattern transforms AI skills from single-shot attempts into iterative refinement engines. By generating, evaluating, and improving in cycles, skills can achieve quality levels that one-pass processing cannot reach.
Key principles for effective feedback loops:
- Clear quality criteria: Define measurable dimensions with thresholds
- Actionable evaluation: Feedback must suggest specific improvements
- Progressive improvement: Each iteration should measurably improve output
- Bounded iteration: Prevent infinite loops with clear termination conditions
- Rollback capability: Recover from improvements that make things worse
Start with simple loops—generate, evaluate against one or two criteria, improve. As you gain confidence, add more evaluation dimensions, adaptive strategies, and sophisticated improvement logic.
The feedback loop pattern is your tool for achieving excellence through iteration, turning good enough into genuinely good.