# Feedback Loop Pattern: Self-Improving Skills

Learn the feedback loop pattern for AI skills that iteratively refine their outputs through evaluation and improvement cycles for quality-critical tasks.
Some tasks cannot be done well in a single pass. Writing polished prose, generating bug-free code, or crafting effective prompts often requires iteration—produce something, evaluate it, identify improvements, and try again. The feedback loop pattern captures this iterative refinement process.
Unlike single-shot skills that produce output once, or chain patterns that move forward through stages, feedback loops cycle back. They generate, evaluate, refine, and repeat until quality criteria are met. This pattern is essential for quality-critical tasks where "good enough" is not good enough.
In this guide, we will explore how to design and implement feedback loop skills that progressively improve their outputs through intelligent self-evaluation.
## Understanding the Feedback Loop Pattern
The feedback loop follows a cyclical flow:
```
Input → [Generate] → [Evaluate] → Quality OK? ──Yes──→ Output
              ↑                        │
              │           No           │
              └────────────────────────┘
```
Each cycle:
- Generates or refines output
- Evaluates against quality criteria
- Decides whether to continue or stop
- If continuing, feeds improvements back for next cycle
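Sketched in TypeScript, the cycle reduces to a small driver loop. Everything here is illustrative: `generate` and `evaluate` stand in for model calls and are stubbed so the control flow runs end to end, and the threshold and iteration cap are assumed defaults.

```typescript
type Result = { output: string; score: number; iterations: number };

// Minimal feedback-loop driver: generate, evaluate, check the quality
// gate, and feed the evaluation back into the next generation pass.
function runLoop(
  input: string,
  generate: (input: string, feedback: string[]) => string,
  evaluate: (output: string) => { score: number; feedback: string[] },
  threshold = 0.85,
  maxIterations = 5
): Result {
  let feedback: string[] = [];
  let output = "";
  let score = 0;
  let i = 0;
  for (; i < maxIterations; i++) {
    output = generate(input, feedback);   // generate or refine
    const result = evaluate(output);      // score against quality criteria
    score = result.score;
    feedback = result.feedback;           // feeds the next cycle
    if (score >= threshold) break;        // quality gate met
  }
  return { output, score, iterations: Math.min(i + 1, maxIterations) };
}

// Stub generator/evaluator: each pass appends one "!", and the score
// grows with output length, so the loop converges after two passes.
let passes = 0;
const result = runLoop(
  "draft",
  (input) => {
    passes++;
    return input + "!".repeat(passes);
  },
  (out) => ({ score: Math.min(1, out.length / 8), feedback: ["more polish"] })
);
```

The driver is agnostic to what "generate" and "evaluate" actually do; the rest of this guide is about designing those two phases well.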
### Core Characteristics
Feedback loop skills have distinct properties:

- Iterative Refinement: Output improves through multiple passes rather than one attempt.
- Quality Gates: Clear criteria determine when output is good enough.
- Self-Evaluation: The skill can assess its own work against standards.
- Bounded Iteration: Limits prevent infinite loops even when quality goals are not met.
- Progressive Improvement: Each iteration should measurably improve the output.
### When to Use a Feedback Loop
The feedback loop pattern excels when:
- Quality cannot be guaranteed in a single pass
- Clear evaluation criteria exist
- Improvements can be identified from evaluation
- The cost of iteration is worth the quality gain
- Tasks involve creative or nuanced judgment
### When Not to Use a Feedback Loop
Avoid feedback loops when:
- Single-pass quality is acceptable
- Evaluation criteria are unclear or subjective
- Improvements are not actionable
- Time constraints prohibit iteration
- The task is fundamentally one-shot (like data extraction)
## Designing Feedback Loop Skills
Effective feedback loops require careful design of generation, evaluation, and improvement logic.
### The Generate Phase
The generator produces or refines content:
#### Initial Generation

The first pass creates a baseline output:

- Use available context fully
- Follow all explicit constraints
- Apply known best practices
- Aim for completeness over perfection

#### Refinement Generation

Subsequent passes improve the existing output:

- Address specific feedback points
- Preserve what is already good
- Make targeted improvements
- Track what changed and why

#### Generator State

Maintain generation context across iterations:
```typescript
interface GeneratorState {
  iteration: number;
  currentOutput: string;
  history: Array<{
    iteration: number;
    output: string;
    evaluation: Evaluation;
    changes: string[];
  }>;
  focusAreas: string[];
}
```
### The Evaluate Phase
The evaluator assesses output quality:
#### Quality Dimensions
Define what "good" means across multiple dimensions:
```yaml
dimensions:
  accuracy:
    description: "Factually correct, no errors"
    weight: 0.3
    threshold: 0.9
  completeness:
    description: "Covers all required aspects"
    weight: 0.25
    threshold: 0.85
  clarity:
    description: "Easy to understand"
    weight: 0.2
    threshold: 0.8
  style:
    description: "Matches requested tone and format"
    weight: 0.15
    threshold: 0.75
  conciseness:
    description: "No unnecessary content"
    weight: 0.1
    threshold: 0.7
```
#### Evaluation Output
```typescript
interface Evaluation {
  overallScore: number; // 0.0 to 1.0
  passesThreshold: boolean;
  dimensions: {
    [dimension: string]: {
      score: number;
      threshold: number;
      passes: boolean;
      feedback: string;
    };
  };
  improvements: Array<{
    priority: "high" | "medium" | "low";
    dimension: string;
    issue: string;
    suggestion: string;
    location?: string;
  }>;
  strengths: string[];
}
```
#### Evaluation Criteria
Criteria should be:
- Measurable: Can assign scores objectively
- Actionable: Low scores suggest specific fixes
- Relevant: Tied to actual quality goals
- Balanced: Not overly emphasizing one aspect
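One way to make such criteria concrete is a weighted aggregate with per-dimension thresholds. This is a minimal sketch using the dimension names and weights from the YAML above; the sample scores are made up:

```typescript
interface DimensionSpec {
  weight: number;    // contribution to the overall score (weights sum to 1.0)
  threshold: number; // minimum acceptable score for this dimension
}

// Weighted average of dimension scores, plus an all-dimensions pass flag.
function aggregate(
  specs: Record<string, DimensionSpec>,
  scores: Record<string, number>
): { overallScore: number; passesAll: boolean } {
  let overall = 0;
  let passesAll = true;
  for (const [name, spec] of Object.entries(specs)) {
    const score = scores[name] ?? 0;
    overall += score * spec.weight;
    if (score < spec.threshold) passesAll = false;
  }
  return { overallScore: overall, passesAll };
}

const specs = {
  accuracy: { weight: 0.3, threshold: 0.9 },
  completeness: { weight: 0.25, threshold: 0.85 },
  clarity: { weight: 0.2, threshold: 0.8 },
  style: { weight: 0.15, threshold: 0.75 },
  conciseness: { weight: 0.1, threshold: 0.7 },
};

// Hypothetical evaluator output: completeness misses its threshold,
// so the overall result cannot pass even though the average is decent.
const result = aggregate(specs, {
  accuracy: 0.9,
  completeness: 0.8,
  clarity: 0.8,
  style: 0.8,
  conciseness: 0.7,
});
```

Requiring every dimension to pass its own threshold, rather than relying on the weighted average alone, prevents one strong dimension from masking a weak one.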
### The Improve Phase
The improver translates evaluation into action:
#### Prioritization
Address issues in order of impact:
1. Critical errors that make output unusable
2. High-priority issues affecting core quality
3. Medium-priority issues affecting polish
4. Low-priority nice-to-haves
#### Improvement Actions
For each identified issue:
```typescript
interface Improvement {
  issue: string;
  action: "rewrite" | "add" | "remove" | "modify" | "restructure";
  target: string;
  approach: string;
  expectedImpact: number; // Expected score improvement
}
```
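The prioritization order above can be sketched as a sort over planned fixes. The `PlannedFix` shape and the per-iteration cap are illustrative assumptions, not part of the interface above:

```typescript
type Priority = "high" | "medium" | "low";

interface PlannedFix {
  priority: Priority;
  expectedImpact: number; // expected score improvement
  issue: string;
}

const RANK: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

// Order fixes by priority, then by expected impact, and cap how many
// are attempted per iteration to keep changes reviewable and convergent.
function prioritize(fixes: PlannedFix[], limit = 3): PlannedFix[] {
  return [...fixes]
    .sort(
      (a, b) =>
        RANK[a.priority] - RANK[b.priority] ||
        b.expectedImpact - a.expectedImpact
    )
    .slice(0, limit);
}

const ordered = prioritize([
  { priority: "low", expectedImpact: 0.01, issue: "rename temp var" },
  { priority: "high", expectedImpact: 0.05, issue: "fix O(n^2) loop" },
  { priority: "high", expectedImpact: 0.1, issue: "validate input" },
  { priority: "medium", expectedImpact: 0.03, issue: "use map()" },
]);
```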
#### Convergence Strategy
Ensure improvements lead to convergence:
- Fix the most impactful issues first
- Don't over-optimize on one dimension
- Stop if improvements become marginal
- Preserve previously achieved quality
### Termination Conditions
Define when the loop should stop:
#### Success Conditions
Stop when any of these are met:
- Overall score exceeds threshold
- All dimensions pass their thresholds
- User-defined quality goal achieved
#### Failure Conditions
Stop (with partial result) when:
- Maximum iterations reached
- No improvement for N consecutive iterations
- Critical error that cannot be fixed
- Resource budget exhausted
#### Configuration

```yaml
termination:
  maxIterations: 5
  qualityThreshold: 0.85
  minImprovement: 0.02
  stallIterations: 2
  tokenBudget: 50000
```
#### Termination Result

```typescript
interface TerminationResult {
  reason:
    | "quality_achieved"
    | "max_iterations"
    | "no_improvement"
    | "budget_exhausted"
    | "critical_error";
  finalScore: number;
  iterationsUsed: number;
  improvement: number; // Difference between the first and last scores
}
```
## Implementing Feedback Loop Skills
Let us build a complete feedback loop skill.
### Example: Code Quality Improver
```markdown
---
name: code-quality-improver
description: Iteratively improves code quality through feedback cycles
version: 1.0.0
---

# Code Quality Improver

Improve code through iterative refinement based on quality evaluation.
```

#### Overview

```
Code → [Improve] ←──────┐
           ↓            │
       [Evaluate]       │
           ↓            │
        Passes? ──No────┘
           │ Yes
           ↓
        [Output]
```
#### Input
```json
{
  "code": "Source code to improve",
  "language": "Programming language",
  "goals": {
    "readability": true,
    "performance": true,
    "security": true,
    "bestPractices": true
  },
  "constraints": {
    "preserveLogic": true,
    "maxIterations": 5,
    "minQuality": 0.85
  }
}
```
#### Phase 1: Generate/Improve

##### Initial Generation
For iteration 1, analyze and improve:
- Identify improvement opportunities
- Apply straightforward fixes
- Add missing documentation
- Improve naming
- Restructure for clarity
##### Refinement Generation
For subsequent iterations, address specific feedback:
```markdown
## Refinement Instructions

Given the evaluation feedback:

{evaluation.improvements}

Current code:

{currentCode}

Apply improvements in priority order:

1. Address high-priority issues first
2. Make minimal changes to fix each issue
3. Preserve working code
4. Document significant changes

For each improvement:

- Identify the exact location
- Apply the fix
- Verify it addresses the feedback
- Check it doesn't break other aspects
```
##### Improvement Categories

**Readability Improvements:**

- Better variable/function names
- Clearer code structure
- Helpful comments
- Consistent formatting

**Performance Improvements:**

- Algorithm optimization
- Reduce redundant operations
- Efficient data structures
- Lazy evaluation where appropriate

**Security Improvements:**

- Input validation
- Safe defaults
- Proper error handling
- Remove vulnerabilities

**Best Practice Improvements:**

- Language idioms
- Design patterns
- Error handling
- Testing considerations
#### Phase 2: Evaluate

##### Quality Dimensions
```yaml
readability:
  weight: 0.3
  checks:
    - naming: "Variables and functions have clear, descriptive names"
    - structure: "Code is logically organized"
    - comments: "Complex logic is explained"
    - formatting: "Consistent, readable formatting"
performance:
  weight: 0.25
  checks:
    - algorithms: "Efficient algorithms used"
    - operations: "No unnecessary computations"
    - memory: "Memory used efficiently"
    - complexity: "Acceptable time complexity"
security:
  weight: 0.25
  checks:
    - input: "All inputs validated"
    - errors: "Errors handled safely"
    - secrets: "No hardcoded secrets"
    - vulnerabilities: "Common vulnerabilities avoided"
bestPractices:
  weight: 0.2
  checks:
    - idioms: "Language idioms followed"
    - patterns: "Appropriate patterns used"
    - modularity: "Code is modular and reusable"
    - testability: "Code is testable"
```
##### Evaluation Process
```markdown
## Evaluate the improved code

For each quality dimension:

1. **Score each check** (0.0 to 1.0)
   - 1.0: Excellent, no issues
   - 0.8: Good, minor issues
   - 0.6: Adequate, some issues
   - 0.4: Poor, significant issues
   - 0.2: Very poor, major issues
   - 0.0: Critical failure

2. **Aggregate the dimension score**
   Average of check scores.

3. **Identify specific issues**
   For scores below 0.8, describe:
   - What the issue is
   - Where it occurs
   - How to fix it
   - Priority (high/medium/low)

4. **Note strengths**
   What is done well and should be preserved.

5. **Calculate the overall score**
   Weighted average of dimension scores.
```
##### Evaluation Output
```json
{
  "iteration": 2,
  "overallScore": 0.78,
  "passesThreshold": false,
  "dimensions": {
    "readability": {
      "score": 0.85,
      "passes": true,
      "feedback": "Good naming, clear structure"
    },
    "performance": {
      "score": 0.65,
      "passes": false,
      "feedback": "Nested loop could be optimized"
    },
    "security": {
      "score": 0.80,
      "passes": true,
      "feedback": "Input validation present"
    },
    "bestPractices": {
      "score": 0.75,
      "passes": false,
      "feedback": "Could use more idiomatic patterns"
    }
  },
  "improvements": [
    {
      "priority": "high",
      "dimension": "performance",
      "issue": "O(n^2) nested loop in processItems()",
      "suggestion": "Use a Set for O(1) lookup instead of array.includes()",
      "location": "lines 15-20"
    },
    {
      "priority": "medium",
      "dimension": "bestPractices",
      "issue": "Manual iteration where map() would be clearer",
      "suggestion": "Replace for loop with array.map()",
      "location": "lines 25-30"
    }
  ],
  "strengths": [
    "Clear function names",
    "Good error messages",
    "Comprehensive input validation"
  ]
}
```
#### Phase 3: Decide

##### Continue or Stop
```typescript
function shouldContinue(state: LoopState): Decision {
  // Note: `eval` is a reserved word in strict mode, so use a longer name.
  const evaluation = state.currentEvaluation;
  const prevScore =
    state.history[state.iteration - 1]?.evaluation.overallScore ?? 0;

  // Success: quality achieved
  if (evaluation.overallScore >= state.config.minQuality) {
    return { continue: false, reason: "quality_achieved" };
  }

  // Failure: max iterations reached
  if (state.iteration >= state.config.maxIterations) {
    return { continue: false, reason: "max_iterations" };
  }

  // Failure: no improvement for consecutive iterations
  const improvement = evaluation.overallScore - prevScore;
  if (improvement < state.config.minImprovement && state.iteration > 1) {
    state.stallCount++;
    if (state.stallCount >= 2) {
      return { continue: false, reason: "no_improvement" };
    }
  } else {
    state.stallCount = 0;
  }

  // Continue: more improvement possible
  return {
    continue: true,
    focus: evaluation.improvements.slice(0, 3), // Top 3 issues
  };
}
```
#### Loop Execution

##### Full Cycle
```
Iteration 1:
  Generate: Create improved version
  Evaluate: Score 0.65
  Decide:   Continue, focus on performance

Iteration 2:
  Improve:  Fix performance issues
  Evaluate: Score 0.78
  Decide:   Continue, focus on best practices

Iteration 3:
  Improve:  Apply idiomatic patterns
  Evaluate: Score 0.87
  Decide:   Stop, quality achieved
```
##### State Tracking
```json
{
  "iterations": 3,
  "scoreProgression": [0.65, 0.78, 0.87],
  "improvementsMade": [
    "Replaced nested loop with Set lookup",
    "Added input validation to processItems()",
    "Converted for loops to map/filter",
    "Added JSDoc comments to public functions"
  ],
  "finalQuality": 0.87,
  "terminationReason": "quality_achieved"
}
```
#### Output
```json
{
  "success": true,
  "improvedCode": "... the final improved code ...",
  "summary": {
    "iterations": 3,
    "initialScore": 0.45,
    "finalScore": 0.87,
    "improvement": 0.42,
    "terminationReason": "quality_achieved"
  },
  "changes": [
    {
      "type": "performance",
      "description": "Optimized nested loop to O(n)",
      "location": "processItems()",
      "impact": "high"
    },
    {
      "type": "bestPractices",
      "description": "Used array methods instead of loops",
      "location": "multiple",
      "impact": "medium"
    }
  ],
  "qualityReport": {
    "readability": { "score": 0.90, "status": "excellent" },
    "performance": { "score": 0.85, "status": "good" },
    "security": { "score": 0.85, "status": "good" },
    "bestPractices": { "score": 0.88, "status": "good" }
  },
  "metadata": {
    "processingTime": 4500,
    "tokensUsed": 12000,
    "version": "1.0.0"
  }
}
```
## Advanced Feedback Loop Techniques
### Multi-Evaluator Approach
Use multiple evaluators for comprehensive assessment:
#### Specialized Evaluators

```yaml
evaluators:
  syntactic:
    focus: "Code correctness and syntax"
    weight: 0.2
  semantic:
    focus: "Logic and behavior"
    weight: 0.3
  stylistic:
    focus: "Code style and readability"
    weight: 0.2
  practical:
    focus: "Real-world usability"
    weight: 0.3
```
#### Aggregation
Combine evaluator outputs:
- Run all evaluators in parallel
- Weight and combine scores
- Merge improvement suggestions
- Resolve conflicting feedback
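A minimal sketch of that aggregation, assuming each evaluator reports a weighted score plus a list of suggestions (the names and weights mirror the YAML above; the sample scores are invented):

```typescript
interface EvaluatorOutput {
  name: string;
  weight: number;
  score: number;
  suggestions: string[];
}

// Combine parallel evaluator results into one weighted score and a
// de-duplicated suggestion list.
function combine(outputs: EvaluatorOutput[]): {
  overallScore: number;
  suggestions: string[];
} {
  const totalWeight = outputs.reduce((sum, o) => sum + o.weight, 0);
  const overallScore =
    outputs.reduce((sum, o) => sum + o.score * o.weight, 0) / totalWeight;
  const suggestions = [...new Set(outputs.flatMap((o) => o.suggestions))];
  return { overallScore, suggestions };
}

const combined = combine([
  { name: "syntactic", weight: 0.2, score: 1.0, suggestions: [] },
  { name: "semantic", weight: 0.3, score: 0.8, suggestions: ["guard nulls"] },
  { name: "stylistic", weight: 0.2, score: 0.6, suggestions: ["rename x"] },
  { name: "practical", weight: 0.3, score: 0.8, suggestions: ["guard nulls"] },
]);
```

Dividing by the summed weights keeps the score normalized even if an evaluator is disabled; de-duplicating suggestions avoids asking the generator to make the same fix twice.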
#### Conflict Resolution
When evaluators disagree:
- Higher priority evaluator wins
- Semantic > Stylistic for logic issues
- Stylistic > Semantic for pure formatting
- Flag conflicts for human review
### Adaptive Iteration
Adjust strategy based on progress:
#### Progress Monitoring
Track improvement rate:
```typescript
// Average score gain per iteration over the recorded history.
function calculateImprovementRate(history: History): number {
  if (history.length < 2) return 0;
  const first = history[0].evaluation.overallScore;
  const last = history[history.length - 1].evaluation.overallScore;
  return (last - first) / (history.length - 1);
}

function getStrategy(history: History): Strategy {
  const rate = calculateImprovementRate(history);
  if (rate > 0.1) {
    return "aggressive"; // Big improvements possible
  } else if (rate > 0.03) {
    return "standard"; // Normal progress
  } else if (rate > 0) {
    return "careful"; // Diminishing returns
  } else {
    return "stop"; // No improvement
  }
}
```
#### Strategy Adjustment
```yaml
aggressive:
  changesPerIteration: 5-7
  riskTolerance: high
  focusOn: "biggest issues"
standard:
  changesPerIteration: 3-5
  riskTolerance: medium
  focusOn: "balanced improvement"
careful:
  changesPerIteration: 1-2
  riskTolerance: low
  focusOn: "safe wins only"
```
### Rollback Capability
Support reverting bad changes:
#### Checkpoint System
Save state at each iteration:
```typescript
interface Checkpoint {
  iteration: number;
  output: string;
  evaluation: Evaluation;
  timestamp: number;
}
```
#### Rollback Triggers
- Score decreased significantly
- Critical dimension failed that previously passed
- Introduced new errors
#### Rollback Process

1. Detect the regression
2. Identify the last good checkpoint
3. Restore from that checkpoint
4. Log what caused the regression
5. Adjust strategy to avoid the same mistake
6. Continue from the good state
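The detect-and-restore steps might look like this sketch, using a simplified checkpoint shape with a flat score and an assumed regression margin of 0.05:

```typescript
interface ScoredCheckpoint {
  iteration: number;
  output: string;
  score: number; // simplified: overall score only
}

// Return the checkpoint to restore if the latest iteration regressed
// by more than `margin` below the best earlier result; otherwise null.
function detectRollback(
  history: ScoredCheckpoint[],
  margin = 0.05
): ScoredCheckpoint | null {
  if (history.length < 2) return null;
  const latest = history[history.length - 1];
  // Compare against the best previous checkpoint, not just the last one.
  const best = history
    .slice(0, -1)
    .reduce((a, b) => (b.score > a.score ? b : a));
  return latest.score < best.score - margin ? best : null;
}

const restore = detectRollback([
  { iteration: 1, output: "v1", score: 0.7 },
  { iteration: 2, output: "v2", score: 0.8 },
  { iteration: 3, output: "v3", score: 0.6 }, // regression
]);
```

Comparing against the best checkpoint seen so far, rather than only the previous iteration, prevents a slow slide downward from going unnoticed.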
#### Rollback Limits
- Maximum 2 rollbacks per session
- After limit, return best-seen result
## Testing Feedback Loop Skills
Testing loops requires verifying convergence and quality.
### Convergence Testing
#### Guaranteed Convergence

- **Input:** Any valid code
- **Expected:** Terminates within `maxIterations`

#### Quality Convergence

- **Input:** Code with known issues
- **Expected:** Score increases each iteration (until threshold or stall)

#### Stall Detection

- **Input:** Already optimal code
- **Expected:** Detects no improvement and stops early

#### Regression Prevention

- **Input:** Code where a naive fix breaks other aspects
- **Expected:** Does not regress, or rolls back
### Quality Testing

#### Known Issues

- **Input:** Code with specific planted issues
- **Expected:** Identifies and fixes the issues

#### Score Accuracy

- **Input:** A range of code quality levels
- **Expected:** Scores correlate with actual quality

#### Improvement Validity

- **Input:** Code with known improvements
- **Expected:** Suggestions match the expected improvements
### Performance Testing

#### Token Efficiency

Measure tokens spent per quality point gained.

#### Iteration Efficiency

Track the iterations needed to close various quality gaps.

#### Resource Bounds

Verify the loop stays within configured limits.
## Real-World Feedback Loop Examples

### Documentation Improver
```markdown
---
name: doc-improver
description: Iteratively improves documentation quality
---

# Documentation Improver

## Loop Structure

Generate → Evaluate → Improve → Repeat

## Quality Dimensions

- Completeness: All features documented
- Clarity: Easy to understand
- Accuracy: Matches actual behavior
- Examples: Helpful code samples
- Structure: Logical organization

## Termination

- Quality > 0.9 across all dimensions
- Maximum 4 iterations
- Stall after 2 iterations with < 0.05 improvement
```
### Prompt Optimizer
```markdown
---
name: prompt-optimizer
description: Iteratively refines prompts for better results
---

# Prompt Optimizer

## Loop Structure

Test → Evaluate → Refine → Repeat

## Quality Dimensions

- Effectiveness: Achieves intended result
- Consistency: Produces reliable outputs
- Efficiency: Uses tokens wisely
- Robustness: Handles edge cases

## Testing Phase

Run the prompt against test cases and collect results.

## Evaluation Phase

Compare results to expected outcomes.

## Refinement Phase

Adjust the prompt based on failures.
```
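The testing and evaluation phases of such an optimizer could be driven by a small harness like this sketch. The `runPrompt` function is a hypothetical stub standing in for a model call:

```typescript
interface TestCase {
  input: string;
  expected: string;
}

// Run a prompt against test cases and report the pass rate, which can
// feed the effectiveness and consistency dimensions of the evaluator.
function scorePrompt(
  runPrompt: (prompt: string, input: string) => string,
  prompt: string,
  cases: TestCase[]
): { passRate: number; failures: TestCase[] } {
  const failures = cases.filter(
    (c) => runPrompt(prompt, c.input) !== c.expected
  );
  return { passRate: 1 - failures.length / cases.length, failures };
}

// Stub model: uppercases the input, which matches only some expectations.
const report = scorePrompt(
  (_prompt, input) => input.toUpperCase(),
  "Uppercase the input.",
  [
    { input: "ok", expected: "OK" },
    { input: "a b", expected: "A B" },
    { input: "1x", expected: "1x" }, // stub returns "1X", so this fails
  ]
);
```

The failures list is the refinement phase's input: each failing case tells the optimizer what behavior the next prompt revision needs to fix.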
## Conclusion
The feedback loop pattern transforms AI skills from single-shot attempts into iterative refinement engines. By generating, evaluating, and improving in cycles, skills can achieve quality levels that one-pass processing cannot reach.
Key principles for effective feedback loops:
- Clear quality criteria: Define measurable dimensions with thresholds
- Actionable evaluation: Feedback must suggest specific improvements
- Progressive improvement: Each iteration should measurably improve output
- Bounded iteration: Prevent infinite loops with clear termination conditions
- Rollback capability: Recover from improvements that make things worse
Start with simple loops—generate, evaluate against one or two criteria, improve. As you gain confidence, add more evaluation dimensions, adaptive strategies, and sophisticated improvement logic.
The feedback loop pattern is your tool for achieving excellence through iteration, turning good enough into genuinely good.