# TDD Skill Analysis: How obra/superpowers Changed Development
Deep analysis of the obra/superpowers TDD skill for Claude Code. Learn what makes it exceptional and how to apply its patterns to your own testing workflows.
When obra released the superpowers collection, it quietly changed how the Claude Code community thinks about test-driven development. The TDD skill within this collection isn't just instructions for writing tests; it's a complete methodology encoded into a format that Claude Code can follow reliably. It has become the de facto standard for TDD in the Claude Code ecosystem, spawning dozens of derivatives and inspiring a generation of testing-focused skills.
In this analysis, we'll dissect what makes the obra/superpowers TDD skill exceptional, examine its internal structure, and extract lessons you can apply whether you're using this skill directly or building your own testing workflows.
## The obra/superpowers Collection
Before focusing on the TDD skill specifically, let's understand the broader context. obra (the GitHub handle of Jesse Vincent) created superpowers as a comprehensive Claude Code enhancement collection. It includes:
- TDD Skill: The focus of this analysis
- Refactoring Patterns: Structured approaches to code transformation
- Debugging Workflows: Systematic error investigation
- Architecture Guidance: High-level design principles
The collection follows a consistent philosophy: encode expert knowledge into structured, actionable instructions that Claude Code can follow without ambiguity. This philosophy reaches its apex in the TDD skill.
## What Makes This TDD Skill Different
Many TDD skills exist in the marketplace. Most follow a similar pattern: "Write a test first, watch it fail, write code to pass, refactor." This describes TDD but doesn't enable it. The obra/superpowers TDD skill goes much deeper.
### The Red-Green-Refactor Cycle with Enforcement
The skill doesn't just describe the TDD cycle; it enforces it through explicit checkpoints:
```markdown
## The TDD Cycle

### Phase 1: Red (Write a Failing Test)
Before writing ANY implementation code:

1. Identify the next smallest piece of functionality
2. Write a test that exercises that functionality
3. Run the test to confirm it FAILS
4. If the test passes, STOP - either:
   - The functionality already exists (unnecessary test)
   - The test is wrong (doesn't test what you think)

CRITICAL: Do not proceed to Phase 2 until you have seen red.

### Phase 2: Green (Make It Pass)
With a failing test in hand:

1. Write the MINIMUM code to make the test pass
2. Do not add extra functionality
3. Do not optimize
4. Do not refactor
5. Run the test to confirm it PASSES

MINIMUM means exactly that. If returning a hardcoded value passes the test, that's acceptable for now.

### Phase 3: Refactor (Improve the Code)
With passing tests:

1. Look for code smells in the new code
2. Apply one refactoring at a time
3. Run tests after each refactoring
4. If any test fails, REVERT the last refactoring
5. Continue until the code is clean
```
Notice how each phase has explicit stop conditions. This prevents Claude Code from shortcutting the process, which is a common failure mode for AI assistants doing TDD.
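To make the checkpoints concrete, here is a minimal red-to-green pass in pytest. The module and function names (`slugger`, `slugify`) are hypothetical, not from the skill itself:

```python
# tests/test_slugify.py. Phase 1 (Red): written before any implementation
# exists. Running `pytest` here fails with an ImportError, which counts as red.
from slugger import slugify

def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"


# slugger.py. Phase 2 (Green): the MINIMUM code that passes. Even
# `return "hello-world"` would be acceptable at this stage; the next
# failing test is what forces the more general behavior.
def slugify(text: str) -> str:
    return text.replace(" ", "-")
```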
### The Granularity Principle
One of the skill's most impactful contributions is its emphasis on test granularity:
````markdown
## Finding the Right Granularity

A test should exercise ONE behavior. If you're tempted to use "and" when describing the test, split it.

### Wrong Granularity
"test that createUser validates email and saves to database"

This is two tests:
1. "test that createUser validates email format"
2. "test that createUser saves valid user to database"

### Right Granularity
Each test should be describable with:
- Given: A specific starting state
- When: A single action
- Then: A single expected outcome

### The One-Assertion Guideline
Prefer one assertion per test. Multiple assertions often indicate multiple behaviors being tested.

Exception: Related assertions about a single object are acceptable:
```python
def test_user_creation():
    user = create_user("test@example.com")
    assert user.email == "test@example.com"
    assert user.created_at is not None
    # These are aspects of the same behavior
```
````
This granularity principle solves a common problem: when tests are too broad, failures don't pinpoint issues. The skill's guidance helps Claude Code write tests that serve as precise documentation of intended behavior.
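Applied to the createUser example above, the split might look like this in pytest. `create_user` and the `fake_db` fixture are hypothetical stand-ins for the code under test:

```python
import pytest

# Each test exercises ONE behavior, so a failure points at exactly one cause.
def test_create_user_invalid_email_raises_error():
    with pytest.raises(ValueError):
        create_user("not-an-email")

def test_create_user_valid_email_saves_to_database(fake_db):
    create_user("test@example.com", db=fake_db)
    assert fake_db.find_by_email("test@example.com") is not None
```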
### Test Naming Convention
The skill prescribes specific naming patterns that improve test readability:
````markdown
## Test Naming

Use the pattern: `test_[unit]_[scenario]_[expected]`

Examples:
- `test_parser_empty_input_returns_none`
- `test_validator_invalid_email_raises_error`
- `test_calculator_divide_by_zero_raises_exception`

Avoid:
- `test_parse` (what scenario? what expected?)
- `test_it_works` (what is "it"?)
- `test1`, `test2` (no semantic meaning)

For nested test classes:
```python
class TestUserValidator:
    class TestEmailValidation:
        def test_accepts_valid_format(self):
            ...

        def test_rejects_missing_at_symbol(self):
            ...
```
````
The naming convention becomes crucial for test-as-documentation. When a test fails, its name immediately communicates what broke.
## Analyzing the Skill's Structure
Let's examine how the skill file itself is organized, as this reveals patterns applicable to skill design generally:
### Progressive Disclosure
The skill uses a layered structure that provides information at the right level of abstraction:
```markdown
# TDD Workflow

## Quick Reference
[One-paragraph summary for experienced users]

## The Cycle
[Core methodology - what to do]

## Detailed Guidance
[How to do it well]

## Edge Cases
[When the normal rules don't apply]

## Anti-Patterns
[Common mistakes to avoid]
```
This structure means Claude Code can quickly reference the cycle when it's confident, but has detailed guidance available when facing unusual situations.
### Embedded Decision Trees
Throughout the skill, decision trees help Claude Code navigate ambiguous situations:
```markdown
## When to Write Integration Tests

Before writing an integration test, ask:

1. Does this behavior span multiple units?
   - No -> Write a unit test instead
   - Yes -> Continue
2. Is the integration point external (database, API)?
   - Yes -> Consider a contract test
   - No -> Continue
3. Can the integration be tested without real external services?
   - Yes -> Write an integration test with fakes
   - No -> Write an integration test with test containers
4. Is the test fast enough for the main test suite?
   - Yes -> Include in main suite
   - No -> Mark as slow test, run in CI only
```
These decision trees transform vague expertise into explicit procedures. Claude Code doesn't have to guess; it follows the tree.
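Here is what branch 3 of that tree ("write an integration test with fakes") might produce. `InMemoryUserRepo` and `register` are illustrative, not part of the skill:

```python
class InMemoryUserRepo:
    """A fake: same interface as the real repository, no database required."""
    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user["email"]] = user

    def get(self, email):
        return self._users.get(email)


def register(repo, email):
    # The "integration" under test: the registration flow plus the repository.
    user = {"email": email, "active": True}
    repo.save(user)
    return user


def test_registration_persists_user_via_repo():
    repo = InMemoryUserRepo()
    register(repo, "test@example.com")
    assert repo.get("test@example.com")["active"] is True
```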
### Anti-Pattern Catalog
The skill dedicates significant space to what NOT to do:
````markdown
## TDD Anti-Patterns

### 1. The Prophet
Writing tests that predict implementation details.

Wrong:
```python
def test_uses_binary_search():
    # Tests HOW rather than WHAT
    with patch('module.binary_search') as mock:
        find_element([1, 2, 3], 2)
        mock.assert_called()
```

Right:
```python
def test_finds_existing_element():
    # Tests WHAT, not HOW
    assert find_element([1, 2, 3], 2) == True
```

### 2. The Liar
Tests that pass but don't actually test anything.

Wrong:
```python
def test_save_user():
    user = User(name="Test")
    save_user(user)
    # No assertion! Always passes
```

### 3. The Slow Poke
Tests that hit real databases/APIs unnecessarily.

If a test takes more than 100ms, ask:
- Is this testing integration or unit behavior?
- Can I fake the slow dependency?

### 4. The Giant
Single tests that verify too many things.

Signs:
- Test method longer than 20 lines
- Multiple unrelated assertions
- Complex setup fixtures
````
By explicitly cataloging anti-patterns, the skill helps Claude Code recognize and avoid common pitfalls that even experienced developers fall into.
## How the Skill Changes Claude Code Behavior
Using this skill fundamentally alters how Claude Code approaches development tasks. Let's trace through an example:
### Without the TDD Skill
User: "Add a password strength validator"
Claude Code's typical response:
1. Create `PasswordValidator` class
2. Implement complexity checking logic
3. Add tests that verify the implementation
This is implementation-first development. The tests verify what was built rather than driving what should be built.
### With the obra/superpowers TDD Skill
User: "Add a password strength validator"
Claude Code's response:
1. "I'll start with the simplest failing test. What should happen with an empty password?"
2. Creates: `test_empty_password_is_weak`
3. Runs test, confirms failure
4. Implements minimal code to pass
5. "Now let's add length requirements. What's the minimum length?"
6. Creates: `test_password_shorter_than_8_chars_is_weak`
7. Continues cycle...
The skill transforms Claude Code's approach from implementation-first to specification-first. Each test becomes a requirement that drives implementation.
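The first two iterations of that exchange might yield code like this (names illustrative):

```python
# Iteration 1, Red: test_empty_password_is_weak fails because no
# validator exists yet. Green: a hardcoded `return "weak"` passes.
# Iteration 2, Red: the length test fails against that hardcoded value,
# forcing the slightly more general implementation below.

def password_strength(password: str) -> str:
    if len(password) < 8:  # also covers the empty password
        return "weak"
    return "ok"

def test_empty_password_is_weak():
    assert password_strength("") == "weak"

def test_password_shorter_than_8_chars_is_weak():
    assert password_strength("abc123") == "weak"
```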
## Practical Lessons from the Skill
### Lesson 1: Constraints Enable Creativity
The skill's explicit constraints (minimum code, one behavior per test, specific naming) might seem limiting. In practice, they prevent common mistakes and help Claude Code produce consistently high-quality output.
Apply this lesson: When creating your own skills, don't be afraid of explicit constraints. Telling Claude Code what NOT to do is often as valuable as telling it what to do.
### Lesson 2: Decision Trees Scale
Expertise often lives in "it depends" knowledge that's hard to articulate. The skill's decision trees make these dependencies explicit and followable.
Apply this lesson: When you find yourself saying "it depends," that's a signal to create a decision tree. Map out the conditions and the corresponding guidance.
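For instance, an "it depends" answer about when to mock could be captured as a tree in the same style the skill uses (content illustrative):

```markdown
## When to Mock a Dependency
1. Is the dependency deterministic and fast?
   - Yes -> Use the real implementation
   - No -> Continue
2. Do you own its interface?
   - Yes -> Write a fake that implements it
   - No -> Wrap it in an adapter you own, then fake the adapter
```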
### Lesson 3: Anti-Patterns Are Instructive
Showing what's wrong, with explanations of why, helps Claude Code recognize similar patterns in novel situations.
Apply this lesson: Dedicate space in your skills to anti-patterns. Include examples and explanations of the problems they cause.
### Lesson 4: Progressive Detail
The skill works for both quick reference and deep guidance because of its layered structure. Experienced users get the summary; uncertain situations access detailed guidance.
Apply this lesson: Structure skills with increasing detail levels. Quick reference -> Core concepts -> Detailed guidance -> Edge cases.
## Extending the TDD Skill
The obra/superpowers TDD skill is designed to be extended. Here are common adaptations:
### Language-Specific Extensions
Add frameworks and conventions for your language:
````markdown
## Python-Specific Guidance

### Framework: pytest
- Use fixtures for shared setup
- Parametrize for testing multiple cases
- Use marks to categorize tests

### Mocking
Prefer:
```python
from unittest.mock import patch, MagicMock

@patch('module.dependency')
def test_with_mock(mock_dep):
    mock_dep.return_value = expected_value
```

Avoid:
```python
# Monkeypatching directly
original = module.dependency
module.dependency = lambda: expected_value
```
````
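To illustrate the parametrize point above: one parametrized test can cover several scenarios while still naming a single behavior. `is_valid_email` is a hypothetical unit under test:

```python
import pytest

# Each tuple is one scenario; pytest reports each case separately on failure.
@pytest.mark.parametrize("email, expected", [
    ("test@example.com", True),
    ("missing-at-symbol.com", False),
    ("", False),
])
def test_email_validator_classifies_input(email, expected):
    assert is_valid_email(email) == expected
```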
### Domain-Specific Extensions
Add testing guidance for your domain:
````markdown
## Financial Calculations Testing

### Decimal Precision
Always use Decimal for monetary tests:
```python
from decimal import Decimal

def test_interest_calculation():
    principal = Decimal("1000.00")
    rate = Decimal("0.05")
    expected = Decimal("50.00")
    assert calculate_interest(principal, rate) == expected
```

### Edge Cases to Always Test
- Zero amounts
- Negative amounts (if allowed)
- Maximum precision boundaries
- Currency conversion rounding
````
### CI/CD Integration
Add guidance for test categorization:
```markdown
## Test Categorization

### Fast Tests (< 100ms)
- Run on every commit
- Must not hit external services
- Mark with: `@pytest.mark.fast`

### Slow Tests (> 100ms)
- Run in CI only
- May hit test databases
- Mark with: `@pytest.mark.slow`

### E2E Tests
- Run nightly
- Hit real services in staging
- Mark with: `@pytest.mark.e2e`
```
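In practice, those categories translate into marked tests plus a selection flag. The marker names come from the excerpt above; registering markers and selecting with `-m` are standard pytest, while the test bodies here are placeholders:

```python
import pytest

# Register the markers once (e.g. in pytest.ini) to avoid warnings:
#   [pytest]
#   markers =
#       fast: runs on every commit
#       slow: CI only
#       e2e: nightly against staging

@pytest.mark.fast
def test_parser_empty_input_returns_none():
    ...

@pytest.mark.slow
def test_orders_roundtrip_through_test_database():
    ...

# Then select a category at the command line:
#   pytest -m fast            # pre-commit hook
#   pytest -m "slow or e2e"   # CI pipeline
```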
## Measuring the Skill's Impact
How do you know if this skill is working? Look for these indicators:
### Test Coverage Quality
Not just coverage percentage, but coverage of behaviors. With the TDD skill active, you should see:
- Fewer untested edge cases discovered in production
- Tests that fail for the right reasons when code breaks
- Tests that serve as documentation for how code works
### Development Velocity
Counterintuitively, strict TDD often increases velocity:
- Less debugging time (tests catch issues immediately)
- Less rework (requirements are explicit before implementation)
- Easier refactoring (comprehensive test coverage enables confident changes)
### Code Quality
TDD-driven code tends to have:
- Smaller functions (testability requires modularity)
- Clearer interfaces (mocking requires explicit contracts)
- Better separation of concerns (dependencies must be injectable)
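That last point is easiest to see side by side. A sketch, with `Signup` and the mailer classes invented for illustration:

```python
class FakeMailer:
    """Test double: records sends instead of talking to a real SMTP server."""
    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class Signup:
    # Testable version: the mailer is injected rather than constructed
    # internally, so a test can pass a fake instead of a real SMTP client.
    def __init__(self, mailer):
        self.mailer = mailer

    def register(self, email):
        self.mailer.send(email, "Welcome!")


def test_signup_sends_welcome_email():
    mailer = FakeMailer()
    Signup(mailer).register("test@example.com")
    assert mailer.sent == [("test@example.com", "Welcome!")]
```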
## Conclusion
The obra/superpowers TDD skill represents what's possible when expert knowledge is carefully encoded into a Claude Code skill. It's not just documentation; it's executable methodology. When Claude Code follows this skill, its output closely tracks what an experienced TDD practitioner would produce.
For skill developers, it provides a template: explicit constraints, decision trees for ambiguous situations, anti-pattern catalogs, and progressive detail levels. These patterns apply to any domain where you want to encode expert methodology.
For developers using Claude Code, this skill transforms TDD from an aspiration to a practice. The skill handles the discipline; you provide the domain knowledge about what to test.
If you're not already using the obra/superpowers collection, start with the TDD skill. Experience how explicit methodology guidance changes Claude Code's output. Then consider what other workflows in your development process could benefit from similar treatment.
The best skills don't just tell Claude Code what to do. They teach it how to think about a problem domain. obra/superpowers TDD is a masterclass in that approach.
Want to explore more testing-focused skills? Check out our Testing Skills Comparison for a broader look at QA automation options, or dive into Code Review Skills Roundup to see how testing connects to broader quality workflows.