Best Claude Skills for Testing & TDD in 2026: 15 Compared
Top 15 Claude Code skills for testing and TDD. The #1 and #2 entries aren't test runners; they're TDD methodology skills rated 4/5 for quality, with a combined install signal of about 292K.
Testing & TDD is the use-case cut where Claude Code's installed base sends the clearest signal about what developers value. The top two skills aren't test runners or framework wrappers; they're TDD methodology skills from obra/superpowers (167.5K, 4/5 quality) and NousResearch/hermes-agent (124.8K, 4/5 quality). Both encode the RED → GREEN → REFACTOR loop and force Claude to write tests before implementation. Their combined install signal is about 292K, more than the other 13 entries in the top 15 put together (about 201K). The pattern is clear: developers using Claude Code for testing want philosophy more than tools.
Quick Pick
test-driven-development (obra/superpowers): the highest-installed testing skill in the catalog. 167.5K signal, 4/5 quality. Encodes "write the test first, then the code." Install this before picking a Playwright wrapper.
What These Skills Actually Do
Testing skills cluster into three patterns:
- Methodology skills (#1, #2): encode the TDD discipline itself. They don't run tests; they make sure you write the test first. Both top entries do this, and the install signal suggests the audience cares more about discipline than tooling.
- Test-runner wrappers (#3 Playwright Best Practices, #4 Playwright CLI, #5 Browser Use, #8 Playwright Commander, #12 E2E Testing Patterns): five entries about running tests, mostly Playwright-flavored.
- Strategy skills (#6 Agent Skills by Addy Osmani, #7 Code, #10 Test Runner, #11 Test Master, #14 Developer, #15 Testing Patterns): broader software-quality skills that include testing as one component.
The two remaining entries, #9 Debug Pro and #13 CI/CD Pipeline, sit next to testing rather than inside any of these groups.
What separates great from mediocre here is whether the skill enforces behavior change (methodology skills do; CLI wrappers don't) and whether the author shows their work (Currents' Playwright skill at #3 is detailed enough to earn a 3/5 quality score; the generic "Developer" at #14 is unrated and shallow by comparison).
How We Ranked
We sorted 15 candidate skills by a composite score:
- Popularity signal — the highest of GitHub stars, install count, or ClawHub download count. Log-scaled so a 100-star skill doesn't get buried under a 100,000-star one if the smaller one is meaningfully better.
- Quality score — when set, a 0–5 rubric that breaks ties within popularity tiers. Roughly 15% of catalog skills carry a quality score today; we surface it in the comparison table when available.
The formula is identical across the entire Best-Of 2026 series, so you can compare apples to apples between categories.
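The exact formula isn't published here, so the following is only a sketch of one possible reading of the two bullets above: bucket skills by the order of magnitude of their popularity signal, then let the quality score break ties inside a bucket. The base-10 tiers, the treatment of unrated skills, and all names (`Skill`, `popularity_tier`, `sort_key`, `rank`) are assumptions made for illustration.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class Skill(NamedTuple):
    name: str
    signal: float            # popularity signal (highest of stars, installs, downloads)
    quality: Optional[int]   # 0-5 rubric score, or None when the skill is unrated


def popularity_tier(signal: float) -> int:
    """Log-scale the signal: skills within the same power of ten share a tier.

    ASSUMPTION: base-10 tiers are a guess; the article only says "log-scaled".
    Requires signal > 0.
    """
    return math.floor(math.log10(signal))


def sort_key(skill: Skill) -> Tuple[int, int, float]:
    # ASSUMPTION: an unrated skill sorts below any rated skill in the same tier.
    quality = skill.quality if skill.quality is not None else -1
    # Tier first, then quality as the tie-breaker, then the raw signal.
    return (popularity_tier(skill.signal), quality, skill.signal)


def rank(skills: List[Skill]) -> List[Skill]:
    return sorted(skills, key=sort_key, reverse=True)


# Entries #3 to #5 from this list, with signals written out in full.
sample = [
    Skill("Browser Use", 33_300, None),
    Skill("Playwright CLI", 26_600, 3),
    Skill("Playwright Best Practices (Currents)", 33_200, 3),
]
ranked_names = [s.name for s in rank(sample)]
```

Under these assumptions the three sample entries come out in the same order as in the list below: the two rated Playwright skills first, then the unrated Browser Use, even though Browser Use has the largest raw signal of the three.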
The Top 15
1. test-driven-development
Skill · obra/superpowers · 167.5K signal · quality 4/5
Use when implementing any feature or bugfix, before writing implementation code.
The take: The single highest-installed testing skill in the catalog. From the obra/superpowers repo — the same author behind several top entries across the series. The skill is short (the discipline is simple), but the result is profound: Claude actually writes the failing test first instead of skipping ahead to the implementation it "knows" is right.
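As a minimal sketch of the test-first order described above: the `slugify` function and its tests are hypothetical, chosen only for illustration, and are not taken from the skill itself. The tests use plain `assert` statements in the pytest style, assuming pytest is the runner.

```python
import re

# RED: these tests are written first. Calling them before slugify() exists
# fails with a NameError, which is the expected "red" state.
def test_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


def test_drops_punctuation():
    assert slugify("Hello, World!") == "hello-world"


def test_empty_title_gives_empty_slug():
    assert slugify("   ") == ""


# GREEN: the smallest implementation that makes the tests above pass.
# REFACTOR: a first draft might chain several str.replace() calls; with the
# tests in place, it can be tidied into one regular expression safely.
def slugify(title: str) -> str:
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

The point of the ordering is that each test is seen failing before the implementation exists, so a passing run afterwards is evidence about the code rather than about a test that could never fail.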
2. test-driven-development (Hermes)
Skill · NousResearch/hermes-agent · 124.8K signal · quality 4/5
TDD: enforce RED → GREEN → REFACTOR, tests before code.
The take: Same purpose as #1, different author. From the Hermes agent repo. Some teams prefer Hermes's stricter phrasing; some prefer Superpowers' looser framing. Install either, not both — they conflict on the same task.
3. Playwright Best Practices (Currents)
Plugin · currents-dev/playwright-best-practices-skill · 33.2K signal · quality 3/5
Playwright best practices from Currents: reliable selectors, parallel execution, CI configuration, flake prevention.
The take: Published by Currents (a Playwright observability vendor), which means they actually run thousands of test suites at scale and know where the failure modes are. Anti-flake guidance is the differentiator over the generic CLI wrappers (#4, #5, #8) — flaky tests are the single biggest reason teams give up on E2E testing.
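The individual recommendations in the skill aren't listed here. One widely used flake-prevention idea, shown as a library-independent sketch rather than as anything taken from the Currents skill or the Playwright API, is to wait on a condition with a deadline instead of sleeping for a fixed time. `wait_until` and `FakePage` are hypothetical names introduced for this example.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 2.0,
               interval: float = 0.01) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakePage:
    """Hypothetical stand-in for a slow UI: the button 'renders' on the third check."""

    def __init__(self) -> None:
        self.checks = 0

    def button_visible(self) -> bool:
        self.checks += 1
        return self.checks >= 3


page = FakePage()
appeared = wait_until(page.button_visible)        # succeeds once the button shows up
never = wait_until(lambda: False, timeout=0.05)   # gives up when the deadline passes
```

A fixed `time.sleep(1)` either wastes time when the UI is fast or fails when it is slow; polling against a deadline returns as soon as the condition holds and fails only when the deadline is really exceeded.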
4. Playwright CLI
Plugin · microsoft/playwright-cli · 26.6K signal · quality 3/5
Official Microsoft Playwright CLI skill: generates test scripts, runs assertions, manages browser contexts.
The take: Microsoft-published, so the API mapping is authoritative. Pair with #3 (Best Practices) — the CLI runs the tests, Best Practices ensures they're not flaky. Install both.
5. Browser Use
Skill · shawnpana/browser-use · 33.3K signal · quality unrated
Browser interactions for testing, forms, screenshots, data extraction.
The take: Dual-use — appears here and in the Browser Automation list. The testing-flavored install is for when you want the Playwright-wrapper feel without the official Microsoft skill (#4).
6. Agent Skills by Addy Osmani
Plugin · addyosmani/agent-skills · 21.0K signal · quality unrated
20 production-grade engineering skills for AI coding agents: define, plan, build, verify, review, ship.
The take: Bigger-than-testing — but its "verify" stage covers the same ground as the dedicated testing skills. Install when you want the full lifecycle wrapped, not just testing.
7. Code
Skill · ivangdavila/code · 17.9K signal · quality unrated
Coding workflow: planning, implementation, verification, testing.
The take: Generic workflow skill that includes a testing step. Same author as several other ivangdavila entries across the series; consistent in style, light on testing specifics.
8. Playwright Commander
Skill · austindixson/playwright-commander · 15.8K signal · quality unrated
Playwright for UI automation, analysis, debugging.
The take: Differentiator is the debugging framing — gives Claude verbose state reports when tests fail. Useful pairing with #1 TDD when the red-step assertion isn't obvious.
9. Debug Pro
Skill · cmanfre7/debug-pro · 15.6K signal · quality unrated
7-step debugging protocol + language-specific commands for systematically identifying and fixing bugs.
The take: Cross-category install (also #3 in the Debugging list). Belongs here too because TDD's RED step is debugging when the test surfaces a real bug.
10. Test Runner
Skill · cmanfre7/test-runner · 11.9K signal · quality unrated
Unit, integration, E2E across TypeScript, Python, Swift. Recommended frameworks.
The take: Multi-language test runner — the only skill in the top 15 that explicitly covers Swift. Install for polyglot codebases.
11. Test Master
Skill · veeramanikandanr48/test-master · 7.4K signal · quality unrated
Test strategies, automation frameworks, unit/integration/E2E, coverage, performance, security testing.
The take: Broadest scope of the strategy skills — includes performance and security testing layers most other skills skip. Heavier than #10; lighter than #6 Addy Osmani's full lifecycle.
12. E2E Testing Patterns
Skill · wpank/e2e-testing-patterns · 6.5K signal · quality unrated
Reliable E2E with Playwright and Cypress: critical user journey coverage, flake elimination, CI/CD integration.
The take: Covers Cypress too, which is the differentiator from the Playwright-only skills above. Install if your stack is Cypress-based; otherwise use #3.
13. CI/CD Pipeline
Skill · gitgoodordietrying/cicd-pipeline · 4.2K signal · quality unrated
GitHub Actions pipelines for testing, deployment, releases.
The take: Tests don't help if they don't run on every PR. This skill bridges the gap — writes the GitHub Actions YAML that wires your test runner into CI. Use after picking your test runner.
14. Developer
Skill · ivangdavila/developer · 3.9K signal · quality unrated
Generic dev workflow with debugging, testing, architecture.
The take: Shallow on testing specifics. Skip if you have #1/#2 plus a dedicated test runner.
15. Testing Patterns
Skill · wpank/testing-patterns · 3.8K signal · quality unrated
Unit, integration, E2E patterns with framework-specific guidance.
The take: Companion to #12 from the same author (wpank). The two together cover both E2E patterns and the broader testing-strategy layer; install both or neither.
Comparison Table
| # | Skill | Type | Popularity signal | Quality | License |
|---|---|---|---|---|---|
| 1 | test-driven-development | Skill | 167.5K | 4/5 | — |
| 2 | test-driven-development (Hermes) | Skill | 124.8K | 4/5 | MIT |
| 3 | Playwright Best Practices (Currents) | Plugin | 33.2K | 3/5 | MIT |
| 4 | Playwright CLI | Plugin | 26.6K | 3/5 | MIT |
| 5 | Browser Use | Skill | 33.3K | — | — |
| 6 | Agent Skills by Addy Osmani | Plugin | 21.0K | — | MIT |
| 7 | Code | Skill | 17.9K | — | — |
| 8 | Playwright Commander | Skill | 15.8K | — | — |
| 9 | Debug Pro | Skill | 15.6K | — | — |
| 10 | Test Runner | Skill | 11.9K | — | — |
| 11 | Test Master | Skill | 7.4K | — | — |
| 12 | E2E Testing Patterns | Skill | 6.5K | — | — |
| 13 | CI/CD Pipeline | Skill | 4.2K | — | — |
| 14 | Developer | Skill | 3.9K | — | — |
| 15 | Testing Patterns | Skill | 3.8K | — | — |
FAQ
How is this list different from the category page on aiskill.market?
The category page is a directory: every skill in the category, sortable and filterable. This list is editorial — opinionated, time-stamped (2026-05-18), and ranked. Use the directory when you know what you want; use this when you don't.
Why does #5 rank below #3 and #4 despite a higher popularity signal?
Popularity is one input among several. The popularity signal takes the highest of GitHub stars, install count (which reflects actual usage on aiskill.market), or ClawHub download count, and it is log-scaled, so Browser Use (#5, 33.3K), Playwright Best Practices (#3, 33.2K), and Playwright CLI (#4, 26.6K) land close together. The quality score then breaks the tie: #3 and #4 both carry a 3/5 rating, while #5 is unrated.
Are these all free?
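The skills are open source, though the comparison table lists a license (MIT) for only four of them (#2, #3, #4, #6); check the source repo for the rest. The Playwright wrappers (#3, #4, #5, #8, #12) install Chromium, which takes about 500 MB of disk. The Currents skill at #3 doesn't require a Currents account, but you get more value if you have one.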
How do I install one?
Each linked skill page has install instructions. The fastest path is the one-line install via the aiskill.market CLI or by adding the source repo as a Claude Code plugin marketplace.
How often does this list update?
Quarterly. Testing tools evolve slowly; methodology-skill rankings are stable. Expect more Playwright-flavored entries each cycle as that ecosystem keeps growing.
Should I install both TDD skills (#1 and #2)?
No. They conflict on the same task — Claude won't know which methodology to apply when both fire. Pick one based on which README resonates: Superpowers is looser/more flexible, Hermes is stricter on RED-GREEN-REFACTOR ordering.
What's the right starter pack for testing?
Three skills: #1 test-driven-development (methodology) + #3 Playwright Best Practices (anti-flake guidance) + #4 Playwright CLI (actual test runner). Add #13 CI/CD Pipeline when you're ready to wire tests into GitHub Actions.
Related Categories
- Best AI Skills for Development & Code Tools in 2026 — Superpowers and Hermes repos show up there too
- Best Claude Skills for Debugging in 2026 — Debug Pro (#9) is shared
- Best Claude Skills for CI/CD in 2026 — pipeline integration follows naturally
Browse The Full Catalog
Find every skill in this category — including the ones that didn't make the top 15 — at the main browse page.
Part of the Best-Of 2026 series. Updated 2026-05-18. Skills sampled from a catalog of ~262 active entries; the 15 ranked entries carry a combined popularity signal of 493.4K.