Flaky Tests Are Worse Than No Tests. The Currents Team Knows Why.

The .skip comment is the most honest thing in most test suites.

Someone wrote a test. The test was flaky — it passed most of the time and failed unpredictably. The failure was annoying enough to block PRs and slow CI, but the cause wasn't obvious enough to fix in the time available. Someone marked it .skip with a comment that usually says something like "TODO: fix this properly."

Most projects have at least a few of these. Some projects have dozens. The skipped tests mark the places where the test infrastructure actively eroded trust in the testing process.

The Playwright Best Practices skill from currents-dev/playwright-best-practices-skill has 33.2K installs. The Currents team builds Playwright tooling professionally — they work with the production test failures of teams at scale. What they've encoded as best practices is specifically what they've watched break test suites repeatedly.

What Makes Tests Flaky

There are a few common patterns, and the Playwright context makes them specific.

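The first is timing. Playwright tests that rely on waitForTimeout, sleeping for a fixed duration and hoping the UI has updated, are tightly coupled to a performance assumption that will eventually be violated. The page is fast in development, slower under CI load, and slower still when the network is throttled. A fixed wait fails as soon as the page takes longer than the sleep. The fix is to wait on a condition rather than a duration: web-first assertions such as expect(locator).toBeVisible(), or locator.waitFor(), which retry until the condition holds or a timeout expires. The older page.waitForSelector is also condition-based, but Playwright's documentation now discourages it in favor of these locator-based waits.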
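A toy model makes the trade-off concrete. The sketch below runs in virtual time, with no real sleeping and no Playwright dependency. The names fixedWait and conditionWait and all millisecond figures are made up for illustration. It models only the idea behind condition-based waiting, not how Playwright implements it.

```typescript
// Toy model in virtual time: nothing here sleeps or touches a browser.
// All names and numbers are hypothetical and exist only for illustration.
interface WaitResult {
  passed: boolean;   // was the UI ready when the wait ended?
  elapsedMs: number; // virtual time spent waiting
}

// Fixed wait: always spends sleepMs, and passes only if the UI was ready by then.
function fixedWait(readyAtMs: number, sleepMs: number): WaitResult {
  return { passed: readyAtMs <= sleepMs, elapsedMs: sleepMs };
}

// Condition-based wait: checks every intervalMs, returns as soon as the UI is
// ready, and gives up only once timeoutMs has elapsed.
function conditionWait(readyAtMs: number, timeoutMs: number, intervalMs = 100): WaitResult {
  for (let t = 0; t <= timeoutMs; t += intervalMs) {
    if (t >= readyAtMs) {
      return { passed: true, elapsedMs: t };
    }
  }
  return { passed: false, elapsedMs: timeoutMs };
}

// The UI updates after 200 ms in development and 1400 ms under CI load.
const fixedDev = fixedWait(200, 1000);     // passes, but spends the full 1000 ms
const fixedCi = fixedWait(1400, 1000);     // fails: the page needed longer than the sleep
const condDev = conditionWait(200, 5000);  // passes after 200 ms
const condCi = conditionWait(1400, 5000);  // passes after 1400 ms
```

In this model the fixed wait wastes 800 ms in development and still fails in CI, while the condition-based wait passes in both environments and takes only as long as the page does.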

The second is selector brittleness. Tests that target elements by CSS class or DOM position fail when the frontend changes, even when the behavior being tested hasn't changed. A design refactor that renames a CSS class shouldn't break a functional test. The standard is test IDs, ARIA attributes, or accessible role selectors, all of which stay stable across visual changes.
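The effect of a class rename can be shown with a toy DOM model. The sketch below uses plain objects instead of a browser. The element shape, the class names, and the lookup helpers are all hypothetical stand-ins, not Playwright APIs.

```typescript
// Toy DOM model: plain objects, no browser and no Playwright dependency.
// Every name and value here is hypothetical.
interface El {
  className: string;
  role: string;
  name: string;   // accessible name
  testId: string;
}

// The same button before and after a design refactor renames its CSS classes.
const beforeRefactor: El[] = [
  { className: "btn btn-primary", role: "button", name: "Place order", testId: "place-order" },
];
const afterRefactor: El[] = [
  { className: "ds-button ds-button--primary", role: "button", name: "Place order", testId: "place-order" },
];

// Brittle: couples the test to styling.
function byClass(dom: El[], cls: string): El[] {
  return dom.filter((el) => el.className.split(" ").indexOf(cls) !== -1);
}

// Stable: couples the test to what the user perceives.
function byRole(dom: El[], role: string, name: string): El[] {
  return dom.filter((el) => el.role === role && el.name === name);
}

// Stable: couples the test to an explicit testing contract.
function byTestId(dom: El[], testId: string): El[] {
  return dom.filter((el) => el.testId === testId);
}

// Match counts [before, after] for each strategy.
const classHits = [byClass(beforeRefactor, "btn-primary").length, byClass(afterRefactor, "btn-primary").length];
const roleHits = [byRole(beforeRefactor, "button", "Place order").length, byRole(afterRefactor, "button", "Place order").length];
const testIdHits = [byTestId(beforeRefactor, "place-order").length, byTestId(afterRefactor, "place-order").length];
```

Only the class-based lookup loses its match after the refactor. In Playwright itself, the two stable lookups correspond to page.getByRole('button', { name: 'Place order' }) and page.getByTestId('place-order').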

The third is test isolation. Tests that share state — a shared database, a shared browser session, a shared authenticated user — fail in ways that depend on execution order. Test A sets up state that Test B expects to be absent. They pass in isolation and fail when run together.
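Order dependence is easy to reproduce in miniature. In the sketch below an in-memory store stands in for the shared database. The store shape and both test functions are hypothetical.

```typescript
// Toy model of order-dependent tests: an in-memory store stands in for a
// shared database. All names are hypothetical.
interface Store {
  users: string[];
}

// Test A creates a user and checks that it exists.
function testA(store: Store): boolean {
  store.users.push("alice");
  return store.users.indexOf("alice") !== -1;
}

// Test B assumes a clean slate: it expects no users to exist.
function testB(store: Store): boolean {
  return store.users.length === 0;
}

// Shared state: whether B passes depends on whether A ran first.
const sharedAB: Store = { users: [] };
const aThenB = [testA(sharedAB), testB(sharedAB)]; // A passes, B fails

const sharedBA: Store = { users: [] };
const bThenA = [testB(sharedBA), testA(sharedBA)]; // both pass

// Isolated state: each test builds its own store, so order no longer matters.
function freshStore(): Store {
  return { users: [] };
}
const isolated = [testA(freshStore()), testB(freshStore())]; // both pass
```

Playwright's test runner already gives each test a fresh browser context by default. The same per-test treatment has to be applied to anything outside the browser, such as database rows or user accounts.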

Each of these failure modes has a principled solution. The reason teams hit them repeatedly isn't that the solutions are unknown — it's that the solutions aren't consistently applied because there's no structural enforcement.
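One form structural enforcement can take is a check that fails CI when a known flaky pattern shows up in a test file. The sketch below is a deliberately naive, regex-based version. findViolations and the rule names are hypothetical, and a real setup would more likely rely on an AST-based lint rule than on line matching.

```typescript
// Naive source scan for two patterns discussed above. Hypothetical helper,
// not part of Playwright or any lint tool.
interface Violation {
  line: number; // 1-based line number
  rule: string;
  text: string;
}

const RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "no-fixed-wait", pattern: /\.waitForTimeout\s*\(/ },
  { rule: "no-skipped-test", pattern: /\b(test|it|describe)\.skip\s*\(/ },
];

function findViolations(source: string): Violation[] {
  const violations: Violation[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(lines[i])) {
        violations.push({ line: i + 1, rule, text: lines[i].trim() });
      }
    }
  }
  return violations;
}

// A made-up test file containing both patterns.
const sample = [
  "test.skip('checkout works', async ({ page }) => {",
  "  await page.waitForTimeout(3000);",
  "  await page.getByRole('button', { name: 'Pay' }).click();",
  "});",
].join("\n");

const found = findViolations(sample);
```

Here found holds two entries: the skipped test on line 1 and the fixed wait on line 2. A CI step could exit non-zero whenever the list is non-empty, which turns the convention into a gate rather than a guideline.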

Why "Professional" Matters Here

There's a meaningful difference between best practices written by someone who knows Playwright and best practices written by someone who has watched production test suites fail at scale.

The former tends toward the principled and the general: use semantic selectors, avoid timeouts, test behavior not implementation. All correct. Not sufficient.

The latter includes the specific patterns that look fine in the documentation and break at scale. The test that works reliably for 10 tests and becomes flaky at 100 because of a shared database sequence. The selector that's stable across design changes until your design system introduces a component that reuses the role. The async handling that works under Node 16 and breaks under Node 18.

Currents has the second type of knowledge because their product — Playwright test analytics, parallel execution, and flakiness detection — is used by teams running thousands of tests in CI. They see the failure modes that most developers only see once, briefly, before the test gets marked .skip.

The 33.2K installs partly reflect the Playwright install base (large and growing) and partly developers who have already hit these failure modes and are looking for systematic coverage rather than post-hoc fixes.

The Cost That Doesn't Show Up in Metrics

Flaky tests have a cost that's hard to measure but significant.

The obvious cost is CI time — reruns, retries, slow pipelines. That shows up in numbers.

The hidden cost is trust erosion. When tests fail unpredictably, engineers stop treating test failures as signals. They learn to run CI again rather than investigate the failure. They start assuming failures are noise until proven otherwise. At the extreme, test failures that indicate real bugs get ignored because the baseline noise rate is too high.

This erosion is hard to reverse in a specific way: once engineers stop trusting the test suite, getting them to trust it again requires a sustained period of zero false failures. A single flaky test that's tolerated erodes the whole suite's credibility over time.
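A back-of-the-envelope calculation shows how quickly a small per-test flake rate turns into suite-level noise. It assumes every test fails spuriously with the same probability p, independently of the others, which is a simplification. Under that assumption the chance that a run of n tests has no false failure is (1 - p)^n. The function name and the figures below are illustrative, not measurements.

```typescript
// Probability that a run of `testCount` tests has no spurious failure,
// assuming independent tests that each flake with probability `flakeRate`.
function cleanRunProbability(testCount: number, flakeRate: number): number {
  return Math.pow(1 - flakeRate, testCount);
}

const suite100 = cleanRunProbability(100, 0.01); // about 0.366
const suite500 = cleanRunProbability(500, 0.01); // about 0.0066
```

Under these assumptions, a 100-test suite in which every test is 99% reliable still shows a false failure in roughly two runs out of three. At 500 tests, a clean run is the rare exception.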

The Currents best practices are as much about maintaining test suite trust as they are about reducing CI costs. A test suite developers believe in is a different thing from a test suite developers put up with.

The .skip comments are the leading indicator. The question is whether you fix them before trust is gone or after.

Part of the Playwright Best Practices (Currents) — production-tested Playwright patterns from the team that runs test analytics at scale.
