Verification Is Not the Last Step. It's the Only Step That Counts.

There's a phrase I used to say constantly.

Should be working now.

I'd make a change, reason through it in my head, confirm it looked right to me, and ship that phrase. Not deliberately avoiding verification. Just certain enough that the step felt redundant.

The verification-before-completion skill from obra/superpowers opens with: Claiming work is complete without verification is dishonesty, not efficiency.

I read that and my first reaction was that it was too strong. I wasn't lying. I was moving fast.

The more I sat with it, the less I believed that distinction.

What "Should Be Working" Actually Means

When I say should be working, I mean: based on my model of how the code works, my model of how the test suite works, and my model of what I changed, I believe the output would show success.

That's three layers of model stacked on top of each other. Each layer introduces drift. I've been wrong about each layer, independently, more times than I can count.

The skill calls this out directly with a table of rationalizations.

"I'm confident" — Confidence is not evidence. "Should work now" — Run the verification. "Agent said success" — Verify independently. "I'm tired" — Exhaustion is not an excuse.

The table is blunt. It's blunt because the excuses are blunt. They all reduce to the same thing: I'm substituting my belief for the actual result.

The Iron Law

The skill states its core rule this way: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.

Fresh is doing a lot of work in that sentence.

A test run from ten minutes ago, before the last change — that's not fresh. A linter pass from a different file — that's not verification. An agent that reported "task complete" without a VCS diff to confirm — that's not evidence.

The law applies to paraphrases too. The skill is explicit that it applies to any wording implying success without having run verification. You can't route around it by saying "looks good" instead of "done."

The reason for that precision is interesting. The failure the skill is preventing isn't ignorance — it's premature closure. The moment you frame work as complete, you stop investigating. Every piece of information after that point gets filtered through "I'm done and defending it" rather than "I'm still checking."

Premature closure is particularly dangerous with AI-assisted development, where an agent can report confident success while having done exactly the wrong thing. The skill's phrase for this is trust agent success reports — listed explicitly in the red flags to watch for.

The Gate Function

The skill formalizes verification as a five-step gate.

Identify what command proves the claim. Run it completely, not partially. Read the full output, including exit codes. Ask whether the output actually confirms the claim. Only then make the claim.

Skipping any step, it says, is lying — not verifying.

That sounds harsh until you trace the logic. If you don't run the command, your claim is based on expectation. If you don't read the full output, your claim is based on a partial sample. If you don't check exit codes, you can miss failures that look like successes on the surface.

The gate exists because each step catches a different failure mode. You need all of them.

The Part About Trust

There's a section in the skill's documentation that felt the most honest to me. It references twenty-four specific failure memories and what came of them: your human partner said "I don't believe you" — trust broken.

That's not a technical consequence. It's a relational one.

When you say should be working and it isn't, the person on the other end of that message updates their model of how seriously to take your completion claims. They can't not do this — it's what people do after any false positive.

The skill is, underneath everything else, about the credibility of the signal you send when you say you're done.

An unverified "done" is noise. A verified "done" — here's the command, here's the output, here's the exit code — is a signal someone else can rely on. That distinction matters when you're working alone. It matters enormously when you're on a team, or handing work to an agent, or handing work to yourself three weeks from now.

What I Do Now

I run the verification before I say the thing.

Not always cheerfully. Not always without noticing the friction. But the cost of verification is a thirty-second command and a few seconds of reading.

The cost of a false completion claim is invisible. You don't see it when it happens. You see it later, in a bug report, in a confused colleague, in a breaking change you thought was already handled.

I now think of should be working as a signal that I haven't done the work yet. The work ends when I've run the command and read the output.

Until then, I have a belief. Not a result.

Part of the Superpowers series — 14 specialist skills from obra/superpowers.

Verification Is Not the Last Step. It's the Only Step That Counts.

What "Should Be Working" Actually Means

The Iron Law

The Gate Function

The Part About Trust

What I Do Now

Related Skills to Try

Related Skills to Try

using-git-worktrees

Related Articles

Related Articles

Design Systems for Solo Builders

First-Party Benchmarks Are Marketing: A Skeptic's Checklist for Launch Day

The Cheapest Frontier-Class Model Right Now? Grok 4.5's Price-per-Intelligence

using-git-worktrees

brainstorming

finishing-a-development-branch

test-driven-development

brainstorming

finishing-a-development-branch

test-driven-development