The Iron Law of Debugging: Why Root Cause Comes Before Everything Else
Most debugging sessions fail not because the problem is hard, but because engineers skip the investigation and go straight to fixes, a habit that compounds with every patch that doesn't work.
The third time I tried a fix on a failing CI pipeline and it still didn't work, I noticed something uncomfortable.
I had stopped thinking. I was just applying patches.
Each attempt had a theory attached to it, but none of the theories were connected by evidence. I was running experiments without recording results. I was doing what felt like debugging but was actually something closer to superstition — changing things in the hope that something would stick.
The Difference Between Investigating and Guessing
There's already an article on this site about gstack's investigate skill, which takes a similarly rigorous approach to debugging. But the superpowers systematic-debugging skill from obra/superpowers draws a sharper line. It doesn't just recommend investigating — it makes investigation a precondition.
The skill calls it the Iron Law:
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
Not "try to understand first." Not "it's usually good to investigate." A hard stop. If you haven't completed Phase 1, the skill explicitly won't let you proceed to proposing solutions. That's a different posture than most debugging advice takes.
What separates this approach from the gstack investigate style is structure. The superpowers skill is built around four explicit phases (Root Cause, Pattern Analysis, Hypothesis and Testing, Implementation), and the phases are strictly sequential. You cannot skip ahead. You cannot borrow from Phase 3 to shortcut Phase 1. The protocol maps almost exactly to the scientific method, applied to broken software.
The investigate skill asks you to look carefully before acting. The systematic-debugging skill asks you to prove you understand the root cause before you're allowed to act at all.
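To make the "sequential, no skipping" constraint concrete, here is a minimal Python sketch of a phase gate. The `PhaseGate` class and its method names are hypothetical, invented for illustration. The skill itself is a written protocol, and nothing here is taken from its implementation. Only the four phase names come from the skill.

```python
from enum import IntEnum


class Phase(IntEnum):
    ROOT_CAUSE = 1
    PATTERN_ANALYSIS = 2
    HYPOTHESIS_AND_TESTING = 3
    IMPLEMENTATION = 4


class PhaseGate:
    """Hypothetical helper that refuses to let phases run out of order."""

    def __init__(self) -> None:
        self.completed = 0  # number of the highest phase finished so far
        self.evidence = {}  # phase -> the evidence recorded for it

    def complete(self, phase: Phase, evidence: str) -> None:
        # A phase only counts as done if something was actually written down.
        if not evidence.strip():
            raise ValueError(f"{phase.name} needs recorded evidence")
        # Only the very next phase may be completed; no skipping ahead.
        if phase != self.completed + 1:
            raise RuntimeError(
                f"cannot complete {phase.name}: "
                f"phase {self.completed + 1} must be finished first"
            )
        self.evidence[phase] = evidence
        self.completed = int(phase)

    def may_propose_fix(self) -> bool:
        # The Iron Law: no fixes until root cause investigation is done.
        return self.completed >= Phase.ROOT_CAUSE


gate = PhaseGate()
try:
    gate.complete(Phase.HYPOTHESIS_AND_TESTING, "I bet it's the keychain")
except RuntimeError as err:
    print(err)
```

The design choice worth noticing is that the gate checks order, not effort: a confident theory offered at Phase 3 is rejected for the same reason as a lazy one, because Phase 1 has no recorded evidence yet.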
What Phase 1 Actually Means
Most engineers believe they investigate before fixing. In practice, "investigation" often means reading the error message for thirty seconds and then reaching for the most plausible change.
Phase 1 is more demanding than that. It requires reproducing the issue consistently, checking what changed recently, and — for systems with multiple components — adding diagnostic instrumentation to trace exactly where the failure originates. Not where you suspect it originates. Where the evidence shows it does.
The skill gives a specific example: a multi-layer CI signing failure. You add logging at each boundary — the workflow, the build script, the signing script, the actual signing command — and you run it once to gather evidence. Then you look at the output and determine which layer is breaking. Only then do you investigate that layer.
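The boundary-logging idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the skill's own example code: the four layer functions are stand-ins, the `SIGNING_IDENTITY` key is an invented name, and the bug in `build_script` is simulated so the trace has something to find.

```python
from typing import Callable, Dict, List, Optional, Tuple

Env = Dict[str, str]
Record = Tuple[str, bool, bool]  # (layer name, key present on entry, on exit)


def trace_layers(
    layers: List[Tuple[str, Callable[[Env], Env]]], env: Env, watched_key: str
) -> List[Record]:
    """Run every layer once and record the watched key at each boundary."""
    records = []
    for name, layer in layers:
        present_on_entry = watched_key in env
        env = layer(dict(env))  # pass a copy so layers cannot mutate our view
        records.append((name, present_on_entry, watched_key in env))
    return records


def first_breaking_layer(records: List[Record]) -> Optional[str]:
    """Return the first layer that received the key but did not pass it on."""
    for name, present_on_entry, present_on_exit in records:
        if present_on_entry and not present_on_exit:
            return name
    return None


# Hypothetical stand-ins for the four layers named in the text.
def workflow(env: Env) -> Env:
    return env


def build_script(env: Env) -> Env:
    # Simulated bug: the build script rebuilds a clean environment.
    return {"PATH": env.get("PATH", "")}


def signing_script(env: Env) -> Env:
    return env


def signing_command(env: Env) -> Env:
    return env


LAYERS = [
    ("workflow", workflow),
    ("build script", build_script),
    ("signing script", signing_script),
    ("signing command", signing_command),
]

records = trace_layers(
    LAYERS, {"PATH": "/usr/bin", "SIGNING_IDENTITY": "example"}, "SIGNING_IDENTITY"
)
for name, present_on_entry, present_on_exit in records:
    print(f"{name}: in={present_on_entry} out={present_on_exit}")
print("breaks at:", first_breaking_layer(records))
```

In this simulated run the trace points at the build script, even though the visible failure would surface two layers later at the signing command. That gap between where a failure appears and where it originates is the reason to instrument every boundary before investigating any single one.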
This sequence sounds obvious. But under time pressure, it feels like overkill. The temptation is to skip to "I bet it's the keychain state" and start poking at that. Sometimes you're right. But when you're wrong, you've just spent an hour in the wrong component, and you're no closer to the actual root cause than when you started.
The 3-Fix Rule
There's a specific heuristic in the skill that I think about often.
If you've tried three fixes and none have worked, you're not allowed to try a fourth. You have to stop and question the architecture.
The reasoning is that three failed fixes, each revealing a new problem in a different place, form a pattern. That pattern usually means the underlying design is wrong, not the implementation. Each patch treats a symptom that exists only because the structure keeps generating it.
This is uncomfortable to acknowledge mid-sprint. The skill names the rationalizations we use to avoid it: "just one more fix attempt," "each fix is getting closer," "this is taking so long we can't afford a refactor now."
The protocol says: at three failures, the cost of not questioning the architecture is higher than the cost of stopping.
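The rule is simple enough to express as a counter with a hard stop. The sketch below is a hypothetical helper, not something the skill ships. The names `FixLog` and `ArchitectureReviewRequired` are invented for illustration, and the threshold of three is the only detail taken from the skill.

```python
class ArchitectureReviewRequired(Exception):
    """Raised when another fix attempt is no longer allowed."""


class FixLog:
    MAX_FAILED_FIXES = 3  # the threshold named by the 3-Fix Rule

    def __init__(self) -> None:
        self.failed = []  # descriptions of fixes that did not work

    def record_failure(self, description: str) -> None:
        self.failed.append(description)

    def before_next_fix(self) -> None:
        """Call this before attempting a fix; it raises once the limit is hit."""
        if len(self.failed) >= self.MAX_FAILED_FIXES:
            tried = "; ".join(self.failed)
            raise ArchitectureReviewRequired(
                f"{len(self.failed)} fixes failed ({tried}). "
                "Stop and question the architecture."
            )


log = FixLog()
for attempt in ["reset keychain", "pin tool version", "add retry"]:
    log.before_next_fix()  # allowed for attempts one through three
    log.record_failure(attempt)

try:
    log.before_next_fix()  # the fourth attempt is blocked
except ArchitectureReviewRequired as err:
    print(err)
```

Keeping the descriptions of the failed fixes, rather than only a count, matters here: the list of what was tried and where each attempt broke is the evidence you bring to the architecture discussion.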
The Mental Habit It Builds
What systematic-debugging is really teaching is a form of epistemic discipline.
It's training you to distinguish between what you observe and what you infer. Between "the error message says X" and "I think X means Y." Between "I've seen this before" and "I've traced this specific failure to its origin."
Most experienced engineers have absorbed versions of this discipline informally, through expensive debugging sessions that eventually forced them to slow down. The skill makes the implicit explicit. It gives you the protocol before the expensive session — not after.
The real value isn't the four phases themselves. It's the constraint that comes before all of them: you cannot fix what you haven't understood.
I now think of systematic debugging not as a process for hard bugs, but as a check on the cognitive bias that makes easy bugs hard. The bias that makes you reach for the plausible fix instead of the proven one. The bias that feels like productivity but produces rework.
The Iron Law isn't there to slow you down. It's there to stop the thrashing.
Part of the Superpowers series — 14 specialist skills from obra/superpowers.