The 5-Hour Bug AI Solved in Minutes
A real debugging story where Claude Code found a race condition in minutes that would have taken a developer half a day. The exact prompts and process.
Some bugs are hard because the code is complex. Others are hard because they are intermittent, appear only under specific conditions, and leave misleading traces in your logs. The bug I am about to describe was the second kind. It took me two hours of manual investigation to narrow it down. Then Claude Code found the root cause in four minutes.
This is not a story about AI being smarter than humans. It is a story about AI being faster at a specific type of analysis: reading multiple files simultaneously, holding all the context, and identifying a pattern that spans several layers of the stack.
Key Takeaways
- AI excels at bugs that span multiple files because it can hold all relevant code in context simultaneously
- Race conditions are particularly well-suited to AI debugging because the fix is usually obvious once you see the interaction between two code paths
- The key to fast AI debugging is providing the right context -- error logs, the suspicious code, and the test that fails intermittently
- AI debugging works best when you have already narrowed the search space -- it is a precision tool, not a search tool
- Describing the symptoms precisely is more important than guessing at the cause -- let Claude form its own hypothesis
The Bug
The symptom: approximately 1 in 20 skill installations failed with a 500 Internal Server Error. No pattern in which skills failed. No pattern in time of day. No pattern in user behavior. The error appeared, the user retried, and it worked.
The error log showed a Supabase error: "Could not find row with id [uuid]." But the row existed -- you could query it directly and find it. The error was transient.
My Manual Investigation (2 Hours)
Hour 1: Log Analysis
I pulled the logs for the last week and filtered for the 500 errors. Forty-seven failures out of roughly 940 install attempts. About 5%.
The failing requests all had the same error: a SELECT query returned no rows for a skill ID that definitely existed in the database. The timing was not random -- failures clustered around moments of high activity, but not consistently.
I checked for caching issues. No caching was involved. I checked for replication lag -- Supabase uses PostgreSQL, and if I were using read replicas, this could explain stale reads. But I was not using read replicas.
Hour 2: Code Review
I started reading the installation flow line by line.
// Step 1: Fetch the skill
const { data: skill } = await supabase
  .from('skills')
  .select('*')
  .eq('id', skillId)
  .single()

// Step 2: Increment the install count
const { error } = await supabase
  .from('skills')
  .update({ install_count: skill.install_count + 1 })
  .eq('id', skillId)

// Step 3: Log the installation
await supabase
  .from('install_log')
  .insert({ skill_id: skillId, installed_at: new Date() })
Nothing looked wrong. The query was straightforward. The skill ID was validated before reaching this code. I was stuck.
Claude Code's Analysis (4 Minutes)
I loaded the relevant files into Claude Code and described the problem.
"I'm seeing intermittent 500 errors on skill installation. About 5% of requests fail
with 'Could not find row with id [uuid]' even though the row exists. The failures
cluster during high activity. Here are the relevant files."
I added the server action, the Supabase query, the API route, and the error logs.
Claude's response identified the issue in its first reply:
The race condition is in the install count update. When two users install the same skill simultaneously, both read the current install_count (say, 42), both compute 42 + 1 = 43, and both write 43. One install is silently lost, but neither write errors. However, the actual 500 error is not from this race condition directly. It is from a trigger function on the skills table that runs after an update. The trigger re-queries the skill by ID, but it runs in a separate transaction that occasionally sees the row in a transitional state during a concurrent update.
Claude identified two problems:
- A read-modify-write race condition on the install count
- A database trigger that was not designed for concurrent access
The first problem was a correctness issue (install counts would be slightly off). The second problem was the actual cause of the 500 errors.
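The lost-update half of the diagnosis is easy to reproduce outside the database. Here is a minimal sketch in plain TypeScript, with a hypothetical in-memory row standing in for the Supabase table (all names are illustrative, not the production code):

```typescript
// Simulate the network round-trip between the SELECT and the UPDATE.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Read-modify-write, exactly like the buggy flow: read the count,
// wait (the other request runs here), then write the stale snapshot + 1.
async function demonstrateRace(): Promise<number> {
  const row = { install_count: 42 }; // hypothetical in-memory skills row

  const install = async () => {
    const snapshot = row.install_count; // both concurrent calls read 42
    await delay(10);                    // the other request runs here
    row.install_count = snapshot + 1;   // both write 43; one install is lost
  };

  await Promise.all([install(), install()]);
  return row.install_count;
}

demonstrateRace().then((count) => console.log(count)); // logs 43, not 44
```

Both calls read 42 before either write lands, so two installs move the count by one. The same interleaving at the database level is why the naive UPDATE undercounts.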
The Fix
Claude suggested two changes:
Fix 1: Atomic Increment
Replace the read-modify-write pattern with an atomic increment:
// Before: race condition
const { data: skill } = await supabase
  .from('skills')
  .select('install_count')
  .eq('id', skillId)
  .single()

await supabase
  .from('skills')
  .update({ install_count: skill.install_count + 1 })
  .eq('id', skillId)

// After: atomic increment using RPC
await supabase.rpc('increment_install_count', { skill_id: skillId })
With the corresponding SQL function:
CREATE OR REPLACE FUNCTION increment_install_count(skill_id UUID)
RETURNS void AS $$
BEGIN
  UPDATE skills
  SET install_count = install_count + 1
  WHERE id = skill_id;
END;
$$ LANGUAGE plpgsql;
Fix 2: Trigger Hardening
The trigger function needed to handle the case where the row was being modified concurrently:
-- Add FOR UPDATE to the trigger's internal query
-- to lock the row during the trigger execution
CREATE OR REPLACE FUNCTION on_skill_update()
RETURNS trigger AS $$
DECLARE
  skill_row skills%ROWTYPE;
BEGIN
  SELECT * INTO skill_row
  FROM skills
  WHERE id = NEW.id
  FOR UPDATE SKIP LOCKED;

  IF NOT FOUND THEN
    RETURN NEW; -- Skip processing instead of erroring
  END IF;

  -- Rest of trigger logic
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Why AI Found It Faster
Three specific factors made this bug easier for AI than for a human.
Multi-File Context
The race condition involved three components: the server action, the Supabase query, and the database trigger. As a human, I was reading these sequentially -- action, then query, then (eventually) the trigger. Claude read all three simultaneously and identified the interaction.
Pattern Recognition
Claude has seen thousands of read-modify-write race conditions in its training data. The "read count, add one, write count" pattern is a classic concurrency bug that Claude recognized immediately. I was looking for something more exotic because the error message was misleading.
No Confirmation Bias
I had a theory (caching issue) that I spent an hour investigating before abandoning. Claude had no prior theory. It looked at the code, the error, and the timing pattern, and formed its hypothesis fresh. No time wasted pursuing dead ends.
When AI Debugging Works (and When It Does Not)
Works Well
Multi-file interactions. Any bug where the root cause is in a different file than the symptom benefits from AI's ability to hold multiple files in context.
Known bug patterns. Race conditions, off-by-one errors, null reference chains, incorrect async/await usage -- these are well-known patterns that AI recognizes quickly.
Configuration issues. Mismatches between environment config, database schema, and application code are easy for AI to spot when all three are loaded.
Dependency conflicts. Version mismatches, incompatible API changes, and deprecated function usage are faster to identify when AI can read both your code and the dependency's changelog.
Does Not Work Well
Performance bugs. "The page is slow" requires profiling data, not code reading. AI can suggest potential causes but cannot measure actual performance.
Environment-specific bugs. "It works on my machine but not in production" often involves infrastructure differences that AI cannot observe.
Undocumented behavior. Bugs caused by undocumented third-party behavior require experimentation, not code analysis.
For more debugging patterns with Claude Code, see our debugging techniques guide.
The Debugging Prompt Template
After this experience, I developed a template for AI-assisted debugging that consistently produces fast results.
"I'm seeing [specific symptom] in [specific context].
Frequency: [how often it happens]
Trigger: [what the user does to cause it]
Error: [exact error message]
Here are the files involved:
[load relevant files]
The error appears to come from [file/line], but the actual cause
might be elsewhere. What could explain these symptoms?"
The key elements:
- Specific symptom, not your interpretation of it
- Frequency and pattern information
- The exact error message, not a paraphrase
- All potentially relevant files loaded into context
- Acknowledgment that the error location might be misleading
That last point is crucial. If you tell Claude "the bug is in auth.ts," it will focus on auth.ts. If you tell Claude "the error appears in auth.ts but the cause might be elsewhere," it searches more broadly.
FAQ
Should I always use AI for debugging?
No. Simple bugs (typos, missing imports, wrong variable names) are faster to find yourself. AI debugging shines when the bug spans multiple files or involves non-obvious interactions. Use AI when you have been stuck for more than 20 minutes.
How much context should I provide?
More is better, up to a point. Load the files directly involved in the error, plus one level of dependencies. If the error is in a server action, load the action, the database query it calls, and the component that calls the action. Do not load your entire codebase.
What if Claude's diagnosis is wrong?
Treat it as a hypothesis, not a diagnosis. If Claude's suggestion does not fix the bug, tell it what happened and provide additional context. The second attempt is usually more accurate because Claude can eliminate its first hypothesis.
Can AI find security vulnerabilities?
Yes, with the same caveat as debugging: it finds known vulnerability patterns. SQL injection, XSS, CSRF, and insecure deserialization are patterns Claude recognizes. Novel vulnerability classes require security expertise.
How do I verify an AI-suggested fix is correct?
The same way you verify any fix: write a test that reproduces the bug, apply the fix, and confirm the test passes. For race conditions specifically, write a concurrency test that triggers the race. If the test cannot reproduce the bug, the fix cannot be verified.
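For this bug, a concurrency test can stay in-process. A sketch under the same hypothetical in-memory model as above (a real test would target a test database and call the increment_install_count RPC instead): fire N installs concurrently against the atomic version and assert that none are lost.

```typescript
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// The atomic version performs the read-modify-write in one step,
// mirroring SET install_count = install_count + 1 on the database side.
async function concurrencyTest(n: number): Promise<number> {
  const row = { install_count: 0 }; // fresh hypothetical skills row per run

  const installAtomic = async () => {
    await delay(Math.random() * 10); // jitter so requests interleave
    row.install_count += 1;          // single-step increment, no stale snapshot
  };

  await Promise.all(Array.from({ length: n }, () => installAtomic()));
  return row.install_count;
}

concurrencyTest(50).then((count) => {
  console.log(count === 50 ? "no installs lost" : `lost ${50 - count} installs`);
});
```

Run the same shape of test against the pre-fix code path and it should fail intermittently; against the atomic increment it should pass every time.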
Explore production-ready AI skills at aiskill.market/browse or submit your own skill to the marketplace.
Sources
- PostgreSQL Concurrency Control - Understanding transaction isolation and row locking
- Claude Code Documentation - Context management for debugging sessions
- Supabase Database Functions - Creating atomic database operations