Debugging Claude Code: Techniques for Skill Development
Master debugging techniques for Claude Code skills including logging strategies, common error patterns, and systematic troubleshooting approaches.
Debugging AI-powered skills is fundamentally different from debugging traditional software. When your code produces unexpected results, is it a bug in your logic, a misunderstanding by the AI, or an edge case in the input? This guide provides systematic techniques for identifying and fixing issues in Claude Code skills.
The Debugging Mindset
Before diving into techniques, understand what makes AI debugging unique:
- Non-determinism: The same input may produce different outputs across runs
- Context sensitivity: Behavior changes based on conversation history and project state
- Emergent behavior: Complex skills can produce unexpected interactions between components
- Silent failures: The AI might produce plausible-looking but incorrect output
Effective debugging requires treating the AI as a collaborator whose reasoning you need to understand, not just a black box to test.
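Non-determinism in particular is worth measuring rather than guessing at. The sketch below runs the same input several times and reports how often the outputs agree; `runSkill` is a placeholder for however your harness actually invokes a skill, not a real API.

```typescript
// Sketch: quantify non-determinism by re-running one input and measuring
// agreement. `runSkill` is a stand-in for your own skill-invocation harness.
type RunSkill = (input: string) => Promise<string>;

export async function measureStability(
  runSkill: RunSkill,
  input: string,
  runs = 5
): Promise<{ outputs: string[]; stability: number }> {
  const outputs: string[] = [];
  for (let i = 0; i < runs; i++) {
    outputs.push(await runSkill(input));
  }
  // Stability = fraction of runs matching the most common output
  const counts = new Map<string, number>();
  for (const o of outputs) counts.set(o, (counts.get(o) ?? 0) + 1);
  const modal = Math.max(...counts.values());
  return { outputs, stability: modal / runs };
}
```

A stability well below 1.0 suggests the prompt underconstrains the model, which changes what "reproduce the issue" means for the rest of this guide.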
Setting Up Your Debug Environment
Enable Verbose Logging
Configure Claude Code to output detailed information:
# Set debug environment variables
export CLAUDE_DEBUG=1
export CLAUDE_LOG_LEVEL=debug
# Run with verbose output
claude --verbose --skill your-skill "test input"
Create a Debug Configuration
Add a debug profile to your skill:
# SKILL.md debug configuration
debug:
enabled: true
logPrompts: true
logResponses: true
logToolCalls: true
outputDir: "./.debug-logs"
retainLogs: 50 # Keep last 50 sessions
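If your tooling reads this block programmatically, a small loader with defaults keeps a missing field from disabling debug runs. The `DebugConfig` shape below simply mirrors the fields in the snippet above; it is a sketch, not an official schema.

```typescript
// Sketch: apply defaults for the debug fields shown above. Field names
// mirror the example config and are assumptions, not an official schema.
interface DebugConfig {
  enabled: boolean;
  logPrompts: boolean;
  logResponses: boolean;
  logToolCalls: boolean;
  outputDir: string;
  retainLogs: number;
}

export function withDebugDefaults(partial: Partial<DebugConfig>): DebugConfig {
  return {
    enabled: false,
    logPrompts: false,
    logResponses: false,
    logToolCalls: false,
    outputDir: './.debug-logs',
    retainLogs: 50,
    ...partial, // explicit values win over defaults
  };
}
```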
Use a Structured Logging Approach
// lib/debug/logger.ts
type LogLevel = 'debug' | 'info' | 'warn' | 'error';
interface LogEntry {
timestamp: string;
level: LogLevel;
component: string;
message: string;
data?: Record<string, unknown>;
}
class SkillLogger {
private entries: LogEntry[] = [];
private minLevel: LogLevel = 'info';
constructor(private component: string) {
if (process.env.CLAUDE_DEBUG) {
this.minLevel = 'debug';
}
}
private shouldLog(level: LogLevel): boolean {
const levels: LogLevel[] = ['debug', 'info', 'warn', 'error'];
return levels.indexOf(level) >= levels.indexOf(this.minLevel);
}
log(level: LogLevel, message: string, data?: Record<string, unknown>) {
if (!this.shouldLog(level)) return;
const entry: LogEntry = {
timestamp: new Date().toISOString(),
level,
component: this.component,
message,
data,
};
this.entries.push(entry);
// Format for console
const prefix = `[${entry.timestamp}] [${level.toUpperCase()}] [${this.component}]`;
console.log(`${prefix} ${message}`);
if (data) {
console.log(JSON.stringify(data, null, 2));
}
}
debug(message: string, data?: Record<string, unknown>) {
this.log('debug', message, data);
}
info(message: string, data?: Record<string, unknown>) {
this.log('info', message, data);
}
warn(message: string, data?: Record<string, unknown>) {
this.log('warn', message, data);
}
error(message: string, data?: Record<string, unknown>) {
this.log('error', message, data);
}
getEntries(): LogEntry[] {
return [...this.entries];
}
exportToFile(filepath: string) {
const fs = require('fs');
fs.writeFileSync(filepath, JSON.stringify(this.entries, null, 2));
}
}
export function createLogger(component: string): SkillLogger {
return new SkillLogger(component);
}
Common Error Patterns and Solutions
Pattern 1: Context Window Overflow
Symptoms:
- Skill works with small inputs but fails with larger ones
- Truncated or incomplete outputs
- Error messages about token limits
Diagnosis:
// lib/debug/context-analyzer.ts
export function analyzeContextUsage(
prompt: string,
response: string
): {
promptTokens: number;
responseTokens: number;
totalTokens: number;
percentUsed: number;
} {
// Approximate token count (rough estimate: 1 token ~ 4 chars)
const estimateTokens = (text: string) => Math.ceil(text.length / 4);
const promptTokens = estimateTokens(prompt);
const responseTokens = estimateTokens(response);
const totalTokens = promptTokens + responseTokens;
const maxTokens = 200000; // Claude's context window
return {
promptTokens,
responseTokens,
totalTokens,
percentUsed: (totalTokens / maxTokens) * 100,
};
}
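Built on the same rough 4-characters-per-token heuristic, a small guard can fail fast before a request is even sent. The thresholds below are illustrative.

```typescript
// Sketch: pre-flight budget check using the ~4 chars/token estimate above.
// Thresholds are illustrative, not limits from any official API.
export function checkContextBudget(
  prompt: string,
  maxTokens = 200000,
  warnRatio = 0.8
): 'ok' | 'warn' | 'over' {
  const tokens = Math.ceil(prompt.length / 4);
  if (tokens > maxTokens) return 'over';
  if (tokens > maxTokens * warnRatio) return 'warn';
  return 'ok';
}
```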
Solution:
// Implement context management
export function trimContextToFit(
systemPrompt: string,
userMessages: string[],
maxTokens: number
): string[] {
const estimateTokens = (text: string) => Math.ceil(text.length / 4);
let currentTokens = estimateTokens(systemPrompt);
const result: string[] = [];
// Prioritize recent messages
for (let i = userMessages.length - 1; i >= 0; i--) {
const messageTokens = estimateTokens(userMessages[i]);
if (currentTokens + messageTokens > maxTokens) {
break;
}
result.unshift(userMessages[i]);
currentTokens += messageTokens;
}
return result;
}
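To see the recency-first trimming in action, here it is as a standalone run (the helper is repeated so the snippet is self-contained): with a tiny budget, the oldest messages are the ones dropped.

```typescript
// The recency-first trimming above, repeated so this snippet runs on its own.
function trimContextToFit(
  systemPrompt: string,
  userMessages: string[],
  maxTokens: number
): string[] {
  const estimateTokens = (text: string) => Math.ceil(text.length / 4);
  let currentTokens = estimateTokens(systemPrompt);
  const result: string[] = [];
  // Walk backwards so the most recent messages are kept first
  for (let i = userMessages.length - 1; i >= 0; i--) {
    const messageTokens = estimateTokens(userMessages[i]);
    if (currentTokens + messageTokens > maxTokens) break;
    result.unshift(userMessages[i]);
    currentTokens += messageTokens;
  }
  return result;
}

// With a 3-token budget, only the two most recent messages fit
const kept = trimContextToFit('sys', ['aaaa', 'bbbb', 'cccc'], 3);
console.log(kept); // ['bbbb', 'cccc']
```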
Pattern 2: Tool Call Failures
Symptoms:
- Skill attempts to use tools but they fail silently
- Wrong tool selected for the task
- Tool arguments malformed
Diagnosis:
// lib/debug/tool-inspector.ts
interface ToolCall {
name: string;
arguments: Record<string, unknown>;
result?: unknown;
error?: string;
}
export function inspectToolCalls(session: ToolCall[]) {
console.log('=== Tool Call Analysis ===\n');
for (const call of session) {
console.log(`Tool: ${call.name}`);
console.log(`Arguments: ${JSON.stringify(call.arguments, null, 2)}`);
if (call.error) {
console.log(`ERROR: ${call.error}`);
// Common error patterns
if (call.error.includes('not found')) {
console.log(' -> Suggestion: Check if file/path exists');
}
if (call.error.includes('permission')) {
console.log(' -> Suggestion: Check sandbox configuration');
}
if (call.error.includes('timeout')) {
console.log(' -> Suggestion: Increase timeout or optimize operation');
}
} else {
console.log(`Result: ${JSON.stringify(call.result)?.substring(0, 200)}...`);
}
console.log('---');
}
}
Solution:
// Add tool call validation
export function validateToolCall(
toolName: string,
args: Record<string, unknown>
): { valid: boolean; errors: string[] } {
const errors: string[] = [];
// Tool-specific validation
switch (toolName) {
case 'read_file':
if (!args.path || typeof args.path !== 'string') {
errors.push('read_file requires a string path argument');
}
break;
case 'write_file':
if (!args.path || !args.content) {
errors.push('write_file requires path and content arguments');
}
break;
case 'bash':
if (!args.command || typeof args.command !== 'string') {
errors.push('bash requires a string command argument');
}
break;
}
return { valid: errors.length === 0, errors };
}
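A table-driven variant of the same idea scales better: each tool registers its own rule, so adding a tool does not grow a switch statement. The tool names below just mirror the examples above and are illustrative.

```typescript
// Sketch: table-driven tool-call validation. Tool names mirror the
// switch-based example above and are illustrative, not an official tool list.
type ArgValidator = (args: Record<string, unknown>) => string[];

const validators: Record<string, ArgValidator> = {
  read_file: (a) =>
    typeof a.path === 'string' ? [] : ['read_file requires a string path argument'],
  write_file: (a) =>
    a.path != null && a.content != null
      ? []
      : ['write_file requires path and content arguments'],
  bash: (a) =>
    typeof a.command === 'string' ? [] : ['bash requires a string command argument'],
};

export function validateToolCallTable(
  toolName: string,
  args: Record<string, unknown>
): { valid: boolean; errors: string[] } {
  // Tools without a registered rule pass through unvalidated
  const errors = validators[toolName]?.(args) ?? [];
  return { valid: errors.length === 0, errors };
}
```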
Pattern 3: Prompt Misinterpretation
Symptoms:
- Skill does something different from what was requested
- Outputs are technically correct but not what user wanted
- Inconsistent behavior across similar inputs
Diagnosis:
// lib/debug/prompt-analyzer.ts
export function analyzePromptClarity(prompt: string): {
score: number;
issues: string[];
suggestions: string[];
} {
const issues: string[] = [];
const suggestions: string[] = [];
// Check for ambiguous language (whole-word match avoids false positives
// like "it" inside "with" or "editor")
const ambiguousTerms = ['it', 'this', 'that', 'the thing', 'stuff'];
for (const term of ambiguousTerms) {
if (new RegExp(`\\b${term}\\b`, 'i').test(prompt)) {
issues.push(`Ambiguous term found: "${term}"`);
suggestions.push(`Replace "${term}" with specific references`);
}
}
// Check for missing context
if (!prompt.includes('file') && !prompt.includes('code') && !prompt.includes('project')) {
issues.push('Prompt may lack context about what to operate on');
suggestions.push('Explicitly state the target (file, function, project)');
}
// Check for conflicting instructions
if (/\b(but|however|except)\b/i.test(prompt)) {
issues.push('Prompt contains potential conflicts');
suggestions.push('Separate into clear, non-conflicting steps');
}
// Check length
if (prompt.length > 2000) {
issues.push('Prompt may be too long');
suggestions.push('Break into smaller, focused requests');
}
const score = Math.max(0, 100 - issues.length * 20);
return { score, issues, suggestions };
}
Solution:
<!-- Improve your skill's system prompt -->
## Clear Instructions Template
You are a [specific role] that [specific task].
### Input Format
You will receive:
1. [First input type] - [description]
2. [Second input type] - [description]
### Output Format
You must respond with:
1. [First output element] - [format specification]
2. [Second output element] - [format specification]
### Constraints
- ALWAYS [specific behavior]
- NEVER [prohibited behavior]
- When uncertain, [fallback behavior]
### Examples
Input: [example input]
Output: [example output]
Pattern 4: State Management Issues
Symptoms:
- Skill works the first time but fails on subsequent runs
- Cached data causes incorrect behavior
- Race conditions in async operations
Diagnosis:
// lib/debug/state-inspector.ts
interface StateSnapshot {
timestamp: string;
state: Record<string, unknown>;
trigger: string;
}
class StateDebugger {
private snapshots: StateSnapshot[] = [];
captureState(state: Record<string, unknown>, trigger: string) {
this.snapshots.push({
timestamp: new Date().toISOString(),
state: JSON.parse(JSON.stringify(state)), // Deep clone
trigger,
});
}
compareSnapshots(index1: number, index2: number): {
added: string[];
removed: string[];
changed: string[];
} {
const s1 = this.snapshots[index1]?.state || {};
const s2 = this.snapshots[index2]?.state || {};
const added = Object.keys(s2).filter(k => !(k in s1));
const removed = Object.keys(s1).filter(k => !(k in s2));
const changed = Object.keys(s1)
.filter(k => k in s2)
.filter(k => JSON.stringify(s1[k]) !== JSON.stringify(s2[k]));
return { added, removed, changed };
}
printTimeline() {
console.log('=== State Timeline ===\n');
for (const snapshot of this.snapshots) {
console.log(`[${snapshot.timestamp}] ${snapshot.trigger}`);
console.log(JSON.stringify(snapshot.state, null, 2));
console.log('---');
}
}
}
export const stateDebugger = new StateDebugger();
Solution:
// Implement proper state isolation
export function createIsolatedSession<T extends Record<string, unknown>>(
initialState: T
): {
getState: () => T;
setState: (updates: Partial<T>) => void;
reset: () => void;
} {
let state = { ...initialState };
return {
getState: () => ({ ...state }),
setState: (updates) => {
state = { ...state, ...updates };
},
reset: () => {
state = { ...initialState };
},
};
}
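A quick usage sketch makes the isolation guarantee concrete (the helper is repeated here so the snippet runs standalone): each session holds its own copy of the state, so one run's mutations cannot leak into the next.

```typescript
// Usage sketch for session isolation (helper repeated to be self-contained).
function createIsolatedSession<T extends Record<string, unknown>>(initialState: T) {
  let state = { ...initialState };
  return {
    getState: () => ({ ...state }), // callers get a copy, not a live reference
    setState: (updates: Partial<T>) => {
      state = { ...state, ...updates };
    },
    reset: () => {
      state = { ...initialState };
    },
  };
}

const a = createIsolatedSession({ count: 0 });
const b = createIsolatedSession({ count: 0 });
a.setState({ count: 5 });
// b is unaffected: b.getState().count is still 0
```

Note that the spread copies are shallow; deeply nested state would still need a deep clone (as in the StateDebugger above) to be fully isolated.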
Pattern 5: Output Parsing Failures
Symptoms:
- Raw AI output appears in user-facing results
- JSON parsing errors
- Missing or malformed structured data
Diagnosis:
// lib/debug/output-debugger.ts
export function debugOutputParsing(rawOutput: string): {
format: 'json' | 'markdown' | 'code' | 'text' | 'mixed';
parseErrors: string[];
extractedBlocks: { type: string; content: string }[];
} {
const parseErrors: string[] = [];
const extractedBlocks: { type: string; content: string }[] = [];
// Try JSON parsing
try {
JSON.parse(rawOutput);
return { format: 'json', parseErrors: [], extractedBlocks: [] };
} catch (e) {
if (rawOutput.includes('{') || rawOutput.includes('[')) {
parseErrors.push(`Invalid JSON: ${(e as Error).message}`);
}
}
// Extract code blocks
const codeBlockRegex = /```(\w+)?\n([\s\S]*?)```/g;
let match;
while ((match = codeBlockRegex.exec(rawOutput)) !== null) {
extractedBlocks.push({
type: match[1] || 'code',
content: match[2],
});
}
// Determine format
let format: 'markdown' | 'code' | 'text' | 'mixed' = 'text';
if (extractedBlocks.length > 0) {
format = rawOutput.replace(codeBlockRegex, '').trim().length > 100
? 'mixed'
: 'code';
} else if (rawOutput.includes('#') || rawOutput.includes('**')) {
format = 'markdown';
}
return { format, parseErrors, extractedBlocks };
}
Solution:
// Implement robust output parsing
export function parseSkillOutput<T>(
rawOutput: string,
schema: {
type: 'json' | 'code' | 'text';
validate?: (data: unknown) => data is T;
}
): { success: boolean; data?: T; error?: string } {
try {
switch (schema.type) {
case 'json': {
// Extract JSON from markdown code blocks if present
const jsonMatch = rawOutput.match(/```json\n([\s\S]*?)```/);
const jsonStr = jsonMatch ? jsonMatch[1] : rawOutput;
const parsed = JSON.parse(jsonStr);
if (schema.validate && !schema.validate(parsed)) {
return { success: false, error: 'Output failed validation' };
}
return { success: true, data: parsed };
}
case 'code': {
const codeMatch = rawOutput.match(/```\w*\n([\s\S]*?)```/);
if (!codeMatch) {
return { success: false, error: 'No code block found in output' };
}
return { success: true, data: codeMatch[1] as unknown as T };
}
case 'text':
default:
return { success: true, data: rawOutput.trim() as unknown as T };
}
} catch (e) {
return { success: false, error: (e as Error).message };
}
}
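When parsing still fails, one common recovery is to hand the malformed output back to the model and ask it to reformat. The sketch below shows that loop; `callAI` is a stand-in for your own client, and the retry prompt wording is just an example.

```typescript
// Sketch: retry parsing by asking the model to repair its own output.
// `callAI` is a placeholder for your AI client, not a real API.
export async function parseWithRetry<T>(
  callAI: (prompt: string) => Promise<string>,
  rawOutput: string,
  parse: (s: string) => T,
  maxAttempts = 2
): Promise<T> {
  let current = rawOutput;
  for (let attempt = 0; ; attempt++) {
    try {
      return parse(current);
    } catch (e) {
      if (attempt >= maxAttempts) throw e; // give up after maxAttempts repairs
      current = await callAI(
        `The following output failed to parse (${(e as Error).message}). ` +
          `Return ONLY corrected, valid output with no commentary:\n${current}`
      );
    }
  }
}
```

Cap the attempts: each retry costs a model call, and output that fails twice usually points at a prompt problem rather than a formatting one.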
Systematic Debugging Workflow
Step 1: Reproduce the Issue
Create a minimal reproduction case:
// tests/debug/reproduce-issue.ts
describe('Issue #42: Skill fails with unicode input', () => {
it('should reproduce the issue', async () => {
const problematicInput = 'Process this: café';
// Capture full trace
const trace = await runSkillWithTracing('my-skill', {
input: problematicInput,
capturePrompts: true,
captureResponses: true,
});
// Save trace for analysis
fs.writeFileSync(
'./debug/issue-42-trace.json',
JSON.stringify(trace, null, 2)
);
// The failing assertion
expect(trace.output).toContain('café');
});
});
Step 2: Isolate the Failure Point
Use binary search to narrow down where the failure occurs:
// lib/debug/bisect.ts
export async function bisectFailure(
steps: (() => Promise<boolean>)[]
): Promise<number> {
let low = 0;
let high = steps.length - 1;
while (low < high) {
const mid = Math.floor((low + high) / 2);
console.log(`Testing step ${mid}...`);
const success = await steps[mid]();
if (success) {
low = mid + 1;
} else {
high = mid;
}
}
console.log(`Failure occurs at step ${low}`);
return low;
}
// Usage
const steps = [
async () => validateInput(input),
async () => buildPrompt(input),
async () => callAI(prompt),
async () => parseOutput(response),
async () => applyChanges(parsed),
];
const failingStep = await bisectFailure(steps);
Step 3: Compare Working vs. Failing Cases
// lib/debug/diff-analyzer.ts
interface RunResult {
input: unknown;
output: unknown;
success: boolean;
trace: unknown[];
}
export function compareRuns(
working: RunResult,
failing: RunResult
): {
inputDiff: string[];
traceDiff: string[];
hypothesis: string;
} {
const inputDiff: string[] = [];
const traceDiff: string[] = [];
// Compare inputs
const workingInput = JSON.stringify(working.input);
const failingInput = JSON.stringify(failing.input);
if (workingInput !== failingInput) {
inputDiff.push(`Input differs:`);
inputDiff.push(` Working: ${workingInput.substring(0, 100)}`);
inputDiff.push(` Failing: ${failingInput.substring(0, 100)}`);
}
// Compare trace lengths
if (working.trace.length !== failing.trace.length) {
traceDiff.push(
`Trace length differs: ${working.trace.length} vs ${failing.trace.length}`
);
}
// Find first divergence in trace
for (let i = 0; i < Math.min(working.trace.length, failing.trace.length); i++) {
if (JSON.stringify(working.trace[i]) !== JSON.stringify(failing.trace[i])) {
traceDiff.push(`First divergence at step ${i}`);
traceDiff.push(` Working: ${JSON.stringify(working.trace[i])}`);
traceDiff.push(` Failing: ${JSON.stringify(failing.trace[i])}`);
break;
}
}
// Generate hypothesis
let hypothesis = 'Unknown cause';
if (inputDiff.some(d => d.includes('unicode') || d.includes('\\u'))) {
hypothesis = 'Unicode handling issue';
} else if (traceDiff.some(d => d.includes('timeout'))) {
hypothesis = 'Performance/timeout issue';
} else if (traceDiff.some(d => d.includes('parse'))) {
hypothesis = 'Output parsing issue';
}
return { inputDiff, traceDiff, hypothesis };
}
Step 4: Test Your Fix
// tests/regression/issue-42.test.ts
describe('Regression: Issue #42', () => {
// Test the specific case that was failing
it('should handle unicode input correctly', async () => {
const inputs = [
'Process: café',
'Handle: naïve résumé',
'Parse: Héllo wörld',
'Analyze: 日本語テキスト',
];
for (const input of inputs) {
const result = await runSkill('my-skill', { input });
expect(result.success).toBe(true);
expect(result.output).toBeDefined();
}
});
// Ensure fix doesn't break existing functionality
it('should still handle ASCII input', async () => {
const result = await runSkill('my-skill', {
input: 'Process: hello world',
});
expect(result.success).toBe(true);
});
});
Advanced Debugging Techniques
AI Behavior Probing
Ask the AI to explain its reasoning:
// lib/debug/behavior-probe.ts
export async function probeAIBehavior(
skill: string,
input: string
): Promise<{
reasoning: string;
confidence: number;
alternatives: string[];
}> {
const probePrompt = `
You are about to process the following input with the "${skill}" skill:
${input}
Before processing, explain:
1. What do you understand this request to mean?
2. What steps will you take?
3. What could go wrong?
4. How confident are you (0-100)?
5. What alternative interpretations exist?
Format your response as JSON.
`;
const response = await callAI(probePrompt);
// The model may wrap the JSON in prose; extract the first object literal
const start = response.indexOf('{');
const end = response.lastIndexOf('}');
return JSON.parse(start >= 0 && end > start ? response.slice(start, end + 1) : response);
}
Session Recording and Replay
Record and replay skill runs for debugging:
// lib/debug/recorder.ts
interface RecordedRun {
id: string;
timestamp: string;
skill: string;
input: unknown;
aiCalls: {
prompt: string;
response: string;
}[];
toolCalls: {
tool: string;
args: unknown;
result: unknown;
}[];
output: unknown;
}
class SessionRecorder {
private recordings: RecordedRun[] = [];
startRecording(skill: string, input: unknown): string {
const id = crypto.randomUUID();
this.recordings.push({
id,
timestamp: new Date().toISOString(),
skill,
input,
aiCalls: [],
toolCalls: [],
output: undefined,
});
return id;
}
recordAICall(id: string, prompt: string, response: string) {
const recording = this.recordings.find(r => r.id === id);
if (recording) {
recording.aiCalls.push({ prompt, response });
}
}
recordToolCall(id: string, tool: string, args: unknown, result: unknown) {
const recording = this.recordings.find(r => r.id === id);
if (recording) {
recording.toolCalls.push({ tool, args, result });
}
}
finishRecording(id: string, output: unknown) {
const recording = this.recordings.find(r => r.id === id);
if (recording) {
recording.output = output;
}
}
async replay(
id: string,
options: { mockAI?: boolean; mockTools?: boolean } = {}
): Promise<unknown> {
const recording = this.recordings.find(r => r.id === id);
if (!recording) throw new Error(`Recording ${id} not found`);
// Replay with optional mocking
// This allows testing code paths without actual AI/tool calls
console.log(`Replaying session ${id}...`);
// Implementation details would go here
return recording.output;
}
save(filepath: string) {
const fs = require('fs');
fs.writeFileSync(filepath, JSON.stringify(this.recordings, null, 2));
}
}
export const recorder = new SessionRecorder();
Debugging Checklist
When debugging a skill issue:
- Reproduce: Can you reliably reproduce the issue?
- Isolate: Is it an AI issue, code issue, or integration issue?
- Log: Have you captured prompts, responses, and tool calls?
- Compare: Do you have a working case to compare against?
- Simplify: Can you reproduce with a simpler input?
- Validate: Is the input valid? Is the output parseable?
- Context: Is context window being exceeded?
- State: Is there stale state affecting behavior?
- Test: Have you written a test for the fix?
- Regress: Does the fix break anything else?
Conclusion
Debugging Claude Code skills requires a combination of traditional software debugging techniques and AI-specific strategies. The key principles are:
- Instrument everything - You cannot debug what you cannot see
- Reproduce reliably - Flaky issues are nearly impossible to fix
- Isolate systematically - Use binary search to find failure points
- Compare working vs. failing - Differences reveal root causes
- Probe AI behavior - Ask the AI to explain its reasoning
- Test the fix - Ensure the fix works and does not break other things
With these techniques, you can systematically identify and fix issues in even the most complex skills.
Want to make your skills faster? Continue to Performance Optimization for speed-up strategies.