Debugging Claude Code: Techniques for Skill Development
Master debugging techniques for Claude Code skills including logging strategies, common error patterns, and systematic troubleshooting approaches.
Debugging AI-powered skills is fundamentally different from debugging traditional software. When your code produces unexpected results, is it a bug in your logic, a misunderstanding by the AI, or an edge case in the input? This guide provides systematic techniques for identifying and fixing issues in Claude Code skills.
The Debugging Mindset
Before diving into techniques, understand what makes AI debugging unique:
- Non-determinism: The same input may produce different outputs across runs
- Context sensitivity: Behavior changes based on conversation history and project state
- Emergent behavior: Complex skills can produce unexpected interactions between components
- Silent failures: The AI might produce plausible-looking but incorrect output
Effective debugging requires treating the AI as a collaborator whose reasoning you need to understand, not just a black box to test.
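Non-determinism in particular is worth measuring rather than guessing at. The sketch below runs the same input several times and reports how often the outputs agree; `runSkill` is a placeholder for however your harness actually invokes a skill, not a real API.

```typescript
// Sketch: quantify non-determinism by re-running one input and measuring
// agreement. `runSkill` is a stand-in for your own skill-invocation harness.
type RunSkill = (input: string) => Promise<string>;

export async function measureStability(
  runSkill: RunSkill,
  input: string,
  runs = 5
): Promise<{ outputs: string[]; stability: number }> {
  const outputs: string[] = [];
  for (let i = 0; i < runs; i++) {
    outputs.push(await runSkill(input));
  }
  // Stability = fraction of runs matching the most common output
  const counts = new Map<string, number>();
  for (const o of outputs) counts.set(o, (counts.get(o) ?? 0) + 1);
  const modal = Math.max(...counts.values());
  return { outputs, stability: modal / runs };
}
```

A stability well below 1.0 suggests the prompt underconstrains the model, which changes what "reproduce the issue" means for the rest of this guide.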
Setting Up Your Debug Environment
Enable Verbose Logging
Configure Claude Code to output detailed information:
# Set debug environment variables
export CLAUDE_DEBUG=1
export CLAUDE_LOG_LEVEL=debug
# Run with verbose output
claude --verbose --skill your-skill "test input"
Create a Debug Configuration
Add a debug profile to your skill:
# SKILL.md debug configuration
debug:
enabled: true
logPrompts: true
logResponses: true
logToolCalls: true
outputDir: "./.debug-logs"
retainLogs: 50 # Keep last 50 sessions
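If your tooling reads this block programmatically, a small loader with defaults keeps a missing field from disabling debug runs. The `DebugConfig` shape below simply mirrors the fields in the snippet above; it is a sketch, not an official schema.

```typescript
// Sketch: apply defaults for the debug fields shown above. Field names
// mirror the example config and are assumptions, not an official schema.
interface DebugConfig {
  enabled: boolean;
  logPrompts: boolean;
  logResponses: boolean;
  logToolCalls: boolean;
  outputDir: string;
  retainLogs: number;
}

export function withDebugDefaults(partial: Partial<DebugConfig>): DebugConfig {
  return {
    enabled: false,
    logPrompts: false,
    logResponses: false,
    logToolCalls: false,
    outputDir: './.debug-logs',
    retainLogs: 50,
    ...partial, // explicit values win over defaults
  };
}
```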
Use a Structured Logging Approach
// lib/debug/logger.ts
type LogLevel = 'debug' | 'info' | 'warn' | 'error';
interface LogEntry {
timestamp: string;
level: LogLevel;
component: string;
message: string;
data?: Record<string, unknown>;
}
class SkillLogger {
private entries: LogEntry[] = [];
private minLevel: LogLevel = 'info';
constructor(private component: string) {
if (process.env.CLAUDE_DEBUG) {
this.minLevel = 'debug';
}
}
private shouldLog(level: LogLevel): boolean {
const levels: LogLevel[] = ['debug', 'info', 'warn', 'error'];
return levels.indexOf(level) >= levels.indexOf(this.minLevel);
}
log(level: LogLevel, message: string, data?: Record<string, unknown>) {
if (!this.shouldLog(level)) return;
const entry: LogEntry = {
timestamp: new Date().toISOString(),
level,
component: this.component,
message,
data,
};
this.entries.push(entry);
// Format for console
const prefix = `[${entry.timestamp}] [${level.toUpperCase()}] [${this.component}]`;
console.log(`${prefix} ${message}`);
if (data) {
console.log(JSON.stringify(data, null, 2));
}
}
debug(message: string, data?: Record<string, unknown>) {
this.log('debug', message, data);
}
info(message: string, data?: Record<string, unknown>) {
this.log('info', message, data);
}
warn(message: string, data?: Record<string, unknown>) {
this.log('warn', message, data);
}
error(message: string, data?: Record<string, unknown>) {
this.log('error', message, data);
}
getEntries(): LogEntry[] {
return [...this.entries];
}
exportToFile(filepath: string) {
const fs = require('fs');
fs.writeFileSync(filepath, JSON.stringify(this.entries, null, 2));
}
}
export function createLogger(component: string): SkillLogger {
return new SkillLogger(component);
}
Common Error Patterns and Solutions
Pattern 1: Context Window Overflow
Symptoms:
- Skill works with small inputs but fails with larger ones
- Truncated or incomplete outputs
- Error messages about token limits
Diagnosis:
// lib/debug/context-analyzer.ts
export function analyzeContextUsage(
prompt: string,
response: string
): {
promptTokens: number;
responseTokens: number;
totalTokens: number;
percentUsed: number;
} {
// Approximate token count (rough estimate: 1 token ~ 4 chars)
const estimateTokens = (text: string) => Math.ceil(text.length / 4);
const promptTokens = estimateTokens(prompt);
const responseTokens = estimateTokens(response);
const totalTokens = promptTokens + responseTokens;
const maxTokens = 200000; // Claude's context window
return {
promptTokens,
responseTokens,
totalTokens,
percentUsed: (totalTokens / maxTokens) * 100,
};
}
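Built on the same rough 4-characters-per-token heuristic, a small guard can fail fast before a request is even sent. The thresholds below are illustrative.

```typescript
// Sketch: pre-flight budget check using the ~4 chars/token estimate above.
// Thresholds are illustrative, not limits from any official API.
export function checkContextBudget(
  prompt: string,
  maxTokens = 200000,
  warnRatio = 0.8
): 'ok' | 'warn' | 'over' {
  const tokens = Math.ceil(prompt.length / 4);
  if (tokens > maxTokens) return 'over';
  if (tokens > maxTokens * warnRatio) return 'warn';
  return 'ok';
}
```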
Solution:
// Implement context management
export function trimContextToFit(
systemPrompt: string,
userMessages: string[],
maxTokens: number
): string[] {
const estimateTokens = (text: string) => Math.ceil(text.length / 4);
let currentTokens = estimateTokens(systemPrompt);
const result: string[] = [];
// Prioritize recent messages
for (let i = userMessages.length - 1; i >= 0; i--) {
const messageTokens = estimateTokens(userMessages[i]);
if (currentTokens + messageTokens > maxTokens) {
break;
}
result.unshift(userMessages[i]);
currentTokens += messageTokens;
}
return result;
}
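To see the recency-first trimming in action, here it is as a standalone run (the helper is repeated so the snippet is self-contained): with a tiny budget, the oldest messages are the ones dropped.

```typescript
// The recency-first trimming above, repeated so this snippet runs on its own.
function trimContextToFit(
  systemPrompt: string,
  userMessages: string[],
  maxTokens: number
): string[] {
  const estimateTokens = (text: string) => Math.ceil(text.length / 4);
  let currentTokens = estimateTokens(systemPrompt);
  const result: string[] = [];
  // Walk backwards so the most recent messages are kept first
  for (let i = userMessages.length - 1; i >= 0; i--) {
    const messageTokens = estimateTokens(userMessages[i]);
    if (currentTokens + messageTokens > maxTokens) break;
    result.unshift(userMessages[i]);
    currentTokens += messageTokens;
  }
  return result;
}

// With a 3-token budget, only the two most recent messages fit
const kept = trimContextToFit('sys', ['aaaa', 'bbbb', 'cccc'], 3);
console.log(kept); // ['bbbb', 'cccc']
```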
Pattern 2: Tool Call Failures
Symptoms:
- Skill attempts to use tools but they fail silently
- Wrong tool selected for the task
- Tool arguments malformed
Diagnosis:
// lib/debug/tool-inspector.ts
interface ToolCall {
name: string;
arguments: Record<string, unknown>;
result?: unknown;
error?: string;
}
export function inspectToolCalls(session: ToolCall[]) {
console.log('=== Tool Call Analysis ===\n');
for (const call of session) {
console.log(`Tool: ${call.name}`);
console.log(`Arguments: ${JSON.stringify(call.arguments, null, 2)}`);
if (call.error) {
console.log(`ERROR: ${call.error}`);
// Common error patterns
if (call.error.includes('not found')) {
console.log(' -> Suggestion: Check if file/path exists');
}
if (call.error.includes('permission')) {
console.log(' -> Suggestion: Check sandbox configuration');
}
if (call.error.includes('timeout')) {
console.log(' -> Suggestion: Increase timeout or optimize operation');
}
} else {
console.log(`Result: ${JSON.stringify(call.result)?.substring(0, 200)}...`);
}
console.log('---');
}
}
Solution:
// Add tool call validation
export function validateToolCall(
toolName: string,
args: Record<string, unknown>
): { valid: boolean; errors: string[] } {
const errors: string[] = [];
// Tool-specific validation
switch (toolName) {
case 'read_file':
if (!args.path || typeof args.path !== 'string') {
errors.push('read_file requires a string path argument');
}
break;
case 'write_file':
if (!args.path || !args.content) {
errors.push('write_file requires path and content arguments');
}
break;
case 'bash':
if (!args.command || typeof args.command !== 'string') {
errors.push('bash requires a string command argument');
}
break;
}
return { valid: errors.length === 0, errors };
}
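A table-driven variant of the same idea scales better: each tool registers its own rule, so adding a tool does not grow a switch statement. The tool names below just mirror the examples above and are illustrative.

```typescript
// Sketch: table-driven tool-call validation. Tool names mirror the
// switch-based example above and are illustrative, not an official tool list.
type ArgValidator = (args: Record<string, unknown>) => string[];

const validators: Record<string, ArgValidator> = {
  read_file: (a) =>
    typeof a.path === 'string' ? [] : ['read_file requires a string path argument'],
  write_file: (a) =>
    a.path != null && a.content != null
      ? []
      : ['write_file requires path and content arguments'],
  bash: (a) =>
    typeof a.command === 'string' ? [] : ['bash requires a string command argument'],
};

export function validateToolCallTable(
  toolName: string,
  args: Record<string, unknown>
): { valid: boolean; errors: string[] } {
  // Tools without a registered rule pass through unvalidated
  const errors = validators[toolName]?.(args) ?? [];
  return { valid: errors.length === 0, errors };
}
```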
Pattern 3: Prompt Misinterpretation
Symptoms:
- Skill does something different from what was requested
- Outputs are technically correct but not what user wanted
- Inconsistent behavior across similar inputs
Diagnosis:
// lib/debug/prompt-analyzer.ts
export function analyzePromptClarity(prompt: string): {
score: number;
issues: string[];
suggestions: string[];
} {
const issues: string[] = [];
const suggestions: string[] = [];
// Check for ambiguous language (whole-word match avoids false positives
// like "it" inside "with" or "editor")
const ambiguousTerms = ['it', 'this', 'that', 'the thing', 'stuff'];
for (const term of ambiguousTerms) {
if (new RegExp(`\\b${term}\\b`, 'i').test(prompt)) {
issues.push(`Ambiguous term found: "${term}"`);
suggestions.push(`Replace "${term}" with specific references`);
}
}
// Check for missing context
if (!prompt.includes('file') && !prompt.includes('code') && !prompt.includes('project')) {
issues.push('Prompt may lack context about what to operate on');
suggestions.push('Explicitly state the target (file, function, project)');
}
// Check for conflicting instructions
if (/\b(but|however|except)\b/i.test(prompt)) {
issues.push('Prompt contains potential conflicts');
suggestions.push('Separate into clear, non-conflicting steps');
}
// Check length
if (prompt.length > 2000) {
issues.push('Prompt may be too long');
suggestions.push('Break into smaller, focused requests');
}
const score = Math.max(0, 100 - issues.length * 20);
return { score, issues, suggestions };
}
Solution:
<!-- Improve your skill's system prompt -->
## Clear Instructions Template
You are a [specific role] that [specific task].
### Input Format
You will receive:
1. [First input type] - [description]
2. [Second input type] - [description]
### Output Format
You must respond with:
1. [First output element] - [format specification]
2. [Second output element] - [format specification]
### Constraints
- ALWAYS [specific behavior]
- NEVER [prohibited behavior]
- When uncertain, [fallback behavior]
### Examples
Input: [example input]
Output: [example output]
Pattern 4: State Management Issues
Symptoms:
- Skill works the first time but fails on subsequent runs
- Cached data causes incorrect behavior
- Race conditions in async operations
Diagnosis:
// lib/debug/state-inspector.ts
interface StateSnapshot {
timestamp: string;
state: Record<string, unknown>;
trigger: string;
}
class StateDebugger {
private snapshots: StateSnapshot[] = [];
captureState(state: Record<string, unknown>, trigger: string) {
this.snapshots.push({
timestamp: new Date().toISOString(),
state: JSON.parse(JSON.stringify(state)), // Deep clone
trigger,
});
}
compareSnapshots(index1: number, index2: number): {
added: string[];
removed: string[];
changed: string[];
} {
const s1 = this.snapshots[index1]?.state || {};
const s2 = this.snapshots[index2]?.state || {};
const added = Object.keys(s2).filter(k => !(k in s1));
const removed = Object.keys(s1).filter(k => !(k in s2));
const changed = Object.keys(s1)
.filter(k => k in s2)
.filter(k => JSON.stringify(s1[k]) !== JSON.stringify(s2[k]));
return { added, removed, changed };
}
printTimeline() {
console.log('=== State Timeline ===\n');
for (const snapshot of this.snapshots) {
console.log(`[${snapshot.timestamp}] ${snapshot.trigger}`);
console.log(JSON.stringify(snapshot.state, null, 2));
console.log('---');
}
}
}
export const stateDebugger = new StateDebugger();
Solution:
// Implement proper state isolation
export function createIsolatedSession<T extends Record<string, unknown>>(
initialState: T
): {
getState: () => T;
setState: (updates: Partial<T>) => void;
reset: () => void;
} {
let state = { ...initialState };
return {
getState: () => ({ ...state }),
setState: (updates) => {
state = { ...state, ...updates };
},
reset: () => {
state = { ...initialState };
},
};
}
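A quick usage sketch makes the isolation guarantee concrete (the helper is repeated here so the snippet runs standalone): each session holds its own copy of the state, so one run's mutations cannot leak into the next.

```typescript
// Usage sketch for session isolation (helper repeated to be self-contained).
function createIsolatedSession<T extends Record<string, unknown>>(initialState: T) {
  let state = { ...initialState };
  return {
    getState: () => ({ ...state }), // callers get a copy, not a live reference
    setState: (updates: Partial<T>) => {
      state = { ...state, ...updates };
    },
    reset: () => {
      state = { ...initialState };
    },
  };
}

const a = createIsolatedSession({ count: 0 });
const b = createIsolatedSession({ count: 0 });
a.setState({ count: 5 });
// b is unaffected: b.getState().count is still 0
```

Note that the spread copies are shallow; deeply nested state would still need a deep clone (as in the StateDebugger above) to be fully isolated.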
Pattern 5: Output Parsing Failures
Symptoms:
- Raw AI output appears in user-facing results
- JSON parsing errors
- Missing or malformed structured data
Diagnosis:
// lib/debug/output-debugger.ts
export function debugOutputParsing(rawOutput: string): {
format: 'json' | 'markdown' | 'code' | 'text' | 'mixed';
parseErrors: string[];
extractedBlocks: { type: string; content: string }[];
} {
const parseErrors: string[] = [];
const extractedBlocks: { type: string; content: string }[] = [];
// Try JSON parsing
try {
JSON.parse(rawOutput);
return { format: 'json', parseErrors: [], extractedBlocks: [] };
} catch (e) {
if (rawOutput.includes('{') || rawOutput.includes('[')) {
parseErrors.push(`Invalid JSON: ${(e as Error).message}`);
}
}
// Extract code blocks
const codeBlockRegex = /```(\w+)?\n([\s\S]*?)```/g;
let match;
while ((match = codeBlockRegex.exec(rawOutput)) !== null) {
extractedBlocks.push({
type: match[1] || 'code',
content: match[2],
});
}
// Determine format
let format: 'markdown' | 'code' | 'text' | 'mixed' = 'text';
if (extractedBlocks.length > 0) {
format = rawOutput.replace(codeBlockRegex, '').trim().length > 100
? 'mixed'
: 'code';
} else if (rawOutput.includes('#') || rawOutput.includes('**')) {
format = 'markdown';
}
return { format, parseErrors, extractedBlocks };
}
Solution:
// Implement robust output parsing
export function parseSkillOutput<T>(
rawOutput: string,
schema: {
type: 'json' | 'code' | 'text';
validate?: (data: unknown) => data is T;
}
): { success: boolean; data?: T; error?: string } {
try {
switch (schema.type) {
case 'json': {
// Extract JSON from markdown code blocks if present
const jsonMatch = rawOutput.match(/```json\n([\s\S]*?)```/);
const jsonStr = jsonMatch ? jsonMatch[1] : rawOutput;
const parsed = JSON.parse(jsonStr);
if (schema.validate && !schema.validate(parsed)) {
return { success: false, error: 'Output failed validation' };
}
return { success: true, data: parsed };
}
case 'code': {
const codeMatch = rawOutput.match(/```\w*\n([\s\S]*?)```/);
if (!codeMatch) {
return { success: false, error: 'No code block found in output' };
}
return { success: true, data: codeMatch[1] as unknown as T };
}
case 'text':
default:
return { success: true, data: rawOutput.trim() as unknown as T };
}
} catch (e) {
return { success: false, error: (e as Error).message };
}
}
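When parsing still fails, one common recovery is to hand the malformed output back to the model and ask it to reformat. The sketch below shows that loop; `callAI` is a stand-in for your own client, and the retry prompt wording is just an example.

```typescript
// Sketch: retry parsing by asking the model to repair its own output.
// `callAI` is a placeholder for your AI client, not a real API.
export async function parseWithRetry<T>(
  callAI: (prompt: string) => Promise<string>,
  rawOutput: string,
  parse: (s: string) => T,
  maxAttempts = 2
): Promise<T> {
  let current = rawOutput;
  for (let attempt = 0; ; attempt++) {
    try {
      return parse(current);
    } catch (e) {
      if (attempt >= maxAttempts) throw e; // give up after maxAttempts repairs
      current = await callAI(
        `The following output failed to parse (${(e as Error).message}). ` +
          `Return ONLY corrected, valid output with no commentary:\n${current}`
      );
    }
  }
}
```

Cap the attempts: each retry costs a model call, and output that fails twice usually points at a prompt problem rather than a formatting one.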
Systematic Debugging Workflow
Step 1: Reproduce the Issue
Create a minimal reproduction case:
// tests/debug/reproduce-issue.ts
describe('Issue #42: Skill fails with unicode input', () => {
it('should reproduce the issue', async () => {
const problematicInput = 'Process this: café';
// Capture full trace
const trace = await runSkillWithTracing('my-skill', {
input: problematicInput,
capturePrompts: true,
captureResponses: true,
});
// Save trace for analysis
fs.writeFileSync(
'./debug/issue-42-trace.json',
JSON.stringify(trace, null, 2)
);
// The failing assertion
expect(trace.output).toContain('café');
});
});
Step 2: Isolate the Failure Point
Use binary search to narrow down where the failure occurs:
// lib/debug/bisect.ts
export async function bisectFailure(
steps: (() => Promise<boolean>)[]
): Promise<number> {
let low = 0;
let high = steps.length - 1;
while (low < high) {
const mid = Math.floor((low + high) / 2);
console.log(`Testing step ${mid}...`);
const success = await steps[mid]();
if (success) {
low = mid + 1;
} else {
high = mid;
}
}
console.log(`Failure occurs at step ${low}`);
return low;
}
// Usage
const steps = [
async () => validateInput(input),
async () => buildPrompt(input),
async () => callAI(prompt),
async () => parseOutput(response),
async () => applyChanges(parsed),
];
const failingStep = await bisectFailure(steps);
Step 3: Compare Working vs. Failing Cases
// lib/debug/diff-analyzer.ts
interface RunResult {
input: unknown;
output: unknown;
success: boolean;
trace: unknown[];
}
export function compareRuns(
working: RunResult,
failing: RunResult
): {
inputDiff: string[];
traceDiff: string[];
hypothesis: string;
} {
const inputDiff: string[] = [];
const traceDiff: string[] = [];
// Compare inputs
const workingInput = JSON.stringify(working.input);
const failingInput = JSON.stringify(failing.input);
if (workingInput !== failingInput) {
inputDiff.push(`Input differs:`);
inputDiff.push(` Working: ${workingInput.substring(0, 100)}`);
inputDiff.push(` Failing: ${failingInput.substring(0, 100)}`);
}
// Compare trace lengths
if (working.trace.length !== failing.trace.length) {
traceDiff.push(
`Trace length differs: ${working.trace.length} vs ${failing.trace.length}`
);
}
// Find first divergence in trace
for (let i = 0; i < Math.min(working.trace.length, failing.trace.length); i++) {
if (JSON.stringify(working.trace[i]) !== JSON.stringify(failing.trace[i])) {
traceDiff.push(`First divergence at step ${i}`);
traceDiff.push(` Working: ${JSON.stringify(working.trace[i])}`);
traceDiff.push(` Failing: ${JSON.stringify(failing.trace[i])}`);
break;
}
}
// Generate hypothesis
let hypothesis = 'Unknown cause';
if (inputDiff.some(d => d.includes('unicode') || d.includes('\\u'))) {
hypothesis = 'Unicode handling issue';
} else if (traceDiff.some(d => d.includes('timeout'))) {
hypothesis = 'Performance/timeout issue';
} else if (traceDiff.some(d => d.includes('parse'))) {
hypothesis = 'Output parsing issue';
}
return { inputDiff, traceDiff, hypothesis };
}
Step 4: Test Your Fix
// tests/regression/issue-42.test.ts
describe('Regression: Issue #42', () => {
// Test the specific case that was failing
it('should handle unicode input correctly', async () => {
const inputs = [
'Process: café',
'Handle: naïve résumé',
'Parse: Héllo wörld',
'Analyze: 日本語テキスト',
];
for (const input of inputs) {
const result = await runSkill('my-skill', { input });
expect(result.success).toBe(true);
expect(result.output).toBeDefined();
}
});
// Ensure fix doesn't break existing functionality
it('should still handle ASCII input', async () => {
const result = await runSkill('my-skill', {
input: 'Process: hello world',
});
expect(result.success).toBe(true);
});
});
Advanced Debugging Techniques
AI Behavior Probing
Ask the AI to explain its reasoning:
// lib/debug/behavior-probe.ts
export async function probeAIBehavior(
skill: string,
input: string
): Promise<{
reasoning: string;
confidence: number;
alternatives: string[];
}> {
const probePrompt = `
You are about to process the following input with the "${skill}" skill:
${input}
Before processing, explain:
1. What do you understand this request to mean?
2. What steps will you take?
3. What could go wrong?
4. How confident are you (0-100)?
5. What alternative interpretations exist?
Format your response as JSON.
`;
const response = await callAI(probePrompt);
// The model may wrap the JSON in prose; extract the first object literal
const start = response.indexOf('{');
const end = response.lastIndexOf('}');
return JSON.parse(start >= 0 && end > start ? response.slice(start, end + 1) : response);
}
Session Recording and Replay
Record and replay skill runs for debugging:
// lib/debug/recorder.ts
interface RecordedRun {
id: string;
timestamp: string;
skill: string;
input: unknown;
aiCalls: {
prompt: string;
response: string;
}[];
toolCalls: {
tool: string;
args: unknown;
result: unknown;
}[];
output: unknown;
}
class SessionRecorder {
private recordings: RecordedRun[] = [];
startRecording(skill: string, input: unknown): string {
const id = crypto.randomUUID();
this.recordings.push({
id,
timestamp: new Date().toISOString(),
skill,
input,
aiCalls: [],
toolCalls: [],
output: undefined,
});
return id;
}
recordAICall(id: string, prompt: string, response: string) {
const recording = this.recordings.find(r => r.id === id);
if (recording) {
recording.aiCalls.push({ prompt, response });
}
}
recordToolCall(id: string, tool: string, args: unknown, result: unknown) {
const recording = this.recordings.find(r => r.id === id);
if (recording) {
recording.toolCalls.push({ tool, args, result });
}
}
finishRecording(id: string, output: unknown) {
const recording = this.recordings.find(r => r.id === id);
if (recording) {
recording.output = output;
}
}
async replay(
id: string,
options: { mockAI?: boolean; mockTools?: boolean } = {}
): Promise<unknown> {
const recording = this.recordings.find(r => r.id === id);
if (!recording) throw new Error(`Recording ${id} not found`);
// Replay with optional mocking
// This allows testing code paths without actual AI/tool calls
console.log(`Replaying session ${id}...`);
// Implementation details would go here
return recording.output;
}
save(filepath: string) {
const fs = require('fs');
fs.writeFileSync(filepath, JSON.stringify(this.recordings, null, 2));
}
}
export const recorder = new SessionRecorder();
Debugging Checklist
When debugging a skill issue:
- Reproduce: Can you reliably reproduce the issue?
- Isolate: Is it an AI issue, code issue, or integration issue?
- Log: Have you captured prompts, responses, and tool calls?
- Compare: Do you have a working case to compare against?
- Simplify: Can you reproduce with a simpler input?
- Validate: Is the input valid? Is the output parseable?
- Context: Is context window being exceeded?
- State: Is there stale state affecting behavior?
- Test: Have you written a test for the fix?
- Regress: Does the fix break anything else?
Conclusion
Debugging Claude Code skills requires a combination of traditional software debugging techniques and AI-specific strategies. The key principles are:
- Instrument everything - You cannot debug what you cannot see
- Reproduce reliably - Flaky issues are nearly impossible to fix
- Isolate systematically - Use binary search to find failure points
- Compare working vs. failing - Differences reveal root causes
- Probe AI behavior - Ask the AI to explain its reasoning
- Test the fix - Ensure the fix works and does not break other things
With these techniques, you can systematically identify and fix issues in even the most complex skills.
Want to make your skills faster? Continue to Performance Optimization for speed-up strategies.