Performance Optimization for Claude Code Skills
Optimize Claude Code skill performance with context management, token optimization, caching strategies, and efficient tool usage patterns.
Performance in Claude Code skills isn't just about speed—it's about cost, user experience, and reliability. A skill that takes 30 seconds and costs $0.50 per run won't see adoption, no matter how powerful it is. This guide covers practical techniques for building fast, efficient skills.
Understanding Performance Bottlenecks
Before optimizing, understand where time and tokens are spent:
The Four Performance Dimensions
- Latency: Time from request to response
- Token usage: Input and output tokens consumed
- Tool calls: Number and duration of tool invocations
- Memory: Context window utilization
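One lightweight way to keep all four dimensions in view is a per-run budget check (a sketch; the limit values and field names are illustrative, not part of any existing API):

```typescript
// Sketch: a per-run budget across the four dimensions. Limits are illustrative.
interface Budget {
  maxLatencyMs: number;
  maxTokens: number;
  maxToolCalls: number;
  maxContextPct: number;
}

interface RunStats {
  latencyMs: number;
  tokens: number;
  toolCalls: number;
  contextPct: number;
}

// Returns the names of any dimensions that exceeded their budget.
function checkBudget(actual: RunStats, budget: Budget): string[] {
  const violations: string[] = [];
  if (actual.latencyMs > budget.maxLatencyMs) violations.push('latency');
  if (actual.tokens > budget.maxTokens) violations.push('tokens');
  if (actual.toolCalls > budget.maxToolCalls) violations.push('tool calls');
  if (actual.contextPct > budget.maxContextPct) violations.push('context');
  return violations;
}
```

Failing fast when a run blows its budget catches regressions before users do.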
Measuring Performance
// lib/performance/profiler.ts
interface PerformanceMetrics {
totalDuration: number;
aiCallDuration: number;
toolCallDuration: number;
inputTokens: number;
outputTokens: number;
toolCalls: number;
contextUtilization: number;
}
class SkillProfiler {
private startTime: number = 0;
private metrics: Partial<PerformanceMetrics> = {};
private toolTimes: number[] = [];
start() {
this.startTime = performance.now();
this.metrics = {};
this.toolTimes = [];
}
recordAICall(inputTokens: number, outputTokens: number, duration: number) {
this.metrics.inputTokens = (this.metrics.inputTokens || 0) + inputTokens;
this.metrics.outputTokens = (this.metrics.outputTokens || 0) + outputTokens;
this.metrics.aiCallDuration = (this.metrics.aiCallDuration || 0) + duration;
}
recordToolCall(duration: number) {
this.toolTimes.push(duration);
}
finish(): PerformanceMetrics {
const totalDuration = performance.now() - this.startTime;
return {
totalDuration,
aiCallDuration: this.metrics.aiCallDuration || 0,
toolCallDuration: this.toolTimes.reduce((a, b) => a + b, 0),
inputTokens: this.metrics.inputTokens || 0,
outputTokens: this.metrics.outputTokens || 0,
toolCalls: this.toolTimes.length,
contextUtilization: this.calculateContextUtilization(),
};
}
private calculateContextUtilization(): number {
const totalTokens = (this.metrics.inputTokens || 0) + (this.metrics.outputTokens || 0);
const maxTokens = 200000; // Claude's context window
return (totalTokens / maxTokens) * 100;
}
printReport() {
const metrics = this.finish();
console.log('\n=== Performance Report ===');
console.log(`Total Duration: ${metrics.totalDuration.toFixed(0)}ms`);
console.log(`AI Call Time: ${metrics.aiCallDuration.toFixed(0)}ms`);
console.log(`Tool Call Time: ${metrics.toolCallDuration.toFixed(0)}ms`);
console.log(`Input Tokens: ${metrics.inputTokens}`);
console.log(`Output Tokens: ${metrics.outputTokens}`);
console.log(`Tool Calls: ${metrics.toolCalls}`);
console.log(`Context Usage: ${metrics.contextUtilization.toFixed(1)}%`);
// Cost estimate (approximate)
const inputCost = (metrics.inputTokens / 1000000) * 3; // $3/MTok input
const outputCost = (metrics.outputTokens / 1000000) * 15; // $15/MTok output
console.log(`Estimated Cost: $${(inputCost + outputCost).toFixed(4)}`);
}
}
export const profiler = new SkillProfiler();
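To feed durations into the profiler without sprinkling timing code everywhere, tool calls can go through a small generic wrapper (a sketch; the `Recorder` callback is a hypothetical hook, standing in for `profiler.recordToolCall`):

```typescript
// A generic timing wrapper for async tool calls. The Recorder callback is a
// hypothetical hook; in practice it would be profiler.recordToolCall.
type Recorder = (durationMs: number) => void;

async function timed<T>(fn: () => Promise<T>, record: Recorder): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Record the duration even if the tool call throws.
    record(performance.now() - start);
  }
}

// Usage:
// const content = await timed(() => readFile(path), d => profiler.recordToolCall(d));
```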
Context Window Optimization
The context window is your most precious resource. Every token counts.
1. Minimize System Prompt Size
// Bad: Verbose system prompt
const verbosePrompt = `
You are an expert software developer with years of experience in multiple
programming languages including JavaScript, TypeScript, Python, Go, Rust,
and many others. You have deep knowledge of software architecture, design
patterns, best practices, code review, testing methodologies, and more.
When asked to help with code, you should provide thorough, well-documented
solutions that follow industry best practices and conventions...
[500+ more words]
`;
// Good: Concise system prompt
const concisePrompt = `
Expert code reviewer. Focus on:
- Bugs and security issues
- Performance problems
- Style violations per project conventions
Be concise. Prioritize critical issues.
`;
Reduction: ~80% fewer tokens
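The savings can be sanity-checked with the same rough 4-characters-per-token heuristic used elsewhere in this guide (an approximation only; real counts depend on the model's tokenizer):

```typescript
// Rough token estimate (~4 characters per token). Approximate only;
// actual counts depend on the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Illustrative stand-ins for the two prompts above.
const verbose = 'You are an expert software developer with years of experience... '.repeat(10);
const concise = 'Expert code reviewer. Focus on bugs, security, performance, style.';

const reduction = 1 - estimateTokens(concise) / estimateTokens(verbose);
console.log(`Estimated reduction: ${(reduction * 100).toFixed(0)}%`);
```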
2. Smart Context Loading
Only load what you need:
// lib/context/smart-loader.ts
interface FileContext {
path: string;
content: string;
relevance: number;
}
export async function loadRelevantContext(
query: string,
projectPath: string,
maxTokens: number = 50000
): Promise<FileContext[]> {
const estimateTokens = (text: string) => Math.ceil(text.length / 4);
// 1. Find potentially relevant files
const allFiles = await findFiles(projectPath, ['**/*.ts', '**/*.js']);
// 2. Score files by relevance to query
const scored = await Promise.all(
allFiles.map(async (path) => {
const content = await readFile(path);
const relevance = calculateRelevance(query, path, content);
return { path, content, relevance };
})
);
// 3. Sort by relevance and fit within token budget
scored.sort((a, b) => b.relevance - a.relevance);
const result: FileContext[] = [];
let currentTokens = 0;
for (const file of scored) {
const fileTokens = estimateTokens(file.content);
if (currentTokens + fileTokens > maxTokens) {
// Try to include a summary instead
const summary = summarizeFile(file.content);
const summaryTokens = estimateTokens(summary);
if (currentTokens + summaryTokens <= maxTokens) {
result.push({ ...file, content: summary });
currentTokens += summaryTokens;
}
continue;
}
result.push(file);
currentTokens += fileTokens;
}
return result;
}
function calculateRelevance(query: string, path: string, content: string): number {
let score = 0;
// Path relevance
const queryTerms = query.toLowerCase().split(/\s+/);
for (const term of queryTerms) {
if (path.toLowerCase().includes(term)) score += 10;
if (content.toLowerCase().includes(term)) score += 1;
}
// Recency bonus (prefer recently modified)
// Implementation would check file stats
// Import graph bonus (prefer imported files)
// Implementation would analyze imports
return score;
}
function summarizeFile(content: string): string {
// Extract key elements only
const lines = content.split('\n');
const summary: string[] = [];
for (const line of lines) {
// Keep exports, function signatures, class declarations
if (
line.includes('export ') ||
line.match(/^(async\s+)?function\s+\w+/) ||
line.match(/^class\s+\w+/) ||
line.match(/^interface\s+\w+/) ||
line.match(/^type\s+\w+/)
) {
summary.push(line);
}
}
return summary.join('\n');
}
3. Progressive Context Loading
Start small, add context only when needed:
// lib/context/progressive-loader.ts
export async function progressiveContextStrategy(
task: string,
projectPath: string
): Promise<string[]> {
const contexts: string[] = [];
// Level 1: Minimal context (try this first)
contexts.push(await loadMinimalContext(projectPath));
// Level 2: Add relevant files based on task
if (taskRequiresMoreContext(task)) {
contexts.push(await loadRelevantFiles(task, projectPath));
}
// Level 3: Add full file contents only if needed
if (taskRequiresFullContext(task)) {
contexts.push(await loadFullContext(projectPath));
}
return contexts;
}
async function loadMinimalContext(projectPath: string): Promise<string> {
// Just the project structure and key files
const structure = await getDirectoryTree(projectPath, { maxDepth: 3 });
const readme = await readFileIfExists(`${projectPath}/README.md`);
const packageJson = await readFileIfExists(`${projectPath}/package.json`);
return `
Project Structure:
${structure}
${readme ? `README:\n${readme.substring(0, 1000)}...` : ''}
${packageJson ? `Dependencies: ${extractDeps(packageJson)}` : ''}
`.trim();
}
function taskRequiresMoreContext(task: string): boolean {
const contextHeavyKeywords = [
'refactor', 'analyze', 'review', 'understand',
'how does', 'explain', 'find all'
];
return contextHeavyKeywords.some(k => task.toLowerCase().includes(k));
}
Token Optimization Strategies
1. Output Compression
Instruct the AI to be concise:
// lib/prompts/concise.ts
export function wrapWithConciseness(prompt: string): string {
return `
${prompt}
IMPORTANT: Be concise. Optimize for clarity and brevity.
- No unnecessary explanations
- No repeating the question
- No filler phrases ("I'd be happy to", "Let me", etc.)
- Code only when specifically requested
- Use bullet points over paragraphs
`;
}
2. Structured Output Formats
Use compact formats for structured data:
// Instead of verbose JSON
const verboseOutput = {
"analysis_results": {
"file_name": "app.ts",
"issues_found": [
{
"type": "security",
"severity": "high",
"description": "SQL injection vulnerability",
"line_number": 42,
"suggested_fix": "Use parameterized queries"
}
]
}
};
// Use compact format
const compactOutput = {
"f": "app.ts",
"i": [
{ "t": "sec", "s": "H", "d": "SQL injection", "l": 42, "fix": "param queries" }
]
};
// Or even more compact: line-based format
const lineFormat = `
app.ts:42:H:sec:SQL injection:param queries
`;
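A compact format only pays off if it can be parsed reliably on the way back. A sketch of a parser for the line-based format above (it assumes fields themselves contain no colon):

```typescript
// Sketch: parse the line-based issue format back into a structured record.
// Field order (file:line:severity:type:description:fix) follows the example
// above. Assumes no field contains a ':' itself.
interface Issue {
  file: string;
  line: number;
  severity: string;
  type: string;
  description: string;
  fix: string;
}

function parseIssueLine(raw: string): Issue {
  const [file, lineNo, severity, type, description, fix] = raw.trim().split(':');
  return { file, line: Number(lineNo), severity, type, description, fix };
}
```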
3. Incremental Processing
Process large inputs in chunks:
// lib/processing/chunker.ts
interface ChunkResult {
chunkIndex: number;
result: unknown;
}
export async function processInChunks<T>(
items: T[],
processor: (chunk: T[]) => Promise<unknown>,
chunkSize: number = 10
): Promise<ChunkResult[]> {
const results: ChunkResult[] = [];
for (let i = 0; i < items.length; i += chunkSize) {
const chunk = items.slice(i, i + chunkSize);
const result = await processor(chunk);
results.push({ chunkIndex: i / chunkSize, result });
}
return results;
}
// Usage for code review
export async function reviewLargeCodebase(files: string[]): Promise<void> {
const results = await processInChunks(
files,
async (chunk) => {
const fileContents = await Promise.all(chunk.map(readFile));
return await reviewCode(fileContents.join('\n---\n'));
},
5 // Review 5 files at a time
);
// Aggregate results
console.log('Review complete:', results.length, 'chunks processed');
}
Tool Call Optimization
Tool calls add latency. Minimize them.
1. Batch Operations
// Bad: Individual tool calls
for (const file of files) {
await readFile(file); // 10 separate calls
}
// Good: Batch read
const contents = await batchReadFiles(files); // 1 call
// Implementation
export async function batchReadFiles(
paths: string[]
): Promise<Record<string, string>> {
// Use glob pattern or Promise.all internally
const results = await Promise.all(
paths.map(async (path) => ({
path,
content: await readFile(path)
}))
);
return Object.fromEntries(
results.map(r => [r.path, r.content])
);
}
2. Predictive Tool Use
Anticipate the data a task will need and fetch it up front:
// lib/tools/predictive.ts
interface TaskAnalysis {
likelyNeededFiles: string[];
likelyNeededCommands: string[];
confidenceScores: Record<string, number>;
}
export function analyzeTaskRequirements(task: string): TaskAnalysis {
const analysis: TaskAnalysis = {
likelyNeededFiles: [],
likelyNeededCommands: [],
confidenceScores: {},
};
// Pattern matching for common tasks
if (task.includes('test')) {
analysis.likelyNeededFiles.push('**/*.test.ts', '**/*.spec.ts');
analysis.likelyNeededCommands.push('npm test');
analysis.confidenceScores['test-related'] = 0.9;
}
if (task.includes('build') || task.includes('compile')) {
analysis.likelyNeededFiles.push('tsconfig.json', 'package.json');
analysis.likelyNeededCommands.push('npm run build');
analysis.confidenceScores['build-related'] = 0.85;
}
if (task.includes('lint') || task.includes('style')) {
analysis.likelyNeededFiles.push('.eslintrc*', '.prettierrc*');
analysis.likelyNeededCommands.push('npm run lint');
analysis.confidenceScores['lint-related'] = 0.9;
}
return analysis;
}
// Pre-fetch likely needed data
export async function prefetchForTask(
task: string,
projectPath: string
): Promise<Record<string, unknown>> {
const requirements = analyzeTaskRequirements(task);
const prefetched: Record<string, unknown> = {};
// Prefetch files with high confidence
for (const pattern of requirements.likelyNeededFiles) {
const files = await glob(pattern, { cwd: projectPath });
for (const file of files.slice(0, 5)) { // Limit prefetch
prefetched[file] = await readFile(`${projectPath}/${file}`);
}
}
return prefetched;
}
3. Caching Strategy
// lib/cache/skill-cache.ts
interface CacheEntry<T> {
value: T;
timestamp: number;
ttl: number;
}
class SkillCache {
private cache: Map<string, CacheEntry<unknown>> = new Map();
set<T>(key: string, value: T, ttlMs: number = 60000): void {
this.cache.set(key, {
value,
timestamp: Date.now(),
ttl: ttlMs,
});
}
get<T>(key: string): T | undefined {
const entry = this.cache.get(key);
if (!entry) return undefined;
if (Date.now() - entry.timestamp > entry.ttl) {
this.cache.delete(key);
return undefined;
}
return entry.value as T;
}
// Cache file contents
async getFile(path: string): Promise<string> {
const cached = this.get<string>(`file:${path}`);
if (cached !== undefined) return cached; // !== undefined so empty files stay cached
const content = await readFile(path);
this.set(`file:${path}`, content, 30000); // 30s cache
return content;
}
// Cache expensive computations
async getOrCompute<T>(
key: string,
compute: () => Promise<T>,
ttlMs: number = 60000
): Promise<T> {
const cached = this.get<T>(key);
if (cached !== undefined) return cached;
const value = await compute();
this.set(key, value, ttlMs);
return value;
}
clear(): void {
this.cache.clear();
}
}
export const skillCache = new SkillCache();
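As a standalone illustration of the getOrCompute pattern (restated here so it runs on its own): repeated lookups within the TTL return the cached value and skip the expensive computation entirely:

```typescript
// Standalone restatement of the getOrCompute pattern above: the expensive
// computation runs once; later calls inside the TTL return the cached value.
const store = new Map<string, { value: unknown; expires: number }>();

async function getOrCompute<T>(
  key: string,
  compute: () => Promise<T>,
  ttlMs = 60_000
): Promise<T> {
  const entry = store.get(key);
  if (entry && entry.expires > Date.now()) return entry.value as T;
  const value = await compute();
  store.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```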
Parallel Processing
Use parallelism where possible:
// lib/parallel/runner.ts
interface ParallelTask<T> {
id: string;
run: () => Promise<T>;
priority?: number;
}
export async function runParallel<T>(
tasks: ParallelTask<T>[],
concurrency: number = 3
): Promise<Map<string, T>> {
const results = new Map<string, T>();
const queue = [...tasks].sort((a, b) => (b.priority || 0) - (a.priority || 0));
const workers = Array(concurrency).fill(null).map(async () => {
while (queue.length > 0) {
const task = queue.shift();
if (!task) break;
try {
const result = await task.run();
results.set(task.id, result);
} catch (error) {
console.error(`Task ${task.id} failed:`, error);
}
}
});
await Promise.all(workers);
return results;
}
// Usage: Analyze multiple files in parallel
export async function analyzeFilesParallel(files: string[]): Promise<void> {
const tasks: ParallelTask<unknown>[] = files.map(file => ({
id: file,
run: async () => analyzeFile(file),
priority: file.includes('src/') ? 10 : 1, // Prioritize source files
}));
const results = await runParallel(tasks, 5);
console.log(`Analyzed ${results.size} files`);
}
Performance Benchmarks
Establish baselines and track improvements:
// lib/benchmark/suite.ts
interface BenchmarkResult {
name: string;
runs: number;
avgDuration: number;
minDuration: number;
maxDuration: number;
avgTokens: number;
avgCost: number;
}
export async function runBenchmark(
name: string,
fn: () => Promise<{ tokens: number }>,
runs: number = 5
): Promise<BenchmarkResult> {
const durations: number[] = [];
const tokens: number[] = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
const result = await fn();
durations.push(performance.now() - start);
tokens.push(result.tokens);
// Cool down between runs
await new Promise(r => setTimeout(r, 1000));
}
const avgTokens = tokens.reduce((a, b) => a + b, 0) / tokens.length;
return {
name,
runs,
avgDuration: durations.reduce((a, b) => a + b, 0) / durations.length,
minDuration: Math.min(...durations),
maxDuration: Math.max(...durations),
avgTokens,
avgCost: (avgTokens / 1000000) * 10, // Approximate blended rate
};
}
// Benchmark suite
export async function runSkillBenchmarks(): Promise<void> {
const benchmarks = [
{ name: 'Small file review', fn: () => reviewSmallFile() },
{ name: 'Large file review', fn: () => reviewLargeFile() },
{ name: 'Multi-file analysis', fn: () => analyzeMultipleFiles() },
{ name: 'Code generation', fn: () => generateCode() },
];
console.log('Running benchmarks...\n');
for (const { name, fn } of benchmarks) {
const result = await runBenchmark(name, fn);
console.log(`${result.name}:`);
console.log(` Avg: ${result.avgDuration.toFixed(0)}ms`);
console.log(` Min/Max: ${result.minDuration.toFixed(0)}/${result.maxDuration.toFixed(0)}ms`);
console.log(` Tokens: ${result.avgTokens.toFixed(0)}`);
console.log(` Est. Cost: $${result.avgCost.toFixed(4)}`);
console.log('');
}
}
Performance Optimization Checklist
Before shipping a skill, verify:
Context Efficiency
- System prompt is concise (<500 tokens)
- Only necessary files are loaded
- Large files are summarized or chunked
- Context usage is under 50% for typical runs
Token Efficiency
- Output format is compact
- No redundant information in prompts
- Incremental processing for large inputs
- Caching for repeated operations
Tool Efficiency
- Batch operations where possible
- Predictive prefetching for common patterns
- Parallel processing for independent tasks
- Tool call count is minimized
Monitoring
- Performance metrics are tracked
- Benchmarks are established
- Cost per operation is known
- Regressions are detected
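The last checklist item can be automated: compare a fresh benchmark run against a stored baseline and flag drift beyond a tolerance (a sketch; the 20% threshold is an arbitrary choice):

```typescript
// Sketch: flag a regression when latency or token usage grows beyond a
// tolerance relative to a saved baseline. Threshold is illustrative.
interface Baseline {
  avgDuration: number;
  avgTokens: number;
}

function detectRegression(
  current: Baseline,
  baseline: Baseline,
  tolerance = 0.2 // allow 20% drift before flagging
): string[] {
  const problems: string[] = [];
  if (current.avgDuration > baseline.avgDuration * (1 + tolerance)) {
    problems.push(`latency regressed: ${current.avgDuration.toFixed(0)}ms vs ${baseline.avgDuration.toFixed(0)}ms`);
  }
  if (current.avgTokens > baseline.avgTokens * (1 + tolerance)) {
    problems.push(`token usage regressed: ${current.avgTokens} vs ${baseline.avgTokens}`);
  }
  return problems;
}
```

Running this in CI against the benchmark suite above turns performance into a gated check rather than a post-hoc discovery.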
Conclusion
Performance optimization for Claude Code skills is a balance across four goals:
- Speed: Minimize latency for better UX
- Cost: Reduce token usage for affordability
- Quality: Maintain output quality despite constraints
- Reliability: Ensure consistent performance
Start with measurement, optimize the biggest bottlenecks first, and always verify that optimizations don't degrade output quality. A fast skill that produces poor results isn't worth having.
Ready to deploy skills at scale? Continue to Enterprise Deployment for production strategies.