Performance Optimization for Claude Code Skills
Optimize Claude Code skill performance with context management, token optimization, caching strategies, and efficient tool usage patterns.
Performance in Claude Code skills isn't just about speed—it's about cost, user experience, and reliability. A skill that takes 30 seconds and costs $0.50 per run won't see adoption, no matter how powerful it is. This guide covers practical techniques for building fast, efficient skills.
Understanding Performance Bottlenecks
Before optimizing, understand where time and tokens are spent:
The Four Performance Dimensions
- Latency: Time from request to response
- Token usage: Input and output tokens consumed
- Tool calls: Number and duration of tool invocations
- Memory: Context window utilization
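One lightweight way to keep all four dimensions in view is a per-run budget check (a sketch; the limit values and field names are illustrative, not part of any existing API):

```typescript
// Sketch: a per-run budget across the four dimensions. Limits are illustrative.
interface Budget {
  maxLatencyMs: number;
  maxTokens: number;
  maxToolCalls: number;
  maxContextPct: number;
}

interface RunStats {
  latencyMs: number;
  tokens: number;
  toolCalls: number;
  contextPct: number;
}

// Returns the names of any dimensions that exceeded their budget.
function checkBudget(actual: RunStats, budget: Budget): string[] {
  const violations: string[] = [];
  if (actual.latencyMs > budget.maxLatencyMs) violations.push('latency');
  if (actual.tokens > budget.maxTokens) violations.push('tokens');
  if (actual.toolCalls > budget.maxToolCalls) violations.push('tool calls');
  if (actual.contextPct > budget.maxContextPct) violations.push('context');
  return violations;
}
```

Failing fast when a run blows its budget catches regressions before users do.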
Measuring Performance
// lib/performance/profiler.ts
interface PerformanceMetrics {
totalDuration: number;
aiCallDuration: number;
toolCallDuration: number;
inputTokens: number;
outputTokens: number;
toolCalls: number;
contextUtilization: number;
}
class SkillProfiler {
private startTime: number = 0;
private metrics: Partial<PerformanceMetrics> = {};
private toolTimes: number[] = [];
start() {
this.startTime = performance.now();
this.metrics = {};
this.toolTimes = [];
}
recordAICall(inputTokens: number, outputTokens: number, duration: number) {
this.metrics.inputTokens = (this.metrics.inputTokens || 0) + inputTokens;
this.metrics.outputTokens = (this.metrics.outputTokens || 0) + outputTokens;
this.metrics.aiCallDuration = (this.metrics.aiCallDuration || 0) + duration;
}
recordToolCall(duration: number) {
this.toolTimes.push(duration);
}
finish(): PerformanceMetrics {
const totalDuration = performance.now() - this.startTime;
return {
totalDuration,
aiCallDuration: this.metrics.aiCallDuration || 0,
toolCallDuration: this.toolTimes.reduce((a, b) => a + b, 0),
inputTokens: this.metrics.inputTokens || 0,
outputTokens: this.metrics.outputTokens || 0,
toolCalls: this.toolTimes.length,
contextUtilization: this.calculateContextUtilization(),
};
}
private calculateContextUtilization(): number {
const totalTokens = (this.metrics.inputTokens || 0) + (this.metrics.outputTokens || 0);
const maxTokens = 200000; // Claude's context window
return (totalTokens / maxTokens) * 100;
}
printReport() {
const metrics = this.finish();
console.log('\n=== Performance Report ===');
console.log(`Total Duration: ${metrics.totalDuration.toFixed(0)}ms`);
console.log(`AI Call Time: ${metrics.aiCallDuration.toFixed(0)}ms`);
console.log(`Tool Call Time: ${metrics.toolCallDuration.toFixed(0)}ms`);
console.log(`Input Tokens: ${metrics.inputTokens}`);
console.log(`Output Tokens: ${metrics.outputTokens}`);
console.log(`Tool Calls: ${metrics.toolCalls}`);
console.log(`Context Usage: ${metrics.contextUtilization.toFixed(1)}%`);
// Cost estimate (approximate)
const inputCost = (metrics.inputTokens / 1000000) * 3; // $3/MTok input
const outputCost = (metrics.outputTokens / 1000000) * 15; // $15/MTok output
console.log(`Estimated Cost: $${(inputCost + outputCost).toFixed(4)}`);
}
}
export const profiler = new SkillProfiler();
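To feed durations into the profiler without sprinkling timing code everywhere, tool calls can go through a small generic wrapper (a sketch; the `Recorder` callback is a hypothetical hook, standing in for `profiler.recordToolCall`):

```typescript
// A generic timing wrapper for async tool calls. The Recorder callback is a
// hypothetical hook; in practice it would be profiler.recordToolCall.
type Recorder = (durationMs: number) => void;

async function timed<T>(fn: () => Promise<T>, record: Recorder): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Record the duration even if the tool call throws.
    record(performance.now() - start);
  }
}

// Usage:
// const content = await timed(() => readFile(path), d => profiler.recordToolCall(d));
```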
Context Window Optimization
The context window is your most precious resource. Every token counts.
1. Minimize System Prompt Size
// Bad: Verbose system prompt
const verbosePrompt = `
You are an expert software developer with years of experience in multiple
programming languages including JavaScript, TypeScript, Python, Go, Rust,
and many others. You have deep knowledge of software architecture, design
patterns, best practices, code review, testing methodologies, and more.
When asked to help with code, you should provide thorough, well-documented
solutions that follow industry best practices and conventions...
[500+ more words]
`;
// Good: Concise system prompt
const concisePrompt = `
Expert code reviewer. Focus on:
- Bugs and security issues
- Performance problems
- Style violations per project conventions
Be concise. Prioritize critical issues.
`;
Reduction: ~80% fewer tokens
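The savings can be sanity-checked with the same rough 4-characters-per-token heuristic used elsewhere in this guide (an approximation only; real counts depend on the model's tokenizer):

```typescript
// Rough token estimate (~4 characters per token). Approximate only;
// actual counts depend on the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Illustrative stand-ins for the two prompts above.
const verbose = 'You are an expert software developer with years of experience... '.repeat(10);
const concise = 'Expert code reviewer. Focus on bugs, security, performance, style.';

const reduction = 1 - estimateTokens(concise) / estimateTokens(verbose);
console.log(`Estimated reduction: ${(reduction * 100).toFixed(0)}%`);
```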
2. Smart Context Loading
Only load what you need:
// lib/context/smart-loader.ts
interface FileContext {
path: string;
content: string;
relevance: number;
}
export async function loadRelevantContext(
query: string,
projectPath: string,
maxTokens: number = 50000
): Promise<FileContext[]> {
const estimateTokens = (text: string) => Math.ceil(text.length / 4);
// 1. Find potentially relevant files
const allFiles = await findFiles(projectPath, ['**/*.ts', '**/*.js']);
// 2. Score files by relevance to query
const scored = await Promise.all(
allFiles.map(async (path) => {
const content = await readFile(path);
const relevance = calculateRelevance(query, path, content);
return { path, content, relevance };
})
);
// 3. Sort by relevance and fit within token budget
scored.sort((a, b) => b.relevance - a.relevance);
const result: FileContext[] = [];
let currentTokens = 0;
for (const file of scored) {
const fileTokens = estimateTokens(file.content);
if (currentTokens + fileTokens > maxTokens) {
// Try to include a summary instead
const summary = summarizeFile(file.content);
const summaryTokens = estimateTokens(summary);
if (currentTokens + summaryTokens <= maxTokens) {
result.push({ ...file, content: summary });
currentTokens += summaryTokens;
}
continue;
}
result.push(file);
currentTokens += fileTokens;
}
return result;
}
function calculateRelevance(query: string, path: string, content: string): number {
let score = 0;
// Path relevance
const queryTerms = query.toLowerCase().split(/\s+/);
for (const term of queryTerms) {
if (path.toLowerCase().includes(term)) score += 10;
if (content.toLowerCase().includes(term)) score += 1;
}
// Recency bonus (prefer recently modified)
// Implementation would check file stats
// Import graph bonus (prefer imported files)
// Implementation would analyze imports
return score;
}
function summarizeFile(content: string): string {
// Extract key elements only
const lines = content.split('\n');
const summary: string[] = [];
for (const line of lines) {
// Keep exports, function signatures, class declarations
if (
line.includes('export ') ||
line.match(/^(async\s+)?function\s+\w+/) ||
line.match(/^class\s+\w+/) ||
line.match(/^interface\s+\w+/) ||
line.match(/^type\s+\w+/)
) {
summary.push(line);
}
}
return summary.join('\n');
}
3. Progressive Context Loading
Start small, add context only when needed:
// lib/context/progressive-loader.ts
export async function progressiveContextStrategy(
task: string,
projectPath: string
): Promise<string[]> {
const contexts: string[] = [];
// Level 1: Minimal context (try this first)
contexts.push(await loadMinimalContext(projectPath));
// Level 2: Add relevant files based on task
if (taskRequiresMoreContext(task)) {
contexts.push(await loadRelevantFiles(task, projectPath));
}
// Level 3: Add full file contents only if needed
if (taskRequiresFullContext(task)) {
contexts.push(await loadFullContext(projectPath));
}
return contexts;
}
async function loadMinimalContext(projectPath: string): Promise<string> {
// Just the project structure and key files
const structure = await getDirectoryTree(projectPath, { maxDepth: 3 });
const readme = await readFileIfExists(`${projectPath}/README.md`);
const packageJson = await readFileIfExists(`${projectPath}/package.json`);
return `
Project Structure:
${structure}
${readme ? `README:\n${readme.substring(0, 1000)}...` : ''}
${packageJson ? `Dependencies: ${extractDeps(packageJson)}` : ''}
`.trim();
}
function taskRequiresMoreContext(task: string): boolean {
const contextHeavyKeywords = [
'refactor', 'analyze', 'review', 'understand',
'how does', 'explain', 'find all'
];
return contextHeavyKeywords.some(k => task.toLowerCase().includes(k));
}
Token Optimization Strategies
1. Output Compression
Instruct the AI to be concise:
// lib/prompts/concise.ts
export function wrapWithConciseness(prompt: string): string {
return `
${prompt}
IMPORTANT: Be concise. Optimize for clarity and brevity.
- No unnecessary explanations
- No repeating the question
- No filler phrases ("I'd be happy to", "Let me", etc.)
- Code only when specifically requested
- Use bullet points over paragraphs
`;
}
2. Structured Output Formats
Use compact formats for structured data:
// Instead of verbose JSON
const verboseOutput = {
"analysis_results": {
"file_name": "app.ts",
"issues_found": [
{
"type": "security",
"severity": "high",
"description": "SQL injection vulnerability",
"line_number": 42,
"suggested_fix": "Use parameterized queries"
}
]
}
};
// Use compact format
const compactOutput = {
"f": "app.ts",
"i": [
{ "t": "sec", "s": "H", "d": "SQL injection", "l": 42, "fix": "param queries" }
]
};
// Or even more compact: line-based format
const lineFormat = `
app.ts:42:H:sec:SQL injection:param queries
`;
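A compact format only pays off if it can be parsed reliably on the way back. A sketch of a parser for the line-based format above (it assumes fields themselves contain no colon):

```typescript
// Sketch: parse the line-based issue format back into a structured record.
// Field order (file:line:severity:type:description:fix) follows the example
// above. Assumes no field contains a ':' itself.
interface Issue {
  file: string;
  line: number;
  severity: string;
  type: string;
  description: string;
  fix: string;
}

function parseIssueLine(raw: string): Issue {
  const [file, lineNo, severity, type, description, fix] = raw.trim().split(':');
  return { file, line: Number(lineNo), severity, type, description, fix };
}
```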
3. Incremental Processing
Process large inputs in chunks:
// lib/processing/chunker.ts
interface ChunkResult {
chunkIndex: number;
result: unknown;
}
export async function processInChunks<T>(
items: T[],
processor: (chunk: T[]) => Promise<unknown>,
chunkSize: number = 10
): Promise<ChunkResult[]> {
const results: ChunkResult[] = [];
for (let i = 0; i < items.length; i += chunkSize) {
const chunk = items.slice(i, i + chunkSize);
const result = await processor(chunk);
results.push({ chunkIndex: i / chunkSize, result });
}
return results;
}
// Usage for code review
export async function reviewLargeCodebase(files: string[]): Promise<void> {
const results = await processInChunks(
files,
async (chunk) => {
const fileContents = await Promise.all(chunk.map(readFile));
return await reviewCode(fileContents.join('\n---\n'));
},
5 // Review 5 files at a time
);
// Aggregate results
console.log('Review complete:', results.length, 'chunks processed');
}
Tool Call Optimization
Tool calls add latency. Minimize them.
1. Batch Operations
// Bad: Individual tool calls
for (const file of files) {
await readFile(file); // 10 separate calls
}
// Good: Batch read
const contents = await batchReadFiles(files); // 1 call
// Implementation
export async function batchReadFiles(
paths: string[]
): Promise<Record<string, string>> {
// Use glob pattern or Promise.all internally
const results = await Promise.all(
paths.map(async (path) => ({
path,
content: await readFile(path)
}))
);
return Object.fromEntries(
results.map(r => [r.path, r.content])
);
}
2. Predictive Tool Use
Anticipate the data a task will need and fetch it up front:
// lib/tools/predictive.ts
interface TaskAnalysis {
likelyNeededFiles: string[];
likelyNeededCommands: string[];
confidenceScores: Record<string, number>;
}
export function analyzeTaskRequirements(task: string): TaskAnalysis {
const analysis: TaskAnalysis = {
likelyNeededFiles: [],
likelyNeededCommands: [],
confidenceScores: {},
};
// Pattern matching for common tasks
if (task.includes('test')) {
analysis.likelyNeededFiles.push('**/*.test.ts', '**/*.spec.ts');
analysis.likelyNeededCommands.push('npm test');
analysis.confidenceScores['test-related'] = 0.9;
}
if (task.includes('build') || task.includes('compile')) {
analysis.likelyNeededFiles.push('tsconfig.json', 'package.json');
analysis.likelyNeededCommands.push('npm run build');
analysis.confidenceScores['build-related'] = 0.85;
}
if (task.includes('lint') || task.includes('style')) {
analysis.likelyNeededFiles.push('.eslintrc*', '.prettierrc*');
analysis.likelyNeededCommands.push('npm run lint');
analysis.confidenceScores['lint-related'] = 0.9;
}
return analysis;
}
// Pre-fetch likely needed data
export async function prefetchForTask(
task: string,
projectPath: string
): Promise<Record<string, unknown>> {
const requirements = analyzeTaskRequirements(task);
const prefetched: Record<string, unknown> = {};
// Prefetch files with high confidence
for (const pattern of requirements.likelyNeededFiles) {
const files = await glob(pattern, { cwd: projectPath });
for (const file of files.slice(0, 5)) { // Limit prefetch
prefetched[file] = await readFile(`${projectPath}/${file}`);
}
}
return prefetched;
}
3. Caching Strategy
// lib/cache/skill-cache.ts
interface CacheEntry<T> {
value: T;
timestamp: number;
ttl: number;
}
class SkillCache {
private cache: Map<string, CacheEntry<unknown>> = new Map();
set<T>(key: string, value: T, ttlMs: number = 60000): void {
this.cache.set(key, {
value,
timestamp: Date.now(),
ttl: ttlMs,
});
}
get<T>(key: string): T | undefined {
const entry = this.cache.get(key);
if (!entry) return undefined;
if (Date.now() - entry.timestamp > entry.ttl) {
this.cache.delete(key);
return undefined;
}
return entry.value as T;
}
// Cache file contents
async getFile(path: string): Promise<string> {
const cached = this.get<string>(`file:${path}`);
if (cached !== undefined) return cached; // !== undefined so empty files stay cached
const content = await readFile(path);
this.set(`file:${path}`, content, 30000); // 30s cache
return content;
}
// Cache expensive computations
async getOrCompute<T>(
key: string,
compute: () => Promise<T>,
ttlMs: number = 60000
): Promise<T> {
const cached = this.get<T>(key);
if (cached !== undefined) return cached;
const value = await compute();
this.set(key, value, ttlMs);
return value;
}
clear(): void {
this.cache.clear();
}
}
export const skillCache = new SkillCache();
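As a standalone illustration of the getOrCompute pattern (restated here so it runs on its own): repeated lookups within the TTL return the cached value and skip the expensive computation entirely:

```typescript
// Standalone restatement of the getOrCompute pattern above: the expensive
// computation runs once; later calls inside the TTL return the cached value.
const store = new Map<string, { value: unknown; expires: number }>();

async function getOrCompute<T>(
  key: string,
  compute: () => Promise<T>,
  ttlMs = 60_000
): Promise<T> {
  const entry = store.get(key);
  if (entry && entry.expires > Date.now()) return entry.value as T;
  const value = await compute();
  store.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```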
Parallel Processing
Use parallelism where possible:
// lib/parallel/runner.ts
interface ParallelTask<T> {
id: string;
run: () => Promise<T>;
priority?: number;
}
export async function runParallel<T>(
tasks: ParallelTask<T>[],
concurrency: number = 3
): Promise<Map<string, T>> {
const results = new Map<string, T>();
const queue = [...tasks].sort((a, b) => (b.priority || 0) - (a.priority || 0));
const workers = Array(concurrency).fill(null).map(async () => {
while (queue.length > 0) {
const task = queue.shift();
if (!task) break;
try {
const result = await task.run();
results.set(task.id, result);
} catch (error) {
console.error(`Task ${task.id} failed:`, error);
}
}
});
await Promise.all(workers);
return results;
}
// Usage: Analyze multiple files in parallel
export async function analyzeFilesParallel(files: string[]): Promise<void> {
const tasks: ParallelTask<unknown>[] = files.map(file => ({
id: file,
run: async () => analyzeFile(file),
priority: file.includes('src/') ? 10 : 1, // Prioritize source files
}));
const results = await runParallel(tasks, 5);
console.log(`Analyzed ${results.size} files`);
}
Performance Benchmarks
Establish baselines and track improvements:
// lib/benchmark/suite.ts
interface BenchmarkResult {
name: string;
runs: number;
avgDuration: number;
minDuration: number;
maxDuration: number;
avgTokens: number;
avgCost: number;
}
export async function runBenchmark(
name: string,
fn: () => Promise<{ tokens: number }>,
runs: number = 5
): Promise<BenchmarkResult> {
const durations: number[] = [];
const tokens: number[] = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
const result = await fn();
durations.push(performance.now() - start);
tokens.push(result.tokens);
// Cool down between runs
await new Promise(r => setTimeout(r, 1000));
}
const avgTokens = tokens.reduce((a, b) => a + b, 0) / tokens.length;
return {
name,
runs,
avgDuration: durations.reduce((a, b) => a + b, 0) / durations.length,
minDuration: Math.min(...durations),
maxDuration: Math.max(...durations),
avgTokens,
avgCost: (avgTokens / 1000000) * 10, // Approximate blended rate
};
}
// Benchmark suite
export async function runSkillBenchmarks(): Promise<void> {
const benchmarks = [
{ name: 'Small file review', fn: () => reviewSmallFile() },
{ name: 'Large file review', fn: () => reviewLargeFile() },
{ name: 'Multi-file analysis', fn: () => analyzeMultipleFiles() },
{ name: 'Code generation', fn: () => generateCode() },
];
console.log('Running benchmarks...\n');
for (const { name, fn } of benchmarks) {
const result = await runBenchmark(name, fn);
console.log(`${result.name}:`);
console.log(` Avg: ${result.avgDuration.toFixed(0)}ms`);
console.log(` Min/Max: ${result.minDuration.toFixed(0)}/${result.maxDuration.toFixed(0)}ms`);
console.log(` Tokens: ${result.avgTokens.toFixed(0)}`);
console.log(` Est. Cost: $${result.avgCost.toFixed(4)}`);
console.log('');
}
}
Performance Optimization Checklist
Before shipping a skill, verify:
Context Efficiency
- System prompt is concise (<500 tokens)
- Only necessary files are loaded
- Large files are summarized or chunked
- Context usage is under 50% for typical runs
Token Efficiency
- Output format is compact
- No redundant information in prompts
- Incremental processing for large inputs
- Caching for repeated operations
Tool Efficiency
- Batch operations where possible
- Predictive prefetching for common patterns
- Parallel processing for independent tasks
- Tool call count is minimized
Monitoring
- Performance metrics are tracked
- Benchmarks are established
- Cost per operation is known
- Regressions are detected
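The last checklist item can be automated: compare a fresh benchmark run against a stored baseline and flag drift beyond a tolerance (a sketch; the 20% threshold is an arbitrary choice):

```typescript
// Sketch: flag a regression when latency or token usage grows beyond a
// tolerance relative to a saved baseline. Threshold is illustrative.
interface Baseline {
  avgDuration: number;
  avgTokens: number;
}

function detectRegression(
  current: Baseline,
  baseline: Baseline,
  tolerance = 0.2 // allow 20% drift before flagging
): string[] {
  const problems: string[] = [];
  if (current.avgDuration > baseline.avgDuration * (1 + tolerance)) {
    problems.push(`latency regressed: ${current.avgDuration.toFixed(0)}ms vs ${baseline.avgDuration.toFixed(0)}ms`);
  }
  if (current.avgTokens > baseline.avgTokens * (1 + tolerance)) {
    problems.push(`token usage regressed: ${current.avgTokens} vs ${baseline.avgTokens}`);
  }
  return problems;
}
```

Running this in CI against the benchmark suite above turns performance into a gated check rather than a post-hoc discovery.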
Conclusion
Performance optimization for Claude Code skills is a balance across four goals:
- Speed: Minimize latency for better UX
- Cost: Reduce token usage for affordability
- Quality: Maintain output quality despite constraints
- Reliability: Ensure consistent performance
Start with measurement, optimize the biggest bottlenecks first, and always verify that optimizations don't degrade output quality. A fast skill that produces poor results isn't worth having.
Ready to deploy skills at scale? Continue to Enterprise Deployment for production strategies.