Reading Karpathy's Guidelines: What His One Skill Reveals

Most analysis of the karpathy-guidelines skill focuses on what it is — a behavioral CLAUDE.md encoded as an installable skill, derived from Andrej Karpathy's public observations on LLM coding pitfalls. Less attention goes to how it's written: what makes the four rules structurally effective as prompting constraints, why the ordering matters, and what the choice to include certain elements and exclude others reveals about the author's theory of LLM behavior.

A close reading takes five minutes and returns more than a surface summary.

The Full Skill, Briefly

The karpathy-guidelines skill contains four sections:

Think Before Coding — surface assumptions, ask when confused, don't guess
Simplicity First — minimum code, no speculation, no unrequested features
Surgical Changes — touch only what the request requires, clean your own mess
Goal-Driven Execution — convert vague tasks to verifiable success criteria

Each section has a bolded header stating the rule, a brief elaboration, and in some cases a concrete check or template. The document is 65 lines. There is no preamble, no aspirational framing, and no conclusion.

Rule One: Think Before Coding

The first rule is a gate. "Before implementing: state your assumptions explicitly. If uncertain, ask."

The structure here is important. The rule doesn't say "think carefully about the problem" — that's advice without a behavior attached. It says: before implementing, state assumptions and surface ambiguity. These are specific, observable behaviors. Either Claude stated its assumptions before generating code, or it didn't.

The rule also names what to do with confusion: "Name what's confusing. Ask." This is a behavioral redirect for a specific failure mode: the model resolving confusion by guessing rather than clarifying. Naming the confusion is more useful than silently resolving it, because it gives the developer the information needed to provide a useful answer.

The ordering matters. "Think Before Coding" is rule one because the other rules are less useful if the task is wrongly specified. No amount of simplicity or surgical precision rescues a clean implementation of the wrong thing.

Rule Two: Simplicity First

The second rule is the most actionable: "Minimum code that solves the problem. Nothing speculative."

The elaboration lists specific prohibitions: no features beyond what was asked, no abstractions for single-use code, no configurability that wasn't requested, no error handling for impossible scenarios. These aren't vague exhortations — they're a list of the specific behaviors that constitute over-building.
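To make the prohibitions concrete, here is a minimal sketch of the contrast. It is an illustration, not an excerpt from the skill: the request ("parse a comma-separated string of integers") and the names `IntListParser` and `parse_ints` are hypothetical.

```python
# Hypothetical request: "parse a comma-separated string of integers".

# Over-built: a single-use abstraction, configurability nobody asked for,
# and an error-handling mode for a scenario the request never mentioned.
class IntListParser:
    def __init__(self, delimiter=",", strip=True, on_error="raise"):
        self.delimiter = delimiter
        self.strip = strip
        self.on_error = on_error

    def parse(self, text):
        result = []
        for part in text.split(self.delimiter):
            if self.strip:
                part = part.strip()
            try:
                result.append(int(part))
            except ValueError:
                if self.on_error == "raise":
                    raise
        return result


# Minimal: the code the request actually asks for.
# int() already tolerates surrounding whitespace, so no extra stripping is needed.
def parse_ints(text):
    return [int(part) for part in text.split(",")]


print(parse_ints("1, 2, 3"))
```

Both versions return the same list for this input. The difference is everything in the first version that no one requested and that someone now has to read and maintain.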

The self-test provided is worth highlighting: "If you write 200 lines and it could be 50, rewrite it." This converts a qualitative principle into a quantitative check. The ratio isn't the point; the act of asking "could this be shorter?" is. Most LLMs, absent this constraint, will not ask themselves this question. Generating 200 lines when 50 would do looks thorough, but it produces a worse outcome.

The broader principle here is that LLMs are trained on feedback that rewards apparent effort and completeness. A longer, more elaborate implementation looks more impressive and tends to get more positive feedback signals during training than a terse, minimal one. Simplicity First is a correction to this training bias.

Rule Three: Surgical Changes

The third rule addresses a subtler problem than over-building: scope.

"Don't 'improve' adjacent code, comments, or formatting. Don't refactor things that aren't broken." The key word is "adjacent" — the problem isn't improving the code being changed, it's improving code that was merely near the code being changed.

The rule makes a useful distinction: when your changes create orphans (unused imports, dead variables), clean those up. When you notice pre-existing orphans, mention them — don't clean them up. Your changes are your responsibility; the author's pre-existing decisions are not.

"The test: Every changed line should trace directly to the user's request." This is a definition of scope that's precise enough to be useful. It doesn't require judging whether a change is "good" — it requires tracing whether a change was requested.

This rule targets something specific to how LLMs work with code: they see an entire codebase, notice all the things that could be improved, and sometimes improve them unbidden. That behavior is reasonable in a developer who has unlimited time and clear authorization to improve things. It's harmful in a tool that's supposed to make a precise change and stop.

Rule Four: Goal-Driven Execution

The fourth rule addresses the relationship between vague task descriptions and implementation quality.

"Transform tasks into verifiable goals." The examples are the best part of the rule:

"Add validation" becomes "Write tests for invalid inputs, then make them pass"
"Fix the bug" becomes "Write a test that reproduces it, then make it pass"
"Refactor X" becomes "Ensure tests pass before and after"

The transformation in each case is: convert the task description into a test-first formulation with a clear pass/fail criterion. This is not just good LLM prompting — it's a reasonable software engineering principle in its own right.
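As an illustration of the "Fix the bug" transformation, the sketch below assumes a hypothetical `median` function whose original version returned the wrong value for even-length lists. Neither the function nor the bug comes from the skill. The test is written first to reproduce the report, and the fix is whatever makes it pass.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # The fix: the hypothetical buggy version returned ordered[mid] here,
    # ignoring the lower of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def test_median_even_length():
    # Written before the fix to reproduce the reported bug.
    # With the buggy version this assertion fails (it returned 3).
    assert median([1, 2, 3, 4]) == 2.5


def test_median_odd_length():
    # Guards the behavior that already worked, so the fix cannot break it.
    assert median([3, 1, 2]) == 2


test_median_even_length()
test_median_odd_length()
print("all checks pass")
```

The pass/fail criterion is the point: "the bug is fixed" now means "this specific test passes", which either happened or did not.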

The planning template for multi-step tasks is the most structural element in the entire skill:

1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]

This template does one thing: it requires that every step have an associated verification. Without this constraint, agentic coding sessions tend to drift — each step seems complete, but there's no mechanism to confirm that "seems complete" is actually "is complete."
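One way to picture the template is as a data structure in which a step cannot exist without a check. The sketch below is only an illustration under that assumption; `Step`, `run_plan`, and the toy plan are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    action: Callable[[], None]   # the work to perform
    verify: Callable[[], bool]   # the check that the work actually succeeded


def run_plan(steps):
    """Run steps in order and stop at the first failed verification.

    Returns the descriptions of the steps that were verified.
    """
    completed = []
    for step in steps:
        step.action()
        if not step.verify():
            raise RuntimeError(f"verification failed: {step.description}")
        completed.append(step.description)
    return completed


# Toy plan: each step mutates shared state, and each check inspects it.
state = {}
plan = [
    Step(
        "create config",
        action=lambda: state.update(config={"debug": False}),
        verify=lambda: "config" in state,
    ),
    Step(
        "enable debug",
        action=lambda: state["config"].update(debug=True),
        verify=lambda: state["config"]["debug"] is True,
    ),
]

print(run_plan(plan))
```

Because `verify` is a required field, a step that merely "seems complete" has nowhere to hide: the plan stops at the first check that returns false.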

What the Skill Excludes

The exclusions are as telling as the inclusions. There is no rule about code style, no preferred libraries, no architectural patterns, no language-specific guidance. There is no rule about communication tone, response length, or formatting.

The skill is narrowly scoped to behavioral constraints during coding tasks. This narrow scope is what makes it composable with project-specific CLAUDE.md files and domain skills — it doesn't overlap with them. A team can install karpathy-guidelines alongside their own project CLAUDE.md without conflicts.

The Structure as an Argument

Reading the four rules in sequence, the implicit argument becomes clear: LLMs fail at coding in four predictable ways (wrong task, over-built solution, excessive scope, vague success criteria), and each failure has a targeted remedy that can be encoded as a behavioral constraint.

This is a different theory of LLM improvement than "more training data" or "better reasoning" or "larger context." It's a theory of behavioral correction: the model has the capability; the problem is default behavior patterns that don't match what careful engineering requires. Explicit behavioral constraints, loaded at session start, redirect those defaults.

Whether four rules is the right number, and whether these four are the most important — those are empirical questions. The karpathy-guidelines skill is not the final word on LLM coding constraints. It's a starting point derived from specific observed failure modes.

The starting point is good enough to use today. See the karpathy-guidelines skill page to install it.

Part of the Karpathy on Claude Code series. Published 2026-05-23.
