From nanoGPT to Claude Skills: The Pedagogy of Karpathy's Public Code
Karpathy's 'build it from scratch' lineage — nanoGPT, micrograd, makemore — encodes a theory of understanding that ports directly into how the best Claude skills are designed.
Andrej Karpathy's most-starred GitHub repositories share a property that's easy to overlook: they are deliberately small. nanoGPT is roughly 300 lines of training loop plus 300 lines of model definition. micrograd — his implementation of a scalar-valued autograd engine — is about 100 lines. makemore, his character-level language model series, starts from near-nothing and builds incrementally across a sequence of video lectures.
The smallness is not a constraint. It's the point. Karpathy's public code is pedagogically engineered to be fully understandable by a single person in a single sitting. That design philosophy — minimum surface area, maximum transparency, no abstractions unless the abstraction earns its complexity — shows up again in the karpathy-guidelines skill, which is 65 lines long and does exactly as much as it needs to.
This isn't accidental convergence. It's a coherent theory of what makes technical knowledge transferable.
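For a sense of scale, a scalar-valued autograd engine of the kind micrograd implements fits on one screen. The following is a minimal sketch written for this article, not Karpathy's code: the `Value` class name, its attributes, and the choice to support only addition and multiplication are assumptions made to keep the example short.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order nodes so each one is processed after everything that depends on it.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(self)/d(self)
        for node in reversed(order):
            node._backward()


a, b = Value(2.0), Value(3.0)
c = a * b + a  # c = 2*3 + 2
c.backward()
print(c.data, a.grad, b.grad)  # 8.0 4.0 2.0
```

Every step of the chain rule is visible in the two `_backward` closures, which is the kind of transparency the small-code approach is after.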
The nanoGPT Design
nanoGPT was first published in January 2023 as "the simplest, fastest repository for training/finetuning medium-sized GPTs." Karpathy built it as a rewrite of his earlier minGPT, with one explicit tradeoff: readability over modularity. The training loop is not parameterized for every possible configuration. The model definition does not implement every transformer variant. The code does the specific thing it needs to do, cleanly, without hiding what's actually happening.
The companion build-nanoGPT repository makes the pedagogical intent explicit: the git history is a teaching tool. Each commit is a step in the derivation, kept clean so a learner can walk through it sequentially. The code doesn't just run — it can be read in order of construction, which is the order that produces understanding.
This stands in sharp contrast to most production ML code, which is parameterized for configurability, wrapped in abstractions for extensibility, and documented for users rather than for learners. Karpathy chose comprehensibility over flexibility at every decision point.
The Pedagogy Pattern
Karpathy's "Zero to Hero" YouTube series makes the pedagogical pattern more explicit. Each project in the series, whether nanoGPT, micrograd, or makemore, starts from an empty file and builds toward a working implementation, with each step explained before it is implemented. The code that results is not the code you'd write for production. It's the code that most efficiently communicates how the thing works.
This is a specific kind of knowledge claim: that the best way to understand something is to build it yourself from first principles, in minimum code, with every decision explained as it's made. It's a rejection of the "use the library, don't worry about what's inside" approach to technical education.
The implications for skill design are direct. A skill that tells Claude what to do is less transferable than a skill that tells Claude how to think about a class of problem. The karpathy-guidelines skill doesn't say "when the user asks you to add a feature, here's the procedure." It says "always write minimum code that solves the stated problem" — a constraint that applies to any feature, any language, any context. The principle transfers; the procedure doesn't.
What This Means for Skill Authoring
The most common failure mode in skill authoring is procedure-over-principle: the skill encodes a specific workflow rather than the underlying principle that makes the workflow good. This produces skills that work well for the exact use case the author had in mind and generalize poorly.
Karpathy's approach — distill to minimum, explain the principle, let the learner apply it — suggests a different frame for skill authoring. Before writing the procedure, ask: what is the underlying constraint or goal that makes this procedure correct? Write that first. The procedure, if it's still needed, follows from the principle.
The karpathy-guidelines skill demonstrates this clearly. "Goal-Driven Execution" doesn't specify a procedure for every task type. It specifies a transformation: convert vague task descriptions into verifiable success criteria. That transformation applies to bug fixes, feature additions, refactors, and migrations. The principle is general; the application is per-context.
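The transformation from a vague task to verifiable success criteria can be sketched in a few lines. This is an illustration only, not part of the karpathy-guidelines skill: the `Criterion` dataclass, the `unmet` helper, and the `slugify` function standing in for the code under change are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One verifiable statement of what 'done' means (hypothetical type)."""
    description: str
    check: Callable[[], bool]


def slugify(title: str) -> str:
    # Hypothetical function under change; split() with no arguments
    # drops leading/trailing whitespace and collapses runs of spaces.
    return "-".join(title.lower().split())


# Vague task: "make slugify handle whitespace properly".
# Goal-driven version: the same task restated as checks that pass or fail.
criteria = [
    Criterion("surrounding whitespace is dropped",
              lambda: slugify("  Hello World ") == "hello-world"),
    Criterion("runs of spaces collapse to one hyphen",
              lambda: slugify("a   b") == "a-b"),
    Criterion("empty input yields an empty slug",
              lambda: slugify("") == ""),
]


def unmet(items: list[Criterion]) -> list[str]:
    """Return descriptions of criteria that do not hold yet."""
    return [c.description for c in items if not c.check()]


print(unmet(criteria))  # []
```

The same shape works for a bug fix (a failing reproduction that must pass) or a migration (row counts that must match); only the checks change, which is the sense in which the principle is general and the application is per-context.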
The Tension with Convenience
Karpathy's pedagogy is not frictionless. nanoGPT is not the quickest route to a trained GPT; a high-level wrapper such as Hugging Face's Trainer gets you there with far less code to read and write. micrograd is not the recommended autograd engine for production work; PyTorch is. The "build from scratch" approach costs time and requires background knowledge that production libraries abstract away.
This tension surfaces in skill design too. A skill that encodes a detailed, opinionated workflow for a specific tool can dramatically accelerate a developer who already knows what they want to do. A skill that encodes a general principle requires the developer to translate that principle into action themselves.
The karpathy-guidelines skill lands on the principled side of this tradeoff. It doesn't tell you how to write React components or structure a database migration. It tells you how to think about the task before you start. That's more cognitively demanding than a procedure-following skill, and it's also more durable.
The question isn't which approach is universally better — it's which approach fits the problem. For behavioral constraints that should apply across all tasks, principles beat procedures. For domain-specific workflows where the "right" approach is known and consistent, procedures beat principles.
What's Left Out
Karpathy's pedagogical approach leaves one kind of knowledge out: the accumulated experience of debugging production systems at scale. nanoGPT doesn't cover distributed training failures, model serving latency under heavy traffic, or the organizational complexity of keeping a large ML codebase maintained by multiple teams. It covers what you need to understand to build the thing from scratch.
This is a deliberate scope choice, not an oversight. The "Zero to Hero" framing is literal: the goal is to take someone from no understanding to working implementation. What happens after you've built it is outside scope.
The same scope limitation applies to the karpathy-guidelines skill. It addresses the behavioral failure modes of a single LLM coding session. It doesn't address code review culture, test coverage strategy, or long-term codebase health. Those are real problems; they require different tools.
For the tension between the "build from scratch" philosophy and the install-a-skill ecosystem, see The "Build it from Scratch" Discipline in the Age of Skills.
Part of the Karpathy on Claude Code series. Published 2026-05-23.