Software 3.0: Karpathy's Framework Applied to Claude Code

In June 2025, Andrej Karpathy gave a keynote at Y Combinator's AI Startup School titled "Software in the Age of AI is Changing (Again)." The talk extended his now-canonical Software 1.0/2.0 framing into a third paradigm: Software 3.0. In it, the programming language is English, the runtime is a large language model, and the model's context window serves as its working memory.

The framework is elegant and easy to dismiss as high-level positioning. But it maps onto the practical architecture of Claude Code — skills, prompts, context files, agentic workflows — with enough precision to be genuinely useful for understanding what you're actually building when you write a CLAUDE.md file or install a skill.

The Stack, Briefly

Karpathy's Software 1.0 is traditional code: explicit logic written in programming languages, compiled and executed deterministically. The developer controls every instruction. The behavior is fully specified by what was written.

Software 2.0, which he articulated in a 2017 Medium essay, is neural network weights: the "source code" is a dataset, the "compilation" is training, and the resulting program is a set of learned parameters. The developer specifies the loss function and architecture, not the logic. Behavior emerges from data rather than explicit instruction.

Software 3.0 extends the logic: the programming language is natural language, the runtime is an LLM, and the context window is RAM. Prompts are programs. The developer's job becomes specifying intent precisely enough that the model executes it correctly — context engineering rather than code authorship.

These three layers don't replace each other cleanly. Modern production systems combine all three: traditional code handling infrastructure and deterministic logic, trained models handling perception and classification, and LLMs handling open-ended reasoning and generation.
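A toy request path makes that division of labor concrete. Everything here is hypothetical: the support-ticket scenario, the hand-picked weights standing in for a trained model, and the `llm` parameter. That parameter is just any function from prompt text to reply text, not a specific model API.

```python
from typing import Callable, Optional

# Software 1.0: explicit, hand-written logic; behavior is exactly what the code says.
def normalize(ticket: str) -> str:
    """Collapse runs of whitespace and lowercase the text."""
    return " ".join(ticket.split()).lower()

# Software 2.0: behavior lives in learned parameters rather than hand-written rules.
# These weights are invented for illustration; a real system would learn them from data.
WEIGHTS = {"refund": 2.0, "broken": 1.5, "thanks": -1.0}
BIAS = -1.0

def is_complaint(text: str) -> bool:
    """Toy linear classifier: bias plus the weight of every known word, thresholded at 0."""
    score = BIAS + sum(WEIGHTS.get(word, 0.0) for word in text.split())
    return score > 0

# Software 3.0: the "program" is an English prompt and the LLM is the runtime.
DRAFT_REPLY_PROGRAM = (
    "You are a support agent. Write a short, polite reply to the customer "
    "message below. Acknowledge the problem and offer one next step.\n\n"
    "Message: {message}"
)

def handle(ticket: str, llm: Callable[[str], str]) -> Optional[str]:
    """Route a ticket through all three layers; `llm` is any prompt-to-text function."""
    text = normalize(ticket)            # 1.0: deterministic preprocessing
    if not is_complaint(text):          # 2.0: learned classification
        return None
    return llm(DRAFT_REPLY_PROGRAM.format(message=text))  # 3.0: open-ended generation
```

If you pass a stub function as `llm`, you can test the 1.0 and 2.0 layers without calling a model. That is one practical reason to keep the layers separate.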

How This Maps to Claude Code

Claude Code is a Software 3.0 interface for software development itself. But within Claude Code, the same three-layer architecture reappears at a smaller scale.

Skills are Software 3.0 components. A skill is a natural-language specification of how Claude should behave across a domain. The karpathy-guidelines skill is a prompt program: four rules written in plain English that redirect Claude's behavior during coding sessions. When Claude Code loads a skill's instructions, it is loading a Software 3.0 module into the context window.

CLAUDE.md files are the project-level context. In Karpathy's framing, the context window is RAM — the working memory that shapes how the model processes the current task. A CLAUDE.md file at the project root is the project-level state loaded into that RAM at session start. It encodes team conventions, architectural constraints, and behavioral guidelines that apply for the entire session.
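If the context window is RAM, the start of a session is roughly a load step. The sketch below is a deliberately simplified model of that idea, not Claude Code's implementation. Every detail is an assumption made for illustration: the `load_session_context` function, the section headers, the character budget (a crude stand-in for tokens), and placing CLAUDE.md before skills.

```python
from pathlib import Path

def load_session_context(project_root: Path, skill_texts: list[str],
                         budget_chars: int = 20_000) -> str:
    """Assemble a session's working memory: project state first, then loaded skills.

    Hypothetical sketch. Characters stand in for tokens, and the fixed ordering
    is an assumption made for readability, not documented Claude Code behavior.
    """
    sections = []
    claude_md = project_root / "CLAUDE.md"
    if claude_md.is_file():
        # Project-level state: conventions and constraints for the whole session.
        sections.append("# Project context (CLAUDE.md)\n" + claude_md.read_text(encoding="utf-8"))
    for i, text in enumerate(skill_texts, start=1):
        sections.append(f"# Skill {i}\n{text}")
    context = "\n\n".join(sections)
    if len(context) > budget_chars:
        # The "RAM" is finite: fail loudly rather than silently dropping instructions.
        raise ValueError(f"context is {len(context)} chars; budget is {budget_chars}")
    return context
```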

Agentic workflows are multi-step execution loops. When Claude Code runs a task autonomously — creating files, running tests, iterating on failures — it's operating as a Software 3.0 agent: natural language goals at the top, traditional code execution (terminals, file systems, build tools) at the bottom, with the model reasoning in between.
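The shape of that loop fits in a few lines. This is a sketch under stated assumptions. `Step`, `run_agent`, and the tool names are hypothetical, and `model` is any function that reads the transcript and returns the next action. Claude Code's actual tool-calling protocol is richer and is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    tool: str           # name of the tool the model chose, or "done" to stop
    argument: str = ""

def run_agent(goal: str,
              model: Callable[[list[str]], Step],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 10) -> list[str]:
    """Minimal agent loop: the model picks actions, ordinary code carries them out."""
    transcript = [f"GOAL: {goal}"]                     # natural-language goal at the top
    for _ in range(max_steps):
        step = model(transcript)                       # model reasoning in between
        if step.tool == "done":
            transcript.append("DONE")
            return transcript
        observation = tools[step.tool](step.argument)  # traditional code at the bottom
        transcript.append(f"{step.tool}({step.argument}) -> {observation}")
    transcript.append("STOPPED: step limit reached")
    return transcript
```

The `max_steps` cap is a deliberate design choice. When a model decides a loop's exit condition, the loop needs a deterministic backstop.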

The Prompt Is the Program

The practical implication of Software 3.0 for Claude Code users is that prompt quality is code quality. A vague instruction isn't just inefficient — it's a malformed program. It will execute, but what it produces is undefined.

This is precisely what the karpathy-guidelines skill addresses at the behavioral level. "Goal-Driven Execution" converts "fix the bug" into "write a test that reproduces it, then make it pass." That is prompt engineering as software engineering. You're specifying a program precisely enough that the runtime's output can be checked against an explicit success criterion, even though the runtime itself is not deterministic.
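Applied to a concrete bug, the rewritten instruction has a checkable shape: a test that fails before the change and passes after it. The function `last_n` and its `n == 0` bug are invented for illustration.

```python
from typing import Callable

def last_n_buggy(items: list, n: int) -> list:
    # Original code. Bug: when n == 0, items[-0:] is items[0:], so the whole list comes back.
    return items[-n:]

def last_n(items: list, n: int) -> list:
    """Return the last n items; n == 0 returns an empty list."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return items[max(len(items) - n, 0):]

def passes_regression_test(impl: Callable[[list, int], list]) -> bool:
    """Written first: encodes the bug report as a check any implementation must pass."""
    return impl([1, 2, 3], 0) == []

# Step 1: the test fails on the original code, so it really reproduces the bug.
assert not passes_regression_test(last_n_buggy)
# Step 2: the fix makes it pass without breaking the ordinary case.
assert passes_regression_test(last_n)
assert last_n([1, 2, 3], 2) == [2, 3]
```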

"Think Before Coding" addresses the equivalent of type errors: catching mismatches between what was specified and what the runtime assumes before execution begins. Surface ambiguity before implementation, not after.

The fit between Karpathy's Software 3.0 framing and the karpathy-guidelines skill isn't coincidental. Both emerge from the same observation: that natural language interfaces require the same discipline as traditional code. Imprecision is a bug. Ambiguity is undefined behavior.

The Developer's Role Shifts

Karpathy is careful not to frame Software 3.0 as the elimination of programming skill. His actual argument is that the developer's role shifts from writing instructions to writing specifications and managing context.

In Software 1.0, the developer writes the program. In Software 2.0, the developer curates the dataset. In Software 3.0, the developer engineers the context: what the model knows, what constraints it operates under, what success looks like, and how it should handle ambiguity.
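The four elements in that list can be treated as fields of a structured spec that renders into prompt text. `ContextSpec`, its field names, and the section headings are illustrative assumptions, not a format Claude Code requires.

```python
from dataclasses import dataclass

@dataclass
class ContextSpec:
    """The four things the developer engineers; field names are illustrative."""
    knowledge: list[str]          # what the model knows
    constraints: list[str]        # what constraints it operates under
    success_criteria: list[str]   # what success looks like
    ambiguity_policy: str = "Ask a clarifying question instead of guessing."

    def render(self) -> str:
        """Turn the spec into prompt text, one section per element."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none)"
        return (
            f"## Background\n{bullets(self.knowledge)}\n\n"
            f"## Constraints\n{bullets(self.constraints)}\n\n"
            f"## Done when\n{bullets(self.success_criteria)}\n\n"
            f"## If anything is ambiguous\n{self.ambiguity_policy}"
        )
```

With each element in its own field, an empty one stays visible (it renders as "(none)") rather than vanishing into a paragraph of prose.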

This is not easier than traditional programming. It requires a different kind of discipline — one that favors precision in natural language over precision in syntax. CLAUDE.md files, carefully composed skills, explicit verification criteria: these are Software 3.0 artifacts that require as much care as production code.

Teams that treat them as afterthoughts produce the Software 3.0 equivalent of spaghetti code: systems whose behavior is undefined, emergent in the wrong ways, and impossible to debug systematically.

Skills as Composable Modules

The Software 3.0 framing also shows why skill composability matters. In Software 1.0, you compose functions and modules. In Software 3.0, you compose context: the skills loaded in a session combine to shape behavior across that session.

The karpathy-guidelines skill is deliberately abstract: it applies to any coding domain. A domain-specific skill (say, one encoding TypeScript best practices) layers on top of it. The two shouldn't interfere, because they operate at different levels of abstraction: behavioral constraints versus domain knowledge.

This is why the single-skill design of the karpathy-guidelines repo isn't a limitation — it's a feature. Abstract behavioral constraints compose with domain-specific skills cleanly. Mix behavioral guidelines into domain skills and you lose that clean separation.
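Composing context can be sketched much like composing modules. In this hypothetical model, each skill carries a `layer` label (`behavioral` or `domain`). Behavioral constraints go first, and duplicate names are rejected. The `Skill` type, the labels, and the ordering rule are assumptions; Claude Code does not expose skills through an interface like this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skill:
    name: str
    layer: str   # "behavioral" (how to work) or "domain" (what to know); hypothetical labels
    text: str

LAYER_ORDER = {"behavioral": 0, "domain": 1}

def compose(skills: list[Skill]) -> str:
    """Combine skills into one context block, behavioral constraints first.

    sorted() is stable, so skills keep their input order within a layer.
    Duplicate names are rejected; an unknown layer label raises KeyError.
    """
    names = [s.name for s in skills]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate skills: {duplicates}")
    ordered = sorted(skills, key=lambda s: LAYER_ORDER[s.layer])
    return "\n\n".join(f"[{s.layer}:{s.name}]\n{s.text}" for s in ordered)
```

Keeping the layers disjoint means either skill can be replaced without editing the other.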

The Limits of the Framework

Karpathy's framing is clarifying, but it has limits worth naming. Software 3.0 assumes the runtime is capable. A poorly specified prompt given to a capable model often produces better output than a well-specified prompt given to an incapable one. The framework explains the developer's job, but it doesn't fully account for variance in model capability, which matters enormously in practice.

It also understates the cost of debugging failures. When a Software 1.0 program fails, the stack trace is usually informative. When a Software 3.0 agent fails — goes down the wrong path, misinterprets intent, produces output that's technically correct but wrong in context — the failure mode is harder to trace. The karpathy-guidelines skill addresses some of this through explicit verification criteria, but the fundamental diagnostic challenge of non-deterministic systems remains.
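Here is one way to read "explicit verification criteria" in code terms. State each criterion as a named check, run them all when the agent reports it is done, and keep a report of which ones failed. The `verify` helper below is a hypothetical sketch of that idea.

```python
from typing import Callable

Check = tuple[str, Callable[[], bool]]   # (criterion description, predicate)

def verify(checks: list[Check]) -> list[str]:
    """Run every verification criterion and describe each one that failed.

    A check that raises counts as a failure and keeps its exception message,
    so the report names which stated criterion broke, not just that something did.
    """
    failures = []
    for description, predicate in checks:
        try:
            ok = predicate()
        except Exception as exc:  # a crashing check is itself useful diagnostic output
            failures.append(f"{description}: raised {type(exc).__name__}: {exc}")
            continue
        if not ok:
            failures.append(f"{description}: returned False")
    return failures
```

This narrows the diagnostic problem without removing it, because the report can only name failures someone thought to write a check for.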

These are known limits, not fatal flaws. The framework is still the clearest conceptual map available for understanding what Claude Code is and what it requires from the developer.

For the more specific question of what Karpathy's guidelines do in practice, see Reading Karpathy's Guidelines: What His One Skill Reveals. For the CLAUDE.md angle specifically, see CLAUDE.md as Context Engineering.

Part of the Karpathy on Claude Code series. Published 2026-05-23.
