The Karpathy CLAUDE.md File That 43,000 Developers Installed in 1 Week (Full Breakdown)

ELI5/TLDR

Andrej Karpathy tweeted his pet peeves about AI coding agents — they assume too much, overbuild, edit things you didn’t ask for, and lose the thread on what “done” means. A developer named Forest distilled the tweet into a single CLAUDE.md file, dropped it on GitHub, and 43,000 people starred it in a week. The file is just four rules. The video walks through each, side by side with a vanilla Claude Code session, to show the rules actually changing behaviour.

The Full Story

The setup

The repo is obra/superpowers — sorry, the video calls it the “Karpathy skills” repo by Forest. One markdown file. You drop it into your project and Claude Code starts behaving like the agent Karpathy wishes it were. The host runs every test twice: vanilla session on the left, Karpathy-loaded session on the right, same prompt, same codebase (a “Rubric” agent dashboard app).

Rule 1 — Think before coding

Karpathy’s complaint:

The most common category of mistake that these agents make is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don’t manage their confusion. They don’t seek clarifications and they don’t surface inconsistencies.

Test: “add a light mode toggle.” Vanilla Claude reports success. The toggle isn’t there. Karpathy Claude actually adds it, top right next to search, and bothers to pick the right icon colours so it doesn’t look broken next to the others. The framing the host lands on: without the rule, Claude assumes what you want; with it, Claude asks first.

Rule 2 — Simplicity first

Karpathy’s complaint, paraphrased: agents default to bloated thousand-line solutions when twenty would do. They were trained on production codebases, so they reach for production patterns even when you asked for a five-minute feature.

Test: add a search bar that filters a tab list. Vanilla session: claims success, nothing changed. Karpathy session: ships it in 20 lines, no clever separator-tracking logic, no extra UI it wasn’t asked for. The vanilla session, when it eventually does something, uses 50%+ more code.

Rule 3 — Surgical changes

The subtle one. Karpathy:

They still sometimes change or remove comments in code that they don’t like or don’t sufficiently understand even if it is orthogonal or not related to the task at hand.

Test: change the font from Outfit to Inter. Vanilla Claude burns tokens debugging why its change didn’t apply (it confirmed success but the dashboard still shows Outfit). Karpathy Claude finds every instance, swaps it, leaves the Google Fonts URL structure and font-family declarations untouched. The host’s reframe: extra lines look like productivity but are just productivity for productivity’s sake — and they cost tokens.

Rule 4 — Goal-driven execution

Define “done.” Don’t dictate the path. Karpathy again:

LLMs are exceptionally good at looping until they meet specific goals.

Imperative (“put the icon picker in the top right corner with a 4x4 grid of options”) versus declarative (“let the user pick an icon for each agent”). The host gives the goal-only prompt and the agent figures out where the picker lives, how many icons to show, what they look like. When you click an agent, an icon picker opens; selecting one updates the UI.

Key Takeaways

Bake an “ask first” rule into CLAUDE.md — explicit instructions to surface assumptions, ask clarifying questions, flag inconsistencies before writing code.
Add a simplicity clause — something like “prefer 20 lines to 200; only reach for production patterns if asked.” Counters the production-codebase training bias.
Forbid orthogonal edits — “do not touch comments, formatting, or unrelated code.” Saves tokens and prevents drift in long sessions.
Write prompts declaratively — describe done, not the path. Especially for UI tasks where the agent has more taste than you give it credit for.
Verify the actual app, not the agent’s report — the vanilla session repeatedly claimed success when nothing had changed. Trust the running build, not the chat.
Worth comparing against your own CLAUDE.md — yours is a project memory file (vault structure, scripts, conventions) rather than a behavioural one. The Karpathy file is orthogonal — could live alongside.

Claude’s Take

Useful, mostly because the side-by-side demos make the abstract advice concrete. The four rules themselves aren’t new — anyone who’s used Claude Code for more than a week has run into all four failure modes and probably written some version of these rules into their own setup. The contribution is the packaging: one file, four principles, viral GitHub repo.

The video does soft-pedal a few things. The “vanilla Claude reported success but nothing changed” failure happens, but not as reliably as the demos suggest — there’s some cherry-picking. The 43,000 stars are partly a Karpathy-name halo. And “goal-driven” prompting can backfire when the model picks a wrong frame and runs with it for an hour; the same Rule 1 (ask first) is meant to catch this, but the rules pull against each other in practice.

Still, six out of ten because the takeaway is real: most people’s CLAUDE.md files are project context, not behavioural guardrails. Adding the four rules — or some version — is genuinely cheap upside.