The Supply and Demand of AI Tokens | Dylan Patel Interview

ELI5/TLDR

Dylan Patel runs SemiAnalysis, the firm that sells data on chips and data centers. His team’s AI bill went from tens of thousands to seven million a year in a few months — now 25% of what he pays his humans, heading for 100%. He thinks the models are so good and the compute so scarce that tokens themselves are becoming a rationed resource, where the people who get early access to the best model quietly destroy everyone else. Supply can’t catch up for years, so margins keep fattening up and down the stack.

The Full Story

The $7M bill

The best way into this episode is the boring spreadsheet moment. Last year SemiAnalysis spent tens of thousands a year on AI subscriptions — the normal “give everyone ChatGPT” line item. In January the number started doubling every few weeks. By the time of recording they were on a $7M annual run-rate for Claude Code alone, against a $25M salary base. That is not a typo. A quarter of payroll, going to one vendor, growing so fast the trajectory crosses 100% by year-end.
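The headline ratio is easy to check. The sketch below uses only the two figures quoted above ($7M of AI spend against a $25M salary base). The idea of counting further doublings is an illustrative assumption, not a forecast from the interview, and `doublings_to_match` is a hypothetical helper name.

```python
import math

# Figures quoted in the paragraph above (annual run-rates, in USD).
ai_spend = 7_000_000
salary_base = 25_000_000


def doublings_to_match(spend: float, target: float) -> float:
    """Number of doublings needed for `spend` to reach `target`.

    Illustrative assumption: spend keeps growing by doubling.
    """
    return math.log2(target / spend)


share = ai_spend / salary_base
n = doublings_to_match(ai_spend, salary_base)

print(f"AI spend as a share of salaries: {share:.0%}")
print(f"Doublings needed to reach 100% of salaries: {n:.2f}")
```

On these inputs the share is 28%, which the episode rounds to "a quarter", and fewer than two more doublings would put the AI bill level with payroll.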

If this person can do the work of five to ten to fifteen people using Claude Code, then all of a sudden I should probably cut people.

He doesn’t have to, because his company is growing faster than he can hire. But the math is sitting there in plain view.

The examples he gives are the interesting bit. A guy in the Oregon lab, with a couple thousand dollars of tokens, built a tool that looks at microscope images of chips and auto-labels every material — copper here, tantalum there, cobalt on the gate. An ex-Intel employee on the team said the same tool was a team’s full-time job at Intel. An in-house economist named Malcolm wired up FRED and BLS data, built a benchmark grading 2,000 labor-statistics tasks against AI capability, and produced what he called a “phantom GDP” analysis — output goes up, costs fall so far that measured GDP theoretically shrinks. One person. A couple of weeks. Would have been 200 economists and a year at a bank.

And then the energy team. SemiAnalysis had been trying for a year to break into the $900M energy-data-services market with multiple analysts and not much to show for it. One engineer caught “Claude psychosis,” spent $6,000 a day for three weeks, and scraped every power plant and high-voltage transmission line in the US into a dashboard. Customers compared it favorably to products from 100-person teams that had been at it for a decade.

The lesson Patel draws is not “AI is magic.” It’s a reordering of what’s scarce.

What used to matter a lot was execution was very very difficult and ideas were cheap. Now ideas are cheap and plentiful but execution is very easy. So really only the good ideas are the ones that can justify the spend on super cheap implementation.

Tokens as a rationed good

Here’s the counterintuitive piece for anyone who thinks of software as infinitely scalable. Anthropic’s revenue went from $9B to $40B+ run rate in a few months, but their compute did not grow anywhere near that fast. If you work the math, their gross margins went from around 30% earlier in the year to a floor of 72% now, possibly higher. A commoditized-looking business is quietly pulling software-tier margins because it physically cannot serve all the demand.
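One way to "work the math" is to back out the cost of revenue implied by each margin figure. This is a rough sketch under stated assumptions: it pairs the 30% margin with the $9B run rate and the 72% floor with the $40B run rate, and it treats cost of revenue as a stand-in for compute spend. Neither pairing is spelled out in the episode, and `implied_cost` is a hypothetical helper.

```python
def implied_cost(revenue_bn: float, gross_margin: float) -> float:
    """Cost of revenue (in $B) implied by a run rate and a gross margin."""
    return revenue_bn * (1.0 - gross_margin)


# Assumed pairing of the quoted figures (see the note above).
cost_before = implied_cost(9.0, 0.30)
cost_after = implied_cost(40.0, 0.72)

revenue_growth = 40.0 / 9.0
cost_growth = cost_after / cost_before

print(f"Implied cost before: ${cost_before:.1f}B, after: ${cost_after:.1f}B")
print(f"Revenue grew {revenue_growth:.1f}x, implied cost grew {cost_growth:.1f}x")
```

Under those assumptions revenue grows a bit over four times while implied cost grows less than two times, which is the gap the rationing argument rests on.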

Which means tokens are being rationed. Not by price-gouging — by rate limits, enterprise contracts, and who you know. Patel admits, without embarrassment, to being on his knees in front of an Anthropic co-founder begging for access to “Mythos” (the internal name for the model released as Opus 4.7). Top banks get it for cybersecurity first. Everyone else gets a deliberately downgraded version. He’s predicting a world where a Ken Griffin-type walks in, writes a check for “the first $10 billion of tokens from every new model,” and quietly eats his competitors’ lunch before anyone else has access.

He’s refreshingly unsentimental about this.

AI is very expensive. Who’s going to pay for the trillion dollars of infrastructure? People who have money and can build useful things with AI.

The broader claim — and this is the line to remember — is that if you don’t use more tokens, you’ll never escape the permanent underclass. Three separate problems stacked on each other. You have to use more tokens. You have to generate outsized value from them. And you have to capture that value. Miss any one and you’re stuck.

Why the supply side can’t catch up

Normally, when demand spikes like this, supply reorients and the shortage becomes a glut. Patel’s case for why that takes years this time is a tour of the stack. Imagine the supply chain as a series of pipes, each one sold out:

GPUs: H100s are going up in price, not down. Useful life, which people said was 5 years, is stretching to 7-8. Clusters are being re-signed for another 3-4 years. Cloud margins that looked like 35% are actually higher.
Memory: DRAM and NAND capacity only grows 20-30% a year at max. Even with aggressive expansion decisions made now, new supply doesn’t land until late 2027 or 2028. In the meantime, prices double and triple. The only mechanism to free up capacity is demand destruction via price — no rationing boards here.
Logic (TSMC): Sold out, but being “nice” — single-digit price increases instead of triple-digit. Capex is $57B this year; Patel thinks it will be $100B by 2028, and the downstream supply chain hasn’t priced that in.
Deep supply chain: ASML sold out, needs Zeiss to expand faster. Copper foil, glass fibers for PCBs, lasers, wafer fab equipment — the deeper you go, the tighter it gets. Think of it like a whip — a small flick at the TSMC end becomes a massive crack out at the equipment makers.
CPUs: Not the star of the show, but completely sold out. Two reasons. First, reinforcement learning — the environments that grade whether a model’s attempt at a task succeeded run on CPUs, not GPUs. As those environments get more complex (open a CAD file, edit it, submit it), CPU demand explodes. Second, all the AI-generated software has to actually run somewhere, and that somewhere is a CPU.
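The memory figures above imply a simple compounding constraint. As a minimal sketch, assuming capacity grows at a constant annual rate within the quoted 20-30% ceiling, the time for supply to double can be computed directly. The doubling target is an illustrative choice rather than a number from the interview, and `years_to_multiply` is a hypothetical helper.

```python
import math


def years_to_multiply(annual_growth: float, multiple: float) -> float:
    """Years of constant compound growth needed to reach `multiple`."""
    return math.log(multiple) / math.log(1.0 + annual_growth)


# The 20% and 30% rates are the ceiling range quoted for DRAM and NAND.
for growth in (0.20, 0.30):
    years = years_to_multiply(growth, 2.0)
    print(f"{growth:.0%} annual growth: {years:.1f} years to double capacity")
```

Even at the top of the quoted range, doubling capacity takes more than two and a half years, which is consistent with the claim that price is the only thing that can balance the market in the meantime.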

The software-only singularity is a blip

A nice riff toward the end. The worry that AI will only automate software and leave the physical world alone — Patel thinks that’s a temporary state. The current robotics models (VLAs — vision-language-action) are data-hungry and don’t scale well. But humans are sample-efficient — one or two examples and we’re good. He thinks a pre-training breakthrough for robots is 6-18 months out, which would unlock few-shot learning: show the robot three examples of folding a shirt, it folds shirts. Once that clicks, the physical economy gets its own deflationary wave and token demand goes up again, not down.

What he’s worried about

Asked what happens in three months, Patel’s answer is dry and sharp: large-scale protests against Anthropic and OpenAI. The Pew data has AI less popular than ICE, less popular than politicians. Sam Altman has had two Molotov cocktails thrown at his house; the comments sections are cheering. His advice to the industry: stop putting Dario and Sam in interviews (he describes them as uncharismatic), stop talking about future capabilities, and start showing concrete present-day uplift. The public has no connection to these companies and views them as a “sneaky cabal of 5,000 people who are going to automate all the jobs and destroy society.”

Key Takeaways

SemiAnalysis is spending 25% of salary on Claude Code, up from near-zero last year, and trending toward 100% by year-end.
Anthropic’s gross margins went from ~30% earlier in the year to a floor of 72% now. That is a rationing artifact, not a pricing decision: demand vastly exceeds compute.
“Phantom GDP”: output rises but measured GDP shrinks because costs collapse faster than volume grows. Traditional GDP metrics miss most of the value AI creates.
Tokens are becoming a positional good. Early access to frontier models (like Opus 4.7 / “Mythos”) goes to enterprise accounts, banks, and well-connected players first. The rest get deliberately downgraded versions.
The three-part test for staying relevant: (1) use more tokens, (2) generate outsized value from them, (3) capture that value. Miss any and you fall into the “permanent underclass.”
GPU useful life is stretching to 7-8 years, not the 5 people assumed. Clusters are being re-signed for another 3-4 years, which quietly pushes cloud margins above the 35% they appeared to be.
The memory shortage is a 2-3 year problem. DRAM prices will double or triple from here before new fab capacity lands in late 2027 or 2028.
TSMC capex could hit $100B by 2028, roughly double consensus. The tail whips hardest for equipment makers (Lam, Applied Materials, ASML, MKSI).
CPUs are the sleeper bottleneck — needed for RL environments and for running all the AI-generated software.
Implementation cost has collapsed, so idea selection is the new scarce skill. Research cycles compressed from 6 months to 2 months. Anthropic went from L4-engineer capability (Opus 4.6) to L6-engineer capability (Mythos) in two months.
Mythos is “the biggest step up in model capabilities in 2 years” — cyber capabilities deliberately degraded in the public release; full version given to select banks only.
Robotics breakthrough predicted in 6-18 months via few-shot learning on pre-trained robot models — then the physical world joins the deflation.
Social backlash is Patel’s near-term concern. AI polls lower than politicians. Expect organized protest in the next quarter.

Claude’s Take

This is Dylan Patel in his most useful mode — reading the supply chain like a balance sheet and talking dollar amounts that are two or three orders of magnitude bigger than the average AI conversation. The $7M-on-$25M stat alone is worth the forty-five minutes. It’s the cleanest single data point I’ve seen for what “we’re living through an actual capex supercycle” means in a real firm, not a keynote slide.

The piece of the argument that earns the 8/10 is the interlock between three claims most people treat separately: (a) Anthropic’s margins expanded because of compute scarcity, not pricing power; (b) that scarcity will persist for years because memory and fab capacity can’t flex; (c) therefore frontier access is becoming a rivalrous, positional good. Put those together and you get a clean thesis for why this cycle doesn’t look like a typical commodity shortage that glut-corrects in 18 months. It also explains why Anthropic’s revenue shape (from $9B to $40B ARR while compute barely grew) isn’t a sign of some magical product moat — it’s the math of people fighting over rate limits.

Where to keep a skeptical eyebrow up. The “phantom GDP” framing is vibes-heavy — it’s a real phenomenon, but Patel doesn’t actually produce numbers, and his own economist colleague built a benchmark, not a measurement. The “if you don’t use more tokens you’ll join the permanent underclass” line is the kind of thing that sounds profound in an SF podcast and corny in a factory town. And his prediction of large-scale anti-AI protests in three months is a bold forecast; if it doesn’t happen, that’s useful signal about how wrong the Bay Area bubble can be about the mood outside it.

The uncomfortable bit, though, and the reason this one’s worth a careful read rather than a skim: his first-person account of his own firm is the most convincing version of the “this is different” argument I’ve heard. Not because he waves his hands about AGI, but because he walks through three specific products his team built in weeks that used to take whole teams: a team’s full-time job at Intel, an estimated 200 economists for a year at a bank, and 100-person teams working for a decade in energy data. That’s the shape of a real productivity shift, and his framing of the scarce inputs (compute, good ideas to point it at, and capital to buy access) is the most useful mental model in the episode.