Claude Code Psychosis: How SemiAnalysis Is Token Mogging Meta

SemiAnalysis Weekly published 2026-04-10 added 2026-04-11
ai claude-code agents semianalysis research-automation financial-modeling productivity

ELI5/TLDR

A research firm that writes about chips and AI infrastructure has gone fully native on AI coding tools. They’ve built a swarm of AI agents that do the grunt work of financial analysis — reading earnings calls, building spreadsheets, summarizing conference talks — so their human analysts can cover five to ten times as many companies. They claim to burn through more AI tokens per employee than Meta does, and they seem to think most of the business world has no idea what’s coming for them.

The Full Story

The framing of this podcast is half flex, half field report. Dylan Patel — who runs SemiAnalysis, a research shop that writes about semiconductors and AI hardware — has been tweeting that his team uses more than twice as many Claude tokens per employee as Meta does. In the AI crowd this is apparently a point of pride. The episode’s stated goal is to show how his team actually spends all those tokens, and what it’s bought them.

The agent swarm, explained by an analogy

Dan Nishball, one of the research leads calling in from Singapore, walks through his team’s setup. This is the core of the episode, and it’s worth slowing down because it’s a glimpse of a workflow that most knowledge workers haven’t encountered yet.

Think of a traditional equity research team. A senior analyst covers some number of companies — maybe ten, maybe fifteen. For each company they build a financial model, read every earnings transcript, follow the news, attend conferences, and form a view. The bottleneck is time. There is no version of this job where one person covers a hundred companies with real depth. There just aren’t enough hours.

Dan’s bet is that you can get around this bottleneck by turning each piece of the workflow into an AI agent — basically, a small specialized assistant that does one job well — and then having a head agent coordinate them. He calls the head agent “Wags.” Wags is, in his words, an “agentic director of research.” When Dan wants to start covering a new company, he tells Wags to “initiate coverage,” and Wags farms out work to its subordinates.

Here’s the cast of agents he shows on screen:

  • One reads every earnings call transcript and summarizes it.
  • One grabs news digests.
  • One handles “events” — product launches, investor days, that sort of thing.
  • One writes a company brief.
  • One builds the actual financial model.

Each of these does a narrow, discrete task. And then there’s one special agent he calls the “company agent” that he deliberately keeps clean and empty. When everything is done, the company agent reads in all the outputs — the model, the transcripts, the briefs — and Dan can ask it questions about the company in plain English.

Why the clean one matters: this is the most technically interesting idea in the episode, so it's worth dwelling on. AI models have a limited “context window” — you can think of it as short-term memory, or the size of the desk they can work on. If the desk gets cluttered with notes about how you grabbed the data, how you scraped the filings, and which tools you tried, there's less room on the desk for the actual thinking. So Dan's trick is: let the worker agents get messy doing their jobs, then throw away their cluttered desks and hand a fresh, organized briefing to a brand-new agent whose only job is to answer questions. The cluttered memory doesn't survive. The useful output does.
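The pattern can be sketched in a few lines. This is a hedged illustration, not SemiAnalysis's actual code — the names (`WorkerResult`, `run_worker`, `initiate_coverage`) are hypothetical, and the workers here just simulate the messy tool use a real agent would accumulate:

```python
from dataclasses import dataclass, field

@dataclass
class WorkerResult:
    # The worker's polished output survives; its scratch context does not.
    output: str
    scratch_context: list = field(default_factory=list)  # tool calls, retries, etc.

def run_worker(task: str) -> WorkerResult:
    # A real worker would call an LLM with tools; here we simulate the mess.
    scratch = [f"tried tool A for {task}", f"retried with tool B for {task}"]
    return WorkerResult(output=f"summary of {task}", scratch_context=scratch)

def initiate_coverage(company: str, tasks: list[str]) -> list[str]:
    # Run each narrow worker, then throw away its cluttered context:
    # only the finished outputs are kept for the clean "company agent".
    return [run_worker(f"{company}: {t}").output for t in tasks]

# The company agent starts with a fresh, empty context and reads only the
# finished outputs -- none of the workers' process noise.
company_context = initiate_coverage("AOI", ["earnings calls", "news", "model"])
print(company_context)
```

The point of the design is the last line: the clean agent's context contains answers, not the trail of how they were produced.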

What it actually costs

Dan runs a live example during the podcast. He asks Wags to initiate coverage on AOI — an optical transceiver company — which means reading transcripts, pulling filings, building a rough model, and answering a question about a big customer contract. Five minutes of real time. Total cost: $3.52 in tokens. He estimates the full research push on a company is maybe ten or fifteen dollars of compute.

Compare that to what it would cost to have a human analyst spend a day on the same task. The math speaks for itself. He says the same thing several times in different words: he'd still do this at twenty dollars, at fifty, at a hundred. It's not close.

This cost me $3.52… it’s either this or I have an analyst spend like a day on it.

This is also, he argues, the explanation for why GPU rental prices have been inflecting upward. The economics of turning an analyst-day into ten dollars of compute are so lopsided that demand is basically unlimited at current prices.

The conference problem, and Claudia

One of the funnier specific examples: SemiAnalysts attend fifty to sixty industry conferences a year. Jordan, the host, admits he went to GTC — the biggest AI chip conference of the year, run by NVIDIA — and attended exactly zero of the 830 technical sessions. He was in meetings the whole time. Dan’s agent “Claudia” (their “conferences chief”) solves this by transcribing every talk from every conference, indexing everything, and letting Dan query for specific topics. When a new analyst joins — this week, someone named Julian — Dan can tell Claudia “brief Julian on ScalaCross,” and Claudia hands back a reading packet.

Jordan captures the workflow in a line that sounds like a joke but isn’t:

So the agent is going to brief an analyst to brief you is the workflow? Yes.

Where it still breaks

Dan is refreshingly blunt about where this fails. His current pain point is the balance sheet. Financial models are supposed to balance — assets equal liabilities plus equity; it's a rule of the universe, or at least a rule of accounting. His agent sometimes produces models where they don't. His human analysts would never make this mistake. When an analyst learns something — don't mix FactSet data with SEC filings for this line item, the footnotes on depreciation are misleading for this industry — they remember it forever. An agent forgets the moment its context window closes. So a huge part of the work Dan is actually doing right now is building a “supervisory” layer into Wags that checks for known failure modes every time a new model gets built. He compares it to whipping an analyst team into shape, except this analyst team has no long-term memory.
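A supervisory check for this particular failure mode is simple to picture. The sketch below is illustrative — the function name and tolerance are assumptions, not Dan's actual implementation — but it shows the shape of the validation layer: a hard accounting identity that every freshly built model must pass:

```python
def balance_sheet_check(assets: float, liabilities: float, equity: float,
                        tol: float = 0.01) -> tuple[bool, float]:
    """Assets must equal liabilities plus equity; return (ok, gap)."""
    gap = assets - (liabilities + equity)
    return abs(gap) <= tol, gap

# A model the agent built correctly:
ok, gap = balance_sheet_check(500.0, 320.0, 180.0)
print(ok)  # True

# The known failure mode -- a model that silently doesn't balance:
ok, gap = balance_sheet_check(500.0, 320.0, 175.0)
print(ok, gap)  # False 5.0
```

The interesting part isn't the arithmetic; it's that the check has to run automatically on every build, because the agent that made the mistake won't remember having made it.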

Sam Harshe — a new hire calling in from San Francisco — pushes back on the “it’s like training an analyst” framing, and he’s right to:

It’s not necessarily just like teaching an analyst team because that type of mistake is not one that an analyst would ever make. And they don’t forget.

Memory, not intelligence, is the bottleneck. You can’t have an endless memory file — the context fills up — but everything the agent learns has to live somewhere durable or it evaporates between sessions. This is one of the genuine hard problems in agent design right now, and Dan basically admits they haven’t solved it.
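One common response to this problem is a durable "lessons learned" file that survives between sessions and gets re-injected into each fresh context. The sketch below is a minimal, hypothetical version of that idea (the `LessonStore` name and file format are mine, not anything from the episode):

```python
import json
import pathlib
import tempfile

class LessonStore:
    # Durable agent memory: lessons persist on disk and are re-injected
    # into each fresh context window. An illustrative sketch only.
    def __init__(self, path: pathlib.Path):
        self.path = path
        self.lessons = json.loads(path.read_text()) if path.exists() else []

    def add(self, lesson: str) -> None:
        if lesson not in self.lessons:  # dedupe repeated lessons
            self.lessons.append(lesson)
            self.path.write_text(json.dumps(self.lessons))

    def as_prompt(self) -> str:
        # Prepended to the system prompt at the start of every session.
        return "\n".join(f"- {l}" for l in self.lessons)

# Session 1: the supervisor records a hard-won lesson, then the context closes.
store_path = pathlib.Path(tempfile.mkdtemp()) / "lessons.json"
LessonStore(store_path).add("Don't mix FactSet data with SEC filings for revenue lines.")

# Session 2: a brand-new agent context starts with the lesson already loaded.
print(LessonStore(store_path).as_prompt())
```

The unsolved part Dan admits to is upstream of this: deciding *which* lessons are worth persisting, and keeping the file small enough that it doesn't itself clutter the desk.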

The bigger question — who else is even trying this

The back half of the episode drifts into a discussion of adoption. Sam makes the point that these tools are now good enough that he’d be comfortable showing them to his parents or grandparents. A year ago you needed test-driven development, careful context management, scaffolded project structure. Now it mostly just works if you tell it what you want.

And yet. Dan talks to a lot of fund managers, and most of them are using Claude Code for tiny one-off tasks — summarize this, rewrite that. They’re not building systems that compound. Most Fortune 500 companies haven’t even got IT approval to use these tools at all. Sam puts a sharp point on the disconnect:

The bottleneck is not actually the quality of the output. For some reason other than the fact that it’s not good enough, all these people at the Fortune 500 companies are not using them. It is good enough. It makes me five times as productive.

Jordan adds a theory that's worth sitting with. At SemiAnalysis, getting 50% more efficient means producing 50% more research, which directly translates to more subscriptions sold. The feedback loop is tight. At most large companies, the person using AI is six layers removed from the customer — so if they get 50% more efficient, they probably just go home at 3pm. Productivity gains for one worker don't translate into revenue gains for the firm, so nobody at the top has a reason to push adoption hard. This is an underrated reason to expect slow enterprise uptake even though the tools are ready.

The Mythos discussion, and cybersecurity

They briefly discuss a recent Anthropic release called “Mythos” and its claimed improvements in cybersecurity — finding zero-day vulnerabilities in things like the Linux kernel and NFS drivers. A zero-day, for the uninitiated, is a security flaw that nobody has discovered yet, which makes it both valuable (defenders can fix it) and dangerous (attackers can use it). Anthropic’s marketing has been leaning hard on this, which Sam notes is a strange move if your goal is to prevent bad guys from using your model that way.

Their collective theory is roughly: the model isn’t necessarily smarter at security per se. What it’s better at is long-range persistent work — not giving up after three attempts, not telling you to “just open a PR with the maintainers.” If you believe this framing, then other models with better harnesses (the surrounding software that tells the model how to act) could probably do similar things. The gap isn’t in the weights. It’s in the scaffolding.
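The "scaffolding, not weights" claim can be made concrete with a toy harness. Everything below is hypothetical — `attempt_fn` is a stand-in for a single model call — but it captures the behavior they're describing: a loop that keeps the model working instead of letting it quit after a few tries:

```python
def persistent_harness(attempt_fn, max_attempts: int = 50):
    # Long-range persistence: retry until success or the budget runs out,
    # rather than giving up (or telling the user to "open a PR") early.
    for attempt in range(1, max_attempts + 1):
        result = attempt_fn(attempt)
        if result is not None:
            return result, attempt
    return None, max_attempts

# Simulate a task that only succeeds on the 37th try.
flaky = lambda n: "vuln found" if n == 37 else None
result, tries = persistent_harness(flaky)
print(result, tries)  # vuln found 37
```

If the podcast's framing is right, this outer loop — plus tooling and checkpointing around it — is where much of the capability gap between models actually lives.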

The switching question

The episode ends on a question Jordan throws back to Dan: what would it take for you to switch off Claude entirely? Dan doesn’t really have an answer beyond “it would be a pain.” All his knowledge files, all his agent configurations, all the institutional tuning — would have to be rebuilt. The moat, they agree, isn’t raw model quality anymore. It’s the ecosystem, the habits, the user’s accumulated history with a specific tool. Switching costs are starting to feel real.

Also, apparently the lead time for a Mac Mini in the US is three months because everyone’s buying them to run Claude Code on.

Claude’s Take

A few things to call out here.

First, the stuff that’s solid. The specific workflow Dan describes — a supervisor agent plus narrow worker agents, with a clean “reader” agent kept free of process noise — is a real and clever architectural pattern. The context-management problem is a genuine one, and the “keep one agent pristine” trick is a good response to it. The cost-per-initiation number ($3.52 for an AOI coverage push, ten to fifteen dollars total) is specific and falsifiable, and if it holds up it is a real data point about how cheap research work is becoming. The observation that memory, not reasoning, is the current bottleneck is correct and underdiscussed in the broader AI commentary.

Second, the stuff that’s more speculative. The “AI catches up and humans can’t counterpunch” framing for GPU kernel authoring is a nice line, but it’s offered without evidence — they’re in the ballpark of truth, probably, but “GPU mode leaderboards” aren’t the same thing as production kernel development, and they know it. The “token mogging” framing is fun marketing, but tokens-per-employee is an input metric, not an output one. Burning more tokens doesn’t mean doing more work; it means doing more compute. SemiAnalysis might genuinely be more productive than Meta on a per-person basis; they also might just be more wasteful. The episode assumes the first without quite proving it.

Third, the thing they get almost right but don’t fully land. Jordan’s observation about why Fortune 500 adoption is slow — that productivity gains don’t flow to the firm unless the worker is close to revenue — is the sharpest point in the episode. But it implies something uncomfortable that they don’t quite say: if the adoption curve is shaped by incentive structures rather than tool quality, then a lot of the bullish “everyone will be using this in six months” talk is wrong. Tools don’t propagate through companies just because they work. They propagate when someone with budget authority has a reason to force the change. That reason doesn’t exist at most large enterprises, and it’s not clear what would create it.

Finally, the weakest moment is the throwaway discussion of Meta’s new model (“avocado”). They note that it performs well, that Meta seems surprised by its own success, and that Meta has the compute and team to keep pushing. This is also the kind of take that requires no commitment — a four-horse race is a safe prediction when there are four horses. Nothing wrong with it, but nothing particularly load-bearing either.

Overall: the genuinely novel content is the concrete agent architecture Dan walks through, and the specific cost numbers. Most of the adoption-and-moat discussion is reasonable but not new. Worth watching for the first half; the second half is mostly industry chatter.