Google I/O 2026, Karpathy Joins Anthropic, and Cerebras' $95B IPO | EP #256
ELI5/TLDR
Google had its big annual show, and after a year of people calling it dead, it’s not dead: it’s spending about $185 billion on AI this year and serving Gemini to 900 million people. Most of the new stuff it announced is solid but unsurprising; Google is winning by integrating everything, not by being the smartest model. Meanwhile, Andrej Karpathy joined Anthropic, Elon lost his lawsuit against OpenAI, and Cerebras, a company that builds a chip the size of a dinner plate, went public at a $95 billion valuation. Its CEO Andrew Feldman explains why making one giant chip beats stitching together thousands of small ones.
The Full Story
The Google comeback nobody saw coming
A year and a half ago, the consensus take was that Google was cooked. Search was about to be eaten by ChatGPT. The ad business was a dead man walking. Then Sundar Pichai opens his keynote with numbers that are hard to parse without going numb. Two years ago, Google was processing 9.7 trillion tokens a month. Now it’s processing 3.2 quadrillion — a 330x jump. The Gemini app has 900 million monthly users, more than doubling in a year. Thirteen Google products have over a billion users each. Five have over three billion.
The capex number is the one to sit with. In 2022, Google spent $31 billion building data centers and buying chips. This year, it’s spending around $185 billion. That is roughly six times as much.
If you said five years ago, hey, Google’s going to 6x its capex and the stock will go up. Nobody in their right mind would have said that’s even possible.
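The multiples above are easy to sanity-check. A quick sketch in Python, using only the figures quoted in this section (the variable names are mine):

```python
# Figures as quoted in the keynote recap above.
tokens_then = 9.7e12   # tokens per month, two years ago
tokens_now = 3.2e15    # tokens per month, now
capex_2022 = 31e9      # USD spent in 2022
capex_now = 185e9      # USD this year (approximate)

token_multiple = tokens_now / tokens_then
capex_multiple = capex_now / capex_2022

print(f"token growth: about {token_multiple:.0f}x")
print(f"capex growth: about {capex_multiple:.1f}x")
```

Both come out where the keynote put them: roughly 330x on tokens and roughly 6x on capex.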
The way Alex Wissner-Gross frames it is that Google had no choice. If they hadn’t leaned into AI at every layer — chips, data centers, models, products — the old search-and-ads business would have been eaten alive. So they cannibalized themselves before someone else could. The reward is that they’re now the only company in the world running the full stack: their own TPUs, their own foundation models, their own consumer products with billions of users to push them through.
Gemini Omni and the modality bet
The flashy demo of the day was Gemini Omni — a model family that takes text, images, video, audio in, and creates any of those out. The standout clip was a claymation explainer of protein folding generated from one prompt. Demis Hassabis, on stage, morphed his selfie through different outfits and backdrops in real time.
The strategic read here is that Google DeepMind is the last American frontier lab still chasing what they call multimodality — handling many different kinds of input and output in one model. OpenAI quietly demoted Sora and stepped back from video. Anthropic has always been laser-focused on code. That leaves Google as the only Western lab still betting that the path to something like super intelligence runs through every modality at once — not just text and code, but video, audio, maybe even biological sequences like DNA and proteins treated as just another modality the model has to learn.
It’s at this point almost an idiosyncratic bet that they’re going to get to some form of super intelligence that’s distinguishable because it handles all these different modalities.
The Chinese frontier labs are leaning hard into video too. The American labs, with the exception of Google, are not.
Gemini 3.5 Flash — solidly mid
Google also launched Gemini 3.5 Flash, the new default for the Gemini app and AI search. It’s fast. It’s four times faster than other frontier models on raw throughput. It does better than the previous Gemini Pro across most benchmarks. But it’s not actually competitive with the top tier — GPT 5.5 or the latest Anthropic models — on pure capability.
Alex’s read: the strategy was throughput-maxing and tool-use-maxing, not pushing the frontier. Google picked the most flattering possible chart — intelligence versus output speed — instead of raw capability. The real Pro model, the one meant to actually compete with frontier rivals, is still a month out.
Dave’s add: the scuttlebutt in Silicon Valley is that the real two-horse race is between OpenAI and Anthropic, and the AI talent is flooding into those two buildings in San Francisco. Google’s war chest is enormous, but OpenAI just raised $120 billion in cash and Anthropic is on a similar trajectory. The funding gap isn’t what it used to be.
Antigravity 2.0, Gemini Spark, the universal cart: copycat city
The next product announcements followed a pattern. Antigravity 2.0 is Google’s coding agent, basically a Windsurf rebrand (Google had hack-acquired the Windsurf team) following Cursor’s lead in moving from a code editor to an agent-orchestration interface. Useful, but a fast-follow.
Gemini Spark is Google’s answer to OpenAI’s “Open Claw” — a 24/7 personal agent that lives on a GCP VM and integrates across Gmail, Sheets, Docs, YouTube, Search. The implementation is safe and unambitious, but the integration is the point.
If Google says, okay, Gemini Spark is going to be one click away from a Google search, completely integrated, a massive fraction of the world is just going to click the button.
The universal shopping cart is the most direct shot at Amazon — add things to your cart from YouTube, Search, Gemini, or Gmail, and the cart hunts for deals and price drops in the background. Whether Amazon ever complies with the standard is another question. (Dave’s guess: no.)
The bigger pattern is that Google is reinventing search itself. The search box is no longer a rectangle you type into — it expands as you ask, suggests follow-ups, can launch persistent agents that keep scanning the web for apartments or products matching your criteria. Alex’s read on why this took so long: people kept assuming the obstacle was business risk to the ads engine. The real obstacle was technical — generative models were too slow and expensive to fit inside the latency-and-cost budget of a search query. The whole emphasis on throughput in 3.5 Flash is partly Google solving its own dog-food problem.
Watermarking, audio glasses, and the consumer hardware miss
SynthID, Google’s invisible watermark for AI-generated images and video, has now been adopted by OpenAI, Nvidia, Cacao, and Lean Labs. The frame to hold this in: the whole panic about not being able to tell what’s real and what’s AI is being solved from the synthetic side, not the camera side. The companies that generate the content are stamping it, and the cameras will eventually follow.
We may end up at a point where authenticity is more valuable than creativity.
The other moonshot mate, Salim Ismail, frames it as moving from the information age to the verification age. Trust becomes the new scarcity, and trust becomes infrastructure.
The hardware reveal was Google’s new audio glasses — built with Samsung, Warby Parker, and Gentle Monster. No display. Cameras and a voice in your ear, all day. The honest take: Google should have owned this category. They had Google Glass in 2013. They abandoned it. Meta has been iterating for years and now runs away with the smart-glasses market. Google is playing catch-up with a stripped-down audio-only version because the AR display tech still isn’t there. The form factor is workable. The fact that you’ll be walking around with a voice whispering to you all day while cameras record everything is going to start some fights, in both senses.
Karpathy joins Anthropic
The other big news that dropped on the same day, quite possibly as counter-programming to Google I/O, is that Andrej Karpathy left independent research to join Anthropic. Karpathy is one of the most influential figures in the field. He co-founded OpenAI, ran Tesla’s Autopilot for Elon, returned to OpenAI, then founded an education startup called Eureka Labs.
In a podcast clip recorded just before the announcement, he basically explained his own decision out loud:
If you’re outside of the frontier lab, your judgment fundamentally will start to drift because you’re not part of what’s coming down the line.
The point applies more broadly. Andrew Feldman, the Cerebras CEO, picks up on it for hardware:
If you’re not building hardware for, or engaged at a fundamental level with, one of the three most important labs — Google, Anthropic, OpenAI — you are not seeing what they’re thinking. Your hardware will drift from what they need.
Karpathy’s specific job at Anthropic is using Claude to accelerate Claude’s own pre-training research. Which is exactly the recursive self-improvement loop that everyone’s been describing as the inflection point. The fact that he picked Anthropic over OpenAI or Google is interesting on its own.
Elon loses
A federal jury rejected Elon Musk’s lawsuit against OpenAI in two hours of deliberation. The basis: he waited too long, missed the statute of limitations. He’ll appeal. The appeal will almost certainly fail because it’s a factual ruling and courts rarely overturn those. Andrew Feldman, who knows both Sam Altman and Elon, was almost angry that the whole thing happened:
What billionaires in pissing matches interests me not at all. I just want sort of these guys doing what they’re the best in the world at, which is building stuff.
The Cerebras story — a chip the size of a dinner plate
The second half of the episode is a long interview with Andrew Feldman, who just took Cerebras public at $95 billion — the biggest US tech IPO since Uber in 2019. The company makes the wafer-scale engine: a single chip 58 times larger than any chip ever built before. Imagine a normal computer chip, and now imagine one the size of a small pizza. That’s what they’re shipping.
Feldman’s origin story is worth understanding because it explains why Cerebras matters. He and his co-founders sold their previous startup to AMD in 2012. By 2015 they were looking at AI and made two big contrarian bets:
- AI was going to need its own dedicated silicon, just like graphics needed GPUs and mobile needed ARM.
- The right way to build that silicon was not as a slightly-better GPU. Start with a clean sheet.
The specific technical bet was around memory bandwidth: how fast you can move data from memory to compute. There are two kinds of memory: DRAM, which is slow but stores a lot per square millimeter, and SRAM, which is fast but stores very little per square millimeter. The Cerebras insight: if you make the chip massive, dinner-plate-sized, you can pack it with so much SRAM that the storage problem solves itself. You get speed and capacity at once.
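One way to see why memory bandwidth is the thing to bet on: when a model generates text for a single user, every new token requires reading the model’s active weights from memory, so bandwidth puts a ceiling on tokens per second. The sketch below is a standard back-of-envelope approximation, not anything Feldman presented. The function name and all three numbers are hypothetical placeholders, not vendor specs.

```python
def max_tokens_per_second(bandwidth_bytes_per_s: float,
                          bytes_read_per_token: float) -> float:
    """Upper bound on single-stream decode speed when memory traffic
    (not arithmetic) is the bottleneck."""
    if bytes_read_per_token <= 0:
        raise ValueError("bytes_read_per_token must be positive")
    return bandwidth_bytes_per_s / bytes_read_per_token


# Hypothetical placeholder values, chosen only to show the scaling.
weights_read_per_token = 100e9   # 100 GB of weights touched per token
slow_memory_bandwidth = 2e12     # 2 TB/s, a stand-in for off-chip DRAM
fast_memory_bandwidth = 2e14     # 200 TB/s, a stand-in for on-chip SRAM

slow_ceiling = max_tokens_per_second(slow_memory_bandwidth, weights_read_per_token)
fast_ceiling = max_tokens_per_second(fast_memory_bandwidth, weights_read_per_token)

print(f"slow memory ceiling: {slow_ceiling:.0f} tokens/s")
print(f"fast memory ceiling: {fast_ceiling:.0f} tokens/s")
```

The ceiling scales linearly with bandwidth, which is the whole argument for filling a chip with the fast kind of memory.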
The catch is that nobody in the 75-year history of the computer industry had ever successfully built a chip that big. The legendary computer architect Gene Amdahl had failed at it decades earlier. Other companies came to Cerebras’ labs, saw the solution, went home and tried to build it, and also failed.
Cerebras solved it in August 2019. And then nobody cared.
We thought everybody would rush to our door. And the world didn’t care one bit. The world was utterly indifferent.
First generation: 12 systems sold. Second generation: around 350. Third generation: thousands. The world didn’t catch up until late 2024, when models finally got smart enough that inference — actually using the model, as opposed to training it — became the dominant workload. Cerebras chips run inference 15 to 20 times faster than a GPU. Once the models got useful, demand exploded. In December 2025, they signed a $20 billion deal with OpenAI. In March 2026, a term sheet with AWS.
Why one big chip beats thousands of small ones
Alex asked the obvious question: today’s frontier models can have ten trillion parameters. The Cerebras chip has 40-50 gigabytes of SRAM. How does that math work?
Feldman’s answer: every chip architecture has to split big models across multiple chips. Even GPUs and TPUs do this. The question is how clean the splits are. On a GPU, you have to do something called tensor model parallelism — the giant matrix multiplications inside the attention head literally don’t fit on one chip, so you have to cut them up and the chips have to constantly talk to each other.
On Cerebras, the matrix multiplications fit on one chip. You only have to split when a model crosses layer boundaries, which is a much smaller, more contained communication problem. The data you have to move between chips is a tiny results vector, not the whole computation. You pay maybe a 2% performance penalty per hop.
Compare that to Groq (the competing SRAM-based chip Nvidia just acquired), which has only 800 square millimeters of silicon per chip. To run a trillion-parameter model on Groq, you have to split it across two or three thousand chips, and each hop hurts performance.
While we have to do a few hops, they have to do thousands.
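To get a feel for why chip size changes the hop count, here is a rough back-of-envelope sketch. It is not Cerebras’ or Groq’s actual sizing method. The assumptions are mine: weights only (no KV cache or activations), a 1-trillion-parameter model stored at a hypothetical 4 bits per parameter, 44 GB of SRAM per wafer (inside the 40-50 GB range quoted above), a hypothetical 0.23 GB of SRAM per small chip, and the roughly 2% per-hop penalty treated as compounding.

```python
import math


def chips_needed(model_bytes: float, sram_bytes_per_chip: float) -> int:
    """Minimum number of chips whose combined SRAM holds the weights."""
    return math.ceil(model_bytes / sram_bytes_per_chip)


def throughput_after_hops(hops: int, penalty_per_hop: float = 0.02) -> float:
    """Fraction of ideal throughput left if every hop costs a fixed
    fraction and the costs compound (an assumption, see lead-in)."""
    return (1.0 - penalty_per_hop) ** hops


# Assumed model size: 1e12 parameters at 0.5 bytes each (4-bit weights).
model_bytes = 1e12 * 0.5

wafers = chips_needed(model_bytes, 44e9)          # wafer-scale chip
small_chips = chips_needed(model_bytes, 0.23e9)   # hypothetical small chip

wafer_hops = wafers - 1
small_chip_hops = small_chips - 1
wafer_efficiency = throughput_after_hops(wafer_hops)

print(f"wafers: {wafers}, hops: {wafer_hops}, "
      f"throughput kept: {wafer_efficiency:.0%}")
print(f"small chips: {small_chips}, hops: {small_chip_hops}")
```

Under these assumptions the model needs about a dozen wafers versus a couple of thousand small chips, which lines up with the “few hops versus thousands” contrast. The sketch deliberately does not apply the 2% figure to the small-chip case, since the episode only says each hop hurts there.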
The receipt: Cerebras just posted numbers on Kimi K2 (an open-source trillion-parameter model) running at about 1,000 tokens per second. A really good GPU shop like Fireworks runs it at about 70. That is a gap of roughly 14x to 15x.
The Nvidia sleight of hand
A useful technical tangent: when chip vendors quote tokens-per-second numbers, they don’t always tell you whether they mean per user or aggregate. Nvidia in particular has been good at this sleight of hand.
The GPU is an extraordinarily good machine at generating slow tokens. An NVL72 at 35 tokens per second per user, which is painfully slow, can generate millions of tokens aggregate. On the other hand, if you ask it to generate at 200 tokens per second per user, it can support one or two users. That’s a $4 million solution working on one user.
This is why throughput-versus-latency tradeoffs matter. If you want a single user to feel a fast response, you need the chip to be fast for that one user — not just to push lots of tokens aggregate across many slow users.
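The per-user versus aggregate distinction is just multiplication, but it is worth writing down because the two headline numbers point in opposite directions. A minimal sketch, using the figures from Feldman’s quote where they exist. The 30,000 concurrent users in the slow case is a hypothetical number picked only to land in “millions of tokens aggregate” territory, and the function names are mine.

```python
def aggregate_tokens_per_second(per_user_tps: float, concurrent_users: int) -> float:
    """Total tokens/s the system emits across all users."""
    return per_user_tps * concurrent_users


def cost_per_concurrent_user(system_cost_usd: float, concurrent_users: int) -> float:
    """Hardware cost attributable to each simultaneously served user."""
    return system_cost_usd / concurrent_users


system_cost = 4_000_000  # USD, the "$4 million solution" from the quote

# Slow mode: 35 tokens/s per user, many users (user count is hypothetical).
slow_aggregate = aggregate_tokens_per_second(35, 30_000)
slow_cost = cost_per_concurrent_user(system_cost, 30_000)

# Fast mode: 200 tokens/s per user, two users (upper end of "one or two").
fast_aggregate = aggregate_tokens_per_second(200, 2)
fast_cost = cost_per_concurrent_user(system_cost, 2)

print(f"slow mode: {slow_aggregate:,.0f} tokens/s total, ${slow_cost:,.0f} per user")
print(f"fast mode: {fast_aggregate:,.0f} tokens/s total, ${fast_cost:,.0f} per user")
```

A vendor can truthfully quote the first aggregate number and the second per-user number on the same slide, even though no single configuration delivers both. When reading a benchmark, the question to ask is which of the two is being reported.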
Fabs are pyramids
Feldman’s view on Elon’s announced “Terabit Fab” — a US chip fab project meant to outproduce TSMC by 50x — is interesting because he wants Elon to succeed but is sober about the difficulty.
Fabs are pyramids. They are our pyramids. TSMC is the greatest manufacturing company on Earth. These things take five years to build, six years to build, and $40 to $50 billion from the people who built the last one.
He puts Terabit Fab at 15-20 years to actually deliver on its full promise, not 5-10. He’s not saying Elon can’t do it — he’s saying everything Elon does takes longer and costs more than projected, and chip fab manufacturing is one of the genuinely hardest things humans do.
The deeper point: when the US stopped caring about chip manufacturing in the 1990s, we didn’t just lose the fabs. We lost the surrounding ecosystem — the packaging companies, the material scientists, the process engineers, the deposition engineering. All of it sits in Taipei, Korea, and Japan now. Bringing it back is a multi-decade commitment, not a five-year one.
The pressure test on your soul
Towards the end, Feldman shifts from the technical to the human. Eighteen months of board meetings every six weeks, burning $8 million a month, and still unable to solve their core problem. $100 million in the hole. Then $120. Then $140. Still no solution.
This will kill you if you can’t modulate the highs and the lows. For every entrepreneur, every CEO, this is a pressure test on your soul. The number of times you can get kicked in the gut before lunchtime and have it still be a good day is amazing.
When asked if he’d rather do something else: “No, this is all I know how to do. I’m a professional David in the battle with Goliath.”
Why money can’t buy a lead in AI
One of the audience questions was: why can’t Elon or Zuck just buy their way to the front of AI by paying for the best talent? Feldman’s answer was sharp.
Why couldn’t Intel build a cell phone processor? At the time they had the best fabs. They had the best computer architects. And they destroyed tens of billions of shareholder dollars failing. Same with AMD. In our industry, money and the acquisition of talent isn’t enough. Why doesn’t the team with the biggest money win every year in the NFL?
There’s something about culture and DNA in successful organizations that’s very hard to articulate and even harder to buy. Talent is necessary but not sufficient.
He added a related point about luck:
Luck is not equally distributed to those who work hard and those who don’t. Extremely hardworking people with tremendous grit end up more lucky. Life is really hardworking people over long periods of time who have integrity and ethics — they get lucky more often.
China and orbital data centers
Two final beats. First, on whether China can sell cheap inference tokens to undercut American providers: it can’t, because it’s starved of leading-edge compute. ASML won’t sell it the machines. What China does have, in abundance, is power infrastructure. The Chinese grid is newer, better, and built for the load AI is demanding. The American grid was built in the 1950s for a different world, and political dysfunction at the municipal, state, and federal level makes upgrading it slow.
On putting data centers in space — Alex pointed out that Cerebras’ chip might actually be a perfect fit, because it’s already designed to be fault-tolerant at wafer scale (you have to be able to shut down individual cores and route around them), and space is full of cosmic rays causing single-bit errors. Feldman agrees but thinks production data centers in orbit are 7-10 years out. The hard part isn’t the chips; it’s the cluster communication and software orchestration.
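The “shut down individual cores and route around them” idea can be illustrated with a toy remapping. This is a deliberately simplified sketch, not Cerebras’ actual redundancy scheme: it treats cores as a flat numbered list (a real wafer is a 2D mesh with its own routing), and `build_core_map`, the core counts, and the defect set are all hypothetical.

```python
def build_core_map(physical_cores: int,
                   defective: set[int],
                   logical_cores: int) -> list[int]:
    """Map logical core i to a healthy physical core, skipping defects.

    Spare capacity is simply the physical cores beyond what the
    software needs; a defect consumes one spare.
    """
    healthy = [core for core in range(physical_cores) if core not in defective]
    if len(healthy) < logical_cores:
        raise ValueError("not enough healthy cores to satisfy the request")
    return healthy[:logical_cores]


# Hypothetical example: 10 physical cores, 2 knocked out, software sees 8.
core_map = build_core_map(physical_cores=10, defective={2, 7}, logical_cores=8)
print(core_map)
```

Software addresses logical cores 0 through 7 and never sees the gaps. The same trick that absorbs manufacturing defects is what would, in principle, absorb radiation damage in orbit, provided enough spares remain.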
Key Takeaways
- Google’s capex went from $31B in 2022 to ~$185B in 2026 — a 6x increase that nobody five years ago thought possible without crashing the stock.
- Gemini app has 900M monthly users. NotebookLM has been used to create 1.5B notebooks. 13 Google products have >1B users.
- The “Google was cooked” narrative aged badly. Their integration advantage — owning chips, models, and consumer products at scale — turned out to be more durable than any single best-model lead.
- Google DeepMind is the last American frontier lab still betting on full multimodality. OpenAI and Anthropic have effectively walked away from video. Chinese labs are leaning in.
- The real two-horse race for frontier capability is OpenAI vs Anthropic. Google is solidly tier 1.5 and competing on integration rather than raw smarts.
- SynthID watermarking adopted by OpenAI, Nvidia, Cacao, and Lean Labs. End-to-end provenance is coming from the synthetic generation side first, not the camera capture side.
- Salim Ismail’s framing: “scarcity = abundance − trust.” As intelligence becomes abundant, trust and verification become the new scarce resource.
- Karpathy’s reason for joining a frontier lab: if you’re outside one, your judgment drifts because you’re not connected to what’s actually being built. Same logic Feldman applies to AI hardware design.
- Karpathy joined Anthropic specifically to use Claude to accelerate Claude’s own pre-training — a textbook recursive self-improvement loop.
- Cerebras’ wafer-scale engine: a single chip 58x larger than anything built before, packed with 40-50 GB of SRAM. Solved a problem (chip-size scaling) seven years before TSMC ran into the same coefficient-of-thermal-expansion issue with the Nvidia B200.
- The two contrarian bets that made Cerebras: (1) AI needs dedicated silicon, not a derivative GPU; (2) memory bandwidth, not compute, is the binding constraint — and the way to solve it is to use lots of SRAM, which requires a huge chip.
- Inference is 80-90% of the AI compute market today. Cerebras runs inference 15-20x faster than GPUs.
- Cerebras on Kimi K2 (1T parameter open-source model): ~1,000 tokens/second versus ~70 from a best-in-class GPU shop. ~15x.
- The Nvidia sleight of hand to watch for: tokens-per-second-per-user vs aggregate throughput. GPUs are great at high aggregate throughput with slow per-user response. A “fast” GPU at 200 tokens/sec/user can support 1-2 users on a $4M box.
- Why splitting big models across many small chips hurts: every hop between chips costs latency. Cerebras: a few hops. Groq (small SRAM chips): thousands of hops.
- Feldman puts Elon’s “Terabit Fab” project at 15-20 years to deliver, not 5-10. Fabs take ~$40-50B and 5-6 years even for TSMC. Local fire ordinances have forced Samsung to redesign their Texas fab mid-build.
- The US lost not just chip fabs but the entire surrounding ecosystem — packaging, material science, deposition engineering. All in Taipei, Korea, Japan now. Bringing it back is a decades-long project.
- China can’t undercut US AI providers with cheap tokens — they don’t have access to leading-edge ASML machines and frontier compute. What they do have is far better power grid infrastructure than the US.
- Why money can’t buy a lead in AI: same reason Intel couldn’t build a mobile chip despite having the best fabs and architects. Talent is necessary but not sufficient. Culture and DNA matter and can’t be acquired.
- Elon lost his OpenAI lawsuit in two hours. Federal jury ruled he missed the statute of limitations. Appeal will almost certainly fail because factual rulings are rarely overturned.
Claude’s Take
This is a competent industry-recap episode with one genuinely substantial conversation — the Andrew Feldman interview. The Google I/O coverage is solid but it’s mostly four guys reacting to demos. If you’ve already read the Google announcement coverage from a couple of newsletters, you can skip the first half without missing much.
The Feldman interview is the reason to listen. He’s articulate in a way that most CEOs aren’t, especially about the texture of why hard things are hard. The line about Intel having “the best fabs and the best architects” and still destroying tens of billions trying to build a mobile chip is the right answer to a question that gets asked a lot. The same applies to OpenAI’s competitors — money buys you a lot but it doesn’t buy you the mysterious thing that lets some organizations execute on hard problems and not others. He doesn’t pretend to know what that thing is, which is more honest than most.
The technical explanation of why wafer-scale beats stitching together small chips is also clean. The “every hop costs you” framing makes intuitive sense even if you’ve never thought about chip interconnects. And the Nvidia sleight-of-hand around tokens-per-second-per-user-vs-aggregate is a useful piece of fluency to carry into reading any chip vendor’s marketing.
The pod has a recurring problem where Diamandis and his cohost Dave Blundin are deep in the Silicon Valley ecosystem and assume the listener is too. They name-drop people the audience won’t know (Lip-Bu, Andy Bechtolsheim, Pierre Lamond), reference past pod conversations as if they were canonical, and slip into accelerationist cheerleading at moments where a tougher question would land harder. Alex Wissner-Gross is the one who pushes back consistently: calling Gemini 3.5 Flash “mid,” calling Gemini Spark “a lazy copycat product,” asking why NotebookLM is still its own brand. He’s the most useful voice on the panel because he’s the only one actually critical.
The Karpathy news is genuinely interesting and the framing — that you can’t do frontier AI research from outside a frontier lab — is something to file away. It applies to hardware (Feldman’s version) and probably to other domains too. The “judgment drifts” argument is a real one.
Score: 7. Solid summary if you’re tracking the AI industry beat. The Feldman interview alone is worth the time. Worth a careful skim of the first hour and full attention to the second.
Further Reading
- Andrew Feldman, “Cerebras Wafer-Scale Engine” — the original whitepapers from 2019 explaining the architecture
- Andrej Karpathy on No Priors podcast — the conversation where he basically announces his own move before announcing it
- Brad Garlinghouse, “The Peanut Butter Manifesto” (2006) — referenced by Salim, the original critique of spreading R&D too thin across too many products, written when he was at Yahoo
- Demis Hassabis, “AI for Science” talks — for the deeper context on root-node problems and Isomorphic Labs
- Salim Ismail, “Exponential Organizations” — the source of the “scarcity = abundance − trust” framing