Transcript: Class 2, MS&E 435, Economics of the AI Supercycle, Stanford University, Spring ’26

The premise of the class today: everybody knows how software ate the world. Software had near-zero incremental cost of distribution. That is not the case with AI. More users on AI apps require a lot more compute, so the incremental cost is not near zero.

That’s the topic of our discussion. We’re going to do a presentation by the group, then a fireside chat, and then we’ll open it up for questions.

So, without further ado, I’m really excited for today’s guests. Our first guest is Brad Gerstner. Brad is the founder and CEO of Altimeter. Brad started Altimeter with a few million dollars from friends and family. Today, 18 years later, Altimeter manages over $15 billion across public and private markets. You know, Brad, I’ve known you for a little bit, and the single consistent thing I’ve known about you is that the best investors have invested across supercycles, across up markets, down markets, recession, crisis, COVID, and Brad has done all of that and more. Brad trained as a lawyer, helped start General Catalyst back in the dot-com era, and started a couple of businesses after that, Altimeter being the fourth. At Altimeter, he was early in the internet, early to Google, early with mobile and in Meta and many others, early to cloud and software, led our investment in Snowflake, Confluent, and GitLab, and now with AI is one of the largest investors in OpenAI and Anthropic, which I know you guys from last week love, and in Nvidia. He was on the board of Cerebras and led our investment in Groq, which we’re going to get into deep today. And outside all of that, perhaps the most important movement that Brad has started is Invest America. Can I get a quick show of hands? How many of you have heard about Invest America?

Wow. Look at that. Yeah, look at that. Got a lot of opportunity, it looks like. That’s right. In brief, Invest America is federal legislation creating an investment account at the time of birth for every child born in America. The biggest impact Invest America is going to have, in my view, is moving people from dependence on the state to independence, and making every child in America an owner of our economy.

Brad, I have the great honor of calling you my mentor, coach, and partner. Thank you so much for doing it. Please join us.

It’s great to be here. Thanks for having me.

And, you know, a special thanks to Dr. Goell for greenlighting the class. I think it is a really important one. I’m lucky enough to have my son, a junior in high school, sitting here today. In a lot of schools today, in particular colleges on the East Coast as well, people don’t really know what to do with AI. And I say all the time, you’ve got to make yourself bionic with AI, right? You can’t consume enough AI today. It used to be you go to this school, you get a job with a place like Groq or Altimeter. Today I don’t really care where you went to school. I want somebody who shows up and delivers abnormal value, bionic value, right? And the way you do that is going to be leveraging the latest technology. So I’m glad that you are enabling the students to sit at the intersection of such important topics.

I’m going to introduce Sunny, then I’m going to share a couple slides, and then I’ll invite Sunny up. Sunny and I have been great friends for a long time. We’re going to play poker tonight in the All-In poker game here in Silicon Valley, so we’re buddies inside and outside of work. I was thinking about the introduction, and then I asked ChatGPT and Claude to give me an introduction. ChatGPT’s wasn’t great, to be perfectly honest, and Claude’s blew me away. So I figured I’d just read to you what Claude had to say about Sunny.

So, our next guest is a serial entrepreneur who apparently can’t stop getting acquired by bigger and bigger companies. And honestly, the trajectory is incredible. He co-founded Xtreme Labs, a mobile development shop acquired by Pivotal. Then he co-founded Autonomic, a smart mobility platform acquired by Ford, where he became the VP running Ford X, their internal innovation lab. Then he co-founded Definitive Intelligence, which was acquired by Groq, where he became president and helped launch GroqCloud. And then Nvidia, of course, recently bought the platform for $20 billion, their largest acquisition ever. So if you’re keeping score at home: Pivotal, Ford, Groq, Nvidia. The man’s career is basically a SPAC that only goes up, according to Claude, unlike Chamath’s. He has a computer engineering degree from the University of Ottawa, which proves that even Canadians can disrupt things when they put their minds to it. Please welcome Sunny Madra.

So I’d like to share a couple slides to set the context for the moment that we’re living in. Inference, the conversation we’re going to have today, is really a subset of this important conversation. This is global GDP per capita over the course of the last 2,000 years, right? If you look at that, you realize that basically for 1,800 years, nothing happened. It was survival, right? There was no excess productivity from a fixed amount of labor and capital. It was what we could use to survive. And then all of this stuff starts happening in the 1800s and 1900s. I like to say GDP is what creates the excess in life for enjoyment, right? It’s beyond survival. It’s the surplus that we all have. And so the number of years it takes to double GDP has plummeted. Now we’re doubling global GDP, or you might think of it as quality of life, every 25 years.

But you may say, “Well, Brad, what does GDP have to do with anything?” Well, it has to do with everything. What happens when you have higher rates of GDP? You have lower rates of poverty. You have higher rates of basic education. You have higher rates of literacy. You have more democracy, more freedom, higher rates of vaccination, lower child mortality. So it turns out that innovation in and of itself is a societal good, and it happens to be correlated and accelerating. Speaking as an investor, technology has gone from 5% of global GDP to about 13% of global GDP. And if I asked you guys whether, 10 years from now, we are going to be below 13% of global GDP or above 13% of global GDP, I think you would all say that technology as a percentage of global GDP is going to be a lot bigger number.

We’re sitting in the heart of Silicon Valley. Technology out-earns non-technology. The dotted blue line here is the NASDAQ, which has compounded earnings per share at 15% for the last 10 years, compared to 6% for non-tech companies. So why do technology companies tend to be better investments than non-tech companies? Because they compound their earnings per share faster. Again, I think that will be accelerated by AI. AI is going to massively accelerate all of this because when we look at all the knowledge work in the world, the TAM for it is measured in the trillions. Demis said it well: it’ll be 10x the impact of the Industrial Revolution but happening at 10x the speed, probably unfolding in a decade rather than a century.
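As a rough illustration of the compounding gap described above, here is a minimal Python sketch. The 15% and 6% rates and the 10-year horizon come from the talk; the function name is just for illustration, and the calculation assumes constant annual growth, which real earnings do not follow.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Multiple on starting earnings after compounding at annual_rate for `years`."""
    return (1.0 + annual_rate) ** years


# Rates quoted in the talk: 15% (NASDAQ) versus 6% (non-tech), over 10 years.
tech_multiple = growth_multiple(0.15, 10)
non_tech_multiple = growth_multiple(0.06, 10)

print(f"tech EPS multiple:     {tech_multiple:.2f}x")
print(f"non-tech EPS multiple: {non_tech_multiple:.2f}x")
print(f"relative gap:          {tech_multiple / non_tech_multiple:.2f}x")
```

Under those assumptions, a nine-point difference in annual growth turns into roughly a 4x versus 1.8x difference in earnings per share after a decade.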

So I think that is the context. What we’re doing here, and the acceleration that will come with AI, should be something that’s better for all of society. We’re going to have to talk about the guardrails and the societal change we’re going to have to make for that to be the case. But sitting at the very root of all of this is compute. You guys all know that the atomic unit of AI, or intelligence, is the token, right? And there’s nobody better able to talk about the production of this atomic unit than Sunny.

And so, Sunny, I want to go back in the wayback machine a little bit. Tell us what Groq is, right? And what your observations were in 2023 and ’24 about what was going to happen with inference.

Yeah, just a little bit of quick background. Groq was founded by Jonathan Ross. Jonathan Ross was the creator of the TPU at Google, and his background is interesting. He was a high school dropout, not because he couldn’t complete it but because it was probably too boring for him. He probably completed his GED or something and went straight into a PhD math program at NYU. Then he gets recruited into Google and, like every great engineer over the last 20 years, was made to work on ad optimization or ad testing, which is terrible in some ways. But he listened to a talk by Jeff Dean. Jeff Dean had come in and basically said, “Hey, good news, bad news. Good news: we think we’ve found an algorithm to solve automatic speech recognition, which could be useful in many places. Bad news: there’s not powerful enough compute, so we can never run it.” Jonathan took it upon himself to come up with a design and, coming from a completely different area, designed the first version of what became the TPU using an FPGA. Ultimately, Jonathan left Google because he thought the rest of the world should have this. It shouldn’t just be embedded inside Google. And so he left and started Groq.

So quickly, what Groq is, and continues to be inside Nvidia as well: it’s a chip designed with a dataflow architecture. What makes it significantly different from any other computer architecture is that it’s fully deterministic. Hand in hand with the architecture is a compiler, which predetermines where all the calculations are going to happen. That last bit is really important because the underlying thing in any AI problem and in token generation is lots and lots of math. And that’s why we’re seeing this compute explode.

And I implore everyone to go look at the following. We talked about one of Brad’s great investments, Snowflake. Snowflake is a database retrieval company, right? You have to go get a record and bring it back. If you look at the number of compute cycles it takes to do that, it’s not a really large amount. But if you look at the amount of compute it takes to generate a single token, it’s mind-blowing. The best way to think about it is that it’s usually the parameter size of the model times the context length squared, right? And that’s for each token, and you’re doing something that has lots and lots of tokens. So we’re in this era where we have this incredible technology, but it’s incredibly compute intensive, several orders of magnitude larger than any other computing paradigm we’ve had before.
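To put rough numbers on the per-token compute described above, here is a minimal Python sketch. It does not use the speaker’s shorthand directly. Instead it assumes a widely used rule of thumb for decoder-only transformers: about 2 FLOPs per parameter per generated token, plus an attention term that grows with the number of tokens already in context, so that the total over a whole sequence grows quadratically with its length. The model dimensions are hypothetical and chosen only for illustration.

```python
def forward_flops_per_token(n_params: int, n_layers: int, d_model: int,
                            context_len: int) -> int:
    """Approximate FLOPs to generate one token given context_len tokens in context."""
    dense = 2 * n_params                               # one multiply-add per weight
    attention = 2 * n_layers * d_model * context_len   # attending over existing context
    return dense + attention


def sequence_flops(n_params: int, n_layers: int, d_model: int, seq_len: int) -> int:
    """Total FLOPs to produce seq_len tokens one at a time."""
    return sum(
        forward_flops_per_token(n_params, n_layers, d_model, t)
        for t in range(1, seq_len + 1)
    )


# Hypothetical model shape (an assumption, not a figure from the talk).
N_PARAMS, N_LAYERS, D_MODEL = 70_000_000_000, 80, 8192

for ctx in (1_024, 8_192):
    flops = forward_flops_per_token(N_PARAMS, N_LAYERS, D_MODEL, ctx)
    print(f"context {ctx:>5}: ~{flops:.3e} FLOPs for the next token")
```

Even under this simplified model, a single generated token costs on the order of 10^11 floating-point operations, and the attention share grows as the context gets longer.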

You’re at your own startup, right? You have a conversation with Jonathan about merging. I was an investor in Cerebras, which is also building a fast inference chip. Groq was building a fast inference chip. These two companies had been in existence for upwards of 10 years, and the extraordinary thing is that in year nine they’re both fighting for survival. They’re not thriving, right? They’re building for a market that didn’t really exist. But you saw something. Jensen came on my podcast, BG2, and he said, “Everything just changed.” I said, “What do you mean?” He said, “Inference-time reasoning. We’ve gone from pre-training models to inference-time reasoning, and inference is about to go up a billion x. Not 10x, not 100x, not a million x. It’s going to a billion x. And our systems of compute are not designed for what’s coming.”

I remember you and I had a conversation, and shortly thereafter you helped broker this vision for Jonathan that said, “Jonathan, I think I see your future more clearly than you do.” So tell us about that moment.

Yeah, I think at that moment a couple things are happening, right? The market had been dominated by Nvidia because Nvidia is what the researchers use to create the models. Inference is the forward pass of training; backprop is what’s different. So you’re always doing inference when you’re creating models, and it’s very natural to just run it on the same hardware you created the model on. One of the things we saw with the Groq architecture was that we could complete inference much more efficiently. If you look at our V1 chip, which we put into the cloud, that’s silicon designed in 2018, silicon from 2019, 14 nanometer, and super competitive against Hoppers, which are five generations newer in terms of silicon technology.

So really what we saw was this: we thought it would be very difficult to convince people to buy our hardware and use it. But if we built a cloud and put the hardware in the cloud, developers really don’t care, right? If there’s an API, we’ve seen developers are quite fungible there. So our big insight was to take these chips, put them in the cloud, run the data centers, make them available via an API, and make the best open source models available for everyone, even including OpenAI models. OpenAI had Whisper, which was open source from the beginning. We put a lot of those models there, and that’s what really took off. We launched the cloud and within a few weeks we went to a couple hundred thousand users. Today it’s something like 4 million users, and it took Nvidia almost 17 years to get to 7 million users.

So effectively, reasoning models come along. Reasoning models are much more voracious in their token consumption. This is even before we get to agents. This is just deeper thinking than what one-shot pre-trained models were doing. When you looked at token consumption curves, they were just going parabolic, and our hardware, our clouds, were starting to break. OpenAI only had a gigawatt of compute. Anthropic only had a gigawatt of compute. So we had to figure out how to make both more token-efficient models and more token-efficient architectures.

Now remember, Cerebras and Nvidia were big-time competitors. Nvidia and Groq were perceived as big-time competitors. So Sunny, you sent me a text and said, “I have an idea.” We at the time were major shareholders in Nvidia, and still are, and good friends with Jensen, and you had an idea. The reason I want to point this out is that it shows the power of one person’s idea. We see these big transactions, but sometimes we don’t unpack that it’s just one decision on one day that causes these things to occur.

Yeah, and Brad, you did lead that email, so that was awesome. But basically, when we were looking at the problem of inference, even as Groq, what became obvious to us when we started to dissect how inference works is that there’s first a split between prefill and decode, right? Many people were starting to do that, where you use one set of machines for prefill and another set of machines for decode, and you can get some efficiency, or lots of efficiency, by doing that. What we did further, and this is a good lesson for everyone, was to look within the decode. We realized that we could disaggregate the decode itself, because within the decode there are many different functions happening. Some of those functions are compute intensive and some are memory-bandwidth intensive.

One of the big differences between Groq and a GPU is that GPUs have lots and lots of compute and lots of external memory, which is HBM for them, and which is slower. We don’t have a lot of compute on Groq chips. We have a lot of SRAM, and that SRAM is very high bandwidth, more than an order of magnitude faster. Typically on a CPU you’d see that as your L1 cache, but we have lots of that in our chips.
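The distinction between compute-intensive and memory-bandwidth-intensive work can be sketched with a simple roofline-style estimate. This is a minimal illustration under stated assumptions, not a model of any real chip: every hardware number below is hypothetical, the only figure borrowed from the talk is the rough 10x bandwidth advantage of SRAM, and the sketch ignores memory capacity, batching, KV-cache traffic, and interconnect.

```python
def step_time(flops: float, bytes_moved: float,
              peak_flops: float, mem_bandwidth: float):
    """Return (seconds, limiting resource) for one step under a simple roofline model."""
    compute_s = flops / peak_flops
    memory_s = bytes_moved / mem_bandwidth
    if compute_s >= memory_s:
        return compute_s, "compute"
    return memory_s, "memory"


# Hypothetical numbers, chosen only for illustration.
N_PARAMS = 70e9          # model weights
BYTES_PER_PARAM = 1      # 8-bit weights
PEAK_FLOPS = 1e15        # accelerator peak, FLOP/s
HBM_BW = 3e12            # external memory bandwidth, bytes/s
SRAM_BW = 10 * HBM_BW    # "an order of magnitude faster", per the talk

weight_bytes = N_PARAMS * BYTES_PER_PARAM

# Prefill: 2048 prompt tokens share one read of the weights.
prefill = step_time(2 * N_PARAMS * 2048, weight_bytes, PEAK_FLOPS, HBM_BW)
# Decode at batch size 1: every new token re-reads all the weights.
decode_hbm = step_time(2 * N_PARAMS, weight_bytes, PEAK_FLOPS, HBM_BW)
decode_sram = step_time(2 * N_PARAMS, weight_bytes, PEAK_FLOPS, SRAM_BW)

for name, (seconds, bound) in [("prefill", prefill),
                               ("decode (HBM)", decode_hbm),
                               ("decode (SRAM)", decode_sram)]:
    print(f"{name:<14} {seconds * 1e3:8.3f} ms  limited by {bound}")
```

In this toy setup prefill is limited by compute while single-stream decode is limited by memory bandwidth, which is the intuition behind sending each phase to the hardware that suits it.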

So when we looked at the problem, what the email to Jensen was about was basically connecting to their chips via something they call NVLink. Nvidia chips speak to each other via a protocol called NVLink, and that allows you to run something not on a single GPU but on lots and lots of GPUs together. I think today you can do 72, and we’re scaling up to 576. Groq has a similar protocol, and we’ve been running thousands of chips together. In fact, we had many models that we were running on four to eight thousand chips at a time.

So basically NVLink Fusion was a way for us to allow our chips to speak to the Nvidia chips. We could take the part of the problem that we knew the Groq chips were faster and more performant at and run it there. The net result of all that is that, for the same footprint of power, you can get two and a half times more tokens out by combining those two systems, which in today’s world of constrained compute is really valuable.

So Sunny sends me a text and he says, “I think we can partner with Nvidia.” That in and of itself is a pretty big change, because if somebody’s your chief competitor, the idea that you can partner with them is a pretty big change. He said, “Would you mind sending Jensen a text?” And I’m thinking to myself, “Man, I’m going to spend some political capital with Jensen, so I need to know that this isn’t a crazy idea.” So I kind of sit on it for a week or something. Then Sunny texts me again. He’s like, “Have you sent Jensen that text yet?” And so I said, “Okay, I’m going to send it to him.” Jensen immediately got back to us and said, “Interesting idea, let’s have a chat.” And you guys started working with him. What was really compelling to Jensen, I think, was that somebody had obviously built a competitive chip, but they had also thought about how we could produce a lot more tokens together.

So what Sunny just said is really important. OpenAI has got a fixed footprint of, let’s call it, a gigawatt of Vera Rubins that they’re going to take in September. In one end of the factory goes power and chips. You obviously have the building and all the costs, and out the other end come tokens. When they bought Groq, for the exact same power footprint, for the exact same building, they’re now generating two and a half times the number of tokens. And the constraint we have in the world is power and memory. So if you can double or triple the number of tokens for the exact same footprint, it leads to an enormous economic outcome for OpenAI or for Anthropic.
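The economics of that fixed footprint can be sketched in a few lines of Python. The 2.5x throughput gain is the figure quoted in the talk; the annual cost and baseline throughput are hypothetical placeholders, and the sketch assumes the combined system adds no cost, which is a simplification.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def cost_per_million_tokens(annual_cost_usd: float, tokens_per_second: float) -> float:
    """Fully loaded cost per million tokens for a facility running all year."""
    tokens_per_year = tokens_per_second * SECONDS_PER_YEAR
    return annual_cost_usd / tokens_per_year * 1e6


ANNUAL_COST_USD = 10e9         # hypothetical: power, chips, building
BASELINE_TOKENS_PER_S = 1e9    # hypothetical facility throughput
THROUGHPUT_GAIN = 2.5          # figure quoted in the talk

baseline = cost_per_million_tokens(ANNUAL_COST_USD, BASELINE_TOKENS_PER_S)
combined = cost_per_million_tokens(ANNUAL_COST_USD,
                                   BASELINE_TOKENS_PER_S * THROUGHPUT_GAIN)

print(f"baseline: ${baseline:.3f} per million tokens")
print(f"combined: ${combined:.3f} per million tokens")
print(f"unit cost falls to {combined / baseline:.0%} of baseline")
```

Whatever the real cost base is, holding it fixed while output rises 2.5x cuts the cost per token to 40% of what it was, which is why throughput per watt matters so much when power is the binding constraint.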

And so you’ve seen the demand on inference grow because of inference-time reasoning, and we’ll talk next about agents. As the demand for these tokens of intelligence has exploded, we’re literally consuming tens of trillions of tokens per week around the world. We’ve had to come up with more power, more chips, more of these inputs in order to produce those. So it’s not just about fast chips or fast inference. It’s about the ability to get more tokens into the world, in a world that is constrained.

Yeah. How many days from the time you showed Jensen a working system until him greasing you with $20 billion?

Probably just over a month.

Yeah. 30 days. And did they have any competitive efforts going on at Nvidia?

Yeah. I mean, you see it, and we talked about it at GTC: Nvidia has an ecosystem already of seven chips and five different racks, right? So Nvidia is no longer just making a GPU, and I think that is one of Nvidia’s superpowers. They’ve started to look at disaggregating the problem in all different ways, whether it’s storage, whether it’s CPUs, whether it’s compute or networking chips. That already exists. So they had already thought about building a decode-only chip, something that was powered by a lot of SRAM, right?

But I think, and it’s a good lesson for everyone, us sending that email, starting to work together, and building a prototype that worked with their systems was a real proof of concept for them. This was making large systems work together, and making them performant, across two different companies with two completely different stacks. I think when they saw that we were able to do that, it showed that we’d be a good integration. The last thing I’ll say is that we’re two very different types of companies. If we were making a better GPU, there’d be a lot of conflict within Nvidia after the type of deal that we did. But because we were making this SRAM chip, deterministic and compiler based, which is completely different from how GPUs work, it’s very complementary for the cultures and the engineering teams to come together as well.

How many people in here have used OpenClaw? I mean, that’s pretty incredible penetration. I saw a stat, Marc Andreessen may have tweeted this today, that most of the people he talks to are spending somewhere between $100 and $1,000 a day now on token consumption with OpenClaw. And he said basically the next 20 years of Silicon Valley is going to be producing technologies to drive down the cost of intelligence, right?

And so I want to talk about that. Sunny, if we look at the cost of inference, it’s dropped by basically 90% over the course of the last year. It’s dropped by closer to 99% over the course of the last two to two and a half years. So talk to us about the inputs. What’s driving the unit cost of inference? If I take a like-for-like, let’s call it, unit of intelligence, whether it’s a basic question I ask or a little bit more complicated question, do you expect that unit cost to continue to go down? And if so, why? What are the inputs to that unit cost?
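The two figures quoted above are roughly consistent with each other, as a short compounding check shows. This is a minimal Python sketch that assumes a constant annual rate of decline; the function names are illustrative.

```python
def remaining_cost_fraction(annual_decline: float, years: float) -> float:
    """Fraction of the original unit cost left after compounding annual declines."""
    return (1.0 - annual_decline) ** years


def implied_annual_decline(total_decline: float, years: float) -> float:
    """Constant annual decline that produces total_decline over `years`."""
    return 1.0 - (1.0 - total_decline) ** (1.0 / years)


# Figures from the talk: about 90% in one year, about 99% over 2 to 2.5 years.
print(f"90%/yr for 2 years leaves {remaining_cost_fraction(0.90, 2):.1%} of the cost")
print(f"99% over 2.0 years implies {implied_annual_decline(0.99, 2.0):.1%} per year")
print(f"99% over 2.5 years implies {implied_annual_decline(0.99, 2.5):.1%} per year")
```

Two consecutive years of 90% declines leave about 1% of the original cost, which matches the 99% figure. Stretching the same 99% over two and a half years implies a somewhat slower annual rate, in the mid-80s percent.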

So the inputs are, I think, three major things. First, the supply chain: what can you do across the supply chain, which is mostly centered around Taiwan today, TSMC and the different packaging technologies, and the lithography technologies that they buy from others. Second, the innovation that your engineers can perform. And third, the amount of power you have. Those are the kinds of things we’re talking about.

What we see today is that lithography technology is starting to reach a limit, right? We’re not going as fast as we used to, so we’re not getting the Moore’s law gains, and we have to exceed that. We’re exceeding that in a couple of different ways. One is by making bigger and bigger chips. These chips have become quite large now, which is very exciting but also leads to a lot of interesting issues. Cerebras, as Brad’s been talking about, makes chips that are kind of the size of a pizza box, right? Compare that with the CPUs you guys all would have seen.

So there’s a lot of energy and technology going into how big of a package you can make and how much silicon you can pack in there. Then there are the innovations, and the innovations are really where we’re seeing most of this work happen, because that’s hand in hand with the models. We’re seeing this really interesting force today. There was a bunch of stuff, I don’t know if it was leaked or put out there, and I think Elon is at the center of some of that, discussing that these newer models are approaching 1 trillion to 10 trillion parameters. Those 10-trillion-parameter models go back to that first thing I told you: that’s in the fundamental FLOP calculation of how much compute it takes to generate a token.

So as fast as companies like us are making better and better technology, through lithography upgrades, through memory bandwidth upgrades, through innovation in how we lay out our circuits, through quantization efforts (NVFP4 was another one), the models are getting bigger and the demand is increasing. I’m going to put it back to you: it’s kind of a cube with three dimensions, and it’s really difficult to navigate right now because all the factors are growing in ways that are really challenging. The demand keeps going up, the models keep getting bigger, and as fast as we’re innovating, even if we get a 50x over five years, the models and the demand are growing faster. That’s why we’re seeing this unique phenomenon: H100 prices, if you’re building a startup or using them, are going up.
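To see why a 50x hardware gain over five years may still fall short, it helps to annualize it and compare it with growth on the other two axes. This is a minimal Python sketch: the 50x figure is from the talk, while the model-size and demand growth rates are purely hypothetical placeholders.

```python
def annualized_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


hardware_gain = annualized_factor(50, 5)   # "50x over five years", from the talk

# Hypothetical yearly growth on the other two axes of the "cube".
MODEL_GROWTH_PER_YEAR = 3.0    # compute needed per token as models get bigger
DEMAND_GROWTH_PER_YEAR = 5.0   # number of tokens requested

compute_needed_growth = MODEL_GROWTH_PER_YEAR * DEMAND_GROWTH_PER_YEAR
shortfall_growth = compute_needed_growth / hardware_gain

print(f"hardware efficiency:  {hardware_gain:.2f}x per year")
print(f"compute demanded:     {compute_needed_growth:.1f}x per year")
print(f"gap to fill with more chips and power: {shortfall_growth:.1f}x per year")
```

A 50x gain over five years is only a little over 2x per year. If model size and demand together grow faster than that, the difference has to come from more chips and more power, which is consistent with the rising H100 prices mentioned above.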

In fact, yes. One of the things that I think is important for everybody to understand is that when OpenAI and Anthropic started, their gross margins on the businesses were highly negative. That’s a scary thing to do: go raise a lot of money, produce a widget for a dollar, and sell it for 20 cents, so you have a big negative gross margin. But why was that? They were going out and charging you all to use ChatGPT, or their APIs were charging a certain amount of money. The models weren’t that capable, so there was only so much money we were willing to pay, right? And two years ago the cost of inference was a lot higher. But the bet they were making is that the cost of inference would come down a lot and your willingness to pay would go up a lot as intelligence got a lot more valuable.

I like to think of the first inning of AI as just getting to a place where we could yield answers, right? In code generation it was basically autocomplete, tab complete. In the case of ChatGPT it was basically a slightly better version of Google. But now we’re entering this phase of action, where agents do things: go build me an app, go build me a website, figure out how to resolve this customer service problem, sell more of my product, find a cure for cancer, book me a hotel in New York. It starts doing things, and when it does things, the number of tokens it has to consume in order to do them explodes by an order of magnitude, but the value delivered to the end consumer as a unit of intelligence goes up by 100x. So your willingness to pay goes up dramatically.

Can I add one? This week we saw Mythos, the unreleased model from Anthropic, find a bug in BSD. Think about how many engineers and software developers and everyone else, and PE companies using it, have looked at that code. So we’ve gone to a place where it’s doing things beyond human capability.

Correct. And we’re in year three.

Exactly. We’re in year three.

To give you another: what’s my best evidence to convince you of the value of AI? My best evidence is that Anthropic, in the month of March, added $10 billion in annualized revenue in a single month. That is the total annual revenue of Databricks plus Palantir combined, and they added it in one month. They didn’t add it because they hired a million salespeople who went out to a million companies and convinced them to buy their product, right? They added it because their product crossed a threshold of intelligent capability at which millions of customers around the world said, “I have to have this product to make my company better.”

The amount that Altimeter is spending... but millions of self-interested actors around the world independently made a judgment: “I have to buy a lot of those tokens, a lot of those capabilities,” both Claude Code and Cowork. The same thing is happening at OpenAI, not quite on the same exponential in terms of revenue. For me this was a little bit of an Oppenheimer moment. This was a little bit of the splitting of the atom. We’ve heard Dario and Sam talk about the exponential, or the end of the exponential, on intelligence.

But the big question was, are they going to be able to afford to continue to build the compute in order to keep up with this? I had this somewhat uncomfortable moment with Sam Altman on my podcast, the BG2 Pod, that went a little viral when I asked Sam, “Hey Sam, you’ve made $1.4 trillion of spending commitments, but you only have 13 billion of revenue. So explain to me how that works. How can you commit to spending 1.4 trillion when you have 13 billion of revenue?” I had hoped that Sam would make the case that his revenue was going to go up a lot, that these were kind of call options and he could renegotiate them. But instead he said to me, “Well, if you don’t like your investment, I’ll buy back your shares.” Which was not exactly the response I was hoping for out of Sam in the moment.

But that was the question heading into 2026. My podcast partner Bill Gurley and a lot of other people were highly skeptical, saying this is an AI bubble: these guys are spending at rates where they’re never going to be able to pay the bills, because there aren’t people on the other end willing to pay for the products to justify that level of spending. What happened was that in January Anthropic had a $3.5 billion month, in February an $8 billion month, and in March a $10.5 billion month. That to me said, “Oh, everything’s changed.” The product is now sufficiently good that you have revenue scaling on the same exponential as intelligence. So they can afford to pay the $50 billion per gigawatt to stand up all of these inference factories to produce all this collective intelligence.

Just react to that, Sunny, because our group talks a lot about this. There was a lot of debate in our group, on the All-In pod, and elsewhere as to whether or not this was a bubble.

Yeah, I’d say there are a couple things that maybe the broader world doesn’t see yet. One, the models that we see today haven’t even been trained on the latest hardware, whether that’s Blackwells, or Veras, which are just coming out, or Rubins, which are just coming out. So we haven’t seen the capabilities that you get from that hardware yet, and we’ll start to see that.

I think one of the first ones we’ll see is the stuff out of Elon’s Grok. The capabilities you’re seeing today are things that were done on older hardware. When you’re inside the ecosystem, you know what’s capable and what’s coming next, right?

I think one of the things that is really starting to take off and I think Anthropic’s done an incredible job here. And I think you know Codex does an equally incredible job on very hard kind of software problems is that there’s not just a chat interface that majority of people are interacting with. It’s not just an API, but they’ve created like a harness around the models. And those harnesses, OpenClaw is just another harness as well. Those harnesses have figured out how to extract more and continually extract. I think, you know, with Claude Code and co-work, you can have it just ping you whenever it’s stuck on your phone, even if you started somewhere else. And so, it can kind of be in this continuous loop and it’s working for you all the time. We’ve never had anything like that.

When it’s doing that, think about the token consumption. Before, you were doing a query, and it was doing some thinking and coming back. Now it’s just working all night long and pinging you. You tell it, “Don’t even bother me, keep coming back.” So we’re seeing these harnesses really extract more and more tokens out of it as well.

And then there’s the type of problems that people are solving. We gave the code problem, you put a bunch of other ones, but inside big businesses, and I tweeted this so I think it’s fair, inside Nvidia we now have this thing called the Nvidia personal assistant. It’s connected to Slack, it’s connected to Teams, it’s connected to our email, and it’s connected to all our files wherever they may exist. And so every morning it runs and it figures out all your task items for the day. You can have it answer those things, and it’s really incredible.

And so the way we work starts to change. We were talking about this earlier with someone: you don’t even write email now. Someone else’s agent is going in there emailing you, and your agent is looking at it and emailing them back. But a lot more work is getting done, because my time is freed up from basically answering emails all day long and approving things in all these traditional SaaS systems. The agents handle all that.

So the explosion, to your point, is just in the first or second inning. The amount of tokens is really just going up. So we don’t fear that. We don’t look at that as an overbuild in any way, shape or form.

I think the facts and evidence on the field are, number one, that the cost of both training and inference, but inference in particular, is plummeting and continues to plummet. That shouldn’t be altogether surprising. Technology ultimately is highly deflationary. I’ve never seen something this deflationary this quickly. I think it’s a byproduct of extreme co-design. It’s not a single chip. It’s a factory. And across the factory, there are all sorts of Moore’s laws playing out combinatorially.

At the same time, when you’re able to produce a lot more tokens, the unit of intelligence that you’re delivering is much more valuable. So the willingness to pay on the other end goes up a lot. And I’ll tell you for an OpenAI or an Anthropic today, if those guys were at negative gross margins a year and a half or two years ago, they’re now at very positive gross margins. Right? So all of a sudden, this business that looked diseconomic looks highly economic today.
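The swing from negative to positive gross margin described here can be illustrated with a toy calculation. Everything in the sketch below is a made-up assumption for illustration only: the price per million tokens, the starting cost, and the yearly cost decline are hypothetical, not figures from the talk or from any lab.

```python
def gross_margin(price, cost):
    """Gross margin as a fraction of price: (price - cost) / price."""
    return (price - cost) / price


def margin_path(price, starting_cost, yearly_cost_decline, years):
    """Gross margin per year while price is held flat and serving cost falls.

    yearly_cost_decline is the fraction by which cost drops each year
    (0.5 means the cost halves every year).
    """
    path = []
    cost = starting_cost
    for _ in range(years + 1):
        path.append(gross_margin(price, cost))
        cost *= 1.0 - yearly_cost_decline
    return path


# Hypothetical inputs: $10 per million tokens charged, $15 per million tokens
# to serve at the start, with serving cost halving every year.
path = margin_path(price=10.0, starting_cost=15.0, yearly_cost_decline=0.5, years=2)
for year, margin in enumerate(path):
    print(f"year {year}: gross margin {margin:+.1%}")
```

With these assumed inputs the margin moves from negative 50% to positive 25% and then to about 62%, which is the shape of the shift the speaker describes: the price holds while the cost of producing each token falls.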

So it’s kind of resolved that question a little bit. Sunny, I want to finish our section with maybe just a little forecast, and pre-wire you guys that we’re going to open it up to questions. It can be about the economics of inference or any part of the stack, or any other questions that you all have.

But you mentioned Mythos. It’s a model that came out this week; it was not generally released, but was sandboxed by Anthropic. It tried to get out a few times, tried to escape the sandbox. It was trained on kind of TPU7. On the other hand, you have Spud or 55 coming out of OpenAI probably this week or next, which is a first Blackwell-trained model. Elon’s going to have one. Meta’s just out with a model, you know, yesterday, Google, etc.

Talk us through it; you get to see into the product pipeline at Nvidia. Do you think the cost-of-inference curve continues to come down for the next several years? And do you think that the step function, or the exponential if you will, of both pre-training and inference-time reasoning, in terms of improving the algorithmic capabilities of intelligence, continues?

Well, I can tell you, having a chance to work with Jensen now, he challenges us in everything we do to not show up unless it’s 100x. So whatever we bring to him, and I can’t get into too many details, his first challenge back is: is this 100x from what you did before? So he is challenging the engineers to take a look at every part of the problem, all the way down into memory controllers or memory capacity or circuits, whatever it happens to be, to make sure we 100x everything.

So on the first part of your question, yes, because he pushes us to do it, he gives us the latitude to do it, and he gives us the resources to go do it. So I can tell you, the types of things that we’ve been enabled to do coming in as the Groq team are things we could never do as a startup, but Jensen has enabled us to do those things.

Are you guys harnessing AI yourselves to design the next generation chips?

A ton, right? We were doing that even before because we needed to, we were a small team, but now we have access to sort of the entire ecosystem of things that are available. So I think that’s right: we’re being pushed to do it.

On the related side, though, the more we innovate, the more the model makers innovate, and the bigger the models get, which means the capabilities that are coming out are better. So we continue to need that buildout. We’ll all look back and we’ll thank the couple of companies that changed the footprint of the internet for us. And you could talk more about this than I can, Brad, but the work that Google did to build the infrastructure they did for video, for search, really paved the way for the rest of the internet, CDNs, all types of other things, right? And so a lot of this work that’s happening to build out this infrastructure will pay benefits. And you need that to continue to happen, because it can’t just be innovation in the chips. You need more and more infrastructure to be built.

I’ll wrap with this. We have the great privilege of talking with Jensen or Elon or Sam or Dario, and you guys can all read about the personal battles they have. Some days there’s not a lot of love lost between them in the race to AGI. But right now, I see amazing uniformity. When I talk to them, they all in a non-hyperbolic way say, “We’re there and we got there faster than we thought. Like, we’re nearing the end of the exponential.” And if you ask Dario, “Dario, what is the most surprising thing to you right now?” he says, “We’re almost at the end of the exponential.” And people don’t even seem to realize it. And if you ask Sam, he’ll say the same thing. And if you ask Elon, he’ll say the same thing.

That shouldn’t be scary to any of us. It just means that we’re in this recursive place where we have AGI, and the job of everybody in this room, including the folks sitting up here, is going to be: how do we harness this technology for the betterment of all of us, right? Which is going to require going back to what Apoorv said about the Invest America Act. Dario says the accumulation of wealth that’s about to occur, people call it the age of abundance we’re going to enter into, is going to be easier than ever, but the distribution problems we’re going to encounter are going to be harder than ever, right?

And so that was really the inspiration behind the work I’ve done on Invest America and the work that I think we’re all going to have to collectively do around the social contract, the intersection between public policy and technology, because the exponential looks like that, and all of a sudden you have agents that are going to be able to have more capability than kind of collective human intelligence, and it’s happening at an accelerating rate.

Remember, all the stuff we’ve talked about has occurred with almost no compute. Anthropic and OpenAI are going to add more compute this year than all the labs put together for the last decade. Okay? And the year after that, they’re going to double it again. So I think the rate of change is fairly parabolic. And that to me is exciting. I’m an optimist about what’s to come, but I’m not Pollyanna-ish about the challenges that come with that rate of change, right? It’s going to require active engagement, like it has in other periods in history around the industrial revolution, the digital revolution, etc., because it’s going to exact a lot of change on the world.

But with that, I just want to say it’s been extraordinary watching Sunny orchestrate the work that he’s done at Groq. He’s an incredible thought leader in this whole area. I appreciate you coming in, but maybe let’s just open it up to some questions, and hopefully we can cover a lot of territory.

Q&A:

Question about positioning oneself for the future:

Yeah, I mean, listen, I have to answer this question for my son and for so many others. And humans have a unique way of finding a way to add value to society notwithstanding disruption, right? In the industrial revolution, if you were a tradesperson or a craftsperson, and you built a product beginning to end, almost all of them were displaced by mass production. And for that person, it didn’t feel good: I was really good at making a wheel start to finish, but I was totally disintermediated by the means of production. Okay. But it’s not like the world just stopped. Those people found other things to do.

And one of the observations I have is that we used to have 80% of people that were in manufacturing and that were in farming and other things. Today, we have 70% of people in the service economy, right? We have the luxury of people, you know, we didn’t use to hire coaches, as an example, right? You couldn’t afford to hire a coach. Today you have coaches and yoga instructors and tons of things in the world that add a lot of value to the world. And I think that we have higher order things that people do, right?

And so for me, one of the things is, if you were well off enough that you could hire a tutor, a specialized tutor, that was great. But for the 98% of the world who couldn’t afford that, now they can get that. Or if you were part of the two or 3% that could have concierge medicine, it was really great. But for the other 97% it wasn’t great. Well, now they can get that same level of care. And so I think this is about democratizing intelligence, democratizing access, etc.

But it’s not to say that there aren’t going to be different challenges. My number one thing, again, is make yourself bionic, be a creator, figure out a way that you add value. So if somebody comes and wants to interview at Altimeter and they say, “Oh, I don’t use AI and I don’t use Excel spreadsheets. I do everything by hand,” that would be a problem, right? I expect somebody to use all the greatest tools at their disposal to be the most effective they can, to add value, to allow us to generate alpha in the world.

And so, starting in a place like this... Another way of saying it, and I was reserving this for a tweet at some point in time, is that I think IQ gets commoditized and EQ becomes super valuable. Okay. What do I mean by EQ? I mean a network of people in this room. I mean the ability to persuade the person sitting next to you, the ability to form your team, the ability to lead people in different... like, that is super valuable, right? I think it becomes more valuable in the future. But I think just being the smartest person in a room and solving the problem at the board faster than all the other humans in the room, that I think is commoditized, and you’re not going to be able to beat the machine. That doesn’t mean you don’t need to learn those things, but I think it’ll be hard to beat the machine.

Can I just add one thing to that? I think there’s this other moment that’s occurring right now, and I think about this quite often. If you actually look at what’s happening in mathematics right now, there are all kinds of new discoveries happening. And I use the following analogy: humanity had to wait for an apple to fall on Newton’s head for him to start theorizing about gravity and start formulating that. But now we can have something else working and discovering new things. And it goes back to that chart that Brad showed, right? It wasn’t really until we started having more innovation, more intelligence, that those curves went up and to the right. We’re just about to make that go more vertical. So I think the overall benefit to humanity has already been shown by what happens when you have more intelligence, right? And we don’t have to wait for things to happen. We let the agents do it without us in the loop, which I think will be powerful.

Question about Apple and hardware/software integration:

I think it’s high stakes. I mean, listen, I’ll tell you, even the people at Apple are nervous about their strategy. Part of it is their challenge around privacy. They have a real challenge because we don’t have the capability yet on the edge, and they don’t want to have you sharing information up to the cloud, right? They view consumer privacy as one of their core consumer value propositions, but I think they’ve put themselves at risk. The bull case would simply be that we’re so sticky to the device, and the device is so good, that they have time, and ultimately the Gemini model that they’re going to put on the phone is just going to be a much more capable Siri. We can all agree the old Siri was really bad, and this will be a more capable Siri, and for the vast majority of people that will be good enough, right? So that would be the bull case. I think the bear case is, like you said, that other people come along and build more ambient devices that consumers really like. But for me, I frankly wish that OpenAI wasn’t working on a device. I wish they would just focus on building intelligence, and I think Apple is going to be very formidable in the device world. So I think they’re in a reasonably good spot.

One stat: an 8 billion parameter model, which is quite small, right, can burn out an iPhone in 30 minutes. It goes back to just battery life. Yeah, battery life. So I’d just say you’ve got to go back and look at how compute intensive AI is, right? And so that’s the real challenge with what Brad said about pushing so much of that frontier intelligence to the edge.

Question about AI safety and CEO fear-mongering:

He and I had that argument on the pod again today. Again, call me naive. I think that Dario’s speaking authentically what he believes. I think Sam’s speaking authentically what he believes. They’re staring at this exponential. They believe that they see AGI or ASI. And I think they do have legitimate concerns. Listen, I’m glad that we sandboxed Mythos. They tested it internally. They found 26 vulnerabilities in the Safari browser. And like I said to Chamath today, do you want them to just throw it out there and then all your browser history is out in the public? Probably not, right?

So at the same time, I don’t think it helps going out and fear-mongering, particularly if your real intent is regulatory capture, to prevent everybody else from climbing up the ladder now that you’re on the top. I have a real problem with that. But I think we have to find that balance and those tradeoffs, right, between that and reminding people about the optimistic side of things. And I encourage you to read both of Dario’s essays. His first essay on this is quite optimistic about what can happen. But I think he has the other side of it, which is that it doesn’t happen without us being very thoughtful about the guardrails and things we need to put in place.

You know, one of the things I was really happy about is called Project Glasswing, which is this consortium that they put together this week to effectively sandbox Mythos before they release it publicly. You know, Amazon, Microsoft, etc. That seemed to me to be a very pragmatic, market-based solution to the problem. And he and I were just texting before I came over here. They’ve found and they’ve hardened a lot of things already, very quickly, and within 100 days you can do a lot when you’re having the AI fix the things that it finds.

And so, you know, I certainly talk optimistically about it. I think a lot of other people do encourage them to find a little bit more balance in their commentary. But I also don’t want us to ignore the realities: when you split the atom, it can either provide unlimited free energy for the world and bring people out of darkness, right? Or it can be used to make a bomb to destroy cities and nations. And so powerful technology is powerful technology. We can’t just stick our head in the sand and act like it’s a one-way street.

Question about training costs and economics:

I think you see kind of a couple of phenomena. One, the gear that’s used for training turns into inference gear, right? So those big clusters, we’re seeing that happen kind of all the time. So there’s just a natural progression between those two worlds.

And then I really do think the innovations that come from those larger models and those training clusters have such a large benefit, and they tie back to what Brad said, right? I was just reading this thing today by Mustafa from Microsoft, saying, look at GPT-2: we have 50x kind of more powerful compute today than what we had when we did GPT-2, but look at the capabilities, right? And so I think you just have to keep those two things in line with sort of the entire topic of the conversation. Innovation is going to keep happening because, as Brad touched on a little bit, we’re just now unleashing AI into designing these things, right? And there are things that we see and we learn, in terms of optimizations, software and hardware optimizations, that we don’t see.

So, I continue to believe it’ll come down. But yeah, we’re just working on a problem that’s very, very intensive from a compute standpoint. So those numbers are going to be large.

Question about Nvidia’s long-term economics and margins:

So you know, maybe just repeat it real quick and then answer it, since I’m…

Well, I’m going to let you, investor… I work there, so I shouldn’t answer that question.

Yeah, I mean, listen, Nvidia is a $4.5 trillion company that’s trading at about 13 times earnings, very cheap, half the market multiple, growing at 70%, and is obviously dominant in the market today. And I think there’s a wall of worry about Nvidia because everybody says, you know, TPUs and Cerebras and Groq and all these people can come up with inference solutions. They can steal your share. They can compete on price. That’s the beautiful thing about capitalism.

You know what Nvidia will have to do? Compete. They either deliver a product that people are willing to pay more for, or they have to drop their price, their margins come down, and they’ll have to compete in that market. I would tell you that when I look into the product roadmap for what’s going on at Nvidia, and the acquisition of Groq was part of it, I think they’re going to be in an incredible position.

And they’ve already announced that they have a trillion dollars of sales over the course of the next eight quarters that are already booked. People have more demand than they can get memory and supply to build all of this compute out. So I think we’re so early in this.

I was in Silicon Valley, you know, not so long ago, 16 years ago, when they said there could never be a trillion dollar company. There would never be a trillion. And we asked the question, well, why? They said, well, law of large numbers. I was like, well, what stone tablet is that etched into? Nowhere. Okay, today we have a $4.5 trillion company. I’ve already said publicly Nvidia will be the first $10 trillion company.

Okay, and it’s not because I’m a cheerleader. I can sell my Nvidia and invest in anything that I want to invest in. But that company’s leadership, that team, the lead that they have on both training and inference, and the rate at which they’re moving, I think puts them in a really great competitive position. And they’re doing all this notwithstanding the fact that Trainium’s successful, TPU is successful, custom ASICs are being very successful, and they’re still killing it. And I think that says a lot more about the size of the market for intelligence and the compute that’s needed to get us there than it does about the individual company.
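For readers who want to check the valuation arithmetic in this answer, here is a minimal Python sketch. The $4.5 trillion market value, the multiple of about 13 times earnings, and the $10 trillion target are taken as quoted by the speaker and not independently verified. The five-year horizon and the helper names `implied_earnings_bn` and `required_annual_growth` are hypothetical and only there to make the calculation concrete.

```python
def implied_earnings_bn(market_cap_bn, earnings_multiple):
    """Annual earnings implied by a market value and a price-to-earnings multiple."""
    return market_cap_bn / earnings_multiple


def required_annual_growth(start_value, target_value, years):
    """Constant annual growth rate that takes start_value to target_value in `years`."""
    return (target_value / start_value) ** (1.0 / years) - 1.0


# Figures as quoted in the talk (in billions of dollars).
MARKET_CAP_BN = 4500.0
EARNINGS_MULTIPLE = 13.0
TARGET_CAP_BN = 10000.0

# Hypothetical horizon; the speaker gives no date for the $10 trillion mark.
ASSUMED_YEARS = 5

earnings = implied_earnings_bn(MARKET_CAP_BN, EARNINGS_MULTIPLE)
growth = required_annual_growth(MARKET_CAP_BN, TARGET_CAP_BN, ASSUMED_YEARS)

print(f"Implied annual earnings: about ${earnings:.0f}bn")
print(f"Value growth needed over {ASSUMED_YEARS} years: {growth:.1%} per year")
```

At the quoted multiple, the market value implies roughly $346 billion of annual earnings, and reaching $10 trillion means the value has to grow about 2.2 times from here, whatever horizon is assumed.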

With that said, we have to wrap. Apoorv, thanks for having us. Thank you. Thank you.