Transcript: Leopold Aschenbrenner 2027 AGI China US Super Intelligence Race And The Return Of History

TITLE: zdbVtZIn9IM
CHANNEL: Unknown
DATE:
---TRANSCRIPT---
What will be at stake will not just be cool products, but whether liberal democracy survives, whether the CCP survives, what the world order for the next century is going to be. The CCP is going to have an all-out effort to infiltrate American AI labs. Billions of dollars, thousands of people. The CCP is going to try to out-build us. People don’t realize how intense state-level espionage can be. When we have literal superintelligence... They can, like, Stuxnet the Chinese data centers. You really think that will be a private company, and the government wouldn’t be like, “oh my god, what is going on?” I do think it is incredibly important that these clusters are in the United States. I mean, would you do the Manhattan Project in the UAE? 2023 was the moment for me when it went from AGI as this sort of theoretical, abstract thing, and you’d make the models, to like, I see it, I feel it. I can see the cluster it’s trained on, the rough combination of algorithms, the people, how it’s happening, and I think most of the world is not there; most of the people who feel it are right here.
Today I’m chatting with my friend Leopold Aschenbrenner. He grew up in Germany and graduated as valedictorian of Columbia when he was 19. After that, he had a very interesting gap year which we’ll talk about. Then, he was on the OpenAI superalignment team, may it rest in peace. Now, with some anchor investments from Patrick and John Collison, Daniel Gross, and Nat Friedman, he is launching an investment firm. Leopold, you’re off to a slow start but life is long. I wouldn’t worry about it too much. You’ll make up for it in due time. Thanks for coming on the podcast. Thank you. I first discovered your podcast when your best episode had a couple of hundred views. It’s been amazing to follow your trajectory. It’s a delight to be on.
In the Sholto and Trenton episode, I mentioned that a lot of the things I’ve learned about AI I’ve learned from talking with them. The third, and probably most significant, part of this triumvirate has been you. We’ll get all the stuff on the record now. Here’s the first thing I want to get on the record. Tell me about the trillion-dollar cluster. I should mention this for the context of the podcast. Today you’re releasing a series called Situational Awareness. We’re going to get into it. First question about that is, tell me about the trillion-dollar cluster. Unlike most things that have recently come out of Silicon Valley, AI is an industrial process. The next model doesn’t just require some code. It’s building a giant new cluster. It’s building giant new power plants. Pretty soon, it’s going to involve building giant new fabs. Since ChatGPT, this extraordinary techno-capital acceleration has been set into motion. Exactly a year ago today, Nvidia had their first blockbuster earnings call. It went up 25% after hours and everyone was like, “oh my God, AI is a thing.” Within a year, Nvidia data center revenue has gone from a few billion a quarter to $25 billion a quarter and continues to go up. Big Tech capex is skyrocketing. It’s funny. There’s this crazy scramble going on, but in some sense it’s just the continuation of straight lines on a graph. There’s this long-run trend of almost a decade of training compute for the largest AI systems growing by about half an order of magnitude, 0.5 OOMs a year. Just play that forward. GPT-4 was reported to have finished pre-training in 2022. On SemiAnalysis, it was rumored to have a cluster size of about 25,000 A100s. That’s roughly a $500 million cluster. Very roughly, it’s 10 megawatts. Just play that forward at half an OOM a year. By 2024, that’s a cluster that’s 100 MW and 100,000 H100 equivalents with costs in the billions. Play it forward two more years. By 2026, that’s a gigawatt, the size of a large nuclear reactor.
That’s like the power of the Hoover Dam. That costs tens of billions of dollars and requires a million H100 equivalents. By 2028, that’s a cluster that’s ten GW. That’s more power than most US states. That’s 10 million H100 equivalents, costing hundreds of billions of dollars. By 2030, you get the trillion-dollar cluster using 100 gigawatts, over 20% of US electricity production. That’s 100 million H100 equivalents. That’s just the training cluster. There are more inference GPUs as well. Once there are products, most of them will be inference GPUs. US power production has barely grown for decades. Now we’re really in for a ride. When I had Zuck on the podcast, he was claiming not a plateau per se, but that AI progress would be bottlenecked by this constraint on energy. Specifically, he was like, “oh, gigawatt data centers, are we going to build another Three Gorges Dam or something?” According to public reports, there are companies planning things on the scale of a 1 GW data center. With a 10 GW data center, who’s going to be able to build that? A 100 GW center is like a state project. Are you going to pump that into one physical data center? How is it going to be possible? What is Zuck missing? Six months ago, 10 GW was the talk of the town. Now, people have moved on. 10 GW is happening. There’s The Information report on OpenAI and Microsoft planning a $100 billion cluster. Is that 1 GW? Or is that 10 GW? I don’t know but if you try to map out how expensive the 10 GW cluster would be, that’s a couple of hundred billion. It’s sort of on that scale and they’re planning it. It’s not just my crazy take. AMD forecasted a $400 billion AI accelerator market by 2027. AI accelerators are only part of the expenditures. We’re very much on track for a $1 trillion of total AI investment by 2027. The $1 trillion cluster will take a bit more acceleration. We saw how much ChatGPT unleashed. Every generation, the models are going to be crazy and shift the Overton window. 
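The cluster projection described above is plain compounding, so a minimal sketch can make the arithmetic explicit. It assumes, purely for illustration, that power draw and H100-equivalent counts grow at exactly 0.5 OOMs a year from the quoted 2024 data point; the function and constant names are hypothetical and not from the conversation.

```python
# Compounding the training-compute trend quoted in the conversation.
# Assumption (ours): power and H100-equivalent counts grow at exactly
# 0.5 orders of magnitude (OOMs) per year from the quoted 2024 data point.

OOMS_PER_YEAR = 0.5
BASE_YEAR = 2024
BASE_POWER_MW = 100.0        # "100 MW" quoted for the 2024 cluster
BASE_H100_EQUIV = 100_000.0  # "100,000 H100 equivalents" quoted for 2024


def project_cluster(year: int) -> dict:
    """Scale the 2024 cluster by 10^(0.5 * years elapsed)."""
    factor = 10 ** (OOMS_PER_YEAR * (year - BASE_YEAR))
    return {
        "year": year,
        "power_gw": BASE_POWER_MW * factor / 1000.0,
        "h100_equiv": BASE_H100_EQUIV * factor,
    }


for year in (2024, 2026, 2028, 2030):
    c = project_cluster(year)
    print(f"{c['year']}: {c['power_gw']:g} GW, {c['h100_equiv']:,.0f} H100 equivalents")
```

Under these assumptions the sketch lands on the figures quoted in the conversation: about 1 GW in 2026, 10 GW in 2028, and 100 GW with 100 million H100 equivalents in 2030. Running it backwards to 2022 gives roughly 10 MW, which matches the rough GPT-4 cluster figure.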
Then the revenue comes in. These are forward-looking investments. The question is, do they pay off? Let’s estimate the GPT-4 cluster at around $500 million. There’s a common mistake people make, saying it was $100 million for GPT-4. That’s just the rental price. If you’re building the biggest cluster, you have to build and pay for the whole cluster. You can’t just rent it for three months. Can’t you? Once you’re trying to get into the hundreds of billions, you have to get to like $100 billion a year in revenue. This is where it gets really interesting for the big tech companies because their revenues are on the order of hundreds of billions. $10 billion is fine. It’ll pay off the 2024 size training cluster. It’ll really be gangbusters with Big Tech when it costs $100 billion a year. The question is how feasible is $100 billion a year from AI revenue? It’s a lot more than right now. If you believe in the trajectory of AI systems as I do, it’s not that crazy. There are like 300 million Microsoft Office subscribers. They have Copilot now. I don’t know what they’re selling it for. Suppose you sold some AI add-on for $100/month to a third of Microsoft Office subscribers. That’d be $100 billion right there. $100/month is a lot. That’s a lot for a third of Office subscribers. For the average knowledge worker, it’s a few hours of productivity a month. You have to be expecting pretty lame AI progress to not hit a few hours of productivity a month. Sure, let’s assume all this. What happens in the next few years? What can the AI trained on the 1 GW data center do? What about the one on the 10 GW data center? Just map out the next few years of AI progress for me. The 10 GW range is my best guess for when you get true AGI. Compute is actually overrated. We’ll talk about that. By 2025-2026, we’re going to get models that are basically smarter than most college graduates. A lot of the economic usefulness depends on unhobbling. The models are smart but limited. 
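The Office add-on estimate earlier in this answer is a one-line multiplication. The sketch below uses only the figures quoted in the conversation (300 million subscribers, a third adopting, $100 a month); the function name is hypothetical.

```python
# Annual revenue from a hypothetical AI add-on, using the figures
# quoted in the conversation. Nothing here is a real pricing plan.

OFFICE_SUBSCRIBERS = 300_000_000  # "like 300 million Microsoft Office subscribers"
ADOPTION_SHARE = 1 / 3            # "a third of Microsoft Office subscribers"
PRICE_PER_MONTH_USD = 100         # "$100/month"


def annual_addon_revenue(subscribers: float, share: float, price_per_month: float) -> float:
    """Paying users times monthly price times 12 months."""
    return subscribers * share * price_per_month * 12


revenue = annual_addon_revenue(OFFICE_SUBSCRIBERS, ADOPTION_SHARE, PRICE_PER_MONTH_USD)
print(f"about ${revenue / 1e9:.0f} billion per year")
```

The result is about $120 billion a year, which is consistent with the round "$100 billion right there" figure used in the conversation.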
There are chatbots and then there are things like being able to use a computer and doing agentic long-horizon tasks. By 2027-2028, it’ll get as smart as the smartest experts. The unhobbling trajectory points to it becoming much more like an agent than a chatbot. It’ll almost be like a drop-in remote worker. This is the question around the economic returns. Intermediate AI systems could be really useful, but it takes a lot of schlep to integrate them. There’s a lot you could do with GPT-4 or GPT-4.5 in a business use case, but you really have to change your workflows to make them useful. It’s a very Tyler Cowen-esque take. It just takes a long time to diffuse. We’re in SF and so we miss that. But in some sense, the way these systems want to be integrated is where you get this kind of sonic boom. Intermediate systems could have done it, but it would have taken schlep. Before you do the schlep to integrate them, you’ll get much more powerful systems that are unhobbled. They’re agents, drop-in remote workers. You’re interacting with them like coworkers. You can do Zoom calls and Slack with them. You can ask them to do a project and they go off and write a first draft, get feedback, run tests on their code, and come back. Then you can tell them more things. That’ll be much easier to integrate. You might need a bit of overkill to make the transition easy and harvest the gains. What do you mean by overkill? Overkill on model capabilities? Yeah, the intermediate models could do it but it would take a lot of schlep. The drop-in remote worker AGI can automate cognitive tasks. The intermediate models would have made the software engineer more productive. But will the software engineer adopt it? With the 2027 model, you just don’t need the software engineer. You can interact with it like a software engineer, and it’ll do the work of a software engineer. The last episode I did was with John Schulman. I was asking about this. 
We have these models that have come out in the last year and none seem to have significantly surpassed GPT-4, certainly not in an agentic way where they interact with you as a coworker. They’ll brag about a few extra points on MMLU. Even with GPT-4o, it’s cool they can talk like Scarlett Johansson (I guess not anymore) but it’s not like a coworker. It makes sense why they’d be good at answering questions. They have data on how to complete Wikipedia text. Where is the equivalent training data to understand a Zoom call? Referring back to your point about a Slack conversation, how can it use context to figure out the cohesive project you’re working on? Where is that training data coming from? A key question for AI progress in the next few years is how hard it is to unlock the test time compute overhang. Right now, GPT-4 can do a few hundred tokens with chain-of-thought. That’s already a huge improvement. Before, answering a math question was just shotgun. If you tried to answer a math question by saying the first thing that comes to mind, you wouldn’t be very good. GPT-4 thinks for a few hundred tokens. If I think at 100 tokens a minute, that’s like what GPT-4 does. It’s equivalent to me thinking for three minutes. Suppose GPT-4 could think for millions of tokens. That’s +4 OOMs on test time compute on one problem. It can’t do it now. It gets stuck. It writes some code. It can do a little bit of iterative debugging, but eventually gets stuck and can’t correct its errors. There’s a big overhang. In other areas of ML, there’s a great paper on AlphaGo, where you can trade off train time and test time compute. If you can use 4 OOMs more test time compute, that’s almost like a 3.5 OOM bigger model. Again, if it’s 100 tokens a minute, a few million tokens is a few months of working time. There’s a lot more you can do in a few months of working time than just getting an answer right now. The question is how hard is it to unlock that?
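The test-time compute comparison above can be checked with a short calculation. This is a rough sketch: the 100 tokens a minute rate is from the conversation, while the specific token counts (300 and 3,000,000) and the 160 working hours per month are illustrative assumptions chosen to stand in for "a few hundred" and "a few million".

```python
import math

TOKENS_PER_MINUTE = 100     # thinking speed quoted in the conversation
WORK_HOURS_PER_MONTH = 160  # assumption (ours): 40 hours/week * 4 weeks


def thinking_minutes(tokens: float) -> float:
    """Human-equivalent thinking time for a given token budget."""
    return tokens / TOKENS_PER_MINUTE


def working_months(tokens: float) -> float:
    """The same thinking time expressed in working months."""
    return thinking_minutes(tokens) / 60 / WORK_HOURS_PER_MONTH


def ooms_between(small: float, large: float) -> float:
    """Orders of magnitude separating two token budgets."""
    return math.log10(large / small)


few_hundred = 300        # stand-in for "a few hundred tokens"
few_million = 3_000_000  # stand-in for "millions of tokens"

print(thinking_minutes(few_hundred), "minutes")
print(round(working_months(few_million), 1), "working months")
print(round(ooms_between(few_hundred, few_million), 1), "OOMs")
```

With these stand-in numbers, a few hundred tokens is about three minutes of thinking, a few million tokens is roughly three working months, and the gap between them is 4 OOMs.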
In the short timelines AI world, it’s not that hard. The reason it might not be that hard is that there are only a few extra tokens to learn. You need to learn things like error correction tokens where you’re like “ah, I made a mistake, let me think about that again.” You need to learn planning tokens where it’s like “I’m going to start by making a plan. Here’s my plan of attack. I’m going to write a draft and now I’m going to critique my draft and think about it.” These aren’t things that models can do now, but the question is how hard it is. There are two paths to agents. When Sholto was on your podcast, he talked about scaling leading to more nines of reliability. That’s one path. The other path is the unhobbling path. It needs to learn this System 2 process. If it can learn that, it can use millions of tokens and think coherently. Here’s an analogy. When you drive, you’re on autopilot most of the time. Sometimes you hit a weird construction zone or intersection. Sometimes my girlfriend is in the passenger seat and I’m like “ah, be quiet for a moment, I need to figure out what’s going on.” You go from autopilot to System 2 and you’re thinking about how to do it. Scaling improves that System 1 autopilot. The brute force way to get to agents is improving that system. If you can get System 2 working, you can quickly jump to something more agentified and test time compute overhang is unlocked. What’s the reason to think this is an easy win? Is there some loss function that easily enables System 2 thinking? There aren’t many animals with System 2 thinking. It took a long time for evolution to give us System 2 thinking. Pre-training has trillions of tokens of Internet text, I get that. You match that and get all of these free training capabilities. What’s the reason to think this is an easy unhobbling? First of all, pre-training is magical. It gave us a huge advantage for models of general intelligence because you can predict the next token. 
But there’s a common misconception. Predicting the next token lets the model learn incredibly rich representations. Representation learning properties are the magic of deep learning. Rather than just learning statistical artifacts, the models learn models of the world. That’s why they can generalize, because it learned the right representations. When you train a model, you have this raw bundle of capabilities that’s useful. The unhobbling from GPT-2 to GPT-4 took this raw mass and RLHF’d it into a good chatbot. That was a huge win. In the original InstructGPT paper, comparing RLHF vs. non-RLHF models it’s like a 100x model size win on human preference rating. It started to be able to do simple chain-of-thought and so on. But you still have this advantage of all these raw capabilities, and there’s still a huge amount you’re not doing with them. This pre-training advantage is also the difference to robotics. People used to say it was a hardware problem. The hardware is getting solved, but you don’t have this huge advantage of bootstrapping with pre-training. You don’t have all this unsupervised learning you can do. You have to start right away with RL self-play. The question is why RL and unhobbling might work. Bootstrapping is an advantage. Your Twitter bio is being pre-trained. You’re not being pre-trained anymore. You were pre-trained in grade school and high school. At some point, you transition to being able to learn by yourself. You weren’t able to do it in elementary school. High school is probably where it started and by college, if you’re smart, you can teach yourself. Models are just starting to enter that regime. It’s a little bit more scaling and then you figure out what goes on top. It won’t be trivial. A lot of deep learning seems obvious in retrospect. There’s some obvious cluster of ideas. There are some ideas that seem a little dumb but work. There are a lot of details you have to get right. We’re not going to get this next month. 
It’ll take a while to figure out. A while for you is like half a year. I don’t know, between six months and three years. But it’s possible. It’s also very related to the issue of the data wall. Here’s one intuition on learning by yourself. Pre-training is kind of like the teacher lecturing to you and the words are flying by. You’re just getting a little bit from it. That’s not what you do when you learn by yourself. When you learn by yourself, say you’re reading a dense math textbook, you’re not just skimming through it once. Some wordcels just skim through and reread and reread the math textbook and they memorize. What you do is you read a page, think about it, have some internal monologue going on, and have a conversation with a study buddy. You try a practice problem and fail a bunch of times. At some point it clicks, and you’re like, “this made sense.” Then you read a few more pages. We’ve kind of bootstrapped our way to just starting to be able to do that now with models. The question is, can you use all this sort of self-play, synthetic data, RL to make that thing work. Right now, there’s in-context learning, which is super sample efficient. In the Gemini paper, it just learns a language in-context. Pre-training, on the other hand, is not at all sample efficient. What humans do is a kind of in-context learning. You read a book, think about it, until eventually it clicks. Then you somehow distill that back into the weights. In some sense, that’s what RL is trying to do. RL is super finicky, but when it works it’s kind of magical. It’s the best possible data for the model. It’s when you try a practice problem, fail, and at some point figure it out in a way that makes sense to you. That’s the best possible data for you because it’s the way you would have solved the problem, rather than just reading how somebody else solved the problem, which doesn’t initially click. 
By the way, if that take sounds familiar it’s because it was part of the question I asked John Schulman. It goes to illustrate the thing I said in the intro. A bunch of the things I’ve learned about AI come from these dinners we do before the interviews with me, you, Sholto, and a couple of others. We’re like, “what should I ask John Schulman, what should I ask Dario.” Suppose this is the way things go and we get these unhobblings... And the scaling. You have this baseline of this enormous force of scaling. GPT-2 was amazing. It could string together plausible sentences, but it could barely do anything. It was kind of like a preschooler. GPT-4, on the other hand, could write code and do hard math, like a smart high schooler. This big jump in capability is explored in the essay series. I count the orders of magnitude of compute and scale-up of algorithmic progress. Scaling alone by 2027-2028 is going to do another preschool to high school jump on top of GPT-4. At a per token level, the models will be incredibly smart. They’ll gain more reliability, and with the addition of unhobblings, they’ll look less like chatbots and more like agents or drop-in remote workers. That’s when things really get going. I want to ask more questions about this but let’s zoom out. Suppose you’re right about this. This is because of the 2027 cluster which is at 10 GW? 2028 is 10 GW. Maybe it’ll be pulled forward. Something like a 5.5 level by 2027, whatever that’s called. What does the world look like at that point? You have these remote workers who can replace people. What is the reaction to that in terms of the economy, politics, and geopolitics? 2023 was a really interesting year to experience as somebody who was really following the AI stuff. What were you doing in 2023? OpenAI. When you were at OpenAI in 2023, it was a weird thing. You almost didn’t want to talk about AI or AGI. It was kind of a dirty word.
Then in 2023, people saw ChatGPT for the first time, they saw GPT-4, and it just exploded. It triggered huge capital expenditures from all these firms and an explosion in revenue from Nvidia and so on. Things have been quiet since then, but the next thing has been in the oven. I expect these g-forces to intensify with every generation. People will see the models. They won’t have counted the OOMs so they’re going to be surprised. It’ll be kind of crazy. Revenue is going to accelerate. Suppose you do hit $10 billion by the end of this year. Suppose it just continues on the trajectory of revenue doubling every six months. It’s not actually that far from $100 billion, maybe by 2026. At some point, what happened to Nvidia is going to happen to Big Tech. It’s going to explode. A lot more people are going to feel it. 2023 was the moment for me where AGI went from being this theoretical, abstract thing to: I see it, I feel it, and I see the path. I see where it’s going. I can see the cluster it’s trained on, the rough combination of algorithms, the people, how it’s happening. Most of the world is not there yet. Most of the people who feel it are right here. A lot more of the world is going to start feeling it. That’s going to start being intense. Right now, who feels it? You can go on Twitter and there are these GPT wrapper companies, like, “whoa, GPT-4 is going to change our business.” I’m so bearish on the wrapper companies because they’re betting on stagnation. They’re betting that you have these intermediate models and it takes so much schlep to integrate them. I’m really bearish because we’re just going to sonic boom you. We’re going to get the unhobblings. We’re going to get the drop-in remote worker. Your stuff is not going to matter. So that’s done. SF, this crowd, is paying attention now. Who is going to be paying attention in 2026 and 2027? Presumably, these are years in which hundreds of billions of capex is being spent on AI.
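The revenue claim earlier in this answer (from $10 billion to $100 billion with a six-month doubling time) can be sanity-checked with a logarithm. This is a minimal sketch; the function name is hypothetical, and reading "the end of this year" as the end of 2024 is an assumption based on the other dates mentioned in the conversation.

```python
import math


def years_to_reach(start: float, target: float, doubling_time_years: float) -> float:
    """Years needed to grow from start to target at a fixed doubling time."""
    doublings = math.log2(target / start)
    return doublings * doubling_time_years


START_REVENUE = 10e9    # "$10 billion by the end of this year"
TARGET_REVENUE = 100e9  # "$100 billion"
DOUBLING_TIME = 0.5     # "revenue doubling every six months"

years = years_to_reach(START_REVENUE, TARGET_REVENUE, DOUBLING_TIME)
print(f"{years:.2f} years")
```

A tenfold increase takes a little over three doublings, so about 1.7 years. Counting from the end of 2024 (an assumption), that lands in the second half of 2026, which fits the "maybe by 2026" estimate.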
The national security state is going to start paying a lot of attention. I hope we get to talk about that. Let’s talk about it now. What happens? What is the immediate political reaction? Looking internationally, I don’t know if Xi Jinping sees the GPT-4 news and goes, “oh, my God, look at the MMLU score on that. What are we doing about this, comrade?” So what happens when he sees a remote worker replacement and it has $100 billion in revenue? There’s a lot of businesses that have $100 billion in revenue, and people aren’t staying up all night talking about it. The question is, when does the CCP and when does the American national security establishment realize that superintelligence is going to be absolutely decisive for national power? This is where the intelligence explosion stuff comes in, which we should talk about later. You have AGI. You have this drop-in remote worker that can replace you or me, at least for remote jobs. Fairly quickly, you turn the crank one or two more times and you get a thing that’s smarter than humans. Even more than just turning the crank a few more times, one of the first jobs to be automated is going to be that of an AI researcher or engineer. If you can automate AI research, things can start going very fast. Right now, there’s already this trend of 0.5 OOMs a year of algorithmic progress. At some point, you’re going to have GPU fleets in the tens of millions for inference or more. You’re going to be able to run 100 million human equivalents of these automated AI researchers. If you can do that, you can maybe do a decade’s worth of ML research progress in a year. You get some sort of 10x speed up. You can make the jump to AI that is vastly smarter than humans within a year, a couple of years. That broadens from there. You have this initial acceleration of AI research. You apply R&D to a bunch of other fields of technology. At this point, you have a billion super intelligent researchers, engineers, technicians, everything.
They’re superbly competent at all things. They’re going to figure out robotics. We talked about that being a software problem. Well, you have a billion super smart AI researchers (smarter than the smartest human researchers) in your cluster. At some point during the intelligence explosion, they’re going to be able to figure out robotics. Again, that’ll expand. If you play this picture forward, it is fairly unlike any other technology. A couple years of lead could be utterly decisive in say, military competition. If you look at the first Gulf War, Western coalition forces had a 100:1 kill ratio. They had better sensors on their tanks. They had better precision missiles, GPS, and stealth. They had maybe 20-30 years of technological lead. They just completely crushed them. Superintelligence applied to broad fields of R&D (and the industrial explosion that comes from it, robots making a lot of material) could compress a century’s worth of technological progress into less than a decade. That means that a couple years could mean a Gulf War 1-style advantage in military affairs. That’s including a decisive advantage that even preempts nukes. How do you find nuclear stealth submarines? Right now, you have sensors and software to detect where they are. You can do that. You can find them. You have millions or billions of mosquito-sized drones, and they take out the nuclear submarines. They take out the mobile launchers. They take out the other nukes. It’s potentially enormously destabilizing and enormously important for national power. At some point people are going to realize that. Not yet, but they will. When they do, it won’t just be the AI researchers in charge. The CCP is going to have an all-out effort to infiltrate American AI labs. It’ll involve billions of dollars, thousands of people, and the full force of the Ministry of State Security. The CCP is going to try to outbuild us. They added as much power in the last decade as the entire US electric grid.
So the 100 GW cluster, at least the 100 GW part of it, is going to be a lot easier for them to get. By this point, it’s going to be an extremely intense international competition. One thing I’m uncertain about in this picture is if it’s like what you say, where it’s more of an explosion. You’ve developed an AGI. You make it into an AI researcher. For a while, you’re only using this ability to make hundreds of millions of other AI researchers. The thing that comes out of this really frenetic process is a superintelligence. Then that goes out in the world and is developing robotics and helping you take over other countries and whatever. It’s a little bit more gradual. It’s an explosion that starts narrowly. It can do cognitive jobs. The highest ROI use for cognitive jobs is to make the AI better and solve robotics. As you solve robotics, now you can do R&D in biology and other technology. Initially, you start with the factory workers. They’re wearing the glasses and AirPods, and the AI is instructing them because you can make any worker into a skilled technician. Then you have the robots come in. So this process expands. Meta’s Ray-Bans are a complement to Llama. With the fabs in the US, their constraint is skilled workers. Even if you don’t have robots, you have the cognitive superintelligence and can kind of make them all into skilled workers immediately. That’s a very brief period. Robots will come soon. Suppose this is actually how the tech progresses in the United States, maybe because these companies are already generating hundreds of billions of dollars of AI revenue. At this point, companies are borrowing hundreds of billions or more in the corporate debt markets. Why is a CCP bureaucrat, some 60-year-old guy, looking at this and going, “oh, Copilot has gotten better now” and now... This is much more than Copilot has gotten better now.
It’d require shifting the production of an entire country, dislocating energy that is otherwise being used for consumer goods or something, and feeding all that into the data centers. Part of this whole story is that you realize superintelligence is coming soon. You realize it and maybe I realize it. I’m not sure how much I realize it. Will the national security apparatus in the United States and the CCP realize it? This is a really key question. We have a few more years of mid-game. We have a few more 2023s. That just starts updating more and more people. The trend lines will become clear. You will see some amount of the COVID dynamic. COVID in February of 2020 honestly feels a lot like today. It feels like this utterly crazy thing is coming. You see the exponential and yet most of the world just doesn’t realize it. The mayor of New York is like, “go out to the shows,” and “this is just Asian racism.” At some point, people saw it and then crazy, radical reactions came. By the way, what were you doing during COVID? Was it your freshman or sophomore year? Junior. Still, you were like a 17-year-old junior or something right? Did you short the market or something? Did you sell at the right time? Yeah. So there will be a March 2020 moment. You can make the analogy you make in the series that this will cause a reaction like, “we have to do the Manhattan Project again for America here.” I wonder what the politics of this will be like. The difference here is that it’s not just like, “we need the bomb to beat the Nazis.” We’ll be building this thing that makes all our energy prices go up a bunch and it’s automating a lot of our jobs. 
With the climate change stuff, people are going to be like, “oh, my God, it’s making climate change worse and it’s helping Big Tech.” Politically, this doesn’t seem like a dynamic where the national security apparatus or the president is like, “we have to step on the gas here and make sure America wins.” Again, a lot of this really depends on how much people are feeling it and how much people are seeing it. Our generation is so used to peace, American hegemony, and nothing mattering. The historical norm is very much one of extremely intense and extraordinary things happening in the world with intense international competition. There’s been a very unique 20-year period. In World War II, something like 50% of GDP went to war production. The US borrowed over 60% of GDP. With Germany and Japan I think it was over 100%. In World War I, the UK, France, and Germany all borrowed over 100% of GDP. Much more was on the line. People talk about World War II being so destructive with 20 million Soviet soldiers dying and 20% of Poland. That happened all the time. During the Seven Years’ War something like 20-30% of Prussia died. In the Thirty Years’ War, up to 50% of a large swath of Germany died. Will people see that the stakes here are really high and that history is actually back? The American national security state thinks very seriously about stuff like this. They think very seriously about competition with China. China very much thinks of itself on this historical mission of the rejuvenation of the Chinese nation. They think a lot about national power. They think a lot about the world order. There’s a real question on timing. Do they start taking this seriously when the intelligence explosion is already happening, quite late? Do they start taking this seriously two years earlier? That matters a lot for how things play out. At some point they will and they will realize that this will be utterly decisive for not just some proxy war but for major questions.
Can liberal democracy continue to thrive? Can the CCP continue existing? That will activate forces that we haven’t seen in a long time. The great power conflict definitely seems compelling. All kinds of different things seem much more likely when you think from a historical perspective. You zoom out beyond the liberal democracy that we’ve had the pleasure to live in America for say the last 80 years. That includes things like dictatorships, war, famine, etc. I was reading The Gulag Archipelago and one of the chapters begins with Solzhenitsyn saying how if you had told a Russian citizen under the tsars that because of all these new technologies — we wouldn’t see some Great Russian revival with Russia becoming a great power and the citizens made wealthy — you would see tens of millions of Soviet citizens tortured by millions of beasts in the worst possible ways. If you’d told them that that would be the result of the 20th century, they wouldn’t have believed you. They’d have called you a slanderer. The possibilities for dictatorship with superintelligence are even crazier as well. Imagine you have a perfectly loyal military and security force. No more rebellions. No more popular uprisings. You have perfect lie detection. You have surveillance of everybody. You can perfectly figure out who’s the dissenter and weed them out. No Gorbachev who had some doubts about the system would have ever risen to power. No military coup would have ever happened. There’s a real way in which part of why things have worked out is that ideas can evolve. There’s some sense in which time heals a lot of wounds and solves a lot of debates. Throughout time, a lot of people had really strong convictions, but a lot of those have been overturned over time because there’s been continued pluralism and evolution. Imagine applying a CCP-like approach to truth where truth is what the party says. 
When you supercharge that with superintelligence, that could just be locked in and enshrined for a long time. The possibilities are pretty terrifying. To your point about history and living in America for the past 80 years, this is one of the things I took away from growing up in Germany. A lot of this stuff feels more visceral. My mother grew up in the former East, my father in the former West. They met shortly after the Wall fell. The end of the Cold War was this extremely pivotal moment for me because it’s the reason I exist. I grew up in Berlin with the former Wall. My great-grandmother, who is still alive, is very important in my life. She was born in 1934 and grew up during the Nazi era. In World War II, she saw the firebombing of Dresden from this country cottage where they were as kids. Then she spent most of her life in the East German communist dictatorship. She’d tell me about how Soviet tanks came when there was the popular uprising in 1954. Her husband was telling her to get home really quickly and get off the streets. She had a son who tried to ride a motorcycle across the Iron Curtain and then was put in a Stasi prison for a while. Finally, when she was almost 60, it was the first time she lived in a free country, and a wealthy country. When I was a kid, the thing she always really didn’t want me to do was get involved in politics. Joining a political party had very bad connotations for her. She raised me when I was young. So it doesn’t feel that long ago. It feels very close. There’s one thing I wonder about when we’re talking today about the CCP. The people in China who will be doing their version of this project will be AI researchers who are somewhat Westernized. They’ll either have gotten educated in the West or have colleagues in the West. Are they going to sign up for the CCP project that’s going to hand over control to Xi Jinping? What’s your sense of that? Fundamentally, they’re just people, right? 
Can’t you convince them about the dangers of superintelligence? Will they be in charge though? In some sense, this is also the case in the US. This is like the rapidly depreciating influence of the lab employees. Right now, the AI lab employees have so much power. You saw this November event. It’s so much power. Both are going to get automated and they’re going to lose all their power. It’ll just be a few people in charge with their armies of automated AIs. It’s also the politicians and the generals and the national security state. There are some of these classic scenes from the Oppenheimer movie. The scientists built it and then the bomb was shipped away and it was out of their hands. It’s good for lab employees to be aware of this. You have a lot of power now, but maybe not for that long. Use it wisely. I do think they would benefit from some more organs of representative democracy. What do you mean by that? In the OpenAI board events, employee power is exercised in a very direct democracy way. How some of that went about really highlighted the benefits of representative democracy and having some deliberative organs. Interesting. Let’s go back to the $100 billion revenue question. The companies are trying to build clusters that are this big. Where are they building it? Say it’s the amount of energy that would be required for a small or medium-sized US state. Does Colorado then get no power because it’s happening in the United States? Is it happening somewhere else? This is the thing that I always find funny, when you talk about Colorado getting no power. The easy way to get the power would be to displace less economically useful stuff. Buy up the aluminum smelting plant that has a gigawatt. We’re going to replace it with the data center because that’s important. That’s not actually happening because a lot of these power contracts are really locked in long-term. Also, people don’t like things like this. 
In practice what it requires, at least right now, is building new power. That might change. That’s when things get really interesting, when it’s like, “no, we’re just dedicating all of the power to the AGI.” So right now it’s building new power. 10 GW is quite doable. It’s like a few percent of US natural gas production. When you have the 10 GW training cluster, you have a lot more inference. 100 gigawatts is where it starts getting pretty wild. That’s over 20% of US electricity production. It’s pretty doable, especially if you’re willing to go for natural gas. It is incredibly important that these clusters are in the United States. Why does it matter that it’s in the US? There are some people who are trying to build clusters elsewhere. There’s a lot of free-flowing Middle Eastern money that’s trying to build clusters elsewhere. This comes back to the national security question we talked about. Would you do the Manhattan Project in the UAE? You can put the clusters in the US and you can put them in allied democracies. Once you put them in authoritarian dictatorships, you create this irreversible security risk. Once the cluster is there, it’s much easier for them to exfiltrate the weights. They can literally steal the AGI, the superintelligence. It’s like they got a direct copy of the atomic bomb. It makes it much easier for them. They have weird ties to China. They can ship that to China. That’s a huge risk. Another thing is they can just seize the compute. The issue here is people right now are thinking of this as ChatGPT, Big Tech product clusters. The clusters being planned now, three to five years out, may well be the AGI, superintelligence clusters. When things get hot, they might just seize the compute. Suppose we put 25% of the compute capacity in these Middle Eastern dictatorships. Say they seize that. Now it’s a ratio of compute of 3:1. We still have more, but even with only 25% of compute there it starts getting pretty hairy. 
3:1 is not that great of a ratio. You can do a lot with that amount of compute. Say they don’t actually do this. Even if they don’t actually seize the compute, even if they actually don’t steal the weights, there’s just a lot of implicit leverage you get. They get seats at the AGI table. I don’t know why we’re giving authoritarian dictatorships the seat at the AGI table. There’s going to be a lot of compute in the Middle East if these deals go through. First of all, who is it? Is it just every single Big Tech company trying to figure it out over there? It’s not everybody, some. There are reports, I think Microsoft. We’ll get into it. So say the UAE gets a bunch of compute because we’re building the clusters there. Let’s say they have 25% of the compute. Why does a compute ratio matter? If it’s about them being able to kick off the intelligence explosion, isn’t it just some threshold where you have 100 million AI researchers or you don’t? You can do a lot with 33 million extremely smart scientists. That might be enough to build the crazy bio weapons. Then you’re in a situation where they stole the weights and they seized the compute. Now they can make these crazy new WMDs that will be possible with superintelligence. Now you’ve just proliferated the stuff that’ll be really powerful. Also, 3x on compute isn’t actually that much. The riskiest situation is if we’re in some sort of really neck and neck, feverish international struggle. Say we’re really close with the CCP and we’re months apart. The situation we want to be in — and could be in if we play our cards right — is a little bit more like the US building the atomic bomb versus the German project years behind. If we have that, we just have so much more wiggle room to get safety right. We’re going to be building these crazy new WMDs that completely undermine nuclear deterrence. That’s so much easier to deal with if you don’t have somebody right on your tails and you have to go at maximum speed. 
You have no wiggle room. You’re worried that at any time they can overtake you. They can also just try to outbuild you. They might literally win. China might literally win if they can steal the weights, because they can outbuild you. They may have less caution, both good and bad caution in terms of whatever unreasonable regulations we have. If you’re in this really tight race, this sort of feverish struggle, that’s when there’s the greatest peril of self-destruction. Presumably the companies that are trying to build clusters in the Middle East realize this. Is it just that it’s impossible to do this in America? If you want American companies to do this at all, do you have to do it in the Middle East or not at all? Then you just have China build a Three Gorges Dam cluster. There are a few reasons. People aren’t thinking about this as the AGI superintelligence cluster. They’re just like, “ah, cool clusters for my ChatGPT.” If you’re doing ones for inference, presumably you could spread them out across the country or something. The ones they’re building, they’re going to do one training run in a single thing they’re building. It’s just hard to distinguish between inference and training compute. People can claim it’s inference compute, but they might realize that actually this is going to be useful for training compute too. Because of synthetic data and things like that? RL looks a lot like inference, for example. Or you just end up connecting them in time. It’s a lot like raw materials. It’s like placing your uranium refinement facilities there. So there are a few reasons. One, they don’t think about this as the AGI cluster. Another is just that there’s easy money coming from the Middle East. Another one is that some people think that you can’t do it in the US. We actually face a real system competition here. Some people think that only autocracies can do this, with top-down mobilization of industrial capacity and the power to get stuff done fast. 
Again, this is the sort of thing we haven’t faced in a while. But during the Cold War, there was this intense system competition. East vs. West Germany was this. It was West Germany as liberal democratic capitalism vs. state-planned communism. Now it’s obvious that the free world would win. But even as late as 1961, Paul Samuelson was predicting that the Soviet Union would outgrow the United States because they were able to mobilize industry better. So there are some people who shitpost about loving America, but then in private they’re betting against America. They’re betting against the liberal order. Basically, it’s just a bad bet. This stuff is really possible in the US. To make it possible in the US, to some degree we have to get our act together. There are basically two paths to doing it in the US. One is you just have to be willing to do natural gas. There’s ample natural gas. You put your cluster in West Texas. You put it in southwest Pennsylvania by the Marcellus Shale. The 10 GW cluster is super easy. The 100 GW cluster is also pretty doable. I think natural gas production in the United States has almost doubled in a decade. You do that one more time over the next seven years, you could power multiple trillion-dollar data centers. The issue there is that a lot of people made these climate commitments, not just the government. It’s actually the private companies themselves, Microsoft, Amazon, etc., that have made these climate commitments. So they won’t do natural gas. I admire the climate commitments, but at some point the national interest and national security is more important. The other path is doing green energy megaprojects. You do solar and batteries and SMRs and geothermal. If we want to do that, there needs to be a broad deregulatory push. You can’t have permitting take a decade. You have to reform FERC. You have to have blanket NEPA exemptions for this stuff. There are inane state-level regulations. 
You can build the solar panels and batteries next to your data center, but it’ll still take years because you actually have to hook it up to the state electrical grid. You have to use governmental powers to create rights of way to have multiple clusters and connect them and have the cables. Ideally we do both. Ideally we do natural gas and the broader deregulatory green agenda. We have to do at least one. Then this stuff is possible in the United States. Before the conversation I was reading a good book about World War II industrial mobilization in the United States called Freedom’s Forge. I’m thinking back on that period, especially in the context of reading Patrick Collison’s Fast and the progress studies stuff. There’s this narrative out there that we had state capacity back then and people just got shit done but that now it’s a clusterfuck. It wasn’t at all the case! It was really interesting. You had people from the Detroit auto industry side, like William Knudsen, who were running mobilization for the United States. They were extremely competent. At the same time you had labor organization and agitation, which is very analogous to the climate change pledges and concerns we have today. They would literally have these strikes, into 1941, costing millions of man-hours’ worth of time when we’re trying to make tens of thousands of planes a month. They would just debilitate factories for trivial concessions from capital that were pennies on the dollar. There were concerns that the auto companies were trying to use the pretext of a potential war to prevent paying labor the money it deserves. So with what climate change is today, you might think, “ah, America’s fucked. We’re not going to be able to build this shit if you look at NEPA or something.” But I didn’t realize how debilitating labor was in World War II. It wasn’t just that. Before 1939, the American military was in total shambles. You read about it and it reads a little bit like the German military today. 
Military expenditures were I think less than 2% of GDP. All the European countries had gone, even in peacetime, above 10% of GDP. It was rapid mobilization starting from nothing. We were making no planes. There were no military contracts. Everything had been starved during the Great Depression. But there was this latent capacity. At some point the United States got its act together. This applies the other way around too with China. Sometimes people count them out a little bit with the export controls and so on. They’re able to make 7-nanometer chips now. There’s a question of how many they could make. There’s at least a possibility that they’re going to mature that ability and make a lot of 7-nanometer chips. There’s a lot of latent industrial capacity in China. They are able to build a lot of power fast. Maybe that isn’t activated for AI yet. At some point, the same way the United States and a lot of people in the US government are going to wake up, the CCP is going to wake up. Companies realize that scaling is a thing. Obviously their whole plans are contingent on scaling. So they understand that in 2028 we’re going to be building 10 GW data centers. At that point, the people who can keep up are Big Tech, potentially at the edge of their capabilities, sovereign wealth fund-funded things, and also major countries like America and China. What’s their plan? With the AI labs, what’s their plan given this landscape? Do they not want the leverage of being in the United States? The Middle East does offer capital, but America has plenty of capital. We have trillion-dollar companies. What are these Middle Eastern states? They’re kind of like trillion-dollar oil companies. We have trillion-dollar companies and very deep financial markets. Microsoft could issue hundreds of billions of dollars of bonds and they can pay for these clusters. 
Another argument being made, which is worth taking seriously, is that if we don’t work with the UAE or with these Middle Eastern countries, they’re just going to go to China. They’re going to build data centers and pour money into AI regardless. If we don’t work with them, they’ll just support China. There’s some merit to the argument in the sense that we should be doing benefit-sharing with them. On the road to AGI, there should be two tiers of coalitions. There should be a narrow coalition of democracies that’s developing AGI. Then there should be a broader coalition of other countries, including dictatorships, and we should offer them some of the benefits of AI. If the UAE wants to use AI products, run Meta recommendation engines, or run the last-generation models, that’s fine. By default, they just wouldn’t have had this seat at the AGI table. So they have some money, but a lot of people have money. The only reason they’re getting this seat at the AGI table, and the only reason we’re giving these dictators this leverage over this extremely important national security technology, is that we’re getting them excited and offering it to them. Who specifically is doing this? Who are the companies who are going there to fundraise? It’s been reported that Sam Altman is trying to raise $7 trillion or whatever for a chip project. It’s unclear how many of the clusters will be there, but definitely stuff is happening. There’s another reason I’m a little suspicious of this argument that if the US doesn’t work with them, they’ll go to China. I’ve heard from multiple people (not from my time at OpenAI, and I haven’t seen the memo) that at some point several years ago, OpenAI leadership had laid out a plan to fund and sell AGI by starting a bidding war between the governments of the United States, China, and Russia. It’s surprising to me that they’re willing to sell AGI to the Chinese and Russian governments. 
There’s also something that feels eerily familiar about starting this bidding war and then playing them off each other, saying, “well, if you don’t do this, China will do it.” Interesting. That’s pretty fucked up. Suppose you’re right. We ended up in this place because, as one of our friends put it, the Middle East has billions or trillions of dollars up for persuasion like no other place in the world. With little accountability. There’s no Microsoft board. It’s only the dictator. Let’s say you’re right, that you shouldn’t have gotten them excited about AGI in the first place. Now we’re in a place where they are excited about AGI and they’re like, “fuck, we want to have GPT-5 while you’re going to be off building superintelligence. This Atoms for Peace thing doesn’t work for us.” If you’re in this place, don’t they already have the leverage? The UAE on its own is not competitive. They’re already export-controlled. You’re not supposed to ship Nvidia chips over there. It’s not like they have any of the leading AI labs. They have money, but it’s hard to just translate money into progress. But I want to go back to other things you’ve been saying in laying out your vision. There’s this almost industrial process of putting in the compute and algorithms, adding that up, and getting AGI on the other end. If it’s something more like that, then the case for somebody being able to catch up rapidly seems more compelling than if it’s some bespoke… Well, if they can steal the algorithms and if they can steal the weights, that’s really important. How easy would it be for an actor to steal the things that are not the trivial released things, like Scarlett Johansson’s voice, but the RL things we’re talking about, the unhobblings? It’s all extremely easy. They don’t make the claim that it’s hard. DeepMind put out their Frontier Safety Framework and they lay out security levels, zero to four. Four is resistant to state activity. They say, we’re at level zero. 
Just recently, there was an indictment of a guy who stole a bunch of really important AI code and went to China with it. All he had to do to steal the code was copy it, put it into Apple Notes, and export it as a PDF. That got past their monitoring. Google has the best security of any of the AI labs probably, because they have the Google infrastructure. I would think of the security of a startup. What does security of a startup look like? It’s not that good. It’s easy to steal. Even if that’s the case, a lot of your post is making the argument for why we are going to get the intelligence explosion. If we have somebody with the intuition of an Alec Radford to come up with all these ideas, that intuition is extremely valuable and you can scale that up. If it’s just intuition, then that’s not going to be just in the code, right? Also because of export controls, these countries are going to have slightly different hardware. You’re going to have to make different trade-offs and probably rewrite things to be compatible with that. Is it just a matter of getting the right pen drive and plugging it into the gigawatt data center next to the Three Gorges Dam and then you’re off to the races? There are a few different things, right? One threat model is just them stealing the weights themselves. The weights one is particularly insane because they can just steal the literal end product — just make a replica of the atomic bomb — and then they’re ready to go. That one is extremely important around the time we have AGI and superintelligence because China can build a big cluster by default. We’d have a big lead because we have the better scientists, but if we make the superintelligence and they just steal it, they’re off to the races. Weights are a little bit less important right now because who cares if they steal the GPT-4 weights. We still have to get started on weight security now because if we think there’s AGI by 2027, this stuff is going to take a while. 
It’s not just going to be like, “oh, we do some access control.” If you actually want to be resistant to Chinese espionage, it needs to be much more intense. The thing that people aren’t paying enough attention to is the secrets. The compute stuff is sexy, but people underrate the secrets. The half an order of magnitude a year is just by default, sort of algorithmic progress. That’s huge. If we have a few years of lead, by default, that’s a 10-30x, 100x bigger cluster, if we protect them. There’s this additional layer of the data wall. We have to get through the data wall. That means we actually have to figure out some sort of basic new paradigm. So it’s the “AlphaGo step two.” “AlphaGo step one” learns from human imitation. “AlphaGo step two” is the kind of self-play RL thing that everyone’s working on right now. Maybe we’re going to crack it. If China can’t steal that, then they’re stuck. If they can steal it, they’re off to the races. Whatever that thing is, can I literally write it down on the back of a napkin? If it’s that easy, then why is it so hard for them to figure it out? If it’s more about the intuitions, then don’t you just have to hire Alec Radford? What are you copying down? There are a few layers to this. At the top is the fundamental approach. On pre-training it might be unsupervised learning, next token prediction, training on the entire Internet. You actually get a lot of juice out of that already. That one’s very quick to communicate. Then there’s a lot of details that matter, and you were talking about this earlier. It’s probably going to be somewhat obvious in retrospect, or there’s going to be some not too complicated thing that’ll work, but there’s going to be a lot of details to get that. If that’s true, then again, why do we think that getting state-level security in these startups will prevent China from catching up? 
It’s just like, “oh, we know some sort of self-play RL will be required to get past the data wall.” It’s going to be solved by 2027, right? It’s not that hard. The US, and the leading labs in the United States, have this huge lead. By default, China actually has some good LLMs because they’re just using open source code, like Llama. People really underrate both the divergence on algorithmic progress and the lead the US would have by default because all this stuff was published until recently. Look at Chinchilla scaling laws, MoE papers, transformers. All that stuff was published. That’s why open source is good and why China can make some good models. Now, they’re not publishing it anymore. If we actually kept it secret, it would be a huge edge. To your point about tacit knowledge and Alec Radford, there’s another layer at the bottom that is something about large-scale engineering work to make these big training runs work. That is a little bit more like tacit knowledge, but China will be able to figure that out. It’s engineering schlep, and they’re going to figure out how to do it. Why can they figure that out, but not how to get the RL thing working? I don’t know. Germany during World War II went down the wrong path with heavy water. There’s an amazing anecdote in The Making of the Atomic Bomb about this. Secrecy was one of the most contentious issues early on. Leo Szilard really thought a nuclear chain reaction and an atomic bomb were possible. He went around saying, “this is going to be of enormous strategic and military importance.” A lot of people didn’t believe it or thought, “maybe this is possible, but I’m going to act as though it’s not, and science should be open.” In the early days, there had been some incorrect measurements made on graphite as a moderator. Germany thought graphite wasn’t going to work, so they had to do heavy water. But then Enrico Fermi made new measurements indicating that graphite would work. This was really important. 
Szilard assaulted Fermi with another secrecy appeal and Fermi was pissed off, throwing a temper tantrum. He thought it was absurd, saying, “come on, this is crazy.” But Szilard persisted, and they roped in another guy, George Pegram. In the end, Fermi didn’t publish it. That was just in time. Fermi not publishing meant that the Nazis didn’t figure out graphite would work. They went down the path of heavy water, which was the wrong path. This is a key reason why the German project didn’t work out. They were way behind. We face a similar situation now. Are we just going to instantly leak how to get past the data wall and what the next paradigm is? Or are we not? The reason this would matter is if being one year ahead would be a huge advantage. In the world where you deploy AI over time they’re just going to catch up anyway. I interviewed Richard Rhodes, the guy who wrote The Making of the Atomic Bomb. One of the anecdotes he had was when the Soviets realized America had the bomb. Obviously, we dropped it on Japan. Lavrentiy Beria, the guy who ran the NKVD and a famously ruthless and evil guy, goes to the Soviet scientist who was running their version of the Manhattan Project. He says, “comrade, you will get us the American bomb.” The guy says, “well, listen, their implosion device actually is not optimal. We should make it a different way.” Beria says, “no, you will get us the American bomb, or your family will be camp dust.” The thing that’s relevant about that anecdote is that the Soviets would have had a better bomb if they hadn’t copied the American design, at least initially. That suggests something about history, not just for the Manhattan Project. There’s often this pattern of parallel invention because the tech tree implies that a certain thing is next (in this case, self-play RL) and people work on that and are going to figure it out around the same time. There’s not going to be that much gap in who gets it first. 
Famously, a bunch of people invented the light bulb around the same time. Is it the case that it might be true but the one year or six months makes the difference? Two years makes all the difference. I don’t know if it’ll be two years though. If we lock down the labs, we have much better scientists. We’re way ahead. It would be two years. Even six months, a year, would make a huge difference. This gets back to the intelligence explosion dynamics. A year might be the difference between a system that’s sort of human-level and a system that is vastly superhuman. It might be like five OOMs. Look at the current pace. Three years ago, on the MATH benchmark (these are really difficult high school competition math problems) we were at a few percent, we couldn’t solve anything. Now it’s solved. That was at the normal pace of AI progress. You didn’t have a billion superintelligent researchers. A year is a huge difference, particularly after superintelligence. Once this is applied to many elements of R&D, you get an industrial explosion with robots and other advanced technologies. A couple of years might yield decades worth of progress. Again, it’s like the technological lead the U.S. had in the first Gulf War, when the 20-30 years of technological lead proved totally decisive. It really matters. Here’s another reason it really matters. Suppose they steal the weights, suppose they steal the algorithms, and they’re close on our tails. Suppose we still pull out ahead. We’re a little bit faster and we’re three months ahead. The world in which we’re really neck and neck, we only have a three-month lead, is incredibly dangerous. We’re in this feverish struggle where if they get ahead, they get to dominate, maybe they get a decisive advantage. They’re building clusters like crazy. They’re willing to throw all caution to the wind. We have to keep up. There are crazy new WMDs popping up. 
Then we’re going to be in the situation where it’s crazy new military technology, crazy new WMDs, deterrence, mutually assured destruction keeps changing every few weeks. It’s a completely unstable, volatile situation that is incredibly dangerous. So you have to look at it from the point of view that these technologies are dangerous, from the alignment point of view. It might be really important during the intelligence explosion to have a six-month wiggle room to be like, “look, we’re going to dedicate more compute to alignment during this period because we have to get it right. We’re feeling uneasy about how it’s going.” One of the most important inputs to whether we will destroy ourselves or whether we will get through this incredibly crazy period is whether we have that buffer. Before we go further, it’s very much worth noting that almost nobody I talk to thinks about the geopolitical implications of AI. I have some object-level disagreements that we’ll get into, things I want to iron out. I may not disagree in the end. The basic premise is that if you keep scaling, if people realize that this is where intelligence is headed, it’s not just going to be the same old world. It won’t just be about what model we’re deploying tomorrow or what the latest thing is. People on Twitter are like, “oh, GPT-4 is going to shake your expectations” or whatever. COVID is really interesting because when March 2020 hit, it became clear to the world — presidents, CEOs, media, the average person — that there are other things happening in the world right now but the main thing we as a world are dealing with right now is COVID. Soon it will be AGI. This is the quiet period. Maybe you want to go on vacation. Maybe now is the last time you can have some kids. My girlfriend sometimes complains when I’m off doing work that I don’t spend enough time with her. She threatens to replace me with GPT-6 or whatever. 
I’m like, “GPT-6 will also be too busy doing AI research.” Why aren’t other people talking about national security? I made this mistake with COVID. In February of 2020, I thought it was going to sweep the world and all the hospitals would collapse. It would be crazy, and then it’d be over. A lot of people thought this kind of thing at the beginning of COVID. They shut down their office for a month or whatever. The thing I just really didn’t price in was societal reaction. Within weeks, Congress spent over 10% of GDP on COVID measures. The entire country was shut down. It was crazy. I didn’t sufficiently price it in with COVID. Why do people underrate it? Being in the trenches actually gives you a less clear picture of the trend lines. You don’t have to zoom out that much, only a few years. When you’re in the trenches, you’re trying to get the next model to work. There’s always something that’s hard. You might underrate algorithmic progress because you’re like, “ah, things are hard right now,” or “data wall”