Transcript

Leopold Aschenbrenner: 2027 AGI, China/US Super-Intelligence Race, and the Return of History

What will be at stake will not just be cool products, but whether liberal democracy survives, whether the CCP survives, what the world order for the next century is going to be. The CCP is going to have an all-out effort to infiltrate American AI labs. Billions of dollars, thousands of people. The CCP is going to try to out-build us. People don’t realize, like, how intense state-level espionage can be. When we have, like, literal superintelligence. They can, like, Stuxnet the Chinese data centers. You really think that will be, like, a private company? And the government wouldn’t be like, “oh my God, what is going on?” I do think it is incredibly important that these clusters are in the United States. I mean, would you do the Manhattan Project in the UAE? 2023 was the moment for me when it went from AGI as this sort of theoretical, abstract thing, and you’d make the models, to like, I see it, I feel it. I can see the cluster where it’s trained on, like the rough combination of algorithms, the people, like how it’s happening, and I think most of the world is not; most of the people who feel it are, like, right here.

Today I’m chatting with my friend Leopold Aschenbrenner. He grew up in Germany and graduated as valedictorian of Columbia when he was 19. After that, he had a very interesting gap year which we’ll talk about. Then, he was on the OpenAI superalignment team, may it rest in peace. Now, with some anchor investments from Patrick and John Collison, Daniel Gross, and Nat Friedman, he is launching an investment firm. Leopold, you’re off to a slow start but life is long. I wouldn’t worry about it too much. You’ll make up for it in due time. Thanks for coming on the podcast. Thank you. I first discovered your podcast when your best episode had a couple of hundred views. It’s been amazing to follow your trajectory. It’s a delight to be on.
In the Sholto and Trenton episode, I mentioned that a lot of the things I’ve learned about AI I’ve learned from talking with them. The third, and probably most significant, part of this triumvirate has been you. We’ll get all the stuff on the record now. Here’s the first thing I want to get on the record. Tell me about the trillion-dollar cluster. I should mention this for the context of the podcast. Today you’re releasing a series called Situational Awareness. We’re going to get into it. First question about that is, tell me about the trillion-dollar cluster. Unlike most things that have recently come out of Silicon Valley, AI is an industrial process. The next model doesn’t just require some code. It’s building a giant new cluster. It’s building giant new power plants. Pretty soon, it’s going to involve building giant new fabs. Since ChatGPT, this extraordinary techno-capital acceleration has been set into motion. Exactly a year ago today, Nvidia had their first blockbuster earnings call. It went up 25% after hours and everyone was like, “oh my God, AI is a thing.” Within a year, Nvidia data center revenue has gone from a few billion a quarter to $25 billion a quarter and continues to go up. Big Tech capex is skyrocketing. It’s funny. There’s this crazy scramble going on, but in some sense it’s just the continuation of straight lines on a graph. There’s this long-run trend of almost a decade of training compute for the largest AI systems growing by about half an order of magnitude, 0.5 OOMs a year. Just play that forward. GPT-4 was reported to have finished pre-training in 2022. On SemiAnalysis, it was rumored to have a cluster size of about 25,000 A100s. That’s roughly a $500 million cluster. Very roughly, it’s 10 megawatts. Just play that forward two years. By 2024, that’s a cluster that’s 100 MW and 100,000 H100 equivalents with costs in the billions. Play it forward two more years.
By 2026, that’s  a gigawatt, the size of a large nuclear reactor.   That’s like the power of the Hoover Dam.  That costs tens of billions of dollars   and requires a million H100 equivalents. By 2028, that’s a cluster that’s ten GW.   That’s more power than most US states.  That’s 10 million H100 equivalents,   costing hundreds of billions of dollars. By 2030, you get the trillion-dollar cluster   using 100 gigawatts, over 20% of US electricity  production. That’s 100 million H100 equivalents.  That’s just the training cluster. There are more  inference GPUs as well. Once there are products,   most of them will be inference GPUs. US power  production has barely grown for decades.   Now we’re really in for a ride. When I had Zuck on the podcast,   he was claiming not a plateau per se, but that AI  progress would be bottlenecked by this constraint   on energy. Specifically, he was like, “oh,  gigawatt data centers, are we going to build   another Three Gorges Dam or something?” According to public reports, there   are companies planning things on the scale of  a 1 GW data center. With a 10 GW data center,   who’s going to be able to build that? A 100 GW  center is like a state project. Are you going   to pump that into one physical data center? How  is it going to be possible? What is Zuck missing?  Six months ago, 10 GW was the talk of the town.  Now, people have moved on. 10 GW is happening.   There’s The Information report on OpenAI and  Microsoft planning a $100 billion cluster.  Is that 1 GW? Or is that 10 GW? I don’t know but if you try to map   out how expensive the 10 GW cluster would be,  that’s a couple of hundred billion. It’s sort   of on that scale and they’re planning it.  It’s not just my crazy take. AMD forecasted   a $400 billion AI accelerator market by 2027. AI  accelerators are only part of the expenditures.  We’re very much on track for a $1 trillion of  total AI investment by 2027. The $1 trillion   cluster will take a bit more acceleration. 
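The cluster sizes quoted above can be reproduced from the 0.5 OOMs-per-year trend. The following is a rough sketch, not a forecast: it takes the roughly 10 MW, 2022 cluster as the baseline and assumes power draw grows at the same rate as training compute, which the round numbers in the conversation imply but do not state. The function name `projected_power_mw` is made up for illustration.

```python
# Baseline quoted in the conversation: a ~10 MW training cluster in 2022.
BASE_YEAR = 2022
BASE_POWER_MW = 10.0
# Trend quoted in the conversation: about half an order of magnitude per year.
OOMS_PER_YEAR = 0.5


def projected_power_mw(year: int) -> float:
    """Cluster power in MW if the 0.5 OOM/year trend simply continues.

    Assumption (not stated in the source): power scales one-for-one
    with training compute.
    """
    return BASE_POWER_MW * 10 ** (OOMS_PER_YEAR * (year - BASE_YEAR))


if __name__ == "__main__":
    for year in range(2022, 2031, 2):
        print(f"{year}: {projected_power_mw(year):,.0f} MW")
```

Under these assumptions each two-year step is a factor of ten, which gives 100 MW in 2024, 1 GW in 2026, 10 GW in 2028, and 100 GW in 2030, matching the figures in the conversation.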
We  saw how much ChatGPT unleashed. Every generation,   the models are going to be crazy  and shift the Overton window.  Then the revenue comes in. These are  forward-looking investments. The question is,   do they pay off? Let’s estimate the GPT-4 cluster  at around $500 million. There’s a common mistake   people make, saying it was $100 million for GPT-4.  That’s just the rental price. If you’re building   the biggest cluster, you have to build and pay  for the whole cluster. You can’t just rent it   for three months. Can’t you?  Once you’re trying to get into the  hundreds of billions, you have to get   to like $100 billion a year in revenue. This  is where it gets really interesting for the   big tech companies because their revenues  are on the order of hundreds of billions.  $10 billion is fine. It’ll pay off  the 2024 size training cluster.   It’ll really be gangbusters with Big Tech when  it costs $100 billion a year. The question is   how feasible is $100 billion a year from  AI revenue? It’s a lot more than right   now. If you believe in the trajectory of  AI systems as I do, it’s not that crazy.  There are like 300 million Microsoft Office  subscribers. They have Copilot now. I don’t   know what they’re selling it for. Suppose  you sold some AI add-on for $100/month to   a third of Microsoft Office subscribers. That’d  be $100 billion right there. $100/month is a lot.  That’s a lot for a third of Office subscribers. For the average knowledge worker, it’s a few   hours of productivity a month. You have  to be expecting pretty lame AI progress to   not hit a few hours of productivity a month. Sure, let’s assume all this. What happens in   the next few years? What can the AI trained  on the 1 GW data center do? What about the   one on the 10 GW data center? Just map out  the next few years of AI progress for me.  The 10 GW range is my best guess  for when you get true AGI. Compute   is actually overrated. We’ll talk about that. 
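The Office add-on estimate above is simple multiplication, and it can be checked directly. This sketch uses only the figures quoted in the conversation (300 million subscribers, one third adopting, $100 per month); the function name `annual_revenue_usd` is hypothetical.

```python
# Figures quoted in the conversation.
OFFICE_SUBSCRIBERS = 300_000_000
ADOPTION_SHARE = 1 / 3
PRICE_PER_MONTH_USD = 100


def annual_revenue_usd(subscribers: int, share: float, monthly_price: float) -> float:
    """Yearly revenue from selling an add-on to a share of subscribers."""
    return subscribers * share * monthly_price * 12


revenue = annual_revenue_usd(OFFICE_SUBSCRIBERS, ADOPTION_SHARE, PRICE_PER_MONTH_USD)

if __name__ == "__main__":
    print(f"${revenue / 1e9:.0f} billion per year")
```

With these inputs the result is about $120 billion a year, so the "$100 billion" in the conversation is a round-number version of the same estimate.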
By 2025-2026, we’re going to get models that   are basically smarter than most college  graduates. A lot of the economic usefulness   depends on unhobbling. The models are smart  but limited. There are chatbots and then there   are things like being able to use a computer  and doing agentic long-horizon tasks.  By 2027-2028, it’ll get as smart as the smartest  experts. The unhobbling trajectory points to it   becoming much more like an agent than a chatbot.  It’ll almost be like a drop-in remote worker.  This is the question around the economic returns.  Intermediate AI systems could be really useful,   but it takes a lot of schlep to  integrate them. There’s a lot you   could do with GPT-4 or GPT-4.5 in a business  use case, but you really have to change your   workflows to make them useful. It’s a very Tyler  Cowen-esque take. It just takes a long time to   diffuse. We’re in SF and so we miss that. But in some sense, the way these systems   want to be integrated is where you get this kind  of sonic boom. Intermediate systems could have   done it, but it would have taken schlep. Before  you do the schlep to integrate them, you’ll get   much more powerful systems that are unhobbled. They’re agents, drop-in remote workers. You’re   interacting with them like coworkers. You  can do Zoom calls and Slack with them. You   can ask them to do a project and they go off and  write a first draft, get feedback, run tests on   their code, and come back. Then you can tell them  more things. That’ll be much easier to integrate.  You might need a bit of overkill to make  the transition easy and harvest the gains.  What do you mean by overkill?  Overkill on model capabilities?  Yeah, the intermediate models could do it but  it would take a lot of schlep. The drop-in   remote worker AGI can automate cognitive  tasks. The intermediate models would have   made the software engineer more productive.  But will the software engineer adopt it?  
With the 2027 model, you just don’t need the  software engineer. You can interact with it   like a software engineer, and it’ll  do the work of a software engineer.  The last episode I did was with John Schulman. I was asking about this. We have these models   that have come out in the last year and none seem  to have significantly surpassed GPT-4, certainly   not in an agentic way where they interact  with you as a coworker. They’ll brag about   a few extra points on MMLU. Even with GPT-4o,  it’s cool they can talk like Scarlett Johansson   (I guess not anymore) but  it’s not like a coworker.  It makes sense why they’d be good at answering  questions. They have data on how to complete   Wikipedia text. Where is the equivalent training  data to understand a Zoom call? Referring back to   your point about a Slack conversation,  how can it use context to figure out the   cohesive project you’re working on?  Where is that training data coming from?  A key question for AI progress in the next few  years is how hard it is to unlock the test time   compute overhang. Right now, GPT-4 can do a few  hundred tokens with chain-of-thought. That’s   already a huge improvement. Before, answering a  math question was just shotgun. If you tried to   answer a math question by saying the first thing  that comes to mind, you wouldn’t be very good.  GPT-4 thinks for a few hundred tokens. If  I think at 100 tokens a minute, that’s like   what GPT-4 does. It’s equivalent to me thinking  for three minutes. Suppose GPT-4 could think   for millions of tokens. That’s +4 OOMs on test  time compute on one problem. It can’t do it now.   It gets stuck. It writes some code. It can do a  little bit of iterative debugging, but eventually   gets stuck and can’t correct its errors. There’s a big overhang. In other areas of ML,   there’s a great paper on AlphaGo, where you can  trade off train time and test time compute. 
If you can use 4 OOMs more test time compute, that’s almost like a 3.5 OOM bigger model. Again, if it’s 100 tokens a minute, a few million tokens is a few months of working time. There’s a lot more you can do in a few months of working time than just getting an answer right now. The question is how hard is it to unlock that? In the short timelines AI world, it’s not that hard. The reason it might not be that hard is that there are only a few extra tokens to learn. You need to learn things like error correction tokens where you’re like “ah, I made a mistake, let me think about that again.” You need to learn planning tokens where it’s like “I’m going to start by making a plan. Here’s my plan of attack. I’m going to write a draft and now I’m going to critique my draft and think about it.” These aren’t things that models can do now, but the question is how hard it is. There are two paths to agents. When Sholto was on your podcast, he talked about scaling leading to more nines of reliability. That’s one path. The other path is the unhobbling path. It needs to learn this System 2 process. If it can learn that, it can use millions of tokens and think coherently. Here’s an analogy. When you drive, you’re on autopilot most of the time. Sometimes you hit a weird construction zone or intersection. Sometimes my girlfriend is in the passenger seat and I’m like “ah, be quiet for a moment, I need to figure out what’s going on.” You go from autopilot to System 2 and you’re thinking about how to do it. Scaling improves that System 1 autopilot. The brute force way to get to agents is improving that system. If you can get System 2 working, you can quickly jump to something more agentified and test time compute overhang is unlocked. What’s the reason to think this is an easy win? Is there some loss function that easily enables System 2 thinking? There aren’t many animals with System 2 thinking.
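The test-time-compute comparison earlier in this exchange converts token counts into human thinking time at 100 tokens a minute. A small sketch of that conversion follows. The 40-hour work week is an assumption added here, not a figure from the conversation, and 300 and 3,000,000 are stand-ins for "a few hundred" and "a few million" tokens.

```python
import math

# Rate quoted in the conversation: a person "thinks" at ~100 tokens a minute.
TOKENS_PER_MINUTE = 100
# Assumption for illustration only: a 40-hour working week.
HOURS_PER_WORK_WEEK = 40


def thinking_hours(tokens: int) -> float:
    """Hours of human-equivalent thinking time for a given token budget."""
    return tokens / TOKENS_PER_MINUTE / 60


def ooms_between(small: int, large: int) -> float:
    """Orders of magnitude separating two token budgets."""
    return math.log10(large / small)


if __name__ == "__main__":
    short_budget = 300          # stand-in for "a few hundred tokens"
    long_budget = 3_000_000     # stand-in for "a few million tokens"
    print(f"{short_budget} tokens: {thinking_hours(short_budget) * 60:.0f} minutes")
    weeks = thinking_hours(long_budget) / HOURS_PER_WORK_WEEK
    print(f"{long_budget} tokens: {weeks:.1f} working weeks")
    print(f"gap: {ooms_between(short_budget, long_budget):.0f} OOMs")
```

On these stand-in numbers, 300 tokens is three minutes of thought, 3 million tokens is about 500 hours or roughly three months of full-time work, and the gap between them is 4 OOMs.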
It took a long  time for evolution to give us System 2 thinking.  Pre-training has trillions of tokens of Internet  text, I get that. You match that and get   all of these free training capabilities. What’s  the reason to think this is an easy unhobbling?  First of all, pre-training is magical.  It gave us a huge advantage for models of   general intelligence because you can predict the  next token. But there’s a common misconception.   Predicting the next token lets the model learn  incredibly rich representations. Representation   learning properties are the magic of  deep learning. Rather than just learning   statistical artifacts, the models learn models  of the world. That’s why they can generalize,   because it learned the right representations. When you train a model, you have this raw bundle   of capabilities that’s useful. The unhobbling  from GPT-2 to GPT-4 took this raw mass and RLHF’d   it into a good chatbot. That was a huge win. In the original InstructGPT paper, comparing   RLHF vs. non-RLHF models it’s like a 100x model  size win on human preference rating. It started   to be able to do simple chain-of-thought and  so on. But you still have this advantage of all   these raw capabilities, and there’s still  a huge amount you’re not doing with them.  This pre-training advantage is also the difference  to robotics. People used to say it was a hardware   problem. The hardware is getting solved,  but you don’t have this huge advantage   of bootstrapping with pre-training. You don’t  have all this unsupervised learning you can do.   You have to start right away with RL self-play. The question is why RL and unhobbling might work.   Bootstrapping is an advantage. Your Twitter  bio is being pre-trained. You’re not being   pre-trained anymore. You were pre-trained in  grade school and high school. At some point,   you transition to being able to learn by yourself.  You weren’t able to do it in elementary school.   
High school is probably where it started and by  college, if you’re smart, you can teach yourself.   Models are just starting to enter that regime. It’s a little bit more scaling and then you figure   out what goes on top. It won’t be trivial. A lot  of deep learning seems obvious in retrospect.   There’s some obvious cluster of ideas. There  are some ideas that seem a little dumb but   work. There are a lot of details you have to  get right. We’re not going to get this next   month. It’ll take a while to figure out. A while for you is like half a year.  I don’t know, between six months and three  years. But it’s possible. It’s also very   related to the issue of the data wall. Here’s one  intuition on learning by yourself. Pre-training   is kind of like the teacher lecturing to  you and the words are flying by. You’re   just getting a little bit from it. That’s not what you do when you learn   by yourself. When you learn by yourself,  say you’re reading a dense math textbook,   you’re not just skimming through it once.  Some wordcels just skim through and reread   and reread the math textbook and they memorize. What you do is you read a page, think about it,   have some internal monologue going on, and have  a conversation with a study buddy. You try a   practice problem and fail a bunch of times.  At some point it clicks, and you’re like,   “this made sense.” Then you read a few more pages. We’ve kind of bootstrapped our way to   just starting to be able to do that  now with models. The question is,   can you use all this sort of self-play, synthetic  data, RL to make that thing work. Right now,   there’s in-context learning, which is super  sample efficient. In the Gemini paper, it just   learns a language in-context. Pre-training, on  the other hand, is not at all sample efficient.  What humans do is a kind of in-context  learning. You read a book, think about it,   until eventually it clicks. Then you somehow  distill that back into the weights. 
In some sense,   that’s what RL is trying to do. RL is super  finicky, but when it works it’s kind of magical.  It’s the best possible data for the model.  It’s when you try a practice problem, fail,   and at some point figure it out in a way  that makes sense to you. That’s the best   possible data for you because it’s the  way you would have solved the problem,   rather than just reading how somebody else solved  the problem, which doesn’t initially click.  By the way, if that take sounds familiar it’s  because it was part of the question I asked   John Schulman. It goes to illustrate the thing  I said in the intro. A bunch of the things I’ve   learned about AI comes from these dinners we  do before the interviews with me, you, Sholto,   and a couple of others. We’re like, “what should  I ask John Schulman, what I should ask Dario.”  Suppose this is the way things  go and we get these unhobblings—  And the scaling. You have this baseline of this  enormous force of scaling. GPT-2 was amazing. It   could string together plausible sentences, but it  could barely do anything. It was kind of like a   preschooler. GPT-4, on the other hand, could write  code and do hard math, like a smart high schooler.   This big jump in capability is explored in the  essay series. I count the orders of magnitude   of compute and scale-up of algorithmic progress. Scaling alone by 2027-2028 is going to do another   preschool to high school jump on top of GPT-4. At  a per token level, the models will be incredibly   smart. They’ll gain more reliability, and with the  addition of unhobblings, they’ll look less like   chatbots and more like agents or drop-in remote  workers. That’s when things really get going.  I want to ask more questions about this but let’s  zoom out. Suppose you’re right about this. This is   because of the 2027 cluster which is at 10 GW? 2028 is 10 GW. Maybe it’ll be pulled forward.  Something like a 5.5 level by 2027, whatever  that’s called. 
What does the world look like at   that point? You have these remote workers who can  replace people. What is the reaction to that in   terms of the economy, politics, and geopolitics? 2023 was a really interesting year to experience   as somebody who was really following the AI stuff. What were you doing in 2023?  OpenAI. When you were at OpenAI in 2023, it was a  weird thing. You almost didn’t want to talk about   AI or AGI. It was kind of a dirty word. Then  in 2023, people saw ChatGPT for the first time,   they saw GPT-4, and it just exploded. It triggered huge capital expenditures   from all these firms and an explosion in revenue  from Nvidia and so on. Things have been quiet   since then, but the next thing has been in the  oven. I expect every generation these g-forces   to intensify. People will see the models.  They won’t have counted the OOMs so they’re   going to be surprised. It’ll be kind of crazy. Revenue is going to accelerate. Suppose you   do hit $10 billion by the end of this year.  Suppose it just continues on the trajectory   of revenue doubling every six months. It’s not  actually that far from $100 billion, maybe by   2026. At some point, what happened to Nvidia  is going to happen to Big Tech. It’s going to   explode. A lot more people are going to feel it. 2023 was the moment for me where AGI went from   being this theoretical, abstract thing. I see  it, I feel it, and I see the path. I see where   it’s going. I can see the cluster it’s trained on,  the rough combination of algorithms, the people,   how it’s happening. Most of the world is not there  yet. Most of the people who feel it are right   here. A lot more of the world is going to start  feeling it. That’s going to start being intense.  Right now, who feels it? You can go on Twitter  and there are these GPT wrapper companies, like,   “whoa, GPT-4 is going to change our business.” I’m so bearish on the wrapper companies because   they’re betting on stagnation. 
They’re betting that you have these intermediate models and it takes so much schlep to integrate them. I’m really bearish because we’re just going to sonic boom you. We’re going to get the unhobblings. We’re going to get the drop-in remote worker. Your stuff is not going to matter. So that’s done. SF, this crowd, is paying attention now. Who is going to be paying attention in 2026 and 2027? Presumably, these are years in which hundreds of billions of capex is being spent on AI. The national security state is going to start paying a lot of attention. I hope we get to talk about that. Let’s talk about it now. What happens? What is the immediate political reaction? Looking internationally, I don’t know if Xi Jinping sees the GPT-4 news and goes, “oh, my God, look at the MMLU score on that. What are we doing about this, comrade?” So what happens when he sees a remote worker replacement and it has $100 billion in revenue? There’s a lot of businesses that have $100 billion in revenue, and people aren’t staying up all night talking about it. The question is, when does the CCP and when does the American national security establishment realize that superintelligence is going to be absolutely decisive for national power? This is where the intelligence explosion stuff comes in, which we should talk about later. You have AGI. You have this drop-in remote worker that can replace you or me, at least for remote jobs. Fairly quickly, you turn the crank one or two more times and you get a thing that’s smarter than humans. Even more than just turning the crank a few more times, one of the first jobs to be automated is going to be that of an AI researcher or engineer. If you can automate AI research, things can start going very fast. Right now, there’s already this trend of 0.5 OOMs a year of algorithmic progress. At some point, you’re going to have GPU fleets in the tens of millions for inference or more.
You’re going to be able to run 100 million human  equivalents of these automated AI researchers.  If you can do that, you can maybe do a decade’s  worth of ML research progress in a year. You   get some sort of 10x speed up. You can make  the jump to AI that is vastly smarter than   humans within a year, a couple of years. That broadens from there. You have this   initial acceleration of AI research. You apply  R&D to a bunch of other fields of technology. At   this point, you have a billion super intelligent  researchers, engineers, technicians, everything.   They’re superbly competent at all things. They’re going to figure out robotics. We   talked about that being a software problem. Well,  you have a billion super smart — smarter than the   smartest human researchers — AI researchers  in your cluster. At some point during the   intelligence explosion, they’re going to be able  to figure out robotics. Again, that’ll expand.  If you play this picture forward, it is fairly  unlike any other technology. A couple years of   lead could be utterly decisive in say, military  competition. If you look at the first Gulf War,   Western coalition forces had a 100:1 kill ratio.  They had better sensors on their tanks. They had   better precision missiles, GPS, and stealth.  They had maybe 20-30 years of technological   lead. They just completely crushed them. Superintelligence applied to broad fields   of R&D — and the industrial explosion that comes  from it, robots making a lot of material — could   compress a century’s worth of technological  progress into less than a decade. That means   that a couple years could mean a Gulf War 1-style  advantage in military affairs. That’s including a   decisive advantage that even preempts nukes. How do you find nuclear stealth submarines?   Right now, you have sensors and software to  detect where they are. You can do that. You   can find them. 
You have millions or billions  of mosquito-sized drones, and they take out the   nuclear submarines. They take out the mobile  launchers. They take out the other nukes.  It’s potentially enormously destabilizing  and enormously important for national power.   At some point people are going to realize  that. Not yet, but they will. When they do,   it won’t just be the AI researchers in charge. The CCP is going to have an all-out effort to   infiltrate American AI labs. It’ll involve  billions of dollars, thousands of people,   and the full force of the Ministry of State  Security. The CCP is going to try to outbuild us.  They added as much power in the last decade as an  entire US electric grid. So the 100 GW cluster,   at least the 100 GW part of it, is going to be  a lot easier for them to get. By this point,   it’s going to be an extremely  intense international competition.  One thing I’m uncertain about in this picture  is if it’s like what you say, where it’s more   of an explosion. You’ve developed an AGI. You  make it into an AI researcher. For a while,   you’re only using this ability to make hundreds  of millions of other AI researchers. The thing   that comes out of this really frenetic process  is a superintelligence. Then that goes out in the   world and is developing robotics and helping  you take over other countries and whatever.  It’s a little bit more gradual. It’s an  explosion that starts narrowly. It can do   cognitive jobs. The highest ROI use for  cognitive jobs is to make the AI better   and solve robotics. As you solve robotics, now  you can do R&D in biology and other technology.  Initially, you start with the factory workers.  They’re wearing the glasses and AirPods, and the   AI is instructing them because you can make any  worker into a skilled technician. Then you have   the robots come in. So this process expands. Meta’s Ray-Bans are a complement to Llama.  With the fabs in the US, their constraint is  skilled workers. 
Even if you don’t have robots,   you have the cognitive superintelligence  and can kind of make them all into skilled   workers immediately. That’s a very  brief period. Robots will come soon.  Suppose this is actually how the tech  progresses in the United States, maybe   because these companies are already generating  hundreds of billions of dollars of AI revenue  At this point, companies are borrowing hundreds  of billions or more in the corporate debt markets.  Why is a CCP bureaucrat, some 60-year-old  guy, looking at this and going,   “oh, Copilot has gotten better now” and now— This is much more than Copilot has gotten   better now. It’d require   shifting the production of an entire country,  dislocating energy that is otherwise being used   for consumer goods or something, and feeding all  that into the data centers. Part of this whole   story is that you realize superintelligence is  coming soon. You realize it and maybe I realize   it. I’m not sure how much I realize it. Will the national security apparatus in   the United States and the CCP realize it? This is a really key question. We have a few   more years of mid-game. We have a few more  2023s. That just starts updating more and   more people. The trend lines will become clear. You will see some amount of the COVID dynamic.   COVID in February of 2020 honestly feels a lot  like today. It feels like this utterly crazy thing   is coming. You see the exponential and yet most  of the world just doesn’t realize it. The mayor of   New York is like, “go out to the shows,” and “this  is just Asian racism.” At some point, people saw   it and then crazy, radical reactions came. By the way, what were you doing during   COVID? Was it your freshman or sophomore year? Junior.  Still, you were like a 17-year-old junior or  something right? Did you short the market or   something? Did you sell at the right time? Yeah.  So there will be a March 2020 moment. 
You can make the analogy you make in the series that this will cause a reaction like, “we have to do the Manhattan Project again for America here.” I wonder what the politics of this will be like. The difference here is that it’s not just like, “we need the bomb to beat the Nazis.” We’ll be building this thing that makes all our energy prices go up a bunch and it’s automating a lot of our jobs. The climate change stuff people are going to be like, “oh, my God, it’s making climate change worse and it’s helping Big Tech.” Politically, this doesn’t seem like a dynamic where the national security apparatus or the president is like, “we have to step on the gas here and make sure America wins.” Again, a lot of this really depends on how much people are feeling it and how much people are seeing it. Our generation is so used to peace, American hegemony and nothing matters. The historical norm is very much one of extremely intense and extraordinary things happening in the world with intense international competition. There’s a 20-year very unique period. In World War II, something like 50% of GDP went to war production. The US borrowed over 60% of GDP. With Germany and Japan I think it was over 100%. In World War I, the UK, France, and Germany all borrowed over 100% of GDP. Much more was on the line. People talk about World War II being so destructive with 20 million Soviet soldiers dying and 20% of Poland. That happened all the time. During the Seven Years’ War something like 20-30% of Prussia died. In the Thirty Years’ War, up to 50% of a large swath of Germany died. Will people see that the stakes here are really high and that history is actually back? The American national security state thinks very seriously about stuff like this. They think very seriously about competition with China. China very much thinks of itself on this historical mission of the rejuvenation of the Chinese nation.
They think a lot about national   power. They think a lot about the world order. There’s a real question on timing. Do they   start taking this seriously when the intelligence  explosion is already happening quite late. Do they   start taking this seriously two years earlier?  That matters a lot for how things play out.  At some point they will and they will realize  that this will be utterly decisive for not   just some proxy war but for major questions.  Can liberal democracy continue to thrive? Can   the CCP continue existing? That will activate  forces that we haven’t seen in a long time.  The great power conflict definitely seems  compelling. All kinds of different things   seem much more likely when you think  from a historical perspective. You   zoom out beyond the liberal democracy that  we’ve had the pleasure to live in America   for say the last 80 years. That includes  things like dictatorships, war, famine, etc.  I was reading The Gulag Archipelago and one of the  chapters begins with Solzhenitsyn saying how if   you had told a Russian citizen under the tsars  that because of all these new technologies — we   wouldn’t see some Great Russian revival with  Russia becoming a great power and the citizens   made wealthy — you would see tens of millions of  Soviet citizens tortured by millions of beasts in   the worst possible ways. If you’d told them that  that would be the result of the 20th century,   they wouldn’t have believed you.  They’d have called you a slanderer.  The possibilities for dictatorship with  superintelligence are even crazier as well.   Imagine you have a perfectly loyal military  and security force. No more rebellions. No   more popular uprisings. You have perfect lie  detection. You have surveillance of everybody.   You can perfectly figure out who’s the dissenter  and weed them out. No Gorbachev who had some   doubts about the system would have ever risen to  power. No military coup would have ever happened.  
There’s a real way in which part of why things have worked out is that ideas can evolve. There’s some sense in which time heals a lot of wounds and solves a lot of debates. Throughout time, a lot of people had really strong convictions, but a lot of those have been overturned over time because there’s been continued pluralism and evolution. Imagine applying a CCP-like approach to truth where truth is what the party says. When you supercharge that with superintelligence, that could just be locked in and enshrined for a long time. The possibilities are pretty terrifying. To your point about history and living in America for the past 80 years, this is one of the things I took away from growing up in Germany. A lot of this stuff feels more visceral. My mother grew up in the former East, my father in the former West. They met shortly after the Wall fell. The end of the Cold War was this extremely pivotal moment for me because it’s the reason I exist. I grew up in Berlin with the former Wall. My great-grandmother, who is still alive, is very important in my life. She was born in 1934 and grew up during the Nazi era. In World War II, she saw the firebombing of Dresden from this country cottage where they were as kids. Then she spent most of her life in the East German communist dictatorship. She’d tell me about how Soviet tanks came when there was the popular uprising in 1954. Her husband was telling her to get home really quickly and get off the streets. She had a son who tried to ride a motorcycle across the Iron Curtain and then was put in a Stasi prison for a while. Finally, when she’s almost 60, it was the first time she lived in a free country, and a wealthy country. When I was a kid, the thing she always really didn’t want me to do was get involved in politics. Joining a political party had very bad connotations for her. She raised me when I was young. So it doesn’t feel that long ago.
It feels very close. There’s one thing I wonder about when   we’re talking today about the CCP. The people  in China who will be doing their version of this   project will be AI researchers who are somewhat  Westernized. They’ll either have gotten educated   in the West or have colleagues in the West. Are they going to sign up for the CCP   project that’s going to hand over control to Xi  Jinping? What’s your sense of that? Fundamentally,   they’re just people, right? Can’t you convince  them about the dangers of superintelligence?  Will they be in charge though? In some  sense, this is also the case in the   US. This is like the rapidly depreciating  influence of the lab employees. Right now,   the AI lab employees have so much power. You  saw this November event. It’s so much power.  Both are going to get automated and they’re  going to lose all their power. It’ll just be   a few people in charge with their armies of  automated AIs. It’s also the politicians and   the generals and the national security  state. There are some of these classic   scenes from the Oppenheimer movie. The  scientists built it and then the bomb was   shipped away and it was out of their hands. It’s good for lab employees to be aware of   this. You have a lot of power now, but  maybe not for that long. Use it wisely.   I do think they would benefit from some  more organs of representative democracy.  What do you mean by that? In the OpenAI board events,   employee power is exercised in a very direct  democracy way. How some of that went about   really highlighted the benefits of representative  democracy and having some deliberative organs.  Interesting. Let’s go back to the $100 billion  revenue question. The companies are trying to   build clusters that are this big. Where  are they building it? Say it’s the amount   of energy that would be required for a small or  medium-sized US state. Does Colorado then get no   power because it’s happening in the United  States? 
Is it happening somewhere else?  This is the thing that I always find funny,  when you talk about Colorado getting no power.   The easy way to get the power would be to  displace less economically useful stuff.   Buy up the aluminum smelting plant that has a  gigawatt. We’re going to replace it with the   data center because that’s important. That’s not  actually happening because a lot of these power   contracts are really locked in long-term.  Also, people don’t like things like this.  In practice what it requires, at least  right now, is building new power. That   might change. That’s when things get really  interesting, when it’s like, “no, we’re just   dedicating all of the power to the AGI.” So right now it’s building new power. 10   GW is quite doable. It’s like a few percent of US  natural gas production. When you have the 10 GW   training cluster, you have a lot more inference.  100 gigawatts is where it starts getting pretty   wild. That’s over 20% of US electricity  production. It’s pretty doable, especially   if you’re willing to go for natural gas. It is incredibly important that these   clusters are in the United States. Why does it matter that it’s in the US?  There are some people who are trying to  build clusters elsewhere. There’s a lot of   free-flowing Middle Eastern money that’s trying  to build clusters elsewhere. This comes back to   the national security question we talked about.  Would you do the Manhattan Project in the UAE?  You can put the clusters in the US and you can  put them in allied democracies. Once you put them   in authoritarian dictatorships, you create this  irreversible security risk. Once the cluster is   there, it’s much easier for them to exfiltrate  the weights. They can literally steal the AGI,   the superintelligence. It’s like they got a direct  copy of the atomic bomb. It makes it much easier   for them. They have weird ties to China. They  can ship that to China. That’s a huge risk.  
Another thing is they can just seize the compute.  The issue here is people right now are thinking   of this as ChatGPT, Big Tech product  clusters. The clusters being planned now,   three to five years out, may well be the AGI,  superintelligence clusters. When things get hot,   they might just seize the compute. Suppose we put 25% of the compute   capacity in these Middle Eastern dictatorships.  Say they seize that. Now it’s a ratio of compute   of 3:1. We still have more, but even with  only 25% of compute there it starts getting   pretty hairy. 3:1 is not that great of a ratio.  You can do a lot with that amount of compute.  Say they don’t actually do this. Even if  they don’t actually seize the compute,   even if they actually don’t steal the weights,  there’s just a lot of implicit leverage you   get. They get seats at the AGI table. I  don’t know why we’re giving authoritarian   dictatorships the seat at the AGI table. There’s going to be a lot of compute   in the Middle East if these deals go through. First of all, who is it? Is it just every   single Big Tech company trying  to figure it out over there?  It’s not everybody, some. There are reports, I think   Microsoft. We’ll get into it. So say the UAE gets a bunch of   compute because we’re building the clusters there.  Let’s say they have 25% of the compute. Why does a   compute ratio matter? If it’s about them being  able to kick off the intelligence explosion,   isn’t it just some threshold where you have  100 million AI researchers or you don’t?  You can do a lot with 33 million extremely  smart scientists. That might be enough   to build the crazy bio weapons. Then  you’re in a situation where they stole   the weights and they seized the compute. Now they can make these crazy new WMDs that   will be possible with superintelligence.  Now you’ve just proliferated the stuff   that’ll be really powerful. Also, 3x  on compute isn’t actually that much.  
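[Editor's aside: the 3:1 and 33 million figures above follow from simple arithmetic. The sketch below is illustrative only. It assumes, as the conversation seems to, that the number of automated researchers scales linearly with compute, and that the "100 million" figure refers to the side holding the remaining 75%. The function and variable names are hypothetical.]

```python
from fractions import Fraction


def compute_ratio(seized_share: Fraction) -> Fraction:
    """Ratio of compute we keep to compute that was seized."""
    return (1 - seized_share) / seized_share


# 25% of total compute sits in the seized clusters (figure from the conversation).
seized_share = Fraction(1, 4)
ratio = compute_ratio(seized_share)

# Assumption: researcher count scales linearly with compute, and the
# side keeping 75% of compute runs 100 million automated researchers.
our_researchers = 100_000_000
their_researchers = int(our_researchers / ratio)

print(f"compute ratio: {ratio}:1")
print(f"their researchers: {their_researchers:,}")
```

Under those assumptions the seized quarter supports about a third as many researchers as the remaining three quarters, which is where the roughly 33 million comes from.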
The riskiest situation is if we’re in some sort  of really neck and neck, feverish international   struggle. Say we’re really close with the CCP  and we’re months apart. The situation we want   to be in — and could be in if we play our cards  right — is a little bit more like the US building   the atomic bomb versus the German project years  behind. If we have that, we just have so much   more wiggle room to get safety right. We’re going to be building these crazy   new WMDs that completely undermine nuclear  deterrence. That’s so much easier to deal   with if you don’t have somebody right on your  tails and you have to go at maximum speed.   You have no wiggle room. You’re worried  that at any time they can overtake you.  They can also just try to outbuild you. They  might literally win. China might literally win   if they can steal the weights, because they  can outbuild you. They may have less caution,   both good and bad caution in terms of  whatever unreasonable regulations we have.  If you’re in this really tight race, this  sort of feverish struggle, that’s when   there’s the greatest peril of self-destruction. Presumably the companies that are trying to build   clusters in the Middle East realize this.  Is it just that it’s impossible to do this   in America? If you want American companies  to do this at all, do you have to do it in   the Middle East or not at all? Then you just  have China build a Three Gorges Dam cluster.  There’s a few reasons. People  aren’t thinking about this as the   AGI superintelligence cluster. They’re just  like, “ah, cool clusters for my ChatGPT.”  If you’re doing ones for inference, presumably  you could spread them out across the country or   something. The ones they’re building,  they’re going to do one training run   in a single thing they’re building. It’s just hard to distinguish between   inference and training compute. 
People can claim it’s inference compute, but they might realize that actually this is going to be useful for training compute too. Because of synthetic data and things like that? RL looks a lot like inference, for example. Or you just end up connecting them in time. It’s a lot like raw materials. It’s like placing your uranium refinement facilities there. So there are a few reasons. One, they don’t think about this as the AGI cluster. Another is just that there’s easy money coming from the Middle East. Another one is that some people think that you can’t do it in the US. We actually face a real system competition here. Some people think that only autocracies can do this with top-down mobilization of industrial capacity and the power to get stuff done fast. Again, this is the sort of thing we haven’t faced in a while. But during the Cold War, there was this intense system competition. East vs. West Germany was this. It was West Germany as liberal democratic capitalism vs. state-planned communism. Now it’s obvious that the free world would win. But even as late as 1961, Paul Samuelson was predicting that the Soviet Union would outgrow the United States because they were able to mobilize industry better. So there are some people who shitpost about loving America, but then in private they’re betting against America. They’re betting against the liberal order. Basically, it’s just a bad bet. This stuff is really possible in the US. To make it possible in the US, to some degree we have to get our act together. There are basically two paths to doing it in the US. One is you just have to be willing to do natural gas. There’s ample natural gas. You put your cluster in West Texas. You put it in southwest Pennsylvania by the Marcellus Shale. The 10 GW cluster is super easy. The 100 GW cluster is also pretty doable. I think natural gas production in the United States has almost doubled in a decade.
You do   that one more time over the next seven years, you  could power multiple trillion-dollar data centers.  The issue there is that a lot of people made these  climate commitments, not just the government. It’s   actually the private companies themselves,  Microsoft, Amazon, etc., that have made   these climate commitments. So they won’t do  natural gas. I admire the climate commitments,   but at some point the national interest  and national security is more important.  The other path is doing green energy  megaprojects. You do solar and batteries   and SMRs and geothermal. If we want to do that,  there needs to be a broad deregulatory push.   You can’t have permitting take a decade. You  have to reform FERC. You have to have blanket   NEPA exemptions for this stuff. There are inane state-level regulations. You can   build the solar panels and batteries next to your  data center, but it’ll still take years because   you actually have to hook it up to the state  electrical grid. You have to use governmental   powers to create rights of way to have multiple  clusters and connect them and have the cables.  Ideally we do both. Ideally we do natural gas  and the broader deregulatory green agenda.   We have to do at least one. Then this  stuff is possible in the United States.  Before the conversation I was reading a good book  about World War II industrial mobilization in the   United States called Freedom’s Forge. I’m thinking  back on that period, especially in the context of   reading Patrick Collison’s Fast and the progress  study stuff. There’s this narrative out there that   we had state capacity back then and people just  got shit done but that now it’s a clusterfuck.  It wasn’t at all the case! It was really interesting. You had   people from the Detroit auto industry side, like  William Knudsen, who were running mobilization for   the United States. They were extremely competent.  
At the same time you had labor organization and agitation, which is very analogous to the climate change pledges and concerns we have today. They would literally have these strikes, into 1941, costing millions of man-hours worth of time when we’re trying to make tens of thousands of planes a month. They would just debilitate factories for trivial concessions from capital that were pennies on the dollar. There were concerns that the auto companies were trying to use the pretext of a potential war to avoid paying labor the money it deserves. So with what climate change is today, you might think, “ah, America’s fucked. We’re not going to be able to build this shit if you look at NEPA or something.” I didn’t realize how debilitating labor was in World War II. It wasn’t just that. Before 1939, the American military was in total shambles. You read about it and it reads a little bit like the German military today. Military expenditures were I think less than 2% of GDP. All the European countries had gone, even in peacetime, above 10% of GDP. It was rapid mobilization starting from nothing. We were making no planes. There were no military contracts. Everything had been starved during the Great Depression. But there was this latent capacity. At some point the United States got its act together. This applies the other way around too with China. Sometimes people count them out a little bit with the export controls and so on. They’re able to make 7-nanometer chips now. There’s a question of how many they could make. There’s at least a possibility that they’re going to mature that ability and make a lot of 7-nanometer chips. There’s a lot of latent industrial capacity in China. They are able to build a lot of power fast. Maybe that isn’t activated for AI yet. At some point, the same way the United States and a lot of people in the US government are going to wake up, the CCP is going to wake up.
Companies realize that scaling is a thing.   Obviously their whole plans are contingent on  scaling. So they understand that in 2028 we’re   going to be building 10 GW data centers. At that point, the people who can keep up   are Big Tech, potentially at the edge of their  capabilities, sovereign wealth fund-funded things,   and also major countries like America and China.  What’s their plan? With the AI labs, what’s their   plan given this landscape? Do they not want  the leverage of being in the United States?  The Middle East does offer capital, but America  has plenty of capital. We have trillion-dollar   companies. What are these Middle Eastern  states? They’re kind of like trillion-dollar   oil companies. We have trillion-dollar companies  and very deep financial markets. Microsoft could   issue hundreds of billions of dollars of  bonds and they can pay for these clusters.  Another argument being made, which is worth  taking seriously, is that if we don’t work   with the UAE or with these Middle Eastern  countries, they’re just going to go to China.   They’re going to build data centers and  pour money into AI regardless. If we don’t   work with them, they’ll just support China. There’s some merit to the argument in the   sense that we should be doing benefit-sharing  with them. On the road to AGI, there should be   two tiers of coalitions. There should be a narrow  coalition of democracies that’s developing AGI.   Then there should be a broader coalition of other  countries, including dictatorships, and we should   offer them some of the benefits of AI. If the UAE wants to use AI products,   run Meta recommendation engines,  or run the last-generation models,   that’s fine. By default, they just wouldn’t  have had this seat at the AGI table. So they   have some money, but a lot of people have money. 
The only reason they’re getting this seat at the   AGI table and giving these dictators this leverage  over this extremely important national security   technology, is because we’re getting  them excited and offering it to them.  Who specifically is doing this? Who are the  companies who are going there to fundraise?  It’s been reported that Sam Altman is trying  to raise $7 trillion or whatever for a chip   project. It’s unclear how many of the clusters  will be there, but definitely stuff is happening.  There’s another reason I’m a little suspicious  of this argument that if the US doesn’t work   with them, they’ll go to China. I’ve heard from  multiple people — not from my time at OpenAI,   and I haven’t seen the memo — that at some  point several years ago, OpenAI leadership   had laid out a plan to fund and sell AGI by  starting a bidding war between the governments   of the United States, China, and Russia. It’s surprising to me that they’re willing to   sell AGI to the Chinese and Russian governments.  There’s also something that feels eerily familiar   about starting this bidding war and then  playing them off each other, saying, “well,   if you don’t do this, China will do it.” Interesting. That’s pretty fucked up.  Suppose you’re right. We ended up in this place  because, as one of our friends put it, the Middle   East has billions or trillions of dollars up  for persuasion like no other place in the world.  With little accountability. There’s no  Microsoft board. It’s only the dictator.  Let’s say you’re right, that you shouldn’t  have gotten them excited about AGI in the   first place. Now we’re in a place where they  are excited about AGI and they’re like, “fuck,   we want to have GPT-5 while you’re going to be off  building superintelligence. This Atoms for Peace   thing doesn’t work for us.” If you’re in this  place, don’t they already have the leverage?  The UAE on its own is not competitive. They’re  already export-controlled. 
You’re not supposed   to ship Nvidia chips over there. It’s  not like they have any of the leading   AI labs. They have money, but it’s hard  to just translate money into progress.  But I want to go back to other things you’ve  been saying in laying out your vision. There’s   this almost industrial process of putting in  the compute and algorithms, adding that up,   and getting AGI on the other end. If it’s  something more like that, then the case for   somebody being able to catch up rapidly seems  more compelling than if it’s some bespoke…  Well, if they can steal the algorithms and if they  can steal the weights, that’s really important.  How easy would it be for an actor to steal  the things that are not the trivial released   things, like Scarlett Johansson’s voice, but the  RL things we’re talking about, the unhobblings?  It’s all extremely easy. They don’t make the claim  that it’s hard. DeepMind put out their Frontier   Safety Framework and they lay out security levels,  zero to four. Four is resistant to state activity.   They say, we’re at level zero. Just recently,  there was an indictment of a guy who stole a bunch   of really important AI code and went to China  with it. All he had to do to steal the code was   copy it, put it into Apple Notes, and export  it as a PDF. That got past their monitoring.  Google has the best security of any of the AI  labs probably, because they have the Google   infrastructure. I would think of the security of  a startup. What does security of a startup look   like? It’s not that good. It’s easy to steal. Even if that’s the case, a lot of your   post is making the argument for why we are going  to get the intelligence explosion. If we have   somebody with the intuition of an Alec Radford to  come up with all these ideas, that intuition is   extremely valuable and you can scale that up. If it’s just intuition, then that’s not going   to be just in the code, right? 
Also because  of export controls, these countries are going   to have slightly different hardware. You’re going  to have to make different trade-offs and probably   rewrite things to be compatible with that. Is it just a matter of getting the right pen   drive and plugging it into the gigawatt  data center next to the Three Gorges Dam   and then you’re off to the races? There are a few different things,   right? One threat model is just them stealing  the weights themselves. The weights one is   particularly insane because they can just steal  the literal end product — just make a replica of   the atomic bomb — and then they’re ready to go.  That one is extremely important around the time   we have AGI and superintelligence because China  can build a big cluster by default. We’d have a   big lead because we have the better scientists,  but if we make the superintelligence and they   just steal it, they’re off to the races. Weights are a little bit less important right now   because who cares if they steal the GPT-4 weights.  We still have to get started on weight security   now because if we think there’s AGI by 2027, this  stuff is going to take a while. It’s not just   going to be like, “oh, we do some access control.”  If you actually want to be resistant to Chinese   espionage, it needs to be much more intense. The thing that people aren’t paying enough   attention to is the secrets. The compute stuff is  sexy, but people underrate the secrets. The half   an order of magnitude a year is just by default,  sort of algorithmic progress. That’s huge. If we   have a few years of lead, by default, that’s a  10-30x, 100x bigger cluster, if we protect them.  There’s this additional layer of the data wall.  We have to get through the data wall. That means   we actually have to figure out some sort of  basic new paradigm. So it’s the “AlphaGo step   two.” “AlphaGo step one” learns from human  imitation. 
“AlphaGo step two” is the kind   of self-play RL thing that everyone’s working  on right now. Maybe we’re going to crack it.   If China can’t steal that, then they’re stuck.  If they can steal it, they’re off to the races.  Whatever that thing is, can I literally write it  down on the back of a napkin? If it’s that easy,   then why is it so hard for them to figure  it out? If it’s more about the intuitions,   then don’t you just have to hire Alec  Radford? What are you copying down?  There are a few layers to this. At the top is the  fundamental approach. On pre-training it might be   unsupervised learning, next token prediction,  training on the entire Internet. You actually   get a lot of juice out of that already.  That one’s very quick to communicate.  Then there’s a lot of details that matter,  and you were talking about this earlier.   It’s probably going to be somewhat obvious in  retrospect, or there’s going to be some not too   complicated thing that’ll work, but there’s  going to be a lot of details to get that.  If that’s true, then again, why do we  think that getting state-level security   in these startups will prevent China  from catching up? It’s just like, “oh,   we know some sort of self-play RL will  be required to get past the data wall.”  It’s going to be solved by  2027, right? It’s not that hard.  The US, and the leading labs in the United States,  have this huge lead. By default, China actually   has some good LLMs because they’re just using open  source code, like Llama. People really underrate   both the divergence on algorithmic progress and  the lead the US would have by default because   all this stuff was published until recently. Look at Chinchilla Scaling laws, MoE papers,   transformers. All that stuff was published.  That’s why open source is good and why China   can make some good models. Now, they’re  not publishing it anymore. If we actually   kept it secret, it would be a huge edge. 
To your point about tacit knowledge and Alec Radford, there’s another layer at the bottom that is something about large-scale engineering work to make these big training runs work. That is a little bit more like tacit knowledge, but China will be able to figure that out. It’s engineering schlep, and they’re going to figure out how to do it. Why can they figure that out, but not how to get the RL thing working? I don’t know. Germany during World War II went down the wrong path with heavy water. There’s an amazing anecdote in The Making of the Atomic Bomb about this. Secrecy was one of the most contentious issues early on. Leo Szilard really thought a nuclear chain reaction and an atomic bomb were possible. He went around saying, “this is going to be of enormous strategic and military importance.” A lot of people didn’t believe it or thought, “maybe this is possible, but I’m going to act as though it’s not, and science should be open.” In the early days, there had been some incorrect measurements made on graphite as a moderator. Germany thought graphite wasn’t going to work, so they had to do heavy water. But then Enrico Fermi made new measurements indicating that graphite would work. This was really important. Szilard assaulted Fermi with another secrecy appeal and Fermi was pissed off, throwing a temper tantrum. He thought it was absurd, saying, “come on, this is crazy.” But Szilard persisted, and they roped in another guy, George Pegram. In the end, Fermi didn’t publish it. That was just in time. Fermi not publishing meant that the Nazis didn’t figure out graphite would work. They went down the path of heavy water, which was the wrong path. This is a key reason why the German project didn’t work out. They were way behind. We face a similar situation now. Are we just going to instantly leak how to get past the data wall and what the next paradigm is? Or are we not?
The reason this would matter is if being  one year ahead would be a huge advantage.   In the world where you deploy AI over time  they’re just going to catch up anyway.  I interviewed Richard  Rhodes, the guy who wrote The   Making of the Atomic Bomb. One of the anecdotes  he had was when the Soviets realized America had   the bomb. Obviously, we dropped it in Japan. Lavrentiy Beria — the guy who ran the NKVD,   a famously ruthless and evil guy — goes to the  Soviet scientist who was running their version   of the Manhattan Project. He says, “comrade, you  will get us the American bomb.” The guy says,   “well, listen, their implosion device actually is  not optimal. We should make it a different way.”   Beria says, “no, you will get us the American  bomb, or your family will be camp dust.”  The thing that’s relevant about that anecdote is  that the Soviets would have had a better bomb if   they hadn’t copied the American design, at least  initially. That suggests something about history,   not just for the Manhattan Project. There’s  often this pattern of parallel invention   because the tech tree implies that a certain  thing is next — in this case, a self-play   RL — and people work on that and are going to  figure it out around the same time. There’s not   going to be that much gap in who gets it first. Famously, a bunch of people invented the light   bulb around the same time. Is it the case  that it might be true but the one year   or six months makes the difference? Two years makes all the difference.  I don’t know if it’ll be two years though. If we lock down the labs, we have much better   scientists. We’re way ahead. It would  be two years. Even six months, a year,   would make a huge difference. This gets back  to the intelligence explosion dynamics. A year   might be the difference between a system that’s  sort of human-level and a system that is vastly   superhuman. It might be like five OOMs. Look at the current pace. 
Three years ago,   on the math benchmark — these are really  difficult high school competition math   problems — we were at a few percent, we couldn’t  solve anything. Now it’s solved. That was at   the normal pace of AI progress. You didn’t  have a billion superintelligent researchers.  A year is a huge difference, particularly after  superintelligence. Once this is applied to many   elements of R&D, you get an industrial  explosion with robots and other advanced   technologies. A couple of years might  yield decades worth of progress. Again,   it’s like the technological lead  the U.S. had in the first Gulf War,   when the 20-30 years of technological lead  proved totally decisive. It really matters.  Here’s another reason it really matters. Suppose  they steal the weights, suppose they steal the   algorithms, and they’re close on our tails.  Suppose we still pull out ahead. We’re a   little bit faster and we’re three months ahead. The world in which we’re really neck and neck,   we only have a three-month lead, is incredibly  dangerous. We’re in this feverish struggle where   if they get ahead, they get to dominate, maybe  they get a decisive advantage. They’re building   clusters like crazy. They’re willing to throw  all caution to the wind. We have to keep up.  There are crazy new WMDs popping up. Then we’re  going to be in the situation where it’s crazy new   military technology, crazy new WMDs, deterrence,  mutually assured destruction keeps changing   every few weeks. It’s a completely unstable,  volatile situation that is incredibly dangerous.  So you have to look at it from the point of  view that these technologies are dangerous,   from the alignment point of view. It might be  really important during the intelligence explosion   to have a six-month wiggle room to be like, “look,  we’re going to dedicate more compute to alignment   during this period because we have to get it  right. 
We’re feeling uneasy about how it’s going.”  One of the most important inputs to  whether we will destroy ourselves or   whether we will get through this incredibly  crazy period is whether we have that buffer.  Before we go further, it’s very much worth noting  that almost nobody I talk to thinks about the   geopolitical implications of AI. I have some  object-level disagreements that we’ll get into,   things I want to iron out. I  may not disagree in the end.  The basic premise is that if you keep scaling, if  people realize that this is where intelligence is   headed, it’s not just going to be the same old  world. It won’t just be about what model we’re   deploying tomorrow or what the latest thing  is. People on Twitter are like, “oh, GPT-4 is   going to shake your expectations” or whatever. COVID is really interesting because when March   2020 hit, it became clear to the world  — presidents, CEOs, media, the average   person — that there are other things happening  in the world right now but the main thing we   as a world are dealing with right now is COVID. Soon it will be AGI. This is the quiet period.   Maybe you want to go on vacation. Maybe now is the  last time you can have some kids. My girlfriend   sometimes complains when I’m off doing work that  I don’t spend enough time with her. She threatens   to replace me with GPT-6 or whatever. I’m like,  “GPT-6 will also be too busy doing AI research.”  Why aren’t other people talking  about national security?  I made this mistake with COVID. In February  of 2020, I thought it was going to sweep   the world and all the hospitals would  collapse. It would be crazy, and then   it’d be over. A lot of people thought this kind  of thing at the beginning of COVID. They shut   down their office for a month or whatever. The thing I just really didn’t price in was   societal reaction. Within weeks, Congress  spent over 10% of GDP on COVID measures.   The entire country was shut down. It was crazy.  
I didn’t sufficiently price it in with COVID.  Why do people underrate it? Being in the  trenches actually gives you a less clear   picture of the trend lines. You don’t have  to zoom out that much, only a few years.  When you’re in the trenches, you’re trying  to get the next model to work. There’s always   something that’s hard. You might underrate  algorithmic progress because you’re like,   “ah, things are hard right now,” or “data wall”