Transcript: Accelerating Science with AI (Paul G. Allen School)

TITLE: Accelerating Science with AI
CHANNEL: Paul G. Allen School
DATE: 2026-05-10
---TRANSCRIPT---
I’ll welcome all of you to what I think is our last distinguished lecture of the year. I’m Ed Lazowska. I’m a pseudo-retired faculty member. That means my salary is zero and I don’t go to faculty meetings anymore, which is just utterly fantastic. It’s a great life.

[laughter] So I first met our guest today, Kevin Weil, probably 20 years ago, through his dad, who’s an Allen School alum. And the goal then was to convince him to go to graduate school in computer science, which we failed at. He was an undergrad in math and physics and continued on that trajectory, but he has overcome that bad decision and had a 20-year career in the tech industry, which is quite varied. There are lots of seats over here, folks. So Kevin spent 2 years at Cooliris, 7 years at Twitter, 2 years at Instagram, 3 years at Meta, 4 years at Planet, which is pretty interesting. It’s a sort of private satellite company that uses imaging to deal with global change tracking. And then 2 years at OpenAI, where he until very recently was president of OpenAI for Science, and that’s the topic of his presentation today. Kevin also has a set of ancillary activities, which are quite amazing. He’s a member of the Council on Foreign Relations. He’s a board member of The Nature Conservancy. He’s a board member at Cisco. And he’s a Lieutenant Colonel in the Army Reserve, quite amazingly in a sort of tech corps. So please join me in welcoming Kevin Weil. Thanks for being here. [applause]

All right. Thank you, Ed. Thank you to the UW computer science department for having me. This will be a fun conversation. Since I just left OpenAI 2 weeks ago, I will probably say “we” a lot, referring to OpenAI. But that’s all right. So it’s exciting to be here to talk about AI and science, AI and scientific discovery, and in particular I want to make the point that this is not some dystopic vision of the future where the AI is going to do all the work and we’re going to sit around collecting our UBI and writing poetry. I think it’s quite the opposite, actually. I think it is the most exciting time ever to be a scientist.
The reach that every one of us has is broader because we have a collaborator that is extremely knowledgeable is infinitely patient uh knows every subfield of of basically any field of science that you want to talk about and we can go deeper in collaboration with AI. You can explore more routes of attack on a particular problem than any of us could sort of as as humans pre-AI. So I’ll I’ll tell you a story actually from a conversation that I was having that illustrates this that really kind of landed with me. So I was talking to a a mathematician a Fields Medal winning mathematician. So a mathematician at the very top of the field. And he said, you know, over the course of my career uh I have written I don’t know how many, you know, tens, hundreds of papers. And every single one of those papers there were routes that I could see to new mathematics that were branching off of the paper that I had written. But I didn’t feel like I had that I was the exact right person to to go after those problems. You know, they were a little bit outside of my area of expertise. And so I just left all of these paths unexplored my entire career, hundreds of papers. Now with AI I feel like I can go back and explore all of these routes that were unexplored because I have a collaborator who knows all of these other subfields that maybe I’m not an expert in. And to listen to a Fields medalist say he was sort of intimidated to go into these new areas, but now with AI he can just kind of illustrates that this is it’s not about AI taking over the world and doing things for us. It’s about AI giving us all superpowers. But first, I want to kind of go back and uh talk about how far we’ve come in a very short period of time. So, we’re going to go back all the way back to 3 years ago. So far back that instead of an iPhone 17 we had iPhone 14s. Uh and back then there was a model called GPT-4 and it blew my mind that GPT-4 could get a 700 on the math SAT. 
All right, that’s better than about 90% of high school students. And you can see a typical problem from the math SAT up there. All right, so let’s up the challenge. And instead of the SAT, let’s use the AIME, which is a top high school math competition. So, here the problems are meaningfully harder. You can see on the right. Uh and you can see the difference in score, too, right? We got a 9% on the AIME. Strong F. Uh we’re in mid-2024 now, by the way. So, this is about a year after uh the original GPT-4 getting 700 on the SAT. And this is GPT-4’s successor called GPT-4o. Only 9%. All right, so now let’s jump to the end of 2024 when we released our first reasoning model, which was called O1. And here we’ve had a major jump, right? O1 got 74% on the AIME. And now it’s worth pausing for a second to explain the difference uh between O1 and and all the models that came before it because O1 could reason. So, unlike GPT-3 and 4 and basically every other model at the time, which when you asked a question would just instantly give you an answer, O1 would do what you or I would do if we got asked a hard question. It wouldn’t just like spit out the first thing that came to mind. It would pause. Just like I would pause if you asked me a hard question. I would think about the problem. I would try and find different ways to attack it. I would maybe try and find similar problems that I knew how to solve, or maybe I would simplify the problem, you know, reduce it down to a smaller problem and see if it would give me intuition. So, O1 could use additional computing power not at training time, but at test time, at question time, uh to think when answering the question. And just like you or me, there’s a lot of questions that you can ask me that I can’t answer immediately, but I can answer if you give me 30 seconds, or maybe 30 minutes, maybe 30 hours, maybe 30 days for some really hard problems. 
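As a rough illustration of spending more compute at question time, here is a minimal Python sketch. It is not how O1 works; it swaps in a much simpler technique (sampling several answers and taking a majority vote) purely to show that more work per question can buy accuracy. The solver, its 40% hit rate, and all names are hypothetical.

```python
import random
from collections import Counter

CORRECT = 42  # hypothetical ground-truth answer to a single question


def noisy_solver(rng, p_correct=0.4):
    """Hypothetical stand-in for one quick model answer: right 40% of the time."""
    if rng.random() < p_correct:
        return CORRECT
    return rng.choice([7, 13, 99, 100])  # otherwise one of several wrong answers


def answer_with_budget(rng, samples):
    """Spend more compute on one question: sample several answers, keep the most common."""
    votes = Counter(noisy_solver(rng) for _ in range(samples))
    return votes.most_common(1)[0][0]


def accuracy(samples, trials=500, seed=0):
    """Fraction of simulated questions answered correctly at a given sample budget."""
    rng = random.Random(seed)
    hits = sum(answer_with_budget(rng, samples) == CORRECT for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for budget in (1, 5, 25):
        print(budget, accuracy(budget))
```

Running it prints the accuracy at each budget; in this toy setup the larger budgets should score noticeably higher than a single sample.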
So, this ability to think in response to a question was a really big step uh provided a sort of a second dimension on which to scale models. Uh you can scale the amount of pre-training you do, and you can scale the amount of of thinking that the models are allowed to do at test time. This unlocked the model’s ability to solve harder and harder problems, and you can see this big jump in about 6 months. Okay. So, now with reasoning unlocked, we’re working on perfecting uh and improving the model’s ability to reason. The next model, O3, uh is even better. It just a few months later, it gets 89% on the AIME. That is now better than almost all US high school students. But, you know, this is just a high school These are high school students. They’re very They’re pretty young, early in their career. Uh and it’s just a high school math competition. So, what if we up our game and we go after a real math competition? So, let’s talk about the IMO, the International Mathematical Olympiad. By about July of last year, so this is what, 10 months ago, we had an internal model at OpenAI that was able to achieve a gold medal on the IMO. And gold medal means you are in the top five of all results across every student that takes the IMO across the whole world. Not top 5%, but top five. And you fast forward another two months, uh another few months, so this past December, and the models are basically perfect on the AIME now. So, we’ve come a long way just in a few years from a model that couldn’t even answer some SAT math questions to a model that aces the AIME and is among the top five in the world at the top mathematics competition. All right, we we take this all in stride, but when you step back, it’s kind of an amazing pace of progress. I I think it’s actually cool how much we take this in stride. Has Has anybody actually has Raise your hand if you’ve ridden in a Waymo. So, a bunch of people. 
This is to me another example of how quickly we take these kinds of advances in stride. So, my first ride in a Waymo, you get in and the car starts driving, and my initial reaction for the first 30 seconds was like, “Oh my god, watch out for that bicycle. Oh god, it’s turning down an alley. There are cars in this alley.” And then you sort of get over that, your blood pressure comes down, and for the next few minutes, you’re just like, “This is amazing. I am in a car. I’m being driven around San Francisco by a robot. There’s nobody sitting in the front seat, and the wheel’s turning, and it all just works. This is incredible. I live in the future.” And then like 10 minutes later, I’m on my phone, bored, scrolling Twitter, doing my email, being like, “Ugh, why is this drive so long? There’s so much traffic.” So you go from this thing being miraculous, something you’ve never seen exist before, to “Okay, this is just part of life now” in a very short period of time, and I think we do the same with models. We forget how dramatically fast capabilities are evolving.

But, okay, so all of this so far has been about contest math. And contest math is somewhat formulaic. It requires some creativity, but for most of contest math you learn a few techniques, you can apply those techniques over and over again, and you can do well. It’s not like this is math research. We’re not solving novel problems. And this idea that LLMs would never do novel research was captured by the phrase “stochastic parrots.” The idea was that LLMs were just probabilistically parroting back segments of what they’ve learned from their pre-training data. And since all they do is sample from a distribution of stuff that they’ve learned somewhere, they clearly can’t do novel things. We felt differently, you’ll not be surprised to learn.
And in Q4 of this past year, we were seeing more and more examples of AI accelerating mathematics, accelerating science. And so we got together a group of about 10 different academics from outside OpenAI and we wrote a paper with them. And the idea here wasn’t to break ground with novel research. It was to put a stake in the ground to say, this is where AI is. This is where we are in the evolution of AI. And the paper is about 10 chapters, beginning with the more modest examples, showing scientists and mathematicians almost using the AI like a tool: using it to do more advanced literature search, showing how they use AI as a research partner, things like that. But then at the end there are a couple of examples of AI solving open problems. And in some of these examples, they were problems that the authors had worked on for years without a solution, and then in tandem with AI, they were suddenly able to make progress and solve them.

One of these problems was an Erdős problem. So, there’s this collection of problems named after a mathematician named Paul Erdős, who was this incredibly generative probabilist, combinatorialist, number theorist. He just has a fascinating life story that you should read. He spent much of his life on amphetamines and coffee, roaming around and working with absolutely everybody, and he left in his wake a massive trail of unsolved problems of varying difficulty, right? Some have been solved. Some are maybe just a little bit beyond the horizon. Some are major, significant open problems. They’ve all been brought under one roof now, collectively known as the Erdős problems. There are about 1,200 of them in total. The guy was just incredibly prolific. About 40% of them have been solved, so there are about 700 open Erdős problems. And so we actually solved one.
And we have a OpenAI employs a bunch of mathematicians as AI researchers. Um one of the fun things, if you’re a mathematician and an AI researcher, is you can use the AI to do math and learn from it. And a couple of the mathematicians on our team actually solved an Erdős problem using AI. Which at the time we were like, “Wow, you know, this isn’t just a small thing. This is like These are named problems. This is cool.” And then over the next month or two, so we published this paper saying, “Hey, we solved an Erdős problem.” And over the next month or two, from all over the world, there’s this huge rush as other mathematicians used, at the time it was GPT-5.2, to solve basically a lot of the low-hanging fruit uh among the Erdős problems. Um So, in some cases, they actually found, using, you know, uh really good AI literature search that the problems had been solved by other mathematicians in other papers and just nobody had realized it. In other cases, they actually were novel solutions. Um but, you know, these are these are low-hanging fruit. Terry Tao said in some cases these were these were more bottlenecked on attention than on capability, and I think he was right. So, all right. We Now Now we’ve moved on to solving some open problems, but, you know, you could criticize this and say, “Yeah, but these aren’t real research problems.” Uh and so, then there was this this this work called First Proof that uh was a play on a baking term. A group of mathematicians got together um and I’ll just quote from their paper. “We present a diverse set of 10 research-level math questions drawn from the mathematical fields of algebraic combinatorics, spectral graph theory, algebraic topology, stochastic analysis, symplectic geometry, representation theory, lattices and Lie groups, tensor analysis, and numerical linear algebra, each of which came about naturally in the research process for one of the authors. 
Each question has been solved by the author of the question with a proof that’s roughly five pages or less, but the answers are not yet posted to the internet.” So, they challenged the internet and and frontier AI companies to solve as many as they could in just 1 week, and at 1 week they were going to decrypt the the set of solutions that they’d posted. Uh and the other crucial part about this was they said, “You may not have a human in the loop. It has to be just AI.” So, we have a well-posed problem. You ask the question to the AI. See how well it can do. And of course, we chose to play. Uh some some folks were betting that no one would do better than two out of 10. Uh we used an internal unreleased model, uh and we believe that we were able to answer five out of the 10 correctly. We actually thought we’d gotten six, but when we published our solution, somebody pointed out a mistake. Um our write-up which was also done autonomously by GPT-5 um also included the problems that we didn’t think that we got right because we thought it would be interesting to show the the community you know, sort of the current state even if the the the answers weren’t quite right. Um One interesting thing by the way, I said we believe that we got five out of 10 right. That’s sort of a weird thing to say about a set of math proofs. Like we believe, why isn’t it’s right or wrong? But it it gets at the challenge of verifying these AI-created proofs, which is something I’m going to come back to you later. Um so now, if we go back, you know, we look at the the last 10 slides that we’ve that we’ve gone through here. We’ve traveled 3 years. We’ve gone from 2023 to 2026. We’ve gone from a pretty good score on the SAT to being able to autonomously do five out of 10 open research problems across a super wide spectrum of math. Like I don’t think that there’s a mathematician out there in the world that could answer five of those questions in a week because they’re so broad. 
Clearly, there are mathematicians that can answer and solve any one of those. But is there a single mathematician that can do five of those in a week? I’m not sure. Uh it which sort of points at one of the interesting and novel aspects of doing research with AI. Right? There are there are people in the audience probably here today that can do lots of things that GPT-5.5 or whatever cannot do. But I don’t think there’s anyone that knows the sheer range of research-grade mathematics that it knows. Let alone the full complement of science it knows across physics and chemistry and biology and health and economics and so on. Which is why the researchers in tandem with AI are so much more powerful than the researchers alone. The AI gives you superpowers. All right. I’m going to switch gears real quick and talk physics. Uh and here we’re not going to go all the way back to GPT-4. We’re just going to start uh a few months ago. So I was fortunate to do my undergrad in math and physics at Harvard and while I was there I looked up to a physicist named Andy Strominger. He’s one of the top string theorists in the world, one of the top particle physicist. He’s been there, you know, he’s been at that level for decades. And in particle physics you think about how particles interact. You model it with what are called scattering amplitudes which sort of give the likelihood that a certain set of particles interacts in a particular way in a particular theory. And you model these interactions with Feynman diagrams that look like the that image up there. Which give you this sort of perturbative expansion, like a Taylor series expansion, that let you calculate quantities uh with the theory. And they’re super painful. You know, every one of these vertices gives you like a four-dimensional integral. You know, each individual diagram takes many pages of calculations, takes many grad student hours. Uh ask me how I know. 
Uh Increasingly though, there are all these examples where you you do these hugely painful like combinatoric expansions of these things. Each one of them is a super messy individual answer. And then you bring them together and all the messiness cancels and you end up with something simple. Which tells you that there is some in some of these cases there’s some underlying simplicity, probably some symmetry, that you didn’t realize was there. Uh and you’ve overlooked it and you’ve ended up working way harder than you needed to. You’ve done the brute force version and there was some, you know, something more simple. So anyways, for years physicists had assumed that this particular amplitude right here where you have one negative helicity gluon coming in, interacting, and then there’s n minus one positive helicity gluons coming out they believed that was zero. You can show in the theory that if all the gluons have the same helicity, it’s definitely zero. They thought that this this example where there’s just one negative helicity gluon that that was also zero. And this was so much a part of I mean this is in textbooks. It’s just like everyone knew this was true. So much so that they called the version where there are two negative helicity gluons and n minus two positive they called it the maximally helicity violating amplitude. Like the version with two had a name and they called it maximal because the versions with zero and one were definitely zero. Right? This was true for decades. Um Andy Strominger wasn’t so sure. He thought that there was this a particular region of phase space where uh this didn’t vanish and he wanted to prove it. And the calculations though, when you started doing them, just became horrendous. So when there were just three particles, you have one of one helicity, two of the other, right? The expression is a single term, not so bad. 
Uh that term itself by the way is shorthand for something that is much more complex, but you know, still you can hide it and it’s pretty simple. The four-particle one starts to get, you know, it’s the sum of two products. The five-particle one turns into a sum of eight things, each of which is a product of three terms. When you get to n equals six, it starts to be really nasty. Um you can start to make some assumptions and simplify and, you know, but it still it gets exponentially complex as n grows and the whole point is to solve it for arbitrary n. So we had a physicist on our team, this guy Alex Lupsasca, who was a former student of Andy’s, and we had been looking for an excuse to invite Andy out and to try and test our model and do some physics together. See See how the model I started you know it’s at this point it’s starting to do interesting mathematics, how can it do on physics? Uh and so Alex and Andy uh along with some collaborators, these two guys David and Alfredo, decided that like this could be an interesting problem to try out the model on. And so we set up a date and they started to head out. On their way over, Alex starts playing with uh GPT-5.2 Pro just trying to see if it can understand this combinatorial explosion and how it could be simplified. And it turns out GPT-5.2 Pro was able to quickly predict a closed-form expression for the answer, but it couldn’t prove it. So, it said, “Ah, I think this is right, but I I can’t give you a proof.” We had a more powerful internal model that we hadn’t launched yet. And before Andy and David and Alfredo had even gotten off the plane, we’d given this problem to the internal model and it had proposed it and it had proved the closed-form uh of this of the expression for arbitrary n. So, these guys were still on the plane and we were like, uh we solved the problem. But but this is so basic, right? Why weren’t Andy and other people able to at least conjecture this before? 
I mean, it wasn’t basic to Andy or any of the guys. I mean, remember, the whole industry thought this term was gone.

... sum of products into a product of sums, which, right, is the first thing you try on any such problem.

I think there’s more to it. You should read the paper. But the interesting thing was, instead of spending the week at OpenAI together doing the work, we spent the week verifying the work, which is back to that same thing we were talking about, verifying. And anyway, so we published it. It was interesting to start seeing people’s feedback. One of the cool things to me was that, despite the fact that AI did a meaningful amount of work in the paper, it doesn’t matter at the end of the day. The important thing is the result, right? The important thing is that science is moving forward, which I think is actually also an indicator of where this is going. When technology works, it fades into the background. In the future, we’re not going to talk about results that came from AI versus results that came from humans. It’ll be like how we don’t talk about results that involve computers or involve spreadsheets. It’s just a part of how we operate, and the important part is the science. Nima, who was also a hero of mine when I was a student, also talked about how finding simplicity in complexity is something that AI is often great at, and how that points to underlying structures.

One other cool thing: there’s sort of a natural generalization of this from gluons to gravitons, where you ask the same question but about spin-2 gravitons. And Andy and Alex and Alfredo published that paper, but the way that they did this, in this case they knew that the calculation is actually more complex, there’s a bunch more stuff involved, but they said, “Hey GPT, here’s our paper on gluons.
Can you now generalize this to the graviton setting and write the paper?” And it could do the whole thing autonomously, with the input of the majority of the work having been done in the original paper, but it was able to autonomously generate the entire second paper. Obviously it was then checked by humans to make sure everything was right, but it’s a cool example of being able to reason off of an existing set of work.

Yeah. Can you talk more about the human role here? Obviously, the human poses the problem. Human checks the result. Presumably, we’re at the point in some fields where the model poses problems for itself. But what is...

...cases, and I will talk about this a bit later, but, I mean, you can probably tell by the setup: I’m a big believer in AI as a tool. It is not AI as a thing that’s going to do all the work for us. Expertise still really matters. I have most of a PhD in physics. I dropped out and started working at startups. But I know enough to be dangerous. I can’t do nearly the physics with a model that my old colleague Alex Lupsasca, who is a real physicist, can, because I am not nearly the expert that he is. So, he has an ability to ask the right question that I don’t have. He has an ability to then go back and forth with the model saying, “Well, is it that? Yeah, that’s an interesting idea, but I really think we should go in this direction.” He’s using it like he would talk to a colleague. And he can get so much more out of it than I can, at a lower level in physics. So, this is true across the board. You see the same thing with mathematicians. You see it in biology and materials science. Expertise matters. An expert in a field plus AI can get way more done than a novice or a layman in that field plus AI. All right. Real quick, I want to switch fields one more time.
Uh uh This time So, we’ll go to biology and the physical sciences. Um this is a result from a few months ago as well, and I’ll I’ll quote from the abstract. We use an autonomous lab. So, here we’re going from uh theoretical sciences like math and physics that you can do in silico to physical sciences that that involve the real world and atoms. We use an autonomous lab comprising a large language model and a fully automated cloud laboratory to optimize the cost efficiency of cell-free protein synthesis. By conducting iterative optimization, the LLM-driven autonomous lab was able to achieve a 40% reduction in the specific cost of cell-free protein synthesis relative to the state of the art. This cost reduction was accompanied by a 27% increase in protein production uh iterative experimental design, experiment execution, data capture and analysis, and data interpretation, as well as new hypothesis generation were all handled by the LLM-driven autonomous lab. So, let me show you what this looks like uh cuz I think it’s the future of physical science. So, you have an AI model that’s trained in biology and it’s given a goal, which in this case was to develop techniques for cell-free protein synthesis at high volume and at low cost. And it has as one of the tools it can use an IRL robotic lab uh that it can use to evaluate its ideas quickly. So, the AI model thinks. It designs different experimental concepts. It reasons through them. It tests them in silico as it reasons knowing what it knows. And then when it gets a set of parameters that it thinks could be a could be a sort of a valuable uh experimental setup it instructs the robotic lab to execute the experiment. The robotic arms go and run the experiment and here you can see the examples or the samples sort of like transit transiting between these different uh stations for different parts of the test. And then at the end it submits the results back to the AI model. 
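The closed loop being described (propose a setup, run it in the lab, feed the result back) can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the system from the paper: `run_experiment` is a hypothetical simulated cost function standing in for the robotic lab, `propose` is random perturbation standing in for the model, and the reagent names and numbers are invented for illustration.

```python
import random

START = {"reagent_a": 0.9, "reagent_b": 0.1}  # hypothetical starting recipe


def run_experiment(params):
    """Hypothetical stand-in for the robotic lab: returns a simulated cost.
    The quadratic shape and the target values are purely illustrative."""
    target = {"reagent_a": 0.3, "reagent_b": 0.7}
    return 1.0 + sum((params[k] - target[k]) ** 2 for k in target)


def propose(best_params, rng, step=0.1):
    """Hypothetical stand-in for the model's proposal: perturb the best known setup."""
    return {k: min(1.0, max(0.0, v + rng.uniform(-step, step)))
            for k, v in best_params.items()}


def optimize(rounds=200, batch=8, seed=0):
    """Propose a batch, run it, keep the best result, and repeat."""
    rng = random.Random(seed)
    best, best_cost = dict(START), run_experiment(START)
    history = [best_cost]
    for _ in range(rounds):
        candidates = [propose(best, rng) for _ in range(batch)]
        results = [(run_experiment(c), c) for c in candidates]
        cost, params = min(results, key=lambda r: r[0])
        if cost < best_cost:  # feed the result back into the next round
            best, best_cost = params, cost
        history.append(best_cost)
    return best, best_cost, history


if __name__ == "__main__":
    best, cost, history = optimize()
    print("start cost:", history[0], "final cost:", cost)
```

Both knobs in `optimize` scale independently: `batch` plays the role of running more robotic stations in parallel, and `rounds` the role of more iterations of model reasoning.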
The model incorporates these results uses them to refine its thinking and then comes back with an improved idea and then you iterate. So, in this way we ran 36,000 different experiments dramatically faster than a human lab could ever do. And the cool thing about this is all the parts of this are scalable. Right? You can apply more compute at the AI layer. You can use more of these robotic systems, and you can scale it out and parallel that way. Uh and so, you know, we don’t need to be limited by postdocs pipetting things. Postdocs have much higher, more interesting things that they can be doing. Robots can do it, and they can do it with scale, and they can do it with precision. And again, the postdocs, who themselves have superpowers from AI, can have more ideas, can test their ideas more quickly, and I really do think that this is the future of science. Uh so, along these lines, uh I was uh I was speaking with this professor, his name is Gaspar, who’s a linguist at Berkeley. He, by the way, is decoding the language of sperm whales using AI. It’s so cool. Uh so, did you know sperm whales have vowels? Sperm whales have complex social structures and a common language that they speak to each other. And you can actually use AI to understand patterns like vowels in their language. Um his research is awesome. You should look it up. It’s like it will blow your mind. But he said this thing, which has stuck in my head, which is AI is a metal detector for hypotheses. So, you know, as as humans, we have tons of ideas. We have way more ideas than we have time to experiment on or investigate deeply. But when you have an advanced AI model at your side, it’s this super smart thing that has read substantially every scientific paper that has been written in the last n decades across every field. It’s infinitely patient. You can direct it mercilessly to explore the pros and cons of any idea that you have. 
You can even spawn 10 of them in parallel to explore all of your ideas at once and come back with the best ones, right? It is a metal detector for hypotheses. Which is why I keep coming back to this AI as a tool for scientists, to your question, Ed. Right? It gives scientists superpowers. The models are getting really, really good. They can do some things you can’t, but also you’re better at a whole bunch of things than they are. And it’s it’s really about the two together. It’s a power tool that is ready to be guided by human expertise uh and intuition and judgment. Uh so I’m very optimistic about this, as you can tell. But I didn’t want to just talk about the stuff that works. I wanted to spend a little bit of time as we get towards wrapping up on the stuff that doesn’t. So AI needs to get a lot better. I’m very confident that it will. But it still does struggle with a number of things from a research perspective. One is verification, which I’ve come back to now a couple times. Uh so in the when I was talking about the first proof uh example, I said, “Well, we believe that we solved five out of the 10.” And the reason it’s we believe versus we know is it’s actually really subtle to tell the difference between a correct proof and an incorrect one, especially an almost correct proof. And the model can be overconfident. It does things like refer to a lemma to solve a key part of the proof. And then it turns out that lemma is you know, sometimes it doesn’t exist or sometimes the the the requirements of the lemma are slightly different than the situation. Requires very, very careful checking. Right? On the scale of five or 10 problems, yeah, you can check, but imagine and and you know, this is something that we and other labs commonly do. One of the ways you test your new model that you haven’t launched is by throwing it against all 700 unsolved Erdős problems. Right? And why wouldn’t you? This is the benefit of of having scale. 
So, the model will come back and say, “Well, I couldn’t solve 500 of those 700 problems, but I think I have solutions for 200.” And we’ll go, “Okay, no you don’t. Like, you don’t have solutions for 200 Erdős problems. We know that.” But it may have solved some that were still open. And now you need an army of mathematicians to go check 200 potential subtly correct or incorrect proofs, which takes time and is actually super annoying to the mathematicians. And so, the bottleneck has shifted, which is really interesting, right? It’s gone from a problem of attention, where many problems just go unlooked at (many of these Erdős problems probably weren’t solved because nobody had really focused on them), to a problem of verification, where an AI model and some compute will propose a solution, and now you need to figure out whether it’s correct. So, in response to this, in mathematics, at least, you have things like Lean, where you can do formal verification of problems. You can do some of this in computer science, obviously. But I think this is a general thing that we, in the field of science more broadly, are going to need to deal with. Verification is going to become much more important in a world where you have lots of AI-proposed solutions to lots of problems.

Yeah. Checking a proof should be a lot easier than coming up with it, so doesn’t this mean that where the problem actually is is that it hasn’t all been properly formalized, and that’s where things like Lean... Once it’s all formalized, checking the proof takes seconds.

Well, once it’s formalized, but the process of formalization is not trivial for a number of reasons, right?

So, the problem is in the lack of formalization.

I mean, in general, you need a way to verify it. In math, the answer is make your model really good at Lean, which a lot of the labs are doing.
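For readers who have not seen Lean, here is a minimal Lean 4 sketch of what “checking takes seconds once it’s formalized” looks like. It uses only the core library; the theorem names are made up for illustration, and the statements are deliberately trivial compared with a research-level proof.

```lean
-- Lean 4, core library only (no Mathlib assumed).

-- A concrete claim, checked by computation.
example : 2 + 2 = 4 := rfl

-- A general statement, proved by citing an existing core lemma.
-- The name `zero_add_example` is illustrative only.
theorem zero_add_example (n : Nat) : 0 + n = n :=
  Nat.zero_add n

-- Another one: commutativity of addition on natural numbers.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false claim such as `example : 2 + 2 = 5 := rfl` is rejected by the
-- checker, which is the point: acceptance does not depend on trusting
-- whoever (or whatever) wrote the proof.
```

The hard part in practice is stating the real theorem and its prerequisites in this form, not the final mechanical check.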
There are a number of startups doing it as well. But even there, Lean is a problem of building in layers. So, combinatorics, okay, there aren’t a ton of prerequisites, that’s okay. But if you start doing something in algebraic topology, where before you’re ready to define and verify some topic in fundamental groups, well, what’s a group? Well, what’s a topology? And then you have to go all the way down, and you realize Lean actually just hasn’t built up all of these structures yet. So there are a number of practical things, as well as improving AI models, to get there.

There are thousands of mathematicians doing this now. So, do you think that, as they mostly have that done, this probably, at least for math, will become a minor one?

I think it’ll take us a little while to get there. It is solvable in theory, right? And it’ll be great when you get there, because there are even today parts of mathematics that people are arguing about: is this proof correct or is it not correct? And we’re going to need it with AI, too.

The next one I’ll call unconventionality. This one’s interesting because for most of what people use AI for, you want the AI to give you the right-down-the-center-of-the-fairway answer. If you’re asking it, is 91 prime? Can you summarize this email into three bullets? Who was the third Holy Roman Emperor? You want the AI to do the high-probability thing, right? You want the quick, correct, normal answer. But when you go beyond IMO problems and get into real research problems, and especially the really hard, open, unsolved stuff, you probably don’t want it to just go right down the fairway. Every conventional angle of attack on these problems has been taken by smart humans already.
And the way that they’re going to be solved is by going off into low-probability corners of the space. And that’s actually very different from how we train AI models by default. So that is something that we’re going to need to solve. And then lastly, there’s this kind of higher-level idea of invention, right? As we talk about models going from SAT math to contest math to graduate math to open research problems, we haven’t yet seen a model solve a major open problem at the scale of a Millennium Problem. I don’t think that’ll be long, but then there’s even a notion beyond that of inventing a whole new field. Right? Like Grothendieck inventing algebraic geometry, the Langlands program proposing connections between previously unrelated fields of math, Einstein inventing general relativity from scratch. These kinds of breakthroughs go way beyond solving an individual problem and uncover whole new areas of study, new vistas to explore. AI just isn’t there yet. But we want it to be, and I think it will get there.

Why do I think that? Because AI is progressing faster than any technology that we have ever seen in our lives. This is a chart put out by a third-party organization called METR that evaluates models on essentially how long a task, measured in the time it would take humans to do it, a model can consistently do. So, as you can see, it’s literally exponential. And in the time since I made this chart, like 2 months ago, Anthropic topped us. Well, first we released 5.4, which was better than 5.2. Then Anthropic topped that, and then GPT-5.5 topped that. So, in 2 months, there are like three entries that continue this exponential curve. Periodically, people talk about AI capabilities plateauing. I’m here to tell you, I do not see them plateauing at any point in the near term.
Building AI is a surprisingly empirical science, in a way that I wouldn’t have expected before I joined OpenAI. And all the data that I see across the lab that I worked in, and what I hear from other labs, points to continued growth in both capabilities and intelligence. We have models internally that are more capable than what we’ve launched. We have pathways that we see to get more capable models. It’s just not going to stop. The way that I like to think about this, if there’s one thing you walk away from this talk with, remember this. The AI model that you use today is the worst AI model that you will ever use for the rest of your life. The AI that you’re using today is the worst AI model that you will ever use for the rest of your life. And when you think about that, it makes you very optimistic about what this is going to help us all do.

So, that’s why I’m so excited about AI in science and about how these models give us superpowers. I hope that I have shown you how models today are already dramatically helpful for scientists. They accelerate our thinking, the way that we test, the way that we calculate, the way that we discover. And they’ve come so far in just 3 years, right? It’s hard to imagine where we’re going to be in another three or four. I think that this year, 2026, will be for AI in science what 2025 was for AI in software engineering. All right, if you go back to the beginning of 2025, if you were using AI to write substantially all of your code, congratulations, you were probably an early adopter, because the models were okay at it, but not amazing. By the end of 2025, if you were not using AI to write substantially all of your code, you were probably falling behind. I think that’s going to be true for AI in science. I think it’s happening right now. If you’re using models heavily to do scientific work at the beginning of this year, you’re somewhat of an early adopter.
I think by the end of this year, if you’re not using them, you’re going to be falling behind. That’s just the pace that this is moving at, and we can use AI to accomplish more. When you think a few years out, I think it’s plausible that, accelerated by AI, we are going to be doing the science of 2050 in 2030. And I find that incredibly exciting for the world: for the lives that we’re going to save through things like personalized medicine and curing disease, for better materials that lead to more abundant energy, for what it will mean for our understanding of the universe and the world around us. Doing the science of 2050 in 2030 instead is an incredibly exciting goal, and it will take all of us, but I think we can do it. Thank you. [applause]

Time for questions. And if I could ask you to repeat the question. Oh, yeah, sure. Yeah.

So, fantastic talk. I wanted to ask you about your thoughts on the knowledge gap. As we hear about people using AI more and more to generate code, people understand less and less how this code works. If I think about your example, pretty soon we would accept that OpenAI has solved 200 problems. If mathematicians build on the assumption that these problems are solved in order to solve more problems, pretty soon we will be unable to tell the difference between what is true and what perhaps needs to be fixed. What are your thoughts on developments in this direction?

The question is, does this lead to a knowledge gap where we rely on the AI models, and as a result we sort of lose our ability to do basic things that we expect, and then that leads to other problems down the line. I respectfully think this will be less of a big deal. I think this is a continuation of the pace of scientific progress that we’ve seen.
I would imagine when people started thinking about calculators, there were similar questions about what will happen when students don’t know how to multiply and divide long numbers. I bet that most of the people in this room probably couldn’t do a lot of what was required to do computer science back when my dad was here studying it. My knowledge doesn’t go below about C. Right? I’m comfortable in C, but if you ask me to write assembler, I don’t know how to do it. If you ask me to get below that into machine-level stuff, I can only go so deep. And at the time, I probably would have been a failure as a computer science grad. We don’t worry about it a whole lot right now. People write Python and do cool things with it. So I think progress is about building on abstractions, and there will be certain people who stay deep experts in some of these things, and the vast majority of people don’t need to be, and they’ll be able to build on these abstractions and do way more than they could without them. And the other thing I think is interesting is that when you do need to know them, you have literally the best free personalized teacher that you could possibly ask for in the models themselves to go learn whatever you need to learn. So I guess I’m more optimistic that that will be less of a big deal over time.

So, there’s another aspect to Dan’s question, I think, which was people building on results that turn out to be incorrect. And that to me too seems to be a progression. Yeah. There’s already plenty of wrong stuff out there, and people build on that wrong stuff, and then at some point the rug gets pulled out from under it. So, thank you.

Yeah, yeah, yeah. That I think is very real.
The question was, what if you build on an edifice that just isn’t correct from the beginning? And that I think points at the importance of verification. You can’t skip that part. You don’t just get to vibe code science and put results out there. It’s why humans are critical in this loop. At the end of the day, the humans are responsible for the results. You as a scientist don’t get to say, “Well, the AI did it. I assume it’s right.” You have to verify that it’s right. Verification will be super important. And yeah, if you don’t get this right, the whole edifice collapses.

[As I said] earlier, I think this is a continuation of a trend, which is that we already are building on things that turn out to be incorrect. This accelerates it. And it accelerates the need for more rapid verification, because the rate at which garbage is generated is increasing exponentially.

Yeah. Well, and hopefully the AI is sort of the cause of and the solution to some of these problems. I think you can use models themselves to filter out a whole lot of AI slop. It’s a little bit like the spam problem. When we all got email, for a while spam kind of overwhelmed it, and then at some point you get it under control, and it’s okay. We can use these tools to bring this stuff under control. But it will take work. In the back.

Okay. One of the things you said that was kind of striking is that building these things is an empirical process. It’s not something you basically have rules for, or something like that. And it’s probably what everybody considers their trade secrets, how you do that. But can you, and maybe it’s just another talk at some point, but how do you guys go about this sort of stuff, figuring out what to do, where to poke at it, how to verify it?
What’s the design space, and how do you try to approach it?

Yeah, the question was, it is a very empirical science, so what’s the intuition? How do you go about experimenting and building these things? I mean, hey, that’s probably a talk I’m not allowed to give. [laughter] But it was remarkable. I mean, obviously there’s a lot of theory behind these things. It’s not that we’re just completely in the dark. You can take classes on them, you can study them, but especially as you push the frontier, it’s not obvious which direction you should go and how you push, and it is extremely empirical. If you look at the compute needed for training a model, people tend to think about these big training runs that you do, but actually most of the compute, most of the time, goes into hundreds of AI researchers running little experiments all the time, trying to see if there’s some low-hanging fruit, or if you start to see the loss curves going in the right direction when you do this particular thing on a really small model. And if you do, what if you try it on a slightly larger model? Does that trend hold? It’s just a huge amount of experimentation. And these are very smart people. As you do more experimentation, you develop intuition, and you explore the landscape in a more intelligent way. One way to see this empirically is that when you see one lab make a breakthrough, the others very quickly can follow it, right?
It’s hard to get to a breakthrough, but then it’s very easy to fast-follow, because once somebody tells you that there’s an interesting thing over here in this part of the landscape, you usually know enough to go, “Oh, okay, I know how I would maneuver myself there.” And so that to me is an indication of how empirical this stuff is. In the back.

More so a question about your background. How would you say a background in physics has helped you in your career in product?

The question was, how does a background in physics help in product? I mean, mostly not at all. Not at all directly. Indirectly, a lot of physics is about learning how to learn. And a lot of equations that you deal with in physics are too complex to solve exactly. So one of the things you learn very quickly is estimating the different parameters and working out which parameters you can drop from the equation to simplify it so that you can solve it. And that technique, considered broadly, helps you in product management: which of these things just don’t matter, so that I should ignore them to simplify the problem? But mostly I think it’s curiosity and wanting to learn and being open to learning new skills. That benefits you wherever you’re starting from and wherever you’re going; it is always a really valuable skill.

So, I think most of the people sitting here are students, so I just have a question as a student here. How can we intentionally train ourselves to have better judgment and better critical thinking to collaborate and work with AI?

Okay, the question is, how do you learn to better work with AI? I don’t know of a way other than to just dive in and do it. It’s a skill like any other. It evolves with practice. So you’ll get better the more that you do it.
And unlike a lot of other things, it’s changing week by week. So the only way to stay up with it and to stay current on the capabilities, because they’re always evolving, is to be using it regularly. We’re all kind of figuring this stuff out together as a society, because it’s all happening in real time. So I try to read what others are saying as they figure things out. I try to use it a lot myself and just do my best to ride the wave. I don’t know of any other way.

Yeah, so at each point on this curve, as we’re moving forward, the AI, or the models, started to get better than humans at certain things. I’m curious, if this trend continues, which you said will continue on this exponential, how are you so confident that humans do stay in the loop? For example, hypothesis generation: right now, maybe you push the model in a certain way, but I can imagine in 10 years you figure out ways that maybe the model just figures out what it wants to do, goes and does it, comes back with the results, and continues forever. So yeah, how are you confident, I guess, that humans should be in the loop?

Yeah, so what’s the tension? The question is, if the models keep getting better, then why aren’t they better than humans at everything all the time? Why do I think humans are involved? There are certain things that are pure intelligence, and on a lot of those things, I think the models will be superhuman, or already are in some ways, but will be dramatically superhuman over time. And then there’s lots of other things that are really hard to train into a model. The reward function for many of the things that we do, the most complex things we do, is unclear. And that means it will be hard to make a model obviously superhuman at it.
What is obviously superhuman in certain decisions? You and I might disagree, and neither of us is right. We both just have our own opinions. And so I still look at these things as tools at the end of the day, even as they become dramatically superhuman at many things. We’ve had tools that are superhuman in various ways before; calculators are superhuman at multiplying. AI tools are superhuman at a much broader set of things, but I think they’re still going to be tools at the end of the day. They’re just going to be able to take on more and more of the time.

Thank you for your thoughts on the use of AI on subjects such as mathematics and physics, things which can be verified by humans. What about areas such as macroeconomics, future foreign relations, interactions with other places, things which are not verifiable? I would appreciate your thoughts on the use of AI in those areas.

Yeah, so the question was about things like foreign relations that are much harder to verify. I think this is a good example of the question that you just asked. If you look at things like foreign relations or, in a military context, wargaming, there’s no verifiable reward function there. Nobody knows how these complex systems, nations, are going to react in various situations. AI models are super useful here. They’re very broadly used within a lot of these contexts as idea generators, as partners, as tools that can explore different hypotheses and come back with, “Hey, if this happened, then I think this would happen, and then that would happen. You have to think about this.” But that’s not to say that anybody treats them as the end answer.
It’s just that, in the same way that we all get better as we brainstorm with other people, having AI in the middle of your brainstorm, thinking through a bunch of these things, can be super valuable. But at the end of the day, this is why I think, for complex stuff like this, humans are definitely going to be in the loop. You’re going to have an AI that’s helping you think, and at the end of the day, it’s going to be your call. These things are tools, and they’re valuable tools, but they’re just tools.

Let’s do one or two more questions, and then we can continue out in the hall, okay? Sure. In the middle.

So, you said that you and your colleague, with different levels of experience with physics, will work with these models differently. With the tools, do you think that someone who doesn’t have the experience will be able to develop the kind of experience that’s needed to work with these models effectively at the level of the experienced person?

I do. Yeah, so the question was, how will people who aren’t experts in a particular thing be able to interact with the model as experts? How do you gain that expertise? So, I’ll tell you a story, actually. When some of these Erdős problems started getting knocked out, we were seeing people post on Twitter, “Hey, I solved an Erdős problem. Hey, I solved an Erdős problem.” One account posted three or four times that they had solved an Erdős problem. And unlike most of the accounts that were doing it, it wasn’t a well-known person. He had an animated character for his profile picture, and his name was not obviously his actual name. And I was like, “Who is this guy?” So I DM’d him on Twitter. I was just like, “Hey, you’re doing such cool work.
Like, tell me a little bit more about yourself, if you would.” Turns out, he’s a 20-year-old kid in college. He is the first person in his family to ever go to college. He studies mathematics, he obviously cares about this stuff, but he’s a 20-year-old kid. And I was like, “How do you know all this stuff?” Because, by the way, as I started talking to him, he was not just deep in mathematics; he also had a bunch of ideas for things that we should do with our models to make them better, so clearly he’d gone to some level of depth in AI research. I was just like, “How do you know all this stuff? I thought I was doing okay when I was 20 years old if I was passing my classes. I was not solving open mathematics problems.” And he said, “What excuse do I have for not knowing all of this stuff? Because for the last three or four years, I’ve had a free personalized tutor that knows everything in the world across any subject, that I can spend arbitrary amounts of time with to learn whatever I want. And so that’s what I’ve done.” And it makes me very optimistic. You can definitely worry about what happens if people use these things to not learn and to be lazy. But you also have examples like this kid, who by the way will be an intern at OpenAI this summer, who have taken it and used it to propel themselves to become experts in things that would be very hard for any other 20-year-old to have ever been an expert in. One more.

So, my question revolves around purpose. Right? We all remember a time before Gen AI in this room. In 2050, this room might not have that. Right? They’re going to have a different purpose. We grew up with a purpose in mind. Theirs is going to be surrounded by, “Well, here’s all the answers you need.
You don’t even have to seek purpose.” So, how do we instill purpose in a future generation to use this?

Yeah, the question is, how do we instill purpose in a world where AI is able to do lots of things for us? I’m very much an optimist here. I think humans are humans, and we are born wanting to strive; we want to create things, we want to leave the world a better place. I think AI will give us superpowers to do that. It’s not going to have all the answers. It’s not going to be able to solve every problem. The world is much more complex than that. There aren’t answers to every problem, no matter how smart you are. And I think that we are going to live in a world where AI will allow us all to create a bunch of things. Before about 5 months ago, if you weren’t an engineer and you wanted to make something digital, a website, a product, it was very hard to do. You had to know how to program. Now you don’t. Now nobody in this room, regardless of whether you’re a CS major or you don’t know how to program at all, has an excuse not to create anything you can think of, because you have these tools at your command. And they’re free. And they’re available broadly all over the world. So I think, 3 or 4 years from now, God knows what the future looks like, but these tools are going to be even more powerful. We’re going to be able to do even more things, and I think that for the right set of people, it’s going to be empowering, not the opposite.

Hey, join me in thanking Kevin, please. [applause]