Openais Chief Scientist On Continual Learning Hype Rl Beyond Code And Future Alignment Directions
read summary →TITLE: OpenAI’s Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions CHANNEL: Unsupervised Learning: Redpoint’s AI Podcast URL: https://youtu.be/vK1qEF3a3WM?si=KlITBqi7UrCFZ-xZ
---TRANSCRIPT---
I definitely agree that continual learning is really the thing. It’s really the thing that we’re building. But I don’t really think this is like a problem that’s ignored and and off the path of what we’re doing currently. I think it is what we’re working toward.
What are like the other research areas within alignment that you’re paying attention to or that you think are promising?
A lot of the like longerterm challenge with alignment is about generalization. What are the values that the model falls back on?
What are the things that you need to figure out to be able to really make models work well in some of these other spaces?
I come back to this.
Akopi is the chief scientist of OpenAI. I think literally one of the most important people on the planet. And today on Unsupervised Learning, I got to ask him literally everything that I’ve been thinking about and I know a bunch of people in the ecosystem have too. We talked a lot about model progress, what’s required to make longrunning agents work, as well as the really interesting work Open AI has done in the AI for science world and the progress he sees in that over the next years. We talked a lot about how companies should be thinking about model building in this moment, when they should be doing reinforcement learning, how they should be thinking about the evolution of harnesses and the impact that will have. We hit on a lot of his really interesting research, including the work he’s done around alignment, the work that OpenAI broadly has done around math competitions. And we also talked about this focusing moment in OpenAI and what it means for the research organization and how he runs his team. literally just such an awesome opportunity to talk to someone who is driving so much of the change that has revolutionized this space in the world. I hope folks enjoy this wide-ranging conversation as much as I did.
I feel like you are the perfect person to talk to about all the questions everyone has in the ecosystem. Uh what’s you know happening with model progress. A lot of companies are thinking about how they should be building things based on what’s happening with the models. A lot of people at a societal level are thinking about the impact AI is going to have on science and broader society. Uh and you’ve been at the forefront of the space for pretty much every generation of uh of improvement uh these past years and so really excited to have you on the podcast.
Happy to be here.
I think I’ll start with one of the mo the juiciest things you said which is you know four months ago I think you and the open team talked about aiming for a system with research level intern capabilities by September of this year. So coming up uh I think that’s uh what 6 months from now. and then a more fully automated AI researcher by March 2028. And so I guess you know checking in four months later, how are you feeling about those timelines?
Yeah, I think you know over I think over over the last months I think like the change that really happened is we’ve seen this explosive growth of coding tools. Um it’s an understatement. Yeah, we’ve definitely like really kind of gone um to a place uh in OpenAI where we use Codex for the um for the majority of um you know actual coding. Um and so I think I think for most people like the kind of the act of programming has has has changed quite a bit. Um so I definitely see this as a signal that like you know something here is on track. The other kind of like very interesting update over the last few months to me has been the progress on the math research capabilities. Uh also the results we’ve kind of seen in physics in other fields. I think I think this kind of level of capability this level of like ability to provide insight when combined with ability to access infrastructure ability to use maybe uh more computed test time that’s something that cod is using currently uh and very strong improvement in general intelligence which I also expect over over the next couple of months. Yeah, it’s something we’re still very much planning for and very focused on.
And how do you like know when you’ve you’ve gotten there? like what’s like a a workflow you might look to to say hey okay I think we’ve got these you know research intern level capabilities. The the way I would distinguish you know a research intern from from full automated researcher uh is um the kind of span of time that that we would have it work um mostly autonomously or the kind of like specificity of the task that has to be given so I don’t expect uh you know we’ll have systems where you kind of just tell them oh like you know go improve your model capability go solve alignment uh and you know and they will do it not this year you know I think we might get there at some point uh but I think for like more specific technical ideas like I I have this particular idea how to improve the models how to like you know run this evaluation differently I think I think we have the pieces that we mostly just need to put together.
Karpathy released you know a pretty viral version of of uh using some of these models to you know improve some of his uh you know obviously way less complex models than what you guys are building here but did that feel like generally in this uh you know in the spirit of of uh some of what these tools might look like.
Yeah, I think it’s in the spirit. Yeah, I mean I I I expect it to look like a pretty continual evolution uh from kind of where Codex is now. I think towards a bit more autonomy uh running for a longer time. Um but yeah, I I think I think we’ll see a lot of this sort of application. I think in general we’ll see we’ll see like more autonomous and higher compute use of these models for different things.
You mentioned kind of like the math and physics side and obviously you’ve had these really impressive breakthroughs uh in math on you know uh some interesting like different kinds of competition uh you know problems maybe you know I think for our listeners it like intuitively makes sense how progress in coding directly translates to something like you know helping with AI research how does like math and physics progress like also tie into this.
The the biggest role that like u you know focusing on these math benchmarks has played for us as as a general yeah like benchmark and and and and a northstar for like how to improve this technology. Like math is very measurable, right? It’s much easier to tell whether you’ve actually solved the math problem than whether you’ve even like produced a good uh you know piece of software and also it can get very hard right so you can have things where like it’s very definite whether you’ve solved them but it can be like arbitrarily pretty much hard to to actually solve them. You know, I would say like up until not too long ago like um you know, my perspective has been like well okay like we you know our models are not you know maybe able to solve like simple math problems. Okay, our models are able to solve simple enough problems but are not able to solve like IMO level problem. So clearly there is just like a gap in just like this uh you know intelligence of these models that like that that is very measurable very you know very easy to run at. It’s very clear what we need to do and you know and this has be kind of our northstar for like reasoning models and so forth. Now of course um that is changing quite a bit right and we are um you know we have kind of reached these milestones that we’ve been working towards of like yeah IMO goals level solving IMO problem six and you know and making forests into research level mathematics. Um and you know from this point I think I think there still is uh you know there definitely still is utility like continuing to measure progress on this I think there’s also like you know there’s definitely like transfer that that you can get from like getting better at mathematical reasoning to getting better at AI research. You know, a lot of our uh best researchers uh are uh you know mathematicians we’re training or from other kind of theoretical fields. But definitely we are uh you know we are very much uh changing how we think about you know these nerf stars and we are very focused on how the models the next models that we’re producing are actually useful in the real world you know useful you know especially for a research but also for other kind of economically valuable activities and for other uh fields of science uh and especially maybe more applied sciences. And the reason for this shift is because we believe the models are now capable enough, not as smart as people and always, but capable enough to actually materially change the economy, change how things are done. And so, uh, yeah, we feel a lot of urgency about that.
In the early days, uh, picking a domain like math that is so, uh, hard to solve, but then easily to verify whether you did it, like it’s kind of the the perfect place to get started. And I think code obviously shares a lot of attributes to that. You know, uh possible to check uh and verify and great for reinforcement learning. I think one question that a lot of people are are thinking about is okay, we’ve seen reinforcement learning work incredibly well in these domains where you can verify it rather easily. A lot of, you know, valuable tasks in the world, medicine, law, finance, you know, there’s some level of of the ability to do that, but it’s certainly not to the same extent that math and code are. And so I think a lot of people are trying to figure out, you know, are we going to see similar improvements?
Yeah, I definitely expect so. Um I think an interesting duality that we think about a lot is um you know for this more general task for these tasks are kind of harder to evaluate. They share a lot lot of common uh commonalities with um just longer horizon tasks, right? Because if you think about even like a very well specified math or coding problem again like if it’s it’s something that you need to work on for like a year then uh you know even it’s very clear what the criteria of success are in the long term like what to do on your first day of working on it is a pretty open-ended problem. And so I I kind of believe this these difficulties coincide and they’re very clearly the next the next frontier uh for for how these systems develop. And I think we’ve definitely seen very encouraging signs both on just like our ability to scale RL on these more general domains.
In these other domains, it feels like one of the hardest things to know is just what was success in a task, right? And you can imagine you know there’s going to be you know whatever the problems you are that are facing code of math that are short-term tasks and then longerterm tasks feels it will be amplified in the space that is you know outside of those right where a short-term uh legal task or medical task may be harder to run thousands of iterations on right and figure out you know was that done correctly and then those longer term tasks like even harder I’m curious like how you even conceptualize that research challenge like what are the things that need to be that you need to figure out to be able to to really make models work well in some of these other spaces.
Yeah, I think I think I I come back to this reality of like how do we make the models work for a very long time and how do we teach them to evaluate kind of partial progress. Yeah. I mean I think if if you look at like even outside of RL like like where that sort of progress on longer horizons is coming from right like I mean as the models kind of become more consistent from just like pure supervision in pre-training um they uh they gain some idea of like you know oh what what does like a good partial artifact here look like and so I think I think even if we weren’t like scaling RL very meaningfully we would see an alongation of these horizons over time yeah it’s definitely um you know a research challenge to like to figure out how to like leverage this new ideas from RL and so forth to to apply this to general domains. But I’m quite optimistic about that.
Yeah. And it’s interesting. It sounds like part of your mental model is like the models themselves being able to check progress with some at some sort of cadence that is, you know, reliable enough from the outside at least. It’s not totally clear if we’ve seen like generalization in RL yet. feels like we yeah clearly you seem to have some techniques that really optimize models around whatever we choose to focus on but it’s like almost feels like an older school version of of ML of like one one thing at a time is that like you know I guess would you agree with that characterization and like you know how do you kind of see this this current climate.
Well we are buying a lot of compute right because we we don’t I mean we still believe a bit less and we believe you know more than ever to some degree yeah we’ve seen you know, new techniques and I think new ways to scale, but like that that is kind of the the lens through which we’ve been viewing things. Yeah, I think there is a certain amount of complexity that we needs to grapple with and kind of everyone needs to grapple with because, you know, we’re no longer really like purely building like um um you know, brain the sky that’s completely isolated from the real world, right? Like if you actually you know if you want this model to do like medical research if you want it to cure cancer at some point it needs to like learn about the real world is a meaningful way you know maybe conduct some experiment and learn from its results and for that you you need to figure out how to actually connect it right and that is going to involve something that is yeah that that goes in the direction you described but I I don’t think that goes counter to actually scaling the the like finding and scaling the simple algorithms that that we’ve been developing.
I feel like I talk to a lot of companies and like I one of the main questions everyone seems to be asking these days is like should we be doing you know our own reinforcement learning like take an open source model and like we have some data on a task that people do. um we have evals cuz we know our domain pretty well like is this something that makes sense for us to do or like should we just wait for the models to continue to get better at at some of these things. you know what advice would you guess would you give for like the many builders that listen to the podcast as they think through you know uh the extent to which they invest on the on the reinforcement learning side.
Reinforcement learning definitely can be a very data efficient way to like really improve the model as some sort of task right there is a much more data efficient way of learning that we know right which is like learning in context right and this is maybe the most fundamental way that people you know teach these models you just prompt them with like examples with with with instructions for what you want I expect that learning is going to get much better over time. And so I think it definitely really matters that the models can adapt to your context. They can adapt adapt to kind of the the kind of tasks you care about. So I think that will be very important. I’m not sure if like you know replicating the kind of current a pipeline is going to be like the right way to go about it. But yeah, it’s definitely a problem that that we’re thinking about.
Yeah. So it’s almost like yeah you still have to do the work like you still should you know figure out what the eval are that matter gather the data the examples but like it may just turn out in the future you’re far better off just feeding that into this context than trying to like do anything on on you know your own model.
Yeah, I think I think that’s quite plausible. And I think that like you know obviously people have seen the success of of tools like Codex which I know you know you’ve obviously been a key part of and um and wondered like you know hey do we need to build like our own kind of you know should we build our own harnesses or our own ways of of using these things or you know uh for for our own domains whether it’s like you know uh legal or finance or or healthcare or do we kind of just like take the harnesses that the large models do um and and kind of use them within you know with with the context that we have.
Like the implementation of the harness shouldn’t really be a limitation for a very long time. I think we’ll be able to get like much more general harnesses that people can use for uh for all sorts of other domains. I mean I think codex is pretty good actually if you try using it for things beyond coding.
That’s so interesting. Like a much more general harness being something that’s almost like uh adaptive to or like just works across whatever the you know specific set of tools you have in your domain or specific set of things you want to expose to the model.
Yeah. I mean I I think and you know I think it’s also worth thinking about like you know why like you know what what what is kind of the kind of ultimate interface that we want to interact to the model with. So, so the model gives some the models gives some UI hard forensicness, right? They can build their own UIs. They can kind of do things that uh you know people would find very timeconsuming. Um but I yeah I definitely think there is also just like a lot of space to kind of enable the models to access like the current interfaces that we use for for people right. So I think like we want to have um um you know AIs on Slack for example or that that are kind of plugged into our our context and uh and yeah and are able to to learn from it and able to kind of yeah to realize this existing things right so definitely like there is some meet in the middle here but definitely I believe like longterm like uh you know like by default the AI should kind of meet you where where you are uh and if not that would be because it kind of it has new abilities, not because it has limitations.
Yeah, it’s an interesting point that basically today it feels like these harnesses are so bespoke to certain environments, but like over time as you add more and more skills and tools and models can navigate uh across those effectively, it’s like there just be a general like you know the way humans have uh that that makes a tremendous amount of sense. I guess I’m curious like you know you uh obviously I’m I’m sure like every day you see kind of crazy stuff on the research side at this point like what are the milestones that are like still meaningful to you as you think about like it would be pretty crazy if I you know uh did a run one day and saw like X or Y like what are the things you’re paying most attention to?
Yeah. Um I mean at this point it really is about um research right like is it about it is about can the model discover new things can it execute on like a longer horizon um research problem. It’s almost like looking for some sort of insight that you’re like oh someone on my team had come up with that that would I’ve been pretty intrigued by. Yeah, we we’ve actually had like some minor uh um but I think I think quite impactful ideas uh come from uh even like GPT 5.2 Pro uh that that we’re using entirely. But you know, I think it’s still very very small compared to where I expect it to be.
Yeah, I mean it seems like almost inevitably like these models are going to get better. They will be used in research. They’ll be used in science more generally. You’re like one of the first people interacting directly with these models as like research partners almost at this stage. anything like you’ve learned around the right way to do that or do you think about like what a research organization you know as these models continue to get better might look like?
Yeah, I I I think we’re definitely kind of at um at a transition point where kind of the shortterm immediate quality of the model uh is about to be a quite determining factor for the pace of our research progress because the models are going to drive a lot of that. And so that definitely requires um you know rewiring some intuitions about how to um run a research organization. Uh you know normally you kind of try to not be too focused on like immediate quality. you try to be much more focused on like the longer term. I think we have like a lot of very exciting uh stuff queued up that we are kind of working towards but I feel a lot of urgency to kind of yes to actually execute on it and to actually use this advances in model intelligence to um accelerate research on the AI and especially AI alignment.
Yeah, it’s such a fascinating point because I’ve heard you talk before about running a research organization and I feel like in the past it was like giving people the space to, you know, pursue a lot of things that weren’t like directly, you know, hey, this is for a month or two months of progress, but it’s like what are the ideas that are really going to drive things forward, but it makes total sense that we’re in a time now where uh you’re like, look, everything we do will be so much better if we just focus on this in the in the short term and make it better.
Right now you have um you know a a ton of compute as a company, but you obviously you have great scaling laws on the pre-training side, you have great scaling on the RL side, you have probably lots of experiments going on that have nothing to do with either of those vectors, but are like interesting new ways. How do you even think about like allocating compute across all of this stuff?
Yeah, it can get very complicated, right? Because there’s so many things that we need to do. One thing we’ve been one kind of discipline we’ve started keeping is we um we try to make sure we just like explicitly budget like a large chunk of our compute to the most scalable methods to the things that we believe are the most responsible for driving general model intelligence. And you know even if it’s not the most efficient allocation of compute at all times because you know if you’re allocating so much compute to like one experiment or like one set of experiments you know there’s so many things you can accelerate a little bit of that compute elsewhere. Uh but you know but I think it’s easy to kind of like with all the all the interesting and important things that we’re doing I think it’ll be very easy to kind of partner all of it and like not not really end up doing the things that we believe are most important. You definitely want to like understand the kind of empirical evidence. You definitely want to make sure your evaluations are in order and the kind of experimental rigor is there. And then you also want to apply some regularization based on like okay do we understand this method? Do we actually expect it will scale? Do we expect this is something you can actually build on in the future? Is this kind of a one-off? Right. And I think and based on that uh determine the priority.
Obviously the the place where we talked about Codex a lot and and the success of coding and it feels like you know last year was like the year of just incredible hill climbing on on coding. I’m curious you know obviously Codex has been a super successful product in many ways like Anthropic was kind of first to this market you know Claude Code you know was it was a dominant product there. What do you kind of like you know reflecting on that I guess like what do you make of the success Anthropic’s had in this space.
Yeah I think I think it’s a matter of you know really focusing your product direction on where where you believe the kind of the the next application of the technology is right and um you know if you look at the kind of prioritization we’ve had on the on our product right I mean we have been right like working on on cutting products but they have kind of been like a secondary thing right compared to like our main priorities and the interesting thing is that is not very reflective of like the priorities of the research organization within OpenAI. Uh I think you know given that like we’ve kind of had this you know explosive success of ChatGPT you know charging as it was you know I I think charging quite a bit and it’s going to evolve quite a bit but as it was in 23 right is this particular you know product that’s maybe not, you know, I think it’s definitely quite aligned with our vision of like where AI is going, but but like it’s not really like the like representative of like everything that that that that it enables. And so the majority of like our work in research has been focused on like that that future thing. And I think increasingly it has decoupled from our our our kind of like short-term product strategies. I’m very kind of um confident about um the things we’ve been building and the things we we we are building on on on the research on the model intelligence side. You know, a lot of our our reprioritization and increased focus on the on the product side is about actually kind of getting to deploy them and the belief that actually they are uh the thing that really matters now.
What do you think will look different in their lives or like how will they be using Codex in you know three six months?
I would expect um just a a gradual increase in just the level of autonomy uh you feel comfortable giving the model just the the vagueness of description that can work with you know the level of supervision it needs. I think we’re not very far for models that can work autonomously for a couple days. Um maybe use quite a bit more computer than they’re using now and produce much higher quality artifacts on their own.
Do you have a gut instinct on like what like you know there’s always been this question of like will the world you know do you need that software engineering skill set to supervise these models running for a few days or like hey does it turn out at some point of like being able to run for a while you know anybody can can use coding agents and supervise them to to some sort of output.
I mean I think definitely for like a lot of outputs you already don’t need much experience right I think I think still the distinction I would draw between like you know an intern here and like really an autonomous researcher software engineer would be that like if you want to build something bigger like you know you probably still want to apply supervision you still kind of want to have like an overarching thing you want to recognize like what what what building blocks fit in and what which don’t but yeah I definitely expect that like that desired skill set uh to shift quite a bit over time towards towards this like more general uh vision setting.
You know, I guess on on the on the research side, I feel like there’s been uh you know, maybe maybe like a month ago, I feel like all anyone could talk about was continual learning and there’s just you know, it was in the Zeitgeist. There’s all these neolabs starting to go focus on continual learning. Some folks left OpenAI to go focus on that. Um I’m curious like you know I think it part maybe part behind that is a belief that like you know uh RL alone you know either won’t get us there or will get us to like some level of very inefficient scaling and it’s kind of different than the way you know humans learn. What’s your take on on like that you know that whole movement?
Yeah, I I am a little bit confused by it because you know in my mind like the whole kind of like excitement that like we’ve had I mean even even if you look at the titles of like the GPT uh you know three paper right like it is that like oh you know this class of models is actually capable of continue learning right it’s capable of like learning uh um learning to learn in context right that has been really you know the driving force behind the kind of excitement to like scale these GPT models further. That has been like the premise for why we really need to teach them with RL like learn in context more efficiently. And so I definitely agree that continual learning is really the thing, right? Like it’s really the thing that we’re building, but I I don’t really think this is like a problem that’s like, oh, you know, it’s kind of ignored and off the path of what we’re doing currently. I think it is what we’re working towards.
Yeah. Like in your mind, this is like the single best path to get there is to continue to kind of scale uh the pre-training and RL.
I think that is kind of how we’ve made the most progress on this problem so far and you know I think there are I think that there definitely are like more ideas more steps um I think also a lot of improvements that will just come from scale.
How do you kind of articulate to them I guess the set of things that need to be true for these like much longer steps to happen?
Yeah. I mean I think a lot of that prediction comes from just looking at like historical improvement lines, right? And but I think increasingly we can we can roughly see the the the the shape here. I do think a lot of this is about just the models becoming intelligent enough to recognize like whether you know they’re making progress. Um I think some of this is like yeah this very kind of pragmatic work of like are the models actually you know can they actually access you know all the context all the files all the infrastructure they need to do the work you want them to do.
I want to talk about some of the AI for science stuff um that you guys have been working on. One thing in particular, you know, I feel like the coding stuff is something that everyone feels very viscerally. On the math side, not all of us competed in in in IMO competitions. And so one of them I know that was really interesting that you guys did is you use some compelling work around like first proof, right?
Yeah, I mean you know I think yeah I I was very excited with the first proof challenge and you know again like I I kind of you particular one is kind of a benchmark right it’s like a couple you know respected mathematicians theoretical computer scientists releasing problems that like they believe are like representative of their day-to-day work but haven’t been published anywhere so that we can really have our models take a crack. We were so excited about this challenge, but you know, it was kind of dropped um without any any any advanced warning um with like a week-long deadline to actually execute. Um we had a we had a very exciting model training uh at the time. And so uh um uh um one of the people in charge of training James Lee kind of started prompting the uh that model just um by hand and and and and uh and yeah and actually kind of seeing oh okay it’s actually solving these problems was really a fascinating things to see. Uh you know one of these proofs actually is from a domain that I I I I did my PhD in and yeah seeing the model kind of come up with these ideas which I would you know quite proud to come up with like in a in a week or or two uh seeing it come up with them in like an hour or so that was very uh yeah it’s a very weird feeling.
I guess like you know I feel like there was this maybe common criticism a year ago that like okay these models are like pattern matchers but like you really want AI for science like we’re not going to get new ideas or like you know entirely novel things out of out of pattern matching. Feels like we continue to like chip away at that narrative. Are we getting closer to kind of fundamentally disproving that?
I believe so yeah I mean I think kind of on schedule we’re starting to see like minor advancements right like not huge things right like a small idea here or there I mean maybe maybe some like bigger papers in collaboration with with scientists, right? But, you know, was Alpha Zero a pattern matcher? Alpha Go a pattern matcher? You know, our our data bots like they did kind of come up with new strategies for the respective games.
It’s funny that there’s counter examples to it all the way back to, you know, 2016, 2017.
Right. Right. And and, you know, and you can say like, well, I guess you can always fall to flaws in that which I think is interesting like AlphaGo can be beaten with some strategy. Our data bots could have been beaten with some with some strategy. I think I think there will be a lot of deficiencies for a while of of like these models, right? But but I think also like they they are able to discover new things because they have a lot of these capabilities and like the way you know yeah I mean it’s you know taken a couple years to like get go from like this like very tiny game environments to like this much more um general scientific research. It required kind of going through um you know like a decent approximation of like all human knowledge in the meantime and you know learning all the human languages and so forth but but um but I think the basic principle is is is very similar.
You know, it’s funny. I think like when you guys had these first proof results, um I remember like the organizers said, you know, they were commenting on these AI solutions and they were like this feels like, you know, 19th century mathematics of like brute force, you know, computation-heavy approaches rather than these like elegant modern techniques.
It doesn’t concern me. I mean I think it’s expected that like I I’m sure I I thought for at least one of the problems like actually actually our produced pretty pretty nice proof that was quite a bit shorter than like the intended one you know but I think in general you would expect like yeah this models kind of you know they can produce so much more reasoning in a short time than like a person can right just like in terms of just raw number of like tokens or thoughts I don’t expect that to be like kind of a long-term feature.
It feels like there’s so much momentum behind AI for science right now. You mentioned obviously like you know at some point you do have to connect these these models to the physical world and you guys released some cool stuff with GKO and like some of these other things you’ve been experimenting with. Have you developed any intuition for as you think about like 3 years from now, the spaces where of science where you’re like, “Oh, there’s going to be crazy progress there versus the ones that might prove like a little more resistant to immediate change.”
You know, a tempting answer would be that like oh, you know, it’s really about like um you know, do you uh you know, what are the things that kind of require some some you know, manual work like where the models are not like not not quite plugged in the ecosystem or you know like the that the the different laboratories will also kind of evolve pretty quickly to adopt to like these new technologies.
Within those STEM fields. Obviously, you know, I feel like there’s a question of is it like an LLM with access to the physical world or you’ve obviously had companies that are have been started specifically around these domains, right? Like an Isomorphic in biology or Periodic in in material sciences or Physical Intelligence and robotics. What’s your kind of gut instinct on the extent to which it makes sense to pursue some of these things like independent with different model architectures versus like all within the context of one place?
Yeah, I think it’s kind of similar to you know my answer about like the um UI for you know for codex which like I I would build around the capabilities of a technology and not around its limitations so much. Um so you know you definitely like if you have something that like can suddenly design like a huge amount of like interesting like chemical or biological experiments like yeah I mean it makes sense to uh you know build labs that enable that. You know, I think if we if we did get to a place where like the model is like very capable of designing high quality experiments. It also makes sense to like have it work with humans in a loop, right? Like we shouldn’t think of it as like oh it’s either you kind of automated fully and you have this like fun thing using some tools on the side. Like we will get to a world where like it’s just very natural to be collaborating with um you know AI scientists that are that are working hard on a problem.
I want to also make sure just to talk about AI safety because I think that’s an area that you’ve done a lot of really pioneering work on. You actually did some really interesting work across the labs right uh and and were focused on you know chain of thought monitoring.
Yeah so this is um a realization that actually we had um around the time we actually saw like the first um reasoning models of kind of the current crop. We realized that like okay like well this works right and we were pretty uh you know we were thinking a lot about what this means we kind of were like okay like probably the world really changes over the next I don’t know year or two or three you know we were thinking what this means for for safety and for for our ability to kind of understand what these models are doing and we realize that because of the way we train these models that because we don’t supervise the reasoning process directly right it’s not like you know ChatGPT is trained to kind of um you know be be polite and nice. And it always tells me I have great ideas. Yeah. Well, you know, that’s a separate issue, right? But but you know, but like even assuming it’s like aligned exactly in the way we would want it to, which is definitely not, you know, it’s still kind of not going to be uh, you know, there are just still still some things it’s not going to reveal about its motivations and time because, you know, maybe it would be unsafe or maybe it would be unkind. Um or you know or maybe because it’s not maybe it’s actually not aligned the way we think but it wants to hide that right and uh and the way we train the reasoning models like the the train of thought doesn’t have any of that it’s not optimized to uh to be in any particular way because it’s just not not directly trained it’s only trained in how it relates to like producing a high quality output. Um and realized this is actually a very powerful paradigm for being able to interpret what the model is doing, right? It’s actually not a very different idea from uh um mechanistic interpretability, right? Because in mechanistic like the idea is again like you kind of have this model, you have these activations of the model um that you know are not directly supervised to predict any label. They’re they’re kind of like indirectly supervised but you know the model kind of has never been trained with like any sort of like uh you know inspection of the of these activations and so these activations might reveal something about its inner workings but the big advantage of the chains of thought is that you know by default they are in English right and so it’s so much easier to understand what is going on.
And the other interesting thing is um you know we were just talking about how probably you know how how we believe in in the future where we go uh well these models work for a very long time they work autonomously right and so there there is much more of this reasoning uh and so you know if this is a big axis of how the capability of these models increases um that the sort of our ability to supervise them will will scale commensurately.
Yeah, this really comes down to this principle though that like you know you’re not supposed to supervise the train of thought and so this is actually something uh when we originally you know we’re releasing the preview model like we made this decision to like hide the chains of thought. And um you know for me that was the primary motivation that was the reason like I didn’t really even want to consider releasing it in different ways you know there definitely was a bit of internal discussion about this but like the reason I felt very strongly like we should we should just hide it is because of this. Uh then there was this other concern that like I didn’t initially think about but I think was also like very valid of like well you know like this model is going to be distilled to some extent. Uh but but yeah but I actually think that like this uh you know allowing the models some sort of private space. Oh and by the way like why do I think it’s important that we don’t like you know show this chain of thought in product you know um if if if I’m saying like the important thing is not to supervise them during training well I think if we did show it if we like established a paradigm where like oh you just show this chains of thought in product uh eventually you kind of have to train them right like you’ll have to train them for the same reasons you have to train like whatever models you ship.
We might not all want to know what the chain of thought our model has that gets to a response.
Right I mean you know I think I think it’ll be useful to some extent and we are trying to capture most of that value you know either with like chain of summaries uh which I think are kind of like a little bit of a stop gap. I think the longer term solution here is having the model actually talk to you in real time which you know the latest version of Codex kind of does latest version of of the reasoning GPT models kind of do but I think I think that will get much better.
Yeah but but yeah I think there’s something very exciting here about just like not having the training signal fight against us right. Because yeah I think if you if you want to be able to understand what the model does in the long term, but you know you’re scaling a method that is like kind of going directly against that, you’re probably not going to have a good time, right? And so this decoupling I think is a very it’s an idea that gives me a lot of hope for our ability to at least understand um you know how these models motivations and generalization evolve as they get better as they work for longer. Um yeah, I don’t think it’s a complete solution to AI alignment by a long shot. I think it’s just another tool in our in our toolbox. Uh but I am hopeful that building our toolbox with technical tools like this, we can actually continue chipping away at the fundamental problems here.
What are like the other research areas within alignment that you’re paying attention to or that you think are promising?
A lot of the like longer term challenge with alignment is about generalization, right? Like we can train our models to do well and and and and or you know at least mostly to some extent like we we can mostly kind of control their behavior in the in the things that that you know are in distribution that that we train for. Um, but you know the things that are worrisome is like well what happens when the model is asked to do something very very different or it finds itself in a very different situation or it’s like much smarter than it ever was before and and and you know it has all these capabilities. It’s like we haven’t really kind of thought about how to train for and so yeah so so I think I think you know the study of like this kind of longer term value alignment is really a study of generalization like what are the values that the model falls back on. Um like one line of research I’m very excited about here and something that we’re uh investing in quite a bit is uh understanding like how that um how the generalization falls back onto the pre-training data.
I guess over like you know the last six months have your concerns around alignment increased decreased like how do you you know where are we kind of trending overall?
I will speak to like the the the longer term challenges of like alignment right or like what happens when you have very smart models the the way my thinking about the problem has evolved over the past few years is definitely kind of gone from you know oh is this like very nebulous problem that like is just like very hard to even grapple with or define uh to like oh you know I think we can actually make progress at it by very concrete technical solutions and technical insights. And this is why we’ve really been uh viewing alignment as like just a core part of of research and really uh you know making sure that like we are you know designing our reasoning models uh thinking about this and we are you know and we are kind of like conducting our alignment research with like these reasoning models in mind and so forth. Um so I think my general kind of uh belief that there’s like a research path here that actually gets us to an extremely happy world uh has increased quite a lot. At the same time, right, I think uh my timelines to very capable models have definitely decreased a lot, right? I think we’re we’re not that far, right? Again, I don’t think these are models that are smarter than all the ways, but I think these are models that are just very transformative. And so, I’m quite optimistic like we can keep a good grip on like how we’re doing on the alignment problem, how to roughly evaluate the risks of of of of our models or or the problems with them. You know, but I do think we have to be, you know, as an industry as really prepared to like take trade-offs and, you know, and possibly, you know, slow down development uh um depending on what we see.
It’s already interesting to see a lot of this work happening across the major labs. You know, the fact that you did this in collaboration with I think Anthropic and DeepMind.
There’s definitely some I mean there’s definitely like shared interest in this topics. Yeah.
I want to shift a little bit to going inside OpenAI. You talked before about how it’s you know important part of your job is giving researchers you know to to kind of have comfort and space. How do you actually operationalize this balance today?
You know I focus on on just high quality experiments recognizing you know are we actually making progress being honest with ourselves and you know and promoting honesty about about the results um I don’t think that has changed right. Even though our work will evolve a lot I believe we still have quite a lot of work left to do and so I don’t think it’s like oh you know we need to wrap up all our projects uh um you know very very quickly so yeah I don’t think those fundamentals change I think what what does change is uh you know a level of urgency to really kind of bring some of these things that we think are most promising uh to fruition.
Obviously you know I feel like there’s been um you know some very public internal moments of OpenAI over the years you’ve been here for a long time as you kind of reflect back like what were some of the difficult decisions that you guys made that maybe were like 51-49 that really you know defined the company?
Well, yeah. I mean, there’s certainly a number of, you know, dramatic moments, uh, like this. Um, you know, I think the ways the company underwent the most change is not really this like snap changes, snap decisions, but more like just like shifts and and how it operates, right? I would say like OpenAI has gone for a couple phases. You know when I joined at the start of 2017 very much kind of uh felt like very academic lab pursuing like a lot of different ideas not so you know scaling pilled in practice uh and I think that was like the first like big change with the data product with GPT we’ve kind of moved to okay like we actually are going to have to buy big computers we’re actually going to have to um scale things we’re going to have to develop the science of scaling we’ll have to develop the infrastructure for it um and so that kind of started the second phase of of okay now we’re scaling right like we’re we’re we’re still going to pursue like a lot of these basic research ideas but we are going to evaluate them like for the are they scalable. Then yeah then there was this interesting period where you kind of have ChatGPT is this big thing. Yeah I mean I thought it would look a little bit differently right like I think I I was actually surprised that like text models I was pleasantly surprised like text models are actually kind of the first thing. I thought we would be in a world where like it’s more the kind of like you know video style uh uses of generative AI are kind of like the first uh the first big thing to take off. So but yeah but I think definitely like we anticipated that like this sort of tension would arise right where like you have a thing that is kind of like popular now but it’s like you know you believe it’s going to evolve quite a lot before you get to where you’re going and so I think that’s kind of the phase we’ve been in for a while um and yeah I think now we’re we’re like uh well yeah I mean we believe we are kind of like starting to be in this phase where yeah we’re actually deploying AGI or you know deploying models that are actually very economically transformative.
What’s one thing you’ve changed your mind on in the AI world in the last year?
Yeah, I mean I I think I think it’s really, you know, starting to reconcile this tension between, you know, the AI that you build ultimately is something that affects the world, but, you know, until you until you kind of get pretty close, it’s like a pretty theoretical thing that you’re just kind of, you know, training and developing algorithms for. And so, you know, recognizing that okay, now we actually need um we really need to um you know make a lot of progress and focus on like how actually we’re deploying this technology.
What’s maybe one thing that you think we’re underthinking right now as a society in terms of the impact of these models?
Yeah, I I I think getting to a point where so much intellectual work um can be automated I think comes with pretty big problems that I don’t think have obvious solutions. One natural is a question of jobs and you know concentration of wealth and I suspect this requires like real policy maker involvement. Yeah, I’ve heard some kind of optimistic takes on how is this resolved, but I think I think at a fundamental level it does seem like you know some things that like used to be very valuable used to kind of cost a lot and used to provide something like now can be done pretty cheaply and you know in the long term it should be a good thing but I think it does lead like I think it can happen quite quickly. And there is a related question of you know you really can like if you actually have you know an automated research laboratory an automated company that can do so many things like it can be controlled by a very small number of people right it can it can do a lot right and this gets this gets you know even more crazy when you have robots but but you don’t need to have robots and you know I think figuring out like what does governance of such things looks like look like right like what are these like organizations that like so powerful and yet maybe made of like only a couple of people like what how to think about these things I think is uh it’s a new question we have to grapple with.
How has your work on AI changed the way you think about like the way in which you know this next generation should be raised?
A task for all of us right is to build the AI right build a world in a way where uh you know at the end of the day humans have the agency right humans set the the direction right and you know maybe a lot of the the technical challenges that we cherish right now will become more of a you know past time that’s something that we really kind of like needs to do in order to make progress and and the challenges will be more and like figuring out like what are the things that are important what are the things we should go do you know I think that that will still be you know I think I think you know in that world like people can end up with you know more things to do and definitely more more exciting things to to do and you know I think I think you still want like to have an understanding of you know of like uh you know some understanding of like you know technology like all all the kind of like uh basic you know education however you want to acquire it for the sake of being able to think about these problems.
Anything you uh want to point our listeners to?
I think the set of problems we just discussed, right, and also the questions around alignment, monitorability, I I I think I think those are growing to be very urgent challenges. And I don’t think there are challenges only for AI researchers, right? I think there are challenges for policy makers, but also also just things we have to think through as a society and uh yeah, I I’m you know, I’m happy to see some discourse starting to arise and I I think we need more of it.