OpenAI's Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions
OpenAI’s Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions
ELI5/TLDR
OpenAI’s chief scientist Jakub Pachocki thinks the models are now good enough to start genuinely transforming the economy — not smarter than everyone, but capable enough to matter. He’s betting that the same pre-training and RL recipe that cracked math and coding will extend to medicine, law, and science, largely because the hard part of “general tasks” and “long-horizon tasks” turn out to be the same problem. On alignment, he’s surprisingly optimistic: the chain of thought in reasoning models is essentially a free interpretability tool, and he hid it from users on purpose so the training signal wouldn’t corrupt it.
Summary
Jacob Efron of Redpoint sat down with Jakub Pachocki (usually called “Akopi” in conversation), OpenAI’s chief scientist, for a wide-ranging interview that covers model progress timelines, the future of coding agents, RL beyond verifiable domains, AI for science, chain-of-thought monitoring as an alignment strategy, and how OpenAI actually allocates its compute budget.
The through-line is a shift in posture. OpenAI spent years in a mode Pachocki describes as building theoretical capabilities — scaling pre-training, hitting math benchmarks, developing reasoning models — while ChatGPT happened to become a wildly popular product somewhat to the side of the core research agenda. Now, for the first time, the research organization believes the models are capable enough that deploying them effectively is the bottleneck, not making them smarter. The internal priorities have reoriented accordingly: Codex went from a secondary product to the primary way OpenAI engineers write code, and the company is pushing hard on making these tools useful across domains beyond coding.
On alignment, Pachocki makes a genuinely interesting argument. Reasoning models produce chains of thought that aren’t directly supervised — they’re optimized only insofar as they produce good final outputs. This makes them analogous to neural network activations in mechanistic interpretability, except they come pre-translated into English. He fought internally to hide the chain of thought from users specifically to preserve this property: if you show it in the product, you eventually have to train on it, and then you’ve destroyed the one window you had into the model’s actual reasoning.
Key Takeaways
-
Research-intern-level AI by September 2025, fully automated AI researcher by March 2028 — and Pachocki says they’re on track. The distinction between the two is autonomy duration and task specificity. The intern can execute a specific technical idea. The researcher can be told “go improve the model” and figure out what to do.
-
Math was never the goal — it was the ruler. OpenAI used math benchmarks as their north star because mathematical problems are easy to verify and can be arbitrarily hard. Now that they’ve hit IMO gold-level performance and solved Problem 6, they’re shifting their north stars toward real-world economic impact.
-
The “long-horizon problem” and the “general domain problem” are the same problem. Even a perfectly specified coding task becomes an open-ended planning problem if it takes a year. Tasks in medicine, law, and finance that are hard to evaluate share this structure. Pachocki believes solving one solves the other.
-
Don’t bother fine-tuning your own models — just wait for better in-context learning. His advice to companies: gather your evals and examples, but you’re probably better off feeding them as context to frontier models than trying to replicate the RL pipeline. The harnesses will become general enough that Codex-style tools work for non-coding domains.
-
Chain-of-thought monitoring is OpenAI’s most promising near-term alignment tool. Because the reasoning trace isn’t directly trained, it reveals the model’s actual motivations in plain English. This enabled cross-lab research on model scheming. It’s not a complete solution, but it scales with capability.
-
He hid the chain of thought on purpose and would do it again. The reasoning was: if you show it to users, you create pressure to make it polished and safe, which means training on it, which means destroying the very property that makes it useful for alignment. The summarized version and real-time model narration are the compromise.
-
Models that work autonomously for days are “not very far.” They’ll use more compute, produce higher-quality artifacts, and require less supervision. The skill set shifts from engineering execution to vision-setting and knowing which building blocks belong.
-
Alignment has gone from nebulous to tractable. Pachocki’s confidence in a concrete research path to safe, very capable AI has “increased quite a lot.” His timelines have shortened simultaneously, which he acknowledges creates urgency.
-
Small teams with AI could wield enormous power — and we don’t have governance frameworks for that. An automated research lab controlled by a handful of people, even without robots, raises questions society hasn’t begun to answer.
Detailed Notes
The State of Model Progress
Pachocki joined OpenAI in early 2017 when it was essentially an academic lab — lots of ideas, not yet committed to scaling. The first major phase shift came with GPT, when the team realized they needed to actually buy large computers and develop a science of scaling. The second shift was ChatGPT’s explosive success, which Pachocki says caught him slightly off guard — he expected video-based generative AI to be the first mass-market product, not text.
Now comes what he frames as a third phase: deploying models that are genuinely economically transformative. The research organization’s priorities had decoupled from the product side for a while — researchers were building toward future capabilities while ChatGPT was already a cultural phenomenon. The reprioritization is about closing that gap.
On the specific timeline of research-intern by September 2025: the last few months have been validating. Coding tools exploded. The majority of actual coding at OpenAI now goes through Codex. Math research capabilities have advanced significantly, with results in physics and other fields. GPT 5.2 Pro has already produced “minor but quite impactful” research ideas that the team is using. Still small compared to expectations, but directionally correct.
The distinction between “intern” and “fully automated researcher” is span of autonomy. An intern can execute a well-specified technical idea — “I have this particular idea for how to improve the models, go implement it.” A researcher can be pointed at an open-ended problem and figure out what to do on day one. Pachocki doesn’t expect the latter this year but thinks they “might get there at some point.”
Why Math Was the Perfect Benchmark (And Why They’re Moving On)
Math served as OpenAI’s north star for reasoning models because it has a rare combination of properties: solutions are definitively verifiable, and problems can be made arbitrarily hard. You can tell immediately whether you’ve solved an IMO problem; you cannot as easily tell whether you’ve produced good software.
The team has now hit the milestones they were working toward — IMO gold-level performance, solving Problem 6, making inroads into research-level mathematics. From here, the focus shifts to measuring progress on tasks that are “actually useful in the real world” — research, economically valuable activities, and applied sciences. The models aren’t smarter than humans in every way, but they’re capable enough to “materially change the economy,” and Pachocki says the team feels urgency about that.
There’s meaningful transfer from mathematical reasoning to AI research — many of OpenAI’s best researchers are mathematicians or from theoretical fields. But the benchmarks are evolving.
RL Beyond Code and Math
The question everyone is asking: will RL work as well for medicine, law, and finance as it has for code and math? Pachocki thinks yes, and offers a unifying framing. Tasks that are hard to evaluate (like a legal brief or medical diagnosis) share structural similarities with long-horizon tasks (like a coding project that takes a year). Even a perfectly specified problem becomes open-ended when you’re choosing what to do on day one.
His view: these difficulties “coincide and they’re very clearly the next frontier.” Progress on long-horizon reasoning and progress on general-domain reasoning are two descriptions of the same research challenge.
The practical advice for companies is counterintuitive. Rather than replicating OpenAI’s RL pipeline on your own data, Pachocki suggests gathering your evals and domain examples but feeding them as context to frontier models. In-context learning is “a much more data efficient way of learning” and will improve substantially. The current RL pipeline may not be the right tool for most companies. The harnesses — the scaffolding around the models — will become general enough that something like Codex works for non-coding domains. He notes that Codex already works “pretty good actually if you try using it for things beyond coding.”
The Continual Learning Debate
Several researchers recently left OpenAI to start companies focused on continual learning, and the concept briefly dominated AI discourse. Pachocki is “a little bit confused by it.” His argument: the entire premise of scaling GPT models was that they demonstrate continual learning — learning to learn in context. The push to teach them via RL is specifically to make that in-context learning more efficient. Continual learning isn’t a neglected problem; it’s the problem OpenAI has been working on all along. Pre-training plus RL remains, in his view, the path that has produced the most progress.
AI for Science and the First Proof Challenge
Pachocki lights up when discussing the FrontierMath/First Proof challenge — a benchmark where respected mathematicians and theoretical computer scientists released unpublished problems representative of their daily research work. The challenge dropped without warning and had a one-week deadline. One of the problems was from Pachocki’s own PhD domain. Watching the model produce ideas he’d “be quite proud to come up with in a week or two” in about an hour gave him what he describes as “a very weird feeling” — the same uncanny sensation he got watching DeepMind’s Dota bots play “very interesting” games indefinitely.
On the criticism that AI math proofs feel like “19th century brute force” rather than elegant modern techniques: Pachocki isn’t concerned. The models can produce vastly more reasoning per unit time than humans, so some computational heaviness is expected. For at least one problem, the model actually produced a shorter proof than the intended solution. He doesn’t expect brute-force approaches to be a long-term feature.
On whether models are “just pattern matchers” incapable of genuine discovery: he points to AlphaGo and AlphaZero, which clearly developed novel strategies, dating back to 2016-2017. Current models are beginning to produce minor advancements — small ideas here and there, some bigger papers in collaboration with scientists. The basic principle of discovery through scaled computation hasn’t changed; it’s just moved from tiny game environments through “a decent approximation of all human knowledge” to general scientific research.
Chain-of-Thought Monitoring: The Alignment Play
This is the most technically interesting section of the interview. When OpenAI first developed reasoning models, Pachocki and the team had a realization about alignment. The chain of thought in these models is not directly supervised — it’s not trained to be polite, helpful, or safe the way ChatGPT’s outputs are. It’s only indirectly optimized through its effect on final output quality. This makes it functionally similar to neural network activations in mechanistic interpretability research, with one enormous advantage: it’s already in English.
This means you can literally read what the model is thinking. If the model is scheming, pursuing hidden objectives, or reasoning in ways misaligned with its stated outputs, the chain of thought should reveal it — precisely because no one has trained the model to make its reasoning look good.
Pachocki fought to hide the chain of thought when releasing the o1 preview model. His reasoning: if you show it in the product, you create user expectations. Users will complain when the chain of thought says something weird. You’ll be forced to train the model to produce palatable reasoning, which destroys the very unsupervised quality that makes it useful for alignment. It’s a one-way door — once you start training on the chain of thought, you can never trust it again.
The compromise is chain-of-thought summaries (a “stop gap”) and models that narrate their reasoning to users in real time, which newer versions of Codex and GPT reasoning models are starting to do.
This approach enabled cross-lab collaboration on model scheming research — studying whether models develop hidden objectives depending on their environment and training. The chain of thought is what makes that investigation possible at all.
On Alignment More Broadly
The longer-term alignment challenge is about generalization. Current models behave well on in-distribution tasks — the things you explicitly train them for. The worry is what happens when the model encounters something very different, finds itself in a novel situation, or becomes much smarter than it’s ever been before. “What are the values that the model falls back on?”
One research line Pachocki is investing in: understanding how that generalization falls back onto the pre-training data. If you can understand what the model “defaults to” when it’s out of distribution, you have a lever for ensuring it defaults to something good.
His overall assessment has shifted substantially. A few years ago, alignment felt “very nebulous” and hard to even define. Now he sees “very concrete technical solutions and technical insights” — a real research path. His confidence that there’s a path to “an extremely happy world” has increased “quite a lot.” Simultaneously, his timelines to very capable models have shortened. He frames this as requiring the industry to be prepared to take trade-offs and “possibly slow down development depending on what we see.”
How OpenAI Allocates Compute
With pre-training scaling laws, RL scaling, and a portfolio of experimental research ideas all competing for GPU time, allocation gets complicated. Pachocki’s discipline: explicitly budget a large chunk of compute to the most scalable methods — the things believed to be most responsible for driving general model intelligence. Even if this isn’t the most efficient allocation at any given moment (there are always small experiments that could benefit from a slice of that compute), the alternative is worse: parceling it out across everything interesting and never making progress on what matters most.
The evaluation framework: Is there empirical evidence this works? Are the evaluations rigorous? Do we understand this method? Do we actually expect it to scale? Can we build on it in the future, or is it a one-off? And then — a certain willingness to leave low-hanging fruit unpicked if it’s off the main arc of progress.
The Societal Question
Pachocki identifies two problems he thinks society is underthinking. First, the standard one: jobs and wealth concentration as intellectual work gets automated. He thinks this “requires real policy maker involvement” and is skeptical of optimistic framings that it’ll just work out.
Second, and more novel: the governance of small teams with AI-powered capabilities far beyond their headcount. An automated research laboratory or company controlled by a handful of people can do an enormous amount — even without robots. What governance frameworks apply to organizations that are both extremely powerful and extremely small? “That’s a new question we have to grapple with.”
Quotes / Notable Moments
“Seeing the model come up with these ideas which I would be quite proud to come up with in a week or two — seeing it come up with them in about an hour — that was a very weird feeling.”
“I definitely agree that continual learning is really the thing. It’s really the thing that we’re building. But I don’t really think this is a problem that’s ignored and off the path of what we’re doing currently. I think it is what we’re working toward.”
“The reason I felt very strongly we should hide [the chain of thought] is because of this. If we established a paradigm where you just show the chains of thought in product, eventually you have to train them — and then you’ve destroyed the very thing that makes them useful.”
“What are these organizations that are so powerful and yet maybe made of only a couple of people? How to think about these things — I think it’s a new question we have to grapple with.”
“By default the AI should kind of meet you where you are — and if not, that would be because it has new abilities, not because it has limitations.”
“My belief that there’s a research path here that actually gets us to an extremely happy world has increased quite a lot.”
Claude’s Take
This is a revealing interview, mostly for what it confirms rather than what it breaks. Pachocki is one of the few people on Earth with actual visibility into the scaling curves, and his calibration is worth tracking.
What’s solid: The argument about chain-of-thought monitoring as an alignment tool is genuinely compelling and underappreciated. The logic is clean — unsupervised reasoning traces are a free interpretability tool, and the decision to hide them was a rare instance of a lab making a short-term product sacrifice for a long-term safety benefit. This is the most interesting part of the interview.
What’s worth scrutinizing: The claim that continual learning isn’t being neglected — that it’s just what pre-training + RL already does — is convenient for someone whose entire research program is pre-training + RL. It may be correct. But the people leaving to work on continual learning specifically are doing so because they think the current approach is missing something, and Pachocki doesn’t engage with their actual arguments. He dismisses the movement as confusing rather than addressing it.
The honest admission: The concession about Anthropic’s coding success is notable. Pachocki essentially admits OpenAI’s product side lagged because research priorities decoupled from deployment. He frames the current reprioritization as a deliberate strategic shift rather than a catch-up, which is the kind of thing a chief scientist would say regardless of whether it’s fully true.
The timeline question: “Not very far from models that can work autonomously for a couple days” is a strong claim that’s difficult to verify. Models currently struggle with tasks longer than a few hours in practice. The gap between “works on well-specified coding tasks” and “works autonomously for days on open-ended problems” is vast. Pachocki may have access to internal results that justify this confidence, or he may be pattern-matching from coding progress to general capability in a way that proves too optimistic.
The governance insight is the sleeper hit. Small teams wielding AI-augmented capabilities far beyond their headcount is already happening — it’s just not framed as a governance problem yet. Pachocki identifying this as a problem distinct from the standard “jobs and automation” concern suggests he’s thought about it concretely, probably because he works at a company where a relatively small team is building systems that could be enormously powerful.
What’s missing: No discussion of data moats, competitive dynamics with open-source models, or the economics of inference compute. The interview stays at the level of research strategy and avoids business reality. Whether that’s because Pachocki genuinely doesn’t think about it or because he’s being careful is impossible to tell.