He Co-Invented the Transformer. Now: Continuous Thought Machines [Llion Jones / Luke Darlow]

ELI5/TLDR

Llion Jones, one of the co-inventors of the transformer (the architecture behind nearly every major language model today), has left that work behind because he thinks the field is stuck in a rut. At his new company Sakana AI, he and researcher Luke Darlow built something called a Continuous Thought Machine (CTM), which borrows a trick from biology: instead of just checking what neurons are doing at one moment, it watches how they synchronize over time, the way brain waves work. The result is a system that naturally spends more time thinking about hard problems and less on easy ones, without anyone having to code that behavior in. They also created a brutally hard reasoning benchmark using variant Sudoku puzzles that current AI models largely fail at.

The Full Story

The Architecture Lottery

Imagine you discover fire, and it’s so good at cooking food that everyone stops looking for other energy sources. That’s roughly where AI is with transformers. Jones makes the case bluntly: the transformer is not the final architecture, and we’re probably wasting enormous effort making tiny tweaks to it — the same way researchers spent years shaving fractions off RNN benchmarks (1.26 bits per character… 1.25… 1.24…) right before transformers blew past all of them at 1.1. People literally came to his desk and said they must have made a calculation error.

The problem is what Jones calls “technology capture” — a cousin of audience capture on YouTube. Transformers work so well that the entire ecosystem has locked in: training code, fine-tuning pipelines, inference infrastructure, researcher expertise. Being a little better is not enough. You have to be crushingly better to move the industry.

“There is actually already architectures that have been shown in the research to work better than transformers. But not better enough in order to move the entire industry away from such an established architecture.”

Jones points to a concept he calls “jagged intelligence” — LLMs solving PhD-level problems in one breath and saying something obviously wrong in the next. He sees this as a symptom of something fundamentally off with the architecture, not just a scaling problem.

The Spiral Problem

There’s a beautifully simple illustration of what’s wrong. Imagine a dataset of two intertwined spirals that you need to separate. A standard neural network (using ReLU activations) can technically classify every point correctly. But if you look at how it drew its decision boundary, it’s just a mess of tiny straight-line cuts stitched together, because a ReLU network can only build piecewise-linear boundaries. It learned to trace the spiral, not understand it as a spiral.
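To make the "straight-line cuts" point concrete, here is a minimal sketch of a one-input ReLU network with hand-picked weights (the weights and the names `tiny_relu_net` and `slope` are illustrative assumptions, not anything from the conversation). Between the points where a hidden unit switches on or off, the output has a constant slope, so the function is a chain of straight segments.

```python
def relu(x):
    """Standard ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def tiny_relu_net(x):
    """One hidden layer with three ReLU units and hand-picked weights.

    The hidden units change state at x = 0, x = 1 and x = 2, so those
    are the only places where the output can bend.
    """
    hidden = [relu(x), relu(x - 1.0), relu(2.0 - x)]
    return 1.0 * hidden[0] - 2.0 * hidden[1] + 0.5 * hidden[2]

def slope(f, a, b):
    """Average slope of f between a and b."""
    return (f(b) - f(a)) / (b - a)

# Two probes inside the segment (0, 1) and one inside (1, 2).
slope_first_a = slope(tiny_relu_net, 0.2, 0.4)
slope_first_b = slope(tiny_relu_net, 0.6, 0.8)
slope_second = slope(tiny_relu_net, 1.2, 1.4)
```

Both probes inside the first segment report the same slope, and the slope only changes after crossing the bend at x = 1. A curve like a spiral can therefore only be approximated by adding more and more bends.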

A different approach — matrix exponentiation layers — produced a decision boundary that was itself a spiral. And because it actually represented the underlying structure, it could extrapolate: the spiral just keeps going outward, and the model gets that right.

Think of it like the difference between memorizing turn-by-turn directions to a friend’s house versus actually understanding the road layout. One breaks the moment you take a wrong turn. The other adapts.

“It’s almost mad that it’s controversial to say that we should represent a spiral like a spiral.”

This connects to the six-fingered hand problem in video generation. Models learned to produce five fingers not because they understand hands, but because enough training data hammered it in. A system that actually represented hand structure would never have had that problem.

Three Ideas Inside the CTM

The Continuous Thought Machine rests on three innovations, each inspired by how brains actually work.

The internal thought dimension. Imagine giving a neural network not just one chance to answer, but a runway of sequential steps to think through a problem. This is related to chain-of-thought reasoning in language models, but here it happens entirely inside the model — no text output, no tokens, just internal computation unfolding over time. Think of it as giving the model a scratchpad it works through silently before answering.

Their test case was maze-solving. A standard approach feeds a maze image in and gets a full solution map out in one shot. The CTM instead traces a path step by step — up, right, up, left — the way a human would. This sequential framing is much harder for machines, but it’s how real reasoning works.

Neuron-level models. In standard deep learning, a “neuron” is absurdly simple: a weighted sum passed through a fixed nonlinearity such as ReLU, which outputs zero for negative inputs and passes positive inputs through unchanged. Real neurons are far more complex. The CTM replaces each neuron with a tiny model of its own: a small neural network that takes in a history of its recent activations (like a short memory buffer) and produces a richer output. Think of upgrading every light switch in your house to a small thermostat that remembers recent temperature patterns.
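A rough sketch of the neuron-level idea, under stated assumptions: each neuron owns a private two-layer network that reads a fixed-length buffer of its own recent inputs. The class name `NeuronLevelModel`, the layer sizes, the random weights, and the sine-wave input are all hypothetical and chosen only for illustration. The actual CTM design may differ.

```python
import math
import random
from collections import deque

class NeuronLevelModel:
    """Private two-layer network for a single neuron (hypothetical sketch)."""

    def __init__(self, memory_len, hidden_units, rng):
        # Each neuron gets its own weights; nothing is shared between neurons.
        self.w_in = [[rng.uniform(-1, 1) for _ in range(memory_len)]
                     for _ in range(hidden_units)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(hidden_units)]

    def __call__(self, history):
        # Hidden layer: ReLU over weighted sums of the neuron's recent history.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, history)))
                  for row in self.w_in]
        # Output squashed into [-1, 1].
        return math.tanh(sum(w * h for w, h in zip(self.w_out, hidden)))

rng = random.Random(0)
num_neurons, memory_len = 4, 5
models = [NeuronLevelModel(memory_len, 3, rng) for _ in range(num_neurons)]
# One short memory buffer per neuron; old values fall off the front.
histories = [deque([0.0] * memory_len, maxlen=memory_len)
             for _ in range(num_neurons)]

for step in range(10):
    for i, history in enumerate(histories):
        history.append(math.sin(0.5 * step + i))  # stand-in for incoming signal
    outputs = [model(history) for model, history in zip(models, histories)]
```

The `deque` with `maxlen` plays the role of the "short memory buffer": appending a new value silently drops the oldest one.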

Synchronization as representation. This is the most novel piece. Instead of looking at what each neuron is doing at a single moment, the CTM measures how pairs of neurons move together over time: their synchronization. If you have d neurons, you get roughly d-squared-over-two distinct synchronization pairs, so the representation space grows quadratically with the neuron count instead of linearly.
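As a minimal sketch of the pair counting, assume synchronization is measured as the inner product of two neurons' activation traces (the conversation summary only says pairs are measured on how they "move together over time", so this measure is an assumption, as is the function name `synchronization_pairs`).

```python
from itertools import combinations

def synchronization_pairs(traces):
    """Map each unordered pair of distinct neurons (i, j), i < j, to the
    inner product of their activation traces over the internal steps."""
    pairs = {}
    for i, j in combinations(range(len(traces)), 2):
        pairs[(i, j)] = sum(a * b for a, b in zip(traces[i], traces[j]))
    return pairs

# Three neurons observed over four internal steps (made-up numbers).
traces = [
    [1.0, -1.0, 1.0, -1.0],   # neuron 0
    [1.0, -1.0, 1.0, -1.0],   # neuron 1: in phase with neuron 0
    [-1.0, 1.0, -1.0, 1.0],   # neuron 2: out of phase with neuron 0
]
sync = synchronization_pairs(traces)
num_pairs = len(sync)
```

With d = 3 neurons this yields d(d-1)/2 = 3 pairs. Counting each neuron paired with itself as well would give d(d+1)/2. Either way the count grows like d-squared-over-two.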

Imagine an orchestra. Looking at what note each musician plays at one instant tells you something. But the real music — harmony, rhythm, texture — emerges from how instruments move together over time. That’s what synchronization captures.

They also use exponential decay rates at different time scales, so some neuron pairs are measured on how they fire together right now (sharp decay) while others capture long-term patterns (slow decay). This mirrors how biological brains operate at multiple timescales — the same reason we have different brainwave frequencies for different mental states.
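The effect of the decay rate can be sketched as a weighted average of the step-by-step product of two traces, with weights that shrink exponentially with age. The function name `decayed_sync`, the normalization by the sum of weights, and the example traces are assumptions made for illustration.

```python
import math

def decayed_sync(trace_i, trace_j, decay_rate):
    """Exponentially weighted agreement between two activation traces.

    decay_rate = 0 weights every step equally (long-term view);
    a large decay_rate makes the most recent step dominate.
    """
    num_steps = len(trace_i)
    weighted_sum = 0.0
    weight_total = 0.0
    for t in range(num_steps):
        age = num_steps - 1 - t          # 0 for the most recent step
        weight = math.exp(-decay_rate * age)
        weighted_sum += weight * trace_i[t] * trace_j[t]
        weight_total += weight
    return weighted_sum / weight_total

# Two neurons that agreed for five steps and disagree on the latest one.
trace_a = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
trace_b = [1.0, 1.0, 1.0, 1.0, 1.0, -1.0]

slow = decayed_sync(trace_a, trace_b, decay_rate=0.0)  # long memory
fast = decayed_sync(trace_a, trace_b, decay_rate=5.0)  # short memory
```

The slow-decay pair still reports mostly positive synchronization, while the sharp-decay pair sees almost only the latest disagreement. The same two neurons can therefore contribute different signals at different timescales.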

Emergent Behaviors Nobody Planned

The CTM produced several behaviors that nobody designed in.

Adaptive computation time. Easy ImageNet images get classified in one or two thinking steps. Hard ones use all fifty. This falls out naturally from the loss function — no penalty term needed, no hyperparameter sweep. Previous attempts at adaptive computation (like Alex Graves’ work) required carefully balanced penalties to stop models from just using all available compute. The CTM does it for free.
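The summary does not spell out the loss itself. One formulation that would produce this behavior, offered here purely as an assumption, is to score only two internal steps per example: the step with the lowest loss and the step where the model is most certain. The names `two_tick_loss` and `certainty`, and the entropy-based certainty measure, are hypothetical.

```python
import math

def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])

def certainty(probs):
    """1 minus normalized entropy: 0 for a uniform guess, 1 for a one-hot."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))

def two_tick_loss(probs_per_step, target):
    """Average the loss at the best step and at the most certain step."""
    losses = [cross_entropy(p, target) for p in probs_per_step]
    certainties = [certainty(p) for p in probs_per_step]
    best_step = min(range(len(losses)), key=losses.__getitem__)
    certain_step = max(range(len(certainties)), key=certainties.__getitem__)
    loss = 0.5 * (losses[best_step] + losses[certain_step])
    return loss, best_step, certain_step

# Predictions for one example over three internal steps (made-up numbers).
steps = [
    [0.40, 0.30, 0.30],
    [0.70, 0.20, 0.10],
    [0.90, 0.05, 0.05],
]
loss, best_step, certain_step = two_tick_loss(steps, target=0)
```

Under this assumed loss, nothing rewards or penalizes the number of steps directly. If an input is easy, an early step can already be both the best and the most certain, so later steps are simply not needed.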

Near-perfect calibration. Most neural networks are poorly calibrated — they’ll say “90% confident this is a cat” when they’re actually right only 60% of the time. The CTM came out nearly perfectly calibrated without anyone trying to achieve that. Jones calls this “a smoking gun that this actually seems to be probably a better way to do things.”
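Calibration is commonly summarized with expected calibration error (ECE): group predictions by confidence, then compare average confidence with actual accuracy in each group. The sketch below is a generic implementation with made-up inputs. It is not how the CTM authors measured calibration, which the summary does not specify.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy across bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, is_correct))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece

# Ten predictions, six of them right.
outcomes = [True] * 6 + [False] * 4

# Overconfident model: claims 90% but is right 60% of the time.
overconfident = expected_calibration_error([0.9] * 10, outcomes)

# Calibrated model: claims 60% and is right 60% of the time.
calibrated = expected_calibration_error([0.6] * 10, outcomes)
```

The overconfident case reproduces the "says 90%, right 60%" example from the text with a gap of 0.3, while the calibrated case scores essentially zero.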

Backtracking in mazes. During training, the model spontaneously learned to go down one path, realize it was wrong, backtrack, and try another route. This is not something anyone coded in.

The leapfrog algorithm. When given too little thinking time to trace a long maze path step by step, the model invented a faster algorithm: it would jump ahead to roughly where it needed to be, trace backwards to fill in the path, then leapfrog forward again. This compression strategy emerged from the constraint itself, which raises deep questions about how time pressure changes the algorithms a system discovers.

Sudoku Bench: A Reasoning Litmus Test

Jones also built a reasoning benchmark called Sudoku Bench, inspired by a Karpathy quote: if you wanted AGI, you wouldn’t want the text humans produced — you’d want the thought traces in their heads while they produced it.

He found exactly that data in the YouTube channel Cracking the Cryptic, where professional puzzle solvers narrate their reasoning through variant Sudoku puzzles in agonizing detail — sometimes for four hours straight. With their permission, he scraped thousands of hours of high-quality human reasoning traces.

These aren’t normal Sudokus. One puzzle tells you its rules in natural language, then adds: “By the way, one of the numbers in that description is wrong.” Another overlays a maze on the grid with path constraints. Each puzzle has a unique “break-in” — a novel logical insight you need before brute-force can even begin.

The best AI models solve about 15% of the puzzles, and only the simplest ones. Current RL approaches fail because the specific reasoning required for each break-in is too rare to sample effectively. The models fall back to boring trial-and-error instead of the creative meta-reasoning humans use.

The Freedom Problem

Running through the whole conversation is a thesis about how research actually happens. The transformer wasn’t born from a top-down corporate plan — it was people talking over lunch, following curiosity, with months of freedom. Jones worries that commercial pressure, publish-or-perish culture, and the sheer gravitational pull of the transformer ecosystem are squeezing out the exploratory research that produces breakthroughs.

At Sakana, he tells new hires: work on what you think is interesting and important. The CTM took eight months — long for AI research — and at no point did they worry about being scooped, because nobody else was exploring this space. He sees that as a feature, not a bug.

“Encouraging researchers to take a little bit more of a risk, right? To try these slightly more speculative long-term ideas… I want to have the CTM as like a poster child of: it works.”

Key Takeaways

The transformer is almost certainly not the final architecture. History shows that incremental tweaks to a dominant paradigm (RNNs before transformers) get obliterated when a fundamentally better approach arrives.
“Technology capture” is a real phenomenon: the sunk cost of infrastructure, expertise, and tooling around transformers creates enormous inertia, even when better alternatives exist in research.
Jagged intelligence — LLMs solving PhD problems then failing at basic logic — likely reflects architectural limitations, not just a scaling gap.
The CTM has three core innovations: an internal sequential thinking dimension, neuron-level models (tiny networks replacing simple activations), and synchronization-based representations (measuring how neuron pairs move together over time).
Synchronization gives the CTM a representation space of d-squared-over-two dimensions from just d neurons — a massive expansion that also helps with gradient propagation during training.
Adaptive computation falls out naturally from the CTM’s loss function. No penalty terms, no careful balancing — easy problems get solved fast, hard problems use more thinking steps.
The CTM produces near-perfect calibration (confidence matches accuracy) without that being an explicit training objective — a strong signal the architecture is doing something right.
Under time pressure, the CTM spontaneously invented a leapfrog algorithm for maze-solving — jumping ahead, tracing backwards, leapfrogging forward — a compression strategy nobody designed.
Variant Sudoku puzzles (Sudoku Bench) expose a fundamental gap in AI reasoning: models can’t find the creative “break-in” insights that human solvers use, falling back to boring enumeration instead.
Thousands of hours of narrated human reasoning traces from Cracking the Cryptic represent exactly the kind of “thought trace” data Karpathy argued would be needed for AGI.
The human-AI chess analogy has flipped: human+engine fusion no longer beats engines alone. Jones wonders when the same will happen for AI research.
Research freedom — following gradients of interestingness without commercial pressure — is what produced both the original transformer and now the CTM.

Claude’s Take

This is a genuinely stimulating conversation, and the CTM paper sounds like the real deal — not a marginal improvement but a different way of thinking about neural computation. The three innovations (internal thought dimension, neuron-level models, synchronization) are individually interesting but become much more compelling in combination, especially when unplanned behaviors like adaptive computation and near-perfect calibration just fall out.

That said, the conversation is heavier on philosophy and lighter on hard numbers than I’d like. There’s a lot of “this seems like a good idea” and “we’re exploring this” without much quantitative comparison to existing architectures at scale. The maze task is a lovely demonstration, but it’s a long way from maze-solving to language modeling, and Luke’s comments about applying CTMs to language prediction are still firmly in the “we’re thinking about it” category.

The Sudoku Bench section is surprisingly compelling. The insight that variant Sudokus require meta-reasoning — reasoning about the rules before you can even start solving — is a clean, grounded way to test something LLMs genuinely can’t do. The Cracking the Cryptic dataset as a source of human thought traces is clever.

Jones’ argument about technology capture and the local minimum problem is well-stated and historically grounded. The RNN-to-transformer transition story is a perfect illustration. Whether the CTM is the thing that breaks us out of the current local minimum, or just a pointer toward the right direction, remains to be seen. Score of 8: substantive new architecture with emergent properties, thoughtful meta-commentary on research culture, but still early-stage with limited scaling evidence.