
YouTube

The Brain Is Just Specialized Agents Talking To Each Other — Dr. Jeff Beck

Machine Learning Street Talk · published 2026-01-25 · added 2026-04-10
neuroscience AI agency energy-based-models bayesian-inference free-energy-principle intelligence philosophy-of-mind
watch on youtube · view transcript

The Brain Is Just Specialized Agents Talking To Each Other

ELI5/TLDR

Your brain is not one big computer. It is a collection of small, specialized modules — one for smell, one for sight, one for planning — that learned to talk to each other over millions of years of evolution. Dr. Jeff Beck argues that real intelligence has always worked this way: not a single all-purpose system, but a network of focused experts combining in new ways. He thinks AI should follow the same blueprint. He also makes a point worth sitting with: you cannot tell from the outside whether something is truly thinking or just executing a very elaborate lookup table. And the real risk with AI is not machines going rogue. It is humans being careless about what they ask machines to do.

The Full Story

The problem with spotting a thinker

Watch someone play a brilliant game of chess through a window. They sacrifice a bishop, set a trap three moves deep, and win. Were they reasoning through possibilities, or had they memorized that exact sequence from a book? From where you stand, the two are indistinguishable. You would have to open the machine — or the skull — to know which one it was.

This is where Beck begins. He studies what it means to be an “agent,” a thing that acts in the world with some kind of purpose. His position is disarmingly simple: from a modeling standpoint, there is no structural difference between an agent and a rock. Both are objects. The rock just has an extremely boring policy — sit there, erode slowly. A human has an astonishingly complex one.

“You’ll never know for sure in any meaningful way whether or not it’s just doing a function transformation or whether it’s engaged in planning and counterfactual reasoning.”

The qualities people typically reach for — planning, imagining consequences, pursuing goals — are all descriptions of how a decision gets computed on the inside. They are invisible from the outside. A system that genuinely reasons through possibilities and a system that contains a perfect lookup table for every situation would behave identically. No external test can distinguish them.

So Beck takes the pragmatist’s route. If the simplest model of something’s behavior requires you to describe it as planning, then call it an agent. It is an “as if” designation — a useful label, not a metaphysical claim.

“Science is about prediction and data compression and nothing else.”

He also holds, somewhat surprisingly, that agents must be physical. A perfect digital copy of his brain, running on a server but unconnected to a body, would not qualify. Put that same copy inside a body and it would. He recognizes this sits awkwardly with his own “everything is a spectrum” philosophy, but does not resolve the tension. Philosophy, he notes, wants hard lines. Bayesians — the probability-minded scientists — think everything comes with an error bar. The two do not get along well.

Your brain as a guessing machine

Think of your brain as a device that never stops making predictions. You hear heavy footsteps behind you and it constructs a hypothesis: tall person, approaching, probably wearing boots. You turn around. Small child in squeaky shoes. Your brain registers the mismatch and updates. Wrong guess; revise and move on.

This predict-check-revise loop is the core of Bayesian inference — updating beliefs based on new evidence. Beck connects it to energy-based models, a framework from AI that works on a similar principle.
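
To make the loop concrete, here is a minimal sketch of a single Bayesian update, using the footsteps scene above. The hypotheses and every number are invented for illustration; the real content is the prior-times-likelihood arithmetic.

```python
# One Bayesian update: prior beliefs about who is behind you,
# revised after the evidence "loud squeak" arrives.
# All hypotheses and probabilities here are invented for illustration.

priors = {"tall adult in boots": 0.6, "child in squeaky shoes": 0.4}

# How likely a loud squeak is under each hypothesis.
likelihood = {"tall adult in boots": 0.05, "child in squeaky shoes": 0.9}

def bayes_update(priors, likelihood):
    """Posterior is prior times likelihood, renormalized to sum to 1."""
    unnormalized = {h: priors[h] * likelihood[h] for h in priors}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

posterior = bayes_update(priors, likelihood)
print(posterior)  # child in squeaky shoes now carries ~92% of the belief
```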

The distinction matters, so here is the simplest version. A standard neural network is graded only on its final answers. Feed in a question, get an answer, measure how wrong it was. An energy-based model is also graded on the quality of its internal reasoning. The thinking itself has rules that get checked during learning.

The practical consequence: the model is not merely memorizing input-output pairs. It is building an internal picture of the world that must be self-consistent. In physics, “energy” maps to likelihood — low energy means high probability. Finding the lowest energy state is just another way of saying “find the best explanation.”
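
The mapping has a standard written form, the Boltzmann distribution; this is the textbook formulation rather than notation taken from the talk:

```latex
% Low energy = high probability. p(x) is the model's probability of a
% configuration x, E(x) its energy, Z the normalizing partition function.
p(x) = \frac{e^{-E(x)}}{Z}, \qquad Z = \sum_{x'} e^{-E(x')}
% Rearranged: E(x) = -\log p(x) - \log Z, so finding the lowest-energy
% state is exactly finding the most probable explanation.
```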

“The difference between minimizing energy and minimizing free energy is that free energy has this additional entropy penalty term.”

That sounds technical. In plain language: “free energy” gives you a bonus for keeping your options open. Instead of committing entirely to one answer, you maintain some healthy doubt. It is the difference between “the butler did it” and “probably the butler, but keep an eye on the maid.” The second approach handles surprises far better.
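
In the standard variational formulation (again textbook notation, not Beck's own symbols), the bonus is literal: free energy is expected energy minus entropy, so a belief that spreads probability across explanations is rewarded for doing so.

```latex
% Variational free energy of an approximate belief q over explanations x.
% The entropy H(q) is the "keep your options open" bonus.
F(q) = \mathbb{E}_{x \sim q}\!\left[E(x)\right] - H(q),
\qquad H(q) = -\sum_{x} q(x)\log q(x)
% Minimizing F over q recovers q(x) \propto e^{-E(x)}: committing all
% belief to "the butler did it" pays an entropy penalty unless the
% butler is overwhelmingly likely.
```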

Why test-time training is cramming for an exam you never studied for

A popular technique in current AI is test-time training. Beck thinks the direction is right but the execution is backwards.

A standard neural network learns during training and then freezes. Test-time training unlocks part of the network during deployment, allowing it to adapt on the fly when real data arrives. This moves in the direction of energy-based models, where internal states are always being tuned. Beck approves of the destination.

His objection is to the route. The network was trained without this ability, then expected to use it gracefully when it matters most. It is like forbidding a student from using a calculator all year, then handing one over during the final exam and expecting them to know which buttons to press. The skill of adapting in real time needs to be practiced during training, not switched on at the last moment.
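
For readers who want the mechanics, here is a minimal PyTorch-flavored sketch of the pattern Beck is critiquing: freeze the trained task head, unlock the encoder, and adapt it on unlabeled test data with a self-supervised loss. The architecture and the consistency loss are placeholders, not any specific published method.

```python
# Test-time training, schematically: adapt part of a trained network
# on incoming unlabeled data. Sizes, loss, and step counts are
# illustrative placeholders.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
head = nn.Linear(64, 10)            # task head, frozen at test time

for p in head.parameters():
    p.requires_grad = False         # only the encoder adapts

opt = torch.optim.SGD(encoder.parameters(), lr=1e-3)

def self_supervised_loss(model, x):
    """Placeholder objective: agree with yourself across two noisy views."""
    z1 = model(x + 0.01 * torch.randn_like(x))
    z2 = model(x + 0.01 * torch.randn_like(x))
    return ((z1 - z2) ** 2).mean()

x_test = torch.randn(16, 32)        # a batch of unlabeled test inputs
for _ in range(5):                  # a few adaptation steps per batch
    loss = self_supervised_loss(encoder, x_test)
    opt.zero_grad()
    loss.backward()
    opt.step()

prediction = head(encoder(x_test))  # predict with the adapted encoder
```

Beck's complaint, in these terms: nothing in the original training loop ever rehearsed those adaptation steps, so there is no guarantee the encoder responds to them gracefully.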

Compress first, then learn

You are trying to predict what a friend will say next in conversation. You could attempt to forecast every word, every pause, every filler word. Or you could track only the gist — the main ideas, the trajectory of their argument — and predict at that level. The second approach discards enormous amounts of detail but captures what actually matters.

This is the idea behind JEPA — Joint Embedding Predictive Architecture — championed by Yann LeCun. Instead of predicting raw data (every pixel, every word), you compress both your input and your target into compact representations and learn the relationship there.

“Science is about prediction and data compression. Let’s make that compression explicit on the front end and the back end.”

The catch is almost comical. If you let the compression be too aggressive, both sides collapse into nothing. A student who “summarizes” every book as “stuff happened” has technically produced a summary. It is also worthless. Clever tricks are needed to prevent the compressed representations from becoming trivially empty.
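
Here is a minimal sketch of the joint-embedding idea with one such trick included: a variance penalty, in the spirit of VICReg, that punishes embeddings for shrinking toward a constant. Encoder sizes and loss weights are invented for illustration.

```python
# Joint-embedding prediction, schematically: compress input and target,
# predict the target's embedding (never its raw values), and penalize
# collapsed embeddings. All shapes and weights are illustrative.
import torch
import torch.nn as nn

enc_x = nn.Linear(128, 32)       # compress the input (context)
enc_y = nn.Linear(128, 32)       # compress the target
predictor = nn.Linear(32, 32)    # map input embedding to target embedding

def jepa_style_loss(x, y, eps=1e-4):
    zx, zy = enc_x(x), enc_y(y)
    # Predict in embedding space; stop-gradient on the target side.
    pred_error = ((predictor(zx) - zy.detach()) ** 2).mean()
    # Anti-collapse: each embedding dimension should keep some variance,
    # otherwise "stuff happened" becomes the summary of everything.
    std = (zy.var(dim=0) + eps).sqrt()
    variance_penalty = torch.relu(1.0 - std).mean()
    return pred_error + variance_penalty

x = torch.randn(64, 128)             # e.g. the visible part of the data
y = x + 0.1 * torch.randn(64, 128)   # e.g. the part to be predicted
loss = jepa_style_loss(x, y)
loss.backward()
```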

Beck notes that scientists have been doing a low-tech version of this forever. PCA — principal component analysis — finds the biggest patterns in data and discards the rest. It works, but it has a blind spot that matters enormously in neuroscience. PCA keeps whatever varies the most and throws away the quiet dimensions. In brain data, the quiet dimensions — the ones that barely change — are often the most interesting. Learning the compression and the prediction jointly, rather than in separate steps, avoids this trap.
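
The blind spot is easy to reproduce. In the synthetic example below, the signal lives entirely in a low-variance dimension, and one-component PCA throws it away; the data is made up purely to show the effect.

```python
# PCA keeps the loudest dimension, even when the quiet one carries
# all the information. Synthetic data, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
loud = rng.normal(scale=10.0, size=n)          # high variance, pure noise
label = rng.integers(0, 2, size=n)
quiet = label + rng.normal(scale=0.1, size=n)  # low variance, carries the label
X = np.column_stack([loud, quiet])

# PCA via the covariance eigendecomposition, keeping one component.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
top = eigvecs[:, np.argmax(eigvals)]           # dominated by the noisy axis
projected = Xc @ top

print(np.corrcoef(quiet, label)[0, 1])       # close to 1: signal is there
print(np.corrcoef(projected, label)[0, 1])   # close to 0: PCA discarded it
```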

The brain as Lego bricks

This is the central argument of the conversation.

The brain did not evolve as one general-purpose machine. It evolved as a collection of small, focused modules — one region good at processing smells, another at recognizing faces, another at tracking movement — that gradually learned to wire themselves together. New abilities did not emerge from any single module becoming smarter. They emerged from modules connecting in configurations that had never existed before.

A city works the same way. A city is not intelligent because any one resident is a genius. It is intelligent because it contains specialists — plumbers, doctors, engineers, lawyers — who communicate and coordinate. The intelligence lives in the network, not in any individual node.

Beck raises the sense of smell as an underappreciated case study. Vision has clean structure — spatial regularities, smooth transformations, predictable motion. Smell has none of that. The world of odors is wildly combinatorial, with no tidy geometry. The brain region that evolved to handle this chaos may have been the ancestor of the frontal cortex — the part responsible for planning and abstract thought. The ability to plan your career may trace back to your ancestors trying to make sense of what they were smelling.

“Don’t quote me on that. There’s a lot of disagreement there.”

This principle of modularity — specialized parts that can be mixed and recombined — is what Beck believes real intelligence looks like. Creativity, in this framework, is assembly. Taking pieces that already exist and clicking them together in a way no one has tried before.

“AGI seems like a bit of a misnomer to me. What we really want is not artificial general intelligence. We want collective specialized intelligences.”

The critical missing piece in current AI, he argues, is the ability to keep learning on the job. To encounter something genuinely novel, recognize that existing modules cannot handle it, and build a new module on the spot. He points to GFlowNets — from Yoshua Bengio’s lab — as a step in this direction. A generative model of generative models: a system that can spin up new components when existing ones prove insufficient. A company that creates new departments as needed, rather than routing every problem through the departments it already has.

The real danger is sloppy goal-setting

Beck is not worried about Skynet. He is worried about humans being imprecise.

The scenario is familiar but he states it cleanly. Tell a sufficiently powerful, sufficiently literal system to “end world hunger.” If eliminating all hungry people turns out to be the most efficient solution, the machine will consider it. It did exactly what it was told. The problem was in the telling.

His proposed fix is less familiar and more interesting. Do not start from an idealized goal. Start from reality. Observe the current distribution of human actions and outcomes — all the messy, imperfect patterns of how people actually live. Use that as your baseline reward function. Then perturb it. Shift the distribution slightly toward less hunger. Evaluate the consequences. Shift again.
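
One way to make "start from reality and nudge" precise (this formalization is my gloss, not math stated in the conversation) is a trust-region objective: improve the reward while staying within a small divergence of the distribution you started from.

```latex
% Iterate from the observed distribution \pi_0 of human actions:
% improve the reward r, but keep each step within a KL ball of size
% \epsilon around the previous distribution, re-evaluating between steps.
\pi_{t+1} = \arg\max_{\pi}\; \mathbb{E}_{a \sim \pi}\left[r(a)\right]
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\left(\pi \,\middle\|\, \pi_t\right) \le \epsilon,
\qquad \pi_0 = \text{observed human behavior}
% Small \epsilon turns "end world hunger" from one jump to an idealized
% target into a sequence of small, evaluable shifts.
```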

It is the difference between a doctor who says “let’s try a low dose and see how you respond” and one who announces “I have decided what perfect health looks like — take everything at once.”

His worst-case scenario is not killer robots. It is humans reduced to “value function selectors” — their only remaining role being to say “I like that outcome” and “I don’t like that one” while machines handle everything else. But he does not think it will happen. His optimism rests not on the technology but on people.

“People are too clever and people are too motivated and people are too interested in how the world really works… AI will become a partner, not an adversary or a crutch.”

Claude’s Take

Beck thinks from first principles, which makes him more interesting than most voices in this space. He is not reacting to the paper of the week. He has a framework — probabilistic, rooted in physics, built around modularity — and he follows it consistently, even when it leads somewhere uncomfortable.

The strongest idea here is “collective specialized intelligences.” It maps cleanly onto how complexity has actually scaled through history: cells specialized, organs specialized, organisms specialized, civilizations specialized. Division of labor may be the single most powerful pattern in the history of complex systems. Whether AI will develop this way is a separate question. The current trend is aggressively toward one large general system, and it is working uncomfortably well. Beck may be right about the destination while being wrong about the road.

His AI safety proposal — start from the current state of the world, then make small perturbations and evaluate — is quietly the most interesting thing in this conversation. It does not solve the alignment problem so much as it makes failure gradual instead of catastrophic. The obvious weakness: the current state of the world includes a great deal of suffering that you might not want to preserve as your baseline. But “start from reality and nudge carefully” is considerably less dangerous than “imagine the perfect world and aim for it all at once.”

The weakest point is his insistence that agents must be physical. His entire framework is about spectrums, degrees, and error bars — everything is a matter of degree, nothing has a hard boundary — and then he draws a hard boundary around embodiment. He feels the tension himself but does not resolve it. It reads more like intuition than something that follows from his own premises.

One thing worth noting: Beck is honest about where his knowledge ends. He hedges on the smell-to-frontal-cortex evolutionary claim. He acknowledges he is not an expert on test-time training implementation details. In a field where people confidently hold forth on everything, that kind of honesty is worth more than it gets credit for.