Inference, not prediction — Prof. Michael I. Jordan on what modern AI is still missing

ELI5/TLDR

Michael Jordan — the ML one, not the basketball one — argues that the entire AI conversation is stuck inside the wrong picture. We keep talking about a single big brain in a box that thinks for us. He says intelligence is mostly a social, economic thing. Tomatoes show up at your restaurant every morning not because some super-intelligent forager solved a planning problem, but because a market did. The next wave of AI shouldn’t be one giant model whispering in your ear — it should be lots of statistical pieces stitched together by markets, contracts, and incentives. He’s not anti-LLM. He’s anti-the-story-we’re-telling-around-it.

The Full Story

Why a statistician is annoyed by the word “AI”

Jordan opens by pointing out that he never thought of himself as an AI researcher. He came up in statistics and operations research, in a tradition that built decision trees, hidden Markov models, logistic regression — the boring stuff that runs Amazon’s supply chain and got you your packages on time. That tradition has been called “machine learning” for decades. The word “AI”, he says, is a buzzword that came back into fashion only when neural nets started spitting out fluent English. It’s now actively distorting how research, business, and young people’s careers are being shaped.

The AI buzzword returned because of LLMs. And now to my view it’s been a distortionary effect on the path of research.

He’s allergic to the term AGI for the same reason — it’s PR. The real damage, he argues, is to 20- and 25-year-olds. They’re told they have two options: build the superintelligence or fear the superintelligence. Exuberance or alarmism. Nothing in between. That’s not how any previous era of engineering worked.

The pitch: a collectivist, economic view of AI

His paper is called “A Collectivist, Economic Perspective on AI”. The pitch is short. These models are built on inputs from billions of people. They’re meant to serve billions of people. There’s already a vast network in there; we just pretend it’s one big brain.

If you take that seriously, you stop thinking about AI as an “agent” and start thinking about it as a piece of an ecosystem. The right questions become: who are the participants, what do they want, what do they know, what are they hiding, how do they cooperate, how do they compete, how do we keep them from hurting each other. Those are economics questions. And they have mathematics.

When I say collectivist I just mean that most of this technology is based on inputs from billions of people. So there’s already a collective putting input in. And it’s meant to serve billions. So there’s a collective it’s serving.

Why “just scale LLMs into agents” is the wrong instinct

Ilya Sutskever and others say: don’t worry, we’ll just turn LLMs into multi-agent systems and the economics will sort itself out for free. Jordan thinks this is engineering naivety. Chemical engineers in the 1940s could have said the same thing — just throw chemicals together and see what happens. You’d get explosions. We did get explosions. Facebook is an explosion. Teen mental health is an explosion. The fact that you can build something does not mean you should hand it the wheel and hope.

He keeps coming back to a comparison: every other engineering discipline had something like Maxwell’s equations or Newton’s laws — a real intellectual scaffolding. Modern AI, in his telling, has gradient descent and a lot of intuition, and that’s roughly it.

“Understanding” is a media word, not an engineering one

A long stretch of the conversation circles around the word “understand”. The host (Tim Scarfe) tries to coax Jordan into saying that AlphaFold “understands” proteins in some weak sense — it refines, iterates, gets better with more passes. Jordan won’t bite.

I don’t think we need to see this anthropomorphizing of intelligence and understanding. It’s not necessary, not appropriate, and is a distraction for many many problems.

His example is Amazon’s old supply chain models. No human understood what was happening inside that giant statistical box. It didn’t matter. The box reduced uncertainty enough that you could build planning and stockpiling and engineering around it. Whether it “understood” logistics is a media question. Whether it lowered the variance on delivery times is an engineering question. He thinks we should ask the second one and let the first one go.

The AlphaFold story, and why bias hides where it hurts most

This is the most concrete bit. Jordan’s group looked at AlphaFold’s 200 million predicted protein structures and asked a specific question: are quantum fluctuations in a protein associated with being phosphorylated (active in the cell)? With only the 200,000 experimentally crystallised structures, you don’t have enough data to reject the null hypothesis with confidence. With 200 million predicted ones, you do — but the confidence interval lands miles away from the truth.

Why? Because there are very few proteins with quantum fluctuations in AlphaFold’s training data. AlphaFold is excellent on average and badly biased on exactly the questions scientists are most likely to ask, because scientists ask about the edges of knowledge, not the middle of it.

Jordan’s group developed something called prediction-powered inference to fix this. You take the foundation model’s predictions and merge in a small dose of ground-truth data, and the error bars now actually cover the truth while staying narrow. Think of the foundation model as a fast but biased oracle, and a tiny patch of real data as the correction tape.

Specifically these foundation models will be most poor and most highly biased where new questions are being asked… So there needs to be around any foundation model the ability to collect a bit of ground truth data, merge it in with some procedure like this, and then give out a more trustable answer.

This is the template for how he thinks foundation models should live in the world: not as oracles you trust, but as priors you correct.
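
The “merge in a small dose of ground truth” step can be sketched for the simplest case, estimating a mean. This is a minimal illustration, not the group’s code: the simulated data, the 0.3 bias, and the names ppi_mean_ci and demo are all assumptions made up for the example. The estimator takes the model’s average prediction on the big unlabeled set and subtracts the model’s measured average error on the small labeled set; the interval adds the two sources of variance.

```python
import numpy as np

def ppi_mean_ci(y_labeled, pred_labeled, pred_unlabeled, z=1.96):
    """Prediction-powered estimate of a mean with a normal-approximation 95% CI."""
    y_labeled = np.asarray(y_labeled, dtype=float)
    pred_labeled = np.asarray(pred_labeled, dtype=float)
    pred_unlabeled = np.asarray(pred_unlabeled, dtype=float)
    n, N = len(y_labeled), len(pred_unlabeled)
    # Rectifier: how far the model is off, measured on the small ground-truth set.
    rectifier = y_labeled - pred_labeled
    estimate = pred_unlabeled.mean() + rectifier.mean()
    # Variance from the predictions (shrinks with N) plus the rectifier (shrinks with n).
    se = np.sqrt(pred_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    return estimate, (estimate - z * se, estimate + z * se)

def demo(seed=0, n_labeled=500, n_unlabeled=100_000, bias=0.3):
    """Simulated comparison; every number here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    truth = 1.0

    def model(y):
        # Hypothetical predictor: tracks the outcome closely but is shifted by `bias`.
        return y + bias + rng.normal(0.0, 0.2, size=y.shape)

    y_lab = rng.normal(truth, 1.0, size=n_labeled)
    y_unlab = rng.normal(truth, 1.0, size=n_unlabeled)  # the analyst never sees these
    pred_lab, pred_unlab = model(y_lab), model(y_unlab)

    def mean_ci(x):
        se = x.std(ddof=1) / np.sqrt(len(x))
        return (x.mean() - 1.96 * se, x.mean() + 1.96 * se)

    estimate, ppi = ppi_mean_ci(y_lab, pred_lab, pred_unlab)
    return {
        "truth": truth,
        "naive": mean_ci(pred_unlab),   # predictions treated as if they were data
        "classical": mean_ci(y_lab),    # ground truth only
        "ppi_estimate": estimate,
        "ppi": ppi,
    }

if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

In this simulation the prediction-only interval is very narrow and centred on the wrong value, the ground-truth-only interval is honest but wide, and the corrected interval is centred near the truth while being much narrower than the ground-truth-only one. That is the pattern the AlphaFold story describes.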

A toy model of a data market

Jordan walks through a “Bohr-atom” version of how to think economically about data. Three layers:

Users send data to a platform (say, Mastercard).
The platform uses the data to improve its service. Nice little loop.
The platform also sells the data to third-party buyers.

The moment you add the third layer, the user has lost a slice of privacy without being compensated. In a healthy market, three things happen: platforms start offering tunable differential privacy (one offers level 0.3, another offers level 7), users pick the level they like, and data buyers start paying more for less-noisy data. Now you have a real Stackelberg game with statistical assertions baked into the math. The task is no longer pure optimisation; it is equilibrium-finding.

This is the kind of analysis that ML people don’t do because they’re trained as optimisers, and economists don’t do because they don’t have the data. Jordan thinks the future has to merge the two.

Machine learning people are really good at optimization, but this is not an optimization problem.
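
To make “tunable differential privacy” and “less-noisy data” concrete, here is a small sketch of the standard Laplace mechanism. It assumes the “levels” 0.3 and 7 in the toy model are values of the privacy parameter epsilon, which the summary does not state; the spending data and the function names are hypothetical. It covers only the noise-versus-privacy dial, not the Stackelberg equilibrium itself.

```python
import numpy as np

def noise_scale(n_records, epsilon, lower, upper):
    """Laplace scale needed to release a mean of n bounded records with epsilon-DP."""
    # Changing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n_records
    return sensitivity / epsilon

def private_mean(values, epsilon, lower, upper, rng):
    """Release the mean of `values` under epsilon-differential privacy."""
    clipped = np.clip(values, lower, upper)
    scale = noise_scale(len(clipped), epsilon, lower, upper)
    return clipped.mean() + rng.laplace(loc=0.0, scale=scale)

def average_error(epsilon, n_users=1000, n_repeats=2000, seed=0):
    """Mean absolute error a data buyer sees at a given privacy level (simulated)."""
    rng = np.random.default_rng(seed)
    spend = rng.uniform(0, 100, size=n_users)  # hypothetical card spend per user
    errors = [
        abs(private_mean(spend, epsilon, 0, 100, rng) - spend.mean())
        for _ in range(n_repeats)
    ]
    return float(np.mean(errors))

if __name__ == "__main__":
    for eps in (0.3, 7.0):
        print(f"epsilon={eps}: average error {average_error(eps):.4f}")
```

A smaller epsilon means stronger privacy and noisier data. That gap in error is what a buyer would be paying to close, and what a user would want to be compensated for.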

The Spotify problem, the YouTube mistake

He uses Spotify as a case of incentives gone slightly wrong. Spotify is close to a monopoly, prices aren’t being set competitively, artists get pennies, and now Spotify itself is incentivised to generate AI music — because the cheaper the supply, the higher the margin. His view is that another market should emerge to fix this; he’s an advisor to United Masters, which is trying to.

He’s harder on Google. YouTube, he argues, was a moment when Google could have built a real producer-consumer market between viewers and creators — direct economic flows, signal about who’s worth supporting. Instead they wedged in an ad layer and kept most of the money. Facebook then made that pattern worse.

Mechanism design, in one breath

The host asks about game theory. Jordan gives one of the cleanest definitions I’ve seen.

Game theory is forward: you write down a game, you compute the equilibria, you predict what will happen. Like F = ma but for strategic actors.

Mechanism design is the inverse problem. You start with the outcome you want — fair allocation, honest reporting, a working market — and you design the game whose equilibrium produces that outcome. Engineering goes backward from goal to design. Mechanism design is the engineering version of game theory.

The inverse of game theory is what’s called mechanism design. I want a certain outcome in the world — that this person gets paid, that the wealth is divided equally, that some fairness or some market is created. What game do I design so that that outcome is realized?

Within mechanism design there are sub-fields: contract theory (two parties, asymmetric information) and auction theory (many symmetric bidders, designed to reveal value). Jordan’s research lives mostly in contract theory.
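
A textbook instance of going backward from goal to design is the sealed-bid second-price auction: if the goal is honest reporting of value, charge the winner the second-highest bid. The sketch below is that classic example, not something from the conversation, and the function names are illustrative. It checks by brute force that bidding your true value is never beaten under the second-price rule, and that the same is not true when the winner pays their own bid.

```python
def run_auction(bids, pay_second_price=True):
    """Highest bid wins (earliest bidder wins ties); returns (winner_index, price)."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = order[0]
    price = bids[order[1]] if pay_second_price else bids[winner]
    return winner, price

def utility(value, my_bid, other_bids, pay_second_price=True):
    """Payoff to bidder 0: value minus price if they win, zero otherwise."""
    winner, price = run_auction([my_bid] + list(other_bids), pay_second_price)
    return value - price if winner == 0 else 0

def truthful_is_best(value, other_bids, candidate_bids, pay_second_price=True):
    """True if no candidate bid earns more than bidding the true value."""
    honest = utility(value, value, other_bids, pay_second_price)
    return all(
        utility(value, bid, other_bids, pay_second_price) <= honest
        for bid in candidate_bids
    )

if __name__ == "__main__":
    grid = range(0, 21)
    # Second-price rule: honesty is never beaten, whatever the others bid.
    print(all(
        truthful_is_best(v, [a, b], grid)
        for v in grid for a in grid for b in grid
    ))
    # Pay-your-bid rule: with value 10 against bids of 6 and 3, shading to 7 pays more.
    print(truthful_is_best(10, [6, 3], grid, pay_second_price=False))
```

The only thing that differs between the two runs is the payment rule. That is the design lever: the rules of the game are chosen so that the behaviour you want is the equilibrium.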

Statistical contract theory and e-values

This part is dense. The core idea: classical p-values are a one-shot tool. If you peek at the data and stop when the p-value looks good, you’re cheating (“p-hacking”) and your math is wrong. An e-value is a different object: a non-negative random variable whose expectation is at most 1 under the null hypothesis, so a large observed value counts as evidence against the null. The trick is that a running sequence of e-values can be monitored continuously and stopped whenever you like, without breaking the statistical guarantee. Vladimir Vovk and others have built a whole theory around this called “anytime inference” (the literature usually says “anytime-valid inference”).

Jordan’s group has shown that incentive compatibility in contract theory is equivalent to the e-value condition in statistics. A contract that doesn’t let you game the system, mathematically, looks like a statistical procedure that doesn’t let you p-hack. Two fields, same underlying object. He thinks this is the kind of result that will eventually be everywhere.
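
The statistical half of that equivalence can be seen in a small simulation. This is a sketch of the generic idea only, not of the contract-theory result: the fair-coin null, the 0.6 alternative, and the function names are assumptions chosen for illustration. Betting on heads with a likelihood ratio gives a running product whose expected per-flip factor is exactly 1 when the coin is fair, so by Ville’s inequality the chance that it ever reaches 1/alpha is at most alpha, however often you look. A z-test re-run after every flip has no such protection.

```python
import numpy as np

def e_process(flips, q_alt=0.6, p_null=0.5):
    """Running likelihood-ratio product for H0: P(heads) = p_null, betting on q_alt."""
    flips = np.asarray(flips)
    factors = np.where(flips == 1, q_alt / p_null, (1 - q_alt) / (1 - p_null))
    return np.cumprod(factors)

def e_value_rejects(flips, alpha=0.05):
    """Stop and reject the first time the running product reaches 1 / alpha."""
    return bool(np.any(e_process(flips) >= 1 / alpha))

def peeking_z_test_rejects(flips, z_crit=1.96, min_n=10):
    """Re-run a two-sided z-test after every flip; stop at the first 'significant' look."""
    flips = np.asarray(flips)
    n = np.arange(1, len(flips) + 1)
    z = (np.cumsum(flips) - 0.5 * n) / np.sqrt(0.25 * n)
    return bool(np.any(np.abs(z[min_n - 1:]) > z_crit))

def false_alarm_rates(n_runs=2000, n_flips=500, seed=0):
    """Fraction of fair-coin experiments (H0 true) that each procedure rejects."""
    rng = np.random.default_rng(seed)
    data = rng.integers(0, 2, size=(n_runs, n_flips))
    e_rate = float(np.mean([e_value_rejects(row) for row in data]))
    peek_rate = float(np.mean([peeking_z_test_rejects(row) for row in data]))
    return e_rate, peek_rate

if __name__ == "__main__":
    e_rate, peek_rate = false_alarm_rates()
    print(f"e-value monitoring false alarms: {e_rate:.3f}")
    print(f"peeking z-test false alarms:     {peek_rate:.3f}")
```

The two procedures are not perfectly matched (the bet here is one-sided and the z-test is two-sided), but the contrast is the point: the monitored product keeps its error guarantee under optional stopping, while the repeatedly checked z-test rejects a true null far more often than its nominal 5%.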

The duck on the lake

He gives a small parable about why LLMs don’t have any real sense of uncertainty. Imagine a duck. There’s twice as much grain on the left side of the lake as on the right. A Bayesian duck, optimising expected value, would go left every time. Real ducks don’t. They go left two-thirds of the time and right one-third. They’re getting the ratio right.

Why? Because if every duck went left, there’d be a crowd on the left and free grain on the right. The 2:1 mixed strategy is actually the Nash equilibrium of the duck population. Uncertainty isn’t a property of one head — it lives in the context of the population. You can’t read it off a single model.
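
The equilibrium claim is easy to check with a toy simulation. This is a sketch under simple assumptions that are not in the talk: 300 ducks, grain shared equally among the ducks on each side, and ducks switching sides one at a time whenever switching would give them a strictly bigger share. The name settle_ducks is made up for the example.

```python
def settle_ducks(n_ducks=300, grain_left=2, grain_right=1):
    """Let ducks switch sides one at a time until no duck can eat more by moving."""
    left, right = n_ducks, 0  # start with everyone following the 'always go left' rule
    while True:
        # A left duck gets grain_left / left now and grain_right / (right + 1) if it moves.
        # The comparison is cross-multiplied to stay in exact integer arithmetic.
        if left > 0 and grain_right * left > grain_left * (right + 1):
            left, right = left - 1, right + 1
        # A right duck gets grain_right / right now and grain_left / (left + 1) if it moves.
        elif right > 0 and grain_left * right > grain_right * (left + 1):
            left, right = left + 1, right - 1
        else:
            return left, right  # nobody gains by moving: an equilibrium

if __name__ == "__main__":
    print(settle_ducks())  # → (200, 100)
```

Starting from the “everyone goes left” crowd, the flock settles at 200 on the left and 100 on the right, the same 2:1 split as the grain. No individual duck computes that ratio; it falls out of each one reacting to the others.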

He sketches three kinds of uncertainty an LLM has none of:

Sampling uncertainty — the classical statistical kind, about how much data you have.
Information asymmetry — the economic kind, about what the other side knows and isn’t telling you.
Provenance — the database kind, about how old or how trustworthy this data point is. (His example: medical operation data from ten years ago should widen your confidence interval today.)

LLMs do none of these. When you ask one “how sure are you?”, it’s not reasoning — it’s mimicking the sentences humans wrote on the internet when they were asked the same thing.

Markets as uncertainty-reducers

The closing image is the best. Jordan, building his hypothetical pizza restaurant, doesn’t have to go forage for tomatoes every morning. There’s a market. Someone else solved the foraging problem, and the price signal tells him roughly how many tomatoes he can expect. His uncertainty about tomato supply has been crushed to almost zero, without anyone running an experiment design or a multi-armed bandit on his behalf. The market did it through distributed incentives.

Markets mitigate uncertainty. And they don’t do it because someone designed an optimal experiment design. They do it because the market tried various things out there — there’s incentives for people to explore and exploit.

This is the punchline of the whole talk. Intelligence, the way he wants us to think about it, is mostly distributed across a network of self-interested agents whose incentives have been shaped just well enough that the system works. Not one mind. A web of them.

Key Takeaways

The triangle. Jordan’s mental model for the next era: economics (incentives), statistics (uncertainty), and computer science (algorithms). LLMs sit inside the computer-science corner. They need the other two corners around them to be useful.
Mechanism design is the inverse of game theory. Game theory predicts outcomes from games. Mechanism design designs games to produce desired outcomes. Most useful AI work, he argues, is mechanism design in disguise.
Foundation models are biased exactly where it matters. They’re accurate on the bulk of the training distribution and fail on the edge of knowledge, which is where scientific questions live. AlphaFold was excellent on average, but treating its predictions as if they were data gave dangerously narrow, dangerously wrong confidence intervals on questions about under-represented protein behaviours.
Prediction-powered inference. A method his group built to merge a foundation model’s predictions with a small amount of ground-truth data so that the resulting error bars actually cover the truth. The general template for using LLMs in serious decision-making.
The three-layer data market. Users → platforms → third-party buyers. The moment the third layer exists, you need privacy as a tunable parameter and you need to find the Stackelberg equilibrium, not just optimise.
E-values and “anytime inference”. The framework built by Vladimir Vovk and others lets you peek at data and stop whenever you want without breaking statistical guarantees. Jordan’s contract-theory result: incentive compatibility ↔ the e-value condition.
Ducks on the lake. Real ducks distribute themselves across resources in proportion to abundance, not deterministically. Uncertainty is a population-level property, not a model-level one.
Three kinds of uncertainty LLMs miss: sampling, information asymmetry, and provenance (how old / how trustworthy a piece of data is).
Markets reduce uncertainty for free. The tomato example: nobody runs an experiment to forecast supply for your restaurant; the market does it via distributed incentives. This is what AI ecosystems should look like — not one oracle, but many specialised pieces wired together by incentives.
Why scaling won’t get us to AGI in his view. Scale gives you a bigger optimisation engine over more data. It doesn’t give you equilibrium-finding, incentive design, or honest uncertainty quantification. Those need different math.
The young-people argument. Jordan keeps returning to this. The “utopia or extinction” dialogue is demoralising for the generation that should be building useful, mid-scale, economically literate AI systems. Most of his anger at Silicon Valley is about the message it’s sending downstream.

Claude’s Take

This is one of those conversations where the speaker is mostly right about what’s missing, and not always right about what’s possible. Jordan is a serious thinker — graphical models, variational inference, he’s the real article — and his frustration with the LLM-as-AGI narrative isn’t snobbery. It’s the frustration of someone who watched the field he helped build get rebranded around a single architecture.

The strongest parts of the talk are the technical ones. Prediction-powered inference is real and useful. The AlphaFold bias story is a clean, specific example of what goes wrong when you treat a foundation model as ground truth on the edge of knowledge. The mechanism-design framing — game theory inverted — is a genuinely good lens. The duck parable is the kind of thing you remember. The e-value / contract-theory equivalence is the sort of result that suggests there’s real intellectual structure underneath, not just metaphors.

Where he’s weaker is when he reaches for the rhetorical hammer. The dismissal of “understanding” as a media word is convenient: it lets him sidestep the actual question of whether LLMs have any internal model of the world, which is at least worth asking. The comparison of modern AI to 1940s chemical engineers throwing chemicals together, set against disciplines that had Maxwell’s equations or Newton’s laws, is colourful but a little smug; it’s not obvious that statistical learning lacks principles, just that the principles aren’t the ones he prefers. And the repeated jabs at Sutskever, Altman, Hinton, and Russell start to read less like critique and more like territory marking. Plenty of people who care deeply about uncertainty and economics still think LLMs are doing something more interesting than next-token prediction.

The “demoralising young people” argument is also doing more work than it should. Yes, the doom-vs-utopia framing is dumb. But that’s a culture problem, not evidence about the technology itself. You can think both that the discourse is bad and that the models are interesting.

What’s worth keeping is the positive program. The three-corner triangle — economics, statistics, computation — is a real research agenda, not just a critique. The idea that the future is many specialised pieces wired together by markets, rather than one big brain, is the most plausible structural prediction I’ve heard in a while. And the insistence that uncertainty is a population property, not a head property, is a useful corrective to anyone who keeps asking their chatbot how sure it is.

Score: 9/10. Loses a point for the territorial bits and the slight tendency to use “you can’t because chemical engineering” as an argument. Earns it back, and then some, for the actual technical content — prediction-powered inference, e-values, mechanism design as the inverse of game theory, and the duck. Worth a second pass if any of those rang a bell.