Bell's Theorem, a Glitch in Reality

ELI5/TLDR

If two tiny particles are born together in a particular linked state and then shot in opposite directions, measuring one of them along some direction instantly tells you what the other will show along that same direction, even when they’re too far apart for any signal to pass between them. For nearly thirty years after this was noticed in 1935, people hoped the weirdness was just missing information: little secret variables the particles carried with them like invisible luggage. In 1964, a physicist named John Bell did the math and showed that no such luggage, no matter how cleverly you design it, can ever reproduce the numbers we actually measure. Something in reality genuinely doesn’t care about the speed of light the way the rest of the universe does.

The Full Story

The paper this is about

Richard Behiel walks through John Bell’s 1964 paper “On the Einstein-Podolsky-Rosen Paradox” line by line. The paper is famously short and famously cryptic — six parts, heavy on equations, light on words. The video’s job is to unpack what Bell actually said, without the usual pop-physics hand-waving about spooky action and consciousness.

Bell’s paper is a direct reply to a 1935 argument by Einstein, Podolsky, and Rosen (EPR). EPR had looked at quantum mechanics and said: either this theory is incomplete, or reality is deeply weird. They preferred the first option. Bell, thirty years later, closes the door on that preference.

Spin, and a warm-up experiment

The whole argument runs on a particular kind of particle — spin-1/2 particles, which for our purposes means silver atoms behaving like tiny magnets. The classic experiment is the Stern-Gerlach setup. Imagine a hot oven full of silver. Atoms evaporate, fly out in a beam, and pass through a strong non-uniform magnetic field. The beam splits cleanly into two — not a smear, not a spread, but two discrete beams. Call them “spin up” and “spin down.”

Already this is strange. A beam of classical little magnets with random orientations should smear out into a continuous band, one landing spot for every possible tilt. Instead, nature forces a binary answer.

Now stack two of these magnets in a row. Filter the first one so only spin-up atoms continue. Rotate the second magnet by some angle theta. Measured along this new axis, some atoms now come out spin-up and some spin-down, and the split follows a specific curve:

Probability of spin-up = cos²(theta/2)
Probability of spin-down = sin²(theta/2)

At zero tilt, everything stays up. At 90 degrees, it’s a perfect 50/50. At 180 degrees, everything flips. This cos² curve is the hard experimental fact the rest of the video is built around.
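A few lines of Python make the shape of this curve concrete. This is a minimal sketch: the helper name `stern_gerlach_probs` is hypothetical, and theta is taken in radians.

```python
import math


def stern_gerlach_probs(theta):
    """Quantum prediction for a spin-up atom entering a second magnet tilted by theta (radians).

    Returns (P(spin-up), P(spin-down)) measured along the tilted axis.
    """
    p_up = math.cos(theta / 2) ** 2
    p_down = math.sin(theta / 2) ** 2
    return p_up, p_down


for degrees in (0, 45, 90, 135, 180):
    up, down = stern_gerlach_probs(math.radians(degrees))
    print(f"{degrees:>3} deg: up {up:.3f}, down {down:.3f}")
```

The two probabilities always sum to 1, and at 45 degrees the split is already about 85/15.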

The instinctive fix, and why it fails

Here’s what any sensible person wants to do. Maybe each atom carries a little arrow — call it lambda — pointing in some direction. The magnet just sorts atoms by whether their arrow points a bit up or a bit down. No quantum weirdness, just hidden luggage.

This is a “local hidden variable” model. Local because the atom only carries what it has when it leaves the oven. Hidden because we can’t see the arrow directly, only the consequences.

The problem: if you work out the math, this model predicts a probability that’s linear in theta, not cos². Picture two overlapping hemispheres: the arrows allowed by the first magnet’s polarization, and the arrows the second magnet will read as “up.” The probability of a spin-up result is the fraction of the first hemisphere that also lies in the second, and that fraction falls in a straight line with the tilt: 1 − theta/π. A straight line is not a curve. The model is wrong.
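The straight-line claim is easy to check numerically. The sketch below is a Monte Carlo version of the arrow model under stated assumptions: the first magnet polarizes atoms along +z, the hidden arrow lambda is spread uniformly over the hemisphere lambda·p > 0, and the second magnet reports “up” whenever lambda·a > 0. All function names are hypothetical.

```python
import math
import random


def unit_vector_in_xz(angle):
    """Unit vector `angle` radians away from +z, tilted toward +x."""
    return (math.sin(angle), 0.0, math.cos(angle))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def random_unit_vector(rng):
    """Uniform direction on the sphere: normalize a 3D Gaussian sample."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            return tuple(x / norm for x in v)


def naive_p_up(theta, n=50_000, seed=0):
    """Fraction of 'spin-up' results in the naive arrow model (hypothetical helper)."""
    rng = random.Random(seed)
    p = unit_vector_in_xz(0.0)        # polarization set by the first magnet
    a = unit_vector_in_xz(theta)      # second magnet's axis, tilted by theta
    ups = 0
    for _ in range(n):
        lam = random_unit_vector(rng)
        if dot(lam, p) < 0:           # reflect into the hemisphere lam·p > 0 (keeps it uniform)
            lam = tuple(-x for x in lam)
        if dot(lam, a) > 0:
            ups += 1
    return ups / n


for degrees in (0, 45, 90, 135, 180):
    theta = math.radians(degrees)
    print(f"{degrees:>3} deg  model {naive_p_up(theta):.3f}  "
          f"line {1 - theta / math.pi:.3f}  quantum {math.cos(theta / 2) ** 2:.3f}")
```

The model tracks the straight line 1 − theta/π and agrees with cos²(theta/2) only at 0, 90 and 180 degrees.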

You can patch it by defining an “effective measurement axis” that’s secretly bent toward the polarization vector. Behiel calls this “the sketchy move.” It technically works — you can warp the line into the cos² curve — but it requires the magnet to somehow know about the particle’s history in a contrived way. It’s legal. It’s not convincing. Bell lets it slide, for now, to stay maximally charitable.
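Here is the sketchy move in numbers, under the same assumptions as the arrow model. If the magnet secretly measures along an effective axis a′ sitting at angle theta′ from the polarization p, the arrow model gives P(up) = 1 − theta′/π, so choosing theta′ = π·sin²(theta/2) lands exactly on the quantum curve. The helper names are hypothetical; note that the bend has to be computed from the tilt relative to p, which is the particle’s history leaking into the magnet.

```python
import math


def effective_angle(theta):
    """Hypothetical angle theta' between the effective axis a' and the polarization p.

    theta is the magnet's real tilt away from p, in radians. The choice
    theta' = pi * sin^2(theta / 2) forces the arrow model's straight line,
    1 - theta' / pi, onto the quantum curve cos^2(theta / 2).
    """
    return math.pi * math.sin(theta / 2) ** 2


def bent_model_p_up(theta):
    """Arrow-model probability of spin-up when the magnet secretly uses the effective axis."""
    return 1 - effective_angle(theta) / math.pi


for degrees in (0, 30, 60, 90, 120, 150, 180):
    theta = math.radians(degrees)
    print(
        f"{degrees:>3} deg: effective axis at {math.degrees(effective_angle(theta)):6.1f} deg, "
        f"P(up) = {bent_model_p_up(theta):.3f} vs cos^2 = {math.cos(theta / 2) ** 2:.3f}"
    )
```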

Two particles, the singlet state, and the real paradox

Bell’s argument really gets going with pairs of particles in what’s called the singlet state. The setup:

Two spin-1/2 particles are created together, then fly apart.
Neither particle has a preferred spin direction before measurement. This is not a classical “one up, one down, we just don’t know which” situation. The pair’s joint state is rotationally symmetric: it looks exactly the same whichever axis you describe it along.
But measure them along the same axis — any axis — and you’ll always get perfectly opposite results. One up, one down. Guaranteed.

This is the strange part. The particles don’t decide their direction until measured, yet whatever they “decide” is perfectly coordinated with a partner potentially miles away.

Quantum mechanics predicts that if you measure particle 1 along axis a and particle 2 along axis b, recording each result as +1 (up) or −1 (down), the average of the product of the two results is:

P(a, b) = −cos(theta)

where theta is the angle between the axes. This is the quantum correlation function. It’s smooth, it’s curved, and it has zero slope at its minimum. Memorize that shape — that’s the target.
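The −cos(theta) formula can be reproduced directly from the singlet state. This is a minimal pure-Python sketch, assuming both detector axes lie in the x-z plane (so every matrix is real) and using the basis order |up,up⟩, |up,down⟩, |down,up⟩, |down,down⟩; the helper names are hypothetical.

```python
import math


def spin_op(phi):
    """sigma·n for the unit vector n = (sin phi, 0, cos phi) in the x-z plane (a real 2x2 matrix)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s], [s, -c]]


def kron(m1, m2):
    """Tensor (Kronecker) product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[m1[i // 2][j // 2] * m2[i % 2][j % 2] for j in range(4)] for i in range(4)]


# Singlet state (|up,down> - |down,up>) / sqrt(2) in the basis |uu>, |ud>, |du>, |dd>.
SINGLET = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]


def singlet_correlation(phi_a, phi_b):
    """<psi| (sigma·a) x (sigma·b) |psi>: the average product of the two +/-1 results."""
    op = kron(spin_op(phi_a), spin_op(phi_b))
    op_psi = [sum(op[i][j] * SINGLET[j] for j in range(4)) for i in range(4)]
    # Every entry is real here, so the bra needs no complex conjugation.
    return sum(SINGLET[i] * op_psi[i] for i in range(4))


for degrees in (0, 45, 90, 135, 180):
    theta = math.radians(degrees)
    print(f"{degrees:>3} deg: singlet {singlet_correlation(0.0, theta):+.3f}, "
          f"-cos {-math.cos(theta):+.3f}")
```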

Bell’s setup, fully general

Bell allows any hidden-variable model you can imagine. Lambda can be a vector, a scalar, a function, a set of functions — anything. There’s a probability distribution rho(lambda) over these hidden variables, and two functions A(a, lambda) and B(b, lambda) that return +1 or −1 depending on what each detector measures.

The crucial locality assumption: A depends only on the setting of detector A and the hidden variables. B depends only on detector B’s setting and the hidden variables. Detector A does not know what detector B is doing. They’re too far apart and the measurements happen too fast for light-speed signaling.

The correlation function is then:

P(a, b) = integral over lambda of rho(lambda) × A(a, lambda) × B(b, lambda)

The question: is there any choice of rho, A, and B that makes this integral equal −cos(theta) for every pair of axes?
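For a concrete instance of this recipe, Bell’s paper itself uses a simple model: lambda is a unit vector spread uniformly over the sphere, A(a, lambda) = sign(a·lambda), and B(b, lambda) = −sign(b·lambda). It gives perfect anticorrelation at theta = 0, but its correlation is the straight line −1 + 2·theta/π. The Monte Carlo sketch below assumes both axes lie in the x-z plane; helper names are hypothetical.

```python
import math
import random


def axis(angle):
    """Unit vector in the x-z plane, `angle` radians from +z."""
    return (math.sin(angle), 0.0, math.cos(angle))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sign(x):
    return 1 if x >= 0 else -1


def random_unit_vector(rng):
    """rho(lambda): uniform direction on the sphere via a normalized 3D Gaussian."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            return tuple(x / norm for x in v)


def A(a, lam):
    return sign(dot(a, lam))       # depends only on detector A's axis and lambda


def B(b, lam):
    return -sign(dot(b, lam))      # depends only on detector B's axis and lambda


def local_correlation(theta, n=50_000, seed=1):
    """Monte Carlo estimate of P(a, b) = integral of rho(lambda) A(a, lambda) B(b, lambda)."""
    rng = random.Random(seed)
    a, b = axis(0.0), axis(theta)
    total = 0
    for _ in range(n):
        lam = random_unit_vector(rng)
        total += A(a, lam) * B(b, lam)
    return total / n


for degrees in (0, 45, 90, 135, 180):
    theta = math.radians(degrees)
    print(f"{degrees:>3} deg  local model {local_correlation(theta):+.3f}  "
          f"line {-1 + 2 * theta / math.pi:+.3f}  quantum {-math.cos(theta):+.3f}")
```

Because A and B each see only their own axis and lambda, the model is local by construction, and the gap from −cos(theta) (about 0.2 at 45 degrees) is exactly the line-versus-curve problem again.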

Two proofs that the answer is no

Bell proves it twice, in two different ways. Both are worth absorbing.

First proof: slopes don’t match. For any model built from this local recipe that also gives perfect anticorrelation when the two axes coincide, Bell derives the bound 1 + P(b, c) ≥ |P(a, b) − P(a, c)|. Now swing the axis c a small angle away from b. The right-hand side, the change in the correlation that this nudge produces, is generally first-order in that small angle, so the left-hand side, which measures how far P(b, c) has climbed above its minimum value of −1, must be at least first-order too. Unless the correlation is a constant, a local hidden variable correlation has a nonzero slope at its minimum. Think of it as the straight-line vs. curve problem again, generalized. The quantum correlation −cos(theta) is flat there: near theta = 0 it rises only as theta²/2, which is second-order. Contradiction.
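The slope argument can be checked numerically with Bell’s inequality 1 + P(b, c) ≥ |P(a, b) − P(a, c)|, which the paper derives for any local model with perfect anticorrelation at equal settings. The sketch below (hypothetical helper names, angles in radians, all axes in one plane) fixes a perpendicular to b and swings c a small angle past b. The quantum curve violates the bound, while Bell’s straight-line local model meets it exactly.

```python
import math


def angle_between(x, y):
    """Angle in [0, pi] between two in-plane axes given by their angles (radians)."""
    d = abs(x - y) % (2 * math.pi)
    return 2 * math.pi - d if d > math.pi else d


def quantum_P(x, y):
    """Singlet correlation predicted by quantum mechanics."""
    return -math.cos(x - y)


def linear_local_P(x, y):
    """Correlation of Bell's simple local model: a straight line in the angle."""
    return -1 + 2 * angle_between(x, y) / math.pi


def bell_gap(P, a, b, c):
    """(1 + P(b, c)) - |P(a, b) - P(a, c)|; local models with perfect anticorrelation keep this >= 0."""
    return (1 + P(b, c)) - abs(P(a, b) - P(a, c))


a, b = 0.0, math.pi / 2            # a perpendicular to b
for nudge in (0.3, 0.1, 0.01):     # swing c a small angle past b
    c = b + nudge
    print(f"nudge {nudge:<5} quantum gap {bell_gap(quantum_P, a, b, c):+.5f}  "
          f"linear-model gap {bell_gap(linear_local_P, a, b, c):+.5f}")
```

As the nudge shrinks, the quantum left-hand side 1 − cos(nudge) vanishes quadratically while the right-hand side sin(nudge) vanishes only linearly, so the violation never goes away.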

Second proof: the famous inequality. Consider three measurement directions: a, b, c. Bell derives (via a string of algebraic moves) this constraint:

4(epsilon + delta) >= |a·c − a·b| + b·c − 1

Here epsilon is the maximum allowed mismatch between the hidden-variable correlation and the quantum one, and delta is a tiny smoothing term. Now plug in a specific arrangement: a pointing up, c pointing right (so a and c are perpendicular), and b at 45 degrees between them. Then a·c = 0 and a·b = b·c = 1/√2, so the right-hand side evaluates to √2 − 1, about 0.41. So epsilon + delta has to be at least about 0.1. It cannot be driven to zero.
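The arithmetic for that arrangement, as a short sketch: it evaluates the right-hand side in the form |a·c − a·b| + b·c − 1 for unit vectors a (up), c (right) and b at 45 degrees between them, then divides by 4 to get the floor on epsilon + delta. The helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def bell_rhs(a, b, c):
    """Right-hand side |a·c - a·b| + b·c - 1 of Bell's approximate bound."""
    return abs(dot(a, c) - dot(a, b)) + dot(b, c) - 1


a = (0.0, 1.0)                              # pointing up
c = (1.0, 0.0)                              # pointing right, perpendicular to a
b = (1 / math.sqrt(2), 1 / math.sqrt(2))    # 45 degrees between them

rhs = bell_rhs(a, b, c)
print(f"right-hand side = {rhs:.4f}")       # sqrt(2) - 1, about 0.414
print(f"epsilon + delta >= {rhs / 4:.4f}")  # about 0.1
```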

Meaning: there exists at least one experimental configuration where no local hidden variable model can match quantum mechanics’ predictions. You only need one.

The sketchy move, revisited, dies a second death

Earlier we let the “sketchy move” slide in the single-particle case: bending the effective measurement axis toward the polarization. Bell shows that for two entangled particles, the equivalent trick requires detector A’s effective axis to bend toward detector B’s axis (or B’s toward A’s). But the detectors are far apart and causally disconnected. For one to know the other’s orientation is literal non-locality. The sketchy move is no longer even technically local. It’s just spooky action dressed up in mathematical clothes.
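To see where the non-locality hides, here is the two-particle version of the bent-axis trick as a Monte Carlo sketch. It reuses Bell’s simple model (lambda uniform on the sphere, B(b, lambda) = −sign(b·lambda)) but replaces a with an axis a′ placed at angle theta′ from b, where −1 + 2·theta′/π = −cos(theta). Axes lie in the x-z plane and helper names are hypothetical. The tell is in the signature: `A_bent` cannot be computed without `phi_b`, detector B’s setting.

```python
import math
import random


def axis(angle):
    """Unit vector in the x-z plane, `angle` radians from +z."""
    return (math.sin(angle), 0.0, math.cos(angle))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sign(x):
    return 1 if x >= 0 else -1


def random_unit_vector(rng):
    """Uniform direction on the sphere: a normalized 3D Gaussian sample."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            return tuple(x / norm for x in v)


def bent_angle(theta):
    """Angle theta' between a' and b chosen so that -1 + 2 * theta' / pi == -cos(theta)."""
    return math.pi * (1 - math.cos(theta)) / 2


def A_bent(phi_a, phi_b, lam):
    # The catch: detector A's result depends on phi_b, the remote detector's setting.
    theta = abs(phi_a - phi_b)
    side = 1 if phi_a >= phi_b else -1      # put a' on the same side of b as a
    a_prime = axis(phi_b + side * bent_angle(theta))
    return sign(dot(a_prime, lam))


def B(phi_b, lam):
    return -sign(dot(axis(phi_b), lam))


def bent_correlation(phi_a, phi_b, n=50_000, seed=2):
    """Monte Carlo estimate of the correlation produced by the bent-axis model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        lam = random_unit_vector(rng)
        total += A_bent(phi_a, phi_b, lam) * B(phi_b, lam)
    return total / n


for degrees in (0, 45, 90, 135, 180):
    phi_b = math.radians(degrees)
    print(f"{degrees:>3} deg  bent model {bent_correlation(0.0, phi_b):+.3f}  "
          f"quantum {-math.cos(phi_b):+.3f}")
```

The numbers now match −cos(theta), but only because one detector’s rule reads the other detector’s dial.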

So either you accept non-locality baked into quantum mechanics directly, or you accept non-locality baked into your hidden-variable model. There’s no escape to a local classical picture.

Generalization, and what this actually means

Part five of Bell’s paper makes the case that none of this is specific to spin. Any pair of quantum systems whose state spaces are at least two-dimensional (and that covers essentially all quantum systems) supports an analogue of the singlet state, built inside two-dimensional subspaces, with the same non-local correlation behavior. The theorem isn’t about electrons or magnets. It’s about the fabric of quantum theory itself.

The conclusion, Bell’s own words:

In a theory in which parameters are added to quantum mechanics to determine the results of individual measurements, without changing the statistical predictions, there must be a mechanism whereby the setting of one measuring device can influence the reading of another instrument, however remote. Moreover, the signal involved must propagate instantaneously.

Any hidden-variable completion of quantum mechanics that reproduces its experimentally verified numbers has to carry instantaneous, faster-than-light coordination between distant systems.

Why this doesn’t break the universe

A natural panic: if particles coordinate faster than light, can we build a faster-than-light telephone? No. The “no-signaling theorem” saves us. The correlations exist, but they’re only visible when you compare results after the fact, which requires classical (light-speed) communication. You can’t use entanglement to send a message.
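The no-signaling point can be seen in a few lines. This sketch assumes the standard singlet joint probabilities P(x, y) = (1 − x·y·cos(theta))/4 for results x, y in {+1, −1}, which reproduce the −cos(theta) correlation; the helper names are hypothetical. Summing out Bob’s unseen result leaves Alice with a 50/50 coin no matter how Bob turns his detector.

```python
import math


def joint_prob(x, y, theta):
    """Singlet probability of results x, y in {+1, -1} for axes separated by theta radians."""
    return (1 - x * y * math.cos(theta)) / 4


def alice_marginal(x, phi_a, phi_b):
    """Probability that Alice sees x, summed over Bob's result (which she cannot see)."""
    theta = phi_a - phi_b
    return sum(joint_prob(x, y, theta) for y in (+1, -1))


phi_a = 0.3
for phi_b in (0.0, 0.5, 1.0, math.pi / 2, math.pi):
    print(f"Bob at {phi_b:.2f} rad: P(Alice sees +1) = {alice_marginal(+1, phi_a, phi_b):.3f}")
```

The correlation only shows up in the joint table, which Alice can assemble only after Bob’s results reach her by ordinary means.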

But philosophically, the tension is real and unresolved. Quantum mechanics is non-local in a way that sits uncomfortably next to Einstein’s relativity, which is all about locality and light cones. Nobody has a clean, agreed-upon explanation. Depending on who you ask: many worlds, superdeterminism, some kind of retrocausality, or just giving up on realism entirely. Each option is its own flavor of disturbing.

The loophole Bell flags at the very end

Bell ends by noting that if the detector settings are fixed long enough in advance, a local theory could still reproduce the quantum predictions: the detectors could reach some mutual rapport through ordinary signals traveling at or below light speed before the particles arrive. That’s the locality (or communication) loophole, and Bell flags experiments that change the settings while the particles are in flight as crucial. Experimenters later ran exactly such versions, changing the detector settings (in later experiments, choosing them at random) during the flight of the particles, so no prior communication could possibly matter. Those experiments confirmed quantum mechanics again.

Key Takeaways

The singlet state has no preferred spin axis before measurement, but measurements along the same axis always come out opposite. This coordination is the thing that needs explaining.
A “local hidden variable” model means: each particle carries everything it needs, nothing non-local happens, measurements are determined by the particle’s luggage plus the detector setting.
Quantum mechanics predicts the correlation P(a, b) = −cos(theta) where theta is the angle between detector axes. This is experimentally verified.
The simplest local hidden variable model (Bell’s own example) produces a correlation that varies linearly with theta, not as −cos(theta). A line is not a curve.
Bell’s theorem proves something stronger: for at least one experimental configuration (a and c perpendicular, b at 45 degrees between them), the mismatch between a local hidden variable model and quantum mechanics cannot be made arbitrarily small. One such configuration is enough to rule out the entire class of local hidden variable theories.
You can save hidden variables only by allowing them to be non-local — letting one detector’s setting influence the other’s result. But that’s exactly the spookiness you were trying to avoid.
The theorem is general. It applies to any pair of quantum systems with at least two-dimensional state spaces, not just spin.
No faster-than-light communication is possible, due to the no-signaling theorem, even though the correlations themselves appear to violate locality.
The philosophical fallout is unresolved. Every interpretation (many worlds, superdeterminism, non-realism, pilot waves, etc.) pays a heavy price somewhere.

Claude’s Take

Behiel is doing something rare on YouTube: actually reading the paper. Not summarizing a summary, not giving you a vibes-based version, but walking through Bell’s 1964 text word by word with the math filled in. That’s an enormous amount of work and it shows. The pacing is slow but never condescending. The sketchy-move framing is a genuinely useful pedagogical device — it gives the hidden-variable camp every benefit of the doubt before destroying them, which is exactly how Bell structured the argument.

The downside of this format is that it’s three hours of heavy content and the payoff takes a long time to arrive. You have to sit through a lot of setup — Stern-Gerlach experiments, Pauli matrices, spinor flag diagrams — before the punchline lands. If you’re already comfortable with quantum mechanics, some sections will feel slow. If you’re not, you’ll need to re-watch parts.

One honest gripe: Behiel occasionally slips into a “mind-blown” register (“glitch in reality!”) that sits awkwardly next to the careful technical work. The math is doing the heavy lifting; the showmanship is unnecessary. The actual result is strange enough without narration telling you it’s strange.

The 9/10 score reflects that this is, as far as I know, the most thorough free video explanation of Bell’s theorem anywhere, by someone who clearly respects both the math and the audience. It’s not a video you watch once. It’s a reference you come back to. Dock a point for length and for occasional tonal lapses.

Worth noting: Bell’s theorem is one of the few results in physics that can change how a thoughtful person looks at the world. Most physics tells you how things work inside a framework you already accept. Bell tells you the framework itself has a feature, call it non-locality or whatever you like, that doesn’t fit the intuitions we inherited from centuries of classical physics. The universe is stranger than it needs to be.