Transcript: Bells Theorem A Glitch In Reality

TITLE: Bell’s Theorem, a Glitch in Reality
CHANNEL: Richard Behiel
DATE: 2025-12-12
---TRANSCRIPT---
[music] Hey everyone. Today I have for you a genuine glitch in reality that’s going to blow your mind and change the whole way you think about everything. It’s called Bell’s theorem, and it is one of the most mysterious, unsettling, magnificent results in all of theoretical physics. So let’s talk about it. Bell’s theorem demonstrates that quantum mechanics is weirdly non-local. That is, there’s something going on with quantum physics that doesn’t seem to be bothered by the limitations of space and time. Now, of course, much has been said about this, including in various popular science articles and videos. You often hear about quantum entanglement, spooky action at a distance, and all that kind of stuff. There’s often some crossover with sci-fi, about communication systems that work faster than light. And there’s also this woo-woo connotation about consciousness. Those are all really fanciful notions, but in many cases, what you hear about Bell’s theorem and quantum entanglement is not well grounded in the actual physics and math of quantum mechanics. So I wanted to make a video where we really get into the technical details of what exactly Bell taught us about the nature of reality. I want to go through his legendary 1964 paper word for word, equation for equation, and explore with you exactly what his argument is and what it implies about the nature of reality. I should point out, in case you don’t know, that I recently made a video on the Einstein-Podolsky-Rosen paradox, which is definitely a prequel to this video. In fact, Bell’s legendary 1964 paper is called “On the Einstein Podolsky Rosen Paradox.”
Okay, so this is a follow-up to the argument that Einstein, Podolsky, and Rosen put forward back in 1935, in which they looked at quantum mechanics and said, “Hey, wait a minute. Something’s wrong here. Something’s paradoxical. Either quantum mechanics is super weird or maybe it’s just incomplete.” And so almost 30 years after that, John Stewart Bell thought about it real hard and was like, “You know what? Sorry, Einstein and friends, actually quantum mechanics is not incomplete, but rather it’s just really weird and genuinely non-local, at least in some subtle ways.” So that’s the context in which Bell wrote this paper. It’s a follow-up to the argument put forward by Einstein, Podolsky, and Rosen. So before watching this video, I do recommend watching my video on the EPR paradox. Or if you haven’t seen that video, but you’re already familiar with the EPR paradox, then that’s cool, too. You don’t have to get your info from me. I’m just one of many sources on this beautiful internet. All right, then let’s get into the paper. First of all, this paper is broken up into six parts. Part one is the introduction. Part two is the formulation, where we define our terms and think about what it is we’re going to be thinking about. Part three is an illustration with some examples. Part four has the main argument of the paper, in which we find that if you try to explain quantum physics using a local hidden variable theory, you run into a contradiction. In part five, the ideas are generalized. And in part six, we have our conclusion. So those are the six parts of this paper. We’re going to go through them one at a time. In between, I’m also going to have some animations and some information and equations that provide context, because one thing you’ve got to know about this paper is that it is so cryptic and so dense, with lots of equations and very few words, that if you just try to read it, it’s really hard. You really have to take your time with this one.
And so we’re going to take our time, and I’m going to have related animations and equations to help us along and to fill in the gaps in the paper where it’s assumed that the reader has a certain picture in mind while reading it. Oh, and speaking of, I’ve put a link to the PDF in the description below the video. I definitely recommend printing out this paper so that you have it for reference as we go through it. If you don’t have a printer, that’s fine, but then you should open it up on another screen or in another tab or something. All right. So now it’s time to get into the introduction of the paper. The paper begins: “The paradox of Einstein, Podolsky, and Rosen was advanced as an argument that quantum mechanics could not be a complete theory, but should be supplemented by additional variables.” Remember, at the end of the EPR paper they talked about how quantum physics is incomplete and is missing something, and how you have to put additional variables into quantum physics in order for it to provide a complete description of reality. “These additional variables were to restore to the theory causality and locality,” and that’s often called local causality. It’s just the idea that cause and effect should propagate such that an object is only affected by its immediate surroundings, as opposed to some kind of weird teleportation or spooky action at a distance. So Einstein and friends argued that you have to put some kind of additional variables into quantum mechanics in order to resolve the EPR paradox and give quantum mechanics local causality. “In this note,” that is, Bell’s paper, “that idea will be formulated mathematically and shown to be incompatible with the statistical predictions of quantum mechanics.” So that’s what we’re going to do today.
We’re going to mathematically explore the concept of hidden additional variables in quantum mechanics and show that it doesn’t work, and that therefore quantum mechanics genuinely does exhibit non-local phenomena, which is crazy. Like, that goes against everything we think we know about the nature of reality. Anyway: “It is the requirement of locality, or more precisely that the result of a measurement on one system be unaffected by operations on a distant system with which it has interacted in the past, that creates the essential difficulty.” So the hidden variable story doesn’t work if you require the theory to be local. “There have been attempts to show that even without such a separability or locality requirement, no hidden variable interpretation of quantum mechanics is possible. These attempts have been examined elsewhere and found wanting.” That is to say, you actually can make a hidden variable interpretation of quantum mechanics work if you relax the constraint of locality. But then it’s like, what’s the point, right? “Moreover, a hidden variable interpretation of elementary quantum theory has been explicitly constructed.” Here he’s referring to Bohmian mechanics. “That particular interpretation,” Bohmian mechanics, “has indeed a grossly non-local structure.” Famously, Bohmian mechanics is a non-local theory. “This,” the non-locality, “is characteristic, according to the result to be proved here, of any such theory which reproduces exactly the quantum mechanical predictions.” That is to say, what we’re going to show in this paper is that if you want a theory that matches the quantum mechanical statistics and you want it to involve hidden variables as advocated by Einstein, Podolsky, and Rosen, then necessarily you’re going to end up with a non-local theory. And of course, that non-locality is the same kind of dilemma that you end up having to confront if you just take quantum mechanics at face value, in which case it does appear to be a non-local theory.
So, no matter how you look at it, there’s some weird non-local stuff going on in quantum mechanics. All right. Now, before going further, I want to say a few words about spin 1/2 particles, because spin 1/2 particles are the main characters of this paper. So it’ll be helpful to review some of the main points regarding the experiment and theory of spin 1/2 particles. On the experimental side, for sure the most important and famous spin 1/2 experiment is the Stern-Gerlach experiment. The way this experiment works is: imagine that you have an oven, and inside the oven you put some silver, and the oven is so hot that the silver atoms start to evaporate and fly around at crazy high speeds, and some of them fly out of a hole in the oven. Then suppose you have some kind of apparatus called a collimator, so that we end up with a line of silver atoms flying in a particular direction. And also suppose this whole experiment happens in a vacuum, so that the silver atoms aren’t bumping into air as they fly along. Now this beam of atoms is directed to fly through a strong non-uniform magnetic field. And amazingly, what happens is that the magnetic field somehow splits the beam of atoms into two beams. And it’s like, what’s going on with that? Why do we have two beams? How can it be that you have one beam of atoms coming in and two beams going out? Well, the key to understanding this is that a silver atom is electrically neutral. Its 47 protons perfectly cancel out its 47 electrons, because it’s just a neutral atom. It’s not ionized. But if you look at the electrons in a silver atom, you find that nearly all of them are paired up in their various orbitals, and there remains a single unpaired electron in the 5s orbital. For all of the paired electrons in the silver atom, the spins cancel each other out. But the unpaired 5s electron has a spin of 1/2, because an electron is a spin 1/2 particle.
And as a result, it’s sort of like the whole silver atom behaves like an electrically neutral spin 1/2 particle. That unpaired electron spin gives the whole atom a tiny magnetic moment. That is, it makes the silver atom sort of like a tiny little magnet. I should also say that the nucleus of the silver atom has a net spin of 1/2 as well. But because the nucleus is so tightly packed compared to the electrons, the magnetic effect of the nuclear spin is thousands of times smaller than the magnetic effect of the electron spin. So for all intents and purposes, it doesn’t matter in this experiment. So then, what happens to the silver atoms as they fly through this apparatus is that the initial beam is totally thermally random. I mean, you’re talking about evaporated silver atoms. There’s no preferred directionality to the spin. It’s all a random distribution over the spin directions. But then as they fly through the Stern-Gerlach magnet, for some reason the spins get projected onto either purely spin up or purely spin down. And that’s really weird, because it’s not a distribution of some continuous quantity. No, it’s quantized: either up or down. There are only two options, which is super weird, right? This is a very quantum effect. And so, if we want to say that these two states are separated by one quantum unit, then, given the symmetry of the situation, since both beams are deflected by equal amounts, we can say that spin up is associated with a quantity of plus 1/2 and spin down is associated with a quantity of minus 1/2, so that the difference between plus 1/2 and minus 1/2 is one quantum unit. And that’s why we call this a spin 1/2 particle. Okay. So we have two discrete beams, and clearly there’s something weirdly quantum going on here. But what’s really going on here? You know, because the story I just told about spin 1/2 and the electron, that it’s like a little magnet and it separates out, what does that really mean?
Like, physically, how should we imagine that? Well, in a moment I’ll tell you a little bit of the quantum theory, and then we’ll also imagine some kind of speculative hidden variable theory, and we’ll see that those don’t really work. So we’ll get into the theory in a moment, but for now I want to stay on the experimental side of things so that we can learn a little bit more about how spin 1/2 particles actually behave. So, imagine we do a Stern-Gerlach experiment where we have a beam of silver atoms flying through. It goes through the Stern-Gerlach magnet and it splits into two beams, spin up and spin down. Now suppose we put up a wall so that all the spin down atoms hit the wall and stop. But the spin up atoms can fly right through and keep going. Now we have a beam of spin up atoms. So then we line it up and pass it through another Stern-Gerlach magnet that’s oriented along the same axis, the same direction in space. Then an amazing thing happens, which is that in the second Stern-Gerlach magnet, we only see a spin up beam. There’s no spin down. And I guess that’s not too surprising. It kind of makes sense, because we start off with a random beam of silver atoms, we split that into a spin up and a spin down, and then we re-measure and we find, okay, there’s only spin up. Yeah. Okay, that’s not too mind-blowing. That makes a lot of sense, right? And remember, all of this is happening in a vacuum chamber. So there are no air molecules for the silver atoms to bump into, because if there were, then we could imagine the beam kind of re-randomizing. You know, eventually the silver atoms would be slamming into air molecules and getting all reoriented. So this is all happening inside a vacuum chamber. What this two-stage Stern-Gerlach experiment shows is that spin is a state that the atom is in, right? It’s a property that persists with the atom and has some continuity across time.
So that it makes sense to say this is a spin up atom, at least for now. You know, I mean, it can bump into something and change its spin. But supposing it doesn’t, then it can continue on in that spin up state for some amount of time. So that’s cool. That gives us some sense of the physicality of spin. But we’re still left with the mysterious question of why we have two discrete options for a spin measurement, as opposed to some continuous range of outcomes. And how should we visualize a spin state? Well, again, we’ll talk about the theory of that in just a moment, but there’s one more experimental thing I want to show you before we get there. What we’re going to do now is imagine slightly rotating the second magnet by some small angle theta. And then a magical thing happens. The second beam now mostly comes out as spin up, but now there’s also a spin down beam as well. And it’s very subtle, because of all the spin up atoms that are flying through the second detector, most are going to come out spin up, but every now and then there is a chance that one will come out spin down. So if you think about many atoms flying through, so that it’s sort of like a continuous beam situation, then imagine a very bright spin up beam and a dull but nonzero spin down beam. And so then the question becomes: what is the probability of an atom being spin up versus spin down in this kind of experiment? There’s actually very good agreement between quantum mechanics and experimental results, which show that the atoms passing through the second magnet have a cos²(theta/2) probability of being spin up and likewise a sin²(theta/2) probability of being spin down. Remember that cos² + sin² is 1, so those probabilities add up to 1, or 100%. And we’re going to take that as sort of a ground truth for this video. This cos²(theta/2), sin²(theta/2) rule.
We’re going to take that as an absolute fact about reality, because it has been measured in many experiments and it is a pretty direct result of quantum theory. Oh, and one thing I should say: in this diagram, you see that the second beam is still horizontal even though I tilted the picture of the detector. In reality, if you’re doing an experiment like this, you would want to realign the second beam so that it comes in parallel to the detector. But there are ways of doing that without modifying the spin state of the particle, so I just didn’t show that in this diagram because I wanted to keep things simple. Actually, let me show you this. This is a cool, much better diagram. It comes from Wikipedia. Shout-out to Clara Kate Jones for making this beautiful diagram. What this diagram shows is a two-stage Stern-Gerlach experiment. The particle beam comes in. You get a 50/50 split between spin up and spin down, denoted as Z plus and Z minus, you know, because we’re measuring along the Z-axis. Then we send the Z plus beam through the second detector. The second detector appears to be tilted, but is actually just in alignment with the way the Z plus beam comes out of the first detector. But now I want to look at something really cool, which is: what if the second detector measures along a whole different axis? So, for example, if the second detector measures along the X-axis, the spin up particle beam goes through the second detector and then splits into a 50/50 probability mix of being spin left or spin right. By the way, instead of spin left and spin right, let’s use the language spin up along X and spin down along X. You see, when we say spin up and spin down, it’s always with reference to a measurement axis, and spin up is going to be the beam which goes up relative to that axis. Okay? So we can always use the words spin up and spin down. But in this experiment, you can also think about it as spin left and spin right when we’re measuring along the X-axis.
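To make the cos²(theta/2) rule quoted above concrete, here is a minimal sketch that tabulates the spin up and spin down probabilities for a few tilt angles, including the 90° case that corresponds to measuring along X. The function names are illustrative choices, not anything from the video or from Bell’s paper.

```python
import math


def spin_up_probability(theta_degrees: float) -> float:
    """Probability that a spin up atom is measured spin up along an
    axis tilted by theta_degrees, using the cos^2(theta/2) rule."""
    theta = math.radians(theta_degrees)
    return math.cos(theta / 2) ** 2


def spin_down_probability(theta_degrees: float) -> float:
    """Complementary probability, sin^2(theta/2)."""
    theta = math.radians(theta_degrees)
    return math.sin(theta / 2) ** 2


# Tabulate a few tilt angles; 90 degrees is the "measure along X" case.
for angle in (0, 45, 90, 135, 180):
    up = spin_up_probability(angle)
    down = spin_down_probability(angle)
    print(f"theta = {angle:3d} deg: P(up) = {up:.3f}, P(down) = {down:.3f}")
```

At 0° the atom always stays spin up, at 90° the split is even, and at 180° the labels simply swap, which matches the cases described in the transcript.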
I suppose this experiment is not too surprising either, because we see that the particles come in spin up. We wouldn’t really expect any kind of probabilistic bias as far as spin left versus spin right, because all we know is that the particles are all spin up, and up is perpendicular to left and right. So it’d be kind of weird if the second particle beam had some kind of bias towards left or right, right? Like, where would that come from? We should still expect some kind of randomness along the X direction. Okay, so that doesn’t really blow your mind, but this next part will. See, imagine we have a three-stage experiment where the particle beam comes in and the first detector splits it into spin up and spin down. We send only the spin up through. Then the second detector measures along X, so we get our spin left and our spin right, or in other words, spin up and spin down along X. And then suppose we only allow the spin up along X beam to go through. Then we measure again along the Z-axis. And the craziest thing happens. Look what we get. We get a 50/50 particle beam of spin up or spin down along Z. Well, how can that be? Because the first magnet already filtered out all of the spin down along Z. So shouldn’t we expect the outgoing beam to have only spin ups? Isn’t that what we should expect, only spin up along Z, because the first magnet already filtered out the spin down? But no, in reality, in experiments, you get a 50/50 split between spin up and spin down along Z. So what is going on there? That’s very strange. And the reason this is so strange is that we know that spin is a property of the atom. We know that it’s a physical thing that the atom carries with it as it moves along. Right? I mean, we thought about this earlier and we realized, yeah, the Stern-Gerlach experiment shows us that spin is a state that the atom can be in, and it’s a property of the atom at some moment in time.
And so, how can it be that if we’ve filtered out the spin down along Z atoms, somehow after the third detector we get spin down along Z? Like, what’s happening there? How can spin be a conserved quantity if it comes back like that? Like, what’s going on? Now, what I’m showing here is just an experimental fact. This is the reality. And then as people, it’s on us to figure out how to tell a story that makes sense of this reality. In just a moment, I’m going to tell you the quantum story, which is going to explain what’s happening here. The long story short is that when you measure the spin along some axis, the particle forgets its spin information along the other axis, because you’re resetting the spin state of the particle. You’re projecting it into a spin eigenstate of whatever axis you most recently measured it on. And so once you measure it spin up or spin down along X, now all of a sudden, if it’s in a spin up along X eigenstate, that has equal 50/50 odds of being measured spin up or spin down along Z. But then of course, when you learn quantum physics, you’re always thinking: this is so weird and so strange and I don’t like it, and surely there’s some kind of more classical explanation with some kind of hidden variable. Surely there’s some kind of secret behavior happening inside the atom or to do with these detectors. Maybe the detectors are modifying the atoms in such a way as to flip them up or flip them down and kind of reset their state. All right. So when you learn quantum physics, you yearn for a more sane explanation. And especially, you know what would be really nice is if we didn’t have all these weird quantum probabilities, right? So wouldn’t it be cool if we could come up with some kind of explanation for what’s going on in the Stern-Gerlach experiment, but rather than this confusing quantum story with wave functions and states, what if we could come up with some kind of more classical, deterministic model of what’s going on here?
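Going back to the three-stage experiment for a moment, the quantum story of “projecting into an eigenstate of whatever axis you most recently measured” can be sketched as a tiny calculation. This is a minimal illustration under assumed conventions: spin states are written as two-component amplitudes in the Z basis, and the names `Z_UP`, `X_UP`, `overlap_probability`, and `three_stage_probabilities` are hypothetical labels introduced here, not notation from the video.

```python
import math

# Spin states as (amplitude for Z up, amplitude for Z down).
# This Z-basis convention is an assumption made for this sketch.
Z_UP = (1.0, 0.0)
Z_DOWN = (0.0, 1.0)
X_UP = (1 / math.sqrt(2), 1 / math.sqrt(2))
X_DOWN = (1 / math.sqrt(2), -1 / math.sqrt(2))


def overlap_probability(state, outcome):
    """Born rule: squared magnitude of the overlap between two
    normalized two-component states."""
    amplitude = sum(complex(o).conjugate() * s for o, s in zip(outcome, state))
    return abs(amplitude) ** 2


def three_stage_probabilities():
    """Z filter (keep up), then X filter (keep up), then measure Z again."""
    # After stage 1 the atom is in Z_UP.
    pass_x_filter = overlap_probability(Z_UP, X_UP)
    # After stage 2 the surviving atoms are in X_UP; Z information is reset.
    final_z_up = overlap_probability(X_UP, Z_UP)
    final_z_down = overlap_probability(X_UP, Z_DOWN)
    return pass_x_filter, final_z_up, final_z_down


pass_x, z_up, z_down = three_stage_probabilities()
print(f"fraction passing the X filter: {pass_x:.2f}")
print(f"final Z measurement: up {z_up:.2f}, down {z_down:.2f}")
```

Half of the Z up atoms pass the X filter, and the ones that do come out of the final Z measurement with even odds, even though spin down along Z was blocked at the first stage.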
Even though such models don’t work, it’s still very helpful to give it a try and see what we can come up with. Then, when we figure out the way in which the model doesn’t work, that’ll help us appreciate why we need quantum mechanics, even though it’s super weird. And seeing the failure of these local hidden variable models is going to segue very nicely into the core argument of Bell’s paper. All right. So, I want to return to this picture of the two-stage Stern-Gerlach experiment, where we use the first magnet just to filter out the spin down atoms and give us a beam of nice pure spin up atoms. Then we’re going to send those through a second detector tilted relative to the first by an angle theta. And as we talked about earlier, the probability of the atom being spin up in the second detector is going to be cos²(theta/2). In this plot, we put the angle theta along the x-axis and we put the percentage probability that it’ll be spin up on the y-axis. On the far left of this plot, you can see that we have a 100% chance of measuring spin up when the second detector is tilted 0° relative to the first. That is, when they’re in alignment, a spin up coming in is always a spin up going out. On the opposite extreme, if you imagine we put the second detector all the way upside down, tilted 180°, then relative to that orientation, the detector is going to say, “Hey, every particle is spin down.” And that’s not too surprising, because all we’re doing is flipping the second detector around. So what was defined as spin up is now, relative to the second detector, spin down. And so really, we don’t have to think about angles all the way up to 180°, because the interesting stuff happens with a tilt angle between 0° and 90°. Beyond that point, there’s a kind of symmetry where it’s the same thing, but everything’s flipped relative to before.
And speaking of 90°, if we tilted the second detector 90°, then we’d have a 50/50 chance of an incoming spin up atom going out as either spin up or spin down. Here’s an animation, and this will give us a more dynamic picture of what’s going on here. So, we have our incoming beam of silver atoms coming in from the left. They go through the first detector. We split out spin up and spin down. The spin ups keep going. And on the right, what I’m showing here is just a rectangle, so it’s kind of abstract, but all I mean to indicate is that we’re doing a spin measurement along the axis symbolized by the orientation of that rectangle. As the rectangle goes back and forth, you can kind of get a feel for how the relative probability of measuring spin up and spin down along that second measurement axis changes as a function of the angle. On one extreme, when the detectors are aligned, spin up is always spin up. On the other hand, when the detector is at 90°, we get a 50/50 split. And in between, we get a probability which follows this cos²(theta/2) curve. Now, this expression, cos²(theta/2), comes from the spinor math of what happens when you project a spin state defined relative to one axis onto another axis. But all of that spinor math and projection, that’s the weird quantum stuff we don’t want to have to deal with if we don’t have to. So when we’re trying to come up with a hidden variable explanation, we want to think in terms of some kind of quantity that we can attach to each particle, maybe some kind of arrow that indicates some sort of direction. And you know, one of the first things that comes to mind when you think about the Stern-Gerlach experiment is that maybe each incoming atom has some kind of vector-like directional quantity associated with it, and then maybe the detector sort of flips that vector up or down as the particle passes through. Now, I’m not saying that’s the case.
I’m just saying that’s something we might instinctively or intuitively think might be the case. And so let’s go ahead and test our intuition against logic and reason and see if it actually holds up. What I’m showing here is an animation where we have these atoms coming in, and there’s a yellow vector associated with each one of them, which encodes some sort of orientational, direction-like thing that goes with the atom. For the sake of argument, we can say our incoming beam should have a random distribution over those vector angles, because these are evaporated silver atoms and it’s all thermally random. Then suppose we claim that what a Stern-Gerlach magnet does is flip that arrow either up or down. If it flips it up, it sends the atom upwards. If it flips it down, it sends the atom downwards. Well, at first glance, an explanation like this seems like it could possibly be what’s going on here. This is a model where the Stern-Gerlach magnet plays a really active role in aligning the particle a certain way. As for whether it flips up or flips down, we can say the rule is just that if the vector is pointing even a little bit up, it goes up. If it’s pointing even a little bit down, it goes down. If it’s pointing perfectly horizontal, well, in reality, nothing’s perfectly horizontal. There’s probability zero of that happening. And even if it did happen, it would happen so rarely you’d never even notice. You know, the cool thing about physics is that you can put an idea forward and you can really propose it like, hey, maybe this is how it is. But one of the rules of physics is that you have to stick to whatever principles you propose. And if you can show that your own principle leads to a contradiction, well then, sorry, but you have to redesign your model. Okay. So what I want to show now is that this assumption that the Stern-Gerlach magnet flips the atom up or down is actually not consistent with the experimental data.
And the reason is actually very simple, and you can totally see it. Take a two-stage Stern-Gerlach experiment where the second detector is tilted. We know from the experimental data that when the second detector is tilted, some of the particles should sometimes come out spin down even if they went in as spin up. But if we tilt the detector anywhere between 0° and all the way up to 89.9°, then by this rule that the Stern-Gerlach magnet flips the particle whichever way it was already kind of pointing, an incoming beam of spin up is always going to come out spin up. And so right there you see that this model doesn’t actually work. By our own principle that we put forward about these arrows getting flipped up or flipped down, it doesn’t work. It just doesn’t match the two-stage Stern-Gerlach experiment. And so whatever is going on with spin, it’s not that. It’s something else. So what do we do? Well, just because our model didn’t work doesn’t mean we can’t massage it into something that might work. So let’s go ahead and see if we can massage our model into something which matches the experimental data at least better than our first attempt, which kind of matched the data in the case of one Stern-Gerlach magnet but failed miserably when we had two and the second one was tilted. Well, okay. So what if we did this? Let’s say that a Stern-Gerlach magnet doesn’t actually flip the particle up or down, right? Because if it does that, then as we’ve seen, the second detector is just going to give us a bunch of spin ups and no spin downs. So let’s say instead of flipping the arrow up or down, the Stern-Gerlach magnet just kind of passively sorts these particles based on whether their vector points a little bit up or a little bit down. Any vector that points even a little bit up gets sent towards the up beam. And any vector that points a little bit down, that atom goes in the down beam.
But the Stern-Gerlach magnet doesn’t change the direction of that vector. So maybe this vector represents a kind of classical spin axis. Then in this model, the angular momentum of the particle would be conserved as it passes through the detector. But somehow, and for some reason, the detector is just sorting the incoming particles into two beams depending on whether they’re a little bit up or a little bit down. Well, you know, there’s a problem with this model, which is that philosophically, it’s starting to feel a bit contrived, because it’s hard to reconcile the fact that we see two discrete beams with such a passive thing going on at the detector. At least before, when we thought that maybe the magnet just flips the thing up or flips the thing down, you had a naturally, physically dichotomous situation where, yeah, it’s a sort, but it’s also an action where the particles are really separated out in a binary way. So if you have a more passive situation where it’s just a sort, you kind of have to wonder: then how is it that we get two sharp beams? But never mind all that, because even though it seems implausible, that’s different from it being illogical or impossible or incoherent. You know, nature is weird. So maybe this is how it is. But now, if we take this model and pass the beam through a second Stern-Gerlach magnet, the question comes up: does this model match the data? In particular, do we find a cos²(theta/2) probability of an incoming spin up remaining spin up versus a sin²(theta/2) probability of it going spin down? Well, if you just look at the animation shown here, you can see that at first glance it kind of does seem to work, because when the second detector is not tilted at all, anything coming in spin up is going to go out spin up. So that’s good. At theta equals 0, this model matches experiment.
And then if you imagine 90°, well, there it’s a 50/50, because coming in on the spin up beam, that’s just going to be a vector that’s pointing up a little bit, but the distribution is totally random as far as left and right. So when the detector is tilted at 90°, it could go either way, you know. And so there again, we find another angle at which our model matches the data. Another wonderful thing about this model is that for intermediate angles, it kind of seems like it would fit the data. You know, if you tilt the detector like 45°, you can see there’s some chance that an atom would be spin down versus spin up. And so at first, this feels very exciting and very promising. But when you think through it carefully, you realize that this model actually doesn’t quite match the cosine squared statistics that we get from the experiment and from quantum physics, because instead of a cosine squared function, it’s actually just a linear function in theta. And that’s a very important point. So I want to linger on that for a moment, and I want to see exactly why this model gives us a probability which is linear in theta. Think about the fact that we have evaporated silver atoms coming in, and presumably they’re all going to be randomly oriented. So if we want to come up with a picture that involves this hidden variable of an orientational, vector-like degree of freedom, call it lambda, then the situation we’re describing here begins with lambda vectors chosen totally at random as far as their direction is concerned. If you like, you can imagine lambda being selected uniformly from the unit circle, or if you want to be fully three-dimensional, the unit sphere. Although, as we’re about to see, it actually really doesn’t matter whether we think about it in terms of a two-dimensional situation or a three-dimensional situation. In either case, we find the same linear trend. All right, then.
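The linear trend claimed for the passive-sorting model can be checked numerically. Below is a minimal Monte Carlo sketch under stated assumptions: lambda is drawn uniformly from the unit sphere, the first magnet keeps atoms whose lambda has a positive component along a filter axis `p`, and the second magnet reports spin up when lambda has a positive component along a tilted axis `a`. The names `p`, `a`, and `passive_sort_up_fraction`, along with the sample count and seed, are illustrative choices. The closed form 1 − theta/pi used for comparison is the geometric overlap fraction of two hemispheres tilted by theta.

```python
import math
import random


def dot(u, v):
    """Dot product of two 3-component vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def random_unit_vector(rng):
    """Uniform direction on the unit sphere (normalized Gaussian components)."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return (x / norm, y / norm, z / norm)


def passive_sort_up_fraction(theta, samples=100_000, seed=0):
    """Fraction of atoms that pass the first filter (lambda . p > 0) and are
    then sorted 'up' by a second magnet tilted by theta (lambda . a > 0)."""
    rng = random.Random(seed)
    p = (0.0, 0.0, 1.0)                           # first magnet axis
    a = (math.sin(theta), 0.0, math.cos(theta))   # second axis, tilted by theta
    passed = 0
    measured_up = 0
    for _ in range(samples):
        lam = random_unit_vector(rng)
        if dot(lam, p) > 0.0:        # survives the first filter
            passed += 1
            if dot(lam, a) > 0.0:    # sorted into the second up beam
                measured_up += 1
    return measured_up / passed


if __name__ == "__main__":
    for degrees in (0, 45, 90, 135):
        theta = math.radians(degrees)
        model = passive_sort_up_fraction(theta)
        linear = 1 - theta / math.pi
        quantum = math.cos(theta / 2) ** 2
        print(f"{degrees:3d} deg: model {model:.3f}, "
              f"linear {linear:.3f}, quantum {quantum:.3f}")
```

The simulated fraction tracks the straight line and agrees with the quantum curve only at the special angles 0°, 90°, and 180°; at 45°, for instance, the line gives 0.75 while cos²(theta/2) is roughly 0.854.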
So, the particle passes through the first Stern-Gerlach magnet and all of these vectors lambda that were pointing a little bit downwards get filtered out. They go in the spin down beam and we block that. But then if the vector is pointing even a little bit up then it keeps passing through and then it moves on to the next Stern-Gerlach detector. So let’s go ahead and use the vector p to symbolize the polarization vector, that is, the axis of measurement for the first Stern-Gerlach magnet. You see here based on the diagram that all of the particles that have made it through our filter are going to be measured spin up if they’re measured again perfectly along the direction p with no tilt angle. And so that’s what it means experimentally to prepare some spin 1/2 particles, some fermions, with the spin polarization along the vector p. It means that for sure we know if we measure the spin along p we’re going to get spin up. Now then what can we say about that hidden variable vector lambda? Well, we can say that the particles that are allowed through necessarily have lambda which is somewhere in the northern hemisphere, that is, the hemisphere that points in the same kind of direction as the polarization vector p. Or in other words, these are the lambda such that lambda dot p is greater than zero. And the lambda are still going to be uniformly distributed around that hemisphere because they came in uniformly distributed around the sphere and we’ve just cut it in half. So now we want to ask the question of what is the probability of a particle with some lambda vector being measured spin up in the second detector, which would happen in our local hidden variable model if lambda dot a is greater than zero. That is, if the lambda vector happens to be pointing in the same hemisphere as the measurement axis a. And when you think about it, you realize that the probability of lambda measuring spin up depends on the overlap of the lambda hemisphere and the a hemisphere.
See, cuz if we draw a and then we think about the hemisphere of vectors that point in kind of the same direction as a, that is, for which lambda dot a is positive, you realize that the set of all lambdas which are going to be measured spin up is precisely the overlap between the lambda hemisphere and the a hemisphere. And given that lambda is going to have a uniform probability distribution, we can see then that the probability of measuring spin up is just going to be the fraction of the lambda hemisphere that overlaps with the a hemisphere. And the probability of it measuring spin down is going to be the fraction of lambda’s hemisphere that does not overlap with the a hemisphere. And if you see that, then you see one of the core concepts of Bell’s paper. We’re going to describe this slightly differently in a moment when we get into the paper and it’s going to be a little bit more complicated, but this right here is a very fundamental insight. Imagining rotating hemispheres and seeing how the overlap varies linearly. That is a mental image that you want to keep in mind as we get into parts three and four of the paper. All right, then. So, just to be really formal about this, let’s go ahead and say that theta is the tilt angle between our polarization vector p and our measurement axis vector a. And then I want you to go ahead and imagine rotating theta from 0 to pi, or 180 if you want to talk in terms of degrees. Well, when you start off with theta equals 0, p and a are aligned the same way. And there’s a complete overlap between the lambda hemisphere and the a hemisphere. And so you have a 100% chance, a guaranteed chance, that when theta is zero, you’re going to measure the particle spin up. But now imagine theta growing and growing until theta equals 90° or pi over 2 radians. Well, at that point you’re going to have a 50/50 overlap between the lambda hemisphere and the a hemisphere. And so then you’re going to have a 50/50 chance of measuring spin up versus spin down.
And then if you go ahead and flip it all the way around 180°, so a and p are perfectly antiparallel, then it’ll be guaranteed that you’ll measure spin down for a theta of 180°. Bearing in mind that spin down is relative to that upside down vector a. Now these three points for which theta is 0, theta is 90 and theta is 180°, all of those actually do match the experimental data and quantum mechanics. So that’s all good. But what’s not all good is that linear dependence of the probability of measuring spin up on the angle theta. And you can see that linear dependence just based on the way the area fraction changes as you slide theta around and you change the overlap between these two hemispheres. You know, one way to think about the probability logic here is just imagine you’re playing one of those board games that has the spinner thing and you spin the thing and then the probability that it lands on some wedge is just going to be the wedge area. Well, yeah. So when you think about that kind of logic and then you think about the wedge area of the overlap between the hemispheres and the way it changes, you can see that the probability is indeed linear in theta. But now that linearity is actually a real problem because from experiments and from quantum mechanics we can very confidently say that the probability of measuring the particle spin up is not linear in the tilt angle theta but rather it’s the cosine squared of theta over 2. And that fact, that cosine squared curvy fact, makes our linear model very hard to believe because the math is wrong. The statistical predictions of our model are not the true statistics of the situation. So what do we do? Do we just give up? Well, we actually should give up because as we’ll see in this, you know, the whole paper is about how local hidden variable models don’t work. But let’s not give up yet. Let’s be very stubborn, okay? Because technically there is a way that we can fix this particular model for this particular situation.
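The hemisphere-overlap argument above can be checked numerically. What follows is only a rough sketch, not anything from the video or from Bell's paper: it assumes lambda is drawn uniformly from the unit sphere, that the first magnet keeps lambda dot p > 0, and that the second magnet reports spin up when lambda dot a > 0. The function names and the sample size are made up for illustration.

```python
import numpy as np

def random_unit_vectors(n, rng):
    """Draw n directions uniformly from the unit sphere (normalized Gaussians)."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def model_prob_up(theta, n=200_000, seed=0):
    """Toy hidden-variable model: fraction of lambdas that passed the first
    magnet (lambda . p > 0) and are sorted 'up' by the second (lambda . a > 0)."""
    rng = np.random.default_rng(seed)
    lam = random_unit_vectors(n, rng)
    p = np.array([0.0, 0.0, 1.0])                      # first magnet axis
    a = np.array([np.sin(theta), 0.0, np.cos(theta)])  # second axis, tilted by theta
    passed = lam[lam @ p > 0]                          # keep the "northern hemisphere"
    return np.mean(passed @ a > 0)

# Compare the toy model with the linear law and with the quantum prediction.
for deg in (0, 45, 90, 135, 180):
    theta = np.radians(deg)
    print(deg, model_prob_up(theta), 1 - theta / np.pi, np.cos(theta / 2) ** 2)
```

The sampled fraction should track 1 - theta/pi and agree with cosine squared of theta over 2 only at 0°, 90° and 180°, which is the mismatch described above.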
And the way in which we do that is going to involve a concept which we’ll see later on in the paper. So we’re going to try to save this model somehow. And the way that we’re going to try to do that is going to be illustrative and teach us something about the situation, even though ultimately this fix is going to break down when we later on start looking at quantum entanglement. All right then. So the way to fix the model is to define an effective measurement axis. Call that a prime, and define that as the measurement axis a tilted towards the polarization vector p such that the equation 1 - 2 theta prime / pi = cosine of theta is satisfied. Now here by theta prime I mean the tilt angle between the polarization vector p and the effective measurement axis a prime, which has been magically tilted in towards the polarization vector p. And when you look at this equation here, the left hand side with the 1 - 2 theta prime / pi, that is linear in theta prime. And then you look on the right hand side and that’s a cosine. Now this equation here, it’s not immediately obvious what this has to do with cosine squared of theta over 2. In just a minute though, we’re going to talk about expectation values and cosine of theta. And then when we come back to this equation later on in the paper, it’ll make more sense why exactly it has the form that it does. But I don’t want to get into that just now because it’s a bit of a tangent. For now, all I want to say is that this equation involving theta prime and theta is going to warp the linear probability dependence of our model, which is linear in theta, into the cosine squared of theta over 2 curve that we expect from quantum mechanics. And in fact, that is the definition of where this theta prime and theta equation comes from.
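For anyone who wants to see the warp explicitly, here is one way to read that equation. It is a sketch that assumes the linear model's spin-up probability along the effective axis is 1 - theta prime / pi, as the hemisphere picture suggests, and that spin up and spin down are scored as +1 and -1.

```latex
\begin{align}
P_\uparrow(\theta') &= 1 - \frac{\theta'}{\pi}, \qquad
P_\downarrow(\theta') = \frac{\theta'}{\pi}, \\
(+1)\,P_\uparrow + (-1)\,P_\downarrow &= 1 - \frac{2\theta'}{\pi}, \\
1 - \frac{2\theta'}{\pi} = \cos\theta
\;&\Longrightarrow\;
\theta' = \frac{\pi}{2}\,\bigl(1 - \cos\theta\bigr), \\
P_\uparrow &= 1 - \frac{\theta'}{\pi}
= \frac{1 + \cos\theta}{2}
= \cos^2\frac{\theta}{2}.
\end{align}
```

So under those assumptions, choosing theta prime this way turns the straight line into the cosine squared of theta over 2 curve while leaving theta = 0, 90° and 180° where they were.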
So this trick is actually a lot simpler than it seems because when you think about what we have here, as we’ve seen, our model works when theta is 0, when theta is 90°, when theta is 180°, but it breaks down in between because we have a line instead of a cosine squared. And so all this trick is, is just saying that we can go ahead and warp that line into that cosine squared curve simply by saying that the effective measurement axis that the particle is actually being measured along is not the a that we thought it was but is actually this a tilted slightly towards the polarization vector p. And by doing that we can go ahead and bend the statistical predictions of our model in such a way as to make it match the experimental data and also quantum mechanics. Now, the first time you hear this, I mean, you should be thinking, “Rich, come on now. What? This is absurd. We should not tolerate this. We should not go along with this.” Your eyebrow should raise skeptically to the point where your forehead starts to get sore. Like, there’s just no credible way to justify this move, this little trick that we’re doing. And so, for that reason, I want to go ahead and call this the sketchy move. I know it’s kind of a playful terminology, but there’s a couple of good reasons why we want to call it this. First of all, it’s a concept that we’re going to see a couple more times throughout the paper. And then secondly, I want to emphasize that this move is not illegal. It’s not logically impossible. Technically, it doesn’t violate locality. There’s nothing physically impossible going on when we put forward this model. But it’s extremely sketchy and hard to believe because it raises so many questions. Why should the effective measurement axis be a prime? And also, how is it then that we have the polarization vector and also our hidden variable lambda vector that we both have to take into account? Because the polarization vector bends the effective measurement axis.
Then we also have this lambda vector and what’s going on there? And our whole model starts to become complicated and contrived and very very hard to believe. But we’re not going to dismiss it just yet, because later when we think about quantum entanglement, we’re going to prove that even the sketchy move is no longer enough to save our model or any local hidden variable model. And that’s really at the heart of Bell’s theorem. So in summary, by going along with the sketchy move for now, we’re being maximally open-minded, we’re giving the local hidden variable perspective every benefit of the doubt. So that later on when we absolutely destroy local hidden variables, when we crush this idea, we’ll say, “Look, we even allowed the sketchy move and that still wasn’t enough to make it work.” Now, I want to take just a moment to talk about the kind of mathematical vocabulary we use in quantum physics when we’re describing measuring the spin of a spin 1/2 particle along some direction, call it a. And to do that you often see this expression sigma dot a. Let me tell you what that is. So we have the famous Pauli matrices, which are sigma x is [[0, 1], [1, 0]], sigma y is [[0, -i], [i, 0]], and sigma z is [[1, 0], [0, -1]]. And you can find the definition of these Pauli matrices in Griffiths’ Introduction to Elementary Particles, equation 4.26. Although honestly if you just Google Pauli matrices you’ll find them all over the place. They’re super famous. And these Pauli matrices are generators of su(2), the Lie algebra of SU(2), which is the group that has to do with transformations of two component spinors. It’s the special unitary group of degree 2. Anyway, today we don’t need to get into the group theory of SU(2), but I just bring up the Pauli matrices in a sort of vocabulary-like context. Like we’re not actually going to have to explore their mathematical properties, but I just want to show you why it is that these matrices are associated with measuring the spin of a spin 1/2 particle. You often see sigma with an arrow over it.
And you can think of that as a vector whose components are the three Pauli matrices. So you have sigma x, sigma y, sigma z all packaged into this vector-like quantity. And with that sigma vector, we can go ahead and define the spin operator along the unit vector a as S hat. The spin operator S hat equals h bar over 2 times sigma dot a. And what we mean by sigma dot a is we’re going to multiply each of the components of our measurement direction a with the corresponding Pauli matrix. So we have a sub x sigma x plus a sub y sigma y plus a sub z sigma z. So when you pick out a particular direction in three-dimensional space and you want to measure the spin of a particle along that direction, the components of that direction unit vector are like weights of how much of each of the Pauli matrices we’re going to bake into our spin operator along that direction. Now why do we care about a spin operator? Well, as we talked about in the EPR paper, when you have an observable quantity like spin, the value of the quantity is going to be the eigenvalue corresponding to the eigenstates of the operator. So if we have a spin 1/2 particle and its state is represented by the two component spinor s, then the spin operator acts on s as the equation S hat operating on s is h bar over 2 times sigma dot a times s. And bear in mind sigma dot a, this is going to be a 2x2 matrix. In fact if you want to think about it in terms of the Lie algebra su(2), that matrix is going to live at the coordinates a sub x, a sub y, a sub z within the Lie algebra, which is spanned by the Pauli matrices sigma x, sigma y, sigma z. If that makes sense, great. If it doesn’t, don’t worry about it. That’s a level of group theory that we don’t have to get into today. Instead, I want to give you a specific example of what it means for a particle to be an eigenstate of the spin operator.
So if a particle has definite spin, that is, we’ve measured the spin and it’s either spin up or spin down along some axis, then it is going to be an eigenstate of the spin operator along that axis. That’s what the measurement does. You measure the spin of a particle and you’re projecting its wave function onto an eigenstate of the spin operator along that axis. And so therefore s is going to be a solution to the equation of S hat acting on s equals lambda s for some real value lambda, which is going to be the spin of the particle. As a concrete example let’s suppose we’re measuring the spin of a particle along the z axis. Well in that case our direction vector becomes (0, 0, 1) cuz the vector doesn’t point in x, it doesn’t point in y, it points entirely in z. And so therefore if we evaluate this quantity of sigma dot a we find that we have no sigma x, no sigma y and all sigma z. And so then our spin operator along the z direction becomes h bar over 2 times [[1, 0], [0, -1]]. And so now if we want to solve for what are the eigenstates of spin up and spin down along z, all we have to do is solve this equation of h bar over 2 times this sigma z matrix times s equals lambda times s for some real eigenvalue lambda. And this eigenvector eigenvalue equation has the solutions of (1, 0) or (0, 1) for s, and then you find eigenvalues of plus h bar over 2 and minus h bar over 2 respectively. And you can verify that for yourself if you plug into that eigenvector eigenvalue equation these different options for s and lambda. Oh, and one other thing I’ll say is that for these eigenvectors, you can go ahead and slap a complex phase factor onto both components and they remain eigenstates. And in a moment, I’ll show you a picture which makes that point obvious. But for now, I’ll just leave that as a mathematical algebraic statement. All right. Now, instead of the spin operator S hat, we may as well just talk in terms of sigma dot a, which conceptually is exactly the same thing as S hat.
The only difference is it’s not scaled by that factor of h bar over 2. And so therefore, this sigma operator has nice dimensionless eigenvalues of plus or minus one for spin up versus spin down. And so therefore the sentence “the particle was measured spin up along the a axis” can be said as “measuring sigma dot a yielded a value of +1.” Or in other words if you want to say the particle was measured spin down along the a axis, we can say sigma dot a yielded a value of -1. Or if you want to say the particle was measured spin up along the b axis you say sigma dot b yielded a value of +1. Right? So what we have here is a very concise and mathematical way of saying that a spin 1/2 particle was measured along some axis and the result of that measurement is simply the eigenvalue +1 or -1. So in Bell’s paper, he’s going to use this a lot. And so that’s why I wanted to show you where sigma dot a comes from and what it means. And we don’t really have to get too deep today into the theory of SU(2) and spinors and all that and Pauli matrices. So if you’re not super familiar with all of these algebraic details, that’s actually totally fine. For the purpose of understanding Bell’s paper, you really just have to know from a vocabulary point of view that sigma dot a means measuring the particle spin along the a axis and that the results are going to be +1 or -1 depending on whether it turns out to be spin up or spin down respectively. Before we move on, I do want to give you just a couple more examples of this concept just to make the idea a little bit more intuitive, a little bit more familiar. So suppose we had measured instead of along z along the x direction. Well then we find that the spin operator along x is going to be h bar over 2 times sigma x. And when you think about what are the solutions to h bar over 2 times sigma x acting on s equals lambda s, you find the eigenstates of 1 over root 2 times (1, plus or minus 1) corresponding to eigenvalues of plus or minus h bar over 2.
That is to say we find the same exact kind of situation as before when we measured along z as far as the eigenvalues. You have two options, spin up or spin down. The magnitude of the observable is h bar over 2. But now you have this spinor that’s in a different state. It’s pointing in a different direction. And by the way, the one over root 2, that’s just a normalization constant. And likewise, we can repeat exactly the same procedure. We can measure along y. We find that the spin operator along the y direction is h bar over 2 times sigma y. You solve that eigenvector eigenvalue equation. You find the eigenstates of 1 over root 2 times (1, plus or minus i) with the same old eigenvalues of plus or minus h bar over 2. And I know all of this feels very abstract, but there is a visual story that goes with this algebra. And I’ve touched on it in my previous videos about the mystery of spinors and electromagnetism as a gauge theory and also deriving the Dirac equation, where there’s a way of drawing a two component spinor as a flag in three dimensions. So for example, let’s take the eigenstate for a particle that’s in a spin up state relative to the z axis, that is, the spinor (1, 0). Well, if we plot that using this flag picture diagram and we’ll go ahead and slap on a time evolution phase factor corresponding to the energy of the particle, we see that we have a flag that points straight up along z. And then the time evolution phase factor, that is, the rotation in the complex plane, is going to twirl that flag around. If you’re curious as to the algebraic machinery that’s happening behind the scenes, definitely check out the paper An Introduction to Spinors by Andrew M. Steane. That paper explains in depth how exactly the two component spinors map onto these flag diagrams. But now then if we plot the spin down along z spinor, (0, 1) that is, you see hey it’s a flag that’s pointing down along z. So that makes sense.
And now notice the time evolution phase factor, which rotates the flag in the complex plane, has the effect of twirling the flag but in the opposite way as before. Although really it’s the same way. It’s just that the flag is pointing in the opposite direction. The way to see this is point your right thumb along the direction that the flag pole is pointing and then you find that the phase factor is going to twirl the flag in the same way that your fingers go around on your right hand. So we find in these spinors a picture of a thing, of some kind of quantity that has an orientation and that kind of spins around under a complex phase time evolution. And so that gives you a feel for some of the algebraic machinery that’s happening behind the scenes when we talk about spinors and Pauli matrices and all of that. And so now I want you to imagine in your mind what would the eigenstate of spin up along the x axis look like? Well, there it is. Makes sense, right? So, this is 1 over root 2 times (1, 1) with the time evolution phase factor. We can go ahead and also add on the spin down along x eigenstate. And that’s exactly as you would expect. Now, let’s also add in the spin up along y eigenstate. And there it is pointing along y, spinning around. And if you add in the spin down along y eigenstate, well then there it is. So without going into too much detail about the algebra of spinors and all that, I just wanted to show you that there is a picture corresponding to all of this algebra. And that’s something that I would definitely encourage you to read more about and to explore. But for the purposes of Bell’s paper, we actually don’t need to get too into the details there. But I hope this has been useful context. All right. So before returning to the paper, I want to say a couple of words about the concept of the expectation value of these spin measurements cuz we’re going to see that concept later on in the paper.
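If you’d like to verify the Pauli matrix statements from the last few minutes for yourself, here is a small sketch using NumPy. The helper names sigma_dot and is_eigenstate are made up for this illustration and are not from the video or from Bell’s paper. It works with the dimensionless operator sigma dot a, so the eigenvalues are +1 and -1 rather than plus or minus h bar over 2.

```python
import numpy as np

# The three Pauli matrices.
SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)

def sigma_dot(a):
    """Return sigma . a = a_x sigma_x + a_y sigma_y + a_z sigma_z."""
    ax, ay, az = a
    return ax * SIGMA_X + ay * SIGMA_Y + az * SIGMA_Z

def is_eigenstate(op, s, value):
    """True if op @ s equals value * s up to floating point rounding."""
    return np.allclose(op @ s, value * s)

# Spin up / spin down spinors along z, x and y.
up_z, down_z = np.array([1, 0]), np.array([0, 1])
up_x, down_x = np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)
up_y, down_y = np.array([1, 1j]) / np.sqrt(2), np.array([1, -1j]) / np.sqrt(2)

print(is_eigenstate(sigma_dot([0, 0, 1]), up_z, +1))
print(is_eigenstate(sigma_dot([0, 0, 1]), down_z, -1))
print(is_eigenstate(sigma_dot([1, 0, 0]), up_x, +1))
print(is_eigenstate(sigma_dot([0, 1, 0]), down_y, -1))

# An overall complex phase does not spoil the eigenstate property.
print(is_eigenstate(sigma_dot([0, 0, 1]), np.exp(0.7j) * up_z, +1))

# For any unit vector a, sigma . a has eigenvalues -1 and +1.
a = np.array([1.0, 2.0, 2.0]) / 3.0
print(np.linalg.eigvalsh(sigma_dot(a)))
```

The last line is the point that matters for Bell’s paper: whatever axis you pick, a measurement of sigma dot a can only come out as +1 or -1.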
So remember earlier we were looking at the slide shown here and we thought about how if we rotate the second magnet by an angle theta for a particle beam, which we know is going to be spin up if we measure it vertically, then the beam is going to split into two beams. And for a small angle theta, it’s going to be mostly spin up. But there’s some probability of that also being spin down. And then as we talked about before, the probability of spin up is going to be cosine squared of that tilt angle theta over 2. And likewise, the probability of it being spin down is going to be 1 minus that. So we’re going to have sine squared of theta over 2. And that’s all fine and good and that’s totally true and that’s one way to talk about it. But there’s another way we can talk about it in terms of expectation value which is in some ways more convenient. So to be really technical about this, suppose we go ahead and call the second magnet’s axis the vector a, and then as we talked about we can use the notation sigma dot a as a shorthand for the result of measuring the spin along the axis a. Because as you know when you dot the sigma vector comprised of the Pauli matrices with some unit vector a, you end up with something that’s directly proportional to the spin operator but which has eigenvalues of +1 if the particle is measured spin up and -1 if the particle is measured spin down. So then now we ask the question of what is the expectation value of sigma dot a. And all we mean by expectation value is the average over many measurements holding the a vector constant. Let me give you an analogy. Let’s say you’re a gambler and somehow you have the opportunity to play a game where you have a 60% chance of winning a dollar and a 40% chance of losing a dollar. Well, in that case, the expectation value is going to be 20 cents because you have 0.6 times 1, which is 0.6, and then you add on to that the 0.4 times -1, which is -0.4, and so you have a net 0.2 expectation value of a profit and so you should play that game. Now the reason I bring up this analogy is because of course if you play the game once you’re not going to get 20 cents. You’re either going to make a dollar or you’re going to lose a dollar. So we should not expect one game to yield 20 cents. However, if you play that game 100 times you’re going to have about 20 bucks. That’s what you should expect to have. And so that’s exactly the sense in which we use the term expectation value when thinking about these spin measurements. In every case, when you measure the spin, it’s going to be a plus one or a minus one. But depending on the tilt angle and depending on the probability that depends on the tilt angle, there’s going to be some average number that we’ll find for that tilt angle over many subsequent measurements along that axis. And if you work out the math as we’ll do in just a moment, you end up with the plot shown here where on the x-axis we have the tilt angle theta and on the y-axis we have this curve for the expectation value, and by the way we use the bracket notation here to indicate expectation value. Well, as a sanity check, let’s go ahead and look at a few points and see if this curve kind of makes sense. So first of all when theta is zero and when a is aligned with the polarization of those incoming spin-up atoms then we find an expectation value of one and that makes sense because when the second detector is not tilted then every single time a spin up coming in is going to be a spin up going out and so sigma dot a is going to yield an eigenvalue of plus one all the time. So you do it 100 times you’re going to get 100 plus ones. And then conversely, if we flip a all the way upside down, then you have a spin up coming in relative to the upside down second detector. That’s always going to come out as a spin down. And so in that extreme case, you always have a negative 1 for sigma dot
a, therefore, the expectation value is precisely -1. Now, if you check out this point in the middle of the plot when theta is 90° and the measurement axis a is perfectly perpendicular to the incoming spin up polarization, well, in that case, sigma dot a is going to be a +1 or a -1, you know, each with a 50% probability. And so if you have a set of 100 numbers which are either +1 or -1 with equal probability, well, you add those all up and on average you’re going to get zero. All right, then. So based on the three points we’ve looked at, the curve seems to make sense. But how do we calculate the exact form of this curve? Well, all you have to do is think like a gambler and say the expectation value is going to be the probability of measuring spin up along the axis a times a plus one corresponding to spin up, plus the probability of measuring spin down along the axis a times the negative 1 that corresponds to spin down. This is just like in that game where you have a 60% chance of winning a dollar, a 40% chance of losing a dollar. So the expectation value is $0.20. So it’s the same reasoning as a gambling calculation. And as we saw earlier, we already know the probability of measuring spin up versus spin down. In the first case, we have a cosine squared of theta over 2 probability of measuring spin up. And then we have a sine squared of theta over 2 probability of measuring spin down. Now, if you are a trig identity enthusiast, you’ll recognize this form as having a delightful simplification, which is that cosine squared of theta over 2 minus sine squared of theta over 2 equals cosine of theta. Isn’t that wonderful how that simplifies? So that’s a super nice result. And we’re going to see the same result in Bell’s paper in equation 3 in a slightly different context, but it’s the same exact reasoning. So anyway, that’s all I wanted to say about the expectation value. So just think about this as a pretty common and useful way of putting a statistical handle on this kind of probabilistic situation. All right, then.
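The gambler’s calculation above is easy to reproduce. The sketch below is purely illustrative (the function names, sample size and seed are arbitrary choices, not from the video): it computes the expectation value exactly from the two probabilities and also estimates it by averaging many simulated +1 and -1 outcomes.

```python
import numpy as np

def expectation_exact(theta):
    """(+1) * P(up) + (-1) * P(down) with the quantum probabilities."""
    p_up = np.cos(theta / 2) ** 2
    p_down = np.sin(theta / 2) ** 2
    return (+1) * p_up + (-1) * p_down

def expectation_sampled(theta, n=100_000, seed=0):
    """Average of n simulated outcomes, each +1 with probability cos^2(theta/2)."""
    rng = np.random.default_rng(seed)
    p_up = np.cos(theta / 2) ** 2
    outcomes = np.where(rng.random(n) < p_up, 1, -1)
    return outcomes.mean()

# The gambling game: 60% chance of +1 dollar, 40% chance of -1 dollar.
print(0.6 * 1 + 0.4 * (-1))

# Spin measurements at a few tilt angles: exact, sampled, and cos(theta).
for deg in (0, 60, 90, 120, 180):
    theta = np.radians(deg)
    print(deg, expectation_exact(theta), expectation_sampled(theta), np.cos(theta))
```

Any single simulated outcome is +1 or -1, but the average settles near cosine of theta, which is the sense of expectation value used here.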
So now I think we’ve discussed all of the prerequisites that we need for the remainder of the paper. So now let’s go ahead and get into part two, formulation. So remember how in the EPR paper they gave a specific example of a two particle wave function with anti-correlated momenta and correlated positions. And with that wave function, we saw how if we measure the momentum of one of the particles, we end up putting the other particle in a momentum state. And conversely, if we choose to measure the position of the particle, then we put the other one into a position state. So that specific wave function in the EPR paper was a very mathematically convenient example to illustrate the point. However, of course, the EPR paradox is more general than just a single specific two particle wave function. And if you see equations 7 and 8 of the EPR paper, you can see that more generically, whenever you have two particles in an entangled state and you think about representing that wave function as a sum over states of the first particle, then when you measure the first particle and put it into that eigenstate, that’s going to have an impact on the state of the second particle. And so really the EPR paradox is just the observation that because we have the freedom to choose which observable we measure of the first particle, we have the ability then to affect the quantum state of the second particle in a way that somehow appears to violate the constraint of local causality. So anyway, the reason I bring that up is because in Bell’s paper, we’re going to use a different two particle state to get at the same fundamental paradoxical nature of quantum physics. So instead of the particles having anti-correlated momenta and correlated positions, we’re going to imagine a pair of spin 1/2 particles whose spins are going to be in an entangled state. And this configuration for thinking about the EPR paradox is actually not original to Bell. It was first put forward by Bohm and Aharonov in 1957.
So part two of Bell’s paper begins: with the example advocated by Bohm and Aharonov, the EPR argument is the following. Consider a pair of spin 1/2 particles formed somehow in the singlet spin state. Now I want to pause here and say what exactly is the singlet spin state? Well that means that the spins of the two particles have no preferred direction a priori. If you think about either of the particles and you’re going to measure their spin, there’s total rotational symmetry in that neither of the particles has a preferred spin axis. It’s totally uniformly distributed over all possibilities. However, the spins of the particles exhibit perfectly anti-correlated outcomes when measured along the same axis. And this is a very bizarre state of affairs. Intuitively, you would think that such a state is not possible. And yet, the singlet state has been measured in all kinds of experiments. So, this really is possible. This is something that is real. And as we’ll talk about later in the paper, even though it’s very hard to imagine and it seems kind of surreal, the experimental data very strongly indicates that the singlet state is actually a legit thing that can exist. And you sometimes hear the singlet state described as the particles having equal and opposite spin. But that’s not exactly true, or rather that’s too narrow of a description. It is true that if you measure the two particles along the same axis, you’ll always find that their spins are equal and opposite. But, and this is really a super important fact about the singlet spin state, so I want to reemphasize this: before the measurement, neither of the particles has a preferred spin direction. This is very hard to imagine but that is a super important aspect of what it is for the particles to be in the singlet state. All right. So that’s the singlet state.
Now imagine that we have some process which produces pairs of spin 1/2 particles in the singlet state and then each particle goes its separate way and they’re both moving freely in opposite directions. Now then suppose we send each particle into a detector, say maybe a Stern-Gerlach magnet, and then we measure the spin of both particles to get a sense of the kind of thing that happens here. At first we’re going to say that the detectors are measuring along the same axis. Let’s go ahead and denote the two measurement axes with the unit vectors a and b respectively. And for starters, those unit vectors are going to be precisely aligned so that we’re measuring both particles along the same spin axis. And now because the particles are in the singlet spin state, if we measure the spin of particle 1 along the direction a and we get the value of +1, right, so suppose particle 1 measures spin up along a, then according to quantum mechanics and what it means for the particles to be in the singlet state, for sure it’s 100% guaranteed that measuring the spin of particle 2 along the same axis is going to yield a value of -1, that is, spin down. And vice versa. Had we measured particle 1 in the spin down state then we would know for sure that particle 2 would be spin up along the same axis. By the way, just a comment on the notation here. So, as we talked about earlier, the expression sigma dot a is shorthand for measuring the spin of the particle along the axis a. And this operator returns a plus one if it’s spin up along a and a minus one if it’s spin down along a. Now, then the subscripts here, 1 and 2, all that indicates is that in the first case we’re measuring particle 1 and in the second case we’re measuring particle 2. So it’s not like we have two different sigma vectors. No, it’s the same Pauli matrices. It’s the same operator. It’s just that in the first case, sigma 1, we apply it to the first particle. And in the second case, sigma 2, we apply that to the second particle.
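To make the singlet state a bit more concrete, here is a small numerical sketch. It assumes the standard textbook form of the singlet, (up-down minus down-up) over root 2, written in the z basis, and builds sigma 1 dot a and sigma 2 dot b as Kronecker products acting on the first and second particle. The names correlation and single_side_mean are hypothetical helpers for this example only.

```python
import numpy as np

# Pauli matrices stacked as sigma_x, sigma_y, sigma_z.
SIGMA = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def sigma_dot(a):
    """Return the 2x2 matrix sigma . a for a 3-component vector a."""
    return np.tensordot(np.asarray(a, dtype=float), SIGMA, axes=1)

# Two-particle singlet state (|up, down> - |down, up>) / sqrt(2).
up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
SINGLET = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

def correlation(a, b):
    """Expectation value of (sigma_1 . a)(sigma_2 . b) in the singlet state."""
    op = np.kron(sigma_dot(a), sigma_dot(b))
    return np.real(np.vdot(SINGLET, op @ SINGLET))

def single_side_mean(a):
    """Expectation value of sigma_1 . a alone (particle 2 left unmeasured)."""
    op = np.kron(sigma_dot(a), np.eye(2))
    return np.real(np.vdot(SINGLET, op @ SINGLET))

z = [0, 0, 1]
x = [1, 0, 0]
tilted = [np.sin(np.pi / 3), 0, np.cos(np.pi / 3)]

print(correlation(z, z))       # same axis: perfectly anti-correlated
print(correlation(x, x))       # same again along x: no preferred direction
print(single_side_mean(z))     # one particle alone averages to zero
print(correlation(z, tilted))  # different axes: minus the cosine of the angle
```

Measuring both particles along the same axis gives a product of -1 every time, along any axis you like, while each particle on its own averages to zero. For different axes the product averages to minus a dot b, the standard quantum mechanical result for the singlet.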
So now we make the hypothesis of local causality, and it seems one at least worth considering: that if the two measurements are made at places remote from one another, the orientation of one magnet does not influence the result obtained with the other. And just to really emphasize that point, imagine that detector A and detector B are separated so far and that the measurement of particle 1 and the measurement of particle 2 happen so closely together in time that whatever tiny time difference there is between these two measurements, not even light could travel between detectors A and B during that time. So, we imagine that the measurements going on at detector A and detector B are completely causally disconnected if local causality is to be believed. But here’s where we run into the EPR paradox. Since we can predict in advance the result of measuring any chosen component of the spin of particle 2 by previously measuring the same component of the spin of particle 1, it follows that the result of any such measurement must actually be predetermined. That is to say, the particles start off in the singlet state with no preferred spin direction. Then imagine particle 1 is measured in detector A ever so slightly before particle 2 is measured in detector B. You know, by 0.0001 ns or whatever. Well, as soon as we’ve measured particle 1 along the axis a, now we can predict with certainty the component of the spin of particle 2 along the same axis. And yet that certainty does not exist in the quantum description of the singlet state. Now we can tell a story about non-local wave function collapse, where you measure particle 1 along axis a and the wave function instantly collapses, and then particle 2 is no longer in the singlet state but now it’s for sure going to be polarized in accordance with that measurement direction a.
But assuming that we don’t allow for non-local wave function collapse because we want to preserve our sanity and we want to hold on to this concept of local causality, then we find here an apparent contradiction, because the spin of particle 2 along the axis a should definitely not be predictable with certainty given the wave function of the singlet state. Quantum physics just doesn’t allow for that level of predictability unless we allow for the possibility of instantaneous wave function collapse. So then since the initial quantum mechanical wave function, that is the singlet state, does not determine the result of an individual measurement, this predetermination implies the possibility of a more complete specification of the state. And so that is the EPR paradox, this time thought of in terms of spins rather than momentum and position states. And so in other words, all of this thought process leads us to think that surely there must be some kind of hidden variables that go along with particles 1 and 2 in a way that quantum mechanics doesn’t account for. And if only we had some kind of more complete model where we could figure out what are those hidden variables and what are their dynamics and how do they influence the spin measurements, then surely we could find a more complete and more sane and more understandable explanation of what’s going on here than what quantum mechanics currently has to offer. Well, all right then. So we want a more complete theory involving some kind of hidden variables. So let this more complete specification be effected by means of parameters lambda. These are going to be our hidden variables. So in this video, whenever you see this yellow lambda, that’s going to stand for whatever hidden variables we want to put into our model that’s going to give us a more complete description of what’s happening.
So you know, earlier we were looking at the Stern-Gerlach experiment and we were trying to explain it in terms of particles carrying with them this yellow vector. And so that was an example of lambda. But now we’re going to broaden that up a little bit. Or actually we’re going to broaden it up all the way and say lambda can be whatever you want it to be. Whatever you can imagine. A vector, a scalar, a tensor, a function, a set, whatever you want it to be. It is a matter of indifference in the following whether lambda denotes a single variable or a set or even a set of functions, and whether the variables are discrete or continuous. The beautiful thing about Bell’s paper is it accounts for all possible hidden variable models in one fell swoop, because it’s such a generic argument, as we’ll see. However, we write as if lambda were a single continuous parameter. So in the notation that we’ll be using, for example, we’ll integrate over all possible lambda and it’ll look like we’re assuming that lambda is a continuous parameter. However, what Bell is saying here is that if you want to modify the argument so that lambda is not a continuous parameter but is rather a discrete parameter or a set or whatever contrived thing you want to come up with, you can trivially modify the argument to account for that. Replace an integral with a sum or whatever you have to do. Those kinds of modifications won’t have any effect on the logical structure of the argument put forward in this paper. So now let’s think about what’s happening in these detectors. And at this moment we can go ahead and say that the axis of measurement in detector B does not have to be the same as the axis of measurement in detector A. So we’re going to make this more generic.
Oh, and one thing that I’ll point out is that in everything we’re about to talk about, what matters as far as the orientations of the unit vectors a and b is only the angle between those two vectors, the extent to which they’re aligned or misaligned. And when you think about two vectors in three-dimensional space, the two vectors are going to span a plane, and then there’s going to be some angle between them in that plane. And that angle between them, that theta angle, is the relevant quantity when we’re thinking about how the orientations of these two measurement axes are going to matter. And so if you want, you can imagine a fully generic three-dimensional situation where a and b can point whichever ways you want to imagine them pointing. But because it’s only the theta angle between them that matters in whatever plane they happen to span, we may as well imagine the a vector pointing straight up. And then we can imagine the b vector having some arbitrary orientation in the plane. And so in the diagram shown here on your two-dimensional screen, with a pointing up and b pointing wherever, imagine rotating the b axis a full 360. Well, for all intents and purposes, that 360 sweep is going to span all of the possibilities as far as the ways in which we can misorient our detectors relative to each other. And actually, you only need 180, cuz once you tilt it past 180, theta starts to come back in. See what I mean? And then technically by symmetry, all the interesting stuff happens between 0 and 90°. Okay. So then what is actually going on in these detectors? Well, if we assume this hidden variable model, then the result A of measuring the spin of particle 1 along the a axis is then determined by the a axis and the hidden variable lambda. So, particle 1 is coming in, it’s carrying with it some kind of hidden variable, maybe some vector, some scalar, some tensor, whatever it is, whatever hidden variable we want to imagine.
And as particle 1 goes into detector A and detector A is oriented along the a axis, then the only things that are going to affect the spin measurement of particle 1 are the orientation of that a vector and the hidden variable lambda that goes with particle 1.

Because particles 1 and 2 are in the singlet state, they don’t have any a priori preferred directions. So the result of the spin measurement is going to be deterministically fixed by however the hidden variable lambda interacts with the detector oriented along a. And likewise then the result B of measuring the spin of particle 2 along the b axis in the same instance is determined by the b axis and lambda, for exactly the same reason. And so we can write that the measurement outcome at A, as a function of the measurement direction a and the hidden variables lambda, can take on a value of +1 or -1 depending on whether particle 1 is measured spin up or spin down respectively. And likewise the measurement result at detector B, which is a function of the b axis and the hidden variables lambda, is also going to take on a value of +1 or -1 for spin up and spin down respectively. And we’re going to leave this fully generic as far as in what way or by what function the hidden variables interact with the measurement axis. Whatever it is you can imagine, whatever principle you want to go ahead and postulate, it’s still for sure the case that whatever these functions actually are, by definition, they’re going to have values of plus or minus one depending on the outcome of the spin measurement. Now the vital assumption of local causality is that the result B for particle 2 does not depend on the setting a of the magnet for particle 1, nor does A depend on b. So in equation one you see that A is a function of the a vector and the hidden variables lambda. B is a function of the b vector and the hidden variables lambda. But notice that A is not a function of the b vector, nor is B a function of the a vector. The reason being, detectors A and B are separated so far apart and these measurements happen so quickly.
So there’s no way that the information about which way one detector is oriented can propagate over to the other detector and affect the measurement result in any way. No, these two things happen outside of each other’s light cones. And so by local causality, you can’t have the measurement result of A depending on the b vector or vice versa. And one of the things that we’re going to show in this paper is that any hidden variable model that reproduces the quantum predictions is going to have to violate that assumption. And the only way to get it to work is if you’re going to relax that constraint and say, okay, the measurement outcome at A depends on the orientation at B and vice versa. And then it’s like, oh, that’s weird. That’s non-local. That is absurd. But you know, that’s like super weird. And then so at that point there’s no advantage to using a hidden variable model, because whether you take ordinary quantum mechanics or some speculative hidden variable model, in both cases you’re going to have a non-local model. And so no matter how you look at it, it’s a glitch in reality. All right. Then suppose we define rho of lambda as the probability distribution of the hidden variables lambda. So in other words, imagine all possible configurations of our hidden variables lambda, whether they’re vectors or scalars or tensors or functions or sets, whatever you want to imagine for lambda. There’s going to be some space of configurations, some space of possibilities that lambda can take on. And you can assign a probability to each and every configuration. And so rho of lambda is precisely the distribution which defines how likely our hidden variables are to exist in whatever state we can imagine them existing in. So this is quite a generic thing, and as we go through the paper we’ll imagine some specific cases with some simple functions for rho of lambda. But notice the power in keeping this generic. See, so far we haven’t narrowed down what lambda can be. Our hidden variables can be whatever you can imagine.
And then rho of lambda as a probability distribution on those hidden variables can also be whatever you want to imagine. Whatever distribution you want to take over whatever space of variables you want to define. And even though our setup is so generic, one of the things we can still say for sure is that the expectation value of the product of the two components, measuring particle 1 along the a axis and measuring particle 2 along the b axis, is going to be P of a and b, where here P is the expectation value of the product of A and B, that is the plus or minus one that’s recorded at each detector. We can say that P of a and b is going to be the integral over all possible configurations of hidden variables, each one weighted by rho of lambda, that is how likely that configuration is to be. And then as we’re integrating over that space of possible hidden variables, for each possibility we simply multiply the outcome of the measurement at detector A, that is A of a and lambda, times the measurement outcome at detector B, that is B of b and lambda. By the way, in Bell’s paper, he writes this integral as integral rho lambda d lambda A times B. I like to write it in the sandwich notation where you have the integral sign on the left and the differential element on the right and then whatever you’re integrating over in between. It doesn’t matter either way. It’s just a stylistic choice. So, well, anyway, I want to reflect on exactly what this equation means, equation two, because it is of central importance to everything that follows. So, this quantity P, we’re going to go ahead and call that the correlation between our measurements. And this correlation has a really intuitive meaning. So the first thing to notice is that P, the correlation, has to be somewhere in between -1 and 1. When it’s -1, then the measurement outcomes at detector A are going to be perfectly anti-correlated with the measurement outcomes at detector B.
So for example, this would be when detector A and detector B are aligned along precisely the same axis. Because if we have a pair of particles in the singlet state and we measure them both along the same axis, then if one is spin up the other is spin down and vice versa. So if A is +1 then B is -1 and vice versa. And so when we’re measuring the singlet state along the same axis, then the product of A and B is always going to be -1, because 1 times -1 is -1 and -1 times 1 is -1. And so in that configuration, if the product of A and B is always -1, then equation 2 is simply the negative of the integral over rho of lambda d lambda. Now this is a normalized probability distribution. So when you integrate over all possibilities and each one is weighted by the probability distribution, the result of that integral is always going to equal one, because there’s a 100% chance that the hidden variables are in some kind of configuration. And so then we find that P of a and b, when a and b are the same vector, is equal to -1. Conversely, if we flip b around so that now b is equal to negative a and our measurement axes are pointing in opposite directions, then we find a correlation of 1. That is, the product of A and B is always going to equal 1. Because if we measure the particle spin up in detector A, but then detector B is flipped upside down relative to detector A, then the other particle is also going to be measured spin up in detector B, but along the upside down axis. So the singlet correlation is still there. It’s just that when you flip the vector b upside down, that’s kind of a redefinition of what spin up and spin down mean in detector B. And so in that case, if the product of A and B is always equal to 1, because 1 times 1 is 1 and also -1 times -1 is 1, then equation 2 simply reduces to the integral of rho of lambda d lambda, which because rho is a normalized probability distribution equals 1. Now there’s one more special case that we can imagine, which is when a and b are perpendicular.
So suppose a is pointing straight up and b is pointing straight to the right. Well, in that case, we should expect a correlation of zero. The reason being, in the singlet state, say you measure spin up along a. Well, if b is perpendicular to a, then it could go either way. You could get a spin up or a spin down. And so the product of A and B is going to be a +1 or a -1 about 50/50. And so that’ll average out to zero. So if we have a value of P equals zero, there is no correlation between the two detectors. Okay, so that’s equation two. The correlation between our measurement outcomes is found simply by integrating, over the space of all possible hidden variables weighted by the probability of each configuration, the product of the plus or minus one outcome at A times the plus or minus one outcome at B. Now that correlation given by equation 2, based on a hidden variable model, should equal the quantum mechanical expectation value, which for the singlet state is going to be negative a dot b, or as we saw earlier, negative cosine of theta, where theta is the angle between the two measurement axis vectors a and b. Call that equation three. And the way to see that equation three is true, that this is the quantum mechanical expectation value, and that this does match the experimental data, is just to imagine that particle 1 gets to detector A ever so slightly before particle 2 gets to detector B. So then particle 1 is measured along the a axis and the wave function instantly collapses. And now particle 2 is going to be polarized opposite to the outcome found along the a axis. And so then when you measure the spin of particle 2 along the direction b, you can think about it sort of like the two-stage Stern-Gerlach experiment, where we create a beam of purely polarized spin-up particles and we send that through a second detector which is tilted by some angle theta. And then as we know, we have a cosine squared of theta over 2 probability of measuring spin up and a sine squared of theta over 2 probability of measuring spin down.
And if you think like a gambler and calculate the expectation value, you end up with an expectation value of cosine of theta for the measurement outcome at the second detector, if spin up is +1 and spin down is -1. And we saw that earlier. And then the minus sign here simply comes from the fact that the two particles in the singlet state are anti-correlated. So if particle 1 is spin up along the axis a, then particle 2 is actually going to be polarized spin down along a. And so that’s where the minus sign comes from. It’s basically just a 180 flip of the two-stage Stern-Gerlach experiment that we were looking at earlier. Well, anyway, all that’s to say, quantum mechanics tells us that the correlation of the measurement outcomes for unit vector a at detector A and unit vector b at detector B, for two particles in the singlet state, should be negative cosine of theta, where theta is the angle between the two vectors. And so the main question of this paper is, is it possible to have some hidden variable model based on some set of possible lambdas and some probability distribution which describes the likelihood of each lambda? Based on a model like that, can we get equation 2 to match the quantum mechanical and the experimental value of negative cosine of theta between the vectors a and b? If so, then such a hidden variable model might be plausible, you know, because it would match the data. It would match quantum theory and yet it would be an alternate way of looking at things. So that’s cool. But what we’re going to show in this paper, in particular in part four, the contradiction, is that no local hidden variable model can actually have an equation 2 correlation which matches the quantum mechanical correlation and the experimental data. And so therefore, we cannot have a local hidden variable explanation of what’s going on here.
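As a sanity check on the negative cosine of theta result discussed above, here is a minimal numerical sketch. It is not from Bell’s paper or the video; it assumes NumPy is available, and the helper names (`spin_along`, `singlet_correlation`, `axis_in_plane`, `gambler_expectation`) are made up for illustration. It builds the singlet state as a four-component vector, applies sigma dot a to particle 1 and sigma dot b to particle 2, and also reproduces the gambler-style expectation value for a single polarized particle.

```python
import numpy as np

# Pauli matrices (the same operators are used for both particles)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Singlet state (|up,down> - |down,up>)/sqrt(2) in the basis
# |up,up>, |up,down>, |down,up>, |down,down>
SINGLET = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)


def spin_along(axis):
    """Return the operator sigma . axis for a unit vector axis = (x, y, z)."""
    x, y, z = axis
    return x * SX + y * SY + z * SZ


def singlet_correlation(a, b):
    """Expectation value of (sigma_1 . a)(sigma_2 . b) in the singlet state."""
    op = np.kron(spin_along(a), spin_along(b))
    return float(np.real(SINGLET.conj() @ op @ SINGLET))


def axis_in_plane(theta):
    """Unit vector tilted by angle theta away from 'straight up' (the z axis)."""
    return np.array([np.sin(theta), 0.0, np.cos(theta)])


def gambler_expectation(theta):
    """(+1) * P(up) + (-1) * P(down) for a spin-up beam measured at tilt theta."""
    p_up = np.cos(theta / 2) ** 2
    p_down = np.sin(theta / 2) ** 2
    return p_up - p_down


if __name__ == "__main__":
    a = axis_in_plane(0.0)
    for degrees in (0, 45, 90, 135, 180):
        theta = np.radians(degrees)
        print(degrees, singlet_correlation(a, axis_in_plane(theta)), -np.cos(theta))
```

Running it prints the singlet correlation next to negative cosine of theta for a few angles, so you can compare the two columns directly.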
And so therefore, we have to confront the fact that quantum mechanics genuinely is super weird and non-local and a glitch in reality. Oh, and then one little caveat on the way we’ve formulated things here. Some might prefer a formulation in which the hidden variables fall into two sets, with the measurement outcome at A dependent on one set of hidden variables and the measurement at B depending on another set of hidden variables. However, this possibility is contained in the above, since lambda stands for any number of variables and the dependencies thereon of A and B are unrestricted. So in other words, if you want to have a hidden variable model where particle 1 carries with it some kind of set of hidden variables and particle 2 carries with it a whole other set of hidden variables, go right ahead. That’s fine. We’re not ruling out that possibility. When we use this character lambda to stand for any imaginable hidden variables, you can go ahead and imagine that in whatever way you want, including the situation where you have two sets of hidden variables, one for each particle. You know, go for it. That’s totally fine. We’re not restricting that possibility at all. And likewise, in a complete physical theory of the type envisaged by Einstein, the hidden variables would have dynamical significance and laws of motion. Our lambda can then be thought of as initial values of these variables at some suitable instant. So in other words, if you want to think about hidden variables as some kind of fields with dynamical significance, that’s cool, too. Everything we’re about to argue doesn’t rule out that possibility at all. And if you want, you can imagine lambda representing a snapshot in time of those fields. And then you can imagine those fields evolving in accordance with some dynamical equations. But none of that time evolution is going to take the thought experiment outside of the framework that we’re setting up, because our argument is fully generic.
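To keep the formulation above in one place, here are equations one, two, and three written out in symbols, following the spoken descriptions in this part of the video. The typesetting (vector arrows, ordering of factors) is a choice made here and may differ cosmetically from the paper.

```latex
% (1) Outcomes depend only on the local setting and the hidden variables
A(\vec{a}, \lambda) = \pm 1, \qquad B(\vec{b}, \lambda) = \pm 1

% Normalization of the hidden-variable distribution
\int \rho(\lambda)\, d\lambda = 1

% (2) Correlation predicted by a local hidden variable model
P(\vec{a}, \vec{b}) = \int \rho(\lambda)\, A(\vec{a}, \lambda)\, B(\vec{b}, \lambda)\, d\lambda

% (3) Quantum mechanical expectation value for the singlet state
\langle\, \vec{\sigma}_1 \cdot \vec{a} \;\; \vec{\sigma}_2 \cdot \vec{b} \,\rangle
  = -\,\vec{a} \cdot \vec{b} = -\cos\theta
```

The question of the paper is then whether any choice of lambda, rho, A, and B in (1) and (2) can make P equal the right-hand side of (3) for every pair of axes.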
Anything you can imagine for lambda, lambda can be. You know, I just noticed this yellow lambda. It kind of looks like a banana peel. You wouldn’t want that as a hidden variable. [laughter] Hey, that would affect the measurement of your spin state. All right, moving on. Part three of the paper begins. The proof of the main result is quite simple. Well, according to Bell, at least. I don’t know if I would say it’s quite simple, but uh anyway. Before giving it in part four, however, a number of illustrations may serve to put it in perspective. So part three is all about establishing some context for part four, looking at some specific examples which we’re then going to generalize in part four when we give the formal argumentation that local hidden variable models don’t work. Now I’m going to go ahead and break up part three into three parts, 3A, 3B, and 3C, because this part of the paper is kind of naturally broken up into those three parts anyway and I want to take the time to zoom in on each part of this individually. So the first part of part three is that for a single particle we can make up a hidden variable story of what’s going on with the spin, and it’s okay, it seems to work. Firstly, there is no difficulty in giving a hidden variable account of spin measurements on a single particle. Suppose we have a spin 1/2 particle in a pure spin state with polarization denoted by a unit vector p. And all that means is, imagine we send a beam of spin 1/2 particles through a Stern-Gerlach magnet and then filter it like what we saw before, where we allow only the spin up particles through. Well, then if the axis of that Stern-Gerlach magnet is the vector p, then the outgoing beam of particles is polarized with reference to that vector p. That is to say, if you were to do a subsequent spin measurement on that particle along the direction p, then for sure the result of that measurement is going to be spin up.
So that’s what it means for the particle to be polarized along the direction p. All right. Now then suppose we let our hidden variable be, for example, a unit vector lambda with uniform probability distribution over the hemisphere lambda dot p greater than zero. That is to say, lambda is going to be some additional directional or orientational degree of freedom that travels along with the particle. And we don’t know exactly what lambda is going to be. All we know about it is that it’s going to have a uniform probability distribution over the hemisphere which points in the same direction as p. And so this constraint that the dot product of lambda and p is greater than zero, all that means is that lambda kind of points towards p and it doesn’t kind of point away from p. Now, think back to what we saw earlier in this video, where we sent our particle through a two-stage Stern-Gerlach experiment, and we supposed that all the magnet does is filter out the particles that point a little up versus a little down without actively flipping the arrow up and down. You’ll see that that thought experiment actually gives us a beam of this kind of particle, where we start off with the assumption that the incoming particles, those evaporated silver atoms, have a totally randomly oriented lambda vector, but then we send them through the first Stern-Gerlach magnet to get a beam that’s purely polarized along the axis of that magnet. And then at that point, what we know about the lambda vector is it’s still going to be totally random, but only on the half of the sphere that kind of points along the direction p, because the particles for which lambda pointed away from p were sent into the spin down beam and those didn’t go forward. And so the question then comes up, what happens if we measure the spin of this kind of particle along some axis a? Well, we already know what the expectation value is going to be.
The expectation value of the spin of this kind of particle, from quantum mechanics and from experiment, is going to be the cosine of the tilt angle of the second detector relative to the first. That is, in this language we would say it’s going to be the cosine of the angle theta between the polarization vector p and the measurement vector a. So then suppose that as we’re building our hidden variable model, we speculate that the result of measuring along some axis a is going to be the sign of the hidden variable lambda vector dotted with the effective measurement axis a prime. See, we’re going to have to do a sketchy move here of the kind we talked about earlier. And so a prime is going to be a unit vector which depends on a and p in a way to be specified. We’re going to talk about exactly what that has to be in a moment, but this is exactly the same kind of sketchy move we looked at earlier when we were thinking about how we can modify our hidden variable model into something that matches the data. And in fact, the example we looked at earlier in the video is mathematically equivalent to what we’re talking about now. Oh, and then the sign function here simply takes on the values of +1 or -1 according to the sign of its argument. So the sign of the dot product of the lambda vector and the effective measurement axis a prime is going to be positive if lambda kind of points along a prime, and it’s going to be negative if lambda kind of points away from a prime. And so all this is to say, the measurement result is going to be spin up if lambda is in the hemisphere whose pole is a prime, and otherwise it’ll be spin down if lambda is outside of that hemisphere. And then you can say, what if lambda is right on the equator relative to the north pole of a prime? Well, the probability of lambda being perfectly on the equator is zero. And so we don’t have to worry about it.
As Bell says in his paper, actually this leaves the result undetermined when lambda dot a prime equals zero. But as the probability of this is zero, we will not make special prescriptions for it. So we don’t have to worry about that. Now then, if you average over all possible hidden variable vectors lambda in accordance with the setup we’ve described here, the expectation value of the spin measurement is going to be 1 - 2 theta prime over pi. Call that equation 5, where theta prime is the angle between the effective measurement axis a prime and the polarization vector p. That’s the same theta prime from our sketchy move we talked about earlier. And so let’s go ahead and see where equation 5 comes from. Why does this model give us an expectation value of 1 - 2 theta prime over pi? Well, the reason is that the expectation value of the spin measurement along the measurement axis a, in accordance with the rule that we’ve stipulated here, is going to be the probability that the lambda vector is in the hemisphere defined with a prime at the pole, times a +1 for the spin up result, plus the probability of lambda not being in a prime’s hemisphere, times the -1 value which goes along with the spin down measurement. So we’re thinking like a gambler here and we’re calculating that expectation value. And then when we think about this, what we realize is that the expectation value of the spin measurement is going to be one, its maximum value, when the theta prime angle is zero. That is, when our polarization vector is exactly aligned with the effective measurement axis a prime, then we’re always going to get spin up. Like for sure, 100% guarantee, because when you think about the hemisphere of possible lambda vectors, well, those are going to be in the same hemisphere as the polarization vector. So if the polarization vector and the a prime vector point in exactly the same direction, then lambda is guaranteed to be in a prime’s hemisphere.
So you’re always going to get a plus one in that case. And conversely, the expectation value of the spin measurement if the polarization vector P is completely antiparallel to the effective measurement axis A prime that is if theta prime is pi or 180° then we’re always going to get a negative one a spin down measurement in that case. If the polarization vector is pointing completely away from a prime then the space of possible lambda vectors is precisely the opposite of a prime’s hemisphere. And so you’re always going to get a spin down measurement in that case. And then if you think about rotating the polarization vector P relative to A prime and think about the overlap in the hemispheres of P and A prime, you see that the overlap varies linearly with the angle theta prime. This goes back to what we were talking about earlier. When you imagine the board game with the spinny thing and you spin the needle and the probability of it landing somewhere simply has to do with the area of the wedge that it’s going to land on. Well, as you rotate theta prime, you see that our expectation value is going to vary linearly with the angle theta prime for precisely the same reason. And you can think about that as a two-dimensional circle and a board game spinner thing. Or you can think about it in the full three dimensions as if it’s like an orange and you have the volume of the orange slice going along with the wedge angle. But in any case, this model is going to give us an expectation value of the spin measurement which is linearly dependent on theta prime. And so if you consider the two boundary conditions we’ve looked at for theta prime= 0 and theta prime= pi and then apply the fact that this is a linear function and then just think in terms of y = mx + b. You see that our equation for the expectation value of the spin measurement is necessarily 1 - 2 theta prime over pi. 
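The linear result of equation 5 can also be checked by brute force. The following is a rough Monte Carlo sketch, not anything from the paper: it assumes NumPy, picks the polarization vector p to be the z axis, and uses made-up helper names (`random_unit_vectors`, `mean_spin`). It draws lambda uniformly on the hemisphere lambda dot p greater than zero, applies the rule sign of lambda dot a prime, and averages.

```python
import numpy as np


def random_unit_vectors(n, rng):
    """Draw n directions uniformly over the full sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def mean_spin(theta_prime, n=200_000, seed=0):
    """Average of sign(lambda . a') with lambda uniform on the hemisphere lambda . p > 0.

    theta_prime is the angle between the effective axis a' and the
    polarization vector p. Taking p along z is an arbitrary choice made here.
    """
    rng = np.random.default_rng(seed)
    p = np.array([0.0, 0.0, 1.0])
    a_prime = np.array([np.sin(theta_prime), 0.0, np.cos(theta_prime)])

    lam = random_unit_vectors(n, rng)
    # Reflect any vector pointing away from p, which leaves a uniform
    # distribution over the hemisphere that points along p.
    lam[lam @ p < 0] *= -1

    return float(np.mean(np.sign(lam @ a_prime)))


if __name__ == "__main__":
    for degrees in (0, 45, 90, 135, 180):
        theta_prime = np.radians(degrees)
        print(degrees, mean_spin(theta_prime), 1 - 2 * theta_prime / np.pi)
```

The two printed columns should agree up to sampling noise, which with this many samples is at the level of a few parts in a thousand.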
And as we know, this linear function is not what quantum mechanics predicts and is not a match of the experimental data, because in both cases that’s going to be the cosine of the angle, not a linear function of the angle. But here’s where the sketchy move comes in. Right? Here’s why we have a prime instead of just a. Suppose then that a prime is obtained from a by rotation towards the polarization vector p until 1 - 2 theta prime over pi equals cosine of theta. Call that equation 6, where theta is the angle between the measurement axis a and the polarization vector p. So that’s that sketchy move that we use in order to warp the linear function into a cosine function. Well then, if we do that, if we apply equation six, then we have the desired result that the expectation value of the spin measurement is cosine of theta, which is in alignment with quantum physics and it’s in alignment with the experimental data. And so technically we haven’t done anything illegal here. We haven’t broken any rules, and this model therefore cannot be completely dismissed, though it is contrived and it is implausible, and it’s like, we don’t want to have to believe this, because if we have a detector which is oriented along the vector a and we have to stipulate that no, actually what’s happening there is the effective measurement axis is bent a little bit in towards the polarization vector, it’s like, uh, well, you can say that, but why would that be the case? This is not a very convincing model, but we will not dismiss it on the basis that it’s not convincing. Instead we’re going to go ahead and say, look, it’s possible. We’re not going to rule it out just yet. And so by lowering the epistemic standards for the hidden variable model, that’s going to hold us to a higher standard when later on we rule out all possible local hidden variable models. Because then we’ll be able to say, look, we went along with the sketchy move. We allowed it.
But even allowing that, our proof later on is going to be so strong that we’re going to actually show that despite our generosity here, despite being maximally charitable to the local hidden variable model perspective, later on we’re going to show that it just doesn’t work. All right. So in this simple case there is no difficulty in the view that the result of every measurement is determined by the value of an extra variable lambda and that the statistical features of quantum mechanics arise because the value of this variable is unknown in individual instances. That is in this particular case we can come up with a story involving local hidden variables and it kind of appears to work even though it is a little bit sketchy. Okay, so part three of the paper then goes on to show that hidden variables also seem to work for special cases in which the two detectors have special orientations for their measurement axes. Secondly, there is no difficulty in reproducing in the form of equation two, that is the correlation function based on local hidden variables, the only features of the quantum mechanical and experimental correlation function three commonly used in verbal discussions of this problem. That is when our two measurement directions are the same in which case we have P of A and A cuz A and B are the same when they’re aligned the same way. And that’ll give us the negative of the correlation that we would find when B is equal to negative A and that’s equal to -1. So when the unit vectors A and B are aligned the same way, we get a perfect anti-correlation of -1. And when A and B are oppositely aligned, then we get a perfect correlation of 1. And the other special case is when the dot product of A and B equals zero. That is when A and B are perfectly perpendicular to each other, in which case we have no correlation. So aligned the same way we have negative 1.
Aligned opposite ways P is 1. Perpendicular P is zero. And these three special cases can be explained by a local hidden variable model. For example, let lambda now be the unit vector lambda with uniform probability distribution over all directions and take the rules that the measurement outcome a as a function of the unit vector a and this hidden variable lambda vector is going to be the sign of a dot lambda. And conversely, the measurement outcome at b as a function of the unit vector b and the hidden variable vector lambda is going to be the negative sign of b dot lambda. By the way, in Bell’s paper, there’s a typo here. In the paper, it’s written as B as a function of A and B, but that should be B as a function of B and lambda. All right. So, what are we doing here? Well, what we’re saying is that we have the two particles in the singlet state, and we’re going to stick a unit vector onto this pair of particles. So you can imagine particle 1 and particle 2 both carrying along this orientational piece of information. This unit vector lambda which is chosen totally randomly out of all possible directions. And [clears throat] then when particle 1 gets to detector A, if lambda is pointing kind of along the direction of A, that is if the dot product of A and lambda is positive, then you measure a spin up of particle 1 in detector A. And likewise, as particle 2 is measured in detector B, if the lambda vector is pointing in the same kind of direction as B, then you measure a spin down at B. So what this model is is kind of uh what we might instinctively expect is happening with a pair of particles who have an entangled spin because you might expect that there is some kind of orientational quantity that each particle intrinsically has, but that quantum mechanics doesn’t account for, and that this hidden variable which carries with it a kind of orientation is what predetermines how particles 1 and 2 are going to be measured at A and B respectively.
And so the claim is that this rule given by equation 9 works in the special cases that the vectors A and B are perfectly parallel, perfectly antiparallel or perfectly perpendicular. And you can show that that’s the case. So in the first case, imagine A and B being perfectly parallel. Well then in equation 9 you see that the rules for the measurement outcomes at a and b are going to be equal and opposite in that case because for a we have the sign of a dot lambda but if a and b are the same vector then for b the rule is that it’s the negative sign of b dot lambda, and b dot lambda is equal to a dot lambda. So you have the negative of the outcome of particle a. Therefore we find perfect anti-correlation in the case that the unit vector a equals the unit vector b. Likewise, then if you reverse that logic and you look at rule 9 in the case that A and B are antiparallel, so B equals negative A, then the measurement outcome at detector A is sign of A dot lambda. And the measurement outcome at detector B is negative sign of B dot lambda. B dot lambda in this case would equal negative A dot lambda. And you can carry that negative sign outside of the sign function. So that then the two negatives cancel out and we find for the measurement outcome at B sign of A dot lambda which is precisely the same as the measurement outcome at A. So in the case that the measurement directions A and B are perfectly antiparallel we find a perfect correlation of one for the measurement outcomes with this local hidden variable model. And so in that case this model works just fine. And then finally, for the case that A and B are perpendicular, whatever the measurement outcome is at A, you’re going to have a 50/50 chance of it being the same or the opposite at B. And so in that case too, this model works just fine. But again, this model has a flaw, which is that just like what we saw before in part 3A, the dependence of the measurement correlation on the angle theta between the vectors A and B is linear in theta.
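To make the three special cases concrete, here is a minimal Monte Carlo sketch of the rule in equation 9, with lambda drawn uniformly over the sphere. It is only an illustration under assumed choices: the function names, the sample count, the seed, and the particular axes are all hypothetical and not taken from the paper.

```python
import math
import random


def sign(x):
    return 1 if x >= 0 else -1


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def random_unit_vector(rng):
    """Draw a direction uniformly from the unit sphere."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return (x / norm, y / norm, z / norm)


def correlation(a, b, samples=20000, seed=1):
    """Average product of outcomes under equation 9 for axes a and b."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        lam = random_unit_vector(rng)
        outcome_a = sign(dot(a, lam))        # detector A: sign(a . lambda)
        outcome_b = -sign(dot(b, lam))       # detector B: -sign(b . lambda)
        total += outcome_a * outcome_b
    return total / samples


if __name__ == "__main__":
    a = (0.0, 0.0, 1.0)
    print("parallel      ", correlation(a, a))
    print("antiparallel  ", correlation(a, (0.0, 0.0, -1.0)))
    print("perpendicular ", correlation(a, (1.0, 0.0, 0.0)))
```

The parallel and antiparallel cases come out at the extremes for every sample, while the perpendicular case only averages to roughly zero, so it carries sampling noise.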
It’s not the negative cosine of theta that we expect from quantum physics and that is shown in experiments. And to see that let’s draw a picture where we imagine all possibilities for lambda selected uniformly across all possible directions and then we draw the measurement direction a and you consider the hemisphere of all possible vectors that sort of point in the same direction as a that is all vectors for which a dot that vector is positive. Well, then the measurement result at detector A is going to be spin up if lambda is in the same hemisphere as A or it’ll be spin down if lambda is in the opposite hemisphere. So, we have a 50/50 chance of measuring spin up or spin down, which is an agreement with experiment. But then things get a little tricky when you also draw the measurement direction B in detector B and then you apply the same reasoning about what the measurement result is going to be in detector B. In this case, the result is going to be spin down if lambda is in the same hemisphere as B. Spin down because we’re in the singlet state where the spins are anti-correlated and that’s encoded in the minus sign in the second part of equation 9. And then conversely, detector B will measure spin up if lambda is not in the same hemisphere as the measurement direction B. And then if we want to go ahead and imagine this as an animation where we’re sweeping the theta angle and considering simultaneously all possibilities for the hidden variable lambda that are uniformly distributed over the sphere which you may as well imagine as a circle or a sphere because in either case the area or the volume respectively changes the same way as a function of the theta angle. Well, then just think about what is the probability of having the same outcome at both detectors versus the probability of having opposite outcomes. 
And what you realize is that you’re going to have the same outcome at both detectors when lambda is in the hemisphere of one of the measurement directions, but not in the hemisphere of the other measurement direction. So in this animation, if you look at the two sectors with the blue arc, for both of those sectors, you’re going to have the same measurement outcome for both A and B. And so the product of the outcomes at A and B is going to equal one if lambda lies in one of the two blue sectors shown here. And then on the other hand, if lambda is in the hemispheres of both measurement directions or neither measurement direction, then in that case you’re going to have opposite outcomes at the two detectors. And so then the product of the outcomes A and B is going to be -1. And so to find the correlation, all we have to do is compare the area of the blue sectors to the area of the red sectors. And so all the formula is is 1 times the fraction of the circle taken up by the blue sectors minus 1 times the fraction of the circle taken up by the red sectors.
And then as we sweep theta around, we can see the linear dependence of the correlation on the theta angle. And this linear dependence of the correlation on theta, which now we’ve seen a few times in a few different contexts, is really at the heart of Bell’s argument, as we’re going to see in part four. And so in part 3B of this paper, Bell shows us that the local hidden variable model does work for the three special cases where A and B are either parallel, antiparallel, or perpendicular. And when you look at the plot of the correlation that we get from our local hidden variable model that is this blue line and you compare it to the quantum mechanical correlation that we would expect namely negative cosine of theta you see that even though these two curves are different they do in fact intersect at precisely these three special cases. And so part 3B of Bell’s paper is all about saying like, yeah, the local hidden variable model does seem to work for those three special cases. But nonetheless, the local hidden variable model breaks down for anything other than those three special cases because a line is not a cosine. And there’s actually a couple of ways in which a line is not a cosine. The most obvious one is that there’s just a mismatch in these two curves for most values. So pick a theta value at random and negative cosine of theta is just not the same value as what our linear correlation gives us. So it doesn’t match. But the other noticeable thing that differs between this linear correlation that we get from our local hidden variable model and the quantum mechanical correlation is that the linear correlation has a nonzero slope at a theta angle of 0. Whereas the quantum mechanical correlation has a flat slope of zero at theta = 0.
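The two ways in which "a line is not a cosine" can be tabulated directly. The sketch below assumes the line is the one passing through -1 at theta = 0 and +1 at theta = pi (the boundary values of the special cases), and compares it with negative cosine of theta. The function names and the finite-difference step are illustrative choices, not anything from the paper.

```python
import math


def lhv_correlation(theta):
    """Straight line through (0, -1) and (pi, +1)."""
    return -1.0 + 2.0 * theta / math.pi


def qm_correlation(theta):
    """Singlet-state correlation, negative cosine of the angle."""
    return -math.cos(theta)


def slope_at_zero(f, h=1e-4):
    """Forward-difference estimate of the slope at theta = 0."""
    return (f(h) - f(0.0)) / h


if __name__ == "__main__":
    print("deg   line    -cos")
    for degrees in (0, 45, 90, 135, 180):
        theta = math.radians(degrees)
        print(degrees, round(lhv_correlation(theta), 3), round(qm_correlation(theta), 3))
    print("slope of line at 0:", slope_at_zero(lhv_correlation))
    print("slope of -cos at 0:", slope_at_zero(qm_correlation))
```

The rows for 0, 90, and 180 degrees agree, the rows in between do not, and the slope of the line at zero is 2 over pi while the cosine's slope there is essentially zero.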
And this is kind of a subtle difference between these two correlation functions, but nonetheless, it is a difference and it’s a difference that’s totally generic to all local hidden variable models. So, one of the things that we’re going to prove in this paper in part 4 a is that any local hidden variable model is going to have a nonzero slope at a theta angle of zero. So this animation gives us a great intuition for how the local hidden variable model gives us a correlation which depends linearly on the angle theta between the vectors a and b. And therefore Bell goes on to say this gives a correlation as a function of a and b of -1 + 2 theta over pi. Call that equation 10, where theta is the angle between the vectors a and b, and 10 has the properties of equation 8, that is, it works for the three special cases. And of course, the precise form of equation 10, this 2 over pi, that’s just y = mx + b. That’s just what it has to be to be a line that goes through the boundary conditions given by equation 8. But noticeably, the blue curve and the purple curve are not the same in general. Not only do their values not match in general but also at theta equals zero the blue line has a non-zero slope whereas the purple quantum curve has a slope of zero. Now here Bell abruptly brings up a very important point although it is kind of jarring the way in which he brings it up so abruptly but in any case following the paper um for comparison consider the result of a modified theory in which the pure singlet state is replaced in the course of time by an isotropic mixture of product states. This gives the correlation function negative a dot b over 3. Call that equation 11. Now, what does that mean? I mean, that sentence just comes out of nowhere, right? And there is a lot that Bell is communicating in this one sentence. So, I want to take a moment to unpack exactly what he means because this is actually a really profound point.
So when we have our purple curve of negative cosine theta for the correlation between the measurement outcome at detector A and detector B. This is based on the two particles being in the singlet spin state where before the measurement neither particle has a preferred spin direction. But the spin measurement outcomes for the two particles are guaranteed to be anti-correlated along the same measurement axis. whatever that measurement axis may be. On the other hand, if instead of the singlet state, we imagine that the two particles already have some preferred spin direction before they’re measured, but still their spins are equal and opposite relative to that particular spin direction, then we would expect anti-correlated spin measurements if the particles are measured along that particular spin direction. But if the particles are measured perpendicularly to that spin direction, then in that case we would expect no correlation between the spin outcomes of those two particles. And so what Belle means by isotropic mixture of product states is that imagine when we’re producing these particles instead of being in the singlet state with pure rotational symmetry and no preferred spin axis a priori instead of that the particle pairs do have an intrinsic preferred spin direction relative to which they’re equal and opposite and then by isotropic all that means is that that direction call it n hat is selected uniformly from the sphere. So the particles preferred direction is going to be totally random. And so now if you imagine measuring over many such pairs of particles and for the sake of argument suppose we imagine them along the same measurement axis A. Well sometimes that spin axis n is going to be aligned but usually it’s not going to be very aligned in which case we won’t really see much of a correlation. 
And when you work out the math of on average, what correlation strength would we expect, you find a correlation strength which is the same as for the singlet state, but divided by a factor of three, which represents the fact that when you average over all three dimensions of space, more often than not, our measurement directions are not going to be aligned with the spin direction n. And so we actually see a very strong theoretical and experimental difference between the singlet state and a situation where the particles have equal and opposite spin along some random axis. The correlation we get from the singlet state is weirdly strong in a surreal kind of way. And this reflects the fact that in the singlet state neither particle has a preferred direction before it’s measured. And so if you think in terms of one of the particles being measured ever so slightly before the other, then you’re guaranteed to collapse the wave function along that measurement direction. And so in the singlet state, your measurement axes are always going to be more aligned. Whereas for an isotropic mixture of product states, in general, you’re not going to have this kind of alignment. All right. So Bell then goes on to say it is probably less easy experimentally to distinguish equation 10 from equation 3 than equation 11 from equation 3. So equation 10 is the linear correlation that we get from our local hidden variable model. And equation 11 is the negative A dot B over 3, that is negative cosine theta over 3, correlation that we get from a quantum mechanical model in which the two particles are not in the singlet state but rather are in a product state with some preferred direction. And what Bell is saying here is that there’s really a big contrast in the experimental data between a singlet state and an isotropic mixture of product states, whereas the linear correlation from a local hidden variable model is going to be a better approximation to the actual quantum mechanical singlet correlation.
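The factor of three can be checked numerically. The sketch below assumes each pair is in a product state with particle 1 spin up and particle 2 spin down along a random axis n, so that for a given n the quantum expectation of the product of outcomes is (a dot n) times the negative of (b dot n). It then averages that over uniformly random n. The function names, sample count, and seed are hypothetical choices for illustration only.

```python
import math
import random


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def random_unit_vector(rng):
    """Draw a direction uniformly from the unit sphere."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return (x / norm, y / norm, z / norm)


def mixture_correlation(a, b, samples=50000, seed=2):
    """Average of -(a.n)(b.n) over isotropically distributed axes n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        n = random_unit_vector(rng)
        # Particle 1 is up along n, particle 2 is down along n.
        total += dot(a, n) * (-dot(b, n))
    return total / samples


if __name__ == "__main__":
    a = (0.0, 0.0, 1.0)
    for degrees in (0, 60, 90, 180):
        theta = math.radians(degrees)
        b = (math.sin(theta), 0.0, math.cos(theta))
        print(degrees, round(mixture_correlation(a, b), 3), round(-math.cos(theta) / 3, 3))
```

Even with the detectors perfectly aligned, the averaged correlation only reaches about -1/3, which is the contrast with the singlet state described above.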
So that’s just a point about experimental practicality. Now before moving on from part 3B, Bell makes one final comment which is that unlike equation 3, the quantum mechanical correlation negative cosine of theta, the function of equation 10, this linear correlation we get from the local hidden variable model is not stationary. That is the slope is non zero at the minimum value -1 where theta equals 0. So we talked about that earlier when thinking about the differences between the blue line and the magenta curve that is between the local hidden variable model and the quantum mechanical correlation. One of the differences is that the values in general are not the same value. But another difference is that the quantum mechanical correlation has a slope of zero at its minimum value whereas the local hidden variable line does not. It’ll be seen in part 4 a that this is characteristic of functions of type two that is where the correlation is given by a local hidden variable model. So in part 4 a we’re going to prove that any local hidden variable model is going to have a nonzero slope in its correlation function at the minimum value which is incompatible with quantum mechanics and with the experimental data. And then in part 4B, we’re going to prove that in general, the two correlation curves for a local hidden variable model and for quantum mechanics in general cannot take on the same values everywhere. So in part four, we’re going to prove in two different ways that local hidden variable models are not compatible with quantum mechanics and not compatible with the experimental data. Okay, so then Bell wraps up part three by talking about how a hidden variable model could work if we allow for non-locality. 
Thirdly and finally, there is no difficulty in reproducing the quantum mechanical correlation of equation three if the results of the spin measurements at A and B in equation two, the correlation function of the local hidden variable model are allowed to depend on the measurement directions B and A respectively as well as on A and B. And Belle shows this by saying if we do a non-local sketchy move, we can warp the blue line into the magenta curve. So the reasoning here is exactly the same as what we’ve seen before when we thought about doing a sketchy move to warp the line into the curve. But the key difference now is that when you have two entangled particles that are separated in space, you can’t do this sketchy move unless you know the angle between the measurement directions A and B, which are in different light cones. And so this is a non-local sketchy move because somehow what’s happening at detector A depends on the measurement axis at detector B and vice versa. So as a concrete example of this, we can replace the vector A in equation 9 by an effective measurement axis A prime obtained from A by rotation towards the measurement vector B until 1 - 2 theta prime over pi equals cosine of theta where theta prime is the angle between the effective measurement axis A prime and B. So if you make that sketchy move then the blue line is going to warp into the magenta quantum curve and then in that case we would have a match between our hidden variable model and quantum mechanics and the experimental data. And so this is exactly the same reasoning as the sketchy moves that we looked at before. In fact, it’s exactly the same mathematical maneuver. However, for given values of the hidden variables, the results of measurements with one magnet now depend on the setting of the distant magnet, which is just what we would wish to avoid, that is non-locality. And there’s really no way around that. 
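Here is a small sketch of that non-local move, under simplifying assumptions: the hidden variable is treated as an angle on a circle rather than a vector on a sphere (the transcript notes both give the same linear law), and lambda is swept over an evenly spaced grid instead of being sampled randomly. The names `bent_angle` and `nonlocal_correlation` are made up for this illustration.

```python
import math


def sign(x):
    return 1 if x >= 0 else -1


def bent_angle(theta):
    """Angle theta' between a' and b chosen so that 1 - 2*theta'/pi = cos(theta)."""
    return math.pi * (1.0 - math.cos(theta)) / 2.0


def nonlocal_correlation(theta, steps=40000):
    """Correlation from equation 9 with a replaced by the bent axis a'."""
    beta = 0.0                           # direction of b on the circle
    alpha_prime = bent_angle(theta)      # direction of a': it depends on b's setting
    total = 0
    for k in range(steps):
        phi = 2.0 * math.pi * (k + 0.5) / steps   # hidden-variable direction
        outcome_a = sign(math.cos(phi - alpha_prime))
        outcome_b = -sign(math.cos(phi - beta))
        total += outcome_a * outcome_b
    return total / steps


if __name__ == "__main__":
    for degrees in (0, 45, 90, 135, 180):
        theta = math.radians(degrees)
        print(degrees, round(nonlocal_correlation(theta), 3), round(-math.cos(theta), 3))
```

The point to notice is in the code itself: `alpha_prime` cannot be computed without knowing theta, which means the outcome at detector A is using information about the setting at detector B.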
If you look at the example shown here where we replaced a with a prime and you think maybe there’s some way to do the sketchy move differently in a way that doesn’t violate locality, well, try to do that and you find it doesn’t work. So for example, what if instead of rotating A into A prime, we leave A alone and rotate B into B prime in a way that gives us the same result. Well, that would require for B prime to be a vector that’s slightly rotated towards A. And again, it’s the same thing. And in fact, by symmetry, that reasoning is the same as before, where now we’re just saying that what’s happening at detector B is somehow bent towards the measurement direction A. And so really, it’s the same kind of nonsense. And then also philosophically we might expect there to be some symmetry here. So if we wanted an idea like this to work maybe we should actually bend A to A prime and B to B prime where A prime is bent towards B and B prime is bent towards A in an equal and opposite kind of way. But in that case then both detectors know something about how the other detector is configured. And so fundamentally it’s exactly the same problem no matter how you look at it. So reflecting on part three, we’ve seen some specific examples of how hidden variable models don’t really work. They just don’t match the experimental data, whereas quantum mechanics does. And so what follows in part four is going to be very abstract, very mathematical, very algebraic, and we’re going to take our time with it because it’s a whole lot of equations and symbols and all that. But if you followed along part three, then you already have the fundamental insight required to make sense of part four. All we’re doing in part 4 is generalizing on this specific example to show first that every local hidden variable model is going to have a correlation function with nonzero slope at its minimum value, which is in contradiction with quantum mechanics and the experimental data. 
And then second, in part 4B, we’re going to show that in general, the correlation function given by a local hidden variable model cannot take on the same value as the correlation given by quantum mechanics and experiment at every theta point. That is for every possible configuration of the measurement axes A and B. And so it’s the same kind of reasoning that we’ve seen in part three, but just in a much more abstract and generic kind of way. And the abstraction is worth it. Even though it is somewhat impenetrable and it takes a lot of time to digest, it’s going to be a very powerful result. And so, as usual, ask not for easier equations, but for stronger coffee. You got to prepare yourself for this because it’s going to be a bit of work, but it is well worth the effort. All right, my friends. We’re now ready to approach the core argument of Bell’s paper, part four, contradiction. Okay, so in the first part of part four, we’re going to show that the correlation function that we get from a local hidden variable model cannot be stationary at its minimum value when theta equals 0 unlike the quantum correlation which is stationary that is does have zero slope at its minimum value for theta equals 0. And so this is going to be a generic difference between the kinds of correlations that local hidden variable models can give us and the correlation that we expect from quantum mechanics which is also the correlation measured in experiments. All right, the main result will now be proved. Because rho is a normalized probability distribution, the integral over rho d lambda equals 1. And we saw that before. That just means if you consider every possible configuration of hidden variables and add them all up, each one weighted by its probability, then the result is going to be one. In other words, the hidden variables have to be in some kind of configuration.
And next, because of the properties of equation one, where we saw that the measurement outcomes at detectors A and B can only take on the values of + one or minus one depending on whether that detector measured spin up or spin down respectively. Then if we consider the definition of our local hidden variable correlation function in equation two where we found that P is going to be the integral over all possible configurations of the hidden variables of the measurement result at A times the measurement result at B and this correlation is going to be a function of the measurement axes A and B. Then as you can see this correlation P cannot be less than -1. That is the lowest value our correlation can be is a perfectly anti-correlated value of negative 1. And when can it take on that value? Well, as we’ve seen, the correlation function can only reach -1 at a equals b. That is when the two measurements are aligned along the same axis. Then for the singlet state, you’re going to have perfectly anti-correlated results. Measure spin up at detector A along the axis A. And for sure you know you’re going to measure spin down at detector B for an axis B which is equal to A. So we’ve seen that before. That’s nothing new. And now Bell makes a technically nuanced comment which is that this is only the case if A as a function of A and lambda is equal to the negative of B as a function of A and lambda except at a set of points lambda of zero probability. Call that equation 13. Now this is a technical caveat that is designed to keep this argument fully generic. We know from experiments that for the singlet state, it is going to be true that the measurement result at A for measurement axis A is indeed going to be equal to the negative of the measurement result at B for measurement along the same axis A.
But because we’re trying to rule out the possibility of all imaginable hidden variable models, you could in theory imagine a model where these functions A as a function of A and lambda is not necessarily equal to the negative of B as a function of A and lambda. But you could have some superfluous configurations of hidden variables. And that’s technically fine as long as those configurations of hidden variables have zero probability. So this is a really minor point and honestly it probably kind of goes without saying because we know from the experimental data that for sure the result at detector A is going to be the negative of the result at detector B when you’re measuring along the same axis. So you can think of that as an experimental boundary condition. And if any local hidden variable model disagrees with that, that is if you have a local hidden variable model that goes against equation 13, well, that can only match the experiment if the lambda which violate equation 13 have zero probability of occurring. Anyway, I think the paper probably could have gone without that little comment about a set of points lambda of zero probability, but it’s in there just for the reader who’s going to be very pedantic about that. So, all right then. If we assume equation 13, which is really less of an assumption and more of an experimental boundary condition, then equation 2, the correlation for a local hidden variable model, can be written as P as a function of A and B is equal to the negative of the integral over all possible configurations of hidden variables of the result at detector A as a function of A and lambda times the hypothetical result at detector A as a function of B and lambda. Call that equation 14. Now let’s linger on that for a second. What is this term A as a function of B and lambda? Well, what that is is imagine a generic case where we have our detectors A and B and A is aligned with some axis A and the alignment of detector B is some axis B.
Well, we know that our correlation is going to depend on the product of the measurement results at detectors A and B. And all equation 14 is is that the result at detector B can be thought of as the negative of the result that detector A would measure if A were aligned along the B axis. And so you see the only difference between equation 14 and equation two is that the measurement result at B aligned along the axis B as a function of our hidden variables lambda has been replaced with what would have been the results of the measurement at A if A were aligned along the same axis B and we had the same hidden variables lambda. So this is just a way of writing our correlation in terms of measurement results at detector A. All right. And next what we’re going to do is we’re going to let C be another unit vector which is an alternative option for B. So imagine C as the alignment in detector B. In fact, at first imagine that C is the same thing as B and then give it just a little nudge so that C is just a little different than B. And then the question we can ask is if you imagine two hypothetical scenarios, one where you had the measurement axes A and B and another where you had the measurement axes A and C where C is just a little nudge away from B. Then how do we calculate the difference in the correlations P of A and B and P of A and C? In other words, what kind of difference in the correlation do we get when we apply a small little nudge on the axis of detector B? Well, all we have to do is replace P with the integral formula given by equation 14. And we can go ahead and smush these together into one integral. 
And we see that we have the negative integral over all possibilities for the hidden variables of a as a function of a and lambda * a as a function of b and lambda minus a as a function of a and lambda times a as a function of c and lambda which is how the correlation function would change if we slightly changed the measurement axis at detector B from the vector B to the very similar vector C. So now Bell goes on to algebraically massage this integral expression into a different form shown here. And to see what he’s done here, let’s go ahead and color code this like so. So first of all, you see that both parts of the integrand have in common this factor of a as a function of a and lambda. So we can go ahead and factor that out and pull that to the left. And the next thing you want to look at is in the top expression there, we have that factor of a of b and lambda. And we also have a minus sign. So now what we’re going to do to bring that into the bottom expression is we’re going to factor out that term a of b and lambda. So we’re going to bring that to the left. And then what remains is just the number one. But then we’re going to go ahead and pull in that minus sign from the outside of the integral to the inside. And so that term is just going to be a -1 inside of the brackets from which we factored out a of b and lambda. And then the final thing that we have to prove is that in that top expression, the term on the right involving the a of c and lambda can be brought down below and turned into this expression a of b and lambda times a of c and lambda. And to show that this is in fact a legitimate move, first of all, in the top equation, notice how we have two minus signs. And so those are going to cancel each other out. And then the only question that remains is, is the product of these two purple expressions times a of c and lambda equal to a of c and lambda? Well, yeah, it is. The reason being that purple expression is the square of a of b and lambda.
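Written out in symbols, the factoring step described above looks like the following. This is a reconstruction from the spoken description, using rho for the probability distribution over lambda and relying only on the fact that the square of A of b and lambda equals 1.

```latex
\begin{aligned}
P(\vec a,\vec b) - P(\vec a,\vec c)
  &= -\int d\lambda\,\rho(\lambda)\,
     \bigl[A(\vec a,\lambda)\,A(\vec b,\lambda) - A(\vec a,\lambda)\,A(\vec c,\lambda)\bigr] \\
  &= \int d\lambda\,\rho(\lambda)\,A(\vec a,\lambda)\,A(\vec b,\lambda)\,
     \bigl[A(\vec b,\lambda)\,A(\vec c,\lambda) - 1\bigr],
  \qquad\text{since } A(\vec b,\lambda)^2 = 1 .
\end{aligned}
```

Expanding the second line gives A(a)A(b)²A(c) minus A(a)A(b), which is A(a)A(c) minus A(a)A(b), matching the first line term by term.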
But remember this capital A, this is the measurement result at detector A. And the only values it can take on are either plus or minus one. But in either case, the square of plus or -1 equals 1. And so yeah, the purple expression then collapses onto the number one. And we see that this was in fact a legitimate move the way we’ve factored things out here. So what we end up with is the same integral we had before, but just massaged into a different form. All right. So now Bell is going to claim that this integral expression is less than another integral. So using equation one which is where we specified that the measurement outcomes at detectors A and B can only either be +1 or -1 then we can show that our integral expression is going to be less than or equal to the integral over rho d lambda of 1 minus A as a function of B and lambda * A of C and lambda. Now, when I got to this part of the paper, I was looking at it and I was like, “Uh, hm. Okay, [clears throat] why? [laughter] How do we know that’s the case?” And I was staring at it for a while and I just couldn’t figure it out. I don’t know if there’s supposed to be an easier way to do this because if you look at the two sides of this inequality, you see that on the left side we have something of the form n * (m - 1) and on the right side we have something of the form 1 - m. And notice that in both cases that blue expression m is the same number on both the left side and the right side. And both n and m are necessarily integers. And because both of them are just a * a, we know that n and m are both going to be plus or - 1. And so then the question just becomes whether it is in fact the case that n * (m - 1) is always less than or equal to 1 - m for the four possibilities of each n and m being plus or minus one. So anyway, I ended up just checking all four possibilities and verifying that for all the four possible options, this is actually true.
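The four-case check described above is short enough to write down. This sketch uses n and m as in the transcript, each being a product of two measurement results and therefore plus or minus one. The function name is just an illustrative choice.

```python
from itertools import product


def inequality_holds(n, m):
    """Check n * (m - 1) <= 1 - m for one choice of n and m."""
    return n * (m - 1) <= 1 - m


# Enumerate every combination of n, m in {+1, -1}.
cases = {(n, m): inequality_holds(n, m) for n, m in product((1, -1), repeat=2)}

if __name__ == "__main__":
    for (n, m), ok in cases.items():
        print(f"n={n:+d}, m={m:+d}: {n * (m - 1):+d} <= {1 - m:+d} is {ok}")
```

One way to see it without enumerating, offered here as a suggestion rather than anything stated in the paper: since m is at most 1, the quantity 1 - m is never negative, and since n has absolute value 1, n * (m - 1) can be no larger than the absolute value of m - 1, which is exactly 1 - m.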
I don’t know if there’s a more elegant way of demonstrating that this is true. But in any case, this way works fine. It’s just a little bit tedious. If you check all four possibilities here, you find that this is in fact a legit move and the integral expression on the left is indeed always less than or equal to the integral expression on the right. Okay, then. So that works. But why do we care? Like what are we doing here? Well, notice this. If we look at that integral expression, the second term on the right is our correlation function evaluated for the vectors b and c. You see, because by equation 14, we know that we can calculate our correlation function in terms of results at detector A as a function of measurement axes and the hidden variables. And in that case we just integrate over all ρ(λ) dλ of A(a, λ) times A(b, λ), with a minus sign on the outside. And so by pattern recognition we can see that the second term on the right of this integral, by equation 14, is actually equal to P evaluated with the vectors b and c. And that’s going to be very important in just a moment. Okay. So having recognized P as a function of b and c, it follows that 1 + P(b, c) is greater than or equal to the absolute value of P(a, b) minus P(a, c). And you can kind of read that directly from the characters that are colorful here. You see, because if you think about the expression that we’ve been evaluating, remember we started off with thinking about what is the difference in the correlation function, that is, P as a function of a and b minus the correlation as a function of a and c, where c is a vector very much like b but with a little nudge. And we showed after evaluating all of these integral expressions that this difference in correlations has to be less than or equal to this integral expression, which contains in it P as a function of b and c, plus one. There’s also a one in the integral.
But because ρ is normalized, that one just pops outside of the integral. But then Bell goes ahead and switches this expression around so that you have the difference in the correlation on the right side and we pull the expression involving P(b, c) on over to the left side. And so that’s why the less than or equal to sign flips around into a greater than or equal to. And so that reasoning justifies equation 15 without the absolute value. But now we need to justify where that absolute value comes from. And as it turns out, the absolute value sign arises from symmetry. So imagine swapping the vectors b and c in all of these equations. Well, on the left hand side, when you consider the function P as a function of b and c, if instead we had P as a function of c and b, that’s actually the same thing. That’s equal to P as a function of b and c. Because at the end of the day, you’re still measuring along the same two measurement axes. And it doesn’t matter which detector we say is detector A versus detector B. So the order of the inputs b and c doesn’t matter in the correlation function. However, on the right hand side where we have this difference P(a, b) minus P(a, c), if you switch around b and c on that side, you end up with P(a, c) minus P(a, b), which is the same right hand side as before, but with a sign flip. And then you think about the fact that we should be able to swap b and c around in this argument. By symmetry, there’s no meaningful difference between the vectors b and c. And so then you can imagine that the same line of reasoning shows us that our left hand side is going to be greater than or equal to plus or minus the right hand side. And so without loss of generality, we can go ahead and clean that up and just say that the left hand side is greater than or equal to the absolute value of the right hand side. So we’re not losing anything by shaving off that negative option. All right.
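As a rough sanity check on equation 15, here is a small sketch that tests the inequality 1 + P(b, c) ≥ |P(a, b) − P(a, c)| on a grid of coplanar axes. It assumes two correlation functions written only in terms of the angle θ between the axes: a linear one, −1 + 2θ/π, used here as a stand-in for a simple local hidden variable model, and the quantum one, −cos θ. The function names and the grid are choices made for this example.

```python
import math

def p_linear(theta):
    """Assumed example of a local hidden variable correlation: -1 + 2*theta/pi."""
    return -1.0 + 2.0 * theta / math.pi

def p_quantum(theta):
    """Quantum singlet correlation: -cos(theta)."""
    return -math.cos(theta)

def angle_between(t1, t2):
    """Angle in [0, pi] between two coplanar unit vectors at angles t1, t2."""
    d = abs(t1 - t2) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def satisfies_eq15(p, ta, tb, tc, tol=1e-12):
    """Check 1 + P(b,c) >= |P(a,b) - P(a,c)| for coplanar axes."""
    lhs = 1.0 + p(angle_between(tb, tc))
    rhs = abs(p(angle_between(ta, tb)) - p(angle_between(ta, tc)))
    return lhs + tol >= rhs

# Sweep a grid of coplanar axis angles (in radians, steps of 15 degrees).
grid = [k * math.pi / 12 for k in range(24)]
triples = [(ta, tb, tc) for ta in grid for tb in grid for tc in grid]

linear_ok = all(satisfies_eq15(p_linear, *t) for t in triples)
quantum_ok = all(satisfies_eq15(p_quantum, *t) for t in triples)

# One explicit test case: a at 0, b at 60 degrees, c at 120 degrees.
example = satisfies_eq15(p_quantum, 0.0, math.pi / 3, 2 * math.pi / 3)

print(linear_ok, quantum_ok, example)
```

For the explicit example, the quantum values are P(a, b) = −1/2, P(a, c) = +1/2 and 1 + P(b, c) = 1/2, so the left hand side of equation 15 is smaller than the right hand side and the inequality fails there, while the linear model passes everywhere on the grid.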
So then Bell goes on to say that unless P is constant, the right hand side is in general of order |b − c| for small |b − c|. And in just a moment I’m going to unpack why that is. But real quick, I just want to read the next thing that Bell wrote, which is that thus P(b, c) cannot be stationary at the minimum value, which is −1, where b equals c. Right? When the axes are aligned, our correlation takes on its minimum value of a perfect anti-correlation. And therefore, the correlation function cannot equal the quantum mechanical value given by equation 3, which is negative a·b, also known as negative cosine of theta. Okay, now I am a huge fan of Bell and his work and he is a great genius, but my goodness does he say so much with so few words, and here it’s kind of hard to see exactly what he’s talking about. So I want to take a moment to just unpack this and really get into what exactly he’s saying here. Okay, so the first thing to realize is that if we write our correlation function P(b, c) and we think about it just as a mathematical function that takes two vectors as input, we know that this function is going to take on a minimum value of −1 when the vector b equals the vector c. Then we can say that if the function were stationary, the curve would be flat there at that value. Just like the negative cosine of theta curve of quantum mechanics is flat at theta equals 0. So if we claim that we have a hidden variable model that matches the quantum mechanical predictions, we should expect it to have a slope of zero when its two vector inputs are the same vector. But now if it’s flat wherever its two vector inputs are the same, then we can say something about this situation. Imagine that the vectors b and c are very similar. They’re almost the same. b is approximately c, and the absolute value of b minus c, that is, the size of the tiny difference between these two vectors, call that epsilon.
And let’s say epsilon is much less than one. It’s a very small number. Then our correlation function evaluated for the inputs b and c is going to be −1 plus some positive number that is of order epsilon squared, or in principle it could be higher order in epsilon, but the biggest of a number it could be for small epsilon is going to be of order epsilon squared. But we can’t have a first order term of order epsilon, because then that would be a slope in the function. You see what I mean? If P evaluated at b and c, for b approximately equal to c, had the form of −1 plus something on the order of epsilon, then the function would be sloped there and that wouldn’t be stationary. That wouldn’t be a zero slope situation. And so what we’re saying here about this second order or higher in epsilon, I mean, this is really just the definition of what it means for the function to be stationary at its minimum value. You know, the slope is zero. Well, okay then. But consider the absolute value of the difference in the correlations, P as a function of a and b minus P as a function of a and c. That is the difference in correlations that we would get if first we have our detectors set up with the axis a on one side and the axis b on the other side, minus what the correlation would be if instead we had the axis a on one side and the axis c on the other side, where again c is equal to b plus a tiny little nudge. Well, that difference in the correlations, its absolute value is going to be first order in epsilon. The reason being our correlation function changes when A and/or B change. I mean, by definition, you think about how the correlation is defined as the integral of the product of A and B, integrated over all possible lambda, each weighted by the probability of lambda. Well, the only thing that can change as we rotate c a little bit away from b in the correlation function is going to be some of the results at A and B changing sign from +1 to −1 or −1 to +1. And this is a very binary thing.
And so the amount of change that’s happening here is going to be directly proportional to the difference between the vectors b and c, epsilon, for small epsilon. And if you think about what the measurement results are going to be at A and B as we’re moving c slightly away from b, you can think about like a belt of area where A and B are flipping sign and that’s contributing to the change in the correlation. Sort of like thinking about an orange slice having a volume proportional to the wedge angle. And then you see that that area of A and B flipping around is going to be directly proportional to this change in the correlations. Well, okay. So considering all of that, we run into a contradiction, because then equation 15 would imply that a positive number which is second order in a small epsilon is greater than or equal to a positive number which is first order in a small epsilon. But that’s not true. For a small positive epsilon the first order term dominates, because if epsilon is small then epsilon squared is a small times a small, which is a tiny, and so the thing should be flipped around the other way. You know, something of the order of epsilon squared is going to be smaller than order of epsilon, not what equation 15 would imply. And so that mathematical contradiction proves that our correlation function cannot be stationary at its minimum value, unlike the quantum mechanical correlation function, which is stationary at its minimum value. And so this is one way in which we see that a local hidden variable model cannot give us the same correlation function as quantum mechanics, which is also the correlation function that we see in experiments. And so this right here is the first of the two parts of part 4, where we’ve proven that a local hidden variable model just is not capable of reproducing the statistical predictions of quantum mechanics. Now it’s time to get into the second part of part 4. And this is the main argument of Bell’s paper.
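Before moving on, the first order versus second order mismatch from the first half of part 4 can be illustrated numerically. The sketch below plugs the quantum correlation −cos θ into both sides of equation 15 for one particular geometry, chosen here purely for illustration: a perpendicular to b, and c equal to b rotated by a small angle eps in the same plane. The function name eq15_sides and the chosen angles are assumptions of this example.

```python
import math

def p_quantum(theta):
    """Quantum singlet correlation: -cos(theta)."""
    return -math.cos(theta)

def eq15_sides(eps):
    """Both sides of equation 15 for a perpendicular to b and c = b nudged by eps."""
    theta_ab = math.pi / 2            # angle between a and b
    theta_ac = math.pi / 2 + eps      # angle between a and c (c rotated away from a)
    lhs = 1.0 + p_quantum(eps)                            # roughly eps**2 / 2
    rhs = abs(p_quantum(theta_ab) - p_quantum(theta_ac))  # roughly eps
    return lhs, rhs

for eps in (0.1, 0.01, 0.001):
    lhs, rhs = eq15_sides(eps)
    print(f"eps={eps}: 1+P(b,c)={lhs:.3e}  |P(a,b)-P(a,c)|={rhs:.3e}  holds={lhs >= rhs}")
```

The left side shrinks like eps squared while the right side shrinks only like eps, so for small eps the quantum correlation cannot satisfy equation 15 with this choice of axes.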
This is the really powerful proof that the correlation function we get from a local hidden variable model cannot be equal to the quantum mechanical correlation function. In other words, in just the same way that we’ve seen that a line cannot be a cosine, it’s true more generically that any kind of correlation function we can get from a classical hidden variable model cannot be equal to the quantum mechanical correlation of negative cosine of theta, also known as negative a·b. All right. So having already proven the thing about the slope being non-zero, Bell goes on to say, nor can the quantum mechanical correlation of equation 3, that is negative a·b, also known as negative cosine of theta, be arbitrarily closely approximated by the form of equation 2, that is, a correlation function given by a local hidden variable model. No matter what kind of local hidden variable model you want to come up with, it’s just not the case that the correlations that model gives you are going to be the same as the quantum mechanical correlations. And this holds for all possible local hidden variable models. The formal proof of this may be set out as follows. Well, first of all, we would not worry about the failure of the approximation at isolated points. So let us consider, instead of equations 2 and 3, the functions P bar of a and b, and negative a·b bar. And these functions are essentially exactly the same thing as equations 2 and 3, but they’re averaged over vectors near the vectors a and b. So the bar denotes independent averaging of the correlations as a function of a prime and b prime within specified small angles of a and b. Okay, so let’s pause here and think about what Bell is saying and why it matters. So this averaging thing, it’s kind of a mathematically pedantic point.
But what Bell is saying here is look, let’s be generous and say that someone came up with a local hidden variable model which had a correlation function that matched the quantum mechanical correlation for the most part, but there were isolated points at specific values of a and b where there was a mismatch between the local hidden variable correlation and the quantum mechanical correlation. So for example, let’s say P(a, b) is equal to negative a·b everywhere except at one special point where a is equal to b or whatever it may be. And at that one infinitesimally small point suppose there’s some disagreement between P and the quantum mechanical correlation. What Bell is saying is don’t worry about that. If the local hidden variable model matches the quantum correlations except at these special isolated points where for whatever reason it doesn’t work out, you know what? We’re going to be generous and we’re going to say that would work. The reason being experimentally we might not notice if there was a mismatch at very specific isolated points between local hidden variables and quantum mechanics. And so when you’re thinking about the mismatch between the local hidden variable correlation and the quantum mechanical correlation, you want to kind of smear things out or smooth things out just a bit to where a mismatch at an isolated point would be totally washed away. And so all we’re doing when we’re taking this average over very close nearby points is we’re just saying don’t worry if the correlation fails at specific isolated points. That’s all that is. So to imagine the vectors a prime and b prime, just imagine the vectors a and b, but then smear them out just a little bit over a tiny little space of nearby vectors. That’s all that means. Suppose that for all a and b, the difference between the local hidden variable correlation and the quantum mechanical correlation is bounded by some number epsilon. That is, the absolute value of P bar of a and b plus a·b bar is always going to be less than or equal to this value epsilon.
Now the thing you have to see about equation 16 is that this is just the local hidden variable model correlation minus the quantum mechanical correlation, because the quantum mechanical correlation is negative a·b, and so this plus a·b, this is minus the quantum mechanical correlation. And then you take the absolute value and that is just the magnitude of the error or the mismatch between our local hidden variable model’s correlation and the quantum mechanical correlation. And so what epsilon represents is the maximum amount of error in our local hidden variable model relative to quantum mechanics. And then as a reminder, these bars are just there to say don’t worry about single isolated points where there’s a mismatch. We’ll allow that. We’re going to go ahead and smooth out or filter out any infinitesimally small points of mismatch. So epsilon is the maximum mismatch when you factor out any infinitesimally small areas where the two correlation functions disagree. And so if we can show that epsilon is zero for some local hidden variable model, then that model would effectively reproduce the quantum mechanical correlation. So that would work. However, it will be shown that epsilon cannot be made arbitrarily small. That is, what we’re about to prove is that at minimum epsilon has to be some nonzero number. And so therefore, you’re always going to have some mismatch between the local hidden variable correlation and the quantum mechanical correlation, no matter the details of your local hidden variable model. So that’s going to be the main proof of Bell’s paper. All right. So next we’re going to massage equation 16 into a slightly different form by supposing that for all a and b the absolute value of a·b bar minus a·b is going to be less than or equal to some small number delta.
So this expression is the mismatch between the average dot product over the a prime and b prime that are close to a and b, and the exact dot product a·b. So you can think about this as the error introduced into the quantum mechanical correlation as a result of our averaging technique. So as we smooth things out just a little bit and we average away those infinitesimal potential points of mismatch, suppose that this is going to smear things out such that the average of the dot product of a and b minus the dot product of exactly a and exactly b is going to be at most some small number delta. Then from equation 16 we find that the absolute value of P bar of a and b plus a·b, that is, the average local hidden variable correlation function minus the exact quantum mechanical correlation function evaluated at exactly a and b (notice we no longer have the bar over a·b), is going to be less than or equal to the small number epsilon plus the small number delta. That is, the mismatch between P bar and the exact quantum correlation function evaluated at a and b has to be less than or equal to the maximum mismatch between P bar and a·b bar, plus whatever the maximum number is that results from us smearing out the quantum mechanical correlation a·b into a·b bar. And that kind of makes sense just by looking at it. But just to show exactly how equation 18 follows from equations 16 and 17, we can go ahead and write the left side of equation 18 as the absolute value of P bar of a and b plus a·b bar, plus a·b, minus a·b bar. See, all we’ve done here is within that absolute value, we’ve added an a·b bar and we’ve subtracted out an a·b bar. Now, why does that matter? Well, because now we know that that has to be less than or equal to the absolute value of P bar of a and b plus a·b bar, plus the absolute value of a·b minus a·b bar. And that comes from the triangle inequality, which is the idea that if you have the absolute value of x + y, that can at most be the absolute value of x plus the absolute value of y.
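Written out as one chain, the add-and-subtract step and the triangle inequality look like this. The notation is a reconstruction from the spoken description, with an overline standing for the averaging bar; the last line uses the bounds epsilon and delta from equations 16 and 17.

```latex
\begin{align*}
\left|\bar{P}(\vec a,\vec b) + \vec a\cdot\vec b\right|
 &= \left|\left(\bar{P}(\vec a,\vec b) + \overline{\vec a\cdot\vec b}\right)
        + \left(\vec a\cdot\vec b - \overline{\vec a\cdot\vec b}\right)\right| \\
 &\le \left|\bar{P}(\vec a,\vec b) + \overline{\vec a\cdot\vec b}\right|
    + \left|\vec a\cdot\vec b - \overline{\vec a\cdot\vec b}\right| \\
 &\le \epsilon + \delta
\end{align*}
```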
Well, then now if you examine these two quantities on the right side, you see that the first one, the absolute value of P bar of a and b plus a·b bar, is what we have in equation 16. And so we know that has to be less than or equal to epsilon. And then the yellow expression, the absolute value of a·b minus a·b bar, well, that’s the same thing we have in equation 17. And so that has to be less than or equal to delta. And so therefore, the whole thing has to be less than or equal to epsilon plus delta. And so therefore we’ve just proven equation 18. Okay. So next we want to think about what exactly is P bar of a and b. Well, by equation 2 this is just going to be P(a, b), the local hidden variable correlation function, but averaged out over a space of vectors a prime and b prime which are very close to the vectors a and b but just a little bit smeared out, so we don’t worry about weird singular points. And so therefore we can write P bar in exactly the same way as we write P in equation 2. But here we simply put a bar over the results A and B, because when we smear out the vectors a and b a little bit and we ask what is P bar, well, that’s just going to depend on how smearing out a and b affects the average of the results at detector A and detector B, because the correlation function is just the product of the results at A and B integrated over all possible hidden variables. And remember that the bar is just averaging out or smearing out the vectors a and b a little bit. So that’s not going to affect the distribution of hidden variables. And that’s why we don’t have a bar over the ρ, because this process of smearing out a and b doesn’t have any effect on the probability distribution of our hidden variables. But now if you think about what are the values that A bar and B bar are going to take on, well, remember that the results at A and B can only ever be plus or minus one. And so now when we smear out a and b and we average over the values that A and B take on for these smeared out vectors, we find that at most the absolute values of A bar and B bar are going to be one. But now it is possible for the absolute values of A bar and B bar to be less than one, if when we smear out the vectors a and b, we dip into a space of the detector results where the sign flips relative to what it would have been along exactly the measurement direction a or the measurement direction b. That is to say, suppose the result at detector A as a function of a is equal to 1, but if you give a a little nudge, then you could nudge the result into being negative 1. If the measurement direction a is right on the edge of what determines the sign of the result at detector A, well then in that case, the absolute value of the averaged result at A might be something like 0.9 or 0.8 or whatever. But no matter what, it’s going to be some number less than or equal to 1.
All right. And next, Bell goes ahead and constructs equation 21 from equations 18 and 19 with the measurement direction a set equal to the measurement direction b. So that yields equation 21. And in just a moment, we’re going to use equation 21 and we’re going to see why it matters and why Bell writes it out. But for now, I just want to reflect on how equation 21 follows from equations 18 and 19. So the first thing to recognize is that the right hand side of equation 21 is precisely the same as the right hand side of equation 18. And then if you look at the left hand side of equation 18, you see the P bar of a and b. And you can recognize that on the left hand side of equation 21 as the integral over all ρ(λ) dλ of A bar of b and lambda times B bar of b and lambda. Because remember here in the context of equation 21, we’re setting the two measurement axes to the same vector b for both detectors. And so then we see that this integral expression is P bar evaluated for the vectors b and b. Now then you notice there’s also that plus one inside the integrand, and that is simply a·b, because when a and b are the same unit vector then you have b·b, which is the magnitude of b squared, which is 1 because b is a unit vector. And the one we can bring inside or outside of the integral because of the fact that the integral of ρ(λ) dλ is equal to 1, because ρ is a normalized probability distribution. And then there’s another little detail here, which is that notice how we’ve dropped the absolute value sign on the left side of equation 18. The reason that’s an okay move is because by inspection, the integral on the left hand side of equation 21 cannot be negative. Because if you consider the product of A bar and B bar, the minimum value that can be is −1. Say A bar is 1 and B bar is negative 1. And so therefore, A bar times B bar plus 1 is at least zero. It can’t go negative. So then when we integrate over A bar times B bar plus 1, we’re always integrating over a non-negative number. And so that’s why we can just go ahead and drop the absolute value sign, because if we know it’s not negative, then there’s no point in having an absolute value sign. Okay, so I’m sure you’re wondering, what’s the point of equation 21? Where are we going with this? Why does this matter? Well, I want to take a moment to recognize where we are currently at in the paper as a kind of natural checkpoint in part 4B. So, everything we’ve done up until now is sort of the warm-up of part 4B. We’ve essentially been setting the stage, thinking about what it is that we want to prove, thinking about averaging, smoothing things out, not worrying about isolated points and all this sort of thing, and then introducing these quantities epsilon and delta and making some algebraic observations. In the next part of this derivation, we’re going to be utilizing these equations to make an algebraic argument which is going to lead to Bell’s famous result that the error between the local hidden variable correlation and the quantum mechanical correlation, that is epsilon, cannot be made arbitrarily small. Which is to say that no local hidden variable model can reproduce the statistics of quantum mechanics to an arbitrarily good approximation. And then the next thing Bell goes ahead and does is he writes an expression for P bar as a function of a and b minus P bar as a function of a and c. And in just a moment I’ll tell you exactly what that is. But for now, let’s see why the equation is true. So if you look at equation 19, we have the definition of P bar as a function of a and b, which is simply the integral definition of the correlation P as a function of a and b, that is equation 2, but averaged over smeared out vectors near a and b. So that’s why we have A bar and B bar in the integrand.
Well, then if we want to write the expression P bar of a and b minus P bar of a and c, we can just go ahead and copy and paste equation 19 twice, in the first case evaluated for the vectors a and b and in the second case evaluated for the vectors a and c, and then you may as well smoosh them together into the same integral. So that’s all we’ve written here. It’s basically just equation 19. So now let’s reflect on what is this quantity P bar of a and b minus P bar of a and c. Well, you want to think of c as another alternative for b, that is, the measurement axis of detector B. And this is just like how we had imagined the vector c before in part 4A. However, whereas before we imagined that b and c were very similar vectors, so that c was just a little nudge away from b, and that let us probe the behavior of the correlation P(b, c) near its minimum value where b equals c, we’re now going to imagine the vector c as being totally unrelated to b. So not just a nudge away but a whole different vector that we are totally free to choose for the measurement axis of detector B. So then in that context P bar of a and b minus P bar of a and c is the difference between the correlation strengths that we would measure for the detector settings a and b compared to the detector settings a and c. Now of course we have the bar on the P, and so we’re neglecting aberrant isolated points. You know, we’re smoothing out any infinitesimal pathological point. And so that’s why we have the bar on the P. All right. So then now Bell goes ahead and writes this equation in a form that looks way more complicated, but is going to be useful in a moment. So he writes out this integral expression like so. I’m not going to try to pronounce this equation because it’s a mouthful. But I will show you why this is a legit move and why this complicated expression is in fact algebraically equivalent to the previous integral.
So to recognize this, you just have to consider the fact that if you have an expression of the form x times y minus x times z, if you want, you could write that as x times y times the quantity 1 + w times z, minus x times z times the quantity 1 + w times y, assuming all these variables commute, which they do because they’re scalars. And the reason that’s true is because on the right hand side here, the terms involving w are going to cancel each other out. In one case, you’ll have xywz, but then you’re going to have a minus xzwy. And so you’re going to end up with wxyz minus wxyz equals zero. Then what remains, the terms multiplied by 1, is just xy − xz, which is exactly the left hand side of the equation. So if you look at the integral expression shown here on the bottom line, you see that it has this complicated form where we have something of the form xy times (1 + wz) minus xz times (1 + wy). And so that’s how to see that these two integrals are equivalent. So this kind of feels like backwards math. Like if you started with the second line, you would feel a sense of accomplishment upon seeing that the terms simplify into the first line. But here we’re going backwards. We’re expanding out the equation. We’re making it more messy because this is going to be a form that’s going to be useful for us in just a moment. All right. So, where do we go from here? Well, think about what this equation is. This is a generic statement that for any local hidden variable model, the difference between the correlations that we would expect with the measurement axes a and b compared to a and c is going to be equal to this big mess of an equation involving integrating over these expressions involving the various outcomes at A and B with given measurement axes a, b, and c. So the difference in correlations equals a big mess. And the next move that we’re going to do is we’re going to convert this equation into an inequality. And in the process, we’re also going to convert the big mess into a medium-sized mess. From equation 20, we find that the absolute value of this difference in correlations is going to be less than or equal to this medium-sized mess. Now, to get to this inequality from the previous equation, it only takes a couple of steps.
The first thing you want to do is take the absolute value of both sides. So you see on the left hand side, we’ve simply taken the absolute value of the difference in correlations. And then when you take the absolute value of the right hand side, you find that you’re taking the absolute value of an integral minus an integral, or plus a negative integral if you want to think about it like that. And then you realize that by the triangle inequality, the absolute value of the sum of two integrals can be at most the absolute value of the first integral plus the absolute value of the second integral. And so then because we’re converting the equation to an inequality, we can go ahead and imagine the absolute value on the right hand side applying to each integral individually. And then because A bar times B bar is at least −1, since there’s no way the product of two numbers of magnitude at most 1 could be less than negative 1, the quantity 1 + A bar B bar is always going to be non-negative. And by equation 20, the factor of A bar times B bar sitting in front of each bracket has magnitude at most 1, so dropping that factor can only make the right hand side bigger. So that’s all good. All right. Now at this stage in the derivation it should not be obvious why we care about this inequality that we’ve written here. But if you look at this equation you can see a bit of foreshadowing here. The reason being we have a very generic statement that applies for any local hidden variable model, which says that the magnitude of the difference in the correlations for the settings a and b versus the settings a and c is going to be bounded by an upper limit given by the right hand side of this inequality. So you can imagine that we’re just a few algebraic moves away from a very interesting result which constrains all possible local hidden variable models in a way that is relevant to the question of whether local hidden variable models can reproduce the statistical correlations of quantum mechanics. So in service of that goal, we can now go ahead and rewrite this inequality with a much simpler right hand side.
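The expansion and the bound discussed above can be spot-checked numerically. In this sketch x, y, z, w are stand-ins for the averaged results A bar of a, B bar of b, B bar of c and A bar of b at one value of lambda, each drawn at random from the interval from −1 to 1; the function names, the random sampling and the tolerance are choices made for this example rather than anything taken from the paper.

```python
import random

def expanded(x, y, z, w):
    """The expanded form: x*y*(1 + w*z) - x*z*(1 + w*y)."""
    return x * y * (1 + w * z) - x * z * (1 + w * y)

def upper_bound(y, z, w):
    """Bound after the triangle inequality, using |x*y| <= 1 and |x*z| <= 1."""
    return (1 + w * z) + (1 + w * y)

random.seed(0)  # fixed seed so the check is repeatable
samples = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(10_000)]

# The w terms cancel, so the expanded form equals x*y - x*z.
identity_ok = all(abs(expanded(x, y, z, w) - (x * y - x * z)) < 1e-12
                  for x, y, z, w in samples)

# The magnitude of the difference never exceeds the "medium-sized mess".
bound_ok = all(abs(x * y - x * z) <= upper_bound(y, z, w) + 1e-12
               for x, y, z, w in samples)

print(identity_ok, bound_ok)
```

A random spot check is of course not a proof; it only confirms that the algebra described in the transcript behaves as stated on the sampled values.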
See, from equations 19 and 21 we can see that the expression on the right hand side is going to be less than or equal to 1 + P bar of b and c plus epsilon plus delta. The reason being, if you look at the first of the two integrals on the right hand side, we see that there’s a 1 which can be pulled outside of the integral because ρ(λ) is a normalized probability distribution. And then what remains in that integral is by definition P bar evaluated with the vectors b and c, by equation 19. So the first integral is going to be exactly equal to 1 + P bar of b and c. And then if you look at the second integral, you find that that is exactly the left hand side of equation 21, because we’re integrating over ρ(λ) dλ of 1 plus A bar of b and lambda times B bar of b and lambda. And we’ve already established in equation 21 that that has to be less than or equal to epsilon plus delta. And so those inequalities stack. And so then we can go ahead and pull that down to the bottom line here. And we end up with this much more elegant upper bound on the difference between the correlations of a local hidden variable model for detector settings a and b versus a and c. And now we’re really getting somewhere. You can see that things are starting to clean up really nicely. And so now Bell goes on to abruptly say that finally, using equation 18, the absolute value of a·c minus a·b, minus 2 times the quantity epsilon plus delta, has to be less than or equal to 1 minus b·c plus 2 times the quantity epsilon plus delta. And that’s a bit of a leap. You know, you can’t really see that just by looking at it. So, we have to take a moment to see why that’s the case. All right. So, if you look at equation 18, we find that the absolute value of P bar of a and b plus a·b is less than or equal to epsilon plus delta. And remember what that equation means. That is the absolute value of the difference between the correlation function given by a local hidden variable model, smoothed out a little bit so we’re neglecting any pathological aberrant points, and the quantum mechanical correlation of negative a·b. And as we saw earlier, that has to be less than or equal to epsilon plus delta, where epsilon is the upper bound on the mismatch between the local hidden variable correlation and the quantum mechanical correlation. And this small number delta encodes the mismatch between the precise quantum mechanical correlation and the slightly smeared out quantum mechanical correlation when we’re averaging over the vectors a prime and b prime near a and b respectively. And we saw earlier why equation 18 is true. But now we can think of it in another way, which is to say equation 18 tells us that P bar of a and b is going to be equal to negative a·b plus some error, which let’s go ahead and call error sub ab. And the reason this follows directly from equation 18 is that equation 18 tells us that the difference between P bar of a and b and negative a·b, the absolute value of that, is going to be bounded by the sum of two small numbers epsilon and delta. And so therefore P bar of a and b and negative a·b are going to be pretty similar numbers. And so we can think about these two as the same thing plus some error factor.
So now then, if you take that reasoning and you apply it to the inequality we derived before regarding the absolute value of P bar of a and b minus P bar of a and c, you see that we can go ahead and replace those P bars with quantum correlations plus errors. P bar of a and b becomes negative a dot b plus error sub ab, and then the negative P bar of a and c becomes, for the same reason, plus a dot c minus error sub ac. And you see we’ve gone ahead and distributed a minus sign throughout those terms. And so thinking about equation 18 as a statement about the error between P bar and the quantum mechanical correlation, with the absolute value of the error bounded by epsilon plus delta, we can go ahead and replace any expression involving P bar with the quantum mechanical correlation plus that error. And so likewise on the right hand side we can go ahead and replace P bar of b and c with negative b dot c plus error sub bc. And so now what we want to do is, ideally, we would like to replace these error terms with factors of epsilon plus delta. But when we do that, we have to be careful, because it’s not guaranteed that the absolute value of the error is going to equal epsilon plus delta. In general, it’s going to be less than or equal to epsilon plus delta. And so if we’re starting with this inequality about the absolute value of P bar of a and b minus P bar of a and c, and we want to go from that inequality to another inequality where we replace these error terms with factors of epsilon plus delta, and we want to make sure that our new inequality actually does logically follow from the previous one, then we have to consider the quote unquote worst case scenario where the magnitude of the error is indeed equal to epsilon plus delta. And in a way, this is the best case scenario for ensuring that the inequality that we’re going to arrive at is true.
Because what this means is that on the left hand side of this expression, we’re going to subtract 2 times the quantity of epsilon plus delta, corresponding to the most that our two error terms could pull down the left side of that inequality to make it as small as possible. And then likewise on the right hand side of the expression, we’re going to let our error be the most it could possibly be. So we’re going to add epsilon plus delta on the right side to bring up the right hand side as much as we possibly can, and together with the epsilon plus delta that was already there, that gives 2 times the quantity of epsilon plus delta on the right as well. And so because we did it like that, where we considered, okay, worst case scenario, the error is as big as possible and we’re going to let it pull down the small side and push up the big side, then we know for sure that the simpler inequality, where the errors have been replaced with epsilon plus delta, is guaranteed to still be true. All right. Now, there’s one little adjustment we’re going to do, because when you look at an inequality like this, you think maybe we can clean this up a little bit. So, let’s go ahead and pull all factors of epsilon and delta on over to the left side of the expression. And while we’re at it, let’s go ahead and flip around the inequality and then put everything else on the right. So with just a little bit of algebraic maneuvering we end up with this inequality: 4 times the quantity of epsilon plus delta is guaranteed to be greater than or equal to the absolute value of a dot c minus a dot b, plus b dot c, minus 1. This is equation 22 of Bell’s paper. And this is a very profound result. In fact, you know, the term Bell’s theorem is kind of a vague generic statement that applies generally to Bell’s observation that local hidden variable models don’t work. But if you had to take a single equation, or in this case an inequality, from Bell’s paper and say this is the result, this is the statement, well, it would be the inequality shown here. And why is that? What’s the big deal? Who cares about equation 22?
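For anyone who wants the worst-case substitution spelled out line by line, here is one way to write it. This is a sketch, not Bell’s own presentation: the error labels e_ab, e_ac and e_bc are names introduced here, each assumed to satisfy the bound from equation 18, and the left side uses the reverse triangle inequality.

```latex
\begin{align*}
% Left side: the two errors can lower it by at most 2(\epsilon+\delta)
\left|\bar{P}(\vec{a},\vec{b}) - \bar{P}(\vec{a},\vec{c})\right|
  &= \left|\vec{a}\cdot\vec{c} - \vec{a}\cdot\vec{b} + e_{ab} - e_{ac}\right|
   \ge \left|\vec{a}\cdot\vec{c} - \vec{a}\cdot\vec{b}\right| - 2(\epsilon+\delta) \\
% Right side: the one error can raise it by at most (\epsilon+\delta)
1 + \bar{P}(\vec{b},\vec{c}) + \epsilon + \delta
  &= 1 - \vec{b}\cdot\vec{c} + e_{bc} + \epsilon + \delta
   \le 1 - \vec{b}\cdot\vec{c} + 2(\epsilon+\delta) \\
% Chain them through the bound from equations 19 and 21
\left|\vec{a}\cdot\vec{c} - \vec{a}\cdot\vec{b}\right| - 2(\epsilon+\delta)
  &\le 1 - \vec{b}\cdot\vec{c} + 2(\epsilon+\delta) \\
% Rearranged: equation 22
4(\epsilon+\delta)
  &\ge \left|\vec{a}\cdot\vec{c} - \vec{a}\cdot\vec{b}\right| + \vec{b}\cdot\vec{c} - 1
\end{align*}
```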
Well, to see what equation 22 can tell us, let’s imagine a thought experiment where we consider the vectors a, b, and c. The vector a is going to be a constant measurement axis at detector A. And then b and c are going to be the two different options that we imagine for detector B. And suppose for the sake of a specific example that a and c are perpendicular, such that a dot c equals zero. And then also a dot b is equal to b dot c, which is 1 over the square root of 2. That is to say, we have a 45° angle between the vectors a and b, as well as a 45° angle between the vectors b and c. So, for example, if a is pointing straight up and c is pointing straight to the right, then b is going to be right in between them, at a 45° angle that points up and to the right. And if we apply this reasoning to that scenario, you’ll find when you evaluate the dot products in equation 22 that 4 times the quantity of epsilon plus delta has to be greater than or equal to the square root of 2 minus 1, which is about 0.41. So divide both sides by 4, and you find that epsilon plus delta has to be greater than or equal to 0.1 something. And then remember that delta is kind of an artifact of our smearing process. So you can imagine making that as small as you want. In fact, if you want, you can make that zero and say forget about averaging, don’t worry about the averaging process. But even then, you’ll find that epsilon cannot be made arbitrarily small, because in this case it would have to be at least 0.1 something. But remember what epsilon is. It’s a bound on the mismatch between the local hidden variable correlation and the quantum mechanical correlation. So if epsilon cannot be set to zero, then the quantum mechanical expectation value cannot be represented, either accurately or arbitrarily closely, in the form of equation 2, which is the definition of a generic local hidden variable correlation. So that is the argument. Now you can see there’s a bit of algebra and it takes a moment to kind of soak it in.
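If you want to check that arithmetic yourself, here is a minimal sketch in Python. The function names `dot` and `bell_rhs` are made up for this illustration, and the particular coordinates for a, b and c are just one assumed way of realizing "up", "right" and "45° in between".

```python
import math

def dot(u, v):
    """Ordinary dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(u, v))

def bell_rhs(a, b, c):
    """Right hand side of equation 22: |a.c - a.b| + b.c - 1."""
    return abs(dot(a, c) - dot(a, b)) + dot(b, c) - 1.0

# Assumed coordinates for the example in the text.
a = (0.0, 1.0, 0.0)                          # straight up
c = (1.0, 0.0, 0.0)                          # straight to the right
b = (math.sqrt(0.5), math.sqrt(0.5), 0.0)    # 45 degrees between a and c

rhs = bell_rhs(a, b, c)       # should equal sqrt(2) - 1
min_mismatch = rhs / 4.0      # lower bound on epsilon + delta

print(f"|a.c - a.b| + b.c - 1 = {rhs:.4f}")
print(f"epsilon + delta >= {min_mismatch:.4f}")
```

With these settings the right hand side comes out to the square root of 2 minus 1, so epsilon plus delta cannot be pushed below roughly a tenth.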
And when you’re first encountering this argument, probably the thing you want to do is just focus on how each step follows logically from the previous step and then think big picture about what are our assumptions and what is the result. And you think about how our assumptions were so generic, going all the way back to equation 2 defining the correlation for a local hidden variable model. We made no assumptions or any kind of restrictions on the sort of thing that our hidden variables lambda could be. And so we’ve proven this very generic result, which is that at least for some measurement settings a, b, and c, we can show that there is going to be a finite nonzero mismatch between the correlation given by a local hidden variable model and the correlation given by quantum mechanics. And here there’s a possibility of getting confused by equation 22, because you might say, well wait a minute, aren’t there settings of a, b, and c that make the right hand side zero, and so this isn’t a problem? And that is true, but it’s not surprising, because remember, as we saw earlier in part 3B of this paper, you can have an agreement between a local hidden variable model and quantum mechanics for certain specific settings of our measurement directions. So the fact that there exist experimental configurations where a local hidden variable model might agree with quantum mechanics is not philosophically profound, because the profound thing is that there exist experimental conditions where no local hidden variable model can explain the results of quantum mechanics. All that’s to say, if you as an experimenter design an experiment where local hidden variables and quantum mechanics agree, it’s like fine. Okay.
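To get a feel for which settings actually constrain epsilon plus delta, here is a small sketch that scans a grid of detector angles. The setup is an assumption made for illustration: all three vectors are taken to lie in one plane, a is fixed along the x axis, and b and c are stepped in 5° increments. The names `unit`, `bell_rhs` and `scan` are hypothetical helpers, not anything from Bell’s paper.

```python
import math

def unit(theta):
    """Unit vector in the plane at angle theta from the x axis."""
    return (math.cos(theta), math.sin(theta))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def bell_rhs(a, b, c):
    """Right hand side of equation 22: |a.c - a.b| + b.c - 1."""
    return abs(dot(a, c) - dot(a, b)) + dot(b, c) - 1.0

def scan(steps=72):
    """Evaluate the right hand side on a grid of coplanar settings.

    Returns a list of (value, angle_b, angle_c) with a fixed at angle 0.
    """
    a = unit(0.0)
    results = []
    for i in range(steps):
        for j in range(steps):
            tb = 2 * math.pi * i / steps
            tc = 2 * math.pi * j / steps
            results.append((bell_rhs(a, unit(tb), unit(tc)), tb, tc))
    return results

results = scan()
# Settings where the right hand side is positive force epsilon + delta > 0.
constraining = [r for r in results if r[0] > 1e-12]
best = max(results, key=lambda r: r[0])

print(f"{len(constraining)} of {len(results)} grid settings give a positive bound")
print(f"largest value {best[0]:.4f} at b = {math.degrees(best[1]):.0f} deg, "
      f"c = {math.degrees(best[2]):.0f} deg")
```

Plenty of grid points give zero or a negative number, which is the harmless case described above. The point is only that some settings give a strictly positive value, and one such setting is enough.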
But if someone else designs an experiment where they orient their detectors in such a way, like the example given here, where no local hidden variable explanation makes sense and only quantum mechanics with its weird non-local wave function collapse, or something mathematically isomorphic, is able to explain the data, well, then that’s the case in point right there that reality is not described by a local hidden variable model. And so even the existence of one possible experimental setup that violates local realism is all you need to know that, well, something other than local realism is going on in this universe. So that’s a glitch in reality right there. You know, this is one of those things that the more you think about it, the more it blows your mind. You’d like to think the more you think about something, the less it blows your mind. But no, in this case, it’s the opposite. Part five, generalization. All right. So in this part of the paper, Bell is going to make the argument that even though we’ve been thinking in terms of spin and the singlet state of two spin 1/2 particles with entangled spin, the same arguments regarding non-locality and correlations and hidden variables apply much more generally in quantum mechanics, in a way that doesn’t depend specifically on spin. We just thought about it in terms of spin because that’s an example that’s easy to think about. So Bell begins part five, generalization, with “the example considered above has the advantage that it requires little imagination to envisage the measurements involved actually being made,” because you can imagine the Stern-Gerlach magnets and the orientation and the spin and all of that. But “in a more formal way, assuming that any Hermitian operator with a complete set of eigenstates is an observable, the result is easily extended to other systems.” So in other words, it’s not just about spin. We can apply this reasoning to any quantum mechanical observable.
“If two systems have state spaces of dimensionality greater than two, we can always consider two-dimensional subspaces and define in their direct product operators sigma 1 and sigma 2 formally analogous to those used above and which are zero for states outside of the product subspace.” Whenever we have two quantum systems, no matter how complicated they might be, they’ll always contain smaller two-state parts that we can focus in on. And within those parts, we can define measurements that behave just like the simple spin measurements we discussed earlier. And when we do that in that two-dimensional subspace, there’s going to be a state which is analogous to the singlet spin state but pertaining to whatever observable we’re talking about in this more general context. “Then for at least one quantum mechanical state, the singlet state in the combined subspaces, the statistical predictions of quantum mechanics are incompatible with separable predetermination.” That is the kind of realism or local causality that we would expect from a local hidden variable theory, or even a kind of quantum mechanical picture where the two states are separable. Like, remember earlier we were talking about the isotropic mixture of product states where each particle had an equal and opposite spin, and we saw how that gave a correlation which was three times weaker than the singlet state. Well, that same kind of reasoning applies to this two-dimensional subspace of whatever observable we’re dealing with. You can create a state which is directly analogous to the spin singlet state. And when you do that and you separate out the particles and you measure them in different ways, you’ll find that the quantum mechanical quote unquote singlet state is always going to have weirdly strong non-local correlations. And so all that’s to say, Bell’s theorem is not about spin per se. Generically, quantum mechanics can exhibit non-local correlations in all kinds of different observables.
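As a rough numerical check on that "three times weaker" comparison, here is a sketch in plain Python. It rests on assumptions that should be stated loudly: the singlet is written in the usual up/down basis, and the "isotropic mixture of product states" is modeled as a uniform average over directions n of a pair with spins along plus n and minus n, for which the product of the two expectation values is (a dot n) times minus (b dot n). All function names here are hypothetical.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sigma_dot(n):
    """2x2 matrix n.sigma for a unit vector n = (nx, ny, nz)."""
    nx, ny, nz = n
    return [[complex(nz), complex(nx, -ny)],
            [complex(nx, ny), complex(-nz)]]

def kron(A, B):
    """Kronecker product of two 2x2 matrices as a 4x4 nested list."""
    return [[A[i // 2][j // 2] * B[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def expectation(state, M):
    """Real part of <state| M |state> for a 4-component state."""
    Mv = [sum(M[i][j] * state[j] for j in range(4)) for i in range(4)]
    return sum(state[i].conjugate() * Mv[i] for i in range(4)).real

s = 1.0 / math.sqrt(2.0)
singlet = [0j, complex(s), complex(-s), 0j]  # (|up,down> - |down,up>)/sqrt(2)

def singlet_correlation(a, b):
    """Quantum correlation of spin measurements along a and b on the singlet."""
    return expectation(singlet, kron(sigma_dot(a), sigma_dot(b)))

def mixture_correlation(a, b, n_u=200, n_phi=64):
    """Average of (a.n)(-b.n) over directions n, by a midpoint rule on the sphere."""
    total = 0.0
    for i in range(n_u):
        u = -1.0 + (2 * i + 1) / n_u          # cos(theta) at the cell midpoint
        r = math.sqrt(1.0 - u * u)
        for k in range(n_phi):
            phi = 2 * math.pi * (k + 0.5) / n_phi
            n = (r * math.cos(phi), r * math.sin(phi), u)
            total += dot(a, n) * (-dot(b, n))
    return total / (n_u * n_phi)

a = (0.0, 0.0, 1.0)
b = (math.sin(math.pi / 4), 0.0, math.cos(math.pi / 4))

print(f"singlet correlation: {singlet_correlation(a, b):.4f}")
print(f"mixture correlation: {mixture_correlation(a, b):.4f}")
print(f"-a.b = {-dot(a, b):.4f},  -a.b/3 = {-dot(a, b) / 3:.4f}")
```

Under those assumptions the singlet number should track negative a dot b, and the mixture number should track one third of that, up to the small error of the numerical average.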
All right, my friends, let’s go ahead and wrap things up with part six, conclusion. “In a theory in which parameters are added to quantum mechanics to determine the results of individual measurements without changing the statistical predictions, there must be a mechanism whereby the setting of one measuring device can influence the reading of another instrument, however remote.” That is to say, if you take Einstein’s perspective that quantum mechanics needs to be supplemented with hidden variables, then Bell has proven that that hidden variable model has to contain non-local interactions which are apparently unrestricted by the normal limitations of space and time. “Moreover, the signal involved must propagate instantaneously so that such a theory could not be Lorentz invariant.” And Lorentz invariance, that’s just one of the main principles of special relativity. That is to say, once you have a non-local theory, you run into all kinds of problems with special relativity. And really, a non-local theory just totally goes against the usual relativistic notions of space and time and causality. Now, fortunately, because of the no-signaling theorem, the non-local correlations in quantum physics are not actually able to corrupt our universe by allowing for the transmission of information faster than the speed of light. But still, there’s a deep philosophical tension between the non-local correlations in quantum mechanics and the way we usually think about the nature of space and time from a relativistic perspective. And to this day, that tension remains unresolved. We really do not have a good explanation for what’s going on with the non-local correlations in quantum mechanics. Depending on who you ask, different people have different ideas and theories, but there’s really no consensus. And the reason being, well, one of the reasons is that all these different models are so crazy that it’s like, what are you going to believe in?
You want to believe in many worlds, or superdeterminism, or that you just give up the concept of realism? I mean, no matter how you try to explain the implications of Bell’s theorem, it ends up just blowing your mind. No one has yet found a sane explanation for what’s going on here. All right, so this is basically the conclusion of Bell’s paper right here. But then he goes on to add one additional note, a little caveat, which is, “of course, the situation is different if the quantum mechanical predictions are of limited validity. Conceivably, they might apply only to experiments in which the settings of the instruments are made sufficiently in advance to allow them to reach some mutual rapport by exchange of signals with velocity less than or equal to that of light. In that connection, experiments of the type proposed by Bohm and Aharonov in which the settings are changed during the flight of the particles are crucial.” And all that’s to say, if you’re doing an experiment where the settings of the two detectors are set in advance and then you’re sending your entangled particles to each detector, well, maybe there’s some way that the two detectors have communicated with each other or established some sort of rapport somehow. And even though for each pair of particles the measurements are happening so fast that they’re in different light cones, perhaps somehow the two detectors are already kind of in sync with each other in some sort of way, in that they somehow know the settings of one another, and therefore you don’t need non-locality to explain the correlation results. Now, that would be a very hard to believe situation, because you’d be like, how can that be? And, you know, how and why would the two detectors know about each other? But I mean, in theory, that is a loophole that you could imagine possibly somehow being true.
And so, that’s why Bell mentions these experiments where you change the settings of the detectors as the particles are flying along, so that there’s no possible time for the two detectors to establish a rapport with one another. And so each detector is going to be truly independent of the other detector. And so then you’re really ensuring that these correlations are genuinely non-local. Well, okay. So that’s the end of the paper. I hope you found this interesting. I hope it’s given you something to think about. So yeah, thanks for watching. I really appreciate it. And I’ll see you next time. Hey, I want to say thank you to everyone who’s been supporting my channel on Patreon. Your support really means a lot. It really makes a big difference. And genuinely, without your support, I wouldn’t be able to really dive into this full-time. So, I’m so grateful for all of you. Thank you so much. It really means a lot.