Transcript

Correlation Vs Causation The Difference Matters A Lot In The Money By Zerodha

TITLE: Correlation vs. Causation — The Difference Matters. A Lot. | In The Money by Zerodha
CHANNEL: In The Money by Zerodha
DATE:
---TRANSCRIPT---

What if I told you that one of the most accurate predictors of the S&P 500, the most tracked stock market index in the world, was butter production in Bangladesh? You’d probably laugh at me, and you should. But here’s the thing. Someone actually found this to be true, and it wasn’t some random person. It was Dr. David Leinweber: an MIT graduate in physics and computer science, a PhD in applied mathematics from Harvard, visiting faculty at Caltech, a respected quant. In 1995, Leinweber got curious and asked, “What if I go looking for correlations in a large data set without any logic or filter? What will I find?” So he sat down with a UN data CD-ROM. If you don’t know what a CD-ROM is, you should look it up. That CD-ROM had economic data from 145 countries. And what he found was this: butter production in Bangladesh explained 75% of the S&P 500’s movement over a 10-year period. Not GDP numbers, not earnings, not interest rates, but butter in Bangladesh. And he was not making it up. He ran a real regression on the data, and it came back with a 75% fit. The math was completely clean. Not satisfied with 75%, he kept going. He threw in butter and cheese production from the US as well, and now he was at 95%. Then he added the sheep populations of Bangladesh and the United States: 99%, a near-perfect fit. The man had essentially built a model that “explained” 99% of the S&P 500’s movement using dairy products and sheep. He didn’t publish it at the time. It was supposed to stay a joke, a set of slides he would show internally to make a point. Years later, he finally wrote it up properly and published it in The Journal of Investing under the title “Stupid Data Miner Tricks.” I’m not kidding. The link is in the show notes. Anyway, closer to home, we hear something similar. I’ll call it the panwala indicator.
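Leinweber’s point is easy to reproduce in spirit. The Python sketch below is not his method or his data. It generates hypothetical random series: one random walk stands in for ten yearly index levels, and 2,000 others stand in for candidate “predictors” like butter production. Every series is generated independently of the others, and the sketch simply keeps whichever candidate fits best. With a single predictor, the regression R² equals the square of Pearson’s correlation coefficient. The helper names are illustrative only.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(n, rng):
    """A trending-looking series built from independent random steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def best_spurious_fit(n_years=10, n_candidates=2000, seed=7):
    """Highest R^2 found by searching candidate series unrelated to the target."""
    rng = random.Random(seed)
    target = random_walk(n_years, rng)  # hypothetical stand-in for yearly index levels
    best_r2 = 0.0
    for _ in range(n_candidates):
        candidate = random_walk(n_years, rng)  # hypothetical "butter production"
        best_r2 = max(best_r2, pearson_r(candidate, target) ** 2)
    return best_r2


print(f"best R^2 found in pure noise: {best_spurious_fit():.2f}")
```

None of the candidates has any connection to the target, so the winning R² mostly reflects how many candidates were tried. With only ten data points and thousands of tries, a high fit is close to guaranteed. Adding more predictors to a multiple regression, as Leinweber did with US butter, cheese and sheep, can never lower the in-sample R².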
The idea behind the panwala indicator: when your local panwala starts giving you stock market tips, you know the market has topped. It’s funny and relatable, and like all good jokes, there is a grain of truth buried in it. But should you take the panwala, or the people making these kinds of calls, more seriously? That’s exactly what this episode is about: the difference between correlation and causation. Two variables can move together perfectly, the math can look spotless, and the whole thing can still mean absolutely nothing. We confuse the two more often than we realize, and in markets that confusion has a price. I’m your host Sepra, and let’s get into it. Before I start, here’s a quick word about my upcoming Varsity Live session on commodities basics. This time it’s going to be in Hindi, and I will be co-presenting it with Pratik Sharma from Varsity. If you’re interested in commodities trading, the registration link is in the video description. The session is free and will be held in a live format. Before I get into the concepts, I want you to pause, look at these statements, and think them through. If Amavasya falls on a Saturday or Sunday and the market closed negative on Friday, the market opens gap-down on Monday. If FII flows are net negative for 5 days in a row and the market also closes negative for those 5 days, the market gaps up the next day. You have probably seen statements like these on social media. Keep them in mind; I will come back to them later. For now, let’s move on and define the terms. The word correlation was actually invented by Francis Galton, a Victorian-era British scientist, a cousin of Charles Darwin and a noted explorer of Africa. At the time, Galton was obsessed with the question, “Does genius run in families?” He spent years compiling pedigrees of 605 eminent Englishmen, looking for evidence that intelligence was inherited. What he found instead was something he didn’t expect.
It was a pattern he first called “reversion” and later “regression towards mediocrity.” Today we call it regression to the mean, or mean reversion. Tall fathers tended to have sons who were tall, but not as tall as them. Short fathers tended to have sons who were short, but not as short as them. Everything pulled towards the average. While investigating this, Galton started plotting height against forearm length, height against head width, and all kinds of other physical measurements against each other, and he kept seeing the same pattern. When one went up, the other tended to go up too, but neither was causing the other. Both were simply downstream of the same underlying factor: genetic inheritance. He needed a word to describe a phenomenon where two things move together but neither causes the other. That’s how the word “correlated” came into being. His student Karl Pearson later turned this into a precise mathematical formula: the correlation coefficient. Its size, from zero to one, tells you how strongly two variables move together. Zero means no linear relationship at all, and one means perfect lockstep movement. It is still the first number statisticians all over the world compute when they want to understand the relationship between two variables. Here’s the thing Pearson was thrilled about: the correlation coefficient is completely agnostic about cause and effect. It doesn’t know, and it doesn’t care. Think about this. Every summer in India, ice cream sales go up, and every summer in India, power cuts go up too. If you plotted both on a graph, you would see a near-perfect correlation. Does that mean ice cream consumption is causing power cuts across the country? Obviously not. Summer heat is driving both. People eat more ice cream because the weather is hot. At the same time, the same heat pushes everyone to run their ACs and coolers on full blast, which strains the grid and causes power cuts. The ice cream and the power cuts are just two symptoms of the same underlying cause.
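The ice-cream-and-power-cuts story can be simulated to show both halves of the argument. In the Python sketch below, every number is an invented assumption, including the temperatures, sales, outage levels and noise. A hypothetical daily heat series drives both ice cream sales and power cuts, and neither of those two series affects the other. The raw correlation between them comes out high. The sketch then computes a partial correlation: it regresses each series on heat and correlates the residuals. Once heat’s influence is removed from both series this way, the correlation collapses towards zero.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after fitting a least-squares line on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(42)
# Hypothetical daily temperature (deg C): a seasonal cycle plus noise.
heat = [30 + 8 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 2) for d in range(365)]
# Both series depend on heat only; each gets its own independent noise.
ice_cream = [50 + 4.0 * h + rng.gauss(0, 5) for h in heat]
power_cuts = [1 + 0.3 * h + rng.gauss(0, 0.5) for h in heat]

raw_r = pearson_r(ice_cream, power_cuts)
partial_r = pearson_r(residuals(ice_cream, heat), residuals(power_cuts, heat))
print(f"raw r = {raw_r:.2f}, r after controlling for heat = {partial_r:.2f}")
```

Controlling for a confounder this way only works when you know what it is and have measured it. The hard part in practice is that the third variable is usually hidden.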
That hidden third variable, summer heat in this case, is the one doing the actual work while the other two things move in tandem. Statisticians call it a confounding variable: the lurking cause that neither of our two variables can see. And this is where causation comes in. Causation is a much stronger conceptual claim. It says a change in X causes a change in Y: not just that they move together, but that one is genuinely driving the other. Rain causes the ground to get wet. A rate hike by the RBI causes borrowing costs to rise. These are causal claims. There is a mechanism, a chain of logic, that connects one thing to the other. The difference between the two concepts, “these things move together” versus “one is driving the other,” is enormous. In markets this matters more than almost anywhere else. Markets generate enormous amounts of data, and wherever there is enormous data, there are enormous opportunities to find correlations that look meaningful but aren’t. As Leinweber showed us, if you search hard enough through enough variables, you will always find some stupid correlation. For instance, did you know that an increase in son papdi sales predicts an oncoming recession with 90% accuracy? Okay, I’m kidding on this one. So we now know what correlation and causation mean. But how do we actually measure them? Measuring correlation is simple. Karl Pearson gave us a single number: the correlation coefficient. It runs from -1 to +1. A value of +1 means two variables move in perfect lockstep in the same direction. A value of -1 means they move in perfect lockstep in opposite directions. Zero means no meaningful relationship whatsoever, like, say, shoe size and NEET scores. Coming to our markets, if you calculated the correlation between the Nifty 50 and the Sensex, you’d get a number very close to one. They’re not identical indexes.
The Nifty has 50 stocks and the Sensex has 30, but both track India’s largest companies through the same economic environment and respond to the same news. They move almost in lockstep, hence a correlation close to one. Measuring causation is a completely different problem. In his book The Book of Why: The New Science of Cause and Effect, Judea Pearl frames this through what he calls the ladder of causation. It has three distinct levels of causal understanding, each more powerful than the one below it. The first rung of the ladder is association: when I see X, how likely am I to see Y? This is what the correlation coefficient measures. It is also, as Pearl points out somewhat uncomfortably, what most machine learning systems do today, including the most sophisticated AI systems: pure pattern matching. On a side note, if Judea Pearl’s last name rings a bell, yes, he is the father of Daniel Pearl, the Wall Street Journal reporter who was killed in Pakistan in 2002. Back to measuring causation. The second rung is intervention: you change some variable and ask what happens as a result. This is what a randomized controlled trial does, the gold standard in medicine. You take a large group, randomly split it in two, give one group the treatment and the other a placebo, and measure the difference. Randomization neutralizes all confounding variables at once, on average, including ones nobody thought to measure. The third and highest rung is counterfactual reasoning: asking what would have happened if things had been different. My headache went away after I took a Crocin. But would it have gone away anyway? Or take our Amavasya example. Suppose there is no Amavasya this weekend, but markets fell on Friday. Does that still increase the odds of a Monday gap-down? This is the deepest form of causal reasoning. In markets, randomized controlled trials are never possible. You cannot randomly assign one group of investors to experience a rate hike and another group a rate cut.
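The intervention rung is why randomization carries so much weight. The Python sketch below uses an invented medical setting, and every number in it is an assumption. Patients have a hidden severity score, and the treatment truly adds 2 points to a made-up recovery score. In the observational version, sicker patients are the ones who seek treatment. A naive comparison of treated and untreated patients then makes a helpful treatment look harmful. Assigning treatment by coin flip recovers the true effect.

```python
import random

TRUE_EFFECT = 2.0  # assumed benefit of the treatment, in made-up "recovery points"


def outcome(severity, treated, rng):
    """Recovery score: worse for severe cases, better with treatment, plus noise."""
    return 10.0 - 5.0 * severity + TRUE_EFFECT * treated + rng.gauss(0, 1)


def diff_in_means(records):
    """Average outcome of the treated minus average outcome of the untreated."""
    treated = [y for t, y in records if t]
    control = [y for t, y in records if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def estimate_effect(randomize, n=20000, seed=1):
    """Simulate n patients and return the naive treated-vs-untreated difference."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        severity = rng.random()  # hidden confounder: 0 = mild, 1 = severe
        if randomize:
            treated = rng.random() < 0.5  # coin flip, blind to severity
        else:
            treated = severity > 0.5  # sicker patients seek treatment
        records.append((treated, outcome(severity, treated, rng)))
    return diff_in_means(records)


observational = estimate_effect(randomize=False)
randomized = estimate_effect(randomize=True)
print(f"observational estimate: {observational:+.2f}")  # roughly -0.5
print(f"randomized estimate:    {randomized:+.2f}")     # roughly +2.0
```

Markets never allow this experiment. With only one history, traders are stuck with the observational version.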
Markets have only one history. Most of the time, in everyday trading and investing decisions, we are sitting on rung one, observing patterns in data and asking what they mean. And that is precisely where the problem lies. The correlation coefficient tells us that two things move together, but it never tells us why. As humans, we are wired to seek a why. When the answer isn’t obvious, we latch on to the most compelling narrative available. When even that fails, our minds simply invent one. By now, I’m sure you’re wondering: this is all very interesting, but how does any of it actually matter in markets? How does knowing the difference between correlation and causation help me become a better trader or investor? Fair question. But before that, I want to introduce two more concepts. One applies whether you are dealing with correlation or causation. The other is a trap that sits specifically inside causation, and it catches more traders than almost anything else. The first is the concept of lag. Finding a correlation between two assets, or even establishing that one causes the other, is only half the battle. The more important question is how big the time gap is between the move in one and the move in the other. If asset B moves one microsecond after asset A, the correlation is real but completely useless to you as a trader. By the time you’ve seen the signal and reached for the buy button, the opportunity has already closed. High-frequency trading firms with servers co-located at the exchange might exploit that gap. You and I cannot. For a correlation or a causal relationship to be actionable, there needs to be enough lag for you to see the signal, make a decision, and execute a trade. The lag, in other words, has to be wide enough for a human, or at least an automated system, to act on. This is why pair trading, for instance, works on some time frames and completely falls apart on others.
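One common way to check whether one asset leads another, and by how much, is cross-correlation. You correlate A’s returns at time t with B’s returns at time t + k for a range of shifts k and see which shift fits best. The Python sketch below is a toy built on stated assumptions. Both return series are simulated, and the follower is constructed to copy the leader 5 bars later, a hypothetical lag chosen for illustration. The sketch uses returns rather than price levels, because trending levels produce spurious correlations of their own.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def cross_corr(a, b, lag):
    """Correlation of a[t] with b[t + lag]; a positive lag means a leads b."""
    if lag > 0:
        return pearson_r(a[:-lag], b[lag:])
    if lag < 0:
        return pearson_r(a[-lag:], b[:lag])
    return pearson_r(a, b)


def best_lag(a, b, max_lag):
    """Shift in bars, from -max_lag to +max_lag, with the highest correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: cross_corr(a, b, k))


rng = random.Random(3)
TRUE_LAG = 5  # hypothetical: the follower reacts 5 bars after the leader
n_bars = 2000
leader = [rng.gauss(0, 1) for _ in range(n_bars)]  # simulated per-bar returns
follower = [
    (0.8 * leader[t - TRUE_LAG] if t >= TRUE_LAG else 0.0) + rng.gauss(0, 0.6)
    for t in range(n_bars)
]

lag = best_lag(leader, follower, max_lag=20)
print(f"leader leads by {lag} bars; r = {cross_corr(leader, follower, lag):.2f}")
```

The answer comes out in bars, so whether it is tradable depends on the bar size. Five one-minute bars is a five-minute window a person could act on. Five one-second bars is gone before you can reach the buy button. Real market data is far noisier than this toy, and a lead that shows up in one period can vanish in the next.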
Two correlated assets drifting apart and then mean-reverting sounds like a clean opportunity, and it is. But if the reversion happens in mere seconds, the window doesn’t exist for most participants. The relationship is real, but the opportunity is not tradable. So whenever you find a correlation that excites you, whether or not there’s a causal story behind it, the first question to ask is: how much lag is there, and is it enough for me to act? To reiterate, let’s say you have figured out that the Nifty 50 and the Nifty Midcap Select are correlated; they move together. Now you also need to figure out which one moves first. Let’s say that by running some sort of lead-lag statistical test, you find that Midcap Select moves first and the Nifty 50 follows. There needs to be some time between the two for you to trade it. If the Nifty follows Midcap Select within a few seconds, it’s not tradable. If it lags by 5 minutes, there is an opportunity. The second concept is causal direction. This one can get a bit complex, and it falls under causation. It’s not about whether a causal relationship exists. It’s about which way the arrow points: in other words, what causes what? And this is where I want to talk about technical analysis indicators such as moving averages, RSI and MACD. These are the tools most traders start with, and there’s nothing inherently wrong with them. But many beginners make a profound conceptual error, and even some experienced traders carry it around without realizing it. The error is treating the indicator as the cause and the price as the effect. When RSI drops to 30, traders say the stock is oversold and expect a bounce. When a moving average crossover happens, traders treat it as a signal that something is about to move. The implicit assumption is that the indicator is telling the market what to do next. But think about what these indicators actually are. RSI is a mathematical function of the last N price closes.
MACD is the difference between two exponential moving averages, both of which are themselves derived from price. A moving average is just the average of recent prices, smoothed over time. Every one of these indicators is derived entirely from price. Price is the input; the indicator is the output. That means the causal arrow runs in exactly one direction: price causes the indicator to change. The indicator cannot cause the price to change. When RSI reads 30, it is because the price has already fallen sharply over the recent period. The indicator is not predicting a bounce. It is reflecting the fact that a sharp fall has already occurred, and that’s all it is. Judea Pearl calls this reverse causality. You have correctly identified that a causal relationship exists between the price and the indicator, but you have the direction backwards. In other words, you assume the indicator’s movement causes the price to change. It is very similar to the dog walking under a bullock cart and assuming the cart is moving because of it. This question of direction reminds me of a classic debate in the markets about option prices and spot. What moves first? Does spot move first and option prices follow? Or, through other factors, do option prices actually lead spot? If you look it up, there are no clear answers. The academic evidence is mixed. Retail traders believe one thing, and institutional dealers may say something else. There is no consensus on which way the arrow points. And getting the direction wrong in markets means you are making decisions based on a misreading of what is driving what. Coming back to technical indicators, none of this means they are worthless. It only means you need to be clear about what they are: a visual representation of what the price has already done. They are the market’s shadow, not its sun. Indicators move because of price, not the other way around.
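To make the direction of the arrow concrete, here is a minimal Python sketch of the three indicators mentioned. Each is a plain function whose only input is a list of closing prices. Two simplifications are assumptions rather than the one standard definition. First, the EMA is seeded with the first price, whereas many platforms seed it with a simple average. Second, the RSI uses simple averages of gains and losses over the last n changes, a variant often called Cutler’s RSI, rather than Wilder’s smoothing. The price series is invented.

```python
def sma(prices, n):
    """Simple moving average; None until n prices are available."""
    return [
        sum(prices[i + 1 - n : i + 1]) / n if i + 1 >= n else None
        for i in range(len(prices))
    ]


def ema(prices, n):
    """Exponential moving average, alpha = 2 / (n + 1), seeded with the first price."""
    alpha = 2 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA, both computed from price alone."""
    return [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]


def rsi(prices, n=14):
    """RSI of the last n price changes, using simple averages of gains and losses."""
    if len(prices) < n + 1:
        raise ValueError("need at least n + 1 prices")
    changes = [b - a for a, b in zip(prices[-n - 1 : -1], prices[-n:])]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        return 100.0  # no losses in the window
    return 100 - 100 / (1 + avg_gain / avg_loss)


# Invented closes: a steady climb, then a sharp five-day fall.
closes = [100, 101, 102, 101, 103, 104, 103, 105, 106, 105,
          107, 108, 107, 109, 110, 106, 102, 99, 97, 96]

print(f"RSI before the fall: {rsi(closes[:15]):.1f}")  # 77.8
print(f"RSI after the fall:  {rsi(closes):.1f}")       # 34.6
```

These functions read nothing except past prices. So the drop in RSI from about 78 to about 35 is just the five-day fall restated: the number moved because the prices moved.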
In short, when it comes to building a trading strategy, there are two questions to consider. One: is there enough lag in a relationship, causal or correlated, for it to be actionable for me? Two: if I believe one thing is driving another, am I sure I have the direction of the arrow right? Get those two wrong, and even a genuine, well-established relationship in the data will cost you money. Now, if you remember, at the beginning of the episode I showed you two statements. If Amavasya falls on a Saturday or a Sunday and the market closed negative on Friday, the market opens gap-down on Monday. If FII flows are net negative for 5 days in a row and the market closes negative for those 5 days, the market gaps up the next day. I’m sure you have come across statements like these on social media, but I hope you now know how to read them, or rather how not to read them. And if it’s possible to test them, you should. And lastly, even if you see a perfect correlation, you now know that you are better off ignoring it altogether. Before I wrap up, if anything we discussed today sparked a curiosity to go deeper, here are three books I genuinely recommend. The first is Nerds on Wall Street by David Leinweber. Yes, the butter guy himself. It’s part memoir, part manifesto, written by someone who spent decades at the intersection of quantitative finance and markets. If you want to understand how data gets abused in finance and how to think more clearly about it, this is a great place to start. The second is The Book of Why: The New Science of Cause and Effect by Judea Pearl and Dana Mackenzie. This is a serious one. Pearl is a Turing Award-winning computer scientist who spent his career trying to put causation on a rigorous mathematical footing. The book is his argument that for over a century, science became so obsessed with correlation that it forgot to ask the more important question, which is why.
Let me warn you, it’s not a light read, but it will permanently change how you think about data and inference. The third is Naked Statistics by Charles Wheelan. This book is the opposite of Pearl’s in that it’s anything but serious. It covers everything from correlation to regression to probability with wit and clarity, without requiring you to remember anything from your school days. It’s a perfect book if you want to understand statistical concepts without the mathematics. So yes, that brings me to the end of this episode. I hope you found it useful. If you have any questions, feel free to drop them in the comments, and I will do my best to respond. Till then, take care and trade safe.