Correlation and causation

Does eating less ice cream reduce the number of divorces? No: here’s the difference between correlation and causation

Data is a critical tool for understanding the world and making informed decisions, but it is not always easy to interpret it correctly. For example, two phenomena may behave over time in such a similar way that one seems to be the cause of the other. This, however, could be a simple coincidence, or it could depend on a third, unobserved event that causes both. Distinguishing a simple coincidence from a real cause-and-effect relationship between events can be complex, but it is essential in fields such as healthcare, politics and economics, where important decisions are made based on the interpretation of data. In this article we look at what it means for two phenomena to be “correlated”, what spurious correlations are, and why it is important not to confuse correlation (i.e. a particular relationship between the trends of two events) with causation (i.e. the connection by which one event is the cause of the other).

Phenomena that seem linked to each other, but are not

Let’s start with a graph and a question: does reducing ice cream consumption reduce the number of divorces in Alabama? Obviously not: the quantity of ice cream consumed in a year in the United States does not in any way affect the number of divorces in Alabama, but if we looked only at this graph, we might think it does.

Image
Graph showing the trend over time in ice cream consumption (in black) and the number of divorces in Alabama (in red). The two curves are similar, but clearly neither phenomenon causes the other. Credit: Chart by Tyler Vigen, tylervigen.com, CC BY 4.0

What we see is that ice cream consumption in the United States and the number of divorces in Alabama appear to decrease in a similar way from 1999 to 2020. Over this period, people ate less and less ice cream (from 16.2 to 12 pounds per year) and divorces decreased (from approximately 6 to 4 per 1,000 inhabitants).

The extreme similarity of the two graphs could therefore lead us to think, mistakenly, that if we reduced the amount of ice cream consumed, we could also reduce divorces. When reading data, it is always important to keep in mind that situations like this can arise by chance or, usually, because of a third variable that connects the two phenomena but that we are not taking into account at that moment.

In statistics, when two phenomena vary together, they are said to be correlated variables. Let’s see what that means.

Two variables are correlated when they change together

Two variables or phenomena are correlated when we observe that as one varies, the other also varies. Correlation describes precisely this tendency of one variable to change along with the other, and it is expressed by a correlation coefficient that ranges from -1 to 1; the closer it is to either extreme, the stronger the correlation:

  • 1: perfect positive correlation. As one variable increases, the other also increases.
  • -1: perfect negative correlation. As one variable increases, the other decreases.
  • 0: no (linear) correlation. The two variables show no linear relationship, although this alone does not prove that they are independent of each other.
Image: positive and negative correlation
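The coefficient described above is most commonly Pearson's correlation coefficient (the article does not name it, so this is an assumption). A minimal Python sketch of how it is computed: the covariance of the two series divided by the product of their standard deviations. The function name `pearson` is hypothetical; since Python 3.10 the standard library also offers `statistics.correlation`, which computes the same quantity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Do the two series move above/below their own means at the same time?
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Spread of each series around its own mean
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0: both grow together
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))          # -1.0: one grows, the other falls
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0, yet y = x**2 depends on x
```

The last line shows why a coefficient of 0 only rules out a linear relationship: y is entirely determined by x, but its fall and rise cancel out.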

For example, consider the number of beach slippers sold and the number of people stung by jellyfish: both increase in summer and decrease in winter, following the same pattern. These two variables are therefore strongly correlated. When we are faced with phenomena that evolve over time in a similar way, or with graphs such as the one for ice cream and divorces, it is easy to think not only that the events are linked, but also that one of the two is the cause of the other. This, however, is a “mental shortcut” of ours and it is not always true.

Correlation does not necessarily mean causation

A correlation between two phenomena, however strong, does not necessarily imply that one is the cause of the other. In the case of slipper sales and jellyfish stings, for example, the variables are correlated even though neither causes the other: buying slippers does not put us at risk of a jellyfish sting, and we do not feel the urge to go and buy flip-flops right after being stung. Slippers and stings are therefore correlated, because they vary together, but they have no direct cause-and-effect relationship.

Image: correlation and causation

To demonstrate a causal relationship, one has to use causal inference techniques, which aim to eliminate any confounding factors and leave the cause-effect relationship as the only one to be observed. Situations like that of ice cream and divorces, or of slippers and stings, are called spurious correlations.

What are spurious correlations

A spurious correlation is a situation in which two or more variables are correlated but not linked by a causal relationship. It can arise from a coincidence or from the presence of a third factor that has not been considered, the “confounding factor”.

The example of divorces and ice cream perfectly represents the situation in which two variables have a very high correlation coefficient (0.967) but no causal link. The site Spurious Correlations has many other curious examples, such as the relationship between the number of breweries in the United States and the solar energy generated in Peru (correlation = 0.978), or between the number of bachelor’s degrees in psychology and the number of gardeners in Utah (correlation = 0.990). Clearly, these correlations are mere coincidences.
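Coincidences of this kind are easier to produce than one might expect when both series drift over time, as consumption figures and divorce rates do. The sketch below does not use the real datasets; as an assumption, it stands in for two unrelated trending series with pairs of independent random walks of 22 points (one per year from 1999 to 2020) and counts how often their Pearson correlation exceeds 0.5 in absolute value, compared with pairs of plain independent noise. All function names and parameters are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def random_walk(rng, n):
    """Running total of independent Gaussian steps: a series that wanders like a trend."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(rng, n):
    """Independent Gaussian values with no memory of the previous point."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def share_above(make_series, n_points=22, trials=2000, threshold=0.5, seed=42):
    """Fraction of independent pairs whose |correlation| exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = make_series(rng, n_points)
        b = make_series(rng, n_points)  # generated independently of a
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials


print("random walks:", share_above(random_walk))
print("white noise: ", share_above(white_noise))
```

Although no pair is causally linked, the random walks cross the threshold far more often than the noise: two series that each wander in one direction for a while tend to look alike. This is why a high coefficient between two trending time series deserves suspicion.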

As for jellyfish stings and slipper sales, however, the correlation is not a coincidence: it depends on a confounding factor, that is, the factor that connects the two variables, namely going to the seaside. Both variables depend on this third variable: if I go to the beach I need flip-flops, and the more time I spend in the water, the more likely it is that a jellyfish will sting me. This external variable influences both events, creating a correlation without direct causation.
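The role of the confounding factor can be illustrated with a small simulation. This is a hypothetical model, not real data: weekly hours at the seaside follow the seasons, and both flip-flop sales and jellyfish stings are generated from those hours alone, with no link between each other. The raw correlation between sales and stings comes out high; after a simple linear adjustment for beach time (correlating what is left of each variable once the straight-line effect of beach hours is removed, i.e. a partial correlation), it drops to about zero. This adjustment works here only because the simulated relationships are linear by construction; it is one basic technique, not a general recipe for causal inference. Names, coefficients and noise levels are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def residuals(y, x):
    """What remains of y after removing a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def simulate(n_weeks=520, seed=7):
    """Hypothetical ten years of weekly data driven by one confounder: beach time."""
    rng = random.Random(seed)
    beach, flip_flops, stings = [], [], []
    for week in range(n_weeks):
        # Seasonal pattern: positive half of a sine wave in summer, zero in winter
        season = max(0.0, math.sin(2 * math.pi * week / 52))
        hours = max(0.0, 10 * season + rng.gauss(0.0, 1.0))
        beach.append(hours)
        # Each outcome depends on beach hours plus its own independent noise,
        # never on the other outcome
        flip_flops.append(3.0 * hours + rng.gauss(0.0, 2.0))
        stings.append(0.5 * hours + rng.gauss(0.0, 1.0))
    return beach, flip_flops, stings


beach, flip_flops, stings = simulate()
raw = pearson(flip_flops, stings)
adjusted = pearson(residuals(flip_flops, beach), residuals(stings, beach))
print(f"raw correlation:    {raw:.2f}")
print(f"adjusted for beach: {adjusted:.2f}")
```

In real studies the confounder is not always measured, or even known, which is what makes establishing causation hard.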

Knowing how to distinguish between correlation and causation, without drawing hasty conclusions, can help us make important decisions in crucial areas such as medicine, politics and economics.