Simpson’s paradox is a statistical phenomenon, caused by an imbalance in the available data, in which the trends inferred from a whole set of data contrast with the trends found in one or more subgroups of the same data set. A result that seems obvious can become inconsistent once we consider all the data at our disposal, that is, once we know all the variables involved. Indeed, a result may even reverse when we separate the data into subgroups. This matters a great deal in the medical field, for example, when studying the effectiveness of drugs in a given population.
What is Simpson’s paradox
Simpson’s paradox is a counterintuitive statistical phenomenon (a paradox, in fact) in which a trend or result that emerges from the data changes when the data are split into subgroups, that is, when the population under study is divided according to a specific variable. To be clear, this can mean that statistics on the average height of Italians over time may give one result if we consider all Italians as a whole, and a different one if we consider women and men, or young and old, separately.
This means that an overall result (in our case, one computed on all Italians) can be contradicted by “local” results, that is, results observed in smaller groups, such as men and women. This happens because of external factors, such as the size of the subgroups, which alter the results.
The paradox takes its name from Edward Hugh Simpson, a statistician and former Bletchley Park cryptanalyst, who described it in a 1951 article, “The Interpretation of Interaction in Contingency Tables”.
But let’s explain it better with an example.
When the result reverses: an example of Simpson’s paradox
Let’s imagine we want to understand whether, to lose weight, it is better to eat only Brussels sprouts or only tiramisu. To do this, we recruit 200 people for our study: 100 of them eat nothing but Brussels sprouts for a week, and the other 100 eat nothing but tiramisu. At the end of the week we ask all participants to weigh themselves, and here is what happens: 75 out of 100 people who ate only sprouts (75%) lost weight, while among those who ate only tiramisu, 80 out of 100 (80%) lost weight.
If we stopped here, we would only notice that 80% of those who ate tiramisu lost weight, compared to 75% of those who ate sprouts. We could therefore state with great joy that eating tiramisu helps people lose weight more than eating Brussels sprouts, and carry on peacefully with our lives.
Clearly, even though we would like it to be true, the situation is quite different. In our analysis we have not considered, for example, the sex of the participants.
Introduce confounding variables and stratify the data
In our imaginary study to decide on the best diet, we involved 100 people for one diet and 100 for the other. However, we did not pay much attention to the biological sex of the participants, and so we ended up assigning 70 men and 30 women to the sprouts diet and 90 men and 10 women to the tiramisu diet.
If we now look at the results separately for men and women (in technical terms, we “stratify” the data), we observe that:
- among the women who ate tiramisu, 3 out of 10 (30%) lost weight, while among those who ate sprouts, 14 out of 30 (46.7%) did;
- among the men who ate tiramisu, 77 out of 90 (85.6%) lost weight, while among those who ate sprouts, 61 out of 70 (87.1%) did.
In both cases, the percentage of people who lose weight is higher for those who eat sprouts: 46.7% versus 30% for women, and 87.1% versus 85.6% for men. This seems to contradict the overall result, the one computed regardless of sex, where tiramisu looked like the better choice. In other words, dividing the sample by sex reverses the result obtained for the entire sample. This is precisely Simpson’s paradox: a trend visible within each subgroup (sprouts better than tiramisu for both men and women) disappears, and here even reverses, when all the data are combined (80% for tiramisu versus 75% for sprouts).
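The comparison above can be checked with a few lines of Python. This is only a minimal sketch built on the counts from our imaginary study; the names `study`, `stratified_rates` and `aggregate_rate` are illustrative choices, not part of any library.

```python
from fractions import Fraction

# Counts from the imaginary study: (people who lost weight, group size).
# The dictionary layout and all names here are illustrative assumptions.
study = {
    "tiramisu": {"women": (3, 10), "men": (77, 90)},
    "sprouts": {"women": (14, 30), "men": (61, 70)},
}


def stratified_rates(diet):
    """Share of people who lost weight, computed separately for each sex."""
    return {sex: Fraction(lost, total) for sex, (lost, total) in study[diet].items()}


def aggregate_rate(diet):
    """Share of people who lost weight, ignoring sex (all counts pooled)."""
    lost = sum(lost for lost, _ in study[diet].values())
    total = sum(total for _, total in study[diet].values())
    return Fraction(lost, total)


for diet in study:
    by_sex = {sex: f"{float(r):.1%}" for sex, r in stratified_rates(diet).items()}
    print(diet, by_sex, f"overall: {float(aggregate_rate(diet)):.1%}")
```

Exact fractions are used instead of floats so that the comparisons between groups are not affected by rounding. Within each sex the sprouts rate is the higher one, while the pooled rate is higher for tiramisu.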
But why does such a statistical phenomenon happen?
How does this happen and where does Simpson’s paradox lie?
In this case, men and women were not balanced across the two diet groups, which makes sex our confounding factor. If a subgroup has a much higher (or lower) number of observations than the others, the overall result can be dominated by this imbalance, which alters the average or the general trend.
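One way to see the effect of the imbalance is to write each overall rate as a weighted average of the subgroup rates, where the weights are the shares of women and men in each diet group. The notation below ($r$ for a rate, $n_g$ for the size of subgroup $g$, $N$ for the size of the diet group) is introduced here only for illustration; the numbers are the ones from our imaginary study.

```latex
\[
r_{\text{overall}} = \sum_{g} \frac{n_g}{N}\, r_g
\]
\[
\text{tiramisu:}\quad
\frac{10}{100}\cdot\frac{3}{10} + \frac{90}{100}\cdot\frac{77}{90} = \frac{80}{100}
\qquad
\text{sprouts:}\quad
\frac{30}{100}\cdot\frac{14}{30} + \frac{70}{100}\cdot\frac{61}{70} = \frac{75}{100}
\]
```

In the tiramisu group, men (who lose weight far more often than women on either diet) carry a weight of 90%, against 70% in the sprouts group. That difference in weights is enough to push the overall tiramisu rate above the overall sprouts rate, even though sprouts do better within each sex.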
This paradox is often encountered in statistical analyses in the social and medical sciences, and it can create serious problems, especially when correlations are interpreted as if they were causal relationships. For example, in clinical trials of new drugs, one drug may appear more effective than another when the data are aggregated, but once the data are separated into groups (by age or severity of the disease, say), the result is reversed, as in a very famous case concerning the treatment of kidney stones. The same thing happened as recently as 2021 with the vaccines against COVID-19 in England: when the age groups were considered separately, it clearly emerged that the mortality rate was much higher among the unvaccinated than among the vaccinated, both for people under 50 and for those over 50. Considering the entire population, however, the mortality rate paradoxically appeared lower among the unvaccinated. This is because the vaccination rate was higher among the elderly, who have a higher mortality risk in any case.
Being aware of Simpson’s paradox is therefore fundamental to interpreting data correctly and avoiding errors of evaluation.