June 24, 2025

How not to be fooled by chance: what statistical significance is

Statistical significance helps us understand whether chance is playing tricks on us or whether we can trust what we are seeing. When we toss a coin for heads or tails, for example, we know that it makes no difference which face we choose: the odds are 50/50. But have you ever wondered, when a coin always seems to land on the same face, whether the coin is rigged? Sometimes we notice something so unusual that it makes us think: "Is chance just playing tricks on us, or is there really something wrong?"

Put simply, when the probability of obtaining such a result by chance alone falls below a certain threshold, usually 5%, we can begin to hypothesize that what we observe is not a coincidence.

Heads or tails: how many tosses do you need to tell whether a coin is rigged?

Imagine tossing a coin four times and getting heads every time. Suspicious? Perhaps. But it could also be luck. Let's see what the numbers say: if the coin is fair, each toss has a 50% probability of landing heads. For 4 tosses, we multiply this probability by itself four times:

0.5*0.5*0.5*0.5 = 0.0625 = 6.25%

So the probability of getting 4 heads in a row is just over 6%. That is rare, but statistically not rare enough to say that the coin is rigged when heads-heads-heads-heads comes up. But what happens if we add a fifth toss?

0.5*0.5*0.5*0.5*0.5 = 0.03125 = 3.125%
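The two products above can be checked with a few lines of Python. This is only a minimal sketch: the function names `prob_all_heads` and `tosses_needed` are made up for illustration, and the coin is assumed to be fair with independent tosses.

```python
def prob_all_heads(n_tosses: int, p_heads: float = 0.5) -> float:
    """Probability that n_tosses independent tosses all show heads."""
    return p_heads ** n_tosses


def tosses_needed(threshold: float = 0.05, p_heads: float = 0.5) -> int:
    """Shortest run of heads whose probability falls below the threshold."""
    n = 1
    while prob_all_heads(n, p_heads) >= threshold:
        n += 1
    return n


if __name__ == "__main__":
    for n in (4, 5):
        print(f"{n} heads in a row: {prob_all_heads(n):.5f}")
    print("tosses needed to drop below 5%:", tosses_needed())
```

Changing `threshold` shows how the answer depends on the chosen cut-off: a stricter threshold simply requires a longer run of heads.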

Now the probability of getting 5 heads in 5 tosses drops below 5%, the main threshold researchers use to say "careful, this is plausibly more than a coincidence". In statistical terms, we have entered the realm of significance: the result is so extreme that it is unlikely to be the product of chance alone. But why was the 5% probability threshold chosen?

The 5% threshold in statistical significance: a choice of convenience

Statistical significance helps us distinguish a real effect from one due only to chance. But there is no universal significance threshold valid in every context. 5% is simply the most common one.

It was introduced by the British statistician Ronald A. Fisher in 1925, in his book Statistical Methods for Research Workers. His proposal was concrete: draw the line at 5%, that is, accept being wrong in the interpretation at most 1 time in 20. This choice was also motivated by the practical limitations of the time: calculations were all done by hand, and a fixed reference like 0.05 made the work easier.

Over time, however, this threshold has drawn criticism: Fisher himself, in a 1956 publication, stated that no fixed level of significance is valid for every situation. Common sense is needed. For example, CERN scientists use much stricter thresholds. In 2012, to confirm the discovery of the Higgs boson, statistical tests were used with a threshold of 5 sigma, a probability of approximately 1 in 3.5 million. This level of significance is the standard for declaring a discovery in particle physics.
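To see where "1 in 3.5 million" comes from, a sigma level can be converted into a probability. The sketch below assumes the figure refers to the one-sided tail of a normal distribution; the helper name `one_sided_tail` is hypothetical.

```python
import math


def one_sided_tail(n_sigma: float) -> float:
    """Probability that a standard normal variable exceeds n_sigma.

    Assumption: the quoted figure is a one-sided tail probability.
    """
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


if __name__ == "__main__":
    p = one_sided_tail(5)
    print(f"5 sigma: p = {p:.3e}, roughly 1 in {1 / p:,.0f}")
```

For comparison, the same function gives a probability close to 5% at about 1.645 sigma, which shows how much stricter the particle-physics standard is.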

Mrs. Bristol's tea: the experiment that revolutionized statistics

The need to set a threshold of statistical significance follows from the definition of so-called hypothesis tests: experiments built around a null hypothesis (which assumes that the observed effect is only a coincidence) and an alternative hypothesis (that the effect is not due to chance).

These statistical tests grew out of a very unusual experiment. In the early 1920s, Fisher found himself in a discussion with the English biologist Muriel Bristol, who claimed she could tell whether the milk had been poured before or after the tea.

Being a good statistician, Fisher did not just smile at his colleague's odd claim, but designed a rigorous test to check it, still known as Fisher's exact test. He took eight cups of tea, four with the milk poured first and four with it poured afterwards, and placed them on a table in random order. Under these conditions there are 70 different ways of dividing the cups into two sets of four, so Mrs. Bristol had only a 1 in 70 chance, about 1.4%, of correctly identifying all 8 cups by luck. Yet that is what she did: the probability that it was only chance was so low that it forced Fisher to reject the null hypothesis and accept that Mrs. Bristol really did have the ability she claimed. In other words, the result was statistically significant.
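The count of 70 and the 1/70 probability can be reproduced with the standard library. This is a sketch under the setup described above (eight cups, four of each kind, and a taster who knows the split and guesses at random); the function name `prob_correct` is an illustrative choice.

```python
from math import comb


def prob_correct(k: int, n_each: int = 4) -> float:
    """Chance of picking exactly k of the milk-first cups by pure guessing.

    The taster selects n_each cups out of 2 * n_each as "milk first".
    """
    total_ways = comb(2 * n_each, n_each)  # 70 when n_each is 4
    return comb(n_each, k) * comb(n_each, n_each - k) / total_ways


if __name__ == "__main__":
    print("ways to split the cups:", comb(8, 4))
    for k in range(5):
        print(f"{k} milk-first cups right: {prob_correct(k):.4f}")
```

Under these assumptions only a perfect score falls below the 5% threshold: getting at least three of the four milk-first cups right happens by luck in 17 of the 70 cases, about 24% of the time.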

Since then, statistical significance has helped us judge whether the result of a hypothesis test is so unlikely that it can hardly be attributed to chance. It keeps intuition or first impressions from making us mistake mere coincidences for real effects.

However, a significant result does not guarantee that an effect really exists: for example, a 5% threshold implies a false alarm 1 time in 20 even when there is no real effect. Likewise, a result that is not statistically significant does not prove that everything is down to chance. For more confirmation, experiments have to be repeated several times and checked for errors. In short, a single experiment is not enough. To avoid being fooled by chance, you need method. And patience. Because in science, a stroke of luck should always be verified.
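The point about false alarms can be illustrated with a small simulation. The sketch below repeats the five-toss experiment many times with a perfectly fair coin and counts how often it shows five heads anyway; the function name, the number of repetitions and the seed are arbitrary choices made for illustration.

```python
import random


def false_alarm_rate(n_experiments: int = 100_000,
                     n_tosses: int = 5,
                     seed: int = 0) -> float:
    """Fraction of experiments in which a fair coin shows all heads."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    alarms = 0
    for _ in range(n_experiments):
        # A toss counts as heads when the random draw is below 0.5.
        if all(rng.random() < 0.5 for _ in range(n_tosses)):
            alarms += 1
    return alarms / n_experiments


if __name__ == "__main__":
    print(f"fair coin flagged as rigged in {false_alarm_rate():.2%} of runs")
```

The rate should come out close to 3%, in line with the 0.03125 computed earlier: an honest coin still looks "significant" every so often, which is why a single experiment is not enough.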

Alexander Marchall

Alexander Marchall is a distinguished journalist with over 15 years of experience in the realm of international media. A graduate of the Columbia School of Journalism, Alex has a fervent passion for global affairs and geopolitics. Prior to founding The Journal, he contributed his expertise to several leading publications.