The insignificance of statistical significance

In most settings, statistical significance is not that significant. That is, I have reached the conclusion that even interpreting statistical significance correctly is not that important. The background is an old article in The Guardian I stumbled upon, which discussed the top 20 things politicians need to know about science.

One of the points in the article (point 13) is “Significance is significant,” with the following description: “Expressed as P, statistical significance is a measure of how likely a result is to occur by chance. Thus P = 0.01 means there is a 1-in-100 probability that what looks like an effect of the treatment could have occurred randomly, and, in truth, there was no effect at all. Typically, scientists report results as significant when the P-value of the test is less than 0.05 (1 in 20).”

I find such a description of the p-value misleading at best, and I wondered whether it was simply a case of a journalist getting it wrong, since the piece is based on a comment in Nature with twenty tips for interpreting scientific claims (from around the same time as the article, i.e., 2013). The joke goes along the lines of a scientist saying “My findings are pointless when taken out of context,” and the media reporting that a scientist claims “findings are pointless”. However, the comment contains the exact same description.

We all know that this is not how to interpret a p-value, and at the risk of being pedantic (not a risk I have ever taken seriously in my blogging), it is p, not P. The p-value is not the probability that the effect occurred randomly, nor is it the probability that the null hypothesis is true. It is the probability of obtaining data at least as extreme as those observed, assuming the null hypothesis is true. The point I want to make here is that even if the incorrect interpretation were true, it would not make that much of a difference.
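To make the distinction concrete, here is a minimal simulation of my own (not something taken from the article or the comment) showing why the share of false findings among “significant” results is not the same thing as the p-value threshold. The base rate of real effects, the effect size, and the sample sizes below are arbitrary assumptions chosen purely for illustration.

```python
# A minimal sketch illustrating why a p-value is not the probability that the
# null hypothesis is true. We simulate many two-sample studies in which only
# 10% of hypotheses have a real effect, and then ask: among results with
# p < 0.05, how many come from true nulls? All numbers are assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_studies = 10_000
n_per_group = 30
share_real_effects = 0.10   # assumed base rate of true effects
effect_size = 0.5           # assumed standardised mean difference

has_effect = rng.random(n_studies) < share_real_effects
p_values = np.empty(n_studies)

for i in range(n_studies):
    control = rng.normal(0, 1, n_per_group)
    shift = effect_size if has_effect[i] else 0.0
    treatment = rng.normal(shift, 1, n_per_group)
    p_values[i] = stats.ttest_ind(treatment, control).pvalue

significant = p_values < 0.05
false_discoveries = significant & ~has_effect

print(f"Significant results: {significant.sum()}")
print(f"...of which true nulls: {false_discoveries.sum()} "
      f"({false_discoveries.mean() / significant.mean():.0%})")
```

With these made-up numbers, roughly half of the results clearing p < 0.05 come from true nulls, which is very different from the “1-in-20” reading offered in the article.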

The example above helped me understand why I do not really care about statistical significance. That is, even if the p-value did indicate the probability that an effect occurred randomly (e.g., a 1-in-100 probability that the effect is due to chance), it would not make the result more or less impressive. Should I care whether there is a 1% probability of the result being due to chance or, say, a 10% probability? I don’t think so.

More specifically, even if we could give frequentist p-values a Bayesian interpretation, it would not make them significantly more important. At the end of the day, we are still dealing with a single p-value, often one of only a few p-values reported from a much larger batch analysed by the scientists, which limits its relevance regardless of whether it is interpreted correctly or incorrectly.
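To illustrate the “big batch” point, here is another hedged sketch: 100 two-sample tests run on pure noise, so every null hypothesis is true by construction. The number of tests and the sample size are again assumptions for illustration only.

```python
# A sketch of the "big batch" problem: when many outcomes are tested and only
# the smallest p-values get reported, small p-values arise even when every
# null hypothesis is true. The 100 noise-only tests are an assumption.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_tests = 100
n_per_group = 50

p_values = []
for _ in range(n_tests):
    # Both groups are drawn from the same distribution: no real effect anywhere.
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
print(f"Tests with p < 0.05: {(p_values < 0.05).sum()} out of {n_tests}")
print(f"Smallest p-value: {p_values.min():.4f}")
```

On average, about five of these tests come out “significant” at the 5% level despite there being nothing to find, which is why a single small p-value plucked from a larger batch tells us little on its own.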

I cannot recall a single p-value that I find important in all of social science. This is not to say that specific p-values were not important in evaluating a particular finding, or putting it into context, but that they – in the grand scheme of things – are rather insignificant.

This is not to say that papers should not report p-values, and I am still in favour of scientists justifying and reflecting upon the use and interpretation of such values. However, these values should be insignificant in the grand scheme of things, including in the decision of whether a research project is deemed successful. In other words, we should care more about the limitations of p-values in a literature than about the interpretation of a p-value in a single study. Statistical significance is simply not that significant.