Two months ago, a preprint suggested that the scientific community do something huge. The 72 authors of the paper (behavioural economist Daniel Benjamin et al.) recommended changing the threshold for defining “statistical significance” from a p-value of 0.05 to 0.005, claiming that it would help alleviate the ongoing “replication crisis” plaguing psychological and biomedical research.
My PhD student friends and I had a good chuckle about it, half-jokingly lamenting that, if Benjamin et al. got their way, we’d have a much harder time getting our degrees. Publishing scientific papers is a requirement for being awarded a PhD at the Charité, and – unfortunately for us and for science – that is extremely difficult to do without “statistically significant” results.
But jokes aside, there are several reasons why lowering the threshold is a bad idea. And it’s crucial that early career researchers, in particular, understand why.
A few days ago, we responded to Benjamin et al. In our paper, we point out that there is little empirical evidence that changing the statistical significance threshold will make studies more replicable. Their recommendation also distracts from the real issues and can have harmful consequences for how research resources are allocated.
So what do we suggest instead? Just the uncomfortable truth – that there is no quick and easy fix. After countless hours brainstorming ideas, designing experiments, collecting data, and reading up on the latest research, scientists don’t have much left in the tank. Few of us actually think carefully about how we use statistics, and far fewer still do any more than rely on a strictly binary (significant vs non-significant) interpretation of the p-value. Those little asterisks on our plots and tables save us a lot of mental effort.
But this just doesn’t work – it never will, not with any blanket threshold. Properly interpreting our data will require us to do more – even after we think we’re finished and ready to write up the results. This makes our jobs harder, but there’s simply no avoiding it if we’re to practise good science. What’s worse, we don’t yet know exactly what “doing more” entails – we have some ideas, but there’s a ton of work to be done figuring this out.
So what’s the harm in changing a threshold if the alternative is scratching our heads pondering a bunch of other things no one can even agree on?
Well, keep in mind that shifting the p-value threshold isn’t just about statistical rigour. It can have far-reaching consequences like influencing which studies get done (favouring “novel” instead of replication studies) and encouraging the wasteful use of animals in preclinical research.
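To put a rough number on the resource side of this, here is a minimal sketch of how the required sample size changes with the threshold. Everything in it is an assumption made for illustration, not something taken from either paper: a simple two-group comparison, a two-sided test, the normal approximation, 80% power, and a standardised effect size of 0.5. The function name `n_per_group` is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist().inv_cdf           # quantile function of the standard normal
    z_alpha = z(1 - alpha / 2)         # critical value for the two-sided test
    z_power = z(power)                 # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


d = 0.5  # assumed standardised effect size, purely illustrative
n_old = n_per_group(d, alpha=0.05)
n_new = n_per_group(d, alpha=0.005)

print(f"alpha = 0.05:  {n_old} per group")
print(f"alpha = 0.005: {n_new} per group")
print(f"increase: {100 * (n_new / n_old - 1):.0f}%")
```

Under these assumptions, the stricter threshold calls for roughly 70% more subjects (or animals) per group to keep the same power. Different designs and effect sizes will give different numbers, so treat this as a back-of-the-envelope check, not a power analysis.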
Also, Benjamin et al.’s suggestion takes us one step forward and two back. I’ve only been in science a few years and even I’ve noticed the progress that has been made in how we think about and use statistics. Few can argue with the fact that this has largely been due to a bottom-up effort, with more junior scientists driving the change.
Now here comes a paper penned by dozens of leading (and senior) scientists that shifts the focus back onto p-values and effectively says: “Let’s just change this number and everything will be tip-top – go about with your science as you were.” Think of what that does to the graduate student trying to convince her supervisor that not everything that glitters (***) is gold.
There’s absolutely no doubt in my mind that Benjamin et al. do not intend their recommendation to be interpreted in the way I described above. But that’s irrelevant. It has been said so often that it has almost lost its meaning, but the inventor of the p-value himself never intended it to be used the way it has been. That didn’t stop people from misusing it – and this might very well happen again if the new threshold lulls people into a false sense of security. We, the scientific community, should know better.
I know what you’re thinking – why not leave this whole discussion to the pros? But it can’t be entirely up to statisticians – after all, they’ve been trying, mostly unsuccessfully, to cure us of our obsession with p-values forever.
One thing’s for sure – don’t just take anyone’s word for it. Read up on both arguments and educate yourselves. I know most of us feel like we don’t know enough about statistics to even think about things like this. But statistics are an essential part of our work as scientists, so it’s our duty to inform ourselves and put in our two cents.