Precision is nice but oomph is the bomb

This is the fun title of a chapter section in a book called The Cult of Statistical Significance, which is as entertaining as you would expect any book with both cult and statistics in the title to be. I have not read all of it, but a lot of it seems like one long attack on p-values.

P-values are extensively used in science, particularly in biology and medicine. The Wikipedia page has the details of p-values, but the basic idea is as follows.

Let us say you are testing a new drug on 200 people, half of whom get the drug and half of whom get just a sugar pill. Say 50 of the 100 people who get the drug get better, while only 36 of the 100 who do not get the drug get better. Your question is then: is this difference of 50 – 36 = 14 people significant? Or could it have arisen just by chance, for example because the 100 people who got the drug happened to contain an unusually high fraction of people who would have got better anyway?

This is what the p-value does. It is used to test whether data show a significant effect or not. You calculate the p-value from the data, and the typical threshold for significance is P = 0.05. This is supposed to mean that, if the drug actually did nothing, there would be only a probability of 0.05, or a 5% chance, of seeing a difference this large just by chance, and so people take it to mean there is a 95% chance that the effect is real.
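To make that concrete (this is my illustration, not from the book), here is a minimal sketch in Python of how a p-value for the drug example above might be computed, using a standard test for a two-by-two table, Fisher's exact test from scipy:

```python
# Drug example from above: 50 of 100 people on the drug got better,
# while 36 of 100 people on the sugar pill got better.
from scipy.stats import fisher_exact

table = [[50, 50],   # drug group: better, not better
         [36, 64]]   # placebo group: better, not better

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"p = {p_value:.3f}")
# By the convention above, the difference is "significant" if p < 0.05;
# for these counts the p-value comes out close to that threshold.
```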

So if you calculate P = 0.03, say, from your data on the 50 people who got better, then you say that there is only a 3% chance of your results arising by chance, and so you conclude the result is significant: that the drug works.

This reasoning is very widely used, and it annoys the authors of The Cult of Statistical Significance, Ziliak and McCloskey, so much that they wrote a 322-page book about it.

I have been thinking about what Ziliak and McCloskey say, and in particular about whether it means that we should not use a p-value test on data on protein dynamics. This is data from work a PhD student and I are doing with coworkers at King's College London.

Our question is: is there a significant difference between the times two different proteins spend at a membrane in a muscle cell? The two proteins are the usual form we all have in our bodies and a mutant version. At the moment, the data we have would clearly fail the p-value test, i.e., the test would tell us there is not a significant difference between the two times.
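For illustration only (the numbers below are made up, and this is just a standard two-sample test, not our actual analysis), a comparison like ours might look something like this:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical residence times at the membrane, in seconds.
times_normal = np.array([2.1, 3.4, 1.8, 2.9, 3.1, 2.5])
times_mutant = np.array([2.4, 3.0, 2.2, 3.3, 2.8, 2.7])

stat, p_value = mannwhitneyu(times_normal, times_mutant,
                             alternative="two-sided")
print(f"p = {p_value:.2f}")
# With small, heavily overlapping samples like these, p is well above
# 0.05, so the conventional test reports no significant difference.
```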

Now it is unlikely that the two proteins spend absolutely identical times at the membrane. And the way the p-value test works means that if we collected enough data points, the data would eventually pass the test, i.e., it would say that there is a significant difference. We are not going to do this: the student and postdoc do not have the time. But the fact that, given enough data, this test will always tell you there is a difference is a fault of the p-value test.
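A quick simulation shows this fault (a minimal sketch; the true difference of 0.01 standard deviations is an arbitrary, deliberately tiny choice):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
tiny = 0.01   # true difference between the two populations

for n in (100, 10_000, 1_000_000):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(tiny, 1.0, size=n)
    _, p = ttest_ind(a, b)
    print(f"n = {n:>9,}: p = {p:.3g}")
# At the smaller sample sizes p will usually be well above 0.05, but
# by a million points per group it is minute: the test declares the
# negligible difference "significant" purely because n is large.
```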

The fault lies not in the maths itself, which is correct and precise, but in the question being asked. We are not interested in whether or not the two times are absolutely identical; we want to know if there is a big difference between them. Only a big difference is a result with oomph, and a p-value test does not answer that question.
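One alternative (my sketch, not a recipe taken from Ziliak and McCloskey) is to estimate how big the difference actually is, with a confidence interval, and then judge whether a difference of that size matters:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=1_000_000)   # stand-in for the normal protein
b = rng.normal(0.01, 1.0, size=1_000_000)  # "mutant" with a tiny true difference

diff = b.mean() - a.mean()
# Standard error of the difference between the two sample means.
se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
print(f"difference = {diff:.4f} +/- {1.96 * se:.4f} (95% confidence interval)")
# A p-value test would call this difference "significant", but the
# interval shows it is only about 0.01 of a standard deviation:
# precise, but with no oomph.
```

For our proteins, the analogous question is whether any difference in the membrane times is big enough to matter biologically, not whether it is exactly zero.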