Spiegeloog 404: Experience

Ivory Tower: Paddle like crazy

By Denny Borsboom, May 25, 2020

There are certain complaints about psychological science that arise with clockwork regularity. Although they are never quite gone, the choirs of critics that lament these issues reach full volume at certain points in time, and then recede into the background until it's time to rise again.

In my own field, criticisms of the null hypothesis test have this property and are currently at maximum strength. I have lived through one of these bouts before (around 2000), but if you go back and study the literature you can find such eruptions every two or three decades. There were others in the late 1980s, the mid-1970s, and an early one around 1960. Interestingly, critics rarely tread new ground; if you read, say, Rozeboom’s 1960 classic “The fallacy of the null-hypothesis significance test”, and follow the literature from there, you’ll find that most of the arguments are periodically recycled. The same arguments erupt every two decades or so, like a methodological cicada.

An example of a criticism that is invariably adduced is that psychologists don’t actually understand what a p-value means. To argue this point, critics typically showcase incorrect interpretations, like “the p-value is the probability that the null hypothesis is correct, given the data”, and show how they lead to horrible scientific accidents. (The correct interpretation runs the other way around: the p-value is the probability of obtaining data at least as extreme as those observed, assuming the null hypothesis is true.) My colleague E.J. Wagenmakers even administered a test in which he let researchers choose among interpretations of the p-value to show they could not choose correctly (although I think he rigged the outcome by not including the correct interpretation among the answer categories in the first place).
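
The difference between the two readings is easy to see in a simulation. Below is a minimal sketch (my own illustration, not from the column; the base rate of true nulls, the effect size, and the sample sizes are all assumptions made for the example): among studies where the null is true, p < .05 occurs 5% of the time by construction, yet among studies that reach p < .05, the proportion of true nulls can land far from 5%.

```python
# Toy simulation contrasting P(p < .05 | H0 true), which is .05 by construction,
# with P(H0 true | p < .05), which also depends on base rates and power.
# All numbers below (prior, effect size, n) are assumed for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim, n = 100_000, 30       # 100k simulated two-group studies, n = 30 per group
prior_h0 = 0.8               # assumed: 80% of tested hypotheses are true nulls
effect = 0.5                 # assumed: Cohen's d = 0.5 when the null is false

h0_true = rng.random(n_sim) < prior_h0
x = rng.normal(0.0, 1.0, (n_sim, n))
y = rng.normal(np.where(h0_true, 0.0, effect)[:, None], 1.0, (n_sim, n))
p = stats.ttest_ind(x, y, axis=1).pvalue   # one two-sample t-test per study

sig = p < 0.05
print(f"P(p < .05 | H0 true) = {sig[h0_true].mean():.3f}")  # ~.05 by construction
print(f"P(H0 true | p < .05) = {h0_true[sig].mean():.3f}")  # ~.30 here, not .05
```

With these (assumed) numbers, a significant result still has about a 30% chance of coming from a true null, which is exactly why the two interpretations must not be conflated.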

I never quite get the force of this argument. Sure, many researchers cannot give you the correct definition of a p-value. Neither can they give you the correct definition of Cronbach’s alpha, factor loadings, variance components, interaction effects, Bayes Factors, or correlation coefficients. The interpretation of statistical methods is tricky and rarely a favorite topic among psychologists, so they keep forgetting what it all means. Having said that, I am quite sure that the situation isn’t much better in other fields; biologists, economists, and physicists are, in my experience, just as bad at statistics. In physics, the so-called five-sigma policy comes uncomfortably close to the idea that a really small p-value can prove the existence of particles. You wouldn’t even get away with that in psychology.
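
For the record, “five sigma” is just a very small p-value: the tail area of a standard normal beyond z = 5, conventionally computed one-sided. A quick check (my own aside, not the column’s):

```python
# Tail probabilities of the standard normal: "k sigma" as a one-sided p-value.
from scipy.stats import norm

for k in (2, 3, 5):
    print(f"{k} sigma: one-sided p = {norm.sf(k):.2e}")
# 2 sigma: one-sided p = 2.28e-02
# 3 sigma: one-sided p = 1.35e-03
# 5 sigma: one-sided p = 2.87e-07
```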

In 1997, Harlow, Mulaik, and Steiger edited a book with the title “What if there were no significance tests?”. The most amusing contribution, amid the obligatory methodological criticisms, was Abelson’s “If there were no significance tests, they would be invented”. His point was that null hypothesis tests fulfill an important function in scientific practice, namely that of keeping our heads above water: “we are awash in a sea of uncertainty, caused by a flood tide of sampling and measurement errors, and the best we can do is to keep our heads above water, and paddle like crazy”. In this chaos, he maintained, people need something to hold onto, and whether that is a p-value, a confidence interval, or a Bayes Factor doesn’t really matter. Of course, being a methodologist, I have to disagree with that on professional grounds. But as a human being, I certainly can see the point.

Denny Borsboom
