
More Power to Psychology

By Valentin Weber | April 23, 2021

One could argue that psychology – like many other sciences – is underappreciated by the public. If its findings were taken more seriously, policymakers could improve their decision-making, students could learn better, and people would overall be happier. Yet, before psychological research is used to fix society’s problems, shouldn’t it solve its own issues first?


Photo by Bersson29

Imagine that a politician stumbles upon research (Rauscher, Shaw, & Ky, 1993) indicating that listening to classical music improves spatial reasoning skills. ‘This is an excellent opportunity to use scientific research to improve our society!’, the politician thinks. Hence, he convinces his fellow policymakers to fundamentally change the country’s education system by making students listen to classical music for 10 minutes before each math class. With this intervention, thousands of schoolchildren would improve their mathematical ability. Right?

Not so fast. You were probably already suspicious of this research, and as it turns out, studies find no consistent effect of music on cognitive ability (Mehr, Schachner, Katz, & Spelke, 2013). Consequently, there is no reason to assume that such a change to the education system would be anything more than a huge waste of time for everyone involved.

The obvious point here is that policies based on scientific research are only beneficial if the research is actually correct. In the worst case, interventions developed from invalid research might even be counterproductive and harmful – for example, if listening to classical music in fact worsens children’s ability to learn. This is especially relevant for psychology, since my beloved discipline is exceptionally affected by the replication crisis (i.e., researchers who rerun studies often don’t find the effects suggested by the original studies). Hence, we should be hesitant to promote findings from psychological research to the public unless we are reasonably certain that the research is accurate and not affected by the issues that so many psychological studies suffer from.

Now I’m sure you’re wondering: ‘Why, then, does this article’s title suggest that we should give more power to psychological research?’ Well, let’s take a few steps back and look at some of the origins of the replication crisis. For this, I will assume that you have a basic background in psychological research methodology (statistical significance, type 1 and type 2 errors, etc.).

“a research study is unlikely to get published if you don’t have significant results”

The most obvious problem is that many psychological researchers engage in bad research practices, such as p-hacking and HARKing. As an illustration of the former, assume that you are investigating the relationship between music and cognitive ability. To your dismay, you find that your results are not significant, and so you start tweaking your study: you exclude a few outliers, you slightly change how you measure your concepts, and voilà – you have p-hacked your results into significance. You can make it even easier for yourself with HARKing (hypothesizing after the results are known): you simply take one of the huge datasets available online containing hundreds of variables and run lots of correlations. At some point, you will find a significant correlation just by chance. When this happens, you deceive yourself into thinking: ‘Well, this is what I hypothesized all along!’ And again, you end up with a result that falsely suggests an effect where there is none.
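To see how quickly HARKing produces such false findings, here is a minimal simulation sketch, assuming purely random data with no true effects at all (the dataset dimensions and number of runs are arbitrary choices for illustration):

```python
# A minimal simulation of HARKing: generate purely random data with no true
# effects, test every pairwise correlation, and count how often at least one
# comes out 'significant' at the conventional 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_subjects, n_variables = 100, 20  # arbitrary illustrative dimensions
n_runs = 1000
false_positive_runs = 0

for _ in range(n_runs):
    data = rng.normal(size=(n_subjects, n_variables))  # no real effects here
    p_values = [
        stats.pearsonr(data[:, i], data[:, j])[1]
        for i in range(n_variables)
        for j in range(i + 1, n_variables)
    ]
    if min(p_values) < 0.05:  # at least one 'significant' correlation found
        false_positive_runs += 1

print(f"Runs with at least one false positive: {false_positive_runs / n_runs:.0%}")
```

With 20 variables there are already 190 pairwise tests, so nearly every run yields at least one spurious ‘finding’ – even though, by construction, nothing real is there.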

Now you might be wondering: if it’s that easy to understand why these research practices are bad, why do professional researchers still fall prey to them from time to time? Why is it so important for researchers to find significant effects that they compromise the validity of their studies to reach the desired significance level? Again, the answer is quite straightforward: because a research study is unlikely to get published if you don’t have significant results. Bearing in mind that publications largely determine your success as an academic, there is a strong incentive to make the results significant in some way, just to get the study published. In short, the publication bias of scientific journals is one of the underlying reasons why researchers engage in the bad research practices illustrated above.

I know what you’re thinking: ‘Why don’t scientific journals accept results that don’t show an effect?’ After all, just because a study didn’t find an effect, this doesn’t mean that its methodology was somehow flawed. It might just mean that the hypothesized effect doesn’t exist. The problem is that the established standards for type 1 errors (false positives) and type 2 errors (false negatives) are very different. On the one hand, the highest accepted type 1 error rate is 5% (i.e., if there is, in reality, no effect, only 5% of studies would wrongly conclude that there is one). On the other hand, the accepted type 2 error rate is a whopping 20% (i.e., if there is an effect, 20% of studies would wrongly conclude that there is none). Hence, a random scientific paper that does not find an effect is more likely to be wrong than a random paper that does. Ultimately, this leads journals to discriminate against the former, which explains their publication bias.
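A quick back-of-the-envelope calculation makes this asymmetry concrete. Suppose – purely as a hypothetical assumption – that half of all tested effects are real, and that studies operate exactly at the conventional error rates:

```python
# Back-of-the-envelope check: among studies run at the conventional error
# rates, how many are wrong? Hypothetical assumption: half of all tested
# effects are real.
prior_real = 0.5
alpha, beta = 0.05, 0.20  # conventional type 1 and type 2 error rates

# Studies reporting an effect: true positives vs. false positives.
true_pos = prior_real * (1 - beta)         # 0.400
false_pos = (1 - prior_real) * alpha       # 0.025
wrong_among_positive = false_pos / (true_pos + false_pos)

# Studies reporting no effect: true negatives vs. false negatives.
true_neg = (1 - prior_real) * (1 - alpha)  # 0.475
false_neg = prior_real * beta              # 0.100
wrong_among_negative = false_neg / (true_neg + false_neg)

print(f"Wrong among studies finding an effect:  {wrong_among_positive:.1%}")  # 5.9%
print(f"Wrong among studies finding no effect:  {wrong_among_negative:.1%}")  # 17.4%
```

Under these (admittedly simplified) assumptions, roughly 6% of positive results are wrong but about 17% of null results are – exactly the asymmetry that makes journals wary of publishing null findings.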

“Whether or not an article will be accepted for publication thereby no longer depends on the results.”

But again, you can ask the ‘why’ question to dig deeper. How come there are different standards for type 1 and type 2 errors in the first place? Why do researchers care more about a low type 1 error rate than about a low type 2 error rate? Because of the consequences. If a study finds an effect that does not in fact exist (a type 1 error), then all the research money used to further investigate the purported effect goes to waste, and the same is true for resources spent on developing practical interventions based on the supposed effect – like the fictitious politician who reshapes the country’s education system based on incorrect research. If it is the other way around and a study does not find an effect that actually exists (a type 2 error), the consequences are less dramatic: sooner or later, someone else will investigate the topic once more and find the effect, and until then, everything simply stays as it was.

Now that we have a grasp of some of the issues tormenting psychological research today, the question is: what can we do about it? The problem of publication bias can be addressed effectively by submitting registered reports rather than classic scientific articles. In a registered report, the article is accepted for publication before the researchers collect the data, which makes it impossible for the journal to be biased towards significant results. Whether or not an article will be accepted for publication thereby no longer depends on the results.

Of course, for a registered report to be accepted by a scientific journal, the report often has to improve on a regular scientific paper in a specific way – and by this point, I guess you will have figured out what the title of this article actually refers to. If a journal agrees to publish a result that shows no effect, it will want to make sure that the study is unlikely to make a type 2 error – or, in other words, that the study has sufficient statistical power. If a study has high statistical power, it is more likely to find the effect if it actually exists. The larger the statistical power, the smaller the type 2 error rate (power = 1 – type 2 error rate). Hence, if a highly powered study does not find an effect, we can be reasonably confident that the effect doesn’t exist.
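This is why registered reports typically require an a-priori power analysis: before collecting any data, you calculate how many participants you need to reach, say, 80% power. A minimal sketch using the statsmodels library – the effect size of 0.4 is a hypothetical assumption; in practice you would base it on previous research:

```python
# A-priori power analysis for a two-group comparison, using statsmodels.
# The effect size (Cohen's d = 0.4) is a hypothetical assumption.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.4,  # assumed standardized mean difference (Cohen's d)
    alpha=0.05,       # accepted type 1 error rate
    power=0.80,       # desired power, i.e. 1 - type 2 error rate
)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 100
```

Note that because the required sample size grows roughly with the inverse square of the effect size, halving the assumed effect roughly quadruples the number of participants needed.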

“before a psychological theory is used to shape our society, we should make sure that the research on which the theory is based is valid”

How do we increase the power of psychological studies (without increasing the type 1 error rate)? By using more participants: if more subjects take part in your study, random fluctuations have less impact – just like it’s better to toss a coin 100 times than 10 times if you want to figure out whether the coin is fair. There is a small chance that a fair coin shows heads 10 times in a row, but the chance that it shows 100 heads in a row is vanishingly small. In the same way, if you have only 10 people in each group of your psychology experiment, the chance that the groups differ merely because of random fluctuation is much higher than if you had 100 people in each group.
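The coin analogy is easy to check by simulation. The sketch below (the 0.2 threshold and the number of runs are arbitrary illustrative choices) estimates how often a fair coin merely *looks* unfair at two sample sizes:

```python
# Simulating the coin analogy: how often does a fair coin look unfair
# (observed proportion of heads off by 0.2 or more) at two sample sizes?
import numpy as np

rng = np.random.default_rng(0)
n_runs = 100_000  # arbitrary number of simulated experiments

for n_flips in (10, 100):
    proportions = rng.binomial(n_flips, 0.5, size=n_runs) / n_flips
    looks_unfair = np.mean(np.abs(proportions - 0.5) >= 0.2)
    print(f"{n_flips:>3} flips: looks unfair in {looks_unfair:.1%} of runs")
```

With 10 flips, a fair coin looks ‘unfair’ in roughly a third of the runs; with 100 flips, almost never. The same logic applies to the participants in your experiment.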

In conclusion, psychological research should definitely have more power, but not always in the way you would expect. I believe that psychological research has important applications that are relevant to the public, but before a psychological theory is used to shape our society, we should make sure that the research on which the theory is based is valid. Unfortunately, the replication crisis that has unfolded in recent years shows that much of it is not. You, dear reader and potential psychological researcher, can help us bring psychological science back on track – and increasing statistical power will be one of the ways to do so.

References

– Mehr, S. A., Schachner, A., Katz, R. C., & Spelke, E. S. (2013). Two randomized trials provide no consistent evidence for nonmusical cognitive benefits of brief preschool music enrichment. PLoS ONE, 8(12), e82007. https://doi.org/10.1371/journal.pone.0082007
– Rauscher, F. H., Shaw, G. L., & Ky, K. N. (1993). Music and spatial task performance. Nature, 365(6447), 611.
