Spiegeloog 418: Time

Misused Metrics and the Quest to Quantify Good Science

April 15, 2022

About the Author

Nita is a final-year psychology student at the UvA, specializing in Brain & Cognition with a minor in Biomedical Sciences: Neurobiology. Interested in pursuing a career in psychopathological research, she is passionate about transparent scientific communication.

Humans have an innate need to quantify success in an objective and simple manner. We operationalize our achievements in order to estimate our position in a hierarchy as accurately as possible, be it as the fastest runner in the world or as the student who writes the best philosophical essays. These accomplishments then lead to acknowledgements – a gold medal with a fancy ceremony, the highest possible grade and a special mention by one’s teacher, and, in academia, the best-paying tenure-track position at a high-ranking university. Inevitably, the need has arisen for a parameter that can reliably estimate our capacities as researchers. This has resulted in the invention of numerous metrics, such as the impact factor and the h-index, both of which are actively used in the evaluation of individual researchers.

An impact factor (IF) quantifies the influence of a scientific journal in statistical terms. It represents the average number of citations received in a given year by the articles the journal published over the preceding two years and can, arguably, be used to establish the relevance of the researched topic and its impact on the scientific community. Originally, the IF helped librarians determine which journals to purchase for the university’s collection, but it has since been employed by parties with a different agenda: the ranking and recruitment of individual researchers. Nowadays, the IFs of the journals an academic publishes in may be looked at by universities’ admissions and registration offices as well as the public bodies that fund their research.
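For readers who want to see the arithmetic, below is a minimal sketch of the standard two-year calculation. The function name and the figures are invented for illustration; the official values are computed by Clarivate from Web of Science citation data.

```python
# A minimal sketch of the two-year impact factor, with made-up numbers
# for a hypothetical journal.

def impact_factor(citations_this_year: int, citable_items: int) -> float:
    """Citations received this year to articles the journal published in
    the previous two years, divided by the number of citable items it
    published in those two years."""
    return citations_this_year / citable_items

# Hypothetical journal: 120 articles published across 2020 and 2021,
# cited 300 times in 2022.
print(impact_factor(citations_this_year=300, citable_items=120))  # 2.5
```

Note that the whole journal shares one numerator, so a handful of highly cited papers can carry an otherwise average outlet; the score says nothing about any individual article, let alone any individual author.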

Giving the IF this much weight is especially problematic considering that drawing individual-level conclusions from population-level data constitutes an ecological fallacy. In other words, long-run, journal-level citation statistics are misinterpreted as if they told us something about an individual author, when in reality the only thing they tell us is the journal’s impact. As an example of how sensitive the IF is to things that have nothing to do with quality, consider how it favours researchers in fields with a particular pattern of literature growth. In biology, for instance, the body of work expands rapidly and touches numerous research areas, leading to many references per article and a faster publishing rate. In physics and mathematics, by contrast, research proceeds much more slowly and prior studies are not cited as heavily. The result is higher IFs for biologists than for mathematicians, not because either is more qualified than the other, but because the parameter conflates good quality with fast production. Although each grant application is, of course, evaluated individually and the researchers’ respective departments are taken into account, this is an important demonstration of how easily the IF is influenced by irrelevant factors.

Attempts to avoid this fallacy have resulted in numerous metrics that aim to quantify the success of an individual academic rather than a journal. The h-index, for instance, is a parameter that summarizes an author’s productivity (measured by the number of publications) and the impact of those publications (measured by how often they are cited in other papers). Its use can thus incentivize editors to improve their journals by publishing relevant, high-quality science, and universities to employ the most talented and passionate researchers. The h-index has also been shown to correlate with clear markers of a researcher’s success, such as holding positions at high-ranking universities and even winning a Nobel Prize. It remains an open question, however, whether this positive relationship exists independently of the prevailing academic culture or whether we have merely created it by reserving those positions and prizes for the researchers with the highest scores. Nevertheless, the problematic nature of the h-index becomes especially evident when we zoom in on how it can be increased: by publishing more and by getting more of one’s work cited.
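The calculation behind the h-index is simple enough to sketch in a few lines; the citation counts below are invented for illustration.

```python
# A minimal sketch of the h-index: the largest h such that the author
# has at least h papers with at least h citations each.

def h_index(citation_counts: list[int]) -> int:
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # there are still `rank` papers with at least `rank` citations
        else:
            break
    return h

print(h_index([25, 8, 5, 3, 3, 0]))    # 3
print(h_index([2, 2, 2, 2, 2, 2, 2]))  # 2: many papers, few citations each
```

One property follows directly from the definition: a single highly cited paper can never lift the index above one on its own, which is part of why the metric rewards publication volume as much as impact.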

“there is a fine line between encouraging and pressuring authors to publish”

By publishing more, authors are encouraged to investigate a topic more deeply and to find their niche. Needless to say, focusing on a specific nucleus group instead of the brain as a whole in neuroscience, or on a particular cluster of personality disorders instead of the whole DSM in clinical psychology, may prove more fruitful in terms of scientific breakthroughs. Although knowledge of multiple fields and the ability to bridge between them are the marks of a great problem-solver, at these fundamental levels of research they are not necessarily as needed. Steering researchers toward specialization in this way will ultimately lead them to publish more than if everyone remained a generalist; a great example of both the ancient and the modern theories of the division of labour. However, there is a fine line between encouraging and pressuring authors to publish – it is not unheard of for researchers to feel forced to publish and to drift toward questionable research practices. For example, it is widely known that authors often trade trivial favours to get their names on each other’s papers (e.g., “check my abstract for spelling and I will add you as one of the authors”) in order to inflate their publication counts. Compared to the sharing of p-hacked findings (results obtained by selectively analyzing or reporting data until they cross the significance threshold), which are therefore fundamentally ungeneralizable, such exchanges fall into the category of the lesser evil. Of course, pressuring scientists to produce more studies has broader implications for scientific advancement, too. As it is, researchers may choose their specialization not out of passion for the subject but purely because they know how well it performs on the metrics.
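To make that parenthetical definition concrete, here is a toy simulation, not taken from the article, of what happens when a researcher measures several unrelated outcomes with no true effect anywhere and reports only the best-looking one.

```python
# A toy simulation of one questionable research practice: testing several
# outcomes when no true effect exists and reporting only the smallest p-value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n_studies, n_outcomes, n_per_group = 2000, 5, 30

significant = 0
for _ in range(n_studies):
    smallest_p = min(
        stats.ttest_ind(rng.normal(size=n_per_group),        # "control" group
                        rng.normal(size=n_per_group)).pvalue  # "treatment" group
        for _ in range(n_outcomes)
    )
    if smallest_p < 0.05:  # report only the best-looking outcome
        significant += 1

# With five independent outcomes, roughly 1 - 0.95**5, about 23%, of these
# effect-free studies come out "significant", far above the nominal 5%.
print(significant / n_studies)
```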

The other way the h-index can be increased is by getting more of one’s articles cited. In principle, this encourages authors to work on meaningful, cutting-edge questions that have a great impact on the scientific community and on the advancement of science: when a research question feeds into multiple ongoing research programmes across sub-fields of the discipline, more academics will inevitably use it in the theoretical frameworks of their own papers. Here too, however, the encouragement breeds problems: one’s citation count can be raised not only through meaningful findings but also through dubious methods such as self-citation. Although using one’s own prior findings to substantiate the background of a current paper is allowed, if everyone does so, the network of publications quickly becomes a closed loop.

It would be comfortable to conclude that these indexes’ only pitfalls lie in their respective formulas. In reality, however, there are substantial financial interests, and therefore gatekeeping, hidden behind the quest to quantify good science. The official impact factors are calculated by Clarivate, an American analytics company that sells the data to universities through the Web of Science in exchange for an annual subscription fee of over €180,000. Fortunately, though, this fee does not only buy recruiters the ability to select the best candidates; it also gives universities’ students and staff access to browse others’ publication indexes.

“the open science movement proposed their solution – drop the indexes completely and focus on qualitative metrics such as commitment to teamwork and transparency in research”

As we can see, although these indexes were originally introduced with good intentions, they have been and continue to be misused. As early as 1999, in a monumental essay titled Scientific Communication — A Vanity Fair?, Georg Franck discussed the use of citation indexes and pointed out their capacity to produce a kind of shadow economy in which self-citations and even completely counterfeit references are created. Roughly two decades later, campaigns such as the open science movement have finally addressed the problem hands-on and proposed their solution – drop the indexes completely and focus on qualitative criteria such as commitment to teamwork and transparency in research. Having had to admit that the journals’ tight grip on the null ritual is unlikely to loosen anytime soon, we have come up with a compromise: accept that the traditional threshold of significance has to be reached in order to get work published, but do not then require researchers to rack up publications and citation counts. As a result, institutions such as Utrecht University (UU) have recently announced their commitment to stop using the IF, among other publication indexes, as a measure of a researcher’s scientific success. Briefly and to the point: “[Impact factors] don’t really reflect the quality of an individual researcher or an academic,” Paul Boselie, project lead for the university’s new Recognition and Rewards scheme, told Nature in 2021.

Perhaps surprisingly, this decision to drop the IF in hiring and promotion practices has been met with resistance. In the Netherlands, 172 researchers, 143 of whom are professors, signed an open letter demanding that UU reconsider its decision. While acknowledging the metric’s imperfections, these scientists argue that its use nevertheless helps keep the evaluation of researchers objective. Indeed, the extent to which a researcher practices open science and is dedicated to teamwork is arguably difficult to measure and is greatly affected by subjective factors such as the nature of one’s interpersonal relationships. Moreover, IFs are still taken into account by the parties that fund research at UU, putting at risk the institution’s ability to compete (internationally) for funding and, ultimately, to conduct research in the first place. Thus, unless other universities and their major funders jump on the bandwagon, UU risks isolating itself from the rest of the academic world. In general, it seems that our system is not yet ready for a large-scale change like this; it would first require a broader allocation of resources to all the open science principles.

Although academia has run in these shoes for many decades, in 2022 they seem to be wearing out. More and more researchers are stepping up and taking part in the discussion, helping to build a new system of evaluation that considers a researcher’s overall contribution to science instead of summarizing decades of work in a single value. In general, this conversation follows a pattern familiar from our own field of study: we tend to measure constructs that we have yet to define.

Student Initiative for Open Science: This Month

This article has been written as part of an ongoing collaborative project with the Student Initiative for Open Science (SIOS). The Amsterdam-based initiative is focused on educating undergraduate- and graduate-level students about good research practices.

In May 2022, SIOS is organizing a movie night presenting Beyond the Paywall. Keep an eye on the website to register for this colloquium event! SIOS is also looking for new members – if the article spoke to you and you would like to be part of the process of implementing the open science framework in the academic world, join today!
