For many years, academia has relied on citation count as the main way to measure the impact or importance of research, informing metrics such as the Impact Factor and the h-index. But how well do these metrics actually align with researchers’ subjective evaluation of impact and significance? Rachel Borchardt and Matthew R. Hartings report on a study that compares researchers’ perceptions of significance, importance, and what is highly cited with actual citation data. The results reveal a strikingly large discrepancy between perceptions of impact and the metric we currently use to measure it.
Academia, we have a problem. What began as an attempt to quantify research quality has gotten away from us and taken on a life of its own. This problem isn’t particularly new; it has been widely recognised by scholars and researchers and, as a result, is being talked about more openly. The problem comes down to defining and measuring impact.
A simple description of impactful research is research that gets used. Some research has the ability to transform society through groundbreaking discovery, to shape social policy and government regulation through eye-opening analysis, or to engage public attention because it is relevant to people’s lives, environment, or wellbeing. Loosely bound together, this kind of research tends to be referred to as “high-impact”, and it has become the focus for many universities, research centres, and administrators as they compete for grant funding, for the best and brightest students, and for prestige and rankings.
However, designating research as high-impact is not as straightforward as it may seem. For many years, academia has relied on citation count as the main way in which we measure the impact or importance of research. As a result, citation count is one of the primary metrics used when evaluating researchers. Citation counts also form the basis for other metrics, most notably Clarivate’s Journal Impact Factor (JIF) and the h-index, which respectively evaluate journal quality/prestige and researcher renown.
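For readers less familiar with how the h-index works: it is the largest number h such that a researcher has h papers with at least h citations each. A minimal sketch in Python, using invented citation counts purely for illustration:

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented citation counts for one researcher's papers
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3 (three papers with at least 3 citations)
```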
Citations, the JIF, and the h-index have served as the triumvirate of impact evaluation for many years, particularly in STEM fields, where journal articles are the dominant form of publication. Many studies have pointed out various flaws in relying on these metrics, and, over time, a plethora of complementary citation-based metrics has been created to try to address these deficiencies. At the same time, altmetrics have emerged as a potential alternative or complement to citations, collecting different data about the ways in which research is viewed, saved, and shared online.
However, what is discussed less often is how well all of these metrics actually align with the subjective evaluation of impact and significance itself. We have all come to see metrics as synonymous with impact and, by proxy, importance. But are they?
We set out to answer this question by surveying chemistry researchers to gauge their perceptions of significance, importance, and what is highly cited. In a post on Matt’s chemistry-oriented blog, we asked readers to look at 63 articles from one issue of the Journal of the American Chemical Society and take the #JACSChallenge. We asked them to identify up to three articles in the issue that they thought were: the most significant (allowing them to define significance however they saw fit); the most highly cited; the articles they would share with other chemists; and the articles they would share more broadly. We analysed data from more than 350 respondents.
The results, while not truly startling, were nevertheless a stark illustration of how different these concepts are. To start, respondents chose different articles for each of the four questions, though some answers correlated more strongly than others. Significant and highly cited articles had the highest correlation, at 0.9, while articles to share with chemists and articles to share broadly had the lowest, at 0.64. This tells us that our respondents see real differences between these approaches to what could all be called “impactful research”.
Table 1: Correlations between answers given for each question. This table is taken from the authors’ co-written article “Perception of the importance of chemistry research papers and comparison to citation rates” and is published under a CC BY 4.0 license.
But perhaps the more startling discovery came when we compared these responses to citations. Setting the four questions against citation counts 10 and 13 years after the articles were published, the correlations ranged from 0.06 (articles to share with chemists) to 0.33 (highly cited articles). This is a strikingly large discrepancy between researchers’ perceptions of impact and the metric we currently use to measure it.
Figure 1: Respondent evaluations and citations (2013) by paper. The top panel shows the composite selections of our respondents for the question asking which papers they thought had the most citations (blue) and the actual number of citations in 2013 (grey). The other panels also include the number of citations (grey) along with: selections for most significant (green), selections for which should be shared with chemists (yellow), which should be shared widely (orange), and h-index of the corresponding author (red) for each of the manuscripts in the journal issue. This figure is taken from the authors’ co-written article “Perception of the importance of chemistry research papers and comparison to citation rates” and is published under a CC BY 4.0 license.
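As a rough illustration of the kind of comparison behind these correlations (a sketch, not the authors’ actual analysis pipeline), the snippet below correlates hypothetical per-paper respondent tallies with hypothetical citation counts. All numbers are invented for illustration:

```python
import numpy as np

# Hypothetical per-paper tallies: how many respondents picked each paper as
# "most highly cited", alongside each paper's (invented) actual citation count.
picked_as_highly_cited = np.array([42, 5, 17, 0, 3, 28, 9, 1])
actual_citations = np.array([180, 35, 60, 12, 22, 95, 40, 18])

# Pearson correlation between the two series
r = np.corrcoef(picked_as_highly_cited, actual_citations)[0, 1]
print(f"correlation = {r:.2f}")
```

Run over the real survey tallies and citation counts, this is the sort of calculation that produces the 0.06 to 0.33 range reported above.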
Why are these correlations so low? There are likely a number of reasons why actual citation practice does not align more closely with researcher perception, but the gap highlights just how divorced perception is from current practice.
So what now? We think this work clearly highlights a major issue with metrics: they aren’t measuring what everyone commonly assumes they are measuring, or at least they do not accurately represent the more abstract perceptions of impact and importance that we measured in our survey.
As hinted at earlier, we think our research shows that impact goes beyond citation count, and beyond scholarly impact. Recent articles in PLoS Biology and Nature also call out current models for evaluating researchers. But what can be done to change current practice?
Some of the responsibility lies with the evaluators – the administrators, the “benchmarkers” of university prestige rankings, the grant funders. But responsibility also lies with researchers and their professional societies. Many professional societies have broad, blanket statements about the role of metrics in the evaluation of researchers in their fields, but we think there is more work to be done.
For chemistry, Matt’s field, this means better describing the types of impact that chemists can have, in academia and beyond, and laying them out in a document that chemists can rely on when asked to submit their bodies of work for review, such as during tenure and promotion. For library science, Rachel’s field, the profession is going a step further by creating an evaluation framework that clearly communicates the types of research outputs created by academic librarians and models for their evaluation. This type of framework is best demonstrated by the Becker Model, created for the biomedical community, which highlights five different areas of impact, including economic and policy impact, and clearly outlines research outputs and evaluation models for each.
Every academic discipline would be well served by taking a serious look at its research outputs and providing meaningful guidance on their importance within the scholarly communications of that discipline, along with best practices for their appropriate evaluation. Concurrently, researchers can also advocate for change in research evaluation practices at their institutions in the form of updated policy documents, including departmental guidelines for tenure and promotion, that more accurately reflect their disciplinary research and its impact.
Only then will we start to bridge the gap between “actual” impactful practice and meaningful assessment of research.
This blog post is based on the authors’ co-written article, “Perception of the importance of chemistry research papers and comparison to citation rates”, published in PLoS ONE (DOI: 10.1371/journal.pone.0194903).
Featured image credit: Dmitri Popov, via Unsplash (licensed under a CC0 1.0 license).
Note: This article gives the views of the authors, and not the position of the LSE Impact Blog, nor of the London School of Economics. Please review our comments policy if you have any concerns on posting a comment below.
About the authors
Rachel Borchardt is the Science Librarian at American University. Her research interests focus on the intersection of research impact, metrics, and libraries, and she is co-author of the 2015 OA book Meaningful Metrics. Rachel hopes to help bridge the gap between impactful research and research evaluation practices.
Matthew R. Hartings is an Associate Professor of Chemistry at American University. His research interests include developing new materials for biomineralisation and 3D printing. Matthew is also deeply invested in advancing the ways in which science is communicated by the chemistry community. These interests manifest in a range of activities from studying the perception of research articles to engaging non-chemists with food and cooking chemistry. Matthew has recently published a book, Chemistry in Your Kitchen, in which he explores the fascinating chemistry people do every day in their own homes.
One issue we all struggle with is that the broadest definitions of impact tend to be lists of abstract virtues and subjective assessments of what is ‘impactful’. Altmetrics often only indicate the level of popularity and/or controversy surrounding a publication. In the last REF, the impact statements were typically subjective storytelling, with little comparability even within the same disciplines. Qualitative versus quantitative indicators are always likely to be an issue here. The authors are right to highlight the need for the gatekeepers of assessment to get to grips with this.
Given the significance of bias in the review of scientific articles, I’d be interested in learning how you addressed and assessed implicit gender bias in your study. Thank you.
Isn’t it clear that bibliometric indices (and also monetary ones, by the way) cannot correlate with anything? Their distributions are all so strongly skewed that if you plot them against each other, all observations cluster in the lower left corner. Any regression line is determined exclusively by the outliers in the plot. By the way, any distribution approaching an exponential strongly suggests that most or all of the variation is due to pure chance factors.
Some journals implicitly place a premium on submitted manuscripts that cite previous articles published in the same journals. Citation data can thus be shaped by this bias. While citations should not be discarded altogether, readership should also play a key role in impact analysis. For example, if I make my article available on a research repository like Bepress or SSRN, I get monthly statistics on downloads of my articles. Accessibility and readership of my article should count with respect to impact. For an academic, impact then may not be judged on the journal you publish in but rather on the accessibility of your work. Merely seeking to publish in a journal with a high impact factor based on citations can just result in free-riding on the success of a few frequently cited articles in that particular journal. If the main aim of publishing is to disseminate and advance the frontiers of knowledge, then readership should play a critical role in impact analysis. This can also compel more publishers to make accessibility a key component in their publishing decisions.