Access to more and more publication and citation data offers the potential for more powerful impact measures than traditional bibliometrics. Accounting for more of the context in the relationship between the citing and cited publications could provide more subtle and nuanced impact measurement. Ryan Whalen looks at the different ways that scientific content is related, and how these relationships could be explored further to improve measures of scientific impact.
There has long been extensive discussion both here and elsewhere about the need for more nuanced measures of academic impact. While citations provide valuable evidence of how research goes on to impact future work, traditional “bean counting” impact measures lack subtlety and an appreciation for the varied meanings that citations can have.
This topic cropped up repeatedly at the Quantifying and Analysing Scholarly Communication on the Web workshop hosted at the ACM WebSci conference at Oxford last year. Participants discussed a variety of ways that developments in scholarly communication along with advances in data access and computational power could be leveraged to improve impact measures.
Traditional impact measures that rely on citation counting capitalize on the relationships between scientific articles. They assume that if Article B cites Article A, then Article A in some way influenced the later work and thus deserves some credit for it. However, this is obviously a relatively coarse measure, one that has evolved in this manner as a necessary concession to the complexity of scientific knowledge diffusion.
Image credit: Marcin Wichary Hollerith Census Machine dials (CC BY)
As access to data and computational power improve, this concession becomes less and less necessary. Increased computational power and access to more and more publication and citation data offer the potential for impact measures that account for more of the context in the relationship between the citing and cited publications. Accounting for more of the context that defines the relationship between publications will allow bibliometricians, researchers, and policymakers more fine-grained control over impact measures. As more layers of context are measured, those interested will be able to “turn the dials” on their metrics and more accurately measure the sort of impact they are interested in.
Our paper at ASCW demonstrated that, at least in the context of one journal, accounting for heterogeneity in citations improved our ability to predict high-impact scientific articles. In this exploratory project we developed a measure that accounted not just for the presence or absence of citations, but also weighted those citations by how similar the citing and cited documents were. Applying this weighting to early citations improved our ability to predict which articles would go on to receive more citations later.
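To make the idea of similarity-weighted citations concrete, here is a minimal sketch in Python. It is not the measure from our paper: the bag-of-words cosine similarity and the function names (`term_counts`, `cosine_similarity`, `weighted_citation_count`) are assumptions chosen purely for illustration. Each citation contributes its textual similarity to the cited document, rather than a flat count of one.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase bag-of-words term counts (an illustrative simplification)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_citation_count(cited_text, citing_texts):
    """Sum of citing/cited similarities (hypothetical weighting scheme).

    An unweighted count would simply be len(citing_texts).
    """
    cited = term_counts(cited_text)
    return sum(cosine_similarity(cited, term_counts(t)) for t in citing_texts)
```

Under this sketch, a citation from a near-identical document counts for almost one, while a citation from a document sharing no vocabulary counts for zero. Other weighting functions could be swapped in without changing the overall structure.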
My own work on patent citation data shows similarly interesting results. By weighting patent citations based on the textual similarity of the citing/cited patents, I find an inverted curvilinear relationship between citation similarity and the power that a citation has in predicting future references.
This plot shows the 95% confidence intervals of estimated future citations based on the presence of early citations to patents. Early citations are separated into four textual similarity quartiles from most to least similar. We see that citations from both dissimilar and highly-similar patents predict fewer future references than those from middling-similarity inventions.
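The quartile grouping described above can be sketched as follows. This is an illustrative assumption about how such bins might be formed, not the procedure used in the analysis: the function name `similarity_quartiles` is hypothetical, the cut points come from Python's `statistics.quantiles` with its default method, and quartile 1 is arbitrarily taken to mean least similar.

```python
import statistics
from bisect import bisect_right


def similarity_quartiles(scores):
    """Assign each similarity score to a quartile from 1 to 4.

    Assumption for illustration: 1 = least similar, 4 = most similar.
    Requires at least two scores.
    """
    cuts = statistics.quantiles(scores, n=4)  # three cut points
    return [bisect_right(cuts, s) + 1 for s in scores]
```

With the early citations binned this way, future citation counts can be compared across the four groups to see whether the middle quartiles predict more later references than the extremes.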
This is not to say that textual similarity is the only metric to use in contextualizing citation measures. I’ve raised it here as an example, but it is only one of many potential methods that bibliometricians can use to provide more subtlety and nuance to citation impact measures.
What we need now is both theoretical and empirical work that explores the different ways that citing and cited content are related to one another, and how these relationships can be used to improve measures of scientific impact. This work should focus on two areas of pressing need:
- Theoretically justifying why specific measures of context are appropriate to include in citation impact analyses.
- Empirically testing different measures of citation context to determine how they affect impact assessment.
There are many potential types of data that seem at least facially appropriate to include in a weighted citation impact measure. Things like geographic diffusion, disciplinary boundary spanning, crossing language divides, etc. all offer potential insight into how impactful a given piece of scientific research is. Studies of scientific impact should focus on developing measures of these varied sorts of relationships between citing/cited references and subsequently using data about these relations to improve impact assessment.
Note: This article gives the views of the author, and not the position of the LSE Impact blog, nor of the London School of Economics.
Ryan Whalen is a joint JD/PhD candidate at Northwestern University where he is affiliated with the Science of Networks in Communities (SONIC) research group and serves as a Law & Science Fellow at the Northwestern University School of Law. His research focuses on collaboration & creativity, innovation policy, and intellectual property law.
This is part of a series of pieces from the Quantifying and Analysing Scholarly Communication on the Web workshop. More from this series:
The ResearchGate Score: a good example of a bad metric
According to ResearchGate, the academic social networking site, their RG Score is “a new way to measure your scientific reputation”. With such high aims, Peter Kraker, Katy Jordan and Elisabeth Lex take a closer look at the opaque metric. By reverse engineering the score, they find that a significant weight is linked to ‘impact points’ – a similar metric to the widely discredited journal impact factor. Transparency in metrics is the only way scholarly measures can be put into context and the only way biases – which are inherent in all socially created metrics – can be uncovered.
Bringing together bibliometrics research from different disciplines – what can we learn from each other?
Currently, there is little exchange between the different communities interested in the domain of bibliometrics. A recent conference aimed to bridge this gap. Peter Kraker, Katrin Weller, Isabella Peters and Elisabeth Lex report on the multitude of topics and viewpoints covered on the quantitative analysis of scientific research. A key theme was the strong need for more openness and transparency: transparency in research evaluation processes to avoid biases, transparency of algorithms that compute new scores and openness of useful technology.
We need informative metrics that will help, not hurt, the scientific endeavor – let’s work to make metrics better.
Rather than expecting people to stop utilizing metrics altogether, we would be better off focusing on making sure the metrics are effective and accurate, argues Brett Buttliere. By looking across a variety of indicators, supporting a centralised, interoperable metrics hub, and utilizing more theory in building metrics, scientists can better understand the diverse facets of research impact and research quality.