Can metrics be used instead of peer review for REF-type assessments? With the stakes so high, any replacement would have to be extremely accurate. Olesya Mryglod, Ralph Kenna, Yurij Holovatch and Bertrand Berche looked at two metric candidates, including the departmental h-index, and four subject areas: biology, chemistry, physics and sociology. The correlations are significant, but comparisons with RAE indicate that while the departmental h-index is the best metric, it would not have been good enough to replace the peer review exercise. A more important question is whether we should seek to measure research quality using metrics at all.
Academic research is a very special kind of human endeavour. It is often founded purely on curiosity, and useful applications may not be immediately obvious. Curiosity-driven research has, however, led to some of the most important practical advances our civilisation has produced. These include the internet, GPS, progress in genetics and in social network theories. Scientists and academic researchers involved in such discoveries and developments typically follow their career paths in pursuit of knowledge – rather than for financial gain. Indeed, commercial exploitability and profitability may be impossible to predict or entirely absent from blue-skies research. For this reason curiosity-driven research is mostly carried out at universities and research institutes funded by the public purse.
To check that society is getting the best possible value for money, some governments appraise the research emanating from higher education institutions on a regular basis. One of the world’s most developed assessment exercises is the UK’s Research Excellence Framework (REF), the results of which are due on the 18th of December. But does the REF itself provide value for money? It is based on peer review, which is considered by some as the most reasonable tool for comprehensive research evaluation. But, although it takes place only every 5-7 years, it is costly, disruptive and time-consuming. Is this a price worth paying to measure research? Indeed, can one reasonably measure this special human activity, which combines creativity with a special way of thinking?
Image credit: Fritz Cohn Wikimedia (Creative Commons Attribution-Share Alike 3.0)
If one can do it, can it be done cheaply and non-invasively instead? It has been suggested that a set of automated, scientometric or bibliometric indicators may form a suitable basis for a substitute for, or a component of, peer review at the level of the research group or department. Indicators and metrics are certainly cheap and easy for managers to use (perhaps too easy – they reflect only a simplistic aspect of the research process). And because they can be monitored continuously, they would avoid the disruption and tension in the run-up to REF time. So, can metrics be used instead of peer review for REF-type assessments?
The stakes in this game are very high. Besides determining the amount of money which society donates to universities for research, the REF is the primary source for research rankings and therefore contributes to the reputations of universities, departments and research institutes in the UK. So any replacement for the REF has to be extremely accurate to be accepted by policy makers and the academic community.
In recent papers [1,2] we compared a citation-based indicator to the results of the UK’s last appraisal, the Research Assessment Exercise (RAE). Conducted in 2008, this was also based on peer review. Although RAE2008, like REF2014, delivered a quality profile for each submission, this can be compacted into a single quality estimator using the post-RAE funding formula of the Higher Education Funding Council for England (HEFCE). We denote the resulting statistic by s (as it is some measure of research strength per head). Our objective was to try to find a bibliometric indicator which correlates well with s.
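For readers who want to see the arithmetic, here is a minimal sketch of such a compaction. The function name and the example profile are ours, purely for illustration, and treating the weighted profile sum as a stand-in for s is our assumption; the 7:3:1 weighting of 4*, 3* and 2* research is the post-RAE HEFCE formula discussed further below.

```python
# Illustrative sketch only (not the authors' code): compact a quality profile
# into a single score using the post-RAE HEFCE weights of 7:3:1 for 4*, 3*
# and 2* research (1* and unclassified work attracted no funding).

def quality_score(profile, weights=(7, 3, 1, 0, 0)):
    """profile: fractions of a submission judged 4*, 3*, 2*, 1* and
    unclassified (summing to roughly 1)."""
    return sum(w * p for w, p in zip(weights, profile))

# e.g. a submission rated 25% 4*, 40% 3*, 30% 2*, 5% 1*:
s = quality_score((0.25, 0.40, 0.30, 0.05, 0.00))   # -> 3.25
```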
We looked at two candidates and four subject areas: biology, chemistry, physics and sociology [3]. The best was a departmental version of the Hirsch index (h-index). As in Dorothy Bishop’s blog, a departmental h-index of n means that n papers, authored by staff from a given department, and in a given subject area, were cited n times or more in a given time period. The departmental h-index is easily calculated using a database such as Scopus. There are many differences between the data sets behind s and h. For example, unlike the RAE or REF, all researchers in a department contribute in principle to h, not just a select few. Also, while the outputs of a researcher who has moved during the REF period can count towards the RAE/REF submission of the new institution, contributions to the departmental h-index are based on affiliations as recorded in the Scopus database (for example). Despite these differences (and more), the results were (surprisingly) not too bad. We found correlation coefficients between about 0.55 and 0.8. But are these good enough to make predictions?
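To make the definition above concrete, the sketch below computes a departmental h-index from a list of citation counts (for instance, counts retrieved from Scopus for a department's papers in the assessment window); the function name and the numbers are purely illustrative.

```python
# Minimal sketch: the departmental h-index is the largest n such that n of the
# department's papers in the period have been cited at least n times each.

def departmental_h_index(citation_counts):
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites < rank:
            break
        h = rank
    return h

# e.g. five papers cited 10, 6, 5, 3 and 1 times give h = 3
print(departmental_h_index([10, 6, 5, 3, 1]))  # -> 3
```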
Fig. 1. h2008 versus the peer-review-based measure s for research groups from different HEIs in sociology. The Pearson correlation coefficient here is equal to 0.62.
Before discussing this, we ask if we can improve these results. First we tweaked the formula for s; instead of basing it on the post-RAE HEFCE funding formula, which valued 4*, 3* and 2* research in the ratio 7:3:1, we used the more recent formula involving the ratio 3:1:0. We found no improvement. Actually, s encapsulates three aspects of research: the outputs themselves (mostly publications), the research environment and esteem indicators. Since the h-index only involves outputs, we also restricted the calculation of s to that component of RAE2008. Again there was no improvement. We conclude that our crude statistic s is pretty robust as a summary of RAE profiles.
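As a toy illustration of this robustness (our own sketch with invented profiles, not the analysis reported in [3]), one can score a few departments under both weightings and check that the rank order barely moves:

```python
# Sketch with made-up profiles: score each department under the old (7:3:1)
# and newer (3:1:0) HEFCE weightings and compare the two rank orders.
from scipy.stats import spearmanr

profiles = {                      # fractions of 4*, 3*, 2*, 1*, unclassified
    "Dept A": (0.30, 0.45, 0.20, 0.05, 0.00),
    "Dept B": (0.20, 0.50, 0.25, 0.05, 0.00),
    "Dept C": (0.10, 0.40, 0.40, 0.10, 0.00),
}

def score(profile, weights):
    return sum(w * p for w, p in zip(weights, profile))

s_old = [score(p, (7, 3, 1, 0, 0)) for p in profiles.values()]
s_new = [score(p, (3, 1, 0, 0, 0)) for p in profiles.values()]

rho, _ = spearmanr(s_old, s_new)
print(rho)   # close to 1: the two formulas rank the departments almost identically
```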
The h-indices we use in Fig. 1 were measured at the beginning of 2008 and involved the same time window as RAE2008, i.e., papers which appeared between 2001 and 2007. Citation counts, of course, change with time, and to investigate the evolution of the Hirsch metric we also determined h as of 2009, 2010, and so on, each based on publications appearing in the preceding seven years. We found that while the h-indices grow gradually, the ranks of the various institutions do not change significantly year on year and the correlation coefficients do not become stronger with time. This means that, if one wants to use departmental h-indices based on citations within a limited time window, it is as reasonable to do so early in the game as later; one does not have to wait for citations to accumulate when dealing with entire departments.
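This kind of rank stability can be checked with a rank correlation between h-indices taken at different census dates. The numbers below are invented purely for illustration; the measured values are in [3].

```python
# Illustrative only: departmental h-indices for the same five departments,
# measured in 2008 and again in 2010, each over the preceding seven-year
# publication window.
from scipy.stats import spearmanr

h_2008 = [34, 28, 25, 19, 15]
h_2010 = [38, 31, 27, 22, 16]   # the indices grow, but the ordering is unchanged

rho, _ = spearmanr(h_2008, h_2010)
print(rho)   # rho = 1 here: stable ranks mean an early measurement is as good as a late one
```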
This brings us to our conclusions for the RAE and our predictions for the REF. The correlations of between 0.55 and 0.8 which we measured (see Fig. 1 for the case of sociology) would certainly not have been good enough to justify replacing RAE2008 with the departmental h-index. Higher education institutions (HEIs) in the four subject areas are ranked in Tables 1-4 (not all HEIs are listed, for technical reasons – see [3]). For example, the University of Essex was in second place in the list of HEIs in sociology when ranked using the RAE2008 score s, but in 20th place using the departmental h-index. We have yet to see how this plays out for REF2014. However, we can still try to predict changes in the ranked positions of HEIs by comparing their new and old departmental h-indices. For example, the departmental h-index predicts that Oxford and Cambridge will both rise in the sociology ranks at REF2014 while Manchester will fall. In this sense, perhaps the h-index can be used as a navigator between REFs. These and other predictions are contained in the tables and in [3]. Now we have to wait for the REF2014 results to see what happens.
In summary, comparisons with RAE indicate that the departmental h-index is perhaps the best metric we have but it would not have been good enough to replace that peer review exercise. Time will tell how it performs in relation to REF or if it can be used as a “navigator”.
In our opinion, a more important question is whether we should seek to measure research quality using metrics at all. Our answer is no. We believe that their introduction would encourage managers to force researchers to change direction and pursue metrics. This would undermine academic freedom itself, the foundation of basic research. It would therefore be devastating to an endeavour which is at the very heart of what it is to be human and a foundation of our society — curiosity itself.
If we have to monitor research quality, let us stick with peer review. Let us accept that REF distorts the very thing it seeks to measure, but let us turn that to our advantage. REF can be used as a driver not only for research quality but also for the conditions to enable top-quality research to thrive. In recent years these have become so severely distorted as to be damaging not only to science but to scientists themselves in the new metrics-driven culture of publish and perish.
Table 1. The list of British HEIs in Biology, ranked by the RAE2008 score s, by h2008 and by the more recent departmental h-index (the corresponding h-index values are shown in parentheses). The latter is based on the publication period 2007-2013, for comparison with REF2014.
Table 2. As in Table 1 but for Chemistry.
Table 3. As in Table 1 but for Physics.
Table 4. As in Table 1 but for Sociology.
Note: This article gives the views of the authors, and not the position of the Impact of Social Science blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.
Olesya Mryglod is a Researcher at the Laboratory for Statistical Physics of Complex Systems, Institute for Condensed Matter Physics of the National Academy of Sciences of Ukraine.
Ralph Kenna is Professor of Theoretical Physics at the Applied Mathematics Research Centre, Coventry University.
Yurij Holovatch is Research Head at the Laboratory for Statistical Physics of Complex Systems, Institute for Condensed Matter Physics of the National Academy of Sciences of Ukraine.
Bertrand Berche is Professor at the Statistical Physics Group, IJL, Université de Lorraine, France.
The argument that the weak correlation of departmental h-index with RAE/REF outcome means that h-index is not a good proxy is plainly false.
RAE/REF do not quantify anything; they provide a purely subjective measurement. Only a fraction of papers can be read properly. Critical reading takes time: when you are reading in your own field you may scan papers, but then comes the real critical reading, where you may take hours over a single paper. So the “quality” of an output is not measured through critical reading. We then have the problem of how to evaluate the “importance” of a piece of work before it has penetrated the community, a process that may take years.
I could go on, but simply put, all assessments we have undertaken are subjective.
The choice is not between “method A”, which is reputable and known to yield a measurement with defined signal-to-noise, variance, etc., and “method B”, which is fast but much more uncertain. The choice is between “method A”, which is laborious and deeply flawed, and “method B”, which is fast and equally deeply flawed.
Until people actually do the simple maths, measuring the time it takes them to read 10-20 papers in their field properly and then calculating the time that would be required to do such reading on an RAE/REF panel, we will continue to engage in a false argument.
THE ONLY SUBSTITUTE FOR METRICS IS BETTER METRICS
“The man who is ready to prove that metaphysical knowledge is wholly impossible… is a brother metaphysician with a rival theory” Bradley, F. H. (1893) Appearance and Reality
https://www.goodreads.com/quotes/1369088-the-man-who-is-ready-to-prove-that-metaphysical-knowledge
The topic of using metrics for research performance assessment in the UK has a rather long history, beginning with the work of Charles Oppenheim.
The solution is neither to abjure metrics nor to pick and stick to one unvalidated metric, whether it’s the journal impact factor or the h-index.
The solution is to jointly test and validate, field by field, a battery of multiple, diverse metrics (citations, downloads, links, tweets, tags, endogamy/exogamy, hubs/authorities, latency/longevity, co-citations, co-authorships, etc.) against a face-valid criterion (such as peer rankings).
Oppenheim, C. (1996). Do citations count? Citation indexing and the Research Assessment Exercise (RAE). Serials: The Journal for the Serials Community, 9(2), 155-161.
Oppenheim, C. (1997). The correlation between citation counts and the 1992 research assessment exercise ratings for British research in genetics, anatomy and archaeology. Journal of Documentation, 53(5), 477-487.
Oppenheim, C. (1995). The correlation between citation counts and the 1992 Research Assessment Exercise Ratings for British library and information science university departments. Journal of Documentation, 51(1), 18-27.
Oppenheim, C. (2007). Using the h‐index to rank influential British researchers in information science and librarianship. Journal of the American Society for Information Science and Technology, 58(2), 297-301.
Harnad, S. (2001) Research access, impact and assessment. Times Higher Education Supplement 1487: p. 16. http://cogprints.org/1683/
Harnad, S. (2003) Measuring and Maximising UK Research Impact. Times Higher Education Supplement. Friday, June 6 2003 http://eprints.ecs.soton.ac.uk/7728/
Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35. http://eprints.soton.ac.uk/265852/
Hitchcock, S., Woukeu, A., Brody, T., Carr, L., Hall, W. and Harnad, S. (2003) Evaluating Citebase, an open access Web-based citation-ranked search and impact discovery service. http://eprints.ecs.soton.ac.uk/8204/
Harnad, S. (2004) Enrich Impact Measures Through Open Access Analysis. British Medical Journal 329. http://bmj.bmjjournals.com/cgi/eletters/329/7471/0-h#80657
Harnad, S. (2006) Online, Continuous, Metrics-Based Research Assessment. Technical Report, ECS, University of Southampton. http://eprints.ecs.soton.ac.uk/12130/
Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8) pp. 1060-1072. http://eprints.ecs.soton.ac.uk/10713/
Brody, T., Carr, L., Harnad, S. and Swan, A. (2007) Time to Convert to Metrics. Research Fortnight pp. 17-18. http://eprints.ecs.soton.ac.uk/14329/
Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and Swan, A. (2007) Incentivizing the Open Access Research Web: Publication-Archiving, Data-Archiving and Scientometrics. CTWatch Quarterly 3(3). http://eprints.ecs.soton.ac.uk/14418/
Harnad, S. (2008) Validating Research Performance Metrics Against Peer Rankings. Ethics in Science and Environmental Politics 8(11) (special issue: The Use and Misuse of Bibliometric Indices in Evaluating Scholarly Performance). doi:10.3354/esep00088 http://eprints.ecs.soton.ac.uk/15619/
Harnad, S. (2008) Self-Archiving, Metrics and Mandates. Science Editor 31(2) 57-59. http://www.councilscienceeditors.org/members/secureDocument.cfm?docID=1916
Harnad, S., Carr, L. and Gingras, Y. (2008) Maximizing Research Progress Through Open Access Mandates and Metrics. Liinc em Revista 4(2). http://eprints.ecs.soton.ac.uk/16617/
Harnad, S. (2009) Open Access Scientometrics and the UK Research Assessment Exercise. Scientometrics 79 (1) Also in Proceedings of 11th Annual Meeting of the International Society for Scientometrics and Informetrics 11(1), pp. 27-33, Madrid, Spain. Torres-Salinas, D. and Moed, H. F., Eds. (2007) http://eprints.ecs.soton.ac.uk/17142/
Harnad, S. (2009) Multiple metrics required to measure research performance. Nature (Correspondence) 457 (785) (12 February 2009) http://www.nature.com/nature/journal/v457/n7231/full/457785a.html
Harnad, S; Carr, L; Swan, A; Sale, A & Bosc H. (2009) Maximizing and Measuring Research Impact Through University and Research-Funder Open-Access Self-Archiving Mandates. Wissenschaftsmanagement 15(4) 36-41 http://eprints.soton.ac.uk/266616/