There is growing concern that the contentious journal impact factor is being used by universities as a proxy measure for research assessment. In light of this and the wider REF2014 exercise, Dorothy Bishop argues that we need a better system for distributing research funding to universities than the REF approach allows. A bibliometric measure, such as a departmental H-index used to rank departments, would be a better suited and more practical solution.
In 2014, British institutions of Higher Education are to be evaluated in the Research Excellence Framework (REF), an important exercise on which their future funding depends. Academics are currently undergoing scrutiny by their institutions to determine whether their research outputs are good enough to be entered in the REF. Outputs are to be assessed in terms of “‘originality, significance and rigour’, with reference to international research quality standards.” Here’s what the REF2014 guidelines say about journal impact factors: “No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs.”
Here is some evidence, gathered by Dr Jenny Rohn, that the REF2014 statement on impact factors is being widely ignored. And here’s an extract of the letter [full text here] I wrote to the representatives of RCUK, who act as observers on REF panels, asking them to respond to the concerns raised by Dr Rohn and to clarify the position on, and the reasoning behind, the treatment of journal impact factors:
I am contacting you because a growing number of academics are expressing concerns that, contrary to what is stated in the REF guidelines, journal impact factors are being used by some Universities to rate research outputs….Many, perhaps most, of the academics involved in panels and REF preparations grew up in a climate where publication in a high impact journal was regarded as the acme of achievement. Insofar as there are problems with the use of impact factors, they seem to think the only difficulty is the lack of comparability across sub-disciplines, which can be adjusted for. Indeed, I have been told that it is naïve to imagine that this statement should be taken literally: “No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs.”
Institutions seem to vary in how strictly they are interpreting this statement and this could lead to serious problems further down the line. An institution that played by the rules and submitted papers based only on perceived scientific quality might challenge the REF outcome if they found the panel had been basing ratings on journal impact factor. The evidence for such behaviour could be reconstructed from an analysis of outputs submitted for the REF….Although the statement seems unambiguous, there is a widespread view that the intention is only to avoid slavish use of impact factors as a sole criterion, not to ban their use altogether. If that is the case, then this needs to be made explicit. If not, then it would be helpful to have some mechanism whereby academics could report institutions that flout this rule.
I now have a response from Graeme Rosenberg, REF Manager at HEFCE, who kindly agreed that I could post relevant content from his email here. It briefly explains why impact factors are disallowed for REF panels, but notes that institutions are free to use them in their own submissions, at their own risk. The text follows:
I think your letter raises two sets of issues, which I will respond to in turn.
The REF panel criteria state clearly that panels will not use journal impact factors in the assessment. These criteria were developed by the panels themselves and we have no reason to doubt they will be applied correctly. The four main panels will oversee the work of the sub-panels throughout the assessment process, and it is part of the main panels’ remit to ensure that all sub-panels apply the published criteria. If there happen to be some individual panel members at this stage who are unsure about the potential use of impact factors in the panels’ assessments, the issue will be clarified by the panel chairs when the assessment starts. The published criteria are very clear and do not leave any room for ambiguity on this point.
The question of institutions using journal impact factors in preparing their submissions is a separate issue. We have stated clearly what the panels will and will not be using to inform their judgements. But institutions are autonomous and ultimately it is their decision as to what forms of evidence they use to inform their selection decisions. If they choose to use journal impact factors as part of the evidence, then the evidence for their decisions will differ from that used by the panels. This would no doubt increase the risk to the institution of reaching conclusions different from those of the REF panels. Institutions would also do well to consider why the REF panels will not use journal impact factors: at the level of individual outputs they are a poor proxy for quality. Nevertheless, it remains the institution’s choice.
Many people have asked me what alternative I’d recommend. Clearly, we need a transparent, fair and cost-effective method for distributing funding to universities to support research. Those designing the REF have tried hard over the years to devise such a method, and have explored various alternatives, but the current system leaves much to be desired.
Consider the current criteria for rating research outputs, designed by someone with a true flair for ambiguity:
Rating | Definition
4* | Quality that is world-leading in terms of originality, significance and rigour |
3* | Quality that is internationally excellent in terms of originality, significance and rigour but which falls short of the highest standards of excellence |
2* | Quality that is recognised internationally in terms of originality, significance and rigour |
1* | Quality that is recognised nationally in terms of originality, significance and rigour |
Since only 4* and 3* outputs will feature in the funding formula, a great deal hinges on whether research is deemed “world-leading”, “internationally excellent” or “internationally recognised”. This is hardly transparent or objective. That’s one reason why many institutions want to translate these star ratings into journal impact factors. But substituting a discredited (if objective) criterion for a subjective one is not a solution.
The use of bibliometrics was considered but rejected in the past. My suggestion is that we should reconsider the idea, but in a new form. A few months ago, I blogged about how university rankings in the previous assessment exercise (RAE) related to grant income and citation rates for outputs. Instead of looking at citations for individual researchers, I used Web of Science to compute an H-index for the period 2000-2007 for each department, searching on the ‘address’ field. As noted in my original post, I did this fairly hastily, and the method becomes problematic where a Unit of Assessment does not correspond neatly to a single department. The H-index also reflected all research outputs of everyone at that address, regardless of whether they were still at the institution or entered for the RAE. Despite these limitations, the resulting H-index predicted the RAE results remarkably well, as seen in the scatterplot below, which shows the H-index in relation to the funding level that followed from the RAE. That funding level is computed as the number of full-time-equivalent staff multiplied by the formula:
0.1 × 2* + 0.3 × 3* + 0.7 × 4*
(N.B. I ignored subject weighting, so units are arbitrary).
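For readers who want to see the mechanics, here is a minimal sketch, in Python, of how a departmental H-index and the weighted score above could be computed once citation counts have been retrieved. It is illustrative only: the citation counts, staff numbers and star-rating proportions are hypothetical placeholders, and this is not the script used for the original analysis.

```python
def h_index(citation_counts):
    """H-index: the largest h such that at least h outputs
    have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def funding_score(fte_staff, prop_2star, prop_3star, prop_4star):
    """Weighted quality score described in the post: FTE staff multiplied by
    0.1 x 2* + 0.3 x 3* + 0.7 x 4* (subject weighting ignored, so units
    are arbitrary)."""
    return fte_staff * (0.1 * prop_2star + 0.3 * prop_3star + 0.7 * prop_4star)


# Hypothetical example: citation counts for one department's 2000-2007
# outputs, as if retrieved via an address-field search (placeholder numbers).
dept_citations = [120, 85, 60, 44, 30, 18, 12, 9, 5, 2, 1, 0]
print(h_index(dept_citations))                # -> 8
print(funding_score(40, 0.30, 0.45, 0.20))    # arbitrary units
```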
Yes, you might say, but the prediction is less successful at the top end of the scale, and this could mean that the RAE panels incorporated factors that aren’t readily measured by such a crude score as H-index. Possibly true, but how do we know those factors are fair and objective? In this dataset, one variable that accounted for additional variance in outcome, over and above departmental H-index, was whether the department had a representative on the psychology panel: if they did, then the trend was for the department to have a higher ranking than that predicted from the H-index. With panel membership included in the regression, the correlation (r) increased significantly from .84 to .86, t = 2.82, p = .006. It makes sense that if you are a member of a panel, you will be much more clued up than other people about how the whole process works, and you can use this information to ensure your department’s submission is strategically optimal. I should stress that this was a small effect, and I did not see it in a handful of other disciplines that I looked at, so it could be a fluke. Nevertheless, with the best intentions in the world, the current system can’t ever defend completely against such biases.
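To make the regression step concrete, here is a hedged sketch of how such a comparison could be run with Python’s statsmodels: regress the RAE-derived funding outcome on the departmental H-index alone, then add a panel-membership indicator and compare the fit. The data frame below is filled with randomly generated placeholder values, not the actual RAE dataset, and the exact model specification behind the figures quoted above may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder data standing in for the real dataset: one row per department,
# with its H-index, a 0/1 flag for having a member on the assessment panel,
# and the RAE-derived funding outcome (arbitrary units).
rng = np.random.default_rng(0)
n = 60
h = rng.integers(10, 60, size=n)
panel = rng.integers(0, 2, size=n)
outcome = 2.0 * h + 5.0 * panel + rng.normal(0, 10, size=n)
df = pd.DataFrame({"h_index": h, "panel_member": panel, "outcome": outcome})

# Model 1: H-index only; Model 2: H-index plus panel membership.
m1 = smf.ols("outcome ~ h_index", data=df).fit()
m2 = smf.ols("outcome ~ h_index + panel_member", data=df).fit()

print(f"R with H-index only:     {np.sqrt(m1.rsquared):.2f}")
print(f"R with panel term added: {np.sqrt(m2.rsquared):.2f}")
print(m2.summary().tables[1])  # t and p for the panel_member coefficient
```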
So overall, my conclusion is that we might be better off using a bibliometric measure such as a departmental H-index to rank departments. It is crude and imperfect, and I suspect it would not work for all disciplines, especially those in the humanities. It relies solely on citations, and it’s debatable whether that is desirable. But for the sciences, it seems to measure pretty much whatever the RAE was measuring, and it would seem to be the lesser of various possible evils, with a number of advantages over the current system. It is transparent and objective, it would not require departments to decide whom they do and don’t enter for the assessment, and, most importantly, it wins hands down on cost-effectiveness. Had we used this method instead of the RAE, a small team of analysts armed with Web of Science could have derived the necessary data in a couple of weeks and produced outcomes virtually identical to those of the RAE. The money saved both by HEFCE and by individual universities could be ploughed back into research. Of course, people will attempt to manipulate whatever criterion is adopted, but this one might be less easily gamed than some others, especially if self-citations from the same institution are excluded.
It will be interesting to see how well this method predicts RAE outcomes in other subjects, and whether it can also predict results from REF2014, where the newly introduced “impact statement” is intended to add a new dimension to the assessment.
This blog was originally published on Dorothy’s BishopBlog and can be found here along with a further discussion. If you would like to comment on this article, please do so on the original post.
Note: This article gives the views of the author(s), and not the position of the Impact of Social Sciences blog, nor of the London School of Economics.
About the author
Dorothy Bishop is Professor of Developmental Neuropsychology and a Wellcome Principal Research Fellow at the Department of Experimental Psychology in Oxford and Adjunct Professor at The University of Western Australia, Perth. The primary aim of her research is to increase understanding of why some children have specific language impairment (SLI). Dorothy blogs at BishopBlog and is on Twitter @deevybee.
Is there a growing sense of disciplinary benchmarks or norms of expectation around H-index scores for individuals? I have sat on life/natural science panels where an H-index of 20 is perhaps the unofficial barrier for promotion to professor, but this seems way too high for my field (political science).
It would also be simpler if HEFCE just used a two-part criterion for their funding decisions: i.e. (1) ‘excellent’ and (2) ‘not excellent’. Splitting hairs between “world-leading” and “internationally excellent” seems like a waste of time and money. A two-part criterion would make it much easier for assessors and submitters to make what are always going to be flawed judgements; any money saved from the administration budget could then be distributed to the “not excellent” UoAs. Caveat: I’m making this comment partially in jest, but only partially. Not having to deal with RAE/REF is one of the joys of moving overseas.
The h-index, as a measure of impact, is only as good as the coverage of the field in the citation databases. That’s why it doesn’t work particularly well for the arts, humanities and social sciences, whose researchers publish books, book chapters and conference papers. But even within the hard science disciplines, one department’s chosen outlets may enjoy better coverage than another’s, automatically creating an unfair advantage. I think that using an aggregate measure of article-level metrics that capture both scholarly impact and public interest, but also provide fairer coverage of subjects and formats, is a better way to go, along with a narrative of real impact.
I am sympathetic to the goals of this exercise. My first thought was to try and replicate it for political science. Unfortunately, I fell at the first hurdle: Web of Science doesn’t have a political science category, and I had no great desire to create a Frankencategory from “Public Administration” plus “Other social science”.
This raises points which are not just about coverage, but which also touch on boundaries, taxonomy and control of the process.
Thank you, I enjoyed reading this blog. The current obsession with impact factors is far-reaching and has many serious implications, which are poorly understood or studied. Developing an efficient and cost-effective method that can yield highly similar results to a full-scale assessment exercise has a lot of merit. However, there are also several issues with an institutional H-index. For one, citations are highly skewed, which makes them hardly suitable as a performance criterion. Granted, restricting citations to a certain time period (e.g., 2000-2007) and looking at the aggregated level addresses this issue to some extent, but at the individual level, where impact factors are causing most of the problems, these concerns remain. Second, an institutional H-index would give preference to more established researchers, who have been around longer and have had more of a chance to accumulate citations in a given assessment period. Third, institutions would have less of an incentive to increase their workforce, and would instead be motivated to go “lean”. Co-authorships would also carry no value under such a system. Fourth, the system would discourage mobility, both within the UK and into the UK. There are downsides to the current horse-trading practices, but it is fair to assume that mobility has many benefits for a knowledge economy (exchange; sharing best practice; etc.). Fifth, citations would have to be adjusted for disciplines (see also other comments on this blog), which then raises the issue of where to draw discipline boundaries. Sixth, citations give preference to certain types or forms of academic output. As you know, in psychology reviews attract a large number of citations, and so do certain psychometric measures. I would be wary of an index of world-leading research quality that places disproportionate value on these and other types of publications that attract a lot of citations. In sum, then, I don’t think there is a panacea in the quest for objective measures, and even if there were, the institutional H-index does not strike me as the way forward.
This is a great initiative. The impact factor has been declared dead. There has to be an alternative mechanism to measure what the scholarly community really wanted to measure. http://publication2application.org/2013/12/02/impact-factor-a-poor-quality-indicator-of-quality/ also supports this assertion, based on a more consolidated analysis.