When Eugene Garfield first launched his Science Citation Index in 1965, many criticised its unequal geographical coverage of the world’s scientific literature. Almost 60 years later, the problem has not gone away. Drawing on their recent paper, David Mills and Toluwase Asubiaro examine the regional disparities within Scopus and Web of Science.
What data set links university rankings, research assessment exercises, promotion decisions and academic reputations? Citations. ‘Highly cited’ researchers become academic celebrities, journals boast about their ‘impact factor’ and an author’s h-index is prominently displayed on many an academic CV. But how reliable and ‘objective’ is this citation data? Does it fairly represent the world’s research productivity? Or might this data be reinforcing an already highly unequal, anglophone-dominated global science system?
In our recent paper we analysed journal data to show that these two key indexes under-represent journals from particular regions of the world. There are many national and international journal databases, but these two are the most influential.
The research compared the total number of academic journals published in each of the eight UNESCO world regions with the number selected for inclusion in the two indexes. We made use of UlrichsWeb, the most inclusive existing database of academic journals, though even this is not fully comprehensive. The number and overall proportion of journals from different regions listed in UlrichsWeb differ markedly from those indexed in Web of Science and Scopus. As the figure below shows, journals published from Europe are 30-40% more likely to be indexed in Web of Science and Scopus, whereas journals from Central and South Asia, East and South-East Asia and Sub-Saharan Africa are 50-60% less likely to be included.
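As a rough illustration of the kind of comparison involved, the sketch below computes, for each region, a simple representation ratio: the region’s share of indexed journals divided by its share of all journals listed in UlrichsWeb. This is not the authors’ code or their exact metric, and the counts are placeholders chosen only to show the arithmetic, not real figures from the paper.

```python
# Illustrative sketch only: a representation ratio per region, i.e. a region's
# share of journals indexed in Web of Science/Scopus divided by its share of
# all journals listed in UlrichsWeb. All counts below are placeholders.

ulrichsweb_counts = {            # journals listed per UNESCO region (placeholder values)
    "Europe": 1000,
    "Sub-Saharan Africa": 100,
}
index_counts = {                 # journals from each region found in the citation index (placeholders)
    "Europe": 400,
    "Sub-Saharan Africa": 15,
}

total_listed = sum(ulrichsweb_counts.values())
total_indexed = sum(index_counts.values())

for region, listed in ulrichsweb_counts.items():
    share_listed = listed / total_listed
    share_indexed = index_counts.get(region, 0) / total_indexed
    ratio = share_indexed / share_listed if share_listed else float("nan")
    # ratio > 1 means the region is over-represented in the index relative to
    # UlrichsWeb; ratio < 1 means it is under-represented.
    print(f"{region}: representation ratio = {ratio:.2f}")
```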
When Eugene Garfield first launched his Science Citation Index (now Web of Science) in 1965, many were sceptical. Creating an index of journal citations seemed an impossible task. Critics mocked the idea that objectivity could be achieved by ‘not reading the literature’. Others disputed Garfield’s claim that the index fairly represented global science: was citation data from different scientific disciplines and regions comparable?
Garfield had funding to back his ambitions. His company, the Institute for Scientific Information (ISI), already had an important role in the US research information landscape. Many companies and research labs subscribed to Current Contents, the ISI’s profitable journal abstracts service. Securing first US Navy and then NSF backing, Garfield poured ISI profits into building the database on a rudimentary IBM mainframe.
Questions continued to mount. In 1995 Scientific American published an extended critique, entitled ‘Lost Science in the Third World’. It called out SCI’s systematic discrimination against ‘third world’ journals. Garfield robustly rejected all such claims. In one response, published two years later, Garfield argued that ‘many Third World countries suffer by publishing dozens of marginal journals whose reason for being is questionable’. He insisted that he had ‘devoted 40 years to espousing the cause of third world journals’ and the problem lay in the scholars themselves, and their ‘mobility and frequency of contact with peers outside the Third World’.
From 1971 to 1995, Garfield penned a whimsical monthly column in Current Contents, entitled ‘Essays of an Information Scientist’, where he addressed contemporary issues in scientometrics, alongside occasional riffs on jazz. Yet he never fully explained how he chose the journals to include in his original index. In 1971, he wrote about Bradford’s law of scattering, which he renamed ‘Garfield’s law of concentration’. In that essay he cited recent ISI data to argue that ‘500 to 1000 journals will account for 80 to 100% of all journal references’. This was his justification for journal selectivity. He went on to note that ‘the implications of this finding for establishing future libraries, especially in developing countries, should be quite obvious’. For Garfield, citations were facts. He never seemed to acknowledge their gate-keeping, status-seeking, alliance-building and enemy-defeating role. In Science in Action, Latour called this the ‘context of citations’, whereby ‘one text acts on others to keep them more in keeping with its claims’.
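For readers unfamiliar with Bradford’s law of scattering, a textbook statement is sketched below; the notation is ours, not Garfield’s, and is offered only to clarify the concentration argument he was making.

```latex
% Bradford's law of scattering, in its common textbook form (our notation,
% not Garfield's own formulation). Rank the journals in a field by the number
% of relevant articles each publishes, then divide them into a small core and
% successive zones so that every zone yields roughly the same number of
% articles. The number of journals in each successive zone grows geometrically:
\[
  1 \;:\; n \;:\; n^{2} \;:\; \cdots
\]
% A small core of journals thus supplies a large share of the literature,
% which is the concentration Garfield invoked to justify a selective index.
```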
The debate about the robustness of the indexes’ journal coverage and selection processes continues. Building on earlier work on English-language and disciplinary bias, our paper provides a scientometric analysis of these regional disparities, showing how both citation indexes disproportionately index English-language publications, journals published from Europe and journals in the life sciences.

Yet there are growing concerns about the integrity of these indexes, under pressure from paper mills and so-called ‘predatory’ journals. In 2023, Web of Science delisted 50 journals and announced it was reviewing 450 more. Both indexes have turned to AI, elaborate algorithms and data-based selection processes to vet candidate journals and exclude underperforming serials. The metrics tide is beginning to turn against such ‘black boxing’ of data. In April 2024, the launch of the Barcelona Declaration on Open Research Information called for a ‘fundamental change’ in the research landscape, and for ‘information used in research evaluations to be accessible and auditable by those being assessed’. Whether making data open can help tackle global citation inequalities remains to be seen.
This post draws on the authors’ paper, Regional disparities in Web of Science and Scopus journal coverage, published in Scientometrics.
Image Credit: Reproduced with permission of the authors.
A significant proportion of my publications over the last 50 years have been refutations, refuting papers or research programmes. These do not get a lot of citations (460, according to Academia). My aim is that anyone who reads them will switch to another research programme, another theoretical approach, or evidence that has not been faked. Some research programmes dropped from the literature as soon as I published.
What I do get is reads. Most appear to be because my papers are on postgraduate reading lists – half a dozen to a dozen readers from the same university at the beginning of the year. The payoff is that people do not waste three years of their life, or even three months, doing meaningless research, or building research on fraud. I get this information from ResearchGate and Academia. Obviously, the coverage is limited: reading on the journal web page is not counted. Recently a journal told me that my paper was the most-read paper of the year for that journal, with five times as many reads as ResearchGate and Academia reported.
In one week recently, for instance, ResearchGate reported that more than 200 researchers and students from 12 countries read 10 of my comments and refutations; seven of these were written more than 20 years ago. One, written 38 years ago, gets readers every week; one 45 years old is again on reading lists in a different discipline and gets readers every month; and occasionally even older papers are still read.
Some of these papers may have saved millions of pounds, even millions of lives.
Note that people read these. One doubts if most authors have read the papers they cite, let alone tried to learn from them. Research misconduct.
So counting citations is classic bad research, using the information that is easily available, rather than the information needed. Just what we are warned about in Statistics 1.
Peter Bowbrick