3.1 Assessing how well an author is cited
3.2 Assessing how far journals and books are cited
3.3 Who cites a little or a lot: Hub and authority patterns
So far we have focused chiefly on finding out which parts of an academic’s outputs are being cited and achieving influence. Once this information is collated, it is then possible to look at a range of different indicators or measures of success.
Some of the concepts discussed in this section (like the h-index versus the g-index) may sound overly technical or complex. In fact, all of the indicators we discuss here are relatively straightforward: each is useful in capturing one facet of the complex picture of academic impact. Any single indicator will have some things it does well, along with some limitations that need to be borne in mind. The most useful approach is to take a small set of indicators and create a well-balanced view of an individual’s citations profile.
We first consider the indicators which are useful in assessing an academic’s citations record. We next consider how indicators of a journal’s success can be useful in deciding where to try and place future articles, and how to assess the comparative dividends from publishing journal articles and from books. Finally we consider who cites a little and a lot in academic disciplines, often discussed in terms of ‘hub’ and ‘authority’ patterns.
3.1 Assessing how well an author is cited
Straightforward totals are the simplest type of indicators for judging how widely a researcher or academic is being cited:
1) An author’s total number of publications is obviously smaller for new researchers, and tends to grow over time. Comparisons are easier with total publications per year measures, starting from someone’s PhD award date. This is easy to do for academics analysing their own records, but PhD dates are difficult to establish for other academics, so total publications per year measures are not readily available on a comparative basis. Clearly there is also a great difference between a short note or report, a full journal article, and an academic book, so any publications head-count that treats each output the same can only be of limited value. In Harzing’s Publish or Perish (HPoP)/Google, publication counts can be distorted by other authors mis-spelling the original author’s name or mis-referencing the title, each of which will register as a separate publication. But the HPoP software hugely improves on Scholar by including a handy facility to merge records together. Simply click on the titles tab to sort entries alphabetically, and then merge duplicate entries for an item into the correctly cited entry for that item.
2) The total number of citations for an author solves this problem somewhat (we’d expect a book to be more cited than a short report). However, citations totals are equally shaped by longevity, and hence normally flatter senior academics relative to new entrants. To meet this problem, HPoP calculates a useful average citations per year index that controls well for senior versus junior staff differences.
3) HPoP also provides an age-weighted citation rate (AWCR) that measures the average number of citations to an entire body of work, adjusted for the number of years since the academic’s first paper was published. The AWCR is very useful, but it only works if publishers enter the dates of their online materials correctly.
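On our reading of the HPoP documentation, the AWCR divides each paper’s citation count by the age of that paper in years and then sums across the whole body of work. A minimal sketch of that calculation in Python (the function name and the one-year minimum age guard are our own choices, not HPoP’s):

```python
def awcr(papers, current_year=2011):
    """Age-weighted citation rate: each paper's citation count divided
    by its age in years, summed over the whole body of work.
    `papers` is a list of (publication_year, citation_count) pairs."""
    total = 0.0
    for year, cites in papers:
        age = max(1, current_year - year + 1)  # guard against bad dates
        total += cites / age
    return total

# Hypothetical record: a 2001 paper with 40 cites, a 2008 paper with 9
print(round(awcr([(2001, 40), (2008, 9)], current_year=2011), 2))  # -> 5.89
```

As the text notes, the result is only as good as the publication dates recorded online: a mis-dated item is weighted by the wrong age.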
Some other apparently straightforward-looking indices raise quite interesting issues about whether they are of any use, because they are not easy to interpret. The key instance is the average citations per item. This may seem a useful statistic for estimating how influential an author’s work is on average, and it does have a certain rudimentary value. However, any mean score like this makes most sense when data are normally distributed, which is rarely true for academic citations data. Most authors will tend to have a few strongly cited pieces that ‘break through’ into being extensively referenced by others, a larger number of medium-cited pieces, and a ‘long tail’ of rarely and barely cited pieces, including some or many that are uncited by anyone. (The more book reviews the author writes in ISI journals, the longer this tail will be).
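The problem with the mean on such skewed data is easy to demonstrate with a toy example. Here the citation counts are purely hypothetical, shaped like the profile just described, and the mean sits well above the ‘typical’ item:

```python
from statistics import mean, median

# Hypothetical citation counts for one author's outputs: a couple of
# 'breakthrough' pieces, some medium-cited items, and a long tail of
# barely cited or uncited pieces.
cites = [120, 85, 30, 14, 9, 7, 4, 3, 2, 1, 1, 0, 0, 0, 0]

print(mean(cites))    # -> 18.4, dragged upward by the two top pieces
print(median(cites))  # -> 3, closer to the 'typical' item
```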
There is extensive evidence for academic disciplines as a whole that patterns of citations of journal articles display a ‘power law’ configuration, such as that shown in Figure 3.1 for physics papers analysed by Sidney Redner. On the left are the small numbers of highly influential papers, and as one moves to the right the number of papers at each successively lower level of cites increases. The vertical axis uses a logarithmic scale here, so that if the distribution approximates a straight line sloping down to the right, this is a strong sign of a power law effect in action.
Figure 3.1: The ‘power law’ effect in the citation of physics journal articles
Compare this distribution with that for five senior professors, whose distributions of publications across rates of citation are shown in Figure 3.2.
Figure 3.2: Publication profiles for five senior social science academics
For this illustration we chose one senior academic from each discipline included in the Project’s database, because their longer career time, plus their greater prominence in their academic disciplines, helps to bring out patterns more clearly. (By contrast, the scantier publication profiles of younger staff are often susceptible to different interpretations.) Among our chosen professors the top-cited publications have from 40 to 250 citations each, but in most cases there are only one or a few such papers or books. The number of publications generally increases as we move into lower citation ranges, with the peak being in items with single or zero citations. There are good grounds for expecting that this kind of broad pattern will be reasonably common across most academics.
Simply taking a mean citations per item score across distributions such as these is clearly not very useful, because the preponderance of single-cited or zero-cited items will produce very low numbers, which capture very little of the real variation in success in being cited across different academics. We need instead some slightly more complex indicators that compute a number by looking across the whole of an author’s outputs:
The h-index has become the most widely used of these indicators. It was suggested by Jorge S. Hirsch and defined by him as follows: ‘A scientist has index h if h of [his/her] Np papers have at least h citations each, and the other (Np − h) papers have at most h citations each’. In case this leaves you none the wiser, an h score of 5 means that the person involved has at least 5 papers which have attracted at least five citations each; and an h score of 10 means they have 10 papers with at least 10 citations each.
Figure 3.3 shows how this approach works. We graph the number of papers an academic has on the horizontal axis against the level of cites achieved on the vertical axis, and then find the point where the resulting curve cuts the ‘parity line’, where the number of cites equals the number of papers at that level of cites. As a physical scientist, Hirsch envisioned that this computation would be done in ISI, where it is easy to do. As we have seen, this is a reasonable approach in physics, where the internal coverage of the ISI database is high. However, for the social sciences we suggest that it is much better carried out in HPoP/Google, which also has the great advantage of computing an h score automatically for all authors. (In the humanities only HPoP should be used at all.) This number is accurately calculated provided that two things are done:
i) Check that no extraneous (similarly named) authors are included in the top publications in the HPoP listing, those close to or above the h-score level. (For authors with very numerous publications, it is not strictly necessary to check the whole listing to ensure an accurate h-score, only down to just below the h-score level).
ii) Check through the full HPoP list to ensure that any duplicate entries for one of the top listed publications have been added to the appropriate entries. If duplicate entries appear lower down the list, this may somewhat depress the h-score level below what it should be.
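Hirsch’s definition translates directly into a few lines of code. The sketch below assumes only a cleaned list of citation counts, one per publication, such as the per-item counts HPoP reports:

```python
def h_index(citation_counts):
    """Largest h such that at least h items have h or more citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank       # this item still sits above the parity line
        else:
            break          # all later items are cited even less
    return h

# Hypothetical citation counts: 4 items have at least 4 cites each
print(h_index([10, 8, 5, 4, 3, 1, 0]))  # -> 4
```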
Figure 3.3: How the h-index works
The huge advantage of the h-index is that it is very robust – it will not be much affected by mis-citations of most pieces and it usually will not move very radically even when corrections are made to clean data. In particular, the index is highly resistant to being influenced by the numbers of low cited or uncited items (where most errors live). Hirsch also claimed that the index summed up in one useful number a measure of how much an academic’s work is valued by her peers, how diverse that individual’s contribution has been, and how sustained it is over time.
So what would a good h-score level be in the social sciences? Probably we can set the maximum feasible level at around 45 to 50 for the greatest international stars across these disciplines, using the HPoP h-scores rather than just the ISI databases.
Our project’s database also suggests that in the social sciences the ranges of h-scores attained by staff at different levels of age and seniority differ markedly, as Figure 3.4 shows for five main disciplines. Taken as a whole, our 20 geographers have the best h scores, closely followed by economists, while law academics have noticeably lower citation scores. These h score variations clearly reflect differences in citations behaviours across disciplines, with more article-based disciplines having higher scores. (On our definitions, geography is also of course regarded as being 50 per cent a physical sciences discipline.) H scores are also almost certainly affected by the sheer sizes of disciplines, and perhaps by other confounding factors. (For instance, because economics lecturers in the UK are generally paid around one third higher academic salaries than others of the same age in other disciplines, they may also be somewhat older on appointment to full-time positions than elsewhere.) Overall, economics and geography professors clearly top the average h score rankings here; and lecturers in these two disciplines have h scores more or less equivalent to those of professors elsewhere in our sample.
The h-score has some limitations. A key one is that your h-score is constrained not just by how many cites you get, but by the simple fact of the number of papers you have had time to produce. The index tends to favour senior people who have had the chance to publish a lot, as well as having had more time for their items to accumulate citations. So it is not surprising that Figure 3.4b shows that h scores vary a lot by rank, with professors generally having more than twice the h-scores of senior lecturers and lecturers. (To counteract the age-bias of the h-score in the social sciences you can use age-weighted benchmarks. The HPoP software calculates an age-weighted version of the h-score that helps comparison across staff of different ranks or ages.) Putting together discipline and rank influences in Figure 3.4c shows a more complicated picture from the mixing of the two factors. Some lecturers (in economics and geography) have h scores above those of law professors and comparable to those of political science and sociology professors. The senior economics lecturers in the IPD, however, have rather low h scores on average.
Figure 3.4: Average h-scores for 120 social science academics in the IPD
A more fundamental critique of the h-score is that it assumes that all academics in a field have the same pattern, such as the cites curve shown in Figure 3.1 and the profiles considered in Figure 3.2. But what if they don’t? Should we not more highly value an academic whose top publications are very highly cited, compared with another academic whose top items are not much more cited than those on the h-score boundary? To address this issue another score – the g index – has been developed. It is a key variant of the h-score, and it was suggested by Leo Egghe to incorporate the effect of very highly cited top publications. It is also automatically calculated by the HPoP software.
To understand how the g score is calculated, we first draw the same graph as for the h-index in Figure 3.3. According to Egghe we then pick ‘the (unique) largest number such that the top g articles received on average at least g citations’. (Note that here what Egghe means by ‘the average’ is the mean.) In practice, we add up the total number of cites for items above the h score limit, and find the mean of this sub-set of well cited publications. If an author has some very highly cited pieces in her top listed h pieces, then their extra impetus operates to raise that person’s g score well above their h score. For instance, for one senior researcher we looked at, the h score in HPoP was 28, but the g-score was 53, almost twice as great. This is because the top cited piece here had over 700 cites, and several more had 100 to 250 cites, thereby strongly raising the mean level of cites across the whole top-cited group. By contrast, if an academic does not have this marked inequality in cites across their different publications, then their h-score and g-score will tend to be much closer together, although the g-score will almost always still be higher. Harzing (2010, p. 13) judges that the g index ‘is a very useful complement to the h index’, and we concur that using the h and g indices in tandem is clearly very helpful.
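Egghe’s definition is usually stated cumulatively: g is the largest number such that the top g items together have received at least g² citations, which is equivalent to the ‘average at least g’ wording quoted above. A sketch with purely hypothetical citation counts, echoing the highly skewed profile just described:

```python
def g_index(citation_counts):
    """Largest g such that the top g items together have >= g*g citations
    (i.e. the top g items average at least g citations each)."""
    ranked = sorted(citation_counts, reverse=True)
    running_total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g

# One 700-cite piece plus several in the 100-250 range lift g well above h
cites = [700, 250, 180, 120, 60, 20, 10, 5, 2, 0]
print(g_index(cites))  # -> 10: every one of the 10 items counts towards g
```

Note that in this sketch g cannot exceed the number of items; some formulations let it run higher by padding the list with fictitious uncited papers.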
3.2 Assessing how far journals and books are cited
In the STEM disciplines, and in social science subjects such as economics and geography, there are strong and straightforward incentives for academics to concentrate on producing peer-reviewed journal articles, as far and away the premier form of output. Journals are also arranged in a clear and well-known hierarchy in terms of their journal impact factors, a rather inadequate proxy for output quality, but still the main determinant of journals’ relative prestige. Books (and even more so book chapters) constitute only a small proportion of research outputs, although a few classic or standard-reference high-end textbooks may also be influential and well cited in the research literature.
By contrast, in some humanities subjects the hierarchy of journals is often rather weakly defined, with multiple specialist outlets. Here books can often be better cited, a pattern that might apply in some of the social sciences as well, such as in sociology and law. Here too external assessors (such as the REF panels in the UK) may assign as much or more weight to books. And promotion committees may expect young academics to make a distinct (‘own voice’) contribution to the discipline by publishing at least one book before being promoted to more senior or tenure-track positions. Hence it is important for academics in these disciplines to assess carefully the likely gains to their citation scores from concentrating solely on journal articles, or from widening their outputs to include books.
On the other hand, it seems clear that book chapters are generally second-order publications, unless the edited collection involved is an especially prestigious or influential one (such as a widely used Handbook for a sub-field). Regular series of edited books in some disciplines may also be well referenced. But normally book chapters will be harder for other authors to find and reference than whole books or journal articles, unless readers actually own the book in question. Because more senior authors in ‘soft’ subjects tend to gravitate towards writing book chapters in later life, and may not sustain journal publications, book chapters may still seem to be well-cited – but we would need to discount here for seniority and cumulative reputational effects to be sure of this.
To shed some more light on these issues, we look next at some preliminary data on citation patterns for 120 academics across five social science disciplines included in the IPD.
Figure 3.5: The importance of different types of outputs in academic citations
(a) Total outputs by type
| Type of Output | Total | Percentage |
| --- | --- | --- |
| All book outputs | 199 | 17 |
| Discussion and Working papers | 126 | 11 |
(b) Variations in the citing of type of outputs across discipline (percentages of all cites per discipline)
| Type of Output | Geography | Political Science | Economics | Law | Sociology | Total | Total % |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All book outputs | 17.5 | 15.8 | 7.4 | 25.7 | 29.9 | 199 | 16.9 |
| Discussion and Working papers | 4.6 | 7.9 | 21.2 | 6.1 | 7.3 | 126 | 10.7 |
Figure 3.5a shows that, looking across all areas, journal articles account for more than three fifths of the more than 1,100 citations included. Books and book chapters are the next most important category, accounting for one in six citations, followed by research and working papers, accounting for a tenth of citations. Perhaps surprisingly, Figure 3.5b shows that journal articles were more important as a source of citations in geography and political science than in economics. However, in economics discussion papers and working papers also accounted for a further fifth of citations, reflecting the longer lags to publication there. Books and book chapters accounted for less than one in twelve citations in economics, around one in six citations in geography and political science, and over a quarter of references in sociology and law. In these last two areas journal articles accounted for only just over half of citations.
We also looked at the patterns of citing for outputs across academics of different ranks in the university hierarchy, as Figure 3.6 shows. Lecturers were cited four fifths of the time for journal articles, but the same was also true of professors, with both groups also showing small shares of cites for working papers. By contrast, senior lecturers were cited more than twice as often for books and book chapters as other academics, although even for this group articles were the main outputs that were extensively cited. This pattern may reflect a concentration of senior lecturers in more teaching-oriented forms of academic work.
Figure 3.6: The origins of inwards citations to social scientists in five disciplines, by university rank and the type of outputs
| Type of Output | Lecturer | Senior Lecturer | Professor |
| --- | --- | --- | --- |
| All book outputs | 13 | 29 | 12 |
| Discussion & Working papers | 6 | 3 | 6 |
| Percentage of all citations | 18.2 | 14.1 | 67.7 |
In numerical terms the predominance of journal articles in citations is unsurprising, because a large majority of academic outputs take this form, and books (and even more so book chapters) are published less frequently. A key question to consider is how publishing books or articles compares in terms of achieving high h-score items, those which fall above the parity line in Figure 3.3. Here the picture is more mixed, because books tend to have a longer shelf life in referencing terms than most articles (see Figure 2.xx) and so may accumulate citations for longer.
In many academic fields where (senior) authors write books (such as political science), it is common to draw attention to a forthcoming book by condensing its key content into one or two rather ‘hard-boiled’ journal articles that show key parts of the argument in a professionally impressive if rather hard-to-understand way. The book itself is less condensed and is written in a somewhat more accessible style, designed more to maximise its audience. The book may also give more details of methods and so on than is feasible in the brief compass of a journal article. Little wonder, then, that the book will tend to be more referenced, and in a wider range of academic media, than its article precursors.
For all these reasons, we hypothesise that in social science disciplines where books remain a regular and important type of output:
- an individual author’s books tend to figure disproportionately in the h-score entries above the parity line, compared with their journal articles;
- an individual’s books also figure disproportionately in the ‘above the line’ h score entries with higher than average citations, and hence they tend to build that person’s g index number;
- an individual’s books rarely accumulate no or only a few (under 5 say) citations;
- whereas some or many journal articles will tend to do so;
- however, chapters in books will also tend to figure disproportionately below the h score parity line, and they may also disproportionately accumulate no or very few (0, 1 or 2) citations.
Currently the IPD offers some supportive indicative evidence for each of these propositions, but their fuller exploration must rest on creating a wider database.
3.3 Who cites a little or a lot: Hub and authority patterns
Network analysis provides some interesting insights into how academics tend to cite and be cited. This line of research originated in the work of Kleinberg (1998) in computer science, exploring which websites link to each other. The approach has expanded greatly in recent years in the social sciences, where researchers try to show how many different kinds of things are inter-connected. For instance, researchers have examined which US Supreme Court decisions cite which other decisions as precedents (Fowler, 2008; Fowler et al, 2007) and how major US universities’ academic departments secure the placement and hiring of their PhDs (see Fowler et al 2007; Fowler and Aksnes 2007). However, network analyses of academic citing behaviours remain the best developed application.
The basic concept of network analysis is to treat the different units (articles or books, individual researchers or whole academic departments) as nodes that are connected to each other by inward or outward citations. Taking the example of individual researchers, an inward citation is a citation to that person, while an outward citation is that academic citing someone else. The number of inward and outward citations flowing into and out of a node gives its degree of centrality.
In network analysis, nodes with a high number of inward citations are regarded as an authority, because other units within the network being analysed identify them as worthwhile links to make. An academic who receives a high number of inward citations is clearly considered an authority by her peers. Typically, an authority will have published key works in the discipline, works that are frequently cited by other academics in order to ground new research – such as classic treatments or standard references. Given that it often takes time for key articles or books to be widely recognised in the discipline, we might expect that authority scholars will generally be older and well-established researchers, usually in high-prestige universities. A scholar who achieves wide peer-recognition initially at a less prestigious university is generally able to move into an Ivy League or other high-prestige university. And indeed, Figure 3.6 shows that in the IPD covering 120 UK social scientists, the xx professors accounted for two thirds of all inward citations, compared with less than a fifth of citations for the numerically largest group, the xx lecturers.
Network theorists also argue that the number of outward citations can be used to indicate whether the work of a given academic is well grounded in the body of academic research. An academic with a high number of outward citations can be considered a hub because she cites and aggregates a set of relevant works in her discipline. Figure 3.7 shows a hypothetical network of academics with inward and outward citations.

Figure 3.7: Network of academic citations

In this figure ‘Academic 1’ is clearly an ‘authority’ because she receives a total of 5 inward citations (represented by the inward-pointing arrows). By contrast, ‘Academic 4’ is a hub because he has 4 outward citations (represented by the outward-pointing arrows).
Young academics will probably have a higher number of outward citations relative to their inward citations, because they are in the early stages of their careers and hence receive fewer citations than well-established academics. Younger staff may also tend to cite more works than established academics, because they are keener to demonstrate diligent scholarship and may feel more pressure to establish that their work is grounded in a comprehensive knowledge of relevant works in their discipline. Senior academics may be more experienced in defining topics narrowly, using a customary range of sources. And they may feel less need to prove knowledge of the literature through comprehensive references.