Introduction

This paper introduces a graphical method for detecting indicatively excessive author self-citation and hence, via informed data review, for distinguishing such cases from the genuine performance of the most influential researchers.

Citations are widely considered to be an indicator of a published work’s significance, but how can an analyst confirm whether a high citation count for an individual is a genuine mark of influence or a consequence of other factors, such as extraordinary, even excessive, self-referencing? The question is pertinent because self-citation has featured in recent publications that address the possible misrepresentation of research performance by individuals and, specifically, attempts to game citation scores (Baccini et al. 2019; D’Antuono and Ciavarella 2019; Ioannidis et al. 2019; Kacem et al. 2019; Peroni et al. 2019; Seeber et al. 2019).

Note that in this paper we refer both to self-references and to self-cites, and that they are not the same thing. It should be obvious that citations (to older papers) come from references (in newer papers) and that the totals of references and cites are the same, but it is less obvious that the distributions differ. The rate at which authors self-reference can be as much a guide to cultural norms (and to outliers) as the frequency of self-citing.

Studies of self-citation

The phenomenon of self-citation has long interested bibliometricians, sociologists of science, and scientists and scholars themselves (Kaplan 1965; Meadows and O’Connor 1971; Chubin and Moitra 1975; Narin 1976; Tagliacozzo 1977; Porter 1977; Garfield 1979; Lawani 1982; Peritz 1983; Brooks 1985, 1986; Trimble 1986; Merton 1988; MacRoberts and MacRoberts 1989; Bonzi and Snyder 1991; Bott and Hargens 1991, for example). More recent studies include those of Glänzel and colleagues (Glänzel et al. 2004, 2006; Glänzel and Thijs 2004a, b; Schubert et al. 2006; Thijs and Glänzel 2006; Glänzel 2008) and others (Aksnes 2003; Hyland 2003; Fowler and Aksnes 2007; van Raan 2008; Costas et al. 2010; Leblond 2012; Lin and Huang 2012; Cooke and Donaldson 2014; Ioannidis 2015; Soares et al. 2015; Galvez 2017; Hyland and Jiang 2018; Mishra et al. 2018; Zhao and Strotmann 2018; Kacem et al. 2019; Simoes and Crespo 2020). Waltman’s review of citation impact indicators includes a summary of studies on self-citation (Waltman 2016).

From one perspective, self-citation functions in the same way as any cited reference appended to a paper: it points to publications on which the present work depends, to which it is related, or against which the author positions it. Reasons for self-citation include: “the cumulative nature of individual research, the need for personal gratification, or the value of self-citation as a rhetorical and tactical tool in the struggle for visibility and scientific authority” (Fowler and Aksnes 2007). Such reasons imply behaviors that conform both to Mertonian norms and to the social constructivist interpretation of citation theory (Merton 1942; Kaplan 1965; Moravcsik and Murugesan 1975; Gilbert 1977; Cozzens 1989; Bonzi and Snyder 1991; White 2001; Bornmann and Daniel 2008; Davis 2009; Erikson and Erlandson 2014; Tahamtan and Bornmann 2018, 2019; Aksnes et al. 2019). Small (2004) offered a synthesis of motives for citation, in which citations serve a dual function as “vehicles of peer recognition and constructed symbols for specific original achievements.” In practice, self-citation seems rarely to be an activity located at one pole or the other, disinterestedness (Mertonian) or interestedness (social constructivist), but generally to occupy some middle ground along a spectrum of motivations and meanings. White (2001) described most instances of self-citation as “egocentric without being egotistical” and others have explained or defended the legitimate desire to make one’s work visible to other scientists and scholars, in addition to indicating its relevance (Kaplan 1965; Pichappan and Sarasvady 2002; Hyland 2003; Glänzel et al. 2006; Glänzel 2008; van Raan 2008; Hyland and Jiang 2018; Mishra et al. 2018; van Raan 2019).

Nonetheless, suspicion of self-citation has persisted since the earliest publications on the subject, amidst uncertainty as to whether it should be considered sui generis, discounted, or even removed entirely in any analysis of influence or impact. Such considerations stem from the Mertonian notion of a citation as a repayment of an intellectual debt, which in turn provides reward in the form of community recognition to those so credited (Small 2004). If, therefore, a self-citation is interpreted primarily as self-reward, the distance to self-promotion is short, and that behavior rubs up against the norm of disinterestedness, which proscribes “self-aggrandizement” (Merton 1942). Merton later included humility in one description of the norms of scientific research (Merton 1957). While originality and priority are goals for scientists, self-promotion at the cost of disinterestedness and humility was to Merton, and is today, a violation at least in terms of community ideals if not always in reality (Macfarlane and Cheng 2008; Anderson et al. 2010; Kim and Kim 2018). Humility, noted Merton, serves “to reduce the misbehavior of scientists below the rate that would occur if importance were assigned only to originality and the establishing of priority” (Merton 1957).

Much has changed from the late 20th century to the early 21st.

In 1979, Eugene Garfield noted: “Theoretically, self-citations are a way of manipulating citation rates…. [but] it is quite difficult to use self-citation to inflate a citation count without being rather obvious about it. Any person attempting to do this would have to publish very frequently to make any difference. Given the refereeing system that controls the quality of the scientific literature in the better known journals, the high publication count could be achieved only if the person had a lot to say that was at least marginally significant. Otherwise, the person would be forced into publishing in obscure journals. The combination of a long bibliography of papers published in obscure journals and an abnormally high self-citation count would make the intent so obvious that the technique would be self-defeating” (Garfield 1979).

Narin (1976) saw another possible method for manipulating the citation record: “If citation analysis becomes an accepted, universal method of evaluating research utilization, scientists may conspire with their colleagues to cite one another to effect an increase in their individual citation counts…. The problem will arise if cronyism occurs only in isolated instances. In an isolated instance, citation counts will be highly inflated, leading to overestimates of the influence the scientist has in his field.” While Garfield doubted self-citation would be a problem in research evaluation, Narin imagined the possibility. In fact, later in the 1980s Merton predicted it in conversations with staff at the Institute for Scientific Information (D. Pendlebury, pers. comm.).

During the past two decades concern over the extent of an author’s self-citation has risen in tandem with increased emphasis on citation-related performance indicators in the context of research evaluation and competition for support (Hicks 2012; Wilsdon et al. 2015). The issue is whether, under pressure, scientists and scholars do indeed increase self-citation artificially to boost their potential for rewards (including appointments, promotions, and funding).

Today, both excessive self-citation and citation cartels are real concerns and have been documented (Ioannidis 2015; Fister et al. 2016; Heneberg 2016; Fong and Wilhite 2017; Zaggl 2017; Scarpa et al. 2018; Baccini et al. 2019; Biagioli et al. 2019; Seeber et al. 2019; Biagioli and Lippman 2020). Excessive and artificial forms of self-citation have been discussed and studied for:

  • Individuals, especially in the context of the h index (Hirsch 2005; Schreiber 2007; Engqvist and Frommen 2008; Zhivotovsky and Krutovsky 2008; Gianoli and Molina-Montenegro 2009; Schreiber 2009; Engqvist and Frommen 2010; Minasny et al. 2010; Bartneck and Kokkelmans 2011; Huang and Lin 2011; Ferrara and Romero 2013; Viiu 2016).

  • Journals, especially to game the impact factor (Rousseau 1999; Frandsen 2007; Yu and Wang 2007; Campanario 2011; Wilhite and Fong 2012; Yu et al. 2014; Chorus and Waltman 2016; Heneberg 2016; Yang et al. 2016; Campanario 2018; Yu et al. 2018; Copiello 2019; Ioannidis and Thombs 2019).

  • Institutions (Glänzel et al. 2006; Thijs and Glänzel 2006; Schubert et al. 2006; Hendrix 2009; Costas et al. 2010; Gul et al. 2017).

  • Nations (Aksnes 2003; Glänzel and Thijs 2004a; Glänzel et al. 2006; Schubert et al. 2006; Minasny et al. 2010; Ladle et al. 2012; Adams 2013; Tang et al. 2015; Bakare and Lewison 2017; Bornmann et al. 2018; Scarpa et al. 2018; Baccini et al. 2019; D’Antuono and Ciavarella 2019; Peroni et al. 2019; Shehatta and Al-Rubaish 2019). Scarpa and Baccini and colleagues have reported that the Italian national research assessment exercise (VQR, Research Quality Evaluation), initiated at the beginning of the past decade, provoked strategic behavior among researchers, especially those early in their careers, that resulted in significantly higher rates of self-citation than previously seen (Scarpa et al. 2018; Baccini et al. 2019; Peroni et al. 2019; Seeber et al. 2019), although not all agree with this finding (D’Antuono and Ciavarella 2019).

Moreover, excess is not the only object of study: differences in the rates at which men and women cite themselves are another topic of recent interest (Ghiasi et al. 2016; King et al. 2017; Mishra et al. 2018; Andersen et al. 2019).

Self-citation definitions and contexts

Before considering what rates or levels of self-citation are excessive and suggestive of citation manipulation, we need to consider the definition of self-citation and the factors apart from merit or manipulation that influence the extent of self-citation in the literature.

Many have observed that the term self-citation is frequently used to mean two different forms or modes of self-mention: self-referencing (cited references to one’s own work within the reference lists of one’s own publications) and self-citing (citations from one’s own work among all citations to one’s publications). An author’s self-references are fixed at the moment of publication, and the calculation of a self-referencing rate uses as its denominator all cited references in the publications of the author. By contrast, the calculation of a self-citation rate divides all citations received from self by all citations received from self and others, and both numerator and denominator accumulate over time.
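As an illustration of the distinction, the two rates might be computed as follows. This is a minimal Python sketch over hypothetical paper records; the dictionary fields (`references`, `cited_by`, `authors`) are our own illustrative schema, not that of any citation database.

```python
def self_referencing_rate(author, papers):
    """Share of all cited references in the author's papers that point to
    papers the author also wrote. Fixed once the papers are published."""
    total_refs = self_refs = 0
    for paper in papers:
        for ref in paper["references"]:        # each ref is a cited-paper record
            total_refs += 1
            if author in ref["authors"]:
                self_refs += 1
    return self_refs / total_refs if total_refs else 0.0


def self_citation_rate(author, papers):
    """Share of all citations received by the author's papers that come from
    papers the author also wrote. Accumulates as new citations arrive."""
    total_cites = self_cites = 0
    for paper in papers:
        for citing in paper["cited_by"]:       # each citing-paper record
            total_cites += 1
            if author in citing["authors"]:
                self_cites += 1
    return self_cites / total_cites if total_cites else 0.0
```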

The terminology used to describe these different aspects of self-mention can be confusing: self-referencing is called synchronous and (in terms of author or journal citations) the self-citing rate; self-citation is called diachronous and (in terms of author or journal citations) the self-cited rate (Garfield 1979; Lawani 1982; Todeschini and Baccini 2016). The distinction is important, however, since the two phenomena convey different meanings: “Self-referencing describes how much an author (or group of authors) draws upon their own work to inform the present work…. Self-citations, on the other hand, demonstrate the impact of the work upon the scientific community” (Sugimoto and Larivière 2018). A high rate of self-referencing may reflect research in a specialty area or convey a pattern of a “cohesive and sustained research program” (Cooke and Donaldson 2014). A high level of self-citation may also reflect a certain insularity in terms of area of investigation, but what it does not signal is broad or community-wide influence (MacRoberts and MacRoberts 1989; van Raan 1998; Aksnes 2003; Glänzel et al. 2006). Identifying community-wide influence is a central, though not sole, focus of the work that led to this paper, as will be described below.

The concept of self-citation (self-referencing or self-citation) can also be defined narrowly or broadly, that is, including or not including coauthors (Snyder and Bonzi 1998; Aksnes 2003; Glänzel et al. 2004, 2006; Fowler and Aksnes 2007; Costas et al. 2010; Carley et al. 2013). The narrow definition would require a specific author’s name on both citing and cited documents; the broad view would describe a self-citation as any instance in which any author name appears on the two (scientist B, without scientist A as an author, citing a paper by scientist A and B), sometimes called all-author to all-author self-citation. As will be seen below, this paper concentrates on author self-citation rather than co-author, or all-author to all-author, self-citation (following the approach of Kacem et al. 2019).
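The narrow and broad definitions can be stated compactly in code. This is a hedged illustration using our own argument names, where `citing_authors` and `cited_authors` are the author lists of the citing and cited papers.

```python
def is_author_self_citation(author, citing_authors, cited_authors):
    """Narrow definition: the specific author appears on both papers."""
    return author in citing_authors and author in cited_authors


def is_all_author_self_citation(citing_authors, cited_authors):
    """Broad (all-author to all-author) definition: any author overlap,
    e.g. B alone citing a paper coauthored by A and B."""
    return bool(set(citing_authors) & set(cited_authors))
```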

Benchmarks for self-citation

Qualifiers and confounding issues immediately arise when asking what constitutes excessive or extreme levels of author self-citation.

Is the researcher a senior investigator with a long record of publication? A high self-citation rate may reflect the extent of contributions and the researcher’s focus. A younger person with few publications would not generally exhibit high rates of self-citation. The length of time surveyed certainly affects any reading since self-citations tend to be given earlier by an author than by peers: over time an author’s self-citation rate tends to decline (Tagliacozzo 1977; Aksnes 2003; Glänzel et al. 2004; Costas et al. 2010).

As with all bibliometric indicators, field variation is significant, so much depends on field definition and the alignment of individuals and their work. The nature of research in a field, its level of fractionation, typical citation densities and potentials: all these and more contribute to what may be an expected rate of self-citation for a field or an individual.

In addition, self-citation rises with the number of authors on a publication, as is also seen for citation impact (van Raan 1998; Aksnes 2003; Glänzel and Thijs 2004b; Leblond 2012; Lin and Huang 2012; Larivière et al. 2015; Bornmann 2017). Fowler and Aksnes (2007), in a study that should give pause to all who wish to erase self-citations in an evaluation, demonstrated that self-citations generate citations from others by making publications more visible. Their study found no penalty for self-citation but diminishing returns, beyond a certain point, with respect to generating citations from others. In other words, self-citations are woven into the fabric of scientific and scholarly publishing and communication and, in that sense, their influence cannot be excised from the citation record.

It is argued that, in fact, the status of self-citations is more than that of a stepchild in the family of scientometrics (Schubert 2016). Kacem et al. (2019) also urge that self-citations should remain in a citation record and that what is needed is transparency, not removal. Zhao and Strotmann (2018) argued, on the basis of in-text analysis, that self-citations may be more important than other types of citation. Ioannidis et al. (2019), in publishing a vast compendium of citation data on some 100,000 Highly Cited Researchers, have likewise included, not excluded, self-citation in their analysis and reported its extent by author, employing co-author self-citation rates.

Conceding all variables and interactions, and different opinions about the function and meaning of self-citation, is there a ‘standard’ rate at which self-citation is expected to appear? An early estimate was 8%, based on first author to first author self-citation and modest data availability (Garfield and Sher 1963). That figure is close to the findings of many other researchers in the following five decades. There are too many studies to list, some dealing with broad and others with specific fields, and differing with respect to time windows, but a few may be mentioned with the statistics they report: 9% overall, 15% for the physical sciences, 6% for the social sciences, and 3% for the humanities, using co-author self-citation (Snyder and Bonzi 1998); 11%, using author self-citation (Fowler and Aksnes 2007); roughly 15% of references and 13% of citations, using co-author self-citation (Sugimoto and Larivière 2018); about 13%, using co-author self-citation (Mishra et al. 2018); about 13% as a median, using co-author self-citation (Ioannidis et al. 2019); and 5%, using author self-citation (Kacem et al. 2019). A higher rate of self-citation is expected for co-author than for strict author self-citation, and, in general, a rising rate of co-author self-citation is observed with the growth in average number of authors per paper over time (van Raan 1998; Aksnes 2003; Glänzel and Thijs 2004b).

This study

No clear reference thresholds and no firm consensus on the management of self-citation data have emerged, but opinion now appears to favor leaving self-citations in the data, even in the context of research evaluation, even for individuals, and thus contrary to traditional views, while accounting for them in some manner (Glänzel 2008; Costas et al. 2010; Cooke and Donaldson 2014; Huang and Lin 2011; Schubert 2016; Galvez 2017; Hyland and Jiang 2018; Mishra et al. 2018; Zhao and Strotmann 2018; Ioannidis et al. 2019; Kacem et al. 2019; Peroni et al. 2019).

Our work arose out of a practical exercise, undertaken at Clarivate Analytics since 2014, to identify a group of elite, highly cited scientists and social scientists (Highly Cited Researchers, https://recognition.webofsciencegroup.com/awards/highly-cited/2019/). Year on year, we have witnessed a small but increasing number of instances of the scenario described by Garfield long ago: prodigious publication in low-impact journals accompanied by high levels of author self-citation (Garfield 1979). Since the purpose of analysing highly cited research is to recognize scientists and social scientists with community-wide influence, which such portfolios lack, we introduced a filter to detect and review those whose high citation profile was in fact narrow and substantially self-generated. Our focus was not the detection of gaming, which implies motive, but the identification of “evidence of the peer recognition that is the ultimate coin of the domain of science and scholarship” (Merton 1995), in a way that still permits the inclusion of self-citation, which, as summarized above, is an integral component of research publication and communication and far too difficult to treat on a case-by-case basis within a large dataset.

In this paper we report our investigations as to whether there is a typical or ‘normal’ range of self-citation for each of 21 discipline-based fields employed in Essential Science Indicators (ESI: these are listed in the “Appendix” Table 1), and we describe a graphical test for significant outliers. We show that each field does have its characteristic range of self-citation and that this range extends to relatively high rates in some fields. While marked outliers are detectable, they are not present in all fields and they cannot be excluded by a single universal arbiter such as a percentile.

Table 1 Web of Science Essential Science Indicators (ESI) fields and the abbreviated codes used in tables and figures elsewhere

Even with a graphical test, analytical identification of outliers is only the first step in assessing the data as a reliable indicator of community-wide influence or not. Informed judgment is, as elsewhere in good research management, an essential step in valid decision-making.

Methods

The dataset used for this analysis was the Web of Science list of Highly Cited Researchers for 2019. The list is intended to identify scientists and social scientists who have demonstrated significant community influence through their publication of multiple papers frequently cited by their peers during the last decade. Researchers are selected for their publication output in one or more of the 21 fields used in ESI or across several fields (“Appendix” Table 1).

This dataset has particular value. The publication clusters associated with Highly Cited Researchers have a high level of manual curation and each researcher has multiple papers all of which have many citations. These data are therefore susceptible to a thorough quantitative analysis that is unaffected by small-number effects among papers or cites for any single cluster. Because this dataset includes what are acknowledged to be particularly influential researchers, the results may not be wholly typical of the ‘average’ researcher in the field but should serve as a bellwether for culturally appropriate citation practice. Note, however, that the expected rate of author self-citation for this group is likely to be lower than for researchers in general owing to the high levels of citation (Highly Cited Researchers are at the end of the distribution driven by the laws of cumulative advantage: Price 1976; van Raan 2008).

Highly cited papers are defined for this purpose as those in the top 1% by citations in their ESI field and for their year of publication. For the 2019 Highly Cited Researchers analysis, the papers included in the preliminary data development were those published and cited during 2008–2018.
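A sketch of how such a flag might be computed, in Python with pandas, under the assumption of one row per paper; the column names are illustrative, not the actual Web of Science schema.

```python
import pandas as pd

def flag_highly_cited(df: pd.DataFrame) -> pd.DataFrame:
    """Mark papers in the top 1% by citations within each field-year stratum.
    Assumes columns 'esi_field', 'pub_year' and 'citations' (our names)."""
    thresholds = df.groupby(["esi_field", "pub_year"])["citations"].transform(
        lambda c: c.quantile(0.99)            # 99th-percentile citation count
    )
    out = df.copy()
    out["highly_cited"] = out["citations"] >= thresholds
    return out
```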

The highly cited papers were clustered by author, initially using an algorithmic approach. The preliminary clusters were then cleaned by visual inspection and further aggregated where appropriate.

The number of Highly Cited Researchers to be selected in each field is determined once the bulk of the highly cited papers in each field have been clustered by author. The number of researchers selected for the field is based on the square root of the population of unique authors listed on the field’s highly cited papers (Price 1971; Egghe 1987). Researchers are ranked by publication count within field and then a cut-off is applied at the threshold set by the square root calculation. The threshold number of highly cited papers that determines selection differs by field, with Clinical Medicine requiring the most and Agricultural Sciences, Economics & Business, and Pharmacology & Toxicology typically among the fewest.
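A minimal sketch of this square-root selection rule follows; it is our own illustrative rendering of the procedure described above, not production code.

```python
import math

def select_researchers(author_paper_counts: dict) -> list:
    """Select approximately sqrt(population) researchers, ranked by their
    number of highly cited papers. `author_paper_counts` maps each unique
    author on the field's highly cited papers to their paper count."""
    n_select = round(math.sqrt(len(author_paper_counts)))
    ranked = sorted(author_paper_counts.items(),
                    key=lambda kv: kv[1], reverse=True)
    return [author for author, _ in ranked[:n_select]]
```

The threshold count of highly cited papers implied by this cut-off therefore differs by field, as noted above.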

Through this selection process, over 6000 researchers were recognized: 3517 within the 21 ESI fields; and 2491 in the additional cross-field category created to recognize equivalent contributions across several fields. Although the cross-field Highly Cited Researchers are reviewed in-house with other data, they are not included in the analysis in this paper because of the variance between fields in citation rates.

The Highly Cited Researchers review procedure has the goal of identifying community-wide influence and its focus is the extent of self-citing rather than self-referencing, but both are examined for each putative selectee. The publication portfolio of each of the initially selected Highly Cited Researchers in each ESI field is examined to determine how often on average their highly cited papers:

  1. are cited by later papers on which they also appear as an author (Self-Citing, which reveals whether Highly Cited Researcher status is driven by self-citation rather than wider peer influence);

  2. are citing earlier papers on which they also appear as an author (Self-Referencing, which reveals whether Highly Cited Researchers draw on their own research as distinct from a wider research network).

There are thus two comparator metrics of self-citation activity (inward and outward citations from highly cited papers) for each author cluster. We emphasize the labelling we use for these two because there is potential confusion in the terms ‘self citing’ and ‘self cited’, which have been used elsewhere. We believe that it is more intuitive to link ‘self referencing’ to the analysis of a publication’s reference list.

Rather than setting a percentage of author self-citation above which an individual would be eliminated from consideration, we examined the distributions of author self-citation rates (and author self-referencing rates) for each of the 21 fields to determine true outliers. For example, the top decile of author self-citation could be within normal range for some fields and yet not for others. Furthermore, publication and citation data are universally skewed, and we will show that this is true of the range of self-citation rates across researchers. Consequently, a value above a particular percentile of a field-specific range is not necessarily caused by undue self-citation: it may simply be a relatively high value within the normal range for that field.

We will show below that the range of self-citation rates among Highly Cited Researchers follows a negative binomial distribution (NBD). Because of the skew in such a distribution, an issue arises concerning the application of standard tests for data outliers. What looks like an ‘outlier’ may, for an NBD, simply be an extreme part of the skew.

To aid in interpretation, therefore, we adopted a graphical approach and examined both linear and log plots of referencing and citing rates.

  • The linear analysis provides an accessible dataset for initial interpretation of citation rates.

  • The log plots provide an informative analysis for outlier detection.

The median (M = Q2) and lower (Q1) and upper (Q3) quartile values for percentage self-citation rate among Highly Cited Researchers were calculated for each ESI field.

A threshold for indicative outliers was then introduced to the graphical analysis. Because of the extreme skews, thresholds can only ever be indicative: they are not statistical tests but point to, rather than definitively identify, possible cases of excessive self-citation. In a graphical analysis, the location of the outliers is readily compared to the location (and shape) of the inter-quartile range as a ready indicator of the degree of departure from typical behaviour.

We have evaluated variants on:

$${\text{Threshold}} = {\text{Q3}} + N \times \left( {\text{Q3}} - {\text{Q1}} \right)$$

That is to say: a test threshold is set at N times the inter-quartile range (Q3 − Q1) above the third quartile (Q3). A standard boxplot normally displays whiskers at 1.5 times the inter-quartile range (i.e. N = 1.5 in this equation).

A variety of values for N may be considered: a low threshold is stricter, delineating a relatively large proportion of researchers for review, while a high threshold is conservative, maximising inclusiveness and focussing only on extreme outliers. The practical intention of a graphical method is that the threshold value applied to plotted data should identify exceptional instances of high self-citation compared to the normal range for that field and that set of researchers. The interpretation of ‘exceptional’ will therefore depend on purpose and circumstance.
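A direct rendering of this calculation (a Python sketch with NumPy; `rates` is our illustrative name for one field’s per-researcher percentages):

```python
import numpy as np

def outlier_threshold(rates, n=1.5):
    """Indicative threshold = Q3 + N * IQR over the field's rates."""
    q1, q3 = np.percentile(rates, [25, 75])
    return q3 + n * (q3 - q1)

# The analysis below uses both a low and a high indicative threshold:
# low, high = outlier_threshold(rates, 1.5), outlier_threshold(rates, 3.0)
```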

Note again our strong caveat: a graphical test draws attention to data that may be considered to lie outside behavioural norms for that field, where a statistical test is made problematic by the nature of a skewed distribution, but it still cannot by itself identify any one outlier as egregious.

Results

For each Highly Cited Researcher, we calculated the distribution of the average percentage of references that were to self-papers and of citations that were from self-papers. For this exercise, ISI always uses full author counts rather than any variant form of fractional counting, because the issue is the binary question of whether or not an author has self-cited.

The overall distributions are shown in Fig. 1 with the parameters of a lognormal fit to a negative binomial distribution. It is evident that the modal self-reference share is around 5–7% and the modal self-citation share is below 5%, but this will be shown to vary by field.

Fig. 1

The distributions of the average percentage of (a) self-references in highly cited papers and (b) self-citations from highly cited papers for 3517 Highly Cited Researchers identified in the Web of Science for 2008–2018 publications. Each graph shows the frequency distribution of the original data for the researchers and a computed log-normal fit of these data to a negative binomial distribution
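For readers who wish to reproduce a comparable fit, the sketch below shows one plausible reading, fitting a lognormal with SciPy to synthetic stand-in data; the exact fitting procedure behind Fig. 1 is not specified in the text above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rates = rng.lognormal(mean=1.5, sigma=0.6, size=500)   # synthetic stand-in for
                                                       # per-researcher percentages
positive = rates[rates > 0]                            # lognormal requires > 0
shape, loc, scale = stats.lognorm.fit(positive, floc=0)
x = np.linspace(positive.min(), positive.max(), 200)
fitted_pdf = stats.lognorm.pdf(x, shape, loc=loc, scale=scale)
# `fitted_pdf` can then be overlaid on a histogram of `rates`, as in Fig. 1.
```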

To explore the variation in self-citation by field, a standard visualisation was created to evaluate the profile of the bulk of the population and the location within that profile of the median and upper and lower quartiles. Within each ESI field, each Highly Cited Researcher’s output was displayed in ranked order of the average percentage of citations (to highly cited papers) that were self-citations, or of references (listed in highly cited papers) that were self-references.

The bulk of these values followed a continuous range; towards its high end there was a steeper change in trajectory and substantively higher self-citation rates, with points lying outside the continuous range.

Data are plotted both as a linear and as a log-normal plot. The linear plot is of benefit because it allows ready interpretation and reference back to the original data values. The log plot is of benefit because it is a more appropriate presentation of the data for a negative binomial distribution. Overall, our interest is in maximising our understanding of the data and the status of outliers, which comes through multiple perspectives, rather than in the technical rightness or wrongness of any specific visualisation.
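The following sketch suggests how such a paired linear/log field profile might be drawn with Matplotlib; the function and variable names are illustrative, and the threshold lines follow the IQR formula given above.

```python
import numpy as np
import matplotlib.pyplot as plt

def field_profile(rates, field_code="CHE"):
    """Ranked per-researcher rates for one field, with quartiles and the
    low (1.5 * IQR) and high (3 * IQR) indicative thresholds marked."""
    rates = np.sort(np.asarray(rates))
    q1, med, q3 = np.percentile(rates, [25, 50, 75])
    iqr = q3 - q1
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, scale in zip(axes, ["linear", "log"]):
        ax.plot(np.arange(1, len(rates) + 1), rates, ".", markersize=4)
        for y, label in [(q1, "Q1"), (med, "Median"), (q3, "Q3"),
                         (q3 + 1.5 * iqr, "low threshold"),
                         (q3 + 3.0 * iqr, "high threshold")]:
            ax.axhline(y, linewidth=0.8, linestyle="--")
            ax.annotate(label, (1, y), fontsize=7)
        ax.set_yscale(scale)            # zero rates drop off the log axis
        ax.set_xlabel("researcher rank")
        ax.set_ylabel("% self-citation")
        ax.set_title(f"{field_code} ({scale})")
    return fig
```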

An explanatory and interpretive example for Chemistry, which had 240 Highly Cited Researchers in the 2019 dataset, is shown in Fig. 2. There are four plots: two each (one linear and one log plot) for self-references and self-citations. The rate of self-citation increases progressively across most of the spread of researchers. There is a continuous range from just above zero self-citation to around 9% self-citing and 22% self-referencing. The trajectory changes at the higher end, curving upwards towards much greater self-citation rates. Individual author data points appear as outliers, discontinuous with the central range and lying beyond the calculated test thresholds.

Fig. 2

Self-citation rates for 240 Highly Cited Researchers in the Essential Science Indicators field of Chemistry (CHE), analysing Web of Science data for 2008–2018. Data displayed are, for highly cited papers on which a researcher is an author or co-author, the average percentage of references to the researcher’s earlier papers (self-referencing) and of citations from the researcher’s later publications (self-citing). The same data are shown as a linear plot and a log plot. Each graph includes Q1, Median, Q3 and the lower and upper thresholds (set at 1.5 and 3 times the inter-quartile range, IQR) for indicative outliers. These are labelled only on the first graph

The process captured in Fig. 2 was repeated with the other ESI fields. “Appendix” (Figs. 6, 7, 8, 9) includes all graphs for each field, to illustrate the balance of consistency and variability in self-referencing and self-citing behaviour. Relevant data are summarised for reference in “Appendix” Tables 2 and 3.

Table 2 Key parameters for rates of self-referencing (author references in highly cited papers to own previous papers) by Highly Cited Researchers, grouped by Essential Science Indicators category and with categories ranked by median rate of self-referencing
Table 3 Key parameters for rates of self-citing (subsequent researcher citations to own highly cited papers) by Highly Cited Researchers, grouped by Essential Science Indicators category and with categories ranked by median rate of self-citation

A difference revealed in the analysis by field was the variable extent to which outliers were present and detectable. This suggests that a propensity to unusually high self-citation rates is not distributed uniformly across all research areas. This is very important when considering tests for outliers.

We considered a variety of thresholds, but as visual guidelines we found that the accepted box-plot standard of 1.5 times the inter-quartile range worked well. This is most easily interpreted on the linear plot, whereas statistical sense would suggest that the data should be log-plotted because of the skewed negative binomial distribution. The linear thresholds were therefore plotted on the log plots rather than calculating a log-normal value, which would have been very inclusive, with few indicative outliers.

Summary data for the key parameters of the distribution of average rates for all fields are shown in Fig. 3 (self-referencing) and Fig. 4 (self-citing) and in the “Appendix” (Table 2: self-referencing; Table 3: self-citing) as linear data for ready interpretation of percentages.

Fig. 3

Key parameters for analysing the distributions of the frequency of self-references among Highly Cited Researchers (2008–2018) by Essential Science Indicator fields. These data are also summarised in the “Appendix” Table 2. Fields are ranked by median self-referencing

Fig. 4

Key parameters for analysing the distributions of the frequency of self-citations among Highly Cited Researchers (2008–2018) by Essential Science Indicator fields. These data are also summarised in the “Appendix” Table 3. Fields are ranked by median self-citation rate

Figure 3 provides clear visual confirmation that the self-referencing rate across most fields, ranked here by median, is rather low, with a very consistent pattern that includes all but the three or four most-citing fields. Median rates of self-referencing were generally within a small range, from 7.38% in Economics & Business (ECB) to 11.26% in Social Sciences (SSS), with upper quartiles below 16%. Exceptional medians were Space Science (SPA, 14.1%) and Mathematics (MAT, 15.6%). Pharmacology & Toxicology (PHT) is also an outlier, with a median of 0%; this odd result is driven by frequent group authorship of highly cited papers. As noted in the Introduction, a ‘typical’ self-referencing rate of around 10% was discussed by Garfield (1979) and has since been found by others. We can therefore consider the core referencing behaviour of the overall group of Highly Cited Researchers, as research leaders, to be in accordance with previously established precedents.

The patterns of the key parameters for self-citations (Fig. 4) are superficially similar to those for self-referencing, with a relatively consistent series of median values up to around 6% self-citations. These rates of self-citation are evidently much lower than those of self-referencing, as expected for Highly Cited Researchers because their broad peer influence attracts many cites from others (van Raan 2008), and they are more variable between fields than the self-referencing rates. The upper quartile values rise from 3% (ECB) to 11% (ENG).

The field-specific self-citation profiles, showing a data point for each Highly Cited Researcher and following the style of Fig. 2, are collated in the “Appendix” (Figs. 6, 7, 8, 9). Some descriptive examples illustrate the variation.

Clinical Medicine is the largest field (448 Highly Cited Researchers). The self-citation rate to highly cited papers is relatively consistent and mostly within a central range up to 10% self-citations. “Central range” is used here as a descriptive, non-technical term to indicate the more consistent plateau of similar self-citation rates within each field that encompasses somewhat more than the inter-quartile range for these data. It appears to reflect the typical cultural self-citation behaviour in a field. By comparison, there are points on a higher trajectory and, while these lie largely between the low and high statistical indicators, some exceed the high indicative outlier threshold value.

Engineering (225 Highly Cited Researchers) has a continuous range that climbs to near 20% before a steeper part of the curve begins. The change in trajectory close to the lower indicative outlier threshold, combined with a distinct ‘break’ in the series around 27%, makes it questionable whether the next part of the series is still part of the ‘normal range’ for the field, represents a distinct sub-field with higher self-citing, or is indicative of inappropriate citation behaviour. However, the spread of outlier values that rise above the high threshold may be clearer evidence of unusual citation behaviour well outside the normal range.

Mathematics is a relatively small ESI field (97 Highly Cited Researchers) and is possibly composed of a number of distinct sub-fields (see Bensman et al. 2010). The self-citation pattern differs from that of other sciences, and rates appear generally to be higher. The continuous central range extends up to at least 30% self-citing, a rate well above most other fields. A series of disconnected points, rather more extended than in other fields, lies above this, crossing first the low and then the high indicative outlier threshold. Unusually, self-citations to highly cited papers are very high and actually exceed the self-referencing rates for all other fields. This certainly cautions against any simple test of inappropriate behaviour and confirms the need for informed scrutiny and interpretation.

Physics (194 Highly Cited Researchers) is a field of similar size to Engineering, to which its research culture might also be linked. However, its self-citation pattern is much more similar to that of Clinical Medicine. Almost all individuals’ records lie clearly within the ‘normal’ central range and largely below the low threshold. Although the trajectory changes and the range extends beyond the high threshold, there is no break with the overall distribution.

The correlation between the rates of self-citation to and from the highly cited papers of each researcher was calculated and was highly significant within every field. As examples: for the ESI field of Chemistry (Fig. 2: r = 0.55, N = 240 researchers, P < 0.001) and for the ESI field of Engineering (r = 0.59, N = 225 researchers, P < 0.001). The value of the correlation coefficient was lower in fields with fewer outliers, which suggests these high values drive the correlation, but was still statistically significant.
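A sketch of this test, assuming a Pearson correlation (consistent with the r values reported) over paired per-researcher rates; the argument names are ours.

```python
from scipy import stats

def field_correlation(self_referencing_rates, self_citing_rates):
    """Correlation between the two self-citation measures within one field."""
    r, p = stats.pearsonr(self_referencing_rates, self_citing_rates)
    return r, p

# e.g. for Chemistry the analysis reports r = 0.55 (N = 240, P < 0.001)
```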

It will be noted in Figs. 6, 7, 8 and 9 in the “Appendix” that:

  • The central normal range is similar, except in Engineering and Mathematics;

  • The value of the low statistical threshold is well defined at the high end of the normal range, where the continuous distribution often breaks;

  • Many but not all fields have some outlier values that lie above the high threshold;

  • The number of these outliers is unrelated to the numbers of researchers profiled.

The median self-reference rate and the median self-citation rate for each ESI field are compared in Fig. 5a; Fig. 5b compares the calculated low outlier thresholds (i.e. 1.5 times the inter-quartile range, as for box-plots) for each field.

Fig. 5

The comparative median rates of (a) self-referencing and (b) self-citation analysed by Essential Science Indicator field, and the comparative lower thresholds for indicative outliers in those fields. The self-referencing median rates and outlier thresholds are similar across fields but are more variable for the self-citation indicators. Fields that lie outside the main distribution (Pharmacology, Mathematics and Space Science) have been labelled

In every field except Mathematics, the median and lower indicative threshold self-citing rates (to the highly cited papers) are lower, but more variable, than those of self-referencing (from the highly cited papers) (Fig. 5a, b). The average for ESI Mathematics is, in fact, a high outlier itself, for both self-referencing and even more so for self-citing. This emphasises the need for caution in understanding and interpreting the publication culture of a field. Space Science has an exceptional level of self-referencing, presumably also a consequence of the nature of research in that area.

The field of Pharmacology & Toxicology stands out for a different reason. Self-citations are somewhat lower than in many fields, but self-referencing is extremely low and often zero. This is a consequence of the relatively large number of (often annually refreshed) review papers in these fields.

This appears to make the recognition of outlier values relatively straightforward. Very importantly, however, the data here and in the Appendix show that in some fields, within the central normal range and below the statistical threshold, there can be a well-populated and unproblematic upper decile and even upper percentile.

The outcomes of the analyses of citations to and from highly cited papers broadly concur. We found a significant correlation between the two values for individual researchers. This was particularly true among the outliers; in other words, researchers who self-reference at unusually high rates (above the central range) are also authors of papers for which an unusually high proportion of the citations received are self-citations.

Discussion

The work we report here has focussed on a particular tranche of researchers: individuals with a significant portfolio of papers that all lie within the 1% most highly cited for their field. The extended verification and manual curation applied to these data has enabled us to focus with some accuracy on the citation behaviour that is revealed and, since this group necessarily includes a high proportion of leading and influential researchers, the behaviour reflected in these data may be taken as a signal of established cultural norms. Characteristics revealed by an analysis of researcher portfolios at the level of the Essential Science Indicators fields include:

  • A consistent central or normal range of average self-citation for researchers in every discipline-based category. (As noted above: “central range” is a descriptive, non-technical term used to indicate the consistent plateau of similar Highly Cited Researcher self-citation rates within each field that appears to reflect typical cultural self-citation behaviour.)

  • Observation of the field profile is more informative than any ‘average’ value and, for most fields, the central range lies below 10% self-citation.

  • The highest rates of self-citing in some categories are still within that central range. In every field-based profile, some researcher portfolios are on a steeper part of the curve, with values above but continuous with the central range.

  • Outliers beyond the ‘high-end’ group in some, but not all, categories, with an occurrence unrelated to sample size. These outliers may be distinguished graphically by setting a threshold related to the inter-quartile range of the relevant dataset.

We acknowledge the possible criticism that the group of researchers analysed here may not be wholly representative of their fields, since they are generally likely to be more experienced in building a substantial portfolio of well-cited publications. However, their citation records are substantial enough not to be affected by small variations, and they should be broadly representative of peer expectations about standards and good practice in publication and citation behaviour. Furthermore, many will be seen by more junior peers as exemplars of leadership in their field, and the standards they set would therefore be deemed ‘appropriate’ to the field.

The patterns of self-citation rates suggest a very broad cultural consensus about the balance between building on one’s own work and recognising the influence of others. A clear exception to the general profile is Mathematics, where self-citation rates in the continuous distribution may be well above 30%. The reasons for this require expert, informed interpretation but the degree to which this ‘category’ may capture a series of small, isolated fields has been noted previously (Bensman et al. 2010). The wider implication is that, despite the general and widespread pattern of self-cites across field profiles, care will always be required to interpret the particular statistics of any field-limited sample.

The identification of a fixed fraction, such as the upper decile, as ‘outliers’ is evidently inappropriate. In fields such as Molecular Biology and Psychiatry/Psychology, there are no significant outliers in the sample: every researcher portfolio is included within the continuous range, albeit with some slightly above the core of the ‘normal’ range. Ioannidis et al. (2019) were right to draw attention to the statistical properties of these distributions: they have regular and testable properties. But, in practice, a visual inspection of the full profile for the data set is required to complete a valid interpretation.

The IQR-related threshold tested in this analysis is highly conservative: it seeks to attach as many researcher portfolios as possible to the central range and to establish where the potentially most egregious outliers are located. More radical approaches would identify more of the researchers close to but above the high end of the central range as requiring investigation, but this could perhaps also enable a focus on individual cases where an unusually high rate of self-cites may be justifiable for topic-specific reasons.

The key to interpretation is the need to determine the relationship between summary citation impact and fundamental research influence. A paper with an exceptionally high ratio of self- to total-cites is clearly not reflecting the same degree of influence as one with a similar citation count where most of the cites are from other researchers.

As Garfield (1979) noted four decades ago, we can also take into account the venues in which the work appears. The publication channels for many Highly Cited Researchers are the journals readily recognised as leading in their field, but this is not the case for some prolific self-citers. This raises questions about editorial and refereeing standards and the ability of the research base to manage its own affairs. If there are indeed a significant number of individuals who abuse the conventional research system by inappropriate practice then why does the system not detect and sanction this at review before publication? Or is the management of the journals also a problem?

Finally, attention must be paid to the presence of co-authors on publications that have self-citation rates significantly above expectations. In the past, as here, attention has been given to ‘self-citation’ in isolation, but in fact the co-authors also benefit. If a co-author has established a track record that depends in fact on egregious self-citation by their colleagues, then their record is no more valid than that of the self-citer, of whose behaviour they must be aware.

This final point is important because it changes our focus for the future. This analysis, like others before, has been about individual self-citation by researchers. In fact, the problem for the community is the research papers to which attention is falsely drawn by high citation counts generated by the author or authors. If these papers were suppressed then this would not only be an appropriate signal to the offender and their co-authors but also serve to remove offensive noise from the research information system.