How much is too much? The difference between research influence and self-citation excess

Abstract

Citations can be an indicator of publication significance, utility, attention, visibility or short-term impact but analysts need to confirm whether a high citation count for an individual is a genuine reflection of influence or a consequence of extraordinary, even excessive, self-citation. It has recently been suggested there may be increasing misrepresentation of research performance by individuals who self-cite inordinately to achieve scores and win rewards. In this paper we consider self-referencing and self-citing, describe the typical shape of self-citation patterns for carefully curated publication sets authored by 3517 Highly Cited Researchers and quantify the variance in the distribution of self-citation rates within and between all 21 Essential Science Indicators’ fields. We describe both a generic level of median self-referencing rates, common to most fields, and a graphical, distribution-driven assessment of excessive self-citation that demarcates a threshold not dependent on statistical tests or percentiles (since for some fields all values are within a central ‘normal’ range). We describe this graphical procedure for identifying exceptional self-citation rates but emphasize the necessity for expert interpretation of the citation profiles of specific individuals, particularly in fields with atypical self-citation patterns.

Introduction

This paper introduces a graphical method for detecting indications of excessive author self-citation and hence distinguishing this, via informed data review, from the true performance of the most influential researchers.

Citations are widely considered to be an indicator of a published work’s significance but how can an analyst confirm whether a high citation count for an individual is a genuine reflection of influence or a consequence of other factors such as extraordinary, even excessive, self-referencing? The question is pertinent because self-citation has featured in recent publications that address the possible misrepresentation of research performance by individuals and, specifically, attempts to game citation scores (Baccini et al. 2019; D’Antuono and Ciavarella 2019; Ioannidis et al. 2019; Kacem et al. 2019; Peroni et al. 2019; Seeber et al. 2019).

Note, in this paper, that we refer to self-references and to self-cites and that they are not the same thing. It should be obvious that citations (to older papers) come from references (in newer papers) and that the totals of references and cites are the same, but it is less obvious that the distributions differ. The rate at which authors self-reference can be a guide to cultural norms (and to outliers) as much as is the frequency of self-citing.

Studies of self-citation

The phenomenon of self-citation has long interested bibliometricians, sociologists of science, and scientists and scholars themselves (Kaplan 1965; Meadows and O’Connor 1971; Chubin and Moitra 1975; Narin 1976; Tagliacozzo 1977; Porter 1977; Garfield 1979; Lawani 1982; Peritz 1983; Brooks 1985, 1986; Trimble 1986; Merton 1988; MacRoberts and MacRoberts 1989; Bonzi and Snyder 1991; Bott and Hargens 1991, for example). More recent studies include those of Glänzel and colleagues (Glänzel et al. 2004, 2006; Glänzel and Thijs 2004a, b; Schubert et al. 2006; Thijs and Glänzel 2006; Glänzel 2008) and others (Aksnes 2003; Hyland 2003; Fowler and Aksnes 2007; van Raan 2008; Costas et al. 2010; Leblond 2012; Lin and Huang 2012; Cooke and Donaldson 2014; Ioannidis 2015; Soares et al. 2015; Galvez 2017; Hyland and Jiang 2018; Mishra et al. 2018; Zhao and Strotmann 2018; Kacem et al. 2019; Simoes and Crespo 2020). Waltman’s review of citation impact indicators includes a summary of studies on self-citation (Waltman 2016).

From one perspective, self-citation functions in the same way as any cited reference appended to a paper: it points to publications on which the present work depends, is related, or is positioned by the author. Reasons for self-citation include: “the cumulative nature of individual research, the need for personal gratification, or the value of self-citation as a rhetorical and tactical tool in the struggle for visibility and scientific authority” (Fowler and Aksnes 2007). Such reasons imply behaviors that conform both to Mertonian norms and also to the social constructivist interpretation of citation theory (Merton 1942; Kaplan 1965; Moravcsik and Murugesan 1975; Gilbert 1977; Cozzens 1989; Bonzi and Snyder 1991; White 2001; Bornmann and Daniel 2008; Davis 2009; Erikson and Erlandson 2014; Tahamtan and Bornmann 2018, 2019; Aksnes et al. 2019). Small (2004) offered a synthesis of motives for citation, in which citations serve a dual function as “vehicles of peer recognition and constructed symbols for specific original achievements.” In practice, self-citation seems rarely to be an activity located at one pole or the other—disinterestedness (Mertonian) or interestedness (social constructivist)—but generally to occupy some middle ground along a spectrum of motivations and meanings. White (2001) described most instances of self-citation as “egocentric without being egotistical” and others have explained or defended the legitimate desire to make one’s work visible to other scientists and scholars, in addition to indicating its relevance (Kaplan 1965; Pichappan and Sarasvady 2002; Hyland 2003; Glänzel et al. 2006; Glänzel 2008; van Raan 2008; Hyland and Jiang 2018; Mishra et al. 2018; van Raan 2019).

Nonetheless, since the earliest publications on self-citation, suspicion of self-citation has persisted amidst uncertainty as to whether it is in fact to be considered sui generis, discounted, or even removed entirely in any analysis of influence or impact. Such considerations stem from the Mertonian notion of a citation as a repayment of an intellectual debt, which in turn provides reward in the form of community recognition to those so credited (Small 2004). If, therefore, a self-citation is interpreted primarily as self-reward, self-promotion is not far away, and that behavior rubs up against the norm of disinterestedness which proscribes “self-aggrandizement” (Merton 1942). Merton later included humility in one description of the norms of scientific research (Merton 1957). While originality and priority are goals for scientists, self-promotion at the cost of disinterestedness and humility was to Merton, and is today, a violation at least in terms of community ideals if not always in reality (Macfarlane and Cheng 2008; Anderson et al. 2010; Kim and Kim 2018). Humility, noted Merton, serves “to reduce the misbehavior of scientists below the rate that would occur if importance were assigned only to originality and the establishing of priority” (Merton 1957).

Much has changed from the late 20th century to the early 21st.

In 1979, Eugene Garfield noted: “Theoretically, self-citations are a way of manipulating citation rates…. [but] it is quite difficult to use self-citation to inflate a citation count without being rather obvious about it. Any person attempting to do this would have to publish very frequently to make any difference. Given the refereeing system that controls the quality of the scientific literature in the better known journals, the high publication count could be achieved only if the person had a lot to say that was at least marginally significant. Otherwise, the person would be forced into publishing in obscure journals. The combination of a long bibliography of papers published in obscure journals and an abnormally high self-citation count would make the intent so obvious that the technique would be self-defeating” (Garfield 1979).

Narin (1976) saw another possible method for manipulating the citation record: “If citation analysis becomes an accepted, universal method of evaluating research utilization, scientists may conspire with their colleagues to cite one another to effect an increase in their individual citation counts…. The problem will arise if cronyism occurs only in isolated instances. In an isolated instance, citation counts will be highly inflated, leading to overestimates of the influence the scientist has in his field.” While Garfield doubted self-citation would be a problem in research evaluation, Narin imagined the possibility. In fact, later in the 1980s Merton predicted it in conversations with staff at the Institute for Scientific Information (D. Pendlebury, pers. comm.).

During the past two decades concern over the extent of an author’s self-citation has risen in tandem with increased emphasis on citation-related performance indicators in the context of research evaluation and competition for support (Hicks 2012; Wilsdon et al. 2015). The issue is whether, under pressure, scientists and scholars do indeed increase self-citation artificially to boost their potential for rewards (including appointments, promotions, and funding).

Today, both excessive self-citation and citation cartels are real concerns and have been documented (Ioannidis 2015; Fister et al. 2016; Heneberg 2016; Fong and Wilhite 2017; Zaggl 2017; Scarpa et al. 2018; Baccini et al. 2019; Biagioli et al. 2019; Seeber et al. 2019; Biagioli and Lippman 2020). Excessive and artificial forms of self-citation have been discussed and studied for:

  • Individuals, especially in the context of the h index (Hirsch 2005; Schreiber 2007; Engqvist and Frommen 2008; Zhivotovsky and Krutovsky 2008; Gianoli and Molina-Montenegro 2009; Schreiber 2009; Engqvist and Frommen 2010; Minasny et al. 2010; Bartneck and Kokkelmans 2011; Huang and Lin 2011; Ferrara and Romero 2013; Viiu 2016).

  • Journals, especially to game the impact factor (Rousseau 1999; Frandsen 2007; Yu and Wang 2007; Campanario 2011; Wilhite and Fong 2012; Yu et al. 2014; Chorus and Waltman 2016; Heneberg 2016; Yang et al. 2016; Campanario 2018; Yu et al. 2018; Copiello 2019; Ioannidis and Thombs 2019).

  • Institutions (Glänzel et al. 2006; Thijs and Glänzel 2006; Schubert et al. 2006; Hendrix 2009; Costas et al. 2010; Gul et al. 2017).

  • Nations (Aksnes 2003; Glänzel and Thijs 2004a; Glänzel et al. 2006; Schubert et al. 2006; Minasny et al. 2010; Ladle et al. 2012; Adams 2013; Tang et al. 2015; Bakare and Lewison 2017; Bornmann et al. 2018; Scarpa et al. 2018; Baccini et al. 2019; D’Antuono and Ciavarella 2019; Peroni et al. 2019; Shehatta and Al-Rubaish 2019). Scarpa and Baccini and colleagues have reported that the Italian national research assessment exercise (VQR, Research Quality Evaluation), initiated at the beginning of the past decade, provoked strategic behavior among researchers, especially those early in their careers, that resulted in significantly higher rates of self-citation than previously seen (Scarpa et al. 2018; Baccini et al. 2019; Peroni et al. 2019; Seeber et al. 2019), although not all agree with this finding (D’Antuono and Ciavarella 2019).

Moreover, excess is not the only object of study: differences in the rate at which men and women cite themselves is another topic of recent interest (Ghiasi et al. 2016; King et al. 2017; Mishra et al. 2018; Andersen et al. 2019).

Self-citation definitions and contexts

Before considering what rates or levels of self-citation are excessive and suggestive of citation manipulation, we need to consider the definition of self-citation and the factors apart from merit or manipulation that influence the extent of self-citation in the literature.

Many have observed that the term self-citation is frequently used to mean two different forms or modes of self-mention: self-referencing (cited reference to self within a list in one’s own publication); and self-citing (citations from one’s own work among all citations to one’s publications). An author’s self-references are a fixed ratio at a given time, and the calculation of a self-referencing rate uses as a denominator all cited references in the publications of the author. By contrast the calculation of a self-citation rate divides all citations to self by all citations received from self and others, and both accumulate over time.
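As a compact restatement of the two denominators just described, the rates can be written as follows. The symbols are ours, introduced only for illustration: $R_{\text{self}}$ and $R_{\text{all}}$ denote the author's self-references and all cited references in the author's publications, while $C_{\text{self}}$ and $C_{\text{all}}$ denote citations received from the author's own work and citations received from all sources.

$$\text{self-referencing rate} = \frac{R_{\text{self}}}{R_{\text{all}}}, \qquad \text{self-citation rate} = \frac{C_{\text{self}}}{C_{\text{all}}}$$

For a hypothetical author with 12 self-references among 150 cited references, and 30 self-citations among 600 citations received, the rates would be 8% and 5% respectively.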

The terminology used to describe these different aspects of self-mention can be confusing: self-referencing is called synchronous and (in terms of author or journal citations) the self-citing rate; self-citation is called diachronous and (in terms of author or journal citations) the self-cited rate (Garfield 1979; Lawani 1982; Todeschini and Baccini 2016). The distinction is important, however, since the two phenomena convey different meanings: “Self-referencing describes how much an author (or group of authors) draws upon their own work to inform the present work…. Self-citations, on the other hand, demonstrate the impact of the work upon the scientific community” (Sugimoto and Larivière 2018). A high rate of self-referencing may reflect research in a specialty area or convey a pattern of a “cohesive and sustained research program” (Cooke and Donaldson 2014). A high level of self-citation may also reflect a certain insularity in terms of area of investigation, but what it does not signal is broad or community-wide influence (MacRoberts and MacRoberts 1989; van Raan 1998; Aksnes 2003; Glänzel et al. 2006). Identifying community-wide influence is a central, though not sole, focus of the work that led to this paper, as will be described below.

The concept of self-citation (self-referencing or self-citation) can also be defined narrowly or broadly, that is, including or not including coauthors (Snyder and Bonzi 1998; Aksnes 2003; Glänzel et al. 2004, 2006; Fowler and Aksnes 2007; Costas et al. 2010; Carley et al. 2013). The narrow definition would require a specific author’s name on both citing and cited documents; the broad view would describe a self-citation as any instance in which any author name appears on the two (scientist B, without scientist A as an author, citing a paper by scientist A and B), sometimes called all-author to all-author self-citation. As will be seen below, this paper concentrates on author self-citation rather than co-author, or all-author to all-author, self-citation (following the approach of Kacem et al. 2019).
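The difference between the narrow and broad definitions can be made concrete with a minimal sketch. The function names and the single-letter author labels are hypothetical, and real data would first require author disambiguation, which is not attempted here.

```python
def is_author_self_citation(author, citing_authors, cited_authors):
    """Narrow definition: the named author appears on both documents."""
    return author in citing_authors and author in cited_authors


def is_all_author_self_citation(citing_authors, cited_authors):
    """Broad definition: any author name is shared by the two documents."""
    return bool(set(citing_authors) & set(cited_authors))


# Scientist B, without scientist A, cites a paper by scientists A and B.
cited_paper = {"A", "B"}
citing_paper = {"B", "C"}

print(is_author_self_citation("A", citing_paper, cited_paper))   # False
print(is_author_self_citation("B", citing_paper, cited_paper))   # True
print(is_all_author_self_citation(citing_paper, cited_paper))    # True
```

Under the narrow view the citation counts as a self-citation for B only; under the broad view it counts as a self-citation for the cited paper as a whole, and therefore also for A.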

Benchmarks for self-citation

Qualifiers and confounding issues immediately arise when asking what constitutes excessive or extreme levels of author self-citation.

Is the researcher a senior investigator with a long record of publication? A high self-citation rate may reflect the extent of contributions and the researcher’s focus. A younger person with few publications would not generally exhibit high rates of self-citation. The length of time surveyed certainly affects any reading since self-citations tend to be given earlier by an author than by peers: over time an author’s self-citation rate tends to decline (Tagliacozzo 1977; Aksnes 2003; Glänzel et al. 2004; Costas et al. 2010).

As with all bibliometric indicators, field variation is significant, so much depends on field definition and the alignment of individuals and their work. The nature of research in a field, its level of fractionation, typical citation densities and potentials: all these and more contribute to what may be an expected rate of self-citation for a field or an individual.

In addition, self-citation rises with number of authors on a publication, as is also seen for citation impact (van Raan 1998; Aksnes 2003; Glänzel and Thijs 2004b; Leblond 2012; Lin and Huang 2012; Larivière et al. 2015; Bornmann 2017). Fowler and Aksnes (2007), in a study that should give pause to all who wish to erase self-citations in an evaluation, demonstrated that self-citations generate citations from others by making publications more visible. Their study found no penalty for self-citation but diminishing returns with respect to generating citations from others beyond a certain point. In other words, self-citations are implicitly woven into the fabric of scientific and scholarly publishing and communication and, in that sense, their influence cannot be excised from the citation record.

It is argued that, in fact, the status of self-citations is more than that of a stepchild in the family of scientometrics (Schubert 2016). Kacem et al. (2019) also urge that self-citations should remain in a citation record and that what is needed is transparency, not removal. Zhao and Strotmann (2018) argued, on the basis of in-text analysis, that self-citations may be more important than other types of citation. Ioannidis et al. (2019), in publishing a vast compendium of citation data on some 100,000 Highly Cited Researchers, have likewise included, not excluded, self-citation in their analysis and reported its extent by author, employing co-author self-citation rates.

Conceding all variables and interactions, and different opinions about the function and meaning of self-citation, is there a ‘standard’ rate at which self-citation is expected to appear? An early estimate was 8%, based on first author to first author self-citation and modest data availability (Garfield and Sher 1963). That figure is close to the findings of many other researchers in the following five decades. There are too many studies to list, some dealing with broad and others with specific fields, and differing with respect to time windows, but a few may be mentioned with the statistics they report: 9% overall, 15% for the physical sciences, 6% for the social sciences, and 3% for the humanities, using co-author self-citation (Snyder and Bonzi 1998); 11%, using author self-citation (Fowler and Aksnes 2007); roughly 15% of references and 13% of citations, using co-author self-citation (Sugimoto and Larivière 2018); about 13%, using co-author self-citation (Mishra et al. 2018); about 13% as a median, using co-author self-citation (Ioannidis et al. 2019); and 5%, using author self-citation (Kacem et al. 2019). A higher rate of self-citation is expected for co-author than for strict author self-citation, and, in general, a rising rate of co-author self-citation is observed with the growth in average number of authors per paper over time (van Raan 1998; Aksnes 2003; Glänzel and Thijs 2004b).

This study

No clear reference thresholds and no firm consensus on the management of self-citation data have emerged, but opinion now appears to favor leaving self-citations in the data (even in the context of research evaluation, even for individuals, and thus contrary to traditional views) but accounting for them in some manner (Glänzel 2008; Costas et al. 2010; Cooke and Donaldson 2014; Huang and Lin 2011; Schubert 2016; Galvez 2017; Hyland and Jiang 2018; Mishra et al. 2018; Zhao and Strotmann 2018; Ioannidis et al. 2019; Kacem et al. 2019; Peroni et al. 2019).

Our work arose out of a practical exercise, undertaken at Clarivate Analytics since 2014, to identify a group of elite, highly cited scientists and social scientists (Highly Cited Researchers, https://recognition.webofsciencegroup.com/awards/highly-cited/2019/). Year on year, we have witnessed a small but increasing number of instances of the scenario described by Garfield long ago: prodigious publication in low-impact journals accompanied by high levels of author self-citation (Garfield 1979). Since the purpose of analysing highly cited research is to recognize scientists and social scientists with community-wide influence, which such portfolios lack, we introduced a filter to detect and review those whose high citation profile was in fact narrow and substantially self-generated. Our focus was not the detection of gaming, which carries motive, but rather to identify “evidence of the peer recognition that is the ultimate coin of the domain of science and scholarship” (Merton 1995) and that yet permits the inclusion of self-citation which, as summarized above, proves to be an integral component of research publication and communication and far too difficult to treat on a case-by-case basis within a large dataset.

In this paper we report our investigations as to whether there is a typical or ‘normal range’ of self-citation for each of 21 discipline-based fields employed in Essential Science Indicators (ESI: these are listed in the “Appendix” Table 1) and we describe a graphical test for significant outliers. We show that each field does have its characteristic range of self-citation and that relatively high rates are attached to this range in some fields. While marked outliers are detectable, they are not present in all fields and they cannot be excluded by a single universal arbiter such as a percentile.

Table 1 Web of Science Essential Science Indicators (ESI) fields and the abbreviated codes used in tables and figures elsewhere

Even with a graphical test, analytical identification of outliers is only the first step in assessing the data as a reliable indicator of community-wide influence or not. Informed judgment is, as elsewhere in good research management, an essential step in valid decision-making.

Methods

The dataset used for this analysis was the Web of Science list of Highly Cited Researchers for 2019. The list is intended to identify scientists and social scientists who have demonstrated significant community influence through their publication of multiple papers frequently cited by their peers during the last decade. Researchers are selected for their publication output in one or more of the 21 fields used in ESI or across several fields (“Appendix” Table 1).

This dataset has particular value. The publication clusters associated with Highly Cited Researchers have a high level of manual curation and each researcher has multiple papers all of which have many citations. These data are therefore susceptible to a thorough quantitative analysis that is unaffected by small-number effects among papers or cites for any single cluster. Because this dataset includes what are acknowledged to be particularly influential researchers, the results may not be wholly typical of the ‘average’ researcher in the field but should serve as a bellwether for culturally appropriate citation practice. Note, however, that the expected rate of author self-citation for this group is likely to be lower than for researchers in general owing to the high levels of citation (Highly Cited Researchers are at the end of the distribution driven by the laws of cumulative advantage: Price 1976; van Raan 2008).

Highly cited papers are defined for this purpose as those in the top 1% by citations in their ESI field and for their year of publication. For the 2019 Highly Cited Researchers analysis, the papers included in the preliminary data development were those published and cited during 2008–2018.

The highly cited papers were clustered by author, initially using an algorithmic approach. The preliminary clusters were then cleaned by visual inspection and further aggregated where appropriate.

The number of Highly Cited Researchers to be selected in each field is determined once the bulk of the highly cited papers in each field have been clustered by author. The number of researchers selected for the field is based on the square root of the population of unique authors listed on the field’s highly cited papers (Price 1971; Egghe 1987). Researchers are ranked by publication count within field and then a cut-off is applied at the threshold set by the square root calculation. The threshold number of highly cited papers that determines selection differs by field, with Clinical Medicine requiring the most and Agricultural Sciences, Economics & Business, and Pharmacology & Toxicology typically among the fewest.
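The square-root selection rule can be sketched as follows. This is an illustration under stated assumptions, not the production procedure: the function name and data are hypothetical, the square root is rounded down, and researchers tied at the cut-off count are all retained, neither of which is specified in the text.

```python
import math


def select_by_sqrt_rule(paper_counts):
    """Select researchers from a field using a square-root cut-off.

    paper_counts maps each unique author to their number of highly
    cited papers in the field. Returns (selected_authors, cutoff_count).
    """
    n_authors = len(paper_counts)
    if n_authors == 0:
        return [], 0
    # Assumption: target size is the floor of the square root.
    n_target = math.isqrt(n_authors)
    # Rank by descending paper count, then by name for a stable order.
    ranked = sorted(paper_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    cutoff = ranked[n_target - 1][1]
    # Assumption: everyone at or above the cut-off count is kept.
    selected = [name for name, count in ranked if count >= cutoff]
    return selected, cutoff


# Hypothetical field with 16 unique authors of highly cited papers.
counts_list = [9, 7, 7, 5, 5, 3, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1]
counts = {f"author_{i:02d}": c for i, c in enumerate(counts_list)}
selected, cutoff = select_by_sqrt_rule(counts)
print(cutoff, len(selected))  # 5 5
```

In this toy field the square root of 16 gives a target of four researchers, the fourth-ranked researcher has five papers, and a fifth researcher tied on five papers is retained as well.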

Through this selection process, over 6000 researchers were recognized: 3517 within the 21 ESI fields; and 2491 in the additional cross-field category created to recognize equivalent contributions across several fields. Although the cross-field Highly Cited Researchers are reviewed in-house with other data, they are not included in the analysis in this paper because of the variance between fields in citation rates.

The Highly Cited Researchers review procedure has the goal of identifying community-wide influence and its focus is the extent of self-citing rather than self-referencing, but both are examined for each putative selectee. The publication portfolio of each of the initially selected Highly Cited Researchers in each ESI field is examined to determine how often on average their highly cited papers:

  1. are cited by later papers on which they also appear as an author (Self-Citing, which reveals whether Highly Cited Researcher status is driven by self-citation rather than wider peer influence);

  2. are citing earlier papers on which they also appear as an author (Self-Referencing, which reveals whether Highly Cited Researchers draw on their own research as distinct from a wider research network).

There are thus two comparator metrics of self-citation activity (inward and outward citations from highly cited papers) for each author cluster. We emphasize the labelling we use for these two because there is potential confusion in the terms ‘self citing’ and ‘self cited’, which have been used elsewhere. We believe that it is more intuitive to link ‘self referencing’ to the analysis of a publication’s reference list.
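The two metrics for a single author cluster might be computed along the following lines. The record layout and function name are hypothetical, and the sketch assumes that "on average" means the mean of per-paper percentages; a pooled ratio across all papers would be an equally plausible reading and can give a different value.

```python
from statistics import mean


def average_self_rates(papers):
    """Mean per-paper self-referencing and self-citing rates, in percent.

    Each paper is a dict with the counts 'refs', 'self_refs',
    'cites' and 'self_cites' for one highly cited paper.
    """
    ref_rates = [100 * p["self_refs"] / p["refs"] for p in papers if p["refs"] > 0]
    cite_rates = [100 * p["self_cites"] / p["cites"] for p in papers if p["cites"] > 0]
    self_referencing = mean(ref_rates) if ref_rates else 0.0
    self_citing = mean(cite_rates) if cite_rates else 0.0
    return self_referencing, self_citing


# Hypothetical cluster of two highly cited papers.
cluster = [
    {"refs": 40, "self_refs": 4, "cites": 200, "self_cites": 10},
    {"refs": 50, "self_refs": 10, "cites": 100, "self_cites": 15},
]
self_referencing, self_citing = average_self_rates(cluster)
print(self_referencing, self_citing)  # 15.0 10.0
```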

Rather than setting a percentage of author self-citation above which an individual would be eliminated from consideration, we examined the distributions of author self-citation rates (and author self-referencing rates) for each of the 21 fields to determine true outliers. For example, the top decile of author self-citation could be within normal range for some fields and yet not for others. Furthermore, publication and citation data are universally skewed and we will show that this is true of the range of self-citation rates across researchers. Consequently, it would not be universally true that a particular percentile across a field-specific range is necessarily caused by undue self-citation: it may simply be a relatively high value within normal range in that field.

We will show below that the range of self-citation rates among Highly Cited Researchers follows a negative binomial distribution (NBD). Because of the skew in such a distribution, an issue arises concerning the application of standard tests for data outliers. It is said that one person’s ‘outlier’ may be, for an NBD, simply an extreme part of the skew.

To aid interpretation, therefore, we adopted a graphical approach and examined both linear and log plots and analyses of referencing and citing rates.

  • The linear analysis provides an accessible dataset for initial interpretation of citation rates.

  • The log plots provide an informative analysis for outlier detection.

The median (M = Q2) and lower (Q1) and upper (Q3) quartile values for percentage self-citation rate among Highly Cited Researchers were calculated for each ESI field.
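A minimal sketch of the quartile calculation for one field is given below, using only the Python standard library. The rates are invented, and the choice of the "inclusive" interpolation method is an assumption, since the quantile convention is not stated in the text.

```python
from statistics import quantiles


def field_quartiles(rates):
    """Return (Q1, median, Q3) for percentage self-citation rates."""
    q1, q2, q3 = quantiles(rates, n=4, method="inclusive")
    return q1, q2, q3


# Hypothetical self-citation rates (%) for nine researchers in one field.
field_rates = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
q1, median, q3 = field_quartiles(field_rates)
print(q1, median, q3)  # 4.0 6.0 8.0
```

The single extreme value of 30% does not move any of the three quartiles, which is why they are a convenient description of the central range in a skewed distribution.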

A threshold for indicative outliers was then introduced to the graphical analysis. Because of extreme skews, thresholds can only be indicative under any circumstances. They are not statistical tests: they point to, rather than definitively identify, possible cases of excessive self-citation. In a graphical analysis, the location of the outliers is readily compared to the location (and shape) of the inter-quartile range as a ready indicator of the degree of departure from typical behaviour.

We have evaluated variants on:

$$\text{Threshold} = \text{Q3} + N \times \left( \text{Q3} - \text{Q1} \right)$$

That is to say: a test threshold value is set above the third quartile (Q3) boundary equivalent to N times the inter-quartile range (Q3–Q1). A standard boxplot normally displays 1.5 times the inter-quartile range (i.e. N = 1.5 in this equation).
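The threshold formula translates directly into code. In this sketch the function names and the rates are hypothetical, the quartiles are supplied directly rather than computed, and "above the threshold" is taken to mean strictly greater than it, which is an assumption.

```python
def threshold(q1, q3, n=1.5):
    """Threshold = Q3 + (Q3 - Q1) * N."""
    return q3 + (q3 - q1) * n


def indicative_outliers(rates, q1, q3, n=1.5):
    """Rates lying above the threshold; candidates for expert review only."""
    limit = threshold(q1, q3, n)
    return [r for r in rates if r > limit]


# Hypothetical self-citation rates (%) and quartiles for one field.
rates = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 16.0, 30.0]
q1, q3 = 4.0, 8.0

print(threshold(q1, q3, 1.5))                   # 14.0
print(threshold(q1, q3, 3.0))                   # 20.0
print(indicative_outliers(rates, q1, q3, 1.5))  # [16.0, 30.0]
print(indicative_outliers(rates, q1, q3, 3.0))  # [30.0]
```

Raising N from 1.5 to 3 narrows the flagged set from two researchers to one, which is the trade-off between exclusivity and inclusiveness that the choice of N controls.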

A variety of values for N may be considered, where a low threshold would be more exclusive, delineating a relatively large proportion of researchers to be reviewed, and a high threshold would be conservative, maximising inclusiveness and focussing only on extreme outliers. The practical intention of a graphical method is that the threshold value applied to plotted data should identify exceptional instances of high self-citation compared to the normal range for that field and that set of researchers. The interpretation of ‘exceptional’ will therefore depend on purpose and circumstance.

Note again our strong caveat: a graphical test draws attention to data that may be considered to lie outside behavioural norms for that field, where a statistical test is made problematic by the nature of a skewed distribution, but it still cannot by itself identify any one outlier as egregious.

Results

The distribution of the average percentage of references that were to self-papers and citations that were from self-papers for each Highly Cited Researcher was calculated. For this exercise, ISI always uses full author counts rather than any variant form of fractional counting because the issue is a binary question of whether or not an author has self-cited.

The overall distributions are shown in Fig. 1 with the parameters of a lognormal fit to a negative binomial distribution. It is evident that the modal self-reference share is around 5–7% and the modal self-citation share is less than 5%, but this will be shown to vary by field.

Fig. 1

The distributions of the average percentage of (a) self-references in highly cited papers and (b) self-citations from highly cited papers for 3517 Highly Cited Researchers identified in the Web of Science for 2008–2018 publications. Each graph shows the frequency distribution of the original data for the researchers and a computed log-normal fit of these data to a negative binomial distribution

To explore the variation in self-citation by field, a standard visualisation was created to evaluate the profile of the bulk of the population and the location within that profile of the median and upper and lower quartiles. Within each ESI field, the average rate for each Highly Cited Researcher’s output was displayed in ranked order of the percentage of cites (to highly cited papers) that were self-cites, or of references (listed in highly cited papers) that were self-references.

The bulk of these followed a continuous range towards the high end of which there was a steeper change in trajectory and substantively higher self-citation rates with points lying outside the continuous range.

Data are plotted both as a linear and as a log-normal plot. The linear plot is of benefit because it allows ready interpretation and reference back to the original data values. The log plot is of benefit because it is a more appropriate presentation of the data for a negative binomial distribution. Overall, our interest is in maximising our understanding of the data and the status of outliers, which comes through multiple perspectives, rather than with the technical rightness or wrongness of any specific visualisation.

An explanatory and interpretive example for Chemistry, which had 240 Highly Cited Researchers in the 2019 dataset, is shown in Fig. 2. There are four plots: two each (one linear and one log plot) for self-references and self-citations. The rate of self-citation increases progressively across most of the spread of researchers. There is a continuous range from just above zero self-citation to around 9% self-citing and 22% self-referencing. The trajectory changes at the higher end, curving upwards towards much greater self-citation rates. Individual author data-points appear as outliers, discontinuous with the central range, which lie beyond the calculated test thresholds.

Fig. 2

Self-citation rates for 240 Highly Cited Researchers in the Essential Science Indicators field of Chemistry (CHE), analysing Web of Science data for 2008–2018. Data displayed are the average percentage of citations from (self-referencing to earlier papers) and to (self-citing from later publications) highly cited papers on which a researcher is also an author or co-author. The same data are shown as a linear plot and a log plot. Each graph includes Q1, Median, Q3 and the lower and upper thresholds (set at 1.5 and 3 times the inter-quartile range, IQR) for indicative outliers. These are only labelled on the first graph

The process captured in Fig. 2 was repeated with the other ESI fields. “Appendix” (Figs. 6, 7, 8, 9) includes all graphs for each field, to illustrate the balance of consistency and variability in self-referencing and self-citing behaviour. Relevant data are summarised for reference in “Appendix” Tables 2 and 3.

Table 2 Key parameters for rates of self-referencing (author references in highly cited papers to own previous papers) by Highly Cited Researchers, grouped by Essential Science Indicators category and with categories ranked by median rate of self-citation
Table 3 Key parameters for rates of self-citing (subsequent researcher citations to own highly cited papers) by Highly Cited Researchers, grouped by Essential Science Indicators category and with categories ranked by median rate of self-citation

A difference revealed in the analysis by field was the variable extent to which outliers were present and detectable. This suggests that a propensity to unusually high self-citation rates is not distributed uniformly across all research areas. This is very important when considering tests for outliers.

We considered a variety of thresholds, but as visual guidelines we found that the accepted box-plot standard of 1.5 times the inter-quartile range worked well. This is most easily interpreted on the linear plot, whereas statistical sense would suggest that the data should be log-plotted because of the skewed negative binomial distribution. The linear thresholds were therefore plotted on the log-plots rather than calculating a log-normal value which would have been very inclusive with few indicative outliers.
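As an illustration of the threshold calculation described above, the following minimal Python sketch computes Q1, median, Q3 and two indicative outlier thresholds for a set of rates. It assumes that the thresholds are measured upwards from Q3 (the usual box-plot convention, i.e. Q3 + 1.5 × IQR and Q3 + 3 × IQR) and that quartiles are computed with the "inclusive" method. Neither detail is stated explicitly in the text, and the function names and sample rates are hypothetical.

```python
from statistics import quantiles


def iqr_thresholds(rates, low_k=1.5, high_k=3.0):
    """Return quartiles and indicative outlier thresholds for rates given in %.

    Assumption: thresholds are Q3 + k * IQR (standard box-plot fences).
    """
    q1, median, q3 = quantiles(rates, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "low": q3 + low_k * iqr,    # lower indicative threshold
        "high": q3 + high_k * iqr,  # upper indicative threshold
    }


def classify(rate, thresholds):
    """Place a single rate relative to the two linear thresholds."""
    if rate > thresholds["high"]:
        return "above high threshold"
    if rate > thresholds["low"]:
        return "between thresholds"
    return "central range"


# Hypothetical self-citation rates (%), not data from the paper.
rates = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 9.0, 14.0, 30.0]
t = iqr_thresholds(rates)
for r in sorted(rates)[-3:]:
    print(r, classify(r, t))
```

Because the thresholds are computed on the untransformed percentages, the same two values can be drawn as horizontal lines on both the linear and the log plot, which matches the choice described above.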

Summary data for the key parameters of the distribution of average rates for all fields are shown in Figs. 3 (self-referencing) and 4 (self-citing) and in the “Appendix” (Table 2: self-referencing and Table 3: self-citing) as linear data for ready interpretation of percentages.

Fig. 3

Key parameters for analysing the distributions of the frequency of self-references among Highly Cited Researchers (2008–2018) by Essential Science Indicator fields. These data are also summarised in the “Appendix” Table 2. Fields are ranked by median self-referencing

Fig. 4

Key parameters for analysing the distributions of the frequency of self-citations among Highly Cited Researchers (2008–2018) by Essential Science Indicator fields. These data are also summarised in the “Appendix” Table 3. Fields are ranked by median self-citation rate

Figure 3 presents clear visual confirmation that the self-referencing rate across most fields (shown here by rank order of the median) is rather low, with a very consistent pattern that includes all but the most-citing three or four fields. Median rates of self-referencing were generally within a small range, from 7.38% in Economics & Business (ECB) to 11.26% in Social Sciences (SSS), with upper quartiles below 16%. Exceptional medians were Space Science (SPA, 14.1%) and Mathematics (MAT, 15.6%). Pharmacology & Toxicology (PHT) is also an outlier, with a median of 0%. This odd result is driven by frequent group authorship of highly cited papers. As noted in the Introduction, a ‘typical’ rate of self-referencing of around 10% was discussed by Garfield (1979) and has since been found by others. Thus, we can consider the core referencing behaviour of the overall group of Highly Cited Researchers, as research leaders, to be in accordance with previously established precedents.

The patterns for the key parameters for self-citations (Fig. 4) show a superficially similar pattern to those for self-referencing, with a relatively consistent series of median values up to around 6% self-citations. These rates of self-citation are evidently much lower than those of self-referencing, as expected for Highly Cited Researchers because their broad peer influence attracts many cites from others (van Raan 2008), and they are relatively more variable between fields compared to self-referencing. The upper quartile values rise from 3% (ECB) to 11% (ENG).

The field-specific self-citation profiles, showing a data point for each Highly Cited Researcher and following the style of Fig. 2, are collated in the “Appendix” (Figs. 6, 7, 8, 9). Some descriptive examples illustrate the variation.

Clinical Medicine is the largest field (448 Highly Cited Researchers). The self-citation rate to highly cited papers is relatively consistent and mostly within a central range up to 10% self-citations. “Central range” is used here as a descriptive, non-technical term to indicate the more consistent plateau of similar self-citation rates within each field that encompasses somewhat more than the inter-quartile range for these data. It appears to reflect the typical cultural self-citation behaviour in a field. By comparison, there are points on a higher trajectory and, while these are largely between the low and high statistical indicators, some exceed the high indicative outlier threshold value.

Engineering (225 Highly Cited Researchers) has a continuous range that climbs to near 20% but then has a steeper part of the curve. The change in trajectory close to the lower indicative outlier threshold, combined with a distinct ‘break’ in the series around 27%, makes it questionable whether the next part of the series is still part of the ‘normal range’ for the field, or represents a distinct sub-field with higher self-citing, or is indicative of inappropriate citation behaviour. However, the spread of outlier values that rise above the high threshold may be clearer evidence of unusual citation behaviour well outside the normal range.

Mathematics is a relatively small ESI field (97 Highly Cited Researchers) and it is possibly composed of a number of distinct sub-fields (see Bensman et al. 2010). The self-citation pattern is different to other sciences and rates appear generally to be higher. The continuous central range extends up to at least 30% self-citing: this is a rate well above most other fields. A series of disconnected points, rather more extended than in other fields, lies above this crossing the low and then the high indicative outlier threshold. Unusually, self-citations to highly cited papers are very high and actually exceed the self-referencing rates for all other fields. This certainly cautions against any simple test of inappropriate behaviour and confirms the need for informed scrutiny and interpretation.

Physics (194 Highly Cited Researchers) is a similar sized field to Engineering to which its research culture might also be linked. However, its self-citation pattern is much more similar to Clinical Medicine. Almost all individuals’ records lie clearly within the ‘normal’ central range and largely below the low threshold. Although the trajectory changes and the range extends beyond the high threshold there is no break with the overall distribution.

The correlation between the rates of self-citation to and from the highly cited papers of each researcher was calculated and was highly significant within every field. As examples: for the ESI field of Chemistry (Fig. 2: r = 0.55, N = 240 researchers, P < 0.001) and for the ESI field of Engineering (r = 0.59, N = 225 researchers, P < 0.001). The value of the correlation coefficient was lower in fields with fewer outliers, which suggests these high values drive the correlation, but was still statistically significant.
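To show why coefficients of this size are highly significant at these sample sizes, the sketch below converts r and N into the usual t statistic, t = r·sqrt((N − 2)/(1 − r²)). It assumes that the reported r is a Pearson product-moment coefficient, which the text does not state explicitly. The function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of rates."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * sqrt((n - 2) / (1 - r * r))


# Values of r and N as reported in the text for Chemistry and Engineering.
t_che = t_statistic(0.55, 240)
t_eng = t_statistic(0.59, 225)
print(round(t_che, 2), round(t_eng, 2))
```

Both statistics come out above 10, far beyond the two-sided critical value of roughly 3.3 for P = 0.001 with more than 200 degrees of freedom, which is consistent with the reported P < 0.001.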

It will be noted in Figs. 6, 7, 8 and 9 in the “Appendix” that:

  • The central normal range is similar, except in Engineering and Mathematics;

  • The value of the low statistical threshold is well defined at the high end of the normal range, where the continuous distribution often breaks;

  • Many but not all fields have some outlier values that lie above the high threshold;

  • The number of these outliers is unrelated to the number of researchers profiled.

The median self-reference rate and the median self-citation rate for each ESI field are compared in Fig. 5a, b, which also compares the calculated low outlier threshold (i.e. 1.5 times the inter-quartile range, as for box-plots) for each field.

Fig. 5

The comparative median rates of a self-referencing and b self-citation analysed by Essential Science Indicator field and the comparative lower thresholds for indicative outliers in those fields. The self-referencing median rates and outlier thresholds are similar across fields but these are more variable for the self-citation indicators. Fields that lie outside the main distribution (Pharmacology, Mathematics and Space Science) have been labelled

In every field except Mathematics, the median and the lower indicative threshold for self-citing rates (to the highly cited papers) are lower but more variable than those for self-referencing (from the highly cited papers) (Fig. 5a, b). The average for ESI Mathematics is, in fact, a high outlier itself, for both self-referencing and, even more so, for self-citing. This emphasises the need for caution in understanding and interpreting the publication culture of a field. Space Science has an exceptional level of self-referencing, which is presumably also a consequence of the nature of research in that area.

The field of Pharmacology and Toxicology stands out for a different reason. Self-citations are somewhat lower than in many fields but self-referencing is extremely low and often zero. This is a consequence of the relatively large number of (often annually refreshed) review papers in this field.

This appears to make the recognition of outlier values relatively straightforward. Very importantly, however, the data here and in the Appendix show that in some fields, within the central normal range and below the statistical threshold, there can be a well-populated and unproblematic upper decile and even upper percentile.

The outcomes of the analysis of citations to and from highly cited papers broadly concur. We found that there is a significant correlation between the values for individual researchers. This was particularly true among the outliers. In other words, researchers who self-reference at unusually high rates (above the central range) are also authors on papers where an unusually high proportion of the citations received are self-citations.

Discussion

The work we report here has focussed on a particular tranche of researchers: individuals with a significant portfolio of papers that all lie within the 1% most highly cited for their field. The extended verification and manual curation applied to these data have enabled us to focus with some accuracy on the citation behaviour that is revealed and, since this group necessarily includes a high proportion of leading and influential researchers, the behaviour reflected in these data may be taken as a signal of established cultural norms. Characteristics revealed by an analysis of researcher portfolios at the level of the Essential Science Indicators fields include:

  • A consistent central or normal range of average self-citation for researchers in every discipline-based category. (As noted above: “central range” is a descriptive, non-technical term used to indicate the consistent plateau of similar Highly Cited Researcher self-citation rates within each field that appears to reflect typical cultural self-citation behaviour.)

  • Observation of the field profile is more informative than any ‘average’ value and, for most fields, the central range lies below 10% self-citation.

  • Highest rates of self-citing in some categories that are still within that central range. In every field-based profile, some researcher portfolios are on a steeper part of the curve with values above but continuous with the central range.

  • Outliers beyond the ‘high-end’ group in some but not all categories, and where occurrence is unrelated to sample size. These outliers may be distinguished graphically by setting a threshold related to the inter-quartile range of the relevant dataset.

We agree with a possible criticism that the group of researchers analysed here may not be wholly representative of their field since they are generally likely to be more experienced in building a substantial portfolio of well-cited publications. However, their citation records are substantial enough not to be affected by small variations and they should be broadly representative of peer expectations about standards and good practice in publication and citation behaviour. Furthermore, many will be seen by more junior peers as exemplars of leadership in their field and the standards they set would therefore be deemed ‘appropriate’ to the field.

The patterns of self-citation rates suggest a very broad cultural consensus about the balance between building on one’s own work and recognising the influence of others. A clear exception to the general profile is Mathematics, where self-citation rates in the continuous distribution may be well above 30%. The reasons for this require expert, informed interpretation but the degree to which this ‘category’ may capture a series of small, isolated fields has been noted previously (Bensman et al. 2010). The wider implication is that, despite the general and widespread pattern of self-cites across field profiles, care will always be required to interpret the particular statistics of any field-limited sample.

The identification of a fixed fraction, such as the upper decile, as ‘outliers’ is evidently inappropriate. In fields such as Molecular Biology and Psychiatry/Psychology, there are no significant outliers in the sample: every researcher portfolio is included within the continuous range, albeit with some slightly above the core of the ‘normal’ range. Ioannidis et al. (2019) were right to draw attention to the statistical properties of these distributions: they have regular and testable properties. But, in practice, a visual inspection of the full profile for the data set is required to complete a valid interpretation.
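The contrast between a fixed-fraction cut-off and a distribution-driven threshold can be sketched in a few lines of Python. The example assumes an upper-decile rule (values above the 90th percentile) and a box-plot style threshold of Q3 + 1.5 × IQR. The function names and both samples are hypothetical and are not data from this study.

```python
from statistics import quantiles


def flag_top_decile(rates):
    """Flag every value above the 90th percentile of the sample."""
    p90 = quantiles(rates, n=10, method="inclusive")[-1]
    return [r for r in rates if r > p90]


def flag_iqr(rates, k=1.5):
    """Flag values above Q3 + k * IQR (assumed box-plot convention)."""
    q1, _, q3 = quantiles(rates, n=4, method="inclusive")
    limit = q3 + k * (q3 - q1)
    return [r for r in rates if r > limit]


# Hypothetical field in which every rate sits in one continuous range.
compact = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0]
# The same field with the highest value replaced by a detached point.
with_outlier = compact[:-1] + [25.0]

print(flag_top_decile(compact), flag_iqr(compact))
print(flag_top_decile(with_outlier), flag_iqr(with_outlier))
```

In the compact sample the decile rule still labels the highest value as an "outlier", whereas the IQR-based threshold flags nothing. Only the detached point in the second sample exceeds the IQR-based threshold.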

The IQR-related threshold tested in this analysis is highly conservative: it seeks to attach as many researcher portfolios as possible to the central range and to establish where the potentially most egregious outliers are located. More radical approaches would identify more of the researchers close to but above the high end of the central range as requiring investigation, but this could perhaps also enable a focus on individual cases where an unusually high rate of self-cites may be justifiable for topic-specific reasons.

The key to interpretation is the need to determine the relationship between summary citation impact and fundamental research influence. A paper with an exceptionally high ratio of self- to total-cites is clearly not reflecting the same degree of influence as one with a similar citation count where most of the cites are from other researchers.

As Garfield (1979) noted four decades ago, we can also take into account the venues in which the work appears. The publication channels for many Highly Cited Researchers are the journals readily recognised as leading in their field, but this is not the case for some prolific self-citers. This raises questions about editorial and refereeing standards and the ability of the research base to manage its own affairs. If there are indeed a significant number of individuals who abuse the conventional research system by inappropriate practice then why does the system not detect and sanction this at review before publication? Or is the management of the journals also a problem?

Finally, attention must be paid to the presence of co-authors on publications that have self-citation rates significantly above expectations. In the past, as here, attention has been given to ‘self-citation’ in isolation but in fact the co-authors also benefit. If a co-author has established a track record that depends in fact on egregious self-citation by their colleagues then their record is no more valid than that of the self-citer, of whom they must be aware.

This final point is important because it changes our focus for the future. This analysis, like others before, has been about individual self-citation by researchers. In fact, the problem for the community is the research papers to which attention is falsely drawn by high citation counts generated by the author or authors. If these papers were suppressed then this would not only be an appropriate signal to the offender and their co-authors but also serve to remove offensive noise from the research information system.

References

  1. Adams, J. (2013). The fourth age of research. Nature,497(7451), 557–560. https://doi.org/10.1038/497557a.

  2. Aksnes, D. W. (2003). A macro study of self-citation. Scientometrics,56(2), 235–246. https://doi.org/10.1023/A:1021919228368.

  3. Aksnes, D. W., Langfeldt, L. & Wouters, P. (2019). Citations, citation indicators, and research quality: An overview of basic concepts and theories. Sage Open, 9(1), article number 2158244019829575. https://doi.org/10.1177/2158244019829575.

  4. Andersen, J. P., Schneider, J. W., Jagsi, R. & Nielsen, M. W. (2019). Gender variations in citation distribution in medicine are very small and due to self-citation and journal prestige. eLIFE, 8, article number e45374. https://doi.org/10.7554/elife.45374.

  5. Anderson, M. S., Ronning, E. A., DeVries, R., & Martinson, B. C. (2010). Extending the Mertonian norms: Scientists’ subscription to norms of research. Journal of Higher Education,81(3), 366–393. https://doi.org/10.1353/jhe.0.0095.

  6. Baccini, A., De Nicolao, G., & Petrovich, E. (2019). Citation gaming induced by bibliometric evaluation: A country-level comparative analysis. PLoS ONE, 14(9): article number e0221212. https://doi.org/10.1371/journal.pone.0221212.

  7. Bakare, V., & Lewison, G. (2017). Country over-citation ratios. Scientometrics,113(2), 1199–1207. https://doi.org/10.1007/s11192-017-2490-z.

  8. Bartneck, C., & Kokkelmans, S. (2011). Detecting h-index manipulation through self-citation analysis. Scientometrics,87(1), 85–98. https://doi.org/10.1007/s11192-010-0306-5.

  9. Bensman, S. J., Smolinsky, L. J., & Pudovkin, A. I. (2010). Mean citation rate per article in mathematics journals: Differences from the scientific model. Journal of the American Society for Information Science and Technology,61(7), 1440–1463. https://doi.org/10.1002/asi.21332.

  10. Biagioli, M., Kenney, M., Martin, B. R., & Walsh, J. P. (2019). Academic misconduct, misrepresentation and gaming: A reassessment. Research Policy,48(2), 401–413. https://doi.org/10.1016/j.respol.2018.10.025.

  11. Biagioli, M., & Lippman, A. (Eds.). (2020). Gaming the metrics: Misconduct and manipulation in academic research. Cambridge, MA: MIT Press. ISBN 0 2625 37933.

  12. Bonzi, S., & Snyder, H. W. (1991). Motivations for citations: A comparison of self-citation and citation to others. Scientometrics,21(2), 245–254. https://doi.org/10.1007/BF02017571.

  13. Bornmann, L. (2017). Is collaboration among scientists related to the citation impact of papers because their quality increases with collaboration? An analysis based on data from F1000Prime and normalized citation scores. Journal of the Association for Information Science and Technology,68(4), 1036–1047. https://doi.org/10.1002/asi.23728.

  14. Bornmann, L., Adams, J., & Leydesdorff, L. (2018). The negative effects of citing with a national orientation in terms of recognition: National and international citations in natural-sciences papers from Germany, the Netherlands, and UK. Journal of Informetrics,12(3), 931–949. https://doi.org/10.1016/j.joi.2018.07.009.

  15. Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation,64(1), 45–80. https://doi.org/10.1108/00220410810844150.

  16. Bott, D. M., & Hargens, L. L. (1991). Are sociologists’ publications uncited? Citation rates of journal articles, chapters, and books. American Sociologist,22(2), 147–158.

  17. Brooks, T. A. (1985). Private acts and public objects: an investigation of citer motivations. Journal of the American Society for Information Science,36(4), 223–229. https://doi.org/10.1002/asi.4630360402.

  18. Brooks, T. A. (1986). Evidence of complex citer motivations. Journal of the American Society for Information Science,37(1), 34–36. https://doi.org/10.1002/(SICI)1097-4571(198601)37:1%3c34:AID-ASI5%3e3.0.CO;2-0.

  19. Campanario, J. M. (2011). Large increases and decreases in journal impact factors in only one year: The effect of journal self-citations. Journal of the American Society for Information Science and Technology,62(2), 230–235. https://doi.org/10.1002/asi.21457.

  20. Campanario, J. M. (2018). The journal citation reports (SCI edition) with and without journal self-citations. Profesional de la Información,27(2), 241–253. https://doi.org/10.3145/epi.2018.mar.03.

  21. Carley, S., Porter, A. L., & Youtie, J. (2013). Toward a more precise definition of self-citation. Scientometrics,94(2), 777–780. https://doi.org/10.1007/s11192-012-0745-2.

  22. Chorus, C. & Waltman, L. (2016). A large-scale analysis of impact factor biased journal self-citations. PLoS ONE, 11(8), article number e0161021. https://doi.org/10.1371/journal.pone.0161021.

  23. Chubin, D. E., & Moitra, S. D. (1975). Content analysis of references: Adjunct or alternative to citation counting. Social Studies of Science,5(4), 423–441. https://doi.org/10.1177/030631277500500403.

  24. Cooke, S. J., & Donaldson, M. R. (2014). Self-citation by researchers: Narcissism or an inevitable outcome of a cohesive and sustained research program? Ideas in Ecology and Evaluation,7(1), 1–2. https://doi.org/10.4033/iee.2014.7.1.e.

  25. Copiello, S. (2019). On the skewness of journal self-citations and publisher self-citations: Cues for discussion from a case study. Learned Publishing,32(3), 249–258. https://doi.org/10.1002/leap.1235.

  26. Costas, R., van Leeuwen, T. N., & Bordons, M. (2010). Self-citations at the meso and individual levels: Effects of different calculation methods. Scientometrics,82(3), 517–537. https://doi.org/10.1007/s11192-010-0187-7.

  27. Cozzens, S. E. (1989). What do citations count? The rhetoric-first model. Scientometrics,15(5–6), 437–447. https://doi.org/10.1007/BF02017064.

  28. D’Antuono, P. & Ciavarella, M. (2019). The false myth of the rise in self-citations, and the impressively positive effect of bibliometric evaluations on the increase of the impact of Italian research. Arxiv preprint. https://arxiv.org/abs/1910.02948.

  29. Davis, P. M. (2009). Reward or persuasion? The battle to define the meaning of a citation. Learned Publishing,22(1), 5–11. https://doi.org/10.1087/095315108X378712.

  30. Egghe, L. (1987). An exact calculation of Price’s Law for the Law of Lotka. Scientometrics,11(1–2), 81–97. https://doi.org/10.1007/BF02016632.

  31. Engqvist, L., & Frommen, J. G. (2008). The h-index and self-citations. Trends in Ecology & Evolution,23(5), 250–252. https://doi.org/10.1016/j.tree.2008.01.009.

  32. Engqvist, L., & Frommen, J. G. (2010). New insights into the relationship between the h-Index and self-citations? Journal of the American Society for Information Science and Technology,61(7), 1514–1515. https://doi.org/10.1002/asi.21298.

  33. Erikson, M. G., & Erlandson, P. (2014). A taxonomy of motives to cite. Social Studies of Science,44(4), 625–637. https://doi.org/10.1177/0306312714522871.

  34. Ferrara, E., & Romero, A. E. (2013). Scientific impact evaluation and the effect of self-citations: Mitigating the bias by discounting the h-Index. Journal of the American Society for Information Science and Technology,64(11), 2332–2339. https://doi.org/10.1002/asi.22976.

  35. Fister, I., Fister, I., & Perc, M. (2016). Toward the discovery of citation cartels in citation networks. Frontiers in Physics, 4, article number 49. https://doi.org/10.3389/fphy.2016.00049.

  36. Fong, E. A. & Wilhite, A. W. (2017). Authorship and citation manipulation in academic research. PLoS ONE, 12(12), article number e0187394. https://doi.org/10.1371/journal.pone.0187394.

  37. Fowler, J. H., & Aksnes, D. W. (2007). Does self-citation pay? Scientometrics,72(3), 427–437. https://doi.org/10.1007/s11192-007-1777-2.

  38. Frandsen, T. F. (2007). Journal self-citations: Analyzing the JIF mechanism. Journal of Informetrics,1(1), 47–58. https://doi.org/10.1016/j.joi.2006.09.002.

  39. Galvez, R. H. (2017). Assessing author self-citation as a mechanism of relevant knowledge diffusion. Scientometrics,111(3), 1801–1812. https://doi.org/10.1007/s11192-017-2330-1.

  40. Garfield, E. (1979). Citation indexing: Its theory and application in science, technology, and humanities. New York, NY: Wiley.

  41. Garfield, E., & Sher, I. H. (1963). New factors in the evaluation of scientific literature through citation indexing. American Documentation,14(3), 195–201. https://doi.org/10.1002/asi.5090140304.

  42. Ghiasi, G., Larivière, V., & Sugimoto, C. R. (2016). Gender differences in synchronous and diachronous self-citations. In I. Rafols, J. Molas-Gallart, E. Castro-Martinez & R. Woolley (Eds.), 21st International conference on science and technology indicators (STI 2016), pp. 844–851. ISBN 978-84-9048-519-4.

  43. Gianoli, E., & Molina-Montenegro, M. A. (2009). Insights into the relationship between the h-Index and self-citations. Journal of the American Society for Information Science and Technology,60(6), 1283–1285. https://doi.org/10.1002/asi.21042.

  44. Gilbert, N. G. (1977). Referencing as persuasion. Social Studies of Science,7(1), 113–122.

  45. Glänzel, W. (2008). Seven myths in bibliometrics about facts and fiction in quantitative science studies. Collnet Journal of Scientometrics and Information Management,2(1), 9–17. https://doi.org/10.1080/09737766.2008.10700836.

  46. Glänzel, W., Debackere, K., Thijs, B., & Schubert, A. (2006). A concise review on the role of author self-citations in information science, bibliometrics and science policy. Scientometrics,67(2), 263–277. https://doi.org/10.1556/Scient.67.2006.2.8.

  47. Glänzel, W., & Thijs, B. (2004a). World flash on basic research: The influence of author self-citations on bibliometric macro indicators. Scientometrics,59(3), 281–310. https://doi.org/10.1023/B:SCIE.0000018535.99885.e9.

  48. Glänzel, W., & Thijs, B. (2004b). Does co-authorship inflate the share of self-citations? Scientometrics,61(3), 395–404. https://doi.org/10.1023/B:SCIE.0000045117.13348.b1.

  49. Glänzel, W., Thijs, B., & Schlemmer, B. (2004). A bibliometric approach to the role of author self-citations in scientific communication. Scientometrics,59(1), 63–77.

  50. Gul, S., Shah, T. A., & Shafiq, H. (2017). The prevalence of synchronous self-citation practices at the institutional level. Malaysian Journal of Library & Information Science,22(1), 1–14. https://doi.org/10.22452/mjlis.vol22no1.1.

  51. Hendrix, D. (2009). Institutional self-citation rates: A three-year study of universities in the United States. Scientometrics,81(2), 321–331. https://doi.org/10.1007/s11192-008-2160-2.

  52. Heneberg, P. (2016). From excessive journal self-cites to citation stacking: Analysis of journal self-citation kinetics in search for journals, which boost their scientometric indicators. PLoS ONE, 11(4), article number e0153730. https://doi.org/10.1371/journal.pone.0153730.

  53. Hicks, D. (2012). Performance-based university research funding systems. Research Policy,41(2), 251–261.

  54. Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America,102(46), 16569–16572. https://doi.org/10.1073/pnas.0507655102.

  55. Huang, M. H., & Lin, W. Y. C. (2011). Probing the effect of author self-citations on h index: A case study of environmental engineering. Journal of Information Science,37(5), 453–461. https://doi.org/10.1177/0165551511412027.

  56. Hyland, K. (2003). Self-citation and self-reference: Credibility and promotion in academic publication. Journal of the Association for Information Science and Technology,54(3), 251–259.

  57. Hyland, K., & Jiang, F. K. (2018). Changing patterns of self-citation: Cumulative inquiry or self-promotion? Text & Talk,38(3), 365–387. https://doi.org/10.1515/text-2018-0004.

  58. Ioannidis, J. P. A. (2015). A generalized view of self-citation: Direct, co-author, collaborative, and coercive induced self-citation. Journal of Psychosomatic Research,78(1), 7–11. https://doi.org/10.1016/j.jpsychores.2014.11.008.

  59. Ioannidis, J. P. A., Baas, J., Klavans, R., & Boyack, K. W. (2019). A standardized citation metrics author database annotated for scientific field. PLoS Biology, 17(8), article number e3000384. https://doi.org/10.1371/journal.pbio.3000384.

  60. Ioannidis, J. P. A. & Thombs, B. D. (2019). A user’s guide to inflated and manipulated impact factors. European Journal of Clinical Investigation, article number UNSP e13151.

  61. Kacem, A., Flatt, J. W., & Mayr, P. (2019). Tracking self-citations in academic publishing. bioRxiv preprint, December 23, 2019. https://doi.org/10.1101/2019.12.20.884031.

  62. Kaplan, N. (1965). The norms of citation behavior: Prolegomena to the footnote. American Documentation,16(3), 179–184. https://doi.org/10.1002/asi.5090160305.

  63. Kim, S. Y., & Kim, Y. (2018). The ethos of science and its correlates: An empirical analysis of scientists’ endorsement of Mertonian norms. Science Technology & Society,23(1), 1–24.

  64. King, M. M., Bergstrom, C. T., Correll, S. J., Jacquet, J., & West, J. D. (2017). Men set their own cites high: Gender and self-citation across fields and over time. Socius: Sociological Research for a Dynamic World,3, 1–22. https://doi.org/10.1177/2378023117738903.

  65. Ladle, R. J., Todd, P. A., & Malhado, A. C. M. (2012). Assessing insularity in global science. Scientometrics,93(3), 745–750. https://doi.org/10.1007/s11192-012-0703-z.

  66. Larivière, V., Gingras, Y., Sugimoto, C. R., & Tsou, A. (2015). Team size matters: Collaboration and scientific impact since 1900. Journal of the Association for Information Science and Technology,66(7), 1323–1332. https://doi.org/10.1002/asi.23266.

  67. Lawani, S. M. (1982). On the heterogeneity and classification of author self-citations. Journal of the American Society for Information Science, 33(5), 281–284.

  68. Leblond, M. (2012). Author self-citations in the field of ecology. Scientometrics, 91(3), 943–953. https://doi.org/10.1007/s11192-011-0565-9.

  69. Lin, W. Y. C., & Huang, M. H. (2012). The relationship between co-authorship, currency of references and author self-citations. Scientometrics, 90(2), 343–360. https://doi.org/10.1007/s11192-011-0523-6.

  70. Macfarlane, B., & Cheng, M. (2008). Communism, universalism and disinterestedness: Re-examining contemporary support among academics for Merton’s scientific norms. Journal of Academic Ethics, 6(1), 67–78. https://doi.org/10.1007/s10805-008-9055-y.

  71. MacRoberts, M. H., & MacRoberts, B. R. (1989). Problems of citation analysis: A critical review. Journal of the American Society for Information Science, 40(5), 342–349. https://doi.org/10.1002/(SICI)1097-4571(198909)40:5%3c342:AID-ASI7%3e3.0.CO;2-U.

  72. Meadows, A. J., & O’Connor, J. G. (1971). Bibliographical statistics as a guide to growth points in science. Science Studies, 1(1), 95–99. https://doi.org/10.1177/030631277100100107.

  73. Merton, R. K. (1942). Science and technology in a democratic order. Journal of Legal and Political Sociology, 1, 115–126 (Reprinted in: Merton, R. K. (1973). The Sociology of Science: Theoretical and Empirical Investigations. Chicago, IL: University of Chicago Press, pp. 267–278).

  74. Merton, R. K. (1957). Priorities in scientific discovery: A chapter in the sociology of science. American Sociological Review, 22(6), 635–659. https://doi.org/10.2307/2089193.

  75. Merton, R. K. (1988). The Matthew effect in science. II. Cumulative advantage and the symbolism of intellectual property. Isis, 79, 606–623.

  76. Merton, R. K. (1995). The Thomas theorem and the Matthew effect. Social Forces, 74(2), 379–422. https://doi.org/10.2307/2580486.

  77. Minasny, B., Hartemink, A. E., & McBratney, A. (2010). Individual, country, and journal self-citation in soil science. Geoderma, 155(3–4), 434–438. https://doi.org/10.1016/j.geoderma.2009.12.003.

  78. Mishra, S., Fegley, B. D., Diesner, J., & Torvik, V. I. (2018). Self-citation is the hallmark of productive authors, of any gender. PLoS ONE, 13(9), article number e0195773. https://doi.org/10.1371/journal.pone.0195773.

  79. Moravcsik, M. J., & Murugesan, P. (1975). Some results on function and quality of citations. Social Studies of Science, 5(1), 86–92. https://doi.org/10.1177/030631277500500106.

  80. Narin, F. (1976). Evaluative bibliometrics: The use of publication and citation analysis in the evaluation of scientific activity. Cherry Hill, NJ: Computer Horizons Inc.

  81. Peritz, B. (1983). Are methodological papers more cited than theoretical or empirical ones: the case of sociology. Scientometrics, 5(4), 211–218. https://doi.org/10.1007/BF02019738.

  82. Peroni, S., Ciancarini, P., Gangemi, A., Nuzzolese, A. G., Poggi, F., & Presutti, V. (2019). The practice of self-citations: a longitudinal study. arXiv preprint. https://arxiv.org/abs/1903.06142.

  83. Pichappan, P., & Sarasvady, S. (2002). The other side of the coin: The intricacies of author self-citations. Scientometrics, 54(2), 285–290. https://doi.org/10.1023/A:1016070029935.

  84. Porter, A. L. (1977). Citation analysis: Queries and caveats. Social Studies of Science, 7(2), 257–267. https://doi.org/10.1177/030631277700700207.

  85. Price, D. J. D. (1971). Some remarks on elitism in information and the invisible college phenomenon in science. Journal of the American Society for Information Science, 22(2), 74–75.

  86. Price, D. J. D. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27, 292–306.

  87. Rousseau, R. (1999). Temporal differences in self-citation rates of scientific journals. Scientometrics, 44(3), 521–531. https://doi.org/10.1007/BF02458493.

  88. Scarpa, F., Bianco, V., & Tagliafico, L. A. (2018). The impact of the national assessment exercises on self-citation rate and publication venue: An empirical investigation on the engineering academic sector in Italy. Scientometrics, 117(2), 997–1022. https://doi.org/10.1007/s11192-018-2913-5.

  89. Schreiber, M. (2007). Self-citation corrections for the Hirsch index. EPL, 78(3), article number 30002. https://doi.org/10.1209/0295-5075/78/30002.

  90. Schreiber, M. (2009). The influence of self-citation corrections and the fractionalised counting of multi-authored manuscripts on the Hirsch index. Annalen der Physik, 18(9), 607–621. https://doi.org/10.1002/andp.200910360.

  91. Schubert, A. (2016). Self-citations: The stepchildren of scientometrics? Orvosi Hetilap, 157(32), 1289–1293. https://doi.org/10.1556/650.2016.30443.

  92. Schubert, A., Glänzel, W., & Thijs, B. (2006). The weight of author self-citations. A fractional approach to self-citation counting. Scientometrics, 67(3), 503–514. https://doi.org/10.1556/scient.67.2006.3.11.

  93. Seeber, M., Cattaneo, M., Meoli, M., & Malighetti, P. (2019). Self-citations as strategic response to the use of metrics for career decisions. Research Policy, 48(2), 478–491. https://doi.org/10.1016/j.respol.2017.12.004.

  94. Shehatta, I., & Al-Rubaish, A. M. (2019). Impact of country self-citations on bibliometric indicators and ranking of most productive countries. Scientometrics, 120(2), 775–791. https://doi.org/10.1007/s11192-019-03139-3.

  95. Simoes, N., & Crespo, N. (2020). Self-citations and scientific evaluation: Leadership, influence, and performance. Journal of Informetrics, 14(2), 100990. https://doi.org/10.1016/j.joi.2019.100990.

  96. Small, H. (2004). On the shoulders of Robert Merton: Towards a normative theory of citation. Scientometrics, 60(1), 71–79. https://doi.org/10.1023/B:SCIE.0000027310.68393.bc.

  97. Snyder, H., & Bonzi, S. (1998). Patterns of self-citation across disciplines (1980–1989). Journal of Information Science, 24(6), 431–435. https://doi.org/10.1177/016555159802400606.

  98. Soares, J. A., Bresciani, K., Ramos, T., De Matos, L., Castellane, P., & Cerdeira, M. (2015). A review of the state of the art of self-citations. Journal of Education and Social Policy, 2, 75–79.

  99. Sugimoto, C. R., & Larivière, V. (2018). Measuring research. Oxford: Oxford University Press.

  100. Tagliacozzo, R. (1977). Self-citations in scientific literature. Journal of Documentation, 33(4), 251–265. https://doi.org/10.1108/eb026644.

  101. Tahamtan, I., & Bornmann, L. (2018). Core elements in the process of citing publications: Conceptual overview of the literature. Journal of Informetrics, 12(1), 203–216. https://doi.org/10.1016/j.joi.2018.01.002.

  102. Tahamtan, I., & Bornmann, L. (2019). What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018. Scientometrics, 121(3), 1635–1684. https://doi.org/10.1007/s11192-019-03243-4.

  103. Tang, L., Shapira, P., & Youtie, J. (2015). Is there a clubbing effect underlying Chinese research citation increases? Journal of the Association for Information Science and Technology, 66(9), 1923–1932. https://doi.org/10.1002/asi.23302.

  104. Thijs, B., & Glänzel, W. (2006). The influence of author self-citations on bibliometric meso-indicators: The case of European universities. Scientometrics, 66(1), 71–80. https://doi.org/10.1007/s11192-006-0006-3.

  105. Todeschini, R., & Baccini, A. (2016). Handbook of bibliometric indicators: Quantitative tools for studying and evaluating research. VCH-Wiley. ISBN-13: 978-3527337040.

  106. Trimble, V. (1986). A note on self-citation rates in astronomical papers. Publications of the Astronomical Society of the Pacific, 98(610), 1347–1348.

  107. van Raan, A. F. J. (1998). The influence of international collaboration on the impact of research results: Some simple mathematical considerations concerning the role of self-citations. Scientometrics, 42(3), 423–428. https://doi.org/10.1007/BF02458380.

  108. van Raan, A. F. J. (2008). Self-citation as an impact-reinforcing mechanism in the science system. Journal of the American Society for Information Science and Technology, 59(10), 1631–1643. https://doi.org/10.1002/asi.20868.

  109. van Raan, A. F. J. (2019). Measuring science: Basic principles and application of advanced bibliometrics. In W. Glänzel, H. F. Moed, U. Schmoch, & M. Thelwall (Eds.), Springer handbook of science and technology indicators (pp. 237–280). Cham: Springer.

  110. Viiu, G. A. (2016). A theoretical evaluation of Hirsch-type bibliometric indicators confronted with extreme self-citation. Journal of Informetrics, 10(2), 552–566. https://doi.org/10.1016/j.joi.2016.04.010.

  111. Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365–391. https://doi.org/10.1016/j.joi.2016.02.007.

  112. White, H. D. (2001). Authors as citers over time. Journal of the American Society for Information Science and Technology, 52(2), 87–108. https://doi.org/10.1002/1097-4571(2000)9999:9999%3c:AID-ASI1542%3e3.0.CO;2-T.

  113. Wilhite, A. W., & Fong, E. A. (2012). Coercive citation in academic publishing. Science, 335(6068), 542–543. https://doi.org/10.1126/science.1212540.

  114. Wilsdon, J., Allen, E., Belfiore, E., Campbell, P., Curry, S., Hill, S., et al. (2015). The metric tide: Report of the independent review of the role of metrics in research assessment and management. Higher Education Funding Council for England (HEFCE). https://doi.org/10.13140/RG.2.1.4929.1363.

  115. Yang, D. H., Li, X., Sun, X. X., & Wan, J. (2016). Detecting impact factor manipulation with data mining techniques. Scientometrics, 109(3), 1989–2005. https://doi.org/10.1007/s11192-016-2144-6.

  116. Yu, G., & Wang, L. (2007). The self-cited rate of scientific journals and the manipulation of their impact factors. Scientometrics, 73(3), 321–330. https://doi.org/10.1007/s11192-007-1779-8.

  117. Yu, T., Yu, G., Song, Y., & Wang, M. Y. (2018). Toward the more effective identification of journals with anomalous self-citation. Malaysian Journal of Library & Information Science, 23(2), 25–46. https://doi.org/10.22452/mjlis.vol23no2.2.

  118. Yu, T., Yu, G., & Wang, M. Y. (2014). Classification method for detecting coercive self-citation in journals. Journal of Informetrics, 8(1), 123–135. https://doi.org/10.1016/j.joi.2013.11.001.

  119. Zaggl, M. A. (2017). Manipulation of explicit reputation in innovation and knowledge exchange communities: The example of referencing in science. Research Policy, 46(5), 970–983. https://doi.org/10.1016/j.respol.2017.02.009.

  120. Zhao, D. Z., & Strotmann, A. (2018). In-text function of author self-citations: Implications for research evaluation practice. Journal of the Association for Information Science and Technology, 69(7), 949–952. https://doi.org/10.1002/asi.24046.

  121. Zhivotovsky, L. A., & Krutovsky, K. V. (2008). Self-citation can inflate h-index. Scientometrics, 77(2), 373–375. https://doi.org/10.1007/s11192-006-1716-2.

Acknowledgements

We thank our ISI colleagues, particularly Ross Potter, Gordon Rogers and Josh Schnell, for their advice and suggestions during the development of this work. We also thank an anonymous referee for their comments regarding the ‘typicality’ of highly-cited researchers.

Author information

Corresponding author

Correspondence to Jonathan Adams.

Ethics declarations

Conflict of interest

The authors are employees of the Institute for Scientific Information, which is a part of Clarivate Analytics, the owners of the Web of Science Group.

Appendix

See Tables 1, 2 and 3; Figs. 6, 7, 8 and 9.

Fig. 6

Linear plots by Essential Science Indicators field of the pattern of self-referencing to earlier papers by Highly Cited Researchers identified by analysing Web of Science data for 2008–2018 and to (self-citing from later publications) highly cited papers on which a researcher is also an author or co-author. Each graph includes Q1, Median, Q3 and the lower and upper thresholds (set at 1.5 and 3 times the inter-quartile range, IQR) for indicative outliers (see Fig. 2)

Fig. 7

Log plots by Essential Science Indicators field of the pattern of self-referencing to earlier papers by Highly Cited Researchers identified by analysing Web of Science data for 2008–2018 and to (self-citing from later publications) highly cited papers on which a researcher is also an author or co-author. Each graph includes Q1, Median, Q3 and the lower and upper thresholds (set at 1.5 and 3 times the inter-quartile range, IQR) for indicative outliers (see Fig. 2)

Fig. 8

Linear plots by Essential Science Indicators field of the pattern of self-citation to highly cited papers authored by Highly Cited Researchers identified by analysing Web of Science data for 2008–2018 and to (self-citing from later publications) highly cited papers on which a researcher is also an author or co-author. Each graph includes Q1, Median, Q3 and the lower and upper thresholds (set at 1.5 and 3 times the inter-quartile range, IQR) for indicative outliers (see Fig. 2)

Fig. 9

Log plots by Essential Science Indicators field of the pattern of self-citation to highly cited papers authored by Highly Cited Researchers identified by analysing Web of Science data for 2008–2018 and to (self-citing from later publications) highly cited papers on which a researcher is also an author or co-author. Each graph includes Q1, Median, Q3 and the lower and upper thresholds (set at 1.5 and 3 times the inter-quartile range, IQR) for indicative outliers (see Fig. 2)
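The quartiles and outlier thresholds named in the captions of Figs. 6 to 9 can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that both thresholds are measured upward from Q3 (the usual Tukey convention, which the captions do not state explicitly), that quartiles are computed with the standard library's inclusive method, and that the function names and sample rates are hypothetical.

```python
from statistics import quantiles


def outlier_thresholds(rates):
    """Return Q1, median, Q3 and two upper thresholds for a list of rates.

    Assumption: thresholds sit at Q3 + 1.5 * IQR and Q3 + 3 * IQR.
    """
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as the whole population rather than a sample of it.
    q1, median, q3 = quantiles(rates, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_threshold": q3 + 1.5 * iqr,
        "upper_threshold": q3 + 3.0 * iqr,
    }


def classify(rate, thresholds):
    """Label a single rate against the two thresholds."""
    if rate > thresholds["upper_threshold"]:
        return "extreme"
    if rate > thresholds["lower_threshold"]:
        return "indicative"
    return "typical"


# Hypothetical self-citation rates (percent) for one field, not real data.
sample_rates = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 40.0]
thresholds = outlier_thresholds(sample_rates)
for rate in sample_rates:
    print(rate, classify(rate, thresholds))
```

A rate flagged by such a rule is only a prompt for expert review of the individual's citation profile, not a verdict, particularly in fields with atypical self-citation patterns.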

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Szomszor, M., Pendlebury, D.A. & Adams, J. How much is too much? The difference between research influence and self-citation excess. Scientometrics 123, 1119–1147 (2020). https://doi.org/10.1007/s11192-020-03417-5

Keywords

  • Self-citation
  • Citation analysis
  • Citation distribution
  • Outliers
  • Research evaluation
  • Self-references