The practice of self-citations: a longitudinal study

Scientometrics

Abstract

In this article, we discuss the outcomes of an experiment in which we analysed whether, and to what extent, the introduction in 2012 of the new research assessment exercise in Italy (a.k.a. the Italian Scientific Habilitation) affected self-citation behaviour in the Italian research community. The Italian Scientific Habilitation attests to the scientific maturity of researchers and, in Italy as in many other countries, is a requirement for access to a professorship. To this end, we obtained from ScienceDirect 35,673 articles published between 1957 and 2016 by the participants in the 2012 Italian Scientific Habilitation, from which we extracted 1,379,050 citations using Semantic Publishing technologies. Our analysis showed an overall increase in author self-citations (i.e. citations where the citing article and the cited article share at least one author) in several of the 24 academic disciplines considered. However, we found a stronger causal relation between this increase and the rules introduced by the 2012 Italian Scientific Habilitation in 10 of the 24 disciplines analysed.
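To make the definition of author self-citation concrete, the following minimal Python sketch treats each citation as a link between a citing and a cited article (the relational-centric view described in Note 1) and counts, for every citing article, the citations whose two ends share at least one author. The article IDs, author identifiers, and function names are hypothetical; it assumes authors are already uniquely identified, which in practice is the hard part.

```python
# Hypothetical toy data: each citation is a (citing, cited) pair of article IDs,
# i.e. a link between two entities rather than a per-article count.
citations = [
    ("A1", "B1"),
    ("A1", "B2"),
    ("A2", "B1"),
    ("A2", "B3"),
]

# Hypothetical, already-disambiguated author identifiers for each article.
authors = {
    "A1": {"author-1", "author-2"},
    "A2": {"author-3"},
    "B1": {"author-1"},
    "B2": {"author-4"},
    "B3": {"author-3", "author-4"},
}


def is_author_self_citation(citing: str, cited: str) -> bool:
    """True if the citing and the cited article share at least one author."""
    return not authors[citing].isdisjoint(authors[cited])


def self_citations_per_article(pairs):
    """Count the author self-citations made by each citing article."""
    counts = {citing: 0 for citing, _ in pairs}
    for citing, cited in pairs:
        if is_author_self_citation(citing, cited):
            counts[citing] += 1
    return counts


counts = self_citations_per_article(citations)
print(counts)  # {'A1': 1, 'A2': 1}
# Mean author self-citations per citing article, the per-population statistic in the appendix tables.
print(sum(counts.values()) / len(counts))  # 1.0
```

Keeping citations as pairs means other self-citation types can be added as further predicates over the same links without changing the counting loop.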

Figs. 1–7

Notes

  1. It is worth mentioning that Glänzel et al. (2006), like other cited studies on self-citations, define the citations of a particular article as the number of other articles that cite it by including it in their reference lists. By contrast, as introduced at the beginning of this section, we use the term citation to denote the link between a citing entity and a cited entity, as defined in Peroni and Shotton (2018a). We call the former definition article-centric, since it depends strictly on the particular article under consideration, and the latter relational-centric, since it concerns only the connection between two entities.

  2. The definitions of these and other kinds of self-citations have been taken from CiTO, the Citation Typing Ontology (http://purl.org/spar/cito), which is part of the SPAR Ontologies (Peroni and Shotton 2018b). To derive the definitions of some of the self-citations described in CiTO, we used Wallace, Larivière and Gingras (2012) and the blog post “Journal self-citations are increasingly biased toward impact factor years” by Ludo Waltman and Caspar Chorus (https://www.cwts.nl/blog?article=n-q2x264).

  3. We decided not to use the ORCiD data we retrieved to test the precision and recall of the family-name heuristics for identifying author self-citations, because ORCiDs covered only some of the authors of the articles considered in the analysis. A robust ORCiD-based gold set would have required every author of every citing and cited article in the set to have an ORCiD. This was not the case for the ORCiD dump we used, in which some citing and cited articles included authors without an assigned ORCiD. In addition, we were aware that the family-name approach to identifying author self-citations can be unreliable for authors from particular countries, for instance several Asian countries where a few family names are very widespread. Even though this situation did not occur in the empirical test we ran, it was possible in principle and could therefore have slightly distorted a limited number of self-citation counts.

  4. We used the Python class LinearRegression defined in the sklearn.linear_model package of scikit-learn (https://scikit-learn.org).
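The family-name heuristic mentioned in Note 3, together with the precision/recall check it would be evaluated with, might look like the following Python sketch. It assumes that two articles form an author self-citation whenever their author lists share a normalised family name (accents stripped, lower-cased); the normalisation rules, the function names, and the small labelled gold set are hypothetical and are not the authors' implementation.

```python
import unicodedata


def normalise(family_name: str) -> str:
    """Strip accents and surrounding spaces, then lower-case a family name."""
    decomposed = unicodedata.normalize("NFKD", family_name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.strip().lower()


def family_name_self_citation(citing_authors, cited_authors) -> bool:
    """Hypothetical heuristic: flag an author self-citation if any family name is shared."""
    citing = {normalise(name) for name in citing_authors}
    cited = {normalise(name) for name in cited_authors}
    return not citing.isdisjoint(cited)


def precision_recall(predicted, gold):
    """Precision and recall of boolean predictions against boolean gold labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical gold set: (citing family names, cited family names, true self-citation?)
gold_set = [
    (["Rossi", "Bianchi"], ["rossi"], True),
    (["Müller"], ["Muller", "Verdi"], True),
    (["Wang", "Neri"], ["Wang"], False),  # homonym: same family name, different people
    (["Verdi"], ["Neri"], False),
]

predicted = [family_name_self_citation(a, b) for a, b, _ in gold_set]
gold = [label for _, _, label in gold_set]
print(precision_recall(predicted, gold))
```

The third pair shows the weakness Note 3 warns about: two different people who share a family name are flagged as a self-citation, which lowers precision while leaving recall untouched.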

References


Acknowledgements

We want to thank our colleagues of the Digital and Semantic Publishing Laboratory at the University of Bologna, namely Angelo Di Iorio and Fabio Vitali, for their support and for discussions on the topic. We would also like to thank Marzia Freo and Alessandra Luati (University of Bologna) for providing the specific statistical background and techniques we used in the analysis presented in this paper. Last but not least, Andrea Bonaccorsi (University of Pisa) provided essential insights on the project, and Riccardo Fini (University of Bologna) made available to us his SQL database containing information about the people who participated in the 2012 Italian Scientific Habilitation. ANVUR partially funded this work.

Author information

Corresponding author

Correspondence to Silvio Peroni.

Appendix

See Tables 1, 2, 3 and 4.

Table 1 The citing articles considered in the experiment (published between 1957 and 2016), split by subject category into two populations: articles published up to and including 2012 [year <= 2012] and articles published after 2012 [year > 2012]. For each population, the table shows the number of articles, the mean number of author self-citations per article, and the related standard deviation, followed by the difference between the two population means with its confidence interval (“ci-low”/“ci-high”).
Table 2 The citing articles considered in the experiment (published between 2009 and 2016), split by subject category into two populations: articles published up to and including 2012 [year <= 2012] and articles published after 2012 [year > 2012]. For each population, the table shows the number of articles, the mean number of author self-citations per article, and the related standard deviation, followed by the difference between the two population means with its confidence interval (“ci-low”/“ci-high”).
Table 3 The citing articles considered in the experiment (published between 1957 and 2016), split by subject category into two populations: articles published up to and including 2012 [year <= 2012] and articles published after 2012 [year > 2012]. For each population, the table shows the number of articles, the mean number of author network self-citations per article, and the related standard deviation, followed by the difference between the two population means with its confidence interval (“ci-low”/“ci-high”).
Table 4 The citing articles considered in the experiment (published between 2009 and 2016), split by subject category into two populations: articles published up to and including 2012 [year <= 2012] and articles published after 2012 [year > 2012]. For each population, the table shows the number of articles, the mean number of author network self-citations per article, and the related standard deviation, followed by the difference between the two population means with its confidence interval (“ci-low”/“ci-high”).
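Tables 1–4 report, for each subject category, the difference between the mean number of self-citations per article in the two populations together with a confidence interval. The captions do not state how the interval was computed, so the following Python sketch only illustrates one common choice under stated assumptions: a 95% interval, the difference taken as (after 2012) minus (up to and including 2012), and a large-sample normal approximation with unequal variances. The per-article counts and the function name are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def diff_of_means_ci(after, before, level=0.95):
    """Difference of means (after - before) with a normal-approximation CI.

    Assumes large samples and uses the unequal-variance standard error
    sqrt(s_after^2 / n_after + s_before^2 / n_before).
    """
    diff = mean(after) - mean(before)
    se = sqrt(stdev(after) ** 2 / len(after) + stdev(before) ** 2 / len(before))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return diff, diff - z * se, diff + z * se


# Hypothetical per-article author self-citation counts for one subject category.
before = [1, 0, 2, 1, 3, 0, 1, 2]  # articles published up to and including 2012
after = [2, 1, 3, 2, 4, 1, 2, 3]   # articles published after 2012

diff, low, high = diff_of_means_ci(after, before)
print(f"difference={diff:.2f}  ci-low={low:.2f}  ci-high={high:.2f}")
```

With populations of thousands of articles, the normal quantile is very close to the Student t quantile; for small categories, a t-based (Welch) interval would be the more conservative choice.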

About this article

Cite this article

Peroni, S., Ciancarini, P., Gangemi, A. et al. The practice of self-citations: a longitudinal study. Scientometrics 123, 253–282 (2020). https://doi.org/10.1007/s11192-020-03397-6

  • DOI: https://doi.org/10.1007/s11192-020-03397-6
