The practice of self-citations: a longitudinal study

Peroni, Silvio; Ciancarini, Paolo; Gangemi, Aldo; Nuzzolese, Andrea Giovanni; Poggi, Francesco; Presutti, Valentina

doi:10.1007/s11192-020-03397-6

Published: 22 February 2020

Volume 123, pages 253–282 (2020)
Scientometrics

Abstract

In this article, we discuss the outcomes of an experiment in which we analysed whether, and to what extent, the introduction in 2012 of the new research assessment exercise in Italy (a.k.a. the Italian Scientific Habilitation) affected self-citation behaviours in the Italian research community. The Italian Scientific Habilitation attests to the scientific maturity of researchers and, in Italy as in many other countries, is a requirement for access to a professorship. To this end, we obtained from ScienceDirect 35,673 articles published between 1957 and 2016 by the participants in the 2012 Italian Scientific Habilitation, from which we extracted 1,379,050 citations using Semantic Publishing technologies. Our analysis showed an overall increase in author self-citations (i.e. citations where the citing article and the cited article share at least one author) in several of the 24 academic disciplines considered. However, we identified a stronger causal relation between this increase and the rules introduced by the 2012 Italian Scientific Habilitation in 10 of the 24 disciplines analysed.
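The abstract defines an author self-citation as a citation in which the citing and the cited article share at least one author, and the notes mention that authors were matched with heuristics based on family names. The exact normalisation rules are not given in this excerpt, so the Python sketch below assumes one plausible choice (case-folding, stripping accents and non-letter characters). The function names and the sample names are hypothetical, not the authors' code or data.

```python
import unicodedata
from typing import Iterable


def normalise_family_name(name: str) -> str:
    """Case-fold a family name and drop accents and non-letter characters (assumed rules)."""
    # NFKD splits "ä" into "a" plus a combining diaeresis, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters only, so "Di Iorio" and "Di-Iorio" map to the same key.
    return "".join(ch for ch in without_marks.casefold() if ch.isalpha())


def is_author_self_citation(citing_names: Iterable[str], cited_names: Iterable[str]) -> bool:
    """True if the citing and cited article share at least one normalised family name."""
    citing = {normalise_family_name(n) for n in citing_names} - {""}
    cited = {normalise_family_name(n) for n in cited_names} - {""}
    return not citing.isdisjoint(cited)


def count_author_self_citations(citing_names: Iterable[str], references) -> int:
    """Count how many references of one citing article are author self-citations."""
    citing = list(citing_names)  # materialise once so a generator is not exhausted
    return sum(is_author_self_citation(citing, cited) for cited in references)


if __name__ == "__main__":
    # Hypothetical family names, not data from the study.
    citing = ["Rossi", "Bianchi"]
    references = [["Rossi", "Verdi"], ["Müller"], ["bianchi"]]
    print(count_author_self_citations(citing, references))  # 2
```

Because only family names are compared, two different researchers who happen to share a common family name are also counted as a self-citation, which is exactly the kind of error the notes discuss.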


How to design bibliometric research: an overview and a framework proposal

Article Open access 06 March 2024

The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis

Article 26 March 2021

The journal coverage of Web of Science and Scopus: a comparative analysis

Article 19 October 2015

Notes

It is worth mentioning that Glänzel et al. (2006), like other cited studies on self-citations, define the citations of a particular article as the number of other articles that cite it by including it in their reference lists. By contrast, as introduced at the beginning of this section, we use the term citations to indicate the links between a citing entity and a cited one, as defined in Peroni and Shotton (2018a). We say that the former definition is article-centric, since it depends strictly on the particular article being considered, while the latter is relational-centric, since it concerns only the connection between two entities.
The definitions of these and other kinds of self-citation have been taken from CiTO, the Citation Typing Ontology available at http://purl.org/spar/cito, part of the SPAR Ontologies (Peroni and Shotton 2018b). We used Wallace, Larivière and Gingras (2012) and the blog post “Journal self-citations are increasingly biased toward impact factor years” by Ludo Waltman and Caspar Chorus (https://www.cwts.nl/blog?article=n-q2x264) to derive the definitions of some of the self-citation types described in CiTO.
We decided not to use the retrieved ORCiD data to test the precision and recall of the family-name heuristics for identifying author self-citations, because ORCiDs had been assigned to only some of the authors of the articles considered in the analysis. To build a robust gold set using ORCiD, every author of every citing and cited article in that set would need to have an ORCiD. This was not the case for the ORCiD dump we used for the analysis, in which some citing and cited articles had authors without an assigned ORCiD. In addition, we were aware that the family-name approach for identifying author self-citations can be unreliable for authors from particular countries (for example, some Asian countries), where a small number of family names is shared by a very large number of people. Even though this situation did not occur in the empirical test we ran, it was possible in principle and could, as a consequence, have slightly distorted a limited number of self-citation counts.
We used the Python class LinearRegression defined in the sklearn.linear_model package of scikit-learn (https://scikit-learn.org).
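The notes state only that the LinearRegression class of scikit-learn was used; the variables being regressed are not given in this excerpt. Purely as an illustration, and assuming a regression of a yearly mean self-citation rate on publication year, the following dependency-free Python sketch computes the same closed-form ordinary least-squares line. The function name fit_line and the data are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical yearly means of author self-citations per article.
    years = [2009, 2010, 2011, 2012]
    mean_self_citations = [1.0, 1.5, 2.0, 2.5]
    slope, intercept = fit_line(years, mean_self_citations)
    print(f"slope = {slope:.2f} self-citations per article per year")
```

With scikit-learn, the equivalent single-feature fit would be LinearRegression().fit(X, y), where X is a two-dimensional array holding one column of years; with the default fit_intercept=True, the slope and intercept are then available as coef_[0] and intercept_.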

References

Aksnes, D. W. (2003). A macro study of self-citation. Scientometrics, 56(2), 235–246. https://doi.org/10.1023/A:1021919228368.
Baccini, A., De Nicolao, G., & Petrovich, E. (2019). Citation gaming induced by bibliometric evaluation: A country-level comparative analysis. PLoS ONE, 14(9), e0221212. https://doi.org/10.1371/journal.pone.0221212.
Barabucci, G., Di Iorio, A., Peroni, S., Poggi, F., & Vitali, F. (2013). Annotations with EARMARK in practice: a fairy tale. In Proceedings of the 1st International workshop on collaborative annotations in shared environment: Metadata, vocabularies and techniques in the digital humanities (DH-CASE 2013) (pp. 1–8). https://doi.org/10.1145/2517978.2517990.
Bartneck, C., & Kokkelmans, S. (2011). Detecting h-index manipulation through self-citation analysis. Scientometrics, 87(1), 85–98. https://doi.org/10.1007/s11192-010-0306-5.
Costas, R., van Leeuwen, T. N., & Bordons, M. (2010). Self-citations at the meso and individual levels: Effects of different calculation methods. Scientometrics, 82(3), 517–537. https://doi.org/10.1007/s11192-010-0187-7.
Cyganiak, R., Wood, D., & Lanthaler, M. (2014). RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation 25 February 2014. https://www.w3.org/TR/rdf11-concepts/.
Di Iorio, A., Giannella, R., Poggi, F., Peroni, S., & Vitali, F. (2015). Exploring scholarly papers through citations. In Proceedings of the 2015 ACM symposium on document engineering (DocEng 2015) (pp. 107–116). https://doi.org/10.1145/2682571.2797065.
Di Iorio, A., Peroni, S., & Poggi, F. (2019). Open data to evaluate academic researchers: An experiment with the Italian Scientific Habilitation. In Proceedings of the 17th international conference on scientometrics and informetrics (ISSI 2019). Retrieved from http://arxiv.org/abs/1902.03287.
Fortunato, S., Bergstrom, C., Börner, K., Evans, J. A., Helbing, D., Milojević, S., et al. (2018). Science of science. Science, 359(6379), eaao0185. https://doi.org/10.1126/science.aao0185.
Gálvez, R. H. (2017). Assessing author self-citation as a mechanism of relevant knowledge diffusion. Scientometrics, 111(3), 1801–1812. https://doi.org/10.1007/s11192-017-2330-1.
Glänzel, W., Debackere, K., Thijs, B., & Schubert, A. (2006). A concise review on the role of author self-citations in information science, bibliometrics and science policy. Scientometrics, 67(2), 263–277. https://doi.org/10.1007/s11192-006-0098-9.
Glänzel, W., & Thijs, B. (2004). Does co-authorship inflate the share of self-citations? Scientometrics, 61(3), 395–404. https://doi.org/10.1023/B:SCIE.0000045117.13348.b1.
Gul, S., Shah, T. A., & Shafiq, H. (2017). The prevalence of synchronous self-citation practices at the institutional level. Malaysian Journal of Library & Information Science, 22(1), 1–14. https://doi.org/10.22452/mjlis.vol22no1.1.
Harris, S., & Seaborne, A. (2013). SPARQL 1.1 Query Language. W3C Recommendation 21 March 2013. https://www.w3.org/TR/sparql11-query/.
Huang, C.-K., Neylon, C., Brookes-Kenworthy, C., Hosking, R., Montgomery, L., Wilson, K., et al. (2019). Comparison of bibliographic data sources: Implications for the robustness of university rankings. BioRxiv. https://doi.org/10.1101/750075.
Larivière, V., Gingras, Y., Sugimoto, C. R., & Tsou, A. (2015). Team size matters: Collaboration and scientific impact since 1900. Journal of the Association for Information Science and Technology, 66(7), 1323–1332. https://doi.org/10.1002/asi.23266.
Marzolla, M. (2015). Quantitative analysis of the Italian National Scientific Qualification. Journal of Informetrics, 9(2), 285–316. https://doi.org/10.1016/j.joi.2015.02.006.
Marzolla, M. (2016). Assessing evaluation procedures for individual researchers: The case of the Italian National Scientific qualification. Journal of Informetrics, 10(2), 408–438. https://doi.org/10.1016/j.joi.2016.01.009.
Nuzzolese, A. G., Ciancarini, P., Gangemi, A., Peroni, S., Poggi, F., & Presutti, V. (2019). Do altmetrics work for assessing research quality? Scientometrics, 118(2), 539–562. https://doi.org/10.1007/s11192-018-2988-z.
Peroni, S. (2017). Automating Semantic Publishing. Data Science, 1(1–2), 155–173. https://doi.org/10.3233/DS-170012.
Peroni, S. (2018). Material of the article “The practice of self-citations: a longitudinal study”. Figshare. https://doi.org/10.6084/m9.figshare.6866660.
Peroni, S., & Shotton, D. (2018a). Open Citation: Definition. Figshare. https://doi.org/10.6084/m9.figshare.6683855.
Peroni, S., & Shotton, D. (2018b). The SPAR Ontologies. In Proceedings of the 17th international semantic web conference (ISWC 2018) (pp. 119–136). https://doi.org/10.1007/978-3-030-00668-6_8.
Peroni, S., & Shotton, D. (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies. https://doi.org/10.1162/qss_a_00023.
Peroni, S., Shotton, D., & Vitali, F. (2017). One year of the OpenCitations Corpus: Releasing RDF-based scholarly citation data into the Public Domain. In Proceedings of the 16th international semantic web conference (ISWC 2017) (pp. 184–192). https://doi.org/10.1007/978-3-319-68204-4_19.
Poggi, F., Ciancarini, P., Gangemi, A., Nuzzolese, A. G., Peroni, S., & Presutti, V. (2019). Predicting the results of evaluation procedures of academics. PeerJ Computer Science, 5, e199. https://doi.org/10.7717/peerj-cs.199.
Seeber, M., Cattaneo, M., Meoli, M., & Malighetti, P. (2018). Self-citations as strategic response to the use of metrics for career decisions. Research Policy, 48(2), 478–491. https://doi.org/10.1016/j.respol.2017.12.004.
Shotton, D. (2009). Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing, 22(2), 85–94. https://doi.org/10.1087/2009202.
Shotton, D., Portwin, K., Klyne, G., & Miles, A. (2009). Adventures in semantic publishing: Exemplar semantic enhancements of a research article. PLoS Computational Biology, 5(4), e1000361. https://doi.org/10.1371/journal.pcbi.1000361.
Swales, J. (1986). Citation analysis and discourse analysis. Applied Linguistics, 7(1), 39–56. https://doi.org/10.1093/applin/7.1.39.
Teufel, S., Siddharthan, A., & Tidhar, D. (2006). Automatic classification of citation function. In Proceedings of the 2006 conference on empirical methods in natural language processing (EMNLP 2006) (pp. 103–110). http://www.aclweb.org/anthology/W06-1613.
Van Noorden, R. (2013). Brazilian citation scheme outed. Nature, 500(7464), 510–511. https://doi.org/10.1038/500510a.
Vrandečić, D., & Krötzsch, M. (2014). Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57(10), 78–85. https://doi.org/10.1145/2629489.
Wallace, M. L., Larivière, V., & Gingras, Y. (2012). A small world of citations? The influence of collaboration networks on citation practices. PLoS ONE, 7(3), e33339. https://doi.org/10.1371/journal.pone.0033339.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18.
Xiao, M., Shi, Z., & Wang, S. (2018). The Impact on Citation Analysis Based on Ontology and Linked Data. In M. Jibu & Y. Osabe (Eds.), Scientometrics. https://doi.org/10.5772/intechopen.76377.
Yu, T., Yu, G., & Wang, M.-Y. (2014). Classification method for detecting coercive self-citation in journals. Journal of Informetrics, 8(1), 123–135. https://doi.org/10.1016/j.joi.2013.11.001.
Zhu, Y., Yan, E., Peroni, S., & Che, C. (2019). Nine million book items and eleven million citations: A study of book-based scholarly communication using OpenCitations. Scientometrics, 122(2), 1097–1112. https://doi.org/10.1007/s11192-019-03311-9.

Acknowledgements

We want to thank our colleagues of the Digital and Semantic Publishing Laboratory at the University of Bologna, namely Angelo Di Iorio and Fabio Vitali, for their support and discussions on the topic. We would also like to thank Marzia Freo and Alessandra Luati (University of Bologna) for providing the statistical background and techniques we used in the analysis presented in this paper. Last but not least, Andrea Bonaccorsi (University of Pisa) provided essential insights on the project, and Riccardo Fini (University of Bologna) made available to us his SQL database containing information about the people who participated in the 2012 Italian Scientific Habilitation. ANVUR partially funded this work.

Author information

Authors and Affiliations

Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy
Silvio Peroni
Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
Paolo Ciancarini & Francesco Poggi
Innopolis University, Innopolis, Russia
Paolo Ciancarini
Digital Humanities Advanced Research Centre (DHARC), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy
Silvio Peroni & Aldo Gangemi
Semantic Technology Laboratory (STLab), Institute of Cognitive Science and Technologies, National Research Council, Rome, Italy
Andrea Giovanni Nuzzolese & Valentina Presutti

Corresponding author

Correspondence to Silvio Peroni.

Appendix

See Tables 1, 2, 3 and 4.

Table 1 The citing articles considered in the experiment (published between 1957 and 2016), split by subject category into two populations: articles published up to and including 2012 [year <= 2012] and articles published after 2012 [year > 2012]. In addition to the number of articles in each population, the table shows the mean number of author self-citations per article, the related standard deviation, and the difference between the means of the two populations, together with its confidence interval (“ci-low”/“ci-high”).

Table 2 The citing articles considered in the experiment (published between 2009 and 2016), split by subject category into two populations: articles published up to and including 2012 [year <= 2012] and articles published after 2012 [year > 2012]. In addition to the number of articles in each population, the table shows the mean number of author self-citations per article, the related standard deviation, and the difference between the means of the two populations, together with its confidence interval (“ci-low”/“ci-high”).

Table 3 The citing articles considered in the experiment (published between 1957 and 2016), split by subject category into two populations: articles published up to and including 2012 [year <= 2012] and articles published after 2012 [year > 2012]. In addition to the number of articles in each population, the table shows the mean number of author network self-citations per article, the related standard deviation, and the difference between the means of the two populations, together with its confidence interval (“ci-low”/“ci-high”).

Table 4 The citing articles considered in the experiment (published between 2009 and 2016), split by subject category into two populations: articles published up to and including 2012 [year <= 2012] and articles published after 2012 [year > 2012]. In addition to the number of articles in each population, the table shows the mean number of author network self-citations per article, the related standard deviation, and the difference between the means of the two populations, together with its confidence interval (“ci-low”/“ci-high”).
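The table captions describe splitting the citing articles at 2012, computing the mean number of self-citations per article in each population, and reporting the difference of the means with a confidence interval (“ci-low”/“ci-high”). This excerpt does not say which interval was used, so the Python sketch below assumes a large-sample, normal-approximation interval with Welch's unpooled standard error. The function names, the sign convention (after minus before) and the 95% default are assumptions; with small populations a Student's t quantile would be more appropriate than the normal one.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import List, Sequence, Tuple


def split_by_year(articles: Sequence[Tuple[int, int]], threshold: int = 2012):
    """Split (year, self_citation_count) pairs into year <= threshold and year > threshold."""
    before = [count for year, count in articles if year <= threshold]
    after = [count for year, count in articles if year > threshold]
    return before, after


def mean_difference_ci(before: List[float], after: List[float], confidence: float = 0.95):
    """Difference of means (after - before) with a normal-approximation confidence interval."""
    if len(before) < 2 or len(after) < 2:
        raise ValueError("each population needs at least two articles")
    diff = mean(after) - mean(before)
    # Welch (unpooled) standard error: the two populations may have different variances.
    se = sqrt(stdev(before) ** 2 / len(before) + stdev(after) ** 2 / len(after))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for a 95% interval
    return diff, diff - z * se, diff + z * se


if __name__ == "__main__":
    # Hypothetical (publication year, author self-citations) pairs for one subject category.
    articles = [(2010, 1), (2011, 2), (2012, 3), (2013, 4), (2014, 6), (2015, 5)]
    before, after = split_by_year(articles)
    diff, ci_low, ci_high = mean_difference_ci(before, after)
    print(f"difference = {diff:.2f}, ci-low = {ci_low:.2f}, ci-high = {ci_high:.2f}")
```

Articles published in 2012 fall in the first population, matching the [year <= 2012] / [year > 2012] split in the captions. Restricting the input to 2009–2016 before splitting would mirror the narrower window used in Tables 2 and 4.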

About this article

Cite this article

Peroni, S., Ciancarini, P., Gangemi, A. et al. The practice of self-citations: a longitudinal study. Scientometrics 123, 253–282 (2020). https://doi.org/10.1007/s11192-020-03397-6

Received: 10 August 2019
Published: 22 February 2020
Issue Date: April 2020
DOI: https://doi.org/10.1007/s11192-020-03397-6
