Abstract
In this article, we discuss the outcomes of an experiment in which we analysed whether, and to what extent, the introduction in 2012 of the new research assessment exercise in Italy, known as the Italian Scientific Habilitation, affected self-citation behaviour in the Italian research community. The Italian Scientific Habilitation attests to the scientific maturity of researchers and, in Italy as in many other countries, is a requirement for access to a professorship. For this analysis, we obtained from ScienceDirect 35,673 articles published from 1957 to 2016 by the participants in the 2012 Italian Scientific Habilitation, from which we extracted 1,379,050 citations using Semantic Publishing technologies. Our analysis showed an overall increase in author self-citations (i.e. citations where the citing article and the cited article share at least one author) in several of the 24 academic disciplines considered. However, we observed a stronger causal relation between this increase and the rules introduced by the 2012 Italian Scientific Habilitation in 10 of the 24 disciplines analysed.
Notes
It is worth mentioning that Glänzel et al. (2006), as well as other studies on self-citations cited here, define the citations of a particular article as the number of other articles that cite it by including it in their reference lists. In contrast, as introduced at the beginning of this section, we use the term citation to indicate the link between a citing entity and a cited one, as defined in Peroni and Shotton (2018a). We call the former definition article-centric, since it depends strictly on the particular article under consideration, and the latter relational-centric, since it concerns only the connection between two entities.
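To make the distinction concrete, the following minimal sketch (with hypothetical article identifiers) stores citations in the relational-centric way, as a set of (citing, cited) links, and derives the article-centric count from them: for each cited article, the number of distinct articles that include it in their reference list. The data structures are illustrative assumptions, not the representation used in the study.

```python
from collections import defaultdict

# Relational-centric view: each citation is a link (citing, cited).
# The article identifiers are hypothetical placeholders.
citations = {
    ("article:A", "article:C"),
    ("article:B", "article:C"),
    ("article:B", "article:D"),
}

def article_centric_counts(links):
    """Article-centric view: for each cited article, count the distinct
    articles that include it in their reference list."""
    citing_by_cited = defaultdict(set)
    for citing, cited in links:
        citing_by_cited[cited].add(citing)
    return {cited: len(citing) for cited, citing in citing_by_cited.items()}

counts = article_centric_counts(citations)
print(counts["article:C"])  # 2
print(counts["article:D"])  # 1
```

The relational representation keeps every link as a first-class item, so the article-centric number is just one aggregation over it.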
The definitions of these and other kinds of self-citations are taken from CiTO, the Citation Typing Ontology (http://purl.org/spar/cito), which is part of the SPAR Ontologies (Peroni and Shotton 2018b). We used Wallace et al. (2012) and the blog post “Journal self-citations are increasingly biased toward impact factor years” by Ludo Waltman and Caspar Chorus (https://www.cwts.nl/blog?article=n-q2x264) to derive the definitions of some of the self-citation types described in CiTO.
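As an illustration of how such definitions can be applied to a single citation, the sketch below checks two kinds: author self-citation (the citing and cited articles share at least one author, as in the abstract) and journal self-citation (both articles appear in the same journal, the kind discussed in the Waltman and Chorus post). It assumes a hypothetical `Article` record whose authors are unique identifiers, and it returns informal string labels, not CiTO class IRIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Article:
    # Hypothetical minimal record: unique author identifiers and a journal name.
    authors: frozenset
    journal: str

def self_citation_kinds(citing: Article, cited: Article) -> set:
    """Return informal labels for the self-citation kinds a citation
    exhibits; an empty set means neither kind applies."""
    kinds = set()
    if citing.authors & cited.authors:   # at least one shared author
        kinds.add("author self-citation")
    if citing.journal == cited.journal:  # same journal
        kinds.add("journal self-citation")
    return kinds

x = Article(frozenset({"author:1", "author:2"}), "Journal X")
y = Article(frozenset({"author:2", "author:3"}), "Journal Y")
z = Article(frozenset({"author:4"}), "Journal X")
print(self_citation_kinds(x, y))  # {'author self-citation'}
print(self_citation_kinds(x, z))  # {'journal self-citation'}
```

Returning a set rather than a single label reflects that one citation can belong to several self-citation kinds at once.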
We decided not to use the retrieved ORCiD data to test the precision and recall of the family-name heuristics for identifying author self-citations, because ORCiDs were assigned to only some of the authors of the articles considered in the analysis. To build a robust gold set from ORCiD, every author of every citing and cited article in the set would have needed an ORCiD. This was not the case in the ORCiD dump we used, where some citing and cited articles had at least one author without an ORCiD. In addition, we were aware that identifying author self-citations by family name can be unreliable for authors from particular countries, for instance Asian authors, whose family names are often shared by many people. Even though this situation did not occur in the empirical test we ran, it was possible in principle and, as a consequence, could have slightly distorted some (albeit a limited set of) self-citation counts.
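To show what the discarded evaluation would have required, the following sketch compares a family-name heuristic against ORCiD-based gold labels and computes precision and recall. Pairs in which any author lacks an ORCiD are excluded, because a missing identifier could hide a shared author. The author records (dicts with `family` and `orcid` keys), the normalisation rule, and all names and identifiers are hypothetical assumptions, not the heuristic actually used in the study. The examples include a homonym false positive and a spelling-variant false negative.

```python
import unicodedata

def family_key(name):
    """Normalise a family name (assumed rule): strip accents, lowercase,
    and keep only letters and digits, so 'Di Iorio' -> 'diiorio'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalnum())

def shares_family_name(citing, cited):
    """Family-name heuristic: True if the two author lists share at least
    one normalised family name."""
    return bool({family_key(a["family"]) for a in citing}
                & {family_key(a["family"]) for a in cited})

def shares_orcid(citing, cited):
    """Gold label from ORCiD, or None if any author lacks an ORCiD."""
    if any(a.get("orcid") is None for a in citing + cited):
        return None
    return bool({a["orcid"] for a in citing} & {a["orcid"] for a in cited})

def precision_recall(pairs):
    """Evaluate the heuristic only on pairs fully covered by ORCiDs."""
    tp = fp = fn = 0
    for citing, cited in pairs:
        gold = shares_orcid(citing, cited)
        if gold is None:
            continue  # this pair cannot enter the gold set
        predicted = shares_family_name(citing, cited)
        tp += int(predicted and gold)
        fp += int(predicted and not gold)
        fn += int(gold and not predicted)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

# Hypothetical authors and placeholder identifiers (not real ORCiDs).
pairs = [
    # Same person in both articles: true positive.
    ([{"family": "Rossi", "orcid": "id-1"}],
     [{"family": "Rossi", "orcid": "id-1"}, {"family": "Bianchi", "orcid": "id-2"}]),
    # Two different people with the same family name: false positive.
    ([{"family": "Wang", "orcid": "id-3"}], [{"family": "Wang", "orcid": "id-4"}]),
    # Same person, family name spelled differently: false negative.
    ([{"family": "Müller", "orcid": "id-5"}], [{"family": "Mueller", "orcid": "id-5"}]),
    # One author without an ORCiD: excluded from the gold set.
    ([{"family": "Verdi", "orcid": "id-6"}], [{"family": "Neri", "orcid": None}]),
]
print(precision_recall(pairs))  # (0.5, 0.5)
```

With partial ORCiD coverage, the last kind of pair dominates, which shrinks the usable gold set and is the reason given above for not pursuing this evaluation.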
We used the Python class LinearRegression defined in the sklearn.linear_model package of scikit-learn (https://scikit-learn.org).
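`LinearRegression` fits an ordinary least squares model. For readers without scikit-learn at hand, the sketch below computes the same single-predictor fit in plain Python. The note does not say here which variables were regressed, so the numbers are made up for illustration only and are not data from this study.

```python
def ols_fit(xs, ys):
    """Ordinary least squares with one predictor: return (slope, intercept)
    minimising the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# With scikit-learn, LinearRegression().fit(X, y) on X of shape (n, 1)
# stores the same values in coef_[0] and intercept_.
# Made-up values for illustration only.
slope, intercept = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```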
References
Aksnes, D. W. (2003). A macro study of self-citation. Scientometrics, 56(2), 235–246. https://doi.org/10.1023/A:1021919228368.
Baccini, A., De Nicolao, G., & Petrovich, E. (2019). Citation gaming induced by bibliometric evaluation: A country-level comparative analysis. PLoS ONE, 14(9), e0221212. https://doi.org/10.1371/journal.pone.0221212.
Barabucci, G., Di Iorio, A., Peroni, S., Poggi, F., & Vitali, F. (2013). Annotations with EARMARK in practice: A fairy tale. In Proceedings of the 1st International workshop on collaborative annotations in shared environment: Metadata, vocabularies and techniques in the digital humanities (DH-CASE 2013) (pp. 1–8). https://doi.org/10.1145/2517978.2517990.
Bartneck, C., & Kokkelmans, S. (2011). Detecting h-index manipulation through self-citation analysis. Scientometrics, 87(1), 85–98. https://doi.org/10.1007/s11192-010-0306-5.
Costas, R., van Leeuwen, T. N., & Bordons, M. (2010). Self-citations at the meso and individual levels: Effects of different calculation methods. Scientometrics, 82(3), 517–537. https://doi.org/10.1007/s11192-010-0187-7.
Cyganiak, R., Wood, D., & Lanthaler, M. (2014). RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation 25 February 2014. https://www.w3.org/TR/rdf11-concepts/.
Di Iorio, A., Giannella, R., Poggi, F., Peroni, S., & Vitali, F. (2015). Exploring scholarly papers through citations. In Proceedings of the 2015 ACM symposium on document engineering (DocEng 2015) (pp. 107–116). https://doi.org/10.1145/2682571.2797065.
Di Iorio, A., Peroni, S., & Poggi, F. (2019). Open data to evaluate academic researchers: An experiment with the Italian Scientific Habilitation. In Proceedings of the 17th international conference on scientometrics and informetrics (ISSI 2019). Retrieved from http://arxiv.org/abs/1902.03287.
Fortunato, S., Bergstrom, C., Börner, K., Evans, J. A., Helbing, D., Milojević, S., et al. (2018). Science of science. Science, 359(6379), eaao0185. https://doi.org/10.1126/science.aao0185.
Gálvez, R. H. (2017). Assessing author self-citation as a mechanism of relevant knowledge diffusion. Scientometrics, 111(3), 1801–1812. https://doi.org/10.1007/s11192-017-2330-1.
Glänzel, W., Debackere, K., Thijs, B., & Schubert, A. (2006). A concise review on the role of author self-citations in information science, bibliometrics and science policy. Scientometrics, 67(2), 263–277. https://doi.org/10.1007/s11192-006-0098-9.
Glänzel, W., & Thijs, B. (2004). Does co-authorship inflate the share of self-citations? Scientometrics, 61(3), 395–404. https://doi.org/10.1023/B:SCIE.0000045117.13348.b1.
Gul, S., Shah, T. A., & Shafiq, H. (2017). The prevalence of synchronous self-citation practices at the institutional level. Malaysian Journal of Library & Information Science, 22(1), 1–14. https://doi.org/10.22452/mjlis.vol22no1.1.
Harris, S., & Seaborne, A. (2013). SPARQL 1.1 Query Language. W3C Recommendation 21 March 2013. https://www.w3.org/TR/sparql11-query/.
Huang, C.-K., Neylon, C., Brookes-Kenworthy, C., Hosking, R., Montgomery, L., Wilson, K., et al. (2019). Comparison of bibliographic data sources: Implications for the robustness of university rankings. BioRxiv. https://doi.org/10.1101/750075.
Larivière, V., Gingras, Y., Sugimoto, C. R., & Tsou, A. (2015). Team size matters: Collaboration and scientific impact since 1900. Journal of the Association for Information Science and Technology, 66(7), 1323–1332. https://doi.org/10.1002/asi.23266.
Marzolla, M. (2015). Quantitative analysis of the Italian National Scientific Qualification. Journal of Informetrics, 9(2), 285–316. https://doi.org/10.1016/j.joi.2015.02.006.
Marzolla, M. (2016). Assessing evaluation procedures for individual researchers: The case of the Italian National Scientific Qualification. Journal of Informetrics, 10(2), 408–438. https://doi.org/10.1016/j.joi.2016.01.009.
Nuzzolese, A. G., Ciancarini, P., Gangemi, A., Peroni, S., Poggi, F., & Presutti, V. (2019). Do altmetrics work for assessing research quality? Scientometrics, 118(2), 539–562. https://doi.org/10.1007/s11192-018-2988-z.
Peroni, S. (2017). Automating Semantic Publishing. Data Science, 1(1–2), 155–173. https://doi.org/10.3233/DS-170012.
Peroni, S. (2018). Material of the article “The practice of self-citations: a longitudinal study”. Figshare. https://doi.org/10.6084/m9.figshare.6866660.
Peroni, S., & Shotton, D. (2018a). Open Citation: Definition. Figshare. https://doi.org/10.6084/m9.figshare.6683855.
Peroni, S., & Shotton, D. (2018b). The SPAR Ontologies. In Proceedings of the 17th international semantic web conference (ISWC 2018) (pp. 119–136). https://doi.org/10.1007/978-3-030-00668-6_8.
Peroni, S., & Shotton, D. (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies. https://doi.org/10.1162/qss_a_00023.
Peroni, S., Shotton, D., & Vitali, F. (2017). One year of the OpenCitations Corpus—Releasing RDF-based scholarly citation data into the Public Domain. In Proceedings of the 16th international semantic web conference (ISWC 2017) (pp. 184–192). https://doi.org/10.1007/978-3-319-68204-4_19.
Poggi, F., Ciancarini, P., Gangemi, A., Nuzzolese, A. G., Peroni, S., & Presutti, V. (2019). Predicting the results of evaluation procedures of academics. PeerJ Computer Science, 5, e199. https://doi.org/10.7717/peerj-cs.199.
Seeber, M., Cattaneo, M., Meoli, M., & Malighetti, P. (2018). Self-citations as strategic response to the use of metrics for career decisions. Research Policy, 48(2), 478–491. https://doi.org/10.1016/j.respol.2017.12.004.
Shotton, D. (2009). Semantic publishing: The coming revolution in scientific journal publishing. Learned Publishing, 22(2), 85–94. https://doi.org/10.1087/2009202.
Shotton, D., Portwin, K., Klyne, G., & Miles, A. (2009). Adventures in semantic publishing: Exemplar semantic enhancements of a research article. PLoS Computational Biology, 5(4), e1000361. https://doi.org/10.1371/journal.pcbi.1000361.
Swales, J. (1986). Citation analysis and discourse analysis. Applied Linguistics, 7(1), 39–56. https://doi.org/10.1093/applin/7.1.39.
Teufel, S., Siddharthan, A., & Tidhar, D. (2006). Automatic classification of citation function. In Proceedings of the 2006 conference on empirical methods in natural language processing (EMNLP 2006) (pp. 103–110). http://www.aclweb.org/anthology/W06-1613.
Van Noorden, R. (2013). Brazilian citation scheme outed. Nature, 500(7464), 510–511. https://doi.org/10.1038/500510a.
Vrandečić, D., & Krötzsch, M. (2014). Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57(10), 78–85. https://doi.org/10.1145/2629489.
Wallace, M. L., Larivière, V., & Gingras, Y. (2012). A small world of citations? The influence of collaboration networks on citation practices. PLoS ONE, 7(3), e33339. https://doi.org/10.1371/journal.pone.0033339.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18.
Xiao, M., Shi, Z., & Wang, S. (2018). The Impact on Citation Analysis Based on Ontology and Linked Data. In M. Jibu & Y. Osabe (Eds.), Scientometrics. https://doi.org/10.5772/intechopen.76377.
Yu, T., Yu, G., & Wang, M.-Y. (2014). Classification method for detecting coercive self-citation in journals. Journal of Informetrics, 8(1), 123–135. https://doi.org/10.1016/j.joi.2013.11.001.
Zhu, Y., Yan, E., Peroni, S., & Che, C. (2019). Nine million book items and eleven million citations: A study of book-based scholarly communication using OpenCitations. Scientometrics, 122(2), 1097–1112. https://doi.org/10.1007/s11192-019-03311-9.
Acknowledgements
We want to thank our colleagues at the Digital and Semantic Publishing Laboratory of the University of Bologna, in particular Angelo Di Iorio and Fabio Vitali, for their support and for discussions on the topic. We would also like to thank Marzia Freo and Alessandra Luati (University of Bologna) for providing the statistical background and techniques we used in the analysis presented in this paper. Last but not least, Andrea Bonaccorsi (University of Pisa) provided essential insights on the project, and Riccardo Fini (University of Bologna) made available to us his SQL database containing information about the people who participated in the 2012 Italian Scientific Habilitation. ANVUR partially funded this work.
About this article
Cite this article
Peroni, S., Ciancarini, P., Gangemi, A. et al. The practice of self-citations: a longitudinal study. Scientometrics 123, 253–282 (2020). https://doi.org/10.1007/s11192-020-03397-6
DOI: https://doi.org/10.1007/s11192-020-03397-6