Accuracy of author names in bibliographic data sources: an Italian case study
We investigate the accuracy with which author names are reported in bibliographic records retrieved from four prominent sources: WoS, Scopus, PubMed, and CrossRef. As a case study, we take 44,549 publications stored in the internal database of Sapienza University of Rome, one of the largest universities in Europe. While our results indicate generally good accuracy for all the bibliographic data sources considered, we highlight a number of issues that undermine accuracy for certain classes of author names, including compound names and names with diacritics, which are common features of Italian and other Western languages.
Keywords: Author names, Accuracy, Scopus, WoS, CrossRef, PubMed
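The abstract identifies compound names and names with diacritics as weak spots. The Python sketch below is our own illustration, not the procedure used in the study. It shows why exact string comparison misses such variants, and how a simple matching key (Unicode NFD decomposition, removal of combining marks, case folding, and removal of spaces, hyphens, and apostrophes) makes them compare equal. The function names and sample surnames are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition, e.g. 'ò' -> 'o'.

    Letters with no canonical decomposition (such as 'ø' or 'ł') are kept as-is.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_surname(surname: str) -> str:
    """Hypothetical matching key: no diacritics, case-folded, and with spaces,
    hyphens and apostrophes removed (only alphanumeric characters survive)."""
    folded = strip_diacritics(surname).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def same_surname(a: str, b: str) -> bool:
    """True when two recorded surnames reduce to the same matching key."""
    return normalize_surname(a) == normalize_surname(b)


if __name__ == "__main__":
    # Hypothetical variants of the same surnames, as different sources might record them.
    print("De Santis" == "DeSantis")               # False: exact comparison fails
    print(same_surname("De Santis", "DeSantis"))   # True: compound surname
    print(same_surname("Fabbricò", "Fabbrico"))    # True: accent dropped by a source
    print(same_surname("D'Angelo", "D’Angelo"))    # True: ASCII vs typographic apostrophe
    print(same_surname("Rossi", "Russo"))          # False: genuinely different names
```

A key this aggressive trades precision for recall, because it can also merge genuinely different authors. It is therefore probably better suited to flagging candidate mismatches for manual review than to resolving them automatically.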
First of all, we wish to thank the anonymous reviewers for their useful suggestions. We are also indebted to Irene Bongioanni, Adriano Fazzone, Emanuele Fusco, and all members of Sapienza University of Rome who contributed to the analysis and cleaning of the bibliographic records stored in our local repository. Marco Schaerf was partially supported by H2020 Project Second Hands under grant Agreement No. 643950.