Abstract
The correct attribution of scientific publications to their true owners is extremely important, considering the detailed evaluation processes and the future investments based upon them. This attribution is a hard job for bibliometricians because of the increasing number of documents and the rise in collaboration. Nevertheless, there is no published work with a comprehensive solution to the problem. This article introduces a procedure for the detailed identification and normalisation of addresses to facilitate the correct allocation of the scientific production included in databases. Drawing on our long experience in the manual normalisation of addresses, we have created and maintained various master lists. We have already developed an application to detect institutional sectors (published in a previous paper), and now we analyse the details of particular institutions, taking advantage of our master tables. To test our methodology, we applied it to a Spanish data set that had already been manually codified (95,314 unique addresses from the year 2008 in the Web of Science databases). These data were analysed with a full-text search against our master lists, which produced candidate codes for each address and determined which addresses could be encoded automatically and which should be reviewed manually. Comparing the automatic codes with the manual ones, 87 % of the records were codified automatically, with an error rate of 1.9 %; only the remaining 13 % had to be reviewed manually. Finally, we applied the Wilcoxon non-parametric test to show the validity of the methodology, comparing the detailed codes of centres already encoded manually with the automatically encoded ones, and concluded that their distributions were similar, with a significance of 0.078.
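As an illustration of the general idea of searching addresses against a master list and routing them to automatic or manual coding, the following is a minimal sketch only. The master list entries, the centre codes, the function names and the decision rule (encode automatically when exactly one candidate code is found, otherwise send to manual review) are all assumptions made for the example; the abstract does not specify the actual rule used.

```python
# Hypothetical master list: normalised name variant -> centre code.
# Entries and codes are invented for illustration only.
MASTER_LIST = {
    "univ barcelona": "UB",
    "univ autonoma barcelona": "UAB",
    "csic": "CSIC",
}


def candidate_codes(address, master_list=MASTER_LIST):
    """Return the set of codes whose name variant occurs in the address."""
    # Lowercase, drop commas and collapse whitespace before searching.
    text = " ".join(address.lower().replace(",", " ").split())
    padded = f" {text} "  # pad so variants only match on word boundaries
    return {
        code
        for variant, code in master_list.items()
        if f" {variant} " in padded
    }


def classify(address):
    """Assumed rule: one candidate -> automatic; otherwise manual review."""
    codes = candidate_codes(address)
    if len(codes) == 1:
        return ("automatic", next(iter(codes)))
    return ("manual", sorted(codes))


if __name__ == "__main__":
    for addr in (
        "Univ Barcelona, Dept Fis, Barcelona, Spain",
        "CSIC, Univ Barcelona, Madrid, Spain",
        "Hosp Clin, Madrid, Spain",
    ):
        print(addr, "->", classify(addr))
```

In this sketch an address with no candidate or with several competing candidates is left for a human coder, which mirrors the split between automatically codified and manually reviewed records described above.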
Notes
E.g.: "Univ", "Dept", "Fac", etc.
Segments are those parts of the records (addresses) between two commas.
The list considers "&", "and", "de", "el", "lo", "las", "la" and "los". Optionally, we can update it with some other words.
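The segment and stop-word conventions in these notes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the assumption that the listed words are simply dropped from a segment before comparison is ours.

```python
# Words listed in the note above; the list can be extended if needed.
STOP_WORDS = {"&", "and", "de", "el", "lo", "las", "la", "los"}


def split_segments(address):
    """Split an address into segments, i.e. the parts between commas."""
    return [seg.strip() for seg in address.split(",") if seg.strip()]


def strip_stop_words(segment, stop_words=STOP_WORDS):
    """Drop the listed words from a segment (assumed preprocessing step)."""
    return " ".join(
        word for word in segment.split() if word.lower() not in stop_words
    )


if __name__ == "__main__":
    address = "Univ Autonoma de Madrid, Fac Ciencias, Madrid, Spain"
    for segment in split_segments(address):
        print(strip_stop_words(segment))
```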
Acknowledgments
We wish to thank Adrián Arias Díaz-Faes for his valuable statistical assistance and the anonymous reviewer of this paper for his/her comments and suggestions. This work is supported by the Spanish Ministry of Science and Innovation (Grant CSO2011-25102).
Cite this article
Morillo, F., Santabárbara, I. & Aparicio, J. The automatic normalisation challenge: detailed addresses identification. Scientometrics 95, 953–966 (2013). https://doi.org/10.1007/s11192-013-0965-0
DOI: https://doi.org/10.1007/s11192-013-0965-0