The automatic normalisation challenge: detailed addresses identification

Morillo, Fernanda; Santabárbara, Ignacio; Aparicio, Javier

doi:10.1007/s11192-013-0965-0


Published: 08 February 2013

Scientometrics, Volume 95, pages 953–966 (2013)



Abstract

The correct attribution of scientific publications to their true owners is extremely important, given the detailed evaluation processes and the future investments based upon them. This attribution is a hard job for bibliometricians because of the increasing number of documents and the rise of collaboration. Nevertheless, no published work offers a comprehensive solution to the problem. This article introduces a procedure for the detailed identification and normalisation of addresses to facilitate the correct allocation of the scientific production included in databases. Drawing on our long experience in the manual normalisation of addresses, we have created and maintained various master lists. We have already developed an application to detect institutional sectors (described in a previous paper), and we now analyse the details of particular institutions, taking advantage of our master tables. To test our methodology, we implemented it on a Spanish data set that had already been manually codified (95,314 unique addresses included in the Web of Science databases for 2008). These data were analysed with a full-text search against our master lists, which assigned candidate codes to each address and determined which addresses could be encoded automatically and which should be reviewed manually. Comparing the automatic with the manual codes, 87 % of records were codified automatically, with an error rate of 1.9 %; only 13 % needed manual review. Finally, we applied the Wilcoxon non-parametric test to assess the validity of the methodology, comparing the detailed codes of centres already encoded with the automatically encoded ones, and concluded that their distributions were similar, with a significance of 0.078.
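The abstract does not spell out the matching rules, so the following is only a minimal Python sketch of the general idea under stated assumptions: a toy master list (`MASTER_LIST`, hypothetical data), a whole-word phrase search standing in for the "full-text search", and an assumed decision rule in which an address is coded automatically only when exactly one candidate code is found. The helper names (`normalise`, `candidate_codes`, `triage`, `error_rate`) are illustrative and are not the authors' application.

```python
import re

# Hypothetical master list: name variant -> institution code (toy data, not the authors' tables).
MASTER_LIST = {
    "Univ Complutense Madrid": "UCM",
    "Hosp Clin San Carlos": "HCSC",
    "Inst Quim Organ Gen": "IQOG",
}


def normalise(text: str) -> str:
    """Lower-case and reduce each run of non-word characters (commas included) to one space."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def candidate_codes(address: str, master: dict[str, str]) -> set[str]:
    """Every master variant found as a whole-word phrase in the address yields a candidate code."""
    padded = f" {normalise(address)} "
    return {code for variant, code in master.items() if f" {normalise(variant)} " in padded}


def triage(addresses: list[str], master: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Assumed rule: exactly one candidate code -> automatic; zero or several -> manual review."""
    automatic: dict[str, str] = {}
    manual: list[str] = []
    for address in addresses:
        codes = candidate_codes(address, master)
        if len(codes) == 1:
            automatic[address] = next(iter(codes))
        else:
            manual.append(address)
    return automatic, manual


def error_rate(automatic: dict[str, str], reference: dict[str, str]) -> float:
    """Share of automatically coded addresses whose code differs from the manual reference code."""
    if not automatic:
        return 0.0
    wrong = sum(1 for address, code in automatic.items() if reference.get(address) != code)
    return wrong / len(automatic)


# Invented example addresses in Web of Science abbreviated style.
ADDRESSES = [
    "Univ Complutense Madrid, Fac Quim, Dept Quim Organ, Madrid, Spain",
    "Hosp Clin San Carlos, Serv Cardiol, Madrid, Spain",
    "CSIC, Inst Quim Organ Gen, Madrid, Spain",
    "Univ Complutense Madrid, Hosp Clin San Carlos, Madrid, Spain",
    "Unknown Lab, Madrid, Spain",
]

automatic, manual = triage(ADDRESSES, MASTER_LIST)
share_auto = len(automatic) / len(ADDRESSES)
print(f"automatic: {share_auto:.0%}, manual review: {len(manual)} addresses")
# prints: automatic: 60%, manual review: 2 addresses

# Toy manual codes; the third deliberately disagrees to show how an error is counted.
REFERENCE = {ADDRESSES[0]: "UCM", ADDRESSES[1]: "HCSC", ADDRESSES[2]: "CSIC"}
print(f"error rate: {error_rate(automatic, REFERENCE):.1%}")
# prints: error rate: 33.3%
```

Requiring a single unambiguous candidate trades coverage for precision: addresses that mention several institutions, or none that appear in the master list, fall through to the manual queue, which is the kind of automatic/manual split the abstract reports.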


Notes

1. E.g. "Univ", "Dept", "Fac", etc.
2. Segments are the parts of a record (address) delimited by commas.
3. The list includes "&", "and", "de", "el", "lo", "las", "la" and "los". Optionally, it can be extended with other words.
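The second and third notes describe two preprocessing ideas: cutting each address into comma-delimited segments and discarding a small list of function words. A minimal Python sketch follows, assuming that the stop-word list is applied token by token after lower-casing (the notes do not say exactly where in the procedure it is used). The function names and the example address are hypothetical.

```python
# Stop-word list as given in the note; the note says it can be extended with other words.
STOPWORDS = {"&", "and", "de", "el", "lo", "las", "la", "los"}


def segments(address: str) -> list[str]:
    """Split an address record into its comma-delimited segments, dropping empty ones."""
    return [part.strip() for part in address.split(",") if part.strip()]


def content_tokens(segment: str) -> list[str]:
    """Lower-case a segment and drop stop-words (assumed use of the list, for illustration)."""
    return [token for token in segment.lower().split() if token not in STOPWORDS]


# Invented example address.
address = "Univ Complutense de Madrid, Fac Ciencias de la Informacion, E-28040 Madrid, Spain"
for seg in segments(address):
    print(seg, "->", content_tokens(seg))
```

Working segment by segment keeps institution, faculty and place names apart, so a later lookup can compare each segment separately instead of the whole address string.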

References

Abramo, G., D’Angelo, C. A., & Pugini, F. (2008). The measurement of Italian Universities’ research productivity by a non-parametric-bibliometric methodology. Scientometrics, 76(2), 225–244.
Abramo, G., D’Angelo, C. A., & Di Costa, F. (2011). National research assessment exercises: The effects of changing the rules of the game during the game. Scientometrics, 88(1), 229–238.
Almeida, J. A. S., Pais, A. A. C. C., & Formosinho, S. J. (2009). Science indicators and science patterns in Europe. Journal of Informetrics, 3(2), 134–142.
Bador, P., & Lafouge, T. (2005). Rédaction des adresses sur les publications. Un manque de rigueur défavorable aux universités françaises dans les classements internationaux. La Presse Médicale, 34(9), 633–636.
Bornmann, L., & Ozimek, A. (2012). Stata commands for importing bibliometric data and processing author address information. Journal of Informetrics, 6(4), 505–512.
Butler, L. (1999). Who “owns” this publication? Problems with assigning research publications on the basis of addresses. In Proceedings of the seventh conference of the international society for scientometrics and informetrics (pp. 87–96). Universidad de Colima, México.
D’Angelo, C. A., Giuffrida, C., & Abramo, G. (2011). A heuristic approach to author name disambiguation in bibliometrics databases for large-scale research assessments. Journal of the American Society for Information Science and Technology, 62(2), 257–269.
De Bruin, R. E., & Moed, H. F. (1990). The unification of addresses in scientific publications. In L. Egghe & R. Rousseau (Eds.), Informetrics 89/90. Selection of papers submitted to the 2nd international conference on bibliometrics, scientometrics and informetrics, London, Ontario, Canada, July 5–7, 1989 (pp. 65–78). Amsterdam: Elsevier.
FECYT (2007). Workshop on “normalisation of institutions for bibliometric uses”, Barcelona.
Feller, I., & Gamota, G. (2007). Science indicators as reliable evidence. Minerva, 45(1), 17–30.
Fernández, M. T., Cabrero, A., Zulueta, M. A., & Gómez, I. (1993). Constructing a relational database for bibliometric analysis. Research Evaluation, 3(1), 55–62.
Gálvez, C., & Moya-Anegón, F. (2006). The unification of institutional addresses applying parameterized finite state graphs (P FSG). Scientometrics, 69(2), 323–345.
Gálvez, C., & Moya-Anegón, F. (2007). Standardizing formats of corporate source data. Scientometrics, 70(1), 3–26.
García-Zorita, C., Martín-Moreno, C., Lascurain-Sánchez, M. L., & Sanz-Casado, E. (2006). Institutional addresses in the Web of Science: the effects on scientific evaluation. Journal of Information Science, 32(4), 378–383.
Gurney, T., Horlings, E., & van den Besselaar, P. (2012). Author disambiguation using multi-aspect similarity indicators. Scientometrics, 91(2), 435–449.
Hood, W. W., & Wilson, C. S. (2003). Informetric studies using databases: Opportunities and challenges. Scientometrics, 58(3), 587–608.
Katz, J. S., & Hicks, D. (1997). Desktop scientometrics. Scientometrics, 38(1), 141–153.
Mallig, N. (2010). A relational database for bibliometric analysis. Journal of Informetrics, 4(4), 564–580.
Morillo, F., Aparicio, J., González-Albo, B., & Moreno, L. (2013). Towards the automation of address identification. Scientometrics, 94(1), 207–224. doi:10.1007/s11192-012-0733-6.
National Science Board (2012). Science and engineering indicators 2012. Arlington: National Science Foundation (NSB 12-01).
OST (2010). Indicateurs de sciences et de technologies. Édition 2010. Rapport de l’Observatoire des Sciences et des Techniques établi sous la direction de Ghislaine Filliatreau par l’équipe de l’Observatoire des Sciences et des Techniques (OST), Paris.
Perianes-Rodríguez, A., Chinchilla-Rodríguez, Z., Vargas-Quesada, B., Olmeda-Gómez, C., & Moya-Anegón, F. (2009). Synthetic hybrid indicators based on scientific collaboration to quantify and evaluate individual research results. Journal of Informetrics, 3(2), 91–101.
Thijs, B., & Glänzel, W. (2008). A structural analysis of publication profiles for the classification of European research institutes. Scientometrics, 74(2), 223–236.
Van Raan, A. F. J. (2005). Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods. Scientometrics, 62(1), 133–143.
Wang, J., Berzins, K., Hicks, D., Melkers, J., Xiao, F., & Pinheiro, D. (2012). A boosted-trees method for name disambiguation. Scientometrics, 93(2), 391–411.

Acknowledgments

We wish to thank Adrián Arias Díaz-Faes for his valuable statistical assistance and the anonymous reviewer of this paper for his/her comments and suggestions. This work is supported by the Spanish Ministry of Science and Innovation (Grant CSO2011-25102).

Author information

Authors and Affiliations

Instituto de Estudios Documentales sobre Ciencia y Tecnología (IEDCYT), Centro de Ciencias Humanas y Sociales (CCHS), Spanish National Research Council (CSIC), Albasanz 26-28, 28037, Madrid, Spain
Fernanda Morillo, Ignacio Santabárbara & Javier Aparicio


Corresponding author

Correspondence to Fernanda Morillo.


About this article

Cite this article

Morillo, F., Santabárbara, I. & Aparicio, J. The automatic normalisation challenge: detailed addresses identification. Scientometrics 95, 953–966 (2013). https://doi.org/10.1007/s11192-013-0965-0


Received: 13 July 2012
Issue Date: June 2013
