The automatic normalisation challenge: detailed addresses identification

Scientometrics

Abstract

The correct attribution of scientific publications to their true owners is extremely important, given the detailed evaluation processes and the future investments that are based on them. This attribution is hard work for bibliometricians because of the increasing number of documents and the rise in collaboration. Nevertheless, there is no published work with a comprehensive solution to the problem. This article introduces a procedure for the detailed identification and normalisation of addresses to facilitate the correct allocation of the scientific production included in databases. Thanks to our long experience in the manual normalisation of addresses, we have created and maintained various master lists. We have already developed an application to detect institutional sectors (published in a previous paper), and we now analyse the details of particular institutions, taking advantage of our master tables. To test our methodology, we applied it to a Spanish data set that had already been codified manually (95,314 unique addresses included in the year 2008 in the Web of Science databases). These data were analysed with a full-text search against our master lists, which gave the possible codes for each address and determined which addresses could be encoded automatically and which should be reviewed manually. Comparing the automatic codes with the manual ones showed that 87 % of the records were codified automatically, with an error rate of 1.9 %. Only 13 % needed manual review. Finally, we applied the Wilcoxon non-parametric test to show the validity of the methodology, comparing the detailed codes of centres already encoded manually with the automatically encoded ones, and concluded that their distributions were similar, with a significance of 0.078.
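The abstract does not say which variant of the Wilcoxon test was applied or how the samples were paired. As a minimal sketch, assuming the signed-rank variant and a pairing by centre code (records per centre under manual versus automatic coding), the test statistic could be computed as follows. The function name and the counts are hypothetical and are not taken from the article.

```python
def signed_rank_statistic(manual, automatic):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples.

    Assumption: manual[i] and automatic[i] are record counts for the same
    centre code under manual and automatic coding respectively.
    """
    # Zero differences carry no information and are dropped.
    diffs = [a - m for m, a in zip(manual, automatic) if a != m]
    ordered = sorted(diffs, key=abs)

    # Rank the absolute differences, giving tied values their average rank.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus)


# Hypothetical counts for five centres (illustrative only).
manual_counts = [120, 85, 40, 300, 15]
automatic_counts = [118, 85, 43, 295, 16]
w = signed_rank_statistic(manual_counts, automatic_counts)
```

A p-value would then be obtained from the distribution of W under the null hypothesis of no systematic difference; a statistics package would normally be used for that step.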

Notes

  1. E.g.: "Univ", "Dept", "Fac", etc.

  2. Segments are those parts of the records (addresses) between two commas.

  3. The list considers "&", "and", "de", "el", "lo", "las", "la" and "los". Optionally, we can update it with some other words.
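The notes above, together with the abstract, suggest a pipeline of splitting each address into comma-separated segments, dropping the listed stop words, and searching the result against master lists. The following is a minimal sketch of that idea, not the authors' implementation: the master-list entries, the centre codes, the function names, and the rule "encode automatically only when exactly one code matches" are all assumptions made for illustration.

```python
import re
import unicodedata

# Stop words listed in note 3; the authors say the list can be extended.
STOP_WORDS = {"&", "and", "de", "el", "lo", "las", "la", "los"}

# Hypothetical master list of (normalised name variant, centre code) pairs.
MASTER_LIST = [
    ("univ complutense madrid", "UCM"),
    ("univ autonoma madrid", "UAM"),
    ("hosp clin", "HCB"),  # the same variant maps to two codes,
    ("hosp clin", "HCM"),  # so it cannot be resolved automatically
]


def normalise(segment):
    """Lower-case, strip accents and punctuation, and drop stop words."""
    text = unicodedata.normalize("NFKD", segment)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9&]+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def candidate_codes(address):
    """Return the sorted codes whose variant occurs in some segment."""
    # Note 2: segments are the parts of the address between commas.
    segments = [normalise(s) for s in address.split(",")]
    found = set()
    for variant, code in MASTER_LIST:
        for seg in segments:
            # Pad with spaces so that only whole-word matches count.
            if f" {variant} " in f" {seg} ":
                found.add(code)
    return sorted(found)


def triage(address):
    """Assumed rule: exactly one candidate code means automatic encoding."""
    codes = candidate_codes(address)
    decision = "automatic" if len(codes) == 1 else "manual"
    return decision, codes


result = triage("Univ Complutense de Madrid, Fac Med, Madrid 28040, Spain")
```

Under these assumptions, an address with no match or with several competing codes is routed to manual review, which mirrors the split between automatically codified and manually reviewed records described in the abstract.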

References

  • Abramo, G., D’Angelo, C. A., & Pugini, F. (2008). The measurement of Italian Universities’ research productivity by a non-parametric-bibliometric methodology. Scientometrics, 76(2), 225–244.

  • Abramo, G., D’Angelo, C. A., & Di Costa, F. (2011). National research assessment exercises: The effects of changing the rules of the game during the game. Scientometrics, 88(1), 229–238.

  • Almeida, J. A. S., Pais, A. A. C. C., & Formosinho, S. J. (2009). Science indicators and science patterns in Europe. Journal of Informetrics, 3(2), 134–142.

  • Bador, P., & Lafouge, T. (2005). Rédaction des adresses sur les publications. Un manque de rigueur défavorable aux universités françaises dans les classements internationaux. La Presse Médicale, 34(9), 633–636.

  • Bornmann, L., & Ozimek, A. (2012). Stata commands for importing bibliometric data and processing author address information. Journal of Informetrics, 6(4), 505–512.

  • Butler, L. (1999). Who “owns” this publication? Problems with assigning research publications on the basis of addresses. In Proceedings of the seventh conference of the international society for scientometrics and informetrics (pp. 87–96). Universidad de Colima, México.

  • D’Angelo, C. A., Giuffrida, C., & Abramo, G. (2011). A heuristic approach to author name disambiguation in bibliometrics databases for large-scale research assessments. Journal of the American Society for Information Science and Technology, 62(2), 257–269.

  • De Bruin, R. E., & Moed, H. F. (1990). The unification of addresses in scientific publications. In L. Egghe & R. Rousseau (Eds.), Informetrics 89/90. Selection of papers submitted to the 2nd international conference on bibliometrics, scientometrics and informetrics, London, Ontario, Canada, July 5–7, 1989 (pp. 65–78). Amsterdam: Elsevier.

  • FECYT (2007). Workshop on “normalisation of institutions for bibliometric uses”, Barcelona.

  • Feller, I., & Gamota, G. (2007). Science indicators as reliable evidence. Minerva, 45(1), 17–30.

  • Fernández, M. T., Cabrero, A., Zulueta, M. A., & Gómez, I. (1993). Constructing a relational database for bibliometric analysis. Research Evaluation, 3(1), 55–62.

  • Gálvez, C., & Moya-Anegón, F. (2006). The unification of institutional addresses applying parameterized finite state graphs (P FSG). Scientometrics, 69(2), 323–345.

  • Gálvez, C., & Moya-Anegón, F. (2007). Standardizing formats of corporate source data. Scientometrics, 70(1), 3–26.

  • García-Zorita, C., Martín-Moreno, C., Lascurain-Sánchez, M. L., & Sanz-Casado, E. (2006). Institutional addresses in the Web of Science: the effects on scientific evaluation. Journal of Information Science, 32(4), 378–383.

  • Gurney, T., Horlings, E., & van den Besselaar, P. (2012). Author disambiguation using multi-aspect similarity indicators. Scientometrics, 91(2), 435–449.

  • Hood, W. W., & Wilson, C. S. (2003). Informetric studies using databases: Opportunities and challenges. Scientometrics, 58(3), 587–608.

  • Katz, J. S., & Hicks, D. (1997). Desktop scientometrics. Scientometrics, 38(1), 141–153.

  • Mallig, N. (2010). A relational database for bibliometric analysis. Journal of Informetrics, 4(4), 564–580.

  • Morillo, F., Aparicio, J., González-Albo, B., & Moreno, L. (2013). Towards the automation of address identification. Scientometrics, 94(1), 207–224. doi:10.1007/s11192-012-0733-6.

  • National Science Board (2012). Science and engineering indicators 2012. Arlington: National Science Foundation (NSB 12-01).

  • OST (2010). Indicateurs de sciences et de technologies. Édition 2010. Rapport de l’Observatoire des Sciences et des Techniques établi sous la direction de Ghislaine Filliatreau par l’équipe de l’Observatoire des Sciences et des Techniques (OST), Paris.

  • Perianes-Rodríguez, A., Chinchilla-Rodríguez, Z., Vargas-Quesada, B., Olmeda-Gómez, C., & Moya-Anegón, F. (2009). Synthetic hybrid indicators based on scientific collaboration to quantify and evaluate individual research results. Journal of Informetrics, 3(2), 91–101.

  • Thijs, B., & Glänzel, W. (2008). A structural analysis of publication profiles for the classification of European research institutes. Scientometrics, 74(2), 223–236.

  • Van Raan, A. F. J. (2005). Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods. Scientometrics, 62(1), 133–143.

  • Wang, J., Berzins, K., Hicks, D., Melkers, J., Xiao, F., & Pinheiro, D. (2012). A boosted-trees method for name disambiguation. Scientometrics, 93(2), 391–411.

Acknowledgments

We wish to thank Adrián Arias Díaz-Faes for his valuable statistical assistance and the anonymous reviewer of this paper for his/her comments and suggestions. This work is supported by the Spanish Ministry of Science and Innovation (Grant CSO2011-25102).

Author information

Corresponding author

Correspondence to Fernanda Morillo.

About this article

Cite this article

Morillo, F., Santabárbara, I. & Aparicio, J. The automatic normalisation challenge: detailed addresses identification. Scientometrics 95, 953–966 (2013). https://doi.org/10.1007/s11192-013-0965-0

  • DOI: https://doi.org/10.1007/s11192-013-0965-0
