Scientometrics, Volume 99, Issue 3, pp 823–838

Institution name disambiguation for research assessment

  • Shuiqing Huang
  • Bo Yang
  • Sulan Yan
  • Ronald Rousseau
Abstract

Research evaluation is necessary for the management of academic units (scientists, research groups, departments, institutes, universities) and for government decision making in science and technology. Yet wrong conclusions may be drawn when authors are assigned to the wrong institutions. To improve on existing techniques of institution name disambiguation (IND) based on word similarity or edit distance, this study proposes a rule-based algorithm. One-to-many relationships between an institution and the many variant names by which it is referred to in publication bylines are recognized with the aid of statistical methods and specific rules. The performance of the rule-based IND algorithm is evaluated on large datasets in four fields. The experimental results demonstrate that the precision of the algorithm is high, but recall still needs to be improved.
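
For orientation, the word similarity and edit distance baselines mentioned in the abstract, and the precision and recall measures used for evaluation, can be sketched as follows. This is only an illustrative sketch, not the authors' rule-based algorithm: the function names, the word-level Jaccard formulation, and the sample institution names are assumptions chosen for the example.

```python
def jaccard_similarity(name_a: str, name_b: str) -> float:
    """Word-level Jaccard similarity: shared words / all distinct words."""
    words_a = set(name_a.lower().split())
    words_b = set(name_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty names are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def precision_recall(predicted: set, actual: set) -> tuple:
    """Precision and recall of predicted variant names against a verified set."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical name variants, chosen only to illustrate the weakness of
# pure word similarity: an abbreviated variant of the same institution
# scores lower than the name of a different institution.
same_institution = jaccard_similarity(
    "Nanjing Agricultural University", "Nanjing Agr Univ")
different_institution = jaccard_similarity(
    "Nanjing Agricultural University", "Nanjing Normal University")
print(same_institution, different_institution)
```

In this example the abbreviated variant shares only one of five distinct words with the full name, while the different institution shares two of four, which is the kind of error that motivates adding rules on top of similarity scores.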

Keywords

Institution name disambiguation (IND) · Rule-based system · Artificial intelligence · Informetrics

Notes

Acknowledgments

We would like to thank Qiuru Peng, Hui Lin, Xueqin Jiang, and Zengli She from the College of Information Science and Technology for their work on data verification. The authors are supported by Grant No. 13CTQ031 of the National Social Science Fund of China.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2013

Authors and Affiliations

  • Shuiqing Huang (1)
  • Bo Yang (1)
  • Sulan Yan (1)
  • Ronald Rousseau (2, 3)

  1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing, People’s Republic of China
  2. Institute for Education and Information Sciences, IBW, University of Antwerp (UA), Antwerp, Belgium
  3. KU Leuven, Louvain, Belgium
