Institution name disambiguation for research assessment
Research evaluation is necessary for managing academic units (scientists, research groups, departments, institutes, universities) and for government decision making in science and technology. Yet wrong conclusions may be drawn when authors are assigned to the wrong institutions. To improve on existing institution name disambiguation (IND) techniques based on word similarity or edit distance, this study proposes a rule-based algorithm. The one-to-many relationships between an institution and the variant names under which it appears in publication bylines are recognized with the aid of statistical methods and specific rules. The performance of the rule-based IND algorithm is evaluated on large datasets in four fields. The experimental results demonstrate that the precision of the algorithm is high, but that recall still needs to be improved.
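For readers unfamiliar with the similarity-based techniques mentioned above, the following is a minimal Python sketch of two common measures: Jaccard similarity over word tokens and Levenshtein edit distance over characters. It is not the rule-based algorithm proposed in the study, whose rules are not given here; the function names and the example institution strings are illustrative assumptions.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two names."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # convention chosen here: two empty names count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical variant names of one institution (illustrative only).
short_name = "Univ Antwerp"
full_name = "University of Antwerp"
print(jaccard_similarity(short_name, full_name))  # 0.25
print(edit_distance(short_name, full_name))       # 9
```

In this example the abbreviated and full forms share only one of four distinct words and differ by nine character edits, which illustrates how an abbreviated variant can score poorly under purely similarity-based matching.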
Keywords: Institution name disambiguation (IND); Rule-based system; Artificial intelligence; Informetrics
We would like to thank Qiuru Peng, Hui Lin, Xueqin Jiang, and Zengli She from the college of information science and technology for their work on data verification. The authors are supported by Grant No. 13CTQ031 of the National Social Science Fund of China.