Bibliometric fingerprints: name disambiguation based on approximate structure equivalence of cognitive maps
Authorship identity has long been an Achilles’ heel in bibliometric analyses at the individual level. The problem arises in studies of scientists’ productivity, inventor mobility and scientific collaboration. Drawing on the concept of cognitive maps from psychology and of approximate structural equivalence (ASE) from network analysis, we develop a novel name-disambiguation algorithm based on knowledge homogeneity scores. We test it on two cases. The results show that this approach outperforms other common authorship-identification methods: the ASE method is a relatively simple algorithm that yields higher accuracy with reasonable time demands.
Keywords: Name disambiguation; Common names; Cognitive map; Approximate structural equivalence; Knowledge homogeneity score; Hierarchical clustering
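To make the idea concrete, here is a minimal sketch of how an ASE-style disambiguation could work. Two nodes are structurally equivalent when they have identical ties to the same others. Papers that cite nearly the same references are therefore approximately structurally equivalent in the paper-by-reference network, which suggests a shared author. The abstract does not say how the knowledge homogeneity score is computed, so this sketch makes several assumptions. Each paper is represented by the set of references it cites. Cosine similarity of those sets stands in for the score. Papers are grouped by single-linkage hierarchical clustering cut at a threshold. The function names `homogeneity` and `cluster_papers`, the threshold, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def homogeneity(refs_a: set[str], refs_b: set[str]) -> float:
    """Hypothetical stand-in for a knowledge homogeneity score:
    cosine similarity of two binary cited-reference vectors."""
    if not refs_a or not refs_b:
        return 0.0  # a paper with no references shares nothing
    shared = len(refs_a & refs_b)
    return shared / sqrt(len(refs_a) * len(refs_b))


def cluster_papers(papers: dict[str, set[str]], threshold: float) -> list[set[str]]:
    """Single-linkage clustering cut at `threshold` (an assumed parameter).

    Cutting a single-linkage dendrogram at a similarity level equals taking the
    connected components of the graph whose edges join papers scoring at or
    above that level, which is what this union-find computes."""
    parent = {pid: pid for pid in papers}  # each paper starts as its own cluster

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(papers, 2):
        if homogeneity(papers[a], papers[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for pid in papers:
        groups.setdefault(find(pid), set()).add(pid)
    return list(groups.values())


if __name__ == "__main__":
    # Toy data: four papers published under one ambiguous name, keyed by cited references
    papers = {
        "p1": {"r1", "r2", "r3"},
        "p2": {"r2", "r3", "r4"},
        "p3": {"r7", "r8"},
        "p4": {"r8", "r9"},
    }
    print(sorted(sorted(g) for g in cluster_papers(papers, threshold=0.4)))
```

With a threshold of 0.4, the script prints `[['p1', 'p2'], ['p3', 'p4']]`: two inferred authors, each with a distinct citation "fingerprint". Raising the threshold splits clusters, which trades recall for precision. A real system would calibrate this cut-off against hand-verified records.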