Scientometrics, Volume 84, Issue 3, pp 763–784

Bibliometric fingerprints: name disambiguation based on approximate structure equivalence of cognitive maps

Abstract

Authorship identity has long been an Achilles’ heel in bibliometric analyses at the individual level. The problem arises in studies of scientists’ productivity, inventor mobility, and scientific collaboration. Drawing on the concept of cognitive maps from psychology and of approximate structural equivalence (ASE) from network analysis, we develop a novel name-disambiguation algorithm based on knowledge homogeneity scores. We test it on two cases. The results show that this approach outperforms other common authorship identification methods and that the ASE method is a relatively simple algorithm that yields higher accuracy with reasonable time demands.
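To make the pipeline in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' formula or implementation. It assumes each paper's cognitive map is represented by the set of references it cites. It also assumes a hypothetical knowledge homogeneity score that sums the shared references of two papers, each weighted by 1/log(1 + n_r), where n_r is the number of papers in the ambiguous-name set that cite reference r. Papers are then grouped by single-linkage hierarchical clustering cut at a similarity threshold. The names `citation_counts`, `homogeneity_score`, `disambiguate`, and `threshold` are hypothetical, as is the toy data.

```python
from collections import Counter
from itertools import combinations
from math import log


def citation_counts(papers):
    """Count how many papers in the ambiguous-name set cite each reference."""
    counts = Counter()
    for refs in papers.values():
        counts.update(set(refs))  # count each reference once per paper
    return counts


def homogeneity_score(refs_a, refs_b, counts):
    """Hypothetical knowledge homogeneity score (an assumption, not the paper's formula):
    sum over shared cited references, down-weighting widely cited ones."""
    shared = set(refs_a) & set(refs_b)
    return sum(1.0 / log(1 + counts[r]) for r in shared)


def disambiguate(papers, threshold):
    """Single-linkage clustering cut at `threshold`: papers linked by any chain of
    pairs scoring >= threshold are attributed to the same person."""
    counts = citation_counts(papers)
    parent = {p: p for p in papers}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(papers, 2):
        if homogeneity_score(papers[a], papers[b], counts) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for p in papers:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


# Toy example: five papers published under one ambiguous name (hypothetical data).
papers = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R2", "R3", "R4"],
    "P3": ["R9", "R10"],
    "P4": ["R10", "R11"],
    "P5": ["R20"],
}
print(disambiguate(papers, threshold=0.5))  # → [['P1', 'P2'], ['P3', 'P4'], ['P5']]
```

Cutting a single-linkage hierarchy at a fixed threshold is equivalent to taking the connected components of the graph whose edges are the paper pairs scoring at or above that threshold. For that reason, a union-find structure is enough here. Raising the threshold splits the papers into more, smaller author clusters. With `threshold=1.0`, for example, P3 and P4 (one shared reference) are separated.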

Keywords

Name disambiguation; Common names; Cognitive map; Approximate structural equivalence; Knowledge homogeneity score; Hierarchical clustering

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2010

Authors and Affiliations

  1. School of Public Policy, Georgia Institute of Technology, Atlanta, USA
