Scientometrics, Volume 112, Issue 2, pp 747–766

Name usage pattern in the synonym ambiguity problem in bibliographic data

Article

Abstract

Individuals often appear under multiple names in large bibliographic datasets, giving rise to the synonym ambiguity problem. Although most related works focus on resolving name ambiguities, this work focuses on classifying and characterizing multiple name usage patterns, the root cause of such ambiguity. By considering real examples from bibliographic datasets, we identify and classify patterns of multiple name usage by individuals, which can be interpreted as name change, rare name usage, and name co-appearance. In particular, we propose a methodology to classify name usage patterns through a supervised classification task and show that the different classes are robust (across datasets) and exhibit significantly different properties. We show that the collaboration network structure emerging around nodes corresponding to ambiguous names from different name usage patterns has strikingly different characteristics, such as common neighborhood and degree evolution. We believe such differences in network structure and in name usage patterns can be leveraged to design more efficient name disambiguation algorithms that target the synonym problem.

Keywords

Name ambiguity, Classification, Collaboration network


Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2017

Authors and Affiliations

  1. PESC/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
