Name disambiguation from link data in a collaboration graph using temporal and topological features

  • Tanay Kumar Saha
  • Baichuan Zhang
  • Mohammad Al Hasan
Original Article

Abstract

In a social community, multiple persons may share the same name, phone number, or some other identifying attribute. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of the records of multiple persons under a single reference. Such mistakes degrade the performance of document retrieval, web search, and database integration and, more importantly, cause improper attribution of credit (or blame). The task of entity disambiguation is to partition the records belonging to multiple persons so that each partition is composed of the records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are not available, or they are costly to obtain. Moreover, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task using timestamped link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance.

Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  • Tanay Kumar Saha (1)
  • Baichuan Zhang (1)
  • Mohammad Al Hasan (1)
  1. Department of Computer and Information Science, Indiana University - Purdue University Indianapolis, Indianapolis, USA