
Name disambiguation from link data in a collaboration graph using temporal and topological features

  • Original Article
Social Network Analysis and Mining

Abstract

In a social community, multiple persons may share the same name, phone number, or other identifying attributes. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to the erroneous aggregation of records of multiple persons under a single reference. Such mistakes degrade the performance of document retrieval, web search, and database integration and, more importantly, cause improper attribution of credit (or blame). The task of entity disambiguation is to partition the records aggregated under a single reference so that each partition contains the records of exactly one person. Existing solutions to this task use either biographical attributes or auxiliary features collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are unavailable or costly to obtain. Moreover, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task using timestamped link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance.
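To make the setting concrete, the following is a minimal sketch of how timestamped link data alone might hint that one anonymized node merges several persons. It is not the method proposed in the article, whose details are not given in the abstract. It assumes a simple heuristic: remove the ambiguous node, group its collaborators by connectivity among themselves, and report the time span of each group's links to that node. The function name `split_ego_network`, the `(u, v, year)` edge format, and the node labels are all hypothetical.

```python
from collections import defaultdict


def split_ego_network(edges, ego):
    """Group the collaborators of `ego` by connectivity once `ego` is removed.

    `edges` is an iterable of (u, v, year) tuples from an anonymized,
    undirected collaboration graph (hypothetical input format).
    Returns a list of (members, first_year, last_year) tuples, one per group.
    """
    neighbors = set()
    ego_years = defaultdict(list)  # collaborator -> years of links with ego
    adjacency = defaultdict(set)   # links that do not involve ego
    for u, v, year in edges:
        if ego in (u, v):
            other = v if u == ego else u
            if other != ego:  # ignore self-loops on ego
                neighbors.add(other)
                ego_years[other].append(year)
        else:
            adjacency[u].add(v)
            adjacency[v].add(u)

    groups, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first search restricted to ego's direct collaborators.
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt in neighbors and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        years = [y for n in component for y in ego_years[n]]
        groups.append((sorted(component), min(years), max(years)))
    return groups


# Toy data: node "n0" has two collaborator groups that never interact
# with each other and are active in different periods.
edges = [
    ("n0", "a", 2001), ("n0", "b", 2002), ("a", "b", 2002),
    ("n0", "c", 2010), ("n0", "d", 2011), ("c", "d", 2011),
]
groups = split_ego_network(edges, "n0")
```

Two or more disconnected, temporally separated groups are only a hint that the node may represent more than one person; a single researcher who changes institutions can produce the same pattern, so a practical system would need stronger features than this heuristic.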

Notes

  1. http://dblp.org/search/index.php.

  2. http://arnetminer.org.


Author information

Corresponding author

Correspondence to Tanay Kumar Saha.

Additional information

A conference version of this paper was published in the ASONAM 2014 conference proceedings. This research is supported by Mohammad Hasan’s NSF CAREER Award (IIS-1149851). T. K. Saha and B. Zhang contributed equally to this research.

Cite this article

Saha, T.K., Zhang, B. & Hasan, M.A. Name disambiguation from link data in a collaboration graph using temporal and topological features. Soc. Netw. Anal. Min. 5, 11 (2015). https://doi.org/10.1007/s13278-015-0249-1

  • DOI: https://doi.org/10.1007/s13278-015-0249-1
