Abstract
In a social community, multiple persons may share the same name, phone number, or other identifying attributes. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of the records of multiple persons under a single reference. Such mistakes degrade the performance of document retrieval, web search, and database integration and, more importantly, cause improper attribution of credit (or blame). Entity disambiguation is the task of partitioning such aggregated records so that each partition contains the records of a single person. Existing solutions to this task use either biographical attributes or auxiliary features collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are unavailable or costly to obtain. Moreover, collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task using timestamped link information obtained from a collaboration network. Our method does not intrude on privacy, as it uses only the graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance.
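As a rough illustration of the kind of signal that timestamped links in an anonymized graph can provide, the sketch below groups the collaborators of one node into connected components (with the node itself removed) and reports the years in which the node collaborated with each group. This is not the method proposed in the paper; the function names `ego_components` and `active_span`, the node ids, and the `(u, v, year)` link format are all assumptions made for the example.

```python
from collections import defaultdict


def ego_components(links, ego):
    """Split the neighbors of `ego` into connected components,
    using only edges among those neighbors (the ego node is left out).

    `links` is a list of (u, v, year) tuples over anonymized node ids
    (a hypothetical input format chosen for this sketch).
    """
    adj = defaultdict(set)
    for u, v, _year in links:
        adj[u].add(v)
        adj[v].add(u)

    neighbors = adj[ego]
    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first search restricted to the neighbors of `ego`.
        stack = [start]
        seen.add(start)
        comp = set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt in neighbors and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return components


def active_span(links, ego, component):
    """Return (first_year, last_year) of links between `ego` and `component`."""
    years = [
        year
        for u, v, year in links
        if (u == ego and v in component) or (v == ego and u in component)
    ]
    return min(years), max(years)


# Toy anonymized collaboration graph: node "n0" is the ambiguous reference.
links = [
    ("n0", "n1", 2001), ("n0", "n2", 2002), ("n1", "n2", 2002),
    ("n0", "n3", 2010), ("n0", "n4", 2011), ("n3", "n4", 2011),
]
groups = ego_components(links, "n0")
spans = [active_span(links, "n0", g) for g in groups]
```

In this toy graph, the collaborators of `n0` fall into two groups that never collaborate with each other and are active in disjoint periods, which hints that `n0` may aggregate the records of more than one person. A real system would need far more careful features and a learned or tuned decision rule.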
Notes
http://dblp.org/search/index.php.
http://arnetminer.org.
Additional information
A conference version of this paper was published in the ASONAM 2014 conference proceedings. This research is supported by Mohammad Hasan’s NSF CAREER Award (IIS-1149851). T. K. Saha and B. Zhang contributed equally to this research.
Cite this article
Saha, T.K., Zhang, B. & Hasan, M.A. Name disambiguation from link data in a collaboration graph using temporal and topological features. Soc. Netw. Anal. Min. 5, 11 (2015). https://doi.org/10.1007/s13278-015-0249-1
DOI: https://doi.org/10.1007/s13278-015-0249-1