Abstract
In digital libraries, author names become ambiguous because several authors can share the same name and a single author can appear under different name variants. Most previous work on this problem, known as name disambiguation, employs hierarchical clustering based on information inside the citation records, e.g., co-authors and publication titles. In this paper, we propose an approach that identifies and retrieves information from web pages and uses that information to disambiguate authors. First, we implement a web page identification model using a neural network classifier and traffic rank. Because some records cannot be found directly on personal pages, we then extend the model to handle such records during the clustering process, which further improves performance. We evaluate our approach on a subset of digital library records, and the results show that it is reasonably effective.
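The two-stage idea in the abstract (first identify personal web pages, then cluster citation records using those pages, with a fallback for records that appear on no page) can be illustrated with a small sketch. This is not the authors' method: the classifier score, the traffic-rank cutoff, the thresholds, the rule that records listed on the same page share an author, and the co-author Jaccard fallback are all hypothetical stand-ins chosen for illustration. The names `Record`, `Page`, `identify_pages`, and `disambiguate` are likewise invented here.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    rid: int
    coauthors: frozenset  # co-author names on the citation record


@dataclass
class Page:
    url: str
    score: float       # hypothetical classifier probability that this is a personal page
    traffic_rank: int  # hypothetical site traffic rank (1 = most visited site)
    record_ids: set = field(default_factory=set)  # records whose titles appear on the page


def identify_pages(pages, min_score=0.5, min_rank=1000):
    # Assumption: accept pages the classifier scores highly, and reject pages from
    # very high-traffic sites, treated here as aggregators rather than personal pages.
    return [p for p in pages if p.score >= min_score and p.traffic_rank >= min_rank]


class DisjointSet:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(records, pages, coauthor_threshold=0.3):
    ds = DisjointSet(r.rid for r in records)
    accepted = identify_pages(pages)

    # Step 1 (assumed rule): records listed on the same identified page share an author.
    covered = set()
    for page in accepted:
        ids = sorted(i for i in page.record_ids if i in ds.parent)
        covered.update(ids)
        for other in ids[1:]:
            ds.union(ids[0], other)

    # Step 2 (assumed fallback): records found on no identified page are merged
    # by co-author overlap, a single-link merge at a fixed similarity threshold.
    for r in records:
        if r.rid in covered:
            continue
        for s in records:
            if s.rid != r.rid and jaccard(r.coauthors, s.coauthors) >= coauthor_threshold:
                ds.union(r.rid, s.rid)

    clusters = {}
    for r in records:
        clusters.setdefault(ds.find(r.rid), []).append(r.rid)
    return sorted(sorted(c) for c in clusters.values())


records = [Record(1, frozenset({"A", "B"})), Record(2, frozenset({"C"})),
           Record(3, frozenset({"A", "B", "D"})), Record(4, frozenset({"E"}))]
pages = [Page("p1", 0.9, 50_000, {1, 2}),  # accepted
         Page("p2", 0.95, 10, {2, 4}),     # rejected: high-traffic site
         Page("p3", 0.2, 50_000, {4})]     # rejected: low classifier score
print(disambiguate(records, pages))  # [[1, 2, 3], [4]]
```

Records 1 and 2 are merged because they appear on the same accepted page; record 3 appears on no page and joins them through its co-author overlap with record 1, while record 4 stays alone. A union-find structure keeps the merges cheap and makes the page-based and fallback merges compose naturally.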
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, J., Fung, G., Wang, L. (2011). Efficient Name Disambiguation in Digital Libraries. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_37
DOI: https://doi.org/10.1007/978-3-642-23535-1_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer Science, Computer Science (R0)