Efficient Name Disambiguation in Digital Libraries

Zhu, Jia; Fung, Gabriel; Wang, Liwei

doi:10.1007/978-3-642-23535-1_37


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Included in the following conference series:

International Conference on Web-Age Information Management


Abstract

In digital libraries, ambiguous author names arise because multiple authors can share the same name and because the same person may appear under different name variations. Most previous work on this problem, known as name disambiguation, employs hierarchical clustering based on information inside the citation records, e.g., co-authors and publication titles. In this paper, we propose an approach that effectively identifies and retrieves information from web pages and uses that information to disambiguate authors. We first implement a web page identification model using a neural network classifier and traffic rank. Because some records cannot be found directly on personal pages, we then extend the model to handle such cases during the clustering process, which further improves performance. We evaluate our approach on a subset of digital library records, and the results show that it is reasonably effective.
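The abstract contrasts clustering on citation-record fields with clustering that also uses evidence from web pages. The sketch below illustrates that general idea only and is not the authors' method. It assumes each record carries a hypothetical `id`, a `coauthors` set and a `title`, and it uses single-link agglomerative clustering cut at a similarity threshold, implemented with union-find.

The `web_groups` parameter is likewise hypothetical. It stands in for sets of records found together on one identified personal page. The weights (0.6/0.4) and the threshold (0.3) are illustrative assumptions. The paper's neural network classifier and traffic-rank page identification are not modelled here.

```python
from itertools import combinations

# Assumption: each record is a dict with a hypothetical "id", a "coauthors"
# set and a "title" string; weights and threshold are illustrative only.

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity(r1, r2, w_coauthor=0.6, w_title=0.4):
    """Weighted mix of co-author overlap and title-word overlap."""
    t1 = set(r1["title"].lower().split())
    t2 = set(r2["title"].lower().split())
    return (w_coauthor * jaccard(r1["coauthors"], r2["coauthors"])
            + w_title * jaccard(t1, t2))

def cluster(records, threshold=0.3, web_groups=()):
    """Single-link clustering cut at `threshold`, via union-find.

    web_groups: hypothetical iterable of record-id sets assumed to appear
    together on one identified personal web page; each set is merged outright.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Web evidence: records listed on the same page are treated as one author.
    for group in web_groups:
        ids = list(group)
        for other in ids[1:]:
            union(ids[0], other)

    # Citation evidence: link every pair whose similarity clears the threshold.
    for r1, r2 in combinations(records, 2):
        if similarity(r1, r2) >= threshold:
            union(r1["id"], r2["id"])

    # Collect connected components, ordered by their smallest record id.
    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), set()).add(r["id"])
    return sorted(clusters.values(), key=min)

# Hypothetical sample records sharing one ambiguous author name.
records = [
    {"id": "a", "coauthors": {"A. Smith", "B. Jones"},
     "title": "web pages identification for entity resolution"},
    {"id": "b", "coauthors": {"A. Smith", "B. Jones"},
     "title": "term based clustering for name disambiguation"},
    {"id": "c", "coauthors": {"C. Brown"}, "title": "protein folding simulation"},
    {"id": "d", "coauthors": set(), "title": "graph mining survey"},
]

print(cluster(records))                           # citation evidence only
print(cluster(records, web_groups=[{"c", "d"}]))  # plus web-page evidence
```

On these sample records, `a` and `b` merge through their shared co-authors. Records `c` and `d` share no co-authors or title words, so they merge only when passed together in `web_groups`.

Cutting single-link clustering at a fixed threshold is equivalent to taking connected components of the graph of above-threshold pairs. That equivalence is why union-find is enough here.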



Author information

Authors and Affiliations

School of ITEE, The University of Queensland, Australia
Jia Zhu
iConcept Press, Hong Kong
Gabriel Fung
Wuhan University, China
Liwei Wang


Editor information

Editors and Affiliations

Microsoft Research Asia, 5 Danling Rd., Haidian District, 100190, Beijing, China
Haixun Wang
Computer School, Wuhan University, 16 Luojiashan Road, 430072, Hubei, China
Shijun Li
Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, 060-0814, Hokkaido, Sapporo, Japan
Satoshi Oyama
College of Information Science and Technology, Drexel University, 19104, Philadelphia, PA, USA
Xiaohua Hu
State Key Laboratory of Software Engineering, Wuhan University, 16 Luojiashan Road, 430072, Wuhan, Hubei, China
Tieyun Qian


About this paper

Cite this paper

Zhu, J., Fung, G., Wang, L. (2011). Efficient Name Disambiguation in Digital Libraries. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_37


DOI: https://doi.org/10.1007/978-3-642-23535-1_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer Science, Computer Science (R0)
