Efficient Name Disambiguation in Digital Libraries

  • Conference paper
Web-Age Information Management (WAIM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Abstract

In digital libraries, author names become ambiguous when multiple authors share the same name or when one person appears under different name variations. Most previous work on this problem, known as name disambiguation, employs hierarchical clustering approaches based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we propose an approach that can effectively identify and retrieve information from web pages and use that information to disambiguate authors. Initially, we implement a web page identification model using a neural network classifier and traffic rank. Because some records cannot be found directly in personal pages, we then enhance the model to handle such cases during the clustering process, which improves performance. We evaluate our approach on a subset of digital library records, and the results are reasonably effective.
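
To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of hierarchical (agglomerative) clustering over citation records using only co-author and title overlap. It is an illustration under stated assumptions, not the method of this paper: the Jaccard similarity, the weights, the single-link merge rule, the stopping threshold, the record layout and the sample records are all hypothetical choices made for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def record_similarity(r1, r2, w_coauthor=0.7, w_title=0.3):
    """Weighted overlap of co-authors and title words (weights are assumed)."""
    co = jaccard(set(r1["coauthors"]), set(r2["coauthors"]))
    ti = jaccard(set(r1["title"].lower().split()),
                 set(r2["title"].lower().split()))
    return w_coauthor * co + w_title * ti

def cluster_records(records, threshold=0.2):
    """Single-link agglomerative clustering of record indices.

    Starts with one cluster per record and repeatedly merges the two most
    similar clusters until the best similarity drops below `threshold`
    (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            # Single link: cluster similarity is the best record-pair similarity.
            sim = max(record_similarity(records[a], records[b])
                      for a in ci for b in cj)
            if sim > best:
                best, best_pair = sim, (i, j)
        if best < threshold:
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Invented records that all carry the same ambiguous author name.
sample_records = [
    {"title": "Graph clustering for citation data", "coauthors": ["A. Smith", "B. Lee"]},
    {"title": "Scalable graph clustering", "coauthors": ["B. Lee", "C. Wong"]},
    {"title": "Protein folding dynamics", "coauthors": ["D. Patel"]},
]

if __name__ == "__main__":
    print(cluster_records(sample_records))
```

Each resulting cluster is read as one presumed real-world author. Records that share no co-authors or title words stay separate, which is the gap that external evidence such as web pages is meant to fill.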

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, J., Fung, G., Wang, L. (2011). Efficient Name Disambiguation in Digital Libraries. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_37

  • DOI: https://doi.org/10.1007/978-3-642-23535-1_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23534-4

  • Online ISBN: 978-3-642-23535-1

  • eBook Packages: Computer Science, Computer Science (R0)
