Efficient Name Disambiguation in Digital Libraries

  • Jia Zhu
  • Gabriel Fung
  • Liwei Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6897)


In digital libraries, ambiguous author names arise when multiple authors share the same name or when one person appears under different name variations. Most previous work on this problem, known as name disambiguation, employs hierarchical clustering based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we propose an approach that can effectively identify and retrieve information from web pages and use that information to disambiguate authors. We first implement a web page identification model using a neural network classifier and traffic rank. Because some records cannot be found directly in personal pages, we then enhance the model to handle such cases during the clustering process, which improves performance. We evaluate our approach on a subset of digital library records, and the results are reasonably effective.


  • Search Engine
  • Neural Network Model
  • Digital Library
  • Latent Dirichlet Allocation
  • Probabilistic Latent Semantic Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jia Zhu (1)
  • Gabriel Fung (2)
  • Liwei Wang (3)
  1. School of ITEE, The University of Queensland, Australia
  2. iConcept Press, Hong Kong
  3. Wuhan University, China
