A Novel Approach for Author Name Disambiguation Using Ranking Confidence

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Abstract

In digital libraries, author names can be ambiguous because several authors share the same name or because one person appears under different name variations. In recent years, name disambiguation has become a major challenge when integrating data from multiple sources in bibliographic digital libraries. Most previous work addresses this issue by using many attributes, such as coauthors, titles of articles or publications, topics of articles, and years of publication. In most cases, however, only the coauthor and title attributes are available. In this paper, we propose an approach based on Hierarchical Agglomerative Clustering (HAC) that uses only the coauthor and title attributes yet disambiguates authors more effectively. The algorithm is divided into two stages. In the first stage, we employ a pairwise grouping algorithm based on coauthors' names to group records into clusters. In the second stage, we merge two clusters if the similarity of their article titles reaches a threshold. We use three similarity measures, Jaccard Similarity, Cosine Similarity and Euclidean Distance, to compare the titles of two clusters. To reduce the risk of relying on a single similarity metric, we introduce the concept of ranking confidence to measure the confidence of the different similarity measurements. The ranking confidence decides which similarity measure to use when merging clusters. In the experiments, we use the PairPrecision, PairRecall and PairF1 scores to evaluate our method and compare it with other methods. Experimental results indicate that our method significantly outperforms the baseline methods HAC, K-means and SACluster when only the coauthor and title attributes are used.
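The three title similarity measures and the pairwise evaluation scores named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the whitespace tokenization, the pooling of all titles in a cluster into one bag of words, and every function name below are assumptions made for illustration. The ranking confidence itself is not sketched, because the abstract does not define how it is computed.

```python
import math
from collections import Counter
from itertools import combinations


def tokens(titles):
    """Bag of lowercase words pooled from all titles in one cluster (assumed tokenization)."""
    return Counter(w for t in titles for w in t.lower().split())


def jaccard(a, b):
    """Jaccard similarity of the two clusters' vocabularies (word sets)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of the two term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean(a, b):
    """Euclidean distance between the term-frequency vectors (smaller means closer)."""
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in a.keys() | b.keys()))


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall and F1 from record -> cluster-label mappings."""
    pairs = list(combinations(sorted(truth), 2))
    same_pred = {p for p in pairs if predicted[p[0]] == predicted[p[1]]}
    same_true = {p for p in pairs if truth[p[0]] == truth[p[1]]}
    hits = len(same_pred & same_true)
    precision = hits / len(same_pred) if same_pred else 0.0
    recall = hits / len(same_true) if same_true else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example data: two one-title clusters and four labelled records.
cluster_a = tokens(["graph clustering"])
cluster_b = tokens(["graph mining"])
truth = {1: "a", 2: "a", 3: "b", 4: "b"}
predicted = {1: "x", 2: "x", 3: "x", 4: "y"}
scores = pairwise_scores(predicted, truth)
```

Note that Jaccard and cosine are similarities (higher means more alike) while Euclidean distance is a dissimilarity, so any rule that compares them against a merge threshold has to treat the two directions differently.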

Notes

  1. https://en.wikipedia.org/wiki/Euclidean_distance.

  2. https://en.wikipedia.org/wiki/Euclidean_distance.

  3. https://cn.aminer.org/data-sna.

Acknowledgement

This work was supported by the Natural Science Foundation of Guangdong Province, China (2015A030310509), the Public Research and Capacity Building in Guangdong Province, China (2016A030303055), the Major Science and Technology projects of Guangdong Province, China (2016B030305004, 2016B010109008, 2016B010124008) and the National Natural Science Foundation of China (61272067).

Author information

Corresponding author

Correspondence to Yong Tang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lin, X., Zhu, J., Tang, Y., Yang, F., Peng, B., Li, W. (2017). A Novel Approach for Author Name Disambiguation Using Ranking Confidence. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_13

  • DOI: https://doi.org/10.1007/978-3-319-55705-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55704-5

  • Online ISBN: 978-3-319-55705-2

  • eBook Packages: Computer Science, Computer Science (R0)
