A Novel Approach for Author Name Disambiguation Using Ranking Confidence

Lin, Xueqin; Zhu, Jia; Tang, Yong; Yang, Fen; Peng, Bo; Li, Weiling

doi:10.1007/978-3-319-55705-2_13

Conference paper
First Online: 22 March 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Abstract

In digital libraries, author names become ambiguous when multiple authors share the same name or when one person appears under different name variations. In recent years, name disambiguation has become a major challenge when integrating data from multiple sources in bibliographic digital libraries. Most previous work addresses this issue by using many attributes, such as coauthors, titles of articles or publications, topics of articles, and years of publication. However, in most cases only the coauthor and title attributes are available. In this paper, we propose an approach that is based on Hierarchical Agglomerative Clustering (HAC) and uses only the coauthor and title attributes, yet disambiguates authors more effectively. The algorithm is divided into two stages. In the first stage, we employ a pair-wise grouping algorithm based on coauthors' names to group records into clusters. Then, we merge two clusters if the similarity of their article titles reaches a threshold. Here, we use three similarity measures, Jaccard Similarity, Cosine Similarity and Euclidean Distance, to compare the titles of two clusters. To reduce the risk of relying on a single similarity metric, we introduce the concept of ranking confidence to measure the confidence of the different similarity measurements. The ranking confidence decides which similarity measure to use when merging clusters. In the experiments, we use the PairPrecision, PairRecall and PairF1 scores to evaluate our method and compare it with other methods. Experimental results indicate that our method significantly outperforms the baseline methods HAC, K-means and SACluster when only the coauthor and title attributes are used.
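The three title similarity measures and the pairwise evaluation scores named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The whitespace tokenization, the term-frequency vectors, and all function names are assumptions made for illustration, and the pairwise scores follow the usual definition over pairs of records placed in the same cluster. The abstract does not define ranking confidence, so it is not implemented here.

```python
import math
from collections import Counter
from itertools import combinations


def tokens(title):
    """Lowercase whitespace tokenization (an assumption, not the paper's preprocessing)."""
    return title.lower().split()


def jaccard_similarity(a, b):
    """Size of the token-set intersection divided by the size of the union."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of two titles."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    """Euclidean distance between term-frequency vectors (smaller means more similar)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in set(ca) | set(cb)))


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall and F1 for two clusterings.

    Both arguments map a record id to a cluster label and must share the same ids.
    """
    ids = sorted(predicted)
    pred_pairs = {p for p in combinations(ids, 2) if predicted[p[0]] == predicted[p[1]]}
    true_pairs = {p for p in combinations(ids, 2) if truth[p[0]] == truth[p[1]]}
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

To compare two clusters rather than two single titles, one option is to concatenate the titles in each cluster before calling these functions. That choice is also an assumption, since the abstract does not say how cluster-level titles are aggregated.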

Acknowledgement

This work was supported by the Natural Science Foundation of Guangdong Province, China (2015A030310509), the Public Research and Capacity Building in Guangdong Province, China (2016A030303055), the Major Science and Technology projects of Guangdong Province, China (2016B030305004, 2016B010109008, 2016B010124008) and the National Natural Science Foundation of China (61272067).

Author information

Authors and Affiliations

School of Computer Science, South China Normal University, Guangzhou, China
Xueqin Lin, Jia Zhu, Yong Tang, Fen Yang, Bo Peng & Weiling Li

Corresponding author

Correspondence to Yong Tang.

Editor information

Editors and Affiliations

Royal Melbourne Institute of Technology, Melbourne, Australia
Zhifeng Bao
Northwestern University, Evanston, Illinois, USA
Goce Trajcevski
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua

About this paper

Cite this paper

Lin, X., Zhu, J., Tang, Y., Yang, F., Peng, B., Li, W. (2017). A Novel Approach for Author Name Disambiguation Using Ranking Confidence. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_13

DOI: https://doi.org/10.1007/978-3-319-55705-2_13
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55704-5
Online ISBN: 978-3-319-55705-2
eBook Packages: Computer Science, Computer Science (R0)
