Abstract
In digital libraries, author names become ambiguous when several authors share the same name or when one person appears under different name variations. In recent years, name disambiguation has become a major challenge when integrating data from multiple sources in bibliographic digital libraries. Most previous work addresses this problem using many attributes, such as coauthors, titles of articles/publications, topics of articles, and years of publication. In most cases, however, only the coauthor and title attributes are available. In this paper, we propose an approach based on Hierarchical Agglomerative Clustering (HAC) that uses only the coauthor and title attributes, yet disambiguates authors more effectively. The algorithm is divided into two stages. In the first stage, we employ a pair-wise grouping algorithm based on coauthors' names to group records into clusters. In the second stage, we merge two clusters if the similarity between their article titles reaches a threshold. We use three measures, Jaccard Similarity, Cosine Similarity, and Euclidean Distance, to compare the titles of two clusters. To reduce the risk of relying on a single similarity metric, we introduce the concept of ranking confidence, which measures the confidence of the different similarity measurements and decides which measure to use when merging clusters. In the experiments, we evaluate our method with PairPrecision, PairRecall, and PairF1 scores and compare it with other methods. Experimental results indicate that our method significantly outperforms the baseline methods HAC, K-means, and SACluster when only the coauthor and title attributes are used.
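The building blocks named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout, the "share at least one coauthor" grouping rule, whitespace tokenization of titles, and all function names are hypothetical. The ranking-confidence step is deliberately omitted because the abstract does not define it.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def group_by_coauthors(records):
    """Stage 1 (assumed rule): records that share any coauthor join one cluster.

    records: iterable of (record_id, coauthor_names, title) tuples.
    """
    clusters = []  # coauthor sets of distinct clusters stay disjoint
    for rec_id, coauthors, title in records:
        merged = {"ids": {rec_id}, "coauthors": set(coauthors), "titles": [title]}
        remaining = []
        for cluster in clusters:
            if cluster["coauthors"] & merged["coauthors"]:
                merged["ids"] |= cluster["ids"]
                merged["coauthors"] |= cluster["coauthors"]
                merged["titles"] += cluster["titles"]
            else:
                remaining.append(cluster)
        clusters = remaining + [merged]
    return clusters


def title_vector(titles):
    """Bag-of-words term counts over all titles in a cluster (assumed tokenization)."""
    return Counter(word for t in titles for word in t.lower().split())


def jaccard(a, b):
    """Jaccard similarity of the two term sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of the two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def euclidean(a, b):
    """Euclidean distance of the two term-count vectors (smaller means closer)."""
    return sqrt(sum((a[w] - b[w]) ** 2 for w in a.keys() | b.keys()))


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall, and F1 from record-id -> cluster-label dicts."""
    ids = sorted(predicted)
    pred_pairs = {p for p in combinations(ids, 2) if predicted[p[0]] == predicted[p[1]]}
    true_pairs = {p for p in combinations(ids, 2) if truth[p[0]] == truth[p[1]]}
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In the second stage, two clusters from `group_by_coauthors` would be compared by passing `title_vector(cluster["titles"])` for each to one of the three measures and merging when the score reaches a chosen threshold. Note that Euclidean Distance is a distance rather than a similarity, so a threshold on it would be applied in the opposite direction.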
Acknowledgement
This work was supported by the Natural Science Foundation of Guangdong Province, China (2015A030310509), the Public Research and Capacity Building in Guangdong Province, China (2016A030303055), the Major Science and Technology projects of Guangdong Province, China (2016B030305004, 2016B010109008, 2016B010124008) and the National Natural Science Foundation of China (61272067).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lin, X., Zhu, J., Tang, Y., Yang, F., Peng, B., Li, W. (2017). A Novel Approach for Author Name Disambiguation Using Ranking Confidence. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_13
DOI: https://doi.org/10.1007/978-3-319-55705-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55704-5
Online ISBN: 978-3-319-55705-2
eBook Packages: Computer Science (R0)