Abstract
Author name ambiguity in digital bibliography repositories can compromise the integrity and reliability of the data. Several techniques to solve the author name disambiguation problem are available in the literature. In this work, we present a multi-strategic approach to author name disambiguation in bibliography repositories that combines string comparison, using the Jaccard similarity coefficient and the Levenshtein distance, with a social network clustering technique. We use information from the DBLP digital bibliography repository and compare the disambiguation results to those of SCI-Synergy, an online scientific social network analysis artifact. In a case study of a Brazilian graduate program, the proposed approach outperforms the baseline, with a precision of 0.8867, a recall of 1, and an F-measure of 0.9399.
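The two string measures and the F-measure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenization used for the Jaccard coefficient, the function names, and the sample names are assumptions made for illustration, and the social network clustering step is not covered.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A & B| / |A | B| over lowercase whitespace tokens.

    Token-level comparison is an assumption; the abstract does not say
    how the strings are split into sets.
    """
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical name variants, chosen only for illustration.
    print(jaccard("Maria A. Silva", "Maria Silva"))
    print(levenshtein("Goncalves", "Gonsalves"))
    # Precision and recall as reported in the abstract.
    print(round(f_measure(0.8867, 1.0), 4))
```

With the reported precision of 0.8867 and recall of 1, the harmonic mean is 2 × 0.8867 / 1.8867 ≈ 0.9399, which agrees with the F-measure stated in the abstract.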
Acknowledgments
Prof. Célia G. Ralha thanks the Brazilian National Council for Scientific and Technological Development (CNPq) for its support through research grant number 311301/2018-5 in Computer Science.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
de Souza Rodrigues, N., Costa, A.R., Lemos, L.C., Ralha, C.G. (2021). Multi-strategic Approach for Author Name Disambiguation in Bibliography Repositories. In: Lossio-Ventura, J.A., Valverde-Rebaza, J.C., Díaz, E., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2020. Communications in Computer and Information Science, vol 1410. Springer, Cham. https://doi.org/10.1007/978-3-030-76228-5_5
DOI: https://doi.org/10.1007/978-3-030-76228-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76227-8
Online ISBN: 978-3-030-76228-5
eBook Packages: Computer Science, Computer Science (R0)