Multi-strategic Approach for Author Name Disambiguation in Bibliography Repositories

  • Conference paper
Information Management and Big Data (SIMBig 2020)

Abstract

Author name ambiguity in digital bibliography repositories can compromise the integrity and reliability of their data. The literature offers several techniques for author name disambiguation. In this work, we present a multi-strategic approach for author name disambiguation in bibliography repositories that combines string comparison, using the Jaccard similarity coefficient and the Levenshtein distance, with a social network clustering technique. We use information from the DBLP digital bibliography repository to compare our disambiguation results against those of SCI-synergy, an online scientific social network analysis artifact. In a case study of a Brazilian graduate program, the proposed approach outperforms the baseline, achieving a precision of 0.8867, a recall of 1, and an F-measure of 0.9399.
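
The two string measures and the evaluation metric named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenization used for the Jaccard coefficient, and the example name strings are assumptions made here for illustration, and the abstract does not state how the measures are thresholded or combined with the clustering step.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over lowercase name tokens.

    Assumption: names are tokenized on whitespace; the abstract does not
    say which tokenization the paper uses.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty names are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical name variants, chosen only to exercise the functions.
    print(jaccard("Celia Ralha", "Celia G. Ralha"))
    print(levenshtein("Celia Ralha", "Celia G. Ralha"))
    # Precision and recall as reported in the abstract.
    print(round(f_measure(0.8867, 1.0), 4))
```

With the precision and recall reported in the abstract, `f_measure` returns approximately 0.9399, which agrees with the stated F-measure.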

Notes

  1. https://dblp.uni-trier.de/

  2. https://baike.baidu.com

  3. http://www.cnki.net/

  4. https://dblp.org/search/publ/api

  5. https://neo4j.com/

  6. https://web.archive.org/web/20110728092533/http://arnetminer.org/

  7. https://academic.microsoft.com/home

Acknowledgments

Prof. Célia G. Ralha thanks the Brazilian National Council for Scientific and Technological Development (CNPq) for its support through the Computer Science research grant number 311301/2018-5.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

de Souza Rodrigues, N., Costa, A.R., Lemos, L.C., Ralha, C.G. (2021). Multi-strategic Approach for Author Name Disambiguation in Bibliography Repositories. In: Lossio-Ventura, J.A., Valverde-Rebaza, J.C., Díaz, E., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2020. Communications in Computer and Information Science, vol 1410. Springer, Cham. https://doi.org/10.1007/978-3-030-76228-5_5

  • DOI: https://doi.org/10.1007/978-3-030-76228-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-76227-8

  • Online ISBN: 978-3-030-76228-5

  • eBook Packages: Computer Science, Computer Science (R0)