Automatic Merging of Scopus and Web of Science Data for Simplified and Effective Bibliometric Analysis

Annals of Data Science

Abstract

The need to organize and synthesize a growing body of publications has prompted a rise in bibliometric studies. Bibliometric analysis is a statistical tool that helps researchers identify research prospects and provides evidence to support scientific findings. Researchers primarily use either the Scopus or the Web of Science (WoS) database for bibliometric analysis. Using either database on its own does not achieve the desired outcome, so the two must be merged. Several manual procedures for merging Scopus and WoS data are described in the literature. However, these procedures are time-consuming and can produce inaccurate merges, because scrutinizing the data by hand invites human error. This paper therefore proposes an automatic process for merging Scopus and WoS data. To demonstrate its value, both the manual and the automatic processes are applied to a small dataset (40 records) and a large dataset (2344 records). The simulation results show that the proposed process took 0.4497659 s to merge the small dataset and 1.715981 s to merge the large one. The proposed automatic merging process is thus an effective, time-saving approach that significantly reduces human effort and the risk of error. Its outcome is a merged dataset containing the unique records of both the Scopus and WoS databases.
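The abstract describes the merge only in outline, and the article states that its code is not provided directly. The following is therefore a minimal sketch of one way such a merge could work, not the authors' implementation. It assumes each exported record has been read into a dictionary with hypothetical `doi` and `title` keys, treats two records as duplicates when their normalized DOIs or normalized titles match, and times the merge with the standard-library `time.perf_counter`. The sample records are made up for illustration.

```python
import re
import time


def normalize_doi(doi):
    """Lower-case a DOI and strip a resolver prefix so equal DOIs compare equal."""
    doi = (doi or "").strip().lower()
    return re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)


def normalize_title(title):
    """Reduce a title to lower-case alphanumeric words for a fallback comparison."""
    return " ".join(re.findall(r"[a-z0-9]+", (title or "").lower()))


def merge_records(scopus, wos):
    """Return the union of both record lists, keeping the first copy of each duplicate."""
    seen_dois, seen_titles, merged = set(), set(), []
    for source, records in (("Scopus", scopus), ("WoS", wos)):
        for record in records:
            doi = normalize_doi(record.get("doi"))
            title = normalize_title(record.get("title"))
            if (doi and doi in seen_dois) or (title and title in seen_titles):
                continue  # an equivalent record was already kept
            if doi:
                seen_dois.add(doi)
            if title:
                seen_titles.add(title)
            merged.append({**record, "source": source})
    return merged


# Made-up sample records (assumption: real exports carry many more fields).
scopus = [
    {"doi": "10.1000/a", "title": "Paper A"},
    {"doi": "10.1000/b", "title": "Paper B"},
]
wos = [
    {"doi": "https://doi.org/10.1000/A", "title": "Paper A"},  # same DOI as a Scopus record
    {"doi": "", "title": "Paper  B."},                         # no DOI, same title
    {"doi": "10.1000/c", "title": "Paper C"},                  # found only in WoS
]

start = time.perf_counter()
merged = merge_records(scopus, wos)
elapsed = time.perf_counter() - start
print(f"{len(merged)} unique records merged in {elapsed:.6f} s")
```

Matching on DOI first and falling back to the title is a design choice of this sketch. Title matching can wrongly merge distinct works that share a title, so a real pipeline might also compare the publication year or first author.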



Acknowledgements

The authors thank Mr. K. Purna Prakash, Dr. Y. V. Pavan Kumar, and Mr. G. Pradeep Reddy of VIT-AP University, as well as the reviewers, for their insightful advice, which improved the quality of the paper.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Contributions

SR carried out the conceptualization, review, and editing. HJK carried out the data extraction, code execution, and formal analysis.

Corresponding author

Correspondence to Salini Rosaline.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical statements

We have followed all publication ethics as per the ethical guidelines of the journal.

Data availability

Not applicable.

Code availability

Not provided directly.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kasaraneni, H., Rosaline, S. Automatic Merging of Scopus and Web of Science Data for Simplified and Effective Bibliometric Analysis. Ann. Data. Sci. (2022). https://doi.org/10.1007/s40745-022-00438-0

  • DOI: https://doi.org/10.1007/s40745-022-00438-0
