Abstract
The need to organize and synthesize the growing corpus of publications has prompted a rise in bibliometric studies. Bibliometric analysis is an essential statistical tool that helps researchers identify research prospects and provides evidence to support scientific findings. Researchers primarily use either the Scopus or the Web of Science (WoS) database to conduct bibliometric analysis. Using either database on its own does not achieve the desired outcome, so the two databases must be merged. The literature defines several manual processes for merging Scopus and WoS data. However, these manual procedures are time-consuming and may merge the databases inaccurately, because the difficulty of scrutinizing the data invites human error. To avoid the manual process, this paper proposes an automatic process for merging Scopus and WoS data. To demonstrate the importance of the proposed process, both the manual and the automatic processes are applied to a small dataset (40 records) and a large dataset (2344 records). The simulation results show that the proposed process took 0.4497659 s to merge the small dataset and 1.715981 s to merge the large dataset. The proposed automatic merging process is therefore an effective, time-saving approach that significantly reduces human effort and the risk of error. The outcome of this process is a merged dataset that contains the unique records of both the Scopus and WoS databases.
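The idea of merging two exports and keeping only the unique records can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not describe. The field names `doi` and `title`, the matching rule (DOI first, normalized title as a fallback), the helper names, and the sample records are all assumptions made for illustration.

```python
import re


def record_key(record):
    """Build a matching key: the lowercased DOI if present, else a normalized title.

    Assumption: records are dicts with optional "doi" and "title" fields.
    """
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = (record.get("title") or "").lower()
    # Collapse punctuation and whitespace so minor formatting differences match.
    return ("title", re.sub(r"[^a-z0-9]+", " ", title).strip())


def merge_records(scopus, wos):
    """Merge two record lists, keeping the first occurrence of each key."""
    merged = {}
    for source, records in (("scopus", scopus), ("wos", wos)):
        for record in records:
            key = record_key(record)
            if key not in merged:
                merged[key] = {**record, "source": source}
    return list(merged.values())


# Hypothetical sample data; the DOIs and titles are made up.
scopus = [
    {"doi": "10.1000/A1", "title": "Paper one"},
    {"doi": "", "title": "Paper Two: A Study"},
]
wos = [
    {"doi": "10.1000/a1", "title": "Paper One"},
    {"doi": None, "title": "paper two - a study"},
    {"doi": "10.1000/c3", "title": "Paper three"},
]

merged = merge_records(scopus, wos)
print(len(merged), "unique records")
```

In this sketch a record that carries a DOI in one export but not in the other is not detected as a duplicate, so a real pipeline would likely need a second matching pass on titles.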
Acknowledgements
The authors thank Mr. K. Purna Prakash, Dr. Y. V. Pavan Kumar, and Mr. G. Pradeep Reddy of VIT-AP University, and the reviewers, for their insightful advice, which improved the quality of the paper.
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Contributions
Conceptualization, review, and editing were done by SR. Data extraction, code execution, and formal analysis were done by HJK.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethical statements
We have followed all publication ethics as per the ethical guidelines of the journal.
Data availability
Not applicable.
Code availability
Not provided directly.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kasaraneni, H., Rosaline, S. Automatic Merging of Scopus and Web of Science Data for Simplified and Effective Bibliometric Analysis. Ann. Data. Sci. (2022). https://doi.org/10.1007/s40745-022-00438-0
DOI: https://doi.org/10.1007/s40745-022-00438-0