The SARS-CoV-2 virus, which causes Covid-19, triggered a global pandemic for which an effective cure, in the form of either a drug or a vaccine, has yet to be found. In the few brief months that the world has known Covid-19, an unprecedented volume of papers related to this disease has been published, whether in a bid to find solutions or to discuss applied or related aspects. Data from Clarivate Analytics’ Web of Science and Elsevier’s Scopus, neither of which indexes preprints, were assessed. Our estimates indicate that 23,634 unique documents, 9960 of which were common to both databases, were published between January 1 and June 30, 2020. Publications include research articles, letters, editorials, notes and reviews. As one example, among the 21,542 documents in Scopus, 47.6% were research articles and 22.4% were letters; the rest were reviews, editorials, notes and other document types. In both databases, the top three countries, ranked by volume of published papers, were the USA, China and Italy, while BMJ, Journal of Medical Virology and The Lancet published the largest numbers of Covid-19-related papers. This paper provides one snapshot of how the publishing landscape evolved in the first six months of 2020 in response to the pandemic and discusses the risks associated with the speed of publication.
The global Covid-19 pandemic has already infected 13,734,518 people, causing 588,149 deathsFootnote 1 in one of humanity’s greatest challenges of modern times. For academics, it has provided an extremely rare opportunity to examine many aspects, biomedical and other (economic, public health, psychological, social, historical, etc.), related to the SARS-CoV-2 virus. In just a few months, roughly tens of thousands of preprints, peer-reviewed papers and other documents related to Covid-19 have been published. NCBI’s LitCovid (Chen et al. 2020) shows that 31,360 documents had been indexed in PubMed as of July 14, 2020. The objective of this paper is to provide a snapshot of the publishing landscape, to appreciate the volume of papers that have been published in indexed, peer-reviewed journals in two major databases, Clarivate Analytics’ Web of Science (WoS) and Elsevier’s Scopus, from January 1 until June 30, 2020. More than 80% of Covid-19 papers (83% in Scopus, 89% in WoS) are open access (OA), but not all, despite a public agreement to make all such papers OA.Footnote 2 We synthesize what we have discovered in these two databases. A brief discussion of some of the risks of the high publication volume of papers related to Covid-19, and of the discrepancies between database findings, is provided at the end of the letter.
A search was limited, in WoS (SCI, SSCI, A&HCI and ESCI) and Scopus, to documents published in the past six months (January 1 to June 30, 2020), using the search query “SARS-CoV-2” OR “COVID-19” OR “Coronavirus 2019” OR “Corona Virus 2019” OR “novel coronavirus” OR “novel corona virus” OR “2019-nCoV”, applied to titles and keywords. Data were collected on July 1, 2020. As a result, 12,331 and 21,602 documents were retrieved from WoS and Scopus, respectively. These documents were then manually cleaned to remove duplicates. In a few cases, 2.3% in WoS and 0.28% in Scopus, the early-access and published versions of the same document were indexed twice under different accession numbers (in WoS) or Scopus IDs. Consequently, the sample sizes decreased to 12,052 documents with unique digital object identifiers (DOIs) or titles for WoS and 21,542 documents for Scopus. These records were then analyzed by a variety of fields, including subject areas, document types, organizations, funding sponsors, authors, source titles, countries, languages, and most cited documents.
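The deduplication step described above was performed manually, but it can be sketched programmatically as follows. The record fields (`doi`, `title`) and the normalization rules are illustrative assumptions, not the actual export format of either database.

```python
def dedupe(records):
    """Remove duplicate records that share a DOI or a normalized title,
    keeping the first occurrence encountered (e.g. the published version
    over the early-access one). Records are dicts with 'doi' and 'title'
    keys; these field names are illustrative."""
    seen_dois, seen_titles, unique = set(), set(), []
    for rec in records:
        doi = (rec.get("doi") or "").lower()
        # Collapse whitespace and case so trivially differing titles match.
        title = " ".join(rec["title"].lower().split())
        if (doi and doi in seen_dois) or title in seen_titles:
            continue  # duplicate of an already-kept record
        if doi:
            seen_dois.add(doi)
        seen_titles.add(title)
        unique.append(rec)
    return unique
```

A real cleanup would still need manual inspection, since some duplicate pairs carry distinct DOIs (the early-access/published cases noted above) and are only detectable by title.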
We discovered 23,634 unique documents in WoS and Scopus, i.e., documents with unique DOIs or titles. More specifically, 2092 documents were exclusively indexed in WoS (but not in Scopus), 11,582 documents were exclusively indexed in Scopus (but not in WoS), and 9960 documents were indexed in both WoS and Scopus (Fig. 1).
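The exclusive and shared counts above reduce to set operations over document keys (DOIs, or normalized titles where a DOI is missing); a minimal sketch, assuming the two deduplicated key sets have already been built:

```python
def overlap_counts(wos_keys, scopus_keys):
    """Partition two sets of document keys (DOIs or normalized titles)
    into WoS-only, Scopus-only, and shared counts."""
    wos_only = wos_keys - scopus_keys      # indexed exclusively in WoS
    scopus_only = scopus_keys - wos_keys   # indexed exclusively in Scopus
    both = wos_keys & scopus_keys          # indexed in both databases
    return len(wos_only), len(scopus_only), len(both)
```

On the cleaned samples this partition yields the 2092 / 11,582 / 9960 split reported above, with the union (23,634) being the sum of the three parts.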
Table 1 indicates that articles, letters and editorials were generally the top three categories of published documents, although the rank and the relative percentage differed depending on the database. For example, original papers accounted for 47.6% of all Covid-19-related documents in Scopus (vs. 36.8% in WoS). In general, editorials are not peer reviewed, while letters to the editor are generally only screened by editors.
Scopus reveals (Table 2) that the top three (by volume of unique documents) institutions were: Huazhong University of Science and Technology (China) (442), Tongji Medical College (China) (433), and Harvard Medical School (USA) (395); publishing authors: Viroj Wiwanitkit (83), Elisabeth Mahase (65), and Gareth Iacobucci (53); countries: USA (5033), China (3511), and Italy (2590); funding sponsors: National Natural Science Foundation of China (560), National Institutes of Health (USA) (254), and National Institute for Health Research (UK) (88); source titles: BMJFootnote 3 (BMJ Publishing Group Ltd.) (574), Journal of Medical Virology (John Wiley & Sons, Inc.) (317), and The Lancet (Elsevier) (230); languagesFootnote 4: English (20,232), Chinese (510), and Spanish (475); subject areas: medicine (17,578), biochemistry, genetics and molecular biology (2065), and immunology and microbiology (1722); top cited papers: 3469 citations (https://doi.org/10.1016/s0140-6736(20)30183-5; Huang et al.; The Lancet), 2031 citations (https://doi.org/10.1001/jama.2020.1585; Wang et al.; JAMA), and 1887 citations (https://doi.org/10.1056/nejmoa2001017; Zhu et al.; New England Journal of Medicine).
WoS reveals (Table 3) that the top three (by volume of unique documents) institutions were: University of London (UK) (370), Harvard University (USA) (292), and University of California (USA) (250); publishing authors: Elisabeth Mahase (55), Gareth Iacobucci (43), and Viroj Wiwanitkit (39); countries: USA (2999), China (2131), and Italy (1513); funding sponsors: National Natural Science Foundation of China (472), United States Department of Health and Human Services (329), and National Institutes of Health (USA) (317); source titles: BMJ (BMJ Publishing Group Ltd.) (456), Journal of Medical Virology (John Wiley & Sons, Inc.) (248), and The Lancet (Elsevier) (183); languages: English (11,447), Chinese (155), and German (150); research areas: general internal medicine (2178), public environmental occupational health (959), and surgery (701); top cited papers: 2513 citations (https://doi.org/10.1016/s0140-6736(20)30183-5; Huang et al.; The Lancet), 1484 citations (https://doi.org/10.1001/jama.2020.1585; Wang et al.; JAMA), and 1356 citations (https://doi.org/10.1016/s0140-6736(20)30211-7; Chen et al.; The Lancet).
Lack of disambiguation of Chinese names in WoS
While analyzing the top-ranked authors (in terms of publishing volume), it was noticed that several Chinese authors were highly ranked. For example, in WoS, Wang Y, Zhang Y, Li Y and Liu Y were ranked 1st, 2nd, 3rd and 5th among the top 10 authors. Similarly, Wang L, Li L, Wang J and Liu J were ranked 6th to 9th. As one example, Wang Y, who should have been a “unique” author, and who had 76 papers attributed to that name, was in fact found, after manual disambiguation and analysis of the original records, to be several distinct authors named Wang Y.Footnote 5
The importance of accurate, culturally sensitive indexing of Chinese names cannot be over-emphasized, as incorrectly indexed names can drastically alter author-based metrics (Teixeira da Silva 2020a, b) and, as can be observed in the statistics in the previous paragraph, metrics specifically related to the Covid-19 literature. Several techniques are available to improve author name disambiguation (Hussain and Asghar 2017). One way to address name ambiguity is to inspect authors’ names together with their background information (e.g., the institution at which they work, or the research areas or topics in which they are active) prior to any further bibliometric study.
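A first-pass version of the inspection described above can be sketched as follows: homonymous author records are split by pairing the indexed name with the stated affiliation. The record fields (`name`, `affiliation`) are illustrative assumptions; a real disambiguator would also compare co-author networks and research topics, as the cited survey discusses.

```python
from collections import defaultdict

def split_homonyms(author_records):
    """Split records sharing one indexed name (e.g. 'Wang Y') into
    candidate distinct authors, keyed by (name, affiliation). This is a
    coarse heuristic: it over-splits authors who change institutions and
    under-splits homonyms at the same institution."""
    groups = defaultdict(list)
    for rec in author_records:
        key = (rec["name"].lower(), rec["affiliation"].lower())
        groups[key].append(rec)
    return groups
```

Applied to the 76 WoS records attributed to “Wang Y”, even this coarse grouping would separate most of the distinct individuals listed in Footnote 5, since they differ by affiliation.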
Errors in assignment of DOIs
Some errors in the assignment of DOIs were observed in WoS and Scopus during data analysis. The incorrect assignment of a single DOI to multiple papers is one of these issues. For example, each of the following DOIs is mistakenly assigned to two different articles in Scopus: “https://doi.org/10.3760/cma.j.issn.0254-6450.2020.02.001”, “https://doi.org/10.4414/smw.2020.20247”, “https://doi.org/10.1001/jama.2020.6122”, and “https://doi.org/10.3760/cma.j.issn.0254-6450.2020.02.002”. To exemplify this, Fig. 2 shows a screenshot of the Scopus database response to the query “DOI = https://doi.org/10.4414/smw.2020.20247”. A single paper carrying two different DOIs was another error observed in both databases. For example, both the DOIs “https://doi.org/10.15252/embj.2020105114” and “https://doi.org/10.15252/embj.20105114” retrieve the same paper in WoS.
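Both kinds of anomaly above can be flagged automatically once records are reduced to (DOI, title) pairs; a minimal sketch, assuming titles have already been normalized enough for equal titles to denote the same paper:

```python
from collections import defaultdict

def doi_anomalies(records):
    """Flag two indexing errors in a list of (doi, title) pairs:
    (1) one DOI assigned to several distinct titles, and
    (2) one title appearing under several DOIs."""
    by_doi, by_title = defaultdict(set), defaultdict(set)
    for doi, title in records:
        by_doi[doi.lower()].add(title.lower())
        by_title[title.lower()].add(doi.lower())
    shared_doi = {d: ts for d, ts in by_doi.items() if len(ts) > 1}
    multi_doi = {t: ds for t, ds in by_title.items() if len(ds) > 1}
    return shared_doi, multi_doi
```

Candidates flagged this way still require manual verification against the publisher’s site, since near-identical titles (e.g., a paper and its correction) can produce false positives.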
Discussion and limitations of this study
Our analysis and data mining of two major citation databases indicate that 23,634 unique documents related to Covid-19 were published between January 1 and June 30, 2020. Fraser et al. (2020) also indicate that thousands of preprints have been published, accounting for roughly one third of the total volume of published papers related to Covid-19 in the January 1 to June 30 period. This sheer volume of papers related to a single topic may be unprecedented. Such astonishing volumes of indexed papers and other documents related to Covid-19 in reputed databases are already placing pressure on academics to quickly publish their findings and their thoughts, and on editors and journals to rapidly release potentially medically important information that could be of value to health practitioners and policy makers alike. However, a publishing system under this much pressure may produce analytical errors (Ioannidis 2020).
The Covid-19 literature will require separate and thorough analyses to appreciate how this pandemic shaped the academic publishing sector. For example, a tiny fraction of the documents are corrections and retractions. These numbers may increase as Covid-19 papers, including preprints and papers in potentially predatory venues (Teixeira da Silva 2020a, b), are increasingly scrutinized. Results of the current research show that 0.8% of the Covid-19-related documents in WoS (101 documents) and 0.5% of those in Scopus (114 documents) were corrected or retracted within a short time after publication.
One notable aspect of the data summarized in Tables 1, 2 and 3 (see also Fig. 1) is the sometimes stark difference between Scopus and WoS, i.e., the choice of database for such analyses might bias the findings. The most important reason is that the content coverage of the two databases differs substantially; consequently, the results of bibliometric analyses may vary depending on the database used. Scopus tends to offer more comprehensive coverage of the scientific and scholarly literature than WoS, and because Scopus collects a large number of “secondary documents”, it can capture citations from documents that are not themselves indexed in Scopus (Falagas et al. 2008; Martín-Martín et al. 2018). There are also biases in each of these citation databases; for example, WoS has been criticized for favoring North American, English-language journals (Mongeon and Paul-Hus 2016). There are several reasons for the predominance of documents in English and the apparent under-representation of Chinese: Scopus covers more Chinese-language documents than WoS, few Chinese-language journals are indexed in WoS, China has its own citation index (the Chinese Science Citation Database) (Vera-Baceta et al. 2019), and most publications by authors affiliated with Chinese institutions appeared in English-language journals.
We recognize that the current study, which provides a six-month snapshot of two major databases, gives only a limited time- and database-based perspective of the published Covid-19 literature. As the pandemic tails off, the volume of papers may also begin to decline, but a thorough reanalysis will be required to ascertain this trend.
https://coronavirus.jhu.edu/map.html (Johns Hopkins University; last accessed: July 17, 2020).
https://wellcome.ac.uk/coronavirus-covid-19/open-data (Wellcome; last accessed: July 18, 2020).
Including “BMJ Clinical Research Edition”.
More than one language is assigned to some documents in both databases.
A non-exhaustive list of examples: Wang Yu from Changzhi Mental Health Center, Wang Yan from the Chinese Academy of Sciences, Wang Yun from Sichuan University, Wang Yi from Huazhong University of Science & Technology, Wang Ying from Fenyang Hospital, Wang Ying from Sun Yat Sen University, Wang Yong from Hubei University of Medical Sciences, Wang Yin from Huazhong University of Science & Technology, Wang Yang from Nanjing University, Wang Yiquan from the University of Hong Kong, and Wang Yuan from Xiameng Chan Hospital.
Chen, Q., Allot, A., & Lu, Z. (2020). Keep up with the latest coronavirus research. Nature, 579(7798), 193. https://doi.org/10.1038/d41586-020-00694-1.
Falagas, M. E., Pitsouni, E. I., Malietzis, G. A., & Pappas, G. (2008). Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses. FASEB Journal, 22(2), 338–342. https://doi.org/10.1096/fj.07-9492LSF.
Fraser, N., Brierley, L., Dey, G., Polka, J. K., Pálfy, M., & Coates, J. A. (2020). Preprinting a pandemic: The role of preprints in the COVID-19 pandemic. bioRxiv (preprint). https://doi.org/10.1101/2020.05.22.111294.
Hussain, I., & Asghar, S. (2017). A survey of author name disambiguation techniques: 2010 to 2016. The Knowledge Engineering Review, 32, e22. https://doi.org/10.1017/S0269888917000182.
Ioannidis, J. P. A. (2020). Coronavirus disease 2019: The harms of exaggerated information and non-evidence-based measures. European Journal of Clinical Investigation, 50(4), e13223. https://doi.org/10.1111/eci.13223.
Martín-Martín, A., Orduna-Malea, E., Thelwall, M., & Delgado López-Cózar, E. (2018). Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories. Journal of Informetrics, 12(4), 1160–1177. https://doi.org/10.1016/j.joi.2018.09.002.
Mongeon, P., & Paul-Hus, A. (2016). The journal coverage of Web of Science and Scopus: A comparative analysis. Scientometrics, 106(1), 213–228. https://doi.org/10.1007/s11192-015-1765-5.
Teixeira da Silva, J. A. (2020a). Chinese names in the biomedical literature: Suggested bibliometric standardization. Publishing Research Quarterly, 36(2), 254–257. https://doi.org/10.1007/s12109-020-09725-1.
Teixeira da Silva, J. A. (2020b). An alert to COVID-19 literature in predatory publishing venues. The Journal of Academic Librarianship, 46(5), 102187. https://doi.org/10.1016/j.acalib.2020.102187.
Vera-Baceta, M., Thelwall, M., & Kousha, K. (2019). Web of Science and Scopus language coverage. Scientometrics, 121(3), 1803–1813. https://doi.org/10.1007/s11192-019-03264-z.
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Teixeira da Silva, J.A., Tsigaris, P. & Erfanmanesh, M. Publishing volumes in major databases related to Covid-19. Scientometrics 126, 831–842 (2021). https://doi.org/10.1007/s11192-020-03675-3
Keywords
- Acceptance and rejection
- Correction of the literature
- Peer review
- Open access
- SARS-CoV-2 virus