This study presents a novel method to estimate the subject coverage of scholarly databases. The BOK method made it possible to rank 56 databases based on their relative and absolute coverage and to determine their level of specialization. These findings are particularly helpful as they not only quantify textual descriptions of coverage but also facilitate comparisons of subject coverage. BOK estimates have been shown to detect subjects that are not described in coverage descriptions. For example, seekers of Psychology literature might be surprised to learn about the more than 13% Psychology coverage in ERIC, a database focused on education that does not expressly mention its Psychology coverage, despite listing psychology-oriented journals among its content.
Comparability of the individual database estimates (i.e., internal validity) is rated as high because the BOK method was applied consistently and rigorously across databases. This study aimed to determine subject coverage across databases with sufficient precision to inform database selection. For a researcher, it is largely irrelevant whether a system has 77% or 82% coverage in a specific subject; what is relevant is how reliably its coverage compares with that of other databases. BOK does that particularly well.
Assessments of the external validity showed the levels of inaccuracy that should be accounted for when interpreting the estimates for multidisciplinary systems. For WOS, this deviation averaged 19.6% across all 26 subject categories, a value that is relatively small considering how little, and how imprecise, information exists on the subject coverage of most databases. For specialized systems, external validity was assessed narratively by comparing BOK estimates with textual coverage statements; overall, all six highly-specialized databases and two narrowly specialized ones were estimated plausibly. In sum, while it is important to note that BOK provides estimates that will reflect actual coverage within a margin of deviation, the estimates can be considered robust and offer plausible guidance for selecting databases.
Search advice for each major academic search type
I discuss how researchers can utilize the estimated absolute and relative subject coverage of databases. Optimal database selection will depend on the goal of the researchers. Broadly speaking, academic researchers frequently have three different search goals: lookup, exploratory, or systematic (Gusenbauer & Haddaway, 2021).
Lookup searching
Lookup searches, where researchers know exactly what they are seeking, will require databases with high total absolute coverage (Table 2) or high absolute coverage in a specific discipline (Table 4), as the likelihood of a database covering the desired records is comparatively higher.
Exploratory searching
Exploratory searches benefit from high rates of absolute coverage of one or multiple potentially relevant subjects (Table 4), as serendipitous findings are more likely in databases with a broader scope. However, if the goal is solely to explore a discipline or sub-discipline, then a database with high relative coverage (Table 5) might be the best choice. The fact that Google Scholar, as the largest database available, is used by most academics (Nicholas et al., 2017) engaged in lookup and exploratory searching indicates that users prefer comprehensiveness (Table 4) for these search types. In exploratory searching, search moves are essential; consequently, search functionalities such as citation searching or filtering will play a greater role than they would in lookup searching.
Systematic searching
In systematic searches, where the goal is to identify all records on a subject, the optimal choice between high absolute and high relative coverage is not straightforward. As keyword queries can return many irrelevant results when the subject focus is too wide, researchers need to account for this problem: they can either search specialized databases with high relative coverage (Table 5) or search multidisciplinary databases with high absolute coverage (Table 4) and limit the subject focus via subject-specific keywords or, when available, via subject filters or a controlled vocabulary. In all cases, users should search multiple databases when searching systematically (Konno & Pullin, 2020), including databases with specialized content (Table 5) (Bramer et al., 2017). Backward and forward citation searching of multidisciplinary databases with high rates of absolute coverage (Table 4) and a citation index helps to identify relevant records from a wide field of interest. Additional options are gray literature searching or hand-searching. The former can be particularly successful with larger databases that cover scholarly records of all kinds (Table 2).
The results of this study should also encourage researchers to use databases that are identified as relevant but are not yet familiar to them. Using a variety of relevant databases will increase the number of identified relevant search results, which is particularly beneficial in systematic searches (Konno & Pullin, 2020).
Beyond the subject coverage of databases, there are other questions researchers seeking optimal database selection should consider. Here is a selection of those questions:
- Does the top-ranked database cover the record type(s) I seek? For example, most databases cover journal articles, but not all. For an overview of record type coverage, see Table 2.
- Is the retrospective coverage of the database adequate? For example, if you want to know about the origins of computer science, it is not advisable to choose arXiv, a database whose retrospective coverage starts in 1991. For information on retrospective coverage, see Table 2.
- Does my institution subscribe to the database that covers most records in my discipline? What is the share of open access records in the database? Paywalls considerably limit access to databases that provide specialized records in particular. However, just because a database is openly searchable does not mean its records are openly accessible. For an overview of paywalled versus open databases and their relative open access coverage, see Table 2.
- In the case of narrow search goals: does the top-ranked database also cover the most records for the specific concept I seek? For specific search goals, only a small number of records from an entire subject might be relevant. Some databases will cover this sub-topic more comprehensively than others. Researchers can assess the situation by querying their narrow concept(s) in several of the databases that BOK estimates suggest contain the most records in the discipline (see Tables 4 or 5). To compare coverage results, the researcher must consistently apply queries with the same keywords and field codes across databases (see the sketch after this list).
- Does searching a combination of databases yield better outcomes than searching a single one? Results show which databases are most comprehensive in single subjects. Depending on search goals (lookup/exploratory/systematic searching), it will make more sense to search a single database, multiple multidisciplinary ones, or multiple specialized ones. Aggregator systems (e.g., Web of Science, ProQuest, EBSCOhost) in particular permit searching multiple specialized databases at once to balance recall and precision.
- Does the search system support the search heuristics I want to perform? For example, not all search systems allow database access via Boolean queries, citation searching, filtering, or controlled vocabularies. It is important that users assess whether databases with good coverage also provide search functionalities relevant to their search goals. An in-depth analysis of approximately half of the systems analyzed in this study can be found in Gusenbauer and Haddaway (2020).
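To make the cross-database comparison for narrow search goals concrete, here is a minimal sketch in Python. All query strings, hit counts, and coverage figures are invented placeholders; in practice, the QHCs would be read off each database's search interface after running the identical query.

```python
# Minimal sketch: comparing hypothetical query hit counts (QHCs) for one
# narrow concept across databases shortlisted via BOK estimates.
# All numbers below are invented placeholders, not measured values.

# The same query string and field code are applied manually in each
# database's search interface; the hit counts are then recorded here.
query = 'TITLE-ABS-KEY("transformational leadership")'  # hypothetical query

qhc = {  # database -> hit count for the identical query (placeholders)
    "Database A": 12_400,
    "Database B": 9_800,
    "Database C": 21_050,
}

# BOK-estimated absolute coverage of the relevant subject (placeholders)
subject_coverage = {
    "Database A": 1_200_000,
    "Database B": 950_000,
    "Database C": 3_400_000,
}

for db, hits in sorted(qhc.items(), key=lambda kv: -kv[1]):
    share = hits / subject_coverage[db]  # the concept's share of the subject
    print(f"{db}: {hits:,} hits, {share:.2%} of estimated subject coverage")
```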
Comparison of the coverage of Scopus and WOS CC
To illustrate how BOK estimates should be interpreted in light of existing assessments of subject coverage, I discuss the findings for both Scopus and WOS CC. A recent literature review comparing both systems has called them “The Titans of Bibliographic Information in Today’s Academic World” (Pranckutė, 2021). As institutions must pay substantial fees to access these paywalled systems, particular attention has recently been directed at their disciplinary coverage, among other important characteristics (e.g. Aksnes & Sivertsen, 2019; Bakkalbasi et al., 2006; Chadegani et al., 2013; Harzing, 2019; Harzing & Alakangas, 2016; Kousha & Thelwall, 2008; Martín-Martín et al., 2018a, 2018b, 2021; Meho & Yang, 2007; Mongeon & Paul-Hus, 2016; Moskaleva & Akoev, 2019; Singh et al., 2021; Vera-Baceta et al., 2019; Vieira & Gomes, 2009; Visser et al., 2021).
Harzing and Alakangas (2016, p. 788) noted that “Web of Science and Scopus provide fairly similar results,” based on a review of the literature up to 2015. More recently, Pranckutė (2021, p. 7) summarized previous findings to show “better Scopus coverage of all major disciplines when compared to WoS.” BOK estimates, assessing coverage in 2021, plausibly update both these statements and offer a more nuanced view of their coverage. Both WOS CC and Scopus probably have unique merits because (1) their ADS are almost identical (see Footnote 6), yet (2) their coverage only overlaps to a certain extent. Previous studies found that both databases have significant proportions of unique records (Martín-Martín et al., 2018b, 2021; Visser et al., 2021), a finding substantiated by the BOK results. While BOK does not look at individual records, it is capable of detecting overlap at an aggregate level. The BOK estimates are derived by using the same keywords and query settings across databases. Accordingly, if the two databases did overlap to a very large degree, the BOK estimates would show this by identifying similar keyword-based query results (QHCs) for both databases. Internal validity assessments show that BOK estimates work well in identifying whether systems access the same records. For example, Medline accessed via Ovid, WOS, and EBSCOhost was found to have very similar coverage (see Fig. 2). Accordingly, differences in BOK estimates between Scopus and WOS CC will be due to a significant share of unique records available in each database and the relative differences in disciplinary coverage of those records. Unlike sampling-based studies, BOK is, however, unable to determine the extent of the (non-)overlap.
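As an illustration of this aggregate overlap logic, the following sketch compares hypothetical QHC profiles. All keywords and counts are invented placeholders; the point is only that near-identical QHC profiles (as with Medline accessed via different vendors) suggest access to the same records, while diverging profiles suggest unique content.

```python
# Illustrative sketch (not the study's code): if two systems index largely
# the same records, identical keyword queries should return similar QHCs.
# The keyword lists and counts below are invented placeholders.

def mean_relative_difference(qhc_a: dict, qhc_b: dict) -> float:
    """Average relative difference of hit counts over shared keywords."""
    diffs = [
        abs(qhc_a[k] - qhc_b[k]) / max(qhc_a[k], qhc_b[k])
        for k in qhc_a.keys() & qhc_b.keys()
    ]
    return sum(diffs) / len(diffs)

medline_ovid = {"myocardial": 510_000, "anesthesia": 260_000, "neoplasm": 690_000}
medline_wos  = {"myocardial": 505_000, "anesthesia": 258_000, "neoplasm": 701_000}
other_db     = {"myocardial": 830_000, "anesthesia": 410_000, "neoplasm": 1_150_000}

print(mean_relative_difference(medline_ovid, medline_wos))  # small -> same records
print(mean_relative_difference(medline_ovid, other_db))     # large -> unique content
```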
How does BOK estimate Scopus versus WOS CC coverage?
In this study, the results for Scopus’ coverage are precise, as they are not derived from estimates but from direct queries based on the ASJC subject classification of Scopus. The WOS CC data, and the data from all other databases in this study, are based on BOK estimates relying on word frequencies provided by Scopus. Figure 3 compares the absolute subject coverage data for Scopus and WOS CC; a selection of the data illustrated there is given in Table 4. The comparison shows that Scopus covers only 47% as many records in Arts and Humanities and only 61% as many in Social Sciences as WOS CC does. Conversely, BOK finds Scopus’ coverage is notably superior in Physics and Astronomy (137% of WOS CC), Earth and Planetary Sciences (132% of WOS CC), Computer Science (133% of WOS CC), and Engineering (132% of WOS CC).
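For readers who want to reproduce ratios of this kind, the following sketch shows the simple calculation, using invented absolute coverage figures rather than the study's actual estimates (which are in Table 4).

```python
# Worked example of the ratios reported above, with invented absolute
# coverage figures (placeholders, not the study's estimates).

def coverage_ratio(scopus_records: int, wos_cc_records: int) -> float:
    """Scopus coverage expressed as a percentage of WOS CC coverage."""
    return 100 * scopus_records / wos_cc_records

# Hypothetical absolute subject coverage (records), for illustration only:
print(f"{coverage_ratio(3_200_000, 6_800_000):.0f}% of WOS CC")  # ~47%: Scopus lower
print(f"{coverage_ratio(8_900_000, 6_500_000):.0f}% of WOS CC")  # ~137%: Scopus higher
```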
Assuming Google Scholar’s index is the most comprehensive collection of academic literature (Gusenbauer, 2019), WOS CC and Scopus cover only a small portion of it. BOK estimates confirm that both WOS CC and Scopus tend to cover more from the Life Sciences, Physical Sciences, and Health Sciences and less from the Social Sciences and Humanities than Google Scholar does (Pranckutė, 2021; Singh et al., 2021). Figure 4 shows Scopus’ coverage compared to Google Scholar, with Social Science and Humanities subjects highlighted in red. It shows how Scopus covers relatively more from the Engineering, Computer Science, Energy, Chemical Engineering, Chemistry, Veterinary, and Neuroscience fields, whereas Social Sciences, Economics, Arts and Humanities, and Business and Management are covered to a smaller degree. A notable exception is Decision Sciences, a subject that is covered almost as well as Computer Science or Mathematics, subjects it is more closely related to than those in the Social Sciences and Humanities.
How to compare BOK results against other studies’ results
Some BOK findings confirm previous findings and some contradict them. Previous examinations are not homogeneous in their assessments of the coverage of Scopus and WOS CC (Pranckutė, 2021). The extent to which BOK estimates reflect actual subject coverage in databases (i.e., external validity) will also depend on what one considers ‘actual coverage’. Judging external validity also depends on the reference point, that is, the methodical decisions shaping a study. Those factors include the choice of subject classification system, sampling-based versus full analysis, retrospective coverage, and journal- versus document-level analysis. It is also important to consider the variations in those decisions across studies. Four factors that will contribute to different subject coverage results across studies (not within studies) are discussed below in greater detail: (1) differences in institutional subscriptions of WOS CC, (2) differences in analysis procedures of subject classification, (3) differences in subject classification systems, and (4) differences in subject attribution.
Differences in institutional subscriptions of WOS CC
The first factor that needs to be accounted for when comparing WOS CC to other databases is the issue of differences in institutional coverage. That issue necessitates always considering WOS CC coverage results in light of the unique access situation of the investigating researcher. For example, the results of Visser et al. (2021) are difficult to compare with those of this study, as their subscription starts in 1980; the subscription in this study starts considerably earlier for most indexes. The version of the WOS CC included in this study is comprehensive in that it covers almost all of the records WOS CC provides in its full version. Only some minor indexes are not included in this study’s analysis of WOS CC (see Table 2). To enhance the assessment of WOS CC, this study also includes each major sub-index of the WOS CC individually, each in its full retrospective coverage.
Differences in analysis procedures of subject classification
Second, differences occur in how a single document is determined to be attributable to a subject. BOK estimates are accurate insofar as they count each document that matches a highly-specialized keyword representing a subject. Because the likelihood of such a keyword occurring in a subject’s documents is known, each keyword’s query hit count is representative of the coverage of the entire subject, and inferences can be made about the entire database. While those inferences will not be exact, BOK estimates have the merit of drawing on every individual document in the database when attributing subject coverage.
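A minimal sketch of this extrapolation logic, with a single keyword and invented numbers; the actual BOK procedure aggregates many keywords and applies precision thresholds:

```python
# Minimal sketch of the extrapolation logic described above, using one
# keyword and invented numbers. The actual BOK estimates aggregate many
# highly-specialized keywords and apply precision thresholds.

# Known from the reference database (Scopus): the keyword occurs in a known
# fraction of the subject's records.
scopus_subject_size = 2_000_000  # records in the subject (placeholder)
scopus_keyword_hits = 4_000      # records matching the keyword (placeholder)
keyword_density = scopus_keyword_hits / scopus_subject_size  # 0.2% of the subject

# Observed in the target database: the same keyword query returns a QHC.
target_keyword_hits = 1_500      # QHC in the target database (placeholder)

# Extrapolation: if the keyword is similarly representative there, the
# subject's absolute coverage can be estimated from the QHC.
estimated_subject_coverage = target_keyword_hits / keyword_density
print(f"Estimated subject coverage: {estimated_subject_coverage:,.0f} records")
# -> 750,000 records
```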
Sampling-based approaches make assumptions based on a sample of documents from the total. A document’s subject is then determined either directly, based on the subject(s) the journal is attributed to (e.g., Harzing & Alakangas, 2016), or indirectly, where the subject attribution of a citing document is determined via the subject attribution of the seed document (e.g., Martín-Martín et al., 2021). Differences in how samples of documents are drawn, and in how directly subjects are attributed to documents, will determine the comparability and the precision of results.
Differences in subject classification systems
Third, results from this study can only be directly compared to studies that also adopt the ASJC classification system at the same level of analysis. To the best of my knowledge, no studies use ASJC to compare Scopus and the WOS CC. Every classification system will demarcate subjects differently, even when the subjects are titled the same way. Previous assessments of the subject coverage of Scopus and WOS CC used Scopus’ ASJC at the five-category level (Visser et al., 2021), Google Scholar categories at the 252- and eight-category levels (Martín-Martín et al., 2018a, 2018b, 2021), the National Science Foundation classification at the four-category level (Mongeon & Paul-Hus, 2016), the WOS classification at the five-category level (Singh et al., 2021), classifications not specified further at the four-category level (Aksnes & Sivertsen, 2019), and self-specified classifications at the five-category level (Harzing & Alakangas, 2016). As it is difficult to identify the best classification system, a multitude of different approaches might encourage better research classifications through comparison and learning (Wang & Waltman, 2016). Nevertheless, a drawback of scientometric studies using different classification systems is that comparing them will always be a vague process. For the BOK method, it did not make sense to adopt one of the previously used classification systems (see section ‘Selection of reference database and its subject classification: Why ASJC by Scopus?’).
Differences in subject attribution
Fourth, an important question is how subject coverage is determined in terms of multi- or single-attribution. Most of the studies identified that compare the subject coverage of Scopus and WOS CC assume one record is unequivocally attributable to a single subject: for example, Mongeon and Paul-Hus (2016) assume journals are categorizable into one of four disciplines, and Martín-Martín et al. (2021) assume cited documents share the same single subject the seed document was classified under in the Google Scholar categories. Nevertheless, records are often attributable to more than one subject. According to the WOS CC classification system, one record is on average attributable to 1.33 subjects; in arXiv the figure is 1.12, and in Scopus it is 1.59.
The absolute and relative subject coverage rates determined via BOK are based on fractional counting (Perianes-Rodriguez et al., 2016; Visser et al., 2021), which I refer to as single-attribution. Consequently, if a record is attributed to both Mathematics and Decision Sciences, each subject is awarded half a point, so the sum over subjects equals the sum of records. This study relied on single-attribution, as multi-attribution calculations for estimated databases would overly inflate their coverage. For example, for highly-specialized databases with excellent coverage in a single subject, multi-attribution might suggest that one subject has 100% coverage. Nevertheless, 100% coverage of a single subject is very unlikely, as there will almost always be records from other subjects in a database. The choice of single-attribution does not limit comparability across databases, as the assumption is applied equally to all databases. Overall, the logic of assigning one or multiple subjects to a single document will, however, impact the results of subject coverage assessments. The reader needs to be aware that due to single-attribution, absolute subject coverage values estimated in this study should be interpreted as indicating that a database covers at least that number of records. The actual number of records in a specific subject will likely be higher, given that most records are attributable to multiple subjects.
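A minimal sketch of fractional counting as described above; the records and subject lists are invented:

```python
# Fractional counting (single-attribution) as described above: a record's
# one point is split evenly among its attributed subjects, so subject
# totals sum to the record total.
from collections import defaultdict

records = [  # hypothetical records with their ASJC-style subject lists
    {"id": 1, "subjects": ["Mathematics", "Decision Sciences"]},
    {"id": 2, "subjects": ["Mathematics"]},
    {"id": 3, "subjects": ["Decision Sciences", "Computer Science", "Mathematics"]},
]

totals = defaultdict(float)
for rec in records:
    weight = 1 / len(rec["subjects"])  # e.g., 0.5 each for two subjects
    for subject in rec["subjects"]:
        totals[subject] += weight

print(dict(totals))          # Mathematics ~1.83, Decision Sciences ~0.83, ...
print(sum(totals.values()))  # ~3.0: one point per record
```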
Contribution of the BOK method
BOK as an additional method for coverage estimation
The BOK method has several advantages over contemporary scientometric methods in assessing database coverage. BOK is applicable across many databases (yielding high internal validity) and has a low marginal cost of updating estimates and adding new ones. Specifically, BOK can help continuously analyze databases that are relatively new and regularly updated, such as Lens or scite.
Its merits make BOK an ideal complement to existing sampling-based methodologies for assessing database coverage. Primarily, BOK estimates are an efficient way to estimate the absolute subject coverage of entire databases, information that is typically missing in sampling-based studies because they often calculate overlap values or the coverage of a specific sub-sample. Here, BOK’s external validity is considerably improved by calibrating (normalizing) absolute coverage values against the absolute database size (ADS), data that in most cases can be considered accurate. As the ADS vary greatly across databases, from close to 1 million to almost 400 million records, the absolute subject coverage estimates of BOK will reflect those differences. Accordingly, this kind of normalization ensures that estimates are within a certain absolute margin of deviation anchored at the ADS.
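One plausible form of this calibration step, sketched with invented figures: raw keyword-based subject estimates are rescaled so their sum matches the known ADS. This is a sketch under that assumption, not the study's exact procedure.

```python
# Sketch of the calibration step described above: raw keyword-based subject
# estimates are normalized so they sum to the known absolute database size
# (ADS). All figures are invented placeholders.

raw_estimates = {  # raw BOK subject estimates before calibration
    "Medicine": 5_100_000,
    "Engineering": 2_300_000,
    "Social Sciences": 1_400_000,
}
ads = 8_000_000  # absolute database size reported or queried for the database

scale = ads / sum(raw_estimates.values())  # here 8.0M / 8.8M
calibrated = {s: v * scale for s, v in raw_estimates.items()}

print(calibrated)                # estimates now anchored at the ADS
print(sum(calibrated.values()))  # ~8,000,000
```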
While BOK can provide a picture of the coverage of many databases, sampling-based studies can give a detailed view of specific databases (e.g., Martín-Martín et al., 2018a, 2018b, 2021; Visser et al., 2021; Walters, 2007). For example, the BOK method can provide information on databases with similar high subject coverage or information on databases that seem to overlap considerably (e.g., see Fig. 2: EPM, PMD, MED1-3, EMB, MET). Sampling-based methods could then analyze the overlap of databases with regard to specific types of records. In this example, analyzing the coverage of Europe PMC, PubMed, Medline (via Ovid, WOS, EBSCOhost), Embase, and the newly released/discontinued Meta would give more insights into their areas of overlap and levels of uniqueness.
The successful application of BOK in this study should also promote its application in different settings. BOK could be extended to analyze coverage of languages, authors, specific topics, and other criteria. It is also possible to investigate coverage at a sub-disciplinary level to obtain an even more granular picture of which specific subtopics are covered, or not, in specific databases. Furthermore, librarians might use the method to investigate differences between subscription packages offered by WOS, ProQuest, EBSCOhost, or Ovid. Such an investigation might, for example, reveal the different coverage options of the WOS Core Collection. Given its broad applicability, BOK could be used to compare many smaller niche databases that often remain in the shadow of larger databases that promise superior coverage. Making these systems readily comparable is a promising way to shed light on databases and systems that have been overlooked for too long.
The use of QHC as a measure of bias in search queries in general
BOK uses QHC as its underlying data collection method. If QHCs are inaccurate due to inexact keyword matching, these issues will occur for any search of the database in question. All users who access the database will find their keyword queries interpreted in some non-transparent way, so that the search results obtained are biased. Semantic search systems such as Google Scholar, Microsoft Academic, or Semantic Scholar are prominent examples of such search functionalities. Microsoft Academic noted in its FAQs: “Traditional search engines rely mostly on keyword matching. Usually, they match the keywords you type in the search field with words found in the indexed content. The accuracy of the search results depends on the quality of the keywords you type, which puts the responsibility of a successful search on the user” (Microsoft Academic, 2021). As more systems take the responsibility of articulating search goals away from users, they also introduce bias and opaque algorithms that impede transparent, reproducible data collection. QHCs reflect what is actually available to the user via queries. Other data that is not retrievable via queries, the main way of identifying records in most search systems, will probably not emerge and thus will not be accessible. For example, in several cases, official information regarding retrospective coverage is inaccurate. Manually verifying retrospective coverage for these systems via QHCs showed that many have greater or lower retrospective coverage than is reported (see Table 2).
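A sketch of how such verification can work: the same broad query is re-run with a publication-year filter for successively earlier years, and the earliest year with a non-zero QHC marks the actual start of coverage. The yearly counts below are invented placeholders.

```python
# Sketch of verifying retrospective coverage via QHCs: issue the same broad
# query with a publication-year filter for successively earlier years and
# record the hit counts. The counts below are invented placeholders.

yearly_qhc = {  # year -> hit count for a broad query filtered to that year
    1988: 0, 1989: 0, 1990: 0, 1991: 310, 1992: 2_450, 1993: 4_120,
}

covered_years = [y for y, hits in sorted(yearly_qhc.items()) if hits > 0]
print(f"Earliest year with records: {covered_years[0]}")  # e.g., 1991
```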
The QHC method can illustrate the limitations of restricted search functionality (Gusenbauer & Haddaway, 2020) and of database descriptions. Further, it can show how query results can be more accurate than official information, and, as applied in BOK, it can be used to obtain a more accurate picture of the coverage of a database. QHCs were used in this study as an efficient tool to determine not only subject coverage but also ADS, retrospective coverage, English-language coverage, and open access coverage (see Table 2).
Limitations
While the BOK method provides good and robust estimates of subject coverage, the results should be interpreted with some limitations in mind.
Language coverage
As the selected keywords are only in English, they will only identify English records. If the underlying dataset’s language composition is substantially different from that of Scopus, the estimates will be somewhat biased. Biases will also occur when a keyword is used in multiple languages or shares its name with a prominent author. Keyword-specific differences were mitigated by selecting suitable keywords, yet they cannot be fully ruled out. To alleviate language bias, this study focused on databases with a majority of English content. Some variation in language composition is acceptable if we assume that the relative subject composition of the underlying dataset is similar between English and non-English records.
Given that English acts as the lingua franca of science communication, the BOK method applies to most popular bibliographic databases researchers use today. Nevertheless, there might be benefits to searching for and including non-English records, depending on the purpose of the research. For example, in the realm of evidence synthesis, the results of quantitative syntheses can change when additional non-English sources are used (Walpole, 2019). Other research found, however, that conclusions from evidence synthesis in the health sciences remained similar for a sub-sample of all-English sources (Nussbaumer-Streit et al., 2020). Even though the effect of including non-English studies in evidence synthesis is not entirely clear, what is always true is that including non-English databases will increase the variety of evidence in literature searches (Konno & Pullin, 2020) and may thus improve outcomes. While the databases in this study already partially cover non-English records, researchers who also seek non-English records will probably need to search additional non-English databases.
Quality of underlying records
The estimations of databases’ subject coverage provided in this study do not indicate the quality of the underlying records. It is important to note that some databases focus on providing peer-reviewed, published records (e.g., Scopus or Web of Science), while others (also) include data of all types and quality (e.g., Core or BASE). Both these database types have their merits. The overall quality of records is higher in the former, while the latter might be more comprehensive and also include the gray literature important for quantitative analyses (Haddaway et al., 2015). The current research only compares databases that exclusively or at least mostly cover the various forms of scholarly records (see Table 2). That choice was made to ensure the consistency of the BOK method.
Another source of bias is the number of duplicate records and other database errors. For example, duplicate rates vary from almost zero (0.00–0.05%) for Web of Science to almost five percent (1.0–4.8%) for Google Scholar across various assessments (Haddaway et al., 2015; Orduna-Malea et al., 2017). Another study found Google Scholar’s duplication rate to be 4% and Scopus’ rate to be 2% (Moed et al., 2016). Data issues, and particularly duplicate records, are likely to be present in all databases to some degree. For example, a study found that WorldCat and other (non-)academic databases contain a number of duplicate records (Wilder & Walters, 2021). While exact duplication rates are difficult to assess, they are likely to be lower for curated databases than for crawler-based ones. Google Scholar’s pre-eminent position in terms of superior subject coverage across most subjects is not at risk even when factoring in a 5% duplicate rate. This position is all the more secure as the runners-up (BASE, Microsoft Academic, Semantic Scholar, Core, Lens) are likely to have similar duplication issues. Overall, these differences should be taken into account when selecting databases based on coverage preferences: the higher the duplicate rate, the more BOK estimates will overvalue absolute subject coverage. The relative shares of disciplinary coverage are unlikely to be particularly affected by duplicates.
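A back-of-envelope illustration of that last point, with invented figures: deflating an absolute coverage estimate by an assumed duplicate rate shows how little a sub-5% rate changes the overall picture.

```python
# Back-of-envelope adjustment for duplicates, as discussed above: absolute
# coverage estimates can be deflated by an assumed duplicate rate. The
# figures below are invented placeholders.

def deduplicated(estimate: int, duplicate_rate: float) -> float:
    """Absolute coverage estimate deflated by an assumed duplicate rate."""
    return estimate * (1 - duplicate_rate)

print(f"{deduplicated(100_000_000, 0.05):,.0f}")   # 5% duplicates -> 95,000,000
print(f"{deduplicated(100_000_000, 0.001):,.0f}")  # 0.1% duplicates -> 99,900,000
```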
Quality of ASJC classification
The liberal approach evident in Scopus with regard to assigning subjects to its records has been criticized (Mongeon & Paul-Hus, 2016; Wang & Waltman, 2016). This study not only confirms these previous findings but also shows that subjects are attributed unevenly. The subjects with the least overlap were found to be Medicine (70% unique), Dentistry (64% unique), and Veterinary (56% unique), while the greatest overlap occurred in Decision Sciences (5% unique), Materials Science (10% unique), and Chemical Engineering (10% unique). The overlap percentages raise the question of whether categories are sufficiently unique, particularly in the case of Decision Sciences and other highly overlapping categories. Conversely, larger categories might benefit from being divided. The issue of unevenly unique subject categories was also noticeable at the keyword level, where it was most difficult to find precise keywords for the categories with the most overlap. That issue could negatively affect the accuracy of the estimates for those subjects. The differences in subject overlap are addressed by using precision thresholds in calculating the BOK estimates. Overall, this limitation illustrates the need to scrutinize the qualities of subject classification systems in general and of ASJC in particular.
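For readers who want to compute such uniqueness shares themselves, a minimal sketch with invented records: a subject's uniqueness is the share of its records attributed to that subject alone.

```python
# Sketch of how per-subject uniqueness can be computed from multi-attributed
# records: the share of a subject's records attributed to it alone. The
# records below are invented placeholders.
from collections import defaultdict

records = [
    ["Medicine"], ["Medicine"], ["Medicine", "Nursing"],
    ["Decision Sciences", "Mathematics"], ["Decision Sciences", "Business"],
]

total = defaultdict(int)
unique = defaultdict(int)
for subjects in records:
    for s in subjects:
        total[s] += 1
        if len(subjects) == 1:
            unique[s] += 1

for s in total:
    print(f"{s}: {unique[s] / total[s]:.0%} unique")
# e.g., Medicine: 67% unique; Decision Sciences: 0% unique (placeholders)
```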