The central question this study aimed to address is whether OA journal articles within a particular subject area experience a citation advantage in comparison to subscription journal articles in the same subject area. As explained above, we apply three different measures of citation advantage, which translate into the following three sub-questions:
Are the normalised citation scores (NCSs) of OA journal articles higher than those of subscription journal articles?
Is the percentage of OA journal articles that receive at least one citation within 2 years after publication higher than that of subscription journal articles?
Are the percentages of OA journal articles among the most frequently cited 1%, 5%, and 10% of articles higher than those of subscription journal articles?
In addition, we investigated whether there is a relationship between whether an article is published in English and whether it is published in an OA journal. The result of this analysis informed our decision to include or exclude non-English-language publications from the rest of the analysis, depending on whether language was identified as a confounding variable.
In this "Methodology" section, we begin by describing our operationalisation of the three measures implied in these research questions. This is followed by a sub-section on our conceptualisation and operationalisation of ‘access status’. The methodology section concludes with a discussion of our data extraction methods in a sub-section that includes our decisions regarding the definition of the population and variables relevant to that definition.
Measures of citation advantage
This study improves on previous research on the citation advantage of OA journal articles in comparison to subscription journal articles by applying three measures of citation advantage, two of which have rarely been used. In the following three sub-sections, we provide more detail on each of these measures.
Normalised citation score
The first measure of citation advantage we used is based on the NCS (Waltman et al. 2011, pp. 469–470), which ‘normalises’ or corrects for both subject area and publication year. An NCS of less than one (one being the average for an article’s subject area and publication year) indicates that the article received fewer citations than expected, while an NCS of more than one indicates the opposite. Since articles are often associated with more than one subject area, each article and all the citations it receives are attributed in equal fractions to all the subject areas associated with it. We selected a 6-year citation window, which allowed the counting of citations to articles in subject areas in which it takes longer to accumulate citations, while still normalising by publication year to compensate for variation in the number of years since publication that articles could potentially be cited. Although 6 years have not yet passed for all of the articles, the normalisation ensures that articles remain comparable. Self-citations were included in the calculation of the NCS, as the complicated process of accurately identifying and removing self-citations was not deemed justified, considering evidence that excluding them has little effect on citation analysis conducted on highly aggregated levels, such as the analysis reported in this article (Aksnes 2003, p. 244; Glänzel and Thijs 2004, p. 286).
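As an illustration only, the normalisation described above can be sketched in Python. The record layout (dictionaries with `year`, `areas` and `citations` keys), the function names, and the choice to report one score per article and subject area are assumptions made for this sketch; it is not a reproduction of the procedure in Waltman et al. (2011) or of the implementation used for this study.

```python
from collections import defaultdict


def field_year_means(articles):
    """Fractionally weighted mean citations per (subject area, publication year).

    An article assigned to k subject areas contributes a weight of 1/k, and
    1/k of its citations, to each of those areas.
    """
    cites = defaultdict(float)
    weight = defaultdict(float)
    for art in articles:
        frac = 1.0 / len(art["areas"])
        for area in art["areas"]:
            key = (area, art["year"])
            cites[key] += frac * art["citations"]
            weight[key] += frac
    return {key: cites[key] / weight[key] for key in weight}


def ncs(article, area, means):
    """Citations divided by the expected citations for the area and year.

    Returns None when the expected value is zero (no article in that
    area and year has been cited), since the ratio is then undefined.
    """
    expected = means[(area, article["year"])]
    if expected == 0:
        return None
    return article["citations"] / expected
```

A score of one then corresponds to the fractionally weighted average for the subject area and publication year, in line with the interpretation given above.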
In order to test the OA citation advantage postulate, we determined whether the NCSs of OA journal articles are higher than those of subscription journal articles by consulting the effect size of the relationship between the NCS and the variable access status. For this continuous-by-dichotomous relationship, the point-biserial correlation coefficient (rpb) was used. A result of |rpb| ≥ 0.1 is interpreted as an effect size that is large enough to indicate a relationship, and |rpb| ≥ 0.3 is taken to indicate a strong relationship. A negative value indicates an OA citation advantage.
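The effect size calculation can be sketched as follows. This is a minimal illustration using the standard point-biserial formula; the function names are hypothetical, and the coding of access status (subscription journal articles as the group coded 1, OA journal articles as the group coded 0) is an assumption chosen so that a negative coefficient corresponds to higher scores for OA journal articles, matching the sign convention stated above.

```python
import math


def point_biserial(scores, is_subscription):
    """Point-biserial correlation between a continuous score and a 0/1 flag.

    Assumed coding: flag True (1) = subscription journal article,
    flag False (0) = OA journal article.
    """
    n = len(scores)
    group1 = [s for s, flag in zip(scores, is_subscription) if flag]
    group0 = [s for s, flag in zip(scores, is_subscription) if not flag]
    if not group1 or not group0:
        raise ValueError("both access status groups must be non-empty")
    mean_all = sum(scores) / n
    # Population standard deviation of all scores.
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    if sd == 0:
        raise ValueError("scores have zero variance")
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd * math.sqrt(len(group1) * len(group0) / n ** 2)


def interpret(effect_size):
    """Apply the thresholds used in this section to |effect size|."""
    size = abs(effect_size)
    if size >= 0.3:
        return "strong relationship"
    if size >= 0.1:
        return "relationship"
    return "no relationship"
```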
Citedness
The second measure of citation advantage refers to whether articles were cited at all, by individuals other than the authors, during the first 2 years after publication (i.e. over a fixed citation window of 2 years). Since this measure cannot be normalised, it is necessary to use a fixed citation window during which all citations will have been indexed. We chose a 2-year window as the longest window that still offers complete data. We refer to this measure as citedness, as opposed to uncitedness (Schwartz 1997), which has been investigated and discussed in other contexts as an indication of research with little to no impact (e.g. Garousi and Fernandes 2017; Cronin 1984, p. 32), although not without dispute (Mohammed et al. 2020, pp. 1796–1797; Van Noorden 2017). We treat citedness as an indication that an article has been formally incorporated into its broader subject area and has become visible to the research community (Mohammed et al. 2020, p. 1797). Self-citations (i.e. when a citing article and cited article have at least one author in common, with exactly the same name and surname) were excluded from this measure, as they do not reflect visibility beyond the authors.
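A minimal sketch of this measure is given below. The representation of authors as (name, surname) tuples, the function names, and the reading of the 2-year window as citing years up to and including the publication year plus two are assumptions made for illustration; the source does not specify how the window boundaries were counted.

```python
def is_self_citation(citing_authors, cited_authors):
    """True if the two author lists share at least one exactly matching
    (name, surname) pair."""
    return not set(citing_authors).isdisjoint(cited_authors)


def is_cited(cited_authors, pub_year, citations, window=2):
    """True if at least one non-self-citation falls inside the fixed window.

    `citations` is an iterable of (citing_year, citing_authors) pairs.
    Assumed window: pub_year <= citing_year <= pub_year + window.
    """
    return any(
        pub_year <= year <= pub_year + window
        and not is_self_citation(authors, cited_authors)
        for year, authors in citations
    )
```

Exact string matching, as described above, will miss self-citations where an author's name is recorded differently across articles.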
To determine whether OA journal articles experience a citation advantage according to this measure, we again consulted the effect size of the relationship between the measure and the variable access status. As this analysis is dichotomous-by-dichotomous, the phi-coefficient (φ) was used. A result of |φ| ≥ 0.1 is interpreted as an effect size that is large enough to indicate a relationship, and |φ| ≥ 0.3 is taken to indicate a strong relationship. In each case, a negative value indicates an OA citation advantage.
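For a 2×2 table of access status by citedness, φ can be computed directly from the four cell counts. The sketch below uses the standard formula; the function name and the coding (subscription journal article = 1, cited = 1) are assumptions chosen to reproduce the sign convention stated above, under which a negative value means that OA journal articles are cited more often.

```python
import math


def phi_coefficient(oa_cited, oa_uncited, sub_cited, sub_uncited):
    """Phi coefficient for a 2x2 table of access status by citedness.

    Assumed coding: access status 1 = subscription, 0 = OA;
    citedness 1 = cited, 0 = uncited.
    """
    n11, n10 = sub_cited, sub_uncited
    n01, n00 = oa_cited, oa_uncited
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return (n11 * n00 - n10 * n01) / denominator
```

With hypothetical counts of 8 cited and 2 uncited OA journal articles against 5 cited and 5 uncited subscription journal articles, the coefficient is approximately -0.31, which would count as a strong relationship under the thresholds above.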
Most frequently cited
The third measure of citation advantage that was investigated in this study is the percentage of articles among OA journal articles and subscription journal articles that, on the basis of the NCS, belong to the 1%, 5% and 10% most frequently cited articles in their subject area. This measure is often referred to as ‘percentage of publications in top 1/5/10%’, and in this article it is abbreviated as pptop1/5/10. As this measure is based on the NCS, a 6-year citation window also applies to this measure and self-citations are included. Due to the complications caused by multiple articles potentially having the same number of citations, various methods have been proposed in the literature for calculating these percentiles (Waltman and Schreiber 2013, pp. 373–376). The method chosen for this study was the one applied in the SCImago Institutions Rankings (Waltman and Schreiber 2013, p. 374), according to which all articles that have a sufficient number of citations to qualify for the 1% most frequently cited articles are included under the pptop1, and likewise for the 5% and 10% thresholds. This method generally results in the inclusion of a percentage of articles higher than X in the pptopX (e.g. more than 1% of articles are included within the 1% most frequently cited articles), but avoids reintroducing the problems associated with a skewed distribution that percentile-based indicators were initially designed to overcome. To determine whether OA articles experience a citation advantage according to this measure, we applied the same statistical procedure as we did for citedness, i.e. φ.
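The handling of ties described above can be illustrated with a short sketch. It reflects one reading of the method: the threshold is the score of the article at the rank where the top X% ends, and every article that scores at least as high is included. The function name and the use of a whole-number percentage are assumptions for this illustration.

```python
def in_top_percent(scores, top_percent):
    """Flag each score that qualifies for the top `top_percent` per cent.

    All scores tied with the threshold score are included, so the flagged
    share can exceed `top_percent`.
    """
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)
    # Ceiling of top_percent * n / 100 in integer arithmetic, at least 1.
    cutoff_rank = max(1, (top_percent * len(ranked) + 99) // 100)
    threshold = ranked[cutoff_rank - 1]
    return [score >= threshold for score in scores]
```

For ten hypothetical scores of 5, 3, 3, 3, 1, 1, 1, 1, 1 and 0, the top 20% ends at the second-ranked article, which scores 3. All four articles scoring 3 or more are therefore flagged, that is, 40% of the set rather than 20%.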
Defining access status
There are many different types of OA, and importantly, these differ in terms of citation behaviour (Archambault et al. 2014b). Conceptual clarity on how articles are classified as OA in the WoS database is also lacking. As a result, the distinction between the different types of OA is not clear-cut. We therefore decided to focus on two types of articles that have the strictest and clearest definitions. The first type, subscription journal articles, we define as those that have no OA label in the WoS metadata. They were published in subscription-only journals, and no self-archived versions of those articles are available.
The second type, OA journal articles, we define as those published in OA journals listed on the DOAJ. The WoS metadata contains five categories of OA, but these are not mutually exclusive, and articles can therefore be assigned more than one of these categories. One of these categories is Gold DOAJ, which is applied only to those articles published in DOAJ-listed journals, and, for the purpose of our study, we classified all articles in this category as OA journal articles. All articles that have one of the other OA labels, but not Gold DOAJ, were excluded on the basis that their definition is unclear, or they refer to self-archived articles. An example of the ambiguity is the Gold Other label: an article with this label may be an OA journal article published in a journal not listed in the DOAJ, or it may be a hybrid OA article. Limiting our definition of OA journal articles to Gold DOAJ led to the exclusion of 22.8% of all the articles indexed for the years investigated. Of all the articles indexed for those years, 9.2% are Gold DOAJ articles and 67.9% are subscription journal articles.
In summary, we define the variable access status as a dichotomous one, comprising two attributes: Gold OA journal articles published in DOAJ-listed journals; and subscription journal articles of which no self-archived versions are available. These two categories are henceforth referred to as ‘OA journal article’ and ‘subscription journal article’.
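The classification rule summarised above can be expressed as a short function. This is a sketch only: the function name is hypothetical, the label strings follow the two category names mentioned in this section, and representing each article's OA labels as a list of strings is an assumption about how the metadata might be held.

```python
def access_status(oa_labels):
    """Classify an article from its (possibly multiple) WoS OA labels.

    Returns 'OA journal article', 'subscription journal article', or
    None for articles excluded from the analysis.
    """
    labels = set(oa_labels)
    if not labels:
        # No OA label at all: subscription journal, no self-archived version.
        return "subscription journal article"
    if "Gold DOAJ" in labels:
        # Gold DOAJ qualifies regardless of any additional labels.
        return "OA journal article"
    # Any other OA label without Gold DOAJ is ambiguous or self-archived.
    return None
```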
Data extraction and definition of the population
The data required for this study were extracted from what was, at the time of the study, the most recently available (2018) version of the WoS metadata housed at the Centre for Research on Evaluation, Science and Technology (CREST). The population was defined as all English-language articles and reviews published between 2013 and 2015 in either OA journals or subscription journals, and, in the case of the latter, for which no OA versions were available. This definition and the aim of the study imply selection in terms of language, document type, publication year, and subject area. The decisions underlying our selection in terms of each of these variables are elaborated upon below.
Our selection of the publication years 2013 to 2015 reflects a weighing up of various considerations. First, we needed to include articles in our analysis that were published as recently as possible, in order to account (and compensate) for changes in journals’ policy on OA, and how these are managed in WoS. In short, an article published in a journal that is not classified as an OA journal at the time of that article’s publication is retrospectively classified as an OA journal article in WoS if that journal changes its policy to become an OA journal. Thus, an article classified as an OA journal article in the 2018 version of the WoS metadata may not have been OA when it was originally published and, more importantly, when it received its citations. In order to reduce the likelihood of reproducing retrospective classifications in our data set, we reduced our publication window by excluding articles published before 2013.
We only included articles published up until 2015 to ensure a sufficiently large citation window to calculate the NCS, especially considering our aim to disaggregate by subject area. Thus, to balance the requirements of having sufficiently recent articles (limiting misclassification and exclusion of articles), a sufficient number of articles per subject area, and a sufficient number of years for articles to receive citations (sufficiently long citation windows), articles published during the three-year period of 2013, 2014, and 2015 were selected and data were extracted from the most recently available (2018) version of the WoS metadata.
Various document types are indexed in the WoS database (Thomson Reuters 2018, p. 27), and document types published in academic journals differ in the number of citations they tend to receive (Tahamtan et al. 2016, p. 1200). We decided to include both research articles and review articles. Although we wanted to focus the study on empirical research articles, the conceptual distinction between these two document types in WoS is too arbitrary to be operationally useful (Basson 2019, pp. 69–70), an observation supported by the fact that various previous studies have grouped similar document types together for the purpose of citation analysis (e.g. Van Leeuwen et al. 2001). In addition, including both of these similar document types ensured that a sufficient number of articles for each subject area were included in the population for analysis.
At the time of data extraction, WoS used a subject area categorisation scheme that includes 251 subject area labels. Of special note is the subject area label ‘Multidisciplinary sciences’. While previous studies that investigated the OA citation advantage reported results on this ‘subject area’ (Dorta-González et al. 2017; Sotudeh and Horri 2009), we argue that it should be excluded in analyses that specifically investigate subject areas. Our reasoning is that one disaggregates by subject area because citation behaviour differs among subject areas. However, articles with the ‘Multidisciplinary sciences’ label are those articles published in multidisciplinary journals that include articles from various subject areas. As such, ‘Multidisciplinary sciences’ is not a subject area in itself, but rather a description of the type of journal the article is published in. When possible, WoS assigns other subject area labels to articles published in these multidisciplinary journals, e.g. based on the article’s cited references (Thomson Reuters 2018, p. 6). Due to these considerations, we exclude ‘Multidisciplinary sciences’ from our analysis of subject areas.
Lastly, our study includes the research question of whether articles published in OA journals are more likely to be written in a language other than English, that is, whether language is a confounding variable when investigating the citation advantage of OA journal articles. Previous studies have shown that English-language articles tend to receive more citations than articles published in other languages (Torres-Salinas et al. 2016, pp. 1206–1207; Van Leeuwen et al. 2001, p. 345). In addition, there is some anecdotal evidence that OA journal articles are more likely to be written in a language other than English (Björk 2011, p. e20961).
Previously, an insufficient proportion of articles indexed in WoS were assigned the relevant metadata to test whether there is a relationship between whether an article is published in an OA journal and whether it is written in a language other than English (Basson 2019). In the most recent metadata this has been rectified, and all WoS articles now have language data. In this study, we investigated the abovementioned relationship for all the articles published from 2013 to 2015, regardless of subject area. To determine whether there is a relationship between access status and language, we applied the same statistical procedure as we did for citedness to determine the effect size of the relationship. Based on the results of this analysis, which are also reported below, language was determined to be a confounding variable, and only English-language articles were included in the analysis by subject area.