As briefly alluded to in the introduction, journals are not the only actors that influence research data sharing: organisations such as research funders and research performing organisations (e.g. universities) have also developed policies that recommend or require researchers to make their research available to various degrees. While there is a wealth of literature on the broad landscape of research data sharing, we limit our focus to data sharing in the context of journal publishing, and specifically to the ways that journals have (or have not) integrated research data sharing into their publication policies. Journal research data policies have been actively studied for over a decade, but a constant obstacle to comparing results across studies has been the variance in disciplinary focus, sample selection criteria, and the ways researchers have coded these policies. There is no universally adopted standard for journals to express their policies, which is one reason why researchers have developed their own ways of making policies comparable to each other. In an attempt to summarise the diverse existing research on this topic, Table 1 presents a summary of prior studies of journal research data and editorial policies.
Table 1 Summary of prior studies of journal research data policies

As can be seen from Table 1, research on the topic has been active during the last 10 years. Journal research data policies have previously been studied in the fields of environmental sciences (Weber et al. 2010), political science (Gherghina and Katsanidou 2013), genetics (Moles 2014), social sciences (Herndon and O’Reilly 2016; Crosas et al. 2018), biomedical sciences (Vasilevsky et al. 2017) and through multidisciplinary approaches (Piwowar and Chapman 2008; Sturges et al. 2015; Blahous et al. 2016; Naughton and Kernohan 2016; Castro et al. 2017; Resnik et al. 2019). In recent multidisciplinary studies of journal research data policies, circa 50–65% of journals had a research data policy, and 20–30% of these policies were either classified as strong or mandated data sharing in a public repository (Sturges et al. 2015; Blahous et al. 2016; Naughton and Kernohan 2016). Furthermore, studies suggest that journals with high Impact Factors also have the strongest data sharing policies (Vasilevsky et al. 2017; Resnik et al. 2019).
Prior studies have presented various classification frameworks for evaluating journals’ research data policies (Piwowar and Chapman 2008; Moles 2014; Sturges et al. 2015; Herndon and O’Reilly 2016; Blahous et al. 2016; Crosas et al. 2018; Resnik et al. 2019). As an overall trend, the classification frameworks for research data policies have become more detailed and intricate over time. High-level classifications assessing the perceived strength of policies, i.e. whether policies were in general seen as strong or weak, were the first to emerge and are used in more recent studies as well (e.g. Piwowar and Chapman 2008; Blahous et al. 2016). However, a more intricate line of policy classification has emerged, with coding schemes including up to 24 variables (e.g. Stodden, Guo and Ma 2013; Moles 2014; Vasilevsky et al. 2017; Resnik et al. 2019). In the most recent studies, specific data types, such as life science data, are acknowledged as factors affecting journal data policies and included as variables in the classification schemes (Vasilevsky et al. 2017; Resnik et al. 2019).
Journal research data policies related to the fields of neuroscience, physics and operations research have previously been examined as follows. Vasilevsky et al. (2017) examined the research data policies of 318 biomedical journals and found that 21% of journals required data sharing and, in addition, 14.8% of journals addressed only the sharing of protein, proteomic, and/or genomic data. Of the biomedical journals that addressed research data in their editorial policies, the most commonly recommended methods of data sharing were public repositories (57.6%) and data hosted on the journals’ online platforms (20.7%) (Vasilevsky et al. 2017). In general, the journal research data policies of physics and mathematics (operations research is often considered a field related to applied mathematics) have previously been examined only through large multidisciplinary approaches (Resnik et al. 2019). Even though the focus of Womack’s (2015) study was not on journal research data policies per se, his findings offer insight into research data practices within the fields of physics and mathematics: although only 25% of the 50 sampled articles representing mathematics used original data, the share of available data was relatively high (31.6%). Within the field of physics, Womack (2015) observed that even though 88% of the 50 articles representing the field used data (either original or reused), the data were available for reuse in only 8% of articles. More detailed domain-specific analyses of journals in the fields of physics and operations research would be beneficial for gaining a better understanding of the current state of policies and their intricacies.
A recent paper by Jones et al. (2019) provides an overview of research data policies for all journals published by Taylor & Francis and Springer Nature. Taylor & Francis launched their data sharing policy initiative for journals at the start of 2018, offering journals five standardised policies of varying strength. By the end of the year, the uptake of their basic data policy (an encouragement to share data when possible) ranged between 71 and 83% across journals in various disciplines. Springer Nature started rolling out standardised research data policies for their approximately 2600 journals in 2016, with four types of standard statements ranging from encouragements (Types 1 and 2) to requirements (Types 3 and 4). As of November 2018, more than 1500 Springer Nature journals had adopted one of these four policies, some slightly modified to accommodate disciplinary specificity. The distribution of policies, in ascending order of policy strength, was: Type 1 (39%), Type 2 (34%), Type 3 (26%), and Type 4 (< 1%). Based on the overview of Jones, Grant and Hrynaszkiewicz (2019), whose authors are employed by one of the two publishers, there seems to be momentum towards more journals adopting data sharing policies that are expressed in a standardised way at the publisher level.
Looking beyond these two publishers and very recent developments, prior longitudinal studies confirm growth in the number of scientific journals adopting research data policies. In their longitudinal study of 170 journals representing the computational sciences, Stodden, Guo and Ma (2013) observed that the number of research data policies increased by 16% between 2012 and 2013, along with increases in code policies (30%) and supplementary materials policies (7%). Herndon and O’Reilly (2016) observed that within the period 2003–2015, the share of high Impact Factor social science journals with a research data policy increased from 10 to 39%. Furthermore, their findings also suggest that on average the policies of 2015 were more exact and demanding than their 2003 counterparts (Herndon and O’Reilly 2016, 229). Castro et al. (2017), however, found that the research data policies of a sampled set of multidisciplinary open access journals showed no signs of strengthening over the two-year interval 2015–2017. Even though the number of longitudinal studies is limited, and earlier domain-specific studies have not been replicated recently to observe change in journal policies, the observations of Stodden et al. (2013) and Herndon and O’Reilly (2016), combined with the editorial efforts reported by Jones et al. (2019), suggest that scientific journals are increasingly adopting research data policies and that these policies are becoming more demanding. Previous studies also support the notion that journals with higher Impact Factor rankings are more likely to have data policies.
The development of common frameworks for journals to express their data policies, and by extension for researchers to study them, has gradually improved over the last decade. In Table 1 we summarised how 14 previous studies approached the development and use of journal selection and coding frameworks. The first study focused directly on journal data policies, by Piwowar and Chapman (2008), relied on a relatively crude classification of 70 journals into the categories “No”, “Weak”, or “Strong”, while the most recent, by Resnik et al. (2019), incorporated an extensive 24-point framework in a study of 447 journals. For this study we incorporate a scaled-down 14-point framework capable of registering the central variables relating to the questions of what, where, and when data is to be shared, based on policies from highly cited journals in different disciplines. An analysis of the what, where, and when dimensions of research data policies has previously been carried out at a multidisciplinary level by Sturges et al. (2015). However, Sturges et al. (2015, p. 2449) focused on providing recommendations regarding the research data policies of scientific journals and presented their empirical findings at a very general level, without, for example, domain-specific differences. The present study approaches the questions of what, where, and when data is to be shared through coding categories that are mutually exclusive, which facilitates comparisons between journals representing different fields of science. The framework and data collection methodology are described more closely in the following section.