Introduction

In clinical research, spontaneous data sharing is not yet as common as it is in other fields such as genetics, astronomy or physics [1]. However, the concept of data sharing has been suggested for many reasons, including the patient-centred nature of medical research and healthcare and the expectation that knowledge from existing data should be maximized to benefit all stakeholders.

Although a transition to data sharing is a process that will take time and planning, those who adopt the principles and practices of open science will likely benefit from it [2, 3]. In addition, the emergence of data sharing as a potential requirement by some agencies and journals warrants attention by the imaging community. Indeed, from July 1st, 2018 the International Committee of Medical Journal Editors (ICMJE) will require a data sharing statement as a condition of consideration for publication of clinical trials [4].

In this article, we discuss potential advantages and disadvantages of data sharing.

From open-access to data sharing

A trend towards greater accessibility of scientific medical knowledge is already visible in the growing tendency of medical journals to offer an open-access option, in which the authors or their institutions pay an article-level fee to guarantee the immediate free availability of their papers [5].

In Table 1 we report the policies of all 18 general imaging journals on access and data sharing [6,7,8,9,10,11,12,13,14,15,16,17]. This was derived from the current Thomson Reuters list – Radiology, Nuclear Medicine, and Medical Imaging. For comparison, the 17 highest-impact general medicine journals were selected from the current Thomson Reuters list – Medicine, General and Internal [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. Among the 18 imaging journals, four are open access, 12 offer open access as an option (Radiology provides free access 12 months after publication), and two do not offer an open-access option. Among the 17 medical journals, six are open access (The Medical Journal of Australia only for research articles and case reports), eight offer open access as an option (Journal of the American Medical Association [JAMA] provides free access 6 months after publication), two do not offer an open-access option, and one (The New England Journal of Medicine [NEJM]) provides free access to research articles 6 months after publication. Thus, the open-access option is currently widely adopted by both general imaging journals (11/18) and general medicine journals (8/17).

Table 1 Policies on access and data repository or sharing by major general imaging journals and major general medicine journals

The practice of data sharing entails much more than open access. It is the regulated availability of the original participant-by-participant data obtained during a study, which may include data not yet analysed. Among the 18 general imaging journals, data sharing is not even mentioned by 12 journals, encouraged by three, mandatory only upon request in two, and requested by one. Among the 17 general medicine journals, it is not mentioned by seven journals, encouraged by six, requested by three (NEJM only for data obtained by microarray), and considered mandatory only upon request by one (Table 1). In practice, data repository or sharing is currently not mentioned in the instructions for authors of the majority of general imaging journals (14/18) and major general medicine journals (10/17). Although individual journals may not mention any policy on data sharing, some publishers such as Elsevier have their own general suggestions, which refer to Open Access [8], even though these are not immediately visible to authors when they submit a manuscript. When data sharing is encouraged, authors are informed that they should be prepared to provide original study data if requested by the editors.

In recent years, several funding bodies declared the necessity for data sharing. In 2015, the U.S. National Institutes of Health (NIH) expressed its intention to request making the digital data from NIH-funded studies publicly available [36]. Regulatory agencies, specifically the European Medicines Agency, have requested greater data sharing by companies manufacturing drugs and clinical devices. Influential organizations such as the World Health Organization and the U.S. National Academy of Medicine published reports asking for responsible sharing of data from clinical trials [37]. Also, several foundations, for instance the Alfred P. Sloan Foundation [38], the Bill and Melinda Gates Foundation [39], the Ford Foundation [40], the Gordon and Betty Moore Foundation [41], and the National Science Foundation [42], require data sharing and data management plans for all research grant proposals.

The pharmaceutical industry also plays a role in promoting data sharing. The Yale University Open Data Access (YODA) project [43] performs independent scientific review of investigators’ requests for pharmaceutical and medical data from clinical trials on devices marketed by Johnson & Johnson, including both full clinical study reports and participant-level data. Notably, the YODA project has obtained permission to make independent decisions about the release of Johnson & Johnson’s clinical trial data. This project establishes a process in which requests are judged fairly and decisions are made by an independent academic partner, a model that could be applied to other fields of medicine [44].

Another example is the Academic Research Organization Consortium for Continuing Evaluation of Scientific Studies – Cardiovascular (ACCESS CV) [45]. They propose a secure method for sharing patient-sensitive data that combines the protection of patients’ identity with the legitimate desire of the scientific community for data access and the viewpoint of the researchers who created the database. This approach consists of the following steps: (1) After publication of the primary results of a trial, researchers interested in the study data may send a request to the trial's publication committee; (2) Twenty-four months after the publication of the primary study, requests should be considered by a review group composed of members of ACCESS CV not involved in the trial, the trial principal investigator, a trial statistician, and a member of the data and safety monitoring board. This committee evaluates all proposals to approve those that are feasible, hypothesis-based, non-duplicative, and guided by investigators with technical capability and a plan for publication. The period of 24 months is chosen to secure the database and to allow the original investigators to perform their own pre-planned secondary analyses; (3) All requests and subsequent decisions will be posted on an ACCESS CV Web portal, ideally within 60 days [45].
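The review steps described above can be expressed as a small decision rule. The following Python fragment is only an illustrative sketch: the class, field, and function names are hypothetical and are not taken from ACCESS CV; only the 24-month waiting period and the approval criteria come from the description above.

```python
from dataclasses import dataclass
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the date `months` later; the day is clamped to 28 so it is always valid."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


@dataclass
class DataRequest:
    # Hypothetical flags mirroring the approval criteria listed in the text.
    feasible: bool
    hypothesis_based: bool
    non_duplicative: bool
    technically_capable: bool
    has_publication_plan: bool


def review_opens(primary_publication: date, waiting_months: int = 24) -> date:
    """First day on which the review group considers requests."""
    return add_months(primary_publication, waiting_months)


def decide(request: DataRequest, primary_publication: date, today: date) -> str:
    """Return 'pending', 'approved', or 'declined' for a single request."""
    # Requests may be sent after primary publication, but they are only
    # considered once the waiting period has elapsed.
    if today < review_opens(primary_publication):
        return "pending"
    criteria = (
        request.feasible,
        request.hypothesis_based,
        request.non_duplicative,
        request.technically_capable,
        request.has_publication_plan,
    )
    return "approved" if all(criteria) else "declined"
```

In this sketch a request submitted early simply stays pending until the waiting period ends; in practice the decision rests with the human review group, and posting requests and decisions on the Web portal is a separate step that is not modelled here.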

In the field of radiology, data sharing also means accessibility to medical images. Indeed, “Images are more than pictures, they are data” [46]. This implies access to the images produced in a given study for additional reading, interpretation, and extraction. To this end, several image repositories have been created. An example is XNAT Central [47, 48], a publicly accessible data repository based on the XNAT open-source platform, which hosts a wide variety of research imaging datasets, especially from neuroimaging, but also from oncology, orthopaedics and cardiology. Other examples are The Cancer Imaging Archive [49] and the Lung Image Database Consortium [50].

Such repositories may be very helpful in several fields, especially for image biomarker development, radiomics and machine learning, each field demanding different approaches. Moreover, the integration, standardization and analysis of these data pose a major challenge, which may be addressed using cognitive computing. An example of cognitive computing is the system developed by IBM named Watson (IBM Watson Health Imaging, Armonk, NY, USA). It strives to organize available information and present it in a contextually relevant, probability-driven manner to assist healthcare professionals objectively, whether at a reading workstation or at the point-of-care [51].

An important change is underway. Making datasets from medical research publicly available in a timely fashion requires regulations that maximize the benefits and minimize the risks [52, 53]. Indeed, data sharing has the potential to stimulate new ideas, avoid duplication of trials, and enhance transparency [36, 54,55,56,57], as well as to increase collaboration and interdisciplinary research [1, 58, 59]. At the same time, however, sharing clinical data presents some risks, burdens and challenges, such as the need to preserve the privacy of patients, to defend the legitimate economic interests of the sponsors, and to guard against invalid secondary analyses potentially undermining trust in clinical trials or otherwise harming public health [36, 37, 53, 60].

Potential benefits of data sharing

These can be subdivided into: (i) verification and advancement in knowledge; (ii) reduced cost and time for clinical research; and (iii) clinical improvement (Fig. 1).

Fig. 1 Expected pros and cons of data sharing. IPD individual patient data meta-analyses

Verification and advancement in knowledge

The first potential implication of data sharing is the verification by independent authors of the results presented in a given publication. When data are shared, they may be used by other researchers to perform alternative or supplementary analyses. This ‘second-hand’ analysis may support the initial findings, reveal errors or inconsistencies in the original research, or identify issues needing extended analysis.

In other cases, data sharing can allow elucidation of new results. New findings can emerge from hypotheses not considered by the original study team. New insights can be drawn from data that exist but were not analysed in the original publication(s). Also, investigators may be interested in analysing datasets coming from various sources to enhance precision, i.e. to perform reproducibility analyses across different databases, regarding established theories or new hypotheses. In fact, reproducibility analysis is crucial for emergent topics in radiology such as standardization of imaging biomarkers, especially from magnetic resonance imaging [61]. The availability of databases from different studies could allow this gap to be filled and could help in translating new imaging biomarkers into clinical practice [62]. In this regard, reproducibility analysis could become one of the main advantages of data sharing.

The introduction of registries of patients affected with a defined disease could be considered a primitive form of data sharing [63, 64], important not only for widespread diseases, such as cancers, but especially for rare diseases.

Another form of spontaneous data sharing is that underlying individual patient data meta-analyses [65]. Authors of an individual patient data meta-analysis typically contact the authors of each eligible study asking them to share their data, with the aim of creating a new unique individual-patient database. Of note, the power of the individual-patient data approach is higher than that of conventional (study-level) meta-analyses, which rely on complex statistical methods [66]. For instance, in a study published by Marinovich et al. [67] on the agreement between MRI and pathological breast tumour size after treatment, a total of 24 studies (1,228 patients) were eligible for inclusion, but only eight of these contributed to the individual-patient data analysis, for a total of 300 patients (about 24% of the eligible patients). Had regulated data sharing been in place, that individual-patient data meta-analysis would have included a much richer dataset. Moreover, data sharing could boost a wider adoption of health technology assessment. Indeed, in the context of a new product evaluation, data sharing may be useful at the validation level, which requires a large amount of data and images, rather than at the initial development level.

Another potential advantage of data sharing is a reduction in the publication of false studies, especially when the data are intentionally falsified. Recently, 64 articles were retracted from ten Springer journals after editorial checks found fake email addresses, and subsequent internal investigations uncovered fabricated peer-review reports [68]. This retraction came only a few months after BioMed Central had retracted 43 articles for the same reason; however, this phenomenon has involved most major publishers, including SAGE, Elsevier, Informa, and Lippincott Williams & Wilkins [69]. Data sharing might discourage data fabrication and manipulation, which are potentially more detectable in a complete database than in reported results.

Reduced cost and time for clinical research

Data sharing could potentially optimize the time and costs of clinical research by preventing the duplication of trials [70, 71]. For example, the costs of insurance coverage for patients, the purchase of materials, or the salaries of the staff responsible for data collection can be avoided. In addition, by using an existing shared database, new results could be obtained many years earlier than those derived from a new clinical study.

Clinical improvement

The final aim of data sharing may be considered to be clearer evidence on the safety and effectiveness of diagnostic procedures and therapies, and thereby better public healthcare [72,73,74]. Avoiding the loss of findings contained in the original dataset but not used for the primary publication(s) could play a role in this direction [53]. Institutions sharing their data could obtain a more comprehensive picture of the benefits and risks of a medical decision. However, a real clinical improvement from data sharing is a hypothesis that still needs to be demonstrated.

Potential drawbacks from data sharing

The sharing of clinical databases raises several concerns (see Fig. 1). One of the reasons not to share data is that researchers are evaluated competitively, based on the quality and number of articles published during their career, so they may worry that other people will use their data and efforts to produce new publications. The potential for secondary analyses contradicting initially reported results may be a deterrent. Authors may not be willing to share data that cost them great effort and resources. However, reciprocally, they would also directly benefit from using someone else’s data.

Bierer et al. [75] recently suggested formalizing ‘data authorship’ as an incentive to data sharing: “as a matter of fairness and as a matter of providing an incentive for data sharing, the persons who initially gathered the data should receive appropriate and standardized credit that can be used for academic advancement, for grant applications, and in broader situations”.

Another concern is the potential failure of patient identity protection caused by the transmission of sensitive information. Data must be de-identified: de-identification, not simply anonymization, consists of transforming a dataset so that the back identification of individuals becomes impossible or extremely difficult. Different regulations may require different degrees of de-identification, particularly in the absence of informed consent specifying the possibility of data sharing. De-identification can be achieved with different types of data transformations that must ensure patient privacy without affecting data quality [76]. However, de-identification does not eliminate all risk of re-identification. Moreover, reducing this risk to zero may destroy or significantly impair the utility of the data for subsequent analysis or verification. For these reasons, the stipulation of Data Use Agreements (DUAs) is considered a useful strategy and best practice for increasing the benefits and mitigating the risks of clinical data sharing [77]. Specifically, DUAs address important issues such as limitations on data usage, obligations to safeguard data, liability for harm arising from data usage and publication, and privacy rights that are associated with the transfer of confidential or protected data. In contrast, the U.S. Office for Human Research Protections stated that there is no need for separate consent from trial participants for the sharing of de-identified data [4].
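To make the idea of data transformations concrete, the following Python fragment sketches three common ones: removal of direct identifiers, replacement of the patient identifier with a keyed pseudonym, and generalization of age and date. It is a minimal illustration under stated assumptions, not a method prescribed by any regulation or by the cited references: all field names (name, address, phone, patient_id, age, exam_date) are hypothetical, and the exam date is assumed to be an ISO 8601 string.

```python
import hashlib
import hmac

# Hypothetical field names for direct identifiers to be dropped entirely.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Keyed hash: the same patient always gets the same code, but the
    original ID cannot be recomputed without the secret key."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '50-59'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a transformed copy of one record; the input is not modified."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["patient_id"] = pseudonym(str(out["patient_id"]), secret_key)
    out["age"] = age_band(out["age"])
    out["exam_date"] = out["exam_date"][:4]  # keep the year only
    return out
```

Clinical measurements are left untouched in this sketch, which reflects the trade-off described above: the more aggressively fields are generalized or removed, the lower the residual re-identification risk, but also the lower the value of the data for secondary analysis.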

A limitation to the adoption of data sharing can originate from technical barriers. Image conformity is influenced by vendor, modality, and acquisition parameters on the one hand, and by image post-processing manufacturer, reconstruction parameters, and software versions on the other. An example is the use in magnetic resonance imaging of arbitrary units that depend on the specific vendor and model, making a between-study comparison impossible. A way to overcome this limitation could be a drastic standardization, with manufacturers defining new shared standards.

Another intrinsic barrier to data sharing could be the poor documentation of datasets, especially if they are not documented in English. Moreover, important information about methodology might not be contained in the database or be immediately retrievable. All these issues should be considered when planning for potential data sharing of research.

To share or not to share?

In conclusion, in a world that moves towards both greater transparency and greater privacy protection, data sharing stands between these two competing interests. Not all concerns about data sharing have been resolved, and many questions remain to be addressed: Who is the rightful owner of the data? What is the role of individual patients and advocacy groups in decision making about sharing of data and images? Should Ethics Committees change their approach for study approval? And how? What is the exact role of institutions, especially public ones, that funded the original study? Should patient advocacy groups and funding organizations be involved in decision making about data sharing? These issues must be regulated.

Despite all the above-described issues relating to data sharing, a transition to a more open medical science has begun. If the benefits of data sharing are increasingly perceived as outweighing its harms, this option will prevail. Researchers and institutions who first seize this opportunity will be at the forefront of an innovation likely to favour patients and public health. Radiologists should be kept informed of this emerging issue. It is time to share!