Complexity in trial design and ambiguity about what information is available can complicate intended secondary analyses to varying degrees. While accurate description of trials and of the data being shared is an issue that can, in principle, be tackled, handling complexity in trial design is more challenging. The challenges of data reuse are further exacerbated when data from multiple studies are pooled, as in our examples. Re-analysis of data from a single trial may not always be impaired by a limited understanding of the original study: in such cases, most aspects of the study design remain the same for all participants and are thus less likely to influence the results. However, a pooled analysis is a special case of data reuse that is particularly useful for increasing sample sizes and for making comparisons that are otherwise not possible, and here even slight differences between the pooled datasets can be a source of heterogeneity. Baseline differences between cohorts, differences in which patient characteristics and outcome measures are used and how they are recorded, as well as missing values, can all affect the planned analysis. A thorough understanding of all aspects of the studies, together with the selection of appropriate statistical methods, therefore becomes imperative.
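As an illustration of the kind of check that could precede pooling, the following minimal sketch compares which variables are recorded in each trial and how often the shared variables are missing. It assumes each trial is available as a non-empty list of records; the function name `compare_trials`, the trial labels, and the variables `age`, `psa`, and `ecog` are hypothetical and are not taken from the trials discussed here.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def compare_trials(trials: Dict[str, List[Record]]) -> Dict[str, object]:
    """Summarise variable overlap and missingness across trial datasets.

    Assumes at least one trial and at least one record per trial.
    """
    # Variables that appear in at least one record of each trial.
    columns = {
        name: {key for row in rows for key in row}
        for name, rows in trials.items()
    }
    shared = set.intersection(*columns.values())
    all_vars = set.union(*columns.values())

    # Fraction of records in which each shared variable is absent or None.
    missing_rate = {
        name: {
            var: sum(row.get(var) is None for row in rows) / len(rows)
            for var in sorted(shared)
        }
        for name, rows in trials.items()
    }
    return {
        "shared": sorted(shared),
        "not_in_all_trials": sorted(all_vars - shared),
        "missing_rate": missing_rate,
    }


# Hypothetical toy data, for illustration only.
trials = {
    "trial_a": [{"age": 64, "psa": 12.1}, {"age": None, "psa": 8.4}],
    "trial_b": [{"age": 58, "ecog": 1}, {"age": 66, "ecog": 0}],
}
report = compare_trials(trials)
```

A report of this kind flags only structural differences. It does not show whether a shared variable was measured or coded in the same way in every trial, which still requires the original trial documentation.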
This added layer of complexity, introduced when data are combined across trials, requires that pool-and-reuse studies themselves be carefully designed beforehand, much like clinical trials. However, such careful design is only possible when highly detailed information about the original studies is provided. Projects such as that described by Goldacre et al. aim to link clinical trials with relevant trial documentation, such as protocols, reports, and trial forms, as well as other literature of potential interest. This is a step in the right direction toward providing researchers with the critical information needed when reusing trial datasets [7].
Nevertheless, even with the barriers that exist in the current state of trial data sharing, there is merit in reusing these often rich and heavily invested-in data. Our analysis of data combined from eight prostate cancer clinical trials, demonstrating the survival benefits of some standards of care over others [11], exemplifies the potential gains from trial data pooling and reuse. In a second example, pooling data on placebo-treated patients across many failed trials in Alzheimer’s disease allowed for the identification of three trajectories of disease progression [12], generating new hypotheses that would never have come to light in a standard trial.
It has previously been proposed that secondary users collaborate closely with primary clinical trialists, both to allow the trialists to retain ownership and control of the uses of their data and to prevent secondary users from misunderstanding trial complexities and nuances [13]. This, however, is not always easy to achieve in practice. As suggested by Ohmann et al., involvement of data generators is not a necessity, but primary data generators should have the option of being alerted about who requests access to their data and when [4]. A possible alternative is to involve clinical specialists with expertise in the relevant field of medicine, who can advise on and help inform analytical decisions. It may be worthwhile for data providers to require evidence of relevant clinician involvement in applications for access to trial data in a specific area of medicine.
A further consideration for clinical trial researchers may be that, if secondary analysis is considered an important outcome in itself, trial protocols should be designed with secondary analysis in mind. However, the primary objectives of research cannot be forgone for the sake of secondary research potential. One possible solution would be to provide, prior to sharing the actual data, synthetic datasets that preserve the essential properties of the data and thereby enable the design of pool-and-reuse studies.
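To make the idea of a synthetic dataset concrete, the sketch below shows one deliberately simple approach: each variable is resampled independently from its observed values. This is an assumption made for illustration, not a method proposed in the works cited here. It preserves each variable's marginal distribution, including its rate of missing values, in expectation, but it discards correlations between variables. Because it reuses observed values, it should not be taken as adequate for disclosure control. The function name `synthesize_marginals` and the example variables are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def synthesize_marginals(
    columns: Dict[str, Sequence], n_rows: int, seed: int = 0
) -> Dict[str, List]:
    """Draw a synthetic dataset by resampling each column independently.

    Each synthetic value is drawn with replacement from the observed values
    of the same column, so relationships between columns are not preserved.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {
        name: [rng.choice(values) for _ in range(n_rows)]
        for name, values in columns.items()
    }


# Hypothetical toy data, for illustration only; None marks a missing value.
original = {
    "age": [64, 71, 58, 66],
    "ecog": [0, 1, None, 1],
}
synthetic = synthesize_marginals(original, n_rows=5)
```

A dataset of this kind may be enough to check variable names, coding, and approximate missingness when planning an analysis. Any design decision that depends on the joint distribution of variables would need a richer generator.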
A perhaps more sustainable solution is to impose standards for clinical trial data sharing. Such standards have been proposed, and lessons can be learned from efforts to standardize electronic health record data and from recommendations made for non-commercial clinical trials. These standards may need to be further developed and enforced so that they provide detailed, complete data annotations, helping to circumvent many data-related obstacles and making shared trial data more usable.