Background

A meta-analysis aims to combine findings from different studies to obtain a more precise estimate of the average effect of an intervention or the size of an association, or to explore how and why results differ across studies [1]. There are several ways of synthesizing study data [2, 3]. Generally, a meta-analysis may combine study level data or individual participant level data. Study level meta-analyses combine estimates from multiple studies to generate a summary estimate. Individual participant data meta-analyses (IPDMA) combine the data of each individual participant from multiple studies into a single dataset for further analysis [4]. IPDMAs are considered the “gold standard” [5,6,7,8,9] and may be preferred to study level meta-analyses because they allow researchers to use the most current and comprehensive data, verify the findings of previous investigations, apply uniform definitions and analyses across studies, and avoid potential ecological bias when investigating interactions between interventions and patient-level characteristics (effect modification, subgroup effects) [7, 8, 10,11,12]. Like systematic reviews and study level meta-analyses, IPDMAs often influence practice guidelines and the design of new trials [13, 14].

Ideally, an IPDMA should be based on IPD from all studies included in a systematic review, regardless of the study designs chosen for the systematic review [15]. An IPDMA can be conducted on data from randomized trials, observational studies (including registries), and other study designs, although combining these different designs carries risks and challenges. However, fewer than half of systematic reviews with IPDMA published between 1987 and 2015 retrieved data from at least 80% of relevant studies and from at least 80% of relevant participants [16]. The number of IPDMAs increased over this period [17], but data retrieval rates remained unchanged [16, 18]. Inability to include eligible studies compromises the systematic review’s purpose, decreases statistical power, and leads to healthcare decisions based on an incomplete, potentially biased sample of the data (studies with available data may differ from those whose data are not available) [10, 19, 20]. However, analyses combining individual and study level data may mitigate these effects [3, 21].

Since the first IPDMA guide published in 1995 [7], researchers have found that the process of obtaining, managing, and organizing IPD is typically the most resource intensive and time consuming step and may require years to complete [1, 4, 7, 16, 22]. Thus, many systematic reviews rely on study level data even though sharing IPD and conducting IPDMA would be more useful [8, 20, 23,24,25,26,27,28,29,30,31].

Study participants also understand the benefits of data sharing and are generally willing for their data to be shared, but may fear loss of confidentiality, misuse, or sharing without consent [32,33,34,35]. Governments [36, 37], research organizations [38,39,40], scientific journals [38, 41,42,43,44,45,46] and the pharmaceutical industry [47, 48] have developed data sharing policies. The Institute of Medicine (IOM) has released four recommendations to guide responsible data sharing [49]: (1) maximize the benefits of clinical trials while minimizing the risks of sharing clinical trial data, (2) respect individual participants whose data are shared, (3) increase public trust in clinical trials and the sharing of trial data, and (4) conduct the sharing of clinical trial data in a fair manner. In July 2013, amid some criticism [50, 51], the European Federation of Pharmaceutical Industries and Associations (EFPIA) and the Pharmaceutical Research and Manufacturers of America (PhRMA) issued a joint statement describing the principles of responsible data sharing [47]. Several pharmaceutical companies and academic institutions are now working to handle data sharing requests in a more timely, better organized, and more transparent manner by using the services of independent data sharing platforms or creating their own [52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76].

Based on a review of the literature and our own experience with conducting IPDMAs, our goal was to provide practical guidance for researchers to successfully obtain IPD of eligible studies and to reduce resources required for IPDMA. We describe the key challenges and propose solutions to navigate obstacles commonly associated with IPDMA in the light of recent changes in data sharing policy and practice [16, 47, 77].

Methods

Search strategy and inclusion criteria

After delays during data acquisition for our recent IPDMA of the use of heparin in patients with cancer [77], we noticed changes in data sharing policy and practice [47, 78] regarding clinical trial data access and began to log our setbacks and solutions. We then conducted systematic searches of MEDLINE, Embase, and the Cochrane Library (from the inception of each database until January 2019) to identify publications describing strategies to obtain IPD or IPDMA best practice. An experienced research librarian helped design a comprehensive search strategy using MeSH terms and text words (Additional file 1), without any language restrictions.

Eligibility criteria included (1) articles describing IPDMA best practice including topics such as planning, cost, required time, common burdensome tasks, or administrative issues; (2) systematic reviews describing trends in IPDMA including topics such as IPD retrieval rates; (3) quantitative or qualitative studies describing strategies, barriers, or facilitators to obtain IPD from industry or investigator-sponsored studies; and (4) case reports describing authors’ attempts to obtain IPD. We excluded IPDMAs reporting on a specific clinical question or statistical papers, e.g. studies describing different techniques of combining IPD with study level data.

Screening

Two methodologically trained reviewers (MV and VG) independently screened titles and abstracts. If eligibility was suspected or unclear, we obtained full texts. Three reviewers (MV, MB, VG) screened full texts independently and in duplicate. Disagreements were resolved by discussion and consensus. From included articles we extracted information providing practical guidance for researchers to successfully obtain IPD and to make the conduct of IPDMA more efficient. Our scoping review adheres to the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines [79].

Additional sources

Several publications examining specific data sharing issues outside the context of an IPDMA (e.g. data sharing models or author reimbursement in general) did not meet the inclusion criteria for the scoping review but were referenced to provide additional context. In addition, we searched the websites of pharmaceutical companies which have publicly certified with PhRMA or EFPIA as having complied with the Principles for Responsible Data Sharing [47], data repositories [52, 76, 80], and industry organizations [81, 82] for press releases and other information about policies for sharing IPD. Finally, we drew on the authors’ experiences in providing, seeking, or using IPD, in particular a recently conducted IPDMA investigating heparin use among cancer patients [77]. Based on the systematically identified literature, policy websites, and our own experience, we developed practical guidance for IPDMA researchers, structured according to the sequence of tasks when conducting an IPDMA.

Fig. 1 PRISMA flow diagram

Table 1 Included articles with direct relevance to guide researchers in the conduct of IPDMA
Table 2 Summary recommendations for obtaining individual participant data

Results

The systematic search of our scoping review yielded 3470 titles and abstracts (Fig. 1). We identified 16 eligible articles that are presented in Table 1 together with a short description. In Table 2 we summarize our main recommendations for researchers when retrieving data sets for IPDMAs and provide corresponding explanations and elaborations in the following sections.

Identifying relevant studies

A sensitive search for all eligible studies, published and unpublished, is crucial for all systematic reviews to minimize publication bias [135]. Cochrane provides useful techniques to identify and obtain published as well as unpublished study data [15, 134]. Trial registries or regulatory bodies may be instrumental in identifying unpublished eligible studies and in establishing an initial contact point (e.g. corresponding author or data sharing administrator) for data sharing requests. See Additional file 1 for detailed information about the International Clinical Trials Registry Platform, the United States Food and Drug Administration, and the European Medicines Agency. In principle, there are two approaches to obtain IPD: (1) direct contact with study authors, or (2) requests via a data repository [131].

The data collection process for our own IPDMA occurred between October 2012 and June 2016 [77]. We placed all data requests by contacting study authors directly, except for two of the 19 studies; for these, we learned by reviewing each organization’s data sharing policies that requests had to be submitted through the online data request portal clinicalstudydatarequest.com (CSDR). For all studies, we requested access to the clinical trial data, meta-data, study protocol, annotated case report forms, and clinical study report.

Requesting study data through personal contact

Analysis of data sharing requests submitted solely through study authors indicates that 58% of requests are successful [129]. Qualitative research examining useful techniques to obtain unpublished data indicates that concise, friendly requests which minimize additional responsibilities for the primary study author (e.g. drafting a data sharing agreement, converting old datasets to digital format) and attempt to establish a personal connection are more likely to receive a response [136]. IPDMA authors typically attempt contact several times before giving up; the most persistent tried every 6 months for 2 to 3 years [130, 131, 136, 137]. In our own experience, obtaining data sets through personal contact required as little as 4 months and as much as 4 years. Every corresponding author or study sponsor responded to our request, but we made repeated contact attempts via email, fax, or phone. In some cases, we reviewed the institution’s data sharing request policy to identify additional data sharing contacts (e.g. an organizational email address such as datasharing@Amgen.com) or alternative request procedures (e.g. submitting a request through an independent data repository such as clinicalstudydatarequest.com). A description of our approach to correspondence and a sample email request are available in Table 3 and Additional file 1, respectively. Email correspondence is often fragmented and delayed. Organizing phone or in-person meetings, e.g. at conferences, was often useful for explaining the IPDMA’s purpose and anticipated tasks to study authors before any data were shared, and whenever detailed discussions of complicated issues (e.g. security of data storage servers) were necessary. These conversations also led to personal relationships with study authors which, we felt, eased correspondence throughout the data sharing and analysis process.

Table 3 Approach to email correspondence

Primary authors may lack the time, funding, or organizational resources to support essential data sharing tasks (e.g. transferring data to an electronic format, drafting a data sharing agreement). Our IPDMA research team offered assistance with these tasks whenever possible. Recording the contact information and roles of data sharing stakeholders (e.g. administrators, statisticians, industry liaisons, ethical and legal representatives) is essential. This eased subsequent communication, which often occurred years after the first data request as the IPDMA progressed to publication.

Requesting study data via data repository or data sharing administrator

In our IPDMA, two datasets were requested and approved through CSDR, a consortium of clinical study sponsors and funders which facilitates responsible data sharing [138]. IPDMA authors may be required to contact a data repository or data sharing administrator directly and submit a full study proposal rather than make a simple inquiry [139]. Initially, we reviewed the list of studies whose data were available on request, but neither dataset had been listed by the study’s sponsor. For one study, the sponsor had not yet properly curated the data; we nevertheless contacted CSDR via email, followed by a teleconference, and the process was expedited at our request. For the second, the study sponsor was in the process of establishing a presence on CSDR and shared data after doing so.

In our experience, the process of submitting data requests on CSDR takes approximately 30 to 60 min; it was intuitive, and directions were available [78, 140]. Our request package identified the specific study by the title and National Clinical Trial number and included our study protocol, timeline, funding sources, description of research team members’ experience and roles, conflicts of interest, and publication plans. Knowledge of jurisdictional laws (e.g. Personal Information Protection and Electronic Documents Act and General Data Protection Regulation) and collaboration with legal representatives was required before submitting data sharing requests and while negotiating data sharing agreements. Approximately 4 months were needed to process each data sharing request and finalize the data sharing agreement, consistent with CSDR estimates [120, 141]. After finalizing the data sharing agreement, our questions pertaining to data sharing processes or system technical difficulties were typically responded to within 1 day.

As of December 31, 2019, 1429 requests had been made on CSDR for data which were not listed by the studies’ sponsors; 559 submissions were approved and 843 denied, while 51 were still under consideration [142]. Of the companies which have received at least 40 requests for non-listed studies, the lowest reported percentage of approval is 9% (Eisai) and the highest 74% (GlaxoSmithKline) [142]. Geifman et al. reported the data request process via CSDR to be unnecessarily lengthy, while requests submitted through Project Data Sphere, an alternative data sharing platform devoted to cancer-related clinical trials [76], required only days before data access was provided [143].

The joint PhRMA and EFPIA statement represents the minimum clinical transparency standard, but participation is voluntary [47, 144, 145]. Industry sponsors which are members of PhRMA or EFPIA are more likely to publicize a data sharing policy and make trial data eligible for sharing [146, 147]. For pharmaceutical companies publicly certifying compliance with the Principles for Responsible Clinical Trial Data Sharing through the PhRMA or EFPIA websites [83, 84], the data access points, a summary of the data made available, and the date from which the pharmaceutical company’s IPD sharing policy applies are shown in Table 4 and Table 5. Certified pharmaceutical companies with data procedures that could not be confirmed through additional internet searching are not included. Each sponsor’s specific policy should be consulted for a complete review of available data. A sponsor’s exclusion from Table 4 or Table 5 does not mean it is not committed to data sharing, only that as of March 5, 2019, certification of its compliance with the Principles for Responsible Clinical Trial Data Sharing could not be confirmed through the PhRMA or EFPIA websites [83, 84]. Repositories may also provide access to study data which are sponsored, generated or stored by governments, universities, charities and research organizations [52, 80].

Table 4 Data availability of pharmaceutical companies displaying certification via PhRMA or EFPIA websites which solicit data requests via online data sharing platform [47, 83, 84]
Table 5 Pharmaceutical companies displaying certification via PhRMA or EFPIA websites, which solicit data requests through email [47, 83, 84]

Examining the data sharing procedures of certified pharmaceutical companies, 26 use at least one internal or external online portal to manage data sharing requests, including clinicalstudydatarequest.com (12), vivli.org (11), yoda.yale.edu (1), fasttrack-bms.force.com (1), https://biogen-dt-external.pharmacm.com/DT/Home (1) and https://www.purduepharma.com/healthcare-professionals/clinical-trials/#request-trial-data (1). Data requests for the remaining certified pharmaceutical companies are solicited via email. In Table 6 we describe the data request review processes of each pharmaceutical company certified through PhRMA or EFPIA. As of January 31, 2020, 3123 studies were available on request through CSDR [142]. Vivli, an independent non-profit data-sharing and analytics platform, lists over 4900 studies [148]. Pharmaceutical companies with data procedures that could not be confirmed through internet searching are not included in Table 6.

Table 6 Data request review process of pharmaceutical companies displaying certification via PhRMA or EFPIA websites which solicit data requests via online data sharing platform [47, 83, 84]

Incentives for data contributors

Study authors and data curators who generate, manage, and share data, and who provide commentary on findings, make considerable efforts that should be recognized. Given their role in data collection and interpretation, we offered authorship or acknowledgement on relevant publications to corresponding authors and to individuals the corresponding author deemed worthy of authorship or acknowledgement. Researchers generally agree that trialists who share data deserve recognition and propose several methods, including direct financial payments, publication incentives, consideration of previous data sharing practices by funding agencies, consideration by academic institutions in decisions regarding career promotion, or the possibility of penalties for large organizations refusing to share data, such as fines or suspension of a product’s market authorization [27, 136, 149,150,151,152,153,154,155,156]. Authorship also enables primary researchers to contribute to the manuscript before publication and reduces anxiety about a lack of control over data and about fellow researchers’ ability to understand shared data or IPDMA results [153, 154].

There are several administrative, standardization, human resources, and opportunity costs to properly maintaining a data repository, managing requests, and preparing data for additional analysis, which IPDMA authors may be asked to contribute to [157,158,159,160,161]. Academic researchers are expected to pay between $30,000 and $50,000 annually to list up to 20 studies on CSDR [162]. Vivli asks researchers and pharmaceutical companies to pay between $2000 and $4500 per listed study [163]. We obtained funding to offer reimbursement of minor expenses associated with data sharing (e.g. shipping fees for datasets which corresponding authors preferred not to send electronically) but did not offer direct payment for the time required to prepare study data, negotiate data sharing agreements, or respond to analytical questions. Funding for these tasks was also not requested by any of the collaborating parties. Offering a small financial incentive (100 Canadian dollars) to primary study authors did not improve IPD retrieval rates [137].

Setting up a data sharing agreement

Data sharing agreements describe the conditions which the IPDMA research team must respect in exchange for permission to analyze specified data from a trialist or study sponsor, and are recommended when sharing data [49, 164, 165]. They describe the study rationale, analysis plan, contents being exchanged, participant confidentiality, timing of data sharing, data storage and security measures, third-party data sharing, intellectual property rights, and publication plans and authorship, among other items. We adapted previous data sharing agreements to suit the institutional policies of the respective study sponsors. Eight of the 14 eligible studies used data sharing agreements, while for the remaining six the data providers did not feel one was necessary. However, we do recommend their use. We sought feedback from our institution’s industry liaison department regarding the legal phrasing and implications of the data sharing agreement. Additional file 1 presents an example data sharing agreement with further details. We had to negotiate amendments to ratified agreements if institutional policies changed, if there were data sharing issues affecting agreements with others, or when we conducted additional analyses.

Time to data retrieval and reasons for refused requests

Two of our data sharing requests were not granted (one because of ongoing analyses and the other because it could not be transferred to a shareable electronic format) and three could not be pursued because of timeline and resource restrictions. This meant that we were unable to obtain data for 18% of participants (n = 1763) [77]. Contacting trial authors, negotiating data sharing agreements and awaiting publication of study results are common reasons for delays. Approximately 43% of IPDMAs obtain at least 80% of IPD [16]. The IOM recommends that sponsors make available the “full data package” to external researchers no later than 18 months after trial completion and the “post-publication data package” no more than 6 months after trial completion [49]. In practice, the time until IPD become available after trial completion varies greatly. This availability is influenced by when primary results are published and when a drug’s development program is terminated or approved by regulators, among other factors [52].

Data which are commonly unavailable include commercially confidential information (information not in the public domain which may undermine the legitimate economic interests of the company [166]), and study data which were not submitted as part of a marketing authorization package [52]. Sponsors may require that secondary analysis investigate the same indication as the primary analysis because study participants have not provided consent for other investigations. Many sponsors have recognized this impediment and changed their participant consent forms accordingly [52]. Systematic reviews have identified several other technical, motivational, economic, political, legal and ethical barriers to data sharing such as inclusion of data from grey literature, increased costs due to use of commercial data sharing platforms, and advancing data anonymization standards [16, 160, 167, 168].

Authors’ motivations for accepting or rejecting data sharing requests include advancing science, improving healthcare, complying with employer, funder, or sponsor policies, participant privacy, perceived effort, and personal recognition [20, 25, 49, 153, 154, 167,168,169,170,171,172,173]. Some argued that older trials require excessive time and resources to properly anonymize IPD, update databases to current standards, or transfer data to an electronic format, assuming the data have not been lost [16, 137]. Sharing of databases may be refused because datasets are too large to properly anonymize and transfer to other researchers [52, 174]. In such cases, IPDMA researchers may request only relevant variables rather than entire raw datasets; the resulting datasets are smaller and make it harder to combine multiple variables to identify a study participant.

If a request is denied, IPDMA researchers may combine IPD with study level data to examine the potential impact of studies without IPD on results and to understand the totality of the evidence [3, 19, 175,176,177].
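Combining IPD with study level data is often done in two stages: each study contributes an effect estimate and its variance (computed from the IPD where available, taken from the publication otherwise), and the estimates are then pooled. As a minimal illustration, the sketch below pools hypothetical log hazard ratios with a DerSimonian-Laird random-effects model; all numbers are invented for the example and are not taken from our IPDMA.

```python
import numpy as np

def pool_random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooling of effect estimates."""
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                # inverse-variance (fixed-effect) weights
    y_fixed = np.sum(w * y) / np.sum(w)        # fixed-effect pooled estimate
    q = np.sum(w * (y - y_fixed) ** 2)         # Cochran's Q heterogeneity statistic
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)              # between-study variance, truncated at 0
    w_star = 1.0 / (v + tau2)                  # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, tau2

# Hypothetical studies 1-2: estimates derived from the IPD itself;
# hypothetical studies 3-4: published aggregate estimates only.
log_hr = [-0.22, -0.11, -0.35, 0.05]
var = [0.010, 0.020, 0.030, 0.025]
pooled, se, tau2 = pool_random_effects(log_hr, var)
print(f"pooled log HR = {pooled:.3f} (SE {se:.3f}), tau^2 = {tau2:.4f}")
```

In this two-stage form, studies that refuse IPD sharing still contribute to the summary estimate, and the analysis can be repeated with and without them to examine their influence.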

Managing retrieved IPD

Reviewing supplemental material and readying datasets is a time-consuming and resource-intensive task [159]. Older datasets generally require additional work as they are not digitally recorded or coded to current standards. For our IPDMA, we reviewed the study protocol, publications, clinical study reports, annotated case report forms, and other shared files, before and alongside data extraction, to understand each dataset and ensure accuracy. Annotated case report forms are particularly helpful in understanding shared data because they connect each variable in a dataset to when, why, where, and how the data were collected. We logged inconsistencies and typically resolved them through discussion with study stakeholders (e.g. trial coordinators). Important inconsistencies should be described in publications following the Preferred Reporting Items for Systematic Review and Meta-Analyses of Individual Participant Data (PRISMA-IPD) statement [178].

We created a unified database that was verified by two researchers. Our data sharing agreements required that shared data be deleted within 6 months of publication of results, which requires careful planning of all analyses.
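Building such a unified database means mapping each study's own variable names and codings onto a common data dictionary before stacking the datasets. The sketch below shows one minimal way to do this with pandas; the study names, variable mappings, and values are hypothetical and chosen only to illustrate the pattern.

```python
import pandas as pd

# Hypothetical per-study variable mappings: each shared dataset uses its
# own coding, so a data dictionary maps study-specific column names onto
# a common set of harmonized variables before the datasets are stacked.
VARIABLE_MAPS = {
    "trial_A": {"pt_id": "participant_id", "rx": "treatment", "vte": "outcome"},
    "trial_B": {"SUBJID": "participant_id", "ARM": "treatment", "EVENT": "outcome"},
}

def harmonize(study_id: str, df: pd.DataFrame) -> pd.DataFrame:
    mapping = VARIABLE_MAPS[study_id]
    out = df.rename(columns=mapping)[list(mapping.values())].copy()
    # Keep study membership so two-stage or stratified analyses remain possible.
    out.insert(0, "study_id", study_id)
    return out

trial_a = pd.DataFrame({"pt_id": [1, 2], "rx": ["heparin", "control"], "vte": [0, 1]})
trial_b = pd.DataFrame({"SUBJID": [101], "ARM": ["heparin"], "EVENT": [0]})

unified = pd.concat(
    [harmonize("trial_A", trial_a), harmonize("trial_B", trial_b)],
    ignore_index=True,
)
print(unified)
```

Keeping the study identifier in the unified table is the design choice that matters: it preserves the clustering of participants within studies, which any sound IPDMA analysis must respect.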

In our own IPDMA, access to one study required use of the SAS Clinical Trial Data Transparency (CTDT) portal and approval from the institutional review board and trial sponsors [179, 180]. A manual is provided to assist researchers using the CTDT portal, but training is needed if researchers are unfamiliar with statistical analysis programs [180,181,182,183]. A dedicated support team is available to resolve technical issues. Analysis of data accessed through the SAS CTDT portal may require IPDMA researchers to temporarily upload remaining data to this platform. The consent of clinical trial study sponsors not using the SAS CTDT system may be required before doing so. Conversely, IPDMA researchers may also try to negotiate the download of data typically securely accessed through the CTDT system. For further review of methodology and statistical issues for IPDMA see Debray et al. 2015 [176].

Confidentiality and data storage

In our IPDMA, we deleted information from the databases that identified study participants (e.g. names or phone numbers) because storing personal information is not in the interest of study participants. Indeed, the general public and study participants worry about the storage or sharing of personally identifying information, obtaining appropriate consent to use data, and relationships with the study investigators [26, 34, 184]. IPDMA researchers must be aware of local laws and sponsor policies regarding the storage of personally identifying information [16]. Concerns about lack of anonymity are also common when requesting data from case studies or case series involving fewer than 50 participants, trials of rare diseases, or trials assessing genomic data [52]. Thus, all data must be stored on secure, password-protected servers, with access provided only to those directly involved in data analysis, according to available standards [52, 185,186,187].
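Deleting direct identifiers on receipt of a dataset can be made a routine, scripted step rather than a manual one. The sketch below shows one way to drop such columns with pandas; the identifier list and column names are hypothetical, as the variables actually present differ from study to study.

```python
import pandas as pd

# Hypothetical direct identifiers that should not be retained once the
# analysis dataset is built; the exact variable names differ per study.
DIRECT_IDENTIFIERS = ["name", "phone", "address", "email", "health_card_no"]

def strip_identifiers(df: pd.DataFrame) -> pd.DataFrame:
    """Drop any direct-identifier columns present in the shared dataset."""
    present = [c for c in DIRECT_IDENTIFIERS if c in df.columns]
    return df.drop(columns=present)

shared = pd.DataFrame({
    "participant_id": [1, 2],
    "name": ["A. Smith", "B. Jones"],
    "phone": ["555-0100", "555-0101"],
    "treatment": ["heparin", "control"],
})
analysis_ready = strip_identifiers(shared)
print(list(analysis_ready.columns))
```

Running such a script immediately after data receipt, before the data reach the shared analysis server, keeps personal information out of downstream storage entirely.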

Discussion

We conducted a scoping review of challenges and solutions to obtaining and using IPD and supplemented this with descriptions of our own experiences to guide and facilitate future IPDMA. Many of the practical issues we identified are new compared to the Cochrane IPDMA working group’s guide published by Stewart and Clarke in 1995 [7]. Technological and cultural changes have modified the ways in which researchers communicate and collaborate and the ways data are shared, managed and analyzed. Recent guidance on the use and appraisal of IPDMAs [188, 189], reporting standards [178], data sharing [49], and statistical techniques [176] have influenced these policies.

Our IPDMA identified 19 eligible studies and 10,032 eligible participants which is above the median of typical IPDMAs (i.e. 14 eligible studies and 2369 participants) [16]. Unexpected delays throughout the data gathering process resulted from challenges in communication and the need to adapt to modifications in the various sponsors’ data sharing practices, which were evolving alongside industry and government policy. Some of these changes included the joint PhRMA/EFPIA statement on the principles of responsible clinical trial data sharing [47], launch of the AllTrials campaign [190], GlaxoSmithKline introducing the first online data request platform before transitioning to CSDR in 2014 [191], and influential publications highlighting the importance of data sharing and open science [192,193,194].

Limitations and strengths

We did not plan this manuscript before starting the IPDMA that we use as a primary example in this work; rather, the many challenges we encountered encouraged us to provide guidance. Thus, our solutions are based on firsthand experience, have not been formally compared to alternatives, and may not be applicable to all IPDMA. Our perspective is that of IPDMA researchers and not of trialists, sponsors, or data sharing administrators, who may disagree with our proposals. Other IPDMA or study stakeholders may identify additional obstacles or solutions not described here, although we conducted a scoping review to mitigate this limitation.

Relation to other studies

We identified several publications which aimed to provide a firsthand description of specific data sharing experiences [16, 23, 143, 195,196,197]. For example, Savage and Vickers obtained only one of 10 requested studies and established contact with only five of 10 corresponding authors [196]. Data from the remaining four studies were not shared because preparation was too laborious, the data were forbidden from being shared, or an extensive proposal submission was required [196]. Jaspers and Degraeuwe described their attempt to conduct an IPDMA, which was eventually abandoned because they were able to obtain only 40% of the IPD. The barriers to accessing data were similar to those we describe here and included difficulties establishing contact with study authors and denial of requests for raw datasets because of ongoing analyses or a lack of time and personnel to properly prepare the data. Geifman et al. and Filippon et al. reported costly and repeated data sharing requests [143, 197]. Nevitt et al. performed a systematic review of IPDMAs published between 1987 and 2015, and reported that only 25% of published IPDMAs had access to all identified IPD, with no improvement in data retrieval rates over time [16]. IPDMAs were more likely to retrieve at least 80% of IPD if they included only randomized trials, had an authorship policy which provided an incentive to share data (e.g. co-authorship), included fewer eligible participants, and were not Cochrane Reviews.

Conclusions

As shifts in data sharing policy and practice continue, and the number of IPDMAs pursued increases, IPDMA researchers must be prepared to mitigate the effects of project delays. Knowledge of how to establish and maintain contact with study stakeholders, negotiate data sharing agreements, and manage clinical study data is required. Broader issues, including designing trials for secondary analysis, participant confidentiality, data sharing models, data sharing platforms, data request review panels, and recognition of primary study investigators, must also be understood to ensure an IPDMA is conducted to appropriate scientific, ethical, and legal standards [128, 198,199,200,201,202,203,204,205,206]. We hope that a shift away from peer-to-peer requesting procedures towards data repository requests will help [207]. The discussion of specific data sharing issues, such as the effectiveness of data sharing policies [208], the output of data sharing endeavours [209], confidentiality of commercial information, with whom data are shared, timelines for data requests, and appropriate compensation of data sharing parties, must continue [26, 27, 49, 200, 210,211,212]. Additional research is needed into the effectiveness of data acquisition techniques [133], platform features which aid the sharing of clinical trial data [213,214,215], incentives for data sharing [171, 208], broad participant consent for data sharing [216], and the decision to pursue an IPDMA versus a study level MA.