Background

Randomized controlled trials (RCTs) are considered the “gold standard” for assessing the clinical efficacy of interventions. However, the high cost and limited efficiency of classical RCTs [1, 2] have exposed the need for more efficient designs. Adaptive design, characterized by its flexibility and efficiency, allows for timely decision making based on accumulating trial data [3, 4], such as stopping trials early [5], allocating more participants to better-performing arms [6], or dropping inefficient arms [7]. The advantages of adaptive design in reducing research time [8, 9], reducing sample sizes [10, 11], and improving success rates [12, 13] have prompted many researchers to incorporate it into the new drug development process.

Reviews [14,15,16,17] have specifically focused on the application and reporting of adaptive trials. One review [14], which included adaptive trials other than phase I and seamless phase I/II trials, found that seamless phase II/III trials were the most frequently used type and that many researchers had failed to adequately report data monitoring committees (DMCs) and blinded interim analyses. Another literature survey [17], covering phase II, phase III, and phase II/III adaptive trials in oncology, found that adaptive design was commonly applied in phase III trials and that the reporting of adaptive design-related methods was inadequate. A review [15] summarizing the features of 60 adaptive trials of specific methodology types showed that descriptions of statistical methods were poor. A systematic review [16] assessing the reporting compliance of group sequential RCTs against the Consolidated Standards of Reporting Trials (CONSORT) 2010 checklist revealed that the protocols containing design details were often inaccessible. However, these studies had important limitations for addressing the application and reporting of adaptive trials. First, the included adaptive trials were restricted to specific clinical phases and disease areas. Second, the studies focused on identifying deficiencies in specific aspects of interest (e.g., statistical methods). Third, none of the studies focused on drug trials. Thus, their findings were not comprehensive and may not be generalizable to other adaptive design types.

The Adaptive designs CONSORT Extension (ACE) statement, a reporting guideline for adaptive trials, was developed in 2020 to advise clinical researchers on how to report the details of adaptive designs [18]; it is also considered a valid tool for evaluating the reporting quality of adaptive trials. Our study aimed to retrieve adaptive drug RCTs across all phases and disease areas in order to systematically investigate the overall application of adaptive design to drug RCTs, comprehensively identify gaps in reporting, and determine the extent to which adaptive design information was reported before the publication of the ACE checklist, thereby providing evidence to guide improvements and advocate for adequate reporting in the future.

Materials and methods

Eligibility criteria

We selected studies according to the following criteria: (1) RCTs explicitly stated to be adaptive trials or applying any type of adaptive design; (2) RCTs assessing the efficacy or safety of drugs; and (3) RCTs published in English-language journals. We excluded: (1) re-published studies; (2) protocols, abstracts, or re-analyses of adaptive trials; and (3) incomplete trials.

Search strategy and screening

We searched EMBASE, MEDLINE, the Cochrane Central Register of Controlled Trials (CENTRAL), and ClinicalTrials.gov from inception to January 2020. We used both subject headings and free-text terms related to adaptive clinical trials to identify relevant studies (see Appendix 1 for the search strategy).

Data extraction

We generated a data extraction table to record the following information: first author, publication year, journal (quartile 1 as defined by Journal Citation Reports [JCR], or others), reasons for utilizing adaptive designs, trial center (multicenter, single-center), whether the trial was international, clinical phase, adaptive design type, disease area, type of control (active, non-active, both), type of primary outcome, expected sample size, randomized sample size, and funding source (government, private for-profit, private not-for-profit, not funded, or unclear).

We extracted the primary outcome according to the following strategy: (1) if a trial specified one or more primary outcomes, we selected it (or the first one listed) as the primary outcome; (2) if a trial did not specify a primary outcome, we selected the first outcome reported in the results. We then classified the selected primary outcomes into two types: clinical outcomes (clinically meaningful endpoints that directly measure how a patient feels, functions, or survives) or surrogate endpoints (laboratory measures or physical signs intended to substitute for clinically meaningful endpoints) [19].
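To illustrate, this selection rule can be written as a short R function (a minimal sketch; the function and argument names are hypothetical and not part of our extraction form):

# Sketch of the primary outcome selection rule.
# 'specified' holds outcomes named as primary by the trial (possibly none);
# 'reported' holds outcomes in the order they appear in the results.
select_primary_outcome <- function(specified, reported) {
  if (length(specified) > 0) {
    specified[1]  # use the (first) pre-specified primary outcome
  } else {
    reported[1]   # otherwise fall back to the first outcome in the results
  }
}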

Based on the literature [12,13,14], we classified adaptive designs into 10 types: group sequential, adaptive dose-finding, adaptive randomization, sample size re-estimation, adaptive hypothesis, biomarker adaptive, seamless, pick the winner/drop the loser, adaptive treatment-switching, and multiple adaptive designs. We identified and extracted the adaptive design types as planned, regardless of whether they were implemented, thereby avoiding the omission of any type.

Reporting quality assessment

The ACE checklist, a CONSORT extension specific to adaptive trials, provides essential reporting requirements to enhance transparency and improve reporting. Hence, we assessed the reporting quality of the included studies using the ACE checklist. First, we evaluated the adaptive RCTs’ compliance with the 26 topics of the ACE checklist. Second, we assessed seven essential items (new items) specific to adaptive trials in the ACE checklist, nine items modified relative to the CONSORT 2010 checklist, and six items with text expanded for adaptive designs. The response to each topic/item could be “yes”, “no”, or “not applicable”, indicating compliance with ACE, non-compliance, or inapplicability, respectively. Based on previous literature, we defined adherence proportions ≤ 30% as underreporting [20,21,22]. Given the complexity of adaptive designs, we chose a strict threshold of ≥ 80% adherence to define good reporting [23, 24]. To quantify the reporting quality, we used a scoring strategy for every topic, assigning 1 point to “yes” or “not applicable” and 0 points to “no” [3, 4, 25], yielding a total score between 0 and 26.
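As an illustration, the scoring strategy and adherence thresholds can be expressed in a few lines of R (a minimal sketch; the function names are ours and not part of the checklist):

# Score one trial's responses to the 26 topics:
# "yes" or "not applicable" = 1 point, "no" = 0 points (total 0-26).
score_trial <- function(responses) {
  stopifnot(length(responses) == 26)
  sum(responses %in% c("yes", "not applicable"))
}

# Classify a topic's adherence proportion against our thresholds.
classify_adherence <- function(p) {
  if (p <= 0.30) "underreported" else if (p >= 0.80) "good" else "intermediate"
}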

Study process

Two pairs of methodologically trained researchers screened abstracts and full texts for eligibility and then independently extracted data from eligible trials using predesigned standardized forms with detailed instructions. Additionally, two researchers trained in the ACE checklist independently assessed the reporting quality of the included studies. Any disagreements were resolved through discussion or consultation with a third researcher.

Statistical analysis

We used R (4.2.0) for statistical descriptions and analyses. We summarized epidemiological characteristics and reporting adherence on the basis of the extracted data. We reported frequencies with proportions for categorical data and means with standard deviations (SD) or medians with first and third quartiles for continuous data. To identify potential differences, we compared the characteristics and reporting adherence of trials published in quartile 1 (Q1) of the JCR with those of other trials, using chi-square or Fisher’s exact tests for categorical data and Student’s t-tests (if data were normally distributed and variances were homogeneous) or Wilcoxon rank-sum tests for continuous data.
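For transparency, the test-selection logic can be sketched in R as follows (assuming a hypothetical data frame dat with columns q1 for JCR Q1 membership, phase for clinical phase, and expected_n for expected sample size; in practice, normality should be checked within each group):

# Categorical data: chi-square test, or Fisher's exact test when
# expected cell counts are small.
tab <- table(dat$q1, dat$phase)
if (any(suppressWarnings(chisq.test(tab))$expected < 5)) {
  fisher.test(tab)
} else {
  chisq.test(tab)
}

# Continuous data: Student's t-test if normality and homogeneity of
# variance hold; otherwise the Wilcoxon rank-sum test.
normal   <- shapiro.test(dat$expected_n)$p.value > 0.05
homosced <- var.test(expected_n ~ q1, data = dat)$p.value > 0.05
if (normal && homosced) {
  t.test(expected_n ~ q1, data = dat, var.equal = TRUE)
} else {
  wilcox.test(expected_n ~ q1, data = dat)
}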

To explore factors associated with the overall reporting scores, we developed univariable and multivariable linear regression models with four factors: publication year (as a continuous variable), trial center type (1 for “multicenter”, 0 for “single-center”), type of outcome (1 for “clinical outcome”, 0 for “surrogate endpoint”), and funding source (1 for “private for-profit”, 0 for “others”). Our aim was to determine whether more recently published trials or multicenter trials had better reporting quality, possibly owing to improved understanding and stringent quality control measures. Additionally, we sought to explore whether the type of outcome and the funding source influenced the conduct and reporting of adaptive trials. We tested the basic assumptions of the models: whether the residuals followed a normal distribution and whether collinearity existed among the factors (variance inflation factor [VIF] > 10). The significance level was set at α = 0.05.
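A minimal R sketch of these models and assumption checks follows (the data frame and variable names are illustrative; vif() comes from the car package):

library(car)  # provides vif() for collinearity diagnostics

# Univariable model, e.g., for publication year:
uni <- lm(score ~ year, data = dat)
summary(uni)

# Multivariable model with all four factors:
multi <- lm(score ~ year + multicenter + clinical_outcome + for_profit,
            data = dat)
summary(multi)

# Assumption checks: normality of residuals and collinearity (VIF > 10).
shapiro.test(residuals(multi))
vif(multi)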

Results

Literature screening results

Our search yielded 4891 records published in English. After removing duplicates, we screened the titles and abstracts of 3597 records against the eligibility criteria. We then assessed the eligibility of 341 selected records by reading their full texts. Finally, we included 108 clinical trials from 107 records (one record reported two trials) in our survey (Fig. S1).

Epidemiological characteristics of included studies

The use of adaptive design has shown an increasing trend over the years (Fig. S2). Group sequential (n = 63, 58.3%), adaptive randomization (n = 26, 24.1%), adaptive dose-finding (n = 24, 22.2%), sample size re-estimation (n = 17, 15.7%), and adaptive hypothesis (n = 16, 14.8%) designs were the types most commonly planned in adaptive trials. In addition, 52 trials (48.1%) planned to apply multiple types of adaptive design. Adaptive designs were mostly used to accelerate trials and facilitate decision-making (n = 24, 22.2%), maximize the benefit to participants (n = 21, 19.4%), and reduce total sample sizes (n = 15, 13.9%). We quantified the sample size reductions by calculating the difference between the expected and randomized sample sizes (Fig. 1). This difference ranged from −4829 to 319; 51 differences (47.2%) were less than 0 (i.e., the total sample size was reduced) and 25 (23.1%) were greater than 0.

Fig. 1

Difference between the expected and randomized sample sizes of the included trialsᵃ. ᵃOnly one difference (−4829) was less than −550; it is not shown in the figure because of its extreme value

Of these trials, 68 (63.0%) were multicenter, while only 23 (21.3%) were international. Phase II trials (n = 45, 41.7%) were the most common among the adaptive trials, followed by phase III (n = 14, 13.0%) and phase I (n = 13, 12.0%) trials. The main disease area was oncology (n = 28, 25.9%). The most frequently used types of control were non-active (n = 52, 48.1%) and active (n = 49, 45.4%) controls. In 64 trials (59.3%), clinical outcomes were selected as primary outcomes; in the remaining 44 (40.7%), surrogate endpoints were selected. The medians of the expected and randomized sample sizes were 162 (first, third quartiles: 86, 400) and 124 (69, 290), respectively. Most trials (n = 66, 61.1%) were funded by private for-profit institutions, 29 (26.9%) by governments, and 27 (25.0%) by private not-for-profit institutions (Table 1).

Table 1 Characteristics of included studies

Adaptive trials published in JCR Q1 journals included more international trials than others (28.2% vs. 8.1%, p = 0.03). The distribution of clinical phases differed between trials published in JCR Q1 journals and others (p = 0.01). Fewer JCR Q1 trials considered multiple adaptive designs than others (49.3% vs. 73.0%, p = 0.03). The expected and randomized sample sizes of JCR Q1 trials were larger than those of others (median, 457 vs. 188, p < 0.01; 141 vs. 86, p < 0.01, respectively). The proportion of trials with governmental support was higher among JCR Q1 trials than among others (35.2% vs. 10.8%, p = 0.01). Differences in other characteristics were not statistically significant.

Adherence to the ACE checklist

Overall, the adherence rates of the included trials across the 26 topics of the ACE checklist ranged from 7.4% to 99.1%. Eight topics (30.8%) were reported adequately (adherence proportion ≥ 80%). “Interpretation” was the most adequately reported topic (n = 107, 99.1%), followed by “harms” (n = 103, 95.4%) and “numbers analyzed” (n = 100, 92.6%). Eight topics (30.8%) had poor adherence proportions (≤ 30%), including “SAP and other relevant documents” (n = 8, 7.4%), “blinding” (n = 12, 11.1%), “generalizability” (n = 17, 15.7%), “outcomes and estimation” (n = 23, 21.3%), “implementation” (n = 24, 22.2%), “baseline data” (n = 25, 23.1%), “protocol” (n = 25, 23.1%), and “sequence generation” (n = 29, 26.9%) (Table 2 and Fig. S3). A lower proportion of JCR Q1 trials than other trials adhered to the “baseline data” topic (14.1% vs. 40.5%, p < 0.01).

Table 2 Adherence to ACE checklist

Regarding items specific to adaptive designs, only one of the seven essential items (new items) was adequately reported; its adherence rate was higher among JCR Q1 trials than among others (98.6% vs. 86.5%, p = 0.03). Three new items were underreported: 8 trials (7.4%) reported the SAP and other relevant documents, 14 (13.0%) mentioned measures to safeguard the confidentiality of interim information and minimize potential operational bias during the trial, and 25 (23.1%) described assessments of similarity between interim stages. We found a statistically significant difference between JCR Q1 trials and others for the similarity assessments (14.1% vs. 40.5%, p < 0.01). Of the remaining items, two targeted the reporting of interim results and of the adaptive decisions made; for both, adherence rates were lower among JCR Q1 trials than among others (item 17c, 28.2% vs. 51.4%, p = 0.03; item 14c, 53.5% vs. 78.4%, p = 0.02) (Table 3).

Table 3 Adherence to adaptive design-specific items in ACE checklist

Of the nine modified items, three were adequately reported and none were underreported. Of the six items with expanded text, three were reported adequately, and the one concerning generalizability was reported poorly (n = 17, 15.7%) (Table 3). We found no statistically significant differences in either the modified or the expanded items between JCR Q1 trials and others.

Scores for the ACE checklist and potential factors associated with reporting quality

Based on our scoring strategy, the mean ACE checklist score of the 108 adaptive trials was 13.9 (SD, 3.5) out of 26: 13.9 (SD, 3.5) for JCR Q1 trials and 14.0 (SD, 3.5) for other trials (p = 0.84). Both the univariable and multivariable regression analyses demonstrated that more recently published trials and multicenter trials were associated with better reporting than other trials (Table 4). We found no association between reporting quality and either the type of outcome or the funding source.

Table 4 Univariable and multivariable analyses for reporting score

Discussion

Main findings and interpretations

We comprehensively identified the available adaptive drug RCTs and showed that the use of adaptive designs has been increasing. Adaptive designs have been applied mostly to accelerate trials and facilitate decision-making. They have commonly been used in phase II and oncology trials, with group sequential design being the most popular type. Adherence to the ACE checklist varied across the 26 topics: we found adequate reporting for eight topics and poor reporting for eight others. Moreover, we found discrepancies in adherence among the new, modified, and expanded items, which are specific to adaptive designs in contrast to the CONSORT 2010 checklist. Through univariable and multivariable analyses, we explored potential influencing factors and found that more recently published trials and multicenter trials were associated with better reporting.

Our findings are partially consistent with those of other reviews of adaptive trials [14,15,16,17, 26]. We found that adaptive designs have commonly been applied to phase II trials, whereas a previous review [17] that included phase II, phase III, and phase II/III RCTs in oncology found that adaptive designs were common in phase III trials. This discrepancy could be attributed to differing search strategies and inclusion criteria. Common application in oncology and the frequent use of group sequential design were consistent findings in both our study and other reviews [14, 17]. The poor reporting identified in other studies [14,15,16,17] was limited to data monitoring, methodology, and protocol accessibility. We also identified these deficiencies, which are explicitly included as items or topics in the ACE checklist, such as the “measures for confidentiality” item and the “blinding” and “protocol” topics.

We also identified inadequate reporting of other important items, especially those specific to adaptive designs. First, we consider the poor reporting of the “similarity assessment” item in the “baseline data” topic a crucial matter. Baseline data may vary owing to time drift, leading to dissimilarities between interim stages. This can affect analyses between and within interim stages [27, 28], ultimately compromising the validity of a trial’s results. Second, the “outcomes and estimation” topic, which pertains to the reporting of interim results, was insufficiently reported. Such omissions do not support interim decision-making: unreasonable or unplanned adjustments may be made if actual decisions run contrary to what the interim results indicate, resulting in inflated type I error and incorrect conclusions [25]. Third, the reporting of the “SAP and other relevant documents” topic, newly added to the ACE checklist, was unsatisfactory. Supporting documents can provide detailed information on adaptive designs, including the design type, statistical methods, and pre-planned decision-making criteria [29, 30], all of which increase the transparency and credibility of adaptive trials. All of the above issues are critical for adaptive trials and should be taken seriously when reporting.

We identified additional general deficiencies that are also prevalent in traditional trials, such as failure to report allocation concealment and implementation. Moreover, we found that more recently published trials and multicenter trials were associated with more adequate reporting than other trials. This may reflect the widespread use of adaptive designs and the increasing emphasis on the importance of adequate reporting [31]. The rigorous quality control in multicenter trials likely played a significant role in this improvement [32].

Strengths and limitations

Our study has several strengths. First, we included adaptive randomized trials of all phases and comprehensively characterized them. Second, we thoroughly assessed the reporting quality of the adaptive trials using the new, modified, and expanded items of the ACE checklist, which are specifically tailored to adaptive trials. Third, we implemented a rigorous study process, which included developing inclusion and exclusion criteria, screening the literature, extracting data, and assessing reporting quality.

We are aware of the limitations of our review. First, our literature retrieval included only trials that explicitly claimed to be adaptive or to use certain types of adaptive design; we could not include trials with similar design features that did not explicitly claim an adaptive design. It is therefore possible that we missed some adaptive trials owing to the limitations of our search strategy. Second, many topics contain multiple items, and our overall adherence rates for such topics do not accurately reflect adherence to each individual item. Finally, we searched the literature only up to January 2020, to coincide with the publication of the ACE checklist. Hence, our results on reporting quality expose only the gaps in adaptive trials prior to the publication of the ACE checklist, highlighting areas for further improvement.

Suggestions for reporting of drug adaptive randomized trials

Flexibility is a major strength of adaptive designs, but it heightens the need for rigorous reporting of both pre-planned and actual changes in adaptive trials. Our results indicate that the reporting of adaptive randomized drug trials is frequently inadequate, especially for essential items such as SAP accessibility, confidentiality measures, and assessments of similarity between interim stages. This inadequate reporting may create ambiguity about the planned modifications and the reasoning behind actual decisions, ultimately undermining the credibility of the findings of adaptive drug trials.

Future adaptive trials should adhere to the ACE checklist to ensure that all pertinent details are reported, particularly the items essential to the adaptive design. Journals should consider requiring authors to follow the ACE checklist when reporting the design, analysis, and results of adaptive trials.

Conclusion

The use of adaptive designs has increased, primarily in early-phase drug trials. Group sequential design is the most frequently applied type, followed by adaptive randomization and adaptive dose-finding designs. However, the reporting quality of adaptive trials is suboptimal, especially for essential items. Our findings suggest that clinical researchers need to provide adequate details of adaptive designs and adhere strictly to the ACE checklist. Journals should consider requiring such information for adaptive trials.