Introduction

The drivers of protocol complexity are constantly evolving in step with the strategies that dominate drug development at any given time. In the 1980s, for example, the pursuit of blockbuster therapies expanded the number of assessments conducted, clinical investigators engaged, and patients enrolled in later-stage, phase III clinical trial designs [1]. In the 1990s, cost containment measures and growing interest in cycle time reduction prompted clinical teams to increase their use of contract research organizations (CROs) and to engage larger numbers of private sector, community-based investigative sites [2, 3]. During that same decade, protocol designs began capturing more endpoints and conducting even more assessments, most notably in phase II, in an effort to inform, and even avoid, the transition into more expensive phase III clinical trials [4].

Pressure to reach treatment-naïve patient communities, identify less expensive though well-trained investigators, and support simultaneous international submissions drove more globally oriented protocol designs in the early 2000s [5, 6]. During this decade, regulatory agency interest in quality by design principles and in improving risk evaluation and mitigation drove growth in the number of safety procedures and the volume of data collected in phase I and II protocol designs [7, 8, 9].

Between 2010 and 2020, sponsor companies, in pursuit of more flexible and efficient clinical trials, piloted and implemented more novel designs, including adaptive and master protocols [10, 11, 12]. During this period, the proportion of programs in the global drug development pipeline targeting rare diseases and narrowly defined patient subpopulations increased dramatically, supported by rapid growth in the volume of biomarker and genetic data collected per protocol [13]. The end of this decade also saw heightened interest in collecting real-world data and patient health information to supplement, and even replace, data collected during the clinical trial [14].

Since the 1980s, nearly all protocol design changes, both scientific (e.g., number of endpoints, eligibility criteria, and procedures performed) and executional (e.g., number of countries and investigative sites), have been additive. In the past two decades, research conducted periodically by the Tufts Center for the Study of Drug Development (Tufts CSDD) in collaboration with several dozen pharmaceutical companies has found that benchmarked protocol designs have yet to show a downward trend in any design element [15, 16].

Tufts CSDD research also demonstrates that as protocol designs become larger in scope and more demanding, clinical trial performance worsens. Protocols with a higher relative number of endpoints, eligibility criteria and procedures are associated with lower physician referral rates; increased procedure administration burden; diminished study volunteer willingness to participate; lower patient recruitment and retention rates; lower dose adherence; increased data volume; and a higher incidence of protocol deviations and substantial amendments. Ultimately, these outcomes contribute to higher failure rates, longer clinical trial cycle times, poorer data quality and greater drug development study and program costs [17, 18].

Early in the current decade (2020–2030), the rapid deployment and adoption of decentralized clinical trial (DCT) approaches have already been recognized as an important and defining new drug development strategy. Virtual and remote approaches include the use of telemedicine; wearable devices; mobile applications; procedures performed at more convenient locations by visiting study staff; and investigational drugs delivered directly to the study volunteer's home. The shift to decentralized clinical trials has been facilitated largely by the COVID-19 pandemic and by heightened interest in reaching, engaging, and enrolling more demographically diverse study volunteers [19].

Empirical data characterizing the impact of DCTs on protocol design have yet to be collected. This paper presents the results of a study benchmarking protocol design practice just before the onset of the global COVID-19 pandemic. As such, it provides an important baseline for comparing, and drawing insights on ways to optimize, protocol designs developed and executed during and after the pandemic. This paper also presents comparisons between two primary subgroups: oncology vs. non-oncology and rare disease vs. non-rare disease protocols. These subgroups are the most active areas in the drug development pipeline, and they are the benchmarks most frequently requested by sponsor companies.

Methods

Clinical and clinical operations professionals from 20 major and mid-sized pharmaceutical companies and CROs—Amgen, AstraZeneca, Biogen, Boehringer-Ingelheim, Bristol-Myers Squibb, CSL Behring, Eli Lilly, EMD Serono, GlaxoSmithKline, Janssen, Merck, Novartis, Otsuka, Parexel, Pfizer, Roche, Sanofi, Takeda, UCB, Veristat—provided protocol design and performance data.

Each company was asked to select protocols representative of its current portfolio of clinical trial activity and to include protocols from each of three phases (i.e., Phase I, Phase II, and Phase III). The convenience sampling frame included only those protocols that had received final protocol approval between January 2013 and December 2018 and had a primary completion date or database lock date prior to December 31, 2019. CROs participating in the study gathered protocol data specifically from client companies other than the sponsor companies represented in the working group. On average, each participating company submitted data characterizing 11 protocols.

The data collection process used in this study is consistent with the methodology that Tufts CSDD has been using since 2008 to evaluate protocol design practices and their impact. The results of these studies have been published extensively. In each of these studies, design variables typically gathered include the number and type of endpoints, number of eligibility criteria, number of distinct and total procedures performed, number of countries and investigative sites where the protocol was conducted, and number of planned study volunteer visits per month.

Clinical trial performance and quality variables typically gathered by Tufts CSDD include clinical trial milestone durations and recruitment and retention rates. Performance and quality variable definitions are as follows (a worked computational sketch appears after the list):

  • Study Initiation Duration—days from Protocol Approval to First Patient First Visit (FPFV);

  • Enrollment Duration—days from First Patient First Visit (FPFV) to Last Patient First Visit (LPFV);

  • Treatment Duration—days from Last Patient First Visit (LPFV) to Last Patient Last Visit (LPLV);

  • Study Close-out Duration—days from Last Patient Last Visit (LPLV) to Database Lock (DBL);

  • Total Clinical Trial Duration—days from Protocol Approval to Database Lock (DBL);

  • Patient Randomization Rate—the ratio of the number of patients enrolled to the total number screened;

  • Patient Completion Rate—the ratio of the number of patients completing the clinical trial to the total number enrolled.
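To make these definitions concrete, the following minimal sketch computes each duration and rate from a single protocol's milestone dates and patient counts. It is illustrative only (written in Python rather than the SAS used for the study's analysis), and the record's field names and values are hypothetical.

    from datetime import date

    # Hypothetical milestone record for one protocol; the study's actual
    # data model is not published, so these field names and values are assumed.
    protocol = {
        "protocol_approval": date(2016, 3, 1),
        "fpfv": date(2016, 9, 15),   # First Patient First Visit
        "lpfv": date(2017, 11, 30),  # Last Patient First Visit
        "lplv": date(2018, 10, 31),  # Last Patient Last Visit
        "dbl": date(2019, 2, 28),    # Database Lock
        "screened": 480,
        "enrolled": 310,
        "completed": 255,
    }

    def days(start: date, end: date) -> int:
        """Days elapsed between two clinical trial milestones."""
        return (end - start).days

    study_initiation = days(protocol["protocol_approval"], protocol["fpfv"])
    enrollment       = days(protocol["fpfv"], protocol["lpfv"])
    treatment        = days(protocol["lpfv"], protocol["lplv"])
    close_out        = days(protocol["lplv"], protocol["dbl"])
    total_duration   = days(protocol["protocol_approval"], protocol["dbl"])

    randomization_rate = protocol["enrolled"] / protocol["screened"]
    completion_rate    = protocol["completed"] / protocol["enrolled"]

    print(f"Total clinical trial duration: {total_duration} days")
    print(f"Randomization rate: {randomization_rate:.1%}")
    print(f"Completion rate: {completion_rate:.1%}")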

Participating companies also classified each protocol procedure according to the endpoint that it supported as defined by the clinical study report and the study’s statistical analysis plan. ‘Core’ procedures supported primary and key secondary efficacy and safety endpoints. ‘Non-Core’ procedures supported supplemental secondary, tertiary and exploratory safety, efficacy or other endpoints and objectives.

The analysis dataset excluded master protocols and adaptive designs in order to focus only on traditional protocol design practices. Given the smaller sample sizes by individual phase, we combined data for Phase II and III protocols for comparisons by therapeutic area, by oncology vs. non-oncology, and by rare disease vs. non-rare disease. Descriptive statistics, including means and coefficients of variation, were calculated; the latter measure indicates the consistency of experience across participating companies. Protocol data were stored in an Excel file saved on a secure, shared, online drive. The analysis was conducted in SAS 9.4.
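The coefficient of variation referenced above is the sample standard deviation expressed as a percentage of the mean. The study's analysis was run in SAS 9.4; the sketch below, using made-up endpoint counts, is a minimal Python illustration of the calculation.

    from statistics import mean, stdev

    # Hypothetical per-protocol endpoint counts across benchmarked protocols.
    endpoint_counts = [12, 18, 25, 9, 31, 20, 14, 22]

    m = mean(endpoint_counts)
    cv = stdev(endpoint_counts) / m * 100  # coefficient of variation, %

    # A high CV signals widely varied design practice across protocols and
    # companies; a low CV signals consistent experience around the mean.
    print(f"mean = {m:.1f} endpoints, CV = {cv:.1f}%")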

Results

In all, 187 protocols were analyzed. Table 1 presents characteristics of the analysis dataset. It contains similar numbers of Phase II (72) and Phase III (67) protocols, with somewhat fewer Phase I (48) protocols. Slightly more than a quarter of the protocols (27.3%) targeted oncology indications, and roughly one in six (17.7%) targeted rare disease indications.

Table 1 Data characteristics

Table 2 provides means for several scientific design characteristics by phase. Generally, these characteristics are lowest for Phase I protocols. Phase II protocols have the highest mean number of endpoints (20.7). Phase III protocols have the highest mean numbers of distinct procedures (34.5), total procedures (266.0), and total protocol pages (115.9). The mean number of datapoints collected per protocol shows a strong progression by phase, from 330,420 in Phase I to 2,091,577 in Phase II and 3,453,133 in Phase III. The coefficients of variation around the mean scientific design characteristics are generally very high, most notably for total procedures performed, the proportion of procedures that are non-core, total case report form pages, and total datapoints collected. Significant correlations were observed between the number of endpoints and the number of eligibility criteria (p < 0.01) and between the number of endpoints and the total number of datapoints collected (p < 0.05).
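The paper does not state which correlation statistic underlies these p-values; assuming a Pearson product-moment correlation, a minimal sketch of such a test follows, using simulated per-protocol counts (139 protocols, matching the combined Phase II/III sample) deliberately constructed to correlate for illustration.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(7)

    # Simulated design counts for 139 protocols; eligibility criteria are
    # built to co-vary with endpoints, purely for illustration.
    endpoints = rng.poisson(lam=18, size=139)
    eligibility = endpoints * 0.8 + rng.normal(10, 4, size=139)

    r, p = pearsonr(endpoints, eligibility)
    print(f"r = {r:.2f}, p = {p:.4g}")  # expect p < 0.01 for these inputs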

Table 2 Scientific design characteristics by phase

The means for scientific design characteristics (Phase II and III combined) are presented for oncology vs. non-oncology protocols in Table 3. The means and coefficients of variation for many design characteristics are comparable between oncology and non-oncology protocols, including the mean number of eligibility criteria (29.8 and 31.0), the mean number of distinct procedures (33.3 and 34.3), the average proportion of procedures that are non-core (24.1% and 24.9%), and the mean number of total datapoints collected (2.6 million and 2.7 million). The mean number of total procedures performed was substantially higher for oncology than for non-oncology protocols, at 315 and 243, respectively. Non-oncology protocols have a higher mean number of endpoints (21.4 vs. 15.3 for oncology protocols). No significant relationships were observed among the number of endpoints, the number of eligibility criteria, and the total number of datapoints collected in oncology protocols.

Table 3 Scientific design characteristics by TA and indication (Phase II and III only)

Table 3 also shows notable differences between protocols targeting rare vs. non-rare diseases, with the latter having a much higher mean total number of endpoints (12.9 for rare disease and 21.2 for non-rare disease protocols); average proportion of non-core procedures (14.0% for rare disease and 26.4% for non-rare disease protocols); and mean total number of datapoints collected (1.6 million for rare disease and 2.9 million for non-rare disease protocols). Non-rare disease protocols collect nearly twice as much data as rare disease protocols.

Rare disease protocols have a higher mean number of distinct procedures (38.1 for rare disease and 33.3 for non-rare disease protocols), mean total number of procedures performed (301.6 for rare disease and 255.6 for non-rare disease protocols), and mean number of case report form pages (244.0 for rare disease and 158.7 for non-rare disease protocols). A significant correlation was observed between the number of endpoints and the total number of datapoints collected (p < 0.01) in rare disease protocols.

Means for executional design characteristics per protocol, by phase, are presented in Table 4. These characteristics include the mean total number of countries, the mean total number of planned visits, and the mean total numbers of patients screened and enrolled. The typical Phase III protocol, for example, has more than double the average number of countries and investigative sites of the typical Phase II protocol. Very high coefficients of variation are observed around the mean values for most executional variables, in particular the mean number of investigative sites and the numbers of patients screened, enrolled, and completing clinical trials by phase.

Table 4 Executional design characteristics by phase

Table 5 presents the executional design characteristics by oncology and rare disease subgroups. With few exceptions, oncology protocols have higher mean executional variable values than do non-oncology protocols, including the average numbers of countries, investigative sites, and planned visits. Exceptions include the mean number of vendors (4.4 for oncology and 5.8 for non-oncology protocols), the mean number of procedures per visit (11.9 for oncology and 14.4 for non-oncology protocols), and the mean number of patients completing clinical trials (244.9 for oncology and 291.1 for non-oncology protocols). Among oncology protocols, the coefficient of variation around the mean number of patients completing the clinical trial is very high, indicating widely varied experiences between studies and sponsors.

Table 5 Executional design characteristics by TA and indication (Phase II and III only)

Many mean values for executional design characteristics are similar between rare disease and non-rare disease protocols. Exceptions include the mean number of investigative sites and the mean numbers of patients screened, enrolled, and completing clinical trials, for which the benchmark values for non-rare disease protocols are considerably higher. The mean number of planned visits and mean days for follow-up are higher for rare disease than for non-rare disease protocols. The coefficients of variation for both rare disease and non-rare disease protocols are generally very high, in particular those associated with patient recruitment and retention.

Tables 6 and 7 contain benchmarks for select protocol performance outcomes. In Table 6, mean performance outcomes are shown per protocol by phase. The mean treatment duration for a Phase III protocol is 2.2 times longer than that of the typical Phase I protocol and 1.3 times longer than that of the typical Phase II protocol. The average total clinical trial duration, from protocol approval to database lock, for a Phase III protocol is approximately 1,328 days.

Table 6 Select protocol performance outcomes by phase
Table 7 Select protocol performance outcome comparisons (phase II and III combined)

Mean durations are longer for later-stage protocols, with two exceptions: study close-out duration and time to clinical study report. Randomization and completion rates are broadly similar across phases, although the completion rate for Phase I trials was slightly higher than that observed in Phase II and III protocols.

Oncology protocols show longer cycle times than do non-oncology protocols for all clinical trial durations except study initiation (see Table 7). On average, Phase II/III oncology protocols are 1.5 times longer than non-oncology protocols, with the widest differences observed in durations associated with patient enrollment. The completion rate was also substantially lower for oncology than for non-oncology protocols: 31.4% and 80.0%, respectively. Protocols targeting rare diseases have longer cycle times for most measures except study conduct, study close-out, and time to clinical study report. Protocols targeting rare diseases also had lower completion rates than did non-rare disease protocols: 50.8% and 72.5%, respectively. The most notable difference in clinical trial durations is observed in the time to complete each enrolled patient's first visit.

Table 8 shows trends in select scientific and executional design characteristics. Mean values per protocol, for Phase II and III, are presented in four-year increments between 2009 and 2020. An upward trend is observed for all variables. The mean total number of countries and the mean total number of procedures performed showed the highest relative growth rates during this period, with both increasing by slightly less than 70% over the time horizon measured. Other design variables showed more moderate but still substantial growth, including the mean total number of investigative sites, which increased by 33.0%, and the mean total number of endpoints, which increased by 27.1%.

Table 8 Notable trends in select design characteristics (phase II and III protocols)

Discussion

The results of this study provide data that can serve as benchmarks for proactively assessing the scientific and executional complexity of new protocols. These benchmarks also establish important baselines for measuring the impact of the pandemic on future protocol design practices.

The results show a continuing upward trend across all protocol design variables. Phase II and III protocols now average 20.7 and 18.6 total endpoints, respectively; 30.9 and 30.4 inclusion and exclusion criteria; 107.6 and 115.9 protocol pages; 35.1 and 82.2 investigative sites dispersed across 6.1 and 13.7 countries; and 2.1 million and 3.5 million datapoints collected.

These findings are an expected consequence of increasingly ambitious and customized drug development strategies, driven in part by highly challenging disease targets in active R&D; strong demand for data to understand differences between patient subgroups (e.g., biomarker stratification); and the great difficulty associated with identifying, competing for, recruiting, and retaining study sites and volunteers.

Whereas oncology and rare disease protocols have average numbers of eligibility criteria comparable to those of non-oncology and non-rare disease protocols, wide differences are observed in the executional variables. Although oncology and rare disease protocols have considerably lower relative target patient enrollment numbers, they involve a much higher average number of countries and investigative sites, require more patient visits per protocol, and generate considerably more clinical research data that must be monitored, cleaned, curated, and analyzed.

Oncology and rare disease clinical trial durations are longer, most notably between study startup and database lock. This is due in part to the long follow-up periods found in oncology and rare disease studies: oncology protocols had a mean follow-up duration four times longer than that observed in non-oncology protocols, and rare disease protocols had a mean follow-up duration nearly 2.5 times longer than that of the non-rare disease comparison group. In our dataset, more than 80% of oncology protocols had completion times that were event-driven as opposed to fixed-duration; among non-oncology protocols, only 9% had event-driven completion times. Further, completion metrics for oncology clinical trials may have been substantially longer due, in part, to disease progression leading to early discontinuation. Rare disease protocols also had longer relative study initiation periods, likely due to the difficulty of engaging investigative sites and of finding and enrolling study volunteers.

The results of this study, combined with those from a recent Tufts CSDD study examining design variables correlated with clinical trial performance [18], also suggest practical considerations for protocol design decision-makers. The strong observed growth in the number of investigative sites and countries supporting protocol execution, and the significant positive correlation between these executional design variables and clinical trial durations, represent a substantial opportunity to improve speed and efficiency. The relatively high proportion of non-core procedures, most notably in non-oncology and non-rare disease protocols, suggests a critical need and opportunity to reduce and simplify the total number of less essential endpoints and the protocol procedures supporting them.

This study has several limitations of note. The protocols were selected at the discretion of participating companies and, as such, represent a convenience sample. Moreover, the benchmarks are based on aggregated data drawn from a wide variety of disease conditions. The large coefficients of variation observed around the mean values indicate that the benchmarks should be used with some caution.

Future research will look to gather a larger sample of protocols so that comparisons by individual disease condition can be made. Tufts CSDD also plans to explore the relationships between protocol complexity and the ethics review cycle, the regulatory review and approval cycle and its outcome, and commercialization performance.

As drug development strategies evolve and decentralized clinical trial solutions gain acceptance, we can expect to see ongoing changes in protocol designs. Data volume and data diversity, for example, will likely increase with more widespread adoption of handheld devices and mobile apps and greater integration of patient health data into the clinical trial analysis dataset.

As clinical trials for select disease conditions move to wherever and whenever patients can most easily and conveniently participate, we may see more countries involved in clinical trials but fewer physical investigative site locations. Early anecdotal reports suggest that DCTs may shorten clinical trial durations through faster recruitment, better retention, and a reduction in the number of protocol amendments. Some anecdotal reports also suggest that the introduction of new DCT vendors, non-standard datasets, training requirements, and novel practices may, at least in the short term, contribute to higher levels of protocol complexity.

As these changes unfold, we look forward to continuing our research benchmarking protocol design behaviors and their impact on clinical trial performance.