The clinical trial landscape has evolved over time, shaped by advances in medicine and therapeutic development and by innovation in trial design and methods. Tracking such changes became possible with trial registration, which gives the public a window into the massive clinical research enterprise. Many clinical trial registries exist globally. They were established with the shared objectives of addressing reporting biases, including publication bias and selective outcome reporting, and of increasing clinical trial transparency and accountability, and they are used by the public to access clinical trial information. The research presented herein focuses on clinical trial registry data from ClinicalTrials.gov, which is managed by the US National Institutes of Health (NIH) National Library of Medicine (NLM) and is currently the largest clinical trial registry worldwide.

Two decades have passed since ClinicalTrials.gov was launched in 2000. The registry now includes over 400,000 registered studies (interventional and observational) across 220 countries (as of May 2022) [1]. The number of trials registered in ClinicalTrials.gov has increased over time, with an uptick in registration first observed in 2005, when the International Committee of Medical Journal Editors (ICMJE) required that trials under consideration for publication be registered prior to beginning enrollment [2]. Shortly afterwards, Congress passed the Food and Drug Administration Amendments Act of 2007 (FDAAA), expanding trial registration and reporting requirements [3,4,5]. Around the same time, in 2006, the World Health Organization established a trial registration policy and launched the International Clinical Trials Registry Platform (ICTRP). In 2016, the FDAAA 801 Final Rule was issued, further clarifying and expanding the regulatory requirements and procedures for trial registration and result reporting [6]. Another important milestone for improving our ability to analyze clinical trial registration data in the USA is the availability of the Clinical Trials Transformation Initiative (CTTI) Aggregate Analysis of ClinicalTrials.gov (AACT) database [7]. The CTTI AACT is a publicly available, cloud-based relational database that includes aggregated and restructured data from ClinicalTrials.gov; its content is updated daily and is available for download. It includes additional tables, variables, and restructured and formatted data, which have significantly enhanced the ability of researchers to download, analyze, and summarize registration data [7].

Using the publicly available registration data from the CTTI AACT database of clinical trials, we have previously reported on characteristics and trends of trials by funding source as well as analyses of trials funded by the NIH Institutes and Centers [8,9,10]. While our previous analyses focused primarily on the nature of completed trials over time, the overarching objective of this review is to characterize all trials registered in ClinicalTrials.gov and started between 2000 and 2020. Specifically, we aim to describe changes in trial design features over time: trial phase, allocation, masking, interventional study model, and primary purpose. We also explore patterns in the composition of registered trials with regard to the key inclusion and exclusion criteria data elements, and the quality of trial reporting over time, including missing data elements, reporting of trial results, and availability of trial documents (e.g., protocol).


Data source

We conducted a cross-sectional analysis of publicly available trial registration data as structured and organized through the CTTI AACT. A static copy of the database is created on the first of every month and archived on the CTTI AACT website. We downloaded the static version of the database on May 1, 2022, for the purpose of the analysis. Additional details on methods and analysis of CTTI AACT database data have been described previously. Included in the analysis were clinical trials (“interventional studies,” defined as “a type of clinical study in which participants are assigned to groups that receive one or more intervention/treatment/no intervention”) registered in ClinicalTrials.gov and started between 1 January 2000 and 31 December 2020. Observational studies and expanded access studies were excluded. As this is a review of aggregate-level, publicly available data, institutional review board approval is not required.
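The inclusion rule described above can be sketched in a few lines. This is a minimal illustration only, not the authors' PostgreSQL or SAS code; the field names (`nct_id`, `study_type`, `start_date`), the `"Interventional"` label, and the sample records are assumptions made for the example.

```python
from datetime import date

# Hypothetical records shaped like rows of a studies table.
# Field names, labels, and identifiers are assumptions for illustration.
studies = [
    {"nct_id": "TRIAL-A", "study_type": "Interventional", "start_date": date(2001, 3, 15)},
    {"nct_id": "TRIAL-B", "study_type": "Observational", "start_date": date(2010, 6, 1)},
    {"nct_id": "TRIAL-C", "study_type": "Interventional", "start_date": date(2021, 1, 4)},
    {"nct_id": "TRIAL-D", "study_type": "Expanded Access", "start_date": None},
    {"nct_id": "TRIAL-E", "study_type": "Interventional", "start_date": date(2020, 12, 31)},
]

WINDOW_START = date(2000, 1, 1)
WINDOW_END = date(2020, 12, 31)


def is_eligible(study):
    """Keep interventional studies whose start date falls inside the window."""
    if study["study_type"] != "Interventional":
        return False  # observational and expanded access studies are excluded
    start = study["start_date"]
    if start is None:
        return False  # a missing start date cannot be placed in the window
    return WINDOW_START <= start <= WINDOW_END


eligible_ids = [s["nct_id"] for s in studies if is_eligible(s)]
print(eligible_ids)
```

Both window bounds are treated as inclusive here, matching the stated dates of 1 January 2000 and 31 December 2020.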

Outcomes of interest

Characteristics of design (e.g., phase, randomization, use of masking, number of treatment groups, sample size), eligibility criteria (age groups, gender), interventions, conditions, and funders (primary sponsor) were tabulated over time and by overall status. Overall status was grouped as completed, stopped (terminated, withdrawn, or suspended), and recruiting (not yet recruiting; active, not recruiting; or enrolling by invitation). Trials were grouped in 1-year increments by year started (date of first enrollment) rather than by year registered, as trials may have been registered retrospectively, especially in earlier years (e.g., a trial that started in 2001 and was registered in 2007). Thus, the year a trial started provides a more accurate basis for assessing trends in trial design over time.
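As a rough sketch of the grouping just described, the snippet below collapses individual status labels into the three groups and tabulates trials by start year. The exact label strings, the decision to place the plain "Recruiting" label in the recruiting group, and the catch-all "unknown/other" bucket are assumptions for illustration, not definitions taken from the analysis.

```python
from collections import Counter
from datetime import date

# Assumed status labels mapped to the groups named in the text.
STATUS_GROUPS = {
    "Completed": "completed",
    "Terminated": "stopped",
    "Withdrawn": "stopped",
    "Suspended": "stopped",
    "Recruiting": "recruiting",  # assumption: not listed explicitly in the text
    "Not yet recruiting": "recruiting",
    "Active, not recruiting": "recruiting",
    "Enrolling by invitation": "recruiting",
}


def status_group(overall_status):
    """Return the grouped status; anything unmapped falls into a catch-all."""
    return STATUS_GROUPS.get(overall_status, "unknown/other")


# Hypothetical trials. The first started in 2001, so it is counted under 2001
# even if it was registered retrospectively in 2007.
trials = [
    {"start_date": date(2001, 5, 1), "overall_status": "Completed"},
    {"start_date": date(2001, 9, 12), "overall_status": "Terminated"},
    {"start_date": date(2019, 2, 3), "overall_status": "Active, not recruiting"},
    {"start_date": date(2019, 7, 30), "overall_status": "Unknown status"},
]

counts = Counter(
    (t["start_date"].year, status_group(t["overall_status"])) for t in trials
)
for (year, group), n in sorted(counts.items()):
    print(year, group, n)
```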

Trial phases were defined according to FDA phases, as included in the ClinicalTrials.gov glossary of common site terms, and further grouped as phase 1–2, phase 3–4, and phase not applicable (N/A), defined as trials without FDA-defined phases, such as trials of devices or behavioral interventions. Trial funders were determined based upon the “lead” agency_class from the CTTI AACT sponsor table, where organizations listed as sponsors and collaborators for a particular study include the US National Institutes of Health (NIH) and other US Federal agencies (“NIH/US Fed”) (e.g., FDA, CDC, US Department of Veterans Affairs), industry, and all others (e.g., individuals, universities, and community-based organizations).
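The phase grouping can be illustrated with a simple lookup. The label strings below are assumed, and the text does not say how combined phases were assigned, so the placement of "Phase 2/Phase 3" in particular is a hypothetical choice made only to keep the example complete.

```python
# Assumed phase labels mapped to the three categories used in the analysis.
PHASE_GROUPS = {
    "Early Phase 1": "phase 1-2",
    "Phase 1": "phase 1-2",
    "Phase 1/Phase 2": "phase 1-2",
    "Phase 2": "phase 1-2",
    "Phase 2/Phase 3": "phase 3-4",  # assumption: the source does not specify
    "Phase 3": "phase 3-4",
    "Phase 4": "phase 3-4",
    "N/A": "phase N/A",  # no FDA-defined phase (e.g., device or behavioral trials)
}


def phase_group(phase):
    """Map a reported phase to its category; missing values stay visible."""
    return PHASE_GROUPS.get(phase, "not reported")


for label in ["Phase 1", "Phase 4", "N/A", None]:
    print(label, "->", phase_group(label))
```

Keeping a separate "not reported" bucket avoids silently folding trials with a missing phase into the N/A category.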

All variables were defined and categorized as included in the CTTI AACT database, which represents data retrieved directly from ClinicalTrials.gov, as well as derived variables and new variables created from information available on ClinicalTrials.gov and from the National Library of Medicine (NLM) (e.g., Medical Subject Headings (MeSH) for conditions and interventions). The complete data dictionary, including variable names and definitions, is available from the CTTI AACT [7].

Statistical analysis

The analysis used all available data from trials that met the eligibility criteria and were registered in the registration database up to May 1, 2022. Data were summarized by overall status, year groups, and other variables of interest, as described above. Comparisons across year groups were made using the chi-square test, where applicable. Year groupings were created to align with key milestones and updates to trial registration regulations over time. Start dates were selected to account for trials that may have been registered retrospectively. The frequency of missing registration data was tabulated for each variable, but missing data could not be included in the analysis, as registration fields changed over time and some were not required in early year groupings. All tabulations and counts were independently conducted by two reviewers (AGG and JLM) using different statistical software (PostgreSQL and SAS). Discrepancies were resolved by a third reviewer (CLM or GG).
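For readers unfamiliar with the comparison named above, the following is a minimal sketch of a Pearson chi-square statistic for a contingency table of counts (for example, rows as year groups and columns as a design characteristic). The counts are invented for illustration, and the function is a plain textbook computation without continuity correction, not the authors' implementation.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: two year groups by randomized / non-randomized.
example = [
    [10, 20],
    [20, 10],
]
stat, dof = chi_square_statistic(example)
print(round(stat, 3), dof)
```

In practice a statistical package would also return the p-value; in Python, `scipy.stats.chi2_contingency` is one such routine.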


Of 413,389 studies registered in ClinicalTrials.gov, as accessed on 01 May 2022, 320,129 (77%) were classified as “Interventional,” of which 274,043 had start dates between 1 January 2000 and 31 December 2020 (Fig. 1). The number of registered trials increased from 1873 trials started in 2000 to 22,131 trials started in 2020 (Fig. 2). Across start years, between 23.9 and 85.9% of registered trials were reported as complete and 6.2–14.5% of trials started were reported as stopped (withdrawn, terminated, or suspended). The majority of registered trials reported to be active (open to accrual, recruiting) started between 2015 and 2020, with 64.6% of trials started in 2020 still open. A notable percentage (6.9–18.9%) of registered trials have unknown status (recruitment status had not been verified in ClinicalTrials.gov for two years).

Fig. 1 Flow chart of the included studies, as at 01 May 2022

Fig. 2 Number of trials registered in ClinicalTrials.gov, by year started and overall status

Design characteristics

Design characteristics of registered trials started between 2000 and 2020 are displayed in Table 1. The percentage (Table 1B) of registered trials reported to be multi-site has decreased over time, with 49.4% multi-site trials started in 2000, 39.3% in 2010, and 32.7% in 2020 (a 16.7 percentage-point decrease since 2000). The percentage of trials reported as randomized has remained relatively stable over time (range 51.3–67.3%), with the greatest percentage of randomized trials reported in 2011. Most registered trials started between 2000 and 2020 were reported as parallel design (range 39.4–61.4%), and this percentage increased over time. Other reported intervention models, as provided and defined in ClinicalTrials.gov, include crossover trials, factorial trials, sequential design, and single group, with increases observed in reported crossover and sequential trials, and a small decrease in factorial trials. For example, since 2015, the percentage of trials reported as sequential design increased from 1% to 5.1%. The percentage of trials reported as crossover was largest between 2010 and 2015 (range 8.8–10.1%), decreasing to 6.5% in 2020 (Table 1B).
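Because changes in percentages can be read either as absolute differences or as relative changes, a short calculation using the multi-site figures quoted above (49.4% in 2000 and 32.7% in 2020) may help. The function names are hypothetical and the snippet is only an arithmetic illustration.

```python
def percentage_point_change(start_pct, end_pct):
    """Absolute difference between two percentages, in percentage points."""
    return end_pct - start_pct


def relative_change(start_pct, end_pct):
    """Change expressed as a percentage of the starting value."""
    return (end_pct - start_pct) / start_pct * 100


multi_site_2000 = 49.4
multi_site_2020 = 32.7

pp = percentage_point_change(multi_site_2000, multi_site_2020)
rel = relative_change(multi_site_2000, multi_site_2020)
print(f"{pp:.1f} percentage points, {rel:.1f}% relative change")
```

The 16.7 figure corresponds to the absolute difference; the same two values imply a relative decrease of roughly one third.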

Table 1 Design characteristics by year started

The percentage of registered trials reported as single or double+ masked (blinded) has been stable over time, with over 40% of trials reported as masked since 2005. Approximately one quarter of registered trials are reported to have a single treatment group (arm), while the remainder have two or more treatment groups. The percentage of trials reported to have two groups has increased with 20.1% in 2000, 32.1% in 2005, 51.2% in 2010, and 55.2% and 55.6% in 2015 and 2020, respectively (Table 1).

Figure 3 shows the number of registered trials started by year and phase: phase N/A (non-FDA-defined phase), phase 1–2, and phase 3–4. The number of registered trials reported as “phase N/A” has increased from 300 registered trials started in 2000 to 13,367 started in 2019; this number decreased to 12,125 trials started in 2020 (Fig. 3A). The percentage of trials reported as “phase N/A” also increased from 16 to 54.8% over the past two decades. In contrast, phase 1–2 trials and phase 3–4 trials have decreased over time (Fig. 3B).

Fig. 3 Number (A) and percent (B) of trials registered in ClinicalTrials.gov, by year started and phase category. *There were 9 registered trials that did not report phase

Trial conduct and recruitment information

Trial descriptive information and recruitment details, as reported in ClinicalTrials.gov, are summarized in Table 2, including trial sponsor, presence of a data safety and monitoring committee (DSMC), availability of trial protocol, and eligibility. The majority of trials started between 2000 and 2020 report the primary sponsor as “other” (e.g., individuals, universities, and community-based organizations). The relative proportion of trials reporting “other” as primary sponsor has increased over time, while the proportion of trials reporting industry or NIH/US Fed as the primary sponsor has decreased (Table 2). The percentage of trials reporting a DSMC ranged from approximately 20% in 2000 to 38% in 2010 and 34.2% in 2020. The composition of trials has remained relatively stable over time, with the majority of trials involving adults and children (74.5%) and both men and women (>80%). On average, from 2000 through 2020, 9.9% of trials were conducted in women only and 5.1% in men only. The percentage of registered trials across all age categories (adults only, children only, or adults and children) remained relatively stable over time, ranging from 72–81% for populations of all ages (adults and children), 16–20% among adults only, and 5.2–6.5% among children only (Table 2).

Table 2 Trial descriptive and recruitment information by year started

Between 2000 and 2020, the percentage of registered trials reporting “drugs” as the primary intervention type decreased from 70.2% to 39.1%. The percentage of trials involving devices, behavioral interventions, and “other intervention types” increased over time (Fig. 4). These trends are reflected in the changing percentage of trials reporting “treatment” as primary purpose over time, with 84.2% in 2000, 79.7% in 2005, 70.2% in 2010, 63.3% in 2015, and 62.4% in 2020 (Fig. 5). Of note, registered trials with primary purpose reported as “prevention and screening”, “supportive care”, and “others” increased over time (Fig. 5).

Fig. 4 Intervention types by year started

Fig. 5 Primary purpose over time by 5-year increments

Reporting characteristics among completed trials registered in ClinicalTrials.gov

Median trial duration has decreased over time among registered trials reported as completed in ClinicalTrials.gov. The time from the date of first enrollment to the enrollment completion date ranged from 0.6 to 4.3 years for trials starting between 2000 and 2020 (Table 3). A twofold decrease in median years to trial completion was observed between 2000 (4.3 years, IQR 2.3, 6.8) and 2007 (2.0 years, IQR 1.0, 3.7), with a further decrease to 1.6 years in 2015. Median trial enrollment (actual sample size) was 82 (IQR 33, 256) in 2000, 69 (IQR 30, 200) in 2005, 57 (IQR 24, 149) in 2010, 60 (IQR 25, 140) in 2015, and 62 (IQR 30, 150) in 2020 (Table 3). Most completed registered trials report sample sizes <50 participants across all years, with the percentage of trials conducted in more than 500 participants decreasing over time (Table 3). The number and percentage of registered trials reporting results have increased over time, with a notable increase in 2007 (n=2840, 36%), when the results database was launched, compared with previous years, in which 8.7 to 24.5% of trials had posted results. The time to report results has also improved, decreasing from a median of 29 months in 2007 to 12 months in 2015 and 10 months in 2020. The percentage of registered trials posting results in 12 months or less also increased after 2015, when the final rule for FDAAA 801 was issued, although it remains small (0.1–1.4%) (Table 3). This percentage reflects reporting for all completed trials, including those that do not meet the definition of an “applicable clinical trial” or are not required to report results. Finally, the percentage of trial registration fields with missing/null values has decreased across most required registration fields over time. Since the FDAAA 801 submission requirements were expanded in 2007, the percentage of missing responses for fields including randomization, masking, intervention model, and eligibility all decreased to <1% missing (Table 4). Registration fields including the number of facilities (sites), treatment groups, and primary purpose had a higher proportion of missing values in earlier years (2000–2015), decreasing to 0–1% by 2020.

Table 3 Completion status, enrollment, and result reporting among completed registered trials, 2000–2020
Table 4 Percent (%) missing responses, by data element and trial start year


In this study, we characterized and described trends in the design and composition of trials registered in ClinicalTrials.gov that started between 2000 and 2020. Prior to registration, there was no viable way of identifying trials except via the published literature, a biased sample since only a small fraction of trials are published. With the launch of ClinicalTrials.gov in 2000 and the subsequent establishment of the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) in 2006, access to important trial information, along with the ability to trace the state and nature of trials, became possible. While it would be of interest to analyze all available registry data across multiple international registries, differences in regulations and definitions by country, lack of a common data structure, and risk of duplicate entries make it difficult to provide an accurate account [11]. Thus, leveraging the publicly available CTTI AACT data, we provide an overview of the clinical trial landscape through the lens of ClinicalTrials.gov.

During the first 5 years after ClinicalTrials.gov was launched, and prior to the ICMJE requirement of 2005, we observed a much less complete account of trials. This is reflected in the small number of trials registered between 2000 and 2005. In the years that followed, there were important developments in trial registration regulation in the United States, including the establishment of FDAAA section 801 in 2007, which required more trials to be registered and expanded the required data elements. Consequently, the number of trials started doubled during that period, along with a jump in the number of trials without FDA-defined phases (phase “N/A”). The FDAAA also included the requirement that investigators post results of trials covered under FDA regulations on ClinicalTrials.gov within 1 year of completion. Failure to comply carries provisions for heavy fines. Although results posting is not a substitute for publication, we observed that results reporting in ClinicalTrials.gov has improved over time and represents an important step towards trial accountability and transparency.

A notable trend observed in this analysis was the decline in phase 1–4 trials and the increase in trials without FDA-defined phases, indicated as “Phase N/A” in ClinicalTrials.gov. Feasibility studies, non-drug trials, behavioral trials, and other trial designs (e.g., adaptive or platform) that do not fit within the FDA definition for phases fall into the “Phase N/A” category. Between 2016 and 2020, more than 55.9% of trials were categorized as “Phase N/A.” Given the broader definition and larger number of trials that do not fit within the FDA-defined phases, there is a need to update the registration information capture to include additional data elements that specifically categorize more of the “N/A” study characteristics into pre-specified design classifications. When ClinicalTrials.gov was first established, emphasis was placed on FDA trials, and the majority of registrations were US-funded drug trials, following the FDA definitions for the primary sponsor as the funder and holder of Investigational New Drug applications. The FDAAA “final rule” of 2016 refined the definition of an “applicable clinical trial” (ACT) and expanded on requirements for result reporting [7], supporting the need to include additional trial design options and categories in the registration elements.

As half of the trials registered in ClinicalTrials.gov are conducted outside of the USA, a third are conducted in the USA only, and the remainder are conducted in both the USA and other countries, there has been a significant increase in the number of trials funded by other sources (e.g., universities, foundations) and a smaller percentage of trials funded by the NIH/US Government or industry over time.

One improvement to the registry would be to provide a means to specifically identify the primary funding source(s) and respective investments in each trial undertaken. Over three quarters of primary sponsors for registered trials are categorized as “other.” As trials are collaborative and often include more than one sponsor or funder, it is difficult to describe the current funding status of trials. We have previously suggested the inclusion of a funding variable and an additional link or established connection to the NIH RePORTER funding information for any trial funded by NIH [9, 10]. The majority of trials funded by other sources tend to be smaller, do not have FDA-defined phases, and do not have results posted, inundating the registry with underpowered trials that are too small to answer meaningful questions [12, 13]. However, such trials are often required to generate preliminary data for grant applications and to obtain funding for larger, more informative, and practice-changing trials. Thus, it would be of interest to include an additional variable in ClinicalTrials.gov to establish linkage to the subsequent larger trials, to determine how many have been funded as a result of these smaller “pilot” or feasibility trials.

Trial designs have evolved over time, and while ClinicalTrials.gov is structured to accommodate trials conducted independently and sequentially (i.e., from phase 1 to 2 to 3 to 4), there are more adaptive designs, platform trials, expansion cohorts, decentralized designs, and other methods applied to enhance trial efficiency [14, 15]. This can be observed in the increasing number of sequential designs over time, for instance, although not all trial designs are included in the drop-down menu when registering a clinical trial in ClinicalTrials.gov. Until data capture and the quality of reporting of these trial designs improve, it is difficult to know how many such trials are currently being conducted [16, 17]. Additional registration fields to capture further specifics of trial design may help improve our understanding of how trial designs have changed over time, and whether the reported sample sizes are sufficient to provide meaningful answers.

Evolving designs may be driven by several factors. For one, trial outcomes have also evolved over time, with more trials using surrogate outcomes and biomarkers, composite outcomes, patient-reported outcomes, and massive lists of genomic information [18, 19]. To describe the different types of outcomes being used, an additional field specifying the outcome type or category would be informative for understanding trends in trial outcomes over time. Additional categories and links to publications related to the primary and secondary objectives, if any, would also allow for better tracking of publications related to the registered trials. In addition, advances in technology have resulted in its integration into trial design and changed how trials are being conducted (e.g., decentralized designs) and how outcomes are being captured [20]. As technology continues to advance and becomes integrated with health care, the registration fields will once again need to be reimagined. This became apparent in 2020, with the COVID-19 pandemic and the increased use of telehealth and technology to continue study visits and assessments for many of the ongoing trials [21]. That year was also marked by over 4500 additional interventional trials related to COVID-19 registered in ClinicalTrials.gov alone, as defined by the “covid-19” search terms listed on the website [22]. The impact of COVID-19 on the completion status and recruitment for non-COVID-19-related trials will continue to unfold in the years that follow, and a more in-depth analysis of the characteristics of COVID-19 trials is planned. Finally, trial designs have evolved along with the changing populations and conditions we study over time.

A limitation of this analysis is that it only includes trials registered in ClinicalTrials.gov. Although these account for a sizable fraction of all trials, our scope is not necessarily representative of the entire clinical research enterprise. ClinicalTrials.gov is only one of many trial registries that currently exist globally. While we considered analyzing all registration data available in the WHO ICTRP, several obstacles exist to analyzing study metadata from the WHO ICTRP as a result of incomplete data, lack of a single minimum information standard for required fields, and discrepancies between fields across the WHO trial registration datasets, as noted by Miron et al. [23]. We have previously commented on the value of merging registries into a single international trial registry [10]. While the ICTRP provides a platform for multiple registries with a unique trial identifier, it only accounts for approximately 30% of registrations across 16 registries. Furthermore, trials are not registered directly through the platform and thus do not follow the same registration and reporting requirements or share a common data structure. Therefore, trial registration platforms are at risk of including incomplete or inconsistent trial information and, for instance, duplicate registrations, without a standardized protocol to identify and merge these [23, 24]. Additionally, there remains a large number of trials that are not registered, making it difficult to obtain a complete account of all trials [25]. As observed in our analysis as well as other reports, the number of trial registrations has increased, especially in the last decade [26].

Another limitation of this analysis is the inability to account for differences in the reporting quality and completeness of registered studies over time, due to changing policies and updates to registration elements and reporting requirements. Not all trial registration data are available over any given time frame, especially during the first 5–6 years, prior to the ICMJE requirement. As observed in our analysis, however, the percentage of missing fields decreases for most required elements over time, with less than 1% missing in the latter half of the decade. Although the completeness of trial reports is reviewed through the Protocol Registration and Results System (PRS), the accuracy, consistency, and quality of the data in the registry cannot be guaranteed [1, 27]. Thus, it is difficult to make accurate comparisons across time periods or data elements, and these limitations should be taken into account when interpreting the findings from this analysis.

Despite its limitations, this study provides a comprehensive look at the CTTI AACT database to date, spanning over two decades and including all interventional studies registered in ClinicalTrials.gov. We summarize insights and suggestions to improve the database and registration fields in order to adapt to the evolving and expanding clinical trial landscape. Future directions for this research include analyzing the results database, including trial composition and demographics, primary outcome results, and safety data. Using the MeSH database, a more detailed analysis of trials by condition, including COVID-19-related trials, will also be explored. Finally, an analysis of all registered trials across multiple clinical trial registries will be important to gain a global perspective of the international clinical trial landscape.


Clinical trial registration has transformed how trial information is accessed, disseminated, and used. As clinical trials evolve and regulations change, trial registries, including ClinicalTrials.gov, will continue to provide a means to access and follow trials over time, thus informing future trial design and highlighting the value of this tremendous resource.