Introduction

Appropriate therapeutic treatments are lacking for most rare diseases [1]. The reasons are multi-faceted and include scarce knowledge of the biology of the disease, difficult and unprecedented regulatory pathways, and limited commercial benefits for pharmaceutical companies. In addition, there are significant hurdles to conducting clinical trials in a globally small patient population, including patient access and maintenance, study enrollment, and overall timelines [1].

In the United States, a rare disease is defined as a condition that affects fewer than 200,000 people [2]. In the European Union, a disease is considered rare if its prevalence is no more than 5 per 10,000 people. An estimated 7000 rare diseases affect more than 400 million people globally, and 50% of these patients are children [3].

Various initiatives have been introduced, including support for basic research in academia and government funding of rare disease research and targeted drug development. Further, specialized care centers associated with academic medical centers have been established to provide dedicated treatment for specific orphan diseases. These initiatives have greatly increased awareness of these diseases and their impact on patients.

In this article, we present solutions that Bayesian strategies can provide to tackle challenges to drug development in rare diseases. The objective is to address the gap of insufficient knowledge of Bayesian approaches, which was reported as one of the key barriers to using Bayesian methods in drug development [4]. Readers not familiar with Bayesian methods could consider reading the Tutorial article that is part of this special section on Bayesian Clinical Trials. Not all interesting strategies could be covered in this paper. Most notably, Bayesian platform trials are missing, and we refer interested readers to the recent review by Kidwell and co-authors [5].

Challenges of Drug Development in Rare Diseases

A primary challenge to drug development in rare diseases arises from the small population from which participants can be recruited. This challenge is exacerbated when patients are reluctant to enroll in the control arm of a trial, due to a belief that the new treatment is likely to be more efficacious than the standard-of-care [6].

A second challenge is limited knowledge of the disease’s natural history [7] and the difficulty of defining appropriate/clinically relevant endpoints that can feasibly be measured during a clinical trial. Many rare diseases show significant clinical heterogeneity in terms of presentation and severity due to population genetic and environmental diversity, differences in diagnostic categories, and varying survival across geographic areas [8]. Important definitions (e.g., of clinical endpoints or of disease subtypes) may change during drug development; thus, the learning curve is steeper on almost all aspects of drug development as compared to common diseases. For example, when drugs target only certain molecular subtypes of the disease (defined, e.g., by mutations in a particular gene or by genetic deficiencies of enzymes), a precise description of all relevant aspects of the subtype and well-defined tools to identify these patients are required. Subtype definitions may change with increasing knowledge, and what was a subtype in the past may be seen as a rare disease of its own in the future (and vice versa). Also, which outcome measures are important to describe the course of the disease, and which clinical endpoints are relevant to patients, may only reveal themselves over time. These multi-faceted aspects require extensive assessments of patients and the measurement of multiple endpoints in each study, thereby necessitating complex studies (and statistical analyses) that are difficult for sponsors to design and for patients to participate in.

Finally, a ubiquitous challenge in pharmaceutical development is access to accurate information on trial characteristics and patient outcomes that can be used to estimate a trial’s success rate. Gathering such data is expensive, time-consuming, and susceptible to error [9]. This is already challenging in common indications but there is often less information available to make accurate predictions in the rare disease setting, rendering risk assessments for investors difficult.

Strategies to Tackle the Challenges

Adaptive Designs

Adaptive designs include prospectively planned opportunities for modifying study design elements (e.g., sample size, target population, treatments, endpoints, and randomization ratios) and hypotheses based on interim data analyses. This makes them attractive to tackle the problem of limited availability of knowledge when the trial is being designed. Appropriate, pre-planned modifications in the trial design or underlying statistics can improve overall effectiveness while controlling the chance of erroneous conclusions and maintaining trial conduct and integrity [10]. This is particularly important for late Phase 2 and Phase 3 trials, where the median expense for a single Phase 3 trial is $19 million [11, 12].

Bayesian statistical inference is easy to apply to adaptive decision-making. The prior distribution is updated to a posterior distribution as data accumulate, and this update can be repeated continually; further, unlike frequentist inference, inclusion of adaptive elements in a design (typically) does not affect Bayesian measures of uncertainty [13]. However, we note that Bayesian methods do not guarantee control of Type I error (which is a frequentist concept). When Type I error control is required, extra effort is needed to tune the default Bayesian decision criteria so that control is achieved over the range of (likely) parameter values. This tuning can be done by using simulations to estimate operating characteristics [14].
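
As a minimal sketch of such a simulation (not taken from any trial discussed here), the following Python code estimates by Monte Carlo the Type I error of a hypothetical two-arm trial with a binary endpoint, one interim and one final look, flat Beta(1, 1) priors, and a success rule of the form P(p_treatment > p_control | data) > threshold. The looks at 20 and 40 patients per arm, the null response rate of 0.3, the thresholds, and all function names are illustrative assumptions.

```python
import random

def prob_treatment_better(y_t, n_t, y_c, n_c, rng, n_draws=300):
    """Monte Carlo estimate of P(p_t > p_c | data) under independent Beta(1, 1) priors."""
    wins = 0
    for _ in range(n_draws):
        p_t = rng.betavariate(1 + y_t, 1 + n_t - y_t)
        p_c = rng.betavariate(1 + y_c, 1 + n_c - y_c)
        wins += p_t > p_c
    return wins / n_draws

def simulate_null_trials(looks=(20, 40), p_null=0.3, n_sims=300, seed=1):
    """Simulate trials with NO true effect; return each trial's largest posterior
    probability across the looks (looks are cumulative patients per arm)."""
    rng = random.Random(seed)
    max_probs = []
    for _ in range(n_sims):
        y_t = y_c = n = 0
        best = 0.0
        for look in looks:
            # Enroll patients up to this look; both arms share the same true rate.
            for _ in range(look - n):
                y_t += rng.random() < p_null
                y_c += rng.random() < p_null
            n = look
            best = max(best, prob_treatment_better(y_t, n, y_c, n, rng))
        max_probs.append(best)
    return max_probs

def type1_error(max_probs, threshold):
    """Fraction of null trials that would (wrongly) declare success at any look."""
    return sum(p > threshold for p in max_probs) / len(max_probs)

max_probs = simulate_null_trials()
for threshold in (0.975, 0.99):
    print(f"threshold {threshold}: estimated Type I error {type1_error(max_probs, threshold):.3f}")
```

In this sketch, raising the threshold can only lower the estimated error, which is the kind of tuning described above. A real design would use far more simulated trials and posterior draws, and would repeat the exercise over a range of plausible null response rates.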

Some studies are hybrid: the final inference is frequentist, but prior information/beliefs about the parameters are used to design the study, and posterior beliefs/distributions are used for interim decisions. Hybrid approaches use the frequentist framework to control Type I error while still making use of the flexibility of the Bayesian framework to drive planning and interim decisions.

A special type of adaptivity is response-adaptive randomization (RAR), where the chance of a newly enrolled subject being assigned to a treatment arm varies over the course of the trial (based on accumulating outcome data for subjects previously enrolled) [10]. Response-adaptive techniques intend to minimize the number of patients allocated to inferior treatments, and may reduce sample size. Additionally, recruitment may be easier because patients find the higher chance of receiving (potentially) promising treatments appealing. Note that there are ongoing debates around RAR as the advantages come with some risks. Most of those raised in Hey et al. [15] do not apply to rare diseases, yet an important risk relevant for rare diseases is that the profile of patients enrolled and standard-of-care may change over time, which leads to some risk for biased results. For a discussion of these risks, and others, see Proschan et al. [16]. In their article, they discourage the use of RAR in general, yet they acknowledge that the Bayesian approach can avoid some disadvantages.
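
A minimal sketch of one common Bayesian form of RAR follows, in which each arm's allocation probability is set proportional to the posterior probability that it is the best arm (a Thompson-sampling-style rule). The rule itself, the flat Beta(1, 1) priors, the interim counts, and the function name are illustrative assumptions, not the scheme used in any trial discussed in this article.

```python
import random

def rar_allocation(successes, totals, n_draws=5000, seed=0):
    """Allocation probabilities proportional to P(arm is best | data), Beta(1, 1) priors."""
    rng = random.Random(seed)
    best_counts = [0] * len(successes)
    for _ in range(n_draws):
        # One joint posterior draw of the response rates of all arms.
        draws = [rng.betavariate(1 + y, 1 + n - y) for y, n in zip(successes, totals)]
        best_counts[draws.index(max(draws))] += 1
    return [count / n_draws for count in best_counts]

# Hypothetical interim data: responders / patients for control and two doses.
allocation = rar_allocation(successes=[3, 6, 10], totals=[20, 20, 20])
print([round(p, 2) for p in allocation])
```

Practical designs often add safeguards to such a rule, for example a minimum allocation per arm or a fixed fraction for the control arm, partly to limit the time-drift risks mentioned above.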

Using External Data: Extrapolation

The European Medicines Agency (EMA) defines extrapolation as “extending information and conclusions available from studies in one or more subgroups of the patient population (source population), or in related conditions or with related medicinal products, to make inferences for another subgroup of the population (target population), or condition or product, thus reducing the need to generate additional information (types of studies, design modifications, number of patients required) to reach conclusions for the target population, or condition or medicinal product” [17].

More commonly, the term extrapolation is also used when mathematical models (population PK-PD, physiologically based models, simple regression etc.) are used to bridge the gap between species, different target populations, or treatment regimens (different administration route, dosage, etc.). All these approaches have relevance to drug development and many employ Bayesian methods.

Using External Data: Borrowing

In the context of clinical trials, we define borrowing to mean the use of data that are logically external/separate (with respect to data generated within the trial) to perform statistical inference or make decisions. This broad definition dates back to Tukey [18]; in this sense all borrowing is extrapolation. Some specific examples are provided in Table 1.

Table 1 Borrowing: sources and applications

Extrapolation and borrowing are effective means to reduce sample sizes in clinical trials. Often, external information will only be available for the control arm. Even in this situation, the sample size of the control arm may be reduced with a greater proportion of patients allocated to the treatment arm. The increased chance of being randomized to the new treatment may increase the willingness of patients to participate in the study (especially when there is no curative treatment available).

Single-arm trials are an extreme example, where the effect of a novel treatment (as estimated from the single arm) is compared to a reference effect (as estimated/inferred from external/historic data, often called the external control). Single-arm trials have been accepted for drug approval in circumstances of high unmet need for (ultra)rare diseases [19, 20]. However, the absence of any concurrent control data and lack of randomization increases the risk of bias. Thus, this extreme use of external data may only be supported in specific cases where the risks are deemed to be outweighed by the benefits.

Single-arm trials may be acceptable when an investigational treatment aims to address an important unmet clinical need and the natural history of the disease without treatment/with current standard-of-care is well understood. However, in diseases with an evolving standard-of-care or heterogeneity in historical data, inclusion of concurrent controls (with randomization and blinding) may be important to be able to address prior-data conflict (especially for trials in slowly progressive diseases or with symptom-based endpoints). In such cases, external information may be borrowed to augment the information generated within the trial. Table 1 provides three common situations in which borrowing may occur in clinical trials (including situations where borrowing may occur for the treatment arm).

The Bayesian framework provides principled methods to facilitate borrowing external information. Such methods include robust meta-analytic predictive priors, power priors, and commensurate priors [21,22,23]. A detailed description of these methods is beyond the scope of this review. Conceptually, each of these methods can be thought of as summarizing external information in the form of an (informative) prior. This prior is “updated” to the posterior by adding information obtained from the current/new trial. Thus, the posterior reflects information from both sources. The models underlying these methods allow for the possibility of systematic differences between the “populations” from which patients in the external studies and the current/new trial are sampled. In the case of a mismatch between current and external data (often referred to as “prior-data conflict”), the methods reduce the amount of information borrowed from the external data. They can be tuned to control how drastically external information is down-weighted. Thus, the amount of borrowing can be calibrated to reflect prior skepticism about the external data or to achieve reasonable frequentist operating characteristics.
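
To make the idea of down-weighting concrete, here is a minimal sketch of the simplest of these approaches, a power prior with a fixed discount parameter, for a binary endpoint. The historical likelihood is raised to a power a0 between 0 and 1, so that with a Beta(1, 1) initial prior the historical data enter as a Beta(1 + a0·y0, 1 + a0·(n0 − y0)) prior. The numbers and function names are hypothetical.

```python
def power_prior_beta(y_hist, n_hist, a0, a_init=1.0, b_init=1.0):
    """Beta prior from historical binomial data discounted by a fixed a0 in [0, 1]."""
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    return a_init + a0 * y_hist, b_init + a0 * (n_hist - y_hist)

def posterior_beta(prior, y_new, n_new):
    """Conjugate update of a Beta prior with new binomial data."""
    a, b = prior
    return a + y_new, b + n_new - y_new

# Hypothetical numbers: 30/100 historical responders, 8/20 in the new trial.
for a0 in (0.0, 0.5, 1.0):
    a, b = posterior_beta(power_prior_beta(30, 100, a0), 8, 20)
    print(f"a0={a0}: posterior mean {a / (a + b):.3f}, "
          f"historical patients effectively borrowed {a0 * 100:.0f}")
```

With a fixed a0 the amount of borrowing does not react to prior-data conflict; the dynamic behavior described above requires treating the discount as unknown or using a mixture or hierarchical formulation.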

As noted above, borrowing improves efficiency but also increases the risk of bias when the populations underlying current and external patients are systematically different. Even methods that allow the possibility of down-weighting external information do not completely eliminate the risk of bias. The FDA acknowledges these two sides of informative priors/borrowing and encourages simulations to assess the impact: “For some Bayesian designs, it is possible to use simulations to estimate the frequentist operating characteristics of power and Type I error probability. In these cases, decision criteria can be chosen to provide Type I error control at a specified level,” as well as the use of alternative trial characteristics: “When Type I error probability is not applicable (e.g., some Bayesian designs that borrow external information), appropriate alternative trial characteristics should be considered” [10]. However, the guidance does not explicitly suggest particular alternative trial characteristics. A necessity from a regulatory perspective is to have the means to apply “the same rule” for all sponsors that work in the same field, so the trial characteristic needs to be generally applicable and meaningful.

Disease Progression Modeling

Natural history studies in patients with rare, progressive diseases provide valuable information on the expected trajectory of the disease. This information can be leveraged to improve clinical trial design in several ways: to inform trial planning, inclusion/exclusion criteria, defining endpoints (and when to measure them), and to predict effect size and variability. Mathematical functions that quantitatively describe the time course of the disease can be used in probabilistic “virtual trial” models for internal decision-making, and to build efficient trial designs with synthetic controls when the risk of bias is considered acceptable by regulatory bodies, e.g., when compared to a single-arm trial with a fixed threshold for efficacy, or with a historical control arm as the only viable alternatives [24]. Additionally, data from natural history studies can be used to build smarter analysis models, and patients from these databases may be easily incorporated into disease progression models as external controls [25].

Applying Bayesian Methods to Clinical Trials: Evolving Regulatory Science

Since FDA CDRH developed a guidance for the use of Bayesian statistics in medical device clinical trials [26] in 2010, Bayesian methods have been used in confirmatory trials for medical devices. Medical devices tend to evolve over the course of clinical trials, with companies updating device features. Borrowing data collected with previous versions of the same device is considered quite appropriate (and is routinely implemented) within the medical device community. A decade of experience with Bayesian methods in medical device trials is discussed in a dedicated article in this issue/series.

Interest in Bayesian trials for drugs and biologic development has been increasing recently. Benefits of the Bayesian framework are more evident for rare diseases; however, there is growing interest in other therapeutic fields as well.

Recent epidemic and pandemic outbreaks also highlight the need for trials that adapt to an evolving disease and an evolving understanding of that disease, and that find a solution for clinical practice within the shortest possible timeframe [27]. During the Ebola virus disease outbreak, FDA CDER statisticians worked in collaboration with the National Institutes of Health and several academic centers. The PREVAIL II trial used an adaptive design with Bayesian features [28]. More recently, Pfizer utilized a Phase 1-2-3 trial using a Bayesian approach for its COVID-19 vaccine. The chosen design provided the required flexibility without compromising the quality of evidence [29].

The recent FDA Complex Innovative Trial Design (CID) Pilot Meeting Program was designed to facilitate and advance the use of complex adaptive, Bayesian, and other novel clinical trial designs. All three CID case studies leveraged Bayesian design elements.

A new ICH guidance, E11A [30], is in development for pediatric extrapolation. In the draft, the Bayesian strategy is among three general options mentioned for model-informed approaches, and explicit instructions are given for Bayesian methods on how to quantify the impact of the use of reference data, how to justify the design, and how to report. This signals support for the use of Bayesian methods.

Examples of Implementing Bayesian Strategies in Rare Diseases

Extrapolation from Adult to Pediatric

In pediatric drug development, there is an ethical imperative to minimize both the number of studies in children and the number of children recruited to studies. Extrapolation of information learned on adults is a natural candidate to fulfill this requirement and has been previously utilized in many circumstances in the pediatric trials setting.

Similar to the general case, extrapolation of adult information into a pediatric population can occur along a spectrum of extrapolation and borrowing. In cases where there is robust support for a shared mechanism, similar clinical responses to intervention, and similar dose-exposure–response relationships in both populations, limited efficacy data in children might be needed. In such cases, the pediatric development program may rely exclusively on pediatric safety (and PK) data, analyzed within a Bayesian framework to achieve sufficient precision and quantification of uncertainty. On the other hand, when there is uncertainty about the underlying assumptions of the mechanism of the disease, partial extrapolation of efficacy may be achieved using a PK/PD exposure–response study or a single dedicated, well-controlled efficacy trial [31].

Multiple reflections and guidances for extrapolation in the pediatric drug development context have been presented by regulators. The EMA issued a reflection paper [17] that highlights development of the exposure/response relationship in adults as being central for extrapolation. The FDA distinguishes between full, partial, and no extrapolation as well as between extrapolation for safety vs. that for efficacy in its guidance [32]. Full extrapolation occurs when no pediatric trial is deemed necessary; conversely, no extrapolation can occur when adult and pediatric populations are considered sufficiently different [33]. The FDA notes that it expects full extrapolation to be rare.

Hence pediatric extrapolation is an approach encouraged by the health authorities to evaluate the efficacy and/or safety of a drug in children. When warranted, pediatric extrapolation can deliver realistic and credible predictions of pediatric efficacy, increase the acceptance of results by health authorities, and accelerate approval of pediatric medicines [34].

Example 1 Extrapolation: Bayes Calculation to Justify More Liberal Level of Significance for Pediatric Trials

This method is a hybrid method in the sense that the Bayesian paradigm is only used to justify a larger significance level for the pediatric trial such that the confidence in the efficacy of the drug in children is not less than the confidence in the efficacy of the drug in adults. It was developed in an EU-funded project [35] and presented and discussed at a workshop at EMA [36].

The procedure can be specified after the early phase of adult drug development, when the plan for registering the drug in children is provided to the regulators. It starts with a prior probability that the drug is efficacious in adults (for example, the historic Phase 3 success rate) and the typical assumptions for power and significance level in that disease area. The prior probability of efficacy in adults on reaching Phase 3 is then updated to a posterior probability of efficacy, given that two pivotal trials were successful at the specified power and alpha level. In standard settings with two successful Phase 3 trials in adults and a prior probability of 50%, the posterior probability of efficacy in adults is 0.9992 (see [35] or [36] for details). If the skepticism, that is, the probability that the drug does not work in children even though it works in adults, is 20%, and the probability that it works in children even though it does not work in adults is 0%, the a priori probability of efficacy in children is 0.8 × 0.9992 = 0.79936. From that starting point, an originally required alpha of 0.025 in one pediatric trial can be increased by a factor of 3.98. For details of these calculations and this factor, please refer to [35, Table I]. The most critical parameter to agree upon is the level of skepticism.
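
The first part of this calculation can be sketched with Bayes' theorem, treating each pivotal trial as a diagnostic test whose true-positive rate is the power and whose false-positive rate is alpha. The power (0.9) and one-sided alpha (0.025) used below are assumptions, chosen because they reproduce the 0.9992 quoted above; the settings actually used are given in [35] and [36]. The function and variable names are hypothetical.

```python
def posterior_efficacy(prior, power, alpha, n_trials):
    """P(drug works | n_trials independent significant trials), by Bayes' theorem."""
    true_pos = prior * power ** n_trials
    false_pos = (1.0 - prior) * alpha ** n_trials
    return true_pos / (true_pos + false_pos)

# ASSUMED settings (not stated in the text): power 0.9, one-sided alpha 0.025.
p_adult = posterior_efficacy(prior=0.5, power=0.9, alpha=0.025, n_trials=2)

# Skepticism of 20%: P(works in children | works in adults) = 0.8,
# and P(works in children | does not work in adults) = 0.
p_child_prior = 0.8 * p_adult + 0.0 * (1.0 - p_adult)
prior_odds_child = p_child_prior / (1.0 - p_child_prior)

print(f"posterior, adults:    {p_adult:.4f}")
print(f"prior, children:      {p_child_prior:.4f}")
print(f"prior odds, children: {prior_odds_child:.2f}")
```

The pediatric prior printed here differs from 0.79936 in the fifth decimal only because the adult posterior is not rounded before multiplying. The prior odds for children come out at about 3.98, which numerically matches the factor quoted above; the derivation of that factor is in [35] and is not reproduced here.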

Example 2 Extrapolation: Belimumab in Children with Systemic Lupus Erythematosus (SLE) [37]

Human Genome Sciences conducted a required post-marketing pediatric study of Belimumab in SLE. Despite intense efforts in recruitment, only 93 subjects were enrolled in the study, which was not enough to be adequately powered, and no formal statistical hypothesis testing was planned in the protocol. The clinical review proposed Bayesian methods as a means to borrow information from the adult to the pediatric population, expecting similarity of disease and response in these two populations.

The method applied was a Bayesian mixture model with an informative prior based on a weighted combination of a skeptical prior with a mean effect size of zero and a meta-analytical prior from two adult studies. For weights between zero and one, the posterior probability of efficacy was calculated and reported. For weights on the meta-analytical prior greater than or equal to 0.3, the posterior probability of efficacy exceeded 95%; for weights greater than or equal to 0.55, it exceeded 97.5%; and for weights greater than 0.7, it exceeded 99%.

Based on discussions and these results, it was concluded that Belimumab 10 mg/kg has a positive treatment effect in pediatric subjects.

Note that this was a post hoc analysis. In a pre-defined analysis, either the maximum weight for the meta-analytical prior would be specified, or a mechanism on how to achieve the weight would be predefined (dynamic borrowing). The analyses with weights ranging from zero to one could be presented as a sensitivity analysis.
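
A sensitivity analysis of this kind can be sketched with a two-component normal mixture prior, for which the posterior is again a mixture with updated weights. All numbers below (adult prior N(0.5, 0.15²), skeptical prior N(0, 1²), pediatric estimate 0.4 with standard error 0.4) are hypothetical and do not reproduce the Belimumab analysis; the function names are likewise illustrative.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_efficacy(w, estimate, se, adult=(0.5, 0.15), skeptical=(0.0, 1.0)):
    """P(effect > 0 | data) under the prior w * N(adult) + (1 - w) * N(skeptical)."""
    total, weighted_prob = 0.0, 0.0
    for weight, (m, s) in ((w, adult), (1.0 - w, skeptical)):
        # Marginal likelihood of the observed estimate under this prior component.
        evidence = weight * normal_pdf(estimate, m, math.sqrt(s**2 + se**2))
        # Conjugate normal update for this component.
        post_var = 1.0 / (1.0 / s**2 + 1.0 / se**2)
        post_mean = post_var * (m / s**2 + estimate / se**2)
        total += evidence
        weighted_prob += evidence * normal_cdf(post_mean / math.sqrt(post_var))
    return weighted_prob / total

# Hypothetical pediatric result: effect estimate 0.4 with standard error 0.4.
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"weight on adult prior {w:.2f}: "
          f"P(effect > 0 | data) = {prob_efficacy(w, 0.4, 0.4):.3f}")
```

Reporting the full curve over the prior weight, rather than a single value, shows how strongly the conclusion depends on the degree of belief in the relevance of the adult data.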

Example 1 Adaptive Trial with External Data: Phase 2 Transfusion-Dependent Beta-Thalassemia

Clinical development of new therapies in transfusion-dependent beta-thalassemia has several challenges. Patient enrollment in rare disease trials requires multi-center, multi-country studies, and the lack of a reliable surrogate endpoint for dose selection requires powering for clinical endpoints usually used in Phase 3 trials. An acceptable endpoint from a regulatory perspective, which is based on a “responder analysis,” such as the proportion of patients experiencing ≥ 50% reduction in Red Blood Cell (RBC) transfusion burden and a reduction of ≥ 2 units, requires a 12-week screening period to establish the baseline transfusion burden for reliable comparison.

A Phase-2b, double-blind, randomized, placebo-controlled, multi-center study (NCT04938635) follows a Bayesian design that uses noninformative, or weakly informative, priors for the active dose arms and a robustified informative prior for the control arm. Historical control data from the Phase 3 trial BELIEVE [38] are “borrowed” through an informative prior for the control arm response rate: in the BELIEVE study, out of 112 patients randomized to the control arm, five patients (4.5%) had a ≥ 33% reduction in transfusion burden over 24 weeks. As discussed above, a robust prior is important to address potential prior-data conflict, which can arise from multiple sources such as population heterogeneity between the historical and current studies. Therefore, the selection of historical data (BELIEVE trial) addresses similarity in inclusion/exclusion criteria, standard-of-care, etc. A prior-data conflict may arise if data from this Phase-2b trial suggest that the control response proportion is substantially different from 4.5%. Because prior-data conflict can inflate the frequentist Type I error, robustification is required to control the level of borrowing depending on the level of prior-data conflict. The robustification of the informative prior does not take into account prior-data conflict in terms of population or study characteristics but focuses on the informative prior of the parameter of interest and the corresponding likelihood of the current data.

The historical control data is used to construct an informative prior for the control arm to reduce the burden of patients randomized to a control arm and improve the trial’s efficiency in performing dose selection [39].
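
The mechanics of a robustified prior can be sketched as a two-component Beta mixture for the control response rate: an informative component built from the BELIEVE control arm (5 responders out of 112) and a vague Beta(1, 1) component. The prior weight of 0.8 on the informative component, the Beta(5, 107) encoding, and the new-trial data are illustrative assumptions and not the specification used in NCT04938635.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(y, n, a, b):
    """Log beta-binomial marginal likelihood, omitting the binomial coefficient
    (it is common to all components and cancels in the weights)."""
    return log_beta(a + y, b + n - y) - log_beta(a, b)

def robust_posterior(y, n, w_inf=0.8, informative=(5.0, 107.0), vague=(1.0, 1.0)):
    """Posterior weight on the informative component and posterior mean of the
    control response rate under a two-component Beta mixture prior."""
    components = ((w_inf, informative), (1.0 - w_inf, vague))
    evidences = [w * math.exp(log_marginal(y, n, a, b)) for w, (a, b) in components]
    total = sum(evidences)
    post_weights = [e / total for e in evidences]
    post_means = [(a + y) / (a + b + n) for _, (a, b) in components]
    mean = sum(pw * pm for pw, pm in zip(post_weights, post_means))
    return post_weights[0], mean

# Hypothetical new control-arm data: 20 patients, consistent vs. conflicting with ~4.5%.
for y in (1, 8):
    w_post, mean = robust_posterior(y, 20)
    print(f"{y}/20 responders: weight on historical prior {w_post:.3f}, "
          f"posterior mean {mean:.3f}")
```

When the new control data agree with the historical rate, the informative component keeps most of the weight and the historical patients are effectively borrowed; under conflict the weight shifts to the vague component and the posterior is driven by the current data.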

Example 2 Adaptive Trial with Borrowing and RAR: Phase 2 in Systemic Lupus Erythematosus

In a randomized, double-blind, Phase 2 study [40] in patients with SLE, patients are to be randomized to one of four treatment groups: three doses of investigational product (IP) or placebo. The primary endpoint of the study is Systemic Lupus Erythematosus Responder Index 4 (SRI-4) response at 52 weeks, a dichotomous outcome where response indicates success. This endpoint will be evaluated using a Bayesian Hierarchical Model (BHM) with non-informative priors. The initial randomization ratio will be 1:1:1:1.

At each of the interim analyses, planned at eight prespecified time points, a response-adaptive randomization procedure will take place. Other planned adaptations include an adaptive rule that allows the primary endpoint at Week 52 to be changed to Lupus Low Disease Activity State (LLDAS) or BILAG-Based Composite Lupus Assessment (BICLA), and an adaptive rule that allows data from different dose levels to be pooled in the comparison to placebo for the primary analysis [40]. Simulations of competing designs were presented before this design was accepted.

Example 3 Adaptive Trial with External Data: Phase 3 Duchenne Muscular Dystrophy

Duchenne muscular dystrophy (DMD) is a challenging therapeutic indication for drug development. To date, no therapy has demonstrated a convincing benefit on a clinical endpoint in DMD. Clinical trials usually focus on the timed six-minute walk test (6MWT) or the North Star Ambulatory Assessment (NSAA). The high level of variability in ambulation-based endpoints has complicated the interpretation of several trials in DMD.

A Bayesian adaptive trial [41] was designed to serve as a basis for accelerated approval in the United States. This trial was selected for the FDA CID program. The primary endpoint was the change from baseline in dystrophin level, and the key secondary endpoint was the change in NSAA through 48 weeks. Multiple interim analyses were planned for the primary and key secondary endpoints.

Interim analysis for dystrophin could be used for accelerated approval, supplemented with dystrophin results from the ongoing open label trial.

The objective of the NSAA interim analysis is to potentially stop enrollment based on the predicted success of the 48-week analysis. The trial incorporates placebo augmentation using placebo data from past clinical trials. It uses a meta-analytic approach to dynamically determine the level of borrowing from previous data. A thorough simulation study was conducted to understand the operating characteristics of the trial.

Conclusion

Using external information to influence the conclusions of a clinical trial and running the trial in an adaptive way can help reduce sample size, increase power, reduce costs, and reduce ethical dilemmas. The Bayesian paradigm offers a principled way to implement these design features. The use of prior information can help augment the precision for decision-making and help reduce the sample size, duration, and cost. With or without prior information, a Bayesian approach can offer flexibility in the design and analysis of adaptive trials, especially when complex adaptations and predictive models are used. Additional effort is needed if one has to demonstrate that the Bayesian method controls Type I error for the final decision, because Type I error is a frequentist measure of uncertainty and is not automatically controlled by Bayesian methods.

Bayesian approaches require specialized statistical and computational expertise. Computational complexity was once a barrier to widespread use of Bayesian methods; this is no longer an issue with modern computational power and the availability of software to implement many design/analysis options. All innovative trials, whether using the frequentist or the Bayesian framework, ought to be planned using simulations to compare competing options; computations have become the norm in the planning of all trials. Yet because Bayesian methods themselves rely on extensive computations, trial simulations can be particularly resource-intensive for Bayesian methods. This, together with adjusting to the Bayesian way of thinking, can be challenging for researchers who are only familiar with the frequentist framework. Although this may appear burdensome, the potential advantage of a principled way to reduce sample size, increase power, reduce costs, and reduce ethical dilemmas can outweigh the cost of the initial learning curve.