European legislation states that market access for new drugs requires the same level of evidence, regardless of whether the drug is intended for rare or highly-prevalent diseases [1]. However, generating robust evidence with small subject samples is a methodological and logistic challenge [2] that may discourage sponsors from researching new treatments for rare diseases [3,4,5,6]. In addition, reports have warned of the potential risks of approving medicinal products when decision-making is based on limited data [4, 7,8,9,10,11,12,13,14].

Regulators prefer conventional trials to new designs because their estimates of benefit are generally regarded as less uncertain, they include larger pre-marketing safety populations, and they allow a better benefit-risk assessment and more confident decision-making. There are various reviews of the amount and quality of evidence supporting regulatory decisions on medicinal products intended for rare diseases or orphan indications, known as orphan medicinal products (OMP) under European regulations [1, 15,16,17,18], and of the potential risks of accelerated approval procedures when decision-making is based on limited data obtained using conventional methods [7,8,9,10,11,12].

Methodologies aiming to increase the statistical efficiency of clinical studies that might be useful in small populations have been proposed, but have mostly been applied to the clinical development of prevalent diseases, rather than rare diseases [19]. The reasons why such models are not applied to rare diseases may include the lack of predictability of regulatory requirements and sponsors’ fears of regulatory reluctance to accept non-standard methods.

Methodological guidance specific to clinical investigation of a particular disease is an effective method of providing a predictable decision-making framework [20], and is useful for developers and regulators. Such regulatory guidance for the clinical development of new medicinal products has been issued for many prevalent diseases for decades by the European Medicines Agency (EMA) [21], Food and Drug Administration (FDA) [22] and other regulatory agencies. However, there is limited disease- or medical condition-specific regulatory guidance on orphan and rare conditions: The EMA has issued two general guidance papers on small populations [23] and paediatric development [24], respectively. These provide general considerations on the rationale of regulatory assessments and the specificities of diseases that should be taken into account when tailoring clinical development to a specific clinical condition. In addition, some disease-specific documents have been issued, but for only 14 of the thousands of orphan medical conditions described [25]. The huge number of rare diseases hinders the development of disease-specific scientific, methodological, statistical and/or regulatory guidance, which would be time and resource consuming, but may not be necessary, as many diseases or situations have common features that may allow similar recommendations to be applied to their study.

From a regulatory and clinical development point of view, it may not be appropriate to refer to diseases, as defined by available medical classifications, to identify situations for which similar recommendations could be given, since the clinical development of OMP for a given disease is likely to depend also on the therapeutic approach, expected outcomes and feasible measurements, amongst other characteristics, and may differ substantially depending on the intended therapeutic indication. Thus, one disease may encompass different situations depending on the therapeutic indication (e.g. an acute infection in a patient with congenital immunodeficiency is a single acute episode with a short treatment and short time to outcome, but the underlying immune suppression is a chronic disease resulting from an underlying genetic defect requiring a permanent solution or lifelong treatment), so that the study of each indication may require distinct methodological approaches. It may therefore be better to talk of medical conditions, resulting from the combination of disease and therapeutic indication for a given product, rather than of diseases.

The key first step towards improvement is to describe the current regulatory basis for approval of OMP and identify potential areas for improvement in the robustness of the data supporting regulatory decisions. In addition, knowing the reference standard is required to explore the potential impact of new statistical methods, such as those arising from the ASTERIX project [26], on the overall process of development and regulatory decision-making. Identifying uncertainties at the time of regulatory decision-making on OMP will help to focus on the areas where greater robustness of the data obtained during clinical development is most needed.

Rare diseases have in common a low prevalence but are otherwise widely heterogeneous clinically. We therefore aimed to propose a grouping of medical conditions that was sound from a regulatory and methodological perspective and could facilitate the selection of examples for testing the applicability of new methodologies. Accordingly, we developed a clustering based on medical conditions, as defined by two principal features: (i) the clinical disease and therapeutic approach or intended indication to be claimed by the OMP, and (ii) the characteristics of the condition driving requirements for the applicability of different methodologies and designs of clinical studies.

The aim of this study is to summarise the current regulatory basis for approval of OMP by the EMA as a reference standard, systematized using a clustering of medical conditions, and to propose ways of managing the uncertainties identified and areas for improvement.


Development of the clustering framework

Three steps were used to build the clustering of medical conditions. First, initial clustering was made using an unsupervised statistical method −multiple correspondence analysis (MCA) [27,28,29] − based on potentially-informative criteria (clinical characteristics, treatment of interest, endpoints and variables, feasibility of recruitment, available treatments and treatment targets) for a representative differential set of 27 medical conditions. Secondly, the clustering was interpreted and refined by consensus between experts from different fields (regulatory, statistics, clinical). Thirdly, the clustering was validated in a larger, comprehensive set of orphan medical conditions and by an external panel of clinicians, methodologists and regulators.
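As an illustration only, the core of an MCA can be sketched in a few lines of Python with NumPy: the categorical criteria are coded as an indicator matrix, and a correspondence analysis of that matrix (via singular value decomposition) gives coordinates for each condition. The function name `mca_row_coordinates`, the three criteria and the six toy records are hypothetical and are not taken from the study data; the sketch does not reproduce the software or settings used in this study.

```python
import numpy as np

def mca_row_coordinates(records, n_dims=2):
    """Return (row coordinates, principal inertias) of an MCA on categorical records."""
    n_vars = len(records[0])
    # One indicator (dummy) column per observed category of each variable.
    columns = [(j, level)
               for j in range(n_vars)
               for level in sorted({rec[j] for rec in records})]
    Z = np.array([[1.0 if rec[j] == level else 0.0 for j, level in columns]
                  for rec in records])
    P = Z / Z.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardised residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    coords = (U * sing) / np.sqrt(r)[:, None]   # principal row coordinates
    return coords[:, :n_dims], sing ** 2

# Hypothetical records: (course, type of primary endpoint, recruitment feasibility).
toy = [
    ("acute", "clinical", "easy"),
    ("acute", "clinical", "hard"),
    ("chronic", "surrogate", "hard"),
    ("chronic", "surrogate", "easy"),
    ("chronic", "clinical", "hard"),
    ("acute", "surrogate", "easy"),
]
coords, inertias = mca_row_coordinates(toy)
print(coords.round(2))
```

Conditions that lie close together in the first dimensions share similar category profiles, which is the kind of pattern an initial, unsupervised grouping can start from before expert refinement.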

The larger set of conditions consisted of all authorised OMP for which there are European Public Assessment Reports (EPAR) on the EMA webpage [30] since the inception of the Orphan Act until December 2014, and with active OMP designation at the time of authorisation (N = 125). The unit analysed was the EPAR, that is, the pairing of an OMP with its medical indication, which is the unit assessed in the regulatory evaluation; the orphan medical indication is referred to from now on as “medical condition”.

The overall process was carried out by 12 investigators with different backgrounds and expertise (public and industry drug development, medical research, statistics, medical practice, regulation, reimbursement and patient networking), with the involvement of a panel of additional external experts in the last phase.

Development of the reference on regulatory basis for approval of OMP

The pivotal evidence supporting approval of the 125 OMP with marketing authorisations was extracted using variables describing the methods and key results of the dataset summarised in the EPAR (Additional file 1: Table S1). The data was analysed descriptively to identify the areas where regulatory decision-making deviated from the usually-accepted standards (i.e. statistically-significant and clinically-relevant demonstration of efficacy obtained from two replicate well-designed clinical trials [31], and a safety database compliant with ICH E1 standards [32]), and to describe areas of regulatory uncertainty. Only trials identified or referenced as pivotal in the EPAR were analysed (generally phase III or phase II trials), since these are the trials supporting the risk/benefit assessment. The analysis was systematized according to the six clusters of medical conditions for which marketing authorisation of the OMP was sought. Prevalences were extracted from OMP designations.

Frequencies and percentages (n (%)) were used to describe qualitative variables, and mean (SD) or median (P25-P75) for quantitative variables, as appropriate.
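The summary formats above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the helper names and the input values are hypothetical, and the quartiles use the "inclusive" interpolation rule, which is only one of several conventions for P25 and P75.

```python
from statistics import mean, median, quantiles, stdev

def describe_qualitative(count, total):
    """Format a frequency as 'n (%)'."""
    return f"{count} ({100 * count / total:.0f}%)"

def describe_quantitative(values):
    """Format a quantitative variable as mean (SD) and median (P25-P75)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean_sd": f"{mean(values):.1f} ({stdev(values):.1f})",
        "median_iqr": f"{median(values):g} ({q1:g}-{q3:g})",
    }

# Hypothetical data: 12 of 40 reports with a feature; pivotal trials per application.
print(describe_qualitative(12, 40))
summary = describe_quantitative([1, 1, 1, 2, 2, 3])
print(summary["mean_sd"], "|", summary["median_iqr"])
```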


A total of 125 EPARs were analysed that included positive opinions for 98 different active substances (14 active substances had > 1 authorised orphan indication, with a maximum of 4) authorised in 84 different orphan medical indications (20 orphan indications had positive opinions for > 1 OMP, maximum of 7).

Clustering of medical conditions

The process of clustering converged, resulting in six clusters: (1) conditions with single acute episodes, (2) conditions with recurrent acute episodes, (3) chronic slow or non-progressive conditions, (4) progressive conditions led by one organ-system, (5) progressive multidimensional conditions and (6) chronic staged conditions. The prevalence of the condition (rare: ≤5/10,000 and > 1/100,000 and ultrarare: ≤1/100,000) was taken into account due to potential implications of the limited feasibility of certain types of design and the implications for regulatory assessment [33] (Fig. 1 and Table 1).
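The prevalence thresholds quoted above can be written as a small Python function. This is a sketch for illustration: the function name and the "not rare" label for prevalences above 5/10,000 are assumptions introduced here, and exact fractions are used so that values on the thresholds are classified without rounding error.

```python
from fractions import Fraction

RARE_UPPER = Fraction(5, 10_000)        # rare: <= 5 per 10,000
ULTRARARE_UPPER = Fraction(1, 100_000)  # ultrarare: <= 1 per 100,000

def prevalence_category(cases, population):
    """Classify a prevalence given as cases per population."""
    prevalence = Fraction(cases, population)
    if prevalence <= ULTRARARE_UPPER:
        return "ultrarare"
    if prevalence <= RARE_UPPER:
        return "rare"
    return "not rare"  # label assumed for this sketch

print(prevalence_category(1, 100_000))
print(prevalence_category(2, 100_000))
print(prevalence_category(6, 10_000))
```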

Fig. 1 Proposed clusters of conditions

Table 1 ASTERIX clustering of orphan medical conditions

Eighty-five medical conditions (pairs of diseases with their corresponding therapeutic indications) were identified from the 125 EPARs published between 1999 and 2014. All medical conditions were uniquely assigned to one cluster (Additional file 1: Table S2). Chronic staged conditions formed the largest cluster of EPAR (38/125, 30%), and conditions with recurrent acute episodes the smallest (9/125, 7%).

Regulatory standard

Fifteen (15/125, 12%) OMP authorisations were granted in the absence of evidence from clinical trials; of these, nine were based on literature reports summarising the clinical experience of well-established use of products that had been available for many years as compounded medication or as medicinal products used off-label, four were based on observational retrospective studies collecting data on clinical practice with the OMP, and two on data from compassionate use programmes. Thus, 110 applications were based on clinical trials (Table 2).

Table 2 Description of European Public Assessment Reports (EPARs) of orphan medicinal products

The 110 OMP authorisations based on clinical trials included a total of 159 pivotal clinical trials. The mean (SD) number of pivotal trials per marketing authorisation application (MAA) was 1.4 (0.7): 38 applications were based on ≥2 pivotal trials (35% of MAA based on clinical trials, 30% of all MAA of OMP). Applications for chronic conditions with stable or slow progression had the highest mean number of pivotal trials, and applications for chronic progressive conditions led by multiple system/organs and chronic staged conditions the smallest. In addition to pivotal trials, a mean of ≥2 supportive trials were included in MAA in all clusters, with conditions with recurrent acute episodes having > 4 supportive trials per MAA.
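The summary figures in this paragraph follow directly from the counts it reports, as the short Python check below shows. Only the variable names are introduced here; the unrounded mean is about 1.45 pivotal trials per application.

```python
pivotal_trials = 159    # pivotal trials across applications based on clinical trials
maa_with_trials = 110   # applications (MAA) based on clinical trials
all_maa = 125           # all OMP applications analysed
maa_replicated = 38     # applications with >= 2 pivotal trials

mean_trials_per_maa = pivotal_trials / maa_with_trials
share_of_trial_based = 100 * maa_replicated / maa_with_trials
share_of_all = 100 * maa_replicated / all_maa

print(f"{mean_trials_per_maa:.2f} pivotal trials per MAA")
print(f"{share_of_trial_based:.0f}% of MAA based on clinical trials")
print(f"{share_of_all:.0f}% of all MAA")
```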

Twenty (12.6%) pivotal trials did not fulfil the main study objective. The highest proportion of positive trials was for chronic staged conditions, whilst one third of pivotal trials on chronic conditions with stable or slow progression did not meet the main end-point. Thirteen MAA (11.8%) of those based on evidence from clinical trials did not include any pivotal trial fulfilling its main objective. Chronic staged conditions had the lowest proportion of authorisations based only on negative trials. The conclusions of 20 (12.5%) pivotal trials were based on analysis of subgroups; this represented 18/110 (16.3%) of MAA based on clinical trials; of these, 13 were predefined and five were decided post-hoc.

Half the pivotal clinical trials in MAA were double-blinded, ranging from 92.3% of trials for conditions with recurrent acute episodes to 26.9% for chronic progressive conditions led by one system/organ. Randomisation was applied in all pivotal trials for conditions with recurrent acute episodes and in 86% of those for chronic progressive conditions led by multiple system/organs, but in only 38.5% for chronic progressive conditions led by one system/organ and 52% for conditions with single acute episodes. Placebo controls were used in 92.3% of trials for conditions with recurrent acute episodes but only in 19.2% of trials for chronic progressive conditions led by one system/organ and 25.9% for conditions with single acute episodes. Active controls were used in < 20% of trials in all clusters. Single arm trials were the most frequent design in chronic progressive conditions led by one system/organ (61.5%), and frequently used in conditions with single acute episodes (44.4%), while two trial arms were more frequent in conditions with recurrent acute episodes (84.6%) and chronic progressive conditions led by multiple system/organs (76.2%); three or more trial arms were used to a relevant extent only in chronic staged conditions (37.8%). Parallel design was the most frequent setting for comparative trials. Crossover or other designs were infrequent.

Most trials in clusters for chronic conditions used intermediate primary variables; only conditions with recurrent acute episodes used mainly clinical variables as primary outcome (84.6% of trials). Discrete primary variables were used more frequently in clusters of conditions with single acute episodes and chronic progressive conditions led by one system/organ (74.1% and 69.2% of trials, respectively). Continuous variables were frequently used for trials of chronic progressive conditions led by multiple system/organs and conditions with recurrent acute episodes (61.9% and 61.5% of trials, respectively). Time variables were used frequently (46.7%) for chronic staged conditions. Chronic conditions with stable or slow progression had the highest proportion of trials with multiple primary endpoints (14.8%). Most trials had a superiority objective, but in 69.2% of trials in the cluster of chronic progressive conditions led by one system/organ the objective was to estimate value.

The safety population (number of patients exposed to the product) was smaller for ultrarare conditions [median (IQR): 28 (22–64)] than for rare or very rare conditions [median (IQR): 151 (65–298)]. The cluster of progressive multidimensional conditions included the most ultrarare conditions (5/10) and also had the smallest datasets.
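To give a sense of what safety populations of this size can and cannot exclude, the following Python sketch computes the exact upper confidence bound for the rate of an adverse reaction that was not observed in any exposed patient (the basis of the so-called "rule of three", roughly 3/n at 95% confidence). This is an illustrative calculation, not an analysis performed in this study; the function name is hypothetical and the two sample sizes are simply the medians quoted above.

```python
def upper_bound_zero_events(n_exposed, confidence=0.95):
    """Exact one-sided upper bound for an event rate when 0 events occur in n patients.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n_exposed)

for n in (28, 151):
    bound = upper_bound_zero_events(n)
    print(f"n = {n}: 0 events is still compatible with a rate up to {100 * bound:.1f}%")
```

Under these assumptions, a reaction never seen among 28 exposed patients could still occur in roughly 10% of patients, and in roughly 2% with 151 patients exposed.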

The uncertainties derived from the analysis of the data supporting OMP regulatory approval are summarised in Table 3.

Table 3 Regulatory uncertainties identified


Summary of findings

We analysed the current basis for regulatory approval for OMP in the European Union (EU). The results show that 88% (110/125) of OMP authorizations were based on clinical trials, of which only 35% complied with the usual regulatory standard of ≥2 replicated pivotal trials [34]. The mean number of pivotal trials per indication was 1.45 and half the pivotal trials were phase II trials. Likewise, 13% of OMP approvals included clinical trials that did not meet their main objective, which could be considered consistent with the theoretically-expected number of false negatives in a standard scenario, but almost 10% of EPAR were authorised based only on negative trials. The overall size of the exposed population at the time of authorisation was generally lower than that required for the quantification of clinically-relevant adverse reactions [32]. Previous reports have described similar results concerning the number of trials and the proportion of phase III trials, but none has reported on the proportion of negative trials [35].

Quality of scientific evidence

One-third of trials did not include a control arm, one-third did not use randomisation, half were open-label and 75% used intermediate or surrogate variables as the main outcome. These characteristics differ substantially from the recommended standards [36]. Differences between trials in orphan medical conditions compared with those in prevalent conditions have been reported, including a higher frequency of non-controlled study designs, the lesser use of randomized allocation of patients, a higher percentage of open-label trials and fewer individuals enrolled [4, 15, 16, 37, 38]. As expected, noticeably smaller sample sizes are reported for ultra-rare diseases (prevalence < 1/100,000) compared with more prevalent rare diseases (prevalence between ≥1/100,000 and 50/100,000) [39]. All these features are related to the risk of bias, and may increase type 1 error, suggesting that current evidence supporting OMP authorizations might be biased towards a higher chance of positive results [40].

Although pivotal trials generally included small numbers of patients, the EPAR included a median of three additional supportive studies (i.e. non-pivotal trials) per authorised indication. In general, the median number of supportive trials was double the number of pivotal trials, suggesting that more patients could have been recruited into pivotal trials and that bigger sample sizes might have been feasible; this would have allowed smaller effects to be detected, increased power and potentially reduced the likelihood of negative trials [40]. Supportive trials were likely a relevant source of additional data to support decision-making, especially in applications including no pivotal trials, those based on one single pivotal trial and, above all, those based only on negative trials. Supportive studies contribute to the assessment of dose ranging, the clinical relevance of main end-points, and the duration of effects and safety issues, and are a source of complementary information in a setting of a scarcity of pivotal evidence [36]. Thus, in the context of the relative scarcity of data in OMP dossiers, supportive studies become especially relevant, and it is of utmost importance to maximise the quality of every study conducted during product development, from early proof-of-concept trials to open-label extension safety cohorts.
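The link between sample size, detectable effect and power can be illustrated with the usual normal-approximation formula for a two-arm parallel comparison of means, n per arm = 2(z_{1-α/2} + z_{1-β})² / d², where d is the standardised effect size. The sketch below is a generic textbook calculation under those assumptions, not a reanalysis of the trials in the EPAR; the function name and the effect sizes are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided, two-arm comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

for d in (0.8, 0.5, 0.3):
    print(f"standardised effect {d}: {n_per_arm(d)} patients per arm")
```

Because the required sample size grows with the inverse square of the effect size, pooling patients into fewer, larger pivotal trials is one way to make smaller effects detectable at the same error rates.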

These findings suggest that, on the one hand, the generation of robust scientific evidence for OMP is a hard challenge and, on the other hand, that regulators are often taking decisions on OMP based on weak scientific evidence [15, 41, 42].

Findings in clusters of medical conditions

Authorization in the absence of clinical trials was more frequent in the cluster of chronic progressive conditions led by multiple system/organs, which included many inherited diseases affecting children. There were a number of EPAR that recognised well-established uses of products already available in clinical practice, whose authorization was probably unavoidable [43]. The applications included retrospective studies, which have a low level of robustness and are a source of uncertainty for decision-making, as well as prospective registries and compassionate use programmes. The latter may allow structured, complete information on effectiveness and safety to be obtained, provided that they are designed with their future utility in mind, as a source of data for priors in Bayesian designs or as an external reference [44]. However, the data are not comparative and are of little value in assessing causality [36]. Specific meta-analytical techniques can be applied to studies to ease the interpretation of data at the time of regulatory assessment [40].

Negative trials were observed across all clusters, but less frequently in conditions with recurrent acute episodes and chronic staged conditions. The clinical setting of conditions with recurrent acute episodes allows designs based on repeated measurements and paired data, both of which increase the efficiency of trials [36]. In the case of chronic staged conditions, the smaller number of negative trials might be related to an overall greater number of patients included than for other clusters, but the fact that the trials were often open-label may have also contributed [40, 45].

In 61.5% of pivotal trials for chronic progressive conditions led by one system/organ and 44.4% of those for conditions with single acute episodes the design had an inherently-low potential to conclude causality, due to lack of control and open-label designs with a single arm. Both clusters included many serious conditions with a lack of an acceptable standard of care. The willingness to provide any potential treatment (even in a scenario of huge uncertainty) for patients lacking alternatives, in response to the ethical principle of beneficence, may have precluded the conduct of comparative designs [46]. In such a setting, efficacy may be overestimated for many reasons (lack of comparator, lack of blinding, use of historical controls with different background therapies and reliance on surrogate, non-validated variables based on subjective assessments, amongst others). Thus, granting regulatory approval without conclusive information is a reason for concern for patients, since there is a poor basis on which to determine the efficacy and safety of the new products [44].

The percentage of EPAR based on replicated trials was < 20% in the cluster of chronic progressive conditions led by multiple system/organs, which also had the lowest mean number of patients exposed. This may be because this cluster includes many ultrarare and often inherited paediatric conditions, where the feasibility of recruitment is limited and, accordingly, few subjects could potentially be recruited for (replicated) trials. By contrast, although the cluster of staged conditions also had < 20% of EPAR based on replicated pivotal trials, with evidence based mainly on one (often phase II) trial, this cluster represented mostly adult malignancies, with no ultrarare conditions, and with the highest mean number of exposed patients. This suggests that the lack of replicated trials in this case is not related to the disease prevalence, but rather to the reduced requirements due to early access policies in the context of perceived severity and medical need. In fact, warnings on the overestimation of benefits at the time of approval under early access policies have been raised [47].

The cluster of conditions with single acute episodes had a higher proportion of decisions based on data other than clinical trials or on negative trials, taken in the absence of positive trials and lacking replicated trials, suggesting that clinical research may be especially challenging for many reasons in this cluster.

Conclusions based only on subgroup analysis were observed in 13% of trials, but in one-third of positive opinions for chronic progressive conditions led by one system/organ, and in some cases these were post-hoc subgroup analyses of otherwise negative trials. These conditions are characterized by a poor prognosis that makes it ethically difficult to conduct conventional controlled double-blind parallel trials, but also by substantial clinical heterogeneity. However, the EMA [48] warns against the risks of subgroup analyses potentially leading to unreliable inferences and, consequently, poor decisions, due to their increased probability of false-positive findings, especially if data-driven, and makes specific mention of the inappropriate “rescue” of negative trials through subgroup analysis. Thus, particular care should be taken over the pre-specification of subgroups in this setting.
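The increased probability of false-positive findings can be made concrete with a simple calculation. Assuming, for illustration only, that k subgroups are each tested independently at a significance level α when there is no true effect, the chance of at least one nominally significant result is 1 − (1 − α)^k. The function name below is hypothetical, and real subgroup tests are usually correlated, so the figures indicate an order of magnitude rather than exact probabilities.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false-positive among independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10):
    print(f"{k} subgroup test(s): {100 * familywise_error(k):.1f}% chance of a spurious finding")
```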

The type of primary variables (discrete vs continuous, final vs surrogate, time to event) allowed discrimination between clusters. Clusters including chronic conditions mainly had primary variables based on surrogates; for chronic progressive conditions led by one system/organ, the variables were often functional and based on subjective assessment. While surrogates have many benefits in that they may improve trial power and the ability to describe product activity, warnings on overreliance on intermediate variables have repeatedly been made: surrogates may not actually predict clinical benefits, can mislead physicians on whether a drug works and have the potential to expose patients to poorly effective treatments or unanticipated adverse effects [4].

Study limitations

The study had a number of limitations. First, it was based only on data from medicines approved in the EU that held an orphan drug designation (ODD) when they received marketing authorisation from the European Commission. Three groups of medicines were excluded: (a) medicines authorised before the orphan drug legislation entered into force, (b) medicines without an ODD, and (c) medicines that held an ODD during development, but not at the time of marketing authorisation. Comparisons to standards in other regions, or to decisions issued by other regulatory bodies, were outside the scope of the current exercise. Secondly, regulatory evidence was analysed using only conditions for which an approved OMP was already available, and this may be regarded as a source of bias, because successful OMP may over-represent conditions for which conventional research methods are actually applicable, making new treatments easier to study and develop [38]. Partial selection of the data used to describe current practice may lead to a biased picture of the actual methods used in clinical research for OMP. However, the available information on negative opinions has only recently been published, and is less extensive than that for positive opinions [30], and there are no other publicly available sources for systematized information on the evidence supporting regulatory decisions. In addition, the description of the regulatory standard in authorised OMP showed that replicated parallel randomised double-blind trials were not the rule.

Thirdly, product labelling has been proposed as a flawed source for the study of orphan drug approvals [4]. However, EPAR include detailed information on the basis for regulatory decisions, including thorough discussion on the strengths and weaknesses of data [30]. Even so, there was heterogeneity in the extent and detail of the EPAR over time, so that the reliability of information on specific trial details, e.g. pre-definition of subgroup analysis, cannot be ensured. We may have overestimated some parameters due to a lack of details in the EPAR; similar limitations have been reported [35]. Fourthly, we did not extract details on the actual statistical methods applied (e.g. adaptations, interim analyses or type of adjustments for multiplicity). Fifthly, we compared the robustness of data supporting regulatory decisions using conventional methodological standards as a reference [36], but did not consider other aspects such as the effect size, the degree of unmet medical need or contextual considerations. Thus, the possibility that we have overestimated the weakness of the supportive evidence cannot be ruled out. However, such criteria, when mentioned in EPAR, are presented as narrative statements under the risk-benefit considerations, are not systematized, and generally refer to the singularity of cases. Due to the lack of available references on the acceptability of these criteria for robustness of data, we limited our analysis to conventional items on methodological quality. Finally, we focused our analysis on the areas of uncertainty at the time of decision making, but did not study whether uncertainty resulted later in lack of effectiveness in real life, or drug withdrawal for safety reasons; such an objective was outside the scope of the current work, and would require further investigation.

The clustering proposal was built based on a limited number of conditions, which could be regarded as too small to be representative of the overall complexity of the huge number of orphan and rare conditions [25]. However, the description of the regulatory standard across the clusters showed that the EPAR included similar situations and methodological approaches to the development of OMP that were shared by several OMP within a given cluster, and is useful in identifying where the key challenges in the design and selection of outcomes for a given development in different groups of medical conditions lie.

The development of new methodologies and statistical approaches to the study of rare diseases has been boosted in recent years, in part thanks to the FP7 initiative funding three projects (ASTERIX, IDeAl, and InSPiRe) [49] on improving methods suited to the study of small populations. However, the translation of statistical advances to practice has traditionally been a challenge, because of perceived technical complexity and regulatory reluctance to deviate from the double-blind randomised gold standard. Any initiative aimed at facilitating the dissemination of methods and focused guidance may help to improve their uptake and, consequently, may facilitate better research into OMP. Such an unmet need was noted in a recent expert discussion (Small Population Clinical Trials Task Force led by IRDiRC [2]), which agreed that a classification of rare diseases suitable to discuss the potential application of different study methods or designs was required. Our clustering proposal might be a contribution to this aim. By bridging the distance between too general guidance and unfeasible disease-specific guidance, it may help to systematize such dissemination and guidance. Our proposal differs from other medical or clinical classifications [25, 50, 51] in that the proposed clusters group rare medical conditions, rather than rare diseases, and may be a pragmatic way of identifying situations where new developments are required, and where newly developed methods could add value. Our proposal may require further validation and refining if new conditions appear that are unclassifiable but, until now, it has been adequate to describe the current situation for authorised OMP in the EU, and to systematize situations where certain methodologies or study designs may be applicable in order to structure the output of the ASTERIX project.


Our description of the regulatory evidence supporting OMP authorization has identified substantial uncertainties, such as weaker protection against type 1 and type 2 errors, the use of designs unsuited to drawing conclusions on causality, the use of intermediate variables without validation, a lack of a priori specification and insufficient safety data to quantify risks of relevant magnitude. Some of these features are not exclusive to rare diseases and some may be unavoidable in some situations because of the sometimes (ultra-) rare nature of the disease. However, it is reasonable to assume that there are opportunities for improvement, including increasing the application of available methods and designs that may be more efficient or robust in small populations, but also the development of novel methods better suited to these conditions. A clustering of medical conditions based on the convergence of clinical features and their methodological requirements is proposed, aimed at facilitating the production of specific methodological and regulatory recommendations, and as a framework for the testing and validation of new methods for the study of OMP.