Key Points

Fewer than half of the authorized orphan drugs for metabolic diseases show good effectiveness in the real world.

Of the drugs with unclear efficacy at the time of authorization, only 21% had good real-world effectiveness.

The use of a clinical or validated surrogate primary endpoint in the pivotal study seems to be the most important factor associated with good real-world effectiveness.

1 Introduction

In the European Union (EU), a disease is considered orphan if it is a life-threatening or seriously debilitating disorder with a prevalence of less than 5 in 10,000. To date, about 7000 rare diseases have been documented, affecting 30–40 million people in the EU [1]. In the past 16 years, 130 orphan drugs have been marketed. Initially, the small consumer market offered little incentive for the pharmaceutical industry to develop drugs for rare diseases. To stimulate the development of orphan drugs, European legislation was introduced in 2000. Under this legislation, a product with orphan status entitles the pharmaceutical company to (1) a fee reduction for scientific advice, (2) reduced fees for the marketing authorization application, (3) centralized registration in the EU, and (4) 10 years’ market exclusivity after marketing authorization [2]. Since the implementation of the Orphan Drug Legislation and its incentives, orphan medicinal products (OMPs) have been developed at an increasing rate. In 2015, for example, the European Medicines Agency (EMA) approved 14 OMPs, whereas in 2000 only three OMPs were approved [3]. In addition, more than 40% of all registered OMPs have been developed in the past 5 years [4]. The development rate of OMPs is expected to increase further [5].

Since patients with rare diseases have the same right of access to effective and safe drugs as all other patients, robust evidence is needed regarding benefits and risks. When evaluating OMPs, however, regulatory authorities are often confronted with difficulties that are inherent to the rarity of the diseases for which the drugs are developed. Interpretation of data is often hampered because clinical trials may lack a placebo control, be open label or be performed in a limited number of patients. Moreover, the heterogeneity of each disease and the frequent use of non-validated biomarkers or functional outcome measures (FOMs) may hamper reliable interpretation of data [6,7,8]. It is important to emphasize that biomarkers or FOMs are only ‘surrogate endpoints’ if they are validated (Box 1).

The regulatory authorities assess the benefit/risk ratio of OMPs based on the evidence provided in the dossier submitted to the EMA. If the evidence is insufficient to conclude that the benefit/risk ratio is positive, and thereby to grant a full marketing authorization, the EU legislation allows two alternatives. The first is a ‘conditional approval’, which can be granted for drugs when the clinical data are limited but indicate a positive benefit/risk ratio, and when it is likely that additional data will be provided within a reasonable timeframe. If the additional data confirm the positive benefit/risk ratio, the conditional approval can be changed to a full marketing authorization. The second is a marketing authorization under ‘exceptional circumstances’, which is granted when limited data are provided at the time of the marketing authorization application and additional data cannot reasonably be expected to be delivered in the future due to, for example, the rarity of the disease [9]. Nonetheless, the marketing authorization holder (MAH) is generally requested by the regulatory authorities to collect additional data on safety and long-term effectiveness in the post-marketing period.

After marketing authorization, regulatory authorities in each EU member state make their own assessment before granting reimbursement. These national assessments may, for example, take into account relative effectiveness (the ‘net benefit’ of the new drug needs to be desirable, sufficiently large and a relevant addition to current care), the probability of real-world effectiveness, and cost-effectiveness. The debate on real-world effectiveness has recently been fueled by the authorization of a drug for mucopolysaccharidosis (MPS) IV, elosulfase alfa (Vimizim®). It was granted a full marketing authorization by the EMA on the basis of a favorable benefit/risk ratio regarding quality, safety and efficacy data submitted by the manufacturer [10]. This was mainly based on a short-term study that showed a significant increase in mobility, using the 6-min walk test (6-MWT). However, it was rejected for reimbursement by several national authorities (e.g., The Netherlands, Sweden, Belgium and Spain). The Dutch reimbursement authority (Zorginstituut Nederland, ZiN) stated that the data are limited to a suboptimal short-term outcome in a heterogeneous population, showing a small effect with a wide confidence interval that is not clinically relevant, thereby estimating a small probability of good real-world effectiveness.

This difference between the opinion of the EMA (or other regulatory authorities such as the US Food and Drug Administration) and the (estimated) real-world effectiveness is often referred to in the literature as the ‘efficacy-effectiveness gap’ and is particularly an issue in the field of orphan drugs [11]. To support harmonized decisions by the EMA and the national reimbursement authorities in the future, it is important to bridge this gap. Hence, the purpose of our research project is to investigate whether data used for authorization (i.e. data from the pivotal studies) are predictive of real-world effectiveness (i.e. effect on clinical endpoints in clinical practice and post-marketing), and to explore which factors contribute to the efficacy-effectiveness gap. In this article we focus on OMPs for metabolic diseases. Based on the results of this study we aim to provide recommendations for both investigators and regulators in order to reduce this gap and improve both access to and appropriate use of such medicines.

2 Methods

The ‘Community register of orphan medicinal products for human use’ of the European Commission was used to identify all OMPs that were authorized for the treatment of metabolic diseases up to 1 January 2016 [14]. A detailed description of study methods can be found in Supplementary Table 1.

2.1 Pre-Marketing Data

To collect information on the pivotal studies (pre-marketing data), European Public Assessment Reports (EPARs) were used, which are available on the website of the EMA. We assessed the quality of evidence of the pivotal studies using the COMPASS (Clinical evidence of Orphan Medicinal Products–an ASSessment) tool [15]. In addition, we scored the efficacy of the OMPs based on the effect on primary and secondary endpoints in the pivotal studies and the considerations of the EMA as set out in the EPARs as follows: category 3 = good efficacy, category 1 or 2 = unclear efficacy, and category 0 = no effect (Table 1).

Table 1 Categories of efficacy and effectiveness

2.2 Post-Marketing Data

We divided the evaluation of effectiveness into two phases: (1) relative effectiveness at the time of reimbursement decisions by the Dutch National Health Care Institute, and (2) real-world effectiveness, including all post-marketing literature that was available until June 2016, supplemented with experts’ and patients’ opinions. If an OMP was authorized after January 2014, it was assumed that no reliable estimate of the real-world effectiveness could be made. Similar to the pre-marketing data, the effectiveness of each OMP was divided into three categories as defined by the authors: category 3 = good effectiveness, category 2 or 1 = unclear effectiveness, or category 0 = no effect (Table 1). In addition, experts were interviewed and their opinions were used to up- or downgrade the category a drug received by one category at most. Expert opinions were also used to select relevant post-marketing studies. Patients were approached to fill in a short online survey with questions about their main symptoms and their opinion on the choice of endpoints in clinical trials. Patients’ opinions were taken into consideration when identifying relevant clinical endpoints.

2.3 Statistical Analysis

To explore which factors contribute to the efficacy-effectiveness gap, the variables of the COMPASS tool as well as the relative and real-world effectiveness were dichotomized and compared with 2 × 2 tables. COMPASS variables were dichotomized as follows: type of primary endpoint was divided into clinical endpoints and surrogate endpoints versus non-validated biomarkers and FOMs, study population into extreme selection of patient population versus representative selection of patient population, type of marketing authorization into exceptional or conditional versus full authorization, study phase into phase III versus other phases, randomization into yes or no, and disease prevalence into rare or ultra-rare. Relative and real-world effectiveness were divided into ‘no or unclear effect’ (category 0, 1 or 2) or ‘good effect’ (category 3). Fisher’s exact test was used to assess statistical significance. A similar analysis was performed to compare efficacy [dichotomized into ‘no or unclear efficacy’ (category 0, 1 or 2) or ‘good efficacy’ (category 3)] and effectiveness. All analyses were performed using IBM SPSS Statistics version 22.
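The dichotomization step described above can be illustrated with a minimal sketch. This is not the SPSS procedure used in the study. The function names, the endpoint labels and the example records are hypothetical and serve only to show how category scores (0 to 3) and the type of primary endpoint could be collapsed into a 2 × 2 table.

```python
# Illustrative sketch only; names and example records are hypothetical, not study data.

# Endpoint types treated as one group (clinical or validated surrogate endpoints);
# everything else (non-validated biomarkers, FOMs) falls in the other group.
VALIDATED_ENDPOINTS = {"clinical", "surrogate"}


def dichotomize(category: int) -> str:
    """Collapse a category score: 3 = good effect, 0-2 = no or unclear effect."""
    if category not in (0, 1, 2, 3):
        raise ValueError(f"unknown category: {category}")
    return "good" if category == 3 else "no_or_unclear"


def two_by_two(records):
    """Build [[a, b], [c, d]] from (endpoint_type, category) pairs.

    Rows: clinical/surrogate endpoint, then biomarker/FOM.
    Columns: good effect, then no or unclear effect.
    """
    table = [[0, 0], [0, 0]]
    for endpoint_type, category in records:
        row = 0 if endpoint_type in VALIDATED_ENDPOINTS else 1
        col = 0 if dichotomize(category) == "good" else 1
        table[row][col] += 1
    return table


# Hypothetical input: (type of primary endpoint, real-world effectiveness category)
example_records = [
    ("clinical", 3),
    ("surrogate", 2),
    ("biomarker", 1),
    ("FOM", 3),
    ("biomarker", 0),
]
example_table = two_by_two(example_records)
print(example_table)
```

The resulting table can then be passed to any implementation of Fisher’s exact test.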

3 Results

3.1 Orphan Medicinal Products (OMPs) and Indications

We included 27 OMPs in our study, which were authorized for 25 metabolic orphan diseases (Table 2). Four OMPs were authorized for two different indications (miglustat, carglumic acid and pasireotide) or for different age groups (alglucosidase alfa), adding up to a total of 31 OMPs.

Table 2 Overview of orphan medicinal products (OMPs) for metabolic diseases authorized in the EU between 2000 and 2016

3.2 Pre-Marketing Data–Characteristics of Pivotal Studies

From the EPARs of the above-mentioned 31 OMPs, 40 ‘pivotal’ or ‘main’ studies were identified (Table 3). Nineteen of these 40 (47%) studies were phase III studies, and the majority were multicentre (30/40, 75%) and multinational (25/40, 63%). In 26/40 (65%) studies the allocation was randomized and a control arm was used in 27/40 (67%). Nineteen of the 40 (47%) studies applied some type of blinding. In 6/40 (15%) studies, the study population was an extreme selection of the patient population due to strict inclusion criteria (e.g., only men were included, or a considerable proportion of patients were physically unable to perform the primary outcome measure test). Five of the 31 (16%) OMPs were authorized on the basis of an improvement on a clinical endpoint, while changes in FOMs or biomarkers resulted in the approval of 6/31 (19%) and 20/31 (65%) OMPs, respectively. Of the latter 20, seven used biomarkers that were classified as surrogate endpoints.

Table 3 Characteristics of pivotal studies (n = 40)

3.3 Pre-Marketing Data–Efficacy

Category 3 (good efficacy, see Table 1) was assigned to 11/31 (35%) OMPs, category 2 (unclear efficacy) to 9/31 (29%) and category 1 (unclear efficacy) to 11/31 (35%) (Table 2). Supplementary Table 2 gives an overview of the EMA considerations upon authorizing the individual OMPs, including the post-marketing commitments as agreed upon with the MAHs.

3.4 Post-Marketing Data–Relative Effectiveness at the Time of Reimbursement Decisions

Eleven OMPs were not yet assessed by the Dutch National Health Care Institute for one of the following reasons: (1) marketing authorization was granted only recently, (2) the OMP did not carry a claim of added benefit, or (3) the OMP did not have an annual budget impact exceeding 2.5 million euros [5]. Of the remaining 20 OMPs, 7/20 (35%) were classified as category 3 (good effectiveness), while 13/20 (65%) OMPs were categorized as category 1 or 2 (unclear effectiveness) (Table 2).

3.5 Post-Marketing Data–Real-World Effectiveness

For ten OMPs either no post-marketing studies were performed or a reliable judgement about effectiveness could not be made since marketing authorization was granted after January 2014 (Table 2). Of the 21 OMPs with post-marketing studies available, 8/21 (38%) were classified as category 3 (good effectiveness), 8/21 (38%) as category 2 (unclear effectiveness) and 4/21 (19%) as category 1 (unclear effectiveness). One OMP (5%) was categorized as category 0 (no effect). See Supplementary Table 3 for an overview of post-marketing evidence used.

Forty-one experts participated in an in-depth interview, resulting in a range of 0–3 experts per OMP. For two OMPs no experts were willing to participate. Eight experts were involved in the pivotal trials of eight OMPs. Of these, three OMPs were not included in our analysis since post-marketing data were not (yet) available. Following expert opinions, the judgement on real-world effectiveness was upgraded for two OMPs and downgraded for one OMP (Table 2). A sensitivity analysis without up-/downgrading the categories according to expert opinion did not change the results.

Seven patient organizations (POs) were approached representing 25 orphan diseases and 27 OMPs. Four out of seven POs (57%) were willing to participate. The remaining POs either did not want to participate (N = 1) or did not reply to e-mails (N = 2). On average 19 (range 4–40) patients per participating PO filled in the online survey resulting in a total of 75 patients representing 14 diseases and 13 OMPs. Forty-two patients (56%) reported fatigue and 15 (20%) mentioned pain as the most inconvenient symptom of their disease. When asked for which symptom of their disease patients preferred to see improvement upon treatment with the OMP, 29 (39%) mentioned fatigue and nine (12%) mentioned deterioration of their vital organs or cognition. In contrast, experts seemed to have a preference for endpoints that are convenient to measure, e.g. by blood tests or function tests. A comparison of endpoints used pre- and post-marketing, and preferred endpoints of patients and experts can be found in Supplementary Table 4.

3.6 COMPASS Variables

Ten OMPs were excluded from the analysis of factors contributing to the efficacy-effectiveness gap since no post-marketing evidence was available, resulting in a total of 21 OMPs (Table 4). Regarding the type of primary endpoint, 5/7 OMPs (71%) that were authorized based on a study with a clinical or surrogate primary endpoint showed good effectiveness in the real world, versus 3/14 (21%) OMPs that were authorized based on a study with a biomarker or FOM as primary endpoint (Fisher’s exact test: p = 0.056). Of the OMPs for which the study population was an extreme selection of the patient population, none showed good effectiveness in the real world (category 3), compared to 47% of the OMPs for which the study population was representative for the patient population (p = 0.131). Also, 5/16 OMPs (31%) that were authorized for ultra-rare orphan diseases (prevalence of less than 1 in 50,000) had good real-world effectiveness, compared to 3/5 OMPs (60%) that were authorized for rare diseases (prevalence between 1 in 50,000 and 5 in 10,000) (p = 0.325). Of the OMPs that were granted a full marketing authorization, 50% had good real-world effectiveness, compared to 22% of the OMPs that were granted authorization under exceptional circumstances (p = 0.367). The other COMPASS variables (study phase and randomization) did not show a relevant association with the real-world effectiveness (percentage difference ≤15%, data not shown).
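As a worked illustration of the test applied to the counts reported above, the following sketch computes a two-sided Fisher’s exact p-value directly from the hypergeometric distribution. It is not the SPSS routine used in the study. The function name is hypothetical, and the two-sided rule used here (summing the probabilities of all tables that are no more likely than the observed one) is an assumption about the convention applied; it reproduces the reported p-values for the primary endpoint and prevalence comparisons.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Rows are the two groups; columns are good versus no/unclear effectiveness.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x good outcomes in the first row,
        # with all row and column totals held fixed
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    probabilities = [prob(x) for x in range(lowest, highest + 1)]
    # Sum all tables at least as unlikely as the observed one (small tolerance for ties)
    return sum(p for p in probabilities if p <= p_observed * (1 + 1e-9))


# Clinical/surrogate endpoint: 5 good, 2 not; biomarker/FOM: 3 good, 11 not
p_endpoint = fisher_exact_two_sided(5, 2, 3, 11)
# Ultra-rare disease: 5 good, 11 not; rare disease: 3 good, 2 not
p_prevalence = fisher_exact_two_sided(5, 11, 3, 2)

print(round(p_endpoint, 3))    # 0.056
print(round(p_prevalence, 3))  # 0.325
```

With only 21 OMPs, every cell count is small, which is why an exact test is used here instead of a chi-square approximation.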

Table 4 Relationship between COMPASS variables and real-world effectiveness of orphan medicinal products (OMPs)

3.7 Relationship Between Efficacy and Effectiveness

In total, eight out of 21 authorized OMPs (38%) showed good effectiveness in the real-world situation, and in 16/21 OMPs (76%) the efficacy category corresponded with the real-world effectiveness category. Five out of eight OMPs (63%) for which the efficacy was judged as ‘good’ showed good relative effectiveness versus 2/12 (17%) of OMPs with no or unclear efficacy (Supplementary Table 5). With respect to the real-world effectiveness: five out of seven OMPs (71%) for which the efficacy was judged as ‘good’ showed good effectiveness versus three out of 14 (21%) with no or unclear efficacy (Supplementary Table 5).
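The proportions in this paragraph can be recovered from a single 2 × 2 cross-tabulation, sketched below. The cell counts are taken from the text. The function name is hypothetical, and the reading that the 16/21 agreement refers to the dichotomized categories (good versus no or unclear) is an assumption, although it is consistent with the reported counts (5 + 11 = 16).

```python
def summarize(table):
    """Summarize a 2 x 2 table [[a, b], [c, d]].

    Rows: good efficacy, then no/unclear efficacy.
    Columns: good real-world effectiveness, then no/unclear effectiveness.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    return {
        "good_rw_given_good_efficacy": a / (a + b),
        "good_rw_given_unclear_efficacy": c / (c + d),
        "overall_good_rw": (a + c) / total,
        # Agreement: both good, or both no/unclear
        "agreement": (a + d) / total,
    }


# Counts from the text: 5/7 with good efficacy and 3/14 with no or unclear
# efficacy showed good real-world effectiveness
efficacy_vs_real_world = [[5, 2], [3, 11]]
summary = summarize(efficacy_vs_real_world)

for name, value in summary.items():
    print(f"{name}: {value:.0%}")
```

Read this way, the five discordant OMPs are the two with good efficacy but disappointing real-world effectiveness and the three with unclear efficacy that nevertheless performed well in practice.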

3.8 Relationship Between Relative Effectiveness and Real-World Effectiveness

For 13 OMPs no relative effectiveness assessment or post-marketing evidence was available, resulting in a total of 18 OMPs for this analysis. Six out of seven (86%) OMPs for which relative effectiveness was good also showed good real-world effectiveness, versus 1/11 (9%) OMPs with no or unclear relative effectiveness (Supplementary Table 5).

4 Discussion

Bridging the gap between efficacy and effectiveness is difficult, particularly in the field of orphan diseases, where generation of evidence is often limited to a few studies with considerable methodological shortcomings. This study reveals that fewer than half of the approved OMPs for which sufficient post-marketing evidence is available to judge the real-world effectiveness show good effectiveness in the real-world situation. The exploratory analyses of factors contributing to this gap showed that the type of primary endpoint used in the pivotal study seems to be the most important factor. Additional important findings are that none of the OMPs for which the study population was an extreme selection of the patient population showed good effectiveness in the real world, and that a very low disease prevalence and conditional/exceptional authorization were also more often associated with disappointing real-world effectiveness. Relative effectiveness (at the time of the reimbursement decision) was strongly associated with real-world effectiveness. For this study, we chose to compare efficacy assessment from a centralized procedure with relative effectiveness assessment of a national reimbursement authority, which may not represent the situation in all EU countries. However, several general assumptions can be made.

4.1 Study Design, Choice of Endpoints and Study Population

Due to the heterogeneity and slowly progressive nature of many rare diseases, there is often a need to use biomarkers or FOMs as the primary study endpoint. Our study, however, suggests that improvements on a biomarker level may not correspond with improvements relevant for the patient in the real-world setting. According to regulatory guidelines, biomarkers or FOMs may be acceptable but only if validated, and thus labeled as surrogate endpoints. In this respect, it is a common misconception that if an outcome is believed to be correlated with clinical outcomes it can be used as a surrogate endpoint [16]. In-depth studies are needed to confirm that the surrogate endpoint responds to treatment, predicts clinical response and is related to the pathophysiology of the disease. When, at the time of marketing authorization, uncertainty exists about the relationship between the biomarker or FOM and clinical benefit, adequate post-marketing studies and registries may ideally be used to validate the endpoint. Our study shows that if an OMP is authorized based on a relevant effect on a surrogate endpoint, the real-world effectiveness is generally good. This is the case, for example, for phenylalanine levels in phenylketonuria [17] and platelet counts or spleen volume in Gaucher disease, for which the correlation with clinical outcomes is well established. Interestingly, two of the three OMPs that were approved on the basis of an improvement in biomarker levels but showed good effectiveness in the real-world situation used biomarkers that were, according to the experts, highly correlated with clinical symptoms. Experts’ opinions might thus be valuable in judging whether or not a specific endpoint could be used as a surrogate endpoint.

Another important note is that if a biomarker or FOM is validated as a surrogate endpoint in a certain disease, this does not mean that it can automatically be used as a surrogate endpoint in other diseases. The 6-MWT, for example, is validated as a surrogate endpoint in pulmonary arterial hypertension, but not for many other diseases. Another important limitation of using the 6-MWT as a primary endpoint is that only patients who are able to walk can be included in the clinical study. Consequently, extrapolating clinical study results to the entire patient population may be problematic [18]. As was shown by our study, none of the OMPs for which the study population was an extreme selection of the patient population showed good effectiveness in the real world. This highlights the need for studies with study populations that better reflect the total patient population and thus have high external validity.

4.2 Recommendations and Ways Forward

Improvements can be introduced at the pre-marketing and post-marketing stages. First, the quality of pre-marketing studies has to be sufficient to allow an unbiased (independent) judgement about the efficacy of an OMP at the time of marketing authorization. If biomarkers are used as the primary endpoint, caution is required regarding their predictive value for real-world effectiveness, and studies on long-term effectiveness are of great importance. Moreover, validation of biomarkers or FOMs as a surrogate endpoint in the context of a specific disease seems appropriate. Since our results suggest that patients’ and experts’ preferences regarding study endpoints may differ, it is important that all relevant stakeholders (patients, academia, industry and regulatory authorities) agree early in the development process upon what constitutes a sensitive and validated endpoint. Also, selection of the study population and the definition of what constitutes a minimal clinically important difference (MCID) require close attention. Such discussions may take place in the context of a validation procedure at the EMA or of protocol assistance.

Second, the finding that 50% of the OMPs with a full marketing authorization have good real-world effectiveness (in contrast to 22% of OMPs authorized under exceptional circumstances) underlines the need for generation of coordinated, robust post-marketing evidence. Post-marketing commitments currently often imply the set-up of an industry-sponsored drug registry, which has several shortcomings [19]. Moreover, access to real-world data across the EU is hampered by the lack or inefficiency of cross-border collaborations, fragmentation of resources and the lack of interoperability, which complicates decision making by health technology assessment (HTA) bodies. Hence, there is a need for the implementation of methods to integrate and analyze heterogeneous data. The concept of a self-regulatory market, in which an OMP will automatically no longer be prescribed by doctors when effectiveness is not convincing, might fail in rare diseases with an unmet medical need.

To accelerate access to new drugs, new pathways are increasingly being explored in the EU. The ‘adaptive pathway’, introduced by the EMA in 2015, enhances timely access to an OMP by approving it in a well-defined patient subgroup with a high (unmet) medical need, followed by widening of the indication to a larger patient population [20]. In addition, HTA bodies are involved early in the process. Again, the launch of improved disease registries, which can generate robust and independent evidence, is of critical importance [19]. In this way, the importance of timely access is balanced with the need for adequate information on effectiveness and safety, rendering marketing authorization a continuous process [21]. An evaluation of the adaptive pathways pilot showed that this procedure can promote multi-stakeholder dialogues and support development of drugs for which generation of evidence is difficult. Another promising development is the ‘real-world evidence’ initiative that is currently being developed by the EMA, creating a framework that delivers access to and analysis of multinational real-world data to optimize decision making on medicinal products developed for areas with a high unmet medical need [22].

4.3 Limitations

Despite the small sample, we believe that the results of our study may serve as a lead for future studies. Our study is the first to systematically compare pre-marketing data with post-marketing data for OMPs. Studies about the quality of pivotal studies of orphan drugs have been published before [23,24,25], but they did not include post-marketing effectiveness. Since the COMPASS tool was not developed to score or rank the quality of clinical evidence, it was not possible to quantify the level of pre-marketing evidence. Grading of Recommendations Assessment, Development and Evaluation (GRADE) is not appropriate to assess clinical studies of OMPs, as this approach does not take into consideration the methodological drawbacks of these studies [15]. To our knowledge, no other quality assessment tools exist for evaluation of OMPs. Although clinical experts were asked to complete our list of post-marketing studies with studies that they believed were of value for the evaluation of post-marketing effectiveness, we may have missed some studies in the evaluation of post-marketing data. Moreover, since results of negative studies might not be available due to publication bias, the real-world effectiveness as assessed in our study may be an overestimation [26].

5 Conclusion

We showed that fewer than half of the authorized OMPs show good effectiveness in the real-world situation and that the most important contributor to the efficacy-effectiveness gap seemed to be the use of a non-validated biomarker or FOM as the primary endpoint in the pivotal study. Whether the results of our research on medicinal products for metabolic orphan diseases can be extrapolated to other orphan disease fields (e.g. oncology, neurology) remains to be elucidated.