Key Points for Decision Makers

This umbrella review provides a comprehensive overview of current issues for economic evaluations of orphan drugs in rare diseases.

For economic evaluations of rare diseases, there is a paucity of evidence and a pronounced publication bias; as a result, few cost-effectiveness analyses exist for orphan drugs.

Stakeholders working with rare diseases can improve their work by following recommendations outlined in this umbrella review, for example, using comprehensive and flexible cost-effectiveness models.

1 Background

The term orphan drug is recommended by the International Society for Pharmacoeconomics and Outcomes Research when a drug is indicated for the treatment of rare diseases with a prevalence threshold of 40–50 patients per 100,000 people [2]. The US Orphan Drug Act of 1983 and the European Orphan Regulation (No. 141/2000) have provided drug manufacturers with research incentives for rare diseases [3, 4]. They are widely regarded as successful and have led to an increase in orphan-drug designations [5,6,7]. For example, in the USA, the number of orphan designations more than quadrupled from the 1990s to 2010s [8].

Before the introduction of incentives, there was a widely held view that manufacturers should be rewarded for orphan-drug development, which, in exchange, meant that they could claim prices that ensured profitability. Although drug prices were high, the impact on healthcare budgets was negligible because there were few marketed orphan drugs and few patients to benefit from them [9]. The situation has now changed because of the policy-induced surge in orphan drugs, and both policymakers and researchers are attempting to find sustainable solutions to the issue of reimbursement [9, 10].

The fundamental aim of clinical trials is regulatory approval, which involves a risk-benefit evaluation that should answer whether the benefit of an intervention outweighs the risk [11]. It is difficult to obtain high-quality trial data when investigating rare diseases. For example, it may be hard to recruit enough trial participants, hard endpoints may be missing and including placebo arms in clinical trials may be unethical [12, 13]. These challenges are often magnified when it comes to health technology assessments (HTAs), where the aim is a systematic assessment of both clinical and cost effectiveness [14]. They lead to high uncertainty in cost-effectiveness analyses and, along with the high prices of orphan drugs, result in many orphan drugs not being recommended for reimbursement [12, 13].

Multiple authors have described economic evaluation challenges for rare diseases, focusing on various aspects such as the decision analytic modelling component of economic evaluations. Some of the most influential papers, based on the number of citations, are from 2018 [9, 12, 13]. However, the literature is diverse, with researchers and policymakers looking for ways to alleviate the challenges for economic evaluations of orphan drugs [15, 16]. Recent developments include the introduction of the Innovative Medicines Fund in the UK, which facilitates the collection of additional data for promising orphan drugs, and the living HTA, which is the concept of continuously updating economic models [17, 18]. The existing reviews predate these developments and are therefore limited in their ability to synthesise the most recent policy, economic and clinical evidence. Consequently, the issues, challenges and opportunities associated with the economic evaluation of orphan drugs have not been summarised comprehensively. As a result, an umbrella review that focuses on the challenges for economic evaluations of rare diseases is warranted.

2 Methods

Scoping searches helped inform the literature searches [1]. They confirmed that the surge in orphan drugs had resulted in a growing and disparate field of literature. Ultimately, an umbrella review was deemed an appropriate solution, and the decision was made to conduct one. Umbrella reviews aim to synthesise systematic reviews, with or without meta-analyses, and have been described as a natural option to handle increases in systematic reviews to provide a summary of broad topic areas [19]. Previously, this approach proved useful in similar situations, where fields of research expanded rapidly and, consequently, resulted in a diffuse body of literature [20,21,22].

2.1 Research Objectives

This research was informed by a modified version of the Setting, Perspective, Interest, Comparison, Evaluation (SPICE) framework [23]. The perspective component was omitted because all perspectives were considered relevant. When applying the framework, with its parameters shown in brackets (for example, [Setting]), the research question became: in health-economic-research settings [Setting] are there any issues and challenges [Evaluation] for the economic evaluation of orphan drugs in rare diseases [Interest], which apply less to other drugs [Comparison]?

2.2 Literature Searches

The most relevant databases for the umbrella review were MEDLINE, Cochrane and EMBASE. Thus, during January 2023, MEDLINE and EMBASE were accessed through the Ovid platform and Cochrane independently through its website. For both MEDLINE and EMBASE, search filters for economic evaluations and models, and systematic reviews were sourced from the Canadian Agency for Drugs and Technologies in Health and Scottish Intercollegiate Guidelines Network databases, respectively [24,25,26]. These filters were combined with search terms for orphan drugs and rare diseases. Eligibility criteria, scoping and literature searches are available from the Electronic Supplementary Material (ESM).

As recommended by Booth and colleagues, a hand search of references and bibliographies of papers from the review was conducted [27]. This was followed by a verification process where it was checked if any known and relevant papers were missing from the review.

2.3 Data Collection Process

Titles and abstracts were screened by two independent researchers against the inclusion and exclusion criteria. Discrepancies were discussed until a consensus was reached. The papers that met the inclusion criteria, after screening of the title and abstract, were further subjected to full screening. Papers were also excluded at full screening if they were deemed as containing insufficient information to allow for meaningful data collection, for example, abstracts. The data collection process was divided into three steps: summary of characteristics, critical appraisal and data extraction.

2.3.1 Summary of Characteristics

An extraction table captured summary characteristics recommended for umbrella reviews: citation details, type of review, objectives, date range of database searching, number of studies, rating by the Joanna Briggs Institute checklist and themes [19].

2.3.2 Critical Appraisal

For the critical appraisal, the Joanna Briggs Institute critical appraisal checklist was used. This checklist is recommended by the umbrella review methodology working group for a critical appraisal of systematic reviews [19]. The checklist contains 11 questions that were used to critically appraise the reviews [28]. For this tool, there is a high degree of freedom in deciding on a scoring system for the inclusion or exclusion of papers. To avoid missing any information, it was decided not to exclude any papers based on their scores. The reviews were divided into three levels according to their quality scores: 8–11 (high quality), 4–7 (moderate quality) and 0–3 (low quality) [29, 30].
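The three score bands described above can be expressed as a small lookup. The following is a minimal sketch in Python, not the authors' own tooling; the function name `jbi_quality_level` is hypothetical, and the sketch assumes one point per fulfilled checklist question, giving an integer score from 0 to 11.

```python
def jbi_quality_level(score: int) -> str:
    """Map a checklist score (0-11) to the quality level used in this review.

    Assumption: one point per fulfilled question, so scores are integers 0-11.
    """
    if not 0 <= score <= 11:
        raise ValueError("score must be between 0 and 11")
    if score >= 8:
        return "high"      # 8-11
    if score >= 4:
        return "moderate"  # 4-7
    return "low"           # 0-3


if __name__ == "__main__":
    for example_score in (2, 5, 9):
        print(example_score, jbi_quality_level(example_score))
```

Because no review was excluded on the basis of its score, a function like this would only label reviews, not filter them.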

2.3.3 Data Extraction

The included reviews were carefully assessed with the aim of identifying broader themes that pertain to economic evaluations of orphan drugs. Challenges were extracted and tabulated according to their themes, based on an approach previously used to extract modelling challenges for rare diseases [12].

3 Results

3.1 Literature Search and Study Selection

The study selection is illustrated by a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram in Fig. 1. The number of identified records was 282. They were retrieved from the following databases: EMBASE (n = 211), MEDLINE (n = 67) and Cochrane Library (n = 4). A total of 64 duplicate records were removed. Moreover, 172 records were excluded during screening of the abstract and title, which left 46 studies for full screening. Of those, 11 were excluded because of not containing components for economic evaluations (n = 5) or systematic reviews (n = 4), or because they were abstracts (n = 2). Overall, 35 reviews from the database searches were deemed eligible for inclusion as listed in the ESM under literature search results.

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram

The hand search yielded five papers, of which four were excluded for the following reasons: no component of economic evaluations (n = 1) or systematic reviews (n = 2), and for not being concerned with orphan drugs (n = 1). This meant that one paper was carried forward from the hand search, which brought the total number of eligible reviews to 36. The ESM lists papers included for full screening.
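The study selection counts reported in this section can be checked with simple arithmetic. The sketch below is illustrative only: the counts are taken from the text, whereas the function name `prisma_flow` and the dictionary keys are hypothetical labels introduced for this example.

```python
def prisma_flow() -> dict:
    """Recompute the study selection counts reported in Section 3.1."""
    records_by_database = {"EMBASE": 211, "MEDLINE": 67, "Cochrane Library": 4}
    identified = sum(records_by_database.values())

    duplicates_removed = 64
    excluded_title_abstract = 172
    full_screening = identified - duplicates_removed - excluded_title_abstract

    # Reasons for exclusion at full screening (database searches)
    full_text_exclusions = {
        "no economic evaluation component": 5,
        "not a systematic review": 4,
        "abstract only": 2,
    }
    database_eligible = full_screening - sum(full_text_exclusions.values())

    # Hand search of references and bibliographies
    hand_search_found = 5
    hand_search_exclusions = {
        "no economic evaluation component": 1,
        "not a systematic review": 2,
        "not concerned with orphan drugs": 1,
    }
    hand_search_eligible = hand_search_found - sum(hand_search_exclusions.values())

    return {
        "identified": identified,
        "full_screening": full_screening,
        "database_eligible": database_eligible,
        "hand_search_eligible": hand_search_eligible,
        "total_eligible": database_eligible + hand_search_eligible,
    }


if __name__ == "__main__":
    for step, count in prisma_flow().items():
        print(f"{step}: {count}")
```

The recomputed totals agree with the reported figures of 282 identified records, 46 studies at full screening and 36 eligible reviews.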

3.2 Study Characteristics

A two-step approach was used to determine if studies could qualify as systematic reviews. First, a Scottish Intercollegiate Guidelines Network search filter for systematic reviews was used, which is a pre-tested search strategy that identifies the higher quality evidence from vast amounts of literature indexed in a medical database. Second, eligibility was assessed, and a consensus obtained between the first and second reviewer on their inclusion. Using this approach, two scoping reviews were included because the methods were sufficiently systematic [31, 32]. Similarly, a study described their approach as a series of targeted literature reviews, which was also sufficiently systematic for inclusion [12]. The number of records included in the systematic reviews varied between 2 and 338. The ESM provides a summary of study characteristics.

3.3 Critical Appraisal

One study had low quality [31]. The highest frequency was found in the category of moderate quality, which comprised 27 studies [12, 32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57], whereas eight studies were rated as having high quality [58,59,60,61,62,63,64,65]. The ESM includes scores for each individual Joanna Briggs Institute checklist question across all studies, which showed that most studies (n = 35) obtained points from question 4, which was: Were the sources and resources used to search for studies adequate? Question 8 was not widely applicable and was fulfilled by the fewest studies (n = 4). Question 8 was: Were the methods used to combine studies appropriate? Critical appraisal methods used in individual systematic reviews were assessed by question 5: Were the criteria for appraising studies appropriate? Fourteen studies included appropriate criteria for critical appraisal, whereas in 13 studies it was unclear whether they did, seven studies did not, and for two studies the question was not applicable.

3.4 Data Extraction

The systematic reviews were divided into two categories: those that considered a specific rare disease (13 studies) and those that considered multiple rare diseases (23 studies). As shown in Fig. 2, three broad themes were identified: issues with health economic parameters, issues with health economic evaluations and issues with estimating value/reimbursement, with subtopics further developed for each theme. For issues with health economic parameters, the subtopics were the natural history of disease, clinical effectiveness, costs and quality of life. For issues with health economic evaluations, the subtopics were cost effectiveness and budget impact. For issues with estimating value or reimbursement, the subtopics were thresholds, value frameworks and multiple criteria decision analyses (MCDA). A repository of all extracted data on issues for economic evaluations of rare diseases is available in the ESM.

Fig. 2 Data extraction themes, sub-topics and findings

3.5 Issues with Health Economic Parameters

3.5.1 Natural History of Disease

Rare diseases often progress slowly or are chronic by nature, which makes clinical trials insufficient as they tend to have short durations [12, 62]. The non-existence or limited number of studies that include data on prevalence and incidence further magnifies these issues [44, 48, 51]. Moreover, clinical experts are few and private practitioners may encounter only a few rare disease cases, which makes rare diseases difficult to diagnose, and expert advice might not be easy to find [12, 44, 45, 57]. Delayed diagnosis and misdiagnosis make it difficult to define treatment-eligible cohorts [45, 50, 51, 58].

To summarise, an economic evaluation is challenging, for example for long-term modelling, because of missing data on the natural history of the disease or unknown rare disease trajectories [40]. Although registries can alleviate data issues, they may suffer from challenges such as diverging disease and diagnostic codes, data ownership and missing comparator data [12, 48, 50].

3.5.2 Clinical Effectiveness

Whilst clinical trials are common sources for effectiveness data in economic evaluations, appropriate clinical evidence is not always available for this purpose [57, 58]. Moreover, clinical trials may suffer from short durations, small sample sizes, premature termination, inadequate power, missing data or missing control arms, for example, for ethical reasons [12, 37, 45, 47, 57]. In addition, published long-term studies providing post-marketing data on safety and efficacy are rarely available [37, 38].

Other challenges are missing treatment guidelines, missing data to predict treatment responses, and concerns about the patient relevance and use of surrogate endpoints [40, 50, 52, 60, 63]. Comparator data are essential for economic evaluations but might be missing for rare diseases, and if they are available, there might not be a consensus on the use of treatment regimes or treatment eligibility of patients, which results in heterogeneity across studies [39, 48, 50, 51]. A review found that studies reporting clinical evidence for orphan drugs had low-to-moderate quality, and none of them had high quality [60].

3.5.3 Costs

Cost-of-illness or burden-of-disease studies are scarce in rare diseases [12, 34, 39, 42, 43, 48, 52, 55]. Of those studies available, most are retrospective and only a small proportion report indirect, non-medical or informal-care costs [12, 34, 51, 58, 59]. Aggregated primary data are rarely available; hence, studies tend to report patient-reported claims or registry data [42].

It is complicated to transfer cost-of-illness results between different rare disease settings because of differences in study designs, methods and results. For example, one study estimated lost productivity without following recommendations for handling uncertainty [42]. A multitude of factors influence transferability such as data sources, geographical perspective, nomenclature, assumptions, discount rates, unit costs, treatment guidelines and value frameworks [34, 43, 46, 50].

3.5.4 Quality of Life

Quality-of-life studies in rare diseases are limited, but availability depends on the rare disease of interest [35, 39, 47]. For example, a review found two studies that included utility values for Cushing’s syndrome, whereas another review concerned with Crigler–Najjar syndrome found no data on the humanistic burden, apart from anecdotes on treatment challenges [39, 47]. In addition, there are data limitations on the quality of life of caregivers [63]. A probable explanation for the scarcity is the limited applicability of quantitative methods such as choice experiments or conjoint analysis in rare diseases, for example because of small sample sizes [35]. Furthermore, studies tend to be small, not randomised or controlled, which decreases the reliability of conclusions [51]. This scarcity of evidence may lead to the use of assumptions, for example, assumption of equal utility values across treatment arms or a linearity assumption of utilities between different timepoints [46, 47]. Moreover, the reviews highlight shortcomings in methods and reporting, for example, the failure to include utility values or mapping algorithms, and insufficiently describing the elicitation of utility weights [58, 64].

3.6 Issues with Health Economic Evaluations

3.6.1 Cost Effectiveness

Health economic evaluations for rare diseases are scarce. For example, a systematic review failed to identify any studies, whereas another noted a remarkable absence of pharmacoeconomic evidence [45, 58]. A notable opinion on the cause of scarcity is that limited information on input parameters simply deters people from attempting to construct cost-effectiveness analyses because it is presumed unachievable [62]. In brief, the causes are missing patient-level data, high drug costs and the inability to measure effects for clinical or quality-of-life outcomes [55, 57, 62].

The difficulties for economic evaluations are driving factors for the use of assumptions to overcome challenges for cost-effectiveness modelling, for example, assumptions on mortality, efficacy, treatment and complications [55]. It is commonplace to use modelling techniques such as mapping algorithms or long-term extrapolation for outcomes because of data limitations [38, 47]. Moreover, limited patient numbers coupled with unreliable estimates of effects, symptoms and complications suggest that methods such as patient-level simulation modelling may have limited applicability in rare diseases [62].

Additionally, publication bias in relation to positive results or industry-sponsorship bias seems to be prominent in rare diseases [66, 67]. It may occur when manufacturers decide to publish only if they have favourable cost-effectiveness results, a post-marketing obligation, or an opportunity to adopt favourable input parameters and an advantageous interpretation of results [45, 62, 65]. Numerous reviews suggest issues of publication bias [38, 45, 49, 61, 65]. For example, Schuller et al. indicated a higher frequency of analyses in countries with post-marketing obligations [62]. Others found that studies failed to discuss the direction and magnitude of bias, despite using data from potentially biased sources [38, 61, 65]. Another review highlighted selection bias to explain conflicting cost-effectiveness results for a particular drug [49]. Additionally, it was highlighted that most studies were industry funded in a systematic review of cost-utility analyses for haemophilia [64]. Furthermore, incremental cost-utility ratios were significantly lower when published by industry compared with foundations and academia [49].

Most economic evaluations have moderate quality, and the failure to reach high quality may be partly attributed to a lack of good-quality model inputs (e.g. utility values that do not account for patient characteristics and disease severity) or because they omit lifetime horizons for chronic rare diseases [55, 59, 61]. Moreover, problems with reporting are frequently highlighted as another factor that may contribute to insufficient quality, for example, not adequately reporting discount rates, sensitivity analyses, utility weights, patient characteristics, funding sources and time horizons [38, 59, 64].

Transferability is another issue for cost-effectiveness results [57]. Cost-effectiveness analyses are heterogeneous because of modelling variations in treatments, patient populations, time horizons, countries, cost-effectiveness thresholds, settings, year of analysis, comparators and assumptions [49, 51, 55, 59, 61, 62, 64]. Thus, considerable caution is advised when assessing the transferability of results across different healthcare settings [61].

3.6.2 Budget Impact

Studies on budget-impact modelling are few and mostly from high-income or native English-speaking countries. If Kanters and colleagues’ suggestion is accurate, publication bias cannot be ruled out as a cause of this scarcity [45]. Furthermore, the available studies are of low quality and show poor adherence to guidelines [33, 45]. A proportion of budget-impact studies fail to report side effects, drug-related services, life-extension costs, savings from mortality reductions and validation methods [33, 53]. The importance of assumptions should not be overlooked, as they are frequently incorporated for target populations, population sizes, interventions, comparators, costs and market uptake [33].

3.7 Issues with Estimating Value and Reimbursement

3.7.1 Value Frameworks and Thresholds

Most countries require budget-impact and cost-effectiveness models as part of HTAs, but the appraisal process (e.g. cost-effectiveness thresholds) may vary across countries, thus making comparisons difficult. As mentioned, whilst evidence may be scarce, input parameters on prevalence, incidence, number of treatment-eligible patients, and clinical benefits are nonetheless needed when estimating the budget impact and cost effectiveness for rare diseases [54]. For Europe, reference pricing further adds to the complexity and may prevent launches of orphan drugs in low-income countries [57]. Overall, value frameworks may suffer from transparency and consistency issues. This largely makes budget-impact and cost-effectiveness analyses country specific [36, 61].

3.7.2 Multiple Criteria Decision Analysis

A multiple criteria decision analysis (MCDA) is an emerging value framework for orphan drugs because it offers an opportunity to include a broad range of value criteria, for example, societal, disease or treatment criteria [31, 41]. Critics highlight variations in scoring functions for value criteria as a significant limitation, and it is difficult to observe consistent recommendations for decision making [41, 56]. Interestingly, by meticulous examination of value criteria weights and scores in MCDAs, Friedmann and colleagues suggested that traditional value aspects used in HTAs (budget impact and cost effectiveness) were considered unimportant by stakeholders involved in orphan drug appraisal processes. The most cited value criterion was disease severity (n = 10), whereas cost effectiveness (n = 7) and budget impact (n = 3) were cited ten times collectively [41]. By contrast, Mohammadshahi and colleagues found in their review an equal citation frequency for the value criteria: disease severity (n = 8), cost effectiveness (n = 8) and budget effect (n = 8) [32].

4 Discussion

This section discusses the umbrella review findings, which indicated multiple issues for the economic evaluation of orphan drugs in rare diseases. However, it was not possible, with confidence, to assert whether all issues for orphan drugs applied less to other drugs, which was part of the original research objective [1]. Many papers focused on the evidence for a specific disease or multiple diseases, rather than how it compares to other drugs. For example, a systematic review of available evidence on 11 high-priced inpatient orphan drugs found that study populations were significantly smaller in randomised trials for orphan drugs as compared with non-orphan drugs [45]. Other systematic reviews in rare diseases confirmed that study populations were small but did not compare to other drugs [12, 37, 57]. The magnitude of issues varies, and this is the case for orphan drugs and other drugs. Thus, some of these issues may also be applicable to other drugs; however, these issues are critical in the case of orphan drugs as the issues tend to be amplified. In acknowledgement of this inability to consistently compare to other drugs, the ESM provides an indication of commonality for issues with economic evaluations of orphan drugs.

4.1 Issues with Health Economic Parameters

Scarcity of evidence was reported for natural history of the disease, clinical effectiveness, costs and quality of life [12, 34, 39, 42,43,44,45, 47, 48, 51, 52, 55, 57, 58]. It was previously pointed out that there were simply no easy answers to the problem of assessing evidence for orphan drugs [9]. In this review, this was exemplified by analysts who expressed a hope, rather than an actionable plan, for better availability of clinical trials with longer time horizons to conduct a thorough analysis of cost effectiveness, for example, for paediatric pulmonary arterial hypertension [37]. Others have suggested that high drug prices and the inability to measure effects would discourage people from even attempting to construct cost-effectiveness analyses [62]. This interpretation contrasts with that of Picavet and colleagues who conclude that orphan drugs can meet traditional cost-effectiveness thresholds [49]. It is an option to use expert opinion if few data are available, although it may be difficult to obtain [68, 69].

Some strategies may help improve evidence sources, but most require extensive resources. For example, registries have the potential to inform modelling on the natural history of disease or can help construct a replacement for the standard of care, which may be relevant for trials without a control arm [12, 62, 63]. In addition, surrogate markers can play a vital role when clinical trials have short durations; they may, however, be difficult to validate without long-term data [57]. Analysts have drawn attention to this matter and highlighted the importance of consulting experts and sourcing data from other similar diseases to fill data gaps, for example, quality of life associated with wheelchair confinement between multiple sclerosis (more prevalent) and Duchenne’s disease (less prevalent) [12]. Last, authors suggest investigating the geographical variation in treatment patterns, reporting of side effects, long-term trials in disease areas with little evidence and a Cochrane review group dedicated to systematic reviews that reduce evidence gaps for orphan drugs [37, 48, 60].

For cost-of-illness studies in rare diseases, first, the studies should be clear on their perspective; second, they should report indirect costs separately from direct costs, for example, lost productivity; third, they should report costs associated with prevented comorbidities; and fourth, they should provide clarity on applied discount rates [34, 42, 59, 63]. The importance of future research for informal care, in terms of costs and quality of life, was highlighted by multiple authors because rare diseases may have severe implications for the closest providers of care, for example, family and friends [34, 55, 63].

4.2 Issues with Health Economic Evaluations

Systematic reviews reported a scarcity of cost-effectiveness modelling studies [45, 58]. As alluded to earlier, it could suggest a strong link between evidence issues, publication bias and the observed paucity of cost-effectiveness analyses [62]. Researchers want economic evaluations with higher quality and extended time horizons [61]. To achieve this aim, without conducting a clinical trial, one could evaluate: entry-level agreements and registries for data collection, patient surveys to assess the burden of disease, Delphi techniques for validation, expert opinion for estimation, population-adjusted indirect comparisons to account for patient characteristics and rare events with high costs [12, 64].

The paucity of budget-impact models may be explained by input parameters; for example, a lack of data for prevalence or incidence estimation could contribute to it [48, 51]. Budget-impact models were low quality and rarely validated. The recommendations for improvement were simply that researchers should adhere to guidelines [33, 70]. Furthermore, publication bias for budget-impact models cannot be ruled out [45, 54]. Health technology assessment bodies often require them, but for manufacturers, being the cause of increased healthcare costs might not be a message worth communicating, thus providing an explanation for potential publication bias. It is plausible that the budget impact is less of a concern for rare diseases because a low prevalence can translate to a lower impact on budgets for payers, thus providing another explanation for the scarcity of publications.

4.3 Issues with Estimating Value and Reimbursement

The appropriateness of value frameworks in the context of rare diseases is debated. For traditional value frameworks, examples of proposed solutions are: weighting of quality-adjusted life-years according to disease severity and prevalence, categorising quality-adjusted life-years based on disease states, implementing higher cost-effectiveness thresholds and special rules for those that exceed thresholds, for example, managed entry-level agreements and stopping rules for cost containment [12, 57]. The UK is an example where some of these measures have been incorporated through the Innovative Medicines Fund for medicines that are promising but associated with high uncertainty or decision modifiers through highly specialised technology appraisals [17, 71].

As highlighted throughout this review, criticism of traditional value frameworks has partly been related to their limited transparency and transferability of results. Critics have suggested policymakers explore other frameworks, for example, MCDA. So far, this method has only seen sporadic implementation, but it is clearly emerging [31, 41]. The benefit of MCDA is the ability to include a range of value criteria, for example, the burden on caregivers [36, 41]. However, like traditional frameworks, transferability and transparency for MCDA are areas that warrant further research [41, 56]. It should also be noted that using a different value framework will not solve the problem of evidence scarcity.

5 Recommendations

Challenges are abundant, whereas solutions are few and rarely forthcoming. Stakeholders, however, must recognise that certain types of research, for example, clinical trials with extended time horizons, are costly, and demanding these could further eliminate company incentives to research rare diseases [57]. Thus, there is a need for recommendations that are more sustainable. As a first step towards these, we provide practical recommendations that may help alleviate challenges identified in this umbrella review.

5.1 Comprehensive and Flexible Cost-Effectiveness Models

Data availability is critical at the time of economic evaluations for rare diseases. This is why economic models should be transparent, with uncertainty rigorously explored through sensitivity analyses, and set up for continuous updating as data become available over time [59]. Continuous updating of cost-effectiveness models with new data is an unexplored opportunity, especially considering the necessity of post-launch monitoring or real-world data [12, 60]. Such a framework has been referred to as a living HTA [18, 72].

Furthermore, transparency may increase for other stakeholders who are not trained researchers because user-friendly interfaces, for example, Shiny apps in the software R, allow them to “safely” explore model scenarios without having to face backend code [73]. For risk-sharing agreements, rather than focussing purely on clinical endpoints, for example, survival, they could potentially allow for fully updated cost-effectiveness models.

Consequently, for economic evaluations of rare diseases, there is untapped potential for using living HTAs. What is more, it has been recommended to use cost-effectiveness models in rare diseases to facilitate an expected value of information analysis using inputs from, for example, phase II or registry data [12]. It provides researchers with an opportunity to address the root causes of uncertainty by reprioritising or initiating data collection efforts, for example, before initiation of a phase III trial or an HTA [74].

In summary, we recommend using comprehensive and flexible cost-effectiveness models that report value of information, as initially suggested by Pearson and colleagues [12]. As a minimum, these analyses should include both the expected value of perfect information and the expected value of perfect parameter information.
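To make the two recommended quantities concrete, the following minimal sketch estimates them from probabilistic sensitivity analysis output. It assumes a matrix of simulated net monetary benefit with one row per simulation and one column per strategy. The function names `evpi` and `evppi_binned`, the toy two-strategy model and all numerical inputs are hypothetical. The expected value of perfect parameter information is approximated here by a crude binning of the parameter of interest; applied analyses would typically use more refined regression-based estimators.

```python
import numpy as np


def evpi(net_benefit):
    """Per-patient EVPI from a PSA matrix of shape (n_simulations, n_strategies)."""
    nb = np.asarray(net_benefit, dtype=float)
    # With perfect information, the best strategy is chosen in every simulation.
    value_perfect = nb.max(axis=1).mean()
    # With current information, the strategy with the highest expected net benefit is chosen.
    value_current = nb.mean(axis=0).max()
    return value_perfect - value_current


def evppi_binned(net_benefit, parameter, n_bins=20):
    """Crude per-patient EVPPI for one parameter, using equal-sized bins of sorted draws."""
    nb = np.asarray(net_benefit, dtype=float)
    order = np.argsort(np.asarray(parameter, dtype=float))
    bins = np.array_split(nb[order], n_bins)
    # Within each bin the parameter is roughly known, so pick the best strategy per bin.
    value_partial = sum(b.mean(axis=0).max() * len(b) for b in bins) / len(nb)
    return value_partial - nb.mean(axis=0).max()


# Hypothetical toy model: standard of care versus an orphan drug.
rng = np.random.default_rng(seed=1)
n_sim = 10_000
qaly_gain = rng.normal(0.5, 0.3, n_sim)        # incremental QALYs (assumed)
extra_cost = rng.normal(40_000, 5_000, n_sim)  # incremental cost (assumed)
threshold = 100_000                            # willingness to pay per QALY (assumed)

net_benefit = np.column_stack([np.zeros(n_sim), threshold * qaly_gain - extra_cost])

print("EVPI per patient:", round(evpi(net_benefit)))
print("EVPPI for QALY gain:", round(evppi_binned(net_benefit, qaly_gain)))
print("EVPPI for extra cost:", round(evppi_binned(net_benefit, extra_cost)))
```

Comparing the parameter-level values indicates which input drives decision uncertainty and therefore where further data collection, for example, through a registry, may be most worthwhile.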

5.2 Publication Bias and Ability to Meet Cost-Effectiveness Thresholds

One unanticipated finding was the extent to which publication bias seemed to be an issue [38, 45, 61, 62, 65]. Unfortunately, failure to account for bias can result in overambitious claims, for example, that cost-effectiveness analyses for rare diseases can indeed meet traditional cost-effectiveness thresholds. In the study making this claim, most of the included studies were industry funded, which made the authors wary of a potential publication bias [49]. Their sample of studies was not fully representative of economic evaluations of rare diseases because the studies mainly came from the published literature. If the hypothesis of publication bias is correct, these studies were more likely to have been published simply because they showed that cost-effectiveness thresholds were reached.

Unfortunately, biased conclusions may disrupt ongoing efforts to improve reimbursement conditions for orphan drugs, and the momentum could be lost if policymakers take such conclusions at face value. The overall conclusion that cost-effectiveness analyses can meet common cost-effectiveness thresholds seems strongly contested by the findings of this review. The research would have been more convincing if the authors had compared cost-effectiveness analyses submitted to HTA bodies with those available in the literature. We recommend further research to determine the effect of publication bias on the ability to meet cost-effectiveness thresholds, and we recommend caution when interpreting results.

5.3 Other Opportunities

Researchers need to identify data gaps years before economic evaluations to allow sufficient time to generate the data needed. We have already described the potential for registries, but we additionally recommend conducting an early economic evaluation of phase II data, which may provide timely knowledge on pricing and reimbursement [75]. Furthermore, patient organisations may be able to support reimbursement efforts, as there should be a mutual interest in bringing orphan drugs to the market.

Another opportunity is risk-sharing agreements. Decision makers have implemented these alternative methods of financing in response to high uncertainty about interventions, for example, about the future clinical and economic outcomes of orphan drugs [76, 77]. In short, they are in place to facilitate risk sharing between those supplying health interventions (manufacturers) and those paying for them (healthcare providers), which is why they have broadly been referred to as risk-sharing, pay-for-performance or managed-entry agreements. Although the nomenclature is not consistent, they can generally be divided into two categories: health outcome-based and non-outcome-based agreements [78, 79].

6 Limitations

Our review has some limitations. First, two researchers conducted the screening of titles and abstracts, but only one reviewer conducted the full screening and quality assessment. For this reason, the reliability could have been higher. To make up for this, we transparently report the full screening and quality assessment in the ESM. Second, exclusion of studies that did not qualify as systematic reviews meant that there was a chance of missing valuable information. One such example was a narrative review of orphan drugs, which could have supported our findings [9]. Moreover, the search only included studies from 2010 onwards. However, the literature searches were partly based on search filters, which balanced sensitivity and specificity. Third, we included all studies, no matter their quality rating, to maximise inputs into the study. This resulted in the inclusion of one study with a low-quality rating [31]. Fourth, advanced therapy medicinal products (ATMPs) were excluded from this umbrella review, even if they were considered orphan drugs. It has been much debated whether they should qualify as drugs because the production process typically involves modifying cells or genes. There are challenges for economic evaluations of ATMPs such as high prices and sparse supportive evidence, for example, small sample sizes, single-arm studies and insufficient follow-up [80]. Thus, the identified opportunities for orphan drugs could apply equally to them. However, there are likely to be differences: ATMPs are frequently curative with a one-off cost, which is why the major challenges are affordability and long-term uncertainty [81,82,83]. Furthermore, it was previously suggested to consider economic aspects for curative and non-curative treatments differently [57]. Finally, cross-referencing in the included papers was most prominent in recent papers and in those with a broader scope. For example, a review concerned with methods for assessment of orphan drugs included six references, whereas another review of economic evaluations for enzyme replacement therapy in lysosomal storage disease included none [32, 46].

7 Conclusions

This umbrella review set out to determine issues for the economic evaluation of orphan drugs. The most obvious finding to emerge from this study was the scarcity of evidence for clinical effectiveness, costs, quality of life and natural history of the disease. Scarcity of evidence and publication bias emerged as possible causes for the limited number of economic evaluations in the literature. The results support the notion that economic evaluation in rare diseases is challenging.

We recommend that researchers focus on sustainable initiatives and explore flexible cost-effectiveness models, for example, using living HTAs. We highlight that further research is required to determine the effect of publication bias on the ability to meet cost-effectiveness thresholds.