1 Introduction

Comprehensive measurement of health economic value is a considerable challenge. However, appropriate estimation of economic value is crucial for allocating scarce resources efficiently and for judging whether a health technology at a given price represents good value for a specified society. The value of health technologies can be determined through a multidisciplinary process called health technology assessment (HTA) [1]. This is a formal, systematic, and transparent approach that traditionally assesses the economic implications of a new intervention through two structured components: cost-effectiveness analysis and budget impact analysis. Cost-effectiveness analysis estimates long-term value relative to additional cost, whereas budget impact analysis assesses the short-term affordability of new therapies [2,3,4]. In addition to these components, several jurisdictions take other factors, such as unmet medical need or equity, into account during the deliberative process, but no generally used quantitative method exists to measure and incorporate these additional factors [5].

With scientific advances in innovative therapies for targeted populations, including those with rare diseases, the traditional approaches to estimating economic value may be considered less relevant (i.e., not sufficient) [6]. One important reason, among many others, is the unique regulatory pathway for these therapies, which results in a scarcity of good clinical and health economic evidence [7]. At the same time, the prices of treatments for rare diseases can be exceptionally high [8]. In combination, these factors make the traditional quantitative HTA methods less applicable [9, 10]. As a result, for several therapies for rare diseases, decisions and the corresponding behaviors have been discordant with traditional HTA results [11]. Therefore, the methods should be updated, and additional value-bearing criteria can be taken into account during the value assessment. Recognized HTA organizations have developed different value frameworks that define the criteria to be considered in a more comprehensive evaluation process [4, 12, 13]. Although these endeavors succeeded in highlighting certain “novel” value criteria (e.g., scientific spillover, equity, real option value), challenges remain in how these criteria fit into the decision-making process in a measurable, structured, and transparent way. Despite these challenges, examples exist of how certain novel value criteria can be incorporated into the HTA process [14].

Multi-criteria decision analysis (MCDA) tools can be developed using value frameworks [15]. MCDA is a structured decision-making process that offers the flexibility to incorporate multiple objectives and criteria into a single evaluation. Research has explored the application of MCDA in coverage and reimbursement decisions for innovative healthcare technologies [16]. Despite early examples of MCDA applications [17], uptake of MCDA by health technology decision makers has remained at the pilot stage [18]. In most published cases, MCDA was developed as a tool that replaced existing practices entirely, incorporating traditional criteria such as clinical effectiveness, safety, quality-adjusted life-year (QALY) gains and, in some cases, economic aspects (i.e., cost effectiveness and budget impact) [19]. Such a holistic approach may introduce ambiguity into the assessment, because individuals are asked to trade off traditional and novel value criteria that overlap across value domains. Nevertheless, MCDA remains a promising tool for assessing novel value criteria that are difficult to measure and typically do not flow into traditional cost-effectiveness and budget impact approaches.
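To make "a single evaluation" concrete, a weighted additive (weighted-sum) model is one common way an MCDA tool can combine criterion scores. The sketch below assumes that model and assumes criteria have already been scored on a common 0-1 scale; the function name, criterion keys, weights, and scores are hypothetical illustrations, not values taken from any reviewed tool.

```python
from typing import Mapping


def weighted_sum(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Aggregate per-criterion scores into one value using normalized weights.

    Assumes every criterion is already scored on a common scale and that
    weights are non-negative; weights are rescaled to sum to 1.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[c] * weights[c] / total_weight for c in scores)


# Hypothetical therapy scored on three novel criteria (0-1 scale).
scores = {"unmet_need": 1.0, "severity": 0.5, "uncertainty_reduction": 0.0}
weights = {"unmet_need": 2.0, "severity": 1.0, "uncertainty_reduction": 1.0}
print(weighted_sum(scores, weights))  # 0.625
```

Other aggregation rules exist; the weighted sum is shown only because it is the simplest to read.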

One of the most researched areas of MCDA application is in the coverage and reimbursement of innovative therapies for rare diseases, especially the assessment of orphan drugs [20,21,22]. Therefore, we selected this therapeutic field for further research.

2 Objectives

The objective of this research was to perform a systematic literature review (SLR) of the criteria and scoring functions applied in value frameworks and MCDA tools relevant to the evaluation of therapies for rare diseases. The research aimed to identify the criteria that map directly onto traditional value criteria, with special attention to those not commonly used in cost-effectiveness and budget impact analyses. The aim was to gain a better understanding of how novel value criteria are measured, using the scales and scoring functions applied to them in the published studies.

To our knowledge, this is the first SLR to systematically collect the criteria and applied scoring functions to support the development of a tool that focuses on value criteria that are not incorporated into traditional HTA.

3 Methods

We performed a systematic review of the scientific and gray literature on value frameworks and MCDA tools relevant to pricing and reimbursement decision making for pharmaceuticals. Value frameworks were defined as comprehensive frameworks used to assess the health economic value of health technologies, focusing mainly on value criteria and their independent evaluation without any aggregation method. MCDA tools address this limitation through explicit scoring and mathematical aggregation of the different value criteria. Structured review papers on rare-disease-specific MCDA articles were also investigated. Information sources included PubMed, Embase, Scopus, and 26 gray literature sources, covering the period from January 2013 to October 2019. Gray literature sources included databases of universities, HTA agencies, and other relevant HTA organizations (see electronic supplementary material [ESM] 1 for the complete list of sources). Our search was limited to papers written in English. The publication date was restricted to 2013 or later to summarize the most recent and relevant evidence. The search strategy was built as a combination of search strings designed to capture all relevant keywords and synonyms that may appear in the papers (see ESM 1 for the detailed search strategies).

Initial title and abstract screening was conducted by NDM and BE, with disputes arbitrated by TZ. Screening was conducted according to hierarchical exclusion criteria, and publications not excluded by any of these criteria proceeded to full-text screening. Full-text screening of relevant articles was conducted separately by NDM and BE, with disputes arbitrated by TZ as a third, senior investigator (see ESM 1 for the detailed literature selection methods). Abstracts identified as posters or podium presentations were screened separately. Studies were included in the final selection if they used explicit scoring functions for the included evaluation criteria and were either orphan drug specific or considered “referenced” general frameworks. Value frameworks were considered “referenced” if they were cited more than once in the relevant orphan-specific papers and/or in previous literature reviews focusing on orphan-drug-specific MCDA methods. Data extracted from the relevant value frameworks included both general article descriptions (authors, affiliation, year, and country of publication) and descriptions of the evaluation criteria (name of criterion, definition, scoring function details, weighting).

Criteria were grouped according to whether they mapped directly onto the value criteria assessed traditionally from the payer perspective. Three highly referenced value frameworks (i.e., the International Society for Pharmacoeconomics and Outcomes Research [ISPOR] value “flower” [12], the ICER value framework [23], and the Second Panel on Cost-Effectiveness impact inventory [13]) were selected to categorize criteria into three groups according to their use in traditional HTA. The terminology of the ISPOR value flower was adopted: (1) core criteria (i.e., criteria used in traditional HTA), (2) common but inconsistently included criteria (i.e., criteria assessed in certain settings/jurisdictions), and (3) novel criteria (i.e., value criteria typically not assessed in traditional HTA) [12].
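The three-group categorization can be pictured as a simple lookup from criterion to category. The sketch below is illustrative only: it encodes just the most frequent criteria per group reported in Tables 1, 2, and 3 of this review (the full 56-criterion mapping is in ESM 2), and the names `Category`, `CRITERION_CATEGORY`, and `group_by_category` are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Category(Enum):
    """Three groups following the ISPOR value flower terminology."""
    CORE = "core"      # used in traditional HTA
    COMMON = "common"  # assessed only in certain settings/jurisdictions
    NOVEL = "novel"    # typically not assessed in traditional HTA


# Illustrative subset: most frequent criteria per group reported in this review.
CRITERION_CATEGORY = {
    "comparative clinical effectiveness": Category.CORE,
    "comparative safety": Category.CORE,
    "health-related quality of life": Category.CORE,
    "adherence-improving factors": Category.COMMON,
    "labor market earnings lost": Category.COMMON,
    "cost of unpaid lost productivity due to illness": Category.COMMON,
    "unmet medical need": Category.NOVEL,
    "severity of disease": Category.NOVEL,
    "reduction in uncertainty": Category.NOVEL,
}


def group_by_category(criteria):
    """Return {Category: [criterion, ...]}; raises KeyError for unmapped criteria."""
    groups = defaultdict(list)
    for name in criteria:
        groups[CRITERION_CATEGORY[name]].append(name)
    return dict(groups)


# Hypothetical tool that scores one criterion from each group.
print(group_by_category(["comparative safety", "labor market earnings lost", "unmet medical need"]))
```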

4 Results

The systematic search yielded a total of 2913 unique records after deduplication of hits from the different scientific databases. Title and abstract screening narrowed the number of relevant publications to 434. After full-text screening, 62 peer-reviewed articles including different value criteria remained. Applying the final inclusion criteria (explicit scoring functions, and a value framework that was either orphan drug specific or referenced) left 11 relevant studies for the final synthesis. In addition to the peer-reviewed publications, 74 conference abstracts were found; of these, 12 relevant posters were reviewed and one was included in the final synthesis (Fig. 1). The three highly referenced guiding value frameworks, namely the ISPOR value flower [12], the ICER value framework [23], and the Second Panel on Cost-Effectiveness Impact Inventory [13], were also included in the narrative synthesis. Altogether, 15 studies were analyzed in this review (11 peer-reviewed publications [24,25,26,27,28,29,30,31,32,33,34], one poster [35], and three guiding frameworks [12, 13, 23]).
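The record counts above can be tallied as a quick consistency check. The numbers are copied from the text; the variable names are ours, and the check assumes only that each screening stage can keep or drop records, never add them.

```python
# Record counts copied from the text; variable names are hypothetical.
peer_reviewed_stages = [
    ("records after deduplication", 2913),
    ("after title/abstract screening", 434),
    ("after full-text screening", 62),
    ("included in final synthesis", 11),
]
conference_abstracts = 74
posters_reviewed, posters_included = 12, 1
guiding_frameworks = 3

# Each screening stage should be no larger than the one before it.
counts = [n for _, n in peer_reviewed_stages]
assert all(later <= earlier for earlier, later in zip(counts, counts[1:]))
assert posters_included <= posters_reviewed <= conference_abstracts

total_analyzed = counts[-1] + posters_included + guiding_frameworks
print(total_analyzed)  # 15
```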

Fig. 1 Flow of information diagram

Of the 15 included publications, eight were orphan drug specific [28,29,30,31,32,33,34,35] and seven were general frameworks [12, 13, 23,24,25,26,27]. Classified by type, eight were value frameworks [12, 13, 23, 25, 27, 28, 32, 33] and the other seven were MCDA tools [24, 26, 29,30,31, 34, 35]. The criteria included in the 15 relevant studies were extracted, and duplicates were eliminated. A thorough evaluation of the definitions and scoring functions of each criterion also revealed substantial overlaps between several criteria. In total, 56 individual value criteria were identified, which were categorized into “core,” “common,” and “novel” value criteria (see ESM 2).
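The two classifications above (orphan-specific vs general, value framework vs MCDA tool) should each partition the same 15 publications. The sketch below checks this with Python sets, using the reference numbers from the text, and cross-tabulates the two classifications; the set names are ours.

```python
# Reference numbers copied from the text above.
orphan_specific = {28, 29, 30, 31, 32, 33, 34, 35}
general = {12, 13, 23, 24, 25, 26, 27}
value_frameworks = {12, 13, 23, 25, 27, 28, 32, 33}
mcda_tools = {24, 26, 29, 30, 31, 34, 35}

# Both classifications should partition the same 15 publications.
included = orphan_specific | general
assert len(included) == 15 and not orphan_specific & general
assert value_frameworks | mcda_tools == included
assert not value_frameworks & mcda_tools

# Cross-tabulate scope (orphan/general) against type (framework/MCDA tool).
for scope_name, scope in [("orphan-specific", orphan_specific), ("general", general)]:
    for type_name, kind in [("value frameworks", value_frameworks), ("MCDA tools", mcda_tools)]:
        print(f"{scope_name} {type_name}: {sorted(scope & kind)}")
```

Read this way, the included set comprises three orphan-specific value frameworks, five orphan-specific MCDA tools, five general value frameworks, and two general MCDA tools.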

The most common criteria considered part of the “core” value criteria and for which a scoring function was published were comparative clinical effectiveness, comparative safety, and health-related quality of life. Some of these criteria overlapped substantially; for example, the four most commonly used individual criteria all contribute to the incremental cost-effectiveness ratio. Table 1 presents the most frequently included core value criteria in orphan-specific, general, and all frameworks (either direct or indirect inclusion).
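For reference, the incremental cost-effectiveness ratio in its standard textbook form is shown below, with subscript 1 denoting the new therapy and 0 its comparator; the notation is ours and is not taken from the reviewed frameworks. Clinical effectiveness, safety, and health-related quality of life largely feed the effect term (typically measured in QALYs), which is why they overlap when scored again as separate MCDA criteria.

```latex
\[
  R = \frac{C_1 - C_0}{E_1 - E_0} = \frac{\Delta C}{\Delta E},
\]
where $C_i$ is the expected total cost and $E_i$ the expected health effect
(e.g., QALYs) of alternative $i$.
```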

Table 1 Summary of the most frequent “core” value criteria (type and frequency of inclusion)

Among criteria considered “common,” the most frequently included were adherence-improving factors, labor market earnings lost, and cost of unpaid lost productivity due to illness (Table 2). Most of these criteria can be quantified and therefore taken into account within traditional cost-effectiveness analyses. However, their structured assessment is limited, especially in terms of the incremental therapy impact. In practice, these value criteria are typically folded into cost-effectiveness evaluations when the analysis uses a societal perspective, but their inclusion is inconsistent. Orphan-drug-specific articles less frequently incorporated these common value criteria into the evaluation process.

Table 2 Summary of the most frequent “common” value criteria (type and frequency of inclusion)

For value criteria categorized as “novel,” the most frequent were unmet medical need, severity of disease, and reduction in uncertainty (Table 3). Orphan-specific and general frameworks and tools were broadly similar in how frequently they included the different criteria in the core, common, and novel categories. However, “size of the population” and “type of therapeutic benefit” were exceptions, as these criteria were more important in orphan-drug-specific frameworks, whereas “level of innovation” and “burden of illness” appeared more frequently in general frameworks.

Table 3 Summary of the most frequent “novel” value criteria (type and frequency of inclusion)

The scoring functions for these novel criteria were investigated to improve our understanding of these value criteria. “Unmet medical need” and “severity of disease” were the two novel criteria found in every orphan-drug-specific value framework and MCDA tool, as they capture crucial aspects of the potential added value of rare disease therapies. The scoring functions provided for these criteria were subjective. For unmet need, the common concept was to give a higher score to interventions intended to treat diseases with no available therapeutic alternatives, but the methods differed. Some MCDA tools used a quantitative scale from 0 to 5, where 0 points were given for “no unmet needs” and 5 points for “many unmet needs” [34]. Other tools used 3-point scales ranging from “no unmet need” or “available alternatives” to “high unmet need” or “no available alternatives” [31]. For “disease severity,” Iskrov et al. [29] proposed a scoring function in which chronic life-threatening disorders received a higher score than acute diseases and chronic non-life-threatening diseases. However, the authors noted that whether chronic life-threatening diseases should score higher than acute diseases on the severity criterion was debated. Hughes-Wilson et al. [28] defined a three-level severity scale in which diseases with increased mortality or severe invalidism in infancy received the highest score, those with increased mortality or severe invalidism in adulthood the second level, and diseases that affect patient morbidity rather than mortality a low score. Simple scales from not severe to very severe (scored from 0 to 5) also appeared in the frameworks [26]. These examples highlight the heterogeneity of the current measurement of these criteria.
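These scoring functions are ordinal scales of different lengths. As a minimal sketch, assuming a linear rescaling onto a common 0-1 range (our assumption; the reviewed studies did not specify one), the 0-5 and 3-point unmet-need scales and the Hughes-Wilson severity levels could be placed on a comparable footing. The integer values for the severity levels and the middle label of the 3-point scale are hypothetical; only the orderings come from the text.

```python
from enum import IntEnum


class HughesWilsonSeverity(IntEnum):
    """Ordering of the three severity levels described for Hughes-Wilson et al. [28].

    Only the ordering comes from the text; the integer values 1-3 are assumed.
    """
    MORBIDITY_RATHER_THAN_MORTALITY = 1
    MORTALITY_OR_INVALIDISM_IN_ADULTHOOD = 2
    MORTALITY_OR_INVALIDISM_IN_INFANCY = 3


# 3-point unmet-need scale; the middle label is hypothetical.
UNMET_NEED_3_POINT = {
    "available alternatives": 1,
    "some alternatives (hypothetical middle level)": 2,
    "no available alternatives": 3,
}


def rescale(points: float, lowest: float, highest: float) -> float:
    """Linearly map a score on [lowest, highest] onto [0, 1]."""
    if not lowest <= points <= highest:
        raise ValueError(f"{points} outside [{lowest}, {highest}]")
    return (points - lowest) / (highest - lowest)


# 0-5 unmet-need scale: 0 = "no unmet needs", 5 = "many unmet needs".
print(rescale(4, 0, 5))  # 0.8
# 3-point unmet-need scale.
print(rescale(UNMET_NEED_3_POINT["no available alternatives"], 1, 3))  # 1.0
# Hughes-Wilson severity: the adulthood level sits halfway up the scale.
print(rescale(HughesWilsonSeverity.MORTALITY_OR_INVALIDISM_IN_ADULTHOOD, 1, 3))  # 0.5
```

A linear mapping treats the gaps between ordinal levels as equal, which none of the reviewed studies justified; it is used here only to make the differing scale lengths visible.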

The “size of the population” criterion is especially interesting because, unlike general value frameworks, which usually give higher scores to more common diseases, orphan-specific value frameworks give higher scores to diseases with lower prevalence, so an ultra-rare disease would receive the highest score on this criterion [29, 31, 34]. However, none of the frameworks investigated provided exact disease incidence thresholds. Methods of evaluating the criterion “level of innovation” also differed and included items such as the number of new indications, stereochemical properties, or status as an advanced therapy medicinal product.
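Because no thresholds were provided, any concrete scoring function for this criterion has to assume them. The sketch below is hypothetical throughout: the 1-3 scale, the function name, and the ultra-rare cut-off are assumptions, and the rare-disease cut-off borrows the EU orphan-designation prevalence ceiling of 5 in 10,000 purely for illustration. It shows how orphan-specific and general frameworks reverse the direction of scoring.

```python
# Hypothetical prevalence cut-offs, per 10,000 population. The reviewed
# frameworks gave no thresholds; these values are assumptions for illustration.
ULTRA_RARE_MAX = 0.2  # about 1 in 50,000 (assumed cut-off)
RARE_MAX = 5.0        # 5 in 10,000, the EU orphan-designation prevalence ceiling


def population_size_score(prevalence_per_10k: float, orphan_specific: bool = True) -> int:
    """Score 'size of the population' on a hypothetical 1-3 scale.

    Orphan-specific frameworks reward lower prevalence (ultra-rare = 3);
    general frameworks usually reward higher prevalence, so the order is reversed.
    """
    if prevalence_per_10k < 0:
        raise ValueError("prevalence cannot be negative")
    if prevalence_per_10k <= ULTRA_RARE_MAX:
        orphan_score = 3
    elif prevalence_per_10k <= RARE_MAX:
        orphan_score = 2
    else:
        orphan_score = 1
    return orphan_score if orphan_specific else 4 - orphan_score


print(population_size_score(0.1))                         # 3 (ultra-rare)
print(population_size_score(0.1, orphan_specific=False))  # 1
```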

For other novel criteria (i.e., reduction in uncertainty, type of therapeutic benefit, burden of illness, equity/support vulnerable groups, manufacturing complexity), the identified scoring functions were highly heterogeneous and specific to the individual framework or study. Unified scales were not used for any of these criteria (see ESM 2 for the detailed scoring functions included in the studies). Moreover, none of the included MCDA tools or value frameworks described the justification for selecting any of the scoring functions.

5 Discussion

In this SLR, a large number (n = 62) of scientific publications relevant to the evaluation of therapies for rare diseases were identified. However, only a small portion of these studies (12 of 62) published any recommendations on the scoring functions of the included evaluation criteria. Review of these studies and the three selected guiding frameworks identified a large number of individual criteria, which overlapped substantially across the investigated frameworks and MCDA tools. These observations are in line with the findings of previous studies [22, 36, 37]. In an SLR, Baran-Kooiker et al. [19] also emphasized the importance of simplifying the MCDA methodology and having fewer, well-defined criteria with less overlap.

It was a complex exercise to decide whether a criterion is commonly considered in traditional cost-effectiveness or budget impact evaluations. Although such a distinction is inherently arbitrary, and countries have different practices for conducting these evaluations, it is still beneficial to place the identified value criteria along the ISPOR value flower spectrum, from “core” and “common” to “novel,” based on their application as recommended in generally accepted guidelines. Novel criteria carry more uncertainty during the evaluation because less evidence is available on their assessment methods. For these novel criteria, a well-designed MCDA tool could add substantial benefit by establishing a transparent and reproducible evaluation methodology.

An important observation was that criteria relating more to the cost side, and therefore easier to monetize, were considered more frequently within traditional evaluations (especially when the broader societal perspective was used), whereas value criteria relating more to the outcomes side that could not easily be translated into QALYs were not included. Such outcome criteria are usually considered “novel criteria.” For these value criteria, the identified scoring functions may support more transparent use within MCDA frameworks, especially as the assessment of novel value criteria remains less standardized and is therefore not yet part of a structured evidence-generation process.

There was substantial variation in the assessment methods and scoring functions of the included value criteria, and no unified scoring function was used for any of the criteria presented in the studies. The specific application may influence the selection of the most applicable, quantifiable, and objective scoring function. Most value frameworks and MCDA tools did not describe the rationale for choosing a specific scoring function for a criterion; however, in some cases (e.g., “size of population”), key differences between the scoring functions used in orphan-specific and general value frameworks were rationalized. This missing justification is a significant limitation of the included studies: it decreases their transparency and underlines the uncertainties around the evaluation of these novel value criteria.

Investigating scoring functions supports a better understanding of novel value criteria and can justify their application in MCDA tools. In addition, the weighting of criteria depends on the applied scales, so credible and transparent scoring functions need to be developed and made explicit before the weighting exercise [38]. The ultimate goal of this research was to facilitate the development of well-designed MCDA tools by systematically collecting the existing criteria and their scoring functions, with special focus on novel value criteria. Tables 1, 2, and 3 highlight the most frequently used criteria, and ESM 2 provides the pool of existing scoring functions. These results may provide substantial support for the development of MCDA tools in terms of the selection and measurement of different value criteria.
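Why weights depend on scales can be shown with a small hypothetical example: two criteria with equal stated weights, one scored 0-5 and one scored 1-3, and a therapy at the top of both. The function name, criterion keys, and values are assumptions for illustration; min-max normalization is one common way of fixing the scales before weighting, not a method prescribed by the reviewed studies.

```python
def contributions(raw, ranges, weights, normalize):
    """Weighted contribution of each criterion, optionally min-max normalized."""
    result = {}
    for criterion, value in raw.items():
        lo, hi = ranges[criterion]
        score = (value - lo) / (hi - lo) if normalize else value
        result[criterion] = weights[criterion] * score
    return result


# Hypothetical therapy at the top of both scales, with equal stated weights.
raw = {"unmet_need": 5, "severity": 3}
ranges = {"unmet_need": (0, 5), "severity": (1, 3)}
weights = {"unmet_need": 0.5, "severity": 0.5}

print(contributions(raw, ranges, weights, normalize=False))  # {'unmet_need': 2.5, 'severity': 1.5}
print(contributions(raw, ranges, weights, normalize=True))   # {'unmet_need': 0.5, 'severity': 0.5}
```

Without normalization, equal stated weights still let the 0-5 criterion dominate simply because its scale is longer; fixing the scales first makes the elicited weights carry their intended meaning.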

An important limitation of the current review is that, despite the broad literature search strategy, relevant papers might have been missed, e.g., if a tool was developed for evaluating a specific compound or therapies for a specific rare disease and the general search terms did not capture it. Moreover, the time frame investigated was kept relatively short to maintain the feasibility of the research. A further limitation is that no formal analysis was performed to judge the appropriateness of the scoring functions identified for the different value criteria.

6 Conclusion

Our results confirmed that relying only on the traditional HTA criteria found in cost-effectiveness and budget impact analyses could miss components of health economic value. Our review of the current literature shows that more evidence is being published on the scope of relevant value criteria. We identified several novel value criteria, but the methods of evaluating these criteria were not sophisticated, and rationales for the applied scoring functions were missing. Therefore, questions remain about which novel components of value are most helpful to measure and how to measure them.

MCDA is a promising tool for implementing novel value criteria in the evaluation process for orphan drugs. To support the development of a transparent and justified evaluation process, the measurement methods and scoring functions for novel value criteria should be further investigated. Future work in this line of research will support the development of an MCDA framework for the evaluation of novel value criteria, to be used as a supplement to traditional cost-effectiveness evaluations in HTA.