Key Summary Points

This is the first systematic review to comprehensively evaluate existing economic evaluations in relapsing multiple sclerosis (RMS) across the world, and to provide pragmatic recommendations for future economic models.

The cost-utility models in RMS are mostly constructed using a Markov cohort model design, and we recommend continuing to use this structure as it is appropriate and widely accepted by Health Technology Assessment (HTA) bodies across the world.

The existing economic models are based entirely on the Expanded Disability Status Scale (EDSS), which is a physical disability scale and hence does not capture other clinically relevant outcomes such as cognition. We recommend incorporating such outcomes in future models to make them more clinically relevant.

The data sources for the models should be chosen carefully such that they reflect the current disease course and management paradigm for multiple sclerosis.

Introduction

Multiple sclerosis (MS) is an inflammatory demyelinating disease of the central nervous system with a variable clinical course. The disease can be broadly classified into relapsing and progressive forms of MS [1]. Relapsing forms of MS (RMS) encompass all patients with MS who experience relapses, such as those with clinically isolated syndrome, relapsing–remitting MS (RRMS), and relapsing secondary progressive MS (SPMS), while progressive forms of MS include non-relapsing SPMS and primary progressive MS (PPMS).

Currently, there are 18 disease-modifying therapies (DMTs) approved by the European Medicines Agency or the US Food and Drug Administration (FDA) for the treatment of MS. All DMTs for MS are indicated for the treatment of patients with relapsing forms of MS in the USA, and the labeled indication varies in the European Union according to the disease activity, with the exception of mitoxantrone that is approved for progressive MS [2]. Ocrelizumab is the only DMT approved for PPMS [3], while siponimod and diroximel fumarate are the only DMTs approved for active SPMS as well as RMS [4, 5]. Considering only a few DMTs are approved for use in progressive forms of MS, this review focuses on RMS for which several different DMTs are available. Furthermore, the introduction of new DMTs over the last two decades has prompted economic evaluations to estimate the economic benefits, long-term clinical benefits, and consequently, the value for money of DMTs, with the overall aim to inform healthcare decision-making and funding decisions [6].

Therefore, it is hardly surprising that economic evaluations of DMTs have recently been the topic of systematic reviews, albeit with different areas of focus. One review focused on evaluating the quality of economic evaluations using available instruments while another specifically reviewed cost-effectiveness analyses employing a long-term time horizon [6, 7]. Other reviews were particularly interested in aspects related to modeling techniques [8,9,10,11,12]. Of particular interest, one review identified the economic modeling methods, data sources, and assumptions of the cost-effectiveness analyses of DMTs; however, this economic modeling-focused review was limited to studies conducted in the UK and the RRMS population [8]. Therefore, it is important to include models developed outside the UK in this systematic review to paint the complete picture of economic evaluation studies of DMTs for RMS that have been undertaken globally. Moreover, as economic evaluations of DMTs play a significant role in informing healthcare decision-making and thus impacting patient care in RMS, performing a systematic review in this area will improve understanding of the differences and similarities across models, especially in terms of model structure and assumptions. Subsequently, this will help to identify areas for improvements and developments in future economic models in RMS, such that the evidence informing decision-makers and payers can be presented in a consistent manner and fully represent the value of DMTs in RMS.

The aim of this systematic literature review (SLR) was to evaluate the modeling approach and data sources used in economic evaluation of DMTs for RMS to identify differences and similarities between models, as well as the evolution of models in RMS over time. This SLR builds upon the previous systematic reviews of economic evaluations of DMTs in MS by expanding the review to include studies in the RMS population regardless of the country of study.

Methods

An SLR was performed to identify the evidence pertaining to economic evaluations of DMTs in RMS using a robust methodology as recommended by the National Institute for Health and Care Excellence (NICE) [13]. The search and reporting of the systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [14], and a protocol with predefined inclusion and exclusion criteria was developed. The SLR was subsequently updated in January 2021 to incorporate recent studies. The update follows the inclusion and exclusion criteria of the parent SLR. The protocol summary, search terms, and search dates of both the parent SLR and the SLR update are presented in the Electronic Supplementary Material (ESM).

The search for economic evaluation studies of adults with RMS (aged ≥ 18 years) published in English was conducted in several electronic databases, such as MEDLINE®, Embase®, and Evidence-Based Medicine (EBM) Reviews through the Ovid® platform. In the parent SLR, these databases were searched from the date of inception up to 20 December 2019 and in the SLR update, the database review was up to 15 February 2021. In addition, the Database of Abstracts of Reviews of Effects (DARE), National Health Service Economic Evaluation Database (NHS EED), and Health Technology Assessment (HTA) database were searched through the Centre for Reviews and Dissemination (CRD) York database from the date of inception up to 5 March 2020. These databases were not searched in the SLR update in 2021 as the DARE and NHS EED databases were not updated after 31 March 2015 and the HTA database was not updated after 31 March 2018. In addition, congress abstracts, HTA agency websites, the Cost-Effectiveness Analysis (CEA) registry, the University of Sheffield Health Utilities Database (HUD), the EQ-5D Publications Database, and bibliography of relevant reviews were hand searched in both the parent SLR and SLR update to include relevant studies.

Each citation was screened by one reviewer, and the decisions were validated by a second reviewer. Citations that did not match the eligibility criteria were excluded at “first pass”; where unclear, citations were included. Duplicates of citations were also excluded at the first-pass stage. In the second stage, each full text was screened by one reviewer, and the decisions were validated by a second reviewer. Data from the included studies were extracted into the data extraction sheet by a single reviewer, and the quality and completeness of the data were thoroughly checked by a second reviewer. All the extracted studies were critically appraised using the Drummond [15] and Philips [16] checklists.

The study did not require informed consent or institutional review board approval, as no identifiable patient information was extracted. This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Results

The database searches in the parent SLR retrieved 1400 citations. After initial screening of titles and abstracts, 398 articles were selected for full-text screening and, finally, 119 studies from 142 publications, one publication linked to a National Institute for Health and Care Excellence (NICE) HTA report, one publication by the Norwegian Institute of Public Health (NIPH) HTA, and 30 HTA reports were included in the parent SLR (Fig. 1). The database searches in the SLR update retrieved 175 citations. After first screening of titles and abstracts, 146 articles were excluded and 29 articles were selected for full-text screening. Subsequently, 11 studies passed the full-text screening, of which two studies [17, 18] were linked to two studies included in the parent SLR [19, 20], and one conference abstract [21] was linked to a full-text journal article [22]. Finally, eight studies from nine publications were included in the SLR update. Cumulatively, 155 publications and 30 HTAs from the parent SLR and SLR update were included (Fig. 1). When reporting the results, the publication linked to the NIPH HTA is categorized as an HTA, therefore the number of HTAs considered in the Results section is 31. Critical appraisal of the included studies was conducted using the Drummond [15] and Philips [16] checklists based on the recommendations in the NICE guidelines.

Fig. 1

PRISMA diagram for study identification in the SLR and cumulative number of included studies. AWMSG All Wales Medicines Strategy Group, CA conference abstract, CADTH Canadian Agency for Drugs and Technologies in Health, CEA cost-effectiveness analysis, CRD Centre for Reviews and Dissemination, HAS Haute Autorité de Santé, HTA Health Technology Assessment, ICER Institute for Clinical and Economic Review, MS multiple sclerosis, NICE National Institute for Health and Care Excellence, NIPH The Norwegian Institute of Public Health, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, SLR systematic literature review, SMC Scottish Medicines Consortium

Critical Appraisal

The full results of the critical appraisal of included studies and HTAs using the Drummond and Philips checklists are provided in the ESM.

In summary, the results from the Drummond checklist show that 93% of studies met the criteria on study design, and at least 50% of studies met the criteria on reporting methods of data collection. However, the criteria on reporting the methods of evidence synthesis, the relevance of including productivity changes, and reporting the quantities of resource use separately were not met by 83% of the studies. Critical appraisal of HTAs using the Drummond checklist shows that 83% of HTAs met the criteria on study design and that at least 50% of HTAs met 11 criteria on analysis and interpretation of results. However, several criteria on data collection, as well as the criteria on reporting productivity changes separately, reporting resource use quantities separately, addressing generalizability issues, and justifying the choice of discount rate, were met by less than 50% of HTA reports.

The results from the Philips checklist show that at least eight criteria were met by at least 50% of studies and HTA reports. The criteria on half-cycle correction, assumption on continuation of treatment effect, and model consistency were met by less than 50% of studies and HTAs.

Study Characteristics

Overall, 73 studies and 25 HTA reports were cost-utility analyses, 16 studies were cost-effectiveness analyses, five studies and three HTAs were cost-minimization analyses [23,24,25,26,27,28,29,30], and six studies were budget impact analyses [31,32,33,34,35,36]. Furthermore, 17 studies comprised multiple economic evaluations [18, 22, 37,38,39,40,41,42,43,44,45,46,47,48,49,50,51], six were cost-consequences analyses [52,53,54,55,56,57], two were cost-saving analyses [58, 59], one was a cost–benefit analysis [60], and one was a cost-offset analysis [61]. A complete list of the type of economic evaluation employed by the studies is available in the ESM.

Of the 81 studies reporting their funding source, the majority were funded by pharmaceutical companies (n = 65). Seven studies were funded by government institutions [40, 62,63,64,65,66,67], only one study was funded by a university [42], and eight studies received no funding [43, 68,69,70,71,72,73,74].

Geographically, of the 127 included studies, the largest number were from the USA (n = 29), followed by the UK (n = 17) and Spain (n = 10) [24, 50, 53, 59, 75,76,77,78,79,80]. HTA reports were mostly from Scotland (n = 11) [29, 30, 81,82,83,84,85,86,87,88,89], followed by England (n = 8) [90,91,92,93,94,95,96,97], and Canada (n = 7) [28, 98,99,100,101,102,103] (Fig. 2). Eight HTA reports from Australia (Pharmaceutical Benefits Advisory Committee) were identified; however, they were not included in this review as the information in the reports was redacted. A complete list of the country of origin of studies is available in the ESM.

Fig. 2

Geographic region of included studies and HTA reports. Others* refers to the European countries of Austria (n = 1), Bulgaria (n = 2), Cyprus (n = 1), Czech Republic (n = 1), Finland (n = 1), France (n = 2), Germany (n = 2), Ireland (n = 1), Norway (n = 1), Russia (n = 1), Serbia (n = 1). Others** refers to the Asian countries of China (n = 1), Kazakhstan (n = 1), Lebanon (n = 1), Saudi Arabia (n = 2), South Korea (n = 1), Thailand (n = 1). Others*** refers to the countries of Argentina (n = 1), Brazil (n = 2), Chile (n = 1), Colombia (n = 4), Egypt (n = 1), Peru (n = 1)

Modeling Approach

Most studies (n = 94) and HTA reports (n = 25) constructed Markov cohort models. The remaining studies used spreadsheet-based cohort models (n = 9) [38, 39, 69, 104,105,106,107,108,109], discrete event simulation (DES; n = 4) [110,111,112,113], a simulated decision tree (n = 1) [114], individual-level simulation models (n = 3) [37, 40, 57, 115], and microsimulation (n = 1) [116]. A consistent approach was seen in most studies that constructed Markov cohort models, wherein models were structured per the Expanded Disability Status Scale (EDSS), a widely used scale to quantify disability in MS. The complete data table is presented in the ESM. Categories of the model structure discussed in this review were identified by extracting the information explicitly being presented by the author and categorizing it into groups that fit best. The categories referred to in this review were adapted from the previous work by Brennan et al. [117], which separated cohort- and individual-level models, and Salleh et al. [118], which distinguished different simulation modeling techniques, when applicable. The definition of model types considered in this review is presented in the ESM.
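To make the cohort approach concrete, the following minimal sketch in Python propagates a cohort through a three-state Markov model. The state names, the transition probabilities, and the function name `run_cohort` are placeholders chosen for illustration only; they are not taken from any study or HTA report included in this review.

```python
# Illustrative Markov cohort trace. All numbers are placeholders,
# not estimates from any included study.

def run_cohort(start, transition, n_cycles):
    """Return the cohort trace: the state distribution at cycles 0..n_cycles."""
    for row in transition:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    trace = [list(start)]
    n_states = len(start)
    for _ in range(n_cycles):
        current = trace[-1]
        # Proportion in state j next cycle = sum over i of (proportion in i) * P[i][j].
        nxt = [sum(current[i] * transition[i][j] for i in range(n_states))
               for j in range(n_states)]
        trace.append(nxt)
    return trace

STATES = ["mild disability", "severe disability", "death"]  # hypothetical states
P = [
    [0.90, 0.08, 0.02],  # from mild disability
    [0.00, 0.93, 0.07],  # from severe disability
    [0.00, 0.00, 1.00],  # death is absorbing
]

trace = run_cohort([1.0, 0.0, 0.0], P, n_cycles=2)
for cycle, distribution in enumerate(trace):
    print(cycle, [round(x, 4) for x in distribution])
```

In an EDSS-based model the same loop would apply to a larger transition matrix, with one row per health state.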

Markov Model Health States

The most common Markov model structure consists of 21 health states: ten EDSS-based states each in RRMS and SPMS and one death state (20 studies, 12 HTA reports). Fourteen studies constructed a seven-state Markov model: four EDSS states in RRMS, two relapse states, and one death state. In the seven-state Markov model, EDSS scores were grouped into few limitations in mobility (EDSS 0.0–2.5), moderate limitations in mobility (EDSS 3.0–5.5), walking aid or wheelchair required (EDSS 6.0–7.5), and restricted to bed (EDSS 8.0–9.5), with death corresponding to EDSS 10 or natural causes of death. Nine studies [47, 78, 119,120,121,122,123,124,125] and three HTA reports [88, 97, 103] constructed models consisting of 11 health states: ten EDSS states (EDSS 0–9) and one death state. The remaining studies had a Markov model in which the number of states varied between three and 21. A complete list of Markov model health states constructed in the studies is available in the ESM.
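As an illustration only, the state layouts described above can be written down explicitly. The Python sketch below enumerates the 21-state structure (EDSS 0 to 9 in each of RRMS and SPMS, plus death) and maps an EDSS score to the mobility groupings of the seven-state model. The function names and state labels are hypothetical and do not come from any of the included models.

```python
# Illustrative sketch; function names and labels are hypothetical.

def build_21_states():
    """Ten EDSS states (0-9) in each of RRMS and SPMS, plus one death state."""
    states = [f"{course} EDSS {level}"
              for course in ("RRMS", "SPMS")
              for level in range(10)]
    states.append("Death")
    return states

def mobility_group(edss):
    """Map an EDSS score to the disability grouping used in the seven-state model."""
    if not 0.0 <= edss <= 10.0:
        raise ValueError("EDSS must lie between 0 and 10")
    if edss <= 2.5:
        return "few limitations in mobility"
    if edss <= 5.5:
        return "moderate limitations in mobility"
    if edss <= 7.5:
        return "walking aid or wheelchair required"
    if edss <= 9.5:
        return "restricted to bed"
    return "death"

print(len(build_21_states()))
print(mobility_group(6.5))
```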

A Markov model of 21 health states stems from the model evaluating interferons and glatiramer acetate by the School of Health and Related Research of the University of Sheffield (ScHARR) [126] using 0.5-point increments in the EDSS. A submission to NICE for natalizumab [127] and a publication by Gani et al. [128] were the first to illustrate the schematic of this model. This model structure was adapted by subsequent economic evaluation studies, especially in the UK, as well as in HTA submissions to NICE.

Models that included 11 health states did not include SPMS-specific health states on the basis that conversion to SPMS can occur at a range of EDSS scores, patients of both subtypes are reported at many EDSS levels, and mean costs and quality of life are not dependent on MS subtypes [121, 129,130,131,132]. These arguments were supported by recent natural history cohort studies, i.e., the Swedish MS cohort and British Columbia MS cohort [130, 133], and by results from observational studies in Sweden, Germany, and the UK [129, 131, 134].

The seven-state Markov model originated from a US study by Prosser et al. [135] and consists of four EDSS states, two relapse states, and one death state. The EDSS states pooled the RRMS, SPMS, and RMS populations because the authors considered MS to be a disease with broad activity, of which disability survival curves do not differ by MS type at the time of diagnosis, and health states in terms of costs and quality of life do not differ between MS subtypes [135, 136].

Time Horizon and Cycle Length

The time horizon in the models varied considerably, ranging from 1 year to a lifetime horizon. Most models used the lifetime horizon (n = 37), followed by 50-year (n = 22) and 10-year (n = 21) time horizons. DES models appear to be consistent, as three of four DES models used the lifetime horizon [110, 112, 113]. The most common cycle length used by the models was 1 year (n = 63), followed by 3 months (n = 12) and then 1 month (n = 10) [42, 43, 50, 64, 68, 71, 73, 76, 137, 138]. A table of the time horizon and cycle length data from all included studies is available in the ESM.
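The choice of cycle length determines both the number of model cycles and the per-cycle transition probabilities. The Python sketch below shows one common way of relating the two. It assumes a constant hazard within each year when rescaling an annual probability, which is an assumption of this sketch rather than a method reported by the included studies; the function names and the example probability are hypothetical.

```python
import math

def number_of_cycles(horizon_years, cycle_length_years):
    """Number of cycles needed to cover the time horizon."""
    # Rounding guards against floating-point noise before taking the ceiling.
    return math.ceil(round(horizon_years / cycle_length_years, 9))

def rescale_probability(p_annual, cycle_length_years):
    """Convert an annual probability to a per-cycle probability.

    Assumes a constant hazard over the year (an assumption of this sketch).
    """
    if not 0.0 <= p_annual <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return 1.0 - (1.0 - p_annual) ** cycle_length_years

# A 50-year horizon with 3-month cycles.
print(number_of_cycles(50, 0.25))
# Hypothetical annual probability rescaled to a 3-month cycle.
print(rescale_probability(0.10, 0.25))
```

Compounding the rescaled probability over the four quarters of a year recovers the original annual probability, which is a quick consistency check on any such conversion.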

Model Inputs

Natural History of the Disease

This review found that studies referred to data from clinical trials, MS natural history studies, or both. Overall, across regions, more studies referred to natural history studies than to clinical trials, as shown in Fig. 3.

Fig. 3

Sources of the natural history of multiple sclerosis data in economic evaluations included in this review. “Others” refers to Asia (China, Iran, Saudi Arabia, Thailand), Brazil, Canada. DMT Disease-modifying therapy, RMS relapsing MS

In total, 40 studies sourced data from natural history studies, 11 studies supplemented data from published natural history studies with data from clinical trials [51,52,53, 107, 139,140,141,142,143,144,145], and nine studies referred to clinical trials alone [67, 75, 116, 130, 132, 146,147,148,149]. Three studies extracted data from previously published economic evaluations [22, 125, 150].

Overall, the most common source of the natural history data was the London Ontario data set (n = 33). Another source was the British Columbia data set, which was used by 13 studies [42, 46, 77, 78, 119,120,121, 124, 144, 151,152,153,154]. Among these, four studies [42, 119, 144, 152] coupled the data with the London Ontario data set. Seven studies supplemented the London Ontario data set with a Swedish registry data set, three of which were US studies [71, 137, 155], while the remaining were studies from Europe (n = 2) [50, 76] and Iran (n = 2) [43, 156]. Two studies included country-specific registry data together with the London Ontario data set: one Italian study [157] and one Finnish study [158].

The most common approach in the HTA models was to supplement data from published studies with those from clinical trials (n = 12). Of these, nine HTA models used the London Ontario data set [85, 90, 92,93,94, 99, 100, 159, 160] while the remaining used the British Columbia data set (n = 1) [101] or both (n = 2) [95, 102]. Other HTAs sourced data from the British Columbia data set alone (n = 6) [88, 89, 95,96,97, 103], the London Ontario data set alone (n = 4) [91, 98, 161, 162], or randomized controlled trials (RCTs; n = 3) [86, 91, 110].

Treatment Effect

Most studies sourced treatment-effect data directly from RCTs (n = 43). Fewer studies sourced data from published evidence synthesis [network meta-analysis (NMA)/mixed treatment comparison (MTC)/indirect treatment comparison (ITC)/SLR] (n = 23) and previously published economic evaluations (n = 8) [21, 27, 64, 153, 156, 163,164,165]. Meanwhile, most HTA reports derived estimates by synthesizing evidence (NMA/MTC/ITC/SLR; n = 24). Only seven HTAs sourced treatment-effect data directly from RCTs [28, 30, 81, 84, 95, 100, 110].

Efficacy Waning

Overall, 12 studies assumed a treatment-waning effect, following the approach advocated by NICE in some of their previous appraisals. These studies assumed that the treatment effect would wane by 25% after 2 years and by 50% after 5 years of treatment. These studies originated from the UK (n = 4) [113, 121, 151, 166], Europe (n = 3) [31, 154, 167], Iran (n = 2) [144, 168], Canada (n = 1) [124], USA (n = 1) [119], and Chile (n = 1) [48]. Similarly, the base-case analysis of ten HTAs in the UK [83, 87, 88, 91, 92, 94,95,96,97, 110] assumed that the treatment effect would fall to 75% after 2 years of treatment and to 50% after 5 years of treatment. Meanwhile, six HTAs did not assume efficacy waning in their base-case analysis yet included it in the sensitivity analysis [81, 86, 89, 93, 96, 159]. Moreover, 14 studies assumed a sustained treatment effect over time.
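The waning schedule described above (full effect initially, 75% of the effect after 2 years, 50% after 5 years) can be expressed as a simple step function. The Python sketch below is illustrative only: the treatment of the boundary years, the way the retained effect is applied to a hazard ratio, and the example hazard ratio of 0.6 are all assumptions of this sketch, not details reported by the included studies.

```python
def remaining_effect(years_on_treatment):
    """Fraction of the full treatment effect retained.

    Assumption of this sketch: the step down occurs strictly after
    year 2 and strictly after year 5.
    """
    if years_on_treatment < 0:
        raise ValueError("time on treatment cannot be negative")
    if years_on_treatment <= 2:
        return 1.0
    if years_on_treatment <= 5:
        return 0.75
    return 0.50

def waned_hazard_ratio(hazard_ratio, years_on_treatment):
    """Shrink the hazard ratio towards 1 (no effect) as the effect wanes.

    Applying waning on the (1 - HR) scale is one possible convention,
    assumed here for illustration.
    """
    return 1.0 - (1.0 - hazard_ratio) * remaining_effect(years_on_treatment)

# Hypothetical hazard ratio of 0.6 for disability progression.
for year in (1, 3, 6):
    print(year, round(waned_hazard_ratio(0.6, year), 3))
```

A model assuming a sustained treatment effect corresponds to `remaining_effect` returning 1.0 at all times, which makes the two assumptions easy to swap in a sensitivity analysis.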

Utility Values

Studies tended to use utility data available in the literature, while some HTAs coupled the literature data with data from clinical trials. Thirteen studies in the UK [37, 46, 49, 63, 70, 112, 113, 128, 144, 146, 151, 166, 169], seven studies in the USA [40, 41, 71,72,73, 114, 137], and 11 studies in Europe [50, 76, 130, 132, 140, 142, 145, 147, 158, 170, 171] referred to published country-specific studies to derive utility inputs. Country-specific studies were used whenever available, while data from published studies from other countries were used in 12 studies [40, 64, 66, 74, 114, 119, 124, 149, 150, 154, 156, 163] and five HTAs [98,99,100, 102, 161]. The most common publication referred to by the UK studies was that of Orme et al. [172] (n = 8) [46, 49, 70, 112, 113, 128, 144, 166], while in the USA, the most common publication referred to was that of Prosser et al. [173] (n = 4) [71,72,73, 137]. Another approach was to use data from clinical trials to supplement data from utility studies (n = 9) [67, 78, 121, 125, 141, 143, 157, 167, 174].

Models in HTAs in the UK sourced utility data from either published studies supplemented with RCT data (n = 12) [83, 85, 86, 88, 89, 91, 92, 94,95,96,97, 159] or published studies alone (n = 10) [81, 87, 90,91,92,93, 95, 110, 160, 161]. NICE HTA reports consist of the submitted model by the manufacturer and the model developed by the NICE Evidence Review Group; therefore, the same citation can be reported multiple times. One UK study in particular became the main reference for UK HTAs [172], as this study estimated utility values from the UK MS Trust Database. Most HTAs outside of the UK sourced utility value data directly from published studies (n = 7). Among these seven HTAs, four [98, 100, 102, 161] referred to studies conducted in other countries, particularly a UK study [172]. In addition, one HTA from Canada [101] and one from the USA [162] combined data from studies with those of clinical trials.

Mortality

Overall, 38 studies reported the source of mortality data inputs. While 19 studies adjusted data from the national life table for MS-specific mortality risk using mortality multipliers from published studies, 18 studies used data from the national life table of the general population (age- and gender-adjusted) without adjusting for MS. The published study most commonly referred to was that of Pokorski [175] (n = 14) [67, 74, 119, 128, 138, 139, 141, 143, 145, 150, 157, 165, 166, 174]. Other published mortality studies referred to were those of Hirst et al. [176], which was referred to by one study [152], Cutter et al. [177], which was referred to by one study [120], and Sadovnick et al. [178], which was referred to by two studies [154, 166]. Three studies used specific mortality data of an MS population: one study [41] sourced mortality data from pivotal clinical trials of interferon beta-1b [179, 180], one [158] from the Finland MS registry [181], and one [130] from the Danish MS registry [182].

Most HTA models adjusted data from the respective national life table using the MS mortality multiplier from the Pokorski study [175] (n = 11) [90,91,92,93,94,95,96, 100,101,102, 162]. Three HTAs used the MS mortality multiplier from other published studies, with one HTA [160] using the mortality multiplier from the study by Leray et al. [183], one HTA [99] using the MS mortality multiplier from the study by Sadovnick et al. [178], and the last [103] using the mortality multiplier from the study of Kingwell et al. [184]. Two HTAs [98, 161] reported no adjustment to the mortality rate of the general population.
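To illustrate how a mortality multiplier might be combined with a national life table, the Python sketch below adjusts a general-population annual probability of death. It applies the multiplier to the underlying mortality rate rather than directly to the probability; this is one possible convention and is an assumption of this sketch, since the included reports do not all describe how the multiplier was applied. The life-table value and the multiplier used in the example are placeholders, not values from Pokorski [175] or any other cited study.

```python
import math

def ms_adjusted_mortality(q_general, multiplier):
    """Annual probability of death for an MS cohort.

    q_general:  general-population annual probability of death (life table).
    multiplier: MS-specific mortality multiplier (placeholder input).
    The multiplier is applied on the rate scale (assumption of this sketch).
    """
    if not 0.0 <= q_general < 1.0:
        raise ValueError("q_general must lie in [0, 1)")
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    rate = -math.log(1.0 - q_general)       # probability -> annual rate
    return 1.0 - math.exp(-multiplier * rate)  # adjusted rate -> probability

# Placeholder inputs for illustration only.
print(round(ms_adjusted_mortality(0.01, 2.0), 4))
```

Working on the rate scale keeps the adjusted probability below 1 even when a large multiplier is applied to the high mortality probabilities seen at older ages.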

Summary

A summary of the recommendations for model characteristics is given in Table 1.

Table 1 Summary of recommendations for model characteristics

Discussion

Economic evaluations of DMTs for RMS and the modeling aspects involved play a significant role in informing healthcare decision-making, which will, in turn, impact patient care in MS and eventually patients’ outcomes. Moreover, evidence informing decision-makers and payers needs to be presented in a consistent manner and fully represent the value of DMTs in RMS. Therefore, it is important to understand the differences and similarities across models, especially in model structure and assumptions, to be able to identify areas for improvements and developments in future economic models in RMS. This study reviewed all published economic evaluation studies and HTAs of DMTs in RMS available to date. While previous reviews have focused on a particular geographic region, i.e., the UK [8], a short time period [190], or the reporting quality of studies [6], to the best of our knowledge, this is the first comprehensive review to examine the trend of model characteristics over the years (Fig. 4) and provide recommendations for future models (Table 1).

Fig. 4

Timeline diagram of evolution of Markov models of RMS over time. EDSS Expanded Disability Status Scale, GA glatiramer acetate, RCT randomized controlled trial, RMS relapsing MS, RRMS relapsing–remitting MS, ScHARR School of Health and Related Research, SPMS secondary progressive MS

Our review suggests that models differ in several assumptions (e.g., long-term treatment effect and EDSS improvement over time) and in the approach to estimating mortality rates. At the same time, there are similarities in modeling aspects, such as the type of model, model structure, time horizon, and the source of natural history data. Additionally, it is apparent from this review that medicines manufacturers have taken a leading role in economic modeling to generate economic evidence of DMTs in RMS, owing to the requirements of HTA agencies for funding or reimbursement decisions. It is fair to conclude that the evolution of economic modeling methods in this area occurred in conjunction with the development of new DMTs by medicines manufacturers.

First, evidence from this review suggests that the majority of studies consistently used a Markov cohort model. Previous studies have pointed out the limitations of both Markov cohort models in modeling RMS disease and of EDSS in representing cognitive impairment related to MS [46, 119, 187]. However, only three studies used a DES model to overcome this limitation and to capture the heterogeneity and complexity of MS and its treatments [111,112,113]. Furthermore, it is important to highlight that only one model used in the HTA reports was a DES model and the rest were Markov cohort models, which explains the prominent use of Markov cohort models in economic evaluation studies in RMS. Evidence suggests that modelers continue to construct Markov cohort models to align with HTA agencies’ guidance. A DES model should be considered when individual patient-level data are available and when there is a need to incorporate specific events such as treatment switching to evaluate long-term outcomes [113].

Secondly, we found that most models were structured based on the EDSS scores, mainly as 21 EDSS-based health states covering ten states in RRMS, ten states in SPMS, and one death state. This structure can characterize the relapsing forms of MS, including the transition to SPMS. However, as this structure is solely based on a physical disability scale, current models do not capture other clinically relevant outcomes of RMS, such as cognition, despite cognitive dysfunction affecting up to 70% of patients [198]. This may therefore imply that current models undervalue the benefits of new therapies with a proven effect on elements of cognition. Our recommendations are twofold: first, that the Markov model use the 21 health states when it is structured based on EDSS and, second, that the model structure incorporate other clinically relevant outcomes of MS, such as cognition.

Though models are consistently structured as Markov cohort models with EDSS-based states, they differ in terms of assumptions. The first difference concerns the transition to a lower EDSS state. Prior to 2017, models assumed that DMTs could not alter patients’ disability in a way that made a transition to a lower EDSS state possible. This assumption was in line with the London Ontario data set. More recently, several models have assumed that patients can transition to a lower EDSS state (Fig. 4) [46, 119, 121]. The emergence of this assumption coincided with new evidence on the natural history of MS by Palace et al. [133] using the British Columbia data set. The London Ontario and British Columbia data sets were the most prominent sources of natural history data among the economic models. However, both data sets were obtained from cohorts in the 1970s and 1980s. Therefore, there is a need for a new natural history data set of MS to derive transition probabilities that reflect the current MS disease course in order to better inform decision-making based on the current management paradigm for MS. We recommend that future economic models incorporate the more recent evidence and assume that patients can transition to a lower EDSS state. Additionally, a sensitivity analysis using the London Ontario data set should be performed to understand the impact of the alternative assumption.

Furthermore, differences were observed in how the long-term treatment effect was applied in the model, and this varied with the country of study. Most of the studies from the UK assumed that treatment efficacy would wane after a certain period of time as a result of NICE imposing this assumption to address the concern over the short clinical trial duration and the formation of neutralizing antibodies that might hamper long-term effectiveness of DMTs [92]. However, studies and HTAs outside the UK, especially those that were recently published, generally did not apply this assumption. In particular, the CADTH Pharmacoeconomic Review Report of Ocrelizumab and NICE Technology Appraisal Guidance of Ocrelizumab (TA533) [96, 101] mentioned that applying such an assumption would double count discontinuations due to efficacy failure, which is likely due to the current practice where patients switch to another therapy when experiencing a lack of treatment response, and that ocrelizumab generates negligible neutralizing antibodies. Indeed, evidence from long-term studies of fingolimod, ocrelizumab, and other DMTs shows sustained benefit in clinical and magnetic resonance imaging outcomes [193,194,195], supporting the no-waning-effect assumption. Acceptance of the no-waning-effect assumption by the Canadian Agency for Drugs and Technologies in Health [101] marked an important milestone. This decision corresponds with our recommendation for future models not to apply the efficacy-waning assumption. A sensitivity analysis of an alternative assumption can be performed to evaluate the impact.

Another apparent heterogeneity across models was in model inputs, such as mortality and utility data sources. Our review found that despite the availability of recent evidence from Harding and colleagues [199], the most common source cited for mortality data was the Pokorski et al. study [175]. There are a number of caveats that concern the applicability of data from the Pokorski et al. [175] study as well as data from the Harding et al. [199] study to the current therapeutic environment. First, the number of DMTs currently available compared with the Pokorski et al. [175] era may have an impact on the mortality rates. Second, Harding et al. [199] demonstrated that the mortality rate increases with higher EDSS scores but more substantially at EDSS ≥ 8.0, contrary to the evidence suggested by the Pokorski et al. [175] study where the mortality rate increases even with a mild disease form. Finally, the finding from the Harding et al. study [199] suggests that the life expectancy for MS patients has improved more than that of the general population. As mentioned above, there are limitations associated with both studies, and we recommend that future models consider using contemporary mortality data for RMS patients. There is a need for further mortality studies to resolve the questions raised.

Along similar lines, this review highlights the different approaches taken by studies and HTAs in their utility inputs. Generally, HTAs used individual patient-level data from RCTs of the DMTs of interest coupled with data from published utility studies, while economic evaluation studies relied on published utility estimates from the literature. This review also found several studies referring to utility data from other countries (n = 7), which could indicate the lack of utility studies in various countries. This might be a concern as several national guidelines recommend the use of utility values from the countries of interest and transferring utility scores across countries might be questionable [197]. Therefore, future models should select their utility source from their respective country whenever available. When local data are unavailable, we suggest conducting a sensitivity analysis with various sources of utility data inputs to assess the magnitude of impact on the cost-effectiveness estimate [190].

To conclude, existing economic models of DMTs in RMS are mostly constructed as Markov cohort models structured per the EDSS, and this approach should continue. Furthermore, in light of the availability of many new DMTs in current clinical practice, future models should consider taking the following approach: to allow for the transition to a lower EDSS state; to assume a sustained treatment effect over the long term; to use contemporary MS-specific mortality data; and to consider the most recent natural history and country-specific utility data. In addition, future models should incorporate other clinically relevant outcomes, such as the cognitive, visual, and psychological aspects of RMS, to be able to present the comprehensive value of DMTs.