FormalPara Key Points

This review examined how existing health economic models (i.e., tools that demonstrate the monetary value of a new treatment and its impact on a healthcare system) are used to evaluate drugs for relapsed/refractory multiple myeloma. In particular, it evaluated the challenges associated with key methodological concepts (conceptualization, model structure, uncertainty analyses, model validation, and transparency) and how the models addressed differences in the line of therapy and exposure to prior treatment.

Published economic models in relapsed/refractory multiple myeloma mostly adhere to conventional cohort-level methodology, such as partitioned survival or Markov models, rather than more complex individual patient model types, because of the higher data requirements associated with the latter. Fewer than half of the models included a step to verify the results, potentially reflecting the variance in validation reporting between health technology assessment submissions and journal articles, with journal articles typically offering more comprehensive coverage.

1 Background

Multiple myeloma (MM) is a rare incurable hematological cancer that forms in plasma cells, a type of white blood cell that produces antibodies [1]. As malignant plasma cells accumulate in the bone marrow, they eventually lead to bone destruction and marrow failure. In 2020, more than 176,000 new cases of MM (~ 0.9% of all new cancer cases) and 117,000 deaths due to MM (~ 1.2% of all cancer deaths) were recorded globally according to data from the Global Cancer Statistics initiative of the World Health Organization [2]. Multiple myeloma accounts for 1.8% of all new cancers and 13% of hematologic cancers in the USA [1, 3]. Similar overall trends have been observed in the European Union (approximately 36,000 new cases in 2020 with an age-adjusted standardized incidence rate of 7.5 cases per 100,000), although the incidence of disease differed considerably (range: 2.7 per 100,000 in Bulgaria to 10.9 per 100,000 in Denmark) across the member states [4].

The cause of MM is not fully known. The disease disproportionately affects people over the age of 60 years and men [5], as well as Black and African American individuals, who were more than twice as likely to be diagnosed with MM as White individuals in the USA in 2019 (estimated 15.9 vs 6.9 cases per 100,000, respectively) [6]. Disease prognosis remains poor, with a 5-year relative survival rate of less than 60% in the USA [3]. Most patients with MM eventually relapse and/or become refractory to treatment (RRMM) [7]; therefore, treatment aims to prolong survival and maximize quality of life by alleviating symptoms, controlling disease progression, and minimizing adverse events.

In the last decade, the treatment of MM has improved dramatically with the introduction of agents such as second-generation proteasome inhibitors (PIs), immunomodulatory drugs (IMiDs), and monoclonal antibodies (mAbs) [8]. In recent years, other novel agents have received marketing authorization, including a chimeric antigen receptor T-cell (CAR-T) targeting B-cell maturation antigen immunotherapy, idecabtagene vicleucel, which was approved by the US Food and Drug Administration in 2021 for adults with RRMM after four or more lines of therapy (including a PI, an IMiD, and an anti-CD38) [9]. The choice of second-line or later-line treatment for RRMM generally depends on the exposure and response to first-line agents and varies among organizations and jurisdictions. Recommended first-line treatment options include lenalidomide-based, bortezomib-based and/or daratumumab-based regimens. In patients who are eligible for transplant, first-line induction treatment is followed by autologous stem cell transplantation and lenalidomide or bortezomib maintenance therapy [7, 10,11,12].

As the therapeutic landscape in RRMM continues to evolve, decision-analytic models play a cornerstone role in the economic evaluation of treatments, informing decision makers, payers, and other stakeholders about whether new treatments should be adopted and reimbursed and whether existing regimens remain an efficient use of healthcare resources [13]. The aim of this research was two-fold: (1) to critically review the existing health economic models used in economic evaluations of systemic treatments for RRMM to better understand their methodological challenges and rigor and (2) to summarize how health economic models addressed differences in the line of therapy and exposure to prior treatment. To achieve these objectives, a systematic literature review (SLR) was conducted to comprehensively summarize the available information on decision-analytic models used in economic evaluations of systemic treatments for patients with RRMM. The critical review of the economic evaluations was based on five aspects recommended by Caro [14]: conceptualization, model structure, uncertainty analyses, model validation, and transparency.

2 Methods

The SLR was conducted in accordance with the reporting requirements of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [15] guidance and general methodological requirements outlined in the Cochrane Handbook for Systematic Reviews of Interventions [16].

2.1 Literature Sources

Literature searches were performed on 17 February, 2023, in MEDLINE and MEDLINE In-Process, Embase, National Health Service Economic Evaluations Database, EconLit, Database of Abstracts of Reviews of Effects, the Health Technology Assessment (HTA) database (via International Network of Agencies for HTA), and the Cost-Effectiveness Analysis Registry as part of a wider SLR of economic evidence. The bibliographies of any identified SLRs published since 2020 were reviewed to ensure that all relevant studies were captured.

Additionally, manual searches were conducted to identify any economic models that were only published as part of submission dossiers to key HTA agencies (Canadian Agency for Drugs and Technologies in Health [CADTH]/pan-Canadian Oncology Drug Review, National Institute for Health and Care Excellence [NICE] in England, Australian Pharmaceutical Benefits Advisory Committee [PBAC], and Scottish Medicines Consortium [SMC]) and a US-based non-profit organization (Institute for Clinical and Economic Review) that produces clinical and cost-effectiveness reports for use by multiple stakeholders. A full list of literature database search sources is presented in Table 1 of the Electronic Supplementary Material (ESM).

Database searches were restricted to journal articles published since 2014. Bortezomib, lenalidomide, and daratumumab were first approved to treat MM in the first-line setting in 2008 [17], 2015 [18], and 2018 [19], respectively, and thus studies published before 2014 were unlikely to include significant numbers of patients who received these treatments in the first-line setting. Similarly, manual HTA searches were restricted to dossiers (first submissions) published since 2018 given the time needed from enrolling patients into a clinical trial to HTA submission. The full search strategies can be found in the ESM.

2.2 Study Selection and Screening

The selection of relevant economic models followed predefined criteria using the Population, Interventions, Comparisons, Outcomes, and Study (PICOS) design framework as outlined in Table 1. The target population comprised adults with RRMM irrespective of the treatment previously received or line of therapy. Any systemic pharmacological intervention aimed at managing the disease was considered and no restrictions were placed on included comparators. Outcomes of interest included economic model conceptualization, structure, uncertainty elaboration, validation, and transparency to align with the recommendations in Caro [14]. Only full economic evaluations (i.e., cost-effectiveness analysis, cost-utility analysis, cost-minimization analysis, and cost-benefit analysis) published in full-text articles and first submission documents were considered for inclusion. Budget impact and cost analyses were excluded. Conference abstracts were not considered given the limited information on the economic model structure and methods provided. No geographical restrictions were applied to literature database searches. Only English-language publications and HTA dossiers were included.

Table 1 Population, Interventions, Comparisons, Outcomes, and Study (PICOS) design selection criteria

Screening of articles identified through the literature database searches was conducted by two independent reviewers at both the title/abstract and full-text levels. Any discrepancies were resolved by a third, more senior reviewer. Manual searches of HTA submission dossiers were conducted by one reviewer, with a second reviewer validating the search approach and results.

2.3 Data Extraction

Data from the included models were extracted by one reviewer with full data validation by a second reviewer using pre-designed data extraction sheets. Extracted data variables were related to the five core model elements highlighted by Caro [14]: conceptualization, structure, uncertainty, validation, and transparency. Patient and treatment/comparator characteristics, data sources and inputs, and assumptions were also systematically extracted. To avoid double counting, we connected related (multiple) publications by mapping identified models for the same treatment, treatment line, and country as well as re-submissions to the same HTA agency.

3 Results

The database searches returned 2867 records, of which 2396 records were screened at the title/abstract level following the removal of duplicates; 330 full-text articles were screened, of which 312 were excluded. In total, 18 full-text articles published in peer-reviewed journals were eligible from the literature and an additional 36 records were identified via HTA websites, resulting in 54 records (49 primary publications representing unique models and five related publications) included in the SLR (see Fig. 1). The evidence summary refers to the 49 primary publications hereafter.

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram. HTA health technology assessment

3.1 Overview of General Model Characteristics

3.1.1 Treatments Evaluated in the Included Models

Most models included patients across multiple lines of treatment; seven models were in the second-line setting only [20,21,22,23,24,25,26], 27 models were in second-line or later-line settings [27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53], and 14 models were in patients who had received three or more lines of treatment previously [54,55,56,57,58,59,60,61,62,63,64,65,66,67]. One model in heavily pre-treated patients did not specify the treatment line setting, but the clinical trial data used for the model suggested that patients had received a median of five prior treatments [68]. Overall, information on relapse and refractoriness to prior therapy was limited; most models referred to the population as RRMM without specifying the exact line of therapy or exposure to prior therapy. Among the publications that provided more detail, one model each was in patients who were triple-class refractory [33], triple-class exposed [56], triple-refractory or penta-refractory [64], triple-/quad-/penta-refractory [67], and second-line, lenalidomide-refractory [39]. Baseline demographics of the clinical studies utilized were presented in 17 models [20, 22, 28,29,30, 36, 37, 39,40,41, 44, 55, 58,59,60, 62, 67].

Of the 49 models evaluated, seven [25, 29, 37, 43, 45, 51, 67] evaluated the cost effectiveness of multiple interventions, while the remainder focused on a single agent or treatment combination versus comparator treatment(s). The models evaluated a wide range of treatments and combination regimens, many of which contained a PI, an IMiD, and an mAb. Bortezomib-based (n = 26) [20, 22, 23, 25,26,27, 30, 32, 34, 36,37,38,39,40, 42,43,44,45,46, 48, 49, 51, 52, 59,60,61] or lenalidomide-based regimens (n = 17) [27, 28, 30, 31, 35, 37, 38, 40, 43,44,45,46, 49, 51, 62, 66, 67] were the most common reference treatments across the models. Sixteen models evaluated the cost effectiveness of treatments including an mAb, such as daratumumab, elotuzumab, or isatuximab [22, 23, 25, 29, 35,36,37, 41, 45, 47, 51, 53, 58, 60, 61, 63]. Thirty-nine models assessed a combination including a PI, such as bortezomib, carfilzomib, and ixazomib [21,22,23,24,25,26,27,28,29,30,31,32, 34,35,36,37,38,39,40,41, 43,44,45,46, 48,49,50,51,52, 62, 66]. This number was mainly driven by bortezomib-based regimens, which have become a cornerstone of MM treatment. Lenalidomide-based and pomalidomide-based treatments (both of which are IMiDs) were evaluated in 27 models [20, 21, 24, 28, 29, 31, 35, 37,38,39, 42,43,44,45, 47, 49, 52,53,54,55, 58,59,60, 62, 63, 66, 68]. Chimeric antigen receptor T-cell therapy was evaluated in two models only [56, 67]. This may reflect the recent approval of these therapies by the US Food and Drug Administration (since 2021) [69,70,71] and the fact that they are generally reserved for patients who have received multiple prior lines of treatment.

3.1.2 Model Outcomes

Health benefits were primarily reported in terms of quality-adjusted life-years (QALYs). Forty-two models reported incremental cost-effectiveness ratios (ICERs) per QALY [20,21,22, 25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48, 51, 54,55,56,57,58,59,60,61,62, 64,65,66,67,68] and 31 models reported ICERs per life-year gained [20, 22, 28,29,30,31,32, 35,36,37,38,39,40,41,42,43,44, 48, 54,55,56,57, 59,60,61,62, 64,65,66,67,68]. Nine models (of note, all of which were included in submissions to the PBAC in line with requirements for presenting cost data and cost implications to the Australian health budget [72]) reported total costs and additional costs per course and patient as a result of introducing a new treatment [23, 24, 48,49,50, 52, 53, 63, 65]. As some model publications reported more than one type of outcome, the sum of the models reported above exceeds the 49 unique models included in this SLR.

3.1.3 Model Settings and Perspectives

The vast majority of the models examined a population from a single country, such as Australia (14 models, mostly driven by a high number of PBAC submissions) [23,24,25,26, 48,49,50,51,52,53, 63,64,65,66], the USA (12 models) [27, 29,30,31,32,33,34, 36, 37, 57, 67, 68], the UK (11 models: eight from England [22, 42,43,44, 59,60,61,62] and three from Scotland [45,46,47]), and Canada (five models) [38,39,40,41, 58]. One model each examined a population from Singapore [35], Czech Republic [28], Chile [20], Sweden [54], South Africa [21], and France [55]. Among all identified models, only one reported and compared the results across populations from multiple countries (i.e., from Canada and France) [56].

The majority of analyses in the identified models used a public healthcare payer perspective (n = 35) [20, 22,23,24,25,26, 29, 33, 35, 37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53, 58,59,60,61,62,63, 65, 66, 73], followed by eight models using a payer perspective without further definition [27, 28, 31, 32, 34, 36, 67, 68]. Two models used a commercial healthcare payer perspective [30, 57], three were from a societal perspective [21, 54, 56], and one model did not report the perspective [55].

A summary of the characteristics of models published in peer-reviewed journals and those included in HTA submissions is given in Tables 2 and 3, respectively. Characteristics of each identified model according to the recommended core model elements by Caro [14] are presented in the following sections.

Table 2 Model characteristics: peer-reviewed articles (19 articles)
Table 3 Model characteristics: health technology assessment submissions (31 submissions)

3.2 Conceptualization

Six of the 18 models published in a peer-reviewed journal described the intended audience, including health policy makers, payers, and other stakeholders [21, 27, 29, 35, 37, 56]. All of the models included in submissions to CADTH, NICE, PBAC, and SMC were intended for HTA reimbursement. The model included in the Institute for Clinical and Economic Review assessment report was intended for multiple stakeholders and healthcare decision makers in the USA given the remit of this organization [67].

Most models were cohort level (n = 46) [20,21,22,23,24,25,26,27,28,29,30,31,32, 34,35,36,37,38,39,40,41, 43,44,45,46,47,48,49,50,51,52,53, 55,56,57,58,59,60,61,62,63,64,65,66,67,68] and three were individual patient level [33, 42, 54]. All models were developed for a single application (i.e., assessing the added cost and clinical benefit of a certain technology at a single point in the treatment pathway). Multiple applications may include, for example, “whole disease models,” where the scope of the economic evaluation is more extensive and covers more than one decision point in the disease pathway; no model captured such considerations. A summary of conceptualization items of models published in peer-reviewed journals and included in HTA submissions is reported in Tables 4 and 5, respectively.

Table 4 Model conceptualization: peer-reviewed articles (19 articles)
Table 5 Model conceptualization: health technology assessment submissions (31 submissions)

3.3 Model Structure

Partitioned survival models (PSMs) are frequently used in oncology [74, 75]. This modeling approach determines health-state membership based on the within-trial relationship between survival curves that are not mutually exclusive. In a typical three health-state PSM, progression-free survival (PFS) and overall survival (OS) curves directly determine the proportion of patients who are alive and progression free and the overall proportion of patients who are alive, respectively, while the proportion of patients with non-fatal progression is based on the difference between the two curves. Overall survival and PFS are projected independently, and transition probabilities are not required to establish the distribution of patients across the health states over time [74, 75]. In state transition models (e.g., Markov, semi-Markov), the distribution of patients across health states is determined based on transition probabilities that depend on the current health state, and, in the case of semi-Markov models, also on time [75, 76]. In discrete event simulation models, the simulated patients experience events at specific (i.e., discrete) timepoints and the consequences of each event (costs and health benefits or harms) are processed as the events occur [77, 78].
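To illustrate the partitioning logic described above, the following minimal sketch derives three-state occupancy from PFS and OS curves. The exponential curves, the monthly hazard values, and the function names are hypothetical and chosen only for illustration; none of them is taken from the reviewed models.

```python
import math


def exponential_survival(rate, t):
    """Survival probability at time t under a constant hazard (illustrative choice)."""
    return math.exp(-rate * t)


def psm_state_occupancy(pfs, os_):
    """Split a cohort into three health states from PFS and OS at one time point."""
    # PFS cannot exceed OS; cap it in case independently projected curves cross.
    pfs = min(pfs, os_)
    return {
        "progression_free": pfs,   # read directly from the PFS curve
        "progressed": os_ - pfs,   # difference between the OS and PFS curves
        "dead": 1.0 - os_,         # complement of the OS curve
    }


# Hypothetical monthly hazards (assumptions, not values from any included model).
PFS_RATE = 0.05
OS_RATE = 0.02

months = list(range(0, 61, 12))
trace = [
    psm_state_occupancy(
        exponential_survival(PFS_RATE, t),
        exponential_survival(OS_RATE, t),
    )
    for t in months
]

for t, row in zip(months, trace):
    print(t, {state: round(share, 3) for state, share in row.items()})
```

No transition probabilities appear anywhere in the sketch, which is the defining feature of the partitioned survival approach relative to a state transition model.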

3.3.1 Cohort vs Individual Patient Approach

Following the classification provided by Caro [14], this review classified models into two main modeling approaches, deterministic and stochastic. Stochastic models incorporate randomness or uncertainty to simulate different possible scenarios and their associated outcomes and to determine individual trajectories under varying conditions such as disease progression and patient characteristics; this is referred to as an individual patient approach or an individual-level approach hereafter. For example, Monte Carlo Markov Chain (MCMC) microsimulation models allow hypothetical individuals within hypothetical populations to transition to different states included in the model structure. By contrast, deterministic models consider the entire cohort based on fixed parameters and assumptions and do not involve randomness; this is referred to as a cohort-level approach in this review. The cohort-level approach includes, for example, decision trees, PSMs, and Markov models, although probabilistic analyses are usually conducted within these approaches to assess parameter uncertainty [79, 80].
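The distinction between the two approaches can be sketched with a single set of transition probabilities evaluated in both ways. This is a simplified illustration that uses a plain Monte Carlo microsimulation, not the MCMC method of any included model; the state names, the transition matrix, the cohort size, and the seed are all hypothetical.

```python
import random

# Hypothetical per-cycle transition probabilities; each row sums to 1.
STATES = ["progression_free", "progressed", "dead"]
P = [
    [0.90, 0.07, 0.03],  # from progression free
    [0.00, 0.88, 0.12],  # from progressed
    [0.00, 0.00, 1.00],  # death is absorbing
]


def cohort_trace(n_cycles):
    """Deterministic cohort-level approach: propagate state proportions."""
    dist = [1.0, 0.0, 0.0]
    trace = [dist]
    for _ in range(n_cycles):
        dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
        trace.append(dist)
    return trace


def microsimulation(n_patients, n_cycles, seed=1):
    """Stochastic individual-level approach: sample one path per patient."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n_patients):
        state = 0
        for _ in range(n_cycles):
            state = rng.choices(range(3), weights=P[state])[0]
        counts[state] += 1
    return [c / n_patients for c in counts]


cohort_final = cohort_trace(12)[-1]
micro_final = microsimulation(20000, 12)

for name, c, m in zip(STATES, cohort_final, micro_final):
    print(f"{name}: cohort={c:.3f} microsimulation={m:.3f}")
```

With a large simulated population the individual-level proportions approach the cohort-level ones; the individual-level approach becomes informative when transitions depend on patient characteristics or history, which this sketch deliberately omits.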

Of the 49 models, three were individual-level models: two were discrete event simulation models [42, 54] and one was an MCMC microsimulation model [33]. Among the 46 cohort-level models, 27 employed a PSM structure [22, 25, 28,29,30,31,32, 35, 38,39,40,41, 43,44,45,46,47,48, 52, 56,57,58, 60,61,62, 64, 66], six were Markov models [20, 21, 33, 34, 36, 59], two were semi-Markov models [37, 65], and one model combined a decision node and a PSM [67]. Structure was not explicitly reported for ten models [23, 24, 26, 49,50,51, 53, 55, 63, 68].

3.3.2 Justification and Description of Model Structure

Overall, the choice of model structure was rarely justified. A rationale was available in eight out of 49 reports [20, 22, 44, 56, 60, 62, 64, 67]. Of those, six were PSMs [22, 44, 56, 60, 62, 64], one combined a decision-node and PSM structure [67], and one was a Markov model [20]. The use of PSMs was mainly justified as being a common approach that was consistent with other models and/or appraisals in RRMM and had been widely used and accepted in economic evaluation studies in the same or similar settings in oncology [22, 44, 60]. Other reasons provided in those models included making the best use of the available data [22, 44, 60, 64] and requiring limited assumptions [44, 60], as well as the rationale that these models provided a better fit to OS and PFS data from clinical trials [60, 62]. The only combined model (a decision node and PSM) justified the selection of the decision node component by stating it better reflected the model purpose of describing costs and outcomes associated with the preparation for receiving CAR-T therapy [67]. The Markov structure was argued to be appropriate when disease conceptualization necessitated a long-term perspective and entailed transitions between distinct health states [20].

Among the eight models that provided a rationale, one publication described a model that combined a decision node and PSM structure [67], which was developed to investigate various CAR-T therapies. In this model, the decision node covered the initiation of the CAR-T therapy process by including all patients eligible for CAR-T who had received leukapheresis. Following leukapheresis, patients either continued to a T-cell infusion or discontinued for reasons such as progressed disease, adverse events, manufacturing failures, or death before receiving the infusion. Patients who discontinued treatment for reasons other than death received a basket of treatments appropriate for triple-refractory or quad-refractory patients. The PSM structure was designed to capture long-term costs and benefits related to various treatments and covered three health states, two of which represented the response status (i.e., alive and responding to treatment; alive and not responding to treatment) and an absorbing death health state. Responding patients continued with the current treatment, while non-responders transitioned to progressed disease and received the basket of subsequent treatments. In addition, different costs and utility values were applied to distinguish on treatment and off treatment in the corresponding health state.

Thirty-seven of the 46 cohort-level models reported on specific health states: 22 employed a three-health-state model [21, 25, 27,28,29, 35,36,37,38,39, 43, 46,47,48, 52, 56, 58, 62, 64, 66,67,68], nine used four health states [20, 30, 32, 41, 44, 57, 60, 61, 65], and six used five health states [22, 31, 34, 40, 45, 59]. Progression free, progressive disease, and death were the most common health states. Where more than three health states were employed, progression free and/or progressive disease were usually split into sub-categories of patients who were on or off treatment.

One of the three individual-patient level models (i.e., a discrete event simulation) [54] modeled a patient’s course through three health states including stable disease, progressive disease, and death, while the other discrete event simulation model, based on patient characteristics and treatment, calculated time to progression and post-progression survival, which were then added to calculate OS [42]. The model choice in the latter model was justified on the grounds that it was an appropriate approach to capture the clinical history and variation in efficacy between patients and that this structure did not require a half-cycle correction. However, the Evidence Review Group (ERG) was not convinced by the justification for the extra complexity of a discrete event simulation model, especially when the model only examined one course of the drug and did not track patients through a sequence of treatments. Another critique was that a half-cycle correction could be easily implemented in other simpler structures such as Markov models, while the marginal benefit of incorporating precise event timing into discrete-time modeling would be negligible given the model’s overarching uncertainties [42]. An MCMC microsimulation model [33] generated both duration on treatment and survival estimates at the patient level from trial data using the PFS and OS curves and simulated subjects across different health states including complete response, minimal response, no response, and progressive disease. The authors stated that the model structure was chosen to extend the preliminary analysis from the Institute for Clinical and Economic Review. The initial analysis compared idecabtagene vicleucel and ciltacabtagene autoleucel with a basket of relevant comparator regimens. It also investigated costs and QALYs related to belantamab-mafodotin versus its own comparator market basket, without comparing the three treatments with each other directly. By using the MCMC approach, the analysis generated a hypothetical cohort of patients with RRMM who were treated with one of the three treatments of interest, which enabled direct comparisons across their costs and outcomes.

3.3.3 Cycle Length

Sixteen models each used a monthly (or 28-day) cycle length [20, 21, 28, 31,32,33,34, 37, 43, 44, 46, 51, 56, 59, 67, 68] or a 1-week cycle length [22, 29, 30, 35, 38, 40, 41, 45, 48, 57, 60,61,62, 64,65,66]. Two models used a cycle length of 3 weeks (21 days) [36, 39]. One of the two discrete-event simulation models mentioned that the concept of a model cycle length was not applicable as it was a continuous time model [42]. The remaining models did not provide this information.

Ten models explicitly considered a half-cycle correction (five of those with a monthly cycle [37, 44, 56, 59, 68], four with a weekly cycle [22, 29, 35, 62], and one with a 3-week cycle [36]). The other models reported no information on this point, and the methods used for the half-cycle correction were not described.
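Because the reviewed models did not report how the half-cycle correction was implemented, the following sketch shows one common form (a trapezoidal average of start-of-cycle and end-of-cycle occupancy) purely as an assumption. The function name and the survival trace are hypothetical.

```python
def state_time(trace, cycle_length, half_cycle=True):
    """Total time spent in a state over the modeled horizon.

    trace[k] is the proportion of the cohort in the state at the start of
    cycle k; the last element is the proportion at the end of the final cycle.
    """
    if half_cycle:
        # Trapezoidal rule: average occupancy at the start and end of each cycle.
        return cycle_length * sum((a + b) / 2 for a, b in zip(trace, trace[1:]))
    # Uncorrected: everyone is counted for the full cycle from its start.
    return cycle_length * sum(trace[:-1])


# Hypothetical proportion alive at successive cycle boundaries.
alive = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]

uncorrected = state_time(alive, cycle_length=1.0, half_cycle=False)
corrected = state_time(alive, cycle_length=1.0)

print(f"uncorrected={uncorrected:.2f} corrected={corrected:.2f}")
```

The gap between the two estimates shrinks as the cycle length shortens, which is consistent with the ERG's view, noted above, that precise event timing adds little when other uncertainties dominate.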

3.3.4 Time Horizon

The time horizon used across the models varied. Almost half of the models (n = 22) specifically stated that a lifetime horizon was used in the model, with a range between 15 and 40 years [21, 22, 28,29,30, 33, 34, 36, 39, 41, 42, 44,45,46,47, 54, 56, 58,59,60, 62, 67]. Eight models used a time horizon of 20–30 years [20, 25, 31, 32, 38, 40, 51, 66], and ten used a time horizon of 15 years or less [26, 27, 35, 37, 48, 57, 61, 64, 65, 68].

Where used, the lifetime horizon was justified as suitable and sufficient to capture all potential long-term costs and benefits related to treatments and the economic and clinical impacts of RRMM, as this is an incurable disease that requires lifelong treatment. In three models, the time horizon was set to a duration by which almost all patients would have died (i.e., survival of around 1–2%), which was also deemed to be a lifetime horizon [42, 57, 62].

3.3.5 Clinical Inputs

In most instances, clinical inputs were not discussed in detail. Instead, only trial names and, if applicable, indirect treatment comparison (ITC) methods were mentioned. Most of the clinical efficacy data were provided by ITCs (n = 33), including network meta-analyses, matching-adjusted indirect comparisons, or simulated treatment comparisons [20,21,22,23, 27,28,29,30,31, 33, 38,39,40,41,42,43,44,45,46, 48, 50, 52, 53, 56, 57, 59, 61,62,63,64,65, 67, 68]; individual clinical trials provided clinical model inputs in 11 cases [23, 26, 32, 34,35,36, 54, 55, 58, 60, 66].

Health utilities were sourced directly from clinical trials (n = 12) [22, 25, 41, 44,45,46,47,48, 60, 62, 64, 68] or from the existing literature in the absence of utility data collected in the clinical trials (n = 12) [20, 21, 27, 29, 33,34,35,36,37, 40, 42, 59]. Eleven models used a mixed approach combining both clinical trial data and literature sources [28, 30,31,32, 39, 43, 54, 56, 57, 65, 67], while the remaining 14 models did not provide this information [23, 26, 38, 49,50,51,52,53, 55, 58, 61, 63, 66]. Cost inputs were generally from country-specific published sources such as public, academic, or non-governmental reference cost lists. Where reported, healthcare resource use (HCRU) inputs were obtained from the published literature or previous HTA submissions (n = 8) [22, 28, 30, 31, 57, 60, 67, 68], specific trials or observational studies (n = 5) [20, 21, 28, 35, 60], expert elicitation (n = 4) [21, 31, 42, 60], or were based on the study authors’ data and assumptions (n = 2) [34, 54].

Among the 41 models reporting the type of clinical efficacy data used, 39 models used both PFS and OS data [20,21,22, 25,26,27,28,29,30,31,32,33,34,35,36,37, 39,40,41, 43,44,45,46,47,48, 51, 54, 56,57,58,59,60,61,62, 64,65,66,67,68], while one model each reported using OS [55] or PFS data [38] only. Time to discontinuation was incorporated in 19 models [22, 25, 28, 31, 32, 39, 41, 44, 46,47,48, 51, 57, 60,61,62, 64,65,66]. Response status was used as a secondary clinical outcome in ten models: nine used the objective response rate [22, 30, 34,35,36, 40, 46, 60, 64], whereas one used various response outcomes [33].

As OS and PFS data from clinical trials are not typically reported over a lifetime, these endpoints required long-term extrapolation when the time horizon of the economic model was longer than the trial follow-up period. Given that prolonging survival is one of the primary objectives of treatment for patients with RRMM, precise estimates of survival benefit are important. Typically, the identified models selected parametric distributions for the observed PFS and OS data and for the long-term extrapolations based on various factors, including goodness-of-fit statistics (such as the Akaike or Bayesian information criterion), visual assessment of goodness of fit to the observed Kaplan–Meier data, and clinical plausibility of the long-term extrapolations [22, 28, 30, 32, 34, 35, 41, 43, 44, 46, 54, 56, 57, 60, 61, 68]. While some models stated that no long-term data were available to assess the clinical plausibility of long-term extrapolations [34, 35, 60], in other models the extrapolation choices were validated with clinical experts [22, 32, 38, 41, 44, 47, 61] or evaluated against different trials or real-world evidence [31, 43, 44, 61, 62, 64]. For example, real-world data from registry and observational data were used to derive the probability of survival beyond the trial follow-up [31, 44, 62, 64], or were leveraged to validate generalizability of the trial data [61]. Different trials with the same treatments were utilized to adjust the efficacy of the treatment (i.e., compared PFS and OS outcomes across trials using matching-adjusted indirect comparison methodology before fitting on the matched data) [43].
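As an illustration of how the goodness-of-fit statistics mentioned above can be compared across candidate distributions, the sketch below fits exponential and Weibull models to right-censored data by maximum likelihood and computes both criteria. The data set is invented, only two distributions are considered, and the number of patients is used as the sample size in the Bayesian information criterion; all of these are assumptions, and in practice a dedicated survival analysis package would be used.

```python
import math

# Hypothetical follow-up times (months) and event flags (1 = event, 0 = censored).
times = [2.1, 3.4, 5.0, 6.2, 7.7, 9.5, 11.0, 12.0, 12.0, 12.0]
events = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]


def weibull_profile_loglik(shape, times, events):
    """Log-likelihood of S(t) = exp(-b * t**shape), with b at its MLE for this shape."""
    d = sum(events)
    s = sum(t ** shape for t in times)
    b = d / s  # closed-form maximum likelihood estimate of b given the shape
    log_t_events = sum(math.log(t) for t, e in zip(times, events) if e)
    return d * math.log(b) + d * math.log(shape) + (shape - 1) * log_t_events - b * s


def exponential_loglik(times, events):
    """The exponential model is the Weibull model with shape fixed at 1."""
    return weibull_profile_loglik(1.0, times, events)


def weibull_loglik(times, events, lo=0.05, hi=10.0):
    """Maximize the profile log-likelihood over the shape by ternary search."""
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if weibull_profile_loglik(m1, times, events) < weibull_profile_loglik(m2, times, events):
            lo = m1
        else:
            hi = m2
    return weibull_profile_loglik((lo + hi) / 2, times, events)


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * loglik


ll_exp = exponential_loglik(times, events)
ll_wei = weibull_loglik(times, events)
fit = {
    "exponential": {"aic": aic(ll_exp, 1), "bic": bic(ll_exp, 1, len(times))},
    "weibull": {"aic": aic(ll_wei, 2), "bic": bic(ll_wei, 2, len(times))},
}

for name, stats in fit.items():
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

Lower values indicate a better trade-off between fit and complexity, but, as the included models recognized, statistical fit to the observed period says little about the plausibility of the extrapolated tail, which is why visual inspection and clinical validation were used alongside these statistics.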

3.3.6 Subsequent Therapies

Considerations around subsequent therapies were specifically mentioned in 28 models [20, 22, 27, 28, 30,31,32,33,34,35,36,37, 39,40,41,42, 44, 45, 47, 56, 57, 59,60,61,62, 66,67,68]. However, limited information was available in terms of how this was incorporated in the models. Subsequent treatments were mostly modeled for patients entering the progressive disease state (n = 13) [22, 28, 32, 34,35,36, 44, 47, 56, 57, 60, 62, 67], while two models assumed subsequent therapies only for patients who discontinued their initial treatments irrespective of disease progression [30, 40]. Given the wide range of treatment regimens and combinations available for patients with RRMM [7, 10,11,12], the most common approach to considering subsequent treatment lines was to use a weighted basket of treatments (n = 15) [22, 30, 33, 34, 36, 37, 40,41,42, 57, 59, 60, 66,67,68], while three models reported the specific treatments modeled [39, 44, 47]. The remaining ten models mentioning subsequent treatments did not provide further clarity on how this information was captured [20, 27, 28, 31, 32, 35, 45, 56, 61, 62].

Approaches used to consider the efficacy of subsequent treatments were briefly mentioned in some HTA submissions, such as the two-stage methodology with re-censoring [62, 66] or the inverse probability of censoring weights methodology [22] to adjust OS data and to align the included subsequent therapies with routine clinical practice in the country of interest. Jakubowiak et al. stated that OS data from the ASPIRE trial was used to incorporate efficacy data from subsequent therapy implicitly, however, the authors did not report the method used [31]. Costs related to subsequent treatments were either applied per cycle (n = 11) [20, 22, 28, 30,31,32, 37, 44, 57, 59, 68] or as one-off cost (n = 5) [40, 41, 56, 60, 62]. The most common approach to considering costs related to subsequent treatment lines was to use a basket of treatments with a distribution of subsequent treatments according to market share and treatment patterns (n = 15) [22, 30, 33, 34, 36, 37, 40,41,42, 57, 59, 60, 66,67,68]. The relevant inputs such as doses, proportion of patients receiving these treatments, treatment duration, or market share/prescribing distribution were obtained from the same clinical trials that provided clinical inputs or observational studies in 18 models [22, 28, 30, 31, 34, 36, 37, 39,40,41, 44, 45, 47, 60,61,62, 66, 67] and clinical experts in four models [20, 32, 35, 57]; a combination of both sources was used in one model [56] and the manufacturer’s unpublished data (i.e., subsequent treatment distributions derived from market share data) was used in another model [68].
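The weighted-basket costing approach described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the regimen names, market shares, and costs are hypothetical, and deriving the one-off cost as the per-cycle cost multiplied by a mean treatment duration is one possible convention rather than a method reported by the reviewed models.

```python
def basket_cost_per_cycle(basket):
    """Share-weighted per-cycle cost of a basket of subsequent treatments.

    basket maps a regimen name to (share of patients, cost per cycle).
    """
    total_share = sum(share for share, _ in basket.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("Treatment shares must sum to 1")
    return sum(share * cost for share, cost in basket.values())

def one_off_cost(cost_per_cycle, mean_cycles_on_treatment):
    """One-off cost applied at a single time point (e.g., at progression).

    Assumption: the lump sum equals the per-cycle cost times the mean
    number of cycles on subsequent treatment.
    """
    return cost_per_cycle * mean_cycles_on_treatment

# Hypothetical basket: shares from market-share data, costs per 4-week cycle
basket = {
    "regimen_A": (0.50, 4000.0),
    "regimen_B": (0.25, 2000.0),
    "regimen_C": (0.25, 6000.0),
}

per_cycle = basket_cost_per_cycle(basket)                 # applied each cycle
lump_sum = one_off_cost(per_cycle, mean_cycles_on_treatment=6)
```

The two functions correspond to the two costing conventions reported: a recurring per-cycle cost or a single cost applied once.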

3.3.7 Handling of HCRU Data

Of the 20 models that reported how HCRU data were incorporated, either in the main text or as supplementary material, 14 described both related unit costs and frequencies [20,21,22, 28, 30, 31, 35, 42, 54, 56, 57, 60, 67, 68], five reported only cost data [27, 36, 37, 44, 62], and one included only frequency data [34]. The most common approach to incorporating HCRU data was the calculation of per-cycle costs for each health state using frequency data and unit costs for those resources that were incurred more than once during the time horizon, as reported in 16 models [21, 22, 27,28,29,30, 32, 34,35,36,37, 44, 60, 62, 67, 68]. Four models only reported the annual cost based on average frequency per year without specifying how this average cost was applied in the patient flow [31, 42, 54, 68]. A summary of the structure-specific items of models published in peer-reviewed journals and included in HTA submissions is reported in Tables 6 and 7, respectively.

Table 6 Model structure: peer-reviewed articles (19 articles)
Table 7 Model structure: health technology assessment submissions (31 submissions)

3.4 Uncertainty

All but eight models [23, 24, 38, 42, 45, 50, 53, 55] reported conducting different sensitivity analyses, including deterministic sensitivity analyses (DSAs), probabilistic sensitivity analyses (PSAs) and scenario analyses. Among the 18 models published in peer-reviewed journals, 12 reported DSAs, PSAs, and scenario analyses [28,29,30,31,32, 34, 35, 37, 54, 56, 57, 68], and four reported DSAs and PSAs only [20, 27, 33, 36]. One model reported DSAs and scenario analyses [21], while the remaining model did not report any sensitivity analyses [55].

Among the 31 models published as part of the HTA submissions, nine reported all three types of sensitivity analyses [22, 41, 43, 44, 59,60,61,62, 67], and three models reported DSAs and scenario analyses [46, 47, 65]. Eleven models reported scenario analyses only [25, 26, 39, 40, 48, 49, 51, 52, 63, 64, 66]. One model reported a DSA [58]. The remaining seven models did not report any sensitivity analyses [23, 24, 38, 42, 45, 50, 53].

3.4.1 Structural Uncertainty (Scenario) Testing

The most common aspects tested in scenario analyses were the parametric distributions used to model OS, PFS, and time to discontinuation (n = 22) [22, 25, 26, 28, 29, 31, 32, 41, 44, 46,47,48, 57, 59,60,61,62, 64,65,66,67,68] and time horizons (n = 22) [22, 25, 26, 28, 29, 31, 32, 41, 42, 44, 46,47,48, 57, 60,61,62, 64,65,66,67,68], followed by utility values (n = 17) [22, 28,29,30,31,32, 34, 35, 41, 44, 46, 48, 56, 57, 60, 65, 68], annual discount rates for costs and benefits (n = 11) [22, 28, 30,31,32, 41, 44, 57, 60, 66, 68], treatment prices (e.g., discounts applied to treatment prices) (n = 9) [22, 32, 37, 44, 59, 60, 62, 66, 68], treatment duration and discontinuation (n = 8) [44, 46, 49, 52, 54, 60, 63, 64], wastage assumptions (n = 8) [22, 31, 32, 41, 44, 60, 65, 68], assumptions related to subsequent treatments (n = 6) [22, 44, 47, 60,61,62], cost categories included (n = 6) [28, 30, 31, 44, 57, 67], dose-intensity assumptions (n = 6) [22, 31, 32, 48, 49, 63], assumptions related to lines of treatment (n = 6) [29,30,31,32, 34, 59], and different perspectives (n = 2) [54, 67].

The most frequently included parameters in DSAs were costs (including treatment and administration costs), health state utilities and utility decrements, efficacy variables such as hazard ratios and curve fits, treatment durations, adverse event rates, HCRU frequency, and discount rates. Among those parameters, the most influential were efficacy, costs, and utility inputs.

In models that justified the ranges used in the DSA, most ranges were based on 95% confidence intervals identified in the literature or trial data when available [20, 22, 28,29,30,31,32, 34, 37, 43, 56, 60]. In the absence of data on confidence intervals, parameters were varied by ±10% [28, 30, 56], ±20% [20, 22, 27, 31, 32, 35,36,37, 61], ±25% [60], or ±50% [21, 34] of the mean.
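The range-selection logic described above (use a 95% confidence interval where available, otherwise vary the mean by a fixed percentage) can be sketched in Python as follows. The toy ICER function, the parameter names, and all numerical values are hypothetical, and the ±20% fallback is only one of the percentages reported across the models.

```python
def dsa_range(mean, ci95=None, fallback_fraction=0.20):
    """Return (low, high) bounds for a one-way sensitivity analysis.

    Uses the 95% confidence interval when one is supplied; otherwise
    varies the mean by +/- fallback_fraction (an assumed default of 20%).
    """
    if ci95 is not None:
        return ci95
    return (mean * (1 - fallback_fraction), mean * (1 + fallback_fraction))

def icer(params):
    """Toy model: incremental cost per QALY gained (illustrative only)."""
    return params["incremental_cost"] / params["incremental_qalys"]

def one_way_dsa(base, ranges, model):
    """Vary one parameter at a time, holding all others at base values."""
    results = {}
    for name, (low, high) in ranges.items():
        results[name] = (model({**base, name: low}), model({**base, name: high}))
    return results

# Hypothetical base-case inputs
base = {"incremental_cost": 50000.0, "incremental_qalys": 0.5}
ranges = {
    "incremental_cost": dsa_range(50000.0),                   # no CI: +/-20%
    "incremental_qalys": dsa_range(0.5, ci95=(0.4, 0.625)),   # CI available
}

results = one_way_dsa(base, ranges, icer)

# Order parameters by the width of their effect, as in a tornado diagram
tornado = sorted(results.items(),
                 key=lambda item: abs(item[1][1] - item[1][0]),
                 reverse=True)
```

Sorting by the spread of the output identifies the most influential parameters, which in the reviewed models were efficacy, cost, and utility inputs.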

Regarding PSAs, most models (n = 14) generated 1000 simulations [20, 22, 27, 30,31,32,33, 36, 37, 40, 41, 56, 57, 68]. Other models averaged outcomes over a higher number of iterations, such as 4000 [29], 5000 [28, 62, 67], 10,000 [34, 44, 61], and 15,000 [35]. Because of the high number of parameters considered in the PSA, this information was rarely reported in detail across the models.
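A PSA of the kind described above can be sketched in Python as follows: each iteration draws all uncertain inputs from assumed distributions, and the outcomes are averaged over the iterations. The gamma distribution for costs, the beta distribution for utilities, the method-of-moments parameterization, and every numerical value are assumptions made for illustration; they are not drawn from any reviewed model.

```python
import random

def gamma_params(mean, se):
    """Method-of-moments shape and scale for a gamma distribution."""
    return (mean / se) ** 2, se ** 2 / mean

def beta_params(mean, se):
    """Method-of-moments alpha and beta for a beta distribution."""
    common = mean * (1 - mean) / se ** 2 - 1
    return mean * common, (1 - mean) * common

def run_psa(n_iterations=1000, seed=1):
    """Draw inputs n_iterations times and average the outcomes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    cost_shape, cost_scale = gamma_params(mean=50000.0, se=5000.0)
    util_alpha, util_beta = beta_params(mean=0.70, se=0.05)

    draws = []
    for _ in range(n_iterations):
        inc_cost = rng.gammavariate(cost_shape, cost_scale)
        utility = rng.betavariate(util_alpha, util_beta)
        inc_qalys = utility * 1.0  # hypothetical: one incremental life-year
        draws.append((inc_cost, inc_qalys))

    mean_cost = sum(cost for cost, _ in draws) / n_iterations
    mean_qalys = sum(qalys for _, qalys in draws) / n_iterations
    return draws, mean_cost, mean_qalys

draws, mean_cost, mean_qalys = run_psa()
probabilistic_icer = mean_cost / mean_qalys  # ratio of the mean outcomes
```

Increasing `n_iterations` reduces Monte Carlo error in the averaged outcomes, which is one reason some models ran 5000 or more iterations.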

One of the discrete-event simulation models included a stochastic sensitivity analysis of the base case with regard to age and time spent in each health state [54]. This sensitivity analysis was similar to a PSA, although it only varied a few parameters (e.g., the mean time in stable and progressive disease in each treatment arm and by age), while all other model parameters were held constant. A summary of the methods used to address uncertainty for models published in peer-reviewed journals and included in HTA submissions is reported in Tables 8 and 9, respectively.

Table 8 Model uncertainty and validation: peer-reviewed articles (19 articles)
Table 9 Model uncertainty and validation: health technology assessment submissions (31 submissions)

3.5 Model Validation

Five types of validation methods were identified across the models, including face validity, internal validity, external validity, cross validity, and predictive validity. Nineteen models reported at least one type of validation [20, 22, 28,29,30,31,32, 35,36,37, 43, 44, 54, 56, 57, 60, 62, 67, 68]. There were considerable differences in the reporting of validation between models published in peer-reviewed journals and those included in HTA submissions. Thirteen of the 18 models published in journals [20, 28,29,30,31,32, 35,36,37, 54, 56, 57, 68] provided this information, compared with six of 31 HTA submissions (including five NICE submissions and one Institute for Clinical and Economic Review submission) [22, 43, 44, 60, 62, 67].

One type of validation was reported for ten models [20, 28, 29, 31, 35,36,37, 43, 54, 56], five models reported using two types [22, 32, 57, 60, 62], and three models were validated with three methods [44, 67, 68]. One model reported four validation types [30], and none reported using all five methods.

The most common validation method used was external validity (n = 14) [20, 22, 28, 30,31,32, 43, 44, 54, 56, 60, 62, 67, 68], followed by verification of internal validity (n = 8) [22, 30, 32, 44, 57, 60, 67, 68], face validity (n = 8) [29, 30, 36, 44, 57, 62, 67, 68], predictive validity (n = 2) [35, 37], and cross validity (n = 1) [30]. Of the eight models with internal validation, three specifically mentioned that validation had been performed by an independent group of experts [57, 60, 67]. A summary of the validation methods used for models published in peer-reviewed journals and included in HTA submissions is reported in Tables 8 and 9, respectively.

3.6 Transparency

Among the 49 models, three did not report the sponsor or funding source [28, 33, 55] and four reported not receiving any funding [21, 35,36,37]. Most of the models were industry sponsored (n = 38), which was mainly driven by the high number of HTA submissions included [20, 22,23,24,25,26, 30,31,32, 38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54, 56,57,58,59,60,61,62,63,64,65,66, 68]. Four models received grants or funding from governments, non-government or not-for-profit organizations, or academic institutions [27, 29, 34, 67].

All 18 peer-reviewed publications provided a plain-language, non-technical summary within the main text [20, 21, 27,28,29,30,31,32,33,34,35,36,37, 54,55,56,57, 68], while detailed technical documentation was available for six models [30,31,32, 56, 57, 68]. One model published in a peer-reviewed journal was available for review upon request [33].

Among the 31 HTA submissions, five CADTH submissions and the Institute for Clinical and Economic Review submission provided a dedicated non-technical summary of the model [38,39,40,41, 58, 67]. The NICE company submissions generally focus on detailed technical documentation; therefore, non-technical summaries are likely available from companion documentation produced by NICE (e.g., committee slides or summaries). All NICE and PBAC submissions included detailed technical documentation of the model, although commercially sensitive information was redacted in publicly available submission documents [22,23,24,25,26, 42,43,44, 48,49,50,51,52,53, 59,60,61,62,63,64,65,66]. Limited information on the model was available for the SMC submissions [45,46,47]. Models were not available for review from any of the HTA submissions.

Microsoft Excel® (Redmond, WA, USA) was the most common software used to implement the models (n = 11) [30, 31, 35, 39, 41, 44, 57, 60, 62, 64, 68]. Five models used TreeAge (Williamstown, MA, USA) [21, 27, 34, 36, 37], and three models used R [34, 36, 37]. The remaining studies did not state the software used to implement the models. A summary of the transparency elements for models published in peer-reviewed journals and included in HTA submissions is provided in Tables 10 and 11, respectively.

Table 10 Model transparency: peer-reviewed articles (19 articles)
Table 11 Model transparency: health technology assessment submissions (31 submissions)

3.7 Models in Heavily Pre-Treated Patients with RRMM

Heavily pre-treated patients who are refractory to different treatment classes have limited treatment options and thus a high disease burden. The introduction of novel treatments to address these unmet needs is also associated with unique modeling challenges to demonstrate the cost effectiveness of new therapies. These challenges may include higher heterogeneity in patient populations across trials included in the ITC because of differences in prior treatments; the need to consider the whole treatment sequence rather than modeling the current treatment line only because prior treatment may impact the choice of subsequent treatment; and insufficient data for robust extrapolations owing to data from fewer patients being available in later treatment lines. For these reasons, we separately present information from models for this population.

Five of the 49 models included in this review were specifically in patients who were refractory to at least one treatment class: lenalidomide refractory [39], triple-class refractory [33], triple-class exposed [56], triple-refractory or penta-refractory [64], and triple-/quad-/penta-refractory [67].

All models involved cost-utility analyses reporting ICERs per QALY, with three models also including a cost-effectiveness analysis component of ICER per life-year gained [56, 64, 67]. Four models used a healthcare payer perspective [33, 39, 64, 67] and one a societal perspective, accounting for absenteeism from work or school and productivity losses [56]. All but one model were cohort level: three were PSMs with three health states [39, 56, 64], and one model with three health states combined decision node and PSM aspects [67, 81]. The remaining individual-patient level model also included three health states [33].

Four models used a lifetime horizon [33, 39, 56, 67], while one model used a 7-year horizon driven by prior PBAC decisions for a shorter time horizon in later-line or last-line treatment settings and the associated shorter median OS [64]. Three models considered a 4-week cycle length [33, 56, 67], and one model each had a 3-week [39] and 1-week [64] cycle length. Clinical data and utilities were generally sourced from clinical trials or previously published literature.

Two models addressed uncertainty through the use of DSAs, PSAs, and scenario analyses [56, 67], one model considered both DSAs and PSAs [33], and the remaining two models used scenario analyses only [39, 64]. One model reported three validation methods (face, internal, and external) [67], while another model reported external validation only [56]. The other three models did not report this information [33, 39, 64].

Four models [39, 56, 64, 67] were industry sponsored. The last model did not provide this information [33]. Plain language summaries were available for four models [33, 39, 56, 67], while detailed technical documentation was available for three models [33, 56, 64]. Only one model was available for an external review upon request [33]. Two models provided information on the software used, which was Microsoft Excel® in both cases [39, 64].

4 Discussion

This SLR provides an up-to-date summary and critique of RRMM health economic models, using a systematic approach and a wide evidence base (published peer-reviewed literature and HTA submissions), filling an evidentiary gap in this indication. Three previously published SLRs of economic models in MM focused mainly on describing factors impacting the cost-effectiveness results [82,83,84]. The most recently published SLR by Choon-Quinones et al. [85] covered economic models of treatments and diagnostic procedures for both untreated and previously treated MM. However, the review did not comprehensively capture the model elements and modeling challenges for either population. The SLR presented here has a wider evidence scope than previously published SLRs as a result of searching related evidence through additional literature database sources and HTA agencies; including all types of economic evaluations (such as cost-minimization analyses previously used by HTA agencies, such as NICE [86, 87] and PBAC [72], to assess novel health technologies); and providing a detailed summary and critique of the methodology adopted by each included model (i.e., beyond a description by line of treatment and geography, based on the recognized recommendations outlined by Caro [14]).

The key issues identified in the development of economic models in RRMM were: (1) how to accurately incorporate trial data and extrapolate to a lifetime horizon; (2) how to model comparators without head-to-head evidence; (3) how to correctly model the costs and efficacy of subsequent treatment lines; (4) how to handle efficacy data from trials in which the subsequent treatment mix did not match that of the target population; (5) how to handle patient-level or subgroup-level heterogeneity in efficacy data; (6) how to accurately capture robust health benefit data and the impact of treatment on patient quality of life when there is a lack of such data from clinical trials; and (7) how to handle structural and parameter uncertainty.

In studies and HTA submissions reporting the clinical efficacy information used, OS and PFS were the primary model inputs; other model inputs, such as response status, were used as secondary clinical outcomes. The majority of models faced challenges because of immature OS and PFS data from the clinical trials informing clinical efficacy inputs, thus necessitating long-term extrapolations and the respective validation of the clinical plausibility of such extrapolations. When assessing the survival benefit of treatments, extrapolation beyond the duration of clinical trials is crucial for estimating mean survival gains and assessing the long-term cost effectiveness. Although several survival extrapolation curves are available, selection of the appropriate curve requires careful consideration as different survival curves could lead to variations in estimates and uncertainty [88, 89]. While clinical plausibility assessments via expert elicitation or external validation using trial or real-world data are accepted by payers, biases may still arise from the former, depending on the rigor employed in eliciting expert knowledge or because of wider social pressures impacting expert opinion. In addition, changes to clinical practice may impact the appropriateness of data sources for external validation. Future models considering extrapolation beyond trial duration should attempt a systematic approach to avoid these potential drawbacks [87].

Similar to observations in other oncology indications [90], Markov models and PSMs were the most common approach in economic models for RRMM. All but three [33, 42, 54] of the economic models included in this SLR used a cohort-level approach. More than half of the models using a cohort-level approach employed a PSM structure to allow direct use of the PFS and OS data from clinical trials (with an additional long-term extrapolation) within a simple model structure. Cohort-level Markov and semi-Markov structures were also used, typically to allow for the inclusion of subsequent treatment lines and health states, especially with a distinction between on-treatment and off-treatment within the conventional pre-progression and post-progression health states [34, 59]. The International Society for Pharmacoeconomics and Outcomes Research advocates for the use of state-transition models when disease conceptualization spans a long time horizon and encompasses transitions between diverse health states, or if the simulation of multiple consecutive treatment lines is required [91]. Individual-patient level simulation was not widely used across models included in the SLR; however, where it was used, it allowed modelers to capture heterogeneity amongst the patient population in terms of different response categories (e.g., complete response, partial response/minimal response, no response), patient characteristics, and treatment pathways (i.e., prior treatment, current treatment, and comparator choice) [33, 54].

The PSM structure maintains enduring popularity, with no apparent trend towards an alternative. Of note, in its updated guidance on economic modeling [92], CADTH recently recommended against the use of PSMs for long-term extrapolations, irrespective of disease area, owing to the importance of considering causal relationships between the probability of events and time, health status, and treatment. The PSM structure is no longer preferred by CADTH as it relies on an assumption of independence between the OS and PFS curves and does not integrate the likelihood of events and their evolution over time, along with changes in patient conditions, which could hinder CADTH or other external parties from conducting a robust and thorough assessment of the evidence. The preferred approach of CADTH is therefore one that appropriately accounts for the causal relationships between the probability of events and time-varying parameters. It is possible that this guidance may initiate a transition away from PSMs. However, it remains conceivable that PSMs could continue to be used for HTA submissions in other jurisdictions, with economic models using other structures, such as Markov models, developed concurrently to meet the requirements of CADTH. As novel and innovative medications receive approval for RRMM, the focus of decision-making may transition from economic evaluations of individual treatment regimens to assessments of treatment sequences. This shift necessitates the consideration of alternative modeling approaches to accurately capture these sequences, whether it involves evaluating a single regimen within a treatment sequence or analyzing sequences of regimens as a whole.

Among included models lacking sufficient direct head-to-head comparisons of relevant treatment regimens, clinical model inputs were typically sourced from ITCs. However, no information was available on how these clinical sources were initially identified (e.g., whether an SLR had been conducted). When ITCs were employed, information regarding the comparability of studies and the feasibility of conducting such analyses was generally limited, including references to related publications that may elaborate on these analyses and their assumptions (e.g., selection of ITC methodology, and differences in trial and population comparability) [93].

Appropriately including the efficacy and costs of subsequent therapies is a key aspect of modeling therapies for which one or more potential subsequent treatments are available. However, information on how subsequent therapies were considered in the economic models included in this review was generally limited. Where such information was reported, models generally referred to a basket of available treatments rather than individual treatment regimens [22, 30, 33, 34, 36, 37, 40,41,42, 57, 59, 60, 66,67,68] and only considered the impact of subsequent treatments from a cost and HCRU perspective; the assumption in these models is that the impact of subsequent treatments on clinical outcomes is captured in the OS data used to inform the clinical efficacy inputs. Furthermore, costing of subsequent treatments was mostly done on a per-cycle basis, based on treatment duration and the proportion of patients receiving later lines, or was applied as a one-off cost either at the time of initial treatment discontinuation or post-progression. It should be noted that the four-state PSM (progression-free on treatment, progression-free off treatment, progressed disease, and death), as opposed to the conventional PSM (progression-free, progressed disease, and death), can capture time on and off treatment more appropriately and reflect additional nuances in disease progression or treatment response [22, 30,31,32, 40, 41, 44, 45, 57, 60, 61].
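The difference between the conventional and four-state PSM structures can be sketched in Python as follows. State occupancy is read directly from the survival curves at each time point. The exponential curves, the hazard values, and the use of a time-to-discontinuation (TTD) curve to split the progression-free state are assumptions made for illustration, not details reported by the reviewed models.

```python
import math

def exp_survival(rate, t):
    """Survival probability at time t under a constant hazard (assumed form)."""
    return math.exp(-rate * t)

def psm_three_state(os_t, pfs_t):
    """Conventional PSM: progression-free, progressed disease, death."""
    pfs_t = min(pfs_t, os_t)  # PFS cannot exceed OS
    return {
        "progression_free": pfs_t,
        "progressed": os_t - pfs_t,
        "dead": 1.0 - os_t,
    }

def psm_four_state(os_t, pfs_t, ttd_t):
    """Four-state PSM: the progression-free state is split by treatment status."""
    pfs_t = min(pfs_t, os_t)
    on_treatment = min(ttd_t, pfs_t)  # assumption: on treatment only while progression-free
    return {
        "pf_on_treatment": on_treatment,
        "pf_off_treatment": pfs_t - on_treatment,
        "progressed": os_t - pfs_t,
        "dead": 1.0 - os_t,
    }

# Hypothetical monthly hazards for the OS, PFS, and TTD curves
OS_RATE, PFS_RATE, TTD_RATE = 0.02, 0.05, 0.08

# State occupancy at each month over a 10-year horizon
trace = [
    psm_four_state(
        exp_survival(OS_RATE, month),
        exp_survival(PFS_RATE, month),
        exp_survival(TTD_RATE, month),
    )
    for month in range(0, 121)
]

three_state_at_12 = psm_three_state(exp_survival(OS_RATE, 12),
                                    exp_survival(PFS_RATE, 12))
```

Because occupancy is derived from separately fitted curves rather than from transition probabilities, the sketch also shows why a PSM does not model a structural link between progression and death.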

In some cases, it was found that the subsequent treatment mix in the clinical trial did not match the therapies approved in the country of the target population of the economic model. Furthermore, the impact of subsequent treatment on clinical outcomes was only infrequently addressed and, as mentioned above, typically assumed to be captured by the OS data. Three studies reported that OS was adjusted using the two-stage methodology with re-censoring [62, 66] or using inverse probability of censoring weights methodology [22] to more accurately predict the treatment effect expected in routine clinical practice or the expected treatment effect if no differences in the distribution of subsequent treatments between treatment arms had been present. An SLR by Sullivan et al. suggested that insufficient reporting regarding the implementation of the methods used for treatment switching amplifies uncertainty surrounding outcomes, potentially diminishing the adoption of these techniques by decision makers [94]. The NICE TSD provides comprehensive guidance on treatment switching, covering aspects such as timing of switching, clinical evidence (i.e., from clinical trials, real-world studies, or SLRs), patient preferences, cost effectiveness comparing switching treatments versus continuation of current therapy, and healthcare resource implications [95]. Future economic models should be developed with the capability to assess outcomes in scenarios using the observed data from clinical trials (i.e., without adjustments for subsequent treatments) as well as adjusting for subsequent treatments to better reflect those used in routine clinical practice in the country and setting of interest and, if appropriate, real-world effectiveness. These analyses can help address uncertainty around the impact of subsequent treatments on the model outcomes, which would be of relevance, and is often required, during HTA submissions.

In general, there were no discernable differences between the model structures utilized across various lines of treatment. However, one model for a CAR-T cell therapy included a decision node to model the CAR-T cell therapy pre-treatment process in order to determine which patients would be receiving this therapy, moving then to a PSM structure to assess the long-term clinical and economic outcomes of the CAR-T cell therapy [67]. Across the included economic models, a wide range of interventions were included. Most models were in second-line or third-line settings, yet there was limited information on the relapsed/refractory status following prior therapy. This review identified few models in later-line settings or those explicitly in patients who were refractory to specific previous therapies [33, 39, 56, 64, 67], which may be indicative of the limited treatment options available to these patients and potentially a lack of standardized clinical practice. Furthermore, all models evaluated specific interventions rather than the whole treatment sequence, which may introduce a higher degree of uncertainty due to differences in prior treatments across the trials.

Modeling treatment sequencing likely requires model structures that are more complex than typical cohort-level approaches, such as PSMs, for several reasons. First, the choice of subsequent treatments is heavily impacted by prior treatment lines and the type of treatment (e.g., IMiD, PI, mAb) under which a patient has experienced disease progression. Second, efficacy outcomes are likely to differ across treatments in the overall sequence of multiple treatment lines, resulting in complex calculations of the cumulative effect of multiple treatments in a sequence and across different sequences. This is also likely to introduce heterogeneity into the model given differences in patient populations and study designs across the sources for clinical inputs. Third, treatment-free intervals between lines of treatments, if they do occur, may also affect clinical outcomes. There are therefore considerable limitations to using a PSM structure to model treatment sequences in the absence of clinical trials that have been specifically designed to capture the cumulative effect of treatment sequences in OS and PFS. As more therapies enter the treatment landscape, decision problems may increasingly shift from selecting the most cost-effective treatment within a specific line of therapy to determining the optimal treatment sequence.

Where health utilities could be derived from the clinical trials informing the efficacy and safety inputs of the economic model, such inputs were generally considered suitable for the models. This approach was in line with, for example, current NICE methods that consider the mapping of patient-level data from clinical trials to utility values appropriate [87]. However, some records raised concerns regarding the lack of generalizability of clinical trial data to a wider population, particularly data from small trials. There was no common approach across models regarding the use of treatment-specific utilities/disutilities. In addition, some challenges remain regarding the requirement for geography or country-specific preference weights (i.e., tariffs) to derive health utilities from the health-related quality-of-life data from clinical trials or sourcing utility data from the published literature, especially in HTA submissions. While some HTA agencies accept health utilities from different countries, other HTA agencies are stricter and may not accept utility data derived from tariffs from a country other than the one considered in the model [96].

The presence of structural uncertainty can significantly influence the results of cost-effectiveness models. Whilst parameter uncertainty was addressed in almost all models reviewed (i.e., using PSAs, DSAs, or both), structural uncertainty (i.e., using a scenario analysis) was less commonly assessed. Assumptions such as drug wastage, subsequent treatments, treatment lines, and perspective were tested in some models, but not all. Given the growing treatment landscape, testing assumptions around both the cost and efficacy of subsequent lines of therapy is likely to become more important over time.

The results of this SLR should be viewed considering specific limitations related to selection criteria (search dates, language, and selected HTA bodies). First, evidence searches were restricted to the preceding 10 years for models published in peer-reviewed journals and the preceding 5 years for HTA submissions. However, the publication period restriction in the search was justified by aligning with current treatment pathways, which generally consider daratumumab-based, lenalidomide-based, or bortezomib-based regimens in the first-line treatment setting. Although economic models frequently evaluate treatments that have not yet sufficiently penetrated the market, models published before 2014 would not include a sufficient number of patients with RRMM who aligned with current clinical practice. From a purely methodological point of view, however, this may have resulted in economic models that describe some relevant technical model aspects having been missed. Second, the restriction to English-language publications may limit the generalizability of findings to other settings, local languages, or HTA decision makers. However, most pivotal/first economic models for a treatment of interest are typically published in English and only subsequent local adaptations may be published in other languages. Therefore, key model aspects are likely to have been captured in this review as part of the pivotal/first economic model. Third, potential double counting of models may have occurred if insufficient information was available to connect related publications. Given the objective of this review was to describe the methodology and challenges of existing economic models in RRMM, any such double counting is unlikely to impact the overall review findings. Finally, journal word restrictions or dossier templates and publication policies by HTA agencies may have impacted the breadth of information on modeling methods provided. 
Data availability in HTA documents may have also been subject to redaction of commercially sensitive material. While ensuring data confidentiality, further efforts to promote transparency of the evidence and decision-making processes could increase the level of modeling detail provided and thus support the appropriate development of future models in the same disease area. Recently, NICE, CADTH, and the Institute for Clinical and Economic Review have published a joint statement on changes in how confidential information about health technologies is handled to streamline processes and increase transparency, thus hopefully impacting future submissions and the level of public information available [97].

5 Conclusions

Most health economic models in RRMM lacked sufficient information on the patient population with regard to the treatment-line setting, incorporation of clinical inputs in the model, consideration of subsequent therapy and its impact on health outcomes, and structural and parameter uncertainty.

This systematic review highlighted that, in line with other oncology indications, most models followed a conventional approach, such as PSM or Markov models. This was mainly driven by the lower data requirements of these modeling approaches compared with more complex, individual-patient level model types. However, as more patients experience disease progression following multiple lines of treatment, there is expected to be a growing need to appropriately model more complex treatment strategies, including longer treatment sequences with more complex subsequent treatment mixes. This, added to more complex analyses to overcome trial and indirect comparison challenges, as well as guidance from at least one HTA agency highlighting the limitations of PSMs, may lead to a trend towards more complex modeling approaches in the future. More extensive data requirements would accompany any such development. Transparency remains fundamental, requiring high reporting standards and a comprehensive analysis of structural and parameter uncertainty.