Key Points For Decision Makers

The cost effectiveness of treatments in first-line non-small cell lung cancer is well established, with all identified models including an epidermal growth factor receptor tyrosine kinase inhibitor as an intervention.

Justification for the choice of model structure is rarely reported.

Future models should justify the structural choices made and include extensive sensitivity analyses and validation, to increase the validity of economic evaluations used to guide healthcare decision making in rare indications.

1 Background

Non-small cell lung cancer (NSCLC) accounts for 80–85% of all lung cancers and 25% of cancer deaths [1]. Adenocarcinoma is the most common histological subtype of NSCLC, comprising approximately 40–50% of all cases [2,3,4,5]. The characterization of tumor subtype and the detection of actionable oncogenic driver mutations are the key features of adenocarcinoma treatment [6, 7]. Mutations in the epidermal growth factor receptor (EGFR) tyrosine kinase occur in approximately 11.9% [8] to 32.3% [9] of patients with NSCLC. Most EGFR mutations are associated with a dedicated treatment pathway, as defined by guidelines from the National Comprehensive Cancer Network and European Society for Medical Oncology, among others [10, 11]. In recent years, EGFR tyrosine kinase inhibitors (TKIs), such as dacomitinib, osimertinib, erlotinib, gefitinib, and afatinib, have been developed to treat patients with EGFR-positive NSCLC and have demonstrated high efficacy in patients with some forms of EGFR mutations in exons 18–21 [10, 12,13,14].

Decision-analytic models are a key component of economic evaluations used to inform policy makers, payers, and stakeholders on whether new treatments should be adopted and reimbursed [15]. The framework provided by decision-analytic models can place treatment options in context with one another, which is particularly valuable when assessing multiple emerging therapies [15]. The goal of this study was to assess the approach and structure of decision-analytic models used in previous economic evaluations for therapies indicated for EGFR-positive NSCLC to present the best practices for use in upcoming models for therapies to treat first-line EGFR-positive NSCLC. To accomplish this, a systematic literature review (SLR) was performed to identify published economic evaluations in adults with locally advanced (stage IIIB or IIIC) or metastatic (stage IV) NSCLC, with tumors harboring EGFR mutations, who had not previously received systemic treatment for locally advanced or metastatic disease. Previous publications have reviewed economic evaluations for targeted therapies in NSCLC; however, these have focused on the detail provided in models or the quality of reporting [16, 17].

This review aimed to (1) critically examine modeling approaches from published economic evaluations based on five components (conceptualization, model structure, uncertainty, model validation, and transparency) as recommended by Caro [18]; (2) explore variation across studies; and (3) discuss challenges and potential areas for improvement for decision-analytic models in front-line EGFR-positive NSCLC.

2 Methods

An SLR was conducted based on guidance from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement [19] and the Cochrane Handbook for Systematic Reviews of Interventions [20].

2.1 Literature Sources

Literature searches were first conducted on 19 December 2022 and updated on 11 April 2023 via Ovid in MEDLINE, MEDLINE In-Process, Embase, Evidence-Based Medicine Reviews: Health Technology Assessment (HTA), Evidence-Based Medicine Reviews: National Health Service Economic Evaluation Database, and EconLit. The bibliographies of relevant SLRs and meta-analyses published during the same timeframe that were identified through the database searches were also searched. Eight conferences of interest that featured oncology or health economics content were identified. Searches of these relevant proceedings were conducted to identify records from 2020 to the present, since most high-quality congress abstracts are published as full text within a 2- to 3-year timeframe. While full publications of economic evaluations are common, some remain unpublished and are reported only in HTAs. Several HTA agencies commonly review the type of economic evaluations relevant to this study (e.g., cost-effectiveness analyses, cost-utility analyses, and cost-benefit analyses), including the Canadian Agency for Drugs and Technologies in Health (CADTH) [21], the National Institute for Health and Care Excellence (NICE) [22], and the Pharmaceutical Benefits Advisory Committee [23]. Eight HTA agencies [21,22,23,24,25,26,27,28] and the Institute for Clinical and Economic Review [29] were therefore searched for relevant economic evaluations published from 2020 to the present.

The Embase search strategies are provided in Online Resource Table 1 (update search) and Online Resource Table 2 (original search), and the full list of sources searched is provided in Online Resource Table 3.

2.2 Study Selection

The study selection criteria were predefined using the population, intervention, comparator, outcome, and study (PICOS) design framework, as outlined in Table 1. Two independent reviewers screened identified articles at both the title/abstract and full-text levels, and a third reviewer resolved any discrepancies. HTA submission dossiers were searched manually by one reviewer, and a second reviewer validated the search approach and results.

Table 1 Population, interventions, comparisons, outcomes, and study design selection criteria

The target population comprised adults with locally advanced (stage IIIB or IIIC) or metastatic (stage IV) NSCLC, with tumors harboring EGFR mutations, who had not previously received systemic treatment for locally advanced or metastatic disease.

Interventions were included if they were routinely used in clinical care, such as platinum-doublet chemotherapy, immunotherapy alone or in combination with other regimens, TKIs, and emerging therapies, including amivantamab. Interventions with curative intent (e.g., surgery and radiotherapy) were excluded, along with any systemic anticancer treatments not considered usual care. No restrictions were placed on included comparators.

Outcomes of interest included economic model conceptualization, structure, how uncertainty was assessed, validation, and transparency to align with the recommendations reported by Caro [18].

Literature databases and HTA submissions were searched for economic evaluations relevant to this study, including cost-benefit analyses, cost-utility analyses, cost-effectiveness analyses, cost-consequence analyses, and cost-minimization analyses. Publications that were categorized as SLRs or network meta-analyses (NMA) in the literature databases were also hand-searched to identify relevant economic evaluations. Budget impact analyses and cost analyses were excluded.

No geographical or timeframe restrictions were applied to the literature database searches; conference proceedings and SLRs/NMAs were included if they were published from 2020 onward. English-language publications from literature and conference proceedings and HTA submissions were included, along with non-English HTA submissions from the Institute for Quality and Efficiency in Health Care in Germany [27], French National Health Authority [26], Dutch National Health Care Institute [24], and the Dental and Pharmaceutical Benefits Agency in Sweden [25].

2.3 Data Extraction

Data were extracted into predefined data extraction sheets. The extracted data were related to key model elements: conceptualization, structure, uncertainty, validation, and transparency.

Records that used an identical model structure for the same treatment and country were considered to be related to the model’s original publication. Only the record of a unique model with the earliest publication date was used when summarizing model designs and characteristics.

3 Results

At the title/abstract level, 721 records were screened in the original search and 43 records were screened during the update; 81 reports (78 from the original search and three from the update) were selected for full-text review. As part of the original search, four congress abstracts were identified through hand-searching and 82 reports from HTA bodies were reviewed for eligibility; no additional congress abstracts or HTA reports were identified as part of the update search. In total, 59 unique studies reporting on an economic evaluation (summarized in 67 reports) were selected for data extraction (see Fig. 1 for details on both the original and updated searches). Among the 67 reports, 33 were published as manuscripts in peer-reviewed journals and six as conference abstracts [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68], 20 were HTA submission documents [69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88], and eight were related reports [89,90,91,92,93,94,95,96]. The full list of the 67 reports is presented in Online Resource Table 4. Among the eight related reports, one was an abridged secondary publication [96] and seven were resubmission documents to an HTA body [89,90,91,92,93,94,95]; these eight reports were not included in the summary analysis.

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram. *These conferences were searched as part of the original SLR. **Not an HTA body. Search by intervention (brand and generic name) since it is not possible to search by indication. AACR American Association for Cancer Research, ASCO American Society of Clinical Oncology, BTOG British Thoracic Oncology Group, CADTH Canadian Agency for Drugs and Technologies in Health, ELCC European Lung Cancer Congress, ESMO European Society for Medical Oncology, EU European Union, HAS French National Health Authority (Haute Autorité de Santé), HTA health technology assessment, IASLC International Association for the Study of Lung Cancer, ICER Institute for Clinical and Economic Review, IQWiG Federal Joint Committee (Gemeinsamer Bundesausschuss/Institute for Quality and Efficiency in Health Care), ISPOR The Professional Society for Health Economics and Outcomes Research, NHS EED National Health Service Economic Evaluations Database, NICE National Institute for Health and Care Excellence, PBAC Pharmaceutical Benefits Advisory Committee, SLR systematic literature review, SMC Scottish Medicines Consortium, TAT targeted anticancer therapies, TLV Dental and Pharmaceutical Benefits Agency (Tandvårds-och läkemedelsförmånsverket), ZIN National Health Care Institute (Zorginstituut Nederland)

Study characteristics for the included economic evaluations are summarized in Table 2. The global distribution of identified economic evaluations is illustrated in Fig. 2.

Table 2 Study characteristics
Fig. 2

World map of economic evaluations in patients with non-small cell lung cancer harboring epidermal growth factor receptor mutations

3.1 Conceptualization

The model conceptualization is summarized in Table 3.

Table 3 Conceptualization

Eighteen studies explicitly described the intended audience [30,31,32, 39, 49, 50, 55,56,57, 59,60,61, 66, 84,85,86,87,88]. Among these studies, 13 categorized the audience as a medical/clinical decision maker [30,31,32, 39, 49, 50, 55,56,57, 59,60,61, 66] and five were NICE submission documents that specified the audience as ‘[NICE] consultees and commentators’ [84,85,86,87,88]. The results of the economic evaluations were used to directly support decisions regarding reimbursement via HTA documents (n = 20) [69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88]. For the remainder, studies stated that the use was for policy/funding decisions (n = 5) [34, 35, 48, 56, 62], to promote the sustainability of limited healthcare resources (n = 5) [30,31,32, 60, 61], or to support treatment choices (n = 7) [39, 47, 50, 55, 57, 59, 65]. Twenty-two studies did not explicitly state the intended use of the economic evaluations [33, 36,37,38, 40,41,42,43,44,45,46, 49, 51,52,53,54, 58, 63, 64, 66,67,68].

Caro [18] calls for a description of whether models have a single- or multiple-application use. Fifty-seven studies evaluated treatment at a single point in the therapeutic pathway, and two studies evaluated treatment in first- and second-line settings [62, 68]. Multiple applications or whole disease modeling is described as valuable when, for example, upstream events in the treatment pathway are expected to have important downstream effects, or when simple cost-utility decisions fail to reflect the complexity of the decision-makers’ objectives [97]. Given the intended use and objectives of the economic evaluations identified, i.e., to make decisions at a single point in the disease pathway (locally advanced [stage IIIB/IIIC] or metastatic [stage IV] NSCLC in patients who have not previously received systemic treatment for locally advanced or metastatic disease), it was appropriate that only two of the economic evaluations considered multiple applications.

All 59 studies included an EGFR TKI as an intervention; this was considered appropriate given the focus of the identified studies on patients harboring an EGFR mutation. The most frequently evaluated interventions were osimertinib (n = 18) [30, 32, 34, 35, 38, 40, 44, 47, 50, 51, 55, 62, 68, 71, 73, 78, 83, 88], dacomitinib (n = 14) [31, 32, 40, 42, 43, 47, 52, 53, 63, 67, 72, 74, 82, 87], afatinib (n = 17) [32, 33, 36, 39, 40, 45, 46, 56, 57, 59, 65, 66, 69, 70, 77, 80, 86], gefitinib (n = 12) [33, 37, 40, 41, 49, 50, 54, 65, 69, 75, 81, 84], and erlotinib (n = 12) [32, 40, 50, 57, 58, 60, 64, 65, 69, 76, 79, 85]. Ramucirumab, an antiangiogenic monoclonal antibody, was included in a combination treatment arm with erlotinib in one economic evaluation [48]; the rationale for investigating this treatment was to support a policy decision on its listing in China [48]. Twelve studies [32, 33, 40, 46, 47, 49, 50, 57, 62, 64, 65, 69] had a primary aim to evaluate multiple first-line treatments.

The most common comparators among the 59 studies reporting on an economic evaluation in patients with locally advanced or metastatic NSCLC harboring an EGFR mutation were EGFR TKIs (n = 35) [30,31,32, 34,35,36, 38, 41,42,43,44,45, 48, 51,52,53,54,55, 59, 61, 63, 67, 68, 71,72,73,74, 78, 80, 82, 83, 85,86,87,88], platinum-based chemotherapy (n = 12) [49, 56, 58, 60, 65, 66, 75,76,77, 79, 81, 84], or either EGFR TKIs or platinum-based chemotherapies evaluated in the same model (n = 5) [37, 39, 57, 62, 70]. Seven studies did not distinguish a reference comparator but evaluated multiple first-line treatments [33, 40, 46, 47, 50, 64, 69]. Among the 20 HTA submissions, the comparator of choice transitioned from platinum-based chemotherapy to a TKI as TKIs became the standard of care; all new technologies submitted to an HTA agency after 2016 were evaluated against a TKI only (n = 9) [71,72,73,74, 78, 82, 83, 87, 88] (Fig. 3). Twenty-eight studies did not state the rationale for the choice of comparator [30, 33, 35, 36, 38, 39, 41,42,43, 45, 46, 48, 52, 54, 56,57,58,59,60,61,62,63, 65,66,67,68, 72, 76]. In the remaining 31 studies, comparators were selected to reflect standard of care, which was defined as commonly used regimens or licensed treatment [31, 32, 34, 37, 40, 44, 47, 49,50,51, 53, 55, 64, 69,70,71, 73,74,75, 77,78,79,80,81,82,83,84,85,86,87,88]. Only one of these studies also included comparator regimens that were investigational in order to provide a comprehensive picture of possible treatment options [40]. The reliance on ‘commonly used’ treatments when selecting comparators for a new mutation subgroup (i.e., patients harboring EGFR mutations) reflects the treatment paradigm shift from standard chemotherapy in an all-comer population to EGFR TKIs in this subgroup. The use of investigational agents that do not reflect standard of care as comparators, as in one study [40], is of limited value for clinical or policy decision making.

Fig. 3

Evolving comparator landscape among health technology assessments and/or value assessments. CADTH Canadian Agency for Drugs and Technologies in Health, ICER Institute for Clinical and Economic Review, NICE National Institute for Health and Care Excellence, PBAC Pharmaceutical Benefits Advisory Committee, SMC Scottish Medicines Consortium, TLV Dental and Pharmaceutical Benefits Agency (Tandvårds-och läkemedelsförmånsverket), ZIN National Health Care Institute (Zorginstituut Nederland)

Most of the economic evaluations were cost-utility analyses (n = 52) [30,31,32, 34,35,36,37,38,39,40,41,42,43,44, 46,47,48,49,50, 52, 53, 55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79, 81,82,83,84,85,86,87,88,89,90,91,92,93,94,95], four were cost-effectiveness analyses [33, 45, 51, 54], one was a cost-minimization analysis [80], and two (both of which were HTA submissions) presented both a cost-utility and cost-minimization analysis depending on the comparator [76, 81]. A rationale for the type of analysis was reported in only two studies, both of which justified the use of a cost-minimization analysis on the grounds that there were no statistically significant differences in efficacy and safety between treatment options [80, 81]. Among the cost-effectiveness analyses, the incremental cost-effectiveness ratio was presented per median survival time [45], life-years [54], and overall and progression-free survival [33, 51]; however, no rationale was provided for why these outcomes were selected.
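As a reminder of the summary measure these analyses report, the incremental cost-effectiveness ratio can be sketched as below. This is a minimal illustration only: the function name and all input values are hypothetical and are not taken from any included study.

```python
def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Incremental cost per unit of effect (e.g., per QALY or life-year)
    for a new treatment versus a reference comparator."""
    delta_cost = cost_new - cost_ref
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        # The ratio is undefined when both options yield the same effect.
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


# Hypothetical totals per patient: costs in currency units, effects in QALYs.
example_icer = icer(cost_new=120_000.0, effect_new=2.5,
                    cost_ref=80_000.0, effect_ref=2.0)
```

The same ratio applies whether the effect is measured in quality-adjusted life-years (cost-utility analysis) or in natural units such as life-years (cost-effectiveness analysis); only the denominator changes.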

3.2 Model Structure

The model structures for each study are summarized in Table 4.

Table 4 Model structure

Fifty-six of the 59 studies were deterministic [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44, 46,47,48,49,50, 52, 53, 55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88] and three studies did not report structure type [45, 51, 54]. A similar number of studies used a Markov model (n = 20) [30, 33, 38, 40, 41, 47, 50, 57,58,59,60,61,62, 65, 68, 71, 75, 77, 81, 84] or a partitioned survival model (n = 22) [31, 32, 34, 36, 42,43,44, 46, 52, 53, 56, 67, 69, 70, 72, 73, 78, 82, 83, 86,87,88]. Of the remaining studies, seven used a decision tree combined with a Markov model [35, 39, 48, 49, 55, 63, 66], six of which depicted a schematic decision tree, followed by the Markov state transition model [35, 39, 49, 55, 66]. Other model structures included semi-Markov (n = 2 [79, 85]) or decision tree only (n = 1 [37]). The remaining seven economic evaluations did not clearly specify the model structure [45, 51, 54, 64, 74, 76, 80]. There was a gap in justification of the model structure chosen; 52 studies did not provide a rationale [30,31,32,33,34,35,36,37,38,39,40,41,42,43, 45,46,47,48,49,50,51,52, 54,55,56,57,58,59,60,61,62,63,64,65,66, 68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83, 85]. Of the seven studies that reported a rationale, six used a partitioned survival model and cited ease of construction/direct use of summary data from published Kaplan–Meier curves and representativeness of the trial data (progression-free survival and overall survival) [44, 88]; representativeness of the disease pathway (e.g., the chronic/metastatic nature of the disease, and treatment goal to avoid disease progression and prolong life) [44, 67, 87, 88]; and its use in other published economic evaluations in NSCLC and/or oncology more broadly [53, 86]. The other study that reported a rationale used a Markov model and justified the structure on the grounds that it had previously been used to inform decision problems in lung cancer and reflected the natural progression of the disease [84]. For the studies that used a decision tree only or a semi-Markov model, no rationale was provided to confirm the suitability of the structure used [37, 79, 85].
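The "direct use" of progression-free survival and overall survival curves cited as a rationale for partitioned survival models can be illustrated with a short sketch. It assumes, purely for illustration, exponential survival curves with hypothetical rates; the included studies fitted their own parametric distributions, and the function names here are not drawn from any of them.

```python
import math


def exponential_survival(rate, t):
    """Survival probability at time t under a constant hazard (assumption)."""
    return math.exp(-rate * t)


def psm_state_occupancy(pfs, os_):
    """Partition the cohort using the two survival curves directly."""
    pfs = min(pfs, os_)  # progression-free survival cannot exceed overall survival
    return {
        "progression_free": pfs,
        "progressed": os_ - pfs,  # alive but no longer progression-free
        "dead": 1.0 - os_,
    }


def psm_trace(pfs_rate, os_rate, n_cycles, cycle_years):
    """State occupancy at the start of each cycle and at the end of the last."""
    trace = []
    for k in range(n_cycles + 1):
        t = k * cycle_years
        trace.append(psm_state_occupancy(exponential_survival(pfs_rate, t),
                                         exponential_survival(os_rate, t)))
    return trace


# Hypothetical annual hazards and a 1-month cycle over 5 years.
psm_example = psm_trace(pfs_rate=0.9, os_rate=0.4, n_cycles=60, cycle_years=1 / 12)
```

No transition probabilities are estimated in this structure; state membership is read off the curves, which is why it maps so closely onto trial endpoints.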

Of the studies that adopted a decision tree and Markov model, two studies utilized the decision tree to present two strategies in regard to EGFR testing/screening [49, 66]. In both studies, patients who did not undergo EGFR testing were assigned to receive platinum-based chemotherapy, regardless of EGFR mutation status. Patients who underwent testing and tested positive for an EGFR mutation would receive an EGFR TKI, while those testing negative would receive platinum-based chemotherapy [49, 66]. Other studies that also adopted a decision tree used it as a basis for assigning different interventions to patients [35, 39, 48, 55, 63].

The majority of the 59 studies (n = 46) [30,31,32,33,34,35,36,37, 39,40,41, 44, 46,47,48,49,50, 52, 53, 55,56,57,58,59,60,61, 63, 65,66,67,68,69,70, 72,73,74, 76,77,78,79, 82, 83, 85,86,87,88] employed a three-health state model consisting of progression-free, progressive disease, and death. An additional three studies used a four-health state model including response, stable disease, disease progression and death (two of which were HTA submissions for gefitinib to NICE [84] and the Scottish Medicines Consortium [SMC] [81], and one for osimertinib for the CADTH [71]). In addition, two studies [38, 62] used a six-health state model as described in Table 4. Lastly, eight studies did not specify the number or description of the health states [42, 43, 45, 51, 54, 64, 75, 80]. Of the studies that used a four- or six-health state model, rationale was only provided in the HTA submission for gefitinib, which used health states to model the natural progression of advanced NSCLC [84].
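For comparison with the three-health-state structure described above, a Markov cohort trace can be sketched as follows. The per-cycle transition probabilities are hypothetical placeholders chosen only so that each row sums to 1; they are not estimates from any included study.

```python
STATES = ("progression_free", "progressed", "dead")

# Hypothetical per-cycle transition probabilities; row = current state,
# column = next state. Death is absorbing and progression is irreversible.
TRANSITIONS = [
    [0.90, 0.07, 0.03],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]


def step(distribution, matrix):
    """Advance the cohort distribution by one cycle."""
    n = len(distribution)
    return [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def markov_trace(start, matrix, n_cycles):
    """Cohort distribution at cycle 0 and after each of n_cycles cycles."""
    trace = [list(start)]
    for _ in range(n_cycles):
        trace.append(step(trace[-1], matrix))
    return trace


# Everyone starts progression-free; 12 cycles are simulated.
markov_example = markov_trace([1.0, 0.0, 0.0], TRANSITIONS, n_cycles=12)
```

Adding health states (for example, separating response from stable disease) only changes the size of the matrix, which is one reason the rationale for the number of states is worth reporting.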

Nearly half (n = 28) of the 59 studies used a 1-month cycle length (or 28/30 days) [30,31,32, 36, 39, 41, 44, 45, 47, 50, 53, 55,56,57,58, 63, 65, 71,72,73, 77, 78, 82,83,84,85, 87, 88], followed by 3 weeks (or 21 days) (n = 10) [35, 49, 59,60,61,62, 66, 67, 79, 84], 1 week (n = 4) [34, 38, 40, 69], and 2 weeks (n = 1) [48]. Sixteen economic evaluations did not report the cycle length [33, 37, 42, 43, 45, 51, 52, 54, 64, 68, 70, 74,75,76, 80, 81]. A rationale for the choice of cycle length was not typically reported. Where one was given, the main justifications were alignment with treatment schedules and a cycle length long enough to reasonably detect meaningful differences in the interventions being compared.

Sixteen studies applied a half-cycle correction in the model [31, 32, 34, 38, 41, 48, 55, 56, 58, 73, 77, 84,85,86,87,88]; however, the method utilized was only stated in HTA submissions for dacomitinib [87] and osimertinib [88], in which the number of patients at the start and end of each cycle was averaged for costs and outcomes.
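The averaging approach described in the dacomitinib and osimertinib submissions can be sketched as below. This is a minimal illustration of the general technique, not a reproduction of either submission's model; the occupancy values are hypothetical.

```python
def half_cycle_corrected(occupancy):
    """Average the proportion of patients at the start and end of each cycle.

    `occupancy` holds the proportion in a state at cycle boundaries
    (cycle 0 start, cycle 1 start, ...), so n + 1 values describe n cycles.
    """
    return [(start + end) / 2.0
            for start, end in zip(occupancy[:-1], occupancy[1:])]


# Hypothetical proportion of the cohort alive at four cycle boundaries.
alive = [1.0, 0.8, 0.6, 0.5]

# Without correction, everyone alive at the start of a cycle is counted
# for the whole cycle, which overstates time spent alive.
uncorrected_cycles_alive = sum(alive[:-1])
corrected_cycles_alive = sum(half_cycle_corrected(alive))
```

Costs and outcomes are then accrued against the corrected occupancy, so the size of the adjustment grows with the cycle length.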

Time horizons modeled were 1 year (n = 2) [51, 80], 4 years (n = 1) [58], 5 years (n = 10) [44, 46, 56, 66, 70, 75,76,77, 79, 84], 7 years (n = 1) [72], 10 years (n = 20) [34,35,36, 38, 39, 49, 50, 55, 57, 59,60,61,62,63, 65, 67, 72, 78, 85, 86], 15 years (n = 8) [30,31,32, 42, 43, 53, 82, 87], and 20 years (n = 4) [68, 73, 83, 88]. Two studies each used two time horizons (5 and 10 years [47]; 3 and 5 years [33]), and six economic evaluations modeled a lifetime horizon but did not specify the number of years [40, 41, 48, 52, 64, 69]. The remaining studies did not report any details on the time horizon (n = 5) [37, 45, 54, 74, 79]. Among the studies that used a lifetime horizon and specified the number of years, the range was 5 to 20 years. Justification for the time horizon was infrequently reported; where given, it usually referred to the maximum life expectancy and/or nature of NSCLC and to a horizon long enough to capture all meaningful differences (which generally aligns with methods set out in HTA guidelines [e.g., NICE]) [98].

The most frequently reported cost discount rate was 3%, which was used for economic evaluations for China [47, 48, 55, 59, 60], Hong Kong [67], multiple European countries (France, Italy, and Spain) [58], Singapore [34, 56], Spain [30,31,32], Sweden [53], Taiwan [64], the United States (US) [57, 67, 69], and the US and China [62, 63]. Other cost discount rates included 1.5% for Canada and The Netherlands [38, 71], 3.5% for the UK [40, 84,85,86,87,88], 4% for France [36] and The Netherlands [41, 73], and 5% for Australia, China, Colombia, Mexico, and Portugal [33, 39, 40, 42,43,44, 46, 49, 50, 52, 61,62,63, 66, 68, 77, 92]. A 0% discount rate was used in one economic evaluation for Asia [37], and discount rate was not reported in five studies [45, 51, 54, 70, 72, 74,75,76,77,78,79,80,81,82,83]. In two studies based in The Netherlands [41, 95], a 5% discount rate was applied to costs but 1.5% to outcomes, in accordance with Dutch guidelines. In one study [50], a 5% discount rate was reported for costs only, and it was unclear whether outcomes were also discounted; no justification was provided. Generally, however, studies applied the same discount rate for both costs and outcomes in accordance with local HTA guidelines.
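The effect of applying different discount rates to costs and outcomes can be sketched as follows. The yearly cost and QALY streams and the two rates are hypothetical, and the convention that the first year is undiscounted is an assumption of this sketch rather than something reported by the included studies.

```python
def discount_factor(rate, year):
    """Standard discount factor 1 / (1 + r)^t."""
    return 1.0 / (1.0 + rate) ** year


def present_value(values_by_year, rate):
    """Discounted sum of a yearly stream; year 0 is left undiscounted (assumption)."""
    return sum(value * discount_factor(rate, year)
               for year, value in enumerate(values_by_year))


# Hypothetical yearly streams for one treatment arm.
yearly_costs = [10_000.0, 10_000.0, 10_000.0]
yearly_qalys = [0.8, 0.7, 0.6]

# Differential discounting: illustrative rates for costs and for outcomes.
pv_costs = present_value(yearly_costs, rate=0.04)
pv_qalys = present_value(yearly_qalys, rate=0.015)
```

Because the incremental cost-effectiveness ratio divides discounted costs by discounted outcomes, an unreported or unclear outcome discount rate makes the ratio difficult to reproduce.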

Utilities were generally sourced from literature (secondary sources included longitudinal cohort studies and other cost-effectiveness models) or trial data and were applied to health states, although this was inferred rather than explicitly reported. In three studies [47, 57, 84], utilities were applied for the delivery of treatment (oral vs. intravenous). Disutilities were also sourced from published literature and were typically applied as a utility decrement (utility values adjusted). Disutilities were explicitly not included in the base-case analysis of two studies, one in which the stated rationale was to avoid double-counting [87] and the other that stated treatment-specific utility values would have accounted for this already [63]. No rationale was provided in the remaining two studies [44, 55]. No studies reported on applying age-related disutilities.
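How health-state utilities and a disutility decrement combine into quality-adjusted life-years for one cycle can be sketched as below. All values are hypothetical, and applying the decrement to the share of progression-free patients who experience an adverse event is an assumption of this sketch, not a method reported by the included studies.

```python
def cycle_qalys(occupancy, utilities, cycle_years,
                ae_proportion=0.0, ae_decrement=0.0):
    """QALYs accrued by the cohort over one cycle."""
    # Utility summed over health states, weighted by cohort occupancy.
    state_utility = sum(occupancy[state] * utilities[state] for state in utilities)
    # Hypothetical assumption: the decrement applies to progression-free
    # patients (those on treatment) who experience an adverse event.
    decrement = occupancy["progression_free"] * ae_proportion * ae_decrement
    return (state_utility - decrement) * cycle_years


# Hypothetical occupancy and utility values for a 1-month cycle.
occupancy = {"progression_free": 0.6, "progressed": 0.3, "dead": 0.1}
utilities = {"progression_free": 0.8, "progressed": 0.6, "dead": 0.0}

qalys_without_ae = cycle_qalys(occupancy, utilities, cycle_years=1 / 12)
qalys_with_ae = cycle_qalys(occupancy, utilities, cycle_years=1 / 12,
                            ae_proportion=0.2, ae_decrement=0.05)
```

If the health-state utilities were already estimated from patients experiencing adverse events, subtracting a further decrement would count the same loss twice, which is the double-counting concern raised in one submission.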

3.3 Uncertainty

Sensitivity analyses, including probabilistic sensitivity analysis (PSA) and one-way sensitivity analysis (OWSA), were described in 47 of the 59 economic evaluations [30,31,32,33,34,35,36, 38,39,40,41,42,43,44, 46,47,48,49,50, 52, 53, 55, 56, 58,59,60,61,62,63,64,65,66,67,68,69, 71,72,73, 77, 78, 82,83,84,85,86,87,88]. The most common parameters tested in the PSA and OWSA were costs, efficacy inputs such as hazard ratios, utility, disutility, as well as routine care frequency, treatment durations, and discount rates. Justifications provided on the upper/lower bounds used in the deterministic sensitivity analyses were based on 95% confidence intervals identified in the literature, or, in the absence of data from the literature, the variables used in the model were commonly changed (i.e., by more than one economic evaluation) by plus or minus 20% [30, 41, 50, 55, 60], 25% [34, 41, 44, 56, 73, 83, 88], or 50% of the mean (i.e., base-case value of the parameter being varied) [34, 40, 44, 53, 73, 86].
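The plus or minus 20% approach to a one-way sensitivity analysis can be sketched as follows. The toy model, parameter names, and base-case values are hypothetical and serve only to show the mechanics of varying one input at a time while holding the others at their base-case values.

```python
def one_way_sensitivity(model, base_inputs, variation=0.20):
    """Vary each input by +/- `variation` of its base-case value, one at a time."""
    results = {}
    for name, base_value in base_inputs.items():
        outputs = []
        for factor in (1.0 - variation, 1.0 + variation):
            inputs = dict(base_inputs)  # all other inputs stay at base case
            inputs[name] = base_value * factor
            outputs.append(model(inputs))
        results[name] = (outputs[0], outputs[1])  # (low-input, high-input) result
    return results


def toy_icer(inputs):
    """Hypothetical model: incremental cost per QALY from four inputs."""
    delta_cost = inputs["cost_new"] - inputs["cost_ref"]
    delta_qaly = inputs["qaly_new"] - inputs["qaly_ref"]
    return delta_cost / delta_qaly


base_case = {"cost_new": 120_000.0, "cost_ref": 80_000.0,
             "qaly_new": 3.0, "qaly_ref": 2.0}
owsa = one_way_sensitivity(toy_icer, base_case)

# Rank parameters by the width of the resulting range (tornado-diagram order).
ranked = sorted(owsa, key=lambda name: abs(owsa[name][1] - owsa[name][0]),
                reverse=True)
```

Where 95% confidence intervals are available from the literature, the fixed percentage would be replaced by parameter-specific lower and upper bounds.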

Scenario analyses were described in 25 of the 59 studies [34, 36, 37, 40, 41, 44, 46, 49, 53, 55, 56, 66, 69, 71,72,73, 77, 78, 82,83,84,85,86,87,88]. As depicted in Fig. 4, the most common parameters tested were the parametric distributions used to model overall survival and progression-free survival (n = 10) [34, 41, 44, 56, 73, 78, 83, 85, 87, 88]; health state utility values and disutility values associated with adverse events (n = 9) [34, 40, 44, 53, 73, 82, 84, 86, 87]; drug costs (n = 8) [34, 40, 48, 56, 66, 73, 78, 88]; subsequent treatment assumptions such as the proportion of patients receiving subsequent treatment and the distribution/regimens assumed in the subsequent line of therapy (n = 7) [40, 41, 78, 83, 86,87,88]; time horizon (n = 7) [44, 72, 73, 78, 82, 84, 88]; and rebate/patient access scheme for the intervention and comparators (n = 4) [30, 41, 50, 55, 60]. Only five studies (all HTA submissions) presented additional analyses where assumptions around treatment waning and relative treatment effect were explored [69, 71, 83,84,85]. The remaining parameters were each represented in three or fewer studies.

Fig. 4

Uncertainty parameters tested in scenario analysis. External survival estimates refers to the use of an external clinical trial to estimate the survival probability of patients in the chemotherapy arm. FPNMA fractional polynomial network meta-analysis

3.4 Model Validation

The distribution of validation across the models is depicted in Fig. 5.

Fig. 5

Model validation

Four types of validation methods were identified: internal validation, external validation, cross-validation, and face validation. Thirty-two of the 59 studies reported at least one validation method [30, 32, 34, 38, 39, 41, 44, 45, 48, 49, 53, 55, 57,58,59,60,61,62,63, 65,66,67, 69, 70, 73, 78, 83,84,85,86,87,88]. Approximately one-quarter (n = 15) of the studies were cross-validated against other published cost-effectiveness models in the same indication, where the estimated quality-adjusted life-years, life-years, and incremental cost-effectiveness ratio were compared [30, 32, 34, 38, 39, 41, 44, 49, 53, 55, 58, 59, 62, 65, 66]. External validation was used in 16 studies, in which extrapolated progression-free survival and/or overall survival curves were compared with trial data or real-world data [44, 48, 53, 55, 57, 60,61,62,63, 73, 83,84,85,86,87,88]. Face validation by clinical or health economics experts was used in nine studies [53, 69, 70, 73, 78, 84, 85, 87, 88]. Ten studies reported having undergone an internal validation, where model calculations, mathematical equations, and data sources were checked for consistency and accuracy [38, 53, 63, 67, 73, 84,85,86,87,88]. As shown in Fig. 5, 16 studies reported using a single validation method [30, 32, 34, 39, 41, 45, 48, 49, 57,58,59,60,61, 65,66,67], six studies used two methods [38, 44, 55, 62, 63, 86], five studies used three methods [69, 70, 78], and one study used four types of validation [53].

3.5 Transparency

A summary of the transparency elements from included economic evaluations is provided in Table 5.

Table 5 Transparency

As depicted in Fig. 6, 24 of the economic evaluations were sponsored by the manufacturer of the intervention [37, 39, 46, 53, 69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88] (among these, 19 were identified from HTA documents [70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88]) and 24 did not state the sponsor [30,31,32,33,34,35, 38, 41, 42, 44, 45, 49,50,51, 54, 56,57,58,59, 61, 62, 65,66,67]. For the remaining economic evaluations, seven were sponsored by non-industry organizations (such as National Natural Science Foundation of China, and University Hospital in China) [40, 47, 48, 55, 60, 63, 64], and four did not report details on sponsorship [36, 43, 52, 68]. A non-technical summary was provided in 31 economic evaluations [30,31,32,33,34,35,36,37,38,39,40,41, 44, 46, 48, 50, 53, 55,56,57,58,59,60,61,62, 64,65,66,67, 71, 72], of which all documentation was freely available. Full technical documentation was available in 26 economic evaluations [30, 31, 35,36,37,38,39,40,41, 44, 45, 47, 48, 50, 58, 59, 63, 65,66,67, 69, 73, 85,86,87,88]. No models were available to review and use/replicate. TreeAge Pro® software (Williamstown, MA, USA) was used in 14 models [35, 37, 38, 44, 47, 50, 55, 57, 59, 61, 65,66,67,68], followed by Microsoft Excel® (Redmond, WA, USA) in 12 models [31, 34, 53, 56, 63, 69, 73, 84,85,86,87,88], and R was utilized in four models [33, 41, 60, 62]. The remainder (n = 28) did not report the type of software used [30, 32, 36, 39, 40, 42, 43, 46, 48, 49, 51, 52, 54, 58, 64, 70,71,72, 74,75,76,77,78,79,80,81,82,83].

Fig. 6

Reported sponsors. NR, not reported

In half of the 59 studies, sufficient documentation detailing the model structure, assumptions, model inputs, and data sources was presented. Studies that did not provide adequate documentation were mainly congress abstracts and HTA submissions from agencies other than NICE, such as SMC and CADTH, or NICE submissions that were published more than a decade ago.

4 Discussion

Decision-analytic models are an integral component of the economic evaluation of new health technologies, providing a common framework to contextualize the comparative clinical and economic consequences of treatments and to inform healthcare reimbursement decision making [15]. The current study critically examined the approach and structure of economic evaluations used in previously published studies for therapies in untreated locally advanced or metastatic NSCLC harboring an EGFR mutation. This examination was conducted in five areas as recommended by Caro [18]: conceptualization, model structure, uncertainty, model validation, and transparency.

4.1 Conceptualization

Researchers should outline basic details regarding the conceptualization of their models, including the decision problem, target audience, model type, and its rationale. Caro [18] also recommends stating whether models have a single- or multiple-application use. Not surprisingly, the majority of identified models were built for a single application, which aligns with the decision at a single point in the disease pathway for the population of interest. Almost 90% of the models were cost-utility analyses, which capture the effect of treatments on both clinical outcomes and patients’ quality of life [99, 100]. This is also the standard established by many HTA and value assessment agencies and methods task forces [99, 100].

4.2 Model Structure

Although descriptions of model structure were consistently reported across the studies, justification for these choices was lacking in many studies. For example, rationales were infrequently reported for the choice of number of health states, time horizon, cycle length, and model type. Markov models and partitioned survival models were each used in more than one-half of the studies, with the others employing decision tree, semi-Markov, or a combination of approaches. Decision tree models are best suited to simple scenarios occurring over a short time horizon, which limits their ability to model the continuous changes in health-related quality of life and costs associated with oncology treatments over a longer time horizon. These limitations are inherent to the decision tree structure and may affect its accuracy in capturing complex cost-effectiveness dynamics [101, 102]. In locally advanced (stage IIIB or IIIC) or metastatic (stage IV) NSCLC, with tumors harboring EGFR mutations, the use of a partitioned survival model or Markov model may be considered appropriate due to the common use of these structures in existing studies and the progressive nature of the disease. Partitioned survival models also have the advantage of directly using endpoints measured in the clinical trial. While partitioned survival models do capture subsequent treatment costs, it is important to acknowledge their limitation in reflecting the impact of subsequent treatment on overall survival from a health outcome perspective. Other approaches offer additional advantages. For example, semi-Markov and Markov models are able to capture subsequent disease progressions across multiple stages or lines of treatment (i.e., transition from first-line progressed disease to second-line progressed disease, etc.).
This may be more representative of real-world clinical practice, allowing for an accurate depiction of disease progression, especially given that drugs such as amivantamab could be available at later lines of therapy. None of the studies identified in this review reported the use of discrete event simulation (DES) models. DES models might offer enhanced flexibility in implementing complex models, resulting in a simpler structure compared with Markov models that require a large number of health states. However, it is important to note that these models are mainly used in the presence of baseline heterogeneity, continuous disease markers, time-varying event rates, and the need to assess the impact of prior events on subsequent event rates. In addition, a DES model might often require patient-level data, time, and expertise from both the reviewers and analysts [103]. As for modeling NSCLC harboring EGFR mutations, it has been demonstrated that more straightforward models such as Markov models or partitioned survival models are adequate for accurately assessing the costs and health benefits of treatment. However, because of the potentially significant heterogeneity in clinical and physiological manifestations of NSCLC, which can have an impact on outcomes, it is crucial to take into account and explain the influence of these heterogeneous groups on the reported differences in effects, and to select the appropriate model structure based on the decision problem.
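The partitioned survival structure discussed above can be illustrated with a minimal sketch, in which state occupancy is read directly from the progression-free survival (PFS) and overall survival (OS) curves. The exponential curves, the median values, and the function names are hypothetical assumptions chosen for illustration only; they are not taken from any of the reviewed models.

```python
import math


def exponential_survival(t_months, median_months):
    """Survival probability at time t for an exponential curve with a given median."""
    rate = math.log(2) / median_months
    return math.exp(-rate * t_months)


def partitioned_survival(n_cycles, cycle_months, median_pfs, median_os):
    """Return per-cycle occupancy as (progression-free, progressed, dead) tuples."""
    trace = []
    for cycle in range(n_cycles + 1):
        t = cycle * cycle_months
        os_t = exponential_survival(t, median_os)
        # PFS cannot exceed OS; cap it so progressed occupancy is never negative.
        pfs_t = min(exponential_survival(t, median_pfs), os_t)
        trace.append((pfs_t, os_t - pfs_t, 1.0 - os_t))
    return trace


# Hypothetical inputs: median PFS of 12 months, median OS of 30 months,
# monthly cycles over a 10-year horizon.
trace = partitioned_survival(n_cycles=120, cycle_months=1.0,
                             median_pfs=12.0, median_os=30.0)
for cycle in (0, 12, 30, 120):
    pf, pd, dead = trace[cycle]
    print(f"month {cycle:3d}: PF={pf:.3f} PD={pd:.3f} dead={dead:.3f}")
```

Because occupancy of the progressed state is simply the gap between the two curves, the sketch also shows why this structure cannot, by itself, represent how a subsequent treatment would change survival after progression.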

In general, the appropriateness of the chosen cycle length can be guided by clinical judgment and available clinical trial data. In populations with NSCLC harboring EGFR mutations, cycle length should be determined based on the administration schedule of the treatment regimens considered in the economic evaluation, typically ranging from 1 week to 1 month. In the majority of the studies, the application of half-cycle correction was not reported; however, a half-cycle correction is recommended because it adjusts for potential bias in estimating costs and health outcomes by accounting for the timing of the transitions between health states. The selection of an appropriate time horizon for modeling NSCLC harboring EGFR mutations should consider the natural history of the disease and be long enough to capture all the relevant economic and health consequences of the interventions of interest (which may require extrapolation of clinical outcomes observed in clinical trials). Since NSCLC is a chronic disease, a lifetime time horizon may be necessary. Lastly, it is recommended that the choice of discount rates for costs and benefits aligns with the guidelines or recommendations from relevant HTA agencies or decision-making bodies, with widely used annual discount rates ranging from 1.5 to 5%.
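As a minimal sketch of how cycle length, half-cycle correction, and discounting interact, the following three-state Markov cohort example compares discounted quality-adjusted life-years (QALYs) with and without half-cycle correction. The transition probabilities, utilities, 3% annual discount rate, and function names are hypothetical assumptions for illustration; discounting at the start of each cycle is one simplifying convention among several.

```python
def markov_trace(transition, start, n_cycles):
    """Propagate a cohort through a Markov model; returns occupancy per cycle."""
    trace = [list(start)]
    n_states = len(start)
    for _ in range(n_cycles):
        current = trace[-1]
        trace.append([
            sum(current[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ])
    return trace


def discounted_total(trace, per_cycle_values, annual_rate, cycles_per_year,
                     half_cycle=True):
    """Sum per-cycle values, optionally using half-cycle-corrected occupancy."""
    total = 0.0
    for k in range(len(trace) - 1):
        if half_cycle:
            # Average occupancy at the start and end of the cycle.
            occupancy = [(a + b) / 2 for a, b in zip(trace[k], trace[k + 1])]
        else:
            occupancy = trace[k]
        value = sum(o * v for o, v in zip(occupancy, per_cycle_values))
        # Assumed convention: discount factor evaluated at the cycle start.
        total += value / (1 + annual_rate) ** (k / cycles_per_year)
    return total


# Hypothetical monthly transition matrix: progression-free, progressed, dead.
TRANSITIONS = [
    [0.94, 0.05, 0.01],
    [0.00, 0.93, 0.07],
    [0.00, 0.00, 1.00],
]
# Hypothetical annual utilities of 0.80 and 0.60, converted to monthly QALYs.
UTILITIES = [0.80 / 12, 0.60 / 12, 0.0]

trace = markov_trace(TRANSITIONS, [1.0, 0.0, 0.0], n_cycles=240)
qalys_corrected = discounted_total(trace, UTILITIES, 0.03, 12, half_cycle=True)
qalys_uncorrected = discounted_total(trace, UTILITIES, 0.03, 12, half_cycle=False)
print(f"QALYs with half-cycle correction:    {qalys_corrected:.3f}")
print(f"QALYs without half-cycle correction: {qalys_uncorrected:.3f}")
```

In this sketch the uncorrected total is higher because it credits every patient with a full cycle in the state occupied at the cycle start, which is the bias the correction is intended to reduce.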

4.3 Uncertainty

Sensitivity analysis is a fundamental element in economic evaluations, serving as a tool to assess the reliability and robustness of the presented results by evaluating the impact of varying key inputs and assumptions on key model outputs. Approximately 80% of the models incorporated sensitivity analyses, with common parameters including costs, efficacy inputs (e.g., hazard ratios) and utilities/disutilities. Parameters were also varied in scenario analyses, but less frequently (approximately 42% of studies).
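A one-way (deterministic) sensitivity analysis of the kind described above can be sketched as follows: each input is varied between a lower and an upper bound while the others are held at base-case values, and the inputs are then ranked by their effect on the incremental cost-effectiveness ratio (ICER). All parameter names, base-case values, and ranges are hypothetical assumptions for illustration and do not come from the reviewed studies.

```python
def icer(params):
    """Incremental cost-effectiveness ratio: incremental cost per QALY gained."""
    inc_cost = params["cost_new"] - params["cost_comparator"]
    inc_qaly = params["qaly_new"] - params["qaly_comparator"]
    return inc_cost / inc_qaly


def one_way_sensitivity(base, ranges):
    """Vary one parameter at a time; return results sorted by ICER spread."""
    results = []
    for name, (low, high) in ranges.items():
        icer_low = icer({**base, name: low})
        icer_high = icer({**base, name: high})
        results.append((name, icer_low, icer_high))
    # Largest spread first, as in a tornado diagram.
    results.sort(key=lambda r: abs(r[2] - r[1]), reverse=True)
    return results


# Hypothetical base-case totals per patient (costs in currency units).
BASE = {
    "cost_new": 120_000.0,
    "cost_comparator": 60_000.0,
    "qaly_new": 2.0,
    "qaly_comparator": 1.5,
}
# Hypothetical lower and upper bounds for the varied inputs.
RANGES = {
    "cost_new": (100_000.0, 140_000.0),
    "qaly_new": (1.8, 2.2),
}

results = one_way_sensitivity(BASE, RANGES)
print(f"base-case ICER: {icer(BASE):,.0f} per QALY")
for name, low_result, high_result in results:
    print(f"{name}: {low_result:,.0f} to {high_result:,.0f} per QALY")
```

A probabilistic sensitivity analysis would instead sample all inputs simultaneously from assigned distributions; that extension is not shown here.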

4.4 Model Validation and Transparency

Validation and transparency are both crucial, interrelated steps when developing cost-effectiveness models [104], and Caro [18] recommends seeking independent face validity and documentation of all testing, comparison, and resolutions. Model validation involves assessing the accuracy and reliability of the model results. This ensures that the model accurately represents the expected costs and health benefit of the modeled patient population with robust and reliable conclusions and predictions. Model transparency, on the other hand, refers to clear and explicit documentation of the model structure, assumptions, data sources and calculations. This enables other researchers in the field to not only understand but also replicate the analyses [104].

Model validation is performed through various steps. These include internal validity: model calculations, mathematical equations, and data sources are checked for consistency and accuracy; external validity: model results are compared with reported data, including clinical trials and real-world data; cross validity: model results are compared with other published cost-effectiveness studies in the same indication; face validity: external clinical and/or health economic experts assess the model structure, assumptions, and predictions; predictive validity: the model results are compared with prospectively observed events [104].

Current published models generally failed to properly validate the results and assumptions in the cost-effectiveness models. For example, cross-validation with other published cost-effectiveness models in the same indication was used in one-quarter (n = 15) of the studies, and nine other studies assessed face validity. Slightly more than half of the models reported using at least one type of validation method (internal validation, external validation, cross validity, and face validity), and of these, half used a single method. There are no clear guidelines on the number of validations required for a model to be classified as robust and high-quality evidence. However, employing at least two to three levels of validation is recommended to enhance the reliability and robustness of the cost-effectiveness analysis used to inform decision making. With additional levels of validation, the analysis becomes more reliable and less susceptible to uncertainties or variations in the input parameters. Robustness ensures that the results of the cost-effectiveness analysis are more dependable and can withstand scrutiny, providing more confidence in the findings for decision makers. Starting with an internal validation that follows published quality check guidelines is an important step in model development to ensure the accuracy of model calculations and overall model inputs and to identify any potential errors or biases. Following the technical validation step, model results and assumptions should be validated at least through external validation with real-world data or clinical trial data, or through a face validity assessment involving clinical experts.
While we acknowledge the limitations of external validation and cross-validation due to the paucity of data, researchers are encouraged to compare model results with other published studies in a similar indication, in a different line of therapy, or in the wider patient population within the same indication to ensure the model is accurately projecting patients’ outcomes.
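The external validation step described above can be sketched as a simple landmark comparison between modeled and observed survival. The survival values, the landmark times, the 0.05 absolute tolerance, and the function name are hypothetical assumptions for illustration; no accepted threshold for agreement is implied.

```python
def external_validation(modeled, observed, tolerance=0.05):
    """Compare modeled and observed survival at each observed landmark (months)."""
    report = []
    for month in sorted(observed):
        difference = modeled[month] - observed[month]
        within = abs(difference) <= tolerance
        report.append((month, modeled[month], observed[month], difference, within))
    return report


# Hypothetical overall survival proportions at 12, 24, and 36 months.
MODELED_OS = {12: 0.80, 24: 0.55, 36: 0.40}
OBSERVED_OS = {12: 0.78, 24: 0.57, 36: 0.30}

report = external_validation(MODELED_OS, OBSERVED_OS)
for month, model_value, observed_value, difference, within in report:
    status = "within tolerance" if within else "review extrapolation"
    print(f"month {month}: model={model_value:.2f} observed={observed_value:.2f} "
          f"diff={difference:+.2f} ({status})")
```

In this illustrative case the divergence at the latest landmark would prompt a closer look at the extrapolation beyond the trial follow-up.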

4.5 Limitations

The SLR provides a comprehensive review of literature published up to April 2023. It is possible that new decision-analytic models have been published since this date. Given the lack of economic evaluations in patients with locally advanced (stage IIIB or IIIC) or metastatic (stage IV) NSCLC, with tumors harboring EGFR mutations, who had not previously received systemic treatment, regular monitoring and surveillance of new literature published in this rare indication can help to enhance the understanding of the most appropriate modeling approaches. In addition, although a broad range of HTA and non-HTA agencies were hand searched, the SLR did not include a critical review of economic evaluations from agencies outside these organizations, unless the models were published in manuscripts in peer-reviewed journals accessible via electronic databases or in conference abstracts available via key scientific congresses.

5 Conclusions

Although almost two-thirds of the cost-effectiveness studies identified were published in recent years (2019–2022), many lacked sufficient reporting on the justification for structural choice, validation, and the incorporation of sufficient sensitivity analyses. Future models should aim to provide rigorous justifications of structural choices, extensive sensitivity analyses, and multi-level validation in economic evaluations while carefully considering various factors such as data sources and demographic heterogeneity to ensure the validity of model results and enhance the accuracy of the presented model. This critical review of existing decision-analytic models highlights how increased transparency and collaboration with multiple stakeholders (clinicians and payers) can help to strengthen the validity of economic evaluations to guide healthcare decision making. As the treatment landscape for NSCLC with EGFR mutations evolves, decision-analytic models in these indications will need to be replicated and refined.