Key Points for Decision Makers

The detailed methodologies and results of 14 cost-effectiveness analysis (CEA) publications of pembrolizumab trials in non-small cell lung cancer (NSCLC) were reviewed and compared.

Differences in methodology can potentially lead to opposing conclusions on the cost-effectiveness of NSCLC therapies.

Policy makers must weigh the limitations of CEA designs to make informed decisions.

1 Introduction

Lung cancer is the most common cancer type and cause of cancer death worldwide [1]. Non-small cell lung cancer (NSCLC) accounts for 80–85% of all lung cancers and can be further divided into squamous and nonsquamous subtypes. NSCLC often does not show symptoms until advanced stages [2]. Nearly 70% of diagnosed cases are locally advanced or metastatic [3].

Historically, late-stage NSCLC was treated with chemotherapy, which led to an overall survival (OS) of 8–10 months [4]. Immune checkpoint inhibitors are a breakthrough cancer treatment, significantly improving OS and/or progression-free survival (PFS) over chemotherapy for patients with advanced NSCLC in multiple clinical trials [5].

Pembrolizumab, an anti–programmed death-1 (anti–PD-1) monoclonal antibody, was initially approved as a monotherapy by the United States (US) Food and Drug Administration (FDA) in the second- and later-line settings for patients with metastatic NSCLC based on programmed death-ligand 1 (PD-L1) tumor proportion score (TPS) (TPS ≥ 50% based on KEYNOTE-001, and TPS ≥ 1% based on KEYNOTE-010) [6, 7]. Subsequently, first-line pembrolizumab monotherapy was approved for patients with metastatic NSCLC with a TPS of ≥ 50% and no epidermal growth factor receptor or anaplastic lymphoma kinase (EGFR−/ALK−) genomic tumor aberrations (KEYNOTE-024) [7]. Today, pembrolizumab monotherapy is a first-line standard of care for patients with metastatic NSCLC with a TPS of ≥ 1% and patients with locally advanced NSCLC with a TPS of ≥ 1% who are ineligible for surgical resection or definitive chemoradiation (KEYNOTE-042) [8]. First-line pembrolizumab plus chemotherapy combinations are a standard of care for EGFR−/ALK− nonsquamous metastatic NSCLC (KEYNOTE-189) and squamous metastatic NSCLC (KEYNOTE-407) [8]. Regulatory agencies in many other countries have also approved pembrolizumab with or without chemotherapy as a treatment for all or a portion of these FDA-approved NSCLC indications.

The economic value of pembrolizumab-based regimens versus other treatments for NSCLC has been examined in multiple cost-effectiveness analyses (CEAs). These analyses differed in modeling approaches, survival and cost estimation, and/or utility analyses, yielding varied results and conclusions even for identical patient populations and treatments. Healthcare payers rely on CEA results to make coverage and reimbursement decisions. Here, we present a review that examines CEA methodologies in depth and discusses how they may affect study findings.

As of this writing, two literature reviews of the economic value of NSCLC treatment with immunotherapy have been published [9, 10]. da Veiga et al. conducted a meta-narrative review on the costs and economic value of pembrolizumab and nivolumab in treating melanoma, NSCLC, and renal cell carcinoma, as well as using PD-L1 testing to select NSCLC patients eligible for immunotherapy. They found contradictory results from three published CEAs studying nivolumab as treatment for advanced NSCLC in Saudi Arabia, Canada, and the US, and attributed differences to the choice of chemotherapy comparator [9]. Verma et al. systematically reviewed published costs and CEAs of immunotherapies such as pembrolizumab for treatment of head and neck cancers, NSCLC, genitourinary cancers, and melanoma, as well as using PD-L1 testing to identify eligible patients. They listed results from previously published CEAs comparing immuno- and chemotherapy treatment for NSCLC and concluded that nivolumab was only cost-effective above certain PD-L1 levels, while pembrolizumab was cost-effective for both previously treated and treatment-naïve patients with NSCLC [10]. Neither review examined modeling methodologies before reaching their conclusions.

2 Materials and Methods

This study aims to compare methodologies and findings in model-based CEAs of pembrolizumab with/without chemotherapy for treating advanced NSCLC. The eligibility criteria for a publication to be included are as follows:

  • Study population included patients with advanced/metastatic NSCLC.

  • Interventions included pembrolizumab regimen(s).

  • Study type was CEA.

  • Study designs included modeling and simulation.

  • Outcomes included incremental cost-effectiveness ratios (ICERs).

  • Published in the English language.

A PubMed search was performed using the following strategy: (pembrolizumab) AND (non-small-cell lung carcinoma OR non-small cell lung carcinoma OR non-small-cell lung cancer OR non-small cell lung cancer OR NSCLC) AND (cost-effectiveness OR cost effectiveness), limited to English-language publications published through December 10, 2019. The search yielded 21 studies. One author examined titles, abstracts, and full texts to determine eligibility and excluded seven studies. The excluded studies and the reasons for exclusion were as follows: da Veiga et al. and Verma et al. were CEA literature reviews [9, 10]; Bravaccini, Norum et al. and Tartari et al. were not CEAs [11,12,13]; and Aguiar et al. did not include modeling and simulation [14, 15]. The remaining 14 studies were reviewed in detail. One author extracted data, and a second author cross-checked the extracted information for accuracy. Discrepancies between the two authors were resolved via consultation with other authors.

Figure 1 outlines the selection process of the included publications.

Fig. 1 Literature selection flow chart. CEA cost-effectiveness analysis

3 Results

3.1 Overview

Table 1 provides an overview of the 14 publications and shows base-case results. These studies covered regulatory-approved pembrolizumab NSCLC indications from a wide geographic area. The most commonly used perspectives were payer (eight of 14 studies) and healthcare system (six of 14 studies); Georgieva et al. took both a UK payer and a US healthcare system perspective [16]. The major difference between the two perspectives is that the payer perspective includes only medical costs paid by payers, while the healthcare system perspective includes medical costs paid by payers or patients [17]. Two studies from a US or Chinese payer perspective did not report the application of coinsurance rates to cost calculations (Table 1) [18, 19].

Table 1 Summary of included cost-effectiveness analyses

The base-case time horizon applied in these studies varied from 10 years to a lifetime. Some studies explicitly listed the evidence used to support their time horizon choice. Huang et al. consistently applied a 20-year time horizon in the base case, as the extrapolated OS projected that only 0.7% of KEYNOTE-010 pembrolizumab-treated patients were still alive 20 years after treatment onset [20]. Insinga et al. extrapolated OS curves for the pembrolizumab combination arm in the KEYNOTE-189 and KEYNOTE-407 trial populations. Based on their models, approximately 10% and < 5% of patients remained alive 10 and 20 years after initiating pembrolizumab plus chemotherapy, respectively; thus, 20 years was chosen as the base case and 10 years was used as a scenario. Extending the time horizon from 10 to 20 years decreased the estimated ICER from $119K/quality-adjusted life-year (QALY) to $105K/QALY for KEYNOTE-189 patients and from $103K/QALY to $86K/QALY for KEYNOTE-407 patients [21, 22]. Based on these comparisons, a 10-year time horizon may not fully capture lifetime costs and health benefits of pembrolizumab-based therapy, and a ≥ 20-year time horizon is warranted. CEA publications should report proportions of surviving patients in all arms when fixed time horizons are used.
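The reporting recommendation can be illustrated with a minimal sketch. It assumes a hypothetical Weibull OS curve (the shape and scale values and all function names are illustrative and not taken from any reviewed study) and reports, for each fixed time horizon, the proportion of patients still alive and the share of undiscounted life-years that the horizon captures.

```python
import math

def weibull_survival(t_years, shape, scale):
    """S(t) = exp(-(t/scale)^shape); shape < 1 gives a hazard that declines over time."""
    return math.exp(-((t_years / scale) ** shape))

def life_years(horizon_years, shape, scale, step=1.0 / 52):
    """Undiscounted life-years up to the horizon (trapezoidal rule, weekly steps)."""
    n_steps = round(horizon_years / step)
    total = 0.0
    for i in range(n_steps):
        s0 = weibull_survival(i * step, shape, scale)
        s1 = weibull_survival((i + 1) * step, shape, scale)
        total += 0.5 * (s0 + s1) * step
    return total

SHAPE, SCALE = 0.7, 2.0  # hypothetical parameters, for illustration only

def horizon_report(horizons=(10, 20, 40)):
    """Proportion alive at each horizon and share of the longest horizon's life-years."""
    reference = life_years(max(horizons), SHAPE, SCALE)
    return {
        h: {
            "alive_at_horizon": weibull_survival(h, SHAPE, SCALE),
            "share_of_life_years": life_years(h, SHAPE, SCALE) / reference,
        }
        for h in horizons
    }
```

Calling `horizon_report()` for each treatment arm would give the kind of per-arm survival proportions that readers need to judge whether a fixed horizon is long enough.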

Results could vary even across CEAs of the same trial population. For example, among CEAs based on KEYNOTE-024, ICERs varied from £43K/QALY (lifetime horizon) to £87K/QALY (until 99% of patients died) for the UK population and from $49K/QALY (lifetime) to $98K/QALY (20 years) for the US population [16, 23, 24]. Among three US-based KEYNOTE-042 CEAs, ICERs varied from $48K/QALY (until 99% of patients died) to $136K/QALY (20 years) for patients with a TPS of ≥ 50%, from $47K/QALY (until 99% of patients died) to $161K/QALY (20 years) for patients with a TPS of ≥ 20%, and from $68K/QALY (until 99% of patients died) to $180K/QALY (20 years) for patients with a TPS of ≥ 1% [18, 25, 26].
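For reference, each ICER compared here is the ratio of incremental cost to incremental QALYs between two arms. The sketch below shows that calculation; the function name and the per-patient totals are hypothetical and are not drawn from any of the reviewed studies.

```python
def icer(cost_new, cost_comparator, qaly_new, qaly_comparator):
    """Incremental cost per QALY gained; inputs are discounted per-patient totals."""
    delta_cost = cost_new - cost_comparator
    delta_qaly = qaly_new - qaly_comparator
    if delta_qaly <= 0:
        # A ratio is not informative when the new treatment yields no QALY gain.
        raise ValueError("ICER is not meaningful without a positive QALY gain")
    return delta_cost / delta_qaly

# Hypothetical inputs, for illustration only.
example_icer = icer(cost_new=250_000, cost_comparator=130_000,
                    qaly_new=2.4, qaly_comparator=1.2)
```

Because the denominator is often small, modest differences in extrapolated survival or utilities between studies can move the ratio substantially, which is consistent with the spread of results described above.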

According to the International Society for Pharmacoeconomics and Outcomes Research (ISPOR)–Society for Medical Decision Making (SMDM) Modeling Good Research Practices Task Force, examining and reporting uncertainty is a very important aspect of cost-effectiveness modeling [27]. The most commonly conducted uncertainty analyses in cost-effectiveness modeling are deterministic sensitivity analysis (DSA) that assesses uncertainty related to one or a set of parameters with continuous values, probabilistic sensitivity analysis (PSA) that assesses uncertainty by varying all continuous variables simultaneously using a simulation, and scenario analyses that assess uncertainty related to parameters with discrete values [27]. Among the 14 studies, eight performed one-way DSA, PSA, and scenario analyses [20,21,22,23,24, 26, 28, 29]. Another five studies performed DSA and PSA, but not scenario analyses [18, 19, 25, 30, 31]. Georgieva et al. performed a sensitivity analysis on prior distribution of survival and study-to-study heterogeneity and also conducted scenario analyses on discount rates and distribution for the survival model for their Bayesian Markov model [16].
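The distinction between one-way DSA and PSA can be sketched on a toy model. Everything below is an assumption made for illustration: the model structure, parameter names, bounds, and sampling distributions are hypothetical and do not reproduce any reviewed study.

```python
import random

def toy_icer(p):
    """Hypothetical two-arm model: incremental cost divided by incremental QALYs."""
    delta_cost = p["drug_cost"] + p["mgmt_cost"] * p["ly_gain"]
    delta_qaly = p["ly_gain"] * p["utility"]
    return delta_cost / delta_qaly

BASE = {"drug_cost": 100_000.0, "mgmt_cost": 10_000.0, "ly_gain": 1.5, "utility": 0.75}
# Hypothetical lower/upper bounds (e.g., 95% confidence limits where available).
BOUNDS = {"drug_cost": (80_000.0, 120_000.0), "mgmt_cost": (7_000.0, 13_000.0),
          "ly_gain": (1.0, 2.0), "utility": (0.65, 0.85)}

def one_way_dsa(base, bounds):
    """Vary one parameter at a time to its bounds, holding the others at base case."""
    return {name: (toy_icer({**base, name: lo}), toy_icer({**base, name: hi}))
            for name, (lo, hi) in bounds.items()}

def psa(n_draws, wtp, seed=0):
    """Draw all parameters simultaneously; return the share of draws with ICER <= wtp."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = {
            "drug_cost": rng.gammavariate(100.0, 1_000.0),  # mean 100,000
            "mgmt_cost": rng.gammavariate(25.0, 400.0),     # mean 10,000
            "ly_gain": rng.lognormvariate(0.4, 0.2),        # always positive
            "utility": rng.betavariate(75.0, 25.0),         # mean 0.75
        }
        if toy_icer(draw) <= wtp:
            hits += 1
    return hits / n_draws
```

The DSA output feeds a tornado diagram, while the PSA share at a given willingness-to-pay threshold is one point on a cost-effectiveness acceptability curve.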

Among the 13 studies that conducted DSA, 11 used 95% confidence intervals for extreme value tests when available [20,21,22, 24,25,26, 28, 29, 31], while Hu and Hay and Zhou et al. only made assumptions on the variation range of the tested parameters, which were ± 20% and ± 30%, respectively [19, 23]. Nine CEAs specified parametric distribution use in the PSA [18, 20,21,22, 24,25,26, 29, 30].

3.2 Modeling Approaches and Survival Extrapolations

Table 2 summarizes the modeling approaches and survival extrapolation methods used in the 14 studies.

Table 2 Comparison of modeling approaches and survival extrapolation methods

3.2.1 Modeling Approaches

Seven studies applied a partitioned-survival modeling approach, while the other seven applied one of several Markov-related approaches (two Markov, three semi-Markov, one patient-level state transition, and one Bayesian Markov).

All approaches assumed disease progression is irreversible, i.e., patients cannot move from progressed disease (PD) to progression free (PF). While partitioned-survival models incorporated time dependency in extrapolated PFS and OS curves, the three semi-Markov, one patient-level simulation, and the Bayesian Markov models also used survival functions to estimate transition probabilities over time to incorporate time dependency [16, 18, 23, 25, 31]. In contrast, Liao et al. and Zhou et al. assumed constant transition probabilities over time in their Markov models [19, 30].

Partitioned-survival models use PFS and OS Kaplan-Meier (KM) data from clinical trials and therefore can model survival functions precisely during the trial period, which is more difficult to achieve within a Markov modeling framework. On the other hand, Markov models make patients’ transitions explicit, while partitioned-survival models provide only a Markov trace and not a transition matrix. Thus, while both models report the number of deaths occurring in a given cycle, only the transition matrix allows one to determine how many of these deaths were among individuals previously in the PF or PD health state.
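The structural difference between the two approaches can be made concrete with a minimal three-state sketch (PF, PD, death). The function names, survival values, and transition probabilities are hypothetical; the Markov version assumes constant per-cycle probabilities purely to keep the example short.

```python
def partitioned_survival(pfs, os_):
    """State occupancy from survival curves: PF = PFS, PD = OS - PFS, death = 1 - OS."""
    trace = []
    for s_pfs, s_os in zip(pfs, os_):
        s_pfs = min(s_pfs, s_os)  # PFS cannot exceed OS
        trace.append((s_pfs, s_os - s_pfs, 1.0 - s_os))
    return trace

def markov_trace(p_pf_pd, p_pf_dead, p_pd_dead, n_cycles):
    """Cohort trace plus per-cycle deaths split by the state patients died from."""
    pf, pd, dead = 1.0, 0.0, 0.0
    trace = [(pf, pd, dead)]
    deaths_by_origin = []
    for _ in range(n_cycles):
        d_pf = pf * p_pf_dead            # deaths among progression-free patients
        d_pd = pd * p_pd_dead            # deaths among progressed patients
        new_pd = pd - d_pd + pf * p_pf_pd
        pf = pf * (1.0 - p_pf_pd - p_pf_dead)
        pd = new_pd                      # progression is irreversible: no PD -> PF flow
        dead += d_pf + d_pd
        trace.append((pf, pd, dead))
        deaths_by_origin.append((d_pf, d_pd))
    return trace, deaths_by_origin
```

Only `markov_trace` can return `deaths_by_origin`; the partitioned-survival function yields state occupancy alone, because it never specifies which state a death came from.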

Partitioned-survival and Markov models applied in some of the studies assumed structurally unrelated survival functions for each treatment arm, and therefore survival parameters cannot easily be varied dependently in PSAs. Some partitioned-survival models applied a relative risk of mortality between treatment arms. For example, three studies applied the same constant hazard within both arms, estimated from the Surveillance Epidemiology and End Results (SEER) program, when extrapolating OS beyond the trial period [20, 24, 26]. In contrast, two studies applied SEER-based mortality risks in the chemotherapy arm for long-term OS prediction and then an efficacy relative risk to the chemotherapy arm mortality risks to derive long-term OS predictions for the pembrolizumab plus chemotherapy arm, thus allowing dependence of OS between comparator arms [21, 22]. Among the Markov models, Criss et al. also applied SEER-based mortality risks across arms after year 5 when extrapolating OS curves [31]. Georgieva et al. allowed dependency between arms and between progression and OS by using a Gaussian copula in their Bayesian Markov model [16].

Each modeling approach has strengths and weaknesses. Researchers should select the approach that best fits their study purposes while taking its limitations into consideration. Time dependency of survival should reflect the real-world disease progression process. Survival parameters should have the option to vary dependently across treatment arms, as third factors such as trial inclusion/exclusion criteria may affect survival of both arms.

3.2.2 Health States

Most studies included three health states commonly applied in oncology models, i.e., PF, PD, and death. Georgieva et al. included stable disease, PD, and death, but also added discontinuation due to treatment-related adverse events (TRAEs), discontinuation due to progression, and post-discontinuation treatment discontinuation as three alternative absorbing states other than death [16]. Based on KEYNOTE-024 trial results, Georgieva et al. assigned 14% of pembrolizumab-treated patients and 11% of chemotherapy-treated patients to discontinue treatment due to TRAEs, and 56% of pembrolizumab-treated patients and 46% of chemotherapy-treated patients to discontinue treatment upon progression. After disease progression, pembrolizumab-treated and chemotherapy-treated patients were assumed to discontinue treatment after a median of four and five cycles, respectively [16]. As a higher proportion of patients in the pembrolizumab arm entered into the three treatment discontinuation absorbing states than patients in the chemotherapy arm and were not allowed to transition to the death state afterwards, their survival was overestimated. Setting the three treatment discontinuation states as non-absorbing states and allowing patients to transition to death can reduce bias.

3.2.3 Cycle Length

Cycle lengths varied across analyses, with intervals of 1 week, 3 weeks, 1 month, or 6 weeks. Provided computational efficiency is not compromised, a shorter cycle length is preferred: it offers more flexibility in capturing actual treatment intervals, which can differ among comparators, and it increases estimation precision.
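When inputs are reported for one cycle length and a model uses another, probabilities need to be rescaled rather than divided or multiplied directly. The sketch below shows the standard conversion under the assumption of a constant hazard within the interval; the function names are hypothetical.

```python
import math

def rate_to_probability(rate_per_year, cycle_years):
    """Per-cycle event probability from a constant annual hazard rate."""
    return 1.0 - math.exp(-rate_per_year * cycle_years)

def rescale_probability(prob, from_cycle_weeks, to_cycle_weeks):
    """Convert a per-cycle probability to another cycle length (constant hazard assumed)."""
    return 1.0 - (1.0 - prob) ** (to_cycle_weeks / from_cycle_weeks)
```

For example, `rescale_probability(0.06, 3, 1)` gives the weekly probability that compounds back to 6% over three weekly cycles.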

3.2.4 Estimation of PFS/OS

In the studies utilizing partitioned survival models, different assumptions were made related to survival prediction beyond the trial observation period. All studies used a piecewise model to extrapolate survival curves so that the original PFS and OS KM curves could be used within the trial period. As the best fitting parametric approach predicted higher annual mortality risks in the medium- to long-term than mortality observed from SEER data, some studies used SEER data in longer-term OS prediction to avoid overestimating NSCLC mortality [20,21,22, 24, 26]. For the Markov models, the probabilities were based on the PFS and OS trial survival curves, their exponential or Weibull extrapolation, or extrapolation based on a piecewise model encompassing long-term SEER mortality data. Regardless of the applied survival prediction method, it is difficult to assess the plausibility of predicted survival curves beyond the trial period without long-term trial or observational data.
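A piecewise extrapolation of this kind can be sketched as follows, assuming yearly time points for brevity. The KM values and the external annual mortality risks are hypothetical placeholders (they are not SEER estimates), and the function name is illustrative.

```python
def piecewise_os(km_by_year, annual_mortality_after_trial, horizon_years):
    """OS at yearly time points: KM values within the trial, then external annual risks.

    km_by_year[0] is S(0) = 1.0. The last external risk is carried forward
    once the supplied list is exhausted.
    """
    curve = list(km_by_year)
    trial_years = len(km_by_year) - 1
    year = trial_years
    while year < horizon_years:
        idx = min(year - trial_years, len(annual_mortality_after_trial) - 1)
        curve.append(curve[-1] * (1.0 - annual_mortality_after_trial[idx]))
        year += 1
    return curve

# Hypothetical inputs, for illustration only.
example_curve = piecewise_os([1.0, 0.55, 0.30], [0.35, 0.30, 0.25, 0.20], 10)
```

The trial-period segment is reproduced exactly, so any disagreement between analyses built this way arises from the choice of post-trial risks.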

An overreliance on statistical fitting criteria for generating extrapolated survival curves has been observed in some studies. Parametric statistical fitting can suggest a statistical distribution that fits a limited short-term observation window and yields a survival curve that reliably matches trial KMs, but clinical considerations should govern choices made for longer-term survival extrapolation. The exponential distribution was selected for OS extrapolation in five partitioned-survival analyses and one patient-level simulation analysis mainly based on statistical fitting criteria and visual inspection [20, 24, 26, 28, 29, 31]. However, the exponential distribution assumes a constant risk of death across time, while NSCLC mortality risks are observed to decline over time in population-based data, likely due to surviving patients increasingly reflecting those with long-term remission or cure (complete remission ≥ 5 years) or who otherwise are in better general health (hardy survivor population) [21, 22]. Therefore, applying the exponential distribution for extrapolation is likely to underestimate the long-term survival potential and cost-effectiveness of pembrolizumab-based therapies.
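The effect of the constant-hazard assumption can be illustrated by calibrating two curves to the same observed survival at the end of follow-up and comparing them at a later time. The anchor values and the Weibull shape below are hypothetical; a shape below 1 is used only to represent a declining hazard.

```python
import math

def exponential_os(t, anchor_t, anchor_s):
    """Exponential OS (constant hazard) passing through S(anchor_t) = anchor_s."""
    rate = -math.log(anchor_s) / anchor_t
    return math.exp(-rate * t)

def weibull_os(t, anchor_t, anchor_s, shape):
    """Weibull OS through the same anchor; shape < 1 means the hazard declines."""
    cum_hazard_at_anchor = -math.log(anchor_s)
    return math.exp(-cum_hazard_at_anchor * (t / anchor_t) ** shape)

# Hypothetical anchor: 30% alive at 2 years in both models.
ANCHOR_T, ANCHOR_S = 2.0, 0.30
os10_exponential = exponential_os(10.0, ANCHOR_T, ANCHOR_S)
os10_weibull = weibull_os(10.0, ANCHOR_T, ANCHOR_S, shape=0.7)
```

Both curves agree at the anchor, yet the declining-hazard curve leaves more patients alive at 10 years, which is the direction of bias described above when an exponential model is applied to a population whose mortality risk falls over time.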

Extrapolated annual mortality risks in the trial control arm should not exceed those observed in historical population-based data for metastatic NSCLC patients, such as SEER. Six studies chose to use population-based NSCLC mortality risks directly in long-term modeling [20,21,22, 24, 26, 31], while the rest relied on parametric statistical fitting and seldom compared the extrapolated mortality risks with those observed in population data [16, 18, 23, 25, 28, 29]. If extrapolated mortality risks for the trial control arm are higher than those observed in historical population data, the fitting should be considered problematic, as patients in the control arm can switch to pembrolizumab or other new and efficacious therapies that became available in recent years. Additionally, patients enrolled in the KEYNOTE NSCLC trials were relatively healthier than the general metastatic NSCLC population tracked by SEER because of stringent trial inclusion/exclusion criteria [32,33,34,35,36]. Thus, having higher extrapolated mortality risks for the chemotherapy arm than in historical patients likely underestimates the arm’s longer-term survival. Indeed, Insinga et al. found that the longer-term survival of the pembrolizumab arm and the absolute magnitude of benefit of the pembrolizumab regimen were underestimated in this situation [22].
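The comparison against population-based data can be automated with a simple check. The sketch below derives annual conditional mortality risks from an extrapolated OS curve and flags the years in which they exceed a reference series; the function names are hypothetical, and any reference risks supplied would need to come from an appropriate source such as SEER.

```python
def annual_mortality_risks(os_by_year):
    """Conditional probability of dying in each year, given alive at its start."""
    return [1.0 - os_by_year[i + 1] / os_by_year[i] for i in range(len(os_by_year) - 1)]

def years_exceeding_reference(os_by_year, reference_risks):
    """Years (0-based) in which the extrapolated risk exceeds the reference risk."""
    risks = annual_mortality_risks(os_by_year)
    return [i for i, (risk, ref) in enumerate(zip(risks, reference_risks)) if risk > ref]
```

A non-empty result for the control arm would signal that the chosen parametric fit deserves a second look before it is used for long-term extrapolation.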

All the trial KMs incorporated survival effects of subsequent treatment, and thus the extrapolated survival curves reflected survival effects contributed by subsequent treatment.

3.2.5 Model Validation

Vemer et al. [37] summarized five parts of validation in their Assessment of the Validation Status of Health-Economic decision models (AdViSHE) tool: validation of the conceptual model, model inputs, the computerized model, model outcomes, and other validation techniques. Among the 14 publications, only Huang et al. (KEYNOTE-024) and Loong et al. described all five parts of validation in the AdViSHE tool, performed using expert opinion and comparison with real-world data (RWD) [24, 29]. She et al., Weng et al. and Zhou et al. did not report validation in their papers [18, 19, 25]. The rest of the papers mainly focused on discussing validation of the long-term survival extrapolation or transition probability estimation, i.e., validation of model outcomes [16, 20,21,22,23, 26, 28, 30, 31].

Per suggestions from Vemer et al., validation of model outcomes can be done through face validity testing, cross validation testing, validation using alternative input data, and validation against empirical data [37]. Four studies used face validation techniques such as visual inspection and expert opinion to validate survival extrapolation results [20, 24, 28, 29]. Four studies cross-validated survival extrapolation results with data from a different clinical trial or RWD [24, 26, 28, 29]. Among these four studies, three only cross-validated survival of chemotherapy-treated patients due to lack of long-term clinical trial data or long-term RWD on survival of patients treated with pembrolizumab regimens at the time when these studies were conducted [24, 26, 29]. Chouaid et al. cross-validated survival of patients treated with pembrolizumab monotherapy using published results of KEYNOTE-001, a single-arm study examining pembrolizumab’s treatment effects on patients with advanced NSCLC with a median follow-up of 10.9 months [6, 28]. Six studies constructed long-term OS of chemotherapy-treated patients based on SEER [20,21,22, 24, 26, 31]. Three studies cross-validated model-estimated survival results with the original trial data [16, 23, 30].

3.3 Cost Calculation Methods

Table 3 summarizes the cost calculation methods used in the reviewed studies.

Table 3 Comparison of cost calculation methods

3.3.1 Cost Categories

All 14 studies limited cost calculations to direct medical costs, reflective of study perspectives. Major cost categories (drug acquisition/administration, disease management, adverse event [AE] costs) were included in all studies. Terminal care costs were captured by all studies except Liao et al., She et al. and Zhou et al. [18, 19, 30]. Chouaid et al. also included transportation costs, categorizing them as direct medical costs [28].

Several KEYNOTE NSCLC trials were limited to patients whose tumors expressed PD-L1. For example, KEYNOTE-010 and KEYNOTE-042 enrolled patients with a TPS of ≥ 1%, whereas KEYNOTE-024 enrolled patients with a TPS of ≥ 50%. Most CEAs based on these trials included PD-L1 testing costs. Huang et al. (KEYNOTE-024) and Huang et al. (KEYNOTE-042) conducted a comparison of pembrolizumab versus chemotherapy assuming PD-L1 testing was performed as routine practice and thus did not include PD-L1 test costs in the base case. Both, however, added PD-L1 test costs in scenario analyses and found it had little impact on ICERs [24, 26].

Consistent with real-world clinical practice, patients could switch to subsequent therapies after treatment discontinuation in all KEYNOTE NSCLC trials. Therefore, post-discontinuation treatment costs should be included to accurately reflect real-world practice and costs. Thirteen studies included subsequent treatment costs, most of which reported that estimated ICERs were sensitive to post-discontinuation costs [18, 21, 22, 24,25,26, 29]. Liao et al. did not include post-discontinuation treatment costs [30].

3.3.2 Treatment Durations

Two treatment duration approaches, trial-based time on treatment (ToT) and treat to progression (TTP), were applied in most studies. Five studies used trial-based ToT capped with maximum treatment durations per trial protocols and FDA recommendation (e.g., 35 cycles for pembrolizumab) to measure treatment duration [20,21,22, 24, 26]. Four studies applied the TTP approach, with Chouaid et al. and Weng et al. also incorporating maximum treatment durations [25, 28,29,30]. Georgieva et al. modeled treatment to end either upon progression or TRAEs [16]. The remaining studies reported applying a maximum treatment duration without specifying further details [18, 19, 23, 31].

TTP does not necessarily produce estimates corresponding to actual patient treatment durations. In all pembrolizumab NSCLC clinical trials, as in real-world clinical practice, patients can discontinue treatment before disease progression due to safety, intercurrent illness, protocol non-compliance, or investigator/patient preference. Patients can also continue treatment after disease progression if the investigator considers patients can continue benefiting from pembrolizumab [20,21,22, 24, 26]. TTP approaches can overestimate treatment costs relative to actual observed ToT. For example, in the pembrolizumab plus chemotherapy arm in KEYNOTE-189, Gandhi et al. reported median PFS of 8.8 months versus median duration of treatment for pembrolizumab of 6.9 months, as estimated based on the median number of administrations [32]. Huang et al. (KEYNOTE-010) reported that when applying TTP without the 2-year cap to pembrolizumab-treated patients, the estimated ICER increased from $169K/QALY in the base case using trial-based ToT to $215K/QALY. Applying TTP with the 2-year cap reduced the ICER to $167K/QALY [20].
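The cost implication of the two treatment duration approaches can be sketched by treating either the ToT curve or the PFS curve as the share of patients on treatment in each cycle. The curves and cost per cycle below are hypothetical; only the 35-cycle cap comes from the text.

```python
def drug_cost(on_treatment_by_cycle, cost_per_cycle, max_cycles=None):
    """Expected drug cost: share of patients on treatment in each cycle times cycle cost."""
    shares = on_treatment_by_cycle if max_cycles is None else on_treatment_by_cycle[:max_cycles]
    return sum(share * cost_per_cycle for share in shares)

# Hypothetical per-cycle curves over 60 three-week cycles, for illustration only.
pfs_curve = [0.97 ** i for i in range(60)]  # used as on-treatment share under TTP
tot_curve = [0.95 ** i for i in range(60)]  # trial-based time on treatment
COST_PER_CYCLE = 10_000.0                   # hypothetical

cost_ttp_uncapped = drug_cost(pfs_curve, COST_PER_CYCLE)
cost_ttp_capped = drug_cost(pfs_curve, COST_PER_CYCLE, max_cycles=35)
cost_tot_capped = drug_cost(tot_curve, COST_PER_CYCLE, max_cycles=35)
```

In this illustration the uncapped TTP cost is the largest of the three because the assumed ToT curve lies below PFS; the ordering would differ in a dataset where many patients continue treatment beyond progression.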

3.3.3 Disease Management Costs

Non-drug disease management costs were incorporated across the 14 studies at different levels of detail. For example, Huang et al. (KEYNOTE-024) included costs of long-term care, laboratory tests, radiation therapy, nurse/primary/specialist care, hospitalization, and emergency department (ED) use for PF state disease management, and costs of hospitalization, ED use, ambulatory care, other medical services, and retail pharmacy for PD state disease management [24]. In contrast, Criss et al., Liao et al. and She et al. only counted radiographic and/or laboratory test fees as disease management costs [18, 30, 31].

As disease management in PD tends to be more costly than in PF, disease management costs were applied by health state in nine studies. Three studies further stratified health-state–based management costs by years after treatment initiation. Based on cost inputs reported in these studies, disease management costs declined sharply over the first 6 years after treatment initiation [21, 22, 26]. Therefore, models that use a fixed disease management cost based on short-term follow-up data after treatment initiation may substantially overestimate costs compared with models that allow costs to decline over time based on extended follow-up data. As a result, if a single fixed disease management cost is applied, total costs for treatments that extend survival time, such as pembrolizumab regimens, will be overestimated, and these treatments will be undervalued.
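The size of this effect depends on how many patients survive into the later, lower-cost years. A minimal sketch, using hypothetical survival shares and annual costs, compares a fixed cost with a schedule that declines by year since treatment initiation.

```python
def management_cost(alive_by_year, annual_costs):
    """Total cost: share alive in each year times that year's cost.

    The last cost in annual_costs is carried forward to later years, so a
    one-element list represents a single fixed annual cost.
    """
    total = 0.0
    for year, alive in enumerate(alive_by_year):
        total += alive * annual_costs[min(year, len(annual_costs) - 1)]
    return total

# Hypothetical inputs, for illustration only.
alive = [1.0, 0.6, 0.4, 0.3]
fixed_total = management_cost(alive, [50_000.0])
declining_total = management_cost(alive, [50_000.0, 30_000.0, 20_000.0])
```

The gap between `fixed_total` and `declining_total` widens as survival lengthens, so the arm with the greater survival benefit bears more of the overestimate.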

3.3.4 Subsequent Treatment Costs

Not all patients who discontinue treatment receive subsequent treatment. Patients with clinical progression/deterioration may not be candidates for another line of therapy, while others may opt for no further treatment. The proportions of post-discontinuation patients who received subsequent treatment in the pembrolizumab versus chemotherapy arms were 40% and 44% in KEYNOTE-010 [20], 44% and 59% in KEYNOTE-024 [29], 44% and 49% in KEYNOTE-042 [26], 46% and 57% in KEYNOTE-189 [21], and 27% and 52% in KEYNOTE-407, respectively [22]. Five studies accounted for these values as reported [20,21,22, 26, 29]. Three studies included proportions of discontinued patients receiving subsequent treatment implicitly in proportions of patients receiving subsequent treatment [18, 24, 28]. Criss et al. and Georgieva et al. used proportions of progressed patients receiving subsequent treatment as a proxy for proportions of discontinued patients receiving subsequent treatment [16, 31].

Among studies including subsequent treatment costs, ten applied different post-discontinuation regimens across treatment arms and three assumed both arms switched to the same post-discontinuation regimens and had the same patient distribution across regimens, including anti-PD-1/PD-L1 medications [16, 25, 31]. This assumption contradicts actual trial observations, i.e., subsequent anti–PD-1/PD-L1 medication use was substantially higher in chemotherapy arms. Therefore, assuming equivalent subsequent anti–PD-1/PD-L1 use in the pembrolizumab regimen arms, without a corresponding adjustment to efficacy, biases cost-effectiveness results against pembrolizumab regimens [32, 35].

3.3.5 Adverse Event Costs

Twelve studies included grade ≥ 3 AEs [18,19,20,21,22,23,24, 26, 28,29,30,31], and three studies only considered TRAEs and/or immune-mediated AEs [16, 19, 28]. Seven and two studies included AEs with ≥ 5% [18, 20,21,22,23,24, 26] and ≥ 1% frequency [25, 28], respectively. One study counted AE-related hospitalization costs only [20].

Ideally, all AEs regardless of causality, grade, severity, and frequency of observation should be included in a CEA in order to fully capture AE costs in each arm. However, some AEs observed from KEYNOTE NSCLC trials belong to one or more of the following categories: not caused by treatment, low-grade, less severe, and low-frequency [32,33,34,35,36]. For model simplification purposes, these AEs can be excluded.

3.4 Utility Analysis Methods

Table 4 presents the utility analysis methods used in the 14 studies.

Table 4 Comparison of utility analysis methods

3.4.1 Data Sources

Two data sources provided utility values to these studies: clinical trial data and published literature. Most pembrolizumab NSCLC trials (KEYNOTE-010, KEYNOTE-024, KEYNOTE-189, and KEYNOTE-407) collected utility data from patients using instruments such as the EuroQol 5-dimension, 3-level (EQ-5D 3L) questionnaire or the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire. Six studies based on these trials applied utility values calculated from trial-collected EQ-5D 3L survey data [20,21,22, 24, 28, 29]. The other four drew utility values from published literature [16, 23, 30, 31]. The most frequently cited source was a UK-based publication by Nafees et al. [42], which evaluated utility values for health-state vignettes describing patients with NSCLC treated with second-line chemotherapies, based on a survey conducted in a convenience sample of the general UK population. Thus, there are limitations associated with applying these data to pembrolizumab patients, reflecting the time period of the study (before pembrolizumab availability and other potential changes to clinical practice), a lack of geographically representative data for establishing health-state values, limited applicability of second-line data, and a lack of direct health-related quality of life measurement in pembrolizumab-treated patients with NSCLC and the relevant trial comparator.

Quality of life data were not collected in the KEYNOTE-042 trial. Studies based on this trial used utility data collected from KEYNOTE-024 [26], published literature [19, 25], or a combination [18]. KEYNOTE-042 is a subsequent trial to KEYNOTE-024, expanding the patient population from TPS ≥ 50% to TPS ≥ 1%. It is unknown whether KEYNOTE-042 patients with TPS of 1–49% share the same utility values with KEYNOTE-024 patients, though a previous analysis did not observe substantive differences in utilities for a given health state by PD-L1 expression status [43].

3.4.2 Analysis Approaches

Two approaches were used to incorporate utilities in the cost-effectiveness models: progression status and time to death (TTD). The former assumes utility values for patients vary by NSCLC disease progression status, whereas the latter assumes utility values are affected by the patient’s proximity to death and measures utility values accordingly. TTD potentially allows more detailed utility measurement compared with the progression status approach, which only distinguishes utility values before and after progression. According to Hatswell et al., patients with advanced or metastatic melanoma exhibit a rapid decrease in utility in the last 180 days before death, and the progression status approach cannot capture contributions to this change caused by other factors [44].

Georgieva et al. [16] cited the end-of-life framework suggested by the UK National Institute for Health and Care Excellence (NICE) and assigned a utility value of 1.0 to patients whose OS was extended for > 3 months by pembrolizumab versus chemotherapy.

Three studies applied AE disutility explicitly in modeling [20, 23, 28]. Two studies did not include AE disutility explicitly or implicitly in modeling [19, 25]. The rest of the studies included AE disutility implicitly within the progression status or TTD utility values [16, 18, 21, 22, 24, 26, 29,30,31]. None of the studies compared QALYs and ICERs with and without taking AE disutility into consideration, and thus we could not conclude whether excluding AE disutility would impose a significant impact on the cost-effectiveness estimation.

3.4.3 Utility Values

The utility value inputs used in the 14 studies varied by indication, data source, and analysis approach. Nafees et al. estimated utility values of the three NSCLC health states (responding, stable, and progressive) to be 0.673, 0.653, and 0.473, respectively [42]. These values were lower than the estimates for previously treated patients with NSCLC in KEYNOTE-010 reported by Huang et al., which were 0.761 for PF and 0.687 for PD [20]. The discrepancy can be attributed to three factors. First, with the introduction of new health technologies, utility values can increase over time. For example, in a meta-analysis of chronic kidney disease patients’ utilities, utility values of transplant patients increased from 0.66 to 0.85 from the 1980s to the 2000s [45]. Utility values estimated by Nafees et al. do not reflect the development of less toxic, more efficacious therapeutic options for NSCLC in the 10–15 years since that study was conducted. The second factor is whether utilities are elicited from patients or the general public. For example, the general public may be more likely than cancer patients to overestimate NSCLC’s impact on patients’ quality of life [46]. Finally, the discrepancy may be partially attributed to the different instruments used to measure quality of life in the two studies.

Between the two approaches, patients with ≥ 360 days to death in the TTD approach had higher utility values compared to the PF utility value with the progression status approach, while patients approaching death had lower utility values than the PD utility value. For example, Huang et al. used KEYNOTE-010 trial data to calculate utility inputs for both progression status and TTD approaches. The utility values related to PF and PD were estimated to be 0.761 and 0.687, while the utility values for patients with ≥ 360, 180–360, 90–180, 30–90, and < 30 days to death were estimated to be 0.807, 0.728, 0.688, 0.602, and 0.396, respectively [20], suggesting that the progression status approach underestimates patients’ quality of life when they have longer remaining life and overestimates patients’ quality of life when they approach death in KEYNOTE-010. This finding may indicate that TTD is a more appropriate approach to estimate utility values of advanced NSCLC patients, although we acknowledge that this conclusion may not generalize to other types of cancer, especially those with high survival rates.
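To make the contrast between the two approaches concrete, the following is a minimal sketch that accumulates QALYs for a single patient under each approach, using the KEYNOTE-010 utility values reported by Huang et al. [20]. The function names, the daily time step, the handling of band boundaries (each band is treated as including its lower bound), and the patient trajectory in the usage lines are illustrative assumptions, not part of any reviewed model.

```python
# Utility values reported by Huang et al. for KEYNOTE-010 [20].
PF_UTILITY = 0.761  # progression-free
PD_UTILITY = 0.687  # progressed disease

# (lower bound of days-to-death band, utility), ordered from furthest to nearest death.
# Assumption: each band includes its lower bound (e.g. exactly 360 days -> 0.807).
TTD_BANDS = [(360, 0.807), (180, 0.728), (90, 0.688), (30, 0.602), (0, 0.396)]

DAYS_PER_YEAR = 365.25


def ttd_utility(days_to_death: int) -> float:
    """Return the TTD utility for a given number of days remaining before death."""
    if days_to_death < 0:
        raise ValueError("days_to_death must be non-negative")
    for lower_bound, utility in TTD_BANDS:
        if days_to_death >= lower_bound:
            return utility
    raise AssertionError("unreachable: the last band starts at 0")


def qalys_progression_status(survival_days: int, progression_day: int) -> float:
    """QALYs when utility depends only on progression status."""
    pf_days = min(progression_day, survival_days)
    pd_days = survival_days - pf_days
    return (pf_days * PF_UTILITY + pd_days * PD_UTILITY) / DAYS_PER_YEAR


def qalys_time_to_death(survival_days: int) -> float:
    """QALYs when utility depends only on proximity to death (daily steps)."""
    total = sum(ttd_utility(survival_days - day) for day in range(survival_days))
    return total / DAYS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical patient: progresses on day 200 and dies on day 540.
    print("Progression status:", round(qalys_progression_status(540, 200), 3))
    print("Time to death:     ", round(qalys_time_to_death(540), 3))
```

The two totals differ because the same days of life are weighted differently: the TTD approach weights the final months before death much lower than the PD utility, and the period far from death slightly higher than the PF utility.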

4 Discussion

This is the first comprehensive review of assumptions and methodologies applied in published CEA literature comparing pembrolizumab regimens with chemotherapy/immunotherapy in treating advanced NSCLC. Other existing CEA literature review publications including pembrolizumab focused on ranking the cost-effectiveness among different treatments by directly comparing ICERs, without comparing and discussing the appropriateness of the applied methodologies [9, 10].

Included publications in our review had varying levels of reporting quality. Important methodological or input information was missing or not clearly specified in several studies, making comparison difficult. For example, Criss et al. did not report the time horizon of the CEA [31]. Chouaid et al. and Zhou et al. did not indicate whether or how non-drug disease management costs were applied in their models [19, 28]. In Liao et al., many details of the modeling approach and cost inputs were not specified [30]. Loong et al. reported both TTP and extrapolated ToT as the treatment duration measurement approach, though PFS was actually used (based on communication with a study author) [29]. Furthermore, several studies used ambiguous terms. For example, Georgieva et al. reported including pembrolizumab-arm patient enrollment costs without explaining what these costs were [16]. She et al. and Zhou et al. included “administration” and “hospital administration” costs, without clarifying whether these costs represented drug administration costs or hospital administrative costs [18, 19]. For readers to evaluate the quality and interpret the results of CEA publications, it is critical that authors follow current CEA reporting guidelines by reporting key elements of modeling methods and inputs, using clear and specific terminology [51].

Through summarizing and comparing methodologies applied in different studies and cost-effectiveness results, we found the choice of methodologies can lead to important differences in research findings and sometimes alter study conclusions.

Some studies employed approaches based on strong assumptions for model simplification purposes or due to data limitations. For example, the assumptions of time-independent survival probabilities in a Markov model, of TTP as the basis for estimating treatment costs, of the same management costs for PF and PD states, and of the same post-discontinuation regimens across arms are all contradicted by empirical experience from clinical trials and/or population data. The direction of the bias can sometimes be predicted. For example, neglecting post-discontinuation treatment costs or assuming the same cost in each arm underestimates pembrolizumab’s cost offsets, because more extensive and expensive second-line regimens are used following chemotherapy treatment discontinuation, and therefore biases the results towards favoring chemotherapy. In contrast, assigning a perfect utility score to pembrolizumab-treated patients with > 3 months life extension versus chemotherapy can influence the ICER in a direction favoring pembrolizumab. Finally, when a questionable approach affects both arms or when multiple inappropriate approaches confound results in opposite directions, it can be impossible to predict the direction of bias. Researchers should acknowledge and report the potential bias introduced through such assumptions, and readers should be very cautious when interpreting results.

As many markets require demonstration of cost-effectiveness for new drugs, studies with opposite conclusions around pembrolizumab’s cost-effectiveness in treating the same indication for the same markets can lead to confusion. For example, Georgieva et al. reported an ICER of £45K/QALY [16], compared with £87K/QALY reported by Hu and Hay [23], for the KEYNOTE-024 indication from the UK National Health Service perspective. The differences in estimated ICERs can be attributed to many factors, such as assigning a perfect utility value to pembrolizumab-treated patients with > 3 months life extension compared with the chemotherapy arm in Georgieva et al. [16] and using the exponential extrapolation as the base for transition probability estimation in Hu and Hay [23]. Results from the two studies ended up on opposite sides of the £50K/QALY willingness-to-pay threshold for end-of-life therapies recommended by NICE [52], which theoretically could have led to divergent coverage/reimbursement decisions affecting patients’ access to pembrolizumab.
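The threshold comparison described above can be sketched in a few lines. The example below applies the standard ICER definition (incremental cost divided by incremental QALYs) and checks the two reported ICERs against the £50K/QALY end-of-life threshold. The function names are illustrative assumptions, and the reported ICERs are entered as the rounded values quoted in the text rather than the exact figures from either study.

```python
def icer(cost_new: float, cost_comparator: float,
         qalys_new: float, qalys_comparator: float) -> float:
    """Incremental cost-effectiveness ratio: incremental cost per QALY gained."""
    delta_cost = cost_new - cost_comparator
    delta_qalys = qalys_new - qalys_comparator
    if delta_qalys <= 0:
        # With no QALY gain the ratio is not meaningful as a cost per QALY gained.
        raise ValueError("new treatment must yield more QALYs than the comparator")
    return delta_cost / delta_qalys


def is_cost_effective(icer_value: float, threshold: float) -> bool:
    """A treatment is deemed cost-effective if its ICER does not exceed the threshold."""
    return icer_value <= threshold


# NICE willingness-to-pay threshold for end-of-life therapies (GBP per QALY) [52].
EOL_THRESHOLD_GBP = 50_000

# Rounded ICERs as quoted in the text (GBP per QALY), KEYNOTE-024 indication.
REPORTED_ICERS = {
    "Georgieva et al. [16]": 45_000,
    "Hu and Hay [23]": 87_000,
}

if __name__ == "__main__":
    for study, value in REPORTED_ICERS.items():
        verdict = "below" if is_cost_effective(value, EOL_THRESHOLD_GBP) else "above"
        print(f"{study}: GBP {value:,}/QALY is {verdict} the threshold")
```

Because the decision rule is a hard cutoff, modeling choices that shift the incremental QALYs or incremental costs only moderately can move an ICER across the threshold and reverse the conclusion.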

4.1 Limitations

Several limitations are acknowledged for this literature review. First, the study focused on peer-reviewed CEA publications. We acknowledge that Health Technology Assessment (HTA) appraisals may include insightful discussions on limitations of research design and/or data inputs used in selected CEAs. However, we did not find any HTA appraisal published on pembrolizumab as a treatment for NSCLC when the search was conducted. Second, only studies published in the English language were included. Although these studies cover multiple geographic regions, they may not fully represent pembrolizumab CEA publications in other languages. Third, no quality assessment was performed on identified studies as part of the inclusion criteria. All CEAs of pembrolizumab regimens as treatment for NSCLC, regardless of quality, were summarized and compared for the purpose of critically evaluating and comparing applied methodologies. Fourth, some studies omitted or did not report in detail important methodological aspects, and these elements could not be compared with other studies. Finally, due to the number of studies and limitations in information reported, we could not precisely identify and quantify all sources of variation in ICERs across studies, though some relevant factors and their perceived impact have been identified herein.

5 Conclusion

With growing healthcare expenditures worldwide, more and more healthcare payers rely on CEAs to make drug coverage and reimbursement decisions. These decisions can affect clinical guidelines and practice and impact patients’ access to drugs. In such circumstances, it is important for the CEAs used for decision making to meet high standards and produce unbiased results. In this literature review, we found that the quality of published CEAs varies greatly, and that questionable CEA methodologies could significantly bias results and alter study conclusions. This illustrates how important it is for payers, policy makers, and the scientific community to carefully examine study designs and assumptions when using CEAs for evidence-based decision-making.