Background

Non-small cell lung cancer (NSCLC) is the most common form of lung cancer, occurring in 85–90% of lung cancer cases [1], and includes adenocarcinoma (40% of all lung cancers), squamous cell carcinoma (25–30%) and large cell carcinoma (10–15%) [2]. NSCLC is staged according to the American Joint Committee on Cancer/Union for International Cancer Control system [3], and measurement of lesions follows the Response Evaluation Criteria in Solid Tumors (RECIST) [4]. Approximately 40% of patients will have metastatic NSCLC (mNSCLC) at diagnosis [5], which includes cancers found in the lung and in the lymph nodes in the middle of the chest (defined as stage IIIA and IIIB; no distant metastasis), and cancers that have spread to both lungs or to another part of the body (defined as stage IV; distant metastasis) [6, 7].

Treatment is recommended according to the stage of mNSCLC, but treatment options are limited in the later stages of disease [7, 8]. Five-year survival rates are considerably lower in later than in earlier stages of NSCLC (stage IA, 45%; stage IIIA, 14%; stage IIIB, 5%; stage IV, 1%) [9]. Moreover, symptoms such as coughing and wheezing, chest pain, hoarseness and weight loss can severely reduce functional independence in patients with mNSCLC [10, 11]. Patient-reported health-related quality of life (HRQoL) provides an overall evaluation of health, well-being and daily functioning, and is impaired in patients with mNSCLC owing both to the disease and to treatment sequelae. Maintenance or improvement of HRQoL is an important treatment goal [12].

HRQoL can be expressed as a health state utility value (HSUV) on a scale anchored at 0 (death) and 1 (full health). Health states considered worse than death can take values below 0. Utility values are key drivers in cost-effectiveness analyses because quality-adjusted life-years (QALYs) are obtained by multiplying the HSUV for each health state by the time spent in that state. Estimates of cost per QALY are therefore highly sensitive to the choice of HSUV, and it is important to identify specifically those HSUVs that have been derived using methods acceptable to health technology assessment (HTA) authorities [13].
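
As a minimal sketch of the QALY arithmetic described above, the Python function below multiplies each HSUV by the time spent in the corresponding health state and sums the products. The function name and the utilities and durations in the example are hypothetical, chosen only for illustration.

```python
def qalys(states):
    """Return QALYs for a sequence of (hsuv, years) pairs.

    HSUVs are on the 0 (death) to 1 (full health) scale; values below 0
    (states worse than death) are allowed, values above 1 are rejected.
    """
    total = 0.0
    for hsuv, years in states:
        if hsuv > 1:
            raise ValueError(f"HSUV {hsuv} exceeds full health (1.0)")
        if years < 0:
            raise ValueError("time in a health state cannot be negative")
        total += hsuv * years  # utility-weighted time spent in this state
    return total


# Hypothetical profile: 0.5 years progression-free, then 0.75 years after progression.
profile = [(0.71, 0.5), (0.60, 0.75)]
print(round(qalys(profile), 3))

# Lowering only the post-progression HSUV shows how sensitive QALYs are to this input.
print(round(qalys([(0.71, 0.5), (0.55, 0.75)]), 3))
```

Because each state's contribution scales linearly with its HSUV, a 0.05 change in one state's value shifts total QALYs by 0.05 times the time spent in that state. The change then passes directly into the cost per QALY.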

HSUVs can be derived using a range of instruments and techniques [14, 15]. In brief, these include generic preference-based measures such as the EQ-5D-3L [16] or EQ-5D-5L [17], the Health Utilities Index (HUI) [18], the 6-dimension Short-Form Health Survey (SF-6D) [19], the Assessment of Quality of Life instrument (AQoL) [20], the 15-dimensional HRQoL measure [21], the Quality of Well-Being scale [22] and other multi-attribute utility instruments. They also include directly elicited standard gamble (SG), time trade-off (TTO) and visual analogue scale (VAS; e.g. the EuroQol VAS [EQ-VAS]) values. Mapping algorithms may also be used to convert values obtained from a condition-specific questionnaire to a generic preference-based measure, or to convert data from the 12- or 36-item Short-Form Health Survey (SF-12 or SF-36) to the SF-6D [23]. Techniques may vary in whose health is being measured (a patient's or a caregiver's). They may also vary in who responds to the questionnaire or, if vignettes are used, who considers the health-state description: the patient regarding their own health, a patient with a different disease, the patient's closest caregiver, another caregiver, a physician or another healthcare provider. For preference-based measures, further variation can stem from who values the health state (e.g. a UK general population sample) and which choice-based method is used for this valuation (SG or TTO).

HTA bodies including the UK National Institute for Health and Care Excellence (NICE) [24, 25], the Scottish Medicines Consortium (SMC) [26], the Canadian Agency for Drugs and Technologies in Health (CADTH) [27], the French Haute Autorité de Santé (HAS) [28] and the Australian Pharmaceutical Benefits Advisory Committee (PBAC) [29] have stated preferences for HSUV methodology. Across these agencies, HSUVs estimated using generic preference-based measures are preferred. NICE has a strong preference for EQ-5D, as using a single instrument reduces the variability introduced when different instruments are used across disease areas. Agencies also strongly prefer patients to be the respondents, as patients can best describe their own health state. Finally, the preferred approach to valuation is a country-specific general-population tariff derived using a choice-based elicitation technique such as SG or TTO, as this reflects societal preferences.
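
The agency preferences summarized above can be expressed as a simple checklist. The sketch below is a hypothetical encoding for illustration only. The instrument labels, field names and the way each preference is tested are assumptions, and no agency publishes its reference case in this form. The sketch reports which of four checks a utility source fails: generic preference-based measure, patients as respondents, choice-based valuation, and a general-population valuation sample. It does not test whether the tariff is country-specific.

```python
from dataclasses import dataclass

# Hypothetical labels for the generic preference-based measures named above.
GENERIC_PREFERENCE_BASED = {"EQ-5D-3L", "EQ-5D-5L", "HUI", "SF-6D", "AQoL", "15D", "QWB"}
CHOICE_BASED_METHODS = {"SG", "TTO"}


@dataclass
class UtilitySource:
    instrument: str        # e.g. "EQ-5D-3L", or "vignette" for directly valued descriptions
    respondent: str        # who describes the health state: "patient", "caregiver", "public", ...
    valuation_method: str  # how health states were valued: "SG", "TTO", "VAS", ...
    valuation_sample: str  # whose preferences were used, e.g. "UK general population"


def reference_case_gaps(src: UtilitySource) -> list:
    """List the stated HTA preferences (as encoded here) that a utility source does not meet."""
    gaps = []
    if src.instrument not in GENERIC_PREFERENCE_BASED:
        gaps.append("not a generic preference-based measure")
    if src.respondent != "patient":
        gaps.append("health state not described by patients")
    if src.valuation_method not in CHOICE_BASED_METHODS:
        gaps.append("valuation not choice-based (SG or TTO)")
    if "general population" not in src.valuation_sample:
        gaps.append("values not from a general-population sample")
    return gaps


print(reference_case_gaps(UtilitySource("EQ-5D-3L", "patient", "TTO", "UK general population")))
print(reference_case_gaps(UtilitySource("vignette", "public", "SG", "UK general population")))
```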

This systematic review had three main aims. The first was to identify HSUVs for adults with previously treated mNSCLC by treatment line and health state, and to evaluate the relevance of each health state to patients (for example, line of treatment, adverse events [AEs], response status and prognostic factors). The second was to identify relevant disutilities or utility decrements associated with AEs, irrespective of line of treatment or health state. The third was to explore the suitability of the HSUVs relative to the HTA reference cases and to assess their quality.

Methods

Study design and search strategy

A systematic review was undertaken to identify studies reporting HSUVs in mNSCLC in any treatment line. Studies in patients previously treated for mNSCLC, published either as full papers or as conference abstracts, were selected for further analysis. The following databases were searched: Embase (1974 to 7 September 2016); MEDLINE® (1966 to 7 September 2016); MEDLINE In-Process and e-publications ahead of print (database inception to 7 September 2016); and the Cochrane Library (including the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effects, the Cochrane Central Register of Controlled Trials, the National Health Service Economic Evaluation Database and the HTA database; 1968 to 7 September 2016).

Search strings are summarized in Additional file 1: Table S1. They were constructed to find utilities in mNSCLC, using a wide range of NSCLC and mNSCLC terms combined with the HSUV filter adapted from Arber et al. 2015 [30]. They were also constructed to identify all relevant disutilities or utility decrements associated with AEs/comorbidities. To ensure that estimates would be available for all AE or comorbidity health states relevant to the experience of previously treated patients with mNSCLC, the strings searched for disutilities or decrements across a broader group of populations:
- disutilities from lung cancer populations;
- progressive disease disutilities from advanced/metastatic cancer populations;
- disutilities associated with the most common sites of metastasis from the lung (bone, respiratory system, nervous system, adrenal gland and liver) from advanced cancer populations;
- disutilities associated with AEs or toxicities of cancer therapy; and
- disutilities associated with specific grade 3–4 AEs known to occur with cancer treatments, from advanced cancer populations: pneumonia, pneumonitis, increased aspartate aminotransferase, febrile neutropenia, neutropenia, infection, sepsis, fatigue, lethargy, nausea, vomiting, ulcers, stomatitis, gastrointestinal disturbance, diarrhoea, visual disturbance, hearing loss, hair loss, psychological/self-esteem changes, rash, anaemia, bleeding and hypertension.

For each AE/comorbidity health state, the estimate from the most relevant population available could then be selected. Populations were considered in order of decreasing specificity, from first-line mNSCLC to NSCLC, lung cancer and advanced/metastatic cancer (Fig. 1).
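
The fallback rule described above (use the most specific population for which an estimate exists) can be sketched as an ordered search. The population labels and the example values are hypothetical placeholders, not estimates from the included studies. The target previously treated population is listed first, followed by the order given in the text.

```python
# Populations in order of decreasing specificity; labels are illustrative.
POPULATION_HIERARCHY = [
    "previously treated mNSCLC",  # target population
    "first-line mNSCLC",
    "NSCLC",
    "lung cancer",
    "advanced/metastatic cancer",
]


def select_estimate(estimates, health_state):
    """Return (population, value) for the most specific population with an estimate.

    `estimates` maps (health_state, population) -> (dis)utility. Returns None when
    no population in the hierarchy provides a value for `health_state`.
    """
    for population in POPULATION_HIERARCHY:
        key = (health_state, population)
        if key in estimates:
            return population, estimates[key]
    return None


# Placeholder values for illustration only, not results from the included studies.
example = {
    ("fatigue", "lung cancer"): -0.10,
    ("fatigue", "advanced/metastatic cancer"): -0.12,
    ("rash", "first-line mNSCLC"): -0.03,
}
print(select_estimate(example, "fatigue"))
print(select_estimate(example, "sepsis"))
```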

Fig. 1

Studies reporting adverse event health state (dis)utilities by patient population and country. Abbreviations: 1L first line, 2L second line, BC breast cancer, BM bone metastasis, Hb haemoglobin, i.v. intravenous, LC lung cancer, LNS line of treatment not specified, mLC metastatic lung cancer, mNSCLC metastatic non-small cell lung cancer, NICE National Institute for Health and Care Excellence, NSCLC non-small cell lung cancer, SCLC small cell lung cancer, SRE skeletal-related event

Manual searches were conducted on 3 and 5 December 2016 using the term “NSCLC” or “non-small cell lung cancer”. They covered the EQ-5D website, the School of Health and Related Research Health Utilities Database (ScHARRHUD) and major pharmacoeconomic and clinical conferences held in 2015–2016. Conferences included: the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) International Meetings and European Congresses; the HTA International Annual Meetings (HTAi); the Society for Medical Decision Making (SMDM) North American Meetings and European Conferences; the American Society of Clinical Oncology (ASCO) Meetings; and the European Society for Medical Oncology (ESMO) Congresses. The reference lists of relevant systematic reviews published from 2010 onwards were also searched. So were those of relevant cost–utility analyses and of HTA reports from various bodies identified in a parallel economic systematic review, including: NICE; SMC; All Wales Medicines Strategy Group (AWMSG); PBAC; CADTH; Institut National d’Excellence en Santé et en Services Sociaux; pan-Canadian Oncology Drug Review (pCODR); and HAS.

The PICOS (patient, intervention, comparator, outcome, study) statements for study inclusion and exclusion criteria are summarized in Table 1. Second- and later-line data were of primary interest. Even so, studies that reported utilities for patients with mNSCLC who were either treatment-naïve or receiving first-line maintenance treatment were retained for reference at the first screening, but their data were not extracted. These studies are listed in Additional file 2: Table S2.

Table 1 Inclusion criteria

Studies mapping condition-specific measures to preference-based values were not sought, for two reasons. First, it was anticipated that sufficient published utility and EQ-5D data would be available to populate the health states of an economic model. Second, results based on mapping algorithms rank lower in the acceptance hierarchy used by some HTA authorities (Additional file 3: Figure S1). We have acknowledged NICE’s stated preference for EQ-5D-3L over EQ-5D-5L data (Additional file 3: Figure S1) and provide detailed information on the instrument used to generate the data for each identified study in Table 2 [31].

Table 2 Identified utility studies by line of treatment

Study selection

The screening process complied with the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [32]. Publications were de-duplicated using EndNote (Clarivate Analytics, Philadelphia, PA, USA) and Rayyan (Qatar Computing Research Institute, Doha, Qatar) [33], an internet-based reference management system endorsed as suitable for systematic review screening by the European Network for HTA [34]. Titles and abstracts were screened by one reviewer, and a second reviewer checked a 50% sample; exclusion criteria are summarized in Table 1. The full texts of papers potentially meeting the selection criteria were likewise screened by one reviewer, with a 50% sample checked by a second reviewer. Discrepancies were discussed between reviewers, and any unresolved disputes were referred to a third reviewer.

Data extraction

Data were collected using a piloted data-extraction sheet. One reviewer extracted the data, and a second reviewer quality-checked priority data elements. The information extracted included:
- study design;
- whether the selection criteria yielded a population that matched the target population (i.e. previously treated adult patients with mNSCLC);
- health state description;
- instrument type and scale;
- HSUV, (dis)utility or decrement estimates and their measure of variability (median with interquartile range, or mean with standard error, standard deviation or 95% confidence interval);
- derivation methods; and
- whether the data presented were appropriate for use in HTA submissions to NICE, SMC, CADTH, HAS and PBAC.
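
Because the extracted measures of variability differ between studies, they are often converted to a common standard error before use in probabilistic sensitivity analysis. The sketch below shows that step under stated assumptions: approximately normal sampling distributions and symmetric 95% confidence intervals. Medians with interquartile ranges cannot be converted reliably in this way, so the sketch does not attempt it. Function names and example inputs are hypothetical.

```python
import math
from statistics import NormalDist

# Standard normal quantile for a two-sided 95% interval (about 1.96).
Z_975 = NormalDist().inv_cdf(0.975)


def se_from_sd(sd, n):
    """Standard error of a mean from a standard deviation and sample size."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


def se_from_ci95(lower, upper):
    """Standard error implied by a symmetric 95% confidence interval (normal approximation)."""
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    return (upper - lower) / (2 * Z_975)


# Hypothetical reported values.
print(round(se_from_sd(0.22, 120), 4))
print(round(se_from_ci95(0.66, 0.74), 4))
```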

Quality and relevance assessment

The appropriateness of the reported utilities for use in economic evaluations was judged on two grounds. The first was whether the data met the requirements of each HTA body’s reference case. The second was the quality of the utility estimates, based on sample size, response to the questionnaire, loss to follow-up, handling of missing data, and reporting of point and variance estimates, as discussed in NICE Decision Support Unit Technical Support Document 11 and its related publication [25, 35] (Additional file 4: Table S3). In line with preliminary guidance from the ISPOR Health State Utility Good Practices Task Force [36], any recommendation for, or rationale against, the use of specific utilities in a cost–utility model for previously treated patients with mNSCLC was also taken into consideration.

Results

Search yields

Electronic database searches identified 1883 citations (1521 from MEDLINE/Embase, 144 from MEDLINE In-Process/e-publications and 218 from the Cochrane Library databases). After removal of 51 duplicates (30 via EndNote and 21 via Rayyan) and title/abstract screening (1557 exclusions), 275 full-text papers were reviewed. Of these, 250 were excluded (21 of which were tagged as reporting first-line treatment; Additional file 2: Table S2), leaving 25 citations included from electronic sources. Manual searching identified a further 11 citations. In total, 36 articles reporting 34 studies were included (Table 2). Two articles [37, 38] were linked to other publications [39, 40] and were retained because they provided additional information. Study selection is summarized in the PRISMA flow chart in Fig. 2.
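
The counts above can be checked for internal consistency with a few lines of arithmetic. The sketch reproduces the flow from identified citations to included articles using only the numbers reported in this paragraph. The variable names are illustrative.

```python
# Counts reported in the search yields above.
database_hits = {
    "MEDLINE/Embase": 1521,
    "MEDLINE In-Process/e-publications": 144,
    "Cochrane Library": 218,
}
duplicates = 30 + 21  # removed via EndNote and via Rayyan
title_abstract_exclusions = 1557
full_text_exclusions = 250
manual_search_inclusions = 11

identified = sum(database_hits.values())
full_text_reviewed = identified - duplicates - title_abstract_exclusions
electronic_included = full_text_reviewed - full_text_exclusions
total_included = electronic_included + manual_search_inclusions

print(identified, full_text_reviewed, electronic_included, total_included)  # prints: 1883 275 25 36
```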

Fig. 2

PRISMA flow chart for study selection. Abbreviation: PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Description of studies identified

Among the 36 articles (34 studies) identified, 19 reported EQ-5D scores [37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55]. Of these, three studies further specified the instrument as EQ-5D-3L [39, 41, 56] and two as EQ-5D-5L [44, 57] (Table 2). Two articles reported SG or TTO values directly elicited from patients [58, 59], two reported EQ-VAS scores only [57, 60], and one reported AQoL scores [61] (Table 2). In addition, one study reported SF-12 scores for caregivers of patients with mNSCLC [62]. Eight reported HSUVs from valuations of vignettes by members of the public using SG or TTO [59, 63,64,65,66,67,68,69]; one of these reported both general public-elicited SG and patient-elicited TTO values [59]. One article reported disutility estimates based on expert opinion for pneumothorax, thrombocytopenia and thrombosis [70], AE health states for which disutilities were not available from other HSUV derivation methods. A further two articles reported HSUVs but did not make clear how these were derived. One reported disutilities used in previous NICE submissions for anaemia and for oral and intravenous treatment modes [71], and one reported a “global quality of life index” for second-line NSCLC [72].

Within the dataset, two studies were retained despite reporting first-line treatment only, because they reported AE disutility estimates from populations broader than mNSCLC [68, 70]. Three further studies that reported first-line data also reported on subsequent treatment lines [38, 39, 46]. Eleven studies focused exclusively on HSUVs associated with second-line treatment [41, 45, 48,49,50,51, 57, 59, 69, 71, 72], and five reported HSUVs in second-line and subsequent treatment [37, 40, 56, 60, 73]. Line of treatment was unspecified in 15 studies [42,43,44, 47, 52,53,54,55, 58, 61,62,63, 65,66,67].

Relevant HSUVs by line of treatment

Utilities were reported for a range of health state types: treatment-specific or not, RECIST response-based or not, time-on-treatment, time-to-death, or a combination of these. Details of HSUV estimates by treatment line are given in Table 3. Among patients receiving second-line or subsequent treatment for advanced NSCLC or mNSCLC, mean EQ-5D-based HSUV estimates for stable/progression-free disease and for patients at baseline or pre-treatment ranged from 0.66 to 0.76 [38, 39, 41, 45, 49, 50]. In the same group, mean values for patients with progressive disease were generally lower (0.55–0.69) [38, 39, 45]. Among patients on treatment at this stage of disease and treatment line, mean EQ-5D-based HSUVs ranged widely (0.53–0.82) [40, 41, 46, 51, 56], with the highest value associated with tyrosine kinase inhibitor treatment [41, 56]. A similar range (0.53–0.77) was seen among patients treated for advanced NSCLC or mNSCLC when the treatment line was unspecified [42, 47, 52, 53]. Only three papers specified the EQ-5D-3L [39, 41, 56] and only two the EQ-5D-5L [44, 57].

Table 3 Health state utility values by treatment line, health state and instrument

Disutilities for progression from a stable state were −0.056 or −0.065 by EQ-5D, both from Griebsch et al. [37], or −0.1798 by general population-derived SG [69]. Overall, HSUVs varied by treatment line and disease state. They also varied by the treatment received within the same health state, potentially reflecting differences in safety profiles, and by the instrument/tariff used to derive the HSUV.

Relevant disutilities and decrements

Eleven studies identified in this systematic review reported disutilities or decrements for AE health states [44, 52, 55, 58, 59, 65, 67,68,69,70,71]. Only two of these reported disutilities specifically associated with second-line treatment [69, 71], and two did not specify the treatment line [44, 65]. Disutility and decrement data are summarized in Table 4. Utilities incorporating AE decrements were identified for diarrhoea, fatigue, febrile neutropenia, hair loss and nausea/vomiting, in the context of second-line “stable disease” or second-line “responding” health states. Disutilities associated with second-line treatment were reported for the following events [69]: “moving from stable to progressive state” (−0.18), neutropenia (−0.09), febrile neutropenia (−0.09), fatigue (−0.07), nausea/vomiting (−0.05), diarrhoea (−0.05), hair loss (−0.04) and rash (−0.03).
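
A common way to use AE disutilities like those above in a cost–utility model is to apply each one for the duration of the AE episode and weight it by the AE's incidence. This assumes the effects of different AEs are additive. The sketch below takes that approach with the rounded second-line values from Nafees et al. [69] listed above. The additive assumption, the function name, and the incidence and duration inputs are all hypothetical.

```python
# Rounded second-line disutilities reported above (Nafees et al. [69]).
DISUTILITY_2L = {
    "neutropenia": -0.09,
    "febrile neutropenia": -0.09,
    "fatigue": -0.07,
    "nausea/vomiting": -0.05,
    "diarrhoea": -0.05,
    "hair loss": -0.04,
    "rash": -0.03,
}


def ae_qaly_change(incidence, duration_days, disutility=DISUTILITY_2L):
    """QALY change from AEs (negative = loss), assuming additive, time-limited effects.

    incidence:     AE -> proportion of patients experiencing it (hypothetical input)
    duration_days: AE -> mean episode duration in days (hypothetical input)
    """
    total = 0.0
    for ae, proportion in incidence.items():
        years = duration_days[ae] / 365.25  # convert episode length to years
        total += proportion * disutility[ae] * years
    return total


# Hypothetical incidences and durations, for illustration only.
change = ae_qaly_change(
    incidence={"fatigue": 0.20, "febrile neutropenia": 0.10},
    duration_days={"fatigue": 30, "febrile neutropenia": 7},
)
print(round(change, 5))
```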

Table 4 Disutilities and decrements for adverse event health states in patients with previously treated mNSCLC

Further recommended sources of AE health state (dis)utilities were as follows (Fig. 1):
- for 2L, general-population SG values from Nafees et al. 2008 [69];
- for metastatic NSCLC (line unspecified), general-population SG values from Doyle et al. 2008 [65];
- for 1L, directly elicited TTO values from patients without NSCLC in Nafees et al. 2016 [68];
- for 2L NSCLC, values reported in Westwood et al. 2014 [71];
- for skeletal-related events in cancer with bone metastases, general-population TTO values from Matza et al. 2014 [67];
- for stage IV NSCLC in 1L, expert-opinion estimates from Handorf et al. 2012 [70], included because they are the only source of disutilities for pneumothorax, thrombocytopenia and thrombosis; and
- for anaemia, general-population SG or patient-derived TTO values from Lloyd et al. 2008 [59].

Description of HTA-relevant HSUVs and disutilities

Of the 36 publications, 13 provided HSUVs that meet the NICE reference case or are considered acceptable to the HTA agencies of interest [37,38,39,40, 42, 45, 46, 49, 53, 56, 58, 64, 69]. This assessment was based on the measurement technique used to generate the HSUVs, as outlined in Additional file 3: Figure S1. The main characteristics of these studies are presented in Table 3. These 13 publications reported data from multinational studies [37,38,39,40, 45, 49, 64] and from Canada [42, 56], France/Germany [46], the USA [58], Italy [53] and the UK [69]. In these studies, HRQoL was measured using EQ-5D [37,38,39,40, 42, 45, 46, 49, 53, 56], EQ-VAS [37, 39, 40, 49] and SG [58, 64, 69]. The HTA suitability of disutilities and decrements for AE health states in previously treated patients is reported in Table 4.

Discussion

Economic evaluation, particularly cost–utility analysis, provides important information for guiding decision-making in health care, and its use in HTA is increasing globally. Such evaluations examine the time spent in different disease states and apply an HSUV to each state to calculate QALYs, so HSUVs play a key role in economic evaluation. As summarized in Additional file 3: Figure S1, NICE, SMC, CADTH, HAS and PBAC prefer utilities to be estimated using a generic preference-based instrument. They prefer the health states to be described by patients through a questionnaire and valued using a country-specific tariff that reflects societal preferences. The aim of this systematic review was to identify HSUVs relevant to the experience of adults with previously treated mNSCLC, so synthesis of the health state utility estimates was outside its scope. However, the findings presented here may provide a basis for generating an accurate estimate of the mean HSUV for use in economic evaluations [74, 75].
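
Synthesis was outside the scope of this review. To illustrate what generating a pooled mean HSUV could involve, the sketch below computes a fixed-effect inverse-variance weighted mean from (mean, standard error) pairs. This is one standard technique, not necessarily the method of the cited works. Given the heterogeneity in instruments, tariffs and populations described here, a random-effects model would usually be more defensible. The function name and example inputs are hypothetical.

```python
import math


def pool_fixed_effect(estimates):
    """Fixed-effect inverse-variance weighted mean and its standard error.

    `estimates` is a list of (mean_hsuv, standard_error) pairs; each study is
    weighted by 1 / SE^2, so more precise estimates contribute more.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * mean for w, (mean, _) in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Hypothetical stable-disease estimates from three studies.
mean, se = pool_fixed_effect([(0.70, 0.02), (0.74, 0.03), (0.66, 0.05)])
print(round(mean, 3), round(se, 4))
```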

This systematic review identified HSUVs relevant to the experience of previously treated adult patients with mNSCLC. Search strings were designed to capture (dis)utilities from broader populations, including lung cancer, advanced/metastatic cancer and specific metastases common in patients with lung cancer. Where second-line mNSCLC (dis)utilities were not available, alternatives were selected in order of decreasing population specificity and relevance: first-line mNSCLC, NSCLC, lung cancer or advanced/metastatic cancer, as outlined in Fig. 1. Ordering the HSUVs by line of treatment reflects the practice of switching treatment at progression. However, with the newer immunotherapies, patients may remain on treatment after progression, and their HRQoL may remain at pre-progression levels. HSUVs estimated for progression-specific health states in patients receiving chemotherapy may therefore not be suitable for the equivalent health states in patients receiving immunotherapy.

In total, the 36 identified articles reported 591 HSUVs relevant to the experience of previously treated adult patients with mNSCLC. Eleven of these studies reported a total of 195 disutilities or decrements for AE health states relevant to the experience of patients with mNSCLC. HSUVs for comparable health states varied widely, for example for progression-free/stable disease among patients treated second-line for advanced/metastatic NSCLC [39, 45]. This range highlights how differences in study type, tariff, health state and the measures used can drive variation in HSUV estimates. For instance, disutilities for progression from a stable state were −0.056 or −0.065 using EQ-5D [37], but −0.1798 using general-population-derived SG [69]. To reduce such variation, HSUV studies should, where possible, use the instruments, respondents and valuation populations most acceptable to HTA bodies. However, variation in methods can sometimes be justified. For example, Nafees et al. [69] used disutility values derived from vignettes valued by a general public sample. Asking patients experiencing such toxicities to complete HRQoL questionnaires was considered too burdensome and potentially unethical. Moreover, although this variation may be large, documenting it helps decision makers to identify where variability exists and informs the design of sensitivity analyses.
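
To illustrate how much this variation matters in a model, the sketch below runs a simple one-way sensitivity analysis. It applies each of the three progression disutilities quoted above to a hypothetical stable-disease utility of 0.71, within the range reported in this review for second-line stable/progression-free disease. It assumes a hypothetical 0.5 years spent in the progressed state. Combining a decrement from one instrument with a baseline from another is a simplifying assumption made only for this illustration.

```python
def progressed_qalys(stable_hsuv, progression_disutility, years_progressed):
    """QALYs accrued after progression, modelled as stable HSUV plus a (negative) disutility."""
    return (stable_hsuv + progression_disutility) * years_progressed


# Progression disutilities quoted above: two EQ-5D estimates [37] and one general-population SG estimate [69].
scenarios = {"EQ-5D, -0.056": -0.056, "EQ-5D, -0.065": -0.065, "SG, -0.1798": -0.1798}
for label, disutility in scenarios.items():
    # Hypothetical inputs: stable HSUV 0.71, 0.5 years in the progressed state.
    print(label, round(progressed_qalys(0.71, disutility, 0.5), 4))
```

Under these assumptions, the choice between the EQ-5D and SG estimates changes post-progression QALYs by about 0.06 over half a year. That difference would carry through to the incremental cost per QALY.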

Of the 36 publications identified, 13 provided HSUVs that meet the NICE reference case or are considered acceptable to the HTA agencies of interest [37,38,39,40, 42, 45, 46, 49, 53, 56, 58, 64, 69]. These were deemed suitable because HRQoL was measured using the EQ-5D [37,38,39,40, 42, 45, 46, 49, 53, 56] or SG [58, 64, 69], both preferred or accepted by several HTA authorities. This work fills an important gap in the field. Previously, only two reports had described HSUVs in mNSCLC [68, 69]; neither was a systematic review of the literature, and neither assessed the appropriateness of the values for use in economic evaluations.

This systematic review did not identify an HSUV report based on data from the OAK trial (NCT02008227), because it was published as a congress abstract after the cut-off date for literature searching [76]. However, these HSUVs are relevant to the aims of this systematic review, and a brief description is provided here for completeness. In this phase 3 trial, patients with locally advanced NSCLC or mNSCLC after failure of platinum-containing chemotherapy were randomized to receive atezolizumab or docetaxel [76, 77]. As part of the trial, patients completed the EQ-5D, and the resulting HSUVs were presented by time point before death. This is similar to Huang et al. 2016, which also presented time-to-death EQ-5D utilities for a similar patient group receiving immunotherapy, in that case comparing pembrolizumab with docetaxel [45]. Overall, HSUVs were very similar between the two studies at approximately equivalent time points. In the OAK study, the following HSUVs were reported by time point before death: 0.77 (> 210 days), 0.71 (105–210 days), 0.61 (35–105 days) and 0.39 (< 35 days). For comparison, HSUVs published by Huang et al. 2016 were 0.73 (180–360 days), 0.69 (90–180 days), 0.60 (30–90 days) and 0.40 (< 30 days). A further study evaluating the efficacy of immunotherapy in patients with NSCLC showed similar baseline mean EQ-VAS and EQ-5D index scores for nivolumab (63.7 and 0.68, respectively) and docetaxel (66.3 and 0.66, respectively) [50].
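
Time-to-death utilities like those reported for OAK can be applied in a model through a simple lookup from days before death to HSUV. The sketch below uses the OAK values quoted above. The text does not state how values falling exactly on a band boundary are assigned. The sketch treats each band as (lower, upper], so a cut-point goes to the band below it; this is an assumption, as are the function and variable names.

```python
# (exclusive lower bound in days before death, HSUV), most distant band first.
# Values are the OAK EQ-5D utilities quoted above; each band is treated as
# (lower, upper], an assumption about how cut-points are handled.
OAK_BANDS = [(210, 0.77), (105, 0.71), (35, 0.61), (0, 0.39)]


def hsuv_by_days_to_death(days_before_death, bands=OAK_BANDS):
    """Return the HSUV for a given number of days before death."""
    if days_before_death < 0:
        raise ValueError("days before death must be non-negative")
    for lower, hsuv in bands:
        if days_before_death > lower:
            return hsuv
    return bands[-1][1]  # day 0 (the day of death) falls in the final band


for days in (300, 150, 60, 10):
    print(days, hsuv_by_days_to_death(days))
```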

Strengths of this systematic review include the wide range of data sources searched and the search string design. Together, these enabled identification of disutilities and utility decrements for a wide range of AEs and progressive disease states (e.g. common sites of metastasis from lung cancer) relevant to the experience of patients previously treated for mNSCLC. We have presented HSUVs by line of treatment to facilitate their use in economic modelling, and have discussed which HSUVs are likely to be accepted by the HTA bodies of interest. Inadequate or inconsistent reporting is common, and small sample sizes and low response rates can considerably affect the confidence intervals of the reported results. However, most of the studies identified here reported sample size, with over 100 respondents in most cases. Many provided a measure of variability for the values reported, and several were based on response rates greater than 80%, although response rates were unreported in more than half of the studies. Relying only on published HSUVs can also be a limitation, as HTA submissions may use HSUVs that have not been published elsewhere. As part of this systematic review, we therefore also searched HTA submissions for relevant utilities; most used data reported by Nafees et al. [69].

One limitation of this review is that the label for the upper bound of the utility scale (e.g. “full health” or “perfect health”) was not recorded. This label has been shown to be a significant predictor of utility in lung cancer [78], so variation in utilities due to different upper-bound labels could not be explored. A further limitation concerns data extracted from studies presented as congress abstracts or posters. Because of the word restrictions placed on conference proceedings, these may be a less robust data source than full publications. Furthermore, both screening and data extraction were conducted primarily by a single reviewer, with only 50% of studies checked by a second reviewer. The exclusion of studies that used mapping to derive EQ-5D and utility values is a further limitation; however, sufficient directly measured data were identified to be informative.

Conclusions

This systematic review begins to address the challenge of identifying reliable estimates of utility values in mNSCLC that are suitable for use in economic evaluations. Our work has also highlighted that these estimates are vulnerable to variations in study type, tariff, health state and the measures used, and that shortcomings in reporting are common.