Key Points for Decision Makers

A comprehensive review of all completed National Institute for Health and Care Excellence (NICE) single technology appraisals (STAs) and their supporting documents identified 12 STAs with utility scores derived from 14 published utility sources and 3 redacted unpublished sources.

STA guidance published before 2016 either did not comply, or complied poorly, with NICE-preferred methods for measuring quality-of-life (utility scores), which reduces the consistency of decision making both within and between disease areas.

There appears to be a pattern of improved compliance, beginning in 2016. When NICE periodically updates guidance to reflect new evidence and treatments, it is important that quality-of-life data are concurrently updated to conform to NICE's preferred methods.

1 Background

The role of the National Institute for Health and Care Excellence (NICE) is to improve outcomes for people using the English National Health Service (NHS) and other public health and social care services. It does this through several programmes, one of which is explored in this study: producing evidence-based guidance and advice for health, public health and social care practitioners through the single technology appraisal (STA) process. STAs assess the clinical and cost effectiveness of health technologies to ensure that all NHS patients have equitable access to the most clinically effective and cost-effective treatments available. STAs are designed to appraise a single health technology for a single indication [1].

The STA process consists of several stages designed to create rigorous, transparent decision making with ample stakeholder input into decisions. The primary evidence sources are company evidence submissions and work conducted by an independent Evidence Review Group (ERG) that assesses the validity and quality of the company submissions (CSs). The ERG requests and incorporates clarifications from the company, and conducts additional analyses to address uncertainty and areas where the CS is lacking.

Cost-utility analyses (CUAs), a type of cost-effectiveness analysis in which benefit is measured in quality-adjusted life-years (QALYs), are the primary type of health economic analysis in NICE technology appraisals. QALYs combine length of life with health-related quality-of-life (HRQoL), which in CUAs is generally measured through preference-based utility scores and utility decrements (collectively, utility values).
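To make the QALY arithmetic concrete, the sketch below weights the time spent in each health state by its utility score and subtracts a utility decrement for the duration of an adverse event. It is a minimal illustration under stated assumptions: the function name `qalys` and all utilities, durations and decrements are hypothetical, and discounting is ignored.

```python
def qalys(states, ae_decrements=()):
    """Undiscounted QALYs (illustrative sketch).

    states: (utility, years) pairs, one per health state occupied.
    ae_decrements: (decrement, years) pairs; each decrement is subtracted
    from utility for as long as the adverse event lasts.
    """
    gained = sum(utility * years for utility, years in states)
    lost = sum(decrement * years for decrement, years in ae_decrements)
    return gained - lost


# Hypothetical profile: 2 years at utility 0.80, then 1 year at 0.50,
# with a 0.15 decrement for an adverse event lasting 0.1 years.
total = qalys([(0.80, 2.0), (0.50, 1.0)], ae_decrements=[(0.15, 0.1)])
print(f"{total:.3f}")  # 2.085
```

The arithmetic is identical whichever utility source is used; only the numbers fed in change, which is why the method used to measure those numbers matters for comparability.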

In 2004, NICE established a reference case defining the preferred methods for economic submissions. Its guidance on measuring utility scores was as follows: changes in health should be reported by patients or carers using a standardised and validated generic instrument, and health states should be valued by the general public using a choice-based method (time trade-off or standard gamble), with the EuroQol 5-Dimensions (EQ-5D) as the preferred instrument [2]. Alternative methods required justification. The 2008 methods guidance recommended considering mapping from other instruments to EQ-5D when EQ-5D data were unavailable (mapping converts scores from another quality-of-life measure into EQ-5D utility values), and removed standard gamble from the preferred valuation methods [3]. The latest guidance (2013) provides a checklist that explicitly asks whether EQ-5D was used [4].

The logic behind recommending one validated instrument is that different methods of measurement produce different results [2,3,4]. A systematic review in breast cancer utilities by Peasgood et al. found that measurement method substantially influenced utility score, as did whose preferences were measured and whose values were ascribed to those preferences [5]. This measurement inconsistency can lead to inconsistent decision making, with potential adverse consequences for maximising population health.

2 Objective

This study assesses the consistency of utility measurement methods used in NICE STAs, using breast cancer as a case study. Breast cancer consistently has one of the highest incidences of any cancer in England, accounting for 15.4% of new cancer registrations in 2015 [6], so we expected it to yield a reasonably large sample of published STAs.

3 Methods

3.1 Evidence Selection

All published guidance produced by the NICE STA and multiple technology appraisal (MTA) programmes is available on the NICE website [7]. Title searches of this guidance were conducted to identify all breast cancer STA guidance published between January 2006 and April 2017. The primary evidence documents for an STA are the CS and the ERG report. These documents, as well as the following additional documents, were obtained and checked for utility data: (1) company responses to requests for clarification from the ERG and NICE; (2) errata and addenda; and (3) documents produced by the NICE committee for public consultation and guidance production. Only STAs with published final funding decisions were included. Where any of the above documents referenced external sources, these publications were obtained and reviewed for utility data and additional sources.

3.2 Data Extraction

Data were extracted using a form developed in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA). The extraction form was piloted on two STAs and further refined to ensure the capture of all relevant data. Data were extracted by one researcher (MJR) and checked by a second researcher (SJCR).

A streamlined process was used to extract utility data. First, data were extracted from the CS, and, second, the ERG report was examined. Utility scores and utility decrements were identified in the ERG report, but only extracted where they differed from the CS. NICE appraisal documents other than CSs and ERG reports were checked in the same way but none contained new utility data. Lastly, studies identified by STAs as utility value sources were extracted.

Data on study and population characteristics, such as year of publication, patient age, menopausal status, breast cancer stage, receptor status, treatments compared, and previous treatments, were extracted. In addition, utility data were extracted for each health state and each adverse event. Health states were commonly represented by utility scores, while adverse events and complications were commonly represented by utility decrements. Collectively, we define these as utility values. Additional data were extracted on measurement methods, additional utility data references, and mapping techniques.

3.3 Synthesis and Analysis

The STAs were first divided into early breast cancer (EBC) and metastatic breast cancer (MBC). The sources and methods for the utility values were then compared with the relevant NICE reference case, which was defined by the year of STA submission. Next, utility scores identified in STAs were analysed for value consistency with their source. Additionally, an analysis of clarification requests by the ERG and NICE was undertaken. Feasibility of mapping was assessed by checking the availability of mapping algorithms using the Health Economics Research Centre (HERC) database of mapping studies [8]. Analyses included assessment of alternative utility values identified by ERGs and used in their exploratory analyses, where these differed from the CS.
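Matching each STA to its applicable reference case can be written as a date lookup. The sketch below keys on the date an appraisal began, consistent with how TA 371 is handled in the results. The effective dates assigned to the 2004, 2008 and 2013 methods guides are approximate assumptions (the review states only the years) and should be checked against NICE's records before reuse; `applicable_reference_case` is a hypothetical helper, not part of any NICE tool.

```python
from datetime import date

# ASSUMPTION: approximate dates from which each NICE methods guide applied.
# Only the years (2004, 2008, 2013) are stated in the review.
REFERENCE_CASE_START = {
    2004: date(2004, 4, 1),
    2008: date(2008, 6, 1),
    2013: date(2013, 4, 1),
}


def applicable_reference_case(appraisal_start: date) -> int:
    """Return the year label of the most recent reference case in force
    when the appraisal began (hypothetical helper)."""
    in_force = [version for version, start in REFERENCE_CASE_START.items()
                if start <= appraisal_start]
    if not in_force:
        raise ValueError("appraisal began before the 2004 reference case")
    return max(in_force)


# An appraisal that began early in 2013, before the 2013 guide took effect,
# is assessed against the 2008 reference case.
print(applicable_reference_case(date(2013, 1, 15)))  # 2008
```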

4 Results

Twelve breast cancer STAs were identified, four evaluating EBC and eight evaluating MBC. The following sections analyse STA consistency with preferred methods in the NICE reference case, and then consistency with source data and methods.

4.1 Early Breast Cancer Technology Appraisals

There were four STAs relating to EBC: TA 107, TA 108, TA 109, and TA 424 [9,10,11,12]. EBC STAs and their utility score sources had broadly similar populations, with some differences in human epidermal growth factor receptor 2 (HER2), oestrogen receptor and axillary lymph node status, which may affect prognosis but were considered unlikely to affect quality-of-life measurement [13]. No studies in the meta-analysis by Peasgood et al. assessed quality-of-life by these factors [14].

Three EBC STAs were assessed against the 2004 methodological guidance, and one was assessed against the 2013 methodological guidance [2, 4]. Across the four EBC STAs, there were 39 utility values (utility scores and utility decrements). Three unpublished sources based on company trials and ten published sources were used for utility values [15,16,17,18,19,20,21,22,23,24]. Of these utility values, 21% (n = 8) were redacted or not reported in the submission document, including all health-state utility scores in TA 107 (only utility decrements from Hillner and Smith were reported [16]) and the health-state utility scores in TA 109 for remission (disease-free survival [DFS]) and first locoregional recurrence. Of the redacted or unreported scores, six had no methods reported, and two, both redacted, were derived by mapping to EQ-5D from European Organisation for Research and Treatment of Cancer (EORTC) Quality-of-Life Questionnaire-Core 30 (QLQ-C30) data in the company's primary effectiveness trial (the Breast Cancer International Research Group 001 Trial [BCIRG001]) [11, 25]. In summary, 33 utility scores in EBC had methods reported. Table 1 shows the consistency of unredacted utility value sources with the relevant NICE reference case for each utility source used in EBC STAs. The ERG on TA 424 listed Essers et al. as the source of the alternative utility values in its analyses; however, it did not identify the four original source papers from which Essers et al. derived these values, namely Tengs and Wallace, Carter et al., Hayman et al. and Van Hanswijck de Jonge et al. [20, 21, 23, 24, 26]. Because the ERG did not cite these papers, its utility scores do not appear to have been taken directly from them. We were unable to confirm the utilities from Van Hanswijck de Jonge et al. as the poster presentation was not available to us; therefore, these values are not included in Table 1 or in the following totals. Values derived from the study by Hayman et al. were identical to those derived from the case report by Carter et al., and were therefore also not included [20, 21].

Table 1 Consistency of EBC utility sources with the NICE reference case

All percentages in this section are based on the 33 utility scores with clear methods reporting. Among the transparently reported utility scores, 36% (n = 12) of utility values were derived from patients and 39% (n = 13) were valued by the general public. The most consistent application of the reference case in EBC was the use of standard gamble or time trade-off methods, with 45% (n = 15) of utility scores using appropriate methods. Only 21% (n = 7) of utility scores used EQ-5D values. EBC utilities were often inconsistent with NICE reference case preferences.
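The percentages in this section follow from simple counts. The sketch below reproduces them from the figures reported in the text, including how the 33-score denominator follows from the 39 EBC utility values: the 6 values with no reported methods are excluded, while the 2 redacted values known to have been mapped from QLQ-C30 are retained. Rounding to whole percentages is assumed.

```python
# Counts as reported in the text for the four EBC STAs.
total_values = 39             # utility scores and decrements
redacted_or_unreported = 8
no_methods_reported = 6       # subset of the 8; the other 2 were mapped from QLQ-C30
with_methods = total_values - no_methods_reported

counts = {
    "derived from patients": 12,
    "valued by the general public": 13,
    "standard gamble or time trade-off": 15,
    "EQ-5D": 7,
}

# Share of the scores with reported methods, as whole percentages.
shares = {label: round(100 * n / with_methods) for label, n in counts.items()}
print(with_methods, round(100 * redacted_or_unreported / total_values))  # 33 21
print(shares)
```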

Two ERG groups evaluated the four CSs of EBC treatments. Only two of the four EBC STAs included clarification questions submitted to the company, as the NICE STA process was still in its infancy in 2006. The clarification questions submitted to the company in TA 107 did not contain any questions about the derivation of utility scores, and TA 424 primarily used methodologically appropriate scores, so no questions were raised about NICE-preferred methods [27, 28]. While each ERG group commented on the methods used to derive utility scores in its reports, none indicated a NICE preference for EQ-5D data [29,30,31,32]. In fact, as Table 1 shows, one ERG group that examined company utility values derived from EQ-5D explicitly chose to test alternative utility values that consistently did not use NICE-preferred methods or comply with the 2013 NICE reference case [4, 32].

Utility scores used in EBC STAs were generally consistent with their cited sources, but those sources rarely used NICE-preferred methods. Among studies assessed against the 2004 NICE reference case, Hillner and Smith used a rating scale [16], Sorensen et al. used a chained standard gamble method [33, 34], Brown and Hutton used the standard gamble method [15], Ossa et al. used a bespoke time trade-off method based on the EQ-5D [18], and the redacted scores in submissions used undisclosed scales or EQ-5D. One STA, TA 424, was assessed against the 2013 NICE reference case. Across the CS and the ERG report, five sources of utility scores were reported: Lidgren et al., Lloyd et al., Tengs and Wallace, Carter et al., and Van Hanswijck de Jonge et al. [19, 20, 22,23,24]. The company used five utility scores from Lidgren et al. and one from Lloyd et al. All five scores from Lidgren et al. were derived by mapping EORTC QLQ-C30 to EQ-5D, while Lloyd et al. used standard gamble with regression to derive scores.

The ERG used alternative utility scores from Essers et al. [26], which were derived from three studies not cited by the ERG [20, 23, 24]. Tengs and Wallace compiled a large set of utility scores from a variety of studies using a variety of methods; however, none of those cited by Essers et al. complied with the 2013 NICE guidance [4, 23, 26]. Carter et al. reported a case study of a 74-year-old woman whose quality-of-life was assessed by standard gamble elicitation from healthcare workers [20]. The utility value reported by Essers et al. does not exactly match the value reported by Carter et al. [20, 26]. The third study was a poster that we could not obtain [24]. Mixing values from different instruments can produce inconsistent results, because alternative methods of utility measurement are well known to produce different results [2].

4.2 Metastatic Breast Cancer Technology Appraisals

There were eight technology appraisals of MBC: TA 116, TA 214, TA 239, TA 250, TA 263, TA 295, TA 371, and TA 423 [35,36,37,38,39,40,41, 46]. Some appraisals covered locally advanced breast cancer (LABC) or MBC. As with human epidermal growth factor receptor 2 (HER2) and oestrogen receptor status in EBC, the same utility scores were used for LABC as for MBC. The populations of the STAs were broadly similar, and most technology appraisals used utility scores derived from one study, Lloyd et al. [19]. No utility values were redacted in MBC STAs.

All STAs were assessed against the NICE reference case relevant to the year of their production. Only TA 116 was assessed against the 2004 reference case [40], and six STAs were assessed against the 2008 reference case: TA 214, TA 239, TA 250, TA 263, TA 295 and TA 371 [35,36,37,38,39, 46]. Although TA 371 was published after the 2013 NICE reference case, the appraisal began before the 2013 methods guidance was published, so it was assessed against the 2008 reference case. One MBC STA, TA 423, was assessed against the 2013 NICE reference case [41]. In total, 135 utility values were reported.

Six sources were used among the eight MBC STAs. Evaluating the six sources for consistency with the NICE reference case is more parsimonious than evaluating each of the 135 utility values. Table 2 presents the six MBC utility sources and assesses them against the NICE reference case. Five of six sources used regression methods to derive utility scores [5, 19, 41,42,43]. Lloyd et al. built a regression based on chained standard gamble results from the general population [19], while Cooper et al. [42] used a meta-regression of utility scores for MBC, with most values derived from three sources, all of which used the standard gamble approach, with oncology nurses and oncologists providing responses and valuations [15, 17, 44]. Peasgood et al. conducted a systematic review with separate meta-regressions of utility scores in EBC and MBC, derived using a variety of measurement methods and a variety of different groups of individuals valuing health states; the meta-regressions included many covariates, including valuation methods [5]. TA 423 used the mapping algorithm developed by Crott and Briggs to map EORTC QLQ-C30 trial data to EQ-5D utility scores for both health states and utility decrements due to adverse events [41, 43, 45].
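Regression-based utility studies of this kind typically report a value for each health state plus decrements for adverse events, which an economic model then combines additively. The sketch below illustrates only that structure: every coefficient is a hypothetical placeholder and is not taken from Lloyd et al. or any other study cited here. Whether the resulting values comply with the reference case depends on how the underlying coefficients were measured and valued, not on the arithmetic.

```python
# HYPOTHETICAL coefficients for illustration only (not published estimates).
HEALTH_STATE_UTILITY = {"responding": 0.78, "stable": 0.70, "progressed": 0.45}
AE_DECREMENT = {"neutropenia": 0.15, "diarrhoea": 0.10, "fatigue": 0.12}


def state_utility(state, adverse_events=()):
    """Health-state utility minus the decrement for each concurrent adverse event."""
    return HEALTH_STATE_UTILITY[state] - sum(AE_DECREMENT[ae] for ae in adverse_events)


print(round(state_utility("stable", ["fatigue"]), 2))                       # 0.58
print(round(state_utility("progressed", ["neutropenia", "diarrhoea"]), 2))  # 0.2
```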

Table 2 Comparison of MBC utility methods with NICE reference case

As Table 2 shows, most utility scores used in STAs of MBC did not use the preferred methods of utility measurement specified in the relevant NICE methods guidance. From 2006 onwards, seven of eight CSs for MBC STAs used Lloyd et al., a source that did not use NICE-preferred methods [35,36,37,38,39,40,41, 46]. All eight ERG reports for MBC STAs used utility values derived from Lloyd et al. [32, 47,48,49,50,51,52,53]. As a consequence, many of these utility values depart from the NICE-preferred methods used in STAs in other disease areas, which makes breast cancer utility values difficult to compare with those elsewhere: there is no common currency of comparison and no means of conversion [19].

Peasgood et al. found that measurement methodology (who measured, who valued, what instrument) was often highly influential on utility values [5]. While the work of Peasgood et al. was used for sensitivity analysis in TA 214, no explanation was given as to which regression variables were utilised and how the utility scores were generated [5]. As such, no STA conducted on breast cancer has produced cost-effectiveness results that can be compared like-for-like with other disease areas.

The eight MBC STAs involved three ERG groups and five companies [35,36,37,38,39,40,41, 46]. Only one ERG group explicitly stated NICE's preference for EQ-5D data and queried whether mapping algorithms to EQ-5D had been considered [50]. The company replied that no search had been conducted, but that an ongoing trial was collecting both EQ-5D and FACT-B data with the aim of producing such an algorithm, with data expected within 18–24 months of that 2010 submission [54]. The company has since had three further pieces of NICE STA guidance published using the study by Lloyd et al. for utilities, with the latest submission produced in 2013 [19, 36, 46, 55].

In 2008, NICE officially advocated mapping methods to convert other quality-of-life scales to EQ-5D [3]. Of the seven breast cancer STAs produced after the 2008 NICE methods guidance, only the last used mapping to produce EQ-5D values [35,36,37,38,39,41, 46]. Each of the six STAs that did not use mapping relied on trial evidence that included quality-of-life data that could have been mapped to EQ-5D at the time of submission to NICE.
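In its simplest form, mapping is a regression fitted on a dataset in which both instruments were collected, which is then applied to trial data that contain only the non-EQ-5D instrument. The sketch below shows that second step for a linear model. The domain names and coefficients are hypothetical placeholders, not any published algorithm (in particular, not that of Crott and Briggs); published mapping studies also use non-linear and response-level models, which this sketch does not cover.

```python
# HYPOTHETICAL linear mapping from 0-100 domain scores to an EQ-5D utility.
HYPOTHETICAL_COEFS = {
    "intercept": 0.30,
    "physical_functioning": 0.004,
    "emotional_functioning": 0.002,
    "pain": -0.002,          # higher symptom score means worse pain
}


def map_to_eq5d(domain_scores, coefs=HYPOTHETICAL_COEFS, upper=1.0):
    """Predict an EQ-5D utility from domain scores, capped at full health.

    Domains missing from coefs raise KeyError rather than being ignored.
    """
    predicted = coefs["intercept"] + sum(
        coefs[domain] * score for domain, score in domain_scores.items())
    return min(predicted, upper)


patient = {"physical_functioning": 80, "emotional_functioning": 70, "pain": 30}
print(round(map_to_eq5d(patient), 2))  # 0.7
```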

Four clinical effectiveness studies used in STAs collected quality-of-life data with the FACT-B disease-specific instrument. The E2100 trial provided effectiveness evidence for TA 214 (submitted May 2010) and TA 263 (submitted December 2011) [36, 54, 56], the CONFIRM trial for TA 239 (submitted November 2011) [38, 57], Study 201 for the CS in TA 250 (submitted March 2011) [37, 58], and the EMILIA trial for TA 371 (submitted December 2013) [46, 59, 60]. FACT-B comprises the FACT-G cancer questionnaire plus additional breast cancer-specific questions [61]. Cheung et al. and Teckle et al. mapped FACT-G to EQ-5D in a breast cancer population [62, 63]. During TA 214, the company that also submitted TA 263 indicated that an ongoing trial collecting FACT-B and EQ-5D data would produce a mapping algorithm [54]; an algorithm built on individual patient data from that trial could allow sophisticated adjustment for adverse events and disease states. The latest version of the HERC database of mapping studies contains no published algorithm based on that trial [8].

The Bolero-2 trial provided effectiveness evidence for TA 295 (submitted February 2013); Bolero-2 used the EORTC QLQ-C30 and the breast cancer-specific questionnaire (QLQ-BR23) [2, 35, 64]. EORTC QLQ-C30 has been mapped to EQ-5D by Kind [25], Crott and Briggs (breast cancer population) [45], and Kim et al. (MBC population) [65]; Kim et al. also included the QLQ-BR23 in their mapping [65]. Mapping was therefore possible when TA 295 was submitted in 2013, yet the company provided no justification for not using mapping methods.
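The instrument-level reasoning in the two preceding paragraphs can be checked mechanically: for each trial, look up the quality-of-life instruments it collected (including the FACT-G core embedded in FACT-B) and list the mapping studies to EQ-5D named in this review. The sketch is restricted to the trials, instruments and mapping studies stated in the text and is not an exhaustive search. It does not check publication dates, so on its own it cannot confirm that a mapping was available on the date of each submission; a real check would query the HERC database rather than a hard-coded table.

```python
# Instrument and mapping data as stated in the text (not an exhaustive search).
TRIAL_INSTRUMENTS = {
    "E2100": ["FACT-B"],
    "CONFIRM": ["FACT-B"],
    "Study 201": ["FACT-B"],
    "EMILIA": ["FACT-B"],
    "Bolero-2": ["EORTC QLQ-C30", "QLQ-BR23"],
}
EMBEDDED_CORE = {"FACT-B": ["FACT-G"]}   # FACT-B = FACT-G + breast-specific items
MAPPINGS_TO_EQ5D = {
    "FACT-G": ["Cheung et al.", "Teckle et al."],
    "EORTC QLQ-C30": ["Kind", "Crott and Briggs", "Kim et al."],
}


def mapping_options(trial):
    """Mapping studies to EQ-5D usable with any instrument the trial collected."""
    options = []
    for instrument in TRIAL_INSTRUMENTS[trial]:
        for source in [instrument, *EMBEDDED_CORE.get(instrument, [])]:
            options.extend(MAPPINGS_TO_EQ5D.get(source, []))
    return options


for trial in TRIAL_INSTRUMENTS:
    print(trial, mapping_options(trial))
```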

Table 3 provides a checklist for assessing consistency with the NICE reference case with regard to utility measurement. The checklist is designed to force technology appraisal participants (whether the company or appraisal group) to fully examine consistency with the NICE reference case, help formulate clarification requests and requests for additional analyses to be sent to the company, and encourage ERGs to ensure that they are knowledgeable about available mapping algorithms for the disease area being studied in a technology appraisal.

Table 3 Reference case assessment checklist for technology appraisal submissions
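Table 3 itself is not reproduced here. As a rough illustration of how a checklist of this kind can be applied to each utility source, the sketch below encodes the reference-case elements described in the Background (report by patients or carers, valuation by the general public, choice-based valuation, and EQ-5D or mapping to EQ-5D) as explicit checks. The `UtilitySource` fields and the `reference_case_gaps` function are hypothetical and do not reproduce the items of Table 3.

```python
from dataclasses import dataclass


@dataclass
class UtilitySource:
    name: str
    reported_by_patients_or_carers: bool
    valued_by_general_public: bool
    valuation_method: str          # e.g. "TTO", "SG", "rating scale"
    instrument: str                # e.g. "EQ-5D", "EORTC QLQ-C30", "vignettes"
    mapped_to_eq5d: bool = False


def reference_case_gaps(src, accepted_methods=frozenset({"TTO", "SG"})):
    """List departures from the reference case (hypothetical checklist).

    accepted_methods defaults to the 2004 choice-based methods; because the
    2008 guidance removed standard gamble, pass {"TTO"} for later appraisals.
    """
    gaps = []
    if not src.reported_by_patients_or_carers:
        gaps.append("HRQoL not reported by patients or carers")
    if not src.valued_by_general_public:
        gaps.append("health states not valued by the general public")
    if src.valuation_method not in accepted_methods:
        gaps.append(f"valuation method {src.valuation_method!r} not accepted")
    if src.instrument != "EQ-5D" and not src.mapped_to_eq5d:
        gaps.append("EQ-5D not used and no mapping: request mapping or justification")
    return gaps


# Hypothetical examples, not sources from the review.
vignette_sg = UtilitySource("vignette study", False, True, "SG", "vignettes")
trial_eq5d = UtilitySource("trial EQ-5D", True, True, "TTO", "EQ-5D")
print(len(reference_case_gaps(vignette_sg)))                            # 2
print(len(reference_case_gaps(vignette_sg, accepted_methods={"TTO"})))  # 3
print(reference_case_gaps(trial_eq5d))                                  # []
```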

5 Discussion

The results of this review show that NICE methodological guidance has not led to consistent application of NICE-preferred methods of utility measurement in breast cancer STAs. While mapping algorithms were often available, they were rarely assessed, recommended, or used. In one instance, a company stated that its trial data would allow mapping [54], and subsequently made five submissions that did not use a mapping algorithm generated from that trial [12, 36, 46, 55, 66]. In addition to the prevalence of non-reference-case methods across breast cancer STAs, clinical guideline (CG) 81 and an HTA of lapatinib plus trastuzumab with an aromatase inhibitor also used utility values that were neither valued using EQ-5D nor mapped to EQ-5D [47, 67]. This guidance was produced with direct NICE oversight or by a NICE-contracted technology appraisal group.

The findings of this review relate to breast cancer STAs conducted by NICE for the English NHS. It is unclear whether the results are generalisable to other disease areas. We have not drawn, nor can we draw, any conclusions about methodological consistency in utility score measurement in NICE technology appraisals in general.

The review did not formally assess other forms of NICE guidance, such as MTAs, diagnostic guidance (DG), or CGs. For completeness, we conducted rapid informal reviews of the utility scores used in other breast cancer guidance produced by and for NICE. We identified two NICE CGs related to breast cancer: CG 80 and CG 81 [67, 68]. CG 80 did not use QALYs or utility scores [68], whereas CG 81 used utility scores derived from Cooper and colleagues and was thus inconsistent with NICE-preferred methods, as stated earlier [42, 67]. Four additional technology appraisals were identified: TA 34, through the NICE website, and TA 30, TA 54 and TA 62, through CG 81, which replaced them. These three appraisals are not available from the NICE website, and although TA 34 was available, it did not use QALYs or utility scores [69]. Two DGs were identified: DG 8 and DG 10. DG 8 used health-state utility scores that were inconsistent with preferred methods, but included utility decrements that were consistent with NICE methodology; the majority of utility values used in DG 8 were inconsistent with NICE methods [70]. The utility values in DG 10 used NICE-preferred methods [71]. Limited data were available for TAs produced before 2002, but other published breast cancer guidance makes clear that inconsistency with methodological guidance has historically been a problem in these evidence streams, as it has been in STAs.

The findings of this study are similar to previous work evaluating utility value measurement in STAs. Tosh et al. [72] found that only 56% of values used in technology appraisals were compliant with the 2004 NICE reference case [2], and only 49% used EQ-5D [72]. Technology appraisal participants have shown a lack of awareness of the full specification of NICE’s methodological preferences for utility measurement, and have not emphasised mapping to EQ-5D. This can be remedied in future technology appraisals by working from the full reference case guidance and using the framework laid out in Table 3 for assessing utilities.

The study by Lloyd et al. was the most frequently used utility source across the 12 published STAs, used in some form by all eight MBC STAs and one EBC STA [19]. It was conducted by United BioSource Corporation, a consultancy company, and Eli Lilly, a pharmaceutical company and one of the companies that made a submission to NICE [19, 40]. In the study by Lloyd et al., EQ-5D was collected at baseline in a nonclinical valuation population but not in a clinical study of the disease population [19]. One ERG stated that Lloyd et al. might have been the best available utility data; however, this ERG did not consider whether a mapping algorithm could have been used to produce EQ-5D utilities. Using utility scores derived by mapping as the base case, and the Lloyd et al. utilities as a sensitivity analysis, would have been in line with NICE guidance, would have allowed transparent discussion of any limitations of mapping algorithms, and would have allowed decision makers to decide whether they prefer consistency within breast cancer STAs or consistency with methodological guidance [3, 19].
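The base-case-plus-sensitivity approach described above amounts to running the same model with two utility sets and comparing the resulting incremental cost-effectiveness ratios (ICERs). The sketch below shows that comparison for a hypothetical two-state model; all costs, times in state and utilities are invented, discounting is ignored, and neither utility set corresponds to a real source.

```python
def total_qalys(years_in_state, utilities):
    """Undiscounted QALYs: time in each state weighted by that state's utility."""
    return sum(years * utilities[state] for state, years in years_in_state.items())


def icer(cost_new, cost_old, qalys_new, qalys_old):
    """Incremental cost per QALY gained."""
    delta_qalys = qalys_new - qalys_old
    if delta_qalys == 0:
        raise ZeroDivisionError("no QALY difference between arms")
    return (cost_new - cost_old) / delta_qalys


# Hypothetical model outputs (years spent in each state) and costs.
new_arm = {"progression-free": 1.0, "progressed": 1.0}
old_arm = {"progression-free": 0.6, "progressed": 1.2}
cost_new, cost_old = 40_000, 20_000

# Hypothetical utility sets: a mapped EQ-5D base case and an alternative source.
utility_sets = {
    "base case": {"progression-free": 0.75, "progressed": 0.55},
    "sensitivity analysis": {"progression-free": 0.72, "progressed": 0.44},
}
for label, utilities in utility_sets.items():
    ratio = icer(cost_new, cost_old,
                 total_qalys(new_arm, utilities), total_qalys(old_arm, utilities))
    print(f"{label}: {ratio:,.0f} per QALY")
```

Reporting both ICERs side by side lets decision makers see directly how much the conclusion depends on the choice of utility source.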

There are many mapping studies, but support for mapping methods is not universal. Doble and Lorgelly assessed the external validity of mapping algorithms from EORTC QLQ-C30 to EQ-5D and found that existing algorithms generally performed poorly, with seven of ten algorithms showing statistically significant differences between observed and predicted QALYs [73]. The best-performing algorithm was the most complicated and computationally intensive, which limits its application [73, 74]. Pennington and Davis found that the choice of mapping algorithm could significantly affect cost-effectiveness results [75]. Introducing mapping algorithms may add complexity to the technology appraisal process without producing better estimates of EQ-5D values. The impending transition from EQ-5D-3L to EQ-5D-5L, now that a UK scoring algorithm for EQ-5D-5L is available, further complicates the debate over the appropriateness of mapping, as the greater sensitivity of EQ-5D-5L may make it more amenable to mapping [76, 77].
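External validation of a mapping algorithm compares observed values with those the algorithm predicts for the same patients. The sketch below shows one simple version of such a check: the mean difference, the mean absolute error and a paired t-statistic on per-patient QALYs. The data and the function name are hypothetical, and the statistics used by Doble and Lorgelly may differ from those computed here; a p-value would additionally require the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def validation_summary(observed, predicted):
    """Compare observed and mapping-predicted values for the same patients."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [p - o for o, p in zip(observed, predicted)]
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return {
        "mean_difference": mean_diff,
        "mean_absolute_error": statistics.mean(abs(d) for d in diffs),
        "paired_t": mean_diff / standard_error,
    }


# Hypothetical observed and predicted QALYs for five patients.
observed = [0.80, 0.65, 0.70, 0.55, 0.90]
predicted = [0.75, 0.66, 0.62, 0.58, 0.81]
summary = validation_summary(observed, predicted)
print({k: round(v, 3) for k, v in summary.items()})
```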

A search for STAs currently in development identified eight STAs [78]; four were suspended, withdrawn, or discontinued before the company made any evidence submission. In addition, one STA was suspended after evidence submission by the company and analysis by the ERG; this STA used utility scores from the study by Lloyd et al., with no objection from the ERG and no request for additional analyses using NICE-preferred methods [79]. NICE currently has four pieces of guidance actively in development for MBC [78], two of which do not have publicly available submissions from the companies [80, 81]. One STA began in 2013, and both its CS and its ERG report used the study by Lloyd et al. for utility data [19, 46, 82]. The most recent STA with an available CS (provided in September 2016), clarification questions, and an ERG report conforms well to NICE-preferred methods [83, 84].

Why EQ-5D values were so infrequently used in CSs remains an open question. An in-progress systematic review, currently published only as a conference abstract, identified 17 studies that provided EQ-5D utility scores for breast cancer [85]. Only three published breast cancer STAs used EQ-5D values. Two EBC STAs used EQ-5D: one used mapping (TA 109, scores redacted) and one used directly measured EQ-5D scores (TA 424). The clinical evidence sources used in the companies' submissions for MBC all had data that could have been mapped to EQ-5D at the time of submission. Only one MBC STA used mapping from EORTC QLQ-C30 to derive appropriate EQ-5D utility scores, and it received criticism for using these appropriate methods [41, 48]. We believe there are three potential explanations for why alternative utility values have not been used: (1) alternative values have been identified and are unfavourable to companies; (2) there is a lack of belief in mapping methodology; and/or (3) the values from the study by Lloyd et al. are viewed as accepted, and further work as unnecessary or inappropriate, in spite of the dataset's inconsistency with NICE-preferred methods. None of these potential explanations helps foster informed, consistent decision making across disease areas.

The last two published breast cancer guidance documents (TA 423 and TA 424) both used EQ-5D utility scores, with one STA using mapping to derive these values [12, 41]. The most recent breast cancer STA guidance in development also used EQ-5D [83]. It appears probable that NICE, ERGs and companies are improving compliance with NICE's methods; however, STA guidance issued before this improvement is still binding and still has the potential to cause inequitable decisions. NICE should ensure that, when this guidance is updated, its utility methods are also brought into line with NICE-preferred methods.

While some ERGs have criticised utility score usage in STAs, the primary focus in breast cancer STAs has been on survival rather than utilities, which is not inappropriate. Time is limited in STAs. Only 8 weeks are available to evaluate a submission that can run to several hundred pages, formulate clarification questions to the company, incorporate new data and analyses from the company's clarification response into the ERG analysis, check an executable model, and produce a report usually spanning more than 100 pages [86]. It is important for ERGs to focus on the factors most likely to affect cost-effectiveness conclusions, and survival data frequently have the largest impact on cancer model outcomes.

However, ERGs have frequently failed to comment on whether utility scores in breast cancer STAs were consistent with NICE-preferred methods. In fact, in TA 423, the ERG advised using noncompliant data from the study by Lloyd et al. [19], and, in TA 424, the ERG used utility scores from sources that did not comply with the NICE reference case for its alternative utility value analyses [32, 48]. We believe that using the checklist in Table 3 may help prevent such departures from NICE preferences.

In the current methodological guidance (2013), the section on valuing health effects for the reference cases has 12 subsections, with the use of EQ-5D being reiterated in nine of these [4]. The emphasis in the guidance is clearly on consistency across different disease areas and between NICE guidance. Unfortunately, the written emphasis has not, until very recently, been followed by consistent actions.

In breast cancer STAs, NICE and ERG groups have often not requested, and have frequently not conducted, supplemental analyses using EQ-5D utilities. The number of EQ-5D mapping algorithms for breast cancer populations has increased, as the HERC database of mapping studies shows [8]. This database facilitates the use of mapping methods to transform clinical effectiveness data into EQ-5D utility scores in current and future breast cancer STAs. NICE needs to clarify whether mapping is its preferred second option. If it is, NICE should stop accepting utility values derived from noncompliant measurement methods in breast cancer STAs (and more broadly) and ensure that its preference for EQ-5D utility scores is adhered to; otherwise, NICE's repeated emphasis on its preference for EQ-5D has little meaning, and the risk of inconsistent decision making increases.

6 Conclusion

Historically, breast cancer STAs have shown a broad lack of compliance with the preferred methods for measuring utility scores in the NICE reference case. However, the last two pieces of guidance, published in December 2016, adhere to the NICE reference case, and the latest STA guidance in development also conforms to NICE-preferred methods [12, 41, 83]. It will remain the responsibility of ERG groups and NICE to ensure that future guidance and reviews of guidance maintain this emerging trend. It would be inappropriate to let noncompliant utilities used in past guidance remain the preferred utilities in future guidance. Failing to adopt methodologically appropriate utility scores when guidance is updated would give guidance based on methods inconsistent with the NICE reference case a durable precedent, and that outcome should not be allowed to stand.