Key Points

This systematic literature review identifies and summarizes utility values for heart failure (HF) to support economic evaluations, derived from 161 publications reporting HF utility values from 142 studies.

79.5% of the publications provided utility values on chronic HF, 24% on hospitalization of patients with HF, and 2% on other acute events (some publications provided data on multiple HF states).

EQ-5D was the most common instrument, used in 73% of the studies. Based on EQ-5D values, the interquartile limits (25th and 75th percentiles) of study means for chronic HF were 0.64 and 0.72.

1 Introduction

Heart failure (HF) is the inability of the heart to pump sufficient blood to meet the body’s needs, causing symptoms such as dyspnea, fatigue, and edema. It affects about 64 million patients worldwide and carries a heavy morbidity and mortality burden [1, 2]. Approximately 50% of patients die within 5 years of diagnosis and HF is the leading cause of hospitalization in patients aged over 65 years [2, 3]. Heart failure therefore presents a large public health burden, which is anticipated to grow with an aging global population.

Heart failure may be chronic or acute. Chronic HF is a relatively stable condition. However, periods of stable heart function are punctuated by acute events (also referred to as decompensations). Chronic HF is commonly classified by the New York Heart Association (NYHA) classification, which describes the severity of symptoms and their impact on the patient’s physical activity and daily functioning [4,5,6]. Acute heart failure (AHF) is the rapid onset of new or worsening of symptoms and signs of HF, requiring urgent medical attention [7]. The majority of AHF cases occur in patients with worsening chronic HF, i.e., as a decompensation of existing disease rather than new-onset (‘de novo’) presentation of HF [4].

New treatments and models of care are under exploration for both chronic HF with reduced left ventricular ejection fraction (HFrEF, LVEF < 40%) and chronic HF with preserved left ventricular ejection fraction (HFpEF, LVEF ≥ 50%). As HF has a substantial impact on health-related quality of life (HRQoL) [6], it is important to understand the effect of treatment on HRQoL and consequently assess cost effectiveness (CE) through a cost-utility analysis, where effectiveness is measured in terms of quality-adjusted life-years (QALYs). Health state utility values can be obtained via generic measures, such as the EQ-5D, or condition-specific measures. It is essential that a choice-based valuation technique, for instance a time trade-off, is applied to derive the utility scoring algorithm, also called a value set [8]. Utility values are therefore key components that inform health technology assessment decisions. In methodological reviews of HF cost-effectiveness models, utility is one of the key drivers of incremental cost-effectiveness ratios, and one of the key sources of their heterogeneity, making it an important parameter for consideration [9, 10]. It is therefore essential that the values used in economic evaluations can be justified. The latest reporting standards from the International Society for Pharmacoeconomics and Outcomes Research recommend that values are obtained systematically, are reviewed for quality, and are consistent in the methods used to derive them [11].

To the best of our knowledge, there have been no systematic literature reviews (SLRs) of HF utility values. The aim of this SLR is to identify, summarize, and appraise HF utility values to support and inform economic evaluations.

2 Methods

2.1 Search Strategy

Sources from the National Institute for Health and Care Excellence, Scottish Intercollegiate Guidelines Network, and Canadian Agency for Drugs and Technologies in Health were used to develop the search strategy (Electronic Supplementary Material [ESM] 1), which adhered to a prespecified protocol and methods recommended by the Cochrane Collaboration and the Centre for Reviews and Dissemination. MEDLINE, EMBASE, EconLit, and the Centre for Reviews and Dissemination York database (which included the National Health Service Economic Evaluation Database and Health Technology Assessment Database) were searched for relevant articles published from the beginning of database records until June 2019. Databases were searched for primary studies that published utility values for adult patients (aged ≥ 18 years) with HF, regardless of treatment or intervention. The search strategy allowed for the inclusion of studies conducted in broader patient populations if utility values were reported for an HF sub-population, and of studies that reported HF utility values as valued or perceived by the general population and caregivers. No minimum sample size was set for inclusion.

Conference abstracts (ESM 1) published between 2016 and June 2019 were searched to identify data from the gray literature. In addition, websites for the Canadian Agency for Drugs and Technologies in Health, European Medicines Agency, National Institute for Health and Care Excellence, Scottish Medicines Consortium, US Food and Drug Administration, and School of Health and Related Research Health Utilities Database were reviewed. The search was also supplemented with relevant publications identified in a parallel SLR on CE models and economic evaluations in HF.

Only reports, abstracts, and manuscripts published in English were selected for further review. References cited in retrieved articles were reviewed for additional publications that had not already been identified (citation snowballing).

As only primary studies publishing new utility values were of interest, CE studies, health technology assessment submissions, SLRs, and meta-analyses were excluded from the review, unless they used or published de-novo data; however, references for utility inputs cited by these publications were assessed for inclusion as part of the citation snowballing exercise. The review was registered (Registration Number CRD42019134288) with the International Prospective Register of Systematic Reviews (PROSPERO) and reported according to the following guidelines: Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) and International Society for Pharmacoeconomics and Outcomes Research good practices on identification, review, and use of health state utilities in CE models (SpRUCE) [11, 12].

2.2 Selection, Extraction, and Quality Assessment

Two reviewers independently screened database records and identified relevant studies for review (Fig. 1) in accordance with the pre-defined search strategy. Data were extracted and primary studies with a full publication were assessed for quality, based on relevant criteria from the Papaioannou et al. [13] checklist. Data extraction and quality assessment were performed by one reviewer and quality checked by a second reviewer. Any discrepancies between the two reviewers during selection, extraction, and quality assessment were adjudicated by a third reviewer.

Fig. 1 Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) flow chart displaying the number of publications included as well as the number of publications that were excluded, with reasons. CE cost effectiveness, HF heart failure, SLR systematic literature review

2.3 Data Review

Publications were reviewed with specific attention to the design of the study that generated the utility data, the method of utility elicitation, the value set used to generate utility values, and the utility value summary statistic and data. There are two versions of the EQ-5D instrument: the original three-level (-3L) version and the five-level (-5L) version, which was introduced in 2009. All papers that were published in or prior to 2009 were assumed to use the EQ-5D-3L version, regardless of whether this was specified.

Publications were categorized according to the health state(s) for which utility data were reported: ‘chronic HF’, ‘hospitalized’, and ‘other AHF’. Other AHF captured those publications that presented data on acute events but did not specify restriction to hospitalization events. Categorization was based on the study population for which the utility values were elicited. In instances where the utility value publication did not report sufficient detail around the study population, where feasible, further information on the study was sourced from either clinicaltrials.gov or the study publication. Publications were assigned the category ‘HF’ where there was insufficient information to assign it to a more specific health state.

The interquartile limits (IQLs) of study means (25th percentile ‘Q1’, 75th percentile ‘Q3’) were calculated for HF subgroups where there were sufficient data. For inclusion in the IQL calculation, studies had to have a sample size ≥ 100. Weighted averages were calculated where publications reported utility values for two or more study arms, and baseline data were used where utility values were reported for multiple timepoints. Baseline data were used so as not to confound the analysis with the effect of different therapies, which would add further heterogeneity. Publications that reported utility values for individual NYHA classes were omitted from the calculation if it was not possible to calculate an overall utility value as a weighted average of the classes’ values; because patients are often unequally distributed across the NYHA classes, only a weighted average would avoid bias [14, 15]. Follow-up studies were also omitted. In the occasional cases where publications reported utility values for multiple value sets, values based on the most frequently reported value set for the UK were used to calculate Q1 and Q3.
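The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the review: the `Arm` class, the function names, and the example studies are hypothetical, the sample-size threshold is assumed to apply to the total across arms, and the percentile interpolation rule (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, as the rule used in the review is not stated.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Arm:
    n: int        # patients in the study arm
    mean: float   # baseline mean utility reported for the arm


def study_mean(arms):
    """Sample-size-weighted average of the baseline arm means."""
    total_n = sum(arm.n for arm in arms)
    return sum(arm.n * arm.mean for arm in arms) / total_n


def interquartile_limits(studies, min_n=100):
    """Return (Q1, Q3) of study means for studies with total n >= min_n."""
    means = [
        study_mean(arms)
        for arms in studies
        if sum(arm.n for arm in arms) >= min_n
    ]
    # Assumption: linear interpolation between order statistics.
    q1, _median, q3 = quantiles(means, n=4, method="inclusive")
    return q1, q3


# Hypothetical studies (not taken from the review), one list of arms each.
studies = [
    [Arm(120, 0.60), Arm(80, 0.70)],  # two arms, pooled by sample size
    [Arm(150, 0.72)],
    [Arm(300, 0.68)],
    [Arm(40, 0.90)],                  # excluded: fewer than 100 patients
    [Arm(500, 0.76)],
    [Arm(100, 0.66)],
]

q1, q3 = interquartile_limits(studies)
print(f"IQL of study means: {q1:.2f}-{q3:.2f}")
```

With these hypothetical inputs, the two-arm study pools to a mean of 0.64, the small study is dropped, and the limits come out at 0.66 and 0.72. A different interpolation rule could shift the limits slightly when the number of studies is small.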

3 Results

3.1 Study Identification and Description

The SLR identified 161 primary publications that reported utility value data (including disutilities) for HF, based on elicitation from 142 studies (Fig. 1, PRISMA flow diagram, Table 1, ESM 2). There was considerable heterogeneity in the design of studies; 78 were observational studies, 43 randomized controlled trials, and 13 cost-utility studies. Studies differed in terms of sample size (range 6–28,500) and treatment arms. Furthermore, substantial diversity was seen within study populations. Some studies (28.5%) recruited a broad population of patients with HF (e.g., patients with chronic HF), others (7%) recruited hypothetical patients with HF or the general population, while the majority (64.5%) had much more specific inclusion criteria based on HF type, severity, or comorbidity (e.g., patients in specific NYHA classes and below pre-specified LVEF thresholds). There was also heterogeneity in how utility values were reported (Table 1). The mean was the most reported statistic (n = 87), eight publications reported both mean and median, and five reported the median. Other reported data included variability statistics, coefficients, estimates, weights, and base-case/model inputs; 35 studies did not state the statistic for the reported data.

Table 1 Overview of studies (n = 142) and publications (n = 161) identified in the systematic literature review

Publications were categorized according to the health state for which utility data were reported. Although new-onset cases of ‘de novo’ acute HF do occur, the disease course of HF is typically that of a chronic condition, with hospitalization episodes due to a worsening of the previously stable condition [4]. This is reflected in the literature with 128 publications focusing on chronic HF, 39 on hospitalization, and only 3 on other AHF (Table 1). The health state could not be defined for ten publications. Of note, some publications reported utility values for several HF health states. EQ-5D (3L and 5L) was the most common elicitation instrument, being used in 104 studies. Value sets used also varied between studies. The UK value set was the most frequently reported value set (n = 33); however, the majority of publications (n = 89) did not report the value set used.

Most studies (n = 130) recruited patients with HF, of which 113 elicited data using patient-reported outcomes. One study used proxy report by caregivers (both informal and healthcare providers) [16]; six studies recruited a general population and one study used healthcare providers, and elicited utility values for vignettes describing specific health states [17,18,19,20,21,22,23]; and five cost-utility models defined hypothetical HF populations deriving utility values from the literature [24,25,26,27,28].

Full manuscripts (n = 91) included in the review were assessed for quality according to relevant criteria from the Papaioannou et al. checklist (ESM 3). In general, publications were of good quality in reporting response rates, using population characteristics that matched those modeled (e.g., chronic HF, hospitalization), using generic preference-based instruments, and assessing utility values elicited directly from patients. However, loss to follow-up and missing data were not reported or addressed by many of the papers.

This high level of heterogeneity between publications limits the ability to compare and synthesize studies. Consequently, focus was predominantly placed on those papers that used the EQ-5D instrument and published mean health utility values as the summary statistic.

3.2 Utility Values in Patients with Chronic HF

The SLR identified 52 studies that published mean utility values for patients with chronic HF, using the EQ-5D (-3L or -5L) instrument, of which 35 (Table 2) met the inclusion criteria for the IQL calculation, giving Q1 and Q3 limits of 0.64–0.72. In a subgroup analysis of those publications that used the EQ-5D-3L instrument (n = 22), the IQL did not substantially change (0.64–0.71). Only two publications that met criteria for inclusion in the IQL calculation used the EQ-5D-5L instrument. Squire et al. reported a mean utility value of 0.60 for patients with HF with NYHA II–IV who had been diagnosed for at least 12 months [29]. Zhu et al. reported a mean utility value of 0.73 for a broad population of patients with chronic HF [30]. Eleven of the 35 papers were excluded from the -3L and -5L subgroup analyses because the EQ-5D version could not be determined.

Table 2 Mean EQ-5D (3L or 5L) utility values for patients with chronic heart failure (CHF)

The chronic HF populations of studies included in the IQL calculations varied, ranging, for example, from advanced HF populations (with or waiting for heart transplant) to more general chronic HF populations who were stable for at least 3 months [31, 32]. A review of data by NYHA class clearly illustrates the impact of the severity of chronic HF on utility values. The SLR identified 11 publications that provided data on mean EQ-5D utility values according to NYHA class, of which nine were included in the IQL calculation; one study was omitted from the IQL calculation as it grouped NYHA classes (I/II and III/IV), and a second study was omitted because it published the mean difference in utility values between classes [33, 34]. Aside from Zhu et al. [30], which reported lower utility values for NYHA class I than for NYHA class II, increasing NYHA class was associated with lower utility values (Fig. 2). Interquartile limits were 0.79–0.86 for NYHA class I, 0.75–0.81 for class II, 0.61–0.69 for class III, and 0.51–0.66 for class IV.

Fig. 2 Mean utility scores for chronic heart failure, based on EQ-5D health-related quality-of-life data, according to New York Heart Association (NYHA) class [30, 57,58,59,60,61,62,63,64]. Black circle: Comin-Colet 2013, white circle: Delgardo 2014, black square: Gohler 2009, white square: Grustam 2018, white up-pointing triangle: Kularatna 2017, white up-pointing triangle: Marti 2010, black diamond: Marti 2011, white diamond: Yao 2007, grey diamond: Zhu 2017

The impact of the value set on the utility value is illustrated in two publications. Berg et al. calculated mean utility values for a large population of Swedish patients with chronic HF, using Swedish and UK value sets [35]. The baseline mean utility score calculated using the UK value set was 0.696, whereas the value was nearly 20% higher (0.828) when calculated with the Swedish value set [35]. Eurich et al. calculated mean utility values for patients with HFrEF managed in an outpatient setting using US and UK value sets, which gave mean scores of 0.74 and 0.66, respectively [36].
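The size of the value-set effect in these two publications can be checked with a short calculation. The sketch below expresses each difference relative to the UK value set; the function name is illustrative, and choosing the UK value as the reference is an assumption made here for comparability.

```python
def relative_difference(value, reference):
    """Relative difference of value with respect to reference."""
    return (value - reference) / reference


# Mean utilities reported in the text; the UK value set is the reference.
swedish_vs_uk = relative_difference(0.828, 0.696)  # Berg et al. [35]
us_vs_uk = relative_difference(0.74, 0.66)         # Eurich et al. [36]

print(f"Swedish vs UK: {swedish_vs_uk:+.1%}")
print(f"US vs UK: {us_vs_uk:+.1%}")
```

On the reported means, the Swedish value set gives a utility about 19% higher than the UK value set, and the US value set about 12% higher.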

Of those papers included in the IQL calculation, nine publications published utility values for patients with HFrEF, giving an IQL of 0.67–0.74. None of the publications eligible for inclusion in the IQL calculation focused on patients with HFpEF; however, two publications that were not eligible compared utility values in HFrEF and HFpEF. Berg et al., which used the EQ-5D-3L instrument but did not specify the summary statistic, reported a lower utility in HFpEF (0.65) than in HFrEF (0.72–0.73) [37]. Nafees et al., which used a vignette elicitation method, reported similar values between HFrEF and HFpEF populations [21].

In addition to understanding utility values for HF populations, understanding disutility due to chronic HF in the general population (or other patient populations) may be of value for modeling studies. The SLR identified ten papers, regardless of the elicitation instrument or statistic reported, that provided data on the disutility of chronic HF (ESM 4). The large degree of heterogeneity between study designs, in particular, the background population, instrument used, and statistic reported, prevents a detailed collective review of these studies. However, in most cases, the presence of HF was associated with a reduction in the utility value (although statistical significance was not assessed or demonstrated for all disutility values).

3.3 Utility Values in Hospitalized Patients with HF

Acute HF events, whether new-onset (‘de novo’) events or, more commonly, acute decompensations of chronic HF, usually lead to urgent hospital admission [7]. The SLR identified 31 publications that reported EQ-5D utility values for hospitalization, of which two were based on the ASCEND-HF trial and four on the WHICH study. Patients with HF are at risk of all-cause hospitalization as well as hospitalization for HF [2]. Twenty of the 31 EQ-5D publications focused on hospitalization for HF, three on all-cause hospitalization, and eight failed to clearly report the cause of hospitalization.

Many of the studies did not report, or poorly defined, the timing of HRQoL elicitation during the hospitalization event. Understanding when HRQoL questionnaires are administered is important, as two studies suggest that utility values change rapidly during a hospitalization event. Ambrosy et al. published EQ-5D-3L utility values for the ASCEND-HF trial, which investigated the effect of nesiritide in patients hospitalized with AHF. The mean utility value for the total study population (regardless of treatment arm) increased rapidly from 0.56 at baseline (assumed to be near the time of admission) to 0.67 at 24 h of hospitalization; by the time of discharge (day 10), the utility value had further increased to 0.79 [38]. A second, smaller study by Swinburn et al. reported mean utility values of patients with HF, as perceived by caregivers and healthcare professionals, following admission to hospital. EQ-5D-3L values, as perceived by experienced cardiac nurses (n = 50), increased from 0.199 on day 1 post-hospital admission to 0.563 on day 3 [16]. By day 7, utility values had increased to 0.817.

For the IQL calculation, papers that published mean EQ-5D utility values (for patients with HF) were included regardless of the cause of hospitalization, if they reported inpatient or discharge utility values. Four papers were identified that provided utility values collected during hospital admission, and six at discharge (Table 3), yielding IQLs of 0.54–0.63 and 0.64–0.73, respectively. Differences in study design, including whether the study focused on all-cause hospitalization or hospitalization for HF, are likely to have contributed to the variability in estimates. The timing of EQ-5D questionnaire administration during the hospital stay is also likely to have contributed to the variation in utility values.

Table 3 Mean EQ-5D (3-level or 5-level) utility values during hospitalization and at discharge

Four papers provided longer-term follow-up mean EQ-5D utility values for patients hospitalized for HF (ESM 5) [38,39,40,41]. Temporal changes in utility values following discharge varied between studies and with follow-up care. In most studies, utility values were maintained or increased following discharge.

Ten publications reported EQ-5D disutility values for a hospitalization event, with four using data from the SHIFT study (Table 4). While a large degree of study heterogeneity (in particular, in the summary statistic reported) prevents IQLs from being calculated, it is evident that a hospitalization event reduces utility (Table 4). The publications based on the SHIFT study provide several interesting insights. Two publications by Griffiths et al., both of which use EQ-5D data from the SHIFT study, indicate that disutility because of hospitalization increases with NYHA class; differences in the reported values between the two publications suggest sensitivity to the analyses applied (both papers report the results of a mixed model using NYHA class as a time-varying covariate, but the model-building strategies appear to differ, with automatic backward elimination used to retain covariates in the latter paper) [42, 43]. Kansal et al. provide disutilities specifically for HF hospitalizations based on SHIFT data; disutilities for one or two HF hospitalizations are similar (parameter estimates [standard error] −0.076 [0.007] and −0.074 [0.013]), but the disutility is larger for patients with three or more HF hospitalizations (−0.133 [0.016]) [44]. Apart from McMurray et al. [45], none of the papers that publish hospitalization disutility values provide time boundaries around the data, i.e., when the decrement is applied and how long the effect lasts. According to McMurray et al., disutility is −0.105 for patients hospitalized in the previous 30 days and reduces to −0.054 for patients hospitalized in the previous 30–90 days (UK value set) [45]. This suggests that disutility because of hospitalization reduces over time, which is consistent with other studies publishing trends in utility values following discharge (ESM 5).
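To illustrate why time boundaries matter when such decrements are used in a model, the sketch below converts the two estimates from McMurray et al. into a QALY loss per hospitalization. It is a hypothetical illustration rather than a method taken from any of the cited papers: it assumes that the decrements act as step functions over days 0–30 and 30–90, that no decrement applies after day 90, and that a year has 365 days. All names in the code are illustrative.

```python
DAYS_PER_YEAR = 365.0  # assumption: simple 365-day year

# (window start day, window end day, utility decrement) after admission.
# Magnitudes of the UK value set estimates reported by McMurray et al. [45];
# treating them as step functions that end at day 90 is an assumption.
DECREMENT_WINDOWS = [
    (0, 30, 0.105),
    (30, 90, 0.054),
]


def qaly_loss(windows, horizon_days):
    """QALYs lost up to horizon_days after one hospitalization."""
    loss = 0.0
    for start, end, decrement in windows:
        # Days of this window that fall inside the horizon (never negative).
        days_in_window = max(0.0, min(end, horizon_days) - start)
        loss += decrement * days_in_window / DAYS_PER_YEAR
    return loss


loss_90_days = qaly_loss(DECREMENT_WINDOWS, horizon_days=90)
print(f"QALY loss over 90 days: {loss_90_days:.4f}")
```

Under these assumptions, one hospitalization costs roughly 0.0175 QALYs over 90 days. Applying the 30-day decrement for a full year instead would give 0.105 QALYs, which shows how sensitive a model can be to the assumed duration of the effect.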

Table 4 Disutility because of a hospitalization event, EQ-5D (3-level or 5-level) values

3.4 Utility Values in Patients with Other AHF Events

Acute HF events do not always result in hospitalization. Three studies reported utility values for a broader group of patients with AHF, none of which used EQ-5D. Collins et al. is a modeling study [46], while Davies et al. and Matza et al. use vignette methodology and surveyed general populations [17, 20].

4 Discussion

This SLR identified a wealth of HF utility data, with 161 publications reporting data from 142 studies. This large evidence base provides opportunities for the relevant utility values to be identified and used in cost-utility analyses. However, opportunities to compare and synthesize the studies were limited, as heterogeneity between the studies was considerable. This degree of heterogeneity is not unique to HF; a review of CE analyses in cardiovascular disease by Ara et al. found that utility values varied widely in terms of the patient population and the methods (in particular, the instruments and value set) used to obtain them, resulting in considerable heterogeneity in the data [47].

From the quality assessment, reporting of loss to follow-up and missing data in HF utility publications needs to be improved to enable the reader to establish whether bias might have been introduced [48]. However, as this review focused on baseline data, lack of reporting loss to follow-up did not pose a risk of bias in this instance. Furthermore, the specifics of the instrument (e.g., EQ-5D-3L vs EQ-5D-5L) and country value set applied should also be reported routinely.

In this SLR, EQ-5D was the dominant instrument, accounting for 73% of utility studies. While other instruments may be relevant for specific uses, for the purpose of a comparative synthesis, heterogeneity was reduced by focusing the detailed review on those studies that used the EQ-5D instrument. Furthermore, as utility values are sensitive to study specificities, such as study population and value set used (as well as instrument for elicitation), IQLs were calculated for the comparative synthesis, as utilities for a broad population cannot be accurately represented by a single value.

The IQL for chronic HF was 0.64–0.72, with a trend of decreasing utility with increasing disease severity observed (IQLs 0.79–0.86 for NYHA class I, 0.75–0.81 for class II, 0.61–0.69 for class III, and 0.51–0.66 for class IV). As expected, utilities were lower for hospitalized patients with HF (compared with chronic HF), with an IQL of 0.54–0.63. However, IQLs at discharge (0.64–0.73) were nearly identical to those reported for the general chronic HF population.

Hospitalization of patients with HF is an area of focus for this review as a treatment goal of HF is to prevent hospital admission. Consequently, ‘hospitalization for heart failure’ is a key outcome in many HF trials [7, 49]. Understanding the impact of hospitalization on utility is likely to be central to economic evaluations of new treatments. While 39 publications reported utility data following hospitalization of patients with HF, there were limitations in the data. In particular, the timing of administration of the EQ-5D questionnaire was poorly defined and some publications failed to report the cause of hospitalization, e.g., HF specific, cardiovascular, or all-cause. Longitudinal studies of HF utility were rare; only four studies reported utilities during hospitalization (admission or discharge) as well as at follow-up timepoints, and none provided pre-admission data. Furthermore, studies reporting disutility because of hospitalization did not, in general, specify when disutility was assessed during hospitalization or time-bound the effect of a hospitalization event. Consequently, despite the large number of publications, there are important limitations to the hospitalization data that need careful consideration when applying these values in economic models.

Acute HF events may not always result in hospitalization but may require urgent medical attention and treatment; a recent HF trial included ‘urgent HF visit’ alongside hospitalization for HF and cardiovascular death in the primary composite endpoint [50]. However, only three studies reported utility values for a broader group of patients with acute HF, none of which used EQ-5D. Consequently, utility data for acute HF not restricted to hospitalization are limited, highlighting this as an area for further investigation.

To the best of our knowledge, this is the first dedicated SLR of utility in HF. Dyer et al. reviewed EQ-5D utility values in a broad group of cardiovascular diseases [51]. They identified 150 studies that published EQ-5D values for chronic HF, with mean values ranging from 0.31 to 0.78 [51]. The IQLs for mean EQ-5D values for chronic HF calculated in this review fall within the range published by Dyer et al. This comprehensive SLR expands on Dyer et al. While mainly focusing on EQ-5D (because of its high usage), this review is not restricted to this instrument and captures an additional 9 years of the latest data. Furthermore, utility values for specific health states (specifically chronic HF and hospitalization) are analyzed.

Rankin et al. reviewed trial-based economic evaluations of HF interventions that derive QALYs as an outcome measure, to identify approaches used to measure and value change in HRQoL [52]. They identified 20 studies reporting economic evaluations based on 18 individual trials, with most studies (n = 17) using generic preference-based measures to describe HRQoL and derive QALYs, commonly the EQ-5D-3L. Rankin et al. did not provide the utility values reported per study; rather, they examined whether the evaluations undertaken alongside trials identified significant changes in QALYs. Our review expands on Rankin et al. to identify, summarize, and appraise primary studies publishing HF utility values, regardless of the treatment or intervention and study design, provided the studies report de-novo utility data.

4.1 Limitations

Whilst we were inclusive in our approach to selecting studies, this review might be affected by publication bias which, although beyond the control of a systematic review, could have distorted our summaries with an over-representation of studies with larger and/or statistically significant results. Further, language bias is also possible, as only publications in English were included in the review.

Because HF is asymptomatic in its first stages, early assessment of the severity of HF is a crucial task [53]. The most commonly employed classifications for HF severity are NYHA and American College of Cardiology/American Heart Association stages of HF. The NYHA classification system has been criticized because it is based on a subjective evaluation and thus intra-observer variability can be introduced [54]. Although we acknowledge the criticism of the NYHA classification system, we investigated how NYHA class impacted utility as this was the most frequently reported classification for HF severity; none of the identified studies provided utilities stratified by American College of Cardiology/American Heart Association stages of HF.

Acute HF events, whether ‘de novo’ events or, more commonly, acute decompensations of chronic HF, usually lead to urgent hospital admission. In this review, data on acute HF came predominantly from hospitalization studies, with the exception of three studies (listed under ‘other AHF events’). The differentiation between ‘de novo’ and ‘acute-on-chronic’ HF cases would have some merit because the initial hospitalization with diagnosis of HF is generally considered more costly. However, we did not differentiate between the two types of acute events because some studies either do not clearly define the population with HF hospitalization or include both patients with chronic HF and patients with newly diagnosed HF in the sample of patients admitted to hospital [40, 41, 55]. To avoid grouping the studies based on the reviewers’ interpretation of the data, we presented the findings according to the health state for which utility data were reported: ‘chronic HF’, ‘hospitalized’, and ‘other AHF’. Consequently, all studies reporting utilities for HF hospitalization were grouped together regardless of ‘de novo’ or ‘acute-on-chronic’ events. It is possible that many chronic HF studies investigated patients with an exacerbation of chronic HF (‘acute-on-chronic’) and therefore the study population may not consist solely of stable chronic patients. For studies of patients with chronic HF, our approach was that when the study reported utility for an acute event (e.g., hospitalization) clearly defined by the authors, we grouped the study under ‘hospitalized HF’ or ‘other AHF’.

Although the age of respondents was reported in most of the studies, direct comparison of outcomes based on age between studies was not possible owing to the large differences in the set-up of the studies. Looking at within-study reporting of the role of age, some insights have been offered only by the studies by Calvert et al. and McMurray et al. [45, 56]. The former reported utilities for patients with HF stratified by age, and revealed that the impact of HF on quality of life appears to be independent of age with no specific trend identified (25–34 years: 0.55; 35–44 years: 0.65; 45–54 years: 0.60; 55–64 years: 0.60; 65–74 years: 0.60; 75+ years: 0.60). McMurray et al., in the supplemental results, reported the results of multivariable mixed models for utilities, but for both cohorts (UK and Colombian, Danish analysis) the centered-on-the-mean coefficient for age was very small (−0.001) and borderline statistically significant (95% confidence interval −0.001, 0.000), indicating a weak disutility effect of older age.

5 Conclusions

There is a wealth of published utility values providing a useful source for health economic modelers. In line with the latest International Society for Pharmacoeconomics and Outcomes Research recommendations, utility values should be obtained systematically, reviewed for quality, and derived using consistent methods [11]. This SLR provides evidence on suitable values to support future economic evaluations in HF and, where feasible, summarizes the data; utility value IQLs for chronic HF were 0.64–0.72. We advocate the use of systematic reviews to inform the parameters of the models used for cost-effectiveness analyses because utilities are among the key drivers of the models used in HF [10]. This study is an exhaustive repository of data from which utility values can be selected, justified (relevant to specific modeling scenarios), and used. Meanwhile, for those modelers using de novo utility values, data identified in this SLR provide a useful resource for benchmarking.