Introduction

The World Health Organization (WHO) defines cardiac rehabilitation and secondary prevention programs as ‘the sum of activity and interventions required to ensure the best possible physical, mental, and social conditions so that patients with chronic or post-acute cardiovascular disease may, by their efforts, preserve or resume their proper place in society and lead an active life’ [1]. Cardiac rehabilitation and secondary prevention programs (CR) are recommended for patients diagnosed with coronary heart disease, heart failure or heart valve disease, and for those who have had a cardiac event or cardiac surgery, including coronary artery bypass grafting [2]. These programs aim to delay disease progression and prevent future cardiac events, an approach referred to as secondary prevention. Secondary prevention includes lifestyle interventions for risk factor management, such as healthy eating, exercise, weight management, and psychosocial support, including monitoring of patient-reported outcomes [3].

Patient-reported outcomes encompass any report on a patient’s condition that comes directly from the patient [4]. The assessment of patient-reported outcomes is increasingly important as part of routine patient monitoring and as a quality indicator for treatment programs such as CR [3, 5]. In addition, patient-reported outcomes are a key outcome measure in economic evaluation studies assessing the cost-effectiveness of different healthcare interventions. A recent international study on the cost of CR reported an average cost per patient ranging from US$731.54 in the United Kingdom to US$1023.99 in Australia and US$5016.60 in the United States of America [6]. Reported healthcare expenditure on cardiovascular disease (CVD) is significant, amounting to AU$12.7 billion in Australia [7] and £7.4 billion in the United Kingdom [8] in 2019/20, and €155 billion in the European Union in 2021 [9]. With increasing CVD prevalence and morbidity globally [10], rising expenditure is certain, and therefore, the efficient allocation of these resources must be considered. The use of patient-reported outcome measures (PROMs) has been on the rise, and there is a growing demand for cost-utility analysis to evaluate the cost-effectiveness of healthcare programs. This trend aligns with recommendations from influential decision-making bodies, including the Pharmaceutical Benefits Advisory Committee (PBAC) and the Medical Services Advisory Committee (MSAC) in Australia, as well as the National Institute for Health and Care Excellence (NICE) in the UK [11, 12]. There are different types of economic evaluations depending on how the outcomes are assessed. Cost-utility analysis uses health-related quality of life (HRQoL) assessed with utility-based PROMs, also known as preference-based PROMs [13, 14].
Preference-based or utility-based PROMs are comprised of an HRQoL assessment accompanied by a utility algorithm, which is an indication of the preferences of the different health states generated by completing the assessment. By applying the utility weights, the scores obtained from such PROMs, referred to as utility scores, are used to generate quality adjusted life years (QALYs), the outcome measure in cost-utility analysis [13, 14]. The QALY is a composite measure of the quantity of life accrued by a given intervention (usually calculated using survival analysis) and the utility obtained from that life (utility scores obtained when a utility-based PROM assesses HRQoL). Examples of such PROMs include generic measures such as the Euroqol 5-dimensions measures, EQ-5D-3L and EQ-5D-5L [15], the Short-Form 6-Dimensions (SF-6D) [16] and disease-specific measures such as the MacNew heart disease HRQoL questionnaire [17].
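As a minimal illustration of how utility scores and time combine into QALYs, the sketch below sums utility-weighted time over successive periods. The function name, the utility values and the durations are hypothetical and are not drawn from any study included in this review; discounting and interpolation between assessment points are deliberately ignored.

```python
def qalys(periods):
    """Sum utility-weighted life years.

    periods: iterable of (utility, years) pairs, where utility is the
    utility score for a health state (1.0 = full health, 0.0 = dead)
    and years is the time spent in that state.
    This is a simplified, undiscounted calculation.
    """
    return sum(utility * years for utility, years in periods)


# Hypothetical patient: 6 months at a utility of 0.6,
# followed by 18 months at a utility of 0.8.
example_periods = [(0.6, 0.5), (0.8, 1.5)]
total_qalys = qalys(example_periods)
print(f"QALYs accrued over 2 years: {total_qalys:.2f}")
```

In a cost-utility analysis, the difference in QALYs between two interventions calculated in this way would form the outcome side of the comparison.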

Cardiac rehabilitation and secondary prevention programs are an evidence-based intervention that improves the HRQoL of people with CVD; therefore, HRQoL is a recommended measured outcome in this population. Although several PROMs have been validated in populations with cardiovascular disease and CR programs, there is a limited understanding of the suitability of these PROMs for use in cost-utility analysis studies. It is, therefore, important to identify the most suitable utility-based PROM in this population by assessing the quality of its measurement properties and its relevance to the needs of that specific population. This will facilitate accurate assessment of the cost-utility of CR programs and inform decision making.

With the increasing number of PROMs, guidance on the choice of PROM to be used in a specific population is required by mapping their content to internationally recommended patient-reported outcomes to be assessed in each population. The International Classification of Functioning, Disability and Health (ICF) is a recognized tool for comparing different PROMs [18, 19]. It is a bio-psychosocial framework of health developed by the World Health Organization for measuring health and disability at both individual and population levels across different categories: Body functions, Body structures, Activities and participation, Environmental factors, and Personal factors [20]. In addition, to achieve value-based care, the International Consortium for Health Outcomes Measurement (ICHOM) has defined key patient-reported outcomes that are important to and should be monitored in patients affected by different diseases, including CVD [21].

Therefore, this review aimed to identify utility-based PROMs that have been validated for use in a population undergoing CR. To assess their suitability for this population, the PROMs were mapped onto the ICF and the PRO global sets for cardiovascular disease, including atrial fibrillation [22], heart failure [23], heart valve disease and coronary artery disease [24] developed by ICHOM.

Review question(s)

  1. Which utility-based PROMs have been validated for assessing HRQoL in patients attending cardiac rehabilitation and secondary prevention programs?

  2. How does the content of these measures compare to the ICF framework, and do they address the domains recommended by ICHOM for individuals with CVD?

Methods

This review was registered with PROSPERO (CRD42022349395) and conducted following the JBI methodology for systematic reviews of measurement properties [25]. The full protocol for the conduct of this review has been published in detail elsewhere [26], and a summary is presented below.

Inclusion criteria

This review considered studies in adults ≥ 18 years of age eligible for a cardiac rehabilitation and secondary prevention program, assessing quality of life or HRQoL using a generic, disease-specific, or population-specific utility-based health-related PROM or PROMs accompanied by a scoring algorithm to generate utility scores. Studies were considered for inclusion if they assessed one or more aspects related to the measurement properties, development (to assess content validity), or interpretability of the PROM. Included studies reported on at least one of the following properties: 1) reliability, encompassing internal consistency, reliability, and measurement error, 2) validity, including structural validity, content validity, and construct validity, and 3) responsiveness. The COSMIN definitions for these measurement properties and the tests to assess them are provided in Supplementary Table S1.

Types of studies

Studies of quasi-experimental designs, before and after studies, analytical observational studies, including prospective and retrospective cohort studies, case–control studies, and cross-sectional studies were considered.

Search strategy

We employed a three-step search approach, commencing with an initial exploration of MEDLINE (via Ovid) and CINAHL (via EBSCO) to pinpoint relevant articles pertaining to the subject. Subsequently, we extracted text words and index terms from pertinent articles to formulate a comprehensive search strategy for use across other databases. We also examined the reference lists of included studies to identify any relevant supplementary studies. A search strategy was developed based on COSMIN-recommended search filters and previously published research in patients undergoing a cardiac rehabilitation and secondary prevention program [27] and assessing HRQoL [28]. This search strategy is provided in supplementary data, Table S2. Studies published from database inception to 30th Sept 2022 were included.

Instrument

For the ‘type of instrument’ concept, search filters developed by the Patient-Reported Outcomes Measurement Group (PROM Group) at the University of Oxford were used to find studies that evaluated PROMs [29].

Measurement properties

The highly sensitive validated search filters developed by the COSMIN initiative for PubMed were used to find measurement property studies. A translation of the original PubMed filter to Ovid MEDLINE by Macquarie University was used [29].

Databases

The databases searched were MEDLINE (Ovid), Emcare (Ovid), Embase (Ovid), Scopus (Elsevier), CINAHL (EBSCO), Web of Science Core Collection (Clarivate), Informit, PsycINFO (Ovid) and REHABDATA. Unpublished studies and grey literature were searched in Dissertations and Theses Global, WorldCat, the Health and Psychosocial Instruments (HaPI) database, and a list of information sources specific to PROMs collated by the PROM Group at the University of Oxford (e.g. organizations and research groups; journals; royal colleges and relevant links).

Study selection

Two independent reviewers screened the studies (abstracts and titles and then full texts) in Covidence (NB, LG, CMK, HD, VP, MAPP), and conflicts were resolved by involving a third reviewer (AB, SH, JH and RC). Search results and the study inclusion process were presented in a Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) flow diagram [30].

Assessment of methodological quality of the study

The quality of each study was appraised against the COSMIN Risk of Bias checklist [31]. Two independent reviewers completed the checklist for methodological quality, and a third reviewer was involved in any disagreements. Studies were rated as ‘very good’, ‘adequate’, ‘doubtful’ or ‘inadequate’ quality. An overall rating was assigned based on the lowest rating for any standards assessed in the checklist [32]. Data extraction and synthesis were conducted regardless of methodological quality, with the impact of including studies with ‘doubtful’ or ‘inadequate’ ratings assessed in the sensitivity analysis. However, studies with ‘inadequate’ evidence on content validity were excluded from further assessment in the review at this stage [31].
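The "lowest rating counts" rule described above can be sketched as follows. This is an illustrative reading of the rule as stated in the text, not the COSMIN group's own software; the function name and the example ratings are hypothetical.

```python
# Ratings ordered from worst to best, as named in the text.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def overall_rating(standard_ratings):
    """Return the lowest rating among the standards assessed in one checklist box."""
    if not standard_ratings:
        raise ValueError("at least one standard must be rated")
    # list.index gives each rating's rank; the minimum rank is the worst rating.
    return min(standard_ratings, key=RATING_ORDER.index)


# Hypothetical study: three standards rated for one measurement property.
ratings_for_property = ["very good", "adequate", "very good"]
print(overall_rating(ratings_for_property))
```

A single low-rated standard therefore determines the overall rating for that measurement property, regardless of how many other standards were rated highly.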

Data extraction

Two independent reviewers (NB, LG, SH, HD, VP) extracted the data using modified overview tables and templates from appendices 3–6 of the COSMIN manual [33], and any disagreements were resolved by involving a third reviewer.

Data synthesis

Data on measurement properties for each PROM was synthesised and evaluated by two independent reviewers (NBB, HD, SH) and conflicts were resolved by a third reviewer (CMK and BK). The quality of each measurement property reported in the included studies was qualitatively summarised, and a narrative synthesis was provided.

The aggregated results were compared against the criteria for good measurement properties to determine whether the measurement property of the PROM was sufficient (+), insufficient (–), inconsistent (±), or indeterminate (?) [33, 34]. A positive rating was assigned if the authors provided sufficient evidence that a particular property had been satisfied, a negative rating if they did not, and an indeterminate rating if no information was provided.

The quality of the evidence generated for each measurement property was also graded as high, moderate, low, or very low using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach [31, 33].

Mapping PROM items to ICF categories and domains to ICHOM global sets

To evaluate the content validity of each PROM and its relevance to the needs of patients undergoing cardiac rehabilitation and secondary prevention programs, the content/items of the PROMs were mapped onto the ICF using standardized linking rules and their domains were compared to the domains recommended by the ICHOM for cardiovascular disease.

Results

Supplementary Figure S1 illustrates the screening and full-text review to identify the relevant studies for inclusion. 

Study characteristics

This review found ten eligible studies conducted between 2004 and 2019, most commonly in Germany (4 studies). All studies were observational except for one randomised controlled trial [35] (see Table S3). Nine utility-based PROMs were identified: five language translations of the MacNew heart disease HRQoL questionnaire (MacNew), namely French [36], Portuguese [37], Italian [38], Persian [35] and German [39, 40]; two versions of the 12-Item Short Form Health Survey (SF-12), English [41] and German [42]; and the German translations of the EQ-5D-3L [43] and EQ-5D-5L [44]. The SF-36 was the predominant PROM (7 studies) against which these PROMs were compared for convergent validity.

The EQ-5D is a utility-based health status measure with general population value sets from several countries, including Australia [45,46,47]. Although neither the SF-12 nor the MacNew questionnaire is a stand-alone preference-based measure like the EQ-5D, utility scores can be obtained from responses to the SF-12 using its utility system, the SF-6D [48], and from the MacNew using the health state classification system developed by Kularatna et al. [17]. As such, these two PROMs were included in this review.

Assessment of methodological quality

Methodological quality was assessed for each study against the COSMIN Risk of Bias checklist [31]. These results are presented in Table 1. Nine studies assessed hypothesis testing for construct validity [35,36,37,38,39,40, 42,43,44], seven assessed internal consistency [35,36,37,38,39,40, 44], seven assessed responsiveness [35, 37, 39,40,41,42,43], six assessed structural validity [36,37,38,39,40, 44], four assessed reliability [35,36,37, 43] and only one assessed criterion validity [43]. None of the studies assessed any of the following properties: PROM development, content validity, cross‐cultural validity, measurement invariance or measurement error.

Table 1 Quality of studies on measurement properties—COSMIN checklist

Hypothesis testing for construct validity was very good except in three studies where it was adequate because there was no evidence that the comparator instrument for assessing convergent validity had been validated in the study population [39] and because the statistical method for assessing known groups was not stated but assumed by the reviewers to be appropriate [35, 43]. The internal consistency was very good in six of the seven studies assessing these properties and inadequate in one study [35] where no information was provided about whether other specific internal consistency statistics or IRT-based scores such as standard error were calculated. Responsiveness was very good in four of the seven studies [35, 40,41,42], adequate in one study [43] because the statistical methods were not stated and doubtful in two studies where the intervention was not adequately described [37, 39]. Factor analysis was performed on each sub-scale separately for studies assessing structural validity. Structural validity was very good in two studies that applied confirmatory factor analysis [39, 44], adequate in the two studies that applied exploratory factor analysis [36, 38] and inadequate in two studies where factor analysis was not used [37] and where the sample size was below 5 × the number of items tested [37, 40]. The assessment of reliability was very good in two of the four studies [35, 43] and adequate in two studies because the intra-class correlation was calculated but the model was not described [36, 37].

Data synthesis

Although six studies applied the MacNew, it was administered in five different language translations, and these studies could not be pooled together in a meta-analysis. A narrative synthesis based on the GRADE assessment is therefore provided with details in Table 2. Responsiveness of the English version of the SF-12 (n = 65) was rated as sufficient (+); however, the quality of evidence was low because only one study was identified [41]. Conversely, for the German version of the SF-12, the quality of evidence was moderate for responsiveness and hypothesis testing; although only one study was identified, its sample size was large (n = 2441) [42]. Rating for reliability, criterion validity, hypothesis testing and responsiveness of the German version of EQ-5D-3L was sufficient (+), but the quality of evidence was low as only one study (n = 114) was identified [43]. On the other hand, the quality of evidence for structural validity, internal consistency, hypothesis testing and responsiveness of the German version of EQ-5D-5L was moderate because of this study’s substantially larger sample size (n = 3225) [44]. For the MacNew, assessment of structural validity, internal consistency and hypothesis testing was rated as sufficient (+); however, the quality of evidence was low for the Portuguese (n = 200) [37], French (n = 323) [36], Italian (n = 298) [38] and Persian (n = 60) [35] translations. The quality of evidence was high for the German version as two studies [39, 40] with a large combined sample size (overall n = 5781) were included. Reliability was rated as sufficient in all except the Italian version [38], where it was not assessed. The quality of this evidence was low except for the German version [39, 40], where it was moderate. Responsiveness was sufficient in the German and Persian versions [35], but the quality of evidence was low for the Persian version because only one study was included.

Table 2 GRADE—Quality of the evidence for measurement properties of the PROMS

Mapping PROM items to ICF categories and domains to ICHOM global sets

Nine different PROMs were identified with four core questionnaires: MacNew, EQ-5D-5L, EQ-5D-3L and the SF-12. Because the EQ-5D-3L and EQ-5D-5L differ only in their response levels and not in their items, they were treated as one measure for the linking and mapping exercise. Linking followed the linking rules for PROMs developed by Cieza et al., 2005 [49]. Cieza et al. published the ICF linking results for the SF-12, and therefore this review reproduces the linking results from that original paper (Table S4) [49]. In this review, we linked the MacNew and EQ-5D; the results are reported in Tables 4 and 5.

Table 3 Matching to ICHOM global sets
Table 4 Linking MacNew questionnaire to the International Classification of Functioning, Disability and Health (ICF)

The 12-item short form health survey (SF-12)

For this measure, results from the linking guidelines paper are reported as the SF-12 was the illustrated example, reproduced in Table S4 [49]. All items of SF-12 were linked to the ICF category activities and participation.

All the SF-12 domains were mapped to ICHOM global sets for coronary artery disease, heart valve disease, heart failure and atrial fibrillation as described in Table 3.

MacNew health-related quality of life questionnaire

Items of the MacNew were linked to ICF categories except items 3 and 11, which were not classified, and items 20 and 26, which were not definable (see Table 4). Thirteen items were linked to the Body Functions category, in the chapters mental functions (1, 4, 5, 6, 7, 8, 10, 18), functions of the cardiovascular, haematological, immunological, and respiratory system (9, 19, 21), and sensory functions (14, 16). Five items were linked to the Activities and Participation category, in the chapters community, social and civic life (12, 17, 24, 25) and interpersonal interactions and relationships: particular interpersonal relationships (item 27). Items 13 and 22 were linked to the Environmental Factors category, chapter attitudes. The level of agreement between reviewers was 96% on the categories, 93% on the chapters and level 1, and 89% on level 2.

All items of the MacNew were mapped to ICHOM global sets for coronary artery disease, heart valve disease, heart failure and atrial fibrillation as demonstrated in Table 3.

Euroqol 5-dimensions (EQ-5D)

All domains of the EQ-5D were linked to ICF categories (see Table 5). The mobility, self-care and usual activities domains were linked to the Activities and Participation category and chapters mobility, self-care and general tasks and demands, respectively. Pain/discomfort and anxiety/depression domains were linked to the Body Function category, chapters sensory functions and pain and mental functions, respectively. Agreement between reviewers was 90% for the categories, 80% for the chapters and 70% for level 1.
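The review does not state how agreement between reviewers was calculated. Assuming simple percentage agreement (the share of linking decisions on which both reviewers chose the same code), the calculation could be sketched as below. The function name and the example codes are hypothetical and do not reproduce the reviewers' actual linking decisions.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of linking decisions on which two reviewers assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both reviewers must rate the same set of items")
    if not codes_a:
        raise ValueError("no linking decisions supplied")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical category-level decisions for five items
# ("d" = Activities and Participation, "b" = Body Functions).
reviewer_1 = ["d", "d", "d", "b", "b"]
reviewer_2 = ["d", "d", "b", "b", "b"]
print(f"Agreement: {percent_agreement(reviewer_1, reviewer_2):.0f}%")
```

Simple percentage agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be an alternative if that were of concern.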

Table 5 Linking EQ-5D to the International Classification of Functioning, Disability and Health (ICF)

All domains of the EQ-5D were mapped to ICHOM global sets for coronary artery disease, heart valve disease, heart failure and atrial fibrillation (see Table 3).

Discussion

Main findings

Nine utility-based PROMs validated for application in populations undergoing cardiac rehabilitation and secondary prevention programs were identified; the German [42] and English [41] translations of SF-12, the German translation of EQ-5D-3L [43] and EQ-5D-5L [44], the Italian [38], Portuguese [37], French [36], Persian [35] and German [39, 40] translations of the MacNew heart disease HRQoL questionnaire.

The quality of evidence for responsiveness and hypothesis testing of the German version of the SF-12 [42] was moderate. The quality of evidence for structural validity, internal consistency, hypothesis testing, and responsiveness of the German version of EQ-5D-5L [44] was moderate. The quality of evidence for structural validity, internal consistency and hypothesis testing of the German version of the MacNew [39, 40] was high. The quality of evidence for measurement properties of the following PROMs in a population undergoing cardiac rehabilitation and secondary prevention programs was low: the English version of the SF-12 [41], the German translation of the EQ-5D-3L [43], and the Portuguese [37], French [36], Italian [38] and Persian [35] translations of the MacNew heart disease questionnaire.

Across the PROMs, items were predominantly linked to the Body Functions and the Activities and Participation categories of the ICF. All the PROMs' domains were matched onto similar constructs from the ICHOM global sets.

Discussion of findings

Several studies have reviewed the literature to identify PROMs used in patients with cardiovascular disease [50, 51]; however, this is the first study to specifically consider utility-based PROMs and their measurement properties in patients undergoing a cardiac rehabilitation and secondary prevention program. Improvement in HRQoL is expected with CR, and cost-effectiveness assessment of models and modes of delivery for CR, such as home-based and web-based CR, is key to informing implementation into practice. It is therefore important to identify the best PROMs for assessing these outcomes. Like Thompson et al., 2016 [50], our review identified the disease-specific MacNew, which has been validated and can be applied across different cardiac populations, and the generic PROMs EQ-5D and SF-12. Our findings are particularly important to inform the choice of PROMs for application in cost-utility analysis studies, which are increasingly a preferred type of analysis recommended by decision-making bodies like NICE in the UK and PBAC in Australia [11, 12]. A recent review of the national health technology assessment (HTA) guidelines from these bodies found that generic utility-based PROMs are the measures most commonly recommended for use in cost-utility analysis [52]. However, additional validated PROMs could become applicable through mapping algorithms that calculate utility scores from responses to non-preference-based disease-specific PROMs. Mapping algorithms to the generic EQ-5D-5L have been developed for some PROMs like the MacNew heart disease HRQoL questionnaire [53, 54], the Kansas City Cardiomyopathy Questionnaire (KCCQ) [55], and the Minnesota Living with Heart Failure Questionnaire (MLHFQ) [56], and have been applied in cost-utility studies [55, 57].

Thompson et al., 2016 [50] highlighted the importance of measurement properties, specifically reliability, validity and responsiveness, when choosing a PROM to be used in cardiovascular disease. This review found moderate- to high-quality evidence for the validity of the German versions of the SF-12 [42], EQ-5D-5L [44] and MacNew heart disease questionnaire [39, 40] in a population undergoing a cardiac rehabilitation and secondary prevention program. These PROMs' reliability (test re-test reliability and internal consistency) is also reported. A study assessing predictors of returning to work six months after cardiac rehabilitation and secondary prevention programs reported moderate standardised effect sizes of 0.53 and 0.51 for the physical (PCS) and mental (MCS) component scales of the German version of the SF-12 [58]. The largest group (40%) of patients in this study had acute coronary syndrome (ACS), and 8% had undergone coronary artery bypass grafting (CABG). The standardised response means reported by Muller-Nordhorn et al., 2004 [42], a study identified in this review, were PCS = 0.63 and MCS = 0.60 for patients who had undergone CABG, and MCS = −0.18 and PCS = −0.05 for those undergoing CR following a myocardial infarction or ACS. Because CABG patients were under-represented in the return-to-work sample [58], the results of these two studies are not comparable for CABG and are dissimilar for myocardial infarction or ACS, highlighting the need for further studies on the responsiveness of this PROM in this population.
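The two responsiveness statistics discussed above differ only in their denominator. The sketch below uses their conventional definitions: the standardised effect size divides the mean change by the standard deviation of baseline scores, whereas the standardised response mean divides it by the standard deviation of the change scores. The scores are hypothetical, and the cited studies may have used variants of these formulas.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Paired differences (follow-up minus baseline) for each patient."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    return [after - before for before, after in zip(baseline, follow_up)]


def standardised_effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical component summary scores for five patients before and after CR.
baseline_scores = [40, 45, 50, 55, 60]
follow_up_scores = [46, 49, 57, 58, 70]
print(f"SES: {standardised_effect_size(baseline_scores, follow_up_scores):.2f}")
print(f"SRM: {standardised_response_mean(baseline_scores, follow_up_scores):.2f}")
```

Because the denominators differ, the two statistics can diverge for the same data, which is one reason values reported as effect sizes in one study and response means in another should be compared with caution.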

In their scoping review and mapping of heart disease-specific PROMs to the ICF, Alguren et al., 2020 [51] identified 34 PROMs whose items were linked to ICF categories of body function, activities and participation and environmental factors. Similarly, in our review, the heart disease-specific MacNew was linked to body function (13 items) and activities and participation (5 items). All items of the EQ-5D were linked to ICF categories and chapters similar to those reported by Cieza and Stucki, 2005 [59]. Mobility was linked to d450 (walking); self-care to d510 (washing oneself) and d540 (dressing); usual activities to d2301 (managing daily routine), d850 (remunerative employment), d835 (education life), d640 (doing housework) and d920 (recreation and leisure); pain/discomfort to b280 (sensation of pain); and anxiety/depression to b152 (emotional functions), b1528 (other specified emotional functions) and b1522 (range of emotion).

Limitations

Although extensive searches were conducted, there were insufficient studies to undertake a meta-analysis of the measurement properties. Several language translations of the MacNew were identified, but only one study reported measurement properties of each version except the German version with two studies. This highlights the need for future studies to assess measurement properties of various translations of utility-based PROMs to guide recommendations for inclusion in health economic modelling studies assessing interventions in the different environments of delivery of cardiac rehabilitation and secondary prevention programs.

Several limitations of the COSMIN guidelines have been noted in the literature regarding reporting of the assessment categories for the measurement properties [60]. These categories are sufficient (+), indeterminate (?) and insufficient (–). They refer to the design and reporting of the validation studies but may be misinterpreted as the quality of the PROM itself [60]. Commentators recommend more clarity in the guidelines regarding this rating as it affects the confidence users will have in the given PROM. In addition, completing the risk of bias assessment and the GRADE assessment takes a significant amount of time and requires more than a basic understanding of psychometrics [61].

Implications for practice

The EQ-5D-5L, SF-12 and MacNew heart disease questionnaire are linked to ICF categories and ICHOM global sets for CVD, demonstrating their suitability in a population experiencing any form of disability and cardiovascular disease.

This review has highlighted significant gaps in the literature on validation studies for utility-based PROMs in this population and the need for future research to validate these PROMs in patients undergoing cardiac rehabilitation and secondary prevention programs.

Conclusion

This review has identified three PROMs that can generate health state utility values and have been validated for cardiac rehabilitation and secondary prevention programs: the German versions of the generic EQ-5D and SF-12 and of the heart disease-specific MacNew HRQoL questionnaire. The PROMs were predominantly linked to the ICF categories of Body Functions and Activities and Participation, and matched to all ICHOM global sets. However, because only the German versions of these measures have moderate- or high-quality validation evidence in cardiac rehabilitation and secondary prevention programs, future larger studies are needed to validate the different language translations of these PROMs and provide options for use in this population.