Key Points for Decision-Makers

As health state valuation methods continue to develop, countries with existing value sets may generate new value sets to reflect state-of-the-art approaches; new value sets need to be tested in ‘real-world’ settings to determine any impacts on quality of life assessment.

This study provides comprehensive insights into how the new Australian value set for the EQ-5D-5L instrument functions in practice, using a common healthcare intervention (joint replacement surgery) as a case example.

The new value set produces less extreme negative scores, giving rise to a narrower scoring spectrum (with clear impacts for post-operative change scores) and likely fewer incremental quality-adjusted life years.

1 Introduction

As health state valuation methods continue to develop, countries with existing value sets may generate new value sets to reflect emerging approaches. This poses an interesting challenge, requiring policymakers to balance alignment with contemporary methods against continuity with previous decisions. A key task in this area is to quantify the impact of moving between existing and new value sets. This impact will be context-specific, depending on the purpose of the quality of life assessment, but if data reveal similar patterns across countries and health conditions, the implications of transitioning to a new value set can be understood and managed appropriately. The Australian EQ-5D-5L value set that was published in 2013 [1] is used widely by clinical quality registries, researchers and health economists to calculate utility scores. These utility scores are used to monitor quality of care, assess healthcare outcomes and inform decision-making around resource allocation. The 2013 value set was derived from a pilot study, with recognition that it may not accurately represent broader community preferences and that interaction effects between levels of dimensions should be further considered [1]. In 2023, a new Australian value set for the EQ-5D-5L instrument was published by Norman and colleagues [2]. The new value set was developed using contemporary preference methods and a fourfold larger population sample, yielding greater precision in the resultant value set. However, the new value set has not yet been tested in ‘real-world’ clinical contexts, and it is not known whether switching to the new value set will have any impact on quality of life assessment. This is particularly important for situations where the collection and reporting of EQ-5D-5L data is well established.

The Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR) has been collecting EQ-5D-5L scores for the past 5 years as part of its national Patient-Reported Outcome Measures (PROMs) program [3, 4]. To date, pre-operative and post-operative EQ-5D-5L data have been collected for over 42,000 primary joint replacement procedures around Australia. Utility scores derived from the EQ-5D-5L instrument are regularly reported in AOANJRR research publications (including those arising from registry-nested clinical trials) and annual reports. Understanding whether the new value set produces substantially different EQ-5D-5L utility scores will inform score interpretation and reporting decisions moving forward. It is also important to understand whether scores obtained from the new value set can be fairly compared with previously published scores. This type of evaluation will also be highly relevant for other EQ-5D-5L users, particularly those using the instrument for cost-utility analysis or longitudinal monitoring of quality of life. This study aimed to examine the impact of the new value set on pre-operative and post-operative EQ-5D-5L utility scores, and the magnitude of improvement after surgery, for patients undergoing primary joint replacement.

2 Methods

2.1 Study Design

This study was an analysis of longitudinal national arthroplasty registry data.

2.2 Ethics Approval

The collection of joint replacement and PROMs data by the AOANJRR is approved by the Commonwealth of Australia as a federal quality assurance activity (QAA 3/2017) under section 124X of the Health Insurance Act, 1973. All patients undergoing joint replacement in Australia provide consent for the routine AOANJRR collection of demographic, clinical and surgery data on an opt-out basis. Additional informed consent is obtained from all participants in the PROMs program.

2.3 Data Source

Data from the AOANJRR PROMs program were used for this study. All primary unilateral total hip replacement (THR), total knee replacement (TKR) and total shoulder replacement (TSR) procedures performed between 1 July 2018 and 31 December 2022 for which pre-operative and 6-month post-operative EQ-5D-5L data were available were included in the analysis.

2.4 Data Collection

The AOANJRR has been collecting PROMs data for patients undergoing joint replacement surgery since 2018; these data are collected for all consenting patients before surgery and at 6 months post-operatively [3]. An automated electronic data capture system is used to collect PROMs data, with established procedures for minimising missing data [3]. A suite of PROMs instruments is administered to patients at each time point, including the 5-level EuroQoL quality of life instrument (EQ-5D-5L). The EQ-5D-5L is a widely used generic health-related quality of life instrument that contains five items measuring mobility, self-care, usual activities, pain/discomfort and anxiety/depression [5]. Utility scores for the EQ-5D-5L are derived from country-specific weights, tariffs or value sets, with a negative score (score less than 0.00) indicating a quality of life worse than dead, and a score of 1.00 indicating full quality of life. The EuroQoL visual analogue scale (EQ-VAS) is also administered as part of the AOANJRR PROMs program but was not used for the present study. Patient demographics (age and sex), clinical characteristics [American Society of Anaesthesiologists (ASA) grade, body mass index and diagnosis] and surgery information (type of surgery and date of surgery) are collected for all patients undergoing joint replacement surgery as part of usual AOANJRR data collection procedures.
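To make the descriptive system concrete, the following minimal Python sketch splits a five-digit EQ-5D-5L health state (such as 11211) into its five dimension levels. The function name `parse_profile` and the dictionary keys are illustrative assumptions, not part of the registry's software. The sketch deliberately contains no utility weights, because those come from the published value sets [1, 2].

```python
from typing import Dict

# Dimension order follows the five items listed in the text above.
DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)


def parse_profile(profile: str) -> Dict[str, int]:
    """Split a five-digit health state such as '11211' into per-dimension levels.

    Hypothetical helper for illustration only; it validates the format and
    does not assign any utility score.
    """
    if len(profile) != len(DIMENSIONS) or not (profile.isascii() and profile.isdigit()):
        raise ValueError(f"expected five digits, got {profile!r}")
    levels = [int(ch) for ch in profile]
    if any(level < 1 or level > 5 for level in levels):
        raise ValueError(f"each level must be between 1 and 5, got {profile!r}")
    return dict(zip(DIMENSIONS, levels))


state = parse_profile("11211")
print(state["usual_activities"])  # level 2 on usual activities, level 1 elsewhere
```

A utility score would then be obtained by applying a country-specific value set to the parsed levels.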

2.5 Statistical Analysis

All statistical analysis was undertaken using SAS (version 9.4, SAS). Pre-operative and post-operative EQ-5D-5L utility scores were calculated using algorithms for the two Australian value sets: (1) the value set published in 2013 [1] and (2) the new value set published in 2023 [2]. Data on demographic and clinical characteristics were analysed descriptively to characterise the three joint replacement cohorts. Descriptive analysis was undertaken for pre-operative and post-operative EQ-5D-5L utility scores, and change scores, calculated using the previous value set and the new value set. Effect sizes were calculated for each set of EQ-5D-5L utility scores by dividing the mean change score by the pre-operative standard deviation; these were classified into small (0.20–0.49), medium (0.50–0.79) or large (≥ 0.80) according to Cohen’s classification [6]. The proportion of patients who achieved a minimal clinically important improvement in quality of life (defined as 0.28 EQ-5D-5L units [7]) was also evaluated. Agreement between the two sets of EQ-5D-5L utility scores was assessed: (1) quantitatively using Lin’s concordance correlation coefficients [8], with coefficients > 0.80 indicating excellent agreement [9], and (2) visually using Bland–Altman plots [10], where the limits of agreement were estimated as the mean difference in utility scores ± 1.96 times the standard deviation of the raw differences. Data analysis was undertaken for the overall THR, TKR and TSR cohorts and stratified analyses were also undertaken by sex.
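The effect size and clinically important improvement calculations described above can be sketched in Python as follows. This is an illustration only and not the SAS code used for the study. The function names, the toy scores, the use of the sample (n − 1) standard deviation and the treatment of a change of exactly 0.28 as meeting the threshold are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence

MCII = 0.28  # minimal clinically important improvement cited in the text [7]


def effect_size(pre: Sequence[float], post: Sequence[float]) -> float:
    """Mean change score divided by the pre-operative standard deviation.

    Assumption: sample (n - 1) standard deviation; the text does not specify.
    """
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(pre)


def classify(es: float) -> str:
    """Bands quoted in the text; the label for values under 0.20 is ours."""
    if es >= 0.80:
        return "large"
    if es >= 0.50:
        return "medium"
    if es >= 0.20:
        return "small"
    return "below small threshold"


def proportion_improved(pre: Sequence[float], post: Sequence[float]) -> float:
    """Share of patients whose change score is at least the MCII (assumed >=)."""
    changes = [after - before for before, after in zip(pre, post)]
    return sum(change >= MCII for change in changes) / len(changes)


# Toy utility scores for four hypothetical patients (not registry data).
pre_scores = [0.2, 0.4, 0.6, 0.8]
post_scores = [0.7, 0.8, 0.7, 0.9]

es = effect_size(pre_scores, post_scores)
print(round(es, 2), classify(es))
print(proportion_improved(pre_scores, post_scores))
```

In practice the same functions would be applied twice, once to scores from each value set, so that the two effect sizes and proportions can be compared side by side.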

3 Results

3.1 Characteristics of the Joint Replacement Cohorts

Data were available for 17,576 THR, 23,010 TKR and 1667 TSR procedures. Table 1 presents the demographic and clinical characteristics of the three joint replacement cohorts. The mean ages of patients undergoing THR, TKR or TSR were 66, 67 and 71 years, respectively. Mild or severe systemic disease, according to the ASA classification, was common among the three cohorts (Table 1). Most joint replacement procedures were performed for females (range 55–59%), and the primary diagnosis was most commonly osteoarthritis (range 58–98%).

Table 1 Characteristics of the joint replacement cohorts

3.2 Impact of The New Value Set on EQ-5D-5L Utility Scores

Table 2 presents the EQ-5D-5L utility scores derived from the two value sets for patients undergoing THR, TKR or TSR. For all three cohorts, the new value set produced fewer and smaller negative scores, with a lowest possible score of −0.30 in the new value set (versus −0.68 for the previous value set). There was an upward shift in scores which was most notable pre-operatively. Mean pre-operative utility scores were 0.21, 0.19 and 0.17 utility units higher for the THR, TKR and TSR cohorts, respectively, when using the new value set. There was also a lower proportion of patients with scores indicating a ‘worse than death’ quality of life before surgery (THR: 4% versus 17%; TKR: 2% versus 9%; TSR: 1% versus 6%). For all three cohorts, mean EQ-5D-5L utility scores were higher post-operatively with the new value set, although not to the same extent as observed pre-operatively (Table 2). Post-operatively, there was a small increase in the proportion of patients with the highest possible utility score (to 34%, 19% and 20% for the THR, TKR and TSR cohorts, respectively), reflecting that two EQ-5D-5L health states (11111 and 11211) are valued at 1.00 in the new value set versus only one health state (11111) in the previous value set.

Table 2 Comparison of EQ-5D-5L utility scores derived from the two value sets

Average improvement in quality of life after surgery was smaller using the new value set, compared with the previous value set (THR: 0.32 versus 0.42 units; TKR: 0.22 versus 0.28 units; TSR: 0.15 versus 0.18 units). This was also reflected in smaller effect sizes after THR (1.08 versus 1.23) and TKR (0.86 versus 0.92). For all three cohorts, the proportion of patients with clinically important improvement was lower with the new value set (Table 2). As similar patterns to those described above (with respect to mean pre-operative, post-operative and change scores, effect sizes, and clinically important improvement for the new value set versus the previous set) were observed for females and males in all three cohorts, the results of the sex-specific analyses are not reported here.

3.3 Agreement Between the Two Sets of EQ-5D-5L Utility Scores

Concordance correlation coefficients for each joint replacement cohort showed moderate-to-good overall agreement between the two sets of EQ-5D-5L utility scores, and negligible or no sex-based differences (Table 3). The concordance correlation coefficients were slightly higher for the pre-operative utility scores. The mean difference in scores, limits of agreement and variability in agreement across the breadth of the EQ-5D-5L scoring scale are shown in the Bland–Altman plots, for pre-operative scores (Fig. 1) and post-operative scores (Fig. 2). Separate plots for the THR, TKR and TSR cohorts are provided in the Supplementary information (Figs. S1–S3). To illustrate the observed differences in utility scores, Table 4 provides example health states and the utility scores derived from the previous value set and the new value set for those health states, limited to the three most frequent pre- and post-operative health states for each cohort.
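For readers who wish to reproduce this type of agreement analysis, the two statistics can be sketched in Python as below. This is a hedged illustration, not the study's SAS code: the function names and toy data are assumptions, the concordance coefficient uses population (divide by n) moments as in Lin's original formulation, and the limits of agreement use the sample standard deviation of the differences.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient: 2*s_xy / (s_x^2 + s_y^2 + (mx - my)^2)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    s_x2 = sum((a - mx) ** 2 for a in x) / n
    s_y2 = sum((b - my) ** 2 for b in y) / n
    return 2 * s_xy / (s_x2 + s_y2 + (mx - my) ** 2)


def bland_altman_limits(
    new: Sequence[float], old: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (lower limit, mean difference, upper limit) for new minus old scores."""
    diffs = [a - b for a, b in zip(new, old)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias, bias + spread


# Toy example (not registry data): the 'new' scores equal the 'old' scores
# plus a constant upward shift of 0.2 utility units.
old = [0.1, 0.3, 0.5, 0.7]
new = [0.3, 0.5, 0.7, 0.9]

print(round(lins_ccc(old, new), 3))
lower, bias, upper = bland_altman_limits(new, old)
print(round(bias, 3))
```

The toy data are chosen to show one property that is relevant here: a constant shift between two sets of scores leaves their ordinary correlation perfect but lowers the concordance coefficient, because the coefficient penalises the difference in means.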

Table 3 Concordance correlation coefficients for the EQ-5D-5L utility scores

Fig. 1 Bland–Altman plot for agreement between pre-operative EQ-5D-5L utility scores. The solid red line indicates the mean difference between EQ-5D-5L utility scores derived from the new value set and those derived from the previous value set. The dashed red lines indicate the 95% limits of agreement. The blue line indicates no difference between EQ-5D-5L utility scores derived from the previous value set and the new value set.

Fig. 2 Bland–Altman plot for agreement between post-operative EQ-5D-5L utility scores. The solid red line indicates the mean difference between EQ-5D-5L utility scores derived from the new value set and those derived from the previous value set. The dashed red lines indicate the 95% limits of agreement. The blue line indicates no difference between EQ-5D-5L utility scores derived from the previous value set and the new value set.

Table 4 Utility scores derived from the two value sets for the most common EQ-5D-5L health states

4 Discussion

We believe this is the first study to examine EQ-5D-5L utility scores generated by the new Australian value set beyond the initial development of this valuation approach. Importantly, our analysis provides comprehensive insights into how the new value set functions in practice, in terms of the impact on utility scores, which in turn provides information about how quality-adjusted life years (QALYs) estimated using both value sets might differ. We examined national joint replacement registry data for over 42,000 procedures and found noticeable differences in EQ-5D-5L utility scores calculated using the new value set, compared with the previous value set. Most notably, the new value set produces less extreme negative scores, giving rise to a narrower scoring spectrum and hence likely fewer incremental QALYs. We observed higher pre- and post-operative EQ-5D-5L utility scores, and a smaller magnitude of improvement in quality of life after surgery. These findings will be of direct relevance to clinical quality registries, researchers and health economists who use the EQ-5D-5L, particularly where quality of life is measured longitudinally and where improvement in quality of life is a key metric (for example, when estimating QALY gains). More broadly, our research emphasises the importance of comprehensively assessing the impacts of shifting to a new value set, and of documenting these impacts in a transparent manner.

The new Australian value set for the EQ-5D-5L instrument was developed using contemporary discrete choice experiment methods [2]. As such, it can be considered more methodologically robust than the previous value set, which was developed from a pilot study that was designed to test the plausibility and acceptability of the methods. As described elsewhere [2], the new valuation methods incorporated: (1) duration as an attribute, (2) a ‘dead’ health state option and (3) interaction effects. This is different to the previous experimental design [1], which only considered choices between non-dead health states, and hence inferred the position of dead. Why such an approach might yield a value set with a narrower range is uncertain, although this pattern has been observed elsewhere [11]. It may be that explicitly asking people whether something is ‘worse than being dead’ is confronting and could lead more respondents to answer in the negative, meaning far fewer health states are scored below 0. The updated discrete choice experiment involved over 4400 community-based participants and, owing to this large sample size, is considered to reflect population preferences more reliably than earlier approaches [2].

One previous study in THR has shown that mean differences in utility scores for the EQ-5D-3L instrument varied significantly according to the method and perspective used to develop the value sets [12]. Our THR, TKR and TSR data confirm that EQ-5D-5L utility scores also differ substantially when the methods for developing the value sets are updated. For cost–utility analysis, our results suggest that use of the new value set will likely reduce incremental QALYs and increase the incremental cost-effectiveness ratios (ICERs) for interventions. The effect of using different value sets for the same instrument can be contrasted with the effect of switching between different instruments, with evidence that the selection of instrument [for example, EQ-5D versus the short-form 6-dimension (SF-6D)] can affect incremental QALYs from an intervention [13]. While Australia does not formalise an ICER threshold, these results suggest that policymakers should be cognisant of the value set on which the QALYs are based and adjust their interpretation of ICERs accordingly. The present study demonstrates the importance of this issue, giving guidance on the translation of results between evaluations that use different value sets. Such translation will likely differ across countries and patient populations, which would be challenging for policymakers, but nonetheless, it signifies that their conclusions are sensitive to the underlying methodological assumptions employed.
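The direction of this effect can be illustrated with a deliberately simplified Python sketch. It is not a cost–utility analysis and produces no study result. The two utility gains are the mean THR change scores reported in Section 3.2, whereas the incremental cost, the 1-year horizon, the absence of discounting and the assumption that the comparator has zero utility gain are all hypothetical choices made for illustration.

```python
def incremental_qalys(utility_gain: float, years: float) -> float:
    """QALYs gained if a constant utility gain is held for `years` (no discounting)."""
    return utility_gain * years


def icer(incremental_cost: float, qaly_gain: float) -> float:
    """Incremental cost-effectiveness ratio: cost per QALY gained."""
    if qaly_gain <= 0:
        raise ValueError("ICER is not meaningful for a non-positive QALY gain")
    return incremental_cost / qaly_gain


# HYPOTHETICAL inputs, chosen only to show the arithmetic.
HYPOTHETICAL_COST = 20_000.0  # incremental cost in arbitrary currency units
HYPOTHETICAL_YEARS = 1.0      # time horizon

# Mean THR change scores reported in Section 3.2.
gain_previous = 0.42  # previous value set
gain_new = 0.32       # new value set

icer_previous = icer(HYPOTHETICAL_COST, incremental_qalys(gain_previous, HYPOTHETICAL_YEARS))
icer_new = icer(HYPOTHETICAL_COST, incremental_qalys(gain_new, HYPOTHETICAL_YEARS))

print(round(icer_previous), round(icer_new))
print(round(icer_new / icer_previous, 4))  # ratio equals gain_previous / gain_new
```

Under these assumptions the ratio of the two ICERs depends only on the ratio of the utility gains, so the same intervention at the same cost appears less cost-effective when the smaller change scores from the new value set are used.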

Equally, our findings have implications for the future reporting of AOANJRR PROMs data, including the presentation of quality of life outcomes in annual reports and any future health economic analyses that use EQ-5D-5L utility scores. While we found moderate-to-good agreement overall between scores derived from the two value sets, the two sets of utility scores were clearly different. To ensure alignment with the most up-to-date methods for calculating utility scores, it is recommended that the new Australian value set be used for future reporting of pre-operative quality of life and joint replacement outcomes. From a practical perspective, the use of the new value set will need to be specifically explained in future dissemination activities. Transition to reporting new EQ-5D-5L utility scores will require accompanying explanation to signal that higher pre-operative and post-operative utility scores (relative to scores published for earlier years) do not reflect changes in wellbeing but rather they reflect the new methods used to calculate these scores. Similarly, explanation will also be needed around the smaller magnitude of change in EQ-5D-5L utility scores; again, this relates to the new score calculation methods and not poorer outcomes from surgery. Another option would be to report EQ-5D-5L utility scores from both value sets for an interim period; however, double reporting may cause confusion. To ensure transparency of reporting moving forward, we recommend that any future dissemination activities explicitly report which Australian value set was used to derive EQ-5D-5L utility scores.

This study has several strengths, including the use of a large national registry dataset containing EQ-5D-5L data that were collected at two time points for three major joint replacement procedures. This enabled us to examine the impact of the new value set on pre-operative and post-operative utility scores, as well as on the magnitude of change following THR, TKR and TSR. We observed similar findings across the three cohorts, strengthening our conclusions. Importantly, the AOANJRR dataset included the full range of EQ-5D-5L utility scores at each time point (from −0.68 to 1.00 for the previous value set, and from −0.30 to 1.00 for the new value set), and this allowed us to examine impacts across the entire quality of life measurement spectrum. We also acknowledge the study limitations. Post-operative data were limited to the 6-month time point, reflecting AOANJRR data collection procedures (most post-operative improvement is attained by this time [14]), and the impact on longer-term quality of life outcomes cannot be determined. We did not investigate utility scores according to other patient characteristics (for example, age or disease severity), but note there were no sex-based differences in how the new value set functions. Finally, we are unable to determine the generalisability of our results beyond the joint replacement population. While we consider our study to have broad relevance (given the EQ-5D-5L is a generic instrument commonly used for a range of health conditions and to establish population norms), we recognise that our sample reflects the typical demographic and clinical characteristics of people undergoing joint replacement and recommend confirmation in other clinical contexts using a similar comparative approach. This is particularly pertinent for patient populations where changes in EQ-5D-5L responses are likely to be more concentrated in dimensions other than mobility and pain/discomfort (for example, anxiety/depression).

5 Conclusions

Despite acceptable agreement between the two sets of utility scores, our analysis has shown that the new Australian value set produces less extreme negative utility scores, markedly higher group-level scores and smaller change scores. Future reporting of EQ-5D-5L utility scores should note the specific value set used, and accompanying explanation will be needed to signal the shift in the methods for quality of life estimation. This study provides important ‘real-world’ evidence about how the new value set impacts EQ-5D-5L utility score estimation, using a common healthcare intervention (joint replacement surgery) as a case example. Further exploration is required to understand how utility scores may be impacted in other patient populations.