Key Points for Decision Makers

The general public in England assigns high importance to elements of value that are not captured by conventional value metrics, such as the quality-adjusted life-year (QALY).

Due to its disproportionate impact on other value elements, cost should be included in value metrics with caution, especially when linear additive models are employed.

Local healthcare commissioners in England could use the relative weights derived in this study to systematically assess new care models.

1 Introduction

In response to increasing demand for healthcare, rising care costs, and tighter fiscal policies, health systems are being redesigned towards value-based care [1, 2]. This shift is driven by new models of care that take a holistic and coordinated approach to organizing healthcare services around population needs, and that aim to improve population health, patient experience, and efficiency [3]. Most of these models emerge as bottom-up innovations and are prioritized by local decision makers based on a multi-composite perception of value [4, 5]. In England, such models of care include the provider collaboratives for acute and mental health services and their predecessors [6, 7]. They range from condition-specific interventions with integrated elements, such as the Integrated Diabetes Care Programme, to broader initiatives such as the Aging Well Programme, which addresses the needs of people with multiple long-term conditions [6]. There is therefore an urgent need among national and local decision makers in England and elsewhere to determine the elements of value of healthcare and their relative importance (RI) to the public, so that societal values can be integrated into the decision-making process [2, 8]. This need stems from the public’s expectation that local commissioners make decisions on behalf of their communities, and that the public’s ‘voice’ should be incorporated in the priority-setting process in return for its contribution through taxes [9].

Although consensus on a definition of value-based healthcare (VBHC) remains elusive, VBHC could be described as the equitable, sustainable, and transparent allocation of resources for enhanced outcomes and patient experiences, transcending traditional metrics such as the quality-adjusted life-year (QALY) [8, 10]. However, how the VBHC concept should be applied in (local) healthcare priority setting is still unclear [11]. Previous studies have attempted to define a comprehensive set of elements of VBHC [12,13,14], but operationalizing these elements is challenging: data are not always available, it is difficult to define a widely accepted scale of measurement for some elements (e.g., value of hope), and some elements potentially overlap (e.g., enjoyment of life and psychological well-being). Currently, priority setting is informed, at best, by several fragmented ad hoc analyses of cost effectiveness, equity, budget impact, etc., and revisited based on numerous, and frequently arbitrary, performance indicators [15].

Attempts to define VBHC are further complicated by the ongoing debate about whether the cost of healthcare should be a component of value (reflecting forgone value or opportunity cost) that could counterbalance other elements of value. Proponents argue that cost reduction is one of the objectives of new care models [16,17,18] and essential in decision making [13, 19]. Reflecting these principles, multiple value frameworks in the context of priority setting, such as the EU-funded Horizon2020 project SELFIE (Sustainable intEgrated chronic care modeLs for multimorbidity: delivery, FInancing, and performancE), have explicitly included cost as one of their elements [20, 21]. Opponents, on the other hand, contend that cost is not a value element and that its inclusion would be theoretically wrong [22,23,24]. This debate might be eased if including cost did not change the ranking of the other elements of value in terms of importance. Preliminary stated-preference studies have concluded that integrating cost as a healthcare value element does not alter the structure of preferences for other value elements [25, 26]. However, the reliability of these findings can be questioned because healthcare costs are potentially irrelevant to respondents who do not pay directly for healthcare services (e.g., in the English National Health Service [NHS]) [26]. Furthermore, focusing these studies on specific patient groups limits their broader applicability [26, 27], as it overlooks the general public’s funding role through taxes or insurance.

The aim of this study was to define the elements of VBHC and assess their RI to the public in England. In addition, the study explored the impact of including cost in a multi-composite value metric on the RI of other elements of value.

2 Methods

2.1 Decision Context and Perspective of Value

In England, local healthcare commissioners are responsible for purchasing most of the hospital and community NHS services in their local areas. With the 2022 Health and Care Act, local commissioners became part of one of the 42 Integrated Care Systems (ICSs) across the country [28]. Each ICS aims to improve outcomes in population health and healthcare; tackle inequalities in outcomes, experience and access; and enhance productivity and value for money [29]. To achieve these goals, ICS commissioners are expected to work in collaboration with health and social care providers and the local authorities to promote the implementation of the new care models [28].

As part of this process, local commissioners have to routinely assess the performance of the different care models and allocate tight healthcare budgets to the most valuable ones. This prioritization process should ideally be based on ICSs’ goals and requires a clear definition of what constitutes VBHC. In addition, commissioners are expected to act on behalf of the public, as the public forms the taxpayer base and bears the cost of healthcare decisions [9]. It is therefore desirable to define the elements of VBHC in light of the aims that ICSs pursue and to assess their RI from the perspective of the public.

2.2 Selection and Definition of the Value Elements

We followed a stepwise approach to select relevant and measurable elements of VBHC. In the first step, we created a full list of potential candidate value elements. To do this, we conducted 26 semi-structured interviews between April and August 2021, asking local decision makers in South East England to indicate factors that are or should be considered when prioritizing health interventions, including new models of care. The methodology used to conduct the interviews and analyse the information collected is described in detail elsewhere [15]. The draft list of candidate value elements was supplemented with the findings of a systematic literature review of empirical studies that had applied multi-criteria decision analysis (MCDA) to inform prioritization of healthcare interventions [30]. We focused on MCDA studies because this analysis requires the explicit and systematic generation of a multi-composite value metric and assigns RI to the value elements. Furthermore, MCDA has been widely applied to guide decisions in healthcare [30, 31]. The full list included 28 potential value elements (7 from the semi-structured interviews and 21 from the systematic literature review).

In the second selection step, two researchers (PG, AT) assessed each candidate element against theoretical properties of multi-composite value metrics founded in decision theory [32]. These properties were: (1) completeness (all factors relevant to the decision context were considered); (2) non-redundancy (value elements that were unimportant or irrelevant to the decision context were excluded); (3) non-overlap (overlap between elements of value was minimized to avoid double counting); and (4) operationality (elements of value can be objectively assessed based on data that local decision makers have available to systematically assess and prioritize care models). Figure 1 summarizes the selection process of the elements of VBHC. The list of all potential elements and the corresponding assessment is provided in electronic supplementary material (ESM) File A. The six elements of VBHC selected were:

  1. Final or intermediate health outcomes.
  2. Quality of life and well-being considerations.
  3. Quality of care, patient experience or features of the process of care delivery.
  4. Cost.
  5. Equity.
  6. Size of the target population.

Fig. 1

Flowchart to select value elements

2.3 Design of a Discrete Choice Experiment

To elicit the RI of the six elements of value, we designed two discrete choice experiments (DCEs) following best practice guidance for conducting such studies [33,34,35,36,37]. DCEs are theoretically sound, relatively easy to administer, and widely used to quantify public preferences in healthcare priority-setting [38,39,40].

2.3.1 Definition of Attributes and Levels

For the definition of the six elements of value (henceforth, ‘attributes’) and their corresponding levels, we first reviewed previous stated-preference studies in healthcare priority-setting to identify all possible options. In addition, we reviewed definitions from policy documents and surveys conducted by NHS England. Based on both sources, we developed a preliminary definition for each attribute and a preliminary selection of levels, taking into account formative research on best practice guidance [33,34,35,36,37, 41]. Afterwards, we presented the attribute definitions and corresponding levels to the Patient and Public Involvement (PPI) group from the National Institute for Health and Care Research (NIHR) Oxford and Thames Valley Applied Research Collaboration (OTV-ARC) in June 2022. We discussed each of the attributes and refined the wording based on the PPI group’s feedback, as part of the pretesting stage [42]. ESM File B summarizes the rationale behind the definition of each of the attributes, and ESM File C includes a detailed explanation of the attribute-level selection process.

The attribute names, definitions and levels used in our descriptive system are presented in Table 1. These attributes describe the value elements that should guide the local commissioning of models of care. As indicated in the Introduction, new care models involve multiple teams or health professionals and target groups, settings, or levels rather than, or in addition to, individual patients. Examples include the ‘Aging Well Programme’, the ‘Integrated Diabetes Care Programme’, or ‘Early Intervention in Psychosis Services’.

Table 1 Attributes and levels

All attributes were expected to have a positive relation with utility except for ‘Additional Budget Needed’. Combining the first two attributes, i.e., additional years of life and quality-of-life improvements, allowed a QALY measure to be estimated. As the RI of the QALY compared with other value elements is unclear, we created a QALY gain variable using the collected experimental data (see Sect. 2.5).

2.3.2 Construction of Choice Tasks

An unlabelled paired-comparison elicitation format was used in this experiment. Each paired comparison included two hypothetical care programmes, each described by a profile of the attributes at specific levels. For each choice task, participants were asked to choose between two ‘Care programmes’, with no opt-out option, as local healthcare commissioners do not have such an option in the real world. Participants were asked to imagine themselves as local healthcare commissioners who have to decide which care programme to prioritize for funding, given a limited budget.

To increase participants’ attention and reduce task complexity, we used graphics and colour coding [45, 46]. A colour-blindness simulator (www.color-blindness.com) was employed to ensure accessibility of the visual elements. The choice task was discussed in a second meeting with the NIHR OTV-ARC PPI group, and tested with members of the public at the Oxford Biomedical Research Centre (BRC) Open Day in July 2022. For the latter, copies of different potential choice tasks were printed, and members of the public were asked to complete the choice task and give their opinion (see ESM File F). Members of the PPI group confirmed that all icons/graphics were user-friendly, enabling respondents to thoughtfully engage with the different attributes while making their choices [47].

To explore the impact of including cost in a multi-composite value metric, we randomly assigned participants to one of two DCE versions: one excluding the cost attribute (i.e., DCE-NoAddTax) and the other including the cost attribute (i.e., DCE-AddTax). Both DCEs were identical in all other respects. An example of a choice task, with the cost attribute included, is presented in Fig. 2.

Fig. 2

Example of choice task

2.3.3 Experimental Design

Two Bayesian D-efficient factorial designs were created using Ngene (www.choice-metrics.com) [48]. The 'NoAddTax' design included the first five attributes, each with three levels, while the 'AddTax' design also included the attribute ‘Additional budget required’. Both experimental designs were created using results from a pilot study with 40 members of the public. For each design, we used 28 rows divided into two blocks, to which participants were randomly allocated, resulting in 14 choice tasks per block. A similar number of choice tasks has been used in previous similar studies [26, 40, 44, 49]. To reduce task complexity and improve choice consistency, we used attribute-level overlap, with two attributes at the same level in each choice task [45]. To this end, partial profile designs were implemented by creating candidate sets in Stata [50]. We examined the independence of preferences between attributes, as violations could affect the ranking of preferences and its subsequent application in healthcare priority setting [24]. Although interaction terms were not included in the pilot study, we employed a model averaging approach to allow interaction terms to be estimated [51].

Both designs had a low D-error and a relatively even distribution of movements between levels for each attribute. Details about the results of the pilot survey and the experimental design are presented in ESM Files D and E.
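For readers unfamiliar with the design criterion, the sketch below shows how a Bayesian D-error is commonly computed for a multinomial logit (MNL) model. It is illustrative only and does not reproduce Ngene's algorithm or the study's designs: the design array, the priors and the function names (`mnl_information`, `d_error`, `bayesian_d_error`) are hypothetical, priors are assumed to be independent normals, and attribute-level overlap and partial profiles are not handled. A lower D-error indicates a more efficient design.

```python
import numpy as np

def mnl_information(design, beta):
    """Fisher information of an MNL model for one respondent answering every task.

    design: array (S, J, K): S choice tasks, J alternatives, K coded parameters
    beta:   array (K,) of parameter values at which the information is evaluated
    """
    k = design.shape[2]
    info = np.zeros((k, k))
    for X in design:                       # X has shape (J, K)
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()                       # logit choice probabilities
        Z = X - p @ X                      # deviations from probability-weighted mean
        info += (Z * p[:, None]).T @ Z     # sum_j p_j z_j z_j'
    return info

def d_error(design, beta):
    """D-error = det(AVC)^(1/K), where AVC is the inverse Fisher information."""
    k = design.shape[2]
    avc = np.linalg.inv(mnl_information(design, beta))
    return float(np.linalg.det(avc) ** (1.0 / k))

def bayesian_d_error(design, prior_mean, prior_sd, n_draws=500, seed=0):
    """Bayesian D-error: mean D-error over draws from independent normal priors."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(prior_mean, prior_sd, size=(n_draws, len(prior_mean)))
    return float(np.mean([d_error(design, b) for b in draws]))
```

A design search (as performed by Ngene) would evaluate many candidate designs with a function like `bayesian_d_error` and retain the one with the lowest value; averaging over prior draws is what distinguishes the Bayesian criterion from a D-error evaluated at a single point estimate.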

2.4 Survey and Data Collection

The survey, conducted online, comprised eight sections, beginning with a welcoming landing page.

Section 2 featured five screening questions, followed by survey information and informed consent in Sect. 3. Section 4 detailed the attributes and levels, including warm-up tasks for attribute familiarization. This led to two practice questions, the latter being a dominant choice task (Sect. 5), and then to the main 14 choice tasks (Sect. 6), described using simple, neutral language to ensure scenario clarity and encourage honest responses [52, 53]. Subsequent to the choice tasks, Section 7 posed two debriefing questions about survey difficulty, with the final section gathering data on respondents' education, employment, and health. The survey instrument used can be found in ESM File G.

The survey indicated an estimated 15–20 min for completion to manage participant expectations and encourage thoroughness. Based on the time reported by colleagues who tested the instrument, the pilots conducted and experience from similar DCEs [49], we anticipated a realistic minimum engagement time of around 4 min, with an expected average engagement time of approximately 15 min (i.e., the mean completion time, based on pilot studies).

We randomized the order of the choice tasks to control for learning and fatigue effects [54]. When collecting the data, we used three quality checks and disqualified respondents who (1) spent less than one-third of the median time to complete the survey; (2) chose the same alternative across all 14 choice tasks (i.e., those who always picked programme A or always picked programme B); or (3) failed the dominance test, presented as the second practice choice task in the survey.
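The three disqualification rules can be expressed as a simple filter. This is a minimal sketch, not the screening actually run by the panel provider: the `Respondent` record and function names are hypothetical, and it assumes the median is computed over the batch of completed responses being screened (in practice the threshold may have been applied on the fly during fieldwork).

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Respondent:
    rid: str
    minutes: float           # total survey completion time
    choices: list[str]       # "A" or "B" for each of the 14 main choice tasks
    passed_dominance: bool   # outcome of the dominant practice task

def passes_quality_checks(r, median_minutes):
    too_fast = r.minutes < median_minutes / 3          # rule 1: speeders
    straightliner = len(set(r.choices)) == 1           # rule 2: always A or always B
    return not too_fast and not straightliner and r.passed_dominance  # rule 3

def screen(respondents):
    """Keep respondents passing all three checks; median taken over this batch."""
    med = median(r.minutes for r in respondents)
    return [r for r in respondents if passes_quality_checks(r, med)]
```

Because the speed threshold is relative to the median, it adapts to the actual length of the instrument rather than relying on a fixed cut-off such as the 4-min minimum anticipated from pilot testing.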

The survey, targeting adult members of the English public, was conducted online by Dynata, a market research firm responsible for both survey creation and data collection. To represent the general English population accurately, we employed national census-balanced quotas and targeted recruitment strategies across region, sex, age, and ethnicity. Dynata ensured respondent quality and representativeness by targeting panel invitations and enforcing strict quality checks based on our quota sampling strategy.

To estimate the minimum sample size requirements, we used data from the pilot study and followed the parametric approach proposed by de Bekker-Grob and colleagues [55]. For the 'AddTax' design, we needed a minimum sample size of 166 respondents for a statistical power of 0.8 at a confidence level of 95%. For the 'NoAddTax' design, the minimum required sample size was 120 respondents. Based on these results, we aimed to collect 200 responses for each design. To account for potential left-right bias, for each of the blocks we created ‘mirror blocks’ by reversing the order in which the profiles for ‘Care Programme A’ and ‘Care Programme B’ were displayed [56]. The target sample of 400 respondents was therefore randomized across eight blocks, with 50 respondents per block (i.e., the four original blocks plus the four ‘mirror blocks’).
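The parametric approach cited above computes, for each coefficient, N ≥ (z₁₋β + z₁₋α/₂)² Σ / δ², where δ is the expected coefficient and Σ its variance for a single respondent completing the design; the largest value across coefficients is the minimum sample size. The sketch below encodes that formula under these assumptions; the function name and the inputs in the test are hypothetical and are not the study's pilot estimates.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(coefs, unit_variances, alpha=0.05, power=0.8):
    """Minimum N for each coefficient, then the overall minimum (largest, rounded up).

    coefs:          expected coefficient values (effect sizes), e.g. from a pilot model
    unit_variances: variance of each coefficient for ONE respondent completing the design,
                    e.g. the diagonal of the inverse Fisher information at the pilot estimates
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    per_param = [z**2 * var / b**2 for b, var in zip(coefs, unit_variances)]
    return ceil(max(per_param)), per_param
```

With α = 0.05 and power 0.8, the squared z term is about 7.85, so small coefficients with large per-respondent variance (typically the weakest attribute levels) drive the requirement.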

The data of the main survey were collected between 1 November and 1 December 2022.

2.5 Data Analysis

Characteristics of the respondents were summarized using descriptive statistics. To model participants’ choices for each of the designs, we assumed a random utility model under which the two alternative care programmes (A and B) are characterized by a utility function with a deterministic and a random component [38]. For each respondent n, the utility \({U}_{ni}\) of alternative i is a random variable composed of a deterministic (systematic) component, \({V}_{ni}\), based on the attributes that influence the individual’s behaviour, and a stochastic disturbance term \({\varepsilon }_{ni}\) (i.e., the random component). The latter captures the deviation from the modelled utility for alternative i and respondent n (Eqs. 1a and 1b):

$$\text{DCE-NoAddTax}: \quad U_{ni}^{\text{NoAddTax}} = V_{ni}^{\text{NoAddTax}} + \varepsilon_{ni}^{\text{NoAddTax}}, \quad i \in \{A, B\}$$
(1a)
$$\text{DCE-AddTax}: \quad U_{ni}^{\text{AddTax}} = V_{ni}^{\text{AddTax}} + \varepsilon_{ni}^{\text{AddTax}}, \quad i \in \{A, B\}$$
(1b)

The deterministic part of the utility, \({V}_{ni}\), is typically assumed to have an additive structure defined by the attributes of the alternatives and the corresponding estimated parameters \(\beta\), as follows (Eqs. 2a and 2b):

$$V_{ni}^{\text{NoAddTax}} = \delta_{10} + \sum_{k=1}^{5} \beta_{1k} X_{kni} = \delta_{10} + \beta_{11}\,YoL_{ni} + \beta_{12}\,QoL_{ni} + \beta_{13}\,Exp_{ni} + \beta_{14}\,Size_{ni} + \beta_{15}\,Equ_{ni}$$
(2a)
$$V_{ni}^{\text{AddTax}} = \delta_{20} + \sum_{k=1}^{6} \beta_{2k} X_{kni} = \delta_{20} + \beta_{21}\,YoL_{ni} + \beta_{22}\,QoL_{ni} + \beta_{23}\,Exp_{ni} + \beta_{24}\,Size_{ni} + \beta_{25}\,Equ_{ni} + \beta_{26}\,AddTax_{ni}$$
(2b)

Here, \(\delta_{10}\) and \(\delta_{20}\) denote the alternative-specific constants (ASCs), which capture the propensity of participants to choose A over B; they were expected to be non-significant because of the mirror blocking used in the experimental design. The estimated parameters \(\beta_{11:15}\) and \(\beta_{21:26}\) capture the marginal sensitivity to changes in the attribute levels. All attributes were categorical and dummy coded.
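To make the dummy coding and the binary logit structure concrete, the sketch below builds the deterministic utility of two programme profiles and the resulting choice probability for fixed (non-random) coefficients. Everything here is hypothetical: the level labels are placeholders (the actual levels are in Table 1), only two attributes are shown, the first level of each attribute is assumed to be the reference, and the ASC is assumed to enter the utility of programme A only, which is a common normalization.

```python
import math

# Placeholder level labels for two attributes; the first label is the reference level.
LEVELS = {
    "YoL": ["low", "mid", "high"],
    "QoL": ["low", "mid", "high"],
}

def dummy_code(profile, levels=LEVELS):
    """One 0/1 dummy per non-reference level; the reference level is all zeros."""
    row = {}
    for attr, labels in levels.items():
        for lab in labels[1:]:
            row[f"{attr}_{lab}"] = 1.0 if profile[attr] == lab else 0.0
    return row

def utility(row, beta, asc=0.0):
    """Deterministic utility V = ASC + sum of coefficient x dummy."""
    return asc + sum(beta[name] * x for name, x in row.items())

def prob_a(profile_a, profile_b, beta, asc=0.0):
    """Binary logit probability of choosing programme A (ASC added to A only)."""
    va = utility(dummy_code(profile_a), beta, asc)
    vb = utility(dummy_code(profile_b), beta)
    return 1.0 / (1.0 + math.exp(-(va - vb)))
```

Only the utility difference between A and B matters in the binary logit, which is why an ASC is identified only relative to one alternative and why mirror blocking is expected to push it towards zero.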

To account for random variation across respondents, we estimated mixed multinomial logit models (MMNL), with all parameters set as random and normally distributed [36, 57, 58]. For the simulation of the choice probabilities, we used 1000 Modified Latin Hypercube Sampling (MLHS) draws per individual [59].
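The sketch below illustrates, under stated assumptions, what the simulation involves: MLHS draws (one evenly spaced grid per dimension with a single random shift, randomly permuted) are mapped to standard normal draws, and a respondent's simulated panel probability is the average, over draws, of the product of binary logit probabilities of the choices made. It is not the Apollo implementation used in the study; the function names are hypothetical, and coefficients are assumed independent normals.

```python
import numpy as np
from statistics import NormalDist

def mlhs_normal(n_draws, n_dims, rng):
    """Modified Latin Hypercube Sampling draws mapped to standard normal values."""
    inv = NormalDist().inv_cdf
    out = np.empty((n_draws, n_dims))
    for d in range(n_dims):
        u = (np.arange(n_draws) + rng.uniform()) / n_draws  # one shift per dimension
        rng.shuffle(u)                                       # random order across draws
        out[:, d] = [inv(x) for x in u]
    return out

def simulated_panel_prob(X_a, X_b, chose_a, mean, sd, draws):
    """Simulated probability of one respondent's full sequence of choices.

    X_a, X_b: (T, K) coded attributes of programmes A and B in T tasks
    chose_a:  (T,) booleans, True where A was chosen
    mean, sd: (K,) means and SDs of the normally distributed coefficients
    draws:    (R, K) standard normal draws for this respondent
    """
    betas = mean + sd * draws                  # (R, K) coefficient draws
    dv = (X_a - X_b) @ betas.T                 # (T, R) utility difference A minus B
    p_a = 1.0 / (1.0 + np.exp(-dv))            # binary logit probability of A
    p_chosen = np.where(chose_a[:, None], p_a, 1.0 - p_a)
    return float(np.mean(np.prod(p_chosen, axis=0)))  # product over tasks, mean over draws
```

Taking the product over tasks before averaging reflects the panel structure (the same random coefficients apply to all 14 choices of a respondent); the simulated log-likelihood is the sum of the logs of these probabilities across respondents.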

To determine the RI of the attributes, and identify whether the preference ranking changes when the monetary attribute ‘Additional Tax’ is included, we calculated RI scores using Eq. 3 [26, 60]:

$$RI_k = \frac{\max(pwu_k) - \min(pwu_k)}{\sum_{j} \left( \max(pwu_j) - \min(pwu_j) \right)} \times 100$$
(3)

where \(pw{u}_{k}\) denotes the part-worth utilities (the coefficients) of the levels of attribute k, and the sum in the denominator runs over all attributes. To obtain a 95% confidence interval (CI) around the RI scores, we used a bootstrapping procedure with 1000 replications [26].
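Eq. 3 can be applied directly to a set of part-worth utilities. The sketch below computes RI scores and a percentile CI from a collection of coefficient replicates. The part-worth values are illustrative, not the study's estimates, and the function names are hypothetical. The replicate generator shown (independent normal draws around each estimate, ignoring covariances) is a simplified parametric stand-in chosen for brevity; it is not necessarily how the study's 1000 bootstrap replications were produced, which could, for example, come from re-estimating the model on resampled respondents.

```python
import random

def relative_importance(part_worths):
    """RI (%) per attribute from part-worths of all its levels (Eq. 3).

    part_worths: dict attribute -> part-worth utilities of every level,
                 including the reference level (0 under dummy coding).
    """
    ranges = {attr: max(levels) - min(levels) for attr, levels in part_worths.items()}
    total = sum(ranges.values())
    return {attr: 100.0 * r / total for attr, r in ranges.items()}

def percentile_ci(replicates, attribute, level=0.95):
    """Percentile CI of one attribute's RI across replicate part-worth sets."""
    values = sorted(relative_importance(pw)[attribute] for pw in replicates)
    n = len(values)
    lo = values[int((1 - level) / 2 * (n - 1))]
    hi = values[int((1 + level) / 2 * (n - 1))]
    return lo, hi

def parametric_replicates(part_worths, ses, n_rep=1000, seed=42):
    """Hypothetical helper: draw each part-worth from N(estimate, se^2) independently.

    Reference levels should be given se = 0 so they stay fixed at 0.
    """
    rng = random.Random(seed)
    return [
        {a: [rng.gauss(b, s) for b, s in zip(part_worths[a], ses[a])] for a in part_worths}
        for _ in range(n_rep)
    ]
```

Because RI uses the range of part-worths, a negatively valued attribute such as a tax increase contributes its absolute range, so RI scores are always comparable across attributes and sum to 100.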

To estimate the RI of ‘QALY gain’ compared with the other elements of value, we combined the survival attribute with the health-related quality-of-life attribute. Since 'QoL' and 'YoL' have three levels each, their nine combinations produced a 'QALY gains' variable with seven distinct values, ranging from 0.1 (i.e., 0.5 additional years of life and a 20-point improvement in 'QoL') to 1.8 (i.e., 3 additional years of life and a 60-point improvement in 'QoL'). An illustration of the combination of the years of life and quality-of-life attributes to generate QALYs is presented in Fig. 3.
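The two reported endpoints imply that QALY gain is computed as additional years of life multiplied by the quality-of-life improvement rescaled from points to a 0 to 1 scale (0.5 × 20/100 = 0.1 and 3 × 60/100 = 1.8). The sketch below encodes that arithmetic. Only the extreme levels come from the text; the middle levels are hypothetical placeholders (the actual levels are in Table 1), so the set of distinct values it returns is illustrative.

```python
from itertools import product

# Extreme levels are from the text; the middle levels are HYPOTHETICAL placeholders.
YOL_LEVELS = (0.5, 1.0, 3.0)   # additional years of life
QOL_LEVELS = (20, 40, 60)      # quality-of-life improvement, points on a 0-100 scale

def qaly_gain(yol, qol_points):
    """QALY gain = additional years x QoL improvement rescaled to 0-1 (rounded for ties)."""
    return round(yol * qol_points / 100, 10)

def distinct_qaly_values(yol_levels=YOL_LEVELS, qol_levels=QOL_LEVELS):
    """Sorted distinct QALY gains over all level combinations."""
    return sorted({qaly_gain(y, q) for y, q in product(yol_levels, qol_levels)})
```

With these placeholder levels, different combinations can coincide (for instance, 0.5 years with 40 points and 1 year with 20 points both give 0.2 QALYs), which is how nine level combinations can collapse into fewer distinct QALY values.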

Fig. 3

Illustration of generating QALY gains from two attributes. QALY quality-adjusted life-year

To further explore the impact of the ‘Additional Tax’ attribute on people’s preferences, we estimated marginal rates of substitution (MRS) and compared differences between the two DCE subsamples (i.e., DCE-NoAddTax vs. DCE-AddTax). To this end, ‘YoL’ and ‘QoL’ were treated as continuous variables.
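An MRS expressed in terms of 'YoL' is the ratio of an attribute's coefficient to the 'YoL' coefficient, i.e., the years of life a respondent would give up for one unit of the other attribute. The sketch below computes this ratio and a delta-method 95% CI from the two coefficients, their variances and their covariance. It is a simplified illustration: the function names and input values are hypothetical, it uses the ratio of mean coefficients rather than the full distribution of random coefficients, and it is not claimed to be the procedure behind ESM File K. For a cost attribute, the convention is usually to negate the ratio so that willingness to pay is positive.

```python
from math import sqrt

def mrs(beta_k, beta_ref):
    """Units of the reference attribute traded for one unit of attribute k."""
    return beta_k / beta_ref

def mrs_delta_ci(beta_k, beta_ref, var_k, var_ref, cov=0.0, z=1.959964):
    """MRS with a delta-method confidence interval (first-order Taylor expansion)."""
    m = beta_k / beta_ref
    grad_k = 1.0 / beta_ref                 # d(MRS)/d(beta_k)
    grad_ref = -beta_k / beta_ref**2        # d(MRS)/d(beta_ref)
    var = grad_k**2 * var_k + grad_ref**2 * var_ref + 2 * grad_k * grad_ref * cov
    se = sqrt(var)
    return m, m - z * se, m + z * se
```

Treating 'YoL' and 'QoL' as continuous is what makes a single MRS per attribute possible; with dummy-coded attributes, a separate ratio would be needed for each level.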

To test the robustness of the models employed and the quality of the data, both DCEs (i.e., DCE-NoAddTax and DCE-AddTax) were re-estimated excluding respondents who completed the choice tasks in <10 min (i.e., approximately the median completion time).

To determine statistically significant coefficients and standard deviations (SDs), we used a significance level of 5%. All data analyses were performed in Apollo software [61,62,63].

3 Results

3.1 Respondent Characteristics

The survey was completed by 402 respondents via an online panel (201 in each DCE). The median completion time was 10.2 min (mean 14.3, SD 13.2) in the DCE subsample with five attributes (i.e., DCE-NoAddTax), and 11.4 min (mean 14.6, SD 14.0) in the subsample with six attributes (i.e., DCE-AddTax). Around 20% found the survey difficult or very difficult (NoAddTax: 18.4%; AddTax: 19.9%). Table 2 summarizes the sociodemographic characteristics of the respondents. The total sample was representative of the English population in terms of sex, age, region and ethnicity. The two DCE subsamples were fairly similar, with respondents in the 'NoAddTax' DCE reporting slightly more comorbidities and a slightly lower EQ-5D-5L index or visual analogue scale (VAS)-based quality-of-life score; however, none of these differences was statistically significant at the 5% level.

Table 2 Sociodemographic characteristics of the respondents

3.2 Choice Analysis

Results from the random-parameter panel mixed logit models are presented in Table 3, with all variables treated as categorical. Part-worth utilities are summarized in ESM File H. The root likelihood results indicated a good fit of the choice models to the respondents’ choices. In both choice experiments (DCE-NoAddTax and DCE-AddTax), the results were robust to excluding respondents who completed the survey in <10 min (104 responses under DCE-NoAddTax, and 128 responses under DCE-AddTax) [see ESM File I]. When the 'AddTax' attribute was not included, all coefficients had the expected sign and were statistically significant. The estimated SDs of most of the random coefficients were statistically significant (p < 0.05), indicating heterogeneity across respondents around the mean parameter estimates (see Table 3).

Table 3 Parameter coefficients from the mixed logit model

According to the mean RI scores (Fig. 4a), people in England assigned the highest values to 'YoL' (25.3%; 95% CI 22.5–28.6%), 'Exp' (25.2%; 95% CI 21.6–28.9%) and 'Size' (22.4%; 95% CI 19.1–25.6%), followed by 'QoL' (17.6%; 95% CI 15.0–20.3%). The 'Equ' attribute was the least important in our descriptive system (9.6%; 95% CI 6.4–12.1%). When the 'AddTax' attribute was present, ‘Equ’ was no longer statistically significant (2.8%; 95% CI −2.1% to 6.4%), and the importance attached to ‘AddTax’ was offset mainly by reductions in the importance of 'QoL' and 'Equ'. 'YoL' (25.3%; 95% CI 22.1–29.3%), 'Exp' (22.4%; 95% CI 18.6–26.6%) and ‘Size’ (20.8%; 95% CI 18.1–23.8%) remained the most important attributes, followed by 'AddTax' (16.5%; 95% CI 13.7–18.8%) and 'QoL' (12.1%; 95% CI 9.3–14.6%).

Fig. 4

Comparison of RI scores (a) RI scores with error bars (95% CI); (b) RI scores with QALY gain variable derived from DCE data and error bars (95% CI). RI relative importance, CI confidence interval, DCE discrete choice experiment, QALY quality-adjusted life-year. *Error bars (95% confidence interval)

When the 'QALY gains' variable was used in the DCE-NoAddTax model (ESM File J), all parameter estimates had the expected signs and were statistically significant. According to the RI scores (Fig. 4b), 'QALY gains' (36.1%; 95% CI 32.0–40.8%) and 'Exp' (26.6%; 95% CI 22.7–31.1%) were valued most, followed by 'Size' (26.4%; 95% CI 22.5–30.3%) and 'Equ' (10.9%; 95% CI 7.5–14.2%). When the 'AddTax' attribute was present, 'QALY gains' (32.4%; 95% CI 27.9–39.2%), 'Size' (26.2%; 95% CI 22.0–32.5%) and 'Exp' (25.1%; 95% CI 19.9–33.5%) remained the most preferred elements for the general public in England, but the 'Equ' parameter was not statistically different from zero. As in the main model, having to pay additional income taxes for a care programme appears to contribute to respondents’ choices (14.3%; 95% CI 10.4–19.4%), although not as much as the first three value elements.

ESM File K details the MRS estimates derived from mixed logit models treating 'YoL', 'QoL', and 'AddTax' as continuous variables. Statistically significant differences in MRS estimates expressed in terms of 'YoL' were observed between the two DCE subsamples (i.e., NoAddTax vs. AddTax) across all value attributes except for 'Size' and 'Equ'. For example, when the 'AddTax' attribute is present, individuals seem willing to trade nearly 1 year of life for a 1-point improvement in quality of life (MRS 0.86; 95% CI 0.85–0.88), whereas without 'AddTax', the willingness drops to merely 0.05 years for a 1-point quality-of-life improvement (MRS 0.05; 95% CI 0.049–0.05). When 'QoL' serves as the denominator, MRS estimates differ significantly for all attributes.

Finally, with regard to the preference independence principle, we found that some of the interaction terms were statistically significant (e.g., 'Size' and 'QoL'). We therefore ran mixed logit models allowing for correlation between the random parameters [69]; results were similar to those in Table 3, and no significant improvement in model fit was found. The corresponding RI scores led to the same attribute ordering. ESM Files L and M summarize the results of the models with interactions between attributes and the results of the correlated mixed logit.

4 Discussion

The findings of this study suggest that life-years and health-related quality of life are important elements of VBHC, therefore providing reassurance that the use of QALYs, as a bi-composite measure of value, in prioritization decisions in healthcare is meaningful and relevant [10]. However, our results suggest that the perceived value of healthcare is broader than a QALY as it extends to patient experience, and covers distributional aspects regarding the size of the benefitted population and its socioeconomic vulnerability. This broadened perspective aligns with several international and national trends in defining and assessing VBHC. The UK government, in particular, has incorporated patient experience and equity at the core of ICS priorities [29]. Previous literature has also shown that people value benefits of healthcare other than health benefits and are even willing to trade health gains for better care experiences [10, 14].

The six elements of value identified in this study align with the four pillars of VBHC proposed by the Expert Panel of the European Commission [70]. First, ‘Final or intermediate health outcomes’ and ‘Quality of life and well-being considerations’ align with personal value (pillar 1) as they address individual patients' goals. ‘Quality of care, patient experience or features of the process of care delivery’ relates to ‘societal value’ (pillar 2) as high-quality care and positive experiences may enhance social participation and connectedness. ‘Equity’ and ‘Size of the target population’ align with allocative value (pillar 3) by addressing social disparities and promoting equitable benefit distribution across all patient groups. Finally, ‘Cost’ ties into technical value (pillar 4) through the efficient use of resources. These six value elements also encompass the NHS England definition of VBHC: “the equitable, sustainable and transparent use of the available resources to achieve better outcomes and experiences for every person” [8].

Our findings identify patient experience as the second most important value element, in line with studies acknowledging its importance, although none of them focused exclusively on a 'Patient experience' attribute [14, 71, 72]. In a DCE study conducted in eight European countries, for instance, ‘continuity of care’ and ‘person-centeredness’ were used as measures of patient experience. In the UK results, respondents did not place these two attributes at the top of the preference ranking, but the sum of their RI scores suggests that people value patient experience more than physical functioning [14].

The relatively high value assigned to ‘Patient experience’ may reflect challenges in the process of care delivery (e.g., long waiting lists, lack of continuity of care). The coronavirus disease 2019 (COVID-19) pandemic might have played a role by increasing the number of patients waiting for treatment as well as people’s demand for better provider interaction [73]. Another explanation may be the rising number of people living with multiple conditions (more than one in four of the adult population in England) or with a long-standing health problem (37.7% in the UK). The healthcare needs of these patients are typically more intricate than those of healthier patients, requiring more complex and coordinated care [74, 75]. One could argue that the use of smiley/sad faces as the graphic for the patient experience attribute might have induced respondents to rely on heuristics (e.g., salience bias). However, similar graphics have been used successfully in other choice experiments [76], and our PPI group confirmed that the other icons were also accessible and allowed respondents to engage thoughtfully with the various attributes when making their choices. Regardless of the underlying reasons, the high RI of patient experience highlights the need to measure patient experience in the NHS and to provide these data to decision makers. However, patient experience is a broad and complex concept that refers to the entire care delivery journey and, as such, encompasses many dimensions [77]. It is therefore crucial to agree on a set of universal patient-reported experience measures and on standardized data collection for effective care model evaluation and monitoring. Ongoing initiatives to routinely collect data on continuity and coordination of care, responsiveness to patient concerns, the opportunity of care, and professional/patient communication should be strengthened [78].

‘Quality-of-life improvements’ were valued less than ‘Additional years of life’, adding to the voices that question the use of QALYs in certain decision-making contexts. QALYs overlook individual preference variations and distributional concerns. Hence, a care programme that yields a 0.8 QALY gain is treated as equal in value to any other care programme that yields a 0.8 QALY gain, regardless of whether the gain comes from longer life or from better quality of life. Our findings suggest that care models that increase life expectancy at lower levels of quality of life may generate more value to the population than care programmes that achieve the same QALY gain through larger improvements in quality of life and smaller gains in life expectancy. This is in line with studies showing that, from a societal perspective, some QALY gains due to improvements in longevity are more valuable than QALY gains associated with improvements in health-related quality of life [79, 80].

Regarding equity, our study indicates that it ranked lowest in value, especially when healthcare cost was factored in, which aligns with previous research showing a preference for efficiency or effectiveness over equity [81, 82]. The relatively low value assigned to equity contrasts with the high value given to the size of the target population, suggesting that the public favours spreading healthcare benefits across more people over targeting them at the most vulnerable populations. At first glance, this does not conform to John Rawls’ difference principle, under which social and economic inequalities must be arranged so that they are to the greatest benefit of the least advantaged [83]; however, this result may also reflect a society that recognizes the NHS as a universal health system under which all residents are assured access to healthcare. If so, people may place a higher value on universal access to healthcare, and therefore on care programmes that offer high population coverage.

Regarding healthcare costs, our results suggest that people care about the additional healthcare costs that a care programme entails. Although this attribute did not receive the highest value and its inclusion did not alter the ordering of preferences, it reduced the importance attached to the ‘Quality of life improvements’ and ‘Target population’ attributes by almost the same proportion, and had a ‘crowding-out’ effect on the ‘Equity’ attribute. Differences in the MRS between the two DCE subsamples also confirmed the impact of healthcare costs on people’s preferences. These results might be partially explained by the cost-of-living crisis currently being experienced in England [84].
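The discussion does not restate how RI and MRS were computed. A common approach in DCE analysis, assumed here purely for illustration, takes each attribute's RI as its absolute coefficient multiplied by the range of its levels, divided by the sum of these spans across attributes, and the MRS as the ratio of two coefficients (with a sign flip when cost is the numeraire, giving willingness to pay). The coefficients, level ranges, and cost label below are hypothetical and are not this study's estimates.

```python
# Hypothetical DCE coefficients, invented for illustration only (NOT this study's
# estimates). Each entry: (utility coefficient per unit, range of levels in the design).
COST = "Healthcare cost (extra income tax, GBP/year)"
attributes_with_cost = {
    "Patient experience": (0.30, 2.0),
    "Quality of life improvements": (0.10, 3.0),
    "Target population": (0.15, 2.0),
    "Equity": (0.20, 1.0),
    COST: (-0.004, 100.0),
}
# Same coefficients minus cost; a real split-sample design estimates this set separately.
attributes_without_cost = {k: v for k, v in attributes_with_cost.items() if k != COST}


def relative_importance(attrs):
    """Range method: |beta| x level range, as a share of the total across attributes."""
    spans = {name: abs(beta) * level_range for name, (beta, level_range) in attrs.items()}
    total = sum(spans.values())
    return {name: span / total for name, span in spans.items()}


def mrs(attrs, attribute, numeraire):
    """Marginal rate of substitution under linear utility: beta_attribute / beta_numeraire,
    i.e. units of `numeraire` judged equivalent to one unit of `attribute`."""
    return attrs[attribute][0] / attrs[numeraire][0]


def willingness_to_pay(attrs, attribute, cost_name):
    """MRS against the cost attribute; the minus sign turns a valued gain into a positive amount."""
    return -attrs[attribute][0] / attrs[cost_name][0]


for label, attrs in [("with cost", attributes_with_cost), ("without cost", attributes_without_cost)]:
    shares = relative_importance(attrs)
    print(label, {name: round(share, 3) for name, share in shares.items()})
```

Because the cost span enters the denominator, adding cost here lowers every other share by the same factor. When the two subsamples are estimated separately, as in a split-sample design, individual shares can instead shift by different amounts, which is how a ‘crowding-out’ effect on a single attribute such as ‘Equity’ can arise.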

Our results suggest that a decision framework is incomplete when preferences for the monetary attribute are ignored. Nevertheless, including cost in the value metric reduces the relevance of the other elements of value, some more than others, and it can also be argued that cost is not a value element in itself [22, 23, 24]. Both approaches, with and without cost, may be relevant and useful to decision makers. On the one hand, the relative weights derived in this study can be used to calculate a cost-per-value ratio, with the denominator based on multi-attribute benefit scores (i.e., using the results obtained when costs were excluded). An MCDA could be used for this purpose, together with routinely collected data to obtain performance scores [30, 31]. However, as the elements of value have been broadly defined, decision makers would need to decide on the most appropriate performance indicators to operationalize each attribute (MCDA criterion), and, to avoid potential bias or inconsistency, it would be crucial to justify the choice of these indicators, taking into account the specific health condition or type of health intervention being assessed [85]. This approach would make the efficiency of the care programmes under evaluation more explicit. On the other hand, the relative weights of the six elements of value can be used to obtain a single composite measure for all competing alternatives, making the contribution of each element more explicit [10]. Under this approach, decision makers could also rank several competing care programmes to inform budget allocation decisions [19].
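The two uses of the weights can be sketched with a weighted additive MCDA: each programme's benefit score is the sum of weight times normalized performance score, the cost-per-value ratio divides its additional cost by that score, and sorting by either quantity yields a ranking. The weights, the 0-100 performance scores, the costs, and the programme names are all hypothetical; in practice the weights would come from the DCE without cost and the scores from routinely collected indicators chosen by decision makers.

```python
# Hypothetical additive MCDA sketch; weights, scores, costs and names are invented.
# Weights are assumed to come from the DCE run without the cost attribute.
WEIGHTS = {
    "Patient experience": 0.30,
    "Additional years of life": 0.25,
    "Quality of life improvements": 0.15,
    "Target population": 0.20,
    "Equity": 0.10,
}

# Performance scores are assumed to be already normalized to a common 0-100 scale.
programmes = {
    "Programme 1": {"cost": 500.0, "scores": {"Patient experience": 80, "Additional years of life": 40,
                    "Quality of life improvements": 60, "Target population": 50, "Equity": 30}},
    "Programme 2": {"cost": 800.0, "scores": {"Patient experience": 50, "Additional years of life": 70,
                    "Quality of life improvements": 40, "Target population": 80, "Equity": 60}},
    "Programme 3": {"cost": 400.0, "scores": {"Patient experience": 60, "Additional years of life": 50,
                    "Quality of life improvements": 70, "Target population": 40, "Equity": 90}},
}


def value_score(scores, weights=WEIGHTS):
    """Weighted additive value: sum over criteria of weight x normalized score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def cost_per_value(programme):
    """Additional cost divided by the multi-attribute benefit score (lower is better)."""
    return programme["cost"] / value_score(programme["scores"])


# Composite-measure view: highest value score first.
by_value = sorted(programmes, key=lambda n: value_score(programmes[n]["scores"]), reverse=True)
# Efficiency view: lowest cost per unit of value first.
by_efficiency = sorted(programmes, key=lambda n: cost_per_value(programmes[n]))

for name in programmes:
    p = programmes[name]
    print(f"{name}: value = {value_score(p['scores']):.1f}, cost per value unit = {cost_per_value(p):.2f}")
print("Ranked by value:", by_value)
print("Ranked by cost per value:", by_efficiency)
```

In this invented example the two orderings differ, which illustrates why the choice between a composite measure and a cost-per-value ratio matters. The weighted sum also presumes that the criteria are preferentially independent, so it is defensible only when that property holds.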

4.1 Strengths and Limitations

The main strength of this study is the separation of cost from value, with cost expressed in terms of additional income taxes. To our knowledge, this is the first choice experiment using a split-sample design that expresses costs in terms of an income tax increase, adding to the literature in the field. Previous studies have defined the monetary attribute in terms of out-of-pocket expenditure or changes in insurance premiums. In principle, this suggests that the type of payment vehicle used to define the monetary attribute may not affect choice preferences in health [26]. The second strength of this study is the use of interviews, the literature, and decision theory to identify a comprehensive yet practical set of value elements. The suggested elements of value not only encompass the NHS England definition of VBHC but can also be operationalized using routinely collected data. Local healthcare commissioners in England could use the relative weights that we derived in this study to systematically assess new models of care using an MCDA approach. In particular, an additive MCDA model, which is easy to communicate to decision makers, can be employed to assess new care models, as the elements of value identified in this study comply with the axiomatic properties of decision theory [85]. This adds to efforts to incorporate public values into the local priority-setting and resource-allocation process [86]. Another strength lies in the rigorous methodology followed to conduct the DCEs, which included two pilot studies, involvement of PPI representatives, and quality checks to increase internal validity.

This study also has several limitations. First, the semi-structured interviews were conducted with managers and commissioners and did not include carers, clinicians, or patients. Second, we did not elicit the preferences of different stakeholder groups for the elements of value. However, the sample was representative of the English general population, and EQ-5D-5L health-state values are derived from the general population’s preferences through choice experiments. In addition, a latent class analysis of a similar DCE concluded that the preferences of the public, patients, payers, and healthcare providers for value elements of integrated care were not statistically significantly different [14]. Third, respondents received a small payment for participating, which might bias the sample if some individuals participated solely for financial gain rather than out of interest in the survey topic. However, evidence suggests that the effect of incentives on response quality can vary depending on the type of survey and the amount of the incentive [86–88], and we used multiple quality checks when collecting the data. Another limitation lies in the review’s focus on MCDA in healthcare, rather than on the broader value-based care literature. This may have biased the selection of value elements towards those highlighted in the MCDA literature, rather than reflecting a more targeted value-based care perspective. However, although not exhaustive, the combination of interviews and a systematic literature review helped create a comprehensive list of potential VBHC elements. A final limitation concerns the relative weight estimated for ‘QALY gains’, as it was constructed from the stated preference dataset. Future studies could explore the extent to which preferences for QALY gains differ when they are elicited from a choice experiment that includes QALY gains directly in the descriptive system.

5 Conclusions

Although preferences for maximizing population-level health gains remain highly relevant, our results highlight the importance of elements of value that are not included in traditional health economic evaluations because they are not captured by conventional value metrics, such as the QALY. The relatively high importance assigned to patient experience stresses the need to collect data on this dimension and to adopt economic evaluation approaches that account for the multiple elements of VBHC. Healthcare decision makers should also be cautious when including cost in a multi-composite value metric, as it may reduce the RI of other elements of value, such as equity.