7.1 Introduction

Previous chapters have provided an overview of the EQ-5D-5L value sets produced to date. Taken together, these value sets – and the methodological development which underpins them – constitute a very substantial body of work. The availability of EQ-5D-5L value sets has facilitated the use of EQ-5D-5L data collected from patients around the world for a variety of purposes. Primarily, these value sets are aimed at supporting the estimation of Quality Adjusted Life Years (QALYs) and QALY gains from health care for use in cost effectiveness and cost utility analysis, providing evidence to inform health technology assessment (HTA) processes globally. Additionally, the value sets allow the use of EQ-5D-5L in other applications, such as monitoring population health (in both the general and patient populations) where there is a requirement to summarise EQ-5D profile data, focussing on those aspects of health that are considered to be most important by society (see Chap. 5).

The production of these EQ-5D-5L value sets, coordinated by the EuroQol Group, represents a unique endeavour in scale and breadth, unprecedented in the preference-weighting of other measures of health-related quality of life (HRQoL). It has improved on the earlier EQ-5D-3L valuation efforts, which were largely researcher-driven, used protocols that were not always fully documented, and consequently had limited comparability because of differences in methods and protocols. In contrast, the EQ-5D-5L valuation studies have been based on a similar and well documented protocol for collecting data that is carefully managed in accordance with agreed metrics and includes a deliberate process for incremental improvement of the protocol. The high standards applied in developing the protocol and in the application of quality control in its use have resulted in a protocol (see Chap. 2) that has been successfully replicated in many different contexts. This suggests that a new level of maturity in valuation approach has been reached, and that the techniques used reflect modern best practice in the health valuation field.

While the EQ-5D-5L valuation effort already has significant global coverage, further EQ-5D-5L value sets are planned or underway (for example, in the Middle East and Africa where such studies are relatively few), reflecting continued growth in use of the instrument. The development of universal health care systems around the world (for example, in China and Mexico) will further reinforce the demand for evidence on ‘value for money’ to support the allocation of resources in publicly funded health care systems. This is likely to result in continued demand for use of the EQ-5D-5L and its accompanying value sets in both existing and new contexts.

The purpose of this chapter is to reflect on the future of EQ-5D-5L valuation studies, beyond the value sets summarised in Chap. 4. This includes a number of linked themes.

First, the EQ-5D-5L valuation project has allowed continued evolution in methods, as methodological studies have demonstrated that aspects of the protocol could be strengthened or improved. This chapter will describe some of the key candidates for future refinement of the methods.

Second, while the standardisation of the methodology is important, it is anticipated that many countries may seek a less resource-intensive, but still rigorous version of the valuation protocol. We outline progress towards developing a ‘lite’ version of EQ-VT. This discussion also covers the development of a stand-alone discrete choice experiment (DCE) protocol, with its strengths and weaknesses relative to the ‘gold-standard’ approach described in previous chapters.

Finally, it is worthwhile considering the shelf-life of value sets. As time progresses, pre-existing studies become increasingly unreliable estimates of what a contemporary study would report as the ‘average’ preferences of a society, due to methodological improvements, changes in the demographic makeup of the population, and preference shifts caused by broader cultural trends that may manifest in how people consider HRQoL and its value relative to life extension. More broadly, there are questions about who should make judgements about value sets, e.g., who decides when a new value set is needed? Similarly, who decides whether it is the general public (however defined) or some other group whose preferences are relevant? And who should judge whether any given value set is acceptable for use? What is the role and responsibility of the EuroQol Group versus local HTA bodies or other bodies?

7.2 Future Directions for Improvements in the EQ-VT – An Overview

As has been demonstrated in Chap. 2, significant work has gone into ensuring that the EQ-VT protocol is a reliable and defensible method for the valuation of EQ-5D-5L health states. EQ-VT is a living product which will continue to evolve. Any concern that has been expressed or that will be expressed regarding the methods adopted in the EQ-VT protocol can act as a catalyst to further research and development and to inform and shape future methodological choices. Some key areas for future progress are described below. Before discussing these, it is important to point out that changing the EQ-VT protocol necessarily involves balancing the improvements in data that may arise from incorporating enhanced methods against the reduction in consistency and comparability between value sets. Each advance to the EQ-VT protocol needs to lead to demonstrably better data, ideally in multiple methodological studies in a multinational context. Given the level of existing work to refine the EQ-VT approach, as described in Chap. 2, this sets a high bar for change.

The principal questions concerning the future directions of EQ-VT are in effect the same questions that confront any stated preferences study for any HRQoL instrument, namely: (i) what method(s) to use to elicit stated preferences, using what mode of administration; (ii) what study design to use (what sample size is required; and what sub-sample of states to include in stated preference tasks); and (iii) what modelling approaches to use to interpolate values across the descriptive system for the HRQoL instrument.

7.2.1 What Methods to Use?

The choice to include both time trade-off (TTO) and DCE methods, made early on in the programme of work (see Chap. 2), reflected both the growing popularity of DCE methods in health economics and the long-standing role of TTO in providing evidence to support QALY estimation – and the lack of consensus in health economics about any one method being optimal.

Despite the widespread acceptance of TTO, and the leading place it has earned among EQ-5D valuation methods, there are nevertheless remaining issues with TTO and the variant of it used in EQ-VT, the composite TTO (cTTO). As with any TTO approach, the cTTO tasks in the EQ-VT protocol necessarily incorporate methodological choices e.g., about the iterative routing process used to achieve the point of indifference; and about the duration of the states being valued (see Chap. 2 for more detail). Each of these choices has the potential to exert a framing effect on the values which are produced and might be challenged. For example, a 10-year duration for all states to be valued is very widely used and has come to be regarded as standard, but that duration might be considered an arbitrary choice, and it is likely that the observed proportional trade-offs would differ if alternative durations were employed (Stalmeier et al. 2007; Craig et al. 2018; Jonker et al. 2018; Attema and Brouwer 2014). The use of a 10-year duration is known to encounter issues with violations of constant proportionality and with the difficulty of imagining states (especially severe ones) over such a long period, without relief. The use of cTTO also involves the use of different tasks for obtaining values > 0 (the conventional TTO) and < 0 (a lead time TTO task) (Devlin et al. 2011; Janssen et al. 2013). The use of different methods for obtaining values across the scale raises questions about the comparability of values above and below 0. The particular design of the task for states < 0 sets the minimum observable value at −1 by design, which has the appeal of avoiding the likely need for rescaling of values. However, it also raises the question of whether −1 is the lowest meaningful value possible and, if values less than that exist, how to reflect that (e.g., in modelling). These and other issues will remain the subject for future research.
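To make the two halves of the cTTO scale concrete, the following is a minimal sketch of how a point of indifference might be converted into a value. It assumes a 10-year lead time in full health in the lead time task, a detail not stated in this chapter; the function and parameter names are hypothetical and are not part of EQ-VT.

```python
def ctto_value(years_full_health, worse_than_dead,
               duration=10.0, lead_time=10.0):
    """Convert a point of indifference into a value on the cTTO scale.

    years_full_health: years in full health (Life A) at which the respondent
    is indifferent to Life B. The lead_time default is an illustrative
    assumption, as are all names used here.
    """
    limit = lead_time if worse_than_dead else duration
    if not 0.0 <= years_full_health <= limit:
        raise ValueError("point of indifference outside the task range")
    if worse_than_dead:
        # Lead time TTO: x years in full health is judged equal to
        # lead_time years in full health followed by `duration` years in
        # the state, so x = lead_time + duration * v.
        return (years_full_health - lead_time) / duration
    # Conventional TTO: x years in full health is judged equal to
    # `duration` years in the state, so x = duration * v.
    return years_full_health / duration


print(ctto_value(5.0, worse_than_dead=False))  # 0.5
print(ctto_value(0.0, worse_than_dead=True))   # -1.0
```

Under these assumptions the lowest observable value equals minus the ratio of lead time to duration, which illustrates why the −1 floor is a property of the task design rather than of respondents' preferences.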

DCE methods have the appeal of presenting respondents with a potentially simpler choice task, allowing the rapid collection of large quantities of stated preferences data via online self-completion. However, the DCE tasks as included in the EQ-VT protocol have the limitation that they produce values on a latent scale. When the protocol was initially established, DCE approaches that allow calibration of the values relative to ‘dead’ were still in an early stage of development and were rejected, mainly because results obtained when the methods were tested varied a lot for reasons that were poorly understood. However, research done in recent years has put these initial results into perspective, revealing a dependency of values derived from the DCE-duration approach on modelling choices, design specification and the interdependencies between the two (Lim et al. 2018; Jonker et al. 2018; Jonker and Bliemer 2019). This seems to have brought closer a future in which DCE can reach more of its potential and play a larger role in valuation studies of EQ-5D instruments. To some extent, this can already be seen in the valuation protocol for EQ-5D-Y, where DCE plays a bigger role (Ramos-Goñi et al. 2020).

7.2.2 Procedural Aspects

Similarly, there is ongoing attention to various procedural aspects of valuation studies. A key one is the basis for decisions about the number of health states and choice tasks to be included in the valuation tasks. It is important to select health states and pairs which allow unbiased estimation of coefficients based on whichever functional form is required. Yang et al. (2018, 2019) advanced the field by showing just how much the statistical properties of the set of health states/pairs matter to the predictive performance of the designs, and demonstrated that many published ways to select health states were suboptimal (including popular designs used to value EQ-5D-3L) and that by contrast the design used in EQ-5D-5L valuation studies performed well in comparison to alternative approaches. In the statistical approach to create a design for valuing EQ-5D-5L, the functional form, design, and sample size were considered in parallel. A large number of candidate designs was created using random draws, and the performance of these designs was evaluated using a given model (main effect) and priors derived from pilot studies, and the best one was kept (Oppe and van Hout 2017) (see Chap. 3 for more details). However, scope for improvement may still exist as we do not yet know how larger designs perform, and what number of observations per state is optimal. Moreover, Yang et al. (2019) showed that accurate prediction of the value of mild states is especially challenging and that some designs that perform well overall, perform poorly with respect to the value of mild states. This in turn calls for more attention on the models too.
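The random-draw search described above can be sketched in a few lines. This is an illustration only: the evaluation criterion used here, a simple level-balance score, is a stand-in chosen for brevity and is not the model-and-priors criterion of Oppe and van Hout (2017). The design size, the number of candidates and all function names are hypothetical.

```python
import itertools
import random

DIMENSIONS, LEVELS = 5, 5  # EQ-5D-5L: five dimensions, five levels each


def level_imbalance(design):
    """Stand-in criterion: for each dimension, the gap between the most and
    least frequent level, summed over dimensions (0 = perfectly balanced)."""
    total = 0
    for d in range(DIMENSIONS):
        counts = [sum(1 for state in design if state[d] == level)
                  for level in range(1, LEVELS + 1)]
        total += max(counts) - min(counts)
    return total


def random_search(n_states, n_candidates, seed=0):
    """Draw candidate designs at random and keep the best-scoring one."""
    rng = random.Random(seed)
    all_states = list(itertools.product(range(1, LEVELS + 1),
                                        repeat=DIMENSIONS))
    best, best_score = None, None
    for _ in range(n_candidates):
        candidate = rng.sample(all_states, n_states)  # states without repeats
        score = level_imbalance(candidate)
        if best_score is None or score < best_score:
            best, best_score = candidate, score
    return best, best_score


design, score = random_search(n_states=25, n_candidates=200)
print(len(design), score)
```

In a real application the scoring function would be replaced by one that reflects the intended functional form and prior parameter values, which is where the interplay between functional form, design and sample size noted above comes in.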

Questions also exist about the mode of data collection – debate over which was fuelled by the COVID-19 pandemic and the resulting disruption to face-to-face interviewer administration of EQ-VT, as described in Chap. 2, in countries which had been planning value set studies. This gave rise to the idea of conducting EQ-VT interviews online – i.e., interviewer-guided, rather than self-completed, but conducted via an online platform rather than face-to-face. Initial experimentation suggested online data collection to be feasible; to enable reasonable respondent engagement; and to yield data that appears to be of acceptable quality (Lipman 2020). Online interviews may even have some advantages e.g., in reaching respondents from broader geographic areas; in reducing costs of interviewer travel; and allowing use of ‘expert interviewers’ who do not need to be based physically in the same region or even the same country as respondents. However, there are also potential disadvantages e.g., in reaching respondents who lack internet access. Further, caution is required as there may be important differences between the preferences obtained from each mode of administration. Further evidence is required to establish the equivalence of data obtained via online administration.

7.2.3 Analysis and Modelling

While the EuroQol Group has been prescriptive about the use of its protocol for study design and elicitation, local research teams have a choice about other analyses to undertake, which modelling methods to use and about the criteria to use when choosing which algorithm is regarded as the preferred one. As we have shown in Chap. 4, modelling practice varies widely, but the common underlying protocol nevertheless facilitates comparison of resulting values and value sets between countries.

In particular, value sets differ in regard to whether they base their preferred value set on cTTO data only (for example, China and US), or a hybrid of cTTO and DCE (for example, England and Denmark) (see Chap. 4). Such differences reflect both scientific and strategic issues. Strategically, in some countries HTA bodies have expressed a preference for TTO-based values, and this is reflected in the choice of modelling approach taken to value sets. Scientifically, as is the case when competing approaches are taken to measurement, there is ongoing uncertainty about whether the cTTO and DCE are measuring the same thing, and what should be made of inconsistency between them. For instance, recent work has suggested differing relative importance of dimensions between cTTO and DCE in Peru and Mexico (Augustovski et al. 2020; Gutierrez-Delgado et al. 2021). Going forward, any disagreement in values derived from DCE and cTTO tasks needs to be reviewed carefully, in relation to the level of conceptual resemblance between cTTO and DCE, assumptions used in both methods (including modelling assumptions), and scope for implementation issues to arise.

As we survey the future of EQ-5D value set development, we are cognisant that there will always be methodological questions; this is part of the inquisitive nature of science and good science depends on scientific debate. Such questions can lead to different responses: either to strengthen the methods currently in the protocol or to investigate new methods. As long as no method exists that commands universal support – which is likely to be the case here since we have no external validation to judge – any methodological question will fuel debate and can lead to either type of response. The research and development investment of the EuroQol Group in recent years has mainly focussed on refinement of the methods included in EQ-VT, as described in Chap. 2. However, other methods development has also been supported and the EuroQol Group continues to be open to alternatives, both from within the membership and from the broader and vibrant community of health preference and valuation researchers.

The use of TTO over so many years means we have a considerable evidence base to support its use. This has raised the bar for other methods as well, requiring very considerable evidence on their performance and the properties of the preference data they yield, before they can be considered a candidate for use. This is particularly apparent in our cautious approach to DCE, where an ambitious programme of research is underway to yield a deep understanding of its use in valuing EQ-5D instruments. This is good scientific practice – but is also strategically important, as stakeholders have a lot riding on their use of EQ-5D data and value sets. No transition can be made lightly, and the level of maturity reached in the EQ-VT protocol is difficult to match. The EuroQol Group is committed to progressing the science around valuation and to ensuring that evidence supports a new generation of methods fit for purpose in the future.

7.3 Developing Alternative Approaches and Answering Different Questions

The EQ-5D-5L exists in a dynamic environment, both in terms of the methods that can be used to develop value sets, and the empirical questions it can help to solve. This ever-changing context continues to present new challenges. The development of a ‘Lite’ protocol, a lighter, less resource-intensive EQ-VT (as described in Chap. 3), is a good example of this. As we move into more resource-constrained settings, we need to reduce the cost of conducting valuation surveys, and to make the undertaking of such work more accessible to those who bring essential local knowledge, context and contacts, but relatively less experience in the more technical aspects of the work. But, if we progress down this path, we do not yet know the impact of switching protocols, which calls for caution and careful comparative evaluation.

Either as part of the Lite valuation or not, the configuration of the DCE is an important ongoing consideration. DCEs that include comparisons of states with ‘dead’ have the appeal of being simple; but DCE with duration arguably conceptually resembles TTO to a greater extent, which may be considered an advantage (Mulhern et al. 2014). This potential advantage was recognized when the EQ-VT protocol was developed, but it was coupled with concerns about the low values that were obtained in some initial applications. Stolk et al. (2019) suggest these results arise because of the difference between DCE with duration and cTTO: the latter observes values and uses lead time TTO to assess the strength of preferences for health states that are classified as worse than dead. In contrast, the DCE with duration task never indicates directly whether a health state has a value worse than dead. It also relies on extrapolation – and this comes with extra uncertainty and the potential for bias if the underlying assumptions are wrong. Evidence suggests that values obtained by DCE with duration are sensitive to model specification and in particular to assumptions made regarding time preferences. Models applied to cTTO rely on the assumption of constant proportionality, which may not hold. However, violations of this assumption can be a bigger problem for DCE with duration than for cTTO, because of the required extrapolation in the former. These issues with DCE with duration are an ongoing area of methodological research.

Quantitative approaches to valuing EQ-5D-5L are valuable and will always remain a centrepiece of value set development within the EuroQol Group. However, there is a growing literature focused on greater reflection and deliberation by respondents (Robinson and Bryan 2013; Devlin et al. 2019; Karimi et al. 2017, 2019). This line of enquiry is potentially extremely valuable in identifying why respondents place value on certain aspects of health, and also in minimising the risk from datasets being contaminated with ill-considered or hasty responses.

7.4 Making Scientific and Social Value Judgements About Value Sets

As discussed in Chap. 5, users of value sets should consider both the inherent scientific quality and the underlying social value judgements that value sets embody. Indeed, community decision makers are becoming more active in independently scrutinising value sets and applying their own quality assurance – for example, the England EQ-5D-5L value set, which was part of the first wave of studies, was subject to a formal review by the Department of Health for England (Hernández-Alava et al. 2020; van Hout et al. 2020) and ultimately rejected for use by the National Institute for Health and Care Excellence (NICE) (NICE 2019). This has led to efforts (currently underway) to produce a new, UK-wide value set. More generally, the question remains of who is responsible for value set endorsement – is this a case of ‘caveat emptor’ i.e., is it ultimately the responsibility of users and decision-making bodies, or is there a role for the EuroQol Group? To date, other than allowing use of EQ-VT and monitoring data collection via quality control, the EuroQol Group has not imposed any process for approving (or not) the value sets modelled from EQ-VT data.

This question is particularly pertinent in settings where value sets have been developed using methods which are quite different from those recommended by the EuroQol Group at the time. For instance, EQ-5D-5L value set studies using different methods to elicit the stated preferences of the general public have been conducted in the US (Craig and Rand 2018) and New Zealand (Sullivan et al. 2019). These value sets are not reported in this book, as our focus is on value sets produced using the EQ-VT protocol. Similarly, there is an emerging body of work examining the preferences of patients, rather than the general public – an example of a value set based on these ‘experienced’ values can be found in Burström et al. (2020) for Sweden. Such studies offer interesting methodological comparisons and can, under particular circumstances, be used in those countries. However, the differences in methods used in such cases mean that comparisons of the EQ-5D-5L values yielded by them with the value sets reported in Chap. 4 should be treated with caution, as these differences are attributable to both different local preferences and methodological differences, which are impossible to disentangle.

Moving away from scientific judgement of value sets, the social values that underpin the use of each are potentially important. While value sets are most commonly developed using the adult general population, this is defined differently in different countries – for example, in Japan and Taiwan this is considered to be those over 20 years of age; more commonly it is interpreted to be those over 18 years, while in some countries, such as Indonesia, this was set at 17 years and older (see Chap. 4 for details). The views of younger adolescents and children are typically excluded from such studies (see Footnote 1). While the merits of such exclusion in the valuation population can be discussed, a key issue is how we define the age threshold. At what age do we define a person to have transitioned into adulthood and to be able to complete the cognitively challenging valuation tasks we use? And are we imposing age criteria for practical reasons (e.g., with respect to comprehension and data quality), ethical reasons (concerns about confronting younger people with life/death trade-offs) or philosophical/normative reasons about whose preferences should determine public policy – or a combination of all three? To the extent that age impacts on preferences, this can have significant implications for decision making in practice. It could be argued that such determinations are best made by the users of the value set themselves. The appropriate method for engagement on such topics is likely to be context-specific, and will yield different decisions, impacting the comparability of the value sets between nations. This trade-off between consistency and tailoring to the local context is an ongoing challenge.

7.5 Adapting to Change

Previous value sets for the EQ-5D-3L have remained in use and accepted by policy makers for long periods of time e.g., the UK MVH value set (Dolan 1997), data for which were collected in 1993/94, and which NICE continues to recommend while awaiting a new EQ-5D-5L value set for the UK. This raises the question of what the shelf-life is of such value sets, and what factors might prompt the need for new value sets, bearing in mind both the potential benefits of updated values and the costs of producing them.

Samples are recruited to be representative of the general public at the point at which data are collected, and value sets represent the average preferences of society. Over time, the socio-demographic composition of populations changes due to population ageing, trends in fertility rates and patterns of immigration. These changes could be expected to lead to changes in the average preferences of the general public, if this means that the share of sub-groups in the population with different preferences changes. Perhaps less obviously, changes in the proportion of the population who are very elderly and more likely to be in residential care, or who are incarcerated in prisons or living in other institutions, may also be important, since these people often fall outside the sample frames used to recruit the general public. Such changes might indicate the need for a new value set. An alternative would be to use population weights to account for such changes, but this would rely on appropriate demographic data collection during the initial value set development, which would be challenging as we do not know in advance the population demographics we would want to weight on.
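The population-weighting alternative can be illustrated with a minimal post-stratification sketch. The age bands, sample counts and population shares below are invented for illustration, and the function names are hypothetical; the sketch assumes that the stratifying variable was recorded for every respondent in the original study.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each stratum = current population share / sample share."""
    n = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        if count == 0:
            raise ValueError(f"no respondents in stratum {group!r}")
        weights[group] = population_shares[group] / (count / n)
    return weights


def weighted_mean(values_by_group, weights):
    """Mean of individual values, each weighted by its stratum weight."""
    numerator = sum(weights[g] * v
                    for g, values in values_by_group.items() for v in values)
    denominator = sum(weights[g] * len(values)
                      for g, values in values_by_group.items())
    return numerator / denominator


# Invented figures: the original sample under-represents the oldest group
# relative to the (hypothetical) current population.
sample_counts = {"18-44": 50, "45-64": 30, "65+": 20}
current_shares = {"18-44": 0.4, "45-64": 0.3, "65+": 0.3}
weights = poststratification_weights(sample_counts, current_shares)
print(weights)
```

Reweighting of this kind only adjusts for characteristics that were measured at the time, and it cannot represent groups that were outside the original sample frame, which is the limitation noted above.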

Changes in preferences could provide another reason for updating value sets and may arise due to other factors influencing society. For example, over time living standards, health and HRQoL have improved for many people, and this may increase our expectations about health and health care in ways that affect our preferences for HRQoL. There may also be specific health issues locally that exert an effect on preferences. One might speculate about whether the high-profile debates over euthanasia that have occurred in a number of countries might affect the trade-offs the general public were prepared to make against dead/duration. In Mexico, the relatively high importance placed on problems with mobility has been suggested to be linked to the widespread lack of support or social services for those with mobility problems (Gutierrez-Delgado et al. 2021). In general, increasing awareness of mental health issues may affect how people consider these health issues and their importance relative to other health problems. The COVID-19 pandemic, and its global impact, could also potentially exert an effect on how people value HRQoL. However, there is a lack of research on such factors and very little clear evidence on how they affect stated preferences.

These issues suggest a rationale for updating value sets from time to time – but there are currently no guidelines about this, and no consensus about what factors or prima facie evidence should trigger an update. One possibility may be to conduct a less expensive survey, such as a DCE, at regular intervals with updated sampling frames to monitor if there is evidence of preference shifts which might motivate conduct of a replication EQ-VT study to accurately capture the shift.

Further, the benefits of updating a value set need to be weighed up against the costs. These include not just the costs of producing a new value set but the costs and consequences for their use in decision making. For example, HTA bodies may be concerned about changes to the HRQoL values used in cost effectiveness evidence and the implications of these for consistency of their decisions. In economists’ terms, these changes impose costs of their own, so updating may need to be balanced against these pragmatic and operational considerations.

7.6 Concluding Remarks

The national value sets for EQ-5D-5L summarised in this book play a vital role in supporting the use of EQ-5D-5L data, providing evidence for HTA and other health care decision making contexts. The EQ-VT protocol used to produce these value sets can now be considered to represent a mature and well-tested set of methods. However, there will always remain questions relating to which methods for eliciting and modelling values for HRQoL are best – and this is the case both for EQ-5D-5L and other HRQoL instruments. The EuroQol Group actively encourages and supports innovative research and development into valuation methods and is a leading investor in such research internationally. This ensures that there is scope for researchers to develop and explore potential new methods, and a process for assessing the case for their inclusion in the protocol in future. These efforts not only benefit studies to value EQ-5D-5L, but also inform the wider scientific agenda on valuation of HRQoL instruments.