Introduction

Like other decisions, medical decisions often involve trade-offs between gains and losses in different domains. In health economics, an important trade-off is that between length and quality of life (QoL), including in the context of health state valuations. Research in behavioral economics and psychology has established that in such trade-offs, losses typically carry more weight than gains of the same size. This sensitivity to losses is referred to as loss aversion [1, 3]. Recently, scholars demonstrated the importance of loss aversion within the health domain, both for life duration [4,5,6,7] and QoL [7,8,9]. In health economic analyses, utilities are often defined as a product of these two attributes, jointly comprising Quality-Adjusted Life Years (QALYs) [10]. Commonly, the utility function over these two attributes is decomposed into separate utility functions over life duration and QoL. This separability of QALYs is, however, only possible under several assumptions, which have so far been tested only under conditions in which no distinction is made between gains and losses [11].

Here, we use prospect theory (PT), which incorporates loss aversion and judges changes from the perspective of some relevant reference point (RP). Bleichrodt and colleagues [11] established that, when considering multi-attribute outcomes, such as QALYs, gains and losses may be determined per attribute with separate attribute-specific RPs. This also makes it possible to quantify loss aversion, i.e., to see how much more weight losses carry than gains. Earlier attempts at quantifying loss aversion under PT have typically focused on single attributes within the QALY framework, for example by obtaining loss aversion for life duration while holding QoL constant [4, 5] or vice versa [8]. Although these studies produced similar median estimates of loss aversion, with health losses receiving between 1.5 and 2 times more weight than gains, they did not address the issue of separability. In other words, these studies ignored the possibility that loss aversion for one attribute (e.g., length of life) depends on the level of the other attribute (which is typically held constant) and, hence, assumed loss aversion for health outcomes to be constant, independent of their QALY profile.

However, it could be the case that some QALY losses carry more weight relative to commensurate QALY gains than others, for example if loss aversion is more pronounced for more severe health states. In this article, we test the assumption that loss aversion is independent of QoL, using a non-parametric method [12] to quantify loss aversion over life duration under varying levels of QoL. This non-parametric method was developed recently and allows the estimation of utility curvature and loss aversion without imposing parametric assumptions on either. Earlier work has argued that the choice of parametric family or functional form restricts interpretation of subjects’ choice patterns, and may lead to considerable bias, especially for extreme cases [12, 13]. This method has been adapted to and used in the health domain before [5].

Theoretical framework

Consider a decision maker facing choices with regard to his health under uncertain conditions, operationalized by presenting decision makers with risky prospects representing different life durations and QoL. We assume completeness and monotonicity for both attributes. We consider lotteries involving chronic health profiles, described as \((\beta ,T)\), where β represents QoL and T duration in years. According to the generalized QALY model [14], a decision maker’s preferences for health profiles can be represented by the following:

$$V(\beta ,T)=U(\beta ) \times L(T), \tag{1}$$

with \(V(\beta ,T)\) being a product of U(β), the utility of β, and L(T) denoting the utility of T life years.

Here, we assume PT under risk with a sign-dependent utility function for life duration, so that gains are evaluated differently than losses, relative to an attribute-specific RP. We assume that, through instruction, it is possible to set this attribute-specific RP to a specific health condition \({\beta _{\text{c}}}\) and life duration \({T_0}\). To elicit a continuous utility function for life duration, we elicit a standard sequence for life duration that runs through \(L({T_0})=0\). Meanwhile, we keep QoL constant at \({\beta _{\text{c}}}\) throughout the task. We repeat this process under different levels of \({\beta _{\text{c}}}\).

We elicit the utility function for life duration, relative to this RP, both for gains and losses for the different health states. Hence, we obtain \({L^i}(T)\) for each \({\beta _{\text{c}}}\), with \(i=+\) for gains and \(i= -\) for losses. \({L^i}(T)\) is a standard ratio scale utility function, which is strictly increasing and real-valued with \({L^i}({T_0})=0\). We incorporate loss aversion by taking \({L^ - }(T)=\lambda L(T)\) for \(T<{T_0}\), where λ denotes a loss aversion index, with λ > 1 [= 1, < 1] indicating loss aversion [loss neutrality, gain seeking]. Hence, by obtaining the utility around the RP, the degree of loss aversion can be derived.
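To make the role of λ concrete, the following is a minimal sketch of a sign-dependent utility function for life duration combined with Eq. 1. It is an illustration only: the power form with exponent `alpha`, the default `lam = 2.0`, and the function names `duration_utility` and `profile_value` are assumptions introduced here, whereas the method used in this study imposes no parametric form.

```python
def duration_utility(T, T0=70.0, lam=2.0, alpha=1.0):
    """Sign-dependent utility of life duration relative to the reference point T0.

    Assumptions (illustrative, not taken from the study): power utility with
    exponent alpha for gains and losses, and lam = 2.0 as loss aversion index.
    """
    change = T - T0
    if change >= 0:
        return change ** alpha            # gains: L+(T), with L(T0) = 0
    return -lam * (-change) ** alpha      # losses: L-(T) = lam * L(T)


def profile_value(u_beta, T, **kwargs):
    """Generalized QALY value V(beta, T) = U(beta) * L(T), as in Eq. 1."""
    return u_beta * duration_utility(T, **kwargs)


# Five years gained versus five years lost, with U(beta_c) set to 1
gain = profile_value(1.0, 75.0)
loss = profile_value(1.0, 65.0)
print(gain, loss, -loss / gain)
```

With linear utility (`alpha = 1`), the ratio of the absolute loss value to the gain value for equally sized changes recovers `lam`, which is the sense in which λ > 1 indicates loss aversion.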

Methods

A total of 111 students (average age 20.23, SD = 1.52) of Rotterdam School of Management (61 female) participated in this study in exchange for course credit. Experimental sessions lasted for 25 min and were run with up to four subjects per session. One experimenter was present in the room to answer questions. The experiment was computerized using Matlab.

To test the robustness of loss aversion, we used the non-parametric method [12] under four levels of QoL. In other words, each subject completed the non-parametric method four times, with a different \({\beta _{\text{c}}}\) in each of these four phases. This process allows us to obtain estimates of utility curvature and loss aversion for each of the four levels of QoL, and to compare them within subjects.

QoL was defined by means of EQ-5D-5L health state descriptions [15], which utilize five domains: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The 5L version of the EQ-5D distinguishes five levels of severity on each domain, ranging from ‘no problems’ to ‘extreme problems/unable to’. Health states are typically denoted by five-digit codes such as 22113, with each digit representing the severity level on the corresponding domain. In this study, we used four relatively mild-to-moderate health states as RP \({\beta _{\text{c}}}\) in the non-parametric method: 11111, 21211, 31221, and 32341 (see “Appendix 1” for exact descriptions). This was done to have variation in health states while avoiding states worse than dead, for which no separate procedure was included.
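As an illustration of how the five-digit codes map onto the descriptive system, the sketch below decodes the four health states used here. The helper `decode_eq5d5l` is hypothetical, and the level labels are generic shorthand for the five EQ-5D-5L severity levels; the exact wording shown to subjects is given in “Appendix 1” and is not reproduced here.

```python
# Domains in the order in which the digits of an EQ-5D-5L code are read
DOMAINS = ("mobility", "self-care", "usual activities",
           "pain/discomfort", "anxiety/depression")

# Generic severity labels (shorthand; official wording differs per domain)
LEVELS = {1: "no problems", 2: "slight problems", 3: "moderate problems",
          4: "severe problems", 5: "extreme problems/unable to"}


def decode_eq5d5l(code):
    """Map a five-digit EQ-5D-5L code to a {domain: severity label} dict."""
    if len(code) != 5 or any(ch not in "12345" for ch in code):
        raise ValueError(f"not a valid EQ-5D-5L code: {code!r}")
    return {domain: LEVELS[int(ch)] for domain, ch in zip(DOMAINS, code)}


# The four reference health states used in this study
for state in ("11111", "21211", "31221", "32341"):
    print(state, decode_eq5d5l(state))
```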

The non-parametric method used here consisted of three stages, which are described in detail in “Appendix 2”. The first stage connects the utility for gains and losses. The second and third stages employ the trade-off method developed by [16] to measure a standard sequence of outcomes in life years for gains \((x_{1}^{+},x_{2}^{+}, \ldots ,x_{5}^{+})\), and for losses \((x_{1}^{ - },x_{2}^{ - }, \ldots ,x_{5}^{ - })\). This enables measuring loss aversion without imposing parametric assumptions on utility curvature. In addition, the standard sequences allow the testing of utility independence [11]. The three stages had slightly different instructions, providing context for the required trade-offs. The instructions were similar to those used by Lipman and colleagues [5]. During all stages of the experiment, it was made clear to subjects that they should imagine living until 70 years in \({\beta _{\text{c}}}\), after which they would contract a disease, resulting in immediate death without any pain. Subjects completed a series of binary choices between two drugs that could change their situation (leading to gains and losses compared to living until 70). Employing a bi-section choice method, we obtained indifferences, which were set equal to the midpoint of the remaining interval after the fifth binary choice. Some stimuli and constants relevant to the non-parametric method had to be set beforehand; these are listed in “Appendix 1”.
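The bi-section procedure described above can be sketched as follows. This is a simplified illustration rather than the experimental software: the function name, the starting interval of 70 to 90 years, and the simulated respondent (who prefers the drug whenever the offered outcome exceeds a true indifference value of 76 years) are all hypothetical.

```python
def bisect_indifference(prefers_offer, low, high, n_choices=5):
    """Approximate an indifference value from a fixed number of binary choices.

    prefers_offer(x) returns True if the respondent prefers the option that
    yields outcome x. After n_choices choices, the indifference value is set
    to the midpoint of the remaining interval.
    """
    for _ in range(n_choices):
        mid = (low + high) / 2
        if prefers_offer(mid):
            high = mid   # offer more than sufficient: indifference lies lower
        else:
            low = mid    # offer insufficient: indifference lies higher
    return (low + high) / 2


# Simulated respondent with a true (hypothetical) indifference value of 76 years
estimate = bisect_indifference(lambda x: x > 76.0, low=70.0, high=90.0)
print(estimate)
```

Because each choice halves the interval, five choices locate the indifference value to within the starting interval width divided by 2^6, which also shows why a single erroneous choice cannot be corrected later in the sequence.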

Results

Seven subjects were excluded from further analyses for the following reasons: mechanical failure (n = 2), refusing to incur life year losses (n = 3), and observed misbehavior (e.g., rushing through the task, n = 2). The results are reported for the reduced sample (n = 104). Throughout, we will first report aggregate analyses, where median parameters are compared for the whole sample, and refer to these as results at ‘the aggregate level’. Second, we will investigate individual results more closely, by classifying each individual according to the classification rules reported in “Box 1” and by exploring within-subjects parameter instability. We refer to these analyses as ‘individual-level analyses’.

Table 1 shows the results at the aggregate level, comparing point-estimates for utility curvature and loss aversion for each health state. We compared differences between health states using omnibus tests (i.e., comparing all four health states simultaneously), more specifically Friedman’s tests, which are robust against the violations of normality typically observed for parameters under the definitions reported in “Box 1”. Next, we compared all health states in pairs with Wilcoxon signed-rank tests. For the omnibus tests, no significant differences were observed between health states, either for utility curvature or for loss aversion (all p’s > 0.06). When comparing parameter estimates in pairs of health states, some significant differences were observed. For loss aversion under both definitions, parameter estimates for β2 were significantly lower than for β3 (p’s < 0.03). All other pairwise comparisons for loss aversion yielded no significant differences (all p’s > 0.07). Using pairwise comparisons for utility curvature, we observed no significant differences for either parametric or non-parametric estimations (all p’s > 0.05).
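For readers unfamiliar with the omnibus test, the sketch below computes Friedman’s statistic for a within-subjects design with four conditions using only the Python standard library. It is not the analysis code of this study: the four rows of loss aversion estimates are invented for illustration, no tie correction is applied to the statistic, and the p-value uses the chi-square approximation with three degrees of freedom, which is unreliable for a sample this small.

```python
import math


def average_ranks(values):
    """Rank values from 1..k, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic; rows = subjects, columns = conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df3(x):
    """Upper tail of the chi-square distribution with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical loss aversion estimates: one row per subject, one column per health state
lambdas = [
    [1.8, 1.5, 2.1, 1.9],
    [2.0, 1.7, 2.4, 2.2],
    [1.2, 1.0, 1.6, 1.1],
    [2.5, 2.1, 2.3, 2.6],
]
q = friedman_statistic(lambdas)
p = chi2_sf_df3(q)
print(round(q, 2), round(p, 3))
```

Because the statistic depends only on within-subject ranks, it is unaffected by the skewed distributions that ratio-based parameters such as λ tend to show.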

Table 1 Median (IQR in brackets) parameter point-estimates for loss aversion under two definitions and utility curvature as defined by area under the curve (AUC) and power utility

In general, we observe close to linear utility for all health states, both for gains and losses. Furthermore, we observe considerable loss aversion at the aggregate level, with λ significantly greater than 1 for all \({\beta _{\text{c}}}\) (Wilcoxon tests: p < 0.001 for all β’s).

Table 2 shows how subjects are classified under different estimations of utility curvature and loss aversion (see “Box 1”). For all individual classifications, we observed that the conventionally assumed loss neutrality and linear utility curvature are not present in our data. Although linear utility was found at the aggregate level, considerable heterogeneity in utility curvature was observed when classifying individually, with proportions of concavity/convexity varying between definitions and health states. The aggregate finding could be explained by the near equal division of concavity/convexity in our sample, resulting in roughly linear utility at the aggregate level. For loss aversion, however, such an equal division was not visible, with the majority of subjects classified as loss averse across definitions and health states.

Table 2 Individual classifications for utility curvature (n = concave, linear, convex) and loss aversion (n = loss averse, loss neutral, and gain seeking)

Our design allowed us to explore point-estimate stability for utility curvature and loss aversion between different levels of \({\beta _{\text{c}}}\). To this end, we calculated the difference between the smallest and largest estimates within subjects (e.g., the lowest and highest \(\lambda\)). Furthermore, to examine within-subjects heterogeneity in classification, we calculated the proportion of subjects for whom classifications were dependent on health states (e.g., loss averse for β0–2 and gain seeking for β3). Both exploratory measures of within-subjects parameter and classification variance demonstrated considerable heterogeneity between health states (see Table 3). Finally, we investigated whether systematic patterns in utility curvature or loss aversion could be observed in our sample. To this end, we determined the extent to which subjects showed monotonically increasing (or decreasing) parameters (see Table 3). For loss aversion, a monotonically increasing (decreasing) pattern means that subjects became more (less) loss averse as the severity of \({\beta _{\text{c}}}\) increased. These analyses indicate that such patterns did occur, but only for a small part of our sample, again suggesting non-systematic heterogeneity of parameter estimates.
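The exploratory within-subjects measures described above can be sketched for a single subject as follows. This is an illustration under stated assumptions: the classification uses the simple threshold λ > 1 [= 1, < 1] from the theoretical framework rather than the rules in “Box 1”, monotonicity is taken to be strict, and the function names and example values are hypothetical.

```python
def classify(lam):
    """Classify a loss aversion index using the simple threshold at 1."""
    if lam > 1:
        return "loss averse"
    if lam < 1:
        return "gain seeking"
    return "loss neutral"


def summarize_subject(lambdas):
    """Summarize one subject's estimates, ordered by increasing health state severity."""
    classes = [classify(lam) for lam in lambdas]
    pairs = list(zip(lambdas, lambdas[1:]))
    return {
        "range": max(lambdas) - min(lambdas),            # largest minus smallest estimate
        "classification_varies": len(set(classes)) > 1,  # depends on health state
        "increasing": all(a < b for a, b in pairs),      # more loss averse with severity
        "decreasing": all(a > b for a, b in pairs),      # less loss averse with severity
    }


# Hypothetical subject: loss averse in three health states, gain seeking in one
print(summarize_subject([1.4, 1.9, 0.8, 2.3]))
# Hypothetical subject with monotonically increasing loss aversion
print(summarize_subject([1.1, 1.5, 1.8, 2.4]))
```

Aggregating `classification_varies`, `increasing`, and `decreasing` over subjects gives proportions of the kind reported in Table 3.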

Table 3 Exploration of within-subjects heterogeneity for different health states

Discussion

In this paper, we compared estimates for utility curvature and loss aversion for QALY outcomes under four levels of QoL, to test the robustness of these estimates. An extensive literature exists testing the validity of QALY models, which has documented mixed evidence with regard to the separability of life duration and QoL [e.g., 18–21]. In addition, many authors have investigated utility independence with regard to health state valuation (e.g., the relation between utilities and time horizon in standard gambles), finding many descriptive violations of this independence [for a review, see: 20]. Ours was the first experimental test of this separability for QALY gains and losses separately, and we also tested the robustness of loss aversion. Our results, at the aggregate level, provided evidence that estimations of loss aversion and utility curvature are independent of QoL. However, loss aversion and utility curvature estimates were heterogeneous at the individual level, i.e., they varied considerably between health states for the same individual.

Our findings are in many regards similar to earlier work that measured PT for QALY outcomes. We observed considerable loss aversion (defined over length of life), as was found in similar magnitude in earlier work applying similar methodology [5, 22], or with different elicitation methods [4, 8]. In contrast to what was observed in earlier applications of the non-parametric method for health outcomes [5, 22], we found linear utility for both gains and losses at the aggregate level. Applying a parametric approach to our non-parametric measurements did not affect these conclusions. However, when estimating individual classifications, we found no subjects for whom our data supported this linearity, as we observed a near equal spread in concave/convex utility (i.e., averaging out to linear).

We document considerable heterogeneity in parameter estimates between subjects, and also observed such heterogeneity within subjects for different health states. Our exploratory analyses did not uncover systematic or monotonic patterns in this within-subjects heterogeneity. An explanation related to our chosen chained utility elicitation method could be that these individual differences occurred as a result of preference imprecision [23]. Such ‘noisy preferences’ could result in error propagation, i.e., cascading of errors or imprecision in the early stages of our chained method into later stages, producing differences in parameters between health states when errors occur randomly. Although earlier work using similar methodology [5, 24] observed no effects of error propagation, we cannot rule out that it affected the current results. Another factor contributing to possible error propagation in our study could be that we opted to obtain indifferences via bi-section only (to reduce complexity), whereas earlier work [5, 12] using this method applied a slider to obtain indifferences, allowing subjects to correct errors adaptively. Future work could explore this further, for example by adding a slider to obtain indifference points, using non-chained methodology, or running an error propagation simulation.

Some additional limitations of this study deserve noting. First, since this study involved a first test of independence of loss aversion in health, we used a convenience sample consisting of students. Of course, future extensions should preferably include representative samples to generalize our findings. Although power analyses suggested that our sample was adequately powered to detect small effects, using a larger sample could, perhaps, result in the detection of smaller effects, also given the large heterogeneity in parameter estimates reported here. Second, we assumed that it is possible to set the RP through instruction, while it may be the case that respondents had another RP in mind. Still, given the high loss aversion coefficients that we found, it seems plausible that our respondents, indeed, held the induced RP in mind. Finally, our study used four mild-to-moderate health states, including perfect health, while the EQ-5D descriptive system enables many more possible health states, with more severe health problems than our selection. Given the aim of our study, this is a clear limitation, as, perhaps, these states were insufficiently spaced in terms of utility for us to observe systematic patterns in loss aversion or utility curvature parameters. However, our empirical approach required us to make a fundamental assumption: monotonicity. The non-parametric method breaks down if monotonicity is not satisfied, i.e., if subjects prefer to lose years of life instead of gaining them. For more severe health states, monotonicity need not always hold [25]. Obviously, many other mild health states were available for our purposes, but to reduce cognitive strain for our subjects we decided to include just four. For reference, these four health profiles receive utility weights ranging from 1.00 to 0.46 in the Dutch tariff [26], which we considered to be sufficient for our purposes. Future work could replicate our findings with a different or larger selection of health states.

Our findings may have implications for policy makers and researchers aiming to apply PT measurements to health-related decision-making. Our results imply that median parameters in applications of PT may have merit, as these estimates appear to be robust across different scenarios (in terms of QoL). For example, our work warrants the conclusion that, at the aggregate level, life year losses are weighed twice as much as similarly sized gains, regardless of QoL level. However, as our exploratory analyses of within-subject heterogeneity demonstrated, individuals’ loss aversion and utility curvature may depend on the health state used during elicitation. This heterogeneity at the individual level may be problematic for approaches using averages, like median-optimized parameters (e.g., [27]). When aiming to address PT biases for QALYs [28], such as loss aversion, at the individual level, our data would suggest that assuming such median loss aversion parameters may misrepresent individuals’ actual preferences and trade-offs. When one aims to apply PT to address biases in individual cases (e.g., in health state valuation), an individual approach may be more suitable, given the considerable heterogeneity both between subjects and between health states reported in this study. Such corrections with individually estimated parameters could be too time-consuming and labor-intensive when applied separately for each economic evaluation. However, in many countries, such as the UK, QALYs are not derived individually, but from indirect preference-based classification systems, such as EQ-5D or SF-6D, via social tariff lists [29]. Recent developments in de-biasing QALY measurement [5] suggest that it may be suitable and possible to apply the correction for PT at the individual level to obtain value sets for these social tariffs [see 30]. When considering such individual correction, however, it seems important to consider which health state is used to quantify PT parameters.

In conclusion, although we observed large heterogeneity of loss aversion and utility of life duration depending on QoL, we failed to observe systematic patterns in this dependence, and observed no differences on average. Future work should aim to address whether this heterogeneity is method-dependent or due to systematic differences between individuals or health states. For now, it appears that, on average, loss aversion is equal across health states, i.e., a QALY loss is a QALY loss is a QALY loss, and it receives approximately twice as much weight as equally sized QALY gains.