A total of 1143 individuals were interviewed between March 2018 and November 2018, of whom 95 were excluded because of poor data quality, leaving a final sample of 1048 respondents. Exclusions were based on interviewer non-compliance with instructions or on serious inconsistencies in the valuation of health states. For protocol non-compliance, we excluded interviews in which the interviewer had not shown the ‘worse than dead’ configuration during the training part of the survey. An inconsistency was defined as the respondent giving the worst state, 55555, a higher value than the mildest health state presented in the TTO task.
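The two exclusion rules can be expressed as a simple filter. The sketch below is only an illustration under stated assumptions: the `Interview` record and its field names are hypothetical, and "mildest state" is assumed to mean the presented state with the lowest sum of levels.

```python
from dataclasses import dataclass, field


@dataclass
class Interview:
    # Hypothetical record layout; field names are illustrative assumptions.
    interviewer_id: str
    showed_wtd_example: bool  # was the 'worse than dead' configuration shown in training?
    tto_values: dict = field(default_factory=dict)  # state code -> cTTO value


def level_sum(state: str) -> int:
    return sum(int(c) for c in state)


def is_excluded(iv: Interview) -> bool:
    # Rule 1: protocol non-compliance during the training part.
    if not iv.showed_wtd_example:
        return True
    # Rule 2: 55555 valued higher than the mildest state presented.
    # Assumption: "mildest" = lowest sum of levels among the other states.
    others = [s for s in iv.tto_values if s != "55555"]
    if "55555" in iv.tto_values and others:
        mildest = min(others, key=level_sum)
        if iv.tto_values["55555"] > iv.tto_values[mildest]:
            return True
    return False


def apply_exclusions(interviews):
    # Keep only interviews that pass both quality rules.
    return [iv for iv in interviews if not is_excluded(iv)]
```

Applied to the full set of interviews, a filter of this kind would return the retained sample; the real study may have encoded the rules differently.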
After exclusions, the average number of interviews per interviewer was 71.4 (standard deviation [SD] 30.9; minimum 10, median 82, maximum 132). The average interview duration was 39.2 min (SD 9.4; minimum 17, median 37.8, maximum 95.4). A single TTO task took an average of 60 s (SD 43.1; minimum 1.7, median 48.5, maximum 1081), and a single DCE task an average of 38.8 s (SD 30.0; minimum 4.8, median 30.2, maximum 725.9).
Sample Characteristics
Table 1 displays descriptive statistics of the sample. The final sample after exclusions (n = 1048) did not differ substantially from the total sample (N = 1143). The average age was 49.4 years, and women represented 55.44% of respondents. The market research company used a standard three-level description of socioprofessional status (higher retired and active socioprofessional status; lower retired and active socioprofessional status; no professional activity). Under this classification, the final distribution of the sample was consistent with the stratification goals (see ESM 3 for a description of socioprofessional classes).
Table 1 Characteristics of respondents
The final sample differed in age and sex distribution from the French general population [24]. Females were overrepresented relative to males, both compared with the planned stratification and compared with the general population. A breakdown of age groups by sex (Fig. 2) shows an overrepresentation of the 25–34 years age group for both sexes and, for women, a deficit of respondents aged 75 years and older relative to those aged 55–74 years. An extra quota of 20 women aged 65 years and over was surveyed to reduce this imbalance, but it was not sufficient for a full correction. According to the market research company, acceptance of interviews was lower in this age group.
Figure 3 represents the geographical distribution of respondents. Compared with national statistics, rural areas were well represented, whereas residents of areas of 2,000–100,000 inhabitants were underrepresented, and people living in areas of over 100,000 inhabitants, as well as in the Paris ‘Petite Couronne’ (i.e. Paris + 4 adjacent departments), were overrepresented. Supplementary data on the sample, including respondents' personal experience of illness, are presented in ESM 3.
Data Characteristics
Respondents reported 181 of the 3125 possible health states as their own health state. The list of reported health states is presented in ESM 4. Five of these 181 states (11111, 11112, 11113, 11114, and 11121) accounted for 50% of the self-reported states.
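A statement such as "five states accounted for 50% of self-reported states" comes from ranking states by frequency and accumulating their shares. A minimal sketch, using toy data rather than the study sample; the function name `states_covering` is hypothetical:

```python
from collections import Counter


def states_covering(reported_states, target=0.5):
    """Return the most frequent self-reported states needed to reach the
    `target` share of respondents, plus the number of distinct states."""
    counts = Counter(reported_states)
    n = len(reported_states)
    covered, top = 0, []
    for state, c in counts.most_common():
        top.append(state)
        covered += c
        if covered / n >= target:
            break
    return top, len(counts)


# Toy data (hypothetical, not the study sample).
sample = ["11111"] * 4 + ["11112"] * 2 + ["21111", "11121", "12111", "11211"]
print(states_covering(sample))  # (['11111', '11112'], 6)
```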
Overall, 20.2% of cTTO values were negative, with 2.3% elicited at −1 (Fig. 4). Unwillingness to trade off any time in full health (a value of 1) was observed in 13.7% of responses. Values of 0.5 and −0.5 were also frequent (9.29% and 5.2%, respectively) but were not interviewer-dependent. The proportion of values around 0 (±0.05) was 3.7%.
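The proportions above are simple shares of the elicited cTTO values. A minimal sketch of how they could be computed; the function names are hypothetical, and treating "around 0" as |value| ≤ 0.05 follows the ±0.05 band stated in the text:

```python
import math


def share(values, predicate):
    return sum(1 for v in values if predicate(v)) / len(values)


def _at(x):
    # Exact grid points are matched with a tiny tolerance to absorb float noise.
    return lambda v: math.isclose(v, x, abs_tol=1e-9)


def summarize_ctto(values, tol=0.05):
    return {
        "negative": share(values, lambda v: v < 0),
        "at_minus_1": share(values, _at(-1.0)),
        "at_1": share(values, _at(1.0)),
        "at_0.5": share(values, _at(0.5)),
        "at_minus_0.5": share(values, _at(-0.5)),
        # "Around 0" taken as |v| <= tol (the ±0.05 band).
        "near_0": share(values, lambda v: abs(v) <= tol + 1e-9),
    }
```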
Value Set
Altogether, seven models were tested: (1) a cTTO tobit model unadjusted for age and sex; (2) a DCE logit model unadjusted for age and sex; (3) a hybrid model unadjusted for age and sex; (4) a hybrid model adjusted for age and sex; (5) a hybrid model adjusted for age only; (6) a hybrid model adjusted for sex only; and (7) a main effects adjusted hybrid model.
When age and sex were both included in the hybrid model, only age was significant (p = 0.023), and its coefficient was small (0.00250). When age was included alone, it was no longer significant (p = 0.066). Nevertheless, because the initial objective of the study was to provide a value set that reflects the preferences of the general population, correction for sample biases was essential. A main effects hybrid model adjusted for age and sex was therefore estimated and constitutes the preferred value set. This model was compared with an unadjusted hybrid model to measure the effect of adjustment. Table 2 presents the cTTO and DCE models, followed by the unadjusted and adjusted main effects hybrid models. Coefficients are incremental utility changes when moving from one level to the next. Using the sum of levels across dimensions as a proxy for health state severity, the more severe the state, the lower the mean cTTO value but the higher the SD, indicating heteroscedasticity in the cTTO data (Fig. 5). Heteroscedasticity was therefore accounted for by modeling the variance. The theta rescaling coefficient was 5.226 (the full results for the preferred value set, including Sigma statistics, are shown in ESM 2, and the full value set is shown in ESM 5).
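The heteroscedasticity check described above groups observed cTTO values by the sum of levels and compares mean and SD across groups. The sketch below reproduces only that descriptive step, not the variance model itself; the function names and toy data are hypothetical:

```python
import statistics
from collections import defaultdict


def level_sum(state: str) -> int:
    # Severity proxy: sum of levels across the five dimensions (5 to 25 in 5L).
    return sum(int(c) for c in state)


def mean_sd_by_severity(observations):
    """Group (state, cTTO value) pairs by level sum; return {sum: (mean, sd)}."""
    groups = defaultdict(list)
    for state, value in observations:
        groups[level_sum(state)].append(value)
    return {
        severity: (statistics.mean(vals),
                   statistics.stdev(vals) if len(vals) > 1 else 0.0)
        for severity, vals in sorted(groups.items())
    }


# Toy observations (hypothetical): spread widens as severity increases.
obs = [("11112", 0.9), ("11112", 1.0), ("55555", -0.5), ("55555", 0.5)]
print(mean_sd_by_severity(obs))
```

A rising SD alongside a falling mean across level sums is the pattern that motivates modeling the variance.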
The appropriateness of the models can be assessed by identifying inconsistencies in each specification, since disutility is expected to increase as health states worsen. The cTTO and DCE models each present one illogically ordered coefficient (MO3 and UA3, respectively), which is corrected in all hybrid models. Agreement between models can also be assessed by comparing how they rank the dimensions of health-related quality of life by impact. UA was the dimension with the lowest cumulative decrement in all models, but the models differ in the relative positions of mobility, anxiety/depression, and self-care; in all models, however, the cumulative decrements of anxiety/depression and self-care are very close. Anxiety/depression ranks third in the unadjusted hybrid model, whereas self-care ranks third in the adjusted hybrid model (Table 3). Utility values were higher in the unadjusted model than in the adjusted model for 2402 health states, which is consistent with what was expected from correcting for the age imbalance. The value of the worst health state (55555) was −0.5255 in the adjusted model versus −0.5217 in the unadjusted model.
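The ordering check can be automated by flagging any incremental coefficient that reverses the expected direction. A minimal sketch, assuming (as a hypothetical convention) that increments are stored as positive utility losses for moving from level k−1 to level k; the coefficients below are invented for illustration:

```python
def illogical_coefficients(increments):
    """Flag incremental coefficients that reverse the expected ordering.

    `increments` maps a dimension label to its decrements for moving from
    level k-1 to level k (k = 2..5), expressed as positive utility losses
    (assumed sign convention). A negative increment means level k is valued
    higher than level k-1, i.e. an illogical ordering.
    """
    flagged = []
    for dim, steps in increments.items():
        for level, step in enumerate(steps, start=2):
            if step < 0:
                flagged.append(f"{dim}{level}")
    return flagged


# Hypothetical coefficients for illustration only.
example = {"MO": [0.05, -0.01, 0.10, 0.18], "UA": [0.03, 0.02, 0.04, 0.01]}
print(illogical_coefficients(example))  # ['MO3']
```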
Table 3 Cumulative decrements of utilities per dimension
Table 3 allows the utility of any health state to be calculated from cumulative decrements. For example, the utility of health state 54321 in the adjusted model is 1 − 0.32509 (MO level 5) − 0.172251 (SC level 4) − 0.03979 (UA level 3) − 0.02198 (PD level 2) = 0.441; level 1 of AD carries no decrement.
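The worked example generalizes to a lookup over the five dimensions. In the sketch below, only the cumulative decrements quoted in the example are filled in (plus zero for level 1); the remaining Table 3 entries are deliberately left out, so other states raise a `KeyError`. The names `PARTIAL_DECREMENTS` and `utility` are hypothetical.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")

# Only the decrements quoted in the worked example (adjusted model).
# A complete table would give levels 1-5 for every dimension.
PARTIAL_DECREMENTS = {
    "MO": {1: 0.0, 5: 0.32509},
    "SC": {1: 0.0, 4: 0.172251},
    "UA": {1: 0.0, 3: 0.03979},
    "PD": {1: 0.0, 2: 0.02198},
    "AD": {1: 0.0},
}


def utility(state: str, decrements=PARTIAL_DECREMENTS) -> float:
    """Utility = 1 minus the sum of cumulative decrements, one per dimension."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not an EQ-5D-5L state: {state!r}")
    total = 0.0
    for dim, level in zip(DIMENSIONS, state):
        total += decrements[dim][int(level)]
    return 1.0 - total


print(round(utility("54321"), 3))  # 0.441
```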
Figures 6a, b, and c show pairwise scatterplots of the values predicted by two models at a time for the 86 health states valued in the cTTO part of the study; Fig. 6d plots the values predicted by the adjusted model against the observed values for the same health states. DCE coefficients have been rescaled using the theta parameter to facilitate comparison. The DCE model's predictions agree more closely with those of the adjusted hybrid model than do those of the cTTO model. The same pattern was observed when comparing each model's predicted and observed values for the 86 health states from the cTTO experiment (see ESM 6). The data thus support the assumption of proportionality between cTTO and DCE coefficients and justify using a hybrid model, which combines two different sources of stated preferences and covers a larger number of health states than either submodel alone.
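Rescaling puts the DCE logit coefficients on the utility scale before they are compared with the other models. The sketch assumes the common parameterization in which DCE coefficients equal theta times the utility-scale coefficients, so rescaling divides by theta; the mean absolute difference is an illustrative agreement measure, not the study's method (which relies on scatterplots). Function names are hypothetical.

```python
THETA = 5.226  # theta reported for the preferred value set


def rescale_dce(dce_coefs, theta=THETA):
    # Assumption: DCE (logit-scale) coefficients = theta * utility-scale
    # coefficients, so dividing by theta puts them on the cTTO scale.
    return {name: b / theta for name, b in dce_coefs.items()}


def mean_abs_diff(pred_a, pred_b):
    # Simple agreement measure between two models' predicted values
    # (illustrative choice only).
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return sum(abs(a - b) for a, b in zip(pred_a, pred_b)) / len(pred_a)
```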
Comparing Value Sets: 5-Level, 3-Level, and Crosswalk
Figure 7 shows the kernel density distributions of the French 5L value set, the 3L value set, and the 5L crosswalk. The 5L utility values are displaced toward the right side of the distribution, indicating a shift to higher values. The distribution curve of the 5L crosswalk is similar to that of the 3L value set.
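Kernel density curves of this kind can be obtained by evaluating a smoothed density over the utilities of all states in each value set. The kernel and bandwidth used for Fig. 7 are not stated, so the Gaussian kernel and Silverman's rule of thumb below are assumptions; the function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(values):
    # Silverman's rule of thumb (assumed choice).
    return 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)


def gaussian_kde(values, bandwidth=None):
    """Return a function x -> estimated density using a Gaussian kernel."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return density
```

Evaluating such a density on a grid of utilities from about −0.6 to 1 for each value set would yield curves comparable in spirit to Fig. 7.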
In the 3L version, 78/243 (32%) health states had a negative value, versus 401/3125 (12.8%) in the 5L version, confirming that the shift to higher values also affects negative values. However, this is mitigated by the fact that, taking 5 and 3 as the worst levels of each instrument, proportionally fewer health states include a 5 (67%) than include a 3 (87%). The worst health state has a value of −0.52 in the 5L value set, versus −0.53 in the 3L and crosswalk value sets.
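The 67% and 87% figures follow from counting: with n levels and five dimensions, the number of states containing at least one worst level is n^5 − (n − 1)^5. A short enumeration confirms this (the function name is illustrative):

```python
from itertools import product


def states_with_worst_level(n_levels, n_dims=5):
    """Count health states containing at least one worst-level answer."""
    states = product(range(1, n_levels + 1), repeat=n_dims)
    total = n_levels ** n_dims
    with_worst = sum(1 for s in states if n_levels in s)
    return with_worst, total


for n_levels in (5, 3):
    k, total = states_with_worst_level(n_levels)
    print(f"{n_levels}L: {k}/{total} = {k / total:.0%}")
# 5L: 2101/3125 = 67%
# 3L: 211/243 = 87%
```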
The ranking of dimensions differs between versions. In the 3L version, the largest utility decrement (Level 3) was observed for mobility, followed by self-care and then pain/discomfort (followed by anxiety/depression and usual activities). In the 5L version, the ranking was pain/discomfort, mobility, self-care, anxiety/depression, and usual activities, with very close coefficients for self-care and anxiety/depression. Maximum decrements are also lower in the 5L value set than in the 3L value set: for MO, 0.5602 in the 3L value set versus 0.3250 in the 5L value set; for PD, 0.4517 versus 0.4439, respectively. Mutatis mutandis, the main differences from the 5L crosswalk are similar to those observed for the 3L value set.
Table 4 presents a selection of health states and their values for both questionnaires, which confirms the rightward shift of the 5L value set. However, caution is recommended when comparing the 3L and 5L value sets, since the worst level for mobility is worded ‘confined to bed’ in the 3L version versus ‘unable to walk about’ in the 5L version, and the intermediate level is labeled ‘some problems’ (level 2) in the 3L version, whereas the intermediate level of the 5L version (level 3) is labeled ‘moderate problems’.
Table 4 Comparing 5L–3L values