Introduction

The EQ-5D-5L instrument is a health classification system that is used for health outcomes measurement in developed, and now increasingly also in developing countries [1]. EQ-5D-5L has many uses in healthcare and health economics including its use as the adjustment for Quality Adjusted Life Years (QALYs), for cost utility analysis and in quantifying burden of illness. EQ-5D-5L comprises 5 dimensions in order: mobility, self-care, usual activities, pain/discomfort and anxiety/depression, on each of which a respondent can indicate problems at one of 5 levels: no, slight, moderate, severe or extreme problems/unable to. With 5 dimensions in 5 levels, there are 55 = 3,125 possible states of health that can be described using the classification system. Each of these health states can be presented as a five-digit string. For example, if a respondent indicates he/she has moderate problems (level 3) with walking about, slight problems (level 2) with self care, no problems (level 1) with usual activities, severe (level 4) pain/discomfort and is extremely (level 5) anxious or depressed, the health state can be coded as state “32145”, representing the level of problems on each dimension, in order of appearance. In an EQ-5D-5L valuation study, a societal value is obtained for each EQ-5D-5L state relative to all other EQ-5D-5L states [2]. This is known as the index value for the state and the collection of all 3,125 index values for a population is known as a value set. Different countries will have different EQ-5D-5L value sets as the preferences among states of health are known to be driven by many factors, including factors relating to national culture [3].

The original EQ-5D instrument (EQ-5D-3L) had 3 levels [4]. This was similar to EQ-5D-5L but without the two intermediate levels: slight and severe and using ‘confined to bed’ instead of ‘unable to walk about’ for the highest level of problems with mobility. An EQ-5D-3L valuation study was undertaken for Trinidad and Tobago in 2016 [5]. This has been used to produce a set of EQ-5D-3L population norms for Trinidad and Tobago and in several applications [6,7,8]. EQ-5D-5L was introduced to increase the sensitivity of the instrument [9]. A crosswalk EQ-5D-5L value set was developed for Trinidad and Tobago based on the EQ-5D-3L value set [10]. The Trinidad and Tobago crosswalk value set has been used in several applications in Trinidad and Tobago [11,12,13,14,15] and the Trinidad and Tobago EQ-5D-5L crosswalk value set has also been used in other countries in the Caribbean region [16,17,18]. While an EQ-5D-5L crosswalk value set is known to be more sensitive than EQ-5D-3L, a directly elicited EQ-5D-5L value set is preferred over a crosswalk. Directly elicited value sets are not subject to the assumptions underlying mapped value sets, for example the Van Hout et al. crosswalk algorithm was based mostly on responses from European respondents which may not necessarily be representative for countries with different cultures [10].

Given the growing role that EQ-5D health outcomes plays in clinical practice and disease studies, and the potential that it offers for policy work in Trinidad and Tobago and the wider Caribbean, a decision was made to develop a directly elicited EQ-5D-5L value set for Trinidad and Tobago. The EuroQol Group has developed and published a standardized protocol for EQ-5D-5L valuation studies known as EQ-VT [19]. This study reports the application of EQ-VT in Trinidad and Tobago. The goal of the study was to develop a value set for the EQ-5D-5L, by directly assessing the preferences for EQ-5D-5L health states in the population of Trinidad and Tobago.

Methods

This study followed the EQ-VT protocol, version 2.1 [19]. Computer-assisted personal interviews were utilized in which respondents completed Composite Time Trade Off (cTTO) and Discrete Choice Experiment (DCE) tasks. These data were subsequently modelled to derive a national EQ-5D-5L value set for Trinidad and Tobago. This value set was then compared against the existing Trinidad and Tobago EQ-5D-5L crosswalk value set that was based on the EQ-5D-3L value set. We followed the CREATE checklist for reporting Valuation Studies of Multi-Attribute Utility-Based Instruments [20].

Valuation methods

The cTTO combines the traditional Time Trade Off (TTO) method for health states considered better than dead (BTD), and lead-time TTO (LT-TTO) for states that respondents consider to be worse than dead (WTD) [21]. Both methods follow an iterative procedure in which respondents choose between living in two different hypothetical lives. In the TTO, Life A is described as living for a number of life years in full health, and Life B being 10 years in some EQ-5D-5L health state. Depending on the choice made by the respondent, the number of years in full health in Life A is subsequently varied, until the respondent is indifferent between the two lives, and a value can be inferred for the health state of Life B. The LT-TTO is invoked when respondents indicate that they consider a health state to be WTD, and follows a similar iterative procedure, except that the 10 years in Life B are now preceeded by 10 years in full health. Further details can be found elsewhere [19, 21].

In the DCE task, respondents were asked to choose which of two different EQ-5D-5L health states they prefer, without any duration of time spent in these health states specified. In contrast to the iterative cTTO method, which produces cardinal values, the DCE task encompasses a single choice, which produces binary outcomes, from which no direct value for a health state can be inferred.

Interview procedures

Respondents were interviewed in computer-assisted personal interviews, following the standardized EQ-VT interview protocol and interviewer script. First, respondents were presented with information about the aims and the content of the study, and completed an informed consent form. Subsequently, respondents completed a warm-up exercise, which included some demographic questions, a self-completion EQ-5D-5L questionnaire and accompanying EuroQol Visual Analogue scale (EQ VAS). After that, the cTTO task was introduced. Respondents were first presented with an example task in which they valued the health state “being in a wheelchair”. Here, the task was explained to the respondents after which they completed the example question. This was followed by another example question: “a health state much better than being in a wheelchair” or “a health state much worse than being in a wheelchair”, depending on whether the respondent considered “being in a wheelchair” as BTD or WTD. This was followed by another 3 practice questions, using EQ-5D-5L health states (states 21121, 35554 and 15411, representing mild, severe and potentially difficult to imagine health states, respectively). Subsequently, respondents valued 10 EQ-5D-5L health states using the cTTO task. After completing the cTTO tasks, respondents were shown the feedback module: a rank order of their answers, after which they could indicate whether any of the responses were in the wrong order [22]. Lastly, respondents were presented with a set of 12 DCE choice pairs, followed by a short demographic survey (with the remaining demographic questions) after which they were thanked for their participation and invited to choose a gift valued at 60 Trinidad and Tobago dollars (about $9 USD) from range of options including gift vouchers, tote bags, water bottles etc. as compensation for their time.

Selection of health states

For the cTTO task, the standard EQ-VT health state design was used comprising 86 health states distributed over 10 blocks of 10 health states. Each block consisted of one mild health state (one of the following: 21111, 12111, 11211, 11121 and 11112), the worst health state 55555 and a set of 8 states chosen from an efficient design comprising 80 health states [23]. For the DCE tasks, respondents were allocated a block of 12 choice pairs out of 20 unique blocks from a Bayesian efficient design using priors from a set of 19 different EQ-5D-5L valuation studies. The design is described in Appendix C. Each respondent was randomly allocated a block of cTTO states and DCE choice pairs.

Quality control

The EQ-VT quality control (QC) procedures were implemented to ensure adequate data quality. We followed the Ramos-Goñi et al. protocol in which interviews were flagged as potentially non-compliant to the interview protocol if at least one of the following conditions was met: 1) the interviewer did not explain the WTD part of the cTTO task in at least one of the wheelchair practice tasks, 2) the interviewer spent less than 3 min on the two wheelchair tasks, 3) the respondent completed the main 10 cTTO tasks in less than 5 min, and 4) state 55555 did not receive the lowest value and another state received a value that was at least 0.5 lower [24]. Data were collected in batches of 10 interviews, after which the data were examined, and the number of interviews that were flagged per interviewer. If more than 40% of interviews were flagged during a single round of interviews, the interviewer failed the QC, their batch of data was removed and the interviewer was retrained. If an interviewer failed the QC twice, the interviewer was removed from the study.

Sampling

A nationally representative sample of 1000 respondents aged 18 years and over was targeted. Quota sampling was performed based on age and sex, as well as the 14 administrative regions of Trinidad and the combined 7 parishes of Tobago based on the 2011 Population and Housing Census for Trinidad and Tobago (Central Statistical Office (CSO) of Trinidad and Tobago) which was the most recent census data available for Trinidad and Tobago. Streets were randomly selected from the CSO maps and 1 in every 4 households were visited on each selected street. One member of each household was selected using the most recent birthday method and invited to take part in the survey. The respondents were recruited by a local market research company. A team of 14 interviewers, employed by the market research company, was trained during a 1-week training session by two members of the research team (HB and BR). Subsequently, each interviewer completed at least two sets of 5 pilot interviews. Four interviewers failed the QC procedures during the pilot phase or thereafter, and were removed from the study. A team of 10 remaining interviewers completed the data collection. Furthermore, interviewer effects were monitored by assessing whether the interviewers produced roughly similar distributions of values. The data were collected between July and September 2022.

Analyses

Several 20-parameter models, with each parameter representing the difference between having no problems and having a certain level of problems on a particular dimension, were estimated on the data. The cTTO and DCE data were modelled in isolation as well as jointly. Details on the functional form of these models can be found in Appendix B.

We first estimated a random intercept model on the cTTO data to account for the nested structure of the cTTO data (respondents each complete multiple cTTO tasks) and a random left-censored intercept Tobit model (to account for the nesting of the data as well as the fact that respondents cannot assign values lower than -1 to any health state). Furthermore, we estimated models that corrected for the heteroskedastic nature of the cTTO data, with and without the Tobit link to account for the censored nature of the data. The regression constant was suppressed in cases where it was not significant.

The DCE data were analyzed using conditional logit and mixed logit models with the latter accounting for preference heterogeneity. Furthermore, hybrid models were estimated which used a joint likelihood function to model the cTTO and DCE data in combination [25]. Hybrid models were estimated taking into account the heteroskedastic nature of the data, as well as the using the Tobit link for the cTTO data. Each model that corrects for heteroskedasticity in the cTTO data, as well as the hybrid models were estimated without a constant when the constant was not significant. Lastly, sensitivity analyses were carried out by estimating the cTTO and hybrid models while leaving out the responses flagged in the feedback module. A final model was selected based on the properties of the model, such as whether heteroskedasticity was present and whether there was a substantial number of censored responses. Furthermore, model fit and predictivity was considered using mean absolute error (MAE). MAE was calculated over all responses as well as how well the models predicted the mean observed cTTO value for the 86 health states included in the cTTO health state design. Lastly, out-of-sample predictivity was tested using a leave-one-out analysis, in which models were estimated by leaving out a single state, predicting its value and evaluating the MAE of the model. The same procedure was used by leaving out a block of health states rather than a single health state. Data analyses were performed in Stata 18, using the xtreg, xttobit, intreg, clogit, mixlogit and hyreg commands.

Results

Demographics

A representative sample (age, sex, geography) of 1,079 adults completed the EQ-VT valuation tasks in face-to-face interviews. The response rate was 34%. Table 1 shows a breakdown of the age and sex distribution of the sample compared with the population over age 18, and Table 2 shows the geographic composition of the sample compared with the population. All comparisons were done against the 2011 census data.

Table 1 Age and sex composition of the sample compared with the population
Table 2 The geographic composition of the sample compared with the population

Ethnicity in the sample generally reflected the 2011 census data for the population aged 18 + with Afro- ethnicity slightly under-represented, and mixed/other slightly over-represented. Table 3 shows that the sample appeared to be more educated than the 2011 census data with 24.8% of the sample being tertiary/university educated versus 11.5%.

Table 3 Ethnicity and education of the sample compared with population data (aged 18 +) from the 2011 Population and Housing Census for Trinidad and Tobago

Valuation results

Each of the 1079 respondents completed 10 TTO tasks and 12 DCE tasks, giving a total of 10,790 TTO tasks and 12,948 DCE tasks. The average completion time was 43 min and 7 s (standard deviation 18 min and 10 s). In the QC process 164 interviews (15%) were flagged. In all, 63 interviews were flagged for not explaining the WTD task, 18 for inconsistencies with state 55555, 45 for not spending enough time on the wheelchair examples, and 50 for not spending enough time on the cTTO tasks. There were 2 non-traders (assigning the full health value to all states) in the sample. Figure 1 shows the distribution of responses in the cTTO task.

Fig. 1
figure 1

Responses to the cTTO task

There was some clustering of responses at 1, 0.5 and -1, and 19.27% of responses were negative. There were some differences in the proportion of responses at 1, 0.5 and -1, between interviewers, suggesting that there may have been some minor interviewer effects. The proportion of responses equalling zero was 3.05%, while 6.33% of responses were at -1, indicating the share of potentially censored data. A Breusch-Pagan test showed that there was heteroskedasticity present in the modelled data. Because of the presence of heteroskedasticity and a relatively large share of potentially censored observations at -1, only models correcting for heteroskedasticity and accounting for censoring in the cTTO data were considered, as well as DCE-only models. Table 4 shows the estimated coefficients of selected cTTO-only, DCE-only and hybrid models. Other models are reported in Appendix A, Tables 5, 6 and 7.

Table 4 Modelling results of the best performing models for cTTO-only, DCE-only and hybrid

In all models pain/discomfort received the highest weight followed by mobility, anxiety/depression, self-care and lastly, usual activities. The value for the worst health state, state 55555, ranged between -0.563 (hybrid heteroskedastic Tobit) and -0.607 (heteroskedastic Tobit model), although other Tobit models, which did not correct for heteroskedasticity, produced lower values (see Appendix A). The difference between the CTTO-only heteroskedastic Tobit model and the hybrid heteroskedastic Tobit model was small in terms of fit statistics such as the MAE. In terms of out-of-sample predictivity, the hybrid heteroskedastic Tobit model performed better than the cTTO-only heteroskedastic Tobit model on removing a single state from the design, while the cTTO-only heteroskedastic Tobit model performed better on removing a whole block. Robustness on out-of-sample predictivity for single states was considered more important, and therefore, the hybrid heteroskedastic Tobit model was selected as the final model, to be used as the value set for Trinidad and Tobago. The value set then takes the following form:

$$\begin{array}{cc}2)&U=1-0.027MO2-0.085MO3-0.187MO4-0.368MO5-0.024SC2-0.072SC3-0.150SC4-0.232SC5-0.011UA2-0.065UA3-0.146UA4-0.219UA5-0.044PD2-0.128PD3-0.311PD4-0.480PD5-0.020AD2-0.074AD3-0.161AD4-0.264AD5\\\end{array}$$

This means that for example health state 21354 would receive the following value: \(U\left(21354\right)= 1- 0.027MO2-0.065UA3-0.480PD5-0.161AD4=0.267\).

Figure 2 shows a Bland–Altman plot for the existing EQ-5D-5L crosswalk value set and the EQ-VT value set. The EQ-VT produces lower values on average (mean EQ-VT 0.386, while this is 0.524 in the crosswalk). The EQ-VT value set had a higher number of negative values (275 states or 8.8% versus 21 states or 0.7%). The mean absolute difference between the value set and the crosswalk set was 0.157 and the correlation coefficient between the two sets was 0.879. The EQ-VT value set had a wider range (-0.563 to 1.000) than the crosswalk set (-0.163 to 1.000).

Fig. 2
figure 2

Bland–Altman plot for the crosswalk versus the EQ-VT value set

On the vertical axis, the difference between the crosswalk and EQ-VT value sets is shown for all 3125 health states. On the horizontal axis, the average value of the crosswalk and EQ-VT value set is shown for each health state

Discussion

Main findings

This study produced a set of EQ-5D-5L values that directly represent the preferences of the Trinidad and Tobago adult population and that can now be used in clinical and economic applications in Trinidad and Tobago as well as in other Caribbean countries for applications in which the Trinidad and Tobago crosswalk values might have been used. Furthermore, the current study is the first one to use a mixed logit model to analyse the DCE data, owing to the use of a larger health state design for the DCE task. The mixed logit models produced similar results to the cTTO-only and hybrid models. The existing crosswalk value set was shown to be considerably different from the EQ-VT value set.

Interpretation

There are two main drivers of differences between values in EQ-5D value sets: differences in scale and differences in the magnitudes of the coefficients of the underlying utility function relative to each other. Differences in scale between EQ-5D-3L value sets and EQ-VT based value sets have been observed in other studies [26]. For differences in scale, the lower values and the increase in the number of negative values in the new value set can be explained by several factors. There could have been greater willingness to trade life-years in the EQ-VT protocol than in the modified MVH Time Trade Off (TTO) protocol that was used to recalibrate the DCE data to a 0 (dead) to 1 (full health) scale in the 2016 EQ-5D-3L valuation study [5]. In the 2016 study, 10% of the respondents who completed all of the TTO tasks were non-traders (assigning the full health value to all states). In this (2023) study this fell to 0.2% (only 2 non-traders). This would in part be associated with the quality control protocol in EQ-VT which ensures that interviewers explain the cTTO tasks to each respondent, thereby promoting a better understanding of the cTTO tasks on behalf of the respondents. Generally, the introduction of the EQ-VT data quality control protocol ensures that respondents were explained all elements of the cTTO task, leading to more reliable data compared to TTO studies that do not employ a quality control strategy [24]. Further, the use of a pilot phase during the data collection may have improved the quality of the collected data which may impact the outcomes of the study as well [26]. Lastly, the method used to value health states with negative values has changed compared to the Measurement and Valuation of Health (MVH) protocol that was followed in the 2016 valuation study, which may affect the values elicited for those states.

The differences in patterns among the coefficients of the crosswalk and EQ-VT value sets could be associated with social change over the 7 year period: e.g. greater awareness of mental health may have influenced the anxiety/depression coefficients and the lock down associated with covid over 2020–2022 may have brought increased salience of usual activities to the respondents [27]. Such changes may highlight the need for revisiting/updating EQ-5D value sets. Further, the crosswalk algorithm was developed based on responses from European respondents. It is possible that T&T respondents respond differently to EQ-5D-3L and EQ-5D-5L, which may exacerbate any differences in value sets. Lastly, the descriptive systems are different between the EQ-5D-3L and the EQ-5D-5L in the mobility dimension, with level 3 mobility being defined as “confined to bed” in the EQ-5D-3L, while level 5 for mobility in the EQ-5D-5L is defined as “unable to walk about”. Directly valuing “confined to bed” may lead to a higher willingness to trade life years as compared to valuing “unable to walk”, as it may be perceived as being more severe [28]. This may (partially) explain differences observed between the crosswalk and directly evaluated EQ-5D-5L value sets, as for the crosswalk, the value for health states with level 4 or 5 problems on mobility are inferred from the value assigned to “being confined to bed”.

Compared to the USA which also used EQ-VT, the Trinidad and Tobago level 5 coefficients follow a similar pattern with pain/discomfort having the largest coefficients in both values sets (0.480 versus 0.414 for the USA) and usual activities having the smallest coefficients (0.219 and 0.255 respectively) [29]. At level 5, both value sets show the same ranking of coefficients (smallest to largest): UA, SC, AD, MO, PD. However at other levels the rankings are not the same, for example for the level 1 coefficients the Trinidad and Tobago value set has: UA, AD, SC, MO, PD whereas for the USA this is AD, PD, UA, SC, MO. State 55555 has values of -0.573 (USA) and -0.563 (Trinidad and Tobago). Such differences in ranking and scale show the importance of using local health-state values to inform resource allocation decisions.

This was the first use of mixed logit models in EQ-VT (due to the use of a new DCE design, which has more choice tasks per respondent, which allows for the mixed logit model to be identified). The current study shows that an expanded health state design for the DCE task allows us to estimate mixed logit models on EQ-VT data, which are theoretically superior to the standard conditional logit model. The results of the mixed logit models were, after rescaling, similar to those of the cTTO-only and hybrid models.

Limitations and strengths

This study had some minor limitations in the sampling. Females in the 35–44 and 55–64 age groups were slightly over-represented. More educated groups were also over-represented but this has been found to have little or no impact on EQ-5D valuation results [30, 31]. Another limitation is that there were some protocol compliance issues with some of the interviewers. Some of these could be solved during the pilot phase but some issues persisted beyond, which had to be resolved during the data collection. Furthermore there were also some interviewer effects suggesting that there may have been some differences in how interviewers conducted their interviews, although this could also be associated with demographic differences in the respondents interviewed by each interviewer.

There are several strengths and contributions of this study. First, our study design allowed the use of mixed logit models to analyse the DCE data; the first country to do so for the EQ-5D-5L using the EQ-VT protocol. Second, the study created an updated set of values for the Trinidad and Tobago population that will replace the crosswalk values. Lastly, the crosswalk was originally created to provide interim value sets for countries that had EQ-5D-3L value sets. Crosswalk sets can also be used in resource-constrained settings to allow users time to build capacity in working with health outcomes. This would facilitate the use of EQ-5D without the resource commitment for EQ-VT, until an EQ-5D-5L valuation study can be undertaken. This study gives users in Trinidad and Tobago the opportunity to move to an updated value set that represents the preferences of the population as of 2022.

Conclusion

This study has produced a state of the art EQ-5D-5L value set for Trinidad and Tobago that can now be used in clinical and resource-allocation decision-making. Changes in the value sets for Trinidad and Tobago over the period 2016 to 2022 highlight the need for revising EQ-5D value sets to ensure they are developed using the highest standard of current practice, and represent the current preferences of the national population. Furthermore, this value set represents the first value set developed using the EQ-VT protocol in the Caribbean and may be used as a reference case for countries in the region with similar population characteristics.