Key Points for Decision Makers

The EQ-5D-5L Mexican value set will facilitate both cross-country comparisons and inclusion of Mexican participants in international multicenter clinical trials of new medical technologies and quality-of-life studies.

The Mexican value set will facilitate the inclusion of QALYs into the HTA process that Mexican healthcare authorities apply for new medical technologies to be financed by public healthcare institutions.

The Mexican value set will allow use of EQ-5D-5L, a preference-based health-related quality-of-life measure, to quantify patients’ health outcomes in Mexico. These data can be used in health economic analysis and contribute to the monitoring of healthcare service quality in the Mexican context.

1 Introduction

Mexico, with a population of just over 125 million in 2018 [1], is considered the second-largest market for medical technology in Latin America, after Brazil [2]. Since 2003, the General Health Council (GHC) has acted as the main health technology assessor (decision maker) for public institutions, because this collegiate body is solely responsible for continuously maintaining and updating the Basic Formulary of Medications and Healthcare Supplies Catalogue (BFMHSC) [3,4,5,6,7]. This document groups, characterizes, and encodes the drugs, medical materials, instruments, medical equipment, and diagnostics used by the public institutions of the National Health System to provide health services to the population. In 2020, the GHC decided to transform the BFMHSC into the National Compendium of Healthcare Supplies (“Compendium”). The Compendium aims to strengthen the evaluation of healthcare, to optimize the public resources directed at addressing the country's health problems, and to inform health professionals and keep them up to date. The GHC periodically updates the health technology assessment (HTA) processes used to determine inclusion in the Compendium.

HTA in Mexico has relied mainly on cost per life-year analyses. However, the GHC wishes to extend this to include quality of life and quality-adjusted life years (QALYs). Including evidence on cost per QALY gained requires measuring and valuing health-related quality of life. Several generic preference-based measures of health-related quality of life exist for this purpose; one of the most widely used internationally to facilitate cost-effectiveness analysis is the EQ-5D-5L [8]. The EQ-5D-5L summarizes health in terms of five dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression), each with five levels of problems (no = 1, slight = 2, moderate = 3, severe = 4, extreme/unable to = 5). Its use in HTA involves collecting EQ-5D-5L data from patients (e.g., in clinical trials) and summarizing those data using a “value set.” A value set indicates how good or bad each health state is on a scale anchored at 1 (full health) and 0 (dead), as required for the estimation of QALYs. Value sets are generally based on the views of a general public sample, obtained using stated preference elicitation methods [9].
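To make the descriptive system concrete, the following minimal Python sketch (illustrative only; the function names are ours and are not part of any EuroQol software) represents a health state as a five-digit code with one digit per dimension, in the order listed above, and enumerates all 5^5 = 3125 states the system can describe.

```python
from itertools import product

# Dimension order used in five-digit EQ-5D-5L codes:
# mobility, self-care, usual activities, pain/discomfort, anxiety/depression.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def parse_profile(code: str) -> dict[str, int]:
    """Split a five-digit profile such as '13122' into its dimension levels."""
    if len(code) != 5 or any(ch not in "12345" for ch in code):
        raise ValueError(f"not a valid EQ-5D-5L profile: {code!r}")
    return {dim: int(ch) for dim, ch in zip(DIMENSIONS, code)}


def all_profiles() -> list[str]:
    """Every health state in the descriptive system: 5 levels in each of 5 dimensions."""
    return ["".join(str(level) for level in levels)
            for levels in product(range(1, 6), repeat=5)]


print(len(all_profiles()))      # 3125
print(parse_profile("13122"))   # {'MO': 1, 'SC': 3, 'UA': 1, 'PD': 2, 'AD': 2}
```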

The GHC is strongly interested in adopting results from an internationally well-established methodology for valuing health-related quality of life that is suitable for routine use in HTA. Hence, the GHC supported a project developed by the Economic Analysis Unit (EAU) of the Mexican Ministry of Health to estimate the first EQ-5D-5L value set for the Mexican population using the EuroQol Group’s international EQ-5D-5L valuation protocol and software. The rationales for the GHC and EAU selecting the EQ-5D-5L include the instrument's widespread use in cost-utility analysis worldwide [10] and its acceptance by key HTA bodies for use in evidence submitted to their decision-making processes [11,12,13]. The EQ-5D-5L is also suitable for use in population health studies and in routine outcomes measurement in healthcare systems (such as the English NHS PROMs programme and the Swedish National Quality Registers) [14, 15]. A further rationale is the availability of an international protocol to support and guide the generation of a value set. This protocol has been used to generate EQ-5D-5L value sets in over 20 countries to date, and its methods and quality-control processes have been refined and strengthened to ensure that population preferences are captured robustly, as required for application in HTA [16].

To address the GHC’s requirements, the main objective of this study was to generate a value set for the Mexican adult general population to support and facilitate the inclusion of QALYs into the HTA process of the Mexican healthcare authorities.

2 Methods

2.1 Study Design

A nationally representative sample of Mexican adults (aged 18 years and older), stratified by sex, age, and socioeconomic level, was designed and used. The design followed the geographical and population frames developed by the National Population Council (CONAPO) [1] and the Mexican Office of Statistics and Geography [17], as well as the socioeconomic classification of households of the Mexican Association of Marketing Research and Public Opinion Agencies (AMAI) [18]. To obtain the 1000 completed interviews recommended by the EQ-5D-5L valuation protocol, the sample allowed for a 15% non-response rate (see the detailed sampling design in Section 1 of the ESM Appendix).
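The non-response allowance amounts to inflating the target sample. The paper does not report the gross sample drawn, so the sketch below assumes one common convention (dividing the target by the expected response rate and rounding up); multiplying the target by 1.15 instead would give 1150.

```python
import math


def gross_sample_size(target_completes: int, non_response_rate: float) -> int:
    """People to approach so that about `target_completes` interviews remain after non-response.

    Assumed convention: completes = approached * (1 - non_response_rate), rounded up.
    """
    if not 0.0 <= non_response_rate < 1.0:
        raise ValueError("non_response_rate must lie in [0, 1)")
    return math.ceil(target_completes / (1.0 - non_response_rate))


print(gross_sample_size(1000, 0.15))  # 1177
```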

Two methods were used to elicit stated preferences from the sample: (a) composite time trade-off (cTTO), comprising “conventional” TTO to obtain values ≥ 0 and “lead-time TTO” to obtain values < 0 [19]; and (b) discrete-choice experiments (DCE), involving pairwise choices between health states. Face-to-face computer-assisted interviews were conducted with members of the Mexican general public aged 18 years and over. The study design, sampling, and data quality monitoring followed version 2.0 of the EQ-5D-5L valuation protocol, which has emerged from previous studies as best practice [14]. This included a training process for the Mexican study team and weekly follow-up meetings with EuroQol Office scientists during the data collection phase. The protocol also recommends a minimum sample of 1000 usable respondents [20].
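How cTTO responses map onto values can be illustrated with a short sketch. It assumes the usual EQ-VT settings of a 10-year time frame for conventional TTO and a 10-year lead time for lead-time TTO; these settings are not stated in this paper, and the function is hypothetical rather than part of the EQ-VT software.

```python
def ctto_value(years_full_health: float, lead_time: bool) -> float:
    """Convert a cTTO indifference point into a health-state value.

    Conventional TTO (assumed): 10 years in the state is judged equal to x years in
    full health, so value = x / 10, which lies in [0, 1].
    Lead-time TTO (assumed): 10 years in full health followed by 10 years in the state
    is judged equal to x years in full health (0 <= x <= 10), so
    value = (x - 10) / 10, which lies in [-1, 0].
    """
    x = years_full_health
    if not 0.0 <= x <= 10.0:
        raise ValueError("indifference point must lie between 0 and 10 years")
    return (x - 10.0) / 10.0 if lead_time else x / 10.0


print(ctto_value(7.5, lead_time=False))  # 0.75
print(ctto_value(5.0, lead_time=True))   # -0.5
print(ctto_value(0.0, lead_time=True))   # -1.0
```

The last case shows why −1 is the lowest value the task can record, which is the reason responses are treated as censored in Section 2.5.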

It was decided a priori to base the Mexican value set on cTTO data only, provided that the cTTO data were of high quality and the resulting value set showed desirable characteristics such as logically ordered parameter estimates. This is in line with the approach used in many other countries’ value-set studies (e.g., [21,22,23,24,25]). DCE data alone can be used to obtain values on a latent scale, but these values are not anchored at 1 (full health) and 0 (dead) and so do not meet the conventions of QALY estimation [26]. cTTO and DCE data can be combined via hybrid modelling [27], but there is a lack of consensus about the merits of this approach; notably, the new EQ-5D-5L value set for the UK, now underway, will be based on cTTO data only [28]. The design, methods, results, and analyses relating to the DCE data are reported in Section 2 of the ESM Appendix.

2.2 Valuation Intervention and Methods of Eliciting Preferences

The EuroQol Valuation Technology (EQ-VT) software captured respondents’ preferences regarding EQ-5D-5L health states. A subset of 86 health states was valued directly in the cTTO tasks. From this subset, blocks of ten health states were generated, each containing the worst health state (55555), one mild state (level 2 in one dimension and level 1 in all others), and eight states of varying severity. Further details about the experimental design are reported by Oppe and van Hout [20]. The software randomly assigned participants to cTTO blocks.
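The block structure can be expressed as a simple validity check. This is a sketch of the constraints stated above, not the EQ-VT design algorithm [20]; the example block is hypothetical and its states were chosen only for illustration.

```python
def is_mild(state: str) -> bool:
    """Level 2 in exactly one dimension and level 1 in the other four (e.g. '21111')."""
    return sorted(state) == ["1", "1", "1", "1", "2"]


def is_valid_block(block: list[str]) -> bool:
    """Ten distinct states including 55555 and exactly one mild state (so eight others)."""
    return (
        len(block) == 10
        and len(set(block)) == 10
        and "55555" in block
        and sum(is_mild(s) for s in block) == 1
    )


# Hypothetical block, for illustration only.
example = ["55555", "21111", "12345", "34232", "44553",
           "43555", "52455", "23124", "31514", "15342"]
print(is_valid_block(example))  # True
```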

2.3 Data Collection and Quality Control Process

Data were collected between June and August 2019 by a team of 15 interviewers. The interviewers were chosen from 42 candidates recruited by a public opinion agency (De las Heras—Demotecnia). Selection criteria included a social science background (sociologists, social anthropologists, social workers, psychologists, historians) and experience in in-depth interviewing. During the training process, two interviewers opted out for personal reasons, and another was dismissed during the pilot phase for not meeting the required quality standards [29].

2.4 Data Exclusions

Prior to commencing data analysis, the study team agreed to exclude two sets of data on quality grounds: (a) cTTO data for any respondent who gave the same value to all ten health states; and (b) any cTTO observations that the respondent flagged for exclusion via the feedback module. The feedback module is a feature of the EQ-VT that shows respondents the rank ordering of health states implied by their cTTO responses and lets them flag any health states they feel, on reflection, may have been evaluated inappropriately. The former exclusion rule is commonly used in value set studies [30]; the latter has regularly been used in recent EQ-5D-5L valuation studies following the introduction of the feedback module (e.g., [31]).
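The two exclusion rules can be sketched as a filter over a long-format list of observations. The field names ('respondent', 'state', 'value', 'flagged') are hypothetical, and each toy respondent values two states rather than ten. Rule (a) is evaluated on the raw data, before flagged observations are removed.

```python
from collections import defaultdict


def apply_exclusions(observations: list[dict]) -> list[dict]:
    """Apply rule (a) (respondent-level) and rule (b) (observation-level)."""
    # Rule (a): find respondents whose values contain only one distinct number.
    values_by_respondent = defaultdict(set)
    for obs in observations:
        values_by_respondent[obs["respondent"]].add(obs["value"])
    uniform = {r for r, vals in values_by_respondent.items() if len(vals) == 1}
    # Rule (b): drop individual observations flagged in the feedback module.
    return [obs for obs in observations
            if obs["respondent"] not in uniform and not obs["flagged"]]


toy = [
    {"respondent": 1, "state": "55555", "value": -0.5, "flagged": False},
    {"respondent": 1, "state": "21111", "value": 0.95, "flagged": True},
    {"respondent": 2, "state": "55555", "value": 1.0, "flagged": False},
    {"respondent": 2, "state": "21111", "value": 1.0, "flagged": False},
]
print(len(apply_exclusions(toy)))  # 1
```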

Interviewer effects were also assessed, for example by comparing interviewers’ cTTO value distributions. If the data for a given interviewer showed highly unusual patterns or suggested protocol non-compliance, those data were to be excluded on quality grounds. However, such effects were monitored throughout the data collection phase, which minimised the need for data exclusions: interim data were analysed weekly, and interviewers received regular feedback on their performance (such as whether they were spending sufficient time explaining the TTO task to respondents), in line with best practice [29].

2.5 Data Analysis and Modelling

Descriptive analyses were used to examine responses to the tasks. For cTTO, this included inspecting the overall distribution of values and calculating means, medians, and standard deviations, both for each health state and grouped by level sum score (LSS, the sum of the five dimension levels; e.g., health state 13122 has an LSS of 1 + 3 + 1 + 2 + 2 = 9). We also examined the proportion of respondents with logical inconsistencies in their cTTO data, i.e., cases where a respondent gave a higher value to state A than to state B even though B is at least as good as A on every dimension.
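Both quantities can be computed directly from the five-digit codes. The sketch below is one possible implementation of the definitions above (function names are illustrative); ties in value are not counted as inconsistencies.

```python
from itertools import combinations


def level_sum_score(state: str) -> int:
    """Sum of the five dimension levels, e.g. '13122' -> 9."""
    return sum(int(ch) for ch in state)


def dominates(b: str, a: str) -> bool:
    """True if state b differs from a and is at least as good as a on every dimension."""
    return b != a and all(int(lb) <= int(la) for lb, la in zip(b, a))


def has_inconsistency(values: dict[str, float]) -> bool:
    """True if some state was valued strictly higher than a state that dominates it."""
    for s, t in combinations(values, 2):
        for a, b in ((s, t), (t, s)):
            if dominates(b, a) and values[a] > values[b]:
                return True
    return False


print(level_sum_score("13122"))                           # 9
print(has_inconsistency({"21111": 0.2, "55555": 0.5}))    # True
```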

The data modelling strategy sought to take three key features of the cTTO data into account. First, the minimum cTTO value is bounded at −1 by design, but respondents might have traded more time in full health had they been given the opportunity to do so; responses were therefore treated as left-censored at −1. Second, each respondent completed ten cTTO tasks, so the modelling accounted for the possibility that observations from the same respondent would be more similar to each other than to those from other respondents. Third, the variance of TTO values is commonly observed to increase with the severity of the health states (see [26, 31, 32]). The extent to which this held in the Mexican data was examined by calculating, for each health state, the standard deviation of the residuals from a 20-parameter generalized least-squares (GLS) model, and the modelling sought to account for this heteroscedasticity.
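The residual check can be sketched as a grouped standard deviation. In the study the predicted values came from the 20-parameter GLS model; the observed and predicted numbers below are hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def residual_sd_by_state(states, observed, predicted):
    """Standard deviation of residuals (observed - predicted) for each health state."""
    residuals = defaultdict(list)
    for state, y, y_hat in zip(states, observed, predicted):
        residuals[state].append(y - y_hat)
    # stdev needs at least two residuals per state
    return {s: stdev(r) for s, r in residuals.items() if len(r) >= 2}


states = ["21111", "21111", "55555", "55555"]
observed = [0.95, 0.85, 0.0, -1.0]     # hypothetical cTTO values
predicted = [0.9, 0.9, -0.5, -0.5]     # hypothetical model predictions
sds = residual_sd_by_state(states, observed, predicted)
print({s: round(v, 3) for s, v in sds.items()})  # {'21111': 0.071, '55555': 0.707}
```

A larger spread for the more severe state, as in this toy output, is the pattern that motivates a heteroscedastic error term.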

Four 20-parameter main-effects cTTO models were estimated. Model 1 was estimated by GLS regression and accounted for the second feature of the cTTO data noted above, i.e., observations from the same respondent might be more similar than those from other respondents. Model 2 was estimated by Tobit regression and addressed features 1 (cTTO data censored at −1) and 2. Model 3 is a heteroscedastic model with Bayesian estimation that accounted for features 2 and 3 (the variance of TTO values might increase as the severity of the health states increases). Model 4 addressed all three features using a heteroscedastic censored model with Bayesian estimation. The Bayesian estimation was conducted by Markov chain Monte Carlo (MCMC) simulation with a random-walk Metropolis–Hastings algorithm [33]. To confirm the presence of feature 3 in the cTTO data, we tested whether the variance of the residuals in the 20-parameter main-effects GLS model was constant. Final model selection was informed by theoretical considerations relating to the characteristics of the cTTO data, logical ordering of the parameter estimates (i.e., larger decrements are expected for more severe problems), significance of the parameters, and relevant information criteria (Bayesian information criterion (BIC), Akaike information criterion (AIC), and deviance information criterion (DIC)). As model 4 addressed all features of the cTTO data, its specification is presented in detail below:

The latent variable \({\text{cTTO}}_{ij}^{*}\) is censored at −1. As a result, the observed variable \({\text{cTTO}}_{ij}\) can only take values of −1 or greater:

$$ {\text{cTTO}}_{ij} = \left\{ {\begin{array}{*{20}c} {{\text{cTTO}}_{ij}^{*} } & {{\text{if }}\;{\text{cTTO}}_{ij}^{*} > - 1} \\ { - 1{ }} & {{\text{if}}\;{\text{cTTO}}_{ij}^{*} \le - 1} \\ \end{array} } \right.. $$

The observed \({\text{cTTO}}_{ij}\) was modelled using the 20-parameter specification in Eq. (1).

$$ \begin{aligned} {\text{cTTO}}_{ij} & = 1 - \bigg(\beta_{1} {\text{MO}}2_{j} + \beta_{2} {\text{MO}}3_{j} + \beta_{3} {\text{MO}}4_{j} + \beta_{4} {\text{MO}}5_{j} \\ & \quad + \beta_{5} {\text{SC}}2_{j} + \beta_{6} {\text{SC}}3_{j} + \beta_{7} {\text{SC}}4_{j} + \beta_{8} {\text{SC}}5_{j} \\ & \quad + \beta_{9} {\text{UA}}2_{j} + \beta_{10} {\text{UA}}3_{j} + \beta_{11} {\text{UA}}4_{j} + \beta_{12} {\text{UA}}5_{j} \\ & \quad + \beta_{13} {\text{PD}}2_{j} + \beta_{14} {\text{PD}}3_{j} + \beta_{15} {\text{PD}}4_{j} + \beta_{16} {\text{PD}}5_{j} \\ & \quad + \beta_{17} {\text{AD}}2_{j} + \beta_{18} {\text{AD}}3_{j} + \beta_{19} {\text{AD}}4_{j} + \beta_{20} {\text{AD}}5_{j} \bigg) + u_{i} + \varepsilon_{ij} \\ \end{aligned} $$
(1)
$$ u_{i} \sim {\text{iid}}\, N\left( {0,\sigma_{u}^{2} } \right) $$
$$ \varepsilon_{ij} \sim {\text{ iid }}N\left( {0,\sigma_{j}^{2} } \right), $$

where the 20 \(\beta\) parameters represent the utility decrements from full health associated with each level (2–5) of each dimension. A health state is described by 20 dummy variables, MO2–MO5 (Mobility levels 2–5), SC2–SC5 (Self-Care levels 2–5), UA2–UA5 (Usual Activities levels 2–5), PD2–PD5 (Pain/Discomfort levels 2–5), and AD2–AD5 (Anxiety/Depression levels 2–5), each equal to one if that dimension-level applies to the health state and zero otherwise, with level 1 as the reference category; \(u_{i}\) is the respondent-level random intercept; \(\varepsilon_{ij}\) is a heteroscedastic error term; subscript i indexes respondents; subscript j indexes the valuation tasks completed; and iid denotes independent and identically distributed.
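The dummy coding can be sketched in a few lines (the function name is illustrative): each state switches on at most one of the four dummies per dimension, and level 1 switches on none.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def dummies(state: str) -> dict[str, int]:
    """20 main-effects dummies MO2..AD5 for a five-digit state; level 1 is the reference."""
    coding = {}
    for dim, ch in zip(DIMENSIONS, state):
        for level in range(2, 6):
            coding[f"{dim}{level}"] = int(int(ch) == level)
    return coding


d = dummies("12345")
print([name for name, on in d.items() if on])  # ['SC2', 'UA3', 'PD4', 'AD5']
```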

The heteroscedastic error term is specified by Eq. (2), which allows for an exponential relationship between the variance of the cTTO values and the severity of health states.

$$ \begin{aligned} \sigma_{j}^{2} & = \exp \bigg(\alpha_{0} + \alpha_{1} \bigg(1 - \bigg(\beta_{1} {\text{MO}}2_{j} + \beta_{2} {\text{MO}}3_{j} + \beta_{3} {\text{MO}}4_{j} + \beta_{4} {\text{MO}}5_{j} \\ & \quad + \beta_{5} {\text{SC}}2_{j} + \beta_{6} {\text{SC}}3_{j} + \beta_{7} {\text{SC}}4_{j} + \beta_{8} {\text{SC}}5_{j} \\ & \quad + \beta_{9} {\text{UA}}2_{j} + \beta_{10} {\text{UA}}3_{j} + \beta_{11} {\text{UA}}4_{j} + \beta_{12} {\text{UA}}5_{j} \\ & \quad + \beta_{13} {\text{PD}}2_{j} + \beta_{14} {\text{PD}}3_{j} + \beta_{15} {\text{PD}}4_{j} + \beta_{16} {\text{PD}}5_{j} \\ & \quad + \beta_{17} {\text{AD}}2_{j} + \beta_{18} {\text{AD}}3_{j} + \beta_{19} {\text{AD}}4_{j} + \beta_{20} {\text{AD}}5_{j} \bigg)\bigg)\bigg) . \\ \end{aligned} $$
(2)
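As an illustration of how Eqs. (1) and (2) combine with censoring, the sketch below computes the log-likelihood contribution of a single observation conditional on the respondent's random intercept \(u_i\). It is not the authors' Stata implementation: priors, sampling of \(u_i\), and the MCMC machinery are omitted, and the parameter values in the checks are arbitrary. Here `mu` stands for \(1 - x_j'\beta\); if the variance rises with severity, one would expect \(\alpha_1 < 0\), since more severe states have lower `mu`.

```python
import math


def log_norm_pdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def obs_loglik(y: float, mu: float, u: float, alpha0: float, alpha1: float,
               lower: float = -1.0) -> float:
    """Censored heteroscedastic log-likelihood of one cTTO value, given u."""
    sigma = math.sqrt(math.exp(alpha0 + alpha1 * mu))  # Eq. (2)
    if y <= lower:
        # Value recorded at the bound: the latent value was at or below -1.
        return math.log(norm_cdf((lower - mu - u) / sigma))
    # Uncensored: normal density of the latent value, rescaled by sigma.
    return log_norm_pdf((y - mu - u) / sigma) - math.log(sigma)
```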

All models were run with and without applying the exclusion criteria. Comparisons were made with the value sets of other selected countries.

Stata/MP 16.0 was used for all statistical analyses.

3 Results

The mean interview duration—including background questions and task explanations as well as the cTTO and DCE tasks themselves—was 44.2 min (SD: 19.0 min; median: 41.9 min). Of the sample, 38.0% self-reported being in health state 11111. Table 1 summarizes the background characteristics of the sample.

Table 1 Sample background characteristics

3.1 Data Characteristics

Respondents flagged 2.1% of cTTO responses as problematic via the feedback module. These responses were excluded, leaving 9787 cTTO observations for analysis. No respondent gave the same value to all health states, and no other data were excluded. The majority of respondents (70.6%) had no logical inconsistencies within their cTTO data; before excluding responses flagged via the feedback module, 67.3% of respondents had no logical inconsistencies.

Mean cTTO values decreased, and their standard deviations increased, as LSS increased. The mean observed cTTO values ranged from 0.96 (for 11121) to −0.66 (for 55555). Four of the 86 directly valued health states (55555, 52455, 44553, and 43555) had a negative mean value. Of the 9787 cTTO valuations, 1925 (19.7%) were below 0. The proportions of values clustered at −1, 0, and 1 were 6.9%, 1.3%, and 4.5%, respectively. Descriptive statistics of the values, by health state and by LSS, as well as the value distribution, are shown in Tables 1 and 2 of Section 3 in the ESM Appendix. The histogram of observed cTTO values is presented in Fig. 1 of Section 3 in the ESM Appendix.

3.2 Modelling Results

We checked whether our cTTO data showed increasing variance with health-state severity and confirmed this in two ways. First, in the test of whether the variance of the residuals in the 20-parameter main-effects GLS model was constant, the null hypothesis was rejected [p < 0.0001; χ²(1000) = 32429.5]. Second, the standard deviation of the residuals was shown graphically to be greater for more severe health states (see Fig. 3 of Section 3 in the ESM Appendix). As the cTTO data exhibited all three features noted above, the chosen model needed to address all of them. Our preferred model is therefore model 4, i.e., the Bayesian heteroscedastic model with censoring at −1.

Table 2 shows the cTTO results using the preferred model. Just under 4.5% of all TTO observations were left-censored. Excluding observations flagged via the feedback module lowered the deviance information criterion (DIC) but did not substantially affect the model results. The only minor change was that, without exclusions, the Anxiety/Depression 2 (AD2) coefficient was marginally smaller than the Pain/Discomfort 2 (PD2) coefficient, whereas after exclusions the AD2 coefficient was marginally larger. All of the coefficients are logically ordered; that is, for any given dimension, a higher level of problems has a larger coefficient (and therefore confers more disutility) than a lower level.

Table 2 cTTO model results: Heteroscedastic censored model with Bayesian estimation

The MCMC sample size is 10,000, obtained from 12,500 MCMC iterations in total with the first 2500 iterations discarded as burn-in. MCMC diagnostics suggest good performance of the preferred model: the acceptance rate was 43.91%, and the average efficiency was 7.9%.
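For readers unfamiliar with the sampler, the sketch below shows a generic random-walk Metropolis–Hastings loop using the same run length and burn-in as above. It targets a toy one-dimensional standard normal density, not the model's posterior, and the step size of 2.4 is an arbitrary choice for the toy; it does not reproduce the study's estimation [33], and the average-efficiency statistic is not computed here.

```python
import math
import random


def rw_metropolis(log_target, x0, n_iter, burn_in, step, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target density.

    Returns the draws kept after burn-in and the acceptance rate over all iterations.
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    kept, accepted = [], 0
    for t in range(n_iter):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian random-walk proposal
        log_p_prop = log_target(proposal)
        # Symmetric proposal: accept with probability min(1, p(proposal) / p(x)).
        if log_p_prop >= log_p or rng.random() < math.exp(log_p_prop - log_p):
            x, log_p = proposal, log_p_prop
            accepted += 1
        if t >= burn_in:
            kept.append(x)
    return kept, accepted / n_iter


# Toy target: standard normal log-density, up to an additive constant.
draws, rate = rw_metropolis(lambda v: -0.5 * v * v, x0=0.0,
                            n_iter=12_500, burn_in=2_500, step=2.4)
print(len(draws), round(rate, 3))
```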

In the other models estimated (see Table 3 in Section 3 of the ESM Appendix), the coefficient for Mobility 2 (MO2) is negative but non-significant; all other coefficients are logically ordered and consistent with those of the preferred model. In all models, the largest utility decrement for a dimension-level is for PD5; the smallest is for MO2. There are large utility decrements in the moves from PD4 and AD4 to PD5 and AD5, respectively.

3.3 Using the Preferred Value-Set Model to Calculate EQ-5D-5L Health-State Values

The EQ-5D-5L Mexican value set is based on cTTO data (with exclusions applied) modelled using the Bayesian heteroscedastic model with censoring at −1 (Table 2). To use this model as an algorithm for obtaining EQ-5D-5L health-state values, the parameter estimates for each relevant dimension-level combination are subtracted from 1 (which represents full health); level 1 of any dimension carries no decrement. For example, the value for health state 12345 is 1 − 0 − 0.0476 − 0.0952 − 0.2283 − 0.3337 = 0.2952.
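The scoring rule can be sketched as follows. Only the four coefficients that appear in the worked example are included; the remaining coefficients are in Table 2 and are deliberately not reproduced here, so the function raises a `KeyError` for any state that needs one of them.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")

# Decrements taken from the worked example for health state 12345 only.
# The full set of 20 coefficients is in Table 2 and must be supplied for other states.
KNOWN_DECREMENTS = {"SC2": 0.0476, "UA3": 0.0952, "PD4": 0.2283, "AD5": 0.3337}


def health_state_value(state: str, decrements: dict[str, float]) -> float:
    """1 minus the decrement for each dimension at level 2-5 (level 1 has none)."""
    value = 1.0
    for dim, ch in zip(DIMENSIONS, state):
        if ch != "1":
            value -= decrements[f"{dim}{ch}"]  # KeyError if the coefficient is missing
    return round(value, 4)


print(health_state_value("12345", KNOWN_DECREMENTS))  # 0.2952
```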

The value set ranges from 0.984 for 21111 (the mildest health state other than full health, describing slight problems in walking about and no problems on the other dimensions) to −0.596 for 55555 (the worst health state in the descriptive system). Based on the sizes of the level-5 coefficients, the most important dimension is pain/discomfort, followed by anxiety/depression, usual activities, mobility, and self-care.

3.4 Comparison with Value Sets of Other Countries

The results from the Mexican study can be compared with those of Uruguay [21], the first Latin American country to undertake an EQ-5D-5L valuation study, and the USA [25], Mexico’s neighbour to the north (Table 3). All three countries based their value sets on cTTO data only. Mexico sits halfway between Uruguay and the USA in terms of the proportion of EQ-5D-5L health states with a modelled value below zero, although its minimum value of −0.596 is the lowest of the three countries. Compared with the value sets of Uruguay and the USA, the Mexican value set places greater importance on anxiety/depression and usual activities, and less importance on mobility and self-care.

Table 3 Comparison of Mexican, Uruguayan, and USA value-set characteristics
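Characteristics such as the proportion of states with a modelled value below zero and the minimum value can be obtained by enumerating all 3125 states. The sketch requires the full coefficient table, which is not reproduced here, so it is demonstrated with deliberately artificial decrements that bear no relation to the Mexican, Uruguayan, or US value sets.

```python
from itertools import product

DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def summarise_value_set(decrements: dict[str, float]) -> tuple[float, float]:
    """Share of all 3125 states with a value below 0, and the minimum value.

    `decrements` must map every key MO2..AD5 to its coefficient; level 1 has no decrement.
    """
    values = []
    for levels in product(range(1, 6), repeat=5):
        value = 1.0
        for dim, level in zip(DIMENSIONS, levels):
            if level > 1:
                value -= decrements[f"{dim}{level}"]
        values.append(value)
    share_negative = sum(v < 0 for v in values) / len(values)
    return share_negative, min(values)


# Artificial decrements (NOT a real value set): 0.25 for any problem level.
toy = {f"{dim}{level}": 0.25 for dim in DIMENSIONS for level in range(2, 6)}
print(summarise_value_set(toy))  # (0.32768, -0.25)
```

With these toy decrements only states with problems on all five dimensions (4^5 = 1024 of 3125) fall below zero, which makes the output easy to verify by hand.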

Another Latin American country—Peru—recently completed an EQ-5D-5L value-set study, albeit using a “Lite” protocol that is less reliant on the cTTO [34].

4 Discussion

The value set reported in this paper will enable evidence on QALYs, based on the preferences of Mexican people, to inform the HTA processes that Mexican healthcare authorities use.

While both cTTO and DCE data were collected, following the EuroQol EQ-5D-5L valuation protocol, the study team decided to base the value set on cTTO data. A number of other countries have also made this decision—including the USA and Uruguay, as noted earlier. Notably, the new EQ-5D-5L value set for the UK, recently announced by NICE and the EuroQol Group, will be based only on cTTO data as well [28]. Given the importance of this work for HTA in Mexico, and the greater acceptance of cTTO as a basis for the value sets to be used in HTA, this was deemed the preferred option a priori. The possibility of a hybrid model—where both DCE and cTTO data are modelled together—was considered but rejected, given the lack of consensus regarding hybrid models.

It is important to note the patterns in utility decrements associated with movements between levels on each dimension in the cTTO model. The largest differences in coefficients occur between levels 4 and 5 (on four of the dimensions), and these differences are particularly marked for pain/discomfort and anxiety/depression. For example, the difference between PD5 and PD4 is 0.2296. These characteristics of the value set will influence the estimation of QALY gains; for example, they imply that a one-level improvement from extreme pain/discomfort will yield larger QALY gains than a one-level improvement from moderate pain/discomfort, ceteris paribus. Cost-effectiveness analyses informed by the value set will reflect these patterns of preferences. The availability of an EQ-5D-5L value set will facilitate local data collection using this instrument to inform future HTA decisions in Mexico. Exactly how the value set is to be used in HTA (for example, whether QALY estimates based on it should be accompanied by sensitivity analyses based on the standard errors) will require consideration and guidance from local decision makers.
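Because the model is additive in main effects, the utility gain from a one-level change in a single dimension equals the difference between the two coefficients, whatever the levels of the other dimensions. A minimal sketch, assuming an undiscounted gain sustained for a fixed period: PD4 (0.2283) is taken from the worked example for health state 12345, PD5 is derived as 0.2283 + 0.2296 from the difference quoted above, and the 2-year duration is purely illustrative.

```python
def qaly_gain(decrement_before: float, decrement_after: float, years: float) -> float:
    """Undiscounted QALY gain from moving one dimension to a better level for `years`,
    with other dimensions unchanged (additive main-effects value set)."""
    return (decrement_before - decrement_after) * years


PD4 = 0.2283            # from the worked example for state 12345
PD5 = PD4 + 0.2296      # PD5 - PD4 = 0.2296, as reported in the text
print(round(qaly_gain(PD5, PD4, years=2.0), 4))  # 0.4592
```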

Some limitations of the study should be mentioned. The design and methodology followed a well-established international valuation protocol, which involves basing health-state utility values on the preferences of the general population. This practice is informed by normative arguments (e.g., the argument that preferences ought to be elicited from behind a veil of ignorance rather than from individuals with vested interests) [35]. However, it has been criticized in the literature and there have been calls to use preferences from patients (either instead of or in addition to public preferences) [36, 37] as patients are likely to be better informed than the public about what it is like to live in states of impaired health.

Preference data were collected using two techniques (cTTO and DCE), but the value set is based only on the cTTO data. While there are reasonable grounds for this decision (described above), it means that the DCE data collected will play no role in informing HTA decision making in Mexico. Further, the TTO technique is itself subject to limitations [38], though this is true of all stated preference methods. The cTTO and DCE data showed markedly different preference patterns (see Section 2 of the ESM Appendix for an overview of the DCE results), and our analyses to date have not been able to fully explain the reasons for these differences. Qualitative methods, such as think-aloud interviews, may help to explore this issue, but these were not included in the current study.

5 Conclusions

The EQ-5D-5L value set reported in this study is the first value set produced for any health-related quality of life instrument in Mexico. The provision of a value set, which reflects the distinctive preferences of the adult Mexican general public, will facilitate and support use of EQ-5D-5L in multicenter clinical trials and quality-of-life studies. This will in turn improve the evidence used in HTA to define technologies that might be financed by public healthcare institutions.