FormalPara Key Points for Decision Makers

Pakistan did not have an EQ-5D value set.

An EQ-5D-3L value set was derived from a moderately representative sample of the Pakistani general population. Data were collected using a rigorous quality control procedure which led to logical and significant models.

This Pakistani EQ-5D-3L value set is now becoming available and will be used by health-economic evaluations and health-related quality-of-life studies in Pakistan that use EQ-5D.

1 Introduction

The EQ-5D-3L is one of the most widely used preference-based health-related quality-of-life (HRQoL) measures worldwide [1]. It was the first of the family of instruments developed by the EuroQol Group [1, 2]. Since its development, it has been translated into more than 150 languages and in various modes of administration. As the multi-attribute utility instrument (MAUI) preferred by most published pharmacoeconomic guidelines, it has been used all over the world to inform resource allocation decisions in economic evaluations [2, 3], and has been accepted as a valid and responsive tool in multiple disease areas, conditions and cultural contexts [4, 5].

The EQ-5D-3L consists of five domains describing health in terms of mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each domain has three response levels describing the severity of the health problems from no problems, moderate problems to extreme problems, which defines a total of 243 health states [6]. To assist its role in economic evaluations, health state utilities that range from 0 (equivalent to death) to 1 (equal to full health) are needed to quantify the quality-of-life measures. To this end, valuation studies around world have been conducted to derive the 0–1 utility scale. To date, at least 38 value sets have been developed and on-going (www.euroqol.org).

Pakistan is the world’s fifth-most populous country with almost 242 million people, and has the world’s second-largest Muslim population [7]. Despite the high burden of disease and resource constraints, there is a limited use of patient-centred evidence to inform medical decision-making and/or resource allocation [8]. To date, health outcome measurement in Pakistan has relied almost exclusively on clinical endpoints. Hence, including patient-reported outcomes (PRO) measures in decision-making is one step further towards patient autonomy, engagement and dignity within the healthcare system [6]. To improve healthcare resource allocation and patient access, understanding of the public health preferences should play a significant role in priority setting and allocation of the scarce resources [4]. During recent years, an increasing awareness regarding the importance of pharmacoeconomics in Pakistan has been witnessed, which has highlighted the need to conduct high-quality economic evaluations to support and inform pricing and reimbursement decisions and to develop preference-based measures in different disease states [5]. Pakistan intends to employ cost-effectiveness analysis for new and existing medical interventions. The Pakistani government also encourages studies and health decision-making based on cost utility analysis [6].

To date, no value sets exist for either the EQ-5D-3L or the EQ-5D-5L in Pakistan. Previous EQ-5D studies conducted in Pakistan ‘borrowed’ health preferences developed either in Thailand or from the UK [9, 10]. However, for a value set to be valid for Pakistan it should represent the culture and living standards of Pakistan. For this reason, a pilot study was conducted which reported the feasibility of the preference elicitation methods in Pakistani population for the EQ-5D-3L valuations, namely, the time trade-off (TTO) and the discrete choice experiments (DCE) [11]. Based on the confirmation of feasibility of the EQ-5D-3L, the main objective of this study was to conduct a valuation task subsequently to derive a national value set so that calculation of quality-adjusted life years (QALYs) from a societal and Pakistani perspective.

2 Methods

2.1 Study Design

This study implemented the standardized EQ-5D-5L valuation protocol developed by EuroQol using the EQ-Portable Valuation Technology (EQ-PVT) [12]. Details of the study design were reported by Malik et al. [11], in which the descriptions of the pilot feasibility study that precedes this EQ-5D-3L valuation study were provided. Two preference elicitation methods were used, including the composite time-trade-off (cTTO) and discrete choice experiments (DCE). In the DCE task, respondents are presented with two EQ-5D-3L health states (State A and State B), and choose the health state they consider to be the best. The cTTO method is more complex. Respondents are presented with two different lives: 10 years in some EQ-5D-3L health state (Life B) and a number of years in full health, smaller than or equal to 10 years (Life A). Respondents choose which life they consider to be the best, and depending on their answer, the number of years in full health is subsequently varied. The cTTO task ends when respondents indicate they are indifferent between Life A and Life B, and a value is inferred from the number of life years in full health are traded. If respondents indicate they would prefer to die (0 years in full health in Life A), they are presented with a lead-time TTO task (LT-TTO), where Life A and Life B are preceded by an extra 10 years in full health (e.g. 10 years in full health in Life A versus 10 years in full health, followed by 10 years in some EQ-5D-3L state in Life B). The procedure of the LT-TTO task is then similar to the regular TTO task. Details of the DCE method [13] and the description of iteration process of cTTO method is available elsewhere [12, 14].

For cTTO, 28 health states being valued, grouped into 3 blocks with 10 health states in each block. The most severe health state, ‘33333’ (the most severe health problems defined by all EQ-5D-3L domains) was included and valued in all three blocks. For DCE, 60 pairs of health states grouped in 6 blocks were used.

All related documents including the interview guides, survey instructions and the EQ-PVT platform were translated into Urdu, the official language of Pakistan. The translation was done by a professional translator commissioned by the EuroQol Office and reviewed by the study team members. The ethics approval was obtained from the Health Research Ethics Committee at the Hamdard University (HREC, HUIC-091).

2.2 Respondents and Recruitment

A moderately representative sample was recruited from the Pakistani general population. A multi-stage stratified quota sampling approach was used on the basis of respondents’ ethnicity, age, gender and religion beliefs. Adults aged 18 years and older who were able to complete the interviews were included as respondents. Those who reported to have severe mental and/or physical illnesses at the time of survey were excluded. Chan KKW et al. (2020) explored the sample size and prediction accuracy of EQ-5D-3L values and reported that the effect of sample size and threshold for the minimum number of respondents is 300–500 [15]. Moreover, Hansen et al. [16] reported that, keeping in view the cost, the expected gain in prediction accuracy from increasing sample sizes beyond 300–500 respondents is minimal and that the choice of model can compensate for a smaller sample size. Taking this into contemplation, a total of 300 participants were recruited from the three cities using population proportionate to size sampling (PPS) to allocate this sample among the five major ethnic groups of Pakistan. The total sample of 300 participants was distributed among these ethnicities according to their proportion. We considered those cities which have higher density of these ethnic groups, such as Punjabi participants from Lahore, Kashmiri and Pashtun’s participants from Islamabad, and Karachi as the area for Sindhi and Baluchi ethnic communities. Moreover, PPS was also used to allocate quota for each demographic characteristic including gender, age and religion. Data collections were conducted in the three most populous and diverse cities of Pakistan including Islamabad (Capital), Lahore (Punjab) and Karachi (Sindh). Three interviewers who were trained by experts from the EuroQol Research Foundation recruited respondents and conducted the interviews in these cities. Recruitment used several strategies, for example, the networks of the interviewers, promoting via social media, posting flyers in local markets, pharmacies and respondents’ recommendations of other potential respondents. Interviews were performed between February 2019 and August 2019.

2.3 Data Quality Control

During the data collection, performance of the interviewers and the quality of data were checked continuously by the quality assurance officers from the EuroQol Research Foundation. Low-quality interviews and/or responses were identified as follows: (1) the lead-time time trade-off (LT-TTO) was not explained by the interviewers; (2) not enough time was spent on explaining the task, using the wheelchair examples—180 s was used as a lower limit; (3) the ten cTTO health states were valued in less than 5 min; and (4) inconsistency was spotted in the cTTO ratings (e.g. if the value for the worst state ‘33333’ was not the lowest value. Data collected by interviewers that met any of these criteria in at least 40% of their interviews were discarded, as they were of suspicious quality. Details of the quality control protocol can be found in the publication by Ramos-Goni et al. [17].

2.4 Data Analysis

Descriptive analyses were first performed to examine the respondents’ demographic information and other general characteristics. Findings were reported using the mean values, standard deviations and percentages/frequencies.

Several models were used to estimate the cTTO data. In these models, the cTTO value, on a scale between 1 and − 1, was used as the dependent variable. Ten dummy variables were constructed representing the presence of a certain level of problems on a dimension in the health states that are valued, coded as regular dummies. These dummies equal 1 if the respective level of problems is present on the respective dimension. For example, MO3 represents mobility problems at level 3, described as ‘confined to bed’. If the health state to be valued has mobility problems described as ‘confined to bed’, MO3 will equal 1, and 0 otherwise. For example, for health state 21322, the variables MO2, UA3, PD2 and AD2 equal 1, and all other variables (MO2, SC2, SC3, UA2, PD3 and PD3) equal 0, as the health state to be valued is not described by those health problems. The first of the estimated models was a random intercept model, which takes into account the nested structure of the data, that is, that responses were nested in respondents, which may cause correlation between responses within respondents. Second, a random intercept Tobit model was estimated. Tobit models account for censoring that may be present in the observed data. In the case of the cTTO, respondents cannot assign values lower than − 1 to a health state, due to the way in which the cTTO task was constructed. However, some respondents may actually be willing to assign an even lower value to some health states, which is then not reflected in the data. To this end, the Tobit model accounted for this censoring of the data at − 1. Third, as responses vary over the severity of health states, with more variation in responses present for more severe health states, the valuation data is likely to be subject to heteroskedasticity as the error term is not constant over the health domains. To account for this heteroskedasticity, we model the variance of the error term as a function of the ten dummy variables that were included as the dependent variables. Lastly, we estimated a Tobit model that corrected both for heteroskedasticity and for the censored nature of the data.

For the DCE data, we estimated a conditional logit model and a probit model. The dependent variable was a binary variable, with 1 indicating the alternative chosen in the paired comparison and 0 otherwise. Both of these models produce values on a latent scale ranging from 0 to some positive value. Therefore, they cannot be used directly to compute QALYs. The DCE data is therefore needed to be anchored onto the 0–1 scale, i.e. the cTTO data, to produce utilities scaled on the 1 (full health) and 0 (dead) QALY scale. This is done via a hybrid modelling strategy [16] in which a joint likelihood function was estimated for the cTTO and DCE data combined. Hybrid models account for the censoring and the heteroskedasticity, and the combination of both cTTO data were used.

All hybrid and cTTO-only models were estimated with and without a constant. All models were estimated using maximum likelihood estimation (MLE). The final value set was selected based on the logical consistency, significance of the coefficients of the models, model fit criteria such as Bayesian information criterion (BIC) and, mean absolute error (MAE). The MAE was computed both as a MAE over all responses, and as a comparison of the mean observed and predicted values for the 28 health states included in the cTTO health state design.

3 Results

3.1 Respondents

In total, 289 respondents were included in the final data analysis. We excluded 11 respondents that were interviewed for practice sessions. The mean age was 34.05 years (SD 12.27 years), 51.2% were female, 47.1% were married, 62.6% had at least a Bachelor’s degree and 4.2% were unemployed. The majority of the respondents were Punjabis (67.5%), and were Urdu speakers (61.5%). Details can be found in Table 1.

Table 1 General characteristics of respondents (N = 289)

3.2 Valuation Data

The means for each health state valued using cTTO are reported in Appendix 1. Four respondents did not trade off any life years for any health state, and were considered non-traders, but were kept in the modelling sample.

The modelling results are reported in Tables 2 (cTTO models), 3 (DCE models) and 4 (hybrid models). The coefficients were monotonically ordered and significant, except for some of the level 2 problems in some of the models. Usual activities level 3 (indicated as UA3) was assigned the largest weight in all estimated models, followed by mobility level 3, self-care level 3, pain/discomfort level 3 and anxiety/depression level 3, depending on the model. For level 2 problems estimates, self-care received the largest weight, followed by either mobility or usual activities, depending on the model. The random intercept model and random intercept Tobit models for the cTTO data, as well as the conditional logit and probit models for the DCE data and the standard hybrid model, produced insignificant parameter estimates for at least one of the level-dimension dummy variables, suggesting that these were assigned a small weight by the respondents. However, when correcting for heteroskedasticity, this issue was resolved, and significant parameter estimates were found.

Table 2 cTTO modelling results
Table 3 DCE modelling results
Table 4 Final selected hybrid modelling results

The MAE for the observed means for the 28 health states valued in the cTTO task was substantially higher in models that did not correct for heteroskedasticity, compared with those models that did. Figures 1 and 2 plot the mean observed cTTO values (horizontal axis) versus the predicted values for the cTTO models (Fig. 1) and the hybrid models (Fig. 2). The dashed line was the reference for when they would be equal. What can be seen from Fig. 1, is that the cTTO model correcting for heteroskedasticity predicted the mean values for most health states very well, except for health state 33333, the worst health state. The models that do not correct for heteroskedasticity (e.g. the random intercept model and the Tobit model) predict the mean for 33333 much better, but performs much poorer for all other health states. A similar observation can be made for the hybrid models in Fig. 2, where models correcting for heteroskedasticity predict the mean observed cTTO values very well, but not for 33333, and vice versa for the models that do not correct for heteroskedasticity. This leads to much lower MAEs for models correcting for heteroskedasticity, which suggests a better model fit.

Fig. 1
figure 1

Predicted versus mean observed cTTO values for cTTO models

Fig. 2
figure 2

Predicted versus mean observed values for hybrid models

Therefore, the hybrid model correcting for heteroskedasticity, without a constant, was selected as the final model for the value set. The value set can then be represented mathematically as:

\(\begin{aligned} U & = 1 - \bigg(0.049 \times {\text{MO}}2 + 0.279 \times {\text{MO}}3 + 0.057 \times {\text{SC}}2 + 0.217 \times {\text{SC}}3 + 0.040 \times {\text{UA}}2 + 0.280 \times {\text{UA}}3 \\ & \quad + 0.035 \times {\text{PD}}2 + 0.214 \times {\text{PD}}3 + 0.036 \times {\text{AD}}2 + 0.181 \times {\text{AD}}3\bigg). \\ \end{aligned}\) Based on this study, each health state is then assigned a value between − 0.171 (the worst health state 33333) and 1 (the best health state 11111), with 1.23% of all values being negative (the full value set can be seen in Appendix 2). For example, health state 12312 could then be assigned the value of 0.628.

$$\begin{aligned} U & = 1 - \left( {0.057 \times {\text{SC}}2 + 0.280 \times {\text{UA}}3 + 0.035 \times {\text{AD}}2} \right) \\ & = 1 - 0.057 - 0.280 - 0.035 \\ & = 0.628. \\ \end{aligned}$$

4 Discussion

The present study aimed to derive a Pakistani EQ-5D-3L value set. There were 289 respondents who completed the cTTO and DCE tasks as preference elicitation techniques in the EQ-PVT platform. The hybrid model correcting for heteroskedasticity was selected as the final model to represent the value set for Pakistan, ranged between −0.1791 (the worst health state 33333) and 1 (the best health state 11111).

This valuation study of the EQ-5D-3L in Pakistan was conducted following the prior feasibility assessment [11]. A similar approach has been adopted in other countries where a feasibility study was first performed preceding the formal valuation [18, 19]. This EQ-5D-3L value set was developed based on societal preferences obtained from a national representative sample in Pakistan, a country with a large Muslim population and unique cultural aspects. Based on this study, clinicians or researchers in Pakistan are now able to compute Pakistan-specific health state utilities to address their HRQoL measurement needs.

The value for the worst health state ‘33333’ in Pakistani population is − 0.171, which is higher than other EQ-5D-3L value sets in a Muslim majority country such as Tunisia, but lower than 0.131 in Malaysia [20]. Mobility is the dimension in Tunisia and Malaysia that received the largest weight, and in Pakistani population, mobility is the second most weighted. However, Pakistani consider usual activities as the dimension that received the largest weight, but this dimension is considered among the least important in the other two countries.

In terms of modelling used herein, a hybrid model correcting for heteroskedasticity was selected as the final model to represent the value set for Pakistan. This model predicted the mean values for all health states very well, but not for the worst health state 33333. This is due to the fact that the observed mean for 33333 was much lower in relation to the other health states, indicating that respondents are much more willing to give up life years to avoid this health state compared with other health states. The hybrid model we selected was correcting for heteroskedasticity, which ensured that the values for all health states were predicted well, except the worst health state 33333. The alternative was to choose a model that could better estimate the value for health state 33333, but the tradeoff would be worsened predictions for all other health state values. In reality, state 33333 rarely occurs in patients, hence, the model that better represents all other health states was preferred and selected. In comparison to results reported elsewhere, where weights for different health domains varied by country and by statistical models used [21,22,23,24,25,26,27,28]. These differences may very well reflect the country-specific population health preferences, but may also manifest the sociodemographic influences [27] and the possible statistical modelling design artefacts [24]. These differences validate the fact that value sets borrowed from other countries may bias the estimated health outcomes and the need of country-specific value sets. Note that no neighbouring countries of Pakistan (e.g. India, Bangladesh, Nepal, Afghanistan) have their EQ-5D-3L value set; therefore, it is difficult to draw a comparison.

The design of this study led to the use of the EQ-PVT platform developed by the EuroQol Research Foundation. Although effective for the current setting, the EQ-PVT platform is subject to the limitation that it is unable to allow randomization in the blocks of cTTO and DCE, thus each interviewer was assigned specific sequence to ensure equivalence of number of blocks being valued by the respondents. At the same time, the EQ-PVT allowed the current study to be carried out with great flexibility and agility given the circumstances of collecting data in the three cities in Pakistan.

A limitation of the present study must be acknowledged. First, our sample is older, more literate, have higher education and is more urban than the Pakistani general population. To obtain a sample that is highly representative of the general population is challenging, especially respondents who lived in rural areas and those who are illiterate. Future EQ-5D valuation studies in Pakistan, such as for the EQ-5D-5L or EQ-5D-Y, should aim for a more representative sample.

5 Conclusions

The EQ-5D-3L Pakistani value set was developed on the basis of a hybrid model correcting for heteroskedasticity without constant. The most severe level of problems associated with usual activities followed by mobility were key drivers for the health state utilities in Pakistan. The availability of a Pakistani-specific value set may help promote the uptake of the EQ-5D-3L in health economics evaluations and resource allocation decisions.