Key Points for Decision Makers

This is the first EQ-5D-5L valuation study in Egypt and in the Middle East and North Africa region.

The Egyptian tariff can be used as a scoring system for economic evaluations, to inform decision making, and to improve the quality of health technology assessment in the Egyptian healthcare system.

The availability of the Egyptian tariff will encourage health economists and clinicians to include quality-of-life questionnaires in clinical trials and implement cost-utility analysis and pharmacoeconomic modelling.

1 Introduction

The EQ-5D is the most widely used preference-based health-related quality of life (HRQoL) measure, developed by the EuroQol Group [1]. It is used to inform resource allocation decisions in economic evaluations across the world [2,3,4]. In addition, it is the multi-attribute utility instrument preferred by most published pharmacoeconomic guidelines [5]. The EQ-5D consists of five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. There are several versions: the three-level EQ-5D (EQ-5D-3L) defining 243 health states, the five-level EQ-5D (EQ-5D-5L) defining 3125 health states, and the youth version (EQ-5D-Y) used for pediatric populations [6, 7]. The EQ-5D-5L has advantages over the EQ-5D-3L as it has more discriminatory power and even distribution with reduced ceiling effects [8,9,10].

Egypt is the most populous country in the Middle East, having a population of 100 million citizens [11], and exerts significant cultural influence on the region [12]. In Egypt, there is growing awareness of the importance of pharmacoeconomics. There is a great need to conduct high-quality economic evaluations to support and inform pricing and reimbursement decisions and to develop preference-based measures in different disease states. In Egypt, no value sets exist for either the EQ-5D-3L or the EQ-5D-5L; however, local pharmacoeconomic guidelines recommend the use of the EQ-5D as one of the preferred methods to derive utility [13]. Most published Egyptian economic evaluation studies depend on utility values from other published studies and systematic reviews without a reference value set for Egypt [14,15,16,17,18,19].

The aim of this study was to develop the EQ-5D-5L value set for Egypt by eliciting general public preferences, which will allow the assessment of healthcare interventions using cost-utility analysis and cross-country comparison of health technology assessment (HTA) evidence.

2 Methods

2.1 Study Design

This study was a computer-based, cross-sectional, interviewer-administered face-to-face survey of a representative sample of the Egyptian population, following the EuroQol Valuation Technology (EQ-VT) protocol developed for the valuation of the EuroQol family of instruments [20]. This study was approved by the Research Ethics Committee at the Faculty of Pharmacy, Cairo University. Written informed consent was obtained from all participants. For reporting the key elements of the Egyptian valuation study, we followed the CREATE checklist for multi-attribute utility instruments [21].

2.2 The EQ-5D-5L Descriptive System

The EQ-5D-5L describes five dimensions. Each dimension describes five levels of severity: no, mild, moderate, severe, and unable/extreme [2, 4]. The combination of the five dimensions and their levels results in a health state. Each health state can be described by a five-digit number that ranges from 11111 (no problems in any of the five dimensions) to 55555 (extreme problems or unable to in all dimensions). The level of sum scores or the ‘misery score’ is a proxy for severity and is calculated by summing the five digits for the given health state [2].
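As a small illustration of the descriptive system, the following sketch parses a five-digit health state and computes its level sum score. The function and variable names are illustrative only and are not part of any EuroQol software.

```python
from itertools import product

DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")


def parse_state(state: str) -> dict:
    """Map a five-digit EQ-5D-5L state such as '21345' to dimension levels."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L health state: {state!r}")
    return dict(zip(DIMENSIONS, (int(ch) for ch in state)))


def level_sum_score(state: str) -> int:
    """Sum of the five digits ('misery score'): 5 for 11111, 25 for 55555."""
    return sum(parse_state(state).values())


# Every possible state: five dimensions with five levels each.
ALL_STATES = ["".join(levels) for levels in product("12345", repeat=5)]

print(len(ALL_STATES))           # 3125
print(level_sum_score("11111"))  # 5
print(level_sum_score("55555"))  # 25
```

The enumeration reproduces the 3125 health states defined by the EQ-5D-5L.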

2.3 Preference-Elicitation Techniques

The EQ-VT design elicits preferences using the composite time trade-off (cTTO) and discrete-choice experiments (DCEs). The cTTO consists of the conventional TTO for health states better than dead and the lead-time TTO for states considered worse than dead. The cTTO design consists of a set of 86 health states assigned to ten blocks. As for the DCE tasks, the participants are asked to choose between two impaired health states. It includes 196 pairs of EQ-5D-5L health states divided into 28 blocks of seven pairs. Detailed descriptions of the valuation protocol and the two elicitation techniques have been previously published [20, 22,23,24].
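The way a cTTO response translates into a value can be sketched as follows. This is a minimal illustration that assumes the standard EQ-VT time frames (10 years in the health state for the conventional TTO, preceded by 10 years of lead time in full health for the lead-time TTO); these time frames are not restated in this section, and the function name is hypothetical.

```python
def ctto_value(years_full_health: float, worse_than_dead: bool) -> float:
    """Value implied at the participant's point of indifference.

    Assumed time frames (standard EQ-VT, an assumption of this sketch):
    - conventional TTO: x years in full health vs. 10 years in the state
    - lead-time TTO: x years in full health vs. 10 years in full health
      followed by 10 years in the state
    """
    x = years_full_health
    if not 0 <= x <= 10:
        raise ValueError("x must lie between 0 and 10 years")
    if worse_than_dead:
        return (x - 10) / 10  # lead-time TTO: values from -1 to 0
    return x / 10             # conventional TTO: values from 0 to 1


print(ctto_value(10, worse_than_dead=False))  # 1.0
print(ctto_value(0, worse_than_dead=True))    # -1.0
```

Under these assumptions, a value of − 1 corresponds to giving up all 20 years in the lead-time task.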

2.4 Sampling Method and Study Population

Egypt is divided into seven regional units containing 27 governorates [25]. For the best geographical, social, and cultural representation, adult Egyptian participants were recruited from different Egyptian governorates representing all geographical areas as per the population distribution. Participants were recruited through personal contact and from public places such as university campuses, governmental authorities, sporting clubs, and shops using multi-stratified quota sampling based on Egypt official statistics updated in March 2019 [11], to select a representative sample in terms of age, sex, and geographical distribution. Adult participants who agreed to give informed consent and were able to understand the valuation tasks were included in the study. The interviews took place at the interviewer’s office, or the participant’s workplace or home, or other public places according to participants’ preferences. The participants did not receive any incentives.

The interviewer team included 12 interviewers (11 females and one male). All interviewers were teaching assistants in the Clinical Pharmacy Department, Faculty of Pharmacy, Cairo University, who received intensive training from the research team using the training material provided by the EuroQol Group. Egypt employs no sex segregation in education, work, or social interactions, so sex matching of interviewers and participants was not necessary.

2.5 Pilot Phase

A pilot phase (n = 216 interviews) took place from July to October 2019. Its main objective was to test the feasibility and cultural appropriateness of the EQ-VT protocol and to identify which specific elements of the protocol might need adaptation. Other objectives were to standardize interviewers’ performance to reduce variability within and across interviewers, promote quality, and improve data distribution while avoiding clustering at specific values. Some adaptations were applied to the standard valuation protocol and the interview process. The initial practice health state, the ‘wheelchair (WC) example’, was changed to a ‘migraine example’ because most of the participants stated that being in a WC would be worse than being dead. The WC example was originally designed to elicit a ‘better than dead’ response, so this change helped to ensure consistency with other valuation studies; the structure of the familiarization session remained unchanged, and the same quality control (QC) criteria were applied. In addition, to facilitate illiterate participants’ comprehension of the tasks, we used visual aids that were tested in the pilot phase. Graphics were used to represent the five dimensions, and colored cards (green, yellow, orange, red, and dark crimson) were used to represent levels 1–5, respectively. These colors were adapted from the traffic light system familiar to participants. Interviewers were instructed to read the health states aloud to illiterate participants twice while placing the colored cards corresponding to the level of severity in front of the graphics to express the health states as they appeared on the screen. All methodological changes undertaken to accommodate cultural and social considerations will be presented in a subsequent publication.

2.6 Interview Process

The valuation tasks were carried out using the standardized Egyptian Arabic version of the EQ-VT software (version 2.1). Participants were first given the study objectives, with the clarification that the valuation tasks were not intended to cause any conflict with their spiritual or religious beliefs [20]. Participants then reported and rated their own health using the EQ-5D-5L descriptive system and visual analogue scale (VAS). Five practice cTTO tasks, followed by the valuation of ten cTTO hypothetical EQ-5D-5L health states, were then completed. Afterwards, a feedback module was completed in which the ten health states were arranged on the screen with the highest value at the top and the lowest value at the bottom, according to the participant’s choices. Participants could flag any health state that was out of order (flagged health states were excluded from the final data analysis). Next, seven forced-pair comparison DCE tasks were presented in random order. Finally, participants completed a validated country-specific questionnaire pertaining to their views about health, life, and death.

For all the valuation tasks, participants were instructed to read the description of each health state aloud to ensure their engagement.

2.7 Quality Control

The EuroQol group developed a QC tool [26] to improve protocol compliance. This QC tool flagged interviews that were completed in less than 3 min for the WC example or less than 5 min for the ten TTO tasks, interviews where the interviewer did not explain the ‘worse than dead’ (WTD) element of the task, or interviews with clear inconsistencies. The QC tool also identified the presence of interviewers’ effects by comparing the distribution of cTTO data across interviewers for any skewed distributions or spikes at − 1, − 0.5, 0, 0.5, and 1. The tool also detected any unusual patterns in DCE responses. QC meetings were held between the Egyptian team and the EQ-VT support team, weekly in the pilot phase and biweekly during actual data collection, to discuss the QC reports. Interviewers were dropped or retrained based on their performance according to the QC reports.
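The interview-level criteria described above can be sketched as a simple rule check. This is only an illustration: the record layout and field names are hypothetical, and the actual EuroQol QC tool is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interview:
    # Hypothetical field names; the real QC tool's data layout is not
    # described in this paper.
    practice_example_minutes: float
    ctto_tasks_minutes: float
    wtd_explained: bool
    clear_inconsistency: bool


def qc_flags(interview: Interview) -> List[str]:
    """Return the reasons, if any, for flagging an interview."""
    flags = []
    if interview.practice_example_minutes < 3:
        flags.append("practice example completed in under 3 min")
    if interview.ctto_tasks_minutes < 5:
        flags.append("ten cTTO tasks completed in under 5 min")
    if not interview.wtd_explained:
        flags.append("worse-than-dead element not explained")
    if interview.clear_inconsistency:
        flags.append("clear inconsistency")
    return flags


example = Interview(practice_example_minutes=2.5, ctto_tasks_minutes=12.0,
                    wtd_explained=True, clear_inconsistency=False)
print(qc_flags(example))
```

An interview with an empty list of flags passes these four checks; the interviewer-level checks (skewed distributions or spikes in cTTO values) would need the pooled data and are not covered by this sketch.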

2.8 Data Analysis and Model Selection

We used SPSS software version 22 for the descriptive statistics of the sample characteristics and responses to cTTO and DCE data, and statistical modelling was conducted using STATA software version 14 to estimate the EQ-5D-5L values for all health states. Several models were tested, including generalized least squares (GLS), Tobit, heteroskedastic, conditional logit, and hybrid models. The 20-parameter model is a main-effect model consisting of 20 dummies, one for each dimension level from mobility level 2 to anxiety/depression level 5 (MO2–AD5) using level 1 as the reference. For the cTTO data, random effects (GLS) models (models 1, 2, and 3) were tested to account for the panel structure of the data and heterogeneity of the participants’ views in valuing EQ-5D-5L health states. Tobit models (models 2 and 3) were used to account for the censored nature of cTTO data because participants could hypothetically continue trading below the lower bound of − 1 for the WTD health states. The heteroskedastic models (models 3 and 4) were investigated to deal with the heteroskedasticity of the error term as the observed variance of the cTTO values increased with increasing severity of the health state. The heteroskedastic model used was a generalization of the Tobit model that uses the interval regression (intreg) command of STATA. The intreg command models the error term as a function of the dummies MO2–AD5, accounting for multiplicative heteroskedasticity. This means that the error term is modelled in the same way over all participants. The dependent variable of the cTTO data was the disutility, defined as 1 minus the observed cTTO value for a given health state. The DCE data were analyzed using the conditional logit model (model 5), where a binary outcome (0/1) represented the participant’s choice within each pair of the DCE tasks.
To compare the modelling results of the cTTO and DCE data, the coefficients of the DCE model were rescaled using the rescaling parameter of the TTO model estimations [27, 28]. The cTTO and DCE data were combined in a hybrid model by multiplying the likelihood function of the cTTO model by the likelihood function of the DCE model [27, 28]. Four hybrid models were tested (models 6–9) by allowing heteroskedasticity, random effects, and/or censoring at − 1 for the cTTO data and conditional logit model for the DCE data.
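The 20-dummy coding and the disutility transformation used for the cTTO models can be illustrated as follows. This is a sketch of the coding scheme only, not the STATA code used in the study; the helper names are hypothetical.

```python
DIMENSION_CODES = ("MO", "SC", "UA", "PD", "AD")

# 20 regular dummies, MO2 ... AD5, with level 1 as the reference category.
DUMMY_NAMES = [f"{dim}{level}" for dim in DIMENSION_CODES for level in range(2, 6)]


def dummy_row(state: str) -> dict:
    """Main-effects coding of one health state, e.g. '21345'."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L health state: {state!r}")
    row = dict.fromkeys(DUMMY_NAMES, 0)
    for dim, ch in zip(DIMENSION_CODES, state):
        if ch != "1":
            row[f"{dim}{ch}"] = 1  # only the observed level is switched on
    return row


def disutility(ctto_value: float) -> float:
    """Dependent variable of the cTTO models: 1 minus the observed value."""
    return 1 - ctto_value


print(len(DUMMY_NAMES))  # 20
print([name for name, on in dummy_row("21345").items() if on])
# ['MO2', 'UA3', 'PD4', 'AD5']
```

With this coding, state 11111 has all dummies equal to zero, so a model without a constant predicts a disutility of zero for full health.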

2.9 Evaluation of the Model Performance

The model performance was evaluated using the logical consistency of the parameter estimates, the significance level of the parameters (p < 0.05), the value range between observed and predicted values [27, 29], the goodness of fit using the Akaike information criteria (AIC) and Bayesian information criteria (BIC), and prediction accuracy, where root mean square error (RMSE) and mean absolute error (MAE) were calculated. Other factors were considered in model selection, such as accounting for the censored nature of the data, heteroskedasticity of the error term, and heterogeneity of the participants’ views. Finally, a sensitivity analysis was performed to evaluate the robustness of the tested models by re-inclusion of the participants’ flagged health states.
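For reference, the prediction-accuracy and goodness-of-fit measures named above can be written out as below. This sketch uses the textbook definitions; the input numbers in the example are made up and are not study data.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("inputs must have the same length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def aic(n_params: int, log_likelihood: float) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(n_params: int, log_likelihood: float, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Made-up illustrative values, not taken from the study.
observed = [0.9, 0.5, -0.2]
predicted = [0.8, 0.6, -0.1]
print(round(mae(observed, predicted), 3))
print(round(rmse(observed, predicted), 3))
```

Lower values of all four measures indicate a better-performing model.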

3 Results

3.1 Data Cleaning

A total of 1378 interviews were conducted from July 2019 to March 2020. Three interviewers were excluded from the data collection and their interviews were dropped (n = 113) because compliance with the protocol was poor. Of the remaining 1265 interviews, 75 were incomplete and 216 were pilot interviews, which resulted in 974 interviews being included in the final analysis. We planned to have 1000 final interviews, but sampling was interrupted by the global coronavirus disease 2019 (COVID-19) pandemic. Because we followed the QC criteria strictly and had an extensive pilot phase, the data were of good quality, so 974 interviews were deemed adequate.

3.2 Participants’ Characteristics

Table 1 shows the characteristics of the study sample in comparison with the Egyptian general population [11, 30]. The average age was 36.9 years, and 52.4% of the participants were males. Overall, the sample of participants was representative of the Egyptian adult general population with respect to age, sex, and geographical distribution. However, compared with national statistics, we found that illiterate participants, elderly participants (≥ 65 years), and residents of rural areas were underrepresented in our sample, whereas those aged 35–54 years were overrepresented.

Table 1 Background characteristics of the Egyptian participants

3.3 Self-Reported Health Using the EQ-5D-5L Descriptive System

In the actual sample, 17.3% of participants were in a state of full health (11111). The most common health problem reported by Egyptian participants was anxiety and depression (64.3%), whereas the least common health problem was self-care (6.3%). The mean VAS score was 76.9 ± 16.7 (Table 1).

3.4 Composite Time Trade-Off and Discrete-Choice Experiment Data

The 974 interviews provided 9740 cTTO responses and 6818 DCE responses. The mean interview time was 41 ± 16 min. The mean iterative steps to reach the point of indifference was 7.2 ± 3.2. The mean time spent in the feedback module was 2.8 ± 10.4 min. The participants flagged 898 cTTO responses using the feedback module. A total of 254 (26%) participants had at least one inconsistency, which reduced to 122 (12.5%) after using the feedback module. The number of inconsistencies related to severity ‘6’ (mild issue in one dimension only) and 55555 states were 11 (1%) and 31 (3%), respectively, which reduced to 6 (0.6%) and 3 (0.3%), respectively, after using the feedback module.

The main analysis included all the unflagged cTTO valuations (8842 responses); 41% of these were considered WTD, and the mean observed value was negative for 36 of the 86 health states included in the cTTO design. The percentages of values clustered at − 1, − 0.5, 0, 0.5, and 1 were 13.3%, 4%, 1.5%, 5.2%, and 12.3%, respectively (Fig. 1). As the level sum score increased for the EQ-5D-5L health states, lower mean TTO values and a larger standard deviation were observed (Fig. 2). The mean of the observed cTTO values for the 86 health states was 0.12 ± 0.73; state means ranged from 0.96 ± 0.08 for health state 11211 to − 0.83 ± 0.3 for health state 55555.

Fig. 1 Observed composite time trade-off (cTTO) value distribution

Fig. 2 Mean observed composite time trade-off value by level of sum scores. cTTO composite time trade-off, SD standard deviation

For the DCE tasks, the participants were more likely to choose the health states with the lower misery score as the difference in severity increased between the two health states. In total, 23 participants (2.4%) answered using the following specific pattern (AAAAAAA, BBBBBBB, ABABABA, BABABAB). However, their mean time to complete the DCE tasks was acceptable, so we decided not to exclude these interviews from the analysis.
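Screening for the response patterns listed above can be sketched as a lookup. The encoding of the seven DCE choices as a string of 'A' and 'B' characters is an assumption of this example.

```python
# The four patterns named in the text, for seven DCE tasks per participant.
SUSPICIOUS_PATTERNS = {"AAAAAAA", "BBBBBBB", "ABABABA", "BABABAB"}


def is_suspicious(choices: str) -> bool:
    """True if the seven choices (as 'A'/'B' characters) match a listed pattern."""
    choices = choices.upper()
    if len(choices) != 7 or set(choices) - {"A", "B"}:
        raise ValueError("expected seven choices, each 'A' or 'B'")
    return choices in SUSPICIOUS_PATTERNS


print(is_suspicious("ABABABA"))  # True
print(is_suspicious("AABABBA"))  # False
```

As in the study, a match would only prompt a closer look (for example, at completion time) rather than automatic exclusion.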

3.5 Modelling Results

Modelling results are shown for cTTO, DCE, and hybrid models in Tables 2, 3, and 4, respectively. All the tested models were logically consistent except for some minor inconsistencies appearing in the conditional logit model for DCE data in the self-care and usual activities dimensions, level 3 (SC3 and UA3) (Table 3). Furthermore, all model parameter estimates were statistically significant except self-care dimension level 2 (SC2) for the Tobit and heteroskedastic Tobit models (models 2 and 3) (Table 2) and anxiety/depression dimension level 2 (AD2) in the conditional logit model (model 5). The dimension ranking for the cTTO models in terms of relative importance was as follows. For models 1, 2, and 3, mobility was the most important dimension, followed by anxiety/depression, pain/discomfort, self-care, and usual activities (least important). For the heteroskedastic model (model 4), pain/discomfort was more important than anxiety/depression (0.436 vs. 0.412, respectively). Disutility values of the DCE model (model 5) were calculated by dividing the coefficients of the DCE model by the rescaling factor (factor = 3.884). Mobility had the largest impact on health state preference values for all the tested models.
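The rescaling step for the DCE model amounts to a single division per coefficient, as in the sketch below. Only the factor 3.884 comes from the text; the coefficient in the example is a made-up placeholder, not a value from Table 3.

```python
RESCALING_FACTOR = 3.884  # rescaling factor reported in the text


def rescale(dce_coefficients: dict, factor: float = RESCALING_FACTOR) -> dict:
    """Convert latent-scale DCE coefficients to disutilities on the cTTO scale."""
    return {name: coef / factor for name, coef in dce_coefficients.items()}


# Placeholder coefficient for illustration only (not from Table 3).
print(rescale({"MO5": 1.942}))
```

After rescaling, the DCE-based disutilities are on the same scale as the cTTO model estimates and can be compared dimension by dimension.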

Table 2 Parameter estimates for composite time trade-off models
Table 3 Parameter estimates for discrete-choice experiment model
Table 4 Parameter estimates for hybrid models

3.6 Preferred Model and Value Set

Both the GLS and the heteroskedastic models performed better than the other tested models in terms of logical consistency, significance level, goodness of fit (AIC and BIC), and prediction accuracy (MAE and RMSE) (Table 2). However, the heteroskedastic model (model 4) was considered the preferred model because of its ability to handle the variability of the error term. In addition, it had a lower MAE than the other tested models, indicating better accuracy. The constant term in the model was not significant, and the model showed a reasonable value range between the observed and predicted values (Fig. 3).

Fig. 3 Scatterplots of the predicted values of the heteroskedastic model versus observed values of composite time trade-off

The predicted cTTO values ranged from − 0.933 for the worst health state (55555) to 0.953 for 11211 and 11121. About 36.3% (1136) of the health states were WTD. Dimension ranking in terms of relative importance was mobility (most important), pain/discomfort, anxiety/depression, self-care, and usual activities (least important). For any given health state, the utility value can be calculated by subtracting the regular dummies (parameter estimates) for each dimension level of the health state from 1.
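The scoring rule in the last sentence can be sketched as follows. The function assumes a model without a constant term, in line with the description above. Only two decrements are filled in, both inferred from the predicted value of 0.953 quoted in the text; the remaining 18 parameter estimates must be taken from Table 2 (model 4) before other states can be scored.

```python
DIMENSION_CODES = ("MO", "SC", "UA", "PD", "AD")

# Decrements inferred from values quoted in the text; this is NOT the full
# tariff. The other estimates (model 4, Table 2) are not reproduced here.
DECREMENTS = {
    "UA2": 0.047,  # inferred as 1 - 0.953, the predicted value of 11211
    "PD2": 0.047,  # inferred as 1 - 0.953, the predicted value of 11121
}


def utility(state: str, decrements: dict) -> float:
    """Utility = 1 minus the sum of the decrements for each impaired dimension."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L health state: {state!r}")
    total = 0.0
    for dim, ch in zip(DIMENSION_CODES, state):
        if ch != "1":
            total += decrements[f"{dim}{ch}"]  # KeyError if not supplied
    return 1 - total


print(round(utility("11211", DECREMENTS), 3))  # 0.953
print(utility("11111", DECREMENTS))            # 1.0
```

Raising a `KeyError` for missing decrements is a deliberate choice here, so that an incomplete parameter table cannot silently produce a wrong utility.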

3.7 Sensitivity Analysis

The model performance worsened after the inclusion of the flagged health states in the feedback module, so we decided to exclude the flagged health states from the analysis. No other exclusions were applied to the data as only three inconsistencies related to 55555, and only one participant gave the same value for all health states.

4 Discussion

To the best of our knowledge, this is the first EQ-5D-5L valuation study in Egypt and in the Middle East and North Africa (MENA) region. A consistent tariff was generated with statistically significant decrements for all dimensions that can be used as a scoring system for economic evaluation, inform decision making, and improve the quality of HTA in the Egyptian healthcare system.

The successful application of the EQ-VT valuation protocol on the Egyptian population verified the feasibility and cultural appropriateness of using such valuation techniques in Muslim and Arabic-speaking countries. Furthermore, the extensive pilot phase and the periodic QC meetings allowed the Egyptian study team and the EQ-VT support team to enhance the interviewers’ performance and promote compliance with the valuation tasks.

The heteroskedastic model (model 4) based on the cTTO data was selected as the preferred model for the Egyptian tariff. The cTTO data were of very good quality, as all the parameter estimates of all the tested models were logically consistent and statistically significant and only two inconsistencies appeared in the DCE conditional logit model. Furthermore, in the tested models, there was a large difference in terms of size of coefficients for the five dimensions at different levels for the DCE and TTO data as both techniques have different underlying assumptions. cTTO data are time-dependent data influenced by scale compatibility and loss aversion [20, 31], whereas the DCE is a choice-based task characterized by attribute non-attendance and lexicographic preferences [32, 33]. Other countries, such as the USA [34], the Netherlands [35], China [36], Uruguay [37], Korea [38], and Hungary [39] also used only cTTO data to generate their national value set.

All EQ-5D-5L valuation studies followed the same standardized international protocol (EQ-VT) so the results can be easily compared across countries. In Egypt, mobility had the largest impact on health state preference values. This may be due to limited access to social welfare for immobility. Furthermore, Egypt lacks the infrastructure that enables people with mobility problems to live normally and independently in society. Mobility was also the most important dimension in all Asian countries [36, 38, 40,41,42,43,44,45,46], Hungary [39], Uruguay [37], and Canada [47].

In this study, the predicted cTTO values ranged from − 0.933 for the worst health state (55555) to 0.953 for 11211 and 11121. The worst health state had a higher value than in Taiwan (− 1.0259) [42] and Ireland (− 0.974) [48] but a lower value than in all other published valuation studies [34,35,36,37,38,39,40,41, 43,44,45,46,47, 49,50,51,52,53,54].

Egypt had the largest percentage (41%) of cTTO observations considered to be WTD compared with other countries such as Taiwan (38.5%) [42], Hong Kong and Indonesia (36%) [40, 46], Japan and Korea (0.1%) [38, 45], the Netherlands (2%) [35], China (10%) [36], and Ethiopia (11.7%) [49]. This may be attributed to cultural and social factors, as most participants in Egypt preferred to die rather than be a burden on family and friends if they had severe illness, as stated in the country-specific questionnaire (publication of detailed questionnaire results is underway). This is in line with findings published in the Indonesian EQ-5D-5L valuation study [40].

There were 1172 (13.3%) observations at − 1, where the participants traded all 20 years of life to avoid living in certain health states in the cTTO task; this percentage is higher than in Ethiopia (8.04%) [49] and Indonesia (6.92%) and lower than in the USA (14.7%) [34] and Hong Kong (16%) [46]. Furthermore, 12.3 and 1.5% of the observations were clustered at 1 and 0, respectively, compared with 20.5 and 5.1%, respectively, in the USA [34]. Clustering at these critical points might be due to interviewer’s effect, task shortcutting, and social and cultural factors. In this study, the QC tool was used rigorously and the pilot phase was extensive to reduce the variability among and within interviewers, standardize their performance, and improve data quality.

There were some limitations in terms of differences in the distribution of background variables in the actual sample compared with the data provided by the Egyptian Central Agency for Public Mobilization and Statistics [11]. Rural and illiterate participants were underrepresented in our sample. It was difficult for the interviewers to reach some rural areas, although extreme effort was made to represent people living in those areas as much as possible; nevertheless, the sample accurately represented the geographical distribution in Egypt. The EQ-VT protocol was designed for literate and educated participants. Tunisia recently published an EQ-5D-3L valuation study that only included literate individuals, in spite of the fact that illiterate people represent 18.8% of the general Tunisian population [55]. However, our study team decided not to exclude illiterate participants from the Egyptian study to ensure they had a voice in the produced tariff. The team exerted all possible efforts to interview illiterate participants with the use of some visual aids, but decided not to follow the exact quota for illiterate participants (25.8%) because the tool used was not fully validated.

Other demographic characteristics are shown in Table 1. Some characteristics, such as religion and employment status, did not deviate significantly from the population, whereas marital status and health insurance coverage differed significantly from the population distribution. Despite the deviations from the exact population distribution, the sample still provided the required demographic diversity. Furthermore, the planned quota was not completely met because the COVID-19 pandemic led to the sudden interruption of the data collection. These deviations in the sample characteristics in terms of residence and/or education are in line with other valuation studies. Further research is needed to assess the feasibility and impact of weighting of underrepresented characteristics on the produced value sets. To explore how demographic characteristics affected participants’ preferences, a publication exploring the effect of cultural and demographic differences on health valuation in Egypt is underway.

Finally, the availability of the Egyptian tariff will encourage health economists and clinicians to include quality-of-life questionnaires in clinical trials and implement cost-utility analysis and pharmacoeconomic modelling to assist decision makers in appropriate allocation of healthcare resources.

Since cultural and socioeconomic factors play a role in shaping people’s preferences, the high quality of the data used in the Egyptian value set may allow its use in economic evaluations for MENA countries that share common cultural and socioeconomic backgrounds but for which a country-specific value set is not yet available, rather than using tariffs from outside the region [56]. It must be noted that recommendations are for each country to develop its own value set to represent the views and preferences of its own population [57].

5 Conclusion

This is the first value set for EQ-5D-5L based on social preferences obtained from a nationally representative sample in Egypt. The value set will play a key role in economic evaluations and HTAs in Egypt. In addition, other countries in the MENA region may be encouraged to follow suit and develop their own value sets.