Key Points for Decision Makers

This is the first EQ-5D-5L valuation study in Egypt and the Middle East and North Africa region.

The Egyptian tariff can be used as a scoring system for economic evaluations, to inform decision making, and to improve the quality of health technology assessment in the Egyptian healthcare system.

The availability of the Egyptian tariff will encourage health economists and clinicians to include quality-of-life questionnaires in clinical trials and implement cost-utility analysis and pharmacoeconomic modelling.

1 Introduction

The EQ-5D was developed by the EuroQol Group and is the most widely used preference-based health-related quality-of-life measure [1]. It is used to inform resource allocation decisions in economic evaluations across the world [2,3,4]. In addition, it is the multi-attribute utility instrument preferred by most published pharmacoeconomic guidelines [5] and has been reported as valid and responsive in multiple disease areas and conditions and multiple cultural contexts [6, 7]. The EQ-5D consists of five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. There are several versions: the three-level EQ-5D (EQ-5D-3L) defining 243 health states, the five-level EQ-5D (EQ-5D-5L) defining 3125 health states, and the youth version (EQ-5D-Y) used for pediatric populations [8, 9]. The EQ-5D-5L has advantages over the EQ-5D-3L in that it has more discriminatory power and a more even distribution with improved informativity and reduced ceiling effect [10,11,12,13].

Egypt is the most populous country in the Middle East and exerts significant cultural influence on the region [14]. In Egypt, there is a growing awareness of the importance of pharmacoeconomics. There is a great need to conduct high-quality economic evaluations to support and inform pricing and reimbursement decisions and to develop preference-based measures in different disease states. In Egypt, no value sets exist for either the EQ-5D-3L or the EQ-5D-5L; however, local pharmacoeconomic guidelines recommend the use of the EQ-5D as one of the preferred methods to derive utility [15]. Most published Egyptian economic evaluation studies depend on utility values from other published studies and systematic reviews without a reference value set for Egypt [16,17,18,19,20,21].

The aim of this study was to develop the EQ-5D-5L value set for Egypt by eliciting general public preferences, which will allow the assessment of healthcare interventions using cost-utility analysis and cross-country comparison of health technology assessment (HTA) evidence. This study is a revision of a previously published EQ-5D-5L valuation study for Egypt that was retracted by the authors because of an inconsistency in the preferred model [22, 23]. The models were revised to avoid any inconsistencies.

2 Methods

2.1 Study Design

This study was a computer-based, cross-sectional, interviewer-administered, face-to-face survey of a representative sample of the Egyptian population that followed the EuroQol Valuation Technology (EQ-VT) protocol developed for the valuation of the EuroQol family of instruments [24]. This study was approved by the Research Ethics Committee at the Faculty of Pharmacy, Cairo University. Written informed consent was obtained from all participants. For reporting the key elements of the Egyptian valuation study, we followed the CREATE checklist for multi-attribute utility instruments [25].

2.2 The EQ-5D-5L Descriptive System

The EQ-5D-5L describes health in terms of five dimensions. Each dimension is described in terms of five levels of severity: no, mild, moderate, severe, and unable/extreme [2, 4]. The combination of the five dimensions and their levels results in a health state. Each health state can be described by a five-digit number that ranges from 11111 (no problems in any of the five dimensions) to 55555 (extreme problems or unable to in all dimensions). The level sum score, or “misery score”, is a proxy for severity and is calculated by summing the five digits of the given health state [2].
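The descriptive system above can be illustrated with a minimal Python sketch that enumerates the five-digit health states and computes the level sum score. The function names are illustrative only and are not part of any EuroQol software.

```python
from itertools import product

# Dimension order used in the five-digit code: mobility, self-care,
# usual activities, pain/discomfort, anxiety/depression.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def all_health_states():
    """Return every EQ-5D-5L state as a five-digit string ('11111' to '55555')."""
    return ["".join(levels) for levels in product("12345", repeat=len(DIMENSIONS))]


def level_sum_score(state):
    """Sum the five digits of a health state (the 'misery score')."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    return sum(int(ch) for ch in state)


states = all_health_states()
print(len(states))               # 3125 states (5 levels ** 5 dimensions)
print(level_sum_score("11111"))  # 5, the mildest possible score
print(level_sum_score("55555"))  # 25, the most severe possible score
```

A state with a mild problem in a single dimension, such as 11211, has a level sum score of 6.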

2.3 Preferences-Elicitation Techniques

The EQ-VT design elicits preferences using the composite time trade-off (cTTO) and discrete-choice experiments (DCEs). The cTTO consists of the conventional TTO for health states better than dead and the lead time TTO for states considered worse than dead (WTD). The cTTO design consists of a set of 86 health states assigned to ten blocks. As for the DCE tasks, the participants are asked to choose between two impaired health states. It includes 196 pairs of EQ-5D-5L health states divided into 28 blocks of seven pairs. Detailed descriptions of the valuation protocol and the two elicitation techniques have been previously published [24, 26,27,28].

2.4 Sampling Method and Study Population

Egypt is divided into seven regional units containing 27 governorates [29]. For the best geographical, social, and cultural representation, adult Egyptian participants were recruited from different Egyptian governorates representing all geographical areas as per the population distribution. Participants were recruited through personal contact and from public places such as university campuses, governmental authorities, sporting clubs, and shops using multi-stratified quota sampling based on Egyptian official statistics updated in March 2019 [30]. Adult participants who provided informed consent and were able to understand the valuation tasks were included in the study. The interviews took place at the interviewer’s office, or the participant’s workplace or home, or other public places according to participants’ preferences. The participants did not receive any incentives.

The interviewer team included 12 interviewers (11 females and 1 male). All interviewers were teaching assistants in the Clinical Pharmacy Department, Faculty of Pharmacy, Cairo University, who received intensive training using the training material received from EuroQol. Egypt employs no sex segregation in education, work, or social interactions, so sex matching of interviewers and participants was not necessary.

2.5 Pilot Phase

A well-defined pilot phase (n = 216 interviews) took place from July to October 2019. The main objective of the pilot phase was to test the feasibility and cultural appropriateness of the EQ-VT protocol and to identify which specific elements of the protocol might need adaptation. Other objectives were to standardize interviewers’ performance to reduce variability within and across interviewers, promote quality, and improve data distribution while avoiding clustering at specific values. One adaptation was made to the standard valuation protocol: the initial practice health state, the “wheelchair example”, was replaced with a “migraine example” because most participants stated that being in a wheelchair would be worse than being dead. The wheelchair example was originally designed to elicit a “better than dead” response, so this change maintained consistency with other valuation studies; the structure of the familiarization session remained unchanged, and the same quality control (QC) criteria were applied. In addition, to facilitate illiterate participants’ comprehension of the tasks, we used visual aids that were tested in the pilot phase. Graphics were used to represent the five dimensions, and colored cards (green, yellow, orange, red, and dark crimson) were used to represent levels 1–5, respectively. These colors were adapted from the traffic light system familiar to participants. Interviewers were instructed to read the health states aloud twice to illiterate participants while placing the colored cards corresponding to the level of severity in front of the graphics to express the health states as they appeared on the screen. All methodological changes undertaken to accommodate cultural and social considerations will be presented in a subsequent publication.

2.6 Interview Process

The valuation tasks were carried out using the standardized Egyptian Arabic version of the EQ-VT software (2.1), where participants were given the study objectives with the clarification that valuation tasks were not intended to cause any conflict with their spiritual or religious beliefs [24]. Participants then reported and rated their own health using the EQ-5D-5L descriptive system and visual analogue scale (VAS). Five practice cTTO tasks were then completed, followed by the valuation of ten cTTO hypothetical EQ-5D-5L health states. Afterwards, a feedback module was completed in which the ten health states were arranged on the screen, with the highest value at the top and the lowest value at the bottom, according to the participant’s choices [31]. Participants could flag any health state that was out of order (flagged health states were excluded from the final data analysis). Next, seven forced paired comparison DCE tasks were presented in random order. Finally, participants completed a validated country-specific questionnaire pertaining to participants’ demographics and views about health, life, and death.

For all the valuation tasks, participants were instructed to read the description of each health state aloud to ensure their engagement.

2.7 Quality Control

The EuroQol Group developed a QC tool [32] to improve protocol compliance. This QC tool flagged interviews that were completed in less than 3 minutes for the wheelchair example or less than 5 minutes for the ten TTO tasks, interviews where the interviewer did not explain the WTD element of the task, or interviews with clear inconsistencies. The QC tool also identified interviewer effects by comparing the distribution of cTTO data across interviewers for any skewed distributions or spikes at − 1, − 0.5, 0, 0.5, or 1. It also detected any unusual patterns in DCE responses, such as respondents selecting only A or only B in all seven choice tasks, or alternating between A and B. QC meetings were held between the Egyptian team and the EQ-VT support team (weekly in the pilot phase and biweekly during actual data collection) to discuss the QC reports. Interviewers were dropped or retrained based on their performance according to the QC reports.
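The checks described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria (time thresholds, DCE response patterns, and cTTO spikes); it is not the EuroQol QC tool, and all function and parameter names are hypothetical.

```python
# DCE answer strings treated as unusual: all A, all B, or strict alternation.
SUSPICIOUS_DCE_PATTERNS = {"AAAAAAA", "BBBBBBB", "ABABABA", "BABABAB"}
# cTTO values at which clustering ("spikes") is monitored.
SPIKE_VALUES = (-1.0, -0.5, 0.0, 0.5, 1.0)


def is_too_fast(example_seconds, tto_seconds):
    """Flag an interview under 3 min on the practice example or 5 min on the ten TTO tasks."""
    return example_seconds < 3 * 60 or tto_seconds < 5 * 60


def is_suspicious_dce(choices):
    """True if the seven DCE choices form one of the unusual patterns."""
    return "".join(choices).upper() in SUSPICIOUS_DCE_PATTERNS


def spike_shares(ctto_values, tol=1e-9):
    """Share of cTTO responses that fall exactly on each monitored spike value."""
    n = len(ctto_values)
    if n == 0:
        raise ValueError("no cTTO values supplied")
    return {s: sum(abs(v - s) <= tol for v in ctto_values) / n for s in SPIKE_VALUES}


print(is_too_fast(example_seconds=150, tto_seconds=600))  # True
print(is_suspicious_dce("ABABABA"))                       # True
print(spike_shares([-1.0, 0.5, 0.5, 0.95])[0.5])          # 0.5
```

Comparing the output of `spike_shares` across interviewers is one simple way to look for the interviewer effects mentioned above.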

2.8 Data Analysis and Model Selection

We used SPSS software version 22 to calculate the percentages of the sample demographics, self-reported health, and descriptive statistics of the cTTO and DCE responses. Statistical modelling was conducted using STATA software version 14 to estimate the EQ-5D-5L values for all health states. Several models were tested, including generalized least squares (GLS), Tobit, heteroskedastic, conditional logit, and hybrid models. The 20-parameter model is a main effects model consisting of 20 dummies, one for each of levels 2–5 of each dimension, from mobility level 2 to anxiety/depression level 5 (MO2-AD5), using level 1 as the reference. For the cTTO data, random effects (GLS) models (models 1 and 2) were tested to account for the panel structure of the data and heterogeneity of the participants’ views in valuing EQ-5D-5L health states. Tobit models (models 2 and 3) were used to account for the censored nature of cTTO data because participants could hypothetically continue trading below the lower bound of − 1 for the WTD health states. The heteroskedastic models (models 3 and 4) were investigated to deal with the heteroskedasticity of the error term as the observed variance of the cTTO values increased with increasing severity of the health state. The heteroskedastic model used is a generalization of the Tobit model, which uses the interval regression (intreg) command of STATA. The intreg command models the error term as a function of the dummies MO2-AD5, accounting for multiplicative heteroskedasticity. This means that the error term is modelled in the same way over all participants. The final model would be subjected to monotonicity constraints, if needed. In all models, the dependent variable of the cTTO data was the disutility, defined as 1 minus the observed cTTO value for a given health state.
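To make the model set-up concrete, the following Python sketch shows the 20-dummy coding, the disutility transformation, and the log-likelihood contribution of a single censored observation. It is a simplified illustration, not the study's STATA code: the log-linear form of the standard deviation in `hetero_sigma` is an assumption about how multiplicative heteroskedasticity can be expressed, and all names are hypothetical. Because disutility is 1 minus the cTTO value, censoring of values at − 1 corresponds to censoring of disutilities at 2.

```python
import math

DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")
# Twenty dummy names, MO2..MO5 through AD2..AD5; level 1 is the reference.
DUMMY_NAMES = [f"{dim}{lvl}" for dim in DIMENSIONS for lvl in range(2, 6)]


def dummies(state):
    """Map a state such as '21345' to a dict of 20 zero/one indicators."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    row = dict.fromkeys(DUMMY_NAMES, 0)
    for dim, ch in zip(DIMENSIONS, state):
        if ch != "1":
            row[dim + ch] = 1
    return row


def disutility(ctto_value):
    """Dependent variable: 1 minus the observed cTTO value."""
    return 1.0 - ctto_value


def hetero_sigma(row, gamma0, gamma):
    """ASSUMED form: ln(sigma) is linear in the same 20 dummies."""
    return math.exp(gamma0 + sum(gamma.get(k, 0.0) * v for k, v in row.items()))


def censored_loglik(y, mu, sigma, upper=2.0):
    """Log-likelihood of one disutility y, right-censored at `upper`."""
    z = (min(y, upper) - mu) / sigma
    if y >= upper:
        # Censored: probability that the latent disutility is at least `upper`.
        return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    # Uncensored: normal log-density.
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)


row = dummies("21345")
print([name for name, v in row.items() if v])  # ['MO2', 'UA3', 'PD4', 'AD5']
print(disutility(-1.0))                        # 2.0
```

Summing `censored_loglik` over all responses, with `mu` given by the dummies times their coefficients, yields the objective that a Tobit-type estimator maximizes.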

The DCE data were analysed using the conditional logit model (model 5) where a binary outcome was used (0/1), 0 for dead and 1 for full health, representing the choice of the participant for each pair of the DCE tasks. To compare the modelling results of the cTTO and DCE data, the coefficients of the DCE model were rescaled using the rescaling parameter of the TTO model estimations [33, 34]. The cTTO and DCE data were combined in a hybrid model by multiplying the likelihood function of the cTTO model by the likelihood function of the DCE model [33, 34]. Four hybrid models were tested (models 6–9) by allowing heteroskedasticity and/or censoring at − 1 for the cTTO data and conditional logit model for the DCE data.
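The combination of the two likelihoods can be written compactly as follows. The notation is illustrative and not taken from the study: β denotes the 20 disutility coefficients, σ the cTTO error scale, and θ a proportionality parameter linking the DCE latent scale to the cTTO scale, which is assumed here following the usual formulation of hybrid models.

```latex
\[
L_{\text{hybrid}}(\beta, \sigma, \theta)
  = L_{\text{cTTO}}(\beta, \sigma) \times L_{\text{DCE}}(\theta\beta),
\qquad
\ln L_{\text{hybrid}}
  = \ln L_{\text{cTTO}}(\beta, \sigma) + \ln L_{\text{DCE}}(\theta\beta).
\]
```

Under this assumption, both data sources inform the same coefficient vector β, and θ plays the role of the rescaling parameter between the two scales.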

2.9 Evaluation of the Model Performance

The model performance was evaluated using prediction accuracy (where root mean square error [RMSE] and mean absolute error [MAE] were calculated), logical consistency of the parameter estimates, the significance level of the parameters (P < 0.05), the model parsimony, the value range between observed and predicted values, and goodness of fit using the Akaike information criterion (AIC) and Bayesian information criterion (BIC) [33, 35]. Other factors were considered in model selection, such as accounting for the censored nature of the data, heteroskedasticity of the error term, and heterogeneity of the participants’ views. Finally, a sensitivity analysis was performed to evaluate the robustness of the tested models by re-inclusion of the participants’ flagged health states.
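The prediction accuracy and goodness-of-fit statistics named above follow standard definitions, sketched here in Python. The function names are illustrative.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood
```

In a valuation study, `observed` would typically hold the mean observed cTTO values of the health states in the design and `predicted` the corresponding model predictions.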

3 Results

3.1 Data Cleaning

A total of 1378 interviews were conducted from July 2019 to March 2020. Of these, 75 interviews were incomplete, 113 were dropped (along with the three interviewers who conducted them) because of poor protocol compliance, and 216 were pilot interviews, which left 974 interviews for the final analysis. We planned to have 1000 final interviews, but sampling was interrupted by the global coronavirus disease 2019 (COVID-19) pandemic. Because QC criteria were strictly followed and the pilot phase was extensive, the data were of good quality, and 974 interviews were deemed adequate.

3.2 Participants’ Characteristics

Table 1 shows the characteristics of the study sample in comparison with the Egyptian general population [30, 36]. The average age was 36.9 years, and 52.4% of the participants were male. Overall, the sample was representative of the Egyptian adult general population with respect to age, sex, and geographical distribution. However, compared with national statistics, illiterate participants, elderly participants (≥ 65 years), and residents of rural areas were underrepresented in our sample, whereas those aged 35–54 years were overrepresented.

Table 1 Background characteristics of the Egyptian participants

3.3 Self-Reported Health Using the EQ-5D-5L Descriptive System

In the actual sample, 15.2% of the participants were in full health (11111). The most common health problem reported by Egyptian participants was anxiety and depression (64.3%), and the least common health problem was self-care (6.3%). The mean ± standard deviation VAS score was 76.9 ± 16.7 (Table 1).

3.4 Composite Time Trade-Off and Discrete-Choice Experiment Data

The 974 interviews provided 9740 cTTO responses and 6818 DCE responses. The mean interview time was 41 ± 16 min. The mean number of iterative steps to reach the point of indifference was 7.2 ± 3.2. The mean time spent in the feedback module was 2.8 ± 10.4 min. The participants flagged 898 cTTO responses using the feedback module. A total of 254 (26%) participants had at least one inconsistency (incorrectly ranked health states), which reduced to 122 (12.5%) after using the feedback module. The number of inconsistencies related to states with a level sum score of 6 (mild problems in one dimension only) and to state 55555 was 11 (1%) and 31 (3%), respectively, and reduced to 6 (0.6%) and 3 (0.3%), respectively, after using the feedback module.

The main analysis included all the unflagged cTTO valuations (8842 responses); 40.9% of these were considered WTD, and the mean observed value was negative for 36 of the 86 health states included in the cTTO design. The percentages of values clustered at − 1, − 0.5, 0, 0.5, and 1 were 13.3%, 4%, 1.5%, 5.2%, and 12.3%, respectively (Fig. 1). As the level sum score increased for the EQ-5D-5L health states, lower mean TTO values and a larger standard deviation were observed (Fig. 2). The mean observed cTTO value of the 86 health states was 0.12 ± 0.73, which ranged from 0.96 ± 0.08 for health state 11211 to − 0.83 ± 0.3 for health state 55555.
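The sample and response counts reported in Sects. 3.1 and 3.4 can be reproduced with simple arithmetic, as in the Python sketch below. The variable names are illustrative; all figures are taken from the text.

```python
# Interview flow (Sect. 3.1)
total_interviews = 1378
incomplete, dropped, pilot = 75, 113, 216
final_interviews = total_interviews - incomplete - dropped - pilot

# Responses per interview: ten cTTO states and seven DCE pairs (Sect. 2.6)
ctto_responses = final_interviews * 10
dce_responses = final_interviews * 7

# cTTO responses flagged in the feedback module are excluded (Sect. 3.4)
flagged = 898
unflagged_ctto = ctto_responses - flagged

print(final_interviews)  # 974
print(ctto_responses)    # 9740
print(dce_responses)     # 6818
print(unflagged_ctto)    # 8842
```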

Fig. 1 Observed composite time trade-off (cTTO) value distribution

Fig. 2 Mean observed composite time trade-off (cTTO) values by level sum score. SD standard deviation

For the DCE tasks, the participants were more likely to choose the health state with the lower misery score as the difference in severity between the two health states increased. In total, 23 participants (2.4%) answered using one of the following specific patterns: AAAAAAA, BBBBBBB, ABABABA, or BABABAB. However, their mean time to complete the DCE tasks was acceptable, so we decided not to exclude these interviews from the analysis.

3.5 Modelling Results

Modelling results are shown for the cTTO, DCE, and hybrid models in Tables 2, 3, and 4, respectively. All the tested models were logically consistent except for some minor inconsistencies in the conditional logit model for the DCE data at level 3 of the self-care and usual activities dimensions (SC3 and UA3) (Table 3). Furthermore, all model parameter estimates were statistically significant except self-care level 2 (SC2) in the Tobit model (model 2) (Table 2) and anxiety/depression level 2 (AD2) in the conditional logit model (model 5). The dimension ranking for the cTTO models in terms of relative importance was as follows. For models 1, 2, and 3, mobility was the most important dimension followed by anxiety/depression, pain/discomfort, self-care, and usual activities (least important). For the heteroskedastic model with constraints (model 4), pain/discomfort was more important than anxiety/depression (0.434 vs. 0.413, respectively). Disutility values of the DCE model (model 5) were calculated by dividing the coefficients of the DCE model by the rescaling factor (factor = 3.884). Mobility had the largest impact on health state preference values for all the tested models.
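The rescaling of the DCE coefficients can be sketched in Python as below. Only the rescaling factor (3.884) comes from the text; the coefficient values in the example are hypothetical placeholders, not the estimates reported in Table 3.

```python
RESCALING_FACTOR = 3.884  # reported rescaling factor for model 5


def rescale_dce(coefficients, factor=RESCALING_FACTOR):
    """Convert latent-scale DCE coefficients to disutilities by dividing by the factor."""
    if factor == 0:
        raise ValueError("rescaling factor must be non-zero")
    return {name: coef / factor for name, coef in coefficients.items()}


# HYPOTHETICAL latent-scale coefficients, for illustration only.
example = {"MO5": 1.942, "PD5": 0.971}
rescaled = rescale_dce(example)
print(round(rescaled["MO5"], 3))  # 0.5
print(round(rescaled["PD5"], 3))  # 0.25
```

After rescaling, the DCE decrements are on the same scale as the cTTO decrements and can be compared dimension by dimension.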

Table 2 Parameter estimates for composite time trade-off models
Table 3 Parameter estimates for discrete-choice experiment model
Table 4 Parameter estimates for hybrid models

3.6 Preferred Model and Value Set

Both the GLS model and heteroskedastic model with constraints performed better than the other tested models in terms of prediction accuracy (MAE and RMSE), logical consistency, significance level, and goodness of fit (AIC and BIC) (Table 2). However, the heteroskedastic model with constraints (model 4) was considered the preferred model because of its ability to handle the heteroskedasticity of the error term. In addition, it had a lower MAE than the other tested models in the observed cTTO data, in the mean observed values for the 86 health states included in the design, and in the mean observed values for the mildest health states, with level sum score < 10, indicating better accuracy (the fit statistics are shown in the electronic supplementary material). The constant term in the model was not significant and was suppressed (Fig. 3).

Fig. 3 Scatterplots of the predicted values of the heteroskedastic model with constraints versus the mean observed values of composite time trade-off (cTTO) of each health state

The predicted cTTO values ranged from − 0.964 for the worst health state (55555) to 0.948 for 11211. A total of 1123 (35.94%) of the 3125 health states were WTD. Dimension ranking in terms of relative importance was mobility (most important), pain/discomfort, anxiety/depression, self-care, and usual activities (least important). For any given health state, the utility value can be calculated by subtracting from 1 the parameter estimates (dummy coefficients) for each dimension level of the health state.
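The scoring rule described above can be sketched in Python as follows. The decrements in this example are hypothetical placeholders chosen for illustration; they are NOT the Egyptian tariff, whose parameter estimates are reported in Table 2. Consistent with the text, no constant term is included.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")

# HYPOTHETICAL decrements per dimension and level (level 1 has no decrement).
# Replace with the published parameter estimates to score real data.
EXAMPLE_DECREMENTS = {
    "MO": {2: 0.05, 3: 0.10, 4: 0.30, 5: 0.45},
    "SC": {2: 0.03, 3: 0.06, 4: 0.20, 5: 0.30},
    "UA": {2: 0.02, 3: 0.05, 4: 0.15, 5: 0.25},
    "PD": {2: 0.04, 3: 0.09, 4: 0.28, 5: 0.40},
    "AD": {2: 0.04, 3: 0.08, 4: 0.26, 5: 0.38},
}


def utility(state, decrements=EXAMPLE_DECREMENTS):
    """Utility = 1 minus the sum of the decrements for each dimension level."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    total = 0.0
    for dim, ch in zip(DIMENSIONS, state):
        level = int(ch)
        if level > 1:
            total += decrements[dim][level]
    return 1.0 - total


print(utility("11111"))            # 1.0 (full health)
print(round(utility("11211"), 2))  # 0.98 with the placeholder decrements
print(utility("55555") < 0)        # True: worse than dead under these placeholders
```

A health state is classified as WTD when its utility is negative; applying `utility` to all 3125 states with the published estimates would reproduce the share reported above.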

3.7 Sensitivity Analysis

The model performance worsened after inclusion of the health states flagged in the feedback module, so we decided to exclude the flagged health states from the analysis. No other exclusions were applied to the data because only three inconsistencies related to state 55555 remained and only one participant gave the same value for all health states.

4 Discussion

To the best of our knowledge, this is the first EQ-5D-5L valuation study in Egypt and in the Middle East and North Africa (MENA) region. A consistent tariff was generated with statistically significant decrements for all dimensions for use as a scoring system for economic evaluation to inform decision making and improve the quality of HTA in the Egyptian healthcare system.

The successful application of the EQ-VT valuation protocol on the Egyptian population verified the feasibility and cultural appropriateness of using such valuation techniques in Muslim and Arabic-speaking countries. Furthermore, the extensive pilot phase and the periodic QC meetings allowed the Egyptian study team and the EQ-VT support team to enhance the interviewers’ performance and promote compliance with the valuation tasks.

The heteroskedastic model with constraints (model 4) based on the cTTO data was selected as the preferred model for the Egyptian tariff because the cTTO data were of good quality. The parameter estimates of the heteroskedastic constrained model were statistically significant and monotonic and accounted for the heteroskedasticity of the data. Two inconsistencies appeared in the DCE conditional logit model. Furthermore, in the tested models, the size of the coefficients for the five dimensions at different levels differed considerably between the DCE and TTO data because the two techniques have different underlying assumptions. cTTO data are time-dependent data influenced by scale compatibility and loss aversion [24, 37], whereas the DCE is a choice-based task characterized by attribute non-attendance and lexicographic preferences [38, 39]. Although methods have been developed to correct for attribute non-attendance [40], no software packages exist that would allow us to use these in combination with hybrid modelling, making it impossible to anchor the attribute-non-attendance-adjusted values onto the quality-adjusted life-year scale using the hybrid modelling technique. Other countries such as the USA [41], the Netherlands [42], China [43], Uruguay [44], Korea [45], and Hungary [46] also used only cTTO data to generate their national value sets.

All EQ-5D-5L valuation studies followed the same standardized international protocol (EQ-VT) so the results can be easily compared across countries. In Egypt, mobility had the largest impact on health state preference values. This may be due to the limited access to social welfare for immobility in this country. Furthermore, Egypt lacks the infrastructure that enables people with mobility problems to live normally and independently in society. Mobility was also the most important dimension in all Asian countries [31, 43, 45, 47,48,49,50,51,52], Hungary [46], Uruguay [44], and Canada [53].

In this study, the predicted cTTO values ranged from − 0.964 for the worst health state (55555) to 0.948 for 11211. The worst health state had a higher value than in Taiwan (− 1.0259) [49] and Ireland (− 0.974) [54] but was lower than in all other published valuation studies [31, 41,42,43,44,45,46,47,48, 50,51,52,53, 55,56,57,58,59,60].

Egypt had the largest percentage (40.9%) of cTTO observations considered to be WTD compared with other countries such as Taiwan (38.5%) [49], Indonesia (35.39%) [47], and Japan (7.5%) [52]. This may be attributed to cultural and social factors, as most participants in Egypt preferred death to being a burden on family and friends as a result of severe illness, as stated in the country-specific questionnaire (details will be published subsequently). This is in line with findings published in the Indonesian EQ-5D-5L valuation study [47].

There were 1172 (13.3%) observations at − 1, where the participants traded all 20 years of life to avoid living in certain health states in the cTTO task; this percentage was higher than in Ethiopia (8.04%) [55] and lower than in the USA (14.7%) [41] and Hong Kong (16%) [31]. Furthermore, 12.3% and 1.5% of the observations were clustered at 1 and 0, respectively, compared with 20.5% and 5.1%, respectively, in the USA [41]. Clustering at these critical points might be due to interviewer effects, task shortcutting, and social and cultural factors. In this study, the QC tool was used rigorously, and the pilot phase was extensive to reduce the variability among and within interviewers, standardize their performance, and improve data quality.

There were some limitations in terms of differences in the distribution of background variables in the actual sample compared with the data provided by the Egyptian Central Agency for Public Mobilization and Statistics [30]. Rural and illiterate participants were underrepresented in our sample because it was difficult for interviewers to reach some rural areas, although every effort was made to represent people living in those areas as much as possible. The sample nevertheless accurately represented the geographical distribution in Egypt. The EQ-VT protocol was designed for literate and educated participants. Tunisia recently published an EQ-5D-3L valuation study that only included literate individuals, despite illiterate people representing 18.8% of the general Tunisian population [61]. However, our study team decided not to exclude illiterate participants from the Egyptian study to ensure they had a voice in the produced tariff. The team made every effort to interview illiterate participants with the aid of visual tools but did not fill the exact quota for illiterate participants (25.8%) because the visual-aid tool was not fully validated.

Other demographic characteristics are shown in Table 1. Some characteristics, such as religion and employment status, did not deviate significantly from the population, whereas marital status and health insurance coverage differed significantly from the population distribution. Despite the deviations from the exact population distribution, the demographic characteristics still had the required diversity. Furthermore, the planned quota was not completely filled because the COVID-19 pandemic led to a sudden interruption of data collection. These deviations in the sample characteristics in terms of residence and/or education are in line with other valuation studies. Further research is needed to assess the feasibility and impact of weighting underrepresented characteristics on the produced value sets. A publication exploring the effect of cultural and demographic differences on health valuation in Egypt is underway.

Finally, the availability of the Egyptian tariff will encourage health economists and clinicians to include quality-of-life questionnaires in clinical trials and implement cost-utility analysis and pharmacoeconomic modelling to assist decision makers in appropriate allocation of healthcare resources.

Since cultural and socioeconomic factors play a role in shaping people’s preferences, the high quality of the data used in the Egyptian value set may allow its use in economic evaluations for MENA countries that share common cultural and socioeconomic backgrounds but for which a country-specific value set is not yet available, rather than using tariffs from outside the region [62]. It must be noted that recommendations are for each country to develop its own value set to represent the views and preferences of its own population [63].

5 Conclusion

This is the first value set for EQ-5D-5L based on social preferences obtained from a nationally representative sample in Egypt. The value set will play a key role in economic evaluations and HTAs in Egypt. In addition, other countries in the MENA region may be encouraged to follow suit and develop their own value sets.