Background

Diabetes mellitus leads to severe micro- and macrovascular complications and results in increased mortality. Complications and treatment of the disease reduce health-related quality of life (HRQoL) [1, 2]. This effect can be measured with numerous disease-specific questionnaires, such as the Diabetes Health Profile (DHP) [3], Diabetes Quality of Life measure (DQOL) [4], Diabetes-39 (D-39) [5] or the Audit of Diabetes Dependent Quality of Life (ADDQoL) [6]. Generic quality of life instruments, such as the 36-Item Short-Form Health Survey (SF-36), the World Health Organization Quality-of-Life Scale (WHOQOL-BREF) or the EQ-5D questionnaire, may complement these measures or be used on their own [7].

The three-level version of EQ-5D is commonly used in diabetes research and for modelling health outcomes in economic evaluations of antidiabetic drugs. Multiple studies have confirmed EQ-5D-3L measurement properties in patients with diabetes [8]. Recent years have brought the development of a new five-level version of EQ-5D, intended to improve some psychometric properties of the original three-level version [9]. Early assessment within a multi-country study involving patients with eight chronic conditions demonstrated several advantages of the new version: a reduced ceiling effect, improved discriminatory power and convergent validity [10]. These findings were confirmed by a recent review [11].

Several reports on the validity of the EQ-5D-5L in patients with diabetes have been published [10, 12,13,14,15], most of them focusing on type 2 diabetes [12, 14, 15]. These analyses also confirmed the reliability [13, 14] and responsiveness [15] of the EQ-5D-5L in this clinical context, and one study included a qualitative examination of content validity [12]. In the published research, either the EQ-5D-5L index was not estimated (only EQ-5D descriptive results were presented) [10, 12], or it was derived by mapping to EQ-5D-3L index values (cross-over value set) [13, 14]. The only EQ-5D-5L validation study in patients with diabetes that used health state utility values from a directly elicited value set comes from the province of Alberta (Canada) and applied a Canadian time trade-off-based value set [15]. Reports presented comparisons with EQ-5D-3L [13], SF-36 [13] and SF-6D [15]. None of the studies examined convergent validity with SF-12 domains. Nor has any study compared an EQ-5D-5L index based on a directly elicited value set with the EQ-5D-3L index, or with an EQ-5D-5L index based on a crosswalk algorithm (EQ-5D-5Lcrosswalk index).

Our study aimed to assess the validity of the EQ-5D-5L questionnaire in respondents with self-reported diabetes identified in a general population survey, and to compare it with EQ-5D-3L, SF-12, SF-6D, EQ VAS and the EQ-5D-5Lcrosswalk index.

Methods

Respondents

Adult Polish citizens who took part in a nationally representative general population survey [16, 17], confirmed having a diagnosis of diabetes and had complete HRQoL data were eligible for the validation study.

Sample recruitment and interviewing for the general population survey were carried out by a market research company, the Public Opinion Research Centre (CBOS). To obtain a representative sample, multi-stage random sampling was used. First, the Polish adult population was divided into 65 strata, based on the country's administrative division into 16 provinces and on the type and size of the localities where participants resided. The predetermined study sample was allocated to strata in proportion to the structure of the general population. Random sampling was then performed in three stages: first, localities (towns/cities or villages); second, small areas (one or several adjacent streets); and third, eight people living in separate households in each selected area. The final selection of individuals was based on their Personal Identification Number (PESEL) [16, 18].

Respondents were classified as having self-reported diabetes if, in response to the question "Have you ever been diagnosed with diabetes?", they chose one of the following answers: (a) "Yes, but I don't take any medication", (b) "Yes, I take antidiabetic medication (other than insulin)" or (c) "Yes, I take insulin". Respondents on combined antidiabetic treatment could choose both answers (b) and (c). The diagnosis was not verified against blood HbA1c or fasting plasma glucose levels, medical records or registries.

Measures

Data were collected during face-to-face interviews conducted by professional CBOS interviewers in participants' homes. The health-related quality of life of respondents was measured with EQ-5D-5L [9], EQ VAS, SF-12v2 [19] and EQ-5D-3L [20, 21]. The instruments were always presented in this order. Self-completed paper-and-pencil versions were used. Sociodemographic data covering age, sex, type and size of locality, administrative region, education level, professional status, religiosity and smoking status were collected using a computer-assisted personal interviewing (CAPI) system.

To obtain EQ-5D index values, we used three different country-specific Polish value sets: (1) a directly elicited EQ-5D-5L set, based on a hybrid model combining time trade-off (TTO) and discrete choice experiment (DCE) data [22], (2) an EQ-5D-5L set, based on mapped EQ-5D-3L values and the official EuroQol Group crosswalk methodology [23, 24], and (3) a directly elicited EQ-5D-3L set, based on TTO [25]. As there is no Polish SF-6D value set, we used the SF-12v2-based algorithm for the United Kingdom developed by Brazier et al. [26]. For comparison, all EQ VAS values were rescaled to a 0–1 range (divided by 100). The study was approved by the ethics committee of the Medical University of Warsaw. All participants gave informed consent before inclusion.

Analysis

Only respondents with complete HRQoL data were included in the psychometric analysis. We analysed the proportion and severity of logical inconsistencies between paired EQ-5D-5L and EQ-5D-3L responses, following the method proposed by Janssen et al. (see [10] for details). In short, a response pair was defined as inconsistent when the EQ-5D-5L response was two (grade 1 inconsistency), three (grade 2) or four (grade 3) levels apart from the EQ-5D-3L response. We compared EQ-5D-5L, EQ-5D-3L and SF-6D in terms of the frequency of individual health states, the ceiling effect and informativity (discriminatory power) [27]. We evaluated construct validity in terms of known-groups validity and convergent validity: convergence of EQ-5D-5L dimensions with SF-12 domains and with SF-6D and EQ-5D-3L dimensions, and convergence of the EQ-5D-5L index with the EQ-5D-3L and SF-6D indices and EQ VAS.
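The grading rule above can be sketched in Python. As an illustration only, the sketch assumes that each EQ-5D-3L level is placed on the five-level scale as 1 → 1, 2 → 3 and 3 → 5, so that the largest possible gap is four levels; the exact correspondence used by Janssen et al. is described in [10], and the function names are hypothetical.

```python
# ASSUMPTION (illustrative, not taken from the study): EQ-5D-3L levels are
# placed on the five-level scale as 1 -> 1, 2 -> 3, 3 -> 5.
LEVEL_3L_ON_5L = {1: 1, 2: 3, 3: 5}


def inconsistency_grade(level_3l: int, level_5l: int) -> int:
    """Return 0 for a consistent pair, else grade 1-3 (2, 3 or 4 levels apart)."""
    if level_3l not in LEVEL_3L_ON_5L or not 1 <= level_5l <= 5:
        raise ValueError("levels must be 1-3 (EQ-5D-3L) and 1-5 (EQ-5D-5L)")
    distance = abs(LEVEL_3L_ON_5L[level_3l] - level_5l)
    # Pairs 0 or 1 level apart are treated as consistent.
    return max(distance - 1, 0)


def inconsistency_summary(pairs):
    """Share of inconsistent (3L, 5L) pairs and counts per grade for one dimension."""
    grades = [inconsistency_grade(a, b) for a, b in pairs]
    inconsistent = [g for g in grades if g > 0]
    share = len(inconsistent) / len(grades)
    counts = {g: inconsistent.count(g) for g in (1, 2, 3)}
    return share, counts


# Hypothetical responses for one dimension: (EQ-5D-3L level, EQ-5D-5L level).
share, counts = inconsistency_summary([(1, 1), (1, 3), (2, 2), (3, 1), (2, 5)])
print(share, counts)
```

Under this assumed mapping, a "no problems" answer on EQ-5D-3L paired with "moderate problems" on EQ-5D-5L would count as a grade 1 inconsistency.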

Discriminatory power was assessed with the Shannon Index (H'), which represents the absolute amount of information captured, and the Shannon Evenness Index (J'), which reflects how evenly (rectangularly) responses are distributed regardless of the number of levels (for details, see [10]). When responses are spread evenly across all levels, H' reaches its maximum of log2 5 ≈ 2.32 for EQ-5D-5L or log2 3 ≈ 1.58 for EQ-5D-3L, and J', defined as H' divided by this maximum, equals 1.0, indicating the maximum informativity the instrument can capture [28].
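A minimal Python sketch of both indices for a single dimension, assuming base-2 logarithms (which yield the maxima of 2.32 and 1.58 quoted above); the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_indices(responses, n_levels):
    """Return (H', J') for one dimension using base-2 logarithms.

    responses: observed levels (1..n_levels) for one dimension
    n_levels:  5 for EQ-5D-5L, 3 for EQ-5D-3L
    """
    counts = Counter(responses)
    total = sum(counts.values())
    # H' = sum over observed levels of p * log2(1 / p); unobserved levels add 0.
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    h_max = math.log2(n_levels)  # about 2.32 for five levels, 1.58 for three
    return h, h / h_max


# A perfectly even (rectangular) distribution reaches the maximum: J' = 1.0.
print(shannon_indices([1, 2, 3, 4, 5] * 10, 5))
```

Because J' divides by the level-specific maximum, it allows the three- and five-level versions to be compared on a common 0–1 scale, which H' alone does not.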

Known-groups validity was assessed for the EQ-5D-5L, EQ-5D-5Lcrosswalk, EQ-5D-3L and SF-6D indices by age group, sex, type of diabetes treatment, education level and subjective health status, defined by EQ VAS quartiles [29, 30]. We hypothesized that health state utility would be higher in younger age groups, in males, in patients taking no medication or oral antidiabetic drugs, in respondents with a medium or high level of education and in those with better self-rated health.
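The EQ VAS-based grouping can be illustrated with a short Python sketch that splits respondents into EQ VAS quartiles and compares mean utility across them. It assumes the default quartile method of Python's `statistics.quantiles` (the study's software may compute cut points differently), uses hypothetical data and a hypothetical function name, and omits significance testing.

```python
from statistics import mean, quantiles


def means_by_vas_quartile(vas_scores, utilities):
    """Mean utility per EQ VAS quartile (1 = worst self-rated health)."""
    cuts = quantiles(vas_scores, n=4)  # three cut points
    groups = {1: [], 2: [], 3: [], 4: []}
    for vas, u in zip(vas_scores, utilities):
        # Quartile = 1 + number of cut points the score exceeds.
        q = 1 + sum(vas > c for c in cuts)
        groups[q].append(u)
    return {q: mean(us) for q, us in groups.items() if us}


# Hypothetical data: the hypothesis predicts means rising from Q1 to Q4.
vas = [30, 40, 50, 60, 70, 75, 80, 90]
util = [0.40, 0.50, 0.60, 0.70, 0.75, 0.80, 0.85, 0.95]
print(means_by_vas_quartile(vas, util))
```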

Convergent validity was evaluated by examining the strength of association between the EQ-5D-5L and EQ-5D-3L dimensions, and between the EQ-5D-5L dimensions and SF-12 domains, using Spearman rank correlation. We used the following criteria to interpret the strength of correlation: Rho < 0.20: absent, 0.20–0.34: poor, 0.35–0.50: moderate, > 0.50: strong [31, 32]. The convergence of the index values of the generic questionnaires with each other, with the SF-12 summary scores and with EQ VAS was also assessed with Spearman rank correlation, using the same interpretation criteria.
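The correlation step can be sketched in Python as Spearman's rho (the Pearson correlation of tie-averaged ranks) followed by the interpretation cut-offs above. The sketch applies the cut-offs to |rho|. This is an assumption: higher EQ-5D levels mean more problems while higher SF-12 scores mean better health, so the expected correlations are negative. Values between 0.34 and 0.35, which the published bands leave open, are treated here as poor. Function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def correlation_strength(rho):
    """Label |rho| with the cut-offs used in the study."""
    r = abs(rho)
    if r < 0.20:
        return "absent"
    if r < 0.35:
        return "poor"
    if r <= 0.50:
        return "moderate"
    return "strong"
```

In practice, `scipy.stats.spearmanr` computes the same coefficient together with a p-value.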

The agreement between instruments was explored with the intra-class correlation coefficient (ICC; one-way random effects model) and illustrated with Bland–Altman plots, which plot the difference between two scores (Y-axis) against their mean (X-axis). The 95% limits of agreement were estimated as the mean of the differences ± 1.96 × SD of the differences [33]. If differences within these limits are not clinically important, the two measurement methods can be regarded as interchangeable. Potential proportional bias was investigated with linear regression of the differences on the means. To examine whether differences in scale range contributed to the observed differences in health state values, we ran an additional Bland–Altman analysis with all utility instruments rescaled to the same 0–1 range. All data analyses were performed with StatsDirect software (ver. 2.8.0).
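The agreement statistics can be sketched in plain Python. The ICC below is the single-measures, one-way random effects form (often written ICC(1,1)). Whether StatsDirect reports exactly this variant is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Mean difference (a - b) and 95% limits of agreement: mean ± 1.96 × SD."""
    d = [x - y for x, y in zip(a, b)]
    d_bar, sd = mean(d), stdev(d)  # sample SD (n - 1 denominator)
    return d_bar, d_bar - 1.96 * sd, d_bar + 1.96 * sd


def proportional_bias_slope(a, b):
    """OLS slope of differences on pairwise means; far from 0 suggests proportional bias."""
    m = [(x + y) / 2 for x, y in zip(a, b)]
    d = [x - y for x, y in zip(a, b)]
    m_bar, d_bar = mean(m), mean(d)
    sxy = sum((mi - m_bar) * (di - d_bar) for mi, di in zip(m, d))
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    return sxy / sxx


def icc_oneway(a, b):
    """ICC(1,1): one-way random effects, single measures, two scores per respondent."""
    k, n = 2, len(a)
    grand = mean(list(a) + list(b))
    subj_means = [(x + y) / k for x, y in zip(a, b)]
    # Between- and within-respondent mean squares from a one-way ANOVA.
    msb = k * sum((s - grand) ** 2 for s in subj_means) / (n - 1)
    msw = sum((x - s) ** 2 + (y - s) ** 2
              for x, y, s in zip(a, b, subj_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Two-way ICC models, which treat the instrument as a fixed or random effect, can give different values from this one-way form.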

Results

From April to June 2014, 2974 respondents from the general population of Poland were surveyed with HRQoL instruments and a screening question about diabetes [14]. Of these, 255 (8.6%) self-reported a diagnosis of diabetes. Within this group, 247 respondents (96.9%; mean age 64.6 years; 53.4% female) had complete HRQoL data and were included in the psychometric analysis.

The overall proportion of EQ-5D-5L responses inconsistent with EQ-5D-3L was 7.9%, ranging from 4.9% for pain/discomfort to 14.2% for usual activities. Most inconsistencies (89.7%) were grade 1, as defined by Janssen et al. [10].

The proportion of respondents reporting 'no problems' on all dimensions (the ceiling) was 14.2% for EQ-5D-3L and 9.3% for EQ-5D-5L, compared with 1.6% for SF-6D and 2.4% for EQ VAS. Moving from EQ-5D-3L to EQ-5D-5L reduced the ceiling effect by 34.5% in relative terms. At the dimension level, the relative reduction was largest for anxiety/depression (18.2%), followed by mobility (10.4%) and pain/discomfort (10.2%). In the usual activities dimension, however, the ceiling effect increased (by 15.7% in relative terms). Figure 1 shows the dichotomized response distributions of the EQ-5D-5L, EQ-5D-3L and SF-6D instruments.
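The overall 34.5% figure can be reproduced, assuming the relative change is defined as the difference in ceiling proportions divided by the EQ-5D-3L proportion. Dimension-level figures would follow the same formula with per-dimension 'no problems' proportions, which are not reported in the text. The function name is hypothetical.

```python
def relative_change(p_3l, p_5l):
    """Relative change in a ceiling proportion when moving from EQ-5D-3L to EQ-5D-5L.

    Positive values indicate a reduction of the ceiling effect, negative an increase.
    """
    return (p_3l - p_5l) / p_3l


# Percentages of respondents at the ceiling, as reported in the text.
overall = relative_change(14.2, 9.3)
print(f"{overall:.1%}")
```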

Fig. 1 Response distribution of EQ-5D-5L, EQ-5D-3L and SF-6D domains

Both the Shannon Index and the Shannon Evenness Index showed high informativity of the EQ-5D-5L pain/discomfort and mobility dimensions in respondents with self-reported diabetes (Table 1). These dimensions also showed the largest improvement in relative discriminatory power when moving from EQ-5D-3L to EQ-5D-5L (a relative increase in J' of 22.7% and 23.1%, respectively). However, increasing the number of levels from three to five also reduced relative informativity in the usual activities and self-care dimensions (a relative decrease in J' of 11.5% and 11.4%, respectively).

Table 1 Shannon index (H’) and Shannon Evenness index (J’) for EQ-5D-5L and EQ-5D-3L

The total number of unique health states was 119 for EQ-5D-5L (most common 11111, n = 23 and 11122, n = 17), 43 for EQ-5D-3L (most common 11111, n = 35 and 22222, n = 29) and 172 for SF-6D (most common 243333, n = 8 and 242323, n = 7).

The mean health state utility value for all respondents with self-reported diabetes was highest when based on EQ-5D-5L: 0.798 (SD 0.251; range −0.446 to 1.0). The corresponding scores for the EQ-5D-5Lcrosswalk index, EQ-5D-3L and SF-6D were lower by 0.044, 0.047 and 0.165, respectively. Figure 2 shows the distribution of scores for the four analysed health state utility instruments and EQ VAS.

Fig. 2 Distribution of four health state utility indices (EQ-5D-5L, EQ-5D-5Lcrosswalk, EQ-5D-3L and SF-6D) and EQ VAS

The results for known-groups construct validity confirmed our prior hypotheses: index scores were higher in younger age groups, in males, in those taking no medication or oral antidiabetic drugs, in respondents with a medium or high level of education and in respondents with better self-rated health according to EQ VAS (Table 2). There were two unexpected findings: utility was lower in patients on insulin therapy than in patients on combination treatment (for all instruments except SF-6D), and EQ-5D-3L scores were identical for the 18–50 and 51–60 age groups. These known-groups results should be interpreted with caution, as most of the observed differences were not statistically significant and the combined-treatment group was small.

Table 2 Known-groups construct validity: mean index-based scores of EQ-5D-5L, EQ-5D-5Lcrosswalk, EQ-5D-3L, SF-6D and EQ VAS (and 95% confidence intervals) by patient characteristics

Table 3 shows the convergent validity results for individual dimensions. The EQ-5D-5L and EQ-5D-3L dimensions showed similar correlation patterns, and the differences between the two versions were probably not statistically significant. The SF-12 social functioning domain was, in general, poorly correlated with the EQ-5D-5L dimensions, and poorly correlated or uncorrelated with the EQ-5D-3L dimensions (self-care (SC) and usual activities (UA), and mobility (MO), pain/discomfort (PD) and anxiety/depression (AD), respectively). Table 4 reports the relationships between index scores. EQ-5D-5L index scores were strongly correlated with the other index scores (EQ-5D-5Lcrosswalk, EQ-5D-3L and SF-6D). They were also strongly correlated with EQ VAS and the physical component summary score (PCS-12), but poorly correlated with the mental component summary score (MCS-12).

Table 3 Convergent validity with SF-12, SF-6D and EQ-5D-3L domains (Spearman’s rank correlation coefficients) (N = 247)
Table 4 Convergent validity with generic questionnaires indexes (Spearman's rank correlation coefficients) (N = 247)

An ICC of 0.81 between the EQ-5D-5L and EQ-5D-3L index scores indicated good agreement, whereas ICCs of 0.29 and 0.27 indicated poor agreement of EQ-5D-5L with SF-6D scores and EQ VAS, respectively. The Bland–Altman analysis showed a mean difference of 0.047 (95% limits of agreement: −0.258 to 0.352) between the EQ-5D-5L and EQ-5D-3L index scores, 0.165 (−0.226 to 0.557) between EQ-5D-5L and SF-6D, and 0.231 (−0.183 to 0.644) between EQ-5D-5L and rescaled EQ VAS scores. EQ-5D-5L index scores were higher in 64.0%, 86.6% and 88.7% of cases, respectively, and 6.4%, 4.9% and 4.4% of observations lay outside the 95% limits of agreement. The discrepancy between EQ-5D-5L and SF-6D index scores was larger for lower utility values. In the adjusted Bland–Altman analysis, the mean difference between EQ-5D-5L and SF-6D values increased to 0.395 (see Additional file 1: Fig. 1).

Linear regression analysis showed signs of proportional bias, indicating that the methods do not agree equally across the range of measurements (Fig. 3).

Fig. 3 Bland–Altman plots of EQ-5D-5L versus (a) EQ-5D-3L, (b) SF-6D and (c) EQ VAS scores (blue lines represent regression lines)

Discussion

This study indicates that EQ-5D-5L index values based on a directly measured value set, EQ-5D-5L index values based on a crosswalk algorithm and EQ-5D-3L index values provide valid measurement in Polish respondents with self-reported diabetes identified in a general population survey. We confirmed the construct validity of the EQ-5D-5L questionnaire in terms of known-groups validity and convergent validity with other generic HRQoL measures: EQ-5D-3L, SF-12, SF-6D, EQ VAS and the EQ-5D-5Lcrosswalk index. To the best of our knowledge, this is the first study reporting convergent validity of the EQ-5D-5L and SF-12 descriptive systems in patients with diabetes. It also provides the first comparisons, in patients with diabetes, of EQ-5D-5L index values based on a directly measured value set with the EQ-5D-5Lcrosswalk index and with the corresponding EQ-5D-3L values.

It was surprising to find that in the analysed population of self-reported diabetes patients, three dimensions (MO, PD, AD) seemed to function better psychometrically in the EQ-5D-5L questionnaire, while the remaining two performed better in the EQ-5D-3L version. The dimensions of usual activities (UA) and self-care (SC) were characterized by an increase in the ceiling effect (by 15.7% and 0.5% respectively) and a decrease in relative discriminatory power (by 11.4% and 11.5% respectively) in the five-level EQ-5D, compared to the three-level version.

Other psychometric studies in diabetes have observed similar patterns, although only to a limited extent. In a study by Pattanaphesaj et al., the EQ-5D-5L demonstrated a better distribution across severity levels than EQ-5D-3L in all dimensions except self-care [13]. Another study also found a better distribution than EQ-5D-3L in all dimensions except SC and PD [34]. However, in both publications, SC had low overall absolute informativity in patients with diabetes. This problem was also noted in the qualitative study by Matza (2015), in which respondents identified SC as the dimension least relevant to the problems of type 2 diabetes [12]. In a recent systematic review of EQ-5D-5L psychometric properties, Feng et al. found that SC is the dimension with the lowest percentage of reported problems across all the disease groups and healthy samples analysed [11]. A previous systematic review by the same authors indicated that SC is also the dimension in which a lower ceiling effect for EQ-5D-3L than for EQ-5D-5L most commonly occurs (20% of the studies included in the review) [35].

According to Gamst-Klaussen et al. (2018), problems with SC and UA reflect the other three dimensions. More specifically, PD and AD are causal indicators that drive SC and UA, with MO in an intermediate position [36]. Interestingly, in our study, UA and SC were the two dimensions with the largest proportions of inconsistencies between the five-level and three-level descriptive systems (14.2% and 8.5%, respectively). These results may partly stem from the fixed order in which the questionnaires were presented (EQ-5D-5L first, then SF-12 and finally EQ-5D-3L). Self-care and usual activities are arguably less intuitive dimensions than pain, anxiety or mobility problems, and respondents are generally less likely to report SC [37] and UA restrictions [16]. We hypothesise that completing the SF-12 questionnaire before EQ-5D-3L prompted respondents to analyse their situation more thoroughly and thus become more aware of their limitations. This would lead to a higher percentage of reported problems on the UA and SC dimensions in EQ-5D-3L than in EQ-5D-5L, and to a relatively high proportion of inconsistencies within these dimensions. If so, our observation would merely be an artefact of the survey design and the order of the questionnaires. Confirming this requires further psychometric studies in patients with diabetes that use a randomised order of questionnaires.

In our study, the Bland–Altman analysis showed a mean difference of 0.165 in utility values between EQ-5D-5L and SF-6D, which may seem surprising. This substantial disagreement may result from differences between the instruments' health state classification systems (the number and type of dimensions and levels included). A second possible cause is the more than two-fold difference in the ranges of the utility scales of the two instruments (EQ-5D-5L: utilities from −0.590 to 1.0, range 1.59; SF-6D: utilities from 0.296 to 1.0, range 0.704), a consequence of their different valuation methods (a hybrid of time trade-off and discrete choice experiment data for EQ-5D-5L, and standard gamble for SF-6D). However, when both instruments were rescaled to the same 0–1 range, the Bland–Altman mean difference between EQ-5D-5L and SF-6D increased to 0.395. These results do not support the hypothesis that differences in instrument ranges were the main driver of the differences between utility values estimated with different instruments. On careful examination, the Bland–Altman plots for SF-6D (both unadjusted and adjusted) show an inverted U-shape. This suggests that SF-6D is more sensitive than EQ-5D-5L to mild and moderate health problems, whereas EQ-5D-5L is more sensitive than SF-6D to severe health problems. Most of the studied population appears to have mild or moderate health problems, a range in which SF-6D appears to be the more sensitive instrument.
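The adjusted mean difference can be approximately reproduced from the published means and ranges. This assumes a min–max rescaling of each instrument over its theoretical utility range, since the exact transformation is not specified in the text. Because this rescaling is linear, the mean of the rescaled values equals the rescaled mean, so only the summary means are needed.

```python
# Theoretical utility ranges quoted in the text.
EQ5D5L_RANGE = (-0.590, 1.0)
SF6D_RANGE = (0.296, 1.0)


def rescale(value, lo, hi):
    """Min-max rescale a utility so that lo maps to 0 and hi maps to 1 (assumed method)."""
    return (value - lo) / (hi - lo)


mean_5l = 0.798              # reported EQ-5D-5L mean
mean_sf6d = mean_5l - 0.165  # SF-6D mean, from the reported unadjusted difference

adj_diff = rescale(mean_5l, *EQ5D5L_RANGE) - rescale(mean_sf6d, *SF6D_RANGE)
print(round(adj_diff, 3))  # 0.394
```

The result, about 0.394, is close to the adjusted mean difference of 0.395 reported in the Results, with the small gap attributable to rounding of the input means.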

The major strength of our study is that the psychometric properties of EQ-5D-5L, EQ-5D-3L and SF-12 could be examined and compared against each other, since all three instruments were administered at the same time to the same respondents. Another important strength is that we were able to use country-specific Polish utility algorithms to estimate and compare three EQ-5D-based indices (two direct and one mapped) [21, 22, 24]. Past studies used crosswalk algorithms [13, 14] or only calculated an EQ-5D-5L index based on a direct value set [15]. Finally, our design reduces the risk of selection bias: our results are based on respondents self-reporting diabetes in a sample representative of the entire population of Poland in terms of sex, age and geographical region. Previous studies of EQ-5D-5L psychometric properties focused primarily on type 2 diabetes [12, 14, 15] and were single-centre [13], three-centre [14] or regional [15] studies.

Our study has some limitations. We included respondents based on self-reported diabetes, without verifying the diagnosis with blood HbA1c or fasting plasma glucose levels, or with data from medical records or registries. However, a similar approach is used in epidemiological research on diabetes [38, 39], and the prevalence of diabetes in our study is in line with Polish research based on laboratory tests [40, 41]. Because our study was cross-sectional and did not collect longitudinal data, we could not assess responsiveness or test–retest reliability; these properties have, however, been assessed by other authors [13,14,15]. We did not include a diabetes-specific instrument, which could have served as a reference standard for measuring HRQoL in patients with diabetes. Many disease-specific instruments are used in diabetes studies, and none is universally accepted as the gold standard. None of the five identified studies on EQ-5D-5L psychometric properties in diabetes used disease-specific instruments [10, 12,13,14,15]. Nevertheless, adding a diabetes-specific questionnaire to our comparisons would have deepened the analysis. As described above, the questionnaires were presented in a fixed order, which might have lowered the response rate for EQ-5D-3L, the last instrument administered [42]; for this reason, we did not assess feasibility. We partly addressed potential memory effects by presenting the SF-12 questionnaire between EQ-5D-5L and EQ-5D-3L. We analysed only respondents with complete HRQoL data and excluded 8 (3.1%) respondents with missing answers. This proportion is relatively low, and it allowed us to avoid imputing missing scores, which can introduce bias.
The lack of a country-specific value set required us to estimate SF-6D values with the UK algorithm developed by Brazier et al. [26], and the lack of a Polish algorithm for PCS-12 and MCS-12 required us to use US norms from 2009 [43]. Many researchers face similar constraints, and the solutions we adopted are standard practice: SF-12 and SF-6D simply have fewer country-specific algorithms than instruments from the EQ-5D family [44].

A general principle of health-related quality of life research is that, where possible, researchers should use both generic and disease-specific questionnaires. Our study shows that, in practice, both EQ-5D-5L and EQ-5D-3L are good candidates for the generic instrument in populations of patients with diabetes. The former has a slightly lower ceiling effect and improved informativity in some dimensions, while the latter has some advantages in the self-care and usual activities dimensions. Another practical implication of our study is that countries lacking a directly measured EQ-5D-5L value set may still use mapped (cross-over) value sets in diabetes studies, as we showed that both approaches produce valid measurements.

Conclusions

In conclusion, the evidence supports the EQ-5D-5L descriptive system and the EQ-5D-5L index based on a directly measured value set as valid generic HRQoL measures in respondents from the general population with self-reported diabetes. The EQ-5D-5L and EQ-5D-3L questionnaires each showed psychometric advantages in different dimensions.