Introduction

In outcomes research, measurement equivalence is achieved if a scale generates comparable scores for individuals at the same level of health regardless of the populations they come from [1]. It is important to test the cross-cultural measurement equivalence of a self-reported health-status scale that is intended to compare health outcomes of populations from different cultures. This is because individuals from different cultures may have different ways of living, thinking, and expressing themselves [2], leading to culture-specific interpretations of questionnaire items and/or response styles and to differences in scale scores. When such differences are large enough, measurement equivalence cannot be assumed.

The 5-level EQ-5D (EQ-5D-5L) is a new version of the EQ-5D, a brief, generic health-status instrument [3]. It has been shown to have good psychometric properties [4–6] and to suffer from fewer ceiling effects than the original version, i.e., the 3-level EQ-5D (EQ-5D-3L) [4–7]. The first part of the instrument contains 5 five-point Likert-type items (no/slight/moderate/severe/extreme), which describe five dimensions of a respondent’s health status on the day of the survey, i.e., mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. An individual’s responses to the five items jointly form a multi-attribute health state to which a utility value (i.e., the EQ-5D-5L index score) can be assigned to indicate the utility of the health state to the general public [8]. The index score is anchored by 0 (death) and 1 (full health), with a higher score indicating higher utility. The second part is the EQ-visual analog scale (EQ-VAS), a vertical, hash-marked numerical rating scale ranging from 0 (the worst health state) to 100 (the best health state), on which respondents rate their overall health.

The EQ-5D-5L questionnaire is available in the official languages of Singapore, a multicultural, multiethnic city-state in South East Asia. Measurement equivalence is important to the use of health-related quality of life (HRQoL) instruments in Singapore because none of the official languages is spoken fluently by all residents, although many of them are multilingual. However, the measurement equivalence of the EQ-5D-5L across different language subgroups of the Singaporean population is unknown. Therefore, this study aimed to assess the measurement equivalence of the EQ-5D-5L index score (calculated using an interim algorithm) and the EQ-VAS score among the English, Chinese, and Malay versions in patients with type 2 diabetes mellitus (T2DM).

Methods

Patient recruitment

A cross-sectional survey was conducted in a convenience sample of T2DM patients visiting a primary health care institution in Singapore between July and December 2012. Patients were enrolled if they were: 1) 21 years or older, 2) a Singaporean citizen or permanent resident, 3) diagnosed with T2DM, 4) able to read local newspapers or magazines in English, Chinese or Malay, and 5) able to see well enough to read text in a font size of 14.

Data collection

Patients were approached by interviewers in the clinics while they were waiting for their routine consultations. Consenting patients were asked to complete the EQ-5D-5L questionnaire in English, Chinese or Malay, depending on their language preference. Patients’ socio-demographic and clinical characteristics were collected using a standardized questionnaire. The hemoglobin A1c (HbA1c) values of the patients were obtained from their doctors if they had the routine HbA1c test on the day of the survey. The HbA1c test measured the average blood glucose over the previous weeks and could give an indication of the long-term blood glucose control. This study was approved by the SingHealth Institutional Review Board.

EQ-5D-5L index score

The index score was calculated using an interim algorithm that maps each EQ-5D-5L health state to a linear combination of EQ-5D-3L health states [9], and thus to the EQ-5D-3L value set [10], because values for EQ-5D-5L health states directly elicited from a representative general population sample were not available. The 5 Likert-type items of the EQ-5D-3L are similar to those of the EQ-5D-5L except that they have only three descriptive levels (no/moderate/extreme). We used the UK value set [11] due to the lack of a Singaporean value set at the time of this study.
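The idea of scoring a 5L health state as a linear combination of 3L values can be illustrated with a minimal sketch. The weights and utility values below are hypothetical placeholders chosen for arithmetic clarity; they are not the published mapping weights or the UK value set, and the function name is our own.

```python
from typing import Dict

# Hypothetical EQ-5D-3L utility values keyed by 3L health state.
# Placeholders only: NOT the published UK value set.
VALUES_3L: Dict[str, float] = {
    "11111": 1.00,
    "11121": 0.80,
    "11112": 0.85,
    "11122": 0.70,
}

# Hypothetical weights that spread one 5L state over several 3L states.
# Placeholders only: NOT the published mapping. Weights must sum to 1.
WEIGHTS_5L_TO_3L: Dict[str, Dict[str, float]] = {
    "11122": {"11111": 0.10, "11121": 0.30, "11112": 0.20, "11122": 0.40},
}


def index_score_5l(state_5l: str) -> float:
    """Return the weighted sum of 3L values for one 5L health state."""
    weights = WEIGHTS_5L_TO_3L[state_5l]
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mapping weights must sum to 1")
    return sum(w * VALUES_3L[state_3l] for state_3l, w in weights.items())
```

With these placeholder numbers, the 5L state 11122 scores 0.10 × 1.00 + 0.30 × 0.80 + 0.20 × 0.85 + 0.40 × 0.70 = 0.79.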

Statistical analysis

Descriptive statistics were used to describe participants’ characteristics and responses to the Likert-type items, the EQ-5D-5L index score, and the EQ-VAS score. Responses to the Likert-type items of the EQ-5D-5L were coded as “no(t)” = 0, “slight(ly)” = 1, “moderate(ly)” = 2, “severe(ly)” = 3 and “unable”/“extreme(ly)” = 4, and were compared across languages using the Kruskal-Wallis test. The Chi-square test was used for other categorical variables, nominal or ordinal, and analysis of variance (ANOVA) for variables measured on at least an interval scale.
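The response coding described above can be sketched as follows. This is an illustration only; the function names and the simplified response labels are assumptions, not part of the study's analysis code.

```python
from collections import Counter
from typing import Iterable, List

# Numeric codes for the five response levels, as described in the text.
LEVEL_CODES = {"no": 0, "slight": 1, "moderate": 2, "severe": 3, "extreme": 4}


def code_response(label: str) -> int:
    """Map a response label to its 0-4 code ('unable' shares code 4)."""
    key = label.strip().lower()
    if key == "unable":
        key = "extreme"
    return LEVEL_CODES[key]


def distribution(labels: Iterable[str]) -> List[int]:
    """Count how many responses fall on each level, ordered from code 0 to 4."""
    counts = Counter(code_response(label) for label in labels)
    return [counts.get(code, 0) for code in range(5)]
```

The resulting per-language counts are the kind of input a rank-based comparison such as the Kruskal-Wallis test would operate on.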

Three multiple linear regression models were used to estimate the between-language (Chinese vs. English, Malay vs. English, and Malay vs. Chinese) difference in the EQ-5D-5L index score, adjusting for age, gender, marital status (married vs. single vs. divorced/separated/widowed), employment status (employed/retired vs. unemployed/homemaker/others), housing type (government-subsidized house with 1–3 rooms vs. government-subsidized house with ≥ 4 rooms/private house), education (≤ secondary vs. > secondary school), HbA1c, body mass index, duration of T2DM (< 5 vs. ≥ 5 years), presence of T2DM-related complications (no vs. yes), and presence of comorbidities (< 2 vs. ≥ 2). Survey language was coded into dummy variables. This multiple linear regression analysis was also performed for the EQ-VAS score.
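The dummy coding of survey language can be illustrated with a short sketch. The text does not state which reference category each model used, so the choice of reference below is an assumption, and `dummy_code` is a hypothetical helper name.

```python
from typing import Dict

LANGUAGES = ("English", "Chinese", "Malay")


def dummy_code(language: str, reference: str) -> Dict[str, int]:
    """Return 0/1 indicators for every language except the reference."""
    if language not in LANGUAGES or reference not in LANGUAGES:
        raise ValueError("unknown survey language")
    return {lang: int(language == lang) for lang in LANGUAGES if lang != reference}
```

With English as the reference, the coefficients on the Chinese and Malay indicators would estimate the adjusted Chinese vs. English and Malay vs. English differences; switching the reference to Chinese would give the Malay vs. Chinese difference.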

Measurement equivalence is demonstrated if the difference across languages is clinically unimportant [12]. Based on the approach to evaluating therapeutic equivalence in clinical trials [13, 14], we assessed measurement equivalence across languages by comparing the 90 % confidence interval (CI) of the between-language difference in the EQ-5D-5L index and EQ-VAS scores, respectively, with a pre-determined equivalence margin that represented a range of score differences too small to be clinically important [15]. Based on studies of the minimally important differences of the EQ-5D [16–18], the equivalence margin was set as −0.08 to 0.08 for the EQ-5D-5L index scores [19, 20] and −10.00 to 10.00 for the EQ-VAS scores. This would lead to one of three possible results (Fig. 1): 1) ‘equivalence’ was demonstrated if the 90 % CI fell completely within the equivalence margin, 2) ‘equivalence undetermined’ (i.e., equivalence cannot be determined, and either equivalence or non-equivalence might be present) was concluded if the 90 % CI partially overlapped with the equivalence margin, and 3) ‘non-equivalence’ was demonstrated if the 90 % CI fell completely outside the equivalence margin.

Fig. 1 Possible relationships between equivalence margins and 90 % confidence intervals
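The three-way decision rule can be written as a short sketch. Two points are assumptions rather than details from the study: the interval is built with a normal approximation (regression software typically uses a t distribution), and a CI limit lying exactly on the margin is treated as inside it. The function names are our own.

```python
from statistics import NormalDist
from typing import Tuple


def ci90(estimate: float, std_error: float) -> Tuple[float, float]:
    """Two-sided 90 % CI using a normal approximation (an assumption)."""
    z = NormalDist().inv_cdf(0.95)  # 5 % in each tail
    return estimate - z * std_error, estimate + z * std_error


def classify_equivalence(ci_low: float, ci_high: float, margin: float) -> str:
    """Compare a 90 % CI with the symmetric margin (-margin, +margin)."""
    if ci_low > ci_high or margin <= 0:
        raise ValueError("need ci_low <= ci_high and a positive margin")
    if -margin <= ci_low and ci_high <= margin:
        return "equivalence"  # CI completely within the margin
    if ci_high < -margin or ci_low > margin:
        return "non-equivalence"  # CI completely outside the margin
    return "equivalence undetermined"  # CI partially overlaps the margin
```

For illustration, with the index-score margin of 0.08, a hypothetical CI of −0.03 to 0.10 would be classified as ‘equivalence undetermined’, whereas a hypothetical CI of −0.03 to 0.05 would be classified as ‘equivalence’.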

In addition, fifteen multiple logistic regression models, five for each pair of survey languages, were used to compare the response patterns of participants to the EQ-5D-5L items between languages, with and without adjustment for the above-mentioned covariates. In all the models, the response of ‘no(t)’ was coded as 1 (the event) while the response of ‘slight(ly)’, ‘moderate(ly)’, ‘severe(ly)’ or ‘unable’/‘extreme(ly)’ was coded as 0 (the non-event). Survey language was coded into dummy variables.
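The dichotomization and the unadjusted comparison can be sketched as follows. For a logistic model with a single binary language indicator and no covariates, the estimated odds ratio equals the cross-product ratio of the 2 × 2 table, which is what this sketch computes; the adjusted models require a fitted regression and are not reproduced here. The function names and input data are hypothetical.

```python
from typing import Sequence


def is_no_problem(code: int) -> int:
    """Return 1 (event) for a 'no problems' response, 0 for levels 1-4."""
    if code not in range(5):
        raise ValueError("response code must be between 0 and 4")
    return 1 if code == 0 else 0


def unadjusted_odds_ratio(codes_a: Sequence[int], codes_b: Sequence[int]) -> float:
    """Odds of reporting 'no problems' in group a relative to group b."""
    a_event = sum(is_no_problem(c) for c in codes_a)
    a_non = len(codes_a) - a_event
    b_event = sum(is_no_problem(c) for c in codes_b)
    b_non = len(codes_b) - b_event
    if a_non == 0 or b_event == 0:
        raise ValueError("odds ratio is undefined for this table")
    return (a_event * b_non) / (a_non * b_event)
```

An odds ratio below 1 means that group a is less likely than group b to report ‘no problems’ on that dimension.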

Statistical tests were two-sided and performed using STATA/SE 11 software (StataCorp, Texas 77845 USA, 1984–2009), with the level of significance set at p < 0.05.

Results

Seven hundred and twenty-nine patients participated in the study, representing an overall response rate of 61.5 %. Participants completing different language versions differed in some characteristics (Table 1). In all language groups and for all health dimensions of the EQ-5D-5L, more severe levels of problems were endorsed less often (Table 2). Participants completing the Malay version (mean ± standard deviation [SD]: 81.85 ± 15.04) had a significantly higher mean EQ-VAS score than those who completed the English (mean ± SD: 75.46 ± 18.46) or Chinese (mean ± SD: 78.00 ± 18.33) version (p < 0.001). No statistically significant difference was found in the trend of responses or in the mean EQ-5D-5L index score across languages.

Table 1 Characteristics of participants
Table 2 Distributions of EQ-5D-5L dimension, index and VAS scores by survey language

Adjusted and unadjusted results from the linear regression analyses are shown in Table 3. After adjusting for the covariates, the mean EQ-5D-5L index score of the Chinese version was higher than that of the English version; the Malay version had a lower mean EQ-5D-5L index score than the English and Chinese versions. Comparisons of the 90 % CIs of the differences with the respective pre-determined equivalence margin suggested that equivalence of the EQ-5D-5L index scores was demonstrated between the Chinese and English versions and between the Malay and English versions, whereas equivalence could not be determined between the Malay and Chinese versions. The adjusted mean EQ-VAS scores of the Chinese and Malay versions were higher than that of the English version. The Malay version had a higher adjusted mean EQ-VAS score than the Chinese version. The 90 % CIs of the differences suggested that, while equivalence of the EQ-VAS scores was demonstrated between the Chinese and English versions and between the Malay and Chinese versions, equivalence could not be determined between the Malay and English versions.

Table 3 The 90 % confidence intervals of the differences in EQ-5D-5L index and VAS scores between different language groups

Adjusted and unadjusted results from the logistic regression analyses are presented in Table 4. After adjusting for the covariates, participants completing the Malay version were less likely to report ‘no problems’ in mobility than those completing the Chinese version (adjusted odds ratio: 0.435; 95 % CI: 0.221 to 0.855). Other between-language differences were not statistically significant.

Table 4 Odds ratios of reporting problems in EQ-5D-5L dimensions between different language groups

Discussion

Assessment of self-reported health outcomes in Singapore usually involves multiple ethnic groups, which necessitates the use of more than one survey language. Therefore, only cross-culturally equivalent instruments would provide valid measurement in such a setting. In this study, measurement equivalence was found between the Chinese and English versions and between the Malay and English versions for the EQ-5D-5L index scores. Measurement equivalence was also found between the English and Chinese versions and between the Malay and Chinese versions for the EQ-VAS scores. The findings are consistent with a previous study, which reported that the 90 % CIs of the differences in EQ-5D-5L index and EQ-VAS scores between the Chinese and English versions were −0.02 to 0.06 and −5.30 to 5.50, respectively [21]. However, it should be noted that the Chinese and English EQ-5D-5L questionnaires used in that study were not official versions, although the study participants were Singaporean residents.

Nevertheless, equivalence of the EQ-5D-5L index scores could not be determined between the Malay and Chinese versions. Participants using the Chinese version reported better overall health status. This is consistent with previous studies, which found that ethnic Chinese were more likely to endure health problems than other ethnicities [22, 23]. Indeed, our analyses of the participants’ response patterns to the EQ-5D-5L items suggested that participants using the Chinese version, who were all ethnic Chinese, were less likely to report mobility problems than those using the Malay version, who were mainly ethnic Malay and Indian. Equivalence of the EQ-VAS scores between the Malay and English versions could not be confirmed. Patients completing the Malay version had a higher adjusted mean EQ-VAS score than those completing the English version, indicating that the former would rate their overall health higher than the latter even if they were at the same level of health. One explanation could be that the EQ-VAS has been found to be more a measure of mental than of physical health [24]; studies conducted in Singapore consistently found that Malays reported better mental health than other ethnicities [23, 25].

It should be noted that a comprehensive assessment of the cross-cultural measurement equivalence of the EQ-5D-5L should also include responses to the five items. The individual EQ-5D-5L items have also been used as independent outcome measures, and their cross-cultural equivalence cannot be inferred from that of the index or VAS score. Assessing the equivalence of the items, however, would require a sample size larger than the one we used in the current study. Therefore, we did not perform the equivalence analysis for the EQ-5D-5L items; it would not be informative to conclude that the equivalence of the items between any two language versions cannot be determined. The cross-cultural equivalence of the EQ-5D-5L at the item level should be examined in the future when suitable datasets are available.

This study has a few limitations. First, the convenience sample used in this study may have led to selection bias, as patients who had poorer health may have been less willing to participate in the survey. Second, the EQ-5D-5L index score was calculated using an interim algorithm, mapped to the general UK population-based EQ-5D-3L value set, which may not fully reflect the measurement properties of the index score obtained from direct valuation of the EQ-5D-5L health states. Third, most clinical data (e.g., presence of T2DM-related complications and comorbidities) used in this study were patient-reported, which may not be accurate.

In conclusion, this study provides evidence for the measurement equivalence of the EQ-5D-5L instrument across languages in a multicultural, multiethnic Asian population with T2DM. Future studies are needed to investigate the cross-cultural measurement equivalence of the EQ-5D-5L items and whether this research finding can be generalized to other populations.