Data
This study aimed to assess the measurement properties of the 3L and 5L in eight broad patient groups. A student cohort was added to investigate how both instruments perform in a healthy population sample. Respondents completed both the 3L and 5L in six countries: Denmark, England, Italy, the Netherlands, Poland, and Scotland. Data collection in Denmark was conducted through the endocrinology, rheumatology, and orthopedic departments of a regional university hospital. Data collection in England was organized through a specialist patient recruitment agency and targeted patients with prespecified conditions. In Italy, the cohort of liver disease patients completed the questionnaires locally at two hospitals (Bergamo and Naples). Data collection in the Netherlands was conducted at a specialist center for personality disorders and, for the kidney dialysis patients, at a local hospital. In Poland, the student cohort was recruited at the Medical University of Warsaw, and the stroke cohort was recruited through the Neurological Clinic in Warsaw. Data collection in Scotland took place through a specialist patient recruitment agency, with patients completing the questionnaires at primary care centers. Paper-and-pencil versions of the questionnaires were used in all countries except England, where data collection took place online. Data collection took place between August 2009 and September 2010. The 5L was administered first, followed by the EQ-5D visual analogue scale (EQ-VAS) and a number of demographic questions, then the 3L, and finally a set of five dimension-specific rating scales. All respondents completed the 5L first because a previous study showed a tendency to avoid the intermediate 5L levels 2 and 4 when the 3L was completed first [20]. Data collection was undertaken with informed consent and according to the ethical guidelines for health research in each country.
Measures
The 3L version of the EQ-5D is the original version, which has been used in many clinical trials and methodological studies published in the peer-reviewed literature [1]. It is a brief, self-reported, generic measure of current health that consists of five dimensions (Mobility, Self-Care, Usual Activities, Pain/Discomfort, and Anxiety/Depression), each with three levels of functioning (no problems, some problems, and unable to/extreme problems). This health state classification describes 243 unique health states, which are often reported as five-digit vectors ranging from 11111 (full health) to 33333 (worst health). Societal value sets have been derived from population-based valuation studies around the world; when applied to the health state vectors, they yield preference-based index values that typically range from below 0 (states worse than dead) to 1 (full health), with dead anchored at 0. In addition, the EQ-5D includes the EQ-VAS, on which respondents rate their own health “today” on a scale from 0 (worst imaginable health) to 100 (best imaginable health).
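The count of 243 states follows directly from the structure of the classification: five dimensions with three levels each yield 3^5 = 243 combinations. The short Python sketch below (the helper name `health_states` is hypothetical) enumerates the vectors with `itertools.product`; the same logic gives 5^5 = 3125 states for a five-level version.

```python
from itertools import product


def health_states(levels: int, dimensions: int = 5) -> list[str]:
    """Enumerate every EQ-5D-style health state vector, e.g. '11111'.

    Hypothetical helper: each dimension independently takes a level in
    1..levels, so the states are the Cartesian product of the level sets.
    """
    return ["".join(str(level) for level in state)
            for state in product(range(1, levels + 1), repeat=dimensions)]


states_3l = health_states(3)
print(len(states_3l), states_3l[0], states_3l[-1])  # 243 11111 33333
states_5l = health_states(5)
print(len(states_5l), states_5l[0], states_5l[-1])  # 3125 11111 55555
```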
In developing the 5L, the five-dimensional structure of the 3L was retained, but the descriptors within each dimension were adapted to a five-level system based on qualitative and quantitative studies conducted by the EuroQol Group [19]. The labels for the 5L followed the format no problems, slight problems, moderate problems, severe problems, and unable to/extreme problems for all dimensions. For Mobility, the description “confined to bed” was changed to “unable to walk about.” Additionally, for Usual Activities, the word “performing” was changed to “doing” (in the English version for the UK). The official EQ-5D-3L and EQ-5D-5L language versions for each country were used.
For the purposes of the current study, respondents also rated their own health “today” on five dimension-specific rating scales, one for each of the EQ-5D dimensions. Each scale consisted of a horizontal hash-marked line (from 0 to 100) with corresponding numbers (0, 10, 20, …, 100). The descriptive anchors at each end of the scales were the same anchors as used in the 3L and 5L, that is, no problems and unable to/extreme problems.
Convergent validity was assessed by comparing the 3L and 5L dimensions with the WHO-5 Well-Being questionnaire. The WHO-5 captures well-being and was developed from the World Health Organization-Ten Well-Being Index [24, 25]. It was conceptualized as a unidimensional measure containing five positively worded items: “I have felt cheerful and in good spirits”; “I have felt calm and relaxed”; “I have felt active and vigorous”; “I woke up feeling fresh and rested”; and “My daily life has been filled with things that interest me,” all scored on a six-point Likert scale ranging from 0 (not present) to 5 (constantly present). The five item scores can be summed into a sum-score (range 0–25) as a summary measure.
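As a minimal sketch of the WHO-5 sum-score, the five item scores (0 to 5 each) are simply added, giving a raw total between 0 and 25. The function name `who5_sum_score` is hypothetical, and the sketch assumes complete responses; the text does not say how missing items were handled, so incomplete or out-of-range input is rejected here.

```python
def who5_sum_score(items: list[int]) -> int:
    """Sum the five WHO-5 item scores (each 0-5) into a raw sum-score (0-25).

    Hypothetical helper; assumes complete responses and rejects anything else.
    """
    if len(items) != 5:
        raise ValueError("WHO-5 has exactly five items")
    if any(not 0 <= score <= 5 for score in items):
        raise ValueError("each WHO-5 item is scored from 0 to 5")
    return sum(items)


print(who5_sum_score([3, 4, 2, 3, 5]))  # 17
```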
Analysis
Feasibility was assessed by calculating the number of missing values for the 3L and 5L. The ceiling of the EQ-5D was defined as the proportion of respondents reporting no problems on all five dimensions, that is, the proportion of respondents scoring 11111. Under the assumption that most patients should have at least some problems on at least one EQ-5D dimension, we expected the ceiling to be lower for the 5L than for the 3L. The absolute reduction in ceiling from 3L to 5L was calculated; because the ceiling was very small in some patient groups, the relative (percentage) reduction was also calculated as (ceiling3L − ceiling5L)/ceiling3L.
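The two ceiling measures can be illustrated with a small Python sketch. It assumes health states are stored as five-character strings such as "11111"; the helper names `ceiling` and `ceiling_reduction` and the response data are hypothetical. The relative reduction is left undefined (NaN) when the 3L ceiling is zero, since the formula divides by it.

```python
def ceiling(states: list[str]) -> float:
    """Proportion of respondents reporting '11111' (no problems on all dimensions)."""
    return sum(state == "11111" for state in states) / len(states)


def ceiling_reduction(ceiling_3l: float, ceiling_5l: float) -> tuple[float, float]:
    """Return (absolute reduction, relative reduction) from 3L to 5L."""
    absolute = ceiling_3l - ceiling_5l
    # Relative reduction: (ceiling3L - ceiling5L) / ceiling3L; undefined if ceiling3L is 0.
    relative = absolute / ceiling_3l if ceiling_3l > 0 else float("nan")
    return absolute, relative


# Hypothetical responses from the same five respondents on both versions.
states_3l = ["11111", "11111", "21111", "11111", "22121"]
states_5l = ["11111", "21111", "31111", "11111", "32131"]
c3, c5 = ceiling(states_3l), ceiling(states_5l)
print(c3, c5)  # 0.6 0.4
absolute, relative = ceiling_reduction(c3, c5)
print(round(absolute, 3), round(relative, 3))  # 0.2 0.333
```

Reporting both values matters when the 3L ceiling is small: a drop from 0.02 to 0.01 is only one percentage point in absolute terms but a 50% relative reduction.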
Redistribution properties of the 3L to 5L extension
Redistribution properties and (in)consistency of responses were evaluated using criteria established in previous studies [20, 21]. An inconsistent response was defined as a 3L response followed by a 5L response that was at least two levels away. The redistribution properties of the consistent response pairs were described by the proportion of each 3L–5L response pair within each 3L response level (3L-1, 3L-2, and 3L-3) and by the corresponding dimension-specific rating scale values. For valid redistribution, dimension-specific rating scale values should increase from the “healthiest” response pair (3L-1 paired with 5L-1) to the most extreme response pair (3L-3 paired with 5L-5).
Discriminatory power
The Shannon index and the Shannon Evenness index were used to assess discriminatory power. Originating from the field of information theory, the Shannon index has been widely used in ecological studies as a measure of biodiversity and in molecular biology as a measure of the information content of DNA molecules [26–28]. Previous research showed Shannon’s methodology to be useful in assessing discriminatory power in health state classifications [20, 21, 23, 29, 30]. In the present study, we estimated discriminatory power for each dimension separately. The Shannon index is defined as:
$$ H^{\prime} = - \sum_{i = 1}^{L} p_{i} \log_{2} p_{i} $$
where H′ represents the absolute amount of informativity captured, L is the number of levels, and $p_{i} = n_{i}/N$ is the proportion of observations in the ith level (i = 1, …, L), where $n_{i}$ is the observed number of scores (responses) in level i and N is the total sample size [31]. The higher the index H′, the more information is captured by the system. In the case of a uniform (rectangular) distribution (i.e., $p_{i} = p^{*}$ for all i), the optimal amount of information is captured and H′ reaches its maximum, H′max, which equals $\log_{2} L$. If the number of levels (L) is increased, H′max increases accordingly, but H′ will only increase if the newly added levels are actually used. The Shannon Evenness index (J′) exclusively reflects the evenness (rectangularity) of a distribution, regardless of the number of levels. It is defined as:
$$ J^{\prime} = \frac{H^{\prime}}{H^{\prime}_{\max}} $$
The Shannon indices are calculated by dimension and also for each instrument as a whole, treating each health state vector as a unique category.
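The definitions of H′, H′max, and J′ translate directly into a few lines of Python. This is a minimal sketch (the function name `shannon_indices` and the data are hypothetical): levels that nobody used contribute nothing to H′ but still count toward H′max = log2 L, which is exactly why J′ falls when added levels go unused. For the whole-instrument indices, each distinct health state vector is one category, so L becomes 3^5 or 5^5.

```python
from collections import Counter
from math import log2


def shannon_indices(responses, n_levels):
    """Return (H', H'max, J') for a sequence of categorical responses.

    n_levels is the number of possible categories L: 3 or 5 for a single
    dimension, or 3**5 / 5**5 when whole health state vectors are the categories.
    """
    n = len(responses)
    # H' = -sum(p_i * log2(p_i)); unused levels have p_i = 0 and add nothing.
    h = sum(-(c / n) * log2(c / n) for c in Counter(responses).values())
    h_max = log2(n_levels)  # reached only by a uniform (rectangular) distribution
    return h, h_max, h / h_max


# One dimension: proportions 0.5, 0.25, 0.25 over three levels.
h, h_max, j = shannon_indices([1, 1, 2, 3], n_levels=3)
print(round(h, 3), round(h_max, 3), round(j, 3))  # 1.5 1.585 0.946

# Whole instrument: each distinct health state vector is one category.
states = ["11111", "11111", "21111", "22121"]
print(shannon_indices(states, n_levels=3**5))
```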
The Shannon indices are purely descriptive measures of the discriminatory power of a classification system and have no relation to the content, meaning, or clinical relevance of what the instrument aims to measure. Both the Shannon index and the Shannon Evenness index are needed for a useful interpretation of the discriminatory power of a measurement scale. Consider any dimension in its 3L and 5L versions: the 5L clearly has more discriminatory potential. However, if the extra levels are not used, the H′ value will be the same for both versions. The Shannon Evenness index J′, which will then be lower for the 5L, is therefore needed to express the unused potential of the five-level dimension. Conversely, when both the 3L and 5L show rectangular distributions, the J′ value will be the same (J′ = 1), and H′ is needed to express the better discriminatory performance of the 5L. We expected H′ to increase and J′ to decrease only marginally, if at all.
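The two scenarios in this argument can be checked numerically. In the sketch below (the helper `h_and_j` and all proportions are hypothetical), the 5L adds levels 2 and 4 but nobody uses them, so H′ is unchanged while J′ drops; when both versions are rectangular, J′ is 1 for each while H′ is higher for five levels.

```python
from math import log2


def h_and_j(proportions):
    """Shannon index H' and evenness J' from level proportions (zeros allowed)."""
    h = sum(-p * log2(p) for p in proportions if p > 0)
    return h, h / log2(len(proportions))


# Scenario 1: the extra 5L levels (2 and 4) are never used.
h3, j3 = h_and_j([0.5, 0.3, 0.2])
h5, j5 = h_and_j([0.5, 0.0, 0.3, 0.0, 0.2])
print(round(h3, 3), round(j3, 3))  # 1.485 0.937
print(round(h5, 3), round(j5, 3))  # 1.485 0.64

# Scenario 2: both versions have rectangular distributions.
h3u, j3u = h_and_j([1 / 3] * 3)
h5u, j5u = h_and_j([1 / 5] * 5)
print(round(h3u, 3), round(j3u, 3), round(h5u, 3), round(j5u, 3))  # 1.585 1.0 2.322 1.0
```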
Convergent validity
Convergent validity between the 3L and 5L dimensions and the WHO-5 items, as well as the WHO-5 sum-score, was assessed using Spearman rank order correlation coefficients (Spearman’s rho). We hypothesized that correlations with the WHO-5 items would be highest for the Anxiety/Depression dimension. Convergence of the 3L and 5L with the dimension-specific rating scales was also assessed.
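Because EQ-5D levels and WHO-5 items are ordinal with many ties, Spearman's rho is computed on average (mid) ranks. The dependency-free Python sketch below (function names and data are hypothetical) computes rho as the Pearson correlation of those ranks; `scipy.stats.spearmanr` provides an established implementation that also returns a p-value. A negative rho is the expected direction here, since higher EQ-5D levels mean more problems while higher WHO-5 scores mean better well-being.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: 5L Anxiety/Depression levels (1 = no problems) and
# scores on the WHO-5 item "I have felt cheerful and in good spirits" (0-5).
anx_dep = [1, 1, 2, 3, 4, 5]
cheerful = [5, 4, 4, 2, 1, 0]
print(round(spearman_rho(anx_dep, cheerful), 3))  # -0.956
```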
Known-groups validity
Known-groups validity was tested for all 3L and 5L dimensions with respect to age, education, and smoking status. Associations with age group (18–24, 25–34, 35–44, 45–54, 55–64, 65–74, and 75+) and education were tested using Spearman rank order correlation coefficients, and differences by smoking status (never smoked, ex-smoker, and current smoker) were tested with the Kruskal–Wallis H statistic. Education was included in three substudies (England, Denmark, and Scotland) and was recoded into three levels (1 = primary/lower secondary; 2 = secondary/vocational; 3 = higher/college). For known-groups validity, we expected lower reported health status with increasing age, with lower education, and among respondents who smoke or have smoked. To take possible clustering effects into account, we applied a set of statistical techniques developed for nonparametric analysis of clustered data, with country as the cluster variable [32, 33].
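For the smoking-status comparison, the standard Kruskal–Wallis H statistic with a correction for ties can be sketched in plain Python as below; `scipy.stats.kruskal` provides an established implementation. This sketch does not reproduce the clustered-data methods cited above: it treats every observation as independent and ignores country. Function names and data are hypothetical, and the correction divides by zero if all values are identical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H for independent groups, corrected for ties.

    Ignores clustering: each observation is treated as independent.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks of this group's members
        sum_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * sum_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied value groups.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical 5L Pain/Discomfort levels by smoking status.
never = [1, 1, 2, 1]
ex = [2, 2, 3, 1]
current = [3, 2, 3, 4]
print(round(kruskal_wallis_h(never, ex, current), 3))  # 6.308
```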
The study data were analyzed centrally using PASW version 18.0.0 and R version 2.15.2.