Instruments
Three instruments were used in this study: the standard EQ-5D3L version, an adapted Dutch 5L version developed in 1993 [20], and a set of five dimension-specific VAS scales. The 5L EQ-5D version used in this study was experimental, since at the time of the study the EuroQol Group had not advocated an official five-level version. We chose to test a five-level EQ-5D system, although four or six levels would also have been possible. Increasing the number of levels always increases discriminatory potential, at the cost of a more complex descriptive system (which might compromise the robustness of the value function). Five levels appear to be an optimal number of response options with respect to reliability [21, 22]. Furthermore, Preston et al. (2000) investigated the feasibility of 11 different rating formats (2 to 11 response categories and a 101-point scale) and found that feasibility peaked at five levels [23]. We chose to add two in-between levels to the existing 3L descriptive system (between levels 1 and 2 and between levels 2 and 3) because we considered this the most obvious way to refine the EQ-5D instrument. In any preference-based instrument, level descriptors are effectively required for valuation research in which generic profiles are to be valued. A small focus group was convened to determine the wording of the level descriptors. The level descriptors presented here were translated from Dutch. Levels 1, 3, and 5 of the 5L system used the same descriptors as levels 1, 2, and 3 of the standard EQ-5D3L. The grading terms used for the intermediate levels 2 and 4 of the 5L system were “a little” for level 2 (5L-2) in Anxiety/Depression and “mild problems” for the remaining dimensions; and “severe” for level 4 (5L-4) in Pain/Discomfort, “very” for Anxiety/Depression, and “many problems” for the remaining dimensions.
One further alteration was made to both the 3L and 5L systems: the most severe response category in Mobility was changed from “confined to bed” to “unable to walk about”, making it analogous to the extreme response categories of the other dimensions. Table 1 displays the exact wording of the descriptors in the 3L and 5L systems.
Table 1 Direct quantification of three- and five-level (3L, 5L) descriptors
To obtain quantitative values for each level descriptor of 3L and 5L, the VAS was used. We used five VAS scales, one for each EQ-5D dimension. Each VAS consisted of a horizontal hash-marked line without numbers, anchored by the extreme-level descriptors of that dimension. Respondents indicated their score by marking the line. For Pain/Discomfort and Anxiety/Depression, the original descriptor of the most severe category was “extreme”. Because the study was part of a larger process of choosing the definitive level descriptors for the official five-level version of the EQ-5D, we decided to use the entire continuum of disability (including “extreme”) and used “worst imaginable” as the upper VAS anchor for these two dimensions. This is analogous to the other three dimensions, whose VAS ranged from “no problems” to “unable to”.
Study design
Each participant attended one of two panel sessions and completed a follow-up postal survey 2 weeks later. A convenience sample of 82 laypeople from an existing general population panel (N = 560) participated. All participants were familiar with the vignette presentation format used in the indirect method.
All participants completed both the direct and the indirect quantification task. For the direct method, all 3L answers were obtained during the panel sessions and all 5L answers as part of the postal survey to avoid memory effects. For the indirect method, participants scored ten health states in the panel sessions (acute pharyngitis, exacerbation of eczema, hip fracture, cerebrovascular accident/stroke with moderate impairments, moderate gastritis, low spinal cord lesion, mild depression, back and neck pain, severe dementia, and acute multiple injury) and the remaining five in the survey (otitis externa, severe stable brain injury, irritable bowel syndrome, acute large burn, and posttraumatic stress disorder), because we expected that more than ten health states within one session could lead to concentration problems. The two sets of health states were balanced according to severity and duration. Following this design, the indirect method provided 225 responses for each respondent: 15 diseases × 5 dimensions × 3 response scales.
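The design arithmetic above (15 diseases × 5 dimensions × 3 response scales = 225) can be made explicit. The sketch below enumerates every response one respondent was expected to give in the indirect task, using the panel/survey split of vignettes listed in the text. The tuple layout and the function name `response_slots` are hypothetical illustrations, not the study's actual data format.

```python
from itertools import product

# Vignettes as listed in the text, split by administration mode.
PANEL_VIGNETTES = [
    "acute pharyngitis", "exacerbation of eczema", "hip fracture",
    "cerebrovascular accident/stroke with moderate impairments",
    "moderate gastritis", "low spinal cord lesion", "mild depression",
    "back and neck pain", "severe dementia", "acute multiple injury",
]
SURVEY_VIGNETTES = [
    "otitis externa", "severe stable brain injury",
    "irritable bowel syndrome", "acute large burn",
    "posttraumatic stress disorder",
]
DIMENSIONS = ["Mobility", "Self-Care", "Usual Activities",
              "Pain/Discomfort", "Anxiety/Depression"]
SCALES = ["5L", "3L", "VAS"]  # 5L was scored before 3L; VAS on a separate form


def response_slots():
    """Return one (mode, vignette, dimension, scale) tuple per expected response."""
    slots = []
    for mode, vignettes in (("panel", PANEL_VIGNETTES),
                            ("survey", SURVEY_VIGNETTES)):
        for vignette, dim, scale in product(vignettes, DIMENSIONS, SCALES):
            slots.append((mode, vignette, dim, scale))
    return slots


slots = response_slots()
print(len(slots))  # 225 (150 in the panel sessions, 75 in the postal survey)
```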
Direct quantification of level descriptors
In the direct method, respondents were asked to place the 3L and the 5L descriptors on the VAS for each dimension separately. Because the extreme levels served as VAS anchors, only the 3L midcategory (3L-2) descriptor needed to be scored, except for Pain/Discomfort and Anxiety/Depression, for which the extreme level (3L-3) also had to be scored. Similarly, the 5L midcategories (5L-2, 5L-3, and 5L-4) were scored for each dimension, plus 5L-5 for Pain/Discomfort and Anxiety/Depression.
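The scoring rule above can be expressed as a small function that returns, per instrument and dimension, which levels respondents had to place on the VAS. This is a sketch under the assumption that level 1 is always the lower anchor; the names `items_to_score`, `FIRST_THREE`, and `LAST_TWO` are hypothetical. The totals it prints are derived from the rule, not reported in the text.

```python
FIRST_THREE = ["Mobility", "Self-Care", "Usual Activities"]
LAST_TWO = ["Pain/Discomfort", "Anxiety/Depression"]


def items_to_score(n_levels, dimension):
    """Levels (1-based) that had to be placed on the VAS in the direct method.

    Level 1 is always the lower anchor. The top level is the upper anchor only
    for the first three dimensions; for the last two, the upper anchor was
    "worst imaginable", so the "extreme" top level had to be scored as well.
    """
    if dimension in FIRST_THREE:
        return list(range(2, n_levels))      # midcategories only
    return list(range(2, n_levels + 1))      # midcategories plus extreme level


for n in (3, 5):
    total = sum(len(items_to_score(n, d)) for d in FIRST_THREE + LAST_TWO)
    print(f"{n}L: {total} descriptors to score")
# 3L: 7 descriptors to score
# 5L: 17 descriptors to score
```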
Indirect quantification of level descriptors
As an alternative to the direct method, we developed an indirect method that we believe lies closer to the actual use of the EQ-5D instrument, as it uses a (hypothetical) health state as a calibrator or medium to derive a VAS score. In contrast to the direct method, the object of measurement in the indirect method is not a 3L or 5L descriptor but a complete health scenario (vignette). Each vignette was scored independently on three response scales (the 3L descriptors, the 5L descriptors, and a VAS), separately for each dimension. Consequently, 3L and 5L scores could be compared head to head indirectly, calibrated via the common VAS score.
Figure 1 shows one of the vignettes. Each vignette was designed to present a disease as close to clinical reality as possible and therefore also included information on disease duration. All 15 diseases were presented on a standardized sheet (vignette) that contained (1) a disease label with a naturalistic description of the disease; (2) the course of the disease over a 1-year period, shown on a calendar (grey shading represents the duration of the disease); (3) the location of the disease with, if relevant, a visual representation; and (4) the EQ-5D dimensions, with the levels left unspecified, as respondents were invited to select the appropriate EQ-5D level for each dimension according to their own view. Respondents were asked to read each vignette carefully and, using three response scales (the standard 3L response scale, the new 5L scale, and a VAS similar to the one used in the direct method), to select the level of each dimension of the EQ-5D descriptive system that in their view best described the presented health state.
The 5L and 3L response scales were presented on the left and the right side of one page (per dimension), respectively. The respondents were first invited to score the 5L descriptors for all dimensions and all vignettes while covering the right side of the page that showed the 3L descriptors. Next, they were instructed to return to the first vignette, asked to cover the left side with the 5L scores, and provide the 3L response for all vignettes. Pilot testing revealed that when respondents scored 3L first, there was a tendency to avoid the in-between levels 2 and 4 of 5L, and for this reason, all respondents were asked to score 5L first. Adequate instruction was critical, stressing that 3L and 5L were two independent ways of scoring (in the postal survey, these instructions were repeated in writing). Subsequently, VAS scores were obtained on a separate form without respondents having access to the 3L and 5L scores. The demanding task of first providing 5L classifications on all five dimensions of all 15 vignettes minimized possible memory effects when the participants were instructed to return to the first vignette to score the 3L classifications while covering the 5L responses.
Analysis
Results of the direct and indirect methods are presented with conventional descriptive statistics. For the indirect method, 3L-VAS pairs and 5L-VAS pairs were formed for each respondent per vignette, and level means were then calculated over all vignettes and all respondents combined. For each respondent, a combined 3L, 5L, and VAS scoring was removed if any one of the three scores was missing, which equalized the number of VAS observations underlying the 3L and 5L means.
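The pairing and listwise deletion described above can be sketched as follows. The record layout (one dict per respondent × vignette × dimension with keys `dimension`, `l3`, `l5`, `vas`) and the function name `level_means` are hypothetical; the point is that a triplet with any missing element is dropped as a whole, so 3L and 5L level means rest on the same VAS observations.

```python
from collections import defaultdict
from statistics import mean


def level_means(records):
    """Pooled mean VAS per (dimension, instrument, level), indirect method.

    Each record is one respondent x vignette x dimension scoring with keys
    'dimension', 'l3', 'l5', 'vas'; None marks a missing answer.
    (Hypothetical record layout, for illustration only.)
    """
    pooled = defaultdict(list)
    for r in records:
        # Listwise deletion: drop the whole 3L/5L/VAS triplet if any part is
        # missing, so 3L and 5L means rest on the same VAS observations.
        if None in (r["l3"], r["l5"], r["vas"]):
            continue
        pooled[(r["dimension"], "3L", r["l3"])].append(r["vas"])
        pooled[(r["dimension"], "5L", r["l5"])].append(r["vas"])
    # Pool over all vignettes and respondents, then average per level.
    return {key: mean(vals) for key, vals in pooled.items()}


# Made-up scorings for one vignette and dimension from three respondents.
records = [
    {"dimension": "Mobility", "l3": 2, "l5": 3, "vas": 60},
    {"dimension": "Mobility", "l3": 2, "l5": 4, "vas": 80},
    {"dimension": "Mobility", "l3": 2, "l5": None, "vas": 70},  # dropped
]
means = level_means(records)
print(means[("Mobility", "3L", 2)])  # 70
```

Because the third record is excluded entirely, its VAS score of 70 counts toward neither the 3L nor the 5L means.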
Characteristics
For both the direct and indirect methods, the 3L–5L extension of EQ-5D was investigated in terms of three characteristics. First, equidistance addresses the degree to which 3L and 5L level descriptors are distributed evenly over the VAS continuum, either without or with transformation. Equidistance is determined for each dimension and each instrument (3L and 5L) separately. Untransformed equidistance implies that level descriptors are distributed according to VAS ratings of 0–50–100 for 3L and 0–25–50–75–100 for 5L. There is evidence that the precision of the VAS might be illusory, as respondents mentally divide the VAS continuum into a smaller number of segments, at most nine or ten [23, 24]. Therefore, we defined a deviation of 5 VAS points as the maximum acceptable deviation (that is, a tolerance band 10 VAS points wide, because the deviation can be in either direction). A deviation of 5 VAS points has also been used before [16]. If untransformed equidistance is rejected, equidistance under a power transformation, $y = (ax)^b$, is considered, where $y$ is the VAS rating and $x$ is the level number minus 1. For example, the power relation $y = (5.38x)^{1.5}$ for 5L would result in a VAS rating distribution of 0–12–35–65–100. Note that transformation is only possible for 5L, as 3L has only one observation apart from the anchors.
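The equidistance check can be sketched in a few lines. The power form $y = (ax)^b$, the 5-point tolerance, and the worked example come from the text. Two parts are our own assumptions, since the text does not say how the transformation was fitted: tying $a$ to the anchors so that the most severe level lands on 100 (which gives $a = 100^{1/b}/4 \approx 5.39$ for $b = 1.5$, close to the 5.38 in the example), and estimating $b$ by a least-squares grid search. The function names are hypothetical.

```python
def power_positions(a, b, n_levels=5):
    """VAS positions under y = (a*x)**b, where x = level - 1 (0 .. n_levels-1)."""
    return [(a * x) ** b for x in range(n_levels)]


def anchored_a(b, n_levels=5, top=100.0):
    """Assumption: scale a so that the most severe level lands on the upper anchor."""
    return top ** (1.0 / b) / (n_levels - 1)


def within_tolerance(observed, expected, tol=5.0):
    """True if every level mean lies within +/- tol VAS points of its expected position."""
    return all(abs(o - e) <= tol for o, e in zip(observed, expected))


def fit_exponent(observed):
    """Assumption: grid search over b = 0.50 .. 3.00 minimizing squared error."""
    n = len(observed)

    def sse(b):
        pred = power_positions(anchored_a(b, n), b, n)
        return sum((o - p) ** 2 for o, p in zip(observed, pred))

    return min((0.5 + 0.01 * i for i in range(251)), key=sse)


# The worked example from the text: a = 5.38, b = 1.5.
print([round(y) for y in power_positions(5.38, 1.5)])  # [0, 12, 35, 65, 100]

# Treat those positions as hypothetical observed 5L level means.
observed = [0, 12, 35, 65, 100]
print(within_tolerance(observed, [0, 25, 50, 75, 100]))  # False: 12 vs 25
b = fit_exponent(observed)
print(within_tolerance(observed, power_positions(anchored_a(b), b)))  # True
```

Under these assumptions, the hypothetical means fail untransformed equidistance but satisfy transformed equidistance with $b$ close to 1.5.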
Part of the evaluation of equidistance is an analysis of the position of the extreme levels in the indirect method: are the VAS ratings for the extreme level descriptors close to the expected anchor values? Ideally, 3L-1 and 5L-1 scores would equal 0, and 3L-3 and 5L-5 scores would equal 100, except for Pain/Discomfort and Anxiety/Depression, in which the 3L and 5L extreme level descriptors were not identical to the VAS anchors.
Second, isoformity is the degree to which the positions of the 3L-2 and 5L-3 level descriptors (and also of 3L-3 and 5L-5 for Pain/Discomfort and Anxiety/Depression) are similar. Isoformity thus compares the 3L and 5L descriptive systems directly, for each dimension separately. For the indirect method, all 3L level means, including 3L-1 and 3L-3, can be compared with their 5L counterparts. Analysis of isoformity is based on paired 3L–5L response means for each dimension separately. For the direct method, isoformity was tested with a paired t test between the 3L and 5L scorings. For the indirect method, a deviation of 5 VAS points was defined as the maximum acceptable deviation.
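Both isoformity criteria can be sketched as follows. For the direct method, the paired t statistic is computed by hand to keep the example dependency-free; `scipy.stats.ttest_rel(x, y)` returns the same statistic together with a p-value. For the indirect method, each 3L level is matched to the 5L level with identical wording (3L-1 with 5L-1, 3L-2 with 5L-3, 3L-3 with 5L-5) and checked against the 5-point tolerance. All scores below are made up, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# 3L level -> 5L level with the identical descriptor.
CORRESPONDING = {1: 1, 2: 3, 3: 5}


def isoform_indirect(means_3l, means_5l, tol=5.0):
    """Per 3L level: is its VAS mean within tol points of the matching 5L mean?"""
    return {l3: abs(means_3l[l3] - means_5l[l5]) <= tol
            for l3, l5 in CORRESPONDING.items()
            if l3 in means_3l and l5 in means_5l}


# Direct method: hypothetical VAS scores of five respondents for 3L-2 and 5L-3.
t, df = paired_t([50, 55, 60, 45, 52], [48, 50, 62, 40, 50])
print(round(t, 2), df)  # 1.86 4

# Indirect method: hypothetical pooled level means for one dimension.
print(isoform_indirect({1: 3.0, 2: 48.0, 3: 90.0},
                       {1: 1.5, 3: 55.0, 5: 93.0}))  # {1: True, 2: False, 3: True}
```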
Finally, consistency between dimensions is the degree to which the positions of identical level descriptors agree across dimensions. Consistency between dimensions was tested for each instrument (3L, 5L) separately. The first three dimensions (Mobility, Self-Care, and Usual Activities) were analyzed separately from the last two (Pain/Discomfort and Anxiety/Depression), because in Dutch the first three share identical level descriptors, e.g., “some problems” for Mobility, Self-Care, and Usual Activities. For the direct method, analysis of variance (ANOVA) was used for each identical level descriptor, for the first three dimensions combined (one comparison for 3L and three for 5L) and for Pain/Discomfort and Anxiety/Depression combined (two comparisons for 3L and four for 5L), resulting in a total of ten comparisons. For the indirect method, consistency was tested with a generalizability study (G-study), which allows multiple sources of error variance to be separated [25]. Generalizability coefficients (G-coefficients) can be constructed as functions of the estimated variance components, expressing consistency on a 0–1 scale, with 1 expressing perfect consistency [26, 27]. We used a variance components analysis based on the restricted maximum likelihood method and identified four possible sources of variance: label, vignette, dimension, and respondent. Four separate G-studies were conducted: one on the first three dimensions and one on the remaining two dimensions, for each instrument (3L, 5L). A G-coefficient expressing consistency between dimensions was calculated on the basis of these variance components (“Appendix A”).
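For the direct method, the ten ANOVA comparisons can be enumerated and each one computed as a one-way F statistic across dimensions. The sketch below uses a plain between-groups one-way ANOVA, which is our reading of the text; it does not model the fact that the same respondents scored every dimension, and it does not cover the G-study used for the indirect method. `scipy.stats.f_oneway` returns the same F together with a p-value. The scores are made up, and `SCORED_LEVELS` and `one_way_anova_f` are hypothetical names.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic (between-groups / within-groups mean squares)."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Levels placed on the VAS in the direct method (anchors excluded), per group of
# dimensions; one ANOVA per (instrument, dimension group, level).
SCORED_LEVELS = {
    ("3L", "first three"): [2],
    ("5L", "first three"): [2, 3, 4],
    ("3L", "last two"): [2, 3],
    ("5L", "last two"): [2, 3, 4, 5],
}
print(sum(len(v) for v in SCORED_LEVELS.values()))  # 10 comparisons

# Hypothetical 3L-2 ("some problems") scores: Mobility, Self-Care, Usual Activities.
f, df1, df2 = one_way_anova_f([40, 45, 50], [42, 48, 51], [55, 60, 58])
print(round(f, 2), df1, df2)  # 7.97 2 6
```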
We regarded equidistance, transformed or untransformed, as a desirable characteristic of the new 5L system, as opposed to the absence of any systematic relation between the quantitative positions of the level descriptors. Consistency of identical level descriptors across dimensions was also regarded as desirable, because it indicates that respondents conceptualize the grading terms consistently across different dimensions of health. Achieving consistency does not imply that utility values would also be expected to be consistent across dimensions, because utility values express an entire EQ-5D profile, whereas we investigated VAS scores within each dimension separately. Furthermore, a choice-based method presumably leads to different results than the dimension-specific VAS scales we used. We investigated isoformity to determine whether the new 5L system was a refinement of the 3L system or a new system; whether or not isoformity was achieved says nothing about the 5L system in itself.