Disability weights represent the relative severity of disease stages to be incorporated in summary measures of population health. The level of agreement on disability weights in Western European countries was investigated with different valuation methods.
Disability weights for fifteen disease stages were elicited empirically in panels of health care professionals or non-health care professionals with an academic background following a strictly standardised procedure. Three valuation methods were used: a visual analogue scale (VAS); the time trade-off technique (TTO); and the person trade-off technique (PTO). Agreement among England, France, the Netherlands, Spain, and Sweden on the three disability weight sets was analysed by means of an intraclass correlation coefficient (ICC) in the framework of generalisability theory. Agreement between the two types of panel was similarly assessed.
A total of 232 participants were included. Similar rankings of disease stages across countries were found with all valuation methods. The ICC of country agreement on disability weights ranged from 0.56 [95% CI, 0.52–0.62] with PTO to 0.72 [0.70–0.74] with VAS and 0.72 [0.69–0.75] with TTO. The ICC of agreement between health care professionals and non-health care professionals ranged from 0.64 [0.58–0.68] with PTO to 0.73 [0.71–0.75] with VAS and 0.74 [0.72–0.77] with TTO.
Overall, the study supports a reasonably high level of agreement on disability weights in Western European countries with VAS and TTO methods, which focus on individual preferences, but a lower level of agreement with the PTO method, which focuses more on societal values in resource allocation.
Summary measures of population health combine information on mortality and non-fatal health outcomes in order to represent the health of a particular population as a single measure. They are used traditionally for comparative judgements of average levels of population health between populations and over time. Summary measures of population health were recently used with an explicit link to health resource allocation, e.g. disability-adjusted life expectancies (DALE) computed among other measures for the evaluation of the performance of health systems in the World Health Report 2000, or disability-adjusted life years (DALY) for burden of disease estimates and cost-effectiveness analyses [3–5].
All summary measures of population health are built on three critical inputs: mortality by age, sex and condition; epidemiological data on non-fatal health outcomes by age, sex and condition; and valuations of health states (disability weights) that assess the relative severity of a year lived in a particular condition. Whereas mortality and epidemiological data may be seen as objective measures, even if scarcity and heterogeneity of data may compromise their accuracy, valuations of health states are undoubtedly subjective measures.
The lack of a gold standard for health state valuation has led to the development of various valuation methods. The 1996 Global Burden of Disease study (GBD) represented a milestone in the development of summary measures of population health, as it established a single set of several hundred disability weights relating to 107 conditions using the same valuation method [7, 8]. The choice of the specific values of an international panel of about ten health experts was supported by high correlations of their disability weights for 22 hypothetical indicator health states with those of eight panels from National Burden of Disease teams or World Health Organization (WHO) workshops on burden of disease methods. Since then the assumption of cross-national agreement on disability weights has been further supported by studies using similar valuation protocols [10, 11], whereas studies of agreement between different types of informants in health have shown contradictory results [12–15].
One of the primary objectives of the European Disability Weights (EDW) project was to assess the cross-national agreement on valuations of health states when elicited using different methods. In the EDW study, a visual analogue scale (VAS) measured the severity of health states relative to the anchoring endpoints of the scale (worst and best imaginable health states). The time trade-off technique (TTO) measured the extent to which respondents would be willing to give up an amount of life time to avoid a hypothetical condition and be in full health, and the person trade-off technique (PTO) directly elicited the health decision maker's trade-off between severity of illness, the size of the health gain and the number of people helped. Hypothetical health states were valued in panels of two possible informants in health, i.e. health care professionals and the general public with an academic background. We report here on the agreement of disability weights from five Western European countries (England, France, the Netherlands, Spain, and Sweden) using VAS, TTO and PTO.
The valuation of health states in the participating Western European countries followed a standardised protocol with back and forth translation from English for all valuation materials. Key points of the valuation procedure were fixed to limit construct-irrelevant variance:
1. The scenarios to be valued were presented consistently in the form of a disease label, a brief clinical description of the disease stage, and a generic health state profile (EQ-5D extended with a cognitive dimension) [17–19];
2. Three valuation methods were used: visual analogue scale (VAS), time trade-off (TTO), and person trade-off (PTO);
3. A structured protocol which allowed for discussion and deliberation was followed in all panel sessions.
Panel sessions in each country were led by a trained facilitator from that country. Facilitators were trained by the Dutch group, who had previous experience in valuation in panel sessions.
Two sources of variance in the valuation of health states were retained in our interrater reliability study of each valuation method: 1) the country; 2) the type of panel according to medical background of participants.
At least two panels of health care professionals (almost all medical doctors) and two panels of 'non-health care professionals,' each consisting of around ten participants, were planned for each country. Incentives to participate were given to health care professionals (medical doctors were paid in England and Spain, and received continuing medical education credits in the Netherlands) whereas the 'non-health care professionals' were recruited generally on local academic webs (they were also paid a small amount in England). Panels took place in five European countries: England, France, the Netherlands, Spain, and Sweden, between March and September 2000.
Disease stages selection and description
A list of diseases accounting for almost 80% of years of life lost due to premature mortality and 80% of years lived with disability in the Established Market Economies Region (including all Western European countries) was extracted from the Global Burden of Disease study. Thirteen diseases were then selected to cover:
1. The main chapters from the ninth revision of the International Classification of Diseases,
2. Different dimensions of disability,
3. Very mild to very severe health states.
External health care professionals and public health experts participated in both the subdivision of selected diseases into disease stages that were homogeneous with respect to functional status, treatment and prognosis, and the elaboration of a brief clinical description for each disease stage.
Fifteen disease stages were selected for the panel valuation procedure: the stages selected covered the full range of disease severity, from the common cold to a final year of an unspecified fatal disease. All selected disease stages were described on a separate sheet with the name of the disease, the position of the selected disease stage among the other stages, a brief clinical description and a health state profile defined using the EQ-5D descriptive system extended to include a cognitive dimension, i.e. EQ-5D+C [17–19]. The EQ-5D+C system has six dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression, cognition) each with three possible levels of severity (no problem, some problems, extreme problems). Consistency of profiles was checked across disease stages within diseases and across diseases. Figure 1 shows an example of a disease stage description.
Pilot studies conducted in participating countries tested innovative societal valuation methods after the GBD societal valuation protocol had been criticised at an early stage of the project on ethical grounds [21, 22]. Agreement on the valuation protocol was reached by consensus, and the three valuation methods are described below in the order of their use in panels. In VAS, all fifteen disease stages were valued; in PTO and TTO the nine chronic disease stages were valued.
In the self-administered VAS, participants were asked to consider the consequences of living with the disease stage for one year. The disease stages were first ranked by decreasing severity and then scored on a vertical thermometer graded from 0 (the worst imaginable health state) to 100 (the best imaginable health state). The best and the worst disease stages were scored first.
In the PTO, panel participants played the role of decision-makers in their country prioritising between two preventive programmes. Several assumptions about the programmes were made explicit in the panel sessions:
- Prevention means the reduction of occurrence in two to four years; programmes are of the same costs and otherwise equal (e.g. age, sex, socio-economic status of groups);
- Both programmes include people of various ages;
- Loss of production for society and burden on family or caretakers were to be disregarded in decisions.
The PTO session began with the following example: "Programme A prevents the occurrence of a rapidly fatal disease in 100 people in your country in 2 to 4 years' time. The identity of these people is unknown. With the programme they will live in normal health for a normal lifetime. Programme B prevents the occurrence of severe vision disorder in a number of people in your country in 2 to 4 years' time. The identity of these people is unknown. With the programme they will avoid the state and live in normal health for a normal life time." Participants determined the number of people in programme B at which they were indifferent between the two programmes with the aid of a visual prop that displayed a stepwise procedure increasing the numbers in programme B (100, 200, 1000, 10 000, etc.). Indifference numbers lower than 100 were also allowed. After the example, participants had to prioritise between the prevention of a rapidly fatal disease and quadriplegia, and then between each of the eight chronic disease stages on the one hand, and quadriplegia on the other. Quadriplegia was thus used as an anchoring state, linking the valuation of chronic states to death. After initial individual valuations, discussion was structured among panel participants by the facilitator, who ensured that participants understood and were aware of the implications of their choices. Following discussion, panel members had the opportunity to change their responses if they so wished.
In TTO, panel participants had to imagine someone like themselves in full health, and choose between living their remaining 10 years of life in the chronic disease stage or less time in full health. The number of years at which the panel participants were indifferent was found using a "ping-pong" procedure, but participants were allowed not to trade off any years of life. The facilitator again ensured that panel participants fully understood the task.
Finally, participants had the opportunity to reconsider their responses after discussion in the panel and were encouraged to compare individual rankings of all disease stages for all three valuation methods and make changes to any responses if they so wished.
While TTO and PTO responses yield disability weights for life years directly, the VAS responses in this study do not, since in the VAS exercise states of illness were not valued relative to the state of being dead. For simplicity of exposition we refer to all health state values as disability weights (DWs) in the following. Statistical analyses were performed on the final figures recorded after panel deliberations. Following the GBD study convention, DWs were scaled from zero for full health (or the best imaginable health state in VAS) to unity for death (or the worst imaginable health state in VAS), and were computed as follows for the three valuation methods:
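VAS: $\mathrm{DW} = 1 - \mathrm{score}/100$;
TTO: $\mathrm{DW} = 1 - \mathrm{years}/10$;
PTO: $\mathrm{DW}_{\text{quadriplegia}} = 100/\mathrm{number}_{\text{quadriplegia}}$,
and $\mathrm{DW}_{\text{disease stage}} = (100/\mathrm{number}_{\text{disease stage}}) \times \mathrm{DW}_{\text{quadriplegia}}$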
As implied in the last two equations, the PTO used quadriplegia rather than death as the anchoring state. In cases in which the DW exceeded 1 due to the chained procedure in the PTO (i.e. when participants valued quadriplegia worse than death), the DW was truncated to 1. The proportion of participants who valued quadriplegia worse than death was recorded as well as changes in PTO and TTO numbers after panel discussions and consistency checks across valuation methods.
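The conversions above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code (the analyses were run in SAS): the function and constant names are hypothetical, and the order of truncation in the chained PTO (the quadriplegia weight is capped at 1 before chaining, and the chained weight is capped again) is an assumption made for the example.

```python
REFERENCE_GROUP = 100.0  # people in the PTO reference programme
TTO_HORIZON = 10.0       # remaining life years offered in the TTO task


def vas_dw(score: float) -> float:
    """VAS score (0 = worst, 100 = best imaginable state) -> disability weight."""
    return 1.0 - score / 100.0


def tto_dw(years_in_full_health: float) -> float:
    """Years in full health judged equivalent to 10 years in the disease stage."""
    return 1.0 - years_in_full_health / TTO_HORIZON


def pto_dw_quadriplegia(number_quadriplegia: float) -> float:
    """Indifference number for quadriplegia against 100 deaths prevented."""
    # Assumption: capped at 1 when quadriplegia is valued worse than death.
    return min(1.0, REFERENCE_GROUP / number_quadriplegia)


def pto_dw_stage(number_stage: float, number_quadriplegia: float) -> float:
    """Chained PTO: the stage is valued against quadriplegia, not death."""
    chained = (REFERENCE_GROUP / number_stage) * pto_dw_quadriplegia(number_quadriplegia)
    return min(1.0, chained)  # truncate weights that exceed 1
```

For instance, an indifference number of 200 for quadriplegia gives a weight of 0.5, and a disease stage with an indifference number of 1000 against quadriplegia then receives 0.1 × 0.5 = 0.05.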
Rankings of the disease stages based on mean DW computed from VAS, PTO and TTO were compared across countries and health care professional status with Spearman's rank correlation coefficients. In a random-effects model, variance components of disability weights were estimated for the random effects identified in this study:
1. Disease stages (n = 15 for VAS and n = 9 for TTO and PTO),
2. Subjects nested within a type of panel (n = 232),
3. Types of panel (health care or non-health care professional) nested within country (n = 2),
4. Countries (n = 5),
5. Crossed effects of disease stages and other random effects.
Maximum likelihood estimates of the variance components were used to compute the proportion of total variance accounted for by each random effect. For our interrater reliability study, two intraclass correlation coefficients were computed according to generalisability theory, which is a specific application of analysis of variance. A first intraclass correlation coefficient measured the agreement between countries on disability weights deduced from VAS, TTO and PTO:
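$$\frac{\sigma^2_{\text{disease}} + \sigma^2_{\text{subject(panel)}} + \sigma^2_{\text{panel(country)}} + \sigma^2_{\text{disease}\times\text{panel}}}{\sigma^2_{\text{disease}} + \sigma^2_{\text{subject(panel)}} + \sigma^2_{\text{panel(country)}} + \sigma^2_{\text{disease}\times\text{panel}} + \sigma^2_{\text{country}} + \sigma^2_{\text{disease}\times\text{country}} + \sigma^2_{\text{residual}}}.$$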
The numerator includes the variance components of all random effects on disability weights other than country-related effects and the residual term, which are added in the denominator. The closer to unity the intraclass correlation coefficient, the better the agreement of countries on valuations of disease stages. With a comparable design, a second intraclass correlation coefficient was computed to measure the agreement between the two types of panel for all valuation methods. Non-parametric bootstrap resampling techniques were used to compute 95% confidence intervals, since the complex design of our interrater reliability study did not allow simple computations. One hundred independent random samples were drawn from the individual data, stratified by country and panel type. Significance was examined at the 5% level. All analyses were undertaken using SAS version 8.0 (SAS Institute, Cary NC).
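As a rough sketch of the two computations described here, the Python below forms the between-country coefficient from already estimated variance components and wraps any statistic in a bootstrap that resamples participants within each country and panel type. It is not the SAS code used in the study. The dictionary keys, the record fields `country` and `panel_type`, and the use of percentile confidence limits are all assumptions for illustration; in the study the statistic would involve refitting the random-effects model on every resample, which is not shown.

```python
import random
import statistics
from collections import defaultdict


def icc_between_countries(vc):
    """Agreement between countries from a dict of variance components."""
    shared = (vc["disease"] + vc["subject(panel)"]
              + vc["panel(country)"] + vc["disease*panel"])
    # Country-related effects and the residual enter the denominator only.
    country_and_residual = vc["country"] + vc["disease*country"] + vc["residual"]
    return shared / (shared + country_and_residual)


def stratified_bootstrap_ci(participants, statistic, n_resamples=100, seed=0):
    """Percentile 95% CI; resampling is done within country x panel type."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in participants:
        strata[(person["country"], person["panel_type"])].append(person)
    estimates = []
    for _ in range(n_resamples):
        resample = []
        for members in strata.values():
            # Draw with replacement, keeping each stratum at its original size.
            resample.extend(rng.choices(members, k=len(members)))
        estimates.append(statistic(resample))
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles
```

With only one hundred resamples, as in the study, the percentile limits rest on very few points in each tail, so a sketch like this is best read as an outline of the procedure rather than a way to reproduce the reported intervals.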
No general statement about the desired level of the reliability coefficient of a test can be made, because the purpose for which the test is used must always be taken into account. When tests are intended for important decisions at the individual level, e.g. admission for or discontinuation of a clinical treatment, a reliability coefficient greater than or equal to 0.90 may be considered as "good." When tests are intended for less important decisions at the individual level, e.g. evaluation of treatment outcome, a reliability coefficient greater than or equal to 0.80 may be considered as "good." In our particular case, where valuation methods were intended for research at the group level, a reliability coefficient greater than or equal to 0.70 may be considered as "good," between 0.60 and 0.70 as sufficient, and less than 0.60 as insufficient [27, 28].
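The group-level thresholds can be written as a small lookup. The sketch below is only an illustration with a hypothetical function name; it is applied to the between-country coefficients reported in the abstract.

```python
def rate_group_level_reliability(icc: float) -> str:
    """Label a reliability coefficient used for research at the group level."""
    if icc >= 0.70:
        return "good"
    if icc >= 0.60:
        return "sufficient"
    return "insufficient"


# Between-country ICC point estimates reported in this paper.
reported = {"VAS": 0.72, "TTO": 0.72, "PTO": 0.56}
ratings = {method: rate_group_level_reliability(icc)
           for method, icc in reported.items()}
print(ratings)  # {'VAS': 'good', 'TTO': 'good', 'PTO': 'insufficient'}
```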
A total of 232 participants from England, France, the Netherlands, Spain and Sweden were included in 13 panels of health care professionals and 10 panels of non-health care professionals. Overall, 60% of subjects were female, and the mean age was 40.4 years, with a standard deviation of 15.2. Mean age differed significantly across countries as shown in Table 1. Health care professionals included 84% medical doctors. Health care professionals differed significantly from non-health care professionals in age (48.9 ± 14.1 vs. 32.4 ± 11.3) and gender (48% vs. 71% female).
At least one disease stage description was questioned in panels of either non-health care professionals (8 out of 10) or health care professionals (10 out of 13). Similar proportions of panels of non-health care professionals and health care professionals also reported difficulties with prognosis of some disease stages in TTO (74%), the initial example of PTO (35%), and the PTO valuation method overall (17%). Table 2 shows that discussion in panel sessions significantly decreased individual PTO numbers in five out of nine chronic disease stages. Quadriplegia, used as the anchoring state in the chained procedure of PTO valuation, was valued as worse than death by 61 participants, with significant differences across countries but not according to health care professional status. In these participants, the quadriplegia PTO-DW was truncated to 1.
Tables 3, 4 and 5 show that disease stages were ranked similarly between countries according to mean DW computed from the three methods. The averages of the ten Spearman's rank correlation coefficients between countries in pairwise comparisons were 0.96, 0.93 and 0.96 for VAS, PTO and TTO, respectively, with minimum values of 0.94 (Spain/Sweden), 0.87 (France/Sweden) and 0.88 (England/France), respectively. Similar rankings were found according to health care professional status with VAS (0.98), PTO (0.96) and TTO (0.95). Independence of ranks was rejected at p < 0.0001 in all measures.
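The pairwise comparison of rankings can be sketched as follows: with five countries there are ten country pairs, and each pair yields one Spearman coefficient computed on the mean DW per disease stage. The code is a self-contained illustration with hypothetical function names; it takes a mapping from country to a list of mean DWs (one per disease stage, in the same order for every country) and assumes that tied values receive average ranks. It does not use the study data.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(mean_dw_by_country):
    """One coefficient per unordered pair of countries."""
    return {
        (a, b): spearman(mean_dw_by_country[a], mean_dw_by_country[b])
        for a, b in combinations(sorted(mean_dw_by_country), 2)
    }
```

Averaging the values of the returned dictionary and taking their minimum gives summaries of the kind reported above for each valuation method.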
Table 6 shows that disease stages accounted for more than 60% of total variance of disability weights from VAS and TTO, whereas this proportion decreased to 36.7% with disability weights from PTO. The contribution of systematic differences between participants in valuation of the nine disease stages increased substantially from VAS (5.4%) to TTO (9.8%) and PTO (16.4%).
Country-related effects accounted for 1.9% of total variance with VAS, increasing to 3% with TTO and 10.8% with PTO. The agreement between countries fell from 0.72 with VAS and TTO to 0.56 with PTO. Panel type-related effects accounted for 1.3% of total variance with VAS, 1.1% with TTO and 3% with PTO. The agreement between health care professional panels and non-health care professional panels decreased from 0.73 with VAS and 0.74 with TTO to 0.64 with PTO.
A total of 232 participants from five Western European countries valued disease stages in health care professional and non-health care professional panels. Overall we found a very similar ranking of disease stages across countries irrespective of the valuation method used. This confirms previous findings based on the valuation of seventeen health conditions, either with VAS through individual interviews of about fifteen key informants in fourteen countries from different regions, or with PTO in the GBD study and recent refinements [9, 10]. Similar rankings of disability weights are not enough, however, to judge the appropriateness of a universal disability weight set used at a cardinal level in summary measures of population health.
We found that intraclass correlation coefficients measuring agreement between countries were good with VAS and TTO. At first glance, this finding may appear at odds with cross-national comparisons of disability weights focusing on disease conditions separately. Other studies eliciting values for EQ-5D health states with TTO from the general public in the United Kingdom and Spain, or in the United Kingdom and Japan, showed a high positive correlation of values between countries, but significant differences in values were found for a number of health states. Whereas a great variability in the valuation of health states is observed within countries, the previous approach does not allow one to disentangle systematic differences in valuation between subjects and between countries. As shown within the framework of generalisability theory, the subject effect accounted for more variance of disability weights than the country effect for all valuation methods.
In the case of the PTO method, the intraclass correlation coefficient measuring agreement between countries was insufficient. PTO directly elicited health decision-makers' trade-offs between preventive programmes and attempted to get societal preferences between disease stages. Whether respondents actually took a societal view in PTO questions (as opposed to an individual view in VAS and TTO) was not confirmed directly, e.g. through follow-up interviews, and is certainly worthy of further research. The PTO method demonstrated a dramatic increase in the systematic effects related to subjects and countries as compared to VAS and TTO. This might be related to different views regarding equity across European people.
We found that the agreement between people of similar academic background but of different medical background was good with VAS and TTO, and sufficient with PTO. This confirms results of an earlier study in the Netherlands. However, agreement between possible informants in health, e.g. individuals in health states, patients' families, health care professionals and the general public, showed contradictory results in Western countries using the TTO method [12–14] or VAS. In the absence of clear agreement between possible informants on disability weights, the United States Panel on Cost-effectiveness in Health and Medicine stated that the general public's preferences on health conditions should be used to inform health care resource allocation. Further research should assess differences in valuations between representative samples of the general public and the more educated and homogeneous groups used in this study. In particular, academics who volunteered to participate in such a time-consuming enterprise may represent a biased, highly literate sample of the population of the country. Academics and medical doctors are in many cases exposed to a similar global intellectual culture, which might override the national culture in intellectual matters. This may be especially true in the context of developing countries where highly educated people may have values at odds with those of the general public.
The design of our valuation methods may limit comparison with other studies. Framing and anchoring effects were likely to have been present with all three valuation methods. Among other framing effects, VAS scores are prone to sequencing effects (i.e. the worst and the best disease stages were scored first in this study), and to the range of health states considered. The anchoring of the TTO in a ten-year time frame was fixed for all participants to ensure comparability of results. However, TTO disability weights for most disease stages decreased with the age of these relatively young participants, with older people less willing to give up an amount of life time to avoid a health condition than younger people (data not shown). This may have been of particular relevance in the cross-national comparisons focusing on disease conditions separately, since age patterns differed between participating countries.
Pilot studies resulted in a "chained PTO" to limit the "rule of rescue" encapsulated by the technique, i.e. valuations take into account the initial disease severity of the programmes' recipients, in particular in life-saving programmes. Quadriplegia as the anchoring state had various consequences at the country level. Firstly, 43% of participants thought that the prevention of quadriplegia should receive a higher or equal priority to that of a life-saving programme. This finding was not related to age of participant but differed significantly across countries, from 23% in Sweden to 25% in Spain, 37% in France, 56% in England, and 64% in the Netherlands. Secondly, 24% and 5% of participants thought that the prevention of severe depression and stroke, respectively, should receive a higher or equal priority to that of quadriplegia, with significant variation across countries (from 4% in Spain to 12% in France, 20% in Sweden, 28% in England, and 50% in the Netherlands in the case of severe depression).
The PTO-DW of disease stages worse than the PTO reference programme was truncated to 1 to allow face validity across different valuation metrics. This truncation means that there was no differentiation between the different very severe states, and that, as all responses were recoded as 1, the level of agreement could obviously be higher than if ranking of these states was also done. In addition, participants were encouraged to compare their individual rankings of the nine chronic disease stages for all three valuation methods at the end of panel sessions. Spearman's rank correlations between valuation methods increased significantly (in paired t-tests) at the individual level after these "consistency" checks, by 0.013 (± 0.0.049) between VAS and TTO; 0.010 (± 0.0045) between VAS and PTO; and 0.007 (± 0.0038) between TTO and PTO. This extra consistency between valuation methods of different perspectives could alternatively contribute to higher levels of agreement across countries and health care professional status in the case of PTO-DW, or to lower levels of agreement in the cases of VAS-DW and TTO-DW.
Another limitation of this study is related to the validity of our valuation protocol. Despite great care being taken to ensure the face validity of disease stages, at least one disease stage was questioned in a majority of panels of both health care and non-health care professionals. For instance, discrepancies between the brief clinical description of spinal cord injuries resulting in quadriplegia and its generic health state profile were often noted. Difficulties were also encountered with TTO and PTO methods in spite of the deliberative panel process led by a facilitator and the high level of education of the participants. If we are to collect values from the general public as recommended, then we need to put more effort into ensuring that valuation methods are understood as intended by respondents. Discussion in panel sessions had a considerable impact on individual PTO valuations (see Table 2), and underlines that the collection of societal values at the individual level without discussion may hamper its face validity. Although discussion increased the level of agreement on DW computed from PTO in this study, differences across countries and health care professional status were still striking.
This study supports a reasonably high level of agreement on disability weights in Western European countries with VAS and TTO methods, but a lower level of agreement with the PTO method. This study showed that even within a relatively homogenous and wealthy region, and with a PTO valuation protocol that may inflate the level of agreement across countries, the agreement on disability weights was insufficient when a societal perspective was taken into account, i.e. when the summary measure of population health was considered explicitly within the context of health care resource allocation. Accordingly, this study casts some doubts on the generalisability of the disability weights computed from PTO used in the Global Burden of Disease study, although PTO protocols differed. For any valuation method, the level of agreement on disability weights requires further evidence in larger and more representative samples of the general public within and across regions, as defined by countries' location and possibly by similarities in mortality patterns and cost structures.
However, uncertainty surrounding disability weights may be considered small when compared to the lack of epidemiological data in many areas of the world to compare summary measures of population health across countries, as in the World Health Report 2000. In the European Disability Weights study, cross-national comparisons of burden of disease, as measured by disability-adjusted life years, showed that differences between European countries for a given valuation method were negligible in comparison to differences in epidemiological estimates.
Members of the European Disability Weights group
Finn Kamper-Jørgensen, Ulla Christensen, Kim Moesgaard Iburg, (National Institute of Public Health, Copenhagen, Denmark); James Raftery, Claire Packer, Lisa Gold, Suzanne Robinson (from Oct 1999) (University of Birmingham, England); Isabelle Durand-Zaleski, Michael Schwarzinger (Dept. of Public Health, Hôpital Henri Mondor, AP-HP, Paris, France); Louise Gunning-Schepers, Gouke Bonsel, Clara Moerman (from 01.01.2000), Marlies Stouthard (Academic Medical Center, University of Amsterdam, The Netherlands); Paul van der Maas (project co-ordinator), Marie-Louise Essink-Bot (Dept. of Public Health, Erasmus University Rotterdam, The Netherlands); Joaquin Pereira, Ana Baylin (until 01.01.1999), Eduardo Fernandez Zincke (National School of Public Health, Madrid, Spain); Finn Diderichsen, Kristina Burström, Rickard Ljung (Karolinska Institute, Stockholm, Sweden).
Field MJ, Gold GM, eds: Summarizing population health: directions for the development and application of population metrics Washington DC: National Academy Press 1998.
WHO: The World Health Report 2000. Health Systems: Improving Performance Geneva: World Health Organization 2000.
Politi C, Carrin G, Evans D, Kuzoe FA, Cattand PD: Cost-effectiveness analysis of alternative treatments of African gambiense trypanosomiasis in Uganda. Health Econ 1995, 4: 273-87.
Goodman CA, Coleman PG, Mills AJ: Changing the first line drug for malaria treatment – cost-effectiveness analysis with highly uncertain inter-temporal trade-offs. Health Econ 2001, 10: 731-49. 10.1002/hec.621
Marseille E, Hofmann PB, Kahn JG: HIV prevention before HAART in sub-Saharan Africa. Lancet 2002, 359: 1851-6. 10.1016/S0140-6736(02)08705-6
Nord E: Methods for quality adjustment of life years. Soc Sci Med 1992, 34: 559-69. 10.1016/0277-9536(92)90211-8
Murray CJ, Lopez AD: Global mortality, disability, and the contribution of risk factors: Global Burden of Disease Study. Lancet 1997, 349: 1436-42. 10.1016/S0140-6736(96)07495-8
Murray CJ, Lopez AD: Regional patterns of disability-free life expectancy and disability-adjusted life expectancy: Global Burden of Disease Study. Lancet 1997, 349: 1347-52. 10.1016/S0140-6736(96)07494-6
Murray CJ: Rethinking DALYs. In The Global Burden of Disease. Vol 1: a comprehensive assessment of mortality and disability from diseases, injuries, and risk factors in 1990 and projected to 2020 (Edited by: Murray CJ, Lopez AD). Cambridge: Harvard University Press 1996, 1-99.
Murray CJ, Lopez AD: Progress and directions in refining the global burden of disease approach: a response to Williams. Health Econ 2000, 9: 69-82. 10.1002/(SICI)1099-1050(200001)9:1<69::AID-HEC493>3.0.CO;2-I
Stouthard ME, Essink-Bot ML, Bonsel GJ, on behalf of the Dutch Disability Weights Group: Disability weights for diseases: a modified protocol and results for a Western European region. Eur J Public Health 2000, 10: 24-30.
Dolan P: Whose preferences count? Med Decis Making 1999, 19: 482-6.
Zethraeus N, Johannesson M: A comparison of patient and social tariff values derived from the time trade-off method. Health Econ 1999, 8: 541-5. 10.1002/(SICI)1099-1050(199909)8:6<541::AID-HEC464>3.3.CO;2-#
Ubel PA, Loewenstein G, Hershey J, et al.: Do nonpatients underestimate the quality of life associated with chronic health conditions because of a focusing illusion? Med Decis Making 2001, 21: 190-9. 10.1177/02729890122062488
Suarez-Almazor ME, Conner-Spady B, Kendall CJ, Russell AS, Skeith K: Lack of congruence in the ratings of patients' health status by patients and their physicians. Med Decis Making 2001, 21: 113-21. 10.1177/02729890122062361
Essink-Bot ML, Pereira J, Packer C, Schwarzinger M, Burstrom K: Cross-national comparability of burden of disease estimates: the European Disability Weights Project. Bull World Health Organ 2002, 80: 644-52.
The EuroQol Group: EuroQol – a new facility for the measurement of health-related quality of life. Health Policy 1990, 16: 199-208.
Brooks R: EuroQol: the current state of play. Health Policy 1996, 37: 53-72. 10.1016/0168-8510(96)00822-6
Krabbe PF, Stouthard ME, Essink-Bot ML, Bonsel GJ: The effect of adding a cognitive dimension to the EuroQol multiattribute health-status classification system. J Clin Epidemiol 1999, 52: 293-301. 10.1016/S0895-4356(98)00163-2
Robinson S, Gold L, Moesgaard Iburg K, and the European Disability Weights group: The development of the PTO for estimating disability weights [Abstract 45A014]. International Health Economics Association York, United Kingdom 2001.
Arnesen T, Nord E: The value of DALY life: problems with ethics and validity of disability adjusted life years [Erratum in BMJ 2000;320:1398]. BMJ 1999, 319: 1423-5.
Essink-Bot ML, Stouthard M, Bonsel G, Gunning-Schepers L, van der Maas P: The problems with disability weights. eBMJ 1999, 2 December.
Drummond MF, O'Brien BJ, Stoddart GL, Torrance GW: Methods for the Economic Evaluation of Health Care Programmes Second Edition Oxford: Oxford University Press 1997.
Streiner DL, Norman GR: Health measurement scales: a practical guide to their development and use Second Edition Oxford: Oxford University Press 1995.
Efron B, Tibshirani RJ: An introduction to the bootstrap New York, London: Chapman and Hall 1993.
Zou KH, McDermott MP: Higher-moment approaches to approximate interval estimation for a certain intraclass correlation coefficient. Stat Med 1999, 18: 2051-61. 10.1002/(SICI)1097-0258(19990815)18:15<2051::AID-SIM162>3.3.CO;2-G
Bartram D: The development of international guidelines on test use: the International Test Commission Project. International Journal of Testing 2001, 1: 33-53. 10.1207/S15327574IJT0101_3
Evers A: The revised Dutch rating system for test quality. International Journal of Testing 2001, 1: 155-82. 10.1207/S15327574IJT0102_4
Ustun TB, Rehm J, Chatterji S, et al.: Multiple-informant ranking of the disabling effects of different health conditions in 14 countries. WHO/NIH Joint Project CAR Study Group. Lancet 1999, 354: 111-5. 10.1016/S0140-6736(98)07507-2
Badia X, Roset M, Herdman M, Kind P: A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Med Decis Making 2001, 21: 7-16.
Tsuchiya A, Ikeda S, Ikegami N, et al.: Estimating an EQ-5D population value set: the case of Japan. Health Econ 2002, 11: 341-53. 10.1002/hec.673
Sculpher M, Gafni A: Recognizing diversity in public preferences: the use of preference sub-groups in cost-effectiveness analysis. Health Econ 2001, 10: 317-24. 10.1002/hec.592
Mossialos E, King D: Citizens and rationing: analysis of a European survey. Health Policy 1999, 49: 75-135. 10.1016/S0168-8510(99)00044-5
Gold MR, Siegel JE, Russell LB, Weinstein MC: Cost-effectiveness in Health and Medicine New York: Oxford University Press 1996.
Jelsma J, Chivaura VG, Mhundwa K, De Weerdt W, de Cock P: The global burden of disease disability weights. Lancet 2000, 355: 2079-80.
Almeida C, Braveman P, Gold MR, et al.: Methodological concerns and recommendations on policy consequences of the World Health Report 2000. Lancet 2001, 357: 1692-7. 10.1016/S0140-6736(00)04825-X
We thank Joshua A. Salomon (WHO, Geneva, Switzerland) for general advice, Bruno Falissard (Dept. of Public Health, Hôpital Paul Brousse, France) for help and advice with the generalisability study, and Jennifer Jelsma (University of Cape Town, South Africa) and John Brazier (Sheffield University, UK) for their valuable comments on the submitted manuscript. The European Disability Weights group is also grateful for funding provided by the European Union, but we stress that the opinions expressed, and any errors made, are solely our responsibility.
This study was supported by a grant from the BIOMED II Programme of the European Union (project number BMH4-98-3253).
M.L. Essink-Bot, L.J. Gunning-Schepers, P.J. van der Maas, M.E.A. Stouthard and G.J. Bonsel were responsible for the original grant application. P.J. van der Maas, L.J. Gunning-Schepers, J. Pereira, I. Durand-Zaleski, J. Raftery, F. Diderichsen and F. Kamper-Jørgensen were members of the Steering Committee of the European Disability Weights project. P.J. van der Maas acted as the Project Co-ordinator. The substudies on valuation and burden of disease estimation were ultimately designed in the discussions of the European Disability Weights group, to which all members listed below contributed. The respective country teams carried out the empirical data collection in each participating country. The Writing Committee, consisting of M. Schwarzinger, M.E.A. Stouthard, K. Burström and E. Nord, drafted the manuscript of this paper. All members of the European Disability Weights group read and approved the final manuscript.
Schwarzinger, M., Stouthard, M.E., Burström, K. et al. Cross-national agreement on disability weights: the European Disability Weights Project. Popul Health Metrics 1, 9 (2003). https://doi.org/10.1186/1478-7954-1-9
- cross-national comparison
- outcome measures
- valuation methods
- Disability-Adjusted Life Years
- Quality-Adjusted Life Years