Background

The Global Burden of Disease (GBD) study is a global collaborative project, ongoing since the 1990s, that evaluates the contribution of diseases, injuries, and risks to population health worldwide [1]. The GBD study summarizes health loss in disability-adjusted life years (DALYs), which are the sum of years of life lost (YLLs) and years lived with disability (YLDs) [2]. Conceptually, YLLs reflect the burden of premature mortality and are calculated by multiplying the number of deaths by the standard life expectancy at the age of death, while YLDs reflect the burden of morbidity and are calculated by multiplying the number of prevalent cases of a disease by a disability weight (DW) that reflects the severity of its disabling consequences. A major advantage of the DALY is that it expresses the burden of mortality and morbidity both separately and combined in a single number, enabling comparison of disease burden across all diseases.
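As a minimal illustration of the arithmetic described above, the sketch below computes YLL, YLD, and DALY for a single cause. The function names and all input figures are hypothetical, and the sketch uses only the simple prevalence-based definitions given in the text (no discounting, age weighting, or comorbidity adjustment).

```python
def years_of_life_lost(deaths: float, life_expectancy_at_death: float) -> float:
    """YLL = number of deaths x standard life expectancy at the age of death."""
    return deaths * life_expectancy_at_death


def years_lived_with_disability(prevalent_cases: float, disability_weight: float) -> float:
    """YLD = number of prevalent cases x disability weight (0 = full health, 1 = death)."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    return prevalent_cases * disability_weight


def dalys(deaths, life_expectancy_at_death, prevalent_cases, disability_weight):
    """DALY = YLL + YLD."""
    yll = years_of_life_lost(deaths, life_expectancy_at_death)
    yld = years_lived_with_disability(prevalent_cases, disability_weight)
    return yll + yld


# Hypothetical cause: 100 deaths at an age with 30 years of standard life
# expectancy remaining, plus 10,000 prevalent cases with a DW of 0.05.
total = dalys(100, 30.0, 10_000, 0.05)
print(total)
```

Here the 3000 YLLs and 500 YLDs add up to 3500 DALYs, which shows how a non-fatal condition's contribution scales directly with its DW.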

DWs are weighting factors that reflect the severity of health states. In the GBD 2010 DW study, the methodology for assessing DWs was revised considerably to incorporate the views of the general public, rather than relying on the opinions of a select group of global public health experts who had provided health state valuations for earlier rounds of GBD. Face-to-face and telephone surveys were conducted in Peru, Indonesia, Bangladesh, Tanzania, and the USA, supplemented by an open-access web-based survey [3]. Instead of the person trade-off methods used previously, the surveys were based on paired comparison (PC) questions that asked "who is the healthier?" about two persons, each described in lay terms by the main aspects of their health state [3, 4]. DWs were derived for a parsimonious set of 220 health states covering all disabling outcomes of the diseases and injuries quantified in GBD. Following these initial surveys, the wording of some health state descriptions was criticized. When the opportunity arose for a new DW study using a web-based survey in four European countries, some lay descriptions were altered to include key components of disability, such as the effect of social isolation in someone with more severe hearing loss, and incontinence as part of the description of spinal cord injury [5]. The modified lay descriptions changed the DWs in the expected direction.

The GBD 2010 DW study and the subsequent European surveys showed a high level of consistency of responses across countries and levels of educational attainment [3]. However, these studies included few respondents from East Asian countries, where cultural differences may influence health state valuations more than has been found elsewhere. Several previous studies suggest that DWs in East Asian countries may differ from those in Western countries [6, 7]. Rigorous evaluation of contextual differences in how the severity of health states is rated in different settings is important for the further development of disease burden studies. In this study, we aimed to estimate DWs in Japan, an East Asian country with a distinctive healthcare system in which social health insurance provides universal health care [8,9,10], using the same methodology as the previous GBD DW studies. We compared the estimated Japanese DWs with the estimates that have been used in GBD since 2013. We also added health states that are common in the Japanese population and were not included in previous DW studies.

Methods

For the assessment of DWs for Japan, we followed the same procedure as in previous DW measurement studies, using a web-based survey design [3, 5, 11].

Lay description of health states

DWs were assessed for a set of 231 health states in the following categories: 166 health states that were included in the GBD 2010 DW study and repeated in unaltered form in the European study (GBD 2010 original); 33 health states whose lay descriptions were revised for the European DW study (GBD 2010 modified); 27 health states included only in the European DW study (European original); and 5 new health states. The five new health states comprised two generic drug health states (drug dependence and mild drug dependence), which replaced the drug-specific states (opioid, cannabis, amphetamine, and cocaine) used in previous studies; one existing health state whose lay description was expanded (vaginal discharge); and two completely new health states, cancer post-treatment and dermatitis. We excluded health states of the GBD 2010 and 2013 studies that were not relevant to, or rare in, the Japanese context, such as lymphatic filariasis, fetal alcohol syndrome, lower airway burns, and kwashiorkor. The new Japanese health state for dermatitis replaced the three GBD health states for disfigurement with itch or pain. The list of the 231 health states and their origins (GBD 2010 original, etc.) is presented in Additional file 1: table 1.

Using professional translation services, the lay descriptions were translated from English into Japanese and back-translated into English, and independent clinical experts from the authors' institutions verified that the meaning was consistent. The framing of the paired comparison questions was either chronic ("imagine each of the conditions in the pair would last for a person's life time") or temporary ("imagine each of the conditions in the pair would last for one week"). Of the 231 health states, 34 were framed as chronic only, 106 as temporary only, and 91 as either chronic or temporary. The lay descriptions of the 231 health states and their designation as chronic, temporary, or both are also presented in Additional file 1: table 1.

Study population

The participants of the web-based survey were registered members of the panel of a web survey company (Cross Marketing Inc.) [12]. The panel included people aged 18 to 70 years. Membership of the panel is voluntary; the incentive to join is that members who respond to the company's questionnaires receive "points" according to survey volume, which can be used to purchase products and services from partner companies [12].

The target number of participants was set at approximately 40,000. To ensure national representativeness, we used quota sampling based on age, gender, and prefecture population ratios from the 2015 National Census, which fixed the final number at 37,318 participants. Participation was first come, first served, and the survey was closed when the number of respondents reached the predetermined target by age, gender, and prefecture. The survey began on 25 January 2019, and the target was reached on 30 January 2019.
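The quota design described above can be sketched as two steps: split the total sample across strata in proportion to census population, then accept respondents in arrival order until each stratum is full. This is a minimal sketch under stated assumptions: the strata labels and population counts are hypothetical, and the largest-remainder rounding rule is our choice, since the study does not report how fractional quotas were rounded.

```python
import math
from collections import Counter


def allocate_quotas(population: dict, total: int) -> dict:
    """Split `total` participants across strata in proportion to population counts.

    Assumption: largest-remainder rounding, so that quotas sum exactly to `total`.
    """
    norm = sum(population.values())
    exact = {s: total * w / norm for s, w in population.items()}
    quotas = {s: math.floor(x) for s, x in exact.items()}
    leftover = total - sum(quotas.values())
    # Give the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - quotas[s], reverse=True)
    for s in by_remainder[:leftover]:
        quotas[s] += 1
    return quotas


def fill_first_come_first_served(arrivals, quotas: dict) -> list:
    """Accept (respondent_id, stratum) arrivals in order until each stratum's quota is full."""
    filled = Counter()
    accepted = []
    for respondent_id, stratum in arrivals:
        if filled[stratum] < quotas.get(stratum, 0):
            filled[stratum] += 1
            accepted.append(respondent_id)
    return accepted


# Hypothetical strata (e.g. age group x gender x prefecture) with population counts.
quotas = allocate_quotas({"A": 5, "B": 3, "C": 2}, 7)
accepted = fill_first_come_first_served(
    [(1, "A"), (2, "B"), (3, "B"), (4, "B"), (5, "C"), (6, "C")], quotas
)
```

Respondent 4 is turned away because stratum B is already full, which mirrors how the survey closed each age, gender, and prefecture cell once its target was met.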

Respondents were required to answer every question, so there were no missing values. Respondents had consented to the terms and conditions and privacy policy that the company sent with questionnaire invitations, which detail how the company handles individuals' confidential information. The sample size was determined based on statistical considerations as well as sample sizes used in similar research [5]. Table 1 compares the characteristics of respondents with the distribution of the whole population of Japan, derived from the 2010 and 2015 National Censuses [13, 14]. Except for educational level, most demographic characteristics were similar to those of the whole population; the percentage of university graduates among respondents (46.2%) was considerably larger than in the whole population (16.1%).

Table 1 Background of respondents

Web survey

We used the same questionnaire as the GBD 2010 and European DW studies, which consisted of three parts. The first part covered sociodemographic and geographic characteristics of participants, the second consisted of PC questions, and the third consisted of population health equivalence (PHE) questions. In the PC part, each participant answered 15 questions on randomly generated pairs of health states, under either chronic or temporary framing. Participants were allocated to the chronic or temporary framing in proportion to the number of health states in each set (125 and 197, respectively). In the PHE part, each participant answered three questions comparing two randomly selected health programs: one prevented 1000 people from dying immediately, and the other prevented a randomly selected number of people (for example, 1500, 2000, 3000, 5000, or 10,000) from suffering one of 28 selected health states for the rest of their lives. Respondents were instructed to choose the program that produced the greater health gain.
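The set sizes follow from the framing counts given earlier: 125 = 34 chronic-only + 91 either-framing states, and 197 = 106 temporary-only + 91 either-framing states. The sketch below shows one way the allocation and pair generation could work. It assumes that framing is drawn with probability 125/322, that the 15 pairs are distinct and drawn uniformly without replacement, and that presentation order is randomized; health states are represented by hypothetical integer IDs, and none of this is the survey company's actual implementation.

```python
import random
from itertools import combinations

N_CHRONIC_ONLY, N_TEMPORARY_ONLY, N_BOTH = 34, 106, 91
N_CHRONIC = N_CHRONIC_ONLY + N_BOTH        # 125 states eligible for chronic framing
N_TEMPORARY = N_TEMPORARY_ONLY + N_BOTH    # 197 states eligible for temporary framing


def assign_framing(rng: random.Random) -> str:
    """Allocate a respondent to a framing in the ratio 125:197 (assumed random draw)."""
    p_chronic = N_CHRONIC / (N_CHRONIC + N_TEMPORARY)
    return "chronic" if rng.random() < p_chronic else "temporary"


def draw_pairs(state_ids, n_questions: int, rng: random.Random):
    """Draw `n_questions` distinct pairs of different health states uniformly at random."""
    all_pairs = list(combinations(state_ids, 2))
    pairs = rng.sample(all_pairs, n_questions)
    # Randomize which state of each pair is presented first.
    return [p if rng.random() < 0.5 else (p[1], p[0]) for p in pairs]


rng = random.Random(2019)
framing = assign_framing(rng)
states = range(N_CHRONIC) if framing == "chronic" else range(N_TEMPORARY)
questions = draw_pairs(states, 15, rng)
```

Weighting the framing by set size gives each health state roughly the same expected number of appearances regardless of which set it belongs to.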

Data analysis

All analyses were performed with Stata (version 15) and R (version 3.6.1). The PC data were plotted as a heat map representing the probability of selecting the first health state in the pair as the healthier of the two. We tested the reliability of PC responses by deliberately repeating the first pair in the last PC question, a test-retest procedure similar to that of the European DW study [5].
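The two summaries described above reduce to simple proportions. The sketch below computes the per-pair probability plotted in a heat map cell and the test-retest inconsistency rate. It assumes that each response is coded 1 when the first-presented state is chosen as healthier (the coding used in the probit model) and that the repeated pair is shown in the same order both times; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def pair_probabilities(responses):
    """Map each ordered pair (first, second) to the share choosing `first` as healthier.

    `responses` is an iterable of (first_state, second_state, chose_first) with chose_first in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # pair -> [times first chosen, times shown]
    for first, second, chose_first in responses:
        counts[(first, second)][0] += chose_first
        counts[(first, second)][1] += 1
    return {pair: k / n for pair, (k, n) in counts.items()}


def inconsistency_rate(first_answers, repeat_answers) -> float:
    """Share of respondents whose answer to the repeated pair differs from their first answer."""
    if len(first_answers) != len(repeat_answers):
        raise ValueError("answer lists must have equal length")
    changed = sum(a != b for a, b in zip(first_answers, repeat_answers))
    return changed / len(first_answers)
```

For example, if one of five respondents switches their choice on the repeated pair, the inconsistency rate is 0.2.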

A probit regression model was used to estimate the latent preference for each health state from the PC data. The response variable took the value 1 if the first health state in the pair was selected as healthier and 0 otherwise. The regression included an indicator variable for each health state, which took the value 1 if the state was presented first in the pair, −1 if it was presented second, and 0 otherwise. A linear regression model based on the PHE responses was then used to anchor the probit estimates, which were logit-transformed, onto a DW scale ranging from 0 to 1. Monte Carlo integration using normal random samples was used to estimate the mean DWs [15]. Lastly, 1000 bootstrap iterations were performed to compute 95% uncertainty intervals (UIs).
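The probit step can be sketched in plain NumPy, as a stand-in for the Stata and R routines actually used. This minimal sketch fits a no-intercept probit by Fisher scoring on simulated responses for four hypothetical health states, using the +1/−1/0 coding described above. Because only differences between states are identified, one state (here state 0) is assumed as a reference with its coefficient fixed at 0. The PHE anchoring, logit mapping, Monte Carlo integration, and bootstrap steps are not shown.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    """Standard normal CDF, computed with math.erf."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)


def design_matrix(first, second, n_states):
    """+1 for the state shown first, -1 for the state shown second, 0 otherwise.

    The column for state 0 is dropped so that state 0 acts as the reference.
    """
    X = np.zeros((len(first), n_states))
    rows = np.arange(len(first))
    X[rows, first] = 1.0
    X[rows, second] = -1.0
    return X[:, 1:]


def fit_probit(X, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood probit without intercept, fitted by Fisher scoring."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = np.clip(norm_cdf(eta), 1e-10, 1.0 - 1e-10)
        d = norm_pdf(eta)
        score = X.T @ ((y - p) * d / (p * (1.0 - p)))
        info = X.T @ (X * (d ** 2 / (p * (1.0 - p)))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return np.concatenate([[0.0], beta])  # prepend the reference state


# Simulated example: four hypothetical states, state 0 the healthiest.
rng = np.random.default_rng(0)
true_health = np.array([0.0, -0.5, -1.0, -2.0])
n = 20_000
first = rng.integers(0, 4, n)
second = (first + rng.integers(1, 4, n)) % 4  # always a different state
# y = 1 when the first-presented state is chosen as the healthier one.
y = (rng.random(n) < norm_cdf(true_health[first] - true_health[second])).astype(float)

est = fit_probit(design_matrix(first, second, 4), y)
severity = -est  # larger value = judged less healthy, before any PHE anchoring
```

In this coding, a larger coefficient means a state is more often judged healthier, so the sketch takes severity as the negative coefficient. The recovered values should be close to the simulated ones, and only their relative positions carry meaning until they are anchored to the 0 to 1 DW scale.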

We also compared the estimated Japanese DWs for 226 health states (excluding the 5 states new to the present study) with the GBD 2013 DWs to assess differences by health state and health category.

In addition, we performed a regression analysis to assess which symptoms mentioned in the lay descriptions were associated with the difference between the Japanese DW and the GBD DW. Based on the wording of the lay descriptions, we identified eleven symptom categories (Additional file 1: table 1): mobility, pain, mental symptoms, fatigue, disfigurement, sensory symptoms, infection/diarrhea, substance use, activities of daily living (ADL), cognitive symptoms, and other physical symptoms. We fitted a linear regression model with the proportional difference between the Japanese and GBD 2013 DWs for 226 health states as the outcome (excluding the 5 states not included in the GBD 2013 study). All eleven symptom categories were entered into the model simultaneously.
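The symptom regression can be sketched as ordinary least squares with an intercept and eleven binary indicators entered together. Two assumptions should be flagged. First, the text does not define "proportional difference", so the sketch assumes (Japanese DW − GBD DW) / GBD DW. Second, the indicators, coefficients, and noise are simulated purely to show that the fit recovers known effects; they are not the study data.

```python
import numpy as np

SYMPTOMS = ["mobility", "pain", "mental", "fatigue", "disfigurement", "sensory",
            "infection_diarrhea", "substance_use", "adl", "cognitive", "other_physical"]


def proportional_difference(dw_japan, dw_gbd):
    """Assumed definition: (Japanese DW - GBD DW) / GBD DW."""
    dw_japan = np.asarray(dw_japan, dtype=float)
    dw_gbd = np.asarray(dw_gbd, dtype=float)
    return (dw_japan - dw_gbd) / dw_gbd


def fit_symptom_regression(outcome, indicators):
    """OLS of the outcome on all symptom indicators simultaneously, plus an intercept."""
    X = np.column_stack([np.ones(len(outcome)), indicators])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return dict(zip(["intercept"] + SYMPTOMS, coef))


# Simulated data for 226 health states with known symptom effects.
rng = np.random.default_rng(1)
indicators = rng.integers(0, 2, size=(226, len(SYMPTOMS)))
true_coef = np.linspace(-0.5, 0.5, len(SYMPTOMS))
outcome = 0.1 + indicators @ true_coef + rng.normal(0, 0.05, 226)
est = fit_symptom_regression(outcome, indicators)
```

Under this definition, amputation of toe (0.080 vs 0.006) would have a proportional difference of about 12.3, so a few large ratios could dominate such a regression; a log ratio would be one alternative, although the text does not say which form was used.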

Role of the funding source

The funders of the study had no role in the study design, data collection, data analysis, data interpretation, or writing of the paper. The authors had full access to all the data in the study and had final responsibility to submit for publication.

Results

Paired comparison

The heat map of PC responses is shown in Fig. 1. Each cell indicates the response probability for one pair of health states, and its color corresponds to the probability that the first health state in the pair was chosen as the healthier outcome. The colors show a smooth transition of preferences across comparisons, indicating high internal consistency. In the reliability check, responses to the repeated pair were inconsistent in 21.6% of cases.

Fig. 1

Response probabilities for paired comparison. Red represents a probability of less than 0.25 and blue a probability greater than 0.75; green, yellow, and orange correspond to probabilities between 0.25 and 0.75. A smooth transition of colors between the upper left and lower right corners indicates low measurement error and good internal consistency, whereas a random mix of colors would reflect high measurement error and poor internal consistency. Not all possible 231 × 231 pairs were evaluated, as indicated by the blank cells in the figure.

Population health equivalence

Figure 2 shows, for each health state, the proportion of participants who chose the program preventing a randomly assigned number of people (the bid) from suffering that health state over the program preventing 1000 immediate deaths. We expected the proportion choosing the second program to increase with the bid; however, the correlation between this proportion and the bid was low (Spearman's correlation coefficient 0.32). In addition, the probability of choosing the second program converged around 50% regardless of the severity of health states, in contrast to the PHE responses in the GBD study, in which probabilities increased with the severity of health states (also shown in Fig. 2).
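Spearman's coefficient, used above and again for the DW comparison, is the Pearson correlation of ranks. PHE bids take only a few distinct values, so ties are frequent and are given average ranks. The sketch below is a plain-NumPy version of that definition, not the Stata or R function the authors used. The bid-level proportions in the usage lines are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they occupy."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Hypothetical proportions choosing the second program at each bid, for two health states.
bids = [1500, 2000, 3000, 5000, 10000] * 2
share_second = [0.48, 0.50, 0.47, 0.52, 0.49, 0.51, 0.46, 0.50, 0.53, 0.48]
rho = spearman(bids, share_second)
```

Proportions that hover around 0.5 regardless of the bid, as in this toy input, produce a weak rank correlation, which is the pattern the Japanese PHE data showed.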

Fig. 2

Probability of choosing the second program for each of the 28 health states that were evaluated with the population health equivalence (PHE) questions, in the present study (bottom panel) compared to results in the GBD 2010 study (top panel). GBD, Global Burden of Disease study; DW, disability weight. Each line represents one health state and each dot represents a bid within one health state

Estimates of Japanese disability weights

Because of the evident lack of discernment in the PHE responses in our study, we used the PHE data from the GBD 2010 DW study to anchor our PC regression estimates onto the DW scale ranging from 0 to 1. Estimated Japanese DWs for the 231 health states are shown in Table 2, together with the GBD 2013 DWs. The highest DW was 0.707 (95% UI 0.527–0.842) for spinal cord injury at neck level (untreated), followed by 0.675 (0.506–0.822) for intensive care unit admission and 0.653 (0.483–0.798) for severe multiple sclerosis. The lowest DW was 0.004 (0.001–0.009) for mild anemia, followed by 0.005 (0.002–0.012) for mild distance vision loss and 0.006 (0.003–0.013) for controlled asthma.

Table 2 Estimated Japanese disability weights (95% uncertainty interval), compared to the GBD 2013 disability weights

Overall, Japanese and GBD 2013 DWs were highly correlated (Spearman's correlation coefficient 0.88), although there was considerable disagreement for individual health states. Figure 3 shows a scatter plot of the Japanese and GBD 2013 DWs; the blue and red lines mark a factor-of-two and a factor-of-three difference between them, respectively. Out of 226 health states, 55 (24.3%) showed more than a twofold difference, of which 41 (74.5%) had a higher Japanese DW. More than a factor-of-three difference was found for 23 health states (10.2%), of which 20 (87.0%) had a higher DW in Japan; these were mostly injuries, including amputations and fractures. The largest proportional difference was a 13 times higher Japanese DW for amputation of toe (0.080 [95% UI 0.052–0.114]) compared with the GBD DW (0.006 [0.002–0.012]), followed by a 12 times higher Japanese DW for amputation of finger(s), excluding thumb (0.063 [0.041–0.092] vs 0.005 [0.002–0.010]), an 8.5 times higher Japanese DW for amputation of thumb (long term) (0.093 [0.061–0.132] vs 0.011 [0.005–0.021]), and an 8.3 times higher Japanese DW for moderate acute infectious disease (0.424 [0.289–0.577] vs 0.051 [0.032–0.074]). Conversely, GBD 2013 DWs were higher than Japanese DWs by a factor of three or more for three health states (Japanese vs GBD 2013): drowning (0.079 [0.052–0.114] vs 0.247 [0.164–0.341]); severe anemia (0.040 [0.024–0.061] vs 0.149 [0.101–0.209]); and generic uncomplicated disease: worry and daily medication (0.016 [0.008–0.028] vs 0.049 [0.031–0.072]).
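The factor-of-two and factor-of-three bands above can be reproduced from pairs of point estimates. The sketch below applies them to the values reported in this paragraph, using the ratio of the larger to the smaller DW and recording which source is higher. Ratios computed from these rounded published values may differ slightly from the stated factors (for example, 0.063/0.005 gives 12.6 rather than 12).

```python
# Point estimates reported above: (Japanese DW, GBD 2013 DW).
REPORTED = {
    "amputation of toe": (0.080, 0.006),
    "amputation of finger(s), excluding thumb": (0.063, 0.005),
    "amputation of thumb (long term)": (0.093, 0.011),
    "infectious disease, acute episode, moderate": (0.424, 0.051),
    "drowning": (0.079, 0.247),
    "anemia, severe": (0.040, 0.149),
    "generic uncomplicated disease: worry and daily medication": (0.016, 0.049),
}


def fold_difference(dw_japan: float, dw_gbd: float) -> float:
    """Ratio of the larger to the smaller DW (always >= 1)."""
    return max(dw_japan, dw_gbd) / min(dw_japan, dw_gbd)


def classify(dw_japan: float, dw_gbd: float) -> str:
    """Label the size band of the difference and which set of DWs is higher."""
    fold = fold_difference(dw_japan, dw_gbd)
    band = ">=3-fold" if fold >= 3 else ">=2-fold" if fold >= 2 else "<2-fold"
    direction = "Japan higher" if dw_japan > dw_gbd else "GBD higher"
    return f"{band}, {direction}"


for name, (jp, gbd) in REPORTED.items():
    print(f"{name}: {fold_difference(jp, gbd):.1f}x ({classify(jp, gbd)})")
```

All seven listed states fall in the three-fold band, four with higher Japanese DWs and three with higher GBD 2013 DWs, which is consistent with the text.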

Fig. 3

Comparison of Japanese disability weights and GBD 2013 disability weights: (a) all values; (b) zoomed in on values <0.2; (c) zoomed in on values <0.1; (d) zoomed in on values <0.05. The black line is the diagonal, representing equivalence between Japanese and GBD 2013 disability weights. The blue line represents a factor-of-two difference, and the red line represents a factor-of-three difference. abd mild: abdominopelvic problem, mild; abd mod: abdominopelvic problem, moderate; AMI: acute myocardial infarction, days 3–28; amput fings: amputation of finger(s), excluding thumb; amput limb Rx: amputation of one lower limb (long term, with treatment); amput thumb: amputation of thumb (long term); amput toe: amputation of toe; amput upp limb Rx: amputation of one upper limb (with treatment); amput upp limb, no Rx: amputation of one upper limb (long term, without treatment); anemia sev: anemia, severe; anx mild: anxiety disorders, mild; asthma cont: asthma, controlled; COPD, mild: COPD and other chronic respiratory problems, mild; caries: dental caries: symptomatic; depr mild: major depressive disorder, mild episode; diab foot: diabetic foot; disfig 1: disfigurement: level 1; disloc hip: dislocation of hip (long term, with or without treatment); disloc knee: dislocation of knee (long term, with or without treatment); disloc shoulder: dislocation of shoulder (long term, with or without treatment); blind: distance vision blindness; vision loss sev: distance vision, severe impairment; drown: drowning and nonfatal submersion (short or long term, with or without treatment); ear pain: ear pain; ESKD: end-stage kidney disease, on dialysis; # clav: fracture of clavicle, scapula or humerus (short or long term, with or without treatment); # face: fracture of face bone (short or long term, with or without treatment); # foot, long: fracture of foot bones (short term, with or without treatment); # hand, short: fracture of hand (short term, with or without treatment); # neck: fracture of neck of femur (long term, with treatment); # femur oth: fracture, other than femoral neck (short term, with or without treatment); # lower leg short: fracture of patella, tibia or fibula or ankle (short term, with or without treatment); # lower leg long: fracture of patella, tibia or fibula or ankle (long term, with or without treatment); # lower arm: fracture of radius or ulna (short term, with or without treatment); gen worry + med: generic uncomplicated disease: worry and daily medication; TTH: tension-type headache; heart fail mild: heart failure, mild; hearing mild + ring: hearing loss, mild, with ringing; hearing mild: hearing loss, mild; hearing mod + ring: hearing loss, moderate, with ringing; herp zost: herpes zoster; acute inf mod: infectious disease, acute episode, moderate; post-acute inf: infectious disease, post-acute consequences (fatigue, emotional lability, insomnia); inj nerve: injured nerves (short term); insom: insomnia; invas device: invasive device/drain; mastec: mastectomy; MSK low mild: musculoskeletal problems, lower limbs, mild; neck sev: neck pain, chronic, severe; wound: open wound (short term, with or without treatment); oth inj: other injuries of muscle and tendon (includes sprains, strains and dislocations other than shoulder, knee, hip); somat dis: somatoform disorder; stroke mod + cogn: stroke, long-term consequences, moderate plus cognition problems; minor TBI: traumatic brain injury, long-term consequences, minor (with or without treatment)

The distribution of differences between the Japanese and GBD 2013 DWs is presented in Additional file 1: figure 1, and health category-specific differences are shown in Additional file 1: figures 2–12. Marked differences were found in several health categories: Japanese DWs for injuries and for hearing and vision loss were generally higher than the GBD 2013 DWs, whereas DWs for mental, behavioral, and substance use disorders were generally higher in GBD 2013 than in Japan.

Among the 28 diseases and injuries with health states graded by severity, DWs were inconsistent with the severity gradient for four: infectious disease episodes, neck pain, abdominopelvic problems, and anemia. The moderate infectious disease episode had a higher DW (0.424 [95% UI 0.289–0.577]) than the severe episode (0.242 [0.163–0.340]); severe neck pain had a higher DW (0.169 [0.115–0.236]) than most severe neck pain (0.144 [0.099–0.200]); moderate abdominopelvic problem had a higher DW (0.382 [0.270–0.524]) than severe abdominopelvic problem (0.339 [0.235–0.458]); and moderate anemia had a higher DW (0.064 [0.040–0.092]) than severe anemia (0.040 [0.024–0.061]). All comparisons of conditions with several severity levels are presented in Additional file 1: figure 13.
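Checking for these inversions is mechanical. Within each condition, list the severity levels from least to most severe and flag any adjacent pair where the DW decreases. A sequence is non-decreasing exactly when every adjacent pair is, so checking adjacent pairs is enough. The sketch below uses the Japanese point estimates reported in this paragraph, plus the mild anemia value (0.004) reported earlier, and ignores uncertainty intervals. The function names are illustrative only.

```python
# Japanese DW point estimates reported in the text, ordered from least to most severe level.
GRADED = {
    "infectious disease episode": [("moderate", 0.424), ("severe", 0.242)],
    "neck pain": [("severe", 0.169), ("most severe", 0.144)],
    "abdominopelvic problem": [("moderate", 0.382), ("severe", 0.339)],
    "anemia": [("mild", 0.004), ("moderate", 0.064), ("severe", 0.040)],
}


def severity_inversions(levels):
    """Return adjacent (less severe, more severe) level pairs whose DW decreases."""
    return [(a[0], b[0]) for a, b in zip(levels, levels[1:]) if b[1] < a[1]]


def find_inconsistent(conditions):
    """Map each condition with at least one inversion to its inverted level pairs."""
    result = {}
    for name, levels in conditions.items():
        inversions = severity_inversions(levels)
        if inversions:
            result[name] = inversions
    return result


flagged = find_inconsistent(GRADED)
```

For anemia, the step from mild to moderate is consistent, and only the step from moderate to severe is flagged.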

The results of the regression analysis of key symptoms mentioned in the lay descriptions are shown in Additional file 1: table 2. Mental symptoms, substance use, and the residual category of other physical symptoms were significantly associated with a lower Japanese DW relative to the GBD 2013 DW, whereas pain and sensory symptoms were significantly associated with a higher Japanese DW. These findings were robust in sensitivity analyses excluding non-significant symptom categories.

Discussion

Disease burden research is primarily used as a decision-making tool to prioritize resource allocation at the population level, and incorporating the health perceptions of the public has been recommended to inform decision-making in democratic societies [3, 16,17,18]. In Japan, however, there has been no comprehensive assessment of health states based on valuations by the general population, and burden of disease assessment has relied on GBD DWs derived from surveys in other countries. We found considerable disagreement between Japanese and GBD DWs. Health states for injuries and for hearing and vision loss were valued as more severe, and mental and substance use disorders as less severe, in Japan. Health states with pain and sensory symptoms in their lay descriptions were valued significantly higher in our study, whereas those with mental symptoms, substance use, and a residual category of other physical symptoms had higher DWs in GBD.

Differences of estimated Japanese DW from the GBD 2013 DW

Like the GBD 2010 and European DW studies, the present study aimed to quantify the severity of health loss rather than general welfare loss. Many previous studies have shown clear contextual differences (such as socioeconomic status, ethnicity, and living environment) in how people perceive health problems and how such problems affect their lives [6, 19,20,21,22,23,24,25,26,27]. For instance, comparing pain detection thresholds, Komiyama et al. found that Japanese people were more sensitive to pain-related suffering than Belgians and Caucasians [22, 23].

Tsuchiya et al. pointed out that the EQ-5D instrument, a measure of health-related quality of life developed from health perspectives in European settings, does not necessarily assess the health-related quality of life of Japanese people adequately [28]. Gerlinger et al. reported that value sets of the EQ-5D-5L utility index varied substantially across Denmark, France, Germany, Japan, the Netherlands, Spain, Thailand, the UK, the US, and Zimbabwe, and argued that country-specific value sets should be used when analyzing multinational clinical trials to assess treatment effects on patients' health perceptions [29, 30].

Ustün et al. also showed significant differences in the disability ranking of health conditions across 14 countries (Canada, China, Egypt, Greece, India, Japan, Luxembourg, Netherlands, Nigeria, Romania, Spain, Tunisia, Turkey, and the UK) [6]. In that study, a total of 241 health professionals, policy makers, and patients subjectively ranked 17 health conditions from most to least disabling. In Japan, amputation and blindness were ranked relatively high compared with other countries, while major depression and drug dependence were ranked relatively low, analogous to our findings. In the present study, the Japanese DW was higher than the GBD 2013 DW for all states related to amputation, especially amputation of toe, which differed by a factor of 13. In the Ustün study, similar differences were found in China, but not in the UK, Canada, and other European countries. Likewise, a 2016 DW study in South Korea estimated DWs for injuries and for hearing and vision loss to be considerably larger than the GBD 2010 DWs; however, that study modified the study protocol, which compromises direct comparisons [7].

In the DW studies incorporated into GBD, paired comparisons of health states produced similar results across different cultural, educational, environmental, and demographic contexts [3, 5]. However, the majority of responses came from high-income countries and around a quarter from four low- and middle-income countries, raising concerns about the universality of the DW estimates. This study, with a sample size about two-thirds of the combined responses of the GBD 2010 and European DW studies, shows differences in DW values large enough to challenge the universality of DWs as applied in GBD. Only a few studies allow contextual examination of differences in DWs across a wide range of health states, and these were conducted in the more distant past using very different methods [7, 18, 19, 31]. There is a clear need for further comparable studies to address contextual differences. The differences between the GBD and Japanese DWs also raise the question of whether one set of disability weights should be used universally, as is now the practice in the GBD study, or whether country-specific DWs should be used in future iterations of the GBD. A universal set of DWs has the great advantage of allowing standardized comparisons of disease burden across countries, which is very useful for identifying drivers of success and failure in health improvement in specific countries. On the other hand, countries may choose their own disability weights to better reflect the preferences of their populations. We advise that the GBD incorporate our findings into a new joint analysis with all previous studies, thereby reducing the gap between the Japanese and previous GBD DWs in future iterations.

We also recommend that future DW studies cover populations that are not represented in the GBD 2013 DWs. The GBD 2013 DWs relied on samples from the original GBD 2010 DW study and the subsequent European study. The GBD 2010 DW study was based on surveys in four low- and middle-income countries and five high-income countries, supplemented by a web-based survey with respondents from many countries, the majority from North America, Australia, and Western Europe and, to a lesser extent, from China, India, Brazil, and South Africa. Data from most other countries were rather limited [3]. The surveys were also limited to respondents aged 18 years or older. The European DW study, meanwhile, included people aged 18 to 65 years from four European countries: Hungary, Italy, the Netherlands, and Sweden [5]. There is a large data gap for countries not covered by these studies, and we expect that future DW studies will address this gap, strengthening the methodological and empirical basis of the modeling framework for future iterations of the GBD.

Strengths and limitations

A strength of this study was the large number of respondents, which allowed detailed estimation of DWs within the Japanese context. The use of a web-based survey for data collection, however, was a limitation. Internet users tend to be more highly educated and younger than the general Japanese population, so our findings may not fully reflect the opinions of the Japanese population.

When viewed in the heat map, PC responses were highly consistent for each pair of health states. The 21.6% disagreement in the test-retest assessment was largely similar to the findings of previous studies [5]. However, for four of the 28 diseases and injuries with several health states of increasing severity, the DWs were inconsistent with the severity gradient. This inconsistency may be explained by Japanese translations of the lay descriptions that did not capture the intended differences in severity. Many approaches to improving the validity and reliability of translated questionnaires have been discussed in the literature [32,33,34,35]. Literature reviews propose that, in addition to linguistic equivalence (which we ensured with forward and backward translation), cultural adaptation of the original questionnaire needs to be explored [32,33,34]. They suggest pilot testing before survey launch to assess cultural equivalence, such as whether the written questionnaire items are understood and interpreted as intended, by interviewing representatives of prospective respondents, followed by evaluation of psychometric properties using different tests.

Responses to the PHE questions did not vary between health states that differed widely in the paired comparisons. The European DW study reported a similar lack of discernment in the PHE responses and speculated that the cognitive demands of the PHE questions were better suited to the GBD internet panel, which consisted largely of tertiary-educated health professionals, than to general population samples [3, 5]. Because the proportion of respondents with higher education in our sample was also substantially higher than in the general population (46.2% vs 16.1%), education alone seems unlikely to explain the difference; a more important reason for the greater success of the PHE questions in the GBD survey may be that its respondents were a self-selected group, evidently interested in the content of the survey, who participated voluntarily. This may have improved the signal-to-noise ratio in their responses. Our survey, by contrast, drew on participants randomly selected from an existing panel and given incentives (points), which may have affected the attention they paid to the intent of the study or the time they spent considering more complex questions. We conclude that PHE questions may not be suitable for a web-based survey of the general population because of the high cognitive demand of making a meaningful distinction between the two hypothetical health programs.

Conclusions

This study has provided an empirical basis for Japan-specific DWs. Despite a high correlation, considerable disagreement between Japanese DWs and GBD 2013 DWs was observed. Our findings suggest sizeable cultural differences in the perceived severity of key domains of ill health, with the Japanese assigning greater severity to pain and sensory loss but lower severity to mental and substance use disorders. For resource allocation decision-making in Japan, this set of DWs may therefore be more appropriate than the GBD DWs. However, for international comparisons of disease burden, it remains desirable to continue using a common set of DWs. For future rounds of the GBD study, we recommend a combined analysis of all previous GBD paired comparison results with this new information from Japan.