Introduction

When economic evaluations of healthcare technologies are performed, the incremental cost-effectiveness ratio (ICER) is regarded as a standard calculation. Various outcomes can be used as the denominator of the ICER, but the quality-adjusted life year (QALY) is widely applied across many areas of cost-effectiveness analysis. One reason is that quality of life (QOL) is one of the most important outcomes not only for medical interventions but also for healthcare policies. To calculate the QALY, the QOL score must be measured on a scale anchored at 0 (death) and 1 (full health). Preference-based measures, such as the EuroQol 5-dimension (EQ-5D) [1, 2], the Health Utilities Index (HUI) [3–5], and the Short Form 6-dimension (SF-6D) [6–8], have been developed to calculate QOL scores. These measures were originally developed in English but have been translated into many languages. Japanese value sets for the EQ-5D (3L [9] and 5L [10]) and the SF-6D [11] have also been developed.
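As a minimal sketch of how QOL scores enter these calculations, the following Python example computes QALYs as the sum of QOL score multiplied by the time spent in each health state, and an ICER as incremental cost divided by incremental QALYs. All function names, health-state sequences, and costs are hypothetical and are not taken from this study; discounting is omitted for simplicity.

```python
def qalys(states):
    """Sum QOL score x duration (years) over a sequence of health states."""
    return sum(score * years for score, years in states)


def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost per QALY gained (new versus old)."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return (cost_new - cost_old) / delta_qaly


# Hypothetical (QOL score, years) pairs; not data from this study.
standard_care = [(0.70, 2.0), (0.50, 1.0)]
new_treatment = [(0.80, 2.0), (0.60, 2.0)]

q_old = qalys(standard_care)
q_new = qalys(new_treatment)

# Hypothetical total costs in JPY.
ratio = icer(3_000_000, 1_200_000, q_new, q_old)
print(f"QALYs: {q_old:.2f} vs {q_new:.2f}; ICER: {ratio:,.0f} JPY per QALY")
```

Because the QOL score multiplies every year lived, a baseline below 1 in the general population directly lowers the QALYs that any intervention can be credited with.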

The mean QOL score in the general population is normally <1 because some people report a health state that is less than full health. People with diseases or symptoms are likely to continue living in their local community. Others may not report their health state as full health even if they do not have any diseases. Such reductions in QOL should be reflected in QALY calculations for economic evaluations. In addition, to interpret QOL scores obtained through a survey, it is important to compare them with the score for the general population as a reference value. Therefore, population norms for preference-based measures, which have previously been defined as “population reference data… for a specific country or international region” [12], are essential for both researchers and policymakers. The norms for these measures, especially for the EQ-5D-3L, have already been reported in many countries, including the UK [13], USA [14, 15], six European countries (Belgium, France, Germany, Italy, Netherlands and Spain) [16, 17], Spain (Catalonia) [18], Switzerland (French-speaking population) [19], Finland [20], Denmark [21], Portugal [22], Poland [23], Canada (Alberta) [24], Australia (Queensland) [25], China [26], Taiwan [27], Singapore [28, 29], Sri Lanka [30], and Brazil [31]. The population norms for the SF-6D have also been investigated in some countries, including the UK [32], USA [15], Australia [33], Portugal [34], and Brazil [35]. However, Japanese population norms for QOL scores do not currently exist, with the exception of surveys in three areas [12] that were originally performed to obtain a value set [9]. Few population norms for the EQ-5D-5L, a measure newly developed by the EuroQol Group, have been reported worldwide.

The population of Japan was about 127 million in 2015, and almost all of the population speaks Japanese. Therefore, Japanese versions of the EQ-5D-3L, EQ-5D-5L, and SF-6D are widely used for calculating QOL scores in Japan, and Japan’s economic evaluation guideline [36] recommends the use of measures with value sets developed in Japan. The Ministry of Health, Labour and Welfare (MHLW) of Japan has collected data on these measures based on our concept. They also collected responses to a questionnaire included in the National Livelihood Survey, which Japan’s MHLW performs annually. This questionnaire includes questions regarding disease types and subjective symptoms.

Therefore, the primary objective of this study was to analyze these data to obtain the population norms for the Japanese versions of three preference-based measures: the EQ-5D-3L, EQ-5D-5L, and SF-6D. The second objective was to examine the characteristics of each measure and the relations among the measures. We also aimed to present the relation between the QOL score for the general population and characteristics such as sex, age, diseases, symptoms, and other socio-demographic factors.

Methods

Sampling

Data for this study came from the MHLW survey, which drew a representative sample. In the survey, a total of 1000 adult respondents (aged ≥ 20 years) were targeted in a random sampling from 100 sites (municipalities). The method used to select the 100 sites was as follows: First, the number of sites in each of the 8 regions was determined in proportion to the population of each region. Then, in every region, the number of sites belonging to each stratum (prefecture × size of municipality) was calculated based on the population of the stratum. The surveyed districts (Cho-me, in Japanese) were randomly determined in a manner corresponding to the allocated number of sites in each stratum. Respondents were also randomly sampled from each selected district, stratified according to sex and age. People in a hospital or a nursing home were not included.
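The first allocation step can be sketched as follows. This Python example distributes 100 sites across the eight regions in proportion to population. The rounding rule (largest remainder) and the function name are assumptions made for illustration, because the survey's exact rounding procedure is not described; the regional shares are the 2013 population percentages reported in the Results, which may differ from the figures actually used for the allocation.

```python
def allocate_sites(shares, total_sites):
    """Allocate total_sites proportionally to shares (largest-remainder rounding).

    The rounding rule is an assumption; the survey's actual rule is not stated.
    """
    total_share = sum(shares.values())
    quotas = {region: total_sites * s / total_share for region, s in shares.items()}
    # Start from the integer part of each proportional quota.
    allocation = {region: int(q) for region, q in quotas.items()}
    leftover = total_sites - sum(allocation.values())
    # Hand out the remaining sites to the largest fractional remainders.
    by_remainder = sorted(
        quotas, key=lambda region: quotas[region] - allocation[region], reverse=True
    )
    for region in by_remainder[:leftover]:
        allocation[region] += 1
    return allocation


# Regional population shares (%) as reported in the Results for 2013.
region_shares = {
    "Hokkaido": 4.3, "Tohoku": 7.1, "Kanto": 33.5, "Chubu": 16.9,
    "Kinki": 17.8, "Chugoku": 5.9, "Shikoku": 3.1, "Kyushu": 11.4,
}

sites = allocate_sites(region_shares, 100)
for region, n in sites.items():
    print(f"{region}: {n} sites")
```

The same function could be reused within a region to split its sites across strata (prefecture × size of municipality), given the population of each stratum.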

The Basic Resident Register can be used to select respondents living on each street in a random manner. In Japan, each municipality has its own Basic Resident Register data, which includes information on the name, sex, address, and date of birth of all residents. Each municipality has permitted the use of such data for public surveys. A door-to-door survey was performed from January to March in 2013. Investigators visited the registered addresses and distributed the questionnaire. They then collected the questionnaires a few days later and checked for any apparent errors (placement method). These visits continued until the planned number of responses was collected for each district. The investigators obtained the informed consent of all the respondents.

Measures

Health status was measured using the EQ-5D-3L, EQ-5D-5L, and SF-6D. The respondents were presented with the EQ-5D-5L, EQ-5D-3L, and SF-6D (SF-36) in a fixed order. In addition, socio-demographic data for the respondents, such as sex, age, education, marital status, employment, and household income, were also collected.

The EQ-5D was developed by the EuroQol Group. The original version of the EQ-5D (now called the EQ-5D-3L) comprises five items: “mobility,” “self-care,” “usual activities,” “pain/discomfort,” and “anxiety/depression,” each assessed at three levels of description. To address the insufficient sensitivity and the ceiling effect of the EQ-5D-3L, the newly developed EQ-5D-5L [37] increased the number of levels for each health dimension from three to five.

The SF-6D is a measure for converting responses to the SF-36 (or SF-12 [38]) to a preference-based QOL score for economic evaluation. The SF-36 [39–41] is the most widely used measure for assessing health states in the world. Responses to selected items of the SF-36 can be classified according to descriptions of the SF-6D system, which consists of six dimensions [physical functioning (PF), role limitation (RL), social functioning (SF), bodily pain (BP), mental health (MH), and vitality (VT)] with four to six levels each (defining a total of 18,000 health states). As the direct use of the SF-6D questionnaire is not recommended, we used the Japanese SF-36, version 2 [42].

The questionnaire also included a part of the National Livelihood Survey, which Japan’s MHLW performs annually. The questionnaire asks respondents whether they have any diseases for which they consult a doctor and whether they have any subjective symptoms. If they answer “yes,” they must then select the most important diseases and symptoms that they exhibit from a list of forty symptoms (having a fever, feeling sluggish, sleeplessness, etc.) and diseases (diabetes, obesity, hyperlipidemia, etc.).

Statistical analysis

The responses obtained for the EQ-5D-3L, EQ-5D-5L, and SF-6D were first converted to QOL scores based on the Japanese value sets. Summary statistics for the QOL scores were calculated according to sex and age category (20–29, 30–39, 40–49, 50–59, 60–69, and 70 years and older). The percentage of people reporting any problem in each dimension was calculated after stratifying the subjects according to sex and age category. Chi-square tests (or the Fisher exact test if the expected frequency was low) were applied to test the association between the frequency of respondents with any problem and sex or age. The McNemar test was performed to compare the frequencies of respondents with any problem between the EQ-5D-3L and the EQ-5D-5L. The intraclass correlation coefficient (ICC) was used to assess the agreement among the three measures, in addition to the Bland–Altman plot [43]. In the Bland–Altman plot, the average of the two measures was plotted on the x-axis and the difference between the two measures on the y-axis to check for systematic errors.
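Two of these procedures can be illustrated with a short Python sketch: the quantities behind a Bland–Altman plot (pairwise means, differences, mean bias, and 95 % limits of agreement) and the McNemar chi-square statistic for paired yes/no responses. The analyses in this study were run in SAS; the function names, the example scores, the use of 1.96 standard deviations for the limits, and the absence of a continuity correction are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return pairwise means, differences, mean bias, and 95% limits of agreement."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two paired samples with at least two observations")
    means = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]   # x-axis
    diffs = [a - b for a, b in zip(scores_a, scores_b)]         # y-axis
    bias = mean(diffs)
    sd = stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def mcnemar_chi2(n_only_first, n_only_second):
    """McNemar chi-square (1 df, no continuity correction) from discordant counts.

    n_only_first:  pairs reporting a problem on the first instrument only
    n_only_second: pairs reporting a problem on the second instrument only
    """
    n_discordant = n_only_first + n_only_second
    if n_discordant == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    stat = (n_only_first - n_only_second) ** 2 / n_discordant
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical paired QOL scores for four respondents (not study data).
example_a = [1.0, 0.8, 0.6, 0.9]
example_b = [0.7, 0.7, 0.6, 0.6]
_, _, bias, limits = bland_altman(example_a, example_b)
print(f"bias = {bias:.3f}, limits of agreement = {limits[0]:.3f} to {limits[1]:.3f}")

# Hypothetical discordant counts (not study data).
stat, p = mcnemar_chi2(25, 5)
print(f"McNemar chi-square = {stat:.2f}, p = {p:.5f}")
```

A mean bias far from zero, or differences that grow or shrink along the x-axis, would point to a systematic disagreement between two measures rather than random scatter.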

To detect the influence of socio-demographic factors and diseases/symptoms on the QOL scores, these variables were added (in addition to sex and age) to an analysis of variance (ANOVA). Diseases and symptoms for which more than 10 respondents had responded positively or that had a significant influence on the QOL score were included in the above statistical model. The influence of each disease and symptom was estimated using an ANOVA that included all the pertinent variables. The significance level was set at 0.05. Statistical analyses were performed using SAS 9.4.

We compared the QOL scores of the respondents between those with any subjective diseases/symptoms and those without using an ANOVA model. The difference was interpreted as the between-group minimal important difference (MID) of each preference-based measure in the general population. The MID, which corresponds to the smallest improvement considered to be worthwhile by a patient, is normally measured using a distribution-based or anchor-based method. Reportedly, “anchor-based differences can be determined either cross-sectionally at a single time point or longitudinally across multiple time points” [44]. The former cross-sectional anchor-based method was applied to our data, as the diseases and subjective symptoms were regarded as the anchors for the between-group MID.
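As a simplified illustration of the cross-sectional anchor-based approach, the following Python sketch computes the unadjusted difference in mean QOL score between respondents without and with the anchor condition. It assumes a model containing only the anchor as a factor, in which case the ANOVA estimate reduces to a difference in group means; any covariate adjustment used in the actual analysis is not reproduced. The function name and scores are hypothetical.

```python
from statistics import mean


def between_group_difference(scores_without, scores_with):
    """Unadjusted difference in mean QOL score (without anchor minus with anchor)."""
    if not scores_without or not scores_with:
        raise ValueError("both groups need at least one score")
    return mean(scores_without) - mean(scores_with)


# Hypothetical QOL scores (not study data).
without_condition = [1.0, 0.9, 1.0, 0.8]
with_condition = [0.8, 0.9, 0.7, 1.0]

difference = between_group_difference(without_condition, with_condition)
print(f"between-group difference: {difference:.3f}")
```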

This analysis was approved by the Ethics Committee of the National Institute of Public Health.

Results

Socio-demographic factors

Table 1 shows the socio-demographic factors of the sampled respondents. In total, responses were collected from 1143 randomly sampled respondents. In 2013, 4.3 % of the Japanese population lived in the Hokkaido region, 7.1 % lived in Tohoku, 33.5 % lived in Kanto, 16.9 % lived in Chubu, 17.8 % lived in Kinki, 5.9 % lived in Chugoku, 3.1 % lived in Shikoku, and 11.4 % lived in Kyushu. The actual Japanese median household income was JPY 4.3 million, while the average was JPY 5.4 million in 2012. Married and unmarried people accounted for 61.1 and 22.8 % of the population, respectively. Overall, 19.1 % had graduated from university. Note that these statistics reflect the actual distribution of the population, whereas we sampled the same number of respondents from each age category. This means that the percentage of younger people in our sample was higher than that in the entire Japanese population. Based on the responses to the National Livelihood Survey, 48.2 % of the respondents had some disease for which they were consulting a doctor, while 48.6 % had some symptoms.

Table 1 Socio-demographic characteristics of respondents

QOL score and relation to socio-demographic factors

Table 2 shows the mean scores of the EQ-5D-3L, EQ-5D-5L, and SF-6D in the general population classified according to sex and age category. The QOL score measured using the SF-6D was significantly lower than those measured using the two EQ-5D measures. The ICC was 0.802 between the EQ-5D-3L and EQ-5D-5L, 0.249 between the EQ-5D-3L and SF-6D, and 0.234 between the EQ-5D-5L and SF-6D. The Bland–Altman plot between the EQ-5D-5L and SF-6D is shown in Fig. 1. This plot indicates that outliers (SF-6D scores that are higher than the EQ-5D scores) exist for lower QOL scores.

Table 2 Summary statistics of QOL scores

Fig. 1 Bland–Altman plot between EQ-5D-5L and SF-6D

Table 3 presents the results of the ANOVA, including socio-demographic factors. The QOL scores of people older than 60 years of age were significantly lower than those of younger people for all three measures. The QOL scores of women tended to be slightly lower than those of men. Considering other socio-demographic factors, a lower household income (<2 million JPY) was associated with a lower QOL score, even after adjustments for sex and age. A shorter education period was also associated with a lower QOL score, but the QOL scores did not differ among people who had received an education beyond high school. Marital status and employment pattern (full time, part time or self-employment) were not associated with the QOL score.

Table 3 Relation between QOL scores and socio-demographic characteristics

A comparison with the population norms for the EQ-5D-3L and SF-6D in other countries is shown in Fig. 2. The figure shows the relation between the mean QOL score of both sexes and the median age category based on published reports (EQ-5D-3L with country-specific value sets: Szende et al. [12], except Singapore [29]; SF-6D: the reports cited in the Introduction section). The Japanese population norms for the EQ-5D-3L tended to be lower than those in some countries (China, Korea, Singapore, and Germany) and higher than those in others (USA, UK, France, and Thailand). On the other hand, the Japanese SF-6D score was lower than those of all the other countries for which population norms are available (USA, UK, Australia, Portugal, and Brazil).

Fig. 2 Comparison of Japanese population norms with those of other countries. a EQ-5D-3L, b SF-6D

Percentage of respondents reporting full health

The percentage of respondents reporting full health was 68 % when measured using the EQ-5D-3L (80 % for subjects in their 20s, 78 % in their 30s, 75 % in their 40s, 74 % in their 50s, 60 % in their 60s, and 47 % in their 70s or older) and 55 % when measured using the EQ-5D-5L (70, 64, 55, 59, 47, and 38 % for the respective age categories); however, only 4 % (8, 3, 4, 6, 3, and 2 % for the respective age categories) reported full health when measured using the SF-6D (Fig. 3). Table 4 shows the percentages of respondents with any problem in each dimension of the EQ-5D-3L, the EQ-5D-5L, and the SF-6D. Among younger people’s responses for the EQ-5D, the percentages reporting problems with pain/discomfort and anxiety/depression were higher than those for the other dimensions, which mainly correspond to physical and/or social function. When both sexes were compared, the percentage of women with any problem in the pain/discomfort dimension was significantly higher than that for men regardless of age. In addition, the EQ-5D-5L detected more health problems than the 3L in almost all the dimensions, independently of the sex and age categories. Respondents chose a less than full health state on the SF-6D more frequently than on the EQ-5D. For example, in almost all the sex and age categories, approximately 50–70 % of respondents reported a problem in the pain dimension, 60–80 % in the mental health dimension, and 80–90 % in the vitality dimension.

Fig. 3 Percentage of respondents reporting full health

Table 4 Percentage of respondents reporting any problem in each dimension

Influence of diseases and symptoms on QOL score

Table 5 shows the relations between the QOL scores and both the diseases and symptoms that the respondents felt were the most important to them. Among the diseases, “depression or mental diseases,” “stroke,” and “rheumatoid arthritis” had the largest influence on the QOL score. These diseases decreased the QOL score by 0.15–0.2. On the other hand, “dyslipidemia,” “hypertension,” and “tooth disorder” had a minimal impact on the QOL score, although their prevalence was relatively high. When the prevalence of diseases was taken into account (decrease in the QOL score multiplied by the number of respondents), “depression or mental diseases,” “lumbago,” and “diabetes” were the top three diseases decreasing the QOL score at the general population level. The QOL scores of respondents with some symptoms, such as “sleeplessness,” “arthritic pain,” and “having trouble moving limbs,” were lower than those of respondents reporting other symptoms.
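The prevalence-weighted ranking described above (decrease in the QOL score multiplied by the number of respondents) can be sketched in Python as follows. The condition names, decrements, and counts are hypothetical placeholders and are not the values reported in Table 5; the function name is likewise an assumption.

```python
def rank_by_population_burden(conditions):
    """Rank conditions by QOL decrement x number of affected respondents.

    conditions maps a name to (mean QOL decrement, number of respondents).
    """
    burden = {
        name: decrement * count
        for name, (decrement, count) in conditions.items()
    }
    # Largest population-level loss first.
    return sorted(burden.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs (not the values from Table 5).
example_conditions = {
    "condition A": (0.18, 20),    # large decrement, low prevalence
    "condition B": (0.05, 120),   # small decrement, high prevalence
    "condition C": (0.02, 200),   # minimal decrement, very high prevalence
}

ranking = rank_by_population_burden(example_conditions)
for name, total_loss in ranking:
    print(f"{name}: {total_loss:.1f}")
```

In this illustration, the condition with the largest individual decrement ranks last at the population level, which is the same mechanism by which a common condition can outrank a severe but rare one.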

Table 5 Relation between QOL scores and diseases and symptoms

The differences in the QOL scores between respondents with and those without any diseases were 0.064 for measurements based on the EQ-5D-3L, 0.061 for measurements based on the EQ-5D-5L, and 0.073 for measurements based on the SF-6D; these values are regarded as the between-group MIDs in the general population. When symptoms were used in the same analysis, the differences were 0.093 for both the EQ-5D-3L and EQ-5D-5L and 0.112 for the SF-6D. Considering our results, the between-group MID can be estimated to range between 0.05 and 0.1 for all three measures.

Discussion

To our knowledge, this is the first study to examine the Japanese population norms of three preference-based QOL measures: the EQ-5D-3L, EQ-5D-5L, and SF-6D. Sampling was based on the Basic Resident Register data for each municipality. This sampling is regarded as one of the most rigorous and reliable methods in Japan. The reason for the differences in the QOL scores, compared with the population norms in other countries, is unclear; however, the differences may be influenced by (a) differences in actual health states, (b) differences in the value sets used in each country, and/or (c) differences in the degree of the ceiling effect or other characteristics. The ceiling effect of the EQ-5D-3L (especially for pain/discomfort among younger respondents) may be higher in the present study than in western countries [12]. Of note, a difference in the population norms does not necessarily indicate a difference in the respondents’ health states.

The results are shown stratified according to sex and age category. The QOL scores were significantly lower if the respondents were older than 60 years of age, were female, had a lower income, or had a shorter period of education. According to our results, a higher income was associated with a higher QOL score. The causal relation (whether poverty causes a poor health state or a poor health state is the cause of poverty) is unclear, but this finding may be useful for public health policies. This relation has also been observed in other countries. For example, in the USA [14], the QOL score as measured using the EQ-5D-3L was 0.81 for the poorest category (≤USD 10,000), whereas it was 0.92 for the richest (≥USD 75,000).

The percentage of reports of any health problem for the EQ-5D-5L was higher than that for the EQ-5D-3L in almost all the sex and age categories. Some authors have pointed out that the EQ-5D-3L has a ceiling effect, which is defined as “the proportion of respondents scoring ‘no problems’ on any of the five dimensions” [45], because the instrument lacks enough sensitivity. A three-level questionnaire allows respondents with a slightly worsened health state to be reported as having a full health state. Our result is one example of how the ceiling effect has been reduced by the revision of the EQ-5D-3L into the EQ-5D-5L. According to Table 2, the standard deviation of the QOL score measured using the EQ-5D-5L tended to be smaller than that measured using the EQ-5D-3L. This result may also arise from the increased number of levels, enabling respondents to choose intermediate levels.

Compared with the EQ-5D measures, the QOL score measured using the SF-6D was lower in the general population. Poor agreement between the EQ-5D and the SF-6D scores was observed, with low ICCs of 0.249 (EQ-5D-3L) and 0.234 (EQ-5D-5L). One cause seems to be clear, considering the percentages of respondents with full health as shown in Table 4. The percentage of people who chose no problem on the SF-6D was much lower than that for either EQ-5D measure. This result may be characteristic of the SF-6D and not specific to the Japanese population. In Australia [33], the proportions of respondents in the 18- to 30-year age category who reported any problem in each dimension were as follows: 32 % for PF, 23 % for RL, 39 % for SF, 60 % for BP, 49 % for MH, and 94 % for VT. On the other hand, a Bland–Altman plot indicated that most outliers (an SF-6D score that was higher than the EQ-5D score) occurred at lower QOL scores. These tendencies were similar to those reported by Kontodimopoulos et al. [46] in Greece. Thus, the SF-6D may have a floor effect [47], i.e., the lowest QOL score of the SF-6D (0.292) is higher than that of the EQ-5D-5L (−0.025).

The Japanese population norms for the SF-6D seem to be lower than those for other countries, although those for the EQ-5D-3L are similar to those of other countries (except Thailand). It is unclear whether this lower score is a result of the Japanese response pattern or the Japanese tariff for the SF-6D. According to these results, if the QOL score is used for economic evaluations, the interchangeability of the measures should be carefully considered [48–52], since the baseline scores of the general population differ between the Japanese EQ-5D and the SF-6D.

We analyzed the differences in the QOL scores between respondents with diseases/symptoms and those without diseases/symptoms to estimate the cross-sectional between-group MID of each measure. The anchor-based MID is more commonly measured longitudinally across multiple time points, which is closer to the definition of the MID. In the general population, however, repeated surveys are more difficult to perform than in clinical trials. Of note, our estimated score may not be the same as the intra-respondent MID. However, the between-group MID may be more useful when between-group differences are interpreted. Walters et al. [53] showed that the mean MID of the SF-6D was 0.041 and that of the EQ-5D-3L was 0.074 in a review of studies. In cancer patients, the MID of the EQ-5D was estimated to be 0.08 (UK score) and 0.06 (US score), and these values were anchored to the performance status and the FACT-G score [54]. According to a study examining post-traumatic stress disorder (PTSD), the MID was calculated as 0.05–0.08 (anchor-based method) and 0.04–0.10 (distribution-based method) [55]. Considering these scores, our MID is consistent with previous studies.

A limitation of this study was its relatively small sample size compared with other studies that identified population norms. We think that the sample size was sufficient to estimate the population norms according to sex and age category, considering that the results were interpretable and consistent with previous studies in other countries. However, a larger number of subjects may enable a clearer relation between the QOL score and diseases/symptoms to be identified. Furthermore, analyses of the effects of diseases with a low prevalence could not be performed. Another limitation is the order in which the three instruments were presented to the respondents. As the order was fixed, and not randomized, the possible influence of the order on the results cannot be excluded based only on our data.

In conclusion, we demonstrated the following characteristics of three preference-based measures: (a) the Japanese population norms according to sex and age category, (b) the relation between QOL scores and socio-demographic factors, (c) the agreement among the three measures in the general Japanese population, (d) the percentage of reports of any problem, (e) the influence of diseases/symptoms on the QOL scores, and (f) the between-group MID. The respondents were randomly sampled from all eight regions of Japan in a door-to-door survey, and the representativeness of the sample was considered to be good. The resulting information may be useful for calculating QALYs in economic evaluations and for research examining QOL scores.