Introduction

Japan has achieved the longest life expectancy in the world (2010 data: 79.64 and 86.39 years for men and women, respectively; [1, 2]). However, the quality of health status in Japan, i.e., health-related quality of life (HRQOL) among the entire Japanese population, has yet to be fully examined. Although several studies have attempted to investigate the trends in physical and mental health status using cohort studies, these often experienced major limitations in terms of data and health measurements [3–8]. Two major limitations of such studies are: (1) data that are not nationally representative in terms of sampling method and size of surveyed population; (2) health measures that are not clearly validated because they usually depend on a single domain of health status, such as self-rated health [9, 10], which can be easily affected by errors in the measurement of an individual’s characteristics.

Numerous international studies on HRQOL have measured health conditions by multi-dimensional questions [e.g. EuroQol (EQ-5D), the Health Utilities Index (HUI2/HUI3), Short Form (36) Health Survey (SF-36)] [11–13]. For example, EQ-5D consists of five sub-domains (attributes), namely, mobility, self-care, usual activities, pain/discomfort, and anxiety/depression, which represent the personal preferences for health outcomes; HUI2 has six sub-domains, namely, sensation, mobility, emotion, cognition, self-care, and pain, in which mental health constitutes one of the domains of HRQOL. Such composite HRQOL measures are available in representative national data sets compiled in many countries [14–16], but efforts to establish HRQOL measures and to follow up the trend of changes in HRQOL over time in Japan’s current and future nationally representative data have, in comparison, lagged behind.

Therefore, the aims of the study reported here were (1) to create multi-dimensional scales for physical, mental, and summary health in the context of HRQOL, and (2) to describe the age-related trends in these scales in the Japanese population, using the most recent nationally representative data on the Japanese population.

Methods

Study population

We utilized the best nationally representative data available, which was a cross-sectional sample of the Comprehensive Survey of the Living Conditions of People on Health and Welfare (LCPHW), conducted by the Japanese Ministry of Health, Labour, and Welfare (MHLW) in June 2007 [17]; permission for secondary use was obtained. A total of 5,440 regional clusters from 47 prefectures in Japan were randomly sampled, and 624,166 individuals who were at least 15 years of age at the time of survey (in 229,821 households living in the regional clusters) answered the questionnaire. Hospitalized or institutionalized individuals were excluded from the surveyed samples. The response rate was 79.9% (229,821 of 287,707 households). The study population was restricted to those who answered all of the key variables described below (240,421 respondents were excluded). Consequently, the study population for further analysis comprised 383,745 individuals.

Basic characteristics

Individual-level information was obtained on the basic characteristics of the study population for several socio-demographic factors in addition to age and gender. Marital status (married, never married, widowed, or divorced), occupation status (currently yes or no), house ownership (currently yes or no), and smoking behavior (currently smoke, never smoked, previously smoked) were measured for each individual (Table 1).

Table 1 Basic characteristics and health measures in the Comprehensive Survey of the Living Conditions of People on Health and Welfare (LCPHW; Kokumin Seikatsu Kiso Chosa) 2007, Japan (n = 383,745)

Physical, mental, and summary health measures

We created a summary health scale that was subsequently divided into two major sub-categories: physical and mental health statuses (Table 1). Although these measures were not identical to the widely used HRQOL measure EuroQol (EQ-5D), the sub-domains of which are mobility, self-care, usual activities, pain/discomfort, and anxiety/depression [11, 12], we chose relatively similar items from the health-related questions available in the LCPHW. General health status, which was chosen for our measure, is one of the major components of another measure, the SF-36 [12, 13].

For physical health, we utilized four self-reported items: general health status, bedridden status/mobility, self-care/usual activities, and pain. First, general health status was measured by asking, “How is your current health status: excellent, very good, good, fair, or poor?” We created a discrete variable (4 if excellent; 3 if very good; 2 if good; 1 if fair; 0 if poor). Second, bedridden status/mobility was measured by asking, “How often have you been bed-ridden because of health-related problems for the previous 1 month: never, 1–3 days, 4–6 days, 7–14 days, 15 days or more?” For this item, we created a discrete variable (4 if never; 3 if 1–3 days; 2 if 4–6 days; 1 if 7–14 days; 0 if 15 days or more). Third, self-care/usual activities were ascertained by asking, “Do you have any of the difficulties below in your daily life due to your physical health conditions? yes or no for each: (1) daily movement (e.g., getting out of bed, getting dressed, eating, or bathing); (2) going outdoors; (3) working, doing housework, or studying; (4) exercise or sports.” For this item, we also created a discrete variable (4 if no difficulty; 3 if one difficulty; 2 if two difficulties; 1 if three difficulties; 0 if all four difficulties). Fourth, we measured four kinds of pain in different parts of the body (headache, abdominal pain, back pain, extremity pain), for which we also created a discrete variable (4 if no pain; 3 if pain in one location; 2 if in two locations; 1 if in three locations; 0 if in all four locations). Finally, we summed the scores for these four items to represent physical health (0 = worst score, 16 = best score).
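The scoring rules for the four physical health items can be illustrated with a minimal Python sketch. It is not the code used in the study (the analyses were run in Stata); the function names, dictionary names, and response labels are hypothetical and do not correspond to LCPHW variable names.

```python
from typing import Sequence

# Hypothetical response labels mapped to item scores (0-4, higher = better).
GENERAL_HEALTH = {"excellent": 4, "very good": 3, "good": 2, "fair": 1, "poor": 0}
BEDRIDDEN_DAYS = {"never": 4, "1-3 days": 3, "4-6 days": 2,
                  "7-14 days": 1, "15 days or more": 0}


def count_to_score(n_problems: int) -> int:
    """Map a count of reported problems (0-4) to an item score (4-0)."""
    if not 0 <= n_problems <= 4:
        raise ValueError("expected between 0 and 4 reported problems")
    return 4 - n_problems


def physical_health(general: str, bedridden: str,
                    difficulties: Sequence[bool], pains: Sequence[bool]) -> int:
    """Sum the four item scores: 0 = worst score, 16 = best score."""
    if len(difficulties) != 4 or len(pains) != 4:
        raise ValueError("expected four yes/no answers for difficulties and pains")
    return (GENERAL_HEALTH[general]
            + BEDRIDDEN_DAYS[bedridden]
            + count_to_score(sum(difficulties))
            + count_to_score(sum(pains)))


# One made-up respondent: good health, bedridden 1-3 days,
# one daily-life difficulty, pain in two locations.
example = physical_health("good", "1-3 days",
                          [False, False, True, False],
                          [False, True, False, True])
print(example)  # 2 + 3 + 3 + 2 = 10
```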

For mental health, we used the Kessler-6 scale (K6), which measures psychological distress based on answers to six questions. The K6 has been widely used around the world [18, 19], and a Japanese version has also been validated (K6, a discrete variable ranging from 0 to 24) [20, 21]. We created a modified K6 to represent mental health so that higher scores indicated better conditions, in line with the scales for sub-domains in physical health [i.e., 24–0 was converted into 0–4 proportionally; (mental health) = 4 − (original K6)/6]. We then created a summary health scale by simply combining the figures for physical and mental health (physical health + mental health: 0 = worst score, 20 = best score).
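As a worked illustration of the conversion described above, the following Python sketch rescales a K6 total and adds it to a physical health score. The function names are hypothetical, and the sketch assumes that the K6 total (0–24) and the physical health score (0–16) have already been computed.

```python
def mental_health(k6_total: int) -> float:
    """Rescale the K6 total (0 = no distress, 24 = worst) to 0-4, higher = better."""
    if not 0 <= k6_total <= 24:
        raise ValueError("K6 total must lie between 0 and 24")
    return 4 - k6_total / 6


def summary_health(physical: int, k6_total: int) -> float:
    """Physical health (0-16) plus rescaled mental health (0-4): 0 = worst, 20 = best."""
    if not 0 <= physical <= 16:
        raise ValueError("physical health must lie between 0 and 16")
    return physical + mental_health(k6_total)


print(mental_health(6))       # 4 - 6/6 = 3.0
print(summary_health(10, 6))  # 10 + 3.0 = 13.0
```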

Statistical analysis

We checked the inter-item reliability (internal consistency reliability) of physical health (four items), mental health (six items: six questions in K6), and summary health (five items) using Cronbach’s α. To validate the physical and summary health scales, we calculated the areas under the receiver operating characteristic (ROC) curve (AUC) for diagnosed illnesses as the external criteria (any diagnosed co-morbidities with physician management: yes or no). We also calculated the AUCs of components of the summary health scale (general health status, bedridden status/mobility, self-care/usual activities, and pain) and compared these AUCs with those of physical and summary health. Validation of the K6 would ideally use strict diagnostic results based on a structured interview (i.e., 30-day Diagnostic and Statistical Manual of Mental Disorders; [21]) as the external criteria. Although an internationally well-established methodology for validating the K6 already exists, the required data are not available in the LCPHW. In this study, therefore, we decided not to perform a validation test for K6.
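The two statistics named above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Stata procedure used in the study: the function names are hypothetical, the data are made up, Cronbach’s α is computed from sample variances, and the AUC is computed as the proportion of (ill, not ill) pairs in which the ill respondent has the lower health score, with ties counted as one half.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def auc_for_illness(scores, has_illness):
    """AUC of a health score (higher = better) for detecting diagnosed illness.

    Equals the share of (ill, not ill) pairs in which the ill respondent
    has the lower score; tied pairs count as 0.5.
    """
    cases = [s for s, ill in zip(scores, has_illness) if ill]
    controls = [s for s, ill in zip(scores, has_illness) if not ill]
    if not cases or not controls:
        raise ValueError("need at least one ill and one non-ill respondent")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score < control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Made-up item scores (four physical health items) for five respondents.
items = [[4, 4, 4, 4], [3, 4, 4, 3], [2, 3, 4, 2], [3, 2, 3, 3], [1, 1, 2, 0]]
illness = [False, False, True, False, True]
print(cronbach_alpha(items))
print(auc_for_illness([sum(row) for row in items], illness))
```

The pairwise loop is quadratic in the number of respondents; for a sample of the size analyzed here, a rank-based calculation or a statistical package would be the practical choice.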

We described the age-related trend of physical, mental, and summary health among the study population using the developed/evaluated scales, stratified by gender. We reported conventional two-sided p values without adjustment for multiple testing. All of the analyses were performed using Stata/IC ver. 11.2 (StataCorp LP, College Station, TX).

Results

The basic characteristics of the study participants were similar to those given in the governmental report of LCPHW [17]. Briefly, the majority of participants were married, were working, owned their house, had employee’s health insurance, were not currently smoking, and were not currently receiving healthcare (Table 1). While the means of the four sub-domains (bedridden status/mobility, self-care/usual activities, pain, and mental health) ranged from 3.5 to 4, the mean of general health status was less than 3 (mean 2.45). Men reported better scores in physical, mental, and summary health than women.

The reliability test results revealed that Cronbach’s α was 0.64 for physical health, 0.90 for K6, and 0.67 for summary health among the entire population. For validity testing, the AUC for diagnosed illnesses was 0.72 [95% confidence interval (CI) 0.72–0.72] for physical health and 0.71 (95% CI 0.70–0.71) for summary health, as compared with 0.68 for single-item general health status (95% CI 0.68–0.68), 0.55 for bedridden status/mobility (95% CI 0.55–0.55), 0.60 for self-care/usual activities (95% CI 0.60–0.61), and 0.62 for pain (95% CI 0.62–0.62). These results illustrate that the three health measures that we created were better than the single-item results for general health status (self-rated health) and other sub-domains.

Figure 1 shows the trends in physical, mental, and summary health measures by different age-groups and gender. While physical and summary health measures declined monotonically with increasing age, mental health peaked around age 65–74 years. After 65–74 years of age, mental health declined with increasing age. The trends were the same among men and women.

Fig. 1

Physical, mental, and summary health by age-groups. Top physical health (0 = worst score, 16 = best score), middle mental health (0 = worst score, 4 = best score), bottom summary health (physical health + mental health: 0 = worst score, 20 = best score)

Discussion

We examined the reliability and validity of the physical, mental, and summary health scales as measures of HRQOL in the Japanese population using a nationally representative sample from a 2007 survey. The reliability is debatable and should be subjected to further empirical analysis, but the validities of the physical health and summary health measures were within a statistically acceptable range (better than the single-item self-rated health measure).

Our study identified several interesting age-related trends in physical and mental health. Physical health of our sample decreased monotonically with increasing age; however, the slope of the decline was shallower in the younger generation (age 15–64 years) and steeper in the older generation (age 65+ years). This trend could be related to the presence of multiple diseases in the older generation [22]. Mental health peaked at age 65–74 years and sharply declined after age 75–84 years. These results suggest that mental health and health-related well-being are rated as “best” after the mandatory retirement age of 60 years (in Japan), probably due to emancipation from demanding labor, child-bearing, and care-giving (parents/parents-in-law) activities, which mainly fall between age 45 and 65 years. These possible explanations for our results should be tested in future research using the LCPHW.

There are three major limitations to this study. First, missing values of physical health and mental health (K6) may have influenced the results because many participants (15.8% in the entire population) did not respond to the questions of the K6 in the LCPHW and missing observations may not have been random. We excluded these individuals from our analysis, which also could have affected the results in terms of the reliability and validity of these three health measures. In further studies, we may apply an imputation technique to our health measures [23].

Second, the internal consistency reliability of the physical and summary health scales was not sufficiently high. Also, the external criteria for the validity tests are not adequate to support our results. These could be the most significant shortcomings of the scales. Nevertheless, the former shortcoming may reflect the multi-dimensional (multi-attribute) structure of these scales. When each sub-domain/question is a directly correlated measure of the latent variable (e.g., K6 for psychological distress), Cronbach’s α could be required to be very high (e.g., >0.8) [24]. In contrast, the multi-attribute sub-domains in the physical and summary health measures could overlap (sharing the same latent variable) without being identical, which suggests that Cronbach’s α does not have to be very high. As for the latter concern, further validation tests require applying our methodology to other data, including stricter diagnostic results based on a structured interview.

Third, the summary health scale, which integrates the physical and mental health scales in this study, remains questionable and should be more carefully examined. We simply summed the physical health scale [0 (worst score) to 16 (best score)] and mental health scale [0 (worst score) to 4 (best score)], yielding the summary health scale [0 (worst score) to 20 (best score)]. We followed this procedure because we chose the HRQOL scale weights on each sub-domain based on multi-attribute utility theory [12, 25] and also followed another study’s scale development based on internationally compatible U.S. datasets (Health and Retirement Survey) [16]. However, this simple summation (0–16 + 0–4) cannot always be justified because the assumption that the contribution of the physical health scale is fourfold greater than that of the mental health scale toward “overall” health status is not necessarily acceptable. Therefore, for future studies, we propose two alternative ways of calculation: (1) physical health scale (range 0–16) + 4 × mental health scale (range 0–16), and (2) the weight on each sub-domain based on standard gamble or time trade-off methods in cost-utility analysis [12].
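The difference between the simple summation and alternative (1) can be made concrete with a short Python sketch. The function names are hypothetical, and the 0–32 range of the reweighted scale is our own arithmetic (16 + 4 × 4) rather than a figure reported in the study.

```python
def summary_simple(physical: float, mental: float) -> float:
    """Simple summation used in this study: range 0-20."""
    return physical + mental


def summary_reweighted(physical: float, mental: float) -> float:
    """Alternative (1): mental health (0-4) multiplied by 4, so range 0-32."""
    return physical + 4 * mental


# Two made-up extreme profiles: perfect physical health with the worst mental
# health, and the reverse. Scores are shown as a share of each scale's maximum.
for physical, mental in [(16, 0.0), (0, 4.0)]:
    simple_share = summary_simple(physical, mental) / 20
    reweighted_share = summary_reweighted(physical, mental) / 32
    print(f"physical={physical}, mental={mental}: "
          f"simple={simple_share:.2f}, reweighted={reweighted_share:.2f}")
# First profile:  simple=0.80, reweighted=0.50
# Second profile: simple=0.20, reweighted=0.50
```

Under the simple summation, the two profiles sit at 80% and 20% of the maximum; under alternative (1), both sit at 50%, which reflects equal contributions of the physical and mental scales.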

Our health scales with the LCPHW datasets have several practical strengths. First, the data that we used included nationally representative samples, so the generalizability problem, the major issue of analyses involving community (or convenience) samples, is unlikely to appear. Second, almost all of the variables that we used in this study, including the health measures, are available for, and compatible with, the LCPHW in different years (1989, 1992, 1995, 1998, 2001, 2004, 2007, and 2010); the exception is the K6, which has been asked only since 2007. Although analyzing repeated cross-section or pseudo-cohort datasets requires knowledge of methods such as difference-in-difference estimation or multilevel analysis [26–28], the development of reliable health measures will provide physicians, epidemiologists, economists, researchers from various academic fields, and policy-makers with the means to analyze the socio-demographic trends in health status and health disparity in Japan for the past 20 years (1989–2010) both consistently and thoroughly.

In conclusion, further use of the physical and summary health scales reported here in the Japanese population requires further discussion, although the K6 was an excellent measure of mental health in the LCPHW. Future research should focus on confirming and improving the reliability and the validity of these measures.