Depressive disorders are a major health problem in Japan. Depressive mood is associated with suicide in middle-aged workers [1], and the number of suicides has increased as economic conditions have worsened since 1998 [2]. Nonetheless, there are few studies of the prevalence of depression or of depressive symptoms in communities in Japan [3, 4].

To assist in detecting depression or depressive symptoms, many screening questionnaires have been developed. Some of these have 20 to 30 items, take only a few minutes to complete, use the number of symptoms as the score, and detect depressive states well. Even shorter instruments that still detect depressive states well have also been developed [5–7]. One such questionnaire is the five-item version of the Mental Health Inventory (MHI-5) [6, 7]. The MHI-5 is used as the "Mental Health" domain of the Medical Outcomes Study 36-Item Short Form Health Survey (SF-36). The SF-36 has been translated into Japanese [8], and the Japanese version has been validated for use in the general population of Japan [9], but the performance of the MHI-5 has not been evaluated in detail. In addition, two of the items in the MHI-5 are almost identical to two items in a scale developed to measure anxiety [10]. We hypothesized that removing those two anxiety-related items would result in a scale (the MHI-3) that performs as well as the MHI-5 in detecting symptoms of depression.

In this study, we compared the Japanese versions of the MHI-5 and MHI-3 with the 20-item Zung Self-rating Depression Scale (ZSDS) [11] and assessed their performance in detecting depressive symptoms in the general population.


Setting and participants

We used data that had been collected previously for a study that validated the Japanese version of the SF-36 and calculated national norm scores for all subscales of the SF-36 [8, 9]. Details of the nationwide survey have been described previously [9]. Briefly, a total of 4500 people 16 years old or older were selected from the entire population of Japan by stratified random sampling in 1995. A self-administered questionnaire was mailed to the subjects, who were then visited to collect the completed questionnaires. The questionnaire included the SF-36, the ZSDS [11] (described below), and questions about demographic characteristics.

The ZSDS consists of 10 positively worded items and 10 negatively worded items asking about symptoms of depression. Several studies have established the ZSDS as a reliable and valid instrument for measuring depressive symptoms [12–14]. The ZSDS scores were used to define four categories of the severity of depression: within normal range or no significant psychopathology (below 40 points); minimal to mild depression (40–47 points); moderate to marked depression (48–55 points); and severe to extreme depression (56 points and above). These score ranges result from the studies of Zung [15] and Barrett et al. [16]. The ZSDS has been translated into Japanese, and studies of the validity of the Japanese version have been published [17]. Because the ZSDS is not a clinical diagnostic tool, subjects with high scores are said to have depressive symptoms rather than "depression."
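The four score ranges above can be written as a simple lookup. The following is a minimal sketch, not the procedure used in the study; the function name and the short category labels are illustrative assumptions.

```python
def zsds_category(score: int) -> str:
    """Map a ZSDS score to one of the four severity categories in the text.

    The thresholds (40, 48, 56) come from the score ranges given above;
    the short labels returned here are illustrative, not official terms.
    """
    if score < 40:
        return "normal"    # within normal range, no significant psychopathology
    if score <= 47:
        return "mild"      # minimal to mild (40-47 points)
    if score <= 55:
        return "moderate"  # moderate to marked (48-55 points)
    return "severe"        # severe to extreme (56 points and above)


if __name__ == "__main__":
    # Hypothetical scores chosen to fall on the category boundaries.
    for s in (39, 40, 47, 48, 55, 56):
        print(s, zsds_category(s))
```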

Like the rest of the SF-36, the MHI-5 was administered as a paper-and-pencil questionnaire. The instrument contains the following questions: 'How much of the time during the last month have you: (i) been a very nervous person?; (ii) felt downhearted and blue?; (iii) felt calm and peaceful?; (iv) felt so down in the dumps that nothing could cheer you up?; and (v) been a happy person?' For each question the subjects were asked to choose one of the following responses: all of the time (1 point), most of the time (2 points), a good bit of the time (3 points), some of the time (4 points), a little of the time (5 points), or none of the time (6 points). Because items (iii) and (v) ask about positive feelings, their scoring was reversed. The score for the MHI-5 was computed by summing the scores of each question item and then transforming the raw scores to a 0–100-point scale [18].
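The scoring steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the official scoring algorithm [18]: the function and variable names are hypothetical, and the 0–100 transformation is assumed to be the usual linear rescaling of the raw sum (range 5–30).

```python
ITEMS = ("i", "ii", "iii", "iv", "v")
REVERSED_ITEMS = {"iii", "v"}  # positively worded items, scored in reverse


def mhi5_score(responses: dict) -> float:
    """Return an MHI-5 score on a 0-100 scale.

    `responses` maps each item label to its response points
    (1 = all of the time ... 6 = none of the time).
    Assumption: a linear rescaling of the raw sum (5-30) to 0-100.
    """
    raw = 0
    for item in ITEMS:
        points = responses[item]
        if not 1 <= points <= 6:
            raise ValueError(f"item {item}: response must be 1-6, got {points}")
        if item in REVERSED_ITEMS:
            points = 7 - points  # reverse so that higher always means better
        raw += points
    return (raw - 5) * 100 / 25


if __name__ == "__main__":
    # Hypothetical respondent who answers "a good bit of the time" throughout.
    print(mhi5_score({"i": 3, "ii": 3, "iii": 3, "iv": 3, "v": 3}))
```

With this orientation, higher scores indicate better mental health, so lower scores point toward depressive symptoms.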

Items (i) and (iii) are almost identical to two items in the Zung Self-rating Anxiety Scale [10]. To make a scale that is even shorter than the MHI-5 and is focused on depression, we removed those two anxiety-related items. Thus, the MHI-3 comprised only items (ii), (iv), and (v) above. Possible scores on the MHI-3 ranged from 3 to 18 points.
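A minimal sketch of the MHI-3 raw score follows. It assumes that item (v), being positively worded, is reverse-scored exactly as in the MHI-5; the function name is hypothetical.

```python
def mhi3_score(ii: int, iv: int, v: int) -> int:
    """Return the MHI-3 raw score (3-18) from items (ii), (iv), and (v).

    Each argument is the response in points (1 = all of the time ...
    6 = none of the time). Assumption: item (v) is reverse-scored,
    as described for the MHI-5.
    """
    for points in (ii, iv, v):
        if not 1 <= points <= 6:
            raise ValueError(f"response must be 1-6, got {points}")
    return ii + iv + (7 - v)


if __name__ == "__main__":
    # Hypothetical respondent: rarely downhearted, usually happy.
    print(mhi3_score(ii=5, iv=6, v=2))
```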

Statistical methods

First, we computed Pearson's correlation coefficients between the ZSDS scores and the scores on the MHI-5 and the MHI-3. We then computed the sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve. Analysis of ROC curves has been described in detail, and ROC analysis is used extensively in health-related diagnostics [19, 20]. ROC analysis can be used to study the performance of diagnostic or screening tests across a wide range of sensitivities and specificities. For example, it can be used to compute the sensitivity (the true-positive rate) and specificity (the true-negative rate) for any specified test score. The area under the ROC curve (AUC) is an index of the amount of information the test provides over its entire scoring range [21, 22]. In general, an AUC can range from 0.5, which indicates a test with no information, to 1.0, which indicates a perfect test.

The "gold standard" criteria for diagnosing depression are considered to be those of the Diagnostic and Statistical Manual of Mental Disorders (DSM) [7]. In this study, because we could not interview all subjects, we used scores on the ZSDS instead. For each of the three levels of severity of depressive symptoms (all of which correspond to ZSDS scores of 40 or higher), we computed the AUC of each of the five items, the MHI-5, and the MHI-3.

To define the cut-off points, we considered each observed MHI-5 score as a possible cut-off point. For each score, we took the sum of the sensitivity and the specificity. The score with the highest sum was used as the cut-off point. One cut-off point was determined for each of the three levels of severity defined by ZSDS scores (mild, moderate, and severe).
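The cut-off rule and the AUC described above can be sketched as follows. This is an illustration, not the software used in the study. It assumes that a screen is positive when the score is at or below the cut-off (lower MHI scores indicating more symptoms), that ties in the sensitivity-plus-specificity sum go to the lowest cut-off, and that the AUC is computed by its rank-based equivalent (the probability that a randomly chosen case scores lower than a randomly chosen non-case). All names and the toy data are hypothetical.

```python
def sensitivity_specificity(scores, is_case, cutoff):
    """Sensitivity and specificity when 'score <= cutoff' counts as positive."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s <= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s > cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s > cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, is_case):
    """Try every observed score as a cut-off; keep the one that maximizes
    sensitivity + specificity (first, i.e. lowest, cut-off wins on ties)."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, is_case, cutoff)
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


def auc(scores, is_case):
    """Area under the ROC curve via pairwise comparison of cases and
    non-cases; a tie between a case and a non-case counts as one half."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    wins = sum(
        1.0 if a < b else 0.5 if a == b else 0.0
        for a in cases
        for b in controls
    )
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical toy data: MHI-5 scores and ZSDS scores for 8 respondents.
    mhi5 = [20, 40, 52, 60, 68, 72, 80, 92]
    zsds = [60, 58, 50, 57, 45, 38, 30, 25]
    severe = [z >= 56 for z in zsds]  # reference standard: severe symptoms
    print(best_cutoff(mhi5, severe))
    print(auc(mhi5, severe))
```

The pairwise AUC is quadratic in the number of respondents, which is acceptable for a sketch; a rank-based implementation would be preferable for a sample of several thousand.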


The nationwide survey targeted 4500 people, and 3395 (male: 1704; female: 1691) responded to the questionnaire (75% response rate). Of these 3395 individuals, 3107 (male: 1573; female: 1534) completed all of the items on the ZSDS. The mean score on the MHI-5 was 72.8 (SD = 19.1). The mean scores on the MHI-5 for respondents of different demographic categories are shown in Table 1. These mean scores ranged from 68.5 to 76.6. Almost 23% of the respondents had ZSDS scores indicating mild depressive symptoms, 12% had scores indicating moderate depressive symptoms, and 2% had scores indicating severe depressive symptoms.

Table 1 MHI-5 scores by demographic categories

The correlations of ZSDS scores with MHI-5 scores and with MHI-3 scores were similar: -0.63 and -0.61, respectively. These correlation coefficients were almost the same whether or not the data were stratified by age and sex (Table 2).

Table 2 Correlations of ZSDS scores with MHI-5 and MHI-3 scores, by demographic category

With ZSDS scores as the basis for classifying depressive symptoms, ROC analysis allowed us to evaluate the performance of the MHI-5 and the MHI-3. The AUC values are shown in Table 3, and other performance characteristics are shown in Table 4. We also evaluated the performance of each of the MHI-5 question items individually (Table 3). For the individual items, the range of "cut-off scores" was determined by the range of each question's response options: from "none of the time" to "all of the time." The best-performing item for detecting severe depressive symptoms was the one asking about the frequency of "feeling downhearted and blue". That item had a sensitivity of 0.88 and a specificity of 0.77 (based on a score of 4 points or less). The AUC of the MHI-3 was only slightly lower than that of the MHI-5 (Figure 1).

Table 3 ROC analysis of individual MHI-5 items, the whole MHI-5, and the MHI-3, by severity of depressive symptoms
Table 4 Performance of the MHI-5 and MHI-3 for detecting depressive symptoms
Figure 1 ROC curves of the MHI-5 and MHI-3 for detecting severe depressive symptoms (ZSDS above 55).

Using the MHI-5, the prevalence of severe depressive symptoms (cut-off: 52 points) was 17%, that of moderate or severe depressive symptoms (cut-off: 60 points) was 28%, and that of mild, moderate, or severe depressive symptoms (cut-off: 68 points) was 40%.


These data show that the MHI-5 and MHI-3 scores were each correlated with the ZSDS score and agreed well with the ZSDS as screening instruments in the general population of Japan. We also found that the MHI-3 performs almost as well as the MHI-5. The best-performing single item was the one asking about "feeling downhearted and blue," which was also the case in the US [6]. The usefulness of the MHI-5 is consistent with results of a study done in the US [6]. Each scale and each item performed best as a detector of severe depressive symptoms, but each also contributed some information even for detecting moderate and mild depressive symptoms (Table 3). Both scales performed better than did any item alone.

Because prevalence affects positive predictive value, the latter was lowest for severe depressive symptoms and highest for mild, moderate, or severe depressive symptoms combined (Table 4). For all levels of symptom severity, the positive predictive values of the MHI-3 were similar to those of the MHI-5, and for severe depressive symptoms they were nearly identical (10.8% and 10.4%) (Table 4).
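The dependence of positive predictive value on prevalence follows from Bayes' rule and can be sketched as below. The sensitivity and specificity used in the example are hypothetical placeholders, not values from Table 4; the three prevalences are the approximate ZSDS-based proportions reported above (2% severe, 14% moderate or severe, 37% any depressive symptoms).

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a screen-positive person is a true case (Bayes' rule)."""
    true_pos = sensitivity * prevalence                # cases who screen positive
    false_pos = (1 - specificity) * (1 - prevalence)   # non-cases who screen positive
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical test characteristics, held fixed to isolate the
    # effect of prevalence; they are NOT the study's results.
    sens, spec = 0.90, 0.80
    for prevalence in (0.02, 0.14, 0.37):
        ppv = positive_predictive_value(sens, spec, prevalence)
        print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}")
```

With sensitivity and specificity held fixed, the positive predictive value rises as prevalence rises, which is the pattern described for Table 4.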

A previous study showed that the prevalence of mood disorders (major depression, bipolar disorders, and dysthymia) as measured using the DSM criteria in Japanese people 20 years old and older was 3.1% [4]. On the other hand, 37% of the sample in the present study had mild, moderate, or severe depressive symptoms as measured using the ZSDS. People in whom depression is diagnosed using the DSM criteria probably make up only a small fraction of those who report at least some depressive symptoms. In a previous study that also used the ZSDS, the prevalence of mild depressive symptoms among Japanese male workers was 45% [23], which is similar to that in our study.

In addition to its performance as shown in the present ROC analysis, an advantage of the MHI-5 may be the fact that it is part of the SF-36. The reason is that the possibility of a Hawthorne-type effect (i.e. an effect on study participants that results from their knowing that they are being studied) can be an obstacle to screening for depressive state. Specifically, the subjects' responses on a mental-health screening instrument may be affected by their knowledge that they are subjects in a study of mental health. Embedding the mental-health screening instrument in a more general survey, as the MHI-5 is embedded in the SF-36, could help minimize any such effect.

While the results of this study may be useful for public-health purposes, surveys done in primary-care settings could provide information that is more directly applicable to clinical work. Also, it should be kept in mind that ZSDS scores alone cannot be used to diagnose clinical depression. Studies using psychiatrist-diagnosed depression in addition to ZSDS scores would provide further information about the utility of the Japanese version of the MHI-5.

Another limitation is that the data set was obtained from a 1995 survey. Further studies are needed to confirm the performance of the MHI-5 and MHI-3 using data obtained in recent years.

In conclusion, the MHI-5 and MHI-3 scores were correlated with the ZSDS score, and can be used to identify people with depressive symptoms in the general population of Japan.