Recent studies of disease burden have demonstrated the importance of psychological disorders. For instance, depression was the fourth leading cause of disease burden, accounting for 4.4% of total disability-adjusted life years in the world in 2000 [1]. The 12-item General Health Questionnaire (GHQ-12) has been widely used in many countries for detecting psychological morbidity. Some major national studies such as the British Household Panel Survey (BHPS) also employ this instrument [2]. Calibration of this instrument may therefore contribute significantly to a large community of researchers.

While the longer versions of the GHQ are normally considered multidimensional, the GHQ-12 is often regarded as measuring only a single dimension of psychological health. For example, Corti [3] analyzed the GHQ-12 data in the BHPS and maintained that the high Cronbach's alpha value indicated the unidimensionality of this instrument. However, several authors suggested that the GHQ-12 contained two or three clinically meaningful factors. Using principal component analysis, Politi et al. [4] identified two factors: general dysphoria and social dysfunction. Andrich and van Schoubroeck [5] suggested that the positively worded items formed one factor and the negatively worded items formed another. Graetz [6], Martin [7] and Worsely and Gribbin [8] proposed three different 3-factor models. In a multi-centre study, although considerable between-centre variation was found, the final solution tended to have either two or three factors [9].

Using confirmatory factor analysis (CFA) to analyze the BHPS data, Cheung [10] compared various models and found that the 3-factor model proposed by Graetz [6] gave the best fit. The factors are anxiety and depression (4 items), social dysfunction (6 items), and loss of confidence (2 items). In a study of employees in New Zealand, Kalliath et al. [11] also employed CFA to compare various models. They likewise found that Graetz's 3-factor model gave better goodness-of-fit than the others. However, they maintained that none of the models they examined gave a sufficient level of goodness-of-fit. Hence they modified the instrument to propose a short (8-item) version of the GHQ. In a study of college students and young adolescents in Australia, French and Tait [12] found that Graetz's model not only fitted the data better than other models, but also satisfactorily achieved some fit index targets such as a Comparative Fit Index > 0.95. In a study of a rural population in Australia [13], the model of Worsely and Gribbin fitted best and that of Graetz was second best.

While the structure of the GHQ-12 has been studied using factor analysis methods, the construct validity and usefulness of those resulting factors are not often tested. The question is whether the additional information provided by the 2 or 3 factors, if they exist, is clinically useful. In other words, will multiple scores be more useful than a total single score in helping us to understand respondents' health status?

The purpose of this study was therefore two-fold. First, we aimed to compare the previously proposed models of the GHQ-12 in an oriental population and identify the best-fitting one. It was not our objective to assess their absolute level of fit or to derive a new model or version of the GHQ. Second, we aimed to assess whether the factors identified relate to clinical and health-related quality of life variables in different ways.


Subjects and study design

A consecutive sample of outpatients with anxiety disorders and/or depressive disorders was recruited from a psychiatric clinic at a tertiary hospital in Singapore. Inclusion criteria were the presence of any anxiety disorder and/or major depressive disorder, literacy in English or Chinese, and completion of an informed consent form. Patients with organic brain syndrome or psychosis were excluded.

During routine consultation visits, diagnoses of recruited patients were ascertained by a psychiatrist using DSM-IV criteria and the severity of their psychiatric disorders was assessed using a Clinical Global Impression (CGI) scale, which ranges from 1 (very mild) to 5 (very severe). Patients were then given a questionnaire containing the General Health Questionnaire (GHQ-12) [14], the Beck Anxiety Inventory (BAI) [15], and the Short Form-36 Health Survey (SF-36) [16] for self-completion. Identical English and Chinese questionnaires were prepared for subjects to select according to their preference. A research assistant checked returned questionnaires for completeness.


The General Health Questionnaire (GHQ-12) consists of 12 items, each assessing the severity of a mental problem over the past few weeks using a 4-point scale (from 0 to 3). The item scores were summed to generate a total score ranging from 0 to 36, with higher scores indicating worse conditions [14]. The Chinese version of the GHQ-12 used in this study had been validated [17, 18]. A previous study of the 60- and 30-item versions of the English and Chinese GHQ yielded comparable scale scores, suggesting equivalence of the two language versions [19].
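The scoring rule described above can be illustrated with a minimal sketch. It assumes that the responses are already coded 0 to 3 in item order; the function name `ghq12_total` is hypothetical and is not part of any published scoring software.

```python
from typing import Sequence

N_ITEMS = 12               # the GHQ-12 has twelve items
MIN_ITEM, MAX_ITEM = 0, 3  # 4-point scale, coded 0 to 3


def ghq12_total(responses: Sequence[int]) -> int:
    """Sum twelve 0-3 item responses into a 0-36 total (higher = worse)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for position, value in enumerate(responses, start=1):
        if not MIN_ITEM <= value <= MAX_ITEM:
            raise ValueError(f"item {position} has out-of-range value {value}")
    return sum(responses)


if __name__ == "__main__":
    # Hypothetical respondent, for illustration only.
    example = [1, 2, 0, 1, 3, 2, 1, 0, 2, 1, 1, 0]
    print(ghq12_total(example))
```

Rejecting incomplete or out-of-range response sets, rather than imputing them, mirrors the completeness check performed on returned questionnaires; other handling of missing items would be a separate design decision.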

The Beck Anxiety Inventory (BAI) is a valid and reliable self-report checklist for anxiety symptoms [15]. This instrument consists of 21 items, each describing an anxiety symptom for a respondent to assess how much he or she has been bothered by the symptom over the past week on a 4-point scale. Responses to all items are summed up to a total score ranging from 0 to 63, with higher scores indicating more severe anxiety. A Chinese BAI was developed by the authors using forward- and back-translation procedures, and refined after a pilot study of subjects with anxiety disorders [20].

The Short Form 36 Health Survey (SF-36) [16] is a 36-item questionnaire assessing functional health-related quality of life (HRQoL) in 8 domains: physical functioning, role limitations due to physical problems, bodily pain, general health, vitality, social functioning, role limitations due to emotional problems, and mental health. The instrument yields a score for each domain ranging from 0 to 100, with higher scores indicating better HRQoL. The validity and reliability of the SF-36 have been extensively documented [21]. In Singapore, both the UK English [16] and Chinese (Hong Kong) [22] versions of the SF-36 have been validated [23, 24] and these two language versions appear to be equivalent [25].

Statistical analysis

Various factor structures of the GHQ-12 were tested by confirmatory factor analysis. Model I was unidimensional. Model IIA contained 2 factors: General Dysphoria and Social Dysfunction [4]. Model IIB also contained 2 factors: positively worded items forming one factor and negatively worded items forming another [5]. Model IIIA contained 3 factors: Cope, Stress and Depress, identified by Martin [7]. Model IIIB was the 3-factor model proposed by Graetz [6]: Anxiety and Depression, Social dysfunction, and Loss of Confidence. Model IIIC was also a 3-factor model: Anhedonia-Sleep disturbance, Social Performance and Loss of Confidence [8]. In the confirmatory factor analysis the number of factors and the relationship between factors and observed GHQ-12 items were pre-specified according to the models. The loading of an item on a factor within a model was estimated using the maximum likelihood method.

Methodologists have emphasized that it is desirable to use different indicators to examine a model's goodness-of-fit [26]. The fit of the six models was assessed by three measures. Akaike's Information Criterion (AIC) penalizes the maximum log likelihood of a model according to its number of parameters. A model with a lower AIC is more plausible than one with a higher AIC. Instead of showing relative fit, the Comparative Fit Index (CFI) assesses the fit of a model itself. Its values range between 0 and 1. A CFI larger than 0.90 indicates an acceptable model. (Hu and Bentler [27] suggested that a CFI value above 0.95 indicates an acceptable model. In a later section we will discuss the more stringent cutoff.) The Root Mean Square Error of Approximation (RMSEA) assesses a model's amount of error. An RMSEA value larger than 0.08 indicates too much error.
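For readers who wish to see how the three indices relate to model output, the following is a minimal sketch using their commonly cited textbook definitions. The exact variants computed by the software used in this study are not stated here, so the formulas, the function names, and the numerical inputs in the usage example are all assumptions for illustration.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = 2k - 2 log L: the log likelihood penalized by parameter count."""
    return 2 * n_params - 2 * log_likelihood


def cfi(chi2_model: float, df_model: int,
        chi2_baseline: float, df_baseline: int) -> float:
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / d_baseline


def rmsea(chi2_model: float, df_model: int, n: int) -> float:
    """Root Mean Square Error of Approximation for a sample of size n."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


if __name__ == "__main__":
    # Hypothetical chi-square values, NOT results from this study.
    print(round(cfi(120.0, 51, 1000.0, 66), 3))
    print(round(rmsea(120.0, 51, 120), 3))
```

Because the AIC is only meaningful in comparison between models fitted to the same data, it is the ordering of the values, not their magnitude, that matters when ranking the six models.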

The best-fitting model was examined in detail. The Kruskal-Wallis test was used to compare the GHQ-12 overall and factor scores of patients with different diagnoses. Pearson's correlation coefficient (r) was used to assess the association between GHQ-12 scores and various variables, namely Beck Anxiety Inventory, Clinical Global Impression and SF-36 scores. Fisher's Z transformation was used to produce 95% confidence intervals.
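The confidence interval calculation mentioned above can be sketched as follows. This is the standard Fisher Z procedure under a normal approximation; the function name `pearson_ci` and the example correlation are hypothetical, and the software actually used in the study is not specified in this text.

```python
import math

Z_CRIT_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def pearson_ci(r: float, n: int, z_crit: float = Z_CRIT_95) -> tuple[float, float]:
    """Confidence interval for Pearson's r via Fisher's Z transformation."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3 for the standard error to exist")
    z = math.atanh(r)              # Fisher's Z: 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of Z
    lower = math.tanh(z - z_crit * se)  # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical correlation of 0.50 with the study's sample size of 120.
    print(pearson_ci(0.50, 120))
```

The back-transformed interval is asymmetric around r unless r is zero, which is the reason for working on the Z scale rather than applying a symmetric margin to r directly.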

Results and Discussion

A total of 120 participants (63 men and 57 women) were included in the analysis (Table 1). Most (90%) respondents were Chinese; the mean (SD) age was 43.1 (12.7). Sixty-six percent of the participants chose to complete the English version of the questionnaire. The mean scores of the clinical and HRQoL data reported by respondents of each gender are shown in Table 1. Men tended to have less anxiety, better clinical global impression, and higher SF-36 scores.

Table 1 Mean (SD) clinical and SF-36 health-related quality of life values by gender

Table 2 shows goodness-of-fit statistics for the 1-, 2- and 3-factor models. The 3-factor model (IIIB) proposed by Graetz [6] was the best in terms of all three fit statistics. It gave the lowest AIC and RMSEA and the highest CFI. Its CFI was 0.935. All six models produced RMSEA values that exceeded 0.08. The one-dimensional model (Model I) had the highest AIC, highest RMSEA and lowest CFI.

Table 2 Goodness-of-fit of six confirmatory factor analysis models (N = 120) (a),(b)

Figure 1 displays the standardized factor loadings and between-factor correlation of model IIIB. The factor loadings ranged between 0.72 and 0.90. The three factors were strongly correlated. The correlation between factor 1 (Anxiety and Depression) and factor 2 (Social Dysfunction) was 0.89. The correlation between factor 2 and factor 3 (Loss of Confidence) was 0.83. That between factor 1 and 3 was 0.90. These strong correlations suggest that even if there were in fact three factors, in practice it may be very difficult to discern them.

Figure 1 Standardised factor loadings and between-factor correlations of Graetz's model [6]. Boxes represent GHQ-12 items; ellipses represent factors. One-way and two-way arrows indicate factor loadings and between-factor correlations, respectively.

Having established that Graetz's 3-factor model fitted the data better than the other models, we calculated the factor scores as unweighted sums of the items concerned. Figure 1 shows that the loadings on each factor did not vary substantially. Hence we chose to use unweighted sums for simplicity. Table 3 shows the mean (SD) factor scores and the overall GHQ-12 score by clinical diagnosis. Some patients had multiple diagnoses; we categorized them into one of three major clinical diagnoses. The three factor scores and the overall GHQ-12 score behaved in fairly similar ways. All four scores were significantly different between patients with and without depression; none was significantly different between patients with and without general anxiety disorder. Patients with panic disorder had lower scores on the factor Loss of Confidence (difference = 0.68; P = 0.043). The pooled SD of the two diagnosis groups was about 1.75; the between-group difference was therefore approximately 0.4 SD.
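The unweighted factor scores and the standardized difference reported above can be reproduced with a short sketch. The item-to-factor assignment is not listed in this text; the mapping below is the one commonly attributed to Graetz's model and is an assumption that should be checked against [6]. It is at least consistent with the 4, 6 and 2 items per factor stated earlier. All names are hypothetical.

```python
from typing import Dict, Sequence

# ASSUMED item numbers (1-based) per factor; verify against Graetz [6].
GRAETZ_ITEMS: Dict[str, tuple] = {
    "anxiety_depression": (2, 5, 6, 9),
    "social_dysfunction": (1, 3, 4, 7, 8, 12),
    "loss_of_confidence": (10, 11),
}


def factor_scores(responses: Sequence[int]) -> Dict[str, int]:
    """Unweighted sum of the 0-3 item responses belonging to each factor."""
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    return {
        name: sum(responses[item - 1] for item in items)
        for name, items in GRAETZ_ITEMS.items()
    }


def standardized_difference(mean_difference: float, pooled_sd: float) -> float:
    """Between-group difference expressed in pooled-SD units."""
    if pooled_sd <= 0:
        raise ValueError("pooled SD must be positive")
    return mean_difference / pooled_sd


if __name__ == "__main__":
    # Values reported in the text for Loss of Confidence and panic disorder.
    print(round(standardized_difference(0.68, 1.75), 2))
```

Because each item belongs to exactly one factor under this mapping, the three factor scores always add up to the overall GHQ-12 score.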

Table 3 Comparison of mean (SD) values of GHQ-12 scores by clinical diagnosis.

Table 4 presents the correlations between the 3 factors of Graetz's model and the BAI, Clinical Global Impression score, and SF-36 scales. The 3 factors were correlated with the 10 clinical and HRQoL variables to very similar degrees.

Table 4 Pearson's correlation coefficients (95% confidence intervals) between GHQ-12 scores and clinical and health-related quality of life variables

Several previous confirmatory factor analyses found that the 3-factor model of Graetz gave better fit to survey data from Australia [12], Britain [10] and New Zealand [11]. In this study we examined the issue in an Asian population in Singapore, whose members are mainly ethnic Chinese. All three goodness-of-fit indices employed, namely AIC, CFI and RMSEA, agreed that the 3-factor model of Graetz out-performed the other five models. The CFI value was 0.935. Conventionally, a CFI of 0.90 or larger is taken as evidence of sufficient fit. A more stringent criterion of CFI larger than 0.95 has recently been proposed and debated [27, 28]. The RMSEA also indicated that even the best-fitting model did not fit well, using the cut-off of 0.08 as a criterion. However, our aim is to compare the models rather than to modify the instrument. So for our purpose it is the comparison of the goodness-of-fit of the six models that matters, not the absolute values of the fit indices. We consider the "correctness" and "usefulness" of a model two fairly separate issues. Although the goodness-of-fit of Graetz's model was limited, we proceeded to examine the factor scores in relation to external criteria in order to reach a conclusion about the usefulness of the model.

The one-dimensional model was the worst according to all three goodness-of-fit indices.

The three factors in the model proposed by Graetz were found to be strongly correlated with each other, with correlation coefficients in the neighborhood of 0.8 to 0.9. Such strong correlations suggest that even if there were indeed three different factors, in practice it is quite difficult to differentiate them. The study of French and Tait [12] also showed strong correlation between the factors, which led the authors to recommend that it may be prudent to use the overall score rather than overinterpret the factors within the GHQ-12. We examined the three factor scores and the overall GHQ-12 score in relation to clinical diagnoses. The four scores behaved in fairly similar ways. Although the Loss of Confidence scale was significantly different between patients with and without panic disorder while the other three scales did not show significant differences between the two groups of patients, the difference was only about 0.4 SD. This is smaller than a recommended threshold (0.5 SD) corresponding to minimal clinically important differences for health states questionnaires [29]. We also examined the association between the three GHQ scores and the Beck Anxiety Inventory, a clinical impression score, and the 8 scales of the SF-36. The three factors were associated with the clinical and HRQoL variables to similar degrees.

Two limitations of the study should be noted. Firstly, the sample size was somewhat small for confirmatory factor analysis. Secondly, the participants were clinical cases. This homogeneity might have made it more difficult to detect variations in GHQ-12 scores. We believe that the question about the relative plausibility of various factor models has been sufficiently answered by this and several previous studies [10-12]. Nevertheless, future studies of non-clinical participants based on larger sample sizes will be helpful to further assess the practical usefulness of the factors of the GHQ-12.


Several studies, including the present one, have found that Graetz's 3-factor model of the GHQ-12 is more plausible than other models. However, the factors were strongly correlated and difficult to discern. Our analysis of the three GHQ scores in relation to clinical variables and aspects of health-related quality of life did not appear to be more informative than analysis of a single overall GHQ-12 score. As such, from a pragmatic point of view we consider it acceptable to use this instrument as a one-dimensional measure. Unless one has specific questions that are best answered by a subset of the three factors, there is no need to consider the multi-dimensionality.