Background

Depression is frequently observed in cancer patients. Meta-analyses reported a 25 % prevalence of all types of depression among cancer patients [1] and a 32 % prevalence of mental health conditions in general [2]. Depression may negatively affect treatment outcomes [3] and can be associated with elevated mortality in cancer patients [4].

Oncologists often fail to detect depression in their patients [5, 6]. Therefore, it is important to use standardized and easily applicable tools to detect depression. Several screening instruments have proved effective for that purpose. The questionnaires most often used to measure depression in cancer patients are the Hospital Anxiety and Depression Scale (HADS) [7], the Beck Depression Inventory (BDI) [8], and the Center for Epidemiologic Studies Depression Scale (CES-D) [9]; for a recent review, see [10]. A further, freely available and more recently developed questionnaire is the Patient Health Questionnaire (PHQ-9) [11]. Its validity has been demonstrated in several studies [12–15]. Normative scores are available [16], and two studies supply tools for converting scores between the PHQ-9 and other depression scales [17, 18]. The PHQ-9 is generally used in its original one-dimensional form (sum score of the 9 items), but several psychometric studies with multiple disease groups challenged the one-dimensional solution [19, 20] and showed that two-dimensional solutions fitted better [21–25]. The assignments of the items to these two factors were not entirely identical across these studies, but all of them obtained one factor concentrating on emotional and cognitive aspects (depressed mood, feeling worthless, and thoughts of death) and another on somatic aspects (sleep problems, loss of energy, and appetite problems). The first central aim of this study was to test whether such a two-dimensional solution could also be found in a sample of cancer patients. In particular, we tested the specific two-dimensional model that performed best in three [22, 23, 25] of the five [21–25] studies. In addition, the psychometric properties of the items in terms of item-test correlations and the correlations with other scales on emotional and somatic factors were to be examined.

Furthermore, there are age and gender differences influencing the PHQ-9 scores in the general population [16]. These differences should be taken into account when comparing patients with different cancer locations. It has often been documented that depression is relatively high in breast cancer patients [26] and low in prostate cancer patients [27]. However, to what degree is this difference due to different age and gender distributions? Unbiased comparisons between cancer groups can be done by calculating expected mean scores from the general population using linear regression analyses, cf. [28], and by considering the differences between the patients’ group means and these expected mean scores derived from the general population. The second objective of this study was to perform such regression analyses and compare the mean depression levels for multiple cancer types with and without correction for age and gender effects.

In summary, the aims of this paper were

  • to test psychometric properties and the factorial structure of the PHQ-9,

  • to calculate a regression analysis for the assessment of expected mean scores that help evaluate the PHQ-9 mean values of different cancer entities, and

  • to calculate unbiased estimates of the depression burden for several cancer diagnoses.

Methods

Cancer patients

Between 2011 and 2012, a group of 3,592 consecutive patients treated in a German rehabilitation clinic were asked to participate in the study. In Germany, most cancer patients are offered the opportunity to participate in a rehabilitation program to regain physical and psychosocial functioning. During that program, which generally lasts three weeks, the patients receive physiotherapy, physical fitness exercises, relaxation techniques, counseling concerning nutritional and occupational issues, and coping training. Inclusion criteria were age 18 years and above, absence of severe cognitive impairment, and sufficient command of the German language. Written informed consent was obtained from the study participants after full explanation of the purpose and nature of the data collection and storage. This research meets the ethics guidelines of the institution where the study was performed, including adherence to the legal requirements of Germany. A total of 2,909 (81.0 %) of the 3,592 candidates agreed to participate in the study. These patients were sent a letter with several questionnaires six months after being discharged from the rehabilitation clinic. In all, 2,059 patients responded (57.3 % of all patients; 72.4 % of the patients receiving the letter). Table 1 presents characteristics of the sample.

Table 1 Sociodemographic characteristics of the patients’ sample

General population

The data basis for the control group was a survey of the German general population (age 14 to 92 years), conducted in two waves: in 2003 (n = 2,500) and in 2008 (n = 2,518). Age, gender, and regional distribution were the major criteria for representativeness. The random-route procedure included random selection of sample points within Germany, random selection of houses and households within these areas, and random selection of the target person within the household. The overall response rate was 63 %; 5,018 subjects (54 % females) participated in the study. Written informed consent was obtained from all study participants. In order to allow for a fair comparison with the patient sample, we selected a subsample so that the age and gender distribution matched that of the patients. The final sample of the general population comprised 2,693 subjects between 40 and 92 years, mean age: 62.3 (SD = 11.3) years; 1,579 males (59 %) and 1,114 females (41 %), in accordance with the distribution of the patients’ sample. The aim of the general population studies was to obtain normative values for several questionnaires. Further details of the studies have been reported elsewhere [16]. The study was approved by the Ethics Committee of the University of Leipzig, Germany.

Instruments

PHQ-9

The PHQ-9 is a screening instrument with 9 items (see Table 2), developed to measure depression. For each item the patients are asked to assess how much they were bothered by the symptoms over the last two weeks. There are four answer options: not at all (0), several days (1), more than half of the days (2), and nearly every day (3). The sum score (range 0 to 27) indicates the degree of depression, with scores of ≥5, ≥10, and ≥15 representing mild, moderate, and severe levels of depression [11]. While the PHQ-9 was used in both the patients’ and the general population group, the further questionnaires (see below) were only administered to the patients’ group.
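The scoring rule described above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this illustration; the cut-offs are those cited from [11], and the label "none" for scores below 5 is an assumption matching the "no depression" category used in the Results.

```python
from typing import Sequence


def phq9_sum_score(items: Sequence[int]) -> int:
    """Sum the nine item responses (each coded 0 to 3) into a 0 to 27 score."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(response not in (0, 1, 2, 3) for response in items):
        raise ValueError("each response must be coded 0, 1, 2, or 3")
    return sum(items)


def phq9_severity(score: int) -> str:
    """Map a sum score to the severity bands >=5, >=10, >=15 cited from [11]."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 sum score must lie between 0 and 27")
    if score >= 15:
        return "severe"
    if score >= 10:
        return "moderate"
    if score >= 5:
        return "mild"
    return "none"  # assumed label for scores below the mild cut-off
```

For example, a respondent who answers "several days" (1) to all nine items obtains a sum score of 9, which falls into the mild band.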

Table 2 Mean scores of the PHQ-9 items and the sum score for cancer patients and the general population

EORTC QLQ-C30

The quality of life questionnaire EORTC QLQ-C30 [29] consists of 30 items and incorporates five functioning scales (physical, role, emotional, social, and cognitive), three symptom scales (fatigue, pain and nausea/vomiting), a global health status/QoL scale and six single items. Higher functioning scores represent better functioning/QoL, whereas higher symptom scores represent more severe symptoms.

FoP

The 12-item Fear of Progression Questionnaire is a short form of the original 43-item Fear of Progression instrument [30] designed to assess fear of cancer progression. The items are scored on a five-point Likert scale, ranging from 1 (‘never’) to 5 (‘very often’), resulting in a sum score from 12 to 60.

GAD-2

The Generalized Anxiety Disorder Questionnaire GAD-2 is a 2-item short form of the GAD-7 [31]. Together with the PHQ-2, the GAD-2 forms the PHQ-4 [32]. The answer options are identical to those of the PHQ-9.

Statistical analyses

Means, standard deviations, reliability coefficients (Cronbach’s alpha), and part-whole-corrected item-test correlations were calculated. Mean score differences were expressed in terms of effect sizes d according to Cohen [33]. Principal component analyses (PCA) with two factors and varimax rotation were performed since previous studies also detected two-dimensional structures [21, 22]. Confirmatory factor analyses (CFA) [34] were performed with Mplus. We used the following fit criteria: Bayesian Information Criterion (BIC), Standardized Root Mean Square Residual (SRMR), Root Mean Square Error of Approximation (RMSEA) [35], Comparative Fit Index (CFI) [36], and Tucker Lewis Index (TLI) [37]. RMSEA should be lower than 0.10, and CFI and TLI should be greater than or equal to 0.95 [34].
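The descriptive statistics named above can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: the function names are hypothetical, sample variances (n − 1 denominator) are assumed throughout, and Cohen's d is computed with the pooled standard deviation, which is one common convention that the text does not specify.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cronbach_alpha(data: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; `data` holds one row per respondent, one column per item."""
    k = len(data[0])
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(data: Sequence[Sequence[float]], j: int) -> float:
    """Part-whole-corrected correlation: item j versus the sum of the other items."""
    item = [row[j] for row in data]
    rest = [sum(row) - row[j] for row in data]
    return pearson_r(item, rest)


def cohens_d(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Effect size d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)
```

The part-whole correction removes the item itself from the sum score before correlating, so that an item's correlation with the scale is not inflated by its own contribution.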

A linear regression analysis of the general population’s sample was performed, with age and gender as independent variables and the PHQ-9 sum score as the dependent variable. Expected PHQ-9 mean scores were calculated with these regression coefficients for several cancer groups, depending on their age and gender distributions. These expected mean scores were then compared with the raw mean scores of the cancer groups.
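A regression of this kind can be sketched with ordinary least squares. The code below is a minimal illustration, not the procedure or software actually used: the function name `fit_ols` is hypothetical, the normal equations are solved by Gauss-Jordan elimination to keep the example free of external libraries, and sex is assumed to be coded 0 (male) and 1 (female).

```python
from typing import List, Sequence, Tuple


def fit_ols(
    ages: Sequence[float], sexes: Sequence[float], scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of score = b_age * age + b_sex * sex + intercept.

    Returns (b_age, b_sex, intercept).
    """
    # Design matrix rows: age, sex, and a constant 1 for the intercept.
    rows = [(float(a), float(s), 1.0) for a, s in zip(ages, sexes)]
    # Normal equations (X'X) b = X'y, stored as a 3 x 4 augmented matrix.
    m: List[List[float]] = []
    for i in range(3):
        xtx_row = [sum(r[i] * r[j] for r in rows) for j in range(3)]
        xty = sum(r[i] * y for r, y in zip(rows, scores))
        m.append(xtx_row + [xty])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    b_age, b_sex, intercept = (m[i][3] / m[i][i] for i in range(3))
    return b_age, b_sex, intercept
```

Because the model is linear, the expected mean score of a group follows directly from the group's mean age and its proportion of women, so individual-level predictions do not have to be averaged.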

Results

Mean scores of the PHQ-9 items

The mean sum score of the PHQ-9 in the patients’ sample was 5.26 (Table 2). According to the cut-off criteria for mild (5–9), moderate (10–14), and severe (≥15) depression, the frequencies were 49.8 % (no), 35.1 % (mild), 11.3 % (moderate), and 3.7 % (severe) depression for the patients. In the general population, the percentages for no, mild, moderate, and severe depression were 72.2 %, 21.2 %, 5.1 %, and 1.5 %, respectively.

On the item level, the mean scores (Table 2) of the cancer sample ranged from 0.15 (suicidal ideation) to 1.07 (sleep problems). All items showed higher mean scores in the cancer group compared with the general population. The greatest differences between both samples, expressed in terms of effect sizes (d > 0.40), were found for items 3 (sleep problems), 4 (loss of energy), 7 (concentration problems), and 8 (psychomotor agitation/retardation).

Reliability and factorial analyses

The reliability coefficient (Cronbach’s alpha) of the PHQ-9 for the patients’ sample was 0.84. The highest part-whole-corrected correlations between item and sum score (r_it) were obtained for items 4 (loss of energy), 2 (feeling depressed), 7 (concentration problems), and 1 (loss of interest) (Table 3). All items contributed positively to the reliability of the scale. The contribution of the last item (suicidal ideation) was lowest, but positive nevertheless.

Table 3 Factor loadings and item-test correlations for the cancer patients

Results of the two-factorial principal components analysis (PCA) for the patients’ sample are also given in Table 3. The theoretically assumed structure (items 1, 2, 6, and 9 in one factor and items 3, 4, 5, 7, and 8 in the other) was realized with one exception (item 1: loss of interest). CFA results for the total scale and for the two-factorial structure are given in Table 4, indicating a better fit for the latter model.

Table 4 CFA results for the cancer patients

Relationship between PHQ-9 scores and other scales

The PHQ-9 items were correlated with several scales of other questionnaires (Table 5). In the left part of Table 5, the scales focus on the affective and mental component, while in the right part the scales also include physical aspects.

Table 5 Correlations between item scores of the PHQ-9 and scale scores of other instruments in the cancer patients’ sample

There is a clear correspondence between item 7 (concentration problems) and Cognitive functioning (r = −0.72). Item 4 (loss of energy) is highly correlated with all scales, including affective scales and those with physical aspects. Among the four items (1, 2, 6, 9) assigned to Factor 2 (emotional and cognitive aspect), all correlations with the Emotional functioning scale are higher than those with the EORTC fatigue scale. On the other hand, among the five items (3, 4, 5, 7, 8) of Factor 1 (somatic aspect), only three items (3, 4, and 5) showed higher correlations with fatigue, compared with the correlations to Emotional functioning.

Tumor-specific analyses

Table 6 shows PHQ-9 mean scores for all tumor sites with subsample sizes of 25 and above, arranged according to the PHQ-9 mean score. There are great differences among the subsamples concerning age and sex distribution. Since the PHQ-9 mean scores depend on age and sex in the general population, a fair comparison among the tumor sites requires the consideration of these age and sex differences.

Table 6 PHQ-9 scores, broken down by cancer site

The linear regression analysis of the general population’s sample yielded the following regression equation:

$$ \mathrm{PHQ} = 0.0367 \times \mathrm{age} + 0.310 \times \mathrm{sex} + 0.884 $$

Sex is coded with the values 0 (males) and 1 (females). For example, the expected PHQ-9 score of a 60-year-old woman is 0.0367 * 60 + 0.310 * 1 + 0.884 = 3.396. For the whole sample of the general population (41 % women; mean age: 62.2 years), the calculation is as follows: PHQ-9 (expected) = 0.0367 * 62.2 + 0.310 * 0.41 + 0.884 = 3.294. The column “Expected PHQ-9 Mean” in Table 6 shows these expected values for samples with the age and gender distribution of the cancer groups. These expected means provide the basis for comparing the depression burden of the different cancer patient groups.
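The two worked calculations above can be reproduced with a short sketch that applies the reported regression coefficients. The function and constant names are hypothetical; only the three coefficients and the 0/1 coding of sex are taken from the text.

```python
# Regression coefficients reported for the general population sample.
B_AGE = 0.0367
B_SEX = 0.310  # sex coded 0 (male), 1 (female)
INTERCEPT = 0.884


def expected_phq9(mean_age: float, proportion_female: float) -> float:
    """Expected PHQ-9 mean for a person or group.

    For a single person, pass the age and 0 or 1 for sex; for a group,
    pass the mean age and the proportion of women (0.0 to 1.0).
    """
    return B_AGE * mean_age + B_SEX * proportion_female + INTERCEPT


def difference_from_expected(
    observed_mean: float, mean_age: float, proportion_female: float
) -> float:
    """Observed group mean minus the age- and gender-expected mean."""
    return observed_mean - expected_phq9(mean_age, proportion_female)


if __name__ == "__main__":
    # 60-year-old woman, as in the text.
    print(round(expected_phq9(60, 1), 3))
    # Group with mean age 62.2 years and 41 % women, as in the text.
    print(round(expected_phq9(62.2, 0.41), 3))
```

A positive value of `difference_from_expected` indicates that a group reports more depressive symptoms than would be expected in the general population for its age and gender composition.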

All groups of patients show higher mean values than the (matched) controls, with differences ranging from 0.7 (prostate) to 5.2 (thyroid gland). The sequence of the cancer sites according to the PHQ-9 mean scores is similar to the sequence according to the age- and gender-corrected mean values (right part of Table 6). Patients with testis cancer have a mean PHQ-9 score of 4.9, which is the third lowest mean value in Table 6. Taking into account that these patients are male and relatively young, the difference between the actual score and the expected one (diff = 2.7) indicates a higher level of distress in this group. A similar phenomenon can be observed for patients with Hodgkin lymphoma.

Discussion

The first aim of this study was to test the factorial structure of the PHQ-9 administered to cancer patients. The results of the factorial analyses demonstrate that a two-dimensional model according to [23] performed better than the one-dimensional model. With one exception (item 1; loss of interest), the hypothetically assumed structure emerged in the PCA. It is interesting to note that the two items that were selected for the PHQ-2 (loss of interest and feeling depressed) reached good part-whole-corrected item-test correlations (0.60 and 0.68), and that they had positive loadings on both factors in the PCA. Together with the results of other studies reported in the literature, we can conclude that the PHQ-9 comprises two aspects, an affective-cognitive component (feeling depressed, self-blame, and suicidal ideation) and a somatic component (sleep problems, loss of energy, and appetite problems), but that the assignment of the remaining three items (loss of interest, concentration problems, and agitation/retardation) to the scales according to the factorial analyses is less clear. The reliability coefficient of the total scale (Cronbach’s alpha) was good (alpha = 0.84), and all items contributed to this scale. This is similar to the results of other studies [38, 39]. Insufficient CFA fit indices for the total sum scale are also found in other depression questionnaires (e.g., [40]). As long as there is no other structure of the questionnaire that can be reliably replicated in several studies, we believe that it is best to maintain the sum score.

Sleep problems (item 3) and loss of energy (item 4) were the symptoms that differed most between the cancer patients and the general population, followed by concentration problems (item 7) and agitation/retardation (item 8). As such, “classical” depression features like feeling depressed and loss of interest were not reported to be key burdens of cancer patients half a year after rehabilitation. Item 8 contains two contradictory aspects of psychomotor functioning: agitation and retardation. This item fitted most poorly in the Forkmann et al. study [19] and was therefore excluded there. Clinicians report that patients have difficulties answering this item because of its seemingly contradictory nature. In the PCA, the item was associated with factor 1 in the patients’ sample. It cannot be clearly interpreted. Item 9 (suicidal ideation) showed very low mean scores, and the item-total correlations were lowest, though both coefficients were greater than 0.40. The contribution to the sum score of the PHQ-9 is small. However, physicians may obtain relevant information when this item is not totally denied [41]. Taking these properties of the PHQ-9 together, it is an advantage of this short instrument that it can nevertheless be used for different purposes: (a) general screening for depression, (b) focusing on two aspects of depression according to the two factors, and (c) considering single items such as suicidal ideation.

The mean score differences between cancer patients and the general population were most pronounced for the items indicating sleep problems and loss of energy. These items belong to Factor 1 and indicate general health problems. All nine items are heightened in the cancer patients’ sample, but it is worth noticing that the health-related components are most strongly affected.

The comparison between the cancer types confirmed high degrees of depression in patients with thyroid cancer [42] and low degrees in those with prostate cancer [27]. While breast cancer patients also show high mean levels of mental distress [43], in this study breast cancer was in the upper range, but not at the top. Moreover, PHQ-9 mean scores were presented for several other, rarer types of cancer that have not been extensively examined in psycho-oncological research. In addition to the raw PHQ-9 mean scores for the different cancer types, we also calculated the differences between these mean scores and the expected mean scores, based on the age and gender distribution. There were no great differences between raw scores and corrected scores in the sequences (Table 6). However, for cancer types with large proportions of males and young patients (testis cancer, Hodgkin lymphoma), the burden of cancer is underestimated when only simple mean scores are considered. Regression analyses such as those performed here can also be calculated for other questionnaires in order to provide a basis for unbiased comparisons among subgroups of patients.

Some limitations of this study should be mentioned. We examined patients half a year after discharge from a rehabilitation clinic. Patients with a very bad prognosis may be underrepresented or overrepresented in the sample. Though we believe that patients in a good health state are more compliant in filling in the questionnaire, resulting in a slight underestimation of the depression burden in the sample, we have no information on the non-participants. A further limitation is the limited information on the health status of the respondents. In addition, participants of a rehabilitation program are not totally representative of all cancer patients. We only calculated CFAs for the one-dimensional model and one two-dimensional model. It would be possible to refine the CFA models and to arrive at better fit indices if several modifications were made, such as considering sub-dimensions, allowing correlated error terms, or removing items. However, special modifications, adapted to each data set, would not lead to generalizable results. Some patient groups in our study had small sample sizes; their depression mean scores should be interpreted with caution. Finally, the PHQ-9 is an economic screening instrument, which, however, is not a sufficient substitute for a clinical diagnosis of depression. Nevertheless, it can help provide aggregated information on the burden of special disease groups such as cancer patients.

Conclusions

The results showed that the PHQ-9 comprises items that measure several aspects of depression, but that it is nevertheless useful to maintain the PHQ-9 as a one-dimensional scale in practical applications. The regression coefficients can be used to adjust comparisons among different groups of patients for age and gender.