An estimated 14 % of the global burden of disease can be attributed to neuropsychiatric disorders, primarily because of the disabling nature of depression and other common mental disorders [1, 2]. Depression has also been found to be co-morbid with a range of chronic diseases, including HIV/AIDS [3, 4], cardiovascular disease (CVD) and diabetes [1, 5]. Large studies have shown that, compared with age-matched healthy controls, patients with chronic obstructive pulmonary disease, chronic renal failure and cerebrovascular disease were almost three times more likely to have depression, and patients with diabetes or hypertension twice as likely [6, 7]. The WHO World Mental Health Survey in developed and developing countries [8] and the evidence review in the National Institute for Health and Clinical Excellence (NICE) guidance for depression in adults with a chronic physical health problem report similar findings [9].

The co-morbidity of depression with chronic conditions is a public health concern due to their mutual reinforcement and synergistic clinical effects [7, 10, 11]. Evidence for the reciprocal relationship between depression and chronic physical health problems suggests a number of causal pathways, including emotional distress and poor sleep due to pain [12], the prospect of disability [1], and changes in allostatic load whereby the ability of the body to adapt may be compromised due to ongoing tissue damage and degenerative changes [9]. Similarly, depression can also contribute to the development of physical health problems. Systematic reviews of 11 prospective cohort studies in healthy populations reveal that depression predicts later development of coronary heart disease [13] and stroke. Furthermore, major depressive disorder (MDD) can act as a precipitant for HIV infection as well as be a consequence of HIV infection [14, 15]. One third of HIV-infected patients were found to be depressed in a recent screening study for depression in a high HIV burden primary health care clinic in South Africa [16]. South Africa is undergoing a profound health transition in which communicable and non-communicable disorders coexist, and in which chronic conditions such as cardiovascular disease, type 2 diabetes, cancer, chronic lung disease and depression are all predicted to increase substantially [17].

Treatment of depression is vital to improved adherence, social functioning and the disease course of chronic conditions [5, 14, 18, 19]. Early recognition and appropriate management of depression has the potential to improve adherence and impact on the social functioning and the disease progression of chronic conditions, thereby enhancing quality of life [5, 9, 18–20].

While identifying depressive disorder in primary care is recognized as important to effective treatment, only about half of patients with depressive disorder are detected by regular health care providers in high-income settings [21, 22]. In South Africa, this gap is far greater, with only one in four people with a common mental disorder receiving treatment [23]. With HIV transitioning to a chronic condition as a function of the roll-out of anti-retroviral treatment (ART) in South Africa, as well as the rising burden of NCDs (Non Communicable Diseases), which includes mental disorders [17], there is an imperative to integrate the treatment of depression with chronic disease management.

In the context of busy primary health care (PHC) clinics, the use of valid screening tools that facilitate assessment of depression by non-specialists would allow for greater depression detection and care, and would be in line with the task-sharing model that has been embraced by the recent National Mental Health Policy Framework and Strategic Action Plan (2013–2020) in South Africa [24]. The Patient Health Questionnaire (PHQ-9) is a brief diagnostic and severity measure of depression, which has been widely used in research and clinical practice but has not been validated for use in South Africa on chronic care patients.

While there are numerous studies that report on the validity of the PHQ-9 in relation to various chronic illnesses, most of these studies fail to use an appropriate reference standard to evaluate the performance of the PHQ-9 in identifying depression [25–28]. Only a few studies involving patients in chronic care have examined the PHQ-9 against a criterion standard. The PHQ-9 successfully screened post-stroke depression patients with operating characteristics similar or superior to other depression measures [29]. Similar findings are noted in the Heart and Soul study of depression among patients with coronary heart disease [30]. A meta-analysis that included chronic care conditions involving patients from cardiology, renal dialysis, brain injury and stroke facilities as well as general medical outpatients found the PHQ-9 acceptable as a diagnostic screening tool for major depression [31].

In sub-Saharan Africa, only a few studies have established the criterion validity of the PHQ-9 as a screening tool for depression [16, 26, 32, 33]. A meta-analysis establishing the range of optimal cut-off scores for diagnosing depression with the PHQ-9 notes that optimal cut-off scores range between 8 and 11 [34]. In low and middle income contexts, a positive screen for depression was defined as a score of ≥10 [32, 35].

This study aimed to validate the PHQ-9 as a screening tool for depression among chronic care patients attending two public primary health care facilities in South Africa. As a secondary objective, we validated the PHQ-2, a subset of two questions drawn from the PHQ-9, which is a simpler screening option that has been validated elsewhere [36]. The PHQ-2 has not been validated against a gold standard in sub-Saharan Africa, though it has been recommended as a valid and reliable tool for use in resource-constrained settings [28].


Setting and participants

The study was conducted in two primary health care clinics in the North West province of South Africa. The two clinics are pilot sites for a national Department of Health model for integrated chronic disease management, which adopts the collaborative chronic care model and services all chronic care patients at one service point [37].

Study procedures

This validation study was nested within a larger facility detection survey designed to assess the detection and treatment of depression and alcohol use disorders by health care providers for adult attendees of primary health clinics. Patients who had come to the chronic care clinic were recruited from the consultation waiting room before their consultation with a clinician. In the waiting room, a field worker asked for volunteers to take part in a survey on depression and alcohol use disorders. The field worker directed interested individuals to a research assistant in a private consultation room. The research assistants who were recruited from the local communities had completed secondary school, were fluent in seTswana and English, and were trained in administering the PHQ-9 by a clinical psychologist. The research assistant assessed the patients for inclusion criteria, which were: age 18 years or older, clinic attendance for routine chronic disease services (e.g., HIV, hypertension, diabetes) and ability to comprehend and complete study components in seTswana or English. Exclusion criteria included incapacity to provide informed consent (e.g., less than 18 years of age, presence of severe intellectual disability or currently experiencing an acute medical issue, or in treatment for major depression). Eligible patients provided written informed consent to participate in the validation study. Patients with low levels of literacy could sign the informed consent form with an “X” after discussing the study with a study supervisor, and having the consent form read out to them.

Research assistants who administered the PHQ-9 were supervised by mid-level psychological counsellors with 4-year Bachelor’s degrees in psychology who were fluent in seTswana and trained by the seTswana-speaking clinical psychologist in the administration of the PHQ-9. These research assistants then orally administered the PHQ-9 screening tool, and entered the participant’s responses in a questionnaire application programmed onto a mobile handheld device [38]. In addition to the PHQ-9 screening instrument, the interview contained questions on socio-demographic characteristics, economic status, chronic care services received at the clinic, alcohol use and disability status as part of the larger study. Immediately after the conclusion of the screening interview, the research assistant directed the participant to another private consultation room for the diagnostic interview with a clinical psychologist. The research assistant did not apprise the participant or the psychologist of the participant’s PHQ-9 screening score. In each clinic, a clinical psychologist conducted the Structured Clinical Interview for DSM-IV (SCID) [39] diagnostic interview for a current episode of depressive disorder. One psychologist was fluent in both study languages, while the other was assisted by a seTswana-speaking mid-level trained psychological counsellor to conduct the diagnostic interviews. Most patients chose to have the interviews in seTswana; only a few chose to be interviewed in English. At the conclusion of the diagnostic interview, any patient who 1) expressed suicidality (i.e. thoughts, plans, actions); or 2) was judged to have severe symptoms of depression by the clinical psychologist, was asked by the research assistant to provide consent for a referral to the consulting nurse in the clinic.

Ethical approval for the validation study was obtained from the University of KwaZulu-Natal Biomedical Research Ethics Committee (BE271/13). Ethical approval for the larger study was obtained from the University of KwaZulu-Natal Biomedical Research Ethics Committee (BE400/13), and the University of Cape Town, Faculty of Health Sciences, Human Research Ethics Committee (412/2011), and the World Health Organization Research Ethics Review Committee (RPC497).


The PHQ-9 asks patients to rate how often they were bothered by specific problems over the last two weeks. Each item is scored from 0 to 3 according to the frequency of the problem (0 = not at all; 1 = several days; 2 = more than half the days; and 3 = nearly every day) [36]. All nine items in the PHQ-9 are derived from DSM-IV criteria relevant for diagnosis of a current depressive episode [21]. The PHQ-2 comprises the first two questions of the PHQ-9, namely whether the patient has depressed mood and loss of interest (anhedonia). After a piloting process we adapted the response set to improve respondent understanding, such that “several days” was understood to be 1–7 days, “more than half the days” was understood to be 8–11 days and “nearly every day” was understood to be 12–14 days, as was the case in other validity studies in Africa [16, 26]. The PHQ-9 was translated into seTswana by a seTswana-speaking mental health professional and then back-translated by an independent seTswana-speaking clinical psychologist using the methodology described by Brislin [40] (Additional file 1). Due to time considerations and the logistics of administering the PHQ-9 twice to a clinic population, test-retest reliability was not assessed. Inter-rater reliability was also not assessed for the same reasons.
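The scoring rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the questionnaire application used in the study: the function names are hypothetical, and treating 0 days as "not at all" is an assumption, since the text specifies only the 1–7, 8–11 and 12–14 day bands.

```python
from typing import Sequence


def days_to_item_score(days: int) -> int:
    """Map days bothered in the last 14 days to a 0-3 item score.

    Bands follow the adapted response set in the text; scoring 0 days
    as 0 ("not at all") is an assumption made for this sketch.
    """
    if not 0 <= days <= 14:
        raise ValueError("days must be between 0 and 14")
    if days == 0:
        return 0  # not at all
    if days <= 7:
        return 1  # several days
    if days <= 11:
        return 2  # more than half the days
    return 3      # nearly every day


def _check_items(items: Sequence[int]) -> None:
    # A complete PHQ-9 has nine items, each scored 0, 1, 2 or 3.
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected nine item scores, each between 0 and 3")


def phq9_total(items: Sequence[int]) -> int:
    """Sum of all nine item scores (range 0-27)."""
    _check_items(items)
    return sum(items)


def phq2_total(items: Sequence[int]) -> int:
    """Sum of the first two items, depressed mood and anhedonia (range 0-6)."""
    _check_items(items)
    return items[0] + items[1]


# Hypothetical respondent: days bothered by each of the nine problems.
days_reported = [3, 9, 0, 14, 1, 0, 0, 8, 0]
item_scores = [days_to_item_score(d) for d in days_reported]
print(item_scores)              # [1, 2, 0, 3, 1, 0, 0, 2, 0]
print(phq9_total(item_scores))  # 9
print(phq2_total(item_scores))  # 3
```

Keeping the day-to-score mapping separate from the totals means the same totals can be computed whether responses are captured as day counts or as the original four response labels.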

The gold standard diagnostic interview was the depression module of the SCID-I to assess the participants for the presence or absence of major depressive disorder. The SCID is a semi-structured interview administered by a trained clinician who assesses a respondent for the presence or absence of a mental health disorder. At the time of the study, the more recent version of the SCID was not available. Nevertheless, the criteria for depression remain the same in the DSM-V as in the DSM-IV. According to the American Psychiatric Association “neither the core criterion symptoms applied to the diagnosis of major depressive episode nor the requisite duration of at least 2 weeks has changed from DSM-IV. Criterion A for a major depressive episode in DSM-V is identical to that of DSM-IV, as is the requirement for clinically significant distress or impairment in social, occupational, or other important areas of life” (p. 4) [41].

Sample characteristics

Out of a total of 1321 eligible patients who participated in the larger facility survey, 1025 patients were attendees of the clinics where the validation study was conducted. Of these patients, a sub-sample of 676 patients participated in the validation study between February and April 2014. Participants who consented to the general study also consented to the PHQ-9 validation study. The sub-sample size was limited by the availability of the clinical psychologists to conduct diagnostic interviews. Overall, there were 233 refusals to the larger study. Table 1 reflects the demographic characteristics of the 676 participants who completed both the SCID and PHQ-9 interviews. The participants were predominantly female (75.0 %), with mean age of 47.1 years (SD 13.1). Most participants (72.4 %) had completed 6 or more years of primary education. Participants could report multiple conditions for which they had been diagnosed and for which they were receiving services. The most common conditions reported were HIV infection (61.1 %), hypertension (51.0 %), diabetes (9.3 %), and tuberculosis (4.9 %). Of the 413 participants diagnosed with HIV and the 345 participants diagnosed with hypertension, 118 had both conditions.

Table 1 Demographic and clinical characteristics of chronic care patients (n = 676) in North West Province, South Africa

Data analyses

First, we described the socio-demographic and clinical characteristics of participants recruited for this validation study and the proportion of patients receiving a depression diagnosis on the SCID. We evaluated the internal consistency of the PHQ-9 by calculating Cronbach’s alpha. Next, we used the screening scores from the PHQ-9 interview to construct a receiver operating characteristic (ROC) curve and calculated the area under the ROC (AUROC) using the SCID as the gold standard. The AUROC provides a summary measure of a screening tool’s sensitivity and specificity, relative to the gold standard diagnostic, across the entire range of screening scores. An AUROC score of 0.5 is consistent with a screening tool that is no better than chance, and a score of 1.0 indicates a perfectly accurate screening tool. We also calculated the AUROC for the PHQ-2 and compared it to the AUROC for the PHQ-9. To explore potential heterogeneity of the validity of the PHQ-9, we also repeated the AUROC analysis with two (overlapping) subsets of participants who reported that their clinic attendance was for ongoing care with HIV infection and with hypertension. For all AUROC calculations, we report exact binomial 95 % confidence intervals (95 % CI).
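As an illustration of what the AUROC summarizes, the sketch below computes it as the probability that a randomly chosen SCID-positive participant has a higher screening score than a randomly chosen SCID-negative participant, counting ties as one half. This pairwise formulation gives the same value as the trapezoidal area under the empirical ROC curve. The function name and the toy data are hypothetical, and the sketch is not the Stata routine used for the analysis.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = positive on the gold standard, 0 = negative.
    Each (positive, negative) pair contributes 1 if the positive has the
    higher score, 0.5 if the scores tie, and 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not study data): screening totals and diagnoses.
toy_scores = [12, 9, 4, 10, 3, 0, 5, 4]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(round(auroc(toy_scores, toy_labels), 3))  # 0.767
```

The double loop is quadratic in sample size, which is acceptable for a few hundred participants; a rank-based computation would be preferable for large samples.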

We completed all analyses using Stata 13.1 (StataCorp, College Station, USA); we used the ‘roctab’ command for analysis of the individual AUROCs, and ‘roccomp’ for the comparison of the PHQ-9 and the PHQ-2.


Descriptive results

In the SCID diagnostic interview, more than one in ten (11.47 %) participants were diagnosed as currently experiencing a major depressive episode. The mean PHQ-9 score for those who were SCID-positive (PHQ-9 score 9.4, SD 5.3) was substantially higher than for those who were SCID-negative (PHQ-9 score 3.2, SD 3.1). For those with an HIV diagnosis, 12.6 % were SCID-positive, while 9.9 % of those with hypertension were also SCID-positive. No participant experienced any adverse events during either interview. The study psychologist referred 69 participants to the clinic on-call nurse, following disclosure of suicidal ideation, planning or attempts (Table 2).

Table 2 Performance of the PHQ-9 and PHQ-2 to detect major depressive disorder among chronic care patients (n = 676)

The internal consistency of the PHQ-9, measured by Cronbach’s alpha, was 0.76.
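For readers unfamiliar with the statistic, Cronbach’s alpha for a scale with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below applies that standard formula to hypothetical responses; the function name and the data are illustrative and are not taken from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)),
    using sample variances (n - 1 denominator) throughout.
    """
    n_respondents = len(rows)
    if n_respondents < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")

    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical two-item responses from four respondents (not study data).
toy_rows = [[0, 1], [1, 0], [2, 3], [3, 2]]
print(cronbach_alpha(toy_rows))  # 0.75
```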

Sensitivity and specificity of the PHQ-9 and PHQ-2

With the full study sample, the PHQ-9 showed reasonably high validity (AUROC 0.85, 95 % CI 0.82–0.88). With a cut point of ≥9, the PHQ-9 had sensitivity of 49 % and specificity of 94 %. The positive likelihood ratio at this cut point was 7.78, meaning that a score of ≥9 was more than seven times as likely in a participant with depression as in one without. The Overall Correct Classification (OCC) rate was also highest at a cut point of ≥9 (0.886). With this cut point, the positive and negative predictive values were 50 % and 94 %, respectively. Youden’s Index (Youden’s J), defined as sensitivity − (1 − specificity), was maximized at a cut point of ≥5, with a value of 52.3 %.
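The indices reported in this paragraph all derive from a 2×2 table of screening result against SCID diagnosis. The sketch below shows the standard definitions. The counts are hypothetical and were chosen only to give round numbers; they are not the study’s 2×2 table, and the function name is illustrative.

```python
from typing import Dict


def screen_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard screening indices from 2x2 counts.

    tp/fn: gold-standard positives screened positive/negative.
    fp/tn: gold-standard negatives screened positive/negative.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0 or tn + fn == 0:
        raise ValueError("every row and column total must be positive")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Positive likelihood ratio: sensitivity / (1 - specificity).
        "lr_positive": float("inf") if fp == 0 else sensitivity / (1 - specificity),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Overall correct classification rate.
        "occ": (tp + tn) / (tp + fp + fn + tn),
        # Youden's J: sensitivity - (1 - specificity).
        "youden_j": sensitivity + specificity - 1,
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = screen_metrics(tp=40, fp=40, fn=40, tn=560)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts the sensitivity is 0.500, the specificity 0.933 and the positive likelihood ratio 7.500. Predictive values depend on prevalence, so they would differ in a clinic population with a different rate of depression even if sensitivity and specificity were unchanged.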

The validity of the PHQ-9 was similar for the subsamples of 413 patients receiving services for HIV infection (AUROC 0.85, 95 % CI 0.81–0.88) and the 345 patients receiving services for hypertension (AUROC 0.85, 95 % CI 0.82–0.90).

While the PHQ-2 appeared to be a valid screening tool (AUROC 0.76, 95 % CI 0.73–0.79), its AUROC was significantly lower than for the PHQ-9 (p < 0.0001). With a cut point of ≥2, the PHQ-2 had sensitivity of 60 % and specificity of 84 %, correctly classifying 81 % of the population at this level (Figs. 1, 2 and 3, Tables 3, 4 and 5).

Fig. 1 Receiver operating characteristic (ROC) curve for PHQ-9

Fig. 2 Receiver operating characteristic (ROC) curve for PHQ-2

Fig. 3 Receiver operating characteristic (ROC) curve comparing PHQ-9 to PHQ-2

Table 3 Item-level discrimination PHQ-9 (N = 676)
Table 4 Performance of the PHQ-9 in Detecting Depression in HIV positive patients (n = 413) in North West Province, South Africa
Table 5 Performance of the PHQ-9 in Detecting Depression in Patients with hypertension (n = 345) in North West Province, South Africa


The health care burden associated with the burgeoning chronic care population in South Africa makes this a timely and important focus as individuals with depression are less likely to be treatment adherent or engage in health enhancing behavior change to promote healthy lifestyles [5, 18–20]. Given that identifying depression in chronic care patients can be a diagnostic challenge in busy clinics, a short and valid screening tool can assist in the identification of patients with depression.

A meta-analysis establishing the range of optimal cut-off scores for diagnosing depression with the PHQ-9 notes that optimal cut-off scores range between 8 and 11 [34]. In this study, a cut-point of ≥9 yielded a fairly low sensitivity (49 %) in comparison to sensitivity indices of 78.7 % and 89.6 %, respectively, in other similar validity studies in sub-Saharan Africa [16, 32]. However, higher specificity (94 %) was noted in the present study relative to the only other South African study (83.4 %) [16] and was the same as that found for HIV patients in Cameroon [26]. A prospective study of psychosocial factors and health outcomes in patients with a diagnosis of coronary heart disease also showed similar sensitivity and specificity to the present study [30].

Like the PHQ-9, the PHQ-2 had lower sensitivity (60 %) than specificity (84 %); because these data are drawn from the same sample, this pattern may be explained in the same way as for the PHQ-9. The PHQ-2 remains a valid option for use, particularly in time-constrained settings. The trade-off between sensitivity and specificity using the PHQ-2 is more substantial than for the PHQ-9. Selecting a high cut-off score when using the PHQ-2 would enable clinicians to screen a large number of patients, but then to refer a relatively modest number of patients, who are likely true cases, for confirmation of a diagnosis of depression.
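The trade-off described here can be made concrete by sweeping every possible cut-off and recording sensitivity and specificity at each one. The sketch below does this for hypothetical PHQ-2 totals (range 0–6) and also reports the cut-off that maximizes Youden’s J. The data and function names are illustrative assumptions, not study results.

```python
from typing import List, Sequence, Tuple


def sweep_cutoffs(
    scores: Sequence[int], labels: Sequence[int], max_score: int
) -> List[Tuple[int, float, float, float]]:
    """Return (cut-off, sensitivity, specificity, Youden's J) for each cut-off.

    A participant screens positive when score >= cut-off.
    labels: 1 = gold-standard positive, 0 = gold-standard negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    rows = []
    for cut in range(1, max_score + 1):
        sensitivity = sum(s >= cut for s in positives) / len(positives)
        specificity = sum(s < cut for s in negatives) / len(negatives)
        rows.append((cut, sensitivity, specificity, sensitivity + specificity - 1))
    return rows


def youden_cut(rows: Sequence[Tuple[int, float, float, float]]) -> int:
    """Cut-off with the largest Youden's J (lowest cut-off wins ties)."""
    return max(rows, key=lambda row: (row[3], -row[0]))[0]


# Hypothetical PHQ-2 totals and diagnoses (not study data).
toy_scores = [0, 0, 1, 1, 2, 2, 3, 4, 5, 6]
toy_labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
table = sweep_cutoffs(toy_scores, toy_labels, max_score=6)
for cut, sens, spec, j in table:
    print(f">={cut}: sensitivity {sens:.2f}, specificity {spec:.2f}, J {j:.2f}")
print("Youden-optimal cut-off:", youden_cut(table))  # 4
```

In this toy example, raising the cut-off from ≥2 to ≥4 lowers sensitivity from 1.00 to 0.75 while raising specificity from 0.67 to 1.00, which is the pattern the paragraph describes: fewer referrals, a higher share of which are true cases, at the cost of missed cases.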

The strength of this study is that it is one of the few studies to consider the validity of the PHQ-9 in sub-Saharan Africa, and the first to validate the PHQ-9 or the PHQ-2 for a chronic care population in South Africa. Limitations of this study include that the participants were drawn from only two clinics in North West province, and were predominantly female. Therefore, it may not be possible to generalize our results to other populations. We were unable to establish test-retest reliability due to time considerations and the burden it would impose on the public health clinics. On the same basis, we were unable to establish inter-rater reliability of administration of the gold standard instrument. For logistic reasons we were also unable to randomize the order of interviews as participants with depression would be more likely to be detected in the diagnostic interview than in the screening interview. Further, a willingness to disclose feelings of distress in the diagnostic interview may have been heightened after completing the screening interview. This ‘order’ effect may have biased our AUROC results to be lower than the true values.


The brevity of the scale, ease of administration and its concurrent validity with the SCID suggest that the PHQ-9 can be a valuable instrument for identifying co-morbid depression in patients with chronic conditions using a cut-point of ≥9. It is possible that the abbreviated PHQ-2 could be used in a busy primary care clinic, but it would need to be followed up with a full assessment. Identification is the first step towards closing the treatment gap for depression and advancing the agenda of integrated chronic disease management in South African public health facilities. Further research is needed to understand how patients with a limited understanding of depression as a legitimate disorder, and limited recognition of depressive symptoms, are likely to respond to screening tools such as the PHQ-9.