Depression contributes significantly to the global burden of disease and is often under-diagnosed in low- and middle-income countries (LMICs) [1, 2]. Poor detection of depression is associated with disability and leads to increased use of health services for physical health complaints in both high-income countries and LMICs [3]. Healthcare workers who are able to diagnose depression can reduce morbidity and improve patient wellbeing by providing cost-effective treatments [4, 5]. Detecting and diagnosing depression requires effective tools. Because the presentation of depression varies with cultural context, screening tools must be adapted and validated for particular populations.

A number of tools are used to screen for depression. The Patient Health Questionnaire (PHQ-9) is widely used for screening and for monitoring treatment of depression [6]. The PHQ-9 is a nine-item scale assessing symptoms experienced in the preceding two weeks. The reliability and validity of the PHQ-9 are sound, and its internal validity is high. The PHQ-9 questions are easily understood, and the tool requires minimal time to administer and score [7]. The PHQ-9 has been translated and validated in several African countries, including Nigeria [8], Ghana [9], Kenya [10], Cameroon [11], Ethiopia [12], South Africa [13], and Uganda [1]. However, the PHQ-9 has not been validated for use in Malawi, and there are currently few tools for efficient and effective depression screening in general healthcare settings in Malawi. The Self Reporting Questionnaire (SRQ-20) has been validated to screen for major depressive episodes in Malawi [14]. However, the SRQ-20 offers only two response options (‘yes’ or ‘no’) [15], whereas the PHQ-9 offers a greater variety of options for describing symptom occurrence.

Accordingly, we conducted a validation study of the PHQ-9 for detection of depression in Malawi. We validated this tool in patients with type-2 diabetes mellitus attending two NCD clinics in Malawi. We chose the PHQ-9 for this study because it is effective in other settings and brief, which suits the workload of Malawian health care settings. We conducted this study in NCD clinics because depression is often comorbid with NCDs [16,17,18], and it is particularly important to have a tool that is validated for use in general healthcare settings, such as an NCD clinic. We included only patients with diabetes because the health care burden associated with the rapidly increasing diabetes population in Malawi, together with the concern that depression can interfere with clinic appointment attendance and treatment adherence, makes this a timely and important focus [19,20,21,22,23]. The use of a valid screening tool for depression will help clinicians better diagnose patients and initiate treatment, in line with the strategy of integrating mental health care outlined in Malawi’s National Mental Health Policy.

Materials and methods

Setting and participants

We conducted the study in Lilongwe district, a predominantly Chichewa-speaking district. The study was conducted in two NCD clinics, one at Area 25 Health Centre, under the Lilongwe District Health Office, and one at Kamuzu Central Hospital. The Area 25 Health Centre has been piloting a chronic care clinic for key NCDs such as hypertension, diabetes, asthma and epilepsy at the primary health care level since March 2014. Research assistants recruited consecutive patients attending the NCD clinics at the study sites between December 2017 and April 2018. Participants were eligible for the study if they were at least 18 years of age, attending the NCD clinic for diabetes care, and available for an interview. Participants were excluded if they required acute medical care or were unable to speak.


Content validity

The PHQ-9 is a depression module that incorporates the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria into a brief measure of depression [24]. Two bilingual Malawians, one mental health nurse and one linguistics and communication specialist, independently translated the English PHQ-9 into Chichewa. The translations were then evaluated by the principal investigator, two mental health professionals and two health promotion officers with extensive expertise in developing health communication tools, in order to arrive at a consensus translation. Two additional independent bilingual Malawians back-translated the consensus Chichewa translation into English.

We pretested the Chichewa version of the PHQ-9 on a small convenience sample of 15 participants attending a general outpatient clinic in Area 25. The PHQ-9 was interviewer-administered, and participants were probed about their interpretation of the constructs. This pretesting helped to identify any challenges respondents might have with the translation. Any unclear Chichewa terms were replaced with terms that are more commonly used and better understood by participants, producing the final Chichewa translation of the PHQ-9 (Additional file 1).

Criterion validity

The depression module of the Structured Clinical Interview for DSM-IV (SCID) [25] was used as the gold standard to validate the PHQ-9. The SCID is a semi-structured interview designed for administration by a clinician or skilled researcher that determines a formal diagnosis according to the Diagnostic and Statistical Manual of Mental Disorders. The SCID for depression was previously translated into Chichewa and has been used in Malawi after a validation process that included translation, back-translation and testing [14]. At the time of the study, the most recent version of the SCID was not available. However, major depression is defined in both the DSM-IV and DSM-5 as the presence of either depressed mood or loss of interest or irritability with five or more depressive symptoms, lasting at least two weeks, with no history of a manic, hypomanic, or mixed episode [26, 27]. In contrast, minor depression is described in the DSM-IV as the presence of at least two, but fewer than five, depressive symptoms (one symptom must be either depressed mood or loss of interest) during the same 2-week period, with no history of a major depressive episode or dysthymia [27].

Study procedure

The sample size was calculated using Buderer’s formula [28]. We used the following parameters: the anticipated sensitivity (SN) of the PHQ-9 was 80%; the squared standard normal deviate corresponding to the specified size of the critical region ($Z^{2}_{1-\alpha/2}$) was 3.84; alpha (α), the size of the critical region, was 0.05; the absolute precision desired on either side of the sensitivity (L) was 0.1; and the prevalence of depression was estimated as 20%. The prevalence estimate of 20% was based on a recent study among patients attending a health care setting in Malawi [29]. We increased the sample size by 5% to account for potential refusals and loss of data. Under these assumptions, the required sample size was 323.
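The calculation above can be reproduced with a short sketch. It assumes the usual form of Buderer’s formula for sensitivity, n = Z² × SN × (1 − SN) / (L² × P), and assumes that the 5% inflation is applied before rounding up; the function name and the order of rounding are illustrative choices, not details stated in the text.

```python
import math


def buderer_sample_size(sensitivity, prevalence, precision, z=1.96, inflation=0.05):
    """Total sample size needed to estimate sensitivity with a given precision.

    Assumed form of Buderer's formula:
        n = z^2 * SN * (1 - SN) / (L^2 * P)
    The result is inflated for refusals and data loss, then rounded up.
    """
    n_base = (z ** 2) * sensitivity * (1 - sensitivity) / (precision ** 2 * prevalence)
    # Assumption: inflate first, round up once at the end.
    return math.ceil(n_base * (1 + inflation))


# Parameters taken from the text: SN = 80%, P = 20%, L = 0.1, 5% inflation.
n = buderer_sample_size(sensitivity=0.80, prevalence=0.20, precision=0.10)
print(n)  # 323
```

With these inputs the unrounded base size is about 307, and the 5% inflation brings it to 323, matching the figure reported above.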

Two research assistants conducted the recruitment and screening of participants. The research assistants held bachelor’s degrees, had received 3 days of training in administration of the research tools, and spoke both English and Chichewa. After assessing each patient against the inclusion and exclusion criteria, the research assistants explained the study’s purpose and procedures. Informed consent (signed or thumb-printed) was obtained from patients. The research assistant collected demographic information from the participant and then administered the Chichewa version of the PHQ-9 in a private room.

After seeing their clinician for their regular appointment, participants completed the SCID with a separate SCID interviewer, who was masked to the PHQ-9 score. The SCID interviewer was a mental health clinician and received regular supervision and record review for quality assurance, to ensure consistency and accuracy of diagnoses. Finally, the patient’s health passport was examined to determine whether the NCD clinician had made a diagnosis of depression and/or prescribed antidepressant medication during the clinical encounter.

Data analyses

We compared the PHQ-9 score against the SCID diagnosis of depression that was made by the 2-stage diagnostic process. We calculated the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for various PHQ-9 cut-off scores. The ability of the PHQ-9 to discriminate between cases and non-cases was then examined using receiver operating characteristic (ROC) analysis. The ROC curves were obtained by plotting sensitivity against 1-specificity for each possible cut-off score, and the ROC analysis was used to choose cut-points for the PHQ-9 scale. Two different cut-points were used, and diagnostic ability was assessed by a number of statistics at each of these points. The first cut-point was the point that maximised the combination of sensitivity and specificity, whilst the second cut-point was chosen to give a higher sensitivity. The cut-point that maximised sensitivity + specificity was selected to report test characteristics. The area under the ROC curve (AUC) was used to indicate the performance of the PHQ-9. An AUC of 0.5 indicates discrimination no better than chance, and an AUC of 1.0 represents perfect discrimination. The correctly classified rate and likelihood ratios were also considered. Internal consistency of the PHQ-9 was evaluated using Cronbach’s alpha. Data analysis was conducted using SPSS version 20.0.
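The cut-point statistics described above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study: the function names are hypothetical, the scores and diagnoses are made-up toy data, and the sketch assumes that a screen is positive when the score is at or above the cut-off and that the data contain at least one case and one non-case.

```python
import math


def safe_div(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(scores, is_case, cutoff):
    """Operating characteristics of the rule 'score >= cutoff' against a reference diagnosis."""
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, c in pairs if s >= cutoff and c)
    fp = sum(1 for s, c in pairs if s >= cutoff and not c)
    fn = sum(1 for s, c in pairs if s < cutoff and c)
    tn = sum(1 for s, c in pairs if s < cutoff and not c)
    sens = safe_div(tp, tp + fn)
    spec = safe_div(tn, tn + fp)
    return {
        "cutoff": cutoff,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "correctly_classified": safe_div(tp + tn, len(pairs)),
        "lr_positive": safe_div(sens, 1 - spec),
        "lr_negative": safe_div(1 - sens, spec),
    }


def best_cutoff(scores, is_case, candidates=range(1, 28)):
    """Cut-off maximising sensitivity + specificity (ties go to the lowest cut-off)."""
    def combined(cutoff):
        m = diagnostic_metrics(scores, is_case, cutoff)
        return m["sensitivity"] + m["specificity"]
    return max(candidates, key=combined)


# Hypothetical toy data: PHQ-9 totals (0-27) and reference diagnoses.
scores = [1, 2, 3, 4, 5, 7, 9, 6, 8, 10, 11, 12, 15]
is_case = [False] * 7 + [True] * 6

chosen = best_cutoff(scores, is_case)
print(chosen, diagnostic_metrics(scores, is_case, chosen))
```

Evaluating `diagnostic_metrics` over every candidate cut-off and plotting sensitivity against 1 − specificity would trace out the ROC curve; a second, lower cut-off can then be read off the same table when higher sensitivity is preferred.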

Ethical approval

We obtained ethical approval from the University of Malawi, College of Medicine Review and Ethics Committee (COMREC) (Reference-P.07/17/2218). Written informed consent was obtained from every participant and fingerprint impressions were taken from consenting illiterate participants. All interviews were conducted in private at the health facilities.

Results

Sample characteristics

In total, 323 eligible patients were approached between December 2017 and April 2018, and all of them completed both the PHQ-9 and the SCID; there were no refusals. Among the 323 patients, 127 (39.3%) had diabetes only, while 196 (60.7%) had both diabetes and hypertension. The mean patient age was 54 years (standard deviation, 11.4 years; range, 21–79 years), and 75.5% of patients were female (Table 1).

Table 1 Socio-demographic characteristics of 323 validation study participants recruited from two non-communicable diseases clinics in Lilongwe, Malawi

Prevalence of depression

Of the 323 patients, 133 had either minor or major depression identified by the SCID, giving an estimated prevalence of 41%. Major depression was observed in 58 (18%) of the patients. No patients (0%) were diagnosed with depression or prescribed depression treatment by the NCD clinicians.

Performance of the Patient Health Questionnaire-9

The internal consistency of the PHQ-9 (Cronbach’s alpha) was 0.83. The area under the receiver operating characteristic curve was 0.93 (95% CI, 0.91–0.96) (Fig. 1), suggesting good power of the PHQ-9 to discriminate between cases and non-cases of depression.
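For readers unfamiliar with the two statistics reported above, the following sketch shows one common way to compute them. It is an illustration under stated assumptions rather than the study's SPSS analysis: the function names are hypothetical, Cronbach’s alpha is computed from sample variances, and the AUC is computed as the proportion of case/non-case pairs in which the case has the higher score, with ties counted as one half.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])              # number of items (9 for the PHQ-9)
    items = list(zip(*responses))      # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def auc(case_scores, noncase_scores):
    """Probability that a random case outscores a random non-case (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical toy data: four respondents answering three items scored 0-3.
toy_responses = [[0, 0, 1], [1, 1, 1], [2, 1, 2], [3, 2, 3]]
print(round(cronbach_alpha(toy_responses), 2))

# Hypothetical total scores for three cases and three non-cases.
print(round(auc([6, 8, 10], [1, 2, 7]), 2))
```

An alpha near 1 means the items move together, and an AUC near 1 means that cases almost always score higher than non-cases.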

Fig. 1 The predictive ability of PHQ-9 for detecting any depression

The results suggested that a cut-point of 6 or higher on the PHQ-9 scale gave the best combination of sensitivity and specificity in detecting either minor or major depression. This cut-point gave a very high sensitivity of 93% and a lower specificity of 78%. The NPV was high at 94%, and the PPV was lower at 74%. The overall correct classification (OCC) rate was 84% at a cut-point of 6 or higher; the likelihood ratio of a positive screen for depression was 4.2 and the likelihood ratio of a negative screen was 0.09. Using a higher cut-off of ≥7 increased the specificity to 86%, but at the expense of sensitivity, which dropped to 80%. At this cut-off the NPV was 86% and the PPV was 79%, with an OCC of 83% and a likelihood ratio-positive of 5.6.

At a cut-point of 9 or higher, the PHQ-9 had a sensitivity of 64% and a specificity of 94% in detecting either minor or major depression. At this cut-point the NPV was 79%, the PPV was 88%, the likelihood ratio-positive was 10.1 and the likelihood ratio-negative was 0.4. The OCC rate of 81% was also good at a cut-point of 9 or higher (Table 2).
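The predictive values and likelihood ratios reported for each cut-point follow from sensitivity, specificity and prevalence alone, so they can be checked by hand. The sketch below recomputes them for the cut-point of 6 or higher from the rounded values given above (sensitivity 93%, specificity 78%, prevalence 133/323); the function name is hypothetical and the formulas are the standard definitions, not code from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Derive PPV, NPV, OCC and likelihood ratios from sensitivity, specificity and prevalence."""
    # Expected proportions of the whole sample in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "occ": tp + tn,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Cut-point of 6 or higher, any depression, using the rounded figures from the text.
result = predictive_values(sensitivity=0.93, specificity=0.78, prevalence=133 / 323)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Because the inputs are rounded percentages, the recomputed values can differ from the tabulated ones in the last digit.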

Table 2 Operating characteristics of the Patient Health Questionnaire-9 at various cut-off scores for identifying either minor or major depression

A similar analysis was performed to examine the predictive ability of PHQ-9 for detecting major depression alone. The ROC curve analysis gave an AUC value of 0.91 (95% CI, 0.88 to 0.94; Fig. 2).

Fig. 2 The predictive ability of PHQ-9 for detecting major depression

For major depression, the best combination of sensitivity and specificity was found at a cut-point of 9 or higher. This gave relatively high sensitivity and specificity of 85% and 82%, respectively. The NPV was very high at 96%, and the PPV was lower at 51%, with an OCC of 82% and a likelihood ratio-positive of 4.6. Lowering the cut-point to 7 gave broadly similar overall performance in terms of the combination of sensitivity and specificity. However, this cut-off increased sensitivity to 95% at the expense of specificity, which dropped to 71%. The OCC was 75% and the likelihood ratio-positive was 3.2 (Table 3).

Table 3 Operating characteristics of the Patient Health Questionnaire-9 at various cut-off scores for identifying major depression

Discussion

Few screening tools to detect common mental disorders (CMDs) have been specifically developed in low- and middle-income countries [30]; as a result, many researchers rely on tools from developed countries. Screening instruments must be evaluated for reliability and validity before they are used in a country, to ensure that they measure what they are supposed to measure [31].

In this study, the PHQ-9 performed well against the gold-standard diagnosis, showing reasonable accuracy in identifying cases of depression. The area under the ROC curve was 0.93, a high value that suggests good diagnostic ability of the PHQ-9 score.

The PHQ-9 showed good predictive performance, comparable to that seen in validation studies in other parts of sub-Saharan Africa [1, 13]. The internal consistency observed for the PHQ-9 was also similar to that found in a previous study elsewhere [32]. Maximum sensitivity with a specificity of ≥75% has been considered desirable for clinical use [32]. Relative to previous studies, our results suggest that the PHQ-9 has better sensitivity and acceptable specificity in the type 2 diabetic population.

Depression is common and likely to affect diabetes mellitus care; routine screening is therefore important in diabetes care, and validated tools are needed to support it. In this validation study of the PHQ-9 in NCD clinic settings in Malawi, there was evidence of a high prevalence of depression in patients with diabetes mellitus. The rate of depression in this study is comparable to rates of depression in other LMICs, such as 46% in South Africa, 40% in Iraq, 32% in Egypt, 15 to 30% in Nigeria, 14.7 to 43% in Pakistan, 43 to 70% in Iran, 27 to 63% in Mexico [33] and 39.73% in Ethiopia [34]. Depression is often missed by clinicians working in NCD clinics. Indeed, none of the cases of SCID-defined minor or major depression had been identified by the patient’s NCD clinician. This underscores the importance of this study. Given that identifying depression in NCD clinics is a challenge, a short and valid screening tool like the PHQ-9 can assist in the identification of patients with depression.

One of the strengths of this study is that it is the first to validate the PHQ-9 in Malawi, and the first to consider its validity in non-communicable disease clinics in the country. Another strength is the careful attention paid to the translation. The study also used a reference standard, the SCID for depression, that had previously been translated and adapted for use in Chichewa. A limitation of this study is that the participants were drawn from only two specialized NCD clinics in Lilongwe, which may not be representative of the wider population.

Conclusion

The validity, ease of administration and brevity of the PHQ-9 imply that, using a cut-point of ≥9, it will be a valuable tool for identifying comorbid depression in patients with non-communicable diseases. Clinicians in NCD clinics should consider the PHQ-9 when choosing a tool for depression screening. The use of a validated PHQ-9 will be in line with the strategy of integrating depression management in chronic care clinics in Malawi. The findings support our planned use of the PHQ-9 as a screening tool in a pilot study to evaluate the effects of depression management on glycaemic control in non-communicable disease clinics in Malawi.