Validation of the Patient Health Questionnaire (PHQ)-9 for prenatal depression screening
The study was designed to validate the Patient Health Questionnaire (PHQ-9) for depression risk identification among pregnant women. Pregnant women were routinely administered the Prenatal Risk Overview, a comprehensive psychosocial screening interview, which included the PHQ-9, at their prenatal intake appointment at three community clinics. Study participants completed the Structured Clinical Interview for DSM-IV (SCID) at a later appointment. PHQ-9 risk classifications were cross-tabulated with SCID diagnostic categories to examine concordance, sensitivity, specificity, and positive and negative predictive values. The study sample included 745 women. Prevalence of a current major depressive episode was 3.6 %; an additional 7.0 % were classified as meeting subdiagnostic criteria of three or more depressive symptoms. A PHQ-9 score cutoff of 10 yielded sensitivity and specificity rates of 85 and 84 %, respectively, for a depression diagnosis and 75 and 88 %, respectively, for a subdiagnosis. Positive predictive value was higher for the expanded group (43 %) than for the diagnosis-only group (17 %). The PHQ-9, embedded within a multidimensional risk screening interview, effectively identified pregnant women who met criteria for current depression. The moderate risk score cutoff also identified women with subdiagnostic symptom levels who may benefit from interventions to alleviate their distress and improve pregnancy outcomes.
Keywords: Depression; Risk screening; Prenatal care; Healthy Start
Screening for postpartum depression is widely promoted as an important public health intervention (Lusskin et al. 2007; Wisner et al. 2002; Miller 2002), but the urgency of detecting prenatal depression has received much less attention. Studies have shown that a significant proportion of depressive disorders identified in the postpartum period may have existed during pregnancy and that screening initiatives focused exclusively on the postpartum period miss an opportunity to intervene earlier in the course of depression (Evans et al. 2001; Yonkers et al. 2001; Lee et al. 2007). Pregnancy is a time of increased risk for depression because it is a major life event and because of related hormonal changes (Bennett et al. 2004).
Prenatal depression has serious implications for both maternal and fetal outcomes. Depressed pregnant women are at increased risk for the use of tobacco, alcohol, and drugs and for poor prenatal care attendance (Lusskin et al. 2007), all of which are associated with poor birth outcomes (Damus 2008; Goldenberg and Culhane 2007; Kotelchuck 1994; Saenger et al. 2007). Untreated prenatal depression has been associated with gestational hypertension, preeclampsia, preterm birth (Bonari et al. 2004; Dayan et al. 2002; Lusskin et al. 2007; Orr et al. 2002; Steer et al. 1992), and postpartum depression (Hobfoll et al. 1995; Lee et al. 2007). Furthermore, infants of women with untreated prenatal depression are at increased risk for being small for gestational age, low birth weight, and other complications (Chung et al. 2001; Preti et al. 2000; Steer et al. 1992; Lusskin et al. 2007; Davalos et al. 2012).
Despite growing awareness of the risks and prevalence of prenatal depression, many women experiencing depressive symptoms during pregnancy go unidentified and untreated (Bonari et al. 2004; Marcus et al. 2003; Scholle et al. 2003). As a result, there is a growing movement to screen women as a part of regular prenatal care. In 2010, the American Congress of Obstetricians and Gynecologists encouraged providers to strongly consider screening for prenatal and postpartum depression (Committee opinion no. 453: Screening for depression during and after pregnancy 2010).
Prenatal depression screening is a core service requirement for the 105 federally funded Healthy Start programs designed to reduce racial disparities in infant mortality and poor birth outcomes. Recognizing the need for systematic screening for a constellation of risk factors associated with poor maternal health and birth outcomes, including depression, Twin Cities Healthy Start developed and implemented a multidimensional risk screening tool (Harrison and Sidebottom 2008). The Prenatal Risk Overview (PRO) assesses lack of basic needs, lack of social support, partner and other interpersonal violence, depression, substance use, legal problems, and involvement with child protective services. The multidimensional tool has the advantage of allowing clinicians to view particular risk factors within the context of other stressors and behaviors rather than in isolation.
Two brief depression screening tools were considered for inclusion in the PRO: the ten-item Edinburgh Postnatal Depression Scale (EPDS) (Cox et al. 1987) and the nine-item PHQ-9 (Spitzer et al. 1999). The EPDS was developed for postpartum screening but has been used prenatally (Flynn et al. 2011; Bennett et al. 2004). The PHQ-9 was developed for use in primary care settings and has been tested for validity extensively among diverse populations (Adewuya et al. 2006; Gjerdingen et al. 2009; Huang et al. 2006; Kroenke et al. 2001; Kroenke et al. 2010; Lotrakul et al. 2008; Martin et al. 2006; Monahan et al. 2009; Wittkampf et al. 2007; Yeung et al. 2008), including among obstetrics–gynecology (Kroenke et al. 2001) and postpartum (Gjerdingen et al. 2009; Weobong et al. 2009; Hanusa et al. 2008) patients. These two instruments performed similarly among a sample of 81 pregnant women in the USA who were seeking psychiatric services (Flynn et al. 2011). Only one study has assessed the validity of the English version of the EPDS during pregnancy against a structured diagnostic interview; this study was conducted in the UK, included only 100 women, and identified only six cases of major depression (Murray and Cox 1990). Although no previous study has validated the PHQ-9 against a structured diagnostic interview among an exclusively prenatal patient population, one study did conduct such a comparison using briefer versions known as the PHQ-2 and the PHQ-8 in a sample of pregnant women selected to overrepresent those with a history of depressive or posttraumatic stress symptoms (Smith et al. 2010). That study found lower sensitivity and specificity for the pregnant women than had been found for other populations, a result that the researchers believed may have been related to their sample characteristics.
We selected the PHQ-9 for inclusion in the PRO both for methodological and pragmatic reasons. It was designed specifically to align with DSM-IV criteria (American Psychiatric Association 2000) for depression, and its response options were identical for each question, facilitating recall for the interview method we planned to use. It was also already used in prenatal care at the largest local Healthy Start program site, so selecting a different instrument without a strong empirical justification would have been problematic.
The purpose of this study was to validate the PHQ-9 in pregnant women in the context of a multidimensional screening tool. This study is one component of a research project to validate major components of the PRO and examine methods of administration (Harrison et al. 2011a, b). We hypothesized a PHQ-9 sensitivity rate of 85 % and a specificity rate of 75 % for a current major depressive disorder episode, rates we believed would be adequate to establish acceptability of this instrument for use during pregnancy.
Materials and methods
Study overview and context
As part of the local Healthy Start program protocol, all women seeking prenatal care at participating community health care centers were administered the PRO at their intake appointment by a registered nurse, a social worker, or a paraprofessional such as a community health worker. As the interviewer administered the PRO, she entered responses into the web-based Twin Cities Healthy Start Screening and Case Management System, which also documents demographic descriptors, service-related information, and birth outcomes (Harrison and Sidebottom 2008).
This study was approved by the institutional review boards at the Minnesota Department of Health and the University of Minnesota. Written informed consent for study participation was obtained from all participants. Eligibility for minors was approved because in Minnesota, minors have the legal right to obtain reproductive health services without parental consent.
The study sample consisted of consecutive women seeking prenatal care at three community health centers. All three sites are federally qualified health centers serving predominantly low-income patients. Two sites began recruitment in July 2007, and the third site began recruitment in February 2009. All data collection was completed by December 2010. Patients were ineligible if English was not their preferred language or if staff determined they did not speak English well enough to complete a diagnostic interview. The language exclusion criterion was put in place because of concerns about the cultural appropriateness of the questionnaires for two of the immigrant communities served by these clinics (Hmong and Somali) as well as the logistical challenges of providing interpreters to conduct a diagnostic interview. Women were also ineligible if they had participated in the study during a previous pregnancy.
The PRO, which includes the PHQ-9, was conducted at the end of the prenatal intake appointment, which typically included a substantial discussion of medical history. After completing the screening interview, eligible patients were provided with study information and asked to consent to a diagnostic interview to be conducted within the next 2 weeks. Patients who consented to the diagnostic interview were contacted by telephone by the study research assistant to set up an interview appointment. If the prospective participant was not reached by telephone, the research assistant identified her next clinic appointment through the scheduling system and met her in person. When these methods failed, the research assistant sent a letter with information about the study and how to schedule an interview. To ease participant burden, the second interview was typically conducted in conjunction with a regularly scheduled prenatal care appointment or a visit to a colocated service, such as the Women, Infants, and Children program, or the laboratory. Participants were provided with a $50 grocery store or discount store gift card when they completed the study interview.
The lay research assistant received SCID training that included training videos, meetings with an academic psychologist who had substantial experience in conducting SCID training, practice interviews, and feedback. She conducted all SCID interviews and was blinded to the results of the PHQ-9. She adhered to a protocol to alert a clinician if depression symptoms were serious, a safety procedure that was explained to participants during the informed consent process.
Instruments and measures
The PRO consisted of 58 questions that addressed 13 psychosocial domains (Harrison and Sidebottom 2008), including depression (PHQ-9). This interview took about 10–15 min to administer.
PHQ-9 questions address the previous 2 weeks and ask about mood, cognitive, and physical symptoms of depression: feeling little interest or pleasure, feeling down or hopeless, problems sleeping, being tired or having little energy, poor appetite or overeating, feeling bad about yourself, trouble concentrating, speaking or moving slowly or being fidgety or restless, and suicidal ideation (Kroenke et al. 2001, 2010). Our version consisted of ten questions because we asked about moving or speaking slowly and being fidgety or restless separately; only the greater frequency response was included in our scoring algorithm to be consistent with PHQ-9 guidelines.
Response categories were (0) not at all, (1) several days, (2) more than half the days, and (3) every day or nearly every day. Scores for all items were summed. Based on PHQ-9 scoring recommendations (Kroenke et al. 2001, 2010), a score of 20–27 was classified as very high risk, 15–19 as high risk, 10–14 as moderate risk, and less than 10 as low risk. We made one deviation from the standard PHQ-9 scoring algorithm. Regardless of total score, women who scored “2” on the suicidal thoughts item were categorized as high risk, and women who scored “3” were categorized as very high risk. For analyses, women classified as very high and high risk by this item or total score were grouped together.
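The scoring rules described above can be sketched in code. This is a minimal illustration, not the study's actual implementation: the function name, the item order (the two psychomotor questions at positions 8 and 9, the suicidal thoughts item last), and the reading of the override as raising a woman to at least the stated level are all assumptions made for the example.

```python
# Illustrative sketch only; item order and names are assumptions, not from the study.
LEVELS = ["low", "moderate", "high", "very high"]

def classify_phq9(items10):
    """Classify ten item responses (each 0-3) into a PHQ-9 risk level.

    Assumed order: seven items, then 'moving or speaking slowly',
    then 'fidgety or restless', then the suicidal thoughts item.
    """
    if len(items10) != 10 or any(v not in (0, 1, 2, 3) for v in items10):
        raise ValueError("expected ten responses coded 0-3")

    # Collapse the two psychomotor questions into one item by keeping
    # the greater frequency response, giving the standard nine items.
    nine = list(items10[:7]) + [max(items10[7], items10[8]), items10[9]]
    total = sum(nine)  # possible range 0-27

    if total >= 20:
        level = 3
    elif total >= 15:
        level = 2
    elif total >= 10:
        level = 1
    else:
        level = 0

    # Deviation described in the text: the suicidal thoughts item
    # overrides the total score.
    suicidal = nine[8]
    if suicidal == 3:
        level = 3
    elif suicidal == 2:
        level = max(level, 2)

    return LEVELS[level]
```

For example, a respondent who answers "not at all" to every question except a "2" on the suicidal thoughts item has a total score of 2 but is still classified as high risk under this reading.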
Diagnostic instrument (SCID)
The SCID (First et al. 2005) module for current major depressive episode was used to validate PHQ-9 results. An interview guide was created based on the SCID question items assessing presence, frequency, and duration of symptoms. In accordance with SCID interview guidance for probing to rule out symptoms resulting from medical conditions, we included probes to determine whether the respondent believed that an identified somatic symptom was pregnancy-related. The diagnostic interview also included modules for posttraumatic stress disorder, generalized anxiety disorder, and alcohol and drug use disorders and took approximately 30–45 min to administer.
The depression section of the interview guide began by asking whether in the past month there was a time when the respondent experienced a generally depressed mood or marked loss of interest in things she usually enjoyed and whether those feelings lasted at least 2 weeks. Women who met this threshold were asked to focus on the worst 2 weeks in the past month or the past 2 weeks if equally depressed for the entire month. Subsequent questions asked about the presence of the following symptoms: change in appetite or weight loss, insomnia or hypersomnia, psychomotor agitation or retardation, fatigue or loss of energy, feelings of worthlessness or excessive guilt, diminished ability to think/concentrate or make decisions, and recurrent thoughts of death, suicidal ideation, or suicide attempt. Affirmative responses were followed by a question to determine whether the respondent experienced the symptom nearly every day in the 2-week period. Women who reported somatic symptoms were asked whether they perceived those symptoms as pregnancy-related. Women who reported five or more symptoms were then asked about clinically significant distress or impairment such as interference with relationships.
Symptoms were counted as positive only if they occurred nearly every day and if they were not perceived as pregnancy-related. A categorical variable was created to classify respondents as meeting DSM-IV criteria for a clinical diagnosis of major depressive episode (American Psychiatric Association 2000), or a subdiagnostic category, or no diagnosis. A major depression diagnosis required five or more of the nine symptoms, one of which had to be depressed mood or loss of interest or pleasure, plus symptom-associated significant distress or impairment. The subdiagnostic category created for this study required three or more symptoms, one of which had to be depressed mood or loss of interest or pleasure.
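The classification rule in this paragraph can be written out as a short sketch. The symptom labels, the function name, and the data layout are hypothetical and chosen only for illustration; the thresholds follow the text. One consequence of the rule as stated is that a woman with five or more counted symptoms but no significant distress or impairment falls into the subdiagnostic category.

```python
# Illustrative sketch; symptom labels and data layout are assumptions.
CORE_SYMPTOMS = {"depressed_mood", "loss_of_interest"}

def scid_category(reports, significant_impairment):
    """Classify one respondent as 'diagnosis', 'subdiagnostic', or 'none'.

    reports maps a symptom label to a pair of booleans:
    (occurred nearly every day, perceived as pregnancy-related).
    """
    # A symptom counts only if it occurred nearly every day and was
    # not perceived by the respondent as pregnancy-related.
    counted = {
        name
        for name, (nearly_every_day, pregnancy_related) in reports.items()
        if nearly_every_day and not pregnancy_related
    }
    has_core = bool(counted & CORE_SYMPTOMS)

    if len(counted) >= 5 and has_core and significant_impairment:
        return "diagnosis"
    if len(counted) >= 3 and has_core:
        return "subdiagnostic"
    return "none"
```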
Race/ethnicity was recorded as mutually exclusive categories: African-American, American-Indian, Asian/Pacific Islander, Hispanic (any race), white, or bi/multiracial. Nativity was categorized as US-born or foreign-born. Marital status was categorized as unmarried or married.
Statistical analyses were conducted using SPSS 19.0 (2010). Bivariate cross tabulations were conducted with chi-square tests for significance to examine differences between consenters and nonconsenters as well as completers and noncompleters for race, nativity, marital status, and PHQ-9 depression screening results. ANOVA was used to compare differences in mean age for these groups. To assess validity, PHQ-9 scores were divided into <10 (low risk) and ≥10 (moderate, high, and very high risk). These values were cross-tabulated with the SCID diagnostic categories. These cross tabulations were used to calculate sensitivity, specificity, positive predictive value, and negative predictive value.
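For readers less familiar with these measures, the following sketch shows how they are derived from a two-by-two cross tabulation. The study itself used SPSS; the function names and the counts in the usage note are hypothetical and are not the study's data. The sketch dichotomizes on total score alone at the cutoff of 10.

```python
# Illustrative sketch; function names and any counts are hypothetical.
def cross_tabulate(records, cutoff=10):
    """Count (tp, fp, fn, tn) from (phq9_score, scid_positive) pairs."""
    tp = fp = fn = tn = 0
    for score, scid_positive in records:
        screen_positive = score >= cutoff
        if screen_positive and scid_positive:
            tp += 1
        elif screen_positive and not scid_positive:
            fp += 1
        elif not screen_positive and scid_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def validity_measures(tp, fp, fn, tn):
    """Return the four validity measures as proportions."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly screened positive
        "specificity": tn / (tn + fp),  # non-cases correctly screened negative
        "ppv": tp / (tp + fp),          # screen positives who are cases
        "npv": tn / (tn + fn),          # screen negatives who are non-cases
    }
```

With made-up counts of 40 true positives, 30 false positives, 10 false negatives, and 120 true negatives, `validity_measures(40, 30, 10, 120)` gives a sensitivity and specificity of 0.8 each.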
Table: Comparisons between study consenters and nonconsenters and SCID interview completers and noncompleters. Columns: Consenters (n = 1,274); Nonconsenters (n = 93); Completers (n = 745); Noncompleters (n = 529). Rows: Race/ethnicity, n (%), including Hispanic (any race); Nativity, n (%); Marital status (%); Age, mean, years (SD); Depression risk classification (PHQ-9), n (%).
Table: Concordance of PHQ-9 risk levels and SCID diagnoses (n = 745). Headings: PHQ-9 risk level (scores); Major depressive disorder episode diagnosis.
To determine whether concordance between the two interviews was affected by the interval between them, PHQ-9 scoring thresholds were compared for cases who met diagnostic criteria. Of the 17 women who had the SCID completed within 3 weeks of the PHQ-9 and met diagnostic criteria, 15 (88 %) scored at 10 or higher on the PHQ-9. Of the ten women who completed the SCID more than 3 weeks after the PHQ-9 screening and met diagnostic criteria, eight (80 %) scored at 10 or higher on the PHQ-9. The difference between these two groups was not significant.
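The text does not state which test was used for this comparison. As one way to check it, the sketch below applies a two-sided Fisher exact test, which is an assumption on our part and suits the small cell counts. It takes the reported 88 % as 15 of 17 and the reported 80 % as 8 of 10; the function name is hypothetical.

```python
# Illustrative sketch; the choice of Fisher's exact test is an assumption.
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k first-column cases in row 1.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )

# Rows: SCID within 3 weeks, SCID after more than 3 weeks.
# Columns: PHQ-9 score of 10 or higher, PHQ-9 score below 10.
p_value = fisher_exact_two_sided(15, 2, 8, 2)
```

Under these assumptions the p-value is well above 0.05, in line with the statement that the difference was not significant.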
Table: Validity measures of the PHQ-9 compared to the SCID. Headings: SCID major depressive disorder episode; PHQ-9 risk levels (scores); Met diagnostic criteria; Met diagnostic or subdiagnostic criteria.
Discussion and conclusion
The proportion of women in this study identified as at risk for depression (PHQ-9 ≥ 10) was 18.4 %. Based on diagnostic interviewing, prevalence of a current major depressive disorder was 3.6 %, with an additional 7.0 % meeting criteria for a subdiagnostic category. These levels are similar to the levels of depressive disorders identified in another population of urban pregnant women (Melville et al. 2010).
This study is the first to validate the PHQ-9 against a structured diagnostic interview in a pregnant population. The sensitivity (85 %) and specificity (84 %) rates for a diagnosis of depressive disorder met or exceeded those hypothesized. Validity measures were comparable to the sensitivity and specificity rates of 88 % in the original validation study of the PHQ-9 in primary care and obstetrics–gynecology clinics (Kroenke et al. 2001); 80 % sensitivity and 92 % specificity in a meta-analysis of 14 PHQ-9 diagnostic studies (Gilbody et al. 2007); and 82 % sensitivity and 84 % specificity in a study of postpartum women (Gjerdingen et al. 2009). This study found higher sensitivity and specificity values than those (74 and 73 %) found in a study using provider diagnosis as the validation criterion in pregnant women (Flynn et al. 2011). This study also found higher sensitivity than, and similar specificity to, the results (54 % sensitivity and 84 % specificity) of a study of the PHQ-8 instrument (which excludes the question on suicidal ideation) in a sample of pregnant women using a structured diagnostic interview (Smith et al. 2010). Our finding that positive predictive value was substantially higher when measured against a subdiagnostic level of depression means that the PHQ-9 also identified many women with pervasive distress who could benefit from an appropriate intervention.
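The gap between the two positive predictive values follows largely from prevalence. As a rough check, the sketch below applies Bayes' rule to the rounded rates reported in this paper: 3.6 % prevalence for a diagnosis, and 3.6 % plus 7.0 % (10.6 %) for the expanded group. The function name is hypothetical, and because the inputs are rounded, the results only approximate the reported values.

```python
# Illustrative sketch using the rounded rates reported in the text.
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)

# Diagnosis only: sensitivity 85 %, specificity 84 %, prevalence 3.6 %.
ppv_diagnosis = ppv_from_rates(0.85, 0.84, 0.036)

# Diagnosis or subdiagnosis: sensitivity 75 %, specificity 88 %,
# prevalence 3.6 % + 7.0 % = 10.6 %.
ppv_expanded = ppv_from_rates(0.75, 0.88, 0.106)
```

Rounded to two decimals, these come to about 0.17 and 0.43, matching the reported positive predictive values of 17 and 43 %.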
One potential difficulty with assessing depression prenatally is that several symptoms of depression—fatigue, appetite changes, and sleep problems—are also associated with pregnancy. For the PHQ-9, these symptoms were scored at face value. For the SCID, interviewers are instructed to probe for “medical” problems that might account for these symptoms. The interviewer guide we developed for the study included specific probes to determine whether the study participant thought her somatic symptoms were pregnancy-related. If she thought so, the interviewer did not ask about symptom frequency, precluding our ability to conduct an analysis to see to what extent this study procedure affected SCID diagnosis rates and concordance with the PHQ-9. Excluding somatic symptoms is a conservative approach and, in our study, may have produced an underestimate of depression and reduced concordance between the PHQ-9 and the SCID. Future studies should explore the implications of counting or excluding somatic symptoms in depression diagnoses during pregnancy.
Another limitation of the study is the lack of generalizability beyond the study-specific population of urban women served by community health care centers. However, the inclusion of substantial numbers of African-American and American-Indian women in the study sample is important, as birth outcomes are poorest among these populations (MacDorman 2011). Additionally, the exclusion of women who were not fluent in English limits the generalizability of the findings. Among the women who consented, there were some differences between those who completed the SCID and those who did not. However, these sample differences were relatively small and were more likely to affect the prevalence of depression than the concordance between the two study instruments. Nonetheless, more research is needed on mental health screening among diverse immigrant populations.
Another consideration for the generalizability of the study is that the PHQ-9 was embedded in a multidimensional screening interview that first asked about basic needs and social support. This introduction may have provided a context in which women were more comfortable disclosing emotional distress. Furthermore, there were methodological differences in administration. We used face-to-face interviews for both instruments. Other studies have had women complete questionnaires in isolation, via a paper-and-pencil or computerized questionnaire. The extent to which methods of administration affect candor among prenatal populations or subpopulations is unknown.
Prenatal screening for depression serves two purposes: identifying patients who may meet diagnostic criteria as well as those whose pervasive symptoms may adversely affect their pregnancy despite falling short of a diagnostic threshold. An earlier study of a similar population found that PHQ-9 risk classifications were significantly correlated with other risk factors measured by the PRO, most notably food insecurity and housing instability (Harrison and Sidebottom 2008). Casting a wider net to identify emotional distress during pregnancy, and simultaneously screening for other stressors and risk behaviors, provides critically important information to assist providers in discussing the context of a woman's depression with her and working jointly to develop an appropriate care plan.
This research was funded by grant number R40MC07840 from the Department of Health and Human Services, Health Resources and Services Administration. The authors acknowledge Stacye Ballard for conducting the diagnostic interviews at study sites and Carol B. Peterson for providing SCID training to the research team. The Community University Health Care Center and NorthPoint Health and Wellness Center in Minneapolis and West Side Community Health Services East Side Family Clinic in St. Paul served as study sites.
Conflicts of interest
The authors have no financial conflicts of interest.
References
- American Psychiatric Association (2000) Diagnostic and statistical manual of mental disorders (DSM-IV), 4th edn. APA, Washington DC
- First MS, Spitzer RL, Gibbon M, Williams J (2005) Structured clinical interview for DSM-IV-TR axis I disorders, research version-patient edition (SCID-I/P). Biometrics Research Department, New York State Psychiatric Institute, New York
- Flynn HA, Sexton M, Ratliff S, Porter K, Zivin K (2011) Comparative performance of the Edinburgh Postnatal Depression Scale and the Patient Health Questionnaire-9 in pregnant and postpartum women seeking psychiatric services. Psychiatry Res 187(1–2):130–134. doi:10.1016/j.psychres.2010.10.022
- Harrison PA, Godecker A, Sidebottom AC (2011b) Validation of the alcohol use module from a multidimensional prenatal psychosocial risk screening instrument. Matern Child Health J. doi:10.1007/s10995-011-0926-2