Significance

What is already known on this subject? Postpartum depression has a high prevalence, and its early detection and treatment improve the prognosis of both mother and child. Screening for postpartum depression may be valuable to improve detection and mother and child outcomes, if implemented in the right setting.

What this study adds? This review supplies an overview of the current evidence on the value of screening for PPD in a well-baby care setting. The evidence found is limited but promising; it shows that screening in WBC leads to higher detection, referral and treatment and, when combined with enhanced care, to improved depression scores.

Introduction

Children’s early social-emotional development affects their mental health during their entire life-course. The parents’ mental health problems can affect this development negatively. One of the most frequent mental health problems that mothers encounter after delivery is postpartum depression (PPD). An analysis of 28 prevalence studies showed that 7.1 % of women suffer from major depression in the first 3 months postpartum. When minor depression was included, the prevalence increased to 19.2 % (Gavin et al. 2005). Children of mothers who had experienced PPD have more difficulties in their cognitive, social-emotional and language development, and have higher levels of internalizing and externalizing behavior, as well as general psychopathology later in life (Goodman et al. 2011; Kingston and Tough 2012; Brand and Brennan 2009). Early treatment of maternal PPD may reduce these problems (Wan and Green 2009; Sohr-Preston and Scaramella 2006).

Depression can be treated effectively in several ways (O’Hara and McCabe 2013), but many cases of PPD remain undetected, partly because mothers face barriers to discussing their feelings (Liberto 2012) and partly because the professionals they encounter do not recognize the symptoms or fail to discuss them (Heneghan et al. 2000). Therefore, several articles on PPD advocate incorporation of screening in public healthcare (Gavin et al. 2005; Liberto 2012). Well-baby care (WBC) may be a very promising setting for early detection of maternal PPD as this setting provides routine check-ups during the first year after delivery (Gjerdingen et al. 2011). The intention of WBC is to monitor the child’s development and health, including the wellbeing of the parents. Examples of systems supplying this care are the well-child care in the United States, health visitors in the United Kingdom, Child and Family Health care in Australia and preventive child health care systems in various European countries. Systems providing WBC often have large coverage. In some countries, WBC is delivered to 95–99 % of newborn children (van den Heuvel et al. 2013), thereby also reaching the majority of postpartum mothers.

A few reviews on the efficacy of screening for PPD are available (Myers et al. 2013; Hewitt et al. 2009), but none of these specifically address the value of screening in a WBC setting. We therefore systematically reviewed the evidence on the effectiveness of screening for PPD in WBC compared to no screening, regarding mother and child outcomes and report our findings here according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) Statement (Liberati et al. 2009).

Methods

Search Method

A search was performed by the first author (A.Z.-B.) in three electronic databases: Scopus (including all the citations in PubMed and Embase from 1996), PsycINFO and CINAHL. We searched the databases for publications up to May 2014. The search strategies were based on the MeSH terms (MEDLINE thesaurus) available for the subject and the key terms extracted from the background literature. Three main concepts were combined and fed into the search engine: postpartum depression, early identification, and well-baby care setting.

As the subject is related to several research areas (psychiatry, child development, primary health care, women’s health), we added a number of synonyms for each concept. We created several alternative terms for the well-baby care setting as the nature of this kind of setting varies from country to country. Full details of the search strategy in Scopus are reported in Appendix 1. We used the same search strategy for PsycINFO and CINAHL, except for the exclusion of subject areas as these databases do not have this option.

Selection Process

Two of the authors, A.Z.-B. and M.B.-B., independently assessed the eligibility of the resulting publications in three rounds. The first selection was based on the title. Next, the abstracts of the selected articles were reviewed according to the inclusion and exclusion criteria (Table 1), based on the PICOTS categories (Population, Intervention, Comparators, Outcomes, Timing and Setting). In the final round, the selected articles were judged after full-text reading. Selected articles that appeared to be reviews were hand searched by one reviewer, A.Z.-B., for additional references. In each stage of the selection process, the reviewers used one of three response options to indicate whether an article should go to the next stage: “yes”, “no”, or “maybe”. The outcomes of the two independent reviewers were compared before proceeding to the next stage. Titles, abstracts and articles with differing opinions were discussed and reread if necessary. An independent third reviewer could be consulted to resolve remaining disagreements, but this proved to be unnecessary. The author of one article (Yawn et al. 2012) was contacted to obtain more information on the setting before deciding on its inclusion.
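The comparison step between the two reviewers can be sketched as follows. This is a minimal illustration only: the function name `split_by_agreement` and the record identifiers are hypothetical, and the sketch covers only what the text states, namely that records with differing opinions were set aside for discussion.

```python
ALLOWED_VOTES = {"yes", "no", "maybe"}


def split_by_agreement(votes_a, votes_b):
    """Compare two reviewers' votes on the same records.

    Returns (agreed, to_discuss): a dict of records with identical votes
    and a list of record ids on which the reviewers differ.
    """
    if votes_a.keys() != votes_b.keys():
        raise ValueError("both reviewers must rate the same records")
    agreed, to_discuss = {}, []
    for record_id, vote_a in votes_a.items():
        vote_b = votes_b[record_id]
        if vote_a not in ALLOWED_VOTES or vote_b not in ALLOWED_VOTES:
            raise ValueError(f"unexpected vote for {record_id}")
        if vote_a == vote_b:
            agreed[record_id] = vote_a
        else:
            to_discuss.append(record_id)
    return agreed, to_discuss


# Hypothetical votes, for illustration only (not data from the review).
reviewer_1 = {"rec1": "yes", "rec2": "maybe", "rec3": "no"}
reviewer_2 = {"rec1": "yes", "rec2": "no", "rec3": "no"}
agreed, to_discuss = split_by_agreement(reviewer_1, reviewer_2)
print(agreed)
print(to_discuss)
```

How agreed "maybe" votes were handled is not described in the text, so the sketch leaves that decision to the caller.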

Table 1 Inclusion and exclusion criteria

A flow diagram of the selection procedure is shown in Fig. 1. Seven articles, concerning six individual studies, met the inclusion criteria and were used in this review.

Fig. 1 Flow diagram of study selection

Quality Assessment

To assess the quality of the included studies, the reviewers independently applied the Quality Assessment tool for Quantitative Studies, developed by the Effective Public Health Practice Project (EPHPP) (Armijo-Olivo et al. 2012). Studies were rated on six aspects: selection bias, study design, confounders, blinding, data collection method, withdrawals and dropouts. The aspects were explored by answering guiding questions and were next rated according to established criteria, e.g. for an aspect like data collection methods, rating depended on the validity and reliability of the data collection tools. A study received a strong global rating when none of the aspects were weak, a study with one weak aspect was rated as moderate, and two or more weak aspects resulted in a weak global rating. Differences in quality ratings were discussed and agreement was reached by critically applying the criteria again. In addition to the standard EPHPP scoring, possible study specific biases were investigated by comparing method and result sections on contradictions and missing data.
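The global rating rule described above (no weak aspects gives strong, one gives moderate, two or more give weak) can be written out as a short function. This is a sketch under stated assumptions: the function name `global_rating` and the example ratings are hypothetical and are not taken from Table 3 or from the EPHPP tool's own materials.

```python
# The six aspects rated with the EPHPP tool, as listed in the text.
ASPECTS = (
    "selection bias",
    "study design",
    "confounders",
    "blinding",
    "data collection method",
    "withdrawals and dropouts",
)


def global_rating(aspect_ratings):
    """Derive the global rating from the six aspect ratings.

    aspect_ratings maps each aspect to "strong", "moderate" or "weak".
    """
    missing = [a for a in ASPECTS if a not in aspect_ratings]
    if missing:
        raise ValueError(f"missing aspect ratings: {missing}")
    weak_count = sum(1 for a in ASPECTS if aspect_ratings[a] == "weak")
    if weak_count == 0:
        return "strong"
    if weak_count == 1:
        return "moderate"
    return "weak"


# Hypothetical study with a single weak aspect (illustration only).
example = {
    "selection bias": "moderate",
    "study design": "strong",
    "confounders": "weak",
    "blinding": "moderate",
    "data collection method": "strong",
    "withdrawals and dropouts": "moderate",
}
print(global_rating(example))
```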

Data Synthesis

One reviewer (A.Z.-B.) extracted the data from the six selected studies using a predefined data extraction form. The results of the two articles on the same study, Glavin et al. (2010) and Glavin (2012), were compared, and no conflicting or contradicting data were found. The data categories are presented in Table 2. The authors of all the included studies were approached for more information on certain aspects, like setting or population; three out of six authors responded and answered our questions. We described the differences and similarities of the studies in terms of setting, population, the intervention applied including specific screening aspects like instrument and timing, and the outcome measures used. After presenting the results of the quality assessment, a narrative synthesis was undertaken. The included studies were reviewed for a shared summary effect measure like risk ratio (RR) or odds ratio (OR), expressing the effect of screening on primary outcomes such as an improvement of depression scores. The extracted data were not pooled or analyzed statistically because of the small number of studies, the differences in the compared interventions, and the heterogeneity of the outcome measures and time horizons.

Table 2 Main characteristics of the included studies (N = 6)

Results

Setting and Population

The characteristics of the six included studies are presented in Table 2. The settings of the studies (Yawn et al. 2012; Glavin 2012; Glavin et al. 2010; Chaudron et al. 2004; Leung et al. 2011; Carroll et al. 2013; Gerrard et al. 1993) differ in location and in the professionals performing the screening. In the studies by Chaudron et al. (2004) and Carroll et al. (2013), care was delivered by the pediatric staff from a primary care center. In the Norwegian Glavin et al. study (2010, 2012), public health nurses screened the mothers at well-baby clinics, a comparable setting to that of the Leung et al. (2011) study in Hong Kong, where nurses screened the mothers at Maternal and Child Health Centers. The screening investigated by Gerrard et al. (1993) was carried out by trained health visitors at baby clinics in England. Yawn et al. (2012) focused on family medicine research network practices in 21 US states; 22 of the included practices offered continuity to the mother and her child, and six only to the mother. Pediatrician offices offering services only to the child were excluded. Apart from these six mother-only practices in the Yawn et al. study, all practices offered frequent appointments to both mother and child. In the first year postpartum the frequency varied from 7 to 10. The intention of the settings was to serve the general population and to reach 90–100 % of the mothers of newborn children in their area. The frequency and outreach of the services in the Gerrard et al. study (1993) could not be verified.

Intervention Content

The interventions offered in the various studies differed greatly. Those in the Chaudron et al. (2004) and Carroll et al. (2013) studies consisted mainly of incorporating screening questionnaires into the regular visits. In addition, Carroll et al. used a decision support system, incorporated in an electronic medical support system. Depending on the answers on the screening questionnaire, the system created reminders to guide clinicians during the visit. Four of the six studies (Yawn et al. 2012; Glavin et al. 2010; Leung et al. 2011; Gerrard et al. 1993) investigated an intervention consisting of both screening and enhanced care. In the Glavin et al. study (2010) screening was one of several components of the intervention and was followed by a standard supportive counseling session with the Public Health Nurse for all mothers. Depressed mothers received follow-up supportive counseling sessions. Yawn et al. (2012) evaluated a practice-based training program for screening, diagnosis, and management of mothers with PPD. Intervention practices were provided with a set of tools to facilitate each part of the process. Leung et al. (2011) also described the steps following screening: participants with a positive EPDS were directed to another nurse for counseling. During this session, subsequent management was recommended. This could be either non-directive counseling by a Maternal and Child Health Centre (MCHC) nurse or referral to the community psychiatric team. These steps were also offered to mothers clinically observed as depressed, and were therefore not limited to the intervention. Mothers with elevated EPDS scores in the post-training group of the Gerrard et al. study (1993) were offered 4–8 non-directive counseling visits by their health visitor.

Screening Instrument, Cut-off Score and Timing

Five studies used the EPDS as the screening instrument; four (Yawn et al. 2012; Glavin et al. 2010; Chaudron et al. 2004; Leung et al. 2011) had the same cut-off score of ≥10 and one, by Gerrard et al. (1993), selected 12 as the cut-off score. Glavin et al. (2010) and Chaudron et al. (2004) mentioned that clinical judgment should confirm the EPDS indication of a mother as probably being depressed. Leung et al. (2011) also considered a positive answer on question ten (suicidal ideation) as indicative. Carroll et al. (2013) adapted a validated two-question depression screening tool into an existing pre-screening form. In the study by Yawn et al. (2012), mothers with an EPDS score of ≥10 were asked to complete the Patient Health Questionnaire (PHQ-9) as well. A mother was considered to have PPD if her PHQ-9 score was ≥10 and the physician’s evaluation revealed no other cause for the depressive symptoms. Carroll et al. (2013) reported that the PHQ-9 was added as a hand-out to one of the two intervention arms to assist the physician in diagnosing depression, but no PHQ-9 data were shown in the results. In the studies by Leung et al. (2011), Glavin et al. (2010) and Yawn et al. (2012), screening was performed once, at 2 months, at 6 weeks and between 5 and 12 weeks postpartum, respectively. In the Chaudron et al. study (2004), mothers received the EPDS at each well-child visit during the child’s first year, starting with the routine 2-week visit. In the study by Carroll et al. (2013), mothers were screened every 3 months until the child reached the age of 15 months. Health visitors in the Gerrard et al. study (1993) were instructed to screen at 6–8 weeks and/or 10–12 weeks, depending on the number of training sessions attended by the health visitor.
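The screening rules described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of any included study: the function names are hypothetical, and treating "a positive answer on question ten" as an item score above 0 is an assumption (EPDS items are scored 0 to 3).

```python
def epds_screen_positive(total, cutoff=10, item10=None, flag_item10=False):
    """Return True if the EPDS result counts as a positive screen.

    cutoff: 10 in most included studies, 12 in Gerrard et al. (1993).
    flag_item10: if True, any item 10 score above 0 also counts as
    positive (assumed reading of the rule in Leung et al. 2011).
    """
    if total >= cutoff:
        return True
    return bool(flag_item10 and item10 is not None and item10 > 0)


def two_step_ppd(epds_total, phq9_total, other_cause_found):
    """Two-step rule as described for Yawn et al. (2012).

    The PHQ-9 is only completed after an EPDS score of 10 or above, so
    phq9_total may be None for mothers who screened negative.
    """
    if epds_total < 10 or phq9_total is None:
        return False
    return phq9_total >= 10 and not other_cause_found


# Hypothetical scores, for illustration only.
print(epds_screen_positive(11))             # above the cut-off of 10
print(epds_screen_positive(11, cutoff=12))  # below the cut-off of 12
print(two_step_ppd(12, 14, other_cause_found=False))
```

In the studies that required clinical judgment to confirm the EPDS indication, a positive result from such a rule would only be a prompt for the professional's assessment.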

Outcome Measures

The types of primary outcomes depended on the study design. Studies examining screening without enhanced care (Chaudron et al. 2004; Carroll et al. 2013) used documented depressive symptoms and referrals, indicated in Table 2 as primary outcomes at process level. Five studies (Yawn et al. 2012; Glavin et al. 2010; Chaudron et al. 2004; Leung et al. 2011; Carroll et al. 2013) reported the rates of elevated scores on their screening instrument at the moment of intervention. None of the studies used a gold standard to confirm the PPD diagnosis. The four studies that examined screening combined with enhanced care (Yawn et al. 2012; Glavin et al. 2010; Leung et al. 2011; Gerrard et al. 1993) also used the screening instrument of their intervention as a primary outcome measure for maternal depressive symptoms later in the postpartum year. Regarding secondary outcomes, different outcome measures were used. Three of those studies (Yawn et al. 2012; Glavin et al. 2010; Leung et al. 2011) used the Parenting Stress Index (PSI). The only secondary outcome at child level was the child’s body weight at 6 and 18 months presented by Leung et al. (2011).

Study Quality

Table 3 shows the outcomes of the Quality Assessment tool for Quantitative Studies (Armijo-Olivo et al. 2012).

Table 3 Quality of the 6 included studies, assessed with the Quality Assessment tool for Quantitative Studies (Armijo-Olivo et al. 2012)

Four of the six studies (Glavin et al. 2010; Chaudron et al. 2004; Carroll et al. 2013; Gerrard et al. 1993) were globally rated as weak, according to this Quality Assessment tool. All four studies had a weak score on description and control of possible confounders. In both the Chaudron et al. (2004) and the Carroll et al. (2013) study, the data collection methods were weak because the data were based on health care provider documentation, which was incomplete and, in the control groups, not based on valid instruments.

Interpretation of Results

Four studies presented screening outcomes at process level (Table 2) (Yawn et al. 2012; Chaudron et al. 2004; Leung et al. 2011; Carroll et al. 2013). The effect on the detection rate when screening for PPD was quantified in three of the six studies (Chaudron et al. 2004; Leung et al. 2011; Carroll et al. 2013). The calculated RRs for detection of PPD in the studies by Chaudron et al. (2004) and Leung et al. (2011) were, respectively, 5.3 (8.5 %/1.6 %) and 4.8 (29 %/6 %). Improvement in the rate of referral in the study by Carroll et al. (2013) was presented with an OR of 2.06 (95 % confidence interval (CI) 1.08–3.93). We also calculated RRs for three further process outcomes: for referral to a social worker in the study by Chaudron et al. (2004) the RR was 18 (3.6/0.2), for receiving treatment in the study by Leung et al. (2011) the RR was 4.9 (23.8/4.8), and for being diagnosed with PPD in the study by Yawn et al. (2012) the RR was 1.6 (66 %/41 %). Carroll et al. (2013) mentioned that adding handouts to the screening process resulted in earlier referral, but no data were presented.
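The RRs above are simple ratios of the rate in the screened group to the rate in the comparison group. The arithmetic can be reproduced with a minimal sketch; the function name `risk_ratio` is hypothetical, and the inputs are the rounded values printed in the text, not the underlying study counts.

```python
def risk_ratio(rate_intervention, rate_control):
    """Ratio of the outcome rate with screening to the rate without."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return rate_intervention / rate_control


# Rounded rates as printed in the text (intervention, control).
reported_rates = {
    "Chaudron et al. (2004), detection": (8.5, 1.6),
    "Leung et al. (2011), detection": (29.0, 6.0),
    "Chaudron et al. (2004), referral to social worker": (3.6, 0.2),
    "Yawn et al. (2012), PPD diagnosis": (66.0, 41.0),
}

for label, (intervention, control) in reported_rates.items():
    print(f"{label}: RR = {risk_ratio(intervention, control):.1f}")
```

The treatment comparison in Leung et al. (23.8/4.8) is left out of the list because the rounded inputs give about 4.96 rather than the reported 4.9; the reported value was presumably computed from unrounded data, which is an assumption here.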

The four studies in which screening and enhanced care were combined in the intervention (Yawn et al. 2012; Glavin et al. 2010; Leung et al. 2011; Gerrard et al. 1993), which include the two strong studies, showed significant improvement of depression scores later in the postpartum year in the intervention arms. In the Leung et al. study (2011), mothers in the intervention group had an RR of 0.59 (95 % CI 0.39–0.89) for having an elevated EPDS (≥10) at 6 months postpartum. In the Glavin et al. study (2010), mothers in the intervention group had an OR of 0.5 (95 % CI 0.3–0.8) for having an elevated EPDS (≥10) and in the Gerrard et al. study (1993) the post-training group had an RR of 0.51 (9.8 %/19.3 %) for an EPDS of 12 or above. Mothers in the intervention group in the Yawn et al. (2012) study had an OR of 1.74 (95 % CI 1.05–2.86) for having a ≥5-point drop in PHQ-9 score between baseline and 12 months postpartum. Of the mothers in the study of Glavin et al. (2010) who had an EPDS score of 10 or above at 6 weeks postpartum, those in the intervention group had a larger improvement in EPDS scores from 6 weeks to 12 months postpartum than those in the control group (effect size 0.53). We could not create a summarized effect size as the measurement moments and outcome measures in the six included studies varied too much.

Regarding secondary outcomes, there were no results on child development or social-emotional wellbeing. No significant difference was found with respect to the child’s weight in the Leung et al. study (2011). At parent level, no statistically significant differences were found in the secondary outcomes measuring long-term effects (Table 2), except in the study by Glavin et al. (2010), in which the intervention group had a better score on the PSI Health subscale at 12 months postpartum.

Discussion

This review has identified limited but promising evidence for the effectiveness of screening for PPD on maternal health outcomes. Four of the six studies (Yawn et al. 2012; Chaudron et al. 2004; Leung et al. 2011; Carroll et al. 2013) indicate an increase in the detection rate of depressive symptoms or in referral or treatment rates, and four studies report a reduction in depressive symptoms at 3, 6 or 12 months postpartum (Glavin et al. 2010; Leung et al. 2011; Gerrard et al. 1993; Yawn et al. 2012). Screening for PPD led to significant changes in the measured secondary outcomes at mother level in only one study; no relevant outcomes were measured at child level. Both strong quality studies were conducted in a setting providing care for both mother and child, with an intervention consisting of a combination of screening with some enhancement of care. It was not possible to untangle the effect of screening from the offer of extra care.

The improvement in depression scores, combined with the lack of effect on secondary outcomes, is comparable with findings from studies on screening for PPD in general. In the HTA review of Hewitt et al. (2009) outcomes were combined. This resulted in a pooled OR of 0.64 (95 % CI 0.52–0.78) for scoring above the threshold for depression for women in an intervention group compared to the control group. This effect is comparable to those demonstrated by Leung et al. (2011) and Yawn et al. (2012). The HTA review also encountered the same problem of disentangling the effect of screening from that of enhancement of care, and the same lack of evidence of improvement in other maternal and child outcomes. The Agency for Healthcare Research and Quality (AHRQ) report (Myers et al. 2013) selected some of the same studies as our review, and also concludes that screening has a positive effect on depressive symptoms, but that effects on secondary outcomes have not been proven.

The included studies may not have fully exploited the potential of screening for PPD in WBC, for several reasons. One aspect is the timing of screening; the potential benefit of screening in a WBC setting may lie mainly in the possibility of repeated screening and continuous follow-ups. However, only three studies, all of weak quality, had repeated screening interventions (Chaudron et al. 2004; Carroll et al. 2013; Gerrard et al. 1993). Furthermore, mothers in the control group of other studies (Yawn et al. 2012; Leung et al. 2011), with high scores on the screening instrument or suicidal thoughts at the time of intervention, were also given follow-up advice for ethical reasons. This may have reduced the effect of the intervention on secondary outcomes.

Another factor influencing the secondary outcomes may have been the follow-up process after screening. Recent studies (Myers et al. 2013; Yawn et al. 2012) advise incorporating follow-up care within the same (primary care) setting as the screening, which is the case in the two strong studies (Yawn et al. 2012; Leung et al. 2011). Although significantly more mothers in the intervention groups were diagnosed and/or treated, a substantial number of the depressed women did not receive this follow-up care. As a consequence, screening might have been less effective. Finally, most of the included studies used ≥10 as the EPDS screening cut-off score. According to Hewitt et al. (2009) this is the optimal cut point when screening for both major and minor depression, while 12 is optimal when screening for major depression only. Use of different cut points may affect the effectiveness of screening.

Only one study measured the effect of screening for PPD at child level by including the child’s weight. As the effect of PPD on the child’s wellbeing is an important argument in favor of the necessity of screening, we expected studies examining both screening and enhanced care to also include some outcomes at child level. Possible explanations for not including outcomes at child level might be the limited options for standardization of the quality of care after screening and for measuring social-emotional development in the first year after birth. In addition, controlling the moderators and mediators influencing the social-emotional development is difficult.

Strengths and Limitations

Although many countries have preventive child health care incorporated in their health care system, nomenclature proved to be quite diverse. We carefully identified the different options to ensure we included the most relevant articles in our search. Another strength of our review is the thorough systematic search of three extensive databases, supplemented by systematic hand searches of reviews included in the search. Every step of the selection process was consistently executed and judged by two independent reviewers.

A limitation may be that we did not search the grey literature for evidence, thus some relevant studies may have been missed. Reporting bias may have influenced the outcomes of this review, as the studies included in the review only reported the positive effect of screening.

Implications

Screening for postpartum depression calls for a setting that has the facility to combine screening with the judgment of a professional, reaches most new mothers, has professionals available who are in a position to create a bond of trust, and offers frequent contact to the mother in the first year postpartum. Professional preventive services for child healthcare can meet all of these criteria, and our current review supports the potential of screening in WBC with positive evidence. The small number of studies limits the precision of the effect estimates.

Future research should aim at creating stronger evidence of the possible benefits of this combination of characteristics when screening in a WBC setting. General aspects of the design and intervention need attention, such as cut-off scores, gold standards to be used, a control group and the possibility of separating the effect of screening and subsequent offers of extra care. Moreover, new research should explore the benefits of repeated screening during the first year postpartum and, preferably, also include outcomes at child level.

Conclusions

The evidence in this review on the effectiveness of screening for PPD in a WBC setting is promising, though based on a limited number of studies. The use of a validated instrument like the EPDS led, in all the included studies, to significantly higher detection of mothers with depressive symptoms or, when screening was combined with enhanced care, to improvement of depression scores. Whether this leads to better outcomes for mother and child in the long term requires additional high-quality research. The potential health gains of screening for PPD in a WBC setting are large but need to be confirmed.