Background

It is generally accepted that the detection of ante- and postnatal depression is crucial for the prevention of adverse effects on maternal and infant mental health [1]. Accordingly, recommendations for screening programs in primary care often include screening for antenatal and postnatal depression [2]. However, researchers have begun to question whether screening for mental health problems in the perinatal period should be limited to detection of depression [3, 4].

Anxiety is prevalent in the perinatal period [5, 6] and can have significant impacts on the mother as well as on the offspring [7,8,9]. Moreover, perinatal anxiety often co-occurs with depression [3]. For example, a population-based study of postnatal women (N = 522) found that 52% of those who reported depressive symptoms also presented with high levels of anxiety [10]. Another study (N = 4451) found a prevalence of postnatal anxiety of 18%, and 35% of these women also reported symptoms of depression [5]. A recent meta-analysis shows that the prevalence of co-morbid depression and anxiety disorder is 8.2% during the first 24 weeks postnatal [3]. Based on such results, it may be argued that perinatal screening programs should include the detection of women with high levels of anxiety [3, 11].

Yet, when implementing public health screening programs, the screening program must be time and resource-effective, and in most primary healthcare settings, it is not feasible to add extra instruments to already implemented screening programs. Taking a pragmatic approach, in the current study, we investigate the utility of using a subscale of the Edinburgh Postnatal Depression Scale (EPDS: [12]) to screen for anxiety in postpartum mothers.

The EPDS is one of the most validated perinatal screening instruments [13] and is implemented in many primary healthcare settings around the world, including in Denmark, where the current study was conducted. Although the EPDS was intended to be unidimensional, studies have demonstrated that the scale is multidimensional with at least two factors, including an ‘anxiety factor’ as well as a depression factor. Most studies find that the anxiety factor includes items 3, 4, and 5 (for a review, see Kozinszky et al. [14]). This factor has been referred to as the EPDS-3A [1]; henceforth, this term is used in the current paper. In a validation study of the Danish version of the EPDS [15], using exploratory factor analysis (EFA), we also found the EPDS to be multidimensional with a sub-factor consisting of items 3, 4, and 5.

Some authors suggest that the EPDS-3A can be used to screen for anxiety. Matthey (2008) [16] used principal components analysis to confirm the presence of items 3, 4, and 5 as a sub-factor and examined Receiver Operating Characteristics (ROCs) in relation to fulfilling diagnostic criteria for an anxiety disorder (N = 238). He found a cutoff score of ≥ 6 to be effective in detecting at least one anxiety disorder (sensitivity: 66.7%; specificity: 88.2%). Eighteen women met criteria for a diagnosis of anxiety, and 11 (61%) of these were not detected using the total EPDS cutoff score. Another study [17] also used principal components analysis to confirm the EPDS-3A and correlated EPDS-3A scores with scores on a psychosocial risk factors questionnaire. A score of ≥ 4 was suggested as the optimal cutoff as it identified the top quartile of the sample. In this study, the proportion of women identified with the EPDS-3A, above those identified with the total EPDS, was not investigated. Stasik-O’Brien et al. [18] identified the EPDS-3A with EFA and used the cutoff suggested by Matthey [16] to detect mothers with emotional distress and anxiety symptoms in an at-risk sample. In this study, no measure of anxiety was used to confirm the positive EPDS-3A score. Of the 200 participants, 43 (21.9%) scored above the EPDS-3A cutoff and ten (23.3%) of these did not screen positive on the total EPDS.

In contrast, two studies conclude that the EPDS-3A does not perform well enough to be used for screening for anxiety in the postnatal period. Among a range of screening tools, Fairbrother et al. [19] evaluated the EPDS-3A for the detection of postnatal anxiety (N = 360). A diagnostic interview was used as the criterion. To be recommended for widespread clinical application, the authors argued that a screening tool should evidence an area under the ROC curve (AUC) ≥ 0.8. The EPDS-3A performed slightly better than the total EPDS for detecting anxiety (AUCs: 0.757 vs. 0.744). However, with an AUC of 0.757, the authors conclude that the accuracy of the EPDS-3A was too low for it to be recommended as an anxiety screener. In a community sample (N = 550) [20], the AUC for the EPDS-3A (which in this study consisted of items 3, 4, and 5 along with item 10) was compared with the AUC for the total EPDS for detecting clinical levels of anxiety, with the Spielberger State-Trait Anxiety Inventory (STAI-6 [21]) as the criterion. While the AUC for the EPDS-3A was 0.729, it was 0.811 for the total EPDS, indicating that the total EPDS performed better than the EPDS-3A in terms of discriminating anxiety cases from non-cases. Similar results appeared when leaving out item 10 from the EPDS-3A. The authors conclude that adequate screening for anxiety requires an additional effort on top of the EPDS.

In sum, the evidence for the utility of using the EPDS-3A in routine screenings is mixed. The three studies advocating for using the EPDS-3A are limited by the use of different methodologies and relatively small sample sizes; importantly, only one of them [16] validated the scale against anxiety caseness status.

Therefore, research is needed to inform decisions on whether the EPDS-3A could be used as a ‘good enough’ anxiety screener if including an extra instrument in a screening program is not feasible. Without adding extra time or burden on the screened woman, using the EPDS-3A could provide an efficient method for the detection of women in need of treatment who otherwise might be overlooked.

In the present study, we addressed the following research questions: (1) Can the Danish EPDS-3A be confirmed in a new sample of postnatal mothers recruited from the same population from which participants were recruited for the Danish validation of the EPDS [15]? (2) Can the EPDS-3A be used as an acceptable screening instrument in the detection of anxiety cases? We addressed the second question by (a) evaluating ROCs of the EPDS-3A in relation to anxiety caseness status, and (b) investigating whether the EPDS-3A increases identification of women in need of further assessment and treatment above those identified by the total EPDS score.

Methods

Procedure and sampling

This study was part of a larger research project, the Copenhagen Infant Mental Health Project, that comprises the evaluation of several screening instruments for use in primary care, including a Danish validation of the EPDS published previously [15], and a treatment trial [22]. In the present study, we used two datasets: The dataset used for the Danish EPDS validation (data collected June 2015–July 2017: ‘original EPDS validation sample’) and a new dataset for which data was collected between August 2017 and June 2019 (‘new sample’).

Participants were recruited via public health visitors during routine home visits. At two months postpartum, and in some cases also at later visits, all mothers in Copenhagen are offered screening with the EPDS. To ensure recruitment of a sufficient number of women scoring in the high range of the EPDS, an oversampling strategy was used, similar to that used in other studies [23]. Accordingly, all mothers scoring 10 or more at the routine screening were invited to participate in the study. Additionally, to cover the whole range of the EPDS, a subgroup of health visitors invited women scoring 0–9 (for a detailed description of the recruitment and sampling strategy, see [15]). A psychologist conducted a second home visit where written informed consent was obtained, questionnaires on sociodemographic information were filled in, and the EPDS was administered again. The EPDS data collected during this visit were used in the present study. Participants received an online survey that included an anxiety measure. From August 2017, the protocol for the psychologist’s home visit was changed so that only mothers who were enrolled in the treatment trial received the online survey. Therefore, anxiety data were available for a smaller number of women than EPDS and sociodemographic data. Inclusion criteria were: mother at least 18 years old, speaks and reads Danish, and infant between 2 and 10 months.

Measures

The Edinburgh Postnatal Depression Scale [12] is a well-validated 10-item self-report questionnaire (range 0–30) designed to screen for possible depression in new mothers [See also Additional file 1: Edinburgh Postnatal Depression Scale]. In the Danish validation study of the EPDS [15], a cutoff score of ≥11 was found to be the most efficient to detect depression according to both ICD-10 and DSM-5 diagnostic criteria for depression (sensitivity: 79.2 and 78.2 respectively; specificity: 94.4; PPV: 49.0). This corresponds to findings from a recent meta-analysis of the diagnostic accuracy of the EPDS using individual participant data from 58 studies (N = 15,557), which found a cutoff of 11 or more to maximize combined sensitivity and specificity across reference standards [24]. Each item can be scored from 0 to 3, and thus, the range of the EPDS-3A subscale is 0–9. In the current sample, Cronbach’s alpha for the full EPDS was 0.88.

Hopkins Symptom Check-List, SCL-63 [25, 26] was used to establish anxiety caseness status. The questionnaire includes 63 items each rated on a five-point Likert Scale ranging from 0 (not at all) to 4 (extremely). The timeframe is the past week. The SCL-63 was validated in a Danish adult population (N = 1152) [25]. National norms and cutoffs have been established in a population sample of 2040 adult Danish women [26]. We used the cutoff for Danish women of the anxiety scale to define anxiety caseness. The cutoff indicates the demarcation between normal distress and clinical anxiety cases, which is in contrast to diagnostic systems, where classifications are based on diagnostic symptom criteria and not on quantitative measures of symptom severity with population norms. Cronbach’s alpha for the SCL anxiety subscale was 0.86 in this sample.

Data analysis

Research question 1

Because there are several validated structures for the EPDS [14] and as EPDS-3A has already been identified with EFA in a Danish population of postnatal women [15], we used confirmatory factor analysis (CFA) to confirm the Danish EPDS-3A in the new sample [27]. IBM AMOS 26.0 [28] was used to conduct CFA as well as preliminary analyses to CFA.

Preliminary analyses

Prior to CFA, we used Mahalanobis Distance to screen for multivariate outliers. Multivariate normality kurtosis coefficient was assessed using a critical ratio of 5.0 as threshold as indicated by AMOS [28]. Since the presence of outliers and non-normal distribution of data can potentially cause inflated model fit statistics and standard error bias, we applied robust methods to handle both multivariate outliers and non-normality [29] as suggested by previous research [30]. Data screening indicated the presence of outliers and non-normal distribution of data (see also results) and therefore, we used bootstrapping as a robust solution for these violations [31]. The Bollen–Stine bootstrap procedure has been shown to perform just as well as robust maximum likelihood procedures and to perform effectively, despite excessive non-normality [29], and was therefore applied.

CFA

We compared a unidimensional model consisting of all 10 EPDS items with two different two-dimensional models to determine which measurement model best fitted the data. The tested models were: (1) a 10-item, one-factor model, (2) a 7-item, two-factor model based on the EFA results from the Danish validation of the EPDS, and (3) an alternative two-factor model based on results from previous studies [32]. The initial EFA of the Danish EPDS suggested a three-factor structure where item 10 (the ‘self-harm item’) emerged as a separate third factor in addition to a depression factor consisting of items 1, 2, 8, and 9, and the anxiety factor consisting of items 3, 4, and 5. However, as also reported previously [15], parallel analysis did not firmly establish whether the correct number of factors was two or three, and in terms of proportion of explained variance, the first two factors were far more important than the third “one-item factor” that explained less than 10% of the variance. As factors with fewer than three items are weak and unstable [27], in Model 2 we omitted the third factor consisting of item 10. In Model 2, we also omitted items 6 and 7 because they cross-loaded on factors 1 and 2 in the EFA, thus resulting in a 7-item two-dimensional model. In the alternative 10-item two-factor model (Model 3, based on [32]), we included all items in the depression factor except for items 3, 4, and 5.

The CFAs were conducted using maximum likelihood estimation with and without bootstrapping (set at 200 samples), and the Bollen–Stine bootstrap p-value was used to assess fit in addition to model fit indices [29]. These included: the χ2 test statistic (χ2/df), the Comparative Fit Index (CFI), the Non-Normed Fit Index, or Tucker Lewis Index (TLI), and the Root Mean Square Error of Approximation (RMSEA) with accompanying 90% confidence interval. A good model fit is obtained when χ2/df ≤ 3, CFI and TLI > 0.90, and RMSEA < 0.08. With χ2/df values < 2 and RMSEA < 0.05, the model has an excellent fit [33, 34]. Additionally, Modification Indices (MI) for chi-square change were considered for each model, and item errors were allowed to covary based on (a) MI values indicative of the greatest chi-square improvement, (b) theoretically meaningful covariances (such as similar or reversed items), and (c) whether the items loaded on the same factor [28]. The factor loading threshold was established at 0.40 for moderate loadings [35]. Finally, for model comparison we evaluated Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC), with lower values indicative of a superior model [36]. Following model specification by CFA, internal consistency of the superior model was examined using Cronbach’s alpha. However, as alpha for a scale comprising more than one dimension has been demonstrated to underestimate reliability, construct reliability was additionally examined using McDonald’s coefficient omega (ω) and the hierarchical omega coefficient (h − ω) [37, 38]. These were calculated using an OMEGA macro extension for IBM SPSS 26 [39].
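The fit thresholds described above can be illustrated with a minimal Python sketch. It is not the AMOS procedure used in the study: the function names are hypothetical, and the RMSEA point estimate assumes the common formula sqrt(max(χ2 − df, 0) / (df · (N − 1))). Using the unmodified Model 1 values reported in the Results (χ2 = 157.907, df = 35) and assuming N = 442 (the size of the new sample), this formula gives an RMSEA that rounds to the reported 0.089.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate; assumes the (N - 1) denominator convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def fit_label(chi2: float, df: int, n: int, cfi: float, tli: float) -> str:
    """Classify fit using the thresholds stated in the text."""
    ratio = chi2 / df
    err = rmsea(chi2, df, n)
    incremental_ok = cfi > 0.90 and tli > 0.90
    if incremental_ok and ratio < 2 and err < 0.05:
        return "excellent"
    if incremental_ok and ratio <= 3 and err < 0.08:
        return "good"
    return "below thresholds"


# Unmodified Model 1 as reported in the Results; N = 442 is an assumption.
model1_rmsea = rmsea(157.907, 35, 442)
model1_label = fit_label(157.907, 35, 442, cfi=0.916, tli=0.892)
print(round(model1_rmsea, 3), model1_label)
```

The sketch covers only the point estimates; the 90% confidence interval for RMSEA and the Bollen–Stine bootstrap p-value require the fitted model and are not reproduced here.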

Research question 2

All subsequent analyses were conducted in R version 3.6.3 on a dataset combined of the original EPDS-validation sample and the new sample, and this data set was reweighted to match the target population. The calculation of sample weights was based on EPDS scores from 4931 postnatal women screened by health visitors in Copenhagen effectively giving us the population-wide distribution of EPDS scores (see also, [15]).
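As a rough illustration of this kind of reweighting, the sketch below computes post-stratification weights in Python. It assumes one weighting cell per EPDS score, with each woman weighted by the population share of her score divided by the sample share; the exact scheme used in the study is described in [15] and may differ. The function name and the toy data are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_scores, population_scores):
    """Weight each sample member by (population share) / (sample share) of her score."""
    n_sample = len(sample_scores)
    n_population = len(population_scores)
    sample_counts = Counter(sample_scores)
    population_counts = Counter(population_scores)
    return [
        (population_counts[score] / n_population) / (sample_counts[score] / n_sample)
        for score in sample_scores
    ]


# Hypothetical toy data: high scorers are 20% of the population
# but 50% of the (oversampled) study sample.
population = [2] * 8 + [12] * 2
sample = [2, 2, 12, 12]
weights = poststratification_weights(sample, population)
print(weights)
```

With weights of this form, oversampled high scorers count for less than one woman each and undersampled low scorers for more, so weighted summaries approximate the population-wide score distribution.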

To define anxiety caseness status to be used as the criterion in the ROC analysis of the EPDS-3A, we used the raw score cutoff for the Danish SCL-63 anxiety scale (=1.15) [26]. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and AUC for all relevant cutoffs were computed directly from the reweighted data for anxiety caseness status. Confidence intervals were computed by embedding the calculations in a weighted logistic regression which provided confidence intervals corrected for weights. Because data-driven selections of optimal cutoffs may lead to overestimates of sensitivity and specificity [40], prior to analysis, we set criteria for what we considered acceptable levels of sensitivity and specificity when the EPDS-3A is used for first-phase screening purposes in addition to the full EPDS. Thus, to avoid overwhelming clinical services with many inappropriate referrals, we prioritized a high specificity (≥ 90%) while still aiming at a sensitivity that ensured the detection of the majority of anxiety cases (≥ 70%).
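A minimal Python sketch of how sensitivity, specificity, PPV, and NPV can be computed from weighted data at a given cutoff is shown here. It is an illustration under stated assumptions, not the R code used in the study: the function names and the toy data are hypothetical, and the weighted confidence intervals and the AUC are not covered.

```python
def weighted_screening_metrics(scores, is_case, weights, cutoff):
    """Weighted 2x2 screening metrics; a score >= cutoff counts as screen-positive."""
    tp = fp = fn = tn = 0.0
    for score, case, weight in zip(scores, is_case, weights):
        positive = score >= cutoff
        if positive and case:
            tp += weight
        elif positive:
            fp += weight
        elif case:
            fn += weight
        else:
            tn += weight
    # Note: a cell sum of zero would raise ZeroDivisionError; not handled here.
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def meets_criteria(metrics, min_sensitivity=0.70, min_specificity=0.90):
    """Pre-specified acceptance criteria from the text."""
    return (metrics["sensitivity"] >= min_sensitivity
            and metrics["specificity"] >= min_specificity)


# Hypothetical toy data: cases were oversampled, so they carry smaller weights.
scores = [7, 6, 3, 5, 2, 1, 4, 0]
is_case = [True, True, True, False, False, False, False, False]
weights = [0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0]
metrics = weighted_screening_metrics(scores, is_case, weights, cutoff=5)
print(metrics, meets_criteria(metrics))
```

Repeating the calculation for each candidate cutoff and keeping only those that satisfy `meets_criteria` mirrors the idea of fixing the acceptance criteria before looking at the data.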

Cross tabulation was used to investigate the proportion of anxiety cases identified by the EPDS-3A but not by the total EPDS.

Table 1 Sample characteristics

Results

In the new sample, EPDS data was available for 442 women. The combined dataset comprised 762 women (including the original EPDS-validation sample, N = 320). Sample characteristics are shown in Table 1. Anxiety data was available for 532 women. Of these, 161 (30.26%) presented with clinical levels of anxiety. In the combined dataset, mean total EPDS score was 11.25 (SD = 5.58; range = 0–28) and for the EPDS-3A, it was 4.56 (SD = 2.23; range: 0–9).

Research question 1: confirmation of the EPDS-3A

Preliminary analyses

Data screening revealed 63 multivariate outliers. Because a review of these cases showed no data inaccuracies, the outliers were not omitted from the data set. Kurtosis and skewness scores for each item fell between + 2 and − 2, except for item 10 (“The thought of harming myself has occurred to me”). The multivariate normality kurtosis coefficient was 8.751, with a critical ratio of 5.938, which was above the threshold of 5.0. In sum, data screening indicated the presence of outliers and non-normal distribution of data, potentially causing inflated model fit statistics and standard error bias. The Bollen–Stine bootstrap yielded a p-value for estimation of overall model fit, with a threshold of p ≥ .05 indicative of a model with excellent data fit [29, 41].

CFA results

CFA results are presented in Table 2. Fit indices of Model 1, the unidimensional, 10-item model of the EPDS, initially revealed a poor model fit: χ2/df (157.907/35) = 4.521, p < .001, CFI = 0.916, TLI = 0.892, RMSEA = 0.089 (0.075–0.104). Bollen–Stine p = .005 also indicated poor overall model fit. Modification Indices (MI) indicated covariances between errors for item 1 (“I have been able to laugh and see the funny side of things”) and item 2 (“I have looked forward with enjoyment to things”), between item 4 (“I have been anxious or worried for no good reason”) and item 5 (“I have felt scared or panicky for no good reason”), and between item 8 (“I have felt sad or miserable”) and item 9 (“I have been so unhappy that I have been crying”). After inspection of the suggested MI and item properties, we added error covariances between these items. As a result, the model fit indices improved to a close to excellent fit; however, Bollen–Stine p = .005 suggested an overall poor fit. Factor loadings were significant and moderate to high (0.52–0.75), except for item 10 (self-harm), which had a factor loading of 0.35. CFA for Model 2, the EFA-driven two-factor, 7-item model, demonstrated good fit for CFI = 0.952 and TLI = 0.920, but not for the χ2/df statistic (59.679/13) = 4.592, p < .001, RMSEA = 0.090 (0.068–0.114), or the Bollen–Stine p = .005. However, when adding the suggested error covariances between items 1 and 2 and between items 8 and 9 on the depression factor, the model had an excellent fit across all indices (Table 2). All item factor loadings were moderate to high (between 0.61 and 0.75) and significant onto their respective factor. Model 3, the alternative two-dimensional, 10-item model, revealed an overall acceptable fit, except for χ2/df (118.422/34) = 3.483, p < .001; CFI = 0.942, TLI = 0.924, RMSEA = 0.075 (0.061–0.090). Bollen–Stine p = .002 suggested a poor fit. The MIs suggested adding error covariances between items 1 and 2 and between items 8 and 9 on the depression factor, resulting in a model with a marginally excellent fit; however, Bollen–Stine p = .006 indicated a poor overall model fit. All factor loadings were significant, with the majority having a moderate to high loading onto their respective factor (0.61–0.76), except for item 10, which had a loading of 0.34 on the depression factor. Comparison of both AIC and BIC statistics across the three models revealed that the EFA-driven two-factor, 7-item model (Model 2) demonstrated a superior fit.

Cronbach’s alpha was α = 0.80 for the depression factor, α = 0.68 for the anxiety factor, and α = 0.82 for the total EPDS with 7 items. Coefficient omega was consistent with the alpha values, i.e. ω = 0.80 for the depression factor, ω = 0.69 for the anxiety factor, and ω = 0.81 for the total EPDS. The hierarchical omega coefficient was similar to the other reliability measures, i.e. h − ω = 0.81 for the total EPDS and h − ω = 0.79 for the depression sub-factor without items 3, 4, and 5.
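For readers unfamiliar with the statistic, Cronbach’s alpha for a k-item scale is k/(k − 1) · (1 − Σ item variances / variance of the sum score). The Python sketch below applies this standard formula to hypothetical toy responses for a three-item subscale scored 0–3; it does not reproduce the study data or the SPSS macro used for omega, and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one list of k item scores per respondent."""
    k = len(responses[0])
    item_variances = [
        variance([respondent[i] for respondent in responses]) for i in range(k)
    ]
    total_variance = variance([sum(respondent) for respondent in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from five women to three items (each scored 0-3).
toy_responses = [
    [0, 1, 1],
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 3],
    [1, 2, 1],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))
```

Because alpha depends on the number of items, short subscales such as a three-item factor tend to yield lower values than longer scales with comparable inter-item correlations.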

Table 2 Confirmatory factor analyses of the EPDS

Research question 2: the utility of the EPDS-3A in detecting anxiety

The distribution of raw EPDS-3A scores is presented in Fig. 1. The unusual distribution is due to the sampling strategy. The reweighted distribution, reflecting the distribution in the target population, is presented in Fig. 2.

Fig. 1 EPDS-3A scores (raw counts)

Fig. 2 EPDS-3A scores (weighted)

Table 3 presents sensitivity, specificity, PPV, NPV, and AUC for anxiety caseness status for different EPDS-3A cutoff scores. The AUC was 0.926. Employing our criteria for selecting an optimal cutoff, an EPDS-3A score of ≥ 5 was suggested by the data.

Table 3 Sensitivity, specificity, PPV, NPV, and AUC for anxiety of the EPDS-3A

Table 4 shows the proportion of anxiety cases identified by the EPDS-3A and the total EPDS respectively. The EPDS-3A and the total EPDS work in similar ways with each scale identifying 70.9% of the true anxiety cases. Of the anxiety cases identified by the EPDS-3A, 96.5% were also identified by the total EPDS. Moreover, the EPDS-3A identified 92.2% of true non-cases; this number was 92.7% for the total EPDS. Adding the EPDS-3A to the total EPDS leads to correct identification of an additional 2.5% of the true anxiety cases.
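The additional yield follows directly from the two percentages reported above, as the short Python sketch below shows. It is simple arithmetic on the rounded, published figures rather than a re-analysis of the weighted data, and the function and variable names are illustrative.

```python
def incremental_yield(sensitivity_subscale, overlap_with_total):
    """Share of true cases flagged by the subscale but not by the total scale."""
    return sensitivity_subscale * (1.0 - overlap_with_total)


sens_epds3a = 0.709   # true anxiety cases identified by the EPDS-3A
sens_total = 0.709    # true anxiety cases identified by the total EPDS
overlap = 0.965       # EPDS-3A-identified cases also identified by the total EPDS

extra = incremental_yield(sens_epds3a, overlap)
combined = sens_total + extra
print(f"additional cases: {extra:.1%}, combined sensitivity: {combined:.1%}")
```

Under these figures, 3.5% of the 70.9% of cases identified by the EPDS-3A are missed by the total EPDS, which corresponds to roughly 2.5% of all true anxiety cases and a combined sensitivity of about 73%.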

Table 4 Comparison of the total EPDS and the EPDS-3A with respect to percentage of identified cases with clinical levels of anxiety

Discussion

The objective of this study was to examine whether the EPDS items 3, 4, and 5 (EPDS-3A) can be used as a time and resource-effective way of screening for anxiety in the postnatal period in settings where the EPDS is already implemented and where implementation of additional instruments to detect anxiety is not feasible.

First, CFAs of the three models indicated that the EFA-driven two-factor structure of the EPDS, consisting of an anxiety factor (items 3, 4, and 5) and a depression factor (items 1, 2, 8, and 9) [15], was a better measurement model for the present data than either a unidimensional model including all 10 original EPDS items or an alternative two-factor model found in a previous study [28], which included all 10 items and consisted of an anxiety factor (items 3, 4, and 5) and a depression factor including all remaining items. Overall, the different estimates of reliability indicated acceptable internal consistency (indicated by Cronbach’s alpha) and construct reliability (indicated by coefficient omega), although estimates for the anxiety factor were just below the generally agreed upon threshold of 0.70 (e.g., [32, 42]).

Taken together, our results confirmed the presence of a sub-factor consisting of items 3, 4, and 5, also found in previous studies [14], suggesting that the EPDS-3A is a stable phenomenon across samples and cultures. Second, in this Danish postnatal sample, the EPDS-3A evidenced an AUC of 0.926, indicating that the EPDS-3A has high discriminative power for detecting clinical levels of anxiety.

A score ≥ 5 was suggested by the data as the optimum cutoff for detecting clinical levels of anxiety. This cutoff fulfilled our criteria for selecting the cutoff, with a sensitivity ≥ 70% and a specificity ≥ 90%. In comparison, Matthey [16] suggested a cutoff of ≥ 6 to be the appropriate cutoff (sensitivity: 66.7%; specificity: 88.2%), and Swalm et al. [17] suggested a cutoff of ≥ 4 to be optimal. These differences may be explained by several factors. Swalm et al. [17] did not use an external criterion for validating their cutoff and instead suggested their cutoff because it identified the top quartile of their sample. With regard to the study by Matthey [16], differences in optimal cutoffs may be explained by differences in sample size and/or the classification of anxiety. In the sample used by Matthey [16], 18 (7.6%) of the mothers presented with anxiety, whereas in our sample 161 (30.26%) mothers had clinical levels of anxiety. It should be noted that the high proportion of anxiety cases in our study is due to the oversampling strategy employed and does not represent prevalence rates in the target population. Still, this strategy provided us with a solid basis for estimating accurate ROCs of the EPDS-3A as compared with the sample used in the Matthey [16] study. Moreover, Matthey [16] used a diagnostic interview when establishing caseness status, which is often considered a more ‘conservative’ approach compared with self-report measures, and this may also explain the difference in cutoffs. Finally, cultural differences between Matthey’s (2008) Australian sample and our Danish sample may also explain the differences in cutoff scores.

An important aim of this study was to investigate the proportion of women identified by the EPDS-3A beyond those identified with the total EPDS. Some authors [16,17,18] have argued that adding the EPDS-3A to routine screening with the EPDS leads to a substantial additional identification of women in need of referral and treatment who might otherwise be overlooked. However, our results suggest that the vast majority (> 95%) of mothers with clinical levels of anxiety identified by the EPDS-3A are also identified using the total EPDS. Using the EPDS-3A alongside the total EPDS score would lead to an additional correct identification of 2.5% of the anxiety cases in a Danish population. Furthermore, it would lead to false identification of 2% of anxiety non-cases. Thus, our results show that implementing the EPDS-3A into routine screening with the EPDS only leads to a minor increase in the percentage of women in need of referral. Based on these results, it could therefore be argued that spending extra effort on checking the EPDS-3A score during a routine screening with the EPDS should not be recommended. In terms of identifying anxiety, however, the EPDS-3A does not perform worse than the total EPDS. The total EPDS and the EPDS-3A each correctly identify 71% and miss 29% of the true anxiety cases. While it is not a perfect anxiety screening instrument, we propose that the EPDS-3A provides a brief, time-efficient tool with high discriminative power (AUC = 0.926) for detecting cases of anxiety if no other screening instruments are available or possible to implement.

Finally, our results show that the total EPDS indeed identifies a large group (≥ 70%) of women with clinical levels of anxiety. Considering our results concerning the EPDS-3A, this is not surprising, yet the implications are of importance. Rowe et al. (2008) accordingly showed that the total EPDS detects but does not distinguish anxiety disorders from depression in mothers [43]. In their study, the EPDS correctly identified 28 women (44%) as having major depression, either alone or co-morbid with an anxiety disorder. However, 10 mothers (16%) screening positive on the EPDS only had an anxiety disorder and did not fulfill criteria for a diagnosis of depression. Based on these results, it could be argued that it is a limitation of the EPDS that it is not specific to depression. On the other hand, since high levels of co-morbidity have been consistently demonstrated in women with psychological difficulties after birth [11, 44], it could also be argued that it is, in fact, a strength of the instrument that it detects a range of difficulties and not just depression. Otherwise, there would be a risk of missing a substantial number of new mothers who need referral and treatment, with potential adverse consequences for the infant. Still, it is important for clinicians to keep in mind that the EPDS identifies a heterogeneous group of women and that a positive screening result may reflect anxiety either as the primary condition or comorbid with depression.

It is a limitation of the current study that only women from the urban Copenhagen area were included. This may limit generalizability to the whole population. Further, it could be argued that it is a limitation that we used a self-report measure (the SCL) to establish anxiety caseness status. While diagnostic interviews are usually considered the ‘gold standard’, it can be questioned whether a diagnosis is more valid for the detection of women in need of referral and treatment than the SCL used in our study. Diagnostic classification is based on diagnostic symptom criteria and not on comparison of quantitative measures of general distress and symptom severity with population norms. The purpose of the SCL cutoffs for caseness is to indicate the demarcation between normal distress and clinical cases (e.g. clinical anxiety), which in our perspective is in accordance with the purpose of screening for mental health issues in perinatal women. Moreover, the SCL-63 used in our study is thoroughly validated [25, 26], and the cutoff is based on population norms for adult Danish women.

We also believe that our study has some strengths. The use of an oversampling strategy ensured high numbers of women scoring within the high range of the EPDS (and in turn the EPDS-3A). Further, weighting the sample according to the population-wide distribution of EPDS scores in Danish postnatal women enabled us to obtain estimates corresponding to those from a random sample of the full population, but with substantially higher statistical power.

Conclusions

In this study of postnatal women, a cutoff of ≥ 5 on the EPDS-3A was found to be efficient for identifying women experiencing clinical levels of anxiety (sensitivity: 70.9%; specificity: 92.2%). In settings where the EPDS is already implemented and where adding extra mental health screening instruments is not feasible, the EPDS-3A could be used as a resource-effective means of detecting mothers with possible anxiety disorder. At the same time, adding the EPDS-3A to routine screening with the EPDS only leads to a minor increase in the percentage of women in need of referral because the vast majority of women screening positive on the EPDS-3A also screen positive on the total EPDS. Using the EPDS-3A score along with the total EPDS score can indicate whether a mother may be suffering from anxiety either co-morbid with depression or as the primary problem.