Introduction

Mood and anxiety disorders in the perinatal period affect up to a quarter of women (Howard and Khalifeh 2020). Timely and accurate diagnosis is essential (Kroenke 2020), as early interventions have the greatest potential to improve the wellbeing of mothers and children (Phua et al. 2017; Letourneau et al. 2017). However, as it has been estimated that over half of cases go undetected (National Childbirth Trust 2017; Sudhanthar and Thakur 2019), guidelines have recommended screening and case-finding strategies (National Institute for Health and Care Excellence 2014; Austin et al. 2017; Scottish Intercollegiate Guidelines Network (SIGN) 2012; American College of Obstetricians and Gynecologists 2018).

However, screening strategies to date have focused predominantly on perinatal depression. This is a significant shortcoming, as exposure to prenatal maternal anxiety has detrimental behavioural and cognitive effects on the offspring (van der Zee-van den Berg et al. 2019; O’Donnell et al. 2017) and is associated with changes in early brain development (Lautarescu et al. 2020). Further, prenatal anxiety is associated with an increased risk of severe postnatal depression (Norhayati et al. 2015). A recommendation to expand antenatal screening to include a tool to assess for anxiety may be impractical in the context of antenatal clinics that are already under significant pressure. An alternative would be to adapt our current screening tools to identify potential anxiety disorders.

The most commonly used screening questionnaire for perinatal depression is the 10-item self-rating Edinburgh Postnatal Depression Scale (EPDS). This was originally developed for postnatal depression (Cox et al. 1987), but its use has since been expanded to prenatal populations (Kozinszky and Dudas 2015; Vázquez and Míguez 2019; Smith-Nielsen et al. 2018). A total score of 13 or more is typically considered to indicate depressive symptoms (Cox et al. 1987; Milgrom and Gemmill 2015), but a recent meta-analysis suggested that a cut-off of 11 or more maximises combined sensitivity and specificity across reference standards (Levis et al. 2020).

Growing evidence from factor analysis studies suggests that the EPDS is not a unidimensional measure of depression and may be a useful tool for screening for perinatal anxiety (Matthey 2008). More specifically, 3 of the EPDS questions, namely items 3 (“I have blamed myself unnecessarily when things went wrong”), 4 (“I have been anxious or worried for no good reason”) and 5 (“I have felt scared or panicky for no very good reason”), are suggested to constitute an anxiety subscale, called the EPDS-3A. In perinatal women, the EPDS-3A has been found in studies using the English version of the EPDS (Ross et al. 2003; Jomeen and Martin 2007; Matthey 2008; Tuohy and McVey 2008; Cunningham et al. 2015) as well as in studies using translated versions of the EPDS, such as the Chinese (Lau et al. 2010), Spanish (Hartley et al. 2014), Japanese (Kubota et al. 2014), Hebrew (Bina and Harrington 2016) or Danish (Smith-Nielsen et al. 2018) versions. It is important to note that some studies do not find the EPDS-3A as a separate factor (e.g. Coates et al. 2017; Phillips et al. 2009), some report an anxiety factor including other items (e.g. Adouard et al. 2005) and some report that a one-factor model is the best fit for the data (e.g. Lydsdottir et al. 2019).

A growing number of researchers have suggested a separate analysis of the EPDS-3A score to screen for perinatal anxiety (Jomeen and Martin 2005; Swalm et al. 2010; Matthey 2008; Phillips et al. 2009; Tuohy and McVey 2008). High scores on the EPDS-3A have been associated with anxiety disorders (Matthey 2008) and with being a “worrier” (Swalm et al. 2010), and are more strongly associated with anxiety scores than with depression scores (Loyal et al. 2020). Further, the EPDS-3A has the potential to be particularly helpful in detecting anxiety disorders without comorbid depression, in women who may otherwise not reach the total cut-off score necessary for further action (Matthey 2008; Muzik et al. 2000). A cut-off of 6 or more was validated in a postnatal community sample (Matthey 2008), while a cut-off of 4 or more was validated in a sample of women with unsettled infants (Phillips et al. 2009). However, some researchers have voiced concerns regarding the suitability of the EPDS-3A as a screening measure (Adhikari et al. 2020; van der Zee-van den Berg et al. 2019; Matthey and Agostini 2017; Matthey et al. 2013).

Although previous studies have examined the factor structure of the EPDS, the methodology used for exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) has been inconsistent. Most EFA studies have incorrectly used orthogonal rotations, which assume that the factors are uncorrelated (e.g. Swalm et al. 2010; Maroto-Navarro et al. 2005; Mazhari and Nakhaee 2007; Vivilaki et al. 2009; Flom et al. 2018) or principal component analysis, a technique more appropriate for data reduction (e.g. Brouwers et al. 2001; Adouard et al. 2005; Zhong et al. 2014; Agampodi and Agampodi 2013; Töreki et al. 2013; Lau et al. 2010; Montazeri et al. 2007; Chabrol and Teissedre 2004; Matthey 2008). Other studies have used varying methods including polychoric correlation matrices (Lydsdottir et al. 2019), or Pearson correlation matrices with maximum likelihood extraction (MLE; Kubota et al. 2014, 2018; Phillips et al. 2009; Stasik-O’Brien et al. 2019; Coates et al. 2017), ordinary least squares (Chiu et al. 2017) or principal axis factoring (Tuohy and McVey 2008; Pop et al. 1992). CFA studies have used either MLE (Zhong et al. 2014; Flom et al. 2018; Hartley et al. 2014; Kozinszky and Dudas 2015) or weighted least squares methods (Jomeen and Martin 2007; Kwan et al. 2015; Lydsdottir et al. 2019; King 2012; Gutierrez-Zotes et al. 2018; Martin and Redshaw 2018).

The variability in methodology may explain differences in results reported in the literature.

Other methodological limitations in previous studies include small sample sizes, treating the EPDS as an interval rather than an ordinal scale, not reporting cross-loadings, not accounting for non-normally distributed data, not reporting the frequency of responses for each item, and performing CFA on the same sample on which EFA was done. Further, most studies have focused on postpartum women, with only two UK-based studies investigating the factor structure of the EPDS antenatally (Jomeen and Martin 2005, n = 101; Jomeen and Martin 2007, n = 148).

In this study, we aim to overcome some of the methodological shortcomings of previous research and to assess the factor structure of the EPDS, both prenatally and postnatally, in several subsamples, including self-identified high-risk women and community samples. As a past history of anxiety is a risk factor for perinatal anxiety (Leach et al. 2017; Field 2018), a secondary aim of our study is to assess whether the EPDS-3A is associated with maternal history of anxiety disorders.

Methods

Participants

Participants were recruited between March 2015 and March 2020 as part of the developing Human Connectome Project (dHCP, community sample) and the Perinatal Stress Study (high-risk sample). Ethical approval was obtained from the Riverside NHS Research Ethics Committee in the UK (14/LO/1169 and 18/LO/0786).

The dHCP is a large-scale neuroimaging project, with eligibility criteria including pregnant women (aged 16 years or older), with a gestational age of 20–42 weeks, and newborn infants aged 24–44 weeks. The Perinatal Stress Study included pregnant women (any trimester) who self-identified as experiencing low mood during pregnancy (Supplement).

Measures

All participants were asked to complete the English version of the EPDS. Total scores were calculated with cut-offs set at 11 or more and at 13 or more. EPDS-3A scores were calculated with cut-offs set at 4 or more and 6 or more.
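The scoring rules above can be illustrated with a minimal sketch. It assumes that the 10 item scores are already coded 0 to 3 in questionnaire order (with any reverse-keying applied beforehand); the function name `score_epds` and the output keys are hypothetical and are not taken from the study's analysis code.

```python
from typing import Sequence

ANXIETY_ITEMS = (3, 4, 5)  # EPDS-3A items, numbered as in the questionnaire


def score_epds(item_scores: Sequence[int]) -> dict:
    """Score one completed EPDS (illustrative helper, not the study's code).

    Assumes item_scores holds the 10 item scores in questionnaire order,
    already coded 0-3.
    """
    if len(item_scores) != 10:
        raise ValueError("expected 10 item scores")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("item scores must be integers from 0 to 3")
    total = sum(item_scores)
    epds_3a = sum(item_scores[i - 1] for i in ANXIETY_ITEMS)
    return {
        "total": total,
        "epds_3a": epds_3a,
        "total_ge_11": total >= 11,   # cut-off of 11 or more
        "total_ge_13": total >= 13,   # cut-off of 13 or more
        "epds_3a_ge_4": epds_3a >= 4,  # cut-off of 4 or more
        "epds_3a_ge_6": epds_3a >= 6,  # cut-off of 6 or more
    }


# Hypothetical respondent: elevated on the anxiety items only.
example = score_epds([0, 1, 2, 2, 2, 1, 0, 1, 0, 0])
```

In this made-up example the respondent would exceed both EPDS-3A cut-offs while remaining below both total-score cut-offs.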

For the Perinatal Stress Study, women completed the EPDS online, alongside a short demographics questionnaire (maternal age, gestational age, and GP details). Eligible pregnant women (total EPDS 13 or more, BMI < 30, no contraindications for MRI) were invited to take part in the dHCP (Supplement).

For the dHCP sample, women were asked to complete the EPDS at the time of their visit to St Thomas’ Hospital in London for an MRI scan (prenatally and/or postnatally). Participants also completed a questionnaire pack which included demographics, medical history, and mental health history. Participant history of mental health concerns (coded as binary yes/no) was determined based on a combination of multiple sources: maternal self-report (in the questionnaire pack), maternity notes, and mental health records from South London and Maudsley NHS Foundation Trust (Supplement). There are no measures of current depression or anxiety symptomatology other than the EPDS in this study.

Statistical analysis

Data analysis was performed using R (R Core Team 2018). Throughout the manuscript, “ < ” is used to signify “less than” and “ > ” is used to signify “greater than.” Each dataset was divided randomly into two subsets: 40% for EFA and 60% for CFA. Missing data were handled by listwise deletion, so only questionnaires with complete data were included in the analysis (McNeish 2017). To evaluate the internal consistency of the instrument, Cronbach’s alpha (α) was calculated (acceptable value above 0.70). We also calculated McDonald’s omega (ω) (Hayes and Coutts 2020), as it has been suggested that α is only informative in restrictive settings (Raykov 2004). Raincloud plots were used for visualisation (Allen et al. 2019). Raincloud plots combine split-half violins (showing the probability density of the data), boxplots (showing the median and interquartile range), and raw data points jittered for improved visibility.
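As an illustration only (the analysis itself was run in R), the listwise deletion, random 40/60 split, and Cronbach’s α described above could be sketched as follows. The helper names, the use of `None` to mark a missing item, and the fixed random seed are assumptions made for this example.

```python
import random
from statistics import variance


def listwise_delete(rows):
    """Keep only questionnaires with no missing item (None marks missing)."""
    return [r for r in rows if all(v is not None for v in r)]


def split_efa_cfa(rows, efa_fraction=0.4, seed=0):
    """Randomly split rows into an EFA subset and a CFA subset."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_efa = round(len(shuffled) * efa_fraction)
    return shuffled[:n_efa], shuffled[n_efa:]


def cronbach_alpha(rows):
    """alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: three complete questionnaires and one with a missing item.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [None, 1, 2]]
complete = listwise_delete(toy)
alpha = cronbach_alpha(complete)

# A 10-row dataset splits into 4 rows for EFA and 6 rows for CFA.
efa_rows, cfa_rows = split_efa_cfa([[i] for i in range(10)])
```

The toy items are perfectly correlated, so α equals 1 here; real questionnaire data would give a lower value.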

To ensure data were suitable for factor analysis, we calculated the Kaiser–Meyer–Olkin measure of sampling adequacy (acceptable limit of > 0.5) and the Bartlett test of sphericity (p < 0.05 indicating that the correlations between items are sufficiently large). To detect multicollinearity, we calculated the determinant of the correlation matrix (acceptable value above 0.00001) (Watkins 2018; Jackson et al. 2009; Dziuban and Shirkey 1974).
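Two of these checks, the determinant of the correlation matrix and the Bartlett test statistic, can be sketched with the standard library alone. This is a simplified illustration: it takes a correlation matrix as input (whether Pearson or polychoric is left to the caller), it returns the chi-square statistic and degrees of freedom without a p value (which would need a chi-square distribution function from a statistics library), and it omits the Kaiser–Meyer–Olkin measure. The function names are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def bartlett_sphericity(corr, n):
    """Bartlett statistic: chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, df = p(p - 1) / 2."""
    p = len(corr)
    det = determinant(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det)
    df = p * (p - 1) // 2
    return chi2, df, det


# Toy two-item correlation matrix with r = 0.5 and a hypothetical n of 100.
chi2, df, det = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n=100)
multicollinearity_ok = det > 0.00001  # threshold stated in the text
```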

In subsamples where the frequency of positive responses on item 10 (self-harm) was small, factor analysis was performed on only 9 items, to avoid calculation of potentially negative eigenvalues that would yield non-positive definite matrices (Wothke 1993, as per Chiu et al. 2017; Flom et al. 2018).

EFA

EFA was conducted on 40% of the sample. The number of factors for EFA was determined using the conventional Kaiser criterion (eigenvalues above 1), a scree plot, and parallel analysis using Minimum Rank Factor Analysis (MRFA, Shapiro and Berge 2002). MRFA was chosen due to its putative superiority in the identification of the number of factors for ordinal data and performance relative to methods such as Horn’s parallel analysis or those based on principal axis factoring (Baglin 2014). As the variables are ordinal, we used polychoric correlations to correct for bias (Holgado-Tello et al. 2010). However, given that the majority of analyses reported to date have treated the EPDS as an interval scale, we also repeated our analysis using a Pearson correlation matrix (Supplement).

The EFA was conducted using MLE with oblique (oblimin) rotation, as the factors are expected to be correlated (Jomeen and Martin 2005). All loadings of 0.3 or more (including cross-loadings) were included. This cut-off of 0.3 (Howard 2016; Martin and Thompson 1999, 2000; Martin et al. 2004) was applied to generate a more complete psychological interpretation of the data (Jomeen and Martin 2005), while a coefficient of 0.5 or more was used to indicate substantial factor loadings. A factor solution was considered meaningful if it explained at least 50% of variance (Streiner 1994).
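The loading thresholds can be made concrete with a small sketch that labels each item’s loadings. The loading values below are invented for illustration, the helper name `summarise_loadings` is hypothetical, and comparing absolute values against the thresholds is an assumption rather than something stated in the text.

```python
def summarise_loadings(loadings, salient=0.3, substantial=0.5):
    """Flag salient (>= 0.3) and substantial (>= 0.5) loadings per item.

    loadings maps an item number to its loadings on factors 1..m.
    An item is treated as cross-loading if it is salient on more than one factor.
    """
    summary = {}
    for item, row in loadings.items():
        salient_factors = [f for f, v in enumerate(row, start=1) if abs(v) >= salient]
        substantial_factors = [f for f, v in enumerate(row, start=1) if abs(v) >= substantial]
        summary[item] = {
            "salient": salient_factors,
            "substantial": substantial_factors,
            "cross_loading": len(salient_factors) > 1,
        }
    return summary


# Invented loadings for three items on three factors.
toy_loadings = {
    1: [0.72, 0.05, 0.10],
    4: [0.02, 0.81, 0.12],
    6: [0.10, 0.35, 0.41],
}
loading_summary = summarise_loadings(toy_loadings)
```

In this toy input, items 1 and 4 each load substantially on a single factor, whereas item 6 is salient on two factors without reaching 0.5 on either.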

An additional EFA was conducted on all 10 items using the whole sample (n = 1190), to determine whether there was a common factor structure for women across the perinatal period. This included all high-risk participants and one timepoint (randomly selected) for each participant in the community sample.

CFA

CFA was conducted on the remaining 60% of the data using weighted least squares mean and variance (WLSMV), which uses polychoric correlations and robust corrections to account for ordinal and non-normally distributed data (e.g. Lydsdottir et al. 2019; Albuquerque et al. 2017; Martin and Redshaw 2018). As much of the previous literature has used MLE, we also performed CFA using this method (results reported in Supplement).

To test the model fit, we used chi-square statistics, the comparative fit index (CFI, Bentler 1990), and Tucker-Lewis index (TLI, Tucker and Lewis 1973), with values above 0.95 indicating a good fit (Hu and Bentler 1999) and the Root Mean Square Error of Approximation (RMSEA, Schumacker and Lomax 2010), with values under 0.05 indicating adequate fit (Schumacker and Lomax 2010). Goodness of fit was also considered based on the clearest factor structure (i.e. items loading highly on only one factor, and few cross-loadings), plausibility, and interpretability.
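For readers less familiar with these indices, the sketch below computes CFI, TLI, and RMSEA from the chi-square statistics of the fitted model and of the baseline (independence) model, using the standard textbook formulas. It is an illustration under assumptions: software differs on whether RMSEA divides by N or N − 1, robust estimators such as WLSMV apply scaled statistics, and all numbers in the example are made up.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """CFI, TLI and RMSEA from model and baseline chi-square statistics."""
    d_model = max(chi2_model - df_model, 0.0)  # model non-centrality estimate
    d_base = max(chi2_base - df_base, 0.0)     # baseline non-centrality estimate
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))  # N - 1 convention assumed
    return {"cfi": cfi, "tli": tli, "rmsea": rmsea}


def meets_thresholds(fit):
    """Thresholds stated in the text: CFI and TLI above 0.95, RMSEA under 0.05."""
    return fit["cfi"] > 0.95 and fit["tli"] > 0.95 and fit["rmsea"] < 0.05


# Made-up values for illustration only.
toy_fit = fit_indices(chi2_model=30.0, df_model=24, chi2_base=1000.0, df_base=36, n=283)
toy_ok = meets_thresholds(toy_fit)
```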

In addition to the models suggested by the EFA, a number of models chosen to reflect the wide variety of solutions from the literature were also examined using CFA: 1-factor model (Cox et al. 1987; Lydsdottir et al. 2019); bifactorial model containing depression and anxiety (Phillips et al. 2009; Matthey 2008); bifactorial model containing depression and anhedonia (Zhong et al. 2014); a 3-factor model containing depression, anxiety, and self-harm (Brouwers et al. 2001); and three 3-factor models containing anhedonia, anxiety, and depression (Lau et al. 2010; Kubota et al. 2014; Tuohy and McVey 2008).

Maternal history of mental health concerns

To account for ordinal non-normally distributed data, the relationship between EPDS scores and history of mental health concerns was assessed using Wilcoxon rank sum tests, and effect sizes were calculated using Vargha and Delaney’s A (vd.a, Vargha and Delaney 2000, see Supplement). As the EPDS only asks about symptoms over the last 7 days, for longitudinal cases (i.e. where more than one EPDS was completed prenatally or postnatally), the highest score was selected for this analysis (see Supplement).
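A minimal sketch of the effect size and of the selection of the highest score per participant is given below. Vargha and Delaney’s A is the proportion of pairs in which a value from the first group exceeds a value from the second, counting ties as one half; it equals the rank-sum statistic W as reported by R’s `wilcox.test` divided by the product of the two group sizes. The magnitude labels use thresholds of 0.147, 0.33, and 0.474 on the scaled value |A − 0.5| × 2, which is an assumption about the convention used and is not stated in the text. Function names are hypothetical.

```python
def vargha_delaney_a(x, y):
    """Probability that a value from x exceeds a value from y (ties count 0.5)."""
    greater = sum(1 for a in x for b in y if a > b)
    ties = sum(1 for a in x for b in y if a == b)
    return (greater + 0.5 * ties) / (len(x) * len(y))


def magnitude(a):
    """Label the effect size; the thresholds are an assumed convention."""
    scaled = abs(a - 0.5) * 2
    if scaled < 0.147:
        return "negligible"
    if scaled < 0.33:
        return "small"
    if scaled < 0.474:
        return "medium"
    return "large"


def highest_score_per_participant(records):
    """records: iterable of (participant_id, score); keep the maximum per participant."""
    best = {}
    for pid, score in records:
        best[pid] = max(score, best.get(pid, score))
    return best


# Toy data for illustration.
a_value = vargha_delaney_a([1, 2, 3], [2, 3, 4])
highest = highest_score_per_participant([("p1", 4), ("p1", 9), ("p2", 3)])
```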

Results

Descriptive statistics

In total, 1374 EPDS questionnaires (Table 1) were available for factor analysis (n = 266 high-risk sample, n = 1108 community sample). EPDS total scores in the community sample were lower both prenatally (5.10 ± 4.33; 3.89 ± 4.11) and postnatally (5.56 ± 4.38, 6.54 ± 3.78) than in the high-risk prenatal sample (15.52 ± 5.25). The distribution of scores for item 10 (“The thought of harming myself has occurred to me”) was markedly different between the groups, with the community sample answering “Never” in 96.17% and 96.86% of the cases (prenatal and postnatal), while this was the case for only 66.16% of the high-risk sample (Table S2). In the community postnatal sample, the EPDS total score was higher in mothers of babies born extremely preterm (i.e. under 28 weeks), with a mean EPDS total score of 10.13 ± 5.36, than in mothers of babies born at term (i.e. over 37 weeks), with a mean EPDS total score of 5.22 ± 4.12 (see Supplement). As per the methodology described above, the factor analysis was performed on 10 items in the high-risk sample, and 9 items in the community sample.

Table 1 Descriptive statistics

Factor analysis

Reliability

The Cronbach’s α internal reliability coefficients for the EPDS were good for (a) the 10-item questionnaire for the prenatal high-risk sample 0.85 (95% CI 0.82–0.87), (b) the 9-item questionnaire for the prenatal community sample 0.87 (95% CI 0.85–0.89) and (c) the postnatal community sample 0.85 (95% CI 0.83–0.87). Values were similar for McDonald’s ω (Supplement).

EFA

EFA was conducted on 40% of each sample (n = 106 prenatal high-risk, n = 188 prenatal community, n = 255 postnatal community). All criteria for factor analysis were met (Table S3). Both 2- and 3-factor models were examined for all samples (Table 2).

Table 2 EFA using polychoric correlation matrices

In all 3 samples, the 3-factor EFA revealed distinct factors for anhedonia, anxiety, and depression (Table 2), while the 2-factor structure differed between the groups (i.e. anhedonia and depression in the high-risk sample, anxiety and depression in the postnatal community sample, and no clear solution in the prenatal community sample). All factors were positively correlated in all samples (Supplement). The models were similar for the EFA performed with a Pearson correlation matrix (Table S4). The EFA conducted on the whole sample (n = 1190) revealed a 2-factor solution including depression (items 1, 2, 6, 7, 8, 9) and anxiety (items 3, 4, 5) and a 3-factor solution including anhedonia (items 1, 2), anxiety (items 3, 4, 5) and depression (items 8, 9) (Table S6).

CFA

CFA was conducted on 60% of the sample (n = 160 prenatal high-risk, n = 283 prenatal community, n = 382 postnatal community). Across all groups, the model with the poorest fit was the unifactorial model (chi-square p values < 0.001, smallest CFI and TLI values, RMSEA poor fit, largest SRMSR values), followed by the Zhong et al. (2014) 2-factor model of anhedonia and depression (Table 3). This was supported by the results of the MLE analysis (Table S5).

Table 3 CFA using WLSMV

Across all groups, the model with the best fit was the 3-factor model including anhedonia (items 1, 2), anxiety (items 3, 4, 5), and depression (items 7, 8, 9, and item 10 where it was included in the analysis). This was the 3-factor model obtained through the EFA on the prenatal high-risk sample, and it matches the models of Tuohy and McVey (2008) and Kubota et al. (2014). The Lau et al. (2010) 3-factor model was also a good fit (similar to the models above, but including item 6 in the depression factor).

EPDS and history of mental health (community sample)

On average, the highest prenatal EPDS total score was higher in those with a history of mental health conditions (n = 148, M = 8.81, SD = 6.16) than in those without a history (n = 325, M = 3.88, SD = 3.18), W = 12,185, p < 0.001, vd.a = 0.253 (large effect size). This was also the case in the postnatal sample, where the highest EPDS total score was higher in those with a history of mental health conditions (n = 167, M = 7.25, SD = 5.43) than in those without a history (n = 473, M = 5.03, SD = 3.80), with W = 30,044, p < 0.001, vd.a = 0.382 (small effect size) (Fig. 1) (Supplement).

Fig. 1

Raincloud plots showing distribution of highest prenatal and postnatal EPDS scores in women with and without a history of mental health conditions. For each group, jittered raw data are shown on the left; boxplots with median and interquartile range are shown in the middle; and density plots are shown on the right

EPDS-3A

The percentage of women with high EPDS-3A scores but EPDS total scores under threshold varied substantially based on the applied cut-offs (Table S7). For example, when using the EPDS-3A cut-off validated in a community sample (i.e. 6 or more, Matthey 2008) and the EPDS total cut-off recommended by a recent meta-analysis (i.e. 11 or more, Levis et al. 2020), the percentage of women that may have anxiety symptoms but not score high enough on the EPDS total to warrant further assessment ranged between 1.90 and 3.38%. However, when using the EPDS-3A cut-off validated in a sample of women with unsettled infants (i.e. 4 or more, Phillips et al. 2009) and the total EPDS original validated cut-off of 13 or more (Cox et al. 1987), these numbers rose to a range of 20.67–26.42%.
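The dependence of this percentage on the chosen cut-offs can be illustrated with a short sketch. The scores below are invented, the function name is hypothetical, and because the denominator can be read in more than one way, the sketch reports the percentage both of all women and of women scoring at or above the EPDS-3A cut-off.

```python
def discordant_screen(scores, total_cutoff, anxiety_cutoff):
    """Count women at or above the EPDS-3A cut-off but below the total cut-off.

    scores: list of (epds_total, epds_3a) pairs, one per woman.
    """
    anxiety_pos = [(t, a) for t, a in scores if a >= anxiety_cutoff]
    discordant = [(t, a) for t, a in anxiety_pos if t < total_cutoff]
    pct_of_pos = 100 * len(discordant) / len(anxiety_pos) if anxiety_pos else 0.0
    return {
        "n_discordant": len(discordant),
        "pct_of_all": 100 * len(discordant) / len(scores),
        "pct_of_anxiety_positive": pct_of_pos,
    }


# Invented (total, EPDS-3A) pairs for five women.
toy_scores = [(9, 6), (12, 5), (14, 7), (3, 1), (10, 4)]
strict = discordant_screen(toy_scores, total_cutoff=11, anxiety_cutoff=6)
lenient = discordant_screen(toy_scores, total_cutoff=13, anxiety_cutoff=4)
```

With these toy data, the stricter EPDS-3A cut-off combined with the lower total cut-off flags one woman, while the more lenient EPDS-3A cut-off combined with the higher total cut-off flags three.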

On average, the highest prenatal EPDS-3A score was higher in those with a history of anxiety disorders (n = 58, M = 4.37, SD = 2.47) than in those without a history (n = 415, M = 2.57, SD = 1.93), W = 6861, p < 0.001, vd.a = 0.285 (medium effect size). This was also evident in the postnatal sample, with higher EPDS-3A scores in those with a history of anxiety disorders (n = 51, M = 3.25, SD = 2.29) than those without a history (n = 589, M = 2.68, SD = 1.95), but the difference was not statistically significant, W = 12,856, p = 0.087, vd.a = 0.429 (negligible effect size) (Fig. 2) (Supplement).

Fig. 2

Raincloud plots showing distribution of the highest prenatal and postnatal EPDS-3A scores in women with and without a history of anxiety disorders. For each group, jittered raw data are shown on the left; boxplots with median and interquartile range are shown in the middle; and density plots are shown on the right

Discussion

This study represents an exploration of the 3-factor model of the EPDS as administered prenatally and postnatally, in samples including both community and high-risk populations. We found that the 3-factor structure model of the EPDS (anhedonia, anxiety, and depression) was consistent across populations and was similar to that reported in previous studies (Tuohy and McVey 2008; Kubota et al. 2014; Lau et al. 2010). The EPDS-3A consistently emerged as a separate factor and was associated with a prenatal maternal history of anxiety disorders. These findings are important for several reasons.

Firstly, it has been argued that it is important to examine the utility of screening questionnaires in different populations of women at risk of postnatal depression (Austin et al. 2014). A strength of this study was the ability to examine the factor structure of the EPDS at different timepoints (i.e. prenatally and postnatally) and in different prenatal populations (i.e. high risk and community). It is important to note that the difference in setting (i.e. online versus clinical environment) may influence results, as the relative anonymity offered by the online environment could increase participants’ willingness to more accurately disclose sensitive information (Bowling 2005). While our samples did include mothers of infants born prematurely and a small number of adolescent mothers, further research is required to better understand the factor structure of the EPDS in different (at risk) groups as well as different settings.

Secondly, the current study used both polychoric and Pearson correlation matrices for EFA, in addition to WLSMV and MLE methods for CFA. This rigorous application of different methodologies increases confidence that the 3-factor model of the EPDS is a genuine construct, rather than the result of idiosyncratic methodological choices.

Thirdly, between 1.9 and 26.4% of the women who had screened positive for anxiety symptoms using the EPDS-3A had scored below threshold when using EPDS total score. This percentage was markedly influenced by the applied cut-offs, and the discrepancies highlight the urgent need for consistent validated cut-offs used across research and clinical settings. Given the inconsistencies associated with the EPDS and EPDS-3A, it may be preferable to use validated measures that screen for a variety of mood and anxiety disorders (e.g. the Matthey Generic Mood Questionnaire, Matthey et al. 2019). However, pending further validation studies, the EPDS-3A may be a useful adjunct to our current screening practice and facilitate patient-provider communication about anxiety symptoms, without further increasing the burden on women. This is of particular importance in contexts where no validated anxiety questionnaire is routinely administered in the perinatal period. It is important to note that the EPDS only asks about the last 7 days. A substantial proportion of high scores on the EPDS will reflect only transient symptoms of depression and/or anxiety (Agostini et al. 2019; Matthey and Ross-Hamid 2012).

Fourthly, our study was strengthened by the availability of data on maternal history of psychiatric disorders. This led to the finding of an association between prenatal EPDS-3A and maternal history of anxiety disorders. It remains unclear why postnatal EPDS-3A was unrelated to history of anxiety, but this may be due to the postnatal sample consisting largely of women in the very early postnatal period (neonate postmenstrual age = 40.00 ± 3.64 weeks), when anxiety may be more related to specific experiences during labour (Bell and Andersson 2016; Paul et al. 2013). Further research is required in order to determine whether the EPDS-3A is uniquely associated with a history of anxiety disorders relative to other mental health concerns (Supplement). A major limitation of our study is the lack of validated measures of current anxiety symptoms. We recommend that future research includes comprehensive diagnostic interviews and clinical assessments of mental health and psychiatric history, as well as quantitative screening measures of anxiety.

Finally, it is important to note that although the overall factor structure was relatively stable, the exact factor loadings were influenced by analysis choices. Further studies are required to determine the measurement invariance (Widaman 2010) of this instrument.

We believe that studies may also benefit from analysing the relationship between EPDS-3A (or other screening tools) and physiological correlates of anxiety. For example, we are currently exploring the correlation between EPDS-3A score, maternal heart rate variability and neonatal brain development, using data collected from the wider dHCP project. It is hoped that this will enable us to better clarify the clinical relevance of a prenatal maternal EPDS-3A score, and ultimately to better target interventions with positive effects on early brain development.