The impact of eating disorders (EDs) in pregnancy is considerable, although the best way to identify these disorders during this unique time remains to be determined. Estimates suggest that at least 5% of women experience some type of ED during pregnancy (Linna et al. 2014; Watson et al. 2013), although reported estimates range widely, from 0.6 to 27.8% (Broussard 2012; Bye et al. 2020; Easter et al. 2013; Micali et al. 2007; Pettersson et al. 2016; Soares et al. 2009). This wide range reflects some of the challenges in identifying an ED in this population.

Women with EDs are more likely than women without EDs to have unplanned pregnancies, miscarriages, and nutritional differences, and they have an increased risk of postpartum depression and anxiety (Chan et al. 2019; Kimmel et al. 2015; Zerwas & Claydon 2014). Some women with past EDs have nutritional patterns during pregnancy similar to those of individuals without EDs, and many experience an improvement in nutrition with pregnancy (Dörsam et al. 2019). However, there are some nutritional differences among individuals with a past or active ED which could affect fetal development. Additionally, burgeoning epigenetic research shows how modification of the genome can occur prenatally due to malnourishment, which can affect the developing fetus initially and throughout their life (Hoffman et al. 2017; Sebastiani et al. 2020). Identifying women with a current ED, those who are at risk of developing an ED, and/or those who are at risk of relapse during pregnancy provides an opportunity for early intervention to improve outcomes for the mother and developing fetus.

Among women with infertility, 58% of those who presented with oligomenorrhea or amenorrhea had some form of an ED, and none of them had disclosed it to their providers (Bruneau et al., 2017; Linna et al. 2014). This is consistent with recent qualitative evidence which suggests that many pregnant women with a current ED or a history of an ED do not feel comfortable disclosing information about their ED to their healthcare providers (Claydon et al., 2018). Other qualitative work on this topic indicates several barriers to identifying EDs during pregnancy, which reinforce issues of stigma as well as limits to professional training (Bye et al. 2018). Additionally, most medical professionals are not well-trained in identifying pregnant women who may have an ED or clinically significant ED symptomatology unless they are in the psychiatry discipline or have taken continuing medical education specific to EDs (Anderson et al. 2017; Leddy et al. 2009). There is minimal training on EDs for medical professionals outside of the psychiatry specialty (Mahr et al., 2014), leaving most medical professionals who interact with pregnant women with limited knowledge of how to identify and help these patients.

To fill this gap in identification, rapid screening tools can support early screening. Such tools can help identify at-risk patients, but currently none consider the unique characteristics and situation of pregnant women with EDs. The SCOFF is a simple five-item clinical tool used to identify a potential case of anorexia nervosa (AN) or bulimia nervosa (BN), but it is not validated in pregnancy (Morgan et al. 2013). Recent research has suggested that when used in a pregnant population, it can produce false positives, which has resulted in an expert consensus that the SCOFF is not the ideal tool for use in these populations (Bannatyne et al. 2018; Baudet et al. 2013). Currently, the Eating Disorder Examination Questionnaire (EDE-Q; Fairburn 2008) has the most psychometric data available in pregnancy, but it needs further validation and, at 28 questions, is too lengthy for rapid screening (Bannatyne et al. 2019).

Given those concerns, a few researchers have tried to adapt current measures such as the SCOFF (Hubin-Gayte and Squires 2012) or the EDE (Bannatyne et al. 2018) for use in a pregnant population. One revised version of the SCOFF used in a French pregnant population separated the SCOFF questions into pre-pregnancy and during-pregnancy categories to identify and understand antenatal ED symptoms (Hubin-Gayte and Squires 2012). However, this method does not accurately capture ED symptoms during pregnancy (the time when both mother and developing fetus are at risk), and limited psychometrics were reported in the study. The adaptation of the EDE for pregnancy, known as the EDE-PV, had adequate internal reliability (0.59–0.67 for subscales) in a population of pregnant women who were overweight and/or obese (Bannatyne et al. 2018; Emery et al. 2017). However, the EDE-PV is a structured interview with multiple sections and is better suited to a full assessment than to quick clinical identification. The EDE-PV has also not been validated in pregnant women with a BMI under 30 (Emery et al. 2017). However, the use of BMI during pregnancy has been a topic of debate; it appears to be more accurate in the first trimester, because as the pregnancy progresses other physiologic changes may affect this parameter (Louise et al. 2020).

The aim of this study was to determine the psychometric properties of the Prenatal Eating Behaviors Screening (PEBS) Tool, which was created to address these concerns. The goal was for this tool to be rapid (less than 5 min to complete), applicable to the most common EDs in pregnancy [AN, BN, binge eating disorder (BED), and other specified feeding and eating disorders (OSFED)], and valid across the span of pregnancy (all three trimesters). This was accomplished using large development and validation samples and drawing on experts in both medical and ED research fields. Once validation is shown, the goal is for the PEBS Tool to be used as a rapid screening tool to identify pregnant women at risk in order to refer them to a specialist for further assessment.


PEBS item development

A series of questions was drawn from existing sources (e.g., EDE-Q, Fairburn 2008; SCOFF, Morgan et al. 1999; EDI-3, Clausen et al. 2011) and from pilot data themes from a previous study (Claydon et al. 2018). An initial list of 34 questions was combined into a single framework and reviewed by content experts, with an eye towards (1) adapting or using language to be appropriate for terms that should be used or avoided in the ED field (e.g., using person-first language; Weissman et al. 2016), (2) adapting or using language appropriate for women throughout their pregnancy (e.g., removing items that looked for a certain amount of weight loss over several months), and (3) making items follow similar Likert-type patterns of responses for ease of respondent use (e.g., using “you” for stem, checking for consistency in “strongly agree” to “strongly disagree” in responses). Finally, a content expert checked the questions to ensure all four types of ED symptomatology were represented, and a total of 9 questions were then dropped for redundancy. The final set of 25 questions underwent further piloting with a small sample (n = 3) of pregnant women with a diagnosed ED or history of an ED to check for face validity, accuracy in wording, and question interpretation; these women were asked about relevance, language sensitivity, and accuracy as they filled out the survey. All 25 items were then included in the development and validation samples. Supplemental Table 1 has the full list of 25 items along with the number of women selecting each Likert-type response for both samples.

Self-report ED diagnoses

As part of the survey, participants were asked to self-report whether they had ever been diagnosed with an eating disorder. If they selected yes, they were asked “Which eating disorders have you had” and could select any of the following: AN, BED, BN, or OSFED, formerly known as eating disorders not otherwise specified (EDNOS). Participants responded to each of those prompts by selecting: current professional diagnosis, past professional diagnosis, current self-diagnosis, or past self-diagnosis (see Appendix 1 for the complete survey). They were allowed to choose more than one answer in each category. Self-report ED diagnoses have been shown to be adequate in detecting current and lifetime AN and BN as measured by diagnostic interviews (Keski-Rahkonen et al. 2006). Participants were counted as having a current diagnosis (yes) if they selected either current professional diagnosis and/or current self-diagnosis for any of the four ED types. These current self-reported diagnoses were used because it was not feasible for the scope of a psychometric study to conduct full diagnostic interviews on all participants.

Pregnancy trimester

Pregnancy trimester was assessed by a self-reported week of pregnancy and categorized as first trimester, weeks 4–13; second trimester, weeks 14–27; and third trimester, weeks 28–40.
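
The categorization described above can be sketched as a small function. This is an illustrative sketch only and not the study's own code (the analyses were run in SAS); the function name `trimester_from_week` and the choice to return `None` for weeks outside 4–40 are assumptions made for this example.

```python
from typing import Optional


def trimester_from_week(week: int) -> Optional[int]:
    """Map a self-reported week of pregnancy to a trimester (1, 2, or 3).

    Boundaries follow the text: weeks 4-13, 14-27, and 28-40.
    Returning None for any other value is an assumption of this sketch.
    """
    if 4 <= week <= 13:
        return 1
    if 14 <= week <= 27:
        return 2
    if 28 <= week <= 40:
        return 3
    return None
```

For example, `trimester_from_week(19)` falls in the second trimester, while an implausible value such as 80 maps to `None`.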


Two separate samples were collected for this study. The development sample was collected first via Amazon Mechanical Turk (MTurk, an online crowdsourcing site to complete tasks virtually), and online mother and pregnancy groups with an IRB-approved advertisement (see Appendix 2). This included 271 responses recorded between August 2020 and November 2020. Next, the validation sample was collected via SurveyCircle (an online survey exchange platform), a second MTurk sample excluding “super users”, and online mother and pregnancy groups and included 236 responses recorded between October 2020 and March 2021. Sample collection methods differed slightly based on including additional exclusions in MTurk sampling following recommendations from Litman and Robinson (2021) and then extending sampling to SurveyCircle to allow for additional responses in the validation sample. All participants were recruited from online groups based in North America or the UK; SurveyCircle has geographic constraints which were enabled to similarly draw from participants in North America and the UK. Although we did not ask about geographic location, other studies suggest the majority of users on MTurk are from the USA (75%) with India second at 16%, and Canada third at 1.1% (Difallah et al., 2018; Moss et al. 2020). A final restriction of English-language proficiency to complete this survey suggests that the majority of the sample are likely located in North America and the UK.

Human subjects

This study was filed with the referent university’s Institutional Review Board and exempt status was acknowledged (IRB#: 2003925385). Qualtrics software was used to host and distribute the survey and no protected health information (PHI) was obtained.

Data quality checks

Both datasets underwent a series of data quality checks to eliminate possible “bot” submissions and other data quality problems. First, cases were eliminated if they did not list the weeks pregnant as a number between 4 and 40 (development n = 67, validation n = 55). Weeks pregnant was selected as the initial restriction because the sample was required to include only pregnant women. This restriction eliminated respondents who were not pregnant or were unsure of their pregnancy status, as well as probable “bot” submissions with implausible values (e.g., 80). Next, cases were eliminated if they did not complete at least 90% of the survey (development n = 9, validation n = 9). This was part of the decision tree because those cases would be dropped via pairwise deletion if they did not fill out the final 25 survey questions related to the eating disorder tool. Finally, cases were eliminated if they completed the survey in under 2 min (development n = 5, validation n = 5); this assumes a loss of comprehension when reading speed exceeds 15 words per second. Final datasets included n = 190 (70.1%) for development and n = 167 (70.8%) for validation. These retention rates after quality control are consistent with, if not higher than, those reported for MTurk and other online survey platforms (Kennedy et al. 2020; Nayak and Narayan 2019). All analyses were conducted with SAS software, version 9.4 (SAS Institute, Inc. 2013).
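
The exclusion counts reported above can be checked with simple arithmetic. The sketch below is illustrative rather than the authors' code; the helper name `apply_exclusions` and the short reason labels are assumptions, while the counts are taken from the text (271 and 236 recorded responses).

```python
from typing import List, Tuple


def apply_exclusions(recorded: int, exclusions: List[Tuple[str, int]]) -> Tuple[int, float]:
    """Subtract each exclusion count in order.

    Returns the number of retained cases and the percentage of the
    recorded responses that they represent (rounded to one decimal).
    """
    remaining = recorded
    for _reason, n_removed in exclusions:
        remaining -= n_removed
    retained_pct = round(100 * remaining / recorded, 1)
    return remaining, retained_pct


# Counts from the text; the reason labels are shorthand for this example.
development = apply_exclusions(
    271,
    [("weeks pregnant not 4-40", 67), ("under 90% complete", 9), ("under 2 min", 5)],
)
validation = apply_exclusions(
    236,
    [("weeks pregnant not 4-40", 55), ("under 90% complete", 9), ("under 2 min", 5)],
)
print(development)  # (190, 70.1)
print(validation)   # (167, 70.8)
```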



Analyses were conducted first on the development dataset (n = 190). The goal of the development dataset was data reduction (to reduce from 25 items to the minimum number of items needed to accurately screen pregnant women for an ED), with a focus on the relationship with current ED diagnosis. Within the development sample, the majority were 25–34 years old (n = 104; 54.74%), married (n = 164; 86.32%), White (n = 140; 74.07%), and had private insurance (n = 129; 67.89%). The mean week of pregnancy was 19.26 (SD = 10.61), and 30.0% had a current ED diagnosis (n = 57). Full descriptive statistics and demographic characteristics of the sample are presented in Table 1.

Table 1 Demographics and descriptive statistics for development (n = 190) and validation (n = 167) samples

Internal reliability

Cronbach’s alpha was used to ensure the 25 items were consistent and had internal reliability; items with correlations below 0.60 were first considered for possible deletion. Five items were below 0.20, and another three were below 0.60.
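
For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k − 1) × (1 − sum of item variances / variance of the summed score). The sketch below computes it in plain Python under stated assumptions: the function name `cronbach_alpha` and the toy responses are invented for illustration, and this is not the SAS procedure used in the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table.

    responses: one row per respondent, one column per item, no missing values.
    Sample variances are used for both the items and the total score.
    """
    k = len(responses[0])                       # number of items
    item_columns = list(zip(*responses))        # transpose to items
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Toy data (hypothetical): three respondents answering three items identically.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)  # perfectly consistent items give alpha = 1.0
```

Alpha approaches 1 when items move together across respondents and falls as item responses become less related to the total score.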

Relationship with current ED diagnosis

Each of the 25 items was then examined using non-parametric Mann–Whitney tests against self-reported current ED diagnosis (Table 2), with alpha set to 0.05. The five items with the lowest internal consistency also did not have a statistically significant relationship with current ED diagnosis. Two items with modest internal reliability (below 0.60) had statistically significant but more modest relationships (p-values between 0.002 and 0.005) relative to the other items. All other items had p < 0.0001.
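
As a rough illustration of what the Mann–Whitney test compares, the sketch below computes only the U statistic by direct pair counting, which handles the ties that are common in Likert-type responses. It does not compute a p-value, and it is not the study's SAS code; the function name `mann_whitney_u` and the toy responses are assumptions for this example.

```python
from typing import Sequence


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """U statistic for group_a by direct pair counting.

    Each (a, b) pair contributes 1 if a > b, 0.5 if a == b, and 0 otherwise.
    Runs in O(len(a) * len(b)) time, which is fine for small samples.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Toy Likert responses (hypothetical) for one item in two groups.
with_dx = [4, 5, 5]
without_dx = [1, 2, 4]
u_stat = mann_whitney_u(with_dx, without_dx)              # 8.5
prob_higher = u_stat / (len(with_dx) * len(without_dx))   # share of pairs favoring with_dx
```

Dividing U by the number of pairs gives the proportion of cross-group comparisons in which the first group scores higher, counting ties as half.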

Table 2 Items content with the single factor EFA solution on the polychoric correlation matrix, development only (n = 190), of pattern coefficients, p-values for differences between current diagnosis, and other goodness-of-fit indicators

Exploratory factor analysis

Next, a polychoric correlation matrix was output from the 25 items and used for all further factor analyses. First, exploratory factor analysis (EFA) was run with squared multiple correlation (SMC) priors and the full information maximum likelihood method, with a one-factor and then a two-factor solution. A single factor with all items demonstrated the best fit (Table 2: 64% variability, Tucker and Lewis: 0.58, AIC: 1553.86). Items with factor loadings below 0.70 were then removed, and another EFA was run with a single factor on the final 12 items, resulting in significantly improved model fit (94% variance explained, Tucker and Lewis: 0.88, AIC: 183.55, all factor loadings above 0.72).

Confirmatory factor analysis

EFA results guided the confirmatory factor analysis (CFA) models, starting with a single-factor solution, then adding correlated errors for items related by ED diagnosis, and then substituting a second factor for ED diagnosis, all on the polychoric matrix of the items. The best-fitting model was the single-factor solution with correlated errors and is presented in Table 3. All factor loadings were significant at p < 0.0001. The final model included eight correlated errors: three pertaining to items related to BN, three related to items about BED, and two for AN items (see Supplemental Path Diagram Figure A1 for correlated error terms). With correlated errors, the questions demonstrated acceptable model fit (e.g., GFI: 0.91, RMSEA: 0.10, NNFI: 0.95).

Table 3 Confirmatory factor analysis results for development (n = 190) and validation (n = 167) samples, and then by trimester across both samples (n = 357) for the final 12 item instrument

Final scale clinical cutoff score and validation against current ED diagnosis

The final 12-item scale demonstrated excellent internal reliability (Cronbach’s alpha = 0.95). Items were summed, and total scores ranged from 12 to 60 (M = 32.48, SD = 13.9). Those with a current ED diagnosis (n = 57, M = 45.63, SD = 8.4) had higher scores than all other participants (n = 133, M = 26.85, SD = 11.9), t (df = 188) = 12.37, p < 0.0001. Using logistic regression with a receiver operating characteristic (ROC) curve on current ED diagnosis, we found that the odds of a current ED diagnosis increased by a factor of 1.16 (Wald 95% CL 1.11, 1.21) for each one-point increase in the summed score (Table 3). A cutoff of 39 had good sensitivity (80.7%) and specificity (79.7%) for detecting current ED diagnosis, with an AUC of 0.88 (Fig. 1a). Those with a score of 39 or greater had 16.42 times the odds of having a current ED diagnosis relative to those with a score below 39. The final scale can be found in Appendix 3. Lower cutoffs could be used to maximize sensitivity over specificity. For example, a cutoff score of 34 gives a sensitivity of 89.5% but reduces specificity to 71.4%. Using this cutoff results in N = 73 (38.4% of the sample) being identified by the screening tool, with a similar OR of 16.4.
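
The scoring rule and the screening metrics reported above can be sketched as follows. Several things here are assumptions rather than details confirmed by the text: the 1–5 response coding is inferred from 12 items summing to a 12–60 range, any reverse scoring specified in Appendix 3 is not handled, all function names are hypothetical, and the 2×2 counts are back-calculated from the reported percentages and group sizes (n = 57 and n = 133) rather than taken from the source.

```python
from typing import Dict, Sequence

CUTOFF = 39  # cutoff reported in the text


def pebs_total(item_responses: Sequence[int]) -> int:
    """Sum the 12 item responses (1-5 coding is an inferred assumption)."""
    if len(item_responses) != 12:
        raise ValueError("expected 12 item responses")
    if any(r < 1 or r > 5 for r in item_responses):
        raise ValueError("each response must be between 1 and 5")
    return sum(item_responses)


def screens_positive(total: int, cutoff: int = CUTOFF) -> bool:
    """A total at or above the cutoff counts as a positive screen."""
    return total >= cutoff


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and odds ratio from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "odds_ratio": (tp * tn) / (fn * fp),
    }


# Back-calculated counts (assumption): 46 of 57 with a current diagnosis and
# 27 of 133 without one score 39 or above in the development sample.
metrics = diagnostic_metrics(tp=46, fn=11, fp=27, tn=106)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under these back-calculated counts, the sensitivity, specificity, and odds ratio agree with the reported 80.7%, 79.7%, and 16.42 after rounding.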


Final scale reliability and CFA were replicated with the smaller validation sample (n = 167), using the same cutoff of 39 found in the development dataset. The validation sample looked similar to the development sample. A majority of the sample were 25–34 years old (n = 91; 54.49%), married (n = 96; 57.49%), White (n = 107; 64.07%), and had private insurance (n = 90; 53.89%). The mean week of pregnancy was 18.91 (SD = 9.06), and 15.6% had a current ED diagnosis (n = 26).

The 12-item scale demonstrated slightly attenuated but still excellent internal reliability (Cronbach’s alpha = 0.91). Model fit statistics of the CFA on the polychoric matrix were attenuated (Table 3); the standardized factor loadings of three items dropped below the preferred 0.60, although all p-values remained significant. Although some model fit indicators also dropped below preferred values (e.g., RMSEA), generally acceptable model fit was found for the validation model (i.e., GFI: 0.85, RMSEA: 0.14, NNFI: 0.86). Logistic regression indicated acceptable sensitivity (69.2%) and specificity (86.5%) with the new sample at the same cutoff value of 39, with an AUC of 0.88 (Fig. 1b).


Next, we wished to ascertain whether the model would work for each trimester of pregnancy. Due to small samples within trimesters, the development and validation samples were combined for this analysis and then stratified by trimester. Due to the small sample for the third trimester, a different estimation method (ULS) was used for the CFA, and thus some estimates are not available. The polychoric correlation matrix by trimester was used for all factor analyses. Generally acceptable model fit indices were found (first trimester n = 127, GFI: 0.89, RMSEA: 0.12, NNFI: 0.94; second trimester n = 150, GFI: 0.83, RMSEA: 0.14, NNFI: 0.88; third trimester n = 80, GFI: 0.99, NNFI: 0.99). AUC against current ED diagnosis ranged from 0.85 to 0.89 (Fig. 2). Sensitivity (67.9 to 77.8%) and specificity (79.3 to 85.7%) were also acceptable with the cutoff score of 39, and a score of 39 or greater retained increased odds of a current ED diagnosis relative to a score below 39 (OR range: 13.64 to 17.68).


The findings from this research suggest the PEBS tool can reliably and sensitively detect EDs in pregnancy with only 12 questions. This would provide a rapid initial screen to allow clinicians to refer women for further assessment. The American College of Obstetricians and Gynecologists (ACOG), in its practice bulletin 740, states that health care providers should be comfortable screening for and recognizing eating disorders in patients. Still, only a professional expert should make the final diagnosis (ACOG 2018). It is well known that patients with eating disorders are at risk of higher rates of anxiety and postpartum depression and of having babies with a small fetal head circumference (Kimmel et al. 2016). Universal screening for EDs in pregnancy has not been established or recommended by professional organizations, although it has been acknowledged that clinical management strategies are currently missing (Paslakis and Zwaan 2019). Considering that many patients with eating disorders engage in weight cycling, clinicians may focus on those groups during pregnancy (Marchesini et al. 2004). Applying the PEBS tool to those patients at the highest risk could represent an opportunity for clinicians to identify a potential ED in pregnancy. Identifying EDs during pregnancy may allow for appropriate referrals to mental health providers, physicians, and registered dietitians to improve women’s psychological and physical well-being during pregnancy and promote better maternal and child health outcomes.

Strengths and limitations

The final 12-item PEBS tool meets clinical recommendations that brief screening instruments contain 15 or fewer items and use a simple cutoff score (Marquer et al. 2012). This allows the tool to be easily used by clinicians who have limited time with patients. Additionally, this screening tool was tested on a large sample of pregnant women across all trimesters.

There are some limitations inherent in the methodology which are important to note. First, the sample included only English-speaking participants, since the tool was designed to be validated in English first. Therefore, the PEBS tool may not be culturally or linguistically appropriate in all populations and will need to be tailored and translated as needed. Second, convenience sampling was employed, which carries the biases inherent in that sampling technique. Third, the data collection for the development and validation samples was conducted at slightly different time points, although within a few months of each other. Fourth, a smaller sample was gathered for the third trimester, which gives us less information on how the tool works there relative to the first two trimesters, although there is still sound psychometric data for the third trimester. All model fits ranged from acceptable to good. However, future studies may consider more stringent model fit criteria. Fifth, our final alpha coefficients are high (above 0.90), which raises some question about scale homogeneity at the expense of content coverage and validity (Streiner 2003). However, we believe that we addressed this concern by having content experts review the final 12 items to ensure that the final questions incorporated symptomatology reflecting the multifaceted nature and breadth of eating disorders. Sixth, we did not conduct diagnostic interviews with participants to gather their current and past ED diagnoses, but instead asked for self-report of self and professional diagnoses. Comparison with a diagnostic tool would be ideal, but self-reported diagnoses have been shown to be a good indicator of current and lifetime ED diagnoses (Keski-Rahkonen et al. 2006). Additionally, survey collection was conducted during the COVID-19 pandemic, and for the safety of participants, an online survey was deemed to be most appropriate. One possible online diagnostic assessment is the digital version of the eating disorder assessment for DSM-5 (EDA-5; Sysko et al. 2015). However, the EDA-5 takes approximately 20–30 min to complete, significantly adding to participant burden and reducing the likelihood of obtaining a large enough sample for meaningful results. Finally, the PEBS tool may screen positive for women with a prior ED who are currently in recovery. However, the sensitivity and specificity are high for the PEBS tool, which indicates that women are being identified based on their current risk.

Future research and conclusions

One of the most critical aspects of research is translation, which ensures that the information gathered is made applicable and available to the people it can help the most. To that end, following this publication, informational booklets on how to use the PEBS tool will be made available to clinicians, along with referral resources. This will help bridge the research-practice gap that perpetuates incremental change rather than allowing for broad dissemination of results. Additionally, a feasibility study will be conducted to see how the 12-item PEBS tool works in a clinical setting. To address the limitations of self-report diagnoses, this follow-up study will utilize a smaller confirmatory sample with diagnostic interviews for EDs. Pencil/paper and online versions of the PEBS tool have also been created to allow for further ease of dissemination and use. Additionally, future research will have to tailor and translate this tool so that it can be more linguistically and culturally appropriate for diverse populations. A further implication of this work is the potential to reduce health and mental health treatment disparities in pregnant women by using this standard and rapid screening measure to ensure early identification and treatment.