Multiple gestation pregnancies rose steadily in the United States between 1980 and the early 2000s, with twin births more than doubling (Martin et al. 2012) and triplet + (triplets, quadruplets, etc.) pregnancies increasing by nearly 700% (Blondel and Kaminski 2002). Similar trends are evident elsewhere in North America, as well as in Europe and Asia (Monden et al. 2021), although Africa remains the continent with the highest rate of multiple gestation births (Monden et al. 2021). Rates have decreased slightly in many Western countries in recent years, but they remain high. For example, twin births still comprise 32.60 of 1000 live births in the United States (Martin and Osterman 2019), and triplet + births comprise 93.00 of 100,000 live births (Martin et al. 2019). These relatively high rates of multiple gestation pregnancies have resulted from increasing use of fertility treatments and older average maternal age at childbirth (Blondel and Kaminski 2002). Unfortunately, multifetal gestations confer significant health risks on both mother and babies (American College of Obstetricians and Gynecologists 2021), and a considerable percentage of twins and triplets + are born prematurely and/or require care in a neonatal intensive care unit (NICU; Morrison 2005).

Documented perinatal mental health risks in multiple gestation pregnancies have been less consistent. Due to greater functional demands, financial strain, medical complications, and more, one might expect heightened mental health concerns in parents of multiples versus parents of singletons (Wenze et al. 2020). However, the evidence is mixed; most published work has found more perinatal depression, anxiety, and/or stress in parents of multiples, but some studies have found no significant differences in rates (for a review, see Wenze et al. 2015).

Research in this area has utilized a wide range of methods (i.e., measures used, timepoints surveyed), potentially contributing to these discrepancies. An important limitation to prior work is the exclusive reliance on single-administration, retrospective, self-report measures to assess outcomes. Retrospective recall of constantly-fluctuating, context-dependent, subjective inner states like mood and stress is notoriously unreliable (Levine and Safer 2002) and is impacted by current mood, peak and end heuristics, and other cognitive biases (Stone et al. 2004). Increasingly, ecological momentary assessment (EMA), in which respondents complete repeated, brief surveys in real-time, in situ, is used to circumvent these concerns and to produce a richer, more valid, and more reliable understanding of daily life (Myin-Germeys and Kuppens 2022). To our knowledge, no published work has used EMA to measure mental health in new parents of multiples or to compare such variables to those of parents of singletons.

Some important postpartum outcomes have received surprisingly little or no attention in research on well-being among new parents of multiples. For example, poor postpartum sleep – including both insufficient quantity and repeated interruptions (Medic et al. 2017) – predicts depression and anxiety (Okun et al. 2018), psychosis (Sharma and Mazmanian 2003), and other psychiatric problems. Although research on postpartum sleep, in general, is widespread (Hunter et al. 2009), few studies have assessed sleep in parents of multiples. Those that have suggest that restricted sleep duration (Damato and Burant 2008) and repeated sleep disruption (Damato et al. 2021) are prevalent. No studies of which we are aware have included a comparison sample of parents of singletons, however, and many suffer from methodological limitations, such as limited information about twins’ age (Yokoyama 2002); small and homogeneous samples and no verification of when sleep diaries were completed (Damato and Burant 2008); and lack of reporting about time spent awake after sleep onset, napping, or other important variables (Damato et al. 2021).

Maternal-infant bonding, the process in which a mother forms a close emotional connection with her baby or babies, is another critical outcome in the postpartum period (Tichelman et al. 2019). Strong maternal-infant bonds predict positive pediatric cognitive, social, psychological, and behavioral outcomes, while poor bonding predicts negative outcomes (Johnson 2013). Bonding is also linked with maternal mental health (Tichelman et al. 2019) and stress (Lutkiewicz et al. 2020). Prior work has found that mothers of twins often express sadness and guilt over insufficient time to bond with both babies (Wenze et al. 2020), and behavioral observation suggests that, compared to mothers of singletons, mothers of twins demonstrate lower sensitivity and more distant behaviors when interacting with their babies (Ionio et al. 2022). No research of which we are aware has directly measured self-reported maternal-infant bonding in mothers of multiples versus mothers of singletons.

Finally, given its role as a “buffer” against parenting stress (Goldberg and Smith 2014) and its close ties with mental health (Downward et al. 2022) and adaptive parenting behaviors (Krishnakumar and Buehler 2000), marital satisfaction is another important postpartum outcome. Relationship satisfaction generally decreases in the first year postpartum (Bogdan et al. 2022), with commonly-cited reasons including decreased time spent together, less time for communication and discussion, decreased physical intimacy, and conflict over division of household labor and childcare (Prino et al. 2016). The higher time demands, caretaking responsibilities, medical complications, and other burdens that come with having multiples suggest that parents of multiples might report even lower postpartum relationship satisfaction than parents of singletons. To our knowledge, however, this question has not been empirically tested.

In the current study, we extend the literature on maternal mental health after multiple gestation births. Specifically, our primary aims were to: 1) address some of the methodological limitations of prior work by using EMA to explore real-world, real-time postpartum mood, stress, sleep, and related experiences in mothers of multiples vs. mothers of singletons, and 2) determine whether key variables that have not been examined in prior work (e.g., maternal-infant bonding, relationship satisfaction, time spent awake after sleep onset) differ between mothers of multiples and mothers of singletons. An exploratory, tertiary aim was to determine whether any of these differences depend on postpartum timing. We hypothesized that, compared to mothers of singletons, mothers of multiples would report worse mood, sleep, relationship satisfaction, and maternal-infant bonding and higher stress at baseline and during repeated, real-time, ecological momentary assessments.

Materials and methods

Study overview and design considerations

We tested baseline differences in depression and anxiety, stress, sleep disruption, relationship satisfaction, and maternal-infant bonding between mothers of multiples and mothers of singletons. We also measured in situ, momentary mood, stress, fatigue, bonding, and sleep in both populations during a 7-day EMA period.

Although our primary research questions centered on global differences between mothers of multiples and mothers of singletons, we were also interested in exploring whether any findings might be impacted by the specific postpartum timeframe. Research shows that parents of multiples retrospectively identify months 0–3 postpartum as the most challenging, followed by months 4–6 (Wenze and Battle 2018), so we planned to recruit and compare women from these two periods. However, because multiples are at particular risk of experiencing lengthy NICU stays and many of our research questions would not be relevant for parents with infant(s) who are not yet at home with them (e.g., certain questions about sleep, parenting stress, momentary maternal-infant connection), we elected to sample only weeks 6–12 of the 0 to 3-month postpartum period and, for consistency, only weeks 18–24 from the 4 to 6-month period (i.e., also the latter half of that timeframe). Importantly, although maternal perinatal mental health challenges peak during weeks 0–6 postpartum (Munk-Olsen et al. 2006), they remain significantly elevated throughout the first 3 months (Munk-Olsen et al. 2006; Putnam et al. 2017), and the World Health Organization emphasizes that perinatal mental health problems can arise anytime through the first year after childbirth (World Health Organization 2022). Finally, since pregnancy after age 45 is rare, we used this age cutoff as an extra precaution against bots completing the screening survey.

Participants

To participate, women were required to: 1) be either 6–12 or 18–24 weeks postpartum; 2) be age 18 to 45; 3) currently use a smartphone with internet access; 4) self-report that they read and write English well enough to complete study procedures. Participants were 221 mothers of multiples (n = 127, 57.47%; n = 126 twins, n = 1 triplets) or singletons (n = 94, 42.53%). One hundred twenty-nine (58.37%) reported being 6–12 weeks postpartum and 83 (37.56%) reported being 18–24 weeks postpartum. Postpartum period was unknown for 9 (4.07%) of the 221 participants. Average age was 32.30 (SD = 4.64). The majority of participants (n = 121, 54.75%) reported living in a suburban setting, in a total of 41 different states, plus the District of Columbia (n = 1) and Canada (n = 7). The most commonly reported states were Pennsylvania (n = 28), California (n = 20), Texas (n = 20), and New York (n = 12).

Procedures

The Lafayette College Institutional Review Board approved all study procedures. Data were collected between January 2019 and March 2020. All data were collected online, using Qualtrics (Qualtrics, Provo, UT) and SurveySignal (Hofmann and Patel 2015) software; no in-person study contact occurred. See our Open Science Framework (OSF) project for recruitment materials and the screening survey, consent form, baseline, and EMA surveys (Wenze 2022). Some survey questions were not relevant to current hypotheses.

To help preserve anonymity, we did not track or require participants to report where they learned about the study. However, ads were posted on United States-based online postpartum support (e.g., Twiniversity.com, “Newmomstuff” subreddit, “New Moms, Moms-to-Be, & Experienced Moms” Facebook community) and parenting forums (e.g., TwinsMagazine.com, “Parenting” subreddit, MommyCon’s “BFF” Facebook community), as well as websites and listservs for 24 different chapters of the National Organization of Mothers of Twins Clubs. Online recruitment was supplemented with in-person recruitment at a local new moms’ group and a presentation to local lactation consultants in the Lehigh Valley, PA area.

Individuals who followed the link presented in the study ad (n = 734) first completed (n = 678) a Qualtrics screening form. Those who passed the screener (n = 227; n = 130 multiple, n = 97 singleton) were routed to the informed consent document. Participants who consented (n = 225; n = 130 multiple, n = 95 singleton) were routed to a baseline survey, assessing demographics, symptoms of depression and anxiety, sleep disturbance, general stress and parenting stress, maternal-infant bonding, and relationship satisfaction (see “Materials,” below). Finally, after completing the baseline measures (n = 221; n = 127 multiple, n = 94 singleton), participants were routed to a registration page on SurveySignal.com, where they provided their mobile phone number (verified by text message), time zone, and email address. For the next week, those participants who provided this information (n = 130; n = 65 singleton, n = 65 multiple) received 4 automated text messages per day at 9am and at semi-random intervals between 11:30am and 1 pm, 3 pm and 4:30 pm, and 6:30 pm and 8 pm local time. Text messages included a direct link to a brief EMA survey, also hosted on Qualtrics, which included questions about sleep (morning survey only), mood, stress, fatigue, and maternal-infant connection (see “Materials,” below). Participants had 60 min to begin each EMA survey and received a reminder after 20 min if no response was initiated. Overall, study participation took 8 days; screening, consent, baseline surveys, and EMA registration took place on day 1, and EMA data collection took place on days 2–8. Participant flow is presented in Fig. 1.
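The prompt schedule described above can be illustrated with a short Python sketch. It is not a description of how SurveySignal works internally: drawing each semi-random prompt uniformly within its window is an assumption, and the names `WINDOWS`, `daily_prompt_times`, and `as_clock` are hypothetical.

```python
import random
from datetime import time

# All times are minutes after local midnight.
FIXED_PROMPT = 9 * 60                      # 9:00 am morning survey
WINDOWS = [
    (11 * 60 + 30, 13 * 60),               # 11:30 am to 1:00 pm
    (15 * 60, 16 * 60 + 30),               # 3:00 pm to 4:30 pm
    (18 * 60 + 30, 20 * 60),               # 6:30 pm to 8:00 pm
]
RESPONSE_WINDOW_MIN = 60                   # time allowed to begin a survey
REMINDER_AFTER_MIN = 20                    # reminder sent if no response yet


def daily_prompt_times(rng):
    """Return the four prompt times for one day.

    Assumption: each semi-random prompt is drawn uniformly within its window.
    """
    times = [FIXED_PROMPT]
    for start, end in WINDOWS:
        times.append(rng.randint(start, end))   # inclusive on both ends
    return times


def as_clock(minutes):
    """Convert minutes after midnight to a datetime.time for display."""
    return time(minutes // 60, minutes % 60)


rng = random.Random(2019)                  # seeded only so the sketch is repeatable
week = [daily_prompt_times(rng) for _ in range(7)]
total_prompts = sum(len(day) for day in week)   # 7 days x 4 prompts
```

Seven days of four prompts give the 28 scheduled assessments per participant referred to throughout the analyses.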

Fig. 1

CONSORT flow diagram. a There were 225 lines of data reflecting 221 discrete respondents for the baseline surveys. Responses suggested that, in 3 cases, a respondent’s browser crashed as they were routed from the consent form to the baseline survey (i.e., duplicate IP addresses and baseline survey timestamps within a minute of each other, but a first-survey duration of only several seconds with no data recorded). In these cases, the first, “empty” baseline survey was deleted and data from the second survey were included. Additionally, based on identical contact information, one respondent completed the survey twice, 3 months apart. Because relevant survey data (e.g., demographics, babies’ birthweights and ages) were consistent, we deleted this participant’s second set of data and included the first set of data.

Participants received $10 for completing baseline measures and $0.50 per EMA survey, delivered via electronic Amazon gift card. Participants who completed at least 21 (75%) of the EMA surveys also received a lottery entry for one of four $50 Amazon gift cards.

Numerous safeguards maximized response validity. Neither the recruitment flyer nor the screening form (which also included distractor questions) revealed specific inclusion criteria; respondents were unable to attempt the screener more than once on the same device; and participants could not advance to the survey without passing the screener. We also verified there were no duplicate IP addresses. Baseline surveys included several duplicate and open-ended questions, which we checked for completion, consistency, and coherence. Captchas were included in the screener, consent, and survey, and respondents could not advance the survey until a minimum amount of time elapsed on each page. Finally, the procedures required to register for and complete the EMA surveys preclude the possibility of a bot providing responses.

Materials

Baseline measures (single-administration, retrospective)

Participants completed a demographics questionnaire, the 20-item Center for Epidemiologic Studies Depression Questionnaire (CESD, Radloff 1977; possible range = 0–60, 16 or above = clinical concern), the Generalized Anxiety Disorder 7-item scale (GAD-7, Spitzer et al. 2006; possible range = 0–21, 0–4 = minimal anxiety, 5–9 = mild anxiety, 10–14 = moderate anxiety, 15–21 = severe anxiety), the 8-item Patient-Reported Outcomes Measurement Information System Sleep Disturbance Short Form 8a scale (PROMIS-SD, Buysse et al. 2010; possible t-score range = 0–100, M = 50, SD = 10), the 10-item Perceived Stress Scale (PSS, Cohen et al. 1983; possible range = 0–40, 0–13 = low stress, 14–26 = moderate stress, 27–40 = high stress), the 5-item Pre- and Postnatal Bonding Scale (PPBS, Cuijlits et al. 2016; this scale was administered as many times as participants had babies; possible range = 5–20), the 18-item Parenting Stress Scale (PaSS, Berry and Jones 1995; possible range = 18–90), and the 4-item Couples Satisfaction Index (CSI, Funk and Rogge 2007; possible range = 0–21, scores falling below 13.5 suggest notable relationship dissatisfaction). Questionnaires were presented in the same order for all participants. Higher scores reflect more of the measured construct.
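For readers who want to apply the interpretive cutoffs listed above, the following Python sketch encodes them directly. The cutoffs are those given in the text; the function names and the example scores are hypothetical, and treating a fractional (e.g., mean-imputed) total as belonging to the next band up is an assumption.

```python
# Upper bounds of each band, as listed in the measures description.
GAD7_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe")]
PSS_BANDS = [(13, "low"), (26, "moderate"), (40, "high")]
CESD_CUTOFF = 16      # 16 or above = clinical concern
CSI4_CUTOFF = 13.5    # below 13.5 suggests notable relationship dissatisfaction


def band(score, bands, minimum=0):
    """Return the label of the first band whose upper bound covers the score."""
    if score < minimum or score > bands[-1][0]:
        raise ValueError(f"score {score} is outside the possible range")
    for upper, label in bands:
        if score <= upper:
            return label


def interpret(cesd, gad7, pss, csi4):
    """Map total scores onto the interpretive categories used in the text."""
    return {
        "cesd_clinical_concern": cesd >= CESD_CUTOFF,
        "gad7": band(gad7, GAD7_BANDS),
        "pss": band(pss, PSS_BANDS),
        "csi4_dissatisfied": csi4 < CSI4_CUTOFF,
    }


# Hypothetical scores for illustration only (not study data).
example = interpret(cesd=18, gad7=6, pss=16, csi4=12)
```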

EMA survey (repeated administration, real-time)

The first survey each day assessed sleep quantity, interruption, timing, and effort, as well as subjective ratings of sleep, using sleep diary questions validated in previous work (Carney et al. 2012). Sleep quantity variables included Naps (number of naps in the past 24 h), NapDur (total nap time in the past 24 h, in minutes), TSTPM (total nighttime sleep time in the past 24 h, in minutes), and TST24 (total sleep time in the past 24 h, in minutes). Sleep interruption variables included Wakings (number of awakenings during the night) and WASO (wake time after sleep onset, in minutes). Sleep timing variables included SOnset (sleep onset—time of day relative to midnight, in minutes), SMidpoint (sleep midpoint—time of day relative to midnight, in minutes), and SOffset (sleep offset—time of day relative to midnight, in minutes). Sleep effort variables included TIB (time in bed, in minutes), SOL (sleep onset latency, in minutes), and SE (sleep efficiency—percentage of time spent asleep while in bed). Subjective ratings of sleep included Rested (“How rested or refreshed did you feel when you woke up for the day?”, 1–5 Likert scale) and Quality (“How would you rate the quality of your sleep?”, 1–5 Likert scale).
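The relationships among these sleep variables can be sketched as follows. The formulas are assumptions made for illustration, since the exact derivations are not spelled out here: nighttime sleep is taken as the onset-to-offset interval minus WASO, 24-h sleep as nighttime sleep plus nap time, midpoint as the halfway point wrapped onto a 0 to 1439 minute clock, and efficiency as nighttime sleep divided by time in bed. The class and function names are hypothetical.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class SleepDiaryEntry:
    s_onset: int    # sleep onset, minutes relative to midnight (negative = before midnight)
    s_offset: int   # final awakening, minutes relative to midnight
    tib: int        # time in bed, minutes
    waso: int       # wake time after sleep onset, minutes
    nap_dur: int    # total nap time in the past 24 h, minutes


def derive(entry):
    """Compute derived sleep variables under the assumptions stated above."""
    sleep_period = entry.s_offset - entry.s_onset
    tst_pm = max(sleep_period - entry.waso, 0)            # nighttime sleep
    tst_24 = tst_pm + entry.nap_dur                       # sleep across 24 h
    midpoint = ((entry.s_onset + entry.s_offset) / 2) % MINUTES_PER_DAY
    se = 100.0 * tst_pm / entry.tib if entry.tib > 0 else None
    return {"TSTPM": tst_pm, "TST24": tst_24, "SMidpoint": midpoint, "SE": se}


# Hypothetical night: asleep at 11:00 pm, up at 6:30 am, 45 min awake overnight.
night = SleepDiaryEntry(s_onset=-60, s_offset=390, tib=480, waso=45, nap_dur=30)
derived = derive(night)
```

For this hypothetical entry the sketch yields 405 min of nighttime sleep, 435 min across 24 h, a midpoint 165 min after midnight, and a sleep efficiency of about 84%.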

All four daily surveys then asked about mood (Positive and Negative Affect Schedule 10-item scale, PANAS, Thompson 2007, plus 10 additional items based on qualitative interviews with mothers of multiples, Wenze et al. 2020), stress (1 item; Wenze et al. 2018), fatigue (1 item; Powell et al. 2017), and feelings of in-the-moment connection with one’s babies (1 item, developed from widely-used measures of parental bonding, Brockington et al. 2001). After sleep diary questions, mood items were presented first, in a random order, to reduce the risk of question order bias across repeated EMA survey administrations. All other questions then appeared in the same order for all participants. Mood, stress, fatigue, and connection questions were rated on a 1 to 5 Likert scale, yielding a total range of 1 to 5 for these variables. Higher scores on all scales reflect more of the measured construct.

Analytic Approach

Because of their intensive, longitudinal, repeated measures design, EMA studies yield multilevel datasets. In our case, “level-1” data (repeated momentary assessments) are nested within higher-order, “level-2” units (the individual participants). Level 1 (EMA) data are therefore also called “within-participant” and level 2 (baseline) data are “between-participant.”

Hierarchical linear modeling is used to account for the nested, multilevel structure of EMA datasets. In our analyses, each participant’s repeated observations (up to 28) were used to generate a unique, unconditional level-1 regression equation modeling their own individual average level of each momentary outcome variable. At level 2, we obtained an average estimate of that outcome variable across the sample as a whole and we determined how singleton vs. multiple birth and postpartum period (6–12 vs. 18–24 weeks), both between-subjects variables, affected these overall estimates. For example, if we are interested in finding the effects of singleton vs. multiple birth and postpartum period on momentary stress, the level 1 regression equation is:

$${STRESS}_{ij}={\pi }_{0i}+{e}_{ij}$$

STRESSij is participant i’s level of stress at assessment j; the intercept, π0i, is participant i’s average level of stress across assessments; and eij is the error term for person i at assessment j. At level 2, we estimate how singleton vs. multiple birth and postpartum period affect these level-1 estimates, via the following equation:

$$\pi_0=b_{00}+b_{01}(\mathrm{singleton}\;\mathrm{vs}.\;\mathrm{multiple})+b_{02}(\mathrm{postpartum}\;\mathrm{period})+r_{0}$$

The intercept, b00, is the average level of momentary stress across the sample as a whole. Because mothers of singletons and the 6–12 week postpartum period are coded “0” and mothers of multiples and the 18–24 week postpartum period are coded “1,” b01 is the average difference in momentary stress for mothers of multiples vs. mothers of singletons and b02 is the average difference in momentary stress for participants who are 18–24 weeks vs. 6–12 weeks postpartum.
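Substituting the level-2 equation into the level-1 equation gives the equivalent single-equation (combined) form of the same model, which may be easier to map onto output from general mixed-model software. This restatement introduces no new terms; r0 is the participant-level random effect and eij is the assessment-level error:

$${STRESS}_{ij}=b_{00}+b_{01}(\mathrm{singleton}\;\mathrm{vs}.\;\mathrm{multiple})+b_{02}(\mathrm{postpartum}\;\mathrm{period})+r_{0}+{e}_{ij}$$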

For exploratory analyses (see Results, below), each participant’s 28 repeated observations were used to generate a unique level-1 regression equation modeling their own individual relationships between WASO (rescaled to hours for ease of interpretation) and wakings and the outcome variable. For example, if we are interested in finding the relationships between WASO and wakings and stress, the level 1 regression equation is:

$${stress}_{ij}={\pi }_{0i}+{\pi }_{1i}(WASO)+{\pi }_{2i}(wakings)+{e}_{ij}$$

Level-1 predictor variables are individual mean centered in HLM (i.e., centered around each person’s own average score), so stressij is participant i’s level of stress at assessment j; the intercept, π0i, is participant i’s level of stress at their average level of WASO and wakings; the slope, π1i, is the change in participant i’s level of stress for every one unit (i.e., one hour) increase in WASO; the slope, π2i, is the change in participant i’s level of stress for every additional nighttime waking; and eij is the error term for person i. At level 2, we obtain average estimates of those relationships across the sample as a whole:

$${\pi }_{0}={b}_{00}+{r}_{0}$$
$${\pi }_{1}={b}_{10}+{r}_{1}$$
$${\pi }_{2}={b}_{20}+{r}_{2}$$

Sleep data were obtained each morning at 9am, so these analyses model how disrupted sleep over the previous night predicts momentary outcomes the next morning.
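The within-person centering described above can be sketched in plain Python. This is only an illustration of the idea: in practice the centering is handled by the modeling software, and the record layout, function name, and example values below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_within_person(records):
    """Rescale WASO to hours and centre predictors on each person's own mean.

    `records` is a list of dicts with keys 'id', 'waso_min', and 'wakings'.
    """
    by_person = defaultdict(list)
    for rec in records:
        by_person[rec["id"]].append(rec)

    centered = []
    for pid, recs in by_person.items():
        waso_hours = [r["waso_min"] / 60 for r in recs]
        mean_waso = mean(waso_hours)
        mean_wakings = mean(r["wakings"] for r in recs)
        for r, w in zip(recs, waso_hours):
            centered.append({
                "id": pid,
                "waso_c": w - mean_waso,                   # hours above own average
                "wakings_c": r["wakings"] - mean_wakings,  # wakings above own average
            })
    return centered


# Hypothetical morning reports for two participants (not study data).
reports = [
    {"id": "A", "waso_min": 30, "wakings": 2},
    {"id": "A", "waso_min": 90, "wakings": 4},
    {"id": "B", "waso_min": 60, "wakings": 1},
    {"id": "B", "waso_min": 60, "wakings": 2},
    {"id": "B", "waso_min": 120, "wakings": 3},
]
centered_reports = center_within_person(reports)
```

After centering, a value of 0 means "this participant's own typical night," so the slopes describe how a worse-than-usual night for a given mother relates to her outcomes the next morning.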

In multilevel models, fifty respondents (level 2, between-participants) and 20 ecological momentary assessments per respondent (level 1, within-participants) are sufficient to permit accurate estimation of regression coefficients and variances (Hox 2002). Anticipating about 50% attrition after baseline measure completion, 20% attrition in EMA procedures (Depp et al. 2010), and higher rates of attrition among mothers of multiples, we originally aimed to recruit 300 participants (150 mothers of singletons, 150 mothers of multiples) to yield at least 50 participants per group with at least 20 EMA assessments each. We stopped data collection in March, 2020 with the initial COVID-19 lockdowns, prior to reaching this target. However, examination of our data showed that we already had 48 mothers of singletons and 47 mothers of multiples with 20 or more responses, so we did not re-open data collection.
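The recruitment target can be checked with simple arithmetic. The sketch below assumes the two attrition rates apply one after the other and equally to both groups. That is a simplification, because higher attrition was anticipated among mothers of multiples. The variable names are hypothetical.

```python
# Planning figures from the text, kept as integers to avoid rounding issues.
target_recruited = 300            # 150 mothers of singletons + 150 mothers of multiples
baseline_attrition_pct = 50       # expected loss after baseline measures
ema_attrition_pct = 20            # expected loss during EMA procedures
required_per_group = 50           # level-2 units needed per group

after_baseline = target_recruited * (100 - baseline_attrition_pct) // 100
after_ema = after_baseline * (100 - ema_attrition_pct) // 100
expected_per_group = after_ema // 2      # equal split is an assumption

margin_per_group = expected_per_group - required_per_group
```

Under these assumptions the target leaves 60 expected completers per group, a margin of 10 over the 50 required.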

In terms of between-group (i.e., mothers of singletons versus mothers of multiples or 6–12 weeks versus 18–24 weeks postpartum) comparisons, power analysis (G*Power version 3.1.9.6; Faul et al. 2007) indicated that our total sample size of 221 was sufficient to test between-group differences in baseline measures, where the between-group effects were medium or large in size (i.e., to achieve 80% power for detecting a medium effect, at a significance criterion of α = 0.05, N = 128 for ANOVA).

EMA data were analyzed using HLM 8.2. Between-group comparisons for baseline data were conducted using SPSS version 27 and two (multiple vs. singleton) by two (6–12 vs. 18–24 weeks) ANOVAs. No data were excluded from analyses. Participants for whom a predictor variable was missing (e.g., n = 9 for postpartum period) were omitted from models for all relevant analyses. At level 2, mean imputation was used when one scale item was missing. At level 1, items were averaged to yield mood subscale scores (see below) when one subscale item was missing. If more than one item was missing for a scale at level 2 or for mood subscales at level 1, a total score was not computed. We used a Hochberg adjustment (Hochberg 1988) to correct for multiple comparisons within each set of analyses.
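For readers unfamiliar with the Hochberg (1988) step-up procedure, a minimal Python sketch of the adjusted p-value computation follows. It illustrates the general procedure only; the function name and the example p-values are hypothetical, and which tests were grouped into each set of analyses is not reproduced here.

```python
def hochberg_adjust(p_values):
    """Return Hochberg step-up adjusted p-values in the original order.

    With p-values sorted in ascending order p(1) <= ... <= p(m), the adjusted
    value for p(i) is the minimum over j >= i of (m - j + 1) * p(j), capped at 1.
    """
    m = len(p_values)
    # Walk from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):           # rank 0 = largest p-value
        candidate = (rank + 1) * p_values[idx]   # multiplier is 1 for the largest
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for one family of three tests (not study results).
raw = [0.01, 0.04, 0.03]
adjusted = hochberg_adjust(raw)
significant = [p <= 0.05 for p in adjusted]
```

In this hypothetical family all three tests remain significant at 0.05, because the largest raw p-value is itself below 0.05 and the step-up rule then retains every smaller one.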

Results

Preliminary analyses

Participants with one or more imputed values (n = 56, 25.34%) did not differ from participants without imputed values on any demographic factors or baseline measures (CESD, GAD-7, PROMIS, CSI-4, PPBS, PSS, PaSS; all ps > 0.05). Reliability analyses and confirmatory factor analyses of EMA data at levels 1 (within-participants) and 2 (between-participants) supported retaining four separate mood subscales: PANAS negative mood items (NA-PANAS), PANAS positive mood items (PA-PANAS), the additional negative mood items (NA-other), and the additional positive mood items (PA-other; see Appendix for details). At each EMA assessment, mood items were averaged to yield these four subscale scores, which could range from 1.00 to 5.00. We conducted separate analyses on the item assessing overwhelm, since prior research revealed the most common word used by mothers of multiples to describe the postpartum period is “overwhelming” (Wenze et al. 2020).

Descriptive analyses

Table 1 presents baseline survey completion data and demographic information. As expected, compared to mothers of singletons, mothers of multiples were more likely to report using assisted reproductive technology (ART) to conceive; one or more babies needing NICU care; lower gestational age at birth; and smaller birthweight for babies (ps < 0.05). Mothers of multiples were also more likely to report an unplanned pregnancy (p < 0.05). However, given higher use of ART in this group, this probably reflects that having a multiple gestation pregnancy was unexpected, not that the pregnancy itself was unplanned. There were no other significant between-group differences.

Table 1 Baseline Survey Completion Data and Participant Demographic Characteristics

Table 2 presents means, standard deviations, and intercorrelations between baseline mood, sleep, stress, bonding, and relationship satisfaction. With one exception (anxiety scores were not significantly correlated with maternal-infant bonding scores), all intercorrelations were significant and in the expected directions. For example, higher depression was associated with higher anxiety, higher sleep disruption, lower relationship satisfaction, lower maternal-infant bonding, higher general stress, and higher parenting stress. In the current sample, CESD scale items were highly inter-correlated (α = 0.93) and total scores ranged from 0–38. GAD-7 scale items were intercorrelated (α = 0.87) and total scores ranged from 0–21. PROMIS scale items were intercorrelated (α = 0.89) and total t-scores ranged from 28.90–76.50. CSI-4 scale items were intercorrelated (α = 0.88) and total scores ranged from 1–21. PPBS scale items were highly intercorrelated (α = 0.93) and total scores ranged from 5–20. PSS scale items were intercorrelated (α = 0.89) and total scores ranged from 0–33. PaSS scale items were intercorrelated (α = 0.88) and total scores ranged from 21–71.

Table 2 Means, Standard Deviations, and Intercorrelations Between Baseline Variables

As noted in Fig. 1, 221 participants provided baseline data but only 130 completed EMA surveys. Compared to participants who registered for the EMA surveys, participants who dropped out after baseline procedures had significantly higher CESD scores (M = 15.63, SD = 10.88 vs. M = 10.43, SD = 7.76, t(63.42) = 3.01, p = 0.004), PROMIS scores (M = 53.90, SD = 6.57 vs. M = 51.29, SD = 8.92, t(89.87) = 2.02, p = 0.05), PSS scores (M = 18.74, SD = 6.14 vs. M = 15.73, SD = 7.14, t(173) = 2.48, p = 0.01), and PaSS scores (M = 49.32, SD = 8.36 vs. M = 40.76, SD = 9.88, t(171) = 5.01, p < 0.001), as well as lower CSI-4 scores (M = 13.21, SD = 4.11 vs. M = 15.13, SD = 3.83, t(180) = -3.02, p = 0.003) and PPBS scores (M = 13.77, SD = 4.22 vs. M = 17.97, SD = 2.70, t(66.64) = -6.59, p < 0.001). Dropouts were also more likely to report having multiples, χ2 (1, N = 221) = 6.22, p = 0.01, and being in the 6 to 12-week postpartum period, χ2 (1, N = 212) = 22.45, p < 0.001. There were no other significant differences between participants who provided only baseline data vs. those who provided baseline and EMA data.

Between-group baseline comparisons

Table 3 presents fixed-effects ANOVA results reflecting between-group baseline comparisons (CESD, GAD-7, PROMIS, CSI-4, PPBS, PSS, and PaSS) for mothers of multiples vs. mothers of singletons and mothers in the 6 to 12-week postpartum period vs. mothers in the 18 to 24-week postpartum period. Interaction terms were included in the models when the p-value was less than 0.20 (Tate and Pituch 2007).

Table 3 Fixed-Effects ANOVA Results for Baseline Measures

Compared to mothers of singletons, mothers of multiples reported more parenting stress (p < 0.001, partial η2 = 0.08). See Fig. 2. Compared to participants in the 18 to 24-week postpartum period, those in the 6 to 12-week period reported less relationship satisfaction (p = 0.01, partial η2 = 0.04). See Fig. 3. For analyses of maternal-infant bonding, there were main effects and an interaction effect, such that mothers of multiples in the 6 to 12-week postpartum period reported the lowest levels of maternal-infant bonding (p = 0.01, partial η2 = 0.04). See Fig. 4. Finally, for analyses of sleep disruption, there was a significant interaction effect; mothers of singletons who were 6–12 weeks postpartum reported less sleep disruption than mothers of singletons who were 18–24 weeks postpartum, whereas mothers of multiples who were 6–12 weeks postpartum reported more sleep disruption than mothers of multiples who were 18–24 weeks postpartum. See Fig. 5.

Fig. 2

Main Effect of Singleton vs. Multiple Birth on Parenting Stress. PaSS = Parenting Stress Scale scores. Error bars represent 95% confidence intervals.

Fig. 3

Main Effect of Postpartum Period on Relationship Satisfaction. CSI-4 = Couples Satisfaction Index scores. Error bars represent 95% confidence intervals.

Fig. 4

Main Effects and Interaction Effect of Singleton vs. Multiple Birth and Postpartum Period on Maternal-Infant Bonding. PPBS = Pre- and Postnatal Bonding Scale scores. Error bars represent 95% confidence intervals.

Fig. 5

Interaction Effect of Singleton vs. Multiple Birth by Postpartum Period on Sleep Disruption. PROMIS = Patient-Reported Outcomes Measurement Information System Sleep Disturbance Short Form 8a scale t-scores. Error bars represent 95% confidence intervals.

EMA Results

Participants completed an average of 21.10 EMA surveys (range = 2–28). We obtained a total of 3640 ecological momentary assessments across the 130 participants who provided EMA data. Median EMA survey completion time was 101 s (1.68 min).

Mood, stress, overwhelm, fatigue, & connection

NA-PANAS composite scores ranged from 1.00 to 4.80. PA-PANAS composite scores ranged from 1.00 to 5.00. NA-other composite scores ranged from 1.00 to 4.63. PA-other composite scores ranged from 1.00 to 5.00. The ranges for questions assessing stress, overwhelm, fatigue, and connection were 1.00 to 5.00.

Table 4 presents hierarchical linear modeling results for EMA of mood, stress, overwhelm, fatigue, and connection with babies. Compared to mothers of singletons, mothers of multiples reported more momentary stress (b01 = 0.29, corresponding to 0.29 point, p = 0.01) and overwhelm (b01 = 0.34, corresponding to 0.34 point, p = 0.01).

Table 4 Multilevel Regressions: Effects of Singleton Versus Multiple Birth and Postpartum Period on Momentary Variables

Sleep

Reported number of naps (Naps) ranged from 0 to 4. Reported total nap time (NapDur) ranged from 5 to 265 min. Reported total nighttime sleep (TSTPM) ranged from 0 to 665 min. Reported total 24-h sleep time (TST24) ranged from 0 to 745 min. Reported number of awakenings per night (Wakings) ranged from 1 to 12. Reported wake time after sleep onset (WASO) ranged from 0 to 300 min. Reported sleep onset (SOnset) ranged from -239 to 360 min (i.e., 8:01 pm to 6:00 am). Reported sleep midpoint (SMidpoint) ranged from 12 to 1435 min (i.e., 12:12 am to 11:55 pm). Reported sleep offset (SOffset) ranged from 60 to 720 min (i.e., 1:00 am to 12:00 pm). Reported time in bed (TIB) ranged from 105 to 810 min. Reported sleep onset latency (SOL) ranged from 0 to 330 min. Sleep efficiency (SE) ranged from 0% to 100%. The ranges for questions assessing feeling rested/refreshed and sleep quality were 1 to 5 and 2 to 5, respectively.

Table 5 presents hierarchical linear modeling results for EMA of sleep. Compared to mothers of singletons, mothers of multiples reported more nighttime awakenings (b01 = 0.53, corresponding to 0.53 more awakenings, p = 0.02) and more wake time after sleep onset (WASO; b01 = 26.54, corresponding to 26.54 more minutes per night, p < 0.001). Compared to participants in the 6 to 12-week postpartum period, those in the 18 to 24-week period reported less WASO (b02 = -28.90, corresponding to 28.90 fewer minutes per night, p < 0.001), an earlier sleep midpoint (b02 = -30.26, corresponding to 30.26 min earlier each night, p = 0.002), earlier sleep offset (b02 = -37.03, corresponding to 37.03 min earlier each morning, p < 0.001), and greater sleep efficiency (b02 = 7.86, corresponding to 7.86% more efficient, p < 0.001).

Table 5 Multilevel Regressions: Effects of Singleton Versus Multiple Birth and Postpartum Period on Sleep Variables

Exploratory analyses

Given prior research demonstrating the negative repercussions of repeatedly disrupted sleep (Medic et al. 2017), as well as the previously-presented results showing that mothers of multiples reported more nighttime awakenings and more WASO than mothers of singletons, we tested whether nighttime awakenings and WASO (rescaled to hours for ease of interpretation) were associated with other momentary variables. Results are presented in Table 6. WASO and wakings were each independently associated with fatigue (b10 = 0.32, p < 0.001 and b20 = 0.09, p = 0.007, respectively) and NA-other (b10 = 0.08, p = 0.011 and b20 = 0.05, p = 0.001, respectively). WASO was associated with PA-PANAS (b10 = -0.14, p < 0.001). Wakings were associated with NA-PANAS (b20 = 0.04, p = 0.001), PA-other (b20 = -0.08, p = 0.006), and stress (b20 = 0.11, p = 0.002).

Table 6 Multilevel Regressions: Effects of Wake Time After Sleep Onset and Nighttime Awakenings on Momentary Variables
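
Because WASO was rescaled to hours in these exploratory models, each WASO coefficient describes the expected change in the outcome per additional hour awake after sleep onset. The sketch below shows how such a per-hour slope translates to a difference in minutes, assuming a purely linear association. The helper name `per_hour_to_minutes` and the 30-minute example value are illustrative assumptions, not part of the reported analyses.

```python
def per_hour_to_minutes(b_per_hour: float, minutes: float) -> float:
    """Expected change in the outcome for a given number of minutes,
    given a slope expressed per hour of the predictor.

    Assumes a linear association (illustrative only).
    """
    return b_per_hour * minutes / 60.0


# Reported slope of WASO (in hours) on momentary fatigue.
B_WASO_FATIGUE = 0.32

# Hypothetical example: 30 additional minutes of WASO in a night.
change = per_hour_to_minutes(B_WASO_FATIGUE, 30)
print(round(change, 2))  # 0.16 points on the fatigue rating
```

Note that this is only an arithmetic restatement of the coefficient; it does not imply that WASO causes the change in fatigue.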

Supplemental analyses

As noted previously, mothers of multiples were significantly more likely to report use of ART, a NICU stay, lower gestational age, lower birthweight, and an unplanned pregnancy. All of these differences except for unplanned pregnancy (which, as discussed, probably reflects that having a multiple gestation pregnancy was unexpected, not that the pregnancy itself was unplanned) are an expected part of the experience of having multiples and, indeed, we would anticipate that they would significantly contribute to the relatively higher stress and mental health burden that parents of multiples often report. As such, statistically controlling for these factors in between-group comparisons may not make as much conceptual sense as controlling for a factor that is related to (but not an inherent part of) the independent or dependent variable(s).

Nevertheless, for exploratory purposes, we re-ran our main analyses while controlling for use of ART, NICU stay, gestational age, birthweight, and planned pregnancy. Findings were similar for between-group baseline comparisons, but there were differences in results at the momentary (i.e., EMA, within-participant) level. Of note, beta values remained similar even when findings dropped from significance, while standard errors doubled or tripled and degrees of freedom halved, reflecting the considerable reduction of power when controlling for so many covariates. Please refer to the Appendix for details.

Discussion and Conclusions

We compared self-reported mood, stress, sleep, relationship satisfaction, and maternal-infant bonding among women who had recently given birth to singletons or multiples, using both traditional, single-administration, retrospective recall measures and a 7-day period of EMA. Mothers of multiples reported more baseline parenting stress and less maternal-infant bonding than mothers of singletons, with mothers of multiples who were 6–12 weeks postpartum reporting the lowest bonding of all. Mothers of multiples also reported more momentary stress, overwhelm, nighttime awakenings, and wake time after sleep onset (WASO). Nighttime awakenings and WASO correlated with real-time, momentary fatigue, stress, and poor mood.

Consistent with prior work (Glazebrook et al. 2004), mothers of multiples reported more baseline parenting stress as assessed by retrospective self-report scales. Our research demonstrates that, on a real-time basis, subjective stress and overwhelm are also significantly higher. Increased caregiving burden, time demands, sleep disruption, medical complications, and guilt at inability to simultaneously meet all babies’ needs may be among the factors that yield elevated stress in this population (Wenze et al. 2020). Postpartum supportive stress management programs have been successful in singleton parents (Song et al. 2015). These approaches could likely be helpful for parents of multiples, though they would need to be tailored to this population's unique needs.

Mothers of multiples reported lower maternal-infant bonding than mothers of singletons, with those in the earlier (i.e., 6 to 12-week) postpartum period at particular risk. To our knowledge, this is the first study to examine postpartum bonding in parents of multiples. Unfortunately, research suggests that poor maternal-infant bonding predicts numerous negative outcomes, including pediatric behavior problems, abusive parenting, and maternal depression (Brockington 2011). In previous qualitative work (Wenze et al. 2020), new mothers of multiples reported feelings of grief and loss over inability to bond with their babies due to time constraints and exhaustion. Interventions to improve bonding (Mascheroni and Ionio 2019) and instrumental, structural support (e.g., universal, paid parental leave; insurance coverage for in-home help or night nurses), particularly in the early postpartum months, might help.

Compared to mothers of singletons, mothers of multiples reported more awakenings at night and more WASO. Prior work suggests that it is not just restricted total sleep time that is associated with poor physical and mental health; regularly disrupted sleep also correlates with worse health (Medic et al. 2017). Indeed, in the current study, nighttime awakenings and WASO were associated with worse momentary mood, more stress, and fatigue. Findings underscore the importance of interventions to improve postpartum sleep, particularly in parents of multiples. To our knowledge, however, no tailored sleep interventions for this population have been developed. Commonly-used strategies in parents of singletons (e.g., exercise, massage; Owais et al. 2018) might require adaptation in parents of multiples (e.g., completed at home, childcare required). Self-management strategies (Doering and Dogan 2018) could be promising.

Contrary to expectations, there were no significant differences in baseline levels of depression, anxiety, general stress, sleep disruption, or relationship satisfaction, or momentary mood, fatigue, or bonding between mothers of multiples and mothers of singletons. Potential reasons may include our choice of measures or the particular postpartum timepoints we surveyed. Previous work suggests that such methodological decisions can influence whether differences are found between mothers of multiples and mothers of singletons (Wenze et al. 2015). Our study was also under-powered to detect small effects. It is possible that our recruitment strategy played a role, as women who seek postpartum support in online forums and new moms’ groups might be struggling with different issues than those who seek help in hospitals and clinics (e.g., parenting stress and bonding challenges vs. anxiety and depression). With specific respect to the fact that mothers of multiples and mothers of singletons did not differ in their reported relationship satisfaction, in previous qualitative work, mothers of twins shared that the need to collaboratively “team-parent” was a pleasant surprise that improved their relationships with their partners (Wenze et al. 2020). It is possible that such “team-parenting” serves as a buffer against more severe postpartum relationship deterioration in couples parenting multiples. Finally, regarding the lack of significant between-group differences in certain momentary outcomes, differential attrition might have played a role. As noted previously, mothers of multiples and participants with worse mental health and sleep disruption were especially likely to drop out before providing EMA data.

Strengths of this study include measurement of under-studied outcomes (e.g., maternal-infant bonding in mothers of multiples), a comparison sample of mothers of singletons, and the use of EMA to reduce recall biases and capture real-time postpartum experiences. Limitations include missing data and a disproportionately White and non-Hispanic sample. We also did not include women who were 0–6 weeks postpartum, which is the timeframe during which new mothers are most likely to experience mental health challenges, stress, and relationship disruption (Munk-Olsen et al. 2006). Findings might not extend to partners or new mothers outside North America. Due to remote data collection, self-reported outcomes (both at baseline and in the moment, via EMA) could not be corroborated with behavioral observation, clinician ratings, or, in the case of sleep outcomes, actigraphy or polysomnography. Finally, it is likely that those participants who had the time, technology, and other resources to take part in this study are not representative of all new mothers; indeed, those who dropped out after completing the baseline assessments (before completing EMA) reported significantly worse mental health, sleep, maternal-infant bonding, and relationship satisfaction at baseline, and were more likely to report having multiples and being in the 6 to 12-week postpartum period.

Despite these limitations, this study contributes to a small but growing body of literature on perinatal mental health in parents of multiples. Future research should replicate these findings with an in-person sample, using objective, psychophysiological sleep measures and clinical rating scales. Future work should also include women in the early postpartum period (i.e., weeks 0–6), partners, and parents outside of the United States; like birth mothers, partners also fare poorly in terms of postpartum sleep and mental health following multiple births (Wenze et al. 2015), and the vastly different ways that pregnancy and the postpartum period are handled in different countries (e.g., regarding availability of healthcare, parental leave, and postnatal stipends and assistance) undoubtedly impact parental well-being. Exploring the impact of parity on postpartum well-being in parents of multiples is also an important topic for future study. Finally, tailored mental health support interventions for new parents of multiples are needed to reduce stress, improve sleep, and increase maternal-infant bonding in this vulnerable population. Encouragingly, research suggests that parents of multiples are enthusiastic about mobile mental health support (Wenze and Battle 2018; Wenze et al. 2020), which may be more accessible than traditional, in-person paradigms.