Introduction

Although fathers in Western countries play an important role in child development (Lamb, 2010) and have increased their participation in child rearing over the past decades (Bakermans-Kranenburg et al., 2019), they still spend less than half the amount of time on childcare compared to mothers (Huerta et al., 2013). Moreover, fathers generally seem to show lower parenting sensitivity than mothers in the first years after the child’s birth (e.g., Hallers-Haalboom et al., 2017). Fathers and children may thus benefit from interventions aimed at enhancing early paternal caregiving. Previous research showed that close physical contact between mothers and children induced by the use of a soft baby carrier or sling may positively affect maternal parenting and child attachment security (Anisfeld et al., 1990). In the current randomized controlled trial (RCT), preregistered on https://osf.io/qwe3a, we examined effects of the use of a baby carrier on fathers’ parenting behavior and hormonal functioning. Our research questions were: What are the effects of baby carrying on fathers’ interactive behavior with their baby, i.e., sensitivity and involvement, and hormonal functioning, i.e., basal oxytocin and cortisol levels and oxytocin and cortisol reactivity in response to interacting with the infant?

Over the past decades, there has been a growing awareness of fathers’ importance in child rearing, concurrent with increases in participation of fathers in childcare as well as in research focusing on fathering (Bakermans-Kranenburg et al., 2019). Despite these developments, research on parenting still tends to focus mostly on mothers, possibly because mothers are often still seen as primary caregivers as they generally spend more time with children (Schoppe-Sullivan & Fagan, 2020). In line with this focus on mothers, research on programs and interventions to stimulate paternal caregiving has been very limited. Nevertheless, such research is essential given the impact of paternal caregiving on child outcomes. More specifically, both paternal caregiving quality, i.e., paternal sensitivity, and quantity, i.e., paternal involvement, have been shown to be important for child outcomes (e.g., Lucassen et al., 2011; Sarkadi et al., 2008).

Paternal sensitivity is defined as fathers’ ability to perceive, adequately interpret, and appropriately and promptly respond to child signals (Ainsworth et al., 1974). Children whose fathers are more sensitive show, for example, better cognitive functioning, better emotion regulation, less externalizing behaviors and more attachment security (Lucassen et al., 2011; Rodrigues et al., 2021).

Paternal involvement is a multi-faceted construct that encompasses the amount of time fathers directly engage with their child, i.e., time spent in one-on-one interaction with the child such as in play or physical care, and are available or accessible to their child (i.e., being present without necessarily directly interacting; Lamb et al., 1985), but also the amount of time fathers spend thinking or communicating about or with their child (cognitive/affective involvement; Hawkins & Palkovitz, 1999). Research has shown that father involvement positively affects social, behavioral, psychological and cognitive child outcomes (Sarkadi et al., 2008). Those effects were mostly examined and reported for paternal engagement, although some studies also indicated positive effects of paternal accessibility or of an overall involvement measure combining different aspects (i.e., being accessible, engaged and responsible) (Sarkadi et al., 2008). Importantly, it has been shown that paternal sensitivity and involvement early in the child’s life is relevant for later child outcomes (Brown et al., 2012).

Because of the relevance of early paternal caregiving for child development, it is important to examine ways in which parental caregiving may be improved. Several studies found positive effects of close physical contact, such as skin-to-skin contact, on infants (e.g., infant crying; Erlandsson et al., 2007), and on parenting of both fathers (Chen et al., 2017) and mothers (e.g., Bigelow et al., 2010), even years later (Bigelow et al., 2018). Parent-infant physical contact can also be facilitated by baby carrying. Although research on the use and the effects of the use of baby carriers in fathers is virtually non-existent (but see Riem et al., 2021 for effects on fathers’ amygdala reactivity to infant crying in the current sample), studies in mothers have suggested that baby carrying may promote maternal caregiving and infant outcomes. One study in a sample of 49 mothers indicated that baby carrying may positively affect maternal behavior. Mothers who used a baby carrier showed higher responsiveness to infant signals at 3.5 months of infant age compared to mothers using a baby seat (Anisfeld et al., 1990). Baby carrying mothers were not significantly more sensitive than mothers using a baby seat, but the moderate effect size for sensitivity in this study seems to support continued evaluation of effects of baby carrying on parental sensitivity. Moreover, infants of mothers who used a baby carrier were significantly more securely attached when they were 13 months old, compared to infants of mothers using a baby seat, and the effect size for attachment was large (Anisfeld et al., 1990). Another study in a very small sample (N = 33) found similar results (Williams & Turner, 2020), but the fact that attachment was assessed at a very young age (7 months) and with an atypical procedure (the Still Face Paradigm) precludes drawing firm conclusions. Of note, the studies by Anisfeld et al. (1990) and Williams and Turner (2020) did not include a pre-test assessment and therefore it remained unclear whether using a soft baby carrier led to changes over time. Including pre-test assessments in an RCT is recommended, in particular when sample sizes are modest (Venter et al., 2002). Other studies (some with pre-test assessments) reported positive effects of baby carrying on infant crying (Hunziker & Barr, 1986) and on breastfeeding duration (Pisacane et al., 2012; Little et al., 2021). In addition, mothers tend to be more responsive to infant vocalizations when they carry their baby than during face-to-face interaction (Little et al., 2019). Taken together, these previous studies suggest that promoting physical contact between parent and infant can positively affect caregiving behavior. We expected that using a baby carrier would enhance both fathers’ sensitivity and their overall involvement (i.e., engagement, accessibility, and cognitive/affective involvement).

Baby carrying may also affect parent physiology. Two of the hormones that may be relevant are oxytocin and cortisol. Oxytocin is a neuropeptide that is produced in the hypothalamus and is mostly known for its involvement in labor and lactation, although oxytocin also has anxiolytic effects and has been related to a variety of (social) behaviors (Ellis et al., 2021). Cortisol is a steroid hormone secreted by the hypothalamic-pituitary-adrenal axis and is well known for its association with stress and stress regulation (Saxbe, 2008). Importantly, both oxytocin and cortisol have been associated with caregiving, such that oxytocin has been positively and cortisol negatively related to positive caregiving behavior (e.g., Bos et al., 2018; Naber et al., 2010). Increased oxytocin and decreased cortisol may therefore be relevant for caregiving behaviors and, in turn, for child outcomes. What is more, oxytocin and cortisol seem to be affected by physical contact (Field, 2010). Skin-to-skin contact of parents with their pre-term infants has been related to increases in oxytocin levels and decreases in cortisol levels measured with single saliva samples in both fathers and mothers (Bigelow et al., 2012; Cong et al., 2015; Vittner et al., 2019). Additionally, fathers who engaged in stimulatory touch with their infants showed higher levels of baseline plasma and salivary oxytocin (measured with single samples) and increases in salivary oxytocin levels (Feldman et al., 2010). Moreover, paternal salivary cortisol decreased when fathers held their infant (Kuo et al., 2018). Frequent use of a baby carrier may affect several aspects of fathers’ hormonal functioning. First, enhanced physical contact between parent and infant may increase secretion of oxytocin and decrease secretion of cortisol, resulting in higher basal oxytocin levels and lower basal cortisol levels. 
Second, fathers using a soft baby carrier may become more attuned to their infants and therefore show higher increases in oxytocin levels and higher decreases in cortisol levels following interaction with their infant.

Fathers in the control condition of the current RCT used a baby seat. We chose the baby seat as the control condition because this was also the control condition in the Anisfeld et al. (1990) study on the effects of using a baby carrier. Using a baby seat may stimulate fathers’ face-to-face interaction with their infant, but we did not expect any hormonal effects of using a baby seat because of the absence of close physical contact. Moreover, because mothers were found to be more responsive to infant vocalizations when they carried their baby than during face-to-face interaction (Little et al., 2019), we expected few (if any) effects of baby seat use on fathers’ sensitivity.

In this RCT, we examined the effects of baby carrying on fathers’ parenting behavior and hormonal functioning. Our primary hypothesis was that (1) fathers in the baby carrier condition will show higher increases in sensitivity from pre- to post-intervention compared to fathers in the baby seat condition. Our secondary hypotheses were that (2) fathers in the baby carrier condition will show higher increases in involvement from pre- to post-intervention compared to fathers in the baby seat condition; (3) fathers in the baby carrier condition will show increases in basal oxytocin levels and decreases in basal cortisol levels from pre- to post-test compared to fathers in the baby seat condition; (4) fathers in the baby carrier condition will show increased hormonal reactivity (i.e., higher increases in oxytocin levels and higher decreases in cortisol levels) in response to interacting with their infant from pre- to post-test compared to fathers in the baby seat condition. We additionally performed some exploratory analyses that were not preregistered. Specifically, we explored differences between the baby carrier and baby seat group in changes of fathers’ endorsement of parenting principles relating to regularity and routines (structure) and to infant cues and close physical contact (attunement). Finally, potential moderators of intervention effects on sensitivity and involvement, such as infant sex and reported tool-use time, were explored.

Methods

Study Design

First-time fathers in the early postnatal phase (i.e., 2–4 months post-birth) were randomly assigned to the experimental condition (i.e., use of a baby carrier, n = 41) or control condition (i.e., use of a baby seat, n = 39). Randomization was performed before the start of the intervention using a computer-generated randomization sequence. Information about the intervention condition was placed in numbered envelopes and participants were assigned to consecutive numbers based on the moment of their pre-test assessment. The envelope was opened only after the pre-test was conducted. Following the opening of the envelope, participants and interveners were not blind to the condition; however, participants were not informed about which of the two conditions was the experimental condition. Researchers involved in processing and coding of data were blind to the randomization. See Fig. 1 for a flowchart of participant inclusion, allocation and follow-up. The original planned sample size was 140 participants. Due to recruitment difficulties and time constraints (i.e., end-of-project funding), we obtained a sample of 80 participants. This study was sufficiently powered (0.80) to detect effect sizes of Cohen’s d = 0.32. A more detailed power analysis can be found in the preregistration (https://osf.io/qwe3a).

Fig. 1 Flowchart of participant inclusion, allocation and follow-up

Participants

First-time fathers were recruited via the distribution of flyers and letters through midwife practices, child healthcare centers, municipal records and via (online) advertisements between February 2018 and November 2019. Inclusion criteria were: male adults who recently had their first baby (i.e., infant’s age approximately 2 months), the infant is healthy and full-term (i.e., born after 37 weeks of gestation). A priori stated exclusion criteria were: not cohabitating with the child’s biological mother, no mastery of the Dutch language, a current endocrine disorder, the use of medication potentially interfering with the endocrine system, a cardiovascular disease, a psychiatric disorder, current heavy drinking, regular use of soft drugs, use of hard drugs within the past 3 months, MRI contraindications (e.g., metallic foreign objects, a neurological disorder), using a baby carrier for over 5 h per week at time of inclusion, and having an upper torso injury that could hinder the use of a baby carrier. Due to recruiting difficulties, we deviated from a priori exclusion criteria in seven cases (MRI contraindications, n = 3, these participants were included in all parts of the study except MRI scanning; diabetes and use of medication potentially interfering with the endocrine system (metformin), n = 1; ADHD, n = 1; cardiovascular disease, n = 1; child was born after 36 weeks and 6 days of gestation, but was considered healthy, n = 1). Fathers were 25–56 years old (M = 33.10, SD = 5.36) at time of the pre-test. Fathers were mainly born in the Netherlands and followed on average 8.28 years of education after primary school (SD = 1.85). One father was not the biological father of the infant, but he had been living with the biological mother since mid-pregnancy. Infants were 7–21 weeks old (M = 11.62, SD = 3.37) at time of the pre-test and approximately half of the infants were boys (53%). Demographics per intervention group are reported in Table 1. 
Additional information on fathers’ working hours and paternal leave is reported in the Supplementary Information.

Table 1 Demographics per intervention group

Procedure

Data were collected between March 2018 and May 2020 during sessions that took place at the lab (Leiden University Medical Centre; 82% of pre-test and 76% of post-test sessions), at home (11% of pre-tests and 15% of post-tests) or partly at the lab and partly at home (6% of pre-tests and 9% of post-tests). The location of the sessions depended on whether participants underwent MRI scanning and on participants’ personal preferences. Sessions were split in two parts when, for instance, the father came in alone for one part of the session because the infant was sick. The pre-test and post-test sessions comprised a set of questionnaires, sampling of saliva for the determination of hormone levels, a 10-min father–infant free play, followed immediately by an Auditory Startling Task (AST, Lotz et al., 2020; only for visits that took place in the lab), a second sampling of saliva, and additional behavioral and neural assessments that were not included in the current study (see Lotz et al., 2020 for a description of the full procedure, and Riem et al., 2021 for the effects of the intervention on amygdala reactivity to infant crying). After both the pre-test and the post-test session, fathers were provided with saliva swabs and passive drool tubes for the collection of saliva at home on 2 consecutive days in the week following the session. Moreover, after both sessions they received a link to online questionnaires to be completed at home (including questionnaires assessing involvement (time spent with infant), endorsement of parenting principles relating to structure and attunement (Baby Care Questionnaire), and depressive symptoms). All pre-test questionnaires were completed before the intervention ended; if not, they were marked as missing. Included pre-test questionnaires were completed on average 7.45 days (SD = 5.74) after the pre-test session and the vast majority was completed before the intervention started (i.e., 83%). 
Post-test questionnaires were completed on average 14 days (SD = 12.28) after the post-test session. Additionally, after both pre-test and post-test, paternal involvement was assessed in real-time via a smartphone application that started 1 day after the session. On average around 2 weeks after the pre-test (M = 13.27 days, SD = 4.75, range 7–31), fathers were visited at home and received instructions regarding the use of the assigned tool (i.e., baby carrier or seat). The intervention period lasted for 3 weeks, and the post-test was scheduled as soon as possible after the end of the intervention period. Mean number of days between start of the intervention and post-test was 28.04 (SD = 12.73, range 20–90 days). For three participants, the post-test could not take place shortly (i.e., within 3 weeks) after the intervention period ended because of scheduling difficulties (n = 2) or measures taken against COVID-19 (n = 1). Participants received a travel allowance and a financial compensation of a maximum of 95 euros for their participation. The study procedure was registered at the Central Committee on Research Involving Human Subjects (CCMO, registry number NL62692.058.17) and approved by the medical ethics committee of the Leiden University Medical Centre and the ethics committee of the Department of Education and Child studies at Leiden University. All participants provided written informed consent.

Intervention

Participants in the experimental group received a soft baby carrier during the intervention home visit. While wearing their infant in the carrier, fathers and infants are in close physical (chest-to-chest) contact. During the intervention visit, interveners verbally provided fathers with general information about infant physiology (e.g., grasp reflex, spread-squat reflex, anatomy of the infant’s spine and pelvis) and with safety instructions regarding the use of a baby carrier. Then, participants tried two different baby carriers (i.e., Kodaki Flip and Ergobaby Adapt) and chose which one they preferred to use. The intervener showed the father step-by-step how to use the baby carrier and let him practice using a baby doll and then their own infant. Fathers were told that during the intervention period, they were the only one allowed to use this baby carrier (i.e., not other caregivers). We informed fathers that the carrier contained a temperature logger to measure their use of the carrier, and asked them not to store the carrier near a heating device, in the sun, or within reach of any pets. Fathers in the control condition received a Doomoo baby seat during the intervention visit. Using the seat can induce proximity between father and infant, without requiring physical contact. Interveners showed fathers the (different parts of the) baby seat and gave instructions on safe usage of the seat. Fathers were asked to use the assigned tool (carrier or seat) for at least 6 h per week, spread over a minimum of 4 days, for 3 weeks. The intervention duration was informed by repeated consultation with an expert baby carrier consultant and aimed to balance sufficient exposure on the one hand and commitment of the participants on the other hand. It was expected that using the baby carrier for 6 h per week for a period of 3 weeks would be doable for fathers. 
We expected that using the baby carrier multiple times per week might be more effective than using it just once or twice for longer periods of time. An application was installed on fathers’ smartphones via which they were daily asked to report on their use of and experiences with the tool.

Measures

Primary outcome: sensitivity

Paternal sensitivity was coded from a videotaped 10-min free play interaction (5 min without toys, 5 min with toys) between father and infant at pre-test and post-test. Sensitivity was coded using the Ainsworth scales for Sensitivity and Cooperation (Ainsworth et al., 1974), which range from 1 (highly insensitive/highly interfering) to 9 (highly sensitive/highly cooperative). Sensitive fathers notice the child’s signals (e.g., distress, or interest in a specific toy) and respond to those signals promptly and adequately (e.g., provide comfort, or offer the child the toy that they show interest in). Insensitive fathers do not notice or respond to the child’s signals, e.g., do not pick up the infant when they stretch out their arms. Cooperative fathers follow the child’s initiative and do not interfere with their ongoing activities (e.g., when an infant explores a toy, the father lets the child play with it and does not introduce another toy). Interfering fathers do not respect the infant’s wishes or ongoing activities in situations in which a cooperative approach is appropriate (e.g., they forcefully take away a toy from the infant to show how it should be used or move the child without an apparent reason). The Ainsworth scales have been shown to be valid (De Wolff & Van IJzendoorn, 1997). Five coders were trained and found reliable with an expert coder: ICCs (single measures, absolute agreement) based on 20 videos ranged from 0.68 to 0.76 for sensitivity and 0.64 to 0.79 for cooperation, indicating adequate reliability (Cicchetti, 1994). All remaining videos were single-coded by one of the coders. Coders were blind to participants’ intervention condition, and pre-test and post-test interactions from one participant were never scored by the same coder. 
Sensitivity and cooperation scores significantly and highly correlated at pre-test, r (77) = 0.67, p < 0.001, and post-test, r (68) = 0.63, p < 0.001, and were therefore averaged into one score indicating parental sensitivity.

Secondary outcomes

Involvement

Involvement application: paternal involvement was assessed in real-time in the week following pre-test and post-test sessions via a smartphone application. Fathers received six app notifications per day, for 7 consecutive days. Notifications were sent at random times between 9 and 10 am, 11 and 12 am, 1 and 2 pm, 3 and 4 pm, 7 and 8 pm, and 9 and 10 pm and remained visible on participants’ phones for 1 h, after which they disappeared and questions could no longer be answered. The questions concerned various dimensions of fathers’ involvement in the past 15 min. Cognitive/affective involvement (Hawkins & Palkovitz, 1999) was assessed by asking fathers whether they had thought about, spoken about or communicated with their baby. Accessibility, i.e., being present and available to the infant, was assessed by asking fathers whether they had been near their baby. When fathers indicated they had been near their baby, they were asked whether the baby was awake. When fathers indicated the baby was awake, they were asked: “In the past 15 min, have you interacted with the baby, for instance changed the baby’s diaper or talked to the baby?” to assess engagement. See Supplementary Information for an overview of the involvement app questions. Responses were coded 0 = no, 1 = yes. A score for cognitive/affective involvement was calculated by averaging the scores for cognitive/affective involvement across the 1-week period. Similarly, accessibility scores were averaged across the 1-week period to create one accessibility score per person. A score for engagement was calculated by dividing the sum of engagement by the sum of times the infant was awake across the 1-week period. In case fathers never reported they were near their baby while their child was awake, engagement was coded as zero. Seventy-two and 71 fathers responded to at least one of the app notifications at pre-test and post-test, respectively. 
Response rates were calculated based on the responses to the first app question and, on average, fathers responded to 25 notifications, i.e., 60%, at pre-test (SD = 9.75 notifications, range = 1–40) and to 23 notifications, i.e., 55%, at post-test (SD = 11.16 notifications, range = 2–40).
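The scoring rules for the involvement application can be illustrated with a minimal sketch. It assumes each answered notification is stored as a dictionary of 0/1 responses; the field names (cognitive, near, awake, engaged) and the example data are hypothetical and are not taken from the study materials.

```python
from statistics import mean


def involvement_scores(responses):
    """Score one father's answered notifications for one assessment week.

    Each response is a dict with 0/1 values under the hypothetical keys
    'cognitive', 'near', 'awake' and 'engaged'.
    """
    # Cognitive/affective involvement and accessibility: mean of 0/1 answers.
    cognitive = mean([r["cognitive"] for r in responses])
    accessibility = mean([r["near"] for r in responses])

    # Engagement: number of engaged moments divided by the number of
    # moments the father was near the baby while the baby was awake.
    awake_count = sum(1 for r in responses if r["near"] and r["awake"])
    engaged_count = sum(
        1 for r in responses if r["near"] and r["awake"] and r["engaged"]
    )
    # Coded as zero when the father was never near an awake baby.
    engagement = engaged_count / awake_count if awake_count else 0.0

    return {
        "cognitive_affective": cognitive,
        "accessibility": accessibility,
        "engagement": engagement,
    }


# Hypothetical example: four answered notifications.
example = [
    {"cognitive": 1, "near": 1, "awake": 1, "engaged": 1},
    {"cognitive": 1, "near": 1, "awake": 0, "engaged": 0},
    {"cognitive": 0, "near": 0, "awake": 0, "engaged": 0},
    {"cognitive": 1, "near": 1, "awake": 1, "engaged": 0},
]
scores = involvement_scores(example)
```

In this example the father was near an awake baby on two occasions and engaged on one of them, so engagement is one half.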

One outlier for cognitive/affective involvement at pre-test, two outliers for engagement at pre-test and two outliers for engagement at post-test (all z < −3.29) were winsorized. Principal component analyses were conducted on the three dimensions at pre-test and post-test separately. For pre-test and post-test, one component was extracted with an eigenvalue >1 that explained 62% and 56% of the variance, respectively. The involvement variables loaded >0.69 at the pre-test component and >0.56 at the post-test component. Tucker’s congruence coefficient for the pre-test and post-test components was 0.998, indicating that the components at the two time points can be considered equal (Lorenzo-Seva & Ten Berge, 2006). Component scores were used in subsequent analyses. One outlier at pre-test (z < −3.29) was winsorized.
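Tucker’s congruence coefficient between two loading vectors is the sum of the products of the loadings divided by the square root of the product of their sums of squares. The sketch below shows that standard formula; the loadings are hypothetical values chosen for illustration, not the loadings obtained in this study.

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("Loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical loadings of the three involvement dimensions
# on the pre-test and post-test components.
pre_loadings = [0.80, 0.85, 0.70]
post_loadings = [0.78, 0.84, 0.60]
phi = tucker_congruence(pre_loadings, post_loadings)
```

The coefficient is insensitive to the overall scale of the loadings, so two proportional loading patterns yield a value of 1.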

Time spent with infant: we additionally assessed the amount of time fathers spent with the infant via an online questionnaire that fathers received following the pre-test and post-test sessions. Fathers reported at pre-test and at post-test for each day of the week the number of hours they spent with their child on average, counting only time that both father and child were awake. Mean scores were calculated reflecting the number of hours fathers spent with the child on average per day. Two outliers at pre-test (z > 3.29) were winsorized. As per preregistered plan, we examined the existence of a component “Involvement” by performing a PCA on the three involvement dimensions assessed in the involvement application and reported hours spent with child. One factor loading at post-test was <0.40, and therefore no component was extracted based on the four involvement measures, see Supplementary Information.

Oxytocin and cortisol

Saliva for the determination of oxytocin levels was collected using a cotton swab (Salivette, Sarstedt). Participants were instructed to lightly chew on the swab while slightly moving it around in their mouths. Oxytocin was assayed at RIAgnosis (Sinzing, Germany). After centrifugation of the salivettes at 4 °C for 30 min with ca. 5000 g centrifugal force, 0.3 ml of saliva was pipetted into a vial. Oxytocin was quantified using radioimmunoassay. Saliva samples were not extracted prior to assay because pilot studies have shown that radioimmunoassay data from extracted and unextracted saliva samples are almost identical (R. Landgraf, personal communication, March 5 2020). The detection limit of oxytocin was 0.1 pg/ml. Inter-assay and intra-assay variability was <10%.

Saliva for the determination of cortisol was collected using the passive drool method. Participants collected approximately 1.5 ml saliva in a 2 ml cryogenic vial by drooling directly into the vial or using a straw-like saliva collection aid (SalivaBio, Salimetrics). Cortisol was quantified at Dresden LabServices GmbH (Germany) using Luminescence immunoassay (IBL International GmbH). Twenty µl of saliva was used for the analysis of cortisol. The detection limit for cortisol was 0.012 µg/dl. A random selection of 32% of the pre-test and post-test lab and home samples were assayed in duplicate and the intra-assay coefficient of variation was 6%. Inter-assay variability was computed from controls run at each microtiter plate and amounted to ≤8 %. Samples for both oxytocin and cortisol were shipped in two batches.

Basal levels

Paternal basal oxytocin and cortisol levels were determined from saliva samples collected by the participant two times a day (morning and evening) on 2 subsequent days in the weeks following pre-test and post-test sessions. See Supplementary Information for a description of instructions fathers received. Per hormone, an area under the curve (AUC) with respect to the ground (Pruessner et al., 2003) was calculated across the four repeated measurements at pre-test and post-test separately, reflecting overall oxytocin (in pg/ml) and cortisol (in nmol/l) secretion from the morning sample from day 1 to the evening sample from day 2. Time in-between sampling moments for the calculation of the AUC was derived from the saliva collection application.
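As an illustration, the area under the curve with respect to the ground can be computed with the trapezoid formula described by Pruessner et al. (2003). The sketch below assumes four hormone values and the three time intervals between them; the numbers are made up for the example and the function name is hypothetical.

```python
def auc_ground(values, intervals):
    """Area under the curve with respect to ground (trapezoid formula).

    values: hormone levels at consecutive sampling moments.
    intervals: time between consecutive samples (len(values) - 1 entries).
    """
    if len(intervals) != len(values) - 1:
        raise ValueError("Need exactly one interval between each pair of samples")
    return sum(
        (values[i] + values[i + 1]) * intervals[i] / 2
        for i in range(len(intervals))
    )


# Hypothetical example: morning and evening samples on 2 days,
# with 12 h between consecutive sampling moments.
hormone_values = [10, 6, 12, 4]
hours_between = [12, 12, 12]
auc = auc_ground(hormone_values, hours_between)
```

Because the intervals enter the formula directly, unequal spacing between sampling moments is handled without further adjustment.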

Hormonal reactivity

Fathers’ oxytocin and cortisol reactivity in response to interacting with their infant was assessed from saliva samples collected before and after a 10-min father–infant free play interaction at pre-test and post-test. Fathers sampled saliva before the interaction and 10 min after the end of the interaction. Samples were stored at −20 °C as soon as possible after collection. Extreme outliers (i.e., z > 5) on the pre-interaction and post-interaction samples at pre-test and post-test were winsorized. A change score (post-interaction value – pre-interaction value) was calculated per hormone for both pre-test and post-test. Pre-interaction hormone values were significantly correlated with change scores for oxytocin at pre-test, and cortisol at pre-test and post-test, rs between −0.42 and −0.79, ps < 0.001. Therefore, residualized change scores were calculated (i.e., the post-interaction value was residualized for the pre-interaction value) and used in further analyses. Outliers, one for pre-test oxytocin reactivity, one for pre-test cortisol reactivity, and two for post-test cortisol reactivity (all z > 3.29), were winsorized.
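A residualized change score can be obtained by regressing post-interaction values on pre-interaction values and keeping the residuals. The following sketch uses ordinary least squares with a single predictor; it is an illustration under that assumption, with invented data, and does not reproduce the SPSS procedure used in the study.

```python
def residualized_change(pre, post):
    """Residuals of post-interaction values regressed on pre-interaction values."""
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    # Ordinary least squares slope and intercept for post = a + b * pre.
    sxy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    # Residual: observed post value minus the value predicted from pre.
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical hormone values before and after the free play interaction.
pre_values = [1.0, 2.0, 3.0, 4.0]
post_values = [2.0, 4.0, 5.0, 9.0]
residuals = residualized_change(pre_values, post_values)
```

By construction the residuals are uncorrelated with the pre-interaction values, which is why they are preferred over raw change scores when baseline levels predict change.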

Exploratory outcomes: structure and attunement

The Baby Care Questionnaire (BCQ; Winstanley & Gattis, 2013) was used to assess fathers’ endorsement of principles relating to regularity and routines (Structure; 17 items, e.g., “Babies benefit from a fixed napping/sleeping schedule;” αpre = 0.85, αpost = 0.85) and to infant cues and close physical contact (Attunement; 13 items, e.g., “Responding quickly to a crying baby leads to less crying in the long run;” αpre = 0.68, αpost = 0.75). The BCQ Structure and Attunement scales have been shown to have concurrent validity (Winstanley & Gattis, 2013). Items were rated on a 4-point scale ranging from 1 (strongly disagree) to 4 (strongly agree). Higher mean scores on the subscales reflected higher endorsement of principles relating to structure and attunement.

Covariates

Depressive symptoms

To allow controlling for potential intervention effects on fathers’ depressive symptoms and potential effects of fathers’ depressive symptoms on outcome variables, fathers completed the Edinburgh Postnatal Depression Scale (EPDS; Cox et al., 1987; Edmondson et al., 2010) after the pre-test and post-test research visits. A change score (EPDS score post-test – EPDS score pre-test) was calculated to reflect change in depressive symptoms. One outlier (z < −3.29) was winsorized. As per CONSORT guidelines (Moher et al., 2010), we did not test whether the intervention and control group differ on depressive symptoms at baseline as this is not a valid basis for adjusting for confounders (see, e.g., De Boer et al., 2015). We report analyses both without and with controlling for change in depressive symptoms.

Hormone covariates

We examined whether there were relevant covariates for hormone basal levels and reactivity, see Supplementary Information. We controlled pre-test oxytocin reactivity values for having drunk caffeine via residualizing. None of the other measures for hormone basal levels or reactivity were controlled for any covariates.

Intervention measures

A temperature data logger was secured in the baby carrier to measure how much participants effectively used the baby carrier during the intervention period. The logger recorded the temperature every 5 min across all 21 days of intervention. The recorded temperature approaches human body temperature while carrying. To distinguish periods of baby carrier use from periods of non-use, a baseline temperature threshold was set (see Supplementary Information). When temperature exceeded this threshold, this was considered the start-point of a carrying period. All subsequent measurement points during which the logger recorded a temperature above the threshold were considered part of the carrying period up until the last data point during which the temperature exceeded the threshold, which was considered the end point of the carrying period. The carrying period was reflected by clear peaks in the logger data (steep increase in temperature followed by a steep decrease). On average, fathers used the carrier for 4.36 h (SD = 4.00) spread over 3.41 days (SD = 1.95) in week 1, for 3.57 h (SD = 2.82) spread over 2.98 days (SD = 1.84) in week 2, and for 3.55 h (SD = 3.20) spread over 3.12 days (SD = 2.08) in week 3. Mean total recorded time of use across the intervention period was 11.38 h (SD = 8.63, range = 0–34.75 h).
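The threshold rule for identifying carrying periods can be sketched as follows. The threshold value and the readings are hypothetical (the actual threshold is described in the Supplementary Information), and the sketch assumes that each above-threshold reading counts as one 5-min interval of use.

```python
def carrying_periods(temperatures, threshold):
    """Return (start_index, end_index) pairs of consecutive readings above threshold."""
    periods = []
    start = None
    for i, temp in enumerate(temperatures):
        if temp > threshold and start is None:
            start = i  # first reading above threshold: start of a period
        elif temp <= threshold and start is not None:
            periods.append((start, i - 1))  # previous reading ended the period
            start = None
    if start is not None:
        # Recording stopped while the temperature was still above threshold.
        periods.append((start, len(temperatures) - 1))
    return periods


def total_minutes(periods, interval_min=5):
    """Total carrying time, counting each above-threshold reading as one interval."""
    return sum((end - start + 1) * interval_min for start, end in periods)


# Hypothetical logger readings (degrees Celsius), one every 5 min.
readings = [21, 22, 30, 33, 34, 29, 22, 21, 31, 32, 21]
periods = carrying_periods(readings, threshold=28)
minutes = total_minutes(periods)
```

With these readings the rule yields two carrying periods, one of four readings and one of two readings.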

Fathers reported daily on their use of the baby carrier or seat via an application on their smartphones. Mean reported use of the tool across the intervention period was 10.16 h (SD = 7.50), i.e., on average around half an hour per day. Baby carrier and baby seat users did not differ in reported tool-use time, t(72) = −0.09, p = 0.93. For the carrier users, reported tool-use time correlated highly with the time recorded by the temperature data logger, r(36) = 0.87, p < 0.001. For each time they reported using the tool, fathers also indicated the reason why they used the tool and after the intervention fathers indicated how they experienced using the tool. Descriptive information on the reasons for tool use and fathers’ experience can be found in the Supplementary Information.

Statistical Analyses

All analyses were performed in SPSS version 25. Effects with p values < 0.05 were considered statistically significant.

Missing data

Estimating incidental missings

Six fathers had partly missing home saliva samples. These missing values were imputed based on the non-missing samples of the same participants, in order to make use of the data closest to the missing values. Four fathers had missing values for evening samples but did have values for morning samples (or vice versa), and their missing values were imputed based on the regression equation in the complete cases predicting the hormone values in the evening from morning values on the same day (or vice versa). One participant at pre-test and one at post-test had two missing samples on 1 day (i.e., morning and evening); values for this day were replaced by his values for the other sampling day. When time in-between sampling moments was missing for a participant, mean imputation was used to replace the missing value. Similarly, for fathers who had partly missing EPDS scores (i.e., missing score at pre-test or post-test, n = 18), missing values were imputed based on the regression equation predicting post-test from pre-test (or vice versa) within their condition (carrier vs seat).
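As an illustration of the regression-based imputation described above, the following Python sketch fits a simple linear regression in the complete cases and uses it to fill in missing evening values from same-day morning values. The analyses themselves were run in SPSS; the function names and the example numbers here are hypothetical and serve only to show the logic.

```python
from statistics import mean
from typing import List, Optional, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def impute_evening(
    morning: List[Optional[float]], evening: List[Optional[float]]
) -> List[Optional[float]]:
    """Fill missing evening values from morning values using the complete-case regression."""
    complete = [(m, e) for m, e in zip(morning, evening) if m is not None and e is not None]
    slope, intercept = fit_line([m for m, _ in complete], [e for _, e in complete])
    filled = []
    for m, e in zip(morning, evening):
        if e is None and m is not None:
            filled.append(intercept + slope * m)  # predicted evening value
        else:
            filled.append(e)  # observed value, or still missing if morning is missing too
    return filled


# Hypothetical hormone values; None marks a missing sample
morning_values = [1.0, 2.0, 3.0, 4.0]
evening_values = [2.0, 4.0, 6.0, None]
imputed = impute_evening(morning_values, evening_values)
```

Swapping the two arguments gives the "vice versa" case, in which morning values are predicted from evening values.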

Multiple imputation

Eleven percent of the data concerning final scores on outcome variables, moderators, and fathers’ change in depressive symptoms were missing (range 1–20%). Little’s missing completely at random (MCAR) test indicated that data were MCAR, χ²(753) = 777.71, p = 0.31. Missing data were imputed 50 times with 100 iterations using predictive mean matching (Little, 1988). The imputation model contained all outcome variables, all moderators, and intervention condition, change in depressive symptoms, father’s age at pre-test, father’s educational level and infant’s age at pre-test. The imputed datasets were used in all further analyses, unless stated otherwise. One participant (allocated to the baby seat intervention) withdrew informed consent after participating in the pre-test, and therefore had missing data on all variables except condition allocation. This participant was included in the multiple imputation analyses to allow intention-to-treat analyses (Ye et al., 2011). Of note, an alternative approach of imputing data separately for the two conditions did not change this study’s main conclusions.
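To give an intuition for predictive mean matching, the sketch below shows a deliberately simplified single-predictor variant in Python. It is not the SPSS procedure used in this study: it uses one predictor instead of the full imputation model, it skips the random draw of regression parameters that full implementations perform, and the function name, the donor pool size k, and the example values are all hypothetical.

```python
import random
from statistics import mean
from typing import List, Optional


def pmm_impute(
    x: List[float], y: List[Optional[float]], k: int = 5, rng: Optional[random.Random] = None
) -> List[float]:
    """Simplified predictive mean matching: each missing y borrows an observed y
    from one of the k cases whose predicted values are closest to its own."""
    rng = rng or random.Random()
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]

    # Fit y = intercept + slope * x on the observed cases
    mx = mean(xi for xi, _ in observed)
    my = mean(yi for _, yi in observed)
    sxx = sum((xi - mx) ** 2 for xi, _ in observed)
    slope = sum((xi - mx) * (yi - my) for xi, yi in observed) / sxx
    intercept = my - slope * mx

    def predict(value: float) -> float:
        return intercept + slope * value

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = predict(xi)
        # Donors are the observed cases with the nearest predicted values
        donors = sorted(observed, key=lambda case: abs(predict(case[0]) - target))[:k]
        completed.append(rng.choice(donors)[1])  # impute a real observed value
    return completed


# Hypothetical data with one missing outcome; a seeded generator keeps the run reproducible
predictor = [1, 2, 3, 4, 5, 6]
outcome = [1.1, 1.9, 3.2, None, 5.1, 5.8]
completed = pmm_impute(predictor, outcome, k=2, rng=random.Random(0))
```

Because the imputed value is always borrowed from an observed case, imputations stay within the range of plausible scores, which is one reason the method is often preferred over plugging in the regression prediction itself.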

Main analyses

Univariate repeated measures ANOVAs were conducted to test main effects of condition (carrier vs seat), time (pre-test vs post-test), and the interaction effect time × condition on the primary outcome variable sensitivity and secondary outcome variables involvement, basal hormone levels and hormonal reactivity. To allow pooling of F-tests of the multiply imputed datasets, ANOVAs were carried out using the SPSS Mixed Models procedure using effect coding, including a random intercept of person to account for the repeated measures within persons (see Van Ginkel & Kroonenberg, 2014). For the effects on oxytocin reactivity and cortisol reactivity, we ran population averaged models instead of random effects models as parameter estimates for the fixed effects for the population averaged models and random effects models were identical. F-tests of the 50 imputed datasets were pooled using SPSS syntax (see Van Ginkel, 2010). We averaged effect sizes across the imputed datasets to provide an indication of effect sizes (Van Ginkel et al., 2020).

Effect of time using the carrier

To explore whether time using the carrier was associated with the outcome variables we computed correlations between carrying time as recorded by the temperature logger and post-test outcome variables within the carrier group.
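For readers less familiar with the statistic, a Pearson correlation between carrying time and a post-test outcome can be computed as in the following Python sketch. The helper name and the example values are hypothetical and do not come from the study data.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical carrying times (hours) and post-test sensitivity scores
carrying_hours = [2.0, 5.5, 11.0, 18.0, 30.0]
sensitivity_scores = [4.0, 5.0, 4.5, 6.0, 6.5]
r = pearson_r(carrying_hours, sensitivity_scores)
```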

Sensitivity analyses

Several sensitivity analyses were conducted to see whether findings were robust against different analytical decisions. These analyses are described in the Supplementary Information.

Exploratory analyses

We explored whether the intervention affected fathers’ endorsement of principles relating to structure and attunement using repeated measures ANOVAs. Additionally, we explored whether infant sex, fathers’ basal oxytocin levels at pre-test, infant health, pregnancy complications, effect of the child on fathers’ sleep, paternal protective parenting at pre-test, and time using the tool as reported by fathers moderated the time by condition effect on sensitivity and involvement (see Supplementary material for the measurement of these moderators).

Results

Descriptive Statistics

Table 2 reports the means and standard deviations of the outcome variables per condition in complete cases.

Table 2 Descriptives per intervention group (complete cases)

Main analyses

Primary outcome: sensitivity

For fathers’ sensitive parenting behavior as the outcome, there were no statistically significant effects of condition, F(1,72) = 1.04, p = 0.31, d = −0.24, time, F(1,71) = 0.96, p = 0.33, d = −0.23, or time × condition, F(1,70) = 0.25, p = 0.62, d = 0.12. The intervention did not affect fathers’ sensitive parenting behavior.

Secondary outcomes

Involvement

There were no statistically significant effects of condition, F(1,71) = 1.50, p = 0.23, d = 0.29, time, F(1,61) = 0.31, p = 0.58, d = −0.14, or time × condition, F(1,62) = 0.00, p = 0.96, d = 0.01, on paternal involvement as measured with the application. In addition, there were no statistically significant effects of condition, F(1,68) = 1.20, p = 0.28, d = 0.26, or time, F(1,53) = 0.04, p = 0.85, d = 0.05, on the time fathers reported spending with their infant. However, a statistically significant time × condition effect was found, F(1,59) = 8.27, p = 0.006, d = 0.73. Follow-up tests indicated that fathers in the carrier condition showed a non-significant decrease in time spent with infant from pre- to post-test, t(30) = −1.84, p = 0.08, d = −0.65, whereas fathers in the seat condition showed a significant increase in time spent with infant from pre- to post-test, t(26) = 2.16, p = 0.04, d = 0.83.

Basal hormone levels

A statistically significant effect of time on basal oxytocin levels was found, F(1,64) = 9.49, p = 0.003, d = −0.75, indicating that fathers’ basal oxytocin levels decreased from pre-test to post-test. There were no statistically significant effects of condition, F(1,69) = 0.67, p = 0.42, d = −0.19, or time × condition, F(1,66) = 0.64, p = 0.43, d = 0.19. For fathers’ basal cortisol levels as outcome variable, there were no statistically significant effects of condition, F(1,69) = 0.02, p = 0.88, d = −0.04, time, F(1,66) = 1.14, p = 0.29, d = 0.26, or time × condition, F(1,64) = 0.15, p = 0.70, d = 0.10. The intervention did not affect fathers’ basal oxytocin or cortisol levels.

Hormonal reactivity

No statistically significant effects of condition, F(1,72) = 0.83, p = 0.37, d = 0.21, time, F(1,73) = 0.00, p = 0.96, d = 0.01, or time × condition, F(1,73) = 1.77, p = 0.19, d = 0.31, on oxytocin reactivity were found. Moreover, there were no statistically significant effects of condition, F(1,72) = 0.24, p = 0.63, d = −0.11, time, F(1,72) = 0.01, p = 0.91, d = −0.03, or time × condition, F(1,73) = 0.61, p = 0.44, d = 0.18, on cortisol reactivity. The intervention had no effect on fathers’ oxytocin or cortisol change in response to interacting with their infant.

Effect of time using the carrier

There were no significant correlations between carrying time as recorded by the temperature logger and post-test outcome variables (see Supplementary material); however, the correlations between carrying time and sensitivity (r = 0.29), involvement (r = 0.22) and cortisol reactivity (r = −0.31) were of a moderate size and in the expected direction.

Sensitivity Analyses

Statistical results from the sensitivity analyses are presented in the Supplementary material. None of the sensitivity analyses resulted in different conclusions compared to the main analyses.

Exploratory Analyses

Effects on structure and attunement

There were no effects of condition, F(1,64) = 0.06, p = 0.81, d = −0.06, time, F(1,46) = 0.17, p = 0.68, d = −0.11, or time × condition, F(1,47) = 1.06, p = 0.31, d = 0.28 on Structure. Moreover, there were no effects of condition, F(1,65) = 0.12, p = 0.73, d = −0.09, time, F(1,47) = 0.79, p = 0.38, d = 0.24, or time × condition, F(1,54) = 0.13, p = 0.73, d = 0.09 on Attunement. The intervention did not affect fathers’ endorsement of parenting principles relating to structure and attunement.

Moderation analyses

None of the explored moderators (i.e., infant sex, fathers’ basal oxytocin levels at pre-test, infant health, pregnancy complications, effect of the child on fathers’ sleep, paternal protective parenting at pre-test, and time using the tool as reported by fathers) interacted significantly with time and condition in the prediction of paternal sensitivity (Fs between 0.00 and 2.29, ps > 0.14). Additionally, there was no significant time × condition × moderator effect for any of the explored moderators on involvement as measured with the application (Fs between 0.00 and 0.46, ps > 0.50) or on time spent with infant (Fs between 0.00 and 3.39, ps > 0.07). See the Supplementary material for the statistics of these analyses.

Discussion

This RCT, preregistered on https://osf.io/qwe3a, was the first to examine the effects of a soft baby carrier intervention on fathers’ parenting behavior and hormonal functioning. The results showed that the intervention did not affect fathers’ sensitive parenting, although carrying time was moderately (but non-significantly) associated with fathers’ post-test sensitivity in the intervention group. The baby carrier intervention also did not promote fathers’ involvement. Involvement operationalized as hours spent with the infant decreased over time for fathers in the carrier condition compared to fathers in the control condition. The baby carrier intervention had no effect on fathers’ basal oxytocin or cortisol levels, nor did it affect fathers’ oxytocin and cortisol reactivity to interacting with their infant. Additionally, there were no effects on fathers’ endorsement of principles relating to structure and attunement. Exploratory moderation analyses revealed no moderators of the intervention effects on sensitivity and involvement.

No significant effect of baby carrying on paternal sensitivity was found in the current study. Using an inferiority test (Lakens et al., 2018), we compared this study’s effect to a previous study in mothers showing a medium, albeit non-significant positive effect on sensitivity (Anisfeld et al., 1990) and found that the current effect was statistically smaller than the previously reported effect in mothers, t(68) = −3.17, p < 0.01. Baby carrying may affect mothers differently than fathers, as sex-specific differences have been reported in parenting and its neurohormonal correlates (Rajhans et al., 2019). Nevertheless, such differences seem attenuated when fathers are more attuned to and involved with their infants (Abraham et al., 2014), which is exactly what the current baby carrier intervention aimed to do. Other studies did find positive effects of skin-to-skin contact on fathers’ feelings and behavior toward their baby (Chen et al., 2017; Varela et al., 2014), but it should be noted that the baby carrier did not promote skin-to-skin contact. Additional differences between the current study and the Anisfeld et al. (1990) study may explain the diverging findings. Specifically, the Anisfeld et al. (1990) sample consisted of mothers with a low socioeconomic status, whereas fathers in the current study were overall highly educated. Parents from lower socioeconomic backgrounds have been reported to show lower parenting sensitivity (Pelchat et al., 2003) and examining effects of baby carrying in fathers with a lower socioeconomic background may be an interesting target for future research. Importantly, in the study by Anisfeld et al. (1990) no pre-test was included and therefore it remained unclear whether using a soft baby carrier led to an increase in maternal sensitivity over time.

The baby carrier intervention did not promote fathers’ involvement with their infants. No effect was found on involvement assessed in real-time using a smartphone application. Unexpectedly, we found that fathers in the baby seat condition significantly increased in self-reported hours spent with their infant from pre- to post-test, whereas fathers in the baby carrier condition showed a non-significant decrease over time. This suggests that using a baby seat may positively influence how much time fathers spend with their child. Speculatively, enhanced face-to-face or playful interaction between father and infant through the use of a baby seat may stimulate father–child bonding and contribute to an increase in fathers’ time spent in caretaking activities (Premberg et al., 2008). Also note that we had no information on whether the child was awake or asleep during tool use, and if children tended to fall asleep during carrying this would not promote further interaction between father and child. Future research should examine whether the increase in involvement in the seat condition and trend toward decrease in involvement in the carrier condition are maintained over a longer-term period.

Fathers’ basal oxytocin and cortisol levels were not affected by the baby carrier intervention, nor was their oxytocin or cortisol reactivity to interacting with their infant. These findings are not in line with previous studies suggesting that touch and skin-to-skin contact are related to increased oxytocin levels and decreased cortisol levels (e.g., Cong et al., 2015; Feldman et al., 2010; Field, 2010). In contrast to these previous studies, we did not assess immediate effects of physical contact, but rather longer-term effects on hormonal basal levels and reactivity. Possibly, physical contact enhances oxytocin and decreases cortisol momentarily, but not in the longer term. In line with this suggestion, fathers’ cortisol levels were found to decrease during skin-to-skin contact with their infant, but increased again shortly after the skin-to-skin period ended (Cong et al., 2015). Moreover, we note again that in our study we did not assess effects of skin-to-skin contact but rather of physical contact that was not directly skin-to-skin. Although both types of contact are physical, they are not the same, and hormonal levels may be more strongly affected by direct skin-to-skin contact. Interestingly, we found a decrease in oxytocin over time independent of intervention condition, suggesting that fathers’ oxytocin levels drop from 2–4 to 3–6 months postnatally. It is still largely unknown how fathers’ oxytocin levels may change throughout the postnatal phase. A previous study reported a rise in plasma oxytocin levels in first-time fathers from the first postpartum weeks to 6 months postpartum (Gordon et al., 2010). Taken together, the findings are mixed, and there may be large interindividual differences in hormonal changes. Future research may focus on examining how fathers’ hormonal levels vary during this period and on potential moderators of such changes.

Limitations and Future Directions

This study’s findings should be considered within the context of some limitations. First, fathers did not all adhere to the instruction to use the carrier for at least 6 h per week, spread over a minimum of 4 days, for 3 weeks. The average recorded time of use across the intervention period was less than 12 h, whereas it would have been at least 18 h if the fathers had followed the instructions. Intervention effects might have been stronger if program adherence had been higher. We did find moderate correlations within the carrier group of recorded carrying time with sensitivity, involvement, and cortisol reactivity at post-test, suggesting that fathers who used the carrier more intensively were more sensitive and more involved, and that their cortisol levels decreased more while interacting with their infant. These correlations should be interpreted cautiously because they did not account for pre-test differences between fathers and they might be explained by other factors relating to both carrying time and outcome variables. The exploratory moderation analyses indicated that time using the tool did not significantly moderate the effects of the intervention, but this may also be due to compromised power for the moderation analyses. Future studies may consider adjusting the intervention schedule by, for instance, increasing the number of intervention weeks. Extending the intervention period will allow us to assess whether longer use of the baby carrier (and/or use with somewhat older infants) has more pronounced effects. As an alternative or complementary approach, parents may be contacted weekly to review their baby carrier use, so that use of the baby carrier can be reinforced and constructive feedback can be provided when they experience problems in (sufficient) use of the baby carrier. For many interventions, higher compliance predicts better outcomes (Berkel et al., 2018; Clarke et al., 2015).

Second, we had no information on baby carrier use of fathers in the control condition during the intervention period. We know that at the time of inclusion none of the participants used a carrier for over 5 h per week. Importantly, we asked fathers in both conditions approximately 4 months after the end of the intervention how much they had used an infant carrier on average per week over the past 4 months. Fathers in the carrier condition reported using a carrier more often than fathers in the control condition, with a medium effect size (Cohen’s d = 0.47). This suggests that fathers assigned to the carrier intervention used the infant carrier more often than fathers in the control condition in the months after the intervention and it is likely that this difference was at least similar during the intervention period. However, it remains important for future studies to rule out cross-contamination during the intervention period.

Third, we only assessed short-term effects of the intervention and therefore cannot speak to potential effects of baby carrying in the longer term. Research on the effects of the baby carrier intervention suggests that spending time in physical contact with the infant may promote fathers’ attention to infant signals (i.e., increases fathers’ amygdala reactivity to infant crying; Riem et al., 2021). Possibly, this may stimulate sensitivity in the longer run, but this remains to be tested in future research.

Fourth, for the measurement of basal cortisol levels, it might have been optimal to increase sampling intensity, i.e., collect more than two samples per day, as this would have allowed us to account for cortisol’s diurnal rhythm, which is not linear. However, given that most of our participants worked outside the home, this would have led to a large amount of missing data, and we therefore preferred collecting saliva four times across 2 days. Nevertheless, as sampling intensity may affect the accuracy of cortisol estimation, future studies should consider using a higher number of samplings per day (Hoyt et al., 2016).

Fifth, we focused on fathers in the current study and not on both parents. Although less is known about paternal caregiving, it would be beneficial for future studies to take multiple caregivers (e.g., fathers and mothers) into account. Also, the fathers in this study were in heterosexual relationships, which may affect their parenting roles. Finally, we did not assess effects of the carrier intervention on infant outcomes, such as attachment security. Examining effects of fathers’ baby carrying on infants seems an interesting avenue for future research as previous studies on infant carrying in mothers and skin-to-skin contact in fathers reported positive effects on child outcomes (Anisfeld et al., 1990; Shorey et al., 2016).

Implications

Our findings have implications for those who want to promote paternal sensitive caregiving. In all, we did not find that the baby carrier intervention affected fathers’ caregiving behavior. It is important to examine whether interventions such as a baby carrier intervention can have positive effects on parental sensitivity, as this type of intervention requires less involvement of interveners and is easier and cheaper to implement than interventions that specifically target sensitive caregiving behavior (e.g., Video-feedback Intervention to promote Positive Parenting and Sensitive Discipline [VIPP-SD]; Juffer et al., 2008; see Buisman et al., 2022, for effects of prenatal video-feedback using ultrasound [VIPP-PRE] on fathers’ sensitivity). If a baby carrier were as effective as personalized feedback in promoting sensitive parenting behavior, the lower costs of a baby carrier intervention would make it the more cost-effective option. However, the current findings provide no indication that this less extensive type of intervention is effective at enhancing paternal sensitivity (at least not with the current carrying duration). Intervention programs may need to target parenting behavior more directly to positively affect it.

A theoretical issue is whether effects of a baby carrier intervention could be different for men and women, because of differential hormonal processes underlying parenting behavior. In support of this idea, increasing oxytocin levels were found to be related to different parenting behaviors in fathers and mothers (Feldman et al., 2010). Nevertheless, as mentioned above, other explanations for the different outcomes for fathers in this study compared to the Anisfeld et al. (1990) results for mothers cannot be excluded. Additionally, it is unclear how much the mothers in that study used the infant carrier. If mothers in that study used the infant carrier more often and for longer than fathers in the current study, it may be beneficial in future research to follow up more closely with fathers or to more actively stimulate them to use the baby carrier more frequently during the intervention.

Conclusion

In conclusion, this RCT was the first to examine the effects of a soft baby carrier intervention on fathers’ interactive behavior with their infant and their oxytocin and cortisol levels. Infant carrying did not promote fathers’ sensitive parenting or involvement, nor did it affect their basal hormonal levels or reactivity to interacting with their infant. Future research may examine whether infant carrying has more beneficial effects in the longer term or in different groups of fathers (e.g., fathers with a lower socioeconomic status). Of course, besides the impact on fathers, the effect on infants of being carried by their fathers is an important question still to be answered.