Introduction

Pre- and perinatal events are associated with childhood and adult behaviour and mental illness [1]. Low birth weight (LBW) is a well-studied risk factor in the field of developmental origins of health and disease (DOHaD) [2] and predicts adult psychopathology and violent criminal behaviour [3,4,5,6]. Similarly, childhood behaviour has been associated with LBW in multiple settings, including twin studies [7,8,9,10,11]. There are two primary reasons why babies have LBW: they are either constitutionally small with symmetric fetal growth, or they have fetal growth restriction, which is associated with more neurodevelopmental deficits [12]. The latter is typically marked by asymmetric fetal growth (larger head than abdomen) as blood/nutrient flow is diverted to the developing brain. Male fetal sex in LBW is associated with more asymmetric growth [13], more perinatal complications [14] and possibly more early neurodevelopmental deficits [15,16,17,18], suggesting male vulnerability. Given widespread sexual dimorphism in normal fetal neurodevelopment [19], neuropsychiatric behavioural correlates of LBW could similarly differ, but reports of behavioural sex differences from LBW in childhood and adolescence have been inconsistent [20,21,22,23].

Murray et al., who examined attention problems on the Child Behaviour Checklist (CBCL) [24, 25], found stronger effects of reduced fetal growth in females than in males [21]. In that study, 3700 individuals from the Brazilian PELOTAS birth cohort were followed prospectively with confounders recorded during pregnancy and behavioural assessment at age four. In contrast, Momany et al., using the DSM-IV ADHD Rating Scale and the Conners’ Rating Scale–Revised Short Form, found that lower BW was more strongly associated with externalising and ADHD behaviour in males at age 12 [22] in 900 individuals from North America. In that study, parents recalled confounders and BW, giving rise to the risk of recall bias. A 2018 meta-analysis by the same author did not find an influence of sex on the effect size of BW on ADHD but demonstrated that use of categories for both BW and ADHD measures resulted in significant heterogeneity in effect sizes [26]. Interestingly, a high-powered study by Dooley et al. subsequently examined the sex-specific effects of continuous BW on all continuous CBCL scores [25]. In that paper, nearly 10 000 individuals in the ABCD cohort were assessed at age 9–11, and the association between BW and the CBCL subscale and total scores was then tested [20]. At a conservative significance threshold, Dooley et al. found an inverse association between BW and total CBCL scores, attention problems and aggression problems driven by males [20]; additionally, social problems had a nominally significant increase in males with lower BW compared to females, but the sex interaction was not significant after correction for multiple testing. Limitations of the ABCD study are the reliance on parental recollection of BW and potential confounders and the single behaviour assessment.
The cross-sectional nature of previous studies limits conclusions regarding persistent sex differences, as low BW and biological sex are also implicated as determinants of behavioural phenotype trajectories across childhood and adolescence [27, 28]. A male vulnerability to attention- and peer problems from lower BW was partly supported by a recent study using repeated measures of the Strengths and Difficulties Questionnaire from ages 9–17, but the interaction was not significant. In addition, the study’s information on familial confounders of BW was collected from age 9 onward and could represent downstream consequences of childhood behaviour [23]. The conflicting results at different ages and the limitations cited above leave the question of a persistent sex difference unresolved.

This paper aims to test sex differences in the relationship between BW and outcomes of aggression-, social- and attention problems. We seek to add to previous knowledge by using repeated measures across childhood and adolescence and to adjust for confounders collected during pregnancy to avoid recall bias.

Participants and methods

Study sample

We analysed data from the Raine Study (https://rainestudy.org.au/) [29]. The Raine Study is a longitudinal study following mother-baby dyads recruited at or around 18 weeks gestation (n = 2979) through the public antenatal clinic at King Edward Memorial Hospital and nearby private clinics in Perth, Western Australia, from May 1989 to November 1991. Offspring with information on BW and sex were followed up throughout childhood for behavioural assessments (age 5 n = 2058, age 8 n = 1978, age 10 n = 1915, age 14 n = 1697, age 17 n = 1314), with greater attrition among mothers with lower age, education and income and among mothers of non-European ancestry [30, 31]. The Human Research Ethics Committees at the University of Western Australia, King Edward Memorial Hospital, and Princess Margaret Hospital in Perth, Australia, granted ethics approval for each follow-up in the study.

Outcome variables

Using the Achenbach System of Empirically Based Assessment (ASEBA) CBCL for Ages 4–18 (CBCL/4–18), we derived scores for attention problems, aggressive behaviour and social problems at ages 5, 8, 10, 14 and 17 based on the report by Dooley et al. [20]. The CBCL/4–18 is a commonly used dimensional measure of child behaviour during the previous six months. The complete questionnaire contains 118 items and shows good internal reliability and validity in several population settings [24]. Participants were excluded from the analysis if they were missing more than 8 items on the entire CBCL [32]. The attention problems subscale measures problems of attention, impulsivity and hyperactivity and consists of 11 items (score 0–22). The social problems scale measures peer interaction problems and consists of 8 items (score 0–16). The aggressive behaviour scale consists of 20 items (score 0–40). The CBCL is a highly validated psychometric tool and is used in the clinical setting as a guide and screening tool [24, 25, 33], with reproducibility of an 8-factor structure across countries [34]. Previous authors have used the CBCL raw scores (and not T-scores corrected for age and sex) to examine sex differences [20, 21]. We chose the same approach of regressing the raw scores. The CBCL syndrome scores across populations tend to be right-skewed and the clinical cut-offs for the raw scores are therefore low (for attention problems, clinical relevance is assumed around 7 to 8 points depending on sex and age). Post hoc, we used T-scores to derive a clinical “borderline” score in each of the 3 domains (T-score cut-off ≥ 67).

We used two additional instruments for sensitivity analyses: the 1991 ASEBA preschool form of the CBCL (also filled out by parents) and the Teacher Report Form (TRF). The preschool CBCL scale for aggressive behaviour was assessed at two years of age and had 33 items (score 0–66). The 1991 edition of the preschool questionnaire did not have a validated attention problem and social problem scale. The TRF is another well-validated form under ASEBA and can be used in the clinical setting to support findings from the CBCL [35]. The TRF attention problems scale has 20 items (score 0–40), the aggression problems scale has 25 items (score 0–50) and the social problems scale has 13 items (score 0–26).

Early life determinants and potential confounders

Gestational age (GA) in weeks was determined either by the date of the last menstrual period (LMP) or fetal biometry at the 18-week gestation ultrasound (USS) examination. Maternal age, BW and fetal sex were retrieved from hospital records. Different populations have different normal spectra for BW [36], so we used continuous, standardised BW as the exposure, obtained by subtracting our sample BW mean and dividing by the sample standard deviation. Post hoc, a dichotomised variable corresponding to LBW was derived (< 2500 g cut-off).
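The exposure coding described above can be sketched in a few lines. This is only a minimal illustration, assuming the sample standard deviation uses the n − 1 denominator; the function names and the example birth weights are hypothetical and are not taken from the Raine Study data.

```python
from statistics import mean, stdev

LBW_CUTOFF_G = 2500  # post hoc low-birth-weight threshold stated in the text


def standardise(weights_g):
    """Return z-scores: (value - sample mean) / sample SD."""
    m = mean(weights_g)
    s = stdev(weights_g)  # n - 1 denominator; an assumption, not stated in the text
    return [(w - m) / s for w in weights_g]


def is_low_birth_weight(weight_g):
    """Dichotomised exposure: True when BW is below 2500 g."""
    return weight_g < LBW_CUTOFF_G


# Hypothetical birth weights in grams, for illustration only
weights = [2400, 3100, 3300, 3500, 4200]
z_scores = standardise(weights)
lbw_flags = [is_low_birth_weight(w) for w in weights]
```

After standardisation, a regression coefficient for BW reads as the change in score per 1 SD of birth weight, which is how the estimates in the Results are expressed.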

Information on confounders was recorded at prenatal visits at gestational week 18 by maternal questionnaire. As pregnancy-related risks are increased in mothers with predisposition for mental illness [37] we included factors that could be associated with both BW and behavioural outcomes. The included factors were maternal education, maternal psychiatric illness, maternal smoking during pregnancy, maternal diabetes mellitus or hypertension, family income, maternal ethnicity and maternal alcohol consumption (for questionnaire formulation see supplementary materials). No post-natal variables were used, to avoid adjusting for downstream consequences of BW. We did, however, include the cohort age at assessment as a fixed term to minimise the noise from age-related CBCL-score reductions. Potential confounders were inserted in a directed acyclic graph (supplementary Fig. 1) to help us decide on the co-variables to include in our models. Smoking, alcohol consumption, economic class and maternal education were recorded as ordinal 5–6 level variables and were treated as continuous in the multivariable analysis.

Statistics

All statistical analyses were performed and graphs produced in “R” [38] using its associated libraries “Gmisc”, “lme4”, “lmeresampler”, “lmerTest” and “boot”. The Wilcoxon rank-sum test was used to compare sample distributions in the continuous baseline variables and CBCL outcomes between males and females.

Testing the associations between BW and behavioural outcomes was done with mixed-effects modelling to avoid pseudo-replication from repeated measurements of the same participant. Recent work has demonstrated that the treatment of ordinal data as continuous does not impact inference in most situations [39], and linear mixed modelling is robust to missing data and violation of distributional assumptions [40]. We, therefore, chose to treat the CBCL scores as continuous variables. As we examined three different (albeit correlated) outcomes, we applied a Bonferroni-correction of 3 for our alpha, meaning statistical significance was set at \(\frac{0.05}{3}=0.01667\).
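The link between the corrected alpha and the 98.3% confidence intervals used throughout can be made explicit with a short calculation. This is a sketch using only the Python standard library; the function name is hypothetical, and the critical z value is shown for orientation since the primary intervals in this paper come from a bootstrap and not from a normal approximation.

```python
from statistics import NormalDist


def bonferroni(alpha, n_tests):
    """Return the per-test alpha, the matching CI level and the two-sided critical z."""
    per_test_alpha = alpha / n_tests
    ci_level = 1 - per_test_alpha
    z_crit = NormalDist().inv_cdf(1 - per_test_alpha / 2)
    return per_test_alpha, ci_level, z_crit


per_test_alpha, ci_level, z_crit = bonferroni(0.05, 3)
print(f"alpha = {per_test_alpha:.5f}, CI level = {ci_level:.1%}")
```

A confidence interval at level 1 − 0.05/3 excludes zero exactly when the corresponding two-sided test is significant at the corrected threshold, which is why the intervals are reported at 98.3%.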

Model diagnostics were evaluated by examining histograms and Q-Q plots of residuals and random effects. CBCL subscale scores are highly right-skewed, and our residuals and random effects displayed significant violations of distributional assumptions. We therefore performed a non-parametric bootstrap at the participant-ID level with 5000 simulations, as suggested by Thai et al. [41], to derive estimates and 98.3% confidence intervals, which were then used for primary inference. Approximate p-values were calculated from the bootstrapped estimate z-statistic. For the sensitivity analysis using the TRF (described below), we performed a simple linear regression of TRF scores and subsequently did a non-parametric bootstrap with 5000 simulations because of non-normality in our residuals.
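The idea of resampling at the participant level can be illustrated with a small, self-contained sketch. It is not the analysis code: a pooled least-squares slope stands in for the mixed model, the percentile interval and the z-based p-value (estimate divided by the bootstrap standard deviation) are assumptions about how such quantities could be computed, and the toy data and all function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def ols_slope(rows):
    """Pooled least-squares slope of score on BW z-score (stand-in for the mixed model)."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    return sxy / sxx


def cluster_bootstrap(data, statistic, n_boot=5000, level=1 - 0.05 / 3, seed=1):
    """Resample whole participants with replacement, keeping all their assessments."""
    rng = random.Random(seed)
    ids = list(data)
    estimates = []
    for _ in range(n_boot):
        sampled = rng.choices(ids, k=len(ids))
        rows = [row for pid in sampled for row in data[pid]]
        estimates.append(statistic(rows))
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int((1 - tail) * (n_boot - 1))]
    point = statistic([row for rows in data.values() for row in rows])
    z = point / stdev(estimates)  # assumed form of the z-statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return point, (lower, upper), p_value


# Toy data: 60 hypothetical participants with 4 assessments each
toy_rng = random.Random(0)
toy = {}
for pid in range(60):
    bw_z = toy_rng.gauss(0, 1)
    intercept = toy_rng.gauss(0, 1)  # participant-level random intercept
    toy[pid] = [(bw_z, 3 - 1.0 * bw_z + intercept + toy_rng.gauss(0, 1))
                for _ in range(4)]

point, ci, p_value = cluster_bootstrap(toy, ols_slope, n_boot=500)
```

Resampling participants instead of single assessments keeps each child's repeated measures together, so the within-participant correlation is preserved in every bootstrap sample.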

We had 4 model levels, with the sex interaction (\(\boldsymbol{B}_{3}\)) being the variable of primary interest. If \({y}_{IA}\) is the CBCL score for a given individual at a given assessment age, \({\epsilon }_{IA}\) is the error term, \({B}_{0}\) is the intercept and \({u}_{I0}\) is the random effect of the participant on the intercept, our models were as follows:

1) An unadjusted model including only a fixed effect of BW (\({B}_{1}\)).

2) An unadjusted model as in 1) but with a fixed effect of sex (\({B}_{2}\)) and a sex interaction term (\(\boldsymbol{B}_{3}*{sex}_{IA}\)) where females were the reference category.

3) A model as in 2), but with confounders as detailed in the directed acyclic graph (DAG) (\({B}_{cov}\)) and the age at assessment (\({B}_{4}\)) added as fixed effects.

4) A parsimonious model with main effects and interaction. The parsimonious model was derived using backward variable removal from the full model 3) by examining p-values from the linear mixed effects model. Only variables with a p-value < 0.2 were included in model 4).
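The backward variable removal used for model 4 can be sketched generically as follows. This is an illustration under stated assumptions: the helper `p_values_for` is hypothetical and stands for refitting the mixed model and returning one p-value per term, the main effects and the interaction are assumed to be retained regardless of their p-values, and the p-values in the stub are made-up placeholders, not study results.

```python
def backward_eliminate(candidates, protected, p_values_for, threshold=0.2):
    """Drop the least significant candidate until all remaining have p < threshold."""
    kept = list(candidates)
    while True:
        p_values = p_values_for(protected + kept)  # refit with the current terms
        removable = [(p_values[t], t) for t in kept if p_values[t] >= threshold]
        if not removable:
            return protected + kept
        _, worst = max(removable)  # term with the largest p-value
        kept.remove(worst)


def stub_p_values(terms):
    """Hypothetical stand-in for a model refit; values are placeholders only."""
    made_up = {"bw": 0.30, "sex": 0.001, "bw:sex": 0.01, "age": 0.001,
               "smoking": 0.001, "gestational_age": 0.60, "income": 0.15}
    return {t: made_up[t] for t in terms}


selected = backward_eliminate(
    candidates=["age", "smoking", "gestational_age", "income"],
    protected=["bw", "sex", "bw:sex"],
    p_values_for=stub_p_values,
)
```

In a real analysis the p-values change after each refit, which is why the loop removes one term at a time instead of filtering all terms in a single pass.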

The full model, including interactions and confounders (models 3 and 4), was specified as follows:

$$\begin{array}{l}{y}_{IA}={B}_{0}+{u}_{I0}+\left({B}_{1}+{B}_{3}*{sex}_{IA}\right)*{BW}_{IA}\\ +{B}_{2}*{sex}_{IA}+{B}_{4}*{age}_{IA}+{B}_{cov}*{cov}_{IA}+{\epsilon }_{IA}\end{array}$$
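To make the role of the interaction term concrete, the specification above can be written as a small function. This is a sketch, not the fitted model: covariates and the error term are omitted, sex is coded 0 for females (the reference category) and 1 for males, and all coefficient values are hypothetical placeholders except b3, which is set to the model 4 attention problems estimate reported in the Results purely for illustration.

```python
def predicted_score(bw_z, male, age, b0, b1, b2, b3, b4, u_i0=0.0):
    """Fixed part of the model plus a participant intercept (covariates omitted)."""
    return b0 + u_i0 + (b1 + b3 * male) * bw_z + b2 * male + b4 * age


def bw_slope(male, b1, b3):
    """Change in score per 1 SD of BW: B1 for females, B1 + B3 for males."""
    return b1 + b3 * male


# Placeholder coefficients; only b3 mirrors a reported estimate
coef = dict(b0=3.0, b1=-0.02, b2=0.8, b3=-0.274, b4=-0.05)

female_slope = bw_slope(0, coef["b1"], coef["b3"])
male_slope = bw_slope(1, coef["b1"], coef["b3"])
male_difference = (predicted_score(1.0, 1, 10, **coef)
                   - predicted_score(0.0, 1, 10, **coef))
```

Under this coding, B3 is the difference between the male and female BW slopes, so a negative B3 means that scores fall more steeply with increasing BW in males than in females.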

For prespecified sensitivity analyses, we excluded preterm births (less than 37 weeks gestational age at birth) to ensure robustness of results in the term cohort; additionally, aggressive behaviour was reassessed with the inclusion of an age 2 behavioural assessment using the preschool form of the CBCL. We added a fixed term to the model to account for the increased number of items on the preschool aggression problems. Finally, as parent characteristics have an association with both BW and CBCL scores, it is possible that parents of children with lower BW could rate behaviour differently and bias the estimate of childhood phenotype; therefore, we applied our parsimonious model covariables to the teacher ratings at age 10 in a linear regression with bootstrapping of standard errors. Post-hoc we examined how dichotomisation of variables at extremes of BW and CBCL-scores affected parsimonious model output. To model the dichotomised outcomes we used a generalised mixed effects model with a logit link. Graphical illustration of effects was performed with the plot_model function from the sjPlot package in R. We used the STROBE cohort checklist when writing our paper [42].

Results

In the Raine Study, 2868 (100%) live births were assessed for eligibility (Supplementary Fig. 2). BW and sex were recorded for 2269 (79.1%) participants with a CBCL assessment from ages 5–17 (used in the model 1 + model 2 regression). Information on DAG-determined confounders was available for 1994 (69.5%) participants (used in the model 3 regression). Gestational age was dropped from the aggressive behaviour parsimonious model 4 for a total of 1996 (69.6%) participants. Maternal baseline variables were evenly distributed between pregnancies with male and female offspring (Table 1). The mean BW in the Raine Study cohort was 3318.5 g (1-SD = 594.5 g) and males had higher BW than females. Males had higher CBCL scores across the examined subscales including teacher assessments and preschool assessments (Table 2 and supplementary Table 3). CBCL follow-up decreased as the cohort age increased with 1076 participants completing all 5 assessments, a mean of 3.95 assessments and substantial attrition at age 17. Compared to dropouts, the Raine Study participants with at least 1 CBCL assessment had more favourable socio-economic status, less pregnancy-related risk behaviour and a higher BW (supplementary Table 1).

Table 1 Baseline demographics for the analytic cohort (n = 2269)
Table 2 Child behaviour checklist (CBCL) scores for the analytic cohort (n = 2269)

For aggressive behaviour, there was no significant main effect of BW in the unadjusted model 1 (B: -0.0872, 98.3%CI: [-0.294, 0.130]) (Table 3). We found a significant sex x BW interaction in the unadjusted model 2 (B: -0.436, 98.3%CI: [-0.844, -0.0253]), with a stronger association for aggressive behaviour observed for males. The sex x BW interaction diminished and was no longer significant after confounder adjustment in our complete model 3 (B: -0.315, 98.3%CI: [-0.744, 0.127]) and parsimonious model 4 (B: -0.310, 98.3%CI: [-0.742, 0.140]) (Table 3; Fig. 1).

Table 3 The association between BW and aggression problems in the Raine Study ages 5–17

For attention problems, there was a significant inverse linear effect of BW in the unadjusted model 1 (B: -0.131, 98.3%CI: [-0.227, -0.0252]), with a 1 SD increase in BW associated with a 0.131-point reduction in attention problems (Table 4). We found a significant sex x BW interaction in the unadjusted model 2 (B: -0.334, 98.3%CI: [-0.530, -0.137]), with male sex driving the association between BW and attention problems in the overall sample. The sex interaction was robust to multivariable adjustment in the complete model 3 (B: -0.276, 98.3%CI: [-0.503, -0.0388]) and in the parsimonious model 4 (B: -0.274, 98.3%CI: [-0.507, -0.0432]) (Table 4; Fig. 1).

Fig. 1 Graphical illustration of model output

Estimated overall linear associations between birth weight and scores of aggression, attention and social problems measured by the Child behaviour checklist from ages 5–17 in the parsimonious confounder-adjusted model. Note that the y-axis changes for each scale due to the different number of items and that the sex difference of interest (association of birth weight and behaviour) is captured in the slope of the lines, not the mean differences between the lines. The shaded area represents the 95% CI.

Table 4 The association between BW and attention problems in the Raine Study ages 5–17

For social problems, there was a significant inverse linear effect of BW in the unadjusted model 1 (B: -0.0646, 98.3%CI: [-0.123, -0.0043]), with a 1 SD increase in BW associated with a 0.0646-point reduction in social problems (Table 5). We found a significant sex x BW interaction in the unadjusted model 2 (B: -0.164, 98.3%CI: [-0.283, -0.0441]), with male sex driving the association between BW and social problems in the overall sample. The sex interaction was robust to multivariable adjustment in the complete model 3 (B: -0.149, 98.3%CI: [-0.286, -0.013]) and the parsimonious model 4 (B: -0.148, 98.3%CI: [-0.285, -0.00734]) (Table 5; Fig. 1).

Table 5 The association between BW and social problems in the Raine Study ages 5–17

For the sensitivity analysis of aggressive behaviour, we had CBCL data from the age 2 round of assessments. Inclusion yielded an additional 115 participants for regression (model 1 n = 2384). A full reanalysis with inclusion of age 2 assessments did not change the estimate size or significance (supplementary Table 3).

Exclusion of preterm births (n = 147) had little effect on the estimate parameters for social- and attention problems (supplementary Table 4). We saw a 30% reduction in the beta coefficient of the BW x sex interaction for aggressive behaviour.

Age 10 teacher assessment of child behaviour with the TRF was available for 1585 individuals with relevant covariables for the model 4 regression. No sex x BW interaction reached significance at the 98.3% level, but we saw associations between BW and childhood assessments that were directionally similar to those seen with the parent assessments (supplementary Table 5). Post hoc, we used a BW category of < 2500 g as the exposure. This approach yielded consistent results for attention problems, but diminished results for social problems (supplementary Table 6). When using a “borderline” cut-off for CBCL scores (T-score ≥ 67), estimates were drastically reduced (supplementary Table 7).

Discussion

Using repeated parent ratings across childhood and adolescence, we examined crude and confounder-adjusted sex differences in the association between BW and aggression, attention, and social problems from ages 5–17 years. We found a longitudinal sex difference in the relationship between birth weight and attention problems and social problems, but not aggression problems.

We found no BW x sex interaction in aggressive behaviour and so could not reproduce the findings of Dooley et al. showing a male vulnerability. Although there was a crude association in our model 2, the signal weakened with multiple regression in models 3 and 4. The inclusion of age 2 assessments, exclusion of preterm births or assessment by teachers did not change inference. Our findings also contrast with results from Momany et al. [22], who, similar to Dooley et al., found a BW x sex interaction. Both corrected for variables collected at the time of assessment. One explanation for this discrepancy could be that we did not have the power to detect a difference; however, our study sample was more than double that of Momany et al. Alternatively, it is possible that the earlier findings were subject to residual confounding. Although confounders of the sex interaction are randomly allocated (through fetal sex), perinatal risks, specifically smoking, may have sex-specific effects on BW [43]. Perinatal risks can be markers of genetic liability to poor mental health [37] and inadequate confounder adjustment could bias previous estimates. Including perinatal risks collected during pregnancy should reduce such confounding, and maternal smoking survived backward variable removal in all parsimonious models, underscoring the association with child behaviour.

We found a BW x sex interaction in attention problems, confirming a male vulnerability. This finding was robust across all models. When converting the parsimonious effect-estimates to kg, they were larger but comparable to the Dooley paper (0.46 (95% CI 0.07–0.85) vs. 0.35 (95% CI 0.19–0.51) points per kg BW) [20]. The exclusion of preterm births did not dramatically change the effect size. Sensitivity analysis using raw scores from the TRF was in directional agreement with CBCL scores. The TRF models were not significant at our Bonferroni corrected threshold; however, there was a signal when using the 95% CI (data not shown). Although not conclusive, these data support the results from the primary analysis. Our finding is also in line with the finding from Momany et al. [22]. Our finding is inconsistent with results from Murray et al., who found a female vulnerability in the Brazilian birth cohort PELOTAS [21]. It is not clear why our results show opposite findings, but there are differences between our designs that might contribute. First, Murray et al. used a very early behaviour assessment (age 4), and males and females have been known to manifest attention problems differently across childhood and adolescence [44]. Second, they used categorical exposures (low vs. appropriate birth weight) with a 2500 g threshold for both males and females, although BW spectra differ between males and females. Third, they approached the CBCL as an ordinal outcome variable, whereas we treated it as a continuous outcome. Fourth, they had a markedly different study sample compared to us, characterised by less affluent mothers with different pregnancy-related risk behaviours, which might change the relationship between BW and parent assessment. Our results also contrast with the apparent null finding from the sibling control study by Pettersson et al. [6].
Although not compared directly, they found a similar relationship between BW and rates of neurodevelopmental disorders diagnosed across life in males and females (supplementary Table 9 in [6]). It is important to note that “neurodevelopmental disorders” also encompass autism; furthermore, a diagnostic category is different from the continuous spectrum in the CBCL attention problem scale. Recent developments have suggested that psychopathology may be better viewed as dimensional traits [45]. Our supplementary analysis using categories of CBCL suggests that this could have important implications for inference regarding sex differences. In addition, incidence of ADHD diagnosis peaks later in girls than in boys [46], which might suggest that the age 17 cut-off in our data collection represents a period of symptomatic latency for females [26].
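The per-kg figures quoted in the comparison with Dooley et al. can be reproduced from the per-SD estimates, assuming the conversion is a simple division by the sample SD of birth weight (594.5 g, as reported in the Results). The sketch below applies this to the model 4 attention problems interaction and to the bounds of its reported interval; the function name is hypothetical.

```python
SD_BW_G = 594.5  # sample SD of birth weight in grams, from the Results


def per_sd_to_per_kg(estimate_per_sd, sd_grams=SD_BW_G):
    """Rescale a coefficient from points per 1 SD of BW to points per kg of BW."""
    return estimate_per_sd / (sd_grams / 1000)


interaction = per_sd_to_per_kg(-0.274)   # model 4 sex x BW interaction
bound_a = per_sd_to_per_kg(-0.0432)      # interval bound closest to zero
bound_b = per_sd_to_per_kg(-0.507)       # interval bound furthest from zero

print(round(abs(interaction), 2), round(abs(bound_a), 2), round(abs(bound_b), 2))
# prints: 0.46 0.07 0.85
```

The magnitudes match the 0.46 (0.07–0.85) points per kg quoted above; the sign is dropped in the text because the comparison is expressed as the size of the male excess.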

We found a BW x sex interaction in social problems, suggesting a male vulnerability. This finding was robust across all models. The exclusion of preterm births did not change the inference, and teacher assessment agreed with the results from the primary analysis. There have been previous reports of an overall association between lower birth weight and social problems in childhood and adulthood [47, 48]; however, Dooley et al. were, to our knowledge, the first authors to examine a sex difference directly and did not find a significant sex difference. Our results suggest that lower birth weight increases social problems in males but, given the novelty of this finding, replication is needed. The social problems scale measures problems with peers (e.g. “Gets teased a lot”) and immaturity (e.g. “Prefers being with younger kids”) and correlates with sustained attention [49], meaning that social problems could lie downstream of a primary effect on attention problems. Social problems have also been associated with autism [50] and monozygotic twin studies suggest an increase in autism with lower BW [51].

The biology explaining increased male social- and attention problems at lower BW is not elucidated by the current study, but white matter vulnerability during fetal development is well-recognised [52]. Male white matter growth in temporal and frontal regions is increased during periods of rapid growth [19]. Although not stratified by sex, MRI follow-up of LBW in adulthood demonstrates a persistent loss of white matter integrity in these areas [53]. In turn, frontal and temporal structures are important determinants of social cognition [54] and attention problems [55]. Studies in rats have suggested a male cellular vulnerability of myelin-forming cells to stressors [56] and others have reported reduced placental gene expression of enzymes forming the hormonal feto-placental barrier in LBW [57]. Whether direct cellular damage or neuromodulation from maternal hormones could underlie such white matter changes in lower BW is unclear.

This study’s strengths are the prospective data collection (avoiding recall bias), limited attrition, and use of multiple well-validated psychometric instruments. The models are also robust in using repeated measures on individuals and a careful approach to model construction using DAGs to make explicit the relationship between our variables. This study’s limitations are the primarily Caucasian sample in the cohort and the age 17 cut-off for psychometric evaluation; furthermore, reports of a secular trend of increasing BW (around 24 g from 1995 to 2005) could limit the generalisability of reported effect sizes to the present day. Changes in parental and medical practice from the 1990s to the present day aimed at ameliorating adverse behaviour postnatally could also result in an altered association between lower BW and behaviour throughout childhood and adolescence, including the sex differences in this association.

In conclusion, using repeated measures from ages 5–17 with adjustment for maternal baseline variables collected during pregnancy, we were able to show a male vulnerability to lower birth weight in the development of attention problems and social problems; however, we failed to find a similar sex interaction for the development of aggressive behaviour. Future studies should be careful in selecting continuous or categorical outcomes, as potential sex differences at the extremes may differ from those found when considering total sample behaviour; furthermore, sex differences in brain morphology associated with variation in BW and potential postnatal mediators such as parenting strategies are potential areas of future research.