Background

Adolescence is a time of profound changes to the body and the brain, with substantial impact on an individual’s behaviour, emotions, and self-perception, among other things [1]. This transition includes puberty, the time period during which an individual acquires the capability for sexual reproduction [2]. Puberty is characterised by an interplay of gonadotropin-releasing hormone, gonadotropins such as follicle-stimulating hormone and luteinizing hormone, and sex hormones such as androgens, estrogens and progesterones. Together, they not only drive changes in the body, but also act directly on the brain [3]. Studies using magnetic resonance imaging (MRI) to investigate human brain anatomy have illustrated that the brain undergoes significant changes during adolescence, described by a complex yet orchestrated interplay of progressive (e.g., myelination) and regressive (e.g., pruning) neuronal processes [4]. While brain development in youths has commonly been investigated through the lens of age-related brain maturation, an increasing number of studies focus on neurodevelopment mediated by pubertal processes [5,6,7]. These studies suggest that puberty-related brain development cannot simply be explained by age trajectories but goes beyond the effects of growing older [7,8,9], and consequently that pubertal development may be a more sensitive measure than age for studying neurodevelopment in youths [6].

A recent systematic review on the relationship between pubertal and structural brain development in human adolescents describes brain-wide reductions in cortical grey matter thickness and volume associated with more advanced pubertal maturation in both cross-sectional and longitudinal studies [7]. Findings suggest that these effects are global across the brain, with frontal regions showing the most pronounced effects [7, 10]. Alongside cortical changes, advanced pubertal maturation is also associated with subcortical brain development, in particular of the amygdala and hippocampus [11]. Across studies, these effects are subject to sex differences, which manifest not only in varying effect sizes but sometimes also in opposing effect directions in males and females [5, 12].

While methodological choices, such as accounting for age in statistical models, may factor into the diverging observations, these differences may also stem from variability that is inherent to pubertal maturation [13]. Although all individuals pass through the same pubertal stages, there is large variability in pubertal timing and the speed of progression [14, 15]. Pubertal timing describes the time point at which an individual reaches certain pubertal milestones in comparison to their peers of the same age [16]. While pubertal timing appears to be highly heritable [17, 18], recent evidence also links variation in pubertal timing to environmental factors, such as nutritional intake, socioeconomic status or obesity [14]. This malleability may consequently lead to pubertal onsets that deviate in their timing, so that individuals experience early or late pubertal onsets [5]. Interestingly, these deviations in pubertal timing appear to be associated with physical and psychiatric health issues [5, 19].

Many studies over the years have shown an association between pubertal timing and psychopathology [20,21,22]. In boys, evidence concerning the effect of pubertal timing on health risks is inconsistent and may be best described by the ‘off-time hypothesis’, under which either very early or very late onset confers risk [23]. In contrast, evidence for the association between health risk and pubertal timing in girls has been well replicated, converging on the so-called ‘early timing hypothesis’, which posits that early maturing girls (most often assessed using age at menarche as a proxy measure, i.e. the age at which individuals experience their first menstruation) are more likely to experience adverse mental health outcomes than their on-time and late maturing peers [24, 25]. Therefore, pubertal timing and its malleability represent a critical tipping point which may set the course for later vulnerability and worse (mental) health outcomes.

While most puberty-related imaging studies to date have focused on investigating the association between the brain and puberty status (i.e. the quantification of pubertal characteristics indicating a more or less advanced maturation akin to the transition through pubertal stages), imaging studies on pubertal timing are, to the best of our knowledge, scarce despite its importance for emerging (mental) health risks. The current study investigated the impact of pubertal timing on brain maturation, deploying age-matching to control for age-related neurodevelopment. Using structural imaging data from the Adolescent Brain Cognitive Development cohort (ABCD; [26]), we aimed at classifying pre- and post-menarcheal females using a machine learning model. To validate the sensitivity of our approach and to test the biological validity of the obtained class probabilities, we drew a comparison with a brain age prediction framework, investigating to what extent both approaches capture the same or distinct neurodevelopmental variance in the female adolescent brain.

Methods

Sample descriptions

ABCD: For the menarche classification and as the test sample for the age prediction model, we included data of N = 3248 female (henceforth referring to individuals assigned female at birth; mean age = 11.91 years, SD = 0.65) participants of the Adolescent Brain Cognitive Development study 2-year follow up data [26]. Study protocols have been approved by either local institutional review boards (IRB) or by reliance agreements with the central IRB at University of California. For each study participant, structural brain imaging features were obtained from the tabulated imaging data provided by the ABCD release 4.0 [27]. The 2-year follow up data was chosen because it offers the most balanced distribution of pre- and post-menarcheal individuals. Subjects with missing MRI or missing relevant demographic data were excluded. Furthermore, those who did not answer either ‘yes’ or ‘no’ to the question ‘Have you begun to menstruate (started to have your period)?’ from the ABCD Youth Pubertal Development Scale and Menstrual Cycle Survey History (PDMS) [28], or whose imaging data quality was deemed too low for inclusion by two ABCD researchers, were excluded (see Additional file 1: Methods for a detailed description of the in- and exclusion procedure). From the PDMS data we determined pubertal status ranging from prepubertal to postpubertal. In brief, we summed pubic hair growth and breast development scores and incorporated information about menarche, and converted the resulting score to a pubertal status category according to a scheme provided by the ABCD study (variable: pds_p_ss_female_category). Pubertal status was calculated from youth-reported as well as caregiver-reported data to account for differences in the perception of pubertal maturation.

PNC: We used data from N = 786 female participants (mean age = 15.25 years, SD = 3.65) of the Philadelphia Neurodevelopmental Cohort (PNC; [29]) as an independent training sample to derive an age prediction model. In PNC, all study procedures were approved by the respective institutional review boards. We processed the T1 MRI images using FreeSurfer (version 7.1.1) [30] and derived the same cortical and subcortical features as used for the ABCD cohort. Euler numbers were used as a proxy of image quality for quality control [31]. Subjects with missing MRI, missing demographic data, a Euler number more than three standard deviations below the mean, or those with a medical rating of 3 or higher (severe medical condition) were excluded.

MRI data description

For each subject in both data sets we included a total of 234 anatomical MRI features. Specifically, we used 30 subcortical features as well as, for each hemisphere, 34 volume, 34 thickness, and 34 area cortical features matching the Desikan–Killiany atlas [32] (see Additional file 1: Table S1). Of note, since ABCD data was acquired across 21 study sites, we performed batch harmonization with neuroCombat (v.0.2.12) [33], separately for each modality and independently for the training and test sets.

Statistical analyses

All statistical analyses were performed in Python (v.3.11.5) [34]. Basic data handling was performed with numpy (v.1.24.3) [35] and pandas (v.2.0.3) [36, 37].

Menarche Classification: For the classification of pre- and post-menarche individuals in the ABCD sample, a linear discriminant analysis (LDA) classification model was trained using scikit-learn (version 1.3.0; [38]). For classification we split the full ABCD sample (N = 3248 females) into a training and an independent test set by randomly sampling 20% of the data into the test set (visual inspection indicated no difference in menarche status distributions across the whole, training, and test data set; see Additional file 1: Figure S1). The test sample consisted of N = 650 participants (pre-menarche: N = 419, mean age = 11.68 years, SD = 0.56; post-menarche: N = 231, mean age = 12.32 years, SD = 0.59; Table 1). Furthermore, to avoid bias in the training process, propensity score matching [39] was performed in the training data to achieve equal distributions of age and MRI scanner in the pre- and post-menarche groups, as well as equal group sizes. After age-matching, there was no statistically significant age difference (two-sided independent samples t-test, p = 0.968) between the pre- and post-menarche groups in the training dataset (pre-menarche: N = 775, mean age = 12.09 years, SD = 0.58; post-menarche: N = 775, mean age = 12.09 years, SD = 0.58; Table 1).
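The matching step described above can be illustrated with a minimal sketch. The text does not specify the matching algorithm, so everything below is an assumption for illustration: the data are synthetic, the propensity score comes from a plain logistic regression on age and scanner, and `greedy_match` is a hypothetical helper performing 1:1 nearest-neighbour matching without replacement within a caliper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not ABCD): age in years, 3 scanners, 0/1 menarche status.
rng = np.random.default_rng(0)
n = 400
age = rng.normal(11.9, 0.65, n)
scanner = rng.integers(0, 3, n)
post = (rng.random(n) < 1 / (1 + np.exp(-2.0 * (age - 12.0)))).astype(int)

# Propensity score: probability of being post-menarche given age and scanner.
covars = np.column_stack([age - age.mean(), np.eye(3)[scanner][:, 1:]])
ps = LogisticRegression(max_iter=1000).fit(covars, post).predict_proba(covars)[:, 1]


def greedy_match(ps, group, caliper=0.05):
    """Hypothetical helper: pair each post-menarche case with the nearest unused
    pre-menarche case on the propensity score, if within the caliper."""
    post_idx = np.flatnonzero(group == 1)
    pre_pool = list(np.flatnonzero(group == 0))
    pairs = []
    for i in post_idx:
        if not pre_pool:
            break
        dist = np.abs(ps[pre_pool] - ps[i])
        j = int(np.argmin(dist))
        if dist[j] <= caliper:
            pairs.append((int(i), int(pre_pool.pop(j))))
    return pairs


pairs = greedy_match(ps, post)
post_m = np.array([p for p, _ in pairs])
pre_m = np.array([c for _, c in pairs])

# Age gap between groups before and after matching.
raw_gap = age[post == 1].mean() - age[post == 0].mean()
matched_gap = age[post_m].mean() - age[pre_m].mean()
```

Because each pair contributes one participant per group, the matched groups are equal in size by construction, and the age gap between them should shrink relative to the unmatched sample.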

Table 1 Demographic information about training and test dataset

Participants’ responses to the question ‘Have you begun to menstruate (started to have your period)?’ from the ABCD Youth Pubertal Development Scale and Menstrual Cycle Survey History (PDMS) were used as target labels for the classification algorithm. Responses were encoded numerically in the original survey as (4: Yes; 1: No) and relabelled to 1 and 0.

The estimated class probabilities of the withheld test sample were extracted from the LDA model to further assess the biological validity of the classification. Training and test set features were independently transformed into z-scores. We implemented a nested cross-validation procedure with stratified 10-fold outer and inner loops. The inner loop was deployed for hyperparameter tuning via scikit-learn’s GridSearchCV. The hyperparameters explored in the grid search included the ‘solver’ parameter with options [‘svd’, ‘lsqr’, ‘eigen’] and the ‘shrinkage’ parameter with values [None, ‘auto’, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]. The performance metric used for evaluation was balanced accuracy and the outer loop was employed to assess the model’s performance in the training set. For the final model, an LDA model was re-fitted to the entire training dataset using the selected hyperparameters (solver: least squares solution, shrinkage: 0.7). To assess the model’s performance on unseen data, the menarche status of participants from a held-out test sample of ABCD subjects was classified using the final model. A balanced accuracy score and a confusion matrix were calculated. To further confirm that the classifier’s performance was above chance in unseen data, we performed permutation testing with 1000 iterations. In each iteration, we randomly shuffled the target label (i.e. menarche status) in the training sample and fitted the final model to the training features. Those 1000 random models were then used to predict unseen data in the test sample, resulting in a null distribution of balanced accuracies against which we tested the empirical balanced accuracy score. In detail, we counted how often the random models resulted in a performance metric at least as extreme as the observed empirical value and divided this count by the total number of iterations to obtain a p-value. To corroborate our results and to control for the influence of the unbalanced test set, we performed an additional analysis in which we iteratively subsampled n = 231 females 1000 times from the pre-menarche group to match the size of the post-menarche group, which resulted in an evenly balanced test set.
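The nested cross-validation procedure can be sketched as follows. This is an illustration under stated assumptions, not the study code: the data are synthetic, the shrinkage grid is shortened to keep the example fast, and the grid is split into two parts because scikit-learn's ‘svd’ solver does not support shrinkage.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the training features and 0/1 menarche labels.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

# Shortened grid (assumption); shrinkage is only combined with 'lsqr' and 'eigen'.
param_grid = [
    {"solver": ["svd"]},
    {"solver": ["lsqr", "eigen"], "shrinkage": [None, "auto", 0.1, 0.5, 0.7, 0.9]},
]

inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Inner loop: hyperparameter tuning scored by balanced accuracy.
search = GridSearchCV(
    LinearDiscriminantAnalysis(), param_grid,
    scoring="balanced_accuracy", cv=inner_cv,
)

# Outer loop: performance estimate of the whole tuning procedure.
outer_scores = cross_val_score(search, X, y, scoring="balanced_accuracy", cv=outer_cv)

# Final model: re-fit on all training data with the selected hyperparameters.
final_model = search.fit(X, y).best_estimator_

# Continuous class probabilities for the positive (here: "post") class.
post_prob = final_model.predict_proba(X)[:, 1]
```

In practice `predict_proba` would be called on the held-out test features rather than on the training data; the training data are reused here only to keep the sketch short.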

Brain age prediction: The Python package of the XGBoost (v2.0.3) library was used [40] to predict chronological age in months from the same 234 sMRI features as those used in the menarche classification. Model tuning was again performed via scikit-learn’s GridSearchCV. The hyperparameters explored in the grid search were ‘max_depth’: [3,6,9], ‘max_leaves’: [0,2,5,10], ‘learning_rate’: [0.001,0.01,0.1,0.5,1,3], ‘min_child_weight’: [1,10,100] and ‘n_estimators’: [100, 500, 1000]. The performance metric used for evaluation was r2_score and a 5-fold cross-validation approach was employed to assess the model’s performance during hyperparameter tuning. The final model was fitted to the entire training dataset using the determined hyperparameters (‘learning_rate’: 0.01, ‘max_depth’: 6, ‘max_leaves’: 0, ‘min_child_weight’: 10, ‘subsample’: 0.5, ‘num_rounds’: 1000). The model’s performance on unseen data was tested by applying it to the ABCD withheld test sample described above. The root mean squared error (RMSE) and mean absolute error (MAE) were calculated, and the brain age gap (BAG; the difference between predicted brain age and chronological age) was computed for further analysis. To confirm that the age prediction model performed above chance, a permutation test with 1000 permutations was performed, equivalent to the one described above in the menarche classification section.
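The error metrics and the brain age gap follow directly from the predicted and chronological ages. A minimal sketch, using made-up ages in months and a hypothetical helper name `brain_age_metrics`:

```python
import math


def brain_age_metrics(predicted_months, chronological_months):
    """Return per-subject brain age gaps (predicted minus chronological age)
    together with RMSE and MAE of the predictions."""
    gaps = [p - c for p, c in zip(predicted_months, chronological_months)]
    rmse = math.sqrt(sum(g * g for g in gaps) / len(gaps))
    mae = sum(abs(g) for g in gaps) / len(gaps)
    return gaps, rmse, mae


# Made-up example values, not study data.
gaps, rmse, mae = brain_age_metrics([150.0, 138.0, 147.0], [144.0, 141.0, 147.0])
```

A positive gap indicates that the brain is predicted to be older than the participant's chronological age, a negative gap that it is predicted to be younger.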

Association analyses

Association analyses were performed with the ordinary least squares (OLS) regression function of the Python module statsmodels.formula.api (v0.14.0) [41]. In line with previous studies [42, 43], a residualised BAG was produced by regressing BAG on age and scanning site. Menarche class probabilities were residualised in the same way to account for age and scanning site. OLS regression was performed to test the association of residualised BAG and residualised menarche class probabilities. Finally, we tested for associations between age at menarche and menarche class probabilities, as well as age at menarche and BAG, controlling for age and scanning site in both instances, using OLS. Likewise, we tested for associations between pubertal status, both caregiver- and youth-reported, and menarche class probabilities, as well as BAG, respectively. All association analyses were repeated accounting for potential effects of sociodemographic status (SES), body mass index (BMI) and race / ethnicity. In brief, BMI was calculated by averaging two height and two weight measurements, respectively, and using the formula ‘weight (lb) / [height (in)]² × 703’. Ethnicity was encoded in 5 levels: White, Black, Hispanic, Asian, other (multiracial or ethnicity with too few members in the sample). Ordinal SES variables were transformed through rank-based inverse normal transformation and averaged, producing a single SES variable. Further details on the covariates and their calculation can be found in Kraft et al. [13].
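The residualisation step can be sketched with plain least squares. The study used statsmodels; the version below swaps in `numpy.linalg.lstsq` to stay dependency-light, runs on synthetic data, and uses a hypothetical helper `residualise` that removes the linear effect of age and dummy-coded scanning site.

```python
import numpy as np


def residualise(values, age, site):
    """Return residuals of `values` after OLS on intercept, age and dummy-coded site."""
    values = np.asarray(values, dtype=float)
    age = np.asarray(age, dtype=float)
    site = np.asarray(site)
    levels = np.unique(site)
    # One dummy column per site, dropping the first level as reference.
    dummies = (site[:, None] == levels[None, 1:]).astype(float)
    design = np.column_stack([np.ones(len(age)), age, dummies])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta


# Synthetic stand-in data: age in months, 4 sites, BAG in months, class probability.
rng = np.random.default_rng(0)
n = 200
age = rng.normal(143.0, 8.0, n)
site = rng.integers(0, 4, n)
bag = -0.3 * (age - 143.0) + 2.0 * site + rng.normal(0.0, 5.0, n)
prob = np.clip(0.5 + 0.02 * bag + rng.normal(0.0, 0.1, n), 0.0, 1.0)

bag_res = residualise(bag, age, site)
prob_res = residualise(prob, age, site)

# Both residual vectors have mean zero, so the OLS slope of residualised BAG
# on residualised class probability reduces to this ratio.
slope = float(bag_res @ prob_res / (prob_res @ prob_res))
```

After residualisation, neither variable is linearly related to age or site, so the slope reflects only variance that the two measures share beyond those covariates.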

Results

We first tested whether same-aged pre- and post-menarcheal girls could be distinguished from anatomical MRI. Our classifier, trained in a sample of age-matched groups of females pre- and post-menarche, performed with a balanced accuracy of 59.24% in the nested cross-validation procedure. Applied to a held-out test set of 419 pre- and 231 post-menarcheal girls, the classifier performed comparably (61.05% balanced accuracy, Fig. 1a). Permutation testing indicated performance significantly above chance in the test set (p = 0.001, Fig. 1b). A validation analysis with evenly matched subsets (50:50 balanced test set) yielded a mean balanced accuracy of 61.24% (range 57.14–64.94% across 1000 iterations), lending credibility to our initial results (see Additional file 1: Figure S2).
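The two quantities reported above can be made concrete with a short sketch. Both helper names are hypothetical, and the null scores are made up. The `add_one` option is an assumption: the Methods describe dividing the count by the number of iterations, and do not state how a count of zero is handled, so a common correction is shown as an alternative.

```python
def balanced_accuracy(tn, fp, fn, tp):
    """Mean of sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def permutation_p_value(observed, null_scores, add_one=False):
    """Share of null scores at least as large as the observed score.
    With add_one=True, one is added to numerator and denominator (assumption)."""
    n_extreme = sum(score >= observed for score in null_scores)
    if add_one:
        return (n_extreme + 1) / (len(null_scores) + 1)
    return n_extreme / len(null_scores)


# A trivial classifier that labels all 650 test participants as pre-menarche:
# plain accuracy looks decent because the test set is imbalanced (419 vs 231),
# whereas balanced accuracy sits at chance level.
plain_acc = 419 / (419 + 231)
bal_acc = balanced_accuracy(tn=419, fp=0, fn=231, tp=0)

# Made-up null distribution of balanced accuracies for illustration only.
p_raw = permutation_p_value(0.61, [0.50, 0.62, 0.48, 0.55])
p_corrected = permutation_p_value(0.61, [0.50, 0.62, 0.48, 0.55], add_one=True)
```

The first example shows why balanced accuracy is the more informative metric for the unbalanced test set used here.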

Fig. 1

Menarche can be classified from brain imaging data. a Confusion matrix of performance in test data; b Histogram of the result of a permutation test in the held-out test sample. The red line shows the empirical balanced accuracy score of 0.61

Figure 2a depicts the class probability obtained from the pre-/post-menarche classifier for each individual in the independent ABCD test sample. In an association analysis in the post-menarche group, we found an association of derived class probabilities and age at menarche. Specifically, individuals with an earlier menarche tend to be classified as post-menarche with a higher confidence (coef = − 0.0793, p = 0.002, see Additional file 1: Figure S3 for bootstrapped results), which may suggest biological sensitivity of the class probabilities beyond the binary pre-/post distinction (Fig. 2b). We furthermore found a positive association between menarche class probability and pubertal status, which was stable across caregiver- and youth-report (youth-reported PDMS: coef = 0.061, p < 0.001, caregiver-reported PDMS: coef = 0.065, p < 0.001; Fig. 2c) supporting our initial interpretation. Of note, the associations with age at menarche and youth-reported puberty status were diminished when incorporating BMI, SES and race/ethnicity as confound factors (Additional file 1: Table S2), highlighting the complex relationship between these factors, puberty and the brain.

Fig. 2

Menarche class probabilities are associated with measures of pubertal timing and status. a Density plot of post-menarche class probabilities of the pre- and post-menarche groups respectively. Class probability of 1 signifies a 100% confident classification as post-menarche, class probability of 0 signifies a 100% confident classification as pre-menarche. b Association of age at menarche and menarche class probability controlled for age and scanner. c Distribution of class probabilities (age- and scanner residualised) by puberty category (youth-reported)

Our classifier distinguished between pre- and post-menarcheal females of the same age, thereby essentially distinguishing earlier from later pubertal timing relative to age-matched peers. Since pubertal development and age are intertwined, we further sought to investigate whether and to what degree the menarche class probabilities relate to brain age patterns. We next applied the brain age prediction model to the same independent ABCD test sample as used in the menarche classification. Here, the prediction model performed with an RMSE of 1.3 years and an MAE of 1.1 years (Fig. 3a). Permutation testing indicated performance significantly above chance (p = 0.001, Additional file 1: Figure S4). From the predicted brain ages, we calculated the brain age gap (difference between predicted brain age and chronological age; BAG). These gaps were significantly associated with menarche class probabilities (Fig. 3b), as observed from a linear model controlling for the effect of age and scanner (coef = 8.579, p < 0.001). This association remained significant when including BMI, SES, and race / ethnicity as covariates in the analysis (Additional file 1: Table S2). BAG was positively associated with pubertal status (youth-reported PDMS: coef = 0.959, p = 0.025, caregiver-reported PDMS: coef = 0.961, p = 0.027). This effect was descriptively smaller than the class probability effect and also diminished after including both variables in a single model, in which only class probabilities remained significantly associated with pubertal status (PDMS caregiver: coef = 0.4407, p = 0.001, PDMS youth: coef = 0.4268, p = 0.001). Interestingly, whereas the menarche class probability was weakly associated with age at menarche as reported above, the brain age gaps were not. A model controlling for the effects of age and scanner showed no significant association of BAG and age at menarche (coef = − 1.428, p = 0.089), lending support to the idea that the menarche classification model may pick up putative biological variability additional to that revealed by a brain age model.

Fig. 3

Comparison of menarche classification to a brain age prediction framework. a Predicted age by age. b BAGs residualised for age and scanner by menarche class probabilities residualised for age and scanner

Discussion

The present study aimed at investigating whether structural MRI data can be used to correctly classify pre- vs post-menarche status in adolescent females, thus shedding light on the neurodevelopment associated with pubertal timing. For this, we successfully trained a machine learning model for the classification of pre- and post-menarcheal females in the ABCD cohort while strictly controlling for age-related neurodevelopment through age-matching. To further disentangle age- vs. puberty-related patterns in neurodevelopment, we performed subsequent comparison to a brain age prediction framework that predicts chronological age from MRI, revealing shared and distinct variance in the two machine learning approaches. Finally, we investigated if the class probabilities obtained from menarche classification may render a continuous biological marker of pubertal timing that can add relevant information beyond the pre- vs post-menarche dichotomy. Indeed, our results indicate that the probabilities are significantly associated with other key variables of pubertal maturation, in particular age at menarche and pubertal status.

Menarche classification

We argue that leveraging a multivariate machine learning model helps to integrate information from a collection of brain regions into a single score, which eventually may overcome the inherent complexity of modelling puberty in a univariate fashion and its accompanying statistical considerations [7, 44,45,46] (see [44] for a conceptually similar approach of integrating various sources into a single marker representing pubertal timing). Using this approach, our menarche classification model performed with moderate yet significantly above-chance accuracy during cross-validation and when applied to a withheld test sample. To rule out that this classification solely mimics a separation of a younger vs. older subgroup of females, we performed strict age matching prior to model training. Consequently, our results suggest that there is menarche-related neuronal variance detectable in structural MRI data. This aligns well with endocrinological trajectories, which are characterized by a substantial, year-long increase in estradiol levels prior to menarche [47] and related findings that estrogens affect neuroplasticity [6, 48, 49].

Validation of derived class probabilities

Given the close relationship between pubertal- and age-related neurodevelopment, we contrasted the results of our classification model with the outcomes of a brain age prediction framework. This approach aimed at exploring the degree to which our derived class probabilities and brain-age estimations capture similar or distinct patterns of neurodevelopmental variation in the adolescent female brain. Testing the brain age prediction model on the above-mentioned held-out test sample from the ABCD cohort, we observed highly significant and accurate model performance comparable to results of previous studies modelling brain age in the ABCD cohort [6]. Our derived menarche class probabilities were positively related to brain age gaps (BAGs, i.e., the difference between someone’s brain age and chronological age), matching earlier results that associated brain age with pubertal status (e.g., [6]) and pubertal timing (e.g., [44]). Individuals with higher class probabilities (i.e., a higher probability of being classified as post-menarche) also had higher brain-age gaps (i.e., an indication of a more mature brain in relation to their chronological age), suggesting that both approaches capture variations in adolescent brain development related to advanced brain maturation. Our work, however, extends previous findings by showing that our menarche classification approach seems better able to exploit traces of pubertal timing in the brain that are specific to puberty and go beyond the traces of age-related neurodevelopment captured by a brain age prediction framework. This finding is in line with previous suggestions that puberty-related processes may be a more sensitive measure for investigating adolescent brain development than age-related neurodevelopment [6,7,8,9].

In particular, we aimed at showing that exploiting the obtained class probabilities beyond a binary < or > 0.5 decision could help towards developing an objective brain-based marker for pubertal development. To demonstrate the additional benefit of such an approach, we investigated the probabilities' associations with puberty-related measurements. In the ABCD study, puberty is assessed by different means, ranging from hormonal measurements to self-reported evaluation of perceived pubertal maturation (see [50]). The latter allows individuals to be placed in different pubertal stages or categories ranging from pre- to post-pubertal. As described before, for females the score is derived by summing over ratings of key physical changes, such as breast development and pubic hair growth, but also the (non-) completion of menarche [14]. Since menarche is directly incorporated in the pubertal category scores, we additionally showed that higher pubertal category scores (thus indicating a later pubertal stage) are associated with higher class probabilities, which serves as an important sanity check for our proposed approach. Higher BAGs were also significantly associated with higher pubertal category scores, however with a descriptively smaller effect size compared to the class probabilities. Moreover, after including both variables in a single model, only the effect of class probabilities remained significant. We also show that post-menarche class probabilities were weakly yet significantly associated with age at menarche. In contrast, the association between BAG and age at menarche was not significant. This may suggest that there are traces of pubertal timing in the brain that go beyond patterns of age-related brain development, and that these traces can be more successfully exploited by our proposed menarche classification model than by the brain age prediction framework. Interestingly, the pattern of higher class probabilities in females who underwent early or earlier menarche resonates with the hypothesis that the brain might be more susceptible to the hormonal influences of puberty at a younger age and that, therefore, individuals who experience an earlier menarche undergo the increase of gonadal hormones at a time when their brain is relatively more sensitive to their effects on neuroplasticity [51, 52]. Of note, after adding SES, race/ethnicity, and BMI as covariates to our analyses, the association between the class probabilities and age at menarche diminished. This observation matches previous reports about the close link between pubertal processes, for example pubertal timing (e.g., operationalized as age at menarche) [53], and these covariates (e.g., [54,55,56]). Importantly, associations between the aforementioned covariates and puberty were also replicated for the ABCD 2-year follow up data, which we used in the current study [14]. While we consider it important to understand the associations with these covariates, their interplay is difficult to disentangle with the data at hand, given the relationship between pubertal timing and these variables. We argue that these results warrant further systematic investigation of the interplay between all factors in the equation.

Methodological considerations and future directions

Potential limitations may stem from the fact that we limited our machine learning model to structural imaging data from cortical and subcortical regions. While this decision resonates with well-replicated findings of cortical and subcortical grey matter changes during puberty [7], integrating additional imaging features, such as white matter measures, may help in a more holistic investigation of pubertal timing. While myelination plays a crucial role in shaping the human brain during adolescence, findings regarding pubertal maturation are either mixed across different measures of white matter (e.g., [5, 7]) or have not yet been investigated. Furthermore, while we followed a common approach of training our brain-age prediction model in an independent dataset (see e.g., [6, 42]) and applying it to our target sample in the ABCD cohort, recent work from Ray and colleagues [57] suggests that refined brain age models (i.e. a combination of pre-trained models with subsequent finetuning on a fraction of the target data) may improve model performance and thus also downstream analyses. In addition, our work is a proof of concept based on the 2-year follow up data of the ABCD study. With additional longitudinal data becoming available through upcoming releases, the ABCD study represents an unprecedented resource to validate our model and prove its usability, since more and more females will eventually undergo their menarche. Lastly, one needs to acknowledge that the validation analyses with class probabilities and external measures of pubertal timing yielded small, yet significant, effects. Further research is thus warranted to replicate our findings in a different cohort, or at a later timepoint in the ABCD study, to further elucidate the usability of the continuous class probabilities as proposed here.

Perspective and significance

This work may be seen as a proof of principle that pubertal timing can be classified from brain imaging data. Previous studies that have used age-focused approaches like brain age prediction frameworks have found associations with pubertal measures [44], yet our results suggest that grounding the modelling in puberty data directly may yield brain-based markers that are even more sensitive to pubertal status and timing. Future studies may thus further explore similar approaches toward the development of brain-based puberty markers that may be useful in downstream analyses in developmental neuroscience.

Conclusion

We introduced a machine learning approach that classifies menarche status of adolescent females from their cortical and subcortical structural MRI data. The derived continuous brain-based class probabilities captured shared but also unique variations of adolescent neurodevelopment when compared to a brain-age prediction model. Taken together, our results suggest that there are markers of menarche in the brain that can be formalized into a continuous class probability, which might in the future be developed toward an objective brain-based marker of pubertal timing.