Brain age predicted using graph convolutional neural network explains neurodevelopmental trajectory in preterm neonates

Objectives Dramatic brain morphological changes occur throughout the third trimester of gestation. In this study, we investigated whether the predicted brain age (PBA) derived from graph convolutional network (GCN) that accounts for cortical morphometrics in third trimester is associated with postnatal abnormalities and neurodevelopmental outcome. Methods In total, 577 T1 MRI scans of preterm neonates from two different datasets were analyzed; the NEOCIVET pipeline generated cortical surfaces and morphological features, which were then fed to the GCN to predict brain age. The brain age index (BAI; PBA minus chronological age) was used to determine the relationships among preterm birth (i.e., birthweight and birth age), perinatal brain injuries, postnatal events/clinical conditions, BAI at postnatal scan, and neurodevelopmental scores at 30 months. Results Brain morphology and GCN-based age prediction of preterm neonates without brain lesions (mean absolute error [MAE]: 0.96 weeks) outperformed conventional machine learning methods using no topological information. Structural equation models (SEM) showed that BAI mediated the influence of preterm birth and postnatal clinical factors, but not perinatal brain injuries, on neurodevelopmental outcome at 30 months of age. Conclusions Brain morphology may be clinically meaningful in measuring brain age, as it relates to postnatal factors, and predicting neurodevelopmental outcome. Clinical relevance statement Understanding the neurodevelopmental trajectory of preterm neonates through the prediction of brain age using a graph convolutional neural network may allow for earlier detection of potential developmental abnormalities and improved interventions, consequently enhancing the prognosis and quality of life in this vulnerable population. Key Points •Brain age in preterm neonates predicted using a graph convolutional network with brain morphological changes mediates the pre-scan risk factors and post-scan neurodevelopmental outcomes. •Predicted brain age oriented from conventional deep learning approaches, which indicates the neurodevelopmental status in neonates, shows a lack of sensitivity to perinatal risk factors and predicting neurodevelopmental outcomes. •The new brain age index based on brain morphology and graph convolutional network enhances the accuracy and clinical interpretation of predicted brain age for neonates. Supplementary information The online version contains supplementary material available at 10.1007/s00330-023-10414-8.


Methods S2: Steps for NEOCIVET pipeline
4][5] The pipeline began with general MR image pre-processing, including denoising and intensity nonuniformity correction.Then, the brain is extracted using a patch-based brain extraction algorithm (BEaST) 6 and registered to the MNI-NIH neonatal brain template (http://www.bic.mni.mcgill.ca/ServicesAtlases/NIHPD-obj2).Different types of brain tissue (GM, WM, and CSF) were thereafter segmented by a label fusion based on a joint probability between selected templates. 7Next, the corpus callosum was segmented on the midline-plane and used to divide the WM into hemispheres.A marching cube-based framework was adopted to generate a triangulated mesh WM surface attached to the boundary between the GM and WM.After resampling to a fixed number of 81,920 surface meshes using the icosahedron spherical fitting, this surface was further fitted to the sharp edge of the GM-WM interface based on the image intensity gradient information.This allowed for the deformation while preserving the spherical topology of the cortical mantle.
A CSF skeleton was then generated from the union of WM and CSFs.Pial surface was constructed by expanding the WM surface towards the skeleton as an intermediate pial surface.The intermediate pial surface further underwent a fine deformation to identify actual edges of sulcal CSF volumes using an intensity gradient feature model.Finally, the cortical thickness was estimated based on the Euclidean distance between the white matter and pial surface.

Methods S3: Graph convolutional neural network (GCN) based brain age prediction
The proposed PBA model using GCN is illustrated in Figure 1.GCNs 9 are designed to exploit the underlying graph structure of the data.To this end, GCNs consider spectral convolutions on graphs defined as the multiplication of a signal with a filter in the Fourier domain. 10The signal ℎ on the graph nodes is filtered by  as: where U is the Fourier basis of the graph Laplacian L, given by the eigen decomposition of L i.e.,  =   ,  is the ordered real nonnegative eigenvalue values vector of graph Fourier transform, * is the convolution operator and ⊙ denotes element-wise multiplication.The graph Laplacian L is defined as L: = D -W where the degree matrix D is a diagonal matrix whose th diagonal element   is equal to the sum of the weights of all the edges connected to vertex i as   = ∑

𝑗
. W is a binarized adjacency matrix encoding the connection between vertices, where 1 represents a connection between two vertices in brain mesh and 0 otherwise.After normalization, the graph Laplacian is defined as  =   −  −1/2  −1/2 where   is the identity matrix.
Vertices on graph are re-arranged such that a graph pooling operation becomes as efficient as 1D pooling.Fake nodes, or disconnected nodes, are added to construct a balanced binary tree from the coarsest to finest level to make the pooling operation very efficient without losing information.
Mean squared error (MSE) was used as the loss function with an Adam optimizer, the empirically determined set of parameters with a learning rate of 10 -6 , an L2 regularization parameter of 10 -8 , and a batch size of 2 were applied.

Methods S4: number of nodes in GCN
We down-sampled 81,924 vertices on cortical surfaces using the icosahedron downsampling to investigate the accuracy of the GCN-based brain age prediction while saving computational time in the training of GCN.We thus feed each of the brain meshes that were down-sampled with 324, 1,284, 5k and 20k vertices to the GCN model respectively.The number of 1,284 was chosen to use in the following analysis by a compromise of the computational accuracy and computational time (Figure S3).Next, it combines the slice encodings using an aggregation module, resulting in a single embedding for the scan.Finally, this embedding was passed through the feed-forward layers to predict the brain age.The model was trained end-to-end using MSE loss.More details of the used network architecture can be found in (Gupta et al., 2021). 11This model has been proved to be better than conventional 3D-CNN based model in adults brain age prediction study (Gupta et al., 2021).This model has a shallow network architecture and might be better for learning brain age from a small sample dataset.A recent paper from Smith's group also concluded that a shallow neural network performs more favorably for the brain age prediction task (Peng et al., Medical Image Analysis 2021). 12Methods S6: Hyperparameter tunning of all the brain age prediction models used in this study.
For the CNN model, we first used the hyperparameters recommended by the author in their original paper (Gupta et al., 2021) 11 and then varied these parameters from this initial setting and tested them in separate runs, which allowed us to achieve the best accuracy of the CNN model.
For the Random Forest model, the computational time is quite short.Hence, to tune the parameters, we tested various combinations of parameters, including the number of trees, maximum levels in each tree, maximum number of features used for splitting a node, and the need for bootstrap analysis.Our reported results are derived from the combination of parameters that led to the highest PBA accuracy.
The optimal setting of hyperparameters for the GCN model has not yet been documented in previous studies.Thus, we tried to achieve the best accuracy by testing various parameters, including the number of layers, the kernel size, the number of filters, pooling approaches, etc. Due to the huge computational burden in deep learning algorithms, however, we did not focus on finding the best combination of parameters.Instead, we independently varied and tested each parameter while fixing others.As a result, we found that the GCN model obtained the best performance when it had 3 convolutional and pooling layers, with the kernel size of 5 (vs. 3 or 10), the number of filters to be [8, 16, 32] (vs.16, 32, 64), the pooling size of 2 and the max-pooling approach (vs.average pooling).We also found that the outputs based on different settings were only slightly different.Notably, a larger variation in output performance was observed when using different random initialization and nested 5 folds cross-validation models.We thus built several models (100 initialization X 5 folds) and averaged them.To compare whether the brain age index, which is based on three cortical thickness, sulcal depth, and GM/WM ratio, is more sensitive to the brain abnormalities than the three features respectively, we assessed the association between three cortical features and clinical factors.Specifically, we first averaged each of the three cortical features across all the vertices per individual.Similar to brain age index, we then converted each feature into a global normative index to indicate how the cortical features may deviate from the normal developmental trajectories.To do this, we ran a normative modelling using Gaussian process regression with a 5-fold cross validation to predict the cortical features while correcting for the effects of age at scan and sex.The predicted values are in a z-score quantifying abnormality deviating from the normal developmental trajectory.Then, a similar analysis, as shown in Figure 3, was conducted on z-scores which were calculated from each cortical feature.The brain age index and the cortical features exhibited similar patterns in relation to various clinical variables to an extent.However, the brain age index had the best sensitivity compared to single cortical features as it showed the greatest number of significant associations.

Figure S7.
Mean absolute error map for regional BAIs predicted using GCN models.

Methods S9: Structural Equation Modelling
We built structural equation models (SEM) that impute relationships between latent variables.Based on the hypothesized latent risk variables and timeline in Figure 2 This analysis, which was designed to identify the clinical variables and their paths leading to adverse neurodevelopmental outcomes, was conducted only on 50 MRI images from UCSF dataset including baseline and follow-up scans at 30 months from 38 preterm survivors.
By constructing the SEM, we tested our hypothesized model (Figure 2) using the previously mentioned data and determined the strength and significance of each hypothesized path.We estimated all parameters using weighted least squares with standard errors and mean-and variance-adjusted test statistics with a full weight matrix (WLSMV), which yields parameter estimates and standard errors that are robust to violations of multivariate normality 34.We used the χ2 statistic fit indices to evaluate whether the model fit to data well.We reported standardized parameter estimates computed using the WLSMV estimation to enable more direct comparisons of the effects for different pathways to neurodevelopmental outcomes.
To begin, we used the confirmatory factor analysis (CFA) to model latent variables summarizing a subset of pathologies.As a priori, we hypothesized a latent variable for 1) preterm birth measures, informed by birth age and birthweight; 2) the presence of perinatal injuries; 3) postnatal conditions/treatments, informed by exposure to steroids, hypotension, infection, PDA, days intubated, and CLD; 4) neurodevelopmental outcome at 30 months (cognitive, language, and motor scores).Notably, the MAE reported here is slightly different from that in manuscript because the MAE value in the manuscript is based on the average of 100 outputs (more details can be found in our response to your next question).Given the very small differences between the two models, we decided to keep the original results, as changing the model would be extremely time-consuming and yet not change the main message of the paper.

Figure S1 .
Figure S1.Comparison of three cortical features used in this study after the ComBat harmonization method.We applied ComBat to the 3 groups of scanner types -1.5TGE (with 1x1mm voxel size), 3T GE (0.7x0.7mm), and 3T Siemens (0.6x0.6mm).Results demonstrate that ComBat can remove possible bias and generate a consistent trajectory of maturation from 3 different data sources as seen in the 3 different imaging features used for brain age prediction.

Figure S2 .
Figure S2.A) The newest NEOCIVET version was used in this study.8NEOCIVET was developed based on 3 datasets from UCSF (1.5T & 3T), dHCP (3T), and University of North Carolina (UNC, 3T).All key steps in the pipeline, including 3D U-Net models for brain and tissue segmentation, were trained using the 3 datasets.The surface reconstruction quality has been validated on these datasets by an expert neuroscientist whose scoring is provided in the figure (blue: good quality, orange: fair quality, gray: poor quality).B) Illustration of the output from the NEOCIVET pipeline.Left panel represents the tissue segmentation.The right panel displays the surface reconstruction: 10 individual surfaces are overlaid on MRI, which were reconstructed from the dHCP dataset.

8 Figure S3 .
The number of 1,284 was chosen to use in the following analysis by a compromise of the computational accuracy and computational time Methods S5: Convolutional neural network (CNN) based brain age prediction To compare our cortical surface-based GCN model with conventional image-based deep learning model, we also built a brain age prediction model by applying a 2D aggregated CNN-based deep learning model (Gupta et al., ISBI 2021) to T1 MR images directly.This model was fed a 3D scan as input and encoded each image slice using a 2D-CNN encoder.

Methods S7 :
Carefully designed cross-validation strategy.One-fifth of the sample was excluded from the training step and reserved as unseen testing data (Figure S4-left).The other four fifths were entered into the nested 5-folds crossvalidation loop as training and validation samples presented in figure S4-left or the figure below for your convenience.The nested cross-validation set underwent separate trainingvalidation processes using its own 5 folds to avoid overfitting (Figure S4-middle).For each fold of the processes, four-fifths of the sample was grouped as training samples, and the remaining one-fifth was used as validation samples.For each fold, the trained model was applied to the unseen test data to generate predicted ages.Also, we ran the cross-validation model 100 times using the random initialization step in the GCN to remove potential bias due to different initial parameter settings.The final predicted ages were calculated by averaging the output from the 5x100=500 models.

Figure S4 .
Figure S4.The training strategy of GCN in our study.

Figure S5 .
Figure S5.The conventional RBA is negatively associated with true age (left, p < 0.05), while this effect is corrected in new RBA (right).

Figure S6 .
Figure S6.The epochs converging along with the iteration for our training and validation datasets.

Methods S8 :
Association of clinical variables with the morphological and imaging features used as the input to brain age prediction models, such as cortical thickness, sulcal depth, and GM/WM intensity ratio.
, we analyzed multiple relationships/paths between severity of preterm birth, perinatal injuries, pre-scan post-natal factors, BAI at postnatal scan and neurodevelopmental outcome scores at 30 months.Eur Radiol (2023) Liu M, Lu M, Kim S et al.

Figure S8 .
Figure S8.Predicted brain age using cortical feature harmonized by combat before and after splitting them into training and test.Notably, the MAE reported here is slightly different from that in manuscript because the MAE value in the manuscript is based on the average of 100 outputs (more details can be found in our response to your next question).Given the very small differences between the two models, we decided to keep the original results, as changing the model would be extremely time-consuming and yet not change the main message of the paper.

Figure S9 .
Figure S9.Results of path analysis including the maternal risk factors.Rectangles represent manifest variables, and ellipses represent latent variables.Each single-headed arrow denotes a hypothesized unidirectional effect of one variable on another.Single-headed arrows represent the impact of one variable on another, and double-headed arrows represent covariances between pairs of variables.Numbers associated with effects are standardized regression coefficients.Asterisks refer to the paths that are statistically significant.

Figure S10 .
Figure S10.Mean absolute error map for regional BAIs predicted using GCN models.

Table S3 .
The performance of the age prediction (errors in weeks for MSE, MAE and SDAE).