Abstract
Brain extraction, or skull-stripping, is an essential data preprocessing step for machine learning approaches to brain MRI analysis. Currently, there are limited extraction algorithms for the neonatal brain. We aim to adapt an established deep learning algorithm for the automatic segmentation of neonatal brains from MRI, trained on a large multi-institutional dataset for improved generalizability across image acquisition parameters. Our model, ANUBEX (automated neonatal nnU-Net brain MRI extractor), was designed using nnU-Net and was trained on a subset of participants (N = 433) enrolled in the High-dose Erythropoietin for Asphyxia and Encephalopathy (HEAL) study. We compared the performance of our model to five publicly available models (BET, BSE, CABINET, iBEATv2, ROBEX) spanning conventional and machine learning methods, tested on two public datasets (NIH and dHCP). We found that our model had a significantly higher Dice score on the aggregate of both datasets and comparable or significantly higher Dice scores on the NIH (low-resolution) and dHCP (high-resolution) datasets independently. ANUBEX performs similarly when trained on sequence-agnostic or motion-degraded MRI, but slightly worse on preterm brains. In conclusion, we created an automatic deep learning-based neonatal brain extraction algorithm that demonstrates accurate performance on both high- and low-resolution MRIs and has fast computation time.
Introduction
Magnetic Resonance Imaging (MRI) allows for the acquisition of high-resolution images with exceptional soft tissue contrast1, making it especially useful for evaluation of the brain, where it often informs patient medical management. For neonates, brain MRI is particularly important for assessment of patients with neonatal encephalopathy, where both the presence and pattern of brain injury can assist prognostication and treatment planning2,3,4,5,6,7. Advances in artificial intelligence (AI) and machine learning (ML) have allowed accurate prediction of functional outcomes in infants using MRI data8,9,10,11, taking advantage of imaging information beyond what is reasonably utilized by human visual inspection alone. Image preprocessing is an essential step in standardizing data inputs for AI/ML algorithms, and it ensures faster, more robust data processing while minimizing potential confounding features12,13,14,15,16,17,18.
Brain extraction, otherwise known as skull-stripping, is an essential step for virtually all AI/ML approaches to brain MRI analysis. While this process is well-established in adult brain models, there are limited extraction algorithms available for the neonatal brain. Brain extraction refers to the process by which brain tissue is segmented, and non-brain tissue, including the skull and extracranial soft tissues, is removed12,14,16,18,19. Brain extraction facilitates data de-identification by removing three-dimensional face data, and it mitigates bias by preventing AI/ML algorithms from focusing on extracranial and facial soft tissues. Accurate automated brain extraction tools are important for improving standardization of the skull-stripping step, as manual editing is prone to variability, is time-consuming, and could influence the accuracy of associated AI/ML models. Historically, automated brain extraction tools have been based on thresholding and binary morphological operations, shape analysis, and/or atlas registration techniques20,21,22,23,24,25,26,27,28; however, the most modern and accurate approaches are based on deep learning (DL) with convolutional neural networks (CNNs)29. Despite recent progress with ML16,29, there is still a need for improved MRI brain extraction tools designed specifically for neonatal brains30, which differ from adult brains in morphology and signal contrast and show an increased frequency of motion artifact13,15,17,18,24,29,31.
DL-based brain extraction algorithm performance relies heavily on its training data, and generalizability can be limited by small training set sizes and lack of training data heterogeneity. Though models may learn to perform well on institution specific data, there is a need for more generalizable algorithms that can perform well on MRI data with varying acquisition parameters, field strength, and vendor platforms. To address this need for generalizability, we present ANUBEX (automated neonatal nnU-Net brain MRI extractor), a publicly-available DL-based algorithm for neonatal brain extraction based on the domain-leading nnU-Net architecture and trained on a large multi-institution dataset. We compare the performance of our algorithm to five publicly available algorithms spanning conventional, machine learning, and deep learning methods using a multi-institution external dataset20,21,32,33.
Methods
Study population
This was an Institutional Review Board approved ancillary study of the High-dose Erythropoietin for Asphyxia and Encephalopathy (HEAL) study34,35,36, which prospectively enrolled 501 neonates from 17 different institutions across the United States of America with moderate to severe encephalopathy at birth. Informed consent was previously obtained from all subjects and/or their legal guardian, and all methods were carried out in accordance with relevant guidelines and regulations. A subset of HEAL participants (N = 474) underwent neonatal MRI. Exclusion criteria included missing, incomplete, or severely artifact degraded T1-weighted MR imaging data (N = 41) resulting in a final study population of 433 participants from 17 different institutions (Fig. 1).
Study data
Imaging data used for this study consisted of T1-weighted, T2-weighted, and diffusion-weighted imaging of the brain acquired as part of the HEAL trial. Scan parameters varied based on the imaging site and scanner platform. T1-weighted images included both three-dimensional gradient echo and two-dimensional spin echo imaging. T2-weighted images were two-dimensional Fast Spin Echo (FSE) imaging and diffusion-weighted images were Echoplanar Imaging (EPI). Other than in-plane resolution and slice thickness, scan parameters were not collected as part of the HEAL trial and are not consistently available for these data.
Iterative deep learning model development
The ANUBEX architecture was designed using nnU-Net37, a self-configuring segmentation framework based on the popular U-Net architecture38, which is widely used and has demonstrated domain-leading segmentation performance on related tasks. Model training was accomplished using an iterative, human-in-the-loop AI approach. First, baseline automated brain masks were generated from T1-weighted images using a widely used tool for adult MRI brain extraction21. Next, all brain masks were manually reviewed by a single medical trainee (author JC) using ITK-SNAP39 and categorized as either “Acceptable,” “Borderline,” or “Needs Revision” using the following criteria:
Acceptable
Very little or no non-brain tissue included or brain tissue excluded; manual revision not expected to improve algorithm performance.
Borderline
Small amount of non-brain tissue included or brain tissue excluded; uncertain if manual revision will change algorithm performance.
Needs revision
Significant amount of non-brain tissue included or brain tissue excluded; manual revision expected to improve algorithm performance.
Studies labeled as “Borderline” were manually edited in ITK-SNAP by the same medical trainee. Next, all “Acceptable” and revised “Borderline” studies were used to train an instance of nnU-Net (single fold, random 80%/20% train/validation split). This model was then used to re-generate automated masks for the remaining “Needs revision” cases and the process was repeated for a total of five iterations, with each training instance reusing all previously labeled “Acceptable” and manually revised “Borderline” images. After five iterations, all remaining “Borderline” (N = 11) and “Needs revision” (N = 23) masks were manually edited to complete the training dataset.
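The iterative human-in-the-loop procedure above can be summarized schematically. The sketch below is illustrative only, not the authors' code: every callable name (`review`, `revise`, `train`, `predict`) is a hypothetical placeholder for the manual review, ITK-SNAP editing, nnU-Net training, and inference steps described in the text.

```python
def iterative_mask_refinement(masks, review, revise, train, predict, n_iter=5):
    """Schematic human-in-the-loop mask refinement loop.

    masks   : dict mapping case id -> current automated brain mask
    review  : callable returning 'acceptable', 'borderline', or 'needs_revision'
    revise  : callable standing in for manual editing (e.g., in ITK-SNAP)
    train   : callable standing in for training one nnU-Net instance
    predict : callable standing in for re-running inference on a case
    """
    accepted = {}  # ground-truth masks accumulated across iterations
    for _ in range(n_iter):
        remaining = {}
        for cid, m in masks.items():
            label = review(cid, m)
            if label == "acceptable":
                accepted[cid] = m
            elif label == "borderline":
                accepted[cid] = revise(cid, m)  # small manual edit
            else:
                remaining[cid] = m              # 'needs_revision'
        if not remaining:
            break
        # retrain on everything accepted so far, then re-generate the
        # remaining masks with the new model for the next review pass
        model = train(accepted)
        masks = {cid: predict(model, cid) for cid in remaining}
    # after the final iteration, any unresolved cases are edited by hand
    for cid, m in masks.items():
        if cid not in accepted:
            accepted[cid] = revise(cid, m)
    return accepted
```

With stub callables, a case initially judged "needs revision" is re-predicted and re-reviewed on the next pass, mirroring the five-iteration process described above.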
Final model training using all the manually reviewed/corrected data (N = 433) was performed using a five-fold cross-validation approach with a standard random 80%/20% train/validation split for each fold. Model training was accomplished using a desktop computer equipped with two Nvidia RTX A6000 graphics processing units running in parallel (one training fold per GPU). We developed two models: one trained on only T1-weighted imaging, referred to as ANUBEX, and one trained on all three included sequences in a randomized manner, referred to as ANUBEX Sequence Agnostic (ANUBEX-SA).
External validation
Performance of the fully trained ANUBEX model was evaluated using an out-of-sample, external test set consisting of N = 39 T1-weighted images from two different sources: N = 20 from the developing Human Connectome Project (dHCP)40, consisting of high-resolution three-dimensional gradient echo T1-weighted imaging, and N = 19 from the NIH Pediatric MRI study41, consisting predominantly of lower-resolution two-dimensional spin echo T1-weighted imaging. Corresponding T2-weighted images were also obtained from the dHCP test set. A single reviewer (author JC) manually reviewed the test set and generated a brain mask for each study; these masks were subsequently used as ground truth for assessing the automated brain masks. The proposed model was applied to the external test set using an ensemble of all five training folds.
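Ensembling the five training folds amounts to averaging per-fold foreground probability maps and thresholding the mean. The following is a minimal sketch of that general scheme, not the authors' pipeline; nnU-Net's built-in ensembling averages softmax outputs in a similar spirit, and the 0.5 threshold is an assumed default.

```python
import numpy as np

def ensemble_folds(fold_probs, threshold=0.5):
    """Average foreground probability maps from each cross-validation fold
    and threshold the mean to produce a single binary brain mask."""
    mean_prob = np.mean(np.stack(fold_probs, axis=0), axis=0)
    return (mean_prob >= threshold).astype(np.uint8)
```

A voxel is kept only if the folds, on average, assign it at least 50% foreground probability, which suppresses spurious predictions made by any single fold.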
Model performance was compared to five different publicly available automated brain extraction methods: BET, BSE, CABINET, iBEATv2, and ROBEX20,21,22,32,33. Each algorithm was applied to the external test set using default parameters. These benchmark comparison methods were chosen based on the following criteria: (1) publicly available, (2) out-of-the-box functionality (i.e. single command that runs on native data), and (3) based on a variety of different methods (e.g. shape analysis, atlas registration, deep learning).
Sub-analyses
In addition to the primary external validation described in the previous section, we performed several sub-analyses to evaluate model performance in different scenarios, including different MRI sequences, preterm brain MRIs, and motion-degraded brain MRIs. To address performance on different MRI sequences, we evaluated ANUBEX-SA on T2-weighted imaging from the dHCP test set only, as the NIH data do not consistently contain T2-weighted imaging. To address performance on preterm brain MRIs, we evaluated ANUBEX on 18 T1-weighted brain MRIs performed before 36 weeks gestational age that were available in the dHCP dataset. To address performance in the setting of motion artifact, we evaluated the performance of ANUBEX on motion-degraded validation data from the fivefold cross-validation. We chose this approach because there were insufficient exams with motion artifact in the testing data for a meaningful analysis. We identified 92/433 (21%) exams with at least moderate motion artifact and 341/433 (79%) exams with either mild or no significant motion artifact using the following objective criteria (Fig. 2):
Mild motion artifact
Slight motion artifact that does not obscure grey-white matter junction.
Moderate motion artifact
Motion artifact that incompletely obscures grey-white matter junction.
Severe motion artifact
Obvious motion artifact that completely obscures grey-white matter junction.
Evaluation metrics and statistical analyses
The Dice coefficient was chosen as the primary metric for comparing manual and automated brain masks. The Dice coefficient measures the degree of spatial overlap between two binary images, ranging from 0 (no overlap) to 1 (perfect agreement), and is calculated as: Dice coefficient (A,B) = 2|A ∩ B|/(|A| + |B|), where (A ∩ B) is the intersection of masks A and B, and |A| and |B| are the voxel counts of each mask. Secondary metrics included sensitivity and specificity, calculated as Sensitivity = TP/(TP + FN) and Specificity = TN/(TN + FP), where TP is the number of true positive voxels in the mask, TN the number of true negative voxels, FP the number of false positive voxels, and FN the number of false negative voxels. Dice coefficients were calculated using custom Python code, and statistical comparisons between average Dice scores were computed using a two-sample, two-tailed t-test with a significance threshold of p < 0.05. We controlled for multiple comparisons using the Benjamini–Hochberg false discovery rate correction method.
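The metrics above can be implemented in a few lines of NumPy. This is an illustrative reimplementation rather than the authors' custom code, and the function names are our own:

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice(A, B) = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def sensitivity_specificity(pred, truth):
    """Voxel-wise sensitivity TP/(TP + FN) and specificity TN/(TN + FP)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.logical_and(pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    return tp / (tp + fn), tn / (tn + fp)

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean significance flags after Benjamini-Hochberg FDR control:
    reject the k smallest p-values, where k is the largest rank with
    p_(k) <= alpha * k / n."""
    pvals = np.asarray(pvals, float)
    order = np.argsort(pvals)
    n = len(pvals)
    below = pvals[order] <= alpha * np.arange(1, n + 1) / n
    sig_sorted = np.zeros(n, dtype=bool)
    if below.any():
        sig_sorted[: np.max(np.nonzero(below)[0]) + 1] = True
    sig = np.zeros(n, dtype=bool)
    sig[order] = sig_sorted
    return sig
```

Note that an identical pair of masks yields a Dice coefficient of exactly 1, and a disjoint pair yields 0, matching the stated range.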
Ethical approval
This study was approved by the University of California, San Francisco Institutional Review Board as an ancillary study of the High-dose Erythropoietin for Asphyxia and Encephalopathy (HEAL) study.
Results
Study data and patient demographics
The final training dataset included N = 433 neonatal MRI studies from 17 institutions, 44% of which were female. The median gestational age (GA) at birth was 39.3 weeks (interquartile range [IQR] 38.1–40.3), with MRIs obtained between 96 and 144 h after birth36. The final external testing dataset included N = 39 neonatal MRI studies from two institutions, N = 20 from the dHCP and N = 19 from the NIH. The dHCP preterm sub-analysis data set included N = 18 MRIs. The median GA at scan of patients from the NIH, dHCP, and dHCP Preterm data sets, respectively, were 42.3 weeks (IQR 42.1–43.1), 40.6 weeks (IQR 39.7–40.9), and 34.5 weeks (IQR 34.0–35.3). The demographics of the NIH, dHCP, and dHCP Preterm data sets, respectively, were 53%, 30%, and 44% female. Basic participant demographic data is shown in Table 1. MRI resolution is shown in Table 2.
Model training
Final model training lasted approximately 36 h. Training and validation loss (Dice) decreased appropriately throughout the training process. Final trained model weights are freely available online (https://github.com/ecalabr/nnUNet_models).
External validation and performance evaluation
External validation and performance evaluation were performed using the multi-institution external test dataset (N = 39). Processing time for all 39 studies in the external test set was 330.34 s, or an average of 8.5 s per study, using an Nvidia RTX A6000 GPU. Results from ANUBEX were compared to results from five other publicly available brain extraction tools: BET, BSE, CABINET, iBEATv2, and ROBEX20,21,22,32,33. Dice scores for all models evaluated on the testing dataset are provided in Table 3. Example brain masks generated by each algorithm are shown in Fig. 3. The Dice coefficient of our model was the highest of all methods tested, with a mean ± standard deviation of 0.955 ± 0.017 (Fig. 4A). The next best performing model (iBEATv2) yielded an average Dice of 0.949 ± 0.017, followed by CABINET at 0.934 ± 0.015. Other evaluated methods yielded average Dice scores below 0.85. Our model showed a small but statistically significant improvement in performance compared to the two other deep learning algorithms, CABINET (p < 0.001) and iBEATv2 (p = 0.012), and a larger statistically significant difference compared to the non-deep-learning algorithms ROBEX, BSE, and BET. Sub-analysis of algorithm performance on the external test set by site revealed a trend towards better performance on the dHCP (3D) image data (Fig. 4C) compared to the NIH (2D) data (Fig. 4B). Notably, our algorithm showed the highest performance of all algorithms tested for both dHCP and NIH data.
Sub-analyses
Sub-analysis results are presented in Table 3 and Fig. 4. ANUBEX-SA (trained on T1-, T2-, and diffusion-weighted images) showed similarly high performance on T1-weighted imaging from both test sets (average Dice = 0.956 ± 0.012 for dHCP and Dice = 0.943 ± 0.014 for NIH) and performance on T2-weighted imaging from the dHCP test set was nearly identical (average Dice = 0.956 ± 0.008). We detected small but statistically significant decreases in performance of ANUBEX-SA compared to ANUBEX for the dHCP test set but not for the NIH test set or aggregate test set.
ANUBEX performance on the 18 preterm (< 36 weeks gestational age) brain MRIs from the dHCP yielded an average Dice = 0.947 ± 0.030, which was slightly worse compared to performance on term dHCP MRI data (p = 0.015). ANUBEX-SA performance was average Dice = 0.940 ± 0.028 for T1-weighted images and 0.925 ± 0.028 for T2-weighted images, which was not significantly different compared to regular ANUBEX performance on preterm T1-weighted images (Fig. 4D).
ANUBEX performance in the setting of moderate or severe motion artifact was evaluated on validation data from the fivefold cross-validation, which results in elevated Dice scores compared to test set data but still allows comparison of performance between MRIs with and without motion artifact. Average validation Dice score for ANUBEX was 0.986 ± 0.021 for the group with at least moderate motion artifact compared to 0.988 ± 0.020 in the group without significant motion artifact. This difference was not statistically significant (p = 0.470).
Discussion
In this study, we evaluated ANUBEX, a new deep learning-based model for neonatal MRI brain extraction based on the widely used nnU-Net architecture. Model performance was evaluated on an independent, multi-institution, external dataset and results were compared to five other publicly available brain extraction methods including deep learning-based and non-deep learning-based methods: BET, BSE, CABINET, iBEATv2, and ROBEX. Compared to the other methods we evaluated, our model demonstrated superior brain extraction performance on both 2D and 3D neonatal brain MRIs. Specifically, there was a small but significant improvement in performance compared to the other two deep learning-based methods (CABINET and iBEATv2) and a larger significant difference compared to the non-deep learning-based methods. Based on sub-analysis results, our model performs slightly worse on brain MRIs of preterm infants as compared to term infants, an expected outcome given our model was trained on term and near-term infants. We did not find significant differences in performance between our T1-weighted model (ANUBEX) or our sequence agnostic model (ANUBEX-SA) whether evaluated on T1- or T2-weighted images, and model validation performance was not significantly different in moderately to severely motion degraded versus non to mildly motion degraded images.
Our approach to model generation has several potential advantages that may have contributed to the observed performance increase. First, we employed an iterative semi-automated approach to ground truth brain mask generation, which allowed increased efficiency and consistency. Second, we utilized a multi-institutional dataset from the HEAL trial as training data for our deep learning algorithm in order to create a more generalizable model across different institutions. By training with a larger and more heterogeneous sample including variation in MRI manufacturer, model, software, and imaging parameters36, our model can potentially achieve higher accuracy in neonatal skull stripping across various institutions in comparison to studies performed with a smaller and institution specific dataset. For example, our model showed improved performance with both high-resolution (0.8 × 0.8 × 1.6 mm) 3D imaging (dHCP) and thicker slice (1.0 × 1.0 × 3.0 mm) 2D imaging (NIH), which is likely attributable to the training data heterogeneity. Comparatively, iBEATv2 was trained on only the high-resolution Baby Connectome Project dataset (resolution 0.8 × 0.8 × 0.8 mm), and ROBEX was trained on a proprietary dataset of 92 healthy adult subjects (downsampled to lower resolution 1.5 × 1.5 × 1.5 mm)33. Finally, our model was generated using the widely used nnU-Net architecture, which has “out-of-the-box” functionality and has shown domain-leading performance in other medical image segmentation tasks. The use of nnU-Net also allows straightforward sharing of trained model weights and can lower barriers to implementation and use in future research projects.
This study has several important limitations. First, the use of data from the HEAL trial limits the scope of brain pathology included in the training data. HEAL study participants all had moderate to severe encephalopathy and did not have other major structural brain abnormalities. While several other intracranial pathologies were present in HEAL participants (e.g., infarcts, hemorrhages, hydrocephalus) these were not rigorously documented nor was the model specifically tested for brain extraction performance in the setting of any brain abnormality. Therefore, performance in the setting of brain structural pathology may be degraded. Second, we focused exclusively on the early neonatal period (< 44 weeks GA at scan) and therefore performance in patients older than 44 weeks GA may be degraded. Finally, comparison with other publicly available models was not exhaustive as several previously published algorithms had webpages that were inactive or code that was non-functional on modern software stacks.
Because accurate brain tissue segmentation is key to subsequent image analysis and volumetric measurements, necessary future steps would include further evaluation of the accuracy of our model on patients outside of the neonatal age range, such as in young children or adults, and assessing our model’s utility on brains with diverse structural pathology. We were not able to uniformly perform sub-analyses on all other algorithms because of varying abilities to support T2-weighted imaging.
In conclusion, we propose an application of nnU-Net to create a newer high-accuracy automatic neonatal brain extraction algorithm trained on a large multi-institutional dataset to improve generalizability across MRI acquisition parameters. Our model demonstrates accurate performance with both high- and low-resolution MRIs and is designed to have a lower barrier to use as an “out-of-the-box” ready software with fast computational time.
Data availability
Trained model weights are available through the corresponding author or online at: https://github.com/ecalabr/nnUNet_models
References
Plewes, D. B. & Kucharczyk, W. Physics of MRI: A primer. J. Magn. Reson. Imaging 35(5), 1038–1054. https://doi.org/10.1002/jmri.23642 (2012).
Wu, Y. W. Clinical features, diagnosis, and treatment of neonatal encephalopathy. UpToDate (2023).
Meijler, G. & Steggerda, S. Overview of cerebellar injury and malformations in neonates. UpToDate (2022).
Heinz, E. R. & Provenzale, J. M. Imaging findings in neonatal hypoxia: A practical review. AJR Am. J. Roentgenol. 192(1), 41–47. https://doi.org/10.2214/ajr.08.1321 (2009).
Miller, S. P. et al. Patterns of brain injury in term neonatal encephalopathy. J. Pediatr. 146(4), 453–460. https://doi.org/10.1016/j.jpeds.2004.12.026 (2005).
Barnette, A. R. et al. Neuroimaging in the evaluation of neonatal encephalopathy. Pediatrics 133(6), e1508-1517. https://doi.org/10.1542/peds.2013-4247 (2014).
Chau, V., Poskitt, K. J. & Miller, S. P. Advanced neuroimaging techniques for the term newborn with encephalopathy. Pediatr. Neurol. 40(3), 181–188. https://doi.org/10.1016/j.pediatrneurol.2008.09.012 (2009).
Mostapha, M. & Styner, M. Role of deep learning in infant brain MRI analysis. Magn. Reson. Imaging 64, 171–189. https://doi.org/10.1016/j.mri.2019.06.009 (2019).
Saha, S. et al. Predicting motor outcome in preterm infants from very early brain diffusion MRI using a deep learning convolutional neural network (CNN) model. Neuroimage 215, 116807. https://doi.org/10.1016/j.neuroimage.2020.116807 (2020).
Baker, S. & Kandasamy, Y. Machine learning for understanding and predicting neurodevelopmental outcomes in premature infants: A systematic review. Pediatr. Res. 93(2), 293–299. https://doi.org/10.1038/s41390-022-02120-w (2023).
Scheinost, D. et al. Machine learning and prediction in fetal, infant, and toddler neuroimaging: A review and primer. Biol. Psychiatry S0006–3223(22), 01706–01711. https://doi.org/10.1016/j.biopsych.2022.10.014 (2022).
Fatima, A., Shahid, A. R., Raza, B., Madni, T. M. & Janjua, U. I. State-of-the-art traditional to the machine- and deep-learning-based skull stripping techniques, models, and algorithms. J. Digit. Imaging 33(6), 1443–1464. https://doi.org/10.1007/s10278-020-00367-5 (2020).
Khalili, N. et al. Automatic extraction of the intracranial volume in fetal and neonatal MR scans using convolutional neural networks. Neuroimage Clin. 24, 102061. https://doi.org/10.1016/j.nicl.2019.102061 (2019).
George, M. M. & Kalaivani, S. A view on atlas-based neonatal brain MRI segmentation. In ICTMI 2017 (eds Gulyás, B. et al.) 199–214 (Singapore, Springer, 2019). https://doi.org/10.1007/978-981-13-1477-3_16.
Wang, G. et al. Impacts of skull stripping on construction of three-dimensional T1-weighted imaging-based brain structural network in full-term neonates. BioMed. Eng. OnLine 19(1), 41. https://doi.org/10.1186/s12938-020-00785-0 (2020).
Serag, A. et al. Accurate Learning with Few Atlases (ALFA): An algorithm for MRI neonatal brain extraction and comparison with 11 publicly available methods. Sci. Rep. 6, 23470. https://doi.org/10.1038/srep23470 (2016).
Gao, Y. et al. A multi-view pyramid network for skull stripping on neonatal T1-weighted MRI. Magn. Reson. Imaging 63, 70–79. https://doi.org/10.1016/j.mri.2019.08.025 (2019).
Alansary, A. et al. Infant brain extraction in T1-weighted MR images using BET and refinement using LCDG and MGRF models. IEEE J. Biomed. Health Inform. 20(3), 925–935. https://doi.org/10.1109/JBHI.2015.2415477 (2016).
Zhang, Q., Wang, L., Zong, X., Lin, W., Li, G. & Shen, D. Frnet: Flattened residual network for infant MRI skull stripping. In 2019 IEEE 16th International Symposium on Biomedical Imaging 999–1002 (2019). https://doi.org/10.1109/ISBI.2019.8759167
Shattuck, D. W., Sandor-Leahy, S. R., Schaper, K. A., Rottenberg, D. A. & Leahy, R. M. Magnetic resonance image tissue classification using a partial volume model. Neuroimage 13(5), 856–876. https://doi.org/10.1006/nimg.2000.0730 (2001).
Smith, S. M. Fast robust automated brain extraction. Hum. Brain Mapp. 17(3), 143–155. https://doi.org/10.1002/hbm.10062 (2002).
Iglesias, J. E., Liu, C.-Y., Thompson, P. M. & Tu, Z. Robust brain extraction across datasets and comparison with publicly available methods. IEEE Trans. Med. Imaging 30(9), 1617–1634. https://doi.org/10.1109/TMI.2011.2138152 (2011).
Eskildsen, S. F. et al. BEaST: Brain extraction based on nonlocal segmentation technique. Neuroimage 59(3), 2362–2373. https://doi.org/10.1016/j.neuroimage.2011.09.012 (2012).
Devi, C. N., Chandrasekharan, A., Sundararaman, V. K. & Alex, Z. C. Neonatal brain MRI segmentation: A review. Comput. Biol. Med. 64, 163–178. https://doi.org/10.1016/j.compbiomed.2015.06.016 (2015).
Ségonne, F. et al. A hybrid approach to the skull stripping problem in MRI. Neuroimage. 22(3), 1060–1075. https://doi.org/10.1016/j.neuroimage.2004.03.032 (2004).
Brummer, M. E., Mersereau, R. M., Eisner, R. L. & Lewine, R. J. Automatic detection of brain contours in MRI data sets. IEEE Trans. Med. Imaging. 12(2), 153–166. https://doi.org/10.1109/42.232244 (1993).
Somasundaram, K. & Kalaiselvi, T. Fully automatic brain extraction algorithm for axial T2-weighted magnetic resonance images. Comput. Biol. Med. 40(10), 811–822. https://doi.org/10.1016/j.compbiomed.2010.08.004 (2010).
Kalavathi, P. & Prasath, V. B. S. Methods on skull stripping of MRI head scan images-a review. J. Digit. Imaging 29(3), 365–379. https://doi.org/10.1007/s10278-015-9847-8 (2016).
Makropoulos, A., Counsell, S. J. & Rueckert, D. A review on automatic fetal and neonatal brain MRI segmentation. Neuroimage 170, 231–248. https://doi.org/10.1016/j.neuroimage.2017.06.074 (2018).
Salehi, S. S. M., Erdogmus, D. & Gholipour, A. Auto-context Convolutional Neural Network (Auto-Net) for brain extraction in magnetic resonance imaging. IEEE Trans. Med. Imaging 36(11), 2319–2330. https://doi.org/10.1109/TMI.2017.2721362 (2017).
Chen, J. V. et al. Factors and labor cost savings associated with successful pediatric imaging without anesthesia: A Single-Institution Study. Acad. Radiol. S1076–6332(22), 00697–00703. https://doi.org/10.1016/j.acra.2022.12.041 (2023).
CABINET | Zenodo. https://zenodo.org/record/7843888. Accessed June 22, 2023.
Wang, L. et al. iBEAT V2.0: A multi-site applicable, deep learning-based pipeline for infant cerebral cortical surface reconstruction. Nat. Protoc. 18(5), 1488–1509. https://doi.org/10.1038/s41596-023-00806-x (2023).
Wu, Y. W. et al. Trial of erythropoietin for hypoxic-ischemic encephalopathy in newborns. N Engl. J. Med. 387(2), 148–159. https://doi.org/10.1056/NEJMoa2119660 (2022).
Juul, S. E. et al. High-dose erythropoietin for asphyxia and encephalopathy (HEAL): A randomized controlled trial—background, aims, and study protocol. Neonatology 113(4), 331–338. https://doi.org/10.1159/000486820 (2018).
Wisnowski, J. L. et al. Integrating neuroimaging biomarkers into the multicentre, high-dose erythropoietin for asphyxia and encephalopathy (HEAL) trial: Rationale, protocol and harmonisation. BMJ Open 11(4), e043852. https://doi.org/10.1136/bmjopen-2020-043852 (2021).
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18(2), 203–211. https://doi.org/10.1038/s41592-020-01008-z (2021).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. arXiv: https://doi.org/10.48550/arXiv.1505.04597 (2015).
Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage 31(3), 1116–1128. https://doi.org/10.1016/j.neuroimage.2006.01.015 (2006).
Edwards, A. D. et al. The developing human connectome project neonatal data release. Front. Neurosci. https://doi.org/10.3389/fnins.2022.886772 (2022).
Evans, A. C. The NIH MRI study of normal brain development. NeuroImage 30(1), 184–202. https://doi.org/10.1016/j.neuroimage.2005.09.068 (2006).
Acknowledgements
The authors would like to thank and acknowledge all the members of the HEAL MRI committee who harmonized, processed, and curated the MRI data used in this study: Jessica Wisnowski, Bob McKinstry, and Amit Mathur.
Author information
Contributions
J.V.C.: study design, literature search, data acquisition or analysis, manuscript drafting, manuscript figures/tables, manuscript revision. Y.L.: study design, data acquisition or analysis, manuscript drafting, manuscript revision. F.T.: data acquisition or analysis, manuscript drafting, manuscript figures/tables, manuscript revision. G.C.: data acquisition or analysis, manuscript revision. C.L.: manuscript revision. A.L.: manuscript revision. AMR: manuscript revision. A.P.H.: manuscript figures/tables, manuscript revision. Y.W.W.: data acquisition or analysis, manuscript revision. E.C.: study design, literature search, data acquisition or analysis, manuscript drafting, manuscript figures/tables, manuscript revision. All authors reviewed and approved final manuscript.
Ethics declarations
Competing interests
Authors have no relevant disclosures. AMR otherwise discloses, unrelated to this work: Research support from GE Healthcare; Consulting income from Arterys, Inc (now Tempus).
Cite this article
Chen, J.V., Li, Y., Tang, F. et al. Automated neonatal nnU-Net brain MRI extractor trained on a large multi-institutional dataset. Sci Rep 14, 4583 (2024). https://doi.org/10.1038/s41598-024-54436-8