Abstract
The increasing use of neuroimaging in clinical research has driven the creation of many large imaging datasets. However, these datasets often describe acquisitions using inconsistent naming conventions in image file headers, so time-consuming manual curation is necessary. We therefore sought to automate both the classification and organization of magnetic resonance imaging (MRI) data according to acquisition types common in clinical routine and the transformation of raw, unstructured images into Brain Imaging Data Structure (BIDS) datasets. To do this, we trained an XGBoost model to classify MRI acquisition types from a small number of acquisition parameters that the MRI scanner automatically stores in image file metadata. The predicted types are then mapped to the naming conventions prescribed by BIDS to transform the input images into the BIDS structure. The model recognizes MRI types with 99.475% accuracy, a micro/macro-averaged precision of 0.9995/0.994, a micro/macro-averaged recall of 0.9995/0.989, and a micro/macro-averaged F1 of 0.9995/0.991. Our approach accurately and quickly classifies MRI types and transforms unstructured data into standardized structures with little to no user intervention, lowering the barrier to entry for clinical scientists and increasing the accessibility of existing neuroimaging data.
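The mapping step described above, from a predicted acquisition type to a BIDS-style file location, might look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the classifier label names, the `BIDS_MAP` table, and the `bids_path` helper are all hypothetical, while the datatype folders (`anat`, `dwi`) and suffixes (`T1w`, `T2w`, `FLAIR`, `dwi`) follow the BIDS specification.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical classifier labels -> (BIDS datatype folder, BIDS suffix).
# The label names on the left are illustrative assumptions; the folder and
# suffix names on the right are taken from the BIDS specification.
BIDS_MAP = {
    "T1-weighted": ("anat", "T1w"),
    "T2-weighted": ("anat", "T2w"),
    "FLAIR": ("anat", "FLAIR"),
    "diffusion": ("dwi", "dwi"),
}


def bids_path(subject: str, session: str, label: str,
              run: Optional[int] = None) -> PurePosixPath:
    """Build a BIDS-style relative path for one image (hypothetical helper)."""
    datatype, suffix = BIDS_MAP[label]
    # Entities appear in the order sub, ses, run, followed by the suffix.
    entities = [f"sub-{subject}", f"ses-{session}"]
    if run is not None:
        entities.append(f"run-{run}")
    filename = "_".join(entities + [suffix]) + ".nii.gz"
    return PurePosixPath(f"sub-{subject}") / f"ses-{session}" / datatype / filename


if __name__ == "__main__":
    print(bids_path("01", "01", "T1-weighted"))
    # sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
    print(bids_path("01", "01", "diffusion", run=2))
    # sub-01/ses-01/dwi/sub-01_ses-01_run-2_dwi.nii.gz
```

Keeping the label-to-suffix table separate from the path-building logic means a new acquisition type would only require a new table entry; whether the published tool is organized this way is not stated in the abstract.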
Author information
Contributions
A.B. wrote the main manuscript text, including tables and figures, and conceptualized the approach. A.B., M.S., N.B., and M.D. identified model features. A.B. and C.S. trained and validated the model. A.B. and S.S. implemented the model in a tool to transform MRI data to BIDS. All authors reviewed the manuscript.
Ethics declarations
Competing Interests
The authors declare no competing interests.
About this article
Cite this article
Bartnik, A., Singh, S., Sum, C. et al. An Automated Tool to Classify and Transform Unstructured MRI Data into BIDS Datasets. Neuroinform (2024). https://doi.org/10.1007/s12021-024-09659-5
DOI: https://doi.org/10.1007/s12021-024-09659-5