How You Split Matters: Data Leakage and Subject Characteristics Studies in Longitudinal Brain MRI Analysis

Rumala, Dewinda J.

doi:10.1007/978-3-031-45249-9_23

Dewinda J. Rumala ORCID: orcid.org/0000-0002-1458-2238

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14242)

Abstract

Deep learning models have revolutionized the field of medical image analysis, offering significant promise for improved diagnostics and patient care. However, their reported performance can be misleadingly optimistic due to a hidden pitfall called ‘data leakage’. In this study, we investigate data leakage in 3D medical imaging, specifically using 3D Convolutional Neural Networks (CNNs) for brain MRI analysis. While 3D CNNs appear less prone to leakage than their 2D counterparts, improper data splitting during cross-validation (CV) can still cause problems, especially with longitudinal imaging data containing repeated scans from the same subject. We explore the impact of different data splitting strategies on model performance for longitudinal brain MRI analysis and identify potential data leakage concerns. GradCAM visualization helps reveal shortcuts in CNN models caused by identity confounding, where the model learns to identify subjects along with diagnostic features. Our findings, consistent with prior research, underscore the importance of subject-wise splitting and of further evaluating the model on hold-out data from different subjects to ensure the integrity and reliability of deep learning models in medical image analysis.
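To make the idea of subject-wise splitting concrete, the following is a minimal Python sketch, not the author's implementation. It assumes each scan carries a subject identifier; the function name `subject_wise_folds` and the toy subject IDs are hypothetical and chosen only for illustration. Subjects, rather than individual scans, are shuffled and dealt into folds, so repeated scans of one subject never land on both sides of a split.

```python
import random
from collections import defaultdict


def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no subject is shared.

    subject_ids[i] is the subject that scan i belongs to (assumed input).
    """
    # Collect the scan indices belonging to each subject.
    scans_by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        scans_by_subject[sid].append(idx)

    # Shuffle subjects (not scans), then deal them round-robin into folds.
    subjects = sorted(scans_by_subject)
    random.Random(seed).shuffle(subjects)
    fold_subjects = [subjects[k::n_folds] for k in range(n_folds)]

    for held_out in fold_subjects:
        held_out_set = set(held_out)
        test_idx = [i for s in held_out for i in scans_by_subject[s]]
        train_idx = [i for i, s in enumerate(subject_ids)
                     if s not in held_out_set]
        yield train_idx, test_idx


# Hypothetical toy data: 6 subjects with 1 to 3 repeated scans each.
subject_ids = ["s1", "s1", "s1", "s2", "s2", "s3",
               "s4", "s4", "s5", "s6", "s6", "s6"]

folds = list(subject_wise_folds(subject_ids, n_folds=3, seed=42))
for train_idx, test_idx in folds:
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    # A subject-wise split never shares a subject between train and test.
    assert train_subjects.isdisjoint(test_subjects)
```

A record-wise split would instead shuffle the scan indices directly, which lets different time points of the same subject appear in both training and test data. Libraries such as scikit-learn offer comparable grouped splitters (for example `GroupKFold`); the hand-written version here only keeps the sketch dependency-free.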

Notes

1. http://adni.loni.usc.edu/methods/mri-analysis

Acknowledgement

This research was funded by the Ministry of Education and Research Technology, Indonesia through the PMDSU scholarship. Special thanks to Prof. I Ketut Eddy Purnama, the author’s PhD supervisor, for securing the research funding and for his valuable ideas and insights, and Prof. Tae-Seong Kim, whose insightful perspectives inspired the development of this paper. Additionally, the author would like to express gratitude to the Bio Imaging Laboratory at Kyung Hee University, South Korea, where the data collection for this study was conducted.

Author information

Authors and Affiliations

Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Dewinda J. Rumala

Corresponding author

Correspondence to Dewinda J. Rumala.

Editor information

Editors and Affiliations

Fraunhofer-Institute for Computer Graphics Research (IGD), Darmstadt, Germany
Stefan Wesarg
King's College London, London, UK
Esther Puyol Antón
Université de Rennes, Rennes, France
John S. H. Baxter
Singapore, Singapore
Marius Erdt
Aachen University of Applied Sciences, Aachen, Germany
Klaus Drechsler
Fraunhofer-Institute for Computer Graphics Research (IGD), Darmstadt, Germany
Cristina Oyarzun Laura
Technion – Israel Institute of Technology, Haifa, Israel
Moti Freiman
Tongji University, Shanghai, China
Yufei Chen
Imperial College London, London, UK
Islem Rekik
Western University, London, ON, Canada
Roy Eagleson
Technical University of Denmark, Kgs. Lyngby, Denmark
Aasa Feragen
King's College London, London, UK
Andrew P. King
University of Copenhagen, Copenhagen, Denmark
Veronika Cheplygina
University of Copenhagen, Copenhagen, Denmark
Melani Ganz-Benjaminsen
Universidad Nacional del Litoral, Santa Fe, Argentina
Enzo Ferrante
Imperial College London, London, UK
Ben Glocker
Vanderbilt University, Nashville, TN, USA
Daniel Moyer
Technical University of Denmark, Kgs. Lyngby, Denmark
Eike Petersen

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 961 KB)

About this paper

Cite this paper

Rumala, D.J. (2023). How You Split Matters: Data Leakage and Subject Characteristics Studies in Longitudinal Brain MRI Analysis. In: Wesarg, S., et al. Clinical Image-Based Procedures, Fairness of AI in Medical Imaging, and Ethical and Philosophical Issues in Medical Imaging. CLIP 2023, EPIMI 2023, FAIMI 2023. Lecture Notes in Computer Science, vol 14242. Springer, Cham. https://doi.org/10.1007/978-3-031-45249-9_23

DOI: https://doi.org/10.1007/978-3-031-45249-9_23
Published: 09 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45248-2
Online ISBN: 978-3-031-45249-9
eBook Packages: Computer Science, Computer Science (R0)
