Quantifying Confounding Bias in Neuroimaging Datasets with Causal Inference
Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets available today are, on their own, too small to train complex machine learning models. A potential solution is to increase sample size by pooling scans from several datasets. In this work, we combine 12,207 MRI scans from 15 studies and show that simple pooling is often ill-advised because it introduces various types of bias into the training data. First, we systematically define these biases. Second, we detect bias by showing experimentally that scans can be correctly assigned to their dataset of origin with 73.3% accuracy. Finally, we propose to tell causal from confounding factors by using causal inference to quantify the extent of confounding and causality in a single dataset. We achieve this by finding the simplest graphical model in terms of Kolmogorov complexity. Because Kolmogorov complexity is not computable, we approximate it with the minimum description length principle. We show empirically that our approach estimates plausible causal relationships from real neuroimaging data.
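The bias-detection step described above (assigning scans to their dataset of origin) can be illustrated with a minimal sketch. It is not the authors' pipeline: the features are synthetic Gaussian draws, the per-dataset `offset` is a hypothetical stand-in for site or scanner effects, and a nearest-centroid classifier is swapped in because the abstract does not say which classifier was used. All function names are illustrative. The idea is only that test accuracy well above chance signals dataset-specific bias.

```python
import random

def make_scans(n, offset, rng, dim=5):
    """Draw n synthetic feature vectors; `offset` mimics a site-specific shift."""
    return [[rng.gauss(offset, 1.0) for _ in range(dim)] for _ in range(n)]

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def name_that_dataset(offsets, n_train=200, n_test=200, seed=0):
    """Test accuracy of a nearest-centroid dataset-of-origin classifier."""
    rng = random.Random(seed)
    # "Training": one centroid per hypothetical dataset.
    centroids = [centroid(make_scans(n_train, off, rng)) for off in offsets]
    correct = total = 0
    # "Testing": assign fresh scans to the closest centroid.
    for label, off in enumerate(offsets):
        for scan in make_scans(n_test, off, rng):
            pred = min(range(len(centroids)),
                       key=lambda k: sq_dist(scan, centroids[k]))
            correct += pred == label
            total += 1
    return correct / total

chance = 1 / 3
biased = name_that_dataset([0.0, 1.0, 2.0])    # datasets differ systematically
unbiased = name_that_dataset([0.0, 0.0, 0.0])  # datasets are exchangeable
print(f"biased: {biased:.3f}  unbiased: {unbiased:.3f}  chance: {chance:.3f}")
```

When the synthetic datasets share one distribution, accuracy stays near chance. With a systematic shift it rises well above chance, which is the kind of signal the 73.3% figure reports for real scans.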
This research was partially supported by the Bavarian State Ministry of Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B).