Emerging Shifts in Neuroimaging Data Analysis in the Era of “Big Data”

  • Danilo Bzdok
  • Marc-Andre Schulz
  • Martin Lindquist


Advances in positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) have revolutionized our understanding of human cognition and its neurobiological basis. However, a modern imaging setup often costs several million dollars and requires highly trained technicians to conduct data acquisition. Brain-imaging studies are typically laborious in logistics and data management, and they require infrastructure that is costly to maintain. The small number of participants typically scanned per study has precluded the deployment, and the potential benefits, of advanced statistical methods that tend to require more data (Bzdok and Yeo, NeuroImage 155:549–564, 2017; Efron and Hastie, Computer age statistical inference, 2016). In this chapter we discuss how the increased information granularity of burgeoning neuroimaging data repositories, in both the number of participants and the number of variables measured per participant, will motivate and require new statistical approaches in everyday data analysis. We place particular emphasis on the implications for the future of precision psychiatry, where brain imaging has the potential to improve diagnosis, risk detection, and treatment choice through clinical-endpoint prediction in single patients. We argue that the statistical properties of approaches tailored to the data-rich setting promise improved clinical translation of empirically justified single-patient prediction in a fast, cost-effective, and pragmatic manner.
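The claim that data-hungry methods pay off only with larger cohorts, and that the clinical goal is prediction for individual, previously unseen patients, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: it assumes synthetic Gaussian data (500 measured variables per participant, only 20 of them informative) and uses a nearest-centroid classifier chosen purely for simplicity. All function names are hypothetical. The sketch estimates how accurately the group of each held-out participant is predicted as the training cohort grows.

```python
import numpy as np


def simulate_cohort(n_per_class, n_features, n_informative, effect, rng):
    """Draw a synthetic two-group cohort (hypothetical patients vs. controls).

    Assumption: only the first `n_informative` features carry a group
    difference of size `effect`; all other features are pure noise, mimicking
    many measured variables per participant with a weak, sparse signal.
    """
    shift = np.zeros(n_features)
    shift[:n_informative] = effect
    x0 = rng.standard_normal((n_per_class, n_features))
    x1 = rng.standard_normal((n_per_class, n_features)) + shift
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return x, y


def fit_nearest_centroid(x, y):
    """Estimate one mean vector (centroid) per group from the training data."""
    return np.stack([x[y == k].mean(axis=0) for k in (0, 1)])


def predict_nearest_centroid(centroids, x):
    """Assign each new participant to the group with the closest centroid."""
    # Squared Euclidean distance from every participant to each centroid.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def held_out_accuracy(n_train_per_class, n_reps=20, n_test_per_class=500,
                      n_features=500, n_informative=20, effect=0.5, seed=0):
    """Average accuracy of single-participant predictions on unseen data."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_reps):
        x_tr, y_tr = simulate_cohort(n_train_per_class, n_features,
                                     n_informative, effect, rng)
        # A fresh, independent test cohort: predictions are out-of-sample.
        x_te, y_te = simulate_cohort(n_test_per_class, n_features,
                                     n_informative, effect, rng)
        centroids = fit_nearest_centroid(x_tr, y_tr)
        accs.append(np.mean(predict_nearest_centroid(centroids, x_te) == y_te))
    return float(np.mean(accs))


if __name__ == "__main__":
    for n in (10, 50, 500):
        acc = held_out_accuracy(n)
        print(f"{2 * n:5d} training participants -> held-out accuracy {acc:.2f}")
```

Running the script prints the held-out accuracy for 20, 100, and 1,000 training participants. Under these assumptions, accuracy for individual test participants should rise with cohort size, because the estimation noise accumulated across the many uninformative features in each centroid shrinks as more participants are added.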


Keywords: Neuroimaging · Big data · Brain-imaging studies · MRI


  1. Alfaro-Almagro F et al (2018) Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. NeuroImage 166:400–424
  2. Arbabshirani MR et al (2017) Single subject prediction of brain disorders in neuroimaging: promises and pitfalls. NeuroImage 145:137–165
  3. Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
  4. Berkson J (1938) Some difficulties of interpretation encountered in the application of the chi-square test. J Am Stat Assoc 33(203):526–536
  5. Bickel PJ, Doksum KA (2007) Mathematical statistics: basic ideas and selected topics. Pearson, Upper Saddle River
  6. Bishop CM (2006) Pattern recognition and machine learning. Information science and statistics. Springer, Heidelberg
  7. Bishop CM, Lasserre J (2007) Generative or discriminative? Getting the best of both worlds. Bayesian Stat 8(3):3–24
  8. Bloom DE et al (2012) The global economic burden of noncommunicable diseases. No. 8712. Program on the global demography of aging
  9. Brodersen KH et al (2011) Generative embedding for model-based classification of fMRI data. PLoS Comput Biol 7(6):e1002079
  10. Bzdok D, Meyer-Lindenberg A (2018) Machine learning for precision psychiatry: opportunities and challenges. Biol Psychiatry Cogn Neurosci Neuroimaging 3(3):223–230
  11. Bzdok D, Yeo BTT (2017) Inference in the age of big data: future perspectives on neuroscience. NeuroImage 155:549–564
  12. Bzdok D, Eickenberg M, Varoquaux G, Thirion B (2017) Hierarchical region-network sparsity for high-dimensional inference in brain imaging. Inf Process Med Imaging 10265:323–335
  13. David O et al (2008) Identifying neural drivers with functional MRI: an electrophysiological validation. PLoS Biol 6(12):2683–2697
  14. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York
  15. Drysdale AT et al (2017) Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat Med 23(1):28–38
  16. Editorial (2016) Daunting data. Nature 539:467–468
  17. Efron B (2012) Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press, Cambridge
  18. Efron B, Hastie T (2016) Computer age statistical inference. Cambridge University Press, Cambridge
  19. Fisher RA, Mackenzie WA (1923) Studies in crop variation. II. The manurial response of different potato varieties. J Agric Sci 13(3):311–320
  20. Focke NK et al (2011) Multi-site voxel-based morphometry—not quite there yet. NeuroImage 56(3):1164–1170
  21. Freedman D (1995) Some issues in the foundation of statistics. Found Sci 1(1):19–39
  22. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning. Springer Series in Statistics, New York
  23. Friston K, Penny W (2003) Posterior probability maps and SPMs. NeuroImage 19(3):1240–1249
  24. Friston KJ, Penny W, Phillips C, Kiebel S, Hinton G, Ashburner J (2002) Classical and Bayesian inference in neuroimaging: theory. NeuroImage 16(2):465–483
  25. Friston KJ, Harrison L, Penny W (2003) Dynamic causal modelling. NeuroImage 19(4):1273–1302
  26. Gennatas ED et al (2017) Age-related effects and sex differences in gray matter density, volume, mass, and cortical thickness from childhood to young adulthood. J Neurosci 37(20):5065–5073
  27. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
  28. Gustavsson A et al (2011) Cost of disorders of the brain in Europe 2010. Eur Neuropsychopharmacol 21(10):718–779
  29. Hastie T, Tibshirani R (1990) Generalized additive models. Chapman & Hall, London
  30. Insel TR, Cuthbert BN (2015) Medicine. Brain disorders? Precisely. Science 348(6234):499–500
  31. James G et al (2013) An introduction to statistical learning: with applications in R. Springer, New York
  32. Jebara T (2012) Machine learning: discriminative and generative. Springer Science & Business Media, Berlin
  33. Jordan MI (2011) A message from the president: the era of big data. ISBA Bull 18(2):1–3
  34. Jordan MI et al (1999) An introduction to variational methods for graphical models. Mach Learn 37(2):183–233
  35. Mejia AF, Nebel MB, Shou H, Crainiceanu CM, Pekar JJ, Mostofsky S, Caffo B, Lindquist MA (2015) Improving reliability of subject-level resting-state fMRI parcellation with shrinkage estimators. NeuroImage 112:14–29
  36. Miller RG (1981) Simultaneous statistical inference. Springer, Heidelberg
  37. Miller KL et al (2016) Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat Neurosci 19(11):1523–1536
  38. Murphy KP (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge
  39. Neyman J, Pearson ES (1933) On the problem of the most efficient tests of statistical hypotheses. Phil Trans R Soc Lond A Math Phys Sci 231:289–337
  40. Smith SM, Nichols TE (2018) Statistical challenges in “big data” human neuroimaging. Neuron 97(2):263–268
  41. Sudlow C et al (2015) UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12(3):e1001779
  42. Takao H, Hayashi N, Ohtomo K (2013) Effects of the use of multiple scanners and of scanner upgrade in longitudinal voxel-based morphometry studies. J Magn Reson Imaging 38(5):1283–1291
  43. Valiant LG (1984) A theory of the learnable. Commun ACM 27(11):1134–1142
  44. Vapnik V (1998) Statistical learning theory. Wiley, New York
  45. Varoquaux G, Gramfort A, Poline J-B, Thirion B (2010) Brain covariance selection: better individual functional connectivity models using population prior. Advances in neural information processing systems, pp 2334–2342
  46. Wang H-T et al (2018) Dimensions of experience: exploring the heterogeneity of the wandering mind. Psychol Sci 29(1):56–71
  47. Woo C-W et al (2017) Building better biomarkers: brain models in translational neuroimaging. Nat Neurosci 20(3):365–377
  48. Yang Y, Wainwright MJ, Jordan MI (2016) On the computational complexity of high-dimensional Bayesian variable selection. Ann Stat 44(6):2497–2532

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Danilo Bzdok (1, 2, 3; corresponding author)
  • Marc-Andre Schulz (1)
  • Martin Lindquist (4)
  1. Department of Psychiatry and Psychotherapy, RWTH Aachen University, Aachen, Germany
  2. Jülich Aachen Research Alliance (JARA) – Translational Brain Medicine, Aachen, Germany
  3. Parietal Team, INRIA, Gif-sur-Yvette, France
  4. Department of Biostatistics, Johns Hopkins University, Baltimore, USA
