Big Data and Neuroimaging


Big Data are of increasing importance in many areas, especially in the biosciences. Because advances in these areas can have substantial impact, there is an emerging and critical need for Big Data tools and methods. Statisticians and statistical thinking have a major role to play in making meaningful progress in this arena. We emphasize this point in this special issue because it highlights the dramatic need both for statistical input into Big Data analysis and for more statisticians working on Big Data problems. We use the field of statistical neuroimaging to demonstrate these points, covering several applications and novel methodological developments of Big Data tools applied to neuroimaging data.

The projects described were supported by NIH grants R01 EB012547 and R01 EB016061 from the National Institute of Biomedical Imaging and Bioengineering, and R01 NS060910 from the National Institute of Neurological Disorders and Stroke.

Author information



Corresponding author

Correspondence to Martin A. Lindquist.

About this article

Cite this article

Webb-Vargas, Y., Chen, S., Fisher, A. et al. Big Data and Neuroimaging. Stat Biosci 9, 543–558 (2017).

Keywords


  • Big Data
  • Neuroimaging
  • High-dimensional computation
  • High-dimensional inference
  • High-dimensional causal inference
  • Data fusion
  • Shrinkage
  • Dynamic networks