Big Data and Neuroimaging

  • Yenny Webb-Vargas
  • Shaojie Chen
  • Aaron Fisher
  • Amanda Mejia
  • Yuting Xu
  • Ciprian Crainiceanu
  • Brian Caffo
  • Martin A. Lindquist

Abstract

Big Data are of increasing importance in a variety of areas, especially the biosciences. Because advances in these areas could have substantial impact, there is an emerging, critical need for Big Data tools and methods. Importantly, statisticians and statistical thinking have a major role to play in creating meaningful progress in this arena. We emphasize this point in this special issue, as it highlights the dramatic need both for statistical input in Big Data analysis and for a greater number of statisticians working on Big Data problems. We use the field of statistical neuroimaging to demonstrate these points. Accordingly, this paper covers several applications and novel methodological developments of Big Data tools applied to neuroimaging data.

Keywords

Big Data; Neuroimaging; High-dimensional computation; High-dimensional inference; High-dimensional causal inference; Data fusion; Shrinkage; Dynamic networks

Copyright information

© International Chinese Statistical Association 2017

Authors and Affiliations

  1. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
