pp 1–4

Comments on: Data science, big data and statistics

  • Marc G. Genton
  • Ying Sun


We would like to start by congratulating the authors on a very timely and stimulating paper. They have provided thought-provoking ideas on Data Science and Big Data, and on how Statistics must play a major role in these new areas. We focus our discussion on two points that caught our attention and interest: visualization and computation for new sources of information.

Visualization for new sources of information

Traditionally, Statistics has dealt with scalar and vector-valued observations. However, as the authors note, advances in technology have greatly facilitated the collection of large-scale, high-dimensional data in many research fields. Among the various types of high-dimensional data, spatiotemporal data and functional data have been particularly popular. Classical statistical methodologies face many challenges with such datasets because they often contain massive numbers of observations and non-Gaussian features, and they may exhibit complex spatiotemporal dynamics…

Mathematics Subject Classification

62M30, 62H30

Copyright information

© Sociedad de Estadística e Investigación Operativa 2019

Authors and Affiliations

  1. Statistics Program, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia