Competition on Spatial Statistics for Large Datasets

Abstract

As spatial datasets become increasingly large and unwieldy, exact inference on spatial models becomes computationally prohibitive. Various approximation methods have been proposed to reduce the computational burden. Although comprehensive reviews of these approximation methods exist, comparisons of their performance are limited to small and medium-sized datasets and to a few selected methods. To achieve a comprehensive comparison covering as many methods as possible, we organized the Competition on Spatial Statistics for Large Datasets. This competition had the following novel features: (1) we generated synthetic datasets with the ExaGeoStat software so that the number of generated realizations ranged from 100 thousand to 1 million; (2) we systematically designed the data-generating models to represent spatial processes with a wide range of statistical properties for both Gaussian and non-Gaussian cases; (3) the competition tasks included both estimation and prediction, and the results were assessed by multiple criteria; and (4) we have made all the datasets and competition results publicly available to serve as a benchmark for other approximation methods. In this paper, we disclose all the competition details and results along with some analysis of the competition outcomes.
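
To make the computational burden mentioned above concrete, the following is a minimal sketch of an exact zero-mean Gaussian log-likelihood evaluation. It is an illustration only, not the competition's data-generating model and not how ExaGeoStat is implemented. The exponential covariance (the Matérn covariance with smoothness 0.5), the small nugget, and all function and parameter names are assumptions chosen for brevity.

```python
import math


def exponential_cov(locs, variance, range_, nugget=1e-8):
    """Dense covariance matrix under an exponential kernel (assumed here;
    it is the Matern covariance with smoothness 0.5)."""
    n = len(locs)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(locs[i], locs[j])
            cov[i][j] = variance * math.exp(-d / range_)
        cov[i][i] += nugget  # tiny nugget for numerical stability
    return cov


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite
    matrix. The nested loops cost O(n^3) operations."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_loglik(z, locs, variance, range_):
    """Exact log-likelihood of observations z at locations locs."""
    n = len(z)
    low = cholesky(exponential_cov(locs, variance, range_))
    # Forward substitution: solve low * y = z.
    y = [0.0] * n
    for i in range(n):
        y[i] = (z[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad_form = sum(v * v for v in y)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad_form)
```

Each likelihood evaluation stores an n × n matrix and factorizes it at O(n^3) cost. At n = 1 million, the dense covariance matrix alone would need about 8 TB in double precision, which is why approximation methods are needed at the scale considered in the competition.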

Funding

Funding was provided by King Abdullah University of Science and Technology.

Author information

Corresponding author

Correspondence to Marc G. Genton.

Supplementary Information

Supplementary material 1 (PDF, 1227 KB)

About this article

Cite this article

Huang, H., Abdulah, S., Sun, Y. et al. Competition on Spatial Statistics for Large Datasets. JABES 26, 580–595 (2021). https://doi.org/10.1007/s13253-021-00457-z

  • DOI: https://doi.org/10.1007/s13253-021-00457-z

Keywords

  • Gaussian processes
  • Matérn covariance function
  • Parameter estimation
  • Prediction
  • Tukey g-and-h random fields