A scalable Bayesian nonparametric model for large spatio-temporal data

  • Zahra Barzegar
  • Firoozeh Rivaz
Original paper


The Bayesian nonparametric (BNP) approach is an effective tool for building flexible spatio-temporal probability models. Despite its flexibility, the resulting models become computationally demanding when datasets are large. This paper develops a class of computationally efficient, easy-to-implement BNP models for large spatio-temporal data. Specifically, we introduce a random distribution for the spatio-temporal effects based on a stick-breaking construction in which the atoms are modeled in terms of a basis system. In this framework, a low-rank basis approximation models the spatial dependence and a vector autoregressive process models the temporal dependence. We show that the proposed model extends the Gaussian low-rank model while retaining similar computational complexity, and hence scales well to large spatio-temporal data. We assess the performance of the proposed model through a simulation study and illustrate it with an analysis of precipitation measurements.
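As a rough illustration of the ingredients named above, the following sketch simulates spatio-temporal effects from a truncated stick-breaking prior whose atoms are low-rank basis expansions with VAR(1) coefficients. It is not the authors' model specification: the function names, the Gaussian kernel basis, the diagonal propagator matrix, the truncation level `K`, and the per-site allocation of atoms are all assumptions made for illustration.

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights: p_k = V_k * prod_{j<k} (1 - V_j)."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last piece takes whatever stick remains
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def gaussian_basis(sites, knots, bandwidth):
    """n x r matrix of Gaussian kernel basis functions (an assumed basis choice)."""
    d2 = ((sites[:, None, :] - knots[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / bandwidth**2)

def simulate_effects(n=500, r=10, T=20, K=15, alpha=1.0, seed=0):
    """Simulate an n x T array of effects; all settings here are hypothetical."""
    rng = np.random.default_rng(seed)
    sites = rng.uniform(size=(n, 2))   # observation locations in the unit square
    knots = rng.uniform(size=(r, 2))   # r << n basis centres (low rank)
    S = gaussian_basis(sites, knots, bandwidth=0.2)

    weights = stick_breaking_weights(alpha, K, rng)

    # Each of the K atoms is an r-dimensional VAR(1) path of basis coefficients.
    H = 0.8 * np.eye(r)                # assumed stable propagator matrix
    eta = np.zeros((K, T, r))
    eta[:, 0, :] = rng.normal(size=(K, r))
    for t in range(1, T):
        eta[:, t, :] = eta[:, t - 1, :] @ H.T + 0.5 * rng.normal(size=(K, r))

    # Map each atom's coefficients to a spatial field: shape (K, n, T).
    fields = np.einsum("nr,ktr->knt", S, eta)

    # Assumption: each site draws one atom label from the stick-breaking weights.
    labels = rng.choice(K, size=n, p=weights)
    effects = fields[labels, np.arange(n), :]   # shape (n, T)
    return weights, S, effects

weights, S, effects = simulate_effects()
print(weights.round(3), S.shape, effects.shape)
```

In this sketch, every large-matrix operation involves only the n x r basis matrix and r-dimensional coefficient vectors. This illustrates why a low-rank construction of this kind can keep the cost close to that of a Gaussian low-rank model.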


Keywords: Large datasets · Stick-breaking process · Non-stationarity · Non-Gaussianity



The Editor and two referees are gratefully acknowledged. Their precise comments and constructive suggestions have substantially improved the manuscript.

Supplementary material

Supplementary material 1: 180_2019_905_MOESM1_ESM.pdf (PDF, 67 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Statistics, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
