Parallel inference for massive distributed spatial data using low-rank models

Abstract

Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster (parallel) computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference that does not require processing all data at a single central computing node. We show that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly, meaning that the results are the same as for a traditional, non-distributed analysis. The communication cost of our distributed algorithms does not depend on the number of data points. After extending our results to the spatio-temporal case, we illustrate our methodology by carrying out distributed spatio-temporal particle filtering inference on total precipitable water measured by three different satellite sensor systems.

References

  • Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Stat. Soc. Ser. B 70(4), 825–848 (2008). doi:10.1111/j.1467-9868.2008.00663.x

  • Bevilacqua, M., Gaetan, C., Mateu, J., Porcu, E.: Estimating space and space-time covariance functions for large data sets: a weighted composite likelihood approach. J. Am. Stat. Assoc. 107(497), 268–280 (2012). doi:10.1080/01621459.2011.646928

  • Bradley, J.R., Cressie, N., Shi, T.: A comparison of spatial predictors when datasets could be very large (2014). arXiv:1410.7748

  • Calder, C.A.: Dynamic factor process convolution models for multivariate space-time data with application to air quality assessment. Environ. Ecol. Stat. 14(3), 229–247 (2007). doi:10.1007/s10651-007-0019-y

  • Caragea, P.C., Smith, R.L.: Asymptotic properties of computationally efficient alternative estimators for a class of multivariate normal models. J. Multivar. Anal. 98(7), 1417–1440 (2007). doi:10.1016/j.jmva.2006.08.010

  • Caragea, P.C., Smith, R.L.: Approximate likelihoods for spatial processes. Technical Report, University of North Carolina, Chapel Hill, NC (2008)

  • Cortés, J.: Distributed kriged Kalman filter for spatial estimation. IEEE Trans. Autom. Control 54(12), 2816–2827 (2009)

  • Cressie, N., Johannesson, G.: Fixed rank kriging for very large spatial data sets. J. R. Stat. Soc. Ser. B 70(1), 209–226 (2008)

  • Cressie, N., Shi, T., Kang, E.L.: Fixed rank filtering for spatio-temporal data. J Comput. Graph. Stat. 19(3), 724–745 (2010)

  • Curriero, F., Lele, S.: A composite likelihood approach to semivariogram estimation. J. Agric. Biol. Environ. Stat. 4(1), 9–28 (1999)

  • Douc, R., Cappé, O., Moulines, E.: Comparison of resampling schemes for particle filtering. In: ISPA 2005: Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, pp. 64–69 (2005). doi:10.1109/ISPA.2005.195385

  • Eidsvik, J., Shaby, B.A., Reich, B.J., Wheeler, M., Niemi, J.: Estimation and prediction in spatial models with block composite likelihoods using parallel computing. J. Comput. Graph. Stat. 23(2), 295–315 (2014)

  • Finley, A., Banerjee, S., Gelfand, A.E.: Bayesian dynamic modeling for large space-time datasets using Gaussian predictive processes. J. Geogr. Syst. 14, 29–47 (2012)

  • Finley, A.O., Sang, H., Banerjee, S., Gelfand, A.E.: Improving the performance of predictive process modeling for large datasets. Comput. Stat. Data Anal. 53(8), 2873–2884 (2009). doi:10.1016/j.csda.2008.09.008

  • Forsythe, J.M., Dodson, J.B., Partain, P.T., Kidder, S.Q., Haar, T.H.V.: How total precipitable water vapor anomalies relate to cloud vertical structure. J. Hydrometeorol. 13(2), 709–721 (2012)

  • Fuller, S.H., Millett, L.I. (eds.): The Future of Computing Performance: Game Over or Next Level?. Committee on Sustaining Growth in Computing Performance; National Research Council, Washington, DC (2011)

  • Gordon, N., Salmond, D., Smith, A.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. Radar Signal Process. 140(2), 107–113 (1993)

  • Graham, R., Cortés, J.: Cooperative adaptive sampling of random fields with partially known covariance. Int. J. Robust Nonlinear Control 22(5), 504–534 (2012)

  • Harville, D.A.: Matrix Algebra from a Statistician’s Perspective. Springer, New York (1997)

  • Henderson, H., Searle, S.: On deriving the inverse of a sum of matrices. SIAM Rev. 23(1), 53–60 (1981)

  • Higdon, D.: A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environ. Ecol. Stat. 5(2), 173–190 (1998)

  • Kang, E.L., Cressie, N.: Bayesian inference for the spatial random effects model. J. Am. Stat. Assoc. 106(495), 972–983 (2011)

  • Kang, E.L., Liu, D., Cressie, N.: Statistical analysis of small-area data based on independence, spatial, non-hierarchical, and hierarchical models. Comput. Stat. Data Anal. 53(8), 3016–3032 (2009). doi:10.1016/j.csda.2008.07.033

  • Kanter, M.: Unimodal spectral windows. Stat. Probab. Lett. 34(4), 403–411 (1997). http://linkinghub.elsevier.com/retrieve/pii/S0167715296002088

  • Katzfuss, M.: Bayesian nonstationary spatial modeling for very large datasets. Environmetrics 24(3), 189–200 (2013)

  • Katzfuss, M.: A multi-resolution approximation for massive spatial datasets. J. Am. Stat. Assoc. (2015). doi:10.1080/01621459.2015.1123632

  • Katzfuss, M., Cressie, N.: Maximum likelihood estimation of covariance parameters in the spatial-random-effects model. In: Proceedings of the Joint Statistical Meetings, American Statistical Association, Alexandria, VA, pp 3378–3390 (2009)

  • Katzfuss, M., Cressie, N.: Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets. J. Time Ser. Anal. 32(4), 430–446 (2011). doi:10.1111/j.1467-9892.2011.00732.x

  • Katzfuss, M., Cressie, N.: Bayesian hierarchical spatio-temporal smoothing for very large datasets. Environmetrics 23(1), 94–107 (2012)

  • Kidder, S.Q., Jones, A.S.: A blended satellite total precipitable water product for operational forecasting. J. Atmos. Ocean. Technol. 24(1), 74–81 (2007)

  • Lemos, R.T., Sansó, B.: A spatio-temporal model for mean, anomaly, and trend fields of North Atlantic sea surface temperature. J. Am. Stat. Assoc. 104(485), 5–18 (2009). doi:10.1198/jasa.2009.0018

  • Lindgren, F., Rue, H., Lindström, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Stat. Soc. Ser. B 73(4), 423–498 (2011)

  • Mardia, K., Goodall, C., Redfern, E., Alonso, F.: The kriged Kalman filter. Test 7(2), 217–282 (1998)

  • Nguyen, H., Cressie, N., Braverman, A.: Spatial statistical data fusion for remote sensing applications. J. Am. Stat. Assoc. 107(499), 1004–1018 (2012)

  • Nguyen, H., Katzfuss, M., Cressie, N., Braverman, A.: Spatio-temporal data fusion for very large remote sensing datasets. Technometrics 56(2), 174–185 (2014)

  • Nychka, D.W., Bandyopadhyay, S., Hammerling, D., Lindgren, F., Sain, S.R.: A multi-resolution Gaussian process model for the analysis of large spatial data sets. J. Comput. Graph. Stat. 24(2), 579–599 (2015)

  • Rao, B., Durrant-Whyte, H., Sheen, J.: A fully decentralized multi-sensor system for tracking and surveillance. Int. J. Robot. Res. 12(1), 20–44 (1993). doi:10.1177/027836499301200102

  • Sang, H., Jun, M., Huang, J.Z.: Covariance approximation for large multivariate spatial datasets with an application to multiple climate model errors. Ann. Appl. Stat. 5(4), 2519–2548 (2011)

  • Sherman, J., Morrison, W.: Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Ann. Math. Stat. 21(1), 124–127 (1950)

  • Shi, T., Cressie, N.: Global statistical analysis of MISR aerosol data: a massive data product from NASA’s Terra satellite. Environmetrics 18, 665–680 (2007)

  • Shoshani, A., Klasky, S., Ross, R.: Scientific data management: Challenges and approaches in the extreme scale era. In: Proceedings of the 2010 Scientific Discovery through Advanced Computing (SciDAC) Conference, Chattanooga, TN, 1, pp 353–366 (2010)

  • Stein, M.L.: Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York (1999)

  • Stein, M.L.: Limitations on low rank approximations for covariance matrices of spatial data. Spat. Stat. 8, 1–19 (2014). doi:10.1016/j.spasta.2013.06.003

  • Stein, M.L., Chi, Z., Welty, L.: Approximating likelihoods for large spatial data sets. J. Roy. Stat. Soc. B 66(2), 275–296 (2004)

  • Vecchia, A.: Estimation and model identification for continuous spatial processes. J. R. Stat. Soc. Ser. B 50(2), 297–312 (1988)

  • Wikle, C.K., Cressie, N.: A dimension-reduced approach to space-time Kalman filtering. Biometrika 86(4), 815–829 (1999)

  • Woodbury, M.: Inverting modified matrices. Memorandum Report 42, Statistical Research Group, Princeton University (1950)

  • Xu, B., Wikle, C.K., Fox, N.: A kernel-based spatio-temporal dynamical model for nowcasting radar precipitation. J. Am. Stat. Assoc. 100(472), 1133–1144 (2005)

  • Xu, Y., Choi, J.: Adaptive sampling for learning Gaussian processes using mobile sensor networks. Sensors 11(3), 3051–3066 (2011). doi:10.3390/s110303051

  • Zhang, K.: ISSCC 2013: Memory trends. http://www.electroiq.com/articles/sst/2013/02/isscc-2013-memory-trends.html (2013). Accessed 12 June 2013

Acknowledgments

This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Katzfuss was partially supported by NASA’s Earth Science Technology Office AIST-14 program and by National Science Foundation (NSF) Grant DMS-1521676. Hammerling’s research also had partial support from the NSF Research Network on Statistics in the Atmosphere and Ocean Sciences (STATMOS) through Grant DMS-1106862. We would like to acknowledge high-performance computing support from Yellowstone (ark:/85065/d7wd3xhc) provided by NCAR’s Computational and Information Systems Laboratory, sponsored by the National Science Foundation. We would like to thank Amy Braverman for making us aware of the problem of distributed spatial data; John Forsythe and Stan Kidder for the datasets and helpful advice; Yoichi Shiga for support with preprocessing and visualizing the data; and Andrew Zammit Mangion, Emtiyaz Khan, Kirk Borne, Jessica Matthews, Emily Kang, several anonymous reviewers, and the SAMSI Massive Datasets Environment and Climate working group for helpful comments and discussions.

Author information

Correspondence to Matthias Katzfuss.

Appendices

Appendix 1: Derivation of the likelihood

We derive here the expression of the likelihood in (5). First, note that \(\mathbf {z}_{1:J} | {\varvec{\theta }}\sim N_n(\mathbf {B}_{1:J}{\varvec{\nu }}_0,{\varvec{\Sigma }}_{1:J})\), where \({\varvec{\Sigma }}_{1:J} = \mathbf {B}_{1:J} \mathbf {K}_0 \mathbf {B}_{1:J}' + \mathbf {V}_{1:J}\). Hence, the likelihood is given by,

$$\begin{aligned} -2 \log [\mathbf {z}_{1:J} | {\varvec{\theta }}]&= \log |{\varvec{\Sigma }}_{1:J}| \\&\quad + (\mathbf {z}_{1:J} - \mathbf {B}_{1:J}{\varvec{\nu }}_0)'{\varvec{\Sigma }}_{1:J}^{-1}(\mathbf {z}_{1:J} - \mathbf {B}_{1:J}{\varvec{\nu }}_0)\\&\quad + n\log (2\pi ). \end{aligned}$$

Applying a matrix determinant lemma (e.g., Harville 1997, Thm. 18.1.1), we can write the log determinant as,

$$\begin{aligned} \log |{\varvec{\Sigma }}_{1:J}|&= \log |\mathbf {V}_{1:J}| + \log |\mathbf {K}_0| \\&\quad + \log | \mathbf {B}_{1:J}'\mathbf {V}_{1:J}^{-1} \mathbf {B}_{1:J} + \mathbf {K}_0^{-1} | \\&= \textstyle \sum _{j=1}^J \log |\mathbf {V}_j| - \log |\mathbf {K}_0^{-1}| + \log |\mathbf {K}_z^{-1}|. \end{aligned}$$

Further, using the Sherman-Morrison-Woodbury formula (Sherman and Morrison 1950; Woodbury 1950; Henderson and Searle 1981), we can show that \({\varvec{\Sigma }}_{1:J}^{-1} = \mathbf {V}_{1:J}^{-1} - \mathbf {V}_{1:J}^{-1} \mathbf {B}_{1:J} \mathbf {K}_z \mathbf {B}_{1:J}' \mathbf {V}_{1:J}^{-1}\), and so

$$\begin{aligned}&(\mathbf {z}_{1:J} - \mathbf {B}_{1:J}{\varvec{\nu }}_0)'{\varvec{\Sigma }}_{1:J}^{-1}(\mathbf {z}_{1:J} - \mathbf {B}_{1:J}{\varvec{\nu }}_0) \\&\quad = \textstyle \sum _{j=1}^J (\mathbf {z}_j - \mathbf {B}_j{\varvec{\nu }}_0)'\mathbf {V}_j^{-1}(\mathbf {z}_j - \mathbf {B}_j{\varvec{\nu }}_0) \\&\qquad - \big (\sum _{j=1}^J \mathbf {B}_j'\mathbf {V}_j^{-1}(\mathbf {z}_j-\mathbf {B}_j{\varvec{\nu }}_0)\big )'\mathbf {K}_z\\&\qquad \times \textstyle \big (\sum _{j=1}^J \mathbf {B}_j'\mathbf {V}_j^{-1}(\mathbf {z}_j-\mathbf {B}_j{\varvec{\nu }}_0)\big )\\&\quad = \sum _j \mathbf {z}_j'\mathbf {V}_j^{-1} \mathbf {z}_j -2 {\varvec{\nu }}_0'(\mathbf {K}_z^{-1}{\varvec{\nu }}_z-\mathbf {K}_0^{-1}{\varvec{\nu }}_0) \\&\qquad + {\varvec{\nu }}_0'(\mathbf {K}_z^{-1}- \mathbf {K}_0^{-1}) {\varvec{\nu }}_0 \\&\qquad - \big ( (\mathbf {K}_z^{-1}{\varvec{\nu }}_z-\mathbf {K}_0^{-1}{\varvec{\nu }}_0)-(\mathbf {K}_z^{-1}-\mathbf {K}_0^{-1}){\varvec{\nu }}_0\big )'\mathbf {K}_z\\&\qquad \times \big ( (\mathbf {K}_z^{-1}{\varvec{\nu }}_z-\mathbf {K}_0^{-1}{\varvec{\nu }}_0)-(\mathbf {K}_z^{-1}-\mathbf {K}_0^{-1}){\varvec{\nu }}_0\big )\\&= \sum _j \mathbf {z}_j'\mathbf {V}_j^{-1} \mathbf {z}_j - 2{\varvec{\nu }}_0'\mathbf {K}_z^{-1}{\varvec{\nu }}_z + {\varvec{\nu }}_0'\mathbf {K}_0^{-1}{\varvec{\nu }}_0 + {\varvec{\nu }}_0'\mathbf {K}_z^{-1}{\varvec{\nu }}_0 \\&\qquad - (\mathbf {K}_z^{-1}{\varvec{\nu }}_z)'\mathbf {K}_z(\mathbf {K}_z^{-1}{\varvec{\nu }}_z)-{\varvec{\nu }}_0'\mathbf {K}_z^{-1}\mathbf {K}_z\mathbf {K}_z^{-1}{\varvec{\nu }}_0 \\&\qquad + 2(\mathbf {K}_z^{-1}{\varvec{\nu }}_z)'\mathbf {K}_z \mathbf {K}_z^{-1} {\varvec{\nu }}_0\\&\quad = \textstyle \sum _j \mathbf {z}_j'\mathbf {V}_j^{-1} \mathbf {z}_j + {\varvec{\nu }}_0'\mathbf {K}_0^{-1}{\varvec{\nu }}_0 - {\varvec{\nu }}_z'\mathbf {K}_z^{-1}{\varvec{\nu }}_z, \end{aligned}$$

where \(\sum _{j=1}^J \mathbf {B}_j'\mathbf {V}_j^{-1}\mathbf {B}_j = \mathbf {K}_z^{-1}-\mathbf {K}_0^{-1}\) and \(\sum _{j=1}^J \mathbf {B}_j'\mathbf {V}_j^{-1}\mathbf {z}_j = \mathbf {K}_z^{-1}{\varvec{\nu }}_z-\mathbf {K}_0^{-1}{\varvec{\nu }}_0\) both follow from (3).
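
As a concrete check of the identities above, the following minimal NumPy sketch (our illustration, not code from the paper; the sizes and variable names J, n_j, r, B, V, K0, nu0 are arbitrary choices) simulates a small low-rank dataset split across J servers, combines the per-server summaries \(\mathbf {B}_j'\mathbf {V}_j^{-1}\mathbf {B}_j\) and \(\mathbf {B}_j'\mathbf {V}_j^{-1}\mathbf {z}_j\), and verifies that the resulting distributed expression for \(-2 \log [\mathbf {z}_{1:J} | {\varvec{\theta }}]\) matches the standard multivariate normal evaluation.

import numpy as np

rng = np.random.default_rng(0)
J, n_j, r = 3, 50, 4                      # servers, data points per server, basis functions
n = J * n_j

# Prior on the basis-function weights: eta ~ N(nu0, K0)
nu0 = rng.standard_normal(r)
A = rng.standard_normal((r, r))
K0 = A @ A.T + r * np.eye(r)
K0inv = np.linalg.inv(K0)

# Per-server basis-function matrices B_j and diagonal error covariances V_j
B = [rng.standard_normal((n_j, r)) for _ in range(J)]
V = [np.diag(0.5 + rng.random(n_j)) for _ in range(J)]

# Simulate z_j = B_j eta + xi_j, xi_j ~ N(0, V_j)
eta = rng.multivariate_normal(nu0, K0)
z = [Bj @ eta + rng.multivariate_normal(np.zeros(n_j), Vj) for Bj, Vj in zip(B, V)]

# Each server j communicates only these r x r and r x 1 summaries
R = [Bj.T @ np.linalg.solve(Vj, Bj) for Bj, Vj in zip(B, V)]
g = [Bj.T @ np.linalg.solve(Vj, zj) for Bj, Vj, zj in zip(B, V, z)]

# Central node combines them: K_z^{-1} = K_0^{-1} + sum_j R_j, nu_z = K_z (K_0^{-1} nu_0 + sum_j g_j)
Kz_inv = K0inv + sum(R)
Kz = np.linalg.inv(Kz_inv)
nu_z = Kz @ (K0inv @ nu0 + sum(g))

# Distributed -2 log-likelihood from the identities derived above
logdet = (sum(np.linalg.slogdet(Vj)[1] for Vj in V)
          - np.linalg.slogdet(K0inv)[1] + np.linalg.slogdet(Kz_inv)[1])
quad = (sum(zj @ np.linalg.solve(Vj, zj) for zj, Vj in zip(z, V))
        + nu0 @ K0inv @ nu0 - nu_z @ Kz_inv @ nu_z)
m2ll_distributed = logdet + quad + n * np.log(2 * np.pi)

# Reference: direct evaluation with the full n x n covariance Sigma = B K_0 B' + V
B_full = np.vstack(B)
V_full = np.zeros((n, n))
for j, Vj in enumerate(V):
    V_full[j * n_j:(j + 1) * n_j, j * n_j:(j + 1) * n_j] = Vj
Sigma = B_full @ K0 @ B_full.T + V_full
resid = np.concatenate(z) - B_full @ nu0
m2ll_direct = (np.linalg.slogdet(Sigma)[1]
               + resid @ np.linalg.solve(Sigma, resid) + n * np.log(2 * np.pi))

print(np.isclose(m2ll_distributed, m2ll_direct))  # True, up to numerical error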

Appendix 2: Spatial prediction when observed and prediction locations coincide

Here we describe how to carry out spatial prediction when a small number, say q, of the observed locations are also among the desired prediction locations. Define \({\varvec{\delta }}_{P,O}\) to be the vector of the first q elements of \({\varvec{\delta }}^P\), which we assume to correspond to the q observed prediction locations, and let \(\mathbf {P}_j\) be a sparse \(n_j \times q\) matrix with \((\mathbf {P}_j)_{k,l} = I(\mathbf {s}_{j,k} = \mathbf {s}_l^P)\). We write our model in state-space form with an identity evolution equation, \(\mathbf {z}_j = \tilde{\mathbf {B}}_j \tilde{{\varvec{\eta }}} + \tilde{{\varvec{\xi }}}_j\), where \(\tilde{\mathbf {B}}_j {:}{=}(\mathbf {B}_j, \mathbf {P}_j)\), \( \tilde{{\varvec{\eta }}} {:}{=}({\varvec{\eta }}', {\varvec{\delta }}_{P,O}')' \sim N(\tilde{{\varvec{\nu }}}_0,\tilde{\mathbf {K}}_0)\), \(\tilde{{\varvec{\nu }}}_0 {:}{=}({\varvec{\nu }}_0', \mathbf {0}_q')'\), \(\tilde{\mathbf {K}}_0 \) is block-diagonal with first block \(\mathbf {K}_0\) and second block \(diag\{ v_\delta (\mathbf {s}^P_1),\ldots ,v_\delta (\mathbf {s}^P_q)\}\), \(\tilde{{\varvec{\xi }}}_j \sim N_{n_j}(\mathbf {0},\tilde{\mathbf {V}}_j)\), and \(\tilde{\mathbf {V}}_j\) is the same as \(\mathbf {V}_j\) except that the ith diagonal element is now \(v_\epsilon (\mathbf {s}_{j,i})\) if \(\mathbf {s}_{j,i}\) is one of the prediction locations.

The decentralized Kalman filter (Rao et al. 1993) gives \(\tilde{\mathbf {K}}_z^{-1} = \tilde{\mathbf {K}}_{0}^{-1} + \textstyle \sum _{j=1}^J \tilde{\mathbf {R}}_j\) and \(\tilde{{\varvec{\nu }}}_z = \tilde{\mathbf {K}}_z(\tilde{\mathbf {K}}_0^{-1} \tilde{{\varvec{\nu }}}_0 + \textstyle \sum _{j=1}^J \tilde{{\varvec{\gamma }}}_j)\), where \(\tilde{\mathbf {R}}_j {:}{=}\tilde{\mathbf {B}}_j'\tilde{\mathbf {V}}_j^{-1} \tilde{\mathbf {B}}_j\) and \(\tilde{{\varvec{\gamma }}}_j {:}{=}\tilde{\mathbf {B}}_j'\tilde{\mathbf {V}}_j^{-1}\mathbf {z}_j\) are the only quantities that need to be calculated at and transferred from server j, which is feasible due to sparsity if q is not too large. The predictive distribution is then given by \(\mathbf {y}^P | \mathbf {z}_{1:J} \sim N( \tilde{\mathbf {B}}^P \tilde{{\varvec{\nu }}}_z, \tilde{\mathbf {B}}^P \tilde{\mathbf {K}}_z \tilde{\mathbf {B}}^P{}' + \tilde{\mathbf {V}}_\delta ^P )\), where \(\tilde{\mathbf {B}}^P {:}{=}(\mathbf {B}^P, (\mathbf {I}_q, \mathbf {0})')\) and \(\tilde{\mathbf {V}}_\delta ^P {:}{=}diag\{\mathbf {0}_q', v_\delta (\mathbf {s}^P_{q+1}),\ldots ,v_\delta (\mathbf {s}^P_{n_P})\}\).
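
The following NumPy sketch (again our illustration under assumed inputs, not the authors' code; the locations, variances, and placeholder basis matrices are made up) assembles the augmented quantities \(\tilde{\mathbf {B}}_j\), \(\tilde{\mathbf {K}}_0\), and \(\tilde{\mathbf {V}}_j\) for a single server, forms the small \((r+q)\)-dimensional summaries \(\tilde{\mathbf {R}}_j\) and \(\tilde{{\varvec{\gamma }}}_j\) that would be communicated, and evaluates the predictive mean and covariance.

import numpy as np

rng = np.random.default_rng(1)
n_j, r, q, n_P = 40, 4, 3, 10             # data points on server j, basis functions, shared locations, prediction locations

# One-dimensional locations; the first q prediction locations coincide with observed ones
s_j = rng.random(n_j)                                  # observed locations on server j
s_P = np.concatenate([s_j[:q], rng.random(n_P - q)])   # prediction locations

v_eps = 0.2 * np.ones(n_j)                # measurement-error variances v_epsilon(s)
v_del = 0.5 * np.ones(max(n_j, n_P))      # fine-scale variances v_delta(s), constant here

B_j = rng.standard_normal((n_j, r))       # placeholder for basis functions evaluated at s_j
B_P = rng.standard_normal((n_P, r))       # placeholder for basis functions evaluated at s_P
K0, nu0 = np.eye(r), np.zeros(r)
z_j = rng.standard_normal(n_j)            # placeholder data from server j

# P_j: sparse n_j x q indicator matrix, (P_j)_{k,l} = I(s_{j,k} = s^P_l)
P_j = (s_j[:, None] == s_P[None, :q]).astype(float)

# Augmented state-space quantities
Bt_j = np.hstack([B_j, P_j])                                  # (B_j, P_j)
Kt0 = np.block([[K0, np.zeros((r, q))],
                [np.zeros((q, r)), np.diag(v_del[:q])]])      # block-diagonal Ktilde_0
nut0 = np.concatenate([nu0, np.zeros(q)])
diag_V = v_eps + v_del[:n_j]
diag_V[:q] = v_eps[:q]                    # fine-scale variance moved into the state at shared locations
Vt_j = np.diag(diag_V)

# Only these (r+q)-dimensional summaries leave server j
Rt_j = Bt_j.T @ np.linalg.solve(Vt_j, Bt_j)
gt_j = Bt_j.T @ np.linalg.solve(Vt_j, z_j)

# Decentralized-Kalman-filter combination (a single server shown; sum over j in general)
Ktz_inv = np.linalg.inv(Kt0) + Rt_j
Ktz = np.linalg.inv(Ktz_inv)
nutz = Ktz @ (np.linalg.solve(Kt0, nut0) + gt_j)

# Predictive distribution of y^P: mean Btilde^P nutilde_z, covariance Btilde^P Ktilde_z Btilde^P' + Vtilde_delta^P
Bt_P = np.hstack([B_P, np.vstack([np.eye(q), np.zeros((n_P - q, q))])])
Vt_del_P = np.diag(np.concatenate([np.zeros(q), v_del[q:n_P]]))
pred_mean = Bt_P @ nutz
pred_cov = Bt_P @ Ktz @ Bt_P.T + Vt_del_P
print(pred_mean.shape, pred_cov.shape)    # (10,) (10, 10)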

Cite this article

Katzfuss, M., Hammerling, D. Parallel inference for massive distributed spatial data using low-rank models. Stat Comput 27, 363–375 (2017). https://doi.org/10.1007/s11222-016-9627-4
