A memory-free spatial additive mixed modeling for big spatial data

Murakami, Daisuke; Griffith, Daniel A.

doi:10.1007/s42081-019-00063-x

Original Paper
Spatial statistics
Published: 07 December 2019

Volume 3, pages 215–241, (2020)
Japanese Journal of Statistics and Data Science

Abstract

This study develops a spatial additive mixed modeling (AMM) approach that estimates spatial and non-spatial effects from large samples, such as millions of observations. Although fast AMM approaches are already well established, they are restrictive in that they assume a known spatial dependence structure. To overcome this limitation, this study develops a fast AMM that estimates the spatial structure in residuals and regression coefficients together with non-spatial effects. We rely on a Moran coefficient-based approach to estimate the spatial structure. The proposed approach pre-compresses the large matrices whose size grows with the sample size N before model estimation; thus, the computational complexity of the estimation is independent of the sample size. Furthermore, the pre-compression is done through a block-wise procedure that makes memory consumption independent of N. As a result, the spatial AMM is memory free and fast even for millions of observations. The developed approach is compared with alternatives through Monte Carlo simulation experiments. The results confirm the estimation accuracy of the spatially varying coefficients and group coefficients, as well as the computational efficiency of the developed approach. Finally, we apply our approach to an income analysis using 2015 United States (US) data.
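
The block-wise pre-compression idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, for illustration only, that the compressed quantities are the cross-products X'X and X'y of an ordinary linear regression, and the helper names blockwise_crossproducts and simulated_blocks are hypothetical. The point it shows is that only one block of rows is held in memory at a time, while the accumulated matrices have a size that depends on the number of columns and not on N.

```python
import numpy as np

def blockwise_crossproducts(blocks):
    """Accumulate X'X, X'y and y'y over an iterable of (X_block, y_block) pairs.

    Only one block is in memory at a time; the accumulators are k x k,
    length k and scalar, so their size does not depend on N.
    """
    xtx = xty = None
    yty = 0.0
    n = 0
    for X_b, y_b in blocks:
        X_b = np.asarray(X_b, dtype=float)
        y_b = np.asarray(y_b, dtype=float)
        if xtx is None:
            k = X_b.shape[1]
            xtx = np.zeros((k, k))
            xty = np.zeros(k)
        xtx += X_b.T @ X_b
        xty += X_b.T @ y_b
        yty += float(y_b @ y_b)
        n += X_b.shape[0]
    return xtx, xty, yty, n

def simulated_blocks(n_total, block_size, beta, seed=0):
    """Hypothetical data source: yields simulated blocks instead of reading a file."""
    rng = np.random.default_rng(seed)
    remaining = n_total
    while remaining > 0:
        m = min(block_size, remaining)
        X_b = np.column_stack([np.ones(m), rng.normal(size=(m, len(beta) - 1))])
        y_b = X_b @ beta + rng.normal(scale=0.1, size=m)
        yield X_b, y_b
        remaining -= m

beta_true = np.array([1.0, 2.0, -0.5])
xtx, xty, yty, n = blockwise_crossproducts(
    simulated_blocks(100_000, 10_000, beta_true)
)

# After compression, estimation only touches the small k x k system.
beta_hat = np.linalg.solve(xtx, xty)
```

In this sketch the estimation step after compression costs the same whether N is one hundred thousand or one hundred million; only the single pass over the blocks scales with N.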

Notes

\({\mathbf{w}}\) yields a negatively dependent spatial process if \({\varvec{\Lambda}}\) is defined using the absolute values of negative eigenvalues and \({\mathbf{E}}\) is defined using their corresponding eigenvectors. Our approach is capable of analyzing negative spatial dependence situations (see Griffith, 2006).
They confirmed that the Moran eigenvector approach successfully captures/eliminates residual spatial dependence for large samples up to N = 80,000.
Our MC-based spatial process yields a rank-reduced GP (see Murakami et al., 2017). Therefore, the standard error of the MC-based process can be underestimated if the full GP is regarded as the true process. Murakami and Griffith (2019a) showed that the standard error bias is small if the true process is a positively dependent spatial process, which is the case the MC-based approach is intended for. For fitting a full GP, bias correction (see, e.g., Finley et al. 2009) will be an important future step.
We prefer the radial basis function-based GP approximation (see Kammann and Wand 2003) because it is one of the most popular GP approximations (Liu et al. 2018). In geostatistics, predictive process modeling (Banerjee et al. 2008) and fixed rank kriging (Cressie and Johannesson 2008) also assume radial basis functions. Although soap film smoothing (Wood 2008), the stochastic partial differential equation approach (Rue et al. 2009), and other approaches are also available for low-rank GP modeling, they are relatively computationally demanding. Thus, we adopt the radial basis function approach for the SVC modeling in GeoAMM and GeoAMM*.
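
The eigenvalue and eigenvector selection described in the first note can be sketched as follows. This is a hedged illustration and not the authors' code: it assumes a doubly centered connectivity matrix MCM with M = I - 11'/N, as in standard Moran eigenvector spatial filtering, and it assumes an exponential distance-decay kernel with a hypothetical bandwidth parameter for C. The function name moran_eigenpairs is likewise made up for this example.

```python
import numpy as np

def moran_eigenpairs(coords, bandwidth):
    """Eigen-decompose the doubly centered connectivity matrix MCM.

    C uses an exponential distance-decay kernel with zero diagonal
    (an assumption for this sketch); M = I - 11'/N centers it.
    Returns eigenvalues in descending order with matching eigenvectors.
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = np.exp(-d / bandwidth)
    np.fill_diagonal(C, 0.0)
    M = np.eye(n) - np.ones((n, n)) / n
    evals, evecs = np.linalg.eigh(M @ C @ M)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

rng = np.random.default_rng(1)
coords = rng.uniform(size=(50, 2))  # simulated sample sites
evals, E = moran_eigenpairs(coords, bandwidth=0.2)

# Positive eigenvalues: basis for positively dependent spatial processes.
pos = evals > 1e-10
E_pos, lam_pos = E[:, pos], evals[pos]

# Negative eigenvalues: take absolute values, as in the first note,
# to obtain a basis for negatively dependent spatial processes.
neg = evals < -1e-10
E_neg, lam_neg = E[:, neg], np.abs(evals[neg])
```

The dense eigen-decomposition used here costs O(N^3) and is only practical for small N; the sketch is meant to show the selection rule, not a scalable procedure.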

References

Anselin, L. (2003). Spatial externalities, spatial multipliers, and spatial econometrics. International Regional Science Review, 26(2), 153–166.
Anselin, L. (2010). Thirty years of spatial econometrics. Papers in Regional Science, 89(1), 3–25.
Arbia, G., Ghiringhelli, C., & Mira, A. (2019). Estimation of spatial econometric linear models with large datasets: How big can spatial Big Data be? Regional Science and Urban Economics, 76, 67–73.
Banerjee, S., Gelfand, A. E., Finley, A. O., & Sang, H. (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 825–848.
Bates, D. M. (2010). lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/book/. Accessed 25 Nov 2011.
Cressie, N. (1992). Statistics for spatial data. New York: Wiley.
Cressie, N., & Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1), 209–226.
Datta, A., Banerjee, S., Finley, A. O., & Gelfand, A. E. (2016). Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111(514), 800–812.
Dray, S., Legendre, P., & Peres-Neto, P. R. (2006). Spatial modelling: A comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling, 196(3–4), 483–493.
Drineas, P., & Mahoney, M. W. (2005). On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6, 2153–2175.
Finley, A. O., Sang, H., Banerjee, S., & Gelfand, A. E. (2009). Improving the performance of predictive process modeling for large datasets. Computational Statistics & Data Analysis, 53, 2873–2884.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.
Furrer, R., Genton, M. G., & Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3), 502–523.
Gelfand, A. E., Diggle, P., Guttorp, P., & Fuentes, M. (2010). Handbook of spatial statistics. Boca Raton: CRC Press.
Gelfand, A. E., Kim, H. J., Sirmans, C. F., & Banerjee, S. (2003). Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association, 98(462), 387–396.
Genton, M. G., & Kleiber, W. (2015). Cross-covariance functions for multivariate geostatistics. Statistical Science, 30(2), 147–163.
Goldstein, H. (2011). Multilevel statistical models. West Sussex: Wiley.
Gotway, C. A., & Young, L. J. (2002). Combining incompatible spatial data. Journal of the American Statistical Association, 97(458), 632–648.
Griffith, D. A. (2003). Spatial autocorrelation and spatial filtering: Gaining understanding through theory and scientific visualization. Berlin: Springer.
Griffith, D. A. (2006). Hidden negative spatial autocorrelation. Journal of Geographical Systems, 8(4), 335–355.
Griffith, D. A., & Chun, Y. (2019). Implementing Moran eigenvector spatial filtering for massively large georeferenced datasets. International Journal of Geographical Information Science, 33(9), 1–15.
Griffith, D. A., & Peres-Neto, P. R. (2006). Spatial modeling in ecology: The flexibility of eigenfunction spatial analyses. Ecology, 87(10), 2603–2613.
Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., et al. (2018). A case study competition among methods for analyzing large spatial data. Journal of Agricultural, Biological and Environmental Statistics, 24, 1–28.
Henderson, C. R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31(2), 423–447.
Henderson, C. R. (1984). Applications of linear models in animal breeding. Guelph, ON: University of Guelph.
Hodges, J. S. (2016). Richly parameterized linear models: additive, time series, and spatial models using random effects. Boca Raton: Chapman and Hall/CRC.
Hox, J. (1998). Multilevel modeling: When and why. In I. Balderjahn, R. Mathar, & M. Schader (Eds.), Classification, data analysis, and data highways (pp. 147–154). Berlin: Springer.
Hughes, J., & Haran, M. (2013). Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1), 139–159.
Kammann, E. E., & Wand, M. P. (2003). Geoadditive models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(1), 1–18.
Katzfuss, M. (2017). A multi-resolution approximation for massive spatial datasets. Journal of the American Statistical Association, 112(517), 201–214.
Kelejian, H. H., & Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, 17(1), 99–121.
Kneib, T., Hothorn, T., & Tutz, G. (2009). Variable selection and model choice in geoadditive regression models. Biometrics, 65(2), 626–634.
Krock, M., Kleiber, W., & Becker, S. (2019). Penalized basis models for very large spatial datasets. arXiv:1902.06877.
LeSage, J., & Pace, R. K. (2009). Introduction to Spatial Econometrics. Boca Raton: Chapman and Hall/CRC.
Li, Z., & Wood, S. N. (2019). Faster model matrix crossproducts for large generalized linear models with discretized covariates. Statistics and Computing. https://doi.org/10.1007/s11222-019-09864-2.
Lindgren, F., Rue, H., & Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(4), 423–498.
Liu, H., Ong, Y. S., Shen, X., & Cai, J. (2018). When Gaussian process meets big data: A review of scalable GPs. arXiv:1807.01065.
Moran, P. A. (1950). Notes on continuous stochastic phenomena. Biometrika, 37(1/2), 17–23.
Murakami, D., & Griffith, D. A. (2015). Random effects specifications in eigenvector spatial filtering: A simulation study. Journal of Geographical Systems, 17(4), 311–331.
Murakami, D., & Griffith, D. A. (2019a). Eigenvector spatial filtering for large data sets: Fixed and random effects approaches. Geographical Analysis, 51(1), 23–49.
Murakami, D., & Griffith, D. A. (2019b). Spatially varying coefficient modeling for large datasets: Eliminating N from spatial regressions. Spatial Statistics, 30, 39–64.
Murakami, D., Lu, B., Harris, P., Brunsdon, C., Charlton, M., Nakaya, T., et al. (2019). The importance of scale in spatially varying coefficient modeling. Annals of the American Association of Geographers, 109(1), 50–70.
Murakami, D., Yoshida, T., Seya, H., Griffith, D. A., & Yamagata, Y. (2017). A Moran coefficient-based mixed effects approach to investigate spatially varying relationships. Spatial Statistics, 19, 68–89.
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319–392.
Samuels, M. L. (1993). Simpson’s paradox and related phenomena. Journal of the American Statistical Association, 88(421), 81–88.
Stein, M. L. (2014). Limitations on low rank approximations for covariance matrices of spatial data. Spatial Statistics, 8, 1–19.
Tiefelsdorf, M., & Griffith, D. A. (2007). Semiparametric filtering of spatial autocorrelation: The eigenvector approach. Environment and Planning A, 39(5), 1193–1221.
Tsutsumi, M., & Seya, H. (2008). Measuring the impact of large-scale transportation projects on land price using spatial statistical models. Papers in Regional Science, 87(3), 385–401.
Tsutsumi, M., & Seya, H. (2009). Hedonic approaches based on spatial econometrics and spatial statistics: Application to evaluation of project benefits. Journal of Geographical Systems, 11(4), 357–380.
Wang, C., & Furrer, R. (2019). Combining heterogeneous spatial datasets with process-based spatial fusion models: A unifying framework. arXiv:1906.00364.
Wiesenfarth, M., & Kneib, T. (2010). Bayesian geoadditive sample selection models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59(3), 381–404.
Wood, S. N. (2008). Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518.
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(1), 3–36.
Wood, S. N. (2017). Generalized additive models: An introduction with R. Boca Raton: Chapman and Hall/CRC.
Wood, S. N., Goude, Y., & Shaw, S. (2015). Generalized additive models for large data sets. Journal of the Royal Statistical Society: Series C (Applied Statistics), 64(1), 139–155.
Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized additive models for gigadata: Modeling the UK black smoke network daily data. Journal of the American Statistical Association, 112(519), 1199–1210.
Zhang, K., & Kwok, J. T. (2010). Clustered Nyström method for large scale manifold learning and dimension reduction. IEEE Transactions on Neural Networks, 21(10), 1576–1587.

Acknowledgements

This study is funded by the JSPS KAKENHI Grant numbers 17K12974 and 18H03628.

Author information

Authors and Affiliations

Department of Data Science, Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo, 190-8562, Japan
Daisuke Murakami
School of Economic, Political and Policy Science, The University of Texas, Dallas, 800 W Campbell Rd, Richardson, TX, 75080, USA
Daniel A. Griffith

Corresponding author

Correspondence to Daisuke Murakami.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Murakami, D., Griffith, D.A. A memory-free spatial additive mixed modeling for big spatial data. Jpn J Stat Data Sci 3, 215–241 (2020). https://doi.org/10.1007/s42081-019-00063-x

Received: 01 July 2019
Accepted: 11 November 2019
Published: 07 December 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s42081-019-00063-x
