Spatial Autocorrelation Parameter Estimation for Massively Large Georeferenced Datasets

Griffith, Daniel A.; Paelinck, Jean H. P.

doi:10.1007/978-3-319-72553-6_7

Part of the book series: Advanced Studies in Theoretical and Applied Econometrics (ASTA, volume 51)

Abstract

Features often attributed to datasets classified as “big spatial data” include massive size, complexity (e.g., the presence of spatial autocorrelation), and the failure of conventional analysis techniques originally designed for more modest sample sizes. These features characterize remotely sensed data, whose sample sizes may be only in the hundreds of thousands or millions, but whose spatial weights matrices contain a number of entries equal to the squares of these numbers. Consequently, spatial scientists have found that conventional spatial statistical/econometric techniques designed to handle data with n in the hundreds or thousands fail to handle practical-sized remotely sensed images. This chapter outlines revised techniques to circumvent this restriction for spatial autoregression analyses. It also lays a foundation for extending the findings reported here to sizeable irregular surface partitionings.

Sections of this chapter are adapted from, and are an extension of, Griffith (2015).

Notes

1. Calibration refers to either estimation based upon a purposeful systematic sample covering a feasible parameter space or calculations based upon population moments.
2. Mining began at the La Oroya, Peru, smelting site in 1893. Copper smelting began at the site in 1922 (70K tonnes capacity). Lead smelting began at the site in 1928 (120K tonnes capacity). Finally, zinc smelting began at the site in 1952 (45K tonnes capacity). Air pollution readings in 1999 indicated extreme levels of arsenic, cadmium, and lead.
3. For a P-by-Q regular square tessellation and a queen adjacency SWM, \( \mathrm{TR}\left({\mathbf{W}}^{\mathrm{T}}\mathbf{W}\right)=\sum \limits_{j=1}^n{\lambda}_j^2+\frac{81P+81Q+326}{2400} \). For a P-by-Q (horizontal-by-vertical) regular hexagonal tessellation SWM, \( \mathrm{TR}\left({\mathbf{W}}^{\mathrm{T}}\mathbf{W}\right)=\sum \limits_{j=1}^n{\lambda}_j^2+\frac{5P+12Q+25}{180} \). The limit of these correction factors goes to zero as the size of a surface partitioning goes to infinity:
4. For a given n, this correction factor appears to be of the form \( \frac{P+Q+12}{72}-\beta\ {\left(\frac{2}{\delta}\right)}^{\gamma }+\alpha +\beta\ {\left(\frac{1}{\delta +\rho }+\frac{1}{\delta -\rho}\right)}^{\gamma } \). For n = 20², k = 0.7222, \( \widehat{\alpha} \) = −0.0872, \( \widehat{\beta} \) = 0.4414, \( \widehat{\delta} \) = 1.0011, \( \widehat{\gamma} \) = 1.4571, and the RESS = 1.1 × 10⁻⁵.
5. This formulation allows negative values to be raised to noninteger exponents.
6. This value minimized the chances of having a count of 0 or 100, neither of which occurs in the empirical data.
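
The queen adjacency identity in Note 3 can be checked numerically on a small grid. The sketch below is illustrative only and rests on assumptions not stated in the note: that W is the row-standardized version of the binary queen adjacency matrix, and that both P and Q are at least 4 (so that corner effects do not overlap). The function names `queen_row_standardized` and `trace_gap` are hypothetical helpers introduced here. Under these assumptions, the difference between TR(WᵀW) and the sum of squared eigenvalues, which equals TR(WW), should match (81P + 81Q + 326)/2400.

```python
import numpy as np

def queen_row_standardized(P, Q):
    """Dense row-standardized queen-adjacency SWM for a P-by-Q square grid.

    Assumption of this sketch: W = D^{-1} C, where C is the binary queen
    matrix and D holds its row sums. A dense array is used only because
    the grid is small.
    """
    n = P * Q
    C = np.zeros((n, n))
    for r in range(Q):              # row index of the grid cell
        for c in range(P):          # column index of the grid cell
            i = r * P + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # skip the cell itself and anything outside the grid
                    if (dr or dc) and 0 <= rr < Q and 0 <= cc < P:
                        C[i, rr * P + cc] = 1.0
    return C / C.sum(axis=1, keepdims=True)

def trace_gap(W):
    """TR(W^T W) minus TR(W W).

    TR(W^T W) is the sum of squared entries of W; TR(W W) equals the sum
    of the squared eigenvalues of W.
    """
    return np.sum(W * W) - np.sum(W * W.T)

P, Q = 6, 9                          # arbitrary small example, both >= 4
W = queen_row_standardized(P, Q)
gap = trace_gap(W)
closed_form = (81 * P + 81 * Q + 326) / 2400
print(gap, closed_form)
```

Because the correction term grows only with P + Q (the perimeter) while n = PQ grows with the area, its size relative to n shrinks as the grid gets larger.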

References

Burden, S., Cressie, N., & Steel, D. (2015). The SAR model for very large datasets: A reduced rank approach. Econometrics, 3, 317–338.
Cressie, N., Olsen, A., & Cook, D. (1996). Massive data sets: Problems and possibilities, with application to environmental monitoring. In The Committee of Applied and Theoretical Statistics (Ed.), Massive data sets: Proceedings of a Workshop (pp. 115–119). Washington, DC: National Academy Press.
Griffith, D. (2015). Approximation of Gaussian spatial autoregressive models for massive regular square tessellation data. International Journal of Geographical Information Science, 29, 2143–2173.
Kelejian, H., & Prucha, I. (2010). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics, 157, 53–67.
Ord, J. (1975). Estimation methods for models of spatial interactions. Journal of the American Statistical Association, 70, 120–126.
Walde, J., Larch, M., & Tappeiner, G. (2008). Performance contest between MLE and GMM for huge spatial autoregressive models. Journal of Statistical Computation and Simulation, 78, 151–166.

Author information

Authors and Affiliations

University of Texas at Dallas, Richardson, Texas, USA
Daniel A. Griffith
George Mason University, Fairfax, VA, USA
Jean H. P. Paelinck

About this chapter

Cite this chapter

Griffith, D.A., Paelinck, J.H.P. (2018). Spatial Autocorrelation Parameter Estimation for Massively Large Georeferenced Datasets. In: Morphisms for Quantitative Spatial Analysis. Advanced Studies in Theoretical and Applied Econometrics, vol 51. Springer, Cham. https://doi.org/10.1007/978-3-319-72553-6_7

DOI: https://doi.org/10.1007/978-3-319-72553-6_7
Published: 08 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72552-9
Online ISBN: 978-3-319-72553-6
eBook Packages: Economics and Finance, Economics and Finance (R0)
