
A hierarchical approach to scalable Gaussian process regression for spatial data

Abstract

Large scale and highly detailed geospatial datasets currently offer rich opportunities for empirical investigation, where spatial spillovers and spatial infill can now be investigated at the parcel level. Gaussian process regression (GPR) is particularly well suited for such investigations, but is currently limited by its need to manipulate and store large dense covariance matrices. The central purpose of this paper is to develop a more efficient version of GPR based on the hierarchical covariance approximation proposed by Chen et al. (J Mach Learn Res 18:1–42, 2017) and Chen and Stein (Linear-cost covariance functions for Gaussian random fields, arXiv:1711.05895, 2017). We provide a novel probabilistic interpretation of Chen’s framework, and extend his method to the analysis of local marginal effects at the parcel level. Finally, we apply these tools to a spatial dataset constructed from a 10-year period of Oklahoma County Assessor databases. In this setting, we are able to identify both regions of possible spatial spillovers and spatial infill, and to show more generally how this approach can be used for the systematic identification of specific development opportunities.

Introduction

Large scale datasets such as County Assessor’s geodatabases offer novel opportunities to investigate spatial phenomena at much finer levels of resolution than in the past. Spatial spillovers, urban infill, renovation price effects and proximity to investment/amenity zones can now be examined at the parcel level, opening up new avenues for the bulk identification of specific development opportunities. The central purpose of this paper is to develop an approach to analyzing such data in an efficient manner. Our approach starts with Gaussian process regression (GPR), which is a well-known prediction tool for analyzing spatial datasets. Moreover, the smooth nature of its prediction surfaces is particularly well suited for identifying the local marginal effects (LME) of key explanatory variables (as developed in Dearmon and Smith 2016, 2017). It is these effects that will allow an examination of more fine-grained spatial phenomena, such as the local development opportunities mentioned above.

However, the application of such GPR methods to large data sets has thus far been limited by the need to invert large dense covariance matrices. Thus, it is not surprising that this practical limitation has led to a variety of methods for approximating GPR models by more efficiently computable versions (as reviewed for example in Chen et al. 2017). In the present paper, we focus on one of the most promising of these approaches, namely the development of a hierarchical covariance approximation to GPR by Jie Chen ([C1] = Chen et al. 2017; [C2] = Chen and Stein 2017), which we denote by GPR-HCA. This hierarchical extension of Nyström’s low-rank approximation yields dramatic improvement in both speed and accuracy of predictions. In fact, this approximation allows matrix inversions that achieve the optimal efficiency level of \(O(n)\), i.e., are linear in the matrix dimension, n. Of equal importance, these approximations are guaranteed to yield positive definite matrices that generate well-defined Gaussian Processes. So, from a methodological perspective, our central objective is to extend such approximations to the analysis of local marginal effects in large-data contexts.

To do so, we begin in Sect. 2 with a review of the standard Gaussian Process Regression model, and in particular, its associated local marginal effects. In Sect. 3, we then develop the GPR-HCA method in detail. One contribution of this paper is to give an explicit probabilistic interpretation of this method, which we illustrate for two- and three-level hierarchies. In addition, we highlight some of the key auxiliary tools proposed by Chen ([C1],[C2]) which are particularly useful for our LME extensions. In Sect. 4, we test both the accuracy and scalability of this hierarchical approach by constructing a simple two-variable simulation model that allows for visual as well as numerical comparisons with other methods. Here we begin by comparing GPR-HCA with the standard Gaussian process regression model (GPR-FULL) over sample sizes small enough to allow the full version to be run. In addition, we compare GPR-HCA with two other large-scale prediction models for sample sizes up to half a million. Of particular relevance is the nearest-neighbor approximation of Gaussian processes (NNGP) first introduced by Datta et al. (2016), which also yields covariance approximations that are linear in matrix dimension and generate well-defined Gaussian Processes. We also compare GPR-HCA performance with one of the standard machine learning algorithms, namely the generalized boosted models (GBM) algorithm of Ridgeway (2007). In all cases we find comparable predictive performance, and much improved time costs over GPR-FULL in particular.

However, while such comparative tests are important, they are not of primary interest for our present purposes. More important is the technical extension of GPR-HCA to the evaluation of LMEs for large data sets. Within the same simulation framework, such estimated LMEs are shown to accurately replicate the derivatives of well-behaved functions corrupted by noise. We then turn to an empirical application in Sect. 5, where these HCA tools are applied to the difficult and often ill-behaved relationship between house prices and attributes using data obtained from nearly a decade’s worth of County Assessor’s databases in Oklahoma County. In particular, we focus on two distinct regions of Oklahoma County: one just north of downtown, where spatial spillovers appear to be present, and the other a small, wealthy municipality located further north, where spatial infill opportunities appear to exist. We investigate and analyze such phenomena using GPR-HCA, and provide confirmatory evidence of our findings using building permit data. Finally, we conclude in Sect. 6 with a brief discussion of several possible extensions of this work that are of both practical and technical importance.

Gaussian process regression

Given a spatial process with response variable, \(Y_{l}\), on a domain, \(S = \{ x_{l} = (x_{l1} , \ldots ,x_{ld} )\} \subseteq {\mathbb{R}}^{d}\) of possible explanatory variables [including the spatial coordinates of location, \(l\)], we start by assuming that stochastic variations in observed values of \(Y_{l}\) about their common mean, \(\mu\), are governed by an underlying (latent) zero-mean Gaussian process, \(f:S \to {\mathbb{R}}\), with observed values (measurements) corrupted by independent additive Gaussian noise,

$$Y_{l} = \mu + f(x_{l} ) + \varepsilon_{l} ,\quad \varepsilon_{l} \sim N\left( {0,\sigma^{2} } \right)$$
(1)

In essence this implies that latent responses, \(f = (f_{l} :\,l = 1, \ldots ,n)\), at any finite set of locations with associated explanatory variables, \(X = (x_{l} :l = 1, \ldots ,n)\) are multi-normally distributed as

$$f \sim N\left[ {0_{n} ,\,K(X,X)} \right]$$
(2)

with covariance matrix, \(K(X,X) = [k(x_{i} ,x_{j} ):i,j = 1, \ldots ,n]\), generated by a kernel function, \(k(x_{l} ,x_{h} )\,\,[ \equiv {\text{cov}} (f_{l} ,f_{h} )]\), depending only on the attribute profiles of response variates. By (1) this implies that the resulting observed responses, \(Y = (Y_{l} :\,l = 1, \ldots ,n)\), are distributed as

$$Y \sim N\left[ {\mu 1_{n} ,K\left( {X,X} \right) + \sigma^{2} I_{n} } \right]$$
(3)

where \(1_{n}\) and \(I_{n}\) denote respectively the unit vector and identity matrix of size n. To model spatial covariance, we employ the standard (anisotropic) squared exponential (SE) kernel function:

$$k(x_{l} ,x_{h} ) = v\,\exp \left[ { - \sum\nolimits_{i = 1}^{d} {\tfrac{1}{{2\tau_{i}^{2} }}\left( {x_{li} - x_{hi} } \right)^{2} } } \right]$$
(4)

where \(v\) denotes the common variance of all responses, i.e., \({\text{var}} (f_{l} )\,\, = \,\,k(x_{l} ,x_{l} ) = v\), and where each length-scale parameter, \(\tau_{i} > 0\), governs the degree to which variable, \(x_{i}\), influences covariance.Footnote 1
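For concreteness, the kernel in (4) can be written directly in code. The sketch below is a minimal numpy version (the implementation used in this paper is written in Matlab; the function and variable names here are purely illustrative):

```python
import numpy as np

def se_kernel(X1, X2, v, tau):
    """Anisotropic squared-exponential kernel of Eq. (4).

    X1 : (n1, d) array of attribute profiles
    X2 : (n2, d) array of attribute profiles
    v  : common variance of the latent responses
    tau: length-scale parameters, array of shape (d,)
    Returns the (n1, n2) covariance matrix K(X1, X2).
    """
    D = (X1[:, None, :] - X2[None, :, :]) / tau       # coordinate-wise scaled differences
    return v * np.exp(-0.5 * np.sum(D**2, axis=-1))
```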

With these assumptions, the fundamental Gaussian Process Regression (GPR) problem is to obtain the predictive (conditional) distribution of latent responses, \(f_{*} = f(X_{*} )\), at \(n_{t}\) test locations with attributes, \(X_{*} = (x_{*l} :l = 1, \ldots ,n_{t} )\), given observed responses, \(Y = (Y_{1} , \ldots ,Y_{n} )\), at n training locations with attributes, \(X = (x_{l} :l = 1, \ldots ,n)\). If we start with the joint distribution,

$$\left( {\begin{array}{*{20}c} {f_{*} } \\ Y \\ \end{array} } \right) \sim N\left[ {\left( {\begin{array}{*{20}c} {0_{{n_{t} }} } \\ {\mu \,1_{n} } \\ \end{array} } \right),\left( {\begin{array}{*{20}c} {K(X_{*} ,X_{*} )} & {K(X_{*} ,X)} \\ {K(X,X_{*} )} & {K(X,X) + \sigma^{2} I_{n} } \\ \end{array} } \right)} \right]$$
(5)

then the desired (conditional) predictive distribution is well known to be multi-normal

$$f_{*} |Y \sim N\left[ {E\left( {f_{*} |Y} \right),{\text{cov}} \left( {f_{*} |Y} \right)} \right]$$
(6)

with conditional mean and covariance,Footnote 2

$$E\left( {f_{*} |Y} \right) = K\left( {X_{*} ,X} \right)\left[ {K(X,X) + \sigma^{2} I_{n} } \right]^{ - 1} \,\left( {Y - \mu } \right)$$
(7)
$${\text{cov}} \left( {f_{*} |Y} \right) = K\left( {X_{*} ,X_{*} } \right) - K\left( {X_{*} ,X} \right)\left[ {K(X,X) + \sigma^{2} I_{n} } \right]^{ - 1} K\left( {X,X_{*} } \right)$$
(8)
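For moderate n, the conditional mean (7) and covariance (8) can be computed by solving the associated linear systems with a Cholesky factor rather than forming the inverse explicitly. The following sketch reuses the se_kernel helper defined above and is again only illustrative of the standard (dense) GPR calculation:

```python
import numpy as np

def gpr_predict(X, Y, Xs, v, tau, sigma2, mu=None):
    """Predictive mean (7) and covariance (8) of the latent field at test points Xs."""
    if mu is None:
        mu = Y.mean()                                  # simple-kriging convention used later in the paper
    n = X.shape[0]
    Kxx = se_kernel(X, X, v, tau) + sigma2 * np.eye(n)
    Ksx = se_kernel(Xs, X, v, tau)
    Kss = se_kernel(Xs, Xs, v, tau)
    L = np.linalg.cholesky(Kxx)                        # Kxx = L L'
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y - mu))
    mean = Ksx @ alpha                                 # Eq. (7)
    W = np.linalg.solve(L, Ksx.T)                      # W'W = Ksx Kxx^{-1} Ksx'
    cov = Kss - W.T @ W                                # Eq. (8)
    return mean, cov
```

Both the factorization cost, \(O(n^{3})\), and the storage cost, \(O(n^{2})\), of this dense calculation are precisely the bottlenecks addressed by the hierarchical approximation developed in Sect. 3.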

In a manner similar to Dearmon and Smith (2017), we also consider local marginal effects (LME),

$$\frac{{\partial E\left( {f_{*} |Y} \right)}}{{\partial x_{*l,j} }} = \frac{{\partial K\left( {x_{*l} ,X} \right)}}{{\partial x_{*l,j} }}\left[ {K(X,X) + \sigma^{2} I_{n} } \right]^{ - 1} \left( {Y - \mu } \right),\quad l = 1, \ldots ,n_{t}$$
(9)

capturing the expected impact of small changes in individual attributes,\(j = 1, \ldots ,d\), such as the impact of an additional square foot on the expected sales price of a given house with a specific set of attributes. Given our present interest in such local marginal effects, the continuous differentiability of the squared exponential kernel makes it particularly well suited for such analyses.Footnote 3

Hierarchical covariance approximation

For purposes of model calibration and prediction, a key scaling issue that arises is the size of the inverse to be calculated in (7), (8) and (9). Assuming that \(n\) is large, the objective of Chen’s procedure is to construct a hierarchical approximation to the \(n\)-square covariance matrix, \(K\). The approach starts by partitioning domain \(S\) into a collection of basic subdomains, \(S_{i} ,i = 1, \ldots ,b\), where each subset of sample points, \(X_{i} = S_{i} \cap X = [x_{i1} , \ldots ,x_{{in_{i} }} ]\), is sufficiently small to ensure that the associated covariance matrix, \(K_{ii} = K(X_{i} ,X_{i} )\), can easily be inverted. (Note that for notational simplicity, we have now dropped references to individual spatial locations, \(l\)). The second step is to approximate the covariances,

$$K_{ij} = K\left( {X_{i} ,X_{j} } \right) = \left[ {k\left( {x_{i} ,x_{j} } \right):x_{i} \in X_{i} ,x_{j} \in X_{j} } \right],\quad i,j = 1, \ldots ,b\;(i \ne j)$$
(10)

between distinct subdomains in terms of their mutual covariances with smaller sets of “landmark” points.Footnote 4 These concepts are best illustrated by simple examples.

Two-level hierarchical example

The simplest example involves a partitioning of \(S\) into two subdomains, \(S_{1}\) and \(S_{2}\), as illustrated in Fig. 1 below, where for graphical convenience we show only the spatial coordinates (\(d = 2\)).

Fig. 1
figure1

Two-level partition

To approximate the covariances between points in these two subdomains, one selects a small representative subset of points, \(X_{r} = [x_{r1} , \ldots ,x_{{r\,n_{r} }} ] \subset X_{1} \cup X_{2}\), designated as landmark points for \(X_{1}\) and \(X_{2}\). In this case, \(X_{r}\) is associated with the full domain, \(S = S_{1} \cup S_{2} \equiv S_{r}\). Moreover, given the hierarchical relations among these three domains (with respect to set containment, \(\subseteq\)), Fig. 1 can also be represented as a two-level tree structure with root, \(S_{r}\), and leaves, \((S_{1} ,S_{2} )\), as shown in Fig. 2. This underlying tree structure is of fundamental importance in the recursive calculation of the covariance approximations discussed below.

Fig. 2
figure2

Tree representation

In terms of these landmark points, Chen’s hierarchical approximation to \(K_{12}\) in (10) is given by Nyström’s (\(n_{r}\)-rank) approximation,

$$K_{12}^{H} = K_{1r} K_{rr}^{ - 1} K_{r2} = K\left( {X_{1} ,X_{r} } \right)K\left( {X_{r} ,X_{r} } \right)^{ - 1} K\left( {X_{r} ,X_{2} } \right) = \left( {K_{21}^{H} } \right)^{T}$$
(11)

where H denotes “hierarchical”. Note from the positive definiteness of the full covariance matrix, \(K\), that \(K_{rr}^{ - 1}\) is well defined and is also positive definite. In these terms, the full hierarchical approximation of \(K\) is given by

$$K^{H} = \left( {\begin{array}{*{20}c} {K_{11}^{H} } & {K_{12}^{H} } \\ {K_{21}^{H} } & {K_{22}^{H} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {K_{11} } & {K_{1r} K_{rr}^{ - 1} K_{r2} } \\ {K_{2r} K_{rr}^{ - 1} K_{r1} } & {K_{22} } \\ \end{array} } \right)$$
(12)

This is essentially the example in expression (4) of [C2] with only two subdomains. Note also that, given the positive definiteness of the diagonal blocks, \(K_{11}\) and \(K_{22}\), it is not surprising that the overall approximation is of full rank even though the off-diagonal approximations are not. What is far less obvious is that this approximation is actually positive definite, i.e., is itself a full-rank covariance matrix. While the proof of positive definiteness in this two-level case is a simple consequence of Schur Complementarity ([C1], Theorem 3), the higher-level cases developed below are considerably more subtle.
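Before turning to the probabilistic interpretation, it may help to see (11) and (12) assembled numerically. The following sketch (again reusing the se_kernel helper above, with an arbitrary two-way split of the unit square and randomly drawn landmark points) builds \(K^{H}\) and confirms its positive definiteness via a Cholesky factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
v, tau = 1.0, np.array([0.3, 0.3])

# Two subdomains of the unit square, split at x1 = 0.5
X1 = rng.uniform([0.0, 0.0], [0.5, 1.0], size=(200, 2))
X2 = rng.uniform([0.5, 0.0], [1.0, 1.0], size=(200, 2))
X = np.vstack([X1, X2])

# Landmark points drawn at random from the pooled sample (Sect. 5 uses k-means instead)
Xr = X[rng.choice(len(X), size=30, replace=False)]

K11, K22 = se_kernel(X1, X1, v, tau), se_kernel(X2, X2, v, tau)
K1r, K2r = se_kernel(X1, Xr, v, tau), se_kernel(X2, Xr, v, tau)
Krr = se_kernel(Xr, Xr, v, tau)

# Nystrom off-diagonal block of Eq. (11) and the full K^H of Eq. (12)
K12_H = K1r @ np.linalg.solve(Krr, K2r.T)
K_H = np.block([[K11, K12_H],
                [K12_H.T, K22]])

# Succeeds (with a tiny jitter for floating-point roundoff): K^H is positive definite
np.linalg.cholesky(K_H + 1e-10 * np.eye(len(K_H)))
```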

Probabilistic interpretation

With this in mind, it is instructive to develop a direct probabilistic approach to these hierarchical approximations, i.e., a full-dimensional Gaussian probability model with precisely this covariance, \(K^{H}\). To do so, we start with the latent process, \(f\sim N(0,K)\), in (2) and let \(f_{i} = f(X_{i} )\,\,,\,\,i = 1,2,r\). To approximate the covariance, \(K_{12}\), between \(f_{1}\) and \(f_{2}\) in terms of their relations with \(f_{r}\), we then consider their conditional means and covariances

$$E\left( {f_{i} |f_{r} } \right) = K_{ir} K_{rr}^{ - 1} f_{r} ,\quad i = 1,2$$
(13)
$${\text{cov}} \left( {f_{i} |f_{r} } \right) = K_{ii} - K_{ir} K_{rr}^{ - 1} K_{ri} ,\quad i = 1,2$$
(14)

which are essentially obtained from (7) and (8) by setting \(\sigma^{2} = 0\). A key feature of the multi-normal distribution is that while the conditional mean in (13) depends on the value of \(f_{r}\), the conditional covariance in (14) does not. This plays a crucial role in the following construction. As a first step, if we now designate the following zero-mean version of \(f_{i} |f_{r} \,\) as a centered conditional,

$$Z_{i\,|r} \sim N\left( {0_{{n_{i} }} ,K_{ii} - K_{ir} K_{rr}^{ - 1} K_{ri} } \right),\quad i = 1,2$$
(15)

then since \(f_{r}\) does not appear in the distribution of \(Z_{i|r}\), we may choose \(Z_{1|r}\) and \(Z_{2|r}\) to be independent not only of one another but also of \(f_{r}\). For notational consistency, we also let \(Z_{r} \sim N(0_{{n_{r} }} ,K_{rr} )\) denote a version of \(f_{r}\) that is independent of both \(Z_{1|r}\) and \(Z_{2|r}\), so that by construction the random vector, \(Z = (Z_{1|r} ,Z_{2|r} ,Z_{r} )\), is multi-normalFootnote 5 with:

$$Z = \left( {\begin{array}{*{20}c} {Z_{1|r} } \\ {Z_{2|r} } \\ {Z_{r} } \\ \end{array} } \right) \sim N\left[ {\left( {\begin{array}{*{20}c} {0_{{n_{1} }} } \\ {0_{{n_{2} }} } \\ {0_{{n_{r} }} } \\ \end{array} } \right),\left( {\begin{array}{*{20}c} {K_{11} - K_{1r} K_{rr}^{ - 1} K_{r1} } & {} & {} \\ {} & {K_{22} - K_{2r} K_{rr}^{ - 1} K_{r2} } & {} \\ {} & {} & {K_{rr} } \\ \end{array} } \right)} \right]$$
(16)

The desired probability model can then be formed as linear combinations of these independent basis vectors. If we now define the coefficient matrices,

$$A_{ij} = K_{ij} \,K_{jj}^{ - 1} ,\quad i,j = 1,2,r$$
(17)

then the appropriate hierarchical model, \(H = (H_{1} ,H_{2} )\), for the present case is given by

$$H_{1} = Z_{1\,|\,r} + A_{1r} Z_{r}$$
(18)
$$H_{2} = Z_{2\,|\,r} + A_{2r} Z_{r}$$
(19)

where each vector of latent variables, \(H_{i} = (h_{ij} :\,j = 1, \ldots ,n_{i} )\), represents a hierarchical version of the original latent responses, \((f_{ij} :j = 1, \ldots ,n_{i} )\), in the full model (1). Intuitively, it is the second terms in these expressions (both containing \(Z_{r}\)) that govern the covariances between random vectors \(H_{1}\) and \(H_{2}\). As we shall see below, the first terms then serve to maintain the desired marginal distributions of \(H_{1}\) and \(H_{2}\). Note also that since (18) and (19) can be written in matrix form as

$$H\,\, = \,\,\left( {\begin{array}{*{20}c} {H_{1} } \\ {H_{2} } \\ \end{array} } \right)\,\, = \,\,\left[ {\begin{array}{*{20}c} {I_{{n_{1} }} } & 0 & {A_{1r} } \\ 0 & {I_{{n_{2} }} } & {A_{2r} } \\ \end{array} } \right]\left( {\begin{array}{*{20}c} {Z_{1\,|\,r} } \\ {Z_{2\,|\,r} } \\ {Z_{r} } \\ \end{array} } \right)$$
(20)

it follows that \(H\) is a linear transformation of Z, and thus is also multi-normally distributed.Footnote 6 So if it can be shown that cov(H) = \(K^{H}\), then since \(E\,(Z)\, = \,0\) by construction, we will obtain a well-defined probability model

$$H \sim N\left( {0_{n} ,K^{H} } \right)$$
(21)

with the desired covariance matrix, \(K^{H}\). It is this hierarchical model, H, that will replace f in expression (2) of the original model. So, all that remains to be shown is that this hierarchical model has the desired covariance structure. These same observations will continue to hold in more complex examples, and shall not be repeated.

In the present case, we begin by observing that expressions (14) through (18), together with the independence of the Z components, imply that

$$\begin{aligned} {\text{cov}} \left( {H_{1} } \right) & = {\text{cov}} \left( {Z_{1|r} } \right) + {\text{cov}} \left( {A_{1r} Z_{r} } \right) \\ & = \left( {K_{11} - K_{1r} K_{rr}^{ - 1} K_{r1} } \right) + A_{1r} {\text{cov}} \left( {Z_{r} } \right)A_{1r}^{T} \\ & = \left( {K_{11} - K_{1r} K_{rr}^{ - 1} K_{r1} } \right) + \left( {K_{1r} K_{rr}^{ - 1} } \right)\left( {K_{rr} } \right)K_{rr}^{ - 1} K_{r1} \\ & = K_{11} = K_{11}^{H} \\ \end{aligned}$$
(22)

and similarly, that \({\text{cov}} (H_{2} ) = K_{22}^{H}\). Moreover, the independence and zero-mean properties of the Z components also imply that

$$\begin{aligned} {\text{cov}} \left( {H_{1} ,H_{2} } \right) & = E\left[ {H_{1} H_{2}^{T} } \right] = E\left[ {\left( {Z_{1\,|\,r} + A_{1r} Z_{r} } \right)\left( {Z_{2\,|\,r} + A_{2r} Z_{r} } \right)^{T} } \right] \\ & = E\left[ {\left( {A_{1r} Z_{r} } \right)\left( {A_{2r} Z_{r} } \right)^{T} } \right] = A_{1r} E\left( {Z_{r} Z_{r}^{T} } \right)A_{2r}^{T} \\ & = A_{1r} {\text{cov}} \left( {Z_{r} } \right)A_{2r}^{T} = \left( {K_{1r} K_{rr}^{ - 1} } \right)\left( {K_{rr} } \right)\left( {K_{rr}^{ - 1} K_{r2} } \right) \\ & = K_{1r} K_{rr}^{ - 1} K_{r2} = K_{12}^{H} , \\ \end{aligned}$$
(23)

which together with \({\text{cov}} (H_{2} ,H_{1} )\, = \,{\text{cov}} (H_{1} ,H_{2} )^{T}\) yields the desired result, \({\text{cov}} (H) = K^{H}\).
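These covariance calculations can also be verified mechanically: writing \(H = BZ\) as in (20), the identity \({\text{cov}}(H) = B\,{\text{cov}}(Z)\,B^{T}\) reproduces \(K^{H}\) exactly. Continuing the numerical sketch above (so that K11, K22, K1r, K2r, Krr, and K_H are as defined there):

```python
import numpy as np
from scipy.linalg import block_diag

# Coefficient matrices of Eq. (17): A_ir = K_ir K_rr^{-1}
A1r = np.linalg.solve(Krr.T, K1r.T).T
A2r = np.linalg.solve(Krr.T, K2r.T).T

n1, n2, nr = len(K11), len(K22), len(Krr)
B = np.block([[np.eye(n1), np.zeros((n1, n2)), A1r],
              [np.zeros((n2, n1)), np.eye(n2), A2r]])      # transformation of Eq. (20)

# Block-diagonal covariance of the independent basis vector Z in Eq. (16)
cov_Z = block_diag(K11 - A1r @ K1r.T,                       # cov(Z_{1|r}), Eq. (15)
                   K22 - A2r @ K2r.T,                       # cov(Z_{2|r})
                   Krr)                                     # cov(Z_r)

cov_H = B @ cov_Z @ B.T
print(np.abs(cov_H - K_H).max())                            # zero up to floating-point error
```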

Three-level hierarchical example

If the full sample of locations, \(X \subset S\), is extremely large, then each of the subsets, \(X_{i} \subset S_{i} ,\quad i = 1,2\), may also be large. Suppose for example that \(S\) were partitioned into four smaller subdomains, \((S_{1} ,\,\,S_{2} ,\,\,S_{3} ,\,\,S_{4} )\), as shown in Fig. 3 below. While one could in principle use the same set of landmark points, \(X_{r} \subset S_{r} = S\), to approximate covariances among the points, \(X_{i} = X \cap S_{i} ,\quad i = 1, \ldots ,4\), it is now possible to refine these approximations. In the present spatial setting, it is reasonable to suppose that points in adjacent domains, say \(S_{i}\) and \(S_{j}\), are more closely related (have higher covariances) than other point pairs. If so, then a better approximation to covariances between \(S_{i}\) and \(S_{j}\) is obtained by using only landmark points in \(S_{i} \cup S_{j}\).

Fig. 3
figure3

Three-level partition

To model such relations, we first recall from the hierarchical tree structure in Fig. 2 above that subdomains \(S_{1}\) and \(S_{2}\) are also called children of the parent domain, \(S_{r} \,\). In these terms, the construction in (18) and (19) can be viewed as a “parent–child” relationship. Following Chen [C1, Sect. 2.2], we refine covariance approximations by extending this type of relationship. If we let \(S_{5} = S_{1} \cup S_{2}\) and \(S_{6} = S_{3} \cup S_{4}\), then as seen in Fig. 3, \((S_{1} ,S_{2} )\) and \((S_{3} ,S_{4} )\) are the respective children of \(S_{5}\) and \(S_{6}\). If landmark points, \(X_{i} = [x_{i1} , \ldots ,x_{{i\,n_{i} }} ] \subset S_{i}\), are chosen for \(i = 5,6\), then these can in principle be used to approximate covariances between their respective children. Similarly, if we again designate the root domain by \(S_{r} = S = S_{5} \cup S_{6}\), then the subdomains \((S_{5} ,S_{6} )\) are themselves children of \(S_{r}\). So, if we again choose landmark points for this parent domain, \(X_{r} = [x_{r1} , \ldots ,x_{{r\,n_{r} }} ] \subset S_{r}\), then these can also be used to approximate covariances between children in \(X_{5}\) and \(X_{6}\). These nesting relationships can alternatively be represented by the tree structure in Fig. 4, where the basic partition domains, \((S_{1} ,\,S_{2} ,\,S_{3} ,\,S_{4} )\), at the lowest level again constitute the leaf nodes of the tree with root node, \(S_{r}\), and intermediate nodes, \(S_{5}\) and \(S_{6}\). Every link between nodes now represents a parent–child relation.

Fig. 4
figure4

Tree representation

Extended probabilistic interpretation

To extend the probabilistic interpretation of the two-level hierarchical covariance approximation above, we start at the upper level and define hierarchical random vectors for \(S_{5}\) and \(S_{6}\) [paralleling (18) and (19) above] as,

$$H_{ir} = Z_{i|r} + A_{ir} Z_{r} ,\quad i = 5,6$$
(24)

where the centered conditionals, \(Z_{i|r}\), and coefficients, \(A_{ir}\), have exactly the same meaning as in (15) and (17) [with (5,6) replacing (1,2)]. So in particular, these upper-level variables are capturing relations between the \(n_{i}\) landmark points in \(X_{i}\) and the \(n_{r}\) landmark points in \(X_{r}\). The desired hierarchical model, \(H = (H_{1} ,H_{2} ,H_{3} ,H_{4} )\), is then defined at the lower level by:

$$H_{i} = Z_{i|5} + A_{i5} H_{5r} = Z_{i|5} + A_{i5} \left( {Z_{5|r} + A_{5r} Z_{r} } \right) = Z_{i|5} + A_{i5} Z_{5|r} + A_{i5} A_{5r} Z_{r} ,\quad i = 1,2$$
(25)
$$H_{i} = Z_{i|6} + A_{i6} H_{6r} = Z_{i|6} + A_{i6} \left( {Z_{6|r} + A_{6r} Z_{r} } \right) = Z_{i|6} + A_{i6} Z_{6|r} + A_{i6} A_{6r} Z_{r} ,\quad i = 3,4$$
(26)

The parentheses in the second equalities in (25) and (26) serve to highlight the recursive nature of these definitions, while the last equalities exhibit the linear relations between \(H\) and the hierarchical family of basis vectors, \(Z = \,\,\{ Z_{1|5} ,Z_{2|5} ,Z_{3|6} ,Z_{4|6} ,Z_{5|r} ,Z_{6|r} ,Z_{r} \}\), shown in Fig. 5 below. As an extension of the two-level model in (18) and (19), we now see from (25) for example that the second terms involving \(Z_{5|r}\) reflect the covariance relations between \(H_{1}\) and \(H_{2}\). Similarly, the last terms involving \(Z_{r}\) in both (25) and (26) reflect additional covariance relations among all four components of \(H = (H_{1} ,H_{2} ,H_{3} ,H_{4} )\).

Fig. 5
figure5

Random basis vectors

For this three-level example, the hierarchical approximation, \(K^{H}\), to \(K = K(X,X)\), can be defined by specifying the matrix cells shown in Fig. 6 (together with symmetry). Following expression (16) in [C1], there are only three types of covariance expressions to be considered, namely within domains (first-level interactions), between adjacent domains (second-level interactions) and between non-adjacent domains (higher-level interactions), as can be illustrated by \(K_{11}^{H} \,,\,K_{12}^{H} ,\) and \(K_{13}^{H}\):

$$K_{11}^{H} = {\text{cov}} \left( {H_{1} ,H_{1} } \right) = K_{11}$$
(27)
$$K_{12}^{H} = {\text{cov}} \left( {H_{1} ,H_{2} } \right) = K_{15} K_{55}^{ - 1} K_{52}$$
(28)
$$K_{13}^{H} = {\text{cov}} \left( {H_{1} ,H_{3} } \right) = K_{15} K_{55}^{ - 1} K_{5r} K_{rr}^{ - 1} K_{r6} K_{66}^{ - 1} K_{63}$$
(29)
Fig. 6
figure6

Hierarchical covariance matrix

But (27) follows from the argument in (22) together with the recursive nature of (25). A first application of (22) [to expression (24)] yields \({\text{cov}} (H_{5r} )\,\, = \,\,K_{55}\). But the independence of \(Z_{1|5}\) and \(H_{5r} \,( = \,\,Z_{5|r} \, + \,A_{5r} Z_{r} )\) together with a second application of (22) [to the first equality in (25)] shows that

$${\text{cov}} \left( {H_{1} } \right) = {\text{cov}} \left( {Z_{1|5} } \right) + A_{15} {\text{cov}} \left( {H_{5r} } \right)A_{15}^{T} = \left( {K_{11} - K_{15} K_{55}^{ - 1} K_{51} } \right) + \left( {K_{15} K_{55}^{ - 1} } \right)K_{55} \left( {K_{55}^{ - 1} K_{51} } \right) = K_{11}$$
(30)

Moreover, since \(Z_{1|5}\), \(Z_{2|5}\) and \(H_{5r} \,( = \,\,Z_{5|r} \, + \,A_{5r} Z_{r} )\) are mutually independent, it also follows that

$$\begin{aligned} {\text{cov}} \left( {H_{1} ,H_{2} } \right) & = {\text{cov}} \left[ {\left( {Z_{1|5} + A_{15} H_{5r} } \right),\left( {Z_{2|5} + A_{25} H_{5r} } \right)} \right] = A_{15} {\text{cov}} \left( {H_{5r} } \right)A_{25}^{T} \\ & = \left( {K_{15} K_{55}^{ - 1} } \right)K_{55} \left( {K_{55}^{ - 1} K_{52} } \right) = K_{15} K_{55}^{ - 1} K_{52} \\ \end{aligned}$$
(31)

Finally, since all components of \(Z\) are independent, the same argument shows that

$$\begin{aligned} {\text{cov}} \left( {H_{1} ,H_{3} } \right) & = {\text{cov}} \left[ {\left( {Z_{1|5} + A_{15} Z_{5|r} + A_{15} A_{5r} Z_{r} } \right),\left( {Z_{3|6} + A_{36} Z_{6|r} + A_{36} A_{6r} Z_{r} } \right)} \right] \\ & = A_{15} A_{5r} {\text{cov}} \left( {Z_{r} } \right)A_{6r}^{T} A_{36}^{T} = \left( {K_{15} K_{55}^{ - 1} } \right)\left( {K_{5r} K_{rr}^{ - 1} } \right)K_{rr} \left( {K_{rr}^{ - 1} K_{r6} } \right)\left( {K_{66}^{ - 1} K_{63} } \right) \\ & = K_{15} \,K_{55}^{ - 1} K_{5r} \,K_{rr}^{ - 1} \,K_{r6} \,K_{66}^{ - 1} \,K_{63} \\ \end{aligned}$$
(32)

So again, we see that \({\text{cov}} (H)\,\, = \,\,K^{H}\).

General modeling scheme

The above examples should make it sufficiently clear that the general hierarchical model consists of a family of random vectors, \(H = (H_{i} :i = 1, \ldots ,b)\), where for each basic subdomain, \(S_{i}\), of S (i.e., leaf of the associated tree), the random vector, \(H_{i}\), is a nested linear combination of the basis vectors, Z, such as in Fig. 5 above. In particular, if for each node, \(i_{1}\), in the tree we now designate the unique path, \(i_{1} \, \to \,i_{2} \, \to \,\, \cdots \,\, \to \,i_{m - 1} \to i_{m} \, \to \,r\), of successive parents (ancestors) up to the root node, \(r\), as the root path for \(i_{1}\), then the appropriate form of \(H_{i}\) for each leaf node, \(i\), with root path, \(i\, \to \,i_{1} \, \to \,i_{2} \to \, \cdots \,\, \to \,i_{m - 1} \to i_{m} \, \to \,r\), now takes the form:

$$H_{i} = Z_{{i|i_{1} }} + A_{{i\,i_{1} \,}} \left( {Z_{{i_{1} \,|\,i_{2} }} + A_{{i_{1} \,i_{2} }} \left( { \cdots \left( {Z_{{i_{m - 1} \,|\,i_{m} }} + A_{{i_{m - 1} \,i_{m} }} \left( {Z_{{i_{m} \,|\,r\,}} + A_{{i_{m} \,r}} Z_{r} } \right)} \right) \cdots } \right)} \right)$$
(33)

In terms of this notation, the desired covariance for \(H_{i}\) [given by the top half of expression (14) in [C1] for a representative point pair, \((x,x^{{\prime }} )\), in \(X_{i}\)] is simply the kernel covariance,

$${\text{cov}} \left( {H_{i} } \right) = K\left( {X_{i} ,X_{i} } \right) = K_{ii}$$
(34)

In addition, the desired covariance between any pair of leaf vectors, \(H_{i}\) and \(H_{j}\), with least common ancestor, \(s\) [possibly root, r, itself] and root paths

$$i \to i_{1} \to i_{2} \to \cdots \to i_{p - 1} \to i_{p} \to s \to h_{1} \cdots \to h_{m} \to r$$
(35)
$$j \to j_{1} \to j_{2} \to \cdots \to j_{q - 1} \to j_{q} \to s \to h_{1} \cdots \to h_{m} \to r$$
(36)

is given [in terms of expression (16) in [C1] for point pairs, \(x \in X_{i}\) and \(x^{{\prime }} \in X_{j}\)] by

$${\text{cov}} \left( {H_{i} ,H_{j} } \right) = K_{{i\,\,i_{1} }} K_{{i_{1} \,i_{1} }}^{ - 1} \,K_{{i_{1} \,\,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} \, \cdots \,K_{{i_{p} \,s}} K_{s\,s}^{ - 1} \,K_{{s\,j_{q} }} K_{{j_{q} \,j_{q} }}^{ - 1} \, \cdots \,K_{{j_{2} \,j_{1} }} \,K_{{j_{1} \,j_{1} }}^{ - 1} \,K_{{j_{1} \,j}} \,$$
(37)

Note in particular that hierarchical covariances in (12) for our two-level example and in (27) through (29) for our three-level example are both instances of (34) and (37). In “Appendix” it is shown that the hierarchical model in (33) continues to exhibit this covariance structure in all cases, and thus provides a general probabilistic formulation of hierarchical covariance matrices. This is of particular importance in that such matrices are themselves exact covariance matrices (as observed in [C2, p.5]), and need not themselves be interpreted as “approximations”.
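For readers who prefer code to notation, the path-product formula in (37) can be written as a short traversal of the tree. The sketch below is schematic only: it assumes a dict `parent` mapping each node label to its parent, a dict `points` holding the sample points of each leaf and the landmark points of each internal node, and a kernel function `kern`; it uses explicit inverses for clarity and makes no attempt to reproduce the optimized recursive routines of [C1] and [C2].

```python
import numpy as np

def root_path(node, parent):
    """A node followed by its successive ancestors up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def hier_cov(i, j, points, parent, kern):
    """cov(H_i, H_j) of Eq. (37) for two distinct leaves i and j.
    (For i = j the within-leaf covariance is simply K_ii, as in Eq. (34).)"""
    pi, pj = root_path(i, parent), root_path(j, parent)
    s = next(n for n in pi if n in pj)                 # least common ancestor
    up = pi[:pi.index(s) + 1]                          # i -> i1 -> ... -> s
    down = list(reversed(pj[:pj.index(s) + 1]))        # s -> jq -> ... -> j
    M = np.eye(len(points[i]))
    for a, b in zip(up[:-1], up[1:]):                  # accumulate K_{ab} K_{bb}^{-1} going up
        M = M @ kern(points[a], points[b]) @ np.linalg.inv(kern(points[b], points[b]))
    for a, b in zip(down[:-1], down[1:]):              # then K_{ab} (and K_{bb}^{-1}) going down
        M = M @ kern(points[a], points[b])
        if b != j:
            M = M @ np.linalg.inv(kern(points[b], points[b]))
    return M
```

With the three-level tree of Fig. 4 encoded as parent = {1: 5, 2: 5, 3: 6, 4: 6, 5: 'r', 6: 'r'}, this routine reproduces (28) and (29).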

Efficient algorithms and storage

While the probabilistic development above provides a more concrete interpretation of hierarchical covariance matrices, it cannot be overemphasized that the real power of these hierarchical structures is their computational efficiency, which allows Gaussian Process Regression models to be extended to large data sets. Rather than storing the entire kernel matrix in memory, one stores much smaller block-diagonal matrices (the covariances of the leaves, \(H_{i}\)), along with even smaller matrices found at the parent nodes of the space-partitioning tree. Omitting the off-diagonal blocks of the covariance matrix (covariances between leaf pairs, \(H_{i}\) and \(H_{j}\)) generates significant gains in scalability, since these omitted blocks are only calculated on an as-needed basis using the appropriate tree traversal.

This may appear to simply trade the problem of storage for that of drastically increased calculation requirements. But careful inspection shows that this computation issue is not as serious as one might expect. Referring back to Eq. (37), suppose that leaves \(j\) and \(k\) share the same parent node, \(j_{1}\). Then the covariance between \(H_{i}\) and \(H_{k}\) is given by

$${\text{cov}} \left( {H_{i} ,H_{k} } \right) = K_{{i\,\,i_{1} }} K_{{i_{1} \,i_{1} }}^{ - 1} \,K_{{i_{1} \,\,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} \, \cdots \,K_{{i_{p} \,s}} K_{s\,s}^{ - 1} \,K_{{s\,j_{q} }} K_{{j_{q} \,j_{q} }}^{ - 1} \, \cdots \,K_{{j_{2} \,j_{1} }} \,K_{{j_{1} \,j_{1} }}^{ - 1} \,K_{{j_{1} \,k}}$$
(38)

which is seen to differ from (37) by only the last element, \(K_{{j_{1} k}}\). This type of overlap suggests that computational procedures can be recursively structured to avoid repeated calculations of common products such as in (37) and (38). Such recursive procedures are formalized in [C1] and [C2].

While the full set of procedures can be found in these references, the three most basic operations are matrix–vector products (O.1), matrix inversion (O.2), and determinant calculations (O.3). For our present purposes, the application of these operations is best illustrated in terms of the log likelihood function,

$$L(\theta |y) = - \tfrac{1}{2}\,\log \left[ {\det (C_{\theta } )} \right] - \tfrac{1}{2}y^{{\prime }} C_{\theta }^{ - 1} y - \tfrac{n}{2}\,\log (2\pi )$$
(39)

for a multi-normal random vector, \(y \sim N(0,C_{\theta } )\), with hierarchical covariance matrix, \(C_{\theta }\), parameterized by \(\theta\). Such likelihood calculations are performed many times in the estimation of \(\theta\), and require efficient methods for large-scale datasets. Having constructed and stored the matrix, \(C_{\theta }\), within the HCA framework, one calculates \(\det (C_{\theta } )\) by the determinant operation (O.3) [which actually calculates the log determinant directly]. One then constructs \(C_{\theta }^{ - 1}\) by the inverse operation (O.2). Finally, this is followed by the calculation of \(C_{\theta }^{ - 1} y\) using the matrix–vector product operation (O.1), which in turn reduces the quadratic form, \(y^{\prime}C_{\theta }^{ - 1} y\), to a simple inner product of n-vectors.
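For concreteness, these three steps can be traced through (39) using ordinary dense linear algebra, as in the sketch below; the hierarchical operations (O.1)–(O.3) of [C1] and [C2] perform exactly the same steps, but in \(O(n)\) time and storage by exploiting the tree structure of \(C_{\theta }\).

```python
import numpy as np

def gauss_loglik(C, y):
    """Log likelihood (39) for y ~ N(0, C), via a dense Cholesky factor."""
    n = len(y)
    L = np.linalg.cholesky(C)                              # C = L L'
    logdet = 2.0 * np.sum(np.log(np.diag(L)))              # log det(C)      [cf. O.3]
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # C^{-1} y        [cf. O.2]
    quad = y @ alpha                                       # y' C^{-1} y     [cf. O.1]
    return -0.5 * logdet - 0.5 * quad - 0.5 * n * np.log(2.0 * np.pi)
```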

In addition to these three main operations which are used exclusively for calculations with training data \((y,X)\), there are also more specialized operations designed for calculations involving covariances, \(K(X_{*} ,X)\) with \(n_{t}\) prediction points, \(X_{*}\). In particular, there is a matrix–vector product operation (O.4) for calculating expressions such as the conditional means in (7) and local marginal effects in (9), while a quadratic form operation (O.5) is used for calculating the conditional covariances in (8). Here it should be noted that while these operations were originally developed in [C2] for the vector case of single prediction points (\(n_{t} = 1\)), such procedures are readily extendable to matrices. Hierarchical procedures such, as O.4 and O.5, avoid the need to form full \(n_{t} \times n\) covariance matrices, \(K(X_{*} ,X)\).

To make matters more concrete, we conduct a series of simple experiments (using Matlab R2018b and GPStuff (Vanhatalo et al. 2013). Results of these experiments are displayed in Fig. 1 below (with HCA = GPR-HCA and FULL = GPR-FULL). For HCA (where we use 150 landmark points and a maximum of 1000 observations per leaf) we consider sample sizes ranging from 2000 to 128,000 observations. For FULL we cap the number of observations to 32,000 for Storage and 16,000 for the Matrix Inverse Operation (which uses an efficient mex fileFootnote 7). As shown in Fig. 7, FULL has a dramatic acceleration of costs with increasing sample size, while HCA’s storage and operations are linear in the number of samples. These findings are consistent with [C1] and [C2] where it is shown that as long as the maximum number of landmark points on each level of the hierarchy is held constant, the overall costs of both computation and storage are linear in the number of samples, n.

Fig. 7
figure7

Computation and storage comparisons of matrix inversion for HCA versus FULL

Finally, it should be noted that large kernel matrices tend to be ill-conditioned and, in particular, may numerically lose their positive definiteness, making inversion unstable. In expression (3) above, the addition of the measurement-error variance, \(\sigma^{2} I_{n}\), to matrix \(K\) tends to counteract this ill-conditioning for the diagonal blocks of hierarchical covariance matrices, \(K^{H}\), such as \(K_{11}\) and \(K_{22}\) in expression (12) above. But this is not true of the landmark covariance matrices appearing in the off-diagonal blocks, such as \(K_{rr}\) in the same expression. So following Chen [C1, Sect. 4.3], we add small regularizing terms to these matrices (similar in form to \(\sigma^{2} I_{n}\)).
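In code, this regularization amounts to adding a small ridge (a "jitter" term) to each landmark covariance matrix before it is factorized; the jitter level in the sketch below is illustrative only.

```python
import numpy as np

def regularize(Krr, rel_jitter=1e-6):
    """Add a small ridge to a landmark covariance matrix to guard against
    ill-conditioning (cf. [C1, Sect. 4.3]); the value 1e-6 is illustrative."""
    return Krr + rel_jitter * np.mean(np.diag(Krr)) * np.eye(len(Krr))
```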

Parameter estimation

Given this hierarchical covariance approximation structure, together with a set of observed responses, \(y = (y_{1} , \ldots ,y_{n} )^{{\prime }}\), and associated attributes, \(X = [x_{l} = (x_{l1} , \ldots ,x_{ld} ):l = 1, \ldots ,n]\), the estimation of mean and covariance parameters for GPR-HCA proceeds along standard lines. First, given that our primary interest is in covariance estimation, we employ the simple kriging convention of estimating the common mean, \(\mu\), of responses in (1) by their sample mean, \(\overline{y} = \tfrac{1}{n}\Sigma_{i} y_{i}\). In this way, we can focus on response deviations about this sample mean, and proceed to estimate the covariance kernel parameters, \((v,\tau_{1} , \ldots ,\tau_{d} )\) in (4), together with the measurement variance, \(\sigma^{2}\), in (1). Thus by letting \(\theta = (v,\tau_{1} , \ldots ,\tau_{d} ,\sigma^{2} )\) denote the full vector of parameters to be estimated, and making the parameter dependency of \(K\) explicit by writing \(K_{\theta }\) in (3), we now treat \(Y\) in (3) as a deviation vector with distribution, \(N[0_{n} ,K_{\theta } (X,X) + \sigma^{2} I_{n} ]\), so that the log likelihood function in (39) takes the more explicit form,

$$L\left( {\theta |X,y} \right) = - \tfrac{1}{2}\,\log \left( {\det \left[ {K_{\theta } (X,X) + \sigma^{2} I_{n} } \right]} \right) - \tfrac{1}{2}y^{{\prime }} \left[ {K_{\theta } (X,X) + \sigma^{2} I_{n} } \right]^{ - 1} y - \tfrac{n}{2}\,\log (2\pi )$$
(40)

In these terms our (positive) parameters, \(\theta = (\theta_{i} :i = 1, \ldots ,d + 2)\), are postulated to have independent log-Gaussian priors, \(p(\ln \theta_{i} )\), yielding a log posterior density of the form:

$$\begin{aligned} \log p\left( {\theta |y,X} \right) & = \log p\left( {y|X,\theta } \right) + \sum\nolimits_{i = 1}^{d + 2} {\log p\left( {\ln \theta_{i} } \right)} - \log p(y) \\ & = L(\theta |y,X) + \sum\nolimits_{i = 1}^{d + 2} {\log p\left( {\ln \theta_{i} } \right)} - \log p(y) \\ \end{aligned}$$
(41)

It is this energy function that is maximized to obtain maximum a-posteriori (MAP) estimates, \(\hat{\theta }\), of all parameters. In the numerical simulations and applications to follow, all parameter priors are assumed to have the common form, \(\log \theta_{i} \sim N(2,9)\), which essentially yields vague priors with conservative mean values for both length scales and variances.

The estimation procedure for this GPR-HCA model was programmed in Matlab, and optimized using Matlab’s fmincon routine. Matlab code is available from the authors.
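For readers working outside Matlab, the MAP calculation can be sketched as follows: optimize the negative of the energy function (41) over log-parameters, with the \(N(2,9)\) priors contributing a quadratic penalty. This is an illustrative dense-matrix version only (the variable names and the L-BFGS-B optimizer are our choices, not part of the authors' code), with the hierarchical likelihood replaced by its exact dense counterpart.

```python
import numpy as np
from scipy.optimize import minimize

def neg_energy(log_theta, X, y):
    """Negative of the log posterior (41), up to the constant log p(y).
    log_theta = [log v, log tau_1, ..., log tau_d, log sigma^2]."""
    v, *tau, sigma2 = np.exp(log_theta)
    n = len(y)
    D = (X[:, None, :] - X[None, :, :]) / np.array(tau)
    C = v * np.exp(-0.5 * (D**2).sum(-1)) + sigma2 * np.eye(n)    # K_theta + sigma^2 I_n
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    loglik = -np.sum(np.log(np.diag(L))) - 0.5 * y @ alpha - 0.5 * n * np.log(2 * np.pi)
    logprior = -0.5 * np.sum((log_theta - 2.0) ** 2) / 9.0        # log theta_i ~ N(2, 9)
    return -(loglik + logprior)

# Hypothetical usage, with y_dev the mean-deviated responses and X the attribute matrix:
# theta0 = np.full(X.shape[1] + 2, 2.0)           # start at the prior means
# fit = minimize(neg_energy, theta0, args=(X, y_dev), method="L-BFGS-B")
# v_hat, *tau_hat, sigma2_hat = np.exp(fit.x)
```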

Simulation analyses

To investigate the behavior of GPR-HCA, we begin with a simple simulation model that allows us to explore the computational efficiency of this method, as well as its predictive accuracy. To do so, we employ the following two-variable model with Gaussian noise,

$$y = \cos \left( {8x_{2} - 3.5} \right) + 0.8\left[ {\sin \left( {4x_{1} x_{2} } \right) + \cos \left( {2x_{1} + 6.66} \right)} \right] + \varepsilon ,\;\;\varepsilon \sim N(0,\gamma )$$
(42)

defined over the unit square \((x_{1} ,x_{2} ) \in [0,1]^{2}\). Unless otherwise noted, the noise variance, \(\gamma\), is set to 0.25. For our later purposes, the associated local marginal effects for this model are given by:

$$\frac{\partial E[y]}{{\partial x_{1} }} = 3.2\,x_{2} \cos \left( {4x_{1} x_{2} } \right) - 1.6\sin \left( {2x_{1} + 6.66} \right)$$
(43)
$$\frac{\partial E[y]}{{\partial x_{2} }} = - 8\,\sin \left( {8x_{2} - 3.5} \right) + 3.2\,x_{1} \cos \left( {4x_{1} x_{2} } \right)$$
(44)
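For reference, the simulated data used throughout this section can be generated in a few lines; the sample size and random seed below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)               # arbitrary seed
n, gamma = 10_000, 0.25                       # noise variance gamma = 0.25, as in the text

X = rng.uniform(size=(n, 2))                  # (x1, x2) on the unit square
x1, x2 = X[:, 0], X[:, 1]

mean_y = np.cos(8 * x2 - 3.5) + 0.8 * (np.sin(4 * x1 * x2) + np.cos(2 * x1 + 6.66))
y = mean_y + rng.normal(0.0, np.sqrt(gamma), size=n)          # Eq. (42)

# True local marginal effects of Eqs. (43) and (44)
lme_x1 = 3.2 * x2 * np.cos(4 * x1 * x2) - 1.6 * np.sin(2 * x1 + 6.66)
lme_x2 = -8.0 * np.sin(8 * x2 - 3.5) + 3.2 * x1 * np.cos(4 * x1 * x2)
```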

This two-variable setup allows the model mean, \(E(y)\), to be displayed visually as in Fig. 8a. This not only provides a contextual feel for the underlying relationship, but also allows a direct comparison with the estimated mean, \(\hat{E}(y)\), in Fig. 8b [to be discussed later].

Fig. 8
figure8

Contour plots of the a true mean dependent variable and b GPR-HCA estimated mean dependent variable

While the most natural benchmark for comparison is in terms of the full model (GPR-FULL), this estimation procedure is constrained to small sample sizes (at most 15,000 samples), rendering such comparisons infeasible for larger datasets. Consequently, we also employ the more scalable algorithms, NNGP and GBM, mentioned in the introduction. This allows scalability comparisons at much larger sample sizes.

In Sect. 4.1, we begin by comparing the scalability and accuracy of GPR-HCA with GPR-FULL (again denoted as HCA and FULL) over a limited range of simulated sample sizes from model (42), and then consider more extended-range comparisons with GBM and NNGP in Sect. 4.2. Finally, we examine both the scalability and accuracy of Local Marginal Effects for GPR-HCA in Sect. 4.3.

Limited-range comparisons with GPR-FULL

Here it is instructive to compare these two methods both with respect to parameter estimation and out-of-sample predictions.

Parameter estimation

Turning first to the relative scalability of parameter estimation procedures, the computation times for estimating parameters, \(\theta = (v,\tau_{1} ,\tau_{2} ,\sigma^{2} )\), by both HCA and FULL are shown in Fig. 9 for a selected range of sample sizes up to 15,000. Here (as in all examples to follow) HCA is parameterized using leaves of maximum size 1,000 together with 150 landmark points at each hierarchical level. From this figure it is evident that even for sample sizes as small as a few hundred, HCA is already orders of magnitude faster than FULL.

Fig. 9
figure9

Comparison of computation times for GPR-HCA and GPR-FULL

To gauge the similarity of parameter estimates for these two methods, it is instructive to compare the energy functions (41) generated by HCA and FULL for a specific case (using a training set of 2117 points). Following Chen and Stein [C2, Fig. 5], we focus on a slice of the energy function over the \((\tau_{1} ,\tau_{2} )\) plane, holding all other parameters at their optimal values. Results for these two key parameters are shown for FULL and HCA in Fig. 10a, b, respectively, where length scales are plotted in terms of their log values, \(\ln (\tau_{1} )\) and \(\ln (\tau_{2} )\), and where the large dot in each figure denotes the optimal parameter values. Here it is clear that with only 150 landmark points, the energy functions are virtually identical in shape and size. More generally, it appears from further experiments with this model that Mean Absolute Errors of prediction are essentially flat beyond \(r = 150\) landmark points. This is consistent with the simulation results of Chen and Stein [C2, p.16], who found that \(r = 125\) was sufficient. However, as discussed further in Sect. 5 below, larger numbers of landmark points may be required in situations where the data are less well behaved than in these simulation models.

Fig. 10
figure10

Energy function for a FULL and b HCA

Out-of-sample prediction

With respect to predictions, comparable batch-sample procedures were carried out for a range of random test samples up to 350 K. Computation times for HCA and FULL are shown in Fig. 11a [where the linearity of computation times for FULL as well as HCA results from the batch nature of such computations]. While such times are seen to be about twice as large for FULL in the present illustration, these time values depend critically on the size of the training set used (and large training sets are infeasible for FULL).

Fig. 11
figure11

Comparisons of batch-sample predictions for FULL and HCA with respect to a computation times, and b accuracy in terms of MAE

Turning to the comparison of mean absolute errors in Fig. 11b, these errors are in fact so close in value that the blue curve for FULL cannot even be seen. So for predictions as well as parameter estimates, the key point is again that even in the range where the full version of GPR is feasible, HCA displays a high degree of fidelity to the FULL outcomes while being dramatically faster. In the present example it is also of interest to note that these mean absolute errors are remarkably small. In fact, for the model in (42) with normal errors, the absolute deviations of simulated values about the mean are well known to be distributed as a “folded normal” with mean \(\sqrt {2\gamma /\pi }\), which for the present case of \(\gamma = 0.25\) yields 0.3989. So it should be clear that the prediction errors above are almost entirely due to fluctuations generated by the model error term itself.

Extended-range comparisons with NNGP and GBM

As should be evident from Fig. 9, the linear scalability properties of GPR-HCA allow for the analysis of data sets vastly larger than those feasible for GPR-FULL.Footnote 8 So for larger data sets, it is appropriate at this point to compare HCA with other well-known linearly scalable prediction models, namely NNGP and GBM, as mentioned above. Computation times (averaged across 3 different runs) over a selected range of sample sizes up to n = 500,000 are shown for HCA, NNGP, and GBM in Fig. 12a.Footnote 9 These times also include predictions for a randomly selected set of 10,000 out-of-sample points.Footnote 10 Using these points, the relative prediction accuracy is then compared in terms of mean absolute errors (MAE), as shown in Fig. 12b.

Fig. 12
figure12

Comparisons of GPR-HCA with both NNGP and GBM in terms of a Computing times, and b Mean absolute errors

The key feature of the computing-time plots in Fig. 12a is their approximate linearity, which underscores the demonstrable fact that all three procedures have complexity of order \(O(n)\). However, it should be stressed that the relative magnitudes of these computation times are more difficult to compare. On the one hand, both the NNGP and GBM models involve many alternative specifications, as well as tuning parameters that have not been fully optimized. In particular, the version of NNGP used here is the conjugate version (spConjNNGP) in the R package, spNNGP, with default settings including an exponential specification of the kernel function (as documented in Finley et al. 2020). With respect to GBM, the cross-validation method used to gauge iteration numbers involves many repetitions of model estimations (and can be replaced by faster but less accurate methods). On the other hand, it should be emphasized that both NNGP and GBM have been written in optimized C/C++ code, which is well known to be dramatically faster than the Matlab code used here for HCA.

Turning next to the relative accuracy of such predictions, it is clear from Fig. 12b that for this simulation example, HCA is uniformly more accurate than both NNGP and GBM. Moreover, while the MAE values exhibited by HCA are almost identical to the model fluctuations themselves (as mentioned at the end of Sect. 4.1 above), those of both NNGP and GBM are noticeably higher. However, it must again be emphasized that there is a speed-accuracy tradeoff here, especially for GBM. We elected to use 10,000 trees for GBM, which is a typical size in practice. The results for 100,000 trees (not shown) yield predictions very close to HCA, though with computing times that are actually slower than HCA. For NNGP, we have increased the default value of k = 2 to k = 5 in the k-fold cross-validation procedure for estimating covariance parameters. But even larger values appear to have little effect on prediction accuracy. However, it should also be noted that, unlike expression (4) above, the kernel functions employed in NNGP are isotropic, and thus somewhat less flexible than (4) for prediction purposes. In summary, the essential message of Fig. 12 from our point of view is that the present hierarchical covariance approximation method is competitive with existing alternative models in terms of both its scalability and prediction accuracy.

To gain further appreciation for the accuracy of this method, the two-dimensional nature of our present simulation model allows a direct visual comparison of the contours of \(E(y)\) in Fig. 8a above with the estimated contours, \(\hat{E}(y)\), for GPR-HCA as shown in Fig. 8b above (for a training sample of 12,569 observations and 150 landmark points).Footnote 11 Here the remarkable similarity of these contours underscores the ability of GPR-HCA to faithfully capture the full structure of the underlying model.

Fig. 13
figure13

Contour plot of a \(LME(y|x_{1} )\,\) and b \(\widehat{LME}(y|x_{1} )\,\)

Fig. 14
figure14

Contour plot of a \(LME(y|x_{2} )\,\) and b \(\widehat{LME}(y|x_{2} )\,\)

Scalability and accuracy of local marginal effects

As detailed in Dearmon and Smith (2017), a key attractive feature of GPR-FULL is its ability to predict not only \(E(y)\) values at out-of-sample points but also to estimate the local rates of change of these values with respect to key explanatory variables, i.e., the Local Marginal Effects (LMEs) given by expression (9) above. In particular, for the specific squared exponential kernel in expression (4), it follows by direct calculation that for prediction points, \(X_{*} = [x_{*l} :l = 1, \ldots ,n_{t} ]\),

$$\frac{{\partial K\left( {x_{*l} ,X} \right)}}{{\partial x_{*l,j} }} = - \tfrac{1}{{\tau_{j}^{2} }}\left[ {k\left( {x_{*l} ,x_{1} } \right)\left( {x_{*l,j} - x_{1j} } \right), \ldots ,k\left( {x_{*l} ,x_{n} } \right)\left( {x_{*l,j} - x_{nj} } \right)} \right],\quad j = 1, \ldots ,d$$
(45)
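Combining (45) with (9), the LME of variable j at each prediction point reduces to a single matrix–vector product once the training-data weights \(\alpha = [K(X,X) + \sigma^{2} I_{n} ]^{ - 1} (Y - \mu )\) have been computed. A dense illustrative sketch follows (the HCA implementation routes the same product through operation O.4 instead of forming \(K(X_{*} ,X)\) explicitly):

```python
import numpy as np

def lme(Xs, X, alpha, v, tau, j):
    """Local marginal effect of variable j at each prediction point in Xs,
    i.e. Eq. (9) evaluated with the kernel gradient of Eq. (45).

    alpha : precomputed [K(X,X) + sigma^2 I]^{-1} (Y - mu) from the training data
    """
    D = (Xs[:, None, :] - X[None, :, :]) / tau                 # scaled differences
    Ksx = v * np.exp(-0.5 * (D**2).sum(-1))                    # K(X_*, X)
    grad = -(Xs[:, j:j+1] - X[None, :, j]) / tau[j]**2 * Ksx   # dK(x_*, X)/dx_{*,j}
    return grad @ alpha
```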

Here we consider how well the more scalable GPR-HCA version captures these same effects.Footnote 12 With respect to computation times for LME predictions, it is enough to note that these times now depend not only on the particular batch scheme employed, but also on the number of explanatory variables considered.Footnote 13 Other than these more complex dependencies, the results for our simulation model (not shown) continue to be linear, and are qualitatively similar to the linear graph for HCA predictions in Fig. 11a.

Of more interest for our present purposes is the accuracy of these LME predictions. As with the comparisons of \(E(y)\) and \(\hat{E}(y)\) in Fig. 8 above, the quality of LME predictions is best seen visually. In Figs. 13 and 14 below we compare contour plots of the exact LMEs for this simulation model with their associated predictions based on GPR-HCA. If the true partial derivatives with respect to each variable, \(x_{j}\), [given in (43) and (44)] are denoted by \(LME(y|x_{j} )\,\), and if the associated estimates based on HCA [obtained from (9) together with (45)] are denoted by \(\widehat{LME}(y|x_{j} )\), then the contour plots for \(LME(y|x_{1} )\,\) and \(\widehat{LME}(y|x_{1} )\,\) are shown in Fig. 13, and those of \(LME(y|x_{2} )\,\) and \(\widehat{LME}(y|x_{2} )\,\) are shown in Fig. 14.

These two figures suggest that the ability of GPR-FULL to capture local marginal effects is indeed well preserved by GPR-HCA (even with only 150 landmark points at each hierarchy level). Moreover, while the smooth nature of our present two-dimensional example allows these derivatives to be easily plotted and visualized, the empirical example developed in the next section shows that such local marginal effects can also be identified in more realistic multi-dimensional applications.Footnote 14

Empirical application: housing prices in Oklahoma County

In this final section, GPR-HCA is applied to residential parcels found in the Oklahoma County Assessor’s Database. As seen in Panel (a) of Fig. 15 below, Oklahoma County is centrally located within the state and contains several cities including Edmond, Bethany, and Nichols Hills as well as the largest portion of the state capital, Oklahoma City. From a real estate perspective, Oklahoma City represents a secondary or tertiary investment market; significant heterogeneity exists across parcels, more than would typically be present in a dense tier-one urban corridor.

Fig. 15
figure15

a Map of Counties in Oklahoma, b Map of Census Tracts in Oklahoma County

The spatially varying size of census tracts seen in Panel (b) is suggestive of this heterogeneity. The downtown core is found in the densest area of tracts. But this density dissipates as one moves outwards towards the borders of the county, especially to the North and East. This heterogeneity presents unique modeling challenges that are very much absent from our well-behaved simulation above.

Technical considerations

Consequently, some key enhancements are necessary to improve HCA’s effectiveness for this real-world application. Of particular concern is the stability of the optimization routine, an issue previously noted by Chen and Stein (2017). With respect to Matlab in particular, we have found that greater stability can be achieved in all HCA algorithms by replacing the standard inverse operator, inv, with Matlab’s backslash operator, which solves the corresponding linear systems directly and is more stable for near-singular matrices.Footnote 15

As a secondary enhancement, a more judicious and careful selection of landmark points is used here. Rather than randomly drawing points (as done in the simulations above), we opt for k-means clustering of our x values using scaled distances based on preliminary length-scale estimates.Footnote 16 The number of clusters is set to the desired number of landmark points, and the training point nearest the centroid of each cluster is selected as the landmark point for that cluster. This ensures a diverse spread of points across the appropriate region of the tree.Footnote 17
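A sketch of this selection rule, using scikit-learn's KMeans as a stand-in for the authors' Matlab routine (the function below and its defaults are our own choices): scale the attributes by the preliminary length-scale estimates, cluster, and keep the training point nearest each centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_landmarks(X, tau_hat, n_landmarks, seed=0):
    """Landmark points chosen as the training points nearest to k-means centroids
    of the length-scale-scaled attributes."""
    Z = X / tau_hat                                          # scale by preliminary length scales
    km = KMeans(n_clusters=n_landmarks, n_init=10, random_state=seed).fit(Z)
    nearest = [int(np.argmin(((Z - c) ** 2).sum(axis=1))) for c in km.cluster_centers_]
    return X[np.array(nearest)]
```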

Even with this enhanced selection procedure, it was found (by cross validation) that substantial gains in accuracy could be achieved by increasing the number of landmark points to 1250 at each level. Finally, it was also found that when larger numbers of landmark points are used, a two-stage estimation procedure can often improve efficiency. By using a small number of landmark points in a first stage to obtain initial parameter estimates, convergence times for a larger number of landmark points in a second stage can be substantially reduced.

Housing data

Data for this application are taken from the Oklahoma County Assessor’s database from 2010 to 2018 (as well as 2019 for building permit information); most of these are certification databases required for assessments. To minimize data errors and outlier events, we focus on residential sales greater than $20,000 and involving houses of more than 100 square feet on 3 acres or less. The training dataset consists of 110,837 residential sales, and is used to predict sales prices for 220,030 parcels as if sold in June 2018. For purposes of this exercise, just eight explanatory variables are used for price prediction: sale date, locational coordinates, lot size, square feet, year built, neighborhood code, and subdivision id.Footnote 18 Summary statistics for these datasets are provided in Table 1. Prediction data are, on average, associated with older, smaller homes located in more established neighborhoods. Sales data are consistent with the idea of suburban residential development, where substantial numbers of new, large homes are developed and sold, pushing up the average square footage and year built.

Table 1 Summary statistics for housing data

Turning next to model results, we begin in Fig. 16 with a spatial comparison of GPR-HCA predictions and corresponding Assessor assigned market values for each of the 220,030 parcels.

Fig. 16
figure16

a Sales Values predicted by GPR-HCA, b Oklahoma County Assessor Market Values (the yellow and green boxes are discussed below) (Color figure online)

From a visual perspective, the GPR-HCA predictions are seen to match quite well with the County Assessor’s Market Values (with a mean predicted value of $167,000, slightly higher than the Assessor mean of $163,000).

We next compare out-of-sample performance across the three techniques: GPR-HCA, GBM and NNGP. To do so, recall first that only assessed market values are available for the 220,030 prediction parcels shown in Fig. 16b. So, in order to compare model performance in terms of actual sales data, we randomly partition the full training set of 110,837 parcels with sales data into a smaller training set of 75,000 parcels, and an out-of-sample test set of 35,837 parcels. Mean Absolute Error (MAE) performance is then evaluated based on the match between out-of-sample predictions and corresponding sales values for parcels in this test set. For NNGP, these predictions use the spNNGP package in R with 15 neighbors and 10,000 MCMC draws. For GBM, the R package GBM/dismo is used with 100,000 trees. Results are displayed in Fig. 17 below. Here it is clear that the GPR-HCA error distribution is more sharply peaked around zero, with an MAE of $31,113 versus $33,275 for NNGP and $36,161 for GBM.

Fig. 17
figure17

a Histogram of prediction errors (sales–predicted) for GPR-HCA (MAE = $31,113), b Histogram of prediction errors for NNGP (MAE = $33,275), c Histogram of prediction errors for GBM (MAE = $36,161)

However, the comparative plots of Predicted Values against Sales Values in Fig. 18, for GPR-HCA versus NNGP (Panel A) and versus GBM (Panel B) [again with blue denoting GPR-HCA predictions], show that GPR-HCA exhibits a few underestimation errors that are noticeably more extreme than those of either NNGP or GBM.Footnote 19 We return to this issue in the concluding remarks. But for the present we simply note that these outliers involve less than 0.01% of the entire sample.

Fig. 18
figure18

a Predicted versus Sales Values in millions of dollars, where blue points denote GPR-HCA predictions and red points denote NNGP predictions, b similar plot with red points now denoting GBM prediction (Color figure online)

Local marginal effects

We now turn our attention to a local marginal effect analysis for the full prediction set of 220,030 parcels, which represents a key contribution of the present work. For this empirical exercise, we focus on the local marginal effect of square feet (LME_sqft), which is the estimated impact of an additional square foot on sales price, given the other attributes of a house. Estimates of LME_sqft were obtained for all prediction parcels. The overall distribution of these values, shown in Panel (a) of Fig. 19, is seen to be roughly normally distributed about a mean value of $64.55 (which is just below the low end of per-square-foot costs for adding new square footage; see footnote 20). We also show the spatial distribution of LME_sqft values in Panel (b). A comparison with Fig. 16 above suggests that such magnitudes are sensitive to location, and that larger magnitudes of LME_sqft are roughly associated with higher home prices.
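
To fix ideas, recall that LME_sqft is the partial derivative of the GPR predictive mean with respect to the (standardized) square-feet attribute, which for the squared-exponential kernel has a simple closed form. The following minimal Matlab sketch computes this quantity at a single prediction point, xstar, using dense kernel operations; here sf2, ell, alpha and d are our own names for the signal variance, the vector of length scales, the usual GPR weight vector \((K + \sigma^{2} I)^{-1}(y - \mu)\), and the column index of square feet, and the actual GPR-HCA implementation of expression (9) replaces these dense operations with their hierarchical counterparts.

  % Local marginal effect of one attribute at a single prediction point (dense sketch)
  % X : n-by-p training attributes;  xstar : 1-by-p prediction point
  kstar = sf2 * exp(-0.5 * sum(((xstar - X)./ell).^2, 2));        % k(xstar, x_i) for all i
  LME_d = -((xstar(d) - X(:,d)) / ell(d)^2)' * (kstar .* alpha);  % d(pred. mean)/d(x_d) at xstar
  % (rescaling to undo any standardization of attributes and prices is omitted here)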

Fig. 19

a Frequency distribution of LME_sqft for the 220,030 prediction parcels in Oklahoma County (with positive values in shades of red, and negative values in blue). b Spatial distribution of LME_sqft for these parcels (boxes are repeated from Fig. 16) (Color figure online)

More importantly, with the finer resolution implicit in LME analysis, we can now uncover more nuanced and detailed economic phenomena that would have been obscured by less granular methods. As examples, we focus on two smaller areas within Oklahoma County which appear to involve somewhat different aspects of economic development. In view of space limitations, we provide only an informal examination of these aspects.

We begin with the densely populated area of Oklahoma City shown in Fig. 20 (corresponding to the green box in Fig. 16). Here residences are characterized by small lots laid out on a fairly uniform grid. The top two panels show predicted and assessed values in this area, again reflecting the goodness of fit seen at the county-wide level in Fig. 16 [where the smaller prices in the legend of panel (b) here reflect the sparsity of homes above $1 million in this area]. The most expensive homes in the southeast corner are just north of downtown, and consist of the two historic neighborhoods, Heritage Hills and Mesta Park. (The red neighborhoods further north, Edgemere and Crown Heights, are also historic areas.) Turning to the estimates of LME_sqft in panel (c) [corresponding to the green box in Fig. 19], we see that within the highest priced Heritage Hills area (denoted by the yellow ellipse) there are a number of negative LME_sqft values, shown in blue. This is indicative of the large homes found in this historic neighborhood, where further expansion is evidently less attractive. In fact, the largest house in our training dataset, at over 20,000 square feet, is located in this neighborhood.

Fig. 20

a Sales Values predicted by GPR-HCA, b Oklahoma County Assessor Market Values, c LME_sqft estimates (with yellow ellipse denoting the highest priced area), and d Building Permits issued in 2018–2019 (Color figure online)

But on the north and west peripheries of this area one sees more uniformly positive values of LME_sqft, where proximity to both this higher priced housing area and downtown appears to offer attractive expansion opportunities. This is further supported by data on building permits for the same period (footnote 21), shown in panel (d), where such permits are seen to be most highly concentrated in this same area. As one moves further away to the north and west, both LME_sqft values and the density of building permits tend to decrease. Taken together, the results are strongly suggestive of the spatial-spillover effects widely studied in the housing literature (see, for example, DeFusco et al. 2018). But while such effects are typically analyzed at a broader regional scale (footnote 22), the present results suggest that LME analysis can provide meaningful information at the local neighborhood level.

While spatial spillovers are associated with trends in LME_sqft values at the neighborhood level and higher, there are also more localized development opportunities associated with individual homes or parcels. One type of local development, referred to in the planning literature as spatial-infill development (Landis et al. 2006; Daisa and Parker 2009; McConnell and Wiley 2010), includes both the development of vacant land in nearly built-up areas and the redevelopment of underutilized parcels. Such development is driven less by spatial trends in housing prices than by local variation in such prices. Adjacent parcels exhibiting a high degree of price variation may have significant differences that can be exploited by developers for profit. For commercial properties, an empty parcel of land sandwiched between two urban high rises is the most obvious example (footnote 23). For residential properties, such differences can be more subtle. For example, older and smaller homes might actually be demolished to make room for more stately homes, provided their locations are in highly desirable areas. Here one might expect smaller homes to exhibit positive LME_sqft effects on price, especially when in close proximity to larger, more expensive homes. Moreover, if these larger homes themselves tend to be overbuilt, the effect of an additional square foot might in fact be negative, leading to high local variation in such values.
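
Though not pursued formally in this paper, one simple way to operationalize this notion of local variation is to flag parcels whose neighboring LME_sqft values are highly dispersed. A minimal Matlab sketch of such a screen is given below, where coords holds the parcel coordinates and LME_sqft the estimated effects; the neighborhood size and the 95th-percentile cutoff are purely illustrative, and knnsearch and quantile require the Statistics and Machine Learning Toolbox.

  % Flag parcels whose nearby LME_sqft values are highly dispersed (illustrative screen)
  k     = 25;                                   % number of neighboring parcels
  idx   = knnsearch(coords, coords, 'K', k+1);  % k+1 nearest parcels (first is the parcel itself)
  idx   = idx(:, 2:end);                        % drop each parcel itself
  locSD = std(LME_sqft(idx), 0, 2);             % local dispersion of LME_sqft
  flag  = locSD > quantile(locSD, 0.95);        % top 5% flagged as candidate infill areas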

In our present data, a good example is provided by the small city of Nichols Hills just north of Oklahoma City, as shown in panel (b) of Fig. 21 below (the slightly larger region shown in the other three panels corresponds to the yellow boxes in Figs. 16 and 19). The median housing value ($686,300) in this wealthy community is more than four times that of Oklahoma City. The highest priced homes (over $1 million) in Panel (a) are seen from Panel (b) to be clustered around the golf course on the left and the smaller park on the right. The corresponding values of LME_sqft in Panel (c) exhibit much greater volatility than those of Fig. 20 above, and in particular contain many more negative values. The large size of these homes is also evident from the large lots seen in this area. Finally, the building permits shown in Panel (d) are seen to be clustered in and around this same area. So, as indicated in the discussion above, the presence of such price volatility may indeed be creating new opportunities for development.

Fig. 21

a Sales Values predicted by GPR-HCA, b Street Map of Nichols Hills, c LME_sqft estimates, and d Building Permits issued in 2018–2019

While such conjectures clearly require further analysis, the purpose of these examples is mainly to illustrate how this GPR-HCA model and its corresponding LME estimates can in principle be used to quickly identify possible areas for new development in large data sets. Finally, while we have here focused explicitly on the identification of development opportunities in a real estate context, it should be clear that a wide range of additional spatial applications are possible.

Conclusions and directions for further research

In this paper we have systematically developed the hierarchical covariance approximation to Gaussian process regression (GPR-HCA) created by Chen and his co-workers ([C1], [C2]), and have extended this method to include analyses of the local marginal effects (LMEs) generated by this model. Our main objective has been to show how this scalable extension of GPR can be applied to large spatial data sets, such as county assessor data. In particular, we have applied this model to Oklahoma County assessor data, where it was shown that the estimates of both price predictions and local marginal effects generated by GPR-HCA can be used to analyze such data at scales never before possible with standard GPR.

However, the present analysis leaves certain important questions unanswered. A first issue relates to the apparent instability of predictions for extreme values. Investigations with smaller subsets of the Oklahoma data show that this is a problem with GPR-FULL itself, and is not simply a feature of GPR-HCA. In the case of negative predictions, it should be noted that (as with ordinary regression) the fundamental Gaussian assumption itself necessarily allows negative predictions. The standard approach here is to analyze the log of the dependent variable, and convert back to make predictions. But conditional means of log-normal variates do not exhibit the same scalability properties as those of normal variates, and would require extensive parallel computing in order to be implemented for large data sets. However, for housing prices in particular, one possible alternative is to replace standard conditional-mean prediction with predictors more closely related to the common real estate practice of forming offer prices based on weighted averages of recent similar sales (known as “comps”). Initial results using GPR covariances as “similarity weights” appear to be promising, and will be reported in a subsequent paper.
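
To illustrate the kind of predictor we have in mind, the following Matlab sketch forms an offer-price style prediction as a covariance-weighted average of the most similar training sales. The weighting by fitted kernel values reflects our reading of this proposal, and the restriction to the 30 most similar sales is purely illustrative; sf2 and ell again denote the fitted signal variance and length scales.

  % "Comps"-style predictor: weighted average of the most similar sales (sketch only)
  kstar  = sf2 * exp(-0.5 * sum(((xstar - X)./ell).^2, 2));  % GPR similarity to each sale
  [w, j] = maxk(kstar, 30);        % keep the 30 most similar sales as "comps"
  w      = w / sum(w);             % normalize similarities to weights
  yhat   = w' * y(j);              % covariance-weighted average of comparable sale prices

Unlike the conditional mean, this predictor is bounded by the observed prices of its comps, and so cannot go negative.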

A more fundamental issue, with important consequences for both practitioners and researchers, is the treatment of uncertainty in statistical decision making. For example, with respect to the parcel-level investment decisions discussed in our Oklahoma application, measures of uncertainty could help individuals sift through thousands of parcels to identify investment opportunities with higher risk-adjusted rates of return.

But while the GPR model itself does allow for some degree of uncertainty in terms of the predictive distribution in expressions (6)–(8) above, no corresponding posterior distributions are available for either the derivatives of these predictions [i.e., the LME effects in expression (9)] or for the basic parameter estimates, \(\hat{\theta }\), underlying the model itself. While it is in principle possible to use GPR-HCA to approximate posterior distributions for all such quantities in terms of Markov Chain Monte Carlo methods, such an approach currently requires extensive use of parallel computing across many servers. Thus, a key task remaining for desktop applications is to develop direct approximations to the posterior distributions of both parameter estimates and LMEs. One possibility here is the following two-stage approach. First, by applying standard asymptotic likelihood approximations to the joint posterior distribution of \(\hat{\theta }\) (and employing certain extensions of the computational procedures sketched in Sect. 3.4), it is possible to obtain scalable approximations of this distribution. Second, by applying the Delta method to LMEs (as continuously differentiable functions of \(\theta\)), it is possible to obtain corresponding scalable approximations of LME posteriors as well. This approach will be developed in detail in a subsequent paper.
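
For concreteness, the second stage is just the standard Delta-method approximation: writing \(\hat{\Sigma }_{{\hat{\theta }}}\) for the approximate posterior covariance of \(\hat{\theta }\) obtained in the first stage, and viewing a given LME as a smooth function, \(g(\theta )\), we have

$$\text{var} \left[ {g(\theta )\,|\,Y} \right]\; \approx \;\nabla_{\theta } g(\hat{\theta })^{\top } \,\hat{\Sigma }_{{\hat{\theta }}} \,\nabla_{\theta } g(\hat{\theta })$$

so that, beyond the first-stage approximation itself, only the gradient of each LME with respect to \(\theta\) (evaluated at \(\hat{\theta }\)) needs to be computed at scale.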

A final question relates to model uncertainty itself. In the present paper, we have implicitly assumed that all key explanatory variables are known, and that only their relative contributions remain to be determined. However, in a previous paper (Dearmon and Smith 2016), the GPR model was combined with Bayesian model averaging (BMA) to allow both predictions and LMEs to be averaged over sub-models involving different possible subsets of variables. Such GPR-BMA models are of course even more limited in terms of scalability. But the present GPR-HCA model is directly extendable to this BMA framework, and this extension will be developed in a subsequent paper.

Data availability

Database exports from the Oklahoma County Assessor are available from their office upon request and can be used for empirical research.

Code availability

Matlab, using the GPstuff toolbox and custom Matlab code.

Notes

  1. To allow comparability of length scales, individual attribute variables are implicitly assumed to be standardized.

  2. Note in particular from (4) above that for test locations, \(x_{*l}\), far from all training locations, \(X = (x_{1} , \ldots ,x_{n} )\), the covariance vector, \(K(x_{*l} ,X)\), must approach zero. This in turn implies from (7) together with model (1) that the corresponding predictions, \(\hat{y}_{l} = E(Y_{*l} |Y) = E(f_{*l} |Y)\, + \mu\), must necessarily approach the mean, \(\mu\). Such (extrapolated) predictions thus exhibit “mean reversion”.

  3. In this regard, the most popular alternative kernel function, namely the simple exponential kernel (as for example in Genton 2001), is overly sensitive to small differences between similar attribute profiles. Within the larger family of Matern kernels, the squared exponential kernel is also the simplest to analyze from both estimation and inference perspectives.

  4. These are also referred to as “inducing” points (as for example in Rasmussen and Quinonero-Candela 2005).

  5. Note that whenever \(X_{i} \cap X_{r} \ne \emptyset\), the conditional covariance matrix, \(K_{ii} - K_{ir} \,K_{rr}^{ - 1} K_{ri}\), in (14) must be singular. But as will be seen in footnote 6 below, this has no substantive consequences for the model constructed.

  6. As seen in (22) below, the random vectors, \(H_{1}\) and \(H_{2}\), have full rank covariance matrices, and thus are properly multi-normally distributed even when \(Z_{1|r}\) and \(Z_{2|r}\) are singular multi-normal [see for example Anderson (1958, Theorem 2.4.5)].

  7. Blake (2015). Fast and Accurate Symmetric Positive Definite Matrix Inverse, Matlab Central File Exchange.

  8. In fact, the temperature application of Chen and Stein [C2] involves more than 2 million observations.

  9. The explicit sample sizes shown are [10,000, 20,000, 40,000, 80,000, 160,000, 320,000, 500,000].

  10. Computation times for HCA automatically include calculations of Local Marginal Effects at each prediction point (which are not directly relevant for either NNGP or GBM). But these add little in the way of time differences.

  11. These predictions are computed for a regular grid of points in \([0,1]^{2}\) and, in a manner similar to Fig. 8a, contours are then interpolated and plotted using the Matlab program, contour.m. Similar procedures are used to obtain Figs. 13b and 14b below.

  12. It should also be noted that “local marginal effects” are much more problematic for both NNGP and GBM models, and are not considered here. In NNGP, the dominant effects of predictors are regression-like global mean effects, with only the spatial error term modeled as a nearest-neighbor Gaussian process. In GBM, prediction surfaces are either locally flat or at best governed by the type of weak-learner functions used. So local behavior is of less interest in either of these prediction models.

  13. In particular, by adding “irrelevant” variables to our simulation model, experiments showed that the combined time for computing predictions together with LMEs for any given variable is approximately linear in the total number of variables. In the present case with 12,569 sample points, 150 landmark points, and using only in-sample calculations of LMEs, the added time for each new irrelevant variable was approximately 30 s.

  14. LME performance deteriorates under higher error variance. Within this simulation setting, we multiplied the error standard deviation of 0.5 by a scaling factor ranging from 0.5 to 2.5 in increments of 0.5. Regression results suggest that, for every one-unit increase in this scaling factor, MAE increases by 0.06 for the LME of \(x_{1}\) and by 0.12 for the LME of \(x_{2}\).

  15. This is consistent with Matlab’s own findings that “Using A\b instead of inv(A)*b is two to three times faster, and produces residuals on the order of machine accuracy relative to the magnitude of the data” (https://www.mathworks.com/help/matlab/ref/inv.html).

  16. For well-behaved data sets, a random selection of landmark points is probably sufficient and much less costly to execute. But, as noted by Chen and Stein (2021, p. 15), in more challenging cases such as our present housing-price data, a careful selection of landmarks can substantially reduce approximation errors. The primary drawback is computation time: tree construction is much more costly with k-means. For a simulated data set with 311,422 observations and 500 landmark points, random selection took 28 s while k-means took almost 5 min. Nonetheless, k-means is a commonly used way of selecting inducing points (see for example Park and Choi 2010; Hensman et al. 2015).

  17. A more indirect type of modification is to allow tree indexing on different subsets of attributes. For our present purposes, we partitioned on all variables except sale year, which made the grouping more spatial than temporal.

  18. The Assessor data also provide estimates of market value for each parcel. These are used only in the assessment of predictive model performance.

  19. Note in particular the row of blue points with underestimated values (\(\approx \,\)$172,000) that roughly correspond to the sample mean price of the data. As mentioned in footnote 2 above, these are instances of parcels with extreme attribute profiles involving extrapolated price predictions that exhibit mean reversion. However, such outliers are usually identified easily and can be analyzed separately.

  20. A cursory search suggests that the lower bound on a room addition is about $80 per square foot (as for example in https://www.ownerly.com/home-improvement/home-addition-cost/, https://www.homeadvisor.com/cost/additions-and-remodels/build-an-addition/, and https://www.homelight.com/blog/room-addition-cost/).

  21. This building permit data is taken from the County Assessor’s database, with dates issued in 2018 or later. Building costs for permits in our data set all exceed $5,000.

  22. One exception is the recent paper by Cohen and Zabel (2020), which analyzes such spillover effects at the census tract level in the Greater Boston Area.

  23. Such situations do not usually occur in tier-one markets.

References

  1. Anderson TW (1958) Introduction to multivariate statistical analysis. Wiley, New York


  2. Chen J, Avron H, Sindhwani V (2017) Hierarchically compositional kernels for scalable nonparametric learning. J Mach Learn Res 18:1–42


  3. Chen J, Stein ML (2017) Linear-cost covariance functions for Gaussian random fields. arXiv:1711.05895

  4. Chen J, Stein ML (2021) Linear-cost covariance functions for Gaussian random fields. J Am Stat Assoc, pp 1–43

  5. Cohen JP, Zabel J (2020) Local house price diffusion. Real Estate Econ 48:710–743


  6. Daisa JM, Parker T (2009) Trip generation rates for urban infill land uses in California. ITE J 79(6):30–39


  7. Datta A, Banerjee S, Finley AO, Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 111(514):800–812


  8. Dearmon J, Smith TE (2016) Gaussian process regression and Bayesian model averaging: an alternative approach to modeling spatial phenomena. Geogr Anal 48:82–111


  9. Dearmon J, Smith TE (2017) Local marginal analysis of spatial data: a Gaussian process regression approach with Bayesian model and kernel averaging. Spat Econom Qual Limit Depend Var 37:297–342


  10. DeFusco A, Ding W, Ferreira F, Gyourko J (2018) The role of price spillovers in the American housing boom. J Urban Econ 108:72–84


  11. Finley AO, Datta A, Banerjee S (2020) spNNGP: R package for nearest neighbor Gaussian process models. arXiv:2001.09111v1 [stat.CO]

  12. Genton MG (2001) Classes of kernels for machine learning: a statistics perspective. J Mach Learn Res 2:299–312


  13. Hensman J, Matthews AG, Filippone M, Ghahramani Z (2015) MCMC for variationally sparse Gaussian processes. In: Advances in neural information processing systems, pp 1648–1656

  14. Landis JD, Hood H, Li G, Rogers T, Warren C (2006) The future of infill housing in California: opportunities, potential, and feasibility. Hous Policy Debate 17(4):681–725


  15. McConnell V, Wiley K (2010) Infill development: perspectives and evidence from economics and planning. Resour Fut 10:1–34


  16. Park S, Choi S (2010) Hierarchical Gaussian process regression. In: Proceedings of 2nd Asian conference on machine learning, pp 95–110

  17. Rasmussen C, Quinonero-Candela J (2005) A unifying view of sparse approximate Gaussian process regression. J Mach Learn Res 6:1939–1959


  18. Ridgeway G (2007) Generalized boosted models: a guide to the gbm package. Update 1(1):2007


  19. Vanhatalo J, Riihimäki J, Hartikainen J, Jylänki P, Tolvanen V, Vehtari A (2013) GPstuff: Bayesian modeling with Gaussian processes. J Mach Learn Res 14(Apr):1175–1179



Funding

Not applicable.

Author information


Corresponding author

Correspondence to Jacob Dearmon.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


To show that expressions (34) and (37) are indeed the actual covariances of the random vectors in expression (33) of the text, it is convenient to introduce further simplifying notation. For each possible root path, \(i_{1} \, \to \,i_{2} \, \to \, \cdots \, \to \,i_{m - 1} \, \to \,i_{m} \to \,r\), let \(H_{{i_{1} \,i_{2} \, \cdots \,i_{m} \,r}}\) be defined recursively for paths of length one by

$$H_{{i_{1} \,r}} \, = \,Z_{{i_{1} \,|\,r}} \, + \,A_{{i_{1} \,r}} \,Z_{r}$$
(A.1)

[as in (18) of the text] and for longer paths by

$$H_{{i_{1} \,i_{2} \, \cdots \,i_{m} \,r}} \, = \,Z_{{i_{1} \,|\,i_{2} }} + \,A_{{i_{1} \,i_{2} \,}} H_{{i_{2} \cdots \,i_{m} \,r\,}}$$
(A.2)

Then, (A.1) together with the argument in (14) through (17) of the text again shows that for paths of length one,

$${\text{cov}} \left( {H_{{i_{1} \,r}} } \right)\, = \,K_{{i_{1} \,i_{1} }}$$
(A.3)

So if it is hypothesized that

$${\text{cov}} (H_{{i_{1} \, \cdots \,i_{m} \,r}} ) = K_{{i_{1} \,i_{1} }}$$
(A.4)

holds for all paths of length m, then for paths of length \(m + 1\) it follows from the independence of \(Z_{{i_{1} \,|\,i_{2} }}\) and \(H_{{i_{2} \cdots \,i_{m + 1} \,r\,}}\), together with (A.4), that [again from the argument in (14) through (17) in the text],

$$\begin{aligned} {\text{cov}} \left( {H_{{i_{1} \,i_{2} \, \cdots \,i_{m} \,i_{m + 1} \,r}} } \right) & = {\text{cov}} \left( {Z_{{i_{1} \,|\,i_{2} }} + A_{{i_{1} \,i_{2} \,}} H_{{i_{2} \cdots \,i_{m + 1} \,r\,}} } \right) \\ & = {\text{cov}} \left( {Z_{{i_{1} \,|\,i_{2} }} } \right) + A_{{i_{1} \,i_{2} }} {\text{cov}} \left( {H_{{i_{2} \cdots \,i_{m + 1} \,r}} } \right)A_{{i_{2} \,i_{1} }} \\ & = K_{{i_{1} \,i_{1} }} - K_{{i_{1} \,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} K_{{i_{2} \,i_{1} }} + \left( {K_{{i_{1} \,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} } \right)\left[ {K_{{i_{2} \,i_{2} }} } \right]\left( {K_{{i_{2} \,i_{2} }}^{ - 1} K_{{i_{2} \,i_{1} }} } \right) \\ & = K_{{i_{1} \,i_{1} }} - K_{{i_{1} \,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} K_{{i_{2} \,i_{1} }} + K_{{i_{1} \,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} K_{{i_{2} \,i_{1} }} = K_{{i_{1} \,i_{1} }} \\ \end{aligned}$$
(A.5)

So by induction, (A.4) must hold for all m. But for any leaf, \(i\), with root path, \(i\, \to i_{1} \, \to \cdots \, \to i_{m} \to \,r\), this implies at once that

$${\text{cov}} \left( {H_{i} } \right) = {\text{cov}} \left( {H_{{i\,\,i_{1} \cdots i_{m} \,r}} } \right) = K_{i\,i}$$
(A.6)

and thus that expression (34) in the text must hold.

It remains to establish expression (37) in the text for any distinct leaves, \(i\) and \(j\), with root paths as in (35) and (36) (where again this is taken to include the case, \(s = r\)). To do so, we first expand \(H_{i} = H_{{i\,\,i_{1} \cdots i_{p} \,s\,h_{\,1} \, \cdots \,h_{m} \,r}}\) and \(H_{j} = H_{{j\,\,j_{1} \cdots j_{q} \,s\,h_{\,1} \, \cdots \,h_{m} \,r}}\) as follows:

$$H_{i} = Z_{{i\,|\,i_{1} }} + A_{{i\,i_{1} }} Z_{{i_{1} \,|\,i_{2} }} + \left( {A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} } \right)Z_{{i_{2} \,|\,i_{3} }} + \cdots + \left( {A_{{i\,\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p - 1} \,i_{p} }} } \right)Z_{{i_{p} \,|\,s}} + \left( {A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p} \,s}} } \right)H_{{s\,h_{\,1} \cdots h_{m} \,r}}$$
(A.7)
$$H_{j} = Z_{{j\,|\,j_{1} }} + A_{{j\,j_{1} }} Z_{{j_{1} \,|\,j_{2} }} + \left( {A_{{j\,j_{1} }} A_{{j_{1} \,j_{2} }} } \right)Z_{{j_{2} \,|\,j_{3} }} + \cdots + \left( {A_{{j\,j_{1} }} A_{{j_{1} \,j_{2} }} \cdots A_{{j_{q - 1} \,j_{q} }} } \right)Z_{{j_{q} \,|\,s}} + \left( {A_{{j\,j_{1} }} A_{{j_{1} \,j_{2} }} \cdots A_{{j_{q} \,s}} } \right)H_{{s\,h_{\,1} \cdots h_{m} \,r}}$$
(A.8)

Next recall that since the random variables \((Z_{{i\,|\,i_{1} }} ,Z_{{i_{1} \,|\,i_{2} }} ,\, \ldots ,\,Z_{{i_{p} \,|\,s}} ,\,Z_{{j\,|\,j_{1} }} ,Z_{{j_{1} \,|\,j_{2} }} ,\, \ldots ,\,Z_{{j_{q} \,|\,s}} ,H_{{s\,h_{\,1} \cdots h_{m} \,r}} )\) are all independent, it follows [as for example in (31) of the text] that all covariance terms between \(H_{i}\) and \(H_{j}\) are zero except for the shared term involving \(H_{{s\,h_{\,1} \cdots h_{m} \,r}}\), so that,

$$\begin{aligned} {\text{cov}} \left( {H_{i} ,H_{j} } \right) & = {\text{cov}} [(A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p} \,s}} )\,H_{{s\,h_{\,1} \cdots h_{m} \,r}} \,,\,\,(A_{{j\,j_{1} }} A_{{j_{1} \,j_{2} }} \cdots A_{{j_{q} \,s}} )\,H_{{s\,h_{\,1} \cdots h_{m} \,r}} ] \\ & = \left( {A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p} \,s}} } \right){\text{cov}} \left( {H_{{s\,h_{\,1} \cdots h_{m} \,r}} } \right)\left( {A_{{s\,j_{q} }} \cdots A_{{j_{2} \,j_{1} }} A_{{j_{1} \,j}} } \right) \\ \end{aligned}$$
(A.9)

But this implies at once from (A.5) that

$${\text{cov}} \left( {H_{i} ,H_{j} } \right) = A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p} \,s}} \left( {K_{ss} } \right)A_{{s\,j_{q} }} \cdots A_{{j_{2} \,j_{1} }} A_{{j_{1} \,j}}$$
(A.10)

and thus that expression (37) must hold.
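
As an informal numerical check on these identities, the one-level case of (A.3) and (A.10) (two leaves sharing the root, with \(s = r\) and paths of length one) can be verified directly by Monte Carlo simulation. The following Matlab sketch, with arbitrary toy dimensions and a squared-exponential kernel, is included only for illustration; both displayed norms should be close to zero up to sampling error.

  % Monte Carlo check of cov(H_1) = K_11 and cov(H_1,H_2) = A_1r K_rr A_2r'
  rng(1);
  n1 = 3; n2 = 3; nr = 2;                       % toy leaf and landmark sizes
  X  = rand(n1+n2+nr, 2);                       % toy locations in [0,1]^2
  D2 = sum(X.^2,2) + sum(X.^2,2)' - 2*(X*X');   % squared distances
  K  = exp(-D2/(2*0.3^2));                      % squared-exponential kernel
  i1 = 1:n1;  i2 = n1+(1:n2);  r = n1+n2+(1:nr);
  A1 = K(i1,r)/K(r,r);   A2 = K(i2,r)/K(r,r);   % A_{ir} = K_ir * K_rr^{-1}
  C1 = K(i1,i1) - A1*K(r,i1);                   % conditional covariance of Z_{1|r}
  C2 = K(i2,i2) - A2*K(r,i2);                   % conditional covariance of Z_{2|r}
  N  = 2e5;  jit = 1e-10;                       % sample size and numerical jitter
  Zr = chol(K(r,r)+jit*eye(nr),'lower')*randn(nr,N);        % draws of Z_r
  H1 = chol(C1+jit*eye(n1),'lower')*randn(n1,N) + A1*Zr;    % H_1 = Z_{1|r} + A_{1r} Z_r
  H2 = chol(C2+jit*eye(n2),'lower')*randn(n2,N) + A2*Zr;    % H_2 = Z_{2|r} + A_{2r} Z_r
  H1c = H1 - mean(H1,2);  H2c = H2 - mean(H2,2);
  disp(norm(H1c*H1c'/(N-1) - K(i1,i1)))         % close to 0, consistent with (A.3)
  disp(norm(H1c*H2c'/(N-1) - A1*K(r,r)*A2'))    % close to 0, consistent with (A.10)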



Cite this article

Dearmon, J., Smith, T.E. A hierarchical approach to scalable Gaussian process regression for spatial data. J Spat Econometrics 2, 7 (2021). https://doi.org/10.1007/s43071-021-00012-5


Keywords

  • Gaussian process regression
  • Spatial econometrics
  • Kriging
  • Nyström approximation
  • Hierarchical matrix

JEL Classification codes

  • C21- Spatial Models
  • C55- Large Datasets
  • R30- Real Estate Markets, Spatial Production Analysis, and Firm Location: General
  • R31- Housing Supply and Markets