A hierarchical approach to scalable Gaussian process regression for spatial data

Original Paper
Journal of Spatial Econometrics

Abstract

Large-scale and highly detailed geospatial datasets currently offer rich opportunities for empirical investigation, where spatial spillovers and spatial infill can now be investigated at the parcel level. Gaussian process regression (GPR) is particularly well suited for such investigations, but is currently limited by its need to manipulate and store large dense covariance matrices. The central purpose of this paper is to develop a more efficient version of GPR based on the hierarchical covariance approximation proposed by Chen et al. (J Mach Learn Res 18:1–42, 2017) and Chen and Stein (Linear-cost covariance functions for Gaussian random fields, arXiv:1711.05895, 2017). We provide a novel probabilistic interpretation of this framework, and extend the method to the analysis of local marginal effects at the parcel level. Finally, we apply these tools to a spatial dataset constructed from a 10-year period of Oklahoma County Assessor databases. In this setting, we are able to identify both regions of possible spatial spillovers and spatial infill, and to show more generally how this approach can be used for the systematic identification of specific development opportunities.


Data availability

Database exports from the Oklahoma County Assessor are available from their office upon request and can be used for empirical research.

Code availability

Matlab; GPstuff; custom Matlab code.

Notes

  1. To allow comparability of length scales, individual attribute variables are implicitly assumed to be standardized.

  2. Note in particular from (4) above that for test locations, \(x_{*l}\), far from all training locations, \(X = (x_{1} , \ldots ,x_{n} )\), the covariance vector, \(K(x_{*l} ,X)\), must approach zero. This in turn implies from (7) together with model (1) that the corresponding predictions, \(\hat{y}_{l} = E(Y_{*l} |Y) = E(f_{*l} |Y) + \mu\), must necessarily approach the mean, \(\mu\). Such (extrapolated) predictions thus exhibit “mean reversion” (see the illustrative sketch following these notes).

  3. In this regard, the most popular alternative kernel function, namely the simple exponential kernel (as for example in Genton 2001), is overly sensitive to small differences between similar attribute profiles (the two kernel forms are displayed for comparison following these notes). Within the larger family of Matérn kernels, the squared exponential kernel is also the simplest to analyze from both estimation and inference perspectives.

  4. These are also referred to as “inducing” points (as for example in Rasmussen and Quinonero-Candela 2005).

  5. Note that whenever \(X_{i} \cap X_{r} \ne \emptyset\), the conditional covariance matrix, \(K_{ii} - K_{ir} \,K_{rr}^{ - 1} K_{ri}\), in (14) must be singular. But as will be seen in footnote 6 below, this has no substantive consequences for the model constructed.

  6. As seen in (22) below, the random vectors, \(H_{1}\) and \(H_{2}\), have full rank covariance matrices, and thus are properly multi-normally distributed even when \(Z_{1|r}\) and \(Z_{2|r}\) are singular multi-normal [see for example Anderson (1958, Theorem 2.4.5)].

  7. Blake (2015). Fast and Accurate Symmetric Positive Definite Matrix Inverse, Matlab Central File Exchange.

  8. In fact, the temperature application of Chen and Stein [C2] involves more than 2 million observations.

  9. The explicit sample sizes shown are [10,000, 20,000, 40,000, 80,000, 160,000, 320,000, 500,000].

  10. Computation times for HCA automatically include calculations of Local Marginal Effects at each prediction point (which are not directly relevant for either NNGP or GBM). But these add little in the way of time differences.

  11. These predictions are computed for a regular grid of points in \([0,1]^{2}\) and, in a manner similar to Fig. 8a, contours are then interpolated and plotted using the Matlab program, contour.m. Similar procedures are used to obtain Figs. 13b and 14b below.

  12. It should also be noted that “local marginal effects” are much more problematic for both NNGP and GBM models, and are not considered here. In NNGP, the dominant effects of predictors are regression-like global mean effects with only the spatial error term modeled as a nearest-neighbor Gaussian process. In GBM, prediction surfaces are either locally flat or at best governed by the type of weak-learner functions used. So local behavior is of less interest in either of these prediction models.

  13. In particular, experiments in which 'irrelevant' variables were added to our simulation model showed that the combined time for computing predictions together with LMEs for any given variable is approximately linear in the total number of variables. In the present case, with 12,569 sample points, 150 landmark points, and only in-sample calculations of LMEs, the added time for each new irrelevant variable was approximately 30 s.

  14. LME performance deteriorates under higher error variance. Within this simulation setting, we multiplied the error’s 0.5 standard deviation by a scaling factor ranging from 0.5 to 2.5 in increments of 0.5. Regression results suggest that, for every one-unit increase in this scaling factor, MAE increases by 0.06 for LME × 1 and by 0.12 for LME × 2.

  15. This is consistent with Matlab’s own findings that “Using A\b instead of inv(A)*b is two to three times faster, and produces residuals on the order of machine accuracy relative to the magnitude of the data” (https://www.mathworks.com/help/matlab/ref/inv.html). A small timing sketch appears after these notes.

  16. For well-behaved data sets, a random selection of landmark points is probably sufficient and much less costly to execute. But, as noted by Chen and Stein (2021, p. 15), in more challenging cases such as our present housing-price data, a careful selection of landmarks can substantially reduce approximation errors. The primary drawback is computation time: tree construction is much more costly with k-means. For a simulated data set with 311,422 observations and 500 landmark points, random selection took 28 s while k-means took almost 5 min. Nonetheless, k-means is an often-used way of selecting inducing points; see, for example, Park and Choi (2010) and Hensman et al. (2015). Both selection strategies are sketched following these notes.

  17. A more indirect type of change is to allow tree indexing on different sets of attributes. For our present purposes, we partitioned on all variables except sale year, which made the grouping more spatial than temporal.

  18. Assessor data also provides estimates of market value for each parcel. These are only used in the assessment of predictive model performance.

  19. Note in particular the row of blue points with underestimated values (\(\approx \,\)$172,000) that roughly correspond to the sample mean price of the data. As mentioned in footnote 2 above, these are instances of parcels with extreme attribute profiles involving extrapolated price predictions that exhibit mean reversion. However, such outliers are usually identified easily and can be analyzed separately.

  20. A cursory search suggests that the lower bound on a room addition is about $80 per square foot (as for example in https://www.ownerly.com/home-improvement/home-addition-cost/, https://www.homeadvisor.com/cost/additions-and-remodels/build-an-addition/, and https://www.homelight.com/blog/room-addition-cost/).

  21. This building permit data is taken from the County Assessor’s database, with dates issued in 2018 or later. Building costs for permits in our data set all exceed $5,000.

  22. One exception is the recent paper by Cohen and Zabel (2020) which analyzes such spillover effects at the census tract level in the Greater Boston Area.

  23. Such situations do not usually occur in tier-one markets.
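
The mean-reversion behavior described in Note 2 is easy to reproduce numerically. The following Matlab sketch is purely illustrative: the kernel parameters, the simulated data, and the prior mean are arbitrary assumptions rather than settings from the paper, and the prediction line uses the standard GPR posterior mean with a constant prior mean, i.e., the form referred to in (7).

    % Illustration of the mean reversion described in Note 2 (all settings assumed)
    rng(1);
    n   = 200;  mu = 5;                      % training size and prior mean (arbitrary)
    X   = rand(n,1);                         % training locations in [0,1]
    ell = 0.1;  sf2 = 1;  sn2 = 0.05;        % length scale, signal and noise variances
    kse = @(A,B) sf2*exp(-pdist2(A,B).^2/(2*ell^2));            % squared-exponential kernel
    y   = mu + chol(kse(X,X) + sn2*eye(n),'lower')*randn(n,1);  % simulated responses
    Xs  = [0.5; 3; 10];                      % one interior and two distant test locations
    Ks  = kse(Xs,X);                         % K(x_*,X): rows vanish as x_* moves away
    yhat = mu + Ks*((kse(X,X) + sn2*eye(n))\(y - mu));          % GPR posterior mean
    disp([Xs yhat])                          % predictions at x_* = 3 and 10 revert to mu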
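
For reference to Note 3, the two kernel forms being contrasted can be written, in a common single-length-scale parameterization (which may differ from the exact specification used in the text), as

$$k_{\exp } (x,x^{\prime } ) = \exp \left( { - \left\| {x - x^{\prime } } \right\|/\ell } \right),\quad \quad k_{{{\text{SE}}}} (x,x^{\prime } ) = \exp \left( { - \left\| {x - x^{\prime } } \right\|^{2} /2\ell^{2} } \right)$$

For small distances \(d = \left\| {x - x^{\prime } } \right\|\), the exponential kernel falls off linearly, \(k_{\exp } \approx 1 - d/\ell\), while the squared exponential falls off only quadratically, \(k_{{{\text{SE}}}} \approx 1 - d^{2} /2\ell^{2}\); this is the sense in which the former is overly sensitive to small differences between similar attribute profiles.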
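
The comparison in Note 15 can be checked directly. The sketch below uses an arbitrary symmetric positive definite test matrix rather than one of the paper’s covariance matrices, and simply times the two approaches and reports their relative residuals.

    % Illustrative check of Note 15 (assumed test matrix, not the paper's covariances)
    rng(2);
    n = 3000;
    L = randn(n);  A = L*L' + n*eye(n);    % symmetric positive definite test matrix
    b = randn(n,1);
    tic; x1 = A\b;       t1 = toc;         % backslash: factorize, then triangular solves
    tic; x2 = inv(A)*b;  t2 = toc;         % explicit inverse, then matrix-vector product
    fprintf('backslash: %.3f s   inv: %.3f s\n', t1, t2);
    fprintf('relative residuals: %.2e vs %.2e\n', ...
        norm(A*x1 - b)/norm(b), norm(A*x2 - b)/norm(b));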
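
Note 16 contrasts random and k-means selection of landmark points. A minimal sketch of the two strategies follows; the data matrix, the number of landmarks, and the kmeans options are placeholders, and the paper’s actual tree construction involves additional steps not shown here. With k-means the landmarks are cluster centroids; if actual data points are required, each centroid can be replaced by its nearest observation.

    % Two landmark-selection strategies (illustrative placeholders throughout)
    rng(3);
    X = rand(50000, 2);                    % stand-in for the attribute/location data
    m = 500;                               % number of landmark (inducing) points
    % (a) random selection: cheap, usually adequate for well-behaved data
    idx      = randperm(size(X,1), m);
    L_random = X(idx, :);
    % (b) k-means selection: costlier, but landmarks track the data density
    [~, L_kmeans] = kmeans(X, m, 'MaxIter', 200, 'Replicates', 1);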

References

  • Anderson TW (1958) An introduction to multivariate statistical analysis. Wiley, New York

  • Chen J, Avron H, Sindhwani V (2017) Hierarchically compositional kernels for scalable nonparametric learning. J Mach Learn Res 18:1–42

  • Chen J, Stein ML (2017) Linear-cost covariance functions for Gaussian random fields. arXiv:1711.05895

  • Chen J, Stein ML (2021) Linear-cost covariance functions for Gaussian random fields. J Am Stat Assoc, pp 1–43

  • Cohen JP, Zabel J (2020) Local house price diffusion. Real Estate Econ 48:710–743

  • Daisa JM, Parker T (2009) Trip generation rates for urban infill land uses in California. ITE J 79(6):30–39

  • Datta A, Banerjee S, Finley AO, Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 111(514):800–812

  • Dearmon J, Smith TE (2016) Gaussian process regression and Bayesian model averaging: an alternative approach to modeling spatial phenomena. Geogr Anal 48:82–111

  • Dearmon J, Smith TE (2017) Local marginal analysis of spatial data: a Gaussian process regression approach with Bayesian model and kernel averaging. Spat Econom Qual Limit Depend Var 37:297–342

  • DeFusco A, Ding W, Ferreira F, Gyourko J (2018) The role of price spillovers in the American housing boom. J Urban Econ 108:72–84

  • Finley AO, Datta A, Banerjee S (2020) spNNGP R package for nearest neighbor Gaussian process models. arXiv:2001.09111v1 [stat.CO]

  • Genton MG (2001) Classes of kernels for machine learning: a statistics perspective. J Mach Learn Res 2:299–312

  • Hensman J, Matthews AG, Filippone M, Ghahramani Z (2015) MCMC for variationally sparse Gaussian processes. In: Advances in neural information processing systems, pp 1648–1656

  • Landis JD, Hood H, Li G, Rogers T, Warren C (2006) The future of infill housing in California: opportunities, potential, and feasibility. Hous Policy Debate 17(4):681–725

  • McConnell V, Wiley K (2010) Infill development: perspectives and evidence from economics and planning. Resour Fut 10:1–34

  • Park S, Choi S (2010) Hierarchical Gaussian process regression. In: Proceedings of 2nd Asian conference on machine learning, pp 95–110

  • Rasmussen C, Quinonero-Candela J (2005) A unifying view of sparse approximate Gaussian process regression. J Mach Learn Res 6:1939–1959

  • Ridgeway G (2007) Generalized boosted models: a guide to the gbm package. Update 1(1):2007

  • Vanhatalo J, Riihimäki J, Hartikainen J, Jylänki P, Tolvanen V, Vehtari A (2013) GPstuff: Bayesian modeling with Gaussian processes. J Mach Learn Res 14(Apr):1175–1179

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Jacob Dearmon.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

To show that expressions (34) and (37) are indeed the actual covariances of the random vectors in expression (33) of the text, it is convenient to introduce further simplifying notation. For each possible root path, \(i_{1} \, \to \,i_{2} \, \to \, \cdots \, \to \,i_{m - 1} \, \to \,i_{m} \to \,r\), let \(H_{{i_{1} \,i_{2} \, \cdots \,i_{m} \,r}}\) be defined recursively for paths of length one by

$$H_{{i_{1} \,r}} \, = \,Z_{{i_{1} \,|\,r}} \, + \,A_{{i_{1} \,r}} \,Z_{r}$$
(A.1)

[as in (18) of the text] and for longer paths by

$$H_{{i_{1} \,i_{2} \, \cdots \,i_{m} \,r}} \, = \,Z_{{i_{1} \,|\,i_{2} }} + \,A_{{i_{1} \,i_{2} \,}} H_{{i_{2} \cdots \,i_{m} \,r\,}}$$
(A.2)

Then, (A.1) together with the argument in (14) through (17) of the text again shows that for paths of length one,

$${\text{cov}} \left( {H_{{i_{1} \,r}} } \right)\, = \,K_{{i_{1} \,i_{1} }}$$
(A.3)

So if it is hypothesized that

$${\text{cov}} (H_{{i_{1} \, \cdots \,i_{m} \,r}} ) = K_{{i_{1} \,i_{1} }}$$
(A.4)

holds for all paths of length m, then for paths of length \(m + 1\) it follows from the independence of \(Z_{{i_{1} \,|\,i_{2} }}\) and \(H_{{i_{2} \cdots \,i_{m + 1} \,r}}\), together with (A.4), that [again from the argument in (14) through (17) in the text],

$$\begin{aligned} {\text{cov}} \left( {H_{{i_{1} \,i_{2} \, \cdots \,i_{m} \,i_{m + 1} \,r}} } \right) & = {\text{cov}} \left( {Z_{{i_{1} \,|\,i_{2} }} + A_{{i_{1} \,i_{2} }} H_{{i_{2} \cdots \,i_{m + 1} \,r}} } \right) \\ & = {\text{cov}} \left( {Z_{{i_{1} \,|\,i_{2} }} } \right) + A_{{i_{1} \,i_{2} }} {\text{cov}} \left( {H_{{i_{2} \cdots \,i_{m + 1} \,r}} } \right)A_{{i_{2} \,i_{1} }} \\ & = K_{{i_{1} \,i_{1} }} - K_{{i_{1} \,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} K_{{i_{2} \,i_{1} }} + \left( {K_{{i_{1} \,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} } \right)\left[ {K_{{i_{2} \,i_{2} }} } \right]\left( {K_{{i_{2} \,i_{2} }}^{ - 1} K_{{i_{2} \,i_{1} }} } \right) \\ & = K_{{i_{1} \,i_{1} }} - K_{{i_{1} \,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} K_{{i_{2} \,i_{1} }} + K_{{i_{1} \,i_{2} }} K_{{i_{2} \,i_{2} }}^{ - 1} K_{{i_{2} \,i_{1} }} = K_{{i_{1} \,i_{1} }} \\ \end{aligned}$$
(A.5)

So by induction, (A.4) must hold for all m. But for any leaf, \(i\), with root path, \(i\, \to i_{1} \, \to \cdots \, \to i_{m} \to \,r\), this implies at once that

$${\text{cov}} \left( {H_{i} } \right) = {\text{cov}} \left( {H_{{i\,\,i_{1} \cdots i_{m} \,r}} } \right) = K_{i\,i}$$
(A.6)

and thus that expression (34) in the text must hold.

It remains to establish expression (37) in the text for any distinct leaves, \(i\) and \(j\), with root paths as in (35) and (36) (where again this is taken to include the case \(s = r\)). To do so, we first expand \(H_{i} = H_{{i\,\,i_{1} \cdots i_{p} \,s\,h_{1} \, \cdots \,h_{m} \,r}}\) and \(H_{j} = H_{{j\,\,j_{1} \cdots j_{q} \,s\,h_{1} \, \cdots \,h_{m} \,r}}\) as follows:

$$H_{i} = Z_{{i\,|\,i_{1} }} + A_{{i\,i_{1} }} Z_{{i_{1} \,|\,i_{2} }} + \left( {A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} } \right)Z_{{i_{2} \,|\,i_{3} }} + \cdots + \left( {A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p - 1} \,i_{p} }} } \right)Z_{{i_{p} \,|\,s}} + \left( {A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p} \,s}} } \right)H_{{s\,h_{1} \cdots h_{m} \,r}}$$
(A.7)
$$H_{j} = Z_{{j\,|\,j_{1} }} + A_{{j\,j_{1} }} Z_{{j_{1} \,|\,j_{2} }} + \left( {A_{{j\,j_{1} }} A_{{j_{1} \,j_{2} }} } \right)Z_{{j_{2} \,|\,j_{3} }} + \cdots + \left( {A_{{j\,j_{1} }} A_{{j_{1} \,j_{2} }} \cdots A_{{j_{q - 1} \,j_{q} }} } \right)Z_{{j_{q} \,|\,s}} + \left( {A_{{j\,j_{1} }} A_{{j_{1} \,j_{2} }} \cdots A_{{j_{q} \,s}} } \right)H_{{s\,h_{1} \cdots h_{m} \,r}}$$
(A.8)

Next recall that since the random variables \((Z_{{i\,|\,i_{1} }} ,Z_{{i_{1} \,|\,i_{2} }} , \ldots ,Z_{{i_{p} \,|\,s}} ,Z_{{j\,|\,j_{1} }} ,Z_{{j_{1} \,|\,j_{2} }} , \ldots ,Z_{{j_{q} \,|\,s}} ,H_{{s\,h_{1} \cdots h_{m} \,r}} )\) are all independent, it follows [as for example in (31) of the text] that all covariance terms between \(H_{i}\) and \(H_{j}\) are zero except for the shared term involving \(H_{{s\,h_{1} \cdots h_{m} \,r}}\), so that,

$$\begin{aligned} {\text{cov}} \left( {H_{i} ,H_{j} } \right) & = {\text{cov}} \left[ {\left( {A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p} \,s}} } \right)H_{{s\,h_{1} \cdots h_{m} \,r}} ,\;\left( {A_{{j\,j_{1} }} A_{{j_{1} \,j_{2} }} \cdots A_{{j_{q} \,s}} } \right)H_{{s\,h_{1} \cdots h_{m} \,r}} } \right] \\ & = \left( {A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p} \,s}} } \right){\text{cov}} \left( {H_{{s\,h_{1} \cdots h_{m} \,r}} } \right)\left( {A_{{s\,j_{q} }} \cdots A_{{j_{2} \,j_{1} }} A_{{j_{1} \,j}} } \right) \\ \end{aligned}$$
(A.9)

But this implies at once from (A.4) that

$${\text{cov}} \left( {H_{i} ,H_{j} } \right) = A_{{i\,i_{1} }} A_{{i_{1} \,i_{2} }} \cdots A_{{i_{p} \,s}} \left( {K_{ss} } \right)A_{{s\,j_{q} }} \cdots A_{{j_{2} \,j_{1} }} A_{{j_{1} \,j}}$$
(A.10)

and thus that expression (37) must hold.
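
As a numerical sanity check on this induction argument, the short Matlab sketch below simulates (A.1) for a single path of length one and verifies that the sample covariance of \(H_{{i_{1} \,r}}\) matches \(K_{{i_{1} \,i_{1} }}\) as in (A.3). The point sets, kernel, and Monte Carlo sample size are arbitrary choices made only for illustration.

    % Monte Carlo check of (A.1) and (A.3): cov(H_{ir}) should equal K_{ii}
    rng(4);
    ni = 5;  nr = 4;
    Xi = rand(ni,2);  Xr = rand(nr,2);           % arbitrary child and root point sets
    k  = @(A,B) exp(-pdist2(A,B).^2/2);          % illustrative squared-exponential kernel
    Kii = k(Xi,Xi);  Kir = k(Xi,Xr);  Krr = k(Xr,Xr);  Kri = Kir';
    Air = Kir/Krr;                               % A_{ir} = K_{ir}*inv(K_{rr})
    Sig = Kii - Air*Kri;                         % conditional covariance of Z_{i|r}
    N   = 2e5;                                   % Monte Carlo sample size
    Zr  = chol(Krr,'lower')*randn(nr,N);                  % Z_r ~ N(0, K_rr)
    Zir = chol(Sig + 1e-8*eye(ni),'lower')*randn(ni,N);   % Z_{i|r}, with small jitter
    H   = Zir + Air*Zr;                          % H_{ir} as in (A.1)
    max(max(abs(cov(H') - Kii)))                 % small: Monte Carlo error only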


About this article


Cite this article

Dearmon, J., Smith, T.E. A hierarchical approach to scalable Gaussian process regression for spatial data. J Spat Econometrics 2, 7 (2021). https://doi.org/10.1007/s43071-021-00012-5

