Abstract
Gaussian processes (GPs) are popular models for functions, time series, and spatial fields, but direct application of GPs is computationally infeasible for large datasets. We propose a multi-scale Vecchia (MSV) approximation of GPs for modeling and analysis of multi-scale phenomena, which are ubiquitous in geophysical and other applications. In the MSV approach, increasingly large sets of variables capture increasingly small scales of spatial variation, to obtain an accurate approximation of the spatial dependence from very large to very fine scales. For a given set of observations, the MSV approach decomposes the data into different scales, which can be visualized to obtain insights into the underlying processes. We explore properties of the MSV approximation and propose an algorithm for automatic choice of the tuning parameters. We provide comparisons to existing approaches based on simulated data and using satellite measurements of land-surface temperature.
References
Ba S, Joseph VR (2012) Composite Gaussian process models for emulating expensive functions. Ann Appl Stat 6(4):1838–1860
Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall, Cambridge
Banerjee S, Gelfand AE, Finley AO, Sang H (2008) Gaussian predictive process models for large spatial data sets. J Roy Stat Soc B 70(4):825–848
Comer ML, Delp EJ (1999) Segmentation of textured images using a multiresolution Gaussian autoregressive model. IEEE Trans Image Process 8(3):408–420
Cotton WR, Bryan G, Van den Heever SC (2010) Storm and cloud dynamics, volume 99. Academic Press
Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. J Roy Stat Soc B 70(1):209–226
Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken, NJ
Datta A, Banerjee S, Finley AO, Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 111(514):800–812
Du J, Zhang H, Mandrekar VS (2009) Fixed-domain asymptotic properties of tapered maximum likelihood estimators. Ann Stat 37:3330–3361
Duvenaud D, Lloyd JR, Grosse R, Tenenbaum JB, Ghahramani Z (2013) Structure discovery in nonparametric regression through compositional kernel search. In: Proceedings of the 30th international conference on machine learning, vol 28, pp 1166–1174
Ferreira MA, Lee HK (2007) Multiscale modeling: a Bayesian perspective. Springer Science & Business Media, Berlin
Ferreira MA, West M, Lee HK, Higdon DM et al (2006) Multi-scale and hidden resolution time series models. Bayesian Anal 1(4):947–967
Furrer R, Genton MG, Nychka D (2006) Covariance tapering for interpolation of large spatial datasets. J Comput Graph Stat 15(3):502–523
Gneiting T, Katzfuss M (2014) Probabilistic forecasting. Ann Rev Stat Appl 1(1):125–151
Gotway CA, Young LJ (2002) Combining incompatible spatial data. J Am Stat Assoc 97(458):632–648
Guinness J (2018) Permutation and grouping methods for sharpening Gaussian process approximations. Technometrics 60(4):415–429
Heaton MJ, Datta A, Finley AO, Furrer R, Guinness J, Guhaniyogi R, Gerber F, Gramacy RB, Hammerling D, Katzfuss M, Lindgren F, Nychka DW, Sun F, Zammit-Mangion A (2019) A case study competition among methods for analyzing large spatial data. J Agric Biol Environ Stat 24(3):398–425
Higdon D (1998) A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environ Ecol Stat 5(2):173–190
Huang H-C, Cressie N, Gabrosek J (2002) Fast, resolution-consistent spatial prediction of global processes from satellite data. J Comput Graph Stat 11(1):63–88
Katzfuss M (2017) A multi-resolution approximation for massive spatial datasets. J Am Stat Assoc 112(517):201–214
Katzfuss M, Cressie N (2011) Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets. J Time Ser Anal 32(4):430–446
Katzfuss M, Gong W (2020) A class of multi-resolution approximations for large spatial datasets. Stat Sin 30(4):2203–2226
Katzfuss M, Guinness J (2021) A general framework for Vecchia approximations of Gaussian processes. Stat Sci 36(1):124–141
Katzfuss M, Guinness J, Gong W, Zilber D (2020) Vecchia approximations of Gaussian-process predictions. J Agric Biol Environ Stat 25(3):383–414
Katzfuss M, Jurek M, Zilber D, Gong W, Guinness J, Zhang J, Schäfer F (2020b) GPvecchia: fast Gaussian-process inference using Vecchia approximations. R package version 0.1.3
Kaufman CG, Schervish MJ, Nychka DW (2008) Covariance tapering for likelihood-based estimation in large spatial data sets. J Am Stat Assoc 103(484):1545–1555
Kim S-W, Yoon S-C, Kim J, Kim S-Y (2007) Seasonal and monthly variations of columnar aerosol optical properties over East Asia determined from multi-year MODIS, lidar, and AERONET sun/sky radiometer measurements. Atmos Environ 41(8):1634–1651
Lindgren F, Rue H, Lindström J (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J Roy Stat Soc B 73(4):423–498
Quiñonero-Candela J, Rasmussen CE (2005) A unifying view of sparse approximate Gaussian process regression. J Mach Learn Res 6:1939–1959
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Sang H, Jun M, Huang JZ (2011) Covariance approximation for large multivariate spatial datasets with an application to multiple climate model errors. Ann Appl Stat 5(4):2519–2548
Saquib SS, Bouman CA, Sauer K (1996) A non-homogeneous MRF model for multiresolution Bayesian estimation. In: Proceedings of 3rd IEEE international conference on image processing, vol 2, pp 445–448. IEEE
Schäfer F, Katzfuss M, Owhadi H (2020) Sparse Cholesky factorization by Kullback-Leibler minimization. arXiv:2004.14455
Schäfer F, Sullivan TJ, Owhadi H (2017) Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity. arXiv:1706.02205
Skøien JO, Blöschl G, Western A (2003) Characteristic space scales and timescales in hydrology. Water Resour Res 39(10)
Snelson E, Ghahramani Z (2007) Local and global sparse Gaussian process approximations. In: Proceedings of the 11th international conference on artificial intelligence and statistics (AISTATS)
Sobolewska MA, Siemiginowska A, Kelly BC, Nalewajko K (2014) Stochastic modeling of the Fermi/LAT \(\gamma \)-ray blazar variability. Astrophys J 786(143)
Tzeng S, Huang H-C, Cressie N (2005) A fast, optimal spatial-prediction method for massive datasets. J Am Stat Assoc 100(472):1343–1357
Vecchia A (1988) Estimation and model identification for continuous spatial processes. J Roy Stat Soc B 50(2):297–312
Wikle CK, Cressie N (1999) A dimension-reduced approach to space-time Kalman filtering. Biometrika 86(4):815–829
Wilson AG, Adams RP (2013) Gaussian process kernels for pattern discovery and extrapolation. In: Proceedings of the 30th international conference on machine learning
Wilson AG, Gilboa E, Nehorai A, Cunningham JP (2014) Fast kernel learning for multidimensional pattern extrapolation. In: Advances in neural information processing systems, pp 3626–3634
Zhu J, Morgan CL, Norman JM, Yue W, Lowery B (2004) Combined mapping of soil properties using a multi-scale tree-structured spatial model. Geoderma 118(3–4):321–334
Acknowledgements
Katzfuss’ research was partially supported by National Science Foundation (NSF) Grants DMS–1521676, DMS–1654083, and DMS–1953005. We would like to thank David Jones for helpful comments. Part of our MSV implementation was inspired by the R package GPvecchia (Katzfuss et al. 2020b), and it relies on a fast maximum–minimum distance ordering algorithm by Florian Schäfer.
Appendices
Computing \(\mathbf {U}\)
Extending the derivations in Section 2.4.1, we can specify the sparse upper-triangular matrix \(\mathbf {U}\) by the following rules:
(1) For each \(\ell = 1,2,..., L-1\), denote by \(\mathbf {U}^{(\ell )}\) the block of \(\mathbf {U}\) corresponding to level \(\ell \), with size \(n_\ell \times n_\ell \). For each \(i=1,2,...,n_\ell \), the diagonal entry is \(\{\mathbf {U}^{(\ell )}\}_{ii} = 1/\sqrt{D_i^{(\ell )}}\).
For the conditioning set of \(y^{(\ell )}_i\), suppose the s-th element in its conditioning set is \(y^{(\ell )}_{i'}\); then \(\{\mathbf {U}^{(\ell )}\}_{i'i} = -\{B_i^{(\ell )}\}^s / \sqrt{D_i^{(\ell )}}\),
where \(\{B_i^{(\ell )}\}^s\) is the s-th element of \(B_i^{(\ell )}\).
(2) For the data level L, first define the \(n \times n\) diagonal block \(\mathbf {U}^{(L)} = \mathrm {diag}\left( 1/\sqrt{D_1^{(L)}},..., 1/\sqrt{D_n^{(L)}} \right) \).
Next, for \(\ell =1,2,..., L-1\), denote by \(\mathbf {U}^{(L)(\ell )}\) the \(n_\ell \times n\) block of \(\mathbf {U}\) corresponding to \(\{ N_{z_i}^{(\ell )}, i=1,2,...,n\}\). Then, for each i, if the s-th element of \(N_{z_i}^{(\ell )}\) is \(y_{i'}^{(\ell )}\), the \((i',i)\) entry of \(\mathbf {U}^{(L)(\ell )}\) is minus the coefficient of \(y_{i'}^{(\ell )}\) in \(B_i^{(L)}\), divided by \(\sqrt{D_i^{(L)}}\).
(3) Finally, the matrix \(\mathbf {U}\) is
All entries of \(\mathbf {U}\) not specified by the rules above are zero.
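As a concrete illustration of these rules, the following sketch builds the upper-triangular factor for a single level from a covariance matrix and a list of conditioning sets. The function name `vecchia_U` and the representation of conditioning sets as index lists are illustrative choices, not from the paper; the conditional coefficients \(B_i\) and variances \(D_i\) are obtained by standard Gaussian conditioning on the covariance.

```python
import numpy as np

def vecchia_U(C, cond_sets):
    """Build the upper-triangular Vecchia factor U for one level.

    Following the rules above: U[i, i] = 1/sqrt(D_i), and for the s-th
    conditioning index i' of variable i, U[i', i] = -B_i[s]/sqrt(D_i).
    `C` is the level's covariance matrix; `cond_sets[i]` lists the
    (earlier-ordered) conditioning indices of variable i.
    """
    n = C.shape[0]
    U = np.zeros((n, n))
    for i in range(n):
        g = list(cond_sets[i])
        if g:
            # True conditional regression coefficients and residual variance
            B = np.linalg.solve(C[np.ix_(g, g)], C[g, i])
            D = C[i, i] - C[g, i] @ B
            for s, ip in enumerate(g):
                U[ip, i] = -B[s] / np.sqrt(D)
        else:
            D = C[i, i]
        U[i, i] = 1.0 / np.sqrt(D)
    return U
```

With full conditioning sets (each variable conditioning on all earlier ones), the specification is exact, so \(\mathbf {U}\mathbf {U}^\top \) recovers the inverse covariance; smaller conditioning sets yield a sparse approximation.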
Prediction at unobserved locations
For simplicity, let \(\mathcal {S}\) denote the locations corresponding to the knot set at level \(\ell \). First, we show \(\left( C^{(\ell )}(\mathcal {S}, \mathcal {S}) \right) ^{-1} =\mathbf {U}^{(\ell )}\mathbf {U}^{(\ell )\top }\). When \(\ell =1\), write \(\mathbf {U}= \left( \begin{array}{cc} \mathbf {U}_1 & \mathbf {U}_2 \\ \mathbf{0} & \mathbf {U}_3 \end{array} \right) \), where \(\mathbf {U}_1= \mathbf {U}^{(1)}\) is the block of \(\mathbf {U}\) corresponding to knot variables at level 1. Then,
Since we can also write \(\hat{\mathbf {C}}^{-1}\) as \(\hat{\mathbf {C}}^{-1}=\left( \begin{array}{cc} C^{(1)}(\mathcal {S}, \mathcal {S}) & A \\ A^\top & B \end{array} \right) ^{-1}=\left( \begin{array}{cc} E & F \\ F^\top & G \end{array} \right) \), the block-matrix inversion formula gives \(C^{(1)}(\mathcal {S}, \mathcal {S})^{-1}=E-FG^{-1}F^\top \). Thus we have
For any \(\ell >1\), similar results hold: \( C^{(\ell )}(\mathcal {S}, \mathcal {S})^{-1}= \mathbf {U}^{(\ell )}\mathbf {U}^{(\ell )\top }. \)
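The block-inverse identity used above can be checked numerically with a generic invertible upper-triangular factor standing in for \(\mathbf {U}\) (a minimal sketch; the sizes and random factor below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 3                                # k plays the role of the number of knot variables
U = np.triu(rng.standard_normal((n, n)))   # any upper-triangular factor
U[np.diag_indices(n)] = np.abs(np.diag(U)) + 1.0   # ensure invertibility

Chat = np.linalg.inv(U @ U.T)              # implied approximate covariance
U1 = U[:k, :k]                             # block corresponding to the knots

# Identity from the appendix: the inverse of the knot block of Chat
# equals U1 @ U1.T, because the Schur complement cancels the U2 term.
assert np.allclose(np.linalg.inv(Chat[:k, :k]), U1 @ U1.T)
```

This holds for any invertible upper-triangular \(\mathbf {U}\), which is why the same argument applies at every level.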
When Algorithm 2 achieves a KL divergence of (almost) zero, the MSV approximation based on the knot set \(\mathbf {y}_\ell \) at level \(\ell \) is (almost) exact, and so we may assume that all information about the process at level \(\ell \) is captured by the knot set \(\mathbf {y}_\ell \). Then, the posterior mean in (5) at an unobserved location \(\mathbf {s}_0\) can be computed as
The posterior variance in (6) can be computed as
The posterior predictive variance for the entire latent process is
The first two terms can be computed similarly to (7). The last term is the variance of a linear combination of \(\mathbf {y}_{1:L-1}\), and so it can be calculated as \(var (\mathbf {H}\mathbf {y}_{1:L-1}|z) = (\mathbf {V}^{-1} \mathbf {H}^\top )^\top (\mathbf {V}^{-1} \mathbf {H}^\top )\).
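A small sketch of this last identity, assuming the posterior precision of \(\mathbf {y}_{1:L-1}\) factors as \(\mathbf {V}\mathbf {V}^\top \) with \(\mathbf {V}\) triangular (the matrices below are random stand-ins, not quantities from the paper):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
W = A @ A.T + 6 * np.eye(6)          # stand-in posterior precision of y_{1:L-1} | z
V = cholesky(W, lower=True)          # W = V V^T
H = rng.standard_normal((3, 6))      # linear combination of interest

X = solve_triangular(V, H.T, lower=True)   # X = V^{-1} H^T via a triangular solve
var_Hy = X.T @ X                           # equals H W^{-1} H^T without forming W^{-1}
assert np.allclose(var_Hy, H @ np.linalg.inv(W) @ H.T)
```

The triangular solve avoids ever forming the dense posterior covariance, which is the point of working with the sparse factor.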
Proofs
Proposition 1
For a polynomial \(y^{(\ell )}(\mathbf {s}) = \mathbf {p}(\mathbf {s})^\top \varvec{\beta }\) as a function of spatial location \(\mathbf {s}\) with p coefficients \(\varvec{\beta } \sim \mathcal {N}_p(\mathbf {0},\varvec{\Sigma }_\beta )\), the corresponding covariance function \(C^{(\ell )}(\mathbf {s}_i, \mathbf {s}_j) = \mathbf {p}(\mathbf {s}_i)^\top \varvec{\Sigma }_\beta \mathbf {p}(\mathbf {s}_j)\) can be captured exactly by choosing the knot and conditioning set to be any p distinct locations.
Proof of Proposition 1
Denote any p distinct locations by \(\{\mathbf {s}_1, \mathbf {s}_2,..., \mathbf {s}_p\}\). For the polynomial \(y(\mathbf {s}) = \mathbf {p}(\mathbf {s})^\top \varvec{\beta }\) with \(\varvec{\beta } \sim \mathcal {N}_p(\mathbf {0},\varvec{\Sigma }_\beta )\), the system of equations \(\{\mathbf {p}(\mathbf {s}_1)^\top \varvec{\beta } = y(\mathbf {s}_1) ,..., \mathbf {p}(\mathbf {s}_p)^\top \varvec{\beta } = y(\mathbf {s}_p) , \mathbf {p}(\mathbf {s})^\top \varvec{\beta } = y(\mathbf {s}) \}\) is equivalent to the system \(\{\mathbf {p}(\mathbf {s}_1)^\top \varvec{\beta } = y(\mathbf {s}_1) ,..., \mathbf {p}(\mathbf {s}_p)^\top \varvec{\beta } = y(\mathbf {s}_p) \}\); that is, \(y(\mathbf {s})\) is determined with probability one by \(y(\mathbf {s}_1),...,y(\mathbf {s}_p)\), so \(P \left( y(\mathbf {s}) | y(\mathbf {s}_1),...,y(\mathbf {s}_p) \right) = 1\). Then, the exact distribution of \(\mathbf {y}=(y(\mathbf {s}_1), y(\mathbf {s}_2), ..., y(\mathbf {s}_n))\) can be written as
which equals \(\hat{f}(\mathbf {y})\) in Vecchia by setting the knot and conditioning set to be \(\{\mathbf {s}_1, \mathbf {s}_2, ...,\mathbf {s}_p\}\). Thus, the covariance can be captured exactly. \(\square \)
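Proposition 1 can be illustrated numerically: for a quadratic basis (\(p=3\)), the conditional variance of \(y(\mathbf {s}_0)\) given the values at any three distinct knots is exactly zero. The basis, \(\varvec{\Sigma }_\beta \), and locations below are arbitrary illustrative choices:

```python
import numpy as np

def basis(s):
    # p = 3 polynomial basis functions: 1, s, s^2
    return np.array([1.0, s, s ** 2])

Sigma_beta = np.diag([1.0, 0.5, 0.25])   # prior covariance of the coefficients
knots = np.array([0.1, 0.4, 0.9])        # any p distinct locations
s0 = 0.6                                 # a new location

P = np.stack([basis(s) for s in knots])  # p x p design matrix at the knots
C_SS = P @ Sigma_beta @ P.T              # covariance among the knot values
c_0S = basis(s0) @ Sigma_beta @ P.T      # cross-covariance between s0 and the knots
c_00 = basis(s0) @ Sigma_beta @ basis(s0)

# Conditional (kriging) variance of y(s0) given the p knot values:
cond_var = c_00 - c_0S @ np.linalg.solve(C_SS, c_0S)
assert abs(cond_var) < 1e-8   # zero up to floating-point error
```

Algebraically, \(c_{0S} C_{SS}^{-1} c_{S0} = \mathbf {p}(\mathbf {s}_0)^\top \varvec{\Sigma }_\beta \mathbf {p}(\mathbf {s}_0) = c_{00}\) whenever the knot design matrix is invertible, which is exactly the degeneracy the proposition exploits.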
Proof of Theorem 1
The following proof is analogous to that of Guinness (2018, Thm. 1). Suppose the true covariance is \(\Sigma _0\) and the approximated covariance is \(\hat{\Sigma }\). At each level \(\ell \), the KL divergence between the two normal distributions can be written as \( KL \left( f(\mathbf {y}^{(\ell )}) || \hat{f}(\mathbf {y}^{(\ell )}) \right) =\frac{1}{2} E \left( -(\mathbf {y}^{(\ell )})^\top \Sigma _0^{-1}\mathbf {y}^{(\ell )} \right) +\frac{1}{2} E \left( (\mathbf {y}^{(\ell )})^\top \hat{\Sigma }^{-1}\mathbf {y}^{(\ell )}\right) +\frac{1}{2} \log \frac{ |\hat{\Sigma } |}{|\Sigma _0 |}\). Since \(\Sigma _0\) is the true covariance, the first expectation is \(E \left( -(\mathbf {y}^{(\ell )})^\top \Sigma _0^{-1}\mathbf {y}^{(\ell )} \right) = -n\). Under the MSV approximation, \(\log |\hat{\Sigma } |=\sum _{i=1}^{n} \log D_i^{(\ell )}\). Let \(L_0\) be the Cholesky factor of \(\Sigma _0\); then \(E \left( (\mathbf {y}^{(\ell )})^\top \hat{\Sigma }^{-1}\mathbf {y}^{(\ell )}\right) =tr(\mathbf {U}\mathbf {U}^\top \Sigma _0)= \sum _{i,j} (L_0^\top \mathbf {U})_{ij}^2 =n\), where the last equality holds because each column \(u_i\) of \(\mathbf {U}\) is built from the true conditional coefficients, so that \(u_i^\top \Sigma _0 u_i = D_i^{(\ell )}/D_i^{(\ell )} = 1\). Thus, the KL divergence can be written as \(KL \left( f(\mathbf {y}^{(\ell )}) || \hat{f}(\mathbf {y}^{(\ell )}) \right) =\frac{1}{2} \left( -n+n+ \sum _{i=1}^{n} \log D_i^{(\ell )}-\log |\Sigma _0 |\right) =\frac{1}{2} \sum _{i=1}^{n} \log D_i^{(\ell )}-\text {constant}\). \(\square \)
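The two key identities in this proof (the trace term equaling \(n\), and \(\log |\hat{\Sigma } |=\sum _i \log D_i^{(\ell )}\)) can be verified numerically for a Vecchia-style factor built from the true covariance. The conditioning scheme below (each variable conditions on the two previous indices) is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
Sigma0 = A @ A.T + n * np.eye(n)   # true covariance Sigma_0

# Vecchia-style factor with B_i, D_i computed from the true covariance.
U = np.zeros((n, n))
logD = 0.0
for i in range(n):
    g = list(range(max(0, i - 2), i))   # condition on (up to) the 2 previous indices
    if g:
        B = np.linalg.solve(Sigma0[np.ix_(g, g)], Sigma0[g, i])
        D = Sigma0[i, i] - Sigma0[g, i] @ B
        U[g, i] = -B / np.sqrt(D)
    else:
        D = Sigma0[i, i]
    U[i, i] = 1.0 / np.sqrt(D)
    logD += np.log(D)

# Trace term: tr(U U^T Sigma0) = n, since each column encodes a true conditional.
assert np.isclose(np.trace(U @ U.T @ Sigma0), n)

# log|Sigma_hat| = sum_i log D_i, since Sigma_hat^{-1} = U U^T.
_, ld = np.linalg.slogdet(U @ U.T)
assert np.isclose(-ld, logD)

# Hence KL(f || f_hat) = 0.5 * (sum_i log D_i - log|Sigma0|), which is nonnegative.
_, logdet0 = np.linalg.slogdet(Sigma0)
kl = 0.5 * (logD - logdet0)
assert kl > -1e-10
```

Minimizing \(\sum _i \log D_i^{(\ell )}\) over conditioning-set choices is therefore equivalent to minimizing the KL divergence, which is what the theorem exploits.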
Cite this article
Zhang, J., Katzfuss, M. Multi-Scale Vecchia Approximations of Gaussian Processes. JABES 27, 440–460 (2022). https://doi.org/10.1007/s13253-022-00488-0