Half-tapering strategy for conditional simulation with large datasets

Abstract

Gaussian conditional realizations are routinely used for risk assessment and planning in a variety of Earth sciences applications. Assuming a Gaussian random field, conditional realizations can be obtained by first creating unconditional realizations that are then post-conditioned by kriging. Many efficient algorithms are available for the first step, so the bottleneck resides in the second step. Instead of performing the conditional simulation with the desired covariance (F approach) or with a tapered covariance (T approach), we propose to use the tapered covariance only in the conditioning step (half-taper, or HT, approach). This speeds up the computations and reduces the memory requirements of the conditioning step, while keeping the right short-scale variations in the realizations. A criterion based on the mean square error of the simulation is derived to help anticipate the similarity of HT to F. Moreover, an index is used to predict the sparsity of the kriging matrix in the conditioning step. Guidelines for the choice of the taper function are discussed. The distributions of a series of 1D, 2D and 3D scalar response functions are compared for the F, T and HT approaches. The distributions obtained indicate a much better similarity to F with HT than with T.
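
The HT workflow described above can be sketched in a few lines (illustrative only, not the paper's implementation): the unconditional realization is drawn with the full covariance, while the post-conditioning kriging system uses the tapered covariance, whose exact zeros beyond the taper range make the matrix sparse. The exponential covariance, Wendland taper, ranges and synthetic data below are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def expo_cov(h, a=10.0):
    # illustrative exponential covariance with unit sill (arbitrary choice)
    return np.exp(-3.0 * h / a)

def wendland_taper(h, a_t=15.0):
    # Wendland taper, compactly supported on [0, a_t]: exact zeros beyond a_t
    t = np.minimum(h / a_t, 1.0)
    return (1.0 - t) ** 4 * (4.0 * t + 1.0)

def dist(x, y):
    return np.abs(x[:, None] - y[None, :])

n_data, n_grid = 40, 200
x_data = rng.uniform(0.0, 100.0, n_data)
x_grid = np.linspace(0.0, 100.0, n_grid)
x_all = np.concatenate([x_data, x_grid])

# 1) Unconditional realization with the FULL covariance (the step HT keeps intact)
C_all = expo_cov(dist(x_all, x_all))
L = np.linalg.cholesky(C_all + 1e-8 * np.eye(x_all.size))
z_s = L @ rng.standard_normal(x_all.size)
zs_data, zs_grid = z_s[:n_data], z_s[n_data:]

z_data = rng.standard_normal(n_data)  # hypothetical conditioning data

# 2) Post-conditioning by simple kriging with the TAPERED covariance (HT step)
K_t = expo_cov(dist(x_data, x_data)) * wendland_taper(dist(x_data, x_data))
k_t = expo_cov(dist(x_grid, x_data)) * wendland_taper(dist(x_grid, x_data))

sparsity = np.mean(K_t == 0.0)  # fraction of exact zeros in the kriging matrix
w = np.linalg.solve(K_t, z_data - zs_data)
z_cond = zs_grid + k_t @ w

print(z_cond.shape, round(float(sparsity), 2))
```

In practice the tapered kriging matrix would be stored in a sparse format and factorized with a sparse Cholesky solver such as CHOLMOD (Chen et al. 2008), which is where the speed and memory gains come from.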

References

  1. Bevilacqua M, Faouzi T, Furrer R, Porcu E (2016) Estimation and prediction using generalized Wendland covariance functions under fixed domain asymptotics. arXiv:1607.06921v1, 1–36

  2. Bochner S (1933) Monotone Funktionen, Stieltjessche Integrale und harmonische Analyse. Math Ann 108:378–410

  3. Bohman H (1960) Approximate Fourier analysis of distribution functions. Ark Mat 4:99–157

  4. Bolin D, Lindgren F (2013) A comparison between Markov approximations and other methods for large spatial data sets. Comput Stat Data Anal 61:7–21

  5. Bolin D, Wallin J (2016) Spatially adaptive covariance tapering. Spat Stat 18–Part A:163–178

  6. Chan G, Wood ATA (1999) Simulation of stationary Gaussian vector fields. Stat Comput 9(4):265–268

  7. Chen Y, Davis TA, Hager WW, Rajamanickam S (2008) Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Trans Math Softw 35(3):1–14

  8. Chilès J, Delfiner P (2012) Geostatistics: modeling spatial uncertainty, 2nd edn. Wiley, London

  9. Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. J R Stat Soc Ser B 70:209–226

  10. Davis TA (2006) Direct methods for sparse linear systems. SIAM Series Fundamentals of Algorithms, Philadelphia

  11. Deltheil R (1926) Probabilités géométriques (Tome II, Fascicule II of : E. Borel, Traité du calcul des probabilités et de ses applications). Gauthier-Villars, Paris, 1–123

  12. Dijkstra E (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271

  13. Emery X (2004) Testing the correctness of the sequential algorithm for simulating Gaussian random fields. Stoch Env Res Risk Assess 18:401–413

  14. Emery X (2008) Statistical tests for validating geostatistical simulation algorithms. Comput Geosci 34(11):1610–1620

  15. Emery X, Lantuéjoul C (2006) TBSIM: a computer program for conditional simulation of three-dimensional Gaussian random fields via the turning bands method. Comput Geosci 32(10):1615–1628

  16. Emery X, Arroyo D, Porcu E (2015) An improved spectral turning-bands algorithm for simulating stationary vector Gaussian random fields. Stoch Env Res Risk Assess 30(7):1863–1873. doi:10.1007/s00477-015-1151-0

  17. Furrer R, Genton MG, Nychka D (2006) Covariance tapering for interpolation of large spatial datasets. J Comput Graph Stat 15(2):502–523

  18. Gneiting T (2002) Compactly supported correlation functions. J Multivar Anal 83(2):493–508

  19. Gneuss P, Schmid W, Schwarze R (2013) Efficient approximation of the spatial covariance function for large datasets—analysis of atmospheric \(\text{CO}_2\) concentrations. In: Discussion paper series recap15

  20. Hovadik J, Larue D (2007) Static characterizations of reservoirs: refining the concepts of connectivity and continuity. Pet Geosci 13:195–211

  21. Lantuéjoul C (2002) Geostatistical simulation. Springer, Berlin

  22. Lim T, Teo P (2008) Gaussian fields and Gaussian sheets with generalized Cauchy covariance structure. arXiv:0807.0022v1

  23. Lindgren F, Rue H, Lindstrom J (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J R Stat Soc B 73:423–498

  24. Lu TT, Shiou SH (2002) Inverses of \(2 \times 2\) block matrices. Comput Math Appl 43:119–129

  25. Marcotte D (2015) TASC3D: a program to test the admissibility in 3D of non-linear models of coregionalization. Comput Geosci 83:168–175

  26. Marcotte D (2016) Spatial turning bands simulation of anisotropic non linear models of coregionalization with symmetric cross-covariances. Comput Geosci 89:232–238

  27. Matheron G (1965) Les variables régionalisées et leur estimation. Ph.D. thesis, Faculté des Sciences, Université de Paris, Masson

  28. Matheron G (1971) The theory of regionalized variables and its applications. École nationale supérieure des mines 5:1–211

  29. Paravarzar S, Emery X, Madani N (2015) Comparing sequential Gaussian and turning bands algorithms for cosimulating grades in multi-element deposits. C R Geosci 347:84–93

  30. Philip J (1991) The probability distribution of the distance between two random points in a box. Department of Mathematics, Royal Institute of Technology, pp 1–13. https://people.kth.se/~johanph/habc.pdf

  31. Porcu E, Daley DJ, Buhmann M, Bevilacqua M (2013) Radial basis functions for multivariate geostatistics. Stoch Env Res Risk Assess 27(4):909–922

  32. Renard P, Allard D (2013) Connectivity metrics for subsurface flow and transport. Adv Water Resour 51:168–196

  33. Safikhani M, Asghari O, Emery X (2016) Assessing the accuracy of sequential Gaussian simulation through statistical testing. Stoch Environ Res Risk Assess. doi:10.1007/s00477-016-1255-1

  34. Sang H, Huang J (2012) A full scale approximation of covariance functions for large spatial data sets. J R Stat Soc B 74:111–132

  35. Shinozuka M, Jan CM (1972) Digital simulation of random processes and its applications. J Sound Vib 25:111–128

  36. Sneddon I (1951) Fourier transforms. McGraw-Hill, New York

  37. Stein M (1993) A simple condition for asymptotic optimality of linear predictions of random fields. Stat Probab Lett 17:399–404

  38. Stein M (1999) Interpolation of spatial data: some theory for kriging. Springer, Berlin

  39. Stein M (2013) Statistical properties of covariance tapers. J Comput Graph Stat 22:866–885

  40. Wackernagel H (2003) Multivariate geostatistics: an introduction with applications, 3rd edn. Springer, Berlin

  41. Wendland H (1995) Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Adv Comput Math 4(1):389–396

  42. Wu Z (1995) Compactly supported positive definite radial functions. Adv Comput Math 4(1):283–292

  43. Zhang H, Du J (2008) Covariance tapering in spatial statistics. In: Mateu J, Porcu E (eds) Positive definite functions: from Schoenberg to space-time challenges. Universidad Jaume I., Castellon (Spain)

Acknowledgements

We are indebted to an anonymous reviewer for an attentive and detailed review and for numerous constructive comments. We thank Prof. Emilio Porcu from Universidad Técnica Federico Santa María in Valparaíso (Chile) for fruitful discussions and for providing working material on generalized Wendland covariance functions and their use under fixed-domain asymptotics. This research was financed in part by the Natural Sciences and Engineering Research Council of Canada (Grant RGPIN105603-05).

Author information

Corresponding author

Correspondence to D. Marcotte.

Appendix: Proof of Proposition 2

We first establish two lemmas:

Lemma 1

(Adapted from Lu and Shiou 2002) Consider the symmetric block matrix

$$\left( \begin{array}{cc} {\mathbf{K}}_0^{-1} & \mathbf{D}_T \\ \mathbf{D}_T & {\mathbf{K}}_1 \end{array}\right),$$

where \(\mathbf{D}_T\) is diagonal and \({\mathbf{K}}_0\) and \({\mathbf{K}}_1\) are symmetric nonsingular matrices such that \({\mathbf{K}}_1 - \mathbf{D}_T {\mathbf{K}}_0 \mathbf{D}_T\) and \({\mathbf{K}}_0^{-1} - \mathbf{D}_T {\mathbf{K}}_1^{-1} \mathbf{D}_T\) are nonsingular. Then,

$$({\mathbf{K}}_0^{-1} - \mathbf{D}_T {\mathbf{K}}_1^{-1} \mathbf{D}_T)^{-1} = {\mathbf{K}}_0 + {\mathbf{K}}_0 \mathbf{D}_T ({\mathbf{K}}_1 - \mathbf{D}_T {\mathbf{K}}_0 \mathbf{D}_T)^{-1} \mathbf{D}_T {\mathbf{K}}_0.$$

Proof

This lemma is a direct consequence of Theorem 2 in Lu and Shiou (2002). \(\square\)
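
As a quick numerical sanity check (illustrative only), the Schur-complement identity underlying Lemma 1 can be verified on random symmetric positive definite matrices; the sizes, seed and diagonal entries below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def rand_spd(n):
    # random symmetric positive definite matrix
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

K0, K1 = rand_spd(n), rand_spd(n)
D = np.diag(rng.uniform(0.1, 0.5, n))  # diagonal matrix playing the role of D_T

# left-hand side: (K0^{-1} - D K1^{-1} D)^{-1}
lhs = np.linalg.inv(np.linalg.inv(K0) - D @ np.linalg.inv(K1) @ D)
# right-hand side: K0 + K0 D (K1 - D K0 D)^{-1} D K0
rhs = K0 + K0 @ D @ np.linalg.inv(K1 - D @ K0 @ D) @ D @ K0

print(np.allclose(lhs, rhs))
```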

Lemma 2

Let \({\mathbf{x}}_1,\dots , {\mathbf{x}}_n\) be n sample points in a finite domain D and let \(C({\mathbf{h}})\) be a covariance function on D with \(C(\mathbf{0})=1\). Let Z be a zero-mean Gaussian random field with covariance function \(C({\mathbf{h}})\) on D. Let further \({\mathbf{K}}\) be the \(n \times n\) matrix with elements \([{\mathbf{K}}]_{ij} = C({\mathbf{x}}_i-{\mathbf{x}}_j)\), for \(1 \le i,j \le n\), and \({\mathbf{k}}_{\mathbf{x}}\) be the n-vector with elements \([{\mathbf{k}}_{\mathbf{x}}]_i = C({\mathbf{x}}_i-{\mathbf{x}})\), for \({\mathbf{x}}\in D\) and \(1 \le i \le n\). Then, the matrix

$${\mathbf{K}}- {\mathbf{k}}_{\mathbf{x}}{\mathbf{k}}^{\prime }_{\mathbf{x}}$$

is positive semi-definite for any \({\mathbf{x}}\in D\).

Proof

It suffices to show that for any vector \(\pmb {\lambda }= (\lambda _1,\dots ,\lambda _n)^{\prime } \in {\mathfrak{R}}^n\), it holds that

$$Q = \sum _{i=1}^n \sum _{j=1}^n \lambda _i \lambda _j \left( [{\mathbf{K}}]_{ij} - [{\mathbf{k}}_{\mathbf{x}}]_{i} [{\mathbf{k}}_{\mathbf{x}}]_{j} \right) \ge 0.$$
(21)

Let us denote \(S = \sum _{i=1}^n \lambda _i Z({\mathbf{x}}_i)\). Then, since \(\hbox {Var}\{S\} = \sum _{i=1}^n \sum _{j=1}^n \lambda _i \lambda _j [{\mathbf{K}}]_{ij}\) and \(\hbox {Cov}\{S,Z({\mathbf{x}})\} = \sum _{i=1}^n \lambda _i [{\mathbf{k}}_{\mathbf{x}}]_i\), Eq. (21) is equivalent to

$$Q = \hbox{Var}\{S\} -\hbox {Cov}\{S,Z({\mathbf{x}})\}^2.$$

Using the Cauchy–Schwarz inequality \(\hbox {Cov}\{S,Z({\mathbf{x}})\}^2 \le \hbox {Var}\{S\} \hbox {Var}\{Z({\mathbf{x}})\}\) and \(\hbox {Var}\{Z({\mathbf{x}})\} = 1\), we obtain

$$Q \ge \hbox {Var}\{S\} -\hbox {Var}\{S\} \hbox {Var}\{Z({\mathbf{x}})\} = 0,$$

which finishes the proof. \(\square\)
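
A small numerical illustration of Lemma 2 (not part of the proof; the covariance model, domain and points are arbitrary choices): the smallest eigenvalue of \({\mathbf{K}}- {\mathbf{k}}_{\mathbf{x}}{\mathbf{k}}^{\prime }_{\mathbf{x}}\) should be nonnegative up to round-off.

```python
import numpy as np

rng = np.random.default_rng(2)

def cov(h):
    # exponential covariance with C(0) = 1, valid in 1D
    return np.exp(-np.abs(h))

x = rng.uniform(0.0, 5.0, 8)   # sample points x_1, ..., x_n in D = [0, 5]
x0 = 2.5                       # an arbitrary location x in D

K = cov(x[:, None] - x[None, :])   # [K]_ij = C(x_i - x_j)
k = cov(x - x0)                    # [k_x]_i = C(x_i - x)

# all eigenvalues of K - k k' must be >= 0 (up to round-off)
eigmin = float(np.linalg.eigvalsh(K - np.outer(k, k)).min())
print(eigmin >= -1e-10)
```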

We are now ready to provide the proof of Proposition 2. We must show that \(\sigma ^2_{k,C_1}({\mathbf{x}}) \ge \sigma ^2_{k,C_0}({\mathbf{x}})\) for all \({\mathbf{x}}\in D\). As usual, we drop the dependency on \({\mathbf{x}}\) for the sake of conciseness. Since \(\sigma ^2_{k,C_1} = \sigma ^2_0 - {\mathbf{k}}^{\prime }_1 {\mathbf{K}}_1^{-1} {\mathbf{k}}_1\) and \(\sigma ^2_{k,C_0} = \sigma ^2_0 - {\mathbf{k}}^{\prime }_0 {\mathbf{K}}_0^{-1} {\mathbf{k}}_0\), we need to prove that

$${\mathbf{k}}^{\prime }_0 {\mathbf{K}}_0^{-1} {\mathbf{k}}_0 - {\mathbf{k}}^{\prime }_1 {\mathbf{K}}_1^{-1} {\mathbf{k}}_1 \ge 0.$$
(22)

Since \({\mathbf{k}}_1 = {\mathbf{k}}_0 \odot {\mathbf{k}}_T\) and \({\mathbf{K}}_1 = {\mathbf{K}}_0 \odot {\mathbf{K}}_T\), where \(\odot\) denotes the elementwise (Hadamard) product, Eq. (22) is equivalent to

$$\sum _{i=1}^n \sum _{j=1}^n [{\mathbf{k}}_0]_i \left( [{\mathbf{K}}_0^{-1}]_{ij} - [{\mathbf{k}}_T]_i \, [\{{\mathbf{K}}_0 \odot {\mathbf{K}}_T\}^{-1}]_{ij} \, [{\mathbf{k}}_T]_j\right) [{\mathbf{k}}_0]_j \ge 0.$$
(23)

To show that this expression is always nonnegative, we will show that the matrix \({\mathbf{M}}\) with elements \([{\mathbf{M}}]_{ij} = [{\mathbf{K}}_0^{-1}]_{ij} - [{\mathbf{k}}_T]_i \, [\{{\mathbf{K}}_0 \odot {\mathbf{K}}_T\}^{-1}]_{ij} \, [{\mathbf{k}}_T]_j\), for \(1 \le i,j \le n\), is positive definite (p.d.), except in the trivial case \({\mathbf{K}}_0={\mathbf{K}}_1\), \({\mathbf{k}}_0={\mathbf{k}}_1\), corresponding to a taper with infinite range, where \({\mathbf{M}}={\mathbf{0}}\) and Eq. (22) equals zero. Introducing the diagonal matrix \(\mathbf{D}_T = \hbox {diag}({\mathbf{k}}_T)\), this matrix can also be written

$${\mathbf{M}}={\mathbf{K}}_0^{-1} - \mathbf{D}_T\, \{{\mathbf{K}}_0 \odot {\mathbf{K}}_T\}^{-1} \mathbf{D}_T.$$
(24)

Since \({\mathbf{M}}\) is invertible, it is p.d. if and only if \({\mathbf{M}}^{-1}\) is p.d. Using Lemma 1, its inverse is

$${\mathbf{M}}^{-1} = {\mathbf{K}}_0 + {\mathbf{K}}_0 \mathbf{D}_T\, \{{\mathbf{K}}_0 \odot {\mathbf{K}}_T - \mathbf{D}_T {\mathbf{K}}_0 \mathbf{D}_T\}^{-1} \mathbf{D}_T {\mathbf{K}}_0.$$
(25)

Using Lemma 2, \({\mathbf{K}}_T - {\mathbf{k}}_T {\mathbf{k}}_T^{\prime }\) is positive semi-definite. Hence, by Schur's product theorem, \({\mathbf{K}}_0 \odot ({\mathbf{K}}_T - {\mathbf{k}}_T {\mathbf{k}}_T^{\prime }) = {\mathbf{K}}_0 \odot {\mathbf{K}}_T - {\mathbf{K}}_0 \odot {\mathbf{k}}_T {\mathbf{k}}_T^{\prime }= {\mathbf{K}}_0 \odot {\mathbf{K}}_T - \mathbf{D}_T {\mathbf{K}}_0 \mathbf{D}_T\) is positive semi-definite; being nonsingular by assumption, it is p.d., and so is its inverse. The second term of Eq. (25) is a congruence \({\mathbf{A}}^{\prime } {\mathbf{X}} {\mathbf{A}}\) of this p.d. matrix with \({\mathbf{A}} = \mathbf{D}_T {\mathbf{K}}_0\), hence positive semi-definite. Adding the p.d. matrix \({\mathbf{K}}_0\), we conclude that \({\mathbf{M}}^{-1}\) in Eq. (25) is p.d., which completes the proof. \(\square\)
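
Proposition 2 states that tapering can only increase the kriging variance, i.e. the variance-reduction term in Eq. (22) is smaller for the tapered covariance. This can be illustrated numerically (a sketch; the exponential covariance, Wendland taper and random locations below are arbitrary choices, not the paper's test cases).

```python
import numpy as np

rng = np.random.default_rng(3)

def c0(h):
    # full covariance C_0 (exponential, unit sill)
    return np.exp(-np.abs(h) / 2.0)

def taper(h, a=3.0):
    # Wendland taper C_T, compactly supported on [0, a]
    t = np.minimum(np.abs(h) / a, 1.0)
    return (1.0 - t) ** 4 * (4.0 * t + 1.0)

x = rng.uniform(0.0, 10.0, 12)   # data locations
x0 = 5.0                         # prediction location
H = x[:, None] - x[None, :]

K0, KT = c0(H), taper(H)
k0, kT = c0(x - x0), taper(x - x0)

# variance-reduction terms of Eq. (22) for C_0 and for C_1 = C_0 * C_T
full = k0 @ np.linalg.solve(K0, k0)
half = (k0 * kT) @ np.linalg.solve(K0 * KT, k0 * kT)

# Proposition 2: full >= half, up to round-off
print(full - half >= -1e-10)
```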

About this article

Cite this article

Marcotte, D., Allard, D. Half-tapering strategy for conditional simulation with large datasets. Stoch Environ Res Risk Assess 32, 279–294 (2018). https://doi.org/10.1007/s00477-017-1386-z

Keywords

  • Wendland covariance functions
  • Sparsity index
  • Infill asymptotics
  • Spectral density
  • Covariance tapering
  • Taper function