
Properties and Comparison of Some Kriging Sub-model Aggregation Methods


Abstract

Kriging is a widely employed technique across computer experiments, machine learning and geostatistics. An important challenge for kriging is its high computational cost when dealing with large datasets. This article focuses on a class of methods that aim at decreasing this computational burden by aggregating kriging predictors based on smaller data subsets. More precisely, it shows that aggregation methods that ignore the covariance between sub-models typically yield inconsistent predictions, whereas the nested kriging method enjoys several attractive properties: it is consistent, it can be interpreted as an exact conditional distribution for a modified prior, and the conditional covariances given the observations can be computed efficiently. This article also includes a theoretical and numerical analysis of how the assignment of the observation points to the sub-models can affect the prediction ability of the aggregated model. Finally, the nested kriging method is extended to measurement errors and to universal kriging.


References

  • Abrahamsen P (1997) A review of Gaussian random fields and correlation functions. Technical report, Norwegian Computing Center

  • Allard D, Comunian A, Renard P (2012) Probability aggregation methods in geoscience. Math Geosci 44(5):545–581

  • Bacchi V, Jomard H, Scotti O, Antoshchenkova E, Bardet L, Duluc CM, Hebert H (2020) Using meta-models for tsunami hazard analysis: an example of application for the French Atlantic coast. Front Earth Sci 8(41):1–17

  • Bachoc F (2013) Cross validation and maximum likelihood estimations of hyper-parameters of Gaussian processes with model mispecification. Comput Stat Data Anal 66:55–69

  • Bachoc F, Ammar K, Martinez JM (2016) Improvement of code behavior in a design of experiments by metamodeling. Nucl Sci Eng 183(3):387–406

  • Bachoc F, Lagnoux A, Nguyen TMN (2017) Cross-validation estimation of covariance parameters under fixed-domain asymptotics. J Multivar Anal 160:42–67

  • Banerjee S, Gelfand AE, Finley AO, Sang H (2008) Gaussian predictive process models for large spatial data sets. J R Stat Soc Ser B (Stat Methodol) 70(4):825–848

  • Cao Y, Fleet DJ (2014) Generalized product of experts for automatic and principled fusion of Gaussian process predictions. In: Modern Nonparametrics 3: Automating the Learning Pipeline workshop at NIPS, Montreal, pp 1–5

  • Chevalier C, Ginsbourger D (2013) Fast computation of the multi-points expected improvement with applications in batch selection. In: Learning and intelligent optimization. Springer, Berlin, pp 59–69

  • Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty, vol 713. Wiley, New York

  • Chilès JP, Desassis N (2018) Fifty years of Kriging. Handbook of mathematical geosciences. Springer, Cham, pp 589–612

  • Cressie N (1990) The origins of Kriging. Math Geol 22(3):239–252

  • Cressie N (1993) Statistics for spatial data. Wiley, New York

  • Cressie N, Johannesson G (2008) Fixed rank Kriging for very large spatial data sets. J R Stat Soc Ser B (Stat Methodol) 70(1):209–226

  • Datta A, Banerjee S, Finley AO, Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 111(514):800–812

  • Davis BJK, Curriero FC (2019) Development and evaluation of geostatistical methods for non-Euclidean-based spatial covariance matrices. Math Geosci 51(6):767–791

  • Deisenroth MP, Ng JW (2015) Distributed Gaussian processes. In: Proceedings of the 32nd international conference on machine learning, Lille, France JMLR: W&CP, vol 37

  • Finley AO, Sang H, Banerjee S, Gelfand AE (2009) Improving the performance of predictive process modeling for large datasets. Comput Stat Data Anal 53(8):2873–2884

  • Furrer R, Genton MG, Nychka D (2006) Covariance tapering for interpolation of large spatial datasets. J Comput Graph Stat 15(3):502–523

  • He J, Qi J, Ramamohanarao K (2019) Query-aware Bayesian committee machine for scalable Gaussian process regression. In: Proceedings of the 2019 SIAM international conference on data mining. SIAM, pp 208–216

  • Heaton MJ, Datta A, Finley AO, Furrer R, Guinness J, Guhaniyogi R, Gerber F, Gramacy RB, Hammerling D, Katzfuss M, Lindgren F, Nychka D, Sun F, Zammit-Mangion A (2019) A case study competition among methods for analyzing large spatial data. J Agric Biol Environ Stat 24(3):398–425

  • Hensman J, Fusi N, Lawrence ND (2013) Gaussian processes for big data. In: Uncertainty in artificial intelligence, pp 282–290

  • Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14(8):1771–1800

  • Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black box functions. J Global Optim 13:455–492

  • Kaufman CG, Schervish MJ, Nychka DW (2008) Covariance tapering for likelihood-based estimation in large spatial data sets. J Am Stat Assoc 103(484):1545–1555

  • Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J South Afr Inst Min Metall 52(6):119–139

  • Krityakierne T, Baowan D (2020) Aggregated GP-based optimization for contaminant source localization. Oper Res Perspect 7:100151

  • Liu H, Cai J, Wang Y, Ong Y S (2018) Generalized robust Bayesian committee machine for large-scale Gaussian process regression. In: Proceedings of machine learning research, vol 80, pp 3131–3140, International Conference on Machine Learning 2018

  • Liu H, Ong Y, Shen X, Cai J (2020) When Gaussian process meets big data: a review of scalable GPs. IEEE Trans Neural Netw Learn Syst 31:4405–4423

  • Marrel A, Iooss B, Laurent B, Roustant O (2009) Calculations of Sobol indices for the Gaussian process metamodel. Reliab Eng Syst Saf 94(3):742–751

  • Matheron G (1970) La Théorie des Variables Régionalisées et ses Applications. Fascicule 5 in Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau, Ecole Nationale Supérieure des Mines de Paris

  • Putter H, Young A (2001) On the effect of covariance function estimation on the accuracy of Kriging predictors. Bernoulli 7(3):421–438

  • Quinonero-Candela J, Rasmussen CE (2005) A unifying view of sparse approximate Gaussian process regression. J Mach Learn Res 6:1939–1959

  • Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge

  • Roustant O, Ginsbourger D, Deville Y (2012) DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by Kriging-based metamodeling and optimization. J Stat Softw 51(1):1–55

  • Rue H, Held L (2005) Gaussian Markov random fields, Theory and applications. Chapman & Hall, Boca Raton

  • Rullière D, Durrande N, Bachoc F, Chevalier C (2018) Nested Kriging predictions for datasets with a large number of observations. Stat Comput 28(4):849–867

  • Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4:409–423

  • Santner TJ, Williams BJ, Notz WI (2013) The design and analysis of computer experiments. Springer, Berlin

  • Stein ML (2012) Interpolation of spatial data: some theory for Kriging. Springer, Berlin

  • Stein ML (2014) Limitations on low rank approximations for covariance matrices of spatial data. Spatial Stat 8:1–19

  • Sun X, Luo XS, Xu J, Zhao Z, Chen Y, Wu L, Chen Q, Zhang D (2019) Spatio-temporal variations and factors of a provincial pm 2.5 pollution in eastern china during 2013–2017 by geostatistics. Sci Rep 9(1):1–10

  • Tresp V (2000) A Bayesian committee machine. Neural Comput 12(11):2719–2741

  • van Stein B, Wang H, Kowalczyk W, Bäck T, Emmerich M (2015) Optimally weighted cluster Kriging for big data regression. In: International symposium on intelligent data analysis. Springer, pp 310–321

  • van Stein B, Wang H, Kowalczyk W, Emmerich M, Bäck T (2020) Cluster-based Kriging approximation algorithms for complexity reduction. Appl Intell 50(3):778–791

  • Vazquez E, Bect J (2010a) Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J Stat Plann Inference 140(11):3088–3095

  • Vazquez E, Bect J (2010b) Pointwise consistency of the kriging predictor with known mean and covariance functions. In: Giovagnoli A, Atkinson AC, Torsney B, May C (eds) mODa 9—Advances in model-oriented design and analysis. Physica-Verlag HD, Heidelberg, pp 221–228. ISBN 978-3-7908-2410-0

  • Ying Z (1991) Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process. J Multivar Anal 36:280–296

  • Zhang H, Wang Y (2010) Kriging and cross validation for massive spatial data. Environmetrics 21:290–304

  • Zhu Z, Zhang H (2006) Spatial sampling design under the infill asymptotic framework. Environmetrics 17(4):323–337


Funding

Part of this research was conducted within the frame of the Chair in Applied Mathematics OQUAIDO, which gathers partners in technological research (BRGM, CEA, IFPEN, IRSN, Safran, Storengy) and academia (Ecole Centrale de Lyon, Mines Saint-Étienne, University of Nice, University of Toulouse and CNRS) around advanced methods for computer experiments. The authors F. Bachoc and D. Rullière acknowledge support from the regional MATH-AmSud program, Grant Number 20-MATH-03. The authors are grateful to the Editor-in-Chief, an Associate Editor and all referees for their constructive suggestions that led to an improvement in the manuscript.

Author information

Corresponding author

Correspondence to François Bachoc.

Appendices

Appendix 1. Proof of Proposition 1

For \(v \in \mathbb {R}^m\), we let \(|v| = \max _{i=1,\ldots ,m} |v_i|\) and \(B(v,r) = \{w \in \mathbb {R}^m,|v-w| \le r\}\).

Let \(x_0,{\bar{x}} \in D\), \(r_{x_0}>0\) and \(r_{{\bar{x}}}>0\) be fixed and satisfy \(B(x_0,r_{x_0}) \subset D\), \(B({\bar{x}},r_{{\bar{x}}}) \subset D\), \(B(x_0,r_{x_0}) \cap B({\bar{x}},r_{{\bar{x}}}) = \varnothing \) and \(k(x_0,{\bar{x}}) >0\). [The existence is implied by the assumptions of the proposition.] By continuity of k, \(r_{x_0} >0\) and \(r_{{\bar{x}}} >0\) can be selected small enough so that, with some fixed \(\epsilon _2 >0\) and \(\delta _1 >0\), for \(v \in B( x_0 , r_{x_0} )\) and \(w \in B( {\bar{x}} , r_{{\bar{x}}} )\), \(| v - w | \ge \delta _1\), \( k(x_0,x_0) /2 \le k(v,v) \le 2k(x_0,x_0)\), \( k({\bar{x}},{\bar{x}}) /2 \le k(w,w) \le 2k({\bar{x}},{\bar{x}})\) and

$$\begin{aligned} k(v,v) - \frac{k(v,w)^2}{k(w,w)} \le k(v,v) - \epsilon _2. \end{aligned}$$
(28)

For \(\delta >0\), let

$$\begin{aligned} V(\delta ) = \inf _{n \in \mathbb {N}} \inf _{\begin{array}{c} x_0,x_1,\ldots ,x_n \in D; \\ \forall i=1,\ldots ,n, |x_i - x_0| \ge \delta \end{array}} {{\,\mathrm{\mathrm {Var}}\,}}\left[ Y(x_0) | Y(x_1),\ldots ,Y(x_n) \right] . \end{aligned}$$

Then \(V(\delta )>0\) because of the NEB, by continuity of k and by compactness.

Consider a decreasing sequence \(\delta _n\) of non-negative numbers such that \(\delta _n \rightarrow _{n \rightarrow \infty } 0\), and which will be specified below. There exists a sequence \((u_n)_{n \in \mathbb {N}} \in D^{\mathbb {N}}\), composed of pairwise distinct elements, such that \(\lim _{n \rightarrow \infty } \sup _{x \in D}\min _{i=1,\ldots ,n} | u_{i} - x | = 0\), and such that for all n

$$\begin{aligned} \inf _{ \begin{array}{c} 1 \le i,j \le n \\ i \ne j \\ u_i,u_j \in B(x_0,r_{x_0} ) \end{array}} |u_i - u_j| \ge 4 \delta _n. \end{aligned}$$

Such a sequence indeed exists from Lemma 2 below.

Consider then a sequence \((w_n)_{n \in \mathbb {N}} \in D^{\mathbb {N}}\) such that for all n, \(w_n = {\bar{x}} -(r_{{\bar{x}}}/(1+n)) e_1\) with \(e_1=(1,0,\ldots ,0)\). We can assume furthermore that \(\{u_n \}_{n \in \mathbb {N}}\) and \(\{w_n \}_{n \in \mathbb {N}}\) are disjoint (this almost surely holds with the construction of Lemma 2 for \((u_n)\)).

Let us now consider two sequences of integers \(p_n\) and \(k_n\) with \(k_n \rightarrow \infty \) and \(p_n \rightarrow \infty \) to be specified later. Let \(C_n\) be the largest natural number m satisfying \(m (p_n-1) < n\). Let \(X = (X_1,\ldots ,X_{p_n})\) be defined by, for \(i=1,\ldots ,k_n\), \(X_i = ( u_j)_{ j=(i-1)C_n + 1,\ldots ,i C_n }\); for \(i=k_n+1,\ldots ,p_n-1\), \(X_i = ( w_j)_{j=(i-k_n-1) C_n + 1,\ldots ,(i - k_n) C_n }\); and \(X_{p_n} = ( w_j)_{j=(p_n-k_n-1) C_n + 1,\ldots ,n- k_n C_n }\). With this construction, note that \(X_{p_n}\) is nonempty. Furthermore, the sequence of vectors \(X = (X_{1},\ldots ,X_{p_n})\), indexed by \(n \in \mathbb {N}\), defines a triangular array of observation points satisfying the conditions of the proposition.

Let us discuss the construction of \((u_n)_{n \in \mathbb {N}}\), \((w_n)_{n \in \mathbb {N}}\), \(k_n\), \(C_n\) and \(p_n\) more informally. The sequence \((u_n)_{n \in \mathbb {N}}\) is dense in D, and \(X_1,\ldots ,X_{k_n}\) are composed of the \(k_n C_n\) first points of this sequence. Then, \(X_{k_n+1},\ldots ,X_{p_n}\) are composed of the \(n - C_n k_n\) first points of the sequence \((w_n)_{n \in \mathbb {N}}\), which is concentrated around \({\bar{x}}\). We will let \(k_n / p_n \rightarrow 0\) so that the majority of the groups in X contain points of \((w_n)_{n \in \mathbb {N}}\), so that they do not contain relevant information on the values of Y on \(B(x_0,r_{x_0})\) and yield an inconsistency of the aggregated predictor \(M_{{\mathcal {A}},n}\) on \(B(x_0,r_{x_0})\).
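
To make the construction above concrete, the following short sketch (an illustration only, not part of the proof) assembles the groups \(X_1,\ldots ,X_{p_n}\) for a given n in dimension one; the function name build_groups, the domain \(D=[0,1]\) and the values standing in for \({\bar{x}}\) and \(r_{{\bar{x}}}\) are arbitrary choices, and a uniform random sample plays the role of the space-filling sequence \((u_i)\).

    import numpy as np

    def build_groups(n, x_bar=0.9, r_bar=0.05, seed=0):
        # Sketch of the group construction in the proof of Proposition 1 (1-D, D = [0, 1]):
        # k_n groups are filled with a space-filling sequence standing in for (u_i), the
        # remaining p_n - k_n groups with a sequence concentrated near x_bar standing in for (w_i).
        rng = np.random.default_rng(seed)
        p_n = int(n ** (4 / 5))
        k_n = int(n ** (1 / 5))
        C_n = (n - 1) // (p_n - 1)                            # largest m with m * (p_n - 1) < n
        u = rng.uniform(0.0, 1.0, size=k_n * C_n)             # stand-in for the dense sequence (u_i)
        w = x_bar - r_bar / (2.0 + np.arange(n - k_n * C_n))  # stand-in for w_j = x_bar - r_bar / (1 + j)
        groups = [u[i * C_n:(i + 1) * C_n] for i in range(k_n)]
        groups += [w[j * C_n:(j + 1) * C_n] for j in range(p_n - k_n - 1)]
        groups.append(w[(p_n - k_n - 1) * C_n:])              # X_{p_n}, nonempty since C_n * (p_n - 1) < n
        assert sum(len(g) for g in groups) == n and len(groups[-1]) >= 1
        return groups

    groups = build_groups(10_000)
    print(len(groups), len(groups[0]), len(groups[-1]))       # p_n groups; most contain only points near x_bar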

Coming back to the proof, observe that \(\inf _{i \in \mathbb {N}} \inf _{x \in B(x_0,r_{x_0})} |w_i - x| \ge \delta _1\) and let \(\epsilon _1 = V(\delta _1) >0\). Then, we have for all \(n \in \mathbb {N}\), for all \(x \in B(x_0,r_{x_0})\), and for all \(k=k_n+1,\ldots ,p_n\), since then \(X_k\) is nonempty and only contains elements \(w_i \in B({\bar{x}},r_{{\bar{x}}})\), from Eq. (28)

$$\begin{aligned} \epsilon _1 \le v_k(x) \le k(x,x) - \epsilon _2. \end{aligned}$$
(29)

Let \(\mathcal {E}_n = \{x \in B(x_0,r_{x_0}) ; \min _{i=1,\ldots ,n} | x - u_i | \ge \delta _n \}\) and let \(x \in \mathcal {E}_n\). Since x is not a component of X, we have \(v_k(x) >0\) for all k. Also, \(v_{p_n}(x) < k(x,x)\) from Eq. (29). Hence, \(M_{{\mathcal {A}},n}(x)\) is well-defined.

For two random variables A and B, we let \(||A-B|| = ({{\,\mathrm{\mathrm {E}}\,}}\left[ (A-B)^2\right] )^{1/2}\). For \(x \in \mathcal {E}_n\) let

$$\begin{aligned} R(x)= & {} \left| \left| \sum _{k=1}^{k_n} \alpha _{k,n}( v_1(x),\ldots ,v_{p_n}(x),v_\mathrm{prior}(x) ) M_k(x) \right| \right| . \end{aligned}$$

Then, from the triangle inequality, and since, from the law of total variance, \(|| M_k(x) || \le || Y( x ) || = v_\mathrm{prior}(x)^{1/2}\), we have, with \(\mathcal {V} = \{ k(x,x) ; x \in B(x_0,r_{x_0}) \} \)

$$\begin{aligned} R(x)\le & {} \frac{ \sum _{k=1}^{k_n} a( v_k(x) , v_\mathrm{prior}(x) ) \sqrt{v_\mathrm{prior}(x)} }{ \sum _{l=1}^{p_n} b( v_l(x) , v_\mathrm{prior}(x) ) } \\\le & {} \frac{ k_n \sup _{v \in \mathcal {V}, V(\delta _n) \le s^2 \le v} a( s^2,v ) \sqrt{v} }{ (p_n - k_n) \inf _{v \in \mathcal {V}, \epsilon _1 \le s^2 \le v - \epsilon _2} b( s^2,v ) }, \end{aligned}$$

where the last inequality is obtained from Eq. (29) and the definition of \(\delta _n\) and \(V(\delta )\).

Now, for \(\delta >0\), let \(s(\delta ) = \sup _{v \in \mathcal {V},V(\delta ) \le s^2 \le v } a( s^2 , v )\). Since a is continuous and since \(V(\delta ) >0\), we have that \(s(\delta )\) is finite. Hence, we can choose a sequence \(\delta _n\) of positive numbers such that \(\delta _n \rightarrow _{n \rightarrow \infty } 0\) and \(s(\delta _n) \le \sqrt{n}\) (for instance, let \(\delta _n = \inf \{ \delta \ge n^{-1/2}; s(\delta ) \le n^{1/2} \}\)). Then, we can choose \(p_n = n^{4/5}\) and \(k_n = n^{1/5}\). Then, for large enough n,

$$\begin{aligned} \frac{k_n}{p_n - k_n} s(\delta _n) \le 2 n^{-3/5} \sqrt{n} \rightarrow _{n \rightarrow \infty } 0. \end{aligned}$$

Since

$$\begin{aligned} \frac{ \sup _{v \in \mathcal {V}} \sqrt{v} }{ \inf _{v \in \mathcal {V},\epsilon _1 \le s^2 \le v - \epsilon _2 } b(s^2,v) }\, , \end{aligned}$$

is a finite constant, as b is positive and continuous on \(\mathring{\Delta }\), we have that \( \sup _{x \in \mathcal {E}_n} R(x) \rightarrow _{n \rightarrow \infty } 0\). As a consequence, we have from the triangle inequality, for \(x \in \mathcal {E}_n\)

$$\begin{aligned} || Y(x) - M_{{\mathcal {A}},n}(x) || \ge&||Y(x) - \sum _{k=k_n+1}^{p_n} \alpha _{k,n}( v_1(x),\ldots ,v_{p_n}(x) , v_\mathrm{prior}(x) ) M_k(x) || \\&- || \sum _{k=k_n+1}^{p_n} \alpha _{k,n}( v_1(x),\ldots ,v_{p_n}(x) , v_\mathrm{prior}(x) ) M_k(x) - M_{{\mathcal {A}},n}(x) || \\ \ge&\inf _{x \in \mathcal {E}_n} \left| \left| Y(x) - \sum _{k=k_n+1}^{p_n} \alpha _{k,n}( v_1(x),\ldots ,v_{p_n}(x) , v_\mathrm{prior}(x) ) M_k(x) \right| \right| \\&- \sup _{x \in \mathcal {E}_n} R(x). \end{aligned}$$

Since \(X_{k_n+1},\ldots ,X_{p_n}\) are composed only of elements of \(\{ w_i \}_{i \in \mathbb {N}}\), we obtain

$$\begin{aligned} \liminf _{n \rightarrow \infty } \inf _{x \in \mathcal {E}_n} || Y(x) - M_{{\mathcal {A}},n}(x) || \ge V(\delta _1) > 0. \end{aligned}$$

As a result there exist fixed \(n_0 \in \mathbb {N}\) and \(A >0\) so that for \(n \ge n_0\) and \(x \in \mathcal {E}_n\), \(|| Y(x) - M_{{\mathcal {A}},n}(x) || \ge A\). We thus have, for \(n \ge n_0\)

$$\begin{aligned} \int _{D} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{{\mathcal {A}},n}(x)\right) ^2\right] dx \ge&\int _{\mathcal {E}_n} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{{\mathcal {A}},n}(x)\right) ^2\right] dx \\ \ge&\int _{\mathcal {E}_n} A^2 dx. \end{aligned}$$

It remains to be shown that the limit inferior of the volume of \(\mathcal {E}_n\) is not zero in order to show Eq. (4). Let \(N_n\) be the integer part of \(r_{x_0} / 4 \delta _n\). Then, the ball \(B(x_0,r_{x_0})\) contains \((2 N_n)^d\) disjoint balls of the form \(B(a,4 \delta _n)\) with \(a \in B(x_0,r_{x_0})\). If one of these balls \(B(a,4 \delta _n)\) does not intersect with \((u_i)_{i=1\ldots ,n}\), then we can associate to it a ball of the form \(B(s_a,\delta _n) \subset B(a,4 \delta _n) \cap \mathcal {E}_n\). If one of these balls \(B(a,4 \delta _n)\) does intersect with one \(u_j \in \{u_i\}_{i=1\ldots ,n}\), then we can find a ball \(B(s_a, \delta _n/2 ) \subset ( B(u_j, 2 \delta _n) \backslash B(u_j, \delta _n) ) \cap B(a,4 \delta _n) \cap \mathcal {E}_n\). Hence, we have found \((2 N_n)^d\) disjoint balls with radius \(\delta _n/2\) in \(\mathcal {E}_n\). Therefore, \(\mathcal {E}_n\) has volume at least \(2^d ((r_{x_0} / 4 \delta _n) - 1)^d \delta _n^{d} \) which has a strictly positive limit inferior. Hence, Eq. (4) is proved.

Finally, if \( {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x_0) - M_{{\mathcal {A}},n}(x_0)\right) ^2\right] \rightarrow 0\) as \(n \rightarrow \infty \) for almost all \(x_0 \in D\), then

$$\begin{aligned} \int _D \min \left( {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{{\mathcal {A}},n}(x)\right) ^2\right] , 1 \right) dx \rightarrow _{n \rightarrow \infty } 0\, , \end{aligned}$$

from the dominated convergence theorem. This is in contradiction to the proof of Eq. (4), since, for \(n \ge n_0\), this integral is at least \(\min (A^2,1)\) times the volume of \(\mathcal {E}_n\), which has a strictly positive limit inferior. Hence, Eq. (5) is proved. \(\square \)

Lemma 2

There exists a sequence \((u_n)_{n \in \mathbb {N}} \in D^{\mathbb {N}}\), composed of pairwise distinct elements, such that

$$\begin{aligned} \lim _{n \rightarrow \infty } \sup _{x \in D}\min _{i=1,\ldots ,n} | u_{i} - x | = 0, \end{aligned}$$
(30)

and such that for all n

$$\begin{aligned} \inf _{ \begin{array}{c} 1 \le i,j \le n \\ i \ne j \\ u_i,u_j \in B(x_0,r_{x_0} ) \end{array}} |u_i - u_j| \ge 4 \delta _n. \end{aligned}$$
(31)

Proof

Such a sequence can be constructed, for instance, by the following random procedure. Let \(D \subset B(0,R)\) for large enough \(R>0\). Define \(u_1 \in D\) arbitrarily. For \(n=1,2,\ldots \), (1) if the set \( \mathcal {S}_n = \{u \in B(x_0,r_{x_0} ) ; \min _{i=1,\ldots ,n} |u-u_i| > 4 \delta _{n+1} \}\) is nonempty, sample \(u_{n+1}\) from the uniform distribution on \(\mathcal {S}_n\). (2) If \(\mathcal {S}_n\) is empty, sample \({\tilde{u}}_{n+1}\) from the uniform distribution on \(B(0,R) \backslash B(x_0,r_{x_0} )\), and set \(u_{n+1}\) as the projection of \({\tilde{u}}_{n+1}\) on \(D \backslash B(x_0,r_{x_0} )\). One can see that Eq. (31) is satisfied by definition. Furthermore, one can show that Eq. (30) almost surely holds. Indeed, let \(x \in B(x_0,r_{x_0} )\) and \(\epsilon >0\), and assume that with nonzero probability \(B(x,\epsilon ) \cap \{ u_i\}_{i \in \mathbb {N}} = \varnothing \). Then, case (1) occurs infinitely often, and for each i for which case (1) occurs, there is a probability at least \( \epsilon ^d / (2 r_{x_0})^d \) that \(u_i \in B(x,\epsilon )\) (when \(4 \delta _n \le \epsilon / 2\)). This yields a contradiction. Hence, for all \(x \in B(x_0,r_{x_0} )\) and \(\epsilon >0\), almost surely, \(B(x,\epsilon ) \cap \{ u_i\}_{i \in \mathbb {N}} \ne \varnothing \). We similarly show that for all \(x \in D \backslash B(x_0,r_{x_0} )\) and \(\epsilon >0\), almost surely, \(B(x,\epsilon ) \cap \{ u_i\}_{i \in \mathbb {N}} \ne \varnothing \). This shows that Eq. (30) almost surely holds. Hence, a fortiori, there exists a sequence \((u_n)_{n \in \mathbb {N}} \in D^{\mathbb {N}}\) satisfying the conditions of the lemma. \(\square \)
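
The two-step random procedure of this proof can be illustrated by the following sketch (one-dimensional, with illustrative values of \(x_0\), \(r_{x_0}\), R and a placeholder sequence \(\delta _n\); the uniform draw on \(\mathcal {S}_n\) is approximated by rejection sampling), which is not part of the proof.

    import numpy as np

    def lemma2_sequence(N, x0=0.5, r0=0.1, R=2.0, seed=0, delta=lambda n: 0.05 / n):
        # Sketch of the two-step random construction of Lemma 2 (1-D, D = [0, 1], sup norm = |.|).
        # delta is a placeholder for the sequence (delta_n); all values are illustrative.
        rng = np.random.default_rng(seed)
        u = [rng.uniform(0.0, 1.0)]
        lo, hi = x0 - r0, x0 + r0                      # the ball B(x0, r_{x0})
        for n in range(1, N):
            d = 4.0 * delta(n + 1)
            # Step (1): try to draw u_{n+1} in B(x0, r_{x0}) at distance > 4 delta_{n+1} from u_1, ..., u_n.
            cand = rng.uniform(lo, hi, size=200)
            far = cand[np.min(np.abs(cand[:, None] - np.array(u)[None, :]), axis=1) > d]
            if far.size > 0:
                u.append(float(far[0]))
            else:
                # Step (2): draw in B(0, R) and project onto D \ B(x0, r_{x0})
                # (loosely: the proof draws in B(0, R) \ B(x0, r_{x0}) before projecting).
                t = min(max(rng.uniform(-R, R), 0.0), 1.0)
                if lo < t < hi:
                    t = lo if t - lo < hi - t else hi
                u.append(t)
        return np.array(u)

    u = lemma2_sequence(500)
    print(np.max(np.diff(np.sort(u))))                 # rough proxy for the fill distance, small for large N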

Remark 6

Consider the case \(d=1\). The proof of Proposition 1 can be modified so that the partition \(X_1,\ldots ,X_{p_n}\) also satisfies \(x \le x'\) for any \(x \in X_i\), \(x' \in X_j\), \(1 \le i < j \le p_n\). To see this, consider the same X as in this proof. Let \(X_1,\ldots ,X_{p_n}\) have the same cardinality as in this proof, and let the \(C_n\) smallest elements of X be associated to \(X_1\), the next \(C_n\) smallest be associated to \(X_2\) and so on. Then, one can show that there are at most \(k_n+2\) groups containing elements of \((u_i)_{i \in \mathbb {N}} \cap B(x_0 , r_{x_0}) \) and at least \(p_n - k_n -2\) groups containing only elements of \(B({\bar{x}},r_{{\bar{x}}})\). From these observations, Eqs. (4) and (5) can be proved similarly as in the proof of Proposition 1.

Appendix 2. Proof of Proposition 2

Because D is compact, we have \(\lim _{n \rightarrow \infty } \sup _{x \in D} \min _{i=1,\ldots ,n} || x_{ni} - x || = 0\). Indeed, if this does not hold, there exists \(\epsilon >0\) and a subsequence \(\phi (n)\) such that \(\sup _{x \in D} \min _{i=1,\ldots ,\phi (n)} || x_{\phi (n)i} - x || \ge 2 \epsilon \). Hence, there exists a sequence \(x_{\phi (n)} \in D\) such that \(\min _{i=1,\ldots ,\phi (n)} || x_{\phi (n)i} - x_{\phi (n)} || \ge \epsilon \). Since D is compact, up to extracting a further subsequence, we can also assume that \(x_{\phi (n)} \rightarrow _{n \rightarrow \infty } x_{lim}\) with \(x_{lim} \in D\). This implies that for all large enough n, \(\min _{i=1,\ldots ,\phi (n)} || x_{\phi (n)i} - x_{lim} || \ge \epsilon / 2\), which is in contradiction to the assumptions of the proposition.

Hence there exists a sequence of positive numbers \(\delta _n\) such that \(\delta _n \rightarrow _{n \rightarrow \infty } 0\) and such that for all \(x \in D\) there exists a sequence of indices \(i_n(x)\) such that \(i_n(x) \in \{1,\ldots ,n\}\) and \(||x - x_{n i_n(x)}|| \le \delta _n\). There also exists a sequence of indices \(j_n(x)\) such that \(x_{ni_n(x)}\) is a component of \(X_{j_n(x)}\). With these notations we have, since \(M_1(x)\),..., \(M_{p_n}(x)\), \(M_{{\mathcal {A}}}(x)\) are linear combinations with minimal square prediction errors

$$\begin{aligned} \sup _{x \in D} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{{\mathcal {A}}}(x) \right) ^2\right]\le & {} \sup _{x \in D} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{j_n(x)}(x)\right) ^2\right] \nonumber \\\le & {} \sup _{x \in D} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - {{\,\mathrm{\mathrm {E}}\,}}\left[ Y(x) | Y(x_{ni_n(x)})\right] \right) ^2\right] . \end{aligned}$$
(32)

In the rest of the proof we essentially show that, for a dense triangular array of observation points, the kriging predictor that predicts Y(x) based only on the nearest neighbor of x among the observation points has a mean square prediction error that tends to zero uniformly in x when k is continuous. We believe that this fact is already known, but we have not been able to find a precise result in the literature. From Eq. (32) we have

$$\begin{aligned}&\sup _{x \in D} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{{\mathcal {A}}}(x) \right) ^2 \right] \\&\quad \le \sup _{x \in D} \left[ {\mathbbm {1}}\{ k(x_{ni_n(x)},x_{ni_n(x)}) = 0 \} k(x,x) + {\mathbbm {1}}\{ k(x_{ni_n(x)},x_{ni_n(x)})> 0 \}\right. \\&\qquad \left. \left( k(x,x) - \frac{k(x,x_{ni_n(x)})^2}{k(x_{ni_n(x)},x_{ni_n(x)})} \right) \right]&\\&\quad \le \sup _{\begin{array}{c} x,t \in D; \\ ||x-t|| \le \delta _n \end{array}} \left[ {\mathbbm {1}}\{ k(t,t) = 0 \} k(x,x) + {\mathbbm {1}}\{ k(t,t) > 0 \}\left( k(x,x) - \frac{k(x,t)^2}{k(t,t)} \right) \right]&\\&\quad = \sup _{\begin{array}{c} x,t \in D; \\ ||x-t|| \le \delta _n \end{array}} F(x,t).&\end{aligned}$$

Assume now that the above supremum does not go to zero as \(n \rightarrow \infty \). Then there exists \(\epsilon >0\) and two sub-sequences \(x_{\phi (n)}\) and \(t_{\phi (n)}\) with values in D such that \(x_{\phi (n)} \rightarrow _{n \rightarrow \infty } x_{lim}\) and \(t_{\phi (n)} \rightarrow _{n \rightarrow \infty } x_{lim}\), with \(x_{lim} \in D\) and such that \(F(x_{\phi (n)},t_{\phi (n)}) \ge \epsilon \). If \(k(x_{lim},x_{lim}) = 0\) then \(F(x_{\phi (n)},t_{\phi (n)}) \le k(x_{\phi (n)},x_{\phi (n)}) \rightarrow _{n \rightarrow \infty } 0\). If \(k(x_{lim},x_{lim}) > 0\), then for large enough n,

$$\begin{aligned} F(x_{\phi (n)},t_{\phi (n)}) = k(x_{\phi (n)},x_{\phi (n)}) - \frac{k(x_{\phi (n)},t_{\phi (n)})^2}{k(t_{\phi (n)},t_{\phi (n)})} \, , \end{aligned}$$

which tends to zero as \(n \rightarrow \infty \) since k is continuous. Hence we have a contradiction, which completes the proof.
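
As a numerical illustration of this last argument (not part of the proof), the following sketch evaluates the one-nearest-neighbor prediction error \(k(x,x) - k(x,t)^2/k(t,t)\), maximized over a fine grid of prediction points, for increasingly dense regular designs and an arbitrary continuous covariance function (a squared exponential with length scale 0.3); the supremum indeed decreases towards zero.

    import numpy as np

    def k(a, b, ell=0.3):
        # An arbitrary continuous covariance function (squared exponential) playing the role of k.
        return np.exp(-0.5 * ((a - b) / ell) ** 2)

    xs = np.linspace(0.0, 1.0, 2001)                    # fine grid of prediction points x in D = [0, 1]
    for n in (5, 20, 80, 320):
        design = np.linspace(0.0, 1.0, n)               # regular design, so delta_n -> 0 as n grows
        t = design[np.argmin(np.abs(xs[:, None] - design[None, :]), axis=1)]   # nearest design point of each x
        err = k(xs, xs) - k(xs, t) ** 2 / k(t, t)       # one-nearest-neighbor prediction error F(x, t)
        print(n, float(err.max()))                      # the supremum decreases towards zero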

Appendix 3. Proofs from Sect. 3.2

First notice that denoting \(k_{\mathcal {A}}(x,x') = {{\,\mathrm{\mathrm {Cov}}\,}}\left[ Y_{\mathcal {A}}(x), Y_{\mathcal {A}}(x')\right] \), we easily get for all \(x, x' \in D\)

$$\begin{aligned} \begin{aligned} k_{\mathcal {A}}(x,x')&= k(x,x') + 2 {k_M(x,x)}^t K_M(x,x)^{-1} K_M(x,x') K_M(x',x')^{-1} k_M(x',x') \\&\qquad - {k_M(x,x)}^t K_M(x,x)^{-1} k_M(x,x') - {k_M(x',x')}^t K_M(x',x')^{-1} k_M(x',x). \end{aligned} \end{aligned}$$
(33)

A direct consequence of Eq. (33) is \(k_{\mathcal {A}}(x,x) = k(x,x)\), and under the interpolation assumption H2, since \(Y_{\mathcal {A}}(X) = Y(X)\), \(k_{\mathcal {A}}(X,X) = k(X,X)\).
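
For the first identity, taking \(x'=x\) in Eq. (33) and writing, as a shorthand not used elsewhere in the article, \(a(x) = {k_M(x,x)}^t K_M(x,x)^{-1} k_M(x,x)\), the three correction terms cancel:

$$\begin{aligned} k_{\mathcal {A}}(x,x) = k(x,x) + 2 a(x) - a(x) - a(x) = k(x,x). \end{aligned}$$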

Proof of Proposition 3

The interpolation hypothesis \(M_{\mathcal {A}}(X) = Y(X)\) ensures that \(\varepsilon '_{\mathcal {A}}(X)=0\), so we have

$$\begin{aligned} \begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\left[ Y_{\mathcal {A}}(x) | Y_{\mathcal {A}}(X)\right]&= {{\,\mathrm{\mathrm {E}}\,}}\left[ Y_{\mathcal {A}}(x) | M_{\mathcal {A}}(X)+0\right] \\&= {{\,\mathrm{\mathrm {E}}\,}}\left[ M_{\mathcal {A}}(x) | M_{\mathcal {A}}(X)\right] + {{\,\mathrm{\mathrm {E}}\,}}\left[ \varepsilon '_{\mathcal {A}}(x) | M_{\mathcal {A}}(X)\right] \\&= {{\,\mathrm{\mathrm {E}}\,}}\left[ g_x(Y(X)) | Y(X)\right] + 0 \\&= M_{\mathcal {A}}(x). \end{aligned} \end{aligned}$$
(34)

The proof that \(v_{\mathcal {A}}\) is a conditional variance follows the same pattern

$$\begin{aligned} \begin{aligned} {{\,\mathrm{\mathrm {Var}}\,}}\left[ Y_{\mathcal {A}}(x) | Y_{\mathcal {A}}(X)\right]&= {{\,\mathrm{\mathrm {Var}}\,}}\left[ Y_{\mathcal {A}}(x) | M_{\mathcal {A}}(X)\right] \\&= {{\,\mathrm{\mathrm {Var}}\,}}\left[ M_{\mathcal {A}}(x) | M_{\mathcal {A}}(X)\right] + {{\,\mathrm{\mathrm {Var}}\,}}\left[ \varepsilon '_{\mathcal {A}}(x) \right] \\&= v_{\mathcal {A}}(x). \end{aligned} \end{aligned}$$
(35)

\(\square \)

Proof of Proposition 4

Equation (11) is the classical expression of Gaussian conditional covariances, based on the fact that \(Y_{\mathcal {A}}\) is Gaussian. Let us now prove Eq. (12). For a component \(x_k\) of the vector of points X, using the interpolation assumption, we have \(M_{\mathcal {A}}(x_k) = Y(x_k)\) and

$$\begin{aligned} {{\,\mathrm{\mathrm {Cov}}\,}}\left[ Y_{\mathcal {A}}(x),Y_{\mathcal {A}}(x_k)\right] = {{\,\mathrm{\mathrm {Cov}}\,}}\left[ M_{\mathcal {A}}(x)+ \varepsilon _{\mathcal {A}}'(x), M_{\mathcal {A}}(x_k)\right] = {{\,\mathrm{\mathrm {Cov}}\,}}\left[ M_{\mathcal {A}}(x), Y(x_k)\right] . \end{aligned}$$

Note that \(\alpha _{\mathcal {A}}(x)\) is the \(p \times 1\) vector of aggregation weights of different sub-models at point x, so that \(M_{\mathcal {A}}(x)= {\alpha _{\mathcal {A}}(x)}^tM(x)\) and \( k_{\mathcal {A}}(x,x_k) = {\alpha _{\mathcal {A}}(x)}^t {{\,\mathrm{\mathrm {Cov}}\,}}\left[ M(x), Y(x_k)\right] \). We thus get

$$\begin{aligned} k_{\mathcal {A}}(x,X)= & {} {\alpha _{\mathcal {A}}(x)}^t {{\,\mathrm{\mathrm {Cov}}\,}}\left[ M(x), Y(X)\right] \, . \end{aligned}$$
(36)

Under the linearity assumption, there exists a \(p \times n\) deterministic matrix \(\Lambda (x)\) such that \(M(x)=\Lambda (x) Y(X)\). Thus \(k_{\mathcal {A}}(x,X) = {\alpha _{\mathcal {A}}(x)}^t \Lambda (x) k(X,X)\). As noted in Sect. 3, because of the interpolation condition, \(k_{\mathcal {A}}(X,X)=k(X,X)\) and

$$\begin{aligned} k_{\mathcal {A}}(x,X) k_{\mathcal {A}}(X,X)^{-1} k_{\mathcal {A}}(X,x') = {\alpha _{\mathcal {A}}(x)}^t \Lambda (x) k(X,X) {\Lambda (x')}^t \alpha _{\mathcal {A}}(x') \, . \end{aligned}$$
(37)

Using \(K_M(x,x')={{\,\mathrm{\mathrm {Cov}}\,}}\left[ M(x),M(x')\right] = \Lambda (x) k(X,X) {\Lambda (x')}^t\), we get

$$\begin{aligned} k_{\mathcal {A}}(x,X) k_{\mathcal {A}}(X,X)^{-1} k_{\mathcal {A}}(X,x') = {\alpha _{\mathcal {A}}(x)}^t K_M(x,x') \alpha _{\mathcal {A}}(x') \, . \end{aligned}$$
(38)

Lastly, starting from Eq. (11) and using both Eqs. (33) and (38), we get Eq. (12).

Finally, expanding \({{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x)-M_{\mathcal {A}}(x)\right) \left( Y(x')-M_{\mathcal {A}}(x')\right) \right] \) leads to the right-hand side of Eq. (12), so that

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x)-M_{\mathcal {A}}(x)\right) \left( Y(x')-M_{\mathcal {A}}(x')\right) \right] = c_{\mathcal {A}}(x,x') \, , \end{aligned}$$

and Eq. (13) holds. \(\square \)

Appendix 4. Proofs From Sect. 3.3

Proof of Proposition 5

Consider \(\Delta (x)\) as defined in Eq. (14). From Eq. (36), using both the linear and the interpolation assumptions, we get \(k(x,X)\Delta (x) = \left[ k(x,X) - k_{\mathcal {A}}(x,X) \right] k(X,X)^{-1}\). Inserting this result into Eq. (14), we have

$$\begin{aligned} M_{\mathcal {A}}(x)-M_{\mathrm{full}}(x)= [k_{\mathcal {A}}(x,X)-k(x,X)] k(X,X)^{-1} Y(X) \, , \end{aligned}$$
(39)

and the first equality holds. From Eq. (14), we also get \(v_{\mathcal {A}}(x)-v_{\mathrm{full}}(x)=k(x,X)k(X,X)^{-1}k(X,x)-k_{\mathcal {A}}(x,X)k(X,X)^{-1}k_{\mathcal {A}}(X,x)\), and the second equality holds. Note that under the same assumptions, we can also use \(k_{\mathcal {A}}(X,X)=k(X,X)\) and \(k_{\mathcal {A}}(x,x)=k(x,x)\) and start from \(M_{\mathcal {A}}=k_{\mathcal {A}}(x,X)k_{\mathcal {A}}(X,X)^{-1}Y(X)\) and \(v_{\mathcal {A}}(x)= k_{\mathcal {A}}(x,x)-k_{\mathcal {A}}(x,X)k_{\mathcal {A}}(X,X)^{-1}k_{\mathcal {A}}(X,x)\) to get the same results.

Let us now show Eq. (17). The upper bound comes from the fact that \(M_{\mathcal {A}}(x)\) is the best linear combination of the \(M_k(x)\) for \(k \in {\lbrace {1,\ldots , p}\rbrace }\). The nonnegativity of \(v_{\mathcal {A}}- v_{\mathrm{full}}\) can be proved similarly: \(M_{\mathcal {A}}(x)\) is a linear combination of \(Y(x_k)\), \(k \in {\lbrace {1, \ldots , n}\rbrace }\), whereas \(M_{\mathrm{full}}(x)\) is the best such linear combination. Note that \(v_{\mathcal {A}}(x)-v_{\mathrm{full}}(x) \ge 0\) implies, using Eq. (15), that \({\Vert {}k_{\mathcal {A}}(X,x)\Vert }_K\le {\Vert {}k(X,x)\Vert }_K\). Let us show Eq. (16). We get the result starting from Eq. (39) and applying the Cauchy-Schwarz inequality. The bound on \(v_{\mathcal {A}}(x)-v_{\mathrm{full}}(x)\) derives directly from Eq. (15), using \({\Vert {}k_{\mathcal {A}}(X,x)\Vert }_K\le {\Vert {}k(X,x)\Vert }_K\).

Finally, the classical inequality between \({\Vert {}.\Vert }_K\) and \({\Vert {}.\Vert }\) follows from the diagonalization of k(X,X). One can observe that it depends on n and X, but it does not depend on the prediction point x. \(\square \)
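
As a numerical illustration of the inequalities used in this proof (a sketch with an arbitrary exponential covariance, design and clustering, assuming the nested kriging expressions \(M_{\mathcal {A}}(x) = {k_M(x,x)}^t K_M(x,x)^{-1} M(x)\) and \(v_{\mathcal {A}}(x) = k(x,x) - {k_M(x,x)}^t K_M(x,x)^{-1} k_M(x,x)\)), one can check the ordering \(v_{\mathrm{full}}(x) \le v_{\mathcal {A}}(x) \le v_k(x)\) for every sub-model k.

    import numpy as np

    k = lambda a, b: np.exp(-np.abs(np.asarray(a)[:, None] - np.asarray(b)[None, :]) / 0.4)   # exponential kernel
    X = np.array([0.05, 0.1, 0.22, 0.55, 0.6, 0.81, 0.9])
    clusters = [np.array([0, 3, 6]), np.array([1, 4]), np.array([2, 5])]    # an arbitrary (non-perfect) clustering
    x = np.array([0.47])                                                    # prediction point, not in X

    # Sub-model weights: row i of Lam gives the simple kriging weights of M_i(x) on Y(X).
    Lam = np.zeros((len(clusters), len(X)))
    for i, idx in enumerate(clusters):
        Lam[i, idx] = np.linalg.solve(k(X[idx], X[idx]), k(X[idx], x)).ravel()
    K_M = Lam @ k(X, X) @ Lam.T                   # Cov[M(x), M(x)]
    k_M = (Lam @ k(X, x)).ravel()                 # Cov[M(x), Y(x)]

    v_k = np.array([(k(x, x) - k(x, X[idx]) @ np.linalg.solve(k(X[idx], X[idx]), k(X[idx], x))).item()
                    for idx in clusters])                          # sub-model prediction variances
    v_A = k(x, x).item() - k_M @ np.linalg.solve(K_M, k_M)         # aggregated (nested kriging) variance
    v_full = (k(x, x) - k(x, X) @ np.linalg.solve(k(X, X), k(X, x))).item()
    print(round(v_full, 4), round(float(v_A), 4), np.round(v_k, 4))   # v_full <= v_A <= each v_k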

Proof of Remark 4

Using \({\Vert {}k_{\mathcal {A}}(X,x)\Vert }_K\le {\Vert {}k(X,x)\Vert }_K\), the equivalence of norms and the triangle inequality, and assuming that the smallest eigenvalue \(\lambda _{\min }\) of k(X,X) is nonzero, the bounds of Eq. (16) in Proposition 5 imply that

$$\begin{aligned} \left\{ {\begin{array}{lcl} {\vert {}M_{\mathcal {A}}(x)-M_{\mathrm{full}}(x)\vert } &{} \le &{} \frac{2}{\lambda _{\min }}{\Vert {}k(X,x)\Vert } {\Vert {}Y(X)\Vert } \\ {\vert {}v_{\mathcal {A}}(x)-v_{\mathrm{full}}(x)\vert } &{} \le &{} \frac{1}{\lambda _{\min }}{\Vert {}k(X,x)\Vert }^2 \end{array}}\right. . \end{aligned}$$
(40)

Noting that \({\Vert {}.\Vert }_K\) and \(\lambda _{\min }\) do not depend on x (although they depend on X and n), the result holds. \(\square \)

Proof of Remark 5

As \(\Lambda (x)\) is \(n \times n\) and invertible, we have

$$\begin{aligned} {k_M(x,x)}^t K_M(x,x)^{-1} M(x)&= k(x,X) {\Lambda (x)}^t ( \Lambda (x) k(X,X) { \Lambda (x) }^t )^{-1} \Lambda (x) Y(X) \\&= k(x,X) k(X,X)^{-1} Y(X) = M_{{\mathrm{full}}}(x), \end{aligned}$$

and similarly \(v_{{\mathcal {A}}}(x) = v_{{\mathrm{full}}}(x)\). As \(M_{\mathcal {A}}= M_{\mathrm{full}}\), we have \(Y_{\mathcal {A}}= M_{\mathrm{full}}+ \varepsilon \) where \(\varepsilon \) is an independent copy of \(Y- M_{\mathrm{full}}\). Furthermore \(Y= M_{\mathrm{full}}+ Y- M_{\mathrm{full}}\) where \(M_{\mathrm{full}}\) and \(Y- M_{\mathrm{full}}\) are independent, by Gaussianity, so \(Y_{\mathcal {A}}{\mathop {=}\limits ^{law}} Y\). \(\square \)

Appendix 5. Proofs from Sect. 4

Proof of Proposition 6

Let \((x,v_1,\ldots ,v_r)\) be \(r+1\) pairwise distinct real numbers. If \(x < \min (v_1,\ldots ,v_r)\), then the conditional expectation of Y(x) given \(Y(v_1),\ldots ,Y(v_r)\) is equal to \(\exp ( - | \min (v_1,\ldots ,v_r) - x | / \theta ) Y( \min (v_1,\ldots ,v_r) )\) (Ying 1991). Similarly, if \(x > \max (v_1,\ldots ,v_r)\), then the conditional expectation of Y(x) given \(Y(v_1),\ldots ,Y(v_r)\) is equal to \(\exp ( - | \max (v_1,\ldots ,v_r) - x | / \theta ) Y( \max (v_1,\ldots ,v_r) )\). If \(\min (v_1,\ldots ,v_r)< x < \max (v_1,\ldots ,v_r)\), then the conditional expectation of Y(x) given \(Y(v_1),\ldots ,Y(v_r)\) is equal to \(a Y( x_{<} ) + bY( x_{>} )\), where \(x_{<}\) and \(x_{>}\) are the nearest neighbors of x in \(\{ v_1 , \ldots , v_r \}\) to the left and to the right, and where a, b are nonzero real numbers (Bachoc et al. 2017). Finally, because the covariance matrix of \(Y(v_1),\ldots ,Y(v_r)\) is invertible, two linear combinations \(\sum _{i=1}^r a_i Y(v_i)\) and \(\sum _{i=1}^r b_i Y(v_i)\) are almost surely equal if and only if \((a_1,\ldots ,a_r) = (b_1,\ldots ,b_r)\).
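
This screening property can be checked numerically; the following sketch (illustrative values of \(\theta \), of the points \(v_1,\ldots ,v_r\) and of x, not taken from the article) computes the simple kriging weights and shows that only the nearest neighbors of x receive nonzero weights.

    import numpy as np

    theta = 0.5
    k = lambda a, b: np.exp(-np.abs(np.asarray(a)[:, None] - np.asarray(b)[None, :]) / theta)   # exponential covariance
    v = np.array([0.1, 0.3, 0.45, 0.7, 0.9])
    for x in (0.02, 0.5, 0.97):
        weights = np.linalg.solve(k(v, v), k(v, [x])).ravel()   # simple kriging weights of Y(x) on Y(v_1), ..., Y(v_r)
        print(x, np.round(weights, 6))
    # For x = 0.5, only the weights of 0.45 and 0.7 are nonzero (up to rounding);
    # for x = 0.02 or x = 0.97, only the nearest endpoint receives a nonzero weight.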

Assume that \(X_1,\ldots ,X_p\) is a perfect clustering and let \(x \in \mathbb {R}\). It is known from Rullière et al. (2018) that \(M_{\mathcal {A}}(x) = M_{\mathrm{full}}(x)\) almost surely if \(x \in \{x_1,\ldots ,x_n\}\). Consider now that \(x \not \in \{x_1,\ldots ,x_n\}\).

If \(x < \min (x_1,\ldots ,x_n)\), then for \(i=1,\ldots ,p\), \(M_i(x) = \exp ( - | x_{j_i} - x | / \theta ) Y(x_{j_i})\) with \(x_{j_i} = \min \{ t ; t \in X_i \}\). Let \(i^* \in \{1,\ldots ,p\}\) be so that \(\min (x_1,\ldots ,x_n) \in X_{i^*}\). Then \(M_{\mathrm{full}}(x) = \exp ( - | x_{j_{i^*}} - x | / \theta ) Y(x_{j_{i^*}})\). As a consequence, the linear combination \(\lambda _x^t M(x) \) minimizing \({{\,\mathrm{\mathrm {E}}\,}}[ (\lambda ^t M(x) - Y(x) )^2]\) over \(\lambda \in \mathbb {R}^p\) is given by \(\lambda _x = e_{i^*}\), with \(e_{i^*}\) the \(i^*\)-th basis column vector of \(\mathbb {R}^p\). This implies that \(M_{\mathrm{full}}(x) = M_{{\mathcal {A}}}(x)\) almost surely. Similarly, if \(x > \max (x_1,\ldots ,x_n)\), then \(M_{\mathrm{full}}(x) = M_{{\mathcal {A}}}(x)\) almost surely.

Consider now that there exist \(u \in X_{i}\) and \(v \in X_{j}\) so that \( u< x < v \) and (u, v) does not intersect with \(\{x_1,\ldots ,x_n\}\). If \(i = j\), then \(M_i(x) = M_{{\mathrm{full}}}(x)\) almost surely because the nearest left and right neighbors of x are both in \(X_i\). Hence, also \(M_{{\mathcal {A}}}(x) =M_{\mathrm{full}}(x)\) almost surely in this case. If \(i \ne j\), then \(u = \max \{t ; t \in X_i\}\) and \(v = \min \{t ; t \in X_j\}\) because \(X_1,\ldots ,X_p\) is a perfect clustering. Hence, \(M_i(x) = \exp ( - |x - u| / \theta ) Y(u)\), \(M_j(x) = \exp ( - |x - v| / \theta ) Y(v)\) and \(M_{{\mathrm{full}}}(x) = a Y(u) + bY(v)\) with \(a,b \in \mathbb {R}\). Hence, there exists a linear combination \(\lambda _i M_i(x) + \lambda _j M_j(x)\) that equals \(M_{{\mathrm{full}}}(x)\) almost surely. As a consequence, the linear combination \(\lambda _x^t M(x) \) minimizing \({{\,\mathrm{\mathrm {E}}\,}}[ (\lambda ^t M(x) - Y(x) )^2]\) over \(\lambda \in \mathbb {R}^p\) is given by \(\lambda _x = \lambda _i e_{i} + \lambda _j e_j\), with \(e_i\) and \(e_j\) the i-th and j-th basis column vectors of \(\mathbb {R}^p\). Hence \(M_{\mathrm{full}}(x) = M_{{\mathcal {A}}}(x)\) almost surely. All the possible sub-cases have now been treated, which proves the first implication of the proposition.

Assume now that \(X_1,\ldots ,X_p\) is not a perfect clustering. Then there exists a triplet (u, v, w), with \(u,v \in X_i\) and \(w \in X_j\) for some \(i,j \in \{1,\ldots ,p\}\), \(i \ne j\), and such that \( u< w <v \). Without loss of generality it can further be assumed that there does not exist \(z \in X_i\) satisfying \(u< z <v\).

Let x satisfy \( u< x < w \) and be such that (u, x) does not intersect \(\{ x_1,\ldots ,x_n \}\). Then \(M_{{\mathrm{full}}} (x) = a Y(u) + b Y(z)\) with \(a , b \in \mathbb {R}\backslash \{0\}\) and \(z \in \{ x_1,\ldots ,x_n \}\), \(z \ne v\). Also, \(M_i(x) = \alpha Y(u) + \beta Y(v)\) with \(\alpha , \beta \in \mathbb {R}\backslash \{0\}\). As a consequence, there cannot exist a linear combination \(\lambda ^t M(x)\) with \(\lambda \in \mathbb {R}^p\) so that \(\lambda ^t M(x) = a Y(u) + b Y(z)\). Indeed, a linear combination \(\lambda ^t M(x)\) is a linear combination of \(Y(x_1),\ldots ,Y(x_n)\) where the coefficients for Y(u) and Y(v) are \(\lambda _i \alpha \) and \(\lambda _i \beta \), which are either simultaneously zero or simultaneously nonzero. Hence, \(M_{\mathcal {A}}(x)\) is almost surely not equal to \(M_{\mathrm{full}}(x)\). This concludes the proof. \(\square \)
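
The first implication can also be observed numerically. The sketch below (arbitrary \(\theta \), design and perfect clustering; it relies on the aggregation weights \(\alpha (x) = K_M(x,x)^{-1} k_M(x,x)\) for the optimal linear combination minimized above) checks that the weights induced on Y(X) by \(M_{\mathcal {A}}(x)\) coincide with the full kriging weights.

    import numpy as np

    theta = 0.4
    k = lambda a, b: np.exp(-np.abs(np.asarray(a)[:, None] - np.asarray(b)[None, :]) / theta)
    X = np.array([0.05, 0.1, 0.22, 0.55, 0.6, 0.81, 0.9])
    clusters = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6])]   # a perfect clustering (contiguous groups)
    x = np.array([0.4])                                                    # prediction point, not in X

    Lam = np.zeros((len(clusters), len(X)))                                # row i: weights of M_i(x) on Y(X)
    for i, idx in enumerate(clusters):
        Lam[i, idx] = np.linalg.solve(k(X[idx], X[idx]), k(X[idx], x)).ravel()
    K_M = Lam @ k(X, X) @ Lam.T                        # Cov[M(x), M(x)]
    k_M = (Lam @ k(X, x)).ravel()                      # Cov[M(x), Y(x)]
    alpha = np.linalg.solve(K_M, k_M)                  # optimal aggregation weights of the sub-models
    w_agg = alpha @ Lam                                # induced weights on Y(X)
    w_full = np.linalg.solve(k(X, X), k(X, x)).ravel() # full kriging weights
    print(np.allclose(w_agg, w_full))                  # True: M_A(x) = M_full(x) for this perfect clustering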

Proof of Proposition 7

Let \(x \in \mathbb {R}\setminus X\). For \(i=1, \ldots , p\), \(0< v_i(x) < v_\mathrm{prior}(x)\), so that \(\alpha _i(v_1(x),\ldots ,v_p(x),v_\mathrm{prior}(x)) \in \mathbb {R}\setminus {\lbrace {0}\rbrace }\). Hence, the linear combination \(M_{\mathcal {A}}(x) = \sum _{k=1}^{p} \alpha _{k}(v_1(x),\ldots ,v_{p}(x),v_\mathrm{prior}(x)) M_k(x)\) is a linear combination of \(Y(x_1),\ldots ,Y(x_n)\) with at least p nonzero coefficients (since each \(M_k(x)\) is a linear combination of one or two elements of \(Y(x_1),\ldots ,Y(x_n)\), all these elements being pairwise distinct, see the beginning of the proof of Proposition 6). Hence, because the covariance matrix of \(Y(x_1),\ldots ,Y(x_n)\) is invertible, \(M_{\mathcal {A}}(x)\) almost surely cannot be equal to \(M_{{\mathrm{full}}}(x)\), since \(M_{{\mathrm{full}}}(x)\) is a linear combination of \(Y(x_1),\ldots ,Y(x_n)\) with one or two nonzero coefficients. \(\square \)

Appendix 6. Proofs from Sect. 5

Proof of Proposition 8

Because \(M_{{\mathcal {A}},\eta }(x)\) is the best linear predictor of Y(x), for \(n \in \mathbb {N}\), we have

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{{\mathcal {A}},\eta }(x) \right) ^2 \right] \le {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{\eta ,i_n}(x) \right) ^2 \right] . \end{aligned}$$
(41)

Let \(\epsilon >0\). Let \(N_n\) be the number of points in \(X_{i_n}\) that are at Euclidean distance less than \(\epsilon \) from x. By assumption, \(N_n \rightarrow \infty \) as \(n \rightarrow \infty \). Let us write these points as \(x_{nj_1},\ldots ,x_{nj_{N_n}}\), with corresponding measurement errors \(\xi _{j_1},\ldots ,\xi _{j_{N_n}}\). Since \(M_{\eta ,i_n}(x)\) is the best linear unbiased predictor of Y(x) from the elements of \(Y(x_{n j_1}) + \xi _{j_1} , \ldots ,Y(x_{n j_{N_n}}) + \xi _{j_{N_n}} \), we have

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{\eta ,i_n}(x) \right) ^2 \right] \le {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - \frac{1}{N_n} \sum _{a=1}^{N_n} ( Y(x_{nj_a}) + \xi _{j_a} ) \right) ^2 \right] . \end{aligned}$$
(42)

By independence of Y and \(\xi _X\), we obtain

$$\begin{aligned}&{{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - \frac{1}{N_n} \sum _{a=1}^{N_n} ( Y(x_{nj_a}) + \xi _{j_a} ) \right) ^2 \right] = {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( \frac{1}{N_n} \sum _{a=1}^{N_n} (Y(x) - Y(x_{nj_a}) ) \right) ^2 \right] \\&\qquad + {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( \frac{1}{N_n} \sum _{a=1}^{N_n} \xi _{j_a} \right) ^2 \right] \le \left( \max _{a=1,\ldots ,N_n} {{\,\mathrm{\mathrm {E}}\,}}\left[ (Y(x) - Y(x_{nj_a}) )^2 \right] \right) + \frac{\sum _{a=1}^{N_n} \eta _a}{N_n^2}. \end{aligned}$$

The above inequality follows from Cauchy-Schwarz, the fact that Y has mean zero and the independence of \(\xi _{j_1},\ldots ,\xi _{j_{N_n}}\). We then obtain, since \((\eta _a)_{a \in \mathbb {N}}\) is bounded,

$$\begin{aligned} \limsup _{n \rightarrow \infty }&{{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - \frac{1}{N_n} \sum _{a=1}^{N_n} ( Y(x_{nj_a}) + \xi _{j_a} ) \right) ^2 \right] \le \sup _{\begin{array}{c} u \in D \\ ||u - x|| \le \epsilon \end{array} } {{\,\mathrm{\mathrm {E}}\,}}\left[ (Y(x) - Y(u) )^2 \right] \\&= \sup _{\begin{array}{c} u \in D \\ ||u - x|| \le \epsilon \end{array} } \left( k(x,x) + k(u,u) - 2 k(x,u) \right) . \end{aligned}$$

From Eqs. (41) and (42), we have, for any \(\epsilon >0\)

$$\begin{aligned} \limsup _{n \rightarrow \infty } {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( Y(x) - M_{{\mathcal {A}},\eta }(x) \right) ^2 \right] \le \sup _{\begin{array}{c} u \in D \\ ||u - x|| \le \epsilon \end{array} } \left( k(x,x) + k(u,u) - 2 k(x,u) \right) . \end{aligned}$$
(43)

The above display tends to zero as \(\epsilon \rightarrow 0\) because k is continuous. Hence the \(\limsup \) in Eq. (43) is zero, which concludes the proof. \(\square \)
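
For a concrete feeling of the two bounds combined above (illustration only; the squared exponential covariance, \(\epsilon \) and the noise variances \(\eta _a\) are arbitrary choices), the following sketch evaluates the exact mean square error of the simple average of N noisy observations located within distance \(\epsilon \) of x, together with the bound \(\max _{a} {{\,\mathrm{\mathrm {E}}\,}}\left[ (Y(x) - Y(x_{nj_a}) )^2 \right] + \sum _{a} \eta _{a} / N^2\).

    import numpy as np

    k = lambda a, b: np.exp(-0.5 * ((np.asarray(a)[:, None] - np.asarray(b)[None, :]) / 0.3) ** 2)
    rng = np.random.default_rng(1)
    x, eps = 0.5, 0.05
    for N in (5, 50, 500):
        xa = x + rng.uniform(-eps, eps, size=N)        # N observation points within distance eps of x
        eta = rng.uniform(0.0, 1.0, size=N)            # bounded measurement-error variances
        # Exact mean square error of the simple average (1/N) sum_a (Y(x_a) + xi_a) as a predictor of Y(x).
        mse = (k([x], [x]) - 2.0 * k([x], xa).mean() + k(xa, xa).mean()).item() + eta.sum() / N ** 2
        # Bound used in the proof: max_a E[(Y(x) - Y(x_a))^2] + (sum_a eta_a) / N^2.
        bound = (k([x], [x]) + np.diag(k(xa, xa))[None, :] - 2.0 * k([x], xa)).max() + eta.sum() / N ** 2
        print(N, round(mse, 6), round(float(bound), 6))    # mse <= bound; the noise contribution vanishes as N grows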

Proof of Lemma 1

Let \(\epsilon >0\). For \(n \in \mathbb {N}\), let \(N_n\) be the number of points in \(\{x_1,\ldots ,x_n \}\) that are at Euclidean distance less than \(\epsilon \) to x. Because x is in the interior of D and because \(g >0\) on D, we have \(p_{\epsilon } = {{\,\mathrm{\mathrm {P}}\,}}( ||x_1 - x || \le \epsilon ) >0\). Hence from the law of large numbers, almost surely, for large enough n, \(N_n \ge (p_{\epsilon }/2) n \). For each \(n \in \mathbb {N}\), the \(N_n\) points in \(\{x_1,\ldots ,x_n \}\) that are at Euclidean distance less than \(\epsilon \) to x are partitioned into \(p_n\) classes. Hence, one of these classes, say the class \(X_{i_n}\), contains a number of points larger than or equal to \( N_n / p_n \). Since \(n / p_n\) tends to infinity by assumption, we conclude that the number of points in \(X_{i_n}\) at distance less than \(\epsilon \) from x almost surely tends to infinity. This concludes the proof. \(\square \)

Proof of Proposition 9

The proof is based on the same construction of the triangular array of observation points and of the sequence of partitions as in the proof of Proposition 1. We take x to play the role of \(x_0\) in that proof. Only a few additional comments are needed.

We let \(V(\delta )\) be as in the proof of Proposition 1 and we note that for any \(\delta >0\), for any \(r \in \mathbb {N}\), for any Gaussian vector \((U_1,\ldots ,U_{r})\) independent of Y and for any \(u_0,u_1,\ldots ,u_r \in D\) with \(||u_i - u_0|| \ge \delta \) for \(i=1,\ldots ,r\), we have

$$\begin{aligned}&{{\,\mathrm{\mathrm {Var}}\,}}\left[ Y(u_0) | Y(u_1) + U_{1},\ldots ,Y(u_r) + U_{r} \right] \ge {{\,\mathrm{\mathrm {Var}}\,}}\left[ Y(u_0) | Y(u_1),U_{1},\ldots ,Y(u_r) , U_{r}\right] \\&\quad = {{\,\mathrm{\mathrm {Var}}\,}}\left[ Y(u_0) | Y(u_1) ,\ldots ,Y(u_r) \right] \ge V(\delta ). \end{aligned}$$

We also note that the triangular array and sequence of partitions of the proof of Proposition 1 do satisfy the condition of Proposition 8. Indeed, the first component \(X_1\) of the partition, with cardinality \(C_n \rightarrow \infty \), is dense in D.

We note that for \(k=k_n+1,\ldots ,p_n\) (with the notations of the proof of Proposition 1), for any component of \(X_k\), of the form \(x_{nb}\) with \(b \in \{ 1 , \ldots ,n\}\), we have \(v_k(x) \le {{\,\mathrm{\mathrm {Var}}\,}}\left[ Y(x) | Y(x_{nb}) + \xi _{b} \right] \le k(x,x) - k(x,x_{nb})^2 / (k(x_{nb},x_{nb}) + \eta _{b})\). Hence, because \((\eta _i)_{i \in \mathbb {N}}\) is bounded, there is a fixed \(\epsilon '_2 >0\) such that for \(k=k_n+1,\ldots ,p_n\), \(\epsilon _1 \le v_k(x) \le k(x,x) - \epsilon '_2\), with \(\epsilon _1\) as in the proof of Proposition 1.

With these comments, the arguments of the proof of Proposition 1 lead to the conclusion of Proposition 9. \(\square \)

Proof of Proposition 10

We can see that \(M_{\text {UK},i}(x) = w_i(x)^t Z(X_i) \) for \(i=1,\ldots ,p\). Hence, for \(i,j=1,\ldots ,p\)

$$\begin{aligned}&{{\,\mathrm{\mathrm {Cov}}\,}}\left[ M_{\text {UK},i}(x) , M_{\text {UK},j}(x) \right] = w_i(x)^t {{\,\mathrm{\mathrm {Cov}}\,}}\left[ Z(X_i) , Z(X_j) \right] w_j(x) \\&\quad = w_i(x)^t k(X_i,X_j) w_j(x). \end{aligned}$$

Hence, \({{\,\mathrm{\mathrm {Cov}}\,}}\left[ M_{\text {UK}}(x) \right] = K_{\text {UK},M}(x,x) \). Furthermore, for \(i = 1 , \ldots ,p\)

$$\begin{aligned} {{\,\mathrm{\mathrm {Cov}}\,}}\left[ M_{\text {UK},i}(x) , Z(x) \right] = w_i(x)^t {{\,\mathrm{\mathrm {Cov}}\,}}\left[ Z(X_i) , Z(x) \right] = w_i(x)^t k(X_i,x). \end{aligned}$$

Hence, \({{\,\mathrm{\mathrm {Cov}}\,}}\left[ M_{\text {UK}}(x) , Z(x) \right] = k_{\text {UK},M}(x,x) \). Let

$$\begin{aligned} \alpha (x) = \underset{ \begin{array}{c} \gamma \in \mathbb {R}^p \\ {{\,\mathrm{\mathrm {E}}\,}}[ \gamma ^t M_{\text {UK}}(x) ] = {{\,\mathrm{\mathrm {E}}\,}}[ Z(x) ] \\ \hbox {for any value of}\, \beta \,\hbox {in} \,\mathbb {R}^m \end{array} }{ \mathrm {argmin} } {{\,\mathrm{\mathrm {E}}\,}}\left[ \left( \gamma ^t M_{\text {UK}}(x) - Z(x) \right) ^2 \right] . \end{aligned}$$
(44)

Since \({{\,\mathrm{\mathrm {E}}\,}}[ M_{\text {UK},i}(x) ] = {{\,\mathrm{\mathrm {E}}\,}}[ Z(x) ]\) for \(i=1,\ldots ,p\) and for any value of \(\beta \in \mathbb {R}^m\), the constraint in Eq. (44) can be written as \(\gamma ^t \mathrm {1}_p{{\,\mathrm{\mathrm {E}}\,}}[ Z(x)] = {{\,\mathrm{\mathrm {E}}\,}}[ Z(x) ] \); that is, \(\gamma ^t \mathrm {1}_p= 1\). The mean square prediction error in Eq. (44) can be written as

$$\begin{aligned} k(x,x) + \gamma ^t K_{\text {UK},M}(x,x) \gamma - 2 \gamma ^t k_{\text {UK},M}(x,x). \end{aligned}$$

Thus Eq. (44) becomes

$$\begin{aligned} \alpha (x) = \underset{ \begin{array}{c} \gamma \in \mathbb {R}^p \\ \gamma ^t \mathrm {1}_p= 1 \end{array} }{ \mathrm {argmin} } \left( k(x,x) + \gamma ^t K_{\text {UK},M}(x,x) \gamma - 2 \gamma ^t k_{\text {UK},M}(x,x) \right) . \end{aligned}$$

We recognize the optimization problem of ordinary kriging, which corresponds to universal kriging with an unknown constant mean function (Sacks et al. 1989; Chilès and Delfiner 2012). Hence, we have

$$\begin{aligned} \alpha (x)^t M_{\text {UK}}(x)= & {} {\hat{m}}_{\text {UK},M}(x) + k_{\text {UK},M}(x,x)^t K_{\text {UK},M}(x,x)^{-1}\\&\times \left( M_{\text {UK}}(x) - {\hat{m}}_{\text {UK},M}(x) \mathrm {1}_p\right) , \end{aligned}$$

as given, for instance, in Sacks et al. (1989) and Chilès and Delfiner (2012). Hence we have \(\alpha (x)^t M_{\text {UK}}(x) = M_{{\mathcal {A}},\text {UK}}(x)\), the best linear predictor described in Proposition 10.
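
The identification with ordinary kriging can be verified on arbitrary inputs: the sketch below (random symmetric positive definite matrix standing in for \(K_{\text {UK},M}(x,x)\), random vectors standing in for \(k_{\text {UK},M}(x,x)\) and \(M_{\text {UK}}(x)\); \({\hat{m}}_{\text {UK},M}(x)\) is taken as the generalized least squares estimate of the constant mean, which is an assumption on its definition) solves the constrained problem with a Lagrange multiplier and recovers the ordinary kriging form above.

    import numpy as np

    rng = np.random.default_rng(0)
    p = 4
    A = rng.normal(size=(p, p))
    K = A @ A.T + np.eye(p)                 # stands in for K_UK,M(x, x) (symmetric positive definite)
    kv = rng.normal(size=p)                 # stands in for k_UK,M(x, x)
    M = rng.normal(size=p)                  # stands in for the sub-model predictions M_UK(x)
    ones = np.ones(p)

    Ki_k = np.linalg.solve(K, kv)
    Ki_1 = np.linalg.solve(K, ones)
    mu = (ones @ Ki_k - 1.0) / (ones @ Ki_1)        # Lagrange multiplier for the constraint gamma^t 1_p = 1
    gamma = Ki_k - mu * Ki_1                        # minimizer of gamma^t K gamma - 2 gamma^t k under gamma^t 1_p = 1
    m_hat = (ones @ np.linalg.solve(K, M)) / (ones @ Ki_1)       # assumed GLS estimate of the constant mean
    ok_pred = m_hat + kv @ np.linalg.solve(K, M - m_hat * ones)  # ordinary kriging form from the display above
    print(np.isclose(gamma @ M, ok_pred), np.isclose(gamma @ ones, 1.0))   # True True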

We can see that \(M_{{\mathcal {A}},\text {UK}}(x) = \alpha _{{\mathcal {A}},\text {UK}}(x)^t M_{\text {UK}}(x) \) and that \(\alpha _{{\mathcal {A}},\text {UK}}(x) = \alpha (x)\). Then, since \({{\,\mathrm{\mathrm {E}}\,}}[ \alpha _{{\mathcal {A}},\text {UK}}(x)^t M_{\text {UK}}(x) ] = {{\,\mathrm{\mathrm {E}}\,}}[ Z(x) ]\), from \({{\,\mathrm{\mathrm {Cov}}\,}}\left[ M_{\text {UK}}(x) \right] = K_{\text {UK},M}(x,x) \) and from \({{\,\mathrm{\mathrm {Cov}}\,}}\left[ M_{\text {UK}}(x) , Z(x) \right] = k_{\text {UK},M}(x,x) \), we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}&\left[ \left( M_{{\mathcal {A}},\text {UK}}(x) - Z(x) \right) ^2 \right] = k(x,x) + \alpha _{{\mathcal {A}},\text {UK}}(x)^t K_{\text {UK},M}(x,x) \alpha _{{\mathcal {A}},\text {UK}}(x) \\&\quad - 2 \alpha _{{\mathcal {A}},\text {UK}}(x)^t k_{\text {UK},M}(x,x). \end{aligned}$$

This concludes the proof. \(\square \)


Cite this article

Bachoc, F., Durrande, N., Rullière, D. et al. Properties and Comparison of Some Kriging Sub-model Aggregation Methods. Math Geosci 54, 941–977 (2022). https://doi.org/10.1007/s11004-021-09986-2
