1 Introduction

Spatial prediction is required in many applications, and examples can be found in natural resource mapping, meteorology and image analysis. Consider a regionalized variable \(\left\{ r(x); x \in \mathbf D \subset \mathfrak {R}^{m}\right\} \) where \(r(x) \in \mathfrak {R}^{1}\) is the variable of interest and x is a reference variable in the domain \(\mathbf D \). The challenge is to predict the regionalized variable from a set of observations \(\mathbf r _{o}=[r(x_{1}),\ldots ,r(x_{n})]\); \(x_{1},\ldots ,x_{n}\in \mathbf D \). In the current study, the predictors are defined in a probabilistic setting and associated predictor uncertainties can also be obtained.

The classical probabilistic approach to spatial prediction is kriging (Journel and Huijbregts 1978; Chiles and Delfiner 2009). The traditional ordinary kriging predictor is based on a stationary model for the regionalized variable, with spatially constant expectation and variance, and a translation invariant spatial correlation function. Localized predictors, such as local neighborhood kriging (Chiles and Delfiner 2009), can be defined to make the predictor more robust to deviations from the stationarity assumptions. The major challenge in using localized predictors is to define the size of the local neighborhood, where a bias/variance trade-off must be made. For some spatial correlation structures, a screening effect is provided by the observations closest to the prediction location (Stein 2002), and this effect may robustify the localization. A weighted localized approach has also been defined (Anderes and Stein 2011) and demonstrated to be useful for non-stationary random fields.

Recent developments in computer and sensor technology have provided enormous spatio-temporal sets of observations (Johns et al. 2003). Computational demands in spatial prediction have been a critical factor and considerable research is devoted to numerical algorithms for large sparse correlation matrices. Localized predictors may provide an alternative solution to reduce these computational demands.

In the current study, spatial predictors which are robust with respect to deviations from assumptions about spatial stationarity are defined. These predictors are based on a Gaussian random field with spatially varying expectation and variance. The spatial correlation function is shift invariant and known. The expectation and variance are assessed locally by a sliding window approach. In traditional local neighborhood kriging, this assessment is done by some maximum likelihood procedure. The new feature of the study is that these local assessments are done by shrinkage estimators in an empirical Bayes setting (Efron and Morris 1973). The hierarchical, Gaussian random field model (Røislien and Omre 2006) is used locally and the hyper-parameters are assessed from the global set of observations. Since the predictor is locally defined it is extremely computationally efficient. The resulting spatial predictor is termed localized/shrinkage kriging.

In the next two sections, various Gaussian random field models for the regionalized variable \(\left\{ r(x); x \in \mathbf D \right\} \) are presented with associated model parameter estimators. Thereafter, five spatial predictors are defined, one global and four localized. Two of these predictors feature shrinkage. The subsequent section contains an empirical evaluation of these predictors. The evaluation criteria used in the comparison are defined, and the five predictors are compared on a real set of annual accumulated precipitation data and in a synthetic simulation study. In the last section, conclusions are presented. The paper summarizes the major findings of an extended study (Asfaw 2014).

2 Predictor Models

The spatial predictors are based on probabilistic models for the regionalized variable \(\left\{ r(x);x\in \mathbf D \right\} \). In traditional kriging prediction \(\left\{ r(x);x\in \mathbf D \right\} \) is associated with a stationary Gaussian random field (Chiles and Delfiner 2009). This assumption entails that

$$\begin{aligned} E[r(x)]= & {} \mu ;\quad \forall x\in \mathbf D \nonumber \\ \mathrm{Var}[r(x)]= & {} \sigma ^{2} ;\quad \forall x\in \mathbf D \nonumber \\ \mathrm{Corr}[r(x^{'}), r(x^{''})]= & {} \rho (x^{'}- x^{''});\quad \forall x^{'}, x^{''}\in \mathbf D \end{aligned}$$
(1)

where \(\mu \in \mathfrak {R}^{1}\) is the expected level, \(\sigma ^{2} \in \mathfrak {R}^{1}_{+}\) is the variance level, and the spatial correlation function \(\rho (x^{'}- x^{''})\) is positive definite. Note that for these model assumptions, the random field is shift invariant, and this property is extensively used to make inference about the model parameters \([\mu , \sigma ^{2}, \rho (.)]\) from the set of observations \(\mathbf r _{o}\). This traditional kriging model may be extended to have an expectation surface \(\mu (x)=\sum _{l=1}^{L}\alpha _{l}g_{l}(x)\) where \(\left\{ g_{l}(x); x\in \mathbf D \right\} \); \(l=1,\ldots ,L\) are known basis surfaces while \(\alpha =(\alpha _{1},\ldots ,\alpha _{L})\) are unknown coefficients. This model corresponds to a spatial regression model, and the shift invariance property is lost, which complicates model parameter inference. Note that for a given correlation function \(\rho (.)\), maximum likelihood estimates based on \(\mathbf r _{o}\) are available in closed form for the other model parameters, under both these model assumptions.

In the current study, it is assumed that \(\left\{ r(x); x\in \mathbf D \right\} \) is associated with a general Gaussian random field, which entails that

$$\begin{aligned} E[r(x)]= & {} \mu (x);\quad \forall x\in \mathbf D \nonumber \\ \mathrm{Var}[r(x)]= & {} \sigma ^{2}(x) ;\quad \forall x\in \mathbf D \nonumber \\ \mathrm{Corr}[r(x^{'}), r(x^{''})]= & {} \rho (x^{'}, x^{''});\quad \forall x^{'}, x^{''}\in \mathbf D \end{aligned}$$
(2)

with \(\mu (x) \in \mathfrak {R}^{1}\), \(\sigma ^{2}(x) \in \mathfrak {R}^{1}_{+}\) and \(\rho (x^{'}, x^{''})\) being positive definite. There is obviously a lack of translation invariance under these assumptions. One can expect that inference of the spatial model parameters \(\{\mu (x); x\in \mathbf D \}\), \(\{\sigma ^{2}(x); x\in \mathbf D \}\) and \(\{ \rho (x^{'}, x^{''}); \forall x^{'}, x^{''}\in \mathbf D \}\) based on \(\mathbf r _{o}\) is complicated. If the correlation function \(\rho (.,.)\) is fixed, one may use localized estimators in a kernel spirit to assess the spatial model parameters \(\mu (.)\) and \(\sigma ^{2}(.)\). When selecting the size of the localization, a bias/variance trade-off must be made: large local neighborhoods introduce bias in the estimators due to smoothing, while small neighborhoods introduce instability in the estimators due to censoring of observations. In the current study, this bias/variance trade-off is addressed by defining shrinkage estimators in an empirical Bayes setting.

To define the shrinkage estimators, a stationary, hierarchical Gaussian random field model is introduced for a local neighbourhood \(\mathbf D _{+}\) around an arbitrary \(x_{+}\in \mathbf D \), i.e., for \(\{r(x); x\in \mathbf D _{+}\}\). In this model the expected level m and the variance level \(s^{2}\) are considered to be random variables. Moreover, \(\{[r(x){\mid } m,s^{2} ]; x\in \mathbf D _{+} \}\) is defined to be a stationary Gaussian random field, hence

$$\begin{aligned} E[r(x){\mid } m, s^{2}]= & {} m;\quad \forall x\in \mathbf D _{+}\nonumber \\ \mathrm{Var}[r(x){\mid } m, s^{2}]= & {} s^{2} ;\quad \forall x\in \mathbf D _{+}\nonumber \\ \mathrm{Corr}[r(x^{'}), r(x^{''}){\mid } m, s^{2}]= & {} \rho (x^{'}- x^{''});\quad \forall x^{'}, x^{''}\in \mathbf D _{+}. \end{aligned}$$
(3)

It is assumed that \(\rho (.)\) is known, and that the model parameters \([m {\mid } s^{2}]\) and \(s^{2}\) have prior models which are Gaussian and Inverse Gamma respectively, with

$$\begin{aligned} E[m {\mid } s^{2}]= & {} \mu _{m}\nonumber \\ \mathrm{Var}[m {\mid } s^{2}]= & {} \tau _{m}s^{2} \nonumber \\ E[s^{2}]= & {} [\xi _{s}-1]^{-1} \gamma _{s}\nonumber \\ \mathrm{Var}[s^{2}]= & {} [(\xi _{s}-1)^{2}(\xi _{s}-2)]^{-1}\gamma _{s}^{2} \end{aligned}$$
(4)

where the model parameters are \(\mu _{m} \in \mathfrak {R}^{1}\), \(\tau _{m} \in \mathfrak {R}^{1}_{+}\), \(\xi _{s} \in 2+\mathfrak {R}^{1}_{+}\) and \(\gamma _{s} \in \mathfrak {R}^{1}_{+}\). The prior model on \([m , s^{2}]\) is a conjugate model for the stationary Gaussian random field, and the marginal random field \(\left\{ r(x); x\in \mathbf D _{+}\right\} \) will be a t-distributed random field and analytically tractable (Røislien and Omre 2006).

The estimators for the model parameters \(\mu (x_{+})\) and \(\sigma ^{2}(x_{+})\) at an arbitrary location \(x_{+}\in \mathbf D \) are localized to observations in \(\mathbf D _{+}\) and with shrinkage according to the localized hierarchical Gaussian model. These estimators will of course depend on the parameters of the prior model \([\mu _{m}, \tau _{m}, \xi _{s}, \gamma _{s}]\), which are assessed in an empirical Bayesian spirit from a set of local neighborhoods \(\mathbf D _{+}\) covering \(\mathbf D \).

Recent research on non-stationary Gaussian random fields (Higdon 1998) is usually based on models of the following form

$$\begin{aligned} E[r(x)]= & {} \mu ;\quad \forall x \in \mathbf D \\ \mathrm{Cov}[r(x^{'}), r(x^{''})]= & {} \kappa (x^{'}, x^{''}) ;\quad \forall x^{'}, x^{''} \in \mathbf D \end{aligned}$$

with stationary, shift invariant expectation \(\mu \) and non-stationary spatial covariance function \(\kappa (.,.)\). The latter must be a positive definite function which complicates the model parametrization. The major challenge is, however, to make inference of the model parameters based on one set of observations from one realization of the random field. The current model defined in Eq. (2) can be cast in the framework above as

$$\begin{aligned} E[r(x)]= & {} \mu ;\quad \forall x \in \mathbf D \\ \mathrm{Cov}[r(x^{'}), r(x^{''})]= & {} \kappa (x^{'}, x^{''}) \quad \\= & {} \sigma (x^{'})\sigma (x^{''})\rho (x^{'}, x^{''}) + (\mu (x^{'})-\mu )(\mu (x^{''})-\mu )\quad \forall x^{'}, x^{''} \in \mathbf D \end{aligned}$$

with global centering value \(\mu \) and gross covariance function \(\kappa (.,.)\). The latter will be a positive definite function whenever \(\rho (.,.)\) is a positive definite function, for arbitrary functions \(\mu (.) \in \mathfrak {R}^{1}\) and \(\sigma (.) \in \mathfrak {R}^{1}_{+}\) on \(\mathbf D \). Note, in particular that for stationary, translation invariant \(\rho (x^{'}, x^{''}) = \rho (x^{'} - x^{''})\) the corresponding \(\kappa (.,.)\) will still be non-stationary. One may consider the model in Eq. (2), to be a flexible parametrization of \(\kappa (.,.)\) in a non-stationary Gaussian random field. In the current study, localized, robust estimators for \(\mu (x)\) and \(\sigma (x)\) at arbitrary \(x \in \mathbf D \) are presented, given a stationary correlation function \(\rho (.)\), based on one set of observations from one realization. Consequently, inference of the non-stationary \(\kappa (.,.)\) under the parametrization discussed above can be made.
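The positive definiteness claim can be checked directly. As a brief sketch, using illustrative notation that is introduced only here (arbitrary locations \(x_{1},\ldots ,x_{p}\in \mathbf D \) and real weights \(a_{1},\ldots ,a_{p}\)), the quadratic form of \(\kappa (.,.)\) splits into two non-negative terms

$$\begin{aligned} \sum _{i=1}^{p}\sum _{j=1}^{p}a_{i}a_{j}\kappa (x_{i}, x_{j})= & {} \sum _{i=1}^{p}\sum _{j=1}^{p}[a_{i}\sigma (x_{i})][a_{j}\sigma (x_{j})]\rho (x_{i}, x_{j}) + \left[ \sum _{i=1}^{p}a_{i}(\mu (x_{i})-\mu )\right] ^{2} \ge 0. \end{aligned}$$

The first term is non-negative because \(\rho (.,.)\) is positive definite and is evaluated with the rescaled weights \(a_{i}\sigma (x_{i})\), while the second term is a square.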

These developments are also valid for non-stationary \(\rho (.,.)\), but this extension introduces new challenges in assessing \(\rho (.,.)\). These challenges are not addressed in the current study.

3 Inference of Model Parameters

To use the probabilistic models for \(\left\{ r(x); x\in \mathbf D \right\} \) in spatial prediction, the model parameters must be assessed from the set of observations represented by the n-vector \(\mathbf r _{o}=[r(x_{1}),\ldots ,r(x_{n})]^{T}=[r_{1},\ldots ,r_{n}]^{T}\).

Throughout the study, the spatial correlation function \(\{\rho (x^{'}, x^{''})=\rho (x^{'} - x^{''}); \forall x^{'}, x^{''}\in \mathbf D \}\) is assumed to be known, with associated inter-observation correlation \([n\times n]\)-matrix \(\Omega _{oo}\). This correlation model must of course be inferred in one way or the other, but in the current study the uncertainty related to this is not accounted for. The other model parameters will be estimated conditional on this correlation function, and hence capture the remaining spatial structure in the observations.

For a stationary Gaussian random field, given the correlation function, the model parameters can be assessed by a maximum likelihood estimator

$$\begin{aligned} \hat{\mu }= & {} [\mathbf i _{n}^{T}\Omega _{oo}^{-1}{} \mathbf i _{n}]^{-1}[\mathbf i _{n}^{T}\Omega _{oo}^{-1}{} \mathbf r _{o}]\nonumber \\ \hat{\sigma }^{2}= & {} \frac{1}{n}[\mathbf r _{o}-\hat{\mu }{} \mathbf i _{n}]^{T}\Omega _{oo}^{-1}[\mathbf r _{o}-\hat{\mu }{} \mathbf i _{n}] \end{aligned}$$
(5)

where \(\mathbf i _{n}\) is the n-vector \([1,\ldots ,1]^{T}\).
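The estimators in Eq. (5) can be illustrated numerically. The following is a minimal sketch only, assuming one-dimensional locations, a hypothetical exponential correlation function `exp_corr` standing in for the known \(\rho (.)\), and a small pure-Python solver `solve` in place of a linear algebra library; none of these names come from the study.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def exp_corr(h, length=2.0):
    """Hypothetical stand-in for the known correlation function rho(.)."""
    return math.exp(-abs(h) / length)


def ml_estimates(xs, r, rho=exp_corr):
    """Eq. (5): ML estimates of mu and sigma^2 for a given rho (1-D locations xs)."""
    n = len(r)
    omega = [[rho(xi - xj) for xj in xs] for xi in xs]   # Omega_oo
    w = solve(omega, [1.0] * n)                          # Omega_oo^{-1} i_n
    mu = sum(wi * ri for wi, ri in zip(w, r)) / sum(w)   # Omega_oo is symmetric
    resid = [ri - mu for ri in r]
    q = solve(omega, resid)                              # Omega_oo^{-1} (r_o - mu i_n)
    sigma2 = sum(e * qi for e, qi in zip(resid, q)) / n
    return mu, sigma2


mu_hat, sigma2_hat = ml_estimates([0.0, 1.0, 2.5, 4.0], [1.2, 0.8, 1.9, 2.4])
print(mu_hat, sigma2_hat)
```

With uncorrelated observations, i.e. \(\Omega _{oo}\) equal to the identity matrix, the two estimators reduce to the arithmetic mean and the average squared deviation from it.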

For a general Gaussian random field, given the correlation function, the assessments of the spatial model parameters \(\left\{ \mu (x); x \in \mathbf D \right\} \) and \(\left\{ \sigma ^{2}(x); x \in \mathbf D \right\} \), are more complicated. To define localized estimators, consider an arbitrary location \(x_{+} \in \mathbf D \) and parameterize the localization by the k observations in \(\mathbf r _{o}\) located closest to \(x_{+}\), and denote this as k-localization. Define a binary-selection \([k \times n]\)-matrix \(G_{+}^{k}\) such that \(G_{+}^{k}{} \mathbf r _{o}\) is a k-vector containing the k observations in the k-localization of \(x_{+}\). Note that \(G_{+}^{k}\) can also be extended to account for favorable configurations of observations around \(x_{+}\). The k-localized maximum likelihood estimators for the model parameters are

$$\begin{aligned} \hat{\mu }_{+}^{k}= & {} \hat{\mu }^{k}(x_{+})=[\mathbf i _{k}^{T}[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}G_{+}^{k}{} \mathbf r _{o}][\mathbf i _{k}^{T}[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}{} \mathbf i _{k}]^{-1}\nonumber \\ \hat{\sigma }_{+}^{k2}= & {} \hat{\sigma }^{k2}(x_{+})=\frac{1}{k}[G_{+}^{k}{} \mathbf r _{o}-\hat{\mu }_{+}^{k}{} \mathbf i _{k}]^{T}[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}(G_{+}^{k}{} \mathbf r _{o}-\hat{\mu }_{+}^{k}{} \mathbf i _{k}). \end{aligned}$$
(6)
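The k-localization in Eq. (6) amounts to applying the ML estimators to the k nearest observations only. A minimal sketch follows, assuming one-dimensional locations, a hypothetical exponential correlation function `exp_corr`, and a small pure-Python solver `solve`; the selection matrix \(G_{+}^{k}\) is represented by a list of selected indices rather than an explicit binary matrix.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def exp_corr(h, length=2.0):
    """Hypothetical stand-in for the known correlation function rho(.)."""
    return math.exp(-abs(h) / length)


def k_localized_ml(xs, r, x_plus, k, rho=exp_corr):
    """Eq. (6): ML estimates of mu and sigma^2 from the k observations nearest x_plus."""
    # Indices picked out by the binary selection matrix G_+^k.
    idx = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_plus))[:k]
    xk = [xs[i] for i in idx]
    rk = [r[i] for i in idx]                        # G_+^k r_o
    c = [[rho(a - b) for b in xk] for a in xk]      # G_+^k Omega_oo [G_+^k]^T
    w = solve(c, [1.0] * k)
    mu = sum(wi * ri for wi, ri in zip(w, rk)) / sum(w)
    resid = [ri - mu for ri in rk]
    q = solve(c, resid)
    sigma2 = sum(e * qi for e, qi in zip(resid, q)) / k
    return mu, sigma2, idx


xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
r = [1.1, 0.9, 1.3, 5.2, 4.8, 5.5]
print(k_localized_ml(xs, r, x_plus=0.5, k=3))
print(k_localized_ml(xs, r, x_plus=11.0, k=3))
```

In this toy configuration the two prediction locations use disjoint neighborhoods, so the local level estimates differ markedly, which is the behaviour the general model in Eq. (2) is meant to capture.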

When \(x_{+}\) coincides in turn with each of the observation locations \([x_{1},\ldots , x_{n}]\), this produces estimators for the observation expectation n-vector and the diagonal standard deviation \([n \times n]\)-matrix

$$\begin{aligned}&\hat{\varvec{\mu }}_{o}^{k}=[ \hat{\mu }_{1}^{k}, \ldots , \hat{\mu }_{n}^{k} ]^{T}\nonumber \\&\quad \hat{\Gamma }_{o}^{k}=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \hat{\sigma }_{1}^{k} &{} \ldots &{} 0\\ \vdots &{} \ddots &{} \vdots \\ 0 &{}\ldots &{} \hat{\sigma }_{n}^{k} \end{array} \right] . \end{aligned}$$
(7)

For a stationary, hierarchical Gaussian random field, with given correlation function, inference of the model parameters of the prior model \([\mu _{m}, \tau _{m}, \xi _{s}, \gamma _{s}]\) is required. This assessment is made in an empirical Bayes setting, by considering the localized estimates in the observation locations, \([\hat{\mu }_{i}^{k}, \hat{\sigma }_{i}^{k}]\); \(i=1,\ldots ,n\), to be a super-population of the k-localized estimate in an arbitrary location. Moment estimators are used, based on the moment expressions defined in Eq. (4). Then natural estimators for the parameters of the prior model are dependent on k, and given by

$$\begin{aligned} \hat{\mu }_{m}^{k}= & {} \frac{1}{n}{} \mathbf i _{n}^{T}\hat{\varvec{\mu }}_{o}^{k}\nonumber \\ \hat{\tau }_{m}^{k}= & {} [\hat{\sigma }_{.}^{k2}]^{-1}\hat{\sigma }_{m}^{k2} \end{aligned}$$
(8)

where

$$\begin{aligned} \hat{\sigma }_{ .}^{k2}= & {} \frac{1}{n}Tr [[\hat{\Gamma }_{o}^{k}]^{2}]\nonumber \\ \hat{\sigma }_{m}^{k2}= & {} \frac{1}{n}[\hat{\varvec{\mu }}_{o}^{k}-\hat{\mu }_{m}^{k}{} \mathbf i _{n}]^{T}[\hat{\varvec{\mu }}_{o}^{k}-\hat{\mu }_{m}^{k}{} \mathbf i _{n}] \end{aligned}$$

and

$$\begin{aligned} \hat{\xi }_{s}^{k}= & {} [\hat{\sigma }_{s}^{k2}]^{-1}[\hat{\mu }_{s}^{k2}]+2\nonumber \\ \hat{\gamma }_{s}^{k}= & {} \hat{\mu }_{s}^{k}[[\hat{\sigma }_{s}^{k2}]^{-1}[\hat{\mu }_{s}^{k2}] +1] \end{aligned}$$
(9)

where

$$\begin{aligned} \hat{\mu }_{s}^{k}= & {} \frac{1}{n}{} \mathbf i _{n}^{T}{} \mathbf s ^{2}\\ \hat{\sigma }_{s}^{k2}= & {} \frac{1}{n}[\mathbf s ^{2}-\hat{\mu }_{s}^{k}{} \mathbf i _{n}]^{T}[\mathbf s ^{2}-\hat{\mu }_{s}^{k}{} \mathbf i _{n}]\\ \mathbf s ^{2}= & {} [(r_{1}-\hat{\mu }_{m}^{k})^{2},\ldots , (r_{n}-\hat{\mu }_{m}^{k})^{2} ]^{T}. \end{aligned}$$
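The moment estimators in Eqs. (8) and (9) can be sketched in a few lines. The sketch below is illustrative only: the function name and the toy inputs are hypothetical, and the k-localized estimates \(\hat{\mu }_{i}^{k}\) and \(\hat{\sigma }_{i}^{k}\) are assumed to have been computed beforehand. The expressions for \(\hat{\xi }_{s}^{k}\) and \(\hat{\gamma }_{s}^{k}\) follow from solving the two Inverse Gamma moment equations in Eq. (4) for \(\xi _{s}\) and \(\gamma _{s}\).

```python
def prior_hyperparameters(r, mu_loc, sigma_loc):
    """Moment estimators of [mu_m, tau_m, xi_s, gamma_s], Eqs. (8)-(9).

    r         : observations r_1..r_n
    mu_loc    : k-localized expectation estimates at the observation locations
    sigma_loc : k-localized standard deviation estimates at the same locations
    """
    n = len(r)
    mu_m = sum(mu_loc) / n
    sigma2_dot = sum(s * s for s in sigma_loc) / n          # (1/n) Tr[Gamma^2]
    sigma2_m = sum((m - mu_m) ** 2 for m in mu_loc) / n
    tau_m = sigma2_m / sigma2_dot

    s2 = [(ri - mu_m) ** 2 for ri in r]                     # squared deviations
    mu_s = sum(s2) / n
    sigma2_s = sum((v - mu_s) ** 2 for v in s2) / n
    ratio = mu_s ** 2 / sigma2_s
    xi_s = ratio + 2.0
    gamma_s = mu_s * (ratio + 1.0)
    return mu_m, tau_m, xi_s, gamma_s


# Toy inputs, purely illustrative.
print(prior_hyperparameters([1.0, 2.0, 4.0, 7.0],
                            [1.5, 2.0, 3.5, 5.0],
                            [1.0, 1.0, 2.0, 2.0]))
```

By construction, inserting the returned \(\hat{\xi }_{s}^{k}\) and \(\hat{\gamma }_{s}^{k}\) into the moment expressions of Eq. (4) reproduces the sample mean and sample variance of the squared deviations.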

Estimators for all model parameters are now defined based on the observations \(\mathbf r _{o}\). Hence all probabilistic models for the regionalized variable \(\left\{ r(x); x\in \mathbf D \right\} \) are fully specified. The focus is on spatial prediction however, and in the following section spatial predictors are specified under the various model assumptions. The estimators for the model parameters can be inserted to obtain operable predictors.

4 Spatial Predictors

The focus of this study is on spatial prediction in a random field \(\left\{ r(x); x\in \mathbf D \right\} \) based on a set of observations represented in an n-vector \(\mathbf r _{o}\). Consider an arbitrary location \(x_{+}\in \mathbf D \) with value \(r(x_{+})=r_{+}\). The challenge is to provide a reliable predictor for \(r_{+}\) based on \(\mathbf r _{o}\). Under squared error loss, the predictor is \(\hat{r}_{+}=\widehat{E}[r_{+} {\mid } \mathbf r _{o}] \), with associated estimated predictor variance \(\hat{\sigma }_{+}^{2}={\widehat{\mathrm{V}}}\mathrm{ar}[r_{+} {\mid } \mathbf r _{o}] \). Note that a predictor for the entire regionalized variable \(\left\{ r(x); x\in \mathbf D \right\} \) can be obtained by letting \(x_{+}\) run over the domain \(\mathbf D \).

Recall that the correlation function \(\{\rho (x^{'} - x^{''}); x^{'},x^{''} \in \mathbf D \}\) is assumed known and that the inter-observation correlation \([n \times n]\)-matrix is denoted \(\Omega _{oo}\), while the observation to \(x_{+}\) correlation n-vector is denoted \(\omega _{o+}\). Recall also that the localization operator \(G_{+}^{k}\) is defined such that \(G_{+}^{k}{} \mathbf r _{o}\) is an observation k-vector which contains the k observations located closest to \(x_{+}\).

4.1 Glob/Stat/Trad Predictor

This predictor is global and based on a stationary Gaussian random field model with traditional parameter estimates. It corresponds to the frequently used global ordinary kriging predictor, and it is defined by

$$\begin{aligned}{}[r_{+} {\mid } \mathbf r _{o}] \sim \mathrm{Gauss}[\mu _{+{\mid } o}, \sigma ^{2}_{+{\mid } o}] \end{aligned}$$

with

$$\begin{aligned} \mu _{+{\mid } o}= & {} \mu +\omega _{o+}^{T}\Omega _{oo}^{-1}[\mathbf r _{o}-\mu \mathbf i _{n}]\nonumber \\ \sigma ^{2}_{+{\mid } o}= & {} \sigma ^{2}[1-\omega _{o+}^{T}\Omega _{oo}^{-1}\omega _{o+}]. \end{aligned}$$
(10)

Note that the predictor is independent of the variance \(\sigma ^{2}\) while the prediction variance is independent of the observed values \(\mathbf r _{o}\). The latter is only dependent on the location configuration of \(\mathbf r _{o}\). These are well-known characteristics of kriging.
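These characteristics of Eq. (10) can be verified numerically. The following is a minimal sketch, assuming one-dimensional locations, a hypothetical exponential correlation function `exp_corr`, a small pure-Python solver `solve`, and known values of \(\mu \) and \(\sigma ^{2}\); the names are illustrative and not taken from the study.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def exp_corr(h, length=2.0):
    """Hypothetical stand-in for the known correlation function rho(.)."""
    return math.exp(-abs(h) / length)


def krige(xs, r, x_plus, mu, sigma2, rho=exp_corr):
    """Eq. (10): conditional expectation and variance at x_plus."""
    omega_oo = [[rho(a - b) for b in xs] for a in xs]
    omega_op = [rho(a - x_plus) for a in xs]             # omega_{o+}
    lam = solve(omega_oo, omega_op)                      # Omega_oo^{-1} omega_{o+}
    pred = mu + sum(l * (ri - mu) for l, ri in zip(lam, r))
    var = sigma2 * (1.0 - sum(l * w for l, w in zip(lam, omega_op)))
    return pred, var


xs = [0.0, 1.0, 3.0]
r = [2.0, 4.0, 1.0]
# Same predictor for two different variance levels; only the variance scales.
print(krige(xs, r, 2.0, mu=2.0, sigma2=1.0))
print(krige(xs, r, 2.0, mu=2.0, sigma2=9.0))
```

The prediction variance returned by `krige` never touches the values in `r`, only the locations `xs`, which mirrors the statement above.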

The Glob/Stat/Trad predictor with associated predictor variance is defined as

$$\begin{aligned} \hat{r}_{\mathrm{GST}}= & {} \hat{\mu }_{+{\mid } o}\\ \hat{\sigma }_{\mathrm{GST}}^{2}= & {} \hat{\sigma }_{+{\mid } o}^{2} \end{aligned}$$

which are defined by Eq. (10) with the estimates in Eq. (5) inserted.

4.2 Loc/Stat/Trad Predictor

This predictor is k-localized and based on a stationary Gaussian random field model with traditional parameter estimators. It corresponds to a localized ordinary kriging predictor which is frequently used in practice, and is defined by

$$\begin{aligned}{}[r_{+} {\mid } G_{+}^{k}{} \mathbf r _{o}] \sim \mathrm{Gauss}[\mu _{+{\mid } o}^{k} , \sigma ^{k2}_{+{\mid } o}] \end{aligned}$$

with

$$\begin{aligned} \mu _{+{\mid } o}^{k}&=\mu _{+}^{k}+[G_{+}^{k}\omega _{o+}]^{T}[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}[G_{+}^{k}{} \mathbf r _{o}-\mu _{+}^{k}{} \mathbf i _{k}]\nonumber \\ \sigma ^{k2}_{+{\mid } o}&=\sigma _{+}^{k2}[1-[G_{+}^{k}\omega _{o+}]^{T}[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}G_{+}^{k}\omega _{o+}]. \end{aligned}$$
(11)

This predictor is locally independent of \(\sigma ^{2}\) with predictor variance locally independent of the observed values \(\mathbf r _{o}\). Since both \(\mu _{+}^{k}\) and \(\sigma _{+}^{k2}\) will vary across the field, the predictor and predictor variance will vary across the field as well.

The Loc/Stat/Trad predictor with associated predictor variance is defined as

$$\begin{aligned} \hat{r}_{\mathrm{LST}}^{k}= & {} \hat{\mu }_{+{\mid } o}^{k}\\ \hat{\sigma }_{\mathrm{LST}}^{k2}= & {} \hat{\sigma }^{k2}_{+{\mid } o} \end{aligned}$$

which are defined by Eq. (11) with the estimates in Eq. (6) inserted.

4.3 Loc/Stat/Shr Predictor

This predictor is k-localized and based on a stationary Gaussian random field model with shrinkage parameter estimators defined in an empirical Bayes setting. The predictor is termed the stationary localized/shrinkage kriging predictor and constitutes a new predictor in the study

$$\begin{aligned}{}[r_{+} {\mid } m_{+}^{k}, s_{+}^{k2}, G_{+}^{k}{} \mathbf r _{o}] \sim \mathrm{Gauss}[\mu _{+{\mid } o}^{k} , \sigma ^{k2}_{+{\mid } o}] \end{aligned}$$

with

$$\begin{aligned} \mu _{+{\mid } o}^{k}&= m_{+}^{k}+[G_{+}^{k}\omega _{o+}]^{T}[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}[G_{+}^{k}{} \mathbf r _{o}-m_{+}^{k}{} \mathbf i _{k}] \nonumber \\ \sigma ^{k2}_{+{\mid } o}&= s_{+}^{k2}[1-[G_{+}^{k}\omega _{o+}]^{T}[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}G_{+}^{k}\omega _{o+}] \end{aligned}$$
(12)

where the posterior expectations for the hyper-parameters are (“Appendix”)

$$\begin{aligned} m_{+}^{k}&= E[m{\mid } s^{2}, G_{+}^{k}{} \mathbf r _{o}] \nonumber \\&= \mu _{m}^{k}+\tau _{m}^{k}{} \mathbf i _{k}^{T}[\tau _{m}^{k}{} \mathbf i _{k}{} \mathbf i _{k}^{T}+[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]]^{-1}[G_{+}^{k}{} \mathbf r _{o}-\mu _{m}^{k}{} \mathbf i _{k}]\nonumber \\ s_{+}^{k2}&= E[s^{2} {\mid } G_{+}^{k} \mathbf r _{o}] =\left[ \xi _{s}^{k}+\frac{k}{2}-1\right] ^{-1} \nonumber \\&\quad \times \left[ \gamma _{s}^{k}+\frac{1}{2}\left[ [G_{+}^{k}{} \mathbf r _{o} -\mu _{m}^{k}{} \mathbf i _{k} ]^{T}[[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}] +\tau _{m}^{k}{} \mathbf i _{k}{} \mathbf i _{k}^{T} ]^{-1}[G_{+}^{k}{} \mathbf r _{o} -\mu _{m}^{k}{} \mathbf i _{k}]\right] \right] . \end{aligned}$$
(13)

Equation (12) follows from the definition of a stationary hierarchical Gaussian random field conditioned on the model parameters and on \(G_{+}^{k}{} \mathbf r _{o}\). Equation (13) follows from the prior model of \([m , s^{2}]\) being conjugate, hence the posterior models are analytically accessible, and so are their expectations. Note in particular that \(m_{+}^{k}=E[m{\mid } G_{+}^{k}{} \mathbf r _{o}]\) hence independent of \(s_{+}^{k2}\). Note also that the predictor is independent of the variance while the predictor variance is dependent on the actual observed values.

The predictor exhibits shrinkage behaviour through the estimators for shift and scaling parameters \(m_{+}^{k}\) and \(s_{+}^{k2}\). The actual observation weights are independent of \(m_{+}^{k}\) and \(s_{+}^{k2}\).
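The shrinkage in \(m_{+}^{k}\) can be made explicit. As a short worked step, write \(C=G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}\) and \(a=\mathbf i _{k}^{T}C^{-1}\mathbf i _{k}\) (shorthand introduced only here), and read the inverse in the expression for \(m_{+}^{k}\) in Eq. (13) as acting on the full sum \(\tau _{m}^{k}\mathbf i _{k}\mathbf i _{k}^{T}+C\). The Sherman-Morrison identity then gives

$$\begin{aligned} m_{+}^{k}= & {} \mu _{m}^{k}+w_{+}^{k}[\hat{\mu }_{+}^{k}-\mu _{m}^{k}]\\ w_{+}^{k}= & {} [1+\tau _{m}^{k}a]^{-1}\tau _{m}^{k}a \end{aligned}$$

where \(\hat{\mu }_{+}^{k}\) is the k-localized maximum likelihood estimator in Eq. (6). Hence \(m_{+}^{k}\) is a weighted average of the local estimate and the global prior level \(\mu _{m}^{k}\), with \(w_{+}^{k}\) approaching unity for large \(\tau _{m}^{k}\) and zero for small \(\tau _{m}^{k}\).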

The Loc/Stat/Shr predictor with associated predictor variance is defined as

$$\begin{aligned} \hat{r}_{\mathrm{LSS}}^{k}= & {} \hat{\mu }_{+{\mid } o}^{k}\\ \hat{\sigma }_{\mathrm{LSS}}^{k2}= & {} \hat{\sigma }^{k2}_{+{\mid } o} \end{aligned}$$

which are defined by Eqs. (12) and (13) with the estimators in Eqs. (8) and (9) inserted.
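The posterior expectations in Eq. (13) can be sketched as follows. This is an illustrative implementation only, assuming that the inverse in the expression for \(m_{+}^{k}\) acts on the full sum \(\tau _{m}^{k}\mathbf i _{k}\mathbf i _{k}^{T}+G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}\), that the localized correlation matrix is passed in directly as `c`, and that a small pure-Python solver `solve` stands in for a linear algebra library.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[p] = m[p], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def shrinkage_estimates(c, rk, mu_m, tau_m, xi_s, gamma_s):
    """Eq. (13): posterior expectations m_+ and s_+^2 for one neighborhood.

    c  : [k x k] localized correlation matrix G Omega_oo G^T
    rk : the k selected observations G r_o
    """
    k = len(rk)
    # tau_m i_k i_k^T adds the constant tau_m to every entry of c.
    a = [[c[i][j] + tau_m for j in range(k)] for i in range(k)]
    resid = [ri - mu_m for ri in rk]
    q = solve(a, resid)                       # [c + tau i i^T]^{-1} [G r_o - mu_m i_k]
    m_plus = mu_m + tau_m * sum(q)
    quad = sum(e * qi for e, qi in zip(resid, q))
    s2_plus = (gamma_s + 0.5 * quad) / (xi_s + 0.5 * k - 1.0)
    return m_plus, s2_plus


identity = [[1.0, 0.0], [0.0, 1.0]]
print(shrinkage_estimates(identity, [2.0, 4.0], mu_m=0.0, tau_m=0.5, xi_s=3.0, gamma_s=2.0))
```

In the toy call the local mean is 3 and the prior level is 0, so the returned \(m_{+}^{k}\) lies between the two, and moves towards the local mean as `tau_m` is increased.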

4.4 Loc/Non-stat/Trad Predictor

This predictor is k-localized and based on a general Gaussian random field model with traditional parameter estimators. This predictor is surprisingly seldom used, and it is defined as

$$\begin{aligned}{}[r_{+} {\mid } G_{+}^{k}{} \mathbf r _{o}] \sim \mathrm{Gauss}[\mu _{+{\mid } o}^{k} , \sigma ^{k2}_{+{\mid } o}] \end{aligned}$$

with

$$\begin{aligned} \mu _{+{\mid } o}^{k}&=\mu _{+}^{k}+\sigma _{+}^{k}[G_{+}^{k}\Gamma _{o}^{k}\omega _{o+}]^{T}[G_{+}^{k}\Gamma _{o}^{k}\Omega _{oo}\Gamma _{o}^{k}[G_{+}^{k}]^{T}]^{-1}[G_{+}^{k}[\mathbf r _{o}-\varvec{\mu }_{o}^{k}]]\nonumber \\ \sigma ^{k2}_{+{\mid } o}&=\sigma _{+}^{k2}[1-[G_{+}^{k}\Gamma _{o}^{k}\omega _{o+}]^{T}[G_{+}^{k}\Gamma _{o}^{k}\Omega _{oo}\Gamma _{o}^{k}[G_{+}^{k}]^{T}]^{-1}G_{+}^{k}\Gamma _{o}^{k}\omega _{o+}] \end{aligned}$$
(14)

where

$$\begin{aligned} \varvec{\mu }_{o}^{k}= & {} [\mu _{1}^{k},\ldots ,\mu _{n}^{k}]^{T}\\ \Gamma _{o}^{k}= & {} \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \sigma _{1}^{k} &{} \ldots &{} 0\\ \vdots &{} \ddots &{} \vdots \\ 0 &{}\ldots &{} \sigma _{n}^{k} \end{array} \right] . \end{aligned}$$

Note that this predictor is dependent on the variance variability across the field while the predictor variance is independent of the observation values.
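As a consistency check, consider the special case where the local levels are constant across the observations, i.e. \(\varvec{\mu }_{o}^{k}=\mu _{+}^{k}\mathbf i _{n}\) and \(\Gamma _{o}^{k}=\sigma _{+}^{k}I_{n}\), with \(I_{n}\) denoting the \([n\times n]\) identity matrix (notation introduced only for this step). The scaling factors in Eq. (14) then cancel,

$$\begin{aligned} \sigma _{+}^{k}[G_{+}^{k}\sigma _{+}^{k}\omega _{o+}]^{T}[\sigma _{+}^{k2}G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}= & {} [G_{+}^{k}\omega _{o+}]^{T}[G_{+}^{k}\Omega _{oo}[G_{+}^{k}]^{T}]^{-1}\\ G_{+}^{k}[\mathbf r _{o}-\varvec{\mu }_{o}^{k}]= & {} G_{+}^{k}\mathbf r _{o}-\mu _{+}^{k}\mathbf i _{k} \end{aligned}$$

so Eq. (14) reduces to the Loc/Stat/Trad expressions in Eq. (11). The two predictors therefore differ only through the variability of the local expectation and standard deviation estimates within the neighborhood.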

The Loc/Non-stat/Trad predictor with associated predictor variance is defined as

$$\begin{aligned}&\hat{r}_{\mathrm{LNT}}^{k}=\hat{\mu }_{+{\mid } o}^{k}\\&\hat{\sigma }_{\mathrm{LNT}}^{k2}=\hat{\sigma }^{k2}_{+{\mid } o} \end{aligned}$$

which are defined by Eq. (14) with the estimates in Eqs. (6) and (7) inserted.

4.5 Loc/Non-stat/Shr Predictor

This predictor is k-localized and based on a non-stationary Gaussian random field model with shrinkage parameter estimators defined in an empirical Bayesian setting. This predictor is termed the non-stationary localized/shrinkage kriging predictor and constitutes another new predictor in this study

$$\begin{aligned}{}[r_{+} {\mid } m_{+}^{k}, \mathbf m _{o}^{k}, s_{+}^{k2}, S_{o}^{k}, G_{+}^{k}{} \mathbf r _{o}] \sim \mathrm{Gauss}[\mu _{+{\mid } o}^{k} , \sigma ^{k2}_{+{\mid } o}] \end{aligned}$$

with

$$\begin{aligned} \mu _{+{\mid } o}^{k}&=m_{+}^{k}+ s_{+}^{k} [G_{+}^{k} S_{o}^{k}\omega _{o+}]^{T}[G_{+}^{k} S_{o}^{k}\Omega _{oo} S_{o}^{k}[G_{+}^{k}]^{T}]^{-1}[G_{+}^{k}[\mathbf r _{o}-\mathbf m _{o}^{k}]] \nonumber \\ \sigma ^{k2}_{+{\mid } o}&=s_{+}^{k2}[1-[G_{+}^{k} S_{o}^{k}\omega _{o+}]^{T}[G_{+}^{k} S_{o}^{k}\Omega _{oo} S_{o}^{k}[G_{+}^{k}]^{T}]^{-1}G_{+}^{k} S_{o}^{k}\omega _{o+}] \end{aligned}$$
(15)

where \(m_{+}^{k}\) and \(s_{+}^{k2}\) are defined as in Eq. (13) while

$$\begin{aligned} \mathbf m _{o}^{k}= & {} [ m_{1}^{k}, \ldots , m_{n}^{k} ]^{T}\\ S_{o}^{k}= & {} \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} s_{1}^{k} &{} \ldots &{} 0\\ \vdots &{} \ddots &{} \vdots \\ 0 &{}\ldots &{} s_{n}^{k} \end{array} \right] \end{aligned}$$

are defined from the \(x_{+}\)-centered \(m_{+}^{k}\) and \(s_{+}^{k2}\) shifted to the observation locations \([x_{1}, \ldots , x_{n}]\).

The predictor exhibits shrinkage through the estimators of \(m_{+}^{k}\), \(s_{+}^{k2}\) and \(m_{i}^{k}\), \(s_{i}^{k2}\); \(i=1,\ldots ,n\). Hence both shift and scaling, as well as observation weights are influenced by the shrinkage effect. This is a full shrinkage predictor.

The Loc/Non-stat/Shr predictor with associated predictor variance is defined as

$$\begin{aligned} \hat{r}_{\mathrm{LNS}}^{k}= & {} \hat{\mu }_{+{\mid } o}^{k}\\ \hat{\sigma }_{\mathrm{LNS}}^{k2}= & {} \hat{\sigma }^{k2}_{+{\mid } o} \end{aligned}$$

which are defined by Eq. (15) with the estimates in Eqs. (8) and (9) inserted.

4.6 Cross-Validation Calibrated (CVC) Predictors

Five predictors are defined, one global and four localized. One challenge with localized predictors is lack of global anchoring. The variance estimates are localized and coupled with the expectation estimates, which may cause incorrect scaling of the prediction variances. To account for this, cross-validation calibrated (CVC) predictors are introduced to provide a global calibration.

Consider an arbitrary predictor in an arbitrary location \(x_{+} \in \mathbf D , \hat{r}_{+}=\widehat{E}[r_{+}{\mid } \mathbf r _{o}]\). The predictor may be global or localized, and it is based on the observations \(\mathbf r _{o} = [r_{1},\ldots ,r_{n}]^{T}\) in locations \([x_{1},\ldots ,x_{n}]\). The associated prediction variance is \(\hat{\sigma }_{+}^{2}={\widehat{\mathrm{V}}}\mathrm{ar}[r_{+}{\mid } \mathbf r _{o}]\).

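Define the cross-validation predictors with associated predictor variances in the observation locations \([x_{1},\ldots ,x_{n}]\) as

$$\begin{aligned} \hat{r}_{i}= & {} \widehat{E}[r_{i}{\mid } \mathbf r _{o(-i)}]\\ \hat{\sigma }_{i}^{2}= & {} {\widehat{\mathrm{V}}}\mathrm{ar}[r_{i}{\mid } \mathbf r _{o(-i)}]; \quad i=1,\ldots ,n \end{aligned}$$

where \(\mathbf r _{o(-i)}\) represents the observations \(\mathbf r _{o}\) with observation \(r_{i}\) removed.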

The normalized cross-validation errors are defined as

$$\begin{aligned} e_{i}=[\hat{\sigma }_{i}]^{-1}[r_{i}-\hat{r}_{i}]; \quad i=1,\ldots ,n. \end{aligned}$$

Under a fully specified model, these errors will be centered at zero and scaled to unity. Consider the estimators

$$\begin{aligned} \hat{\mu }_{e}= & {} \frac{1}{n}\sum _{i=1}^{n}e_{i}\\ \hat{\sigma }_{e}^{2}= & {} \frac{1}{n}\sum _{i=1}^{n}[e_{i}-\hat{\mu }_{e}]^{2} \end{aligned}$$

so the mean normalized error (MNE) \(\hat{\mu }_{e}\), and mean square normalized error (MSNE) \(\hat{\sigma }_{e}^{2}\), should be close to zero and unity respectively.

The CVC predictor and CVC prediction variance are defined in an arbitrary location \(x_{+} \in \mathbf D \) by

$$\begin{aligned} \tilde{r}_{+}= & {} \hat{r}_{+}+\hat{\sigma }_{+}\hat{\mu }_{e}\nonumber \\ \tilde{\sigma }_{+}^{2}= & {} \hat{\sigma }_{e}^{2} \hat{\sigma }_{+}^{2}. \end{aligned}$$
(16)

Note that the corresponding normalized cross-validation errors will have \(\hat{\mu }_{e}\) and \(\hat{\sigma }_{e}^{2}\) which are identical to zero and unity, respectively. These CVC predictors with associated CVC prediction variances will be used in the following study.
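The calibration in Eq. (16) can be illustrated with a short sketch. It assumes that the cross-validation predictions \(\hat{r}_{i}\) and variances \(\hat{\sigma }_{i}^{2}\) are already available from some predictor; the function name and the toy numbers are hypothetical.

```python
import math


def cvc_calibrate(r, r_hat, var_hat):
    """Cross-validation calibration, Eq. (16), applied at the observation locations.

    r       : observed values r_i
    r_hat   : leave-one-out predictions of r_i
    var_hat : leave-one-out prediction variances
    """
    n = len(r)
    sd = [math.sqrt(v) for v in var_hat]
    e = [(ri - pi) / si for ri, pi, si in zip(r, r_hat, sd)]   # normalized errors
    mu_e = sum(e) / n                                          # MNE
    var_e = sum((ei - mu_e) ** 2 for ei in e) / n              # MSNE
    r_tilde = [pi + si * mu_e for pi, si in zip(r_hat, sd)]
    var_tilde = [var_e * v for v in var_hat]
    return r_tilde, var_tilde, mu_e, var_e


r = [1.0, 2.0, 3.0, 5.0]
r_hat = [0.5, 2.5, 2.0, 4.0]
var_hat = [1.0, 4.0, 0.25, 1.0]
r_tilde, var_tilde, mu_e, var_e = cvc_calibrate(r, r_hat, var_hat)

# Normalized errors of the calibrated predictor.
e_cal = [(ri - ti) / math.sqrt(vi) for ri, ti, vi in zip(r, r_tilde, var_tilde)]
print(sum(e_cal) / len(e_cal))
print(sum(ei ** 2 for ei in e_cal) / len(e_cal))
```

The same shift \(\hat{\mu }_{e}\) and scale \(\hat{\sigma }_{e}^{2}\) would be applied to \(\hat{r}_{+}\) and \(\hat{\sigma }_{+}^{2}\) at any other location \(x_{+}\).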

5 Empirical Evaluation

It is difficult to compare the various predictors analytically, hence an empirical evaluation is conducted. First, suitable evaluation criteria are defined; thereafter, two studies are described and the results are summarized.

5.1 Evaluation Criteria

Several spatial CVC predictors with associated CVC prediction variances are defined. Recall that all predictors have normalized cross-validation errors centered at zero and scaled to unity.

The precision of the CVC predictors \(\tilde{r}_{+}\) measured by mean square cross-validation error

$$\begin{aligned} \mathrm{PMSE}=\frac{1}{n}\sum _{i=1}^{n}[r_{i}-\tilde{r}_{i}]^{2} \end{aligned}$$
(17)

may vary, however. In the CVC predictor the scale of the normalized cross-validation error is identical to unity but large deviations may be reduced by large prediction variances in this measure. The PMSE value is used as measure for precision in the CVC predictor \(\tilde{r}_{+}\) and small values of PMSE are preferable, of course.

The precision of the CVC prediction variances \(\tilde{\sigma }_{+}^{2}\) is indicated by the dependence between cross-validation squared errors \([r_{i}-\tilde{r}_{i}]^{2}\) and corresponding prediction variances \(\tilde{\sigma }_{i}^{2}\). Since these are variance estimates the mean square measure is defined as

$$\begin{aligned} \mathrm{VMSE}=\frac{1}{n}\sum _{i=1}^{n}\left[ \frac{[r_{i}-\tilde{r}_{i}]^{2}}{\tilde{\sigma }_{i}^{2}}-1\right] ^{2}. \end{aligned}$$
(18)

Recall that normalized cross-validation squared error in this expression is centered exactly at unity. The VMSE value is used as measure for precision in the CVC prediction variance \(\tilde{\sigma }_{+}^{2}\) and small values of VMSE are preferable.

By comparing values of PMSE and VMSE for different CVC predictors one can evaluate the relative quality of the predictors. Normally, the criterion PMSE is considered more important than the criterion VMSE.
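Both criteria are simple averages over the cross-validation results. A minimal sketch of Eqs. (17) and (18) follows, with hypothetical function names and toy numbers, assuming the CVC cross-validation predictions \(\tilde{r}_{i}\) and variances \(\tilde{\sigma }_{i}^{2}\) are given.

```python
def pmse(r, r_tilde):
    """Eq. (17): mean square cross-validation error of the CVC predictor."""
    return sum((a - b) ** 2 for a, b in zip(r, r_tilde)) / len(r)


def vmse(r, r_tilde, var_tilde):
    """Eq. (18): mean square deviation of normalized squared errors from unity."""
    return sum(((a - b) ** 2 / v - 1.0) ** 2
               for a, b, v in zip(r, r_tilde, var_tilde)) / len(r)


# Toy cross-validation results, purely illustrative.
r = [1.0, 2.0, 3.0]
r_tilde = [1.0, 1.0, 5.0]
var_tilde = [1.0, 1.0, 4.0]
print(pmse(r, r_tilde), vmse(r, r_tilde, var_tilde))
```

A predictor whose squared errors match its prediction variances exactly at every location would have VMSE equal to zero, regardless of its PMSE.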

5.2 Empirical Studies

Two cases are presented. The first is based on a set of observations of yearly accumulated precipitation and the second on a simulation study of a general Gaussian random field.

5.2.1 US Precipitation Study

The data set consists of observations of accumulated precipitation from 1997 in 1001 locations in an area of the US (Fig. 1). The observations are a subset of a much larger data set (Johns et al. 2003).

Fig. 1 The 1997 accumulated precipitation observations in the US with sub-area to be studied (top). The 1,001 observations used in the study (bottom)

Fig. 2 Spatial correlation function used with estimated correlation values

By inspecting the observations in Fig. 1, one sees relatively dense, uniform coverage of observations with a slight south-eastern trend in the values. The translation invariant correlation function \(\rho (.)\) is inferred from correlation values computed at regular shifts, assuming translation invariant expectations and variances. These correlations (Fig. 2) constitute a kind of spatial average. The following model is fitted

$$\begin{aligned} \rho (\tau )=\exp \left\{ -\left[ \frac{\tau }{3.5}\right] ^{1.4}\right\} ; \quad \tau \ge 0 \end{aligned}$$

with \(\tau =|x^{'}- x^{''}|\). This spatial correlation model is used throughout the study and the estimates of the model parameters \(\mu (.)\) and \(\sigma ^{2}(.)\) are obtained conditional on this correlation model. The estimators and the localized predictors require that k, the number of observations in the neighborhood, is specified. After a small preliminary study, the value \(k=10\) is chosen.
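The fitted correlation model and the choice of a neighborhood of \(k=10\) observations can be sketched as follows. This is an illustration under stated assumptions: the neighborhood is taken to be the k nearest observations in Euclidean distance, the coordinates are random placeholders, and the helper names `k_nearest` and `correlation_matrix` are hypothetical.

```python
import numpy as np

def rho(tau, scale=3.5, power=1.4):
    """Fitted correlation model exp(-(tau / 3.5) ** 1.4) for tau >= 0."""
    tau = np.asarray(tau, dtype=float)
    return np.exp(-(tau / scale) ** power)

def k_nearest(x0, locations, k=10):
    """Indices of the k locations closest to x0 (Euclidean distance is an assumption)."""
    dist = np.linalg.norm(locations - x0, axis=1)
    return np.argsort(dist)[:k]

def correlation_matrix(locations):
    """Correlation matrix of a set of locations under the fitted model."""
    diff = locations[:, None, :] - locations[None, :, :]
    return rho(np.linalg.norm(diff, axis=-1))

# Placeholder coordinates, not the precipitation station locations.
rng = np.random.default_rng(1)
locations = rng.uniform(0.0, 20.0, size=(50, 2))
x0 = np.array([10.0, 10.0])                 # hypothetical prediction location

idx = k_nearest(x0, locations, k=10)
R_local = correlation_matrix(locations[idx])   # 10 x 10 local correlation matrix
```

Only the local matrix `R_local` has to be factorized for a localized predictor, which is the source of the computational savings relative to a global predictor.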

Fig. 3 Glob/Stat/Trad CVC predictor (ordinary kriging): a predictions, b prediction standard deviations

Fig. 4 Glob/Stat/Trad cross-validation errors: a normalized cross-validation errors, b normalized cross-validation error histogram

Fig. 5 Loc/Stat/Trad \(k=10\) CVC predictor: a predictions, b prediction standard deviations

Fig. 6 Loc/Stat/Trad \(k=10\) cross-validation errors: a normalized cross-validation errors, b normalized cross-validation error histogram

Fig. 7 Loc/Stat/Shr \(k=10\) CVC predictor: a predictions, b prediction standard deviations

Fig. 8 Loc/Stat/Shr \(k=10\) cross-validation errors: a normalized cross-validation errors, b normalized cross-validation error histogram

Fig. 9 US precipitation study. Prior model for expectation and variance

The five alternative CVC predictors defined in Sect. 4 are compared using the evaluation criteria defined in Sect. 5. The results from the evaluation are displayed in Figs. 3–9 and Table 1.

Table 1 US precipitation study

The results from the Glob/Stat/Trad CVC predictor are displayed in Figs. 3, 4 and Table 1. Figure 3 displays the actual cross-validation predictions and cross-validation prediction standard deviations at the observation locations. The predictions can be compared to the actual observations in Fig. 1, and no dramatic deviations are seen since the observation coverage is dense. Figure 4 displays the normalized cross-validation errors both spatially and as a histogram. Note that the histogram is centered at zero and scaled to unity since the CVC predictor is used. The large errors seem to fall in the south-eastern corner where the trend effect is largest. The values of the evaluation criteria are listed in Table 1, first column \((k=1000)\). The first two rows, MNE and MSNE, contain the values of \(\hat{\mu }_{e}\) and \(\hat{\sigma }_{e}^{2}\) respectively, that is, the empirical moments of the normalized cross-validation errors of the non-calibrated predictor. The predictor appears well centered but with downward biased prediction variances. The next two rows contain the values of the evaluation criteria PMSE and VMSE for the CVC predictor. The former criterion is related to prediction precision while the latter is related to precision in the prediction variance. These criteria provide the basis for comparison of the various CVC predictors.
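The diagnostics MNE and MSNE can be illustrated with a short sketch. It assumes that MNE and MSNE are the empirical mean and the empirical variance (with divisor n) of the normalized cross-validation errors; the exact CVC construction is the one given in Sect. 4, and this code only shows the standardization of the errors on synthetic placeholder numbers.

```python
import numpy as np

def normalized_errors(r, r_hat, var_hat):
    """Cross-validation errors divided by the predicted standard deviations."""
    return (r - r_hat) / np.sqrt(var_hat)

def empirical_moments(e):
    """Empirical mean and variance (divisor n is an assumption) of the errors."""
    return float(np.mean(e)), float(np.var(e))

# Synthetic placeholder data: the true error standard deviation is 1.5,
# but the non-calibrated predictor reports variance 1.0.
rng = np.random.default_rng(0)
n = 200
r = rng.normal(size=n)
r_hat = r + rng.normal(scale=1.5, size=n)
var_hat = np.full(n, 1.0)

e = normalized_errors(r, r_hat, var_hat)
mne, msne = empirical_moments(e)        # msne > 1: variances are too small

# Standardized errors are centered at zero and scaled to unity by construction.
e_std = (e - mne) / np.sqrt(msne)
print(mne, msne, e_std.mean(), np.mean(e_std ** 2))
```

An MSNE value above one corresponds to the situation described in the text, where the non-calibrated prediction variances are downward biased.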

The results from the Loc/Stat/Trad \(k=10\) CVC predictor are displayed in Figs. 5, 6 and Table 1. The formats are identical to the ones discussed in the previous paragraph. The cross-validation predictions and prediction standard deviations in Fig. 5 are very similar to the results for the global predictor in Fig. 3. The normalized cross-validation errors in Fig. 6 deviate noticeably from the results for the global predictor in Fig. 4, since the large errors tend to be more uniformly located in the area and the histogram has somewhat lighter tails. From Table 1 one sees that the non-calibrated predictor is well centered but with downward biased prediction variances. The evaluation criteria PMSE and VMSE of the Loc/Stat/Trad \(k=10\) CVC predictor have values that are favorable compared to the global CVC predictor: mildly favorable for prediction and clearly favorable for prediction variance.

The results from the Loc/Stat/Shr \(k=10\) CVC predictor are displayed in Figs. 7, 8, 9 and Table 1, in formats similar to those above. The predictor relies on a set of hyper-parameters that defines the prior model for localized expectation and variance. These prior models are assessed in an empirical Bayesian spirit from the complete set of observations. For the current predictor with \(k=10\) we obtain the prior model displayed in Fig. 9. The cross-validation predictions and cross-validation standard deviations in Fig. 7 appear very similar to the results for the other CVC predictors in Figs. 3 and 5. The normalized cross-validation errors in Fig. 8 appear similar to the ones for the traditional localized predictor in Fig. 6. Note, however, that the histograms differ in the sense that the histogram of the shrinkage predictor has lighter tails than the traditional one. This is very much in the shrinkage spirit, since extreme predictions, often caused by unstable model parameter estimates, are dampened towards the center of the model. From Table 1, one observes that the non-calibrated predictor is well centered but with downward biased prediction variances. The evaluation criteria PMSE and VMSE for the Loc/Stat/Shr \(k=10\) CVC predictor are both favorable compared to both the traditional global and localized predictors. A minor improvement in prediction precision is obtained, while the precision in the prediction variance is clearly improved.

The results from the Loc/Non-stat/Trad \(k=10\) CVC predictor and the Loc/Non-stat/Shr \(k=10\) CVC predictor are summarized in Table 1. By comparing the values of the evaluation criteria PMSE and VMSE to those of the other predictors, we conclude that the precision in prediction is clearly poorer. Note, however, the improvement in precision for the prediction variance. These results may indicate that there is a trade-off between the precision of the prediction and that of the prediction variance.

Fig. 10 General Gaussian random field. Expectation and variance field (a), predictions and prediction variances for one realization (b–e)

To summarize, it can be concluded that the localized predictors are clearly preferable to the global one, both in prediction precision and particularly in the precision of prediction variance. The localized models are robust with respect to deviations from assumptions of global stationarity, and this robustness improves the localized predictors. Localized stationary predictors are preferable to the localized non-stationary ones, since the precision in prediction is clearly better. The precision in the prediction variance can, however, be improved by non-stationary predictors. In the non-stationary models one must estimate the expectation and variance at each observation location, which introduces additional uncertainty in the model. This uncertainty outweighs the advantage of using a more general model. Among localized, stationary predictors, the shrinkage predictor is clearly preferable to the traditional one. The prediction precision is slightly better, while the precision in prediction variance is clearly better. For localized models, one needs to make bias/variance trade-offs when selecting the localization. Using a regularizer representing the global variability in the parameter estimators provides more stable estimates. It is not surprising that the effect is largest for prediction variance, since traditional variance estimators are notoriously unstable.

In the current study, localization with \(k=10\) is used. In the extended study (Asfaw and Omre 2014), the sensitivity to the choice of k is evaluated. If k is considerably reduced, for example to \(k=4\), the localized predictors are poorer than the global one. This deterioration is probably caused by overfitting to the observations. Results for k in the range 8–16 are consistent with the results in the current study, with localized shrinkage predictors clearly preferable. As k increases, the localized and global predictors will eventually coincide.

5.2.2 Simulation Study

A simulation study is conducted on a general Gaussian random field model \(\{r(x); x \in \mathbf D \subset \mathfrak {R}^{1}\}\) with \(\mathbf D =[1{,}200]\) discretized to a grid \(\mathcal {L_\mathbf{D }}=\left\{ 1, \ldots , 200 \right\} \). The expectation and variance fields used in the study are non-stationary with similar shape (Fig. 10a). The known correlation function is \(\rho (\tau )=\exp \{-0.2 \tau ^{1.5}\}\) with \(\tau ={\mid } x^{''} - x^{'}{\mid }\). One realization is generated from this random field model and its values at locations \(\mathcal {L_\mathbf{o }}= \left\{ 1, 10, \ldots , 190, 200 \right\} \) are used as observations, hence \(n=21\).
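The simulation setup can be sketched as follows. The correlation function, the grid and the observation locations follow the description above, whereas the expectation and variance fields `mu` and `sigma2` are hypothetical placeholders (the fields actually used are only shown graphically in Fig. 10a), and the small diagonal jitter is a numerical safeguard added here.

```python
import numpy as np

grid = np.arange(1, 201)                       # L_D = {1, ..., 200}

def rho(tau):
    """Known correlation function exp(-0.2 * tau ** 1.5)."""
    return np.exp(-0.2 * np.abs(tau) ** 1.5)

# Placeholder non-stationary fields of similar shape (assumption).
mu = 5.0 + 3.0 * np.sin(np.pi * grid / 200.0)
sigma2 = 1.0 + 2.0 * np.sin(np.pi * grid / 200.0)

# Covariance matrix: C[i, j] = sigma_i * rho(|x_i - x_j|) * sigma_j.
sd = np.sqrt(sigma2)
R = rho(grid[:, None] - grid[None, :])
C = sd[:, None] * R * sd[None, :]

# One realization via the Cholesky factor of C.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(C + 1e-10 * np.eye(grid.size))
r = mu + L @ rng.standard_normal(grid.size)

# Observations at L_o = {1, 10, 20, ..., 200}, hence n = 21.
obs_loc = np.concatenate(([1], np.arange(10, 201, 10)))
r_obs = r[obs_loc - 1]                         # grid locations are one-based
```

Repeating the last two steps with new random draws gives the independent realizations over which the criteria are later averaged.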

With known model parameters, the correct predictions and prediction variances in locations \(\mathcal {L_\mathbf{D }}\) are available analytically (Fig. 10b). The predictor reproduces the observations and the prediction variances reflect the non-stationarity. The various CVC predictors Glob/Stat/Trad, Loc/Stat/Trad and Loc/Stat/Shr are based on localization \(k=\pm 4\) whenever relevant. The corresponding results are displayed in Fig. 10c–e. Note that the global predictor regresses towards the observation average and that its prediction variance accounts only for the observation configuration. The localized predictors are fairly similar, capturing the local variability in the observations. The shrinkage results appear somewhat damped relative to the traditional ones. This dampening is caused by the prior models for local expectation and variance, which are assessed in an empirical Bayesian spirit. The prior model for this realization is displayed in Fig. 11. Predictions based on the non-stationary models are also evaluated but these results are not displayed in Fig. 10.
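With all model parameters known, the correct predictor is the standard Gaussian conditional expectation, and the correct prediction variance is the conditional variance. A minimal sketch is given below; the expectation and variance fields are hypothetical placeholders rather than the fields of Fig. 10a, and `conditional_prediction` is an illustrative name.

```python
import numpy as np

def conditional_prediction(mu, C, obs_idx, r_obs):
    """Gaussian conditional mean and variance on the grid given the observations."""
    C_oo = C[np.ix_(obs_idx, obs_idx)]           # covariance among observations
    C_go = C[:, obs_idx]                         # covariance grid vs observations
    W = np.linalg.solve(C_oo, C_go.T).T          # weights C_go @ inv(C_oo)
    pred = mu + W @ (r_obs - mu[obs_idx])
    var = np.diag(C) - np.sum(W * C_go, axis=1)
    return pred, var

grid = np.arange(1, 201)
obs_idx = np.concatenate(([1], np.arange(10, 201, 10))) - 1   # zero-based indices

# Placeholder non-stationary fields (assumption).
mu = 5.0 + 3.0 * np.sin(np.pi * grid / 200.0)
sigma2 = 1.0 + 2.0 * np.sin(np.pi * grid / 200.0)
tau = np.abs(grid[:, None] - grid[None, :])
sd = np.sqrt(sigma2)
C = sd[:, None] * np.exp(-0.2 * tau ** 1.5) * sd[None, :]

# Draw one set of observations from the model.
rng = np.random.default_rng(0)
C_oo = C[np.ix_(obs_idx, obs_idx)]
r_obs = mu[obs_idx] + np.linalg.cholesky(C_oo) @ rng.standard_normal(obs_idx.size)

pred, var = conditional_prediction(mu, C, obs_idx, r_obs)
```

At the observation locations `pred` equals `r_obs` and `var` is zero up to rounding, which matches the statement that the predictor reproduces the observations, while between observations `var` follows the non-stationary variance field.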

Fig. 11 General Gaussian random field. Prior model for expectation and variance for one realization

To summarize, the global predictor, which corresponds to classical ordinary kriging, appears highly unreliable since the underlying model is clearly non-stationary. This predictor will not be further evaluated. The results from the localized predictors are difficult to distinguish by visual inspection.

The evaluation criteria discussed in Sect. 5 can be revised to characterize the deviations between the CVC predictions and prediction variances and the correct ones displayed in Fig. 10b. One thousand realizations are generated and the revised criteria are averaged over them to obtain APMSC and AVMSC for prediction and prediction variance, respectively (Table 2). This procedure is repeated for varying localizations k. From Table 2, one observes that for localized, stationary predictors the shrinkage versions are preferable to the traditional ones for both criteria and for all k, except for one extreme case with large k. For the localized, non-stationary predictors the shrinkage versions are uniformly preferable to the traditional ones.

Table 2 General Gaussian random field

For localized/stationary predictors, a typical pattern is observed in the criterion APMSC, which characterizes precision in prediction, as the localization k varies. The traditional predictor makes bias/variance trade-offs, resulting in poor performance for large k due to bias and also poor performance for small k due to instability in the parameter estimates. Localization at \(k=\pm 4\) provides a favorable traditional predictor. The shrinkage predictor stabilizes the variance estimates and performs well for smaller localizations with \(k=\pm 2\). The Loc/Stat/Shr \(k=\pm 4\) CVC predictor appears to be the preferable one, since the APMSC criterion is seen as more important than the AVMSC one.

The evaluation criteria discussed above require the correct predictors to be available, which is not the case in real studies. Therefore, the cross-validation criteria discussed in Sect. 5, which are always available, are also computed. These criteria are averaged over the one thousand realizations to obtain APMSE and AVMSE, and the resulting values are listed in Table 2. For these criteria, the shrinkage versions are uniformly preferable to the traditional ones for all cases. The Loc/Stat/Shr \(k=\pm 2\) CVC predictor appears to be superior based on these criteria as well.

To summarize, the cross-validation criteria appear representative of the exact criteria based on the correct predictions. The former criteria can be computed in real studies with only one set of observations available. The Loc/Stat/Shr CVC predictors are identified as preferable to the other predictors, although the best localization k appears to be somewhat underestimated by cross-validation.

In the current simulation study, only one expectation and one variance function are used. In the extended study (Asfaw 2014), many other cases are considered. The results are consistent with the ones presented here, and it is demonstrated that the traditional predictors are particularly sensitive to deviations from stationarity in expectation, which also influences the variance estimates. Lastly, the synthetic simulation case is Gaussian so no outliers and no heavy-tailed distributions are involved. In spite of this, the shrinkage predictors are found to be preferable to the traditional ones. In the presence of outliers and heavy-tailed models, one can expect that the shrinkage predictors perform even more favorably.

6 Conclusions

Two versions of localized, shrinkage CVC predictors are defined, one based on local stationarity and one with local non-stationarity. The shrinkage is defined in an empirical Bayes setting while the cross-validation calibration (CVC) ensures correct global scaling of the predictor variance. The introduction of spatial shrinkage predictors constitutes the new feature of this study, and these predictors are termed localized/shrinkage kriging predictors.

The localized/shrinkage kriging predictors are compared to traditional kriging predictors, both global and localized, in a study on real precipitation data and in a synthetic simulation study. Two cross-validation-based criteria are used in the comparison. The localized/shrinkage kriging predictors are found to be clearly preferable to traditional kriging predictors on the real data set of yearly accumulated precipitation. The synthetic study is based on a Gaussian random field with spatially varying expectation and variance, which makes local predictors suitable. The localized/shrinkage kriging predictors emerge as clearly preferable to traditional localized kriging predictors also in this simulation study. The shrinkage predictors based on local stationarity seem to be the superior models.

Our recommendation is to use localized, shrinkage kriging predictors, based on a local stationarity model, for spatial prediction whenever deviation from stationarity in the observations is suspected. Even for a stationary Gaussian model, localized, shrinkage kriging predictors can be preferable to global ordinary kriging if the focus is on computational demands.