Norges Teknisk-naturvitenskapelige Universitet Localized/shrinkage Kriging Predictors Localized/shrinkage Kriging Predictors

The objective of the study is to improve the robustness and flexibility of spatial kriging predictors with respect to deviations from spatial stationarity assumptions. A predictor based on a non-stationary Gaussian random field is defined. The model parameters are inferred in an empirical Bayesian setting, using observations in a local neighborhood and a prior model assessed from the global set of observations. The localized predictor appears with a shrinkage effect and is coined a localized/shrinkage kriging predictor. The predictor is compared to traditional localized kriging predictors in a case study on observations of annual cumulated precipitation. A crossvalidation criterion is used in the comparision. The shrinkage predictor appears as uniformly preferable to the traditional kriging predictors. A simulation study on prediction in non-stationary Gaussian random fields is conducted. The results from this study confirms that the shrinkage predictor is favorable to the traditional ones. Moreover, the crossvalidation criterion is found to be suitable for selection of predictor. Lastly, the shrinkage predictor appears as particularly robust towards spatially varying expectation functions.


Introduction
Spatial prediction is required in many applications, and examples can be found in natural resource mapping, meteorology and image analysis. Consider a regionalized variable {r(x); x ∈ D ⊂ m } where r(x) ∈ 1 is the variable of interest while x is a reference variable in the domain D. The challenge is to predict the regionalized variable from a set of observations r o = [r(x 1 ), ..., r(x n )]; x 1 , ..., x n ∈ D. We define the predictors in a probabilistic setting and can also obtain associated predictor uncertainties.
The classical probabilistic approach to spatial prediction is kriging, see Journel and Huijbregts (1978) and Chiles and Delfiner (1999). The traditional ordinary kriging predictor is based on a stationary model for the regionalized variable, with spatially constant expectation and variance, and a shift invariant spatial correlation function. Localized predictors, local neighborhood kriging, see Chiles and Delfiner (1999) can be defined to robustify the predictor with respect to deviations from the stationarity assumptions. The major challenge in using localized predictors is to define the size of the local neighborhood where a bias/variance trade-off must be made. For some spatial correlation structures a screening effect is provided by the observations closest to the prediction location, see Stein (2002), and this effect may robustify the localization. In Anderes and Stein (2011) a weighted localized approach is defined and demonstrated to be useful for non-stationary random fields.
Recent developments in computer and sensor technology have provided enormous spatio-temporal sets of observations, see Johns et al (2003). Computational demands in spatial prediction have been a critical factor and considerable research is devoted to numerical algorithmes for large sparse correlation matrices. Localized predictors may provide an alternative solution to reduce these computational demands.
In the current study we define spatial predictors which are robust with respect to deviations from assumptions about spatial stationary. We define a Gaussian random field with spatially varying expectation and variance. The spatial correlation function is shift invariant and known. We assess the expectation and variance locally by a sliding window approach. In traditional local neighborhood kriging this assessment is done by some maximum likeli-hood procedure. The new feature of the study is that these local assessments are done by shrinkage estimators in an empirical Bayes setting, inspired by the study of Efron and Morris (1973). The hierarchical, Gaussian random field model discussed in Røislien and Omre (2006) is used locally and the hyper-parameters are assessed from the global set of observations. Since the predictor is locally defined it is extremely computationally effecient. We denote the spatial predictor localized/shrinkage kriging.
In the two first sections, various Gaussian random field models for the regionalized variable {r(x); x ∈ D} are presented with associated model parameter estimators. In the following section, five spatial predictors are defined, one global and four localized. Two of these predictors appear with shrinkage. The next section defines the evaluation criteria used in the comparison of the predictors. In the next to last section the five predictors are compared on a real set of annual accumulated precipitation data and in a synthetic simulation study. Thereafter, in the last section, the conclusions are forwarded. The paper summarises the major findings in an extended study, reported in Asfaw and Omre (2014).

Predictor Models
The spatial predictors will be based on probabilistic models for the regionalized variable {r(x); x ∈ D}.
In traditional kriging prediction {r(x); x ∈ D} is associated with a stationary Gaussian random field, see Chiles and Delfiner (1999). This assumption entails that: where expected level µ ∈ 1 , variance level σ 2 ∈ 1 + and spatial correlation function ρ(x − x ) is positive definite. Note that for these model assumptions the random field is shift invariant, and this property is extensively used to make inference about the model parameters [µ, σ 2 , ρ(.)] from the set of observations r o . This traditional kriging model may be extended to have an expectation surface µ(x) = L l=1 α e g e (x) where {g e (x); x ∈ D}; l = 1, ..., L are known basis surfaces while α = (α 1 , ..., α L ) are unknown coefficients. This model corresponds to a spatial regression model, and the shift invariance property is lost, which complicates model parameter inference. Note that for given correlation function ρ(.), maximum likelihood estimates based on r o is analytically assessable for the other model parameters, under both these model assumptions.
In the current study it is assumed that {r(x); x ∈ D} is associated with a general Gaussian random field, which entails that: with µ ∈ 1 , σ 2 ∈ 1 + and ρ(x , x ) being positive definite. There is obviously lack of shift invariance under these assumptions. We can expect that inference of the spatial model parameters {µ(x); x ∈ D}, {σ 2 (x); x ∈ D} and ρ(x , x ); ∀x , x ∈ D , based on r o is complicated. If we fix the correlation function ρ(., .) we may use localized estimators in a kernel spirit to assess the spatial model parameters µ(.) and σ 2 (.). When selecting the size of localization we face a bias/variance trade-off. Large local neighborhood introduce bias in the estimators due to smoothing while small neighborhoods introduce instability in the estimators due to sensoring of observations. In the current study we adress this bias/variance trade-off by defining shrinkage estimators in an empirical Bayes setting.
In order to define the shrinkage estimators we introduce a stationary, hierarchical Gaussian random field model for a local neighborhood D + around an arbitrary x + ∈ D, i.e for {r(x); x ∈ D + }. In this model the expected level m and the variance level s 2 are considered to be random variables. Moreover, {[r(x) | m, s 2 ] ; x ∈ D + } is defined to be a stationary Gaussian random field, hence: Corr r(x ), r(x ) | m, s 2 = ρ(x − x ); ∀x , x ∈ D + By assuming that ρ(.) is known, and that the model parameters [m | s 2 ] and s 2 have prior models which are Gaussian and Inverse Gamma respectively, with: where the model parameters are µ m ∈ 1 + , τ m ∈ 1 + , ξ s ∈ 2+ 1 + and γ s ∈ 1 + . The prior model on [m, s 2 ] is a conjugate model for the stationary Gaussian random field, and the marginal random field {r(x); x ∈ D + } will be a T-dist random field, see Røislien and Omre (2006), and be analytically tractable.
The estimators for the model parameters µ(x + ) and σ 2 (x + ) in arbitrary location x + ∈ D are localized to observations in D + and with shrinkage according to the localized hierarchical Gaussian model. These estimators will of course depend on the parameters of the prior model [µ m , τ m , ξ s , γ s ], and we assess these parameters in an empirical Bayesian spirit from a set of local neighborhood D + covering D.

Inference of Model Parameters
In order to use the probabilistic models for {r(x); x ∈ D} to spatial prediction the model parameters must be assessed from the set of observations represented by the [n × 1]-vector r o = [r(x 1 ), ..., r(x n )] T = [r 1 , ..., r n ] T .
Throughout the study we assume the spatial correlation function ρ(x , x ) = ρ(x − x ); ∀x , x ∈ D to be known, with associated interobservation correlation [n × n]-matrix Ω oo . This correlation model must of course also be infered in one way or the other, but in our study we do not account for the uncertainty related to this.
For a stationary Gaussian random field, given the correlation function, the model parameters can be assessed by a maximum likelihood estimator: For a general Gaussian random field, given the correlation function, the assessments of the spatial model parameters {µ(x); x ∈ D} and {σ 2 (x); x ∈ D}, are more complicated. We prescribe localized estimators. Consider an arbitrary location x + ∈ D and parameterize the localization by the k observations in r o localized closest to x + , and denote it as k-localization. Define a binary- ]-vector containing the k observations in the k-localization of x + . Note that G k + easily can be extended to also account for favorable configurations of observations around x + . The k-localized maximum likelihood estimators for the model parameters are: By letting x + coinside with the observation locations [x 1 , ..., x n ] we obtain the estimators for observation expectation [n × 1]-vector and for diagonal standard deviation [n × n]-matrix respectively.
We have now defined estimators for all model parameters based on the observations r o . Hence all probabilistic models for the regionalized variable {r(x); x ∈ D} are fully specified. Focus is on spatial prediction however, and in the following Section we define spatial predictors under the various model assumptions and insert the estimators for the model parameters to obtain operable predictors.

Spatial Predictors
Focus of the study is on spatial prediction in a random field {r(x); x ∈ D} based on a set of observations represented in a [n × 1]-vector r o . Consider an arbitrary location x + ∈ D with value r(x + ) = r + . The challenge is to provide a reliable predictor for r + based on r o . We use a squared error loss, hence the predictor isr Note that a predictor for the entire regionalized variable {r(x); x ∈ D} can be obtained by letting x + run over the domain D.
Recall that the correlation function ρ(x − x ); x , x ∈ D is assumed known and that the inter observation correlation [n×n]-matrix is denoted Ω oo , while the observation to x + correlation [n × 1]-vector is denoted ω o+ . Recall also that the localization operator G k + is defined such that G k + r o is a observation [k × 1]-vector which contain the k observations located closest to x + .

Glob/Stat/Trad Predictor
This predictor is global and based on a stationary Gaussian random field model with traditional parameter estimates. It corresponds to the frequently used global ordinary kriging predictor, and it is defined by: Note that the predictor is independent of the variance σ 2 while the prediction variance is independent of the observed values r o only the location configuration. These are well recognized characteristics of kriging.
The Glob/Stat/Trad predictor with associated predictor variance is defined as:r which are defined by Expression (10) with the estimates in Expression (5) inserted.

Loc/Stat/Trad Predictor
This predictor is k-localized and based on a stationary Gaussian random field model with traditional parameter estimators. It corresponds to a localized ordinary kriging predictor which is frequently used in practice, and it is defined by: Also this predictor is locally independent on σ 2 with predictor variance locally independent on the observed values r o . Since both µ k + and σ k2 + will vary across the field, the predictor and predictor variance will vary across the field.
The Loc/Stat/Trad predictor with associated predictor variance is defined as:r LST =σ k2 +|o which are defined by Expression (11) with the estimates in Expression (6) inserted.

Loc/Stat/Shr Predictor
This predictor is k-localized and based on a stationary Gaussian random field model with shrinkage parameter estimators defined in an empirical Bayes setting. The predictor is denoted the stationary localized/shrinkage kriging predictor and it constitutes one new predictor in the study: where the posterior expectations for the hyper-parameters are, see Appendix A: Expression (12) follows from the definition of a stationary hierarchical Gaussian random field conditioned on the model parameters and on G k + r o . Expression (13) follows from the prior model of [m, s 2 ] being conjugate, hence the posterior models are analytically assessible, and so are their expectations. Note in particular that m k Note also that the predictor is independent of the variance while the predictor variance is dependent on the actual observed values.
The predictor appears with shrinkage through the estimators for shift and scaling parameters m k + and s k2 + . The actual observations weights are independent of m k + and s k2 + .
The Loc/Stat/Shr predictor with associate predictor variance is defined as: which are defined by Expressions (12) and (13) with the estimators in Expressions (8) and (9) inserted.

Loc/Non-stat/Trad Predictor
This predictor is k-localized and based on a general Gaussian random field model with traditional parameter estimators. This predictor is surprisingly seldom used, and it is defined as: Note that this predictor is dependent of the variance variability across the field while the predictor variance is independent of the observation values.
The Loc/Non-stat/Trad predictor with associated predictor variance are defined as:r which are defined by Expression (14) with the estimates in Expressions (6) and (7) inserted.

Loc/Non-stat/Shr Predictor
This predictor is k-localized and based on a non-stationary Gaussian random field model with shrinkage parameter estimators defined in an empirical Bayesian setting. The predictor is denoted the non-stationary localized/shrinkage kriging predictor and it constitutes another new predictor in the study: where m k + and s k2 + are defined as in Expression (13) (8) and (9) inserted.

Crossvalidation Calibrated (CVC) Predictors
We have defined five predictors, one global and four localized. One challenge with localized predictors is lack of global anchoring. The variance estimates are localized and coupled with the expectation estimates which may cause wrong scaling of the prediction variances. We introduce crossvalidation calibrated (CVC) predictors to provide a global calibration.
Consider an arbitrary predictor in an arbitrary location x + ∈ D,r + = E[r + | The normalized crossvalidation errors are defined as: Under a fully specified model, these errors will be centred at zero and be scaled to unity. Consider the estimators: and theμ e andσ 2 e should be close to zero and unity, respectively.
We define the CVC predictor and CVC prediction variance in an arbitrary location x + ∈ D by:r Note that the corresponding normalized crossvalidation errors will haveμ e andσ 2 e which are identical to zero and unity, respectively. These CVC predictors with associated CVC prediction variances will be used in the study.

Evaluation Criteria
We have defined several spatial CVC predictors with associated CVC prediction variances. We need to select one based on the set of observations r o only. Recall that all predictors have normalized crossvalidation errors centred at zero and scaled to unity.
The precision of the CVC predictorsr + measured by mean square crossvalidation error: may vary. In the CVC predictor the scale of the normalized crossvalidation error is identical to unity but large deviations may be reduced by large prediction variances in this measure. We use PMSE as measure for precision in the CVC predictorr + and we prefer small values of PMSE, of course.
The precision of the CVC prediction variancesσ 2 + is indicated by the dependence between crossvalidation squared errors [r i −r i ] 2 and corresponding prediction variancesσ 2 i . Since these are variance estimates we define the mean square measure: Recall that normalized crossvalidation squared error in this expression is centred exactly at unity. We use VMSE as measure for precision in the CVC prediction varianceσ 2 + and we prefer small values of VMSE, of course.
By comparing values of PMSE and VMSE for different CVC predictors we are able to evaluate the relative quality of the predictors. Normally, the criterion PMSE is considered more important than criterion VMSE.

Empirical Studies
It is difficult to compare the various predictors analytically, hence we conduct an empirical comparision. We present two case: set of observations of yearly accumulated precipitation and a simulation study on a general Gaussian random field.

US Precipitation Study
We consider observations of 1997 accumulated precipitation in 1001 locations in an area of the US, see Figure 1. The observations is a subset of much larger data set, see Data (2014) and Johns et al (2003).
By inspecting the observations in Figure 1 we note relatively dense, uniform coverage of observations with a slight south-eastern trend in the values. The spatial correlation function is inferred under a stationary model, see Figure 2, and the following model is fitted: This spatial correlation model is used throughout the study. The localized predictors require that the number of observations in the neighborhood k is specified. We have, after a small preliminary study, chosen k=10.
We compare the five alternative CVC predictors defined Section 4 by using the evaluation criteria defined in Section 5. The results from the evaluation are displayed in Figure 3 thru 9 and Table 1.
The results from the Glob/Stat/Trad CVC predictor are displayed in Figure  3 and 4 and in Table 1. Figure 3 display the actual crossvalidation predictions and crossvalidation prediction standard deviations in the observations locations. The predictions can be compared to the actual observations in Figure 1, and no dramatic deviations are seen since the observation coverage is dense. Figure 4 display the normalized crossvalidation errors both spatially and as a histogram. Note that the histogram is centred to zero and scaled to unity since the CVC predictor is used. The locations of large errors seem to fall in the south-eastern corner where the trend effect is largest. The values of the evaluation criteria are listed in Table 1, first column (k = 1000). The two first lines MNE and MSNE contain the values ofμ e andσ 2 e respectively, hence the empirical moments of the normalized crossvalidation errors of the non-calibrated predictor. The predictor appears as well centred but with downward biased prediction variances. The two next lines contain the values of the evaluation criteria PMSE and VMSE, for the CVC predictor. The former criteria is related to prediction precision while the latter is related to precision in the prediction variance. These criteria provide the base for comparision of the various CVC predictors.
The results from the Loc/Stat/Trad k=10 CVC predictor are displayed in Figure 5 and 6 and in Table 1. The formats are identical to the ones discussed in the previous paragraph. The crossvalidation predictions and prediction standard deviations in Figure 5 are very similar to the results for the global predictor in Figure 3. The normalized crossvalidation errors in Figure 6 do deviate from the results for the global predictor in Figure 4. Note that the large errors tend to be more uniformly located in the area and that the histogram have somewhat lighter tails. From Table 1 we see that the non-calibrated predictor is well centred but with downward biased prediction variances. The evaluation criteria PMSE and VMSE of the Loc/Stat/Trad k=10 CVC predictor have values that are favorable compared to the global CVC predictor. Not much favorable for prediction but clearly favorable for prediction variance.
The results from the Loc/Stat/Shr k=10 CVC predictor are displayed in Figure 7 thru 9, and in Table 1, on similar formats as above. The predictor relies on a set of hyper-parameters that defines the prior model for localized expectation and variance. These prior models are assessed in an empirical Bayesian spirit from the complete set of observations. For the current predictor with k=10 we obtain the prior model displayed in Figure 9. The crossvalidation predictions and crossvalidation standard deviations in Figure  7 appear as very similar to the results for the other CVC predictors in Figure 3 and 5. The normalized crossvalidation errors in Figure 8 appear as similar to the ones for the traditional localized predictor in Figure 6. Note, however, that the histograms are different in the sense that the histogram of the shrinkage predictor have lighter tails than for the traditional one. This is very much in the shrinkage spirit, since extreme predictions, often caused by unstable model parameter estimates, are dampened towards the centre of the model. From Table 1, we observe that the non-calibrated predictor is well centred but with downward biased prediction variances. The evaluation criteria PMSE and VMSE for the Loc/Stat/Shr k=10 CVC predictor are both favorable to both the traditional global and localized predictors. Minor improvement on prediction precision is obtained, while the precision in the prediction variance is clearly improved.
The results from the Loc/Non-stat/Trad k=10 CVC predictor and the Loc/Nonstat/Shr k=10 CVC predictor are only summarized in Table 1. By comparing the values of the evaluation criteria PMSE and VMSE to the other predictors, we conclude that the precision in prediction is clearly poorer. Note, however, the improvement in precision for the prediction variance. These results may indicate that there is a trade-off in the precision of prediction and prediction variance.
To summarize the current study, we conclude that localized predictors can be clearly favorable to the global one. Favorable both in prediction precision and particularly in precision of prediction variance. The localized models are robust with respect to deviations from assumptions of global stationary, and this robustness causes improvements of the localized predictors. Localized stationary predictors are favorable to the localized non-stationary ones, since the precisions in prediction are clearly better. The precision in the prediction variance can, however, be improved by non-stationary predictors. In the non-stationary models we must estimate expectation and variance in each observation location, which introduces additional uncertainity in the model. This uncertaininty is dominating the advantage of using a more general model. Among localized, stationary predictors, the shrinkage predictor is clearly favorable to the traditional one. The prediction precision is slightly better, while the precision in prediction variance is clearly favorable. For localized models we need to make bias/variance trade-offs when selecting the localization. Using a regularizer representing the global variability in the parameter estimators provide more stable estimates. It is not surprising that the effect is largest for prediction variance, since traditional variance estimators are notoriously unstable.
In the current study we have used a localization with k=10. In the extended study, see Asfaw and Omre (2014), we have evaluated the sensitivity to choise of k-value. If we reduce k considerably, to for example k=4, the localized predictor are poorer than the global one. This is probably caused by overfitting to the observations. Results for k in the range 8 to 16 are consistent with the results in the current study with localized shrinkage predictors clearly favorable. By increasing k we will of course in the limit have the localized and global predictors to coinside.
With known model parameters, the correct predictions and prediction variances in locations L D are analytically assessable, see Figure 10.b. The predictor reproduces the observations and the prediction variances reflects the non-stationarity. We use the various CVC predictors Glob/Stat/Trad, Loc/Stat/Trad and Loc/Stat/Shr with localization k=±4 when relevant, and obtain results as in Figure 10.c-e. Note that the global predictor have regression towards the observation average and prediction variance accounting only for the observation configuration. The localized predictors are fairly similar, capturing the local variability in the observations. The shrinkageversion appears as somewhat dampened relative to the traditional one. This dampening is caused by the prior models on local expectation and variance which are assessed in an empirical Bayesian spirit. The prior model for this realization are displayed in Figure 11. We also perform prediction based on the non-stationary models but the results are not displayed in Figure 10.
To summarize, the global predictor, which corresponds to classical ordinary Kriging, appears as highly unreliable due to deviations from the stationarity assumptions, and will not be further evaluated. The localized predictors are difficult to distinguish by visual inspection.
The evaluation criteria discussed in Section 5 can be revised to characterize the deviations between the CVC predictions and prediction variances and the correct ones in Figure 10.a. We repeat 1000 realizations and average the criteria to obtain APMSC and AVMSC as evaluation criteria for prediction and prediction variance respectively, see Table 2 with k=±4. We repeat this procedure for varying localizations k. Note that for localized, stationary predictors the shrinkage-versions are favorable to the traditional one for both criteria and all k, except one extreme case with large k. For the localized, non-stationary predictors the shrinkage-versions are uniformly favorable to the traditional one.
One typical feature is observed for localized/stationary predictors for criterion APMSC, characterizing precision in prediction, for varying localizations k. The traditional predictor makes bias/variance trade-offs, with poor performances for large k due to bias and for small k due to instability in parameter estimates. Localization k=±4 appears as favorable. The shrinkage predictor stabilizes the variance estimates by shrinkage and tolerates smaller localizations with k=±2. We consider the Loc/Stat/Shr k=±4 CVC predictor to be the favorable one, since the APMSC criterion is seen as more important than the AVMSC one.
The evaluation criterion discussed above require the correct predictors to be avaliable, which is not the case in real studies. We have also computed the crossvalidation criteria discussed in Section 5 which always are available. We average over 1000 realizations to obtain APMSE and AVMSE, and the values are listed in Table 2. For these criteria the shrinkage-versions are uniformly favorable to the traditional ones for all cases. We consider the Loc/Stat/Shr k=±2 CVC predictor to be best based on these criteria, of course.
To summarize, the crossvalidation based criteria are representative for the exact criteria based on the correct predictions. Note that the former can be computed in real studies with only one set of observations avaliable. The Loc/Stat/Shr CVC predictors are favorable to other predictors, although the localization k tends to be somewhat underestimated by crossvalidation.
In the current study we have only used one expectation and variance function. In the extended study, see Asfaw and Omre (2014), we have considered many other cases. The results are consistent with the ones presented here, and it is demonstrated that the traditional predictors are particularly sensitive to deviations from stationarity in expectation which also influences the variance estimates of course. Lastly, recognize that the synthetic simulation case is Gaussian, no outliers and no heavy-tailed distributions are involved. In spite of this, we find that the shrinkage predictors are favorable to the traditional ones. In presence of outliers and heavy-tailed models, we expect the shrinkage predictors to perform even more favorably.                  Table 2: General Gaussian random field. Evaluation criteria: Average prediction mean squared correct (APMSC), Average variance mean squared correct (AVMSC), Average prediction mean squared error (APMSE) and Average variance mean squared error (AVMSE).

Conclusion
We specify two versions of localized, shrinkage CVC predictors, one based on local stationarity and one with local non-stationarity. The shrinkage is defined in an empirical Bayes setting while the crossvalidation calibration (CVC) ensures correct global scaling of the predictor variance. The introduction of spatial shrinkage predictors constitutes the new feature of the study, and they are termed localized/shrinkage kriging predictors.
The localized/shrinkage kriging predictors are compared to the traditional kriging predictors, both global and localized, in a study on real precipitation data and in a synthetic simulation study. We use two crossvalidation based criteria in the comparison. The localized/shrinkage kriging predictors are found to be clearly favorable to traditional kriging predictors on the real data set of yearly accumulated precipitation. The synthetic study is based on a Gaussian random field with spatially varying expectation and variance which make local predictors suitable. The localized/shrinkage kriging predictors appear as clearly favorable to traditional localized kriging predictors. The shrinkage predictors based on local stationarity seems to be the superior ones.
Our recommendation is to use localized, shrinkage kriging predictors, based on a local stationarity model, for spatial prediction whenever deviation from stationarity in the observations is suspected. Even for a stationary Gaussian model localized, shrinkage kriging predictors can be preferable to global ordinary kriging, if focus is on computational demands.