Model-Based Geostatistics from a Bayesian Perspective: Investigating Area-to-Point Kriging with Small Data Sets
Abstract
Area-to-point kriging (ATPK) is a geostatistical method for creating high-resolution raster maps using data of the variable of interest with a much lower resolution. The data set of areal means is often considerably smaller (\(<\,50 \) observations) than data sets conventionally dealt with in geostatistical analyses. In contemporary ATPK methods, uncertainty in the variogram parameters is not accounted for in the prediction; this issue can be overcome by applying ATPK in a Bayesian framework. Commonly in Bayesian statistics, posterior distributions of model parameters and posterior predictive distributions are approximated by Markov chain Monte Carlo sampling from the posterior, which can be computationally expensive. Therefore, a partly analytical solution is implemented in this paper, in order to (i) explore the impact of the prior distribution on predictions and prediction variances, (ii) investigate whether certain aspects of uncertainty can be disregarded, simplifying the necessary computations, and (iii) test the impact of various model misspecifications. Several approaches using simulated data, aggregated real-world point data, and a case study on aggregated crop yields in Burkina Faso are compared. The prior distribution is found to have minimal impact on the disaggregated predictions. In most cases with known short-range behaviour, an approach that disregards uncertainty in the variogram distance parameter gives a reasonable assessment of prediction uncertainty. However, some severe effects of model misspecification in terms of overly conservative or optimistic prediction uncertainties are found, highlighting the importance of model choice or integration into ATPK.
Keywords
Area-to-point kriging; Spatial disaggregation; Bayesian statistics; Closed-form solutions
1 Introduction
An important challenge often encountered in scientific research is spatial prediction using areal-support data, that is, data about the variable of interest that is available as areal means only. Data may be aggregated for privacy protection, administrative, technical or other reasons. Examples include data on the failures in semiconductor chip production (White et al. 2017), cancer mortality (Goovaerts 2006), precipitation (Park 2013), soil properties (Kerry et al. 2012; Orton et al. 2012; Malone et al. 2009; Brus et al. 2014), soil hydraulic properties (Horta et al. 2014) and, the motivating example in this research, crop yields. You et al. (2009) state that, because of limited resources, crop yield information in many sub-Saharan countries is available only sparsely and in aggregated format; however, such information is needed at a finer spatial resolution to support efforts to increase crop productivity [see for example Orton et al. (2018)] and thereby improve both human welfare and ecological sustainability.
Model-based spatial prediction is usually done using algorithms based on point support (Diggle and Ribeiro 2007). In the case of areal-support input, area-to-point kriging (ATPK) is a popular approach for creating fine-scale raster maps of the variable of interest and the corresponding prediction uncertainty (called spatial disaggregation or downscaling). In ATPK, regression coefficients (in the presence of covariates) and variogram parameters (describing spatial relationships) have to be estimated, for example by a least-squares estimator for the regression coefficients combined with an iterative variogram-fitting deconvolution algorithm ('method of moments') on the regression residuals (Goovaerts 2008). More recent methods such as restricted maximum likelihood (REML) in combination with universal kriging (UK) (both, here and hereafter, referring to their application in the ATPK setting) take the uncertainty in the regression coefficients into account (Webster and Oliver 2007). However, uncertainty in the variogram model parameters might also be a relevant source of uncertainty (Jansen 1998; Kitanidis 1986; Minasny et al. 2011). Truong et al. (2014) showed that variogram uncertainty can have a substantial impact on ATPK variances. Brus et al. (2018) summarised earlier work by Pardo-Igúzquiza and Dowd (2001) showing that uncertainty in the variogram parameters can be quantified by the inverse of the Fisher information matrix of the variogram parameters, but did not integrate this uncertainty into the kriging prediction uncertainty itself. In the Bayesian statistics paradigm, parameters are considered stochastic rather than fixed but unknown (Gotway 2005). From a Bayesian perspective, REML in combination with UK treats the regression coefficients as stochastic and integrates them out, both from the likelihood function used to estimate the variogram parameters and from the prediction.
In this paper, this Bayesian direction is continued by successively integrating out the spatial variance parameter (analytically) and the spatial correlation distance parameter (numerically). By applying analytical solutions wherever possible, Markov chain Monte Carlo (MCMC) sampling from the Bayesian posterior distribution, as proposed for example by Minasny et al. (2011), is avoided. MCMC can be computationally expensive and, in a spatial context, may converge poorly to the posterior distribution because the parameters are correlated (Christensen et al. 2006). The extra effort of taking variogram parameter uncertainty into account could be most beneficial for ATPK or area-to-area kriging, because the data set of areal means, from which regression coefficients and variogram parameters at high-resolution support must be inferred, is often small. However, taking variogram parameter uncertainty into account can also be beneficial for point-to-point kriging (Le and Zidek 1992; Berger et al. 2001) and for sampling design (Marchant and Lark 2004, 2007).
It is not uncommon in ATPK studies to have only a small data set of areal means and no available relevant expert knowledge to inform model parameters (for example Brus et al. 2018). Therefore, this research aims to provide some insight into the applicability and behaviour of ATPK methods under these circumstances. To provide this insight, the following questions are answered: (i) What is the impact of different prior distributions—selected to represent a lack of prior knowledge about model parameters—on the quality of the ATPK predictions and prediction uncertainties? (ii) Can some aspects of uncertainty be disregarded, which might allow for computational benefits? (iii) How sensitive are results to misspecifications of the underlying statistical model?
In the following sections, the theoretical framework of model-based geostatistics for areal data, the Bayesian paradigm and their combination are briefly introduced. In the simulation part of this paper, REML is compared with more Bayesian approaches to perform and assess ATPK on a simulated spatial signal, with data sets as small as nine observations. The approaches are then tested on real-world remote sensing data that were aggregated for this purpose; this is referred to as the synthetic case study. Finally, as the motivating example, millet and sorghum yields, known only as areal means for each of the 45 provinces of Burkina Faso, are downscaled to a fine-scale grid of predicted yields.
2 Theory
2.1 Geostatistics Basics: Gaussian Random Field
2.2 Area-to-Point Kriging
For a mathematical elaboration of the above equations, refer to Wackernagel (2014). The term regression kriging (RK) is used later to refer to simple kriging of the trend residuals. This approach disregards any uncertainty in the estimated plug-in values of the trend parameters and therefore yields a different kriging variance, which, for the same covariance parameters, is never larger than the UK variance. In the following sections, the framework presented above is extended to a Bayesian one.
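Because the kriging equations of Sects. 2.1 and 2.2 are not reproduced in this version, the Python sketch below assumes the standard textbook forms (cf. Wackernagel 2014) to show why the RK variance never exceeds the UK variance: the RK variance is \(\sigma ^2 (c_{00} - \varvec{c}_0^\top \varvec{C}^{-1} \varvec{c}_0)\), and the UK variance adds \(\sigma ^2 \varvec{d}^\top (\varvec{X}^\top \varvec{C}^{-1} \varvec{X})^{-1} \varvec{d}\) with \(\varvec{d} = \varvec{x}_0 - \varvec{X}^\top \varvec{C}^{-1} \varvec{c}_0\). For ATPK, \(\varvec{C}\), \(\varvec{c}_0\) and \(\varvec{X}\) would be replaced by the average correlations \(\bar{\varvec{C}}\), \(\bar{\varvec{C}}^*\) (one column per prediction point) and the averaged design matrix \(\bar{\varvec{X}}\), with \(c_{00} = 1\) at point support. All names are hypothetical, and the pure-Python solver is only meant for small systems.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - dot(aug[r][r + 1:n], x[r + 1:n])) / aug[r][r]
    return x


def rk_and_uk_variance(C, c0, X, x0, sigma2, c00=1.0):
    """Return (RK variance, UK variance) for one prediction location.

    C: m x m correlation matrix of the data; c0: data-to-target correlations;
    X: m x k design matrix; x0: covariates at the target; sigma2: sill.
    """
    w = solve(C, c0)                                       # C^{-1} c0
    v_rk = sigma2 * (c00 - dot(c0, w))                     # simple kriging of residuals
    cols = [[row[j] for row in X] for j in range(len(x0))]
    cinv_cols = [solve(C, col) for col in cols]            # columns of C^{-1} X
    xtcx = [[dot(ci, cj) for cj in cinv_cols] for ci in cols]  # X^T C^{-1} X
    d = [x0[j] - dot(cols[j], w) for j in range(len(x0))]      # x0 - X^T C^{-1} c0
    v_uk = v_rk + sigma2 * dot(d, solve(xtcx, d))          # extra term: trend uncertainty
    return v_rk, v_uk


# Usage: three points on a line, exponential correlation exp(-d / phi), intercept-only trend.
coords, phi = [0.0, 1.0, 3.0], 2.0
C = [[math.exp(-abs(a - b) / phi) for b in coords] for a in coords]
target = 5.0
c0 = [math.exp(-abs(a - target) / phi) for a in coords]
X = [[1.0] for _ in coords]
v_rk, v_uk = rk_and_uk_variance(C, c0, X, [1.0], sigma2=5.0)
```

The additional UK term is a quadratic form with a positive definite matrix, so it can only add variance; it vanishes at a data location, where both variances are zero.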
2.3 Bayesian Statistics
In the Bayesian framework, the prior \(p({\varTheta })\) needs to be defined, stating the current state of knowledge (or, inversely formulated, the current state of ignorance) about the parameters. In many cases, a low-informative prior is desired; however, formulating a prior that 'lets the data speak for themselves' is not always straightforward (Seaman et al. 2012; Lindley 2004). Also, a 'conjugate' prior is often chosen: a prior from a family of distributions that, combined with the likelihood, yields a posterior in the same family, so that the posterior has a closed-form description (Albert 2009).
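As a generic textbook illustration of conjugacy (not the paper's own prior specification in Eq. (13)), consider independent Gaussian observations with known mean and an inverse-gamma prior on the variance \(\sigma ^2\): the posterior is again inverse-gamma, with the shape increased by half the number of observations and the scale increased by half the residual sum of squares. The Python sketch below assumes this simple setting; the function names are hypothetical.

```python
def inverse_gamma_update(shape, scale, data, mean):
    """Conjugate update of an inverse-gamma(shape, scale) prior for sigma^2,
    given i.i.d. Gaussian data with known mean. Returns posterior (shape, scale)."""
    resid_ss = sum((x - mean) ** 2 for x in data)
    return shape + len(data) / 2, scale + resid_ss / 2


def inverse_gamma_mean(shape, scale):
    """Mean of an inverse-gamma distribution (defined for shape > 1)."""
    return scale / (shape - 1)


# Prior IG(2, 1); three observations around a known mean of 2.
post_shape, post_scale = inverse_gamma_update(2.0, 1.0, [1.0, 2.0, 3.0], mean=2.0)
# post_shape = 3.5, post_scale = 2.0, posterior mean = 2.0 / 2.5 = 0.8
```

No numerical integration is needed: the posterior is fully described by two numbers, which is what makes conjugate choices attractive for analytical marginalisation.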
For various reasons, prior distributions that do not integrate to a finite value (i.e., that cannot be normalised to integrate to one) might be considered; in the Bayesian framework these are termed 'improper' priors. Improper priors can result in improper posterior densities, that is, posteriors that do not integrate to a finite value and are therefore not valid probability distributions.
In the remainder of this paper, a posterior probability distribution, whether proper or not, is denoted by \(f_p(\ldots |\ldots )\), the likelihood by \(f_l(\ldots |\ldots )\), and a prior distribution, again whether proper or not, by \(f_0(\ldots )\). When the function type is ambiguous, its interpretation depends on the context and its arguments.
3 Implementation
3.1 Partly Analytical Bayesian Area-to-Point Algorithm
3.1.1 Marginal Posterior Distance Parameter
Numerically, BAK creates a one-dimensional grid covering the parameter space of \(\phi \), calculates the marginal posterior for each \(\phi \), and normalises the marginal posterior to a distribution that integrates to one within the bounds of the \(\phi \) grid.
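The grid procedure can be sketched in Python as follows. The sketch assumes a user-supplied function returning the unnormalised log marginal posterior of \(\phi \) (in BAK this would follow from the marginal likelihood and \(f_0(\phi )\); here a toy Gaussian-shaped function stands in for it) and, as an assumption, normalises with the trapezoidal rule; all names are hypothetical. The grid point with the highest density approximates the posterior mode, which, with a uniform prior, is the MML plug-in value described in Sect. 3.2.2.

```python
import math


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over grid x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def grid_marginal_posterior(log_post, lower, upper, n):
    """Evaluate an unnormalised log posterior on a regular phi grid and
    normalise it to integrate to one within [lower, upper]."""
    step = (upper - lower) / (n - 1)
    grid = [lower + i * step for i in range(n)]
    logs = [log_post(phi) for phi in grid]
    top = max(logs)
    dens = [math.exp(v - top) for v in logs]   # subtract the maximum to avoid underflow
    total = trapezoid(dens, grid)
    return grid, [d / total for d in dens]


def toy_log_post(phi):
    """Hypothetical stand-in for the log marginal posterior of phi."""
    return -0.5 * ((phi - 60.0) / 15.0) ** 2


grid, dens = grid_marginal_posterior(toy_log_post, 10.0, 300.0, 291)
phi_mode = grid[dens.index(max(dens))]
```

Working on the log scale and subtracting the maximum before exponentiating keeps the computation stable even when the marginal likelihood values are extremely small.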
3.1.2 Marginal Posterior Sill
3.1.3 Marginal Posterior Regression Coefficients
Similarly to Sect. 3.1.2, BAK creates two-dimensional grids covering the parameter spaces of \(\phi \) and \(\beta _q\) (for all q) and applies the trapezoidal rule to calculate the integral over \(\phi \) in Eq. (19); finally, it normalises to get the marginal distributions for each individual \(\beta _q\).
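A sketch of this two-dimensional grid computation in Python, for a single coefficient \(\beta _q\). It assumes that, conditional on \(\phi \), the marginal posterior of \(\beta _q\) is a Student-t density whose location and scale depend on \(\phi \) (the usual result when \(\sigma ^2\) is integrated out analytically; the exact form of Eq. (19) is not reproduced here), and it takes the normalised posterior of \(\phi \) on its grid as given. The function `cond_params` and the other names are hypothetical.

```python
import math


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over grid x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def student_t_pdf(x, loc, scale, df):
    """Density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df))


def marginal_beta(beta_grid, phi_grid, phi_post, cond_params, df):
    """f(beta_q | z) = integral over phi of f(beta_q | z, phi) f(phi | z),
    computed with the trapezoidal rule on the phi grid, then renormalised
    over the beta grid."""
    dens = []
    for b in beta_grid:
        integrand = [student_t_pdf(b, *cond_params(phi), df) * w
                     for phi, w in zip(phi_grid, phi_post)]
        dens.append(trapezoid(integrand, phi_grid))
    total = trapezoid(dens, beta_grid)
    return [d / total for d in dens]


# Hypothetical example: flat posterior for phi on [10, 300]; the conditional
# scale of beta_q grows with phi while its location stays at zero.
phi_grid = [10.0 + i for i in range(291)]
phi_post = [1.0 / 290.0] * len(phi_grid)
beta_grid = [-1.0 + 0.01 * i for i in range(201)]
dens = marginal_beta(beta_grid, phi_grid, phi_post,
                     lambda phi: (0.0, 0.1 + phi / 1000.0), df=8)
```

The result is a scale mixture of t-densities, so it has heavier tails than any single conditional density, reflecting the added uncertainty in \(\phi \).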
3.1.4 Posterior Predictive Distribution
Geostatistical approaches from maximum likelihood to full Bayesian. Corresponding to their universal kriging counterparts, \({\hat{\varvec{z}}}^*_\mathrm{RK}\) and \({\hat{\varvec{v}}}^*_\mathrm{RK}\) denote the regression kriging prediction and the regression kriging variance, respectively
Approach | Basis for estimation (function to be maximised or integrated; posterior distribution) | Estimated plug-ins | Basis for prediction | Predictive distribution | Prediction | Prediction variance |
---|---|---|---|---|---|---|
ML—maximum likelihood | \(f_l := f_l(\varvec{z}| \varvec{\beta }, \sigma ^2, \phi ) \) | \(\hat{\varvec{\beta }}\), \(\hat{\sigma }^2\), \(\hat{\phi }\) | \( f(\varvec{z}^*| \varvec{z}, \hat{\varvec{\beta }}, \hat{\sigma }^2, \hat{\phi })\) | N | \({\hat{\varvec{z}}}^*_\mathrm{RK}\) | \({\hat{\varvec{v}}}^*_\mathrm{RK}\) |
REML—restricted maximum likelihood | \( \begin{aligned} \int f_l f_0( \varvec{\beta }) \mathop {}\!\mathrm {d}\varvec{\beta }\end{aligned} \) | \(\hat{\sigma }^2\), \(\hat{\phi }\) | \( \begin{aligned} \int f(\varvec{z}^*| \varvec{z}, \varvec{\beta }, \hat{\sigma }^2, \hat{\phi }) \\ \times f( \varvec{\beta }| \varvec{z}, \hat{\sigma }^2, \hat{\phi }) \mathop {}\!\mathrm {d}\varvec{\beta }\end{aligned} \) | N | \({\hat{\varvec{z}}}^*_\mathrm{UK}\) | \({\hat{\varvec{v}}}^*_\mathrm{UK}\) |
MML—maximum marginal likelihood | \( \begin{aligned} \int f_l f_0( \varvec{\beta }, \sigma ^2) \\ \times \mathop {}\!\mathrm {d}(\varvec{\beta }, \sigma ^2) \end{aligned} \) | \(\hat{\phi }\) | \( \begin{aligned}&\int f(\varvec{z}^*| \varvec{z}, \varvec{\beta }, \sigma ^2, \hat{\phi }) \\&\times f( \varvec{\beta }, \sigma ^2 | \varvec{z}, \hat{\phi }) \mathop {}\!\mathrm {d}( \varvec{\beta }, \sigma ^2 ) \end{aligned} \) | t | \({\hat{\varvec{z}}}^*_\mathrm{UK}\) | \( {\hat{\varvec{v}}}^*_\mathrm{UK} \times \frac{m - k}{m - k - 2} \) |
Full Bayesian | \(\begin{aligned} \int f_l f_0( \varvec{\beta }, \sigma ^2, \phi ) \\ \times \mathop {}\!\mathrm {d}(\varvec{\beta }, \sigma ^2, \phi ) \end{aligned} \) | – | \(\begin{aligned}&\int f(\varvec{z}^*| \varvec{z}, \varvec{\beta }, \sigma ^2, \phi ) \\&\times f_p( \varvec{\beta }, \sigma ^2, \phi | \varvec{z} ) \mathop {}\!\mathrm {d}( \varvec{\beta }, \sigma ^2, \phi ) \end{aligned} \) | ^{a} | ^{b} | ^{c} |
Note that Eq. (21) is an inflated universal kriging variance [compare Eq. (8)], because the uncertainty in \(\sigma ^2\) is also taken into account; this inflation is expressed by the first fraction. The second fraction equals the REML estimate of \(\sigma ^2\) given \(\phi \).
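The MML prediction variance in the table of approaches is the UK variance multiplied by \((m-k)/(m-k-2)\), which is the ratio of the variance of a Student-t distribution with \(m-k\) degrees of freedom to its squared scale. A small Python sketch (function name hypothetical) shows how strongly this factor depends on the number of areal observations m for k trend coefficients:

```python
def mml_variance(v_uk, m, k):
    """Inflate a UK variance for uncertainty in sigma^2 (t-distribution with m - k df)."""
    df = m - k
    if df <= 2:
        raise ValueError("the t variance is undefined (infinite) for m - k <= 2")
    return v_uk * df / (df - 2)


for m in (10, 20, 50):
    print(m, round(mml_variance(1.0, m, k=2), 3))
# 10 1.333
# 20 1.125
# 50 1.043
```

With m = 10 areas and k = 2 coefficients, as in the single simulation of Sect. 4.1, the variance is inflated by a third, whereas with 50 areas the inflation is only about 4%.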
3.2 Methodological Details
In this work, a number of increasingly Bayesian approaches to ATPK are applied and compared (Table 1). The first three rows of the table represent plug-in approaches for some of the parameters (i.e., the stated parameters are first estimated, by maximising a likelihood or marginal likelihood function, before being plugged into the relevant predictive distribution for prediction), while the final row represents the fully Bayesian approach. In maximum likelihood estimation (ML; not implemented in this work), all parameters (in the geostatistical context: regression coefficients and spatial covariance parameters) are estimated by analytically or numerically maximising the likelihood. This general approach was consolidated by Fisher almost a century ago (Stigler 2007) and has been applied in geostatistics, for example by Kitanidis (1983) and Lark (2000). REML, which has been advocated in geostatistics for several decades, is based on a likelihood function for a set of projected data rather than the original data, and gives conditionally unbiased estimates of the spatial parameters (Webster and Oliver 2007; Lark and Cullis 2004); see also Sect. A2 in Online Resource A. REML is a form of marginal likelihood (a likelihood function in which some parameters have been marginalised) and has been presented as such in a Bayesian framework, namely as the integral of the likelihood function with respect to the trend parameters under a flat improper prior for these parameters (Harville 1974). Note that omitting \(f_0(\varvec{\beta })\) from this integral amounts to using a flat prior, which can be considered uninformative; this is often valid for location parameters but not for other parameters. Following the same reasoning, UK takes the uncertainty in the trend coefficients into account, which makes it a logical combination with REML. Within this research, the combined application of REML and UK is referred to as the 'REML approach'.
The next gradation towards the fully Bayesian approach is maximum likelihood with both trend and variance integrated out, in the context of this paper indicated by the generic term ‘maximum marginal likelihood’ (MML). Finally, the full Bayesian approach (also referred to as ‘Bayesian approach’) provides a posterior distribution of all parameters, while in the prediction all parameters are integrated out and the uncertainty of all parameters is taken into account.
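For the full Bayesian approach, the predictive distribution at a location is a mixture over \(\phi \) of conditional (MML-type) predictive distributions. Assuming that the conditional predictive mean and variance are available for each grid value of \(\phi \), the overall prediction and prediction variance follow from the laws of total expectation and total variance. This Python sketch (hypothetical names; quadrature weights taken as given) illustrates that combination; it is not a reproduction of the paper's exact expressions.

```python
def mixture_moments(weights, means, variances):
    """Mean and variance of a mixture of predictive distributions.

    weights: unnormalised quadrature weights, e.g. posterior density of phi
    times grid spacing; means/variances: conditional predictive moments."""
    total = sum(weights)
    w = [x / total for x in weights]
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Law of total variance: E[Var | phi] + Var[E | phi]
    second_moment = sum(wi * (vi + mi * mi) for wi, mi, vi in zip(w, means, variances))
    return mean, second_moment - mean * mean


# Two equally weighted phi values whose predictions disagree:
mean, var = mixture_moments([1.0, 1.0], [0.0, 2.0], [1.0, 1.0])
# mean = 1.0, var = 2.0 (1.0 average conditional variance + 1.0 spread of means)
```

The second term is what a plug-in approach such as MML omits: when the predictions differ between plausible values of \(\phi \), the full Bayesian prediction variance grows accordingly.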
In the following sections, REML, MML and the Bayesian approach are compared, and for the Bayesian approach different priors for \(\phi \) (as defined in the following section) are applied. All algorithms (including the central BAK algorithm as presented in Fig. 1) are written in the statistical programming language R, and are available at Steinbuch et al. (2019).
3.2.1 Prior Distributions for \(\phi \)
For \(f_0(\phi )\), three potential forms of prior distribution are compared, intended to represent limited prior knowledge. These are (1) a uniform prior with limited bounds; (2) the reference prior as suggested by Berger et al. (2001) for analysis of point-support data, applied in the context of areal-support data, and explained in Online Resource B; and—in the simulation ensemble—(3) an inverse-gamma distribution. The bounded uniform and the inverse-gamma distributions are proper; the assumed propriety of the reference prior will be discussed later.
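The two proper priors can be written down directly. The following Python sketch assumes the bounds of the standard settings (10 and 300 au, Sect. 4.1.1) and the inverse-gamma parameters used in the simulation ensemble (shape 11, rate 600, Sect. 4.2), in the parameterisation where \(1/\phi \) is gamma distributed with that shape and rate; its mean, 600/(11 − 1) = 60 au, matches the value stated in Sect. 4.2. The reference prior (Online Resource B) is not sketched, and the function names are hypothetical.

```python
import math


def log_uniform_prior(phi, lower=10.0, upper=300.0):
    """Log density of the bounded uniform prior for phi."""
    if lower <= phi <= upper:
        return -math.log(upper - lower)
    return -math.inf


def log_inverse_gamma_prior(phi, shape=11.0, rate=600.0):
    """Log density of an inverse-gamma prior (1/phi ~ Gamma(shape, rate))."""
    if phi <= 0:
        return -math.inf
    return (shape * math.log(rate) - math.lgamma(shape)
            - (shape + 1) * math.log(phi) - rate / phi)


def inverse_gamma_mean(shape, rate):
    return rate / (shape - 1)


def inverse_gamma_mode(shape, rate):
    return rate / (shape + 1)


print(inverse_gamma_mean(11.0, 600.0), inverse_gamma_mode(11.0, 600.0))  # 60.0 50.0
```

Either function can serve as the \(\log f_0(\phi )\) term added to the log marginal likelihood on the \(\phi \) grid; the uniform prior contributes only a constant inside its bounds, so it acts purely through the bounds themselves.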
3.2.2 Estimation and Prediction with REML and MML
For REML, the approach described by Brus et al. (2018) was applied. For MML, the posterior mode of \(\phi \) was calculated using the Bayesian approach with a uniform prior for \(\phi \); because this prior is flat within its bounds, the mode coincides with the maximum of the marginal likelihood within those bounds. The predictive distribution was then defined conditionally on this value of \(\phi \). Mathematically, this equals integrating out \(\varvec{\beta }\) and \(\sigma ^2\) to arrive at an estimate \(\hat{\phi }\), which is subsequently used as a single plug-in value for MML prediction; the mean and variance of the predictive distribution (representing the prediction and prediction variance) are shown in Table 1 and Eqs. (7) and (21).
3.2.3 Estimating Average Covariances
The average correlation matrices, \(\bar{\varvec{C}} \) and \(\bar{\varvec{C}} ^*\), can be approximated in different ways. In this research, many discretisation points within each area are defined and the relevant Euclidean distances between those points are calculated, followed by construction of the corresponding correlation matrix, based on the correlation function—such as given in Eq. (2)—and distance parameter \(\phi \). Then, all correlations per area-area combination are averaged to arrive at \(\bar{\varvec{C}} \), and per area-prediction point combination to arrive at \(\bar{\varvec{C}} ^*\). The discretisation points were on a regular grid in the simulation study, and selected by simple random sampling in the two case studies.
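A one-dimensional sketch of this discretisation in Python, as it might be applied to the line sections of the simulation study. It assumes the exponential correlation \(\rho (d) = \exp (-d/\phi )\) (the exact parameterisation of Eq. (2) is an assumption here) and regularly spaced discretisation points at cell centres; the function names and the number of points per area are hypothetical.

```python
import math


def exp_corr(d, phi):
    """Exponential correlation function (assumed parameterisation)."""
    return math.exp(-d / phi)


def discretise(lo, hi, n):
    """n regularly spaced points at the cell centres of the interval [lo, hi]."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)]


def average_correlations(bounds, pred_pts, phi, n_disc=30):
    """Return (Cbar, Cbar_star): area-area and area-point average correlations."""
    disc = [discretise(lo, hi, n_disc) for lo, hi in bounds]
    cbar = [[sum(exp_corr(abs(a - b), phi) for a in pa for b in pb) / (len(pa) * len(pb))
             for pb in disc] for pa in disc]
    cbar_star = [[sum(exp_corr(abs(a - x0), phi) for a in pa) / len(pa)
                  for x0 in pred_pts] for pa in disc]
    return cbar, cbar_star


# Three adjacent line sections of 30 au each; prediction points at the cell
# centres of the second section; phi = 60 au.
bounds = [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0)]
cbar, cbar_star = average_correlations(bounds, discretise(30.0, 60.0, 30), phi=60.0)
```

A useful consistency check follows from the construction: averaging a row of \(\bar{\varvec{C}}^*\) over the discretisation points of another area reproduces the corresponding entry of \(\bar{\varvec{C}}\).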
3.2.4 Validation
4 Simulation Study
The following section presents, for illustration, a single one-dimensional simulation in which REML and the full Bayesian approach (defined with \(f_0(\phi ) \sim \hbox {uniform}\)) are compared. Following the illustration, an ensemble of many simulations is applied to assess several settings. Online Resource D contains similar results for two-dimensional simulations.
4.1 Single Simulation
4.1.1 Simulated Data Set
A line of length 300 abstract units (au) was created and filled with \(n=600\) equally spaced nodes. A spatially correlated signal was generated from a zero-nugget exponential covariance model (\(\sigma ^2_\mathrm{sim} = 5\), \(\phi _\mathrm{sim} = 60\)) and added to a trend that is a linear function of the coordinate (\(\beta _\mathrm{1 sim} = 0\) for the intercept, \(\beta _\mathrm{2 sim} = 0.02\) for the slope on the coordinate). For the Bayesian approach, \(f_0(\phi ) \sim \hbox {uniform}\) was used, bounded by \(\varvec{\phi }_{l,u} = \{10, 300\}\); these bounds also defined the bounds for the REML parameter search. The priors for \(\varvec{\beta }\) and \(\sigma ^2\) are provided in Eq. (13). The above settings are referred to as the standard settings. The line was split into \(m=10\) equal one-dimensional 'areas' or line sections. Finally, both \(\varvec{z}\) and the covariate were averaged over the areas to arrive at the observed means \(\bar{\varvec{z}}\) and the averaged design matrix \(\bar{\varvec{X}}\).
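The simulated data set can be sketched in Python as below. Instead of a general Gaussian random field simulator, the sketch uses the fact that, on a regular one-dimensional grid, a zero-nugget exponential covariance \(\sigma ^2 \exp (-d/\phi )\) is reproduced exactly by a first-order autoregressive recursion with lag-one correlation \(\exp (-\Delta /\phi )\), where \(\Delta \) is the node spacing. This is a swapped-in technique, not necessarily how the original R code works; node positions at cell centres, the seed and all names are assumptions.

```python
import math
import random


def simulate_line(n=600, length=300.0, sigma2=5.0, phi=60.0,
                  beta=(0.0, 0.02), m=10, seed=1):
    """Simulate point data along a line and aggregate them into m equal sections.

    Returns node coordinates, point values, areal means zbar and the
    averaged design matrix Xbar (intercept, coordinate)."""
    rng = random.Random(seed)
    step = length / n
    x = [(i + 0.5) * step for i in range(n)]      # assumed: nodes at cell centres
    rho = math.exp(-step / phi)                   # correlation of neighbouring nodes
    innov_sd = math.sqrt(sigma2 * (1.0 - rho * rho))
    eps = [rng.gauss(0.0, math.sqrt(sigma2))]     # stationary start
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + rng.gauss(0.0, innov_sd))
    z = [beta[0] + beta[1] * xi + e for xi, e in zip(x, eps)]

    per_area = n // m                             # assumes n is divisible by m
    zbar, xbar = [], []
    for a in range(m):
        idx = range(a * per_area, (a + 1) * per_area)
        zbar.append(sum(z[i] for i in idx) / per_area)
        xbar.append([1.0, sum(x[i] for i in idx) / per_area])
    return x, z, zbar, xbar


x, z, zbar, xbar = simulate_line()
```

The recursion costs O(n) instead of the O(n³) of a Cholesky factorisation, but it only applies to the exponential model on a regular 1D grid; other covariance functions (such as the Matérn cases of sessions 6 and 7) would need a general simulator.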
4.1.2 Results
Results of a single simulation run with validation on the original data points
Approach | mean(StSE) | RMSE | ME | max(MPP) |
---|---|---|---|---|
REML | 1.169 | 0.585 | 0.000 | 0.009 |
Bayesian-uniform | 0.827 | 0.588 | 0.000 | 0.009 |
Baseline | – | 0.666 | 0.000 | 0.000 |
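The definitions of Sect. 3.2.4 are not included in this text, so the following Python sketch uses assumed definitions consistent with how the statistics are interpreted in the results: ME and RMSE of the point prediction errors, StSE as the squared error divided by the prediction variance (so a mean above one signals optimistic, and below one conservative, uncertainty), and max(MPP) as the largest absolute difference between the mean prediction within an area and the observed areal mean. The names and the sign convention of the error are hypothetical.

```python
import math


def validation_stats(z_true, z_pred, v_pred, areas, zbar):
    """Validation statistics at point support.

    areas: list of index lists, one per area; zbar: observed areal means."""
    n = len(z_true)
    err = [p - t for p, t in zip(z_pred, z_true)]          # prediction minus truth
    me = sum(err) / n
    rmse = math.sqrt(sum(e * e for e in err) / n)
    mean_stse = sum(e * e / v for e, v in zip(err, v_pred)) / n
    max_mpp = max(abs(sum(z_pred[i] for i in idx) / len(idx) - zb)
                  for idx, zb in zip(areas, zbar))
    return {"mean_stse": mean_stse, "rmse": rmse, "me": me, "max_mpp": max_mpp}


stats = validation_stats(z_true=[1.0, 2.0, 3.0, 4.0],
                         z_pred=[1.5, 1.5, 3.5, 3.5],
                         v_pred=[0.25] * 4,
                         areas=[[0, 1], [2, 3]],
                         zbar=[1.5, 3.5])
# {'mean_stse': 1.0, 'rmse': 0.5, 'me': 0.0, 'max_mpp': 0.0}
```

In this toy case the predictions preserve the areal means exactly and the stated variances match the squared errors on average, so mean(StSE) is exactly one.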
4.2 Simulation Ensemble
In this section, the results are generalised by generating many simulations that differ only in the random number seed, and by comparing validation statistics for the outcomes of several approaches: REML, MML and full Bayesian with the three different priors for \(\phi \) indicated earlier. The applied inverse-gamma prior for \(\phi \) is set to be somewhat informative, with shape \(= 11\) and rate \(= 600\). This results in a mean of 60 au, emulating a situation where decent prior knowledge about the range is available. Also, the number of observations m is varied by dividing the line into m sections of equal length; together, these simulations form one 'session'. Furthermore, to investigate how the approaches behave for differently simulated data sets and different inference settings, both are varied across an 'ensemble' of many sessions.
The settings used in standard session 1 are given in Sect. 4.1.1. Sessions 2 and 3 vary the upper bound for the uniform prior and for the REML and MML searches for \(\phi \), compared with standard session 1 (where the upper bound is 300 au). In session 4, the trend is removed (so that the inferential model has to infer the mean only); in session 5, the trend is based on a separate Gaussian random field (GRF) rather than on the coordinates. Sessions 6, 7 and 8 introduce a misfit between the simulation model and the inferential model, in which the correlation function in the simulation model is changed or a nugget component is added, while the inferential model stays unchanged. Finally, sessions 9 and 10 show the effects of a misfit between the actual signal and the support of the available data (i.e., a short or very long distance parameter used in the simulation compared with the area sizes and total extent), which might make it difficult to identify parameters.
Mean standardised squared error (mean(StSE)) for the one-dimensional simulation ensemble, comparing restricted maximum likelihood (REML), maximum marginal likelihood (MML) and three full Bayesian approaches with different priors for \(\phi \)
4.2.1 General Results
Referring to Online Resources C and E, the maximal deviation from the mass-preserving property (max(MPP)) ranges between 0.09 and 0.28 in the two-dimensional simulations. In the one-dimensional case, max(MPP) is much smaller. With all approaches in all simulations, the ME was small. For a given simulation, the RMSE was almost equal for all approaches, whereas the RMSE of the baseline approach was larger on average. The main difference between the approaches was in the prediction uncertainty (assessed by StSE).
The standard session 1 in Table 3 shows that, with \(m=10\), REML was optimistic (a mean(StSE) above one, i.e., prediction variances that are too small relative to the actual squared errors), while the Bayesian-uniform approach was less optimistic, Bayesian-inverse-gamma was closest to one (perhaps due to the knowledge captured in its prior distribution for \(\phi \)), MML was slightly conservative and Bayesian-reference very conservative. With increasing m, all mean(StSE) values approached one, while the corresponding sd(StSE) decreased. Already with \(m = 20\), the differences between approaches and the deviations from one were small, except for the Bayesian-reference approach. The results for the two-dimensional simulations were similar, although the differences between approaches were somewhat larger for \(m = 9\) and the deviations from one were often still substantial for \(m=25\).
4.2.2 Changing Uniform Prior for \(\phi \) (Sessions 2 and 3)
Sessions 2 and 3 differ from session 1 only in the upper bound of the uniform prior for \(\phi \) (100 au and 2000 au, respectively) used as the basis for inference, rather than in the simulating model; for comparison, the same bounds were applied in the REML and MML parameter searches. Note that the Bayesian-inverse-gamma and Bayesian-reference results (see Sect. 3.2.1), which have their own bounds, are not repeated here. Also recall that the extent of the simulated data set was 300 au. The seemingly arbitrary choice of the upper bound of the uniform prior for \(\phi \) influenced the results, especially with few data (small m) and in the two-dimensional simulations (see Online Resource D).
Although MML and the Bayesian-uniform approach use the same range of possible \(\phi \) values, MML was far less influenced by the upper bound of its parameter search. The proportions of \(\hat{\phi }\) values (estimated by REML and MML, respectively) that were very close to the upper and lower bounds are also given (Online Resources C and E). Interestingly, with a larger upper bound (session 3), the fraction of REML-estimated \(\phi \) values close to the unchanged lower bound was larger than in session 1, while the fraction of MML-estimated \(\phi \) values close to the lower bound stayed the same.
4.2.3 Varying Simulation Trend (Sessions 4 and 5)
The trend on the spatial coordinate (the standard) was also compared with a trend that was a simulated GRF itself (\(\beta _1 = 0\), \(\beta _2 = 2\), \(\tau ^2 = 0\), \(\sigma ^2 = 0.5\), \(\phi = 30\)), session 4, and with a constant mean, session 5. In both cases, the form of the trend (i.e., the design matrix) was assumed to be known for inference and prediction. The only difference between the sessions was the means: the simulated error signals for sessions 1–5 were identical. Compared with a trend on the coordinate, both the GRF trend and a constant mean gave only minor differences in mean(StSE); this also held for the two-dimensional simulations.
4.2.4 Misspecified Model (Sessions 6, 7 and 8)
In sessions 6 and 7, the error signal was simulated using a Matérn covariance function with large and small values, respectively, of the smoothness parameter \(\nu _\mathrm{sim}\) (not to be confused with the degrees of freedom \(\nu \) of the t-distribution used earlier). The inference in these sessions was still based on the exponential covariance model, which is the Matérn model with smoothness parameter 0.5. These sessions were designed to test how the methods deal with a misspecified inferential model. The large \(\nu _\mathrm{sim}\) in session 6 caused all mean(StSE) values to be far too conservative, with the Bayesian approaches slightly more conservative, and with average mean(StSE) values becoming smaller with increasing m. In the two-dimensional simulations, the values stayed considerably closer to one. A small \(\nu _\mathrm{sim}\), as in session 7, caused almost all results to be optimistic. With increasing m, mean(StSE) did not converge towards one, but rather seemed to stabilise at an optimistic value. With a nugget component added to the simulated data (session 8; nugget-to-sill ratio 1/6), all approaches were optimistic (except Bayesian-reference with \(m = 10\), and its two-dimensional counterpart with \(m = 9\)), and the average mean(StSE) increased with m in the one-dimensional simulations. In the two-dimensional simulations, the relation between m and mean(StSE) was ambiguous.
4.2.5 Simulation with Extreme Distance Parameter (Sessions 9 and 10)
If the distance parameter used for the simulations was very small relative to the areas under consideration, as in session 9, all approaches seemed to be quite optimistic, but this effect strongly decreased with increasing m. The worst performer was the Bayesian-inverse-gamma approach, for which the information encapsulated in \(f_0(\phi )\) was now mismatched with the simulation model, although Bayesian-uniform also performed badly. The Bayesian-reference approach performed best. In the two-dimensional simulations, values were more extreme, especially for \(m=9\). When, as in session 10, the distance parameter was large compared with the total extent under consideration, REML performed almost perfectly, while the other approaches tended to be slightly or fairly conservative but improved with increasing m.
5 Case Studies
5.1 Synthetic Case Study: Vegetation Index Data, with Validation on Point Support
To briefly investigate how REML, MML and full Bayesian would perform on a real-world data set, a remote-sensing vegetation index, CFAPAR-27, was used as the variable of interest. These data are used as a covariate in the real case study (spatial prediction of crop yield in Burkina Faso, Sect. 5.2) and are therefore briefly described in Online Resource F. This variable is available on pixel support. The CFAPAR-27 data were masked using the crop yield mask (see also Sect. 5.2) and subsequently aggregated over the 45 provinces of Burkina Faso. As covariates for inference, two climate variables broadly representing rainfall and temperature (CRAIN-EC-27 and TMIN-EC-21) and one variable representing soil pH (PHAQ) were used. Gaussianity was assumed for all real-world variables of interest.
ATPK was applied using four approaches: (1) REML, (2) MML, (3) the full Bayesian approach using the uniform prior for \(\phi \), and (4) the full Bayesian approach using the reference prior for \(\phi \). For REML and MML, the parameter search for \(\phi \) was bounded between 37 and 300 km, being roughly the smallest distance between the centres of any two areas, and one third of the largest extent of the region of interest, respectively. The same bounds defined the uniform prior for the full Bayesian approach.
The resulting mean(StSE) was 2.87, 2.73, 2.92 and 2.59 for the REML, MML, Bayesian-uniform and Bayesian-reference approaches, respectively, showing that prediction uncertainty was seriously underestimated by all approaches. The mean(StSE) of the Bayesian-uniform approach could be changed by several tenths by adjusting the bounds of the uniform prior. All RMSE values for the four approaches were around 6.19 (compared with the baseline approach RMSE of 16.54), indicating that they offered the same prediction quality and probably quite similar predictions.
5.2 Real Case Study: Crop Yield Data
As a real-world case study, this paper predicts yields of sorghum and millet, both staple cereal crops, in Burkina Faso, West Africa. The observation areas are the 45 provinces, for which only the average yields are known (averaged over the years 2000–2013 and provided by AGRHYMET), as shown in Fig. 6 for millet. Following Brus et al. (2018), no covariates are used for the millet trend, whereas four covariates are used for the sorghum trend; these are shown and briefly explained in Online Resource F.
6 Discussion
6.1 Setting Uniform Prior
Both in the simulations (Table 3 and Online Resource C) and in the case studies, the choice of the upper and lower bounds of a uniform prior for \(\phi \) can influence the prediction uncertainty, especially (but not exclusively) with smaller data sets and if the posterior mode of \(\phi \) coincides with one of the bounds of the prior. This effect can also occur with REML and MML approaches, where the search for the optimum value of \(\phi \) is bounded by the same limits. It should be stressed that, in this context, this ‘flat’ uniform prior cannot be considered uninformative. The fact that the posterior modes of \(\phi \) (resulting from Bayesian approaches), or \(\hat{\phi }\) (from the REML approach), often coincided with one of the bounds (for example see the ‘\(\hat{\phi }\), \(\hbox {mode}(\phi ) \approx ~ \min , ~ \max \)’ columns in Online Resources C and E, but also sorghum in the case study) highlights the importance of carefully considering such prior or parameter search settings in geostatistical practice.
6.2 Reference Prior
According to the simulations, the reference prior did not perform well: in many cases it was too conservative about prediction uncertainty, and it pushed the posterior distributions of the distance parameter too strongly towards zero (see also Online Resources C and D). In the case of small \(\nu _\mathrm{sim}\) or \(\tau _\mathrm{sim}^2 > 0\), this conservatism compensated to some extent for model misspecification. Berger et al. (2001) derived the form of the reference prior for analysis of spatial point-support data, and the same logic was applied, with area-to-area average correlations replacing the point-to-point correlations of Berger et al. (2001), to justify a similar prior for analysis of areal-support data. However, although the logic to derive the form of the prior follows analogously, the authors are unsure whether an analogous argument ensures propriety of the resulting prior and posterior distributions. Therefore, even if the simulations had demonstrated a strong advantage, further work would be required to derive the necessary proofs of propriety. As the simulation results do not demonstrate strong advantages, other priors for \(\phi \) are currently recommended.
6.3 Misspecified Model
Validation statistics for prediction uncertainty are very sensitive to misspecification of the variogram parameters that determine the smoothness of the spatial signal. Examples are the nugget parameter and the smoothness parameter of the Matérn covariance function, as demonstrated in simulation sessions 6, 7 and 8 and, in the authors' opinion, in the vegetation index synthetic case study. Short-range spatial relationships are, however, difficult if not impossible to assess if only areal means are available. Combining areal data with some high-density point data could improve the results, see for example Moraga et al. (2017); another approach would be to use prior information, such as expert opinion, for the nugget (Truong et al. 2014). For cases in which more summary statistics per area than only the mean are available, Orton et al. (2012) proposed a method for incorporating this information. In this study, the exponential covariance model without a nugget was applied for convenience, as is often the case in comparable research; this is, however, a rather arbitrary choice, and given the results, careful consideration of all model parameters that determine the smoothness of realisations is suggested for future research.
6.4 Number of Observations
In simulation sessions 6, 7 and 8 (very smooth or very rough simulated signals), mean(StSE) did not converge to one as the number of observations m increased, contrary to what might be expected, and in most cases actually diverged from one. More data therefore did not compensate for a poor choice of model. Furthermore, in the simulation set-up, increasing m had the side effect of decreasing the size of the individual areas, which may also have contributed to this behaviour, because short-range variation becomes more observable in smaller areas.
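The sketch below assumes that StSE denotes the standardised squared prediction error \((z - \hat{z})^2/\hat{\sigma}^2\), so that mean(StSE) is close to one when prediction variances are well calibrated, above one when they are too optimistic and below one when they are too conservative. The simulated errors are hypothetical and only illustrate this interpretation.

```python
import numpy as np


def mean_stse(z_true, z_pred, pred_var):
    """Mean standardised squared error; about 1 for well-calibrated prediction variances."""
    z_true, z_pred, pred_var = map(np.asarray, (z_true, z_pred, pred_var))
    return float(np.mean((z_true - z_pred) ** 2 / pred_var))


rng = np.random.default_rng(1)
sd = 2.0
errors = rng.normal(0.0, sd, 100_000)  # hypothetical prediction errors, z_pred = 0

calibrated = mean_stse(errors, 0.0, sd**2)              # close to 1
overconfident = mean_stse(errors, 0.0, (sd / 2) ** 2)   # sd underestimated by half: close to 4
conservative = mean_stse(errors, 0.0, (2 * sd) ** 2)    # sd overestimated twofold: close to 0.25
```

A value such as 49.2 thus corresponds to prediction standard deviations roughly seven times too small on average.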
Very small data sets (nine or ten observations) were analysed in the simulations. The main point of interest was to see how the different approaches behave in such extreme situations, assessed by taking averages over many simulations. The authors stress that, even when using a Bayesian approach, any geostatistical conclusion based on nine or ten observations should be interpreted with caution, except perhaps if strong and honest prior information is available and can be incorporated.
6.5 One-Dimensional Versus Two-Dimensional Simulations
The effects of simulation choices and statistical inference approaches were quite similar in the one-dimensional and two-dimensional simulations. Remaining differences might be due to differences in the spatial relationships between units. For example, in the two-dimensional simulations with nine observations, the centroids of the closest pairs of units are 100 au apart, whereas in the one-dimensional simulations with ten observations, the centroids of neighbouring units are 30 au apart. Therefore, although the amount of data is almost equal, the one-dimensional set-up contains much more short-range information. This might explain the extremely large mean(StSE) values in session 9 of the two-dimensional study (up to 49.2) compared with the less extreme values in the one-dimensional study (up to 4.9).
The approximation of the average covariance matrices might have been less successful for the two-dimensional simulations. This would explain the relatively high max(MPP) and the unexpected irregular spatial pattern of the prediction error sd (see Online Resource D, Fig. D1 d).
6.6 The Algorithm
Although the approach used here was not compared with more conventional MCMC methods, it proved an effective and efficient means of performing Bayesian (and also MML) spatial data analysis in the area-to-point context presented. As indicated in Sect. 3.2.3, several methods can be used to approximate the average covariance matrices. For example, Legendre–Gauss quadrature, as described by Orton et al. (2017), is much cheaper in computation time and memory, but perhaps less accurate than the discretisation points method applied here. Both methods, including some variants, are included in the code, as is area-to-area kriging. Future extensions might include directionality and point-to-point kriging.
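The two ways of approximating average covariances can be compared in a simple one-dimensional case where the exact answer is known. The sketch assumes an exponential covariance with unit sill, areas represented as line segments, regularly spaced midpoints for the discretisation points method, and tensor-product Gauss–Legendre nodes from `numpy.polynomial.legendre.leggauss`. It is a hypothetical illustration, not the code referred to above.

```python
import numpy as np


def exp_cov(x, y, phi):
    """Point-to-point exponential covariance matrix (unit sill) between locations x and y."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / phi)


def avg_cov_discretised(a, b, phi, n=100):
    """Average covariance between segments a and b using n regularly spaced midpoints each."""
    xa = a[0] + (np.arange(n) + 0.5) * (a[1] - a[0]) / n
    xb = b[0] + (np.arange(n) + 0.5) * (b[1] - b[0]) / n
    return float(exp_cov(xa, xb, phi).mean())


def avg_cov_gauss_legendre(a, b, phi, n=10):
    """Average covariance between segments a and b using n Gauss-Legendre nodes each."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    # Map nodes from [-1, 1] to each segment; weights sum to 2, so halve them to average
    xa = 0.5 * (a[1] - a[0]) * nodes + 0.5 * (a[0] + a[1])
    xb = 0.5 * (b[1] - b[0]) * nodes + 0.5 * (b[0] + b[1])
    w = weights / 2.0
    return float(w @ exp_cov(xa, xb, phi) @ w)


def avg_cov_exact_disjoint(L, s, phi):
    """Exact average covariance between [0, L] and [s, s + L] for s >= L."""
    return phi**2 * np.exp(-s / phi) * (np.exp(L / phi) - 1) * (1 - np.exp(-L / phi)) / L**2


def avg_cov_exact_self(L, phi):
    """Exact average covariance of [0, L] with itself."""
    return 2 * phi * (L - phi * (1 - np.exp(-L / phi))) / L**2


L, s, phi = 30.0, 100.0, 50.0
a, b = (0.0, L), (s, s + L)
disjoint = (avg_cov_discretised(a, b, phi), avg_cov_gauss_legendre(a, b, phi),
            avg_cov_exact_disjoint(L, s, phi))
self_avg = (avg_cov_discretised(a, a, phi), avg_cov_gauss_legendre(a, a, phi),
            avg_cov_exact_self(L, phi))
```

With 10 nodes per segment, Gauss–Legendre needs 100 covariance evaluations per pair of areas instead of 10 000, and is very accurate for separate areas where the integrand is smooth. For an area's covariance with itself, the kink of the exponential model at zero lag makes it less accurate, which is consistent with the trade-off described above.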
The Integrated Nested Laplace Approximations (INLA) alternative to MCMC (Blangiardo et al. 2013) has some similarities to our partly analytical Bayesian approach, such as a gridded search in parameter space and numerical integration. However, it uses Laplace approximations for some of the integrals and is applicable to a much wider range of models, including hierarchical ones. Furthermore, its spatial implementation assumes the Markov property for the spatial Gaussian random field (meaning that, conditional on its immediate neighbours, any point or area in the region is independent of all others), which leads to sparse precision (inverse covariance) matrices and thus reduced computational cost. In our opinion, our approach offers specific and transparent insight into the Bayesian approach to model-based geostatistics. For future research, however, it would be interesting to repeat the calculations with INLA, or to integrate some of the sophisticated, cost-reducing features of INLA into the code.
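The Markov property can be illustrated with a special case: on a regular one-dimensional grid with spacing \(\Delta \), the exponential covariance \(\sigma^2 \exp(-|i-j|\Delta/\phi)\) is that of a stationary AR(1) process with \(\rho = \exp(-\Delta/\phi)\), whose precision matrix is tridiagonal even though the covariance matrix is dense. The grid below (ten points 30 au apart) is hypothetical, and INLA's own spatial models are constructed differently; the sketch only shows why Markovian fields give sparse precision matrices.

```python
import numpy as np


def exponential_covariance_1d(n, spacing, phi, sigma2=1.0):
    """Dense exponential covariance matrix for n points on a regular 1D grid."""
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :]) * spacing
    return sigma2 * np.exp(-lag / phi)


def ar1_precision(n, rho, sigma2=1.0):
    """Tridiagonal precision matrix of a stationary AR(1) process with marginal variance sigma2."""
    Q = np.zeros((n, n))
    i = np.arange(n)
    Q[i, i] = 1 + rho**2
    Q[0, 0] = Q[-1, -1] = 1.0
    Q[i[:-1], i[1:]] = -rho
    Q[i[1:], i[:-1]] = -rho
    return Q / (sigma2 * (1 - rho**2))


n, spacing, phi = 10, 30.0, 50.0
Sigma = exponential_covariance_1d(n, spacing, phi)
Q = ar1_precision(n, np.exp(-spacing / phi))

nonzero_cov = np.count_nonzero(Sigma)                # all n * n entries
nonzero_prec = np.count_nonzero(np.abs(Q) > 1e-12)   # only 3n - 2 entries
```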
6.7 General Remarks
The general impression obtained from the ensemble of simulations is that REML tended to underestimate prediction uncertainty the most, followed by Bayesian-uniform. The Bayesian-reference approach tended to be more conservative, while MML was slightly conservative but seemed relatively stable. The differences between the approaches decreased with increasing m.
Given a covariance model that is more or less accurate in terms of short-range behaviour, the conclusion is that, for data sets of sufficient size, or if a slight underestimation of prediction uncertainty is acceptable, the REML approach as demonstrated by Brus et al. (2018) should be sufficient. For smaller data sets with no prior information available, the most robust and in many cases best approach, albeit somewhat conservative, appeared to be MML. An additional advantage of MML is its relative insensitivity to arbitrary choices such as bounds on the correlation distance parameter. In several sessions, MML even outperformed Bayesian-inverse-gamma when the supplied prior information about \(\phi \) was correct. Finally, MML has computational benefits for prediction over the fully Bayesian approach.
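The difference between MML and a grid-based Bayesian treatment of \(\phi \) can be sketched as follows. Assumptions not taken from this paper: point support, exponential correlation, a flat prior on the trend coefficients and a \(1/\sigma^2\) prior on the variance, under which the likelihood of \(\phi \) with trend and variance integrated out is proportional to \(|R|^{-1/2} |X^\top R^{-1} X|^{-1/2} (S^2)^{-(n-p)/2}\), where \(S^2\) is the generalised residual sum of squares. MML then takes the maximiser on the grid, whereas the Bayesian approach multiplies by a prior for \(\phi \) and keeps the whole posterior. The data and the inverse-gamma hyperparameters are hypothetical.

```python
import numpy as np


def log_integrated_lik(phi, d, X, y):
    """Log likelihood of phi with trend and variance integrated out (up to a constant)."""
    n, p = X.shape
    R = np.exp(-d / phi)
    chol = np.linalg.cholesky(R)
    Ri_X = np.linalg.solve(R, X)
    Ri_y = np.linalg.solve(R, y)
    XtRiX = X.T @ Ri_X
    beta_hat = np.linalg.solve(XtRiX, X.T @ Ri_y)  # generalised least squares trend
    resid = y - X @ beta_hat
    S2 = resid @ np.linalg.solve(R, resid)
    logdet_R = 2.0 * np.sum(np.log(np.diag(chol)))
    _, logdet_XtRiX = np.linalg.slogdet(XtRiX)
    return -0.5 * logdet_R - 0.5 * logdet_XtRiX - 0.5 * (n - p) * np.log(S2)


# Hypothetical data: ten points 30 au apart, constant mean 5, true phi = 60
rng = np.random.default_rng(0)
x = np.arange(10) * 30.0
d = np.abs(x[:, None] - x[None, :])
X = np.ones((10, 1))
y = 5.0 + np.linalg.cholesky(np.exp(-d / 60.0)) @ rng.standard_normal(10)

phi_grid = np.linspace(5.0, 500.0, 200)
loglik = np.array([log_integrated_lik(phi, d, X, y) for phi in phi_grid])
phi_mml = phi_grid[np.argmax(loglik)]  # MML: maximiser on the grid

# Bayesian alternative: hypothetical inverse-gamma(a, b) prior on phi, posterior masses on the grid
a, b = 2.0, 100.0
log_post = loglik - (a + 1) * np.log(phi_grid) - b / phi_grid
post = np.exp(log_post - log_post.max())
post /= post.sum()
post_mean = float(np.sum(post * phi_grid))
```

Inspecting `loglik` over `phi_grid` shows how flat the likelihood of \(\phi \) can be with ten observations, which is why bounds and priors on \(\phi \) can matter.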
The authors suggest focusing future research on modelling short-range variation and including a smoothness parameter in the inferential models. Using honest and informative priors—depending on the research question at hand—might also yield interesting results. The Matérn smoothness parameter \(\nu _\mathrm{M}\) could be made an integral part of the Bayesian model, or alternatively incorporated as an extra model parameter to be optimised in an MML approach, which could then be used as a plug-in value for prediction.
7 Conclusions
All tested geostatistical approaches for ATPK (REML, MML, and Bayesian with different priors for the distance parameter) provided very similar predictions but differed in their prediction uncertainties, with REML slightly underestimating the uncertainty when very few data were available. Prediction uncertainties are quite sensitive to the parameters determining the smoothness of the spatial signal (i.e., the nugget and the smoothness parameter of the Matérn covariance function). Given correctly modelled short-range effects, for data sets of sufficient size, or if an underestimation of prediction uncertainty is acceptable, the REML approach as demonstrated by Brus et al. (2018) is satisfactory. The MML approach (maximum likelihood with trend and variance integrated out) provided acceptable results while being relatively robust to arbitrary settings for the parameter search. Moreover, this approach does not require a choice of prior for the distance parameter. A useful and robust fully Bayesian approach could not be accomplished, perhaps due to the lack of a good uninformative prior for the distance parameter of the covariance function; the reference prior as proposed by Berger et al. (2001) overestimated the prediction uncertainty in most cases. For real-world case studies, the demonstrated algorithms (https://doi.org/10.4121/uuid:1fe0c01e-7f67-435b-a240-800579adc6e6) can be used.
Footnotes
1. Universal kriging in this work is defined as geostatistical prediction with the trend uncertainty included and where this trend is based on one or more covariates, which may or may not include the spatial coordinates.
Notes
Acknowledgements
This study was supported by the SIGMA European Collaborative Project (FP7-ENV-2013 SIGMA-Stimulating Innovation for Global Monitoring of Agriculture and its Impact on the Environment in support of GEOGLAM-project), and by the Grains Research and Development Corporation (GRDC) of Australia under Project PROC-9175385. The helpful comments and suggestions from the anonymous reviewers and from the editor are acknowledged with gratitude.
Supplementary material
References
- Albert J (2009) Bayesian computation with R. Springer, New York. https://doi.org/10.1007/978-0-387-92298-0
- Berger JO, de Oliveira V, Sansó B (2001) Objective Bayesian analysis of spatially correlated data. J Am Stat Assoc 96(456):1361–1374
- Blangiardo M, Cameletti M, Baio G, Rue H (2013) Spatial and spatio-temporal models with R-INLA. Spat Spatio-temporal Epidemiol 4:33–49. https://doi.org/10.1016/j.sste.2012.12.001
- Brus DJ, Orton TG, Walvoort DJJ, Reijneveld JA, Oenema O (2014) Disaggregation of soil testing data on organic matter by the summary statistics approach to area-to-point kriging. Geoderma 226–227:151–159. https://doi.org/10.1016/j.geoderma.2014.02.011
- Brus DJ, Boogaard H, Ceccarelli T, Orton TG, Traore S, Zhang M (2018) Geostatistical disaggregation of polygon maps of average crop yields by area-to-point kriging. Eur J Agron 97(July):48–59. https://doi.org/10.1016/j.eja.2018.05.003
- Christensen OF, Roberts GO, Sköld M (2006) Robust Markov chain Monte Carlo methods for spatial generalized linear mixed models. J Comput Graph Stat 15(1):1–17. https://doi.org/10.1198/106186006x100470
- Diggle P, Ribeiro PJ (2007) Model-based geostatistics. Springer, Berlin. https://doi.org/10.1007/978-0-387-48536-2
- Goovaerts P (2006) Geostatistical analysis of disease data: accounting for spatial support and population density in the isopleth mapping of cancer mortality risk using area-to-point Poisson kriging. Int J Health Geogr 5(1):52. https://doi.org/10.1186/1476-072x-5-52
- Goovaerts P (2008) Kriging and semivariogram deconvolution in the presence of irregular geographical units. Math Geosci 40(1):101–128. https://doi.org/10.1007/s11004-007-9129-1
- Harville DA (1974) Bayesian inference for variance components using only error contrasts. Biometrika 61(2):383–385. https://doi.org/10.2307/2334370
- Horta A, Pereira MJ, Gonçalves M, Ramos T, Soares A (2014) Spatial modelling of soil hydraulic properties integrating different supports. J Hydrol 511(Supplement C):1–9. https://doi.org/10.1016/j.jhydrol.2014.01.027
- Jansen MJ (1998) Prediction error through modelling concepts and uncertainty from basic data. Nutr Cycl Agroecosyst 50(1):247–253. https://doi.org/10.1023/a:1009748529970
- Kerry R, Goovaerts P, Rawlins BG, Marchant BP (2012) Disaggregation of legacy soil data using area to point kriging for mapping soil organic carbon at the regional scale. Geoderma 170(Supplement C):347–358. https://doi.org/10.1016/j.geoderma.2011.10.007
- Kitanidis PK (1983) Statistical estimation of polynomial generalized covariance functions and hydrologic applications. Water Resour Res 19(4):909–921. https://doi.org/10.1029/WR019i004p00909
- Kitanidis PK (1986) Parameter uncertainty in estimation of spatial functions: Bayesian analysis. Water Resour Res 22(4):499–507. https://doi.org/10.1029/WR022i004p00499
- Kyriakidis PC (2004) A geostatistical framework for area-to-point spatial interpolation. Geogr Anal 36(3):259–289. https://doi.org/10.1111/j.1538-4632.2004.tb01135.x
- Lark RM (2000) Estimating variograms of soil properties by the method-of-moments and maximum likelihood. Eur J Soil Sci 51(4):717–728. https://doi.org/10.1046/j.1365-2389.2000.00345.x
- Lark RM, Cullis BR (2004) Model-based analysis using REML for inference from systematically sampled data on soil. Eur J Soil Sci 55(4):799–813. https://doi.org/10.1111/j.1365-2389.2004.00637.x
- Le ND, Zidek JV (1992) Interpolation with uncertain spatial covariances: a Bayesian alternative to kriging. J Multivar Anal 43(2):351–374. https://doi.org/10.1016/0047-259X(92)90040-M
- Lindley D (2004) That wretched prior. Significance 1(2):85–87. https://doi.org/10.1111/j.1740-9713.2004.026.x
- Malone BP, McBratney AB, Minasny B, Laslett GM (2009) Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154(1–2):138–152. https://doi.org/10.1016/j.geoderma.2009.10.007
- Marchant BP, Lark RM (2004) Estimating variogram uncertainty. Math Geol 36(8):867–898. https://doi.org/10.1023/B:MATG.0000048797.08986.a7
- Marchant BP, Lark RM (2007) Optimized sample schemes for geostatistical surveys. Math Geol 39(1):113–134. https://doi.org/10.1007/s11004-006-9069-1
- McElreath R (2016) Statistical rethinking: a Bayesian course with examples in R and Stan. Chapman & Hall/CRC texts in statistical science series. CRC Press, Boca Raton
- Minasny B, Vrugt JA, McBratney AB (2011) Confronting uncertainty in model-based geostatistics using Markov Chain Monte Carlo simulation. Geoderma 163(3–4):150–162. https://doi.org/10.1016/j.geoderma.2011.03.011
- Moraga P, Cramb SM, Mengersen KL, Pagano M (2017) A geostatistical model for combined analysis of point-level and area-level data using INLA and SPDE. Spat Stat 21:27–41. https://doi.org/10.1016/j.spasta.2017.04.006
- Oliver M, Webster R (2014) A tutorial guide to geostatistics: computing and modelling variograms and kriging. Catena 113:56–69. https://doi.org/10.1016/j.catena.2013.09.006
- Orton TG, Saby NPA, Arrouays D, Walter C, Lemercier B, Schvartz C, Lark RM (2012) Spatial prediction of soil organic carbon from data on large and variable spatial supports. I. Inventory and mapping. Environmetrics 23(2):129–147. https://doi.org/10.1002/env.2136
- Orton TG, Román Dobarco M, Saby NPA (2017) Kriging based on areal summary statistics data: effects of within-unit variability on predictions and uncertainties. Spat Stat 19:42–67. https://doi.org/10.1016/j.spasta.2016.11.003
- Orton TG, Mallawaarachchi T, Pringle MJ, Menzies NW, Dalal RC, Kopittke PM, Searle R, Hochman Z, Dang YP (2018) Quantifying the economic impact of soil constraints on Australian agriculture: a case-study of wheat. Land Degrad Dev 29(11):3866–3875. https://doi.org/10.1002/ldr.3130
- Pardo-Igúzquiza E, Dowd P (2001) Variance–covariance matrix of the experimental variogram: assessing variogram uncertainty. Math Geol 33(4):397–419. https://doi.org/10.1023/a:1011097228254
- Park NW (2013) Spatial downscaling of TRMM precipitation using geostatistics and fine scale environmental variables. Adv Meteorol 2013:9. https://doi.org/10.1155/2013/237126
- Roth M (2013) On the multivariate t distribution. Report, Department of Electrical Engineering, Linköpings universitet. http://users.isy.liu.se/en/rt/roth/student.pdf. Accessed 30 Nov 2018
- Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis. Chapman and Hall, London
- Seaman JW, Seaman JW, Stamey JD (2012) Hidden dangers of specifying noninformative priors. Am Stat 66(2):77–84. https://doi.org/10.1080/00031305.2012.695938
- Steinbuch L, Orton TG, Brus DJ (2019) Source code in the R programming language, belonging with: model based geostatistics from a Bayesian perspective: investigating area to point kriging with small datasets [Dataset]. 4TU. Centre for Research Data. https://doi.org/10.4121/uuid:1fe0c01e-7f67-435b-a240-800579adc6e6
- Stigler SM (2007) The epic story of maximum likelihood. Stat Sci 22(4):598–620. https://doi.org/10.1214/07-STS249
- Traore SB, Ali A, Tinni SH, Samake M, Garba I, Maigari I, Alhassane A, Samba A, Diao MB, Atta S, Dieye PO, Nacro HB, Bouafou KGM (2014) Agrhymet: a drought monitoring and capacity building center in the West Africa region. Weather Clim Extremes 3:22–30. https://doi.org/10.1016/j.wace.2014.03.008
- Truong PN, Heuvelink GBM, Pebesma E (2014) Bayesian area-to-point kriging using expert knowledge as informative priors. Int J Appl Earth Obs Geoinf 30:128–138. https://doi.org/10.1016/j.jag.2014.01.019
- Wackernagel H (2014) Geostatistics. Wiley StatsRef: Statistics Reference Online. https://doi.org/10.1007/978-3-662-05294-5
- Webster R, Oliver MA (2007) Geostatistics for environmental scientists. Wiley, Hoboken
- White P, Gelfand A, Utlaut T (2017) Prediction and model comparison for areal unit data. Spat Stat 22:89–106. https://doi.org/10.1016/j.spasta.2017.09.002
- You L, Wood S, Wood-Sichra U (2009) Generating plausible crop distribution maps for sub-Saharan Africa using a spatially disaggregated data fusion and optimization approach. Agric Syst 99(2–3):126–140. https://doi.org/10.1016/j.agsy.2008.11.003
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.