Abstract
Area-to-point kriging (ATPK) is a geostatistical method for creating high-resolution raster maps using data of the variable of interest with a much lower resolution. The data set of areal means is often considerably smaller (\(<\,50 \) observations) than data sets conventionally dealt with in geostatistical analyses. In contemporary ATPK methods, uncertainty in the variogram parameters is not accounted for in the prediction; this issue can be overcome by applying ATPK in a Bayesian framework. Commonly in Bayesian statistics, posterior distributions of model parameters and posterior predictive distributions are approximated by Markov chain Monte Carlo sampling from the posterior, which can be computationally expensive. Therefore, a partly analytical solution is implemented in this paper, in order to (i) explore the impact of the prior distribution on predictions and prediction variances, (ii) investigate whether certain aspects of uncertainty can be disregarded, simplifying the necessary computations, and (iii) test the impact of various model misspecifications. Several approaches are compared using simulated data, aggregated real-world point data, and a case study on aggregated crop yields in Burkina Faso. The prior distribution is found to have minimal impact on the disaggregated predictions. In most cases with known short-range behaviour, an approach that disregards uncertainty in the variogram distance parameter gives a reasonable assessment of prediction uncertainty. However, some severe effects of model misspecification in terms of overly conservative or optimistic prediction uncertainties are found, highlighting the importance of model choice or integration into ATPK.
Introduction
An important challenge often encountered in scientific research is spatial prediction using areal-support data, that is, data about the variable of interest that is available as areal means only. Data may be aggregated for privacy protection, administrative, technical or other reasons. Examples include data on the failures in semiconductor chip production (White et al. 2017), cancer mortality (Goovaerts 2006), precipitation (Park 2013), soil properties (Kerry et al. 2012; Orton et al. 2012; Malone et al. 2009; Brus et al. 2014), soil hydraulic properties (Horta et al. 2014) and, the motivating example in this research, crop yields. You et al. (2009) state that, because of limited resources, crop yield information in many sub-Saharan countries is available only sparsely and in aggregated format; however, such information is needed at a finer spatial resolution to support efforts to increase crop productivity [see for example Orton et al. (2018)] and thereby improve both human welfare and ecological sustainability.
Model-based spatial prediction is usually done using algorithms based on point support (Diggle and Ribeiro 2007). In the case of areal-supported input, area-to-point kriging (ATPK) is a popular approach for creating fine-scale raster maps of the variable of interest and the corresponding prediction uncertainty (called spatial disaggregation or downscaling). In ATPK, regression coefficients (in the presence of covariates) and variogram parameters (describing spatial relationships) have to be estimated, for example by a least squares estimator for the regression coefficients combined with an iterative variogram fitting deconvolution algorithm (‘method of moments’) on the regression residuals (Goovaerts 2008). More recent methods such as restricted maximum likelihood (REML) in combination with universal kriging (UK) (both, from here on, referring to their application in the ATPK setting) consider the uncertainty in the regression coefficients (Webster and Oliver 2007). However, uncertainty in the variogram model parameters might also be a relevant source of uncertainty (Jansen 1998; Kitanidis 1986; Minasny et al. 2011). Truong et al. (2014) showed that variogram uncertainty can have a substantial impact on ATPK variances. Brus et al. (2018) summarised earlier work by Pardo-Igúzquiza and Dowd (2001) showing that uncertainty in the variogram parameters can be quantified by the inverse Fisher matrix of the variogram parameters, but did not integrate this uncertainty in the kriging prediction uncertainty itself. In the Bayesian statistics paradigm, parameters can be considered stochastic rather than fixed but unknown (Gotway 2005). From a Bayesian perspective, REML in combination with UK considers the regression coefficients as stochastic and subsequently integrates them out, both from the likelihood function for estimation of the variogram parameters and from the prediction.
In this paper, this Bayesian direction is continued by successively integrating out the spatial variance parameter (analytically) and the spatial correlation distance parameter (numerically). By applying analytical solutions whenever possible, Markov chain Monte Carlo (MCMC) sampling from the Bayesian posterior distribution as proposed for example by Minasny et al. (2011) is avoided. MCMC can be computationally expensive and may, when used in a spatial context, be difficult to converge to a posterior distribution due to correlated parameters (Christensen et al. 2006). The extra effort of taking variogram parameter uncertainty into account could be most beneficial in the case of ATPK or area-to-area kriging, because the provided data set of areal means, from which regression coefficients and variogram parameters at high-resolution support must be inferred, can often be limited in size. However, taking variogram parameter uncertainty into account can also be beneficial in the case of point-to-point kriging (Le and Zidek 1992; Berger et al. 2001) and for sampling design (Marchant and Lark 2004, 2007).
It is not uncommon in ATPK studies to have only a small data set of areal means and no available relevant expert knowledge to inform model parameters (for example Brus et al. 2018). Therefore, this research aims to provide some insight into the applicability and behaviour of ATPK methods under these circumstances. To provide this insight, the following questions are answered: (i) What is the impact of different prior distributions—selected to represent a lack of prior knowledge about model parameters—on the quality of the ATPK predictions and prediction uncertainties? (ii) Can some aspects of uncertainty be disregarded, which might allow for computational benefits? (iii) How sensitive are results to misspecifications of the underlying statistical model?
In the following sections, the theoretical framework of model-based geostatistics for areal data, the Bayesian paradigm and the combination of these are briefly introduced. In the simulation part of this paper, REML is compared with more Bayesian approaches to perform and assess ATPK using a simulated spatial signal, including data sets as small as nine observations. Using real-world data, different approaches on self-aggregated remote sensing data are tested, referred to as the synthetic case study. Finally, as the motivating example, millet and sorghum yields, known as areal means only, are downscaled for each of the 45 provinces of Burkina Faso to a fine-scale grid of predicted yields.
Theory
Geostatistics Basics: Gaussian Random Field
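According to the general theoretical framework of model-based geostatistics (Diggle and Ribeiro 2007), the spatial variable of interest is represented by a Gaussian random field
\[\varvec{Z} \sim \hbox {MVN}\left( \varvec{X} \varvec{\beta },\, \sigma ^2 \varvec{C}(\phi ) \right) \qquad (1)\]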
with \(\hbox {MVN}\) indicating a multivariate normal distribution; \(\varvec{X}\) the design matrix containing location-specific covariate values, including a column with ones to represent the regression intercept; \(\varvec{\beta }\) the vector of k regression coefficients (also called trend parameters); \(\sigma ^2\) in the terminology of geostatistics the partial sill variance; and \(\varvec{C}(\phi )\) the spatial correlation matrix as a parametric function of distance parameter \(\phi \).
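Among several other possibilities, the exponential covariance function
\[\left( \sigma ^2 \varvec{C}(\phi ) \right) _{i1,i2} = \sigma ^2 \exp \left( -\frac{h(s_{i1}, s_{i2})}{\phi } \right) \qquad (2)\]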
is assumed, where i1 and i2 index two discrete point locations \(s_{i1}\) and \(s_{i2}\) within the Gaussian random field, and \(h(s_{i1}, s_{i2})\) represents the Euclidean distance between those locations, while \((\ldots )_{i1,i2}\) means the element in row i1, column i2 of the indicated covariance matrix. Spatial directional dependence is not considered. For notational convenience, \(\varvec{C}(\phi )\) is indicated by \(\varvec{C}\) from now on.
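The exponential model above can be illustrated with a minimal NumPy sketch. It builds the correlation matrix with entries \(\exp (-h/\phi )\) and draws one realisation of the Gaussian random field. The function names, the Cholesky-based sampler and the small diagonal jitter are illustrative assumptions, not part of the authors' R implementation.

```python
import numpy as np

def exponential_correlation(coords, phi):
    """Correlation matrix C(phi) with entries exp(-h / phi), h = Euclidean distance."""
    coords = np.asarray(coords, dtype=float).reshape(len(coords), -1)
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-h / phi)

def simulate_field(coords, X, beta, sigma2, phi, rng):
    """One draw from MVN(X beta, sigma2 * C(phi)) via a Cholesky factor."""
    C = exponential_correlation(coords, phi)
    # Tiny jitter (an assumption of this sketch) guards against round-off.
    L = np.linalg.cholesky(sigma2 * C + 1e-10 * np.eye(len(C)))
    return X @ beta + L @ rng.standard_normal(len(C))
```

Calling `simulate_field` with a design matrix that contains a column of ones and one coordinate column gives a signal with a linear trend plus spatially correlated noise.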
Area-to-Point Kriging
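In the case of ATPK, the observations are m areal means
\[\bar{z}(A_j) = \frac{1}{|A_j|} \int _{A_j} \varvec{z}(s)\, \mathrm {d}s, \quad j = 1, \ldots , m, \qquad (3)\]
together \(\bar{\varvec{z}}\), with \({\varvec{z}}(s)\) an unobserved realisation of an infinite number of point values \(\varvec{Z}\) in area \(A_j\) and \(|A_j|\) the surface area of area \(A_j\), turning Eq. (1) into
\[\bar{\varvec{Z}} \sim \hbox {MVN}\left( \bar{\varvec{X}} \varvec{\beta },\, \sigma ^2 \bar{\varvec{C}} \right) \qquad (4)\]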
with \(\bar{ \varvec{Z}}\) the stochastic representation of observations \(\bar{ \varvec{z}}\), \(\bar{\varvec{X}}\) containing covariate values averaged over the corresponding areas, and \(\sigma ^2 \bar{\varvec{C}}\) the matrix with average covariances between and within the areas. Note that \(\varvec{\beta }\), \(\sigma ^2\) and \(\phi \) are equivalent in Eqs. (4) and (1): the parameters on point support are estimated from the areal data. Note the absence of a nugget effect (often indicated by \(\tau ^2\)), which might represent measurement errors and microscale variation, in the covariance model. Such a nugget effect cannot be identified based purely on areal-support data. Although Truong et al. (2014) demonstrated the potential of expert prior knowledge to help define a nugget parameter, in the situation investigated here, no such knowledge is available. This issue is revisited in the discussion.
To be able to predict values \({\varvec{z}^*} \) at \(n^*\) point locations \(\varvec{s}^*\), together with the corresponding prediction variances \({\varvec{v}^*}\), it is necessary to define (1) matrix \(\bar{\varvec{C}} ^*\), the mean correlation between points in the observation areas and the prediction points and also implicitly a function of \(\phi \); (2) matrix \({\varvec{C}^{**}}\), the correlation matrix between the prediction points, again a function of \(\phi \); (3) the design matrix for the prediction locations \(\varvec{X}^*\); and finally (4) the generalised least squares (GLS) estimator for \(\varvec{\beta }\) as given by
\[\hat{\varvec{\beta }} = \left( \bar{\varvec{X}}^\mathrm {T} \bar{\varvec{C}}^{-1} \bar{\varvec{X}} \right) ^{-1} \bar{\varvec{X}}^\mathrm {T} \bar{\varvec{C}}^{-1} \bar{\varvec{z}} \qquad (5)\]
The \({(n^*+ m)}\)-dimensional distribution of the predictions and observations together can be written as
According to UK theory, the resulting prediction is given by
an implicit function of \(\phi \), with prediction variance–covariance matrix
an implicit function of \(\phi \) and \(\sigma ^2\). The diagonal of \(\hbox {var}[\hat{\varvec{z}^*_\mathrm{UK}}]\) is the vector of prediction error variances, better known as the universal kriging variance \(\hat{\varvec{v}}^*_\mathrm{UK}\), for every prediction point.
For a mathematical elaboration of the above equations, refer to Wackernagel (2014). The term regression kriging (RK) is used later to refer to simple kriging of the trend residuals, an approach which disregards any uncertainty in the estimated plug-in values of the trend parameters, which results in a different kriging variance. In the following sections, the framework presented above will be extended to a Bayesian one.
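As a rough illustration of the universal kriging step in the area-to-point setting, the sketch below uses the textbook forms of the GLS estimator, the UK predictor and the UK variance-covariance matrix. The orientation of \(\bar{\varvec{C}} ^*\) (one row per area, one column per prediction point) and the function name `atpk_predict` are assumptions made here, so the expressions may differ in layout from Eqs. (5), (7) and (8).

```python
import numpy as np

def atpk_predict(z_bar, X_bar, X_star, C_bar, C_bar_star, C_star_star, sigma2):
    """Universal kriging from areal means to points for fixed phi and sigma2.

    Assumed shapes: z_bar (m,), X_bar (m, k), X_star (n*, k),
    C_bar (m, m), C_bar_star (m, n*), C_star_star (n*, n*).
    """
    Ci_X = np.linalg.solve(C_bar, X_bar)            # C_bar^{-1} X_bar
    A = X_bar.T @ Ci_X                              # X' C^{-1} X
    beta_hat = np.linalg.solve(A, Ci_X.T @ z_bar)   # GLS estimate of beta
    resid = z_bar - X_bar @ beta_hat
    Ci_Cs = np.linalg.solve(C_bar, C_bar_star)      # C_bar^{-1} C_bar_star
    z_hat = X_star @ beta_hat + Ci_Cs.T @ resid     # trend + kriged residual
    D = X_star - Ci_Cs.T @ X_bar                    # trend-uncertainty term
    cov = sigma2 * (C_star_star - C_bar_star.T @ Ci_Cs + D @ np.linalg.solve(A, D.T))
    return beta_hat, z_hat, cov
```

The diagonal of `cov` is the vector of kriging variances. Dropping the term that involves `D` gives the simple-kriging variance of the residuals, which corresponds to the regression kriging variant mentioned above.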
Bayesian Statistics
In the Bayesian framework, parameters are considered random variables (McElreath 2016). Formulated in general probability notation where \({\varTheta }\) stands for ‘parameters of interest’ and E for evidence (observations), Bayes’ rule states that
\[p({\varTheta } \mid E) = \frac{p(E \mid {\varTheta })\, p({\varTheta })}{p(E)} \qquad (9)\]
where the probability distribution \(p({\varTheta } \mid E)\) indicates the posterior belief in the parameters given the evidence, \(p({\varTheta })\) indicates the prior belief in the parameters, independent of the evidence, \(p(E \mid {\varTheta })\) is the probability of the evidence as a function of the parameters (called the likelihood) and p(E) is the probability of the evidence. Note that to assess the likelihood, a correctly defined probability distribution of the modelled process is assumed. Note also that p(E) is often left out when a proportional value for \(p({\varTheta } \mid E)\) is sufficient. Mathematically, p(E) equals \(p(E,{\varTheta })\) integrated over its parameters
\[p(E) = \int p(E, {\varTheta })\, \mathrm {d}{\varTheta } = \int p(E \mid {\varTheta })\, p({\varTheta })\, \mathrm {d}{\varTheta } \qquad (10)\]
In a Bayesian context, prediction entails formulating a distribution conditional on the evidence E. Inserting the parameters in the equation shows the derivation of the posterior predictive integral
\[p(P \mid E) = \int p(P, {\varTheta } \mid E)\, \mathrm {d}{\varTheta } = \int p(P \mid {\varTheta }, E)\, p({\varTheta } \mid E)\, \mathrm {d}{\varTheta } \qquad (11)\]
where P stands for the predicted values.
In the Bayesian framework, the prior \(p({\varTheta })\) needs to be defined, stating the current state of knowledge (or, inversely formulated, the current state of ignorance) about the parameters. In many cases, a low-informative prior is desired; however, formulating a prior that ‘lets the data speak for itself’ is not always straightforward (Seaman et al. 2012; Lindley 2004). Also, a ‘conjugate’ prior is often chosen so that its distribution function matches the likelihood, resulting in a closed-form description of the posterior (Albert 2009).
For various reasons, prior distributions that do not integrate to a finite value (i.e., cannot be normalised to integrate to one) might be considered, termed in the Bayesian framework as ‘improper’ priors. Improper priors can result in improper (poorly defined) posterior probability densities.
In the remainder of this paper, a posterior probability distribution (proper or not) is indicated by \(f_p(\ldots \mid \ldots )\), the likelihood by \(f_l(\ldots \mid \ldots )\), and a prior distribution (again proper or not) by \(f_0(\ldots )\). When the function type is ambiguous, its interpretation depends on the context and its arguments.
Implementation
Partly Analytical Bayesian Area-to-Point Algorithm
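In this section, a partly analytical and partly numerical algorithm to execute Bayesian ATPK is described, based on integrating out the trend and variance parameters and systematically exploring gridded values in the correlation distance parameter space. This algorithm is developed as an alternative to methods based on sampling from the posterior distribution. Starting from Eq. (4), the likelihood of data generated by the geostatistical model is based on the multivariate normal distribution
\[f_l(\bar{\varvec{z}} \mid \varvec{\beta }, \sigma ^2, \phi ) = (2 \pi \sigma ^2)^{-m/2} \left| \bar{\varvec{C}} \right| ^{-1/2} \exp \left\{ -\frac{1}{2 \sigma ^2} (\bar{\varvec{z}} - \bar{\varvec{X}} \varvec{\beta })^\mathrm {T} \bar{\varvec{C}}^{-1} (\bar{\varvec{z}} - \bar{\varvec{X}} \varvec{\beta }) \right\} \qquad (12)\]
where \(\left| \ldots \right| \) indicates the determinant.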
Throughout this work, the priors for the trend, variance and correlation distance parameters are given by
\[f_0(\varvec{\beta }, \sigma ^2, \phi ) \propto \frac{1}{\sigma ^2}\, f_0(\phi ) \qquad (13)\]
This prior represents a priori independence between the parameters with an unlimited uniform (and thus improper) prior for the regression coefficient vector; a prior for the variance that is equivalent to an unlimited uniform prior for \(\ln (\sigma ^2)\), again an improper prior; and \(f_0 (\phi )\), for which different options are considered. This prior falls under the more general formulation of Berger et al. (2001), who considered appropriate objective (uninformative) priors for the analysis of spatial point-support data.
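Given the above prior and likelihood function, the joint posterior distribution for the parameters is (up to a constant of proportionality)
\[f_p(\varvec{\beta }, \sigma ^2, \phi \mid \bar{\varvec{z}}) \propto f_0(\phi )\, (\sigma ^2)^{-(m/2 + 1)} \left| \bar{\varvec{C}} \right| ^{-1/2} \exp \left\{ -\frac{1}{2 \sigma ^2} (\bar{\varvec{z}} - \bar{\varvec{X}} \varvec{\beta })^\mathrm {T} \bar{\varvec{C}}^{-1} (\bar{\varvec{z}} - \bar{\varvec{X}} \varvec{\beta }) \right\} \qquad (14)\]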
Based on the above assumptions, Fig. 1 illustrates our partly analytical Bayesian algorithm, Bayesian areal kriging (BAK), to infer the marginal posterior distributions of all parameters and to calculate and summarise predictive distributions. The relevant equations and their derivation are given in Online Resource A; the summary stating the main equations follows in the coming sections.
Marginal Posterior Distance Parameter
Given the joint posterior (Eq. 14), \(\varvec{\beta }\) and \(\sigma ^2\) are analytically integrated out to arrive at the analytical solution for the marginal posterior for \(\phi \) given by
where \(\hat{\varvec{\beta }}\) is defined according to Eq. (5).
Numerically, BAK creates a one-dimensional grid covering the parameter space of \(\phi \), calculates the marginal posterior for each \(\phi \), and normalises the marginal posterior to a distribution that integrates to one within the bounds of the \(\phi \) grid.
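The grid evaluation described above can be sketched as follows. The sketch assumes the standard closed form obtained when a flat prior on \(\varvec{\beta }\) and a \(1/\sigma ^2\) prior are integrated out of a Gaussian likelihood, namely \(f_p(\phi \mid \bar{\varvec{z}}) \propto f_0(\phi ) |\bar{\varvec{C}}|^{-1/2} |\bar{\varvec{X}}^\mathrm {T} \bar{\varvec{C}}^{-1} \bar{\varvec{X}}|^{-1/2} (S^2)^{-(m-k)/2}\), with \(S^2\) the GLS residual sum of squares. This form is a reconstruction and should be checked against Eq. (15) and Online Resource A. The callable `c_bar_of_phi` and both function names are hypothetical.

```python
import numpy as np

def log_marginal_phi(phi, z_bar, X_bar, c_bar_of_phi, log_prior=lambda phi: 0.0):
    """Unnormalised log marginal posterior of phi (beta and sigma2 integrated out)."""
    m, k = X_bar.shape
    C = c_bar_of_phi(phi)                            # areal correlation matrix for this phi
    Ci_X = np.linalg.solve(C, X_bar)
    A = X_bar.T @ Ci_X
    beta_hat = np.linalg.solve(A, Ci_X.T @ z_bar)    # GLS estimate given phi
    r = z_bar - X_bar @ beta_hat
    S2 = r @ np.linalg.solve(C, r)                   # GLS residual sum of squares
    logdet_C = np.linalg.slogdet(C)[1]
    logdet_A = np.linalg.slogdet(A)[1]
    return log_prior(phi) - 0.5 * logdet_C - 0.5 * logdet_A - 0.5 * (m - k) * np.log(S2)

def posterior_phi_on_grid(phi_grid, z_bar, X_bar, c_bar_of_phi, log_prior=lambda phi: 0.0):
    """Evaluate on a grid and normalise with the trapezoidal rule."""
    logp = np.array([log_marginal_phi(p, z_bar, X_bar, c_bar_of_phi, log_prior) for p in phi_grid])
    dens = np.exp(logp - logp.max())                 # subtract the maximum for stability
    area = np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(phi_grid))
    return dens / area
```

The default `log_prior` returns zero, which corresponds to a uniform prior within the bounds of the grid. Another prior can be supplied as a function returning its log density.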
Marginal Posterior Sill
For the marginal posterior of \(\sigma ^2\), \(\varvec{\beta }\) is analytically integrated out from the joint posterior (Eq. 14) to arrive at
As, to the authors’ knowledge, there is no analytical way of integrating out \(\phi \), BAK creates a two-dimensional grid over the parameter space of \(\phi \) and \(\sigma ^2\) and calculates the joint posterior for \(\sigma ^2\) and \(\phi \) (i.e., the integrand) for every grid point; then it performs a trapezoidal integration over \(\phi \) and normalises to arrive at the marginal distribution for \(\sigma ^2\).
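The two-dimensional grid step can be illustrated as below. The sketch assumes that, for fixed \(\phi \), the posterior of \(\sigma ^2\) is inverse-gamma with shape \((m-k)/2\) and scale \(S^2(\phi )/2\). This follows from the flat prior on \(\varvec{\beta }\) and the \(1/\sigma ^2\) prior, but it is a reconstruction rather than a copy of Eq. (16). The function names and the input arrays `post_phi` and `S2` are hypothetical.

```python
import numpy as np
from math import lgamma

def trapezoid(y, x, axis=-1):
    """Trapezoidal rule along one axis of y."""
    y = np.moveaxis(np.asarray(y, dtype=float), axis, -1)
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def marginal_sigma2(sigma2_grid, phi_grid, post_phi, S2, nu):
    """Marginal posterior of sigma2 on a grid.

    post_phi: normalised marginal posterior of phi on phi_grid.
    S2: GLS residual sum of squares for each phi. nu = m - k.
    """
    a = nu / 2.0
    b = np.asarray(S2, dtype=float)[:, None] / 2.0   # (n_phi, 1)
    s2 = np.asarray(sigma2_grid, dtype=float)[None, :]
    # Log density of the inverse-gamma(a, b) conditional posterior of sigma2.
    log_cond = a * np.log(b) - lgamma(a) - (a + 1.0) * np.log(s2) - b / s2
    joint = np.exp(log_cond) * np.asarray(post_phi, dtype=float)[:, None]
    marg = trapezoid(joint, phi_grid, axis=0)        # integrate phi out
    return marg / trapezoid(marg, sigma2_grid)       # renormalise on the sigma2 grid
```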
Marginal Posterior Regression Coefficients
The marginal posteriors of the individual regression coefficients \(\beta _{q}\), \(q = 1\ldots k\) are based on the joint posterior for the vector \(\varvec{\beta }\)
The integrand here can be shown to be proportional to a multivariate t distribution (Roth 2013) for \(\varvec{\beta }\) with \(m - k\) degrees of freedom, location (vector) parameter \(\hat{\varvec{\beta }}\) and scale (matrix) parameter
This integrand can be marginalised to a scaled t-distribution for the individual regression coefficients, as an implicit function of \(\phi \), and rearranged to give
with \(f_{p}(\phi \mid \bar{\varvec{z}})\) as indicated in Eq. (15) and where
defines a t-distribution for \(\beta _q\) with degrees of freedom \(\nu = m - k\), location parameter \(\hat{\beta }_q\) and scale parameter \(\Sigma _q\) the qth element on the diagonal of \(\varvec{\Sigma }_\beta \). Note that the variance of this t-distribution is \(\Sigma _q \nu / (\nu - 2)\).
Similarly to Sect. 3.1.2, BAK creates two-dimensional grids covering the parameter spaces of \(\phi \) and \(\beta _q\) (for all q) and applies the trapezoidal rule to calculate the integral over \(\phi \) in Eq. (19); finally, it normalises to get the marginal distributions for each individual \(\beta _q\).
Posterior Predictive Distribution
The conditional distribution for the variable of interest, given the data and any particular value of the distance parameter, \(f({\varvec{z}^*} \mid \bar{\varvec{z}}, \phi )\), is a t-distribution with degrees of freedom \(\nu = m - k\), with location parameter \(\hat{z}^{*} \mid \phi \) (an implicit function of \(\phi \)) already given in Eq. (7), and with scale parameter as provided in Eq. A85 in Online Resource A. The variance of this conditional distribution, also a function of \(\phi \), is given by
Note that Eq. (21) is an increased universal kriging variance [see for comparison Eq. (8)] because the uncertainty in \(\sigma ^2\) is also considered, hence the increment expressed in the first fraction. The second fraction equals the REML estimate for \(\sigma ^2\) given \(\phi \).
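One way to read the two fractions described above is sketched below. This is a reconstruction from the surrounding text and from the standard variance of a t-distribution, not a copy of Eq. (21). The symbols \(S^2(\phi )\) (the GLS residual sum of squares) and \(u^*(\phi )\) (the universal kriging variance of Eq. (8) evaluated with \(\sigma ^2 = 1\)) are introduced here for illustration only.

```latex
\[
\hbox{var}\left[ z^* \mid \bar{\varvec{z}}, \phi \right]
  = \frac{m-k}{m-k-2} \cdot \frac{S^2(\phi)}{m-k} \cdot u^*(\phi),
\qquad
S^2(\phi) = (\bar{\varvec{z}} - \bar{\varvec{X}} \hat{\varvec{\beta}})^\mathrm{T}
            \bar{\varvec{C}}^{-1}
            (\bar{\varvec{z}} - \bar{\varvec{X}} \hat{\varvec{\beta}})
\]
```

For example, with \(m = 10\) areal means and \(k = 2\) trend coefficients, the first fraction is \(8/6 \approx 1.33\), so under this reading the conditional variance is about a third larger than the plug-in universal kriging variance. With \(m = 30\) the factor drops to \(28/26 \approx 1.08\).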
The posterior predictive distribution is defined as an integral of the above conditional distribution with respect to the posterior distribution of the distance parameter,
which is numerically approximated. BAK first creates for each prediction point \(s^*\) a vector of predictions and a vector of corresponding prediction variances, both as a function of \(\phi \). Finally, the algorithm calculates the mean and variance of the posterior predictive distribution (or, more formally, of a finite mixture distribution that approximates this distribution, with weights defined based on \(f_p (\phi \mid \bar{\varvec{z}})\) and the spacings of the \(\phi \) parameter grid).
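The final summarising step can be sketched with the usual moment formulas for a finite mixture. The trapezoidal weights and the function names are assumptions of this sketch, and the authors' code may define the weights differently.

```python
import numpy as np

def trapezoid_weights(grid):
    """Quadrature weights of the trapezoidal rule on a possibly uneven grid."""
    grid = np.asarray(grid, dtype=float)
    d = np.diff(grid)
    w = np.zeros_like(grid)
    w[:-1] += 0.5 * d
    w[1:] += 0.5 * d
    return w

def mixture_moments(phi_grid, post_phi, cond_mean, cond_var):
    """Mean and variance of the mixture over phi.

    cond_mean, cond_var: arrays of shape (n_phi, n_pred) holding the
    conditional prediction and conditional variance for each phi.
    """
    w = trapezoid_weights(phi_grid) * np.asarray(post_phi, dtype=float)
    w = w / w.sum()                                  # mixture weights sum to one
    mean = w @ cond_mean
    # Law of total variance: E[var] + var[E] = E[var + mean^2] - (E[mean])^2.
    var = w @ (cond_var + cond_mean ** 2) - mean ** 2
    return mean, var
```

The variance therefore has two parts: the average conditional variance, and the spread of the conditional predictions across \(\phi \). A plug-in approach keeps only a single conditional variance.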
Methodological Details
In this work, a number of increasingly Bayesian approaches to ATPK are applied and compared (Table 1). The first three rows of the table represent plug-in approaches for some of the parameters (i.e., the stated parameters are first estimated, by maximising a likelihood or marginal likelihood function, before being plugged into the relevant predictive distribution equation for prediction), while the final row represents the fully Bayesian approach. In the case of maximum likelihood estimation (ML, and not implemented in this work), all parameters (in the geostatistical context: regression coefficients and spatial covariance parameters) are estimated by analytically or numerically maximising the likelihood. This general approach was consolidated by Fisher almost a century ago (Stigler 2007) and applied in geostatistics for example by Kitanidis (1983) and Lark (2000). REML, which has been advocated for several decades in geostatistics, is based on a likelihood function for a set of projected data rather than the original data, and gives conditionally unbiased estimates for the spatial parameters (Webster and Oliver 2007; Lark and Cullis 2004); see also Sect. A2 in Online Resource A. REML represents a form of marginal likelihood (a likelihood function in which some parameters have been marginalised), and has been presented in a Bayesian framework as such (the integral of the likelihood function with respect to the trend parameters, assuming a flat improper prior for these parameters) (Harville 1974). Note that \(f_0(\varvec{\beta })\) can be considered an uninformative prior when neglected; this is often valid for centrality parameters but not for other parameters. Underpinning the same approach, UK takes the uncertainty in the trend coefficients into account, making it a logical combination with REML. Within this research, the combined application of REML and UK is indicated by ‘REML approach’.
The next gradation towards the fully Bayesian approach is maximum likelihood with both trend and variance integrated out, in the context of this paper indicated by the generic term ‘maximum marginal likelihood’ (MML). Finally, the full Bayesian approach (also referred to as ‘Bayesian approach’) provides a posterior distribution of all parameters, while in the prediction all parameters are integrated out and the uncertainty of all parameters is taken into account.
In the following sections, REML, MML and the Bayesian approach are compared, and for the Bayesian approach different priors for \(\phi \) (as defined in the following section) are applied. All algorithms (including the central BAK algorithm as presented in Fig. 1) are written in the statistical programming language R, and are available at Steinbuch et al. (2019).
Prior Distributions for \(\phi \)
For \(f_0(\phi )\), three potential forms of prior distribution are compared, intended to represent limited prior knowledge. These are (1) a uniform prior with limited bounds; (2) the reference prior as suggested by Berger et al. (2001) for analysis of point-support data, applied in the context of areal-support data, and explained in Online Resource B; and, in the simulation ensemble, (3) an inverse-gamma distribution. The bounded uniform and the inverse-gamma distributions are proper; the assumed propriety of the reference prior will be discussed later.
Estimation and Prediction with REML and MML
For REML, the approach as described by Brus et al. (2018) was applied. For MML, the posterior mode of \(\phi \) was calculated using the Bayesian approach with a uniform prior for \(\phi \). The predictive distribution was then defined conditionally on this value of \(\phi \). Mathematically, this equals integrating out \(\varvec{\beta }\) and \(\sigma ^2\) to arrive at an estimated \(\hat{\phi }\), which is subsequently used as a single plug-in value for MML prediction; the mean and variance of the predictive distribution (representing the prediction and prediction variance) are shown in Table 1 and Eqs. (7) and (21).
Estimating Average Covariances
The average correlation matrices, \(\bar{\varvec{C}} \) and \(\bar{\varvec{C}} ^*\), can be approximated in different ways. In this research, many discretisation points within each area are defined and the relevant Euclidean distances between those points are calculated, followed by construction of the corresponding correlation matrix, based on the correlation function, such as given in Eq. (2), and distance parameter \(\phi \). Then, all correlations per area-area combination are averaged to arrive at \(\bar{\varvec{C}} \), and per area-prediction point combination to arrive at \(\bar{\varvec{C}} ^*\). The discretisation points were on a regular grid in the simulation study, and selected by simple random sampling in the two case studies.
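The averaging over discretisation points can be sketched as follows, assuming an exponential correlation function and integer area labels running from 0 to \(m-1\), with at least one discretisation point per area. The function names and the averaging matrix `W` are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

def averaged_correlations(disc_points, area_ids, pred_points, phi):
    """Approximate C_bar (m x m) and C_bar_star (m x n*) by averaging point correlations."""
    area_ids = np.asarray(area_ids)
    n_disc = len(area_ids)
    m = int(area_ids.max()) + 1
    # Row j of W holds equal weights for the discretisation points of area j.
    W = np.zeros((m, n_disc))
    W[area_ids, np.arange(n_disc)] = 1.0
    W /= W.sum(axis=1, keepdims=True)
    C_points = np.exp(-pairwise_distances(disc_points, disc_points) / phi)
    C_cross = np.exp(-pairwise_distances(disc_points, pred_points) / phi)
    return W @ C_points @ W.T, W @ C_cross
```

Because both matrices depend on \(\phi \), they have to be recomputed for every value on the \(\phi \) grid, whereas the distance matrices can be computed once and reused.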
Validation
To quantify the performance of each approach (for the simulation study and synthetic case study, where the original point data were available), the predictions \(\varvec{z}^*\) and prediction uncertainties \(\varvec{v}^*\) were assessed in relation to the original signal \(\varvec{z}\). As an indication of the quality of the prediction, the root mean squared error (RMSE) defined as
\[\hbox {RMSE} = \sqrt{\frac{1}{n} \sum _{i=1}^{n} \left\{ z(s_i) - z^*(s_i) \right\} ^2} \qquad (23)\]
was calculated, where a smaller number indicates more accurate predictions (Oliver and Webster 2014). For comparison, a baseline approach was also included, for which point predictions were defined simply by the areal mean data for the corresponding area. Unbiasedness of predictions was tested using the mean error (ME), \(\frac{1}{n} \sum _{i=1}^{n} \left\{ z(s_i) - z^*(s_i) \right\} \). The mass preserving property (MPP) of the predictions was checked, which states that, in the case of ATPK, the mean of all predictions in any observed area should equal the observed areal mean (Kyriakidis 2004). This check was summarised by showing the maximum observed difference between areal-average data and the mean of the corresponding predictions.
As an indication of the quality of the prediction uncertainty, a motivating factor for this work, the standardised squared error (StSE) defined as
\[\hbox {StSE}(s_i) = \frac{\left\{ z(s_i) - z^*(s_i) \right\} ^2}{v^*(s_i)} \qquad (24)\]
was calculated. This StSE should ideally have a mean of one (Lark 2000). Higher values indicate an underestimation of uncertainty, which is labelled ‘optimistic’, and lower values indicate an overestimation of uncertainty, labelled ‘conservative’.
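The validation statistics described in this section can be computed along the following lines. This is a small sketch with hypothetical function names, which assumes that the true point values are available at the prediction locations and that areas are labelled 0 to \(m-1\).

```python
import numpy as np

def validation_stats(z, z_pred, v_pred):
    """RMSE, mean error and mean standardised squared error."""
    z, z_pred, v_pred = (np.asarray(a, dtype=float) for a in (z, z_pred, v_pred))
    err = z - z_pred
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "ME": float(np.mean(err)),
        "mean_StSE": float(np.mean(err ** 2 / v_pred)),  # ideally close to one
    }

def max_mpp_difference(z_bar, z_pred, area_ids):
    """Largest absolute gap between an areal mean and the mean of its predictions."""
    z_bar = np.asarray(z_bar, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    area_ids = np.asarray(area_ids)
    pred_means = np.array([z_pred[area_ids == j].mean() for j in range(len(z_bar))])
    return float(np.max(np.abs(z_bar - pred_means)))
```

A `mean_StSE` above one points to prediction variances that are too small (optimistic), and a value below one to variances that are too large (conservative).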
Simulation Study
The following shows a single one-dimensional simulation where REML and full Bayesian (defined with \(f_0(\phi ) \sim \hbox {uniform}\)) are compared for illustration purposes. Following the illustration, an ensemble of many simulations is applied to assess several settings. Online Resource D contains similar results for two-dimensional simulations.
Single Simulation
Simulated Data Set
A line of length 300 abstract units (au) was created, and filled with \(n=600\) equally spaced nodes. Using the exponential covariance function, a spatially correlated signal (with a zero-nugget exponential model; \(\sigma ^2_\mathrm{sim} = 5\), \(\phi _\mathrm{sim} = 60\)) was generated and added to the trend of a linear function of the coordinate (\(\beta _\mathrm{1 sim} = 0\) for the intercept, \(\beta _\mathrm{2 sim} = 0.02\) for the slope on coordinate). For the Bayesian approach, \(f_0(\phi ) \sim \hbox {uniform}\) was used, bounded by \(\varvec{\phi }_{l,u} = \{10, 300\}\); these bounds also defined bounds for the REML parameter search. The priors for \(\varvec{\beta }\) and \(\sigma ^2\) are provided in Eq. (13). The above settings are referred to as the standard settings. The line was split into \(m=10\) equal one-dimensional ‘areas’ or line sections. Finally, both \(\varvec{z}\) and the covariate over the areas were averaged to arrive at the observed means \(\bar{\varvec{z}}\) and the averaged design matrix \(\bar{\varvec{X}}\).
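The standard settings can be reproduced in outline with the NumPy sketch below. The node positions (cell centres at 0.25, 0.75, ... au), the random seed and the small jitter on the covariance matrix are assumptions of this sketch, so it will not reproduce the authors' exact realisation.

```python
import numpy as np

rng = np.random.default_rng(42)              # arbitrary seed (assumption)
n, length, m = 600, 300.0, 10                # nodes, line length (au), line sections
sigma2_sim, phi_sim = 5.0, 60.0              # partial sill and distance parameter
beta_sim = np.array([0.0, 0.02])             # intercept and slope on the coordinate

s = (np.arange(n) + 0.5) * length / n        # equally spaced nodes (cell centres)
X = np.column_stack([np.ones(n), s])         # design matrix: intercept + coordinate

# Zero-nugget exponential covariance and one realisation of the signal.
C = np.exp(-np.abs(s[:, None] - s[None, :]) / phi_sim)
L = np.linalg.cholesky(sigma2_sim * C + 1e-10 * np.eye(n))
z = X @ beta_sim + L @ rng.standard_normal(n)

# Split the line into m equal sections and average signal and covariates.
area_ids = np.repeat(np.arange(m), n // m)
z_bar = z.reshape(m, -1).mean(axis=1)        # observed areal means
X_bar = X.reshape(m, -1, 2).mean(axis=1)     # averaged design matrix
```

Each section then contains 60 nodes and spans 30 au, which is half of the simulated distance parameter.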
Results
The original signal (assumed to be unobservable, and represented as point values), the areal means (the ‘observations’) and the predictions are shown in Fig. 2. The difference between the REML and the full Bayesian approach is mainly in the prediction uncertainty: the Bayesian approach gives a slightly larger prediction interval.
The marginal densities in Fig. 3 show that, based on this—small—simulated data set, it is rather difficult to identify the distance parameter \(\phi \), which has a very flat mode. The REML estimates for \(\phi \) and \(\sigma ^2\) (point values) are close to the modes of the respective marginal posteriors from the Bayesian approach. For the trend parameters, the marginal Bayesian posterior distributions are slightly skewed and wider than the corresponding distributions based on REML (Gaussian distributions parameterised by the GLS estimate and estimation variance), indicating that more uncertainty is included.
The pairwise joint posteriors (\(\phi \) with each other parameter) are shown in Fig. 4: \(\sigma ^2\) and \(\phi \) seem to have a positive correlation (subfigure a), while \(\beta _1\) has a slightly negative correlation with \(\phi \) (b) and \(\beta _2\) a slightly positive correlation (c). Figure 5 shows the variogram models obtained with REML and the Bayesian-uniform approach. Table 2 shows the validation results of this single simulation. The quality of the prediction (RMSE) is almost equal for the REML and Bayesian approaches, and better than for the baseline approach. The ME and max(MPP) are close to zero, indicating the absence of bias, and a discretisation grid (which is different from the prediction grid) of sufficient density to provide good approximations of areal-average covariances and areal-average values of covariates. The uncertainty validation value mean(StSE) shows that REML is on average optimistic, while the Bayesian approach is conservative. Note that the baseline approach does not provide a quantification of prediction uncertainty.
Simulation Ensemble
In this section, results are generalised by generating many simulations, varying only the random number seed, while comparing validation statistics on the outcomes of several approaches: REML, MML and full Bayesian with the three different priors for \(\phi \) indicated earlier. The applied inverse-gamma prior for \(\phi \) is set as somewhat informative with shape \(= 11\) and rate \(= 600\). This results in a mean of 60 au, emulating a situation where decent prior knowledge about the range is available. Also, the number of observations m is varied by dividing the line into m sections of equal length; this together is one ‘session’. Furthermore, to investigate how the approaches behave for differently simulated data sets and different inference settings, both are varied into an ‘ensemble’ of many sessions. The settings as used in standard session 1 are given in Sect. 4.1.1. Sessions 2 and 3 vary the upper bound for the uniform prior and for the REML and MML searches for estimation of \(\phi \) in comparison with standard session 1 (where the upper bound for \(\phi \) is 300 au). In session 4, the trend is removed (so that the inferential model has to infer the mean only); in session 5, the trend is based on a separate Gaussian random field (GRF) rather than on the coordinates. Sessions 6, 7 and 8 introduce a misfit between the simulation model and the inferential model, where the correlation function in the simulation model is changed or a nugget component is added; the inferential model stays unchanged. Finally, sessions 9 and 10 show the effects of a misfit between the actual signal and the support of the available data (i.e., short or very long distance parameter used in simulation compared with the area sizes and total extent), which might make it difficult to identify parameters.
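The stated prior mean can be checked by hand. The worked step below assumes that the quoted rate acts as the scale \(b\) of an inverse-gamma density proportional to \(\phi ^{-(a+1)} \exp (-b/\phi )\) with shape \(a\), which is the reading consistent with the stated mean of 60 au. The symbols \(a\) and \(b\) are introduced here for illustration only.

```latex
\[
\mathrm{E}[\phi] = \frac{b}{a-1} = \frac{600}{11-1} = 60 \ \hbox{au},
\qquad
\mathrm{sd}[\phi] = \frac{b}{(a-1)\sqrt{a-2}} = \frac{600}{10 \cdot 3} = 20 \ \hbox{au}
\]
```

Under this reading, the prior is centred on the simulated value \(\phi _\mathrm{sim} = 60\) with a standard deviation of one third of the mean.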
Table 3 presents the results expressed as the average (and, in small font, the corresponding standard deviation) over 250 means of the standardised squared error (mean(StSE)). Table D1 in Online Resource D shows the results of the analogous two-dimensional simulations. Additional validation statistics and assessments regarding \(\phi \) and \(\sigma ^2\) for the simulation ensemble can be found in Online Resource C, which also includes \(m = 15\) and \(m = 30\) for the one-dimensional simulations, and in Online Resource E for the corresponding figures of the two-dimensional simulations.
General Results
Referring to Online Resources C and E, the maximal difference with respect to the mass preserving property (max(MPP)) ranges between 0.09 and 0.28 in the case of the twodimensional simulations. In the onedimensional case, max(MPP) is much smaller. With all approaches in all simulations, the ME was small. The RMSE was, for a given simulation, almost equal for all, but the baseline approach was, on average, larger. The main difference between the approaches was in the prediction uncertainty (assessed by StSE).
The standard session 1 in Table 3 shows that, with \(m=10\), REML was optimistic, the Bayesian-uniform approach was less optimistic, Bayesian-inverse-gamma was closest to one (perhaps due to the knowledge captured in the prior distribution for \(\phi \)), MML was slightly conservative and Bayesian-reference was very conservative. With increasing m, all mean(StSE) approached one, while the corresponding sd(StSE) decreased. Even with \(m = 20\), the differences between approaches and the deviation from one became small, except for the Bayesian-reference approach. The results for the two-dimensional simulations were similar, although differences between approaches were a bit larger for \(m = 9\) and deviations from one were often still substantial for \(m=25\).
Changing Uniform Prior for \(\phi \) (Sessions 2 and 3)
Sessions 2 and 3 vary only in the upper bound of the uniform prior for \(\phi \) (100 and 2000 au, respectively) used as the basis for inference, rather than in the simulating model; for comparison, the same bounds were applied in the REML and MML parameter searches. Note that the Bayesian-inverse-gamma and Bayesian-reference results (see Sect. 3.2.1), having their own bounds, are not repeated here. Also recall that the extent of the simulated data set was 300 au. The seemingly arbitrary choice of the upper bound of the uniform prior for \(\phi \) influenced the results, especially with few data (small m) and with the two-dimensional simulations (see Online Resource D).
Although MML and the Bayesian-uniform approach use the same range of possible \(\phi \) values, MML was far less influenced by the upper bound for its parameter search. The proportions of \(\hat{\phi }\) values (estimated by REML and MML, respectively) that were very close to the upper and lower bounds are also given (Online Resources C and E). Interestingly, in the case of a larger upper bound (session 3), the fraction of REML-estimated \(\phi \) values close to the unchanged lower bound was larger than in session 1, while the fraction of MML-estimated \(\phi \) values close to the lower bound stayed the same.
Varying Simulation Trend (Sessions 4 and 5)
The trend on the spatial coordinate (the standard) was also compared with a constant mean, session 4, and with a trend that was a simulated GRF itself (\(\beta _1 = 0\), \(\beta _2 = 2\), \(\tau ^2 = 0\), \(\sigma ^2 = 0.5\), \(\phi = 30\)), session 5. In both cases, the form of the trend (i.e., the design matrix) was assumed to be known for inference and prediction. The only difference between the sessions was in the means: the simulated error signals for sessions 1–5 were identical. Compared with a trend on the coordinate, both the constant mean and the GRF trend gave only minor differences in mean(StSE); this also held for the two-dimensional simulations.
Misspecified Model (Sessions 6, 7 and 8)
In sessions 6 and 7, the error signal was simulated using a Matérn covariance function with large and small values for the smoothness parameter \(\nu _\mathrm{sim}\) (not to be confused with the degrees of freedom \(\nu \) of a t-distribution used earlier). The inference in these sessions was still based on the exponential covariance model, which equals the Matérn model with \(\nu _\mathrm{sim} = 0.5\). These sessions were designed to test how the methods deal with a misspecified inferential model. The large \(\nu _\mathrm{sim}\) in session 6 caused all approaches to be far too conservative (mean(StSE) below one), with the Bayesian approaches slightly more conservative, and with average mean(StSE) values becoming smaller with increasing m. In the two-dimensional simulations, the values stayed considerably closer to one. A small \(\nu _\mathrm{sim}\), as shown in session 7, caused almost all results to be optimistic. With increasing m, the mean(StSE) did not converge towards one, but rather seemed to stabilise at an optimistic value. With a nugget component added to the simulated data (session 8; with a nugget-to-sill ratio of 1/6), all approaches were optimistic (except Bayesian-reference with \(m = 10\), and its two-dimensional counterpart with \(m = 9\)), and the average mean(StSE) increased with m in the one-dimensional simulations. In the two-dimensional simulations, the relation between m and mean(StSE) was ambiguous.
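The relation between the exponential and Matérn models can be illustrated with the closed forms that exist for half-integer smoothness. The Python sketch below assumes the parameterisation \(\rho (h) = \{2^{\nu -1}\varGamma (\nu )\}^{-1}(h/\phi )^{\nu }K_{\nu }(h/\phi )\), without any rescaling of the distance by \(\sqrt{2\nu }\); the parameterisation actually used in the study is not shown in this section, and the function name is hypothetical.

```python
import math


def matern_correlation(h, phi, nu):
    """Matérn correlation for nu in {0.5, 1.5, 2.5} (closed forms).

    Assumed parameterisation: distance enters only as h / phi.
    nu = 0.5 reduces to the exponential model exp(-h / phi).
    """
    if h < 0 or phi <= 0:
        raise ValueError("h must be non-negative and phi positive")
    x = h / phi
    if nu == 0.5:
        poly = 1.0
    elif nu == 1.5:
        poly = 1.0 + x
    elif nu == 2.5:
        poly = 1.0 + x + x * x / 3.0
    else:
        raise ValueError("this sketch supports only nu = 0.5, 1.5 or 2.5")
    return poly * math.exp(-x)


# Same distance parameter, increasing smoothness.
for nu in (0.5, 1.5, 2.5):
    print(nu, matern_correlation(30.0, 60.0, nu))
```

For a fixed \(\phi \), a larger smoothness gives a higher correlation at short distances, so an exponential inferential model fitted to a much smoother or rougher signal misjudges the short-range variation that drives area-to-point prediction variances.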
Simulation with Extreme Distance Parameter (Sessions 9 and 10)
If the distance parameter used for the simulations was very small in relation to the areas under consideration, such as in session 9, all approaches seemed to be quite optimistic, but this effect strongly decreased with increasing m. The worst performer was the Bayesian-inverse-gamma approach, where the information encapsulated in \(f_0(\phi )\) was now mismatched with the simulation model, although Bayesian-uniform also performed badly. The Bayesian-reference approach performed best. In the two-dimensional simulations, values were more extreme, especially for \(m=9\). When, as in session 10, the distance parameter was large compared with the total extent under consideration, REML performed almost perfectly while the other approaches tended to be slightly or fairly conservative, but improved with increasing m.
Case Studies
Synthetic Case Study: Vegetation Index Data, with Validation on Point Support
To briefly investigate how REML, MML and full Bayesian would perform for a real-world data set, a remote-sensing vegetation index, CFAPAR27, was used as the variable of interest. These data are used as a covariate in the real case study (spatial prediction of crop yield in Burkina Faso, Sect. 5.2) and are therefore concisely described in Online Resource F. This spatial variable is available on pixel support. The CFAPAR27 data were masked using the crop yield mask (see also Sect. 5.2), and subsequently aggregated over the 45 provinces of Burkina Faso. As covariates for inference, two climate variables broadly representing rainfall and temperature (CRAINEC27 and TMINEC21) and one variable representing soil pH (PHAQ) were used. Gaussianity was assumed for all real-world variables of interest.
ATPK was applied using four approaches: (1) REML, (2) MML, (3) the full Bayesian approach using the uniform prior for \(\phi \), and (4) the full Bayesian approach using the reference prior for \(\phi \). For REML and MML, the parameter search for \(\phi \) was bounded between 37 and 300 km, being roughly the smallest distance between the centres of any two areas, and one third of the largest extent of the region of interest, respectively. The same bounds defined the uniform prior for the full Bayesian approach.
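The rule of thumb for the search bounds can be sketched as follows. The Python function below takes area centroids and a user-supplied largest extent; how the extent of the study region was measured is not specified in this section, so it is passed in as an argument rather than computed. The function name and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def phi_search_bounds(centroids, largest_extent):
    """Lower bound: smallest distance between any two area centroids.
    Upper bound: one third of the largest extent of the region."""
    lower = min(math.dist(a, b) for a, b in combinations(centroids, 2))
    upper = largest_extent / 3.0
    return lower, upper


# Toy centroids in arbitrary units (not the Burkina Faso provinces).
lower, upper = phi_search_bounds([(0, 0), (3, 4), (10, 0)], largest_extent=30.0)
print(lower, upper)
```

The same pair of values can then serve both as the bounds of the REML and MML parameter searches and as the support of the uniform prior, as done in the text.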
The resulting mean(StSE) was 2.87, 2.73, 2.92 and 2.59 for the REML, MML, Bayesian-uniform and Bayesian-reference approaches, respectively, showing that prediction uncertainty was seriously underestimated by all approaches. The mean(StSE) of the Bayesian-uniform approach could be changed by several tenths by adjusting the bounds of the uniform prior. All RMSE values for the four approaches were around 6.19 (compared with the baseline approach RMSE of 16.54), indicating that they offered the same prediction quality and probably quite similar predictions.
Real Case Study: Crop Yield Data
As a real-world case study, this paper predicts yields of sorghum and millet, both cereal staple foods, in Burkina Faso, West Africa. The observation areas are the 45 provinces, for which only the average yields are known (averaged over the years 2000–2013, and provided by AGRHYMET), as shown in Fig. 6 for millet. The covariates for the trends suggested by Brus et al. (2018) are used: no covariates for millet, and four covariates for sorghum, which are shown and briefly explained in Online Resource F.
The REML, MML, Bayesian-uniform and Bayesian-reference approaches were applied, with settings similar to those used in the analysis of the vegetation index data (Sect. 5.1). The observed millet yields, and the resulting predictions and prediction uncertainties (standard deviations of the predictive distributions) when applying MML, are presented in Fig. 6; maps are presented first over the entire study region, then focused on a subregion to reveal more detail. Similar maps for sorghum are presented in Online Resource F.
Figure 7 shows the densities of the millet yield predictions and prediction standard deviations based on all four approaches, indicating that the Bayesian and, to a lesser extent, MML approaches generated larger prediction uncertainties than REML. For sorghum (see Online Resource F), the Bayesian-reference predictions deviated from those of the other approaches, due to the tendency of the distance parameter to move as close as possible to zero; the applied lower bound for the uniform prior for \(\phi \) and for the REML and MML parameter searches imposed a limit on this effect. This shows again that a seemingly arbitrary choice of a uniform prior or of a parameter range for REML or MML might influence the resulting prediction uncertainty.
Discussion
Setting Uniform Prior
Both in the simulations (Table 3 and Online Resource C) and in the case studies, the choice of the upper and lower bounds of a uniform prior for \(\phi \) can influence the prediction uncertainty, especially (but not exclusively) with smaller data sets and if the posterior mode of \(\phi \) coincides with one of the bounds of the prior. This effect can also occur with REML and MML approaches, where the search for the optimum value of \(\phi \) is bounded by the same limits. It should be stressed that, in this context, this ‘flat’ uniform prior cannot be considered uninformative. The fact that the posterior modes of \(\phi \) (resulting from Bayesian approaches), or \(\hat{\phi }\) (from the REML approach), often coincided with one of the bounds (for example see the ‘\(\hat{\phi }\), \(\hbox {mode}(\phi ) \approx ~ \min , ~ \max \)’ columns in Online Resources C and E, but also sorghum in the case study) highlights the importance of carefully considering such prior or parameter search settings in geostatistical practice.
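A simple diagnostic for this situation is to evaluate the (unnormalised) log-posterior of \(\phi \) on a grid and to check whether its mode falls on the first or last grid value. The Python sketch below is illustrative only: the grid and log-posterior values are made up, and the function names are hypothetical rather than taken from the published code.

```python
import math


def normalise_log_posterior(log_post):
    """Turn unnormalised log-posterior values on a grid into probabilities,
    subtracting the maximum first for numerical stability."""
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def mode_and_bound_flag(phi_grid, log_post):
    """Return the grid value with the highest posterior mass and a flag
    ('lower', 'upper' or None) telling whether it sits on a grid bound."""
    probs = normalise_log_posterior(log_post)
    i = max(range(len(probs)), key=probs.__getitem__)
    if i == 0:
        return phi_grid[i], "lower"
    if i == len(probs) - 1:
        return phi_grid[i], "upper"
    return phi_grid[i], None


# Made-up example: the log-posterior keeps rising up to the upper bound.
grid = [50, 100, 150, 200, 250, 300]
mode, flag = mode_and_bound_flag(grid, [-9.0, -7.5, -6.8, -6.3, -6.0, -5.9])
print(mode, flag)
```

When the flag is not None, the bound itself is shaping the inference, which is the situation the paragraph above warns about.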
Reference Prior
According to the simulations, the reference prior did not perform well, being in many cases too conservative about prediction uncertainty, and pushing posterior distributions of the distance parameter too strongly towards zero (see also Online Resources C and D). In the case of small \(\nu _\mathrm{sim}\) or \(\tau _\mathrm{sim}^2 > 0\), this conservatism compensated to some extent for model misspecification. Berger et al. (2001) derived the form of the reference prior for analysis of spatial point-support data, and the same logic was applied, with area-to-area average correlations replacing the point-to-point correlations of Berger et al. (2001), to justify a similar prior for analysis of areal-support data. However, although the logic to derive the form of the prior follows analogously, the authors are unsure of the analogous logic to ensure propriety of the resulting prior and posterior distributions. As such, even if the simulations had demonstrated a strong advantage, further work would be required to derive the required proofs of propriety. With the simulation results not demonstrating strong advantages, other priors for \(\phi \) are currently recommended.
Misspecified Model
Validation statistics about prediction uncertainties are very sensitive to the misspecification of variogram parameters that determine the smoothness of the spatial signal. Examples are the nugget parameter and the smoothness parameter of the Matérn covariance function, as demonstrated in the simulation sessions 6, 7 and 8 and, in the authors’ opinion, in the vegetation index synthetic case study. Short-range spatial relationships are, however, difficult, if not impossible, to assess if only areal means are available. Combining areal data with some high-density point data could improve the results, see for example Moraga et al. (2017); another approach would be to use prior information, such as expert opinions, for the nugget (Truong et al. 2014). For cases in which more summary data per area are available than only the mean, Orton et al. (2012) proposed a method for incorporating this information. In this study, the exponential covariance model without a nugget was applied for convenience, as is often the case in comparable research; this is, however, a quite arbitrary choice, and given the results, a careful consideration of all model parameters that determine the smoothness of realisations is suggested for future research.
Number of Observations
In simulation sessions 6, 7 and 8 (very smooth or very rough simulated signals), mean(StSE) did not converge to one with an increasing number of observations m as might be expected, and actually diverged from one in most cases. Therefore, more data do not alleviate a poor choice of model. Furthermore, in the simulation setup, increasing m had the side effect of decreasing the size of the individual areas, which might also have influenced this behaviour due to short-range variations becoming better observable.
Very small data sets (nine or ten observations) were analysed in the simulations. The main point of interest was to see how the different approaches behave in such extreme situations, assessed by taking averages over many simulations. The authors stress that, even when using a Bayesian approach, any geostatistical conclusion based on nine or ten observations should be interpreted with caution, except perhaps if strong and honest prior information is available and can be incorporated.
One-Dimensional Versus Two-Dimensional Simulations
The effect of simulation choices and statistical inference approaches was quite similar in the one-dimensional and two-dimensional simulations. Differences might be due to different mutual spatial relationships. For example, for the two-dimensional simulations with nine observations, the closest pairs of units have centroids separated by 100 au. For the one-dimensional simulations with ten observations, neighbouring units’ centroids are separated by 30 au. Therefore, despite there being an almost equal amount of data, there is much more short-range information in the one-dimensional data in the setup used. This might explain the extremely large mean(StSE) values in session 9 of the two-dimensional study (up to 49.2) compared with the less extreme values in the one-dimensional study (up to 4.9).
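The two centroid spacings follow directly from the geometry. The short Python sketch below assumes a line of 300 au split into m equal sections, and a square region of 300 au by 300 au split into a regular square grid of m units; the square-grid layout is an assumption that is consistent with the quoted spacings but is not spelled out in this section.

```python
def centroid_spacing_1d(extent, m):
    """Distance between neighbouring centroids on a line cut into m equal sections."""
    return extent / m


def centroid_spacing_2d(extent, m):
    """Distance between nearest centroids on a square region cut into a
    regular sqrt(m) x sqrt(m) grid (m must be a perfect square)."""
    per_side = round(m ** 0.5)
    if per_side * per_side != m:
        raise ValueError("m must be a perfect square for a square grid")
    return extent / per_side


print(centroid_spacing_1d(300, 10))  # one-dimensional, ten observations
print(centroid_spacing_2d(300, 9))   # two-dimensional, nine observations
```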
The approximation of the average covariance matrices might have been less successful for the two-dimensional simulations. This would explain the relatively high max(MPP) and the unexpected irregular spatial pattern of the prediction error sd (see Online Resource D, Fig. D1d).
The Algorithm
Although the authors did not compare the approach used here with more conventional MCMC methods, it proved an effective and efficient means of performing Bayesian (and also MML) spatial data analysis in the area-to-point context presented. As indicated in Sect. 3.2.3, several different methods can be used to approximate the average covariance matrices. For example, the Legendre–Gauss quadrature, as described by Orton et al. (2017), is computationally and memory-wise much cheaper, but perhaps less accurate than the applied discretisation points method. Both methods, including some variations, are included in the code, as is area-to-area kriging. Future extensions might include directionality and point-to-point kriging.
The Integrated Nested Laplace Approximations (INLA) alternative to MCMC (Blangiardo et al. 2013) has some similarities to our partly analytical Bayesian approach, such as a gridded search in parameter space and numerical integration. However, it uses Laplace approximations of some integrals and is applicable to a much wider range of models, including hierarchical ones. Furthermore, its spatial implementation assumes the Markovian property on the spatial Gaussian random field (meaning that any point or area in the region is influenced only by its immediate neighbours), leading to sparse covariance matrices and thus reductions in computational costs. In our opinion, our approach offers specific and transparent insights into the Bayesian approach of model-based geostatistics. For future research, however, it would be interesting to redo the calculations with INLA, or to integrate some of the sophisticated and cost-reducing details of INLA into the code.
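The two ways of approximating an area-to-area average covariance can be compared on a deliberately simple case: two adjacent, non-overlapping sections of a line with an exponential covariance, for which a closed form exists. The Python sketch below is not the published implementation; the function names, the three-point quadrature rule and the 50 discretisation points per section are assumptions made for illustration.

```python
import math

# Three-point Gauss-Legendre nodes and weights on [-1, 1].
GL_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
GL_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def exp_cov(h, sigma2, phi):
    """Exponential covariance without nugget."""
    return sigma2 * math.exp(-abs(h) / phi)


def avg_cov_discretised(a, b, sigma2, phi, n=50):
    """Average covariance between intervals a and b, using the midpoints
    of n equal cells per interval as discretisation points."""
    pts_a = [a[0] + (i + 0.5) * (a[1] - a[0]) / n for i in range(n)]
    pts_b = [b[0] + (i + 0.5) * (b[1] - b[0]) / n for i in range(n)]
    total = sum(exp_cov(s - t, sigma2, phi) for s in pts_a for t in pts_b)
    return total / (n * n)


def avg_cov_gauss_legendre(a, b, sigma2, phi):
    """Same average, using a 3 x 3 tensor-product Gauss-Legendre rule."""
    def nodes(interval):
        half = (interval[1] - interval[0]) / 2.0
        mid = (interval[0] + interval[1]) / 2.0
        return [mid + half * x for x in GL_NODES]

    total = 0.0
    for s, ws in zip(nodes(a), GL_WEIGHTS):
        for t, wt in zip(nodes(b), GL_WEIGHTS):
            total += ws * wt * exp_cov(s - t, sigma2, phi)
    return total / 4.0  # the weights sum to 2 in each dimension


def avg_cov_exact_disjoint(a, b, sigma2, phi):
    """Closed form, valid only when interval a lies entirely left of b."""
    ia = phi * (math.exp(a[1] / phi) - math.exp(a[0] / phi))
    ib = phi * (math.exp(-b[0] / phi) - math.exp(-b[1] / phi))
    return sigma2 * ia * ib / ((a[1] - a[0]) * (b[1] - b[0]))


a, b = (0.0, 30.0), (30.0, 60.0)
exact = avg_cov_exact_disjoint(a, b, 1.0, 60.0)
disc = avg_cov_discretised(a, b, 1.0, 60.0)
quad = avg_cov_gauss_legendre(a, b, 1.0, 60.0)
print(exact, disc, quad)
```

The quadrature needs only 9 covariance evaluations here against 2500 for the discretisation, which illustrates the cost difference. The accuracy seen in this smooth one-dimensional case should not be extrapolated to irregular polygons or to the covariance of an area with itself, where the covariance function is not smooth at zero distance.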
General Remarks
The general impression obtained from the ensemble of simulations is that REML tended to underestimate prediction uncertainty the most, followed by Bayesian-uniform. The Bayesian-reference approach tended to be more conservative, while MML was slightly conservative but seemed relatively stable. The differences between the approaches decreased with increasing m.
Given a covariance model that is more or less accurate in terms of short-range behaviour, the conclusion is that, for data sets of sufficient size, or if a slight underestimation of prediction uncertainty is allowed, the REML approach as demonstrated by Brus et al. (2018) should be sufficient. For smaller data sets with no prior information available, the most robust and in many cases best approach, albeit somewhat conservative, appeared to be MML. An additional advantage of MML is its relative insensitivity to arbitrary choices such as bounds on the correlation distance parameter. In several sessions, MML even outperformed Bayesian-inverse-gamma when the supplied prior information about \(\phi \) was correct. Finally, MML has additional computational benefits for prediction over the fully Bayesian approach.
The authors suggest focusing future research on modelling short-range variation and including a smoothness parameter in the inferential models. Using honest and informative priors, depending on the research question at hand, might also yield interesting results. The Matérn smoothness parameter \(\nu _\mathrm{M}\) could be made an integral part of the Bayesian model, or alternatively incorporated as an extra model parameter to be optimised in an MML approach, which could then be used as a plug-in value for prediction.
Conclusions
All tested geostatistical approaches for ATPK (REML, MML, and Bayesian with different priors for the distance parameter) provided very similar predictions, but were different in the prediction uncertainties, with REML slightly underestimating the uncertainty in the case of very few data. Prediction uncertainties are quite sensitive to the parameters determining the smoothness of the spatial signal (i.e., nugget and smoothness parameter of the Matérn covariance function). Given correctly modelled short-range effects, for data sets of sufficient size, or if an underestimation of prediction uncertainty is allowed, the REML approach as demonstrated by Brus et al. (2018) is satisfactory. The MML approach (maximum likelihood with trend and variance integrated out) provided acceptable results while being relatively robust to arbitrary settings for the parameter search. Also, this approach does not need a choice of prior for the distance parameter. A useful and robust full Bayesian approach could not be accomplished, perhaps due to the lack of a good uninformative prior for the distance parameter of the covariance function; the reference prior as proposed by Berger et al. (2001) overestimated the prediction uncertainty in most cases. For real-world case studies, the demonstrated algorithms (https://doi.org/10.4121/uuid:1fe0c01e7f67435ba240800579adc6e6) can be used.
Notes
1. Universal kriging in this work is defined as geostatistical prediction with the trend uncertainty included and where this trend is based on one or more covariates, which may or may not include the spatial coordinates.
References
Albert J (2009) Bayesian computation with R. Springer, New York. https://doi.org/10.1007/9780387922980
Berger JO, de Oliveira V, Sansó B (2001) Objective Bayesian analysis of spatially correlated data. J Am Stat Assoc 96(456):1361–1374
Blangiardo M, Cameletti M, Baio G, Rue H (2013) Spatial and spatio-temporal models with R-INLA. Spat Spatio-temporal Epidemiol 4:33–49. https://doi.org/10.1016/j.sste.2012.12.001
Brus DJ, Orton TG, Walvoort DJJ, Reijneveld JA, Oenema O (2014) Disaggregation of soil testing data on organic matter by the summary statistics approach to area-to-point kriging. Geoderma 226–227:151–159. https://doi.org/10.1016/j.geoderma.2014.02.011
Brus DJ, Boogaard H, Ceccarelli T, Orton TG, Traore S, Zhang M (2018) Geostatistical disaggregation of polygon maps of average crop yields by area-to-point kriging. Eur J Agron 97(July):48–59. https://doi.org/10.1016/j.eja.2018.05.003
Christensen OF, Roberts GO, Sköld M (2006) Robust Markov chain Monte Carlo methods for spatial generalized linear mixed models. J Comput Graph Stat 15(1):1–17. https://doi.org/10.1198/106186006x100470
Diggle P, Ribeiro PJ (2007) Model-based geostatistics. Springer, Berlin. https://doi.org/10.1007/9780387485362
Goovaerts P (2006) Geostatistical analysis of disease data: accounting for spatial support and population density in the isopleth mapping of cancer mortality risk using area-to-point Poisson kriging. Int J Health Geogr 5(1):52. https://doi.org/10.1186/1476072x552
Goovaerts P (2008) Kriging and semivariogram deconvolution in the presence of irregular geographical units. Math Geosci 40(1):101–128. https://doi.org/10.1007/s1100400791291
Harville DA (1974) Bayesian inference for variance components using only error contrasts. Biometrika 61(2):383–385. https://doi.org/10.2307/2334370
Horta A, Pereira MJ, Gonçalves M, Ramos T, Soares A (2014) Spatial modelling of soil hydraulic properties integrating different supports. J Hydrol 511(Supplement C):1–9. https://doi.org/10.1016/j.jhydrol.2014.01.027
Jansen MJ (1998) Prediction error through modelling concepts and uncertainty from basic data. Nutr Cycl Agroecosyst 50(1):247–253. https://doi.org/10.1023/a:1009748529970
Kerry R, Goovaerts P, Rawlins BG, Marchant BP (2012) Disaggregation of legacy soil data using area to point kriging for mapping soil organic carbon at the regional scale. Geoderma 170(Supplement C):347–358. https://doi.org/10.1016/j.geoderma.2011.10.007
Kitanidis PK (1983) Statistical estimation of polynomial generalized covariance functions and hydrologic applications. Water Resour Res 19(4):909–921. https://doi.org/10.1029/WR019i004p00909
Kitanidis PK (1986) Parameter uncertainty in estimation of spatial functions: Bayesian analysis. Water Resour Res 22(4):499–507. https://doi.org/10.1029/WR022i004p00499
Kyriakidis PC (2004) A geostatistical framework for area-to-point spatial interpolation. Geogr Anal 36(3):259–289. https://doi.org/10.1111/j.15384632.2004.tb01135.x
Lark RM (2000) Estimating variograms of soil properties by the method-of-moments and maximum likelihood. Eur J Soil Sci 51(4):717–728. https://doi.org/10.1046/j.13652389.2000.00345.x
Lark RM, Cullis BR (2004) Model-based analysis using REML for inference from systematically sampled data on soil. Eur J Soil Sci 55(4):799–813. https://doi.org/10.1111/j.13652389.2004.00637.x
Le ND, Zidek JV (1992) Interpolation with uncertain spatial covariances: a Bayesian alternative to kriging. J Multivar Anal 43(2):351–374. https://doi.org/10.1016/0047259X(92)90040M
Lindley D (2004) That wretched prior. Significance 1(2):85–87. https://doi.org/10.1111/j.17409713.2004.026.x
Malone BP, McBratney AB, Minasny B, Laslett GM (2009) Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154(1–2):138–152. https://doi.org/10.1016/j.geoderma.2009.10.007
Marchant BP, Lark RM (2004) Estimating variogram uncertainty. Math Geol 36(8):867–898. https://doi.org/10.1023/B:MATG.0000048797.08986.a7
Marchant BP, Lark RM (2007) Optimized sample schemes for geostatistical surveys. Math Geol 39(1):113–134. https://doi.org/10.1007/s1100400690691
McElreath R (2016) Statistical rethinking: a Bayesian course with examples in R and Stan. Chapman & Hall/CRC texts in statistical science series. CRC Press, Boca Raton
Minasny B, Vrugt JA, McBratney AB (2011) Confronting uncertainty in model-based geostatistics using Markov Chain Monte Carlo simulation. Geoderma 163(3–4):150–162. https://doi.org/10.1016/j.geoderma.2011.03.011
Moraga P, Cramb SM, Mengersen KL, Pagano M (2017) A geostatistical model for combined analysis of point-level and area-level data using INLA and SPDE. Spat Stat 21:27–41. https://doi.org/10.1016/j.spasta.2017.04.006
Oliver M, Webster R (2014) A tutorial guide to geostatistics: computing and modelling variograms and kriging. Catena 113:56–69. https://doi.org/10.1016/j.catena.2013.09.006
Orton TG, Saby NPA, Arrouays D, Walter C, Lemercier B, Schvartz C, Lark RM (2012) Spatial prediction of soil organic carbon from data on large and variable spatial supports. I. Inventory and mapping. Environmetrics 23(2):129–147. https://doi.org/10.1002/env.2136
Orton TG, Román Dobarco M, Saby NPA (2017) Kriging based on areal summary statistics data: effects of within-unit variability on predictions and uncertainties. Spat Stat 19:42–67. https://doi.org/10.1016/j.spasta.2016.11.003
Orton TG, Mallawaarachchi T, Pringle MJ, Menzies NW, Dalal RC, Kopittke PM, Searle R, Hochman Z, Dang YP (2018) Quantifying the economic impact of soil constraints on Australian agriculture: a case-study of wheat. Land Degrad Dev 29(11):3866–3875. https://doi.org/10.1002/ldr.3130
Pardo-Igúzquiza E, Dowd P (2001) Variance–covariance matrix of the experimental variogram: assessing variogram uncertainty. Math Geol 33(4):397–419. https://doi.org/10.1023/a:1011097228254
Park NW (2013) Spatial downscaling of TRMM precipitation using geostatistics and fine scale environmental variables. Adv Meteorol 2013:9. https://doi.org/10.1155/2013/237126
Roth M (2013) On the multivariate t distribution. Report, Department of Electrical Engineering, Linköpings universitet. http://users.isy.liu.se/en/rt/roth/student.pdf. Accessed 30 Nov 2018
Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis. Chapman and Hall, London
Seaman JW, Seaman JW, Stamey JD (2012) Hidden dangers of specifying noninformative priors. Am Stat 66(2):77–84. https://doi.org/10.1080/00031305.2012.695938
Steinbuch L, Orton TG, Brus DJ (2019) Source code in the R programming language, belonging with: model based geostatistics from a Bayesian perspective: investigating area to point kriging with small datasets [Dataset]. 4TU. Centre for Research Data. https://doi.org/10.4121/uuid:1fe0c01e7f67435ba240800579adc6e6
Stigler SM (2007) The epic story of maximum likelihood. Stat Sci 22(4):598–620. https://doi.org/10.1214/07STS249
Traore SB, Ali A, Tinni SH, Samake M, Garba I, Maigari I, Alhassane A, Samba A, Diao MB, Atta S, Dieye PO, Nacro HB, Bouafou KGM (2014) AGRHYMET: a drought monitoring and capacity building center in the West Africa region. Weather Clim Extremes 3:22–30. https://doi.org/10.1016/j.wace.2014.03.008
Truong PN, Heuvelink GBM, Pebesma E (2014) Bayesian area-to-point kriging using expert knowledge as informative priors. Int J Appl Earth Obs Geoinf 30:128–138. https://doi.org/10.1016/j.jag.2014.01.019
Wackernagel H (2014) Geostatistics. Wiley StatsRef: Statistics Reference Online. https://doi.org/10.1007/9783662052945
Webster R, Oliver MA (2007) Geostatistics for environmental scientists. Wiley, Hoboken
White P, Gelfand A, Utlaut T (2017) Prediction and model comparison for areal unit data. Spat Stat 22:89–106. https://doi.org/10.1016/j.spasta.2017.09.002
You L, Wood S, Wood-Sichra U (2009) Generating plausible crop distribution maps for sub-Saharan Africa using a spatially disaggregated data fusion and optimization approach. Agric Syst 99(2–3):126–140. https://doi.org/10.1016/j.agsy.2008.11.003
Acknowledgements
This study was supported by the SIGMA European Collaborative Project (FP7-ENV-2013 SIGMA: Stimulating Innovation for Global Monitoring of Agriculture and its Impact on the Environment in support of GEOGLAM project), and by the Grains Research and Development Corporation (GRDC) of Australia under Project PROC9175385. The helpful comments and suggestions from the anonymous reviewers and from the editor are acknowledged with gratitude.
Appendix–Abbreviations
au: Abstract units (length)
ATPK: Area-to-point kriging
BAK: Bayesian areal kriging
BATPK: Bayesian area-to-point kriging
FIR: Fraction inside region
GRF: Gaussian random field
INLA: Integrated Nested Laplace Approximation
MCMC: Markov Chain Monte Carlo
ME: Mean error
ML: Maximum likelihood
MML: Maximum marginal likelihood
MPP: Mass preserving property
StSE: Standardised squared error
REML: Restricted maximum likelihood
RK: Regression kriging
RMSE: Root mean squared error
UK: Universal kriging
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Steinbuch, L., Orton, T.G. & Brus, D.J. Model-Based Geostatistics from a Bayesian Perspective: Investigating Area-to-Point Kriging with Small Data Sets. Math Geosci 52, 397–423 (2020). https://doi.org/10.1007/s11004019098406
Keywords
 Area-to-point kriging
 Spatial disaggregation
 Bayesian statistics
 Closed-form solutions