1 Introduction

Geostatistical spatial estimators are standard techniques used in spatial analyses in the geosciences and geographical information systems. They are applied in remote sensing (Van der Meer 2012), meteorology (Teegavarapu et al. 2015), machine learning (De Iaco et al. 2022), medical geology (Zhang et al. 2021) and epidemiology (Graham et al. 2004), among others. In this context, co-kriging (Matheron 1963) is a multivariate geostatistical interpolation method that is used to generate maps of a primary variable by using experimental data of that variable and experimental data from auxiliary variables correlated with the former. Without loss of generality, and for simplicity, we consider the case of only one auxiliary variable, which is also the most frequent case encountered in practice.

In geostatistics, an observed value at a spatial location \({\mathbf{u}}_{i}\) is modelled as a realisation of a random variable \(Z\left({\mathbf{u}}_{i}\right)\). For example, \(Z\) may represent a spatial variable, such as temperature, and \({\mathbf{u}}_{i}=\{{x}_{i},{y}_{i}\}\) is a spatial location on the plane; thus \(Z\left({\mathbf{u}}_{i}\right)\) models the temperature at that location. The set of all random variables \(Z({\mathbf{u}}_{i})\) at the locations \({\mathbf{u}}_{i}\in \chi \) of a region \(\chi \) is a random function, or random field, \(Z\left(\mathbf{u}\right)\). For \(\chi \subset {\mathbb{R}}^{d}\) with \(d=2\), the problem is two-dimensional, which is the case in the work presented here.

It is assumed that the random function \(Z(\mathbf{u})\) is second-order stationary, that is, with constant spatial mean and covariance function \({C}_{Z}\left(\mathbf{u},\mathbf{u}+\mathbf{h}\right)\) that depends only on the vector \(\mathbf{h}\)

$$ E\left\{ {Z\left( {\mathbf{u}} \right)} \right\} = m_{Z } , $$
(1)
$$ C_{Z} \left( {\mathbf{h}} \right) = E\left\{ {Z\left( {\mathbf{u}} \right)Z\left( {{\mathbf{u}} + {\mathbf{h}}} \right)} \right\} - m_{Z}^{2} , $$
(2)
$$ C_{Z} \left( 0 \right) = \sigma_{Z}^{2} , $$
(3)

where \({m}_{Z}, {\sigma }_{Z}^{2}\) and \({C}_{Z}(\mathbf{h})\) are the mean, variance and covariance, respectively, of the random function \(Z\left(\mathbf{u}\right)\) and \(E\{\cdot \}\) is the mathematical expectation operator. Although the variogram is most often used in geostatistics, only covariances are used in the work presented here.

Co-kriging is a linear estimator that can be written as (e.g., Journel and Huijbregts 1978; Ver Hoef and Cressie 1993)

$$ Z^{*} \left( {{\mathbf{u}}_{0} } \right) = \mathop \sum \limits_{i = 1}^{n} \lambda_{i}^{0} Z\left( {{\mathbf{u}}_{i} } \right) + \mathop \sum \limits_{j = 1}^{m} \beta_{j}^{0} Y\left( {{\mathbf{u}}_{j} } \right), $$
(4)

where \({Z}^{*}({\mathbf{u}}_{0})\) is the estimated value of the primary variable at the spatial location \({\mathbf{u}}_{0}=\{{x}_{0},{y}_{0}\}\), that is, the estimate of \(Z\left({\mathbf{u}}_{0}\right)\), where \({\mathbf{u}}_{0}\) is a point on the plane with coordinates \({x}_{0}\) as easting and \({y}_{0}\) as northing. \(\left\{Z\left({\mathbf{u}}_{i}\right); i=1, \dots ,n\right\}\) is the set of \(n\) experimental data of the primary variable in Eq. (4), and \(\left\{Y\left({\mathbf{u}}_{j}\right); j=1, \dots ,m\right\}\) is the set of \(m\) experimental data of the auxiliary variable used in Eq. (4). \({\lambda }_{i}^{0}\) is the weight applied to the primary variable \(Z\left({\mathbf{u}}_{i}\right)\) in the estimation of \(Z\left({\mathbf{u}}_{0}\right)\), and \({\beta }_{j}^{0}\) is the weight applied to the auxiliary variable \(Y\left({\mathbf{u}}_{j}\right)\) in the estimation of \(Z\left({\mathbf{u}}_{0}\right)\).

The sets of optimal weights \(\left\{{\lambda }_{i}^{0}; i=1, \dots ,n\right\}\) and \(\left\{{\beta }_{j}^{0}; j=1, \dots ,m\right\}\) are obtained by minimising the variance of the estimation error

$$ {\text{Var}}\left\{ {Z^{*} \left( {{\mathbf{u}}_{0} } \right) - Z\left( {{\mathbf{u}}_{0} } \right)} \right\}, $$
(5)

subject to the unbiasedness condition

$$ E\left\{ {Z^{*} \left( {{\mathbf{u}}_{0} } \right) - Z\left( {{\mathbf{u}}_{0} } \right)} \right\} = 0 .$$
(6)

The position of \({\mathbf{u}}_{0}\) can vary in order to define a grid (raster image), a polygon, lineation, and so on.

An important aspect to be considered in co-kriging is the spatial support of a random variable. Fig. 1 shows six random variables with three different types of support: point support, \(\bullet \), and two different sizes of areal support (\(v\) and \(V\)) with \(\bullet \ll v < V\). For example, \({Z}_{V}({\mathbf{u}}_{5})\) represents the mean value of the random variable over the surface \(V\).

Fig. 1

Spatial support of the random variable. In two-dimensional problems, the spatial support is the area to which the experimental information is assigned, that is, the pixel size or the area covered by the values

In two dimensions, the spatial support of a random variable is the area over which the random variable is measured. In remote sensing, for example, the spatial support may be the pixel size or the spatial resolution of a satellite image. A rain gauge measures rainfall on a spatial support that is very small with respect to the area over which the rainfall is to be estimated, and this is sometimes referred to as a point support. If rainfall were instead measured by the variation in the height of water in a swimming pool, the support would be the area of the swimming pool. If the mean rainfall over a river basin is known, the spatial support is the area of the river basin. When using data for the same variable measured on different spatial supports, the support effect must be taken into account.

Figure 2 provides two-dimensional illustrative representations of the various forms of co-kriging. Figure 2a is the classical application of (ordinary) co-kriging to estimate the primary variable at an unsampled spatial location by using the available primary variable data (shown as open circles) and the available auxiliary data (shown as solid squares). The cross represents a location at which the primary variable has not been measured and is to be estimated. The black cross can be located arbitrarily, and thus the primary variable can be estimated at a grid of locations to generate contour maps or colour-coded maps. Figure 2b provides an example of block ordinary co-kriging or upscaling co-kriging where the mean value of the grey square, or any other arbitrary polygon, is to be estimated. The areal average represents the mean value of the primary variable inside the polygon. The black cross at the centre of the square is a reference point and not necessarily a location to be estimated. The spatial support of the estimation is larger than the support of the experimental data and is an example of upscaling the primary variable. In Fig. 2c, the auxiliary variable has been measured at every location (blue square) of a grid or raster image, such as a satellite image or a digital elevation model. Often the primary variable is to be estimated on the same grid. An example of this is estimating rainfall from rainfall data measured in rain gauges together with altitude data as a secondary variable obtained from a digital elevation model (a raster image). In Fig. 2d, the objective is to estimate a directional derivative of the primary variable (open circles) by using the secondary data provided by the directional derivatives of the variable (arrows). In this way, the gradient of the primary variable can be estimated, and the auxiliary data could also be the known gradient values. In the co-kriging problem in Fig. 2e, the primary variable (a scalar) is to be estimated using the boundary conditions of the problem (no flow and constant value) expressed as directional derivatives. The derivative perpendicular to the boundary for the no-flow case and the derivative parallel to the boundary for the constant value case are null. Figure 2f is an example of inverse problem co-kriging. Although a groundwater flow model has been used here, the procedure is completely generalisable to other partial differential equation problems. In inverse problem co-kriging, the parameters of the model (primary variable) are estimated from the experimental values of the parameters and the model output values (auxiliary variables). In the groundwater flow problem, these are log-transmissivity and water head for the primary and auxiliary variables, respectively. In Fig. 2g, downscaling co-kriging can be used in a setting such as that shown in Fig. 2a, but it has a clearer application in increasing the spatial resolution of satellite images where the image is to be estimated at the same resolution as that of the secondary information. The spatial resolution to be estimated could even be a point support.

Fig. 2

a The general ordinary co-kriging case. b Block ordinary co-kriging or upscaling co-kriging. c The case in which the auxiliary variable has been measured at every location (blue square) of a grid or raster image. d The objective here is to estimate a directional derivative of the primary variable. e Co-kriging when the primary variable is to be estimated using the boundary conditions of the problem expressed as directional derivatives. f An example of inverse problem co-kriging. g Downscaling co-kriging

Extensions of co-kriging considered in the work presented here are as follows:

  • Co-kriging to estimate the directional derivative of the primary variable at specified locations by using the available directional derivative data and the gradient of the primary variable (Fig. 2d).

  • Boundary co-kriging, in which the primary variable is estimated using primary variable data together with secondary variable boundary conditions in the form of directional derivatives (Fig. 2e).

  • Inverse problem co-kriging, in which the primary and the auxiliary variables are physically linked by a differential equation with given boundary conditions. Inverse problem co-kriging is used to solve an inverse problem (Fig. 2f).

  • Downscaling co-kriging or estimating the primary variable on a support smaller than the support of the experimental data of the primary variable (Fig. 2g).

The different aspects of co-kriging are summarised in Fig. 3 and Table 1. In this paper, without any loss of generality, we consider the interpolation of a primary variable using primary variable data and auxiliary variable data, that is, ordinary co-kriging with two variables. If there are no available auxiliary variable data, co-kriging reduces to kriging (1). Ordinary co-kriging assumes that the spatial means of the variables are constant. If there is a spatially variable mean (a trend) in the primary or secondary variables, the universal co-kriging estimator is used (2). If the support (or pixel size) of the estimation is larger than that of the experimental support, then up-scaling co-kriging is applied (3). When the support to be estimated is smaller than the experimental support of the primary variable, then downscaling co-kriging is the interpolator (4). If the primary and auxiliary variables are related by a physical model described by a partial differential equation, then the interpolator is inverse problem co-kriging (5). Co-kriging can be adapted to estimate the directional derivatives of a scalar variable by using the directional derivatives as auxiliary data (6). The boundary conditions of no flow and fixed value can be incorporated into co-kriging with boundary conditions. The following sections provide a brief review of ordinary co-kriging before dealing with upscaling co-kriging, downscaling co-kriging, inverse problem co-kriging, co-kriging of directional derivatives and co-kriging with boundary conditions.

Fig. 3

The many forms of co-kriging

Table 1 The many forms of co-kriging shown as a diversity of interpolation problems

2 Ordinary Co-kriging

In the simplest case, a spatial variable of interest \(Z\left(\mathbf{u}\right)\), or primary variable, is to be estimated at a location \({\mathbf{u}}_{0}\) at which the variable was not sampled. The variable is to be estimated by using the experimental values of the primary variable \(\left\{Z\left({\mathbf{u}}_{i}\right); i=1, \dots ,n\right\}\) and the experimental values of an auxiliary variable \(\left\{Y\left({\mathbf{u}}_{j}\right); j=1, \dots ,m\right\}\). The ordinary co-kriging estimator is given in Eq. (4), and the optimal weights are obtained by minimising the variance of the estimation error given in Eq. (5) subject to the unbiasedness condition in Eq. (6). The unbiasedness condition of the co-kriging in Eq. (6) implies that the following conditions must be satisfied

$$ \mathop \sum \limits_{i = 1}^{n} \lambda_{i}^{0} = 1, $$
(7)
$$ \mathop \sum \limits_{j = 1}^{m} \beta_{j}^{0} = 0. $$
(8)

Isaaks and Srivastava (1989) showed that a single unbiasedness condition in co-kriging

$$ \mathop \sum \limits_{i = 1}^{n} \lambda_{i}^{0} + \mathop \sum \limits_{j = 1}^{m} \beta_{j}^{0} = 1, $$
(9)

may be preferable to using the two individual unbiasedness conditions in Eqs. (7) and (8). In particular, the condition in Eq. (8) that the sum of the weights assigned to the secondary variable must be zero implies that, unless all of these weights are zero, some of them must be negative, which can produce problematic estimates. This problem is avoided by using the single unbiasedness condition given in Eq. (9).
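
As a short check of Eqs. (7) to (9), one can take expectations in Eq. (4) under second-order stationarity, writing \({m}_{Y}\) for the mean of the auxiliary variable (the symbol that also appears in Eq. (11)):

$$ E\left\{ {Z^{*} \left( {{\mathbf{u}}_{0} } \right) - Z\left( {{\mathbf{u}}_{0} } \right)} \right\} = m_{Z} \left( {\mathop \sum \limits_{i = 1}^{n} \lambda_{i}^{0} - 1} \right) + m_{Y} \mathop \sum \limits_{j = 1}^{m} \beta_{j}^{0} . $$

The bias vanishes for arbitrary values of \({m}_{Z}\) and \({m}_{Y}\) when Eqs. (7) and (8) hold. Under the additional assumption that the two variables share the same mean, \({m}_{Y}={m}_{Z}\) (for example, after the auxiliary variable has been shifted to the mean of the primary variable), the right-hand side becomes \({m}_{Z}\left(\sum {\lambda }_{i}^{0}+\sum {\beta }_{j}^{0}-1\right)\), which vanishes under the single condition in Eq. (9).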

The ordinary co-kriging system is given in “Appendix A”. The constant mean condition in Eq. (1) can be relaxed to consider a spatially variable mean. In the latter case, the interpolator is given by universal co-kriging, which is reviewed in “Appendix B”.

The variance of the estimation error, or estimation variance, can be written as (Myers 1982, 1983, 1991; Isaaks and Srivastava 1989; Wackernagel 2003)

$$ \begin{aligned} & {\text{Var}}\left\{ {Z^{*} \left( {{\mathbf{u}}_{0} } \right) - Z\left( {{\mathbf{u}}_{0} } \right)} \right\} = \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} \lambda_{i}^{0} \lambda_{j}^{0} C_{Z} \left( {{\mathbf{h}}_{ij} } \right) + \mathop \sum \limits_{i = 1}^{m} \mathop \sum \limits_{j = 1}^{m} \beta_{i}^{0} \beta_{j}^{0} C_{Y} \left( {{\mathbf{h}}_{ij} } \right) \\ & \quad + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{m} \lambda_{i}^{0} \beta_{j}^{0} C_{ZY} \left( {{\mathbf{h}}_{ij} } \right) + \mathop \sum \limits_{j = 1}^{m} \mathop \sum \limits_{i = 1}^{n} \beta_{j}^{0} \lambda_{i}^{0} C_{YZ} \left( {{\mathbf{h}}_{ji} } \right) \\ & \quad - 2\mathop \sum \limits_{i = 1}^{n} \lambda_{i}^{0} C_{Z} \left( {{\mathbf{h}}_{i0} } \right) - 2\mathop \sum \limits_{j = 1}^{m} \beta_{j}^{0} C_{YZ} \left( {{\mathbf{h}}_{j0} } \right) + C_{Z} \left( {{\mathbf{h}}_{00} } \right), \\ \end{aligned} $$
(10)

where \({C}_{Z}({\mathbf{h}}_{ij})\) is the covariance between the random variables \(Z({\mathbf{u}}_{i})\) and \(Z({\mathbf{u}}_{j})\), which are separated by the vector \({\mathbf{h}}_{ij}={\mathbf{u}}_{j}-{\mathbf{u}}_{i}\). In a similar way, \({C}_{ZY}({\mathbf{h}}_{ij})\) is defined as the cross-covariance between the random variables \(Z({\mathbf{u}}_{i})\) and \(Y({\mathbf{u}}_{j})\)

$$ C_{ZY} \left( {{\mathbf{h}}_{ij} } \right) = E\left\{ {Z\left( {{\mathbf{u}}_{i} } \right)Y\left( {{\mathbf{u}}_{j} } \right)} \right\} - m_{Z} m_{Y} . $$
(11)

Similarly, \({C}_{YZ}({\mathbf{h}}_{ji})\) is the cross-covariance between the random variables \(Y\left({\mathbf{u}}_{j}\right)\) and \(Z({\mathbf{u}}_{i})\).

If the number of auxiliary data, \(m\), in Eq. (4) is zero, then ordinary co-kriging reduces to ordinary kriging. The main additional requirement is the inclusion of the auxiliary information in the estimator in Eq. (4). This is a type of statistical inference because, in addition to the inference of the covariance of the primary variable, it is necessary to infer the covariance of the auxiliary variable and the cross-covariance between the two variables, although in special cases some simplifications could be assumed (Journel 1999; Babak and Deutsch 2009). The co-kriging system is given in “Appendix A” and is known as the ordinary co-kriging system (Journel and Huijbregts 1978; Isaaks and Srivastava 1989; Deutsch and Journel 1992; Goovaerts 1997; Chilès and Delfiner 1999; Remy et al. 2009) and is the most often used in practice, as in mining (e.g., Journel and Huijbregts 1978), hydrogeology (e.g., Hoeksema et al. 1989), remote sensing (e.g., Atkinson et al. 1994), soil science (e.g., Lesch et al. 1995), geophysics (e.g., Doyen 1988), meteorology and climatology (e.g., Pardo-Igúzquiza 1998), among many other earth science disciplines.
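
The following is a minimal sketch of how the estimator in Eq. (4) with the constraints in Eqs. (7) and (8) might be assembled and solved numerically. It is not the system of “Appendix A” verbatim: the bordered matrix layout, the function names and the data are illustrative assumptions, and the covariances follow an assumed proportional (intrinsic coregionalisation) model with an exponential correlation, chosen only because it is guaranteed to be valid.

```python
import numpy as np

def rho(h, a=10.0):
    """Exponential correlation with range parameter a (assumed model)."""
    return np.exp(-np.asarray(h) / a)

def dist(p, q):
    """Pairwise Euclidean distances between two sets of 2-D points."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)

def ordinary_cokriging(uz, z, uy, y, u0, sz=1.0, sy=1.0, r=0.7, a=10.0):
    """Estimate Z(u0) from primary data (uz, z) and auxiliary data (uy, y).

    Assumed model: C_Z = sz^2 rho, C_Y = sy^2 rho, C_ZY = r sz sy rho.
    Returns the estimate and the weights lambda and beta.
    """
    n, m = len(uz), len(uy)
    A = np.zeros((n + m + 2, n + m + 2))
    A[:n, :n] = sz**2 * rho(dist(uz, uz), a)                 # C_Z block
    A[:n, n:n + m] = r * sz * sy * rho(dist(uz, uy), a)      # C_ZY block
    A[n:n + m, :n] = A[:n, n:n + m].T                        # C_YZ block
    A[n:n + m, n:n + m] = sy**2 * rho(dist(uy, uy), a)       # C_Y block
    A[:n, n + m] = A[n + m, :n] = 1.0                        # sum(lambda) = 1
    A[n:n + m, n + m + 1] = A[n + m + 1, n:n + m] = 1.0      # sum(beta) = 0

    b = np.zeros(n + m + 2)
    b[:n] = sz**2 * rho(dist(uz, u0[None, :])[:, 0], a)
    b[n:n + m] = r * sz * sy * rho(dist(uy, u0[None, :])[:, 0], a)
    b[n + m] = 1.0                                           # right-hand side of Eq. (7)

    sol = np.linalg.solve(A, b)
    lam, beta = sol[:n], sol[n:n + m]
    return lam @ z + beta @ y, lam, beta

# Hypothetical data: three primary and four auxiliary observations.
uz = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
z = np.array([1.0, 2.0, 3.0])
uy = np.array([[5.0, 5.0], [10.0, 10.0], [2.0, 8.0], [8.0, 2.0]])
y = np.array([0.5, 1.5, 2.5, 1.0])
u0 = np.array([4.0, 4.0])

est, lam, beta = ordinary_cokriging(uz, z, uy, y, u0)
est_at_datum, _, _ = ordinary_cokriging(uz, z, uy, y, uz[0])
```

With these constraints the primary weights sum to one and the auxiliary weights sum to zero, and estimating at a primary data location returns the datum itself, which is a convenient sanity check on any implementation.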

3 Upscaling Co-kriging

A random variable \(Z({\mathbf{u}}_{0})\) with point support is a random variable at the spatial location \({\mathbf{u}}_{0}=\{{x}_{0},{y}_{0}\}\), that is, a point in the plane. However, \(Z({\mathbf{u}}_{0})\) could represent the mean value of an area (areal support), for example, a square centred at the spatial location \({\mathbf{u}}_{0}=\{{x}_{0},{y}_{0}\}\). In this case, the value of the random variable at any location is an areal average and is denoted \({Z}_{V}({\mathbf{u}}_{0})\), referring to the mean value of the random function \(Z(\mathbf{u})\) over the polygon \(V\)

$$ Z_{V} \left( {{\mathbf{u}}_{0} } \right) = \frac{1}{\left| V \right|}\int\limits_{V} {Z\left( {\mathbf{u}} \right){\text{d}}{\mathbf{u}}}. $$
(12)

In the notation \({Z}_{V}({\mathbf{u}}_{0})\), \({\mathbf{u}}_{0}\) is an arbitrary point denoting the polygon \(V\), for example, the centroid, and \(\left|V\right|\) is the area of \(V\).

The ordinary co-kriging estimator \({Z}_{V}^{*}({\mathbf{u}}_{0})\) is similar to the estimator in Eq. (4), and the weights are obtained by solving the block co-kriging system, similar to that given in “Appendix A”, but with a new vector \(\mathbf{B}\)

$$ {\mathbf{B}} = \left[ {\begin{array}{*{20}c} {C_{Z} \left( {{\mathbf{h}}_{1V} } \right)} \\ \vdots \\ {C_{Z} \left( {{\mathbf{h}}_{nV} } \right)} \\ {C_{ZY} \left( {{\mathbf{h}}_{V1} } \right)} \\ \vdots \\ {C_{ZY} \left( {{\mathbf{h}}_{Vm} } \right)} \\ 1 \\ 0 \\ \end{array} } \right] , $$
(13)

where \({C}_{Z}({\mathbf{h}}_{iV})\) is the mean covariance between the \(i\)th experimental location and the polygon \(V({\mathbf{u}}_{0})\)

$$ C_{Z} \left( {{\mathbf{h}}_{iV} } \right) \approx \frac{1}{M}\mathop \sum \limits_{j = 1}^{M} C_{Z} \left( {{\mathbf{h}}_{ij} } \right), $$
(14)

in which the polygon \(V\) has been approximated by \(M\) points, for example, points on a regular grid inside the polygon. In the same manner, the cross-covariance \({C}_{ZY}({\mathbf{h}}_{Vj})\) can be defined between the primary variable with support \(V\) and the secondary variable with point support.

Note that the support of the estimate is equal to, or greater than, the support of the experimental data. Equation (12) can be made more general by introducing a function that gives a different weight to every point that defines the support \(V\), for example, in remote sensing, in which a satellite sensor has a point spread function that, in general, gives more weight to the centre of a pixel than to the borders.
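
The discretisation in Eq. (14) can be sketched in a few lines. The example below assumes a square block, a regular \(k\times k\) grid of discretisation points and an exponential covariance model; all names and parameter values are illustrative rather than taken from the paper.

```python
import numpy as np

def cov_exp(h, sill=1.0, a=10.0):
    """Assumed point-support covariance model: exponential, range parameter a."""
    return sill * np.exp(-np.asarray(h) / a)

def block_points(centre, side, k=5):
    """k x k regular grid of points discretising a square block V."""
    offs = ((np.arange(k) + 0.5) / k - 0.5) * side   # cell-centre offsets
    gx, gy = np.meshgrid(offs, offs)
    return np.column_stack([centre[0] + gx.ravel(), centre[1] + gy.ravel()])

def point_to_block_cov(ui, centre, side, k=5):
    """Eq. (14): average of the point covariance over the M = k*k block points."""
    pts = block_points(centre, side, k)
    h = np.linalg.norm(pts - ui, axis=1)
    return cov_exp(h).mean()

# Hypothetical datum at (3, 4) and a 10 x 10 block centred at the origin.
ui = np.array([3.0, 4.0])
centre = np.array([0.0, 0.0])
c_iV = point_to_block_cov(ui, centre, side=10.0)
```

As the block side shrinks towards zero the value converges to the point-to-point covariance, and a non-uniform weighting of the discretisation points (for instance a point spread function) could replace the plain mean.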

4 Downscaling Co-kriging

In downscaling co-kriging, the estimation support of the primary variable is smaller than its experimental support. A typical application is in remote sensing where the spatial resolution of a satellite image, for a particular spectral band, is to be increased. The ordinary co-kriging estimator in Eq. (4) can be rewritten to take explicit account of the support of each random function, as

$$ Z_{v}^{*} \left( {{\mathbf{u}}_{0} } \right) = \mathop \sum \limits_{i = 1}^{n} \lambda_{i}^{0} Z_{V} \left( {{\mathbf{u}}_{i} } \right) + \mathop \sum \limits_{j = 1}^{m} \beta_{j}^{0} Y_{v} \left( {{\mathbf{u}}_{j} } \right), $$
(15)

where \(v\) is the high spatial resolution support (or pixel size), and \(V\) is the low spatial resolution pixel size, expressed as

$$ \bullet \ll v < V, $$
(16)

where \(\bullet \) represents the point support.

Solving the downscaling co-kriging system provides the weights of the estimator given in Eq. (15). The system is similar to the co-kriging system given in “Appendix A”, but takes into account the different supports, as shown, for example, by the following matrix \(\mathbf{B}\)

$$ {\mathbf{B}} = \left[ {\begin{array}{*{20}c} {C_{Z}^{Vv} \left( {{\mathbf{h}}_{10} } \right)} \\ \vdots \\ {C_{Z}^{Vv} \left( {{\mathbf{h}}_{n0} } \right)} \\ {C_{ZY}^{vv} \left( {{\mathbf{h}}_{01} } \right)} \\ \vdots \\ {C_{ZY}^{vv} \left( {{\mathbf{h}}_{0m} } \right)} \\ 1 \\ 0 \\ \end{array} } \right]. $$
(17)

\({C}_{Z}^{Vv}({\mathbf{h}}_{10})\) is the covariance between the random variables \({Z}_{V}({\mathbf{u}}_{1})\) and \({Z}_{v}({\mathbf{u}}_{0})\), and \({C}_{ZY}^{vv}({\mathbf{h}}_{01})\) is the cross-covariance between \({Z}_{v}({\mathbf{u}}_{0})\) and \({Y}_{v}({\mathbf{u}}_{1})\).

There are no experimental data for the random variable \({Z}_{v}(\mathbf{u})\) because it has not been observed at that high spatial resolution. Thus, the matrix in Eq. (17) cannot be experimentally estimated. The solution proposed in Pardo-Igúzquiza et al. (2006) and Atkinson et al. (2008) is a numerical one that implies convolutions and deconvolutions. The method consists of proposing covariance models with point support \(C_{Z}^{ \bullet \bullet } \left( {\mathbf{h}} \right)\) that are introduced in the equation (Matheron 1963)

$$ \tilde{C}_{Z}^{VV} \left( {\mathbf{h}} \right) = C_{Z}^{ \bullet \bullet } \left( {\mathbf{h}} \right)*\rho_{V} \left( {\mathbf{h}} \right), $$
(18)

which produces the induced covariance model \({\widetilde{C}}_{Z}^{VV}(\mathbf{h})\) that can be compared with the experimental covariance \({\widehat{C}}_{Z}^{VV}(\mathbf{h})\). In Eq. (18), \({\rho }_{V}(\mathbf{h})\) is the geometric covariogram in Matheron (1975), and * is the convolution operator. The iterative process consists in finding the covariance model with point support that minimises the difference between the covariance with support \(V\) induced by Eq. (18) and the experimental covariance with support \(V\). Once the covariance model with point support has been estimated, the covariance over any other support can be calculated using Eq. (18). In particular, all the covariances required in the downscaling co-kriging system can be calculated. A detailed account of downscaling co-kriging is given in Pardo-Igúzquiza et al. (2006) and Pardo-Igúzquiza and Atkinson (2007), and a computer program for downscaling co-kriging is given in Pardo-Igúzquiza et al. (2010). In remote sensing, downscaling co-kriging can be used for image sharpening and image fusion.
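
The deconvolution idea can be illustrated with a small numerical sketch. It is not the published algorithm: the convolution with the geometric covariogram in Eq. (18) is replaced here by an equivalent discretised block-to-block average, the point-support model is assumed to be exponential, and the iterative search is reduced to a grid search over a hypothetical list of candidate ranges with the sill fitted by least squares.

```python
import numpy as np

def block_pts(cx, side, k=4):
    """k x k points discretising a square pixel centred at (cx, 0)."""
    offs = ((np.arange(k) + 0.5) / k - 0.5) * side
    gx, gy = np.meshgrid(offs, offs)
    return np.column_stack([cx + gx.ravel(), gy.ravel()])

def regularised_corr(lags, a, side, k=4):
    """Pixel-to-pixel average of exp(-|h|/a) for pixels separated along x."""
    p0 = block_pts(0.0, side, k)
    out = []
    for h in lags:
        p1 = block_pts(h, side, k)
        d = np.linalg.norm(p0[:, None, :] - p1[None, :, :], axis=-1)
        out.append(np.exp(-d / a).mean())
    return np.array(out)

def deconvolve(lags, c_experimental, side, candidates):
    """Grid search for the point-support range; the sill enters linearly."""
    best = None
    for a in candidates:
        g = regularised_corr(lags, a, side)
        sill = (g @ c_experimental) / (g @ g)      # least-squares sill
        err = np.sum((sill * g - c_experimental) ** 2)
        if best is None or err < best[0]:
            best = (err, sill, a)
    return best[1], best[2]

# Synthetic "experimental" covariance on support V (pixel side 10),
# generated from a known point-support model so the answer can be checked.
side_V = 10.0
lags = side_V * np.arange(8)
c_hat_VV = 2.0 * regularised_corr(lags, 15.0, side_V)

sill_pt, range_pt = deconvolve(lags, c_hat_VV, side_V, np.arange(5.0, 30.0, 1.0))

# With the point-support model in hand, the covariance on a finer
# support v (pixel side 2.5) follows from the same averaging.
c_vv = sill_pt * regularised_corr(2.5 * np.arange(8), range_pt, 2.5)
```

In this synthetic setting the search recovers the range and sill that generated the data; with real images the experimental covariance is noisy and the fit is only approximate.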

5 Co-kriging as a Solution to the Inverse Problem

Another application of co-kriging is related to the geostatistical solution of the inverse problem. For a physical system, the direct problem is to predict the response of the system given the values of the parameters that characterise it. The inverse problem uses experimental measurements to infer the values of the parameters that characterise the system. For example, solving the inverse problem in groundwater hydrology consists of gathering transmissivity and water head data and taking into account that both variables are related by the groundwater flow equations (Hoeksema and Kitanidis 1984; Dagan 1985; Ahmed and De Marsily 1993). Transmissivity is the primary variable that defines the parameters of the system. Water head, the output of the direct problem, is the auxiliary variable. The co-kriging solution of the inverse problem consists of using co-kriging with a cross-covariance between transmissivity and water head obtained from theoretical considerations by taking into account the groundwater flow equations and the aquifer boundary conditions; in particular, taking into account the steady-state groundwater flow equations (Kitanidis 1997)

$$ \frac{{\partial Z\left( {\mathbf{u}} \right)}}{\partial x}\frac{{\partial Y\left( {\mathbf{u}} \right)}}{\partial x} + \frac{{\partial Z\left( {\mathbf{u}} \right)}}{\partial y}\frac{{\partial Y\left( {\mathbf{u}} \right)}}{\partial y} + \frac{{\partial^{2} Y\left( {\mathbf{u}} \right)}}{{\partial x^{2} }} + \frac{{\partial^{2} Y\left( {\mathbf{u}} \right)}}{{\partial y^{2} }} = 0, $$
(19)

where \(Z\left(\mathbf{u}\right)\) is the logarithm of transmissivity and \(Y(\mathbf{u})\) is the water head. The cross-covariance between the primary and auxiliary variables in Eq. (19), together with any boundary conditions, can be found analytically or by a Monte Carlo methodology (Kitanidis 1997). Dagan’s (1985) analytical solution for Eq. (19) is given in “Appendix C”.
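
To make the Monte Carlo route concrete, the sketch below uses a one-dimensional analogue of Eq. (19), \(\left({e}^{Z}{Y}^{\prime}\right)^{\prime}=0\) with fixed heads at both ends, for which the head follows directly from the cumulative hydraulic resistance. This simplification, the exponential covariance of \(Z\), the Cholesky-based simulation and all parameter values are assumptions made for illustration; the two-dimensional problem would require a numerical flow solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_real = 50, 2000           # grid cells and Monte Carlo realisations
dx, a, var_z = 1.0, 10.0, 0.5        # cell size, range and variance of log-T
H1, H2 = 400.0, 300.0                # fixed heads at the two ends

# Simulate correlated log-transmissivity fields Z at the cell centres.
xc = (np.arange(n_cells) + 0.5) * dx
cov_z = var_z * np.exp(-np.abs(xc[:, None] - xc[None, :]) / a)
L = np.linalg.cholesky(cov_z)
Z = (L @ rng.standard_normal((n_cells, n_real))).T       # (n_real, n_cells)

# Direct problem in 1-D: the flux is constant, so the head drop across
# each cell is proportional to its resistance dx / T = dx * exp(-Z).
resist = np.exp(-Z) * dx
cum = np.cumsum(resist, axis=1)
frac = np.hstack([np.zeros((n_real, 1)), cum]) / cum[:, -1:]
Y_nodes = H1 - (H1 - H2) * frac                           # heads at cell faces
Y = 0.5 * (Y_nodes[:, :-1] + Y_nodes[:, 1:])              # heads at cell centres

# Empirical cross-covariance: C_ZY[i, j] estimates Cov{Z(x_i), Y(x_j)}.
Zc = Z - Z.mean(axis=0)
Yc = Y - Y.mean(axis=0)
C_ZY = Zc.T @ Yc / (n_real - 1)
```

The resulting matrix depends on both locations and not only on their separation, which reflects the influence of the boundary conditions on the cross-covariance.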

6 Co-kriging for Estimating Directional Derivatives

For some problems in the geosciences and in other disciplines, the interest is in calculating the directional derivative of a given scalar variable. For example, the hydraulic gradient is a vector variable, the two components of which are the directional derivatives of the water head in the directions of the principal axes. Co-kriging can be adapted for this type of estimation. Furthermore, by using linear systems theory, the covariance of the directional derivative and the cross-covariance between the directional derivative and the scalar variable can be theoretically obtained from the covariance of the scalar variable. Other earth science applications can be found in terrain analysis, remote sensing, geophysics and meteorology, among others.

The directional derivative of the random function \(Z(\mathbf{u})\) can be estimated by extending the trend model in Eq. (B2) to distinguish two components in the residual \(R(\mathbf{u})\), namely a correlated stochastic component \(\widetilde{R}(\mathbf{u})\), and a non-correlated stochastic component \(N(\mathbf{u})\). The geostatistical model is then

$$ Z\left( {\mathbf{u}} \right) = m\left( {\mathbf{u}} \right) + \tilde{R}\left( {\mathbf{u}} \right) + N\left( {\mathbf{u}} \right) . $$
(20)

There is no correlation between the three components in Eq. (20). The non-correlated stochastic component \(N(\mathbf{u})\) is a zero-mean random function with no spatial correlation and is discontinuous at every point, is non-differentiable, and represents the variability associated with a second-order stationary nugget covariance \({C}_{N}\left(\mathbf{h}\right)\)

$$ C_{N} \left( {\mathbf{h}} \right) = \left\{ {\begin{array}{*{20}c} {C_{0} } & {\left| {\mathbf{h}} \right| = 0} \\ 0 & {\left| {\mathbf{h}} \right| > 0} \\ \end{array} } \right.. $$
(21)

In applications, there will be a component \(N(\mathbf{u})\) if there is a nugget variance in the experimental covariance.

The correlated stochastic component \(\widetilde{R}(\mathbf{u})\) is a zero-mean random function with spatial covariance \(\widetilde{C}(\mathbf{h})\). Thus, the spatial covariance \(C(\mathbf{h})\) of the random function \(Z(\mathbf{u})\) is given by

$$ C\left( {\mathbf{h}} \right) = \left\{ {\begin{array}{*{20}c} {C_{0} + \tilde{C}\left( 0 \right)} & {\left| {\mathbf{h}} \right| = 0} \\ {\tilde{C}\left( {\mathbf{h}} \right)} & {\left| {\mathbf{h}} \right| > 0} \\ \end{array} } \right..$$
(22)

A random function, \(\widetilde{Z}(\mathbf{u})\), that is continuous and differentiable can be obtained from \(Z(\mathbf{u})\) by filtering the nugget component,

$$ \tilde{Z}\left( {\mathbf{u}} \right) = m\left( {\mathbf{u}} \right) + \tilde{R}\left( {\mathbf{u}} \right), $$
(23)

the covariance of which is \(\widetilde{C}(\mathbf{h})\). For \(\widetilde{Z}(\mathbf{u})\) to be differentiable, the first requirement is that the trend, \(m\left(\mathbf{u}\right)\), must be differentiable (Parzen 1972), which is the case for the polynomial drift considered here. A second requirement is that the covariance, \(\widetilde{C}(\mathbf{h})\), must be at least twice differentiable. In particular, the Gaussian covariance and the Matérn covariance (Pardo-Igúzquiza et al. 2009) are appropriate.

We use \({D}_{\mathbf{e}}\{\widetilde{Z}(\mathbf{u})\}\) to denote the directional derivative of \(\widetilde{Z}(\mathbf{u})\) in the direction described by the unitary vector \(\mathbf{e}=\mathrm{cos}(\varphi )\mathbf{i}+\mathrm{sin}(\varphi )\mathbf{j}\), where \(\varphi \) is the counterclockwise angle between the unitary vector \(\mathbf{e}\) and the X axis, and \(\mathbf{i}\) and \(\mathbf{j}\) are the unitary vectors of the coordinate axes X and Y. The directional derivative is defined as

$$ D_{{\mathbf{e}}} \left\{ {\tilde{Z}\left( {\mathbf{u}} \right)} \right\} = \mathop {{\text{lim}}}\limits_{h \to 0} \left[ {\frac{{\tilde{Z}\left( {{\mathbf{u}} + h{\mathbf{e}}} \right) - \tilde{Z}\left( {\mathbf{u}} \right)}}{h}} \right]. $$
(24)

The gradient of a scalar random function \(\widetilde{Z}(\mathbf{u})\), denoted by \(\nabla \widetilde{Z}(\mathbf{u})\), is a vector field defined at any point \(\mathbf{u}\) by its components

$$ \nabla \tilde{Z}\left( {\mathbf{u}} \right) = \left( {D_{{\mathbf{i}}} \left\{ {\tilde{Z}\left( {\mathbf{u}} \right)} \right\},D_{{\mathbf{j}}} \left\{ {\tilde{Z}\left( {\mathbf{u}} \right)} \right\}} \right). $$
(25)

The relationship between the directional derivative and the gradient is (Bradley and Smith 1989; Meyer et al. 2001)

$$ D_{{\mathbf{e}}} \left\{ {\tilde{Z}\left( {\mathbf{u}} \right)} \right\} = \nabla \tilde{Z}\left( {\mathbf{u}} \right) \cdot {\mathbf{e}}, $$
(26)

where \(\mathbf{a}\cdot \mathbf{b}\) is the scalar product of the vectors \(\mathbf{a}\) and \(\mathbf{b}\).

Direct measurements of the gradient can be incorporated in the estimation of the directional derivative by using co-kriging

$$ \hat{D}_{{{\mathbf{e}}_{0} }} \left\{ {\tilde{Z}\left( {{\mathbf{u}}_{0} } \right)} \right\} = \mathop \sum \limits_{i = 1}^{n} \lambda_{i}^{0} Z\left( {{\mathbf{u}}_{i} } \right) + \mathop \sum \limits_{j = 1}^{m} \beta_{j}^{0} D_{{{\mathbf{e}}_{j} }} \left\{ {\tilde{Z}\left( {{\mathbf{u}}_{j} } \right)} \right\}, $$
(27)

where \({\widehat{D}}_{{\mathbf{e}}_{0}}\{\widetilde{Z}({\mathbf{u}}_{0})\}\) is the estimator of the directional derivative \({D}_{{\mathbf{e}}_{0}}\{\widetilde{Z}({\mathbf{u}}_{0})\}\) at the spatial location \({\mathbf{u}}_{0}\) in direction \({\mathbf{e}}_{0}\) using \(n\) experimental values \(Z({\mathbf{u}}_{i})\) and \(m\) experimental measurements of the directional derivatives of \(Z(\mathbf{u})\). \({D}_{{\mathbf{e}}_{j}}\{\widetilde{Z}({\mathbf{u}}_{j})\}\) provides the auxiliary information in the same way that \(Y({\mathbf{u}}_{j})\) does in Eq. (4).

The concept of estimating the gradient by ordinary kriging is given in Philip and Kitanidis (1989), and the estimation of the directional derivative by co-kriging is given in Pardo-Igúzquiza and Chica-Olmo (2004, 2007) in which co-kriging has been extended to take into account the trend that is observed in many spatial variables when considered at the regional scale. This is the case, for example, for the piezometric level or the water head in an aquifer, the rainfall in an area of rough orography, and the trend shown by geophysical variables associated with a sedimentary basin or a tectonic structure.
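
The covariances needed by the estimator in Eq. (27) follow from differentiating the covariance of the scalar field. As a hedged illustration, the sketch below assumes an isotropic Gaussian covariance \(\widetilde{C}(\mathbf{h})={\sigma }^{2}\mathrm{exp}(-{\left|\mathbf{h}\right|}^{2}/{a}^{2})\) with \(\mathbf{h}={\mathbf{u}}^{\prime}-\mathbf{u}\), and uses the standard identities \(\mathrm{Cov}\{\widetilde{Z}(\mathbf{u}),{D}_{\mathbf{e}}\widetilde{Z}({\mathbf{u}}^{\prime})\}=\nabla \widetilde{C}(\mathbf{h})\cdot \mathbf{e}\) and \(\mathrm{Cov}\{{D}_{{\mathbf{e}}_{1}}\widetilde{Z}(\mathbf{u}),{D}_{{\mathbf{e}}_{2}}\widetilde{Z}({\mathbf{u}}^{\prime})\}=-{\mathbf{e}}_{1}^{T}\mathbf{H}(\mathbf{h}){\mathbf{e}}_{2}\), where \(\mathbf{H}\) is the Hessian of \(\widetilde{C}\). Function names and parameter values are illustrative.

```python
import numpy as np

SILL, A = 1.0, 5.0   # assumed variance and range of the Gaussian model

def c_gauss(h):
    """Gaussian covariance of the scalar field for a 2-D lag vector h."""
    h = np.asarray(h, dtype=float)
    return SILL * np.exp(-(h @ h) / A**2)

def unit(phi):
    """Unit vector e = cos(phi) i + sin(phi) j."""
    return np.array([np.cos(phi), np.sin(phi)])

def cross_cov_z_dz(h, e):
    """Cov{Z(u), D_e Z(u + h)}: gradient of C at h projected on e."""
    h = np.asarray(h, dtype=float)
    grad = -2.0 * h / A**2 * c_gauss(h)
    return grad @ e

def cov_dz_dz(h, e1, e2):
    """Cov{D_e1 Z(u), D_e2 Z(u + h)}: minus e1^T Hessian(C)(h) e2."""
    h = np.asarray(h, dtype=float)
    hess = c_gauss(h) * (4.0 * np.outer(h, h) / A**4 - 2.0 * np.eye(2) / A**2)
    return -(e1 @ hess @ e2)

# Example: lag (2, 1), derivatives taken at 30 degrees and along the X axis.
h = np.array([2.0, 1.0])
e = unit(np.pi / 6)
c_z_dz = cross_cov_z_dz(h, e)
c_dz_dz = cov_dz_dz(h, e, unit(0.0))
var_dz = cov_dz_dz(np.zeros(2), e, e)    # variance of the directional derivative
```

For this model the variance of any directional derivative is \(2{\sigma }^{2}/{a}^{2}\), and derivatives in perpendicular directions at the same location are uncorrelated.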

7 Co-kriging Taking Boundary Conditions into Account

This extension of co-kriging is related to the previous one, but the variable of interest is not the derivative of the primary variable but rather the primary variable itself. Thus, boundary conditions can be included in the co-kriging estimation by expressing the boundary conditions in the form of directional derivatives that play the role of the auxiliary variable. For example, no-flow boundary conditions in an aquifer indicate that the directional derivatives of water head perpendicular to the boundary are null. In a similar manner, constant-head boundary conditions indicate that the directional derivatives parallel to the boundary are null (Fig. 2e). Thus, the new co-kriging estimator is given by

$$ \tilde{Z}^{*} \left( {{\mathbf{u}}_{0} } \right) = \mathop \sum \limits_{i = 1}^{n} \lambda_{i}^{0} Z\left( {{\mathbf{u}}_{i} } \right) + \mathop \sum \limits_{j = 1}^{m} \beta_{j}^{0} D_{{{\mathbf{e}}_{j} }} \left\{ {\tilde{Z}\left( {{\mathbf{u}}_{j} } \right)} \right\}. $$
(28)

Although the right-hand sides of Eqs. (27) and (28) are the same, the weights will be different because the co-kriging systems are different. For example, the \(\mathbf{B}\) matrix is given by

$$ {\mathbf{B}} = \left[ {\begin{array}{*{20}c} {\tilde{C}\left\{ {{\mathbf{h}}_{10} } \right\}} \\ \vdots \\ {\tilde{C}\left\{ {{\mathbf{h}}_{n0} } \right\}} \\ {\tilde{C}_{{{\mathbf{e}}_{0} }}^{*} \left\{ {{\mathbf{h}}_{10} } \right\}} \\ \vdots \\ {\tilde{C}_{{{\mathbf{e}}_{0} }}^{*} \left\{ {{\mathbf{h}}_{m0} } \right\}} \\ {f_{1} \left( {{\mathbf{u}}_{0} } \right)} \\ \vdots \\ {f_{L} \left( {{\mathbf{u}}_{0} } \right)} \\ \end{array} } \right] .$$
(29)

The implementation of the methodology is given in detail in Kuhlman and Pardo-Igúzquiza (2010).
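As an illustration of how the vector in Eq. (29) might be assembled, the following minimal sketch assumes a Gaussian covariance, a linear drift with \(L=3\) basis functions \(\{1,x,y\}\), and the lag convention \(\mathbf{h}_{j0}={\mathbf{u}}_{j}-{\mathbf{u}}_{0}\). Under these assumptions the cross-covariance between a directional derivative at \({\mathbf{u}}_{j}\) and the variable at \({\mathbf{u}}_{0}\) is \({\mathbf{e}}_{j}\cdot \nabla C(\mathbf{h}_{j0})\). The function names, the parameter values and the sign convention are hypothetical and are not taken from Kuhlman and Pardo-Igúzquiza (2010).

```python
import numpy as np

def gaussian_cov(h, sill=1.0, a=10.0):
    """Assumed Gaussian covariance C(h) = sill * exp(-|h|^2 / a^2); h has shape (..., 2)."""
    h = np.asarray(h, dtype=float)
    return sill * np.exp(-np.sum(h**2, axis=-1) / a**2)

def gaussian_cov_dir_deriv(h, e, sill=1.0, a=10.0):
    """Directional derivative e . grad C(h) of the assumed Gaussian covariance."""
    h = np.asarray(h, dtype=float)
    e = np.asarray(e, dtype=float)
    grad = -2.0 * h / a**2 * gaussian_cov(h, sill, a)[..., None]
    return np.sum(grad * e, axis=-1)

def rhs_vector(u0, u_data, u_deriv, e_deriv, sill=1.0, a=10.0):
    """Stack the three blocks of Eq. (29) for one estimation location u0.

    Assumed convention: lags are measured from u0 to each datum.
    """
    u0 = np.asarray(u0, dtype=float)
    u_data = np.asarray(u_data, dtype=float)
    u_deriv = np.asarray(u_deriv, dtype=float)
    cov_block = gaussian_cov(u_data - u0, sill, a)                    # n entries
    cross_block = gaussian_cov_dir_deriv(u_deriv - u0, e_deriv, sill, a)  # m entries
    drift_block = np.array([1.0, u0[0], u0[1]])                       # assumed linear drift
    return np.concatenate([cov_block, cross_block, drift_block])

# Hypothetical configuration: three head data and two boundary derivative data.
u0 = np.array([5.0, 5.0])
u_data = np.array([[1.0, 2.0], [7.0, 3.0], [4.0, 9.0]])
u_deriv = np.array([[0.0, 5.0], [10.0, 5.0]])
e_deriv = np.array([[1.0, 0.0], [1.0, 0.0]])
B = rhs_vector(u0, u_data, u_deriv, e_deriv)
```

The vector has \(n+m+L\) entries. A different lag convention would change the sign of the middle block, so the convention must match the one used to build the left-hand-side matrix.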

8 Illustrative Examples

We use a simulation to illustrate the different applications of co-kriging because, when the target field is known, the true errors can be evaluated and assessed. A 100 × 100 grid of log-transmissivity values was simulated using the spectral simulation method for generating realisations of correlated random fields (Pardo-Igúzquiza and Chica-Olmo 1994a, b). A realisation of a random field of the log-transmissivity of an aquifer is shown in Fig. 4a. The auxiliary variable shown in Fig. 4b was obtained from Fig. 4a by adding noise (an uncorrelated variable) so that both random fields reproduce the same covariance model but with a nugget variance in Fig. 4b. The coefficient of correlation between the realisations in Fig. 4a, b is 0.718. Another auxiliary variable is provided in Fig. 4c which was obtained as the solution of the direct problem of the groundwater flow in an unconfined aquifer given Eq. (19) and the boundary conditions shown in Fig. 2f, with \({H}_{1}=400\) m and \({H}_{2}=300\) m and using as parameters the log-transmissivity field given in Fig. 4a. The gradient field of the scalar field of water head data in Fig. 4c is shown in Fig. 4d.
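The construction of the first two fields can be sketched as follows. This is not the spectral simulation code of Pardo-Igúzquiza and Chica-Olmo (1994a, b); it is a simplified FFT-based substitute that produces a periodic Gaussian field. The exponential covariance, the correlation length of 10 cells, the seeds and the function names are assumptions made for illustration, and the noise variance is chosen so that the expected correlation is near the reported value of 0.718.

```python
import numpy as np

def simulate_periodic_field(n=100, corr_length=10.0, seed=0):
    """Simulate a stationary Gaussian field on an n x n periodic grid.

    The covariance is an assumed exponential model evaluated on wrapped lags;
    its 2-D FFT gives the spectrum used to colour white noise.
    """
    rng = np.random.default_rng(seed)
    lag = np.minimum(np.arange(n), n - np.arange(n))      # wrapped lag per axis
    hx, hy = np.meshgrid(lag, lag, indexing="ij")
    cov = np.exp(-np.hypot(hx, hy) / corr_length)
    # Clip tiny negative values caused by the periodic wrapping.
    spectrum = np.clip(np.fft.fft2(cov).real, 0.0, None)
    white = rng.standard_normal((n, n))
    return np.fft.ifft2(np.sqrt(spectrum) * np.fft.fft2(white)).real

def add_nugget_noise(field, target_corr=0.718, seed=1):
    """Add uncorrelated noise so the expected correlation with `field` is target_corr."""
    rng = np.random.default_rng(seed)
    noise_var = field.var() * (1.0 / target_corr**2 - 1.0)
    return field + np.sqrt(noise_var) * rng.standard_normal(field.shape)

primary = simulate_periodic_field()          # stands in for Fig. 4a
auxiliary = add_nugget_noise(primary)        # stands in for Fig. 4b
corr = np.corrcoef(primary.ravel(), auxiliary.ravel())[0, 1]
print(f"sample correlation: {corr:.3f}")
```

The added noise contributes a nugget variance to the auxiliary field while leaving the spatially structured part of the covariance unchanged, which is the property described above.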

Fig. 4

a Realisation of a random field representing the primary variable. b Realisation of the auxiliary variable obtained as a noisy version of (a). c Realisation of a random field obtained as the solution of a direct problem in groundwater flow for which (a) is the parameter field. d Gradient field of (c). This representation consists of small arrows that represent the magnitude and orientation of the gradient vector field. Each of the four realisations comprises 100 × 100 elements

The experimental data were obtained by randomly sampling the gridded data in Fig. 4. In this way, 100, 300, 300 and 100 data were sampled for log-transmissivity, porosity, water head and gradient, respectively, and are shown in Fig. 5a to d. The covariance models of log-transmissivity, porosity, and water head were inferred by maximum likelihood (Pardo-Igúzquiza 1997) assuming a constant mean and exponential covariance for log-transmissivity and porosity. A linear trend and a Gaussian covariance were assumed for the water head data. The cross-covariance between log-transmissivity and porosity was estimated using sgems (Remy et al. 2009). The estimated covariance parameters are given in Table 2.

Fig. 5

Experimental data obtained by sampling from realisations on a 100 × 100 grid. a 100 data obtained by random sampling from Fig. 4a. b 300 data obtained by random sampling from Fig. 4b. c 300 data obtained by random sampling from Fig. 4c. d 100 data obtained by random sampling from Fig. 4d

Table 2 Estimated parameters of covariances and cross-covariances of the experimental data using MLREML (Pardo-Igúzquiza 1997) for the covariances and sgems (Remy et al. 2009) for the cross-covariance

When there are no auxiliary data, ordinary co-kriging reduces to ordinary kriging. The kriging estimates of the log-transmissivity field in Fig. 4a obtained by using only the experimental data of the primary variable (Fig. 5a) are shown in Fig. 6a. Validation statistics were used to compare the target field (Fig. 4a) and the interpolated field (Fig. 6a). The validation statistics used are the mean error \(\left(\mathrm{ME}\right)\), mean squared error \(\left(\mathrm{MSE}\right)\) and the mean absolute error \(\left( {{\text{MAE}}} \right)\),

$$ {\text{ME}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {Z^{*} \left( {{\mathbf{u}}_{i} } \right) - Z\left( {{\mathbf{u}}_{i} } \right)} \right), $$
(30)
$$ {\text{MSE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {Z^{*} \left( {{\mathbf{u}}_{i} } \right) - Z\left( {{\mathbf{u}}_{i} } \right)} \right)^{2} , $$
(31)
$$ {\text{MAE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left| {Z^{*} \left( {{\mathbf{u}}_{i} } \right) - Z\left( {{\mathbf{u}}_{i} } \right)} \right|, $$
(32)

where \(Z({\mathbf{u}}_{i})\) is the target or true value at spatial location \({\mathbf{u}}_{i},\) \({Z}^{*}({\mathbf{u}}_{i})\) is an interpolated or estimated value at spatial location \({\mathbf{u}}_{i},\) \({\sigma }_{*}^{2}({\mathbf{u}}_{i})\) is the estimation variance at spatial location \({\mathbf{u}}_{i},\) and \(N\) is the number of spatial locations at which the variable was estimated. For an unbiased model, ME should be zero. MSE and MAE should be as small as possible.
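The statistics in Eqs. (30) to (32) can be computed directly from the estimated and target fields. The following minimal sketch uses a hypothetical function name and a small made-up set of values chosen only to make the arithmetic easy to check; it is not the validation code used to produce the tables.

```python
import numpy as np

def validation_stats(z_est, z_true):
    """Return ME, MSE and MAE (Eqs. 30-32) for estimates against target values."""
    err = np.asarray(z_est, dtype=float) - np.asarray(z_true, dtype=float)
    return {
        "ME": err.mean(),            # signed bias, Eq. (30)
        "MSE": (err**2).mean(),      # mean squared error, Eq. (31)
        "MAE": np.abs(err).mean(),   # mean absolute error, Eq. (32)
    }

# Made-up values for illustration: errors are 0.5, -0.5, 0.0 and 1.0.
z_true = np.array([1.0, 2.0, 3.0, 4.0])
z_est = np.array([1.5, 1.5, 3.0, 5.0])
stats = validation_stats(z_est, z_true)
print(stats)
```

The inputs may be arrays of any shape, so a 100 × 100 interpolated map can be compared with the target field without flattening it first.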

Fig. 6

Interpolated fields on a 100 × 100 grid. a Ordinary kriging map using the data in Fig. 5a. b Ordinary co-kriging map using the data in Fig. 5a and b. c Inverse problem universal co-kriging map using the data in Fig. 5a and c. d Gradient co-kriging using the data in Fig. 5a and d

Table 3 shows the validation statistics for the estimation of the log-transmissivity field by ordinary kriging (O-K) using only log-transmissivity data (Fig. 5a); ordinary co-kriging (O-COK) using log-transmissivity data (Fig. 5a) and the auxiliary porosity variable (Fig. 5b); and inverse problem universal co-kriging (IP-U-COK) using the water head data (Fig. 5c) as the secondary variable. The latter case is a different way of incorporating a different type of auxiliary data. A comparison of the MSE values shows that ordinary co-kriging is a clear improvement over ordinary kriging, whereas IP-U-COK is not. IP-U-COK should instead be assessed on the output of the direct problem of estimating water head from a log-transmissivity field. These results are given in Table 4 and clearly show the improvement of IP-U-COK over kriging.

Table 3 Validation statistics for the estimated log-transmissivity for the different interpolators: ordinary kriging (O-K), ordinary co-kriging (O-COK) and inverse problem universal co-kriging (IP-U-COK)
Table 4 Validation statistics using the direct problem of Eq. (18). Ordinary kriging (O-K), ordinary co-kriging (O-COK) and inverse problem universal co-kriging (IP-U-COK)

When estimating water head (Fig. 4c) as the primary variable, the universal kriging results using the experimental data in Fig. 5c can be compared with those of universal co-kriging taking the boundary conditions in Fig. 2e into account. This gives the estimated field in Fig. 7a. Table 5 lists the validation statistics that show the improvement of co-kriging with boundary conditions. With respect to the estimation of gradients, Fig. 6d shows the estimated gradient field that can be compared with the true gradient field in Fig. 4d to obtain the validation statistics given in Table 6. Finally, the log-transmissivity image in Fig. 4a has been degraded to a low-resolution image by a factor of 4 and to a high-resolution image by a factor of 2. Similarly, the auxiliary image in Fig. 4b has been degraded to a high-resolution image by a factor of 2. The purpose of downscaling co-kriging is to estimate a high-resolution image of the primary variable by using the low-resolution image of the primary variable and the high-resolution image of the auxiliary variable. The results are shown in Fig. 7, and the validation statistics are given in Table 7. Whilst every study must be considered in its own right, the results shown here demonstrate the potential of the methodology. Each estimation is accompanied by a map of the associated estimation variance which can be used as a measure of the uncertainty of the estimates.
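The degradation of the 100 × 100 images by factors of 4 and 2 can be illustrated with block averaging. The text does not state which aggregation operator was used, so the arithmetic mean over non-overlapping blocks is an assumption here, as is the function name.

```python
import numpy as np

def block_average(img, factor):
    """Degrade a 2-D image by averaging non-overlapping factor x factor blocks."""
    img = np.asarray(img, dtype=float)
    ny, nx = img.shape
    if ny % factor or nx % factor:
        raise ValueError("image dimensions must be multiples of the factor")
    # Split each axis into (blocks, cells per block) and average within blocks.
    return img.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

# Small check on a 4 x 4 image holding the values 0..15.
demo = block_average(np.arange(16).reshape(4, 4), 2)

# Hypothetical stand-in for a 100 x 100 realisation such as Fig. 4a.
field = np.random.default_rng(0).standard_normal((100, 100))
low_res = block_average(field, 4)    # 25 x 25 image
high_res = block_average(field, 2)   # 50 x 50 image
```

Block averaging preserves the overall mean of the image, so the low-resolution and high-resolution versions share the mean of the original field while representing it on larger supports.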

Fig. 7

Interpolated fields on a 100 × 100 grid. a Boundary co-kriging map using the data in Figs. 1e and 3a. b Low-resolution image from Fig. 4a. c High-resolution image from Fig. 4a. d High-resolution image estimated by downscaling co-kriging

Table 5 Validation statistics of the estimated water head for the different interpolators
Table 6 Validation statistics for the estimated gradient (directional derivatives in the X direction and the Y direction)
Table 7 Validation statistics for downscaling co-kriging (DS-COK) for the result shown in Fig. 7d

9 Conclusions

The ordinary co-kriging estimator that has traditionally been used to estimate a primary variable using data from that variable and data from an auxiliary variable can be modified to take into account a variety of aspects of significant practical interest as shown in Fig. 3. These aspects include accounting for different supports of the primary and auxiliary variables and the mathematical link between the primary and auxiliary variables which provide a means of solving problems such as the following:

  • Downscaling co-kriging to increase the spatial resolution of the primary variable

  • Solving inverse problems

  • Estimating directional derivatives and hence the gradient of a scalar variable

  • Including boundary conditions in the estimation of a scalar variable.

This is not an exhaustive list of applications, and more are likely to arise in future research. In addition, there are cases in which diverse forms of co-kriging are required because of the type of data. Examples include indicator co-kriging (Pardo-Igúzquiza and Dowd 2005), compositional co-kriging (Pardo-Igúzquiza et al. 2015) and estimating spatial factors or spatial components using factorial co-kriging (Pardo-Igúzquiza and Dowd 2002).

All the methods discussed here include an evaluation of the uncertainty of the estimates given by the estimation variance in Eq. (9). For each map of estimates, there is a corresponding map of the associated estimation variance. Among other applications, the evaluation of uncertainty is important for establishing interval estimates, for optimal sampling and for the propagation of the uncertainty of measurements.