Conditioning geological surfaces to horizontal wells

Kriging is a standard method for conditioning surfaces to observations. It works well for vertical wells, but may produce surfaces that cross horizontal wells between surface observations. We establish an approach that also works for horizontal wells, where surfaces are modeled as a set of correlated Gaussian random fields. The constraints imposed by the horizontal wells make the conditional surfaces non-Gaussian. We present a method for exact conditional simulation and an approximation for prediction and prediction uncertainty. Thousands of constraints can be handled efficiently without numerical instabilities. The approach is illustrated with synthetic and real examples that show how the constraints influence the surfaces and reduce uncertainty.


Introduction
Conditioning a set of stratigraphic surfaces to surface observations was considered by [1]. The basic idea is to describe each surface as the top of a stratigraphic layer. Layer thicknesses are assumed to be modeled as Gaussian random fields, making surfaces a set of correlated Gaussian random fields. Standard kriging and conditional simulation techniques provide surface maps conditioned to the surface observations.
As horizontal wells became the standard drilling technique, a new and important source of data became available: zone-log information. The zone-logs constrain the position of the surfaces along the well path, giving a large number of surface (inequality) constraints compared with direct surface observations. In [3], the authors proposed a method to condition one surface to such constraints. Their main idea was to represent the surface constraints by synthetic data with a particular measurement error.
In this paper we generalize the inequality-constraint method in [3] to multiple surfaces, while respecting the formalism in [1]. One important challenge is to handle the large number of surface constraints that appear in real applications with tens of surfaces and hundreds of wells. The Monte Carlo method in [3] does not scale to real applications and requires modifications. One of the main improvements in this work is an efficient version of the Gibbs sampler that uses the geometrical, pair-wise structure of the constraints to draw accurate samples efficiently. The finite number of samples necessarily implies an inherent Monte Carlo sampling error in the estimated positions of the surfaces. We provide an upper estimate of this error. The method was briefly explained in [4] using two surfaces, but here we give a rigorous mathematical presentation, several examples, and details on the algorithms.
Conditioning to inequality constraints has been studied by a number of authors, going back to [16] with indicator kriging, and [12], [18] and [19], which use the dual kriging formalism combined with quadratic programming. Others have used rejection-sampling Monte Carlo simulation [14] to evaluate high-dimensional integrals for posterior distributions [29], and the Expectation-Maximization algorithm for predicting uranium grade at censored locations [24]. A kriging formalism for interval-valued data was developed in [11], and [8] worked with constraints in the area of vectorial and multi-point Bayesian maximum entropy analysis. An MCMC Bayesian model for the spatial distribution of a parameter with censored observations, using a Metropolis-Hastings algorithm, was presented by [25]. Versions of the Gibbs sampler have also been proposed to handle large numbers of constraints [13]. The paper [23] gives a rigorous presentation and application of the Gibbs method for truncated distributions arising from physically constrained problems (non-negativity of concentrations, censored data, etc.) in environmental applications, and illustrates how the procedure is used to estimate uncertainty bands in both geostatistical interpolation and inverse modeling. The Gibbs sampler is also an important step when simulating conditioned pluri-Gaussian random fields in the context of facies modeling [7].
The Data Augmentation Algorithm [31] used in our application is also used in [15] to detect the presence of a pollutant below a detection threshold. Gaussian process emulators that satisfy constraints (inequalities, convexity, etc.) with respect to some input variables are studied by [20]. The method in this paper is classified as 2.5D by [33] and cannot handle multi-valued surfaces like overturned folds and salt domes.
The next section describes the basic ideas without any technical details. Section 3 describes the stochastic model for surfaces and layers, and the formal representation of well data is given in Section 4. Section 5 describes how to calculate the conditional probability distribution of surface depth given surface observations and constraints from horizontal wells. This section also presents efficient ways to calculate conditional predictions, prediction uncertainties and an algorithm to draw simulated realizations from the conditional probability distribution. Some technical details are elaborated on in Appendix A. Section 6 provides two examples that illustrate the methodology, while Section 7 shows results from a real case.

Basic idea
Horizontal wells impose restrictions on the surfaces in the form of inequality constraints. Well observations (logs, cores) along a well are typically measured every foot. Since surfaces are assumed to be somewhat smooth, restricting a surface laterally at every foot is unnecessary. Also, the finite grid resolution (typically 10-100 meters) limits the need for lateral precision. It is therefore convenient to resample the well to reduce the number of constraints. It is sufficient to keep samples separated by approximately the grid resolution since this is the resolution of the predictions. Proper sampling of the well must take into account the location of well points (observations of surfaces), the well inclination, and interaction with other wells.
Standard kriging and conditional simulation techniques do not include the possibility to condition on inequality constraints, so a few initial steps are added to the standard approaches. The following steps outline the proposed method; they are illustrated in Fig. 1:
(a) Initial state: Consider the vertical cross section through a well and two surfaces. The upper surface has been observed at one well point (green disc) where the well enters the layer between the two surfaces. The task is to predict or simulate the two surfaces given this information.
(b) Condition on the well point: Calculate the depth prediction (solid red lines) and the prediction uncertainty envelope (dashed lines) along the well using kriging. Since surfaces are correlated, the surface observation of the upper surface influences the lower surface.
(c) Select constraining sample points: These are the sample points that are close to a surface, that is, where the vertical distance between the well path and the surface prediction is small relative to the prediction uncertainty. Blue discs have been identified as constraining sample points.
(d) Draw surface values at the constraints: Draw 256 sets of correlated surface depth values at the constraining sample points using the Data Augmentation Algorithm described in Section 5.1. These 256 sets of depth values represent the probability distribution for the surface depth at the constraints. They are illustrated as small point clouds in Fig. 1(d). Note that this is not a multivariate Gaussian distribution since the combination of high correlations and truncations makes the marginal distributions non-Gaussian. The number of sets could be larger than 256, but experience has shown that this is a proper balance between speed and precision.
(e) Prediction: Estimate the expectation and the covariance of the 256 sets of surface depth values from Step (d).
The estimated expectations and covariances are used to calculate a set of data points that are used in the kriging equations. The expectations and variances of these data points are chosen such that the prediction and prediction uncertainty exactly reproduce the estimated expectations and covariances of the 256 sets of surface depth values (Section 5.5). Figure 1(e) shows the resulting prediction and prediction uncertainty.  Step (e) is an approximation where non-Gaussian marginals are replaced by a symmetric (multi-Gaussian) distribution calculated using kriging.
Step (f), however, is an exact simulation algorithm that provides perfect samples from the conditional distribution given the well points and the inequality constraints at the constraining sample points.

Stochastic model for surfaces and layers
Consider L surfaces. The depth to surface l is obtained by adding the thicknesses of the stratigraphic layers above the surface,

z_l(x) = z_0(x) + Σ_{k=1}^{l} Δz_k(x),   l = 1, ..., L.   (1)

In the following we will assume that the reference surface z_0(x) = 0 to simplify the notation. The thickness of layer l (the layer bounded below by surface l) is modeled as

Δz_l(x) = m_l(x) + ξ_l(x),

where m_l(x) is the layer thickness trend and ξ_l(x) is the layer thickness residual. The layer thickness residual is modeled as a zero-mean Gaussian random field [2] specified by the correlation function ρ_l and the standard deviation σ_l, so that

Cov[ξ_k(x), ξ_l(x')] = δ_kl σ_l(x) σ_l(x') ρ_l(x − x'),

where residuals from different layers have been assumed independent. Here δ_kl is 1 if k is equal to l, and 0 otherwise.
The thickness trend m_l(x) for layer l is a linear combination of P_l known trend maps g_{l,p}(x),

m_l(x) = Σ_{p=1}^{P_l} β_{l,p} g_{l,p}(x),

where β_{l,p} are the regression or trend coefficients. These trend maps can be constants, isochore maps, or any map useful for modeling the thickness of layer l. Isochore maps can be constructed from interpretation of seismic data or from a depositional geological concept. The role of these trend maps is to capture the large-scale shape of the surface.
Using this notation we can write the depth to a surface as

z_l(x) = M_l(x) + ε_l(x),   (2)

where the depth trend M_l(x) and the depth residual ε_l(x) are given by

M_l(x) = Σ_{k=1}^{l} m_k(x),   ε_l(x) = Σ_{k=1}^{l} ξ_k(x).

The depth residual is a sum of Gaussian random fields and is therefore also a Gaussian random field.
To simplify notation, all trend coefficients are organized in a vector β = (β_1, ..., β_L) of length P = Σ_{l=1}^{L} P_l. A corresponding trend map vector f_l(x) for each surface is formed by stacking the trend maps of layers 1, ..., l and replacing the trailing, non-contributing trend maps by zeros,

f_l(x) = (g_{1,1}(x), ..., g_{1,P_1}(x), ..., g_{l,1}(x), ..., g_{l,P_l}(x), 0, ..., 0).

We can now write Eq. 2 as

z_l(x) = f_l(x)^T β + ε_l(x).   (3)

This is a linear multiple regression model with correlated Gaussian errors. The trend coefficients are assumed to have a prior multi-Gaussian distribution and to be a priori independent of the residuals.
Different surfaces are coupled in two ways: they share some common trend maps and trend coefficients, and they have correlated residuals since they share some common layer thickness residuals. The covariance between two arbitrary surfaces a and b at two different lateral locations is

Cov[ε_a(x), ε_b(x')] = Σ_{k=1}^{min(a,b)} σ_k(x) σ_k(x') ρ_k(x − x'),   (4)

since ξ_k and ξ_l are assumed uncorrelated for k ≠ l.
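The covariance in Eq. 4 is straightforward to evaluate numerically. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes stationary layer standard deviations and uses the general exponential correlation function from the synthetic examples in Section 6.

```python
import numpy as np

def rho(h, R=1500.0, alpha=1.5):
    """General exponential correlation, rho(h) = exp(-3 (h / R)^alpha)."""
    return np.exp(-3.0 * (np.abs(h) / R) ** alpha)

def surface_residual_cov(a, b, x, y, sigma, R=1500.0, alpha=1.5):
    """Covariance between the depth residuals of surfaces a and b at lateral
    locations x and y (Eq. 4).  Only the layers shared by both surfaces,
    k = 1, ..., min(a, b), contribute, because residuals from different
    layers are independent.  sigma[k-1] is the (stationary) standard
    deviation of layer k."""
    h = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return sum(sigma[k] ** 2 * rho(h, R, alpha) for k in range(min(a, b)))
```

The min(a, b) upper limit is what couples a shallow and a deep surface: they share the thickness residuals of all layers above the shallower of the two.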
The framework above is sufficient to specify the multi-Gaussian distribution of any collection of surface depth values. In the following we will use the symbol ϕ(·) for any (multi-)Gaussian probability density function.

Well observations
Consider all N locations in space along the wells where the stratigraphic layer changes. These are observations of the surfaces separating the stratigraphic layers and are referred to as well points. They are denoted

w = { z_{l_i}(x_i) = z_i : i = 1, ..., N }.

Consider also a discrete sampling of the well path that excludes all well points. These are the white and blue discs in Fig. 1(c). These sample points are denoted

w^I = { (x^I_i, z^I_i) : i = 1, ..., N^I },

where sample point i lies in layer l^I_i, so that

z_{l^I_i − 1}(x^I_i) ≤ z^I_i ≤ z_{l^I_i}(x^I_i),   i = 1, ..., N^I.   (5)

Note that every sample point imposes constraints on all L surfaces; all surfaces above layer l^I_i must be shallower than z^I_i and all surfaces below layer l^I_i must be deeper than z^I_i. It is however sufficient to consider only the constraints on the two surfaces confining the layer, since all surfaces are correlated, so a constraint on one surface will influence all other surfaces. Therefore, we will only consider the two constraints in Eq. 5. This is illustrated in Fig. 2.
Sample points are chosen as a subset of the original well samples. To control surfaces, a sampling density equal to the grid resolution is sufficient. Since surface control points are mostly needed in horizontal or near-horizontal sections of the well, the sampling is made as a function of the horizontal component of measured depth rather than as a function of measured depth. This way, vertical wells get no sample points except at the end of the well, while horizontal well sections are sampled at grid resolution. Sample points are not chosen close to well points unless they are at the very end of the well. For multilateral wells, samples should be identical up to the branching point since only one of these sets is needed.

Fig. 2 Cross section along a well path that intersects surface l at z_l(x_i) = z_i (green disc). Well sample points, indicated by blue discs, constrain surface points above and below the path. These constrained surface points are marked as red discs.
Sample points near the surfaces will impose strong restrictions on the surface depth whereas sample points far away from surfaces will not impose any restriction. In practice, therefore, only a subset of all the sample points will suffice to guarantee that surfaces are consistent with the well paths.
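As a concrete illustration of the resampling rule, here is a minimal sketch (a hypothetical helper, not the production code) that picks sample points by accumulated horizontal deviation, so that vertical sections yield no samples except at the well end:

```python
import numpy as np

def sample_well_path(path, spacing=100.0):
    """Pick sample points along a well path, given as an (n, 3) array of
    (x, y, z) positions measured densely along the well.  Consecutive
    samples are separated by `spacing` meters of accumulated *horizontal*
    deviation, so vertical sections contribute no samples; the end of the
    well is always kept."""
    path = np.asarray(path, dtype=float)
    steps = np.linalg.norm(np.diff(path[:, :2], axis=0), axis=1)
    lateral = np.concatenate(([0.0], np.cumsum(steps)))  # horizontal deviation
    picks, last = [], 0.0
    for i in range(len(path)):
        if lateral[i] - last >= spacing:
            picks.append(i)
            last = lateral[i]
    if not picks or picks[-1] != len(path) - 1:
        picks.append(len(path) - 1)  # always keep the well end
    return path[picks]
```

A purely vertical path accumulates no lateral deviation and therefore yields a single sample at the well end, while a horizontal section is sampled at the requested spacing.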

Conditional probability density function
Let z = (z_{l_1}(x_1), ..., z_{l_n}(x_n)) be any collection of surface values. The moments of the distribution ϕ(z | w) are given by the standard kriging equations in Eqs. 26 and 27. Our goal is to obtain the conditional probability density function

f(z | w, w^I).   (6)

Conditioning on the inequality constraints w^I makes this a non-Gaussian density. This implies that efficient approaches such as kriging for calculating predictions and prediction uncertainties cannot be applied directly. In the following, we construct a representation of this function that allows standard techniques such as linear prediction (kriging) and conditional simulation to be exploited. Let

z^I = { z_{l^I_i − 1}(x^I_i), z_{l^I_i}(x^I_i) : i = 1, ..., N^I }

be the collection of all surface points located above or below the sample points. These are the surface points illustrated as red discs in Fig. 2. The probability density function in Eq. 6 can be reformulated as

f(z | w, w^I) = ∫∫ ϕ(z | w, β, z^I) f(β, z^I | w, w^I) dβ dz^I,   (7)

where ϕ(·) denotes a (multi-)Gaussian probability density function. The factorization follows from the definition of a conditional distribution, and the probability density function ϕ(z | w, β, z^I) is Gaussian since z is independent of w^I given β and z^I. Notice that ϕ(z | w, β, z^I) is a multivariate Gaussian probability density function fully specified by the stochastic model described in Section 3. Explicit calculation of f(β, z^I | w, w^I) is difficult. Instead, we approximate this function using S simulated realizations {z^I(1), ..., z^I(S)} and {β^(1), ..., β^(S)} drawn from this distribution. The next sections explain an efficient way to draw these samples.

A Monte Carlo approximation to f (β, z I |w, w I )
A naïve approach to drawing samples from f(β, z^I | w, w^I) is to draw samples from ϕ(β, z^I | w) and use rejection sampling to satisfy the constraints imposed by w^I. However, this will not work in practice since the number of constraints is large (2N^I) and even a small rejection rate for each constraint leads to a vanishing acceptance rate. This is illustrated in Table 1 for a few examples. A second problem is that ϕ(β | w) can be very different from f(β | w, w^I), reducing the acceptance rate even further. The efficiency problem caused by β is solved by using an expectation-maximization-like algorithm called the Data Augmentation Algorithm [31] and by replacing the rejection sampling at the core of this algorithm by a Gibbs sampler. This sampler is explained in detail in Appendix A.
The Data Augmentation Algorithm uses the approximation

f(β | w, w^I) ≈ (1/S) Σ_{s=1}^{S} ϕ(β | w, z^I(s)),

which converges as the number of simulated samples S increases. The conditional expectation and covariance of ϕ(β | w, z^I(s)) are given by Eqs. 24 and 25. The algorithm is a fixed point iteration where the probability densities, including that of β, become more accurate at every iteration.
Data Augmentation Algorithm
1: draw β^(s) ∼ ϕ(β | w) and z^I(s) ∼ f(z^I | w, w^I, β^(s)) for s = 1, ..., S
2: for n = 1, ..., N_f do
3:   for s = 1, ..., S do
4:     draw β^(s) ∼ ϕ(β | w, z^I(s))
5:     draw z^I(s) ∼ f(z^I | w, w^I, β^(s))
6:   end for
7: end for

The Data Augmentation Algorithm generates samples from f(β, z^I | w, w^I). It is initialized by drawing β independently of the constraints (line 1). The algorithm then alternates between drawing z^I dependent on β (line 5) and drawing β dependent on z^I (line 4). The constraints w^I are only included explicitly in the rejection sampling step when drawing z^I.
If the number of well points in w is large, the impact of w^I on β is expected to be small and we can use f(β | w, w^I) ≈ ϕ(β | w). In this case, line 4 in the algorithm can be replaced by β^(s) ∼ ϕ(β | w) and the fixed point iterations become superfluous. This approach is numerically more stable and saves calculation time. Experience shows that N_f = 20 fixed point iterations is sufficient for most practical applications, see Fig. 5. Increasing this number also increases the calculation time.
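To make the alternation concrete, the toy sketch below runs the Data Augmentation loop on a deliberately tiny model: a scalar trend coefficient, two constrained points with upper bounds, and plain rejection sampling standing in for the Gibbs sampler of Appendix A. All numbers (F, C, b0, s02, c) are hypothetical; this illustrates the loop structure, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: depths at two constrained points, z = F*b + eps, eps ~ N(0, C),
# prior b ~ N(b0, s02); the well path forces z below the bounds c.
F = np.array([1.0, 1.0])                 # trend map values at the two points
C = np.array([[1.0, 0.8], [0.8, 1.0]])   # residual covariance
Ci = np.linalg.inv(C)
b0, s02 = 0.0, 0.25
c = np.array([0.5, 0.7])                 # upper bounds from the well path

def draw_z(b):
    """Line 5: draw z given b; rejection sampling is feasible in 2-D."""
    while True:
        z = rng.multivariate_normal(F * b, C)
        if np.all(z < c):
            return z

def draw_b(z):
    """Line 4: conjugate Gaussian update of the trend coefficient."""
    prec = 1.0 / s02 + F @ Ci @ F
    mean = (b0 / s02 + F @ Ci @ z) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))

S, Nf = 256, 20
b = rng.normal(b0, np.sqrt(s02), size=S)     # line 1: initialize from the prior
zs = np.array([draw_z(bi) for bi in b])
for _ in range(Nf):                           # fixed point iterations
    b = np.array([draw_b(z) for z in zs])    # line 4
    zs = np.array([draw_z(bi) for bi in b])  # line 5
```

The fixed point iterations make the trend-coefficient draws consistent with the truncated depth samples, mirroring lines 4 and 5 of the algorithm above.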

Conditional distribution
The Data Augmentation Algorithm generates S samples of z^I and β that can be used to represent the density f(β, z^I | w, w^I). This allows the double integral in Eq. 7 to be replaced by a sum of Gaussian probability density functions,

f(z | w, w^I) ≈ (1/S) Σ_{s=1}^{S} ϕ(z | w, β^(s), z^I(s)).   (8)

This approximation converges to the correct distribution as S → ∞.

Conditional simulation
Surfaces are normally represented on a grid with thousands or even millions of grid nodes. If z represents the surface values at all grid nodes, the representation in Eq. 8 provides a straightforward simulation algorithm for f(z | w, w^I): draw

z ∼ ϕ(z | w, β^(s), z^I(s)),   (9)

where [β^(s), z^I(s)] is one sample drawn from the S samples provided by the Data Augmentation Algorithm. This corresponds to one point from each of the point clouds in Fig. 1(d). We can draw from the conditional multi-Gaussian distribution in Eq. 9 using standard methods for drawing samples from Gaussian random fields conditioned to point observations.
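The draw in Eq. 9 is ordinary conditional Gaussian simulation once a sample [β^(s), z^I(s)] has been picked. A minimal sketch of that conditioning step (standard Gaussian conditioning formulas, hypothetical toy numbers) could look like:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_conditional(mu, K, idx, vals):
    """Draw a Gaussian vector ~ N(mu, K) conditioned on the components in
    idx taking the values vals (standard Gaussian conditioning)."""
    n = len(mu)
    rest = np.setdiff1d(np.arange(n), idx)
    W = np.linalg.solve(K[np.ix_(idx, idx)], K[np.ix_(idx, rest)])
    cond_mu = mu[rest] + W.T @ (vals - mu[idx])
    cond_K = K[np.ix_(rest, rest)] - K[np.ix_(rest, idx)] @ W
    out = np.array(mu, dtype=float)
    out[idx] = vals
    out[rest] = rng.multivariate_normal(cond_mu, cond_K)
    return out

# Toy grid of five nodes with an exponential covariance; node 2 plays the
# role of a constrained surface point whose value came from Data Augmentation.
x = np.arange(5.0)
K = np.exp(-0.5 * np.abs(x[:, None] - x[None, :]))
z = draw_conditional(np.zeros(5), K, np.array([2]), np.array([0.3]))
```

In practice the conditioning data would be the well points plus the drawn values z^I(s), and z would span the full surface grid.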

Prediction and prediction uncertainty
The conditional expectation is a commonly used predictor. The conditional expectation of Eq. 8 is

E[z | w, w^I] ≈ μ̄ = (1/S) Σ_{s=1}^{S} μ^(s),   (10)

where μ^(s) = E[z | w, β^(s), z^I(s)]. Since each density ϕ(z | w, β^(s), z^I(s)) is Gaussian, the corresponding conditional covariance is

Var[z | w, w^I] ≈ Var[z | w, β^(s), z^I(s)] + (1/S) Σ_{s=1}^{S} (μ^(s) − μ̄)(μ^(s) − μ̄)^T.   (11)

Note that Var[z | w, β^(s), z^I(s)] is equal for all S samples. The conditional variance is a measure of the prediction uncertainty and becomes the standard kriging error if w^I contains no information.
In principle, z can include all grid nodes, but since the predicted surface depths only require the marginals, it is more efficient to repeat Eq. 10 for every grid node,

ẑ(x) = (1/S) Σ_{s=1}^{S} μ^(s)(x),   x ∈ G,   (12)

where G is the collection of all grid nodes. Similarly to Eq. 12, the prediction uncertainty at every grid node is

σ̂²(x) = Var[z(x) | w, β^(s), z^I(s)] + (1/S) Σ_{s=1}^{S} (μ^(s)(x) − ẑ(x))²,   x ∈ G,   (13)

where μ^(s)(x) = E[z(x) | w, β^(s), z^I(s)]. The Monte Carlo uncertainty in Eq. 12 is

Var[ẑ(x)] = (1/S) (Var[z(x) | w] − Var[z(x) | w, z^I]),   (14)

which follows from independent samples and the identity Var[A] = E[Var[A | B]] + Var[E[A | B]].

Equation 14 shows that the Monte Carlo variance is inversely proportional to the sample size and proportional to the difference between the kriging variance without constraints and the kriging variance with the constraints. This difference is zero if there are no inequality constraints and reduces to the kriging variance at the constraints. This implies that an upper bound for the Monte Carlo uncertainty is the standard kriging variance divided by the sample size, Var[z(x) | w]/S. In the examples below we have chosen S = 256, giving a Monte Carlo uncertainty (standard error) of at most √(1/256) = 6.25 percent of the standard kriging uncertainty.
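The bound can be checked numerically. The sketch below is a hypothetical one-dimensional illustration: it repeatedly estimates the mean of a standard Gaussian truncated by the constraint z < 0 from S = 256 samples, and compares the spread of those estimates with the bound √(Var[z | w]/S) = 1/16.

```python
import numpy as np

rng = np.random.default_rng(2)
S = 256
sigma = 1.0                        # kriging standard deviation at the point
bound = sigma / np.sqrt(S)         # 1/16 = 6.25 % of the kriging std

# Repeatedly estimate the mean of the truncated (constrained) distribution
# with S samples and record the estimates.
estimates = []
for _ in range(1000):
    z = rng.normal(0.0, sigma, size=4 * S)
    z = z[z < 0.0][:S]             # crude rejection sampling of the constraint
    estimates.append(z.mean())
mc_std = float(np.std(estimates))  # observed Monte Carlo error of the predictor
```

Because truncation can only reduce the variance relative to the untruncated Gaussian, the observed Monte Carlo error stays below the bound.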

An efficient approximation for prediction and prediction uncertainties
The predictions (Eq. 12) and prediction uncertainties (Eq. 13) are time consuming to calculate for every grid node. This can be solved by replacing the non-Gaussian density conditioned on the constraints w^I by a Gaussian density conditioned on synthetic help points z^I_syn located at the constrained surface points:

f(z | w, w^I) ≈ ϕ(z | w, z^I_syn).   (15)
[3] proposed to select the synthetic help points such that

E[z^I | w, z^I_syn] = E[z^I | w, w^I]   (16)

and

Var[z^I | w, z^I_syn] = Var[z^I | w, w^I].   (17)
The right hand sides of these equations are approximated using Eqs. 10 and 11, which collapse into simple sums since, e.g., E[z^I | w, β^(s), z^I(s)] = z^I(s). The first requirement is trivially ensured by selecting z^I_syn = ẑ^I obtained from Eq. 10. This would however give zero variances at z^I, so the synthetic help points must be assigned a multi-Gaussian measurement error specified by a 2N^I-dimensional covariance matrix Σ^I_err. [3] give explicit formulas for the expectation E[z^I_syn] and the measurement error Σ^I_err that fulfill the requirements in Eqs. 16 and 17.
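In one dimension the construction is transparent. If a single constrained surface point has kriged variance σ² and the target variance is v < σ², the help point needs a measurement-error variance τ² such that conditioning reproduces v exactly. The following is a hypothetical scalar sketch, not the full-covariance formulas of [3]:

```python
def help_point_error(sigma2, target_var):
    """Measurement-error variance tau2 for one synthetic help point so that
    conditioning a surface point with prior (kriged) variance sigma2 on the
    help point gives exactly target_var:
        sigma2 * tau2 / (sigma2 + tau2) = target_var."""
    assert 0.0 < target_var < sigma2
    return target_var * sigma2 / (sigma2 - target_var)

sigma2, target = 4.0, 1.0
tau2 = help_point_error(sigma2, target)
post = sigma2 * tau2 / (sigma2 + tau2)  # posterior variance after conditioning
```

The scalar case simply inverts the usual noisy-observation update; a mild truncation (target close to σ²) demands a large measurement error, while a hard truncation demands a precise help point.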
The multi-Gaussian approximation in Eq. 15 gives the following alternative approximations to Eqs. 12 and 13:

ẑ(x) = E[z(x) | w, z^I_syn],   σ̂²(x) = Var[z(x) | w, z^I_syn].

These are the conditional expectation and variance of a multi-Gaussian distribution given the well points and the additional synthetic help points. Calculating these expectations and variances is done by traditional techniques such as kriging.

Synthetic examples
We shall investigate a synthetic example to illustrate the properties of the proposed method. The example uses the general exponential correlation function ρ(h) = e^{−3(h/R)^α}, with an isotropic range R = 1500 m and power α = 1.5. The grid dimension is 5800 m by 3800 m, the grid resolution is 25 m, and well paths are sampled every 100 m of lateral deviation. The sampling is coarser than recommended to make the pictures below easier to read.
For many well path sample points there are no corresponding constrained surface points. The reason for this is two-fold: First, constrained surface points are only added for well path sample points where the surface is at risk of crossing the well in a stochastic realization. Experience shows that such crossings are unlikely if t = |z(x) − m(x)|/σ(x) > 2.5, that is, when the vertical distance between the well and the predicted surface exceeds 2.5 standard deviations of the surface uncertainty. For an individual point, this threshold corresponds to a probability of 98.76 percent of not crossing the well. Secondly, many constraints are highly correlated and carry the same information, and it is sufficient to keep the constrained surface points where the inequality constraints are likely to be violated. The closer the well is to the surface, the fewer constraints are removed.
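The screening rule can be written directly, for example as below (a hypothetical helper, not the production code):

```python
def keep_constraint(z_well, z_pred, sd_pred, t_max=2.5):
    """Keep a constrained surface point only where the surface is at risk of
    crossing the well: the predicted surface lies within t_max prediction
    standard deviations of the well depth."""
    return abs(z_well - z_pred) / sd_pred < t_max
```

Points that survive this filter are then thinned further based on their mutual correlation, as described above.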

Two surfaces
Consider an anticline with two domes and two drilled wells, as shown in Fig. 3. Well w1 enters a geological layer at the top of the main dome, runs horizontally just below the saddle, and leaves the second dome on its flank. Figure 4 shows a vertical cross section along well w1. The upper cross section shows two predicted surfaces, s1 and s5, conditioned on well points from both wells. Uncertainty bands of plus/minus one standard deviation are shown in grey. In the center cross section, the surfaces have been conditioned to well points and constraining sample points. We can see that adding the constraints has pushed surface s1 correctly above the well, and the uncertainty bands have been reduced. Also note that the deeper surface s5 is strongly affected. It is pushed down by the leftmost constraining sample point, and the mid section is pushed upwards because it is correlated with surface s1. The lower cross section of Fig. 4 shows one simulated realization.

Fig. 3 Bird's-eye view of surface s1 with wells w1 and w2. The map is 5.8 km by 3.8 km.
According to Eq. 5, constrained surface points come in pairs, but in many cases both are not necessary. A simple criterion for evaluating a constrained surface point is to compute its distance to the well. If this distance is large compared to the surface uncertainty, the point is removed from the calculations. In the center cross section of Fig. 4, several of the pairs have been reduced to a single constraint; one red disc is missing at those constraining sample points.
The convergence of one of the trend coefficients is shown in Fig. 5. Other trend coefficients show similar convergence. A grey uncertainty band is plotted as plus-minus one standard deviation of the trend coefficient. In this simple case we have convergence after a few fixed point iterations. The remaining fluctuations are due to Monte Carlo noise, and are significantly less than the standard deviation.

Five surfaces
The reservoir is the same as in the previous example, but three intermediate surfaces (s2, s3, s4) have been inserted between s1 and s5. The number of well points increased from two to eight for each well. The upper cross section of Fig. 6 shows the five surfaces conditioned to these well points. There are several segments along the well path where the well incorrectly crosses surfaces s3 and s4. The surfaces are correctly placed when they are also conditioned to the constraining well samples (center cross section). As discussed in Section 4, constraints are only applied to the surfaces directly above and below the well path. The other surfaces adapt to these through the correlation structure in Eq. 4. The lower cross section shows one simulated realization. Figure 7 shows the depth uncertainty map of s3 when conditioned to well points (top) and to the additional constraining sample points (bottom).

Real case
We now turn to a real-world example where the data have been anonymized. The example has seventeen surfaces and fourteen wells, most of which are strongly deviated or horizontal. The grid dimensions are 15 000 m by 6 000 m. The upper part of Fig. 8 shows the predicted depths in a vertical cross section of a particular well. The surfaces are conditioned to the 215 available well points, eleven of which belong to the well shown. The surfaces are seen to be inconsistent with the well path for large parts of the well, starting at the heel of the well and continuing for nearly two kilometers. Since the well runs inside a layer that is less than 5 m thick, it is impossible to get the layering correct without more information.
In the center part of Fig. 8, constraining sample points have been added to the calculation, and the predicted surfaces are now consistent with the well path. Notice that all surfaces are affected since they are highly correlated. The total length of inconsistent layering is reduced from 3 700 m to 22 m for the fourteen wells, and all remaining errors are less than one meter. There are 1 377 constraining sample points available and a corresponding 2 744 constrained surface points. Most of these are redundant due to strong correlations, and the set of constrained surface points was reduced to 612. This number can be increased or decreased by adjusting a correlation threshold. If the number of points is increased, the surface consistency generally improves, but at the expense of increased computation time. The current calculation took 70 seconds on a standard 4-core laptop (Intel i7-7820HQ, 2.90 GHz). The most time-consuming parts of the calculation can be parallelized, and the speedup scales close to linearly with the number of processors for laptops and small workstations.

Fig. 8 Vertical cross sections through a well. Top: Surfaces conditioned to well points. Center: Surfaces conditioned to well points and constraining sample points. Bottom: Stochastic simulation conditioned to well points and constraining sample points.
The bottom part of Fig. 8 shows a stochastic simulation, see [5]. For all wells, in this particular realization, the total length of inconsistent layering is 68 m out of which 2 m has an error larger than one meter. One hundred stochastic simulations were run in 50 minutes. The average length of inconsistent layering in these simulations was 101 m. The worst simulation had 418 m of inconsistent layering, whereas the best simulation had only 18 m.

Comparing the Gibbs sampler to rejection sampling
A Gibbs sampler is used in line 5 of the Data Augmentation Algorithm. For comparison, the Gibbs sampler has been replaced by a naive rejection sampler that proposes samples from a multivariate Gaussian distribution until all the constraints are satisfied. Table 1 compares the CPU times spent by the Gibbs sampler and the rejection sampler for the examples in Sections 6 and 7. The rejection sampler failed to produce a single sample after three days of computation on the real example, while the Gibbs sampler provided a sample in 7 seconds. Naive rejection sampling can thus be used for simple examples but fails for realistic cases with many constraints.

Discussion and closing remarks
A method for predicting and simulating a set of correlated geological surfaces conditioned to well points and inequality constraints has been presented. At the core of the approach are a resampling of the well to a finite set of constraints and the Data Augmentation Algorithm, which finds a representation of the multivariate distribution of the surface depths at the constrained sample points. This multivariate distribution is a truncated version of a high-dimensional, highly correlated multi-Gaussian distribution. Drawing samples efficiently from this truncated distribution is challenging. Common trend coefficients impose a global dependency between all observations that is resolved by the iterative estimation scheme in the Data Augmentation Algorithm. An important step in the Data Augmentation Algorithm is sampling from a truncated multi-Gaussian distribution. Straightforward rejection sampling will not work since the acceptance rate is practically zero. Therefore, a very efficient Gibbs sampler that exploits the structure of the problem is used for drawing samples from the truncated distribution.
The efficiency of the methodology has been demonstrated on a number of real cases, see [30] and [21] for two case studies from the North Sea. In those cases, large surface grids are divided into subgrids that are handled semi-independently for better efficiency [32]. Further applications and practical details can be found in [6].
The methods proposed in this paper have been extended to model well path uncertainty by [10].

Appendix A: Gibbs sampler for truncated multi-Gaussian distribution
The time consuming part of the Data Augmentation Algorithm is the sampling in line 5, since acceptance rates for plain rejection sampling are minute in realistic examples. The number of sample points and constraints can be in the hundreds or more. We thus aim at an efficient sampler for a multivariate truncated Gaussian distribution, where the goal is to obtain a valid sample z^I(s). One possibility is to use the Gibbs sampler in [27]. The essential idea is to sequentially update each single variable conditioned on all other variables and loop through all variables many times.
Let z = z^I(s) = {z_1, ..., z_n} be one sample s of the constrained surface points z^I, with size n = 2N^I. We next define an iterate z^k that approximates the sample z at every iteration of the Gibbs sampler. Starting with an initial sample z^0 = {z^0_1, ..., z^0_n}, we obtain iterate k from iterate k − 1 by sequentially drawing from the univariate distribution

z^k_i ∼ N_{z_i ∈ R_i}(E[z_i | z^k_{∼i}], Var[z_i | z^k_{∼i}]),   i = 1, ..., n,   (18)

where z^k_{∼i} = {z^k_1, ..., z^k_{i−1}, z^{k−1}_{i+1}, ..., z^{k−1}_n}. Here N_{z_i ∈ R_i} is the univariate truncated Gaussian distribution, truncated to an interval R_i = (−∞, z^I_i) or R_i = (z^I_i, ∞), and the symbol ∼i denotes all the indices except i. The acceptance rate in each step will normally be quite low, but can be improved by using the optimal accept-reject algorithm in [27]. The sequence of updates can be arbitrary, so we may update any permutation z_{j_1}, ..., z_{j_n} of the components z_1, ..., z_n. A convenient permutation that takes the hardest constraints first improves the convergence of this sampler.
This univariate Gibbs sampler will in principle converge to the correct distribution, but it converges extremely slowly since the constrained surface points z^I are laterally close and therefore highly correlated, so the change in each iteration of Eq. 18 is small. To improve on this we need to utilize the geometry of the problem.
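For reference, the single-site sampler of Eq. 18 can be sketched as follows. This is a hypothetical illustration that reads the conditional moments off the precision matrix and uses SciPy's truncnorm for the truncated univariate draws; with strongly correlated components it mixes poorly, which motivates the block sampler of Section A.1.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)

def gibbs_tmvn(mu, K, lower, upper, sweeps=200):
    """Single-site Gibbs sampler for N(mu, K) truncated componentwise to
    lower < z < upper (Eq. 18).  Conditional moments come from the
    precision matrix V = K^{-1}."""
    n = len(mu)
    V = np.linalg.inv(K)
    z = np.clip(np.array(mu, dtype=float), lower, upper)  # feasible start
    for _ in range(sweeps):
        for i in range(n):
            others = np.delete(np.arange(n), i)
            var_i = 1.0 / V[i, i]
            mean_i = mu[i] - var_i * V[i, others] @ (z[others] - mu[others])
            s = np.sqrt(var_i)
            a, b = (lower[i] - mean_i) / s, (upper[i] - mean_i) / s
            z[i] = truncnorm.rvs(a, b, loc=mean_i, scale=s, random_state=rng)
    return z

# Two highly correlated surface distances forced to be positive
mu = np.zeros(2)
K = np.array([[1.0, 0.95], [0.95, 1.0]])
z = gibbs_tmvn(mu, K, lower=np.array([0.0, 0.0]), upper=np.array([np.inf, np.inf]))
```

Each update moves one coordinate only slightly when the correlation is high, which is exactly the slow-convergence behavior described above.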

A.1 Block Gibbs sampler
For simplicity of the notation, from now on we ignore the trend coefficients β. We introduce the distances from the well depth z^I_i to the surfaces above and below,

d^a_i = z^I_i − z_{l^I_i − 1}(x^I_i),   d^b_i = z_{l^I_i}(x^I_i) − z^I_i,

and define the distance pairs

d_i = (d^a_i, d^b_i),   i = 1, ..., N^I.

The constraints in Eq. 5 take on a simple form in the new variates: d_i ∈ R²_+ = R_+ × R_+ for all i. We may thus equivalently draw from the distribution of the distance variates d = {d_1, ..., d_{N^I}}. A bivariate Gaussian distribution with mean μ and covariance Σ truncated to the set R²_+ will be denoted N_{R²_+}(μ, Σ). The conditional mean and covariance of a pair d_i given the well points w are denoted μ̂ and Σ̂.
We now present a block version of the Gibbs sampler in Eq. 18 where the variates are updated simultaneously in pairs. This pair strategy has proven far more efficient than the univariate strategy of Section A. Starting from initial realization pairs $d^0 = \{d^0_1, \ldots, d^0_{N_I}\}$, the block Gibbs algorithm computes sample $k$ from sample $k-1$ by sequentially drawing from the bivariate distribution
$$d^k_i \sim \mathrm{N}_{\mathbb{R}_+^2}(\hat{\mu}_i, \hat{\Sigma}_i), \quad i = 1, \ldots, N_I. \qquad (19)$$
Again, the subscript $-i$ denotes all indices except $i$. The two-dimensional mean $\hat{\mu}_i = \mathrm{E}[d_i \mid d^k_{-i}]$ and the $2 \times 2$ covariance matrix $\hat{\Sigma}_i = \mathrm{Var}[d_i \mid d^k_{-i}]$ are given by the well-known expressions
$$\hat{\mu}_i = \mu_i + \Sigma_{i,-i}\, \Sigma_{-i,-i}^{-1}\, (d^k_{-i} - \mu_{-i}), \qquad (20)$$
$$\hat{\Sigma}_i = \Sigma_{i,i} - \Sigma_{i,-i}\, \Sigma_{-i,-i}^{-1}\, \Sigma_{-i,i}. \qquad (21)$$
Here $\mu_i = \mathrm{E}[d_i]$ denotes the $2 \times 1$ mean of pair $i$ and $\Sigma_{i,j} = \mathrm{Cov}[d_i, d_j]$ is the corresponding $2 \times 2$ covariance matrix. The vector $\mu_{-i}$ excludes index $i$ from the full mean vector $\mu = \mathrm{E}[d]$. The matrix $\Sigma_{u,v}$ is the part of the full covariance matrix $\Sigma = \mathrm{Var}[d]$ with row indices in $u$ and column indices in $v$.
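The two building blocks of each pair update — the conditional moments of Eqs. 20-21 and a draw from the truncated bivariate Gaussian — can be sketched as follows. The function names are hypothetical, and the naive accept-reject draw is only a stand-in for the optimized sampler of [9].

```python
import numpy as np

def conditional_pair_moments(mu, cov, i, d):
    """Conditional mean and 2x2 covariance of pair i given all other
    pairs (Eqs. 20-21). The pairs are stacked so that pair i occupies
    rows/columns 2i and 2i+1 of mu, cov and the current state d."""
    n = len(mu)
    idx = [2 * i, 2 * i + 1]
    rest = [j for j in range(n) if j not in idx]
    S_ir = cov[np.ix_(idx, rest)]
    S_rr_inv = np.linalg.inv(cov[np.ix_(rest, rest)])
    mu_hat = mu[idx] + S_ir @ S_rr_inv @ (d[rest] - mu[rest])
    S_hat = cov[np.ix_(idx, idx)] - S_ir @ S_rr_inv @ S_ir.T
    return mu_hat, S_hat

def draw_positive_pair(mu_hat, S_hat, rng, max_tries=10000):
    """Naive accept-reject draw from the bivariate Gaussian truncated
    to the positive quadrant R_+^2."""
    L = np.linalg.cholesky(S_hat)
    for _ in range(max_tries):
        d = mu_hat + L @ rng.standard_normal(2)
        if d[0] > 0 and d[1] > 0:
            return d
    raise RuntimeError("acceptance rate too low; use the sampler of [9]")
```

One sweep of the block Gibbs sampler then calls these two functions for $i = 1, \ldots, N_I$, writing each accepted pair back into the state vector.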
The computation of the matrix $\Sigma_{-i,-i}^{-1}$ of size $(n-2) \times (n-2)$ can be expedited by pre-computing and storing the full precision matrix $V = \Sigma^{-1}$. Parts of this matrix are re-used with the help of the block-inverse identity
$$\Sigma_{-i,-i}^{-1} = V_{-i,-i} - V_{-i,i}\, V_{i,i}^{-1}\, V_{i,-i}.$$
Paper [9] presents a fast rejection sampler for drawing from this type of distribution. The basic idea behind the fast truncated bivariate sampler is to divide the variable space into cases for which a conveniently chosen proposal distribution gives theoretical acceptance rates of 0.5.
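The identity in question is the standard block-inverse (Schur-complement) relation $\Sigma_{-i,-i}^{-1} = V_{-i,-i} - V_{-i,i} V_{i,i}^{-1} V_{i,-i}$, and it can be checked numerically; the matrix size below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
cov = A @ A.T + 6 * np.eye(6)   # a well-conditioned covariance matrix
V = np.linalg.inv(cov)          # pre-computed precision matrix

i = [2, 3]                                  # indices of one pair
r = [j for j in range(6) if j not in i]     # all remaining indices

# Direct inversion of the (n-2) x (n-2) block ...
direct = np.linalg.inv(cov[np.ix_(r, r)])
# ... versus re-using parts of the stored precision matrix V:
# only the small 2x2 block V[i, i] needs to be inverted.
via_V = (V[np.ix_(r, r)]
         - V[np.ix_(r, i)] @ np.linalg.inv(V[np.ix_(i, i)]) @ V[np.ix_(i, r)])
```

The saving is that each pair update now inverts only a $2 \times 2$ block instead of an $(n-2) \times (n-2)$ matrix.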

A.2 Iterative approximation of the multivariate mean
The large number of dimensions and the highly correlated points threaten the convergence of the Gibbs sampler in Eq. 19, since the large number of iterations needed for convergence makes it prohibitively expensive. The Gibbs sampler therefore requires a good initial state. We obtain that state by a (non-linear) iterative scheme whose objective is to approximate the mean of the truncated multivariate distribution of all distance pairs. The main idea is that the bivariate mean $\delta$ of a truncated pair, given the rest of the pairs obtained in a previous iteration, is used as hard data instead of a randomly drawn vector as in Eq. 19.
For each iteration $k$ we compute $\delta^k = \{\delta^k_1, \ldots, \delta^k_{N_I}\}$ as the mean of the positively truncated bivariate Gaussian distribution,
$$\delta^k_i = \mathrm{E}\bigl[\mathrm{N}_{\mathbb{R}_+^2}(\hat{\mu}_i, \hat{\Sigma}_i)\bigr], \qquad (22)$$
where the moments $\hat{\mu}_i$ and $\hat{\Sigma}_i$ are computed with the formulas in Eqs. 20-21, evaluated at the previous iterate $\delta^{k-1}$ instead of at random draws. The mean of this truncated bivariate Gaussian variable is computed analytically using the formulas in [28]. Our experience from applications indicates that 1000 fixed-point iterations are sufficient for a good approximation. In [17], each variable is updated using univariate conditional truncated variables. The authors show that a sufficient condition for this iterative scheme to converge is that the full precision matrix is diagonally dominant. This condition is not satisfied in our application, but the authors show numerically that the iteration gives good approximations of the multivariate mean even when diagonal dominance does not hold.
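A sketch of the univariate variant of this fixed-point scheme, in the spirit of [17], is given below (the paper itself updates bivariate pairs with the moment formulas of [28]). The function name is hypothetical; the closed-form mean of a one-sided truncated normal, $\mathrm{E}[X \mid X > 0] = m + s\,\phi(-m/s)/\Phi(m/s)$, replaces the bivariate formulas.

```python
import numpy as np
from scipy.stats import norm

def truncated_mean_fixed_point(mu, cov, n_iter=1000):
    """Fixed-point approximation of the mean of N(mu, cov) truncated
    to the positive orthant, using univariate updates as in [17]."""
    n = len(mu)
    V = np.linalg.inv(cov)             # precision matrix
    delta = np.maximum(mu, 0.1)        # a positive starting point
    for _ in range(n_iter):
        for i in range(n):
            # Conditional moments of component i given the current
            # iterate delta (used as hard data, not random draws).
            s2 = 1.0 / V[i, i]
            m = mu[i] - s2 * (V[i].dot(delta - mu)
                              - V[i, i] * (delta[i] - mu[i]))
            s = np.sqrt(s2)
            alpha = -m / s
            # Analytic mean of N(m, s^2) truncated to (0, inf).
            delta[i] = m + s * norm.pdf(alpha) / norm.sf(alpha)
    return delta
```

For uncorrelated components the scheme converges in a single sweep; correlation is what makes the repeated sweeps necessary.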
This exploratory analysis of our multivariate truncated distribution must be handled with care. Starting the Gibbs sampler in Eq. 19 from the initial vector resulting from Eq. 22 underestimates the uncertainty. A remedy that produces better spread is to start the Gibbs sampler with higher conditional variances, which amounts to adding random noise to the initial sample.

B.1 Estimating the trend coefficients
The best linear unbiased estimator (BLUE) for the coefficients is the generalized least squares (GLS) estimator [22, p. 172]
$$\hat{\beta}_{\mathrm{GLS}} = (F^{\mathsf{T}} K^{-1} F)^{-1} F^{\mathsf{T}} K^{-1} z, \qquad (23)$$
where the kriging matrix $K$ is the covariance matrix with elements given by Eq. 4. A Bayesian estimate is obtained by specifying the prior means and prior covariances in the $P$-dimensional Gaussian prior distribution for the coefficient values, $\beta \sim \mathrm{N}_P(\mu_0, \Sigma_0)$.
The Bayesian estimates of the posterior expectation and covariance are
$$\hat{\mu}_\beta = \hat{\Sigma}_\beta \bigl(\Sigma_0^{-1} \mu_0 + F^{\mathsf{T}} K^{-1} z\bigr), \qquad (24)$$
$$\hat{\Sigma}_\beta = \bigl(\Sigma_0^{-1} + F^{\mathsf{T}} K^{-1} F\bigr)^{-1}. \qquad (25)$$
This estimate is robust for any number of surface observations $N$, including zero. If the prior uncertainty vanishes, $\Sigma_0 \to 0$, we recover the prior mean $\mu_0$. It can also be shown, under reasonable assumptions, that if $\Sigma_0 \to \infty$ we obtain the GLS estimate [26].
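Assuming the standard GLS and Bayesian linear-model forms for Eqs. 23-25, the two estimators can be sketched as below (function names are hypothetical). The two limiting cases mentioned above — a tight prior returning $\mu_0$ and a vague prior returning the GLS estimate — make a convenient sanity check.

```python
import numpy as np

def gls_coefficients(F, K, z):
    """GLS estimate of the trend coefficients:
    beta = (F' K^-1 F)^-1 F' K^-1 z."""
    Ki = np.linalg.inv(K)
    return np.linalg.solve(F.T @ Ki @ F, F.T @ Ki @ z)

def bayes_coefficients(F, K, z, mu0, S0):
    """Posterior mean and covariance of the trend coefficients
    under the Gaussian prior beta ~ N_P(mu0, S0)."""
    Ki = np.linalg.inv(K)
    S_post = np.linalg.inv(np.linalg.inv(S0) + F.T @ Ki @ F)
    mu_post = S_post @ (np.linalg.solve(S0, mu0) + F.T @ Ki @ z)
    return mu_post, S_post
```

Unlike the GLS estimator, the Bayesian estimator stays well defined when there are fewer observations than coefficients, since the prior precision keeps the inverted matrix non-singular.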

B.2 Conditional distribution given surface observations
The kriging predictor for a surface $l$ is
$$\hat{z}_l(x) = m_l(x) + k_l(x)^{\mathsf{T}} K^{-1} (z - m),$$
where the estimated means are obtained from the coefficient estimates as $m_l(x) = f_l(x) \cdot \beta$ and $m = F \cdot \beta$. For universal kriging, with unknown trend coefficients, the GLS estimates in Eq. 23 are used for the coefficients, whereas for Bayesian kriging the Bayesian estimates in Eqs. 24-25 are used. The covariance vector $k_l(x)$ is the covariance between the location of interest and all the surface observations, $k_l(x) = \mathrm{Cov}[Z_l(x), z]$, where the covariances are given by Eq. 4. The Bayesian prediction error is the simple-kriging variance augmented by the posterior uncertainty of the trend coefficients,
$$\hat{\sigma}_l^2(x) = \mathrm{Var}[Z_l(x)] - k_l(x)^{\mathsf{T}} K^{-1} k_l(x) + g_l(x)^{\mathsf{T}} \hat{\Sigma}_\beta\, g_l(x), \qquad g_l(x) = f_l(x) - F^{\mathsf{T}} K^{-1} k_l(x),$$
where $\hat{\Sigma}_\beta$ is the posterior covariance of the coefficients.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
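A minimal sketch of a kriging predictor of the form $\hat{z}_l(x) = f_l(x)^{\mathsf{T}}\beta + k_l(x)^{\mathsf{T}} K^{-1}(z - F\beta)$ follows; the function and argument names are hypothetical.

```python
import numpy as np

def kriging_predictor(f_x, k_x, F, K, z, beta):
    """Kriging predictor at one location x for surface l.
    f_x:  trend functions f_l(x) at the prediction location
    k_x:  covariances k_l(x) between x and the observations
    F:    trend matrix at the observation locations
    K:    kriging (covariance) matrix of the observations
    z:    surface observations; beta: estimated trend coefficients."""
    residual = np.linalg.solve(K, z - F @ beta)  # K^{-1} (z - m)
    return f_x @ beta + k_x @ residual
```

At an observation location the predictor interpolates exactly, since $k_l(x)$ then coincides with a column of $K$ and the weight vector reduces to a unit vector.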