Exploring substitution random functions composed of stationary multi-Gaussian processes

Simulation of random fields is widely used in Earth sciences for modeling and uncertainty quantification. The spatial features of these fields may have a strong impact on the forecasts made using these fields. For instance, in flow and transport problems the connectivity of the permeability fields is a crucial aspect. Multi-Gaussian random fields are the most common tools to analyze and model continuous fields. Their spatial correlation structure is described by a covariance or variogram model. However, these types of spatial models are unable to represent highly or poorly connected structures even if a broad range of covariance models can be employed. With this type of model, the regions with values close to the mean are always well connected whereas the regions of low or high values are isolated. Substitution random functions (SRFs) belong to another broad class of random functions that are more flexible. SRFs are constructed by composing (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z=Y\circ T$$\end{document}Z=Y∘T) two stochastic processes: the directing function T (latent field) and the coding process Y (modifying the latent field in a stochastic manner). In this paper, we study the properties of SRFs obtained by combining stationary multi-Gaussian random fields for both T and Y with bounded variograms. The resulting SRFs Z are stationary, but as T has a finite variance Z is not ergodic for the mean and the covariance. This means that single realizations behave differently from each other. We propose a simple technique to control which values (low, intermediate, or high) are connected. It consists of adding a control point on the process Y to guide every single realization. The conditioning to local values is obtained using a Gibbs sampler.


Introduction
Random fields play a key role in Earth sciences (Chilès and Delfiner 1999;Lantuéjoul 2002).Indeed, stochastic spatial (or temporal) simulation is one of the most important tools for uncertainty quantification allowing the forecasting of natural phenomena and risk assessment.For example, the modeling of groundwater flow and solute transport underground requires hydraulic conductivity fields as input for the numerical code solving the flow equations.Generating an ensemble of stochastic hydraulic conductivity fields is a key step for representing and propagating the uncertainty in this system.In general, the goal of geostatistical simulation techniques is to provide methods to generate random fields that respect some spatial features and honor conditioning data if present.In particular, the size, shape, orientation, and connectivity are spatial characteristics that need to be controlled by a simulation technique to represent realistic geological structures.
Non-parametric methods such as multiple-point statistics (Mariethoz and Caers 2014), or machine learning techniques such as generative adversarial networks (Goodfellow et al. 2016) allow generating complex realistic random fields, providing that a training data set is available.Such algorithms are extremely flexible because they do not require inferring the parameters of an analytical statistical model, but they can be difficult to set up (parameters, neural network architecture) and can be time-consuming (learning structures from training data).
Conversely, simulation methods based on analytical models are easier to set up, and faster, but they are limited in terms of the complexity of the generated structures.A broad description of such algorithms can be found in Chilès and Delfiner (1999).The most standard simulation techniques are based on the multi-Gaussian random field (or function) (GRF) model (see Chilès and Delfiner 1999, p. 394-395) also known as Gaussian processes (Rasmussen and Williams 2006).These random functions assume a multivariate Gaussian distribution as their spatial statistical law and can be defined by a covariance model, describing the statistics between any pair of points according to their relative location.Considering stationary GRFs, anisotropies, and orientations are easy to handle with the covariance model, but there is no mean to control and simulate various connectivity patterns (Renard and Allard 2013): values around the mean are always well connected (in more than one dimension), and low-or high-value regions are isolated.
But, connectivity patterns in hydraulic conductivity fields have a very strong impact on groundwater flow and solute transport (Zinn and Harvey 2003;Knudby and Carrera 2005;Renard and Allard 2013;Tyukhova and Willmann 2016): low-conductivity connected regions can act as a barrier to the flow, whereas high-conductivity connected regions enable flow paths and fast mass transfer.Zinn and Harvey (2003) proposed a simple technique to get low or high values connected from a GRF.A zero mean GRF is transformed by taking the absolute value which produces a low-value connected region, or high-value connected by reversing the sign.Then a normal-score transform is applied to ensure that the marginal distribution is Gaussian and finally, a coordinate rescaling allows adjusting the covariance.However, this strategy produces peaks (non-derivable) for extreme value areas, and honoring the conditioning data is difficult.
In this article, we propose to modify stochastically GRFs to enrich the range of connectivity patterns that can be simulated while keeping the ability of conditioning.We use the framework of substitution random function (SRF), defined as the composition of two independent random processes, Z(x) = Y(T(x)) .This family of random functions was intro- duced by Lantuéjoul (1993).In his book, Lantuéjoul (2002) derives properties on Z assuming that the directing function (latent field T) has stationary increments and that the coding function is stationary.Moreover, he describes an algorithm for conditional simulation and proposes examples for categorical fields.These examples are based on Chentsov simulation as the directing function and a Markov chain as the coding process.
To our knowledge, only a few applications of SRF can be found in the scientific literature.Recently, Allard et al. (2020) developed a simulation technique for generating space-time random fields, where the coding process consists of a cosine function with a random amplitude and a random phase.Emery (2008) develops SRF methods for continuous simulation, based on a multivariate directing function composed of independent latent GRFs with unbounded variograms, and on a GRF as the coding process with separable covariance.In this way, a finite integral range can be obtained for the resulting SRF and, as a consequence, its ergodicity.Illustrations on a pollution data set show the ability of SRF to generate realizations of pollutant concentration depicting clustering of high values with more spatial contrasts than classical GRF.
In this work, another point of view is adopted, we investigate how to modify a stationary GRF considered as a latent field (directing function) with the use of a continuous coding process defined as a uni-dimensional multi-Gaussian process, to obtain various connectivity patterns.Considering that the directing function has a bounded variogram (finite variance) implies that the resulting SRF is not ergodic for the mean and the covariance.Different characteristics will be depicted from one realization to another.We introduce a control point on the coding process to guide the simulation towards the desired connectivity property.The idea is to condition the coding process at the mean of the latent field.Moreover, we derive an expression for the expectation of the probability distribution function of the simulated values in a single realization for this case.This allows applying an anamorphosis (normal score transform, or more generally change of distribution) while preserving the ability to generate conditional simulations.
The paper is organized as follows.Theoretical developments are proposed in Sect. 2 to 4, illustrations are presented in Sect.5, and finally, a discussion and conclusions are given in Sect.6.

Substitution random functions (SRF) as a composition of multi-Gaussian processes
A substitution random function (SRF) Z on ℝ d with values in ℝ is a composition where T and Y, respectively called the directing function and the coding process, are two independent random functions.
The directing function is assumed to have stationary increments, i.e. the distribution of T(x) − T(x + h) depends only on the lag vector h, and the coding process is assumed to be stationary, i.e. the distribution of Y at a family of locations {t i } is the same as the distribution at {t i + t} for any t.Note that the directing function can be multivariate, T with values in ℝ k , which implies a coding process on ℝ k .In the follow- ing, we consider the univariate case ( k = 1 ).Under these assumptions, some properties on Z are known.In particular, the SRF Z is stationary with same distribution at any point as Y (Lantuéjoul 2002), i.e. denoting D V the point distribution of a random function V, Moreover, denoting C Y (s) = Cov(Y(t), Y(t + s)) the covari- ance function of Y (assuming it exists), the covariance function of Z, C Z (h) = Cov(Z(x), Z(x + h)) , is expressed as (Lan- tuéjoul 2002)   In the following, we focus on the case where

Integral range and ergodicity
For a stationary random function V on ℝ d , the average value over a domain Ω ⊂ ℝ d , is an unbiased estimator of the point mean V = (V(x)) (independent of x), since [V(Ω)] = V .If its variance tends to zero when Ω grows to ℝ d (a property known as ergodic- ity for the mean, see Lantuéjoul (2002)), then this means that V can be estimated by taking the average value of a single realization over a large domain.Assuming V secondorder stationary with covariance C V , the variance of V(Ω) is expressed as (5) It is linked to the integral range of V defined as that can be computed, if C V is integrable, as (Lantuéjoul 2002) This gives a simpler expression for the variance of V(Ω) for a large domain Ω, Assuming that the covariance function C T of the GRF T decreases towards 0 (when |h| increases), such that the integral range of T, I R (T) (Eq. ( 10)) is finite, implies Var(T(Ω)) → 0 when Ω → ℝ d , i.e.T is ergodic for the mean.However, even if the covariance function of Y is rapidly decreasing towards 0, from Eq. ( 6) the covariance function of Z is decreasing but lower bounded by g Y (2C T (0)) = g Y (2 2 T ) , which is str ictly posi- tive since the variance of T is finite.This implies by Eq. ( 10) that I R (Z) = +∞ , and by Eq. ( 8) that T ) dh is finite, then, with 1 Ω the indicator function of the domain Ω, and Var[Z(Ω)] converges to g Y (2 2 T ) when Ω → ℝ d , i.e., for a large domain Ω, As Var[Z(Ω)] does not vanish, Z is not ergodic for the mean.A similar analysis of the ergodicity of the covariance can show that it is not ergodic as well for the covariance.This will be illustrated graphically with some examples in Sect. 5. ( 8)

Ensemble covariance and distribution of SRF
The non-ergodicity of Z for the mean and the covariance implies that one cannot infer the properties of Z from a single realization.For example, to estimate its covariance function (Eq.( 6)), an ensemble of realizations {Z i } i∈I has to be considered on a large domain Ω, (with Ω and I equipped with the uniform distribution).The covariance function C Z is then referred to as an ensemble covariance function.Similarly the cumulative distribution function (CDF) of Z, which is known to be the CDF of N( Y , 2 Y ) (see Eq. ( 2)), can be estimated from the ensemble of realizations, where 1 ⩽z is the indicator function of the interval ] − ∞, z] , and Φ the CDF of N(0, 1).
In the next sub-sections, we analyze the properties of single realizations.This is important since in practical applications single realizations are used as input for further computations.

Covariance of single SRF realizations
To decouple the ensemble of realizations and the simulation domain in the estimation of the covariance (Eq.( 14)), we use the fact that, for any random variables U, V, W, where C Z i denotes the covariance of a single realization Z i computed over Ω .Note that in the last step, h is assumed to be a small lag vector compared to the size of Ω , such that in the second term, x (Z i (x)) ≈ x (Z i (x + h)) ≈ Z i (Ω) .Hence, the covariance function for a single realization, C Z i , is in mean equal to the ensemble covariance function C Z , shifted by the variance of the average value over the simulation domain, the mean covariance function for single realization of Z, it follows by Eqs. ( 6), ( 13) and ( 18) that, for a large domain Ω, Analytical expressions for g Y (s 2 ) are given in Table 1 in the case of classical covariance functions C Y of type Gaussian, exponential, and spherical, with a sill of 2 Y and a range of r Y .They are obtained by simple integrations (the result for the Gaussian and exponential models can also be found in Emery (2008)).

Role of the parameters of the covariance models for the directing function and the coding process
In this section, we investigate the influences of the ranges and sills (variances) of the directing function T and the coding process Y onto the mean covariance function C Z s for a single realization of Z.
From Eq. ( 19), C Z s vanishes (or tends to zero) when C T does, therefore the mean range of single realization of Z is ( 17) equal to the range of T, r Z s = r T .In particular, a realization Z of the SRF displays the same anisotropies as in the latent field T.
The covariance function C Y with range r Y and sill The mean sill of a single realization of Z is then equal to Thus, the sill of Y, 2 Y , and the ratio 2 T ∕r 2 Y controls the mean sill of a single realization of Z: This means that taking a very small range r Y compared to T vanishes the spatial correlations on Y which tends to be a purely white Gaussian noise, and the variance of the resulting field Z will be equal to 2 Y .On the opposite, a very large range for Y compared to T implies almost no variation in the resulting field Z (sill decreases to zero), because the coding process Y(t) will return nearly constant values for the simulated values t of the latent field.In particular, the mean sill of a single realization of Z, 2 Z s , does not exceed the sill of Y.
To summarize: the range(s) r T controls the size and shape (anisotropy) of the main structures in single realizations Z, the ratio 2 T ∕r 2 Y controls the size of the small scale fluctuations within these main structures, and the sill 2 Y controls the overall amplitude of the simulated values in Z (see Figs. 1, 2 in Sect.5).( 23)

Adding control points on the coding process
As a consequence of the non-ergodicity of the SRF Z = Y•T , the properties of a single realization of Z may significantly differ from one realization to the other.First, as the values of the latent field T follow a normal distribution around its mean T (for instance more than 95% of the T values are in the interval T ± 2 ⋅ T ), and providing that the ratio 2 T ∕r 2 Y is not too high (i.e. the values do not vary too rapidly through Y), the distribution of the simulated Z values in one realization strongly depends on the value of Y( T ) : a value smaller than Y (mean value of the Gauss- ian process Y) will favour low values (compared to Y ) in the field Z, whereas a value greater than Y will favour high values.
Secondly, it is well known that, whatever the stationary covariance model for the directing function T, one can observe (in more than one dimension) that the region with values close to T in the latent field, {x ∈ Ω T(x) ≈ T } , is well connected, whereas the low-value and high-value regions are made up of several isolated zones (Zinn and Harvey 2003).Hence, in the field Z(x) = Y(T(x)) the region with values close to Y( T ) will be well connected (see Figs. 1, 2  in Sect.5).
According to this finding, a simple idea consists of imposing the value of Y at T that has to be connected.Changing this value allows for exploring several scenarios of connectivity patterns.Note that in general an ensemble of control points on the coding process, {Y(t k ) = y k } k∈K , can be considered.

Ensemble distribution of SRF conditioned
to where the third equality holds because of the independence of T and Y, and the last equality follows from the conditional CDF of the distribution in Eq. ( 25) expressed with the CDF Φ of N(0, 1) and from T(x) ∼ N( T , 2 T ).Note that the CDF in Eq. ( 26) is an ensemble distribution, which can be estimated from an ensemble of realizations Z i | Y( T ) = y T , as in Eq. ( 15),

Specifying a target distribution
The goal is to obtain realizations of the SRF with values following a marginal cumulative density function (CDF) G.
Knowing the ensemble distribution F Z , it is possi- ble to transform the values of Z by applying the anamor phosis Thus, for any realization, the transformation is For unconstrained SRF simulation, F Z is the CDF of N( Y , 2 Y ) (see Eq. ( 2)), whereas for SRF controlled by the value of Y at the mean of T, i.e.Z | Y( T ) = y T , the CDF given in Eq. ( 26) can be used to define the anamorphosis . The ensemble distribution can be expressed as the mean distribution of all the single realizations (see Eqs. ( 15) and ( 27)).Hence, an idea to reduce the spread of the ensemble of the single CDFs is to apply the anamorphosis In this way, each realization is transformed using its own anamorphosis that accounts for the value of the underlying realization of the coding process at the mean of the simulated T field.Note that one single realization could be transformed by an anamorphosis based on its empirical ( 26) CDF itself, however, this latter often displays sharp transitions and should then be smoothed to get a reliable anamorphosis function.

Conditional SRF simulations with control points on the coding process
Conditioning SRFs can be done using a Gibbs sampler (Lantuéjoul 2002;Emery 2008).In this section, we show that this strategy can still be used in the presence of control points in the coding process.Let {Y(t k ) = y k } k∈K a set of control points on Y and consider a set of conditioning data {Z(x j ) = z j } j∈J .The aim is then to generate conditional simulations of The following algorithm allows generating one conditional realization on a domain Ω .
(1) Generate Whereas the steps (2) and (3) consist of classical conditional multi-Gaussian simulations, the step (1) is more difficult: the aim is to generate values t j that are the outputs of T at the conditioning locations x j , and the inputs of Y sent to the conditioning values z j .Hence, these values t j must be consistent with the covariance of T regarding the locations x j and with the covariance of Y regarding the locations t k and the values z j and y k .This step (1) is done with a Gibbs sampler as follows.
(1a) Initialization: generate {t j = T(x j )} j∈J , unconditional simulation of T at the conditioning locations x j .(1b) Choose randomly (and uniformly) one index j 0 ∈ J. (1c) G e n e r a t e a c a n d i d a t e v a l u e t � j 0 = T(x j 0 ) | {t j = T(x j )} j∈J,j≠j 0 .
(1d) Compute the Metropolis ratio (see appendix A) and update t j 0 : accept the candidate t ′ j 0 with probability p j 0 = min(1, r j 0 ) , i.e. draw u uniformly in [0, 1], and set t j 0 = t � j 0 if u ⩽ p j 0 , and let t j 0 unchanged otherwise. (29) Go to step (1b) until a given number of iterations is reached.
This algorithm produces a Markov chain t (n) j j∈J n⩾1 following the distribution as wanted in step (1).Note that the acceptation probability p j 0 in step (1d) can be defined more generally as p j 0 = f (r j 0 ) , where f is a function defined on positive real number with values in ]0, 1] verifying f (u) = u ⋅ f (1∕u) .The function min(1, u) is such a function, u∕(1 + u) another one.Note finally that as for steps (2) and (3), step (1c) and the computation of the Metropolis ratio (Eq.( 29)) in step (1d) involve only classical conditional multi-Gaussian simulations.The numerator (as well as the denominator) is treated by solving a kriging system to retrieve the mean and variance of the corresponding Gaussian distribution (the density function is used instead of ℙ(.)).
Note that, provided that the ensemble distribution of the SRF is known a priori (before generating realizations), the anamorphosis H discussed in Sect.3.2 could be used to approximately fit a target distribution.In this situation, the data values z j = Z(x j ) are first transformed via H −1 , i.e. zj = H −1 (z j ) , then the conditional SRF simulation Z is done (given � Z(x j ) = zj ), and finally the resulting field is back- transformed via H, i.e.Z(x) = H Z(x) .However, as a con- ditioning data point may influence any point in the simulation grid (non-ergodicity of Z), the ensemble distribution of SRF is no longer the same one as for unconditional simulation; therefore, the anamorphosis only helps guide the realizations towards the target distribution, but the final ensemble CDF will not fit it exactly.

Illustrations
In the following examples, covariances are used for the directing function and the coding process.The Matérn covariance model (Stein 1999) of parameter is given by the function (in one dimension) defined as where K is the modified Bessel function of the second kind of order (Olver et al. 2010).If h → 0 , then M (h) → 2 , the variance of the model.The parameter r is a scale parameter linked to the effective range, r eff , such that M  (h) < 0.05 ⋅  2 for h > r eff ; given , one can numerically compute r as a (30) function of r eff and inversely.The advantage of such a model is that the parameter controls the smoothness of the resulting random fields: for = 1∕2 , one gets the exponential model of effective range 3r, M 1∕2 (h) = 2 exp − |t| r , and if → +∞ , then M (h) → 2 exp − t 2 2r 2 , the Gaussian model of effective range √ 6r (see expression of classical model in Table 1).

Simple case and influence of the range of the coding process
Two-dimensional SRFs Z = Y(T(x)) are generated in a simu- lation domain Ω of 250 × 200 cells.For the latent field T, a Matérn covariance model of parameter T = 3∕2 is used, with effective ranges of 45 and 15 (in number of cells) along horizontal and vertical axis respectively.The variance is set to 2 T = 1 and the mean to T = 0 .Note that for convenience these values for the variance and the mean for T can be kept constant because the final range of values in Z is controlled by the parameters of the coding process Y.
For the following examples, the mean of Y is set arbitrarily to Y = −3 , its variance to 2 Y = 2 , and a (uni-dimen- sional) Matérn covariance model of parameter Y = 3 is cho- sen (rather smooth).Different values of the effective range r Y are used in the following examples, they are taken as a given coefficient times T (according to the discussion in Sect.2.4).
Figure 1 shows one example of a realization of the SRF Z(x) with r Y = 3 ⋅ T .The figure shows the simulation used for the directing function T(x) and the coding process Y(t).The most important feature is that the intermediate values (around 0) are connected over large distances in the simulation of T(x) while the low values (around -5) are those which are connected in Z(x).Depending on the simulation of the coding process, the range of connected values will change.This feature is crucial since it will allow covering a broader range of connectivity patterns than the GRFs. Figure 2 shows another example with a smaller correlation length for the coding process, r Y = 1 ⋅ T .When comparing Figs. 1 and 2, we see that the sizes of the main structures in the fields T and Z are similar in both figures, but there are more inner variations within the large structures when r Y is smaller, as expected.We also observe that large values in the Z field in Fig. 2 are more frequent and more connected than in Fig. 1.This is not related to the parameter r Y , but it is explained by the fact that the value of Y( T ) is low in Fig. 1 (resp.high in Fig. 2) compared to Y (see the dotted lines in the figures), as discussed in Sect.3.

Ensemble of unconstrained SRF simulations
Distributions and covariances computed from an ensemble of SRF realizations are illustrated in this section.The same simulation domain and the same parameters as in the previous section are considered for both the directing function T and the coding process Y, except the range for Y set to r Y = 2 T .
An ensemble of 200 realizations of Z(x) = Y(T(x)) is generated.Figure 3 shows two realizations from this ensemble as well as their density distributions and covariances.For each realization Z i , the empirical cumulative distribution function F Z i and the covariance function along x-axis, C Z i (h) = Cov x∈{x∈Ω∶x+h∈Ω} Z i (x), Z i (x + h) with hor- izontal lag vector h, are computed.Statistics (min, max, mean, and quantiles) are then retrieved from these curves.For the CDF (Fig. 3d), the mean curve i F Z i (in brown in the figure) is similar to the CDF of N( Y , 2 Y ) (in pink), according to Eq. ( 15).For the covariance (Fig. 3e, f), the mean curve i C Z i (h) (in brown) also fits well the theoretical function (in pink) given by Eq. ( 19) (with g Y computed empirically).The two realizations dis- played in Fig. 3a, b are selected such that their empirical CDFs are respectively below and above the quantiles 25% and 75% of the theoretical CDF at Y .This figure shows the wide variability of the marginal distributions and covariances of the realizations obtained with the SRF model.In the following sections, we will use a control point on Y and histogram transforms to better constrain the realizations.

SRF simulations with a control point
The same set-up as in the previous section is considered but a control point is added to guide the simulations.An ensemble of 200 realizations of the constrained SRF The results are shown in Fig. 4.

Figure 4d shows that the mean CDF curve
(in pink) given by Eq. ( 26) (which is no longer Gaussian).Figure 4e, f show that the mean covariance curve does not deviate much from the theoretical covariance function given by Eq. ( 19) (not accounting for the control point).The two realizations displayed in Fig. 4a, b are selected such that their empirical CDFs are respectively below and above the quantiles 25% and 75% of the theoretical CDF at 1∕2( Y + y T ) .Note that the high-value region is rather well connected in both these realizations, but their covariance and distribution are rather different.
Compared to the unconstrained SRF simulations (Sect.5.2), the distributions of the simulated values in every realization are less spread around the theoretical mean distribution (compare Figs. 3c,d and 4c,d), whereas the covariance curves show similar variability (compare Figs. 3e, f and 4e, f).

Conditional SRF
This section compares conditional simulations obtained with the SRF (Z) and GRF (X) models.Five conditioning data points are considered in the simulation grid (same domain as in the previous examples).The data values are respectively set to −5 and −1 for the two points in the lower and upper parts of the simulation grid, and to −3 for the point near the center (see the circles in the first row of Fig. 5).For the SRF, we use the same parameters as those employed in Sect.4, with the target distribution is used, where G is the CDF of the target distribution.As mentioned in the last paragraph of Sect.4, the target will not be fitted exactly.For the GRF, we propose to use the covariance model used for the latent field T, but with the variance and mean of Y to fit the target distribution.The following three cases are considered.
(1) SRF with a low value as control point, In Fig. 5, the first realization is displayed for each case in the top row, and the statistics on the cumulative distribution and the covariance along the x-axis of every realization in the middle and bottom rows respectively.For the SRF, the distributions are guided by the target one, but the tails do not match well the target, whereas, for the GRF the distribution matches very well the target distribution everywhere.The theoretical mean covariance, computed without accounting for the control point, the conditioning data, and the anamorphosis, is shown as a pink dashed line on the two first plots.The presence of conditioning data and the anamorphosis explain the deviation from the actual mean of the covariances of the single realizations.Note that the covariance of the GRF is defined as 2 Y times the covariance Y ⋅ C T (h) , thus the last row in Fig. 5 shows how the coding process Y transforms the covariance of T into the covariance of Z.
We then consider the connectivity of the simulated fields.The top row of Fig. 6 shows the first realization of each case, thresholded in three categories: in blue (resp.orange) the cells having a value less than Y − Y (resp.greater than Y + Y ) and in green the remaining cells (with values between Y ± Y ).The realization of the SRF for case (1) has the low values (blue) well connected, whereas for case (2) the high values (orange) are well connected.For the GRF, the middle values (green) are well connected.This visual analysis confirms that the SRF simulations seem to have a different type of connectivity than the GRF simulations.
To quantify the connectivity properties, we use two metrics: the Γ connectivity function Γ(v) and the connectivity function (h) .These metrics are described in detail in Renard and Allard (2013).They are defined for any continuous field Z on a grid Ω as follows.For a value v, the subset of Ω composed of the cells where Z is less than or equal to v, S v = {x ∈ Ω ∶ Z(x) ⩽ v} , is considered, and the number N(v) of its connected components, and their respective number of cells, n 1 , … , n N(v) , are retrieved.Then, Γ(v) is defined by Renard and Allard (2013) as the probability that two cells randomly chosen in S(v) are connected (i.e.belong to the same connected component).It can be expressed as where i=1 n i is the total number of cells in S(v).Note that Γ(v) is set to 1 if S(v) is empty.When this prob- ability is equal to 1, all the grid cells having a simulated (32) value lower than v belong to the same connected component.When the probability is close to zero, the set of cells with Z lower than v is highly fragmented and composed of many unconnected subsets.For each realization, the curve Γ(v) is computed and shown in the middle row of Fig. 6.A complete characterization of the connectivity pattern would require in addition the computation and analysis of the Γ c (v) function for the complementary set corresponding to the values higher than the threshold.But, this analysis would go beyond the scope of this paper.Here, we can already observe and conclude from the middle row of Fig. 6 that the three Γ(v) functions are very different.On those plots (middle row of Fig. 6), the vertical dotted line indicates the abscissa value Y .For the GRF, the Γ curve rapidly increases around this value, whereas for the SRF in case (1) it starts to increase before, reflecting the good connection of low values.In case (2), the curve remains longer with low probabilities, meaning that the connections of the values below the threshold are broken by the connections of the high-value areas.
The connectivity function (h) is another tool to quantify the connectivity.It provides more information (about the size of the connected components) but is restricted to indicator (categorical) random functions.Here we apply it only for the high values of Z.For a realization Z, the set  6 for the first realization.Then, the probability that two grid cells x and x + h in M distant from a horizontal lag vector h are in the same connected component, is computed and written the two-head arrow meaning that a path of adjacent cells within the set M and linking the two cells exists.Statistics on the curves computed for each realization in each case are shown in the bottom row of Fig. 6.One observes that the SRF with the high control point value (case (2), middle column) shows higher probabilities of connection when h increases.This means that the probability of observing larger connected components is higher in this case.Although the range of the covariance is longer for the GRF than for the SRF, the curves decrease faster (right column) and therefore the probability of getting large connected components drops rapidly to zero in that situation.This confirms that the SRF model can cover a much larger range of patterns for the connected components than the GRF model.

Conclusions
Stationary multi-Gaussian random fields (GRF) are parameterized by their mean and covariance model.They are easy to define and simulate but their connectivity patterns cannot be controlled.In two or three-dimensional simulations, low and high-value regions always form isolated zones, whereas the middle-value (near to mean) region is well connected.Using multi-Gaussian fields may therefore lead to an underestimation of the uncertainty when predicting flow and transport (Gómez-Hernández and Wen 1998;Zinn and Harvey 2003;Kerrou et al. 2008) because simulations with highly connected or highly disconnected hydraulic conductivity values would not be simulated by this technique.This paper investigated therefore the feasibility of using substitution random field (SRF) to generate fields having a broader (and if possible controlled) distribution of connectivity patterns.
To investigate this question, we used substitution random fields built by composing two stationary GRFs, the directing function T, and the coding process Y to get Z = Y•T .This technique is parsimonious because it uses the simple parameterization of GRFs with their mean and covariance function, but it allows enriching the generated spatial features, in particular in terms of connectivity.
Assuming that the directing function T is second-order stationary with bounded variogram (finite variance), we have shown that the resulting SRF Z is not ergodic in the mean and the covariance.It means that the statistical properties of Z cannot be derived from a single realization or field observations.Nevertheless, we have established an analytical expression for the mean of the covariance function describing a large ensemble of realizations.Furthermore, adding a control point on the coding process Y, consisting of imposing the value of Y at the mean of T, allows to control partly the connectivity structures of the simulated fields: the region with values close to the prescribed value at the control point tends to be well connected.Moreover, the mean distribution over the ensemble of realizations can be expressed with respect to this control point.We show how the simulation of this type of SRF can handle conditioning data with a Gibbs sampler.Thus, we provide an algorithm able to generate conditional simulation with partial control of the connectivity patterns.
However, the type of SRF tested in this paper suffers from several drawbacks.First, the target distribution is only approximately reproduced for each single realization, especially around the extreme values.Indeed, for instance, fields with the high-value region well connected tend to present a peak for the high values in the histogram.This peak is difficult to transform in a long tail as in a Gaussian distribution.The underlying reason for this phenomenon is that it is not possible to create connected paths over a long distance (an infinite cluster) in a random field if the proportion of cells involved in this path is too small.Secondly, identifying the parameters of the underlying covariance models is difficult.The range of the coding process should be set relative to the standard deviation of the directing function.But more generally, the non-ergodicity of this model makes the inference of the parameters difficult.This suggests that further research should be conducted in this field before the method can be applied easily for field applications.
In summary, although the proposed method is still a bit difficult to constrain because the statistics and the connectivity patterns vary strongly between the realizations, this can also be seen as an advantage because it allows mitigating the risk of underestimation of uncertainty.This may be very important for groundwater flow and solute transport or other applications deeply impacted by connectivity structures.

Appendix A Metropolis ratio in the Gibbs sampler
The Metropolis ratio (Eq.( 29)) in the Gibbs sampler algorithm of Sect. 4 is obtained as follows (adapted from Lantuéjoul (2002)).Let be the distribution over {t j } j∈J that has to be sampled in step (1) for conditional SRF given by Eq. (28), i.e.
where t J = {t j } j∈J , and similarly for x J , z J , and t K , y K .
For a given index j 0 , let J 0 = J ⧵ {j 0 } be the ensemble of indices in J private of the index j 0 , and for a candidate value t ′ j 0 , let t ′ J be the vector with the value t ′ j 0 at index j 0 and with (A1) (t J ) = ℙ T(x J ) = t J | Z(x J ) = z J , Y(t K ) = y K

2Y
= C Y (0) can be written as where C Y b is the covariance function (of same type as C Y ) with range and sill equal to 1. Using the definition of g Y and the change of integration variable t = r Y ⋅ u , it follows that

Fig. 1
Fig. 1 Example of one realization of a substitution random field (SRF).Top left) one realization of the 2D directing function T(x), bottom) one realization of the 1D coding process Y(t) (blue line), top right) resulting SRF field Z(x) = Y(T(x)) .The bottom plot shows T to control what is connected in the realizations of Z.Then, we use a control point Y( T ) = y T and consider the conditional simulation the constraint on Y (control point) indicating the region around the specified value y

Fig. 2
Fig. 2 Example of one realization of SRF as in Fig. 1 but with a smaller range for Y; top left) field T(x), bottom) process Y(t) (blue line), top right) resulting SRF field Z(x) = Y(T(x)) and anamorphosis.(2) SRF with a high value as control point, y T = Y + 1.2 ⋅ Y , and anamorphosis.(3) GRF X based on the same covariance model as T, except the sill set to 2 X = 2 Y = 2 , and the mean set to X = Y = −3 . (No anamorphosis.)In the three cases, 200 conditional realizations are generated.The results are shown in Figs. 5 and 6.

Fig. 3 Fig. 4
Fig. 3Results for an ensemble of 200 realizations of unconstrained SRF; a-b) two selected realizations (same color bar), c-f) statistics computed from the ensemble and theoretical result (pink)

Fig. 6
Fig.6Results of conditional simulations: left column) SRF with low y T , case (1); middle column) SRF with high y T , case (2); right column) GRF X, case (3).Top row) first realization thresholded; middle T is a stationary GRF on ℝ d , defined by its mean T and covariance function C T ,• Y is a stationary GRF on ℝ , defined by its mean Y and covariance function C Y . •