Local Ranking of Geological Conceptual Models in Non-stationary Settings Using Multi-point Geostatistics

In geomodeling, it is commonly accepted that the distribution of physical properties is controlled by the architecture of geological objects. However, insufficient data and the complexity of earth processes create an ill-posed problem where many architectures are plausible. Consequently, several geologists will produce different geological models for the same location. This contribution proposes a way to objectivize the ranking of those conceptual models by comparing them with hard data, both globally for the whole study region and locally for certain of its sectors. The idea is to extend the multi-point geostatistics direct sampling algorithm to be able to extract data events from different training images, representing several competing geological models, and to record the training image origin of values pasted on simulation grid cells. By tracking the frequency with which every training image is visited, we can rank the likelihood of each geological model. Histograms of the frequency of usage of each training image will provide a global ranking of the several conceptual models, while maps of these frequencies can be used to produce the local rankings. We demonstrate this method in two synthetic fluvial depositional environments where three distinct geological concepts are being proposed, with different abundances of hard data. Results indicate that the proposed method could be a useful tool in defining which geological concept dominates at a particular region and which is the frequency ranking for each training image on that region.

which geological concept dominates at a particular region and which is the frequency ranking for each training image on that region.
Keywords Multi-point geostatistics · Geological uncertainty · Local ranking

Introduction
Geological heterogeneity has a strong influence on the distribution of physical properties related to earth resources such as the concentration of economic minerals, porosity and permeability. Therefore, the process of identifying and building a geological architecture, including the location of important geological bodies and their relationships, is usually established before spatially populating them with physical properties (Strebelle and Remy 2005;Zhang et al. 2006;de Almeida 2010). This conventional workflow has proven to be effective in both the exploration and exploitation phases of natural resources.
When evaluating the geological conditions of an area, geologists must deal with complex, scarce, and unevenly distributed data originating from a broad range of measurement devices with different scales (spatial, temporal, and statistical). Direct measurements of critical physical properties, e.g. out of core samples, are very limited so that the physical properties are mostly interpolated or calculated from indirect measurements such as well logs, seismic data, or other geophysical methods. The illposedness of this problem allows several geologists to have different interpretations for the same data on the same area. Another substantial reason for this disagreement is the interpreters' preexisting knowledge, which Bond et al. (2007) refer to as conceptual uncertainty: these authors observed 412 interpretations of a single synthetic seismic section produced by forward modeling and defined a conceptual uncertainty space that encompasses the most uncertain aspects of the interpretations obtained. This work highlighted the need for tools that can robustly quantify the likelihood of proposed geological concepts against further independent data, as a means for validation, selection, or ranking of competing geologic interpretations.
There are several methods that can be used to spatially predict subsurface geological elements, such as geostatistics and stratigraphic forward modeling methods. Geostatistics is useful in quantifying data uncertainties and has become a standard tool in the geosciences. On the other hand, stratigraphic forward modeling methods can be used to test hypotheses about the processes that take place in the system, but there is generally no way to condition their results to hard data. Both methods can be incorporated into any workflow (Michael et al. 2010;Zhang et al. 2019) so that it can be expected to deliver more realistic and objective results.
One group of geostatistical methods is called multi-point geostatistics (MPS). This utilizes a training image as input together with spatial data. The training image represents the conceptual arrangement of geological elements, and both second-order and higher-order statistics can be extracted from it (Guardiano and Srivastava 1993). The training image is often tauted as more intuitive for geologists, in particular compared with explicit two-and multi-point structural functions (variograms, autocovariance functions, generalized covariances, high-order moments, etc.) in providing explicit geological information, and it is particularly useful for categorical data. Here, geologists can apply their prior knowledge and experiences to incorporate features in the simulation that cannot be observed in sparse data but that the analysts consider necessary.
The use of a training image should consider its representativeness to conditional data, rather than based solely on relevant geology. Some authors (Boisvert et al. 2007;Feng et al. 2017;Pérez et al. 2014) have discussed the issue of training image quality assessment, a process which is usually carried out prior to simulation. Boisvert et al. (2007) proposed using the distribution of runs for a one-dimensional vertical direction, employing extensive vertical well data and a multiple-point histogram for any dimension (one-dimensional, two-dimensional, or three-dimensional). Pérez et al. (2014) counted the occurrences of conditioning data event patterns on each training image by scanning the whole training image or by using less expensive direct sampling algorithms. Feng et al. (2017) proposed the ranking of training images by calculating the mean and the variance of the minimum data event distance (MDevD), which can be obtained using the direct sampling algorithm.
Apart from MPS, there are also other methods that use training images to obtain spatial structure information such as the high-order spatial cumulant method (Dimitrakopoulos et al. 2010) and neural networks (Caers 2001). The high-order spatial cumulant method is less sensitive to training images, since it uses multi-point information from both conditioning data and training images to estimate the local probability density function. To the best of authors' knowledge, this method has not yet been used for assessing the quality of training images.
The objective of this paper is to propose a data-driven method to rank the preference for local conceptual models. Additionally, the method produces conditional simulations, with certain shortcomings. We approach the problem by employing multiple training images containing alternative geological concepts exemplarily in one of the MPS methods, direct sampling (DS). The conventional DS algorithm is modified slightly to enable it to visit several training images and to record the training image origin of simulated values. Subsequently, we obtain the frequency for each training image at each location of the simulation domain while generating realizations that might mix patterns and structures coming from different training images.
The paper is divided into four sections. This first section has discussed the background of this research. The second section describes in more detail the DS method and our proposal to extend it to the objective of this paper. This section also illustrates the application of the method to several synthetic cases representing fluvial depositional environments. Some training images with different geological concepts are used in these case studies. A discussion of the results is presented in Section 3.

Multi-point Geostatistics
MPS was first introduced in rather hypothetical fashion by Guardiano and Srivastava (1993), and since the work of Strebelle (2002), it has been developed to solve real case studies (Harding et al. 2005; van der Grijp and Minnitt 2015; Kim et al. 2018) with a variety of algorithms. Originally, MPS methods were developed for categor-ical variables such as rock facies, but current implementations are able to deal with continuous (Zhang et al. 2006) and even with compositional variables (Talebi et al. 2019). MPS methods are very attractive because they can reproduce curvilinear and complex structures while honoring conditional data, both hard and soft. This ability is particularly important in settings where the realizations are further employed for highly non-linear post-processing, such as fluid flow simulations or mine scheduling.
In MPS, geological concepts and interpretations are provided in the form of a training image, and multi-point statistics information is borrowed from it. This training image might come from a digitized image made by a geologist, numerical modeling of physical simulation (Michael et al. 2010), prior modeling using two-point geostatistics or object-based methods (Strebelle 2002;Zhang et al. 2006), outcrops (Pickel et al. 2015), modern analogues (Jung et al. 2012;Hashemi et al. 2014;Dagasan et al. 2019), or spatial data if extensively available (Yin et al. 2016). In geology, an expert opinion in the form of an image is highly informative, as it gives a broad picture of the distribution of geological facies and the interdependence between them. Hence, the idea of using a training image is a highly promising and relevant approach. Alternatively, training images can be generated by physical or numerical simulation, capturing highly detailed physical phenomena and high-frequency characteristics, although not all geological conditions can be readily simulated at the required complexity level. In any case, the representativity of a training image and its ability to appropriately reproduce the spatial distribution of the studied features was recently challenged by Emery and Lantuéjoul (2014), who concluded that training images must be theoretically very large.

MPS Direct Sampling
One method of MPS is direct sampling (DS, Mariethoz et al. (2010)), which is based on distance calculations between the data events in the simulation grid and the data events in the training image, which are limited by a specific window size. A data event contains values obtained from either the simulation grid (hard data or previously simulated data) or the training image that are located relative to a central reference point. The distance calculation depends on the type of variables that are being modeled, whether discrete, continuous (Mariethoz and Kelly 2011), or even specific multivariate scales, such as compositional variables (Talebi et al. 2019), to mention just a few. DS offers simplicity and flexibility because no pre-computation of conditional probabilities is required, and there is no need to store patterns in a database as in other MPS algorithms. The value of the central node in a training image data event will be exported directly to the simulation grid if the distance has met the necessary criteria, typically being under a certain threshold value. We may export the value of more than one pixel to make the simulation process faster (Rezaee et al. 2013). Another advantage of DS is that we can easily combine different information sources to perform co-simulation, either between the same type of variables (e.g. categorical and categorical) or different types of variables (e.g. categorical and continuous, or categorical and compositional), as presented by several authors (Talebi et al. 2019;Yin et al. 2016). This contribution focuses on DS for categorical variables, although some of its conclusions extend to other scales of data.
Assume that Z (x) is a categorical regionalized variable that can have a value on the set c = {c 1 , c 2 , ..., c k }, for instance rock facies. A simulation grid is characterized by nodes filled in by hard data and empty nodes to be simulated. A reference position in the simulation grid or the training image is denoted by x 0 or y 0 , respectively. A set of relative spatial lags of n data within a certain window size is denoted by Equivalently, a data event from the training image will be denoted as either N y 0 = N (y 0 , H ) =: {z(y 0 +h j )|h j ∈ H }, depending on the context. Moreover, let d(M x 0 , N y 0 ) be the distance between the two data events, for example computed using an appropriate distance criterion for categorical variables, such as (Mariethoz and Kelly 2011) The value of z(y 0 ) will be pasted in where d thr is a threshold value in the range of 0 to 1; otherwise the new data event N y 0 will be extracted from a different y 0 location and the distance calculation is repeated. The training image is randomly scanned until a data event satisfying that condition is found or until a portion f of the training image has been scanned. In the latter case, the value of z(y 0 ) associated with the smallest d(M x 0 , N y 0 ) will be used. This is achieved by storing both values each time the distance decreases. DS is a sequential method where every grid is visited one by one, either randomly or systematically (laterally starting from a corner). The same applies to the scanning path in the training image. Each simulated value in the simulation grid becomes the new hard data.

Direct Sampling with Multiple Training Images and Training Image Origin Tracking
A training image depicts prior geological information for a study area. However, as mentioned in the introduction, it is generally difficult to find only one proper concept for a geological problem. Geologists with diverse backgrounds will employ different concepts in solving the same problem. Thus, working with multiple training images that accommodate a collection of geological concepts may be more appropriate. Additionally, the use of a large training image, as recommended by Emery and Lantuéjoul (2014), can also be replaced by the use of many training images. One of the early MPS methods, SNESIM, implements a multiscale method, where multiple training images are used to capture different scales of phenomena (Liu 2006;Strebelle 2002). Straubhaar (2019) developed a DS algorithm with multiple training images that is able to combine patterns from different training images in each realization. However, its use is specifically to produce realizations, and not to quantify the compatibility of training images against data. The method proposed here requires modification of the DS algorithm to cope with multiple training images and to quantify the frequency of each training image in the simulation grid. Denote by z (i) (x) the i-th training image, and by N (i) y 0 = {z (i) (y 0 + h j )|h j ∈ H } = N (y 0 , H, i) a data event extracted from it. One can then devise a path randomly sampling across all training images, provided that it visits all of them equally. One can then accept the first z (i) (y 0 ) data event which satisfies the distance condition, i.e. which distance The training image index i should be stored as well, to be able to track the origin of each simulated value.
Alternatively, one can parallelize the search algorithm, and at the same time visit all training images equally, with the following modification. The distances between data events from each training image N y 0 ) < d thr , one will be picked up at random, and the z (i) (y 0 ) value of the selected N (i) y 0 will be pasted in the z(x 0 ). In this way, it will ensure that the DS algorithm visits all the training images equally, and a data event in the simulation grid will be compared to all the training images with the same frequency.
During the sequential process, the method tracks the origin of every simulated value in the simulation grid. Consequently, after generating several realizations, one can quantify the frequency of each training image at every simulation grid. This helps to observe the tendency of a training image to fill in a certain region of the simulation grid. Another advantage of the method is that at the same time, one produces realizations which might incorporate several of the conceptual models being considered. Here, the assessment of the training images is carried out while obtaining the realizations, not before obtaining them, as previous researchers did (Boisvert et al. 2007;Feng et al. 2017;Pérez et al. 2014;van den Boogaart 2006).

Illustration
In order to show the performance of the proposed method, two sets of experiments are conducted on two different synthetic settings (Fig. 3). In both cases, three geological concepts are considered as alternative training images (TI 1 , TI 2 , and TI 3 displayed in Fig. 1). The training images resemble the channels and the flood plains of the fluvial depositional environment which vary smoothly from one another. The yellow color represent facies 1, reflecting the river channel, which usually presents good porosity and permeability, while blue is facies 2, which reflects the surrounding depositional environment with poor porosity and permeability, such as a flood plain. Training image TI 1 describes braided stream patterns. TI 2 and TI 3 show meandering stream patterns, the latter displaying higher sinuosity and an oxbow lake feature. The lower part of Fig. 1 shows the omnidirectional variograms of each training image, where the sinuosity or the facies ratio of the training image patterns are reflected in range and periodicity or the sill of the variogram. Figure 2 shows the difference in the total fraction of facies 1. TI 1 has the highest global fraction of facies 1, with a value of 45%. TI 2 and TI 3 have global fractions of 25% and 30%, respectively.
These experiments describe situations in which the targets fit with only one or represent a combination of several geological models (Fig. 3). For the first experiment, the target is the TI 3 , but horizontally flipped so that training image and target are analogous and not identical. For the second experiment, the target represents a transition between TI 1 and TI 2 , resembling a fluvial transitional depositional environment consisting of meandering and braided stream patterns (Fig. 3), each occupying roughly 50% of the area. Conditional data are extracted randomly from the targets at 1%, 5%, and 10% (Fig. 3). With the variations in the amount of conditional data, we want to examine how the proposed method works with different data densities. The omnidirectional variograms of the targets and the conditional data can be seen in Fig. 3. In general, all the variograms have a sill value of 0.2 with a range of 8. As the data density decreases, the variograms become de-structured. In terms of the global fraction of facies 1, the targets and conditional data exhibit frequencies ranging from 25% to 35%. These are close to TI 2 and TI 3 (Fig. 2). The conditional data and the training images are the inputs to this method.
In these experiments, both the simulation grid and the three training images have a size of 100 × 100 pixels and the same resolution, to avoid inconsistent object size reproduction. A set of 40 realizations was produced, conditional to the available data, with the following simulation parameters: window size of 10 × 10 pixels, and a maximum portion of f = 20% of the training image to scan before stopping. These two parameters were unchanged in all the experiments. Although tuning these parameters may play a significant role in the final results, in this paper we focus the discussion on the aspect of data density.

Results and Discussion
Figures 4 and 6 show some of the realizations of the first and second experiments. The realizations display an overall proper continuity of facies 1. In the upper row of each figure are the outcomes using 1% conditional data, while in the middle and the lower rows are those using 5% and 10% conditional data, respectively. From these two experiments, we can observe that increasing the number of conditional data results in realizations that increasingly resemble their targets (Fig. 3). For example, in experiment 1, the oxbow lake feature can be captured with data density of 10%, but not when the data density is 1% or 5%. In this experiment, better reproductions of the target variogram are obtained as the data density increases (Fig. 5). And in experiment 2, two channel regimes can be observed in the realizations when the data density is 10%. The two regimes can also be seen in realization 2 when the data density is 5%, but at this data density the influence of TI 3 can create a highly sinuous channel, as shown in realization 1 and its TI tracking map. Figure 7 shows the variogram reproductions of experiment 2.
For every realization, there is a record of which training image was used at each location (Figs. 4 and 6, columns 2 and 4). Out of 40 realizations, the frequency of each training image in every pixel was computed (Figs. 8 and 9). In addition, we calculated the frequency over the simulation grid for each realization as well as for all realizations to produce the box plots and the histograms of selected training images (Fig. 10). The box plots show the variability in the frequency of usage of each training image across realizations.
In the first experiment, with 10% conditional data, the resulting frequency maps show that TI 3 is the most frequently visited, followed by TI 2 and TI 1 (Fig. 8). This is also reflected in the histogram (Fig. 10). As previously mentioned, the target in the first experiment was the flipped TI 3 , so we expect that the interpolated values in the simulation grid will be dominated by the values from TI 3 . The next most frequently used training image is TI 2 , since it is more similar to TI 3 than TI 1 (Fig. 1). This is true for both the fraction of each facies and the typical patterns the TIs exhibit. When the proportion of conditional data is reduced to 5%, we can still see comparably high values for the TI 3 frequency map, but this is no longer the case when it is 1%. Here, the values of TI 2 are taken more often than of TI 3 , suggesting that TI 2 offers more flexible data events.
In the second experiment, with 10% data density, TI 1 and TI 2 show a tendency to fill in a particular region in the simulation grid (Fig. 9). In other words, they are locally better supported by the data, as compared with TI 3 . This result was expected given the configuration of the target (Fig. 3) as a transition between regimes analogous to TI 1 and TI 2 . However, this tendency is not clearly observed in the histogram, where the selections of TI 1 and TI 3 are almost the same, and TI 2 was chosen very predominantly (Fig. 10). In other words, using the histograms on the proposed method to rank the training images does not tell the whole story. A visual inspection of the frequency maps is required to determine whether there is a strong local dominance of a training image. A low frequency of a certain training image in the global histogram does not mean that this training image is not important if it occurs highly concentrated in a specific area of the simulation grid. This local trend was not visible at 5% or 1% data density, Fig. 8 Training image frequency maps of experiment 1 after 40 realizations for each data density as occurred in the first experiment. We may say that the proposed method works when the data density is sufficient to differentiate between different patterns in the training images. Otherwise, when a data event in the simulation domain fits into several training images, the algorithm will select one of the fitting training images randomly. Thus, the updated data events in the neighborhood area will likely support the selected training image to fill in the values around it. To highlight the more suitable training image for a certain region, several possible solutions might be used, for example, smoothing the training image tracking information as a post-processing step. Alternatively, one can Fig. 9 Training image frequency maps of experiment 2 after 40 realizations for each data density implement multiscale distance calculations where the distance comparison between training images (equation 2) starts from a large window size and then continues to a smaller window size. The impact of such choices is left for future research.
Once the data density is sufficient, 10% in these experiments, the frequency maps can help geologists determine which geological concepts are more appropriate for the entire area or for certain parts of the area. This is also assisted by the resulting realizations (e.g. whether the selection of the geological concepts resulted in geologically proper realizations). Although in this contribution we only considered the value of visited TI bookkeeping for the purpose of conceptual model ranking, one can imagine other uses for it. For instance, in this contribution, local ranking maps are simply used as a tool to help the geologist understand the geological setting. But the maps may be used not only for indicating different regions of compatibility, but also for identifying actual transitions between regimes (or non-stationarity). If this is the case, in a second batch of simulations, they could be used as prior probabilities for several TIs, as is done with the DeSee method (Straubhaar 2019). One could also imagine an iterative process, in which the probability map of visiting each training image is updated after every batch. The details or properties (such as convergence) of such an algorithm are left for future research.

Conclusion
We have demonstrated how a multi-point algorithm, in the case of this contribution direct sampling (DS), can be used in the case of multiple, alternative training images, representing competing geological interpretations of the area under study. In this way, many prior geological concepts can be imposed as training images, and the corre-sponding visiting frequency for each training image at each sector of the simulation grid can be calculated. The frequency maps can assist geologists in determining the most locally compatible conceptual models out of a set of available alternatives. However, a sufficient amount of data is required so that the patterns from each training image can be properly distinguished and the correct frequency maps produced.