Abstract
Simulation of categorical and continuous variables is performed using a new pattern-based simulation method founded upon coding spatial patterns in one dimension. The method consists of, first, using a spatial template to extract information in the form of patterns from a training image. Patterns are grouped into a pattern database and then mapped to one dimension. Cumulative distribution functions of the one-dimensional patterns are built. Patterns are then classified by decomposing the cumulative distribution functions and calculating class, or cluster, prototypes. During the simulation process, a conditioning data event is compared to the class prototypes, and a pattern is randomly drawn from the best-matching class. Several examples are presented to assess the performance of the proposed method, including conditional and unconditional simulations of categorical and continuous data sets. Results show that the proposed method is efficient and performs well in both two and three dimensions. Comparison of the proposed method to the FILTERSIM algorithm suggests that it better reproduces the multiple-point configurations and main characteristics of the reference images, while being less sensitive to the number of classes and spatial templates used in the simulations.
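As a rough illustration of the workflow summarized above, the following toy Python sketch walks the main steps (extract patterns, code them in one dimension, cut the empirical cdf into classes, draw from the best-matching class). All function names, the weighted-sum coding, and the equal-frequency clustering are our assumptions for illustration, not the authors' implementation.

```python
import random

def extract_patterns(image, size):
    """Scan a square template over a 2D training image and collect
    each pattern as a flat tuple (the pattern database)."""
    n = len(image)
    return [tuple(image[i + di][j + dj] for di in range(size) for dj in range(size))
            for i in range(n - size + 1) for j in range(n - size + 1)]

def code_1d(pattern, weights):
    # map an n-dimensional pattern to a single value via a bounded weighted sum
    return sum(v * w for v, w in zip(pattern, weights))

def classify(patterns, weights, n_classes):
    """Sort patterns by their one-dimensional code (walking the empirical cdf)
    and cut it into contiguous classes; each prototype is the class mean code."""
    ranked = sorted(patterns, key=lambda p: code_1d(p, weights))
    step = max(1, len(ranked) // n_classes)
    classes = [ranked[k:k + step] for k in range(0, len(ranked), step)]
    prototypes = [sum(code_1d(p, weights) for p in c) / len(c) for c in classes]
    return classes, prototypes

def draw_pattern(data_event, classes, prototypes, weights):
    """Compare the conditioning data event to the class prototypes and
    draw a pattern at random from the best-matching class."""
    target = code_1d(data_event, weights)
    best = min(range(len(classes)), key=lambda k: abs(prototypes[k] - target))
    return random.choice(classes[best])

random.seed(1)
image = [[random.randint(0, 1) for _ in range(10)] for _ in range(10)]
weights = [random.uniform(0.0, 1.0) for _ in range(9)]  # one weight per 3x3 node
classes, prototypes = classify(extract_patterns(image, 3), weights, 4)
pattern = draw_pattern((1,) * 9, classes, prototypes, weights)
print(len(classes), len(pattern))
```

In a full simulation the drawn pattern would be pasted at the visited node and the loop repeated over a random path; the sketch only shows one draw.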
References
Allard D, Froidevaux R, Biver P (2006) Conditional simulation of multi-type non-stationary Markov object models respecting specified proportions. Math Geol 38(8):959–986
Arpat GB (2004) Sequential simulation with patterns. PhD thesis, Stanford University
Arpat G, Caers J (2007) Conditional simulation with patterns. Math Geosci 39(2):177–203
Chatterjee S, Dimitrakopoulos R (2012) Multi-scale stochastic simulation with a wavelet-based approach. Comput Geosci 45:177–189
Chatterjee S, Dimitrakopoulos R, Mustapha H (2012) Dimensional reduction of pattern-based simulation using wavelet analysis. Math Geosci 44(3):343–374
Chilès JP, Delfiner P (1999) Geostatistics: modeling spatial uncertainty. Wiley, New York
Comunian A, Renard P, Straubhaar J (2012) 3D multiple-point statistics simulation using 2D training images. Comput Geosci 40:49–65
Daly C (2004) Higher order models using entropy, Markov random fields and sequential simulation. In: Leuangthong O, Deutsch CV (eds) Geostatistics Banff. Kluwer, Dordrecht, pp 215–225
Deutsch CV (2002) Geostatistical reservoir modeling. Oxford University Press, New York
Dimitrakopoulos R, Mustapha H, Gloaguen E (2010) High-order statistics of spatial random fields: exploring spatial cumulants for modeling complex non-Gaussian and non-linear phenomena. Math Geosci 42(1):65–99
Gloaguen E, Dimitrakopoulos R (2009) Two-dimensional conditional simulation based on the wavelet decomposition of training images. Math Geosci 41(7):679–701
Goovaerts P (1998) Geostatistics for natural resources evaluation. Oxford University Press, New York
Guardiano FB, Srivastava RM (1993) Multivariate geostatistics: beyond bivariate moments. In: Soares A (ed) Geostatistics Troia '92. Kluwer, Dordrecht, pp 133–144
Honarkhah M, Caers J (2010) Stochastic simulation of patterns using distance-based pattern modelling. Math Geosci 42:487–517
Huysmans M, Dassargues A (2011) Direct multiple-point geostatistical simulation of edge properties for modeling thin irregularly shaped surfaces. Math Geosci 43(5):521–536. doi:10.1007/s11004-011-9336-7
Journel AG (1997) Deterministic geostatistics: a new visit. In: Baafi E, Schofield N (eds) Geostatistics Wollongong '96. Kluwer, Dordrecht, pp 213–224
Liu Y (2006) Using the Snesim program for multiple-point statistical simulation. Comput Geosci 32(10):1544–1563
Mao S, Journel AG (1999) Generation of a reference petrophysical and seismic 3D data set: the Stanford V reservoir. In: Stanford Center for Reservoir Forecasting annual meeting. Available at: http://ekofisk.stanford.edu/SCRF.html
Mariethoz G, Renard P (2010) Reconstruction of incomplete data sets or images using direct sampling. Math Geosci 42(3):245–268
Mariethoz G, Renard P, Straubhaar J (2010) The direct sampling method to perform multiple-point simulation. Water Resour Res. doi:10.1029/2008WR007621
Mustapha H, Dimitrakopoulos R (2010) High-order stochastic simulation of complex spatially distributed natural phenomena. Math Geosci 42(5):455–473
Mustapha H, Dimitrakopoulos R, Chatterjee S (2011) Geologic heterogeneity representation using high-order spatial cumulants for subsurface flow and transport simulations. Water Resour Res. doi:10.1029/2010WR009515
Ortiz JM, Deutsch CV (2004) Indicator simulation accounting for multiple-point statistics. Math Geol 36(5):545–565
Remy N, Boucher A, Wu J (2008) Applied geostatistics with SGeMS: a user's guide. Cambridge University Press, Cambridge
Sarma P, Durlofsky L, Aziz K (2008) Kernel principal component analysis for efficient, differentiable parameterization of multipoint geostatistics. Math Geosci 40(1):3–32
Scheidt C, Caers J (2009) Representing spatial uncertainty using distances and kernels. Math Geosci 41:397–419
Straubhaar J, Renard P, Mariethoz G, Froidevaux R, Besson O (2011) An improved parallel multiple-point algorithm using a list approach. Math Geosci 43(3):305–328
Strebelle S (2000) Sequential simulation drawing structures from training images. PhD thesis, Stanford University
Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34(1):1–21
Tjelmeland H (1998) Markov random fields with higher order interactions. Scand J Stat 25:415–433
Tjelmeland H, Eidsvik J (2004) Directional Metropolis-Hastings updates for conditionals with nonlinear likelihoods. In: Geostatistics Banff 2004, vol 1. Springer, Berlin, pp 95–104
Wu J, Zhang T, Journel A (2008) Fast FILTERSIM simulation with score-based distance. Math Geosci 40(7):773–788
Zhang T, Switzer P, Journel A (2006) Filter-based classification of training image patterns for spatial simulation. Math Geol 38(1):63–80
Zhang T, Pedersen SI, Knudby C, McCormick D (2012) Memory-efficient categorical multi-point statistics algorithms based on compact search trees. Math Geosci 44(7):863–879
Acknowledgements
We thank the Associate Editor of Mathematical Geosciences handling our manuscript and the anonymous reviewers for their detailed comments, which have helped improve the manuscript. The work in this paper was funded by Natural Sciences and Engineering Research Council of Canada grants CRDPJ 411270-10 and Discovery Grant 239019, and the industry members of the COSMO Stochastic Mine Planning Laboratory: AngloGold Ashanti, Barrick Gold, BHP Billiton, De Beers, Newmont Mining and Vale.
Additional information
H. Mustapha now at Schlumberger, United Kingdom. S. Chatterjee now at National Institute of Technology Rourkela, India.
Appendices
Appendix A: A Dissimilarity Method and Its Relevance
For completeness, it is important to mention that the dissimilarity measure applied to find two class centers V_{p} and V_{q} reflects the same level of dissimilarity between the patterns in the original space. In other words, for any two patterns ti_{T}(u), ti_{T}(u′) ∈ patdb_{T}, the following property holds: if ∃r>0 such that |f(ti_{T}(u)) − f(ti_{T}(u′))| ≥ r, then ∥ti_{T}(u) − ti_{T}(u′)∥ ≥ r, where ∥·∥ and |·| denote the L_{1}-norm and the absolute value, respectively. The demonstration is
$$\begin{aligned} \bigl\Vert ti_{T}(u) - ti_{T}\bigl(u'\bigr) \bigr\Vert =& \sum_{k} \bigl\vert ti_{T}(u+h_{k}) - ti_{T}\bigl(u'+h_{k}\bigr) \bigr\vert \\ \ge& \biggl\vert \sum_{k} ti_{T}(u+h_{k})M - \sum_{k} ti_{T}\bigl(u'+h_{k}\bigr)M \biggr\vert \\ \ge& \biggl\vert \sum_{k} ti_{T}(u+h_{k})s(k) - \sum_{k} ti_{T}\bigl(u'+h_{k}\bigr)s(k) \biggr\vert \\ =& \bigl\vert f\bigl(ti_{T}(u)\bigr) - f\bigl(ti_{T}\bigl(u'\bigr)\bigr) \bigr\vert \ge r. \end{aligned} \tag{A.1}$$
Note that ti_{T}(u+h_{k}) is assumed to be positive; this assumption can always be satisfied by applying a simple translation to the input images. The different steps in Eq. (A.1) can be described as follows:

From Line 1 to Line 2: Using the triangle inequality ∑_{i}|x_{i}| ≥ |∑_{i}x_{i}| applied to the differences x_{i} − y_{i}, one can write ∑_{i}|x_{i} − y_{i}| ≥ |∑_{i}x_{i} − ∑_{i}y_{i}|. Multiplying the right-hand side by a real positive value M ≤ 1, it follows that
$$\begin{aligned} \sum_{i} \vert x_{i} - y_{i} \vert \ge& M\biggl\vert \sum_{i} x_{i} - \sum_{i} y_{i} \biggr\vert \\ =& \biggl\vert \sum_{i} x_{i} M - \sum_{i} y_{i} M \biggr\vert . \end{aligned}$$Here, x_{i} and y_{i} denote the values inside the two patterns, respectively.

From Line 2 to Line 3: The function s in Eq. (2) is positive and bounded by M: s(k) ≤ M for every k. Then, using |xa − ya| ≥ |xb − yb| for any non-negative reals x, y and a ≥ b ≥ 0, one can write
$$\biggl\vert \sum_{i} x_{i} M - \sum_{i} y_{i} M \biggr\vert \ge \biggl\vert \sum_{i} x_{i} s(i) - \sum_{i} y_{i} s(i) \biggr\vert . $$ 
From Line 3 to Line 4: Straightforward, using the definition of the function f.
In Eq. (A.1), we show that if we map two patterns into one dimension using f, and the distance between the one-dimensional values is greater than r, then the distance between the patterns in n dimensions is also greater than r. In other words, f is a good measure of dissimilarity between patterns in n dimensions.
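As a numerical sanity check (our sketch, not from the paper), taking f per its definition as a weighted sum with positive weights bounded by 1, one can verify on random non-negative patterns that the one-dimensional distance never exceeds the L1 distance, which is exactly the property above:

```python
import random

def f(x, s):
    # one-dimensional pattern coding: weighted sum of the pattern values
    return sum(xi * si for xi, si in zip(x, s))

def l1(x, y):
    # L1-norm distance between two patterns in the original n-dimensional space
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

random.seed(0)
n = 25                                            # e.g. a 5x5 template
s = [random.uniform(0.0, 1.0) for _ in range(n)]  # weights with s(k) <= M = 1

for _ in range(10_000):
    x = [random.uniform(0.0, 10.0) for _ in range(n)]  # non-negative values
    y = [random.uniform(0.0, 10.0) for _ in range(n)]
    # since ||x - y||_1 >= |f(x) - f(y)|, |f(x) - f(y)| >= r forces ||x - y||_1 >= r
    assert l1(x, y) >= abs(f(x, s) - f(y, s)) - 1e-9
print("the 1D distance is a lower bound on the L1 distance for all sampled pairs")
```

The check passes because |∑(x_i − y_i)s(i)| ≤ ∑|x_i − y_i|s(i) ≤ ∑|x_i − y_i| whenever every s(i) ≤ 1.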
Appendix B: Accuracy with Respect to Pattern Size?
A simple dice-rolling problem is presented here to illustrate the idea that the method's performance may improve for larger spatial pattern sizes. Consider the following two cases: rolling two dice (Case 1) and three dice (Case 2), and denote their possible outcomes by (a,b) and (a,b,c), respectively, where a, b, and c are the numbers on top of the first, second, and third die. With a, b, and c each varying between 1 and 6, the lists of all joint possibilities for Case 1 and Case 2 are given as follows:
Case 1: 36 possibilities
(1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6) 
(2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6) 
(3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6) 
(4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6) 
(5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6) 
(6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6) 
Case 2: 216 possibilities
(1,1,1)  (1,2,1)  (1,3,1)  (1,4,1)  (1,5,1)  (1,6,1)  (1,1,4)  (1,2,4)  (1,3,4)  (1,4,4)  (1,5,4)  (1,6,4) 
(2,1,1)  (2,2,1)  (2,3,1)  (2,4,1)  (2,5,1)  (2,6,1)  (2,1,4)  (2,2,4)  (2,3,4)  (2,4,4)  (2,5,4)  (2,6,4) 
(3,1,1)  (3,2,1)  (3,3,1)  (3,4,1)  (3,5,1)  (3,6,1)  (3,1,4)  (3,2,4)  (3,3,4)  (3,4,4)  (3,5,4)  (3,6,4) 
(4,1,1)  (4,2,1)  (4,3,1)  (4,4,1)  (4,5,1)  (4,6,1)  (4,1,4)  (4,2,4)  (4,3,4)  (4,4,4)  (4,5,4)  (4,6,4) 
(5,1,1)  (5,2,1)  (5,3,1)  (5,4,1)  (5,5,1)  (5,6,1)  (5,1,4)  (5,2,4)  (5,3,4)  (5,4,4)  (5,5,4)  (5,6,4) 
(6,1,1)  (6,2,1)  (6,3,1)  (6,4,1)  (6,5,1)  (6,6,1)  (6,1,4)  (6,2,4)  (6,3,4)  (6,4,4)  (6,5,4)  (6,6,4) 
(1,1,2)  (1,2,2)  (1,3,2)  (1,4,2)  (1,5,2)  (1,6,2)  (1,1,5)  (1,2,5)  (1,3,5)  (1,4,5)  (1,5,5)  (1,6,5) 
(2,1,2)  (2,2,2)  (2,3,2)  (2,4,2)  (2,5,2)  (2,6,2)  (2,1,5)  (2,2,5)  (2,3,5)  (2,4,5)  (2,5,5)  (2,6,5) 
(3,1,2)  (3,2,2)  (3,3,2)  (3,4,2)  (3,5,2)  (3,6,2)  (3,1,5)  (3,2,5)  (3,3,5)  (3,4,5)  (3,5,5)  (3,6,5) 
(4,1,2)  (4,2,2)  (4,3,2)  (4,4,2)  (4,5,2)  (4,6,2)  (4,1,5)  (4,2,5)  (4,3,5)  (4,4,5)  (4,5,5)  (4,6,5) 
(5,1,2)  (5,2,2)  (5,3,2)  (5,4,2)  (5,5,2)  (5,6,2)  (5,1,5)  (5,2,5)  (5,3,5)  (5,4,5)  (5,5,5)  (5,6,5) 
(6,1,2)  (6,2,2)  (6,3,2)  (6,4,2)  (6,5,2)  (6,6,2)  (6,1,5)  (6,2,5)  (6,3,5)  (6,4,5)  (6,5,5)  (6,6,5) 
(1,1,3)  (1,2,3)  (1,3,3)  (1,4,3)  (1,5,3)  (1,6,3)  (1,1,6)  (1,2,6)  (1,3,6)  (1,4,6)  (1,5,6)  (1,6,6) 
(2,1,3)  (2,2,3)  (2,3,3)  (2,4,3)  (2,5,3)  (2,6,3)  (2,1,6)  (2,2,6)  (2,3,6)  (2,4,6)  (2,5,6)  (2,6,6) 
(3,1,3)  (3,2,3)  (3,3,3)  (3,4,3)  (3,5,3)  (3,6,3)  (3,1,6)  (3,2,6)  (3,3,6)  (3,4,6)  (3,5,6)  (3,6,6) 
(4,1,3)  (4,2,3)  (4,3,3)  (4,4,3)  (4,5,3)  (4,6,3)  (4,1,6)  (4,2,6)  (4,3,6)  (4,4,6)  (4,5,6)  (4,6,6) 
(5,1,3)  (5,2,3)  (5,3,3)  (5,4,3)  (5,5,3)  (5,6,3)  (5,1,6)  (5,2,6)  (5,3,6)  (5,4,6)  (5,5,6)  (5,6,6) 
(6,1,3)  (6,2,3)  (6,3,3)  (6,4,3)  (6,5,3)  (6,6,3)  (6,1,6)  (6,2,6)  (6,3,6)  (6,4,6)  (6,5,6)  (6,6,6) 
In total, there are 36 possibilities for (a,b) and 216 possibilities for (a,b,c). Consider now the sum of each of the possibilities:
Case 1: Sum of a and b
2  3  4  5  6  7 
3  4  5  6  7  8 
4  5  6  7  8  9 
5  6  7  8  9  10 
6  7  8  9  10  11 
7  8  9  10  11  12 
Case 2: Sum of a, b and c
3  4  5  6  7  8  6  7  8  9  10  11 
4  5  6  7  8  9  7  8  9  10  11  12 
5  6  7  8  9  10  8  9  10  11  12  13 
6  7  8  9  10  11  9  10  11  12  13  14 
7  8  9  10  11  12  10  11  12  13  14  15 
8  9  10  11  12  13  11  12  13  14  15  16 
4  5  6  7  8  9  7  8  9  10  11  12 
5  6  7  8  9  10  8  9  10  11  12  13 
6  7  8  9  10  11  9  10  11  12  13  14 
7  8  9  10  11  12  10  11  12  13  14  15 
8  9  10  11  12  13  11  12  13  14  15  16 
9  10  11  12  13  14  12  13  14  15  16  17 
5  6  7  8  9  10  8  9  10  11  12  13 
6  7  8  9  10  11  9  10  11  12  13  14 
7  8  9  10  11  12  10  11  12  13  14  15 
8  9  10  11  12  13  11  12  13  14  15  16 
9  10  11  12  13  14  12  13  14  15  16  17 
10  11  12  13  14  15  13  14  15  16  17  18 
For example, the outcomes where the sum of the two dice equals 7 form an event. If we call this event S, we have S={(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)}. Assuming the dice are fair and independent, each possibility (a,b) is equally likely, and P(S)=6/36=1/6. Consider two random variables A and B whose outcomes are the sums of all possibilities in Case 1 and Case 2, respectively: A={2,3,4,5,6,7,8,9,10,11,12}, and B={3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}. The probability density and cumulative distribution functions of both A and B are shown in Figs. 30 and 31. Figure 30 shows that the probability of A exceeds the probability of B for at least 70 % of the possible values of A, i.e. {2,3,4,5,6,7,8}. Figure 31 shows that the cdf of A always lies above the cdf of B, since A is stochastically smaller than B.
The above experiment shows that if we increase the size of the patterns, i.e. from (a,b) to (a,b,c), there is a good probability that the proposed classification method improves its performance based on the function f employed. Note that f is not used to detect when two possibilities in Case 1 or Case 2 sum to the same value in A or B, respectively; rather, f focuses on when the possibilities sum to very different values.
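The counts and distributions tabulated above can be reproduced with a short enumeration (a sketch of ours, not part of the original paper):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def sum_pmf(n_dice):
    """Probability mass function of the sum of n fair six-sided dice."""
    outcomes = list(product(range(1, 7), repeat=n_dice))
    counts = Counter(sum(o) for o in outcomes)
    return {s: Fraction(c, len(outcomes)) for s, c in counts.items()}

def cdf(pmf, s):
    """Cumulative distribution function P(sum <= s)."""
    return sum(p for v, p in pmf.items() if v <= s)

pmf_A = sum_pmf(2)  # Case 1: 36 equally likely pairs (a, b)
pmf_B = sum_pmf(3)  # Case 2: 216 equally likely triples (a, b, c)

# the event S = "sum of two dice equals 7" has probability 6/36 = 1/6
assert pmf_A[7] == Fraction(1, 6)

# the cdf of A lies on or above the cdf of B at every value
for s in range(2, 19):
    assert cdf(pmf_A, s) >= cdf(pmf_B, s)
print("cdf of the two-dice sum dominates the cdf of the three-dice sum")
```

The domination holds because B equals A plus an extra die that is always at least 1, so P(B ≤ s) ≤ P(A ≤ s) for every s.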
Cite this article
Mustapha, H., Chatterjee, S. & Dimitrakopoulos, R. CDFSIM: Efficient Stochastic Simulation Through Decomposition of Cumulative Distribution Functions of Transformed Spatial Patterns. Math Geosci 46, 95–123 (2014). https://doi.org/10.1007/s11004-013-9490-1
Keywords
 Pattern-based simulation
 Clustering
 Patterns coding
 Conditional simulation
 Training image