
CDFSIM: Efficient Stochastic Simulation Through Decomposition of Cumulative Distribution Functions of Transformed Spatial Patterns


Abstract

Simulation of categorical and continuous variables is performed using a new pattern-based simulation method founded on coding spatial patterns in one dimension. The method first uses a spatial template to extract patterns from a training image. The patterns are grouped into a pattern database and then mapped to one dimension, and cumulative distribution functions of the one-dimensional pattern values are built. The patterns are classified by decomposing these cumulative distribution functions and computing a prototype for each class or cluster. During simulation, a conditioning data event is compared to the class prototypes, and a pattern is randomly drawn from the best-matching class. Several examples are presented to assess the performance of the proposed method, including conditional and unconditional simulations of categorical and continuous data sets. The results show that the method is efficient and performs very well in both two and three dimensions. Comparison with the filtersim algorithm suggests that the proposed method better reproduces the multi-point configurations and main characteristics of the reference images, while being less sensitive to the number of classes and the spatial templates used in the simulations.
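To make the workflow concrete, the following is a minimal, hypothetical sketch of the steps described above: template-based pattern extraction, mapping to one dimension, a CDF-based classification, prototype computation, and pattern drawing. All names (extract_patterns, map_to_1d, classify_by_cdf, the template size, the weights, and the quantile-based class split) are illustrative assumptions; in particular, the simple quantile split stands in for the paper's actual CDF-decomposition and matching rules, which are defined in the full text.

```python
import numpy as np

def extract_patterns(training_image, template):
    """Scan the training image with a rectangular spatial template and collect patterns."""
    rows, cols = training_image.shape
    th, tw = template
    return np.array([training_image[i:i + th, j:j + tw].ravel()
                     for i in range(rows - th + 1)
                     for j in range(cols - tw + 1)])

def map_to_1d(patterns, weights):
    """Map each pattern to a single value via a positive weighted sum."""
    return patterns @ weights

def classify_by_cdf(values, n_classes):
    """Split the empirical CDF of the one-dimensional values into equal-probability classes."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_classes + 1))
    return np.clip(np.searchsorted(edges, values, side="right") - 1, 0, n_classes - 1)

# --- illustrative usage on a toy binary training image ---
rng = np.random.default_rng(0)
ti = (rng.random((60, 60)) > 0.6).astype(float)            # toy training image
template = (5, 5)
weights = np.linspace(0.1, 1.0, template[0] * template[1])  # positive, bounded weights
n_classes = 8

patdb = extract_patterns(ti, template)                      # pattern database
f_vals = map_to_1d(patdb, weights)                          # one-dimensional pattern values
labels = classify_by_cdf(f_vals, n_classes)                 # CDF-based classes
prototypes = {c: patdb[labels == c].mean(axis=0)            # class prototypes
              for c in range(n_classes) if np.any(labels == c)}

# Simulation step: compare a conditioning data event to the class prototypes
# and draw a pattern at random from the best-matching class.
data_event = patdb[rng.integers(len(patdb))]                # stand-in for a data event
best = min(prototypes, key=lambda c: np.abs(prototypes[c] - data_event).sum())
drawn = patdb[rng.choice(np.where(labels == best)[0])]
```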




Acknowledgements

We thank the Associate Editor of Mathematical Geosciences handling our manuscript and the anonymous reviewers for their detailed comments, which helped improve the manuscript. The work in this paper was funded by the Natural Sciences and Engineering Research Council of Canada (CRDPJ 411270-10 and Discovery Grant 239019) and the industry members of the COSMO Stochastic Mine Planning Laboratory: AngloGold Ashanti, Barrick Gold, BHP Billiton, De Beers, Newmont Mining and Vale.

Author information

Correspondence to Hussein Mustapha.

Additional information

H. Mustapha is now at Schlumberger, United Kingdom. S. Chatterjee is now at the National Institute of Technology Rourkela, India.

Appendices

Appendix A: A Dissimilarity Method and Its Relevance

For completeness, it is important to note that the dissimilarity measure used to find two class centers $V_p$ and $V_q$ reflects the same level of dissimilarity between the patterns in the original space. In other words, for any two patterns $\mathit{ti}_T(\mathbf{u}), \mathit{ti}_T(\mathbf{u}') \in \mathit{patdb}_T$, the following property holds: if there exists $r>0$ such that $|f(\mathit{ti}_T(\mathbf{u})) - f(\mathit{ti}_T(\mathbf{u}'))| \ge r$, then $\Vert \mathit{ti}_T(\mathbf{u}) - \mathit{ti}_T(\mathbf{u}')\Vert \ge r$, where $\Vert\cdot\Vert$ and $|\cdot|$ denote the $L_1$-norm and the absolute value, respectively. The demonstration is

$$\begin{aligned} \bigl\Vert \mathit{ti}_{{T}}({\mathbf{u}}) - \mathit{ti}_{{T}} \bigl({\mathbf{u}}'\bigr)\bigr\Vert =& \sum _{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}({ \mathbf{u}} + {\mathbf{h}}_{k}) - \mathit{ti}_{{T}}\bigl({ \mathbf{u}}' + {\mathbf{h}}_{k}\bigr)\bigr\vert \\ \ge& \Biggl\vert \sum_{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}({\mathbf{u}} + {\mathbf{h}}_{k}) \bigr\vert M - \sum_{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}\bigl({\mathbf{u}}' + { \mathbf{h}}_{k}\bigr) \bigr\vert M \Biggr\vert \\ \ge& \Biggl\vert \sum_{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}({\mathbf{u}} + {\mathbf{h}}_{k}) \bigr\vert s(k) - \sum_{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}\bigl({\mathbf{u}}' + { \mathbf{h}}_{k}\bigr) \bigr\vert s(k) \Biggr\vert \\ \ge& \bigl\vert f\bigl(\mathit{ti}_{{T}}({\mathbf{u}})\bigr) - f \bigl(\mathit{ti}_{{T}}\bigl({\mathbf{u}}'\bigr)\bigr) \bigr\vert \\ \ge& r. \end{aligned}$$
(A.1)

Note that $\mathit{ti}_T(\mathbf{u}+\mathbf{h}_k)$ is assumed to be positive; this assumption can always be satisfied by applying a simple translation to the input images. The different steps in Eq. (A.1) can be described as follows:

  • From Line 1 to Line 2: Using the inequality $\sum_i |x_i| \ge |\sum_i x_i|$, one can write $\sum_i |x_i - y_i| \ge |\sum_i x_i - \sum_i y_i|$. Since the image values are assumed positive, this gives $\sum_i |x_i - y_i| \ge |\sum_i |x_i| - \sum_i |y_i||$. Multiplying the right-hand side by a real value $0 < M \le 1$ yields

    $$\begin{aligned} \sum_{i} |x_{i} - y_{i}| \ge& M\biggl\vert \sum_{i} |x_{i}| - \sum _{i} |y_{i}| \biggr\vert \\ =& \biggl\vert \sum_{i} |x_{i}| M - \sum_{i} |y_{i}| M \biggr\vert . \end{aligned}$$

    Here, $x_i$ and $y_i$ denote the values inside the first and second pattern, respectively.

  • From Line 2 to Line 3: The function $s$ in Eq. (2) is positive and bounded by $M$: $s(k) \le M$ for every $k$. Then, using $|xa - ya| \ge |xb - yb|$ for any reals $a \ge b \ge 0$, one can write

    $$\biggl\vert \sum_{i} |x_{i}| M - \sum_{i} |y_{i}| M \biggr\vert \ge \biggl\vert \sum_{i} |x_{i}| s(i) - \sum_{i} |y_{i}| s(i) \biggr\vert . $$
  • From Line 3 to Line 4: Straightforward, using the definition of the function $f$.

Equation (A.1) shows that if two patterns are mapped to one dimension using $f$ and the distance between the one-dimensional values is at least $r$, then the distance between the patterns in the original $n$-dimensional space is also at least $r$. In other words, $f$ is a good measure of dissimilarity between patterns in $n$ dimensions.
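As a numerical illustration of this property, the short script below (an assumed sketch, not the authors' code) generates random positive patterns, maps each to one dimension with a weighted sum $f(x)=\sum_k s(k)\,x_k$, which is the form implied by the proof with $0 < s(k) \le M \le 1$, and checks that the $L_1$ distance between any two patterns is never smaller than the distance between their one-dimensional images. The pattern size and weight values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

n_T = 25                                            # assumed template size (points per pattern)
s = rng.uniform(0.05, 1.0, size=n_T)                # positive weights s(k), bounded by M = 1
patterns = rng.uniform(0.0, 5.0, size=(200, n_T))   # positive-valued patterns

def f(x):
    # One-dimensional mapping implied by the proof: a positive weighted sum.
    return float(np.dot(s, np.abs(x)))

f_vals = np.array([f(p) for p in patterns])

# Check the property: the L1 distance between two patterns is never smaller
# than the distance between their one-dimensional images.
ok = True
for i in range(len(patterns)):
    for j in range(i + 1, len(patterns)):
        l1 = np.abs(patterns[i] - patterns[j]).sum()
        ok &= l1 >= abs(f_vals[i] - f_vals[j]) - 1e-12
print("property holds for all pairs:", ok)
```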

Appendix B: Accuracy with Respect to Pattern Size?

A simple dice-rolling problem is presented here to illustrate the idea that the performance of the method may improve for larger spatial pattern sizes. Consider two cases: rolling two dice (Case 1) and rolling three dice (Case 2), where (a,b) and (a,b,c) denote the possible outcomes of Case 1 and Case 2, respectively, and a, b, and c are the numbers showing on top of the first, second, and third die. With a, b, and c each varying between 1 and 6, the lists of all joint possibilities for Case 1 and Case 2 are as follows:

Case 1: 36 possibilities

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

Case 2: 216 possibilities

All 216 ordered triples (a,b,c) with a, b, c ∈ {1,…,6}, from (1,1,1) through (6,6,6).

In total, there are 36 possibilities for (a,b) and 216 possibilities for (a,b,c). Consider now the sum of each of the possibilities:

Case 1: Sum of a and b

2  3  4  5  6  7
3  4  5  6  7  8
4  5  6  7  8  9
5  6  7  8  9 10
6  7  8  9 10 11
7  8  9 10 11 12

Case 2: Sum of a, b and c

The 216 sums a+b+c, one for each triple above, ranging from 3 (for (1,1,1)) to 18 (for (6,6,6)).

For example, the outcomes where the sum of the two dice equals 7 form an event. Calling this event S, we have S={(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)}. If the dice are fair and independent, each possibility (a,b) is equally likely, and P(S)=6/36=1/6. Consider two random variables A and B whose outcomes are the sums of all possibilities in Case 1 and Case 2, respectively: A={2,3,4,5,6,7,8,9,10,11,12}, and B={3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}. The probability density and cumulative distribution functions of A and B are shown in Figs. 30 and 31. Figure 30 shows that the probability of A exceeds the probability of B for the majority (about 64 %) of the possible values of A, namely {2,3,4,5,6,7,8}. Figure 31 shows that the cdf of A always lies above the cdf of B, because the smaller sums are considerably more probable under A.

Fig. 30: Probability density functions of A and B

Fig. 31: Cumulative distribution functions of A and B
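The distributions plotted in Figs. 30 and 31 can be recomputed by direct enumeration of the dice outcomes. The script below is an illustrative sketch (not part of the original paper) that builds the probability mass functions and cumulative distribution functions of A and B and confirms that the cdf of A lies above the cdf of B at every common value of the sum.

```python
from itertools import product
from collections import Counter

def sum_distribution(n_dice):
    """Probability mass function of the sum of n fair six-sided dice."""
    outcomes = list(product(range(1, 7), repeat=n_dice))
    counts = Counter(sum(o) for o in outcomes)
    total = len(outcomes)
    return {s: counts[s] / total for s in sorted(counts)}

def cdf(pmf):
    """Cumulative distribution function obtained by accumulating the pmf."""
    acc, out = 0.0, {}
    for s in sorted(pmf):
        acc += pmf[s]
        out[s] = acc
    return out

pmf_A = sum_distribution(2)   # Case 1: sum of two dice (random variable A)
pmf_B = sum_distribution(3)   # Case 2: sum of three dice (random variable B)
cdf_A, cdf_B = cdf(pmf_A), cdf(pmf_B)

# The cdf of A lies above the cdf of B at every value common to both supports.
print(all(cdf_A[s] >= cdf_B[s] for s in pmf_A if s in pmf_B))
```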

The above experiment shows that increasing the size of the patterns, i.e. moving from (a,b) to (a,b,c), is likely to improve the performance of the proposed classification method based on the function f. Note that f is not intended to distinguish possibilities in Case 1 or Case 2 that sum to the same value of A or B, respectively; it focuses on possibilities whose sums are very different.


About this article

Cite this article

Mustapha, H., Chatterjee, S. & Dimitrakopoulos, R. CDFSIM: Efficient Stochastic Simulation Through Decomposition of Cumulative Distribution Functions of Transformed Spatial Patterns. Math Geosci 46, 95–123 (2014). https://doi.org/10.1007/s11004-013-9490-1
