
CDFSIM: Efficient Stochastic Simulation Through Decomposition of Cumulative Distribution Functions of Transformed Spatial Patterns

Abstract

Categorical and continuous variables are simulated using a new pattern-based simulation method founded upon coding spatial patterns in one dimension. The method first uses a spatial template to extract information, in the form of patterns, from a training image. The patterns are grouped into a pattern database and then mapped to one dimension. Cumulative distribution functions of the one-dimensional patterns are built. Patterns are then classified by decomposing the cumulative distribution functions and calculating class (cluster) prototypes. During the simulation process, a conditioning data event is compared to the class prototypes, and a pattern is randomly drawn from the best-matching class. Several examples are presented to assess the performance of the proposed method, including conditional and unconditional simulations of categorical and continuous data sets. Results show that the proposed method is efficient and performs very well in both two and three dimensions. Comparison of the proposed method to the filtersim algorithm suggests that it is better at reproducing the multi-point configurations and main characteristics of the reference images, while being less sensitive to the number of classes and spatial templates used in the simulations.
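The workflow summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weighted-sum coding, the equal-probability split of the empirical cumulative distribution function, the mean prototypes, the toy binary image, and all function names are hypothetical stand-ins, not the definitions used in the paper.

```python
import random

def extract_patterns(ti, th, tw):
    """Slide a th-by-tw template over the training image (a list of rows)
    and return every fully contained pattern as a flat tuple."""
    rows, cols = len(ti), len(ti[0])
    return [tuple(ti[i + di][j + dj] for di in range(th) for dj in range(tw))
            for i in range(rows - th + 1)
            for j in range(cols - tw + 1)]

def code_1d(pattern, weights):
    """Assumed 1-D coding: a weighted sum of the pattern values."""
    return sum(w * v for w, v in zip(weights, pattern))

def classify_by_cdf(codes, n_classes):
    """Cut the empirical CDF of the codes into n_classes equal-probability
    intervals; a pattern's class is the interval its rank falls in."""
    order = sorted(range(len(codes)), key=lambda i: codes[i])
    labels = [0] * len(codes)
    for rank, i in enumerate(order):
        labels[i] = rank * n_classes // len(codes)
    return labels

def class_prototypes(patterns, labels, n_classes):
    """Prototype of a class = point-wise mean of its member patterns."""
    protos = []
    for c in range(n_classes):
        members = [p for p, lab in zip(patterns, labels) if lab == c]
        protos.append([sum(col) / len(members) for col in zip(*members)])
    return protos

def draw_pattern(data_event, patterns, labels, protos, rng):
    """Pick the class whose prototype is closest to the data event (L1
    distance over informed nodes; None marks an uninformed node), then
    draw one member of that class at random. Returns (class, index)."""
    def dist(proto):
        return sum(abs(p - d) for p, d in zip(proto, data_event) if d is not None)
    best = min(range(len(protos)), key=lambda c: dist(protos[c]))
    members = [i for i, lab in enumerate(labels) if lab == best]
    return best, rng.choice(members)

rng = random.Random(0)
ti = [[rng.randint(0, 1) for _ in range(20)] for _ in range(20)]  # toy binary image
patterns = extract_patterns(ti, 3, 3)
weights = [1.0 / (k + 1) for k in range(9)]  # assumed weights, positive and <= 1
codes = [code_1d(p, weights) for p in patterns]
labels = classify_by_cdf(codes, 4)
protos = class_prototypes(patterns, labels, 4)
event = [1, None, 0, None, 1, None, None, 0, None]  # partially informed data event
best, idx = draw_pattern(event, patterns, labels, protos, rng)
print(len(patterns), best, patterns[idx])
```

Splitting on ranks keeps the classes equally populated, but it may separate patterns that share the same one-dimensional code; a real implementation would need a rule for such ties.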



Acknowledgements

We thank the Associate Editor of Mathematical Geosciences who handled our manuscript and the anonymous reviewers for their detailed comments, which helped improve the manuscript. The work in this paper was funded by the Natural Sciences and Engineering Research Council of Canada (CRDPJ 411270-10 and Discovery Grant 239019) and by the industry members of the COSMO Stochastic Mine Planning Laboratory: AngloGold Ashanti, Barrick Gold, BHP Billiton, De Beers, Newmont Mining and Vale.

Author information

Corresponding author

Correspondence to Hussein Mustapha.

Additional information

H. Mustapha now at Schlumberger, United Kingdom. S. Chatterjee now at National Institute of Technology Rourkela, India.

Appendices

Appendix A: A Dissimilarity Method and Its Relevance

For completeness, it is important to mention that the dissimilarity check applied to find two class centers $V_p$ and $V_q$ reflects the same level of dissimilarity between the patterns in the original space. In other words, for any two patterns $\mathit{ti}_T(\mathbf{u}), \mathit{ti}_T(\mathbf{u}') \in \mathit{patdb}_T$, the following property holds: if there exists $r > 0$ such that $|f(\mathit{ti}_T(\mathbf{u})) - f(\mathit{ti}_T(\mathbf{u}'))| \ge r$, then $\Vert \mathit{ti}_T(\mathbf{u}) - \mathit{ti}_T(\mathbf{u}') \Vert \ge r$, where $\Vert\cdot\Vert$ and $|\cdot|$ denote the $L_1$-norm and the absolute value, respectively. The demonstration is

$$\begin{aligned} \bigl\Vert \mathit{ti}_{{T}}({\mathbf{u}}) - \mathit{ti}_{{T}} \bigl({\mathbf{u}}'\bigr)\bigr\Vert =& \sum _{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}({ \mathbf{u}} + {\mathbf{h}}_{k}) - \mathit{ti}_{{T}}\bigl({ \mathbf{u}}' + {\mathbf{h}}_{k}\bigr)\bigr\vert \\ \ge& \Biggl\vert \sum_{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}({\mathbf{u}} + {\mathbf{h}}_{k}) \bigr\vert M - \sum_{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}\bigl({\mathbf{u}}' + { \mathbf{h}}_{k}\bigr) \bigr\vert M \Biggr\vert \\ \ge& \Biggl\vert \sum_{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}({\mathbf{u}} + {\mathbf{h}}_{k}) \bigr\vert s(k) - \sum_{k = 1}^{n_{T}} \bigl\vert \mathit{ti}_{{T}}\bigl({\mathbf{u}}' + { \mathbf{h}}_{k}\bigr) \bigr\vert s(k) \Biggr\vert \\ \ge& \bigl\vert f\bigl(\mathit{ti}_{{T}}({\mathbf{u}})\bigr) - f \bigl(\mathit{ti}_{{T}}\bigl({\mathbf{u}}'\bigr)\bigr) \bigr\vert \\ \ge& r. \end{aligned}$$
(A.1)

Note that $\mathit{ti}_T(\mathbf{u} + \mathbf{h}_k)$ is assumed to be positive; this assumption can always be satisfied by applying a simple translation to the input images. The steps in Eq. (A.1) can be described as follows:

  • From Line 1 to Line 2: Using the inequality $\sum_i |x_i| \ge |\sum_i x_i|$, one can write $\sum_i |x_i - y_i| \ge |\sum_i x_i - \sum_i y_i|$. Since the image values are assumed positive, one can write $\sum_i |x_i - y_i| \ge |\sum_i |x_i| - \sum_i |y_i||$. Multiplying the right-hand side by a positive real value $M \le 1$ gives

    $$\begin{aligned} \sum_{i} |x_{i} - y_{i}| \ge& M\biggl\vert \sum_{i} |x_{i}| - \sum _{i} |y_{i}| \biggr\vert \\ =& \biggl\vert \sum_{i} |x_{i}| M - \sum_{i} |y_{i}| M \biggr\vert . \end{aligned}$$

    Here, $x_i$ and $y_i$ denote the values at the points of the first and the second pattern, respectively.

  • From Line 2 to Line 3: The function $s$ in Eq. (2) is positive and bounded by $M$: $s(k) \le M$ for every $k$. Then, using $|xa - ya| \ge |xb - yb|$ for any reals $a \ge b \ge 0$, one can write

    $$\biggl\vert \sum_{i} |x_{i}| M - \sum _{i} |y_{i}| M \biggr\vert \ge\biggl\vert \sum_{i} |x_{i}| s(k) - \sum _{i} |y_{i}| s(k) \biggr\vert . $$
  • From Line 3 to Line 4: Straightforward, using the definition of the function $f$.

Equation (A.1) shows that if two patterns are mapped to one dimension using $f$, and the distance between the one-dimensional values is at least $r$, then the distance between the patterns in $n_T$ dimensions is also at least $r$. In other words, two patterns that $f$ maps far apart are guaranteed to be far apart in the original space, so $f$ is a good measure of dissimilarity between patterns in $n_T$ dimensions.
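As a numerical illustration of the property in Eq. (A.1), the following Python sketch draws random positive patterns and checks that the one-dimensional distance never exceeds the L1 distance between the patterns. It assumes, based on Line 3 of Eq. (A.1), that f is a weighted sum of the pattern values; the weights s(k) = M/(k+1), the 9-node template, and the value range are illustrative choices, not those of the paper.

```python
import random

def f(pattern, s):
    """Assumed 1-D code of a pattern: weighted sum of its (positive) values."""
    return sum(v * w for v, w in zip(pattern, s))

def l1(p, q):
    """L1 distance between two patterns of equal size."""
    return sum(abs(a - b) for a, b in zip(p, q))

rng = random.Random(42)
n_T = 9                                  # hypothetical template size
M = 1.0                                  # bound on the weights, M <= 1
s = [M / (k + 1) for k in range(n_T)]    # assumed weights: 0 < s(k) <= M

# Smallest observed value of ||p - q|| - |f(p) - f(q)| over random pairs;
# the property in Eq. (A.1) requires it to be non-negative.
worst_gap = float("inf")
for _ in range(10_000):
    p = [rng.uniform(0.0, 10.0) for _ in range(n_T)]
    q = [rng.uniform(0.0, 10.0) for _ in range(n_T)]
    worst_gap = min(worst_gap, l1(p, q) - abs(f(p, s) - f(q, s)))

print(worst_gap >= 0)
```

The check is one-directional, as is the property itself: two patterns with nearly equal codes may still be far apart in the original space.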

Appendix B: Accuracy with Respect to Pattern Size?

A simple dice-rolling problem is presented here to illustrate the idea that the performance of the method may improve for larger spatial patterns. Consider the following two cases: rolling two dice (Case 1) and rolling three dice (Case 2). Let (a,b) and (a,b,c) denote the possible outcomes for Case 1 and Case 2, respectively, where a, b, and c are the numbers on top of the first, second, and third die. With a, b, and c varying between 1 and 6, all joint possibilities for Case 1 and Case 2 are listed as follows:

Case 1: 36 possibilities

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

Case 2: 216 possibilities

(1,1,1) (1,2,1) (1,3,1) (1,4,1) (1,5,1) (1,6,1) (1,1,4) (1,2,4) (1,3,4) (1,4,4) (1,5,4) (1,6,4)
(2,1,1) (2,2,1) (2,3,1) (2,4,1) (2,5,1) (2,6,1) (2,1,4) (2,2,4) (2,3,4) (2,4,4) (2,5,4) (2,6,4)
(3,1,1) (3,2,1) (3,3,1) (3,4,1) (3,5,1) (3,6,1) (3,1,4) (3,2,4) (3,3,4) (3,4,4) (3,5,4) (3,6,4)
(4,1,1) (4,2,1) (4,3,1) (4,4,1) (4,5,1) (4,6,1) (4,1,4) (4,2,4) (4,3,4) (4,4,4) (4,5,4) (4,6,4)
(5,1,1) (5,2,1) (5,3,1) (5,4,1) (5,5,1) (5,6,1) (5,1,4) (5,2,4) (5,3,4) (5,4,4) (5,5,4) (5,6,4)
(6,1,1) (6,2,1) (6,3,1) (6,4,1) (6,5,1) (6,6,1) (6,1,4) (6,2,4) (6,3,4) (6,4,4) (6,5,4) (6,6,4)
(1,1,2) (1,2,2) (1,3,2) (1,4,2) (1,5,2) (1,6,2) (1,1,5) (1,2,5) (1,3,5) (1,4,5) (1,5,5) (1,6,5)
(2,1,2) (2,2,2) (2,3,2) (2,4,2) (2,5,2) (2,6,2) (2,1,5) (2,2,5) (2,3,5) (2,4,5) (2,5,5) (2,6,5)
(3,1,2) (3,2,2) (3,3,2) (3,4,2) (3,5,2) (3,6,2) (3,1,5) (3,2,5) (3,3,5) (3,4,5) (3,5,5) (3,6,5)
(4,1,2) (4,2,2) (4,3,2) (4,4,2) (4,5,2) (4,6,2) (4,1,5) (4,2,5) (4,3,5) (4,4,5) (4,5,5) (4,6,5)
(5,1,2) (5,2,2) (5,3,2) (5,4,2) (5,5,2) (5,6,2) (5,1,5) (5,2,5) (5,3,5) (5,4,5) (5,5,5) (5,6,5)
(6,1,2) (6,2,2) (6,3,2) (6,4,2) (6,5,2) (6,6,2) (6,1,5) (6,2,5) (6,3,5) (6,4,5) (6,5,5) (6,6,5)
(1,1,3) (1,2,3) (1,3,3) (1,4,3) (1,5,3) (1,6,3) (1,1,6) (1,2,6) (1,3,6) (1,4,6) (1,5,6) (1,6,6)
(2,1,3) (2,2,3) (2,3,3) (2,4,3) (2,5,3) (2,6,3) (2,1,6) (2,2,6) (2,3,6) (2,4,6) (2,5,6) (2,6,6)
(3,1,3) (3,2,3) (3,3,3) (3,4,3) (3,5,3) (3,6,3) (3,1,6) (3,2,6) (3,3,6) (3,4,6) (3,5,6) (3,6,6)
(4,1,3) (4,2,3) (4,3,3) (4,4,3) (4,5,3) (4,6,3) (4,1,6) (4,2,6) (4,3,6) (4,4,6) (4,5,6) (4,6,6)
(5,1,3) (5,2,3) (5,3,3) (5,4,3) (5,5,3) (5,6,3) (5,1,6) (5,2,6) (5,3,6) (5,4,6) (5,5,6) (5,6,6)
(6,1,3) (6,2,3) (6,3,3) (6,4,3) (6,5,3) (6,6,3) (6,1,6) (6,2,6) (6,3,6) (6,4,6) (6,5,6) (6,6,6)

In total, there are 36 possibilities for (a,b) and 216 possibilities for (a,b,c). Consider now the sum of each of the possibilities:

Case 1: Sum of a and b

2 3 4 5 6 7
3 4 5 6 7 8
4 5 6 7 8 9
5 6 7 8 9 10
6 7 8 9 10 11
7 8 9 10 11 12

Case 2: Sum of a, b and c

3 4 5 6 7 8 6 7 8 9 10 11
4 5 6 7 8 9 7 8 9 10 11 12
5 6 7 8 9 10 8 9 10 11 12 13
6 7 8 9 10 11 9 10 11 12 13 14
7 8 9 10 11 12 10 11 12 13 14 15
8 9 10 11 12 13 11 12 13 14 15 16
4 5 6 7 8 9 7 8 9 10 11 12
5 6 7 8 9 10 8 9 10 11 12 13
6 7 8 9 10 11 9 10 11 12 13 14
7 8 9 10 11 12 10 11 12 13 14 15
8 9 10 11 12 13 11 12 13 14 15 16
9 10 11 12 13 14 12 13 14 15 16 17
5 6 7 8 9 10 8 9 10 11 12 13
6 7 8 9 10 11 9 10 11 12 13 14
7 8 9 10 11 12 10 11 12 13 14 15
8 9 10 11 12 13 11 12 13 14 15 16
9 10 11 12 13 14 12 13 14 15 16 17
10 11 12 13 14 15 13 14 15 16 17 18

For example, the outcomes where the sum of the two dice equals 7 form an event. If we call this event S, we have S={(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)}. If the dice are fair and independent, then each possibility (a,b) is equally likely, and P(S)=6/36=1/6. Consider two random variables A and B whose outcomes are the sums of the possibilities in Case 1 and Case 2, respectively: A takes values in {2,3,4,5,6,7,8,9,10,11,12}, and B takes values in {3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}. The probability density and cumulative distribution functions of both A and B are shown in Figs. 30 and 31. Figure 30 shows that the probability of A is greater than the probability of B for 7 of the 11 possible values of A (about 64 %), i.e. {2,3,4,5,6,7,8}. Figure 31 shows that the cdf of A always lies on or above the cdf of B, because A is more likely than B to take small values.

Fig. 30 Probability density functions of A and B

Fig. 31 Cumulative distribution functions of A and B
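The distributions of A and B can be reproduced by direct enumeration. The short Python sketch below uses exact fractions and assumes fair, independent dice as stated above; the helper names sum_pmf and cdf_at are introduced here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_pmf(n_dice):
    """Exact distribution of the sum of n_dice fair six-sided dice."""
    counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=n_dice))
    total = 6 ** n_dice
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

def cdf_at(pmf, x):
    """P(sum <= x) for a distribution returned by sum_pmf."""
    return sum(p for s, p in pmf.items() if s <= x)

pmf_a = sum_pmf(2)   # Case 1: sum of a and b
pmf_b = sum_pmf(3)   # Case 2: sum of a, b and c

# Values of A at which A is strictly more probable than B.
higher = [s for s in pmf_a if pmf_a[s] > pmf_b.get(s, 0)]

print(pmf_a[7])      # 1/6
print(higher)        # [2, 3, 4, 5, 6, 7, 8]
# The cdf of A is never below the cdf of B.
print(all(cdf_at(pmf_a, x) >= cdf_at(pmf_b, x) for x in range(2, 19)))  # True
```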

The above experiment suggests that if the size of the patterns is increased, e.g. from (a,b) to (a,b,c), there is a good probability that the proposed classification method improves its performance based on the function f employed. Note that the function f is not meant to discriminate between possibilities in Case 1 or Case 2 that sum to the same value in A or B, respectively; it focuses on possibilities that sum to very different values.

About this article

Cite this article

Mustapha, H., Chatterjee, S. & Dimitrakopoulos, R. CDFSIM: Efficient Stochastic Simulation Through Decomposition of Cumulative Distribution Functions of Transformed Spatial Patterns. Math Geosci 46, 95–123 (2014). https://doi.org/10.1007/s11004-013-9490-1

Keywords

  • Pattern-based simulation
  • Clustering
  • Patterns coding
  • Conditional simulation
  • Training image