
Automatic cluster selection using gap statistics for pattern-based multi-point geostatistical simulation


An automatic cluster-number selection algorithm is proposed for multi-point geostatistical simulation. The multi-point simulation is performed by extracting patterns from a training image. The computational time of the pattern-based simulation is significantly reduced by reducing the dimension of the patterns with principal component analysis (PCA); standard (linear) PCA is used for its simplicity and low computational cost. The patterns are classified by applying the k-means clustering algorithm to their principal components (PCs), and the number of clusters is selected automatically by calculating the gap statistic. For each class, a conditional cumulative distribution function (ccdf) is generated from the frequencies of the central node value of the template. During sequential simulation, the similarity between the conditioning data and the class prototypes is measured with the L2-norm, and the ccdf of the best-matched class is used to draw a pattern from that class. The algorithm is validated on examples of conditional and unconditional simulation. The results show that spatial continuity, in particular curvilinear structure, is well reproduced in all examples, as are the first- and second-order statistics. A comparative study with the wavesim and filtersim techniques shows that the proposed algorithm performs better than filtersim and comparably to wavesim, while its computational time is similar to that of filtersim and lower than that of wavesim. The sensitivity of the algorithm to the number of PCs and the number of clusters has also been tested, and the results show that automatic cluster selection improves the performance of the proposed method.
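
The pattern-extraction and PCA steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a 2-D training image stored as a NumPy array and a square t x t template; the template size, the helper names `extract_patterns` and `pca_scores`, and the random stand-in image are all hypothetical. Ordinary PCA is computed through an SVD of the centred pattern matrix.

```python
import numpy as np

def extract_patterns(ti, t):
    """Collect every t x t pattern from a 2-D training image (hypothetical helper).

    Returns an array of shape (n_patterns, t*t), one flattened pattern per row.
    """
    ny, nx = ti.shape
    patterns = [
        ti[i:i + t, j:j + t].ravel()
        for i in range(ny - t + 1)
        for j in range(nx - t + 1)
    ]
    return np.asarray(patterns, dtype=float)

def pca_scores(patterns, n_pcs):
    """Project centred patterns onto their leading n_pcs principal components."""
    mean = patterns.mean(axis=0)
    centred = patterns - mean
    # SVD of the centred data: rows of vt are the principal directions,
    # ordered by decreasing singular value (i.e. decreasing explained variance).
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_pcs]
    scores = centred @ components.T
    explained = (s[:n_pcs] ** 2) / np.sum(s ** 2)
    return scores, components, mean, explained

# Usage with a random binary stand-in for a training image (assumption).
rng = np.random.default_rng(0)
ti = (rng.random((50, 50)) > 0.7).astype(float)
patterns = extract_patterns(ti, t=5)           # 46 * 46 = 2116 patterns of 25 nodes
scores, comps, mean, explained = pca_scores(patterns, n_pcs=6)
print(patterns.shape, scores.shape)            # (2116, 25) (2116, 6)
```

Clustering then operates on the 6-column `scores` matrix instead of the 25-node patterns, which is where the saving in computational time comes from.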

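The k-means classification and the gap-statistic choice of the number of clusters could be sketched as below. Several choices here are assumptions, not taken from the abstract: plain Lloyd iterations with k-means++ seeding stand in for whatever k-means variant the paper uses; reference datasets are drawn uniformly over the bounding box of the data; and the number of clusters is the smallest k with Gap(k) >= Gap(k+1) - s(k+1), the rule proposed by Tibshirani, Walther and Hastie (2001). The function names and the parameters `k_max`, `n_refs` and `n_init` are hypothetical.

```python
import numpy as np

def kmeans_pp_init(x, k, rng):
    """k-means++ seeding (swapped-in choice): spread the initial centroids out."""
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.array(centroids)[None]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        if total == 0:  # every point already coincides with a centroid
            centroids.append(x[rng.integers(len(x))])
        else:
            centroids.append(x[rng.choice(len(x), p=d2 / total)])
    return np.array(centroids, dtype=float)

def kmeans(x, k, rng, n_iter=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = kmeans_pp_init(x, k, rng)
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    labels = ((x[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
    return centroids, labels

def log_dispersion(x, k, rng, n_init=5):
    """log W_k: pooled within-cluster sum of squares, best of n_init runs."""
    best = np.inf
    for _ in range(n_init):
        c, lab = kmeans(x, k, rng)
        best = min(best, ((x - c[lab]) ** 2).sum())
    return np.log(best)

def gap_statistic(x, k_max=8, n_refs=10, seed=0):
    """Return (chosen k, Gap(1..k_max), s(1..k_max))."""
    rng = np.random.default_rng(seed)
    lo, hi = x.min(axis=0), x.max(axis=0)  # ASSUMPTION: uniform reference over bounding box
    gaps, s = np.zeros(k_max), np.zeros(k_max)
    for k in range(1, k_max + 1):
        ref = np.array([log_dispersion(rng.uniform(lo, hi, size=x.shape), k, rng)
                        for _ in range(n_refs)])
        gaps[k - 1] = ref.mean() - log_dispersion(x, k, rng)
        s[k - 1] = ref.std() * np.sqrt(1 + 1 / n_refs)
    # Smallest k such that Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps, s
    return k_max, gaps, s

# Usage on three well-separated synthetic blobs (illustration only).
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
x = np.vstack([c + 0.5 * rng.standard_normal((100, 2)) for c in centres])
k_best, gaps, s = gap_statistic(x, k_max=6)
```

In the workflow the abstract describes, `x` would be the matrix of PC scores rather than these synthetic blobs, and the selected `k_best` would fix the number of pattern classes.
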

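The per-class ccdf and the L2-norm matching of conditioning data against class prototypes might look like the sketch below. Assumptions, none of which are confirmed by the abstract: node values are categorical (e.g. facies codes); templates are odd-sized and flattened so the central node sits at index `n // 2`; each class prototype is the mean pattern of its class; distances use only the informed nodes of the data event; and every class contains at least one pattern. The abstract says the ccdf is used to draw a pattern from the best-matched class; this sketch takes the simpler reading that the draw yields the central-node value. All function names are hypothetical.

```python
import numpy as np

def class_ccdfs(patterns, labels, n_classes, categories):
    """Per-class cdf of the central-node value (assumes no empty class)."""
    centre = patterns.shape[1] // 2  # central node of an odd-sized flattened template
    ccdfs = np.zeros((n_classes, len(categories)))
    for c in range(n_classes):
        vals = patterns[labels == c, centre]
        counts = np.array([np.sum(vals == v) for v in categories], dtype=float)
        ccdfs[c] = np.cumsum(counts) / counts.sum()
    return ccdfs

def class_prototypes(patterns, labels, n_classes):
    """ASSUMPTION: a prototype is the mean pattern of its class."""
    return np.array([patterns[labels == c].mean(axis=0) for c in range(n_classes)])

def best_matching_class(data_event, known, prototypes):
    """Index of the prototype closest (L2 norm) to the data event on informed nodes."""
    diff = prototypes[:, known] - data_event[known]
    return int(np.argmin(np.linalg.norm(diff, axis=1)))

def draw_central_value(ccdf, categories, rng):
    """Inverse-cdf draw of the central-node value."""
    idx = np.searchsorted(ccdf, rng.random(), side="right")
    return categories[min(idx, len(categories) - 1)]  # guard against round-off

# Tiny hand-made example with 3 x 3 templates (central node at index 4).
categories = np.array([0, 1])
zeros, ones = np.zeros(9), np.ones(9)
z_c1 = zeros.copy(); z_c1[4] = 1
o_c0 = ones.copy(); o_c0[4] = 0
patterns = np.vstack([zeros, zeros, z_c1, ones, ones, ones, o_c0])
labels = np.array([0, 0, 0, 1, 1, 1, 1])
ccdfs = class_ccdfs(patterns, labels, 2, categories)    # [[2/3, 1], [1/4, 1]]
protos = class_prototypes(patterns, labels, 2)
event = np.full(9, np.nan); event[:3] = 1.0             # only the top row is informed
known = ~np.isnan(event)
c = best_matching_class(event, known, protos)           # 1
value = draw_central_value(ccdfs[c], categories, np.random.default_rng(0))
```

Restricting the L2 distance to informed nodes lets a partially known data event be compared with complete prototypes; how the paper handles uninformed nodes is not stated in the abstract.
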
Figs. 1–20 (not reproduced in this preview)



Author information

Corresponding author: Snehamoy Chatterjee.


About this article


Cite this article

Chatterjee, S., Mohanty, M.M. Automatic cluster selection using gap statistics for pattern-based multi-point geostatistical simulation. Arab J Geosci 8, 7691–7704 (2015).



Keywords

  • Multi-point simulation
  • Geological heterogeneity
  • Conditional distribution
  • k-means clustering
  • Geostatistics