Arabian Journal of Geosciences, Volume 8, Issue 9, pp 7691–7704

Automatic cluster selection using gap statistics for pattern-based multi-point geostatistical simulation

  • Snehamoy Chatterjee
  • Manasi Manjari Mohanty
Original Paper


An automatic cluster number selection algorithm is proposed for multi-point geostatistical simulation. The multi-point simulation is performed by extracting patterns from a training image. The computational time of the pattern-based simulation is significantly reduced by reducing the dimension of the patterns with principal component analysis (PCA). Traditional PCA is used for its simplicity and computational ease. The patterns are classified by applying the k-means clustering algorithm to their principal components (PCs). The number of clusters is selected automatically by calculating the gap statistic. The conditional cumulative distribution function (ccdf) for each class is generated from the frequency of the central node value of the template. For sequential simulation, the similarity of the conditioning data to the class prototypes is measured using the L2-norm. The ccdf of the best-matched class is used to draw a pattern from that class. The algorithm is validated with examples of conditional and unconditional simulation. The results show that spatial continuity, in terms of curvilinear structure, is well reproduced in all examples. The first- and second-order statistics are also reproduced very well in all examples. A comparative study with the wavesim and filtersim techniques shows that the proposed algorithm performs better than filtersim and similarly to wavesim; however, the computational time of the proposed method is similar to that of filtersim and lower than that of wavesim. The sensitivity of the algorithm to the number of PCs and the number of clusters has also been tested. The results reveal that automatic cluster selection helps to improve the performance of the proposed method.
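The abstract does not give implementation details, so the following is only a minimal sketch of the cluster-selection and matching steps it describes, not the authors' code. It assumes the patterns have already been projected onto their leading PCs (each pattern is a short list of PC scores). The function names, the uniform reference data drawn over the bounding box, and the selection rule (smallest k with Gap(k) ≥ Gap(k+1) − s(k+1)) are assumptions taken from the general gap-statistic procedure of Tibshirani et al. (2001).

```python
import math
import random

def sq_dist(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new.append(centroids[j])  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new
    return centroids, labels

def dispersion(points, k, rng, restarts=5):
    """Smallest within-cluster sum of squares over several random restarts."""
    best = math.inf
    for _ in range(restarts):
        cents, labels = kmeans(points, k, rng)
        w = sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))
        best = min(best, w)
    return max(best, 1e-12)  # guard against log(0)

def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (selected k, list of Gap(k) for k = 1..k_max)."""
    rng = random.Random(seed)
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(dispersion(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box of the input.
            ref = [[rng.uniform(a, b) for a, b in zip(lo, hi)] for _ in points]
            ref_logs.append(math.log(dispersion(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    # Smallest k such that Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps

def nearest_prototype(vec, prototypes):
    """Index of the class prototype closest to vec in the L2 sense."""
    return min(range(len(prototypes)), key=lambda j: sq_dist(vec, prototypes[j]))
```

In a simulation loop, `gap_statistic` would be called once on the PC scores of the training-image patterns, and `nearest_prototype` would then be called at each visited node to pick the class whose ccdf is sampled. Comparing squared distances gives the same nearest class as comparing L2-norms, which avoids a square root per comparison.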


Multi-point simulation; Geological heterogeneity; Conditional distribution; k-means clustering; Geostatistics



Copyright information

© Saudi Society for Geosciences 2014

Authors and Affiliations

  1. Department of Geological and Mining Engineering and Sciences, Michigan Technological University, Houghton, USA
  2. Department of Mining Engineering, National Institute of Technology, Rourkela, India
