Abstract
Stochastic modeling methods and uncertainty quantification are important tools for gaining insight into the geological variability of subsurface structures. Previous attempts at geologic inversion and interpretation can be broadly categorized into geostatistics and process-based modeling. The choice of a suitable modeling technique depends directly on the application and the available input data. Modern geophysical techniques provide high-resolution regional data sets in two- or three-dimensional space, either directly from sensors or indirectly through geophysical inversion. Existing methods suffer from certain drawbacks when producing accurate and precise (i.e., with quantified uncertainty) geological models from these data sets. In this work, a stochastic modeling framework is proposed to extract subsurface heterogeneity from multiple, complementary types of data. Subsurface heterogeneity is considered the “hidden link” between multiple spatial data sets. Hidden Markov random field models are employed to perform three-dimensional segmentation, which represents this “hidden link”. Finite Gaussian mixture models are adopted to characterize the statistical parameters of the multiple data sets. The uncertainties are simulated via a Gibbs sampling process within a Bayesian inference framework. The proposed modeling method is validated and demonstrated using numerical examples. The results show that the proposed stochastic modeling framework is a promising tool for three-dimensional segmentation in geological modeling and geophysics.
References
Attias H (2000) A variational Bayesian framework for graphical models. Adv Neural Inf Process Syst 12:209–215
Auerbach S, Schaeben H (1990) Computer-aided geometric design of geologic surfaces and bodies. Math Geol 22:957–987
Babak O, Deutsch CV (2009) An intrinsic model of coregionalization that solves variance inflation in collocated cokriging. Comput Geosci UK 35:603–614
Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B (Methodol) 36:192–236
Besag J (1986) On the statistical analysis of dirty pictures. J R Stat Soc Ser B (Methodol) 48:259–302
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22:719–725
Blanchin R, Chilès J-P (1993) The Channel Tunnel: Geostatistical prediction of the geological conditions and its validation by the reality. Math Geol 25:963–974
Caers J (2011) Modeling uncertainty in the earth sciences. Wiley, Chichester
Caers J, Zhang T (2004) Multiple-point geostatistics: a quantitative vehicle for integrating geologic analogs into multiple reservoir models. In: Grammer GM, Harris PM, Eberli GP (eds) Integration of outcrop and modern analogs in reservoir modeling. AAPG Memoir 80:383–394
Celeux G, Forbes F, Peyrard N (2003) EM procedures using mean field-like approximations for Markov model-based image segmentation. Pattern Recognit 36:131–144
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793
Chugunova TL, Hu LY (2008) Multiple-point simulations constrained by continuous auxiliary data. Math Geosci 40:133–146
Cline HE, Lorensen WE, Kikinis R, Jolesz F (1990) Three-dimensional segmentation of MR images of the head using probability and connectivity. J Comput Assist Tomogr 14:1037–1045
Cross GR, Jain AK (1983) Markov random field texture models. IEEE Trans Pattern Anal Mach Intell 5:25–39
Daly C (2005) Higher order models using entropy, Markov random fields and sequential simulation, Geostatistics Banff 2004. Springer, New York, pp 215–224
de Vries LM, Carrera J, Falivene O, Gratacós O, Slooten LJ (2009) Application of multiple point geostatistics to non-stationary images. Math Geosci 41:29–42
Elkateb T, Chalaturnyk R, Robertson PK (2003) An overview of soil heterogeneity: quantification and implications on geotechnical field problems. Can Geotech J 40:1–15
Figueiredo MA, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24:381–396
Fjortoft R, Delignon Y, Pieczynski W, Sigelle M, Tupin F (2003) Unsupervised classification of radar images using hidden Markov chains and hidden Markov random fields. IEEE Trans Geosci Remote Sens 41:675–686
Forbes F, Peyrard N (2003) Hidden Markov random field model selection criteria based on mean field-like approximations. IEEE Trans Pattern Anal Mach Intell 25:1089–1101
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97:611–631
Gao D (2003) Volume texture extraction for 3D seismic visualization and interpretation. Geophysics 68:1294–1302
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741
Gonzalez J, Low Y, Gretton A, Guestrin C (2011) Parallel Gibbs sampling: From colored fields to thin junction trees, International Conference on Artificial Intelligence and Statistics, pp 324–332
Ising E (1925) Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik A Hadrons Nuclei 31:253–258
Jessell MW, Valenta RK (1996) Structural geophysics: integrated structural and geophysical modelling. Comput Methods Geosci 15:303–324
Kindermann R, Snell JL (1980) Markov random fields and their applications. American Mathematical Society, Providence, RI
Koch J, He X, Jensen KH, Refsgaard JC (2014) Challenges in conditioning a stochastic geological model of a heterogeneous glacial aquifer to a comprehensive soft data set. Hydrol Earth Syst Sci 18:2907–2923
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge
Koltermann CE, Gorelick SM (1996) Heterogeneity in sedimentary deposits: A review of structure-imitating, process-imitating, and descriptive approaches. Water Resour Res 32:2617–2658
Lajaunie C, Courrioux G, Manuel L (1997) Foliation fields and 3D cartography in geology: principles of a method based on potential interpolation. Math Geol 29:571–584
Li Z, Wang X, Wang H, Liang RY (2016) Quantifying stratigraphic uncertainties by stochastic simulation techniques based on Markov random field. Eng Geol 201:106–122
Mallet J-L (1989) Discrete smooth interpolation. ACM Trans Gr 8:121–144
Mallet J-L (2002) Geomodeling. Oxford University Press Inc, Oxford
Mann CJ (1993) Uncertainty in geology. Computers in Geology—25 Years of Progress. Oxford University Press, Oxford, pp 241–254
Mariethoz G, Caers J (2014) Multiple-point geostatistics: stochastic modeling with training images. Wiley, New York
Mariethoz G, Renard P, Cornaton F, Jaquet O (2009) Truncated plurigaussian simulations to characterize aquifer heterogeneity. Ground Water 47:13–24
McKenna SA, Poeter EP (1995) Field example of data fusion in site characterization. Water Resour Res 31:3229–3240
McLachlan G, Peel D (2004) Finite mixture models. Wiley, Hoboken, NJ
McLachlan GJ, Basford KE (1988) Mixture models. Inference and applications to clustering. Statistics: Textbooks and Monographs. Dekker, New York, p 1
McLachlan GJ, Krishnan T (2007) The EM algorithm and extensions. Wiley-Interscience, New York
Norberg T, Rosén L, Baran A, Baran S (2002) On modelling discrete geological structures as Markov random fields. Math Geol 34:63–77
Pham DL, Xu C, Prince JL (2000) Current methods in medical image segmentation. Ann Rev Biomed Eng 2:315–337
Reitberger J, Schnörr C, Krzystek P, Stilla U (2009) 3D segmentation of single trees exploiting full waveform LIDAR data. ISPRS J Photogramm Remote Sens 64:561–574
Rubin Y, Chen X, Murakami H, Hahn M (2010) A Bayesian approach for inverse modeling, data assimilation, and conditional simulation of spatial random fields. Water Resour Res 46:W10523
Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. CRC Press, Boca Raton
Solberg AHS, Taxt T, Jain AK (1996) A Markov random field model for classification of multisource satellite imagery. IEEE Trans Geosci Remote Sens 34:100–113
Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34:1–21
Thornton C (1998) Separability is a learner’s best friend, 4th Neural Computation and Psychology Workshop, 9–11 April 1997. Springer, London, pp 40–46
Toftaker H, Tjelmeland H (2013) Construction of binary multi-grid Markov random field prior models from training images. Math Geosci 45:383–409
Tolpekin VA, Stein A (2009) Quantification of the effects of land-cover-class spectral separability on the accuracy of Markov-random-field-based superresolution mapping. IEEE Trans Geosci Remote Sens 47:3283–3297
Wang X, Li Z, Wang H, Rong Q, Liang RY (2016) Probabilistic analysis of shield-driven tunnel in multiple strata considering stratigraphic uncertainty. Struct Saf 62:88–100
Wellmann JF (2013) Information theory for correlation analysis and estimation of uncertainty reduction in maps and models. Entropy 15:1464–1485
Wellmann JF, Regenauer-Lieb K (2012) Uncertainties have a meaning: Information entropy as a quality measure for 3-D geological models. Tectonophysics 526:207–216
Wellmann JF, Thiele ST, Lindsay MD, Jessell MW (2016) pynoddy 1.0: an experimental platform for automated 3-D kinematic and potential field modelling. Geosci Model Dev 9:1019–1035
Xie H, Pierce LE, Ulaby FT (2002) SAR speckle reduction using wavelet denoising and Markov random field modeling. IEEE Trans Geosci Remote Sens 40:2196–2212
Yuen KV, Mu HQ (2011) Peak ground acceleration estimation by linear and nonlinear models with reduced order Monte Carlo simulation. Comput Aided Civil Infrastruct Eng 26:30–47
Zhang Y, Brady M, Smith S (2001) Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 20:45–57
Zhu H, Zhang L (2013) Characterizing geotechnical anisotropic spatial variations using random field theory. Can Geotech J 50:723–734
Acknowledgements
Hui Wang and Florian Wellmann would like to acknowledge the support from the German research foundation (DFG) through the Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen University. The authors would like to thank the anonymous reviewers for their constructive comments that have helped to improve the paper significantly.
Appendices
Appendix A: MRF Energy and Likelihood Energy
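According to Bayes' theorem, the posterior distribution of the labels \(\mathbf{x}\) given the observations \(\mathbf{y}\) and the parameters \(\Phi\) is
\[ p(\mathbf{x}|\mathbf{y},\Phi )=\frac{f(\mathbf{y}|\mathbf{x},\Phi )\,p(\mathbf{x})}{f(\mathbf{y}|\Phi )}\propto \exp \left( -U(\mathbf{x})\right) \prod \nolimits _{j\in V} f_{x_j } (y_j ;\theta _{x_j } ). \tag{26} \]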
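Although posterior realizations of x cannot be sampled directly from Eq. (26), the conditional random field \(p(\mathbf{x}|\mathbf{y},\Phi )\) is still a Gibbs field, as can be seen by substituting \(U^{\prime }(\mathbf{x})=U(\mathbf{x})-\sum \nolimits _{j\in V} {\log f_{x_j } (y_j ;\theta _{x_j } )} \) into Eq. (26). Assuming that the emission distribution is Gaussian (i.e., \(\theta _{x_j } =(\mu _{x_j },\Sigma _{x_j } )\)), the corresponding local conditional distribution is
\[ p(x_j |\mathbf{x}_{\partial _j } ,y_j ,\Phi )\propto \exp \left( -U(x_j ,\mathbf{x}_{\partial _j } )\right) \frac{1}{(2\pi )^{d/2}\left| \Sigma _{x_j } \right| ^{1/2}}\exp \left( -\frac{1}{2}(y_j -\mu _{x_j } )^{T}\Sigma _{x_j }^{-1} (y_j -\mu _{x_j } )\right) , \tag{27} \]
where d is the dimension of the feature vector \(y_j\) (i.e., the number of data sets observed at voxel j). This can be rewritten as
\[ p(x_j |\mathbf{x}_{\partial _j } ,y_j ,\Phi )\propto \exp \left( -U(x_j ,\mathbf{x}_{\partial _j } )-U(y_j |x_j )\right) , \tag{28} \]
with the MRF energy \(U(x_j ,\mathbf{x}_{\partial _j } )\) and the likelihood energy
\[ U(y_j |x_j )=\frac{1}{2}(y_j -\mu _{x_j } )^{T}\Sigma _{x_j }^{-1} (y_j -\mu _{x_j } )+\frac{1}{2}\log \left| \Sigma _{x_j } \right| +\frac{d}{2}\log (2\pi ). \tag{29} \]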
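The split into MRF energy and likelihood energy can be illustrated with a short Python (NumPy) sketch that evaluates the local conditional label probabilities for a single voxel. For illustration only, it assumes a Potts-type MRF energy in which \(U(x_j,\mathbf{x}_{\partial_j})\) equals \(\beta\) times the number of neighbors whose label differs from \(x_j\); the paper's MRF energy may take a different form, and the function names and the parameter `beta` are hypothetical. The likelihood energy is taken as the negative log of the Gaussian emission density.

```python
import numpy as np


def likelihood_energy(y, mu, cov):
    """Negative log of the multivariate Gaussian density N(y; mu, cov)."""
    d = y.shape[0]
    diff = y - mu
    _, logdet = np.linalg.slogdet(cov)           # log|Sigma|, numerically stable
    maha = diff @ np.linalg.solve(cov, diff)     # (y - mu)^T Sigma^{-1} (y - mu)
    return 0.5 * maha + 0.5 * logdet + 0.5 * d * np.log(2.0 * np.pi)


def potts_energy(label, neighbor_labels, beta):
    """Hypothetical Potts MRF energy (an assumption, not necessarily the paper's U):
    beta times the number of neighbors carrying a different label."""
    return beta * np.sum(np.asarray(neighbor_labels) != label)


def local_conditional(y, neighbor_labels, mus, covs, beta):
    """Probabilities p(x_j = l | neighbor labels, y_j) for every label l."""
    energies = np.array([
        potts_energy(l, neighbor_labels, beta) + likelihood_energy(y, mus[l], covs[l])
        for l in range(len(mus))
    ])
    # Subtract the minimum total energy before exponentiating to avoid underflow.
    w = np.exp(-(energies - energies.min()))
    return w / w.sum()
```

Because only energy differences matter after normalization, the constant \((d/2)\log(2\pi)\) cancels in `local_conditional`; it is kept so that `likelihood_energy` equals the full negative log-density.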
Appendix B: Chromatic Sampler
The chromatic sampler applies graph coloring, a classic technique in parallel job scheduling, so that the sequential-scan Gibbs sampler can be parallelized directly. Specifically, the set of voxels is partitioned into k subsets (colors) such that adjacent vertices in the corresponding graph always receive different colors. Under such a k-coloring of the MRF, all vertices within one subset are conditionally independent given the configuration of the vertices in the remaining colors. Therefore, all vertices with the same color can be sampled independently and in parallel. Gonzalez et al. (2011) show that, given p processors and a k-coloring of an MRF with n vertices, the parallel chromatic sampler is ergodic and generates a single joint sample in running time \(O(n/p+k)\); when n/p is much larger than k, this amounts to roughly a p-fold reduction of the wall-clock time needed to mix compared with the sequential scan. Given sufficient parallel resources, the running time is therefore dominated by the number of colors k. As a simple example, a three-dimensional grid (Fig. 10) equipped with the neighborhood system defined in Sect. 2 is a graph with 8 colors (\(k=8\)).
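The 8-coloring of a three-dimensional grid can be reproduced with a minimal sketch, assuming that the neighborhood system of Sect. 2 is the 26-neighborhood of a regular grid (an assumption consistent with k = 8, not stated in this appendix). Coloring each voxel by the parity of its three indices works because two 26-neighbors differ by exactly 1 along at least one axis and therefore differ in parity there. The sweep function uses a hypothetical Potts prior without a data term purely to show the color-by-color update order; it is not the authors' implementation.

```python
import itertools

import numpy as np

# Assumed neighborhood: the 26 offsets in {-1, 0, 1}^3 other than (0, 0, 0)
OFFSETS = [o for o in itertools.product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def parity_coloring(shape):
    """Color voxel (i, j, k) by the parity of its indices, giving 2**3 = 8 colors."""
    i, j, k = np.indices(shape)
    return 4 * (i % 2) + 2 * (j % 2) + (k % 2)


def neighbors(idx, shape):
    """Yield the in-bounds 26-neighbors of voxel idx."""
    for off in OFFSETS:
        nb = tuple(a + d for a, d in zip(idx, off))
        if all(0 <= c < s for c, s in zip(nb, shape)):
            yield nb


def is_valid_coloring(colors):
    """True if no two 26-neighbors share a color."""
    for idx in np.ndindex(colors.shape):
        if any(colors[nb] == colors[idx] for nb in neighbors(idx, colors.shape)):
            return False
    return True


def chromatic_sweep(labels, colors, n_labels, beta, rng):
    """One Gibbs sweep under a hypothetical Potts prior, visiting one color at a time.

    Voxels of the same color are conditionally independent given the other colors,
    so the inner loop could be split across processors; it runs sequentially here.
    """
    for color in np.unique(colors):
        for idx in zip(*np.nonzero(colors == color)):
            counts = np.zeros(n_labels)
            for nb in neighbors(idx, labels.shape):
                counts[labels[nb]] += 1
            # Potts energy: beta * (number of neighbors with a different label)
            energy = beta * (counts.sum() - counts)
            w = np.exp(-(energy - energy.min()))
            labels[idx] = rng.choice(n_labels, p=w / w.sum())
    return labels
```

In a parallel implementation, the voxels of one color class would be distributed over the p processors, with a synchronization step between colors; the sequential inner loop above only keeps the sketch short.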
Appendix C: Calculating Information Entropy
First, calculate the probability \(P_l (i)\) of assigning a label \(l\in L\) to a given voxel \(i\in V\) as
\[ P_l (i)=\frac{1}{n}\sum _{k=1}^{n} I_l \left( x_i^{(k)} \right) , \tag{30} \]
where n is a predefined number of realizations after the burn-in period, \(x_i^{(k)}\) is the label of voxel i in the kth realization, and \(I_l (\cdot )\) is the indicator function
\[ I_l (x_i )=\begin{cases} 1, & x_i =l,\\ 0, & \text{otherwise}. \end{cases} \tag{31} \]
Second, the information entropy of voxel i reads (with the convention \(0\log 0=0\))
\[ H_i =-\sum _{l\in L} P_l (i)\log P_l (i). \tag{32} \]
Based on Eq. (32), the average information entropy over the entire physical domain is
\[ \bar{H}=\frac{1}{\left| V \right| }\sum _{i\in V} H_i , \tag{33} \]
where \(\left| V \right| \) denotes the cardinality of the set V. The average information entropy summarizes the uncertainty of the entire physical domain in a single number.
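The three steps of this appendix map onto a few NumPy operations. The sketch below assumes that the post-burn-in realizations are stored as an integer array of shape (n, |V|) with labels 0, ..., |L| − 1, and it uses the natural logarithm; the base of the logarithm is an assumption here, and changing it only rescales the entropy. Function names are illustrative.

```python
import numpy as np


def label_probabilities(realizations, n_labels):
    """P_l(i) for every label l and voxel i.

    realizations: integer array of shape (n, |V|), one post-burn-in sample per row.
    Returns an array of shape (n_labels, |V|).
    """
    R = np.asarray(realizations)
    # (R == l) applies the indicator I_l elementwise; averaging over rows gives P_l(i)
    return np.stack([(R == l).mean(axis=0) for l in range(n_labels)])


def voxel_entropy(P):
    """H_i = -sum_l P_l(i) log P_l(i) (natural log), with 0 log 0 taken as 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)
    return -terms.sum(axis=0)


def average_entropy(P):
    """Average of H_i over all voxels, i.e. (1/|V|) * sum_i H_i."""
    return voxel_entropy(P).mean()
```

A voxel that receives the same label in every realization has zero entropy, while a voxel split evenly between two labels has entropy \(\log 2\).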
Appendix D: Geometric Separability Index (GSI)
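According to Thornton (1998), the Geometric Separability Index (GSI) for a two-cluster complete data set (i.e., both the cluster labels x and the observed features y are known) is defined as
\[ \mathrm{GSI}=\frac{1}{n}\sum _{i=1}^{n} \left[ \left( f(x_i )+f(x_i^{\prime } )+1\right) \bmod 2\right] . \]
Here \(f(\cdot )\) is a binary target function that returns 0 or 1 according to the input label, \(x_i \in L\) is the label at site i, \(x_i^{\prime } \in L\) is the label of the nearest neighbor of site i, and n is the total number of data points. The nearest neighbor is determined by the Euclidean distance in the feature space. Each summand equals 1 when a point and its nearest neighbor share the same target value and 0 otherwise.
For cases with multiple clusters, the binary target function is simply linked to a specific label \(l\in L\)
\[ f_l (x_i )=\begin{cases} 1, & x_i =l,\\ 0, & \text{otherwise}, \end{cases} \]
which yields one index \(\mathrm{GSI}_l\) per label. The average GSI is then defined as the arithmetic mean of \(\mathrm{GSI}_l\), \(l\in L\):
\[ \overline{\mathrm{GSI}}=\frac{1}{\left| L \right| }\sum _{l\in L} \mathrm{GSI}_l , \]
where \(\left| L \right| \) is the cardinality of the label set L. The average GSI provides a measure of the overall separability in a single number.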
The average GSI intuitively quantifies the degree to which data points mix together and hence indicates how “difficult” the segmentation problem is. If the centroids of the clusters almost coincide or the observed data points are uniformly distributed in the feature space (i.e., highly overlapped), the GSI will be close to 0.5; in contrast, if there is almost no overlap among clusters, the GSI will be close to 1.0.
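A brute-force sketch of the average GSI in Python (NumPy), assuming labels are given as an integer array of length n and features as an array of shape (n, number of data sets). The nearest-neighbor search computes all pairwise distances, which is O(n²) in time and memory; for large data sets a spatial index such as `scipy.spatial.cKDTree` would be the usual substitute. Function names are illustrative.

```python
import numpy as np


def nearest_neighbor_indices(features):
    """Index of each point's nearest neighbor (Euclidean distance, excluding itself)."""
    X = np.asarray(features, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    np.fill_diagonal(d2, np.inf)                              # a point is not its own neighbor
    return d2.argmin(axis=1)


def gsi_binary(f, nn):
    """Thornton's GSI: mean of (f(x_i) + f(x_i') + 1) mod 2 over all points."""
    f = np.asarray(f)
    return np.mean((f + f[nn] + 1) % 2)


def average_gsi(labels, features):
    """Arithmetic mean of GSI_l over the labels, using one-vs-rest targets f_l."""
    labels = np.asarray(labels)
    nn = nearest_neighbor_indices(features)
    return np.mean([gsi_binary((labels == l).astype(int), nn) for l in np.unique(labels)])
```

For two well-separated clusters, every point's nearest neighbor carries the same label, so each \(\mathrm{GSI}_l\) and hence the average GSI equal 1.0.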
About this article
Cite this article
Wang, H., Wellmann, J.F., Li, Z. et al. A Segmentation Approach for Stochastic Geological Modeling Using Hidden Markov Random Fields. Math Geosci 49, 145–177 (2017). https://doi.org/10.1007/s11004-016-9663-9
DOI: https://doi.org/10.1007/s11004-016-9663-9