Abstract
Deep learning techniques are increasingly being considered for geological applications where, much like in computer vision, the challenges are characterized by high-dimensional spatial data dominated by multipoint statistics. In particular, a novel technique called generative adversarial networks (GANs) has recently been studied for geological parametrization and synthesis, obtaining results that are at least qualitatively competitive with previous methods. The method obtains a neural network parametrization of the geology (the so-called generator) that is capable of reproducing very complex geological patterns while reducing the dimensionality by several orders of magnitude. Subsequent works have addressed the conditioning task, i.e., using the generator to produce realizations that honor spatial observations (hard data). Current approaches, however, do not provide a parametrization of the conditional generation process. In this work, we propose a method to obtain a parametrization for the direct generation of conditional realizations. The main idea is to extend the existing generator network by stacking a second inference network that learns to perform the conditioning. This inference network is a neural network trained to sample a posterior distribution derived from a Bayesian formulation of the conditioning task. The resulting extended neural network thus provides the conditional parametrization. We assess our method on a benchmark image of a binary channelized subsurface, obtaining very promising results for a wide variety of conditioning configurations.
References
Jacquard, P.: Permeability distribution from field pressure data. Soc. Pet. Eng. https://doi.org/10.2118/1307-PA (1965)
Jahns, H. O.: A rapid method for obtaining a two-dimensional reservoir description from well pressure response data. Soc. Pet. Eng. https://doi.org/10.2118/1473-PA (1966)
Sarma, P, Durlofsky, LJ, Aziz, K: Kernel principal component analysis for efficient, differentiable parameterization of multipoint geostatistics. Math. Geosci. 40(1), 3–32 (2008)
Ma, X, Zabaras, N: Kernel principal component analysis for stochastic input model generation. J. Comput. Phys. 230(19), 7311–7331 (2011)
Vo, HX, Durlofsky, LJ: Regularized kernel PCA for the efficient parameterization of complex geological models. J. Comput. Phys. 322, 859–881 (2016)
Shirangi, MG, Emerick, AA: An improved TSVD-based Levenberg–Marquardt algorithm for history matching and comparison with Gauss–Newton. J. Pet. Sci. Eng. 143, 258–271 (2016)
Tavakoli, R, Reynolds, AC: Monte Carlo simulation of permeability fields and reservoir performance predictions with SVD parameterization in RML compared with EnKF. Comput. Geosci. 15(1), 99–116 (2011)
Jafarpour, B., McLaughlin, D. B.: Reservoir characterization with the discrete cosine transform. Soc. Petrol. Eng. https://doi.org/10.2118/106453-PA (2009)
Jafarpour, B, Goyal, VK, McLaughlin, DB, Freeman, WT: Compressed history matching: exploiting transform-domain sparsity for regularization of nonlinear dynamic data integration problems. Math. Geosci. 42(1), 1–27 (2010). ISSN 1874-8953. https://doi.org/10.1007/s11004-009-9247-z
Moreno, D., Aanonsen, S. I.: Stochastic facies modelling using the level set method. In: EAGE Conference on Petroleum Geostatistics (2007)
Dorn, O, Villegas, R: History matching of petroleum reservoirs using a level set technique. Inverse Prob. 24(3), 035015 (2008). http://stacks.iop.org/0266-5611/24/i=3/a=035015
Chang, H, Zhang, D, Lu, Z: History matching of facies distribution with the EnKF and level set parameterization. J. Comput. Phys. 229(20), 8011–8030 (2010). ISSN 0021-9991. https://doi.org/10.1016/j.jcp.2010.07.005. http://www.sciencedirect.com/science/article/pii/S0021999110003748
Khaninezhad, MM, Jafarpour, B, Li, L: Sparse geologic dictionaries for subsurface flow model calibration: part i. Inversion formulation. Adv. Water Resour. 39, 106–121 (2012)
Khaninezhad, MM, Jafarpour, B, Li, L: Sparse geologic dictionaries for subsurface flow model calibration: part ii. Robustness to uncertainty. Adv. Water Resour. 39, 122–136 (2012)
Goodfellow, I, Pouget-Abadie, J, Mirza, M, Xu, B, Warde-Farley, D, Ozair, S, Courville, A, Bengio, Y: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp 2672–2680 (2014)
Mosser, L, Dubrule, O, Blunt, MJ: Reconstruction of three-dimensional porous media using generative adversarial neural networks. arXiv:1704.03225 (2017)
Mosser, L, Dubrule, O, Blunt, MJ: Stochastic reconstruction of an oolitic limestone by generative adversarial networks. arXiv:1712.02854 (2017)
Chan, S, Elsheikh, AH: Parametrization and generation of geological models with generative adversarial networks. arXiv:1708.01810 (2017)
Laloy, E, Hérault, R, Jacques, D, Linde, N: Training-image based geostatistical inversion using a spatial generative adversarial neural network. Water Resour. Res. 54(1), 381–406 (2018)
Dupont, E, Zhang, T, Tilke, P, Liang, L, Bailey, W: Generating realistic geology conditioned on physical measurements with generative adversarial networks. arXiv:1802.03065 (2018)
Mosser, L, Dubrule, O, Blunt, MJ: Conditioning of three-dimensional generative adversarial networks for pore and reservoir-scale models. arXiv:1802.05622 (2018)
Chan, S, Elsheikh, AH: Parametrization of stochastic inputs using generative adversarial networks with application in geology. arXiv:1904.03677 (2019)
Marçais, J, de Dreuzy, J-R: Prospective interest of deep learning for hydrological inference. Groundwater 55(5), 688–692 (2017)
Nagoor Kani, J, Elsheikh, AH: DR-RNN: a deep residual recurrent neural network for model reduction. arXiv:1709.00939 (2017)
Klie, H, et al.: Physics-based and data-driven surrogates for production forecasting. In: SPE Reservoir Simulation Symposium. Society of Petroleum Engineers (2015)
Stanev, VG, Iliev, FL, Hansen, S, Vesselinov, VV, Alexandrov, BS: Identification of release sources in advection–diffusion system by machine learning combined with Green’s function inverse method. Appl. Math. Model. 60, 64–76 (2018)
Sun, W, Durlofsky, LJ: A new data-space inversion procedure for efficient uncertainty quantification in subsurface flow problems. Math. Geosci. 49(6), 679–715 (2017)
Zhu, Y, Zabaras, N: Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification. J. Comput. Phys. 366, 415–447 (2018)
Valera, M, Guo, Z, Kelly, P, Matz, S, Cantu, A, Percus, AG, Hyman, JD, Srinivasan, G, Viswanathan, HS: Machine learning for graph-based representations of three-dimensional discrete fracture networks. arXiv:1705.09866 (2017)
Strebelle, SB, Journel, AG: Reservoir modeling using multiple-point statistics. In: SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers (2001)
Brock, A, Donahue, J, Simonyan, K: Large scale GAN training for high fidelity natural image synthesis. arXiv:1809.11096 (2018)
Karras, T, Aila, T, Laine, S, Lehtinen, J: Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196 (2017)
Schmidhuber, J: Learning factorial codes by predictability minimization. Neural Comput. 4(6), 863–879 (1992)
Radford, A, Metz, L, Chintala, S: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434 (2015)
Salimans, T, Goodfellow, I, Zaremba, W, Cheung, V, Radford, A, Chen, X: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems, pp 2234–2242 (2016)
Arjovsky, M, Bottou, L: Towards principled methods for training generative adversarial networks. arXiv:1701.04862 (2017)
Arora, S, Ge, R, Liang, Y, Ma, T, Zhang, Y: Generalization and equilibrium in generative adversarial nets (GANs). arXiv:1703.00573 (2017)
Müller, A: Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29(2), 429–443 (1997)
Gretton, A, Borgwardt, KM, Rasch, M, Schölkopf, B, Smola, AJ: A kernel method for the two-sample-problem. In: Advances in Neural Information Processing Systems, pp 513–520 (2007)
Dziugaite, GK, Roy, DM, Ghahramani, Z: Training generative neural networks via maximum mean discrepancy optimization. arXiv:1505.03906 (2015)
Arjovsky, M, Chintala, S, Bottou, L: Wasserstein GAN. arXiv:1701.07875 (2017)
Gulrajani, I, Ahmed, F, Arjovsky, M, Dumoulin, V, Courville, AC: Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems, pp 5769–5779 (2017)
Mroueh, Y, Sercu, T: Fisher GAN. In: Advances in Neural Information Processing Systems, pp 2510–2520 (2017)
Mroueh, Y, Li, C-L, Sercu, T, Raj, A, Cheng, Y: Sobolev GAN. arXiv:1711.04894 (2017)
Mroueh, Y, Sercu, T, Goel, V: McGan: mean and covariance feature matching GAN. arXiv:1702.08398 (2017)
Kozachenko, LF, Leonenko, NN: Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii 23(2), 9–16 (1987)
Goria, MN, Leonenko, NN, Mergel, VV, Inverardi, PLN: A new class of random vector entropy estimators and its applications in testing statistical hypotheses. J. Nonparametr. Stat. 17(3), 277–297 (2005)
Kingma, D, Ba, J: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)
Tieleman, T, Hinton, G: Lecture 6.5-RMSprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4(2). https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf (2012)
Paszke, A, Gross, S, Chintala, S, Chanan, G, Yang, E, DeVito, Z, Lin, Z, Desmaison, A, Antiga, L, Lerer, A: Automatic differentiation in PyTorch. NIPS Autodiff Workshop (2017)
Strebelle, S: Conditional simulation of complex geological structures using multiple-point statistics. Math. Geol. 34(1), 1–21 (2002)
Remy, N, Boucher, A, Wu, J: Sgems: Stanford geostatistical modeling software. Software Manual (2004)
Tan, X, Tahmasebi, P, Caers, J: Comparing training-image based algorithms using an analysis of distance. Math. Geosci. 46(2), 149–169 (2014)
Borg, I, Groenen, P: Modern multidimensional scaling: theory and applications. J. Educ. Meas. 40(3), 277–280 (2003)
Otsu, N: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Klambauer, G, Unterthiner, T, Mayr, A, Hochreiter, S: Self-normalizing neural networks. In: Advances in Neural Information Processing Systems, pp 971–980 (2017)
Yeh, R, Chen, C, Lim, TY, Hasegawa-Johnson, M, Do, MN: Semantic image inpainting with perceptual and contextual losses. arXiv:1607.07539 (2016)
Ulyanov, D, Vedaldi, A, Lempitsky, V: Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis. In: Proceedings of CVPR (2017)
Li, Y, Fang, C, Yang, J, Wang, Z, Lu, X, Yang, M-H: Diversified texture synthesis with feed-forward networks. In: Proceedings of CVPR (2017)
Kim, T, Bengio, Y: Deep directed generative models with energy-based probability estimation. arXiv:1606.03439 (2016)
Ioffe, S, Szegedy, C: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 (2015)
Rezende, DJ, Mohamed, S: Variational inference with normalizing flows. arXiv:1505.05770 (2015)
Kingma, DP, Salimans, T, Jozefowicz, R, Chen, X, Sutskever, I, Welling, M: Improved variational inference with inverse autoregressive flow. In: Advances in Neural Information Processing Systems, pp 4743–4751 (2016)
Wang, D, Liu, Q: Learning to draw samples: with application to amortized mle for generative adversarial learning. arXiv:1611.01722 (2016)
Nguyen, A, Yosinski, J, Bengio, Y, Dosovitskiy, A, Clune, J: Plug & play generative networks: conditional iterative generation of images in latent space. arXiv:1612.00005 (2016)
Engel, J, Hoffman, M, Roberts, A: Latent constraints: learning to generate conditionally from unconditional generative models. arXiv:1711.05772 (2017)
Bengio, Y: Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade, pp 437–478. Springer (2012)
Reddi, SJ, Kale, S, Kumar, S: On the convergence of Adam and beyond. International Conference on Learning Representations (2018)
Fukushima, K, Miyake, S: Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. In: Competition and Cooperation in Neural Nets, pp 267–285. Springer (1982)
LeCun, Y, Boser, B, Denker, JS, Henderson, D, Howard, RE, Hubbard, W, Jackel, LD: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
Dumoulin, V, Visin, F: A guide to convolution arithmetic for deep learning. arXiv:1603.07285 (2016)
Shahriari, B, Swersky, K, Wang, Z, Adams, RP, De Freitas, N: Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104(1), 148–175 (2016)
Zoph, B, Le, QV: Neural architecture search with reinforcement learning. arXiv:1611.01578 (2016)
Appendices
Appendix A: Implementation details
This appendix describes the training procedures and hyperparameters of the neural network models. See [67] for a practical guide on training neural networks.
A.1 Generator neural network
The generator \(G\colon \mathbb {R}^{30}\to \mathbb {R}^{64\times 64}\) is a deep convolutional neural network based on the template provided in [34]. The generator architecture consists of stacks of (transposed) convolutional layers (see Appendix E) together with batch normalization layers [61]. Batch normalization normalizes the intermediate layer outputs to have zero mean and unit variance, which drastically improves the optimization of deep neural networks [61]. For the non-linearity, we use rectified linear units (ReLU, σ(x) = max(0, x)) in the intermediate layers, and σ(x) = tanh(x) in the last layer to constrain the output to [−1, 1]. The architecture is summarized in Table 1a. We train G using the Wasserstein formulation of GAN introduced in [41] with the proposed default hyperparameters. The optimization is performed using the Adam method [48, 68] with a learning rate of \(10^{-4}\) and a batch size of 32. Our generator converges in approximately 20,000 iterations, taking around 30 minutes on an Nvidia GeForce GTX Titan X GPU. For deployment, it can generate approximately 5500 realizations per second using the GPU.
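For concreteness, a minimal PyTorch sketch of such a generator is given below. The channel widths, kernel sizes, and strides are illustrative assumptions following the DCGAN template of [34]; the exact architecture is the one listed in Table 1a.

```python
# Minimal sketch of a DCGAN-style generator G: R^30 -> R^(64x64).
# Channel widths below are illustrative, not necessarily those of Table 1a.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, nz=30, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            # latent vector treated as an (nz x 1 x 1) feature map
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8), nn.ReLU(inplace=True),        # 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4), nn.ReLU(inplace=True),        # 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2), nn.ReLU(inplace=True),        # 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf), nn.ReLU(inplace=True),            # 32 x 32
            nn.ConvTranspose2d(ngf, 1, 4, 2, 1, bias=False),
            nn.Tanh(),                                             # 64 x 64, values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

G = Generator()
z = torch.randn(32, 30)    # a batch of latent vectors
x = G(z)                   # shape: (32, 1, 64, 64)
```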
A.2 Inference neural network
We use the same inference network architecture \(I\colon \mathbb {R}^{30}\to \mathbb {R}^{30}\) for all our conditioning experiments. The architecture is simply a stack of fully connected layers with constant-size intermediate layers. More specifically, we first transform the input from size 30 to size 512, then apply several more intermediate transformations preserving the size, and finally apply a transformation to bring the size back from 512 to 30 in the output layer. For the non-linearity, we use scaled exponential linear units (SELU) [56], which are the current default option for deep fully connected networks: σ(x) = λx if x > 0, otherwise σ(x) = λα(e^x − 1), where the constants λ, α are given in [56]. No non-linearity is applied in the output layer (we do not need to bound the output as in the case of the generator). We experimented with different numbers of layers and, perhaps not surprisingly, found that deeper architectures tended to produce better results in general. In our work, we settled on 5 intermediate layers. The architecture is summarized in Table 1b. We optimize I using the Adam method with a learning rate of \(10^{-4}\) and a batch size of 64 for all the test cases. The network converges in between 1000 and 10,000 iterations, depending on the conditioning, taking between seconds and a few minutes to train on an Nvidia GeForce GTX Titan X GPU. For deployment, the conditional generator G ∘ I can generate approximately 5500 realizations per second using the GPU; we do not observe a significant increase in generation time from G to G ∘ I.
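A minimal PyTorch sketch of this inference network is shown below, assuming the layer sizes stated above (input and output of size 30, intermediate width 512, five intermediate layers); all other details are illustrative.

```python
# Minimal sketch of the inference network I: R^30 -> R^30 described above.
import torch
import torch.nn as nn

def make_inference_net(nz=30, width=512, n_intermediate=5):
    layers = [nn.Linear(nz, width), nn.SELU()]
    for _ in range(n_intermediate):
        layers += [nn.Linear(width, width), nn.SELU()]
    layers += [nn.Linear(width, nz)]   # no non-linearity on the output layer
    return nn.Sequential(*layers)

I = make_inference_net()
w = torch.randn(64, 30)    # a batch of source latent vectors
z = I(w)                   # conditioned latent vectors, shape (64, 30)
# The conditional generator is the composition G(I(w)); as described above,
# I is trained with Adam, learning rate 1e-4, batch size 64.
```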
Appendix B: Conditioning settings
The conditioning settings are summarized in Table 4.
Appendix C: Mixture of Gaussians
The proposed method described in Section 3 can be used to train a general neural sampler. In this appendix, we perform a simple sanity check by assessing the method on a toy problem where we train neural networks to sample mixtures of Gaussians. Concretely, we train fully connected neural networks \(I_{\phi }\colon \mathbb {R}^{n_{w}}\to \mathbb {R}^{n_{z}}\) to sample simple 1D and 2D Gaussian mixtures, with nz = nw = 1 in the 1D case and nz = nw = 2 in the 2D case. The source distribution pw is the standard normal in both cases. Results are summarized in Fig. 17.
The first example (Fig. 17a) is a mixture of three 1D Gaussians, with centers μ1 = −1, μ2 = 2, and μ3 = 6, and standard deviations σ1 = 1, σ2 = 2, and σ3 = 0.5, respectively. The density of the Gaussian mixture is indicated along with a histogram for 1000 points generated by the neural network at an early stage of the training (100 iterations), and at convergence (1000 iterations). The second example (Fig. 17b) is a mixture of three 2D Gaussians, with centers μ1 = (−1, −1), μ2 = (1, 2), and μ3 = (2, −1), and covariances \({\Sigma }_{1}=\begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix}\), \({\Sigma }_{2}=\begin{pmatrix} 1.5 & 0.6 \\ 0.6 & 0.8 \end{pmatrix}\), and \({\Sigma }_{3}=\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\), respectively. We plot the contour lines of the density of the Gaussian mixture. We also show a scatter plot of 4000 points generated by the neural network at an early stage of the training (20 iterations), and at convergence (1000 iterations). In both test cases, we can verify that the neural network effectively learns to transport points from the standard normal distribution to the mixture of Gaussians.
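As an illustration of how such a neural sampler can be trained, the sketch below fits a small fully connected network to the 1D mixture of Fig. 17a by minimizing a Monte-Carlo estimate of the Kullback-Leibler divergence between the sampler and the target density, with the entropy term approximated by a nearest-neighbor estimate in the spirit of [46, 47]. The network size, optimizer settings, and estimator constants are illustrative and not those used in our experiments.

```python
# Sketch: train a neural sampler I_phi for the 1D Gaussian mixture of Fig. 17a by
# minimizing  -E_q[log p]  -  H[q]  (i.e., KL(q_phi || p) up to a constant),
# with H[q] approximated by a nearest-neighbor (Kozachenko-Leonenko-type) estimate.
import torch
import torch.nn as nn

centers = torch.tensor([-1.0, 2.0, 6.0])
scales  = torch.tensor([ 1.0, 2.0, 0.5])
weights = torch.tensor([ 1/3, 1/3, 1/3])

def log_mixture_density(x):
    # log p(x) for the three-component 1D mixture; x has shape (n, 1)
    comp = torch.distributions.Normal(centers, scales).log_prob(x)   # (n, 3)
    return torch.logsumexp(comp + weights.log(), dim=1)              # (n,)

I_phi = nn.Sequential(nn.Linear(1, 64), nn.SELU(),
                      nn.Linear(64, 64), nn.SELU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(I_phi.parameters(), lr=1e-3)

for it in range(1000):
    w = torch.randn(256, 1)                     # source: standard normal
    z = I_phi(w)                                # candidate samples of the mixture
    # nearest-neighbor entropy estimate (up to additive constants)
    dist = torch.cdist(z, z) + torch.eye(len(z)) * 1e9
    entropy = torch.log(dist.min(dim=1).values + 1e-12).mean()
    loss = -log_mixture_density(z).mean() - entropy
    opt.zero_grad(); loss.backward(); opt.step()
```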
Appendix D: Comparison with related work based on inpainting
In image processing, image inpainting is used to fill incomplete images or to replace a subregion of an image (e.g., a face with the eyes covered). The recent GAN-based inpainting technique employed in [20, 21] uses an optimization approach with the loss given in Eq. 10.
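Schematically, the loss of Eq. 10 follows the semantic inpainting formulation of [57]: it combines a contextual (data-mismatch) term on the observed values with a perceptual term supplied by the discriminator, roughly

\[ \hat{\mathbf{z}} = \underset{\mathbf{z}}{\arg\min}\; \big\| M \odot \big(G(\mathbf{z}) - \mathbf{y}_{\mathrm{obs}}\big) \big\| + \lambda \log\big(1 - D(G(\mathbf{z}))\big), \]

where M is a mask selecting the observed cells and λ is a weighting factor; the exact norm and weighting are as given in Eq. 10 and in [20, 21], and may differ from this sketch.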
The second term in this loss function is referred to as the perceptual loss and is the same as the second term of the GAN loss in Eq. 2, i.e., the classification score on synthetic realizations. Compare Eq. 10 with Eq. 6: while our Bayesian posterior uses a simple Gaussian prior, the prior in Eq. 10 (the perceptual loss) involves the discriminator D used during the GAN training. We argue that the Gaussian prior can be equally effective, as long as the GAN training has converged successfully: if G and D are at convergence, then G(z) always produces plausible realizations for z ∼ pz, where pz is the chosen latent distribution, and D is 1/2 for all realizations of G(z). In such a scenario, the perceptual loss should act as a regularization term that drives z towards regions of high density of the latent distribution pz, therefore having a similar effect to using pz as the prior.
For example, let us consider \(\mathbf {z}\sim \mathcal {U} [0,1]\) and \(\mathbf {y}\sim \mathcal {U} [1,3]\). An optimal generator would be G(z) = 2z + 1 and an optimal discriminator D(y) = 1/2 for y ∈ [1, 3] and D(y) = 0 otherwise. Then, D(G(z)) = 1/2 for z ∈ [0, 1] and D(G(z)) = 0 otherwise, which is precisely the density function of \(\mathbf {z}\sim \mathcal {U} [0,1]\) scaled by 1/2. In this example, therefore, the perceptual loss and pz as prior have the same effect. Nevertheless, in practice the perceptual loss can be very useful when G and D are not exactly optimal and G occasionally produces bad realizations. In that case, the perceptual loss can help the optimization find good solutions. In our work, we found the Gaussian prior to be sufficient while removing a layer of complexity from the optimization.
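The argument can be checked numerically; the short script below (with illustrative evaluation points) confirms that D(G(z)) equals the density of \(\mathcal {U}[0,1]\) scaled by 1/2.

```python
# Numerical check of the uniform example: with G(z) = 2z + 1 and D = 1/2 on [1, 3]
# (0 elsewhere), D(G(z)) reproduces the density of z ~ U[0, 1] scaled by 1/2.
import numpy as np

G = lambda z: 2 * z + 1
D = lambda y: np.where((y >= 1) & (y <= 3), 0.5, 0.0)

z = np.linspace(-0.5, 1.5, 9)   # illustrative evaluation points
print(D(G(z)))                  # 0.5 exactly where 0 <= z <= 1, and 0 outside
```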
Appendix E: Convolutional neural networks
We provide a very brief description by example of convolutional neural networks (see [69, 70] for further details or [71] for a more practical treatment). Let \(\mathbf {u} = (u_{1},u_{2},u_{3},u_{4})\) and \(\mathbf {a} = (a_{1},a_{2})\), and let us call a a filter. To convolve the filter a over u is to compute the output vector v with components \(v_{i} = u_{i}a_{1} + u_{i+1}a_{2}\) for i = 1, ⋯, 3. The operation is illustrated as a neural network layer in Fig. 18a. In this example, the convolution has a stride of 1 (the step at which the filter is swept), but in general the stride can be any positive integer.
We also show the matrix A associated with this operation—it is easy to verify that v = Au. We see that the associated matrix is sparse and diagonal-constant, which is the appeal of using convolutional layers. This structural constraint achieves two things: it drastically reduces the number of free weights, and it does so by assuming a locality prior. This locality prior turns out to be useful in practice, since nearby events in natural phenomena (natural images, speech, text, etc.) tend to be correlated.
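The equivalence can be verified numerically. The snippet below (with illustrative filter and input values) computes the convolution directly and through the matrix A, in the spirit of Fig. 18a.

```python
# Check that convolving a = (a1, a2) over u = (u1, ..., u4) with stride 1 equals
# multiplication by the sparse, diagonal-constant matrix A (illustrative values).
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0])
a = np.array([0.5, -1.0])

v_conv = np.array([u[i] * a[0] + u[i + 1] * a[1] for i in range(3)])

A = np.array([[a[0], a[1], 0.0, 0.0],
              [0.0, a[0], a[1], 0.0],
              [0.0, 0.0, a[0], a[1]]])
v_mat = A @ u

assert np.allclose(v_conv, v_mat)   # v = Au, as in Fig. 18a
```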
Compare the convolutional layer with the fully connected layer shown in Fig. 18b: in the fully connected case, the associated matrix is dense, resulting in 12 free weights, whereas the convolutional layer has only 2 for the same layer sizes. This difference is greatly amplified in practice, where inputs and outputs are large (e.g., images), making convolutional layers a much more efficient architecture. Note that in practice we use deep architectures, i.e., several stacks of convolutional layers; full connectivity can therefore be recovered if necessary, although now with an embedded locality prior along with a hierarchy in the influence of the weights.
Note that the example considered above would always result in a smaller output vector size. If the opposite effect is desired, a simple solution is to transpose the matrix A. For this reason, this operation is called a transposed convolution. Several stacks of transposed convolutions are typically used in generators and decoders to upsample the small latent vector to the full-size output image. In classifier neural networks, normal convolutions are used instead to downsample the large image to a single number indicating a probability.
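The following PyTorch sketch (with illustrative channel counts and sizes) shows the two directions: stacked transposed convolutions growing a latent vector into an image, and stacked strided convolutions reducing an image to a single score.

```python
# Shape check: transposed convolutions upsample a small latent tensor, while
# normal strided convolutions downsample an image to a single value.
import torch
import torch.nn as nn

z = torch.randn(1, 30, 1, 1)                               # latent vector as a 1x1 map
up = nn.Sequential(nn.ConvTranspose2d(30, 16, 4, 1, 0),    # -> 16 x 4 x 4
                   nn.ConvTranspose2d(16, 1, 4, 2, 1))     # -> 1 x 8 x 8
print(up(z).shape)                                         # torch.Size([1, 1, 8, 8])

x = torch.randn(1, 1, 8, 8)
down = nn.Sequential(nn.Conv2d(1, 16, 4, 2, 1),            # -> 16 x 4 x 4
                     nn.Conv2d(16, 1, 4, 1, 0))            # -> 1 x 1 x 1
print(down(x).shape)                                       # torch.Size([1, 1, 1, 1])
```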
Our brief description can be readily extended to 2D and 3D arrays with corresponding multidimensional filters. For example, for a 2D input the filters are of rectangular shape and can be swept horizontally and vertically. See [71] for further practical details.
Appendix F: Computational complexity
Let N denote the dataset size and d the dimension of each realization. Fast PCA methods based on singular value decomposition can achieve a complexity of \(\mathcal {O}(N^{2}d)\). This complexity is favorable in geology where N ≪ d, i.e., we have a small number of very large realizations (although it still grows quadratically with the number of realizations). For our present method, reporting the computational complexity is less straightforward since it is highly problem-dependent. To illustrate the difficulties, we discuss in the following the computational complexity of a classifier neural network—similar arguments apply to encoders, generators and decoders.
Computing the computational complexity of neural network models is cumbersome since it fully depends on the architecture, which in turn depends on the learning difficulty of the problem at hand. For example, in the simple case where the dataset is linearly separable, a classifier neural network of the form \(f(\mathbf {x}) = \sigma (\mathbf {w}^{T}\mathbf {x} + b)\), with w, b to be determined, is enough to correctly classify all points of the dataset. The evaluation cost of this neural network is simply \(\mathcal {O}(d)\), hence the training cost is \(\mathcal {O}(Td)\) (when using stochastic gradient descent, as is normally done), where T is the number of update iterations. Note that this expression does not depend on N, although in practice T is at most linear in N, e.g., when performing multiple passes through the dataset until convergence; training can, however, also converge before a single pass through the dataset (which happens on massive datasets). Hence, neural networks are very favorable in the big data setting, i.e., when N is very large.
The estimated evaluation cost of \(\mathcal {O}(d)\) is overly optimistic since in practice we use deep architectures to deal with complex datasets that are not linearly separable. If the architecture is instead \(f(\mathbf {x}) = \sigma (A_{l}(\cdots \sigma (A_{2}(\sigma (A_{1}\mathbf {x} + b_{1}) + b_{2}))\cdots ) + b_{l})\), where each \(A_{i}\) is a d × d matrix, then the evaluation cost of this architecture is roughly \(\mathcal {O}(d^{2})\) (we omit the number of layers l since it is a constant factor and l ≪ d). However, this estimate is now overly pessimistic: first, in practice the \(A_{i}\) are not shape-preserving; instead, they decrease very quickly in size while exponentially compressing the input (e.g., \(A_{1}\) is of size \(d\times \frac{d}{2}\), \(A_{2}\) is of size \(\frac{d}{2}\times \frac{d}{4}\)). Second, the matrices \(A_{i}\) are rarely full since convolutional layers are used instead (see Appendix E), resulting in very sparse matrices that are several orders of magnitude lighter. Modern architectures use several stacks of exponentially decreasing convolutional layers, while fully connected layers are avoided or used only sparingly (and only for small inputs/outputs). The overall effect is a drastic reduction in the computational complexity, from \(\mathcal {O}(d^{2})\) to \(\mathcal {O}(kd)\), where k is a factor determined by the architecture. The corresponding training complexity is then \(\mathcal {O}(Tkd)\). Although k < d in practice, k can still be sizable; on the other hand, k = 1 is also possible, as in the linear classifier above. Ultimately, k depends on the learning difficulty of the problem. In most models encountered in the literature, k grows sublinearly with d.
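The following snippet illustrates the difference in scale for d = 64 × 64 = 4096 (the layer widths and channel counts are illustrative, not those of our models): a stack of shape-preserving dense layers carries on the order of d² weights, whereas a pyramid of strided convolutional layers carries far fewer.

```python
# Illustrative parameter counts for d = 64*64 = 4096: shape-preserving dense
# layers scale like d^2, a pyramid of strided convolutions stays closer to k*d.
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

dense = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(),
                      nn.Linear(4096, 4096), nn.ReLU(),
                      nn.Linear(4096, 1))
conv = nn.Sequential(nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
                     nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
                     nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 16x16 -> 8x8
                     nn.Flatten(), nn.Linear(128 * 8 * 8, 1))
print(n_params(dense))   # ~33.6 million weights (order d^2)
print(n_params(conv))    # ~0.17 million weights (order k*d)
```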
Perhaps more important than the computational time is the human time involved in optimizing the dozens of hyperparameters, in particular the architecture design, for which automation is currently limited. As mentioned before, designing the architecture relies heavily on experience, heuristics, and experimentation, which incur high costs in terms of engineering time. The justification of such costs ultimately depends on the lifespan of the model, since the model needs to be constructed only once but can be deployed for a long time (e.g., history matching) or virtually indefinitely (e.g., most applications in internet companies, such as recommender systems, visual and voice recognition, and language translation). Automatic architecture search is an ongoing area of research (see, e.g., [72, 73] and references therein).
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Chan, S., Elsheikh, A.H. Parametric generation of conditional geological realizations using generative neural networks. Comput Geosci 23, 925–952 (2019). https://doi.org/10.1007/s10596-019-09850-7