Abstract
Deep learning techniques are increasingly being considered for geological applications where, much like in computer vision, the challenges are characterized by high-dimensional spatial data dominated by multipoint statistics. In particular, a novel technique called generative adversarial networks (GANs) has recently been studied for geological parametrization and synthesis, obtaining results that are at least qualitatively competitive with previous methods. The method obtains a neural network parametrization of the geology (the so-called generator) that is capable of reproducing very complex geological patterns while reducing the dimensionality by several orders of magnitude. Subsequent works have addressed the conditioning task, i.e., using the generator to produce realizations that honor spatial observations (hard data). Current approaches, however, do not provide a parametrization of the conditional generation process. In this work, we propose a method to obtain a parametrization for the direct generation of conditional realizations. The main idea is to extend the existing generator network by stacking a second inference network that learns to perform the conditioning. This inference network is a neural network trained to sample a posterior distribution derived from a Bayesian formulation of the conditioning task. The resulting extended neural network thus provides the conditional parametrization. We assess our method on a benchmark image of a binary channelized subsurface, obtaining very promising results for a wide variety of conditioning configurations.
References
Jacquard, P.: Permeability distribution from field pressure data. Soc. Pet. Eng. https://doi.org/10.2118/1307-PA (1965)
Jahns, H. O.: A rapid method for obtaining a two-dimensional reservoir description from well pressure response data. Soc. Pet. Eng. https://doi.org/10.2118/1473-PA (1966)
Sarma, P, Durlofsky, LJ, Aziz, K: Kernel principal component analysis for efficient, differentiable parameterization of multipoint geostatistics. Math. Geosci. 40(1), 3–32 (2008)
Ma, X, Zabaras, N: Kernel principal component analysis for stochastic input model generation. J. Comput. Phys. 230(19), 7311–7331 (2011)
Vo, HX, Durlofsky, LJ: Regularized kernel PCA for the efficient parameterization of complex geological models. J. Comput. Phys. 322, 859–881 (2016)
Shirangi, MG, Emerick, AA: An improved TSVD-based Levenberg–Marquardt algorithm for history matching and comparison with Gauss–Newton. J. Pet. Sci. Eng. 143, 258–271 (2016)
Tavakoli, R, Reynolds, AC: Monte Carlo simulation of permeability fields and reservoir performance predictions with SVD parameterization in RML compared with EnKF. Comput. Geosci. 15(1), 99–116 (2011)
Jafarpour, B., McLaughlin, D. B.: Reservoir characterization with the discrete cosine transform. Soc. Petrol. Eng. https://doi.org/10.2118/106453-PA (2009)
Jafarpour, B, Goyal, VK, McLaughlin, DB, Freeman, WT: Compressed history matching: exploiting transform-domain sparsity for regularization of nonlinear dynamic data integration problems. Math. Geosci. 42(1), 1–27 (2010). ISSN 1874-8953. https://doi.org/10.1007/s11004-009-9247-z
Moreno, D., Aanonsen, S. I.: Stochastic facies modelling using the level set method. In: EAGE Conference on Petroleum Geostatistics (2007)
Dorn, O, Villegas, R: History matching of petroleum reservoirs using a level set technique. Inverse Prob. 24(3), 035015 (2008). http://stacks.iop.org/0266-5611/24/i=3/a=035015
Chang, H, Zhang, D, Lu, Z: History matching of facies distribution with the EnKF and level set parameterization. J. Comput. Phys. 229(20), 8011–8030 (2010). ISSN 0021-9991. https://doi.org/10.1016/j.jcp.2010.07.005. http://www.sciencedirect.com/science/article/pii/S0021999110003748
Khaninezhad, MM, Jafarpour, B, Li, L: Sparse geologic dictionaries for subsurface flow model calibration: part i. Inversion formulation. Adv. Water Resour. 39, 106–121 (2012)
Khaninezhad, MM, Jafarpour, B, Li, L: Sparse geologic dictionaries for subsurface flow model calibration: part ii. Robustness to uncertainty. Adv. Water Resour. 39, 122–136 (2012)
Goodfellow, I, Pouget-Abadie, J, Mirza, M, Xu, B, Warde-Farley, D, Ozair, S, Courville, A, Bengio, Y: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp 2672–2680 (2014)
Mosser, L, Dubrule, O, Blunt, MJ: Reconstruction of three-dimensional porous media using generative adversarial neural networks. arXiv:1704.03225 (2017)
Mosser, L, Dubrule, O, Blunt, MJ: Stochastic reconstruction of an oolitic limestone by generative adversarial networks. arXiv:1712.02854 (2017)
Chan, S, Elsheikh, AH: Parametrization and generation of geological models with generative adversarial networks. arXiv:1708.01810 (2017)
Laloy, E, Hérault, R, Jacques, D, Linde, N: Training-image based geostatistical inversion using a spatial generative adversarial neural network. Water Resour. Res. 54(1), 381–406 (2018)
Dupont, E, Zhang, T, Tilke, P, Liang, L, Bailey, W: Generating realistic geology conditioned on physical measurements with generative adversarial networks. arXiv:1802.03065 (2018)
Mosser, L, Dubrule, O, Blunt, MJ: Conditioning of three-dimensional generative adversarial networks for pore and reservoir-scale models. arXiv:1802.05622 (2018)
Chan, S, Elsheikh, AH: Parametrization of stochastic inputs using generative adversarial networks with application in geology. arXiv:1904.03677 (2019)
Marçais, J, de Dreuzy, J-R: Prospective interest of deep learning for hydrological inference. Groundwater 55(5), 688–692 (2017)
Nagoor Kani, J, Elsheikh, AH: DR-RNN: a deep residual recurrent neural network for model reduction. arXiv:1709.00939 (2017)
Klie, H, et al.: Physics-based and data-driven surrogates for production forecasting. In: SPE Reservoir Simulation Symposium. Society of Petroleum Engineers (2015)
Stanev, VG, Iliev, FL, Hansen, S, Vesselinov, VV, Alexandrov, BS: Identification of release sources in advection–diffusion system by machine learning combined with Green’s function inverse method. Appl. Math. Model. 60, 64–76 (2018)
Sun, W, Durlofsky, LJ: A new data-space inversion procedure for efficient uncertainty quantification in subsurface flow problems. Math. Geosci. 49(6), 679–715 (2017)
Zhu, Y, Zabaras, N: Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification. J. Comput. Phys. 366, 415–447 (2018)
Valera, M, Guo, Z, Kelly, P, Matz, S, Cantu, A, Percus, AG, Hyman, JD, Srinivasan, G, Viswanathan, HS: Machine learning for graph-based representations of three-dimensional discrete fracture networks. arXiv:1705.09866 (2017)
Strebelle, SB, Journel, AG: Reservoir modeling using multiple-point statistics. In: SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers (2001)
Brock, A, Donahue, J, Simonyan, K: Large scale GAN training for high fidelity natural image synthesis. arXiv:1809.11096 (2018)
Karras, T, Aila, T, Laine, S, Lehtinen, J: Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196 (2017)
Schmidhuber, J: Learning factorial codes by predictability minimization. Neural Comput. 4(6), 863–879 (1992)
Radford, A, Metz, L, Chintala, S: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434 (2015)
Salimans, T, Goodfellow, I, Zaremba, W, Cheung, V, Radford, A, Chen, X: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems, pp 2234–2242 (2016)
Arjovsky, M, Bottou, L: Towards principled methods for training generative adversarial networks. arXiv:1701.04862 (2017)
Arora, S, Ge, R, Liang, Y, Ma, T, Zhang, Y: Generalization and equilibrium in generative adversarial nets (GANs). arXiv:1703.00573 (2017)
Müller, A: Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29(2), 429–443 (1997)
Gretton, A, Borgwardt, KM, Rasch, M, Schölkopf, B, Smola, AJ: A kernel method for the two-sample-problem. In: Advances in Neural Information Processing Systems, pp 513–520 (2007)
Dziugaite, GK, Roy, DM, Ghahramani, Z: Training generative neural networks via maximum mean discrepancy optimization. arXiv:1505.03906 (2015)
Arjovsky, M, Chintala, S, Bottou, L: Wasserstein GAN. arXiv:1701.07875 (2017)
Gulrajani, I, Ahmed, F, Arjovsky, M, Dumoulin, V, Courville, AC: Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems, pp 5769–5779 (2017)
Mroueh, Y, Sercu, T: Fisher GAN. In: Advances in Neural Information Processing Systems, pp 2510–2520 (2017)
Mroueh, Y, Li, C-L, Sercu, T, Raj, A, Cheng, Y: Sobolev GAN. arXiv:1711.04894 (2017)
Mroueh, Y, Sercu, T, Goel, V: McGan: mean and covariance feature matching GAN. arXiv:1702.08398 (2017)
Kozachenko, LF, Leonenko, NN: Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii 23(2), 9–16 (1987)
Goria, MN, Leonenko, NN, Mergel, VV, Inverardi, PLN: A new class of random vector entropy estimators and its applications in testing statistical hypotheses. J. Nonparametr. Stat. 17(3), 277–297 (2005)
Kingma, D, Ba, J: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)
Tieleman, T, Hinton, G: Lecture 6.5-RMSprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4(2). https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf (2012)
Paszke, A, Gross, S, Chintala, S, Chanan, G, Yang, E, DeVito, Z, Lin, Z, Desmaison, A, Antiga, L, Lerer, A: Automatic differentiation in PyTorch. NIPS Autodiff Workshop (2017)
Strebelle, S: Conditional simulation of complex geological structures using multiple-point statistics. Math. Geol. 34(1), 1–21 (2002)
Remy, N, Boucher, A, Wu, J: Sgems: Stanford geostatistical modeling software. Software Manual (2004)
Tan, X, Tahmasebi, P, Caers, J: Comparing training-image based algorithms using an analysis of distance. Math. Geosci. 46(2), 149–169 (2014)
Borg, I, Groenen, P: Modern multidimensional scaling: theory and applications. J. Educ. Meas. 40(3), 277–280 (2003)
Otsu, N: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Klambauer, G, Unterthiner, T, Mayr, A, Hochreiter, S: Self-normalizing neural networks. In: Advances in Neural Information Processing Systems, pp 971–980 (2017)
Yeh, R, Chen, C, Lim, TY, Hasegawa-Johnson, M, Do, MN: Semantic image inpainting with perceptual and contextual losses. arXiv:1607.07539 (2016)
Ulyanov, D, Vedaldi, A, Lempitsky, V: Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis. In: Proceedings of CVPR (2017)
Li, Y, Fang, C, Yang, J, Wang, Z, Lu, X, Yang, M-H: Diversified texture synthesis with feed-forward networks. In: Proceedings of CVPR (2017)
Kim, T, Bengio, Y: Deep directed generative models with energy-based probability estimation. arXiv:1606.03439 (2016)
Ioffe, S, Szegedy, C: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 (2015)
Rezende, DJ, Mohamed, S: Variational inference with normalizing flows. arXiv:1505.05770 (2015)
Kingma, DP, Salimans, T, Jozefowicz, R, Chen, X, Sutskever, I, Welling, M: Improved variational inference with inverse autoregressive flow. In: Advances in Neural Information Processing Systems, pp 4743–4751 (2016)
Wang, D, Liu, Q: Learning to draw samples: with application to amortized mle for generative adversarial learning. arXiv:1611.01722 (2016)
Nguyen, A, Yosinski, J, Bengio, Y, Dosovitskiy, A, Clune, J: Plug & play generative networks: conditional iterative generation of images in latent space. arXiv:1612.00005 (2016)
Engel, J, Hoffman, M, Roberts, A: Latent constraints: learning to generate conditionally from unconditional generative models. arXiv:1711.05772 (2017)
Bengio, Y: Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade, pp 437–478. Springer (2012)
Reddi, SJ, Kale, S, Kumar, S: On the convergence of Adam and beyond. International Conference on Learning Representations (2018)
Fukushima, K, Miyake, S: Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. In: Competition and Cooperation in Neural Nets, pp 267–285. Springer (1982)
LeCun, Y, Boser, B, Denker, JS, Henderson, D, Howard, RE, Hubbard, W, Jackel, LD: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
Dumoulin, V, Visin, F: A guide to convolution arithmetic for deep learning. arXiv:1603.07285 (2016)
Shahriari, B, Swersky, K, Wang, Z, Adams, RP, De Freitas, N: Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104(1), 148–175 (2016)
Zoph, B, Le, QV: Neural architecture search with reinforcement learning. arXiv:1611.01578 (2016)
Appendices
Appendix A: Implementation details
This appendix describes the training procedures and hyperparameters of the neural network models. See [67] for a practical guide on training neural networks.
A.1 Generator neural network
The generator \(G\colon \mathbb {R}^{30}\to \mathbb {R}^{64\times 64}\) is a deep convolutional neural network based on the template provided in [34]. The generator architecture consists of stacks of (transposed) convolutional layers (see Appendix E) together with batch normalization layers [61]. Batch normalization normalizes the intermediate layer outputs to have zero mean and unit variance, which drastically improves the optimization of deep neural networks [61]. For the non-linearity, we use rectified linear units (ReLU, σ(x) = max(0, x)) in the intermediate layers, and σ(x) = tanh(x) in the last layer to constrain the output to [−1, 1]. The architecture is summarized in Table 1a. We train G using the Wasserstein formulation of GAN introduced in [41] with the proposed default hyperparameters. The optimization is performed using the Adam method [48, 68] with a learning rate of \(10^{-4}\) and a batch size of 32. Our generator converges in approximately 20,000 iterations, taking around 30 minutes on an Nvidia GeForce GTX Titan X GPU. For deployment, it can generate approximately 5500 realizations per second using the GPU.
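For concreteness, a minimal PyTorch sketch of such a generator is given below. The channel widths, kernel sizes, and strides are illustrative assumptions following the DCGAN template of [34]; the exact architecture is the one listed in Table 1a.

```python
# Minimal sketch of a DCGAN-style generator G: R^30 -> R^(64x64).
# Channel widths below are illustrative, not necessarily those of Table 1a.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, nz=30, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            # latent vector treated as an (nz x 1 x 1) feature map
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8), nn.ReLU(inplace=True),        # 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4), nn.ReLU(inplace=True),        # 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2), nn.ReLU(inplace=True),        # 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf), nn.ReLU(inplace=True),            # 32 x 32
            nn.ConvTranspose2d(ngf, 1, 4, 2, 1, bias=False),
            nn.Tanh(),                                             # 64 x 64, values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

G = Generator()
z = torch.randn(32, 30)    # a batch of latent vectors
x = G(z)                   # shape: (32, 1, 64, 64)
```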
A.2 Inference neural network
We use the same inference network architecture \(I\colon \mathbb {R}^{30}\to \mathbb {R}^{30}\) for all our conditioning experiments. The architecture is simply a stack of fully connected layers with constant-size intermediate layers. More specifically, we first transform the input from size 30 to size 512, then apply several more intermediate transformations preserving the size, and finally apply a transformation to bring the size back from 512 to 30 in the output layer. For the non-linearity, we use scaled exponential linear units (SELU) [56], which are the current default option for deep fully connected networks: σ(x) = λx if x > 0, otherwise σ(x) = λα(e^x − 1), where the constants λ, α are given in [56]. No non-linearity is applied in the output layer (we do not need to bound the output as in the case of the generator). We experimented with different numbers of layers and, perhaps not surprisingly, found that deeper architectures tended to produce better results in general. In our work, we settled on 5 intermediate layers. The architecture is summarized in Table 1b. We optimize I using the Adam method with a learning rate of \(10^{-4}\) and a batch size of 64 for all the test cases. The network converges in between 1000 and 10,000 iterations, depending on the conditioning, taking between seconds and a few minutes to train on an Nvidia GeForce GTX Titan X GPU. For deployment, the conditional generator G ∘ I can generate approximately 5500 realizations per second using the GPU; we do not observe a significant increase in generation time from G to G ∘ I.
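A minimal PyTorch sketch of this inference network is shown below, assuming the layer sizes stated above (input and output of size 30, intermediate width 512, five intermediate layers); all other details are illustrative.

```python
# Minimal sketch of the inference network I: R^30 -> R^30 described above.
import torch
import torch.nn as nn

def make_inference_net(nz=30, width=512, n_intermediate=5):
    layers = [nn.Linear(nz, width), nn.SELU()]
    for _ in range(n_intermediate):
        layers += [nn.Linear(width, width), nn.SELU()]
    layers += [nn.Linear(width, nz)]   # no non-linearity on the output layer
    return nn.Sequential(*layers)

I = make_inference_net()
w = torch.randn(64, 30)    # a batch of source latent vectors
z = I(w)                   # conditioned latent vectors, shape (64, 30)
# The conditional generator is the composition G(I(w)); as described above,
# I is trained with Adam, learning rate 1e-4, batch size 64.
```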
Appendix B: Conditioning settings
The conditioning settings are summarized in Table 4.
Appendix C: Mixture of Gaussians
The proposed method described in Section 3 can be used to train a general neural sampler. In this appendix, we perform a simple sanity check by assessing the method on a toy problem where we train neural networks to sample mixtures of Gaussians. Concretely, we train fully connected neural networks \(I_{\phi }\colon \mathbb {R}^{n_{w}}\to \mathbb {R}^{n_{z}}\) to sample simple 1D and 2D Gaussian mixtures, with nz = nw = 1 in the 1D case and nz = nw = 2 in the 2D case. The source distribution pw is the standard normal in both cases. Results are summarized in Fig. 17.
The first example (Fig. 17a) is a mixture of three 1D Gaussians, with centers μ1 = −1, μ2 = 2, and μ3 = 6, and standard deviations σ1 = 1, σ2 = 2, and σ3 = 0.5, respectively. The density of the Gaussian mixture is indicated along with a histogram for 1000 points generated by the neural network at an early stage of the training (100 iterations), and at convergence (1000 iterations). The second example (Fig. 17b) is a mixture of three 2D Gaussians, with centers μ1 = (−1, −1), μ2 = (1, 2), and μ3 = (2, −1), and covariances \({\Sigma }_{1}=\begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix}\), \({\Sigma }_{2}=\begin{pmatrix} 1.5 & 0.6 \\ 0.6 & 0.8 \end{pmatrix}\), and \({\Sigma }_{3}=\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\), respectively. We plot the contour lines of the density of the Gaussian mixture. We also show a scatter plot of 4000 points generated by the neural network at an early stage of the training (20 iterations), and at convergence (1000 iterations). In both test cases, we can verify that the neural network effectively learns to transport points from the standard normal distribution to the mixture of Gaussians.
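As an illustration of how such a neural sampler can be trained, the sketch below fits a small fully connected network to the 1D mixture of Fig. 17a by minimizing a Monte-Carlo estimate of the Kullback-Leibler divergence between the sampler and the target density, with the entropy term approximated by a nearest-neighbor estimate in the spirit of [46, 47]. The network size, optimizer settings, and estimator constants are illustrative and not those used in our experiments.

```python
# Sketch: train a neural sampler I_phi for the 1D Gaussian mixture of Fig. 17a by
# minimizing  -E_q[log p]  -  H[q]  (i.e., KL(q_phi || p) up to a constant),
# with H[q] approximated by a nearest-neighbor (Kozachenko-Leonenko-type) estimate.
import torch
import torch.nn as nn

centers = torch.tensor([-1.0, 2.0, 6.0])
scales  = torch.tensor([ 1.0, 2.0, 0.5])
weights = torch.tensor([ 1/3, 1/3, 1/3])

def log_mixture_density(x):
    # log p(x) for the three-component 1D mixture; x has shape (n, 1)
    comp = torch.distributions.Normal(centers, scales).log_prob(x)   # (n, 3)
    return torch.logsumexp(comp + weights.log(), dim=1)              # (n,)

I_phi = nn.Sequential(nn.Linear(1, 64), nn.SELU(),
                      nn.Linear(64, 64), nn.SELU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(I_phi.parameters(), lr=1e-3)

for it in range(1000):
    w = torch.randn(256, 1)                     # source: standard normal
    z = I_phi(w)                                # candidate samples of the mixture
    # nearest-neighbor entropy estimate (up to additive constants)
    dist = torch.cdist(z, z) + torch.eye(len(z)) * 1e9
    entropy = torch.log(dist.min(dim=1).values + 1e-12).mean()
    loss = -log_mixture_density(z).mean() - entropy
    opt.zero_grad(); loss.backward(); opt.step()
```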
Appendix D: Comparison with related work based on inpainting
In image processing, image inpainting is used to fill incomplete images or to replace a subregion of an image (e.g., a face with the eyes covered). The recent GAN-based inpainting technique employed in [20, 21] uses an optimization approach with the loss given in Eq. 10.
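Schematically, the loss of Eq. 10 follows the semantic inpainting formulation of [57]: it combines a contextual (data-mismatch) term on the observed values with a perceptual term supplied by the discriminator, roughly

\[ \hat{\mathbf{z}} = \underset{\mathbf{z}}{\arg\min}\; \big\| M \odot \big(G(\mathbf{z}) - \mathbf{y}_{\mathrm{obs}}\big) \big\| + \lambda \log\big(1 - D(G(\mathbf{z}))\big), \]

where M is a mask selecting the observed cells and λ is a weighting factor; the exact norm and weighting are as given in Eq. 10 and in [20, 21], and may differ from this sketch.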
The second term in this loss function is referred to as the perceptual loss and is the same as the second term of the GAN loss in Eq. 2, i.e., the classification score on synthetic realizations. Compare Eq. 10 with Eq. 6: while our Bayesian posterior uses a simple Gaussian prior, the prior in Eq. 10 (the perceptual loss) involves the discriminator D used during the GAN training. We argue that the Gaussian prior can be equally effective, as long as the GAN training has converged successfully: if G and D are at convergence, then G(z) always produces plausible realizations for z ∼ pz, where pz is the chosen latent distribution, and D is 1/2 for all realizations of G(z). In such a scenario, the perceptual loss should act as a regularization term that drives z towards regions of high density of the latent distribution pz, therefore having a similar effect to using pz as the prior.
For example, let us consider \(\mathbf {z}\sim \mathcal {U} [0,1]\) and \(\mathbf {y}\sim \mathcal {U} [1,3]\). An optimal generator would be G(z) = 2z + 1 and an optimal discriminator D(y) = 1/2 for y ∈ [1, 3] and D(y) = 0 otherwise. Then, D(G(z)) = 1/2 for z ∈ [0, 1] and D(G(z)) = 0 otherwise, which is precisely the density function of \(\mathbf {z}\sim \mathcal {U} [0,1]\) scaled by 1/2. In this example, therefore, the perceptual loss and pz as prior have the same effect. Nevertheless, in practice the perceptual loss can be very useful when G and D are not exactly optimal and G occasionally produces bad realizations. In that case, the perceptual loss can help the optimization find good solutions. In our work, we found the Gaussian prior to be sufficient while removing a layer of complexity from the optimization.
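The argument can be checked numerically; the short script below (with illustrative evaluation points) confirms that D(G(z)) equals the density of \(\mathcal {U}[0,1]\) scaled by 1/2.

```python
# Numerical check of the uniform example: with G(z) = 2z + 1 and D = 1/2 on [1, 3]
# (0 elsewhere), D(G(z)) reproduces the density of z ~ U[0, 1] scaled by 1/2.
import numpy as np

G = lambda z: 2 * z + 1
D = lambda y: np.where((y >= 1) & (y <= 3), 0.5, 0.0)

z = np.linspace(-0.5, 1.5, 9)   # illustrative evaluation points
print(D(G(z)))                  # 0.5 exactly where 0 <= z <= 1, and 0 outside
```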
Appendix E: Convolutional neural networks
We provide a very brief description by example of convolutional neural networks (see [69, 70] for further details or [71] for a more practical treatment). Let \(\mathbf {u} = (u_{1},u_{2},u_{3},u_{4})\) and \(\mathbf {a} = (a_{1},a_{2})\), and let us call a a filter. To convolve the filter a over u is to compute the output vector v with components \(v_{i} = u_{i}a_{1} + u_{i+1}a_{2}\) for i = 1, ⋯, 3. The operation is illustrated as a neural network layer in Fig. 18a. In this example, the convolution has a stride of 1 (the step at which the filter is swept), but in general the stride can be any positive integer.
We also show the matrix A associated with this operation—it is easy to verify that v = Au. We see that the associated matrix is sparse and diagonal-constant, which is the appeal of using convolutional layers. This structural constraint achieves two things: it drastically reduces the number of free weights, and it does so by assuming a locality prior. This locality prior turns out to be useful in practice, since nearby events in natural phenomena (natural images, speech, text, etc.) tend to be correlated.
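The equivalence can be verified numerically. The snippet below (with illustrative filter and input values) computes the convolution directly and through the matrix A, in the spirit of Fig. 18a.

```python
# Check that convolving a = (a1, a2) over u = (u1, ..., u4) with stride 1 equals
# multiplication by the sparse, diagonal-constant matrix A (illustrative values).
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0])
a = np.array([0.5, -1.0])

v_conv = np.array([u[i] * a[0] + u[i + 1] * a[1] for i in range(3)])

A = np.array([[a[0], a[1], 0.0, 0.0],
              [0.0, a[0], a[1], 0.0],
              [0.0, 0.0, a[0], a[1]]])
v_mat = A @ u

assert np.allclose(v_conv, v_mat)   # v = Au, as in Fig. 18a
```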
Compare the convolutional layer with the fully connected layer shown in Fig. 18b: in the fully connected case, the associated matrix is dense, resulting in 12 free weights, whereas the convolutional layer has only 2 for the same layer sizes. This difference is greatly amplified in practice, where inputs and outputs are large (e.g., images), making convolutional layers a much more efficient architecture. Note that in practice we use deep architectures, i.e., several stacks of convolutional layers; full connectivity can therefore be recovered if necessary, although now with an embedded locality prior along with a hierarchy in the influence of the weights.
Note that the example considered above would always result in a smaller output vector size. If the opposite effect is desired, a simple solution is to transpose the matrix A. For this reason, this operation is called a transposed convolution. Several stacks of transposed convolutions are typically used in generators and decoders to upsample the small latent vector to the full-size output image. In classifier neural networks, normal convolutions are used instead to downsample the large image to a single number indicating a probability.
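The following PyTorch sketch (with illustrative channel counts and sizes) shows the two directions: stacked transposed convolutions growing a latent vector into an image, and stacked strided convolutions reducing an image to a single score.

```python
# Shape check: transposed convolutions upsample a small latent tensor, while
# normal strided convolutions downsample an image to a single value.
import torch
import torch.nn as nn

z = torch.randn(1, 30, 1, 1)                               # latent vector as a 1x1 map
up = nn.Sequential(nn.ConvTranspose2d(30, 16, 4, 1, 0),    # -> 16 x 4 x 4
                   nn.ConvTranspose2d(16, 1, 4, 2, 1))     # -> 1 x 8 x 8
print(up(z).shape)                                         # torch.Size([1, 1, 8, 8])

x = torch.randn(1, 1, 8, 8)
down = nn.Sequential(nn.Conv2d(1, 16, 4, 2, 1),            # -> 16 x 4 x 4
                     nn.Conv2d(16, 1, 4, 1, 0))            # -> 1 x 1 x 1
print(down(x).shape)                                       # torch.Size([1, 1, 1, 1])
```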
Our brief description can be readily extended to 2D and 3D arrays with corresponding multidimensional filters. For example, for a 2D input the filters are of rectangular shape and can be swept horizontally and vertically. See [71] for further practical details.
Appendix F: Computational complexity
Let N denote the dataset size and d the dimension of each realization. Fast PCA methods based on singular value decomposition can achieve a complexity of \(\mathcal {O}(N^{2}d)\). This complexity is favorable in geology where N ≪ d, i.e., we have a small number of very large realizations (although it still grows quadratically with the number of realizations). For our present method, reporting the computational complexity is less straightforward since it is highly problem-dependent. To illustrate the difficulties, we discuss in the following the computational complexity of a classifier neural network—similar arguments apply to encoders, generators and decoders.
Computing the computational complexity of neural network models is cumbersome since it fully depends on the architecture, which in turn depends on the learning difficulty of the problem at hand. For example, in the simple case where the dataset is linearly separable, a classifier neural network of the form \(f(\mathbf {x}) = \sigma (\mathbf {w}^{T}\mathbf {x} + b)\), with w, b to be determined, is enough to correctly classify all points of the dataset. The evaluation cost of this neural network is simply \(\mathcal {O}(d)\), hence the training cost is \(\mathcal {O}(Td)\) (when using stochastic gradient descent, as is normally done), where T is the number of update iterations. Note that this expression does not depend on N, although in practice T is at most linear in N, e.g., when performing multiple passes through the dataset until convergence; training can, however, also converge before a single pass through the dataset (which happens on massive datasets). Hence, neural networks are very favorable in the big data setting, i.e., when N is very large.
The estimated evaluation cost of \(\mathcal {O}(d)\) is overly optimistic since in practice we use deep architectures to deal with complex datasets that are not linearly separable. If the architecture is instead \(f(\mathbf {x}) = \sigma (A_{l}(\cdots \sigma (A_{2}(\sigma (A_{1}\mathbf {x} + b_{1}) + b_{2}))\cdots ) + b_{l})\), where each \(A_{i}\) is a d × d matrix, then the evaluation cost of this architecture is roughly \(\mathcal {O}(d^{2})\) (we omit the number of layers l since it is a constant factor and l ≪ d). However, this estimate is now overly pessimistic: first, in practice the \(A_{i}\) are not shape-preserving; instead, they decrease very quickly in size while exponentially compressing the input (e.g., \(A_{1}\) is of size \(d\times \frac{d}{2}\), \(A_{2}\) is of size \(\frac{d}{2}\times \frac{d}{4}\)). Second, the matrices \(A_{i}\) are rarely full since convolutional layers are used instead (see Appendix E), resulting in very sparse matrices that are several orders of magnitude lighter. Modern architectures use several stacks of exponentially decreasing convolutional layers, while fully connected layers are avoided or used only sparingly (and only for small inputs/outputs). The overall effect is a drastic reduction in the computational complexity, from \(\mathcal {O}(d^{2})\) to \(\mathcal {O}(kd)\), where k is a factor determined by the architecture. The corresponding training complexity is then \(\mathcal {O}(Tkd)\). Although k < d in practice, k can still be sizable; on the other hand, k = 1 is also possible, as in the linear classifier above. Ultimately, k depends on the learning difficulty of the problem. In most models encountered in the literature, k grows sublinearly with d.
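The following snippet illustrates the difference in scale for d = 64 × 64 = 4096 (the layer widths and channel counts are illustrative, not those of our models): a stack of shape-preserving dense layers carries on the order of d² weights, whereas a pyramid of strided convolutional layers carries far fewer.

```python
# Illustrative parameter counts for d = 64*64 = 4096: shape-preserving dense
# layers scale like d^2, a pyramid of strided convolutions stays closer to k*d.
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

dense = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(),
                      nn.Linear(4096, 4096), nn.ReLU(),
                      nn.Linear(4096, 1))
conv = nn.Sequential(nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
                     nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
                     nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 16x16 -> 8x8
                     nn.Flatten(), nn.Linear(128 * 8 * 8, 1))
print(n_params(dense))   # ~33.6 million weights (order d^2)
print(n_params(conv))    # ~0.17 million weights (order k*d)
```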
Perhaps more important than the computational time is the human time involved in optimizing the dozens of hyperparameters, in particular the architecture design, for which automation is currently limited. As mentioned before, designing the architecture relies heavily on experience, heuristics, and experimentation, which incur high costs in terms of engineering time. The justification of such costs ultimately depends on the lifespan of the model, since the model needs to be constructed only once but can be deployed for a long time (e.g., history matching) or virtually indefinitely (e.g., most applications in internet companies, such as recommender systems, visual and voice recognition, and language translation). Automatic architecture search is an ongoing area of research (see, e.g., [72, 73] and references therein).
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Chan, S., Elsheikh, A.H. Parametric generation of conditional geological realizations using generative neural networks. Comput Geosci 23, 925–952 (2019). https://doi.org/10.1007/s10596-019-09850-7