Stochastic seismic waveform inversion using generative adversarial networks as a geological prior

We present an application of deep generative models in the context of partial-differential equation (PDE) constrained inverse problems. We combine a generative adversarial network (GAN) representing an a priori model that creates subsurface geological structures and their petrophysical properties, with the numerical solution of the PDE governing the propagation of acoustic waves within the earth's interior. We perform Bayesian inversion using an approximate Metropolis-adjusted Langevin algorithm (MALA) to sample from the posterior given seismic observations. Gradients with respect to the model parameters governing the forward problem are obtained by solving the adjoint of the acoustic wave equation. Gradients of the mismatch with respect to the latent variables are obtained by leveraging the differentiable nature of the deep neural network used to represent the generative model. We show that approximate MALA sampling allows efficient Bayesian inversion of model parameters obtained from a prior represented by a deep generative model, obtaining a diverse set of realizations that reflect the observed seismic response.


Introduction
Solving an inverse problem means finding a set of model parameters that best fit observed data (Tarantola, 1984;2005). The observed data or measurements are often noisy and/or sparse and therefore lead to an ill-posed inverse problem where numerous model parameters may match the observed data (Kabanikhin, 2008). Additionally the model used to describe how the observed data are generated, the so-called forward model, may be uncertain. Based on natural observations or an understanding of the 1 Department of Earth Science and Engineering, Imperial College London, London, United Kingdom. Correspondence to: Lukas Mosser <lukas.mosser15@imperial.ac.uk>. underlying data generating process we may have a preconception about possible or impossible states of the model parameters. We may formulate this knowledge as a prior probability distribution function (pdf) of our model parameters and use Bayesian inference to obtain a posterior pdf of the model parameters given the observations (Tarantola, 2005).
Investigating the interior structure of the earth is usually an ill-posed inverse problem (Tarantola, 2005). One method to explore the subsurface features of the earth is acoustic reflection seismic ( Fig. 1) (Claerbout, 1971). A number of recording devices (geophones) that record displacements are placed on the surface and a localized impulse provides an active source from which acoustic waves radiate. These waves propagate within the earth and are reflected from geological features back to the surface where geophones record the incoming signals. These recordings, seismographs 1 , represent a set of spatially distributed acoustic recordings (Fig. 2). The process of determining geological structures and properties of the rocks that match these data is called seismic inversion.
Seismic inversion involves modeling the physical process of waves radiating through the earth's interior. By comparing the simulated synthetic measurements to actual acoustic recordings of reflected waves we can modify model parameters and minimize the misfit between synthetic data and true measurements. The adjoint of the partial-differential equation defines a gradient of the data mismatch leading to gradient-based optimization of model parameters (Plessix, 2006). The set of parameters represented by the spatial distribution of the acoustic velocity of the rocks within the earth can easily exceed 10 6 values depending on the resolution of the simulation grid and the observed data. Large three-dimensional seismic observations may require billions of parameters to be inverted for, demanding enormous computational resources (Akcelik et al., 2003).
Direct observations of the earth's interior are extremely difficult to obtain. Bore-holes may have been drilled for hydrocarbon exploration/development or hydrological measurements. These represent a quasi one-dimensional source Computational domain for the acoustic inversion problem. Acoustic recording devices are placed on the surface (∇) and record incoming acoustic waves reflected from geological structures and emanating from an artificial source ( * ). The computational domain is embedded within a dampened boundary domain to emulate lateral and vertical dissipation of the wave-source. The generative model G θ (z) creates the underlying spatially distributed p-wave velocity. Additional lower-dimensional constraints (dashed vertical line representing a well) can be placed on the generative model, by incorporating loss terms. The vertical axis of the computational domain has been rescaled by a factor 10 for visualization purposes.
of information of an extremely localized nature. Typical bore-hole sizes are on the order of 10s of centimeters in diameter whereas the resolution of seismic observations is usually on the order of 10s of meters.
Nevertheless, we can deduce prior knowledge of the earth's interior from surface observations and exposed geological features. The principle of gradualism (Hutton, 1788) states that processes governing the earth's surface today are the same processes that controlled the deposition and erosion of ancient geological features now buried deep within the earth. This geological knowledge can be incorporated into prior distributions of physical properties of rocks, such as the acoustic p-wave velocity, or into geological features such as geological facies and fault distributions within the earth.
Efficient parameterizations (Akcelik et al., 2002;Boehm et al., 2016) that allow a dimensionality reduced representation of the high-dimensional parameter space of possible  Figure 2a shows the acoustic recordings as individual continuous wave forms of the measured acoustic signals. Figure 2b shows the same dataset of acoustic measurements visualized as a collection of discrete pixels of an image. In this case, each sample in time is represented by a single pixel and each recorded waveform or so-called trace is represented by a column of pixels within the image. models may allow for improved inversion techniques. Due to the high computational cost incurred by inversion (Modrak & Tromp, 2015;Akcelik et al., 2003), probabilistic ensembles of models that match observed data are rarely generated and often only a single model that satisfies predefined quality criteria is created and used for interpretation and decision making processes.
We parameterize a set of geological features by a deep generative model that creates stochastic realizations of possible model parameters. The probabilistic distribution of model parameters is parameterized by a lower-dimensional set of normally distributed latent variables. Combined with a generative deep neural network this represents a differentiable prior on the possible model parameters. We combine this differentiable generative model with the numerical solution of the acoustic wave equation to produce synthetic acoustic observations of the earth's interior (Louboutin et al., 2017). Using the adjoint method (Plessix, 2006), we compute a gradient with respect to model parameters not directly in the high-dimensional model space, but in the much smaller set of latent variables. These gradients are required to perform a Metropolis-adjusted Langevin (MALA) sampling of the posterior of the model parameters given the observed seismic data. Performing MALA sampling allows us to obtain a diverse ensemble of model parameters that match the observed seismic data. Additional constraints on the generative model, such as information located at an existing bore-hole, are readily incorporated and included in the MALA sampling procedure.
We summarize our contributions as follows: • We combine a differentiable generative model controlled by a set of latent variables with the solution of a PDE-constrained numerical solution of a physical forward problem.
• We use gradients obtained from the adjoint method to perform MALA sampling of the posterior in the lower-dimensional set of latent variables.
• We illustrate the proposed inversion framework using a simple seismic inversion problem and evaluate the resulting ensemble of model parameters.
• The framework allows integration of additional information, such as the knowledge of geological facies along one-dimensional vertical bore-holes.
• The proposed approach may readily be extended to a number of inverse problems where gradients of the objective function with respect to input parameters can be calculated. Tarantola (1987) cast the geophysical seismic inversion problem in a Bayesian framework. Mosegaard & Tarantola (1995) presented a general methodology to perform probabilistic inversion using Monte Carlo sampling. They sampled the prior distribution of model parameters using a random walk and subsequently sampled from their posterior using a Metropolis rule. In a similar manner, Sen & Stoffa (1996) evaluated the use of Gibbs sampling to obtain a posteriori model parameters and evaluate parameter uncertainties. Mosegaard (1998) showed that the general Bayesian inversion approach of Mosegaard & Tarantola (1995) could not only provide model parameter covariances but also gave information on the ability to resolve geological features. Geostatistical models allowed spatial relationships and dependencies of the petrophysical parameters to be modeled and incorporated into a stochastic inversion framework (Bortoli et al., 1993;Haas & Dubrule, 1994). Buland & Omre (2003) have developed an approach to perform Bayesian inversion for elastic petrophysical properties in a linearized setting.

Related Work
In the case of geophysical inverse-problems computation of the solution to the forward problem is highly expensive. Therefore, computationally efficient approximations to the full solution of the wave-equation may allow efficient solutions to complex geophysical inversion problems. Neural networks have been shown to be universal function approximators (Hornik et al., 1989) and as such lend themselves as possible proxy-models for solutions to the geophysical forward and inverse problem.
The early work by Röth & Tarantola (1994) presents an application of neural networks to invert from an acoustic time-domain of seismic amplitude responses to a depth profile of acoustic velocity in a supervised setting. They used pairs of synthetic data and velocity models to train a multi-layer feed-forward neural network with the goal of predicting acoustic velocities from recorded data only. They showed that neural networks can produce high resolution approximations to the solution of the inverse problem based on representations of the input model parameters and resulting synthetic waveforms alone. In addition, they showed that neural networks can invert for geophysical parameters in the presence of significant levels of acoustic noise.
Representing the geophysical model parameters at each point in space quickly leads to a large number of model parameters especially in the case of three-dimensional problems. Berg & Nyström (2017) represented the spatially varying coefficients that govern the solution of a partial differential equation (PDE) by a neural network. The neural network acts as an approximation to the spatially varying coefficients characterized by the weights of the neural network.
The weights of the individual neurons are modified by leveraging the adjoint-state equation in the reduced-dimensional space of network-parameters rather than at each spatial location of the computational grid.
Hansen & Cordua (2017) (Chang et al., 2017) and bilinear (Asim et al., 2018) inverse problems, as well as compressed sensing (Bora et al., 2017). For more general geophysical inverse problems, Laloy et al. (2017) have used a variational auto-encoder to create geological models used for hydrological inversion. Inversion was performed using an adapted Markov-Chain Monte-Carlo (MCMC) (Laloy & Vrugt, 2012) algorithm where the generative model was used as an unconditional prior to sample hydrological model parameters.
Mosser et al. (2018) used a GAN with cycle-constraints (cycle-GAN) (Zhu et al., 2017) to perform seismic inversion formulating the inversion task as a domain-transfer problem. Their work used a cycle-GAN to map between the seismic amplitude domain and p-wave velocity models. Due to the p-wave velocity models and seismic amplitudes being represented as a function of depth, rather than depth and time respectively, this approach lends itself to perform stratigraphic inversion, where a pre-existing velocity model is used to perform time-depth conversion of the seismic am- The set of possible earth models are represented by a generative model with parameters θ (the parameters of the generator m ∼ G θ (z)). We obtain model observations of the acoustic waves u via the deterministic partial-differential equation (PDE) as well as partial observation of the latent model parameters m from local information at e.g. bore-holes w.
plitudes. Richardson (2018) used a quasi-Newton method to optimize model parameters in the latent-space of a pretrained GAN for a synthetic salt-body benchmark dataset.

Bayesian Inversion
In the Bayesian framework of inverse problems we aim to find the posterior of latent variables p(z|d obs ) given the observed acoustic reflection data d obs (Fig. 3). The joint probability of the latent variables z and observed seismic data d is Furthermore, by applying Bayes rule, we define the posterior over the latent variables z given the observed seismic data d We represent the observed data by where S(m) = S(m(x)) = S(G θ (z)), denoting the spatial model coordinates as x, and S is the seismic forward modeling operator. The geological facies m f acies , the pwave velocity m Vp , and the rock density m ρ represent the set of model parameters m. The model parameter m f acies represents the probability of a geological facies to occur at a spatial location x, whereas m Vp and m ρ represent spatial distributions of rock properties. We assume a normally distributed noise term .
The aim is to find samples of the posterior z t ∼ p(z|d). We reformulate the approach using an approximate Metropolisadjusted Langevin sampling rule (MALA-approx) (Roberts & Tweedie, 1996;Roberts & Rosenthal, 1998;Nguyen et al., 2016) Rewriting the log-likelihood of the data given the latent variables in terms of the L 2 -norm log p(d|z t ) ∝ S(G θ (z t ))− d obs 2 leads to the proposal rule of the MALA approximation (Nguyen et al., 2016) Using this sampling approach requires gradients of the data mismatch with respect to model parameters of the forward problem, which are obtained by the adjoint-state method which will be presented in the following section. The gradients of the model parameters ∂G θ (zt) ∂zt with respect to the latent variables are obtained by traditional neural network backpropagation.
Where lower-dimensional information is available, such as at bore-holes, the geological models should honor not only the seismic response but also this additional information. In this study we additionally evaluate the possibility to obtain samples of the posterior that reflect observed geological facies indicators m f acies at a one-dimensional bore-hole. When including bore-hole information the step-proposal corresponds to where we seek to obtain samples of the posterior given the observed seismic data d obs and geological facies at the wells m f acies well . The additional term log p(m f acies = m f acies well |z t ) in equation 6 is equal to the log-likelihood of the facies probability, as derived from the generator, calculated at the bore-hole. In all our simulations we set 3 = 100 when bore-hole data is incorporated.

Adjoint-State Method
We perform numerical solution of the time-dependent acoustic wave equation given a set of model parameters where u(x, t) is the unknown wave-field and m Vp (x) is the acoustic p-wave velocity. The dampening term η du(x,t) dt prevents reflections from domain boundaries and ensures that waves dissipate laterally.
Time-dependent source wavelets q(x, x s , t) are introduced at source locations x s . We emulate the seismic acquisition process by placing acoustic receivers that record the incoming wave-field at the top edge of the simulation domain. While receiver locations remain fixed, the number of sources is varied to image a number of parts of the subsurface domain.
We introduce a compact notation of equation 7 which we refer to as the forward-problem.
To perform sampling according to the MALA algorithm presented in equation 5, we seek to obtain a gradient of the following functional where d pred and d obs are the predicted and observed acoustic observations respectively.
We augment the functional J(m Vp (x)) by forming the Lagrangian Differentiating L(m Vp , u, λ) with respect to λ leads to the state equation 7, but differentiation with respect to the acoustic wave-field u leads to the adjoint state equations (Plessix, 2006): showing that we obtain a similar back-propagation equation as that used to derive gradients in neural networks (LeCun et al., 1988): the data mismatch is backpropagated thanks to a linear equation in the adjoint state vector λ. By differentiating the Lagrangian in equation 10 with respect to m(x) we obtain ∂J ∂m Vp = − λ, which is equivalent to the gradient required to perform MALA sampling of the posterior distribution of latent variables (Eq. 5).
We perform numerical solution of the acoustic waveequation and the respective adjoint computation using the domain-specific symbolic language Devito (Kukreja et al., 2016;Louboutin et al., 2017). Numerical solution is performed using a fourth-order finite-difference scheme.

Generative Model
We parameterize the model parameters of the acoustic wave equation (Eq. 7), by a differentiable generative model G θ (z). We use a generative model to sample realizations of spatially varying model parameters m(x) ∼ G θ (z). These realizations are obtained by sampling a number of latent variable vectors z. The associated model representations represent the a priori knowledge about the spatially varying properties of the geological structures in the subsurface.
We model the prior distribution of the spatially varying model parameters m(x) (Sec. 3.1) by a generative adversarial network (GAN) (Goodfellow et al., 2014). GANs represent a generative model where the underlying probability density function is implicitly defined by a set of training examples. To train GANs two functions are required: a generator G θ (z) and a discriminator D ω (m). The role of the generator is to create random samples of an implicitly defined probability distribution that are statistically indistinguishable from a set of training examples. The discriminator's role is to distinguish real samples from those created by the generator. Both functions are trained in a competitive two-player min-max game where the overall loss is defined by: Due to the opposing nature of the objective functions, training GANs is inherently unstable and finding stable training methods remains an open research problem. Nevertheless, a number of training methods have been proposed that allow more stable training of GANs. In this work we use a so-called Wasserstein-GAN Gulrajani et al., 2017), that seeks to minimize the Wasserstein distance between the generated and real probability distribution. We use a Lipschitz penalty term proposed by Petzka et al. (2017) to stabilize training of the Wasserstein-GAN. The full objective function including the gradient penalty term is given by: where τ is a statistical distribution controlling a linear combination between a real and generated sample (Petzka et al., 2017).
In our work we set λ LP = 200 to train the generative model. We represent both the generator and discriminator 2 function by deep fully convolutional neural networks (see Appendix Table 1). The generator uses a number of convolutional 2 In the Wasserstein-GAN literature the discriminator is also termed a "critic".  (a) distinguish between river channel bodies (light) and shale (dark). Acoustic p-wave velocity Vp (b) and rock density ρ (c) are constant within river-channels and vary by layer within shale. layers followed by so-called pixel-shuffle transformations to create output domains (Shi et al., 2016).
The latent vector is parametrized as a multivariate standardized normal distribution: Due to the geological properties represented in our dataset, namely, geological facies indicators m f acies , acoustic pwave velocity m Vp and density m ρ , the generator must output three data channels. We represent the geological facies as a probability of a spatial location belonging to a shale or sandstone facies. To facilitate numerical stability of the GAN training process we apply a hyperbolic tangent activation function and convert to a probability m f acies for subsequent computation (Eq. 6). The acoustic p-wave velocity V p is represented by a Gaussian distribution within each modeled geological facies. We apply a hyperbolic tangent activation function to model the output distribution of the p-wave model parameters m Vp . The rock density m ρ follows a Gaussian distribution and a soft-plus activation function is used to ensure positive values of rock density (Appendix A.1). In this study, only the facies indicator m f acies and acoustic p-wave velocity m Vp are used in the inversion process.
The generator-discriminator pairing is trained on the set of training images described in section 5. After training, the generator G θ (z) and the forward modeling operator S(m) are arranged in a fully differentiable computational graph. To accommodate the sources and receivers of the acoustic forward modeling process described in section 3, we pad the output of the generator by a domain of constant p-wave velocity.

Dataset
Geological structures in the earth's interior often closely resemble features observed at the surface. As sediments accumulate over time underlying rock is buried and exposed to high temperatures and pressures, deforming and compacting sediments. Ancient river systems often represent pathways for fluids with high storage capacity and permeability and are therefore common targets for hydrocarbon exploration.
To demonstrate the proposed inversion method we will use a model of a fluvial-dominated system consisting of highly porous sandstones embedded in a fine grained shaly material. Object-based models are commonly used to model such geological systems (Deutsch & Wang, 1996). They represent the fluvial environment as a set of randomly located geometric objects following various size, shape and property distributions. We train a set of GANs on a dataset of ten thousand realizations of two-dimensional cross-sections of fluvial object-based models.
The individual cross-sections are created with an objectbased model, where half-circle sand-bodies follow a uniform width distribution and their p-wave velocity and density are constant for each channel-body. The locations of individual channel-bodies are determined by a uniform distribution in spatial location. The fine-grained material surrounding the river-systems is made of layers of single-pixel thickness where each layer has a constant value of acoustic p-wave velocity and density which varies randomly but marginally from one layer to another. We use a binary indicator variable to distinguish the two facies regions, riverchannel vs shale-matrix. The ratio of how much of a given cross-section is filled with river-channels compared to the overall area of the geological domain is a key property in understanding the geological nature of these structures. This ratio follows a uniform distribution in our dataset and riverchannels are placed at random until a cross-section meets the necessary ratio criterion.
A total of ten thousand training images were created as a training set for the GAN. A further five thousand images were retained as a test set to evaluate the inversion technique.
While training the generative model outlined in section 4 we monitor image quality and output distributions for each of the modeled properties.  wave velocity and rock density for the ground-truth example that was used to emulate a set of obtained measured seismic data.

Results
We evaluate the proposed method of inversion by sampling a set of latent variables z determining the output of the generative model G θ (z) (Sec. 5, Fig. 4). First, we evaluate the generative model as a prior for representing possible earth models and generating N = 100 unconditional samples. The resulting ensemble shows high variability in geological structures and closely matches the distribution of geophysical properties defined in the training dataset (Fig. 4a).
Two cases of inversion are considered: inversion for the acoustic p-wave velocity V p and combined inversion of acoustic velocity and of geological facies along a bore-hole. For all tests we perform inversion using the MALA-approx scheme and accept inverted models with a relative seismic error seismic of less than 10%. For the additional bore-hole constraint we require an accuracy above 95% of geological facies to be accepted as a valid inverted sample. While lower relative errors in seismic mismatch and bore-hole accuracy can be achieved, evaluating the forward problem and adjoint of the partial differential equation comes at high computational cost and therefore a cost-effectiveness tradeoff was necessary.
For the first case of seismic inversion without model constraints we perform simulations where the number of acoustic sources are increased. Fewer acoustic sources means that less of the domain is properly imaged, leading to high uncertainty in areas where no incoming waves have been reflected and recorded by the receivers on the surface. Figure 5 shows a comparison of the evaluated inverted models for N = 100 inverted earth models. Acoustic sources are shown by red diamonds. The acoustic sources and 128 receivers are equally spaced across the top edge of the domain.
In Figure 5 (b, d-f) we show the pixel-wise standard deviation of 100 inverted models for an increasing number of acoustic sources (2 sources up to 27 sources). As the total number of acoustic sources increases we obtain a lower standard deviation for the resulting model ensembles. In the case of two acoustic sources (Fig. 5b) we find that close to the sources there is small variation amongst the inverted models (dark shades) whereas the central area where no acoustic source has been placed shows a very high degree of variation. This is confirmed by the three source case where a central acoustic source has been placed in addition to the sources on the borders of the domain. Lower variability in the inverted ensemble and structural features can be observed. This correlates well with the Bayesian interpretation of the inverse problem; where acoustic sources allow the subsurface to be imaged we arrive at a low standard deviation in the posterior ensemble of geological models, whereas within regions that are only sparsely sampled by the acoustic sources we expect the prior, the unconditional generative model, to be more prevalent, leading to a higher variability of geological features. As expected, when we increase the number of sources, we find overall smaller variability in the resulting ensemble of inverted earth models. We observe only marginal reduction in variability between the case of nine and twenty-seven sources (Fig. 5e-f) as we reach the limits of resolution of the forward problem.
In the case where lower-dimensional information such as a bore-hole was included as an additional objective function constraining the generative model (Fig. 5b), we find a lower standard deviation around this bore-hole. The standard deviation along the well is close to zero due to the perrealization 95% accuracy constraint. Furthermore there is a region of influence where the bore-hole constrains lateral features such as channel bodies. This is shown by channel shaped features of low standard deviation at the top and bottom of the domain. Comparison with the ground truth image (Fig. 4a) shows that two channel bodies can be found along the one-dimensional feature.
For each generated realization we have recorded the relative error in the seismic data mismatch (Fig. 6) as a function of the required iterations to perform the inversion as well as the latent variable vectors z at each MALA sampling iteration.
In practice we find that performing 100 iterations of the MALA approximation leads to a sufficient reduction in the seismic likelihood and as the step-size is reduced linearly, the seismic data mismatch stabilizes at relative errors of 5-7%. When bore-hole constraints are included (Fig. 7 bottom) we need to perform 200 MALA-approx iterations to find inversions that respect seismic observations and bore-hole features.
Due to the fact that modern full-waveform inversion (FWI) methods come at very high computational cost for two and possibly three-dimensional inversions, a small number of required iterations is a necessity. In further tests, reducing the number of iterations of the MALA approximation or simply minimizing the seismic loss by gradient descent, as performed by Richardson (2018), allows convergence to small relative errors but this approach has been shown to lead to reduced sample diversity (Nguyen et al., 2016).
Due to the differentiable nature of the generative model, whose parameters are kept constant for inversion, it may be possible to deduce much more efficient implementations that do not require computation of the gradient of the highdimensional model parameters m but rather evaluate gradients direct with respect to the latent variables.
Using a probabilistic model that defines a posterior over the latent variables such as a variational auto-encoder (Kingma & Welling, 2013) may allow more efficient sampling by performing inversion once and finding the region of matching geological models in latent space.

Conclusions
Inversion of subsurface geological structures from acoustic reflection seismic data is a classical method to aid the understanding of the earth's interior. The inference of model parameters from measured acoustic properties is often performed in the very high-dimensional space of model properties leading to very CPU-intensive optimization (Akcelik et al., 2003).
We apply a method that combines a generative model of geological structures efficiently parameterized by a lowerdimensional set of latent variables, with a numerical solution of the acoustic inverse problem for seismic inversion using the adjoint method. Leveraging the adjoint of the studied partial differential equation we deduce gradients that are consequently used to sample from the posterior over the latent variables given the mismatch of the observed seismic data by following an approximate MALA scheme (Nguyen et al., 2016).
While the proposed application was illustrated on a very simple geophysical inversion this method may find use in other domains where spatial property models control the evolution of physical systems such as in the flow of fluids in porous media or materials science. The combination of a deep generative model parameterized by a lower-dimensional set of latent variables and gradients obtained by the adjoint method may lead to new efficient techniques for solving high-dimensional inverse problems.