Generative models and Bayesian inversion using Laplace approximation



Introduction
Inverse problems are ubiquitous and statistical methods for their treatment have been developed for a long time [1,2,3]. Frequently, inverse problems are considered in a discretized form, resulting in (large-scale) regression tasks, or they are posed as discrete inverse problems from the start, for example in functional MR imaging [4,5]. Inverse problems are often ill-posed, or ill-conditioned in the discrete case, and reliable estimation requires some form of regularization such as Tikhonov regularization [6]. Bayesian inference [7,8] provides an alternative approach to ill-posed inverse problems, in which the employed prior renders the estimation task well-posed. For example, a Gaussian prior can lead to a maximum a posteriori (MAP) estimate equivalent to the result of a Tikhonov regularization. Gaussian Markov random field (GMRF) priors [9] are another popular class of priors, used, for example, to model a priori spatial smoothness as is often relevant in spatial modeling or image processing.
While these analytic priors have been successfully applied in many applications, they do not always adequately model the prior knowledge available in an individual problem. For example, in quantitative MR imaging of the brain, the conventional prior distributions employed do not truly reflect the structure of the brain [10]. Generative models from machine learning using modern architectures such as generative adversarial networks (GANs) [11] or variational auto-encoders (VAEs) [12], on the other hand, have proven to generate data that closely resemble the properties of data in a training set. Their efficiency, adaptability and easy accessibility in standard libraries have resulted in a multitude of applications, from speech synthesis [13] over text generation [14] to molecule generation [15] and urbanization [16], to mention but a few. For a comprehensive overview of the use of deep learning methods for the solution of linear inverse problems we refer to [17], and for a review on the application of data-driven models to inverse problems to [18]. For an overview of generative models in computer vision see for instance [19,20,21], and for a review in medical imaging cf. [22].
In view of their capabilities, generative models have recently been considered as priors in a Bayesian treatment of inverse problems, cf., e.g., [18]. The advantage of such a procedure is that individual properties of the problem at hand, such as the structure of brain images, are adequately taken into account [10], which leads to improved inference. Another advantage is that often the sought (discrete) function or field belongs to a low-dimensional manifold, which is exploited by current generative models such as GANs or VAEs. The inference can then be carried out in a low-dimensional space of latent variables, which strongly facilitates the calculation of the results in a Bayesian inference, cf. [23,24,25,26].
However, restricting the inference to a low-dimensional latent space has the disadvantage that it admits no Lebesgue density for the posterior in the high-dimensional space of the actual variables [27]. Furthermore, the estimation accuracy reached can strongly depend on the quality of the generative model [28]. To circumvent this drawback and somehow escape from the lower-dimensional manifold, [27] takes the mean of the push-forward of the posterior in the latent space as a Bayes estimate. Some novel approaches also incorporate the inversion problem directly into the learning process [29], e.g. to achieve super-resolution or to reconstruct high-fidelity magnetic resonance images [30,31]. In this regard, we would like to mention [10] for their deep direct estimation procedure and the development of a conditional Wasserstein GAN discriminator, which allows sampling from the posterior.
In this paper, we consider a Bayesian treatment of linear Gaussian inverse problems that is carried out in the high-dimensional original variable space. The employed prior is determined by a probabilistic generative model. We propose an analytic solution to this task based on a Laplace approximation of the prior, which is inspired by the treatment of [32] in the context of density estimation. The choice of the class of considered inverse problems is made in view of tasks such as inpainting, restoration of images [33] or medical imaging [31,34]. Another motivation for our choice of problems is that it allows analytical results to be derived.
The properties of the proposed inference, such as consistency, are explored and contrasted to those obtained for the inference which is carried out in the low-dimensional latent space. In addition, an image restoration task for blurred and noisy MNIST data [35] is taken to quantitatively compare the two approaches. The comparison is augmented by a simple heuristic for the choice of method. Pros and cons of both approaches are discussed in view of our results.
The paper is organized as follows. In Section 2 the considered class of inverse problems is specified and generative models are introduced. The Bayesian treatment of the inverse problems utilizing a generative model as a prior is then considered in Section 3. After recalling the inference utilizing the low-dimensional latent space and characterizing its properties, the proposed approach using a probabilistic generator is developed and explored. Subsequently, the two approaches are discussed and assessed in terms of their properties. Our treatment assumes knowledge about the variance of the observations, and finally, we briefly comment on a possible generalization to the case of unknown variance. In Section 4 numerical examples are presented for a quantitative assessment. An outlook on potential future research and conclusions from our findings are given in Section 5.

Generative models and considered class of inverse problems
This section specifies the considered class of inverse problems and introduces generative models. The generative models will later be used for the construction of a prior in a Bayesian treatment of the inverse problems. Two classes of generative models are distinguished: one that uses a deterministic map applied to the latent variables and one that uses a probabilistic formulation. The former will be taken to carry out a Bayesian inference in a low-dimensional latent space; the latter serves as the starting point for the proposed inference carried out directly in the high-dimensional space of the actual variables.

Linear inverse problems
We consider linear inverse problems of the form

y | x ∼ N(Ax, σ²I),  (1)

where A denotes some given operator mapping from the (original) space X to the space of observables Y. We take X = R^d and Y = R^n so that A is an n by d matrix. We assume throughout that A has full rank and n ≥ d. The goal is to infer x given the data y. The variance σ² is mostly treated as a fixed given parameter and is largely suppressed in our notation, except for those cases in which it is explicitly included in the inference. Problems of the form (1) typically emerge from the discretization of a continuous inverse problem in which x could be a spatially distributed property of the human body or an image that has been blurred. Usually, the dimension of the (discretized) x will be high, and the linearity assumption simplifies matters. The assumption made about the structure of the covariance matrix in the sampling distribution (1) essentially means that the full covariance matrix needs to be known up to a factor, a situation which, after a suitable linear transformation, yields a problem of the form in (1). We note that simple models of the form (1) are relevant in many applications, for example functional MR imaging [36], inpainting [37] or de-blurring of images [38].
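To fix ideas, the following minimal sketch (in Python with NumPy, not taken from the paper's source code) sets up a discrete problem of the form (1): a dense matrix A acting as a separable Gaussian blur on a flattened 28 × 28 image, and a noisy observation y. The kernel construction and the role of the parameter eta are illustrative assumptions; the actual blurring operator used in the experiments may differ.

import numpy as np

def gaussian_blur_matrix(n_pix=28, eta=3.0):
    # Dense matrix acting as a separable Gaussian blur on an n_pix x n_pix
    # image flattened to length n_pix**2; larger eta means a narrower kernel.
    idx = np.arange(n_pix)
    K = np.exp(-0.5 * eta * (idx[:, None] - idx[None, :]) ** 2)
    K /= K.sum(axis=1, keepdims=True)          # normalize rows
    return np.kron(K, K)                       # 2-D blur as a Kronecker product

rng = np.random.default_rng(0)
d = 28 * 28
A = gaussian_blur_matrix()                     # forward operator, shape (d, d)
x_true = rng.random(d)                         # stand-in for a ground-truth image
sigma = 1e-2
y = A @ x_true + sigma * rng.normal(size=d)    # data model (1): y | x ~ N(Ax, sigma^2 I)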

Generative models
Generative models such as VAEs or GANs can produce random samples by transforming a simple distribution of latent variables, e.g. a multivariate standard Gaussian distribution, through a neural network. Those networks are trained on a (large) database in such a way that the resulting distribution approximates the distribution underlying the employed database. The database could for instance consist of a training set of images with a specific characteristic for the task at hand, e.g. handwritten digits from the MNIST data set or a sequence of MR images from various patients. Often, such data sets can be modeled as belonging to a low-dimensional manifold M ⊂ X, which is exploited by the generative models through a correspondingly chosen low dimension of the latent variables.
In Figure 1 we depict the considered structure of the generative model. A latent vector z ∈ Z := R^p of lower dimension p ≪ d is mapped by the generator to the original variable space X. Throughout this work, we assume that such a generator or generative model is available and that it has already been trained on a data set, together with a multivariate standard Gaussian distribution π(z) = N(z|0, I) as prior for the latent variables. In the literature, usually two types of generator outputs are considered: the deterministic map, subsequently denoted by g : Z → M, and the probabilistic formulation. For the probabilistic formulation, we assume a Gaussian model x | z ∼ N(g(z), Γ(z)), where g(z) and Γ(z) denote the outputs of a trained (deep) neural network for input z. The covariance matrix Γ(z) can be modeled in several variants; in our experiments a diagonal covariance is used (cf. Section 4).

The motivation for the use of generative models for the data-driven construction of a prior is that these models are extremely versatile and capable of producing a distribution whose realizations closely resemble those of the database used to train them, cf. the digits randomly produced by a trained generative model in Figure 1, which resemble the properties of the MNIST digit database. When a (large) database is available whose members represent typical features of the solution of a considered inverse problem, a prior constructed by a generative model trained on that database can be highly informative and beneficial for a Bayesian solution to that problem. A conventional prior such as a standard Gaussian prior (Figure 1, top row) or a GMRF prior (Figure 1, middle row), used to turn an inference into a well-posed problem and to exploit prior knowledge of smoothness, on the other hand, will generally be much less informative, for example when considering the task of inferring a digit from a blurred image of it. In fact, realizations of such a prior will not even approximately resemble a digit. Supplying a large database for a particular problem is often challenging, and techniques such as data augmentation [39] or virtual experiments are used in this context. However, these issues are beyond the scope of this paper.
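As an illustration of the probabilistic formulation, the sketch below defines a toy generator mean g(z) and a diagonal covariance Γ(z) using a small MLP whose weights are random placeholders; in practice both outputs come from a decoder trained on the data set, so the architecture and all names here are assumptions made only to fix the interface.

import numpy as np

p, d = 20, 28 * 28                      # latent and variable space dimensions
rng = np.random.default_rng(1)

# Random placeholder weights; a trained decoder would supply these.
W1, b1 = 0.1 * rng.normal(size=(128, p)), np.zeros(128)
W2, b2 = 0.1 * rng.normal(size=(d, 128)), np.zeros(d)
Wv, bv = 0.1 * rng.normal(size=(d, 128)), -2.0 * np.ones(d)

def g(z):
    # Generator mean g(z): R^p -> R^d with pixel intensities in (0, 1).
    h = np.tanh(W1 @ z + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))

def Gamma_diag(z):
    # Diagonal of Gamma(z), kept strictly positive (eigenvalues bounded away from zero).
    h = np.tanh(W1 @ z + b1)
    return np.exp(Wv @ h + bv) + 1e-6

z = rng.normal(size=p)                                          # z ~ N(0, I)
x_draw = g(z) + np.sqrt(Gamma_diag(z)) * rng.normal(size=d)     # x | z ~ N(g(z), Gamma(z))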

Bayesian inference using generative models
A Bayesian inference is considered for inverse problems of the form (1) when using a prior that is constructed from a generative model. We start by recalling an inference in latent space, followed by a push-forward through the deterministic mapping of the generator, as proposed in [27,25]. Then an inference procedure is developed that works directly in the space X of the actual variables by using a prior constructed from a probabilistic formulation of the generator. Finally, pros and cons of both approaches are discussed in terms of the inferential properties derived for them.

Inference in latent space
This approach models the data from (1) by

y | z ∼ N(Ag(z), σ²I),  (4)

where g denotes the deterministic mapping of the employed generative model and the variance σ² > 0 is assumed to be known. Using the prior of the generative model, the posterior in latent space is proper and given by

π(z | y) ∝ exp(−‖Ag(z) − y‖² / (2σ²)) π(z).  (6)

Then, a subsequent change of variables using the deterministic map g generates a distribution in X. Owing to the fairly general form and the usually nonlinear, non-invertible structure of g, the posterior in latent space has no closed form. A similar approach is pursued in [23] using a deterministic auto-encoder and in [27] using a VAE with a deterministic decoder. The latter reference correctly argues that the g push-forward of the posterior in latent space does not admit a Lebesgue density in the variable space X. We summarize these properties in the following lemma.
Lemma 1 (Degenerated push-forward). Assume that the mapping g satisfies some (weak) regularity conditions, for example continuity. Then the posterior in latent space (6) is proper and has finite p-th moment for p < ∞.
The g push-forward of π(z|y) is a distribution in X which admits no Lebesgue density.
Proof. Propriety and the existence of moments for the posterior in latent space follow directly from the fact that the continuous likelihood term is bounded above by 1 and that all p-th moments, p < ∞, of the Gaussian prior π(z) exist. The non-existence of a Lebesgue density on X follows from the fact that g maps all probability mass to M, which is assumed to have lower dimensionality than X.
Even though the g push-forward of π(z|y) does not admit a Lebesgue density, it is a well-defined distribution. To efficiently compute statistics of the g push-forward of the latent posterior, the authors in [27] employ a parallel tempered, preconditioned Crank-Nicolson MCMC to generate samples {z_i}_{i=1}^N from the posterior in latent space Z and subsequently approximate the posterior expectation

ḡ(y) := E_{π(z|y)}[g(z) | y] ≈ (1/N) Σ_{i=1}^N g(z_i).  (7)

This expectation does generally not belong to the image space of the generator map, but it can be expected to be close to it. Alternatively, one can numerically compute the MAP z_MAP of the posterior (6) in the latent space and then use g(z_MAP) as an estimate. This choice directly illuminates the limitations of the approach. By construction, g(z_MAP) is limited to the image space of the generator map. This observation is a result of an inherent bias which is introduced by the statistical model (4) compared to the model (1). Consequently, arguing from a frequentist point of view, the latent space approach suffers from consistency issues. In particular, the posterior mean estimate in (7), viewed in dependence on the observation y, takes the role of a Bayes estimator for the true value x ∈ X under L² loss. This estimator is usually referred to as the minimum mean square error (MMSE) estimator. This interpretation allows us to formulate the following lemma as the main result regarding consistency of the latent space approach.
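A minimal sketch of the latent space approach, reusing A, y, sigma and g from the snippets above: the MAP of the posterior (6) is found by a quasi-Newton optimizer and pushed forward through g. Using BFGS with its default finite-difference gradients is only for illustration; in practice the gradient of g would be obtained by automatic differentiation.

import numpy as np
from scipy.optimize import minimize

def neg_log_post_latent(z, A, y, sigma, g):
    # Negative log of the latent posterior (6), up to an additive constant:
    # ||A g(z) - y||^2 / (2 sigma^2) + ||z||^2 / 2.
    r = A @ g(z) - y
    return 0.5 * (r @ r) / sigma**2 + 0.5 * (z @ z)

z_init = np.zeros(p)        # in practice: encoder mean applied to a least-squares estimate
res = minimize(neg_log_post_latent, z_init, args=(A, y, sigma, g), method="BFGS")
z_map = res.x
x_latent = g(z_map)         # estimate, restricted to the range of the generator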
Lemma 2 (Inconsistency of Bayes estimator). Let ḡ(y) = E_{π(z|y)}[g(z) | y] denote the MMSE estimator and consider the family {ḡ_σ(y)}_σ, explicitly dependent on the data variance of the sampling distribution (4). Furthermore, assume that g is continuous and such that the model (4) is identifiable. Let x ∈ X and assume x is not contained in the image space of the generator map, i.e., there exists a δ > 0 such that min_{z∈Z} ‖x − g(z)‖ ≥ δ. Then, the estimator ḡ_σ(y) is not consistent, i.e., ḡ_σ(y) does not converge to x in probability as σ → 0.
Proof. The identifiability of the model together with the form of the latent space posterior (6) implies that, as σ → 0, the posterior concentrates at a point z_0 ∈ Z, so that ḡ_σ(y) converges to g(z_0) in probability. With the assumption ‖x − g(z_0)‖ ≥ δ > 0 the claim follows. In particular, for ε < δ it holds that P(‖ḡ_σ(y) − x‖₂ > ε) ↛ 0.

Remark 3. Taking the limit with respect to σ for the considered linear inverse problem is equivalent to taking the limit with respect to infinitely many repeated observations of y. This can be seen from the fact that the mean of y in the likelihood is a sufficient statistic with variance σ²/k, where k is the number of repetitions.

Remark 4. The result above generalizes to more general estimators defined by a possibly different loss, which induces a different topology in X.
Having the asymptotic behavior of the estimator established, we are also interested in the asymptotic covariance of the g push-forward of the latent posterior.
Lemma 5 (Asymptotic covariance). Assume that g is continuous and renders the model (4) identifiable. Then, there exists z* ∈ Z such that the latent posterior (6) converges in total variation norm to the Dirac measure centered at z*. Moreover, assume that g is totally differentiable at z*. Then, the asymptotic covariance of the g push-forward of the latent space posterior is given in terms of the inverse of the Fisher information matrix at z* as

Č = σ² J_{z*} (J_{z*}^T A^T A J_{z*})^{-1} J_{z*}^T,

where J_{z*} denotes the Jacobi matrix of g evaluated at z = z*.
Proof. The model (4) and the latent posterior (6) fulfill the assumptions of the Bernstein-von-Mises theorem. Hence, the convergence of the posterior to a Gaussian can be assessed and Č follows by simple calculus. Taking the second derivative of the log-likelihood and using that the gradient of the log-likelihood is zero at z* directly gives the asymptotic covariance of the latent posterior as σ²(J_{z*}^T A^T A J_{z*})^{-1}; the push-forward through the linearization of g at z* then yields Č.

Inference in original space
As an alternative, we propose to perform the inference in the original space X. We introduce an efficient way of solving the inverse problem based on a Laplace approximation for the prior. This approach uses the actual data model (1) and employs the hierarchical prior

π(x) = ∫_Z π(x | z) π(z) dz  with  π(x | z) = N(x | g(z), Γ(z)),  (10)

where g and Γ are given by (deep) neural networks, cf. Section 2.2. Throughout this work, we assume that Γ(z) is positive definite and its smallest eigenvalue is bounded away from zero, for every z ∈ Z. Then, the resulting posterior is given by

π(x | y) ∝ exp(−‖Ax − y‖² / (2σ²)) π(x),  (11)

and we collect some of its properties in the following lemma.
Lemma 6 (Variable space posterior is proper). The posterior (11) is proper, has finite p-th moment for p < ∞ and fulfills the assumptions of the Bernstein-von-Mises theorem.
Proof. Observe first that the prior is proper. Since Γ(z) is positive definite, the first part of the exponent is bounded. Since Γ(z) has its smallest eigenvalue bounded away from zero, the determinant is bounded as well. Propriety of π(x) then follows from the propriety of π(z). By the choice of a (bounded) Gaussian prior for π(z), π(x) is bounded as well. The remaining claims then follow from standard theory of Bayesian inference for linear models.
An immediate consequence of the previous lemma is the fact that the MMSE estimator derived from the posterior (11) is consistent. Beside this obvious statistical advantage, unfortunately, the prior is computationally infeasible, since an integral has to be solved for every evaluation. Therefore, we make use of a linearization. A similar approach has been applied in this context for density estimation [32]. Here, a Laplace approximation is suggested to render the intractable prior π(x) feasible. The Laplace approximation applies a linearization to the generator mean by taking

g(z) ≈ g(z_0) + J_{z_0}(z − z_0)

for some expansion point z_0, which has to be determined beforehand, and the Jacobian J_{z_0} ∈ R^{d×p} of the generator mean at z_0. Similarly, we have to expand the covariance matrix Γ(z) around z_0. With the argument that the variance is expected to be less volatile than the mean, it is justifiable to use only a constant expansion, i.e. Γ(z) ≈ Γ(z_0). Higher order expansions are in general possible, since they require higher order derivatives of the mean and covariance function with respect to z, which are feasible due to automatic differentiation, but the resulting posterior becomes rather unwieldy.
Inserting the expansions into the prior allows for an analytic representation of π(x) and the prior PDF can be expressed as follows.
Lemma 7 (Laplace approximated prior). For z_0 ∈ Z, the prior π(x) from the hierarchical model (10) is approximated by the Gaussian distribution

π_L(x) = N(x | g(z_0) − J_{z_0} z_0, Γ(z_0) + J_{z_0} J_{z_0}^T).  (14)

The covariance matrix Γ(z_0) + J_{z_0} J_{z_0}^T is positive definite and thus the prior is proper on X.
Proof. Simple calculus yields the form of π_L(x), and the remaining claims follow directly from the assumption that Γ(z_0) is positive definite for every z_0 ∈ Z.
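The Laplace approximated prior can be assembled as in the sketch below, reusing g, Gamma_diag and z_map from the earlier snippets. The Jacobian is computed by finite differences purely for illustration (automatic differentiation would be used in practice), and the prior mean g(z_0) − J_{z_0} z_0 is the mean of the linearized push-forward under z ∼ N(0, I).

import numpy as np

def jacobian_fd(g, z0, d, eps=1e-5):
    # Finite-difference Jacobian of the generator mean at z0 (autodiff in practice).
    p = z0.size
    J = np.empty((d, p))
    for j in range(p):
        e = np.zeros(p)
        e[j] = eps
        J[:, j] = (g(z0 + e) - g(z0 - e)) / (2.0 * eps)
    return J

z0 = z_map                               # expansion point, cf. the scheme in Section 3.2
J = jacobian_fd(g, z0, d)
Gamma0 = np.diag(Gamma_diag(z0))         # constant expansion of the covariance
m_prior = g(z0) - J @ z0                 # mean of the linearized push-forward
C_prior = Gamma0 + J @ J.T               # covariance Gamma(z0) + J J^T of pi_L(x)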
The prior π_L(x) is a sensible choice for solving the inverse problem, and computations can be carried out analytically. Moreover, the constructed approximation preserves important properties in the fashion of Lemma 6. We formulate this claim in the following theorem.

Theorem 8. Using the Laplace approximation π_L(x) as a prior for the Bayesian inverse problem yields a Gaussian distribution as posterior on X, with covariance matrix

C = (σ^{-2} A^T A + (Γ(z_0) + J_{z_0} J_{z_0}^T)^{-1})^{-1}

and mean

m = C (σ^{-2} A^T y + (Γ(z_0) + J_{z_0} J_{z_0}^T)^{-1} (g(z_0) − J_{z_0} z_0)).

For this posterior, moments of arbitrary order exist and π_L(x | y) converges, as σ → 0, to a Gaussian centered at the maximum likelihood estimate (A^T A)^{-1} A^T y, with covariance given by the inverse of the Fisher information matrix σ^{-2} A^T A of the linear model (1).
Proof. The mean and covariance are results of simple calculations for linear Gaussian models and priors. The remaining properties are clear for this Gaussian posterior. The convergence can be assessed by the Bernstein-von-Mises theorem. In particular, the posterior mean converges in probability to the maximum likelihood estimate and the covariance converges to the inverse of the Fisher information matrix of the linear model (1).
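Combining the Gaussian prior π_L(x) with the likelihood of (1) gives the posterior of Theorem 8 in closed form; the sketch below (continuing the snippets above) uses plain matrix inversions for clarity, whereas a Cholesky factorization or the Woodbury identity would be preferable numerically.

import numpy as np

# Posterior of Theorem 8: N(m_post, C_post) from pi_L(x) = N(m_prior, C_prior)
# and the likelihood y | x ~ N(Ax, sigma^2 I).
C_prior_inv = np.linalg.inv(C_prior)
C_post = np.linalg.inv(A.T @ A / sigma**2 + C_prior_inv)
m_post = C_post @ (A.T @ y / sigma**2 + C_prior_inv @ m_prior)   # reconstruction estimate
pixel_std = np.sqrt(np.diag(C_post))     # marginal standard deviations, cf. Figure 3 f)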

Remark 9. It can be seen that the asymptotic covariance of the Laplace approximated posterior π_L(x|y) is given by Ĉ = σ² (A^T A)^{-1}.
However, a quantitative comparison to the asymptotic covariance of the g push-forward of the latent space posterior is not trivial. In general, it is not possible to conclude that Ĉ is "larger" than Č in some sense. In fact, numerical considerations show that the matrix Ĉ − Č is often indefinite. However, it is unambiguous from their structure that Č spans a linear space of dimension at most p. In contrast, the covariance Ĉ spans the whole space X of dimension d.
Finally, we discuss the choice of the expansion point z_0 for the Laplace approximation. A natural choice is to take the expansion point from the tuple (z_0, x_0) which maximizes the integrand of the prior integral, i.e., the joint prior π(x, z) = π(x|z)π(z). As long as x ≈ x_0, the approximation (11) is reasonable. If this condition is violated, both the true prior (10) and its approximation (14) are (expected to be) small, although their relative difference could be large. However, achieving this optimum in the high-dimensional space Z × X is numerically challenging and an optimizer is likely to get stuck in local optima. In our experiments, we achieved good results using an efficient updating scheme, which we describe in the following.
1. Choose as initial value in X the solution x_0 to the linear least-squares problem

x_0 = argmin_{x∈X} ‖Ax − y‖²,  (20)

and for the latent vector take z_0 = f(x_0), where f denotes the encoder mean map of the employed VAE. For more general generative models, a numerical optimization for z_0 can be performed which aims for the element of the latent space that generates the point closest to x_0. Since the consideration of variants of (multi-level) latent space approaches, disentanglement or style transfer is well outside the scope of this paper, we refer to [40,41,42].

2. Then, consider the log-integrand π_0(x_0, z) := log π(x_0 | z) + log π(z).

3. Evaluate a new candidate z_1 by maximizing the integrand over z, i.e., z_1 = argmax_{z∈Z} π_0(x_0, z).

4. If π_0(x_0, z_1) > π_0(x_0, z_0), take the new value z_1 as expansion point candidate.

5. Repeat steps 3 and 4 until no further improvement is achieved.
Note that the choice of x_0 is made in view of the data y. This, to some extent, renders the approach empirical Bayesian [43,44]. A minimal sketch of this updating scheme in code is given below.
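The following sketch implements the scheme under the assumptions and placeholder components introduced above; since no encoder is defined in the sketches, the encoder mean f(x_0) is replaced by a zero initialization, which is an assumption for illustration only.

import numpy as np
from scipy.optimize import minimize

def neg_log_joint(z, x0, g, Gamma_diag):
    # -log pi(x0 | z) - log pi(z): the negative log-integrand at fixed x0.
    gamma = Gamma_diag(z)
    r = x0 - g(z)
    return 0.5 * np.sum(r**2 / gamma) + 0.5 * np.sum(np.log(gamma)) + 0.5 * (z @ z)

# Step 1: least-squares initialization in X and an initial latent vector.
x0, *_ = np.linalg.lstsq(A, y, rcond=None)
z_curr = np.zeros(p)                     # stand-in for the encoder mean f(x0)

# Steps 2-5: update z as long as the joint (log-)integrand improves.
for _ in range(10):
    res = minimize(neg_log_joint, z_curr, args=(x0, g, Gamma_diag), method="BFGS")
    if res.fun < neg_log_joint(z_curr, x0, g, Gamma_diag) - 1e-8:
        z_curr = res.x
    else:
        break
z0 = z_curr                              # expansion point for the Laplace approximation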

Discussion
Recapitulating the previous sections, we compared two Bayesian inference problems based on the original space formulation (1) and a formulation in latent space (4). In the literature, it is common to consider the latent space approach, which restricts the required computations to a lower-dimensional space in which quantities are more feasible to estimate. However, this comes at the cost of an inherent bias introduced by the employed approximate statistical model. From a statistical point of view, it would be beneficial to be able to establish consistent Bayes estimators which asymptotically yield correct estimates also outside of the range of the generator. Such a procedure based on the original space formulation is, however, numerically challenging. One possible approximative scheme is presented in the form of a Laplace approximation. This method inherits and preserves the consistency analysis of the original space approach, while being numerically feasible. In this regard, the presented approximation is expected to yield preferable solutions to the inverse problem when the information contained in the data is large.

Generalization to unknown variance
The natural extension of our analysis to the case of unknown variance is straightforward. We demonstrate this exemplarily for the choice of an inverse gamma (IG) prior distribution for the variance, i.e. π(σ²) ∝ (σ²)^{−1−α} exp(−β/σ²) with shape and scale hyperparameters α, β > 0. This frequently employed prior models the positive variance in a flexible manner. In fact, α and β can be chosen such that π(σ²) has no finite moment. Moreover, the explicit form allows for an analytic derivation of certain quantities of interest.
For the inference in latent space the hierarchical model reads

y | z, σ² ∼ N(Ag(z), σ²I),  z ∼ N(0, I),  σ² ∼ IG(α, β).

The resulting posterior is given by

π(z, σ² | y) ∝ (σ²)^{−n/2} exp(−‖Ag(z) − y‖² / (2σ²)) π(z) π(σ²).

To obtain the marginal posterior for z, the variance needs to be integrated out. This marginalization can be done analytically, which yields

π(z | y) ∝ (β + ‖Ag(z) − y‖² / 2)^{−(α + n/2)} π(z).  (24)

The marginal posterior is proper, since for β > 0 the term in the brackets is bounded away from zero. Hence, the existence of moments of arbitrary order is guaranteed by the choice of a standard Gaussian distribution for π(z). For this marginal posterior, the claims of Lemma 1 and Lemma 2 follow directly with the same arguments. This implies that, even in the case of unknown variance, the g push-forward of the marginal posterior is inconsistent in the sense of derived Bayes estimators.
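For completeness, the marginalization underlying (24) is the standard inverse gamma integral; written out as a routine calculation (not copied from the paper), it reads

\[
\int_0^\infty (\sigma^2)^{-\frac{n}{2}-\alpha-1}
\exp\!\left(-\frac{\beta+\tfrac{1}{2}\lVert Ag(z)-y\rVert^2}{\sigma^2}\right)\mathrm{d}\sigma^2
= \Gamma\!\left(\alpha+\tfrac{n}{2}\right)
\left(\beta+\tfrac{1}{2}\lVert Ag(z)-y\rVert^2\right)^{-(\alpha+\frac{n}{2})},
\]

which, after multiplication by π(z), gives (24) up to a constant independent of z.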
For the sampling distribution (1) in the variable space X and the hierarchical prior of Section 3.2, a similar marginal posterior can be derived. In particular, the hierarchical model reads

y | x, σ² ∼ N(Ax, σ²I),  x ∼ π(x),  σ² ∼ IG(α, β),

where the prior π(x) is derived as in Section 3.2. Again, due to β > 0, the same arguments of Lemma 6 apply, which renders Bayes estimators based on π(x|y) consistent, and π(x) can still be approximated as in Lemma 7, i.e. π_L(x) ≈ π(x). However, Theorem 8, which analytically provides the posterior as a Gaussian distribution, is not directly applicable, since it relies on the fact that the posterior is given as the product of two (non-scaled) Gaussian probability density functions (PDF). Here, the posterior is slightly different, and obtaining an approximate Gaussian would rely on numerical estimation of the MAP and Hessian of π(x|y).
Remark 10. Another typical choice for the prior of the variance is the non-informative Jeffreys prior π(σ²) ∝ 1/σ². In this case, ensuring propriety and existence of moments for the latent space posterior requires additional assumptions on g to ensure boundedness from below of the term ‖Ag(z) − y‖². A similar assumption is sufficient for the variable space.

Numerical examples
To validate our theoretical findings, we perform experiments on a well-known data set with linear inverse problems governed by a blurring operation and homoscedastic Gaussian noise. In particular, the data model (1) is considered with x being an unknown vector in X = R^{28×28}, and A : X → Y = R^{28×28} denotes a linear Gaussian blurring operator with known precision parameter η > 0 which steers the impact of the blurring. In Figure 2, we show the applied blurring operation. The added noise levels in the experiments are defined by σ = 10^{−s} for s = 1, 2, 3, 4.

Generative model
We consider a typical example from machine learning. The data set of handwritten digits (MNIST) [35] consists of 60,000 training samples and 10,000 test samples; each sample corresponds to a grayscale image of size 28 × 28. A typical representative is shown in Figure 2. The generative model is chosen as an extension to the decoder of a VAE in an "off-the-shelf" Matlab [45] architecture. The latent space is chosen as Z = R^20 and the VAE is trained by optimizing the evidence lower bound (ELBO) on the training set using Adam with a constant learning rate of 10^{−3} and batch size 512 for 20 epochs. Afterwards, to obtain a probabilistic decoder, the final deconvolution layer of the decoder is extended to incorporate a diagonal covariance, and the encoder is fixed during a transfer learning step of an additional five epochs. Example draws of the generative model are shown in Figure 1 (bottom row), obtained by taking the decoder output for some z ∼ N(0, I). The results shown in this work are achieved using Python and Matlab. The Python source is available under https://gitlab1.ptb.de/marsch02/datainformed-prior.

Inference with known variance
For a quantitative comparison of the inversion quality of the latent space approach and the Laplace approximated variable space approach, we consider a single inference result in Figure 3 by taking one image of the test set as ground truth for the data model. Then, the blurring operator with η = 3 is applied and Gaussian noise with σ = 10^{−2} is added. Subsequently, three approaches are applied to generate reconstruction estimates; the results are shown in Figure 3 c)-f).
For the latent space approach of Section 3.1, a BFGS optimizer is used to numerically compute the MAP of the latent space posterior π(z|y), and application of g yields the estimate. As starting point for the optimization in latent space, the solution to the least squares problem (20) is computed and the encoder mean of the VAE yields the initial guess.
For the Laplace approximated approach, the choice of z_0 is given in Section 3.2. With this expansion point, the mean of the Gaussian posterior π_L(x|y) from Theorem 8 is taken as the estimate.
As a reference, we additionally include a solution obtained by an L² regularized deterministic optimization of the least-squares problem

x_{L²}(λ) = argmin_{x∈X} ‖Ax − y‖² + λ‖x‖².

The regularization parameter is chosen through an oracle method by finding the value of λ for which x_{L²}(λ) has the smallest difference to the ground truth in the L² norm. This reference can be seen as the best possible homoscedastic Gaussian prior for the variable space approach. As a quality measure, the peak-signal-to-noise ratio (PSNR) with respect to the ground truth is assessed, i.e.,

PSNR(x̂, x) = 20 log₁₀(L) − 10 log₁₀(‖x̂ − x‖₂² / d).  (28)

Motivated by the normalized images of the MNIST data set, we take L = 1, which renders the PSNR a rescaling of the mean-square error. A larger PSNR value indicates a better reconstruction result.
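The PSNR of (28) with L = 1 can be computed as in this short sketch (x_est and x_true are placeholder names):

import numpy as np

def psnr(x_est, x_true, L=1.0):
    # Peak-signal-to-noise ratio (28); larger values indicate better reconstructions.
    mse = np.mean((x_est - x_true) ** 2)          # ||x_est - x_true||_2^2 / d
    return 20.0 * np.log10(L) - 10.0 * np.log10(mse)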
In Figure 3 it can be seen that the oracle L² regularized approach is merely able to reconstruct the shape of the original digit. In contrast, the latent space approach yields an estimate which resembles the digit 8, although a 0 was used as ground truth. This behavior can be explained by the closeness of the digits 0 and 8 in the latent space representation and by the highly nonlinear, non-convex optimization problem that has to be solved in Z. In terms of PSNR, the best reconstruction is achieved using the Laplace approach in the variable space X. Here, the blurring of the L² regularized solution is mostly resolved and the resulting digit does not collapse into an 8. Additionally, due to the construction of the posterior π_L(x|y) and its estimate, the covariance is obtained as a byproduct; we show the square root of its diagonal in Figure 3 f). Since the posterior is a Gaussian, this corresponds to the standard deviations of the marginalized posterior for each pixel.
For a statistical and comprehensive comparison, we now consider 100 distinct images from the test set and perform the inference on every image. The results are collected in Figure 4. For all blurring precisions, a similar behaviour of the PSNR can be observed. With decreasing noise variance, i.e. σ → 0, the variable space approach, denoted "Laplace", is usually to be favored over the latent space approach, denoted "Latent". This verifies our theoretical finding that the variable space approach is consistent. For moderately large values of σ, the latent space approach is usually to be favored, since the bias introduced by the generative model is small compared to the impact of the data variance. In those cases, the latent space approach is also superior to the oracle L² regularized approach. The latter holds for the variable space approach only for σ ≤ 0.01.

Figure 4: Peak-signal-to-noise ratio with respect to the ground truth is shown for varying blurring precision η in dependence on the noise variance σ². The negative logarithm of the standard deviation is taken as abscissa in each plot. We show the results of an oracle L² regularized solution, the results of the "Latent" space approach of Section 3.1, the results of the "Laplace" approximated variable space approach of Section 3.2 and a heuristic "Guide" as explained in Section 4.3.

Empirical guidance
We have shown that the variable space approach with the employed Laplace approximation is to be favored in cases where the information contained in the data is large. However, the advice to use our approach is based on an asymptotic result, and in general it is difficult to assess whether the asymptotic regime is already relevant. Therefore, we propose a heuristic argument based on a simple bias estimation to give guidance on the choice of the approach. For given y, both approaches yield an estimate, x_Laplace and x_Latent. Interpreting these estimates as new ground truths allows us to perform the inversion again using virtual data y_Laplace and y_Latent. This subsequent inference yields x_Laplace,Laplace and x_Latent,Latent, for which the squared error with respect to their respective ground truth can be assessed. The method which yields the smaller deviation is to be favored. In Figure 4 we employ this heuristic approach under the label "Guide". It can be observed that using this guide yields, on average, better results than applying only one of the two approaches. Considering squared errors in a cross-validation fashion, i.e. taking x_Laplace as the new observation and applying the latent space approach to obtain x_Latent,Laplace and vice versa, yields a slightly worse guidance in our experiments.
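A sketch of this guide is given below, where estimate_latent and estimate_laplace are assumed wrappers around the two reconstruction procedures described above; the names are placeholders, not part of the released code.

import numpy as np

def guide(y, A, sigma, estimate_latent, estimate_laplace, rng):
    # Re-use each estimate as virtual ground truth, re-invert the corresponding
    # virtual data and keep the approach that reproduces its own estimate better.
    x_lat = estimate_latent(y)
    x_lap = estimate_laplace(y)
    y_lat = A @ x_lat + sigma * rng.normal(size=y.size)   # virtual data
    y_lap = A @ x_lap + sigma * rng.normal(size=y.size)
    err_lat = np.sum((estimate_latent(y_lat) - x_lat) ** 2)
    err_lap = np.sum((estimate_laplace(y_lap) - x_lap) ** 2)
    return x_lap if err_lap <= err_lat else x_lat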

Conclusions and future research
We presented approaches to the solution of linear Bayesian inverse problems using generative models as a prior. Furthermore, we established convergence results for a state-of-the-art inference method in latent space and contrasted them to the asymptotic behaviour of an alternative approach in the high-dimensional variable space. Such an approach in the variable space is usually intractable due to the dimensionality and the complexity of the prior representation. Therefore, we derived a novel inference technique in the variable space based on a Laplace approximation, which yields an analytic posterior distribution that inherits and preserves a Bernstein-von-Mises result. An extension to the case of unknown variance was presented, and numerical examples underpin our theoretical findings. Finally, we motivated an empirical guidance to choose between the presented approaches in real scenarios.
This proof-of-concept work paves the way for future research tackling various questions. Extending the numerical examples to more complex and practical data sets is an important step, which also raises the need for efficient numerical treatment, e.g. to tackle the involved matrix inversions. Different and more complex models for the approximation of the prior in variable space can be pursued, e.g. using Gaussian mixture models or quadrature schemes. In this regard, efficient sampling approaches in high dimensions may be required to obtain samples from the posterior. Also, connections to other recent concepts, such as bilevel optimization [46], can be of interest.

A Visual inference examples
In this section, we highlight some details on the inference results of Section 4. In particular, we consider one ground truth x and fix the blurring operator A with precision η = 4. Then, we thoroughly analyse the performance of the presented approaches for varying noise variance σ². In Figure 5 we show the reconstruction quality of the L² regularized (oracle) method, the "Latent" space approach, and the "Laplace" method, side-by-side. From top to bottom, the noise variance is decreased, i.e., we show σ = 10^{−s} for s = 1, 2, 3, 4. The images represent one estimate for each approach and the box plots below each estimate show the distribution of PSNR values for 100 repeated reconstructions with different noise realizations for the observation y. In the noisy regime (top row), the latent space approach is to be favored and yields, on average, the best PSNR. For decreasing σ², the Laplace approach yields better results. Also, it can be observed that the latent space

Figure 1: Different variants of prior knowledge visualized by samples: (top row) a standard Gaussian model N(0, I), (middle row) realizations of a Gaussian Markov random field (GMRF) with eight nearest neighbor interaction and (bottom row) realizations of a generative model trained on the set of handwritten digits (MNIST). All images are scaled to [0, 1].

Figure 2: Impact of the employed blurring operator on an MNIST image. On the left is the original black and white image, scaled to [0, 1]. From left to right, the application of the blurring operator to the original image for η = 5, 4, 3, 2 is shown.


Figure 3: A single image from the MNIST test set a) is taken as ground truth for subsequent inference. The resulting observation after applying the blurring operator with η = 3 and adding Gaussian noise with σ = 10^{−2} is shown in b). In c) the resulting reconstruction with the oracle L² regularized approach is given. We show in d) the reconstruction result using the latent space approach and in e) the result of the Laplace method. Above the estimated reconstructions, the PSNR value is indicated. In f), the standard deviation of the posterior in original space is presented, i.e., the standard deviation of the marginalized posterior π_L(x_i|y) for each pixel i = 1, …, d.

Figure 5: Visualization of the reconstruction quality of the different approaches. The first column shows the same ground truth x, which is subject to blurring and noise in the second column showing the observation y. From top to bottom, the noise is decreased. The columns L², "Latent", and "Laplace" show one resulting estimate for each approach, and below each estimate we depict the PSNR for 100 repeated experiments, each with a different noise realization.