Non-asymptotic error estimates for the Laplace approximation in Bayesian inverse problems

In this paper we study properties of the Laplace approximation of the posterior distribution arising in nonlinear Bayesian inverse problems. Our work is motivated by Schillings et al. (2020), where it is shown that in such a setting the Laplace approximation error in Hellinger distance converges to zero in the order of the noise level. Here, we prove novel error estimates for a given noise level that also quantify the effect due to the nonlinearity of the forward mapping and the dimension of the problem. In particular, we are interested in settings in which a linear forward mapping is perturbed by a small nonlinear mapping. Our results indicate that in this case, the Laplace approximation error is of the size of the perturbation. The paper provides insight into Bayesian inference in nonlinear inverse problems, where linearization of the forward mapping has suitable approximation properties.


Introduction
The study of Bayesian inverse problems [8,26] has gained wide attention during the last decade as the increase in computational resources and algorithmic development have enabled uncertainty quantification in numerous new applications in science and engineering. Large-scale problems, where the computational burden of the likelihood is prohibitive, are, however, still a subject of ongoing research.
In this paper we study the Laplace approximation of the posterior distribution arising in nonlinear Bayesian inverse problems. The Laplace approximation is obtained by replacing the log-posterior density with its second order Taylor approximation around the maximum a posteriori (MAP) estimate and renormalizing the density. This produces a Gaussian measure centered at the MAP estimate with a covariance corresponding to the inverse Hessian of the negative log-posterior density (see, e.g., [3, Section 4.4]).
The asymptotic behavior of the parametric Laplace approximation in the small noise or large data limit has been studied extensively in the past (see, e.g., [30]). We note that in terms of approximation properties with respect to taking a posterior expectation over a given function, there is a long line of research, which we discuss below. Our work is parallel

Our contributions
The main contribution of this work is threefold: 1. In Theorem 3.4, we derive our central error estimate for the total variation distance of the Laplace posterior approximation in nonlinear Bayesian inverse problems. The error bound consists of two error terms for which we derive an implicit optimal balancing rule in Proposition 3.13. We assume uniform bounds on the third differentials of the log-likelihood and log-prior density as well as a quadratic lower bound on the log-posterior density to control the error. Given such bounds, the error estimate can be numerically evaluated.
2. In Theorem 4.1, we derive a further estimate for the Laplace approximation error that makes the effect of the noise level, the bounds specified above, and the dimension of the problem explicit. This error estimate readily implies linear rates of convergence for fixed problem dimension both in the small noise limit and when the third differential of the log-likelihood goes to zero, see Corollary 4.4. It furthermore leads to a convergence rate for increasing problem dimension in terms of noise level, problem dimension, and the aforementioned bounds aligned with [15], see Corollary 4.6.
3. In Theorem 5.3, we quantify the error of the Laplace approximation in terms of the nonlinearity of the forward mapping for linear inverse problems with a small nonlinear perturbation and Gaussian prior distribution. We assume uniform bounds on the differentials of the nonlinear perturbation up to third order to control the error. This error estimate immediately implies linear convergence in terms of the size of the perturbation. Moreover, such a result provides insight into Bayesian inference in nonlinear inverse problems, where linearization of the forward mapping has suitable approximation properties.

Relevant literature
The asymptotic approximation of general integrals of the form ∫ e^{λf(x)} g(x) dx by Laplace's method is presented in [21,30]. Non-asymptotic error bounds for the Laplace approximation of such integrals have been stated in the univariate [20] and multivariate case [16,7]. The Laplace approximation error and its convergence in the limit λ → ∞ have been estimated in the multivariate case when the function f depends on λ or the maximizer of f is on the boundary of the integration domain [10]. A representation of the coefficients appearing in the asymptotic expansion of the approximated integral utilizing ordinary potential polynomials is given in [18]. Error estimates for the Laplace approximation in TV distance are closely connected to the so-called Bernstein-von Mises (BvM) phenomenon, which quantifies the convergence of the scaled posterior distribution toward a Gaussian distribution in the large data or small noise limit. Parametric BvM theory is well understood [29,12]. Our work is inspired by a BvM result by Lu in [15], where a parametric BvM theorem for nonlinear Bayesian inverse problems with an increasing number of parameters is proved. Similar to our objectives, he quantifies the asymptotic convergence rate in terms of the noise level, the nonlinearity of the forward mapping, and the dimension of the problem. However, our emphasis differs from [15] (and other BvM results) in that we are not restricted to considering the vanishing noise limit, but are more interested in quantifying the effect of small nonlinearity or dimension at a fixed noise level. We also point out that BvM theory has been developed for non-parametric Bayesian inverse problems (see, e.g., [19,17,6]), where the convergence is quantified in a distance that metrizes weak convergence.
Let us conclude by briefly emphasizing that the Laplace approximation is widely utilized for different purposes in computational Bayesian statistics including, i.a., the celebrated INLA algorithm [22]. It has also recently gained popularity in optimal Bayesian experimental design (see, e.g., [23,14,1]). Moreover, it provides a convenient reference measure for numerical quadrature [4,24] or importance sampling [2].

Organization of the paper
Before we present the aforementioned three main results in Sections 3 to 5, we introduce our set-up and notation, Laplace's method, and the total variation metric in Section 2. In Section 3, we introduce our central error bound for the Laplace approximation and explain the idea behind its proof. In Section 4, we derive an explicit error estimate for the Laplace approximation and describe its asymptotic behavior. In Section 5, we prove the error estimate for inverse problems with a small nonlinearity in the forward mapping and a Gaussian prior distribution.

Preliminaries and set-up
We consider for ε > 0 the inverse problem of recovering x ∈ R^d from a noisy measurement y ∈ R^d, where y = G(x) + √ε η, η is random noise with standard normal distribution N(0, I_d), and G: R^d → R^d is a possibly nonlinear mapping. In the following, |·| denotes the Euclidean norm on R^d. If we assume a prior distribution µ on R^d with Lebesgue density exp(−R(x)), then Bayes' formula yields a posterior distribution µ^y. For all x, y ∈ R^d, we denote the scaled negative log-likelihood by Φ(x) := ½|y − G(x)|². If x ↦ Φ(x) + εR(x) has a unique minimizer in R^d, we call this minimizer the maximum a posteriori (MAP) estimate and denote it by x̂ = x̂(y). Furthermore, we set I(x) := Φ(x) + εR(x) − Φ(x̂) − εR(x̂) for all x ∈ R^d. This way, I is nonnegative, the MAP estimate x̂ minimizes I and satisfies I(x̂) = 0. Moreover, we can express the posterior density as exp(−I(x)/ε)/Z with a normalization constant Z.
Laplace's method approximates the posterior distribution by a Gaussian distribution Lµ^y whose mean and covariance are chosen in such a way that its log-density agrees, up to a constant, with the second order Taylor polynomial of the log-posterior density around x̂. If I ∈ C²(R^d, R), the Laplace approximation of µ^y is defined as Lµ^y := N(x̂, εΣ), where Σ := (D²I(x̂))^{−1}. Here, DI denotes the differential of I, and we identify D²I(x̂) with the Hessian matrix {D²I(x̂)(e_j, e_k)}_{j,k=1}^d. The Lebesgue density of Lµ^y is proportional to exp(−‖x − x̂‖²_Σ/(2ε)), where ‖x − x̂‖²_Σ := |Σ^{−1/2}(x − x̂)|², and ‖x − x̂‖²_Σ/(2ε) is precisely the truncated Taylor series of I/ε around x̂.
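The construction above can be illustrated numerically. The following is a minimal one-dimensional sketch (not from the paper): the forward map G, the squared noise level ε, the data y, and the standard normal prior are all assumed values chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy 1-d illustration of the Laplace approximation: posterior density is
# proportional to exp(-I(x)/eps) with I(x) = Phi(x) + eps*R(x) - min,
# Phi(x) = 0.5*(y - G(x))**2. All concrete numbers below are assumptions.
eps = 0.05                              # squared noise level (assumed)
y = 0.8                                 # observed data (assumed)
G = lambda x: x + 0.1 * np.sin(x)       # hypothetical weakly nonlinear forward map
R = lambda x: 0.5 * x**2                # negative log-prior density of a N(0,1) prior

J = lambda x: 0.5 * (y - G(x))**2 + eps * R(x)   # Phi + eps*R (equals I up to a constant)

# MAP estimate: the unique minimizer of J
x_map = minimize(lambda x: J(x[0]), x0=[0.0]).x[0]

# Hessian D^2 I(x_map) via central finite differences
h = 1e-4
d2I = (J(x_map + h) - 2.0 * J(x_map) + J(x_map - h)) / h**2
Sigma = 1.0 / d2I                       # Sigma = (D^2 I(x_map))^{-1}

# Laplace approximation: the Gaussian N(x_map, eps * Sigma)
print(x_map, eps * Sigma)
```

Note that the covariance of the approximating Gaussian is ε·Σ, not Σ itself, matching the scaling of I by ε in the log-posterior density.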
The total variation (TV) distance between two probability measures ν and µ on (R^d, B(R^d)) is defined as d_TV(µ, ν) := sup_{A ∈ B(R^d)} |µ(A) − ν(A)|, see [27]. It has the alternative representation d_TV(µ, ν) = ½ sup_{‖f‖_∞ ≤ 1} ∫ f (dµ/dρ − dν/dρ) dρ, where ‖f‖_∞ := sup_{x ∈ R^d} |f(x)| and ρ can be any probability measure dominating both µ and ν, see Remark 5.9 in [27] and equation (1.12) in [11]. The total variation distance is valuable for the purpose of uncertainty quantification because it bounds the error of any credible region when using a measure ν instead of another measure µ. It can, moreover, be used to bound the difference in expectation of any bounded function f on R^d with respect to µ and ν, respectively, by |E_µ[f] − E_ν[f]| ≤ 2‖f‖_∞ d_TV(µ, ν), see Lemma 1.32 in [11]. By Kraft's inequality, the total variation distance bounds the square of the Hellinger distance, see Definition 1.28 and Lemma 1.29 in [11] or [9], while both metrics induce the same topology. The bounded Lipschitz metric, which induces the topology of weak convergence of probability measures, is trivially bounded by the total variation distance. For further information on the relation between the total variation distance and other probability metrics we refer to the survey paper [5].
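These two properties of the TV distance can be checked numerically. The sketch below (an illustration, not part of the paper) uses the two Gaussians N(0,1) and N(1,1), for which the TV distance has the known closed form 2Φ(1/2) − 1 ≈ 0.3829 with Φ the standard normal CDF, and verifies the expectation bound for a bounded test function.

```python
import numpy as np

# Numerical check of the TV distance for mu = N(0,1), nu = N(1,1):
# d_TV equals half the L1 distance of the densities, and
# |E_mu f - E_nu f| <= 2 * ||f||_inf * d_TV for bounded f.
x = np.linspace(-10.0, 11.0, 210001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)          # density of mu = N(0,1)
q = np.exp(-0.5 * (x - 1.0)**2) / np.sqrt(2 * np.pi)  # density of nu = N(1,1)

d_tv = 0.5 * np.sum(np.abs(p - q)) * dx               # half the L1 distance

f = np.tanh(x)                                        # a bounded test function, ||f||_inf = 1
gap = abs(np.sum(f * p) * dx - np.sum(f * q) * dx)

print(d_tv, gap)   # d_tv should be close to 2*Phi(1/2) - 1 = 0.38292...
```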

Central error estimate
We will use the following ideas to bound the error of the Laplace approximation Lµ^y for a given realization of the data y ∈ R^d. First, we will prove the fundamental estimate (3.1). If we have a radial upper bound f(‖x − x̂‖_Σ) for the integrand on the right hand side of (3.1), we can estimate it further after applying a change of variable to a local parameter u := Σ^{−1/2}(x − x̂). We can then express this integral as a one-dimensional integral using polar coordinates.
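The polar-coordinates reduction used here is the standard identity for radial integrands, which can be verified numerically. The sketch below checks it for the Gaussian integrand g(r) = exp(−r²/2), whose full integral over R^d is (2π)^{d/2}.

```python
import numpy as np
from math import pi, gamma
from scipy.integrate import quad

# Sanity check of the polar-coordinates reduction: for a radial integrand
# g(|u|) on R^d,
#   int_{R^d} g(|u|) du = d * kappa_d * int_0^inf g(r) * r^{d-1} dr,
# where kappa_d = pi^{d/2} / Gamma(d/2 + 1) is the unit-ball volume and
# d * kappa_d is the surface area of the unit sphere.
d = 3
kappa_d = pi ** (d / 2) / gamma(d / 2 + 1)
radial, _ = quad(lambda r: np.exp(-r**2 / 2) * r ** (d - 1), 0.0, np.inf)
lhs = d * kappa_d * radial
rhs = (2 * pi) ** (d / 2)      # the known value of the full Gaussian integral
print(lhs, rhs)
```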
The integrand on the right hand side of (3.1) is very small and flat around x̂, since ½‖x − x̂‖²_Σ is the second order Taylor expansion of I(x) around x̂, and it falls off as |x| → ∞ because it is integrable. Its mass is thus concentrated at an intermediate distance from x̂. This can be seen, e.g., in Figure 1. We exploit this structure by splitting up the integral in (3.1) and bounding the integrand on a Σ-norm ball U(r_0) around the MAP estimate x̂ and on the remaining space R^d \ U(r_0) separately. On U(r_0), we then control the integrand by imposing uniform bounds on the third order differentials of the log-likelihood and the log-prior density. Outside of U(r_0), we control it by imposing a quadratic lower bound on I.
We make the following assumptions on Φ, R, I, x̂, and Σ, which will be further discussed in Remark 3.9.
Let Γ(z) denote the classical gamma function and γ(a, z) the lower incomplete gamma function. Then, Ξ_d(t) := γ(d/2, t/2)/Γ(d/2) describes the probability of a Euclidean ball in R^d with radius √t around 0 under a standard Gaussian measure (see Lemma 3.12).
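This identification of Ξ_d with a Gaussian ball probability can be verified numerically: |Z|² for Z ~ N(0, I_d) is chi-squared with d degrees of freedom, and the regularized lower incomplete gamma function is its distribution function.

```python
import numpy as np
from scipy.special import gammainc   # regularized: gammainc(a, z) = gamma(a, z) / Gamma(a)

# Check that Xi_d(t) = gamma(d/2, t/2) / Gamma(d/2) equals the probability of
# the Euclidean ball of radius sqrt(t) around 0 under N(0, I_d).
rng = np.random.default_rng(0)
d, t = 5, 4.0
xi = gammainc(d / 2, t / 2)

z = rng.standard_normal((200000, d))
mc = np.mean(np.sum(z**2, axis=1) <= t)   # Monte Carlo estimate of P(|Z|^2 <= t)
print(xi, mc)
```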
The main result of this section is the following error estimate.
Theorem 3.4. Suppose that Assumptions 3.1 to 3.3 hold. Then we have for all r_0 ≥ 0, where E_1 and E_2 are defined for all r_0 ≥ 0 and f for all r ≥ 0, respectively.

Remark 3.5. The two functions E_1 and E_2 are continuous and monotonic with the following asymptotic behavior: the first error term E_1(r_0) vanishes as r_0 → 0 and diverges as r_0 → ∞, whereas the second error term E_2(r_0) decreases toward 0 as r_0 → ∞. This can be seen as follows.
The function f(r)r^{d−1} is bounded on the interval [0, 1], so that the integral ∫_0^{r_0} f(r)r^{d−1} dr converges to 0 as r_0 → 0, and hence also E_1(r_0). On the other hand, f(r)r^{d−1} tends to ∞ as r → ∞, so that the integral ∫_0^{r_0} f(r)r^{d−1} dr and E_1(r_0) tend to ∞ as r_0 → ∞. Since f(r)r^{d−1} is positive for all r ≥ 0, E_1 moreover increases monotonically. By definition of the lower incomplete gamma function, Ξ_d increases monotonically, so that E_2 decreases monotonically and converges toward 0 as r_0 → ∞. The asymptotic behavior of E_2 is described more precisely in Lemma 4.3.
The following three propositions formalize the ideas described at the beginning of this section and constitute the proof of Theorem 3.4.

Proposition 3.6 (Fundamental estimate). The Laplace approximation
Now, the estimate yields the proposition.
Proposition 3.7 (Close range estimate). Suppose that Assumption 3.2 holds. Then it follows that for all r_0 ≥ 0, where f and c_d are defined as in Theorem 3.4.
Proposition 3.8 (Far range estimate). Suppose that Assumption 3.3 holds. Then we have for all r_0 ≥ 0.
The proof of Theorem 3.4 is now very short.
Proof of Theorem 3.4. By Proposition 3.6 we have the fundamental estimate (3.1). Now, splitting up this integral into integrals over U(r_0) and its complement and applying Propositions 3.7 and 3.8 proves the statement.
Remark 3.9. 1. Because of I(x̂) = 0 and the necessary optimality condition DI(x̂) = 0, the function R_2(x) = I(x) − ½‖x − x̂‖²_Σ defined in Proposition 3.6 is precisely the remainder of the second order Taylor polynomial of I around x̂. In Proposition 3.7, Assumption 3.2 is used to control R_2 near the MAP estimate by bounding the third order differential of I. In Proposition 3.8, in turn, Assumption 3.3 is used to control R_2 at a distance from x̂ by bounding it from below.
2. The constant K ≥ 0 in Assumption 3.2 quantifies the non-Gaussianity of the likelihood and the prior distribution and can be arbitrarily large. Assumption 3.3 bounds the unnormalized log-posterior density from above by a multiple of the unnormalized log-density of the Laplace approximation, where the constant δ > 0 represents the scaling factor and can be arbitrarily small. This restricts our results to posterior distributions whose tail does not decay slower than that of a Gaussian distribution. Assumption 3.3 can, for example, be violated if a prior distribution with a heavier than Gaussian tail is chosen, such as a Cauchy distribution, and if the forward mapping is linear but singular.
Our main interest lies in inverse problems with a posterior distribution that is not too different from a Gaussian distribution, since this is a setting in which the Laplace approximation can be expected to yield reasonable results.
3. In case of a linear inverse problem and a Gaussian prior distribution, the Laplace approximation is exact, so that Assumptions 3.2 and 3.3 are trivially satisfied with K = 0 and δ = 1. We will see in Section 5 that Assumptions 3.2 and 3.3 are satisfied for nonlinear inverse problems with δ and K as given in Propositions 5.6 and 5.7 if the prior distribution is Gaussian and the nonlinearity of the forward mapping is small enough. In this case, the quadratic lower bound on I in Assumption 3.3 restricts the nonlinearity of the forward mapping to be small enough that the tail of the posterior distribution does not decay slower than that of a Gaussian distribution.
4. Note that neither in Section 3 nor in Section 4 do we make use of the Gaussianity of the noise. Therefore, the results of these sections remain valid for non-Gaussian noise as long as the log-likelihood satisfies Assumptions 3.2 and 3.3. In case of noise with a log-density −ν ∈ C³(R^d), the negative log-likelihood takes the form Φ(x) = ν(y − G(x)) and we have I(x) = ν(y − G(x)) + εR(x) + c. Consider, for example, standard multivariate Cauchy noise, where ν(η) = ((d+1)/2) ln(1 + |η|²) + c for all η ∈ R^d. The derivatives up to third order of s ↦ ln(1 + s), s ≥ 0, are bounded since they are continuous and converge to 0 as s tends to infinity. By the smoothness of x ↦ |x|², ν is therefore in C³(R^d) and D³ν is uniformly bounded. In case of a linear forward mapping, the uniform boundedness transfers to D³Φ(x) and the bound in Assumption 3.2 can be estimated for any symmetric positive definite matrix Σ.
5. We make Assumptions 3.2 and 3.3 globally, i.e., for all x ∈ R^d, for the sake of simplicity. For a given r_0 ≥ 0, Theorem 3.4 remains valid if Assumption 3.2 only holds for ‖x − x̂‖_Σ ≤ r_0 and if Assumption 3.3 only holds for ‖x − x̂‖_Σ ≥ r_0. This allows for prior distributions which are not supported on the whole space R^d, as long as the support of the prior contains the set U(r_0) and R ∈ C³(U(r_0)). In this case, R and I are allowed to take values in R̄ := R ∪ {∞} and we follow the convention exp(−∞) = 0.
6. The constant K in Assumption 3.2 can be replaced by a radial bound ρ(‖x − x̂‖_Σ) with a monotonically increasing function ρ. This way, an estimate of the form (3.2) can be obtained with f replaced accordingly.
Remark 3.10. Both the unnormalized posterior density exp(−I(x)/ε) and the unnormalized Gaussian density exp(−‖x − x̂‖²_Σ/(2ε)) attain their maximum 1 in x̂. The densities of µ^y and Lµ^y themselves, however, take the values 1/Z and 1/Z̃ in x̂ due to the different normalization, see Figure 1. For this reason, the probability of small balls around x̂ under µ^y and Lµ^y differs asymptotically by a factor of Z̃/Z. This has several consequences in case the normalization constants Z and Z̃ differ considerably.
On the one hand, credible regions around x̂ may have considerably different size under the posterior distribution and its Laplace approximation. On the other hand, the integrand of the total variation distance d_TV(µ^y, Lµ^y) may, unlike the integrand of the fundamental estimate (3.1), have a significant amount of mass around x̂, see Figure 1. This means that a significant portion of the error when approximating the probability of an event under µ^y by that under Lµ^y may be due to the difference in their densities near the MAP estimate x̂. So although the Laplace approximation is defined by the local properties of the posterior distribution at the MAP estimate, it is not necessarily a good local approximation around it.
A large difference in the normalization constants Z and Z̃ as mentioned above reflects that the log-posterior density cannot be approximated well globally by its second order Taylor polynomial around x̂. In the proof of Proposition 3.6, we saw that the difference in normalization is in fact bounded by the total variation of the unnormalized densities. The value of Proposition 3.6 lies in providing an estimate for the total variation error that only involves unnormalized densities.
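The boundedness claim for Cauchy noise in Remark 3.9(4) can be checked symbolically. The sketch below verifies that the derivatives up to third order of s ↦ ln(1 + s) are bounded on [0, ∞) and tend to 0, using SymPy as an assumed tool (the computation itself is elementary calculus).

```python
import sympy as sp

# Derivatives up to third order of s -> ln(1 + s) on [0, infinity):
# 1/(s+1), -1/(s+1)**2, 2/(s+1)**3 -- all continuous, bounded, and
# converging to 0 as s -> infinity, as claimed in Remark 3.9(4).
s = sp.symbols('s', nonnegative=True)
f = sp.log(1 + s)
d1, d2, d3 = [sp.diff(f, s, k) for k in (1, 2, 3)]
print(d1, d2, d3)
```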
In the following subsections, we present the proofs of our close and far range estimate, and characterize the optimal choice of r 0 .

Proof of Proposition 3.7
We consider the close range integral over the Σ-norm ball with radius r_0 ≥ 0. The proof of our close range estimate is based upon the following estimate for the remainder term R_2(x).
Lemma 3.11. If Assumption 3.2 holds, then we have the following bound.
Proof. We set h := x − x̂, write the remainder of the second order Taylor polynomial of Φ in mean-value form, and estimate it using Assumption 3.2. By proceeding similarly for R, we then obtain the claim.
Now, we can prove our close range estimate.
Proof of Proposition 3.7. By Lemma 3.11 and (2.3), we obtain the following bound. Here, κ_d := π^{d/2}/Γ(d/2 + 1) denotes the volume of the d-dimensional Euclidean unit ball (dκ_d is its surface area). Using the fundamental recurrence Γ(z + 1) = zΓ(z), we can write this in the stated form.
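The constants appearing in this proof can be checked numerically; the sketch below verifies the unit-ball volume formula by Monte Carlo for d = 3, where κ_3 = 4π/3.

```python
import numpy as np
from math import pi, gamma

# Monte Carlo check that kappa_d = pi^{d/2} / Gamma(d/2 + 1) is the volume of
# the d-dimensional Euclidean unit ball (so d * kappa_d is its surface area).
rng = np.random.default_rng(2)
d = 3
kappa_d = pi ** (d / 2) / gamma(d / 2 + 1)

pts = rng.uniform(-1.0, 1.0, size=(400000, d))
inside = np.mean(np.sum(pts**2, axis=1) <= 1.0)
mc_volume = inside * 2**d      # fraction of the cube [-1, 1]^3 lying in the ball
print(kappa_d, mc_volume)
```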

Proof of Proposition 3.8
Now, we consider the integral over the space outside of a Σ-norm ball with radius r_0 ≥ 0. In the proof of our far range estimate, the following expression is used to describe the probability of R^d \ U(r_0) under Lµ^y. Let Γ(a, z) denote the upper incomplete gamma function.
Proof. We compute the tail integral explicitly using a local parameter and polar coordinates. We can express the resulting integral in terms of the upper incomplete gamma function by substituting s = δr²/(2ε) (note that r(s) = (2εs/δ)^{1/2}). Now, using the fundamental recurrence Γ(z + 1) = zΓ(z) completes the proof.
Now, we can prove our far range estimate.
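The tail computation can be illustrated numerically. The sketch below is a stand-in for Lemma 3.12 with δ = 1 (an assumption for illustration, since the lemma's bound involves a general δ): under a Gaussian with covariance εΣ, the quantity ‖x − x̂‖²_Σ/ε is chi-squared with d degrees of freedom, so the Σ-norm tail probability is Γ(d/2, r_0²/(2ε))/Γ(d/2), regardless of Σ.

```python
import numpy as np
from scipy.special import gammaincc   # regularized: gammaincc(a, z) = Gamma(a, z) / Gamma(a)

# Tail probability P(||x - xhat||_Sigma >= r0) under N(xhat, eps * Sigma),
# compared with a Monte Carlo estimate. Since ||x - xhat||_Sigma^2 / eps is
# chi-squared with d degrees of freedom for any Sigma, we may sample with
# Sigma = I. All numbers are assumed for illustration.
rng = np.random.default_rng(1)
d, eps, r0 = 4, 0.1, 1.2
tail = gammaincc(d / 2, r0**2 / (2 * eps))

z = np.sqrt(eps) * rng.standard_normal((200000, d))
mc = np.mean(np.sum(z**2, axis=1) >= r0**2)
print(tail, mc)
```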
Proof of Proposition 3.8. Let x ∈ R^d \ U(r_0). We distinguish between two cases. First, consider the case that R_2(x) ≥ 0. For t ≥ 0, the estimate |e^{−t} − 1| ≤ 1 holds. Next, consider the case that R_2(x) < 0, in which Assumption 3.3 applies. Together, the two cases yield the claimed pointwise bound, and the proposition then follows from Lemma 3.12 and the identity Γ(a, z) = Γ(a) − γ(a, z).

Optimal choice of the parameter
We have the following necessary optimality condition for the parameter r 0 in Theorem 3.4.
Proposition 3.13. The optimal choice of r_0 in the error bound (3.2) is either 0 or satisfies the stationarity condition E_1′(r_0) = −E_2′(r_0).
Proof. The terms E_1 and E_2 are differentiable on [0, ∞). Clearly, the optimal r_0 is either 0 or a stationary point of E_1 + E_2. Computing the derivatives of both terms yields the result.
Remark 3.14. The right hand side of the far range estimate (3.6) can be written in integral form, where c_d is defined as in Theorem 3.4. The optimal choice of r_0 is therefore one for which the integrands f(r_0) r_0^{d−1} and exp(−δr_0²/(2ε)) r_0^{d−1} of the close and the far range estimate take the same value.
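Since the r_0^{d−1} factors cancel, the balancing rule amounts to solving f(r_0) = exp(−δr_0²/(2ε)). The sketch below locates this crossing point numerically; the close-range bound f is not reproduced in this excerpt, so we use a hypothetical f(r) = exp(K r³) − 1 with assumed constants K, δ, ε, purely to show how the balancing equation can be solved.

```python
import numpy as np
from scipy.optimize import brentq

# Numerical illustration of the balancing rule from Remark 3.14:
# find r0 with f(r0) = exp(-delta * r0^2 / (2 * eps)).
# f below is a hypothetical stand-in motivated by the third-order bounds;
# K, delta, eps are assumed values.
K, delta, eps = 0.5, 0.9, 0.05
f = lambda r: np.exp(K * r**3) - 1.0                   # increasing, f(0) = 0
g = lambda r: np.exp(-delta * r**2 / (2 * eps))        # decreasing, g(0) = 1

# f - g changes sign on (0, 3], so a root exists and is found by bisection.
r0 = brentq(lambda r: f(r) - g(r), 1e-6, 3.0)
print(r0)
```

The monotonicity of f − g (increasing minus decreasing) guarantees that the crossing point is unique, mirroring the trade-off between the two error terms.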

Explicit error estimate
Here, we present a non-asymptotic error estimate in terms of K, δ, ε, and the problem dimension d. While Theorem 3.4 constitutes a non-asymptotic error estimate and is the sharpest of our three main results, it is not immediately clear how the non-Gaussianity of the likelihood and the prior distribution, as quantified by the constant K in Assumption 3.2, the noise level, and the problem dimension affect the error bound. The purpose of the following theorem is to make this influence more explicit.
with C := √(2e)/3, then the error estimate (4.2) holds.
In order to prove this theorem, we introduce an exponential tail estimate for the Laplace approximation, which is a modified version of [25, Prop. 4].
According to Theorem 3.4, we then have the corresponding bound. For all t ≥ 0, the exponential function satisfies the estimate exp(t) − 1 = (1 − exp(−t)) exp(t) ≤ t exp(t). By the choice of r_0, the integral is then bounded by e ∫_0^{r_0} exp(−r²/(2ε)) r^{d+2} dr.
By substituting s = r²/(2ε), we can in turn express this integral as ½ (2ε)^{(d+3)/2} γ((d+3)/2, r_0²/(2ε)). Now, we use the inequality γ(a, z) ≤ Γ(a) and condition (4.1) to bound this expression further.
Thus, we may apply Lemma 4.3, which yields the desired bound by condition (4.1). Now, we obtain the claim by summing up.

Asymptotic behavior for fixed and increasing problem dimension
Now, we describe the convergence of the Laplace approximation for a sequence of nonlinear problems that satisfy Assumptions 3.1 to 3.3 with varying bounds {K_n}_{n∈N} and {δ_n}_{n∈N}, respectively, and varying squared noise levels {ε_n}_{n∈N}, both in case of a fixed and an increasing problem dimension. We denote the data by y_n, the negative log-prior density by R_n, and the scaled negative log-likelihood by Φ_n, and define I_n accordingly. First, we consider the case that the problem dimension d remains constant.
Proof. Since {δ_n}_{n∈N} is bounded from below and {ε_n}_{n∈N} is bounded from above, the left hand side of (4.1) is bounded from below by C_1/(ε_n^{1/2} K_n) for some C_1 > 0. On the other hand, the right hand side of (4.1) is bounded from above by a logarithmic expression for large enough n and some C_2 > 0, since {δ_n}_{n∈N} is bounded from below and {ε_n}_{n∈N} is bounded from below by 0. Consequently, there exists N_1 ≥ N_0 such that condition (4.1) holds for all n ≥ N_1 by the convergence ε_n^{1/2} K_n → 0 and since lim_{t→∞} t^{−2/3} ln t = 0. Now, Theorem 4.1 yields the proposition.
Remark 4.5. Corollary 4.4 covers two cases of particular interest: that of K_n → 0 while ε_n = ε remains constant, which yields a rate of K_n, and that of ε_n → 0 while K_n = K remains constant, which yields a rate of ε_n^{1/2}. The former case can, for example, occur if the sequence of forward mappings G_n converges pointwise toward a linear mapping, see Section 5. The convergence rate in the latter case, i.e., in the small noise limit, agrees with the rate established in [25, Theorem 2] if we set ε_n = 1/n.
Now, we consider the case of an increasing problem dimension d → ∞. To this end, we index K_d, δ_d, ε_d, and R_d by d ∈ N.
Proof. We can write condition (4.1) in the form (4.5). By [28, pp. 67-68], the first term on the right hand side of (4.5) is bounded from above by 8e times an expression involving some C_1 > 0, which in turn is bounded for large enough d, due to the convergence ε_d^{1/2} K_d → 0, the boundedness from above of {ε_d}_{d∈N}, and since lim_{t→∞} t^{−2/3} ln t = 0. By (4.3), the second term on the right hand side of (4.5) is bounded from above by (4.7) for all d ≥ N_0 as well. Due to the assumption δ_d ≤ e^{−1/2} and (4.7), condition (4.3) also ensures that (4.4) is satisfied for all d ≥ N_0. Therefore, condition (4.1) is satisfied for large enough d. Now, Theorem 4.1 and (4.6) yield that for every C > 2√(2e)/3 there exists N_1 ≥ N_0 such that the claimed bound holds for all d ≥ N_1.

Perturbed linear problems with Gaussian prior
In this section we consider the case that the forward mapping G is given by a linear mapping with a small nonlinear perturbation, i.e., that G_τ(x) = Ax + τF(x) with A ∈ R^{d×d}, F ∈ C³(R^d, R^d), and τ ≥ 0. We then quantify the error of the Laplace approximation for small τ, that is, when the nonlinearity of the forward mapping is small, and for fixed problem dimension d. In order to isolate the effect of the nonlinearity on estimate (4.2), we consider the case when not only the noise, but also the prior distribution is Gaussian. This ensures that all non-Gaussianity in the posterior distribution results from the nonlinearity of G_τ. We assign a prior distribution µ = N(m_0, Σ_0) with m_0 ∈ R^d and symmetric, positive definite Σ_0 ∈ R^{d×d}, i.e., we set R(x) = ½‖x − m_0‖²_{Σ_0} up to an additive constant. For each τ ≥ 0 we denote the data by y_τ and the scaled negative log-likelihood by Φ_τ(x) = ½|y_τ − G_τ(x)|². We make the following assumptions on the function I_τ and the perturbation F. Let B(r) ⊂ R^d denote the closed Euclidean ball with radius r around the origin.
Assumption 5.1. We assume that there exists τ_0 > 0 such that for all τ ∈ [0, τ_0], I_τ has a unique minimizer x̂_τ with D²I_τ(x̂_τ) > 0. Furthermore, we assume that y_τ, x̂_τ, and Σ_τ := D²I_τ(x̂_τ)^{−1} converge as τ → 0 with lim_{τ→0} Σ_τ > 0, and denote their limits by y, x̂, and Σ, respectively.

Assumption 5.2.
There exist constants C_0, ..., C_3 > 0 and τ_0 > 0 such that ‖D^j F(x)‖_{Σ_τ} ≤ C_j, j = 0, ..., 3, for all x ∈ R^d and τ ∈ [0, τ_0], and there exists M > 0 such that D³F vanishes outside of the ball B(M).
The idea behind the following theorem is to make explicit how the nonlinearity of the forward mapping, as quantified by τ and the constants C_0, ..., C_3, M in Assumption 5.2, influences the total variation error bound of Theorem 4.1.
Theorem 5.3. Suppose that Assumptions 5.1 and 5.2 hold. Then, there exists τ_1 > 0 such that the stated error bound holds for all τ ∈ [0, τ_1].
Remark 5.4. 1. The choice of the upper bound τ_1 is made explicit in the proof of Theorem 5.3 and depends on d and ε, i.a., through δ_0 as defined in Proposition 5.6.
The proof of Theorem 5.3 can be adapted to yield a result analogous to Corollary 4.6 in the case when the problem dimension d tends to ∞ while the size τ_d of the perturbation tends to 0. Then, δ_{τ_d} may converge to 0, and (4.3) imposes a bound on the rate at which {τ_d}_{d∈N} tends to 0.
2. By the boundedness and continuity of F, G_τ Γ-converges toward A. By the fundamental theorem of Γ-convergence and Assumption 5.1, this, in turn, implies that x̂ is the minimizer of the limit functional.
3. Theorem 5.3 remains valid if the assumption that D³F ≡ 0 outside of a bounded set is replaced by a suitable weaker condition.
In order to prove Theorem 5.3, we first show that Assumptions 3.2 and 3.3 are satisfied for small enough τ and determine the bounds K_τ and δ_τ. Then, we derive the error estimate for the perturbed linear case.
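The linear decay in τ predicted by Theorem 5.3 can be observed numerically. The following is a one-dimensional sketch with invented quantities (a = 1, F = sin, ε = 0.1, y = 0.5, standard normal prior), none of which come from the paper; it compares the posterior and its Laplace approximation on a grid and reports the TV error as τ shrinks.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# 1-d illustration of Theorem 5.3: for G_tau(x) = a*x + tau*F(x) with a
# Gaussian prior, the TV error of the Laplace approximation shrinks with tau.
def tv_error(tau, a=1.0, eps=0.1, y=0.5):
    F = np.sin                                           # hypothetical smooth perturbation
    G = lambda x: a * x + tau * F(x)
    J = lambda x: 0.5 * (y - G(x))**2 + eps * 0.5 * x**2  # Phi + eps*R

    x_map = minimize_scalar(J, bounds=(-5.0, 5.0), method='bounded').x
    h = 1e-4
    d2I = (J(x_map + h) - 2.0 * J(x_map) + J(x_map - h)) / h**2  # D^2 I(x_map)

    x = np.linspace(-6.0, 6.0, 20001)
    dx = x[1] - x[0]
    post = np.exp(-(J(x) - J(x_map)) / eps)
    post /= post.sum() * dx                              # normalized posterior density
    lap = np.exp(-0.5 * d2I * (x - x_map)**2 / eps)
    lap /= lap.sum() * dx                                # Laplace approximation density
    return 0.5 * np.sum(np.abs(post - lap)) * dx         # TV distance on the grid

errs = [tv_error(t) for t in (0.4, 0.2, 0.1)]
print(errs)   # shrinking with tau
```

For τ = 0 the forward map is linear and the Laplace approximation is exact, so the TV error vanishes (up to discretization), consistent with Remark 3.9(3).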

Verifying Assumption 3.3
We verify that Assumption 3.3 holds for small enough τ and determine δ_τ. First, we estimate I_τ from below.
Lemma 5.5. For all τ ≥ 0 and x ∈ R^d, we have the following lower bound.
Proof. Since x̂_τ satisfies the necessary optimality condition DI_τ(x̂_τ) = 0, we can expand I_τ(x) accordingly for all x ∈ R^d. We then estimate the log-likelihood term using the Cauchy-Schwarz inequality, and the log-prior density directly. Now, adding up (5.3) and (5.4) multiplied by ε yields the proposition.
Proposition 5.6. Suppose that Assumption 5.2 holds. Then there exists τ_0 > 0 such that I_τ satisfies Assumption 3.3 with the constant δ_τ specified below.
Proof. Let x ∈ R^d be arbitrary. By the mean value theorem, there exist intermediate points for which the corresponding estimates hold for all τ ≤ τ_0. Now, Lemma 5.5 yields the claimed bound. It remains to show that lim_{τ→0} δ_τ > 0, which follows from the convergence of y_τ and x̂_τ.

Verifying Assumption 3.2
Now, we verify that Assumption 3.2 holds for small τ. The following proposition also describes how the nonlinearity of the forward mapping translates into non-Gaussianity of the likelihood, as quantified by the constant K_τ.

Outlook
In this paper we prove novel error estimates for the Laplace approximation when applied to nonlinear Bayesian inverse problems. Here, the error is measured in TV distance and our estimates aim to quantify effects independent of the noise level. Our central error estimate in Theorem 3.4 is of particular use for high-dimensional problems because it can be evaluated without integrating in R^d. Our estimate in Theorem 4.1 makes the influence of the noise level, the nonlinearity of the forward operator, and the problem dimension explicit. Our estimate for perturbed linear problems in Theorem 5.3, in turn, specifies in more detail how the properties of the nonlinear perturbation affect the approximation error. We point out that our central estimate diverges with increasing dimension for a fixed noise level and forward mapping, and therefore such asymptotics do not provide any added value compared to the trivial TV upper bound of 1. This unsatisfactory observation is natural since the limiting posterior and Laplace approximation (if well-defined) are singular with respect to each other and, consequently, the TV distance is maximized. Future study is therefore needed to establish similar bounds in distances that metrize weak convergence, such as the 1-Wasserstein distance. Such an effort would be aligned with recent developments in BvM theory that extend to nonparametric Bayesian inference and, in particular, Bayesian inverse problems.

Figure 1: The probability densities of a posterior distribution µ^y and its Laplace approximation Lµ^y (left), as well as the integrands of the total variation distance between µ^y and Lµ^y and of the fundamental estimate (3.1) (right).

Corollary 4.4 (Fixed problem dimension). Suppose that I_n, Φ_n, and R_n satisfy Assumptions 3.1 to 3.3. If ε_n^{1/2} K_n → 0 and if there exist δ > 0 and N_0 ∈ N such that

Corollary 4.6 (Increasing problem dimension). Suppose that I_d, Φ_d, and R_d satisfy Assumptions 3.1 to 3.3 and that ε