Convergence and adaptive discretization of the IRGNM Tikhonov and the IRGNM Ivanov method under a tangential cone condition in Banach space

In this paper we consider the iteratively regularized Gauss–Newton method (IRGNM) in its classical Tikhonov version as well as two further—Ivanov type and Morozov type—versions. In these two alternative versions, regularization is achieved by imposing bounds on the solution or by minimizing some regularization functional under a constraint on the data misfit, respectively. We do so in a general Banach space setting and under a tangential cone condition, while convergence (without source conditions, thus without rates) has so far only been proven under stronger restrictions on the nonlinearity of the operator and/or on the spaces. Moreover, we provide a convergence result for the discretized problem with an appropriate control on the error and show how to provide the required error bounds by goal oriented weighted dual residual estimators. The results are illustrated for an inverse source problem for a nonlinear elliptic boundary value problem, for the cases of a measure valued and of an \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L^\infty $$\end{document}L∞ source. For the latter, we also provide numerical results with the Ivanov type IRGNM.


Introduction
In this paper we consider a nonlinear ill-posed operator equation where the possibly nonlinear operator F : D(F) ⊆ X → Y with domain D(F) maps between real Banach spaces X and Y . We are interested in the ill-posed situation, i.e., F fails to be continuously invertible, and the data are contaminated with noise, thus regularization has to be applied (see, e.g., [8,27], and references therein). Throughout this paper we will assume that an exact solution x † ∈ D(F) of (1) exists, i.e., F(x † ) = y, and that the noise level δ in the (deterministic) estimate is known. Partially we will also refer to the formulation of the inverse problem as a system of model and observation equation Here A : X × V → W * and C : V → Y are the model and observation operator, so that with the parameter-to-state map S : X → V satisfying A(x, S(x)) = 0 and F = C • S, (1) is equivalent to the all-at-once formulation (3), (4). Newton type methods for the solution of nonlinear ill-posed problems (1) have been extensively studied in Hilbert spaces (see, e.g., [2,20] and the references therein) and more recently also in a in Banach space setting. In particular, the iteratively regularized Gauss-Newton method [1] can be generalized to a Banach space setting by calculating iterates x δ k+1 in a Tikhonov type variational form as see, e.g., [11,16,17,21,28] where p ∈ [1, ∞), (α k ) k∈N is a sequence of regularization parameters, and R is some nonnegative regularization functional. Alternatively, one might introduce regularization by imposing some bound ρ k on the norm of x, or, again, generally, on a regularization functional of x which corresponds to Ivanov regularization or the method of quasi solutions, see, e.g., [7,[13][14][15]22,24,26]. A third way of incorporating regularization in a Newton type iteration is Morozov regularization, also called the method of the residuals, see, e.g., [9,22,23] x δ k+1 ∈ argmin x∈C R(x) such that F ( for some σ ∈ (0, 1), where the choice of the bound in the inequality constraint is very much inspired by the inexact Newton type regularization parameter choice in [10].
We restrict ourselves to the norm in Y as a measure of the data misfit, but the analysis could as well be extended to more general functionals S satisfying certain conditions, as e.g., in [11,28]. Here C is a set (possibly chosen with convenient properties for carrying out the minimization) containing x † and being contained in D(F), such that F satisfies additional conditions on C see (8), (10) below. If F is defined on all of X , then the minimization problem (5) can be posed in an unconstrained way C = X .
As a restriction on the nonlinearity of the forward operator F we impose the tangential cone condition (8) (also called Scherzer condition, cf. [25]) for some constant c tc < 1/3. Here, for any r > 0, is a sublevel set of the regularization functional and R will be specified in the convergence result Theorem 1. Note that the convergence conditions imposed in [11,16,17,21,28] in the situation without source condition, namely local invariance of the range of F (x) * , are slightly stronger, since this adjoint range invariance is sufficient for (8). However, most probably the gap is not very large, as in those application examples where (8) has been verified, the proof of (8) is actually often done via adjoint range invariance. In (5), (6), (7), the bounded linear operator F (x) is not necessarily a Gâteaux or Fréchet derivative of F, but just some local linearization (in the sense of (8)), satisfying additionally the weak closedness condition In here, T X and T Y are topologies on X and Y (e.g., just the weak or weak* topologies) such that bounded sets in Y are T Y -compact and the norm in Y is T Y -lower semicontinuous. The remainder of this paper is organized as follows. In Sect. 2 we state and prove convergence results in the continuous and discretized setting. Section 3 shows how to actually obtain the required discretization error estimates by a goal oriented weighted dual residual approach and Sect. 4 illustrates the theoretical findings by an inverse souce problem for a nonlinear PDE. In Sect. 5 we provide some numerical results for this model problem and Sect. 6 concludes with some remarks.

Convergence
In this section we will study convergence of the IRGNM iterates first of all in a continuous setting, then in the situation of having discretized for computational purposes.
The regularization parameters α k , ρ k , σ are chosen a priori (note that ( 2c ct 1−c ct ) p < 1 for c tc < 1/3), and with τ as in (14), and the iteration is stopped according to the discrepancy principle with some fixed τ > 1 chosen sufficiently large but independent of δ.
To handle the power p we make use of the following inequalities that can be proven by solving extremal value problems, see the "Appendix" for all a, b > 0, p ≥ 1 and γ, ∈ (0, 1), where for the right hand inequality to hold, additionally a ≥ b is needed.
for γ, ∈ (0, 1). So in order for this recursion to yield geometric decay of F(x δ k ) − y δ , we need to ensure for a proper choice of , γ ∈ (0, 1). To obtain the largest possible (and therefore least restrictive) bound on c tc , we rewrite the requirement above as as can be found out by evaluating the derivative of φ Thus we will furtheron set = 1 2 and assume that γ > 0 is sufficiently small so that (26) holds with = 1 2 , i.e., (19). Then, using (11), estimate (25) can be written as which we first of all regard as a recursive estimate for d k .
To derive a similar estimate also in the complementary case we use that fact that, for d k as in (18), this inequality just means and, using (22) and the left hand part of (24), hence after addition we again get (27) (even with a slightly smaller value of q : . Thus in both cases, using Lemma 2 we can conclude that This finishes the induction proof of (17) for all k ∈ {0, . . . , k * (δ, y δ )}. We next show that the discrepancy stopping criterion from (14), i.e., d k * ≤τ δ p forτ = 2 1− p (1 − c tc ) p τ p , will be satisfied after finitely many, namely O(log(1/δ)), steps. For this purpose, note thatτ > C 1−q , provided τ is chosen sufficiently large, which we assume to be done. Thus, indeed, using (11), (21), we have where the right hand side falls belowτ δ p as soon as Thus we get the upper estimate k * (δ, y δ ) ≤k(δ) = O(log(1/δ)).
For the Ivanov version (6), it only remains to show finiteness of the stopping index, as boundedness of the R values by R = ρ holds by definition. Applying the minimality argument with x † being admissible [cf. (12)] to (6) leads to the special case p = 1, Our notation becomes which gives and by induction, one can conclude where the right hand side is smaller thanτ so that we can again conclude k * (δ, y δ ) ≤k(δ) = O(log(1/δ)).
Finally we consider (7), where boundedness of the R values by R = R(x † ) holds by minimality and the fact that x † is admissible, cf. (16). Geometric decay of the residuals follows by the estimate and (13), i.e., so that similarly to above we end up with a logarithmic estimate for k * .
Remark 1 Convergence of R(x δ k * (δ,y δ ) ) to R(x † ) as δ → 0 holds along the T X convergent subsequence according to Theorem 1 (ii), first of all for the Morozov and the Ivanov version of the IRGNM, with the choice ρ = R(x † ) for the latter, since in both cases R(x δ k * (δ,y δ ) ) ≤ R(x † ) holds for all δ and R is T X lower semicontinuous. The same holds true also for the Tikhonov version with the alternative choice of α k such that for some constants σ , σ satisfying c tc + 1+c tc τ < σ < σ < 1 in place of (11), as can be seen directly from (22). If R is defined by the norm on a space with the Kadets-Klee property, and T X is the weak topology of this space, then this implies norm convergence of x δ k * (δ,y δ ) to x † along the same subsequence.

Remark 2
The fact that x δ k stays in B R [cf. (15)] is crucial for the applicability of the tangential cone condition (8) in these iterates. If the functional R quantifies some distance to an a priori guess x 0 , (e.g., R = x − x 0 q for some norm · and some q > 0), then x ∈ B R with small R means closeness of x to x 0 in a certain sense. Thus, the smaller R is, the better (8) might get achievable with some c tc < 1 3 . On the other hand, making R according to (15) small means closeness of x † to x 0 . Thus we deal with local convergence, as typical for Newton type methods. Now we consider the appearance of discretization errors in the numerical solution of (5), (6) arising from restriction of the minimization to finite dimensional subspaces X k h and leading to discretized iterates x δ k,h and an approximate version F k h of the forward operator i.e., we consider the discretized version of Tikhonov-IRGNM (5) and of Morozov-IRGNM (7) respectively. Moreover, also in the discrepancy principle, the residual is replaced by its actually computable discretized version We define the auxiliary continuous iterates and respectively in order to be able to use minimality, i.e., compare with the continuous exact solution x † . For an illustration we refer to [18, Figure 1]. First of all, we assess how large the discretization errors can be allowed to still enable convergence. Later on, in Sect. 3, we will describe how to really obtain such estimates a posteriori and to achieve the prescribed accuracy by adaptive discretization.

Corollary 1 Let the assumptions of Theorem 1 be satisfied and assume that the discretization error estimates
(note that no absolute value is needed in (38), (40); moreover, (40) is only be needed for (5) and (7)) hold with for all k ≤ k * (δ, y δ ) and constants c η , c ξ > 0 sufficiently small,ζ > 0. Then the assertions of Theorem 1 remain valid for with (34) in place of (14) and (42) in place of (23).
Proof For the Tikhonov version (31), in order to inductively estimate R(x δ k+1,h ), given x δ k ∈ B R , note that from (43) with k + 1 replaced by k, we get like in (23) that for γ,γ , c η ∈ (0, 1), which are chosen small enough so that q < θ. As before, from the minimality of x δ k+1 and (2), (8) as well as then using (38), (40), Hence, with the same technique as in the proof of Theorem 1, using (24) with = 1 2 , we have using (41). From this, by induction we conclude Hence, by (39), (41), we have the following estimate where the right hand side falls below τ δ as soon as (1 − c ξ )) p . Note thatτ > C 1−q , provided τ is chosen sufficiently large, which we assume to be done. That is, we have shown that the discrepancy stopping criterion from (34) will be satisfied after finitely many, namely O(log(1/δ)), steps.
On the other hand, the continuous discrepancy at the iterate defined by the discretized discrepancy principle (34) by (39), (41) satisfies To estimate R(x δ k * (δ,y δ ),h ), note that according to our notation, from (43), we get, like in (23), that for all k ∈ {1, . . . , k * (δ, y δ )} Now we show finiteness of the stopping index for the discretized Ivanov-IRGNM (32). By minimality of x δ k+1 and (38), for this problem we have by induction, (39) and (41) gives where the right hand side is smaller than τ δ for all then, by (39) and induction where ∈ (0, 1), and the right hand side of (44) falls below τ δ for all

Error estimators for adaptive discretization
The error estimators η k , ξ k and ζ k can be quantified, e.g., by means of a goal oriented dual weighted residual (DWR) approach [3], applied to the minimization problems (note that the last constraint is added in order to enable computation of I k 2 below) and which are equivalent to (5), (6), and (7), respectively, with as quantities of interest [where I k 3 is only needed for (5) and (7)]. We assume that C, R and the norms can be evaluated without discretization error, so the discretized versions of I k i only arise due to discreteness of the arguments. Indeed, it is easy to see that the left hand sides of (38) and (39) can be bounded (at least approximately) by combinations of I k 1 and I k 2 , using the triangle inequality: where we will neglect R k+1 It is important to note that I k+1 1,h is not equal to I k 2,h , see [18]. The computation of the a posteriori error estimators η k , ξ k , ζ k is done as in [18]. These error estimators can be used within the following adaptive algorithm for error control and mesh refinement: We start on a coarse mesh, solve the discretized optimization problem and evaluate the error estimator. Thereafter, we refine the current mesh using local information obtained from the error estimator, reducing the error with respect to the quantity of interest. This procedure is iterated until the value of the error estimator is below the given tolerance (41), cf. [3].
In this case, all the variables x, v, u,ũ are subject to a new discretization. For better readability we will partially omit the iteration index k and the discretization index h. The previous iterate x δ k is fixed and not subject to a new discretization. Consider now the cost functional for (45) and define the Langrangian functional neglecting for simplicity (cf. Remark 2) the constraints defined by C. The first-order necessary optimality conditions for (45) are given by stationarity for the Lagrangian L. Setting z = (x, v, u,ũ, λ,μ, μ), they read and for the discretized problem, To derive a posteriori error estimators for the error with respect to the quantities of interest (I 1 , I 2 , I 3 ), we introduce auxiliary functionals M i : respectively. Then, z, z h are continuous and discrete stationary points of L and there holds I i (z) = M i (z), i = 1, 2, 3. Thus the z part, as computed already during the numerical solution of the minimization problem (45) (or (46)) remains fixed for all i ∈ {1, 2, 3, }. Moreover, after computing the discrete stationary point z h for L (e.g., by applying Newton's method), it requires only one more Newton step to compute thē z coordinate of the stationary point for M from According to [3], there holds with a remainder term R of order O( z −z h 3 ) that is therefore neglected. Thus we use where π h is an operator defined such that (π hzi,h −z i,h ) approximates the interpolation error as in [18], typically defined by local averaging, to define the estimators η k , ξ k , ζ k according to the rule cf. (48), (49). The estimators obtained by this procedure can be used to trigger local mesh refinement until the requirements (41) are met cf. [3]. Explictly, for p = 2 (for simplicity) such a stationary point z = (x, v, u,ũ, λ,μ) can be computed by solving the following system of equations (analogously for the discrete stationary point of L) (54) Note that (58) is decoupled from the other equations and that if A u (x, u) * is injective, Eq. (54) implies μ = 0. Summarizing, since we have a convex minimization problem, after solving a nonlinear system of seven equations to find the minimizer, we need only one more Newton step to compute the error estimators to check whether we need a refinement on the mesh or not.
Regarding the problem (46) related to the Ivanov-IRGNM, we have the Lagrangian functional (50) with the cost functional defined by Similarly for (47) for Morozov-IRGNM, with the cost function , we end up with an optimality system by setting α k = 1 and replacing (53), (55) in (52)-(58) by respectively. Note that the bound on I 2 only appears-via (51)-in connection to the assumption (41). This may be satisfied in practice without refining explicitly with respect to η k , but simply by refining with respect to the other error estimators ξ k (and ζ k in the Tikhonov or Morozov case). The fact that I k 1,h and I k−1 2,h only differ in the discretization level, motivates the assumption that for small h, we have I k 1,h ≈ I k−1 2,h and η k−1 ≈ ξ k . Thefore, the algorithm used in actual computations will be built neglecting I 2 and hence skipping the constraint A(x, u), w W * ,W = 0, ∀w ∈ W in (45), (46), (47), which implies a modification of the Lagrangian (50) accordingly. Therefore, the corresponding optimality systems for p = 2 in the Tikhonov case is given by Note that Eq. (66) is decoupled from the others. Therefore, the strategy is to solve (66) forũ first, then solve the linear system (62), (63), (65) for (x, v, λ), and finally computẽ μ via the linear equation (64). Here, the system (62), (63), (65) can be interpreted as the optimality conditions for the following problem For the Ivanov case, we have to solve (63)-(66) with in place of (62), hence again (66) is decoupled from the other equations, (64) is linear with respect toμ, once (x, v, λ) has been computed, and the remaining system for (x, v, λ) can be interpreted as the optimality conditions for the following problem The Morozov case requires solution of (62) (with α k = 1), (65), (66), (60), (61). Thus again, we first solve (66) forũ, then the system (62) (with α k = 1), (65), (60), which is the first order optimality condition for with Lagrange multiplier λ for the equality constraint, and finally the (now possibly nonlinear) inclusion (61) forμ.

Remark 3
Since DWR estimators are based on residuals which are computed in the optimization process, the additional costs for estimation are very low, which makes this approach attractive for our purposes. However, although these error estimators are known to work efficiently in practice (see [3]), they are not reliable, i.e., the conditions can not be guaranteed in a strict sense in the computations, since we neglect the remainder term R and use an approximation forz −ẑ h . As our analysis in Theorem 1 is kept rather general, it is not restricted to DWR estimators and would also work with different (e.g., reliable) error estimators.

Model examples
We present a model example to illustrate the abstract setting from the previous section. Consider the following inverse source problem for a semilinear elliptic PDE, where the model and observation equations are given by where χ ω c denotes the extension by zero of a function on ω c to a function on all of Ω. We first of all consider Tikhonov regularization and, aiming for a sparsely supported source, therefore use the space of Radon measures M(ω c ) as a preimage space X .

Thus we define the operators
where Ω is a bounded domain in R d with d = 2 or 3, with Lipschitz boundary ∂Ω and ω c , ω o ⊂ Ω are the control domain and the observation domain, respectively.
A monotonicity argument yields well posedness of the above semilinear boundary value problem, i.e., well-definedness of u ∈ W 1,q 0 (Ω) as a solution to the elliptic boundary value problem (68), (69), as long as we can guarantee that u 3 ∈ W −1,q (Ω) for any u ∈ W 1,q 0 (Ω), i.e., the embeddings W 1,q 0 (Ω) → L 3r (Ω) and L r (Ω) → W −1,q (Ω) are continuous for some r ∈ [1, ∞], which (by duality) is the case iff W 1,q 0 (Ω) embeds continuously both into L 3r (Ω) and L r (Ω). By Sobolev's Embedding Theorem, this boils down to the inequalities which by elementary computations turns out to be equivalent to where the left hand side is larger than one and the denominator on the right hand side is positive due to the fact that for d ≥ 2 we have q > d ≥ d = d d−1 . Taking the extremal bounds for q > d-note that the lower bound is increasing and the upper bound is decreasing with q-in (72) we get .
Thus, as a by-product, we get that for any t ∈ [1,t) there exists q > d such that For the regularization functional R(x) = x M(ω c ) , the IRGNM-Tikhonov minimization step is given by (ignoring h in the notation) Here and below Ω dΩ and ω c dx denote the integrals with respect to the Lebesgue measure and with respect to the measure x, respectively. Therefore, to compute this Gauss-Newton step, one first needs to solve the nonlinear equation forũ = u δ k , then solve the following optimality system with respect to (x, v, λ) (written in a strong formulation) which can be interpreted as the optimality system for the minimization problem with Lagrange multiplier λ for the equality constraint, and finally, computeμ by solving For carrying out the IRGNM iteration,μ is not required, but we need it for evaluating the error estimators.
For the Gauss-Newton step, one needs to first solve the nonlinear equation (74) for u = u δ k , and then solve the following optimality system with respect to (x, v, λ) which can be interpreted as the optimality system for the minimization problem with Lagrange multiplier λ for the equality constraint. Finally,μ is computed from (76). For the IRGNM-Morozov case, using for simplicity the regularization functional R(x) = 1 2 x 2 L 2 (ω c ) , and leaving the rest of the setting as in the IRGNM-Ivanov case, the step is defined by and ∀w ∈ H 1 0 (Ω) : So again we first solve (74) forũ = u δ k , then the minimization problem or actually its first order optimality system for (x δ k+1 , v δ k , φ, λ), and finally, forμ. For numerically efficient methods to solve the minimization problems (75) and (77) we refer to e.g., [4][5][6] and the references therein.
We finally check the tangential cone condition in case ω o = Ω and, for simplicity also ω c = Ω, in both settings (where we will have to restrict ourselves to d = 2) and For this purpose, we use the fact that with the notation 1,q 0 (Ω) satisfy the homogeneous Dirichlet boundary value problems for the equations Using an Aubin-Nitsche type duality trick, we can estimate the L 2 norm of w via the adjoint state p ∈ W 1,n 0 (Ω), which solves with homogeneous Dirichlet boundary conditions, so that by Hölder's inequality where we aim at choosing m ∈ [4, ∞], n ∈ [1, ∞] such that indeed L 2 (Ω) ⊆ W −1,n (Ω), i.e., by duality, and the fact that κu 2 p ∈ L o (Ω) should be contained in W −1,n (Ω) for u ∈ V ⊆ L t (Ω), and p ∈ W 1,n (Ω), which via Hölder's inequality in (Ω) and duality leads to the requirements

Numerical tests
In this section, we provide some numerical illustration of the IRGNM Ivanov method applied to the example from Sect. 4, i.e., each Newton step consists of solving (74) and subsequently (77). For the numerical solution of (74) we apply a damped Newton iteration to the equation Φ(ũ) = 0 where which is stopped as soon as Φ(ũ l ) H −1 (Ω) has been reduced by a factor of 1.e−4.
The sources x and states u are discretized by piecewise linear finite elements, hence after elimination of the state via the linear equality constraint, (77) becomes a box constrained quadratic program for the dicretized version of x, which we solve with the method from [12] using the Matlab code mkr_box provided to us by Philipp Hungerländer, Alpen-Adria Universität Klagenfurt. All implementations have been done in Matlab. We performed test computations on a 2-d domain ω o = ω c = Ω = (−1, 1) 2 , on a regular computational finite element grid consisting of 2 · N · N triangles, with N = 32. We first of all consider κ = 1 (below we will also show results with κ = 100) and the piecewise constant exact source function where B is the ball of radius 0.2 around (−0.4, −0.3) cf. Fig. 1, and correspondingly set ρ = 10. In order to avoid an inverse crime, we generated the synthetic data on a  finer grid and, after projection of u ex onto the computational grid, we added normally distributed random noise of levels δ ∈ {0.001, 0.01, 0.1} to obtain synthetic data y δ . In all tests we start with the constant function with value zero for x 0 . Moreover, we always set τ = 1.1. According to our convergence result Theorem 1 with R = · L ∞ (Ω) , we can expect weak * convergence in L ∞ (Ω) here. Thus we computed the errors in certain spots within the two homogeneous regions and on their interface, cf. Fig. 1, more precisely, on 1 N × 1 N squares located at these spots, corresponding to the piecewise constant L 1 functions with these supports in order to exemplarily test weak * L ∞ convergence. Additionally we computed L 1 errors. Table 1 provides an illustration of convergence as δ decreases. For this purpose, we performed five runs on each noise level for each example and list the average errors.
In Fig. 2 we plot the reconstructions for κ = 1 and κ = 100. For κ = 1, the noise levels δ ∈ {0.1, 0.667, 0.333, 0.01} correspond to a percentage of p ∈ {5.6, 18.5, 37.1, 55.6} of the L 2 deviation of the exact state from the background state u 0 = −10 1/3 . In case of κ = 100, where the background state is u 0 = −0.1 1/3 the corresponding percentages are p ∈ {17.9, 59.7, 119.4, 179.2}. For an illustration of the noisy data as compared to the exact ones, see Figs. 3 and 4. Indeed, the box constraints enable to cope with relatively large noise levels, even in the rather nonlinear regime with κ = 100.

Conclusions and remarks
In this paper we have studied convergence of the Tikhonov type, the Ivanov type, and the Morozov type IRGNM with a stopping rule based on the discrepancy principle type. To the best of our knowledge, the Ivanov and Morozov IRGNMs have not been studied so far and in all three Tikhonov, Ivanov, and Morozov type IRGNMs, convergence results without source conditions so far use stronger assumptions than the tangential cone condition used here. We also consider discretized versions of the methods and provide discretization error bounds that still guarantee convergence. Moroever, we discuss goal oriented dual weighted residual error estimators that can be used in an adaptive discretization scheme for controlling these discretization error bounds. An inverse source problem for a nonlinear elliptic boundary value problems illustrates our theoretical findings in the special situations of measure valued and L ∞ sources. We also provide some computational results with the Ivanov IRGNM for the case of an L ∞ source. Numerical implementations and tests for a measure valued source, together with adaptive discretization is subject of ongoing work, based on the approaches from [4][5][6]18,19]. Future research in this context will be concerend with convergence rates results for the Ivanov and Morozov IRGNMs under source conditions.