Abstract
In this paper we study the formulation of inverse problems as constrained minimization problems and their iterative solution by gradient or Newton type methods. We carry out a convergence analysis in the sense of regularization methods and discuss applicability to the problem of identifying the spatially varying diffusivity in an elliptic PDE from different sets of observations. Among these is a novel hybrid imaging technology known as impedance acoustic tomography, for which we provide numerical experiments.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Inverse problems usually consist of a model
where the operator A acts on the state u of a system and contains unknown parameters x, and an observation equation
quantifying the additionally available information that is supposed to allow for identifying the parameters x; by a slight notation overload, we will often summarize (x, u) into a single element, which we again call x.
The classical formulation of an inverse problem is as an operator equation
where usually x is the searched for parameter (some coefficient, initial or boundary conditions in a PDE or ODE model) but—in an allatonce formulation—might as well include the state, i.e., the PDE solution. In a conventional reduced setting \(F=C\circ S\) is the concatenation of an observation operator C with a parametertostate map S satisfying \(A(x,S(x))=0\), whereas an allatonce setting considers the inverse problem as a system \(\left\{ \begin{array}{c} A(x,u)=0\\ C(x,u)=y\end{array}\right.\), which by the above mentioned replacement \(x:=(x,u)\) takes the form (3), see, e.g. [14, 15]. Allatonce approaches have been studied for PDE constrained optimization already for many years in, e.g., [25,26,27,28, 30, 31] due to their computational advantages: The iterates need not be feasible with respect to the PDE constraint which safes computational effort and potentially allows for larger steps. However, this looseness can also lead to convergence problems and we will actually see this in the most challenging of our numerical test cases, namely the severely illposed problem of electric impedance tomography EIT. In this sense we here do not intend to favor any of the formulations but clearly point to their chances and limitations.
We here follow the idea of generalizing this to a formulation of an inverse problem as a constrained minimization problem
where in a reduced setting, x is the parameter and in an allatoncesetting \(x=(x,u)\) contains both parameter and state. In what follows, it will not be necessary to distinguish between these two cases notationally.
Straightforward instances for equivalent minimization based formulations of (1), (2) are, e.g.,
or in the context of (3) comprising both the reduced \(F(x)=C(S(x))\) and the allatonce \(F(x,u)=\left( \begin{array}{c} A(x,u)\\ C(x,u)\end{array}\right)\) setting simply
For further examples of such formulations, see., e.g., [16, 32]. In particular we point to the variational formulation of EIT according to Kohn and Vogelius, see, e.g, [24], which will be extended to further diffusion/impendance identification problems in Sect. 4.
Advantages of such minimization based formulations lie in their great versatility, the straightforward addition of regularization by adding penalty terms (Tikhonov’s method) or imposing constraints (Ivanov’s method), as well as the applicability of efficient optimization methods.
As an alternative to applying standard optimization tools to regularized versions of such minimization based formulations, we here study regularization by iterative methods, that take into account illposedness by being stopped early enough to prevent dominance of the propagated noise.
If J is differentiable then the first order optimality condition for a minimizer of (4) is
Here J is a proper functional acting on a Banach space X, and we make the normalization assumption
for \(x^\dagger\) solving the inverse problem, i.e., we assume to know the minimal value of J (but of course not the minimizer, which is what we intend to retrieve).
In case of (6), condition (8) means attainability of the exact data, which is in fact a very natural condition in inverse problems, where one assumes the noiseless version of the observations \(C(x^\dagger ,u^\dagger )\) to be caused by the true parameter \(x^\dagger\) in the model \(A(x^\dagger ,u^\dagger )=0\).
Typically inverse problems also in the form (4) are illposed in the sense that solutions to (4) do not depend continuously on the data y that enters the definition of the cost function J and/or of the feasible set M. Since in a realistic setting the data is contaminated with measurement noise, i.e., only \(y^\delta \approx y\) is given, regularization needs to be employed. We first of all do so by possibly adding some regularizing constraints—in particular we think of bound constraints in the sense of Ivanov regularization—and/or by relaxing constraints like fit to the data in the sense of Morozov regularization. In the context of (5), this, e.g., means that we replace \(M=\{x\in X: Cx=y\}\) by \(\tilde{M}^\delta =\{x\in X : \Vert Cxy^\delta \Vert \le \tau \delta \text{ and } \tilde{\mathcal {R}}(x)\le \rho \}\), for the noise level \(\delta \ge \Vert yy^\delta \Vert\), some constants \(\tau >1\), \(\rho >0\) and some functional \(\tilde{\mathcal {R}}\) satisfying \(\tilde{\mathcal {R}}(x^\dagger )\le \rho\).
Thus we consider the partly regularized problem
which we intend to solve iteratively, where further regularization is incorporated by early stopping and potentially also by adding regularizing terms during the iteration. As in the above example of \(\tilde{M}^\delta\), we will generally assume \(x^\dagger\) to be feasible also for this modified problem, and also approximately minimal
A typical choice of the bound in (10) in case of (6) is \(\eta (\delta )\sim \delta ^2\), where \(\delta\) is the noise level, cf. Remark 2. With (9) we formally stay in the same setting as in (4) and, like in (8), assume
The key difference to (4) lies in the fact that \(J^\delta\) and \(\tilde{M}^\delta\) might depend on the noise level and this will in fact be crucial since we will study convergence as \(\delta\) tends to zero.
Since we consider formulations of inverse problems as constrained minimization problems, an essential step is to extend iterative regularization methods such as gradient or Newton type methods, to more general cost functions and constrained minimization problems. Clearly, the optimization literature on Newton and gradient methods is vast, however, their application to illposed problems requires special consideration. In particular, second order sufficient conditions will typically fail to hold here. To see this, consider the Hessian of the cost function (6) at a solution \(J''(x^\dagger )(h,h)=\Vert F'(x^\dagger )h\Vert ^2\). Assuming its uniform positivity amounts to demanding bounded invertibitily of \(F'(x^\dagger )\), which usually does not hold for illposed problems. Along with these two paradigms concerning the search direction, we will consider two approaches for guaranteeing feasibility of the sequence, namely projection onto the admissible set in the context of gradient methods in Sect. 2 and sequential quadratic programming SQP type constrained minimization in Sect. 3.
Some key references for gradient, i.e., Landweber type iterative methods are [5] on projected Landweber iteration for linear inverse problems, [9] on (unconstrained) nonlinear Landweber iteration, and more recently [23] on gradient type methods under very general conditions on the cost function or the forward operator, respectively. Extensions with a penalty term (also allowing for the incorporation of constraints) for linear inverse problems can be found in [4]; For nonlinear problems we also point to [13, 33], however, they do not seem to be applicable to constrained problems, since the penalty term is assumed to be pconvex and thus cannot be an indicator function.
Newton type methods for the solution of nonlinear illposed problems have been extensively studied in Hilbert spaces (see, e.g., [3, 19] and the references therein) and more recently also in a in Banach space setting (see, e.g., [29] and the references therein). In particular, the iteratively regularized Gauss–Newton method [2] or the Levenberg–Marquardt method [8] easily allow to incorporate constraints in their variational form. Projected Gauss–Newton type methods for constrained illposed problems have been considered in, e.g., [18].
The remainder of this paper is organized as follows. In Sect. 2 we will study a projected version of Landweber iteration, thus a gradient type method in a Hilbert space setting and prove its convergence under certain convexity assumptions on the cost function. Section 3 turns to a general Banach space setting and discusses Newton SQP methods as well as their convergence. Finally, in Sect. 4 we investigate applicability to the identification of a spatially varying diffusion coefficient in an elliptic PDE from different sets of boundary conditions which leads to three different inverse problems: Inverse groundwater filtration (often also used as a model problem and denoted by aproblem), impedance acoustic tomography and electrical impedance tomography. Numerical experiments in Sect. 5 illustrate our theoretical findings.
The message of this paper is supposed to be two fold: First of all, we show that for inverse problems formulated by constrained minimization, besides the approach of regularizing and then applying stateoftheart iterative optimization tools (regularize, then iterate) there is also the option of applying iterative methods to the un or only partly regularized problem (regularize by iteration; more precisely, by early stopping). Secondly, by means of the mentioned diffusion identification problems we wish to demonstrate the large variety of possible minimization formulations arising even in the context of a single elliptic PDE, and to highlight some of the chances and limitations related to these various formulations.
2 A projected gradient method
In this section, we consider the projected gradient method for (9)
and extend some of the results from [23] to the constrained setting, or from a different viewpoint, extend some of the results from [5] to the nonlinear setting. In (12), \(\mu _k>0\) is a stepsize parameter and \(\nabla J^\delta (x_k)\in X\) is the Riesz representation of \({J^\delta }'(x_k)\in X^*\) as in this section we restrict ourselves to a Hilbert space setting. The reason for this is the fact that in general Banach spaces, \({J^\delta }'(x_k)\) would have to be transported back into X by some (nonlinear, e.g. in \(L^p\) with \(p\not =2\)) duality mapping, which adds nonlinearity and therefore, among others, complicates the choice of the step size, see e.g. [21] for the unconstrained least squares case (6). Moreover, throughout this section we will assume \(\tilde{M}^\delta\) to be closed and convex and denote by \(P_{\tilde{M}^\delta }\) the metric (in the Hilbert space setting considered in this section also orthogonal) projection onto \(\tilde{M}^\delta\), which is characterized by the variational inequality
With \(z:=x_k\in \tilde{M}^\delta\), this immediately implies
hence
and thus, using the Cauchy–Schwarz inequality, the estimate
Moreover, as well known for (projected) gradient methods, under the Lipschitz type condition on the gradient
for \(\mu _k\le \overline{\mu }<\frac{2}{L}\), from (14) we get monotonicity of the cost function values
and square summability of the steps
Monotonicity of the error under additional convexity assumptions easily follows from nonexpansivity of the projection, which yields
This can be further estimated under a monotonicity condition on \(\nabla J^\delta\) (i.e., convexity condition on \(J^\delta\))
(which for \(\gamma =0\) follows from convexity of \(J^\delta\), i.e., monotonicity of \(\nabla J^\delta\)) and assuming approximate stationarity
Using (18), (19), we get from (17), that for all \(k\le k_*1\) with \(k_*\) defined by
the estimate
for \(\tau >\frac{1}{\gamma }\), \(0<\underline{\mu }\le \mu _k\le \bar{\mu }<2(\gamma \frac{1}{\tau })\) holds. Hence we get summability
Alternatively, we can further estimate (17) under a condition following from (18), (19) and comprising both convexity and approximate stationarity
which for \(k\le k_*1\) implies
as well as
Using (23)–(24) we get from (17)
for \(\tau >\frac{1}{\gamma }\), \(0<\underline{\mu }\le \mu _k\le \bar{\mu } < \tfrac{2\gamma }{1+\tfrac{1}{\gamma \tau 1}}\), hence summability
which via (24) also implies summability of \(\Vert \nabla J^\delta {(x_k)} \Vert ^2\).
The estimates (22), (26) imply convergence of the gradient to zero as \(k\rightarrow \infty\) in the noise free case and finiteness of the stopping index \(k_*\) in case of noisy data. In the noiseless case \(\delta =0\) Opial’s Lemma (Lemma 2 in the Appendix) with \(S=\{x^*\in X:\forall x\in M:\,\langle \nabla J(x^*),xx^*\rangle \ge 0\}\), due to monotonicity of \(\Vert x_kx^*\Vert\) and the Bolzano–Weierstrass Theorem, implies weak convergence of \(x_k\) as \(k\rightarrow \infty\) to a stationary point. In case of noisy data, one could think of applying the continuous version of Opial’s Lemma (Lemma 3 in the Appendix) with \(t:=\frac{1}{\delta }\), \(x(t):=x_{k_*(\delta )}\). However, we do not have monotonicity of the final iterates \(x_{k_*(\delta )}\) as a function of \(\delta\). Still, in case of uniqueness, that is, if S is a singleton \(S=\{x^\dagger \}\), then boundedness of the sequence \(\Vert x_{k_*(\delta )}x^*\Vert\) by \(\Vert x_0x^*\Vert\) together with a subsequencesubsequence argument yields its weak convergence of \(x_{k_*(\delta )}\) to \(x^\dagger\) as \(\delta \rightarrow 0\).
For this purpose, we have to impose certain continuity assumptions on the cost function and the constrains, namely
which in the noiseless case becomes
Proposition 1
Let (8), (11), (23) hold, and let the sequence of iterates \(x_k\) be defined by (12) with \(k_*\) defined by (20).

(i)
In the noiseless case \(\delta =0\), we assume that M and \(\nabla J\) satisfy (28). Then the sequence \((x_k)_{k\in \mathbb {N}}\) converges weakly to a solution \(x^*\in M\) of the first order optimality condition (7) as \(k\rightarrow \infty\).

(ii)
In the noisy case \(\delta >0\), we assume that (10), (19), and (27) hold. Then the family \((x_{k_*(\delta )})_{\delta \in (0,\bar{\delta }]}\) converges weakly subsequentially to a stationary point \(x^\dagger\) according to (7) as \(\delta \rightarrow 0\). If this stationary point is unique, then the whole sequence converges weakly to \(x^\dagger\).
The same assertions hold with stationarity (7) (with (27)) replaced by

(a)
minimality, i.e., \(x^\dagger\) (and z) \(\in \text{ argmin }\{J(x):x\in M\}\) or by

(b)
\(\Vert \nabla J(x^\dagger )\Vert =0\) (and \(\Vert \nabla J(z)\Vert =0\)).
Note that cases (a), (b) make uniqueness easier than (7).
Remark 1
Strong convergence can be shown for the modified projected Landweber method from [5, Section 3.2]. However, this requires a source condition to hold.
Remark 2
Let us finally comment on the convexity condition (23) and the continuity conditions (27), (28).
In the special case \(J^\delta (x)= \tfrac{1}{2}\Vert F(x)y^\delta \Vert ^2\) cf. (6), condition (23) becomes
which follows, e.g., from
with \(\Vert F(x^\dagger )y^\delta \Vert ^2\le 4\kappa \eta (\delta )\). Here \(\mathcal {D}(F)\) does not necessarily need to be the maximal domain of F and therefore can be chosen to be a sufficiently small closed ball (hence weakly closed) to enable—together with an appropriate scaling—the uniform boundedness condition on \(F'\) in (30). Condition (30) is closely related to the usual normalization and tangential cone conditions for Landweber iteration, see, e.g., [9, 23]. It is, e.g., satisfied for linear F as well as for some specific coefficient identification problems, see, e.g., [9] for the reduced setting, [14] for the allatonce setting, and [20] for some time dependent problems in both reduced and allatonce formulation.
In the same case (6), with \(\tilde{M}^\delta :=\mathcal {D}(F)\), condition (27) can be verified under usual assumptions on F cf. e.g., [6, 7] as well: We assume existence of a constant K such that
where the latter is supposed to hold for any sequence \((z_n)_{n\in \mathbb {N}}\subseteq \mathcal {D}(F)\) and means weak sequential closedness of \(x\mapsto F'(x)^*(F(x)y)\) at zero value. The prerequisite of (27) in this setting reads \(z_n \rightharpoonup z \text{ and } F'(z_n)^*(F(z_n)y^{\delta _n})\rightarrow 0\), and due to \(F'(z_n)^*(F(z_n)y^{\delta _n})=F'(z_n)^*(F(z_n)y) +F'(z_n)^*(yy^{\delta _n})\), under these assumptions implies \(\Vert F'(z_n)^*(yy^{\delta _n})\Vert \le K \Vert yy^{\delta _n}\Vert \rightarrow 0\), hence \(F'(z_n)^*(F(z_n)y)\rightarrow 0\), and therefore \(F'(z)^*(F(z)y)=0\), from which \(\langle \nabla J(z),xz\rangle = \langle \nabla F'(z)^*(F(z)y),xz\rangle \ge 0\) trivially follows.
3 An SQP type constrained Newton method
A quadratic approximation of the cost function combined with a Tikhonov type additive regularization term yields the iteration scheme
with
where \(G^\delta\) and \(H^\delta\) should be viewed as (approximations to) the gradient and Hessian of J, \(G^\delta (x_k)\approx {J^\delta }'(x_k)\), \(H^\delta (x_k)\approx {J^\delta }''(x_k)\), and \(\mathcal {R}\) is a regularization functional. Since we do not necessarily neglect terms in \({J^\delta }''(x_k)\), this differs from the iteratively regularized Gauss–Newton method IRGNM studied, e.g., in [2, 17, 22], cf. (35) below.
As opposed to Landweber iteration, where each step on its own is stable, for Newton’s method one has to regularize in each step due to unboundedness of the inverse of the Hessian. To allow the bias due to regularization to vanish as the Newton iteration proceeds, we choose the sequence of regularization parameters \(\alpha _k\) such that it tends to zero as \(k\rightarrow \infty\).
Here X is a general Banach space.
To guarantee existence of minimizers, besides (33) we will make the following assumption
Assumption 1
For some topology \(\mathcal {T}_0\) on X

for all \(r\ge \mathcal {R}(x^\dagger )\), the sublevel set \(\tilde{M}^\delta _r:=\{x\in M^\delta : \mathcal {R}(x)\le r\}\) is \(\mathcal {T}_0\) compact.

the mapping \(Q_k^\delta +\alpha _k\mathcal {R}\) is \(\mathcal {T}_0\) lower semicontinuous.
Uniqueness of a minimizer of (32) will not necessarily hold; the sequence \((x_k)_{k\in \{1,\ldots ,k_*\}}\) will therefore be defined by an arbitrary selection of minimizers of (32).
The overall iteration is stopped according to the discrepancy principle
for some constant \(\tau >1\).
As far as the sequence of regularization parameters \(\alpha _k\) is concerned, we will choose it a priori or a posteriori, see (37), (44) below.
A special case of this with
(note that \(H^\delta\) in general does not coincide with the Hessian of \(J^\delta\), since the term containing \(F''\) is skipped ) in Hilbert space is the iteratively regularized Gauss–Newton method for the operator equation formulation (3) of the inverse problem, see, e.g., [2, 17, 22].
Another special case we will consider is the quadratic one
with \(f^\delta \in X\), \(\bar{G}^\delta \in L(X,\mathbb {R})=X^*\), \(\bar{H}^\delta \in L(X^2,\mathbb {R})\), where trivially \(Q_k^\delta\) coincides with \(J^\delta\).
To provide a convergence analysis, we start with the case of an a priori choice of \(\alpha _k\)
for some \(\theta \in (0,1)\) and make, among others, the following assumption.
Assumption 2
For some topology \(\mathcal {T}\) on X,

the sublevel set \(\{x\in \bigcup _{\delta \in (0,\bar{\delta })}\tilde{M}^\delta : \mathcal {R}(x)\le R\} = \bigcup _{\delta \in (0,\bar{\delta })}\tilde{M}_R^\delta\) is \(\mathcal {T}\) compact, with \(R=(1+\tfrac{ab}{\tau (ab)c} \tfrac{\tau b+c}{(a\theta b)}){\mathcal {R}(x^\dagger )} + \tfrac{ab}{\tau (ab)c} \tfrac{\tau b+c}{\alpha _0} J(x_0)\) with \(\tau\) as in (34), \(\alpha _0\) as in (37), and a, b, c as in (38);

M is \(\mathcal {T}\) closed with respect to the family of sets \((\tilde{M}^\delta )_{\delta \in (0,\bar{\delta })}\) in the following sense:
$$\begin{aligned}&\text{ For } \text{ any } \text{ sequence } (z_n)_{n\in \mathbb {N}}\subseteq X, \, (\delta _n)_{n\in \mathbb {N}}\in (0,\bar{\delta }], \, \delta _n\rightarrow 0 \text{ as } n\rightarrow \infty \\&\Bigl (\forall n\in \mathbb {N}: z_n\in \tilde{M}^{\delta _n} \text{ and } z_n {\mathop {\longrightarrow }\limits ^{\mathcal {T}}} z \Bigr ) \Rightarrow z\in M \end{aligned}$$ 
\(\lim _{\delta \rightarrow 0} \sup _{x\in M^\delta }(J(x)J^\delta (x))\le 0\);

J is \(\mathcal {T}\) lower semicontinuous.
Comparably to the tangential cone condition in the context of nonlinear Landweber iteration [9] and more recently also the IRGNM [22] we impose a restriction on the nonlinearity/nonconvexity of J
with \(a>b\ge 0\), \(c\ge 0\), A discussion of this assumption can be found in Remark 5 below; in view of Taylor’s Theorem, typical values of the constants would be \(a\sim 1\), \(b\sim 0\), \(c\sim 1\).
Theorem 1
Let conditions (10), (11), (33), (38), and Assumptions 1, 2hold, assume that \(\alpha _k\) is chosen a priori according to (37), and \(k_*\) is chosen according to the discrepancy principle (34), with the following constraints on the constants
Then

For any \(\delta \in (0,\bar{\delta })\), and any \(x_0\in \bigcap _{\delta \in (0,\bar{\delta })}\tilde{M}^\delta \,\cap M\),

the iterates \(x_k\) are welldefined for all \(k\le k_*(\delta )\) and \(k_*(\delta )\) is finite;

for all \(k\in \{1,\ldots , k_*(\delta )\}\) we have
$$\begin{aligned} J^\delta (x_k)\le \tfrac{b}{a} J^\delta (x_{k1}) + \tfrac{1}{a} \alpha _k{\mathcal {R}(x^\dagger )} + \tfrac{c}{a}\eta ; \end{aligned}$$ 
for all \(k\in \{1,\ldots , k_*(\delta )\}\) we have
$$\begin{aligned} \mathcal {R}(x_k)\le R \end{aligned}$$


As \(\delta \rightarrow 0\), the final iterates \(x_{k_*(\delta )}\) tend to a solution of the inverse problem (4) \(\mathcal {T}\) subsequentially, i.e., every sequence \(x_{k_*(\delta _j)}\) with \(\delta _j\rightarrow 0\) as \(j\rightarrow \infty\) has a \(\mathcal {T}\) convergent subsequence and the limit of every \(\mathcal {T}\) convergent subsequence solves (4).
Proof
For any \(k\le k_*1\), existence of a minimizer follows from Assumption 1 by the direct method of calculus of variations. To this end, note that by \(x^\dagger \in \tilde{M}^\delta\), implying
and the lower bound
which yields
for \(r=\mathcal {R}(x^\dagger )+\frac{1}{\alpha }(bJ^\delta (x_k)+cJ^\delta (x^\dagger ))\), it suffices to restrict the search for a minimizer to the set \(\tilde{M}^\delta _r\) as defined in Assumption 1.
For a hence existing minimizer \(x_{k+1}\), its minimality together with feasibility of \(x^\dagger\) for (32) yields
which with (38) implies
thus, with the a priori choice (37), and (10), abbreviating \(J_k=J^\delta (x_k)\), \(\mathcal {R}_k=\mathcal {R}(x_k)\), \(\mathcal {R}^\dagger =\mathcal {R}(x^\dagger )\)
Inductively, with \(\mathcal {R}\ge 0\), we conclude that for all \(k\le k_*\)
Using the minimality of \(k_*\) according to (34), we get, for all \(k\le k_*1\), that \({\eta (\delta )}\le \frac{J_k}{\tau }\) and therefore, together with (42)
Inserting this back into (41) with \(J^\delta \ge 0\), after multiplication by \(\frac{a}{\alpha _k}\) and again using (34) yields
for all \(k\le k_*1\).
From (42), which holds for all \(k\le k_*\) and \(\tau >\tfrac{c}{ab}\), as well as \(\tfrac{b}{a}<\theta\), we conclude that the stopping index according to (34) is reached after finitely many, namely at most \(\frac{\log \left( \left( \tau \tfrac{c}{ab}\right) {\eta (\delta )}\right) \log \left( J_0+\tfrac{\alpha _0}{a\theta b}\mathcal {R}^\dagger \right) }{\log \theta }\) steps.
Setting \(k=k_*1\) in (43) yields \(\mathcal {R}(x_{k_*(\delta )})\le R\), which implies \(\mathcal {T}\) convergence of a subsequence \(x^j\) of \(x_{k_*(\delta )}\) to some \(\bar{x}\), which by Assumption 2 lies in M.
By definition of \(k_*\) and (10) we have \(J(x_{k_*(\delta )})\le \tau {\eta (\delta )}(\delta )+J(x_{k_*(\delta )})J^\delta (x_{k_*(\delta )})\rightarrow 0\) as \(\delta \rightarrow 0\); \(\mathcal {T}\) lower semicontinuity therefore yields \(J(\bar{x})=0\). \(\diamondsuit\)
We now consider convergence with an a posteriori choice of \(\alpha _k\) according to the discrepancy principle type rule (which can also be interpreted as an inexact Newton condition)
with \(0<\underline{\sigma }<\overline{\sigma }<1\); note that in (44), the denominator of \(\sigma _k(\alpha _k)\) will be positive and bounded away from zero by \(\tau \eta (\delta )\) for all \(k\le k_*(\delta )1\) by (34). In order to obtain welldefinedness of \(\sigma _k(\alpha )\) as a function of \(\alpha\), we will assume that the mapping
is single valued, which is, e.g., the case if the minimizer of \(Q_k^\delta (x)+\alpha \mathcal {R}(x)\) over \(\tilde{M}^\delta\) is unique. The latter can be achieved, e.g., by assuming convexity of \(Q_k^\delta\)—choosing \(H^\delta\) as a positive semidefinite approximation of the (not necessarily positive semidefinite) true Hessian \({J^\delta }''\)—and strict convexity of \(\mathcal {R}\). To numerically determine \(\alpha _k\) such that it satisfies (44), some iterative procedure such as the bisection method can be applied. For the a posteriori choice (44) we have to slightly modify the setting to guarantee existence of \(\alpha _k\) such that (44) holds. The latter is possible if for some appropriate point \(x^*\), the quotient \(\frac{Q^\delta _k(x^*)}{J^\delta (x_k)}\) is large enough
as we will show below. This leads us to the following case distinction for updating the iterates
Here \(x^*\in \bigcap _{\delta \in (0,\bar{\delta })} \tilde{M}^\delta \cap M\) is a point of attraction of \(\mathcal {R}\) in the sense of the following assumption.
Assumption 3
For some topology \(\mathcal {T}_1\) on X,

\(\mathcal {R}(x^*)=0\) and for any sequence \((x_j)_{j\in \mathbb {N}}\subseteq X\)
$$\begin{aligned} \mathcal {R}(x_j)\rightarrow 0 \Rightarrow x_j {\mathop {\longrightarrow }\limits ^{\mathcal {T}_1}} x^* \end{aligned}$$(46) 
sublevel sets of \(\mathcal {R}\) are \(\mathcal {T}_1\) compact;

\(\mathcal {R}\) is \(\mathcal {T}_1\) lower semicontinuous;

the mapping \(x\mapsto G^\delta (x_k)(xx_k)+\tfrac{1}{2} H^\delta (x_k)(xx_k)^2\) is \(\mathcal {T}_1\) continuous

\(\tilde{M}^\delta\) is \(\mathcal {T}_1\) closed.
A simple example of a functional \(\mathcal {R}\) satisfying this assumption is some power of the norm distance from the a priori guess \(x^*\), \(\mathcal {R}(x)=\Vert xx^*\Vert ^p\), for some \(p\in [1,\infty )\) along with the weak or weak* topology \(\mathcal {T}_1\), provided X is reflexive or the dual of a separable space.
Lemma 1
The mappings \(\alpha \mapsto \mathcal {R}(x_{k+1}(\alpha ))\) and \(\alpha \mapsto Q_k(x_{k+1}(\alpha ))\), where \(x_{k+1}(\alpha )\in X_{k+1}(\alpha )\) (cf. (32)) are monotonically decreasing.
If additionally Assumption 3and (38) hold, and the mapping \(\alpha \mapsto Q_k(X_{k+1}(\alpha ))\) is single valued, then the mapping \(\alpha \mapsto \sigma _k(\alpha )\) is welldefined and continuous on \((0,\infty )\).
Proof
For two values \(\alpha\), \(\tilde{\alpha }\), minimality implies
which implies
Hence, \(\alpha \mapsto \mathcal {R}(x_{k+1}(\alpha ))\) is monotonically decreasing and
that is, \(\alpha \mapsto Q_k(x_{k+1}(\alpha ))\) is monotonically increasing.
To prove continuity of the mapping \(\alpha \mapsto Q_k(x_{k+1}(\alpha ))\) under the assumption that this mapping is single valued, consider \(\bar{\alpha }>0\) and a sequence \((\alpha _\ell )_{\ell \in \mathbb {N}}\) converging to \(\bar{\alpha }>0\). Minimality and (38) yield
which by strict positivity of \(\bar{\alpha }\) implies boundedness of \((\mathcal {R}(x_{k+1}(\alpha _\ell )))_{\ell \in \mathbb {N}}\). By Assumption 3 there exists a \(\mathcal {T}_1\) convergent subsequence \((x_{k+1}(\alpha _{\ell _j}))_{j\in \mathbb {N}}\) whose limit \(\bar{x}\) lies in \(\tilde{M}^\delta\) and even in \(X_{k+1}(\alpha )\), due to the fact that \(Q_k(x_{k+1}(\alpha _{\ell _j}))\rightarrow Q_k(\bar{x})\) and the estimate
A subsequencesubsequence argument together with \(\mathcal {T}_1\) continuity of \(Q_k\) and the assumed single valuedness of the mapping \(\alpha \mapsto Q_k(X_{k+1}(\alpha ))\) implies convergence \(Q_k(x_{k+1}(\alpha _\ell ))\rightarrow Q_k(x_{k+1}(\bar{\alpha }))\), hence, after division by \(J^\delta (x_k)\), convergence \(\sigma _k(x_{k+1}(\alpha _\ell ))\rightarrow \sigma _k(x_{k+1}(\bar{\alpha }))\). \(\diamondsuit\)
To prove convergence of the iterates, we need a slightly stronger condition than (38), namely
with \(\underline{a},\underline{b},\overline{a},\overline{b}\ge 0\). Note that (47) implies \((\overline{a}\underline{a})J^\delta (x_+)+(\underline{b}\overline{b})J^\delta (x)\ge 0\), hence by nonnegativity of \(J^\delta\) and the fact that \(J^\delta (x^\dagger )\le \eta\) can get arbitrarily close to zero, \(\overline{a}\ge \underline{a}\) and \(\underline{b}\ge \overline{b}\). In fact, (47) implies (38) with \(a=\underline{a}\), \(b=\underline{b}\overline{b}\), \(c=\overline{a}\).
Theorem 2
Let conditions (10), (11), (33), (47), and Assumptions 1, 2, 3hold, assume that \(\alpha _k\) is chosen a posteriori according to (44) if (45) holds (otherwise set \(x_{k+1}:=x^*\)), and \(k_*\) is chosen according to the discrepancy principle (34), with the following constraints on the constants
Then

For any \(\delta \in (0,\bar{\delta })\), and any \(x_0\in \bigcap _{\delta \in (0,\bar{\delta })}{\tilde{M}^\delta }\),

the iterates \(x_k\) are welldefined for all \(k\le k_*(\delta )\) and \(k_*(\delta )\) is finite;

for all \(k\in \{1,\ldots , k_*(\delta )\}\) and \(q=\frac{\overline{\sigma } 1+\underline{b}}{\underline{a}}<1\) we have
$$\begin{aligned} J^\delta (x_k)\le q J^\delta (x_{k1}); \end{aligned}$$ 
for all \(k\in \{1,\ldots , k_*(\delta )\}\) and \(x^\dagger\) satisfying (10) we have
$$\begin{aligned} \mathcal {R}(x_k)\le \mathcal {R}(x^\dagger )\quad \text{ and }\quad x^\dagger \;\text{ solves } \text{(4). } \end{aligned}$$


As \(\delta \rightarrow 0\), the final iterates \(x_{k_*(\delta )}\) tend to a solution of the inverse problem (4) \(\mathcal {T}\) subsequentially, i.e., every sequence \(x_{k_*(\delta _j)}\) with \(\delta _j\rightarrow 0\) as \(j\rightarrow \infty\) has a \(\mathcal {T}\) convergent subsequence and the limit of every \(\mathcal {T}\) convergent subsequence solves (4).
Proof
Existence of minimizers \(x_{k+1}(\alpha )\) of (32) with \(\alpha >0\) in place of \(\alpha _k\) follows like in the a priori setting of Theorem 1, using the fact that (47) implies (38).
To prove that \(\alpha _k\) satisfying (44) exists under condition (45), we first of all verify the upper bound with \(\alpha =0\) (which actually does not require (45)). To this end, we make use of minimality (39) and the upper bound in (47) to conclude
so that by (34), for any \(k\in \{1,\ldots ,k_*1\}\)
On the other hand, minimality and the fact that \(x^*\in \tilde{M}^\delta\) together with the lower bound in (47) and \(\mathcal {R}(x^*)=0\) yield
which by nonnegativity of \(\underline{a}J^\delta (x_{k+1}(\alpha ))\) yields
which by Assumption 3 implies \(\mathcal {T}_1\) convergence of \(x_{k+1}(\alpha )\) to \(x^*\), thus, by (45) \(\lim _{\alpha \rightarrow \infty } \sigma _k(\alpha )\ge \underline{\sigma }\). The Intermediate Value Theorem together with continuity of the mapping \(\alpha \mapsto \sigma _k(\alpha )\) according to Lemma 1 implies existence of an \(\alpha \in (0,\infty )\) such that \(\underline{\sigma }\le \sigma _k(\alpha )\le \overline{\sigma }\).
In both cases we get geometric decay of the cost function values: If (45) is satisfied, this follows from the lower bound in (47) and the upper bound in (44)
Otherwise, negation of (45) and the fact that in that case we set \(x_{k+1}=x^*\), together with the lower bound in (47) directly yields
This implies that \(k_*\) is finite, more precisely \(k_*\le \frac{\log (\tau \eta )\log (J^\delta (x_0))}{\log (q)}\).
To establish the bound on \(\mathcal {R}(x_{k+1})\), we again employ minimality (39) together with (47), which in case (45) with (44) yields
hence, due to (34), \(\tau (\overline{b}+\underline{\sigma }1)\ge \overline{a}\),
If (45) fails to hold then we set \(x_{k+1}=x^*\), hence get \(\mathcal {R}(x_{k+1})=0\).
The rest of the proof is the same as for Theorem 1. \(\diamondsuit\)
Remark 3
The computational cost for each Newton step (as compared to a gradient method) is determined by the effort spent on evaluating the action of the Hessian (approximation) \(H^\delta (x_k)\) on a vector and on solving systems with \(H^\delta (x_k)\) as a system matrix—again usually based on iterative methods and matrixvector products. For example, when usung the exact Hessian as \(H^\delta (x_k)\) in a reduced formulation of the inverse problem, each matrixvector product amounts to numerically solving a possibly large number of linearized versions of the PDE model; this effort can be reduced to just one PDE solve by means of adjoint methods. On the other hand, setting \(H^\delta (x_k)\) just to a multiple of the identity, one can even end up with a plain gradient method. However, in view of Taylor’s Theorem one is more likely to satisfy condition (38) with a better Hessian approximation. In practice, a limited memory BFGS method can probably be viewed as standard and only needs gradient information; note however, that for the cost functions in our numerical examples it would also be possible to find analytical expressions for the Hessian, cf., e.g., (58), whose action on a vector can be evaluated by means of, e.g., a finite element method.
The advantages of method (32) lie in its versatility: Besides the various options of choosing \(H^\delta (x_k)\), it also works in general Banach spaces and with quite general regularization functionals \(\mathcal {R}\). In case of a quadratic functional \(\mathcal {R}\), note that as opposed to the IRGNM considered, e.g., in [17, 22, 29], where depending on the choice of the data misfit term, the cost function can become nonlinear, we always deal with a quadratic overall cost function here.
Remark 4
Note that the conditions (48) on the constants can be satisfied by choosing \(\tau\) sufficiently large and \(\underline{\sigma }<\overline{\sigma }\) in an appropriate way, provided the constants in (47) satisfy
since then we can choose \(\underline{\sigma },\overline{\sigma }\) to satisfy \(1\overline{b}<\underline{\sigma }<\overline{\sigma }<1+\underline{a}\underline{b}\), so that (48) can be achieved by making \(\tau\) large enough.
Remark 5
Condition (47) is motivated by the fact that
with equality in case of a quadratic functional \(J^\delta\) (36) from which (again using nonnegativity of \(J^\delta\)) we expect values \(\underline{a}\le 1\), \(\underline{b}\ge 1\), \(\overline{a}\ge 1\), \(\overline{b}\le 1\) where these constants can be chosen the closer to one the closer \(J^\delta\) is to a quadratic functional. Also note that (47) holds with \(\underline{a}=\underline{b}=\overline{a}=\overline{b}\) in the quadratic case (36) independently of the definiteness of the Hessian, so does not necessarily relate to convexity of \(J^\delta\). Indeed, while nonnegativity of the Hessian would be enforced by assuming \(J^\delta \ge 0\) on all of X, we only assume this to hold on \(\tilde{M}^\delta\) cf. (11).
A sufficient condition for (47) (with \(\underline{a}=1\tilde{c}\), \(\underline{b}=1+\tilde{c}\), \(\overline{a}=1+\tilde{c}\), \(\overline{b}=1\tilde{c}\)) is
which, in its turn is implied by the weak tangential cone condition in the Hilbert space least squares setting (35)
with \(\tilde{c}=(1+\sqrt{2})c_{tc}\); cf. (30). This can be seen by using the fact that the left hand side in (50) just equals the left hand side in (49) with (35), and by estimating the right hand side with \(\alpha :=\Vert F(x_+)y^\delta \Vert\), \(\beta :=\Vert F(x)y^\delta \Vert\) as follows
Condition (50) with \(x_+=x^\dagger\) is also sufficient for condition (29) from the previous section with \(\gamma = 1c_{tc}\kappa\) provided \((1+c_{tc})\Vert F(x)y^\delta \Vert \le 2\sqrt{\kappa \eta (\delta )}\) and \(\Vert F'(x)\Vert \le 1\) as the estimate
following from (50) with the triangle inequality and Young’s inequality shows.
In order to further relate the assumptions (38), (47) made for Newton’s method with those (18), (23) for the projected gradient method, we will now point out that actually also the sufficient condition (49) involves some convexity.
For this purpose we consider the noise free case \(\delta =0\) for simplicity of exposition and use the fact that for \(n\in \mathbb {N}_0\), a functional \(J\in C^n(X)\) and elements \(x,\tilde{x},h\in X\), the identity
holds. Thus we can rewrite the left hand sides of the nonlinearity conditions (18), (49) as
and, with \(J(x^\dagger )=0\)
where, assuming \(J'(x^\dagger )=0\) (as is the case in the examples from Sect. 4)
and likewise for x replaced by \(x_+\). Similarly, using the identities \(\int _0^1\, d\theta =1\), \(\int _0^1\int _0^1\theta \,d\sigma \, d\theta =\frac{1}{2}\), one sees that for the right hand side in (49) with \(G:=J'\), \(H:=J''\), the identity
holds.
Since the left hand sides in (18), (49) both have to be nonnegative (in some uniform sense) we see from (51) and (52) (and setting \(x_+=x^\dagger\) in (49) to see necessity) that \(J''\) needs to be positive definite (in some uniform sense) in order for (18), (49) to hold. This amounts to a convexity condition on J.
Remark 6
Alternatively to (32) one could consider the projected versions (based on unconstrained minimization)
see [18] which, however, analogously to the projected Landweber iteration from [5, Section 3.2] only converges under a sufficiently strong source condition.
4 Application in diffusion/impedance identification
Following the seminal idea from [24] we consider variational formulations of the problem of identifying the spatially varying parameter \(\sigma\) in the elliptic PDE
from observations of \(\phi\). Depending on what kind of observations we consider, this problem arises in several applications that we will consider here, namely

(a)
in classical electrical impedance tomography EIT, where it is known as Calderon’s problem and \(\sigma\) plays the role of an electrical conductivity,

(b)
in impedance acoustic tomography IAT, a novel hybrid imaging method, again for reconstructing \(\sigma\) as a conductivity;

(c)
but also as a simplified version of the inverse groundwater filtration problem GWF of recovering the diffusion coefficient \(\sigma\) in an aquifer.
Although we will finally be only able to verify the crucial conditions (18), (49) for GWF, we stick to the electromagnetic context notation wise, since in our numerical experiments we will focus on a version of EIT that is known as impedance acoustic tomography IAT, see, e.g., [34]. In Sect. 5 we will also allow for experiments with several excitations (and corresponding measurements), hence consider
However for simplicity of notation, we will focus on the case \(I=1\), i.e., (54), in this section. The observations are, depending on the application
where for EIT and IAT we will consider the more realistic complete electrode model in Sect. 5. Concerning GWF, measurements are actually done on the piezometric head itself, however this allows to recover an approximation of its gradient by means of regularized numerical differentiation, see, e.g. [10] and the references therein.
Regularization is here only introduced via imposing simple bound constraints \(\sigma \in [\underline{\sigma },\overline{\sigma }]\), that is, Ivanov regularization. Besides the availability of efficient optimization methods for solving such problems, see e.g. [11] and the references therein and the straightforward use of known physical bounds, this has the advantage of leading to piecewise constant solutions, a fact that can be explained by some bangbang principle. This is relevant when identifying inclusions in a homogeneous background, which is the case that we focus on in our numerical experiments.
Considering a smooth and simply connected bounded domain \(\varOmega \subseteq \mathbb {R}^2\) and using the vector fields \(\mathbf {E}\) (the electric field), \(\mathbf {J}\) (the current density), where \(\nabla =\left( \begin{array}{c}\partial _1\\ \partial _2\end{array}\right)\), \(\nabla ^\bot =\left( \begin{array}{c}\partial _2\\ \partial _1\end{array}\right)\) we can equivalently rephrase (54) as
for some potential \(\psi\) (note that we are using the opposite sign convention as compared to the usual engineering notation). The cost function part pertaining to this model is, analogously to [24], therefore often called the Kohn–Vogelius functional
where we denote the infinitesimal area element by \(d\varOmega\) to avoid confusion with the abbreviation \(x_k\) for the iterates in the first three sections of this paper. Alternatively, we will consider the output least squares type cost function term
Note that (56) is quadratic with respect to \(\mathbf {J}\), thus quadratic with respect to \(\psi\).
Excitation is imposed via the current j through the boundary, i.e., as Dirichlet boundary condition on \(\psi\).
To incorporate the observations, we will consider the functionals
where again for GWF the use of the \(H^s(\varOmega )\) norm or flux data can be justified by some presmoothing procedure applied to the given measurements.
Using these functionals as building blocks and incorporating the excitation via injection of the current j through the boundary we can write the above parameter identification problems in several minimization based formulations. We will now list a few of them, where j sometimes appears explicitly, sometimes in tangentially integrated form, meaning that for a parameterization \(\varGamma\) of the boundary \(\partial \varOmega\) (normalized to \(\Vert \dot{\varGamma }\Vert =1\)) we define \(\alpha (\varGamma (s))=\int _0^s j(\varGamma (r))\, d r\) so that \(\mathbf {J}\cdot \nu =\nabla ^\bot \psi \cdot \nu =\frac{d\alpha }{ds}=j\). Moreover we will sometimes work with smooth extensions \(\phi _0\), \(\psi _0\) of v, \(\alpha\) to the interior of \(\varOmega\). While, as already mentioned, the observation functional will depend on the application, we always have both \(J_{mod}^{KV}\) and \(J_{mod}^{LS}\) at our disposal to incorporate the model, thus will only write \(J_{mod}\) below. There will also be versions based on an elimination of \(\sigma\) by writing, for fixed \(\phi ,\psi\), the minimizer of \(J_{mod}\) with respect to \(\sigma\) under the constraint \(\underline{\sigma }\le \sigma \le \overline{\sigma }\) as
Alternatively to eliminating \(\sigma\) it is also possible to eliminate \(\phi ,\psi\) by writing them as \(\phi (\sigma )\), \(\psi (\sigma )\) minimizing \(J_{mod}\) with respect to \(\phi ,\psi\). This together with the integrated current \(\alpha\) leads to boundary value problems for the elliptic PDE (54) and a similar PDE for \(\psi\)
(the latter two lines imply that \(\nabla ^\bot \cdot \mathbf {E}(\sigma )=0\) so existence of \(\phi\) such that \(\mathbf {E}(\sigma )=\nabla \phi\)) and corresponds to the classical reduced formulation of the inverse problem. Note that \(\phi (\sigma )\) is only defined in case of v being observed, i.e., for EIT.
EIT:
IAT:
GWF:
where
and \(\beta >0\) is a fixed parameter; we will simply set it to one in our computations. Note that \(J_{mod}(\sigma ,\mathbf {E}(\sigma ),\nabla ^\bot \psi (\sigma ))=0\), therefore, the model term does not appear in the last instances of IAT and GWF, respectively. However, due to the bound constraints incorporated into the definition of \(\sigma (\phi ,\psi )\), a nonzero value of \(J_{mod}(\sigma (\phi ,\psi ),\nabla \phi ,\nabla ^\bot \psi )\) is possible, which is why it appears in the third and fourth instances of EIT. The sixth instance of EIT is just the classical reduced formulation.
As far as convexity is concerned, the Hessians of the functionals in (55), (56), (57) compute as
Thus, the Hessians of \(J_{mod}^{KV}\), \(J_{mod}^{LS}\), \(J_{obs_1}^{IAT}\), \(J_{obs_2}^{IAT}\) can only be guaranteed to be positive at their minimal points, whereas those of \(J_{obs}^{EIT}\), \(J_{obs_1}^{GWF}\), \(J_{obs_2}^{GWF}\) are always positive. Since \(J_{obs}^{EIT}\) only acts on the boundary, its additive combination with \(J_{mod}^{KV}\) or \(J_{mod}^{LS}\) cannot be expected to yield a globally convex functional. Likewise, combinations of \(J_{obs_1}^{IAT}\) or \(J_{obs_2}^{IAT}\) with \(J_{mod}^{KV}\) or \(J_{mod}^{LS}\) cannot be expected to be overall convex. This corresponds to the known fact that also for other formulations of EIT and IAT, the usual nonlinearity/convexity conditions fail to hold.
A combination satisfying the nonlinearity assumption (49) and therefore also (47), (18) is GWF with
To verify this, we show that (29), (50) is satisfied for \(F(\sigma ,\phi ,\psi )=\left( \begin{array}{c}\sigma \nabla \phi \nabla ^\bot \psi \\ \nabla \phi \end{array}\right)\) by estimating (with the abbreviations \(\mathbf {E}=\nabla \psi\), \(\mathbf {J}=\nabla ^\bot \psi\))
which directly implies (50) with \(c_{tc}=\frac{\overline{\sigma }\underline{\sigma }}{\sup _{x\in \tilde{M}^\delta }\Vert F'(x)\Vert }\) and hence (29) with \(\gamma = 1c_{tc}\kappa\) provided \((1+c_{tc})\Vert F(x)y^\delta \Vert \le 2\sqrt{\kappa \eta (\delta )}\). In order to obtain a finite value of
we choose \(\tilde{M}^\delta\) to be a bounded subset of \(L^\infty (\varOmega )\times W^{1,\infty }(\varOmega )\times H^1(\varOmega )\) with an a priori bound satisfied by the exact solution of the inverse problem.
5 Numerical results for IAT, GWF, and EIT
In this section, we will provide some numerical results for the problem of identifying the conductivity \(\sigma\) in (54). As already mentioned, we will work with the more realistic complete electrode model (CEM) instead of idealized continuous boundary excitation and observations. Moreover, we will focus on the hybrid tomographic application IAT and we will only show one set of reconstructions of GWF and EIT. More extensive numerical tests for IAT but also for GWF and EIT can be found in the PhD thesis [12].
5.1 The complete electrode model and setting for the cost functions
In the complete electrode model (CEM) current is fed in through a finite number of electrodes, \(e_1, \dots , e_L\), see Fig. 1. In case of boundary measurements, as relevant for EIT, they are also taken at these electrodes.
Let \(J_i\), \(E_i\), \(i = 1,2, \dots , I\), be the current density and the electric field in the ith measurement, and let \(\phi _i\), \(\psi _i\) be the potentials for \(J_i\), \(E_i\); then they must satisfy
\(\forall i= 1, 2, \dots , I\), where

\(j_{\ell ,i}\), \(v_{\ell ,i}\) are the applied current and measured voltage on \(e_\ell\) at the ith measurement,

\(\{z_\ell \}_{\ell =1}^{L}\) is the set of positive contact impedances.
By assuming \(\psi (e_i^a) = 0\) and using (59c), (59d) (59b), we get
where \({\bar{j}}_{\ell ,i} = \sum _{k=1}^{\ell } j_{k,i}\), \(\bar{v}_{\ell ,i}(x) = z_\ell \Big ( \sum _{k=1}^{\ell 1} j_{k,i} \Big )+ v_{\ell ,i} d_{e_\ell } (x)\) and \(d_{e_\ell } (x)\) is the length of \(e_\ell\) from \(e_\ell ^a\) to x.
In the case of EIT, the data \((j_{\ell ,i}, v_{\ell ,i})_{\ell ,i} \in \mathbb {R}^{2LI}\) can be considered as \(({\bar{j}}_{\ell ,i}, \bar{v}_{\ell ,i})_{\ell ,i} \in \prod _{i=1}^I \prod _{\ell =1}^L (L^2(g_\ell ) \times L^2(e_\ell ))\) and the cost function part corresponding to observations is chosen as \(J^{EIT}_{obs}: H^1(\varOmega )^{2I} \rightarrow \mathbb {R}\),
where \(\varPhi = (\phi _i)_i\), \(\varPsi = (\psi _i)_i\), \({\bar{j}} = (\bar{j}_{\ell ,i})_{\ell ,i}\), \({\bar{v}} = ({\bar{v}}_{\ell ,i})_{\ell ,i}\) and \(\int \phi _i_{e_\ell }: e_\ell \rightarrow \mathbb {R}, x \mapsto \int _{e_\ell ^a}^x \phi _i {~\mathrm {d}}s\).
In the cases of IAT and GWF, instead of \((j_{\ell ,i}, v_{\ell ,i})_{\ell ,i}\), we observe \(\mathcal {H} = (\mathcal {H}_i)_i = (\sigma \nabla \phi _i^2)_i\) or \(p= (p_i)_i = (\phi _i)_i\), respectively and the cost function part corresponding to these observations is \(J^{IAT}_{obs} {/J^{GWF}_{obs}} : L^2(\varOmega ) \times H^1(\varOmega )^I \rightarrow \mathbb {R}\),
In all three cases EIT, IAT, GWF, we choose the cost function part corresponding to the model as \(J^{KV}_{mod}: L^2(\varOmega ) \times H^1(\varOmega )^{2I} \rightarrow \mathbb {R}\),
and combine it with the observation part to
on the admissible sets
As in the previous section, besides the resulting allatonce versions [cf. EIT (i), (ii) and IAT (i)] we also consider some of the reduced versions of the cost function. [For GWF we will only show results with the allatonce version (i)]
The first version involves eliminating \(\sigma\) from the cost function [cf. EIT (iii), (iv) and IAT (ii)], by defining, for given \(\varPhi ,\varPsi\), the corresponding \(\sigma\) by
or explicitly
For the case of IAT, we set
for the case of EIT,
and in all cases
Note that in spite of the minimizing predefinition of \(\sigma (\varPhi , \varPsi )\), the model cost function part may be nonzero due to the constraints and therefore still needs to be taken into account.
The second alternative cost function involves eliminating \((\varPhi , \varPsi )\) from the cost function [cf. EIT (v), (vi) and IAT (iii)] by means of the weak form of the CEM PDE (59)
and \(\nabla ^\bot \psi _i = \sigma \nabla \phi _i, \forall i \in \{1,\dots ,I\}\), which leads to \(J^{KV}_{mod} (\sigma , \varPhi (\sigma ), \varPsi (\sigma )) = 0\). Hence, we have, for the case of IAT,
for the case of EIT,
where \(v(\sigma ) = (v_{\ell ,i}(\sigma ))_{i \in \{1,\dots ,I\},\ell \in \{1,\dots ,L\}}\) is the solution to (69) and
Summarizing, our tests comprise three different formulations: allatonce, eliminate \(\sigma\), eliminate \((\varPhi , \varPsi )\). The latter corresponds to the conventional reduced approach, whereas the former two are new.
5.2 Implementation using the finite element method in Matlab
In order to generate synthetic data by solving the CEM PDE (59) we use the finite element method. In all our computations, \(\varOmega\) is the unit circle in \(\mathbb {R}^2\) with eight identical electrodes (\(L=8\)) denoted by \(e_1, \dots , e_8\) attached equidistantly on its boundary (see Fig. 1). The domain \(\varOmega\) is discretized by a regular finite element mesh defined by nodes \(P_k, k\in \{1,\dots , N_{\mathrm {node}}= 913\}\) and elements \(\varOmega _h, h \in \{1,\dots ,N_{\mathrm {element}}=432\}\). The ansatz spaces \(L^2 (\varOmega )\) for \(\sigma\) and \(H^1 (\varOmega )\) for \(\phi ,\psi\) are approximated by piecewise constant and continuous piecewise quadratic finite element spaces \({\tilde{L}}^2 (\varOmega )\) and \({\tilde{H}}^1 (\varOmega )\), respectively.
With \(L=8\) electrodes, there are \(N_{\mathrm {meas}}= 28\) possible combinations of excitations—we will use some of them to reconstruct \(\sigma\) later. At the ith measurement, we impose the injected current \((j_{\ell ,i})_{\ell =1}^L\) with
at the electrodes and then solve the Galerkin discretized weak form (cf. (69))
to find \((\phi ^{\mathrm {ex}}_i,(v^{\mathrm {ex}}_{\ell ,i}))\) and the corresponding exact data
The synthetic measured data is generated by adding random noise such that
which in an obvious way defines the noisy versions \(J^\delta\), \(J_\sigma ^\delta\), \(J_{\varPhi ,\varPsi }^\delta\) and \(M^\delta\), \(M_{\sigma ,{\mathrm {ad}}}^\delta\), \(M_{(\varPhi , \varPsi ),{\mathrm {ad}}}^\delta\) of the cost functions and admissible sets, respectively. In our tests we consider three values of \(\delta\): \(\delta =0\), \(\delta =0.01\) and \(\delta =0.1\).
To avoid an inverse crime, we used a coarser mesh in our reconstructions.
In (69) we set the value of contact impedances \(z_\ell\), \(\ell \in \{1,\dots ,L\}\), to 0.1.
The test case considered in all of our computational results is defined by a constant inclusion on a constant background
where \(B_r(p_1,p_2) \subset \mathbb {R}^2\) is the ball centered at \((p_1,p_2)\) with radius r. For IAT we also show results with two inclusions
With each of the three above mentioned cost function combinations (allatonce, eliminated \(\sigma\), eliminated \((\varPhi ,\varPsi )\), the iterates \(x_k=(\sigma _k,\varPhi _k,\varPsi _k)\), or \(x_k=(\varPhi _k,\varPsi _k)\), or \(x_k=\sigma _k\), are defined by the projected gradient method (12) from Sect. 2 where \(\mu _k\) is found by an Armijo back tracking line search. Details on computation of the gradients of the various cost functions can be found in [12]. The iteration is stopped by the discrepancy principle (20) in the noisy case and as soon as the step size fell below a value \(\epsilon _\mu\) (which we set to \(10^{10}\) in our tests) in case of exact data.
5.3 Numerical results for IAT
We consider four cases of excitations, namely

\(I=1\), with \(j_{1,1}=1\), \(j_{5,1}=1\) and \(j_{k,1}=0\) otherwise;

\(I=2\), with \(j_{1,1}=j_{3,2}=1\), \(j_{5,1}=j_{7,2}=1\), and \(j_{k,i}=0\) otherwise;

\(I=4\), with \(j_{1,1}=j_{3,2}=j_{2,3}=j_{4,4}=1\), \(j_{5,1}=j_{7,2}=j_{6,3}=j_{8,4}=1\) and \(j_{k,i}=0\) otherwise.

\(I=28\), with all \((\begin{array}{c} 8\\ 2 \end{array})\) combinations of setting \(j_{k,i}=1\), \(j_{\ell ,i}=1\) for \(k\not =\ell \in \{1,\ldots ,8\}\)
The starting value is set to the mean value of the maximal and minimal value for the conductivity \(\sigma _0 = \frac{1}{2} (\underline{\sigma } + \overline{\sigma })\) and \(\varPhi _0, \varPsi _0\), if necessary, are gained from the weak form (69) where \(\sigma\) is replaced by \(\sigma _0\).
Tables 1, 2 and 3 show the data about the number of iterations, the error \(\Vert \sigma _{end}  \sigma ^{{\mathrm {ex}}} \Vert _{L^2 (\varOmega )}\), the CPU time (in seconds) and the CPU time for each iteration for various versions of cost functions.
In Figs. 2, 3 and 4, we display some pictures of reconstructions.
Evidently, reconstructions improve with more information—that is, larger I—and better quality data—that is, lower \(\delta\). Also, the computationally more expensive “eliminate \((\varPhi ,\varPsi )\)” version that requires solution of elliptic boundary value problems provides better results than the cheaper “eliminate \(\sigma\)” or allatonce versions. This is clearly visible in the relative errors (column 3 of Tables 1, 2 and 3) and CPU times (last two columns of Tables 1, 2 and 3). Surprisingly, in spite of the fact that IAT cannot be shown to satisfy the assumption (49) due to nonlinearity of the observation operator, results are better for IAT than for GWF, see Fig. 8, to be compared with Fig. 2
Finally, we also show some reconstructions of two inclusions in Figs. 5, 6 and 7.
5.4 Numerical results for GWF
Based on the same set of excitation combinations and the same starting value (see the beginning of Sect. 5.3), we generated the reconstructions displayed in Fig. 8. As already mentioned above, they are actually worse than those obtained in the (more nonlinear) IAT problem and the use of the CEM clearly leads to boundary artefacts.
5.5 Numerical results for EIT
We close with a few plots of reconstructions for EIT. Here, since we only have boundary observations, in order to gain enough information, it is necessary to use all measurements \(I=28\). Moreover, the allatonce and eliminating\(\sigma\) versions failed to converge, so we here only provide results with the classical reduced version of EIT corresponding to the eliminating \((\varPhi ,\varPsi )\) version (71). Starting from the constant value \(\sigma _{0} = \frac{1}{2} \big ( \underline{\sigma } + \overline{\sigma } \big )\) we obtain the reconstructions in Fig. 9 for noise levels of zero, one and ten per cent. Note that in view of the exponential illposedness of this inverse problem, the quality of reconstructions is more than reasonable for this level of data contamination. The reasons for failure of the allatonce and eliminating\(\sigma\) versions for EIT are at least two fold: First of all, the looser bond between parameter \(\sigma\) and states \((\varPhi ,\varPsi )\) in these versions would necessitate additional regularization of the state (recall that we here only regularize \(\sigma\) by imposing bounds)—which could be easily added, but we did not do so here in order to keep comparability of the three different examples. Secondly, this effect is most severe in this example due to its exponential illposedness.
The latter, together with some regularity loss in the solution due to the CEM, is the reason for boundary artefacts showing up at the highest noise level.
More details in particular on numerical tests for EIT can be found in the PhD thesis [12].
6 Conclusions and remarks
In this paper we have provided convergence results on the iterative solution methods (gradient or Newton type) for minimization based formulations of inverse problems. We apply these to the identification of a spatially varying diffusion coefficient in an elliptic PDE from different kinds of measurements, in particular corresponding to the electrical impedance tomography EIT, the impedance acoustic tomography IAT problem, and the inverse groundwater filtration problem GWF. We provide numerical results for these three test cases, thereby mainly focusing on IAT. Future work will, e.g., be concerned with investigations on the convexity conditions: How can an additive combination of functionals and constraints help to satisfy them, e.g., for EIT or IAT?
Also a comparison of the Newton type method analyzed in Sect. 3 with the gradient type method from Sect. 2 should be carried out. Clearly the Newton type method is more demanding both from an implementation and from a computational cost (per step) point of view. However, this might still pay off in view of the fact that it can be expected to reach the desired error tolerance already after a much smaller number of steps.
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximally monotone operators. Math. Program. 174, 391–432 (2019). https://doi.org/10.1007/s101070181252x
Bakushinskii, A.B.: The problem of the convergence of the iteratively regularized Gauss–Newton method. Comput. Math. Math. Phys. 32, 1353–1359 (1992)
Bakushinsky, A.B., Kokurin, M.Y.: Iterative Methods for Approximate Solution of Inverse Problems. Mathematics and Its Applications, vol. 577. Springer, Dordrecht (2004)
Boţ, R.I., Hein, T.: Iterative regularization with a general penalty term—theory and application to \(L^1\) and \(TV\) regularization. Inverse Probl. 28(10), 104010 (2012). https://doi.org/10.1088/02665611/28/10/104010
Eicke, B.: Iteration methods for convexly constrained illposed problems in Hilbert space. Numer. Funct. Anal. Optim. 13(5–6), 413–429 (1992). https://doi.org/10.1080/01630569208816489
Engl, H., Kunisch, K., Neubauer, A.: Convergence rates for Tikhonov regularisation of nonlinear illposed problems. Inverse Probl. 5(4), 523–540 (1989)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Dordrecht (1996). https://doi.org/10.1007/9789400917408
Hanke, M.: A regularization Levenberg–Marquardt scheme, with applications to inverse groundwater filtration problems. Inverse Probl. 13, 79–95 (1997)
Hanke, M., Neubauer, A., Scherzer, O.: A convergence analysis of the Landweber iteration for nonlinear illposed problems. Numer. Math. 72, 21–37 (1995)
Hanke, M., Scherzer, O.: Inverse problems light: numerical differentiation. Am. Math. Mon. (2001). https://doi.org/10.2307/2695705
Hungerländer, P., Kaltenbacher, B., Rendl, F.: Regularization of inverse problems via box constrained minimization. Inverse Probl. Imaging 14, 437–461 (2020). https://doi.org/10.3934/ipi.2020021
Huynh, K.V.: Solving inverse problems without forward operators: application examples and iterative solvers. Ph.D. thesis, AlpenAdriaUniversität Klagenfurt (2021)
Jin, Q., Wang, W.: Landweber iteration of Kaczmarz type with general nonsmooth convex penalty functionals. Inverse Probl. 29(8), 085011 (2013). https://doi.org/10.1088/02665611/29/8/085011
Kaltenbacher, B.: Regularization based on allatonce formulations for inverse problems. SIAM J. Numer. Anal. 54, 2594–2618 (2016). https://doi.org/10.1137/16M1060984. arXiv:1603.05332v1 [math.NA]
Kaltenbacher, B.: Allatonce versus reduced iterative methods for time dependent inverse problems. Inverse Probl. 33, 064002 (2017). https://doi.org/10.1088/13616420/aa6f34. arXiv:1910.02857 [math.NA]
Kaltenbacher, B.: Minimization based formulations of inverse problems and their regularization. SIAM J. Optim. 28, 620–645 (2018). https://doi.org/10.1137/17M1124036
Kaltenbacher, B., Klassen, A., Previatti de Souza, M.: The Ivanov regularized Gauss–Newton method in Banach space with an a posteriori choice of the regularization radius. J. Inverse Illposed Probl. 27, 539–557 (2019). https://doi.org/10.1515/jiip20180093. arXiv:1910.01811 [math.NA]
Kaltenbacher, B., Neubauer, A.: Convergence of projected iterative regularization methods for nonlinear problems with smooth solutions. Inverse Probl. 22, 1105–1119 (2006)
Kaltenbacher, B., Neubauer, A., Scherzer, O.: Iterative Regularization Methods for Nonlinear Problems. Radon Series on Computational and Applied Mathematics, de Gruyter, Berlin (2008)
Kaltenbacher, B., Nguyen, T.T.N., Scherzer, O.: The tangential cone condition for some coefficient identification model problems in parabolic PDEs. In: Kaltenbacher, B., Wald, A., Schuster, T. (eds.) Timedependent Problems in Imaging and Parameter Identification. Springer, Berlin (2021)
Kaltenbacher, B., Schöpfer, F., Schuster, T.: Convergence of some iterative methods for the regularization of nonlinear illposed problems in Banach spaces. Inverse Probl. 25, 065003 (2009). https://doi.org/10.1088/02665611/25/6/065003
Kaltenbacher, B., Previatti de Souza, M.: Convergence and adaptive discretization of the IRGNM Tikhonov and the IRGNM Ivanov method under a tangential cone condition in Banach space. Numer. Math. 140, 449–478 (2018). https://doi.org/10.1007/s0021101809715. arXiv:1707.07589 [math.NA]
Kindermann, S.: Convergence of the gradient method for illposed problems. Inv. Probl. Imaging 11(4), 703–720 (2017). https://doi.org/10.3934/ipi.2017033
Kohn, R.V., Vogelius, M.: Relaxation of a variational method for impedance computed tomography. Commun. Pure Appl. Math. 40(6), 745–777 (1987)
Kunisch, K., Sachs, E.W.: Reduced SQP methods for parameter identification problems. SIAM J. Numer. Anal. 29(6), 1793–1820 (1992). https://doi.org/10.1137/0729100
Kupfer, F., Sachs, E.: Numerical solution of a nonlinear parabolic control problem by a reduced SQP method. Comput. Optim. Appl. 1(1), 113–135 (1992)
Leibfritz, F., Sachs, E.W.: Inexact SQP interior point methods and large scale optimal control problems. SIAM J. Control. Optim. 38(1), 272–293 (1999). https://doi.org/10.1137/S0363012996298795
Orozco, C.E., Ghattas, O.N.: A reduced SAND method for optimal design of nonlinear structures. Int. J. Numer. Methods Eng. 40(15), 2759–2774 (1997)
Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.: Regularization Methods in Banach Spaces. Radon Series on Computational and Applied Mathematics, de Gruyter, Berlin (2012)
Shenoy, A.R., Heinkenschloss, M., Cliff, E.M.: Airfoil design by an allatonce method. Int. J. Comput. Fluid Mech. 11, 3–25 (1998)
Ta’asan, S.: “one shot” methods for optimal control of distributed parameter systems: I finite dimensional control. Tech. rep., Institute for Computer Applications in Science and Engineering : NASA Langley Research Center (1991)
Van Huynh, K., Kaltenbacher, B.: Some application examples of minimization based formulations of inverse problems and their regularization. Inverse Probl. Imaging 15, 415–443 (2020). https://doi.org/10.3934/ipi.2020074. arXiv:2004.12965 [math.NA]
Wang, J., Wang, W., Han, B.: An iteration regularization method with general convex penalty for nonlinear inverse problems in Banach spaces. J. Comput. Appl. Math. 361, 472–486 (2019). https://doi.org/10.1016/j.cam.2019.05.008
Widlak, T., Scherzer, O.: Hybrid tomography for conductivity imaging. Inverse Prob. 28(8), 084008 (2012). https://doi.org/10.1088/02665611/28/8/084008
Acknowledgements
The authors thank both reviewers for their careful reading of the manuscript and their detailed reports with valuable comments and suggestions that have led to an improved version of the paper. This work was supported by the Austrian Science Fund fwf under the grants P30054 and DOC 78.
Funding
Open access funding provided by University of Klagenfurt.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
Lemma 2
([1, Lemma A.3]; Opial, discrete) Let S be a non empty subset of a Hilbert space X, and \((x_k)_{k\in \mathbb {N}}\) a sequence of elements of X. Assume that

(i)
for every \(z \in S\), \(\lim _{k\rightarrow \infty }\Vert x_k  z\Vert\) exists;

(ii)
every weak sequential limit point of \((x_k)_{k\in \mathbb {N}}\), as \(k\rightarrow \infty\), belongs to S.
Then \(x_k\) converges weakly as \(k\rightarrow \infty\) to a point in S.
Lemma 3
([1, Lemma A.2]; Opial, continuous) Let S be a non empty subset of a Hilbert space X, and \(x:[0,\infty )\rightarrow X\) a map. Assume that

(i)
for every \(z \in S\), \(\lim _{t\rightarrow \infty }\Vert x(t)  z\Vert\) exists;

(ii)
every weak sequential limit point of x(t), as \(t\rightarrow \infty\), belongs to S.
Then x(t) converges weakly as \(t\rightarrow \infty\) to a point in S.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kaltenbacher, B., Van Huynh, K. Iterative regularization for constrained minimization formulations of nonlinear inverse problems. Comput Optim Appl 81, 569–611 (2022). https://doi.org/10.1007/s1058902100343x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s1058902100343x