# Iteratively regularized Newton-type methods for general data misfit functionals and applications to Poisson data

## Authors

Thorsten Hohage, Frank Werner

DOI: 10.1007/s00211-012-0499-z

- Cite this article as:
- Hohage, T. & Werner, F. Numer. Math. (2013) 123: 745. doi:10.1007/s00211-012-0499-z

## Abstract

We study Newton type methods for inverse problems described by nonlinear operator equations \(F(u)=g\) in Banach spaces where the Newton equations \(F^{\prime }(u_n;u_{n+1}-u_n) = g-F(u_n)\) are regularized variationally using a general data misfit functional and a convex regularization term. This generalizes the well-known iteratively regularized Gauss–Newton method (IRGNM). We prove convergence and convergence rates as the noise level tends to \(0\) both for an a priori stopping rule and for a Lepskiĭ-type a posteriori stopping rule. Our analysis includes previous order optimal convergence rate results for the IRGNM as special cases. The main focus of this paper is on inverse problems with Poisson data where the natural data misfit functional is given by the Kullback–Leibler divergence. Two examples of such problems are discussed in detail: an inverse obstacle scattering problem with amplitude data of the far-field pattern and a phase retrieval problem. The performance of the proposed method for these problems is illustrated in numerical examples.

### Mathematics Subject Classification (2000)

65J15 · 65J20 · 78A46 · 65K10

## 1 Introduction

For fundamental physical reasons, photon count data are described by a Poisson process with the exact data \(g^{\dagger }\) as mean if read-out noise and finite averaging volume of detectors is neglected. Ignoring this a priori information often leads to non-competitive reconstruction methods.

For Poisson data the natural data misfit functional, given by the negative log-likelihood, is the *Kullback–Leibler divergence*. Combining such a data misfit functional with a convex penalty term incorporating *a priori* knowledge about the unknown solution \(u^\dagger \) leads to the iteratively regularized Newton-type method (4) studied in this paper.

The most common choice of the data misfit functional is \(\mathcal{S }\left(\hat{g};g\right) = \left\Vert\, g-\hat{g}\right\Vert_{\mathcal{Y }}^2\) with a Hilbert space norm \(\Vert \cdot \Vert _{\mathcal{Y }}\). This can be motivated by the case of (multi-variate) Gaussian errors. If the penalty term is also given by a Hilbert space norm \(\mathcal R \left(u\right)=\left\Vert\,u-u_0\right\Vert_{\mathcal{X }}^2\), (4) becomes the iteratively regularized Gauss–Newton method (IRGNM), which is one of the most popular methods for solving nonlinear ill-posed operator equations [2, 3, 9, 31]. If the penalty term \(\left\Vert\, u-u_0\right\Vert_{\mathcal{X }}^2\) is replaced by \(\left\Vert\,u - u_n\right\Vert_{\mathcal{X }}^2\), one obtains the Levenberg–Marquardt method, which is well known in optimization and was first analyzed as a regularization method in [20]. Recently, a generalization of the IRGNM to Banach spaces has been proposed and analyzed by Kaltenbacher and Hofmann [30].
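For readers who prefer a computational view, the IRGNM update just described can be sketched in a finite-dimensional Hilbert-space setting, where each Newton step has a closed-form normal-equation solution. This is a minimal sketch, not the paper's algorithm: the forward operator, Jacobian, and regularization schedule below are hypothetical placeholders.

```python
import numpy as np

def irgnm_step(F, dF, u_n, u0, g_obs, alpha_n):
    """One IRGNM step: minimize
    ||F'[u_n](u - u_n) + F(u_n) - g_obs||^2 + alpha_n * ||u - u0||^2,
    whose minimizer solves the regularized normal equations below."""
    J = dF(u_n)                                   # Jacobian F'[u_n]
    r = g_obs - F(u_n)                            # data residual
    A = J.T @ J + alpha_n * np.eye(len(u_n))
    b = J.T @ r + alpha_n * (u0 - u_n)
    return u_n + np.linalg.solve(A, b)

def irgnm(F, dF, u0, g_obs, alpha0=1.0, q=0.5, n_iter=10):
    """IRGNM with the geometric schedule alpha_n = alpha0 * q**n."""
    u = u0.copy()
    for n in range(n_iter):
        u = irgnm_step(F, dF, u, u0, g_obs, alpha0 * q**n)
    return u

# Sanity check on a mildly nonlinear, well-posed toy problem (hypothetical):
F = lambda u: u + 0.1 * u**2
dF = lambda u: np.diag(1 + 0.2 * u)
u_true = np.array([0.5, 0.3])
u_rec = irgnm(F, dF, np.zeros(2), F(u_true), n_iter=15)
```

With exact data and a geometrically decaying \(\alpha_n\) the iterates approach the exact solution on this well-posed toy problem; for ill-posed problems the stopping index must be coupled to the noise level, as analyzed below.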

As opposed to a Hilbert or Banach space setting our data misfit functional \(\mathcal{S }\) does not necessarily fulfill a triangle inequality. Therefore, it is necessary to use more general formulations of the noise level and the tangential cone condition, which controls the degree of nonlinearity of the operator \(F\). Both coincide with the usual assumptions if \(\mathcal{S }\) is given by a norm. Our analysis uses variational methods rather than methods based on spectral theory, which have recently been studied in the context of inverse problems by a number of authors (see, e.g., [13, 24, 30, 40, 42]).

The plan of this paper is as follows: In the following section we formulate our first main convergence theorem (Theorem 2.3) and discuss its assumptions. The proof is given in Sect. 3. In Sect. 4 we discuss the case of additive variational inequalities and state a convergence rates result for a Lepskiĭ-type stopping rule (Theorem 4.2). In Sect. 5 we compare our result to previous results on the iteratively regularized Gauss–Newton method. Section 6 is devoted to the special case of Poisson data, which has been our main motivation. We conclude our paper with numerical results for an inverse obstacle scattering problem and a phase retrieval problem in optics in Sect. 7.

## 2 Assumptions and convergence theorem with a priori stopping rule

Throughout the paper we assume the following mapping and differentiability properties of the forward operator \(F\):

**Assumption 1**

(*Assumptions on* \(F\) *and* \(\mathcal R \)) Let \(\mathcal{X }\) and \(\mathcal{Y }\) be Banach spaces and let \(\mathfrak B \subset \mathcal{X }\) be a convex subset. Assume that the forward operator \(F:\mathfrak B \rightarrow \mathcal{Y }\) and the penalty functional \(\mathcal R : \mathcal{X }\rightarrow \left(-\infty , \infty \right]\) have the following properties:

- 1.
\(F\) is injective.

- 2.\(F:\mathfrak B \rightarrow \mathcal{Y }\) is continuous, the first variations$$\begin{aligned} F^{\prime }(u;v-u):=\lim \limits _{t\searrow 0} \frac{1}{t}(F(u+t(v-u))-F(u)) \end{aligned}$$exist for all \(u,v\in \mathfrak B \), and \(h\mapsto F^{\prime }(u;h)\) can be extended to a bounded linear operator \(F^{\prime }[u]\in L(\mathcal{X },\mathcal{Y })\) for all \(u\in \mathfrak B \).
- 3.
\(\mathcal R \) is proper and convex, and \(\mathfrak B \cap \mathrm{dom}(\mathcal R )\ne \emptyset \).

At interior points \(u \in \mathfrak B \) the second assumption amounts to Gateaux differentiability of \(F\).

To motivate our assumptions on the data misfit functional, let us consider the case that \(g^{\mathrm{obs}}= F(u^{\dagger })+\xi \), and \(\xi \) is Gaussian white noise on the Hilbert space \(\mathcal Y \), i.e. \(\langle \xi ,g\rangle \sim N(0,\Vert g\Vert ^2)\) and \(\mathbf{E }\langle \xi ,g\rangle \,\langle \xi ,\tilde{g}\rangle = \langle g, \tilde{g}\rangle \) for all \(g,\tilde{g}\in \mathcal Y \). If \(\mathcal{Y }=\mathbb R ^J\), then the negative log-likelihood functional is given by \(\mathcal{S }\left(g^{\mathrm{obs}};g\right) = \Vert g-g^{\mathrm{obs}}\Vert _{2}^2\). However, in an infinite dimensional Hilbert space \(\mathcal{Y }\) we have \(\Vert g^{\mathrm{obs}}\Vert _{\mathcal{Y }}=\infty \) almost surely, and \(\mathcal{S }\left(g^{\mathrm{obs}};\cdot \right)\equiv \infty \) is obviously not a useful data misfit term. Therefore, one formally subtracts \(\Vert g^{\mathrm{obs}}\Vert _{\mathcal{Y }}^2\) (which is independent of \(g\)) to obtain \(\mathcal{S }\left(g^{\mathrm{obs}};g\right) := \left\Vert\, g\right\Vert_{\mathcal{Y }}^2 - 2 \left<g^{\mathrm{obs}},g\right>_{\mathcal{Y }}\). For exact data \(g^{\dagger }\) we can of course use the data misfit functional \(\mathcal{T }\left(g^\dagger ;g\right) = \left\Vert\, g-g^\dagger \right\Vert_{\mathcal{Y }}^2\). As opposed to \(\mathcal{S }\), the functional \(\mathcal{T }\) is nonnegative and does indeed describe the size of the error in the data space \(\mathcal{Y }\). It will play an important role in our analysis.

It may seem cumbersome to work with two different types of data misfit functionals \(\mathcal{S }\) and \(\mathcal{T }\). A straightforward idea to fix the free additive constant in \(\mathcal{S }\) is to introduce \(\tilde{\mathcal{S }}\left(g^{\mathrm{obs}};g\right):= \mathcal{S }\left(g^{\mathrm{obs}};g\right)-\tilde{\mathfrak{s }}\) with \(\tilde{\mathfrak{s }}:=\inf _{g\in \mathcal{Y }} \mathcal{S }\left(g^{\mathrm{obs}};g\right)\) such that \(\tilde{\mathcal{S }}\left(g^{\mathrm{obs}};\cdot \right)\) is nonnegative and \(\tilde{\mathcal{S }}\left(g^{\dagger };g\right)=\mathcal{T }\left(g^{\dagger };g\right)\). However, \(\tilde{\mathfrak{s }}=-\infty \) a.s. A better choice of the additive constant is \(\mathfrak s = \mathbf{E }\mathcal{S }\left(g^{\mathrm{obs}};g\right)-\mathcal{T }\left(g^{\dagger };g\right) = -\Vert g^{\dagger }\Vert ^2\) since for this choice the error has the convenient representation \(\mathcal{S }\left(g^{\mathrm{obs}};g\right)+ \Vert g^{\dagger }\Vert ^2-\mathcal{T }\left(g^{\dagger };g\right) = -2\langle \xi ,g\rangle _{\mathcal{Y }}\), and the expected error \(\mathbf{E }\big |\mathcal{S }\left(g^{\mathrm{obs}};g\right)-\mathfrak s -\mathcal{T }\left(g^{\dagger };g\right)\big |^2\) is minimized. Note that \(\mathfrak s \) depends on the unknown \(g^{\dagger }\), but this does not matter since the value of \(\mathfrak s \) does not affect the numerical algorithms. Bounds on \(\sup _{g\in \tilde{\mathcal{Y }}}\left|\langle \xi ,g\rangle _{\mathcal{Y }}\right|\) with high probabilities for certain subsets \(\tilde{\mathcal{Y }}\subset \mathcal Y \) (concentration inequalities) have been studied intensively in probability theory (see e.g. [34]). Such results can be used in the case of Gaussian errors to show that the following deterministic error assumption holds true with high probability and uniform bounds on \(\mathbf{err}(g)\) for \(g\in \tilde{\mathcal{Y }}\).
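In the finite-dimensional case the algebra behind this choice of \(\mathfrak s\) can be checked directly. The following sketch (with arbitrary made-up vectors) verifies the representation \(\mathcal{S }\left(g^{\mathrm{obs}};g\right)-\mathfrak s -\mathcal{T }\left(g^{\dagger };g\right) = -2\langle \xi ,g\rangle \) for \(\mathfrak s = -\Vert g^{\dagger }\Vert ^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
g_dagger = rng.normal(size=5)          # exact data
xi = rng.normal(size=5)                # noise realization
g_obs = g_dagger + xi                  # observed data
g = rng.normal(size=5)                 # arbitrary trial element

# S(g_obs; g) = ||g||^2 - 2 <g_obs, g>  (negative log-likelihood up to a constant)
S = g @ g - 2 * g_obs @ g
# T(g_dagger; g) = ||g - g_dagger||^2   (exact-data misfit)
T = (g - g_dagger) @ (g - g_dagger)
# additive constant s = -||g_dagger||^2
s = -(g_dagger @ g_dagger)

error_term = S - s - T                 # should equal -2 <xi, g>
```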

**Assumption 2**

(*Data errors, properties of* \(\mathcal{S }\) *and* \(\mathcal{T }\)) Let \(u^\dagger \in \mathfrak B \subset \mathcal{X }\) be the exact solution and denote by \(g^\dagger := F\left(u^\dagger \right) \in \mathcal{Y }\) the exact data. Let \(\mathcal{Y }^{\mathrm{obs}}\) be a set containing all possible observations and \(g^{\mathrm{obs}}\in \mathcal{Y }^{\mathrm{obs}}\) the observed data. Assume that:

- 1.
The fidelity term \(\mathcal{T }: F\left(\mathfrak B \right) \times \mathcal{Y }\rightarrow [0,\infty ]\) with respect to exact data fulfills \(\mathcal{T }\left(g^\dagger ;g^\dagger \right)=0\).

- 2.\(\mathcal{T }\) and the fidelity term \(\mathcal{S }: \mathcal{Y }^{\mathrm{obs}}\times \mathcal{Y }\rightarrow (-\infty ,\infty ]\) with respect to noisy data are connected as follows: There exist a constant \(C_{\mathrm{err}}\ge 1\) and functionals \(\mathbf{err}: \mathcal{Y }\rightarrow \left[0, \infty \right]\) and \(\mathfrak s :F \left(\mathfrak B \right) \rightarrow (-\infty ,\infty )\) such that$$\begin{aligned} \mathcal{S }\left(g^{\mathrm{obs}};g\right) - \mathfrak s (g^{\dagger })&\le C_{\mathrm{err}}\mathcal{T }\left(g^\dagger ;g\right) + C_{\mathrm{err}}\mathbf{err}\left(g\right)\end{aligned}$$(8a)$$\begin{aligned} \mathcal{T }\left(g^\dagger ;g\right)&\le C_{\mathrm{err}}\left( \mathcal{S }\left(g^{\mathrm{obs}};g\right) - \mathfrak s (g^{\dagger }) \right) + C_{\mathrm{err}}\mathbf{err}\left(g\right) \end{aligned}$$(8b)for all \(g \in \mathcal{Y }\).

*Example 2.1*

- 1.Additive deterministic errors in Banach spaces. Assume that \(\mathcal{Y }^{\mathrm{obs}}= \mathcal{Y }\),$$\begin{aligned} \Vert g^{\mathrm{obs}}-g^{\dagger }\Vert \le \delta ,\quad \quad \text{ and}\quad \quad \mathcal{S }\left(g_2;g_1\right) = \mathcal{T }\left(g_2;g_1\right)= \left\Vert\,g_1-g_2\right\Vert_{\mathcal{Y }}^r \end{aligned}$$with \(r\in \left[1,\infty \right)\). Then it follows from the simple inequalities \(\left(a+b\right)^r \le 2^{r-1}\left(a^r+b^r\right)\) and \(\left|a-b\right|^r+b^r\ge 2^{1-r}a^r\) that (8) holds true with \(\mathbf{err}\equiv \left\Vert\,g^{\mathrm{obs}}-g^{\dagger }\right\Vert_{\mathcal{Y }}^r\), \(\mathfrak s \equiv 0\) and \(C_{\mathrm{err}}= 2^{r-1}\).
- 2.
For randomly perturbed data a general recipe for the choice of \(\mathcal S , \mathcal T \) and \(\mathfrak s \) is to define \(\mathcal S \) as the negative log-likelihood functional, \(\mathfrak s (g^{\dagger }):= \mathbf{E }_{g^{\dagger }} \mathcal{S }\left(g^{\mathrm{obs}};g^{\dagger }\right)\) and \(\mathcal{T }\left(g^{\dagger };g\right):=\mathbf{E }_{g^{\dagger }}\mathcal{S }\left(g^{\mathrm{obs}};g\right)-\mathfrak s (g^{\dagger })\). Then we always have \(\mathcal{T }\left(g^{\dagger };g^{\dagger }\right)=0\), but part 2. of Assumption 2 has to be verified case by case.

- 3.
*Poisson data.* For discrete Poisson data we have already seen in the introduction that the general recipe of the previous point yields \(\mathcal{S }\) given by (2), \(\mathcal{T }= \mathbb{KL }\) and \(\mathfrak s (g^{\dagger })=\sum _{j=1}^J \left[g^{\dagger }_j- g^{\dagger }_j\ln \left(g^{\dagger }_j\right)\right]\). It is easy to see that \(\mathbb{KL }\left(g^{\dagger };g\right)\ge 0\) for all \(g^{\dagger }\) and \(g\). Then (8) holds true with \(C_{\mathrm{err}}= 1\) and$$\begin{aligned} \mathbf{err}(g) = {\left\{ \begin{array}{ll} \Big |\sum \nolimits _{j=1}^J \ln \left(g_j\right) \left(g^{\mathrm{obs}}_j-g^{\dagger }_j\right)\Big |,&g\ge 0,\ \{j:g_j=0,\ g^{\dagger }_j+g^{\mathrm{obs}}_j>0\} = \emptyset \\ \infty ,&\text{ else}. \end{array}\right.} \end{aligned}$$Obviously, it will be necessary to show that \(\mathbf{err}\left(g\right)\) is finite and even small in some sense for all \(g\) for which the inequalities (8) are applied (see Sect. 6).
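The identities behind the Poisson example can be checked numerically. The following sketch (discrete made-up data, strictly positive trial intensities) verifies that \(\mathbb{KL}\ge 0\) and that \(\big |\mathcal{S }\left(g^{\mathrm{obs}};g\right)-\mathfrak s (g^{\dagger })-\mathbb{KL }\left(g^{\dagger };g\right)\big | = \mathbf{err}(g)\), which gives (8) with \(C_{\mathrm{err}}=1\):

```python
import numpy as np

rng = np.random.default_rng(2)
g_dagger = rng.uniform(1.0, 5.0, size=6)        # exact intensities
g_obs = rng.poisson(g_dagger).astype(float)     # Poisson count data
g = rng.uniform(0.5, 6.0, size=6)               # trial intensities, g > 0

# Negative Poisson log-likelihood (up to a g-independent constant), as in (2):
S = np.sum(g - g_obs * np.log(g))
# Kullback-Leibler divergence T = KL(g_dagger; g):
KL = np.sum(g_dagger * np.log(g_dagger / g) - g_dagger + g)
# Additive constant s(g_dagger):
s = np.sum(g_dagger - g_dagger * np.log(g_dagger))
# err(g) from the example above (finite here since g > 0):
err = abs(np.sum(np.log(g) * (g_obs - g_dagger)))
```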

To simplify our notation we will assume in the following analysis that \(\mathfrak s \equiv 0\) or equivalently replace \(\mathcal{S }\left(g^{\mathrm{obs}};g\right)\) by \(\mathcal{S }\left(g^{\mathrm{obs}};g\right)- \mathfrak s (g^{\dagger })\). As already mentioned in the motivation of Assumption 2, it is not relevant that \(\mathfrak s (g^{\dagger })\) is unknown since the value of this additive constant does not influence the iterates \(u_n\) in (4a).

Typically \(\mathcal S \) and \(\mathcal T \) will be convex in their second arguments, but we do not need this property in our analysis. However, without convexity it is not clear if the numerical solution of (4a) is easier than the numerical solution of (6).

**Assumption 3**

(*Existence*) For any \(n \in \mathbb N \) the problem (4a) has a solution.

*Remark 2.2*

- 1.
\(\mathfrak B \) is sequentially closed w.r.t. \(\tau _{\mathcal{X }}\),

- 2.
\(F^{\prime }\left(u;\cdot \right)\) is sequentially continuous w.r.t. \(\tau _{\mathcal{X }}\) and \(\tau _{\mathcal{Y }}\) for all \(u\in \mathfrak B \),

- 3.
the penalty functional \(\mathcal R : \mathcal{X }\rightarrow \left(-\infty , \infty \right]\) is sequentially lower semi-continuous with respect to \(\tau _{\mathcal{X }}\),

- 4.
the sets \(M_\mathcal{R }\left(c\right) := \left\{ u \in \mathcal{X }~\big |~\mathcal R \left(u\right) \le c\right\} \) are sequentially pre-compact with respect to \(\tau _{\mathcal{X }}\) for all \(c \in \mathbb R \) and

- 5.
the data misfit term \(\mathcal{S }\left(g^{\mathrm{obs}};\cdot \right) : \mathcal{Y }\rightarrow \left(-\infty ,\infty \right]\) is sequentially lower semi-continuous w.r.t. \(\tau _{\mathcal{Y }}\), and \(\inf _{u \in \mathfrak B } \mathcal{S }\left(g^{\mathrm{obs}};F \left(u_n\right)+F^{\prime }\left(u_n;u-u_n\right)\right) > - \infty \).

All known convergence rate results for nonlinear ill-posed problems under weak source conditions assume some condition restricting the degree of nonlinearity of the operator \(F\). Here we use a generalization of the tangential cone condition which was introduced in [21] and is frequently used for the analysis of regularization methods for nonlinear inverse problems. It must be said, however, that for many problems it is very difficult to show that this condition is satisfied (or not satisfied). Since \(\mathcal{S }\) does not necessarily fulfill a triangle inequality we have to use a generalized formulation of the tangential cone condition, which follows from the standard formulation if \(\mathcal{S }\) is given by the power of a norm (cf. Lemma 5.2).

**Assumption 4**

(*Generalized tangential cone condition*)

- (A)There exist constants \(\eta \) (later assumed to be sufficiently small) and \(C_{\mathrm{tc}}\ge 1\) such that for all \(g^{\mathrm{obs}}\in \mathcal{Y }^{\mathrm{obs}}\)$$\begin{aligned}&\frac{1}{C_{\mathrm{tc}}} \mathcal{S }\left(g^{\mathrm{obs}};F\left(v\right)\right) - \eta \mathcal{S }\left(g^{\mathrm{obs}};F\left(u\right)\right) \nonumber \\&\quad \le \mathcal{S }\left(g^{\mathrm{obs}};F \left(u\right) + F^{\prime }\left(u;v-u\right)\right)\nonumber \\&\quad \le C_{\mathrm{tc}}\mathcal{S }\left(g^{\mathrm{obs}};F\left(v\right)\right) + \eta \mathcal{S }\left(g^{\mathrm{obs}};F\left(u\right)\right) \quad \quad \text{ for} \text{ all} \ u,v\in \mathfrak B . \end{aligned}$$(9a)
- (B)There exist constants \(\eta \) (later assumed to be sufficiently small) and \(C_{\mathrm{tc}}\ge 1\) such that$$\begin{aligned}&\frac{1}{C_{\mathrm{tc}}} \mathcal{T }\left(g^{\dagger };F\left(v\right)\right) - \eta \mathcal{T }\left(g^{\dagger };F\left(u\right)\right) \nonumber \\&\quad \le \mathcal{T }\left(g^{\dagger };F \left(u\right) + F^{\prime }\left(u;v-u\right)\right)\nonumber \\&\quad \le C_{\mathrm{tc}}\mathcal{T }\left(g^{\dagger };F\left(v\right)\right)+ \eta \mathcal{T }\left(g^{\dagger };F\left(u\right)\right) \quad \quad \text{ for} \text{ all} \ u,v\in \mathfrak B . \end{aligned}$$(9b)

This condition ensures that the nonlinearity of \(F\) fits together with the data misfit functionals \(\mathcal{S }\) or \(\mathcal{T }\). Obviously, it is fulfilled with \(\eta = 0\) and \(C_{\mathrm{tc}}= 1\) if \(F\) is linear.
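The last remark is easy to verify numerically: for linear \(F\) the linearization is exact, so both sides of (9a) agree. A sketch with a random matrix as (hypothetical) linear forward operator and the norm-square misfit:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 4))                    # linear forward operator F(u) = A u
F = lambda u: A @ u
g_obs = rng.normal(size=5)
S = lambda g: np.linalg.norm(g - g_obs) ** 2   # norm-square data misfit

u, v = rng.normal(size=4), rng.normal(size=4)
# For linear F:  F(u) + F'(u; v - u) = A u + A (v - u) = A v = F(v),
# so both inequalities in (9a) hold with eta = 0, C_tc = 1 (in fact with equality).
lhs = S(F(u) + A @ (v - u))
rhs = S(F(v))
```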

Here \(\varphi \) in the source condition (10) is an *index function*, i.e. \(\varphi \) is continuous and monotonically increasing with \(\varphi (0)=0\). Such general source conditions were systematically studied in [23, 36]. The most common choices of \(\varphi \) are discussed in Sect. 5.

Now we can formulate the following variational formulation of the source condition (10), which is a slight variation of the one proposed in [30]:

**Assumption 5A**

(*Multiplicative variational source condition*) There exists \(u^*\in \partial \mathcal{R } \left(u^\dagger \right) \subset \mathcal{X }^{\prime }\), \(\beta \ge 0\) and a concave index function \(\varphi : \left(0,\infty \right) \rightarrow \left(0,\infty \right)\) such that

In many recent publications [11, 16, 25, 42] variational source conditions in additive rather than multiplicative form have been used. Such conditions will be discussed in Sect. 4.

Since we use a source condition with a general index function \(\varphi \), we need to restrict the nonlinearity of \(F\) with the help of a tangential cone condition. Nevertheless, we want to mention that for \(\varphi \left(t\right) = t^{1/2}\) in (12) our convergence analysis also works under a generalized Lipschitz assumption, but this lies beyond the aims of this paper. The cases \(\varphi \left(t\right) = t^\nu \) with \(\nu > \frac{1}{2}\) where similar results are expected are not covered by Assumption 5, since for the motivation in the Hilbert space setup we needed to assume that \(\left(\varphi ^2\right)^{-1}\) is convex, which is not the case for \(\nu > \frac{1}{2}\).

**Theorem 2.3**

## 3 Proof of Theorem 2.3

**Lemma 3.1**

**recursive error estimate** of the form

*Proof*

- In the case of 4B we use (8), which yields$$\begin{aligned}&\alpha _n d_{n+1}^2 + \frac{1}{C_{\mathrm{err}}}\mathcal{T }\left(g^\dagger ;F\left(u_n\right) + F^{\prime }\left(u_n;u_{n+1}- u_n\right)\right) \\&\quad \le C_{\mathrm{err}}\mathcal{T }\left(g^\dagger ;F\left(u_n\right)+ F^{\prime }\left(u_n;u^\dagger -u_n\right)\right)+ \alpha _n\beta d_{n+1}\varphi \left(\frac{s_{n+1}}{d_{n+1}^2}\right)+\mathbf{err}_{n} \end{aligned}$$and (9b) with \(v = u^\dagger \), \(u = u_n\) leads to$$\begin{aligned}&\alpha _n d_{n+1}^2 + \frac{1}{C_{\mathrm{err}}}\mathcal{T }\left(g^\dagger ;F\left(u_n\right) + F^{\prime }\left(u_n;u_{n+1}- u_n\right)\right)\\&\quad \le \eta C_{\mathrm{err}}s_n + \alpha _n\beta d_{n+1}\varphi \left(\frac{s_{n+1}}{d_{n+1}^2}\right)+\mathbf{err}_{n}. \end{aligned}$$By (9b) with \(v = u_{n+1}\), \(u = u_n\) we obtain (22a).
- In the case of 4A we are able to apply (9a) with \(v = u^\dagger \), \(u = u_n\) and (9a) with \(v = u_{n+1}\) and \(u = u_n\) to (25) to conclude$$\begin{aligned}&\alpha _n d_{n+1}^2 + \frac{1}{C_{\mathrm{tc}}}\mathcal{S }\left(g^{\mathrm{obs}};F\left(u_{n+1}\right)\right) \\&\quad \le 2 \eta \mathcal{S }\left(g^{\mathrm{obs}};F\left(u_n\right)\right)+ C_{\mathrm{tc}}\mathcal{S }\left(g^{\mathrm{obs}};F\left(u^\dagger \right)\right) + \alpha _n \beta d_{n+1}\varphi \left(\frac{s_{n+1}}{d_{n+1}^2}\right). \end{aligned}$$Due to (8) and Assumption 2.2 this yields (22b). \(\square \)

Before we deduce the convergence rates from the recursive error estimates (22) respectively, we note some inequalities for the index functions defined in (15) and their inverses:

*Remark 3.2*

- 1.We have$$\begin{aligned} \varphi \left(\vartheta ^{-1}\left(Ct\right)\right)&\le \max \left\{ \sqrt{C},1\right\} \varphi \left(\vartheta ^{-1}\left(t\right)\right)\end{aligned}$$(26)and$$\begin{aligned} \varphi ^2 \left(\varTheta ^{-1}\left(Ct\right)\right)&\le \max \left\{ \sqrt{C},1\right\} \varphi ^2\left(\varTheta ^{-1}\left(t\right)\right) \end{aligned}$$(27)for all \(t \ge 0\) and \(C>0\) if defined, where each inequality follows from two applications of the monotonicity assumption (13) (see [30, Remark 2]).
- 2.Since \(\varphi \) is concave, we have$$\begin{aligned} \varphi \left(\lambda t\right) \le \lambda \varphi \left(t\right) \quad \quad \text{ for} \text{ all} \ t \text{ sufficiently} \text{ small} \text{ and} \ \lambda \ge 1 \end{aligned}$$(28)
- 3.Equation (28) implies the following inequality for all \(t\) sufficiently small and \(\lambda \ge 1\):$$\begin{aligned} \varTheta \left(\lambda t\right) \le \lambda ^3 \varTheta \left(t\right) \end{aligned}$$(29)
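Inequalities (28) and (29) can be spot-checked for a concrete concave index function, e.g. \(\varphi(t)=\sqrt{t}\). The definition of \(\varTheta\) from (15) is not reproduced in this excerpt; the sketch assumes \(\varTheta(t)=t\,\varphi(t)^2\):

```python
import math

phi = math.sqrt                          # a concave index function with phi(0) = 0
Theta = lambda t: t * phi(t) ** 2        # assumed form of Theta from (15)

ok = True
for t in [1e-6, 1e-3, 0.1, 0.5]:
    for lam in [1.0, 2.0, 10.0, 100.0]:
        ok &= phi(lam * t) <= lam * phi(t) + 1e-15           # inequality (28)
        ok &= Theta(lam * t) <= lam ** 3 * Theta(t) + 1e-15  # inequality (29)
```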

The following induction proof follows along the lines of a similar argument in the proof of [30, Theorem 1]:

**Lemma 3.3**

*Proof*

*Case 1*: \(\alpha _n \beta d_{n+1} \varphi \left(\frac{s_{n+1}}{d_{n+1}^2}\right) \le C_{\eta , \tau } \varTheta \left(\alpha _n\right)\).

*Case 2*: \(\alpha _n \beta d_{n+1} \varphi \left(\frac{s_{n+1}}{d_{n+1}^2}\right) > C_{\eta , \tau } \varTheta \left(\alpha _n\right)\).

Therefore, we have proven that (30) and (31) hold for all \(n \le n_*\) (or in case of exact data for all \(n \in \mathbb N \)).

## 4 A Lepskiĭ-type stopping rule and additive source conditions

In this section we will present a convergence rates result under the following variational source condition in additive form:

**Assumption 5B**

A special case of condition (35), motivated by the *benchmark condition* \(u^* = F^{\prime }\left[u^\dagger \right]^* \omega \), was first introduced in [24] to prove convergence rates of Tikhonov-type regularization in Banach spaces (see also [42]). Flemming [16] uses such conditions to prove convergence rates for nonlinear Tikhonov regularization (6) with general \(\mathcal{S }\) and \(\mathcal R \). Bot & Hofmann [11] prove convergence rates for general \(\varphi \) and introduce the use of Young's inequality, which we will apply in the following. Finally, Hofmann & Yamamoto [25] prove equivalence in the Hilbert space case for \(\varphi \left(t\right) = \sqrt{t}\) in (10) and (35) (with different \(\varphi \), cf. [25, Prop. 4.4]) and almost equivalence for \(\varphi \left(t\right) = t^\nu \) with \(\nu < \frac{1}{2}\) in (10) (again with different \(\varphi \) in (35), cf. [25, Prop. 6.6 and Prop. 6.8]) under a suitable nonlinearity condition. Recent research shows that classical Hilbert space source conditions (10), which have natural interpretations in a number of important examples, relate to (35) in such a way that one obtains order optimal rates (see [17]). Nevertheless, this can be seen much more easily for multiplicative variational source conditions [see (14)].

The additive structure of the variational inequality will facilitate our proof and the result will give us the possibility to apply a Lepskiĭ-type stopping rule. We remark that for \(\mathfrak s \ne 0\) in Assumption 2 it is not clear how to formulate an implementable discrepancy principle.
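A generic form of the Lepskiĭ balancing principle can be sketched as follows. This is one common formulation with increasing noise-error bounds, not necessarily identical to the rule (42b) used in the paper; the constants and indexing are illustrative only:

```python
import numpy as np

def lepskii_balance(estimates, noise_bounds, c=4.0):
    """Generic Lepskii balancing: among estimates u_0, ..., u_N with increasing
    propagated-noise bounds psi(0) <= ... <= psi(N), pick the largest n such that
    ||u_n - u_m|| <= c * psi(m)  for all m < n."""
    n_bal = 0
    for n in range(len(estimates)):
        if all(np.linalg.norm(estimates[n] - estimates[m]) <= c * noise_bounds[m]
               for m in range(n)):
            n_bal = n
    return n_bal
```

On a toy sequence where the last iterate has clearly left the noise tube, the rule stops just before the divergence: `lepskii_balance([0.0, 0.1, 0.15, 5.0], [0.05, 0.1, 0.2, 0.4])` selects index 2.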

**Lemma 4.1**

*approximation error* bound \(\varPhi _\mathrm{app}(n)\), a *propagated data noise error* bound \(\varPhi _\mathrm{noi}(n)\) and a *nonlinearity error* bound \(\varPhi _\mathrm{nl}(n)\),

*Proof*

\(\square \)

**Theorem 4.2**

- 1.
*Exact data:* Then the iterates \((u_n)\) defined by (4) with exact data \(g^{\mathrm{obs}}= g^{\dagger }\) fulfill$$\begin{aligned} \mathcal D ^{u^*}_\mathcal{R } \left(u_n,u^\dagger \right) = \mathcal O \left(\varLambda \left(\alpha _n\right)\right), \quad \quad n \rightarrow \infty . \end{aligned}$$(43)
- 2.
*A priori stopping rule:* For noisy data and the stopping rule$$\begin{aligned} n_* := \min \left\{ n \in \mathbb N ~\big |~ \varPsi \left(\alpha _n\right) \le \mathbf{err}\right\} \end{aligned}$$with \(\varPsi \) defined in (36b) we obtain the convergence rate$$\begin{aligned} \mathcal D ^{u^*}_\mathcal{R } \left(u_{n_*},u^\dagger \right) = \mathcal O \left(\varLambda \left(\varPsi ^{-1} \left(\mathbf{err}\right)\right)\right)\!, \quad \quad \mathbf{err}\rightarrow 0. \end{aligned}$$(44)
- 3.
*Lepskiĭ-type stopping rule:* Assume that (41) holds true for some \(q>1\). Then the Lepskiĭ balancing principle (42b) with \(c =C_{\mathrm{bd}}^{\frac{1}{q}}4 \left(1+ \gamma _\mathrm{nl}\right)\) leads to the convergence rate$$\begin{aligned} \left\Vert\, u_{n_\mathrm{bal}}- u^\dagger \right\Vert_{\mathcal{X }}^q = \mathcal O \left(\varLambda \left(\varPsi ^{-1} \left(\mathbf{err}\right)\right)\right)\!, \quad \quad \mathbf{err}\rightarrow 0. \end{aligned}$$

*Proof*

## 5 Relation to previous results

We note the explicit form of the abstract error estimates (19) for these classes of source conditions as a corollary:

**Corollary 5.1**

- 1.If \(\varphi \) in (12) is of the form (45a) and \(n_* := \min \left\{ n \in \mathbb N ~\big |~ \alpha _n \le \tau \mathbf{err}_n^{\frac{1}{1+2\nu }}\right\} \) with \(\tau \ge 1\) sufficiently large, then$$\begin{aligned} \mathcal D ^{u^*}_\mathcal{R } \left(u_{n_*},u^\dagger \right) = \mathcal O \left(\mathbf{err}_{n_*}^{\frac{2\nu }{1+2\nu }}\right). \end{aligned}$$(46a)
- 2.If \(\varphi =\bar{\varphi }_p\), \(\bar{n}_* := \min \left\{ n \in \mathbb N ~\big |~ \alpha _n^2 \le \tau \mathbf{err}_n\right\} \) and \(\tau \ge 1\) sufficiently large, then$$\begin{aligned} \mathcal D ^{u^*}_\mathcal{R } \left(u_{\bar{n}_*},u^\dagger \right)&= \mathcal O \left(\bar{\varphi }_{2p}\left(\mathbf{err}_{\bar{n}_*}\right)\right). \end{aligned}$$(47a)

*Proof*

In the case of Hölder source conditions we already remarked that the conditions in Assumption 5A are satisfied for \(\nu \in (0,1/2]\), and we have \(\varTheta \left(t\right) = t^{1+2\nu }\), \(\varTheta ^{-1}(\xi ) = \xi ^{1/(1+2\nu )}\).

\(\square \)

It remains to discuss the relation of Assumption 4 to the standard tangential cone condition:

**Lemma 5.2**

*Proof*

## 6 Convergence analysis for Poisson data

In this section we discuss the application of our results to inverse problems with Poisson data. We first describe a natural continuous setting involving Poisson processes (see e.g. [1]). The relation to the finite dimensional setting discussed in the introduction is described at the end of this section.

- 1.
For all measurable subsets \(\mathbb{M }^{\prime }\subset \mathbb{M }\) the number \(G(\mathbb{M }^{\prime })=\#\{n:x_n\in \mathbb{M }^{\prime }\}\) is Poisson distributed with mean \(\int _{\mathbb{M }^{\prime }}g^{\dagger }\,\mathrm{d}x\).

- 2.
For disjoint measurable subsets \(\mathbb{M }_1^{\prime },\dots ,\mathbb{M }_m^{\prime }\subset \mathbb{M }\) the random variables \(G(\mathbb{M }^{\prime }_1),\dots , G(\mathbb{M }^{\prime }_m)\) are stochastically independent.

**Assumption** \(\mathcal P \) With the notation of Assumption 1 assume that

- 1.\(\mathbb{M }\) is a compact submanifold of \(\mathbb R ^d\), \(\mathcal{Y }:=L^1(\mathbb{M })\cap C(\mathbb{M })\) with norm \(\Vert g\Vert _\mathcal{Y }:=\Vert g\Vert _{L^1}+\Vert g\Vert _{\infty }\) and$$\begin{aligned} F(u)\ge 0 \quad \quad \text{ for} \text{ all} \ u\in \mathfrak B . \end{aligned}$$
- 2.For a subset \(\tilde{\mathcal{Y }}\subset \mathcal{Y }\) specified later there exist constants \(\rho _0,t_0> 0\) and a strictly monotonically decreasing function \(\zeta : (\rho _0,\infty )\rightarrow [0,1]\) fulfilling \(\lim _{\rho \rightarrow \infty } \zeta (\rho ) =0\) such that the concentration inequality$$\begin{aligned} \mathbf{P }\left(\sup _{g\in \tilde{\mathcal{Y }}} \left|\int \limits _{\mathbb{M }} \ln (g) \left(\mathrm{d}G_t- g^{\dagger }\,\mathrm{d}x\right)\right| \ge \frac{\rho }{\sqrt{t}}\right) \le \zeta (\rho ) \end{aligned}$$(54)holds for all \(\rho >\rho _0\) and all \(t>t_0\).

**Theorem 6.1**

([46, Thm. 2.1]) Let \(\mathbb{M }\subset \mathbb{R }^d\) be a bounded domain with Lipschitz boundary and suppose \(s>\frac{d}{2}\). For \(R \ge 1\) consider the ball \(B_s \left(R\right) := \left\{ g \in H^s \left(\mathbb{M }\right) ~\big |~ \left\Vert\,g\right\Vert_{H^s \left(\mathbb{M }\right)} \le R\right\} \). Then there exists a constant \(C_\mathrm{conc} >0\) depending only on \(\mathbb{M }, s\) and \(\left\Vert\,g^\dagger \right\Vert_{\mathbf{L }^1 \left(\mathbb{M }\right)}\) such that (54) holds true with \(\tilde{\mathcal{Y }} = B_s \left(R\right)\), \(\zeta (\rho ) = \exp \left(-\frac{\rho }{R C_\mathrm{conc}}\right)\), \(\rho _0 = RC_\mathrm{conc}\) and \(t_0 = 1\).

*Remark 6.2*

(*Assumptions* 5A *and* 5B *(source conditions)*) Using the inequality

**Corollary 6.3**

- Assumptions 4A and \(\mathcal P \) hold true with \(\mathcal S \) and \(\mathcal T \) given by (50) and (52) and \( \tilde{\mathcal{Y }} = F(\mathfrak B )\).

- Assumptions 4B and \(\mathcal P \) hold true with \(\mathcal T \) and \(\mathcal S \) given by (55) and (56) and$$\begin{aligned} \tilde{\mathcal{Y }}&:= \{F(u)+\sigma : u\in \mathfrak B \}\\&\cup \,\left\{ F(u)+F^{\prime }(u;v-u)+\sigma : u,v\in \mathfrak B , F(u)+F^{\prime }(u;v-u)\ge -\frac{\sigma }{2}\right\} . \end{aligned}$$

*Proof*

If \(\zeta \left(\rho \right) = \exp \left(-c \rho \right)\) for some \(c > 0\) as discussed above, then our convergence rates result (59) means that we have to pay a logarithmic factor for adaptation to unknown smoothness by the Lepskiĭ principle. It is known (see [44]) that in some cases such a logarithmic factor is inevitable.

**Binning** Let us discuss the relation between the discrete data model discussed in the introduction and the continuous model above. Consider a decomposition of the measurement manifold \(\mathbb{M }\) into \(J\) measurable disjoint subdomains (bins) of positive measure \(|\mathbb{M }_j|>0\):
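The binning construction can be illustrated numerically: points of a Poisson process with intensity \(t\,g^{\dagger }\,\mathrm{d}x\) on \(\mathbb{M }=[0,1]\) are grouped into \(J\) bins, and the resulting counts are independent Poisson variables with means \(t\int _{\mathbb{M }_j}g^{\dagger }\,\mathrm{d}x\). The intensity below is a made-up example:

```python
import numpy as np

rng = np.random.default_rng(4)
t = 1000.0                                         # exposure time / scaling
g_dagger = lambda x: 2.0 + np.sin(2 * np.pi * x)   # hypothetical intensity on [0, 1]

# Simulate the Poisson process with intensity t * g_dagger(x) dx by thinning
# a homogeneous process with rate t * lam_max (lam_max >= max g_dagger):
lam_max = 3.0
N = rng.poisson(t * lam_max)
x = rng.uniform(0.0, 1.0, size=N)
points = x[rng.uniform(0.0, lam_max, size=N) < g_dagger(x)]

# Binning: counts over J disjoint subdomains (bins) of equal length
J = 10
counts, edges = np.histogram(points, bins=J, range=(0.0, 1.0))
```

The total count is Poisson with mean \(t\int _0^1 g^{\dagger }\,\mathrm{d}x = 2000\) here, so `counts.sum()` fluctuates around 2000.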

## 7 Applications and computed examples

**Solution of the convex subproblems** We first describe a simple strategy to minimize the convex functional (4a) with \(\mathcal{S }\) as defined in (56) in each Newton step. For the moment we neglect the side condition \(g\ge -\sigma /2\) in (56). For simplicity we further assume that \(\mathcal R \) is quadratic, e.g. \(\mathcal R (u) = \Vert u-u_0\Vert ^2\). We approximate \(\mathcal{S }\left(g^{\mathrm{obs}};g+h\right)\) by the second order Taylor expansion

In the examples below we observed fast convergence of the inner iteration (65). In the phase retrieval problem, however, the CG iteration converged poorly when \(\alpha _n\) became too small. If the offset parameter \(\sigma \) becomes too small or if \(\sigma =0\), convergence deteriorates in general. This is not surprising, since the iteration (65) cannot be expected to converge to the exact solution \(u_{n+1}\) of (4a) if the side condition \(F(u_n)+F^{\prime }(u_n;u_{n+1}-u_n)\ge -\sigma /2\) is active at \(u_{n+1}\). The design of efficient algorithms for this case will be addressed in future research.
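To make the inner iteration concrete, the following sketch performs one Newton step for a plausible discrete form of the shifted KL misfit, \(\mathcal{S }(g^{\mathrm{obs}};g)=\sum _j [g_j - g^{\mathrm{obs}}_j\ln (g_j+\sigma )]\) (assumed here; (56) is not reproduced in this excerpt). Its second-order Taylor expansion yields a diagonally weighted least-squares problem, solved by a plain conjugate gradient method; all data, the Jacobian, and the parameters are hypothetical, and the side condition \(g\ge -\sigma /2\) is neglected as in the text:

```python
import numpy as np

def cg_solve(A, b, x0, tol=1e-10, maxiter=200):
    """Plain conjugate gradient for a symmetric positive definite matrix A."""
    x, r = x0.copy(), b - A @ x0
    p, rs = r.copy(), r @ r
    for _ in range(maxiter):
        Ap = A @ p
        a = rs / (p @ Ap)
        x += a * p
        r -= a * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Quadratic model of S(g) = sum(g - g_obs * log(g + sigma)) around g_n = F(u_n):
# gradient d_j = 1 - g_obs_j / (g_n_j + sigma),
# Hessian  w_j = g_obs_j / (g_n_j + sigma)^2  (diagonal).
# One Newton step minimizes the resulting weighted least-squares functional
# plus the quadratic penalty alpha * ||u_n + h - u0||^2 over the update h.
rng = np.random.default_rng(5)
J_op = rng.normal(size=(20, 6))            # hypothetical Jacobian F'[u_n]
g_n = rng.uniform(1, 5, size=20)           # hypothetical F(u_n) > 0
g_obs = rng.poisson(g_n).astype(float)     # Poisson data
sigma, alpha = 0.1, 1e-2
u_n, u0 = rng.normal(size=6), np.zeros(6)

d = 1 - g_obs / (g_n + sigma)
w = g_obs / (g_n + sigma) ** 2
A = J_op.T @ (w[:, None] * J_op) + alpha * np.eye(6)
b = -J_op.T @ d - alpha * (u_n - u0)
h = cg_solve(A, b, np.zeros(6))
u_next = u_n + h
```

Since the weighted normal matrix is symmetric positive definite (the penalty contributes \(\alpha I\)), CG converges in at most six steps here in exact arithmetic.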

**An inverse obstacle scattering problem without phase information** The scattering of polarized, transverse magnetic (TM) time harmonic electromagnetic waves by a perfect cylindrical conductor with smooth cross section \(D\subset \mathbb R ^2\) is described by the equations

*far field pattern* or *scattering amplitude* of \(u_s\).

\(L^2\)-error statistics for the inverse obstacle scattering problem (68)

| \(t\) | \(\mathcal{S }\left(g^\mathrm{obs};g\right)\) | \(N\) | \(\sqrt{\mathbf{E }\Vert q_N-q^{\dagger }\Vert _{L^2}^2}\) | \(\sqrt{\mathbf{Var }\Vert q_N-q^{\dagger }\Vert _{L^2}}\) |
|---|---|---|---|---|
| \(100\) | \(\Vert g-g^\mathrm{obs}\Vert _{L^2}^2\) | 7 | 0.124 | 0.033 |
| | \(\phi _c^2 \left(g^{\mathrm{obs}};g\right)\) | 2 | 0.122 | 0.018 |
| | \(\mathcal S \) in Eq. (56) | 3 | 0.091 | 0.025 |
| \(1{,}000\) | \(\Vert g-g^\mathrm{obs}\Vert _{L^2}^2\) | 9 | 0.106 | 0.014 |
| | \(\phi _c^2 \left(g^{\mathrm{obs}};g\right)\) | 7 | 0.091 | 0.012 |
| | \(\mathcal S \) in Eq. (56) | 5 | 0.070 | 0.017 |
| \(10{,}000\) | \(\Vert g-g^\mathrm{obs}\Vert _{L^2}^2\) | 9 | 0.105 | 0.004 |
| | \(\phi _c^2 \left(g^{\mathrm{obs}};g\right)\) | 23 | 0.076 | 0.048 |
| | \(\mathcal S \) in Eq. (56) | 5 | 0.050 | 0.005 |

**A phase retrieval problem** A well-known class of inverse problems with numerous applications in optics consists in reconstructing a function \(f:\mathbb R ^d\rightarrow \mathbb C \) from the modulus of its Fourier transform \(|\mathcal F f|\) and additional a priori information, or equivalently to reconstruct the phase \(\mathcal F f/|\mathcal F f|\) of \(\mathcal F f\) (see Hurt [28]).

The problem above occurs in optical imaging: If \(f(x^{\prime })=\exp (i \varphi (x^{\prime })) =u(x^{\prime },0)\) (\(x^{\prime }=(x_1,x_2)\)) denotes the values of a cartesian component \(u\) of an electric field in the plane \(\{x\in \mathbb R ^3:x_3=0\}\) and \(u\) solves the Helmholtz equation \(\Delta u +k^2u=0\) and a radiation condition in the half-space \(\{x\in \mathbb R ^3:x_3>0\}\), then the intensity \(g(x^{\prime }) = |u(x^{\prime },\Delta )|^2\) of the electric field at a measurement plane \(\{x\in \mathbb R ^3: x_3=\Delta \}\) in the limit \(\Delta \rightarrow \infty \) in the *Fraunhofer approximation* is given by \(|\mathcal F _2f|^2\) up to rescaling (see e.g. Paganin [38, Sec. 1.5]). If \(f\) is generated by a plane incident wave in \(x_3\) direction passing through a non-absorbing, weakly scattering object of interest in the half-space \(\{x_3<0\}\) close to the plane \(\{x_3=0\}\) and if the wave length is small compared to the length scale of the object, then the *projection approximation* \(\varphi (x^{\prime })\approx \frac{k}{2}\int _{-\infty }^0 (n^2(x^{\prime },x_3)-1)\,\mathrm{d}x_3\) is valid, where \(n\) describes the refractive index of the object of interest (see e.g. [38, Sec. 2.1]). A priori information on \(\varphi \) concerning a jump at the boundary of its support can be obtained by placing a known transparent object before or behind the object of interest.
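A minimal discrete sketch of the Fraunhofer forward map \(f\mapsto |\mathcal F _2 f|^2\) (with a made-up smooth phase \(\varphi \)) also illustrates one source of non-uniqueness: the measured intensity is invariant under a global phase factor.

```python
import numpy as np

n = 64
x = np.linspace(-1, 1, n)
X, Y = np.meshgrid(x, x)
phi = np.exp(-(X**2 + Y**2) / 0.1)        # hypothetical smooth phase profile
f = np.exp(1j * phi)                      # non-absorbing phase object: |f| = 1

g = np.abs(np.fft.fft2(f)) ** 2           # measured intensity |F_2 f|^2

# Multiplying f by a global phase factor leaves the intensity unchanged,
# so the data cannot distinguish f from exp(i*c) * f:
g_shifted = np.abs(np.fft.fft2(np.exp(1j * 0.7) * f)) ** 2
```

By Parseval's theorem (with the unnormalized FFT convention) the total intensity equals \(n^2\sum |f|^2 = n^4\) here, independent of the phase \(\varphi\).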

Nevertheless, comparing subplots (c) and (e) in Fig. 3, the median KL-reconstruction (e) seems preferable (although more noisy) since the contours are sharper and details in the interior of the cells are more clearly separated.

## Acknowledgments

We would like to thank Tim Salditt and Klaus Giewekemeyer for helpful discussions and data concerning the phase retrieval problem, Patricia Reynaud-Bouret for fruitful discussions on concentration inequalities, and two anonymous referees for their suggestions, which helped to improve the paper considerably. Financial support by the German Research Foundation DFG through SFB 755, the Research Training Group 1023 and the Federal Ministry of Education and Research (BMBF) through the project INVERS is gratefully acknowledged.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.