Big in Japan: Regularizing Networks for Solving Inverse Problems

Deep learning and (deep) neural networks are emerging tools for addressing inverse problems and image reconstruction tasks. Despite outstanding performance, the mathematical analysis of solving inverse problems by neural networks is mostly missing. In this paper, we introduce and rigorously analyze families of deep regularizing neural networks (RegNets) of the form B_α + N_{θ(α)} B_α, where B_α is a classical regularization and the network N_{θ(α)} B_α is trained to recover the missing part Id_X − B_α A not found by the classical regularization. We show that these regularizing networks yield a convergent regularization method for solving inverse problems. Additionally, we derive convergence rates (quantitative error estimates) assuming a sufficient decay of the associated distance function.
We demonstrate that our results recover existing convergence and convergence rate results for filter-based regularization methods, as well as for the recently introduced null-space networks, as special cases. Numerical results are presented for a tomographic sparse data problem, clearly demonstrating that the proposed RegNets improve classical regularization as well as the null-space network.


Introduction
This paper is concerned with solving inverse problems of the form

    A x = y^δ,    (1.1)

where A : X → Y is a bounded linear operator between Hilbert spaces X and Y, and the data distortion ξ := y^δ − A x satisfies ‖ξ‖ ≤ δ for some noise level δ ≥ 0. Many inverse problems arising in medical imaging, signal processing, astronomy, computer vision and other fields can be written in the form (1.1). A main characteristic property of inverse problems is that they are ill-posed [6,15]: the solution of (1.1) is either not unique or unstable with respect to perturbations of the right-hand side.
To solve such inverse problems one has to employ regularization methods, which serve the following two main purposes: they select particular solutions of the noise-free equation, thereby accounting for non-uniqueness in the case ker(A) ≠ {0}, and they stabilize the inversion with respect to perturbations of the data.
Our aim is to find convergent regularization methods for the solution of (1.1) using deep neural networks that can be adjusted to realistic training data.
In [16] we focused on the non-uniqueness issue, where particular solutions of the noise-free equation ((1.1) with ξ = 0) are approximated using classical regularization methods combined with null-space networks. Null-space networks (originally introduced in [12] in a finite-dimensional setting) are refined residual networks in which the residual is projected onto the null-space ker(A) of the operator A. In this context, the stabilization of finding a solution to (1.1) comes from a given traditional regularization method, and the role of the network is to select correct solutions in a data-consistent manner.

Proposed regularizing networks (RegNets)
In this paper we go one step further and generalize the concept of deep null-space learning by allowing the network to also act on the orthogonal complement ker(A)⊥ of the null-space of A in a controlled manner. This is particularly useful if the operator has many small singular values that are not strictly equal to zero. Like the component in the kernel, the corresponding signal components are difficult to reconstruct by a classical linear regularization method, and quantitative error estimates require strong smoothness assumptions on the objects to be recovered. Learning such almost invisible components can significantly improve reconstruction results for less smooth objects.
The proposed RegNets generalize the structure of the null-space networks analyzed in [16] and consist of a family

    R_α = (Id + Φ_α) ∘ B_α,    Φ_α = (Id − B_α A) U_α,    (1.2)

where (B_α)_{α>0} is a classical regularization of the Moore-Penrose inverse A⁺, and the U_α are neural networks. The networks Φ_α can be trained to map the part B_α A x recovered by the regularization method to the missing part (Id − B_α A) x.
In this paper we show that if the networks Φ_α converge, in a suitable sense, to a limiting network Φ₀ whose range is contained in ker(A), then the RegNets defined by (1.2) yield a convergent regularization method with admissible set M = (Id + Φ₀)(ran(A⁺)). Further, we derive convergence rates (quantitative error estimates) for elements satisfying conditions different from the classical smoothness assumptions.

Outline
The organization of this paper is as follows. In Section 2 we present some background and related results. In Section 3 we introduce the proposed regularizing networks and show that they yield a convergent regularization method. Further, we derive convergence rates under a modified source condition. In Section 4 we demonstrate that our results contain existing convergence results as special cases. This includes filter-based methods, classical Tikhonov regularization, and regularization by null-space networks. Moreover, we examine a data-driven extension of singular components, where the classical regularization method is given by truncated singular value decomposition (SVD). The paper concludes with a short summary presented in Section 5.

Some background
Before analyzing the RegNets, we recall basic notions and concepts from the regularization of inverse problems (see [15,6]) and the concept of null-space networks. We also review some related previous work.

Classical regularization of inverse problems
Regularization methods for stably solving (1.1) use a-priori information about the unknown, for example that the solution x lies in a particular set M of admissible elements. Classical regularization methods approximate the Moore-Penrose inverse A⁺, and the set M is given by M = ker(A)⊥. Note that for any y ∈ ran(A) ⊕ ran(A)⊥, the Moore-Penrose inverse A⁺ y is given by the minimal-norm solution of (1.1). A precise definition of a regularization method is as follows.
Definition 2.1. Let (B_α)_{α>0} be a family of continuous operators B_α : Y → X and let α = α(δ, y^δ) be a parameter choice; the pair is called a regularization method for A⁺ if B_{α(δ,y^δ)} y^δ → A⁺ y as δ → 0, uniformly over all data y^δ with ‖y^δ − y‖ ≤ δ (see [6,15] for the precise definition).

The parameter choice α, depending on the noise level as well as on the data, determines the level of approximation of the Moore-Penrose inverse. For decreasing noise level, the ill-posed problem (1.1) is approximated by stable problems that come closer to finding the minimum-norm solution of (1.1), and in the limit B_{α(δ,y^δ)} y^δ → A⁺ y. A great variety of regularization methods, namely filter-based regularization methods, can be defined by regularizing filters.
Any regularizing filter (g_α)_{α>0} defines a regularization method by taking B_α := g_α(A*A) A*. In classical Tikhonov regularization, the regularizing filter is g_α(λ) = 1/(λ + α), while in truncated SVD it is given by g_α(λ) = 1/λ for λ ≥ α and g_α(λ) = 0 otherwise; see Figure 2.2. For both of these methods the admissible set is M = ker(A)⊥. To overcome this particular choice of the admissible set, regularization methods can be adapted [16] to approximate a generalized inverse with a different set of admissible elements, which can be implicitly learned by a deep neural network.
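For a discretized problem in which A is given as a matrix, the two filters above can be sketched as follows. This is a minimal numerical sketch via the SVD; the function names are illustrative and not from the paper.

```python
import numpy as np

def tikhonov_filter(s, alpha):
    # Tikhonov filter acting on singular values: sigma / (sigma^2 + alpha).
    return s / (s**2 + alpha)

def tsvd_filter(s, alpha):
    # Truncated SVD: invert sigma if sigma^2 >= alpha, zero the component otherwise.
    safe = np.where(s > 0, s, 1.0)   # avoid division by zero for vanishing singular values
    return np.where(s**2 >= alpha, 1.0 / safe, 0.0)

def filter_reconstruction(A, y, alpha, filt):
    # B_alpha y = sum_n filt(sigma_n) <y, u_n> v_n, computed via the SVD of A.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ (filt(s, alpha) * (U.T @ y))
```

Both filters damp or discard components associated with small singular values; truncated SVD sets them exactly to zero, which is the situation the data-driven extension of Section 4 addresses.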
Other typical filter-based regularization methods are the Landweber iteration and iterative Tikhonov regularization [6].

Null-space networks
Standard regularization methods approximate the Moore-Penrose inverse and therefore select elements in ker(A)⊥. In [16] we introduced regularizing null-space networks, where the aim is to approximate elements in a set M of admissible elements different from ker(A)⊥.
Null-space networks are defined as follows.
Definition 2.3 (Null-space network). We call a function Id + P_{ker(A)} ∘ N a null-space network if N is any Lipschitz continuous neural network function; here P_{ker(A)} denotes the orthogonal projection onto ker(A).
Moreover, we use the following generalized notion of a regularization method.

Definition 2.4 (Regularization method with admissible set M). Let (R_α)_{α>0} be a family of continuous operators R_α : Y → X and let α : (0,∞) × Y → (0,∞) be a parameter choice. Then the pair ((R_α)_{α>0}, α) is called a regularization method (for the solution of A x = y) with admissible set M if R_{α(δ,y^δ)} y^δ → x as δ → 0, for every x ∈ M and all data y^δ with ‖y^δ − A x‖ ≤ δ.

The regularizing null-space networks analyzed in [16] take the form

    R_α = (Id + P_{ker(A)} ∘ N) ∘ B_α,    (2.2)

where (B_α)_{α>0} is any classical regularization method and Id + P_{ker(A)} ∘ N any null-space network (for example, defined by a trained deep neural network). In [16] we have shown that (2.2) yields a regularization method with admissible set M = (Id + P_{ker(A)} ∘ N)(ker(A)⊥). This approach is designed to find the null-space component of the solution in a data-driven manner with a fixed network P_{ker(A)} ∘ N, independent of the regularization parameter α, that acts in the null-space of A; compare Figure 2.3.
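The regularizing null-space network (2.2) can be sketched for a matrix A as follows; the "network" N is a stand-in function rather than a trained CNN, and all names are illustrative.

```python
import numpy as np

def null_space_projector(A):
    # P_ker(A) = Id - A^+ A is the orthogonal projection onto ker(A).
    return np.eye(A.shape[1]) - np.linalg.pinv(A) @ A

def regularizing_null_space_net(A, B_alpha, N):
    # R_alpha = (Id + P_ker(A) ∘ N) ∘ B_alpha: the learned correction is confined
    # to ker(A), so the data consistency of the classical reconstruction is preserved.
    P = null_space_projector(A)
    def R_alpha(y):
        x0 = B_alpha(y)            # classical regularized reconstruction
        return x0 + P @ N(x0)      # add learned null-space component
    return R_alpha
```

Since A P_{ker(A)} = 0, applying A to R_α(y) gives the same residual as for B_α(y): the network can only change components invisible to the data.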
In this paper we go one step further and consider a family of regularizing networks (RegNets) of the form R_α = (Id + Φ_α) ∘ B_α, generalizing (2.2). Here Φ_α depends on α and is allowed to act in the orthogonal complement ker(A)⊥ of the null-space of A. We give conditions under which this approach yields a regularization method with admissible set M.
Allowing the network Φ_α to also act in ker(A)⊥ is beneficial in particular if the forward operator A has many small singular values. In this case, the network can learn components that are not sufficiently well contained in the data. Note that in the limit α → 0, the regularization method (B_α)_{α>0} converges to A⁺ pointwise. Therefore, in the limit α → 0, the network is restricted to learning components in the null-space of A.
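This limiting behavior can be observed numerically: for Tikhonov regularization, the operator Id − B_α A converges to the projector onto ker(A) as α → 0, so a network of the form Φ_α = (Id − B_α A) U_α is asymptotically confined to the null-space. A toy sketch with an assumed matrix A:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])                 # ker(A) = span{(1, -1, 0)}
P_ker = np.eye(3) - np.linalg.pinv(A) @ A       # projector onto ker(A)

for alpha in [1e-2, 1e-4, 1e-6]:
    # Tikhonov regularization: B_alpha = (A^T A + alpha I)^{-1} A^T
    B = np.linalg.solve(A.T @ A + alpha * np.eye(3), A.T)
    residual_op = np.eye(3) - B @ A
    # The deviation from P_ker shrinks proportionally to alpha.
    print(alpha, np.linalg.norm(residual_op - P_ker))
```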

Related work
Recently, many works using deep neural networks to solve inverse problems have been published. These include two-stage approaches, where in a first step an initial reconstruction is computed, which is then improved by a deep neural network. Several network architectures, often based on the U-net architecture [14] and improvements of it [17,8], have been used for this class of methods.
CNN-based methods that only modify the part of the reconstruction contained in the null-space of the forward operator have been proposed in [13,12]. In [16] we introduced regularizing null-space networks, which are shown to lead to a convergent regularization method. Recently, a related synthesis approach for learning the invisible frame coefficients in limited-angle computed tomography has been proposed in [5].
Another possibility to improve reconstructions by deep learning is to replace certain operations in an iterative scheme by deep neural networks, or to use learned regularization functionals [10,7,11,1,2]. Further, a Bayesian framework has been proposed in [4,3], where the a-posteriori distribution of the solutions is learned by CNNs.

Convergence and convergence rates of RegNets
In this section, we formally introduce the concept of RegNets, analyze their regularization properties and derive convergence rates.
Throughout the following, let A : X → Y be a bounded linear operator and let Id + P_{ker(A)} ∘ N be a null-space network, see Definition 2.3. Further, let (B_α)_{α>0} denote a classical filter-based regularization method, defined by the regularizing filter (g_α)_{α>0}, see Definition 2.2.

Convergence
Let us first formally define a family of regularizing networks.
Among the defining properties, we require that the smallest Lipschitz constants of the networks Φ_α are uniformly bounded by some constant L ≥ 0.

Convergence rates
In this section we derive convergence rates for the RegNets introduced in Section 3.1. To that end, we first introduce a distance function and define the qualification of a classical regularization method. The definition of the distance function is essentially motivated by [9].
The qualification of a regularization method is a classical concept in regularization theory [6] and central for the derivation of convergence rates.
Proof. As in the proof of Theorem 3.2, we obtain the same error decomposition. Further, for all elements of the source set, the remaining α-dependent term can be estimated accordingly.

Classical filter-based regularization
Classical Tikhonov regularization is a special case of the regularization method defined in Theorem 3.2, with regularizing filter g_α(λ) = 1/(λ + α). In this case the distance function can be estimated explicitly.
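As a sanity check (a sketch with a random test matrix, not from the paper), the Tikhonov filter applied spectrally coincides with the closed form B_α = (A*A + α Id)⁻¹ A*:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
alpha = 0.1

# Spectral form: B_alpha y = sum_n sigma_n / (sigma_n^2 + alpha) <y, u_n> v_n
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_spectral = Vt.T @ ((s / (s**2 + alpha)) * (U.T @ y))

# Closed form: B_alpha y = (A^T A + alpha I)^{-1} A^T y
x_closed = np.linalg.solve(A.T @ A + alpha * np.eye(3), A.T @ y)

assert np.allclose(x_spectral, x_closed)
```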

Regularizing null-space networks
In the case of regularizing null-space networks we take (B_α)_{α>0} as a filter-based regularization method and Φ_α = Φ₀ independent of α. In the following theorem we derive a decay rate of the distance function on the corresponding source set. Here L denotes the Lipschitz constant of Φ₀ and c is some constant depending on the regularization (B_α)_{α>0}.

Data-driven continued SVD
For the following, assume that A admits a singular value decomposition ((u_n)_{n∈ℕ}, (v_n)_{n∈ℕ}, (σ_n)_{n∈ℕ}), where (u_n)_{n∈ℕ} and (v_n)_{n∈ℕ} are orthonormal systems in Y and X, respectively, and A x = Σ_n σ_n ⟨x, v_n⟩ u_n. The regularization method corresponding to the regularizing filter given in (2.1) yields the truncated SVD, B_α y = Σ_{σ_n² ≥ α} σ_n⁻¹ ⟨y, u_n⟩ v_n. The truncated SVD only recovers signal components corresponding to sufficiently large singular values of A and sets the other components to zero. It therefore seems reasonable to train a network that extends the coefficients with nonzero values and can thus better approximate non-smooth functions.
To achieve a learned data extension, we consider a family of regularizing networks of the form (3.1) with Φ_α = (Id − B_α A) U_α. In particular, we have ran((Id − B_α A) U_α) ⊆ span{v_n : σ_n² < α} ⊕ ker(A). Then, for x ∈ X and all α, the expression B_α A Φ_α B_α A x vanishes, and therefore (A3) in Theorem 3.6 is clearly satisfied.
The networks Φ_α map the truncated SVD reconstruction B_α y^δ, lying in the space spanned by the reliable basis elements (those corresponding to sufficiently large singular values of the operator A), to the coefficients that are unreliably predicted by the data. Hence, as opposed to truncated SVD, R_α is a form of continued SVD, where the extension of the unreliable coefficients is learned from the reliable ones in a data-driven manner.
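The continued SVD can be sketched as follows for a matrix A. The map `extend` stands in for a trained network, and the projection onto the unreliable components mirrors the structure Φ_α = (Id − B_α A) U_α; names and the thresholding convention (σ_n² ≥ α) are illustrative.

```python
import numpy as np

def continued_svd(A, y, alpha, extend):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s**2 >= alpha                       # reliable singular components
    safe = np.where(s > 0, s, 1.0)             # avoid division by zero
    coeffs = np.where(keep, (U.T @ y) / safe, 0.0)
    x_tsvd = Vt.T @ coeffs                     # truncated SVD reconstruction B_alpha y
    # Learned extension, projected onto the unreliable components span{v_n : sigma_n^2 < alpha}:
    mask = (~keep).astype(float)
    return x_tsvd + Vt.T @ (mask * (Vt @ extend(x_tsvd)))
```

By construction, the reliable coefficients of the output coincide with those of the truncated SVD; only the coefficients discarded by B_α are filled in by the learned map.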

Conclusion
In this paper we introduced the concept of regularizing families of networks (RegNets), which are families of deep CNNs. The trained as well as the classical parts of the networks are allowed to depend on the regularization parameter, and it is shown that, under certain assumptions, this approach yields a convergent regularization method. We also derived convergence rates under the assumption that the solution lies in a source set different from the classical source sets, and gave examples where the assumptions are satisfied. The new framework recovers results for classical regularization as special cases, as well as data-driven improvements of classical regularization. Such data-driven regularization methods can give better results in practice than classical regularization methods, which only use hand-crafted prior information. Future work will test the proposed regularizing networks on ill-posed inverse problems, including limited-data problems in computed tomography.

Figure 2.3: Regularization defined by a null-space network. For a filter-based regularization method we have B_α y^δ ∈ ker(A)⊥. The regularizing null-space network R_α = B_α + P_{ker(A)} ∘ N ∘ B_α adds reasonable parts along the null-space ker(A) to the standard regularization B_α y^δ.

Definition 3.3 (Distance function). For any numbers α ≥ 0 and x ∈ X we define the distance function of x with respect to the regularization (B_α)_{α>0}.

Clearly, the above considerations equally apply to any filter-based regularization method, including iterative Tikhonov regularization, truncated SVD, and the Landweber iteration.