Convergent regularization in inverse problems and linear plug-and-play denoisers

Plug-and-play (PnP) denoising is a popular iterative framework for solving imaging inverse problems using oﬀ-the-shelf image denoisers. Their empirical success has motivated a line of research that seeks to understand the convergence of PnP iterates under various assumptions on the denoiser. While a signiﬁcant amount of research has gone into establishing the convergence of the PnP iteration for diﬀerent regularity conditions on the denoisers, not much is known about the asymptotic properties of the converged solution as the noise level in the measurement tends to zero, i.e., whether PnP methods are provably convergent regularization schemes under reasonable assumptions on the denoiser. This paper serves two purposes: ﬁrst, we provide an overview of the classical regularization theory in inverse problems and survey a few notable recent data-driven methods that are provably convergent regularization schemes. We then continue to discuss PnP algorithms and their established convergence guarantees. Subsequently, we consider PnP algorithms with linear denoisers and propose a novel spectral ﬁltering technique to control the strength of regularization arising from the denoiser. Further, by relating the implicit regularization of the denoiser to an explicit regularization functional, we rigorously show that PnP with linear denoisers leads to a convergent regularization scheme. More speciﬁcally, we prove that in the limit as the noise vanishes, the PnP reconstruction converges to the minimizer of a regularization potential subject to the solution satisfying the noiseless operator equation. The theoretical analysis is corroborated by numerical experiments for the classical inverse problem of tomographic image reconstruction.


Introduction
Inverse problems deal with the estimation of an unknown model parameter x * ∈ X from its noisy and indirect measurement y δ ∈ Y given by y δ = Ax * + e. (1) We consider the case where X and Y are (potentially infinite dimensional) separable Hilbert spaces and A : X → Y is a bounded linear operator.X and Y are endowed with inner products •, • X and •, • Y , inducing the norms • X and • Y , respectively.The measurement noise level is bounded by δ, i.e., e Y ≤ δ.The clean measurement is denoted by y 0 .The inverse problem in ( 1) is considered ill-posed in the sense of Hadamard, if either injectivity or surjectivity of the forward operator, or stability of the solution map is violated.For instance, if A is a compact operator with an infinite-dimensional range, then surjectivity and stability are not satisfied.This is, for example, the case for the ray transform operator that underlies many applications in medical imaging, such as computed tomography (CT) and positron emission tomography (PET) [32,33].The study of inverse problems usually assumes ill-posedness, as we will also do in the following.
In order to address ill-posedness, one needs to introduce a general concept for stable and unique solvability for an inverse problem of the form (1). Due to the aforementioned ill-posedness, we can not guarantee to recover the true solution x * for all measurements and hence we first need the concept of a generalized solution.A common approach is to search for solutions that are closest to the measured data with respect to a suitable data discrepancy term f : Y × Y → R + , such as the (squared) distance in the norm, i.e., f (Ax, y δ ) = Ax − y δ 2 Y .Then we search for x ∈ X such that f (A x, y δ ) ≤ f (Ax, y δ ) for all x ∈ X. ( (2) implies that x is closest to the measured data with respect to f , which deals with the violation of surjectivity by disregarding components of y δ in the co-kernel of A. Furthermore, if A has a non-trivial null space, then x is not unique.To obtain a unique solution, one can define the minimum norm solution as The element x † can now be considered a desirable generalized solution to (1).When f and • X are given by the squared L 2 -norm, we call x † the least-squares minimum-norm solution and can define a mapping A † : Y → X, such that x † = A † y δ .In fact, the mapping A † defines what is referred to as the Moore-Penrose pseudo-inverse.Unfortunately, if the operator A is compact, then A † will be unbounded and as such does not take care of the stability problem in the presence of noise in the data.This is where the concept of regularization becomes important, as we will discuss next.
In order to deal with the stability issue, regularization theory considers specifically designed solution maps.Such a solution map R(•; λ) : Y → X, also called a reconstruction operator, is expressed as a parametric map that produces an estimate of x * given y δ .Here, the parameter λ depends on the noise level δ, and we denote this explicitly by the mapping δ → λ(δ).In this paper, we are specifically interested in the notion of convergent regularization which can be understood as convergence of the reconstruction operator when the noise level δ tends to zero.More specifically, we want that when the noise level δ → 0, then λ(δ) → λ 0 ≥ 0, and the reconstruction operator R(y δ ; λ) converges to a generalized solution of the noiseless operator equation In the following, we will first review classical approaches to regularization in Section 2 and why inverse problems necessitate a generalized notion of solvability.We will then continue to discuss how this classical approach can be combined with modern data-driven methods.In particular, we will devote special attention to the so-called plug-and-play (PnP) approaches in Section 3, which have been shown to yield excellent empirical results for imaging inverse problems.These methods utilize state-of-the-art denoisers, model-or datadriven, to replace the proximal operators within iterative proximal splitting algorithms for solving an underlying variational minimization problem for reconstruction.We will subsequently provide an analysis in Section 4 of how PnP approaches can in fact provide a convergent regularization method in the classical sense, in particular, with linear denoisers.

Regularization for inverse problems and data-driven methods
Regularization theory has been a rich and successful field in inverse problems for several decades.The primary motivation is to formulate a well-posed and stable inversion procedure that converges provably to a solution of the noiseless operator equation (4).The emergence of data-driven methods has given the field of inverse problems a new direction: by using large quantities of data we can significantly improve reconstruction results.However, the underlying question of a convergent regularization remains: does the obtained reconstruction solve the underlying operator equation?Indeed, there exist a few methods that are provably convergent regularization methods, we refer to [30] for a survey.In the following, we will give a short overview of the regularization theory and existing data-driven approaches that are provably convergent regularization methods in this context.

Classical regularization theory
Stable solutions to inverse problems need a way to handle varying noise levels.For this purpose, the concept of regularization has proven highly useful.Roughly, regularization can be understood as a convergence requirement to a unique solution, e.g., the minimum norm solution x † , where convergence depends on the noise level δ.That is, formally we consider the previously discussed reconstruction operator R λ := R(•, λ), which provides a parameterized family of continuous operators R λ : Y → X.The parameter λ depends on the noise level δ > 0, where y δ − y 0 ≤ δ and y 0 := Ax * denotes noise-free data.We say that the family of reconstruction operators is a convergent regularization method if there exists a parameter choice rule δ → λ(δ, y δ ) such that reconstructions x δ := R λ(δ,y δ ) (y δ ) converge to the solution x † := A † y 0 given by the pseudoinverse as noise vanishes, in the sense that lim sup In other words, we have point-wise convergence of the reconstruction operators to the pseudo-inverse, i.e., R λ(δ,y δ ) (y δ ) → A † y 0 as δ → 0. We refer interested readers to [15] for a detailed discussion.This is, of course, quite restrictive and only considers convergence to the least-squares minimum-norm solution.
Nevertheless, this can already be used as a useful tool to design learned regularization methods, i.e., learned reconstruction approaches that formally satisfy the above convergence criteria, as we will discuss in the following.

Direct regularization
Motivated by the convergence to the pseudo-inverse solution, one can obtain a regularization method by mimicking the construction of the pseudo-inverse.
In finite dimensions, this can be achieved by the singular value decomposition (SVD) A = U SV of the forward operator.The pseudo-inverse can then be simply obtained by A † = V S † U , where S † is the transposed singular value matrix with inverted singular values.A regularization method is now obtained by filtering the singular values with a noise-dependent filter function, or a noise level-dependent truncation.Similarly, direct reconstruction methods that apply a regularized inverse of the forward operator can be shown to be convergent regularization methods.The most prominent example of such methods is the filtered back-projection (FBP) for X-ray CT, which is, in fact, still relevant in clinical practice.Here, the filtering operation removes high-frequency components in Fourier space to regularize the reconstructions.If the filtering is interpreted as a noisedependent mollifier, one obtains the general class of approximate inverse [42] with convergence as noise vanishes.
A popular approach in data-driven methods is to formulate a reconstruction operator as composition of a regularized reconstruction operator R λ : Y → X with a data-driven component C θ : X → X.That is, the reconstruction operator is parameterized as R (θ,λ) := C θ • R λ , where the data driven component C θ is designed to improve the reconstruction by removing noise or undersampling artefacts, usually given by a deep convolutional neural network (CNN) [22,24].These approaches are also popularly referred to as post-processing methods.
Such one-step post-processing approaches are especially popular due to their simplicity, as C θ can be efficiently trained when supervised pairs of high and low-quality reconstructions are available.Unfortunately, there are very few results on reconstruction guarantees for such methods.Specifically, the problem formulation as a composition of a regularized reconstruction followed by the data-driven component causes the reconstruction to often violate the so-called data-consistency criterion.That is, even if the data-fidelity f ( A • R λ (y δ ), y δ ) is small, it does not necessarily imply a small value of f A • C θ • R λ (y δ ), y δ corresponding to the output of the post-processing network C θ .Thus, such schemes do not satisfy the convergence of the data fidelity and hence fail to be a convergent regularization strategy.
Nevertheless, as proposed in [43], this approach can be reformulated by constructing the post-processing network as C θ = id + id −A † A Q θ , where Q θ is a Lipschitz-continuous Deep Neural Network (DNN) and id denotes the identity operator on X.Here, id −A † A is the projection operator onto the null-space of A and hence the operator C θ (referred to as null-space network) always satisfies ensuring that the output of C θ explains the observed data.More importantly, the null-space network maintains the regularizing properties of the reconstruction method R λ and hence provides a convergent regularization scheme [43] in the sense of direct regularization.See [8] for a recent extension of null-space networks to non-linear inverse problems.

Variational regularization
The classical regularization theory, which defines convergent regularization by convergence to the pseudo-inverse solution as defined in (5) limits possible solutions.Therefore, one can consider more general variational approaches to inverse problems, which have been particularly popular due to their flexibility in incorporating prior knowledge and dealing with varying noise distributions.In the variational regularization framework, solutions are computed by minimizing a composite objective consisting of the data-consistency term and a regularization term.In particular, the solutions are given by The loss functional f : Y × Y → R + measures data fidelity and is not restricted anymore to be the squared L 2 -norm.The regularization functional g λ : X → R encodes prior belief about the ground-truth x * and effectively restricts the null space of A. In the general case, λ can be a parameter of the functional, and more commonly, a simple weighting parameter to balance between the two terms of the composite objective in (6) (i.e., g λ (x) = λg(x), where g is the regularizer).
The choice of a suitable regularizer g λ is governed by the need to balance two important factors: desirable analytical features and the encoded prior belief.For instance, an analytically favorable choice is given by the squared L 2 -norm, which, in combination with a squared L 2 -norm for the data fidelity, provides a closed-form solution.Unfortunately, the obtained solutions corresponding to this choice of the regularizer will be smooth, which may not be suitable for many imaging applications.Consequently, more advanced sparsity-promoting priors have been favored, most commonly the L 1 -norm for sparse signals and total variation (TV) for sparse gradients, i.e., piece-wise constant functions.These regularizers are non-differentiable and hence need more advanced non-smooth optimization techniques to compute a minimizer [7], but they typically lead to a better reconstruction than the simple squared L 2 -norm-based regularization.
See Figure 1 for a comparison of a few handcrafted regularizers in the context of sparse-view CT reconstruction.
Notably, the role of the two terms in ( 6) is conceptually similar to the general formulation in (3) of a minimum-norm solution.Nevertheless, the variational formulation provides more flexibility and also necessitates a broader concept of regularization.This is because we can not always guarantee convergence to the minimum-norm solution, but we have to consider convergence with respect to the chosen regularization functional g λ [41].
We can then formulate a regularization method in the framework of variational regularization as follows: let us first denote x λ ∈ X to be a minimizer to the objective in (6) for a given λ with data y δ ∈ Y and noise level y δ −y 0 Y < δ.Similarly, as before, we assume that there is a corresponding parameter choice rule δ → λ(δ, y δ ) such that λ → λ 0 as δ → 0. The variational model defined by ( 6) is then said to converge to a g-minimizing solution if x λ(δ,y δ ) → x as δ → 0. Here, x ∈ X solves the variational model that corresponds to (6) with noise-free data y 0 ∈ Y , i.e., x ∈ arg min x∈X g λ0 (x) subject to y 0 = Ax * and where λ 0 := lim δ→0 λ(δ, y δ ).(7) The primary differences to the classical formulation here are, that the minimizer of the regularizing functional g is not necessarily unique and the regularization parameter is not required to converge to 0.
Let us remark to this end, that it is desirable to formulate a regularizer that has small values for the desired images, i.e., it penalizes undesired solutions but is also analytically or computationally tractable.It is important to note at this point that different regularizers g, which provide a convergent regularization, will still produce very different reconstruction results as illustrated in Figure 1, as not all choices of g are a good representation of the desired ground-truth image.Here, learned regularizers have proven very successful, as the data itself can now be used to represent the regularizer and hence naturally offer a good representation of the desired features.Depending on the choice of representation, analysis of the learned regularizer may become more involved.In the following, we will discuss several choices for learned data-driven regularizers and how these can be used within the realm of variational regularization.

Learning a regularizer
The idea to learn a regularizer from data, rather than the classical approach of modeling it from first principles as outlined above, has appeared in the literature in various forms.We outline here a few such approaches, ranging from relatively older yet widely popular ideas like dictionary learning to the more recent approaches to learning regularizers using deep neural networks.

Learning sparsity-promoting dictionaries
We start with the concept of dictionary learning, which nicely illustrates how data can be used to learn a representation of the desired images.Here, we will use the concept of sparsity, which has long been important for modeling prior knowledge of solutions, to regularize inverse problems.Assuming that the reconstruction possesses a sparse representation in a given dictionary D, one can develop sparse recovery strategies, associated computational approaches, and error estimates for the reconstruction.Instead of working with a given dictionary, the key idea is to learn a dictionary either a-priori or jointly with the reconstruction.Notably, almost all work on dictionary learning in sparse models has been carried out in the context of denoising, i.e., with A = id.
Learning the dictionary separately to solve the reconstruction problem is usually done using a sparsity assumption on the representation given by the dictionary.Let L X : X × X → R be a given loss function (e.g. the L 2 -or L 1 -norm).Further, let x 1 , . . ., x N ∈ X be the given unsupervised training data, D = {φ i } ⊂ X a dictionary, and the synthesis operator E * D : Ξ → X acting on the encoder space Ξ given as One approach in dictionary learning is based on the idea of finding a dictionary that approximates the training data in the given loss function with the sparsest possible coefficients, by solving where s is a given sparsity level.Alternatively, one can formulate the optimality criterion by looking for a dictionary that maximizes the sparsity of the dictionary representation while enforcing a constraint on the precision in which the synthesis operation approximates, i.e., by seeking a dictionary such that L X (x i , E * D (ξ i )) ≤ ε for i = 1, . . ., N , while maximizing sparsity.A unified formulation is given by the unconstrained problem Both ( 8) and ( 9) are posed in terms of the L 0 -norm and are NP-hard problems.This suggests the use of convex relaxation, by replacing ξ i 0 with ξ i 1 in (9).This relaxation turns (9) into a bi-convex problem (convex in each variable when the others are kept fixed) subject to usual choices for L X , and one can apply alternating minimization approaches for obtaining an approximate solution.Seminal work on sparse dictionary learning includes the K-SVD approach [4], geometric multi-resolution analysis (GMRA) [5], and online dictionary learning [27].See also [39] and references therein for a thorough discussion on sparse dictionary learning approaches.While dictionary learning in the context of sparse coding has been very popular and successful, there are still several issues with it related to the locality of learned structures and the computational effort needed, for instance when sparse coding is performed over a large number of images or image patches.Consequently, a computationally feasible approach is needed that introduces further structure and invariances in the dictionary (e.g., shift-invariance), which makes each dictionary atom φ i dependent on the whole image instead of just individual patches.In this context, convolutional dictionaries have been introduced.Here, the dictionary atoms are given by convolution kernels that act on signal features via convolution and hence provide computationally feasible shift-invariant dictionaries, where the atoms depend on the entire signal/image.
Convolutional dictionary learning is formulated as follows.Given unsupervised training data x 1 , . . ., x N ∈ X and a loss function L where φ i 2 = 1.The above can be solved using an ADMM-type scheme, similar to what is done for the L 2 -loss in [16].There are various extensions of convolutional dictionary learning, for instance, multi-layer variants [44].
The dictionary can also be learned jointly with the reconstruction, by formulating a joint optimization problem.An example of such an approach is the adaptive dictionary-based statistical iterative reconstruction (ADSIR) [48], and its variants [12,46].A joint problem could be formulated as: where while E * D : Ξ → X being the synthesis operator associated with the dictionary D. A convergent regularization could now be obtained under suitable conditions on g λ following the variational regularization framework.
Finally, a formulation in infinite dimensional spaces is studied in [10], proposing a convex variational model for joint reconstruction and dictionary learning, that applies to inverse problems and allows to establish existence and stability guarantees for the reconstruction.

Bilevel learning
Starting from variational regularization methods where the reconstruction operator R λ : Y → X is defined as the solution map for (6), one can formulate a generic setup for learning selected components of ( 6) utilizing supervised training data and a suitable loss function L X : X × X → R.This setup can be tailored towards learning the regularization functional g λ [35,36], the data fidelity term f , or even an appropriate component in the forward operator A, e.g., in blind image deconvolution [20].Notably, the joint dictionary learning problem (11) can also be formulated as a bilevel learning problem.
First, we generalize the regularizer g λ consisting of a single regularization parameter λ to a set of parameters θ (vector-valued).Subsequently, we define the reconstruction operator as Given paired training data (x i , y i ) ∈ X × Y that are i.i.d.samples of the (X × Y )-valued random variable (x, y) ∼ π joint , we can formulate the following bilevel learning problem: where Note that θ here is a Bayes estimator.However, the true joint distribution π joint is typically unknown and is replaced by its empirical counterpart given by the training data, in which case θ corresponds to empirical risk minimization.
In the bilevel optimization literature, as in the optimization literature as a whole, there are two main and mostly distinct approaches.In the discrete approach that first discretizes the problem (13) and subsequently optimizes its parameters.In this way, optimality conditions and their well-posedness are derived in finite dimensions.Alternatively, R and its parameter θ in (14) are optimized in the continuum (i.e., appropriate infinite-dimensional function spaces) and then discretized.It should be noted that the resulting problems present several difficulties due to the frequent non-smoothness of the lower-level problem (think of TV regularization), which, in general, makes it impossible to verify Karush-Kuhn-Tucker constraint qualification conditions.This issue has led to the development of alternative analytical approaches in order to obtain first-order necessary optimality conditions [13,19].

Adversarial regularization
Another notable alternative approach to include a learned regularization in the reconstruction process is to learn an explicit regularization term in (6) and solve the variational problem subsequently.One such option is to learn adversarial regularizers as first proposed in [26] and further developed in [29].Here, the construction of data-driven regularization is inspired by how discriminative networks (also referred to as critics) are trained using modern Generative Adversarial Network (GAN) architectures.
To train such an adversarial regularizer, we assume to have {x i } n1 i=1 ∈ X and {y i } n2 i=1 ∈ Y , which are i.i.d.samples from the marginal distributions π x and π y of ground-truth images and measurement data, respectively.It is important to note here that the training samples are unpaired, i.e., y i does not necessarily correspond to the noisy measurement of x i , unlike, for instance, a supervised approach such as the learned primal-dual (LPD) method [3].Additionally, we assume that there exists a (potentially regularizing) pseudo-inverse A † : Y → X to the forward operator A and define the measure π † ∈ P X as π Then, the idea is to train a regularizer g θ , parametrized by a neural network, to discriminate between the distributions π x and π † , the latter representing the distribution of imperfect solutions A † y i .More concretely, we compute where L(θ) is chosen to be a Wasserstein-flavored loss functional [26].In particular, one minimizes Here, π denotes the distribution of the random variable u = ε x+(1−ε)z, where x ∼ π x , z ∼ π † , and ε is drawn uniformly at random from [0, 1].The heuristic behind this choice is that a regularizer trained this way will penalize noise and artifacts generated by the pseudo-inverse (and contained in π † ).The term penalizing the gradient norm of g θ in ( 16) encourages g θ to be approximately 1-Lipschitz, which is required for the well-posedness of ( 16) and the stability of the variational solution obtained using the regularizer resulting from (15).When used as a regularizer, it will hence prevent these undesirable features from occurring as a result of adversarial training.The resulting regularizer g θ is called an adversarial regularizer (AR).Note that in practical applications, the measures π x , π † ∈ P X are replaced with their empirical counterparts given by training data x i and A † y i , respectively.Suppose, one computes a gradient step on the learned regularizer, given by † be the distribution of x η .Under appropriate regularity assumptions on the Wasserstein distance W(π η † , π x ) (see [26,Theorem 1]), one can show that This ensures that by taking a small enough gradient step, one can reduce the Wasserstein distance from the ground truth π x .This is a good indicator that using g θ as a variational regularization term and consequently penalizing it indeed introduces the highly desirable incentive to align the distribution of regularized solutions with the distribution π x of ground truth samples.Further, one can show that if the AR is Lipschitz-continuous1 , then a minimizer of the following variational problem exists where the squared norm on x is needed to enforce coercivity.Additionally, we can enforce (strong) convexity on g θ , leading to the adversarial convex regularizer (ACR), to achieve stronger forms of convergence while precluding discontinuities in the reconstruction operator.This necessitates a suitable parameterization of the learned regularizer.One such option is given by input convex neural networks for imposing convexity [6] on g θ .
Given a so-constructed (ACR) g θ that is convex in x, we then consider a similar regularization functional of the form where g θ : X → R is the trained (ACR) which we assume to be 1-Lipschitz and convex in x.The corresponding variational regularization problem then consists in minimizing with respect to x ∈ X.In this setting, we get the following set of improved theoretical guarantees for the ACR, by following standard arguments in variational calculus for the proofs.

Theorem 1 (Properties of Adversarial Convex Regularizer [29])
i. Existence and uniqueness: The functional in (19) is strongly convex in x and has a unique minimizer x λ (y) for every y ∈ Y and λ > 0. ii.Stability: The optimal solution x λ (y) is continuous in y.
Theoretical guarantees notwithstanding, the numerical experiments in [29] (especially, for sparse-view CT reconstruction) indicate a lack of expressive power of ACRs as compared to their nonconvex counterpart AR.This underscores the need to develop techniques that achieve a better compromise between empirical performance and theoretical certificates.

The Network Tikhonov (NETT) approach
Traditionally, regularizers are often chosen as sparsifying transforms with respect to certain features.For instance, total variation (TV) is sparsifying for piecewise constant functions.Similarly, neural networks are often trained in an encoder-decoder (autoencoder) structure, where the encoder is trained to represent the input signal in a low-dimensional space or to find a more efficient, i.e., a sparse structure.The approach proposed as Network Tikhonov (NETT) in [25] follows this paradigm to learn a regularizer.Here, a pretrained network takes small values for desired model parameters and penalizes (by producing larger values for) model parameters with artifacts or other unwanted structures.The deep neural network E θ in this approach is allowed to be a rather general architecture, such as the above-mentioned autoencoder.Once trained, the reconstruction is then given as the minimizer of the variational objective Indeed, the NETT approach also provides a provably convergent regularization method under certain analytic conditions on (20), such as weak lower semicontinuity and coercivity of the regularizer g(E θ (•)).The primary difference to the ACR in (18), which achieves convergence in the strong topology of X by enforcing convexity, is that the NETT idea achieves convergence in the weak topology of X.The weak lower semi-continuity and coercivity of g(E θ (•)) can be achieved as follows.First, the usual ReLU activation function is replaced by leaky ReLU defined with a small τ > 0 as ReLU τ (s) := max(τ s, s), which tends to −∞ for s → −∞.In combination with the affine linear maps (weight matrices) in E θ , this yields a coercive and weakly lower semi-continuous regularization function g • E θ for standard choices of g, such as weighted pnorms g(ξ) = i v i |ξ| p , with uniformly positive weights v i and p ≥ 1.Finally, we note that strong convergence can be achieved by introducing the novel concept of absolute Bregman distances and imposing stronger conditions on the regularizer.

Regularization by Plug-and-play (PnP) denoising
Denoising is the simplest and arguably the most well-studied inverse problem in imaging, with numerous algorithms developed over the past few decades, particularly for removing additive white Gaussian noise from images.It is, therefore, natural to ask if one can leverage off-the-shelf denoisers for solving more complicated image recovery tasks with a non-trivial forward operator.Venkatakrishnan et al. [45] pioneered the idea of using denoisers within proximal splitting algorithms (e.g., the alternating directions method of multipliers (ADMM) algorithm) in a plug-and-play (PnP) fashion, and the resulting class of algorithms came to be known as the PnP denoising approach.To see the motivation behind using denoisers in place of proximal operators, let us recall the definition of the proximal operator with respect to a (potentially nonsmooth) convex functional g : X → R ∪ {+∞} and a step-size τ > 0: As indicated by (21), evaluating the proximal operator amounts to denoising a noisy image x using the Bayesian maximum a-posteriori probability (MAP) estimation framework with a Gibbs prior ∝ exp (−τ g(u)).This denoising interpretation of proximal operators underlies the foundation of PnP approaches, which have been shown to produce excellent reconstruction results for a wide range of imaging inverse problems.A classic and widely popular example of PnP denoising would be to consider it in conjunction with forward-backward splitting (FBS), leading to the following iterative reconstruction algorithm: Here, f denotes the data fidelity loss for the underlying inverse problem, η k > 0 is the step-size at iteration k, and D σ is a denoiser that eliminates Gaussian noise of standard deviation σ from its input.
Besides the PnP denoising framework within proximal methods, wherein a denoiser implicitly acts as a regularizer, Romano et al. [37] proposed an alternative approach to explicitly construct a regularizer as while utilizing a denoiser D σ (x).One can then seek to minimize the energy functional f (x) + λ g(x), where g is as defined in ( 23), leading to fixed-point iterative schemes known as the regularization-by-denoising (RED) algorithms.Nevertheless, it was shown subsequently by Schniter et al. [34] that the energy minimization interpretation of the RED algorithms is valid only when (i) the denoiser is locally homogeneous, i.e., D σ ((1 + ε)x) = (1 + ε)D σ (x) holds for all x with sufficiently small ε, and (ii) the Jacobian of D σ is symmetric.These conditions are generally not satisfied by generic denoisers, thereby invalidating the energy minimization-based interpretation of RED.Instead, the authors of [34] developed a new framework called score-matching to analyze the convergence of RED algorithms.
Notwithstanding their empirical success, PnP denoising algorithms such as (22) does not immediately inherit the convergence properties of the corresponding optimization scheme (in this specific instance, FBS).Studying the convergence of PnP denoising has received a significant amount of attention in the mathematical imaging community in recent years.Arguably, the most natural form of convergence for PnP algorithms of the form (22) is the stability of the iterations, i.e., to ascertain whether the sequence of iterates x k generated by a PnP algorithm converges.Such convergence guarantees are typically derived from fixed point theorems, which require showing that the PnP iterations are contractive maps [11,40].For instance, [40] established the fixed-point convergence of PnP-ADMM (i.e., PnP with the alternating direction method of multipliers algorithm) under the assumption of Lipschitz continuity of the operator (D σ − id).The specific result is stated in Theorem 2.

Theorem 2 (Fixed-point convergence of PnP-ADMM [40])
Consider the PnP-ADMM algorithm, given by where the data-fidelity loss f is assumed to be µ-strongly convex.One can equivalently express (24) as the fixed-point iteration z k+1 = T (z k ), where Suppose, the denoiser satisfies for all u, v ∈ X and some ε > 0, and the strong convexity parameter µ is such that ε As noted in [40], fixed-point convergence of PnP-ADMM follows from monotone operator theory if (2D σ − id) is non-expansive, but (26) imposes a less restrictive condition on the denoiser.
While fixed-point convergence ensures that the PnP iterations are stable, the specific fixed point to which they converge does not automatically minimize a variational energy function.To bridge the gap between classical variational approaches and PnP methods, it is important to derive conditions under which the limit point of PnP iterations can be characterized as the minimizer (or, at least a stationary point) of some regularized variational objective (which, of course, depends on the denoiser).This type of convergence is referred to as objective convergence and is stronger than fixed-point convergence.
Objective convergence of PnP with classical (pseudo) linear denoisers (e.g., non-local means denoiser) has been established in [31].Hurault et al. [21] showed that PnP with a denoiser constructed as a gradient field (referred to as gradient-step (GS) denoisers) converges to the stationary point of a (possibly non-convex) variational objective (c.f.Theorem 3).The construction of GS denoisers is motivated by Tweedie's identity: the optimal minimum mean-squared error (MMSE) Gaussian denoiser is given by Here, x = x 0 + σ z, where x ∼ N (0, id), is the Gaussian noise (with variance σ 2 ) corrupted version of the clean image x 0 ∈ X ⊆ R d and Indeed, the optimal Gaussian denoiser is of the form D * σ (x) = x − ∇ g * σ (x), where g * σ is the negative log of the smoothed distribution p σ defined in ( 29), which has a structure identical to that of a GS denoiser.

Theorem 3 (Objective convergence of PnP iterations with gradientstep (GS) denoisers [21])
Suppose, the denoiser is constructed as a gradient-step (GS) denoiser, i.e., D σ = id −∇g σ , where g σ is proper, lower semi-continuous, and differentiable with an L-Lipschitz gradient.The PnP algorithm proposed in [21] is given by where f : X → R ∪ {+∞} denotes data-fidelity and is assumed to be convex and lower semi-continuous.Then, the following guarantees hold for τ < 1 λ L : 1.The sequence F (x k ), where F = f + λ g σ , is non-increasing and convergent.2. x k+1 − x k 2 → 0, which indicates that iterations are stable, in the sense that they do not diverge if one iterates indefinitely.

All limit points of {x k } are stationary points of F (x).
Notably, the PnP iteration defined by (30) is exactly equivalent to proximal gradient descent on f + λ g σ , with a potentially non-convex g σ .
While objective convergence ensures a one-to-one connection between PnP iterates with the minimization of a variational objective, it does not provide any guarantees about the regularizing properties of the solution that the iterates converge to.In the same spirit as classical regularization theory, it is therefore desirable to be able to control the implicit regularization effected by the denoiser in PnP algorithms and analyze the asymptotic behavior of the PnP reconstruction as the noise level and the regularization strength tend to vanish.More precisely, assuming that the PnP iterations converge to a solution x y δ , σ, λ , where σ is a parameter associated with the denoiser and λ is an explicit regularization penalty, one would like to obtain appropriate selection rules for σ and/or λ such that x y δ , σ, λ exhibits convergence akin to (7) in the limit as δ → 0. To the best of our knowledge, some progress in this direction was first made in [14], and the precise convergence result is stated in Theorem 4.

Theorem 4 (Convergent plug-and-play (PnP) regularization [14])
Consider the PnP-FBS iterates of the form where D λ is a denoiser with a tuneable regularization parameter λ.Let PnP λ, y δ be the fixed point of the PnP iteration (31).For any y ∈ range(A) and any sequence δ k > 0 of noise levels converging to 0, there exists a sequence λ k of regularization parameters converging to 0 such that for all y k with y k − y 0 2 ≤ δ k , the following hold under appropriate assumptions on the denoiser (see Definition 3.1 in [14] for details): 1. PnP λ, y δ is continuous in y δ for any λ > 0; 2. The sequence (PnP (λ k , y k )) k∈N has a weakly convergent subsequence; and 3.The limit of every weakly convergent subsequence of (PnP (λ k , y k )) k∈N is a solution of the operator equation y 0 = Ax.
The result in [14], although the first of its kind, makes fairly restrictive assumptions on the denoiser.In particular, the denoiser needs to be contractive, which is not satisfied by most practical denoisers, especially denoisers modeled using deep CNNs.This led us to pose the following question: are PnP approaches with more general and expressive denoisers also convergent regularization methods?This question is perhaps more tractable if one can associate the PnP solution (after convergence) with the minimizer of an underlying variational objective.We, therefore, first consider gradient-step denoisers, for which it is possible to establish such a connection (see Theorem 3).Treating λ in (30) as an explicit regularization parameter while using a fixed, pre-trained denoiser, one can interpret the converged PnP solution as a minimizer of f + λ g σ , where λ is varied depending on the noise level δ in the measurement data and σ is kept fixed.The numerical results for image deblurring in Figure 2 seem to indicate that gradient-step PnP is indeed a convergent regularization scheme, while the classical theory only guarantees stability akin to what is shown in [26] subject to g σ being coercive and bounded below.In addition, the role of σ as an implicit regularization parameter is not exploited, and it is kept unchanged regardless of the noise level in the measurement.This, in part, is due to the fact that the behavior of g σ w.r.t.σ is non-trivial to characterize in a precise manner, leading to difficulties in tuning σ based on δ.In order to rigorously establish convergence, together with developing a principled approach to control the regularization strength arising from the denoiser, we consider PnP with linear denoisers in the next section.

Controlling the regularization strength in PnP
One fundamental question that arises when applying learned denoisers for solving inverse problems using PnP concerns itself with how to adjust the regularization strength that is applied.Indeed, learned denoisers are typically trained at a fixed noise level, whereas their practical application to inverse problems in a PnP framework and the theoretical notion of convergent regularization both require one to have certain control over the regularization strength.
An approach that has been shown to be beneficial in practice is the denoiser scaling approach [47]: given a denoiser D σ (designed for denoising at a given noise level σ), we introduce an extra scaling parameter α > 0, and define the scaled denoisers {D σ,α } α>0 as This choice of scaling is motivated by the fact that if J : X → R ∪ {∞} is 1-homogeneous (i.e., J(τ u) = τ J(u), for τ > 0) and its proximal operator is well-defined, we have In other words, if D σ = prox J , then D σ,α = prox J/α .Let us note that the choice of this particular scaling, while natural (norms and seminorms are 1-homogeneous, for example), is somewhat arbitrary.Indeed, suppose that J is instead c-homogeneous for some c > 0, i.e., J(δ u) = δ c J(u) for any u and δ > 0. We have, with δ > 0 arbitrary, which agrees with the result for 1-homogeneous functionals and generalizes it, except for 2-homogeneous functionals where the above derivation does not work.In fact, this leads nicely into a setting where no form of denoiser scaling as in Equation ( 33) can possibly be used to control the regularization strength to give a convergent regularization: for linear denoisers the multiplicative factor inside the denoiser can be pulled out and canceled against the factor outside of it.

Controlling the regularization strength of a linear denoiser
Let us consider the setting in which we have a linear denoiser D σ : X → X.
If we are to interpret it as a proximal operator of some underlying functional, we must assume that it is a symmetric, positive semi-definite (p.s.d.) operator, and if we assume that the underlying functional is convex as well, then D σ must be non-expansive in addition.These properties are direct consequences of the characterization of proximal operators given in [28] and generalized (to potentially non-convex functionals) in [18].Let us restrict to the case where D σ is non-expansive, bypassing the potential difficulties of non-convexity of the underlying variational problem.In fact, we will assume that D σ is contractive, i.e.D σ < 1, which as we will see later corresponds to assuming that the underlying regularization functional is coercive.Furthermore, we will assume that D σ is bounded from below, i.e.D σ (x) ≥ c x for some c > 0, so that D −1 σ exists and is a bounded operator.Remark 1 In practice, the assumption of symmetry can be relaxed somewhat by taking a different perspective: in [17] it is shown in finite dimensions that any denoiser which is similar to a symmetric p.s.d.matrix is admissible in PnP applications.Indeed, in this case we can find a modified inner product, with respect to which the denoiser is a proximal operator.
Let us study the characterization of proximal operators in more detail for the linear denoiser D σ .The goal is to understand the underlying functional J : X → R such that D σ = prox J .Note first that it is immediate from the definition (Equation ( 21)) that we can only hope to recover J up to an additive constant.We have with ∂ being the subdifferential.On the other hand, since D σ is linear, it is the gradient of the convex quadratic functional φ : X → R given by φ(x) = x, D σ x /2, i.e.D σ = ∇φ.Using a standard result from convex analysis, we know that (∇φ) −1 = ∇φ * , where φ * is the convex conjugate of φ, and Hence, up to an irrelevant additive constant, we find that the underlying regularization functional J corresponding to D σ is given by The most common way of controlling the regularization strength, when we have access to the underlying regularization functional J, is to simply scale it: introduce a parameter τ > 0 and consider prox τ J .If we apply this to Equation (34), we obtain which suggests, by following the above reasoning in reverse, that Here h τ : R → R, given by h τ (λ) = λ/(τ − λ(τ − 1)) is applied to D σ using the functional calculus.The takeaway message of the preceding derivation is that we can perform a spectral filtering operation on the linear denoiser D σ to control its regularization strength.In fact, more general filter functions h τ than the one seen here can be used, as we will see in what follows.
Remark 2 It is worth contrasting the spectral filtering approach proposed here with well-established spectral filtering approaches to regularization of linear, ill-posed, inverse problems [15]: whereas the traditional approaches operate on the forward operator to enact a regularization effect, we operate on the denoiser (agnostic about the forward operator to which the denoiser will be applied) to control its regularization strength.
To get a better understanding of what the spectral filtering operation does to a denoiser, consider Figure 3.This will help us get an idea of what we should ask of generalized filter functions, i.e. filter functions that do not just implement a scaling of the underlying regularization functional.The effect of spectral filtering on denoiser eigenvalues Fig. 3: The effect of filtering the denoiser as in (35).In accordance with intuition, the spectrum is flattened as τ → 0: as the regularization strength vanishes, the effect of the denoiser should vanish too.

Convergent regularization through generalized spectral filtering of linear denoisers
In the previous section, we saw that there is a way in which we can spectrally filter a linear denoiser to effectively scale the underlying regularization functional.Now, we will generalize the conditions on the spectral filter and show that this spectral filtering of linear denoisers allows us to obtain a convergent regularization of linear, ill-posed, inverse problems.
We saw in Equation ( 34) that a linear denoiser is related to an underlying regularization functional J as follows: we have D σ = prox J , where Furthermore, we saw the effect of scaling the regularization functional on the corresponding proximal operator.We can generalize this idea and look at where {h τ : R → R} τ ∈(0,∞) is a family of spectral filters that we can apply to D σ using the continuous functional calculus.Let us now derive conditions on the spectral filters h τ such that this gives a convergent regularization.For one, since we are assuming that D σ is bounded from below and contractive, we have that spec(D σ ) ⊂ (0, 1), and the same considerations that led to these assumptions then lead to us asking that h τ (spec(D σ )) ⊂ (0, 1) for each τ > 0.
Since we would like to think of J τ as somewhat similar to τ J, we ask the question whether the limit exists and is sufficiently well-behaved.Indeed, if this limit is well-defined, a natural result to aim for would be that we have convergence to a J * -minimizing least-squares solution to the inverse problem with appropriate choices of τ → 0 as δ → 0.
Remark 3 In Theorem 5, as above, we will assume that the denoiser is contractive, which by (34) implies that the corresponding regularization functional is coercive.This may be relaxed, by requiring that the kernel of the forward operator is compatible with the denoiser in the sense that the objective function in (36) is coercive.
Theorem 5 Suppose that D σ : X → X is a bounded, linear, self-adjoint operator, which is interpreted as a denoiser.Furthermore, assume that D σ is positive definite, bounded from below, and contractive (so that spec(D σ ) ⊂ (0, 1)).Suppose in addition that we have a bounded, linear forward operator A : X → Y (assuming w.l.o.g. that A = 1), and that {h τ : R → R} τ ∈(0,∞) is a collection of continuous scalar functions satisfying converges uniformly for λ ∈ spec(D σ ) as τ → 0, with limit r * and rate In this setting, let us define (using the continuous functional calculus to apply scalar functions to D σ ) We can compute the solution to the variational problem using PnP-FBS: By A.2 we can define J * (x) = lim τ →0 J τ (x)/τ .Now, we obtain a convergent regularization when the regularization parameter τ δ is chosen appropriately: suppose that τ δ ∼ δ.Assume that we have an underlying image x * ∈ X, clean measurements y = Ax * , {y δ } δ>0 is a sequence in Y satisfying y δ − y ≤ δ, and Then x(y δ , τ δ ) → x † , where is the J * -minimizing least squares solution to the inverse problem Ax = y.
Proof First note that under the assumptions of the theorem, PnP-FBS as described in ( 37) is a contractive fixed-point iteration (so that it has a unique fixed point to which it converges) for any τ and y, with fixed points satisfying the optimality condition of the variational problem in (36).By A.3, we can define J(x) = inf τ J τ (x)/τ and J(x) = sup τ J τ (x)/τ , so that c 2 Taking limits, this also gives us that The above bounds tell us that the J * -minimizing least squares solution to the inverse problem is unique, since it is defined by the minimization of a strongly convex functional on a closed linear subspace of X.
By the remarks above, we can compute these reconstructions using (37).For the sake of the proof, let us also define the variational reconstruction operators with a static regularization functional J * , as follows xstatic (y, τ ) := arg min This static regularization approach, with the parameter choice that we are using, is a convergent regularization by the existing theory (this is guaranteed, for example, by the general result in [41,Proposition 3.32]): xstatic (y δ , τ δ ) → x † with x † the J * -minimizing least squares solution to the inverse problem.Furthermore, the triangle inequality gives us that so it suffices to show that x(y δ , τ δ ) − xstatic (y δ , τ δ ) → 0 as δ → 0.
We can write so we just need to show that M −1 τ − M −1 τ,static → 0 as τ → 0 (since τ δ ∼ δ), where M τ = A * A + τ r τ (D) and M τ,static = A * A + τ r * (D).We have We will expand the inner matrix inversion using a Neumann series.Note first (by A.3) that M τ,static is bounded from below: M τ x ≥ cτ x .As a result, M −1 τ,static ≤ 1/(cτ ) and we can estimate Since A.2 tells us that r τ − r * L ∞ (spec(Dσ)) → 0 as τ → 0, this must be smaller than 1 for sufficiently small τ , which is a sufficient condition for absolute convergence of the Neumann series.Using this and (41), we see that Finally, we can simply estimate its norm from this as follows, using (42) and the fact M −1 τ,static ≤ 1/(cτ ): ) .
Example 1 Consider the case previously considered in (34) and (35), corresponding to h τ (λ) = λ/(τ (1 − λ) + λ).We have In particular, the assumptions A.3, A.2 and A.1 are trivially satisfied: we have h τ (λ) < λ for λ > 0, r * = r τ for all τ > 0 and inf τ >0,λ∈spec(Dσ) This should come as no surprise, since by the previous discussion, this choice of spectral filtering simply corresponds to the static regularization approach used in the proof of Theorem 5, for which classical theory establishes its convergence properties.

Experiments
In this section, we will demonstrate the use of the spectral filtering approach to control the regularization strength of a learned denoiser, when applied to an inverse problem using PnP-FBS, showing that it in fact gives rise to a practically convergent regularization method.Since the spectral filtering approach was developed for linear denoisers, we first need to decide on a reasonable design for a linear learnable denoiser.
In this work, we will modify the U-net architecture [38], which continues to be used with great success in image-to-image tasks, and combines a downscaling and upscaling path (as in an autoencoder) with skip connections that connect the corresponding scales before and after the bottleneck.The key insight for us is that the U-net architecture is symmetric, in the following sense: if the downscaling and upscaling operations are linear and each other's transposes, and the activation functions and biases are omitted, the U-net is linear and its transpose is a U-net of the same shape (which can be thought of as running the original U-net in reverse).In particular, it is straightforward to see that we can obtain a symmetric linear U-net in this way by tying weights between the downscaling and upscaling paths.Alternatively, and perhaps more simply, we can take the average of a linear U-net and its transpose to get a symmetric linear denoiser.This is the approach that we will take in the experiments considered in this section, since we can leverage the power of JAX [9] to do so: given a linear U-net, we can efficiently compute its vector-Jacobian products to get its transpose.Figure 4 shows a comparison of the denoising performance (in the same setting as the one we will consider for the application to inverse problems below) of such a linear U-net with a comparable non-linear U-net.By this, we mean that the networks have the same sizes and the same number of trainable parameters.While the non-linear U-net allows for better reconstructions, most notably in terms of sharpness, both denoisers remove a significant part of the noise in the noisy images.In what follows, we will use the linear U-net D σ,l and simply call it D σ .The experiment that we will consider is concerned with the inverse problem of image reconstruction in computed tomography (CT).We will consider images of size 64×64, consisting of randomly generated ellipse phantoms as in Figure 6, and simulate CT measurements (sinograms) using the ASTRA toolbox [1,2] with a parallel beam geometry with 150 equispaced views.A linear U-net is trained as a denoiser on ellipse phantoms corrupted with Gaussian white noise, after which we apply the denoiser in a PnP-FBS manner: denoting the forward operator, which maps images u to (clean) sinograms y by A, the noisy measurements y δ and the trained denoiser by D σ , we iterate where η is a step size, satisfying η ≤ 2/ A 2 , so that the limit is well-defined.
Here, we simply use the spectral filters h τ corresponding to scaling the underlying regularization functional as seen in Equation ( 35).We will consider a sequence of noisy measurements {y δ } δ>0 such that y δ − y ≤ δ and a corresponding step size τ δ ∝ δ, satisfying the conditions of Theorem 5.In Figure 5 and Figure 6 we show that the spectral filtering approach indeed leads to a practically convergent regularization, as predicted by Theorem 5.  x * x † x(y δ , τ δ ) FBP(y δ ) x(y δ , τ δ ) FBP(y δ ) x(y δ , τ δ ) FBP(y δ ) Increasing δ Fig. 6: Applying the spectral filtering approach, we observe convergent regularization in practice.We show a selection of snapshots corresponding to the plot in Figure 5.Note that x * (the underlying ground truth) is distinct from x † since the forward operator has a non-trivial kernel.

Conclusions
The question if a reconstruction algorithm provides a convergent regularization has been long studied in inverse problems, as it provides more than just the knowledge that a solution can be computed at a certain noise level.It tells us that stable solutions exist for all noise realizations and even more importantly that in the limit case, when noise vanishes, we obtain a solution of the underlying operator equation.In other words, we can guarantee mathematically that obtained solutions are indeed solutions to the inverse problem.This is in contrast to some novel data-driven approaches where we may only guarantee that obtained solutions are minimizers of the empirical loss, given suitable training data.Consequently, the concept of convergent datadriven reconstructions has gained considerable interest very recently, see for instance [30].Here, PnP approaches take a special role due to their straightforward connection to convex optimization [23] and the possibility to incorporate learned denoisers given by non-linear neural networks.But despite considerable advances in establishing convergence notions, i.e., fixed-point and objective convergence, the question of convergent regularization is still open for general non-linear denoisers.
In this work, we presented a step forward for learned linear denoisers using the novel concept of spectral filtering of the denoiser.The presented approach allows to establish a provably convergent regularization in the PnP framework.Additionally, this convergence is demonstrated numerically on the inverse problem of CT image reconstruction.As established in Theorem 5, there is some freedom in the choice of filters to apply to the denoiser.In future work, this choice could be studied in more detail.In this direction, it is of particular interest to choose spectral filters that are not too computationally costly to evaluate but still give a way to tune the regularization strength of the denoiser.Indeed, in the present implementation of the method, after training, the denoiser is instantiated as a matrix, the eigen-decomposition of which is computed to apply the spectral filtering.By considering spectral filters given by polynomials, for example, we would circumvent the need to instantiate the denoiser as a matrix and compute a full eigen-decomposition.Besides this, it would be of great interest to study whether there is any reasonable generalization of the denoiser filtering approach to the setting in which the denoiser is non-linear.
In fact, we have observed similar convergence behavior numerically even when using a non-linear denoiser in the PnP gradient-step framework (see Figure 2), suggesting a promising direction for proving that PnP with realistic assumptions on the denoiser can give rise to convergent regularization.The gradient-step framework is, however, just one way of controlling the regularization strength of the learned denoiser.In particular, it relies on flipping the usual splitting of the variational objective and, as a result, requires repeated evaluation of the proximal operator of the data term.This may be very computationally costly if the forward operator is expensive to evaluate.As a result, it is still of great interest to study other ways of controlling the regularization strength of a realistic learned denoiser in PnP that will result in provably convergent regularization.

Fig. 1 :
Fig. 1: Sparse-view CT reconstruction with different regularizers.The noisy sinogram is generated by first computing parallel-beam projections along 64 equally-spaced angular positions of the source, with 365 lines per position, and then by adding white Gaussian noise.The regularization parameter λ is chosen to be λ = 0.1 in (b) and λ = 10 −4 in (c) and (d).This experiment demonstrates that the convergence of the regularization scheme is not particularly indicative of the quality of the reconstruction and underscores the need to learn a data-adaptive regularizer for enhancing the reconstruction quality.

Fig. 2 :
Fig.2: PnP gradient-step DRUNet denoiser as a convergent regularization method for image deblurring.The PnP scheme for reconstruction minimizes variational energy of the form f + λ gσ, where f is the fidelity and gσ is the regularizer induced by a pre-trained denoiser.The input blurry image is given by y σ 0 = Ax + w, where A is a Gaussian blur kernel and w is additive Gaussian noise with variance σ 2 0 .The images from (b) to (i) are the deblurred images x (y σ 0 ) corresponding to the noise level σ 0 (expressed as the % of maximum pixel value 255.0 in the ground truth).The regularization parameter is selected as λ = c σ 0 + ε, where the constant c = 0.04 and ε = 10 −4 .

Fig. 4 :
Fig. 4: Comparing the denoising performance of a non-linear U-net (D σ,nl ) with a linear, symmetric U-net (D σ,l ) based on the same architecture.Here x * is a ground truth image and y is the same image, corrupted by Gaussian noise.These images are generated in the same way as the training data was generated.In contrast to D σ,nl , D σ,l struggles to reconstruct sharp edges as it does not contain any non-linearity.On the other hand, both denoisers significantly improve the signal-to-noise ratio: y has a PSNR of 24.3 dB, D σ,nl (y) has a PSNR of 34.0 dB and D σ,l (y) has a PSNR of 27.8 dB.

Fig. 5 :
Fig.5: Applying the spectral filtering approach, we observe convergent regularization in practice.Here x † is the J * -minimizing solution to the noiseless least-squares problem as in Theorem 5.