Flexible GMRES for total variation regularization

  • Silvia Gazzola
  • Malena Sabaté Landman
Open Access


This paper presents a novel approach to the regularization of linear problems involving total variation (TV) penalization, with a particular emphasis on image deblurring applications. The starting point of the new strategy is an approximation of the non-differentiable TV regularization term by a sequence of quadratic terms, expressed as iteratively reweighted 2-norms of the gradient of the solution. The resulting problem is then reformulated as a Tikhonov regularization problem in standard form, and solved by an efficient Krylov subspace method. Namely, flexible GMRES is considered in order to incorporate new weights into the solution subspace as soon as a new approximate solution is computed. The new method is dubbed TV-FGMRES. Theoretical insight is given, and computational details are carefully unfolded. Numerical experiments and comparisons with other algorithms for TV image deblurring, as well as other algorithms based on Krylov subspace methods, are provided to validate TV-FGMRES.


TV regularization, Flexible GMRES, Smoothing-norm preconditioning, Image deblurring

Mathematics Subject Classification

AMS 65F08 AMS 65F10 AMS 65F22 

1 Introduction

This paper considers large-scale discrete ill-posed problems of the form
$$\begin{aligned} b = Ax + e\,, \end{aligned}$$
where the matrix \(A\in \mathbb {R}^{N\times N}\) is ill-conditioned with ill-determined rank (i.e., the singular values of A quickly decay and cluster at zero without an evident gap between two consecutive ones), and \(e\in \mathbb {R}^N\) is unknown Gaussian white noise. Systems like this typically arise when discretizing inverse problems, which are central in many applications (such as astronomical and biomedical imaging, see [11, 18] and the references therein). This paper mainly deals with signal deconvolution (deblurring) problems, where \(x\in \mathbb {R}^N\) is the unknown sharp signal we wish to recover, and b is the measured (blurred and noisy) signal. In the case of images (i.e., two-dimensional signals) we use the following convention: \(X\in \mathbb {R}^{n\times n}\) is the array storing the sharp image, while \(x\in \mathbb {R}^N\), \(N=n^2\), is the vector obtained by stacking the columns of X. The spatially invariant convolution kernel (blur) is assumed to be known. More specifically, in image deblurring, A is determined starting from a so-called point spread function (PSF, which specifies how the points in the image are distorted by the blur) and the boundary conditions (which specify the behavior of the image outside the recorded values).
Because of the ill-conditioning of A and the presence of noise e, some regularization must be applied to (1.1) in order to compute a meaningful approximation of x. To this end, one may employ the well-known Tikhonov method in general form, which computes a regularized solution
$$\begin{aligned} x_{L,{\lambda }} = \arg \min _{x\in \mathbb {R}^N} \Vert Ax-b\Vert _2^2+\lambda \Vert Lx\Vert ^2_2\,. \end{aligned}$$
Here \(L\in \mathbb {R}^{M\times N}\) is the so-called regularization matrix that enforces some smoothing in \(x_{L,{\lambda }} \) (by including the penalization \(\Vert Lx\Vert _2^2\) in the above objective function), and \(\lambda >0\) is the so-called regularization parameter that specifies the amount of smoothing (by balancing the fit-to-data term \(\Vert Ax-b\Vert _2^2\) and the regularization term \(\Vert Lx\Vert _2^2\)). The choice of L and \(\lambda \) is problem-dependent, the former being typically the identity (\(L=I\), in which case (1.2) is said to be in standard form), or a rescaled finite difference discretization of a derivative operator. When information about the regularity of x is available, one can obtain enhanced reconstructions by considering a suitable \(L\ne I\). If the GSVD of the matrix pair (A, L) (or the SVD of the matrix A when \(L=I\)) can be feasibly computed, then the vector \(x_{L,{\lambda }} \) in (1.2) can be directly expressed as a linear combination of the right (generalized) singular vectors. In particular, by using the GSVD, one can notice that additional smoothing is encoded in the basis vectors, and the components of \(x_{L,{\lambda }} \) belonging to the null space of L are unaffected by regularization (see [18, Chapter 8] for more details).
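As a brief worked remark (a standard fact, recalled here for convenience rather than taken from the derivations in this paper), problem (1.2) is a linear least-squares problem with a stacked coefficient matrix, so its minimizer satisfies the associated normal equations:
$$\begin{aligned} x_{L,{\lambda }} = \arg \min _{x\in \mathbb {R}^N} \left\| \left[ \begin{array}{c} A \\ \sqrt{\lambda }L \end{array} \right] x - \left[ \begin{array}{c} b \\ 0 \end{array} \right] \right\| _2^2 \quad \Longleftrightarrow \quad (A^TA+\lambda L^TL)\,x_{L,{\lambda }} = A^Tb\,. \end{aligned}$$
The matrix \(A^TA+\lambda L^TL\) is nonsingular, and hence \(x_{L,{\lambda }}\) is unique, whenever the null spaces of A and L intersect trivially.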

Unfortunately, when considering large-scale problems whose associated coefficient matrix A may not have an exploitable structure or may not be explicitly stored, one cannot assume the GSVD to be available. In this setting iterative regularization methods are the only option, i.e., one can either solve the Tikhonov-regularized problem (1.2) iteratively, or apply an iterative solver to the original system (1.1) and terminate the iterations early (see [4, 11, 16, 18] and the references therein).

This paper considers the last approach, sometimes referred to as “regularizing iterations”, and it focuses on the GMRES method [27, Chapter 6] and some variants thereof. GMRES does not require \(A^T\) nor matrix-vector products with \(A^T\), and therefore it appears computationally attractive when compared to other regularizing Krylov subspace methods such as LSQR [26]. Although GMRES was proven to be a regularization method in [6], it is well known that it may perform poorly in some situations, e.g., when dealing with highly non-normal linear systems [21]. It has been shown, however, that this issue can be fixed by using specific preconditioners. For instance, the so-called smoothing-norm preconditioned GMRES method derived in [19] (and here referred to as GMRES(L)) follows from transforming the general Tikhonov problem (1.2) into standard form (i.e., into an equivalent Tikhonov problem with \(L=I\)), and then applying GMRES to the transformed fit-to-data term. GMRES(L) can be regarded as a right-preconditioned GMRES method that computes an approximate regularized solution as a linear combination of vectors that incorporate the smoothing effect of the regularization matrix L in (1.2). We emphasize that, here and in the following, the term “preconditioner” is used in a somewhat unconventional way. Indeed, the preconditioners used in this paper aim at computing a good regularized solution to problem (1.1) and, from a Bayesian point of view, they may be regarded as “priorconditioners” [5].

Total variation regularization is very popular when dealing with signal deconvolution problems (see [9, Chapter 5] and the references therein), and amounts to computing
$$\begin{aligned} x_{\text {TV},{\lambda }}=\text {arg} \min _{x\in \mathbb {R}^N} \Vert Ax-b\Vert _2^2+\lambda \text {TV}(x), \end{aligned}$$
where \(\text {TV}(x)\) denotes the isotropic total variation of the unknown x, which measures the magnitude of the discrete gradient of x in the \(\ell ^1\) norm. The weighted term \(\lambda \text {TV}(x)\) in the above Tikhonov-like problem has the effect of producing piecewise-constant reconstructions, as solutions with many steep changes in the gradient are penalized or, equivalently, solutions with a sparse gradient are enforced. In particular, for images, \(\lambda \text {TV}(x)\) helps preserve edges.

The convex optimization problem (1.3) is very challenging to solve, both because of its large-scale nature, and because of the presence of the non-differentiable total variation term (so that the efficient iterative techniques used to solve problem (1.2) cannot be straightforwardly adopted in this setting). We also mention in passing that the so-called \(\text {TV}_p\) penalization term, which evaluates the magnitude of the gradient with respect to some \(\ell ^p\) “norm”, \(0<p<1\), can be considered instead of the usual \(\text {TV}=\text {TV}_1\), see [7]. \(\text {TV}_p\) is notably more effective in enforcing sparse gradients (as it better approximates the \(\ell ^0\) quasi-norm), but the resulting Tikhonov-like problem is not convex anymore (and therefore may have multiple local minima). A variety of numerical approaches for the solution of (1.3) have already been proposed: some of them are based on fixed-point iterations, smooth approximations of \(\text {TV}(x)\), fast gradient-based iterations, and Bregman-distance methods; see [3, 8, 25, 29], to cite only a few.

This paper is concerned with strategies that stem from the local approximation of (1.3) by a sequence of quadratic problems of the form (1.2), and that exploit Krylov subspace methods to compute solutions thereof. To the best of our knowledge, this idea was first proposed for total variation regularization in [30], where the authors derive the so-called iteratively reweighted norm (IRN) method consisting of the solution of a sequence of penalized weighted least-squares problems with diagonal weighting matrices incorporated into the regularization term and dependent on the previous approximate solution (so that they are updated from one least-squares problem to the next one). For large-scale unstructured problems, this method intrinsically relies on an inner-outer iteration scheme. In the following we use the acronym IRN to indicate a broad class of methods that can be recast in this framework.

Although the IRN method [30] is theoretically well-justified and experimentally effective, it has a couple of drawbacks. Firstly, conjugate gradient is repeatedly applied from scratch to the normal equations associated to each penalized least-squares problem of the form (1.2) in the sequence: this may result in an overall large number of iterations. Secondly, the regularization parameter \(\lambda \) should be chosen (and fixed) in advance. The so-called modified LSQR (MLSQR) method [1] partially remedies both these shortcomings. Although the starting point of MLSQR is still an IRN approach [30], each Tikhonov-regularized problem in the sequence of least-squares problems is transformed into standard form: in this way the matrix A is now right preconditioned and a preconditioned LSQR method can be applied. This approach typically results in a smaller number of iterations with respect to IRN [30]; moreover, different values of the regularization parameter can be easily considered. On the downside, LSQR is still applied sequentially to each IRN least-squares problem, and a new approximation subspace for the LSQR solution is computed from scratch. The so-called GKSpq method [23] leverages generalized Krylov subspaces (GKS), i.e., approximation subspaces where the updated weights and adaptive regularization parameters can be easily incorporated as soon as they become available. In other words, only one approximation subspace is generated when running the GKSpq method for the IRN least-squares problems associated to (1.3), and the approximate solutions are obtained by orthogonal projections onto GKS of increasing dimension. In this way, GKSpq avoids inner-outer iterations and is very efficient when compared to IRN and MLSQR.

All the methods surveyed so far implicitly consider the normal equations associated to least-squares approximations of problem (1.3). As already remarked, approaches based on GMRES applied directly to the fit-to-data term in (1.2) may be more beneficial in some situations, as the computational overhead of dealing with \(A^T\) can be avoided. The restarted generalized Arnoldi–Tikhonov (ReSt-GAT) method [15] is arguably the only approach that generates a GMRES-like approximation subspace for the solution of each least-squares problem associated to the IRN strategy. However, ReSt-GAT has two shortcomings: it is based on an inner-outer iteration scheme (though approximations recovered during an iteration cycle are carried over to the next one by performing convenient warm restarts) and the TV penalization does not directly affect the approximation subspace of ReSt-GAT (failing to properly enhance piecewise constant reconstructions).

The goal of this paper is to propose a novel strategy that employs GMRES for the solution of Tikhonov-regularized problems associated to the IRN approach to (1.3). In particular, a flexible instance of a GMRES(L)-like method is used to solve preconditioned versions of system (1.1), which are obtained by considering quadratic approximations to problem (1.3), performing transformations into standard form, and applying GMRES to the resulting fit-to-data term. In this way, the effect of the total variation regularization term (defined with respect to iteratively updated weights and a discrete gradient operator) is incorporated into the solution subspace, which is affected by both the null space of the regularization matrix and the adaptive weights. As the weights are updated as soon as a new approximate solution becomes available, i.e., immediately after a new GMRES iteration is computed, the flexible GMRES (FGMRES) method (see [27, Chapter 9]) is employed to handle variable preconditioning along the iterations. The resulting regularization method is dubbed Total-Variation-FGMRES (TV-FGMRES). We emphasize that the TV-FGMRES method is inherently parameter-free, as only one stopping criterion should be set to suitably terminate the iterations (while, for all the other solvers for problem (1.3) listed so far, one has to choose both the parameter \(\lambda \) and the number of iterations). Moreover, the new approach is different from the ReSt-GAT one [15] for two reasons: firstly, the standard GMRES approximation subspaces are modified and, secondly, regularizing iterations are employed rather than solving a sequence of Tikhonov problems (1.2); also, this approach is somewhat analogous to the GKSpq [23] one, but the two methods differ in the computation of the approximation subspaces (recall that the GKSpq ones involve both \(A^T\) and \(\lambda \)).

This paper is organized as follows. Section 2 covers some background material, including the definition of the weighting matrices for the approximation of the total variation regularization term in an IRN fashion, and a well-known procedure for transforming problem (1.2) into standard form. Section 3 describes the new TV-FGMRES method. Section 4 dwells on implementation details. Section 5 contains numerical experiments performed on three different image deblurring test problems. Section 6 presents some concluding remarks and possible future works.

2 IRN, weights, and standard form transformation

The main idea underlying the IRN approaches for the solution of TV-regularized problems is to approximate the minimizer of (1.3) by solving a sequence of regularized problems with rescaled penalization terms expressed as reweighted \(\ell ^2\) norms, whose weights are iteratively updated using a previous approximation of the solution, i.e.,
$$\begin{aligned} x_{\text {TV},{\lambda }}\simeq x_{L,{\lambda }}=\text {arg} \min _{x\in \mathbb {R}^N} \Vert Ax-b\Vert _2^2+\lambda \Vert Lx\Vert _2^2\,,\quad \text{ with }\quad L=WD\in \mathbb {R}^{M\times N}\,, \end{aligned}$$
where \(W=W^{(i)}=W(Dx^{(i-1)})\) denotes a diagonal weighting matrix defined with respect to an available approximation \(x^{(i-1)}\) of \(x_{\text {TV},{\lambda }}\) (so that also L depends on \(x^{(i-1)}\)), and D denotes a scaled finite difference matrix discretizing a derivative operator of the first order. More precisely, (2.1) is obtained by locally approximating the total variation functional in (1.3) by the quadratic functional
$$\begin{aligned} {\frac{1}{2}}\left( ||W({D}x^{(i-1)}) D x||_2^2 +\text {TV}(x^{(i-1)})\right) \,, \end{aligned}$$
where the constant second term is dropped and the rescaling factor is incorporated into \(\lambda \) in (2.1). The weights \(W({D}x^{(i-1)})\) are determined in such a way that (2.2) is tangent to \(\text {TV}(x)\) at \(x = x^{(i-1)}\), and it is an upper bound for \(\text {TV}(x)\) elsewhere, see [30]. The weights are derived as follows, distinguishing between:
  • The one-dimensional (1d) case. In a discrete setting with \(v\in \mathbb {R}^N\), \(\text {TV}(v) = \Vert D_\mathrm{1d}v\Vert _1\), where
    $$\begin{aligned} D_\mathrm{1d}=\left[ \begin{array}{llll} 1 &{}\quad -1 &{}\quad &{}\quad \\ &{}\quad \ddots &{}\quad \ddots &{}\quad \\ &{}\quad &{}\quad 1 &{}\quad -1 \\ \end{array} \right] \in \mathbb {R}^{(N-1)\times N}. \end{aligned}$$
    The weighting matrix
    $$\begin{aligned} W_\mathrm{1d}= W_\mathrm{1d}(D_\mathrm{1d}v) = \mathrm {diag}\left( \left| D_\mathrm{1d}v\right| ^{-1/2} \right) \in \mathbb {R}^{(N-1)\times (N-1)}\,, \end{aligned}$$
    where both modulus and exponentiation are considered component-wise, is used in practice to approximate the 1-norm. Indeed, for a given v, one can easily see that
    $$\begin{aligned} \Vert W_\mathrm{1d}D_\mathrm{1d}v\Vert _2^2=\sum _{k=1}^{N-1}\left| [D_\mathrm{1d}v]_k\right| ^{-1}[D_\mathrm{1d}v]_k^{2}=\sum _{k=1}^{N-1}\left| [D_\mathrm{1d}v]_k\right| =\Vert D_\mathrm{1d}v\Vert _1 =\text {TV}(v) \,, \end{aligned}$$
    where \([w]_k\) denotes the kth entry of a vector w.
  • The two-dimensional (2d) case. In a discrete setting,
    $$\begin{aligned} \text {TV}(v) = \left\| \left( (D^\mathrm {h}v)^2+(D^\mathrm {v}v)^2\right) ^{1/2}\right\| _1\,, \end{aligned}$$
    where, if \(v\in \mathbb {R}^N\) is obtained by stacking the columns of a 2d array \(V\in \mathbb {R}^{n\times n}\) with \(N=n^2\), the discrete first derivatives in the horizontal and vertical directions are given by
    $$\begin{aligned} D^\mathrm {h}= (D_\mathrm{1d}\otimes I) \in \mathbb {R}^{n(n-1)\times n^2}\,,\qquad D^\mathrm {v}= (I \otimes D_\mathrm{1d}) \in \mathbb {R}^{n(n-1)\times n^2}\,, \end{aligned}$$
    respectively. Here \(D_\mathrm{1d}\) is the 1d first derivative matrix (2.3) of appropriate size, and I is the identity matrix of size n, so that 2d discrete operators are defined in terms of the corresponding 1d ones (note that here both \(D_\mathrm{1d}\) and I have n columns, i.e., the size of the 2d array V). Deriving an expression for the weights in the discrete 2d setting is less straightforward. Following [30], for a given v, and letting \(\widetilde{N}=n(n-1)\), one takes
    $$\begin{aligned} D_{{\mathrm{2d}}}= & {} \left[ \begin{array}{c} D^\mathrm {h}\\ D^\mathrm {v}\end{array} \right] \in \mathbb {R}^{2\widetilde{N}\times N}\,,\nonumber \\ \widetilde{W}_{{\mathrm{2d}}}= & {} \widetilde{W}_{{\mathrm{2d}}}(D_{{\mathrm{2d}}}v) =\mathrm {diag}\left( \left( (D^\mathrm {h}v)^2+(D^\mathrm {v}v)^2\right) ^{-1/4} \right) \in \mathbb {R}^{\widetilde{N}\times \widetilde{N}}\,,\nonumber \\ W_{{\mathrm{2d}}}= & {} W_{{\mathrm{2d}}}(D_{{\mathrm{2d}}}v)=\left[ \begin{array}{cc} \widetilde{W}_{{\mathrm{2d}}}&{} 0\\ 0 &{} \widetilde{W}_{{\mathrm{2d}}}\end{array} \right] \in \mathbb {R}^{2\widetilde{N}\times 2\widetilde{N}}\,. \end{aligned}$$
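The identities behind the 1d and 2d weights can be checked numerically on small random data. The following is a minimal NumPy sketch, not the authors' implementation: the helper names `first_difference`, `tv_weights_1d`, and `tv_weights_2d` are hypothetical, dense matrices are used only for readability, and random inputs are chosen so that no gradient component is exactly zero.

```python
import numpy as np

def first_difference(n):
    """(n-1) x n matrix D_1d with rows [1, -1], as in (2.3)."""
    return np.eye(n - 1, n) - np.eye(n - 1, n, k=1)

def tv_weights_1d(v):
    """Diagonal matrix W_1d = diag(|D_1d v|^(-1/2)), as in (2.4)."""
    D = first_difference(v.size)
    return np.diag(np.abs(D @ v) ** -0.5)

def tv_weights_2d(V):
    """Return W_2d, D_2d, vec(V) and the squared gradient magnitude, as in (2.5)."""
    n = V.shape[0]
    D1, I = first_difference(n), np.eye(n)
    Dh = np.kron(D1, I)                      # horizontal derivative
    Dv = np.kron(I, D1)                      # vertical derivative
    v = V.flatten(order="F")                 # stack the columns of V
    mag2 = (Dh @ v) ** 2 + (Dv @ v) ** 2
    W_tilde = np.diag(mag2 ** -0.25)
    W = np.kron(np.eye(2), W_tilde)          # block diagonal [W~, 0; 0, W~]
    return W, np.vstack([Dh, Dv]), v, mag2

rng = np.random.default_rng(0)

# 1d case: ||W D v||_2^2 should equal ||D v||_1 = TV(v)
v = rng.standard_normal(8)
D1 = first_difference(8)
W1 = tv_weights_1d(v)
quad_1d = np.linalg.norm(W1 @ D1 @ v) ** 2
tv_1d = np.abs(D1 @ v).sum()

# 2d case: ||W_2d D_2d v||_2^2 should equal the isotropic TV(v)
W2, D2, v2, mag2 = tv_weights_2d(rng.standard_normal((4, 4)))
quad_2d = np.linalg.norm(W2 @ D2 @ v2) ** 2
tv_2d = np.sqrt(mag2).sum()
```

In both cases the weighted quadratic term evaluated at the same vector used to build the weights reproduces the total variation, which is the tangency property exploited by the IRN approach.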
The expression of the weights in (2.4) and (2.5) generalizes to the \(\text {TV}_p\) functional, \(0<p<1\), defined as
$$\begin{aligned} \text {TV}_p(v) = \Vert D_\mathrm{1d}v\Vert _p^p\quad \text{ and }\quad \text {TV}_p(v) = \left\| \left( (D^\mathrm {h}v)^2+(D^\mathrm {v}v)^2\right) ^{1/2}\right\| _p^p\,, \end{aligned}$$
in the 1d and 2d cases, respectively. Indeed, given v, it suffices to take the weights
$$\begin{aligned}&W_{p,\mathrm{1d}}(D_\mathrm{1d}v) = \mathrm {diag}\left( \left| D_\mathrm{1d}v\right| ^{(p-2)/2} \right) \quad \text{ and }\\&\widetilde{W}_{p,{{\mathrm{2d}}}}(D_{{\mathrm{2d}}}v) = \mathrm {diag}\left( \left( (D^\mathrm {h}v)^2+(D^\mathrm {v}v)^2\right) ^{(p-2)/4} \right) \,, \end{aligned}$$
for the 1d and 2d cases, respectively, and then proceed as in the \(\text {TV}\) case illustrated above.
It is important to stress that division by zero may occur when computing the weights associated to the \(\text {TV}\) and \(\text {TV}_p\) functionals (this is the case when a component of the gradient magnitude is zero, which should not be regarded as a rare occurrence as (1.3) enforces sparsity in the gradient of the solution). To avoid this, one should set some safety thresholds \(\tau _1>\tau _2>0\), define
$$\begin{aligned} f_\tau ([w]_k)={\left\{ \begin{array}{ll} |[w]_k| &{} \quad \text{ if } |[w]_k|>\tau _1\\ \tau _2 &{} \quad \text{ otherwise } \end{array}\right. }\qquad \text{for each component } [w]_k \text{ of a vector } w, \end{aligned}$$
and then consider as weights the diagonal matrices \(W_\mathrm{1d}(f_\tau (D_\mathrm{1d}v))\) and \(W_{{\mathrm{2d}}}(f_\tau (D_{{\mathrm{2d}}}v))\) for \(\text {TV}\), \(W_{p,\mathrm{1d}}(f_\tau (D_\mathrm{1d}v))\) and \(W_{p,{{\mathrm{2d}}}}(f_\tau (D_{{\mathrm{2d}}}v))\) for \(\text {TV}_p\).
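The thresholding function \(f_\tau \) is simple to express in code. The sketch below is an illustration only: the function names are hypothetical, the default thresholds are the values \(\tau _1=10^{-4}\), \(\tau _2=10^{-12}\) used later in the 1d experiment, and only the 1d \(\text {TV}\) weights are shown.

```python
import numpy as np

def f_tau(w, tau1=1e-4, tau2=1e-12):
    """Component-wise safeguard: |w_k| if |w_k| > tau1, and tau2 otherwise."""
    a = np.abs(np.asarray(w, dtype=float))
    return np.where(a > tau1, a, tau2)

def safe_tv_weights_1d(Dv, tau1=1e-4, tau2=1e-12):
    """Diagonal entries of W_1d(f_tau(D_1d v)); always finite."""
    return f_tau(Dv, tau1, tau2) ** -0.5

# A gradient with an exact zero and a tiny entry
grad = np.array([0.0, 1e-5, 0.5, -2.0])
safe = f_tau(grad)
weights = safe_tv_weights_1d(grad)
```

Entries of the gradient below \(\tau _1\) are mapped to \(\tau _2\), so the corresponding weights are large but finite, which strongly penalizes any change in the components that are currently flat.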

In the following, when the distinction between the 1d and the 2d cases can be waived, we will use the simpler notations W for the \(M\times M\) diagonal weighting matrix, and D for the \(M\times N\) first derivative matrix (with \(M=N-1\) in the 1d case, and \(M=2\widetilde{N}\) in the 2d case).

We conclude this section by recalling a well-known strategy to transform a generalized Tikhonov-regularized problem (1.2) with a given regularization matrix \(L\in \mathbb {R}^{M\times N}\) into standard form, with \(\text {rank}(L)=M<N\) and \(\mathscr {N}(A)\cap \mathscr {N}(L)=\{\mathbf {0}\}\), i.e., the null spaces of A and L trivially intersect (see [12] for more details). Under these assumptions, the solution of (1.2) can be equivalently expressed as
$$\begin{aligned} x_{L,{\lambda }} = L^{\dagger }_{A} \bar{y}_{L,{\lambda }}+x_0=\bar{x}_{L,{\lambda }}+x_0\,,\quad \text{ where }\quad \begin{array}{lcl} \bar{y}_{L,{\lambda }} &{}=&{}\arg \min _{\bar{y}\in \mathbb {R}^{M}} \Vert \bar{A}\bar{y}-\bar{b}\Vert _2^2+{\lambda }\Vert \bar{y}\Vert _2^2\,,\\ \bar{A} &{}= &{} A L^{\dagger }_{A}\,, \\ \bar{b} &{}=&{} b-A x_0\,. \\ \end{array} \end{aligned}$$
Here \(L^{\dagger }_{A}\) is the A-weighted pseudoinverse of L, defined as
$$\begin{aligned}&L^{\dagger }_{A} = [I - (A(I-L^{\dagger }L))^{\dagger }A]L^\dagger \in \mathbb {R}^{N\times M}, \end{aligned}$$
where \(L^\dagger \) denotes the Moore–Penrose pseudoinverse of L, \(\bar{x}_L\) is the component of the solution \(x_L\) in \(\mathscr {R}(L_A^\dagger )\) (i.e., the range of \(L_A^\dagger \)), and \(x_0\) is the component of the solution \(x_L\) in \(\mathscr {N}(L)\) (i.e., \(x_0=(A(I-L^\dagger L))^\dagger b\)). We remark that, when a given matrix \(L\in \mathbb {R}^{M\times N}\) has full rank but is such that \(M>N\), a reduced QR factorization of L, \(L=QR\), can be performed, and the above derivations can be applied to \(R\in \mathbb {R}^{N\times N}\) (instead of L). In the following we will use this procedure to transform (2.1), for a given weighting matrix \(W=W^{(i)}=W(Dx^{(i-1)})\), into standard form.
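The standard form transformation can be verified on a small dense example. The following NumPy sketch rests on illustrative assumptions: a random well-conditioned A, L equal to the 1d first derivative matrix (so that \(\text {rank}(L)=M=N-1\)), an arbitrary \(\lambda \), and an explicit truncation tolerance in the pseudoinverse of the rank-deficient matrix \(A(I-L^\dagger L)\). It compares the solution of (1.2) obtained from the normal equations with the one obtained through (2.7) and (2.8).

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam = 6, 0.3
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)
L = np.eye(N - 1, N) - np.eye(N - 1, N, k=1)   # full row rank, M = N - 1
M = L.shape[0]

# Reference: general-form Tikhonov solution via the normal equations
x_ref = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ b)

# A-weighted pseudoinverse L_A^+ = [I - (A(I - L^+ L))^+ A] L^+
L_pinv = np.linalg.pinv(L)
# A(I - L^+ L) has rank 1; the tolerance discards rounding-level singular values
T = np.linalg.pinv(A @ (np.eye(N) - L_pinv @ L), rcond=1e-10)
L_A_pinv = (np.eye(N) - T @ A) @ L_pinv

# Component of the solution in the null space of L
x0 = T @ b

# Standard-form problem in the transformed variable y
A_bar = A @ L_A_pinv
b_bar = b - A @ x0
y_bar = np.linalg.solve(A_bar.T @ A_bar + lam * np.eye(M), A_bar.T @ b_bar)

# Back-transformation
x_std = L_A_pinv @ y_bar + x0
```

For large-scale problems none of these matrices would be formed explicitly; the sketch is only meant to make the algebra in (2.7) and (2.8) concrete.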

3 TV-preconditioned flexible GMRES

We begin this section by defining a TV-preconditioned version of the GMRES(L) method [19], motivated by the generalized Tikhonov problem (2.1) appearing within the IRN framework and assuming that the matrix \(W=W^{(i)}=W(Dx^{(i-1)})\) is fixed. To achieve this, we first transform problem (2.1) into standard form, following the procedure outlined in (2.7). We then describe how to apply GMRES to a system linked to the fit-to-data term in (2.7) or, in other words, we consider \(\lambda =0\) in (2.7). Indeed, we would like to apply GMRES (as a regularizing iterative method) to the system
$$\begin{aligned} A(L_{A}^{\dagger }\bar{y}_L+x_0) = b\,, \end{aligned}$$
where \(L_A^\dagger \) can be regarded as a preconditioner accounting for a total variation regularization term, \(x_0\) is defined as in (2.7), and \(\bar{y}_L\)=\(\bar{y}_{L,0}\) (note that, to keep the notations simple, in the following we will use \(\bar{y}_L\)=\(\bar{y}_{L,0}\), \(\bar{x}_L\)=\(\bar{x}_{L,0}\), and \(x_L\)=\(x_{L,0}\) ).
Applying GMRES to (3.1) is not straightforward, because the coefficient matrix \(AL_A^\dagger \in \mathbb {R}^{N\times M}\) is rectangular. To overcome this obstacle we closely follow the approach proposed in [19]. Let K be a matrix with full column-rank whose columns span \(\mathscr {N}(L)\): when L is as in (2.1), since W is nonsingular, this means that
$$\begin{aligned} \mathscr {R}(K) =\mathscr {N}(L)=\mathscr {N}(D)=\text {span}\{\mathbf {1}\} \end{aligned}$$
for both the 1d and the 2d cases, where \(\mathbf {1}=[1,\ldots ,1]^T\in \mathbb {R}^N\). Thanks to this remark, the expression for \(x_L\) in (2.7) (with \(\lambda =0\)) can be further detailed as
$$\begin{aligned} x_L=\bar{x}_L+x_0=L_{A}^{\dagger }\bar{y}_L+x_0=L_{A}^{\dagger }\bar{y}_L+K t_0\,, \end{aligned}$$
where the scalar \(t_0\in {\mathbb {R}}\) is uniquely determined by computing
$$\begin{aligned} t_0= (AK)^\dagger b=\arg \min _{t\in {\mathbb {R}}}\Vert (AK)t-b\Vert _2\,. \end{aligned}$$
We note that decomposition (3.2) is uniquely determined if both L and K have full rank. By plugging expression (3.2) into (3.1), we get
$$\begin{aligned} A\left[ L_{A}^{\dagger },\;K\right] \left[ \begin{array}{cccc} \bar{y}_L \\ t_0 \\ \end{array} \right] = b\,, \end{aligned}$$
which, once premultiplied by \([D^{\dagger },\; K]^{T}\), gives the \(2\times 2\) block system
$$\begin{aligned} \left[ \begin{array}{ll} (D^{\dagger })^TA L_{A}^{\dagger } &{}\quad (D^{\dagger })^TAK \\ K^T A L_{A}^{\dagger } &{}\quad K^TAK \end{array} \right] \left[ \begin{array}{llll} \bar{y}_L \\ t_0 \\ \end{array} \right] = \left[ \begin{array}{llll} (D^{\dagger })^T b \\ K^{T} b \\ \end{array} \right] . \end{aligned}$$
We can easily eliminate \(t_0\) from this system by inverting the \(1\times 1\) (2, 2) block, so obtaining the Schur complement system,
$$\begin{aligned} (D^{\dagger })^T P A L_{A}^{\dagger }\bar{y}_L = (D^{\dagger })^T P b\,, \end{aligned}$$
or, using once more the relations in (3.2),
$$\begin{aligned} (D^{\dagger })^T P A \bar{x}_L = (D^{\dagger })^T P b\,, \end{aligned}$$
where
$$\begin{aligned} P = I - A K(K^{T}A K)^{-1}K^{T}\in \mathbb {R}^{N\times N} \end{aligned}$$
is the oblique projector onto the orthogonal complement of \(\mathscr {R}(K)\) along \(\mathscr {R}(AK)\). The coefficient matrix in system (3.5) has size \(M\times M\), while the one in system (3.6) has size \(M\times N\). We emphasize that this Schur complement system is different from the one derived in [19], where both sides of (3.4) are premultiplied by the matrix \([L_A^\dagger ,\; K]^{T}\). In our setting, since the matrix L contains weights W that should be suitably updated (as explained in the remaining part of this section), we conveniently discard the contribution of W in the left preconditioner in (3.5). In the following, to keep the formulas light, we often use the notations
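The defining properties of the projector P can be checked numerically. The sketch below assumes a random matrix A for which \(K^TAK\ne 0\) and takes \(K=\mathbf {1}\), as in the setting above; it is an illustration of (3.7), not part of the method itself.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
A = rng.standard_normal((N, N))
K = np.ones((N, 1))                 # basis of N(L) = span{1}
AK = A @ K

# P = I - A K (K^T A K)^{-1} K^T, with a 1 x 1 block to invert
P = np.eye(N) - AK @ np.linalg.solve(K.T @ AK, K.T)

idempotent = np.allclose(P @ P, P)          # P is a projector
kills_AK = np.allclose(P @ AK, 0.0)         # null space contains R(AK)
range_orth_K = np.allclose(K.T @ P, 0.0)    # range is orthogonal to R(K)
```

The property \(PAK=0\) is what removes the unknown \(t_0\) when passing from the \(2\times 2\) block system to the Schur complement system.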
$$\begin{aligned} \widehat{A}=(D^{\dagger })^T P A\in \mathbb {R}^{M\times N}\,,\quad \widehat{b} = (D^{\dagger })^T P b\in \mathbb {R}^M\,, \end{aligned}$$
so that systems (3.5) and (3.6) can be even more compactly written as \(\widehat{A}L_A^\dagger \bar{y}_L = \widehat{b}\) and \(\widehat{A}\bar{x}_L = \widehat{b}\), respectively. The mth iteration of the GMRES method applied to compute \(x_L\) as in (3.2) produces an approximation \(x_{L,m}\) thereof, such that
$$\begin{aligned} x_{L,m}\in L_A^\dagger \mathscr {K}_m((D^\dagger )^TPAL_A^\dagger , (D^\dagger )^TPb)+x_0 =\mathscr {K}_m(L_A^\dagger (D^\dagger )^TPA,L_A^\dagger (D^\dagger )^TPb)+x_0\,. \end{aligned}$$
We stress again that here \(x_0\) is the component of \(x_L\) in \(\mathscr {N}(L)\), and the weighting matrix W (implicit in L) is fixed. Also, referring to the original notations in (2.7), \(x_{L,m}=x_{L,0,m}\) (i.e., \(\lambda =0\)).
Dropping the assumption that the weights are fixed, according to the IRN framework we should keep updating L in (2.1) with new approximations of \(x_L\). If this happens as soon as a new approximation \(x_{L,m-1}\) is computed by the GMRES method applied to (3.5), i.e., if \(W=W(Dx_{L,m-1})\), then we should incorporate an iteration-dependent A-weighted pseudo-inverse (denoted by \((L^{(m)})_A^\dagger \)) within the GMRES scheme to solve (3.5), and adopt a flexible version of the Arnoldi algorithm [27, Chapter 9] to handle variable preconditioning: this leads to the TV-FGMRES method. The mth iteration of the flexible Arnoldi algorithm updates a decomposition of the form
$$\begin{aligned} \widehat{A}Z_m = V_{m+1}H_m\,,\quad \text{ where }\quad Z_m\in \mathbb {R}^{N\times m},\, V_{m+1}\in \mathbb {R}^{M\times (m+1)},\, H_m\in \mathbb {R}^{(m+1)\times m}. \end{aligned}$$
More specifically, in the above decomposition: \(H_m\) is an upper Hessenberg matrix; \(V_{m+1}\) has orthonormal columns \(v_i\), \(i=1,\ldots ,m+1\), with \(v_1 = \widehat{b}/\Vert \widehat{b}\Vert _2\); \(Z_m\) has columns \(z_i=(L^{(i)})_A^\dagger v_i\), \(i=1,\ldots ,m\). Since the columns of \(Z_m\) already include the contribution of the variable preconditioners \((L^{(i)})_A^\dagger \), for \(i=1,\ldots ,m\), they form a basis for the vector \(\bar{x}_L\) in (3.2). Therefore, at the mth step of TV-FGMRES, \(\bar{x}_L\) is approximated by the following vector
$$\begin{aligned} \bar{x}_{L,m}=Z_ms_m\,,\quad \text{ where }\quad s_m=\arg \min _{s\in \mathbb {R}^m}\Vert H_ms - \Vert \widehat{b}\Vert _2e_1 \Vert _2\,, \end{aligned}$$
and \(e_1\in \mathbb {R}^{m+1}\) is the first canonical basis vector of \(\mathbb {R}^{m+1}\). Due to decomposition (3.9) and the properties of the matrices appearing therein,
$$\begin{aligned} \min _{\bar{x}_L\in \mathscr {R}(Z_m)}\Vert \widehat{A}\bar{x}_L - \widehat{b}\Vert _2= & {} \min _{s\in \mathbb {R}^m}\Vert \widehat{A}Z_ms - \widehat{b}\Vert _2= \min _{s\in \mathbb {R}^m}\Vert V_{m+1}(H_ms - \Vert \widehat{b}\Vert _2e_1)\Vert _2\\= & {} \min _{s\in \mathbb {R}^m}\Vert H_ms - \Vert \widehat{b}\Vert _2e_1\Vert _2\,, \end{aligned}$$
i.e., the approximate solution \(\bar{x}_{L,m}\) obtained at the mth iteration of TV-FGMRES minimizes the residual norm of (3.6) over all the vectors in
$$\begin{aligned} {\mathscr {R}(Z_m)=\text {span}\{(L^{(1)})_A^\dagger v_1,(L^{(2)})_A^\dagger v_2,(L^{(3)})_A^\dagger v_3,\ldots \}}\,. \end{aligned}$$
The approximation subspace \(\mathscr {R}(Z_m)\) can be regarded as a preconditioned Krylov subspace, where the preconditioner is implicitly defined by successive applications of the matrices \((L^{(i)})_A^\dagger \) to the linearly independent vectors \(v_i\); see [24].
The main steps of the TV-FGMRES method are summarized in Algorithm 1 where, notation-wise, \([B]_{j,i}\) denotes the (j, i)th entry of a matrix B.
If \((L^{(i)})_A^\dagger = L_A^\dagger \) is fixed, then
$$\begin{aligned} \mathscr {R}(Z_m)= & {} \mathscr {K}_m(L_A^\dagger (D^\dagger )^TPA,L_A^\dagger (D^\dagger )^TPb)\\= & {} L_A^\dagger \mathscr {K}_m((D^\dagger )^TPAL_A^\dagger , (D^\dagger )^TPb)=L_A^\dagger \mathscr {R}(V_m), \end{aligned}$$
where \(V_m\) is the orthonormal basis generated by the right-preconditioned Arnoldi algorithm, which can be expressed by a decomposition formally similar to (3.9), but where \(Z_m=L_A^\dagger V_m\). Recalling previous naming conventions, \(\mathscr {R}(Z_m)=L_A^\dagger \mathscr {R}(V_m)\) is the approximation subspace associated to the GMRES(WD) method.
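The flexible Arnoldi decomposition (3.9) and the projected least-squares problem (3.10) can be sketched generically as follows. This is a textbook-style flexible GMRES loop with modified Gram-Schmidt and no breakdown handling, not a reproduction of Algorithm 1: the function name and the callback `apply_prec(v, i)` are hypothetical, and in TV-FGMRES the callback would apply \((L^{(i)})_A^\dagger \) built from the current weights, whereas here it applies arbitrary iteration-dependent random matrices.

```python
import numpy as np

def flexible_arnoldi_gmres(A_hat, b_hat, apply_prec, m):
    """Run m flexible GMRES steps on A_hat x = b_hat, with A_hat of size M x N.

    apply_prec(v, i) maps v in R^M to z_i in R^N (variable preconditioner).
    Returns the approximation Z_m s_m together with Z_m, V_{m+1}, H_m.
    """
    M_rows, N_cols = A_hat.shape
    V = np.zeros((M_rows, m + 1))
    Z = np.zeros((N_cols, m))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b_hat)
    V[:, 0] = b_hat / beta
    for i in range(m):
        Z[:, i] = apply_prec(V[:, i], i)      # z_i = (preconditioner i) v_i
        w = A_hat @ Z[:, i]
        for j in range(i + 1):                # modified Gram-Schmidt
            H[j, i] = V[:, j] @ w
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)
        V[:, i + 1] = w / H[i + 1, i]         # assumes no breakdown
    rhs = np.zeros(m + 1)
    rhs[0] = beta                             # ||b_hat||_2 e_1
    s = np.linalg.lstsq(H, rhs, rcond=None)[0]
    return Z @ s, Z, V, H

# Small illustration with an arbitrary iteration-dependent preconditioner
rng = np.random.default_rng(3)
M_dim, N_dim, m = 5, 6, 3
A_hat = rng.standard_normal((M_dim, N_dim))
b_hat = rng.standard_normal(M_dim)
B = rng.standard_normal((N_dim, M_dim))
C = rng.standard_normal((N_dim, M_dim))
x_m, Z, V, H = flexible_arnoldi_gmres(
    A_hat, b_hat, lambda v, i: (B + 0.1 * i * C) @ v, m
)
```

Storing \(Z_m\) alongside \(V_{m+1}\) is the price of flexibility: since the preconditioner changes at each step, the solution can no longer be recovered from \(V_m\) through a single preconditioner application.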
We conclude this section by proposing a simple numerical experiment to illustrate the benefit of considering flexibility, i.e., TV-FGMRES, over non-flexible versions of GMRES. We consider a signal deblurring test problem, where a piecewise constant sharp signal of length \(N=256\) is convolved with a Gaussian blurring kernel, and Gaussian white noise of level \(10^{-2}\) is added, so as to obtain a corrupted version of it (as reported in Fig. 1a). We consider the following iterative approaches to recover the exact signal: GMRES, GMRES(\(D_\mathrm{1d}\)), and TV-FGMRES (with thresholds \(\tau _1=10^{-4}, \tau _2=10^{-12}\)); for the sake of comparison, we also include GMRES(\(W^{\text {ex}}D_\mathrm{1d}\)), where \(W^{\text {ex}}=W(Dx^{\text {ex}})\) are the (optimal) weights computed with respect to the exact solution \(x^{\text {ex}}\) of the noise-free problem (i.e., (1.1) with \(e=0\)). In Fig. 1b the best reconstructions obtained by GMRES, GMRES(\(D_\mathrm{1d}\)), and TV-FGMRES are reported: it is evident that the latter is the most capable method in reproducing the piecewise constant features of the original signal, as the GMRES and the GMRES(\(D_\mathrm{1d}\)) reconstructions are affected by many spurious oscillations. Indeed, the combination of flexibility and appropriate adaptive weightings makes it possible to generate basis vectors for TV-FGMRES that are the closest to the optimal ones (see Fig. 1e, f). The basis vectors for GMRES immediately exhibit an extremely oscillatory behavior (see Fig. 1c), while the basis vectors for GMRES(\(D_\mathrm{1d}\)) are greatly smoothed, but fail to reproduce the jumps characterizing the exact signal (see Fig. 1d). We can experimentally conclude that one of the reasons underlying the success of TV-FGMRES is the iteration-dependent preconditioning that enforces piecewise-constant features of the solution into the TV-FGMRES basis vectors.
Fig. 1

Signal deblurring problem. a 1d piecewise constant exact signal \(x^{\text {ex}}\) and its corrupted version. b Best reconstructions obtained by the GMRES, GMRES(\(D_\mathrm{1d}\)), and TV-FGMRES methods. c Basis vectors \(v_1,\,v_2,\,v_3\) for GMRES. d Basis vectors \(z_1,\,z_2,\,z_6,\,z_8,\,z_{20}\) for GMRES(\(D_\mathrm{1d}\)). e Basis vectors \(z_1,\,z_2,\,z_6,\,z_8,\,z_{20}\) for TV-FGMRES. f Basis vectors \(z_1,\,z_2,\,z_6,\,z_8,\,z_{20}\) for GMRES(\(W^{\text {ex}}D_\mathrm{1d}\))

4 Implementation strategies

To devise efficient implementations of the TV-FGMRES method applied to the system (3.5) or (3.6), a number of properties of the involved matrices should be taken into account. In this section we will often use MATLAB-like notations: for instance, we will use a dot to denote a component-wise operation, a colon to access elements in a range of rows or columns of an array, and \(\text {diag}(\cdot )\) to denote a vector of diagonal entries. We will extensively invoke and generalize some of the propositions derived in [19]. We start by proving the following result for system (3.5) (analogous to Theorem 5.1 in [19]).

Theorem 4.1

If \(\mathscr {R}(L^T)\) and \(\mathscr {R}(AK)\) are complementary subspaces, the Schur complement system (3.5) is equivalent to
$$\begin{aligned} (D^{\dagger })^{T} PA L^{\dagger } y = (D^{\dagger })^{T} P b\,, \end{aligned}$$
where P is given by (3.7).


Proof

We start by noting that \(L_A^\dagger = EL^\dagger \), where
$$\begin{aligned} E = I-(A(I-L^\dagger L))^\dagger A = I-(AKK^\dagger )^\dagger A = I-K(AK)^\dagger A\,. \end{aligned}$$
Using this expression for \(L_A^\dagger \), we obtain
$$\begin{aligned} (D^\dagger )^TPAEL^\dagger= & {} (D^\dagger )^TPA(I-K(AK)^\dagger A)L^\dagger \\= & {} (D^\dagger )^TPAL^\dagger - (D^\dagger )^TPAK(AK)^\dagger AL^\dagger \,, \end{aligned}$$
where the second term in the above sum is
$$\begin{aligned} (D^\dagger )^TPAK(AK)^\dagger AL^\dagger= & {} (D^\dagger )^T(I-AK(K^TAK)^{-1}K^T)AK(AK)^\dagger AL^\dagger \\= & {} (D^\dagger )^TAK(AK)^\dagger AL^\dagger \\&-\, (D^\dagger )^TAK(K^TAK)^{-1}(K^TAK)(AK)^\dagger AL^\dagger \!\!=0. \end{aligned}$$
Therefore, (3.5) reduces to (4.1). \(\square \)
Note that, although this theorem is stated for system (3.5), the same relations can be exploited when solving system (3.6), since the matrix-vector products \(((D^\dagger )^TPA)(L_A^\dagger v)=((D^\dagger )^TPA)(L^\dagger v)\) should be computed (see also lines 1, 2 of Algorithm 1). The above theorem is important from a computational point of view since, by avoiding multiplication by E, an additional matrix-vector product with A can be avoided at each iteration of TV-FGMRES. However, in a flexible framework, we may still need the solution \(\bar{x}_{L,i}\) (which is implicitly expressed through a matrix-vector product with \(L_A^\dagger \)) to update the weights \(W^{(i+1)}\) at each iteration (see line 5 of Algorithm 1). This is notably not the case for TV-FGMRES, as the weights are expressed with respect to the gradient of \(\bar{x}_{L,i}\), and
$$\begin{aligned} D\bar{x}_{L,i}=DL_A^\dagger \bar{y}_{L,i}=DL^\dagger \bar{y}_{L,i}-DK((AK)^\dagger A\bar{y}_{L,i})=DL^\dagger \bar{y}_{L,i}\,, \end{aligned}$$
where we have used the fact that \(\mathscr {R}(K)=\mathscr {N}(L)=\mathscr {N}(D)\). Note also that, to keep the notations light, we have compactly denoted by \(L_A^\dagger \bar{y}_{L,i}\) a vector belonging to the space (3.11). Finally, although the matrix P in (3.7) is defined in terms of A, matrix-vector products with A can be smartly avoided when computing matrix-vector products with P, by observing that
$$\begin{aligned} Pv= & {} v - AK(K^TAK)^{-1}K^Tv=v-Q_0R_0(K^TQ_0R_0)^{-1}K^Tv\\= & {} v-Q_0(K^TQ_0)^{-1}K^Tv\,, \end{aligned}$$
where \(AK=Q_0R_0\) is the reduced QR factorization of \(AK\in \mathbb {R}^N\), i.e., \(Q_0\in \mathbb {R}^N\) is the normalization of AK. Matrix-vector products with P have therefore an O(N) cost.
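The O(N) product with P can be illustrated by the following minimal sketch. It is not taken from the paper's implementation: the small matrix `A`, the choice of `k` as the vector of ones, and the helper names `dot` and `apply_P` are assumptions made for illustration, and the sketch presumes that \(AK\ne 0\) and \(K^TQ_0\ne 0\). The matrix A is used only once, to form \(Q_0\); afterwards each product with P needs two inner products and one vector update.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def apply_P(v, q0, k):
    """Return P v = v - q0 (k^T q0)^{-1} k^T v using O(N) operations."""
    coeff = dot(k, v) / dot(k, q0)
    return [vi - coeff * qi for vi, qi in zip(v, q0)]

# Hypothetical toy data: a small matrix A and K spanning the constant vectors.
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
k = [1.0, 1.0, 1.0, 1.0]

# One product with A, performed once before the iterations start.
ak = [dot(row, k) for row in A]
nrm = math.sqrt(dot(ak, ak))          # plays the role of R_0 (a scalar here)
q0 = [a / nrm for a in ak]            # reduced QR factor of the single column AK

v = [1.0, -2.0, 0.5, 3.0]
pv = apply_P(v, q0, k)                # no product with A is needed here
```

In this toy setting one can check that \(K^TPv=0\), that \(P(AK)=0\), and that applying P twice changes nothing, as expected from an oblique projector.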
As a consequence, under the assumptions of Theorem 4.1, the computational cost per iteration of TV-FGMRES is dominated by one matrix-vector product with A (similarly to GMRES applied to (1.1)), plus one matrix-vector product with \((D^\dagger )^T\) and \(L^{\dagger }\). Efficient approaches to compute the latter will be explored in the next subsections. We conclude by remarking that, when considering image deblurring problems with spatially invariant blurs and periodic boundary conditions [20], the normalization condition \(A\mathbf {1}=\mathbf {1}\) is satisfied by the blurring matrix A (this is basically a conservation of light condition for the blurred image). In this setting, \(\mathscr {R}(L^T)\) and \(\mathscr {R}(AK)\) are indeed complementary subspaces, since
$$\begin{aligned} \mathscr {R}(L^T)+\mathscr {R}(AK)= & {} \mathscr {R}(L^T)+A\mathscr {R}(K)=\mathscr {R}(L^T)\\&+\, A\mathscr {N}(L)=\mathscr {R}(L^T)+\mathscr {N}(L)=\mathbb {R}^N, \end{aligned}$$
and \(\mathscr {R}(L^T)\cap \mathscr {N}(L)=\{\mathbf {0}\}\) thanks to the fundamental theorem of linear algebra.
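The normalization condition can be checked on a small example. The sketch below is a 1d analogue only (the text concerns 2d images), with a hypothetical binomial kernel and a helper `periodic_blur` introduced here for illustration: it applies a blur with periodic boundary conditions without forming A, and the kernel is rescaled so that its entries sum to one.

```python
def periodic_blur(x, psf):
    """Apply a 1d blur with periodic boundary conditions (circular convolution).

    psf is a centered kernel of odd length; the blurring matrix is never formed.
    """
    n, r = len(x), len(psf) // 2
    return [sum(psf[r + s] * x[(i - s) % n] for s in range(-r, r + 1))
            for i in range(n)]

raw = [1.0, 4.0, 6.0, 4.0, 1.0]              # hypothetical kernel values
psf = [p / sum(raw) for p in raw]            # rescaled so the entries sum to one

ones = [1.0] * 8
blurred_ones = periodic_blur(ones, psf)      # this is A 1

ramp = [float(i) for i in range(8)]
blurred_ramp = periodic_blur(ramp, psf)      # total intensity is preserved
```

With a kernel that sums to one, the constant vector is left unchanged, so \(A\mathscr {N}(L)=\mathscr {N}(L)\) when the null space of L consists of the constant vectors.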

4.1 Computations of matrix-vector products with D and \((D^\dagger )^{T}\)

These are required when computing the weights (defined in (2.4), (2.5)), and when computing matrix-vector products with \(\widehat{A}\) (defined in (3.8)). We focus on the 2d case, as special strategies should be used to handle large-scale quantities. Concerning the first task, given a vector \(v \in \mathbb {R}^{N}\), we can exploit the special structure of \(D_{{\mathrm{2d}}}\) and the Kronecker product properties, so that
$$\begin{aligned} D_{{\mathrm{2d}}}v= & {} \left[ \begin{array}{l} D^\mathrm {h}\\ D^\mathrm {v}\end{array} \right] v = \left[ \begin{array}{l} (D_\mathrm{1d}\otimes I) v\\ (I \otimes D_\mathrm{1d}) v \end{array} \right] \!\! =\!\! \left[ \begin{array}{l} V D_\mathrm{1d}^{T}\\ D_\mathrm{1d}V \end{array} \right] \nonumber \\= & {} \left[ \begin{array}{lll} V(:,1:n-1)\!\! &{}\quad - &{}\quad \!\! V(:,2:n)\\ V(1:n-1,:)\!\! &{}\quad - &{}\quad \!\! V(2:n,:) \end{array} \right] , \end{aligned}$$
where \(v{\in \mathbb {R}^N}\) is obtained by stacking the columns of \(V\in \mathbb {R}^{n\times n}\), \(N=n^2\), and where column and row differences of V are computed. Matrix-vector products with \(D_{{\mathrm{2d}}}\) have an \(O(n(n-1))=O(N)\) cost.
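The equivalence between the Kronecker form and the difference form of (4.3) can be verified on a small array. The following sketch assumes that \(D_\mathrm{1d}\) is the \((n-1)\times n\) matrix with rows \([1,\,-1]\) and no scaling, which is the convention implied by the differences in (4.3); all helper names are introduced here for illustration.

```python
def d1d(n):
    """(n-1) x n first-difference matrix with +1 on the diagonal, -1 next to it."""
    return [[1.0 if j == i else -1.0 if j == i + 1 else 0.0
             for j in range(n)] for i in range(n - 1)]

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron(A, B):
    """Dense Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def vec(V):
    """Stack the columns of V into a single vector."""
    return [V[i][j] for j in range(len(V[0])) for i in range(len(V))]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def apply_d2d(V):
    """Matrix-free D_2d v: differences of adjacent columns, then of adjacent rows."""
    n = len(V)
    horiz = [[V[i][j] - V[i][j + 1] for j in range(n - 1)] for i in range(n)]
    vert = [[V[i][j] - V[i + 1][j] for j in range(n)] for i in range(n - 1)]
    return vec(horiz) + vec(vert)

n = 4
V = [[float((3 * i + 5 * j * j) % 7) for j in range(n)] for i in range(n)]
D = d1d(n)
v = vec(V)
explicit = matvec(kron(D, eye(n)), v) + matvec(kron(eye(n), D), v)
fast = apply_d2d(V)
```

The dense Kronecker products are formed here only to check the result; the matrix-free routine touches each pixel a fixed number of times, which is the source of the O(N) cost.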
Concerning the computation of matrix-vector products with \((D^\dagger )^T\), let us first assume that the singular value decomposition of \(D_\mathrm{1d}=U_{\mathrm{1d}} \varSigma _{\mathrm{1d}} V^{T}_{\mathrm{1d}} \) can be computed. Then, by using some Kronecker product properties, we can decompose \(D_{{\mathrm{2d}}}\) in (2.5) as follows
$$\begin{aligned} D_{{\mathrm{2d}}}= & {} \left[ \begin{array}{l} D_\mathrm{1d}\otimes I \\ I \otimes D_\mathrm{1d}\end{array} \right] = \left[ \begin{array}{l} U_{\mathrm{1d}} \varSigma _{\mathrm{1d}} V^{T}_{\mathrm{1d}} \otimes V_{\mathrm{1d}} V^{T}_{\mathrm{1d}} \\ V_{\mathrm{1d}} V^{T}_{\mathrm{1d}} \otimes U_{\mathrm{1d}} \varSigma _{\mathrm{1d}} V^{T}_{\mathrm{1d}} \end{array} \right] \nonumber \\= & {} \left[ \begin{array}{ll} U_{\mathrm{1d}} \otimes V_{\mathrm{1d}} &{}\quad 0 \\ 0 &{}\quad V_{\mathrm{1d}} \otimes U_{\mathrm{1d}} \end{array} \right] \widetilde{\varSigma } \left[ \begin{array}{l} V_{\mathrm{1d}} \otimes V_{\mathrm{1d}} \end{array} \right] ^{T}\,,\quad \text{ where }\quad \widetilde{\varSigma }=\left[ \begin{array}{l} \varSigma _{\mathrm{1d}} \otimes I\\ I \otimes \varSigma _{\mathrm{1d}} \end{array} \right] \,.\nonumber \\ \end{aligned}$$
The matrix \(\widetilde{\varSigma }\) has a very sparse structure, where the only nonzero entries are
$$\begin{aligned}{}[\widetilde{\varSigma }]_{j,j}\; \text{ if } j \le n(n-1),\quad \text{ and }\quad [\widetilde{\varSigma }]_{n(n-1)+j-\left\lfloor { \frac{j}{n}} \right\rfloor ,j}\;\text{ if } j < n^2 \text{ and } j \ne 0\!\!\!\mod n\,. \end{aligned}$$
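The index formula above can be cross-checked against the Kronecker definition of \(\widetilde{\varSigma }\). In the sketch below, which is only an illustration, the \(n-1\) singular values of \(D_\mathrm{1d}\) are replaced by ones (only the positions of the nonzero entries matter, and these singular values are nonzero because \(D_\mathrm{1d}\) has linearly independent rows); the function names are hypothetical.

```python
def kron(A, B):
    """Dense Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def nonzeros(M, row_offset=0):
    """Set of 1-based (row, column) positions of the nonzero entries of M."""
    return {(i + 1 + row_offset, j + 1)
            for i, row in enumerate(M) for j, val in enumerate(row) if val != 0}

def pattern_from_kron(n):
    """Nonzero pattern of [Sigma (x) I; I (x) Sigma], with Sigma of size (n-1) x n."""
    sigma = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n - 1)]
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    top = nonzeros(kron(sigma, eye))
    bottom = nonzeros(kron(eye, sigma), row_offset=n * (n - 1))
    return top | bottom

def pattern_from_formula(n):
    """Positions listed in the text, with 1-based indices."""
    m = n * (n - 1)
    first = {(j, j) for j in range(1, m + 1)}
    second = {(m + j - j // n, j) for j in range(1, n * n) if j % n != 0}
    return first | second

patterns_agree = all(pattern_from_kron(n) == pattern_from_formula(n)
                     for n in (2, 3, 5))
```

Each column of \(\widetilde{\varSigma }\) therefore holds at most two nonzero entries, which is what makes an elimination by Givens rotations inexpensive.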
A more convenient decomposition can be devised by applying a set of Givens rotations to the matrix \(\widetilde{\varSigma }\), so that the QR decomposition
$$\begin{aligned} \widetilde{Q}\widetilde{D}= \widetilde{\varSigma }\end{aligned}$$
is implicitly obtained, where \(\widetilde{Q}\in \mathbb {R}^{2\widetilde{N}\times 2\widetilde{N}}\) is an orthogonal matrix and \(\widetilde{D}\in \mathbb {R}^{2\widetilde{N}\times N}\) is a nonnegative diagonal matrix of rank \(N-1\). By plugging (4.5) into (4.4), and by using standard properties of the pseudoinverse, we get
$$\begin{aligned} (D ^{\dagger }_{{{\mathrm{2d}}}})^T= \left[ \begin{array}{ll} U_{\mathrm{1d}} \otimes V_{\mathrm{1d}} &{}\quad 0 \\ 0 &{}\quad V_{\mathrm{1d}} \otimes U_{\mathrm{1d}} \end{array} \right] \widetilde{Q}(\widetilde{D}^{\dagger })^T \left[ \begin{array}{l} V_{\mathrm{1d}} \otimes V_{\mathrm{1d}} \end{array} \right] ^T. \end{aligned}$$
Computing matrix-vector products with \((D ^{\dagger }_{{{\mathrm{2d}}}})^T\) has an overall \(O(N^{3/2})\) cost, provided that vectors of length N are conveniently reshaped as \(n\times n\) 2D arrays (\(N=n^2\)) to exploit the Kronecker product properties.

4.2 Computation of matrix-vector products with \(L^\dagger \) and \(L_A^\dagger \)

Since the A-weighted pseudoinverse can be expressed as \(L_A^\dagger = EL^\dagger \), see (4.2), and since \(K\in \mathbb {R}^N\) for TV-FGMRES, the computational burden of matrix-vector products with \(L^{\dagger }_{A}\) mainly lies in the computation of matrix-vector products with A and \(L^\dagger \). Concerning the latter, we remark that in the 1d case one simply has \((W_\mathrm{1d}D_{\mathrm{1d}})^\dagger =D_{\mathrm{1d}}^\dagger W_\mathrm{1d}^{-1}\) directly from the definition of the Moore–Penrose pseudoinverse of matrices with linearly independent rows. Unfortunately, in the 2d case, the matrix
$$\begin{aligned} \widetilde{L}^{\dagger }=D^\dagger _{{{\mathrm{2d}}}} W_{{\mathrm{2d}}}^{-1} \end{aligned}$$
does not fulfill the definition of the Moore–Penrose pseudoinverse [17, Sect. 5.5.4], as \((L\widetilde{L}^{\dagger })^T\ne {L}\widetilde{L}^{\dagger }\) (i.e., the third condition is violated). Therefore, there is no trivial way of deriving a computationally feasible direct expression for \(L^\dagger =(W_{{\mathrm{2d}}}D_{{\mathrm{2d}}})^\dagger \).
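As a worked check of this statement (a short verification added for the reader, writing \(W=W_{{\mathrm{2d}}}\), \(D=D_{{\mathrm{2d}}}\), \(L=WD\), and assuming that W is an invertible diagonal matrix), the four Penrose conditions for \(\widetilde{L}^{\dagger }=D^\dagger W^{-1}\) read
$$\begin{aligned} L\widetilde{L}^{\dagger }L&= WDD^\dagger W^{-1}WD = WDD^\dagger D = L\,,\\ \widetilde{L}^{\dagger }L\widetilde{L}^{\dagger }&= D^\dagger W^{-1}WDD^\dagger W^{-1} = D^\dagger DD^\dagger W^{-1} = \widetilde{L}^{\dagger }\,,\\ L\widetilde{L}^{\dagger }&= W(DD^\dagger )W^{-1}\,,\\ \widetilde{L}^{\dagger }L&= D^\dagger W^{-1}WD = D^\dagger D = (D^\dagger D)^T\,. \end{aligned}$$
Hence the first, second, and fourth conditions hold, while \(L\widetilde{L}^{\dagger }\) is symmetric only if \(W^2\) commutes with the orthogonal projector \(DD^\dagger \), which is not true in general. In the 1d case \(D_\mathrm{1d}\) has linearly independent rows, so that \(D_\mathrm{1d}D_\mathrm{1d}^\dagger =I\) and the third condition is trivially satisfied.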
A simple remedy is to nevertheless regard \(\widetilde{L}^{\dagger }\) in (4.7) as an approximation of \(L^\dagger \). Indeed, while \(L^\dagger \) is characterized by
$$\begin{aligned} L^\dagger y= & {} \arg \min _{x\in \mathbb {R}^{N}}\left\| (W_{{\mathrm{2d}}}D_{{\mathrm{2d}}})x-y\right\| _2 =\arg \min _{x\in \mathbb {R}^{N}}\left\| W_{{\mathrm{2d}}}(D_{{\mathrm{2d}}}x-W_{{\mathrm{2d}}}^{-1}y)\right\| _{2} \nonumber \\= & {} \arg \min _{x\in \mathbb {R}^{N}}\left\| D_{{\mathrm{2d}}}x-W_{{\mathrm{2d}}}^{-1}y\right\| _{W_{{\mathrm{2d}}}^2}, \end{aligned}$$
the matrix \(\widetilde{L}^\dagger \) is characterized by
$$\begin{aligned} \widetilde{L}^\dagger y= & {} \arg \min _{x\in \mathbb {R}^{N}}\left\| D_{{\mathrm{2d}}}x-W_{{\mathrm{2d}}}^{-1}y\right\| _2 =\arg \min _{x\in \mathbb {R}^{N}}\left\| W_{{\mathrm{2d}}}^{-1}(W_{{\mathrm{2d}}}D_{{\mathrm{2d}}}x-y)\right\| _{2} \\= & {} \arg \min _{x\in \mathbb {R}^{N}}\left\| W_{{\mathrm{2d}}}D_{{\mathrm{2d}}}x-y\right\| _{W_{{\mathrm{2d}}}^{-2}}, \end{aligned}$$
so that \(\widetilde{L}^\dagger \) can be regarded as the pseudoinverse of \(W_{{\mathrm{2d}}}D_{{\mathrm{2d}}}\) computed in the \(W_{{\mathrm{2d}}}^{-2}\) norm. Alternatively, \(\widetilde{L}^\dagger \) can be regarded as the preconditioner for problem (1.1) obtained from (2.1) after the two-step transformation process
$$\begin{aligned}&\min _{x\in \mathbb {R}^N}\left\| A x-b\right\| _2^2 +\lambda \left\| (W_{{\mathrm{2d}}}D_{{{\mathrm{2d}}}})x\right\| _2^2 \nonumber \\&\quad \approx \min _{z=D_{{{\mathrm{2d}}}}x}\left\| A D^{\dagger }_{{{\mathrm{2d}}}} z-b\right\| _2^2 +\lambda \left\| W_{{\mathrm{2d}}}z\right\| _2^2\nonumber \\&\quad = \min _{y=W_{{\mathrm{2d}}}D_{{{\mathrm{2d}}}}x}\left\| A D^{\dagger }_{{{\mathrm{2d}}}} W_{{\mathrm{2d}}}^{-1} y-b\right\| _2^2 +\lambda \left\| y\right\| _2^2 \end{aligned}$$
has been performed. We stress that problem (4.9) is not equivalent to (2.1), as \(\widetilde{L}^\dagger \ne L^\dagger \). Nevertheless, matrix-vector products with \(\widetilde{L}^\dagger \) can be efficiently computed by exploiting its structure with an \(O(N^{3/2})\) cost (see (4.6)). Extensive numerical experiments (some of them reported in Sect. 5) show that, in practice, \(\widetilde{L}^{\dagger }\) (4.7) is a valid alternative to \(L^\dagger \).
The other preferred approach to compute \(L^\dagger \) for the 2d case (without resorting to approximations) is to employ an iterative method to solve the least-squares problem (4.8). This can be efficiently achieved by applying LSQR [26] or LSMR [13]. Both of them require a matrix-vector product with L and one with \(L^T\), and this can be efficiently achieved with an O(N) computational cost per iteration (see (4.3)). We remark that in the TV-FGMRES setting the matrix L depends on an approximation of the solution (as \(W_{{\mathrm{2d}}}=W_{{\mathrm{2d}}}^{(i)}=W_{{\mathrm{2d}}}(x^{{(i-1)}})\)), and since in practice the conditioning of L worsens as the vector \(D_{{{\mathrm{2d}}}}x^{(i)}\) gets sparser (i.e., when an increasing number of entries of \(f_\tau (D_{{{\mathrm{2d}}}}x^{(i)})\) are set to \(\tau _2\), see (2.6)), the convergence of LSQR and LSMR can be accelerated by using an appropriate preconditioner. Therefore, instead of (4.8), we consider the (right-) preconditioned least-squares problem
$$\begin{aligned} \min _{\widehat{x}\in \mathbb {R}^{N}}\left\| (W_{{\mathrm{2d}}}D_{{\mathrm{2d}}})P_L\widehat{x}-y\right\| _2\,,\quad x=P_L\widehat{x}\,. \end{aligned}$$
In this paper we explore two choices for the preconditioner \(P_L\) that are related to the structure of the pseudoinverse of L:
$$\begin{aligned} P_L^{(1)}=D^{\dagger }_{{{\mathrm{2d}}}}\,,\quad \text{ and } \quad P_L^{(2)}=D^{\dagger }_{{{\mathrm{2d}}}}W_{{\mathrm{2d}}}^{-1}=\widetilde{L}^{\dagger }. \end{aligned}$$
We also consider a preconditioner built around the idea of approximating the row scaling \(W_{{\mathrm{2d}}}\) by a diagonal column scaling \(\widetilde{W}_{{\mathrm{2d}}}\) (see [22]), so that
$$\begin{aligned} D^{T} W_{{\mathrm{2d}}}^2 D \approx \widetilde{W}_{{\mathrm{2d}}}(D^{T}D) \widetilde{W}_{{\mathrm{2d}}}\,, \quad \text{ where } \quad \text {diag}(\widetilde{W}_{{\mathrm{2d}}}) = \sqrt{\text {diag}(D^{T} W_{{\mathrm{2d}}}^2 D){.}/ \text {diag}(D^{T} D)} \end{aligned}$$
(all the operations in the definition of \(\widetilde{W}_{{\mathrm{2d}}}\) are applied component-wise). Note that \({\widetilde{W}_{{\mathrm{2d}}}}\) can be computed efficiently using the following relations
$$\begin{aligned} \text {diag}(D^{T} W_{{\mathrm{2d}}}^2 D)= (D^T).^2 \text {diag}(W_{{\mathrm{2d}}}^2)\,,\quad \text {diag}(D^{T} D)= (D^T).^2 \mathbf {1}\,. \end{aligned}$$
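A small numerical check of these relations is sketched below. Entrywise, \([D^TW^2D]_{jj}=\sum _i D_{ij}^2w_i^2\), so the weights enter through their squares. The sketch builds a dense \(D_{{\mathrm{2d}}}\) for a \(3\times 3\) image (assuming unscaled differences with entries \(\pm 1\)) and uses hypothetical weights; the function names are introduced here for illustration only.

```python
import math

def d2d(n):
    """Dense D_2d for column-stacked n x n images (entries +1 and -1)."""
    rows = []
    for j in range(n - 1):            # differences between adjacent columns
        for i in range(n):
            r = [0.0] * (n * n)
            r[j * n + i], r[(j + 1) * n + i] = 1.0, -1.0
            rows.append(r)
    for j in range(n):                # differences between adjacent rows
        for i in range(n - 1):
            r = [0.0] * (n * n)
            r[j * n + i], r[j * n + i + 1] = 1.0, -1.0
            rows.append(r)
    return rows

def column_scaling(D, w):
    """diag(W~) from entrywise squares of D and of the weights w only."""
    m, ncols = len(D), len(D[0])
    num = [sum(D[i][j] ** 2 * w[i] ** 2 for i in range(m)) for j in range(ncols)]
    den = [sum(D[i][j] ** 2 for i in range(m)) for j in range(ncols)]
    return [math.sqrt(a / b) for a, b in zip(num, den)], num, den

def gram_diagonal(D, w):
    """Diagonal of the assembled product D^T W^2 D (used for checking only)."""
    WD = [[w[i] * d for d in row] for i, row in enumerate(D)]
    ncols = len(D[0])
    G = [[sum(r[a] * r[b] for r in WD) for b in range(ncols)]
         for a in range(ncols)]
    return [G[j][j] for j in range(ncols)]

D = d2d(3)                                       # 12 x 9
w = [1.0 + 0.5 * i for i in range(len(D))]       # hypothetical positive weights
scale, num, den = column_scaling(D, w)
reference = gram_diagonal(D, w)
uniform_scale, _, _ = column_scaling(D, [2.0] * len(D))
```

When all the weights are equal to a constant c, the column scaling reduces to c for every pixel, so the approximation (4.12) is exact in that case.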
The third choice for the preconditioner \(P_L\) in (4.10) is the inverse of the Cholesky factor of (4.12), so that, recalling the expressions (4.4) and (4.5),
$$\begin{aligned} P_L^{(3)}= \widetilde{W}_{{{\mathrm{2d}}}}^{-1} \left[ \begin{array}{c} V_{\mathrm{1d}} \otimes V_{\mathrm{1d}} \end{array} \right] {\widetilde{D}_{\beta }^{-1}}, \end{aligned}$$
where \(\widetilde{D}_\beta =\widetilde{D}_{1:N,1:N} + \beta I\) corresponds to taking the singular values \(\{ \sigma _{i} \}_{i=1,\ldots ,N}\) of \(D_{{{\mathrm{2d}}}}\), with an added tolerance \(\beta {>0}\) used to overcome the fact that \(D_{{{\mathrm{2d}}}}\) is rank deficient (i.e., \(\sigma _{N}=0\)). This preconditioner is effective for a wide range of tolerances (in our numerical experiments we use \(\beta =\sigma _{N-1}\)).
Applying LSQR or LSMR to problem (4.10) with preconditioners (4.11) or (4.13) has an \(O(N^{3/2})\) computational cost per iteration [see (4.6)]. The LSQR or LSMR iterations are terminated when the approximation \(x_k\) of the solution of (4.8) obtained at the kth iteration satisfies a stopping criterion based on the residual or the normal equations residual norm tolerance, i.e., when
$$\begin{aligned} \Vert y-L x_k\Vert _2/\Vert y\Vert _2<\rho _1\quad \text{ or }\quad \Vert L^T(y-L x_k)\Vert _2/\Vert L^Ty\Vert _2<\rho _2\,,\quad \rho _1, \rho _2>0\,. \end{aligned}$$
We remark that the quantities in (4.14) can be conveniently monitored by computing the corresponding ones for the projected problems.
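For illustration, the two tests in (4.14) can be written out explicitly as in the sketch below. This is not how an efficient solver would evaluate them (the projected quantities mentioned above avoid the extra products with L and \(L^T\)); the helper names, the toy matrix, and the value used for \(\rho _2\) are assumptions.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rmatvec(M, v):
    """Product with the transpose, M^T v."""
    return [sum(M[i][j] * v[i] for i in range(len(M)))
            for j in range(len(M[0]))]

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def lsqr_stop(L, y, xk, rho1, rho2):
    """Evaluate both relative quantities in (4.14) for the iterate xk."""
    r = [yi - ti for yi, ti in zip(y, matvec(L, xk))]
    rel_res = norm2(r) / norm2(y)
    rel_normal = norm2(rmatvec(L, r)) / norm2(rmatvec(L, y))
    return rel_res < rho1 or rel_normal < rho2, rel_res, rel_normal

# Toy inconsistent problem: y has a component outside the range of L.
L = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = [1.0, 2.0, 3.0]
stop_ls, res_ls, ne_ls = lsqr_stop(L, y, [1.0, 2.0], 1e-8, 1e-8)
stop_0, res_0, ne_0 = lsqr_stop(L, y, [0.0, 0.0], 1e-8, 1e-8)
```

The toy problem shows why the second test is useful: at the least-squares solution the residual stays large, since y is not in the range of L, while the normal equations residual vanishes.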
An illustration of the effect of the preconditioners (4.11) and (4.13) when LSQR is used to compute \(L^\dagger \) is provided in Figs. 2 and 3, where a model image deblurring problem with a small image of size \(32\times 32\) pixels is considered (analogously to Sect. 5, Example 1). Figure 2 displays the distribution of the (numerically) non-zero singular values of the preconditioned matrix \(LP_L\), i.e., the first \(N-1\), for \(P_L=I\) and \(P_L=P_L^{(j)}\), \(j=1,2,{3}\), and clearly shows that \(P_L^{(2)}= \widetilde{L}^{\dagger }\) is the most effective preconditioner in clustering the singular values of the preconditioned matrix \(LP_L\) around 1 and in reducing its conditioning, resulting in a fast convergence of LSQR applied to (4.10). Correspondingly, Fig. 3 displays the history of the LSQR relative errors \(\Vert L^\dagger y-x_k\Vert _2/\Vert L^\dagger y\Vert _2\) versus the number k of LSQR iterations for \(P_L=I\), \(P_L=P_L^{(2)}\), and \(P_L=P_L^{(3)}\). In this setting, \(L^\dagger y\) is computed (using the SVD of L) to get the vector \(z_i\) in line 1 of Algorithm 1, i.e., to get the solution \(\bar{x}_{L,i}\) at the ith TV-FGMRES iterate, for three different values of i.
Fig. 2

Non-zero singular values of the preconditioned coefficient matrix in (4.10), with \(L=(W_{{\mathrm{2d}}}D_{{\mathrm{2d}}})\) of size \(1984\times 1024\), and \(P_L=I\), \(P_L^{(j)}\), \(j=1,2,3\). a Distribution (clusters) of the singular values. b Singular values versus component number. These graphs are obtained at the 20th iteration of the TV-FGMRES method applied to the test problem in Sect. 5, Example 1

Fig. 3

History of the relative errors of the LSQR method for the computation of \(L^\dagger y\), when employed at the ith iteration of TV-FGMRES applied to the test problem in Sect. 5, Example 1. a Without preconditioning. b With \(P_L=P_L^{(2)}\) as preconditioner. c With \(P_L=P_L^{(3)}\) as preconditioner. d Comparative history of the relative errors for LSQR with different preconditioners at the 20th iteration of TV-FGMRES (so that LSQR is applied to a least-squares problem whose coefficient matrix has the singular value distribution displayed in Fig. 2)

4.3 Stopping criteria

As already remarked in Sect. 1, TV-FGMRES is inherently parameter-free, as only an appropriate stopping criterion for the iterations should be chosen. All the approaches hinted at in this section are mentioned in the survey paper [2], where further references and details are available. We propose to use the following strategies (adapted to the solver at hand):
  • Quasi-optimality criterion, which prescribes to select the solution \(x_{L,m^*}\) obtained at the \(m^*\)th iteration such that
    $$\begin{aligned} m^*= \text {arg} \min _{m\le M_{\text {it}}} \text {TV}(x_{L,m+1}-x_{L,m})\,. \end{aligned}$$
    We remark that, although the quasi-optimality criterion requires \(M_{\text {it}}\) iterations to be performed in advance (where \(M_{\text {it}}\) is a selected maximum number of iterations), no additional computational cost per iteration has to be accounted for in order to apply (4.15) (recall the arguments at the beginning of this section).
  • Discrepancy principle, which prescribes to stop as soon as an approximation \(x_{L,m}\) is computed such that
    $$\begin{aligned} \Vert b-Ax_{L,m}\Vert _2 = \Vert r_{L,m}\Vert _2\le \theta \epsilon \,, \end{aligned}$$
    where \(\theta >1\) is a safety threshold, and \(\epsilon =\Vert e\Vert _2\) is the norm of the noise e affecting the data (1.1). The discrepancy principle is a very popular and well-established stopping criterion that relies on the availability of a good estimate of \(\Vert e\Vert _2\). However, for the TV-FGMRES method, application of the discrepancy principle may significantly increase the cost per iteration, since two additional matrix-vector products with A should be performed: one to compute \(\Vert r_{L,m}\Vert _2\) (which cannot be monitored in reduced dimension, as FGMRES is applied to the left-preconditioned system (3.5) or (3.6)), and one implicit in \(L_A^\dagger \) (to compute \(x_{L,m}\) at each iteration). For this reason, we also propose to consider the:
  • Preconditioned discrepancy principle, which prescribes to stop as soon as an approximation \(x_{L,m}\) is computed such that
    $$\begin{aligned} \Vert \widehat{b}-\widehat{A}\bar{x}_{L,m}\Vert _2 = \Vert \widehat{r}_{m-1}\Vert _2 \le \theta {\widehat{\epsilon }}\,, \end{aligned}$$
    where \({\widehat{\epsilon }}\) is the norm of the noise associated to the preconditioned problem, i.e.,
    $$\begin{aligned} {\widehat{\epsilon }}= & {} \Vert \widehat{e}\Vert _2=\Vert (D^\dagger )^TP e \Vert _2= \text {trace}(P^TD^\dagger (D^\dagger )^TP) \Vert e\Vert _2\nonumber \\= & {} \text {trace}(P^TD^\dagger (D^\dagger )^TP)\epsilon \,. \end{aligned}$$
    Although (4.17) can be monitored at no additional cost per FGMRES iteration by using projected quantities (see (3.10)), the computation of the trace in (4.18) can be prohibitive for large-scale (and possibly matrix-free) problems. We mention, however, that efficient randomized techniques can be used to handle this task (see [28]) and, most importantly, the computation of the trace should be performed only once for a system (3.5) or (3.6) of a given size, and can be done offline.
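The selection rules (4.15) and (4.16) can be sketched as follows, once the iterates and the residual norms are available. This is an illustration under simplifying assumptions: the total variation is taken in its 1d form (the 1-norm of first differences), list positions stand in for the iteration index m, and the function names and toy numbers are hypothetical.

```python
def tv_1d(x):
    """1d total variation: 1-norm of the first differences of x."""
    return sum(abs(a - b) for a, b in zip(x, x[1:]))

def quasi_optimal_index(iterates):
    """Position m minimizing TV(x_{m+1} - x_m) over the stored iterates."""
    jumps = [tv_1d([b - a for a, b in zip(xm, xn)])
             for xm, xn in zip(iterates, iterates[1:])]
    return min(range(len(jumps)), key=jumps.__getitem__)

def discrepancy_index(residual_norms, eps, theta):
    """First position m with ||r_m||_2 <= theta * eps, or None if never met."""
    for m, r in enumerate(residual_norms):
        if r <= theta * eps:
            return m
    return None

# Toy history of iterates and residual norms.
iterates = [[0.0, 0.0, 0.0],
            [1.0, 0.0, 0.0],
            [1.0, 0.9, 0.0],
            [1.0, 1.0, 0.0],
            [1.0, 1.0, 0.5]]
m_star = quasi_optimal_index(iterates)

residual_norms = [5.0, 3.0, 1.2, 0.9, 0.8]
m_dp = discrepancy_index(residual_norms, eps=1.0, theta=1.1)
m_none = discrepancy_index(residual_norms, eps=0.1, theta=1.1)
```

The quasi-optimality rule needs the whole history before it can select an iterate, whereas the discrepancy rule can stop the iterations as soon as the threshold is reached, provided that a good estimate of the noise norm is available.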

5 Numerical experiments

In this section we present three numerical test problems to investigate the performance of TV-FGMRES against other well-known approaches to total variation regularization. In each of the examples, we focus on specific aspects of the new method that we want to emphasize. Namely, the first experiment deals with a small image, which allows us to directly compute the pseudoinverse \(L^\dagger \) and compare different strategies to approximate it. The second experiment is a large-scale problem (so that \(L^\dagger \) cannot be computed directly) and deals with an image having low total variation. The third experiment deals with an image having higher total variation, showing the adaptability of the new methods to a broader class of images. All the tests are performed running MATLAB R2017a and using some of the functionalities available within the MATLAB toolbox IR Tools [14]. Table 1 summarizes the acronyms for the various methods tested in this section, together with the markers used to denote iterations satisfying specific stopping criteria within the displayed graphs. LSQR is used to compute \(L^\dagger \), possibly with the preconditioner \(P=\widetilde{L}^\dagger \), allowing at most 30 iterations, and taking \(\rho _1=10^{-8}\) for the stopping criterion in (4.14). In all the experiments, the quality of the solution is measured by the relative restoration error (RRE) \(\Vert x_{L,m}-x^{\text {ex}}\Vert _2/\Vert x^{\text {ex}}\Vert _2\), where \(x^{\text {ex}}\) is the exact solution of the noise-free problem (1.1).
Table 1

Summary of the acronyms denoting various solvers for TV regularization, and markers denoting the various stopping criteria




Solvers
  • Smoothing-norm GMRES with L
  • Restarted generalized AT
  • Restarted Golub–Kahan bidiag.
  • Fast gradient-based TV
  • TV-FGMRES for \(\text {TV}\) with \(L^\dagger \)
  • TV-FGMRES for \(\text {TV}_p\) with \(L^\dagger \)
  • TV-FGMRES for \(\text {TV}_p\) with \(\widetilde{L}^\dagger \) (FGMRES(\(\sim p\)))

Stopping criteria
  • (4.15), inner
  • (4.16), inner
  • (4.17), inner

Example 1

We consider the task of restoring a geometrical test image of size \(32\times 32\) pixels, corrupted by a Gaussian blur with PSF analytically defined by
$$\begin{aligned} p_{i,j}=\frac{1}{2\pi \sigma ^2}\exp \left( -\frac{1}{2\sigma ^2}(i^2+j^2)\right) \,, \end{aligned}$$
with \(\sigma =1\), \(i,j=-2,-1,0,1,2\), and additive Gaussian noise of relative noise level \(\varepsilon _{\text {rel}}=\Vert e\Vert _2/\Vert b^{\text {ex}}\Vert _2=10^{-2}\), with \(b^{\text {ex}}{=Ax^{\text {ex}}}=b-e\). The corrupted image is displayed in Fig. 5a. Since the size of this problem is moderate, it is affordable to compute directly the pseudoinverse of the matrix \(L=WD\), so that we can run Algorithm 1 without resorting to (preconditioned) LSQR to perform step 1. In Fig. 4 we compare the performance of standard GMRES, GMRES(D), TV-FGMRES, TV-FGMRES for \(\text {TV}_{0.1}\), a fast gradient-based method for TV (FBTV, with a default value \(\lambda =2.9\times 10^{-3}\)), and the restarted generalized Arnoldi–Tikhonov method (ReSt-GAT, with an automatically selected \(\lambda \) stabilizing around \(4.5\times 10^{-2}\)). Each solver performs 50 iterations. Looking at the graphs in Fig. 4a we can see that the best-performing method for this test problem is TV-FGMRES for \(\text {TV}_{0.1}\). Including flexibly updated weights within TV-FGMRES clearly results in a great gain in accuracy with respect to an approach based on preconditioning GMRES with the fixed matrix \(D^\dagger \). Moreover, TV-FGMRES reconstructs a solution of better quality than the FBTV one, with considerable computational savings. The performance of ReSt-GAT, which is still based on the Arnoldi algorithm, is not as good, since the total-variation-inspired preconditioners are not incorporated into the approximation subspace for the solution. The graphs in Fig. 4b display the value of the total variation of the approximate solution recovered at each iteration, versus the iteration number. Looking at these graphs we can clearly see that TV-FGMRES, TV-FGMRES for \(\text {TV}_{0.1}\), and FBTV are the most successful methods in reconstructing approximate solutions whose total variation is the closest to the one of the exact image.
The best reconstructed images for the GMRES-based methods are displayed in Fig. 5, where surface plots are also provided in order to better highlight the ideally piecewise-constant features of the solutions, which are fully recovered only when TV-FGMRES for \(\text {TV}_{0.1}\) is employed. Relative errors for these methods are reported in the caption. Finally, in Fig. 6, we display the relative error history and total variation history obtained running different instances of the TV-FGMRES method, where the computation of the pseudoinverse \(L^\dagger \) is done directly, the approximation \(\widetilde{L}^\dagger =D^\dagger W^{-1}\) of \(L^\dagger \) is used, or where both unpreconditioned LSQR and preconditioned (with \(P=\widetilde{L}^\dagger \)) LSQR (PLSQR) are employed to compute \(L^\dagger \). The best attained relative errors are reported in the caption. We can clearly notice that the quantities computed by PLSQR perfectly mimic the ones obtained using \(L^\dagger \) and, for this reason, in the following large-scale experiments (where computing \(L^\dagger \) directly is not feasible) we confidently use PLSQR to perform this task. On the other hand, the computationally convenient approximation \(\widetilde{L}^\dagger \) of \(L^\dagger \) recovers a solution of lower accuracy but similar total variation. The (unpreconditioned) LSQR performance is quite remarkable in terms of the relative error, though the corresponding total variation values do not adhere closely to the true ones, as a low-accuracy approximation of \(L^\dagger \) is inevitably computed (recall the remarks in Sect. 4.2).
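Two ingredients of this experiment, the PSF (5.1) and the relative restoration error, can be sketched as follows. The sketch is illustrative only: it does not build the blurring matrix A, it does not rescale the truncated PSF (whether the test problem does so is not stated here), and the function names are hypothetical.

```python
import math

def gaussian_psf(sigma, radius):
    """PSF (5.1): p_ij = exp(-(i^2 + j^2) / (2 sigma^2)) / (2 pi sigma^2),
    sampled for i, j = -radius, ..., radius."""
    c = 1.0 / (2.0 * math.pi * sigma ** 2)
    idx = range(-radius, radius + 1)
    return [[c * math.exp(-(i * i + j * j) / (2.0 * sigma ** 2)) for j in idx]
            for i in idx]

def rre(x, x_exact):
    """Relative restoration error ||x - x_exact||_2 / ||x_exact||_2."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_exact)))
    return num / math.sqrt(sum(b * b for b in x_exact))

psf = gaussian_psf(sigma=1.0, radius=2)       # the 5 x 5 PSF of Example 1
center = psf[2][2]
total = sum(sum(row) for row in psf)          # below one because of truncation
```

The entries of the truncated \(5\times 5\) PSF sum to roughly 0.98 rather than one, so a rescaling would be needed for the condition \(A\mathbf {1}=\mathbf {1}\) to hold exactly.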
Fig. 4

Example 1. a History of the relative errors for various solvers. b History of the total variation of approximate solution for various solvers [line specifications as listed in frame (a)]; the dashed horizontal line is the total variation of the exact solution \(x^{\text {ex}}\). The \(\times \) marker highlights the iteration minimizing the relative error, while the other markers are summarized in Table 1

Fig. 5

Example 1. a Blurred noisy data. Best reconstructed solutions: b GMRES(D) (RRE, \(1.6239\times 10^{-1}\)). c TV-FGMRES (RRE, \(1.2538\times 10^{-1}\)). d TV-FGMRES for \(\text {TV}_{0.1}\) (RRE, \(1.0057\times 10^{-1}\))

Fig. 6

Example 1. a History of the relative errors of TV-FGMRES with exact \(L^\dagger \), \(L^\dagger \) approximated by \(\widetilde{L}^\dagger =D^\dagger W^{-1}\), \(L^\dagger \) computed by LSQR, and \(L^\dagger \) computed by preconditioned LSQR with \(P=\widetilde{L}^\dagger \). b History of the total variation for TV-FGMRES (line specifications as listed in frame (a))

Fig. 7

Example 2. a History of the relative errors for various solvers. b History of the total variation of approximate solution for various solvers (line specifications as listed in frame (a)); the dashed horizontal line is the total variation of the exact solution \(x^{\text {ex}}\). The \(\times \) marker highlights the iteration minimizing the relative error, while the other markers are summarized in Table 1

Example 2

We consider the task of restoring the well-known Shepp–Logan phantom of size \(256\times 256\) pixels, affected by a Gaussian blur whose PSF is given by (5.1), with \(\sigma =4\) and \(i,j=-127,\ldots ,127\), and corrupted by Gaussian noise with relative level \(\varepsilon _{\text {rel}}=5\times 10^{-2}\) (see Fig. 8a). In Fig. 7 we plot the values of the relative error (frame (a)) and the total variation (frame (b)) versus the number of iterations for a variety of solvers for (1.3): the layout of this figure is similar to the one of Fig. 4, and 90 iterations are performed for each solver. We can clearly see that, for this test problem, TV-FGMRES is the most effective solver, attaining the best accuracy in the fewest iterations. The fast gradient-based method for TV (with a default value \(\lambda =5.4\times 10^{-4}\)) seems quite slow for this problem, and the restarted GKB algorithm (which is basically the restarted GAT method, where Golub–Kahan bidiagonalization is considered instead of the Arnoldi algorithm) rapidly stagnates (with an automatically selected \(\lambda \) stabilizing around \(1.7\times 10^{-2}\)).

Fig. 8

Example 2. a Blurred noisy data. Restored solutions when the discrepancy principle is satisfied: b GMRES(D) (RRE, \(4.0162\times 10^{-1}\); it, 57). c TV-FGMRES (RRE, \(3.9013\times 10^{-1}\); it, 49). d fast gradient-based method for TV (RRE, \(4.1600\times 10^{-1}\); it, 90)

Figure 8 displays the phantoms restored when the discrepancy principle (4.16) is satisfied by the GMRES(D), the TV-FGMRES, and the FBTV methods (the latter does not stop within the maximum number of allowed iterations). Relative errors and corresponding iteration numbers are reported in the caption. We can clearly see that the TV-FGMRES solution is the one with the lowest relative reconstruction error, though the FBTV solution surely appears more blocky (containing also some artifacts). By contrast, the GMRES(D) solution displays many ringing artifacts, which are partially removed when adaptive weights are incorporated within the TV-FGMRES preconditioners and approximation subspace.

Example 3

We consider the task of restoring the cameraman test image of size \(256\times 256\) pixels, corrupted by the same blur and noise used in the previous example (see Fig. 10a). However, contrary to the previous example, the total variation of the exact image is quite moderate. In Fig. 9 we plot the values of the relative error (frame (a)) and the total variation (frame (b)) versus the number of iterations for a variety of solvers for (1.3): the layout of this figure is similar to the one of Figs. 4 and 7. Also for this example, 90 iterations are performed for each solver. The best reconstructions computed by the GMRES(D), the TV-FGMRES, and the FBTV methods are displayed in Fig. 10 (relative restoration errors are reported in the caption). For this test problem all the solvers seem to have a similar performance in terms of relative errors (except for ReSt-GAT, which exhibits an unstable behavior because of a likely inappropriate choice of the regularization parameter). We also remark that both ReSt-GKB and FBTV are very fast in recovering an approximate solution, whose quality however stagnates. TV-FGMRES seems to recover a more accurate value of the total variation of the approximate solutions along the iterations. Correspondingly, more details are visible in the image restored by TV-FGMRES than in the one restored by the FBTV method, which is more blocky (consistently with the fact that FBTV underestimates the total variation of the exact solution).

Fig. 9

Example 3. a History of the relative errors for various solvers. b History of the total variation of the approximate solutions for various solvers (line specifications as listed in frame (a)); the dashed horizontal line is the total variation of the exact solution \(x^{\text {ex}}\). The \(\times \) marker highlights the iteration minimizing the relative error, while the other markers are summarized in Table 1

Fig. 10

Example 3. a Blurred noisy image. Best restored solutions obtained by: b GMRES(D) (RRE, \(1.6170 \times 10^{-1}\)). c TV-FGMRES (RRE, \(1.5793 \times 10^{-1}\)). d fast gradient-based method for TV (RRE, \(1.5931\times 10^{-1}\))

6 Conclusion and future work

In this paper we presented a novel GMRES-based approach for computing regularized solutions for large-scale linear inverse problems involving TV penalization, with applications to image deblurring problems. By considering an IRN approach to approximate the non-differentiable total variation term, and by exploiting the framework of smoothing-norm preconditioning for GMRES, we could derive the TV-FGMRES method that leverages the flexible Arnoldi algorithm. The TV-FGMRES method easily extends to problems involving \(\text {TV}_p\) regularization, and it is inherently parameter-free and efficient, as various numerical experiments and comparisons with other solvers for total variation regularization show.
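The IRN idea recalled above can be illustrated with a small sketch. It assumes, for simplicity, a 1D signal with anisotropic total variation \(\Vert Dx\Vert _1\), diagonal weights \(w_i=\max (|(Dx)_i|,\tau )^{-1/2}\), and a hypothetical safeguard threshold \(\tau \); the weights actually used within TV-FGMRES (for instance for 2D isotropic TV) may be defined differently.

```python
import numpy as np

def forward_difference(n):
    """(n-1) x n forward-difference matrix D for a 1D signal."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def irn_weights(x, D, tau=1e-8):
    """Diagonal IRN weights w_i = max(|(Dx)_i|, tau)^(-1/2).

    tau is a hypothetical safeguard against division by zero.
    """
    g = np.abs(D @ x)
    return np.maximum(g, tau) ** -0.5

x = np.array([0.0, 1.0, 3.0, 2.0, 2.5])
D = forward_difference(x.size)
w = irn_weights(x, D)

tv = np.sum(np.abs(D @ x))          # 1D total variation of x
quad = np.sum((w * (D @ x)) ** 2)   # squared reweighted 2-norm of the gradient
print(tv, quad)
```

When the weights are built from the current approximation itself and no gradient entry falls below the threshold, the quadratic term reproduces the total variation of that approximation. This is what makes it reasonable to replace the non-differentiable term by a sequence of weighted 2-norm penalties whose weights are updated as new approximate solutions become available.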

Future work includes a more careful investigation of how to optimally derive alternative preconditioners that can speed up the convergence of LSQR for the computation of the pseudo-inverse \(L^\dagger \) for large-scale problems. Strategies to extend the TV-FGMRES method to incorporate additional penalization terms can be studied as well. Finally, ways of extending TV-FGMRES to handle non-square coefficient matrices can be devised, by exploiting the flexible Golub–Kahan bidiagonalization algorithm derived in [10].



Acknowledgements

We are grateful to the anonymous Referees for providing insightful suggestions that helped to improve the paper. We would also like to thank James Nagy for insightful discussions about structured matrix computations.


References

  1. Arridge, S.R., Betcke, M.M., Harhanen, L.: Iterated preconditioned LSQR method for inverse problems on unstructured grids. Inverse Probl. 30(7), 075009 (2014)
  2. Bauer, F., Gutting, M., Lukas, M.A.: Evaluation of Parameter Choice Methods for Regularization of Ill-Posed Problems in Geomathematics, pp. 1713–1774. Springer, Berlin (2015)
  3. Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434 (2009)
  4. Berisha, S., Nagy, J.G.: Iterative image restoration. In: Chellappa, R., Theodoridis, S. (eds.) Academic Press Library in Signal Processing, chap. 7, vol. 4, pp. 193–243. Elsevier, Amsterdam (2014)
  5. Calvetti, D.: Preconditioned iterative methods for linear discrete ill-posed problems from a Bayesian inversion perspective. J. Comput. Appl. Math. 198(2), 378–395 (2007)
  6. Calvetti, D., Lewis, B., Reichel, L.: On the regularizing properties of the GMRES method. Numer. Math. 91(4), 605–625 (2002)
  7. Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14, 877–905 (2008)
  8. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999)
  9. Chan, T.F., Shen, J.: Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. SIAM, Philadelphia (2005)
  10. Chung, J., Gazzola, S.: Flexible Krylov methods for \(\ell ^p\) regularization (2018) (submitted)
  11. Chung, J., Knepper, S., Nagy, J.G.: Large-scale inverse problems in imaging. In: Scherzer, O. (ed.) Handbook of Mathematical Methods in Imaging, chap. 2, pp. 43–86. Springer, Berlin (2011)
  12. Eldén, L.: A weighted pseudoinverse, generalized singular values, and constrained least-squares problems. BIT Numer. Math. 22(4), 487–502 (1982)
  13. Fong, D.C.L., Saunders, M.A.: LSMR: an iterative algorithm for sparse least-squares problems. SIAM J. Sci. Comput. 33(5), 2950–2971 (2011)
  14. Gazzola, S., Hansen, P.C., Nagy, J.G.: IR tools: a MATLAB package of iterative regularization methods and large-scale test problems. Numer. Algorithms (2018)
  15. Gazzola, S., Nagy, J.G.: Generalized Arnoldi–Tikhonov method for sparse reconstruction. SIAM J. Sci. Comput. 36(2), B225–B247 (2014)
  16. Gazzola, S., Novati, P., Russo, M.R.: On Krylov projection methods and Tikhonov regularization. Electron. Trans. Numer. Anal. 44(1), 83–123 (2015)
  17. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins, Baltimore (1996)
  18. Hansen, P.C.: Discrete Inverse Problems: Insight and Algorithms. Society for Industrial and Applied Mathematics, Philadelphia (2010)
  19. Hansen, P.C., Jensen, T.K.: Smoothing-norm preconditioning for regularizing minimum-residual methods. SIAM J. Matrix Anal. Appl. 29(1), 1–14 (2007)
  20. Hansen, P.C., Nagy, J.G., O’Leary, D.P.: Deblurring Images: Matrices, Spectra, and Filtering. Society for Industrial and Applied Mathematics, Philadelphia (2006)
  21. Jensen, T.K., Hansen, P.C.: Iterative regularization with minimum-residual methods. BIT Numer. Math. 47, 103–120 (2007)
  22. Kubínová, M., Nagy, J.G.: Robust regression for mixed Poisson–Gaussian model. Numer. Algorithms 79(3), 825–851 (2018)
  23. Lanza, A., Morigi, S., Reichel, L., Sgallari, F.: A generalized Krylov subspace method for \(\ell _p-\ell _q\) minimization. SIAM J. Sci. Comput. 37, S30–S50 (2015)
  24. Notay, Y.: Flexible conjugate gradients. SIAM J. Sci. Comput. 22, 1444–1460 (2000)
  25. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4(2), 460–489 (2005)
  26. Paige, C.C., Saunders, M.A.: LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 8(1), 43–71 (1982)
  27. Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2003)
  28. Saibaba, A.K., Alexanderian, A., Ipsen, I.C.F.: Randomized matrix-free trace and log-determinant estimators. Numer. Math. 137(5), 353–395 (2017)
  29. Vogel, C.R., Oman, M.E.: Fast, robust total variation-based reconstruction of noisy, blurred images. IEEE Trans. Image Process. 7(6), 813–824 (1998)
  30. Wohlberg, B., Rodríguez, P.: An iteratively reweighted norm algorithm for minimization of total variation functionals. IEEE Signal Process. Lett. 14, 948–951 (2007)

Copyright information

© The Author(s) 2019

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Department of Mathematical Sciences, University of Bath, Bath, UK
