Flexible GMRES for total variation regularization

This paper presents a novel approach to the regularization of linear problems involving total variation (TV) penalization, with a particular emphasis on image deblurring applications. The starting point of the new strategy is an approximation of the non-differentiable TV regularization term by a sequence of quadratic terms, expressed as iteratively reweighted 2-norms of the gradient of the solution. The resulting problem is then reformulated as a Tikhonov regularization problem in standard form, and solved by an efficient Krylov subspace method. Namely, flexible GMRES is considered in order to incorporate new weights into the solution subspace as soon as a new approximate solution is computed. The new method is dubbed TV-FGMRES. Theoretical insight is given, and computational details are carefully unfolded. Numerical experiments and comparisons with other algorithms for TV image deblurring, as well as other algorithms based on Krylov subspace methods, are provided to validate TV-FGMRES.


Introduction
This paper considers large-scale discrete ill-posed problems of the form

b = A x + e ,  A ∈ R^{N×N} ,  x, b, e ∈ R^N ,  (1.1)

where A is an ill-conditioned matrix whose singular values decay and cluster at zero (without an evident gap between consecutive ones), and e ∈ R^N is unknown Gaussian white noise. Systems like this typically arise when discretizing inverse problems, which are central in many applications (such as astronomical and biomedical imaging, see [11,18] and the references therein). This paper mainly deals with signal deconvolution (deblurring) problems, where x ∈ R^N is the unknown sharp signal we wish to recover, and b is the measured (blurred and noisy) signal. In the case of images (i.e., two-dimensional signals) we use the following convention: X ∈ R^{n×n} is the array storing the sharp image, while x ∈ R^N, N = n², is the vector obtained by stacking the columns of X. The spatially invariant convolution kernel (blur) is assumed to be known. More specifically, in image deblurring, A is determined starting from a so-called point spread function (PSF, which specifies how the points in the image are distorted by the blur) and the boundary conditions (which specify the behavior of the image outside the recorded values).
Because of the ill-conditioning of A and the presence of noise e, some regularization must be applied to (1.1) in order to compute a meaningful approximation of x. To this end, one may employ the well-known Tikhonov method in general form, which computes a regularized solution

x_{L,λ} = arg min_{x ∈ R^N} ||A x − b||_2^2 + λ ||L x||_2^2 .  (1.2)

Here L ∈ R^{M×N} is the so-called regularization matrix that enforces some smoothing in x_{L,λ} (by including the penalization ||L x||_2^2 in the above objective function), and λ > 0 is the so-called regularization parameter that specifies the amount of smoothing (by balancing the fit-to-data term ||A x − b||_2^2 and the regularization term ||L x||_2^2). The choice of L and λ is problem-dependent, the former being typically the identity (L = I, in which case (1.2) is said to be in standard form), or a rescaled finite difference discretization of a derivative operator. When information about the regularity of x is available, one obtains enhanced reconstructions by considering a suitable L ≠ I. If the GSVD of the matrix pair (A, L) (or the SVD of the matrix A when L = I) can be feasibly computed, then the vector x_{L,λ} in (1.2) can be directly expressed as a linear combination of the right (generalized) singular vectors. In particular, by using the GSVD, one can notice that additional smoothing is encoded in the basis vectors, and the components of x_{L,λ} belonging to the null space of L are unaffected by regularization (see [18, Chapter 8] for more details).
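For small problems, the general-form Tikhonov solution (1.2) can be computed by stacking the fit-to-data and penalization terms into a single least-squares problem. The following dense numpy sketch (the test problem, parameter values, and function names are ours, chosen for illustration only) shows this.

```python
import numpy as np

def tikhonov_general_form(A, b, L, lam):
    """Solve min_x ||A x - b||_2^2 + lam * ||L x||_2^2 by stacking the
    fit-to-data and penalization terms into one least-squares problem."""
    M = np.vstack([A, np.sqrt(lam) * L])
    rhs = np.concatenate([b, np.zeros(L.shape[0])])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

# a tiny 1d deblurring example: Gaussian blur of a piecewise-constant signal
rng = np.random.default_rng(0)
n = 50
A = np.exp(-0.1 * (np.arange(n)[:, None] - np.arange(n)[None, :]) ** 2)
x_ex = np.where(np.arange(n) < n // 2, 1.0, 2.0)   # exact sharp signal
b = A @ x_ex + 1e-3 * rng.standard_normal(n)       # blurred + noisy data
D1d = np.diff(np.eye(n), axis=0)                   # first-derivative matrix as L
x_reg = tikhonov_general_form(A, b, D1d, lam=1e-2)
print("relative error:", np.linalg.norm(x_reg - x_ex) / np.linalg.norm(x_ex))
```

By construction, the returned vector attains the minimal value of the Tikhonov objective; for large-scale problems this dense solve is replaced by the iterative methods discussed below.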
Unfortunately, when considering large-scale problems whose associated coefficient matrix A may not have an exploitable structure or may not be explicitly stored, one cannot assume the GSVD to be available. In this setting iterative regularization methods are the only option, i.e., one can either solve the Tikhonov-regularized problem (1.2) iteratively, or apply an iterative solver to the original system (1.1) and terminate the iterations early (see [4,11,16,18] and the references therein).
This paper considers the latter approach, sometimes referred to as "regularizing iterations", and it focuses on the GMRES method [27, Chapter 6] and some variants thereof. GMRES does not require A^T nor matrix-vector products with A^T, and therefore it appears computationally attractive when compared to other regularizing Krylov subspace methods such as LSQR [26]. Although GMRES was proven to be a regularization method in [6], it is well known that it may perform poorly in some situations, e.g., when dealing with highly non-normal linear systems [21]. It has been shown, however, that this issue can be fixed by using specific preconditioners. For instance, the so-called smoothing-norm preconditioned GMRES method derived in [19] (and here referred to as GMRES(L)) follows from transforming the general Tikhonov problem (1.2) into standard form (i.e., into an equivalent Tikhonov problem with L = I), and then applying GMRES to the transformed fit-to-data term. GMRES(L) can be regarded as a right-preconditioned GMRES method that computes an approximate regularized solution as a linear combination of vectors that incorporate the smoothing effect of the regularization matrix L in (1.2). We emphasize that, here and in the following, the term "preconditioner" is used in a somewhat unconventional way. Indeed, the preconditioners used in this paper aim at computing a good regularized solution to problem (1.1) and, from a Bayesian point of view, they may be regarded as "priorconditioners" [5].
Total variation regularization is very popular when dealing with signal deconvolution problems (see [9, Chapter 5] and the references therein), and amounts to computing

x_{TV,λ} = arg min_{x ∈ R^N} ||A x − b||_2^2 + λ TV(x) ,  (1.3)

where TV(x) denotes the isotropic total variation of the unknown x, which measures the magnitude of the discrete gradient of x in the ℓ_1 norm. The weighted term λTV(x) in the above Tikhonov-like problem has the effect of producing piecewise-constant reconstructions, as solutions with many steep changes in the gradient are penalized or, equivalently, solutions with a sparse gradient are enforced. In particular, for images, λTV(x) helps preserve edges. The convex optimization problem (1.3) is very challenging to solve, both because of its large-scale nature, and because of the presence of the non-differentiable total variation term (so that the efficient iterative techniques used to solve problem (1.2) cannot be straightforwardly adopted in this setting). We also mention in passing that the so-called TV_p penalization term, which evaluates the magnitude of the gradient with respect to some ℓ_p "norm", 0 < p < 1, can be considered instead of the usual TV = TV_1, see [7]. TV_p is notably more effective in enforcing sparse gradients (as it better approximates the ℓ_0 quasi-norm), but the resulting Tikhonov-like problem is not convex anymore (and therefore may have multiple local minima). A variety of numerical approaches for the solution of (1.3) have already been proposed: some of them are based on fixed-point iterations, smooth approximations of TV(x), fast gradient-based iterations, and Bregman-distance methods; see [3,8,25,29], to cite only a few.
This paper is concerned with strategies that stem from the local approximation of (1.3) by a sequence of quadratic problems of the form (1.2), and that exploit Krylov subspace methods to compute solutions thereof. To the best of our knowledge, this idea was first proposed for total variation regularization in [30], where the authors derive the so-called iteratively reweighted norm (IRN) method consisting of the solution of a sequence of penalized weighted least-squares problems with diagonal weighting matrices incorporated into the regularization term and dependent on the previous approximate solution (so that they are updated from one least-squares problem to the next one). For large-scale unstructured problems, this method intrinsically relies on an inner-outer iteration scheme. In the following we use the acronym IRN to indicate a broad class of methods that can be recast in this framework.
Although the IRN method [30] is theoretically well-justified and experimentally effective, it has a couple of drawbacks. Firstly, conjugate gradient is repeatedly applied from scratch to the normal equations associated to each penalized least-squares problem of the form (1.2) in the sequence: this may result in an overall large number of iterations. Secondly, the regularization parameter λ should be chosen (and fixed) in advance. The so-called modified LSQR (MLSQR) method [1] partially remedies both these shortcomings. Although the starting point of MLSQR is still an IRN approach [30], each Tikhonov-regularized problem in the sequence of least-squares problems is transformed into standard form: in this way the matrix A is now right preconditioned and a preconditioned LSQR method can be applied. This approach typically results in a smaller number of iterations with respect to IRN [30]; moreover, different values of the regularization parameter can be easily considered. On the downside, LSQR is still applied sequentially to each IRN least-squares problem, and a new approximation subspace for the LSQR solution is computed from scratch. The so-called GKSpq method [23] leverages generalized Krylov subspaces (GKS), i.e., approximation subspaces where the updated weights and adaptive regularization parameters can be easily incorporated as soon as they become available. In other words, only one approximation subspace is generated when running the GKSpq method for the IRN least-squares problems associated to (1.3), and the approximate solutions are obtained by orthogonal projections onto GKS of increasing dimension. In this way, GKSpq avoids inner-outer iterations and is very efficient when compared to IRN and MLSQR.
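The IRN idea surveyed above can be condensed into a short sketch: an outer loop that recomputes diagonal weights from the previous iterate and re-solves a quadratic problem of the form (1.2). The dense numpy code below is illustrative only (the inner direct solver, thresholds, and test problem are ours); for large-scale problems each inner solve would itself be an iterative method, which is exactly the inner-outer structure discussed in the text.

```python
import numpy as np

def solve_tikhonov(A, b, L, lam):
    """Direct (dense) solver for min_x ||A x - b||_2^2 + lam ||L x||_2^2."""
    M = np.vstack([A, np.sqrt(lam) * L])
    rhs = np.concatenate([b, np.zeros(L.shape[0])])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

def irn_tv_1d(A, b, lam, outer=10, tau=1e-4):
    """IRN sketch for 1d TV: each outer step solves a quadratic problem
    whose penalty ||W D x||_2^2 majorizes the TV term at the previous iterate."""
    n = A.shape[1]
    D = np.diff(np.eye(n), axis=0)                    # 1d first-derivative matrix
    x = solve_tikhonov(A, b, D, lam)                  # plain Tikhonov start
    for _ in range(outer):
        w = np.maximum(np.abs(D @ x), tau) ** (-0.5)  # thresholded reweighting
        x = solve_tikhonov(A, b, w[:, None] * D, lam)
    return x

rng = np.random.default_rng(0)
n = 50
A = np.exp(-0.1 * (np.arange(n)[:, None] - np.arange(n)[None, :]) ** 2)
x_ex = np.where(np.arange(n) < n // 2, 1.0, 2.0)
b = A @ x_ex + 1e-3 * rng.standard_normal(n)
x_irn = irn_tv_1d(A, b, lam=1e-2)
D = np.diff(np.eye(n), axis=0)
x_tik = solve_tikhonov(A, b, D, 1e-2)
print("TV(x_tik) =", np.sum(np.abs(D @ x_tik)), " TV(x_irn) =", np.sum(np.abs(D @ x_irn)))
```

Each reweighted solve is a majorize-minimize step, so the (thresholded) TV-regularized objective does not increase across the outer iterations, up to a small slack introduced by the safety threshold.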
All the methods surveyed so far implicitly consider the normal equations associated to least-squares approximations of problem (1.3). As already remarked, approaches based on GMRES applied directly to the fit-to-data term in (1.2) may be more beneficial in some situations, as the computational overhead of dealing with A^T can be avoided. The restarted generalized Arnoldi-Tikhonov (ReSt-GAT) method [15] is arguably the only approach that generates a GMRES-like approximation subspace for the solution of each least-squares problem associated to the IRN strategy. However, ReSt-GAT has two shortcomings: it is based on an inner-outer iteration scheme (though approximations recovered during an iteration cycle are carried over to the next one by performing convenient warm restarts), and the TV penalization does not directly affect the approximation subspace of ReSt-GAT (so that piecewise-constant reconstructions are not properly enhanced).
The goal of this paper is to propose a novel strategy that employs GMRES for the solution of Tikhonov-regularized problems associated to the IRN approach to (1.3). In particular, a flexible instance of a GMRES(L)-like method is used to solve preconditioned versions of system (1.1), which are obtained by considering quadratic approximations to problem (1.3), performing transformations into standard form, and applying GMRES to the resulting fit-to-data term. In this way, the effect of the total variation regularization term (defined with respect to iteratively updated weights and a discrete gradient operator) is incorporated into the solution subspace, which is affected by both the null space of the regularization matrix and the adaptive weights. As the weights are updated as soon as a new approximate solution becomes available, i.e., immediately after a new GMRES iteration is computed, the flexible GMRES (FGMRES) method (see [27, Chapter 9]) is employed to handle variable preconditioning along the iterations. The resulting regularization method is dubbed Total-Variation-FGMRES (TV-FGMRES). We emphasize that the TV-FGMRES method is inherently parameter-free, as only one stopping criterion should be set to suitably terminate the iterations (while, for all the other solvers for problem (1.3) listed so far, one has to choose both the parameter λ and the number of iterations). Moreover, the new approach is different from the ReSt-GAT one [15] for two reasons: firstly, the standard GMRES approximation subspaces are modified and, secondly, regularizing iterations are employed rather than solving a sequence of Tikhonov problems (1.2); also, this approach is somewhat analogous to the GKSpq [23] one, but the two methods differ in the computation of the approximation subspaces (recall that the GKSpq ones involve both A^T and λ).
This paper is organized as follows. Section 2 covers some background material, including the definition of the weighting matrices for the approximation of the total variation regularization term in an IRN fashion, and a well-known procedure for transforming problem (1.2) into standard form. Section 3 describes the new TV-FGMRES method. Section 4 dwells on implementation details. Section 5 contains numerical experiments performed on three different image deblurring test problems. Section 6 presents some concluding remarks and possible future works.

IRN, weights, and standard form transformation
The main idea underlying the IRN approaches for the solution of TV-regularized problems is to approximate the minimizer of (1.3) by solving a sequence of regularized problems with rescaled penalization terms expressed as reweighted 2-norms, whose weights are iteratively updated using a previous approximation of the solution, i.e.,

x_{TV,λ} ≈ x^{(i)} = arg min_{x ∈ R^N} ||A x − b||_2^2 + λ ||W(D x^{(i−1)}) D x||_2^2 ,  (2.1)

where W(D x^{(i−1)}) denotes a diagonal weighting matrix defined with respect to an available approximation x^{(i−1)} of x_{TV,λ} (so that also L = W(D x^{(i−1)}) D depends on x^{(i−1)}), and D denotes a scaled finite difference matrix discretizing a derivative operator of the first order. More precisely, (2.1) is obtained by locally approximating the total variation functional in (1.3) by the quadratic functional

(1/2) ||W(D x^{(i−1)}) D x||_2^2 + c(x^{(i−1)}) ,  (2.2)

where the constant second term c(x^{(i−1)}) is dropped and the rescaling factor is incorporated into λ in (2.1). The weights W(D x^{(i−1)}) are determined in such a way that (2.2) is tangent to TV(x) at x = x^{(i−1)}, and it is an upper bound for TV(x) elsewhere, see [30]. The weights are derived as follows, distinguishing between:
-The one-dimensional (1d) case. Here TV(x) = ||D_1d x||_1, where D_1d ∈ R^{(N−1)×N} is the first-derivative matrix

D_1d = [ −1  1
              −1  1
                  ⋱  ⋱
                     −1  1 ] .  (2.3)

The weighting matrix

W(v) = diag( |v|^{−1/2} ) ,  (2.4)

where both modulus and exponentiation are considered component-wise, is used in practice to approximate the 1-norm. Indeed, for a given v, one can easily see that

||W(v) v||_2^2 = Σ_k |[v]_k|^{−1} [v]_k^2 = Σ_k |[v]_k| = ||v||_1 ,

where [w]_k denotes the kth entry of a vector w.
-The two-dimensional (2d) case. In a discrete setting,

TV(x) = Σ_k ( ([D_h x]_k)^2 + ([D_v x]_k)^2 )^{1/2} ,

where, if v ∈ R^N is obtained by stacking the columns of a 2d array V ∈ R^{n×n} with N = n², the discrete first derivatives in the horizontal and vertical directions are given by

D_h v = (D_1d ⊗ I) v = vec(V D_1d^T)  and  D_v v = (I ⊗ D_1d) v = vec(D_1d V) ,

respectively. Here D_1d ∈ R^{(n−1)×n} is the 1d first derivative matrix (2.3) of appropriate size, and I is the identity matrix of size n, so that 2d discrete operators are defined in terms of the corresponding 1d ones (note that here both D_1d and I have n columns, i.e., the size of the 2d array V). Deriving an expression for the weights in the discrete 2d setting is less straightforward.
Following [30], for a given v = [v_h^T, v_v^T]^T ∈ R^{2N̄}, and letting N̄ = n(n−1), one takes

D = D_2d = [ D_h ; D_v ] ∈ R^{2N̄×N}  and  W(v) = diag([w^T, w^T]^T) , with [w]_k = ( ([v_h]_k)^2 + ([v_v]_k)^2 )^{−1/4} ,  (2.5)

so that ||W(v) v||_2^2 = Σ_k ( ([v_h]_k)^2 + ([v_v]_k)^2 )^{1/2}. The expression of the weights in (2.4) and (2.5) generalizes to the TV_p functional, 0 < p < 1, defined as

TV_p(x) = ||D_1d x||_p^p  and  TV_p(x) = Σ_k ( ([D_h x]_k)^2 + ([D_v x]_k)^2 )^{p/2}

in the 1d and 2d cases, respectively. Indeed, given v, it suffices to take the weights

W(v) = diag( |v|^{(p−2)/2} )  and  [w]_k = ( ([v_h]_k)^2 + ([v_v]_k)^2 )^{(p−2)/4}

for the 1d and 2d cases, respectively, and then proceed as in the TV case illustrated above.
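The reweighting identities above are easy to verify numerically. The snippet below checks, for random vectors, that the 1d weights reproduce the 1-norm and that the 2d weights reproduce the isotropic TV sum (weight formulas as given in this section; variable names are ours).

```python
import numpy as np

rng = np.random.default_rng(1)

# 1d: w_k = |v_k|^(-1/2) gives ||W(v) v||_2^2 = ||v||_1
v = rng.standard_normal(7)
w = np.abs(v) ** (-0.5)
print(np.sum((w * v) ** 2), np.sum(np.abs(v)))   # equal up to rounding

# 2d: v = [v_h; v_v], with w_k = ((v_h)_k^2 + (v_v)_k^2)^(-1/4) repeated
# on both blocks, gives ||W(v) v||_2^2 = sum_k sqrt((v_h)_k^2 + (v_v)_k^2)
vh = rng.standard_normal(7)
vv = rng.standard_normal(7)
mag = np.sqrt(vh ** 2 + vv ** 2)
w2 = mag ** (-0.5)
print(np.sum((w2 * vh) ** 2 + (w2 * vv) ** 2), np.sum(mag))  # equal up to rounding
```

The same check applies to the TV_p weights after replacing the exponents by (p−2)/2 and (p−2)/4, respectively.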
It is important to stress that division by zero may occur when computing the weights associated to the TV and TV_p functionals (this is the case when a component of the gradient magnitude is zero, which should not be regarded as a rare occurrence, as (1.3) enforces sparsity in the gradient of the solution). To avoid this, one should set some safety thresholds τ_1 > τ_2 > 0, define the component-wise thresholding function

[f_τ(v)]_k = [v]_k  if |[v]_k| ≥ τ_1 ,   [f_τ(v)]_k = τ_2  if |[v]_k| < τ_1 ,  (2.6)

and then consider as weights the diagonal matrices W(f_τ(D x^{(i−1)})), obtained by replacing v by f_τ(v) in (2.4) and (2.5).
In the following, when the distinction between the 1d and the 2d cases can be waived, we will use the simpler notations W for the M × M diagonal weighting matrix, and D for the M × N first derivative matrix (with M = N − 1 in the 1d case, and M = 2N̄ = 2n(n−1) in the 2d case).
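For concreteness, the first-derivative matrices can be assembled as follows (a small numpy sketch; the identification of D_1d ⊗ I with horizontal differences assumes column-major stacking). Note that the constant vector spans the null space of D_2d, a fact used repeatedly in the next sections.

```python
import numpy as np

n = 4
D1d = np.diff(np.eye(n), axis=0)        # (n-1) x n first-derivative matrix
I = np.eye(n)
Dh = np.kron(D1d, I)                    # horizontal differences: vec(V @ D1d.T)
Dv = np.kron(I, D1d)                    # vertical differences:   vec(D1d @ V)
D2d = np.vstack([Dh, Dv])               # M x N with M = 2n(n-1), N = n^2

print(D2d.shape)                        # (2n(n-1), n^2) = (24, 16) for n = 4
# the null space of D2d is spanned by the constant vector 1
print(np.linalg.norm(D2d @ np.ones(n * n)))   # 0.0
```

Since the grid is connected, the gradient of a 2d array vanishes only on constants, so D_2d has rank N − 1.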
We conclude this section by recalling a well-known strategy to transform a generalized Tikhonov-regularized problem (1.2) with a given regularization matrix L ∈ R^{M×N} into standard form, with rank(L) = M < N and N(A) ∩ N(L) = {0}, i.e., the null spaces of A and L trivially intersect (see [12] for more details). Under these assumptions, the solution of (1.2) can be equivalently expressed as

x_{L,λ} = L_A^† ȳ_{L,λ} + x_0 = x̄_{L,λ} + x_0 ,  where  ȳ_{L,λ} = arg min_{ȳ ∈ R^M} ||A L_A^† ȳ − (b − A x_0)||_2^2 + λ ||ȳ||_2^2 ,  (2.7)

where L_A^† denotes the A-weighted pseudoinverse of L, L^† denotes the Moore-Penrose pseudoinverse of L, x̄_{L,λ} = L_A^† ȳ_{L,λ} is the component of the solution in R(L_A^†), and x_0 is the component of the solution in N(L). We remark that, when a given matrix L ∈ R^{M×N} has full rank but is such that M > N, a reduced QR factorization of L, L = QR, can be performed, and the above derivations can be applied to R ∈ R^{N×N} (instead of L). In the following we will use this procedure to transform (2.1), for a given weighting matrix W = W^{(i)} = W(D x^{(i−1)}), into standard form.

TV-preconditioned flexible GMRES
We begin this section by defining a TV-preconditioned version of the GMRES(L) method [19], motivated by the generalized Tikhonov problem (2.1) appearing within the IRN framework and assuming, for the moment, that the weighting matrix W in (2.1) is kept fixed. To achieve this, we first transform problem (2.1) into standard form, following the procedure outlined in (2.7). We then describe how to apply GMRES to a system linked to the fit-to-data term in (2.7) or, in other words, we consider λ = 0 in (2.7). Indeed, we would like to apply GMRES (as a regularizing iterative method) to the system

A L_A^† ȳ_L = b̄ := b − A x_0 ,  (3.1)

where L_A^† can be regarded as a preconditioner accounting for a total variation regularization term, x_0 is defined as in (2.7), and ȳ_L = ȳ_{L,0} (note that, to keep the notations simple, in the following we will use ȳ_L = ȳ_{L,0}, x̄_L = x̄_{L,0}, and x_L = x_{L,0}). However, GMRES cannot be directly applied to (3.1), as the coefficient matrix A L_A^† ∈ R^{N×M} is not square. To overcome this obstacle we closely follow the approach proposed in [19]. Let K be a matrix with full column-rank whose columns span N(L): when L is as in (2.1), since W is nonsingular, this means that one can take K = 1 for both the 1d and the 2d cases, where 1 = [1, . . . , 1]^T ∈ R^N. Thanks to this remark, the expression for x̄_L in (2.7) (with λ = 0) can be further detailed as

x̄_L = L_A^† ȳ_L = L^† ȳ_L + K t_0 ,  (3.2)

where the scalar t_0 ∈ R is uniquely determined by computing

t_0 = −(A K)^† A L^† ȳ_L .  (3.3)

We note that decomposition (3.2) is uniquely determined if both L and K have full rank. By plugging expression (3.2) into (3.1), we get

A L^† ȳ_L + A K t_0 = b̄ ,  (3.4)

which, once premultiplied by [D^†, K]^T, gives the 2 × 2 block system

[ (D^†)^T A L^†   (D^†)^T A K ] [ ȳ_L ]   =   [ (D^†)^T b̄ ]
[    K^T A L^†       K^T A K  ] [ t_0 ]       [    K^T b̄   ] .

We can easily eliminate t_0 from this system by inverting the 1 × 1 (2,2) block, so obtaining the Schur complement system

(D^†)^T P A L_A^† ȳ_L = (D^†)^T P b̄ ,  (3.5)

or, using once more the relations in (3.2),

(D^†)^T P A x̄_L = (D^†)^T P b̄ ,  (3.6)

where

P = I − A K (K^T A K)^{−1} K^T  (3.7)

is the oblique projector onto the orthogonal complement of R(K) along R(AK). The coefficient matrix in system (3.5) has size M × M, while the one in system (3.6) has size M × N.
We emphasize that this Schur complement system is different from the one derived in [19], where both sides of (3.4) are premultiplied by the matrix [L^†, K]^T. In our setting, since the matrix L contains weights W that should be suitably updated (as explained in the remaining part of this section), we conveniently discard the contribution of W in the left preconditioner in (3.5). In the following, to keep the formulas light, we often use the notations

Â = (D^†)^T P A  and  b̂ = (D^†)^T P b̄ ,  (3.8)

so that systems (3.5) and (3.6) can be even more compactly written as Â L_A^† ȳ_L = b̂ and Â x̄_L = b̂, respectively. The mth iteration of the GMRES method applied to compute x̄_L as in (3.2) produces an approximation x̄_{L,m} thereof, so that x_{L,m} = x̄_{L,m} + x_0 approximates the solution x_L; we stress again that here x_0 is the component of the solution in N(L). Since the weights (and therefore the matrices L^{(i)} = W^{(i)} D) are updated along the iterations, the flexible Arnoldi algorithm is employed, producing the decomposition

Â Z_m = V_{m+1} H̄_m .  (3.9)

More specifically, in the above decomposition: H̄_m ∈ R^{(m+1)×m} is an upper Hessenberg matrix; V_{m+1} = [v_1, . . . , v_{m+1}] ∈ R^{M×(m+1)} has orthonormal columns, with v_1 = b̂/||b̂||_2; and Z_m = [z_1, . . . , z_m] ∈ R^{N×m} collects the preconditioned basis vectors z_i = (L^{(i)})_A^† v_i. Since the columns of Z_m already include the contribution of the variable preconditioners (L^{(i)})_A^†, for i = 1, . . . , m, they form a basis for the vector x̄_L in (3.2). Therefore, at the mth step of TV-FGMRES, x̄_L is approximated by the following vector

x̄_{L,m} = Z_m ȳ_m ,  where  ȳ_m = arg min_{y ∈ R^m} || H̄_m y − ||b̂||_2 e_1 ||_2 ,  (3.10)

and e_1 ∈ R^{m+1} is the first canonical basis vector of R^{m+1}. Due to decomposition (3.9) and the properties of the matrices appearing therein,

|| Â x̄_{L,m} − b̂ ||_2 = || H̄_m ȳ_m − ||b̂||_2 e_1 ||_2 ,

i.e., the approximate solution x̄_{L,m} obtained at the mth iteration of TV-FGMRES minimizes the residual norm of (3.6) over all the vectors in

R(Z_m) = span{ (L^{(1)})_A^† v_1 , . . . , (L^{(m)})_A^† v_m } .  (3.11)

The approximation subspace R(Z_m) can be regarded as a preconditioned Krylov subspace, where the preconditioner is implicitly defined by successive applications of the matrices (L^{(i)})_A^† to the linearly independent vectors v_i; see [24]. The main steps of the TV-FGMRES method are summarized in Algorithm 1 where, notation-wise, [B]_{j,i} denotes the (j, i)th entry of a matrix B. If the weights (and therefore the preconditioner) are kept fixed along the iterations, TV-FGMRES reduces to a GMRES(L)-like method with Z_m = L_A^† V_m, where V_m is the orthonormal basis generated by the right-preconditioned Arnoldi algorithm, which can be expressed by a decomposition formally similar to (3.9).
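The flexible Arnoldi mechanism described above can be sketched in a few lines of code. The following minimal numpy implementation of FGMRES (after [27, Chapter 9]) is illustrative only: the iteration-dependent preconditioner is a generic placeholder passed as a callable, not the TV preconditioners (L^{(i)})_A^† of Algorithm 1, and the test problem and all names are ours.

```python
import numpy as np

def fgmres(A, b, precond, m):
    """Flexible GMRES: at step i the (possibly different) preconditioner
    precond(i, v) is applied to the Arnoldi vector v_i, and the solution
    is sought in the span of the preconditioned vectors Z_m = [z_1..z_m]."""
    N = b.size
    V = np.zeros((N, m + 1))
    Z = np.zeros((N, m))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for i in range(m):
        Z[:, i] = precond(i, V[:, i])       # flexible preconditioning step
        w = A @ Z[:, i]
        for j in range(i + 1):              # modified Gram-Schmidt
            H[j, i] = V[:, j] @ w
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)
        if H[i + 1, i] > 1e-12:             # guard against (happy) breakdown
            V[:, i + 1] = w / H[i + 1, i]
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y = np.linalg.lstsq(H, e1, rcond=None)[0]   # small projected problem
    return Z @ y                                 # x_m = Z_m y_m (zero initial guess)

rng = np.random.default_rng(2)
N = 30
A = np.eye(N) + 0.05 * rng.standard_normal((N, N))
b = rng.standard_normal(N)
# iteration-dependent scalings just to exercise the flexible mechanism
x = fgmres(A, b, lambda i, v: v / (1.0 + 0.01 * i), m=N)
print("residual norm:", np.linalg.norm(A @ x - b))
```

With a constant preconditioner this reduces to right-preconditioned GMRES; flexibility enters only through the per-iteration vectors z_i stored in Z_m, exactly as in decomposition (3.9).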
As an illustrative example, we consider a piecewise-constant test signal blurred by a matrix A as in (1.1); Gaussian white noise of level 10^{−2} is added, so as to obtain a corrupted version of it (as reported in Fig. 1a). We consider the following iterative approaches to recover the exact signal: GMRES, GMRES(D_1d), and TV-FGMRES (with thresholds τ_1 = 10^{−4}, τ_2 = 10^{−12}); for the sake of comparison, we also include GMRES(W_ex D_1d), where W_ex = W(D x_ex) are the (optimal) weights computed with respect to the exact solution x_ex of the noise-free problem (i.e., (1.1) with e = 0). In Fig. 1b the best reconstructions obtained by GMRES, GMRES(D_1d), and TV-FGMRES are reported: it is evident that the latter is the method most capable of reproducing the piecewise-constant features of the original signal, as the GMRES and the GMRES(D_1d) reconstructions are affected by many spurious oscillations. Indeed, the combination of flexibility and appropriate adaptive weightings allows one to generate basis vectors for TV-FGMRES that are the closest to the optimal ones (see Fig. 1e, f). The basis vectors for GMRES immediately exhibit an extremely oscillatory behavior (see Fig. 1c), while the basis vectors for GMRES(D_1d) are greatly smoothed, but fail to reproduce the jumps characterizing the exact signal (see Fig. 1d). We can experimentally conclude that one of the reasons underlying the success of TV-FGMRES is the iteration-dependent preconditioning that enforces piecewise-constant features of the solution into the TV-FGMRES basis vectors.

Implementation strategies
To devise efficient implementations of the TV-FGMRES method applied to the system (3.5) or (3.6), a number of properties of the involved matrices should be taken into account. In this section we will often use MATLAB-like notations: for instance, we will use a dot to denote a component-wise operation, a colon to access elements in a range of rows or columns of an array, and diag(·) to denote a vector of diagonal entries. We will extensively invoke and generalize some of the propositions derived in [19]. We start by proving the following result for system (3.5) (analogous to Theorem 5.1 in [19]).

Theorem 4.1 Let P be defined as in (3.7). Then system (3.5) reduces to

(D^†)^T P A L^† ȳ_L = (D^†)^T P b̄ .  (4.1)

Proof We start by noting that the A-weighted pseudoinverse of L can be expressed as

L_A^† = E L^† ,  with  E = I − K (A K)^† A ,  (4.2)

so that

(D^†)^T P A L_A^† = (D^†)^T P A L^† − (D^†)^T P A K (A K)^† A L^† ,

where the second term in the above sum is zero, since P A K = 0 by the definition (3.7) of P. Therefore, (3.5) reduces to (4.1).
Note that, although this theorem is stated for system (3.5), the same relations can be exploited when solving system (3.6), since the same kind of matrix-vector products should be computed (see also lines 1, 2 of Algorithm 1). The above theorem is important from a computational point of view since, by avoiding multiplication by E, an additional matrix-vector product with A can be avoided at each iteration of TV-FGMRES. However, in a flexible framework, we may still need the solution x̄_{L,i} (which is implicitly expressed through a matrix-vector product with L_A^†) to update the weights W^{(i+1)} at each iteration (see line 5 of Algorithm 1). This is notably not the case for TV-FGMRES, as the weights are expressed with respect to the gradient of x̄_{L,i}, and

D x̄_{L,i} = D (L^† ȳ_{L,i} + K t_0) = D L^† ȳ_{L,i} ,

where we have used the fact that R(K) = N(L) = N(D). Note also that, to keep the notations light, we have compactly denoted by L_A^† ȳ_{L,i} a vector belonging to the space (3.11). Finally, although the matrix P in (3.7) is defined in terms of A, matrix-vector products with A can be smartly avoided when computing matrix-vector products with P, by observing that

P v = v − A K (K^T A K)^{−1} (K^T v) ,

where the vector A K ∈ R^{N×1} is computed once for all through its reduced QR factorization A K = Q_0 R_0, i.e., Q_0 ∈ R^{N×1} is the normalization of A K. Matrix-vector products with P have therefore an O(N) cost. As a consequence, under the assumptions of Theorem 4.1, the computational cost per iteration of TV-FGMRES is dominated by one matrix-vector product with A (similarly to GMRES applied to (1.1)), plus one matrix-vector product with (D^†)^T and one with L^†. Efficient approaches to compute the latter will be explored in the next subsections. We conclude by remarking that, when considering image deblurring problems with spatially invariant blurs and periodic boundary conditions [20], the normalization condition A1 = 1 is satisfied by the blurring matrix A (this is basically a conservation-of-light condition for the blurred image).
In this setting, R(L^T) and R(AK) are indeed complementary subspaces, since A K = K implies R(AK) = R(K) = N(L), and R(L^T) ∩ N(L) = {0} thanks to the fundamental theorem of linear algebra.

Computation of matrix-vector products with D and (D^†)^T
These are required when computing the weights (defined in (2.4), (2.5)), and when computing matrix-vector products with Â (defined in (3.8)). We focus on the 2d case, as special strategies should be used to handle large-scale quantities. Concerning the first task, given a vector v ∈ R^N, we can exploit the special structure of D_2d and the Kronecker product properties, so that

D_2d v = [ vec(V D_1d^T) ; vec(D_1d V) ] ,  (4.3)

where v ∈ R^N is obtained by stacking the columns of V ∈ R^{n×n}, N = n², and where column and row differences of V are computed. Matrix-vector products with D_2d have an O(n(n−1)) = O(N) cost. Concerning the computation of matrix-vector products with (D^†)^T, let us first assume that the singular value decomposition D_1d = U_1d Σ_1d V_1d^T can be computed. Then, by using some Kronecker product properties, we can decompose D_2d in (2.5) as follows:

D_2d = [ U_1d ⊗ V_1d        0      ] [ Σ_1d ⊗ I ] (V_1d ⊗ V_1d)^T =: U Σ (V_1d ⊗ V_1d)^T .  (4.4)
       [      0        V_1d ⊗ U_1d ] [ I ⊗ Σ_1d ]

The matrix Σ ∈ R^{2N̄×N} has a very sparse structure, where the only nonzero entries are the singular values of D_1d, suitably replicated, with at most two nonzero entries per column.
A more convenient decomposition can be devised by applying a set of Givens rotations to the matrix Σ, so that the QR factorization

Σ = Q D  (4.5)

is implicitly obtained, where Q ∈ R^{2N̄×2N̄} is an orthogonal matrix and D ∈ R^{2N̄×N} is a nonnegative diagonal matrix of rank N − 1. By plugging (4.5) into (4.4), and by using standard properties of the pseudoinverse, we get

(D_2d^†)^T = U Q (D^†)^T (V_1d ⊗ V_1d)^T .  (4.6)

Computing matrix-vector products with (D_2d^†)^T has an overall O(N^{3/2}) cost, provided that vectors of length N are conveniently reshaped as n × n 2d arrays (N = n²) to exploit the Kronecker product properties.
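The reshaping strategy (and the Kronecker structure of D_2d) can be sanity-checked numerically. The following numpy snippet compares the explicit matrix-vector product with the O(N) reshaped computation; the block ordering (column differences first) is the convention assumed in (4.3).

```python
import numpy as np

n = 8
D1d = np.diff(np.eye(n), axis=0)
V = np.random.default_rng(3).standard_normal((n, n))
v = V.flatten(order="F")                        # column stacking: v = vec(V)

# explicit Kronecker form (only feasible for small n)
D2d = np.vstack([np.kron(D1d, np.eye(n)),       # horizontal differences
                 np.kron(np.eye(n), D1d)])      # vertical differences
ref = D2d @ v

# O(N) reshaped form: column/row differences of the 2d array V
fast = np.concatenate([(V @ D1d.T).flatten(order="F"),
                       (D1d @ V).flatten(order="F")])
print("max difference:", np.max(np.abs(ref - fast)))
```

The same reshaping idea (applying n × n factors to the two dimensions of the array separately) is what yields the O(N^{3/2}) cost for products with (D_2d^†)^T in (4.6).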

Computation of matrix-vector products with L^† and L_A^†
Since the A-weighted pseudoinverse can be expressed as L_A^† = E L^†, see (4.2), and since K ∈ R^{N×1} for TV-FGMRES, the computational burden of matrix-vector products with L_A^† mainly lies in the computation of matrix-vector products with A and L^†. Concerning the latter, we remark that in the 1d case one simply has

L^† = (W_1d D_1d)^† = D_1d^† W_1d^{−1} ,  (4.7)

directly from the definition of the Moore-Penrose pseudoinverse of matrices with linearly independent rows. Unfortunately, in the 2d case, the matrix L̂^† := D_2d^† W_2d^{−1} does not fulfill the definition of the Moore-Penrose pseudoinverse [17, Sect. 5.5.4], as (L L̂^†)^T ≠ L L̂^† (i.e., the third condition is violated). Therefore, there is no trivial way of deriving a computationally feasible direct expression for L^† = (W_2d D_2d)^†. A simple remedy is to consider anyway L̂^† (i.e., the 2d analogue of (4.7)) as an approximation of L^†. Indeed, while L^† y is characterized as the minimum-norm solution of the least-squares problem

min_z || L z − y ||_2 ,  (4.8)

L̂^† y is characterized as the minimum-norm solution of min_z ||W_2d^{−1}(L z − y)||_2, so that L̂^† can be regarded as the pseudoinverse of W_2d D_2d computed in the W_2d^{−2} norm. Alternatively, L̂^† can be regarded as the preconditioner for problem (1.1) obtained from a variant (4.9) of problem (2.1) after the two-step transformation process has been performed. We stress that problem (4.9) is not equivalent to (2.1), as L̂^† ≠ L^†. Nevertheless, matrix-vector products with L̂^† can be efficiently computed by exploiting its structure with an O(N^{3/2}) cost (see (4.6)). Extensive numerical experiments (some of them reported in Sect. 5) show that, in practice, L̂^† is a valid alternative to L^†. The other preferred approach to compute L^† y in the 2d case (without resorting to approximations) is to employ an iterative method to solve the least-squares problem (4.8). This can be efficiently achieved by applying LSQR [26] or LSMR [13]. Both of them require one matrix-vector product with L and one with L^T per iteration, and this can be efficiently achieved with an O(N) computational cost per iteration (see (4.3)).
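The two ways of applying L^† discussed above can be contrasted numerically. The sketch below is dense and small for illustration (in practice LSQR/LSMR replaces the direct least-squares solve, and all names and sizes are ours): it computes L^† y as the minimum-norm solution of (4.8) and compares it with the surrogate D_2d^† W_2d^{−1} y, which generally differs in 2d.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
D1d = np.diff(np.eye(n), axis=0)
D2d = np.vstack([np.kron(D1d, np.eye(n)), np.kron(np.eye(n), D1d)])
w = rng.uniform(0.5, 2.0, size=D2d.shape[0])    # some positive weights
L = w[:, None] * D2d                            # L = W D

y = rng.standard_normal(D2d.shape[0])
z_true = np.linalg.pinv(L) @ y                  # reference Moore-Penrose action
# in practice L^dagger y is computed iteratively (LSQR/LSMR) as the
# minimum-norm least-squares solution of min_z ||L z - y||_2:
z_ls = np.linalg.lstsq(L, y, rcond=None)[0]
print("direct vs least-squares:", np.linalg.norm(z_true - z_ls))

# the cheap surrogate D^dagger W^{-1} y is generally NOT the true
# pseudoinverse action in 2d, so its residual cannot be smaller
z_hat = np.linalg.pinv(D2d) @ (y / w)
print("surrogate deviation:", np.linalg.norm(z_hat - z_true))
```

Since z_true minimizes ||L z − y||_2, any other candidate (in particular the surrogate) attains a residual at least as large; this is the sense in which L̂^† is only an approximation of L^†.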
We remark that in the TV-FGMRES setting the matrix L depends on an approximation of the solution (as W_2d = W(D_2d x^{(i−1)})), and since in practice the conditioning of L worsens as the vector D_2d x^{(i)} gets sparser (i.e., when an increasing number of entries of f_τ(D_2d x^{(i)}) are set to τ_2, see (2.6)), the convergence of LSQR and LSMR can be accelerated by using an appropriate preconditioner. Therefore, instead of (4.8), we consider the (right-)preconditioned least-squares problem

min_w || L P_L w − y ||_2 ,  z = P_L w .  (4.10)

In this paper we explore two choices for the preconditioner P_L that are related to the structure of the pseudoinverse of L:

P_L^{(1)} = D_2d^†  and  P_L^{(2)} = L̂^† = D_2d^† W_2d^{−1} .  (4.11)

We also consider a preconditioner built around the idea of approximating the row scaling W_2d by a diagonal column scaling W̃_2d (see [22]), so that

L^T L = D_2d^T W_2d^2 D_2d ≈ W̃_2d D_2d^T D_2d W̃_2d .  (4.12)

The third choice for the preconditioner P_L in (4.10) is the inverse of the Cholesky factor of (4.12), so that, recalling the expressions (4.4) and (4.5),

P_L^{(3)} = W̃_2d^{−1} (V_1d ⊗ V_1d) D_β^{−1} ,  (4.13)

where D_β = D_{1:N,1:N} + β I corresponds to taking the singular values {σ_i}_{i=1,...,N} of D_2d, with an added tolerance β > 0 used to overcome the fact that D_2d is rank-deficient (i.e., σ_N = 0). This preconditioner is effective for a wide range of tolerances (in our numerical experiments we use β = σ_{N−1}). Applying LSQR or LSMR to problem (4.10) with preconditioners (4.11) or (4.13) has an O(N^{3/2}) computational cost per iteration (see (4.6)). The LSQR or LSMR iterations are terminated when the approximation x_k of the solution of (4.8) obtained at the kth iteration satisfies a stopping criterion based on the residual or the normal equations residual norm tolerance, i.e., when

|| y − L x_k ||_2 ≤ ρ_1 || y ||_2  or  || L^T (y − L x_k) ||_2 ≤ ρ_2 || L^T y ||_2 .  (4.14)

We remark that the quantities in (4.14) can be conveniently monitored by computing the corresponding ones for the projected problems.
An illustration of the effect of the preconditioners (4.11) and (4.13) when LSQR is used to compute L^† is provided in Figs. 2 and 3, where a model image deblurring problem with a small image of size 32 × 32 pixels is considered (analogously to Sect. 5, Example 1). Figure 2 displays the distribution of the (numerically) nonzero singular values of the preconditioned matrix L P_L, i.e., the first N − 1, for P_L = I and P_L = P_L^{(j)}, j = 1, 2, 3, and clearly shows that P_L^{(2)} = L̂^† is the most effective preconditioner in clustering the singular values of the preconditioned matrix L P_L around 1 and in reducing its conditioning, resulting in a fast convergence of LSQR applied to (4.10). Correspondingly, Fig. 3 displays the history of the LSQR relative errors ||L^† y − x_k||_2 / ||L^† y||_2 versus the number k of LSQR iterations for P_L = I, P_L = P_L^{(2)}, and P_L = P_L^{(3)}. In this setting, L^† y is computed (using the SVD of L) to get the vector z_i in line 1 of Algorithm 1, i.e., to get the solution x̄_{L,i} at the ith TV-FGMRES iteration, for three different values of i.

Stopping criteria
As already remarked in Sect. 1, TV-FGMRES is inherently parameter-free, as only an appropriate stopping criterion for the iterations should be chosen. All the approaches hinted at in this section are mentioned in the survey paper [2], where further references and details are available. We propose to use the following strategies (adapted to the solver at hand): -Quasi-optimality criterion, which prescribes to select the solution x_{L,m*} obtained at the m*th iteration such that

m* = arg min_{m=1,...,M_it} || x_{L,m+1} − x_{L,m} ||_2 .  (4.15)

We remark that, although the quasi-optimality criterion requires M_it iterations to be performed in advance (where M_it is a selected maximum number of iterations), no additional computational cost per iteration has to be accounted for in order to apply (4.15) (recall the arguments at the beginning of this section).
- Discrepancy principle, which prescribes to stop as soon as an approximation x_{L,m} is computed such that

‖b − A x_{L,m}‖_2 = ‖r_{L,m}‖_2 ≤ θ ε , (4.16)

where θ > 1 is a safety threshold, and ε = ‖e‖_2 is the norm of the noise e affecting the data (1.1). The discrepancy principle is a very popular and well-established stopping criterion that relies on the availability of a good estimate of ‖e‖_2. However, for the TV-FGMRES method, applying the discrepancy principle may significantly increase the cost per iteration, since two additional matrix-vector products with A should be performed: one to compute ‖r_{L,m}‖_2 (which cannot be monitored in reduced dimension, as FGMRES is applied to the left-preconditioned system (3.5) or (3.6)), and one implicit in L†A (to compute x_{L,m} at each iteration). For this reason, we also propose to consider the:
- Preconditioned discrepancy principle, which prescribes to stop as soon as an approximation x_{L,m} is computed such that

‖L†(b − A x_{L,m})‖_2 = ‖r̄_m‖_2 ≤ θ ε̄ , (4.17)

where ε̄ is the norm of the noise associated to the preconditioned problem, defined through the trace expression in (4.18). Although (4.17) can be monitored at no additional cost per FGMRES iteration by using projected quantities (see (3.10)), the computation of the trace in (4.18) can be prohibitive for large-scale (and possibly matrix-free) problems. We mention, however, that efficient randomized techniques can be used to handle this task (see [28]) and, most importantly, that the computation of the trace has to be performed only once for a system (3.5) or (3.6) of a given size, and can be done offline.
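Regarding the randomized techniques mentioned for the trace in (4.18): a standard choice is the Hutchinson estimator, which needs only matrix-vector products and is therefore suited to matrix-free settings. The sketch below is a generic illustration on a hypothetical symmetric operator M (a stand-in, not the paper's actual matrix):

```python
import numpy as np

def hutchinson_trace(matvec, n, n_probes=200, rng=None):
    """Matrix-free Hutchinson estimator: trace(M) is approximated by the
    average of z^T M z over Rademacher probe vectors z, requiring only
    products v -> M v."""
    rng = rng if rng is not None else np.random.default_rng(0)
    est = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        est += z @ matvec(z)
    return est / n_probes

# sanity check on a small explicit symmetric matrix M = B^T B
# (a hypothetical stand-in for the operator whose trace enters (4.18))
rng = np.random.default_rng(1)
B = rng.standard_normal((40, 30))
M = B.T @ B
est = hutchinson_trace(lambda v: M @ v, 30, n_probes=2000, rng=rng)
print(est, np.trace(M))
```

Since the estimate only has to be computed once per problem size, a generous number of probes can be afforded offline.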

Numerical experiments
In this section we present three numerical test problems to investigate the performance of TV-FGMRES against other well-known approaches to total variation regularization.
In each of the examples, we focus on specific aspects of the new method that we want to emphasize. Namely, the first experiment deals with a small image, which allows us to directly compute the pseudoinverse L† and to compare different strategies to approximate it. The second experiment is a large-scale problem (so that L† cannot be computed directly) and deals with an image having low total variation. The third experiment deals with an image having higher total variation, showing the adaptability of the new method [14]. Table 1 summarizes the acronyms of the various methods tested in this section, together with the markers used to denote the iterations satisfying specific stopping criteria within the displayed graphs. LSQR is used to compute L†, possibly with the preconditioner P = L̃†, allowing at most 30 iterations, and taking ρ_1 = 10^{−8} for the stopping criterion in (4.14). In all the experiments, the quality of the solution is measured by the relative restoration error (RRE) ‖x_{L,m} − x_ex‖_2 / ‖x_ex‖_2, where x_ex is the exact solution of the noise-free problem (1.1).
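The role of LSQR in this inner step can be sketched with SciPy on a tiny stand-in for L = W D; here SciPy's atol/btol tolerances play the part of the paper's ρ_1 criterion (4.14), which is an assumption of this sketch, as is the small 1D operator:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
n = 20
D = np.diff(np.eye(n), axis=0)        # (n-1) x n difference operator
W = np.diag(0.5 + rng.random(n - 1))  # hypothetical IRN weights
L = W @ D
y = rng.standard_normal(n - 1)

# z ~ L^dagger y via LSQR: at most 30 iterations, tolerance 10^{-8}
# (atol/btol stand in for rho_1; LSQR converges to the minimum-norm
# solution, which coincides with the pseudoinverse solution)
z_lsqr = lsqr(L, y, atol=1e-8, btol=1e-8, iter_lim=30)[0]
z_svd = np.linalg.pinv(L) @ y         # SVD-based reference solution
print(np.linalg.norm(z_lsqr - z_svd) / np.linalg.norm(z_svd))
```

For this well-conditioned toy operator 30 iterations suffice for high accuracy; for the large-scale 2D operators in the experiments below, the preconditioner P is what keeps the iteration count this low.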

Example 1
We consider the task of restoring a geometric test image of size 32 × 32 pixels, corrupted by a Gaussian blur with PSF analytically defined by (5.1), with σ = 1 and i, j = −2, −1, 0, 1, 2, and by additive Gaussian noise of relative noise level ε_rel = ‖e‖_2 / ‖b_ex‖_2 = 10^{−2}, with b_ex = A x_ex = b − e. The corrupted image is displayed in Fig. 5a. Since the size of this problem is moderate, it is affordable to directly compute the pseudoinverse of the matrix L = W D, so that we can run Algorithm 1 without resorting to (preconditioned) LSQR to perform step 1. In Fig. 4 we compare the performance of standard GMRES, GMRES(D), TV-FGMRES, TV-FGMRES for TV_{0.1}, a fast gradient-descent method for TV (with a default value λ = 2.9 × 10^{−3}), and the restarted generalized Arnoldi-Tikhonov method (with an automatically selected λ stabilizing around 4.5 × 10^{−2}); 50 iterations are performed by each solver. Looking at the graphs in Fig. 4a, we can see that the best-performing method for this test problem is TV-FGMRES for TV_{0.1}. Including flexibly updated weights within TV-FGMRES clearly results in a great gain in accuracy with respect to an approach based on preconditioning GMRES with the fixed matrix D†. Moreover, TV-FGMRES reconstructs a solution of better quality than the FBTV one, with considerable computational savings. The performance of ReSt-GAT, which is still based on the Arnoldi algorithm, is not as good, since the total-variation-inspired preconditioners are not incorporated into the approximation subspace for the solution. The graphs in Fig. 4b display the value of the total variation of the approximate solution recovered at each iteration, versus the iteration number. Looking at these graphs, we can clearly see that TV-FGMRES, TV-FGMRES for TV_{0.1}, and FBTV are the most successful methods in reconstructing approximate solutions whose total variation is closest to that of the exact image.
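A minimal sketch of the data generation for this kind of example follows; since (5.1) is not reproduced in this excerpt, the sketch assumes a standard normalized Gaussian PSF on the 5 × 5 stencil, a made-up piecewise-constant test image, and reflexive boundary conditions:

```python
import numpy as np
from scipy.signal import convolve2d

# assumed 5x5 Gaussian PSF with sigma = 1, normalized to unit sum
sigma = 1.0
ii, jj = np.meshgrid(np.arange(-2, 3), np.arange(-2, 3), indexing="ij")
psf = np.exp(-(ii**2 + jj**2) / (2 * sigma**2))
psf /= psf.sum()

rng = np.random.default_rng(0)
X = np.zeros((32, 32))
X[8:24, 8:24] = 1.0                   # toy piecewise-constant test image
B_exact = convolve2d(X, psf, mode="same", boundary="symm")  # assumed BCs

# additive Gaussian white noise of relative level 10^{-2}
e = rng.standard_normal(B_exact.shape)
e *= 1e-2 * np.linalg.norm(B_exact) / np.linalg.norm(e)
B = B_exact + e
```

The blurred-and-noisy array B then plays the role of the data b (after column stacking), and the noise norm ‖e‖_2 is available exactly, as assumed by the discrepancy principle (4.16).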
The best reconstructed images for the GMRES-based methods are displayed in Fig. 5, where surface plots thereof are also provided in order to better highlight the ideally piecewise-constant features of the solutions, which are fully recovered only when TV-FGMRES for TV_{0.1} is employed. Relative errors for these methods are reported in the caption. Finally, in Fig. 6, we display the relative error history and the total variation history obtained by running different instances of the TV-FGMRES method, where the pseudoinverse L† is computed directly, where the approximation L̃† = D†W^{−1} of L† is used, or where either unpreconditioned LSQR or preconditioned (with P = L̃†) LSQR (PLSQR) is employed to compute L†. The best attained relative errors are reported in the caption. We can clearly notice that the quantities computed by PLSQR perfectly mimic the ones obtained using L†; for this reason, in the following large-scale experiments (where computing L† directly is not feasible) we confidently use PLSQR to perform this task. On the other hand, the computationally convenient approximation L̃† of L† recovers a solution of lower accuracy but similar total variation. The performance of (unpreconditioned) LSQR is quite remarkable in terms of the relative error, though the corresponding total variation values do not adhere closely to the true ones, as a low-accuracy approximation of L† is inevitably computed (recall the remarks in Sect. 4.2).

Example 2
We consider the task of restoring the well-known Shepp-Logan phantom of size 256 × 256 pixels, affected by a Gaussian blur whose PSF is given by (5.1), with σ = 4 and i, j = −127, ..., 127, and corrupted by Gaussian noise of relative level ε_rel = 5 × 10^{−2} (see Fig. 8a). In Fig. 7 we plot the values of the relative error (frame (a)) and of the total variation (frame (b)) versus the number of iterations for a variety of solvers; 90 iterations are performed for each solver. We can clearly see that, for this test problem, TV-FGMRES is the most effective solver, attaining the best accuracy in the smallest number of iterations. The fast gradient-based method for TV (with a default value λ = 5.4 × 10^{−4}) seems quite slow for this problem, and the restarted GKB algorithm (which is basically the restarted GAT method, with Golub-Kahan bidiagonalization considered instead of the Arnoldi algorithm) rapidly stagnates (with an automatically selected λ stabilizing around 1.7 × 10^{−2}). Figure 8 displays the phantoms restored when the discrepancy principle (4.16) is satisfied by the GMRES(D), TV-FGMRES, and FBTV methods (the latter does not stop within the maximum number of allowed iterations). Relative errors and corresponding iteration numbers are reported in the caption. We can clearly see that the TV-FGMRES solution is the one with the lowest relative reconstruction error, though the FBTV solution surely appears more blocky (also containing some artifacts). By contrast, the GMRES(D) solution displays many ringing artifacts, which are partially removed when adaptive weights are incorporated within the TV-FGMRES preconditioners and approximation subspace.

Example 3
We consider the task of restoring the cameraman test image of size 256 × 256 pixels, corrupted by the same blur and noise used in the previous example (see Fig. 10a). However, in contrast to the previous example, the exact image has a higher total variation. In Fig. 9 we plot the values of the relative error (frame (a)) and of the total variation (frame (b)) versus the number of iterations for a variety of solvers for (1.3); the layout of this figure is similar to that of Figs. 4 and 7. Also for this example, 90 iterations are performed for each solver. The best reconstructions computed by the GMRES(D), TV-FGMRES, and FBTV methods are displayed in Fig. 10 (relative restoration errors are reported in the caption). For this test problem all the solvers seem to have a similar performance in terms of relative errors (except for ReSt-GAT, which exhibits an unstable behavior, likely because of an inappropriate choice of the regularization parameter). We also remark that both ReSt-GKB and FBTV are very fast in recovering an approximate solution, whose quality however stagnates. TV-FGMRES seems to recover a more accurate value of the total variation of the approximate solutions along the iterations. Correspondingly, more details are visible in the image restored by TV-FGMRES than in the one restored by the FBTV method, which is more blocky (consistent with the fact that FBTV underestimates the total variation of the exact solution).

Conclusion and future work
In this paper we presented a novel GMRES-based approach for computing regularized solutions of large-scale linear inverse problems involving TV penalization, with applications to image deblurring. By considering an IRN approach to approximate the non-differentiable total variation term, and by exploiting the framework of smoothing-norm preconditioning for GMRES, we derived the TV-FGMRES method, which leverages the flexible Arnoldi algorithm. The TV-FGMRES method easily extends to problems involving TV_p regularization, and it is inherently parameter-free and efficient, as various numerical experiments and comparisons with other solvers for total variation regularization show. Future work includes a more careful investigation of how to optimally derive alternative preconditioners that can speed up the convergence of LSQR for the computation of the pseudoinverse L† in large-scale problems. Strategies to extend the TV-FGMRES method to incorporate additional penalization terms can be studied as well. Finally, ways of extending TV-FGMRES to handle non-square coefficient matrices can be devised, by exploiting the flexible Golub-Kahan bidiagonalization algorithm derived in [10].