1 Introduction

The object of this work is the numerical solution of the problem

$$ A_{n,m} \overline{\mathbf{x}}_{m} = \mathbf{g}_{n} \equiv \bar{\mathbf{g}}_{n} + \boldsymbol{\varepsilon}, \quad A_{n,m }\in \mathbb{R}^{n \times m}, \mathbf{x}_{m} \in \mathbb{R}^{m}, \mathbf{g}_{n},\boldsymbol{\varepsilon} \in \mathbb{R}^{n}, \quad n,m > 0, $$
(1)

obtained from the discretization of a problem which is ill-posed in the Hadamard sense [1], and in which the right-hand side \(\mathbf{g}_{n}\) is the sum of the true signal \(\bar{\mathbf{g}}_{n}\) and a noise vector ε with ∥ε∥ = δ.

A vast source of problems of this type is the class of inverse problems, such as the numerical solution of Fredholm integral equations of the first kind with nondegenerate kernels, image deblurring problems, and, more generally, the application of linear models to the reconstruction of corrupted signals.

The usual techniques for the solution of linear systems cannot be applied directly to (1), since their straightforward application would recover a non-physical, noise-corrupted solution [2]. A vast number of ad hoc techniques have been proposed in the literature to overcome this problem; they are broadly referred to as regularization techniques. In particular, we recall two strategies related to our proposal: regularizing preconditioning and hybrid methods.

Regularizing preconditioners have been developed to accelerate the convergence of iterative methods like Landweber or CGLS without spoiling the computed solution, i.e., without amplifying the high frequencies where the noise lives. Many proposals define the preconditioner in a trigonometric matrix algebra [2,3,4,5,6], which can be applied only to shift-invariant operators. Since the preconditioner should preserve the structure of the coefficient matrix \(A_{n,m}\) (see [7]), we follow the more general approach of defining it in a Krylov subspace, as suggested in [8, 9].

The combination of an iteratively built Krylov subspace with a direct solution of the projected problem is usually referred to as a hybrid method. In more detail, hybrid methods combine the iterative construction of the Krylov subspace with the exact solution of the Tikhonov problem projected onto the computed Krylov subspace [10,11,12,13]. The small size of the Krylov subspace allows the projected Tikhonov problem to be solved efficiently.

In this work, tackling problem (1) through the normal equation

$$ A_{n,m}^{T} A_{n,m} \overline{\mathbf{x}}_{m} = A_{n,m}^{T} \mathbf{g}_{n}, $$
(2)

we propose and investigate a class of regularization techniques based on a matrix-function approach. In detail, a regularized approximation of the solution of problem (1) is obtained as

$$ {\mathbf{x}}_{m, \alpha}^{\delta} = f_{\alpha}(A_{n,m}^{T}A_{n,m})A_{n,m}^{T}\mathbf{g}_{n}, $$
(3)

where fα(z) is a suitable regularization of the inverse function applied to \(A_{n,m}^{T}A_{n,m}\), i.e., \(f_{\alpha}(A_{n,m}^{T}A_{n,m})A_{n,m}^{T}A_{n,m} \approx I\) and, at the same time, \(f_{\alpha}(A_{n,m}^{T}A_{n,m})\) does not propagate the noise ε present in \(\mathbf{g}_{n}\).

The matrix function fα is defined so as to have the property

$$ f_{\alpha}(\lambda) \lambda \approx h_{\alpha}(\lambda), $$
(4)

where hα(λ) is a differentiable approximation of the Heaviside step function at α, i.e.,

$$ {\text{given }\beta > 0,} \quad h_{\alpha}(\lambda) = \left\lbrace\begin{array}{ll} 1, & \lambda > \alpha+\beta^{-1},\\ 1/2, & \lambda = \alpha,\\ 0, & \lambda < \alpha-\beta^{-1}. \end{array}\right. $$
(5)

In this way, we are able to filter the spectrum of the inverse of \(A_{n,m}^{T}A_{n,m}\), clustering at zero the eigenvalues smaller than α and inverting the remaining ones. As a result, the eigencomponents corresponding to the eigenvalues clustered at zero are neglected in the approximate solution, and the propagation of the noise due to the ill-conditioning is avoided.

In this paper, we consider a smooth approximation of the Heaviside step function (5) previously proposed in [14, 15] for the analysis of electronic structures. Moreover, in order to obtain a fast matrix-function evaluation, we consider a Krylov subspace method of polynomial type based on either the Arnoldi or the Lanczos decomposition, according to the properties of the matrix \(A_{n,m}\). Hence, we obtain a hybrid method combining the approximate Heaviside step function with the iterative regularization induced by the iterative computation of the Krylov subspace. Our proposal is independent of the particular structure of \(A_{n,m}\) and allows us to work in a matrix-free framework, requiring only the matrix-vector product operation.

The parameter α will be fixed according to the noise level, assumed to be known, while the computation of the Krylov basis is stopped by combining the discrepancy principle with a new criterion based on the stagnation of the norm of the residual. The numerical results presented cover different kinds of inverse problems and show that the proposed framework is general and robust. To enhance the reproducibility of the presented results, the codes of the new algorithms are publicly available (see https://github.com/Cirdans-Home/IRfun).

The paper is organized as follows. In Section 2 we discuss the construction of the approximation (3), connecting it to the classical works on regularization [1]. In Section 3 we describe the actual computation of our matrix-function regularization and provide two suitable stopping criteria. In Section 4 we apply the proposed procedures to several test problems, demonstrating the effectiveness of our proposal. Finally, Section 5 is devoted to some concluding remarks.

2 Motivations and literature review

2.1 Filter-based regularization methods

In this section we briefly summarize the framework of filter-based regularization methods. The notation and results are entirely borrowed from [1], to which the interested reader is referred for further details. Let us consider a compact linear operator \(A:{\mathscr{X}} \rightarrow {\mathscr{Y}}\) between Hilbert spaces \({\mathscr{X}}\), \({\mathscr{Y}}\), and denote by \(A^{*} : {\mathscr{Y}} \rightarrow {\mathscr{X}}\) the adjoint operator of A. We denote by \(A^{\dagger}\) the Moore-Penrose inverse of A, and by \(D(A^{\dagger})\) its domain.

We are interested in an ill-posed (in the Hadamard-sense) linear operator equation of the form

$$ Ax=g. $$
(6)

The following results hold:

Theorem 1

Let \(g \in D(A^{\dagger})\). Then, Ax = g has a unique best-approximate solution, which is given by

$$ x^{\dagger}:=A^{\dagger}g. $$

We have, moreover, that

Theorem 2

Let \(g \in D(A^{\dagger})\); then \(x \in {\mathscr{X}}\) is a least-squares solution of Ax = g if and only if the normal equation

$$ A^{*}Ax=A^{*}g $$
(7)

holds.

From the above results it follows that \(x^{\dagger}\) solves (7), i.e.,

$$ A^{\dagger}=(A^{*}A)^{\dagger}A^{*}. $$

When considering problem (6) where only an approximation gδ of g is available and δ is the noise level, due to the unboundedness of \(A^{\dagger}\), the solution \(A^{\dagger}g^{\delta}\) is not a good approximation of \(x^{\dagger} = A^{\dagger}g\) (we suppose g to be attainable). This follows from the following simple observation: consider the singular value expansion (s.v.e.) \((\sigma_{i}; v_{i}, u_{i})_{i \in \mathbb{N}}\) of A; then, if \(g \in D(A^{\dagger})\), we have

$$ A^{\dagger}g=\sum\limits_{i=1}^{\infty}\frac{\langle g,u_{i}\rangle}{\sigma_{i}}v_{i}, $$
(8)

which clearly shows that errors in \(\langle g,u_{i}\rangle\) are propagated with a factor of 1/σi.
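
To make this amplification concrete, the following MATLAB sketch (with purely synthetic, hypothetical data) builds a matrix with rapidly decaying singular values and shows how the naive pseudo-inverse solution of a slightly perturbed system departs from the true one.

```matlab
% Minimal sketch with synthetic data: noise amplification through small singular values.
n = 64;
[U,~] = qr(randn(n));  [V,~] = qr(randn(n));   % random orthogonal factors
s = 10.^(-linspace(0,10,n))';                  % rapidly decaying singular values
A = U*diag(s)*V';
xtrue = V*ones(n,1);                           % a "true" solution
g  = A*xtrue;                                  % exact data
gd = g + 1e-6*randn(n,1);                      % noisy data, noise norm ~ 1e-6
xnaive = V*((U'*gd)./s);                       % A^+ g^delta: errors in <g,u_i> scaled by 1/sigma_i
fprintf('relative error of the naive solution: %.2e\n', norm(xnaive-xtrue)/norm(xtrue));
```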

In practice, problem (6) is approximated by a family of neighboring well-posed problems [1].

Definition 1

We call a regularization operator for \(A^{\dagger}\) any family of operators

$$ \{R_{\alpha}\}_{\alpha \in (0,\alpha_{0})}: \mathscr{Y}\rightarrow \mathscr{X}, \alpha_{0} \in (0,+\infty] $$
(9)

with the following properties:

  1. \(R_{\alpha }:{\mathscr{Y}}\rightarrow {\mathscr{X}}\) is bounded for every α;

  2. For every \(g \in D(A^{\dagger})\) there exists a parameter choice rule \(\alpha : \mathbb {R}_{+} \times {\mathscr{Y}} \rightarrow (0,\alpha _{0}) \subset \mathbb {R}\), α = α(δ,gδ), such that

    $$ \lim_{\delta \rightarrow 0} \sup \{ \alpha(\delta, g^{\delta}): g^{\delta} \in \mathscr{Y}, \|g-g^{\delta} \|\leq \delta \}=0, $$

    and

    $$ \lim_{\delta \rightarrow 0} \sup \{\|R_{\alpha(\delta,g^{\delta})}g^{\delta} -A^{\dagger}g \|: g^{\delta} \in \mathscr{Y}, \|g-g^{\delta}\|\leq \delta \}=0. $$

Thus, a regularization method consists of a regularization operator and a parameter choice rule, and it is convergent in the sense that, if the regularization parameter is chosen according to that rule, then the regularized solutions converge as the noise level tends to zero. Moreover, we have

Proposition 1

Let, for all α > 0, Rα be a continuous operator. Then, the family {Rα} is a regularization for \(A^{\dagger}\) if

$$ R_{\alpha} \rightarrow A^{\dagger} \text{ pointwise on } D(A^{\dagger}) \text{ as } \alpha \to 0. $$
(10)

We consider here a class of linear regularization methods based on spectral theory for selfadjoint compact linear operators. The basic idea is the following: let {Eλ} be a spectral family for \(A^{*}A\). The best-approximate solution \(x^{\dagger} = A^{\dagger}g\) can be characterized as

$$ x^{\dagger}= \int \frac{1}{\lambda}dE_{\lambda}A^{*}g. $$

When the above integral does not exist, due to the fact that 1/λ has a pole at 0, the problem Ax = g is ill-posed, and the idea is to replace 1/λ by a parameter-dependent family of functions fα(λ), i.e.,

$$ x_{\alpha}:=\int f_{\alpha}(\lambda)dE_{\lambda}A^{*}g=:f_{\alpha}(A^{*}A)A^{*}g, $$

i.e., we are considering

$$ R_{\alpha}:= \int f_{\alpha}(\lambda)dE_{\lambda}A^{*} $$
(11)

as a regularization operator for \(A^{\dagger}\).

The following result gives sufficient conditions for Rα as in (11) to be a regularization operator for \(A^{\dagger}\):

Theorem 3

Let, for all α > 0 and some ε > 0, \(f_{\alpha } : [0,\|A\|^{2}+\varepsilon ) \rightarrow \mathbb {R}\) fulfill the following assumptions: fα is piecewise continuous, and there exists a C > 0 such that

$$ |\lambda f_{\alpha}(\lambda)| \leq C $$

and

$$ \lim_{\alpha \to 0} f_{\alpha}(\lambda)=\frac{1}{\lambda} $$

for all \(\lambda \in (0,\|A\|^{2}]\). Then, for all \(g \in D(A^{\dagger})\)

$$ \lim_{\alpha \to 0} f_{\alpha}(A^{*}A)A^{*}g=A^{\dagger}g. $$

If \(g \notin D(A^{\dagger})\), then \(\lim _{\alpha \to 0}\|f_{\alpha }(A^{*}A)A^{*}g\|=+\infty \).

2.2 Tikhonov regularization and Heaviside step function

A special choice for fα which fulfills the assumptions of Theorem 3 is

$$ f_{\alpha}(\lambda):=\frac{1}{\lambda +\alpha}. $$

In this case we have

$$ x_{\alpha}^{\delta}= f_{\alpha}(A^{*}A)A^{*}g^{\delta}=\sum\limits_{i=1}^{\infty} \frac{\sigma_{i}}{\sigma_{i}^{2}+\alpha} \langle g^{\delta},u_{i}\rangle v_{i}. $$
(12)

On the one hand, formula (12) clearly shows the regularization capabilities of this choice of fα(λ), since errors in \(\langle g^{\delta},u_{i}\rangle\) are propagated not with factors 1/σi but with factors \(\sigma_{i}/(\sigma_{i}^{2}+\alpha)\); on the other hand, it shows the limitation of choosing a uniform shift of all the singular values: this choice is responsible for an over-damping of the largest singular values.
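
To illustrate the difference (on a hypothetical set of singular values), the following sketch compares the Tikhonov filter factors σi/(σi²+α) with the sharp cut-off hα(σi²)/σi: the former damps the largest singular values as well, while the latter leaves them untouched.

```matlab
% Sketch: Tikhonov filter factors vs. hard cut-off on hypothetical singular values.
sigma = 10.^(-linspace(0,8,50));           % synthetic, decaying singular values
alpha = 1e-1;                              % regularization / threshold parameter
tik   = sigma ./ (sigma.^2 + alpha);       % Tikhonov: sigma_i/(sigma_i^2 + alpha)
cut   = (sigma.^2 > alpha) ./ sigma;       % ideal filter: h_alpha(sigma_i^2)/sigma_i
% For the largest singular value (sigma_1 = 1) the ideal factor is 1/sigma_1 = 1,
% while Tikhonov returns 1/(1 + alpha), i.e., an over-damped value:
disp([1/sigma(1), tik(1), cut(1)]);
```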

Our proposal stems from the above observation. Considering the expansion (12) for a generic fα, we have

$$ x_{\alpha}^{\delta}= f_{\alpha}(A^{*}A)A^{*}g^{\delta}=\sum\limits_{i=1}^{\infty} f_{\alpha}(\sigma_{i}^{2}) \sigma_{i} \langle g^{\delta},u_{i}\rangle v_{i}. $$
(13)

In order to provide a regularized solution \(x_{\alpha}^{\delta}\), in light of (8), the function fα should be such that

$$ f_{\alpha}({\sigma_{i}^{2}}) {\sigma_{i}}= \left\lbrace\begin{array}{ll} 1/\sigma_{i}, & \text{ if } \ {\sigma_{i}^{2}}>\alpha \\ 0, & \text{ if }\ {\sigma_{i}^{2}}<\alpha, \end{array}\right. $$

i.e., it should be

$$ f_{\alpha}({\sigma_{i}^{2}}) {\sigma_{i}}^{2} \approx h_{\alpha} ({\sigma_{i}^{2}}), $$

from which we recover (5). Observe that the function

$$ f_{\alpha}(\lambda)= \frac{h_{\alpha}(\lambda)}{\lambda} $$
(14)

clearly satisfies the hypothesis of Theorem 3, and hence

$$ R_{\alpha}:= \int \frac{h_{\alpha}(\lambda)}{\lambda}dE_{\lambda}A^{*} $$

is a regularizing operator for A.

Remark 1

We remark in passing [1] that, when the operator A is already selfadjoint and positive semidefinite, one does not use Rα := fα(A∗A)A∗, but applies the theory of regularization methods for equations with selfadjoint operators, where Rα would be just fα(A).

Moreover, from now on, as is customary, we will suppose that an approximation \(A_{n,m}\) of A on finite-dimensional subspaces is available, i.e., that we have a discretization of the regularization operator [1]. In the following we then focus on the construction of a regularization approach of hybrid type, in which the regularization is effected both by the parameters defining the function fα(λ) and by the choice of the projection subspace (see, e.g., [11,12,13]).

2.3 Regularizing preconditioners

In this section, for the sake of simplicity, we will suppose that \(A_{n} \equiv A_{n,n}\) is a symmetric positive definite matrix. Then, using Remark 1, instead of solving problem (2), we can consider the solution of

$$ A_{n} \overline{\mathbf{x}}_{n} = \mathbf{g}_{n} \equiv \bar{\mathbf{g}}_{n} + \boldsymbol{\varepsilon}, \qquad A_{n }\in \mathbb{R}^{n \times n}, \mathbf{g}_{n},\boldsymbol{\varepsilon} \in \mathbb{R}^{n}, \quad n> 0. $$
(15)

During the last twenty years, intense research has been devoted to the computation of an approximate solution of (15) by coupling classic preconditioned Krylov iterative methods with preconditioners Pn able to prevent the propagation of the noise-corrupted components contained in \(\mathbf{g}_{n}\), i.e.,

$$ P_{n}^{-1} A_ n \overline{\mathbf{x}}_{n} = P_{n}^{-1} \mathbf{g}_{n}, $$
(16)

where Pn is a regularizing preconditioner [2,3,4, 16]. These can be heuristically understood as preconditioners Pn with a dual role: on the one hand, they have to speed up the convergence on the well-conditioned part of the spectrum of An; on the other, they simultaneously need to slow down the restoration of the most corrupted components of \(\mathbf{g}_{n}\).

To put things in perspective, we stress that this is not the only class of preconditioners used in regularization problems: there are many cases in which the objective is the same as in classic preconditioning, i.e., accelerating the solution of the underlying linear system. This happens, for example, when solving the linear systems arising in Tikhonov regularization; consider, e.g., [5, 6].

The class of regularizing matrix-algebra preconditioners Pn is well studied and can be described in a very general setting (see [3, Definition 3.1]).

Such a definition is built up by mimicking the procedure that is usually followed to produce regularizing preconditioners for matrices An of Toeplitz type, i.e.,

$$ [A_{n}(\kappa)]_{r,s} = a_{r-s}, \quad a_{k} = \frac{1}{2\pi} {\int}_{-\pi}^{\pi} \kappa(x) \exp(-ik x)dx, $$

for \(k \in \mathbb {Z}\) and κ a function in \(\mathbb {L}^{1}\) with one (or more) root(s). In this case the \(P_{n}^{-1}\) preconditioners are nothing more than matrices generated from a family of bounded functions approximating the unbounded function 1/κ. Specifically, if they are selected from an algebra \({\mathscr{M}}_{n}\) of matrices simultaneously diagonalized by an orthogonal transform Un [17, 18], i.e.,

$$ \mathscr{M}_{n} = \{ {M_{n}} \in \mathbb{C}^{n \times n} : {M_{n}} = U_{n} \operatorname{diag}(\mathbf{z}) U_{n}^{*}, \quad \mathbf{z} \in \mathbb{C}^{n}, U_{n}^{*}U_{n} = I_{n} \}, $$
(17)

then the construction of Pn can be achieved by applying a suitable filter fα to the diagonal term in the Schur decomposition of some suitably chosen matrix \(\overline {M}_{n} \in {\mathscr{M}}_{n}\) (e.g., the projection of An). This is nothing more than the computation of a filtering matrix function fα(λ) on the particular \(\overline{M}_{n}\) chosen, since what is then built is simply

$$ {P_{n}:=f_{\alpha}(\overline{M}_{n}) = U_{n} \operatorname{diag}(f_{\alpha}(\mathbf{z})) U_{n}^{*}.} $$
(18)

To this class of preconditioners belong all the combinations that can be built taking \({\mathscr{M}}_{n}\) as a trigonometric algebra and fα a suitable filtering function in the sense of [3, Definition 3.1]. The steps needed in this approach include a careful selection of the algebra (17) onto which the problem is projected, and the selection of an appropriate filter function; both choices are strictly connected with the structure of the sequence of matrices {An}n, and severely affect the quality of the restored solution. An analogous observation holds for the regularizing structured preconditioners [7, 19], in which the preconditioner shares the structure of the underlying matrix.
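
As a concrete instance, when \({\mathscr{M}}_{n}\) is the circulant algebra (Un the normalized Fourier matrix), the construction (18) can be carried out with two FFTs per application; the sketch below assumes a hard-threshold filter and a hypothetical first column for the projected matrix \(\overline{M}_{n}\).

```matlab
% Sketch: filtered operator f_alpha(Mbar_n) = F diag(f_alpha(z)) F^* in the circulant algebra.
n  = 256;
c  = [1; 0.5; 0.1; zeros(n-5,1); 0.1; 0.5];    % hypothetical first column of Mbar_n
z  = fft(c);                                   % eigenvalues of the circulant Mbar_n
alpha = 1e-2;
fz = zeros(n,1);
idx = abs(z) > alpha;                          % well-conditioned part of the spectrum
fz(idx) = 1 ./ z(idx);                         % invert it; the rest stays clustered at zero
applyFilt = @(x) real(ifft(fz .* fft(x)));     % x -> f_alpha(Mbar_n) x in O(n log n)
```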

As a matter of fact, in many applications it is not possible to identify a useful structure in An from which to devise a proper matrix algebra \({\mathscr{M}}_{n}\).

The approach we propose in this work, synthesizing the above techniques and those of [8, 9], can be applied independently of the particular structure of An, avoiding the need to devise an appropriate matrix algebra \({\mathscr{M}}_{n}\). In particular, it allows us to work with the matrices {An}n in a matrix-free framework, i.e., the only information gathered from the matrices is the matrix-vector product (Krylov-type techniques).

2.4 The matrix-function technique

To give a precise idea of the general framework of our approach, if An is symmetric and positive definite, we have

$$ \begin{array}{@{}rcl@{}} f_{\alpha}(A_{n}) \mathbf{g}_{n} &= & U f_{\alpha}({\Lambda}_{A_{n}}) U^{H} \mathbf{g}_{n} = \sum\limits_{j=1}^{n} f_{\alpha}(\lambda_{j}) (\mathbf{u}_{j}^{T} \mathbf{g}_{n}) \mathbf{u}_{j} \\ &= & \sum\limits_{j : \lambda_{j} \leq \alpha} f_{\alpha}(\lambda_{j}) (\mathbf{u}_{j}^{T} \mathbf{g}_{n}) \mathbf{u}_{j} + \sum\limits_{j : \lambda_{j} > \alpha} f_{\alpha}(\lambda_{j}) (\mathbf{u}_{j}^{T} \mathbf{g}_{n}) \mathbf{u}_{j}, \end{array} $$
(19)

where α is a suitable threshold parameter and \(f_{\alpha}:\mathbb {R}_{+}\rightarrow \mathbb {R}\).

Since the eigencomponents related to {j : λj ≤ α} are those responsible for the propagation of the noise contained in \(\mathbf{g}_{n}\), while those related to {j : λj > α} are the ones for which it is possible to reconstruct the signal without incurring noise propagation, we devise the use of an fα(λ) such that

$$ f_{\alpha}(\lambda) \approx h_{\alpha}(\lambda) \lambda^{-1}, $$
(20)

where hα(λ) is the Heaviside step function at α as in (5). According to this choice, we are setting in (19)

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{j : \lambda_{j} \leq \alpha} f_{\alpha}(\lambda_{j}) (\mathbf{u}_{j}^{T} \mathbf{g}_{n}) \mathbf{u}_{j} \approx 0, \\ &&\sum\limits_{j : \lambda_{j} > \alpha} f_{\alpha}(\lambda_{j}) (\mathbf{u}_{j}^{T} \mathbf{g}_{n}) \mathbf{u}_{j} \approx \sum\limits_{j : \lambda_{j} > \alpha} \frac{1}{\lambda_{j}} (\mathbf{u}_{j}^{T} \mathbf{g}_{n}) \mathbf{u}_{j}. \end{array} $$

To build such a matrix function we need a suitable regular approximation of the Heaviside step function. This problem has been addressed in a completely different setting, namely the analysis of electronic structures in quantum chemistry and solid-state physics [14, 15]; hence we use the approximation

$$ f_{\alpha}(A_{n}) = h_{\alpha}(A_{n})A_{n}^{-1} \approx \frac{1}{2} \left[1 + \tanh\left( \beta (A_{n} - \alpha I_{n}) \right)\right]A_{n}^{-1}, \qquad \alpha,\beta > 0. $$
(21)

Since we want to avoid any computation of \(A_{n}^{-1}\mathbf{x}\) in this context, to compute fα(An)gn we resort to a Krylov subspace method of polynomial type based on either the Lanczos decomposition, if the An are symmetric, or the Arnoldi decomposition, if the An are nonsymmetric.
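
As a scalar function, the filter in (21) is straightforward to code; a minimal sketch, with illustrative parameter values, is the following.

```matlab
% Sketch: scalar version of the filter (21), f_alpha(lambda) ~ h_alpha(lambda)/lambda.
alpha = 1e-2;  beta = 1e9;                             % threshold and sharpness (illustrative values)
h_alpha = @(lam) 0.5*(1 + tanh(beta*(lam - alpha)));   % smooth Heaviside step
f_alpha = @(lam) h_alpha(lam) ./ lam;                  % filtered inverse, applied elementwise
% On a small projected matrix H (e.g., H_k or T_k below) one evaluates f_alpha
% spectrally: [W,D] = eig(H);  fH = W*diag(f_alpha(diag(D)))/W;
```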

2.5 Fixing the parameters

As for all regularization methods, we need a suitable way of fixing the various parameters defining the method. In our case, we have to discuss the choice of α and β for fα(λ).

From Fig. 1, the effects of the two parameters are clear: the value of α regulates which part of the spectrum is filtered, while β controls how sharp the thresholding is. Of course, the choice of α and β should depend on the noise level and on the decay rate of the eigenvalues/singular values of An.

Fig. 1 Regularized version of hα(λ) (left) and of fα(λ) (right) for α = 10^{-1}, 10^{-2} and several values of β

A reasonable heuristic is to take α slightly smaller than the noise level and β such that 1/β ≪ α, in order to avoid the situations represented by the dashed lines in Fig. 1, where some small eigenvalues end up being magnified in the inversion procedure.
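
In code, assuming the noise norm δ is known, the heuristic reduces to a couple of assignments (the values mirror those used in Section 4, but are not prescribed by the method).

```matlab
delta = 1e-2;          % hypothetical noise level ||epsilon||, assumed known
alpha = 0.1 * delta;   % threshold slightly below the noise level
beta  = 1e9;           % so that 1/beta << alpha and the step in (5) is sharp
```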

Theorem 4

Given the regularization problem in (1) and the regularizing function fα(An) in (21), when \(\delta \rightarrow 0\), for the parameter choice \(\alpha \propto \delta\) and 1/β < α, we find that

  1. If An,m = An, then \(f_{\alpha }(A_{n}) \mathbf {g}_{n} \rightarrow \bar {\mathbf {g}}_{n}\),

  2. otherwise, \(f_{\alpha }(A_{n,m}^{T} A_{n,m}) A_{n,m}^{T} \mathbf {g}_{n}\) converges to the least-squares solution of (1).

Proof

We simply need to observe that, by (20) and (5), \(f_{\alpha }(\lambda ) \rightarrow \lambda ^{-1}\) when \(\delta \rightarrow 0\) for the parameter choice \(\alpha \propto \delta\) and 1/β < α. □

3 Computing the matrix function

For the sake of readability, from this section on we fix the dimension of the problem n. We will write A instead of An, \(\overline{\mathbf{x}}\) instead of \(\overline{\mathbf{x}}_{n}\), and g instead of \(\mathbf{g}_{n}\).

The core of our approach is the computation of fα(A)g and, for this task, we will exploit Krylov subspace methods. This section is devoted to a careful review of these techniques.

We have seen in (18) that for diagonalizable matrices, computing fα(A)g amounts to the computation of the function fα on the eigenvalues of the matrix A. This procedure can be defined also in a more general setting [20], for an arbitrary matrix A and filter function f analytic inside a closed contour Γ enclosing the spectrum of A: the matrix function-vector product f(A)g can be defined as

$$ f(A)\mathbf{g}=\frac{1}{2\pi i}{\int}_{\varGamma }f(z)(zI -A)^{-1}\mathbf{g} d z. $$
(22)

An efficient class of algorithms for computing (22) is represented by projection methods. Let Vk be a matrix with orthonormal columns \(\mathbf {v}_{1},\dots , \mathbf {v}_{k}\) spanning an arbitrary subspace; then we can approximate f(A)g on that subspace as

$$ \begin{array}{ll} f(A)\mathbf{g} &= \frac{1}{2\pi i}{\int}_{\varGamma }f(z)(zI -A)^{-1} \mathbf{g}\, d z \\ &\approx \frac{1}{2\pi i}{\int}_{\varGamma }f(z)V_{k}(zI -{V_{k}^{T}}AV_{k})^{-1}{V_{k}^{T}}\mathbf{g}\, d z \\ &= V_{k}f({V_{k}^{T}}AV_{k}){V_{k}^{T}}\mathbf{g}. \end{array} $$
(23)

In this work we are interested in projection methods such that the columns of Vk span the kth Krylov subspace

$$ \mathscr{K}_{k}(A,\mathbf{g}) = \operatorname{Span}\{\mathbf{g},A\mathbf{g},\ldots,A^{k-1}\mathbf{g}\}. $$

In the case of a generic square matrix, the Arnoldi procedure with modified Gram-Schmidt builds an orthonormal basis Vk of \({\mathscr{K}}_{k}(A,\mathbf {g})\) satisfying the so-called Arnoldi relation

$$ A V_{k} = V_{k} H_{k} + h_{k+1,k} \mathbf{v}_{k+1} \mathbf{e}_{k}^{T}, $$
(24)

where Hk = (hi,j)i,j is an upper Hessenberg matrix of size k × k. Finally, the approximation in (23) is computed as

$$ \mathbf{f}_{k} = \gamma V_{k} f(H_{k})\mathbf{e}_{1}, \quad \gamma = \|\mathbf{g}\|, \mathbf{e}_{1} = (1,0,\ldots,0)^{T} \in \mathbb{R}^{k}. $$

In Algorithm 1 we give a synthetic presentation of the matrix-function-times-vector computation, which includes a reorthogonalization step.

[Algorithm 1: Arnoldi-based computation of f(A)g with reorthogonalization]
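
The structure of Algorithm 1 can be summarized in the following MATLAB sketch (Arnoldi with modified Gram-Schmidt, a full reorthogonalization pass, and the final evaluation fk = γ Vk f(Hk) e1); this is a simplified illustration, not the released IRfun implementation.

```matlab
function fk = funm_arnoldi(A, g, f, k)
% Sketch of Algorithm 1: approximate f(A)*g on K_k(A,g) via the Arnoldi
% relation A*V_k = V_k*H_k + h_{k+1,k} v_{k+1} e_k'.
n = size(A,1);
V = zeros(n, k+1);  H = zeros(k+1, k);
gamma = norm(g);
V(:,1) = g / gamma;
for j = 1:k
    w = A * V(:,j);
    for i = 1:j                              % modified Gram-Schmidt
        H(i,j) = V(:,i)' * w;
        w = w - H(i,j) * V(:,i);
    end
    h2 = V(:,1:j)' * w;                      % full reorthogonalization pass
    w = w - V(:,1:j) * h2;
    H(1:j,j) = H(1:j,j) + h2;
    H(j+1,j) = norm(w);
    if H(j+1,j) < eps, k = j; break; end     % lucky breakdown
    V(:,j+1) = w / H(j+1,j);
end
Hk = H(1:k,1:k);
[W, D] = eig(Hk);                            % f(H_k)*e_1 via the eigendecomposition of H_k
y = W * (f(diag(D)) .* (W \ eye(k,1)));
fk = real(gamma * V(:,1:k) * y);             % f_k = gamma*V_k*f(H_k)*e_1 (round-off imaginary parts discarded)
end
```

With the handle f_alpha defined above, a call such as fk = funm_arnoldi(A, g, f_alpha, 50) approximates fα(A)g with 50 Arnoldi steps.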

When the matrix A is symmetric, Algorithm 1 can be greatly simplified by using the Lanczos procedure to generate the basis of the Krylov subspace \({\mathscr{K}}_{k}(A,\mathbf {g})\). By this procedure we build an orthonormal basis Vk of \({\mathscr{K}}_{k}(A,\mathbf {g})\) satisfying a modified version of the Arnoldi relation (24)

$$ A V_{k} = V_{k} T_{k} + \alpha_{k+1} \mathbf{v}_{k+1} \mathbf{e}_{k}^{T}, $$
(25)

where Tk is a symmetric tridiagonal matrix of size k × k. Finally, the approximation in (23) is computed as

$$ \mathbf{f}_{k} = \gamma V_{k} f(T_{k})\mathbf{e}_{1}, \quad \gamma = \|\mathbf{g}\|, \mathbf{e}_{1} = (1,0,\ldots,0)^{T} \in \mathbb{R}^{k}. $$

As in the nonsymmetric case, the Lanczos algorithm can suffer from a loss of orthogonality in the computed vectors; thus Algorithm 2 also includes a full reorthogonalization step.

[Algorithm 2: Lanczos-based computation of f(A)g with full reorthogonalization]
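
Analogously, the structure of the Lanczos-based Algorithm 2 can be sketched as follows (again a simplified illustration rather than the released code); the input Afun may be either a symmetric matrix or a handle computing matrix-vector products, which enables the matrix-free usage mentioned above.

```matlab
function fk = funm_lanczos(Afun, g, f, k)
% Sketch of Algorithm 2: approximate f(A)*g for symmetric A via the Lanczos
% relation A*V_k = V_k*T_k + beta_{k+1} v_{k+1} e_k', with full reorthogonalization.
if isnumeric(Afun), Amat = Afun; Afun = @(x) Amat*x; end   % accept a matrix or a handle
n = numel(g);
V = zeros(n, k+1);
a = zeros(k,1);  b = zeros(k,1);             % diagonal / off-diagonal entries of T_k
gamma = norm(g);
V(:,1) = g / gamma;
for j = 1:k
    w = Afun(V(:,j));
    if j > 1, w = w - b(j-1) * V(:,j-1); end
    a(j) = V(:,j)' * w;
    w = w - a(j) * V(:,j);
    w = w - V(:,1:j) * (V(:,1:j)' * w);      % full reorthogonalization pass
    b(j) = norm(w);
    if b(j) < eps, k = j; break; end         % lucky breakdown
    V(:,j+1) = w / b(j);
end
Tk = diag(a(1:k)) + diag(b(1:k-1), 1) + diag(b(1:k-1), -1);
[W, D] = eig(Tk);                            % T_k is symmetric, so W is orthogonal
y = W * (f(diag(D)) .* (W' * eye(k,1)));     % f(T_k)*e_1
fk = gamma * V(:,1:k) * y;                   % f_k = gamma*V_k*f(T_k)*e_1
end
```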

As a matter of fact, we will use both Algorithms 1 and 2 in an iterative fashion, and we need to provide a stopping rule to completely specify our proposal. In Section 3.1 we discuss and clarify this issue.

We conclude this section by highlighting a connection between our proposal and the standard hybrid Krylov approach. Specifically, we stress that, under some hypotheses on the Krylov subspace, the methods produce the same iterates.

Proposition 2

If \(V_{k} f_{\alpha }({V_{k}^{T}} A V_{k}) {V_{k}^{T}} \mathbf {e}_{1} \equiv V_{k} ({V_{k}^{T}} A V_{k})^{-1} {V_{k}^{T}} \mathbf {e}_{1}\) then

  • fk coincides with the k th iteration of the CG Algorithm, if A is SPD, and Algorithm 2 is used;

  • fk coincides with the k th iteration of the CGLS Algorithm, if fα is computed on ATA, and Algorithm 2 is used;

  • fk coincides with the k th iteration of the FOM Algorithm, if fα is computed on A, and Algorithm 1 is used.

Proof

All the properties follow straightforwardly from the standard construction of the CG and CGLS algorithms via the Lanczos orthogonalization procedure, and of FOM via the Arnoldi orthogonalization procedure. We refer to [21] for the construction of the related algorithms. □

3.1 The stopping criterion

A well-known stopping criterion for regularization is the discrepancy principle. If we consider problem (1) and denote by fk an approximation of the solution \(\overline {\mathbf {x}}\) obtained in an iterative fashion, the basic idea behind this criterion is to stop the iteration of the chosen method as soon as the norm of the residual \(\mathbf{r}_{k} = \mathbf{g} - A\mathbf{f}_{k}\) is sufficiently small, typically of the same size as δ, i.e., the norm of the perturbation ε in the right-hand side of (1). In our setting, \(\mathbf {f}_{k}=V_{k} f_{\alpha }({V_{k}^{T}} A V_{k}) {V_{k}^{T}} \mathbf {g}\), and we can easily monitor the residuals

$$ \|\mathbf{r}_{k} \|=\|\mathbf{g} - A\mathbf{f}_{k}\|=\| \mathbf{g} - AV_{k} f_{\alpha}({V_{k}^{T}} A V_{k}) {V_{k}^{T}} \mathbf{g} \|. $$
(26)

The discrepancy principle, based on these quantities, reads as

$$ \text{ select the smallest }k\text{ such that }\|\mathbf{r}_{k} \|/ \|\mathbf{g} \| \leq \frac{\eta \cdot \delta}{\|\mathbf{g}\|} , $$
(27)

where η ≥ 1 and we call NoiseLevel the relative noise level ∥ε∥/∥g∥ ≡ δ/∥g∥. Observe now that, since fα does not coincide, in general, with the function 1/λ, the quantity ∥rk∥ is not guaranteed to converge to 0 as k increases, as it does in the linear-system solution framework. In general, it will stabilize as the dimension of the Krylov space increases: when \(k \rightarrow n\), fk becomes an increasingly better approximation of f(A)g, and hence \(\| \mathbf {g} - AV_{k} f_{\alpha }({V_{k}^{T}} A V_{k}) {V_{k}^{T}} \mathbf {g} \|\) converges to the quantity \(\|\mathbf{g} - A f(A)\mathbf{g}\| \neq 0\), since \(f(A) \neq A^{-1}\).

In order to improve the stopping capabilities of our regularized reconstruction algorithm, we need to add a further stopping criterion. With this in mind, we consider the sequence of residual-norm differences {ck := |∥rk∥ − ∥rk−1∥|}k, which is such that

$$ |\|\mathbf{r}_{k}\|-\|\mathbf{r}_{k-1}\||\leq \|\mathbf{r}_{k}-\mathbf{r}_{k-1}\|= \|A\mathbf{f}_{k}-A\mathbf{f}_{k-1}\|, $$
(28)

and thus goes to zero as k increases. We select as a stopping criterion the following:

$$ \text{ select the smallest }k\text{ such that } |\|\mathbf{r}_{k}\|-\|\mathbf{r}_{k-1}\||/\|\mathbf{g}\| \leq \frac{\eta \cdot \delta}{\|\mathbf{g}\|}. $$
(29)

Heuristically, this choice is supported by the fact that, due to the presence of noise, it is not possible to discern among reconstructions giving rise to residuals that differ by a quantity of the same order as the noise level. Observe, moreover, that from (28) it is clear that the stopping criterion (29) will eventually be satisfied. Finally, we point out that performing the stopping check in the form (29) is very cheap in space and time once the residual norms have been computed.

We stress again that the method we are using generates a solution that lies in the Krylov subspace \({\mathscr{K}}_{k}(A,\mathbf {g})\), or in \({\mathscr{K}}_{k}(A^{T}A,A^{T}\mathbf {g})\), for the fixed choice of the parameter α made in Section 2.5. Using techniques analogous to those in [11,12,13], it is possible to devise a suitable hybrid strategy that adaptively selects an optimal threshold α within the given Krylov space of order k, i.e., we could connect the choices of k and α in an adaptive way.

In Algorithm 3 we present the full pseudo-code of our proposal for the case in which the Arnoldi procedure is used; the Lanczos-based case is obtained similarly from Algorithm 2.

[Algorithm 3: matrix-function regularization with the Arnoldi procedure and the stopping criteria of Section 3.1]
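
One possible realization of Algorithm 3 in the Arnoldi case is sketched below, combining the basis construction with the stopping rules (27) and (29); parameter names and breakdown handling are illustrative, and the released IRfun code should be consulted for the actual implementation (reorthogonalization is omitted here for brevity).

```matlab
function [x, k] = irfun_arnoldi_sketch(A, g, delta, kmax, eta)
% Sketch of Algorithm 3: regularized solution x ~ f_alpha(A)*g with the
% discrepancy (27) and residual-stagnation (29) stopping criteria.
alpha = 0.1*delta;  beta = 1e9;                         % heuristic of Section 2.5
f = @(lam) 0.5*(1 + tanh(beta*(lam - alpha))) ./ lam;   % filter (21)
n = size(A,1);  gamma = norm(g);
V = zeros(n, kmax+1);  H = zeros(kmax+1, kmax);
V(:,1) = g / gamma;  rold = Inf;  x = zeros(n,1);
for k = 1:kmax
    w = A * V(:,k);
    for i = 1:k                                         % Arnoldi step (modified Gram-Schmidt)
        H(i,k) = V(:,i)' * w;  w = w - H(i,k) * V(:,i);
    end
    H(k+1,k) = norm(w);
    if H(k+1,k) > eps, V(:,k+1) = w / H(k+1,k); end
    [W, D] = eig(H(1:k,1:k));                           % f_alpha(H_k)*e_1, cheap: H_k is k-by-k
    y = W * (f(diag(D)) .* (W \ eye(k,1)));
    x = real(gamma * V(:,1:k) * y);                     % current iterate f_k
    rk = norm(g - A*x);                                 % residual norm (26)
    if rk <= eta*delta || abs(rk - rold) <= eta*delta   % criteria (27) and (29)
        return
    end
    rold = rk;
    if H(k+1,k) <= eps, return; end                     % breakdown: the space cannot be enlarged
end
end
```

A call like [x, k] = irfun_arnoldi_sketch(A, g, delta, 100, 1.01) then returns both the regularized solution and the stopping index.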

4 Numerical experiments

All the numerical experiments are performed on a Linux machine with an Intel® Xeon® Platinum 8176 CPU, 2.10 GHz, with 84 GB of RAM. The code is written and executed in MATLAB 9.3.0.713579 (R2017b). The regularization routines used for comparison and the test problems are generated with the packages AIR Tools II [22] and IR Tools [23]. The algorithm presented here, together with the scripts generating the examples, is available as the IRfun MATLAB function at https://github.com/Cirdans-Home/IRfun.

In order to study the effectiveness of our stopping criteria, we organize the tests by comparing the best PSNR achievable within the maximum allotted iterations with the results obtained by employing the stopping criteria discussed in Section 3.1. Moreover, the stability of the algorithm with respect to the choice of the parameter α in fα(λ) is also investigated here.

The methods we used for comparison are the standard Krylov methods for regularization implemented in IR Tools [23], i.e., CGLS, Preconditioned CGLS [4], Range Restricted GMRES (RR-GMRES), hybrid GMRES, GMRES with \(\ell_{1}\) penalty [24], and the hybrid LSQR method [25]. Moreover, we also consider the solution with the Tikhonov regularization method, in which we solve the auxiliary linear system with the CGLS method and compute the optimal parameter λ by means of the L-curve criterion using the SVD of the matrix A. We report, moreover, the computational time for every method. To conclude, let us stress that our proposal is a linear method exploiting the information from Krylov subspaces: for this reason, in the comparisons, we restrict ourselves to the linear methods mentioned above, not taking into account nonlinear techniques.

4.1 Deblurring problems: fα(A) for A symmetric

In all the examples in this section, the parameters α and β for the matrix-function regularization are selected as α = δ × 10^{-1} and β = 10^{9}. We perform a fixed number of iterations (100) for each method, independently of whether a stopping criterion is satisfied or not; for all the comparison methods the discrepancy principle is used to pick the k at which the iterations are halted, while for our proposal we use the stopping criteria described in Section 3.1, i.e., the minimum k between the two satisfying (27) and (29). The parameter λ for the Tikhonov method is set by using the SVD/L-curve criterion, and the CGLS method is used to obtain the solution of the resulting linear system. We report in every table both the number of iterations and the achieved PSNR: the ones in brackets refer to the best-reconstructed solution, while the others correspond to the results obtained by applying the stopping criteria. For the Tikhonov-CGLS method the two quantities coincide, since the regularization is obtained by the shift λ and the linear system is then solved to the highest accuracy. For the preconditioned CGLS we use the algebra preconditioner \(P_{(i)}(A):= {\mathscr{L}}_{A^{2^{i}}}^{\frac {1}{2^{i-1}}} {\mathscr{L}}_{A}^{-1}\) introduced by the authors in [4], where \({\mathscr{L}}_{A}\) is the projection of the matrix A on the space sd F of matrices simultaneously diagonalized by the two-level Fourier transform

$$ \mathscr{L} := \operatorname{sd} F=\{F\operatorname{diag}(\mathbf{z})F^{*} : \mathbf{z} \in \mathbb{C}^{n}\}. $$

Specifically, we always test the various preconditioners obtained for i = 1,…,8, and take the one giving the best results. Observe that in this way we always consider also the classic optimal and super-optimal preconditioners (see [4, 17]).

Satellite

The first example is the ‘satellite' image (Fig. 2) with a mild, medium, and severe Gaussian blur generated by the PRblurgauss() function. Since the images have a zero boundary, we use zero Dirichlet boundary conditions to assemble the matrix A. To the right-hand side we add Gaussian noise of level

$$ \delta = \{10^{-3}, 10^{-2}, 10^{-1}, 2\times10^{-1}, 3\times 10^{-1}, 5\times 10^{-1}\} $$
(30)

by means of the function PRnoise(b,‘gauss',delta).

Fig. 2 ‘satellite' test problem with different Gaussian blur generated by the PRblurgauss() function: (a) mild, (b) medium, (c) severe

For the sake of completeness, for this particular problem set we also compare our approach with the CGLS method preconditioned by the approximate inverse Toeplitz preconditioner proposed in [19]. To have a fair comparison with the CGLS and PCGLS methods, we apply fftPrec() from the Regularization Tools package [26] to the CGLS method (the associated threshold for fftPrec() has been set using the generalized cross-validation method). In the following, this method will be denoted as fftPCGLS.

The results are collected in Table 1, when reorthogonalization is used for the Lanczos method, and in Table 2 without reorthogonalization.

Table 1 ‘satellite' test problem, three types of Gaussian blur (mild, medium, severe), different levels of noise δ. The parameter for the Tikhonov method is set by using the SVD/L-curve criterion; the parameter β = 10^{9}
Table 2 ‘satellite' test problem, three types of Gaussian blur (mild, medium, severe), different levels of noise δ without reorthogonalization (i.e., Reorth =‘Off'). To be compared with the results in Table 1

An example of the ‘satellite' image reconstructed by the various algorithms is given in Fig. 3.

Fig. 3 Reconstructed signals for the ‘satellite' test problem, for δ = 5e−1 and medium level of blur

The first observation we can make is that the stopping criteria work efficiently on all the test cases: the PSNR of the best-reconstructed signal and the one obtained at the stopping iteration are comparable. For high levels of noise (δ ≥ 2e−1) and severe blur, the combination of the matrix-function routine and the stopping criteria delivers better results than the comparison methods in the majority of cases. In the comparison with the hybrid Krylov methods, we should mention that, in some cases, the latter achieve slightly better results in terms of the combined PSNR/stopping behavior, but at the cost of a higher execution time. Generally, the timings of our proposal are smaller than those needed to solve the problem with the Tikhonov approach and are comparable with those of the other Krylov-type methods. There is some improvement in the timings when no reorthogonalization is used, at the cost of a slightly lower PSNR.

We are also interested in investigating the sensitivity of the proposed approach with respect to the regularization parameters α and β of Section 2.5 in terms of achieved PSNR. In Fig. 4 we report the PSNR of the best reconstruction in 100 iterations obtained by the matrix-function iterative regularization with the function fα(A), varying the parameter α around the value selected in Section 2.5, i.e., α = δ × 10^{-1}.

Fig. 4 Sensitivity with respect to the choice of α and β for the matrix-function regularization algorithm based on the Lanczos procedure with reorthogonalization for the ‘satellite' test problem. The sensitivity with respect to β is evaluated at the value of α selected for the experiments: (a) mild, (b) medium, (c) severe

We repeat the test for all the blur levels in Fig. 2 and the noise levels δ in (30). We report, moreover, the same stability test for β. What we observe is that the selected parameters always lie in a flat zone of the graph: even if we vary them slightly, the resulting restoration quality is unchanged.

4.2 Space-variant problems: fα(A)

We also consider the regularization of problems that are not spatially invariant; namely, we consider the first-kind Fredholm integral equation of Phillips [27]. This problem defines the function

$$ \phi(x) = \left\lbrace\begin{array}{ll} 1 + \cos(x\pi/3), & |x| < 3, \\ 0, & |x| \geq 3, \end{array}\right. $$

and tries to recover it by means of the kernel K(s,t) = ϕ(s − t), with right-hand side \(g(s) = (6-|s|)(1+.5\cos \limits (s\pi /3)) + 9/(2\pi )\sin \limits (|s|\pi /3)\) on the interval [− 6,6]. The second problem of this class that we investigate is the discretization of the 1D gravity surveying model problem [28], in which a mass distribution \(f(t) = \sin \limits (\pi t) + 0.5\sin \limits (2\pi t)\) is located at depth d = 0.25, while the vertical component of the gravity field g(s) is measured at the surface. The resulting problem is again a first-kind Fredholm integral equation, this time with kernel \(K(s,t) = d(d^{2} + (s-t)^{2})^{-\frac {3}{2}}\), in which the discretization of the data g(s) is obtained as g = Ax, where A is computed by means of a mid-point quadrature rule on the interval [0,1]. We again perturb the right-hand side with the noise levels in (30), and report the results in Table 3.
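
Following the description above, the 1D gravity operator and data can be assembled in a few lines (a sketch equivalent in spirit to, but not identical with, the generator of [28]):

```matlab
% Sketch: midpoint-rule discretization of the 1D gravity problem on [0,1].
n = 256;  d = 0.25;                            % grid size and source depth
t = ((1:n)' - 0.5) / n;                        % quadrature midpoints (source locations)
s = t;                                         % measurement points at the surface
A = (1/n) * d ./ (d^2 + (s - t').^2).^(3/2);   % kernel K(s,t) = d*(d^2 + (s-t)^2)^(-3/2)
x = sin(pi*t) + 0.5*sin(2*pi*t);               % mass distribution f(t)
g = A * x;                                     % exact data; noise as in (30) is then added
```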

Table 3 Space-variant problems: Phillips and Gravity for different levels of noise δ in (30), η = 1.01, and using reorthogonalization

The “–” reported for the RR-GMRES algorithm in Table 3b occurs when the algorithm generates a Hessenberg matrix that is numerically singular, and thus halts without giving back a result.

In Fig. 5 we report the PSNR of the best reconstruction in 100 iterations obtained by the matrix-function iterative regularization with the function fα(A), varying the parameter α around the value selected in Section 2.5, i.e., α = δ × 10^{-1}; we observe again that the selected parameter lies in a flat zone, i.e., small variations do not alter the resulting PSNR of the best reconstruction.

Fig. 5 Sensitivity with respect to the choice of the α parameter for the matrix-function regularization algorithm with reorthogonalization for the ‘Phillips' and ‘Gravity' test problems

4.3 Tomography problem: \(f(A_{n,m}^{T}A_{n,m})\) for An,m rectangular

Tomography problems are imaging problems in which the image has to be reconstructed from some of its sections, obtained through the use of a penetrating wave; the term literally means a “slice view”. In general, we expect to have to solve a problem of the form

$$ \text{find }\textbf{x}_{m} \in \mathbb{R}^{m}\text{ s.t. } A_{n,m} \mathbf{x}_{m} = \mathbf{g}_{n}, \quad n \neq m, \quad A_{n,m} \in\mathbb{R}^{n\times m}, \mathbf{g}_{n} \in \mathbb{R}^{n}. $$
(31)

Using the least-squares interpretation of problem (31), we can then compute our regularized solution as

$$ \mathbf{x}_{m} = f_{\alpha}(A_{n,m}^{T} A_{n,m}) A^{T}_{n,m}\mathbf{g}_{n}, $$

for the fα given in (21) by means of the matrix function algorithm based on the Lanczos orthogonalization process.
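
Since only matrix-vector products with An,m and its transpose are needed, the normal-equations operator can be passed as a handle to the Lanczos-based routine; a minimal sketch with hypothetical data, reusing the funm_lanczos sketch of Section 3, reads as follows.

```matlab
% Sketch (hypothetical data): x_m = f_alpha(A'*A) * A'*g with A'*A applied matrix-free.
A = sprandn(1200, 400, 0.02);                  % stand-in for the rectangular matrix A_{n,m}
xtrue = ones(400,1);
e = randn(1200,1);  e = 1e-2*norm(A*xtrue)*e/norm(e);   % 1% relative noise
delta = norm(e);  g = A*xtrue + e;
alpha = 0.1*delta;  beta = 1e9;
f_alpha = @(lam) 0.5*(1 + tanh(beta*(lam - alpha))) ./ lam;   % filter (21)
AtA = @(y) A'*(A*y);                           % normal-equations operator, matrix-free
x = funm_lanczos(AtA, A'*g, f_alpha, 100);     % Lanczos-based sketch from Section 3
```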

Parallel tomography

We consider the “line model” for creating a 2D X-ray tomography test problem with an N × N pixel domain, using p parallel rays for each angle 𝜃, generated by the paralleltomo routine from the AIR Tools II package. To the right-hand side we add Gaussian noise of level δ as in (30). In this example the maximum number of iterations is fixed at 800. The results are collected in Table 4. The relaxation parameters for the Kaczmarz, random Kaczmarz, Cimmino, and Landweber methods are set with the training routine train_relaxpar(A,b,x_true,@method,20), while the threshold parameter τ is computed with train_dpme(A,bl,x_true, @method,'DP',NoiseLevel,10,20,options).

Table 4 ‘paralleltomo' test problem, different levels of noise δ. η = 1.01. The relaxation parameters for Kaczmarz, random Kaczmarz, Cimmino, and Landweber methods are set with train_relaxpar(A,b,x_true,@method,20), while the threshold parameter τ is computed by train_dpme(A,bl,x_true,@method,'DP',NoiseLevel,10,20,options)

On this set of test problems, the considered methods exhibit different behaviors, which allows us to make a couple of interesting observations. Firstly, for higher levels of noise the discrepancy principle combined with the Kaczmarz, random Kaczmarz, Landweber, and Cimmino methods does not work well, i.e., the iteration at which it stops the method is very far from the one at which the best PSNR is achieved, whereas the discrepancy principle gives better results when combined with the standard CGLS algorithm.

Secondly, it is interesting to compare the best achievable reconstructions obtained by CGLS and by our proposal: the best obtained PSNR is the same in the two cases. This is due to the fact that the ill-conditioning of the matrices An,m for this test problem is caused by very large singular values and not by the presence of decaying ones. This behavior is in accordance with Theorem 4: the function fα(λ) does not “cut out” singular values of relatively small magnitude, since there are none, and hence the solution computed by our method coincides with the one provided by the CGLS method. Even if on this dataset our proposal does not compare favorably with CGLS in terms of execution time, the results obtained here suggest its typical use case: our proposal should be employed for problems exhibiting a fast decay of the smallest singular values to zero.

5 Conclusion and future perspectives

In this work we have introduced a hybrid method for the regularization of discrete inverse problems through the use of matrix functions computed with Krylov methods. This construction generalizes the approach used for structured regularizing preconditioners to cases in which a suitable structure is not easily devised. The theoretical justification of the approach is based on the spectral filtering framework for the inverse, or the pseudo-inverse, of an ill-posed operator. We have discussed heuristics for the stopping criterion and for the selection of the filter parameters that do not require further computations on the system matrix, needing only the knowledge of the norm of the additive noise; we plan to extend this to a fully adaptive and automatic choice of the parameters in the style of [11,12,13].

The numerical results show that the proposed methods behave consistently throughout different applications, and are able to deal with cases in which the noise level is high (> 20%). The comparison with the standard Krylov-type methods is promising in terms of achieved PSNR and timings.

Moreover, the proposed methods behave better than both the Tikhonov method and the fixed point methods (Kaczmarz, Cimmino, Landweber) with trained parameters in several test cases.

A matter of future investigation is the use of rational Krylov methods for the computation of the various matrix functions and the possible exploitation of different filtering functions within the same framework.