Journal of Scientific Computing, Volume 70, Issue 3, pp 990–1009

Multidirectional Subspace Expansion for One-Parameter and Multiparameter Tikhonov Regularization

Open Access

Abstract

Tikhonov regularization is a popular method to approximate solutions of linear discrete ill-posed problems when the observed or measured data is contaminated by noise. Multiparameter Tikhonov regularization may improve the quality of the computed approximate solutions. We propose a new iterative method for large-scale multiparameter Tikhonov regularization with general regularization operators based on a multidirectional subspace expansion. The multidirectional subspace expansion may be combined with subspace truncation to avoid excessive growth of the search space. Furthermore, we introduce a simple and effective parameter selection strategy based on the discrepancy principle and related to perturbation results.

Keywords

Tikhonov, Multiparameter Tikhonov, Generalized Krylov, Multidirectional subspace expansion, Subspace truncation, Subspace method, Linear discrete ill-posed problem, Regularization, Regularization parameter

Mathematics Subject Classification

15A29 65F10 65F22 65F30 65R30 65R32 

1 Introduction

We consider one-parameter and multiparameter Tikhonov regularization problems of the form
$$\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^{\ell } {\mu ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \qquad (\ell \ge 1), \end{aligned}$$
(1)
where \(\Vert \cdot \Vert \) denotes the 2-norm and the superscript i is used as an index. We focus on large-scale discrete ill-posed problems such as the discretization of Fredholm integral equations of the first kind. More precisely, assume A is an ill-conditioned or even singular \(m \times n\) matrix with \(m \ge n\), \({L^{i}}\) are \(p^{i} \times n\) matrices such that the nullspaces of A and \({L^{i}}\) intersect trivially, and \({\mu ^{i}}\) are nonnegative regularization parameters. Furthermore, assume \({\varvec{b}}\) is contaminated by an error \({\varvec{e}}\) and satisfies \(\varvec{b} = A {\varvec{x}}_\star + {\varvec{e}}\), where \({\varvec{x}}_\star \) is the exact solution. Finally, we assume that a bound \(\Vert {\varvec{e}}\Vert \le \varepsilon \) is available, so that the discrepancy principle can be used.

In one-parameter Tikhonov regularization (\(\ell = 1\)), the choice of the regularization operator is typically significant, since frequencies in the nullspace of the operator remain unpenalized. Multiparameter Tikhonov can be used when a satisfactory choice of the regularization operator is unknown in advance, or can be seen as an attempt to combine the strengths of different regularization operators. In some applications, using more than one regularization operator and parameter allows for more accurate solutions [1, 2, 17, 20].

Solving (1) for large-scale problems may be challenging. In case the \({\mu ^{i}}\) are fixed a priori, methods such as LSQR [21] or LSMR [4] may be used. However, the problem becomes more complicated when the regularization parameters are not fixed in advance [12, 15, 17]. In this paper, we present a new subspace method consisting of three phases: a new expansion phase, a new extraction phase, and a new truncation phase. To be more specific, let \(\mathcal {X}_k\) be a subspace of dimension \(k \ll n\), and let the columns of \(X_k\) form an orthonormal basis for \(\mathcal {X}_k\). Then we can compute matrix decompositions
$$\begin{aligned} A X_k= & {} U_{k+1} {\underline{H}}_k \nonumber \\ {L^{i}} X_k= & {} {V_k^{i}} {K_k^{i}} \qquad (i = 1, 2, \dots , \ell ), \end{aligned}$$
(2)
where \(U_{k+1}\) and \({V_k^{i}}\) have orthonormal columns, \(\beta {\varvec{u}}_1 = \varvec{b}\), \(\beta = \Vert {\varvec{b}}\Vert \), \({\underline{H}}_k\) is a \((k+1) \times k\) Hessenberg matrix, and \({K_k^{i}}\) is upper triangular. Denote \(\varvec{\mu } = ({\mu ^{1}}, \dots , {\mu ^{\ell }})\) for convenience. Now restrict the solution space to the range of \(X_k\), so that \({\varvec{x}}_k(\varvec{\mu }) = X_k {\varvec{c}}_k(\varvec{\mu })\), where
$$\begin{aligned} {\varvec{c}}_k(\varvec{\mu })= & {} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert A X_k {\varvec{c}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {L^{i}} X_k {\varvec{c}}\Vert ^2 \nonumber \\= & {} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {K_k^{i}} {\varvec{c}}\Vert ^2. \end{aligned}$$
(3)
The vector \({\varvec{e}}_1\) is the first standard basis vector of appropriate dimension. Our paper has three contributions. First, a new expansion phase where we add multiple search directions to the search space. Second, a new truncation phase which removes unwanted new search directions. Third, a new method for selecting the regularization parameters \({\mu _k^{i}}\) in the extraction phase. The three phases work alongside each other: the intermediate solution obtained in the extraction phase is preserved in the truncation phase, whereas the remaining perpendicular component(s) from the expansion phase are removed.

The paper is organized as follows. In Sect. 2 an existing nonlinear subspace method is discussed, whereafter we propose the new multidirectional subspace expansion of the expansion phase. Discussion of the truncation phase follows immediately. Section 3 is focused on discrepancy principle based parameter selection for one-parameter regularization. New lower and upper bounds on the regularization parameter are provided. Sections 4 and 5 describe the extraction phase. In the former, a straightforward parameter selection strategy for multiparameter regularization is given, in the latter, a justification using perturbation analysis. Numerical experiments are performed in Sect. 6 and demonstrate the competitiveness of our new method. We end with concluding remarks in Sect. 7.

2 Subspace Expansion for Multiparameter Tikhonov

Let us first consider one-parameter Tikhonov regularization with a general regularization operator. Then \(\ell = 1\) and we write \(\mu = {\mu ^{1}}\), \(L = {L^{1}}\), and \(K_k = {K_k^{1}}\), such that (1) simplifies to
$$\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 + \mu \Vert L {\varvec{x}}\Vert ^2. \end{aligned}$$
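When \(L=I\) we use the Golub–Kahan–Lanczos bidiagonalization procedure to generate the Krylov subspace
$$\begin{aligned} \mathcal {K}_k(A^*A, A^*{\varvec{b}}) = \mathrm {span}\{ A^*{\varvec{b}}, (A^*A) A^*{\varvec{b}}, \dots , (A^*A)^{k-1} A^*{\varvec{b}} \}. \end{aligned}$$
In this case \({\underline{H}}_k\) is lower bidiagonal and \(K_k\) is the identity and
$$\begin{aligned} {\varvec{x}}_{k+1} = \frac{(I - X_k X_k^*) A^* {\varvec{u}}_{k+1}}{ \Vert (I - X_k X_k^*) A^* {\varvec{u}}_{k+1} \Vert }. \end{aligned}$$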
If \(L\ne I\) one can still try to use the above Krylov subspace [12]; however, it may be more natural to consider a shift-independent generalized Krylov subspace spanned by the first k vectors in
$$\begin{aligned}&\text {Group 0}\quad A^*{\varvec{b}} \\&\text {Group 1}\quad (A^*A) A^*{\varvec{b}}, (L^*L) A^*{\varvec{b}} \\&\text {Group 2}\quad (A^*A)^2 A^*{\varvec{b}}, (A^*A) (L^*L) A^*{\varvec{b}}, (L^*L) (A^*A) A^*{\varvec{b}}, (L^*L)^2 A^*{\varvec{b}} \\&\dots \end{aligned}$$
This generalized Krylov subspace was first studied by Li and Ye [18] and later by Reichel et al. [23]. An orthonormal basis can be created with a generalization of Golub–Kahan–Lanczos bidiagonalization [13]. However, while the search space grows linearly as a function of the number of matrix-vector products, the dimension of the generalized Krylov subspace grows exponentially as a function of the total degree of a bivariate matrix polynomial. As a result, if we take any vector in the \(k\)-dimensional generalized Krylov subspace and write it as \(p(A^*A, L^*L)A^* {\varvec{b}}\), where p is a bivariate polynomial, then p has at most degree \(\lfloor \log _2 k \rfloor \). This low degree may be undesirable, especially for small regularization parameters \(\mu \). Reichel and Yu [24, 25] solve this in part with algorithms that can prioritize one operator over the other. For instance, if \({\varvec{w}}\) is a vector in a group j and B has priority over A, then group \(j+1\) contains \((A^*A){\varvec{w}}\), \((B^*B){\varvec{w}}\), \((B^*B)^2{\varvec{w}}\), ..., \((B^*B)^\rho {\varvec{w}}\). The downside is that \(\rho \) is a user-defined constant, and that the expansion vectors are not necessarily optimal.
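The relation between the position of a basis vector and its polynomial degree can be checked with a short Python sketch. It is only an illustration: the names `generalized_krylov_words` and `max_degree`, and the encoding of \(A^*A\) and \(L^*L\) as the letters `A` and `L`, are assumptions made here, and no linear algebra is performed.

```python
from itertools import product


def generalized_krylov_words(max_group):
    """List operator words group by group.

    'A' stands for A^*A and 'L' for L^*L; the empty word is A^* b itself.
    """
    words = []
    for degree in range(max_group + 1):
        # Group `degree` holds all 2**degree products of `degree` factors.
        words.extend("".join(word) for word in product("AL", repeat=degree))
    return words


def max_degree(words, k):
    """Largest polynomial degree among the first k generating vectors."""
    return max(len(word) for word in words[:k])


words = generalized_krylov_words(4)
for k in (1, 2, 3, 4, 7, 8, 20):
    # k.bit_length() - 1 equals floor(log2(k)) for positive integers k.
    assert max_degree(words, k) == k.bit_length() - 1
    print(k, max_degree(words, k))
```

Within each group the words appear in the same order as in the listing of Group 0, Group 1, and Group 2 above, so the first 20 vectors reach degree 4 only.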
An alternative approach is a greedy nonlinear method described by Lampe et al. [17]. We briefly review their method and state a straightforward extension to multiparameter Tikhonov regularization. First note that the low-dimensional minimization in (3) simplifies to
$$\begin{aligned} {\varvec{c}}_k(\mu )&= \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert AX_k {\varvec{c}} - {\varvec{b}}\Vert ^2 {} + \mu \Vert LX_k {\varvec{c}}\Vert ^2 \\&= \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \mu \Vert K_k {\varvec{c}}\Vert ^2, \end{aligned}$$
in the one-parameter case. Next, compute a value \(\mu = \mu _k\) using, e.g., the discrepancy principle. It is easy to verify that
$$\begin{aligned}&A^* {\varvec{b}} - (A^* A + \mu _k L^* L) {\varvec{x}}_k(\mu _k)\\&\quad = A^* U_{k+1} (\beta {\varvec{e}}_1 - {\underline{H}}_k {\varvec{c}}_k(\mu _k)) {} - \mu _k L^* V_{k} K_k {\varvec{c}}_k(\mu _k) \end{aligned}$$
is perpendicular to the range of \(X_k\) and equals the negative gradient of the cost function
$$\begin{aligned} {\varvec{x}} \mapsto \frac{1}{2}( \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 + \mu \Vert L {\varvec{x}}\Vert ^2 ) \end{aligned}$$
at the point \({\varvec{x}}_k(\mu _k)\) with \(\mu = \mu _k\). Therefore, this vector is used to expand the search space. As usual, expansion and extraction are repeated until suitable stopping criteria are met.
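The orthogonality of the normal-equations residual to the search space can be illustrated numerically. The following Python sketch is a minimal example under simplifying assumptions: the matrices are real, the search space is one-dimensional (spanned by the normalized \(A^*{\varvec{b}}\)), the value of \(\mu \) is fixed arbitrarily instead of being chosen by the discrepancy principle, and all data and helper names are hypothetical.

```python
def matvec(M, v):
    """Product of a real matrix (list of rows) with a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rmatvec(M, v):
    """Product of the transpose of a real matrix with a vector."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical small test problem (illustrative values only).
A = [[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [0.0, 0.0, 0.5]]
L = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
b = [1.0, 2.0, 0.5, -1.0]
mu = 0.3

# One-dimensional search space spanned by the normalized A^T b.
Atb = rmatvec(A, b)
norm = dot(Atb, Atb) ** 0.5
x1 = [v / norm for v in Atb]

# Projected problem: minimize ||A x1 c - b||^2 + mu ||L x1 c||^2 over the scalar c.
Ax1, Lx1 = matvec(A, x1), matvec(L, x1)
c = dot(Ax1, b) / (dot(Ax1, Ax1) + mu * dot(Lx1, Lx1))
x = [c * v for v in x1]

# Residual of the normal equations: A^T b - (A^T A + mu L^T L) x.
AtAx = rmatvec(A, matvec(A, x))
LtLx = rmatvec(L, matvec(L, x))
residual = [p - q - mu * r for p, q, r in zip(Atb, AtAx, LtLx)]

print(abs(dot(x1, residual)))  # zero up to rounding errors
```

The residual itself is nonzero here, so it provides a genuinely new direction for the next expansion step.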
As previously stated, Lampe et al. [17] consider only one-parameter Tikhonov regularization; however, their method readily extends to multiparameter Tikhonov regularization. Again, the first step is to decide on regularization parameters \(\varvec{\mu }_k\). Next, use the residual of the normal equations
$$\begin{aligned}&A^* {\varvec{b}} - \Big ( A^* A + \sum _{i=1}^\ell {\mu _k^{i}}{L^{i}}^* {L^{i}} \Big ) {\varvec{x}}_k(\varvec{\mu }_k)\\&\quad = A^* U_{k+1} (\beta {\varvec{e}}_1 - {\underline{H}}_k {\varvec{c}}_k(\varvec{\mu }_k)) {} - \sum _{i=1}^\ell {\mu _k^{i}} {L^{i}}^* {V_k^{i}} {K_k^{i}} {\varvec{c}}_k(\varvec{\mu }_k), \end{aligned}$$
to expand the search space. Note that the residual is again orthogonal to the range of \(X_k\) and equals the negative gradient, at \({\varvec{x}}_k(\varvec{\mu }_k)\) with \(\varvec{\mu } = \varvec{\mu }_k\), of the cost function
$$\begin{aligned} {\varvec{x}} \mapsto \frac{1}{2}\Big ( \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \Big ). \end{aligned}$$
We summarize this multiparameter method in Algorithm 1, but remark that in practice we initially use Golub–Kahan–Lanczos bidiagonalization until a \(\varvec{\mu }_k\) can be found that satisfies the discrepancy principle.

Algorithm 1

(Generalized Krylov subspace Tikhonov regularization; extension of [17])

Input: Measurement matrix A, regularization operators \({L^{1}}\), ..., \({L^{\ell }}\), and data \({\varvec{b}}\).

Output: Approximate solution \({\varvec{x}}_k \approx {\varvec{x}}_\star \).
  1. Initialize \(\beta = \Vert {\varvec{b}}\Vert \), \(U_1 = {\varvec{b}} / \beta \), \(X_0 = []\), \({\varvec{x}}_0 = \mathbf{0 }\), and \(\varvec{\mu }_0 = \mathbf{0 }\).

     for \(k = 1, 2, \dots \) do

  2.    Expand \(X_{k-1}\) with \(A^* {\varvec{b}} - ( A^* A + \sum _{i=1}^\ell {\mu _{k-1}^{i}} {L^{i}}^* {L^{i}}) {\varvec{x}}_{k-1}\).

  3.    Update \(A X_k = U_{k+1} {\underline{H}}_k\) and \({L^{i}} X_k = {V_k^{i}} {K_k^{i}}\).

  4.    Select \(\varvec{\mu }_k\); see Sect. 4 and Algorithm 3.

  5.    Compute \({\varvec{c}}_k = {\varvec{c}}_k(\varvec{\mu }_k)\) from (3).

  6.    \( {\varvec{x}}_k = X_k {\varvec{c}}_k\).

  7.    if \(\Vert {\varvec{x}}_k - {\varvec{x}}_{k-1}\Vert /\Vert {\varvec{x}}_k\Vert \) is sufficiently small then break
     
Suitable regularization operators often depend on the problem and its solution. Multiparameter regularization may be used when a priori information is lacking. In this case, it is not obvious that the residual vector above is a “good” expansion vector, in particular if the intermediate regularization parameters \({\varvec{\mu }}_k\) are not necessarily accurate. Hence, we propose to remove the dependence on the parameters to some extent by expanding the search space with the vectors
$$\begin{aligned} A^* A {\varvec{x}}_k(\varvec{\mu }_k), \quad {L^{1}}^* {L^{1}} {\varvec{x}}_k(\varvec{\mu }_k), \quad \dots , \quad {L^{\ell }}^* {L^{\ell }} {\varvec{x}}_k(\varvec{\mu }_k), \end{aligned}$$
(4)
separately. Here, we omit \(A^* {\varvec{b}}\) as it is already contained in \(X_k\). Since we expand the search space in multiple directions, we refer to this expansion as a “multidirectional” subspace expansion. Observe that the previous residual expansion vector is in the span of the multidirectional expansion vectors.
It is unappealing for the search space to grow with \(\ell +1\) basis vectors per iteration, because the cost of orthogonalization and the cost of solving the projected problems depend on the dimension of the search space. Therefore, we wish to condense the best portions of the multiple directions in a single vector, and use the following approach. First we expand \(X_k\) with the vectors in (4) and obtain \({\widetilde{X}}_{k+\ell +1}\). Then we compute the decompositions
$$\begin{aligned} A {\widetilde{X}}_{k+\ell +1}{}= & {} {\widetilde{U}}_{k+\ell +2} \widetilde{{\underline{H}}}_{k+\ell +1} \\ {L^{i}} {\widetilde{X}}_{k+\ell +1} {}= & {} {{\widetilde{V}}_{k+\ell +1}^{i}} {{\widetilde{K}}_{k+\ell +1}^{i}} \qquad (i=1, 2, \dots , \ell ), \end{aligned}$$
analogously to (2) and determine parameters \({\varvec{\mu }}_{k+1}\) and the approximate solution \(\widetilde{\varvec{c}}_{k+\ell +1}\). Next, we compute
$$\begin{aligned} A ({\widetilde{X}}_{k+\ell +1} Z^*) {}= & {} ({\widetilde{U}}_{k+\ell +2} P^*) (P \widetilde{{\underline{H}}}_{k+\ell +1} Z^*) \nonumber \\ {L^{i}} ({\widetilde{X}}_{k+\ell +1} Z^*) {}= & {} ({{\widetilde{V}}_{k+\ell +1}^{i}} Q^{i*}) (Q^i {{\widetilde{K}}_{k+\ell +1}^{i}} Z^*) \qquad (i=1, 2, \dots , \ell ), \end{aligned}$$
(5)
where Z, P, and \(Q^i\) are orthonormal matrices of the form
$$\begin{aligned} Z = \begin{bmatrix} I_{k}&\\&Z_{\ell +1} \end{bmatrix}, \quad P = \begin{bmatrix} I_{k+1}&\\&P_{\ell +1} \end{bmatrix}, \quad Q^i = \begin{bmatrix} I_{k}&\\&Q^i_{\ell +1} \end{bmatrix}. \end{aligned}$$
(6)
Here \(I_k\) is the \(k\times k\) identity matrix and \(Z_{\ell +1}\) is an orthonormal matrix so that \(Z_{\ell +1} \widetilde{\varvec{c}}_{k+1:k+\ell +1} = \gamma {\varvec{e}}_1\) for some scalar \(\gamma \). The matrices \(P_{\ell +1}\) and \(Q^i_{\ell +1}\) are computed to make \(\widetilde{{\underline{H}}}_{k+\ell +1} Z^*\) and \({{\widetilde{K}}_{k+\ell +1}^{i}} Z^*\) respectively upper-Hessenberg and upper-triangular again. At this point we can truncate (5) to obtain
$$\begin{aligned} A X_{k+1}= & {} U_{k+2} {\underline{H}}_{k+1} \\ {L^{i}} X_{k+1}= & {} {V_{k+1}^{i}} {K_{k+1}^{i}} \qquad (i=1, 2, \dots , \ell ), \end{aligned}$$
and truncate \(Z\widetilde{\varvec{c}}_{k+\ell +1}\) to obtain \({\varvec{c}}_{k+1}\) so that \({\widetilde{X}}_{k+\ell +1} \widetilde{\varvec{c}}_{k+\ell +1} = X_{k+1}\varvec{c}_{k+1}\). The truncation is expected to keep important components, since the directions removed from the range of \({\widetilde{X}}_{k+\ell +1}\) are perpendicular to the current best approximation \({\varvec{x}}_{k+1}\), and also to the previous best approximations \({\varvec{x}}_{k}\), \({\varvec{x}}_{k-1}\), ..., \({\varvec{x}}_1\). If the rotation and truncation are combined in one step, then the computational cost of this step quickly becomes smaller than the (re)orthogonalization cost as k grows.
To illustrate our approach, let us consider a one-parameter Tikhonov example where \(\ell = 1\). First we expand \(X_1 = {\varvec{x}}_1\) with vectors \(A^*A{\varvec{x}}_1\) and \(L^*L {\varvec{x}}_1\). Let \(A {\widetilde{X}}_{1+2} = {\widetilde{U}}_{2+2} \widetilde{{\underline{H}}}_{1+2}\) and \(L {\widetilde{X}}_{1+2} = {\widetilde{V}}_{1+2} {\widetilde{K}}_{1+2}\), and use \(\widetilde{{\underline{H}}}_{1+2}\) and \({\widetilde{K}}_{1+2}\) to compute \(\widetilde{\varvec{c}}_{1+2}\). We then compute a rotation matrix \(Z_2\) so that \(Z_2 \widetilde{\varvec{c}}_{2:3} = \pm \Vert \widetilde{\varvec{c}}_{2:3}\Vert {\varvec{e}}_1\), and let Z be defined as in (6). The matrices \(\widetilde{{\underline{H}}}_{1+2} Z^*\) and \({\widetilde{K}}_{1+2} Z^*\) no longer have their original structure; hence, we need to compute orthonormal P and Q such that \(P \widetilde{{\underline{H}}}_{1+2} Z^*\) is again upper-Hessenberg and \(Q {\widetilde{K}}_{1+2} Z^*\) is upper-triangular. Schematically we have
$$\begin{aligned} \xrightarrow {\widetilde{\varvec{c}}_{1+2}^*}&\begin{bmatrix} \times & \times & \times \end{bmatrix} \xrightarrow {(Z\widetilde{\varvec{c}}_{1+2})^*} \begin{bmatrix} \times & \times & 0 \end{bmatrix} \\ \xrightarrow {\widetilde{{\underline{H}}}_{1+2}}&\begin{bmatrix} \times & \times & \times \\ \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \end{bmatrix} \xrightarrow {\widetilde{{\underline{H}}}_{1+2}Z^*} \begin{bmatrix} \times & \times & \times \\ \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{bmatrix} \xrightarrow {P\widetilde{{\underline{H}}}_{1+2}Z^*} \begin{bmatrix} \times & \times & \times \\ \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \end{bmatrix} \\ \xrightarrow {{\widetilde{K}}_{1+2}}&\begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \end{bmatrix} \xrightarrow {{\widetilde{K}}_{1+2}Z^*} \begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{bmatrix} \xrightarrow {Q{\widetilde{K}}_{1+2}Z^*} \begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \end{bmatrix} \end{aligned}$$
accompanied by the decompositions
$$\begin{aligned} A ({\widetilde{X}}_{1+2} Z^*)= & {} ({\widetilde{U}}_{2+2} P^*) (P \widetilde{{\underline{H}}}_{1+2} Z^*) \\ L ({\widetilde{X}}_{1+2} Z^*)= & {} ({\widetilde{V}}_{1+2} Q^*) (Q {\widetilde{K}}_{1+2}Z^*). \end{aligned}$$
At this point we truncate the subspaces by removing the last columns from \({\widetilde{X}}_{1+2} Z^*\), \({\widetilde{U}}_{2+2} P^*\), \(P \widetilde{{\underline{H}}}_{1+2} Z^*\), \({\widetilde{V}}_{1+2} Q^*\), and \(Q {\widetilde{K}}_{1+2} Z^*\), and the bottom rows of \(P \widetilde{{\underline{H}}}_{1+2} Z^*\) and \(Q {\widetilde{K}}_{1+2} Z^*\), to obtain
$$\begin{aligned} AX_2= & {} U_3 {\underline{H}}_2 \\ LX_2= & {} V_2 K_2. \end{aligned}$$
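The effect of the rotation \(Z_2\) on the basis and on the coefficient vector in this example can be sketched in a few lines of Python. The sketch covers only the rotation and truncation of \({\widetilde{X}}_{1+2}\) and \(\widetilde{\varvec{c}}_{1+2}\) for real data; restoring the structure of \(\widetilde{{\underline{H}}}_{1+2}\) and \({\widetilde{K}}_{1+2}\) with P and Q is not shown. The function names and the numerical values are hypothetical.

```python
from math import hypot


def rotate_and_truncate(columns, coeffs):
    """Rotate the two new basis vectors so that one of them carries no weight.

    `columns` holds three orthonormal vectors (the columns of X-tilde) and
    `coeffs` the coefficients of the current solution in that basis.
    """
    x1, x2, x3 = columns
    c1, c2, c3 = coeffs
    r = hypot(c2, c3)
    if r == 0.0:
        # The new directions do not contribute; keep x2 unchanged.
        return [x1, x2], [c1, 0.0]
    # Givens rotation Z_2 with Z_2 [c2, c3]^T = [r, 0]^T.
    cos, sin = c2 / r, c3 / r
    kept = [cos * a + sin * b for a, b in zip(x2, x3)]
    # The dropped direction -sin * x2 + cos * x3 has coefficient zero.
    return [x1, kept], [c1, r]


def combine(columns, coeffs):
    """Form the linear combination sum_j coeffs[j] * columns[j]."""
    n = len(columns[0])
    return [sum(c * col[i] for c, col in zip(coeffs, columns)) for i in range(n)]


# Hypothetical orthonormal basis of a three-dimensional search space in R^4.
X_tilde = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.6, 0.8, 0.0], [0.0, -0.8, 0.6, 0.0]]
c_tilde = [0.5, 3.0, 4.0]

X_new, c_new = rotate_and_truncate(X_tilde, c_tilde)
before = combine(X_tilde, c_tilde)
after = combine(X_new, c_new)
print(max(abs(p - q) for p, q in zip(before, after)))  # zero up to rounding
```

The truncated two-dimensional basis reproduces the same approximate solution, which is the property \({\widetilde{X}}_{1+2} \widetilde{\varvec{c}}_{1+2} = X_2 {\varvec{c}}_2\) used in the text.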
Below we summarize the steps of the new algorithm for solving problem (1). In our implementation we take care to use full reorthogonalization and avoid extending \(X_{k}\), \(U_{k+1}\), and \({V_k^{i}}\) with numerically linearly dependent vectors. We omit these steps from the pseudocode for brevity. In addition, we initially expand the search space solely with \(A^*{\varvec{u}}_{k+1}\) until the discrepancy principle can be satisfied in accordance with Proposition 1 in Sect. 3.

Algorithm 2

(Multidirectional Tikhonov regularization)

Input: Measurement matrix A, regularization operators \({L^{1}}\), ..., \({L^{\ell }}\), and data \({\varvec{b}}\).

Output: Approximate solution \({\varvec{x}}_k \approx {\varvec{x}}_\star \).
  1. Initialize \(\beta = \Vert {\varvec{b}}\Vert \), \(U_1 = {\varvec{b}} / \beta \), \(X_0 = []\), \({\varvec{x}}_0 = \mathbf{0 }\), and \(\varvec{\mu }_0 = \mathbf{0 }\).

     for \(k = 0, 1, \dots \) do

  2.    Expand \(X_k\) with \(A^* A {\varvec{x}}_{k}\), \({L^{1}}^* {L^{1}} {\varvec{x}}_{k}\), ..., \({L^{\ell }}^* {L^{\ell }} {\varvec{x}}_{k}\).

  3.    Update \(A{\widetilde{X}}_{k+\ell +1} = {\widetilde{U}}_{k+\ell +2} \widetilde{{\underline{H}}}_{k+\ell +1}\) and \({L^{i}} {\widetilde{X}}_{k+\ell +1} = {{\widetilde{V}}_{k+\ell +1}^{i}} {{\widetilde{K}}_{k+\ell +1}^{i}}\).

  4.    Select \(\varvec{\mu }_k\); see Sect. 4 and Algorithm 3.

  5.    Compute \(\widetilde{\varvec{c}}_{k+\ell +1}\) from the projected problem, analogously to (3).

  6.    Compute P, Q, and Z (see text).

  7.    Truncate \(A ({\widetilde{X}}_{k+\ell +1} Z^*) = ({\widetilde{U}} _{k+\ell +2} P^*) (P \widetilde{{\underline{H}}}_{k+\ell +1} Z^*)\) to \(A X_{k+1} = U_{k+2} {\underline{H}}_{k+1}\).

        Truncate \({L^{i}} ({\widetilde{X}}_{k+\ell +1} Z^*) = ({{\widetilde{V}}_{k+\ell +1}^{i}} Q^{i*}) (Q^i {{\widetilde{K}}_{k+\ell +1}^{i}} Z^*)\) to \({L^{i}} X_{k+1} = {V_{k+1}^{i}} {K_{k+1}^{i}}\).

  8.    Truncate \(Z\widetilde{\varvec{c}}_{k+\ell +1}\) to obtain \({\varvec{c}}_{k+1}\) and set \({\varvec{x}}_{k+1} = X_{k+1} {\varvec{c}}_{k+1}\).

  9.    if \(\Vert {\varvec{x}}_{k+1} - {\varvec{x}}_k\Vert /\Vert {\varvec{x}}_k\Vert \) is sufficiently small then break
     

We have completed our discussion of the expansion and truncation phases of our algorithm. In the following section we discuss the extraction phase for one-parameter Tikhonov regularization; the multiparameter case is discussed in later sections.

3 Parameter Selection in Standard Tikhonov

In this section we investigate parameter selection for general form one-parameter Tikhonov, where \(\ell = 1\), \(\mu = {\mu ^{1}}\), and \(L = {L^{1}}\). Multiple methods exist in the one-parameter case to determine particular \(\mu _k\), including the discrepancy principle, the L-curve criterion and generalized cross validation; see, for example, Hansen [11, Ch. 7]. We focus on the discrepancy principle which states that \(\mu _k\) must satisfy
$$\begin{aligned} \Vert A {\varvec{x}}_k(\mu _k) - {\varvec{b}}\Vert = \eta \varepsilon , \end{aligned}$$
(7)
where \(\Vert {\varvec{e}}\Vert \le \varepsilon \) and \(\eta >1\) is a user supplied constant independent of \(\varepsilon \).

Define the residual vector \({\varvec{r}}_k(\mu ) = A{\varvec{x}}_k(\mu ) - {\varvec{b}}\) and the function \(\varphi (\mu ) = \Vert {\varvec{r}}_k(\mu )\Vert ^2\). A nonnegative \(\mu _k\) satisfies the discrepancy principle if \(\varphi (\mu _k) = \eta ^2 \varepsilon ^2\). Root-finding methods can be used to compute such a \(\mu _k\); for example, Lampe et al. [17] compare four of them. We prefer bisection for its reliability and straightforward analysis and implementation. The performance difference with faster root finders is not an issue, because root finding requires only a fraction of the total computation time. A unique solution \(\mu _k\) exists under mild conditions; see, for instance, [3]. Below we give a proof using our own notation.

Assume \({\underline{H}}_k\) and \(K_k\) are full rank and let \(P_k {\varSigma }_k Q_k^*\) be the singular value decomposition of \({\underline{H}}_k K_k^{-1}\). Let the singular values be denoted by
$$\begin{aligned} \sigma _{\max } = \sigma _1 \ge \sigma _2 \ge \dots \ge \sigma _k = \sigma _{\min } > 0. \end{aligned}$$
(8)
Now we can express \({\varvec{c}}_k(\mu )\) and \(\varphi \) as
$$\begin{aligned} {\varvec{c}}_k(\mu )&= ({\underline{H}}_k^* {\underline{H}}_k + \mu K_k^* K_k)^{-1}{\underline{H}}_k^* \beta {\varvec{e}}_1\\&= K_k^{-1} (K_k^{-*} {\underline{H}}_k^* {\underline{H}}_k K_k^{-1} + \mu I)^{-1} K_k^{-*} {\underline{H}}_k^* \beta {\varvec{e}}_1 \\&= K_k^{-1} Q_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* \beta {\varvec{e}}_1 \end{aligned}$$
and
$$\begin{aligned} \varphi (\mu )&= \Vert \beta {\varvec{e}}_1 - {\underline{H}}_k{\varvec{c}}_k(\mu )\Vert ^2\\&= \beta ^2 \Vert {\varvec{e}}_1 - {\underline{H}}_k K_k^{-1} Q_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* {\varvec{e}}_1\Vert ^2 \\&= \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1 + P_k P_k^* {\varvec{e}}_1 {} - P_k {\varSigma }_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* {\varvec{e}}_1\Vert ^2\\&= \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \Vert \mu ({\varSigma }_k^2 + \mu I)^{-1} P_k^* {\varvec{e}}_1\Vert ^2. \end{aligned}$$
Or alternatively,
$$\begin{aligned} \varphi (\mu ) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \sum _{j=1}^k \bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 |P_k|_{1j}^2. \end{aligned}$$
(9)
Observe that the columns of \(P_k\) form an orthonormal basis for the range of \({\underline{H}}_k\), and that \(I - P_k P_k^*\) is the orthogonal projection onto the nullspace of \({\underline{H}}_k^*\). Furthermore, it can be verified that \({\underline{H}}_k^* \beta {\varvec{e}}_1 \ne \varvec{0}\) if \(A^*{\varvec{b}} \ne \mathbf{0 }\), that is, \({\varvec{e}}_1\) does not lie in the nullspace of \({\underline{H}}_k^*\).

Proposition 1

If \(\beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 \le \eta ^2 \varepsilon ^2 < \Vert {\varvec{b}}\Vert ^2\), then there exists a unique \(\mu _k\ge 0\) such that \(\varphi (\mu _k) = \eta ^2 \varepsilon ^2\).

Proof

(See also [3] and references therein). From (9) it follows that \(\varphi \) is a rational function with poles \(\mu =-\sigma _j^2\) for all \(\sigma _j>0\), therefore, \(\varphi \) is \(C^\infty \) on the interval \([0,\infty )\). Additionally, \(\varphi \) is a strictly increasing and bounded function on the same interval, since
$$\begin{aligned} \frac{d}{d\mu }\bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 = 2 \frac{\mu \sigma _j^2}{(\sigma _j^2 + \mu )^3}> 0, \quad \text {for all} \quad \mu > 0 \end{aligned}$$
implies \(\varphi ^\prime (\mu ) > 0\) and
$$\begin{aligned} \varphi (0) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 \quad \text {and} \quad \lim _{\mu \rightarrow \infty } \varphi (\mu ) = \beta ^2 = \Vert {\varvec{b}}\Vert ^2. \end{aligned}$$
Consequently, there exists a unique \(\mu _k \in [0,\infty )\) such that \(\varphi (\mu _k) = \eta ^2 \varepsilon ^2\). \(\square \)
Beyond nonnegativity, the proposition above provides little insight on the location of \(\mu _k\) on the real axis, and we would like to have lower and upper bounds. We determine bounds in Proposition 2 and believe the results to be new. Both in practice and for the proof of the subsequent proposition, it is useful to remove nonessential parts of \(\varphi (\mu )\) and instead work with the function
$$\begin{aligned} {\widetilde{\varphi }}(\mu ) = \frac{\varphi (\mu ) - \varphi (0)}{\beta ^2} = \sum _{j=1}^k \bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 |P_k|_{1j}^2, \end{aligned}$$
and the quantity
$$\begin{aligned} {\widetilde{\varepsilon }}^2 = \frac{\eta ^2 \varepsilon ^2 - \varphi (0)}{\beta ^2}. \end{aligned}$$
(10)
Then \(0\le {\widetilde{\varphi }}(\mu ) \le \rho ^2\), where \(\rho = \Vert P_k^* \varvec{e}_1\Vert \le 1\). Moreover, \(\eta ^2\varepsilon ^2\) satisfies the bounds in Proposition 1 if and only if \(0 \le {\widetilde{\varepsilon }} < \rho \), and \(\varphi (\mu _k) = \eta ^2\varepsilon ^2\) if and only if \({\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2\).

Proposition 2

If \(0 \le {\widetilde{\varepsilon }} < \rho \), and \(\mu _k\) is such that \({\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2\), then
$$\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \sigma _{\min }^2 \le \mu _k \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \sigma _{\max }^2, \end{aligned}$$
(11)
where \(\sigma _{\min }\) and \(\sigma _{\max }\) are as in (8).

Proof

The key to the proof is the observation that
$$\begin{aligned} \frac{\mu }{\sigma _{\max }^2 + \mu } \le \frac{\mu }{\sigma _j^2 + \mu } \le \frac{\mu }{\sigma _{\min }^2 + \mu } \end{aligned}$$
for all \(j = 1\), ..., k. Combining this observation with the definition of \({\widetilde{\varphi }}\) yields
$$\begin{aligned} \left( \frac{\mu _k}{\sigma _{\max }^2 + \mu _k}\right) ^2 \sum _{j=1}^k |P_k|_{1j}^2 \le \sum _{j=1}^k \left( \frac{\mu _k}{\sigma _{j}^2 + \mu _k}\right) ^2 |P_k|_{1j}^2 \le \left( \frac{\mu _k}{\sigma _{\min }^2 + \mu _k}\right) ^2 \sum _{j=1}^k |P_k|_{1j}^2. \end{aligned}$$
Since \(\sum _{j=1}^k |P_k|_{1j}^2 = \Vert P_k^* {\varvec{e}}_1\Vert ^2 = \rho ^2\) and \({\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2\), it follows that
$$\begin{aligned} \frac{\mu _k}{\sigma _{\max }^2 + \mu _k} \rho \le {\widetilde{\varepsilon }} \le \frac{\mu _k}{\sigma _{\min }^2 + \mu _k} \rho . \end{aligned}$$
Hence, if \({\widetilde{\varepsilon }} = 0\), then \(\mu _k = 0\) and we are done. Otherwise \(\mu _k \ne 0\) and we can divide by \(\rho \), take the reciprocals, and subtract 1 to arrive at
$$\begin{aligned} \frac{\sigma _{\max }^2}{\mu _k} \ge \frac{\rho }{{\widetilde{\varepsilon }}} - 1 \ge \frac{\sigma _{\min }^2}{\mu _k}. \end{aligned}$$
Taking reciprocals once more gives
$$\begin{aligned} \frac{\mu _k}{\sigma _{\max }^2} \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \le \frac{\mu _k}{\sigma _{\min }^2}, \end{aligned}$$
and the proposition follows. \(\square \)
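As an illustration of how the bounds (11) can serve as a starting bracket for bisection, consider the following Python sketch. It is a minimal example under stated assumptions: the singular values \(\sigma _j > 0\), the weights \(|P_k|_{1j}^2\), and \({\widetilde{\varepsilon }}\) are taken as given inputs, the function names and the stopping tolerance are hypothetical, and the numerical values are made up for the demonstration.

```python
def phi_tilde(mu, sigmas, weights):
    """Evaluate sum_j (mu / (sigma_j^2 + mu))^2 * w_j with w_j = |P_k|_{1j}^2."""
    return sum(w * (mu / (s * s + mu)) ** 2 for s, w in zip(sigmas, weights))


def parameter_bounds(sigmas, weights, eps_tilde):
    """Lower and upper bound on mu_k as in (11); requires 0 <= eps_tilde < rho."""
    rho = sum(weights) ** 0.5
    if not 0.0 <= eps_tilde < rho:
        raise ValueError("the discrepancy principle cannot be satisfied")
    factor = eps_tilde / (rho - eps_tilde)
    return factor * min(sigmas) ** 2, factor * max(sigmas) ** 2


def discrepancy_parameter(sigmas, weights, eps_tilde, tol=1e-12):
    """Bisection for phi_tilde(mu) = eps_tilde^2, started from the bounds (11)."""
    lo, hi = parameter_bounds(sigmas, weights, eps_tilde)
    target = eps_tilde ** 2
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        # phi_tilde is increasing, so the root lies to the right of mid
        # whenever the value at mid is still below the target.
        if phi_tilde(mid, sigmas, weights) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data: three singular values and the corresponding weights.
sigmas = [2.0, 1.0, 0.1]
weights = [0.5, 0.3, 0.1]
eps_tilde = 0.3

lower, upper = parameter_bounds(sigmas, weights, eps_tilde)
mu_k = discrepancy_parameter(sigmas, weights, eps_tilde)
print(lower, mu_k, upper)
```

Because \({\widetilde{\varphi }}\) is increasing and Proposition 2 guarantees that the root lies between the two bounds, the bracket is valid from the first iteration and no separate search for an initial interval is needed.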
It is undesirable to work with the inverse of \(K_k\) when it becomes ill-conditioned. Instead it may be preferred to use the generalized singular value decomposition (GSVD)
$$\begin{aligned} {\underline{H}}_k= & {} P_k C_k Z_k^{-1} \\ K_k= & {} Q_k S_k Z_k^{-1}, \end{aligned}$$
where \(P_k\) and \(Q_k\) have orthonormal columns and \(Z_k\) is nonsingular. The matrices \(C_k\) and \(S_k\) are diagonal with entries \(0 \le c_1 \le c_2 \le \dots \le c_k\) and \(s_1 \ge \dots \ge s_k \ge 0\), respectively, such that \(c_i^2 + s_i^2 = 1\). The generalized singular values are given by \(c_i / s_i\) and are understood to be infinite when \(s_i = 0\). If \(K_k\) is nonsingular, then the generalized singular values coincide with the singular values of \({\underline{H}}_k K_k^{-1}\). See Golub and Van Loan [8, Section 8.7.3] for more information.
Using a similar derivation as before, we can show that
$$\begin{aligned} \varphi (\mu ) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \sum _{j=1}^k \bigg ( \frac{\mu s_j^2}{c_j^2 + \mu s_j^2} \bigg )^2 |P_k|_{1j}^2 \end{aligned}$$
and that the new bounds are given by
$$\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \bigg ( \frac{c_1}{s_1} \bigg )^2 \le \mu _k \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \bigg ( \frac{c_k}{s_k} \bigg )^2. \end{aligned}$$
Here \(\mu _k\) is unbounded from above if \(s_k = 0\), that is, if \(K_k\) becomes singular.
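The GSVD form avoids any division by \(s_j\), which the following Python sketch tries to make concrete. It assumes that the pairs \((c_j, s_j)\) with \(c_j > 0\) and the weights \(|P_k|_{1j}^2\) are already available (computing the GSVD itself is not shown), and the function names and test values are hypothetical.

```python
from math import inf, sqrt


def phi_tilde_gsvd(mu, cs, ss, weights):
    """Evaluate sum_j w_j * (mu s_j^2 / (c_j^2 + mu s_j^2))^2 without dividing by s_j."""
    return sum(w * (mu * s * s / (c * c + mu * s * s)) ** 2
               for c, s, w in zip(cs, ss, weights))


def gsvd_bounds(cs, ss, weights, eps_tilde):
    """Bounds on mu_k; assumes c_1 <= ... <= c_k, s_1 >= ... >= s_k, and s_1 > 0."""
    if eps_tilde == 0.0:
        return 0.0, 0.0
    rho = sqrt(sum(weights))
    factor = eps_tilde / (rho - eps_tilde)
    lower = factor * (cs[0] / ss[0]) ** 2
    # A zero s_k corresponds to an infinite generalized singular value.
    upper = factor * (cs[-1] / ss[-1]) ** 2 if ss[-1] > 0.0 else inf
    return lower, upper


# Hypothetical generalized singular values sigma_j = c_j / s_j with c_j^2 + s_j^2 = 1.
sigmas = [0.1, 1.0, 2.0]
cs = [s / sqrt(1.0 + s * s) for s in sigmas]
ss = [1.0 / sqrt(1.0 + s * s) for s in sigmas]
weights = [0.1, 0.3, 0.5]

mu = 0.7
via_gsvd = phi_tilde_gsvd(mu, cs, ss, weights)
via_svd = sum(w * (mu / (s * s + mu)) ** 2 for s, w in zip(sigmas, weights))
print(abs(via_gsvd - via_svd))  # zero up to rounding

# With a singular K_k (s_k = 0) the upper bound becomes infinite.
print(gsvd_bounds(cs[:2] + [1.0], ss[:2] + [0.0], weights, 0.3))
```

For nonsingular \(K_k\) both evaluations agree, and a vanishing \(s_j\) simply removes the corresponding term from the sum.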

The bounds in this section can be readily computed and used to implement bisection and the secant method. We consider parameter selection for multiparameter regularization in the following section.

4 A Multiparameter Selection Strategy

Choosing satisfactory \({\mu _k^{i}}\) in multiparameter regularization is more difficult than the corresponding one-parameter problem; see, for example, [1, 2, 6, 14, 16, 20]. In particular, there is no obvious multiparameter extension of the discrepancy principle. Nevertheless, methods based on the discrepancy principle exist and we will discuss three of them.

Brezinski et al. [2] had some success with operator splitting. Substituting \({\mu _k^{i}} = {\nu _k^{i}} {\omega _k^{i}}\) in (3) with nonnegative weights \({\omega _k^{i}}\) and \(\sum _{i=1}^\ell {\omega _k^{i}} = 1\) leads to
$$\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \sum _{i=1}^\ell {\omega _k^{i}} (\Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + {\nu _k^{i}} \Vert {K_k^{i}} {\varvec{c}}\Vert ^2). \end{aligned}$$
This form of the minimization problem suggests the approximation of \(X_k^* {\varvec{x}}_\star \) by a linear combination [2, Sect. 3] of \({{\varvec{c}}_k^{i}}({\nu _k^{i}})\), where
$$\begin{aligned} {{\varvec{c}}_k^{i}}(\nu ) = \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} {} - \beta {\varvec{e}}_1\Vert ^2 + \nu \Vert {K_k^{i}} {\varvec{c}}\Vert ^2 \qquad (i = 1, 2, \dots , \ell ), \end{aligned}$$
(12)
and \({\nu _k^{i}}\) is such that \(\Vert {\underline{H}}_k {{\varvec{c}}_k^{i}}({\nu _k^{i}}) - \beta {\varvec{e}}_1\Vert = \eta \varepsilon \). Alternatively, Brezinski et al. [2] consider a related problem in which the \({\nu ^{i}}\) are fixed and obtained from (12). The latter approach provides better results in exchange for an additional QR decomposition. In either case, operator splitting is a straightforward approach, but does not necessarily satisfy the discrepancy principle exactly.
Lu and Pereverzyev [19] and later Fornasier et al. [5] rewrite the constrained minimization problem as a differential equation and approximate
$$\begin{aligned} F(\varvec{\mu }) = \Vert {\underline{H}}_k {\varvec{c}}_k(\varvec{\mu }) - \beta {\varvec{e}}_1\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {K_k^{i}} {\varvec{c}}_k(\varvec{\mu })\Vert ^2 \end{aligned}$$
by a model function \(m(\varvec{\mu })\) which admits a straightforward solution to the constructed differential equation. However, it is unclear which \(\varvec{\mu }\) the method finds and its solution may depend on the initial guess. On the other hand, it is possible to keep all but one parameter fixed and compute a value for the free parameter such that the discrepancy principle is satisfied. This allows one to trace discrepancy hypersurfaces to some extent.

Gazzola and Novati [6] describe another interesting method. They start with a one-parameter problem and successively add parameters in a novel way, until each parameter of the full multiparameter problem has a value assigned. Especially in early iterations the discrepancy principle is not satisfied, but the parameters are updated in each iteration so that the norm of the residual is expected to approach \(\eta \varepsilon \). Unfortunately, we observed some issues in our implementation. For example, the quality of the result depends on initial values, as well as the order in which the operators are added (that is, the indexing of the operators). The latter problem is solved by a recently published and improved version of the method [7], which was brought to our attention during the revision of this paper.

We propose a new method that satisfies the discrepancy principle exactly, does not depend on an initial guess, and is independent of the scaling or indexing of the operators. The method uses the operator splitting approach in combination with new weights. Let us omit all k subscripts for the remainder of this section, and suppose \({\mu ^{i}} = \mu {\omega ^{i}}\), where \({\omega ^{i}}\) are nonnegative, but do not necessarily sum to one, and \(\mu \) is such that the discrepancy principle is satisfied. Then (3) can be written as
$$\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}} {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \mu \sum _{i=1}^\ell {\omega ^{i}} \Vert {K^{i}}{\varvec{c}}\Vert ^2. \end{aligned}$$
(13)
Since the goal of regularization is to reduce sensitivity of the solution to noise, we use the weights
$$\begin{aligned} {\omega ^{i}} = \frac{\Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert D{{\varvec{c}}^{i}}({\nu ^{i}})\Vert }, \end{aligned}$$
(14)
which bias the regularization parameters in the direction of lower sensitivity with respect to changes in \({\nu ^{i}}\). Here D denotes the (total) derivative with respect to the regularization parameter(s), and \({{\varvec{c}}^{i}}\) and \({\nu ^{i}}\) are defined as before; consequently,
$$\begin{aligned} D {{\varvec{c}}^{i}}({\nu ^{i}}) = -({\underline{H}}^* {\underline{H}} + {\nu ^{i}} {K^{i}}^* {K^{i}})^{-1} {K^{i}}^*{K^{i}} {{\varvec{c}}^{i}}({\nu ^{i}}). \end{aligned}$$
If for some indices \(D {{\varvec{c}}^{i}}({\nu ^{i}}) = \mathbf{0 }\), then we take one such \({{\varvec{c}}^{i}}({\nu ^{i}})\) as the solution, or replace \(\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert \) by a small positive constant. With this parameter choice, the solution does not depend on the indexing of the operators, nor, up to a constant, on the scaling of A, \(\varvec{b}\), or any of the \({L^{i}}\). The former is easy to see; for the latter, let \(\alpha \), \(\gamma \), and \({\lambda ^{i}}\) be positive constants, and consider the scaled problem
$$\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{\widehat{\varvec{x}}} \Vert \gamma {\varvec{b}} - \alpha A \widehat{\varvec{x}}\Vert ^2 {} + \mu \sum _{i=1}^{\ell } {{\widehat{\omega }}^{i}} \Vert \lambda ^{i} {L^{i}} \widehat{\varvec{x}}\Vert ^2. \end{aligned}$$
The noisy component of \(\gamma {\varvec{b}}\) is \(\gamma {\varvec{e}}\) and \(\Vert \gamma {\varvec{e}}\Vert \le \gamma \varepsilon \), hence the new discrepancy bound becomes
$$\begin{aligned} \Vert \alpha A \widehat{\varvec{x}} - \gamma {\varvec{b}}\Vert = \gamma \eta \varepsilon . \end{aligned}$$
The bound is satisfied when \({{\widehat{\omega }}^{i}} = \alpha ^2 / (\lambda ^i)^2\; {\omega ^{i}}\), since in this case
$$\begin{aligned} \widehat{\varvec{x}} = \Big (\alpha ^2 A^*A + \mu \sum _{i=1}^\ell {\omega ^{i}} \frac{\alpha ^2}{({\lambda ^{i}})^2}({\lambda ^{i}})^2 {L^{i}}^*{L^{i}} \Big )^{-1} \alpha A^* \gamma {\varvec{b}} = \frac{\gamma }{\alpha } {\varvec{x}}. \end{aligned}$$
and
$$\begin{aligned} \min _{\widehat{\varvec{x}}} \Vert \gamma {\varvec{b}} - \alpha A \widehat{\varvec{x}}\Vert ^2 {} + \mu \sum _{i=1}^{\ell } {{\widehat{\omega }}^{i}} \Vert \lambda ^{i} {L^{i}} \widehat{\varvec{x}}\Vert ^2 = \gamma ^2 \Big ( \min _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \mu \sum _{i=1}^\ell {\omega ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \Big ). \end{aligned}$$
It may be checked that the weights in (14) are indeed proportional to \(\alpha ^2/(\lambda ^i)^2\), that is
$$\begin{aligned} {\omega ^{i}} = \frac{\Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert } {}\sim \frac{\alpha ^2}{({\lambda ^{i}})^2}. \end{aligned}$$
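The proportionality can be checked in a few lines. As a sketch, assume that the scaling leaves the search space unchanged, so that the projected quantities become \(\alpha {\underline{H}}\), \(\gamma \beta \), and \(\lambda ^i {K^{i}}\); the hatted symbols below are notation introduced only for this check. With \({\widehat{\nu }}^{i} = \alpha ^2 {\nu ^{i}} / (\lambda ^i)^2\) we find
$$\begin{aligned} \widehat{\varvec{c}}^{i}({\widehat{\nu }}^{i})&= \big (\alpha ^2 {\underline{H}}^* {\underline{H}} + {\widehat{\nu }}^{i} (\lambda ^i)^2 {K^{i}}^* {K^{i}}\big )^{-1} \alpha {\underline{H}}^* \gamma \beta {\varvec{e}}_1 = \frac{\gamma }{\alpha }\, {{\varvec{c}}^{i}}({\nu ^{i}}),\\ D \widehat{\varvec{c}}^{i}({\widehat{\nu }}^{i})&= -\big (\alpha ^2 {\underline{H}}^* {\underline{H}} + {\widehat{\nu }}^{i} (\lambda ^i)^2 {K^{i}}^* {K^{i}}\big )^{-1} (\lambda ^i)^2 {K^{i}}^* {K^{i}}\, \widehat{\varvec{c}}^{i}({\widehat{\nu }}^{i}) = \frac{(\lambda ^i)^2}{\alpha ^2}\, \frac{\gamma }{\alpha }\, D {{\varvec{c}}^{i}}({\nu ^{i}}). \end{aligned}$$
The first identity gives \(\Vert \alpha {\underline{H}}\, \widehat{\varvec{c}}^{i}({\widehat{\nu }}^{i}) - \gamma \beta {\varvec{e}}_1\Vert = \gamma \Vert {\underline{H}} {{\varvec{c}}^{i}}({\nu ^{i}}) - \beta {\varvec{e}}_1\Vert = \gamma \eta \varepsilon \), so \({\widehat{\nu }}^{i}\) satisfies the scaled discrepancy equation. Taking the ratio of the norms of the two identities then yields \({{\widehat{\omega }}^{i}} = \alpha ^2 / (\lambda ^i)^2\; {\omega ^{i}}\).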
There are additional viable choices for \({\omega ^{i}}\), including two smoothed versions of the above:
$$\begin{aligned} {\omega ^{i}} = \frac{\Vert {\underline{H}} {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert {\underline{H}} D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert } \quad \text {and}\quad {\omega ^{i}} = \frac{\Vert {K^{i}} {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert {K^{i}} D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }, \end{aligned}$$
which consider the sensitivity of \({{\varvec{c}}^{i}}({\nu ^{i}})\) in the range of \({\underline{H}}\) and \({K^{i}}\) respectively. We summarize the new parameter selection in Algorithm 3 below.

Algorithm 3

(Multiparameter selection)

Input: Projected matrices \({\underline{H}}\), \({K^{1}}\), ..., \({K^{\ell }}\), \(\beta = \Vert {\varvec{b}}\Vert \), noise estimate \(\varepsilon \), uncertainty parameter \(\eta \), and threshold \(\tau \).

Output: Regularization parameters \({\mu ^{1}}\), ..., \({\mu ^{\ell }}\).
  1. Use (12) to compute \({{\varvec{c}}^{i}}\) and \({\nu ^{i}}\).

     if \(\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert \le \tau \Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert \) for some i then

  2.    Set \({\omega ^{i}} = \tau ^{-1}\); or set \({\mu ^{i}} = {\nu ^{i}}\) and \({\mu ^{j}} = 0\) for \(j\ne i\).

     else

  3.    Let \({\omega ^{i}} = \Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert /\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert \).

  4.    Compute \(\mu \) in (13) such that the discrepancy principle is satisfied.

  5.    Set \({\mu ^{i}} = \mu {\omega ^{i}}\).
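The steps of Algorithm 3 can be sketched in Python/NumPy as follows. This is a hedged illustration rather than the implementation used for the experiments: the helper names `tikhonov`, `discrepancy_root`, and `select_parameters` are hypothetical, the small projected problems are solved through the normal equations for brevity, the root finder is a plain bisection on a fixed bracket instead of one started from the bounds of Sect. 3, and only the first option of step 2 (\({\omega ^{i}} = \tau ^{-1}\)) is implemented.

```python
import numpy as np

def tikhonov(H, Ks, mus, b):
    """Solve min ||H c - b||^2 + sum_i mu_i ||K_i c||^2 (normal equations)."""
    M = H.T @ H + sum(mu * K.T @ K for mu, K in zip(mus, Ks))
    return np.linalg.solve(M, H.T @ b)

def discrepancy_root(residual, target, lo=1e-14, hi=1e14, iters=200):
    """Bisection on log(mu); assumes residual(mu) is nondecreasing in mu."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if residual(mid) < target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

def select_parameters(H, Ks, beta, eps, eta, tau=1e-12):
    b = np.zeros(H.shape[0])
    b[0] = beta
    target = eta * eps
    omegas = []
    for K in Ks:
        # Step 1: one-parameter problem (12) for operator K.
        res = lambda nu, K=K: np.linalg.norm(H @ tikhonov(H, [K], [nu], b) - b)
        nu = discrepancy_root(res, target)
        c = tikhonov(H, [K], [nu], b)
        Dc = -np.linalg.solve(H.T @ H + nu * K.T @ K, K.T @ (K @ c))
        nc, nDc = np.linalg.norm(c), np.linalg.norm(Dc)
        # Steps 2-3: weight (14), capped at 1/tau for a tiny derivative.
        omegas.append(1.0 / tau if nDc <= tau * nc else nc / nDc)
    omegas = np.array(omegas)
    # Step 4: scalar mu in (13) such that the discrepancy principle holds.
    res = lambda mu: np.linalg.norm(H @ tikhonov(H, Ks, mu * omegas, b) - b)
    mu = discrepancy_root(res, target)
    return mu * omegas  # Step 5
```

The sketch presumes that \(\eta \varepsilon \) lies strictly between the unregularized residual norm and \(\beta \), so that every discrepancy equation has a root inside the bracket.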

     
An interesting property of Algorithm 3 is that, under certain conditions, \({\varvec{c}}(\varvec{\mu }({\widetilde{\varepsilon }}))\) converges to the unregularized least squares solution
$$\begin{aligned} {\varvec{c}}(\mathbf{0 }) = ({\underline{H}}^* {\underline{H}})^{-1} {\underline{H}}^* \beta {\varvec{e}}_1 = {\underline{H}}^+ \beta {\varvec{e}}_1, \end{aligned}$$
as \({\widetilde{\varepsilon }}\) goes to zero. Here \({\underline{H}}^+\) denotes the Moore–Penrose pseudoinverse and \({\varvec{c}}(\mathbf{0 })\) is the minimum norm solution of the unregularized problem. The following proposition formalizes this observation.

Proposition 3

Assume that \({\underline{H}}\) is full rank, \({\underline{H}}^*\beta {\varvec{e}}_1 \ne \mathbf{0 }\), and that \({K^{i}}\) is nonsingular for \(i = 1, \dots , \ell \). Let \({\widetilde{\varepsilon }}\) and \(\rho \) be defined as in Sect. 3, let \(\eta >1\) be fixed, and suppose that \({\nu ^{i}}({\widetilde{\varepsilon }})\) and
$$\begin{aligned} \varvec{\mu }({\widetilde{\varepsilon }}) = ({\mu ^{1}}({\widetilde{\varepsilon }}), \dots , {\mu ^{\ell }}({\widetilde{\varepsilon }})) = \mu ({\widetilde{\varepsilon }}) ({\omega ^{1}}({\nu ^{1}}({\widetilde{\varepsilon }})), \dots , {\omega ^{\ell }}({\nu ^{\ell }}({\widetilde{\varepsilon }}))) \end{aligned}$$
are computed according to Algorithm 3 for all \(0\le {\widetilde{\varepsilon }} < \rho \). Then
$$\begin{aligned} \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\omega ^{i}}({\nu ^{i}}({\widetilde{\varepsilon }})) = {\omega ^{i}}(0) \quad \text {and}\quad \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\varvec{c}}(\varvec{\mu }({\widetilde{\varepsilon }})) = {\varvec{c}}(\mathbf{0 }). \end{aligned}$$

Proof

First note that \({\underline{H}}^* \beta {\varvec{e}}_1 \ne \mathbf{0 }\) implies that \(\beta > 0\) and \(\rho > 0\). Since \({\underline{H}}\) is full rank, the maps
$$\begin{aligned} \nu \mapsto {{\varvec{c}}^{i}}(\nu ), \quad \nu \mapsto D {{\varvec{c}}^{i}}(\nu ), \quad \text {and}\quad \varvec{\mu } \mapsto {\varvec{c}}(\varvec{\mu }) \end{aligned}$$
are continuous for all \(\nu \ge 0\) and \(\varvec{\mu }\ge \mathbf{0 }\), where the latter bound should be interpreted elementwise. Hence
$$\begin{aligned} \lim _{\nu \downarrow 0} {{\varvec{c}}^{i}}(\nu ) = {{\varvec{c}}^{i}}(0), \quad \lim _{\nu \downarrow 0} D{{\varvec{c}}^{i}}(\nu ) = D {{\varvec{c}}^{i}}(0), \quad \text {and}\quad \lim _{\varvec{\mu } \downarrow \mathbf{0 }} {\varvec{c}}(\varvec{\mu }) = {\varvec{c}}(\mathbf{0 }). \end{aligned}$$
It remains to be shown that
$$\begin{aligned} \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\nu ^{i}}({\widetilde{\varepsilon }}) = 0, \quad \Vert D {{\varvec{c}}^{i}}(0)\Vert \ne 0, \quad \text {and}\quad \lim _{{\widetilde{\varepsilon }} \downarrow 0} \varvec{\mu }({\widetilde{\varepsilon }}) = \mathbf{0 }. \end{aligned}$$
(15)
Let \({\widetilde{\varepsilon }}\) be restricted to the interval \([0, \rho /2]\) and define \({\nu _{\max }^{i}} = \sigma _{\max }^2({\underline{H}} ({K^{i}})^{-1})\). By Proposition 2,
$$\begin{aligned} 0 \le {\nu ^{i}}({\widetilde{\varepsilon }}) \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}}\, {\nu _{\max }^{i}} \le {\nu _{\max }^{i}}, \end{aligned}$$
which proves the first limit in (15). Furthermore, using the definitions of \({{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))\) and \(D {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))\) we find the bounds
$$\begin{aligned}&0< \rho \beta \frac{\sigma _{\min }({\underline{H}})}{ \Vert {\underline{H}}\Vert ^2 + {\nu _{\max }^{i}} \Vert {K^{i}}\Vert ^2} \le \Vert {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))\Vert \le \rho \beta \Vert {\underline{H}}^+{\varvec{e}}_1\Vert ,\\&0 < \rho \beta \frac{\sigma _{\min }({\underline{H}}) \sigma _{\min }^2({K^{i}})}{ (\Vert {\underline{H}}\Vert ^2 + {\nu _{\max }^{i}} \Vert {K^{i}}\Vert ^2)^2} \le \Vert D {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))\Vert \le \rho \beta \frac{\Vert {K^{i}}\Vert ^2\, \Vert {\underline{H}}^+{\varvec{e}}_1\Vert }{\sigma _{\min }^2({\underline{H}})}, \end{aligned}$$
which show that the inequality in (15) is satisfied. Moreover, the bounds show there exist \(\omega _{\min }\) and \(\omega _{\max }\) such that
$$\begin{aligned} 0< \omega _{\min } \le {\omega ^{i}}({\widetilde{\varepsilon }}) \le \omega _{\max } < \infty . \end{aligned}$$
Now, let \({\mathbf {K}}({\widetilde{\varepsilon }})\) be the nonsingular matrix satisfying
$$\begin{aligned} {\mathbf {K}}({\widetilde{\varepsilon }})^*{\mathbf {K}}({\widetilde{\varepsilon }}) = \sum _{i=1}^\ell {\omega ^{i}}({\widetilde{\varepsilon }}) {K^{i}}^*{K^{i}}, \end{aligned}$$
then it can be checked that
$$\begin{aligned} \Vert {\underline{H}} {\mathbf {K}}({\widetilde{\varepsilon }})^{-1}\Vert ^2 \le \frac{\Vert {\underline{H}}\Vert ^2}{\min _i \omega _{\min } \sigma _{\min }^2({K^{i}})} < \infty . \end{aligned}$$
Define the right-hand side of the inequality above as M; then by Proposition 2, each entry of \(\varvec{\mu }({\widetilde{\varepsilon }})\) is bounded from below by 0 and from above by
$$\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}}\, M \omega _{\max }, \end{aligned}$$
which goes to 0 as \({\widetilde{\varepsilon }} \downarrow 0\). This proves the second limit in (15). \(\square \)

Proposition 3 is related to [9, Thm 3.3.3], where it is shown that the solution of a standard form Tikhonov regularization problem converges to a minimum norm least squares solution when the discrepancy principle is used and the noise converges to zero.

In this section we have discussed a new parameter selection method. In the next section we will look at the effect of perturbations in the parameters on the obtained solutions.

5 Perturbation Analysis

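The goal of regularization is to make reconstruction robust with respect to noise. By extension, a high sensitivity to the regularization parameters is undesirable. Consider a set of perturbed parameters \(\varvec{\mu }_k + {\varDelta }\varvec{\mu }\); if \(\Vert {\varDelta }\varvec{\mu }\Vert \) is sufficiently small, then
$$\begin{aligned} {\varvec{c}}_k(\varvec{\mu }_k + {\varDelta }\varvec{\mu }) = {\varvec{c}}_k(\varvec{\mu }_k) - M^{-1} {\varDelta } M {\varvec{c}}_k(\varvec{\mu }_k) + {\mathcal {O}}(\Vert {\varDelta }\varvec{\mu }\Vert ^2), \end{aligned}$$
where M and \({\varDelta } M\) are defined as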
$$\begin{aligned} M = {\underline{H}}_k^* {\underline{H}}_k + \sum _{i=1}^\ell {\mu _k^{i}} {K_k^{i}}^* {K_k^{i}}, \quad {\varDelta } M = \sum _{i=1}^\ell {\varDelta }{\mu _k^{i}} {K_k^{i}}^* {K_k^{i}}. \end{aligned}$$
(16)
Therefore, one might choose \(\varvec{\mu }_k\) to minimize the sensitivity measure
$$\begin{aligned} \Vert D{\varvec{c}}(\varvec{\mu }_k){\varDelta }\varvec{\mu }\Vert = \Vert M^{-1}{\varDelta } M{\varvec{c}}(\varvec{\mu }_k)\Vert . \end{aligned}$$
To see the connection with the previous section, suppose that \(\varvec{\mu }_k = {\nu _k^{i}} {\varvec{e}}_i\) and \({\varDelta }\varvec{\mu } = \pm \Vert {\varDelta }\varvec{\mu }\Vert {\varvec{e}}_i\), then
$$\begin{aligned} \Vert M^{-1}{\varDelta } M\Vert \ge \frac{\Vert M^{-1}{\varDelta } M{\varvec{c}}_k(\varvec{\mu }_k)\Vert }{\Vert \varvec{c}_k(\varvec{\mu }_k)\Vert } = \frac{\Vert D{\varvec{c}}_k(\varvec{\mu }_k){\varDelta }\varvec{\mu }\Vert }{\Vert {\varvec{c}}_k(\varvec{\mu }_k)\Vert } = \frac{\Vert D {{\varvec{c}}_k^{i}}({\nu _k^{i}})\Vert \,\Vert {\varDelta }\varvec{\mu }\Vert }{\Vert {{\varvec{c}}_k^{i}}({\nu _k^{i}})\Vert } = \frac{\Vert {\varDelta }\varvec{\mu }\Vert }{{\omega _k^{i}}} \end{aligned}$$
Thus, larger weights \({\omega _k^{i}}\) correspond to smaller lower bounds on \(\Vert M^{-1}{\varDelta } M\Vert \). Having small lower bounds is desirable, since we show in Propositions 4 and 5 that minimizing \(\Vert M^{-1}{\varDelta } M\Vert \) is equivalent to minimizing upper bounds on the forward and backward errors respectively.

Proposition 4

Given regularization parameters \({\mu _k^{i}}\) and perturbations \({\mu _\star ^{i}} = {\mu _k^{i}} + {\varDelta }{\mu _k^{i}}\), let \({\varvec{c}}_k = {\varvec{c}}_k(\varvec{\mu }_k)\), \({\varvec{c}}_\star = {\varvec{c}}_k(\varvec{\mu }_\star )\), \({\varvec{x}}_k = X_k {\varvec{c}}_k\), and \({\varvec{x}}_\star = X_k {\varvec{c}}_\star \). Assume \({\underline{H}}_k\) and all \({K_k^{i}}\) are of full rank and define matrices M and \({\varDelta } M\) as in (16). If M and \(M + {\varDelta } M\) are nonsingular and the \({\varDelta } {\mu _k^{i}}\) are sufficiently small so that \(\Vert M^{-1} {\varDelta } M\Vert < 1\), then
$$\begin{aligned} \frac{ \Vert {\varvec{x}}_k - {\varvec{x}}_\star \Vert }{ \Vert {\varvec{x}}_k\Vert } \le \frac{ \Vert M^{-1} {\varDelta } M \Vert }{ 1 - \Vert M^{-1} {\varDelta } M \Vert }. \end{aligned}$$

Proof

Observe that \({\varvec{c}}_k = M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1\) and \(\varvec{c}_\star = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1\). With a little manipulation we obtain
$$\begin{aligned} {\varvec{c}}_\star = (M + {\varDelta } M)^{-1} M {\varvec{c}}_k = (I + M^{-1} {\varDelta } M)^{-1} {\varvec{c}}_k = \sum _{j=0}^\infty (-M^{-1} {\varDelta } M)^j {\varvec{c}}_k. \end{aligned}$$
It follows that
$$\begin{aligned} \frac{\Vert {\varvec{c}}_k - {\varvec{c}}_\star \Vert }{\Vert {\varvec{c}}_k\Vert } = \frac{1}{\Vert {\varvec{c}}_k\Vert } \bigg \Vert \sum _{j=1}^\infty (-M^{-1} {\varDelta } M)^j {\varvec{c}}_k \bigg \Vert \le \sum _{j=1}^\infty \Vert M^{-1}{\varDelta } M\Vert ^j \le \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert }. \end{aligned}$$
Since \(X_k\) has orthonormal columns, the result of the proposition follows. \(\square \)
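The bound of Proposition 4 is easy to check numerically. The following Python/NumPy sketch does so on a small random projected problem; the matrices, parameters, and perturbations are arbitrary test data chosen for illustration and are not taken from the experiments. Because \(X_k\) has orthonormal columns, the check is carried out on \({\varvec{c}}_k\) and \({\varvec{c}}_\star \) directly.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6
H = rng.standard_normal((k + 1, k))                  # stands in for H_k
Ks = [np.eye(k), np.eye(k) - 0.5 * np.eye(k, k=1)]   # stand in for K_k^i
b = np.zeros(k + 1)
b[0] = 1.5                                           # beta e_1

mu = np.array([0.3, 0.7])                            # parameters mu_k
dmu = np.array([0.02, -0.01])                        # perturbations Delta mu
M = H.T @ H + sum(m * K.T @ K for m, K in zip(mu, Ks))
dM = sum(d * K.T @ K for d, K in zip(dmu, Ks))

c_k = np.linalg.solve(M, H.T @ b)
c_star = np.linalg.solve(M + dM, H.T @ b)

x = np.linalg.norm(np.linalg.solve(M, dM), 2)        # ||M^{-1} Delta M||
rel_err = np.linalg.norm(c_k - c_star) / np.linalg.norm(c_k)
bound = x / (1.0 - x)

# Remainder after the first-order term of the Neumann series.
remainder = np.linalg.norm(c_star - (c_k - np.linalg.solve(M, dM @ c_k)))
print(f"relative error {rel_err:.3e} <= bound {bound:.3e}")
```

The last quantity, `remainder`, collects the terms with \(j \ge 2\) of the series in the proof and is therefore at most \(\Vert M^{-1} {\varDelta } M\Vert ^2 / (1 - \Vert M^{-1} {\varDelta } M\Vert )\, \Vert {\varvec{c}}_k\Vert \).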
One may wonder if it is possible to pick a vector \({\varvec{f}}\) close to \(\beta {\varvec{e}}_1\) such that
$$\begin{aligned} {\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}. \end{aligned}$$
Or in other words, given perturbed regularization parameters, is there a perturbation of \(\beta {\varvec{e}}_1\) such that the optimal approximation to the exact solution is obtained? The following proposition provides a positive answer.

Proposition 5

Under the assumptions of Proposition 4, there exist vectors \({\varvec{f}}\) and \({\varvec{g}}\) such that \({\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}\) and \({\varvec{c}}_\star = M^{-1} {\underline{H}}_k^* {\varvec{g}}\). Furthermore, \({\varvec{f}}\) and \({\varvec{g}}\) satisfy
$$\begin{aligned}&\frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{f}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } \le \kappa ({\underline{H}}_k) \Vert M^{-1} {\varDelta } M\Vert ,\\&\frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{g}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } \le \kappa ({\underline{H}}_k) \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert } \end{aligned}$$
where \(\kappa ({\underline{H}}_k)\) is the condition number of \({\underline{H}}_k\).

Proof

The vector \({\varvec{f}}\) is easy to derive using the ansatz
$$\begin{aligned} (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}} = M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1. \end{aligned}$$
Let \({\underline{H}}_k = QR\) denote the reduced QR-decomposition of \({\underline{H}}_k\), then
$$\begin{aligned} R^* Q^* {\varvec{f}} = (M + {\varDelta } M) M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1, \end{aligned}$$
and
$$\begin{aligned} {\varvec{f}} = Q R^{-*} (M + {\varDelta } M) M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 + (I - Q Q^*) {\varvec{v}} \end{aligned}$$
for arbitrary \({\varvec{v}}\). Indeed, it is easy to verify that the above vector satisfies
$$\begin{aligned} {\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}. \end{aligned}$$
If we choose \({\varvec{v}} = \beta {\varvec{e}}_1\), then
$$\begin{aligned} {\varvec{f}} = Q R^{-*} {\varDelta } M M^{-1} R^* Q^* \beta {\varvec{e}}_1 + \beta {\varvec{e}}_1 \end{aligned}$$
so that
$$\begin{aligned} \frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{f}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } = \Vert Q R^{-*} {\varDelta } M M^{-1} R^* Q^* {\varvec{e}}_1 \Vert \le \Vert R^{-*}\Vert \; \Vert R^*\Vert \; \Vert {\varDelta } M M^{-1}\Vert . \end{aligned}$$
Here \(\Vert R^{-*}\Vert \; \Vert R^*\Vert \) is the condition number \(\kappa ({\underline{H}}_k)\) and \(\Vert {\varDelta } M M^{-1}\Vert = \Vert M^{-1} {\varDelta } M\Vert \), since both M and \({\varDelta } M\) are symmetric. This proves the first part of the proposition.
The second part is analogous. In particular, we use the ansatz
$$\begin{aligned} M^{-1} {\underline{H}}_k^* {\varvec{g}} = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 \end{aligned}$$
and derive
$$\begin{aligned} {\varvec{g}} = Q R^{-*} M (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 + (I - Q Q^*) \beta {\varvec{e}}_1. \end{aligned}$$
Again it is easy to verify that \({\varvec{c}}_\star = M^{-1} {\underline{H}}_k^* {\varvec{g}}\). Observe that \({\varvec{g}}\) can be rewritten as
$$\begin{aligned} {\varvec{g}} = Q R^{-*} ((I + {\varDelta } M M^{-1})^{-1} - I) R^* Q^* \beta {\varvec{e}}_1 + \beta {\varvec{e}}_1 \end{aligned}$$
such that
$$\begin{aligned} \frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{g}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert }&= \Vert Q R^{-*} ((I + {\varDelta } M M^{-1})^{-1} - I) R^* Q^* {\varvec{e}}_1\Vert \\&\le \Vert R^{-*}\Vert \; \Vert R^*\Vert \; \Vert (I + {\varDelta } M M^{-1})^{-1} - I\Vert . \end{aligned}$$
Since \(\Vert {\varDelta } M M^{-1}\Vert = \Vert M^{-1} {\varDelta } M\Vert < 1\), it follows that
$$\begin{aligned} \Vert (I + {\varDelta } M M^{-1})^{-1} - I\Vert \le \sum _{j=1}^\infty \Vert -{\varDelta } M M^{-1}\Vert ^j = \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert }, \end{aligned}$$
which concludes the proof. \(\square \)
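The constructions in the proof can be verified numerically. The Python/NumPy sketch below builds \({\varvec{f}}\) and \({\varvec{g}}\) from the reduced QR decomposition with \({\varvec{v}} = \beta {\varvec{e}}_1\) and checks the two inequalities derived in the proof above; the random test data are arbitrary and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6
H = rng.standard_normal((k + 1, k))                  # stands in for H_k
Ks = [np.eye(k), np.eye(k) - 0.5 * np.eye(k, k=1)]   # stand in for K_k^i
b = np.zeros(k + 1)
b[0] = 1.5                                           # beta e_1

mu = np.array([0.3, 0.7])
dmu = np.array([0.02, -0.01])
M = H.T @ H + sum(m * K.T @ K for m, K in zip(mu, Ks))
dM = sum(d * K.T @ K for d, K in zip(dmu, Ks))
c_k = np.linalg.solve(M, H.T @ b)
c_star = np.linalg.solve(M + dM, H.T @ b)

Q, R = np.linalg.qr(H)                               # reduced QR: H = Q R
b_perp = b - Q @ (Q.T @ b)                           # (I - Q Q^*) beta e_1

# f = Q R^{-*} (M + dM) M^{-1} H^* b + (I - Q Q^*) b
f = Q @ np.linalg.solve(R.T, (M + dM) @ c_k) + b_perp
# g = Q R^{-*} M (M + dM)^{-1} H^* b + (I - Q Q^*) b
g = Q @ np.linalg.solve(R.T, M @ c_star) + b_perp

kappa = np.linalg.cond(H)                            # 2-norm condition number
x = np.linalg.norm(np.linalg.solve(M, dM), 2)        # ||M^{-1} Delta M||
err_f = np.linalg.norm(b - f) / np.linalg.norm(b)
err_g = np.linalg.norm(b - g) / np.linalg.norm(b)
print(f"err_f = {err_f:.3e}, err_g = {err_g:.3e}")
```

Choosing \({\varvec{v}} = \beta {\varvec{e}}_1\) for both vectors keeps the component of \(\beta {\varvec{e}}_1\) outside the range of \({\underline{H}}_k\) untouched, so that only the part visible to the least squares problem is perturbed.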

We have discussed forward and backward error bounds which help motivate our parameter choice. Now that we have investigated each of the three phases of our method, we are ready to show numerical results.

6 Numerical Experiments

We benchmark our algorithm with problems from Regularization Tools by Hansen [10]. Each problem provides an ill-conditioned \(n \times n\) matrix A, a solution vector \({\varvec{x}}_\star \) of length n and a corresponding measured vector \({\varvec{b}}\). We let \(n = 1024\) and add a noise vector \({\varvec{e}}\) to \({\varvec{b}}\). The entries of \({\varvec{e}}\) are drawn independently from the standard normal distribution. The noise vector is then scaled such that \(\varepsilon = \Vert {\varvec{e}}\Vert \) equals \(0.01 \Vert {\varvec{b}}\Vert \) or \(0.05 \Vert {\varvec{b}}\Vert \) for 1 and 5 % noise respectively. We use \(\eta = 1.01\) for the discrepancy bound in (7). We test the algorithms with 1000 different noise vectors for every triplet A, \({\varvec{x}}_\star \), and \({\varvec{b}}\) and report the median results.
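The noise model described above can be sketched in Python/NumPy as follows. The function name `add_noise` and the sample vector are hypothetical; the experiments themselves were run in MATLAB with problems from Regularization Tools.

```python
import numpy as np

def add_noise(b, level, rng):
    """Add Gaussian noise e with ||e|| = level * ||b||; return (b + e, e)."""
    e = rng.standard_normal(b.shape)
    e *= level * np.linalg.norm(b) / np.linalg.norm(e)
    return b + e, e

rng = np.random.default_rng(0)
b_exact = np.linspace(1.0, 2.0, 1024)         # placeholder for a measured vector
b_noisy, e = add_noise(b_exact, 0.01, rng)    # 1 % noise
eps = np.linalg.norm(e)                       # noise level used in the discrepancy bound
```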

The algorithms terminate when the relative difference between two subsequent approximations is less than 0.01, when \({\varvec{x}}_{k+1}\) is (numerically) linearly dependent on the columns of \(X_k\), when neither \(U_{k+1}\) nor any of the \({V_k^{i}}\) can be expanded, or when a maximum number of iterations is reached. For Algorithm 2 we use a maximum of 20 iterations and for Algorithm 1 a maximum of \((\ell +1) \times 20\) iterations. For the sake of a fair comparison, the algorithms return the best obtained approximations and their iteration numbers.

For each test problem, the tables below list the relative error obtained with Algorithm 1, abbreviated by \(E_\text {od}\), and Algorithm 2, abbreviated by \(E_\text {md}\). OD and MD stand for one direction and multidirectional respectively. Also listed are the ratio \(\rho _E\) of \(E_\text {md}\) to \(E_\text {od}\) and the ratio \(\rho _\text {mv}\) of the number of matrix-vector products. That is,
$$\begin{aligned} \rho _E = \frac{E_\text {md}}{E_\text {od}} \quad \text {and}\quad \rho _\text {mv} = \frac{\# \text {MVs Algorithm 2} }{ \# \text {MVs Algorithm 1} } \end{aligned}$$
Only matrix-vector multiplications with A, \(A^*\), \({L^{i}}\), and \({L^{i}}^*\) count towards the total number of MVs used by each algorithm. We note, however, that multiplications with \({L^{i}}\) and \({L^{i}}^*\) are often less costly than multiplications with A and \(A^*\).
Table 1

One-parameter Tikhonov regularization results

| Problem | \(E_\text {od}\) (1 %) | \(E_\text {md}\) (1 %) | \(\rho _E\) (1 %) | \(\rho _\text {mv}\) (1 %) | \(E_\text {od}\) (5 %) | \(E_\text {md}\) (5 %) | \(\rho _E\) (5 %) | \(\rho _\text {mv}\) (5 %) |
|---|---|---|---|---|---|---|---|---|
| Baart | 1.73e-01 | 1.11e-01 | 6.44e-01 | 1.93e+00 | 2.91e-01 | 2.71e-01 | 9.33e-01 | 1.53e+00 |
| Deriv2-1 | 2.44e-01 | 2.44e-01 | 1.00e+00 | 1.00e+00 | 3.32e-01 | 3.32e-01 | 1.00e+00 | 7.78e-01 |
| Deriv2-2 | 2.35e-01 | 2.35e-01 | 1.00e+00 | 8.33e-01 | 3.22e-01 | 3.22e-01 | 9.99e-01 | 7.78e-01 |
| Deriv2-3 | 4.35e-02 | 4.35e-02 | 1.00e+00 | 9.17e-01 | 7.97e-02 | 7.64e-02 | 9.59e-01 | 1.17e+00 |
| Foxgood | 3.31e-02 | 3.30e-02 | 9.98e-01 | 6.67e-01 | 6.64e-02 | 6.63e-02 | 9.98e-01 | 6.67e-01 |
| Gravity-1 | 3.85e-02 | 3.41e-02 | 8.84e-01 | 1.08e+00 | 7.39e-02 | 6.86e-02 | 9.28e-01 | 1.11e+00 |
| Gravity-2 | 5.53e-02 | 5.26e-02 | 9.51e-01 | 1.10e+00 | 8.66e-02 | 8.39e-02 | 9.69e-01 | 1.11e+00 |
| Gravity-3 | 1.03e-01 | 9.21e-02 | 8.98e-01 | 1.08e+00 | 1.14e-01 | 1.10e-01 | 9.69e-01 | 1.11e+00 |
| Heat | 9.26e-02 | 9.12e-02 | 9.85e-01 | 1.05e+00 | 2.02e-01 | 1.91e-01 | 9.45e-01 | 1.37e+00 |
| Phillips | 2.50e-02 | 2.50e-02 | 1.00e+00 | 1.00e+00 | 4.52e-02 | 4.52e-02 | 9.99e-01 | 1.00e+00 |

Table 1 lists the results for one-parameter Tikhonov regularization, where we used the following regularization operators: the first derivative operator \(L_1\) with stencil \([1,-1]\) for Gravity-3, Heat-5, Heat, and Phillips; the second derivative operator \(L_2\) with stencil \([1,-2,1]\) for Deriv2-1, Deriv2-2, Foxgood, Gravity-1, and Gravity-2; the third derivative operator \(L_3\) with stencil \([-1,3,-3,1]\) for Baart; and the fifth derivative operator \(L_5\) with stencil \([-1,5,-10,10,-5,1]\) for Deriv2-3. The derivative operators \(L_d\) are of size \((n-d) \times n\).
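For concreteness, the \((n-d) \times n\) derivative operators can be assembled from their stencils as in the following Python/NumPy sketch. The function name `derivative_operator` is hypothetical, and a dense matrix is used only to keep the example short; for \(n = 1024\) a sparse banded format would be the natural choice.

```python
import numpy as np

def derivative_operator(stencil, n):
    """Return the (n - d) x n matrix whose rows hold the shifted stencil."""
    stencil = np.asarray(stencil, dtype=float)
    d = stencil.size - 1                      # order of the derivative
    L = np.zeros((n - d, n))
    for j, s in enumerate(stencil):
        L += s * np.eye(n - d, n, k=j)        # j-th superdiagonal
    return L

L2 = derivative_operator([1, -2, 1], 6)       # second derivative, size 4 x 6
```

Each row applies the stencil to \(d + 1\) consecutive entries, so the operator annihilates samples of polynomials of degree less than d on a uniform grid.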

The table shows that multidirectional subspace expansion can obtain small improvements in the relative error at the cost of a small number of extra matrix-vector products, especially for 1 % noise. We stress that in these cases, Algorithm 1 is allowed to perform additional MVs, but converges with a higher relative error. If there is no improvement in the relative error, we see that multidirectional subspace expansion can improve convergence, for example, for the Deriv2 problems as well as Foxgood.
Table 2

Multiparameter Tikhonov regularization results

| Problem | \(E_\text {od}\) (1 %) | \(E_\text {md}\) (1 %) | \(\rho _E\) (1 %) | \(\rho _\text {mv}\) (1 %) | \(E_\text {od}\) (5 %) | \(E_\text {md}\) (5 %) | \(\rho _E\) (5 %) | \(\rho _\text {mv}\) (5 %) |
|---|---|---|---|---|---|---|---|---|
| Baart | 1.72e-01 | 5.39e-02 | 3.12e-01 | 2.60e+00 | 2.84e-01 | 2.59e-01 | 9.14e-01 | 2.60e+00 |
| Deriv2-1 | 2.27e-01 | 5.82e-03 | 2.56e-02 | 1.81e+00 | 3.21e-01 | 2.91e-02 | 9.08e-02 | 2.20e+00 |
| Deriv2-2 | 2.29e-01 | 2.03e-02 | 8.84e-02 | 1.55e+00 | 2.95e-01 | 4.91e-02 | 1.66e-01 | 1.72e+00 |
| Deriv2-3 | 4.35e-02 | 4.32e-02 | 9.93e-01 | 1.00e+00 | 7.71e-02 | 7.71e-02 | 1.00e+00 | 1.00e+00 |
| Foxgood | 3.29e-02 | 1.10e-02 | 3.35e-01 | 1.35e+00 | 6.26e-02 | 5.44e-02 | 8.69e-01 | 1.35e+00 |
| Gravity-1 | 3.69e-02 | 1.83e-02 | 4.96e-01 | 1.18e+00 | 7.24e-02 | 4.52e-02 | 6.25e-01 | 1.63e+00 |
| Gravity-2 | 5.52e-02 | 3.97e-02 | 7.19e-01 | 2.04e+00 | 8.52e-02 | 6.96e-02 | 8.17e-01 | 2.26e+00 |
| Gravity-3 | 1.02e-01 | 9.24e-02 | 9.07e-01 | 1.89e+00 | 1.14e-01 | 1.08e-01 | 9.54e-01 | 1.72e+00 |
| Heat | 8.79e-02 | 8.77e-02 | 9.98e-01 | 1.19e+00 | 1.97e-01 | 1.83e-01 | 9.30e-01 | 1.40e+00 |
| Phillips | 2.49e-02 | 2.47e-02 | 9.90e-01 | 1.21e+00 | 4.08e-02 | 4.01e-02 | 9.83e-01 | 1.40e+00 |

Table 2 lists the results for multiparameter Tikhonov regularization. We used the following regularization operators for each problem: the derivative operator \(L_d\) as listed above, the identity operator I, and the orthogonal projection \((I - N_d N_d^*)\), where the columns of \(N_d\) are an orthonormal basis for the nullspace of \(L_d\).
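The projection \((I - N_d N_d^*)\) never has to be formed explicitly. The Python/NumPy sketch below shows one way to obtain \(N_d\) and apply the projection, using the fact that a d-th order difference stencil on a uniform grid annihilates sampled polynomials of degree less than d. The function names are hypothetical and the construction through a QR decomposition is our own choice for illustration; the source does not state how \(N_d\) was computed.

```python
import numpy as np

def nullspace_basis(d, n):
    """Orthonormal basis of sampled polynomials of degree < d on n points."""
    t = np.linspace(-1.0, 1.0, n)
    V = np.vander(t, d, increasing=True)      # columns 1, t, ..., t^(d-1)
    N, _ = np.linalg.qr(V)                    # reduced QR orthonormalizes V
    return N

def apply_projection(N, x):
    """Apply (I - N N^*) to x without forming the n x n matrix."""
    return x - N @ (N.T @ x)

N3 = nullspace_basis(3, 40)
x = np.random.default_rng(0).standard_normal(40)
y = apply_projection(N3, x)                   # component of x outside the nullspace
```

Applying the projection in this way costs only \(O(nd)\) operations per matrix-vector product.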

Overall, we observe larger improvements in the relative error for multidirectional subspace expansion, but also a larger number of MVs. We no longer see cases where multidirectional subspace expansion terminates with fewer MVs. In fact, the relative error is nearly the same for Heat with 1 % noise, although more MVs are required. Finally, Fig. 1 illustrates an example of the improved results which can be obtained by using multidirectional subspace expansion.
Fig. 1

baart test matrix with \(n = 1024\) and 1 % noise. The solid line is the exact solution. The dashed line is the solution obtained with multiparameter regularization and the residual subspace expansion (Algorithm 1). The dotted line is the solution obtained with multiparameter regularization and multidirectional subspace expansion (Algorithm 2)

In the next tests we attempt to reconstruct the original image from a blurred and noisy observation. Consider an \(n \times n\) grayscale image with pixel values in the interval [0, 1]. Then \({\varvec{x}}\) is a vector of length \(n^2\) obtained by stacking the columns of the image below each other. The matrix A represents a Gaussian blurring operator, generated with blur from Regularization Tools. The matrix A is block-Toeplitz with half-bandwidth band=11 and the amount of blurring is given by the variance sigma=5. The entries of the noise vector \({\varvec{e}}\) are independently drawn from the standard normal distribution after which the vector is scaled such that \(\varepsilon = {\mathbb {E}}[\Vert {\varvec{e}}\Vert ] = 0.05 \Vert {\varvec{b}}\Vert \). We take \(\eta \) such that \(\Vert {\varvec{e}}\Vert \le \eta \varepsilon \) in 99.9 % of the cases. That is,
$$\begin{aligned} \eta = 1 + \frac{3.090232}{\sqrt{2 n^2}}. \end{aligned}$$
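The constant 3.090232 is the 99.9 % quantile of the standard normal distribution. A plausible reading of the formula, stated here as an assumption, is that the norm of a Gaussian vector with \(n^2\) entries and standard deviation \(\sigma \) is approximately normally distributed with mean \(\sigma n\) and standard deviation \(\sigma / \sqrt{2}\), so that the 99.9 % quantile relative to the mean is \(1 + 3.090232 / \sqrt{2 n^2}\). The following Python sketch reproduces the constant with the standard library; the function name `discrepancy_eta` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def discrepancy_eta(n, confidence=0.999):
    """Return eta = 1 + z / sqrt(2 n^2) for an n x n image."""
    z = NormalDist().inv_cdf(confidence)      # standard normal quantile
    return 1.0 + z / sqrt(2.0 * n**2)

z_999 = NormalDist().inv_cdf(0.999)           # quantile appearing in the formula
eta_256 = discrepancy_eta(256)                # example value for a 256 x 256 image
```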
For regularization we choose an approximation to the Perona–Malik [22] operator, whose definition involves a small positive constant \(\rho \). Because the Perona–Malik operator is nonlinear, we first perform a small number of iterations with a finite difference approximation \(L_{{\varvec{b}}}\) of this operator. The resulting intermediate solution \(\widetilde{\varvec{x}}\) is used for a new approximation \(L_{\widetilde{\varvec{x}}}\) of the operator. Finally, we run the algorithms a second time with \(L_{\widetilde{\varvec{x}}}\) and more iterations; see Reichel et al. [23] for more information regarding the implementation of the Perona–Malik operator.
Fig. 2

Deblurring results for Lizards. The original (left), observed (middle), and reconstructed images (right)

Fig. 3

Deblurring results for Saturn. The original (left), observed (middle), and reconstructed images (right)

The first test image is also used in [13, 23, 25], and is shown in Figure 2. We use \(\rho = 0.075\), 20 iterations for the first run, and 100 iterations for the second run. The second image is an image of Saturn, see Figure 3. For this image we use \(\rho = 0.03\), 25 iterations for the first run and 150 iterations for the second run. In both cases we stop the iterations around the point where convergence flattens out, as can be seen from the convergence history in Figure 4. The figure uses the peak signal-to-noise ratio (PSNR) given by
$$\begin{aligned} -20 \log _{10}\left( \frac{\Vert {\varvec{x}}_\star - {\varvec{x}}_k\Vert }{n} \right) \end{aligned}$$
versus the iteration number k. A higher PSNR means a higher quality reconstruction.
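As a small illustration, the PSNR above can be computed as in the following Python/NumPy sketch, which assumes pixel values in [0, 1] and a vectorized \(n \times n\) image, so that n is recovered as the square root of the vector length. The function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(x_exact, x_approx):
    """PSNR in dB for vectorized n x n images with pixel values in [0, 1]."""
    n = np.sqrt(x_exact.size)                 # image side length
    return -20.0 * np.log10(np.linalg.norm(x_exact - x_approx) / n)

x_exact = np.zeros(16)                        # 4 x 4 test image
x_approx = np.full(16, 0.1)                   # uniform error of 0.1 per pixel
value = psnr(x_exact, x_approx)               # error norm 0.4, divided by n = 4
```

With a uniform pixel error of 0.1 the argument of the logarithm is 0.1, which corresponds to a PSNR of 20 dB.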
Fig. 4

Convergence history for Lizards (left) and Saturn (right)

Table 3

The number of matrix-vector products and wall clock time used by the different methods. The results in the upper rows are for Lizards and the results in the lower rows are for Saturn

| Image | Method | Total | \(A\) | \(A^*\) | \(L\) | \(L^*\) | Time (s) |
|---|---|---|---|---|---|---|---|
| Lizards | Alg 1 | 399 | 100 | 100 | 100 | 99 | 30.9 |
| Lizards | Alg 2 | 581 | 191 | 100 | 191 | 99 | 38.7 |
| Lizards | Parity | 395 | 129 | 69 | 129 | 68 | 23.5 |
| Saturn | Alg 1 | 599 | 150 | 150 | 150 | 149 | 82.3 |
| Saturn | Alg 2 | 889 | 295 | 150 | 295 | 149 | 98.4 |
| Saturn | Parity | 637 | 211 | 108 | 211 | 107 | 62.3 |

We observe that multidirectional subspace expansion may allow convergence to a more accurate solution. Because multidirectional subspace expansion requires extra matrix-vector products, Table 3 reports the cost of both algorithms, as well as the cost of Algorithm 2 at the point where the PSNR of its output achieves parity with the PSNR of the output of Algorithm 1. When parity is achieved, there is only a small difference in the total number of matrix-vector products, but a large improvement in wall clock time. This improvement is in large part due to the block operations, which can only be used in Algorithm 2. For reference, the runtimes were obtained on an Intel Core i7-3770 with MATLAB R2015b on 64-bit Linux 4.2.5.
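The comparison at parity can be quantified directly from the totals and timings reported in Table 3. The short sketch below only restates those reported numbers as relative changes with respect to Algorithm 1; the dictionary layout and the helper `relative_change` are illustrative choices.

```python
# (total matrix-vector products, wall clock time in seconds), copied from Table 3
table3 = {
    "Lizards": {"Alg 1": (399, 30.9), "Alg 2": (581, 38.7), "Parity": (395, 23.5)},
    "Saturn": {"Alg 1": (599, 82.3), "Alg 2": (889, 98.4), "Parity": (637, 62.3)},
}

def relative_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old

summary = {}
for image, rows in table3.items():
    matvecs_alg1, time_alg1 = rows["Alg 1"]
    matvecs_parity, time_parity = rows["Parity"]
    summary[image] = (
        relative_change(matvecs_parity, matvecs_alg1),
        relative_change(time_parity, time_alg1),
    )
    print(f"{image}: matvecs {summary[image][0]:+.1f}%, time {summary[image][1]:+.1f}%")
```

For Lizards this gives about 1.0% fewer matrix-vector products and about 23.9% less time; for Saturn it gives about 6.3% more matrix-vector products and about 24.3% less time.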

7 Conclusions

We have presented a new method for large-scale Tikhonov regularization problems. In accordance with Algorithm 2, the method combines a new multidirectional subspace expansion with optional truncation to produce a higher quality search space. The multidirectional expansion generates a richer search space, whereas the truncation ensures moderate growth. Numerical results illustrate that our method can yield more accurate results or faster convergence. Furthermore, we have presented lower and upper bounds on the regularization parameter when the discrepancy principle is applied to one-parameter regularization. These lower and upper bounds can be used in particular to initiate bisection or the secant method. In addition, we have introduced a straightforward parameter choice for multiparameter regularization, as summarized by Algorithm 3. The parameter selection satisfies the discrepancy principle, and is based on easy to compute derivatives that are related to the perturbation results of Sect. 5.
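To illustrate how a lower and an upper bound on the regularization parameter can initiate bisection for the discrepancy principle, here is a minimal sketch for one-parameter regularization. It makes several assumptions: the regularization operator is the identity, the problem is small enough for a dense solve, the safety factor `eta` is an illustrative choice, and the bracket `[1e-12, 1e6]` is a hypothetical placeholder rather than the bounds derived in this paper.

```python
import numpy as np

def residual_norm(A, b, mu):
    """Return ||A x_mu - b|| for x_mu = argmin ||A x - b||^2 + mu ||x||^2 (L = I assumed)."""
    n = A.shape[1]
    K = np.vstack([A, np.sqrt(mu) * np.eye(n)])
    rhs = np.concatenate([b, np.zeros(n)])
    x = np.linalg.lstsq(K, rhs, rcond=None)[0]
    return np.linalg.norm(A @ x - b)

def discrepancy_bisection(A, b, target, mu_lo, mu_hi, tol=1e-8, max_iter=200):
    """Bisection on log(mu) for residual_norm(mu) = target.

    Assumes residual_norm(mu_lo) < target < residual_norm(mu_hi); the residual
    norm increases monotonically with mu.
    """
    mu = np.sqrt(mu_lo * mu_hi)
    for _ in range(max_iter):
        mu = np.sqrt(mu_lo * mu_hi)      # geometric midpoint of the bracket
        r = residual_norm(A, b, mu)
        if abs(r - target) <= tol * target:
            break
        if r < target:
            mu_lo = mu                   # too little regularization
        else:
            mu_hi = mu                   # too much regularization
    return mu

# Toy problem: Gaussian blur of a smooth signal with a known noise level eps.
rng = np.random.default_rng(0)
n = 40
i = np.arange(n)
A = np.exp(-0.5 * ((i[:, None] - i[None, :]) / 2.0) ** 2)
A /= A.sum(axis=1, keepdims=True)
x_true = np.sin(np.pi * i / (n - 1))
e = rng.standard_normal(n)
eps = 1e-2 * np.linalg.norm(A @ x_true)
e *= eps / np.linalg.norm(e)             # scale the noise so that ||e|| = eps
b = A @ x_true + e

eta = 1.1                                # illustrative safety factor
mu = discrepancy_bisection(A, b, eta * eps, 1e-12, 1e6)
```

A tighter initial bracket reduces the number of bisection steps, and each step costs one regularized solve, which is where good bounds pay off.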

Notes

Acknowledgments

We would like to thank the editor and the referees for their excellent feedback and helpful suggestions.

References

  1. Belge, M., Kilmer, M.E., Miller, E.L.: Efficient determination of multiple regularization parameters in a generalized L-curve framework. Inverse Probl. 18(4), 1161–1183 (2002)
  2. Brezinski, C., Redivo-Zaglia, M., Rodriguez, G., Seatzu, S.: Multi-parameter regularization techniques for ill-conditioned linear systems. Numer. Math. 94(2), 203–228 (2003)
  3. Calvetti, D., Reichel, L.: Tikhonov regularization of large linear problems. BIT 43(2), 263–283 (2003)
  4. Fong, D., Saunders, M.A.: LSMR: an iterative algorithm for sparse least-squares problems. SIAM J. Sci. Comput. 33(5), 2950–2971 (2011)
  5. Fornasier, M., Naumova, V., Pereverzyev, S.V.: Parameter choice strategies for multipenalty regularization. SIAM J. Numer. Anal. 52(4), 1770–1794 (2014)
  6. Gazzola, S., Novati, P.: Multi-parameter Arnoldi–Tikhonov methods. Electron. Trans. Numer. Anal. 40, 452–475 (2013)
  7. Gazzola, S., Reichel, L.: A new framework for multi-parameter regularization. BIT, 1–31 (2015)
  8. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
  9. Groetsch, C.: The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Pitman Publishing, Boston (1984)
  10. Hansen, P.C.: Regularization tools: a MATLAB package for analysis and solution of discrete ill-posed problems. Numer. Algorithms 6, 1–35 (1994)
  11. Hansen, P.C.: Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM, Philadelphia (1998)
  12. Hochstenbach, M.E., Reichel, L.: An iterative method for Tikhonov regularization with a general linear regularization operator. J. Integral Equ. Appl. 22(3), 465–482 (2010)
  13. Hochstenbach, M.E., Reichel, L., Yu, X.: A Golub-Kahan-type reduction method for matrix pairs. J. Sci. Comput. 65(2), 767–789 (2015)
  14. Ito, K., Jin, B., Takeuchi, T.: Multi-parameter Tikhonov Regularization. arXiv:1102.1173v2 [math.NA] (2011). Preprint
  15. Kilmer, M.E., Hansen, P., Español, M.: A projection-based approach to general-form Tikhonov regularization. SIAM J. Sci. Comput. 29(1), 315–330 (2007)
  16. Kunisch, K., Pock, T.: A bilevel optimization approach for parameter learning in variational models. SIAM J. Imaging Sci. 6(2), 938–983 (2013)
  17. Lampe, J., Reichel, L., Voss, H.: Large-scale Tikhonov regularization via reduction by orthogonal projection. Linear Algebra Appl. 436(8), 2845–2865 (2012)
  18. Li, R.C., Ye, Q.: A Krylov subspace method for quadratic matrix polynomials with applications to constrained least squares problems. SIAM J. Matrix Anal. Appl. 25(2), 405–528 (2003)
  19. Lu, S., Pereverzyev, S.V.: Multi-parameter regularization and its numerical realization. Numer. Math. 118(1), 1–31 (2011)
  20. Lu, S., Pereverzyev, S.V., Shao, Y., Tautenhahn, U.: Discrepancy curves for multi-parameter regularization. J. Inverse Ill-Posed Probl. 18(6), 655–676 (2010)
  21. Paige, C.C., Saunders, M.A.: LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 8(1), 43–71 (1982)
  22. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
  23. Reichel, L., Sgallari, F., Ye, Q.: Tikhonov regularization based on generalized Krylov subspace methods. Appl. Numer. Math. 62(9), 1215–1228 (2012)
  24. Reichel, L., Yu, X.: Matrix decompositions for Tikhonov regularization. Electron. Trans. Numer. Anal. 43, 223–243 (2015)
  25. Reichel, L., Yu, X.: Tikhonov regularization via flexible Arnoldi reduction. BIT 55(4), 1145–1168 (2015)

Copyright information

© The Author(s) 2016

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
