## Introduction

We consider one-parameter and multiparameter Tikhonov regularization problems of the form

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^{\ell } {\mu ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \qquad (\ell \ge 1), \end{aligned}
(1)

where $$\Vert \cdot \Vert$$ denotes the 2-norm and the superscript i is used as an index. We focus on large-scale discrete ill-posed problems such as the discretization of Fredholm integral equations of the first kind. More precisely, assume A is an ill-conditioned or even singular $$m \times n$$ matrix with $$m \ge n$$, $${L^{i}}$$ are $$p^{i} \times n$$ matrices such that the nullspaces of A and $${L^{i}}$$ intersect trivially, and $${\mu ^{i}}$$ are nonnegative regularization parameters. Furthermore, assume $${\varvec{b}}$$ is contaminated by an error $${\varvec{e}}$$ and satisfies $$\varvec{b} = A {\varvec{x}}_\star + {\varvec{e}}$$, where $${\varvec{x}}_\star$$ is the exact solution. Finally, we assume that a bound $$\Vert {\varvec{e}}\Vert \le \varepsilon$$ is available, so that the discrepancy principle can be used.
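For concreteness, small dense instances of (1) can be solved directly by stacking the fidelity and penalty terms into a single least-squares problem. The NumPy sketch below (the function and problem data are our own illustration, not from the paper) does this for $$\ell = 2$$ with identity and first-difference regularization operators:

```python
import numpy as np

def tikhonov_multi(A, b, Ls, mus):
    """Solve min_x ||A x - b||^2 + sum_i mu_i ||L_i x||^2 by stacking
    all terms into one least-squares problem."""
    M = np.vstack([A] + [np.sqrt(mu) * L for mu, L in zip(mus, Ls)])
    rhs = np.concatenate([b] + [np.zeros(L.shape[0]) for L in Ls])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

# Demo: an ill-conditioned Vandermonde matrix, identity and
# first-difference regularization operators.
rng = np.random.default_rng(0)
n = 20
A = np.vander(np.linspace(0.0, 1.0, n), n, increasing=True)
x_star = np.sin(np.linspace(0.0, np.pi, n))
b = A @ x_star + 1e-6 * rng.standard_normal(n)
L1 = np.eye(n)
L2 = np.diff(np.eye(n), axis=0)   # (n-1) x n first-difference operator
mus = [1e-8, 1e-8]
x = tikhonov_multi(A, b, [L1, L2], mus)
```

This direct approach scales poorly with n, which is precisely why the paper restricts (1) to a low-dimensional search space.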

In one-parameter Tikhonov regularization ($$\ell = 1$$), the choice of the regularization operator is typically significant, since frequencies in the nullspace of the operator remain unpenalized. Multiparameter Tikhonov can be used when a satisfactory choice of the regularization operator is unknown in advance, or can be seen as an attempt to combine the strengths of different regularization operators. In some applications, using more than one regularization operator and parameter allows for more accurate solutions [1, 2, 17, 20].

Solving (1) for large-scale problems may be challenging. If the $${\mu ^{i}}$$ are fixed a priori, methods such as LSQR or LSMR may be used. However, the problem becomes more complicated when the regularization parameters are not fixed in advance [12, 15, 17]. In this paper, we present a new subspace method consisting of three phases: a new expansion phase, a new extraction phase, and a new truncation phase. To be more specific, let $$\mathcal {X}_k$$ be a subspace of dimension $$k \ll n$$, and let the columns of $$X_k$$ form an orthonormal basis for $$\mathcal {X}_k$$. Then we can compute matrix decompositions

\begin{aligned} A X_k= & {} U_{k+1} {\underline{H}}_k \nonumber \\ {L^{i}} X_k= & {} {V_k^{i}} {K_k^{i}} \qquad (i = 1, 2, \dots , \ell ), \end{aligned}
(2)

where $$U_{k+1}$$ and $${V_k^{i}}$$ have orthonormal columns, $$\beta {\varvec{u}}_1 = \varvec{b}$$, $$\beta = \Vert {\varvec{b}}\Vert$$, $${\underline{H}}_k$$ is a $$(k+1) \times k$$ Hessenberg matrix, and $${K_k^{i}}$$ is upper triangular. Denote $$\varvec{\mu } = ({\mu ^{1}}, \dots , {\mu ^{\ell }})$$ for convenience. Now restrict the solution space to the range of $$X_k$$, so that $${\varvec{x}}_k(\varvec{\mu }) = X_k {\varvec{c}}_k(\varvec{\mu })$$, where

\begin{aligned} {\varvec{c}}_k(\varvec{\mu })= & {} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert A X_k {\varvec{c}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {L^{i}} X_k {\varvec{c}}\Vert ^2 \nonumber \\= & {} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {K_k^{i}} {\varvec{c}}\Vert ^2. \end{aligned}
(3)

The vector $${\varvec{e}}_1$$ is the first standard basis vector of appropriate dimension. Our paper has three contributions. First, a new expansion phase where we add multiple search directions to the search space. Second, a new truncation phase which removes unwanted new search directions. Third, a new method for selecting the regularization parameters $${\mu _k^{i}}$$ in the extraction phase. The three phases work alongside each other: the intermediate solution obtained in the extraction phase is preserved in the truncation phase, whereas the remaining perpendicular component(s) from the expansion phase are removed.

The paper is organized as follows. In Sect. 2 an existing nonlinear subspace method is discussed, whereafter we propose the new multidirectional subspace expansion of the expansion phase. Discussion of the truncation phase follows immediately. Section 3 focuses on discrepancy-principle-based parameter selection for one-parameter regularization. New lower and upper bounds on the regularization parameter are provided. Sections 4 and 5 describe the extraction phase: in the former, a straightforward parameter selection strategy for multiparameter regularization is given; in the latter, a justification using perturbation analysis. Numerical experiments are performed in Sect. 6 and demonstrate the competitiveness of our new method. We end with concluding remarks in Sect. 7.

## Subspace Expansion for Multiparameter Tikhonov

Let us first consider one-parameter Tikhonov regularization with a general regularization operator. Then $$\ell = 1$$ and we write $$\mu = {\mu ^{1}}$$, $$L = {L^{1}}$$, and $$K_k = {K_k^{1}}$$, such that (1) simplifies to

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 + \mu \Vert L {\varvec{x}}\Vert ^2. \end{aligned}

When $$L=I$$ we use the Golub–Kahan–Lanczos bidiagonalization procedure to generate the Krylov subspace $$\mathcal {K}_k(A^*A, A^*{\varvec{b}}) = {{\,\mathrm{span}\,}}\{A^*{\varvec{b}}, (A^*A)A^*{\varvec{b}}, \dots , (A^*A)^{k-1}A^*{\varvec{b}}\}$$. In this case $${\underline{H}}_k$$ is lower bidiagonal, $$K_k$$ is the identity, and the next basis vector is given by

\begin{aligned} {\varvec{x}}_{k+1} = \frac{(I - X_k X_k^*) A^* {\varvec{u}}_{k+1}}{ \Vert (I - X_k X_k^*) A^* {\varvec{u}}_{k+1} \Vert } \end{aligned}

If $$L\ne I$$ one can still try to use the above Krylov subspace; however, it may be more natural to consider a shift-independent generalized Krylov subspace, spanned by the first k vectors in

\begin{aligned}&\text {Group 0}\quad A^*{\varvec{b}} \\&\text {Group 1}\quad (A^*A) A^*{\varvec{b}}, (L^*L) A^*{\varvec{b}} \\&\text {Group 2}\quad (A^*A)^2 A^*{\varvec{b}}, (A^*A) (L^*L) A^*{\varvec{b}}, (L^*L) (A^*A) A^*{\varvec{b}}, (L^*L)^2 A^*{\varvec{b}} \\&\dots \end{aligned}

This generalized Krylov subspace was first studied by Li and Ye and later by Reichel et al. An orthonormal basis can be created with a generalization of Golub–Kahan–Lanczos bidiagonalization. However, while the search space grows linearly as a function of the number of matrix-vector products, the dimension of the generalized Krylov subspace grows exponentially as a function of the total degree of a bivariate matrix polynomial. As a result, if we take any vector in the subspace and write it as $$p(A^*A, L^*L)A^* {\varvec{b}}$$, where p is a bivariate polynomial, then p has degree at most $$\lfloor \log _2 k \rfloor$$. This low degree may be undesirable, especially for small regularization parameters $$\mu$$. Reichel and Yu [24, 25] solve this in part with algorithms that can prioritize one operator over the other. For instance, if $${\varvec{w}}$$ is a vector in group j and B has priority over A, then group $$j+1$$ contains $$(A^*A){\varvec{w}}$$, $$(B^*B){\varvec{w}}$$, $$(B^*B)^2{\varvec{w}}$$, ..., $$(B^*B)^\rho {\varvec{w}}$$. The downside is that $$\rho$$ is a user-defined constant, and that the expansion vectors are not necessarily optimal.
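The group structure above is easy to reproduce numerically. The sketch below (our own helper, assuming small dense A and L) enumerates all monomials $$p(A^*A, L^*L)A^*{\varvec{b}}$$ group by group and orthonormalizes them, discarding numerically dependent directions:

```python
import numpy as np
from itertools import product

def generalized_krylov_basis(A, b, L, degree):
    """Orthonormal basis for the generalized Krylov subspace spanned by
    p(A^*A, L^*L) A^* b over all monomials p of total degree <= `degree`,
    enumerated group by group as in the text."""
    AtA, LtL = A.T @ A, L.T @ L
    vectors = [A.T @ b]                    # group 0
    for d in range(1, degree + 1):
        # group d: all length-d ordered products of A^*A and L^*L
        for word in product((AtA, LtL), repeat=d):
            v = A.T @ b
            for M in word:
                v = M @ v
            vectors.append(v)
    # Gram-Schmidt with reorthogonalization; drop dependent directions
    X = np.empty((A.shape[1], 0))
    for v in vectors:
        w = v - X @ (X.T @ v)
        w = w - X @ (X.T @ w)
        if np.linalg.norm(w) > 1e-10 * np.linalg.norm(v):
            X = np.column_stack([X, w / np.linalg.norm(w)])
    return X

# Demo: groups 0..2 give at most 1 + 2 + 4 = 7 candidate directions.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
L = rng.standard_normal((6, 6))
b = rng.standard_normal(8)
X = generalized_krylov_basis(A, b, L, degree=2)
```

The exponential growth of the groups (2^d vectors in group d) versus the linear growth of the basis dimension is exactly the degree limitation discussed above.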

An alternative approach is a greedy nonlinear method described by Lampe et al. We briefly review their method and state a straightforward extension to multiparameter Tikhonov regularization. First note that the low-dimensional minimization in (3) simplifies to

\begin{aligned} {\varvec{c}}_k(\mu )&= \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert AX_k {\varvec{c}} - {\varvec{b}}\Vert ^2 {} + \mu \Vert LX_k {\varvec{c}}\Vert ^2 \\&= \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \mu \Vert K_k {\varvec{c}}\Vert ^2, \end{aligned}

in the one-parameter case. Next, compute a value $$\mu = \mu _k$$ using, e.g., the discrepancy principle. It is easy to verify that

\begin{aligned}&A^* {\varvec{b}} - (A^* A + \mu _k L^* L) {\varvec{x}}_k(\mu _k)\\&\quad = A^* U_{k+1} (\beta {\varvec{e}}_1 - {\underline{H}}_k {\varvec{c}}_k(\mu _k)) {} - \mu _k L^* V_{k} K_k {\varvec{c}}_k(\mu _k) \end{aligned}

is perpendicular to the range of $$X_k$$, and is, up to sign, the gradient of the cost function

\begin{aligned} {\varvec{x}} \mapsto \frac{1}{2}( \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 + \mu \Vert L {\varvec{x}}\Vert ^2 ) \end{aligned}

at the point $${\varvec{x}}_k(\mu _k)$$. Therefore, this vector is used to expand the search space. As usual, expansion and extraction are repeated until suitable stopping criteria are met.

As previously stated, Lampe et al. consider only one-parameter Tikhonov regularization; however, their method readily extends to multiparameter Tikhonov regularization. Again, the first step is to decide on regularization parameters $$\varvec{\mu }_k$$. Next, use the residual of the normal equations

\begin{aligned}&A^* {\varvec{b}} - \Big ( A^* A + \sum _{i=1}^\ell {\mu _k^{i}}{L^{i}}^* {L^{i}} \Big ) {\varvec{x}}_k(\varvec{\mu }_k)\\&\quad = A^* U_{k+1} (\beta {\varvec{e}}_1 - {\underline{H}}_k {\varvec{c}}_k(\varvec{\mu }_k)) {} - \sum _{i=1}^\ell {\mu _k^{i}} {L^{i}}^* {V_k^{i}} {K_k^{i}} {\varvec{c}}_k(\varvec{\mu }_k), \end{aligned}

to expand the search space. Note that the residual is again orthogonal to the range of $$X_k$$ and is, up to sign, the gradient of the cost function

\begin{aligned} {\varvec{x}} \mapsto \frac{1}{2}\Big ( \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \Big ). \end{aligned}

We summarize this multiparameter method in Algorithm 1, but remark that in practice we initially use Golub–Kahan–Lanczos bidiagonalization until a $$\varvec{\mu }_k$$ can be found that satisfies the discrepancy principle.

### Algorithm 1

(Generalized Krylov subspace Tikhonov regularization; extension of the method of Lampe et al.)

Input: Measurement matrix A, regularization operators $${L^{1}}$$, ..., $${L^{\ell }}$$, and data $${\varvec{b}}$$.

Output: Approximate solution $${\varvec{x}}_k \approx {\varvec{x}}_\star$$.

1. 1.

Initialize $$\beta = \Vert {\varvec{b}}\Vert$$, $$U_1 = {\varvec{b}} / \beta$$, $$X_0 = []$$, $${\varvec{x}}_0 = \mathbf{0 }$$, and $$\varvec{\mu }_0 = \mathbf{0 }$$. for $$k = 1, 2, \dots$$ do

2. 2.

Expand $$X_{k-1}$$ with $$A^* {\varvec{b}} - ( A^* A + \sum _{i=1}^\ell {\mu _{k-1}^{i}} {L^{i}}^* {L^{i}}) {\varvec{x}}_{k-1}$$.

3. 3.

Update $$A X_k = U_{k+1} {\underline{H}}_k$$ and $${L^{i}} X_k = {V_k^{i}} {K_k^{i}}$$.

4. 4.

Select $$\varvec{\mu }_k$$; see Sect. 4 and Algorithm 3.

5. 5.

Solve (3) for $${\varvec{c}}_k = {\varvec{c}}_k(\varvec{\mu }_k)$$.

6. 6.

$${\varvec{x}}_k = X_k {\varvec{c}}_k$$.

7. 7.

if $$\Vert {\varvec{x}}_k - {\varvec{x}}_{k-1}\Vert /\Vert {\varvec{x}}_k\Vert$$ is sufficiently small then break
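A minimal dense realization of Algorithm 1 might look as follows. For simplicity we take equal parameters $${\mu _k^{i}} = \mu _k$$ chosen by log-scale bisection on the discrepancy, re-orthonormalize the basis explicitly, and solve the projected problem by stacked least squares instead of updating the decompositions (2); all helper names are ours, and this is a sketch rather than the authors' implementation.

```python
import numpy as np

def solve_projected(A, b, Ls, X, mu):
    """Solve the projected problem (3) with equal parameters mu^i = mu,
    by stacking the terms into one dense least-squares problem."""
    M = np.vstack([A @ X] + [np.sqrt(mu) * (L @ X) for L in Ls])
    rhs = np.concatenate([b] + [np.zeros(L.shape[0]) for L in Ls])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

def discrepancy_mu(A, b, Ls, X, eps, eta, steps=60):
    """Largest mu (log-scale bisection) with ||A X c(mu) - b|| < eta*eps."""
    lo, hi = 1e-16, 1e16
    for _ in range(steps):
        mid = np.sqrt(lo * hi)
        c = solve_projected(A, b, Ls, X, mid)
        if np.linalg.norm(A @ (X @ c) - b) < eta * eps:
            lo = mid
        else:
            hi = mid
    return lo

def alg1_sketch(A, b, Ls, eps, eta=1.01, maxit=60, tol=1e-8):
    x0 = A.T @ b
    X = (x0 / np.linalg.norm(x0))[:, None]   # start as in GKL
    x_old = np.zeros(A.shape[1])
    for _ in range(maxit):
        mu = discrepancy_mu(A, b, Ls, X, eps, eta)
        c = solve_projected(A, b, Ls, X, mu)
        x = X @ c
        if (np.linalg.norm(A @ x - b) <= eta * eps
                and np.linalg.norm(x - x_old) <= tol * np.linalg.norm(x)):
            break
        # step 2: expand with the residual of the normal equations
        r = A.T @ (b - A @ x) - mu * sum(L.T @ (L @ x) for L in Ls)
        w = r - X @ (X.T @ r)
        w = w - X @ (X.T @ w)                # reorthogonalize
        if np.linalg.norm(w) > 1e-12 * np.linalg.norm(r):
            X = np.column_stack([X, w / np.linalg.norm(w)])
        x_old = x
    return x, mu

# Demo: 1D Gaussian blur, identity regularization, known noise level.
rng = np.random.default_rng(1)
n = 32
t = np.linspace(0.0, 1.0, n)
A = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.01) / n
x_star = np.exp(-(t - 0.4) ** 2 / 0.02)
e = 1e-4 * rng.standard_normal(n)
b = A @ x_star + e
eps = np.linalg.norm(e)
x, mu = alg1_sketch(A, b, [np.eye(n)], eps)
```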

Suitable regularization operators often depend on the problem and its solution. Multiparameter regularization may be used when a priori information is lacking. In this case, it is not obvious that the residual vector above is a “good” expansion vector, in particular when the intermediate regularization parameters $${\varvec{\mu }}_k$$ are inaccurate. Hence, we propose to remove the dependence on the parameters to some extent by expanding the search space with the vectors

\begin{aligned} A^* A {\varvec{x}}_k(\varvec{\mu }_k), \quad {L^{1}}^* {L^{1}} {\varvec{x}}_k(\varvec{\mu }_k), \quad \dots , \quad {L^{\ell }}^* {L^{\ell }} {\varvec{x}}_k(\varvec{\mu }_k), \end{aligned}
(4)

separately. Here, we omit $$A^* {\varvec{b}}$$ as it is already contained in $$X_k$$. Since we expand the search space in multiple directions, we refer to this expansion as a “multidirectional” subspace expansion. Observe that the previous residual expansion vector is in the span of the multidirectional expansion vectors.

It is unappealing for the search space to grow with $$\ell +1$$ basis vectors per iteration, because the cost of orthogonalization and the cost of solving the projected problems depend on the dimension of the search space. Therefore, we wish to condense the best portions of the multiple directions into a single vector, and use the following approach. First we expand $$X_k$$ with the vectors in (4) and obtain $${\widetilde{X}}_{k+\ell +1}$$. Then we compute the decompositions

\begin{aligned} A {\widetilde{X}}_{k+\ell +1}{}= & {} {\widetilde{U}}_{k+\ell +2} \widetilde{{\underline{H}}}_{k+\ell +1} \\ {L^{i}} {\widetilde{X}}_{k+\ell +1} {}= & {} {{\widetilde{V}}_{k+\ell +1}^{i}} {{\widetilde{K}}_{k+\ell +1}^{i}} \qquad (i=1, 2, \dots , \ell ), \end{aligned}

analogously to (2) and determine parameters $${\varvec{\mu }}_{k+1}$$ and the approximate solution $$\widetilde{\varvec{c}}_{k+\ell +1}$$. Next, we compute

\begin{aligned} A ({\widetilde{X}}_{k+\ell +1} Z^*) {}= & {} ({\widetilde{U}}_{k+\ell +2} P^*) (P \widetilde{{\underline{H}}}_{k+\ell +1} Z^*) \nonumber \\ {L^{i}} ({\widetilde{X}}_{k+\ell +1} Z^*) {}= & {} ({{\widetilde{V}}_{k+\ell +1}^{i}} Q^{i*}) (Q^i {{\widetilde{K}}_{k+\ell +1}^{i}} Z^*) \qquad (i=1, 2, \dots , \ell ), \end{aligned}
(5)

where Z, P, and $$Q^i$$ are orthonormal matrices of the form

\begin{aligned} Z = \begin{bmatrix} I_{k}&\\&Z_{\ell +1} \end{bmatrix}, \quad P = \begin{bmatrix} I_{k+1}&\\&P_{\ell +1} \end{bmatrix}, \quad Q^i = \begin{bmatrix} I_{k}&\\&Q^i_{\ell +1} \end{bmatrix}. \end{aligned}
(6)

Here $$I_k$$ is the $$k\times k$$ identity matrix and $$Z_{\ell +1}$$ is an orthonormal matrix so that $$Z_{\ell +1} \widetilde{\varvec{c}}_{k+1:k+\ell +1} = \gamma {\varvec{e}}_1$$ for some scalar $$\gamma$$. The matrices $$P_{\ell +1}$$ and $$Q^i_{\ell +1}$$ are computed to make $$\widetilde{{\underline{H}}}_{k+\ell +1} Z^*$$ and $${{\widetilde{K}}_{k+\ell +1}^{i}} Z^*$$ respectively upper-Hessenberg and upper-triangular again. At this point we can truncate (5) to obtain

\begin{aligned} A X_{k+1}= & {} U_{k+2} {\underline{H}}_{k+1} \\ {L^{i}} X_{k+1}= & {} {V_{k+1}^{i}} {K_{k+1}^{i}} \qquad (i=1, 2, \dots , \ell ), \end{aligned}

and truncate $$Z\widetilde{\varvec{c}}_{k+\ell +1}$$ to obtain $${\varvec{c}}_{k+1}$$ so that $${\widetilde{X}}_{k+\ell +1} \widetilde{\varvec{c}}_{k+\ell +1} = X_{k+1}\varvec{c}_{k+1}$$. The truncation is expected to keep important components, since the directions removed from $${\widetilde{X}}_{k+\ell +1}$$ are perpendicular to the current best approximation $${\varvec{x}}_{k+1}$$, and also to the previous best approximations $${\varvec{x}}_{k}$$, $${\varvec{x}}_{k-1}$$, ..., $${\varvec{x}}_1$$. If the rotation and truncation are combined in one step, then their computational cost quickly becomes smaller than the (re)orthogonalization cost as k grows.
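The rotation-and-truncation step can be sketched in isolation. The snippet below (our own construction, for random data and a single operator L) builds the Householder block $$Z_{\ell +1}$$, restores triangularity with a QR factorization (a full Q rather than the structured $$Q^i$$ of (6), which yields the same invariants), and forms the truncated factors:

```python
import numpy as np

def householder_to_e1(u):
    """Symmetric orthogonal H with H @ u = gamma * e1, |gamma| = ||u||.
    Assumes u != 0."""
    gamma = -np.copysign(np.linalg.norm(u), u[0])
    v = u.astype(float).copy()
    v[0] -= gamma
    return np.eye(u.size) - 2.0 * np.outer(v, v) / (v @ v), gamma

rng = np.random.default_rng(0)
n, k, ell = 10, 4, 2
m = k + ell + 1                                    # dimension after expansion
Xt = np.linalg.qr(rng.standard_normal((n, m)))[0]  # stand-in for X~_{k+l+1}
L = rng.standard_normal((n, n))
Vt, Kt = np.linalg.qr(L @ Xt)                      # L X~ = V~ K~, K~ triangular
ct = rng.standard_normal(m)                        # stand-in for c~_{k+l+1}

Z1, gamma = householder_to_e1(ct[k:])
Z = np.eye(m)
Z[k:, k:] = Z1                                     # Z as in (6)
c_rot = Z @ ct                                     # = (c_1,...,c_k, gamma, 0, 0)
Q, R = np.linalg.qr(Kt @ Z.T)                      # restore triangularity
X_new = (Xt @ Z.T)[:, :k + 1]                      # truncate last ell columns
V_new = (Vt @ Q)[:, :k + 1]
K_new = R[:k + 1, :k + 1]
c_new = c_rot[:k + 1]
```

Because the trailing entries of $$Z\widetilde{\varvec{c}}$$ vanish by construction, the truncated pair reproduces both the factorization $$L X_{k+1} = V_{k+1} K_{k+1}$$ and the current iterate exactly; the update of $${\underline{H}}$$ with P is analogous.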

To illustrate our approach, let us consider a one-parameter Tikhonov example where $$\ell = 1$$. First we expand $$X_1 = {\varvec{x}}_1$$ with vectors $$A^*A{\varvec{x}}_1$$ and $$L^*L {\varvec{x}}_1$$. Let $$A {\widetilde{X}}_{1+2} = {\widetilde{U}}_{2+2} \widetilde{{\underline{H}}}_{1+2}$$ and $$L {\widetilde{X}}_{1+2} = {\widetilde{V}}_{1+2} {\widetilde{K}}_{1+2}$$, and use $$\widetilde{{\underline{H}}}_{1+2}$$ and $${\widetilde{K}}_{1+2}$$ to compute $$\widetilde{\varvec{c}}_{1+2}$$. We then compute a rotation matrix $$Z_2$$ so that $$Z_2 \widetilde{\varvec{c}}_{2:3} = \pm \Vert \widetilde{\varvec{c}}_{2:3}\Vert {\varvec{e}}_1$$, and let Z be defined as in (6). The matrices $$\widetilde{{\underline{H}}}_{1+2} Z^*$$ and $${\widetilde{K}}_{1+2} Z^*$$ no longer have their original structure; hence, we need to compute orthonormal P and Q such that $$P \widetilde{{\underline{H}}}_{1+2} Z^*$$ is again upper-Hessenberg and $$Q {\widetilde{K}}_{1+2} Z^*$$ is upper-triangular. Schematically we have

\begin{aligned} \xrightarrow {\widetilde{\varvec{c}}_{1+2}^*}&\begin{bmatrix} \times &{} \times &{} \times \end{bmatrix} \xrightarrow {(Z\widetilde{\varvec{c}}_{1+2})^*} \begin{bmatrix} \times &{} \times &{} 0 \end{bmatrix} \\ \xrightarrow {\widetilde{{\underline{H}}}_{1+2}}&\begin{bmatrix} \times &{} \times &{} \times \\ \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} 0 &{} \times \end{bmatrix} \xrightarrow {\widetilde{{\underline{H}}}_{1+2}Z^*} \begin{bmatrix} \times &{} \times &{} \times \\ \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} \times &{} \times \end{bmatrix} \xrightarrow {P\widetilde{{\underline{H}}}_{1+2}Z^*} \begin{bmatrix} \times &{} \times &{} \times \\ \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} 0 &{} \times \end{bmatrix} \\ \xrightarrow {{\widetilde{K}}_{1+2}}&\begin{bmatrix} \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} 0 &{} \times \end{bmatrix} \xrightarrow {{\widetilde{K}}_{1+2}Z^*} \begin{bmatrix} \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} \times &{} \times \end{bmatrix} \xrightarrow {Q{\widetilde{K}}_{1+2}Z^*} \begin{bmatrix} \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} 0 &{} \times \end{bmatrix} \end{aligned}

accompanied by the decompositions

\begin{aligned} A ({\widetilde{X}}_{1+2} Z^*)= & {} ({\widetilde{U}}_{2+2} P^*) (P \widetilde{{\underline{H}}}_{1+2} Z^*) \\ L ({\widetilde{X}}_{1+2} Z^*)= & {} ({\widetilde{V}}_{1+2} Q^*) (Q {\widetilde{K}}_{1+2}Z^*). \end{aligned}

At this point we truncate the subspaces by removing the last columns from $${\widetilde{X}}_{1+2} Z^*$$, $${\widetilde{U}}_{2+2} P^*$$, $$P \widetilde{{\underline{H}}}_{1+2} Z^*$$, $${\widetilde{V}}_{1+2} Q^*$$, and $$Q {\widetilde{K}}_{1+2} Z^*$$, and the bottom rows of $$P \widetilde{{\underline{H}}}_{1+2} Z^*$$ and $$Q {\widetilde{K}}_{1+2} Z^*$$, to obtain

\begin{aligned} AX_2= & {} U_3 {\underline{H}}_2 \\ LX_2= & {} V_2 K_2. \end{aligned}

Below we summarize the steps of the new algorithm for solving problem (1). In our implementation we take care to use full reorthogonalization and avoid extending $$X_{k}$$, $$U_{k+1}$$, and $${V_k^{i}}$$ with numerically linearly dependent vectors. We omit these steps from the pseudocode for brevity. In addition, we initially expand the search space solely with $$A^*{\varvec{u}}_{k+1}$$ until the discrepancy principle can be satisfied, in accordance with Proposition 1 in Sect. 3.

### Algorithm 2

(Multidirectional Tikhonov regularization)

Input: Measurement matrix A, regularization operators $${L^{1}}$$, ..., $${L^{\ell }}$$, and data $${\varvec{b}}$$.

Output: Approximate solution $${\varvec{x}}_k \approx {\varvec{x}}_\star$$.

1. 1.

Initialize $$\beta = \Vert {\varvec{b}}\Vert$$, $$U_1 = {\varvec{b}} / \beta$$, $$X_0 = []$$, $${\varvec{x}}_0 = \mathbf{0 }$$, and $$\varvec{\mu }_0 = \mathbf{0 }$$.

for $$k=0, 1, \dots$$ do

2. 2.

Expand $$X_k$$ with $$A^* A {\varvec{x}}_{k}$$, $${L^{1}}^* {L^{1}} {\varvec{x}}_{k}$$, ..., $${L^{\ell }}^* {L^{\ell }} {\varvec{x}}_{k}$$.

3. 3.

Update $$A{\widetilde{X}}_{k+\ell +1} = {\widetilde{U}}_{k+\ell +2} \widetilde{{\underline{H}}}_{k+\ell +1}$$ and $${L^{i}} {\widetilde{X}}_{k+\ell +1} = {{\widetilde{V}}_{k+\ell +1}^{i}} {{\widetilde{K}}_{k+\ell +1}^{i}}$$.

4. 4.

Select $$\varvec{\mu }_k$$; see Sect. 4 and Algorithm 3.

5. 5.

Solve the projected problem for $$\widetilde{\varvec{c}}_{k+\ell +1}$$.

6. 6.

Compute P, Q, and Z (see text).

7. 7.

Truncate $$A ({\widetilde{X}}_{k+\ell +1} Z^*) = ({\widetilde{U}} _{k+\ell +2} P^*) (P \widetilde{{\underline{H}}}_{k+\ell +1} Z^*)$$ to $$A X_{k+1} = U_{k+2} {\underline{H}}_{k+1}$$.

Truncate $${L^{i}} ({\widetilde{X}}_{k+\ell +1} Z^*) = ({{\widetilde{V}}_{k+\ell +1}^{i}} Q^{i*}) (Q^i {{\widetilde{K}}_{k+\ell +1}^{i}} Z^*)$$ to $${L^{i}} X_{k+1} = {V_{k+1}^{i}} {K_{k+1}^{i}}$$.

8. 8.

Truncate $$Z\widetilde{\varvec{c}}_{k+\ell +1}$$ to obtain $${\varvec{c}}_{k+1}$$ and set $${\varvec{x}}_{k+1} = X_{k+1} {\varvec{c}}_{k+1}$$.

9. 9.

if $$\Vert {\varvec{x}}_{k+1} - {\varvec{x}}_k\Vert /\Vert {\varvec{x}}_k\Vert$$ is sufficiently small then break

We have completed our discussion of the expansion and truncation phases of our algorithm. In the following section we discuss the extraction phase for one-parameter Tikhonov regularization; the multiparameter case follows in later sections.

## Parameter Selection in Standard Tikhonov

In this section we investigate parameter selection for general-form one-parameter Tikhonov, where $$\ell = 1$$, $$\mu = {\mu ^{1}}$$, and $$L = {L^{1}}$$. Multiple methods exist in the one-parameter case to determine a particular $$\mu _k$$, including the discrepancy principle, the L-curve criterion, and generalized cross validation; see, for example, Hansen [11, Ch. 7]. We focus on the discrepancy principle, which states that $$\mu _k$$ must satisfy

\begin{aligned} \Vert A {\varvec{x}}_k(\mu _k) - {\varvec{b}}\Vert = \eta \varepsilon , \end{aligned}
(7)

where $$\Vert {\varvec{e}}\Vert \le \varepsilon$$ and $$\eta >1$$ is a user supplied constant independent of $$\varepsilon$$.

Define the residual vector $${\varvec{r}}_k(\mu ) = A{\varvec{x}}_k(\mu ) - {\varvec{b}}$$ and the function $$\varphi (\mu ) = \Vert {\varvec{r}}_k(\mu )\Vert ^2$$. A nonnegative $$\mu _k$$ satisfies the discrepancy principle if $$\varphi (\mu _k) = \eta ^2 \varepsilon ^2$$. Root-finding methods can be used to compute such a $$\mu _k$$; Lampe et al. compare four of them. We prefer bisection for its reliability and straightforward analysis and implementation. The performance difference is not an issue, because root finding requires only a fraction of the total computation time and is not a bottleneck. A unique solution $$\mu _k$$ exists under mild conditions; below we give a proof using our own notation.

Assume $${\underline{H}}_k$$ and $$K_k$$ have full rank and let $$P_k {\varSigma }_k Q_k^*$$ be the singular value decomposition of $${\underline{H}}_k K_k^{-1}$$. Let the singular values be denoted by

\begin{aligned} \sigma _{\max } = \sigma _1 \ge \sigma _2 \ge \dots \ge \sigma _k = \sigma _{\min } > 0. \end{aligned}
(8)

Now we can express $${\varvec{c}}_k(\mu )$$ and $$\varphi$$ as

\begin{aligned} {\varvec{c}}_k(\mu )&= ({\underline{H}}_k^* {\underline{H}}_k + \mu K_k^* K_k)^{-1}{\underline{H}}_k^* \beta {\varvec{e}}_1\\&= K_k^{-1} (K_k^{-*} {\underline{H}}_k^* {\underline{H}}_k K_k^{-1} + \mu I)^{-1} K_k^{-*} {\underline{H}}_k^* \beta {\varvec{e}}_1 \\&= K_k^{-1} Q_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* \beta {\varvec{e}}_1 \end{aligned}

and

\begin{aligned} \varphi (\mu )&= \Vert \beta {\varvec{e}}_1 - {\underline{H}}_k{\varvec{c}}_k(\mu )\Vert ^2\\&= \beta ^2 \Vert {\varvec{e}}_1 - {\underline{H}}_k K_k^{-1} Q_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* {\varvec{e}}_1\Vert ^2 \\&= \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1 + P_k P_k^* {\varvec{e}}_1 {} - P_k {\varSigma }_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* {\varvec{e}}_1\Vert ^2\\&= \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \Vert \mu ({\varSigma }_k^2 + \mu I)^{-1} P_k^* {\varvec{e}}_1\Vert ^2. \end{aligned}

Or alternatively,

\begin{aligned} \varphi (\mu ) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \sum _{j=1}^k \bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 |P_k|_{1j}^2. \end{aligned}
(9)

Observe that the columns of $$P_k$$ form an orthonormal basis for the range of $${\underline{H}}_k$$, and that $$I - P_k P_k^*$$ is the orthogonal projection onto its orthogonal complement, the nullspace of $${\underline{H}}_k^*$$. Furthermore, it can be verified that $${\underline{H}}_k^* \beta {\varvec{e}}_1 \ne \varvec{0}$$ if $$A^*{\varvec{b}} \ne \mathbf{0 }$$, that is, $$P_k^* {\varvec{e}}_1 \ne \varvec{0}$$.
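Expression (9) is easy to check numerically against the definition of $$\varphi$$. The sketch below (random stand-ins of our own for $${\underline{H}}_k$$ and $$K_k$$) computes both sides:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6
H = rng.standard_normal((k + 1, k))    # stand-in for the Hessenberg H_k
K = np.triu(rng.standard_normal((k, k))) + 3.0 * np.eye(k)  # stand-in for K_k
beta = 2.0
e1 = np.zeros(k + 1)
e1[0] = 1.0

P, sig, _ = np.linalg.svd(H @ np.linalg.inv(K), full_matrices=False)
p1 = P[0, :]                           # |P_k|_{1j} = first row of P_k

def phi_direct(mu):
    """phi(mu) from its definition: the squared residual of (3)."""
    c = np.linalg.solve(H.T @ H + mu * (K.T @ K), beta * (H.T @ e1))
    return np.linalg.norm(beta * e1 - H @ c) ** 2

def phi_svd(mu):
    """phi(mu) from expression (9)."""
    return beta**2 * (1.0 - p1 @ p1) \
        + beta**2 * np.sum((mu / (sig**2 + mu)) ** 2 * p1**2)
```

Evaluating both functions on a few values of $$\mu$$ also exhibits the monotone increase of $$\varphi$$ from $$\varphi (0)$$ toward $$\beta ^2$$ used in Proposition 1.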

### Proposition 1

If $$\beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 \le \eta ^2 \varepsilon ^2 < \Vert {\varvec{b}}\Vert ^2$$, then there exists a unique $$\mu _k\ge 0$$ such that $$\varphi (\mu _k) = \eta ^2 \varepsilon ^2$$.

### Proof

From (9) it follows that $$\varphi$$ is a rational function with poles $$\mu =-\sigma _j^2$$ for all $$\sigma _j>0$$; therefore, $$\varphi$$ is $$C^\infty$$ on the interval $$[0,\infty )$$. Additionally, $$\varphi$$ is a strictly increasing and bounded function on this interval, since

\begin{aligned} \frac{d}{d\mu }\bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 = 2 \frac{\mu \sigma _j^2}{(\sigma _j^2 + \mu )^3}> 0, \quad \text {for all} \quad \mu > 0 \end{aligned}

implies $$\varphi ^\prime (\mu ) > 0$$ and

\begin{aligned} \varphi (0) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 \quad \text {and} \quad \lim _{\mu \rightarrow \infty } \varphi (\mu ) = \beta ^2 = \Vert {\varvec{b}}\Vert ^2. \end{aligned}

Consequently, there exists a unique $$\mu _k \in [0,\infty )$$ such that $$\varphi (\mu _k) = \eta ^2 \varepsilon ^2$$. $$\square$$

Beyond nonnegativity, the proposition above provides little insight into the location of $$\mu _k$$ on the real axis, and we would like to have lower and upper bounds. We determine such bounds in Proposition 2 and believe the results to be new. Both in practice and for the proof of the subsequent proposition, it is useful to remove nonessential parts of $$\varphi (\mu )$$ and instead work with the function

\begin{aligned} {\widetilde{\varphi }}(\mu ) = \frac{\varphi (\mu ) - \varphi (0)}{\beta ^2} = \sum _{j=1}^k \bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 |P_k|_{1j}^2, \end{aligned}

and the quantity

\begin{aligned} {\widetilde{\varepsilon }}^2 = \frac{\eta ^2 \varepsilon ^2 - \varphi (0)}{\beta ^2}. \end{aligned}
(10)

Then $$0\le {\widetilde{\varphi }}(\mu ) \le \rho ^2$$, where $$\rho = \Vert P_k^* \varvec{e}_1\Vert \le 1$$. Moreover, $$\eta ^2\varepsilon ^2$$ satisfies the bounds in Proposition 1 if and only if $$0 \le {\widetilde{\varepsilon }} < \rho$$, and $$\varphi (\mu _k) = \eta ^2\varepsilon ^2$$ if and only if $${\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2$$.

### Proposition 2

If $$0 \le {\widetilde{\varepsilon }} < \rho$$, and $$\mu _k$$ is such that $${\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2$$, then

\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \sigma _{\min }^2 \le \mu _k \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \sigma _{\max }^2, \end{aligned}
(11)

where $$\sigma _{\min }$$ and $$\sigma _{\max }$$ are as in (8).

### Proof

The key of the proof is to observe that

\begin{aligned} \frac{\mu }{\sigma _{\max }^2 + \mu } \le \frac{\mu }{\sigma _j^2 + \mu } \le \frac{\mu }{\sigma _{\min }^2 + \mu } \end{aligned}

for all $$j = 1$$, ..., k. Combining this observation with the definition of $${\widetilde{\varphi }}$$ yields

\begin{aligned} \left( \frac{\mu _k}{\sigma _{\max }^2 + \mu _k}\right) ^2 \sum _{j=1}^k |P_k|_{1j}^2 \le \sum _{j=1}^k \left( \frac{\mu _k}{\sigma _{j}^2 + \mu _k}\right) ^2 |P_k|_{1j}^2 \le \left( \frac{\mu _k}{\sigma _{\min }^2 + \mu _k}\right) ^2 \sum _{j=1}^k |P_k|_{1j}^2. \end{aligned}

Since $$\sum _{j=1}^k |P_k|_{1j}^2 = \Vert P_k^* {\varvec{e}}_1\Vert ^2 = \rho ^2$$ and $${\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2$$, it follows that

\begin{aligned} \frac{\mu _k}{\sigma _{\max }^2 + \mu _k} \rho \le {\widetilde{\varepsilon }} \le \frac{\mu _k}{\sigma _{\min }^2 + \mu _k} \rho . \end{aligned}

Hence, if $${\widetilde{\varepsilon }} = 0$$, then $$\mu _k = 0$$ and we are done. Otherwise $$\mu _k \ne 0$$ and we can divide by $$\rho$$, take the reciprocals, and subtract 1 to arrive at

\begin{aligned} \frac{\sigma _{\max }^2}{\mu _k} \ge \frac{\rho }{{\widetilde{\varepsilon }}} - 1 \ge \frac{\sigma _{\min }^2}{\mu _k}. \end{aligned}

It follows that

\begin{aligned} \frac{\mu _k}{\sigma _{\max }^2} \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \le \frac{\mu _k}{\sigma _{\min }^2}, \end{aligned}

and the proposition follows. $$\square$$
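In floating point, Proposition 2 translates directly into a starting bracket for bisection. The following sketch (our own helper) manufactures a noise level for which the discrepancy parameter is known, and recovers it:

```python
import numpy as np

def discrepancy_bisection(H, K, beta, eps, eta=1.01, steps=80):
    """Find mu >= 0 with phi(mu) = eta^2 eps^2 by bisection inside the
    bracket (11)."""
    P, sig, _ = np.linalg.svd(H @ np.linalg.inv(K), full_matrices=False)
    p1 = P[0, :]
    rho = np.linalg.norm(p1)                      # rho = ||P_k^* e1||
    phi0 = beta**2 * (1.0 - rho**2)               # phi(0)
    eps_t2 = (eta**2 * eps**2 - phi0) / beta**2   # tilde(eps)^2, cf. (10)
    assert 0.0 <= eps_t2 < rho**2, "discrepancy level not attainable"
    ratio = np.sqrt(eps_t2) / (rho - np.sqrt(eps_t2))
    lo, hi = ratio * sig.min()**2, ratio * sig.max()**2   # bracket (11)
    phit = lambda mu: np.sum((mu / (sig**2 + mu))**2 * p1**2)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if phit(mid) < eps_t2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Demo: manufacture eps so that mu_true is the exact root, then recover it.
rng = np.random.default_rng(0)
H = rng.standard_normal((7, 6))
K = np.triu(rng.standard_normal((6, 6))) + 3.0 * np.eye(6)
beta, eta, mu_true = 2.0, 1.01, 0.37
P, sig, _ = np.linalg.svd(H @ np.linalg.inv(K), full_matrices=False)
p1 = P[0, :]
phi_true = beta**2 * (1.0 - p1 @ p1) \
         + beta**2 * np.sum((mu_true / (sig**2 + mu_true))**2 * p1**2)
eps = np.sqrt(phi_true) / eta
mu = discrepancy_bisection(H, K, beta, eps, eta=eta)
```

Starting from the bracket (11) rather than an ad hoc interval guarantees that the root lies inside the initial interval and keeps the number of bisection steps predictable.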

It is undesirable to work with the inverse of $$K_k$$ when $$K_k$$ is ill-conditioned. Instead, one may prefer to use the generalized singular value decomposition (GSVD)

\begin{aligned} {\underline{H}}_k= & {} P_k C_k Z_k^{-1} \\ K_k= & {} Q_k S_k Z_k^{-1}, \end{aligned}

where $$P_k$$ and $$Q_k$$ have orthonormal columns and $$Z_k$$ is nonsingular. The matrices $$C_k$$ and $$S_k$$ are diagonal with entries $$0 \le c_1 \le c_2 \le \dots \le c_k$$ and respectively $$s_1 \ge \dots \ge s_k \ge 0$$, such that $$c_i^2 + s_i^2 = 1$$. The generalized singular values are given by $$c_i / s_i$$ and are understood to be infinite when $$s_i = 0$$. If $$K_k$$ is nonsingular, then the generalized singular values coincide with the singular values of $${\underline{H}}_k K_k^{-1}$$. See Golub and Van Loan [8, Section 8.7.3] for more information.

Using a similar derivation as before, we can show that

\begin{aligned} \varphi (\mu ) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \sum _{j=1}^k \bigg ( \frac{\mu s_j^2}{c_j^2 + \mu s_j^2} \bigg )^2 |P_k|_{1j}^2 \end{aligned}

and that the new bounds are given by

\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \bigg ( \frac{c_1}{s_1} \bigg )^2 \le \mu _k \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \bigg ( \frac{c_k}{s_k} \bigg )^2. \end{aligned}

Here $$\mu _k$$ is unbounded from above if $$s_k = 0$$, that is, if $$K_k$$ becomes singular.

The bounds in this section can be readily computed and used to implement bisection and the secant method. We consider parameter selection for multiparameter regularization in the following section.

## A Multiparameter Selection Strategy

Choosing satisfactory $${\mu _k^{i}}$$ in multiparameter regularization is more difficult than the corresponding one-parameter problem. See for example [1, 2, 6, 14, 16, 20]. In particular, there is no obvious multiparameter extension of the discrepancy principle. Nevertheless, methods based on the discrepancy principle exist and we will discuss three of them.

Brezinski et al. had some success with operator splitting. Substituting $${\mu _k^{i}} = {\nu _k^{i}} {\omega _k^{i}}$$ in (3) with nonnegative weights $${\omega _k^{i}}$$ and $$\sum _{i=1}^\ell {\omega _k^{i}} = 1$$ leads to

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \sum _{i=1}^\ell {\omega _k^{i}} (\Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + {\nu _k^{i}} \Vert {K_k^{i}} {\varvec{c}}\Vert ^2). \end{aligned}

This form of the minimization problem suggests the approximation of $$X_k^* {\varvec{x}}_\star$$ by a linear combination [2, Sect. 3] of $${{\varvec{c}}_k^{i}}({\nu _k^{i}})$$, where

\begin{aligned} {{\varvec{c}}_k^{i}}(\nu ) = \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} {} - \beta {\varvec{e}}_1\Vert ^2 + \nu \Vert {K_k^{i}} {\varvec{c}}\Vert ^2 \qquad (i = 1, 2, \dots , \ell ), \end{aligned}
(12)

and $${\nu _k^{i}}$$ is such that $$\Vert {\underline{H}}_k {{\varvec{c}}_k^{i}}({\nu _k^{i}}) - \beta {\varvec{e}}_1\Vert = \eta \varepsilon$$. Alternatively, Brezinski et al. consider solving a related minimization problem in which the $${\nu ^{i}}$$ are fixed and obtained from (12). The latter approach provides better results in exchange for an additional QR decomposition. In either case, operator splitting is a straightforward approach, but it does not necessarily satisfy the discrepancy principle exactly.
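The splitting of (12) reduces multiparameter selection to $$\ell$$ independent one-parameter discrepancy equations, each solvable by bisection. A sketch (helper names are ours):

```python
import numpy as np

def tikh_coeffs(H, K, beta, nu):
    """Solution of the one-operator projected problem (12)."""
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    return np.linalg.solve(H.T @ H + nu * (K.T @ K), beta * (H.T @ e1))

def split_parameters(H, Ks, beta, eps, eta=1.01, steps=80):
    """For each K^i, find nu_i with ||H c^i(nu_i) - beta e1|| = eta*eps
    by log-scale bisection; assumes the discrepancy level is attainable."""
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    nus = []
    for K in Ks:
        lo, hi = 1e-14, 1e14
        for _ in range(steps):
            mid = np.sqrt(lo * hi)
            r = np.linalg.norm(H @ tikh_coeffs(H, K, beta, mid) - beta * e1)
            if r < eta * eps:
                lo = mid
            else:
                hi = mid
        nus.append(np.sqrt(lo * hi))
    return nus

# Demo: two operators; eps is chosen so the discrepancy level is attainable.
rng = np.random.default_rng(0)
H = rng.standard_normal((7, 6))
K1 = np.eye(6)
K2 = np.triu(rng.standard_normal((6, 6))) + 3.0 * np.eye(6)
beta, eta = 2.0, 1.01
e1 = np.zeros(7)
e1[0] = 1.0
eps = np.linalg.norm(H @ tikh_coeffs(H, K1, beta, 1.0) - beta * e1) / eta
nus = split_parameters(H, [K1, K2], beta, eps, eta=eta)
```

Each $${\nu ^{i}}$$ satisfies its own one-operator discrepancy equation exactly, which is what the combination step of the splitting approach builds on.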

Lu and Pereverzyev  and later Fornasier et al.  rewrite the constrained minimization problem as a differential equation and approximate

\begin{aligned} F(\varvec{\mu }) = \Vert {\underline{H}}_k {\varvec{c}}_k(\varvec{\mu }) - \beta {\varvec{e}}_1\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {K_k^{i}} {\varvec{c}}_k(\varvec{\mu })\Vert ^2 \end{aligned}

by a model function $$m(\varvec{\mu })$$ which admits a straightforward solution to the constructed differential equation. However, it is unclear which $$\varvec{\mu }$$ the method finds and its solution may depend on the initial guess. On the other hand, it is possible to keep all but one parameter fixed and compute a value for the free parameter such that the discrepancy principle is satisfied. This allows one to trace discrepancy hypersurfaces to some extent.

Gazzola and Novati  describe another interesting method. They start with a one-parameter problem and successively add parameters in a novel way, until each parameter of the full multiparameter problem has a value assigned. Especially in early iterations the discrepancy principle is not satisfied, but the parameters are updated in each iteration so that the norm of the residual is expected to approach $$\eta \varepsilon$$. Unfortunately, we observed some issues in our implementation. For example, the quality of the result depends on initial values, as well as the order in which the operators are added (that is, the indexing of the operators). The latter problem is solved by a recently published and improved version of the method , which was brought to our attention during the revision of this paper.

We propose a new method that satisfies the discrepancy principle exactly, does not depend on an initial guess, and is independent of the scaling or indexing of the operators. The method uses the operator splitting approach in combination with new weights. Let us omit all k subscripts for the remainder of this section, and suppose $${\mu ^{i}} = \mu {\omega ^{i}}$$, where $${\omega ^{i}}$$ are nonnegative, but do not necessarily sum to one, and $$\mu$$ is such that the discrepancy principle is satisfied. Then (3) can be written as

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}} {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \mu \sum _{i=1}^\ell {\omega ^{i}} \Vert {K^{i}}{\varvec{c}}\Vert ^2. \end{aligned}
(13)
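For fixed $$\mu$$ and weights $${\omega ^{i}}$$, (13) is an ordinary linear least squares problem in stacked form. The following sketch (NumPy; all matrices and parameter values are random placeholders, not data from the paper) illustrates this:

```python
import numpy as np

# Solve (13) for fixed mu and weights by stacking [H; sqrt(mu*w^i) K^i]
# and applying ordinary least squares.  Hypothetical random data.
rng = np.random.default_rng(0)
k = 5
H = rng.standard_normal((k + 1, k))            # projected matrix \underline{H}
Ks = [np.eye(k), rng.standard_normal((k, k))]  # projected operators K^i
omegas = [0.5, 2.0]                            # weights omega^i
mu = 0.1
rhs = np.zeros(k + 1)
rhs[0] = 1.0                                   # beta * e_1 with beta = 1

A_stack = np.vstack([H] + [np.sqrt(mu * w) * K for w, K in zip(omegas, Ks)])
b_stack = np.concatenate([rhs, np.zeros(A_stack.shape[0] - (k + 1))])
c = np.linalg.lstsq(A_stack, b_stack, rcond=None)[0]
```

The stacked formulation avoids forming the normal equations explicitly, which is preferable when the projected problem is ill-conditioned.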

Since the goal of regularization is to reduce sensitivity of the solution to noise, we use the weights

\begin{aligned} {\omega ^{i}} = \frac{\Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert D{{\varvec{c}}^{i}}({\nu ^{i}})\Vert }, \end{aligned}
(14)

which bias the regularization parameters in the direction of lower sensitivity with respect to changes in $${\nu ^{i}}$$. Here D denotes the (total) derivative with respect to the regularization parameter(s), and $${{\varvec{c}}^{i}}$$ and $${\nu ^{i}}$$ are defined as before; consequently,

\begin{aligned} D {{\varvec{c}}^{i}}({\nu ^{i}}) = -({\underline{H}}^* {\underline{H}} + {\nu ^{i}} {K^{i}}^* {K^{i}})^{-1} {K^{i}}^*{K^{i}} {{\varvec{c}}^{i}}({\nu ^{i}}). \end{aligned}

If $$D {{\varvec{c}}^{i}}({\nu ^{i}}) = \mathbf{0 }$$ for some index i, then we take such a $${{\varvec{c}}^{i}}({\nu ^{i}})$$ as the solution, or replace $$\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert$$ by a small positive constant. With this parameter choice, the solution does not depend on the indexing of the operators, nor, up to a constant, on the scaling of A, $$\varvec{b}$$, or any of the $${L^{i}}$$. The former is easy to see; for the latter, let $$\alpha$$, $$\gamma$$, and $${\lambda ^{i}}$$ be positive constants, and consider the scaled problem

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{\widehat{\varvec{x}}} \Vert \gamma {\varvec{b}} - \alpha A \widehat{\varvec{x}}\Vert ^2 {} + \mu \sum _{i=1}^{\ell } {{\widehat{\omega }}^{i}} \Vert \lambda ^{i} {L^{i}} \widehat{\varvec{x}}\Vert ^2. \end{aligned}

The noisy component of $$\gamma {\varvec{b}}$$ is $$\gamma {\varvec{e}}$$ and $$\Vert \gamma {\varvec{e}}\Vert \le \gamma \varepsilon$$, hence the new discrepancy bound becomes

\begin{aligned} \Vert \alpha A \widehat{\varvec{x}} - \gamma {\varvec{b}}\Vert = \gamma \eta \varepsilon . \end{aligned}

The bound is satisfied when $${{\widehat{\omega }}^{i}} = \alpha ^2 / (\lambda ^i)^2\; {\omega ^{i}}$$, since in this case

\begin{aligned} \widehat{\varvec{x}} = \Big (\alpha ^2 A^*A + \mu \sum _{i=1}^\ell {\omega ^{i}} \frac{\alpha ^2}{({\lambda ^{i}})^2}({\lambda ^{i}})^2 {L^{i}}^*{L^{i}} \Big )^{-1} \alpha A^* \gamma {\varvec{b}} = \frac{\gamma }{\alpha } {\varvec{x}} \end{aligned}

and

\begin{aligned} \min _{\widehat{\varvec{x}}} \Vert \gamma {\varvec{b}} - \alpha A \widehat{\varvec{x}}\Vert ^2 {} + \mu \sum _{i=1}^{\ell } {{\widehat{\omega }}^{i}} \Vert \lambda ^{i} {L^{i}} \widehat{\varvec{x}}\Vert ^2 = \gamma ^2 \Big ( \min _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \mu \sum _{i=1}^\ell {\omega ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \Big ). \end{aligned}

It may be checked that the weights in (14) are indeed proportional to $$\alpha ^2/(\lambda ^i)^2$$, that is

\begin{aligned} {\omega ^{i}} = \frac{\Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert } {}\sim \frac{\alpha ^2}{({\lambda ^{i}})^2}. \end{aligned}
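The scaling identity $$\widehat{\varvec{x}} = (\gamma /\alpha )\, {\varvec{x}}$$ above is easy to verify numerically. The sketch below uses a single operator ($$\ell = 1$$) and arbitrary random data, purely for illustration:

```python
import numpy as np

# Check that scaling A by alpha, b by gamma, and L by lambda, while replacing
# omega with omega_hat = alpha^2/lambda^2 * omega, rescales the solution by gamma/alpha.
rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
L = rng.standard_normal((n, n))
b = rng.standard_normal(n)
mu, omega = 0.5, 2.0
alpha, gamma, lam = 3.0, 0.7, 1.5

x = np.linalg.solve(A.T @ A + mu * omega * L.T @ L, A.T @ b)
omega_hat = (alpha**2 / lam**2) * omega
x_hat = np.linalg.solve(
    alpha**2 * A.T @ A + mu * omega_hat * lam**2 * L.T @ L,
    alpha * A.T @ (gamma * b),
)
# x_hat should equal (gamma / alpha) * x
```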

There are additional viable choices for $${\omega ^{i}}$$, including two smoothed versions of the above:

\begin{aligned} {\omega ^{i}} = \frac{\Vert {\underline{H}} {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert {\underline{H}} D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert } \quad \text {and}\quad {\omega ^{i}} = \frac{\Vert {K^{i}} {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert {K^{i}} D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }, \end{aligned}

which consider the sensitivity of $${{\varvec{c}}^{i}}({\nu ^{i}})$$ in the range of $${\underline{H}}$$ and $${K^{i}}$$ respectively. We summarize the new parameter selection in Algorithm 3 below.

### Algorithm 3

(Multiparameter selection)

Input: Projected matrices $${\underline{H}}$$, $${K^{1}}$$, ..., $${K^{\ell }}$$, $$\beta = \Vert {\varvec{b}}\Vert$$, noise estimate $$\varepsilon$$, uncertainty parameter $$\eta$$, and threshold $$\tau$$.

Output: Regularization parameters $${\mu ^{1}}$$, ..., $${\mu ^{\ell }}$$.

1. Use (12) to compute $${{\varvec{c}}^{i}}$$ and $${\nu ^{i}}$$.

   if $$\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert \le \tau \Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert$$ for some i then

2. Set $${\omega ^{i}} = \tau ^{-1}$$; or set $${\mu ^{i}} = {\nu ^{i}}$$ and $${\mu ^{j}} = 0$$ for $$j\ne i$$.

   else

3. Let $${\omega ^{i}} = \Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert /\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert$$.

4. Compute $$\mu$$ in (13) such that the discrepancy principle is satisfied.

5. Set $${\mu ^{i}} = \mu {\omega ^{i}}$$.
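The core of the multiparameter selection can be sketched as follows. The helper `discrepancy_solve` is our own hypothetical illustration, not the paper's implementation: it solves each discrepancy equation by log-scale bisection and omits the $$\tau$$ safeguard.

```python
import numpy as np

def discrepancy_solve(H, Ks, beta, target):
    # Sketch of the parameter selection (tau safeguard omitted).  Assumes the
    # discrepancy level `target` = eta * eps lies strictly between the
    # unregularized residual norm and beta, so each equation has a solution.
    rhs = np.zeros(H.shape[0])
    rhs[0] = beta

    def solve(Mreg):
        # Tikhonov solution and residual norm for a given regularization term
        c = np.linalg.solve(H.T @ H + Mreg, H.T @ rhs)
        return c, np.linalg.norm(H @ c - rhs)

    def find_param(KtK):
        # The residual norm increases with the parameter: bisect on a log scale
        lo, hi = 1e-12, 1e12
        for _ in range(200):
            mid = np.sqrt(lo * hi)
            _, r = solve(mid * KtK)
            lo, hi = (mid, hi) if r < target else (lo, mid)
        return np.sqrt(lo * hi)

    # Step 1: one-parameter problems give nu^i and c^i(nu^i)
    omegas = []
    for K in Ks:
        KtK = K.T @ K
        nu = find_param(KtK)
        c, _ = solve(nu * KtK)
        Dc = -np.linalg.solve(H.T @ H + nu * KtK, KtK @ c)
        omegas.append(np.linalg.norm(c) / np.linalg.norm(Dc))  # weight (14)

    # Steps 4-5: one more discrepancy equation for the weighted sum
    mu = find_param(sum(w * K.T @ K for w, K in zip(omegas, Ks)))
    return omegas, [mu * w for w in omegas]
```

By construction, the returned $${\mu ^{i}} = \mu {\omega ^{i}}$$ satisfy the discrepancy principle for the combined problem up to bisection accuracy.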

An interesting property of Algorithm 3 is that, under certain conditions, $${\varvec{c}}(\varvec{\mu }({\widetilde{\varepsilon }}))$$ converges to the unregularized least squares solution

\begin{aligned} {\varvec{c}}(\mathbf{0 }) = ({\underline{H}}^* {\underline{H}})^{-1} {\underline{H}}^* \beta {\varvec{e}}_1 = {\underline{H}}^+ \beta {\varvec{e}}_1, \end{aligned}

as $${\widetilde{\varepsilon }}$$ goes to zero. Here $${\underline{H}}^+$$ denotes the Moore–Penrose pseudoinverse and $${\varvec{c}}(\mathbf{0 })$$ is the minimum norm solution of the unregularized problem. The following proposition formalizes this observation.

### Proposition 3

Assume that $${\underline{H}}$$ is full rank, $${\underline{H}}^*\beta {\varvec{e}}_1 \ne \mathbf{0 }$$, and that $${K^{i}}$$ is nonsingular for $$i=1,\dots ,\ell$$. Let $${\widetilde{\varepsilon }}$$ and $$\rho$$ be defined as in Sect. 3, let $$\eta >1$$ be fixed, and suppose that $${\nu ^{i}}({\widetilde{\varepsilon }})$$ and

\begin{aligned} \varvec{\mu }({\widetilde{\varepsilon }}) = ({\mu ^{1}}({\widetilde{\varepsilon }}), \dots , {\mu ^{\ell }}({\widetilde{\varepsilon }})) = \mu ({\widetilde{\varepsilon }}) ({\omega ^{1}}({\nu ^{1}}({\widetilde{\varepsilon }})), \dots , {\omega ^{\ell }}({\nu ^{\ell }}({\widetilde{\varepsilon }}))) \end{aligned}

are computed according to Algorithm 3 for all $$0\le {\widetilde{\varepsilon }} < \rho$$. Then

\begin{aligned} \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\omega ^{i}}({\nu ^{i}}({\widetilde{\varepsilon }})) = {\omega ^{i}}(0) \quad \text {and}\quad \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\varvec{c}}(\varvec{\mu }({\widetilde{\varepsilon }})) = {\varvec{c}}(\mathbf{0 }). \end{aligned}

### Proof

First note that $${\underline{H}}^* \beta {\varvec{e}}_1 \ne \mathbf{0 }$$ implies that $$\beta > 0$$ and $$\rho > 0$$. Since $${\underline{H}}$$ is full rank, the maps

\begin{aligned} \nu \mapsto {{\varvec{c}}^{i}}(\nu ), \quad \nu \mapsto D {{\varvec{c}}^{i}}(\nu ), \quad \text {and}\quad \varvec{\mu } \mapsto {\varvec{c}}(\varvec{\mu }) \end{aligned}

are continuous for all $$\nu \ge 0$$ and $$\varvec{\mu }\ge \mathbf{0 }$$, where the latter bound should be interpreted elementwise. Hence

\begin{aligned} \lim _{\nu \downarrow 0} {{\varvec{c}}^{i}}(\nu ) = {{\varvec{c}}^{i}}(0), \quad \lim _{\nu \downarrow 0} D{{\varvec{c}}^{i}}(\nu ) = D {{\varvec{c}}^{i}}(0), \quad \text {and}\quad \lim _{\varvec{\mu } \downarrow \mathbf{0 }} {\varvec{c}}(\varvec{\mu }) = {\varvec{c}}(\mathbf{0 }). \end{aligned}

It remains to be shown that

\begin{aligned} \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\nu ^{i}}({\widetilde{\varepsilon }}) = 0, \quad \Vert D {{\varvec{c}}^{i}}(0)\Vert \ne 0, \quad \text {and}\quad \lim _{{\widetilde{\varepsilon }} \downarrow 0} \varvec{\mu }({\widetilde{\varepsilon }}) = \mathbf{0 }. \end{aligned}
(15)

Let $${\widetilde{\varepsilon }}$$ be restricted to the interval $$[0, \rho /2]$$ and define $${\nu _{\max }^{i}} = \sigma _{\max }^2({\underline{H}} ({K^{i}})^{-1})$$. By Proposition 2,

\begin{aligned} 0 \le {\nu ^{i}}({\widetilde{\varepsilon }}) \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}}\, {\nu _{\max }^{i}} \le {\nu _{\max }^{i}}, \end{aligned}

which proves the first limit in (15). Furthermore, using the definitions of $${{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))$$ and $$D {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))$$ we find the bounds

\begin{aligned}&0< \rho \beta \frac{\sigma _{\min }({\underline{H}})}{ \Vert {\underline{H}}\Vert ^2 + {\nu _{\max }^{i}} \Vert {K^{i}}\Vert ^2} \le \Vert {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))\Vert \le \rho \beta \Vert {\underline{H}}^+{\varvec{e}}_1\Vert ,\\&0 < \rho \beta \frac{\sigma _{\min }({\underline{H}}) \sigma _{\min }^2({K^{i}})}{ (\Vert {\underline{H}}\Vert ^2 + {\nu _{\max }^{i}} \Vert {K^{i}}\Vert ^2)^2} \le \Vert D {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))\Vert \le \rho \beta \frac{\Vert {K^{i}}\Vert ^2\, \Vert {\underline{H}}^+{\varvec{e}}_1\Vert }{\sigma _{\min }^2({\underline{H}})}, \end{aligned}

which show that the inequality in (15) is satisfied. Moreover, the bounds show there exist $$\omega _{\min }$$ and $$\omega _{\max }$$ such that

\begin{aligned} 0< \omega _{\min } \le {\omega ^{i}}({\widetilde{\varepsilon }}) \le \omega _{\max } < \infty . \end{aligned}

Now, let $${\mathbf {K}}({\widetilde{\varepsilon }})$$ be the nonsingular matrix satisfying

\begin{aligned} {\mathbf {K}}({\widetilde{\varepsilon }})^*{\mathbf {K}}({\widetilde{\varepsilon }}) = \sum _{i=1}^\ell {\omega ^{i}}({\widetilde{\varepsilon }}) {K^{i}}^*{K^{i}}, \end{aligned}

then it can be checked that

\begin{aligned} \Vert {\underline{H}} {\mathbf {K}}({\widetilde{\varepsilon }})^{-1}\Vert ^2 \le \frac{\Vert {\underline{H}}\Vert ^2}{\omega _{\min } \min _i \sigma _{\min }^2({K^{i}})} < \infty . \end{aligned}

Define the right hand side of the equation above as M, then by Proposition 2, each entry of $$\varvec{\mu }({\widetilde{\varepsilon }})$$ is bounded from below by 0 and from above by

\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}}\, M \omega _{\max }, \end{aligned}

which goes to 0 as $${\widetilde{\varepsilon }} \downarrow 0$$. This proves the second limit in (15). $$\square$$

Proposition 3 is related to [9, Thm 3.3.3], where it is shown that the solution of a standard form Tikhonov regularization problem converges to a minimum norm least squares solution when the discrepancy principle is used and the noise converges to zero.

In this section we have discussed a new parameter selection method. In the next section we will look at the effect of perturbations in the parameters on the obtained solutions.

## Perturbation Analysis

The goal of regularization is to make reconstruction robust with respect to noise. By extension, a high sensitivity to the regularization parameters is undesirable. Consider a set of perturbed parameters $$\varvec{\mu }_k + {\varDelta }\varvec{\mu }$$; if $$\Vert {\varDelta }\varvec{\mu }\Vert$$ is sufficiently small, then the first-order change in the solution is $$D{\varvec{c}}(\varvec{\mu }_k)\,{\varDelta }\varvec{\mu } = -M^{-1} {\varDelta } M\, {\varvec{c}}(\varvec{\mu }_k)$$, where M and $${\varDelta } M$$ are defined as

\begin{aligned} M = {\underline{H}}_k^* {\underline{H}}_k + \sum _{i=1}^\ell {\mu _k^{i}} {K_k^{i}}^* {K_k^{i}}, \quad {\varDelta } M = \sum _{i=1}^\ell {\varDelta }{\mu _k^{i}} {K_k^{i}}^* {K_k^{i}}. \end{aligned}
(16)

Therefore, one might choose $$\varvec{\mu }_k$$ to minimize the sensitivity measure

\begin{aligned} \Vert D{\varvec{c}}(\varvec{\mu }_k){\varDelta }\varvec{\mu }\Vert = \Vert M^{-1}{\varDelta } M{\varvec{c}}(\varvec{\mu }_k)\Vert . \end{aligned}

To see the connection with the previous section, suppose that $$\varvec{\mu }_k = {\nu _k^{i}} {\varvec{e}}_i$$ and $${\varDelta }\varvec{\mu } = \pm \Vert {\varDelta }\varvec{\mu }\Vert {\varvec{e}}_i$$, then

\begin{aligned} \Vert M^{-1}{\varDelta } M\Vert \ge \frac{\Vert M^{-1}{\varDelta } M{\varvec{c}}_k(\varvec{\mu }_k)\Vert }{\Vert \varvec{c}_k(\varvec{\mu }_k)\Vert } = \frac{\Vert D{\varvec{c}}_k(\varvec{\mu }_k){\varDelta }\varvec{\mu }\Vert }{\Vert {\varvec{c}}_k(\varvec{\mu }_k)\Vert } = \frac{\Vert D {{\varvec{c}}_k^{i}}({\nu _k^{i}})\Vert \,\Vert {\varDelta }\varvec{\mu }\Vert }{\Vert {{\varvec{c}}_k^{i}}({\nu _k^{i}})\Vert } = \frac{\Vert {\varDelta }\varvec{\mu }\Vert }{{\omega _k^{i}}}. \end{aligned}

Thus, larger weights $${\omega _k^{i}}$$ correspond to smaller lower bounds on $$\Vert M^{-1}{\varDelta } M\Vert$$. Having small lower bounds is desirable, since we show in Propositions 4 and 5 that minimizing $$\Vert M^{-1}{\varDelta } M\Vert$$ is equivalent to minimizing upper bounds on the forward and backward errors respectively.

### Proposition 4

Given regularization parameters $${\mu _k^{i}}$$ and perturbations $${\mu _\star ^{i}} = {\mu _k^{i}} + {\varDelta }{\mu _k^{i}}$$, let $${\varvec{c}}_k = {\varvec{c}}_k(\varvec{\mu }_k)$$, $${\varvec{c}}_\star = {\varvec{c}}_k(\varvec{\mu }_\star )$$, $${\varvec{x}}_k = X_k {\varvec{c}}_k$$, and $${\varvec{x}}_\star = X_k {\varvec{c}}_\star$$. Assume $${\underline{H}}_k$$ and all $${K_k^{i}}$$ are of full rank and define matrices M and $${\varDelta } M$$ as in (16). If M and $$M + {\varDelta } M$$ are nonsingular and the $${\varDelta } {\mu _k^{i}}$$ are sufficiently small so that $$\Vert M^{-1} {\varDelta } M\Vert < 1$$, then

\begin{aligned} \frac{ \Vert {\varvec{x}}_k - {\varvec{x}}_\star \Vert }{ \Vert {\varvec{x}}_k\Vert } \le \frac{ \Vert M^{-1} {\varDelta } M \Vert }{ 1 - \Vert M^{-1} {\varDelta } M \Vert }. \end{aligned}

### Proof

Observe that $${\varvec{c}}_k = M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1$$ and $$\varvec{c}_\star = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1$$. With a little manipulation we obtain

\begin{aligned} {\varvec{c}}_\star = (M + {\varDelta } M)^{-1} M {\varvec{c}}_k = (I + M^{-1} {\varDelta } M)^{-1} {\varvec{c}}_k = \sum _{j=0}^\infty (-M^{-1} {\varDelta } M)^j {\varvec{c}}_k. \end{aligned}

It follows that

\begin{aligned} \frac{\Vert {\varvec{c}}_k - {\varvec{c}}_\star \Vert }{\Vert {\varvec{c}}_k\Vert } = \frac{1}{\Vert {\varvec{c}}_k\Vert } \bigg \Vert \sum _{j=1}^\infty (-M^{-1} {\varDelta } M)^j {\varvec{c}}_k \bigg \Vert \le \sum _{j=1}^\infty \Vert M^{-1}{\varDelta } M\Vert ^j \le \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert }. \end{aligned}

Since $$X_k$$ has orthonormal columns, the result of the proposition follows. $$\square$$
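The bound of Proposition 4 can be checked empirically on a small random projected problem (a sketch with hypothetical data, not part of the paper's experiments):

```python
import numpy as np

# Verify the forward error bound ||c_k - c_star|| / ||c_k|| <= t / (1 - t),
# where t = ||M^{-1} dM||, on random data.
rng = np.random.default_rng(4)
k = 5
H = rng.standard_normal((k + 1, k))
K = rng.standard_normal((k, k))
rhs = np.zeros(k + 1)
rhs[0] = 1.0
mu, dmu = 1.0, 0.05

M = H.T @ H + mu * K.T @ K
dM = dmu * K.T @ K
c_k = np.linalg.solve(M, H.T @ rhs)        # unperturbed solution
c_s = np.linalg.solve(M + dM, H.T @ rhs)   # perturbed solution

t = np.linalg.norm(np.linalg.solve(M, dM), 2)   # ||M^{-1} dM|| (2-norm)
rel_err = np.linalg.norm(c_k - c_s) / np.linalg.norm(c_k)
```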

One may wonder if it is possible to pick a vector $${\varvec{f}}$$ close to $$\beta {\varvec{e}}_1$$ such that

\begin{aligned} {\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}. \end{aligned}

Or in other words, given perturbed regularization parameters, is there a perturbation of $$\beta {\varvec{e}}_1$$ such that the optimal approximation to the exact solution is obtained? The following proposition provides a positive answer.

### Proposition 5

Under the assumptions of Proposition 4, there exist vectors $${\varvec{f}}$$ and $${\varvec{g}}$$ such that $${\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}$$ and $${\varvec{c}}_\star = M^{-1} {\underline{H}}_k^* {\varvec{g}}$$. Furthermore, $${\varvec{f}}$$ and $${\varvec{g}}$$ satisfy

\begin{aligned}&\frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{f}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } \le \kappa ({\underline{H}}_k)\, \Vert M^{-1} {\varDelta } M\Vert ,\\&\frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{g}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } \le \kappa ({\underline{H}}_k) \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert } \end{aligned}

where $$\kappa ({\underline{H}}_k)$$ is the condition number of $${\underline{H}}_k$$.

### Proof

The vector $${\varvec{f}}$$ is easy to derive using the ansatz

\begin{aligned} (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}} = M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1. \end{aligned}

Let $${\underline{H}}_k = QR$$ denote the reduced QR-decomposition of $${\underline{H}}_k$$, then

\begin{aligned} R^* Q^* {\varvec{f}} = (M + {\varDelta } M) M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1, \end{aligned}

and

\begin{aligned} {\varvec{f}} = Q R^{-*} (M + {\varDelta } M) M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 + (I - Q Q^*) {\varvec{v}} \end{aligned}

for arbitrary $${\varvec{v}}$$. Indeed, it is easy to verify that the above vector satisfies

\begin{aligned} {\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}. \end{aligned}

If we choose $${\varvec{v}} = \beta {\varvec{e}}_1$$, then

\begin{aligned} {\varvec{f}} = Q R^{-*} {\varDelta } M M^{-1} R^* Q^* \beta {\varvec{e}}_1 + \beta {\varvec{e}}_1 \end{aligned}

so that

\begin{aligned} \frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{f}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } = \Vert Q R^{-*} {\varDelta } M M^{-1} R^* Q^* {\varvec{e}}_1 \Vert \le \Vert R^{-*}\Vert \; \Vert R^*\Vert \; \Vert {\varDelta } M M^{-1}\Vert . \end{aligned}

Here $$\Vert R^{-*}\Vert \; \Vert R^*\Vert$$ is the condition number $$\kappa ({\underline{H}}_k)$$ and $$\Vert {\varDelta } M M^{-1}\Vert = \Vert M^{-1} {\varDelta } M\Vert$$, since both M and $${\varDelta } M$$ are symmetric. This proves the first part of the proposition.

The second part is analogous. In particular, we use the ansatz

\begin{aligned} M^{-1} {\underline{H}}_k^* {\varvec{g}} = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 \end{aligned}

and derive

\begin{aligned} {\varvec{g}} = Q R^{-*} M (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 + (I - Q Q^*) \beta {\varvec{e}}_1. \end{aligned}

Again it is easy to verify that $${\varvec{c}}_\star = M^{-1} {\underline{H}}_k^* {\varvec{g}}$$. Observe that $${\varvec{g}}$$ can be rewritten as

\begin{aligned} {\varvec{g}} = Q R^{-*} ((I + {\varDelta } M M^{-1})^{-1} - I) R^* Q^* \beta {\varvec{e}}_1 + \beta {\varvec{e}}_1 \end{aligned}

such that

\begin{aligned} \frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{g}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert }&= \Vert R^{-*} ((I + {\varDelta } M M^{-1})^{-1} - I) R^* Q^* {\varvec{e}}_1\Vert \\&\le \Vert R^{-*}\Vert \; \Vert R^*\Vert \; \Vert (I + {\varDelta } M M^{-1})^{-1} - I\Vert . \end{aligned}

Since $$\Vert {\varDelta } M M^{-1}\Vert = \Vert M^{-1} {\varDelta } M\Vert < 1$$, it follows that

\begin{aligned} \Vert (I + {\varDelta } M M^{-1})^{-1} - I\Vert \le \sum _{j=1}^\infty \Vert -{\varDelta } M M^{-1}\Vert ^j = \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert }, \end{aligned}

which concludes the proof. $$\square$$

We have discussed forward and backward error bounds which help motivate our parameter choice. Now that we have investigated each of the three phases of our method, we are ready to show numerical results.

## Numerical Experiments

We benchmark our algorithm with problems from Regularization Tools by Hansen. Each problem provides an ill-conditioned $$n \times n$$ matrix A, a solution vector $${\varvec{x}}_\star$$ of length n, and a corresponding measured vector $${\varvec{b}}$$. We let $$n = 1024$$ and add a noise vector $${\varvec{e}}$$ to $${\varvec{b}}$$. The entries of $${\varvec{e}}$$ are drawn independently from the standard normal distribution. The noise vector is then scaled such that $$\varepsilon = \Vert {\varvec{e}}\Vert$$ equals $$0.01 \Vert {\varvec{b}}\Vert$$ or $$0.05 \Vert {\varvec{b}}\Vert$$ for 1 and 5 % noise, respectively. We use $$\eta = 1.01$$ for the discrepancy bound in (7). We test the algorithms with 1000 different noise vectors for every triplet A, $${\varvec{x}}_\star$$, and $${\varvec{b}}$$, and report the median results.

The algorithms terminate when the relative difference between two subsequent approximations is less than 0.01, when $${\varvec{x}}_{k+1}$$ is (numerically) linearly dependent on the columns of $$X_k$$, when neither $$U_{k+1}$$ nor any of the $${V_k^{i}}$$ can be expanded, or when a maximum number of iterations is reached. For Algorithm 2 we use a maximum of 20 iterations and for Algorithm 1 a maximum of $$(\ell +1) \times 20$$ iterations. For the sake of a fair comparison, the algorithms return the best obtained approximations and their iteration numbers.

For each test problem, the tables below list the relative error obtained with Algorithm 1, abbreviated by $$E_\text {od}$$, and Algorithm 2, abbreviated by $$E_\text {md}$$. OD and MD stand for one direction and multidirectional respectively. Also listed are the ratio $$\rho _E$$ of $$E_\text {md}$$ to $$E_\text {od}$$ and the ratio $$\rho _\text {mv}$$ of the number of matrix-vector products. That is,

\begin{aligned} \rho _E = \frac{E_\text {md}}{E_\text {od}} \quad \text {and}\quad \rho _\text {mv} = \frac{\# \text {MVs Algorithm 2} }{ \# \text {MVs Algorithm 1} }. \end{aligned}

Only matrix-vector multiplications with A, $$A^*$$, $${L^{i}}$$, and $${L^{i}}^*$$ count towards the total number of MVs used by each algorithm. We note, however, that multiplications with $${L^{i}}$$ and $${L^{i}}^*$$ are often less costly than multiplications with A and $$A^*$$.

Table 1 lists the results for one-parameter Tikhonov regularization, where we used the following regularization operators: the first derivative operator $$L_1$$ with stencil $$[1,-1]$$ for Gravity-3, Heat-5, Heat, and Phillips; the second derivative operator $$L_2$$ with stencil $$[1,-2,1]$$ for Deriv2-1, Deriv2-2, Foxgood, Gravity-1, and Gravity-2; the third derivative operator $$L_3$$ with stencil $$[-1,3,-3,1]$$ for Baart; and the fifth derivative operator $$L_5$$ with stencil $$[-1,5,-10,10,-5,1]$$ for Deriv2-3. The derivative operators $$L_d$$ are of size $$(n-d) \times n$$.

The table shows that multidirectional subspace expansion can obtain small improvements in the relative error at the cost of a small number of extra matrix-vector products, especially for 1 % noise. We stress that in these cases, Algorithm 1 is allowed to perform additional MVs, but converges with a higher relative error. Even when there is no improvement in the relative error, multidirectional subspace expansion can improve convergence, for example, for the Deriv2 problems as well as Foxgood.

Table 2 lists the results for multiparameter Tikhonov regularization. We used the following regularization operators for each problem: the derivative operator $$L_d$$ as listed above, the identity operator I, and the orthogonal projection $$(I - N_d N_d^*)$$, where the columns of $$N_d$$ form an orthonormal basis for the nullspace of $$L_d$$.

Overall, we observe larger improvements in the relative error for multidirectional subspace expansion, but also a larger number of MVs. We no longer see cases where multidirectional subspace expansion terminates with fewer MVs. In fact, the relative error is the same for Heat, although more MVs are required. Finally, Fig. 1 illustrates an example of the improved results which can be obtained by using multidirectional subspace expansion.

In the next tests we attempt to reconstruct the original image from a blurred and noisy observation. Consider an $$n \times n$$ grayscale image with pixel values in the interval [0, 1]. Then $${\varvec{x}}$$ is a vector of length $$n^2$$ obtained by stacking the columns of the image below each other. The matrix A represents a Gaussian blurring operator, generated with blur from Regularization Tools. The matrix A is block-Toeplitz with half-bandwidth band=11 and the amount of blurring is given by the variance sigma=5. The entries of the noise vector $${\varvec{e}}$$ are independently drawn from the standard normal distribution after which the vector is scaled such that $$\varepsilon = {\mathbb {E}}[\Vert {\varvec{e}}\Vert ] = 0.05 \Vert {\varvec{b}}\Vert$$. We take $$\eta$$ such that $$\Vert {\varvec{e}}\Vert \le \eta \varepsilon$$ in 99.9 % of the cases. That is,

\begin{aligned} \eta = 1 + \frac{3.090232}{\sqrt{2 n^2}}. \end{aligned}
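A quick Monte Carlo experiment (with hypothetical parameters, smaller than the images in the paper) confirms that this choice of $$\eta$$ yields $$\Vert {\varvec{e}}\Vert \le \eta \varepsilon$$ in roughly 99.9 % of draws; the constant 3.090232 is the 99.9 % quantile of the standard normal distribution.

```python
import numpy as np

# Empirical coverage of the bound ||e|| <= eta * eps for Gaussian noise,
# with eps = E[||e||] estimated by the sample mean.
rng = np.random.default_rng(5)
N = 32 * 32                           # n^2 for a small hypothetical 32x32 image
eta = 1 + 3.090232 / np.sqrt(2 * N)

# 20000 noise draws, generated in chunks to limit memory use
norms = np.concatenate(
    [np.linalg.norm(rng.standard_normal((1000, N)), axis=1) for _ in range(20)]
)
eps = norms.mean()                    # sample proxy for E[||e||]
frac = np.mean(norms <= eta * eps)    # fraction of draws satisfying the bound
```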

For regularization we choose an approximation to the Perona–Malik operator, where $$\rho$$ is a small positive constant. Because the Perona–Malik operator is nonlinear, we first perform a small number of iterations with a finite difference approximation $$L_{{\varvec{b}}}$$ based on $${\varvec{b}}$$. The resulting intermediate solution $$\widetilde{\varvec{x}}$$ is used to form a new approximation $$L_{\widetilde{\varvec{x}}}$$. Finally, we run the algorithms a second time with $$L_{\widetilde{\varvec{x}}}$$ and more iterations; see Reichel et al. for more information regarding the implementation of the Perona–Malik operator.

The first test image is also used in [13, 23, 25], and is shown in Figure 2. We use $$\rho = 0.075$$, 20 iterations for the first run, and 100 iterations for the second run. The second image is an image of Saturn, see Figure 3. For this image we use $$\rho = 0.03$$, 25 iterations for the first run and 150 iterations for the second run. In both cases we stop the iterations around the point where convergence flattens out, as can be seen from the convergence history in Figure 4. The figure uses the peak signal-to-noise ratio (PSNR) given by

\begin{aligned} -20 \log _{10}\left( \frac{\Vert {\varvec{x}}_\star - {\varvec{x}}_k\Vert }{n} \right) \end{aligned}

versus the iteration number k. A higher PSNR means a higher quality reconstruction.
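For reference, the PSNR above takes only a few lines to compute (a sketch; the function name `psnr` is our own):

```python
import numpy as np

def psnr(x_star, x_k, n):
    """PSNR of a reconstruction x_k of the stacked n-by-n image x_star,
    assuming pixel values in [0, 1], so that the peak value is 1."""
    return -20 * np.log10(np.linalg.norm(x_star - x_k) / n)
```

For example, a reconstruction whose error norm equals $$n/100$$ has a PSNR of 40 dB.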