## Introduction

We consider one-parameter and multiparameter Tikhonov regularization problems of the form

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^{\ell } {\mu ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \qquad (\ell \ge 1), \end{aligned}
(1)

where $$\Vert \cdot \Vert$$ denotes the 2-norm and the superscript i is used as an index. We focus on large-scale discrete ill-posed problems such as the discretization of Fredholm integral equations of the first kind. More precisely, assume A is an ill-conditioned or even singular $$m \times n$$ matrix with $$m \ge n$$, $${L^{i}}$$ are $$p^{i} \times n$$ matrices such that the nullspaces of A and $${L^{i}}$$ intersect trivially, and $${\mu ^{i}}$$ are nonnegative regularization parameters. Furthermore, assume $${\varvec{b}}$$ is contaminated by an error $${\varvec{e}}$$ and satisfies $$\varvec{b} = A {\varvec{x}}_\star + {\varvec{e}}$$, where $${\varvec{x}}_\star$$ is the exact solution. Finally, we assume that a bound $$\Vert {\varvec{e}}\Vert \le \varepsilon$$ is available, so that the discrepancy principle can be used.
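For concreteness, small dense instances of (1) can be solved directly by stacking the fidelity and penalty terms into a single least-squares problem. The NumPy sketch below (the function and problem data are our own illustration, not from the paper) does this for $$\ell = 2$$ with identity and first-difference regularization operators:

```python
import numpy as np

def tikhonov_multi(A, b, Ls, mus):
    """Solve min_x ||A x - b||^2 + sum_i mu_i ||L_i x||^2 by stacking
    all terms into one least-squares problem."""
    M = np.vstack([A] + [np.sqrt(mu) * L for mu, L in zip(mus, Ls)])
    rhs = np.concatenate([b] + [np.zeros(L.shape[0]) for L in Ls])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

# Demo: an ill-conditioned Vandermonde matrix, identity and
# first-difference regularization operators.
rng = np.random.default_rng(0)
n = 20
A = np.vander(np.linspace(0.0, 1.0, n), n, increasing=True)
x_star = np.sin(np.linspace(0.0, np.pi, n))
b = A @ x_star + 1e-6 * rng.standard_normal(n)
L1 = np.eye(n)
L2 = np.diff(np.eye(n), axis=0)   # (n-1) x n first-difference operator
mus = [1e-8, 1e-8]
x = tikhonov_multi(A, b, [L1, L2], mus)
```

This direct approach scales poorly with n, which is precisely why the paper restricts (1) to a low-dimensional search space.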

In one-parameter Tikhonov regularization ($$\ell = 1$$), the choice of the regularization operator is typically significant, since frequencies in the nullspace of the operator remain unpenalized. Multiparameter Tikhonov can be used when a satisfactory choice of the regularization operator is unknown in advance, or can be seen as an attempt to combine the strengths of different regularization operators. In some applications, using more than one regularization operator and parameter allows for more accurate solutions [1, 2, 17, 20].

Solving (1) for large-scale problems may be challenging. If the $${\mu ^{i}}$$ are fixed a priori, methods such as LSQR or LSMR may be used. However, the problem becomes more complicated when the regularization parameters are not fixed in advance [12, 15, 17]. In this paper, we present a new subspace method consisting of three phases: a new expansion phase, a new extraction phase, and a new truncation phase. To be more specific, let $$\mathcal {X}_k$$ be a subspace of dimension $$k \ll n$$, and let the columns of $$X_k$$ form an orthonormal basis for $$\mathcal {X}_k$$. Then we can compute matrix decompositions

\begin{aligned} A X_k= & {} U_{k+1} {\underline{H}}_k \nonumber \\ {L^{i}} X_k= & {} {V_k^{i}} {K_k^{i}} \qquad (i = 1, 2, \dots , \ell ), \end{aligned}
(2)

where $$U_{k+1}$$ and $${V_k^{i}}$$ have orthonormal columns, $$\beta {\varvec{u}}_1 = \varvec{b}$$, $$\beta = \Vert {\varvec{b}}\Vert$$, $${\underline{H}}_k$$ is a $$(k+1) \times k$$ Hessenberg matrix, and $${K_k^{i}}$$ is upper triangular. Denote $$\varvec{\mu } = ({\mu ^{1}}, \dots , {\mu ^{\ell }})$$ for convenience. Now restrict the solution space to the range of $$X_k$$, so that $${\varvec{x}}_k(\varvec{\mu }) = X_k {\varvec{c}}_k(\varvec{\mu })$$, where

\begin{aligned} {\varvec{c}}_k(\varvec{\mu })= & {} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert A X_k {\varvec{c}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {L^{i}} X_k {\varvec{c}}\Vert ^2 \nonumber \\= & {} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {K_k^{i}} {\varvec{c}}\Vert ^2. \end{aligned}
(3)

The vector $${\varvec{e}}_1$$ is the first standard basis vector of appropriate dimension. Our paper has three contributions. First, a new expansion phase where we add multiple search directions to the search space. Second, a new truncation phase which removes unwanted new search directions. Third, a new method for selecting the regularization parameters $${\mu _k^{i}}$$ in the extraction phase. The three phases work alongside each other: the intermediate solution obtained in the extraction phase is preserved in the truncation phase, whereas the remaining perpendicular component(s) from the expansion phase are removed.

The paper is organized as follows. In Sect. 2 an existing nonlinear subspace method is discussed, whereafter we propose the new multidirectional subspace expansion of the expansion phase. Discussion of the truncation phase follows immediately. Section 3 focuses on discrepancy-principle-based parameter selection for one-parameter regularization. New lower and upper bounds on the regularization parameter are provided. Sections 4 and 5 describe the extraction phase: in the former, a straightforward parameter selection strategy for multiparameter regularization is given; in the latter, a justification using perturbation analysis. Numerical experiments are performed in Sect. 6 and demonstrate the competitiveness of our new method. We end with concluding remarks in Sect. 7.

## Subspace Expansion for Multiparameter Tikhonov

Let us first consider one-parameter Tikhonov regularization with a general regularization operator. Then $$\ell = 1$$ and we write $$\mu = {\mu ^{1}}$$, $$L = {L^{1}}$$, and $$K_k = {K_k^{1}}$$, such that (1) simplifies to

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 + \mu \Vert L {\varvec{x}}\Vert ^2. \end{aligned}

When $$L=I$$ we use the Golub–Kahan–Lanczos bidiagonalization procedure to generate the Krylov subspace $$\mathcal {K}_k(A^*A, A^*{\varvec{b}}) = {{\,\mathrm{span}\,}}\{A^*{\varvec{b}}, (A^*A)A^*{\varvec{b}}, \dots , (A^*A)^{k-1}A^*{\varvec{b}}\}$$. In this case $${\underline{H}}_k$$ is lower bidiagonal, $$K_k$$ is the identity, and the next basis vector is given by

\begin{aligned} {\varvec{x}}_{k+1} = \frac{(I - X_k X_k^*) A^* {\varvec{u}}_{k+1}}{ \Vert (I - X_k X_k^*) A^* {\varvec{u}}_{k+1} \Vert } \end{aligned}

If $$L\ne I$$ one can still try to use the above Krylov subspace; however, it may be more natural to consider a shift-independent generalized Krylov subspace, spanned by the first k vectors in

\begin{aligned}&\text {Group 0}\quad A^*{\varvec{b}} \\&\text {Group 1}\quad (A^*A) A^*{\varvec{b}}, (L^*L) A^*{\varvec{b}} \\&\text {Group 2}\quad (A^*A)^2 A^*{\varvec{b}}, (A^*A) (L^*L) A^*{\varvec{b}}, (L^*L) (A^*A) A^*{\varvec{b}}, (L^*L)^2 A^*{\varvec{b}} \\&\dots \end{aligned}

This generalized Krylov subspace was first studied by Li and Ye and later by Reichel et al. An orthonormal basis can be created with a generalization of Golub–Kahan–Lanczos bidiagonalization. However, while the search space grows linearly as a function of the number of matrix-vector products, the dimension of the generalized Krylov subspace grows exponentially as a function of the total degree of a bivariate matrix polynomial. As a result, if we take any vector in the subspace and write it as $$p(A^*A, L^*L)A^* {\varvec{b}}$$, where p is a bivariate polynomial, then p has degree at most $$\lfloor \log _2 k \rfloor$$. This low degree may be undesirable, especially for small regularization parameters $$\mu$$. Reichel and Yu [24, 25] solve this in part with algorithms that can prioritize one operator over the other. For instance, if $${\varvec{w}}$$ is a vector in group j and B has priority over A, then group $$j+1$$ contains $$(A^*A){\varvec{w}}$$, $$(B^*B){\varvec{w}}$$, $$(B^*B)^2{\varvec{w}}$$, ..., $$(B^*B)^\rho {\varvec{w}}$$. The downside is that $$\rho$$ is a user-defined constant, and that the expansion vectors are not necessarily optimal.
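The group structure above is easy to reproduce numerically. The sketch below (our own helper, assuming small dense A and L) enumerates all monomials $$p(A^*A, L^*L)A^*{\varvec{b}}$$ group by group and orthonormalizes them, discarding numerically dependent directions:

```python
import numpy as np
from itertools import product

def generalized_krylov_basis(A, b, L, degree):
    """Orthonormal basis for the generalized Krylov subspace spanned by
    p(A^*A, L^*L) A^* b over all monomials p of total degree <= `degree`,
    enumerated group by group as in the text."""
    AtA, LtL = A.T @ A, L.T @ L
    vectors = [A.T @ b]                    # group 0
    for d in range(1, degree + 1):
        # group d: all length-d ordered products of A^*A and L^*L
        for word in product((AtA, LtL), repeat=d):
            v = A.T @ b
            for M in word:
                v = M @ v
            vectors.append(v)
    # Gram-Schmidt with reorthogonalization; drop dependent directions
    X = np.empty((A.shape[1], 0))
    for v in vectors:
        w = v - X @ (X.T @ v)
        w = w - X @ (X.T @ w)
        if np.linalg.norm(w) > 1e-10 * np.linalg.norm(v):
            X = np.column_stack([X, w / np.linalg.norm(w)])
    return X

# Demo: groups 0..2 give at most 1 + 2 + 4 = 7 candidate directions.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
L = rng.standard_normal((6, 6))
b = rng.standard_normal(8)
X = generalized_krylov_basis(A, b, L, degree=2)
```

The exponential growth of the groups (2^d vectors in group d) versus the linear growth of the basis dimension is exactly the degree limitation discussed above.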

An alternative approach is a greedy nonlinear method described by Lampe et al. We briefly review their method and state a straightforward extension to multiparameter Tikhonov regularization. First note that the low-dimensional minimization in (3) simplifies to

\begin{aligned} {\varvec{c}}_k(\mu )&= \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert AX_k {\varvec{c}} - {\varvec{b}}\Vert ^2 {} + \mu \Vert LX_k {\varvec{c}}\Vert ^2 \\&= \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \mu \Vert K_k {\varvec{c}}\Vert ^2, \end{aligned}

in the one-parameter case. Next, compute a value $$\mu = \mu _k$$ using, e.g., the discrepancy principle. It is easy to verify that

\begin{aligned}&A^* {\varvec{b}} - (A^* A + \mu _k L^* L) {\varvec{x}}_k(\mu _k)\\&\quad = A^* U_{k+1} (\beta {\varvec{e}}_1 - {\underline{H}}_k {\varvec{c}}_k(\mu _k)) {} - \mu _k L^* V_{k} K_k {\varvec{c}}_k(\mu _k) \end{aligned}

is perpendicular to the range of $$X_k$$, and is, up to sign, the gradient of the cost function

\begin{aligned} {\varvec{x}} \mapsto \frac{1}{2}( \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 + \mu \Vert L {\varvec{x}}\Vert ^2 ) \end{aligned}

at the point $${\varvec{x}}_k(\mu _k)$$. Therefore, this vector is used to expand the search space. As usual, expansion and extraction are repeated until suitable stopping criteria are met.

As previously stated, Lampe et al. consider only one-parameter Tikhonov regularization; however, their method readily extends to multiparameter Tikhonov regularization. Again, the first step is to decide on regularization parameters $$\varvec{\mu }_k$$. Next, use the residual of the normal equations

\begin{aligned}&A^* {\varvec{b}} - \Big ( A^* A + \sum _{i=1}^\ell {\mu _k^{i}}{L^{i}}^* {L^{i}} \Big ) {\varvec{x}}_k(\varvec{\mu }_k)\\&\quad = A^* U_{k+1} (\beta {\varvec{e}}_1 - {\underline{H}}_k {\varvec{c}}_k(\varvec{\mu }_k)) {} - \sum _{i=1}^\ell {\mu _k^{i}} {L^{i}}^* {V_k^{i}} {K_k^{i}} {\varvec{c}}_k(\varvec{\mu }_k), \end{aligned}

to expand the search space. Note that the residual is again orthogonal to the range of $$X_k$$ and is, up to sign, the gradient of the cost function

\begin{aligned} {\varvec{x}} \mapsto \frac{1}{2}\Big ( \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \Big ). \end{aligned}

We summarize this multiparameter method in Algorithm 1, but remark that in practice we initially use Golub–Kahan–Lanczos bidiagonalization until a $$\varvec{\mu }_k$$ can be found that satisfies the discrepancy principle.

### Algorithm 1

(Generalized Krylov subspace Tikhonov regularization; extension of the method of Lampe et al.)

Input: Measurement matrix A, regularization operators $${L^{1}}$$, ..., $${L^{\ell }}$$, and data $${\varvec{b}}$$.

Output: Approximate solution $${\varvec{x}}_k \approx {\varvec{x}}_\star$$.

1. 1.

Initialize $$\beta = \Vert {\varvec{b}}\Vert$$, $$U_1 = {\varvec{b}} / \beta$$, $$X_0 = []$$, $${\varvec{x}}_0 = \mathbf{0 }$$, and $$\varvec{\mu }_0 = \mathbf{0 }$$. for $$k = 1, 2, \dots$$ do

2. 2.

Expand $$X_{k-1}$$ with $$A^* {\varvec{b}} - ( A^* A + \sum _{i=1}^\ell {\mu _{k-1}^{i}} {L^{i}}^* {L^{i}}) {\varvec{x}}_{k-1}$$.

3. 3.

Update $$A X_k = U_{k+1} {\underline{H}}_k$$ and $${L^{i}} X_k = {V_k^{i}} {K_k^{i}}$$.

4. 4.

Select $$\varvec{\mu }_k$$; see Sect. 4 and Algorithm 3.

5. 5.

Solve (3) for $${\varvec{c}}_k = {\varvec{c}}_k(\varvec{\mu }_k)$$.

6. 6.

$${\varvec{x}}_k = X_k {\varvec{c}}_k$$.

7. 7.

if $$\Vert {\varvec{x}}_k - {\varvec{x}}_{k-1}\Vert /\Vert {\varvec{x}}_k\Vert$$ is sufficiently small then break
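A minimal dense realization of Algorithm 1 might look as follows. For simplicity we take equal parameters $${\mu _k^{i}} = \mu _k$$ chosen by log-scale bisection on the discrepancy, re-orthonormalize the basis explicitly, and solve the projected problem by stacked least squares instead of updating the decompositions (2); all helper names are ours, and this is a sketch rather than the authors' implementation.

```python
import numpy as np

def solve_projected(A, b, Ls, X, mu):
    """Solve the projected problem (3) with equal parameters mu^i = mu,
    by stacking the terms into one dense least-squares problem."""
    M = np.vstack([A @ X] + [np.sqrt(mu) * (L @ X) for L in Ls])
    rhs = np.concatenate([b] + [np.zeros(L.shape[0]) for L in Ls])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

def discrepancy_mu(A, b, Ls, X, eps, eta, steps=60):
    """Largest mu (log-scale bisection) with ||A X c(mu) - b|| < eta*eps."""
    lo, hi = 1e-16, 1e16
    for _ in range(steps):
        mid = np.sqrt(lo * hi)
        c = solve_projected(A, b, Ls, X, mid)
        if np.linalg.norm(A @ (X @ c) - b) < eta * eps:
            lo = mid
        else:
            hi = mid
    return lo

def alg1_sketch(A, b, Ls, eps, eta=1.01, maxit=60, tol=1e-8):
    x0 = A.T @ b
    X = (x0 / np.linalg.norm(x0))[:, None]   # start as in GKL
    x_old = np.zeros(A.shape[1])
    for _ in range(maxit):
        mu = discrepancy_mu(A, b, Ls, X, eps, eta)
        c = solve_projected(A, b, Ls, X, mu)
        x = X @ c
        if (np.linalg.norm(A @ x - b) <= eta * eps
                and np.linalg.norm(x - x_old) <= tol * np.linalg.norm(x)):
            break
        # step 2: expand with the residual of the normal equations
        r = A.T @ (b - A @ x) - mu * sum(L.T @ (L @ x) for L in Ls)
        w = r - X @ (X.T @ r)
        w = w - X @ (X.T @ w)                # reorthogonalize
        if np.linalg.norm(w) > 1e-12 * np.linalg.norm(r):
            X = np.column_stack([X, w / np.linalg.norm(w)])
        x_old = x
    return x, mu

# Demo: 1D Gaussian blur, identity regularization, known noise level.
rng = np.random.default_rng(1)
n = 32
t = np.linspace(0.0, 1.0, n)
A = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.01) / n
x_star = np.exp(-(t - 0.4) ** 2 / 0.02)
e = 1e-4 * rng.standard_normal(n)
b = A @ x_star + e
eps = np.linalg.norm(e)
x, mu = alg1_sketch(A, b, [np.eye(n)], eps)
```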

Suitable regularization operators often depend on the problem and its solution. Multiparameter regularization may be used when a priori information is lacking. In this case, it is not obvious that the residual vector above is a “good” expansion vector, in particular when the intermediate regularization parameters $${\varvec{\mu }}_k$$ are inaccurate. Hence, we propose to remove the dependence on the parameters to some extent by expanding the search space with the vectors

\begin{aligned} A^* A {\varvec{x}}_k(\varvec{\mu }_k), \quad {L^{1}}^* {L^{1}} {\varvec{x}}_k(\varvec{\mu }_k), \quad \dots , \quad {L^{\ell }}^* {L^{\ell }} {\varvec{x}}_k(\varvec{\mu }_k), \end{aligned}
(4)

separately. Here, we omit $$A^* {\varvec{b}}$$ as it is already contained in $$X_k$$. Since we expand the search space in multiple directions, we refer to this expansion as a “multidirectional” subspace expansion. Observe that the previous residual expansion vector is in the span of the multidirectional expansion vectors.

It is unappealing for the search space to grow with $$\ell +1$$ basis vectors per iteration, because the cost of orthogonalization and the cost of solving the projected problems depend on the dimension of the search space. Therefore, we wish to condense the best portions of the multiple directions into a single vector, and use the following approach. First we expand $$X_k$$ with the vectors in (4) and obtain $${\widetilde{X}}_{k+\ell +1}$$. Then we compute the decompositions

\begin{aligned} A {\widetilde{X}}_{k+\ell +1}{}= & {} {\widetilde{U}}_{k+\ell +2} \widetilde{{\underline{H}}}_{k+\ell +1} \\ {L^{i}} {\widetilde{X}}_{k+\ell +1} {}= & {} {{\widetilde{V}}_{k+\ell +1}^{i}} {{\widetilde{K}}_{k+\ell +1}^{i}} \qquad (i=1, 2, \dots , \ell ), \end{aligned}

analogously to (2) and determine parameters $${\varvec{\mu }}_{k+1}$$ and the approximate solution $$\widetilde{\varvec{c}}_{k+\ell +1}$$. Next, we compute

\begin{aligned} A ({\widetilde{X}}_{k+\ell +1} Z^*) {}= & {} ({\widetilde{U}}_{k+\ell +2} P^*) (P \widetilde{{\underline{H}}}_{k+\ell +1} Z^*) \nonumber \\ {L^{i}} ({\widetilde{X}}_{k+\ell +1} Z^*) {}= & {} ({{\widetilde{V}}_{k+\ell +1}^{i}} Q^{i*}) (Q^i {{\widetilde{K}}_{k+\ell +1}^{i}} Z^*) \qquad (i=1, 2, \dots , \ell ), \end{aligned}
(5)

where Z, P, and $$Q^i$$ are orthonormal matrices of the form

\begin{aligned} Z = \begin{bmatrix} I_{k}&\\&Z_{\ell +1} \end{bmatrix}, \quad P = \begin{bmatrix} I_{k+1}&\\&P_{\ell +1} \end{bmatrix}, \quad Q^i = \begin{bmatrix} I_{k}&\\&Q^i_{\ell +1} \end{bmatrix}. \end{aligned}
(6)

Here $$I_k$$ is the $$k\times k$$ identity matrix and $$Z_{\ell +1}$$ is an orthonormal matrix so that $$Z_{\ell +1} \widetilde{\varvec{c}}_{k+1:k+\ell +1} = \gamma {\varvec{e}}_1$$ for some scalar $$\gamma$$. The matrices $$P_{\ell +1}$$ and $$Q^i_{\ell +1}$$ are computed to make $$\widetilde{{\underline{H}}}_{k+\ell +1} Z^*$$ and $${{\widetilde{K}}_{k+\ell +1}^{i}} Z^*$$ respectively upper-Hessenberg and upper-triangular again. At this point we can truncate (5) to obtain

\begin{aligned} A X_{k+1}= & {} U_{k+2} {\underline{H}}_{k+1} \\ {L^{i}} X_{k+1}= & {} {V_{k+1}^{i}} {K_{k+1}^{i}} \qquad (i=1, 2, \dots , \ell ), \end{aligned}

and truncate $$Z\widetilde{\varvec{c}}_{k+\ell +1}$$ to obtain $${\varvec{c}}_{k+1}$$ so that $${\widetilde{X}}_{k+\ell +1} \widetilde{\varvec{c}}_{k+\ell +1} = X_{k+1}\varvec{c}_{k+1}$$. The truncation is expected to keep important components, since the directions removed from $${\widetilde{X}}_{k+\ell +1}$$ are perpendicular to the current best approximation $${\varvec{x}}_{k+1}$$, and also to the previous best approximations $${\varvec{x}}_{k}$$, $${\varvec{x}}_{k-1}$$, ..., $${\varvec{x}}_1$$. If the rotation and truncation are combined in one step, then their computational cost quickly becomes smaller than the (re)orthogonalization cost as k grows.
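The rotation-and-truncation step can be sketched in isolation. The snippet below (our own construction, for random data and a single operator L) builds the Householder block $$Z_{\ell +1}$$, restores triangularity with a QR factorization (a full Q rather than the structured $$Q^i$$ of (6), which yields the same invariants), and forms the truncated factors:

```python
import numpy as np

def householder_to_e1(u):
    """Symmetric orthogonal H with H @ u = gamma * e1, |gamma| = ||u||.
    Assumes u != 0."""
    gamma = -np.copysign(np.linalg.norm(u), u[0])
    v = u.astype(float).copy()
    v[0] -= gamma
    return np.eye(u.size) - 2.0 * np.outer(v, v) / (v @ v), gamma

rng = np.random.default_rng(0)
n, k, ell = 10, 4, 2
m = k + ell + 1                                    # dimension after expansion
Xt = np.linalg.qr(rng.standard_normal((n, m)))[0]  # stand-in for X~_{k+l+1}
L = rng.standard_normal((n, n))
Vt, Kt = np.linalg.qr(L @ Xt)                      # L X~ = V~ K~, K~ triangular
ct = rng.standard_normal(m)                        # stand-in for c~_{k+l+1}

Z1, gamma = householder_to_e1(ct[k:])
Z = np.eye(m)
Z[k:, k:] = Z1                                     # Z as in (6)
c_rot = Z @ ct                                     # = (c_1,...,c_k, gamma, 0, 0)
Q, R = np.linalg.qr(Kt @ Z.T)                      # restore triangularity
X_new = (Xt @ Z.T)[:, :k + 1]                      # truncate last ell columns
V_new = (Vt @ Q)[:, :k + 1]
K_new = R[:k + 1, :k + 1]
c_new = c_rot[:k + 1]
```

Because the trailing entries of $$Z\widetilde{\varvec{c}}$$ vanish by construction, the truncated pair reproduces both the factorization $$L X_{k+1} = V_{k+1} K_{k+1}$$ and the current iterate exactly; the update of $${\underline{H}}$$ with P is analogous.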

To illustrate our approach, let us consider a one-parameter Tikhonov example where $$\ell = 1$$. First we expand $$X_1 = {\varvec{x}}_1$$ with vectors $$A^*A{\varvec{x}}_1$$ and $$L^*L {\varvec{x}}_1$$. Let $$A {\widetilde{X}}_{1+2} = {\widetilde{U}}_{2+2} \widetilde{{\underline{H}}}_{1+2}$$ and $$L {\widetilde{X}}_{1+2} = {\widetilde{V}}_{1+2} {\widetilde{K}}_{1+2}$$, and use $$\widetilde{{\underline{H}}}_{1+2}$$ and $${\widetilde{K}}_{1+2}$$ to compute $$\widetilde{\varvec{c}}_{1+2}$$. We then compute a rotation matrix $$Z_2$$ so that $$Z_2 \widetilde{\varvec{c}}_{2:3} = \pm \Vert \widetilde{\varvec{c}}_{2:3}\Vert {\varvec{e}}_1$$, and let Z be defined as in (6). The matrices $$\widetilde{{\underline{H}}}_{1+2} Z^*$$ and $${\widetilde{K}}_{1+2} Z^*$$ no longer have their original structure; hence, we need to compute orthonormal P and Q such that $$P \widetilde{{\underline{H}}}_{1+2} Z^*$$ is again upper-Hessenberg and $$Q {\widetilde{K}}_{1+2} Z^*$$ is upper-triangular. Schematically we have

\begin{aligned} \xrightarrow {\widetilde{\varvec{c}}_{1+2}^*}&\begin{bmatrix} \times &{} \times &{} \times \end{bmatrix} \xrightarrow {(Z\widetilde{\varvec{c}}_{1+2})^*} \begin{bmatrix} \times &{} \times &{} 0 \end{bmatrix} \\ \xrightarrow {\widetilde{{\underline{H}}}_{1+2}}&\begin{bmatrix} \times &{} \times &{} \times \\ \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} 0 &{} \times \end{bmatrix} \xrightarrow {\widetilde{{\underline{H}}}_{1+2}Z^*} \begin{bmatrix} \times &{} \times &{} \times \\ \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} \times &{} \times \end{bmatrix} \xrightarrow {P\widetilde{{\underline{H}}}_{1+2}Z^*} \begin{bmatrix} \times &{} \times &{} \times \\ \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} 0 &{} \times \end{bmatrix} \\ \xrightarrow {{\widetilde{K}}_{1+2}}&\begin{bmatrix} \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} 0 &{} \times \end{bmatrix} \xrightarrow {{\widetilde{K}}_{1+2}Z^*} \begin{bmatrix} \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} \times &{} \times \end{bmatrix} \xrightarrow {Q{\widetilde{K}}_{1+2}Z^*} \begin{bmatrix} \times &{} \times &{} \times \\ 0 &{} \times &{} \times \\ 0 &{} 0 &{} \times \end{bmatrix} \end{aligned}

accompanied by the decompositions

\begin{aligned} A ({\widetilde{X}}_{1+2} Z^*)= & {} ({\widetilde{U}}_{2+2} P^*) (P \widetilde{{\underline{H}}}_{1+2} Z^*) \\ L ({\widetilde{X}}_{1+2} Z^*)= & {} ({\widetilde{V}}_{1+2} Q^*) (Q {\widetilde{K}}_{1+2}Z^*). \end{aligned}

At this point we truncate the subspaces by removing the last columns from $${\widetilde{X}}_{1+2} Z^*$$, $${\widetilde{U}}_{2+2} P^*$$, $$P \widetilde{{\underline{H}}}_{1+2} Z^*$$, $${\widetilde{V}}_{1+2} Q^*$$, and $$Q {\widetilde{K}}_{1+2} Z^*$$, and the bottom rows of $$P \widetilde{{\underline{H}}}_{1+2} Z^*$$ and $$Q {\widetilde{K}}_{1+2} Z^*$$, to obtain

\begin{aligned} AX_2= & {} U_3 {\underline{H}}_2 \\ LX_2= & {} V_2 K_2. \end{aligned}

Below we summarize the steps of the new algorithm for solving problem (1). In our implementation we take care to use full reorthogonalization and avoid extending $$X_{k}$$, $$U_{k+1}$$, and $${V_k^{i}}$$ with numerically linearly dependent vectors. We omit these steps from the pseudocode for brevity. In addition, we initially expand the search space solely with $$A^*{\varvec{u}}_{k+1}$$ until the discrepancy principle can be satisfied, in accordance with Proposition 1 in Sect. 3.

### Algorithm 2

(Multidirectional Tikhonov regularization)

Input: Measurement matrix A, regularization operators $${L^{1}}$$, ..., $${L^{\ell }}$$, and data $${\varvec{b}}$$.

Output: Approximate solution $${\varvec{x}}_k \approx {\varvec{x}}_\star$$.

1. 1.

Initialize $$\beta = \Vert {\varvec{b}}\Vert$$, $$U_1 = {\varvec{b}} / \beta$$, $$X_0 = []$$, $${\varvec{x}}_0 = \mathbf{0 }$$, and $$\varvec{\mu }_0 = \mathbf{0 }$$.

for $$k=0, 1, \dots$$ do

2. 2.

Expand $$X_k$$ with $$A^* A {\varvec{x}}_{k}$$, $${L^{1}}^* {L^{1}} {\varvec{x}}_{k}$$, ..., $${L^{\ell }}^* {L^{\ell }} {\varvec{x}}_{k}$$.

3. 3.

Update $$A{\widetilde{X}}_{k+\ell +1} = {\widetilde{U}}_{k+\ell +2} \widetilde{{\underline{H}}}_{k+\ell +1}$$ and $${L^{i}} {\widetilde{X}}_{k+\ell +1} = {{\widetilde{V}}_{k+\ell +1}^{i}} {{\widetilde{K}}_{k+\ell +1}^{i}}$$.

4. 4.

Select $$\varvec{\mu }_k$$; see Sect. 4 and Algorithm 3.

5. 5.

Solve the projected problem for $$\widetilde{\varvec{c}}_{k+\ell +1}$$.

6. 6.

Compute P, Q, and Z (see text).

7. 7.

Truncate $$A ({\widetilde{X}}_{k+\ell +1} Z^*) = ({\widetilde{U}} _{k+\ell +2} P^*) (P \widetilde{{\underline{H}}}_{k+\ell +1} Z^*)$$ to $$A X_{k+1} = U_{k+2} {\underline{H}}_{k+1}$$.

Truncate $${L^{i}} ({\widetilde{X}}_{k+\ell +1} Z^*) = ({{\widetilde{V}}_{k+\ell +1}^{i}} Q^{i*}) (Q^i {{\widetilde{K}}_{k+\ell +1}^{i}} Z^*)$$ to $${L^{i}} X_{k+1} = {V_{k+1}^{i}} {K_{k+1}^{i}}$$.

8. 8.

Truncate $$Z\widetilde{\varvec{c}}_{k+\ell +1}$$ to obtain $${\varvec{c}}_{k+1}$$ and set $${\varvec{x}}_{k+1} = X_{k+1} {\varvec{c}}_{k+1}$$.

9. 9.

if $$\Vert {\varvec{x}}_{k+1} - {\varvec{x}}_k\Vert /\Vert {\varvec{x}}_k\Vert$$ is sufficiently small then break

We have completed our discussion of the expansion and truncation phases of our algorithm. In the following section we discuss the extraction phase for one-parameter Tikhonov regularization; the multiparameter case follows in later sections.

## Parameter Selection in Standard Tikhonov

In this section we investigate parameter selection for general-form one-parameter Tikhonov, where $$\ell = 1$$, $$\mu = {\mu ^{1}}$$, and $$L = {L^{1}}$$. Multiple methods exist in the one-parameter case to determine a particular $$\mu _k$$, including the discrepancy principle, the L-curve criterion, and generalized cross validation; see, for example, Hansen [11, Ch. 7]. We focus on the discrepancy principle, which states that $$\mu _k$$ must satisfy

\begin{aligned} \Vert A {\varvec{x}}_k(\mu _k) - {\varvec{b}}\Vert = \eta \varepsilon , \end{aligned}
(7)

where $$\Vert {\varvec{e}}\Vert \le \varepsilon$$ and $$\eta >1$$ is a user supplied constant independent of $$\varepsilon$$.

Define the residual vector $${\varvec{r}}_k(\mu ) = A{\varvec{x}}_k(\mu ) - {\varvec{b}}$$ and the function $$\varphi (\mu ) = \Vert {\varvec{r}}_k(\mu )\Vert ^2$$. A nonnegative $$\mu _k$$ satisfies the discrepancy principle if $$\varphi (\mu _k) = \eta ^2 \varepsilon ^2$$. Root-finding methods can be used to compute such a $$\mu _k$$; Lampe et al. compare four of them. We prefer bisection for its reliability and straightforward analysis and implementation. The performance difference is not an issue, because root finding requires only a fraction of the total computation time and is not a bottleneck. A unique solution $$\mu _k$$ exists under mild conditions; below we give a proof using our own notation.

Assume $${\underline{H}}_k$$ and $$K_k$$ have full rank and let $$P_k {\varSigma }_k Q_k^*$$ be the singular value decomposition of $${\underline{H}}_k K_k^{-1}$$. Let the singular values be denoted by

\begin{aligned} \sigma _{\max } = \sigma _1 \ge \sigma _2 \ge \dots \ge \sigma _k = \sigma _{\min } > 0. \end{aligned}
(8)

Now we can express $${\varvec{c}}_k(\mu )$$ and $$\varphi$$ as

\begin{aligned} {\varvec{c}}_k(\mu )&= ({\underline{H}}_k^* {\underline{H}}_k + \mu K_k^* K_k)^{-1}{\underline{H}}_k^* \beta {\varvec{e}}_1\\&= K_k^{-1} (K_k^{-*} {\underline{H}}_k^* {\underline{H}}_k K_k^{-1} + \mu I)^{-1} K_k^{-*} {\underline{H}}_k^* \beta {\varvec{e}}_1 \\&= K_k^{-1} Q_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* \beta {\varvec{e}}_1 \end{aligned}

and

\begin{aligned} \varphi (\mu )&= \Vert \beta {\varvec{e}}_1 - {\underline{H}}_k{\varvec{c}}_k(\mu )\Vert ^2\\&= \beta ^2 \Vert {\varvec{e}}_1 - {\underline{H}}_k K_k^{-1} Q_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* {\varvec{e}}_1\Vert ^2 \\&= \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1 + P_k P_k^* {\varvec{e}}_1 {} - P_k {\varSigma }_k ({\varSigma }_k^2 + \mu I)^{-1} {\varSigma }_k P_k^* {\varvec{e}}_1\Vert ^2\\&= \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \Vert \mu ({\varSigma }_k^2 + \mu I)^{-1} P_k^* {\varvec{e}}_1\Vert ^2. \end{aligned}

Or alternatively,

\begin{aligned} \varphi (\mu ) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \sum _{j=1}^k \bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 |P_k|_{1j}^2. \end{aligned}
(9)

Observe that the columns of $$P_k$$ form an orthonormal basis for the range of $${\underline{H}}_k$$, and that $$I - P_k P_k^*$$ is the orthogonal projection onto its orthogonal complement, the nullspace of $${\underline{H}}_k^*$$. Furthermore, it can be verified that $${\underline{H}}_k^* \beta {\varvec{e}}_1 \ne \varvec{0}$$ if $$A^*{\varvec{b}} \ne \mathbf{0 }$$, that is, $$P_k^* {\varvec{e}}_1 \ne \varvec{0}$$.
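Expression (9) is easy to check numerically against the definition of $$\varphi$$. The sketch below (random stand-ins of our own for $${\underline{H}}_k$$ and $$K_k$$) computes both sides:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 6
H = rng.standard_normal((k + 1, k))    # stand-in for the Hessenberg H_k
K = np.triu(rng.standard_normal((k, k))) + 3.0 * np.eye(k)  # stand-in for K_k
beta = 2.0
e1 = np.zeros(k + 1)
e1[0] = 1.0

P, sig, _ = np.linalg.svd(H @ np.linalg.inv(K), full_matrices=False)
p1 = P[0, :]                           # |P_k|_{1j} = first row of P_k

def phi_direct(mu):
    """phi(mu) from its definition: the squared residual of (3)."""
    c = np.linalg.solve(H.T @ H + mu * (K.T @ K), beta * (H.T @ e1))
    return np.linalg.norm(beta * e1 - H @ c) ** 2

def phi_svd(mu):
    """phi(mu) from expression (9)."""
    return beta**2 * (1.0 - p1 @ p1) \
        + beta**2 * np.sum((mu / (sig**2 + mu)) ** 2 * p1**2)
```

Evaluating both functions on a few values of $$\mu$$ also exhibits the monotone increase of $$\varphi$$ from $$\varphi (0)$$ toward $$\beta ^2$$ used in Proposition 1.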

### Proposition 1

If $$\beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 \le \eta ^2 \varepsilon ^2 < \Vert {\varvec{b}}\Vert ^2$$, then there exists a unique $$\mu _k\ge 0$$ such that $$\varphi (\mu _k) = \eta ^2 \varepsilon ^2$$.

### Proof

From (9) it follows that $$\varphi$$ is a rational function with poles $$\mu =-\sigma _j^2$$ for all $$\sigma _j>0$$; therefore, $$\varphi$$ is $$C^\infty$$ on the interval $$[0,\infty )$$. Additionally, $$\varphi$$ is a strictly increasing and bounded function on this interval, since

\begin{aligned} \frac{d}{d\mu }\bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 = 2 \frac{\mu \sigma _j^2}{(\sigma _j^2 + \mu )^3}> 0, \quad \text {for all} \quad \mu > 0 \end{aligned}

implies $$\varphi ^\prime (\mu ) > 0$$ and

\begin{aligned} \varphi (0) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 \quad \text {and} \quad \lim _{\mu \rightarrow \infty } \varphi (\mu ) = \beta ^2 = \Vert {\varvec{b}}\Vert ^2. \end{aligned}

Consequently, there exists a unique $$\mu _k \in [0,\infty )$$ such that $$\varphi (\mu _k) = \eta ^2 \varepsilon ^2$$. $$\square$$

Beyond nonnegativity, the proposition above provides little insight into the location of $$\mu _k$$ on the real axis, and we would like to have lower and upper bounds. We determine such bounds in Proposition 2 and believe the results to be new. Both in practice and for the proof of the subsequent proposition, it is useful to remove nonessential parts of $$\varphi (\mu )$$ and instead work with the function

\begin{aligned} {\widetilde{\varphi }}(\mu ) = \frac{\varphi (\mu ) - \varphi (0)}{\beta ^2} = \sum _{j=1}^k \bigg ( \frac{\mu }{\sigma _j^2 + \mu } \bigg )^2 |P_k|_{1j}^2, \end{aligned}

and the quantity

\begin{aligned} {\widetilde{\varepsilon }}^2 = \frac{\eta ^2 \varepsilon ^2 - \varphi (0)}{\beta ^2}. \end{aligned}
(10)

Then $$0\le {\widetilde{\varphi }}(\mu ) \le \rho ^2$$, where $$\rho = \Vert P_k^* \varvec{e}_1\Vert \le 1$$. Moreover, $$\eta ^2\varepsilon ^2$$ satisfies the bounds in Proposition 1 if and only if $$0 \le {\widetilde{\varepsilon }} < \rho$$, and $$\varphi (\mu _k) = \eta ^2\varepsilon ^2$$ if and only if $${\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2$$.

### Proposition 2

If $$0 \le {\widetilde{\varepsilon }} < \rho$$, and $$\mu _k$$ is such that $${\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2$$, then

\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \sigma _{\min }^2 \le \mu _k \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \sigma _{\max }^2, \end{aligned}
(11)

where $$\sigma _{\min }$$ and $$\sigma _{\max }$$ are as in (8).

### Proof

The key of the proof is to observe that

\begin{aligned} \frac{\mu }{\sigma _{\max }^2 + \mu } \le \frac{\mu }{\sigma _j^2 + \mu } \le \frac{\mu }{\sigma _{\min }^2 + \mu } \end{aligned}

for all $$j = 1$$, ..., k. Combining this observation with the definition of $${\widetilde{\varphi }}$$ yields

\begin{aligned} \left( \frac{\mu _k}{\sigma _{\max }^2 + \mu _k}\right) ^2 \sum _{j=1}^k |P_k|_{1j}^2 \le \sum _{j=1}^k \left( \frac{\mu _k}{\sigma _{j}^2 + \mu _k}\right) ^2 |P_k|_{1j}^2 \le \left( \frac{\mu _k}{\sigma _{\min }^2 + \mu _k}\right) ^2 \sum _{j=1}^k |P_k|_{1j}^2. \end{aligned}

Since $$\sum _{j=1}^k |P_k|_{1j}^2 = \Vert P_k^* {\varvec{e}}_1\Vert ^2 = \rho ^2$$ and $${\widetilde{\varphi }}(\mu _k) = {\widetilde{\varepsilon }}^2$$, it follows that

\begin{aligned} \frac{\mu _k}{\sigma _{\max }^2 + \mu _k} \rho \le {\widetilde{\varepsilon }} \le \frac{\mu _k}{\sigma _{\min }^2 + \mu _k} \rho . \end{aligned}

Hence, if $${\widetilde{\varepsilon }} = 0$$, then $$\mu _k = 0$$ and we are done. Otherwise $$\mu _k \ne 0$$ and we can divide by $$\rho$$, take the reciprocals, and subtract 1 to arrive at

\begin{aligned} \frac{\sigma _{\max }^2}{\mu _k} \ge \frac{\rho }{{\widetilde{\varepsilon }}} - 1 \ge \frac{\sigma _{\min }^2}{\mu _k}. \end{aligned}

It follows that

\begin{aligned} \frac{\mu _k}{\sigma _{\max }^2} \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \le \frac{\mu _k}{\sigma _{\min }^2}, \end{aligned}

and the proposition follows. $$\square$$
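In floating point, Proposition 2 translates directly into a starting bracket for bisection. The following sketch (our own helper) manufactures a noise level for which the discrepancy parameter is known, and recovers it:

```python
import numpy as np

def discrepancy_bisection(H, K, beta, eps, eta=1.01, steps=80):
    """Find mu >= 0 with phi(mu) = eta^2 eps^2 by bisection inside the
    bracket (11)."""
    P, sig, _ = np.linalg.svd(H @ np.linalg.inv(K), full_matrices=False)
    p1 = P[0, :]
    rho = np.linalg.norm(p1)                      # rho = ||P_k^* e1||
    phi0 = beta**2 * (1.0 - rho**2)               # phi(0)
    eps_t2 = (eta**2 * eps**2 - phi0) / beta**2   # tilde(eps)^2, cf. (10)
    assert 0.0 <= eps_t2 < rho**2, "discrepancy level not attainable"
    ratio = np.sqrt(eps_t2) / (rho - np.sqrt(eps_t2))
    lo, hi = ratio * sig.min()**2, ratio * sig.max()**2   # bracket (11)
    phit = lambda mu: np.sum((mu / (sig**2 + mu))**2 * p1**2)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if phit(mid) < eps_t2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Demo: manufacture eps so that mu_true is the exact root, then recover it.
rng = np.random.default_rng(0)
H = rng.standard_normal((7, 6))
K = np.triu(rng.standard_normal((6, 6))) + 3.0 * np.eye(6)
beta, eta, mu_true = 2.0, 1.01, 0.37
P, sig, _ = np.linalg.svd(H @ np.linalg.inv(K), full_matrices=False)
p1 = P[0, :]
phi_true = beta**2 * (1.0 - p1 @ p1) \
         + beta**2 * np.sum((mu_true / (sig**2 + mu_true))**2 * p1**2)
eps = np.sqrt(phi_true) / eta
mu = discrepancy_bisection(H, K, beta, eps, eta=eta)
```

Starting from the bracket (11) rather than an ad hoc interval guarantees that the root lies inside the initial interval and keeps the number of bisection steps predictable.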

It is undesirable to work with the inverse of $$K_k$$ when $$K_k$$ is ill-conditioned. Instead, one may prefer to use the generalized singular value decomposition (GSVD)

\begin{aligned} {\underline{H}}_k= & {} P_k C_k Z_k^{-1} \\ K_k= & {} Q_k S_k Z_k^{-1}, \end{aligned}

where $$P_k$$ and $$Q_k$$ have orthonormal columns and $$Z_k$$ is nonsingular. The matrices $$C_k$$ and $$S_k$$ are diagonal with entries $$0 \le c_1 \le c_2 \le \dots \le c_k$$ and respectively $$s_1 \ge \dots \ge s_k \ge 0$$, such that $$c_i^2 + s_i^2 = 1$$. The generalized singular values are given by $$c_i / s_i$$ and are understood to be infinite when $$s_i = 0$$. If $$K_k$$ is nonsingular, then the generalized singular values coincide with the singular values of $${\underline{H}}_k K_k^{-1}$$. See Golub and Van Loan [8, Section 8.7.3] for more information.

Using a similar derivation as before, we can show that

\begin{aligned} \varphi (\mu ) = \beta ^2 \Vert (I - P_k P_k^*) {\varvec{e}}_1\Vert ^2 {} + \beta ^2 \sum _{j=1}^k \bigg ( \frac{\mu s_j^2}{c_j^2 + \mu s_j^2} \bigg )^2 |P_k|_{1j}^2 \end{aligned}

and that the new bounds are given by

\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \bigg ( \frac{c_1}{s_1} \bigg )^2 \le \mu _k \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}} \bigg ( \frac{c_k}{s_k} \bigg )^2. \end{aligned}

Here $$\mu _k$$ is unbounded from above if $$s_k = 0$$, that is, if $$K_k$$ becomes singular.

The bounds in this section can be readily computed and used to implement bisection and the secant method. We consider parameter selection for multiparameter regularization in the following section.

## A Multiparameter Selection Strategy

Choosing satisfactory $${\mu _k^{i}}$$ in multiparameter regularization is more difficult than the corresponding one-parameter problem. See for example [1, 2, 6, 14, 16, 20]. In particular, there is no obvious multiparameter extension of the discrepancy principle. Nevertheless, methods based on the discrepancy principle exist and we will discuss three of them.

Brezinski et al. had some success with operator splitting. Substituting $${\mu _k^{i}} = {\nu _k^{i}} {\omega _k^{i}}$$ in (3) with nonnegative weights $${\omega _k^{i}}$$ and $$\sum _{i=1}^\ell {\omega _k^{i}} = 1$$ leads to

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \sum _{i=1}^\ell {\omega _k^{i}} (\Vert {\underline{H}}_k {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + {\nu _k^{i}} \Vert {K_k^{i}} {\varvec{c}}\Vert ^2). \end{aligned}

This form of the minimization problem suggests the approximation of $$X_k^* {\varvec{x}}_\star$$ by a linear combination [2, Sect. 3] of $${{\varvec{c}}_k^{i}}({\nu _k^{i}})$$, where

\begin{aligned} {{\varvec{c}}_k^{i}}(\nu ) = \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}}_k {\varvec{c}} {} - \beta {\varvec{e}}_1\Vert ^2 + \nu \Vert {K_k^{i}} {\varvec{c}}\Vert ^2 \qquad (i = 1, 2, \dots , \ell ), \end{aligned}
(12)

and $${\nu _k^{i}}$$ is such that $$\Vert {\underline{H}}_k {{\varvec{c}}_k^{i}}({\nu _k^{i}}) - \beta {\varvec{e}}_1\Vert = \eta \varepsilon$$. Alternatively, Brezinski et al. consider solving a related minimization problem in which the $${\nu ^{i}}$$ are fixed and obtained from (12). The latter approach provides better results in exchange for an additional QR decomposition. In either case, operator splitting is a straightforward approach, but it does not necessarily satisfy the discrepancy principle exactly.
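The splitting of (12) reduces multiparameter selection to $$\ell$$ independent one-parameter discrepancy equations, each solvable by bisection. A sketch (helper names are ours):

```python
import numpy as np

def tikh_coeffs(H, K, beta, nu):
    """Solution of the one-operator projected problem (12)."""
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    return np.linalg.solve(H.T @ H + nu * (K.T @ K), beta * (H.T @ e1))

def split_parameters(H, Ks, beta, eps, eta=1.01, steps=80):
    """For each K^i, find nu_i with ||H c^i(nu_i) - beta e1|| = eta*eps
    by log-scale bisection; assumes the discrepancy level is attainable."""
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    nus = []
    for K in Ks:
        lo, hi = 1e-14, 1e14
        for _ in range(steps):
            mid = np.sqrt(lo * hi)
            r = np.linalg.norm(H @ tikh_coeffs(H, K, beta, mid) - beta * e1)
            if r < eta * eps:
                lo = mid
            else:
                hi = mid
        nus.append(np.sqrt(lo * hi))
    return nus

# Demo: two operators; eps is chosen so the discrepancy level is attainable.
rng = np.random.default_rng(0)
H = rng.standard_normal((7, 6))
K1 = np.eye(6)
K2 = np.triu(rng.standard_normal((6, 6))) + 3.0 * np.eye(6)
beta, eta = 2.0, 1.01
e1 = np.zeros(7)
e1[0] = 1.0
eps = np.linalg.norm(H @ tikh_coeffs(H, K1, beta, 1.0) - beta * e1) / eta
nus = split_parameters(H, [K1, K2], beta, eps, eta=eta)
```

Each $${\nu ^{i}}$$ satisfies its own one-operator discrepancy equation exactly, which is what the combination step of the splitting approach builds on.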

Lu and Pereverzyev  and later Fornasier et al.  rewrite the constrained minimization problem as a differential equation and approximate

\begin{aligned} F(\varvec{\mu }) = \Vert {\underline{H}}_k {\varvec{c}}_k(\varvec{\mu }) - \beta {\varvec{e}}_1\Vert ^2 {} + \sum _{i=1}^\ell {\mu ^{i}} \Vert {K_k^{i}} {\varvec{c}}_k(\varvec{\mu })\Vert ^2 \end{aligned}

by a model function $$m(\varvec{\mu })$$ which admits a straightforward solution to the constructed differential equation. However, it is unclear which $$\varvec{\mu }$$ the method finds and its solution may depend on the initial guess. On the other hand, it is possible to keep all but one parameter fixed and compute a value for the free parameter such that the discrepancy principle is satisfied. This allows one to trace discrepancy hypersurfaces to some extent.

Gazzola and Novati  describe another interesting method. They start with a one-parameter problem and successively add parameters in a novel way, until each parameter of the full multiparameter problem has a value assigned. Especially in early iterations the discrepancy principle is not satisfied, but the parameters are updated in each iteration so that the norm of the residual is expected to approach $$\eta \varepsilon$$. Unfortunately, we observed some issues in our implementation. For example, the quality of the result depends on initial values, as well as the order in which the operators are added (that is, the indexing of the operators). The latter problem is solved by a recently published and improved version of the method , which was brought to our attention during the revision of this paper.

We propose a new method that satisfies the discrepancy principle exactly, does not depend on an initial guess, and is independent of the scaling or indexing of the operators. The method uses the operator splitting approach in combination with new weights. Let us omit all k subscripts for the remainder of this section, and suppose $${\mu ^{i}} = \mu {\omega ^{i}}$$, where $${\omega ^{i}}$$ are nonnegative, but do not necessarily sum to one, and $$\mu$$ is such that the discrepancy principle is satisfied. Then (3) can be written as

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{{\varvec{c}}} \Vert {\underline{H}} {\varvec{c}} - \beta {\varvec{e}}_1\Vert ^2 {} + \mu \sum _{i=1}^\ell {\omega ^{i}} \Vert {K^{i}}{\varvec{c}}\Vert ^2. \end{aligned}
(13)
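For fixed $$\mu$$ and weights $${\omega ^{i}}$$, (13) is an ordinary linear least squares problem in stacked form. The following sketch (NumPy; all matrices and parameter values are random placeholders, not data from the paper) illustrates this:

```python
import numpy as np

# Solve (13) for fixed mu and weights by stacking [H; sqrt(mu*w^i) K^i]
# and applying ordinary least squares.  Hypothetical random data.
rng = np.random.default_rng(0)
k = 5
H = rng.standard_normal((k + 1, k))            # projected matrix \underline{H}
Ks = [np.eye(k), rng.standard_normal((k, k))]  # projected operators K^i
omegas = [0.5, 2.0]                            # weights omega^i
mu = 0.1
rhs = np.zeros(k + 1)
rhs[0] = 1.0                                   # beta * e_1 with beta = 1

A_stack = np.vstack([H] + [np.sqrt(mu * w) * K for w, K in zip(omegas, Ks)])
b_stack = np.concatenate([rhs, np.zeros(A_stack.shape[0] - (k + 1))])
c = np.linalg.lstsq(A_stack, b_stack, rcond=None)[0]
```

The stacked formulation avoids forming the normal equations explicitly, which is preferable when the projected problem is ill-conditioned.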

Since the goal of regularization is to reduce sensitivity of the solution to noise, we use the weights

\begin{aligned} {\omega ^{i}} = \frac{\Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert D{{\varvec{c}}^{i}}({\nu ^{i}})\Vert }, \end{aligned}
(14)

which bias the regularization parameters in the direction of lower sensitivity with respect to changes in $${\nu ^{i}}$$. Here D denotes the (total) derivative with respect to the regularization parameter(s), and $${{\varvec{c}}^{i}}$$ and $${\nu ^{i}}$$ are defined as before; consequently,

\begin{aligned} D {{\varvec{c}}^{i}}({\nu ^{i}}) = -({\underline{H}}^* {\underline{H}} + {\nu ^{i}} {K^{i}}^* {K^{i}})^{-1} {K^{i}}^*{K^{i}} {{\varvec{c}}^{i}}({\nu ^{i}}). \end{aligned}

If $$D {{\varvec{c}}^{i}}({\nu ^{i}}) = \mathbf{0 }$$ for some index i, then we take such a $${{\varvec{c}}^{i}}({\nu ^{i}})$$ as the solution, or replace $$\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert$$ by a small positive constant. With this parameter choice, the solution does not depend on the indexing of the operators, nor, up to a constant, on the scaling of A, $$\varvec{b}$$, or any of the $${L^{i}}$$. The former is easy to see; for the latter, let $$\alpha$$, $$\gamma$$, and $${\lambda ^{i}}$$ be positive constants, and consider the scaled problem

\begin{aligned} \mathop {{\arg \!\min }\,}\limits _{\widehat{\varvec{x}}} \Vert \gamma {\varvec{b}} - \alpha A \widehat{\varvec{x}}\Vert ^2 {} + \mu \sum _{i=1}^{\ell } {{\widehat{\omega }}^{i}} \Vert \lambda ^{i} {L^{i}} \widehat{\varvec{x}}\Vert ^2. \end{aligned}

The noisy component of $$\gamma {\varvec{b}}$$ is $$\gamma {\varvec{e}}$$ and $$\Vert \gamma {\varvec{e}}\Vert \le \gamma \varepsilon$$, hence the new discrepancy bound becomes

\begin{aligned} \Vert \alpha A \widehat{\varvec{x}} - \gamma {\varvec{b}}\Vert = \gamma \eta \varepsilon . \end{aligned}

The bound is satisfied when $${{\widehat{\omega }}^{i}} = \alpha ^2 / (\lambda ^i)^2\; {\omega ^{i}}$$, since in this case

\begin{aligned} \widehat{\varvec{x}} = \Big (\alpha ^2 A^*A + \mu \sum _{i=1}^\ell {\omega ^{i}} \frac{\alpha ^2}{({\lambda ^{i}})^2}({\lambda ^{i}})^2 {L^{i}}^*{L^{i}} \Big )^{-1} \alpha A^* \gamma {\varvec{b}} = \frac{\gamma }{\alpha } {\varvec{x}} \end{aligned}

and

\begin{aligned} \min _{\widehat{\varvec{x}}} \Vert \gamma {\varvec{b}} - \alpha A \widehat{\varvec{x}}\Vert ^2 {} + \mu \sum _{i=1}^{\ell } {{\widehat{\omega }}^{i}} \Vert \lambda ^{i} {L^{i}} \widehat{\varvec{x}}\Vert ^2 = \gamma ^2 \Big ( \min _{{\varvec{x}}} \Vert A {\varvec{x}} - {\varvec{b}}\Vert ^2 {} + \mu \sum _{i=1}^\ell {\omega ^{i}} \Vert {L^{i}} {\varvec{x}}\Vert ^2 \Big ). \end{aligned}

It may be checked that the weights in (14) are indeed proportional to $$\alpha ^2/(\lambda ^i)^2$$, that is

\begin{aligned} {\omega ^{i}} = \frac{\Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert } {}\sim \frac{\alpha ^2}{({\lambda ^{i}})^2}. \end{aligned}
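The scaling identity $$\widehat{\varvec{x}} = (\gamma /\alpha )\, {\varvec{x}}$$ above is easy to verify numerically. The sketch below uses a single operator ($$\ell = 1$$) and arbitrary random data, purely for illustration:

```python
import numpy as np

# Check that scaling A by alpha, b by gamma, and L by lambda, while replacing
# omega with omega_hat = alpha^2/lambda^2 * omega, rescales the solution by gamma/alpha.
rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
L = rng.standard_normal((n, n))
b = rng.standard_normal(n)
mu, omega = 0.5, 2.0
alpha, gamma, lam = 3.0, 0.7, 1.5

x = np.linalg.solve(A.T @ A + mu * omega * L.T @ L, A.T @ b)
omega_hat = (alpha**2 / lam**2) * omega
x_hat = np.linalg.solve(
    alpha**2 * A.T @ A + mu * omega_hat * lam**2 * L.T @ L,
    alpha * A.T @ (gamma * b),
)
# x_hat should equal (gamma / alpha) * x
```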

There are additional viable choices for $${\omega ^{i}}$$, including two smoothed versions of the above:

\begin{aligned} {\omega ^{i}} = \frac{\Vert {\underline{H}} {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert {\underline{H}} D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert } \quad \text {and}\quad {\omega ^{i}} = \frac{\Vert {K^{i}} {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }{ \Vert {K^{i}} D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert }, \end{aligned}

which consider the sensitivity of $${{\varvec{c}}^{i}}({\nu ^{i}})$$ in the range of $${\underline{H}}$$ and $${K^{i}}$$ respectively. We summarize the new parameter selection in Algorithm 3 below.

### Algorithm 3

(Multiparameter selection)

Input: Projected matrices $${\underline{H}}$$, $${K^{1}}$$, ..., $${K^{\ell }}$$, $$\beta = \Vert {\varvec{b}}\Vert$$, noise estimate $$\varepsilon$$, uncertainty parameter $$\eta$$, and threshold $$\tau$$.

Output: Regularization parameters $${\mu ^{1}}$$, ..., $${\mu ^{\ell }}$$.

1. Use (12) to compute $${{\varvec{c}}^{i}}$$ and $${\nu ^{i}}$$.

   if $$\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert \le \tau \Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert$$ for some i then

2. Set $${\omega ^{i}} = \tau ^{-1}$$; or set $${\mu ^{i}} = {\nu ^{i}}$$ and $${\mu ^{j}} = 0$$ for $$j\ne i$$.

   else

3. Let $${\omega ^{i}} = \Vert {{\varvec{c}}^{i}}({\nu ^{i}})\Vert /\Vert D {{\varvec{c}}^{i}}({\nu ^{i}})\Vert$$.

4. Compute $$\mu$$ in (13) such that the discrepancy principle is satisfied.

5. Set $${\mu ^{i}} = \mu {\omega ^{i}}$$.
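The core of the multiparameter selection can be sketched as follows. The helper `discrepancy_solve` is our own hypothetical illustration, not the paper's implementation: it solves each discrepancy equation by log-scale bisection and omits the $$\tau$$ safeguard.

```python
import numpy as np

def discrepancy_solve(H, Ks, beta, target):
    # Sketch of the parameter selection (tau safeguard omitted).  Assumes the
    # discrepancy level `target` = eta * eps lies strictly between the
    # unregularized residual norm and beta, so each equation has a solution.
    rhs = np.zeros(H.shape[0])
    rhs[0] = beta

    def solve(Mreg):
        # Tikhonov solution and residual norm for a given regularization term
        c = np.linalg.solve(H.T @ H + Mreg, H.T @ rhs)
        return c, np.linalg.norm(H @ c - rhs)

    def find_param(KtK):
        # The residual norm increases with the parameter: bisect on a log scale
        lo, hi = 1e-12, 1e12
        for _ in range(200):
            mid = np.sqrt(lo * hi)
            _, r = solve(mid * KtK)
            lo, hi = (mid, hi) if r < target else (lo, mid)
        return np.sqrt(lo * hi)

    # Step 1: one-parameter problems give nu^i and c^i(nu^i)
    omegas = []
    for K in Ks:
        KtK = K.T @ K
        nu = find_param(KtK)
        c, _ = solve(nu * KtK)
        Dc = -np.linalg.solve(H.T @ H + nu * KtK, KtK @ c)
        omegas.append(np.linalg.norm(c) / np.linalg.norm(Dc))  # weight (14)

    # Steps 4-5: one more discrepancy equation for the weighted sum
    mu = find_param(sum(w * K.T @ K for w, K in zip(omegas, Ks)))
    return omegas, [mu * w for w in omegas]
```

By construction, the returned $${\mu ^{i}} = \mu {\omega ^{i}}$$ satisfy the discrepancy principle for the combined problem up to bisection accuracy.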

An interesting property of Algorithm 3 is that, under certain conditions, $${\varvec{c}}(\varvec{\mu }({\widetilde{\varepsilon }}))$$ converges to the unregularized least squares solution

\begin{aligned} {\varvec{c}}(\mathbf{0 }) = ({\underline{H}}^* {\underline{H}})^{-1} {\underline{H}}^* \beta {\varvec{e}}_1 = {\underline{H}}^+ \beta {\varvec{e}}_1, \end{aligned}

as $${\widetilde{\varepsilon }}$$ goes to zero. Here $${\underline{H}}^+$$ denotes the Moore–Penrose pseudoinverse and $${\varvec{c}}(\mathbf{0 })$$ is the minimum norm solution of the unregularized problem. The following proposition formalizes this observation.

### Proposition 3

Assume that $${\underline{H}}$$ is full rank, $${\underline{H}}^*\beta {\varvec{e}}_1 \ne \mathbf{0 }$$, and that $${K^{i}}$$ is nonsingular for $$i=1,\dots ,\ell$$. Let $${\widetilde{\varepsilon }}$$ and $$\rho$$ be defined as in Sect. 3, let $$\eta >1$$ be fixed, and suppose that $${\nu ^{i}}({\widetilde{\varepsilon }})$$ and

\begin{aligned} \varvec{\mu }({\widetilde{\varepsilon }}) = ({\mu ^{1}}({\widetilde{\varepsilon }}), \dots , {\mu ^{\ell }}({\widetilde{\varepsilon }})) = \mu ({\widetilde{\varepsilon }}) ({\omega ^{1}}({\nu ^{1}}({\widetilde{\varepsilon }})), \dots , {\omega ^{\ell }}({\nu ^{\ell }}({\widetilde{\varepsilon }}))) \end{aligned}

are computed according to Algorithm 3 for all $$0\le {\widetilde{\varepsilon }} < \rho$$. Then

\begin{aligned} \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\omega ^{i}}({\nu ^{i}}({\widetilde{\varepsilon }})) = {\omega ^{i}}(0) \quad \text {and}\quad \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\varvec{c}}(\varvec{\mu }({\widetilde{\varepsilon }})) = {\varvec{c}}(\mathbf{0 }). \end{aligned}

### Proof

First note that $${\underline{H}}^* \beta {\varvec{e}}_1 \ne \mathbf{0 }$$ implies that $$\beta > 0$$ and $$\rho > 0$$. Since $${\underline{H}}$$ is full rank, the maps

\begin{aligned} \nu \mapsto {{\varvec{c}}^{i}}(\nu ), \quad \nu \mapsto D {{\varvec{c}}^{i}}(\nu ), \quad \text {and}\quad \varvec{\mu } \mapsto {\varvec{c}}(\varvec{\mu }) \end{aligned}

are continuous for all $$\nu \ge 0$$ and $$\varvec{\mu }\ge \mathbf{0 }$$, where the latter bound should be interpreted elementwise. Hence

\begin{aligned} \lim _{\nu \downarrow 0} {{\varvec{c}}^{i}}(\nu ) = {{\varvec{c}}^{i}}(0), \quad \lim _{\nu \downarrow 0} D{{\varvec{c}}^{i}}(\nu ) = D {{\varvec{c}}^{i}}(0), \quad \text {and}\quad \lim _{\varvec{\mu } \downarrow \mathbf{0 }} {\varvec{c}}(\varvec{\mu }) = {\varvec{c}}(\mathbf{0 }). \end{aligned}

It remains to be shown that

\begin{aligned} \lim _{{\widetilde{\varepsilon }} \downarrow 0} {\nu ^{i}}({\widetilde{\varepsilon }}) = 0, \quad \Vert D {{\varvec{c}}^{i}}(0)\Vert \ne 0, \quad \text {and}\quad \lim _{{\widetilde{\varepsilon }} \downarrow 0} \varvec{\mu }({\widetilde{\varepsilon }}) = \mathbf{0 }. \end{aligned}
(15)

Let $${\widetilde{\varepsilon }}$$ be restricted to the interval $$[0, \rho /2]$$ and define $${\nu _{\max }^{i}} = \sigma _{\max }^2({\underline{H}} ({K^{i}})^{-1})$$. By Proposition 2,

\begin{aligned} 0 \le {\nu ^{i}}({\widetilde{\varepsilon }}) \le \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}}\, {\nu _{\max }^{i}} \le {\nu _{\max }^{i}}, \end{aligned}

which proves the first limit in (15). Furthermore, using the definitions of $${{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))$$ and $$D {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))$$ we find the bounds

\begin{aligned}&0< \rho \beta \frac{\sigma _{\min }({\underline{H}})}{ \Vert {\underline{H}}\Vert ^2 + {\nu _{\max }^{i}} \Vert {K^{i}}\Vert ^2} \le \Vert {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))\Vert \le \rho \beta \Vert {\underline{H}}^+{\varvec{e}}_1\Vert ,\\&0 < \rho \beta \frac{\sigma _{\min }({\underline{H}}) \sigma _{\min }^2({K^{i}})}{ (\Vert {\underline{H}}\Vert ^2 + {\nu _{\max }^{i}} \Vert {K^{i}}\Vert ^2)^2} \le \Vert D {{\varvec{c}}^{i}}({\nu ^{i}}({\widetilde{\varepsilon }}))\Vert \le \rho \beta \frac{\Vert {K^{i}}\Vert ^2\, \Vert {\underline{H}}^+{\varvec{e}}_1\Vert }{\sigma _{\min }^2({\underline{H}})}, \end{aligned}

which show that the inequality in (15) is satisfied. Moreover, the bounds show there exist $$\omega _{\min }$$ and $$\omega _{\max }$$ such that

\begin{aligned} 0< \omega _{\min } \le {\omega ^{i}}({\widetilde{\varepsilon }}) \le \omega _{\max } < \infty . \end{aligned}

Now, let $${\mathbf {K}}({\widetilde{\varepsilon }})$$ be the nonsingular matrix satisfying

\begin{aligned} {\mathbf {K}}({\widetilde{\varepsilon }})^*{\mathbf {K}}({\widetilde{\varepsilon }}) = \sum _{i=1}^\ell {\omega ^{i}}({\widetilde{\varepsilon }}) {K^{i}}^*{K^{i}}, \end{aligned}

then it can be checked that

\begin{aligned} \Vert {\underline{H}} {\mathbf {K}}({\widetilde{\varepsilon }})^{-1}\Vert ^2 \le \frac{\Vert {\underline{H}}\Vert ^2}{\omega _{\min } \min _i \sigma _{\min }^2({K^{i}})} < \infty . \end{aligned}

Define the right hand side of the equation above as M, then by Proposition 2, each entry of $$\varvec{\mu }({\widetilde{\varepsilon }})$$ is bounded from below by 0 and from above by

\begin{aligned} \frac{{\widetilde{\varepsilon }}}{\rho - {\widetilde{\varepsilon }}}\, M \omega _{\max }, \end{aligned}

which goes to 0 as $${\widetilde{\varepsilon }} \downarrow 0$$. This proves the second limit in (15). $$\square$$

Proposition 3 is related to [9, Thm 3.3.3], where it is shown that the solution of a standard form Tikhonov regularization problem converges to a minimum norm least squares solution when the discrepancy principle is used and the noise converges to zero.

In this section we have discussed a new parameter selection method. In the next section we will look at the effect of perturbations in the parameters on the obtained solutions.

## Perturbation Analysis

The goal of regularization is to make reconstruction robust with respect to noise. By extension, a high sensitivity to the regularization parameters is undesirable. Consider a set of perturbed parameters $$\varvec{\mu }_k + {\varDelta }\varvec{\mu }$$; if $$\Vert {\varDelta }\varvec{\mu }\Vert$$ is sufficiently small, then the first-order change in the solution is $$D{\varvec{c}}(\varvec{\mu }_k)\,{\varDelta }\varvec{\mu } = -M^{-1} {\varDelta } M\, {\varvec{c}}(\varvec{\mu }_k)$$, where M and $${\varDelta } M$$ are defined as

\begin{aligned} M = {\underline{H}}_k^* {\underline{H}}_k + \sum _{i=1}^\ell {\mu _k^{i}} {K_k^{i}}^* {K_k^{i}}, \quad {\varDelta } M = \sum _{i=1}^\ell {\varDelta }{\mu _k^{i}} {K_k^{i}}^* {K_k^{i}}. \end{aligned}
(16)

Therefore, one might choose $$\varvec{\mu }_k$$ to minimize the sensitivity measure

\begin{aligned} \Vert D{\varvec{c}}(\varvec{\mu }_k){\varDelta }\varvec{\mu }\Vert = \Vert M^{-1}{\varDelta } M{\varvec{c}}(\varvec{\mu }_k)\Vert . \end{aligned}

To see the connection with the previous section, suppose that $$\varvec{\mu }_k = {\nu _k^{i}} {\varvec{e}}_i$$ and $${\varDelta }\varvec{\mu } = \pm \Vert {\varDelta }\varvec{\mu }\Vert {\varvec{e}}_i$$, then

\begin{aligned} \Vert M^{-1}{\varDelta } M\Vert \ge \frac{\Vert M^{-1}{\varDelta } M{\varvec{c}}_k(\varvec{\mu }_k)\Vert }{\Vert \varvec{c}_k(\varvec{\mu }_k)\Vert } = \frac{\Vert D{\varvec{c}}_k(\varvec{\mu }_k){\varDelta }\varvec{\mu }\Vert }{\Vert {\varvec{c}}_k(\varvec{\mu }_k)\Vert } = \frac{\Vert D {{\varvec{c}}_k^{i}}({\nu _k^{i}})\Vert \,\Vert {\varDelta }\varvec{\mu }\Vert }{\Vert {{\varvec{c}}_k^{i}}({\nu _k^{i}})\Vert } = \frac{\Vert {\varDelta }\varvec{\mu }\Vert }{{\omega _k^{i}}}. \end{aligned}

Thus, larger weights $${\omega _k^{i}}$$ correspond to smaller lower bounds on $$\Vert M^{-1}{\varDelta } M\Vert$$. Having small lower bounds is desirable, since we show in Propositions 4 and 5 that minimizing $$\Vert M^{-1}{\varDelta } M\Vert$$ is equivalent to minimizing upper bounds on the forward and backward errors respectively.

### Proposition 4

Given regularization parameters $${\mu _k^{i}}$$ and perturbations $${\mu _\star ^{i}} = {\mu _k^{i}} + {\varDelta }{\mu _k^{i}}$$, let $${\varvec{c}}_k = {\varvec{c}}_k(\varvec{\mu }_k)$$, $${\varvec{c}}_\star = {\varvec{c}}_k(\varvec{\mu }_\star )$$, $${\varvec{x}}_k = X_k {\varvec{c}}_k$$, and $${\varvec{x}}_\star = X_k {\varvec{c}}_\star$$. Assume $${\underline{H}}_k$$ and all $${K_k^{i}}$$ are of full rank and define matrices M and $${\varDelta } M$$ as in (16). If M and $$M + {\varDelta } M$$ are nonsingular and the $${\varDelta } {\mu _k^{i}}$$ are sufficiently small so that $$\Vert M^{-1} {\varDelta } M\Vert < 1$$, then

\begin{aligned} \frac{ \Vert {\varvec{x}}_k - {\varvec{x}}_\star \Vert }{ \Vert {\varvec{x}}_k\Vert } \le \frac{ \Vert M^{-1} {\varDelta } M \Vert }{ 1 - \Vert M^{-1} {\varDelta } M \Vert }. \end{aligned}

### Proof

Observe that $${\varvec{c}}_k = M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1$$ and $$\varvec{c}_\star = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1$$. With a little manipulation we obtain

\begin{aligned} {\varvec{c}}_\star = (M + {\varDelta } M)^{-1} M {\varvec{c}}_k = (I + M^{-1} {\varDelta } M)^{-1} {\varvec{c}}_k = \sum _{j=0}^\infty (-M^{-1} {\varDelta } M)^j {\varvec{c}}_k. \end{aligned}

It follows that

\begin{aligned} \frac{\Vert {\varvec{c}}_k - {\varvec{c}}_\star \Vert }{\Vert {\varvec{c}}_k\Vert } = \frac{1}{\Vert {\varvec{c}}_k\Vert } \bigg \Vert \sum _{j=1}^\infty (-M^{-1} {\varDelta } M)^j {\varvec{c}}_k \bigg \Vert \le \sum _{j=1}^\infty \Vert M^{-1}{\varDelta } M\Vert ^j \le \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert }. \end{aligned}

Since $$X_k$$ has orthonormal columns, the result of the proposition follows. $$\square$$
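The bound of Proposition 4 can be checked empirically on a small random projected problem (a sketch with hypothetical data, not part of the paper's experiments):

```python
import numpy as np

# Verify the forward error bound ||c_k - c_star|| / ||c_k|| <= t / (1 - t),
# where t = ||M^{-1} dM||, on random data.
rng = np.random.default_rng(4)
k = 5
H = rng.standard_normal((k + 1, k))
K = rng.standard_normal((k, k))
rhs = np.zeros(k + 1)
rhs[0] = 1.0
mu, dmu = 1.0, 0.05

M = H.T @ H + mu * K.T @ K
dM = dmu * K.T @ K
c_k = np.linalg.solve(M, H.T @ rhs)        # unperturbed solution
c_s = np.linalg.solve(M + dM, H.T @ rhs)   # perturbed solution

t = np.linalg.norm(np.linalg.solve(M, dM), 2)   # ||M^{-1} dM|| (2-norm)
rel_err = np.linalg.norm(c_k - c_s) / np.linalg.norm(c_k)
```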

One may wonder if it is possible to pick a vector $${\varvec{f}}$$ close to $$\beta {\varvec{e}}_1$$ such that

\begin{aligned} {\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}. \end{aligned}

Or in other words, given perturbed regularization parameters, is there a perturbation of $$\beta {\varvec{e}}_1$$ such that the optimal approximation to the exact solution is obtained? The following proposition provides a positive answer.

### Proposition 5

Under the assumptions of Proposition 4, there exist vectors $${\varvec{f}}$$ and $${\varvec{g}}$$ such that $${\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}$$ and $${\varvec{c}}_\star = M^{-1} {\underline{H}}_k^* {\varvec{g}}$$. Furthermore, $${\varvec{f}}$$ and $${\varvec{g}}$$ satisfy

\begin{aligned}&\frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{f}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } \le \kappa ({\underline{H}}_k)\, \Vert M^{-1} {\varDelta } M\Vert ,\\&\frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{g}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } \le \kappa ({\underline{H}}_k) \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert } \end{aligned}

where $$\kappa ({\underline{H}}_k)$$ is the condition number of $${\underline{H}}_k$$.

### Proof

The vector $${\varvec{f}}$$ is easy to derive using the ansatz

\begin{aligned} (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}} = M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1. \end{aligned}

Let $${\underline{H}}_k = QR$$ denote the reduced QR-decomposition of $${\underline{H}}_k$$, then

\begin{aligned} R^* Q^* {\varvec{f}} = (M + {\varDelta } M) M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1, \end{aligned}

and

\begin{aligned} {\varvec{f}} = Q R^{-*} (M + {\varDelta } M) M^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 + (I - Q Q^*) {\varvec{v}} \end{aligned}

for arbitrary $${\varvec{v}}$$. Indeed, it is easy to verify that the above vector satisfies

\begin{aligned} {\varvec{c}}_k = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* {\varvec{f}}. \end{aligned}

If we choose $${\varvec{v}} = \beta {\varvec{e}}_1$$, then

\begin{aligned} {\varvec{f}} = Q R^{-*} {\varDelta } M M^{-1} R^* Q^* \beta {\varvec{e}}_1 + \beta {\varvec{e}}_1 \end{aligned}

so that

\begin{aligned} \frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{f}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert } = \Vert Q R^{-*} {\varDelta } M M^{-1} R^* Q^* {\varvec{e}}_1 \Vert \le \Vert R^{-*}\Vert \; \Vert R^*\Vert \; \Vert {\varDelta } M M^{-1}\Vert . \end{aligned}

Here $$\Vert R^{-*}\Vert \; \Vert R^*\Vert$$ is the condition number $$\kappa ({\underline{H}}_k)$$ and $$\Vert {\varDelta } M M^{-1}\Vert = \Vert M^{-1} {\varDelta } M\Vert$$, since both M and $${\varDelta } M$$ are symmetric. This proves the first part of the proposition.

The second part is analogous. In particular, we use the ansatz

\begin{aligned} M^{-1} {\underline{H}}_k^* {\varvec{g}} = (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 \end{aligned}

and derive

\begin{aligned} {\varvec{g}} = Q R^{-*} M (M + {\varDelta } M)^{-1} {\underline{H}}_k^* \beta {\varvec{e}}_1 + (I - Q Q^*) \beta {\varvec{e}}_1. \end{aligned}

Again it is easy to verify that $${\varvec{c}}_\star = M^{-1} {\underline{H}}_k^* {\varvec{g}}$$. Observe that $${\varvec{g}}$$ can be rewritten as

\begin{aligned} {\varvec{g}} = Q R^{-*} ((I + {\varDelta } M M^{-1})^{-1} - I) R^* Q^* \beta {\varvec{e}}_1 + \beta {\varvec{e}}_1 \end{aligned}

such that

\begin{aligned} \frac{ \Vert \beta {\varvec{e}}_1 - {\varvec{g}}\Vert }{ \Vert \beta {\varvec{e}}_1\Vert }&= \Vert R^{-*} ((I + {\varDelta } M M^{-1})^{-1} - I) R^* Q^* {\varvec{e}}_1\Vert \\&\le \Vert R^{-*}\Vert \; \Vert R^*\Vert \; \Vert (I + {\varDelta } M M^{-1})^{-1} - I\Vert . \end{aligned}

Since $$\Vert {\varDelta } M M^{-1}\Vert = \Vert M^{-1} {\varDelta } M\Vert < 1$$, it follows that

\begin{aligned} \Vert (I + {\varDelta } M M^{-1})^{-1} - I\Vert \le \sum _{j=1}^\infty \Vert -{\varDelta } M M^{-1}\Vert ^j = \frac{ \Vert M^{-1} {\varDelta } M\Vert }{ 1 - \Vert M^{-1} {\varDelta } M\Vert }, \end{aligned}

which concludes the proof. $$\square$$

We have discussed forward and backward error bounds which help motivate our parameter choice. Now that we have investigated each of the three phases of our method, we are ready to show numerical results.

## Numerical Experiments

We benchmark our algorithm with problems from Regularization Tools by Hansen. Each problem provides an ill-conditioned $$n \times n$$ matrix A, a solution vector $${\varvec{x}}_\star$$ of length n, and a corresponding measured vector $${\varvec{b}}$$. We let $$n = 1024$$ and add a noise vector $${\varvec{e}}$$ to $${\varvec{b}}$$. The entries of $${\varvec{e}}$$ are drawn independently from the standard normal distribution. The noise vector is then scaled such that $$\varepsilon = \Vert {\varvec{e}}\Vert$$ equals $$0.01 \Vert {\varvec{b}}\Vert$$ or $$0.05 \Vert {\varvec{b}}\Vert$$ for 1 and 5 % noise, respectively. We use $$\eta = 1.01$$ for the discrepancy bound in (7). We test the algorithms with 1000 different noise vectors for every triplet A, $${\varvec{x}}_\star$$, and $${\varvec{b}}$$, and report the median results.

The algorithms terminate when the relative difference between two subsequent approximations is less than 0.01, when $${\varvec{x}}_{k+1}$$ is (numerically) linearly dependent on the columns of $$X_k$$, when neither $$U_{k+1}$$ nor any of the $${V_k^{i}}$$ can be expanded, or when a maximum number of iterations is reached. For Algorithm 2 we use a maximum of 20 iterations and for Algorithm 1 a maximum of $$(\ell +1) \times 20$$ iterations. For the sake of a fair comparison, the algorithms return the best obtained approximations and their iteration numbers.

For each test problem, the tables below list the relative error obtained with Algorithm 1, abbreviated by $$E_\text {od}$$, and Algorithm 2, abbreviated by $$E_\text {md}$$. OD and MD stand for one direction and multidirectional respectively. Also listed are the ratio $$\rho _E$$ of $$E_\text {md}$$ to $$E_\text {od}$$ and the ratio $$\rho _\text {mv}$$ of the number of matrix-vector products. That is,

\begin{aligned} \rho _E = \frac{E_\text {md}}{E_\text {od}} \quad \text {and}\quad \rho _\text {mv} = \frac{\# \text {MVs Algorithm 2} }{ \# \text {MVs Algorithm 1} }. \end{aligned}

Only matrix-vector multiplications with A, $$A^*$$, $${L^{i}}$$, and $${L^{i}}^*$$ count towards the total number of MVs used by each algorithm. We note, however, that multiplications with $${L^{i}}$$ and $${L^{i}}^*$$ are often less costly than multiplications with A and $$A^*$$.

Table 1 lists the results for one-parameter Tikhonov regularization, where we used the following regularization operators: the first derivative operator $$L_1$$ with stencil $$[1,-1]$$ for Gravity-3, Heat-5, Heat, and Phillips; the second derivative operator $$L_2$$ with stencil $$[1,-2,1]$$ for Deriv2-1, Deriv2-2, Foxgood, Gravity-1, and Gravity-2; the third derivative operator $$L_3$$ with stencil $$[-1,3,-3,1]$$ for Baart; and the fifth derivative operator $$L_5$$ with stencil $$[-1,5,-10,10,-5,1]$$ for Deriv2-3. The derivative operators $$L_d$$ are of size $$(n-d) \times n$$.

The table shows that multidirectional subspace expansion can obtain small improvements in the relative error at the cost of a small number of extra matrix-vector products, especially for 1 % noise. We stress that in these cases, Algorithm 1 is allowed to perform additional MVs, but converges with a higher relative error. Even when there is no improvement in the relative error, multidirectional subspace expansion can improve convergence, for example, for the Deriv2 problems as well as Foxgood.

Table 2 lists the results for multiparameter Tikhonov regularization. We used the following regularization operators for each problem: the derivative operator $$L_d$$ as listed above, the identity operator I, and the orthogonal projection $$(I - N_d N_d^*)$$, where the columns of $$N_d$$ form an orthonormal basis for the nullspace of $$L_d$$.

Overall, we observe larger improvements in the relative error for multidirectional subspace expansion, but also a larger number of MVs. We no longer see cases where multidirectional subspace expansion terminates with fewer MVs. In fact, the relative error is the same for Heat, although more MVs are required. Finally, Fig. 1 illustrates an example of the improved results which can be obtained by using multidirectional subspace expansion.

In the next tests we attempt to reconstruct the original image from a blurred and noisy observation. Consider an $$n \times n$$ grayscale image with pixel values in the interval [0, 1]. Then $${\varvec{x}}$$ is a vector of length $$n^2$$ obtained by stacking the columns of the image below each other. The matrix A represents a Gaussian blurring operator, generated with blur from Regularization Tools. The matrix A is block-Toeplitz with half-bandwidth band=11 and the amount of blurring is given by the variance sigma=5. The entries of the noise vector $${\varvec{e}}$$ are independently drawn from the standard normal distribution after which the vector is scaled such that $$\varepsilon = {\mathbb {E}}[\Vert {\varvec{e}}\Vert ] = 0.05 \Vert {\varvec{b}}\Vert$$. We take $$\eta$$ such that $$\Vert {\varvec{e}}\Vert \le \eta \varepsilon$$ in 99.9 % of the cases. That is,

\begin{aligned} \eta = 1 + \frac{3.090232}{\sqrt{2 n^2}}. \end{aligned}
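A quick Monte Carlo experiment (with hypothetical parameters, smaller than the images in the paper) confirms that this choice of $$\eta$$ yields $$\Vert {\varvec{e}}\Vert \le \eta \varepsilon$$ in roughly 99.9 % of draws; the constant 3.090232 is the 99.9 % quantile of the standard normal distribution.

```python
import numpy as np

# Empirical coverage of the bound ||e|| <= eta * eps for Gaussian noise,
# with eps = E[||e||] estimated by the sample mean.
rng = np.random.default_rng(5)
N = 32 * 32                           # n^2 for a small hypothetical 32x32 image
eta = 1 + 3.090232 / np.sqrt(2 * N)

# 20000 noise draws, generated in chunks to limit memory use
norms = np.concatenate(
    [np.linalg.norm(rng.standard_normal((1000, N)), axis=1) for _ in range(20)]
)
eps = norms.mean()                    # sample proxy for E[||e||]
frac = np.mean(norms <= eta * eps)    # fraction of draws satisfying the bound
```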

For regularization we choose an approximation to the Perona–Malik operator, where $$\rho$$ is a small positive constant. Because the Perona–Malik operator is nonlinear, we first perform a small number of iterations with a finite difference approximation $$L_{{\varvec{b}}}$$ based on $${\varvec{b}}$$. The resulting intermediate solution $$\widetilde{\varvec{x}}$$ is used to form a new approximation $$L_{\widetilde{\varvec{x}}}$$. Finally, we run the algorithms a second time with $$L_{\widetilde{\varvec{x}}}$$ and more iterations; see Reichel et al. for more information regarding the implementation of the Perona–Malik operator.

The first test image is also used in [13, 23, 25], and is shown in Figure 2. We use $$\rho = 0.075$$, 20 iterations for the first run, and 100 iterations for the second run. The second image is an image of Saturn, see Figure 3. For this image we use $$\rho = 0.03$$, 25 iterations for the first run and 150 iterations for the second run. In both cases we stop the iterations around the point where convergence flattens out, as can be seen from the convergence history in Figure 4. The figure uses the peak signal-to-noise ratio (PSNR) given by

\begin{aligned} -20 \log _{10}\left( \frac{\Vert {\varvec{x}}_\star - {\varvec{x}}_k\Vert }{n} \right) \end{aligned}

versus the iteration number k. A higher PSNR means a higher quality reconstruction.
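For reference, the PSNR above takes only a few lines to compute (a sketch; the function name `psnr` is our own):

```python
import numpy as np

def psnr(x_star, x_k, n):
    """PSNR of a reconstruction x_k of the stacked n-by-n image x_star,
    assuming pixel values in [0, 1], so that the peak value is 1."""
    return -20 * np.log10(np.linalg.norm(x_star - x_k) / n)
```

For example, a reconstruction whose error norm equals $$n/100$$ has a PSNR of 40 dB.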