Each of the proposed methods for solving bilinear inverse problems is based on a singular value thresholding on the tensor product \(\mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\). If the dimension of the original space \(\mathbb {R}^{N_1} \times \mathbb {R}^{N_2}\) is already enormous, then the dimension of the tensor product explodes, which makes the computation of the required singular value decomposition impracticable. This difficulty occurs for nearly all bilinear image recovery problems. However, since the tensor \(\varvec{w}^{(n)}\) is generated by a singular value thresholding, the iterates \(\varvec{w}^{(n)}\) usually possess a very low rank. Hence, the involved tensors can be stored efficiently. In order to determine this low-rank representation, we compute only a partial singular value decomposition of the argument \(\varvec{w}\) of \(\mathcal {S}_\tau \), deriving iterative algorithms that require only the left- and right-hand actions of \(\varvec{w}\).
Our first algorithm is based on the orthogonal iteration with Ritz acceleration, see [28, 60]. The main idea for computing the \(\ell \) leading singular values is a joint power iteration over two \(\ell \)-dimensional subspaces \(\tilde{\mathcal {U}}_n \subset \mathbb {R}^{N_1}\) and \(\tilde{\mathcal {V}}_n \subset \mathbb {R}^{N_2}\), alternately generated by \(\tilde{\mathcal {U}}_n := \varvec{w}^* \varvec{H}_2 \tilde{\mathcal {V}}_{n-1}\) and \(\tilde{\mathcal {V}}_n := \varvec{w} \varvec{H}_1 \tilde{\mathcal {U}}_{n}\). These subspaces are represented by orthonormal bases \(\tilde{\varvec{U}}_n := [\tilde{\varvec{u}}_0^{(n)}, \dots , \tilde{\varvec{u}}_{\ell -1}^{(n)}]\) and \(\tilde{\varvec{V}}_n := [\tilde{\varvec{v}}_0^{(n)}, \dots , \tilde{\varvec{v}}_{\ell -1}^{(n)}]\) with respect to the inner products associated with \(\varvec{H}_1\) and \(\varvec{H}_2\).
Algorithm 4
(Subspace iteration)
Input: \(\varvec{w} \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\), \(\ell > 0\), \(\delta > 0\).
(i) Choose \(\tilde{\varvec{V}}_0 \in \mathbb {R}^{N_2 \times \ell }\), whose columns are orthonormal with respect to \(\varvec{H}_2\).
(ii) For \(n>0\), repeat:
(a) Compute \(\tilde{\varvec{E}}_n := \varvec{w}^* \varvec{H}_2 \tilde{\varvec{V}}_{n-1}\), and reorthonormalize the columns with respect to \(\varvec{H}_1\).
(b) Compute \(\tilde{\varvec{F}}_n := \varvec{w} \varvec{H}_1 \tilde{\varvec{E}}_{n}\), and reorthonormalize the columns with respect to \(\varvec{H}_2\).
(c) Determine the Euclidean singular value decomposition
$$\begin{aligned} \tilde{\varvec{F}}_n^* \varvec{H}_2 \varvec{w} \varvec{H}_1 \tilde{\varvec{E}}_n = \varvec{Y}_n \varvec{\Sigma }_n \varvec{Z}_n^*, \end{aligned}$$
and set \(\tilde{\varvec{U}}_n := \tilde{\varvec{E}}_n \varvec{Z}_n\) and \(\tilde{\varvec{V}}_n := \tilde{\varvec{F}}_n \varvec{Y}_n\).
until \(\ell \) singular vectors have converged, which means
$$\begin{aligned} \left| \left| \varvec{w}^* \varvec{H}_2 \tilde{\varvec{v}}_m^{(n)} - \sigma _m^{(n)} \tilde{\varvec{u}}_m^{(n)}\right| \right| _{\varvec{H}_1} \le \delta \, ||\varvec{w}||_{\mathcal {L}(\varvec{H}_1, \varvec{H}_2)} \qquad \text {for}\qquad 0 \le m < \ell , \end{aligned}$$
where \(||\cdot ||_{\mathcal {L}(\varvec{H}_1, \varvec{H}_2)}\) denotes the operator norm with respect to norms induced by \(\varvec{H}_1\) and \(\varvec{H}_2\), which may be estimated by \(\sigma _0^{(n)}\).
Output: \(\tilde{\varvec{U}}_n \in \mathbb {R}^{N_1 \times \ell }\), \(\tilde{\varvec{V}}_n \in \mathbb {R}^{N_2 \times \ell }\), \(\varvec{\Sigma }_n \in \mathbb {R}^{\ell \times \ell }\) with \(\tilde{\varvec{V}}_n^* \varvec{H}_2 \varvec{w} \varvec{H}_1 \tilde{\varvec{U}}_n = \varvec{\Sigma }_n\).
Here, reorthonormalization means that, for each applicable m, the span of the first m columns of the matrix coincides with the span of the first m columns of its reorthonormalization, and that the reorthonormalized matrix has orthonormal columns. This can, for instance, be achieved by the classical Gram–Schmidt procedure.
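For illustration, the following numpy sketch implements Algorithm 4; it is a minimal sketch and not a tuned implementation. The helper names, the random starting basis, and the single classical Gram–Schmidt sweep are our illustrative choices. The tensor \(\varvec{w}\) enters only through two user-supplied actions `w_dot` and `wT_dot`, which realize \(\varvec{w} \varvec{H}_1 \cdot \) and \(\varvec{w}^* \varvec{H}_2 \cdot \) column-wise.

```python
import numpy as np

def h_orthonormalize(A, H):
    """Gram-Schmidt on the columns of A w.r.t. the inner product <x, y> = x^T H y."""
    Q = np.array(A, dtype=float)
    for j in range(Q.shape[1]):
        for i in range(j):
            Q[:, j] -= (Q[:, i] @ H @ Q[:, j]) * Q[:, i]
        Q[:, j] /= np.sqrt(Q[:, j] @ H @ Q[:, j])
    return Q

def subspace_iteration(w_dot, wT_dot, H1, H2, ell, delta=1e-8, max_iter=500):
    """Sketch of Algorithm 4: w enters only through the user-supplied actions
    w_dot(X) ~ w H1 X and wT_dot(Y) ~ w^* H2 Y, applied column-wise."""
    rng = np.random.default_rng(0)
    V = h_orthonormalize(rng.standard_normal((H2.shape[0], ell)), H2)
    for _ in range(max_iter):
        E = h_orthonormalize(wT_dot(V), H1)         # step (a)
        WE = w_dot(E)
        F = h_orthonormalize(WE, H2)                # step (b)
        Y, sig, Zt = np.linalg.svd(F.T @ H2 @ WE)   # step (c), Euclidean SVD
        U, V = E @ Zt.T, F @ Y
        R = wT_dot(V) - U * sig                     # Ritz residuals, cf. stopping rule
        if np.all(np.sqrt(np.einsum('ij,ij->j', R, H1 @ R)) <= delta * sig[0]):
            break
    return U, V, sig
```

For an explicit matrix `w`, one may, for instance, pass `w_dot = lambda X: w @ (H1 @ X)` and `wT_dot = lambda X: w.T @ (H2 @ X)`.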
Under mild conditions on the subspace associated with \(\tilde{\varvec{V}}_0\), the matrices \(\tilde{\varvec{U}}_n\), \(\tilde{\varvec{V}}_n\), and \(\varvec{\Sigma }_n := \text {diag}(\sigma _0^{(n)}, \dots , \sigma _{\ell -1}^{(n)})\) converge to leading singular vectors as well as to the leading singular values of a singular value decomposition \(\varvec{w} = \sum _{n=0}^{R-1} \sigma _n \, (\tilde{\varvec{u}}_n \otimes \tilde{\varvec{v}}_n)\).
Theorem 5
(Subspace iteration) If none of the basis vectors in \(\tilde{\varvec{V}}_0\) is orthogonal to the \(\ell \) leading singular vectors \(\tilde{\varvec{v}}_0, \dots , \tilde{\varvec{v}}_{\ell -1}\), and if \(\sigma _{\ell -1} > \sigma _\ell \), then the singular values \(\sigma _0^{(n)} \ge \cdots \ge \sigma _{\ell -1}^{(n)}\) in Algorithm 4 converge to \(\sigma _0 \ge \cdots \ge \sigma _{\ell -1}\) with a rate of
$$\begin{aligned} \bigl |\bigl (\sigma _m^{(n)} \bigr )^2 - \sigma _m^2\bigr | = O \Bigl ( \Bigl |\frac{\sigma _\ell }{\sigma _m}\Bigr |^{2n} \Bigr ) \qquad \text {and}\qquad \sigma ^{(n)}_m \le \sigma _m. \end{aligned}$$
Proof
By the construction in steps (a) and (b), the columns in \(\tilde{\varvec{E}}_n\) and \(\tilde{\varvec{F}}_n\) form orthonormal systems with respect to \(\varvec{H}_1\) and \(\varvec{H}_2\). In this proof, we denote the corresponding subspaces by \(\tilde{\mathcal {E}}_n\) and \(\tilde{\mathcal {F}}_n\), which are related by \(\tilde{\mathcal {E}}_n = \varvec{w}^* \varvec{H}_2 \tilde{\mathcal {F}}_{n-1}\) and \(\tilde{\mathcal {F}}_n = \varvec{w} \varvec{H}_1 \tilde{\mathcal {E}}_n\). Due to the basis transformation in (c), the columns of \(\tilde{\varvec{U}}_n\) and \(\tilde{\varvec{V}}_n\) also form orthonormal bases of \(\tilde{\mathcal {E}}_n\) and \(\tilde{\mathcal {F}}_n\). Next, we exploit that the projection \(\varvec{P}_n := \tilde{\varvec{V}}_n \tilde{\varvec{V}}_n^* \varvec{H}_2\) onto \(\tilde{\mathcal {F}}_n\) acts as the identity on \(\varvec{w} \varvec{H}_1 \tilde{\mathcal {E}}_n\) by construction. Since \(\tilde{\varvec{U}}_n\) is a basis of \(\tilde{\mathcal {E}}_n\), and since \(\tilde{\varvec{V}}_n^* \varvec{H}_2 \varvec{w} \varvec{H}_1 \tilde{\varvec{U}}_n = \varvec{\Sigma }_n\) by the singular value decomposition in step (c), we have
$$\begin{aligned} \tilde{\varvec{U}}_n^* \varvec{H}_1 \varvec{w}^* \varvec{H}_2 \varvec{w} \varvec{H}_1 \tilde{\varvec{U}}_n= & {} \tilde{\varvec{U}}_n^* \varvec{H}_1 \varvec{w}^* \varvec{H}_2 \varvec{P}_n \varvec{w} \varvec{H}_1 \tilde{\varvec{U}}_n\nonumber \\= & {} \tilde{\varvec{U}}_n^* \varvec{H}_1 \varvec{w}^* \varvec{H}_2\tilde{\varvec{V}}_n \tilde{\varvec{V}}_n^* \varvec{H}_2 \varvec{w} \varvec{H}_1 \tilde{\varvec{U}}_n = \varvec{\Sigma }_n^2, \end{aligned}$$
(11)
and \(\tilde{\varvec{U}}_n\) diagonalizes \(\varvec{H}_1 \varvec{w}^* \varvec{H}_2 \varvec{w} \varvec{H}_1\) on the subspace \(\tilde{\mathcal {E}}_n\).
Using the substitutions
$$\begin{aligned} \varvec{E}_n := \varvec{H}_1^{\nicefrac 12} \, \tilde{\varvec{E}}_n, \quad \varvec{U}_n := \varvec{H}_1^{\nicefrac 12} \, \tilde{\varvec{U}}_n, \quad \varvec{F}_n := \varvec{H}_2^{\nicefrac 12} \, \tilde{\varvec{F}}_n, \quad \varvec{V}_n := \varvec{H}_2^{\nicefrac 12} \, \tilde{\varvec{V}}_n \end{aligned}$$
as well as
$$\begin{aligned} \mathcal {E}_n = \varvec{H}_1^{\nicefrac 12} \, \tilde{\mathcal {E}}_n \quad \text {and}\quad \mathcal {F}_n = \varvec{H}_2^{\nicefrac 12} \, \tilde{\mathcal {F}}_n, \end{aligned}$$
we notice that the iteration in Algorithm 4 is composed of two main steps. First, in (a) and (b), we compute an orthonormal basis \(\varvec{E}_n\) of
$$\begin{aligned} \mathcal {E}_n = (\varvec{H}_1^{\nicefrac 12} \varvec{w}^* (\varvec{H}_2^{\nicefrac 12})^*) (\varvec{H}_2^{\nicefrac 12} \varvec{w} \, (\varvec{H}_1^{\nicefrac 12})^*) \, \mathcal {E}_{n-1}. \end{aligned}$$
Secondly, (11) implies that we determine a Euclidean eigenvalue decomposition on the subspace \(\mathcal {E}_n\) by
$$\begin{aligned} \varvec{E}_n^* (\varvec{H}_1^{\nicefrac 12} \varvec{w}^* (\varvec{H}_2^{\nicefrac 12})^* ) (\varvec{H}_2^{\nicefrac 12} \varvec{w} \, (\varvec{H}_1^{\nicefrac 12} )^*) \varvec{E}_n = \varvec{Z}_n \varvec{\Sigma }_n^2 \varvec{Z}_n^* \end{aligned}$$
and \(\varvec{U}_n := \varvec{E}_n \varvec{Z}_n\).
This two-step iteration exactly coincides with the orthogonal iteration with Ritz acceleration for the matrix \( (\varvec{H}_1^{\nicefrac 12} \varvec{w}^* (\varvec{H}_2^{\nicefrac 12})^* ) (\varvec{H}_2^{\nicefrac 12} \varvec{w} \, (\varvec{H}_1^{\nicefrac 12} )^*)\), see [28, 60]. Under the given assumptions, this iteration converges to the \(\ell \) leading eigenvalues and eigenvectors with the asserted rates. In view of Lemma 2, the columns in \(\tilde{\varvec{U}}_n\) and \(\tilde{\varvec{V}}_n\) together with \(\varvec{\Sigma }_n\) converge to the leading components of the singular value decomposition of \(\varvec{w}\) with respect to \(\varvec{H}_1\) and \(\varvec{H}_2\). \(\square \)
Considering the subspace iteration (Algorithm 4), notice that the algorithm does not need an explicit representation of its argument \(\varvec{w}\) but only the left- and right-hand actions of \(\varvec{w}\) as matrix-vector multiplications. We may thus use the subspace iteration to compute the singular value thresholding \(\mathcal {S}_\tau (\varvec{w})\) without a tensor representation of \(\varvec{w}\).
Algorithm 5
(Tensor-free singular value thresholding)
Input: \(\varvec{w} \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\), \(\tau > 0\), \(\ell > 0\), \(\delta > 0\).
(i) Apply Algorithm 4 with the following modifications:
- If \(\sigma _m^{(n)} > \tau \) for all \(0 \le m < \ell \), increase \(\ell \) and extend \(\tilde{\varvec{V}}_n\) by further orthonormal columns, unless \(\ell = {\text {rank}}\varvec{w}\), i.e. unless the columns of \(\tilde{\varvec{E}}_n\) would become linearly dependent.
- Additionally, stop the subspace iteration as soon as the first \(\ell ' + 1\) singular values with \(\ell ' < \ell \) have converged and \(\sigma _{\ell '}^{(n)} < \tau \). Otherwise, continue the iteration until all nonzero singular values have converged, and set \(\ell ' = \ell \).
(ii) Set \(\tilde{\varvec{U}}' := [\tilde{\varvec{u}}_0, \dots , \tilde{\varvec{u}}_{\ell '-1}]\), \(\tilde{\varvec{V}}' := [\tilde{\varvec{v}}_0, \dots , \tilde{\varvec{v}}_{\ell '-1}]\), and
$$\begin{aligned} \varvec{\Sigma }' := \text {diag}\bigl (S_\tau \bigl (\sigma _0^{(n)}\bigr ), \dots , S_\tau \bigl (\sigma _{\ell '-1}^{(n)}\bigr )\bigr ). \end{aligned}$$
Output: \(\tilde{\varvec{U}}' \in \mathbb {R}^{N_1 \times \ell '}\), \(\tilde{\varvec{V}}' \in \mathbb {R}^{N_2 \times \ell '}\), \(\varvec{\Sigma }' \in \mathbb {R}^{\ell ' \times \ell '}\) with \(\tilde{\varvec{V}}' \varvec{\Sigma }' (\tilde{\varvec{U}}')^* = \mathcal {S}_\tau (\varvec{w})\).
Corollary 1
(Exact singular value thresholding) If the nonzero singular values of \(\varvec{w}\) are distinct, and if none of the columns in \(\tilde{\varvec{V}}_0\) is orthogonal to the singular vectors belonging to singular values greater than \(\tau \), then Algorithm 5 computes the low-rank representation of \(\mathcal {S}_\tau (\varvec{w})\).
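Assuming the subspace iteration sketched above, Algorithm 5 reduces to a small wrapper that enlarges the subspace until the smallest computed singular value drops below the threshold and then soft-thresholds the converged values. The doubling strategy for \(\ell \) is our illustrative choice, and the early-stopping refinement of step (i) is omitted in this sketch.

```python
def tensor_free_svt(w_dot, wT_dot, H1, H2, tau, ell=4, delta=1e-8):
    """Sketch of Algorithm 5: tensor-free singular value thresholding."""
    max_rank = min(H1.shape[0], H2.shape[0])
    ell = min(ell, max_rank)
    while True:
        U, V, sig = subspace_iteration(w_dot, wT_dot, H1, H2, ell, delta)
        if sig[-1] <= tau or ell == max_rank:
            break
        ell = min(2 * ell, max_rank)   # all computed values exceed tau: enlarge
    keep = sig > tau                   # soft threshold S_tau(s) = max(s - tau, 0)
    return U[:, keep], V[:, keep], sig[keep] - tau
```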
Although Algorithm 5 yields the singular value thresholding for generic starting values, the convergence of the subspace iteration is rather slow. Therefore, we now derive an algorithm that is based on the Lanczos-type bidiagonalization method proposed by Golub and Kahan in [27] and on the Ritz approximation in [28]. This method again requires only the left-hand and right-hand actions of \(\varvec{w}\) on a given vector. To simplify the following considerations, we initially present the employed Lanczos process with respect to the Euclidean singular value decomposition.
The central idea here is to construct, for fixed k, orthonormal matrices \(\varvec{F}_k = [\varvec{f}_0, \dots , \varvec{f}_{k-1}] \in \mathbb {R}^{N_2 \times k}\) and \(\varvec{E}_k = [\varvec{e}_0, \dots , \varvec{e}_{k-1}] \in \mathbb {R}^{N_1 \times k}\) such that the transformed matrix
$$\begin{aligned} \varvec{F}_{k}^* \varvec{w} \varvec{E}_k = \varvec{B}_k = \begin{bmatrix} \beta _0 &{}\quad \gamma _0 &{}\quad &{}\quad &{}\quad \\ &{}\quad \beta _1 &{}\quad \gamma _1 &{}\quad &{}\quad \\ &{}\quad &{}\quad \ddots &{}\quad \ddots &{}\quad \\ &{}\quad &{}\quad &{}\quad \beta _{k-2} &{}\quad \gamma _{k-2} \\ &{}\quad &{}\quad &{}\quad &{}\quad \beta _{k-1} \end{bmatrix} \end{aligned}$$
(12)
is bidiagonal, and then to compute the singular value decomposition of \(\varvec{B}_k\) by determining orthogonal matrices \(\varvec{Y}_k\) and \(\varvec{Z}_k\) as well as a diagonal matrix \(\varvec{\Sigma }_k\) in \(\mathbb {R}^{k \times k}\) such that
$$\begin{aligned} \varvec{Y}_{k}^* \varvec{B}_k \varvec{Z}_k = \varvec{\Sigma }_k = \text {diag}(\sigma _0, \dots , \sigma _{k-1}). \end{aligned}$$
Defining \(\varvec{U}_k \in \mathbb {R}^{N_1 \times k}\) and \(\varvec{V}_k \in \mathbb {R}^{N_2 \times k}\) as
$$\begin{aligned} \varvec{U}_k := \varvec{E}_k \varvec{Z}_k \qquad \text {and}\qquad \varvec{V}_k := \varvec{F}_k \varvec{Y}_k, \end{aligned}$$
we finally obtain a set of approximate right-hand and left-hand singular vectors, see [2, 27, 28].
The values \(\beta _n\) and \(\gamma _n\) of the bidiagonal matrix \(\varvec{B}_k\) and the related vectors \(\varvec{e}_n\) and \(\varvec{f}_n\) can be determined by the following iterative procedure [27]: Choose an arbitrary unit vector \(\varvec{p}_{-1} \in \mathbb {R}^{N_1}\) with respect to the Euclidean norm, and compute
$$\begin{aligned} \varvec{e}_{m+1} := \gamma _m^{-1} \, \varvec{p}_m,\quad&|\quad \varvec{f}_{m+1} := \beta _{m+1}^{-1} \varvec{q}_{m+1},\\ \varvec{q}_{m+1} := \varvec{w} \varvec{e}_{m+1} - \gamma _m \varvec{f}_m,\quad&|\quad \varvec{p}_{m+1} := \varvec{w}^* \varvec{f}_{m+1} - \beta _{m+1} \varvec{e}_{m+1},\\ \beta _{m+1} := ||\varvec{q}_{m+1}||,\quad&|\quad \gamma _{m+1} := ||\varvec{p}_{m+1}||. \end{aligned}$$
For the first iteration, we set \(\gamma _{-1} := 1\) and \(\varvec{f}_{-1} := \varvec{0}\). If \(\gamma _{m+1}\) vanishes, then we stop the Lanczos process since we have found an invariant Krylov subspace, so that the computed singular values are exact.
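In the Euclidean setting, this recursion fits in a few lines of code. The following sketch identifies the tensor with an \(N_2 \times N_1\) matrix `w`, collects \(\varvec{E}_k\), \(\varvec{F}_k\), and \(\varvec{B}_k\), and deliberately omits the reorthogonalization discussed in Remark 8 below, so it is only reliable in exact arithmetic; the function name and the random start vector are illustrative.

```python
import numpy as np

def golub_kahan(w, k, p_start=None):
    """Euclidean Golub-Kahan bidiagonalization (sketch): F.T @ w @ E = B
    with diagonal betas and superdiagonal gammas, cf. (12)."""
    N2, N1 = w.shape
    p = p_start if p_start is not None else np.random.default_rng(0).standard_normal(N1)
    p /= np.linalg.norm(p)
    gamma, f = 1.0, np.zeros(N2)
    E, F, betas, gammas = [], [], [], []
    for _ in range(k):
        e = p / gamma
        q = w @ e - gamma * f               # only the action of w is needed
        beta = np.linalg.norm(q)
        f = q / beta
        p = w.T @ f - beta * e              # only the action of w^* is needed
        gamma = np.linalg.norm(p)
        E.append(e); F.append(f); betas.append(beta); gammas.append(gamma)
        if gamma == 0.0:                    # invariant Krylov subspace found
            break
    B = np.diag(betas) + np.diag(gammas[:len(betas) - 1], 1)
    return np.column_stack(E), np.column_stack(F), B
```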
In order to compute an approximate singular value decomposition with respect to \(\varvec{H}_1\) and \(\varvec{H}_2\), we exploit Lemma 2 and perform the Lanczos bidiagonalization regarding the transformed matrix \(\varvec{H}_2^{\nicefrac 12} \varvec{w} \, (\varvec{H}_1^{\nicefrac 12})^*\). Moreover, we incorporate the back transformation in Lemma 2 with the aid of the substitutions
$$\begin{aligned} \tilde{\varvec{e}}_m := \varvec{H}_1^{-\nicefrac 12} \varvec{e}_m, \quad \tilde{\varvec{p}}_m := \varvec{H}_1^{-\nicefrac 12} \varvec{p}_m \quad \text {and}\quad \tilde{\varvec{f}}_m := \varvec{H}_2^{-\nicefrac 12} \varvec{f}_m, \quad \tilde{\varvec{q}}_m := \varvec{H}_2^{-\nicefrac 12} \varvec{q}_m. \end{aligned}$$
(13)
In this manner, the square roots \(\varvec{H}_1^{\nicefrac 12}\) and \(\varvec{H}_2^{\nicefrac 12}\) and their inverses cancel out, and we obtain the following algorithm, which only relies on the original matrices \(\varvec{H}_1\) and \(\varvec{H}_2\).
Algorithm 6
(Lanczos bidiagonalization)
Input: \(\varvec{w} \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\), \(k>0\).
(i) Initialization: Set \(\gamma _{-1} := 1\) and \(\tilde{\varvec{f}}_{-1} := \varvec{0}\). Choose a unit vector \(\tilde{\varvec{p}}_{-1}\) with respect to \(\varvec{H}_1\).
(ii) Lanczos bidiagonalization: For \(m = -1, \dots , k-2\) while \(\gamma _m \ne 0\), repeat:
(a) Compute \(\tilde{\varvec{e}}_{m+1} := \gamma _m^{-1} \, \tilde{\varvec{p}}_m\), and reorthogonalize against \(\tilde{\varvec{e}}_0, \dots , \tilde{\varvec{e}}_m\) with respect to \(\varvec{H}_1\).
(b) Determine \(\tilde{\varvec{q}}_{m+1} := \varvec{w} \varvec{H}_1 \tilde{\varvec{e}}_{m+1} - \gamma _m \tilde{\varvec{f}}_m\), and set \(\beta _{m+1} := ||\tilde{\varvec{q}}_{m+1}||_{\varvec{H}_2}\). Compute \(\tilde{\varvec{f}}_{m+1} := \beta _{m+1}^{-1} \tilde{\varvec{q}}_{m+1}\), and reorthogonalize against \(\tilde{\varvec{f}}_0, \dots , \tilde{\varvec{f}}_m\) with respect to \(\varvec{H}_2\).
(c) Determine \(\tilde{\varvec{p}}_{m+1} := \varvec{w}^* \varvec{H}_2 \tilde{\varvec{f}}_{m+1} - \beta _{m+1} \tilde{\varvec{e}}_{m+1}\), and set \(\gamma _{m+1} := ||\tilde{\varvec{p}}_{m+1}||_{\varvec{H}_1}\).
(iii) Compute the Euclidean singular value decomposition of \(\varvec{B}_k\) according to (12), i.e. \(\varvec{B}_k = \varvec{Y}_k \varvec{\Sigma }_k \varvec{Z}_{k}^*\), and set \(\tilde{\varvec{U}}_k := \tilde{\varvec{E}}_k \varvec{Z}_k\) and \(\tilde{\varvec{V}}_k := \tilde{\varvec{F}}_k \varvec{Y}_k\).
Output: \(\tilde{\varvec{U}}_k \in \mathbb {R}^{N_1 \times k}\), \(\tilde{\varvec{V}}_k \in \mathbb {R}^{N_2 \times k}\), \(\varvec{\Sigma }_k \in \mathbb {R}^{k \times k}\) with \(\tilde{\varvec{V}}_{k}^* \varvec{H}_2 \varvec{w} \varvec{H}_1 \tilde{\varvec{U}}_k = \varvec{\Sigma }_k\).
Remark 8
The bidiagonalization by Golub and Kahan is based on a Lanczos-type process, which is numerically unstable in the computation of \(\tilde{\varvec{e}}_n\) and \(\tilde{\varvec{f}}_n\). For this reason, we have to reorthogonalize every newly generated vector \(\tilde{\varvec{e}}_n\) and \(\tilde{\varvec{f}}_n\) against the previously generated vectors, see [27]. This amounts to projecting \(\tilde{\varvec{e}}_{m+1}\) onto the orthogonal complement of the span of \(\{\tilde{\varvec{e}}_{0}, \ldots , \tilde{\varvec{e}}_{m} \}\), and analogously for \(\tilde{\varvec{f}}_{m+1}\), for instance via the Gram–Schmidt procedure.
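Under the conventions used here, this projection takes a particularly simple form; the following one-line helper performs a single classical Gram–Schmidt sweep (in practice the sweep may be repeated), and its name is our illustrative choice.

```python
def reorthogonalize(x, Q, H):
    """Project x onto the H-orthogonal complement of the span of the
    H-orthonormal columns of Q, i.e. apply I - Q Q^* H to x."""
    return x - Q @ (Q.T @ (H @ x))
```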
Remark 9
The computation of the last vector \(\tilde{\varvec{p}}_{k-1}\) seems to be superfluous since it is not needed for the determination of the matrix \(\varvec{B}_k\). On the other hand, this vector represents the residuals of the approximate singular value decomposition. More precisely, we have
$$\begin{aligned} \varvec{w} \varvec{H}_1 \tilde{\varvec{u}}_m = \sigma _m \tilde{\varvec{v}}_m \qquad \text {and}\qquad \varvec{w}^* \varvec{H}_2 \tilde{\varvec{v}}_m = \sigma _m \tilde{\varvec{u}}_m + \tilde{\varvec{p}}_{k-1} \varvec{\eta }_{k-1}^* \varvec{y}_m \end{aligned}$$
(14)
for \(m = 0, \dots , k-1\), see [2]. Here the vectors \(\tilde{\varvec{u}}_m\), \(\tilde{\varvec{v}}_m\), and \(\varvec{y}_m\) denote the columns of the matrices \(\tilde{\varvec{U}}_k = [\tilde{\varvec{u}}_0, \dots , \tilde{\varvec{u}}_{k-1}]\), \(\tilde{\varvec{V}}_k = [\tilde{\varvec{v}}_0, \dots , \tilde{\varvec{v}}_{k-1}]\), and \(\varvec{Y}_k = [\varvec{y}_0, \dots , \varvec{y}_{k-1}]\) respectively; the singular values \(\sigma _m\) of \(\varvec{B}_k\) are given by \(\varvec{\Sigma }_k = \text {diag}(\sigma _0, \dots , \sigma _{k-1})\); the vector \(\varvec{\eta }_{k-1} \in \mathbb {R}^k\) represents the last unit vector \((0, \dots , 0, 1)^*\).
Since the bidiagonalization method by Golub and Kahan is based on the Lanczos process for symmetric matrices, one can apply the related convergence theory to show that the approximate singular values and singular vectors converge, for increasing k, to the desired singular value decomposition of \(\varvec{w}\), see [28]. However, since we are only interested in the leading singular values and singular vectors, and since we want to choose the matrix \(\varvec{B}_k\) as small as possible, this convergence theory does not apply to our setting.
In order to improve the quality of the approximate singular value decomposition computed by Algorithm 6, we here use a restarting technique proposed by Baglama and Reichel [2]. The central idea is to adapt the Lanczos bidiagonalization such that the method can be restarted by a set of \(\ell \) previously computed Ritz vectors. For this purpose, Baglama and Reichel suggest a modified bidiagonalization of the form
$$\begin{aligned} \varvec{F}_{k,n}^* \varvec{w} \varvec{E}_{k,n} = \varvec{B}_{k,n} = \begin{bmatrix} \sigma _0^{(n-1)} &{}\quad &{}\quad &{}\quad \rho _0^{(n)} \\ &{}\quad \ddots &{}\quad &{}\quad \vdots \\ &{}\quad &{}\quad \sigma _{\ell -1}^{(n-1)} &{}\quad \rho _{\ell - 1}^{(n)} \\ &{}\quad &{}\quad &{}\quad \beta _\ell ^{(n)} &{}\quad \gamma _\ell ^{(n)} \\ &{}\quad &{}\quad &{}\quad &{}\quad \ddots &{}\quad \ddots \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \beta _{k-2}^{(n)} &{}\quad \gamma _{k-2}^{(n)} \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \beta _{k-1}^{(n)} \\ \end{bmatrix}, \end{aligned}$$
(15)
where the first \(\ell \) columns of the orthonormal matrices
$$\begin{aligned} \varvec{E}_{k,n} = [\varvec{u}_0^{(n-1)}, \dots , \varvec{u}_{\ell - 1}^{(n-1)}, \dots ] \qquad \text {and}\qquad \varvec{F}_{k,n} = [\varvec{v}_0^{(n-1)}, \dots , \varvec{v}_{\ell - 1}^{(n-1)}, \dots ] \end{aligned}$$
are predefined by the Ritz vectors of the previous iteration. For the computation of the \(\ell < k\) leading singular values and singular vectors, we employ the following algorithm [2], which has been adapted to our setting by incorporating Lemma 2 and the substitutions (13).
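For concreteness, the following sketch shows how the augmented matrix \(\varvec{B}_{k,n}\) of (15) may be assembled from the retained Ritz values, the coupling coefficients \(\rho _m^{(n)}\), and the newly generated Lanczos coefficients; the function name and argument layout are our choices.

```python
import numpy as np

def augmented_B(sigma_prev, rho, betas, gammas):
    """Assemble B_{k,n} from (15): sigma_prev and rho have length l,
    betas = (beta_l, ..., beta_{k-1}), gammas = (gamma_l, ..., gamma_{k-2})."""
    l = len(sigma_prev)
    k = l + len(betas)
    B = np.zeros((k, k))
    B[:l, :l] = np.diag(sigma_prev)                    # retained Ritz values
    B[:l, l] = rho                                     # coupling to the new start vector
    B[l:, l:] = np.diag(betas) + np.diag(gammas, 1)    # new bidiagonal tail
    return B
```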
Algorithm 7
(Augmented Lanczos Bidiagonalization)
Input: \(\varvec{w} \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\), \(\ell > 0\), \(k> \ell \), \(\delta >0\).
(i) Apply Algorithm 6 to compute an approximate singular value decomposition \(\tilde{\varvec{V}}_{k,0}^* \varvec{H}_2 \varvec{w} \varvec{H}_1 \tilde{\varvec{U}}_{k,0} = \varvec{\Sigma }_{k,0}\).
(ii) For \(n > 0\), until \(\ell \) singular vectors have converged, which means
$$\begin{aligned} \gamma _{k-1}^{(n-1)} |\varvec{\eta }_{k-1}^* \varvec{y}_m^{(n-1)}| \le \delta ||\varvec{w}||_{\mathcal {L}(\varvec{H}_1, \varvec{H}_2)} \qquad \text {for}\qquad 0 \le m < \ell , \end{aligned}$$
where \(||\cdot ||_{\mathcal {L}(\varvec{H}_1, \varvec{H}_2)}\) denotes the operator norm with respect to the norms induced by \(\varvec{H}_1\) and \(\varvec{H}_2\), which may be estimated by \(\sigma _0^{(n-1)}\), repeat:
(a) Initialize the new iteration by setting \(\tilde{\varvec{e}}_m^{(n)} := \tilde{\varvec{u}}_m^{(n-1)}\) and \(\tilde{\varvec{f}}_m^{(n)} := \tilde{\varvec{v}}_m^{(n-1)}\) for \(m = 0, \dots , \ell - 1\). Further, set \(\tilde{\varvec{p}}_{\ell - 1}^{(n)} := \tilde{\varvec{p}}_{k-1}^{(n-1)}\) and \(\gamma _{\ell - 1}^{(n)} := ||\tilde{\varvec{p}}_{\ell - 1}^{(n)}||_{\varvec{H}_1}\).
(b) Compute \(\tilde{\varvec{e}}_\ell ^{(n)} := (\gamma _{\ell - 1}^{(n)})^{-1} \, \tilde{\varvec{p}}_{\ell - 1}^{(n)}\), and reorthogonalize against \(\tilde{\varvec{e}}_0^{(n)}, \dots , \tilde{\varvec{e}}_{\ell - 1}^{(n)}\) with respect to \(\varvec{H}_1\).
(c) Determine \(\tilde{\varvec{q}}_{\ell }^{(n)} := \varvec{w} \varvec{H}_1 \tilde{\varvec{e}}_\ell ^{(n)}\), compute the inner products \(\rho _m^{(n)} := \langle \tilde{\varvec{f}}_m^{(n)},\tilde{\varvec{q}}_\ell ^{(n)}\rangle _{\varvec{H}_2}\) for \(m = 0, \dots , \ell - 1\), and reorthogonalize \(\tilde{\varvec{q}}_\ell ^{(n)}\) with respect to \(\varvec{H}_2\) by
$$\begin{aligned} \tilde{\varvec{q}}_\ell ^{(n)} := \tilde{\varvec{q}}_\ell ^{(n)} - \sum _{m=0}^{\ell - 1} \rho _m^{(n)} \tilde{\varvec{f}}_m^{(n)} . \end{aligned}$$
(d) Set \(\beta _\ell ^{(n)} := ||\tilde{\varvec{q}}_\ell ^{(n)}||_{\varvec{H}_2}\) and \(\tilde{\varvec{f}}_\ell ^{(n)} := (\beta _\ell ^{(n)})^{-1} \, \tilde{\varvec{q}}_\ell ^{(n)}\).
(e) Determine \(\tilde{\varvec{p}}_\ell ^{(n)} := \varvec{w}^* \varvec{H}_2 \tilde{\varvec{f}}_\ell ^{(n)} - \beta _\ell ^{(n)} \tilde{\varvec{e}}_\ell ^{(n)}\), and set \(\gamma _\ell ^{(n)} := ||\tilde{\varvec{p}}_\ell ^{(n)}||_{\varvec{H}_1}\).
(f) Calculate the remaining values of \(\varvec{B}_{k,n}\) by applying step (ii) of Algorithm 6 with \(m = \ell , \dots , k - 2\).
(g) Compute the Euclidean singular value decomposition of \(\varvec{B}_{k,n}\) in (15), i.e. \(\varvec{B}_{k,n} = \varvec{Y}_{k,n} \varvec{\Sigma }_{k,n} \varvec{Z}_{k,n}^*\), and set \(\tilde{\varvec{U}}_{k,n} := \tilde{\varvec{E}}_{k,n} \varvec{Z}_{k,n}\) and \(\tilde{\varvec{V}}_{k,n} := \tilde{\varvec{F}}_{k,n} \varvec{Y}_{k,n}\).
(iii) Set \(\tilde{\varvec{U}} := [\tilde{\varvec{u}}_0^{(n)}, \dots , \tilde{\varvec{u}}_{\ell -1}^{(n)}]\), \(\tilde{\varvec{V}} :=[\tilde{\varvec{v}}_0^{(n)}, \dots , \tilde{\varvec{v}}_{\ell -1}^{(n)}]\), and \(\varvec{\Sigma } := \text {diag}(\sigma _0^{(n)}, \dots , \sigma _{\ell -1}^{(n)})\).
Output: \(\tilde{\varvec{U}} \in \mathbb {R}^{N_1 \times \ell }\), \(\tilde{\varvec{V}} \in \mathbb {R}^{N_2 \times \ell }\), \(\varvec{\Sigma } \in \mathbb {R}^{\ell \times \ell }\) with \(\tilde{\varvec{V}}^* \varvec{H}_2 \varvec{w} \varvec{H}_1 \tilde{\varvec{U}} = \varvec{\Sigma }\).
Remark 10
The stopping criterion in step (ii) originates from the error representation in (14). For the operator norm \(||\varvec{w}||_{\mathcal {L}(\varvec{H}_1, \varvec{H}_2)}\), one may use the largest leading singular value of the previous iterations, which usually gives a sufficiently good approximation, see [2].
Although the numerical effort of the restarted augmented Lanczos process is enormously reduced compared with the subspace iteration, we are unfortunately not aware of a convergence and error analysis for this specific variant of the Lanczos-type method. Nevertheless, we can employ the obtained partial singular value decomposition to determine the singular value thresholding.
Algorithm 8
(Tensor-free singular value thresholding)
Input: \(\varvec{w} \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\), \(\tau > 0\), \(\ell > 0\), \(k>\ell \), \(\delta > 0\).
(i) Apply Algorithm 7 with the following modifications:
- If \(\sigma _m^{(n)} > \tau \) for all \(0 \le m < \ell \), increase \(\ell \) and k keeping \(\ell < k\), unless \(k = {\text {rank}}\varvec{w}\), i.e. unless \(\gamma _{k-1}^{(n)}\) in Algorithm 6 vanishes.
- Additionally, stop the augmented Lanczos method as soon as the first \(\ell ' + 1\) singular values with \(\ell ' < \ell \) have converged and \(\sigma _{\ell '}^{(n)} < \tau \). Otherwise, continue the iteration until all nonzero singular values have converged, and set \(\ell ' = \ell \).
(ii) Set \(\tilde{\varvec{U}}' := [\tilde{\varvec{u}}_0^{(n)}, \dots , \tilde{\varvec{u}}_{\ell '-1}^{(n)}]\), \(\tilde{\varvec{V}}' := [\tilde{\varvec{v}}_0^{(n)}, \dots , \tilde{\varvec{v}}_{\ell '-1}^{(n)}]\), and
$$\begin{aligned} \varvec{\Sigma }' := \text {diag}\bigl (S_\tau \bigl (\sigma _0^{(n)}\bigr ), \dots , S_\tau \bigl (\sigma _{\ell '-1}^{(n)}\bigr )\bigr ). \end{aligned}$$
Output: \(\tilde{\varvec{U}}' \in \mathbb {R}^{N_1 \times \ell '}\), \(\tilde{\varvec{V}}' \in \mathbb {R}^{N_2 \times \ell '}\), \(\varvec{\Sigma }' \in \mathbb {R}^{\ell ' \times \ell '}\) with \(\tilde{\varvec{V}}' \varvec{\Sigma }' (\tilde{\varvec{U}}')^* = \mathcal {S}_\tau (\varvec{w})\).
Besides the singular value thresholding, the proximal methods in Sect. 3 for solving the lifted and relaxed bilinear problems in Sect. 2 require the application of the lifted operator \(\breve{\mathcal {B}}\) as well as its adjoint \(\breve{\mathcal {B}}^*\). Both operations can be computed in a tensor-free manner. Assuming that \(\varvec{w}\) has low rank, one may compute the lifted bilinear forward operator with the aid of the universal property in Definition 1.
Corollary 2
(Tensor-free bilinear lifting) Let \(\mathcal {B} :\mathbb {R}^{N_1} \times \mathbb {R}^{N_2} \rightarrow \mathbb {R}^{M}\) be a bilinear mapping. If \(\varvec{w} \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\) has the representation \(\varvec{w} = \tilde{\varvec{V}} \varvec{\Sigma } \tilde{\varvec{U}}^*\) with \(\tilde{\varvec{U}} := [\tilde{\varvec{u}}_0, \dots , \tilde{\varvec{u}}_{\ell -1}]\), \(\varvec{\Sigma } := \text {diag}(\sigma _0, \dots , \sigma _{\ell -1})\), and \(\tilde{\varvec{V}} := [\tilde{\varvec{v}}_0, \dots , \tilde{\varvec{v}}_{\ell -1}]\), then the lifted forward operator \(\breve{\mathcal {B}}\) acts by
$$\begin{aligned} \breve{\mathcal {B}}(\varvec{w}) = \sum _{n=0}^{\ell -1} \sigma _n \, \mathcal {B}(\tilde{\varvec{u}}_n, \tilde{\varvec{v}}_n). \end{aligned}$$
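In code, Corollary 2 means that the forward evaluation touches only the factors of the low-rank representation; a minimal sketch, where `B(u, v)` denotes the user-supplied bilinear map:

```python
def lifted_forward(B, U, V, sigma):
    """Corollary 2: evaluate the lifted operator on w = sum_n sigma_n (u_n x v_n),
    given only the factor matrices U, V and the singular values sigma."""
    return sum(s * B(u, v) for s, u, v in zip(sigma, U.T, V.T))
```

The cost amounts to \(\ell \) evaluations of \(\mathcal {B}\), independently of the dimension \(N_1 N_2\) of the tensor product.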
Considering the proximal methods, we see that the adjoint lifting occurs only in the argument of the singular value thresholding. If one applies the subspace iteration or the augmented Lanczos process, it hence suffices to provide the left-hand and right-hand actions of the adjoint lifting. These actions can be expressed by the left-hand or right-hand adjoint of the original bilinear mapping \(\mathcal {B}\).
Lemma 5
(Tensor-free adjoint bilinear lifting) Let \(\mathcal {B} :\mathbb {R}^{N_1} \times \mathbb {R}^{N_2} \rightarrow \mathbb {R}^{M}\) be a bilinear mapping. The left-hand and right-hand actions of the adjoint lifting \(\breve{\mathcal {B}}^*(\varvec{y}) \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\) with \(\varvec{y} \in \mathbb {R}^M\) are given by
$$\begin{aligned} \breve{\mathcal {B}}^*(\varvec{y}) \, \varvec{H}_1 \varvec{e} = [\mathcal {B}(\varvec{e}, \cdot )]^*(\varvec{y}) \qquad \text {and}\qquad [\breve{\mathcal {B}}^*(\varvec{y})]^* \, \varvec{H}_2 \varvec{f} = [\mathcal {B}(\cdot , \varvec{f})]^*(\varvec{y}) \end{aligned}$$
for \(\varvec{e} \in \mathbb {R}^{N_1}\) and \(\varvec{f} \in \mathbb {R}^{N_2}\).
Proof
Testing the right-hand action of the image \(\breve{\mathcal {B}}^*(\varvec{y})\) on \(\varvec{e} \in \mathbb {R}^{N_1}\) with an arbitrary vector \(\varvec{f} \in \mathbb {R}^{N_2}\), we obtain
$$\begin{aligned} \bigl \langle \breve{\mathcal {B}}^*(\varvec{y}) \, \varvec{H}_1 \varvec{e},\varvec{f}\bigr \rangle _{\varvec{H}_2}= & {} \text {tr}\bigl (\varvec{f}^* \varvec{H}_2\, \breve{\mathcal {B}}^*(\varvec{y})\, \varvec{H}_1 \varvec{e} \bigr )=\text {tr}\bigl ( \varvec{e} \varvec{f}^* \varvec{H}_2 \, \breve{\mathcal {B}}^*(\varvec{y}) \, \varvec{H}_1 \bigr )\\= & {} \bigl \langle \breve{\mathcal {B}}^*(\varvec{y}),\varvec{e}\otimes \varvec{f}\bigr \rangle _{\varvec{H}_1 \otimes \varvec{H}_2}=\bigl \langle \varvec{y},\mathcal {B}(\varvec{e}, \varvec{f})\bigr \rangle _{\varvec{K}}=\bigl \langle [\mathcal {B}(\varvec{e}, \cdot )]^*(\varvec{y}),\varvec{f}\bigr \rangle _{\varvec{H}_2}. \end{aligned}$$
The left-hand action follows analogously. \(\square \)
Remark 11
(Composed tensor-free adjoint lifting) Since the left-hand and right-hand actions of the tensor \(\varvec{w}^{(n)} = \sum _{k=0}^{R-1} \sigma _{k}^{(n)} \, (\tilde{\varvec{u}}_{k}^{(n)} \otimes \tilde{\varvec{v}}_{k}^{(n)})\) are given by
$$\begin{aligned} \varvec{w}^{(n)} \varvec{H}_1 \varvec{e}=\sum _{k=0}^{R-1} \sigma _{k}^{(n)} \, \langle \varvec{e},\tilde{\varvec{u}}_{k}^{(n)}\rangle _{\varvec{H}_1}\, \tilde{\varvec{v}}_{k}^{(n)} \end{aligned}$$
(16)
and
$$\begin{aligned} (\varvec{w}^{(n)})^* \varvec{H}_2 \varvec{f}=\sum _{k=0}^{R-1} \sigma _{k}^{(n)} \, \langle \varvec{f},\tilde{\varvec{v}}_{k}^{(n)}\rangle _{\varvec{H}_2}\, \tilde{\varvec{u}}_{k}^{(n)}, \end{aligned}$$
(17)
the right-hand action of the singular value thresholding argument \(\varvec{w} = \varvec{w}^{(n)} - \tau \, \breve{\mathcal {B}}^*(\varvec{y}^{(n+1)})\) within the proximal methods in Sect. 3 is given by
$$\begin{aligned} \varvec{w} \varvec{H}_1 \varvec{e} = - \tau \, [\mathcal {B}(\varvec{e}, \cdot )]^*\bigl (\varvec{y}^{(n+1)}\bigr ) + \sum _{k=0}^{R-1} \sigma _{k}^{(n)} \, \bigl \langle \varvec{e},\tilde{\varvec{u}}_{k}^{(n)}\bigr \rangle _{\varvec{H}_1} \, \tilde{\varvec{v}}_{k}^{(n)} \end{aligned}$$
(18)
and the left-hand action by
$$\begin{aligned} \varvec{w}^* \varvec{H}_2 \varvec{f} = - \tau \, [\mathcal {B}( \cdot , \varvec{f})]^*\bigl (\varvec{y}^{(n+1)}\bigr ) + \sum _{k=0}^{R-1} \sigma _{k}^{(n)} \, \bigl \langle \varvec{f},\tilde{\varvec{v}}_{k}^{(n)}\bigr \rangle _{\varvec{H}_2} \, \tilde{\varvec{u}}_{k}^{(n)}, \end{aligned}$$
(19)
where \(\varvec{w}^{(n)} = \sum _{k=0}^{R-1} \sigma _{k}^{(n)} \, (\tilde{\varvec{u}}_{k}^{(n)} \otimes \tilde{\varvec{v}}_{k}^{(n)})\).
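A sketch of the two actions (18) and (19), assuming that the partial adjoints of Lemma 5 are available as functions `B_left_adj(e, y)` \(= [\mathcal {B}(\varvec{e}, \cdot )]^*(\varvec{y})\) and `B_right_adj(f, y)` \(= [\mathcal {B}(\cdot , \varvec{f})]^*(\varvec{y})\); all names are illustrative.

```python
def right_action(e, y, U, V, sigma, H1, B_left_adj, tau):
    """Right-hand action (18) of w = w_prev - tau * lifted_adjoint(y) on e,
    where w_prev = sum_k sigma_k (u_k x v_k) is stored via its factors."""
    return V @ (sigma * (U.T @ (H1 @ e))) - tau * B_left_adj(e, y)

def left_action(f, y, U, V, sigma, H2, B_right_adj, tau):
    """Left-hand action (19) on f, mirroring right_action."""
    return U @ (sigma * (V.T @ (H2 @ f))) - tau * B_right_adj(f, y)
```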
Now we are ready to rewrite the proximal methods in Sect. 3 into tensor-free variants. As an example, we consider the primal-dual method for bilinear operators and exact data, see Algorithm 1.
Algorithm 9
(Tensor-free primal-dual for exact data)
(i) Initialization: Fix the parameters \(\tau , \sigma > 0\) and \(\theta \in [0,1]\). Choose the starting value \((\varvec{w}^{(0)}, \varvec{y}^{(0)}) = (\varvec{0} \otimes \varvec{0}, \varvec{0})\) in \((\mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}) \times \mathbb {R}^M\), and set \(\varvec{w}^{(-1)}\) to \(\varvec{w}^{(0)}\).
(ii) Iteration: For \(n \ge 0\), update \(\varvec{w}^{(n)}\) and \(\varvec{y}^{(n)}\):
(a) Using the tensor-free computations in Corollary 2, determine
$$\begin{aligned} \varvec{y}^{(n+1)} := \varvec{y}^{(n)} + \sigma \, \bigl ( (1 + \theta ) \; \breve{\mathcal {B}}(\varvec{w}^{(n)}) - \theta \; \breve{\mathcal {B}}(\varvec{w}^{(n-1)}) - \varvec{g}^\dagger \bigr ). \end{aligned}$$
(b) Compute a low-rank representation \(\varvec{w}^{(n+1)} = \tilde{\varvec{V}}^{(n+1)} \varvec{\Sigma }^{(n+1)} (\tilde{\varvec{U}}^{(n+1)})^*\) of the singular value thresholding
$$\begin{aligned} \mathcal {S}_\tau ( \varvec{w}^{(n)} - \tau \, \breve{\mathcal {B}}^*(\varvec{y}^{(n+1)})) \end{aligned}$$
with Algorithm 8 (or 5). The required actions are given in (18) and (19).
Remark 12
As the starting value for the augmented Lanczos bidiagonalization according to Algorithm 8, which is required in step (ii.b) of Algorithm 9, we suggest a linear combination of the right-hand singular vectors of the previous iterate \(\varvec{w}^{(n)}\), in the hope that these are good approximations of the new singular vectors.
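Putting the pieces together, the following compact sketch of Algorithm 9 builds on the helpers sketched above (`lifted_forward`, `tensor_free_svt`). It assumes that `B_left_adj` and `B_right_adj` accept a matrix of test vectors column-wise and stores each iterate only through its factors; it is a sketch under these assumptions, not a reference implementation.

```python
import numpy as np

def tensor_free_primal_dual(B, B_left_adj, B_right_adj, g, H1, H2,
                            tau, sigma_pd, theta=1.0, n_iter=100):
    """Sketch of Algorithm 9: the iterate w^(n) is kept as factors (U, V, s)."""
    U, V, s = np.zeros((H1.shape[0], 0)), np.zeros((H2.shape[0], 0)), np.zeros(0)
    U_old, V_old, s_old = U, V, s
    y = np.zeros_like(g, dtype=float)
    for _ in range(n_iter):
        # (a) dual update with extrapolated forward evaluations (Corollary 2)
        y = y + sigma_pd * ((1 + theta) * lifted_forward(B, U, V, s)
                            - theta * lifted_forward(B, U_old, V_old, s_old) - g)
        # (b) tensor-free thresholding of w^(n) - tau * B*(y^(n+1)) via (18)/(19)
        Uc, Vc, sc, yc = U, V, s, y        # freeze current factors for the closures
        w_dot = lambda X: Vc @ (sc[:, None] * (Uc.T @ (H1 @ X))) - tau * B_left_adj(X, yc)
        wT_dot = lambda X: Uc @ (sc[:, None] * (Vc.T @ (H2 @ X))) - tau * B_right_adj(X, yc)
        U_old, V_old, s_old = U, V, s
        U, V, s = tensor_free_svt(w_dot, wT_dot, H1, H2, tau)
    return U, V, s
```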
Using the above tensor-free computation methods, we immediately obtain a tensor-free variant of FISTA in Remark 6 since this iteration scheme is likewise based on the singular value thresholding, the lifted operator, and the action of its adjoint. Exploiting the universal property and Lemma 5, we can compute the actions of \(\breve{\varvec{w}}^{(n)} - \tau \breve{\mathcal {B}}^*(\breve{\mathcal {B}} \breve{\varvec{w}}^{(n)} - \varvec{g}^\epsilon )\) by setting \(\beta _{n+1} := \nicefrac {(t_n - 1)}{t_{n+1}}\) and
$$\begin{aligned} \varvec{y}^{(n)}:= & {} \sum _{k=0}^{R^{(n)}-1} (1 + \beta _n) \, \sigma _{k}^{(n)} \, \mathcal {B}(\tilde{\varvec{u}}_{k}^{(n)}, \tilde{\varvec{v}}_{k}^{(n)})\nonumber \\&- \sum _{k=0}^{R^{(n-1)}-1} \beta _n \, \sigma _{k}^{(n-1)} \, \mathcal {B}(\tilde{\varvec{u}}_{k}^{(n-1)}, \tilde{\varvec{v}}_{k}^{(n-1)}) - \varvec{g}^\epsilon . \end{aligned}$$
(20)
The right-hand and left-hand actions are now given by
$$\begin{aligned} \varvec{w} \varvec{H}_1 \varvec{e}:= & {} \sum _{k=0}^{R^{(n)}-1} (1 + \beta _n) \, \sigma _{k}^{(n)} \langle \varvec{e},\tilde{\varvec{u}}_{k}^{(n)}\rangle _{\varvec{H}_1} \tilde{\varvec{v}}_{k}^{(n)}\nonumber \\&- \sum _{k=0}^{R^{(n-1)}-1} \beta _n \sigma _{k}^{(n-1)} \langle \varvec{e},\tilde{\varvec{u}}_{k}^{(n-1)}\rangle _{\varvec{H}_1} \tilde{\varvec{v}}_{k}^{(n-1)}-\tau [\mathcal {B}(\varvec{e}, \cdot )]^* ( \varvec{y}^{(n)}) \end{aligned}$$
(21)
and
$$\begin{aligned} \varvec{w}^* \varvec{H}_2 \varvec{f}:= & {} \sum _{k=0}^{R^{(n)}-1} (1 +\beta _n) \, \sigma _{k}^{(n)} \langle \varvec{f},\tilde{\varvec{v}}_{k}^{(n)}\rangle _{\varvec{H}_2} \tilde{\varvec{u}}_{k}^{(n)}\\&-\sum _{k=0}^{R^{(n-1)}-1} \beta _n \sigma _{k}^{(n-1)} \langle \varvec{f},\tilde{\varvec{v}}_{k}^{(n-1)}\rangle _{\varvec{H}_2} \tilde{\varvec{u}}_{k}^{(n-1)}-\tau [\mathcal {B}( \cdot , \varvec{f})]^* ( \varvec{y}^{(n)}),\nonumber \end{aligned}$$
(22)
where \(\varvec{w}^{(n)} = \sum _{k=0}^{R^{(n)}-1} \sigma _{k}^{(n)} \, (\tilde{\varvec{u}}_{k}^{(n)} \otimes \tilde{\varvec{v}}_{k}^{(n)})\). This leads us to the following tensor-free algorithm.
Algorithm 10
(Tensor-free FISTA for Tikhonov)
(i) Initialization: Fix the parameters \(\alpha , \tau > 0\). Choose the starting value \(\varvec{w}^{(0)} = \varvec{0} \otimes \varvec{0}\) in \(\mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}\), and set \(\varvec{w}^{(-1)}\) to \(\varvec{w}^{(0)}\) as well as \(t_0 := 1\) and \(\beta _0 := 0\).
(ii) Iteration: For \(n \ge 0\), update \(\varvec{w}^{(n)}\), \(t_n\), and \(\beta _n\):
(a) Determine \(\varvec{y}^{(n)}\) by (20).
(b) Compute a low-rank representation \(\varvec{w}^{(n+1)} = \tilde{\varvec{V}}^{(n+1)} \varvec{\Sigma }^{(n+1)} (\tilde{\varvec{U}}^{(n+1)})^*\) of the singular value thresholding
$$\begin{aligned} \mathcal {S}_{\tau \alpha } ( (1+\beta _n) \varvec{w}^{(n)} - \beta _n \varvec{w}^{(n-1)}- \tau \, \breve{\mathcal {B}}^*(\varvec{y}^{(n)})) \end{aligned}$$
with Algorithm 8 (or 5). The required actions are given in (21) and (22).
(c) Set
$$\begin{aligned} t_{n+1} := \frac{1 + \sqrt{1 + 4 t_n^2}}{2} \qquad \text {and}\qquad \beta _{n+1} := \frac{t_n - 1}{t_{n+1}}. \end{aligned}$$
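Step (c) is the standard FISTA step-size recursion. As a tiny worked example, starting from \(t_0 = 1\), the first update gives \(t_1 = (1 + \sqrt{5})/2 \approx 1.618\) and \(\beta _1 = 0\); in code:

```python
import numpy as np

def fista_params(t):
    """One update of step (c) in Algorithm 10: returns t_{n+1} and beta_{n+1}."""
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return t_next, (t - 1.0) / t_next
```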
Since FISTA and the primal-dual method both require two evaluations of the lifted operator and one singular value thresholding in every iteration, the numerical complexity of both algorithms is comparable.
Adapting the computation of \(\varvec{y}^{(n+1)}\), one may analogously apply Algorithms 2 and 3 in a completely tensor-free manner. Because the singular value thresholding can be computed with arbitrarily high accuracy, the convergence results for the primal-dual algorithm translate to our setting. The convergence analysis [18, Thm. 1] yields the following convergence guarantee, where the norm of the bilinear operator \(\mathcal {B}\) is defined by
$$\begin{aligned} ||\mathcal {B}|| := \sup _{\varvec{u} \in \mathbb {R}^{N_1} \setminus \{ \varvec{0}\}} \, \sup _{\varvec{v} \in \mathbb {R}^{N_2} \setminus \{ \varvec{0}\}} \, \frac{||\mathcal {B}(\varvec{u}, \varvec{v})||_{\varvec{K}}}{||\varvec{u}||_{\varvec{H}_1} \, ||\varvec{v}||_{\varvec{H}_2}}. \end{aligned}$$
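The step-size condition of the next theorem requires at least an estimate of \(||\mathcal {B}||\). One simple heuristic, not discussed in the references above, is an alternating (rank-one) power iteration built from the partial adjoints of Lemma 5; the following sketch returns a lower bound of \(||\mathcal {B}||\) in general, so a safety margin in the choice of \(\tau \sigma \) is advisable.

```python
import numpy as np

def bilinear_norm_estimate(B, B_left_adj, B_right_adj, H1, H2, K, n_iter=50):
    """Alternating power iteration for ||B|| (a lower bound in general);
    B_left_adj(u, y) = [B(u, .)]^*(y) and B_right_adj(v, y) = [B(., v)]^*(y)."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(H1.shape[0]); u /= np.sqrt(u @ H1 @ u)
    v = rng.standard_normal(H2.shape[0]); v /= np.sqrt(v @ H2 @ v)
    for _ in range(n_iter):
        v = B_left_adj(u, B(u, v)); v /= np.sqrt(v @ H2 @ v)   # maximize over v
        u = B_right_adj(v, B(u, v)); u /= np.sqrt(u @ H1 @ u)  # maximize over u
    z = B(u, v)
    return np.sqrt(z @ K @ z)   # = ||B(u, v)||_K for H-normalized u, v
```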
Theorem 6
(Convergence—exact primal-dual) Under the parameter choice rule \(\theta = 1\) and \(\tau \sigma ||\mathcal {B}||^2 < 1\), the iteration \((\varvec{w}^{(n)}, \varvec{y}^{(n)})\) in Algorithm 9 converges to a minimizer \((\varvec{w}^\dagger , \varvec{y}^\dagger )\) of the lifted and relaxed problem (\({\mathfrak {B}_{0}}\)).
Proof
For the general minimization problem (3), the related saddle-point problem is given by
$$\begin{aligned} \text {minimize}_{\varvec{w} \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}} \quad \text {maximize}_{\varvec{y} \in \mathbb {R}^M} \quad \langle \mathcal {A}(\varvec{w}),\varvec{y}\rangle + G(\varvec{w}) - F^*(\varvec{y}), \end{aligned}$$
(23)
cf. [18]. Hence, the bilinear relaxation with exact data (\({\mathfrak {B}_{0}}\)) corresponds to the primal-dual formulation
$$\begin{aligned} \text {minimize}_{\varvec{w} \in \mathbb {R}^{N_1} \otimes \mathbb {R}^{N_2}} \quad \text {maximize}_{\varvec{y} \in \mathbb {R}^M} \quad \langle \breve{\mathcal {B}}(\varvec{w}) - \varvec{g}^\dagger ,\varvec{y}\rangle + ||\varvec{w}||_{\pi (\varvec{H}_1, \varvec{H}_2)}. \end{aligned}$$
(24)
Due to [52, Thm. 28.3], the first components \(\tilde{\varvec{w}}\) of the saddle-points \((\tilde{\varvec{w}}, \tilde{\varvec{y}})\) of (24) are solutions of (\({\mathfrak {B}_{0}}\)). Vice versa, [52, Cor. 28.2.2] implies that the solutions \(\tilde{\varvec{w}}\) of (\({\mathfrak {B}_{0}}\)) are saddle-points of (24). In particular, the saddle-point problem (24) has at least one solution since the given data are exact.
Now, [18, Thm. 1] yields the convergence \((\varvec{w}^{(n)}, \varvec{y}^{(n)}) \rightarrow (\varvec{w}^\dagger , \varvec{y}^\dagger )\) of the primal-dual iteration in Algorithm 9, where the limit \((\varvec{w}^\dagger , \varvec{y}^\dagger )\) denotes a saddle point of (24), and \(\varvec{w}^\dagger \) thus a solution of (\({\mathfrak {B}_{0}}\)). \(\square \)
The employed subspace iteration and augmented Lanczos bidiagonalization are iterative schemes, which only calculate an approximation of the required singular value decomposition. How do these errors affect the convergence of the tensor-free primal-dual method? Using the subspace iteration, we may theoretically calculate the required singular values and vectors to arbitrary precision, which allows us to control the approximation error
$$\begin{aligned} E_n := ||\tilde{\varvec{w}}^{(n)} - \varvec{w}^{(n)}||_{\varvec{H}_1 \otimes \varvec{H}_2} \end{aligned}$$
between the exact thresholding \(\varvec{w}^{(n)}\) and its approximation \(\tilde{\varvec{w}}^{(n)}\). If the errors \(E_n\) are square-root summable, the primal-dual method nevertheless converges to the desired solution.
Theorem 7
(Convergence – inexact primal-dual) Let \(\theta = 1\) and \(\tau \sigma ||\mathcal {B}||^2 < 1\). If the series \(\sum _{n = 1}^\infty E_n^{\nicefrac 12}\) converges, then the iteration \((\varvec{w}^{(n)}, \varvec{y}^{(n)})\) in Algorithm 9 converges to a point \((\varvec{w}^\dagger , \varvec{y}^\dagger )\), where \(\varvec{w}^\dagger \) is a minimizer of the lifted and relaxed problem (\({\mathfrak {B}_{0}}\)).
Proof
Without loss of generality, we assume that the approximation errors are bounded by \(E_n \le 1\). Next, we compare the objective of the proximation function in (7) at the minimizer \(\varvec{w}^{(n)} := \mathcal {S}_\tau (\varvec{w})\) and at its approximation \(\tilde{\varvec{w}}^{(n)}\), where \(\varvec{w}\) is the argument of the singular value thresholding in the nth iteration. Exploiting that the projective (Schatten-one) norm is bounded by the Hilbertian (Schatten-two) norm up to the factor \(\sqrt{S}\), we here have
$$\begin{aligned}&\tau ||\tilde{\varvec{w}}^{(n)}||_{\pi (\varvec{H}_1, \varvec{H}_2)} +\tfrac{1}{2} ||\tilde{\varvec{w}}^{(n)} - \varvec{w}||^2_{\varvec{H}_1 \otimes \varvec{H}_2}- \tau ||\varvec{w}^{(n)}||_{\pi (\varvec{H}_1, \varvec{H}_2)} - \tfrac{1}{2}||\varvec{w}^{(n)} -\varvec{w}||^2_{\varvec{H}_1 \otimes \varvec{H}_2}\\&\quad \le \tau ||\tilde{\varvec{w}}^{(n)} - \varvec{w}^{(n)}||_{\pi (\varvec{H}_1, \varvec{H}_2)}+ \tfrac{1}{2} \bigl (||\tilde{\varvec{w}}^{(n)} - \varvec{w}^{(n)}||_{\varvec{H}_1 \otimes \varvec{H}_2} + ||\varvec{w}^{(n)} - \varvec{w}||_{\varvec{H}_1 \otimes \varvec{H}_2}\bigr )^2\\&\quad \quad -\tfrac{1}{2} ||\varvec{w}^{(n)} -\varvec{w}||^2_{\varvec{H}_1 \otimes \varvec{H}_2}\\&\quad \le \tau ||\tilde{\varvec{w}}^{(n)} - \varvec{w}^{(n)}||_{\pi (\varvec{H}_1, \varvec{H}_2)}+ \tfrac{1}{2} ||\tilde{\varvec{w}}^{(n)} - \varvec{w}^{(n)}||_{\varvec{H}_1 \otimes \varvec{H}_2}^2\\&\quad \quad + ||\tilde{\varvec{w}}^{(n)} - \varvec{w}^{(n)}||_{\varvec{H}_1 \otimes \varvec{H}_2} ||\varvec{w}^{(n)} - \varvec{w}||_{\varvec{H}_1 \otimes \varvec{H}_2}\\&\quad \le \tau E_n + \tfrac{1}{2} \, E_n^2 + \tau \sqrt{S} E_n\le \bigl ( \tau \, (\sqrt{S}+1) + \tfrac{1}{2} \, E_n \bigr ) \, E_n\le C \, E_n \end{aligned}$$
with \(S := \min \{N_1,N_2\}\) and with an appropriate constant \(C > 0\). Since \(\tilde{\varvec{w}}^{(n)}\) approximates the minimum of the proximal function with precision \(C E_n\), the calculated \(\tilde{\varvec{w}}^{(n)}\) is a so-called type-one approximation of the proximal point \(\varvec{w}^{(n)} := \mathcal {S}_\tau (\varvec{w})\) with precision \(C E_n\), see [50, p. 385]. Since the precisions \((C E_n)^{\nicefrac 12}\) are summable, [50, Thm. 2] guarantees the convergence of the inexact primal-dual method to a saddle-point \((\varvec{w}^\dagger , \varvec{y}^\dagger )\). As in the proof of Theorem 6, the first component \(\varvec{w}^\dagger \) is a solution of the lifted problem (\({\mathfrak {B}_{0}}\)). \(\square \)
Similar convergence guarantees can be obtained for the bilinear relaxations (\({\mathfrak {B}_{\epsilon }}\)) and (\({\mathfrak {B}_\alpha }\)). Depending on the considered problem, i.e. the bilinear forward operator, and on the applied proximal algorithm, one may even obtain explicit convergence rates. Recalling, for example, the recovery guarantee in Theorem 1, Algorithm 9 moreover converges to a rank-one tensor and thus, with high probability, to a solution of the bilinear inverse problem (\({\mathfrak {B}}\)). Analogous convergence results apply for other recovery guarantees for noise-free and noisy measurements.
Corollary 3
(Recovery guarantee) Let \(\mathcal {B}\) be a bilinear operator randomly generated as in (2). Then, there exist positive constants \(c_0\) and \(c_1\) such that Algorithm 9 converges to a solution of (\({\mathfrak {B}}\)) with probability at least \(1 - \mathrm {e}^{-c_1 p}\) whenever \(p \ge c_0(N_1+N_2) \log (N_1 N_2)\).