1 Introduction

The SVD plays a central role in a vast variety of applications, primarily due to its best low-rank approximation property [11, Thm. 7.4.9.1], which is valid in any unitarily invariant norm. Frequently in practice, the matrix is too large to compute the full SVD, and computing a partial and approximate SVD \(A\approx U_1\Sigma _1 V_1^*\) is of interest, where \(U_1,V_1\) are tall-skinny matrices.

The standard approach to computing an approximate SVD of a large-scale matrix \(A\in \mathbb {C}^{m\times n}\) is to project A onto lower-dimensional trial subspaces spanned by \(\widehat{U}\in \mathbb {C}^{m\times {\widehat{m}}}, \widehat{V}\in \mathbb {C}^{n\times {\widehat{n}}}\) having orthonormal columns (how to choose \(\widehat{U},\widehat{V}\) is outside the scope; we refer e.g. to [1, 7]), compute the SVD of the small \({\widehat{m}}\times {\widehat{n}}\) matrix \(\widehat{A}:=\widehat{U}^*A\widehat{V}= \widetilde{U}\widehat{\Sigma }\widetilde{V}^*\) and obtain an approximate tall-skinny SVD as \(A\approx (\widehat{U}\widetilde{U})\widehat{\Sigma }(\widehat{V}\widetilde{V})^*\), represented explicitly as a low-rank (rank \(\min ({\widehat{m}},{\widehat{n}})\)) matrix. Some of the columns of \(\widehat{U}\widetilde{U}\) and \(\widehat{V}\widetilde{V}\) then approximate the exact left and right singular vectors of A. This process is an analogue of the Rayleigh–Ritz process for the symmetric eigenvalue problem, sometimes called the Petrov–Galerkin method [2], since the residual \(R=A-(\widehat{U}\widetilde{U})\widehat{\Sigma }(\widehat{V}\widetilde{V})^*\) is orthogonal to \(\widehat{U}\) and \(\widehat{V}\) in the Galerkin sense, that is, \(\widehat{U}^*R\widehat{V}= 0\).
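To make the procedure concrete, here is a minimal NumPy sketch of the two-sided projection step just described; the matrix sizes and the randomized construction of \(\widehat{U},\widehat{V}\) are placeholder assumptions for illustration only, not choices made in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, mh, nh = 200, 150, 20, 15        # placeholder sizes (an assumption)
A = rng.standard_normal((m, n))

# Trial subspaces with orthonormal columns; here simply orthonormalized random
# sketches of the ranges of A and A^T (any construction of U_hat, V_hat would do).
U_hat, _ = np.linalg.qr(A @ rng.standard_normal((n, mh)))
V_hat, _ = np.linalg.qr(A.T @ rng.standard_normal((m, nh)))

# Project, compute the SVD of the small matrix, and lift back to the full space.
A_hat = U_hat.T @ A @ V_hat                       # mh-by-nh projected matrix
Ut, sig, Vth = np.linalg.svd(A_hat, full_matrices=False)
U1 = U_hat @ Ut                                   # approximate left singular vectors
V1 = V_hat @ Vth.T                                # approximate right singular vectors

# Rank-min(mh, nh) approximation; the Galerkin condition U_hat^* R V_hat = 0 holds.
R = A - (U1 * sig) @ V1.T
print(np.linalg.norm(U_hat.T @ R @ V_hat))        # ~1e-14
```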

For the Rayleigh–Ritz process for the symmetric eigenvalue problem, existing results are available for bounding the accuracy of the computed eigenvectors, most notably through results by Saad [20, Thm. 4.6], Knyazev [12, Thm. 4.3] and Stewart [21, Thm. 2]. Roughly speaking, these results show that the Rayleigh–Ritz process extracts an approximate set of eigenvectors that are optimal (for a fixed subspace) up to a certain constant factor, which depends on the residual norm and a gap between the exact and approximate eigenvalues (Ritz values).

This work derives analogous results for the SVD by establishing bounds for the accuracy of the computed left and right singular vectors, measured by the angle between the computed and desired subspaces. In essence, the message is the same as in the Rayleigh–Ritz process for eigenvalue problems: the projection algorithm obtains approximate sets of left and right singular vectors that are optimal to within a constant factor, again depending on the residual norm and a gap between exact and approximate singular values. The vector case (\(m_1=n_1=1\) in the notation below) was dealt with by Hochstenbach [10, Thm. 2.4]; the results here improve the bound slightly, and generalize to subspaces \(m_1,n_1>1\). A preliminary version appeared in the author’s PhD dissertation [17, Ch. 10].

Let us clarify the problem formulation. Let A be an \(m\times n\) matrix and \(\widehat{U}\in \mathbb {C}^{m\times {\widehat{m}}},\widehat{V}\in \mathbb {C}^{n\times {\widehat{n}}}\) (throughout, matrices \(U,V\) and the variants \(\widehat{U},\widehat{V},\widetilde{U},\widetilde{V}\) have orthonormal columns) represent trial subspaces that are hoped to approximately contain the subspaces spanned by sets of \(k\le \min ({\widehat{m}},{\widehat{n}})\) desired exact singular vectors \(U_1\) and \(V_1\), respectively, corresponding to some of the singular values, usually but not necessarily the largest ones. That is,

$$\begin{aligned} A=[U_1\ U_1^\perp ] \begin{bmatrix} \Sigma _1&\\ {}&\Sigma _{(\perp )} \end{bmatrix} [V_1\ V_1^\perp ]^* \end{aligned}$$
(1)

is an exact full SVD, with \(\Sigma _1\in \mathbb {R}^{k\times k}\), where the singular values are arranged in an arbitrary (not necessarily decreasing) order. Let \([\widehat{U}\ \widehat{U}_3]\in \mathbb {C}^{m\times m}\) and \([\widehat{V}\ \widehat{V}_3]\in \mathbb {C}^{n\times n}\) be square unitary matrices, and write (we are using the subscript 3 for consistency with what follows)

$$\begin{aligned} (\widetilde{A}=)\ [\widehat{U}\ \widehat{U}_3]^{*}A[\widehat{V}\ \widehat{V}_3]= \begin{bmatrix} \widehat{\Sigma }&\quad R\\ S&\quad A_3 \end{bmatrix}. \end{aligned}$$
(2)

In practice we usually do not have access to \(\widehat{U}_3\) and \(\widehat{V}_3\). Accordingly we do not know R or S, but their norms can be computed via \(\Vert S\Vert =\Vert A\widehat{V}-\widehat{U}\widehat{\Sigma }\Vert \) and \(\Vert R\Vert =\Vert \widehat{U}^*A-\widehat{\Sigma }\widehat{V}^*\Vert \), which hold for any unitarily invariant norm. Similarly, we will not need to know \(A_3\), although the assumptions made in the results will have implications on its singular values.
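As a quick sanity check of these identities, the following sketch (with arbitrary placeholder dimensions, an assumption made only for this illustration) forms the blocks of (2) explicitly and compares \(\Vert S\Vert \) and \(\Vert R\Vert \) with the computable expressions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mh, nh = 60, 50, 8, 6            # small placeholder sizes (an assumption)
A = rng.standard_normal((m, n))
U_hat, _ = np.linalg.qr(rng.standard_normal((m, mh)))
V_hat, _ = np.linalg.qr(rng.standard_normal((n, nh)))

# Complete U_hat, V_hat to square unitary matrices to form (2) explicitly;
# only the orthogonal complements U3, V3 are needed.
U_full, _ = np.linalg.qr(np.hstack([U_hat, rng.standard_normal((m, m - mh))]))
V_full, _ = np.linalg.qr(np.hstack([V_hat, rng.standard_normal((n, n - nh))]))
U3, V3 = U_full[:, mh:], V_full[:, nh:]

Sig_hat = U_hat.T @ A @ V_hat          # (1,1) block of (2)
R = U_hat.T @ A @ V3                   # (1,2) block
S = U3.T @ A @ V_hat                   # (2,1) block

# The Frobenius norms of R and S are computable without U3, V3:
print(np.allclose(np.linalg.norm(S), np.linalg.norm(A @ V_hat - U_hat @ Sig_hat)))
print(np.allclose(np.linalg.norm(R), np.linalg.norm(U_hat.T @ A - Sig_hat @ V_hat.T)))
```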

If \(\Vert R\Vert =\Vert S\Vert =0\), then \(\widehat{U}\) and \(\widehat{V}\) correspond to exact left and right singular subspaces. When \(R\) and \(S\) are both small we can expect \(\widehat{U},\widehat{V}\) to be good approximations to some singular vectors. Bounds that assess the quality of the projection subspaces \(\widehat{U}\) and \(\widehat{V}\) are derived in a classical result by Wedin [24].

The focus of this paper is the quality of the projection process as a means to extract good singular vectors from the projection subspaces. Specifically, in practice, it often happens that the whole projection space \(\widehat{U}\) is not very close to an exact singular subspace, but a subspace of \(\widehat{U}\) is close to an exact singular subspace of lower dimension (this is the case e.g. in the Jacobi–Davidson context [9], in which \(\widehat{U},\widehat{V}\) contain approximate and search subspaces). In view of this, the hope is that if \(\widehat{U}\) contains rich information about an exact singular subspace \(U_1\) (whose dimension k is usually lower than \({\widehat{m}}\)), then the projection method computes a good approximant \(\widehat{U}_1\) to \(U_1\). Quantifying this hope is the objective of this paper.

From the algorithmic viewpoint, we compute the projected matrix \(\widehat{A}=\widehat{U}^* A\widehat{V}\in \mathbb {C}^{\widehat{m}\times \widehat{n}}\) and its (full) SVD

$$\begin{aligned} \widehat{A}= \widehat{U}^* A\widehat{V}=\left[ \widetilde{U}_{1}\ \widetilde{U}_{2}\right] \begin{bmatrix} \widehat{\Sigma }_{1}&\\&\widehat{\Sigma }_{2} \end{bmatrix} \left[ \widetilde{V}_{1}\ \widetilde{V}_{2}\right] ^*,\nonumber \end{aligned}$$

where \(\widehat{\Sigma }_1\) is \(m_1\times n_1\) and \(\widehat{\Sigma }_2\) is \((\widehat{m}- m_1)\times (\widehat{n}-n_1)\); hence \(\widetilde{U}_1\in \mathbb {C}^{\widehat{m}\times m_1},\widetilde{U}_2\in \mathbb {C}^{\widehat{m}\times (\widehat{m}-m_1)},\) \(\widetilde{V}_1\in \mathbb {C}^{\widehat{n}\times n_1},\widetilde{V}_2\in \mathbb {C}^{\widehat{n}\times (\widehat{n}-n_1)}\). We do not impose any ordering on the singular values in \(\widehat{\Sigma }_1\), \(\widehat{\Sigma }_2\). Then, defining \(\widehat{U}_i=\widehat{U}\widetilde{U}_{i}\) and \(\widehat{V}_i=\widehat{V}\widetilde{V}_{i}\) for \(i=1,2\), we have

$$\begin{aligned} \widetilde{A}:=\ \left[ \widehat{U}_1\ \widehat{U}_2\ \widehat{U}_3\right] ^{*}A\left[ \widehat{V}_1\ \widehat{V}_2\ \widehat{V}_3\right] = \begin{bmatrix} \widehat{\Sigma }_1&\quad 0&\quad R_1\\ 0&\quad \widehat{\Sigma }_2&\quad R_2\\ S_1&\quad S_2&\quad A_3 \end{bmatrix}. \end{aligned}$$
(3)

The goal here is to assess the quality of the extracted subspaces by examining the angles \(\angle (U_1,\widehat{U}_1)\) and \(\angle (V_1,\widehat{V}_1)\), the angles between extracted and exact singular subspaces, as compared with \(\angle (U_1,\widehat{U})\) and \(\angle (V_1,\widehat{V})\), the angles between the trial and exact subspaces. Since \(\text{ span }(\widehat{U}_1)\subseteq \text{ span }(\widehat{U})\), we trivially have \(\angle (U_1,\widehat{U})\le \angle (U_1,\widehat{U}_1)\) and \(\angle (V_1,\widehat{V})\le \angle (V_1,\widehat{V}_1)\). The quality of the extraction is measured by how far from optimal the extracted subspaces \(\widehat{U}_1,\widehat{V}_1\) are: the extraction is considered near-optimal if \(\angle (U_1,\widehat{U}_1)\le c\angle (U_1,\widehat{U})\) and \(\angle (V_1,\widehat{V}_1)\le c\angle (V_1,\widehat{V})\) hold for a constant c not much larger than 1.

Let us recall the definition of angles between subspaces. The angles \(\theta _i\) between two subspaces spanned by \(X\in \mathbb {C}^{m\times n_X},Y\in \mathbb {C}^{m\times n_Y}\) (\(n_X\le n_Y (\le m)\)) with orthonormal columns are defined by \(\theta _i=\text{ acos }(\sigma _i(X^*Y))\); they are known as the canonical angles or principal angles [5, Thm. 6.4.3]. These are connected to the CS decomposition for unitary matrices [5, Thm. 2.5.3], [18], in which the matrix \(\big [ Y\ Y^\perp \big ]^*X\) satisfies \(\text{ diag }(Q_1,Q_2)\big (\big [ Y\ Y^\perp \big ]^*X\big )W= \left[ \begin{array}{c} C\\ S\\ 0 \end{array}\right] \), where \(Q_1\in \mathbb {C}^{n_X\times n_X}, Q_2\in \mathbb {C}^{(m-n_X)\times (m-n_X)},W\in \mathbb {C}^{n_X\times n_X}\) are all square unitary and \(C=\text{ diag }(\cos \theta _1,\ldots ,\cos \theta _{n_X}),S=\text{ diag }(\sin \theta _1,\ldots ,\sin \theta _{n_X})\). We write \(\angle (X,Y)=\text{ diag }(\theta _1,\ldots ,\theta _{n_X})\), and \(\sin \angle (X,Y)=S\).
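For concreteness, the following sketch computes the canonical angles in the two equivalent ways, via \(\text{acos}(\sigma _i(X^*Y))\) and via the singular values of \((Y^\perp )^*X\); the dimensions are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, nx, ny = 30, 3, 5                    # placeholder dimensions (an assumption)
X, _ = np.linalg.qr(rng.standard_normal((m, nx)))
Y, _ = np.linalg.qr(rng.standard_normal((m, ny)))

# cos(theta_i) = singular values of X^* Y (there are nx angles, since nx <= ny)
cosines = np.linalg.svd(X.T @ Y, compute_uv=False)

# sin(theta_i) = singular values of (Y_perp)^* X, with Y_perp an orthonormal
# basis of the orthogonal complement of span(Y).
Q, _ = np.linalg.qr(np.hstack([Y, rng.standard_normal((m, m - ny))]))
Y_perp = Q[:, ny:]
sines = np.linalg.svd(Y_perp.T @ X, compute_uv=False)

theta = np.arccos(np.clip(cosines, -1.0, 1.0))
print(np.allclose(np.sort(np.sin(theta)), np.sort(sines)))   # same set of angles
```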

Applying a unitary transformation in (3) and (1), we see that \(\angle (\widehat{U}_1,U_1)=\angle \left( \widetilde{U}_1,\left[ \begin{array}{c} I_k\\ 0\end{array}\right] \right) \) and \(\angle (\widehat{V}_1,V_1) =\angle \left( \widetilde{V}_1,\big [\begin{matrix} I_k\\ 0\end{matrix}\big ]\right) \), where \(\widetilde{U}_1,\widetilde{V}_1\) are matrices of exact singular vectors of \(\widetilde{A}\) in (3). We note that it is natural to take \(k=m_1=n_1\), although this is not necessary; k is an integer that one can choose from \(1\le k\le \min ({m_1},{n_1})\); the upper bound is required for the extracted subspace \(\widetilde{U}_1,\widetilde{V}_1\) to contain the exact subspaces.

Here is the plan of this paper. First in Sect. 2 we review the results for the Rayleigh–Ritz process in symmetric eigenvalue problems. We then derive analogous results for the SVD in Sect. 3.

Notation. \(\widehat{U},\widehat{V}\) always respectively denote the left and right trial projection subspaces. \(\widehat{U}_1\) and \(\widehat{V}_1\) are the approximate subspaces obtained via the projection method and they satisfy \(\text{ span }(\widehat{U}_1)\subseteq \text{ span }(\widehat{U}), \text{ span }(\widehat{V}_1)\subseteq \text{ span }(\widehat{V})\). \(U_1,V_1\) are certain exact singular subspaces of A that \(\widehat{U}_1,\widehat{V}_1\) approximate. \(I_n\) is the \(n\times n\) identity matrix. \(X^\perp \) denotes the orthogonal complement of X. \(\Vert \cdot \Vert _2\) denotes the spectral norm, equal to the largest singular value. \(\Vert \cdot \Vert _{F}\) is the Frobenius norm \(\Vert A\Vert _F=\sqrt{\sum _{i,j} |A_{ij}|^2}\). Identities and inequalities involving \(\Vert \cdot \Vert \) without subscripts indicate that they hold for any unitarily invariant norm, and those involving \(\Vert \cdot \Vert _{2,F}\) hold for both the spectral norm and the Frobenius norm, but not necessarily for every unitarily invariant norm.

\(\sigma _i(A)\) denotes the ith largest singular value of A. For \(A\in \mathbb {C}^{m\times n}\), by a full SVD we mean the decomposition \(A=U\Sigma V^*\) where \(U,V\) are square unitary, hence \(\Sigma \) is \(m\times n\). We define \(\sigma _{\min }(A)=\sigma _{\min (m,n)}(A)\), and

$$\begin{aligned} \underline{\sigma }_{\min }(A)= {\left\{ \begin{array}{ll} \sigma _{\min }(A)&{}\quad \text{ if } m=n,\\ 0&{}\quad \text{ otherwise. } \end{array}\right. } \end{aligned}$$
(4)

In words, \(\underline{\sigma }_{\min }(A)\) is equal to the smallest singular value when \(m=n\), but 0 otherwise. \(\lambda (A)\) denotes the set of eigenvalues of A. The key identity we use pertaining to \(\underline{\sigma }_{\min }\) is \(\Vert XY\Vert \ge \underline{\sigma }_{\min }(X)\Vert Y\Vert \) (and likewise \(\Vert YX\Vert \ge \Vert Y\Vert \,\underline{\sigma }_{\min }(X)\)), which holds for any unitarily invariant norm.

2 The Rayleigh–Ritz process for Hermitian eigenproblems and theorems of Saad, Knyazev and Stewart

The standard way of computing a subset of the eigenvalues and eigenvectors of a large sparse Hermitian (or real symmetric) matrix \(A\in \mathbb {C}^{n\times n}\) is to form a low-dimensional subspace \(\text{ span }(\widehat{X})\) with \(\widehat{X}\in \mathbb {C}^{n\times \widehat{n}}\) having orthonormal columns, which approximately contains the desired eigenvectors, and then extract approximate eigenpairs (called the Ritz pairs) from it by means of the Rayleigh–Ritz process [19, Ch. 11]. This process computes the eigendecomposition of the small Hermitian matrix \(\widehat{X}^{*}A\widehat{X}=\widetilde{X}\widehat{\Lambda }\widetilde{X}^*\), from which the Ritz values are taken as the diagonal entries of \(\widehat{\Lambda }\) and the Ritz vectors as the columns of \(\widehat{X}\widetilde{X}=:[\widehat{X}_1\ \widehat{X}_2]\). The Ritz pairs \((\widehat{\lambda },\widehat{x})\) thus obtained satisfy \(\widehat{x}\in \text{ span }(\widehat{X})\) and \(A\widehat{x}-\widehat{\lambda }\widehat{x}\perp \widehat{X}\). Observe the similarity between the Rayleigh–Ritz process and the Petrov–Galerkin projection method for the SVD.
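A minimal sketch of the Rayleigh–Ritz process for a Hermitian (here real symmetric) matrix, illustrating the Galerkin property of the Ritz pairs; the sizes and the random trial subspace are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, nh = 100, 10                          # placeholder sizes (an assumption)
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                        # real symmetric test matrix

X_hat, _ = np.linalg.qr(rng.standard_normal((n, nh)))    # trial subspace

# Rayleigh-Ritz: eigendecomposition of the small projected matrix.
lam_hat, Xt = np.linalg.eigh(X_hat.T @ A @ X_hat)
ritz_vectors = X_hat @ Xt                                 # Ritz vectors

# Each Ritz pair satisfies the Galerkin condition: residual orthogonal to X_hat.
Res = A @ ritz_vectors - ritz_vectors * lam_hat
print(np.linalg.norm(X_hat.T @ Res))                      # ~1e-14
```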

A natural question to ask is how accurate the Ritz pairs are as approximations to the exact eigenpairs. The starting point is an analogue of (3): the partitioning of the projected A as

$$\begin{aligned} (\widetilde{A}=) [\widehat{X}_{1}\ \widehat{X}_{2}\ \widehat{X}_3]^*A[\widehat{X}_{1}\ \widehat{X}_{2}\ \widehat{X}_3]= \begin{bmatrix} \widehat{\Lambda }_1&\quad 0&\quad R^*_1\\ 0&\quad \widehat{\Lambda }_2&\quad R^*_2\\ R_1&\quad R_2&\quad A_3 \end{bmatrix}, \end{aligned}$$
(5)

where \([\widehat{X}_{1}\ \widehat{X}_{2}\ \widehat{X}_3]\) is a unitary matrix, with \(\widehat{X}_{1}\in \mathbb {C}^{n\times n_1}, \widehat{X}_{2}\in \mathbb {C}^{n\times (\widehat{n}-n_1)}, \) and \(\widehat{X}_3\in \mathbb {C}^{n\times (n-\widehat{n})}\). The computable quantities are \(\widehat{\Lambda }_1,\widehat{\Lambda }_2\) and the norms of each column of \(R_1,R_2\). In the context of the Rayleigh–Ritz process, \(\text{ span }(\widehat{X})=\text{ span }([\widehat{X}_{1}\ \widehat{X}_{2}])\) is the projection subspace, \(\lambda (\widehat{\Lambda }_1)\) and \(\lambda (\widehat{\Lambda }_2)\) are the Ritz values, and the goal here is to examine the accuracy of \(\widehat{X}_{1}\) as an approximation to an exact eigenspace (or more precisely, invariant subspace) \(X_1\in \mathbb {C}^{n\times k}\) of A, satisfying \(AX_1 = X_1\Lambda _1\).

We note that bounding the eigenvalue accuracy can be done using standard eigenvalue perturbation theory: for example, by Weyl’s theorem, the Ritz values \(\widehat{\Lambda }_i\) match some of the eigenvalues of A to within \(\Vert R_i\Vert _2\) for \(i=1,2\). For individual Ritz values, the corresponding residual norm \(\Vert A\widehat{x}_i-\widehat{\lambda }_i\widehat{x}_i\Vert _2\) is a bound for the distance to the closest exact eigenvalue. Moreover, by using results in [13, 15] one can often get tighter bounds for the eigenvalue accuracy, which scale quadratically with the norm of the residual.

2.1 Bounds for approximate eigenvector

Now we turn to eigenvectors and discuss the accuracy of the Ritz vectors. In the single-vector \(n_1=k=1\) case, we bound the angle between approximate and exact eigenvectors \(\widehat{x}\) and x. Saad [20, Thm. 4.6] proves the following theorem, applicable to (5) in the case where \(\widehat{\Lambda }_1=\widehat{\lambda }\) is a scalar and \(\widehat{x}=\widehat{X}_1\) is a vector (\(\widehat{X}\) is still a subspace of dimension \(>1\)).

Theorem 1

Let A be a Hermitian matrix and let \((\lambda ,x)\) be any of its eigenpairs. Let \((\widehat{\lambda },\widehat{x})\) be a Ritz pair obtained from the subspace \(\text{ span }(\widehat{X})\), such that \(\widehat{\lambda }\) is the closest Ritz value to \(\lambda \). Suppose \(\delta >0\), where \(\delta \) is the distance between \(\lambda \) and the set of Ritz values other than \(\widehat{\lambda }\). Then

$$\begin{aligned} \sin \angle (x,\widehat{x})\le \sin \angle (x,\widehat{X})\sqrt{1+\frac{\Vert r\Vert _2^2}{\delta ^2}}, \end{aligned}$$

where \(r=A\widehat{x}-\widehat{\lambda }\widehat{x}\).

Recalling that \(\sin \angle (x,\widehat{x})\ge \sin \angle (x,\widehat{X})\) holds trivially because \(\widehat{x}\in \text{ span }(\widehat{X})\), we see that the above theorem shows that the Ritz vector is optimal up to the constant \(\sqrt{1+\frac{\Vert r\Vert _2^2}{\delta ^2}}\).

Remark 1

This does not necessarily mean, however, that performing the Rayleigh–Ritz process is always a reliable way to extract approximate eigenpairs from a subspace; care is needed especially when computing interior eigenpairs, in which \(\delta \) can be very small or zero. One remedy is to use the harmonic Rayleigh–Ritz process. For more on this issue, see for example [16] and [22, Sect. 5.1.4].

2.2 Bounds for approximate eigenspace

Knyazev [12, Thm. 4.3] derives an extension of Saad’s result (Theorem 1) to approximate eigenspaces, presenting bounds in the spectral norm applicable in the context of self-adjoint linear operators in a Hilbert space. Stewart [21, 22] proves an analogous result applicable to non-Hermitian matrices.

Here we state a result specializing to Hermitian matrices. We give a proof since we use the same line of argument later for the SVD. The bound below slightly improves the classical ones in two ways: first, the term \(\Vert R_2\Vert \) in the bounds (6), (7) is \(\Vert [R_1\ R_2]\Vert \) in [12]; second, (6) holds for any unitarily invariant norm. Essentially the same bound as (6) appears in [14, Thm. 4.1] (under relaxed conditions on the spectrum; see “Appendix”).

Theorem 2

Let A be a Hermitian matrix. Let \(\widetilde{A}\) be as in (5); \((\widehat{\Lambda }_1,\widehat{X}_1)\) is a set of Ritz pairs with \(\widehat{\Lambda }_1\in \mathbb {R}^{n_1\times n_1}\). Let \((\Lambda _1,X_1)\) with \(\Lambda _1\in \mathbb {R}^{k\times k}\), \(k\le n_1\) be a set of exact eigenpairs whose eigenvalues lie in the interval \([\lambda _0-d,\lambda _0+d]\) for some \(\lambda _0\) and \(d>0\). Suppose that \(\delta =\min |\lambda (\widehat{\Lambda }_2)-\lambda _0|-d>0\), where \(\widehat{\Lambda }_2\in \mathbb {R}^{(\widehat{n}-n_1)\times (\widehat{n}-n_1)}\) is as in (5). Then for any unitarily invariant norm

$$\begin{aligned} \left\| \sin \angle (X_1,\widehat{X}_1)\right\| \le \left\| \sin \angle (X_1,\widehat{X})\right\| \left( 1+\frac{\Vert R_2\Vert _2}{\delta }\right) , \end{aligned}$$
(6)

and for the spectral and Frobenius norms,

$$\begin{aligned} \left\| \sin \angle (X_1,\widehat{X}_1)\right\| _{2,F}\le \left\| \sin \angle (X_1,\widehat{X})\right\| _{2,F}\sqrt{1+\frac{\Vert R_2\Vert _{2}^2}{\delta ^2}}. \end{aligned}$$
(7)

Note that when applicable, (7) is slightly tighter than (6).
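Before the proof, here is a small numerical illustration of the bound (7); the test matrix (a well-separated dominant cluster) and the trial subspace (a perturbed exact eigenbasis) are assumptions made only for this check, which is of course not a substitute for the proof.

```python
import numpy as np

def sin_angles_fro(X, Y):
    # Frobenius norm of sin of the principal angles between span(X) and span(Y),
    # for X, Y with orthonormal columns and dim(X) <= dim(Y)
    return np.linalg.norm(X - Y @ (Y.T @ X))

rng = np.random.default_rng(4)
n, nh, k = 200, 12, 4                              # placeholder sizes (assumptions)
n1 = k
# Symmetric test matrix with k well-separated large eigenvalues
lam = np.concatenate([np.array([20., 19., 18., 17.]), rng.standard_normal(n - k)])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * lam) @ Q.T
X1 = Q[:, :k]                                      # exact eigenvectors for lam[:k]

# Trial subspace roughly containing span(X1) (noise level is arbitrary)
X_hat, _ = np.linalg.qr(X1 @ rng.standard_normal((k, nh)) + 1e-2 * rng.standard_normal((n, nh)))

# Rayleigh-Ritz; order Ritz pairs so the first n1 approximate (Lambda_1, X_1)
lam_hat, Xt = np.linalg.eigh(X_hat.T @ A @ X_hat)
lam_hat, Xt = lam_hat[::-1], Xt[:, ::-1]
X_hat1, X_hat2 = X_hat @ Xt[:, :n1], X_hat @ Xt[:, n1:]

# Quantities appearing in Theorem 2
lam0 = (lam[0] + lam[k - 1]) / 2                   # center of the exact eigenvalue cluster
d = (lam[0] - lam[k - 1]) / 2
delta = np.min(np.abs(lam_hat[n1:] - lam0)) - d
R2norm = np.linalg.norm(A @ X_hat2 - X_hat2 * lam_hat[n1:], 2)   # equals ||R_2||_2 in (5)

lhs = sin_angles_fro(X1, X_hat1)
rhs = sin_angles_fro(X1, X_hat) * np.sqrt(1 + R2norm**2 / delta**2)
print(delta > 0, lhs <= rhs)                       # expect: True True
```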

Proof

Let \(\widetilde{X}=\left[ \begin{array}{c} \widetilde{X}_1\\ \widetilde{X}_2\\ \widetilde{X}_3 \end{array}\right] \) be the matrix of eigenvectors of \(\widetilde{A}\) in (5) corresponding to \(\Lambda _1\), that is, \(\widetilde{X}=[\widehat{X}_{1}\ \widehat{X}_{2}\ \widehat{X}_3]^*X_1\), which has orthonormal columns. A key step is to note from the CS decomposition that

$$\begin{aligned} \Vert \sin \angle (X_1,\widehat{X})\Vert = \Vert \sigma _i((\widehat{X}^\perp )^*X_1)\Vert =\left\| \widetilde{X}_3\right\| \end{aligned}$$
(8)

and

$$\begin{aligned} \Vert \sin \angle (X_1,\widehat{X}_1)\Vert = \Vert \sigma _i((\widehat{X}_1^\perp )^*X_1)\Vert =\left\| \begin{bmatrix}\widetilde{X}_2\\ \widetilde{X}_3\end{bmatrix}\right\| . \end{aligned}$$
(9)

The second block of \(\widetilde{A}\widetilde{X}=\widetilde{X}\Lambda _1\) gives

$$\begin{aligned} \widehat{\Lambda }_2\widetilde{X}_2+ R_2^*\widetilde{X}_3=\widetilde{X}_2\Lambda _1, \end{aligned}$$

which is equivalent to

$$\begin{aligned} \widehat{\Lambda }_2\widetilde{X}_2-\widetilde{X}_2\Lambda _1=- R_2^*\widetilde{X}_3. \end{aligned}$$
(10)

We now use the following fact [23, p. 251]: Let \(F\in \mathbb {C}^{m\times m}\) and \(G\in \mathbb {C}^{n\times n}\) be such that \(1/\Vert F^{-1}\Vert -\Vert G\Vert =\delta >0\). Then for any \(W\in \mathbb {C}^{m\times n}\), we have \(\Vert W\Vert \le \frac{\Vert FW-WG\Vert }{\delta }\). We note that many results in eigenvector perturbation theory can be derived using this fact [23].

We use this in (10) as follows: Rewrite the equation as

$$\begin{aligned} (\widehat{\Lambda }_2-\lambda _0I_{\widehat{n}-n_1})\widetilde{X}_2-\widetilde{X}_2(\Lambda _1-\lambda _0I_{n_1})=- R_2^*\widetilde{X}_3, \end{aligned}$$

and take \((\widehat{\Lambda }_2-\lambda _0I_{\widehat{n}-n_1}):=F,\widetilde{X}_2:=W\), and \(\Lambda _1-\lambda _0I_{n_1}:=G\). Then by the assumptions on the spectrum we have \(1/\Vert F^{-1}\Vert \ge \min |\lambda (\widehat{\Lambda }_2)-\lambda _0|= \delta +d\) and \(\Vert G\Vert \le d\), so \(1/\Vert F^{-1}\Vert -\Vert G\Vert \ge \delta >0\). Therefore, we obtain

$$\begin{aligned} \Vert \widetilde{X}_2\Vert \le \frac{\Vert R_2^*\widetilde{X}_3\Vert }{\delta }. \end{aligned}$$

Hence using the fact [8, p. 327] that \(\Vert XY\Vert \le \Vert X\Vert _2\Vert Y\Vert \) for any unitarily invariant norm we obtain

$$\begin{aligned} \Vert \widetilde{X}_2\Vert \le \frac{\Vert R_2\Vert _2}{\delta }\Vert \widetilde{X}_3\Vert . \end{aligned}$$
(11)

Therefore, recalling (8) and (9) we have

$$\begin{aligned} \Vert \sin \angle (X_1,\widehat{X}_1)\Vert \nonumber&= \left\| \begin{bmatrix}\widetilde{X}_{2} \\ \widetilde{X}_{3} \end{bmatrix}\right\| \nonumber \le \Vert \widetilde{X}_{2}\Vert +\Vert \widetilde{X}_{3}\Vert \nonumber \\&\le (1+\frac{\Vert R_2\Vert _2}{\delta })\Vert \widetilde{X}_3\Vert =(1+\frac{\Vert R_2\Vert _2}{\delta })\Vert \sin \angle (X_1,\widehat{X})\Vert , \end{aligned}$$
(12)

which is (6). In the spectral or Frobenius norm we can use the inequality \(\Vert \big [ \begin{array}{c} A\\ B \end{array} \big ]\Vert _{2,F}\le \sqrt{\Vert A\Vert _{2,F}^2+\Vert B\Vert _{2,F}^2}\); note that this does not necessarily hold for other unitarily invariant norms, e.g. the nuclear (or trace) norm \(\Vert A\Vert _*=\sum _i \sigma _i(A)\).

We thus obtain the slightly tighter bound

$$\begin{aligned} \Vert \sin \angle (X_1,\widehat{X}_1)\Vert _{2,F}\nonumber&\le \sqrt{\Vert \widetilde{X}_{2}\Vert _{2,F}^2 +\Vert \widetilde{X}_{3}\Vert ^2_{2,F}} = \sqrt{1+\frac{\Vert R_2\Vert _2^2}{\delta ^2}}\Vert \widetilde{X}_3\Vert _{2,F}\\&= \sqrt{1+\frac{\Vert R_2\Vert _2^2}{\delta ^2}}\Vert \sin (X_1,\widehat{X})\Vert _{2,F}, \end{aligned}$$
(13)

which is (7). \(\square \)

As a historical sidenote, Theorem 2 (or more specifically the setup in (5)) is related to Question 10.2 in the classical paper by Davis and Kahan [4], where they ask for subspace angle bounds between three subspaces. Theorem 2 is related but not a direct answer to their question, since it provides bounds not on the direct subspace angles (as in Davis–Kahan’s \(\sin \theta ,\tan \theta \) theorems) but the relative quality of the extracted subspace \(\Vert \sin \angle (X_1,\widehat{X}_1)\Vert /\Vert \sin \angle (X_1,\widehat{X})\Vert \), where \(\dim \widehat{X} >\dim X_1\). Note also that the Rayleigh–Ritz process imposes a structure in the residual (the zero (1, 2) and (2, 1)-blocks) as in (5).

It is possible to relax the assumptions on the spectrum; in particular, we can allow the eigenvalues of \(\Lambda _1\) and \(\widehat{\Lambda }_2\) to interlace. To proceed straight to our main subject, the SVD, we defer the discussion to the “Appendix”.

It is perhaps worth emphasizing that the “gap” \(\delta \) is the gap between some of the Ritz values and some of the exact eigenvalues. In particular, the gap does not involve the eigenvalues of \(A_3\), in contrast to the gap that arises in the context of quadratic eigenvalue perturbation bounds [13, 15]. The same holds for the results for the SVD as we describe next.

3 Accuracy bounds for approximate singular subspaces

We now turn to our main subject, the SVD. Essentially, the goal is to derive the SVD analogues of Theorem 2. Regarding the accuracy of the singular values, just as in the symmetric eigenproblem, the errors can be bounded using standard perturbation theory, most importantly Weyl’s bound; bounds that scale quadratically with the residual are also known [13]. Our focus here is the accuracy of the singular vectors.

Two approaches are commonly employed for extending results for the symmetric eigenproblem to the SVD of \(A\in \mathbb {C}^{m\times n}\) (the discussion here assumes \(m\ge n\)):

  1. Work with the Gram matrices \(A^*A\) and \(AA^*\), which respectively have V and \([U\ U^{\perp }]\) as the eigenvector matrices.

  2. Work with the Jordan–Wielandt matrix \(\big [\begin{array}{cc} 0&{} A^*\\ A&{}0 \end{array}\big ]\), whose eigenvalues are \(\pm \sigma _i(A)\), along with \(|m-n|\) copies of 0. The eigenvectors are (assuming \(m\ge n\)) \(\big [\begin{array}{ccc} U&{} U&{} U^{\perp }\\ V&{}-V &{} 0 \end{array} \big ]\), where \(U,V\) are the matrices of left and right singular vectors of A, and \(U^{\perp }\in \mathbb {C}^{m\times (m-n)}\) is the orthogonal complement of U.

We avoid the Gram matrices because \(A^*A\) has eigenvalues \(\sigma ^2_i(A)\), through which the gap will be defined, and because of the squaring, the gap becomes smaller especially for the small singular values, resulting in unnecessarily large bounds.

Working with the Jordan–Wielandt matrix is an effective approach, and would give a bound on \(\angle \big ( \big [\begin{array}{c} \widehat{U}\\ \widehat{V}\end{array} \big ],\big [\begin{array}{c} U\\ V\end{array}\big ]\big )\) instead of \(\angle (\widehat{U},U)\) and \(\angle (\widehat{V},V)\). By using the technique employed in [23, p. 261] it is possible to deduce bounds on \(\angle (U_1,\widehat{U}_1)\) and \(\angle (V_1,\widehat{V}_1)\). Hochstenbach [10, Thm. 2.4] takes this approach to derive a bound for singular vectors (the case \(m_1=n_1=k=1\) in our setting). A slight issue with this approach is the presence of the \(|m-n|\) extra eigenvalues at 0, which can result in the gap quantity (\(\delta \) below) being unnecessarily small; indeed in [10, Thm. 2.4] the gap is defined differently depending on whether \(m=n\) or not. Our result below shows that this distinction is unnecessary (see remark at the end of Sect. 3.1). In addition, we remove a factor 2 in the bound of [10], and more importantly, extend the analysis to the subspace case \(m_1,n_1,k>1\).
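As a small illustration of the Jordan–Wielandt structure (sizes are placeholder assumptions): its eigenvalues consist of \(\pm \sigma _i(A)\) together with \(|m-n|\) zeros, which is the source of the extra zero eigenvalues just mentioned.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 7, 4                                      # placeholder sizes (an assumption)
A = rng.standard_normal((m, n))
J = np.block([[np.zeros((n, n)), A.T],
              [A, np.zeros((m, m))]])            # Jordan-Wielandt matrix

sig = np.linalg.svd(A, compute_uv=False)
eig = np.sort(np.linalg.eigvalsh(J))
expected = np.sort(np.concatenate([sig, -sig, np.zeros(m - n)]))
print(np.allclose(eig, expected))                # eigenvalues are +/-sigma_i and |m-n| zeros
```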

For the above reasons, in what follows we take neither of these two approaches, and instead work directly with the matrix A and its projection.

3.1 Main result

We shall prove the following, which is the main result of this paper.

Theorem 3

Let \(A\in \mathbb {C}^{m\times n}\). Let \(\widehat{U}=[\widehat{U}_1\ \widehat{U}_2]\in \mathbb {C}^{m\times {\widehat{m}}}\) and \(\widehat{V}=[\widehat{V}_1\ \widehat{V}_2]\in \mathbb {C}^{n\times {\widehat{n}}}\) have orthonormal columns with \(\widehat{U}_1\in \mathbb {C}^{m\times m_1}\) and \(\widehat{V}_1\in \mathbb {C}^{n\times n_1}\), let \(\widetilde{A},\widehat{\Sigma }_i,R_i,S_i,A_3\) be as defined in (3), and let (1) be an exact SVD with a singular triplet \((\Sigma _1,U_1,V_1)\), with \(\Sigma _1\in \mathbb {R}^{k\times k}\). Define (recall the definition of \(\underline{\sigma }_{\min }\) in (4))

$$\begin{aligned} \delta = \max \left( \sigma _{\min }(\Sigma _1)-\Vert \widehat{\Sigma }_2\Vert _2,\ \underline{\sigma }_{\min }(\widehat{\Sigma }_2)-\Vert \Sigma _1\Vert _2\right) . \end{aligned}$$
(14)

If \(\delta >0\), then

$$\begin{aligned} \max&(\Vert \sin \angle (U_1,\widehat{U}_1)\Vert ,\Vert \sin \angle (V_1,\widehat{V}_1)\Vert )\nonumber \\&\le \left( 1+\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\right) \max (\Vert \sin \angle (U_1,\widehat{U})\Vert ,\Vert \sin \angle (V_1,\widehat{V})\Vert ) \end{aligned}$$
(15)

and

$$\begin{aligned} \Vert \sin&\angle (U_1,\widehat{U}_1)\Vert +\Vert \sin \angle (V_1,\widehat{V}_1)\Vert \nonumber \\&\le \left( 1+\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\right) \left( \Vert \sin \angle (U_1,\widehat{U})\Vert +\Vert \sin \angle (V_1,\widehat{V})\Vert \right) \end{aligned}$$
(16)

in any unitarily invariant norm. Moreover, in the spectral and Frobenius norms

$$\begin{aligned}&\max (\Vert \sin \angle (U_1,\widehat{U}_1)\Vert _{2,F},\Vert \sin \angle (V_1,\widehat{V}_1)\Vert _{2,F})\nonumber \\&\le \sqrt{1+\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2)^2 }{\delta ^2}} \max (\Vert \sin \angle (U_1,\widehat{U})\Vert _{2,F},\Vert \sin \angle (V_1,\widehat{V})\Vert _{2,F}). \nonumber \\ \end{aligned}$$
(17)

Proof

Let \(\left( \Sigma _1,\widetilde{U}_1,\widetilde{V}_1\right) \) be exact singular triplets of \(\widetilde{A}\), that is, \(\widetilde{U}_1=[\widehat{U}_1\ \widehat{U}_2\ \widehat{U}_3]^{*}U_1\) and \(\widetilde{V}_1=[\widehat{V}_1\ \widehat{V}_2\ \widehat{V}_3]^{*}V_1\), and partition \(\widetilde{V}_1=\left[ \begin{array}{c}\widetilde{V}_{11}\\ \widetilde{V}_{21}\\ \widetilde{V}_{31} \end{array}\right] , \widetilde{U}_1=\left[ \begin{array}{c}\widetilde{U}_{11} \\ \widetilde{U}_{21}\\ \widetilde{U}_{31} \end{array}\right] \), so that

$$\begin{aligned} \left[ \begin{array}{c@{\quad }c@{\quad }c} \widehat{\Sigma }_1&{}0&{} R_1\\ 0&{}\widehat{\Sigma }_2&{} R_2\\ S_1&{} S_2&{} A_3 \end{array}\right] \left[ \begin{array}{c}\widetilde{V}_{11}\\ \widetilde{V}_{21}\\ \widetilde{V}_{31} \end{array}\right] = \left[ \begin{array}{c}\widetilde{U}_{11} \\ \widetilde{U}_{21}\\ \widetilde{U}_{31} \end{array}\right] \Sigma _1, \end{aligned}$$
(18)

and

$$\begin{aligned} \left[ \begin{array}{c}\widetilde{U}_{11}^{*} \ \widetilde{U}_{21}^{*}\ \widetilde{U}_{31}^{*} \end{array}\right] \left[ \begin{array}{c@{\quad }c@{\quad }c} \widehat{\Sigma }_1&{}0&{} R_1\\ 0&{}\widehat{\Sigma }_2&{} R_2\\ S_1&{} S_2&{} A_3 \end{array}\right] = \Sigma _1\left[ \begin{array}{c}\widetilde{V}_{11}^{*}\ \widetilde{V}_{21}^{*}\ \widetilde{V}_{31}^{*} \end{array}\right] . \end{aligned}$$
(19)

As in (8), (9) we have

$$\begin{aligned} \Vert \sin \angle (U_1,\widehat{U}_1)\Vert =\left\| \left[ \begin{array}{c}\widetilde{U}_{21} \\ \widetilde{U}_{31} \end{array}\right] \right\| , \quad \Vert \sin \angle (V_1,\widehat{V}_1)\Vert =\left\| \left[ \begin{array}{c}\widetilde{V}_{21} \\ \widetilde{V}_{31} \end{array}\right] \right\| , \end{aligned}$$
(20)

and recalling that \(\widehat{U}=[\widehat{U}_1\ \widehat{U}_2]\) and \(\widehat{V}=[\widehat{V}_1\ \widehat{V}_2]\),

$$\begin{aligned} \Vert \sin \angle (U_1,\widehat{U})\Vert =\Vert \widetilde{U}_{31} \Vert , \quad \Vert \sin \angle (V_1,\widehat{V})\Vert =\Vert \widetilde{V}_{31} \Vert . \end{aligned}$$
(21)

To establish the theorem we shall bound \(\Vert \widetilde{U}_{21}\Vert \) with respect to \(\Vert \widetilde{U}_{31}\Vert \), and similarly bound \(\Vert \widetilde{V}_{21}\Vert \) with respect to \(\Vert \widetilde{V}_{31}\Vert \).

From the second block of (18) we obtain

$$\begin{aligned} \widehat{\Sigma }_2\widetilde{V}_{21}+ R_2\widetilde{V}_{31}=\widetilde{U}_{21}\Sigma _1, \end{aligned}$$
(22)

and from the second block of (19) we get

$$\begin{aligned} \widetilde{U}_{21}^{*}\widehat{\Sigma }_2+\widetilde{U}_{31}^{*} S_2=\Sigma _1\widetilde{V}_{21}^{*}. \end{aligned}$$
(23)

Now suppose that the maximum in (14) is attained by the first term, that is, \(\delta = \sigma _{\min }(\Sigma _1)-\Vert \widehat{\Sigma }_2\Vert _2>0\). Then, taking norms and using the triangle inequality and the fact \(\Vert XY\Vert \le \Vert X\Vert _2\Vert Y\Vert \) in (22) and (23), we obtain

$$\begin{aligned} {\begin{matrix} \Vert \widetilde{U}_{21}\Vert \sigma _{\min }(\Sigma _1)-\Vert \widetilde{V}_{21}\Vert \Vert \widehat{\Sigma }_2\Vert _2&{}\le \Vert R_2\widetilde{V}_{31}\Vert , \\ \Vert \widetilde{V}_{21}\Vert \sigma _{\min }(\Sigma _1)-\Vert \widetilde{U}_{21}\Vert \Vert \widehat{\Sigma }_2\Vert _2&{}\le \Vert \widetilde{U}_{31}^{*} S_2\Vert . \end{matrix}} \end{aligned}$$
(24)

By adding the first inequality times \(\sigma _{\min }(\Sigma _1)\) and the second inequality times \(\Vert \widehat{\Sigma }_2\Vert _2\), we eliminate the \(\Vert \widetilde{V}_{21}\Vert \) term, and recalling the assumption \(\sigma _{\min }(\Sigma _1)>\Vert \widehat{\Sigma }_2\Vert _2\) we obtain

$$\begin{aligned} \Vert \widetilde{U}_{21}\Vert \le \frac{ \sigma _{\min }(\Sigma _1)\Vert R_2\widetilde{V}_{31}\Vert + \Vert \widehat{\Sigma }_2\Vert _2\Vert \widetilde{U}_{31}^{*}S_2\Vert }{(\sigma _{\min }(\Sigma _1))^2-\Vert \widehat{\Sigma }_2\Vert _2^2}. \end{aligned}$$
(25)

We can similarly obtain

$$\begin{aligned} \Vert \widetilde{V}_{21}\Vert \le \frac{ \sigma _{\min }(\Sigma _1)\Vert \widetilde{U}_{31}^{*}S_2\Vert + \Vert \widehat{\Sigma }_2\Vert _2\Vert R_2\widetilde{V}_{31}\Vert }{(\sigma _{\min }(\Sigma _1))^2-\Vert \widehat{\Sigma }_2\Vert _2^2}. \end{aligned}$$
(26)

Again using the fact \(\Vert XY\Vert \le \Vert X\Vert _2\Vert Y\Vert \) we have

$$\begin{aligned} \max (\Vert \widetilde{U}_{21}\Vert ,\Vert \widetilde{V}_{21}\Vert )&\le \frac{\max (\Vert \widetilde{U}_{31}^{*}S_2\Vert ,\Vert R_2\widetilde{V}_{31}\Vert ) }{\sigma _{\min }(\Sigma _1)-\Vert \widehat{\Sigma }_2\Vert _2}\nonumber \\&\le \frac{\max (\Vert \widetilde{U}_{31}^{*}\Vert ,\Vert \widetilde{V}_{31}\Vert ) \max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\sigma _{\min }(\Sigma _1)-\Vert \widehat{\Sigma }_2\Vert _2}\nonumber \\&= \frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta } \max (\Vert \widetilde{U}_{31}^{*}\Vert ,\Vert \widetilde{V}_{31}\Vert ). \end{aligned}$$
(27)

Now if the maximum in (14) is attained by the second term, that is, \(\delta =\underline{\sigma }_{\min }(\widehat{\Sigma }_2)-\Vert \Sigma _1\Vert _2>0\), we proceed similarly, replacing (24) with

$$\begin{aligned} {\begin{matrix} \Vert \widetilde{V}_{21}\Vert \,\underline{\sigma }_{\min }(\widehat{\Sigma }_2)-\Vert \widetilde{U}_{21}\Vert \Vert \Sigma _1\Vert _2&{}\le \Vert R_2\widetilde{V}_{31}\Vert , \\ \Vert \widetilde{U}_{21}\Vert \,\underline{\sigma }_{\min }(\widehat{\Sigma }_2)-\Vert \widetilde{V}_{21}\Vert \Vert \Sigma _1\Vert _2&{}\le \Vert \widetilde{U}_{31}^{*} S_2\Vert \end{matrix}} \end{aligned}$$

(using the facts \(\Vert \widehat{\Sigma }_2 X\Vert \ge \underline{\sigma }_{\min }(\widehat{\Sigma }_2)\Vert X\Vert \) and \(\Vert X\widehat{\Sigma }_2\Vert \ge \Vert X\Vert \,\underline{\sigma }_{\min }(\widehat{\Sigma }_2)\)) to arrive at the same conclusion (27).

Recalling (20) and (21), we obtain

$$\begin{aligned} \max&(\Vert \sin \angle (U_1,\widehat{U}_1)\Vert ,\Vert \sin \angle (V_1,\widehat{V}_1)\Vert )=\max \left( \left\| \begin{bmatrix}\widetilde{U}_{21} \\ \widetilde{U}_{31} \end{bmatrix}\right\| , \left\| \begin{bmatrix}\widetilde{V}_{21} \\ \widetilde{V}_{31} \end{bmatrix}\right\| \right) \\&\le \max ( \Vert \widetilde{U}_{21}\Vert +\Vert \widetilde{U}_{31}\Vert , \Vert \widetilde{V}_{21}\Vert +\Vert \widetilde{V}_{31}\Vert ) \nonumber \\&\le \left( 1+\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\right) \max (\Vert \widetilde{U}_{31}\Vert ,\Vert \widetilde{V}_{31}\Vert )\\&= \left( 1+\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\right) \max (\Vert \sin \angle (U_1,\widehat{U})\Vert ,\Vert \sin \angle (V_1,\widehat{V})\Vert ), \end{aligned}$$

giving (15). In the spectral or Frobenius norm we use the inequality \(\Vert \big [ \begin{matrix} A\\ B \end{matrix} \big ]\Vert _{2,F}\le \sqrt{\Vert A\Vert ^2_{2,F}+\Vert B\Vert _{2,F}^2}\) to obtain the slightly tighter bound

$$\begin{aligned}&\max \{\Vert \sin \angle (U_1,\widehat{U}_1)\Vert _{2,F},\Vert \sin \angle (V_1,\widehat{V}_1)\Vert _{2,F}\}\nonumber =\max \left( \left\| \begin{bmatrix}\widetilde{U}_{21} \\ \widetilde{U}_{31} \end{bmatrix}\right\| _{2,F}, \left\| \begin{bmatrix}\widetilde{V}_{21} \\ \widetilde{V}_{31} \end{bmatrix}\right\| _{2,F}\right) \nonumber \\&\le \max ( \sqrt{\Vert \widetilde{U}_{21}\Vert _{2,F}^2 +\Vert \widetilde{U}_{31}\Vert _{2,F}^2}, \sqrt{\Vert \widetilde{V}_{21}\Vert _{2,F}^2 +\Vert \widetilde{V}_{31}\Vert _{2,F}^2}) \nonumber \\&\le \sqrt{1+\big (\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\big )^2}\max (\Vert \sin \angle (U_1,[\widehat{U}_1\ \widehat{U}_2])\Vert _{2,F},\Vert \sin \angle (V_1,[\widehat{V}_1\ \widehat{V}_2])\Vert _{2,F}). \end{aligned}$$

To obtain (16), adding the two inequalities (25) and (26) we have

$$\begin{aligned} \Vert \widetilde{U}_{21}\Vert +\Vert \widetilde{V}_{21}\Vert \le \frac{ \Vert R_2\widetilde{V}_{31}\Vert + \Vert \widetilde{U}_{31}^{*}S_2\Vert }{\sigma _{\min }(\Sigma _1)-\Vert \widehat{\Sigma }_2\Vert _2} \le \frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\left( \Vert \widetilde{U}_{31}\Vert +\Vert \widetilde{V}_{31}\Vert \right) , \end{aligned}$$
(28)

where as before, the last bound holds in both cases \(\delta =\sigma _{\min }(\Sigma _1)-\Vert \widehat{\Sigma }_2\Vert _2\) and \(\delta =\underline{\sigma }_{\min }(\widehat{\Sigma }_2)-\Vert \Sigma _1\Vert _2\). Hence we have

$$\begin{aligned} \Vert \sin \angle (U_1,\widehat{U}_1)\Vert&+\Vert \sin \angle (V_1,\widehat{V}_1)\Vert = \left\| \begin{bmatrix}\widetilde{U}_{21} \\ \widetilde{U}_{31} \end{bmatrix}\right\| + \left\| \begin{bmatrix}\widetilde{V}_{21} \\ \widetilde{V}_{31} \end{bmatrix}\right\| \\&\le \Vert \widetilde{U}_{21}\Vert +\Vert \widetilde{U}_{31}\Vert + \Vert \widetilde{V}_{21}\Vert +\Vert \widetilde{V}_{31}\Vert \nonumber \\&\le \left( 1+\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\right) (\Vert \widetilde{U}_{31}\Vert +\Vert \widetilde{V}_{31}\Vert )\\&= \left( 1+\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\right) \left( \Vert \sin \angle (U_1,\widehat{U})\Vert +\Vert \sin \angle (V_1,\widehat{V})\Vert \right) , \end{aligned}$$

completing the proof. \(\square \)

Theorem 3 shows that the singular vectors extracted by the projection method are optimal up to the factor \(\sqrt{1+\big (\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\big )^2}\) in the spectral and Frobenius norms, and up to the factor \(1+\frac{\max (\Vert R_2\Vert _2,\Vert S_2\Vert _2) }{\delta }\) in any unitarily invariant norm. Note the similarity of these factors between Theorems 2 and 3.

The quantity \(\delta \) in the theorem plays essentially the same role as in the results in Sect. 2, and is the SVD analogue of the spectral gap. In a typical situation where the largest singular values are sought and \(m_1=n_1=k\), we have \(\widehat{\Sigma }_1\approx \Sigma _1\) with \(\Sigma _1\) containing the k largest singular values, and all the singular values of \(\widehat{\Sigma }_2\) are smaller than \(\sigma _{\min }(\Sigma _1)=\sigma _k(A)\), and thus \(\delta =\sigma _{\min }(\Sigma _1)-\Vert \widehat{\Sigma }_2\Vert _2>0\). Similarly, when the smallest singular values are sought, \(\widehat{\Sigma }_1\approx \Sigma _1\) with \(\Sigma _1\) containing the k smallest singular values, \(\widehat{\Sigma }_2\) is square and all its singular values are larger than \(\sigma _{\min (m,n)-k+1}(A)=\Vert \Sigma _1\Vert _2\), and thus \(\delta =\underline{\sigma }_{\min }(\widehat{\Sigma }_2)-\Vert \Sigma _1\Vert _2>0\). It is perhaps worth noting that when \(m\ne n\), the gap quantity given in [10, Thm. 2.4] additionally involves the distance of the sought singular values to zero (coming from the \(|m-n|\) zero eigenvalues of the Jordan–Wielandt matrix), and hence can be unnecessarily small when small singular values are sought.
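The following sketch illustrates Theorem 3 numerically in the typical largest-singular-value scenario just discussed; the test matrix, the gap in its spectrum, and the perturbed trial subspaces are assumptions made only for this check.

```python
import numpy as np

def sin_angle_2norm(X, Y):
    # ||sin(angle(X, Y))||_2 for X, Y with orthonormal columns, dim(X) <= dim(Y)
    return np.linalg.norm(X - Y @ (Y.T @ X), 2)

rng = np.random.default_rng(7)
m, n, mh, nh, k = 300, 200, 15, 12, 5                   # placeholder sizes (assumptions)
# A with a gap after the k-th singular value
sig = np.concatenate([np.array([10., 9., 8., 7., 6.]), rng.uniform(0, 1, min(m, n) - k)])
Uex, _ = np.linalg.qr(rng.standard_normal((m, m)))
Vex, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Uex[:, :min(m, n)] * sig) @ Vex[:, :min(m, n)].T
U1ex, V1ex = Uex[:, :k], Vex[:, :k]                     # exact dominant singular subspaces

# Trial subspaces roughly containing the exact ones (noise level is arbitrary)
U_hat, _ = np.linalg.qr(U1ex @ rng.standard_normal((k, mh)) + 1e-2 * rng.standard_normal((m, mh)))
V_hat, _ = np.linalg.qr(V1ex @ rng.standard_normal((k, nh)) + 1e-2 * rng.standard_normal((n, nh)))

# Projection method; singular values come out in descending order, so the first
# k approximate Sigma_1 and the rest form Sigma_2_hat
Ut, _, Vth = np.linalg.svd(U_hat.T @ A @ V_hat)
U_hat1, U_hat2 = U_hat @ Ut[:, :k], U_hat @ Ut[:, k:]
V_hat1, V_hat2 = V_hat @ Vth.T[:, :k], V_hat @ Vth.T[:, k:]

# Quantities in Theorem 3 (largest-singular-value case)
Sig2_hat = U_hat2.T @ A @ V_hat2
delta = sig[k - 1] - np.linalg.norm(Sig2_hat, 2)             # sigma_min(Sigma_1) - ||Sigma_2_hat||_2
R2 = np.linalg.norm(U_hat2.T @ A - Sig2_hat @ V_hat2.T, 2)   # = ||R_2||_2
S2 = np.linalg.norm(A @ V_hat2 - U_hat2 @ Sig2_hat, 2)       # = ||S_2||_2

lhs = max(sin_angle_2norm(U1ex, U_hat1), sin_angle_2norm(V1ex, V_hat1))
rhs = np.sqrt(1 + max(R2, S2)**2 / delta**2) * max(sin_angle_2norm(U1ex, U_hat),
                                                   sin_angle_2norm(V1ex, V_hat))
print(delta > 0, lhs <= rhs)     # a numerical check of (17), not a proof
```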

3.2 When one-sided projection is used

Some of the state-of-the-art algorithms for an approximate SVD, such as [6, 7], rely on one-sided projection methods, instead of the two-sided projection described so far. In this case one would approximate \(A\approx (\widehat{U} \widetilde{U})\widehat{\Sigma }\widetilde{V}^*\), where \(\widetilde{U}\widehat{\Sigma }\widetilde{V}^*=\widehat{U}^*A\) is the SVD of the \({\widehat{m}}\times n\) matrix \(\widehat{U}^*A\), obtained by projecting A only from the left by \(\widehat{U}\) (or only from the right by \(\widehat{V}\); we discuss left projection for definiteness).

Although one-sided projection is merely a special case of two-sided projection (as it corresponds to taking \(\widehat{V}=I_n\)), given its practical importance and the simplifications that result, we restate the above results specialized to one-sided projection. In this case we do not have \(\widehat{V}_3\), and we start with the equation

$$\begin{aligned} (\widetilde{A}=)\ [\widehat{U}_1\ \widehat{U}_2\ \widehat{U}_3]^{*}A[\widehat{V}_1\ \widehat{V}_2]= \begin{bmatrix} \widehat{\Sigma }_1&\quad 0\\ 0&\quad \widehat{\Sigma }_2\\ S_1&\quad S_2 \end{bmatrix}. \end{aligned}$$
(29)

Corollary 1

In the notation of Theorem 3, suppose that \(\text{ span }(\widehat{V})=\mathbb {C}^{n}\). Then

$$\begin{aligned} \Vert \sin \angle (U_1,\widehat{U}_1)\Vert + \Vert \sin \angle (V_1,\widehat{V}_1)\Vert \le \left( 1+\frac{\Vert S_2\Vert _2 }{\delta }\right) \Vert \sin \angle (U_1,\widehat{U})\Vert \end{aligned}$$
(30)

in any unitarily invariant norm. Furthermore, in the spectral and Frobenius norms,

$$\begin{aligned} \max (\Vert \sin \angle (U_1,\widehat{U}_1)\Vert _{2,F},&\Vert \sin \angle (V_1,\widehat{V}_1)\Vert _{2,F}) \le \sqrt{1+\frac{\Vert S_2\Vert _{2,F}^2 }{\delta ^2}} \Vert \sin \angle (U_1,\widehat{U})\Vert _{2,F}. \end{aligned}$$
(31)

Proof

The bounds are obtained essentially by taking \(R_i=0\) in Theorem 3, and noting that \(\angle (V_1,\widehat{V})=0\) because \(\widehat{V}\) spans the entire space \(\mathbb {C}^{n}\). \(\square \)

Note that the inequality corresponding to (15) becomes strictly weaker than (30), and is thus omitted.
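A minimal sketch of the one-sided (left) projection just described, with a randomized range sketch standing in for \(\widehat{U}\) (a placeholder assumption; the choice of \(\widehat{U}\) is outside the scope here):

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, mh = 250, 180, 12                          # placeholder sizes (an assumption)
A = rng.standard_normal((m, n))

U_hat, _ = np.linalg.qr(A @ rng.standard_normal((n, mh)))   # placeholder trial subspace
Ut, s_hat, Vth = np.linalg.svd(U_hat.T @ A, full_matrices=False)
U_approx = U_hat @ Ut                            # approximate left singular vectors
V_approx = Vth.T                                 # approximate right singular vectors

# Here span(V_hat) is all of C^n (V_hat = I_n), so angle(V_1, V_hat) = 0 and only
# the S-blocks appear in (29); the approximation quality is governed by U_hat alone.
print(np.linalg.norm(A - (U_approx * s_hat) @ V_approx.T))   # approximation error
```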

3.3 Subspace corresponding to smallest singular values

This work is expected to be most relevant in the context of finding an approximate and truncated SVD, in which one approximates the largest singular values and their associated subspaces. Nonetheless, the results obtained here are applicable to any set of singular vectors: for example, Theorem 3 can be used when \(\widehat{U}_1,\widehat{V}_1\) approximate the singular subspaces corresponding to certain interior singular values. The same holds for the smallest singular values.

However, practical difficulties arise when one is not looking for the largest singular subspaces. The first difficulty is related to Remark 1 before Sect. 2.2; when computing interior singular values, “spurious” approximate singular values can be very close to an exact sought singular value, while having corresponding singular vectors that are nowhere near the correct ones; this phenomenon would manifest itself as \(\delta \) being small. As mentioned at the beginning of Sect. 3, using the Jordan–Wielandt matrix introduces extra zero eigenvalues, resulting in unnecessarily small gaps for the smallest singular values—worsening the issue with spurious singular values. We avoid this by working directly with A. To remedy an (unavoidable) small gap, an approach analogous to harmonic Rayleigh–Ritz [10] might be helpful; the analysis of such alternatives would also be of interest.

Another difficulty is that even if the bounds in Theorem 3 are sufficiently small, this does not guarantee that \(\widehat{U}_1,\widehat{V}_1\) capture the whole left and right null space of A: specifically, \(U_1,V_1\) in the theorems above may merely be a subset of the null spaces, of the same size as \(\widehat{U}_1,\widehat{V}_1\). Unfortunately there appears to be no easy way to check that \(\widehat{U}_1,\widehat{V}_1\) contain the entire desired subspace (contrast this with the case where \(\widehat{U}_1,\widehat{V}_1\) approximate the largest singular subspaces, in which case \(A-\widehat{U}_1\widehat{\Sigma }_1\widehat{V}_1^*\) being small indicates that \(\widehat{U}_1,\widehat{V}_1\) contain the desired subspaces).

3.3.1 Finding null space

One situation that Theorem 3 does not cover is when the null space of a (fat) rectangular matrix is desired. For definiteness, suppose \(A\in \mathbb {C}^{m\times n}\) with \(m< n\) and one wants to compute a matrix V of \(n-m\) null vectors such that \(AV=0\). Then in the notation of Theorem 3, we would like to take \(\Sigma _1\) to be the “\(0\times (n-m)\)” matrix of zeros, which the theorem does not account for.

In this case we can modify the argument as follows.

Proposition 1

Let \(A\in \mathbb {C}^{m\times n}\) with \(m<n\), and let \(\widetilde{A},\widehat{\Sigma }_i,\widehat{U}_i,\widehat{V}_i,R_i,S_i,A_3\) be as defined in (3). Let \(V_1\in \mathbb {C}^{n\times k} \) with \(k\le n-m\), \(k\le m_1,n_1\) have orthonormal columns spanning a subspace of the null space of A, so that \(AV_1=0\). Then

$$\begin{aligned} \Vert \sin \angle (V_1,\widehat{V}_1)\Vert \le \left( 1+\frac{\Vert R_2\Vert _2 }{\sigma _{\min }(\widehat{\Sigma }_2)}\right) \Vert \sin \angle (V_1,\widehat{V})\Vert , \end{aligned}$$
(32)

and

$$\begin{aligned} \Vert \sin \angle (V_1,\widehat{V}_1)\Vert _{2,F}\le \sqrt{1+\bigg (\frac{\Vert R_2\Vert _2 }{\sigma _{\min }(\widehat{\Sigma }_2)}\bigg )^2} \Vert \sin \angle (V_1,\widehat{V})\Vert _{2,F} . \end{aligned}$$
(33)

Proof

We have

$$\begin{aligned} \begin{bmatrix} \widehat{\Sigma }_1&\quad 0&\quad R_1\\ 0&\quad \widehat{\Sigma }_2&\quad R_2\\ S_1&\quad S_2&\quad A_3 \end{bmatrix} \left[ \begin{array}{c} \widetilde{V}_{11}\\ \widetilde{V}_{21}\\ \widetilde{V}_{31} \end{array}\right] = 0, \end{aligned}$$
(34)

where \(\left[ \begin{array}{c} \widetilde{V}_{11}\\ \widetilde{V}_{21}\\ \widetilde{V}_{31} \end{array}\right] = [\widehat{V}_1\ \widehat{V}_2\ \widehat{V}^\perp ]^*V_1\). The goal as before is to bound \(\frac{\Vert \sin \angle (V_1,\widehat{V}_1)\Vert }{\Vert \sin \angle (V_1,\widehat{V})\Vert }= \Vert \left[ \begin{array}{c} \widetilde{V}_{21}\\ \widetilde{V}_{31} \end{array}\right] \Vert /\Vert \widetilde{V}_{31}\Vert .\)

The second block of (34) gives \(\widehat{\Sigma }_2\widetilde{V}_{21}+ R_2\widetilde{V}_{31}=0,\) from which we obtain (we require that \(\widehat{\Sigma }_2\) is square)

$$\begin{aligned} \Vert \widetilde{V}_{21}\Vert \le \frac{\Vert R_2\Vert _2}{\sigma _{\min }(\widehat{\Sigma }_2)}\Vert \widetilde{V}_{31}\Vert . \end{aligned}$$

Thus we obtain (32) and (33) using the same argument as in Theorem 3 as required. \(\square \)
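Finally, a small numerical check of Proposition 1; the matrix dimensions, the choice \(k=3\), and the construction of the trial subspaces are placeholder assumptions made only for this illustration.

```python
import numpy as np

def sinF(X, Y):   # Frobenius norm of sin(angle(span X, span Y)), dim(X) <= dim(Y)
    return np.linalg.norm(X - Y @ (Y.T @ X))

rng = np.random.default_rng(9)
m, n, mh, nh, k = 40, 50, 8, 8, 3                  # placeholder sizes (assumptions)
A = rng.standard_normal((m, n))                    # generic A: null space of dimension n - m
_, _, Vh = np.linalg.svd(A)
V1 = Vh[m:, :].T[:, :k]                            # k exact null vectors, A @ V1 = 0

# Trial subspaces; V_hat roughly contains span(V1), U_hat is arbitrary here
V_hat, _ = np.linalg.qr(V1 @ rng.standard_normal((k, nh)) + 1e-2 * rng.standard_normal((n, nh)))
U_hat, _ = np.linalg.qr(rng.standard_normal((m, mh)))

# Extraction: Sigma_1_hat holds the k smallest singular values of the projected matrix
Ut, s, Vth = np.linalg.svd(U_hat.T @ A @ V_hat)
U_hat2 = U_hat @ Ut[:, :-k]
V_hat1, V_hat2 = V_hat @ Vth.T[:, -k:], V_hat @ Vth.T[:, :-k]

Sig2_hat = np.diag(s[:-k])                         # square since mh = nh here
R2 = np.linalg.norm(U_hat2.T @ A - Sig2_hat @ V_hat2.T, 2)   # = ||R_2||_2
factor = np.sqrt(1 + (R2 / s[-k - 1])**2)          # sigma_min(Sigma_2_hat) = s[mh-k-1]
print(sinF(V1, V_hat1) <= factor * sinF(V1, V_hat))   # a numerical check of (33)
```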