Appendix 1: Proof of Theorem 2.6
We start with a key result on matrix concentration.
Theorem 1
(Noncommutative Bernstein Inequality [9, 17]) Let \(X_1, \dots , X_m\) be independent zero-mean \(d \times d\) random matrices. Suppose
$$\begin{aligned} \rho _k^2 {:=} \max \, \{\Vert \mathbb {E}[X_k X_k^T]\Vert _2, \Vert \mathbb {E}[X_k^T X_k] \Vert _2 \} \end{aligned}$$
and \(\Vert X_k\Vert _2 \le M\) almost surely for all \(k\). Then, for any \(\tau > 0\),
$$\begin{aligned} \mathbb {P} \left[ \left\| \sum _{k=1}^m X_k \right\| _2 > \tau \right] \le 2d \exp \left( \frac{-\tau ^2 / 2}{\sum _{k=1}^m \rho _k^2 + M\tau / 3} \right) . \end{aligned}$$
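In the application below, the \(X_k\) are identically distributed, so that \(\sum _{k=1}^m \rho _k^2 = m \rho ^2\) for a common value \(\rho \). As a direct specialization (with \(m = |\Omega |\), matching the setting of the proof), the bound reads
$$\begin{aligned} \mathbb {P} \left[ \left\| \sum _{k=1}^{|\Omega |} X_k \right\| _2 > \tau \right] \le 2d \exp \left( \frac{-\tau ^2 / 2}{|\Omega | \rho ^2 + M\tau / 3} \right) . \end{aligned}$$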
We proceed with the proof of Theorem 2.6.
Proof
We start by defining the notation
$$\begin{aligned} u_k {:=} U_{\Omega (k) \cdot }^T \in \mathbb {R}^d, \end{aligned}$$
that is, \(u_k\) is the transpose of the row of \(U\) that corresponds to the \(k\)th element of \(\Omega \). We thus define
$$\begin{aligned} X_k {:=} u_k u_k^T - \frac{1}{n} I_d, \end{aligned}$$
where \(I_d\) is the \(d \times d\) identity matrix. Because the columns of \(U\) are orthonormal and \(\Omega (k)\) is chosen uniformly at random, this random variable has zero mean.
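Concretely, since \(\Omega (k)\) is drawn uniformly from \(\{1,2,\cdots ,n\}\), the zero-mean property can be verified directly:
$$\begin{aligned} \mathbb {E}[u_k u_k^T] = \frac{1}{n} \sum _{i=1}^n U_{i \cdot }^T U_{i \cdot } = \frac{1}{n} U^T U = \frac{1}{n} I_d, \quad \text{so that} \quad \mathbb {E}[X_k] = 0. \end{aligned}$$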
To apply Theorem 1, we must compute the values of \(\rho _k\) and \(M\) that correspond to this definition of \(X_k\). Since \(\Omega (k)\) is chosen uniformly with replacement, the \(X_k\) are distributed identically for all \(k\), and \(\rho _k\) is independent of \(k\) (and can thus be denoted by \(\rho \)).
Using the fact that
$$\begin{aligned} \Vert A-B\Vert _2 \le \max \{\Vert A\Vert _2,\Vert B\Vert _2\} \quad \text{for positive semidefinite matrices } A \text{ and } B, \end{aligned}$$
(6.1)
and recalling that \(\Vert u_k\Vert ^2_2 = \Vert U_{\Omega (k) \cdot }\Vert ^2_2 \le d\mu (U)/n\), we have
$$\begin{aligned} \left\| u_k u_k^T - \frac{1}{n} I_d \right\| _2 \le \max \left\{ \frac{d\mu (U)}{n}, \frac{1}{n} \right\} . \end{aligned}$$
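To see this, apply (6.1) with the positive semidefinite choices \(A = u_k u_k^T\) and \(B = \frac{1}{n} I_d\), noting that
$$\begin{aligned} \Vert u_k u_k^T\Vert _2 = \Vert u_k\Vert _2^2 \le \frac{d\mu (U)}{n}, \qquad \left\| \frac{1}{n} I_d \right\| _2 = \frac{1}{n}. \end{aligned}$$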
Thus, we can define \(M {:=} d\mu (U) / n\). For \(\rho \), we note by symmetry of \(X_k\) that
$$\begin{aligned} \rho ^2 = \left\| \mathbb {E} \left[ X_k^2 \right] \right\| _2&= \left\| \mathbb {E} \left[ u_k u_k^T u_k u_k^T - \frac{2}{n} u_k u_k^T + \frac{1}{n^2} I_d \right] \right\| _2 \nonumber \\&= \left\| \mathbb {E}\left[ u_k (u_k^T u_k) u_k^T\right] - \frac{1}{n^2} I_d \right\| _2, \end{aligned}$$
(6.2)
where the last step follows from linearity of expectation and \(\mathbb {E}[u_k u_k^T] = (1/n) I_d\).
For the next step, we define \(S\) to be the \(n \times n\) diagonal matrix with diagonal elements \(\Vert U_{i \cdot }\Vert _2^2\), \(i=1,2,\cdots ,n\). We thus have
$$\begin{aligned} \left\| \mathbb {E}[u_k (u_k^T u_k) u_k^T]\right\| _2 = \left\| \frac{1}{n} U^T S U \right\| _2 \le \frac{1}{n} \Vert U \Vert _2^2 \Vert S\Vert _2 = \frac{1}{n} \frac{d \mu (U)}{n} = \frac{d \mu (U)}{n^2}. \end{aligned}$$
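The equality in this display follows by writing out the expectation over the uniform choice of \(\Omega (k)\):
$$\begin{aligned} \mathbb {E}\left[ u_k (u_k^T u_k) u_k^T\right] = \frac{1}{n} \sum _{i=1}^n \Vert U_{i \cdot }\Vert _2^2 \, U_{i \cdot }^T U_{i \cdot } = \frac{1}{n} U^T S U, \end{aligned}$$
while the inequality uses \(\Vert U\Vert _2 = 1\) (orthonormal columns) and \(\Vert S\Vert _2 = \max _i \Vert U_{i \cdot }\Vert _2^2 \le d\mu (U)/n\).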
Using (6.1), we have from (6.2) that
$$\begin{aligned} \rho ^2 \le \max \left( \left\| \mathbb {E}\left[ u_k (u_k^T u_k) u_k^T\right] \right\| _2, \frac{1}{n^2} \right) \le \max \left( \frac{d \mu (U)}{n^2}, \frac{1}{n^2} \right) = \frac{d \mu (U)}{n^2}, \end{aligned}$$
since \(d \mu (U) \ge d \ge 1\).
We now apply Theorem 1. First, we restrict \(\tau \) to be such that \(M\tau \le |\Omega | \, d\mu (U)/n^2\), to simplify the denominator of the exponent. Together with the bound \(\rho ^2 \le d\mu (U)/n^2\), this restriction yields
$$\begin{aligned} 2d \exp \left( \frac{-\tau ^2 / 2}{|\Omega | \rho ^2 + M\tau /3}\right) \le 2d\exp \left( \frac{-\tau ^2 / 2}{\frac{4}{3} |\Omega | \frac{d\mu (U)}{ n^2}}\right) , \end{aligned}$$
and thus
$$\begin{aligned} \mathbb {P} \left[ \left\| \sum _{k \in \Omega } \left( u_k u_k^T - \frac{1}{n} I_d \right) \right\| _2 > \tau \right] \le 2d \exp \left( \frac{-3 n^2 \tau ^2}{8 |\Omega |d\mu (U)}\right) . \end{aligned}$$
Now take \(\tau = \gamma |\Omega |/n\) with \(\gamma \) as defined in the statement of Theorem 2.6. Since \(\gamma < 1\) by assumption, we have \(M\tau = \gamma |\Omega |\, d\mu (U)/n^2 \le |\Omega |\, d\mu (U)/n^2\), so the restriction above holds, and we have
$$\begin{aligned} \mathbb {P} \left[ \left\| \sum _{k \in \Omega } \left( u_k u_k^T - \frac{1}{n} I_d \right) \right\| _2 \le \frac{|\Omega |}{n} \gamma \right] \ge 1 - \delta . \end{aligned}$$
(6.3)
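For clarity on where \(\delta \) enters: substituting \(\tau = \gamma |\Omega |/n\) into the preceding tail bound gives a failure probability of \(2d \exp (-3\gamma ^2 |\Omega | / (8 d\mu (U)))\), which equals \(\delta \) when (on our reading of the statement of Theorem 2.6, whose definition of \(\gamma \) is not reproduced in this appendix)
$$\begin{aligned} \gamma = \sqrt{\frac{8 d \mu (U)}{3 |\Omega |} \log \left( \frac{2d}{\delta } \right) }. \end{aligned}$$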
By symmetry of \(\sum _{k \in \Omega } u_k u_k^T\) and the fact that
$$\begin{aligned} \lambda _i \left( \sum _{k \in \Omega } u_k u_k^T - \frac{|\Omega |}{n} I\right) = \lambda _i \left( \sum _{k \in \Omega } u_k u_k^T \right) - \frac{|\Omega |}{n}, \end{aligned}$$
we have
$$\begin{aligned} \left\| \sum _{k \in \Omega } \left( u_k u_k^T - \frac{1}{n} I_d \right) \right\| _2&= \left\| \left( \sum _{k \in \Omega } u_k u_k^T \right) - \frac{|\Omega |}{n} I_d \right\| _2 \\&= \max _{i=1,2,\cdots ,d} \left| \lambda _i \left( \sum _{k \in \Omega } u_k u_k^T \right) - \frac{|\Omega |}{n} \right| . \end{aligned}$$
From (6.3), we have with probability at least \(1-\delta \) that
$$\begin{aligned} \lambda _i \left( \sum _{k \in \Omega } u_k u_k^T \right) \in \left[ (1-\gamma ) \frac{|\Omega |}{n}, (1+\gamma ) \frac{|\Omega |}{n} \right] \quad \hbox {for all } \, i=1,2,\cdots ,d, \end{aligned}$$
completing the proof. \(\square \)
Appendix 2: Proof of Lemma 3.1
We drop the subscript “\(t\)” throughout the proof and use \(A_+\) in place of \(A_{t+1}\). From (3.1), and using the definitions (3.2), we have
$$\begin{aligned} A_+^T&= \bar{U}^TU_+ \\&= \bar{U}^TU \!+\! \left\{ (\cos (\sigma \eta ) - 1) \frac{\bar{U}^T UU^T \bar{U} s}{\Vert w\Vert } + \sin (\sigma \eta ) \frac{(I-\bar{U}^T UU^T \bar{U})s}{\Vert r\Vert } \right\} \frac{s^T\bar{U}^TU}{\Vert w\Vert } \\&= \left\{ I + (\cos (\sigma \eta ) - 1) \frac{A^TAss^T}{\Vert w\Vert ^2} + \sin (\sigma \eta ) \frac{(I-A^TA)ss^T}{\Vert r\Vert \Vert w\Vert } \right\} A^T = HA^T, \end{aligned}$$
where the matrix \(H\) is defined in an obvious way. Thus,
$$\begin{aligned} \Vert A_+\Vert _F^2 = \mathrm{trace}(A_+A_+^T) = \mathrm{trace}(AH^THA^T). \end{aligned}$$
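For the computation of \(H^TH\) below, it helps to record \(H\) and \(H^T\) explicitly:
$$\begin{aligned} H&= I + (\cos (\sigma \eta ) - 1) \frac{A^TAss^T}{\Vert w\Vert ^2} + \sin (\sigma \eta ) \frac{(I-A^TA)ss^T}{\Vert r\Vert \Vert w\Vert }, \\ H^T&= I + (\cos (\sigma \eta ) - 1) \frac{ss^TA^TA}{\Vert w\Vert ^2} + \sin (\sigma \eta ) \frac{ss^T(I-A^TA)}{\Vert r\Vert \Vert w\Vert }, \end{aligned}$$
where the second line uses the symmetry of \(A^TA\) and \(ss^T\).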
Focusing initially on \(H^TH\), we obtain
$$\begin{aligned} H^TH&= I + (\cos (\sigma \eta )-1)^2 \frac{ss^T A^TAA^TA ss^T}{\Vert w\Vert ^4} \\&\quad + (\cos (\sigma \eta )-1) \frac{ss^TA^TA + A^TAss^T}{\Vert w\Vert ^2} \\&\quad + \sin (\sigma \eta ) \frac{2ss^T - ss^TA^TA - A^TAss^T}{\Vert r\Vert \Vert w\Vert } \\&\quad + 2 \sin (\sigma \eta ) (\cos (\sigma \eta )-1) \frac{ss^T A^TA ss^T - ss^T A^TAA^TAss^T}{\Vert r\Vert \Vert w\Vert ^3} \\&\quad + \sin ^2 (\sigma \eta ) \frac{s(s^Ts - 2s^TA^TAs+s^TA^TAA^TAs)s^T}{\Vert r\Vert ^2 \Vert w\Vert ^2}. \end{aligned}$$
It follows immediately that
$$\begin{aligned} A_+ A_+^T&= AA^T + (\cos (\sigma \eta )-1)^2 \frac{Ass^T A^TAA^TA ss^TA^T}{\Vert w\Vert ^4} \\&\quad + (\cos (\sigma \eta )-1) \frac{Ass^TA^TAA^T + AA^TAss^TA^T}{\Vert w\Vert ^2} \\&\quad + \sin (\sigma \eta ) \frac{2Ass^TA^T - Ass^TA^TAA^T - AA^TAss^TA^T}{\Vert r\Vert \Vert w\Vert } \\&\quad + 2 \sin (\sigma \eta ) (\cos (\sigma \eta )-1) \frac{Ass^T A^TA ss^TA^T - Ass^T A^TAA^TAss^TA^T}{\Vert r\Vert \Vert w\Vert ^3} \\&\quad + \sin ^2 (\sigma \eta ) \frac{As(s^Ts - 2s^TA^TAs+s^TA^TAA^TAs)s^TA^T}{\Vert r\Vert ^2 \Vert w\Vert ^2}. \end{aligned}$$
We now repeatedly use the fact that \(\mathrm{trace}(ab^T) = a^Tb\) to deduce that
$$\begin{aligned} \mathrm{trace}(A_+A_+^T)&= \mathrm{trace}(AA^T) + (\cos (\sigma \eta )-1)^2 \frac{(s^TA^TAs)s^T A^TAA^TA s}{\Vert w\Vert ^4} \\&\quad + (\cos (\sigma \eta )-1) \frac{2s^TA^TAA^TAs}{\Vert w\Vert ^2} \\&\quad + \sin (\sigma \eta ) \frac{2s^TA^TAs - 2 s^TA^TAA^TAs}{\Vert r\Vert \Vert w\Vert } \\&\quad + 2 \sin (\sigma \eta ) (\cos (\sigma \eta )-1) \frac{(s^T A^TA s)^2 - (s^T A^TAA^TAs)(s^TA^TAs)}{\Vert r\Vert \Vert w\Vert ^3} \\&\quad + \sin ^2 (\sigma \eta ) \frac{\Vert s\Vert ^2 s^TA^TAs \!-\! 2(s^TA^TAs)^2 \!+\! (s^TA^TAA^TAs)(s^TA^TAs)}{\Vert r\Vert ^2 \Vert w\Vert ^2}. \end{aligned}$$
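Each term in this display arises from one such reduction; for example, for the first piece of the \(\sin (\sigma \eta )\) term,
$$\begin{aligned} \mathrm{trace}(Ass^TA^T) = \mathrm{trace}\left( (As)(As)^T \right) = s^TA^TAs. \end{aligned}$$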
Now using \(w=As\) (and hence \(s^TA^TAs=\Vert w\Vert ^2\)), we have
$$\begin{aligned} \mathrm{trace}(A_+A_+^T)&= \mathrm{trace}(AA^T) + (\cos (\sigma \eta )-1)^2 \frac{s^T A^TAA^TA s}{\Vert w\Vert ^2} \\&\quad + (\cos (\sigma \eta )-1) \frac{2s^TA^TAA^TAs}{\Vert w\Vert ^2} \\&\quad + 2 \sin (\sigma \eta ) \frac{\Vert w\Vert ^2 - s^TA^TAA^TAs}{\Vert r\Vert \Vert w\Vert } \\&\quad + 2 \sin (\sigma \eta ) (\cos (\sigma \eta )-1) \frac{\Vert w\Vert ^2 - s^T A^TAA^TAs}{\Vert r\Vert \Vert w\Vert } \\&\quad + \sin ^2 (\sigma \eta ) \frac{\Vert s\Vert ^2- 2\Vert w\Vert ^2 + (s^TA^TAA^TAs)}{\Vert r\Vert ^2}. \end{aligned}$$
For the second and third terms on the right-hand side, we use the identity
$$\begin{aligned} (\cos (\sigma \eta )-1)^2 + 2(\cos (\sigma \eta )-1) = \cos ^2 (\sigma \eta )-1 = -\sin ^2 (\sigma \eta ), \end{aligned}$$
allowing us to combine these terms with the final \(\sin ^2(\sigma \eta )\) term. Using also the identity \(\Vert r\Vert ^2 = \Vert s\Vert ^2-\Vert w\Vert ^2\), we find that these three terms combine to
$$\begin{aligned}&\sin ^2 (\sigma \eta ) \left[ 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} + s^TA^TAA^TAs \left( \frac{1}{\Vert r\Vert ^2} - \frac{1}{\Vert w\Vert ^2} \right) \right] \\&\qquad = \sin ^2 (\sigma \eta ) \left( 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2}\right) \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) . \end{aligned}$$
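The factorization in the second line can be checked by expanding the product:
$$\begin{aligned} \left( 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2}\right) \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) = 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} + s^TA^TAA^TAs \left( \frac{1}{\Vert r\Vert ^2} - \frac{1}{\Vert w\Vert ^2} \right) . \end{aligned}$$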
We can likewise combine the fourth and fifth terms on the right-hand side above, since \(\sin (\sigma \eta ) + \sin (\sigma \eta )(\cos (\sigma \eta )-1) = \sin (\sigma \eta ) \cos (\sigma \eta )\), to yield the combined quantity
$$\begin{aligned} 2 \sin (\sigma \eta ) \cos (\sigma \eta ) \frac{\Vert w\Vert }{\Vert r\Vert } \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) . \end{aligned}$$
By substituting these two combined terms into the expression above, we obtain
$$\begin{aligned}&\mathrm{trace}(A_+A_+^T) = \mathrm{trace}(AA^T) \\&\quad + \sin (\sigma \eta ) \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) \left[ \left( 1-\frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} \right) \sin (\sigma \eta ) + 2 \cos (\sigma \eta ) \frac{\Vert w\Vert }{\Vert r\Vert } \right] . \end{aligned}$$
We now use the relations (3.3) to deduce that
$$\begin{aligned} \frac{\Vert w\Vert }{\Vert r\Vert } = \frac{\cos \theta }{\sin \theta }, \quad 1-\frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} = -\frac{\cos (2 \theta )}{\sin ^2 \theta }, \end{aligned}$$
and thus the increment \(\mathrm{trace}(A_+A_+^T) - \mathrm{trace}(AA^T)\) becomes
$$\begin{aligned}&\sin (\sigma \eta ) \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) \left[ -\frac{\cos (2 \theta )}{\sin ^2 \theta } \sin (\sigma \eta ) + 2 \cos (\sigma \eta ) \frac{\cos \theta }{\sin \theta } \right] \\&\quad = \frac{\sin (\sigma \eta ) \sin (2 \theta - \sigma \eta )}{\sin ^2 \theta } \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) . \end{aligned}$$
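The last equality uses \(2 \sin \theta \cos \theta = \sin (2\theta )\) together with the sine difference formula:
$$\begin{aligned} -\cos (2\theta ) \sin (\sigma \eta ) + \sin (2\theta ) \cos (\sigma \eta ) = \sin (2\theta - \sigma \eta ). \end{aligned}$$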
The result (3.6) follows by substituting \(w=As\) and (3.4).
Nonnegativity of the right-hand side follows from \(\theta _t \ge 0\), \(2\theta _t - \sigma _t \eta _t \ge 0\), and \(\Vert A_t^T w_t \Vert \le \Vert \bar{U}^T U_t \Vert \Vert w_t \Vert \le \Vert w_t \Vert \).
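In particular, restoring subscripts and using \(w_t = A_t s_t\), we have \(s_t^T A_t^T A_t A_t^T A_t s_t = \Vert A_t^T w_t\Vert ^2\), so the factor appearing in the derivation above satisfies
$$\begin{aligned} 1 - \frac{s_t^T A_t^T A_t A_t^T A_t s_t}{\Vert w_t\Vert ^2} = 1 - \frac{\Vert A_t^T w_t\Vert ^2}{\Vert w_t\Vert ^2} \ge 0. \end{aligned}$$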
To prove that the right-hand side of (3.6) is zero when \(v_t \in \mathcal{S}_t\) or \(v_t \perp \mathcal{S}_t\), we take the former case first. Here, there exists \(\hat{s}_t \in \mathbb {R}^d\) such that
$$\begin{aligned} v_t = \bar{U} s_t = U_t \hat{s}_t. \end{aligned}$$
Thus,
$$\begin{aligned} w_t = A_t s_t = U_t^T \bar{U} s_t = U_t^T U_t \hat{s}_t = \hat{s}_t, \end{aligned}$$
so that \(\Vert v_t \Vert = \Vert w_t \Vert \) and thus \(\theta _t=0\), from (3.3). This implies that the right-hand side of (3.6) is zero. When \(v_t \perp \mathcal{S}_t\), we have \(w_t = U_t^Tv_t=0\) and so \(\theta _t = \pi /2\) and \(\sigma _t=0\), implying again that the right-hand side of (3.6) is zero.