Appendix
A. Additional preliminaries
Lemma 6
Let \(\mathbf {z}\) be a Gaussian random vector with entries i.i.d. sampled from \({\mathcal {N}}(0,1)\). Given nonzero vectors \(\mathbf {w}\) and \({\tilde{\mathbf {w}}}\) with angle \(\theta \), we have
$$\begin{aligned} \mathbb {E}\left[ 1_{\{\mathbf {z}^\top \mathbf {w}> 0\}}\right] = \frac{1}{2}, \; \mathbb {E}\left[ 1_{\{\mathbf {z}^\top \mathbf {w}> 0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}}> 0\}}\right] = \frac{\pi -\theta }{2\pi }, \end{aligned}$$
and (see Footnote 3)
$$\begin{aligned} \mathbb {E}\left[ \mathbf {z}1_{\{\mathbf {z}^\top \mathbf {w}>0\} } \right] = \frac{1}{\sqrt{2\pi }} \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert }, \; \mathbb {E}\left[ \mathbf {z}1_{\{\mathbf {z}^\top \mathbf {w}>0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}} >0\}}\right] = \frac{\cos (\theta /2)}{\sqrt{2\pi }} \frac{\frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } }{\left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } \right\| }. \end{aligned}$$
Proof
The third identity was proved in Lemma A.1 of [10]. To show the first one, note that the Gaussian distribution is rotation-invariant; hence, without loss of generality, we may assume \(\mathbf {w}= [w_1,0,\mathbf {0}^\top ]^\top \) with \(w_1> 0\), and then \(\mathbb {E}\left[ 1_{\{\mathbf {z}^\top \mathbf {w}> 0\}}\right] = \mathbb {P}(z_1>0) = \frac{1}{2}\).
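By a further rotation that fixes the first coordinate axis, we may also assume \({\tilde{\mathbf {w}}}= [{\tilde{w}}_1,{\tilde{w}}_2,\mathbf {0}^\top ]^\top \) with \({\tilde{w}}_2\ge 0\), so that \(\frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } = [\cos \theta ,\sin \theta ,\mathbf {0}^\top ]^\top \). Then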
$$\begin{aligned} \mathbb {E}\left[ 1_{\{\mathbf {z}^\top \mathbf {w}> 0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}}>0\}}\right] = \mathbb {P}(\mathbf {z}^\top \mathbf {w}> 0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}}>0) = \frac{\pi - \theta }{2\pi }, \end{aligned}$$
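which is the probability that \(\mathbf {z}\) forms an acute angle with both \(\mathbf {w}\) and \({\tilde{\mathbf {w}}}\). Indeed, both events depend only on \((z_1,z_2)\), whose polar angle is uniformly distributed on \([0,2\pi )\), and the admissible angles form an arc of length \(\pi -\theta \).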
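As a quick sanity check of the first two identities of Lemma 6, the following pure-Python sketch estimates both probabilities by Monte Carlo sampling. The vectors, the dimension, the sample size and the seed are arbitrary illustrative choices, and the helper names are ours, not the paper's.

```python
import math
import random


def angle(u, v):
    """Angle in [0, pi] between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # Clamp the cosine to [-1, 1] to guard against rounding.
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


def orthant_probabilities(w, w_tilde, n_samples=100_000, seed=0):
    """Monte Carlo estimates of P(z^T w > 0) and P(z^T w > 0, z^T w_tilde > 0), z ~ N(0, I)."""
    rng = random.Random(seed)
    hits_one = hits_both = 0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in w]
        pos_w = sum(zi * wi for zi, wi in zip(z, w)) > 0
        pos_wt = sum(zi * ti for zi, ti in zip(z, w_tilde)) > 0
        hits_one += pos_w
        hits_both += pos_w and pos_wt
    return hits_one / n_samples, hits_both / n_samples


w = [1.0, 2.0, -0.5]
w_tilde = [0.3, -1.0, 2.0]
theta = angle(w, w_tilde)
p_one, p_both = orthant_probabilities(w, w_tilde)
print(f"P(z^T w > 0):     {p_one:.4f}  vs  0.5")
print(f"P(both positive): {p_both:.4f}  vs  {(math.pi - theta) / (2 * math.pi):.4f}")
```

With \(10^5\) samples the standard error of each estimate is below 0.002, so agreement to about two decimal places is what one should expect.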
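To prove the last identity, we use the polar representation of the 2-D Gaussian vector \((z_1,z_2)\), where r is the radius and \(\phi \) is the angle, with \(\mathrm {d}\mathbb {P}_r = r \exp (-r^2/2)\mathrm {d}r\) and \(\mathrm {d}\mathbb {P}_\phi = \frac{1}{2\pi }\mathrm {d}\phi \); the event \(\{\mathbf {z}^\top \mathbf {w}>0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}}>0\}\) corresponds to \(\phi \in (-\frac{\pi }{2}+\theta , \frac{\pi }{2})\). Since each \(z_i\) with \(i\ge 3\) has zero mean and is independent of \((z_1,z_2)\), we get \(\mathbb {E}\left[ z_i 1_{\{\mathbf {z}^\top \mathbf {w}>0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}} >0\}}\right] = 0\) for \(i\ge 3\). Moreover,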
$$\begin{aligned} \mathbb {E}\left[ z_1 1_{\{\mathbf {z}^\top \mathbf {w}>0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}} >0\}}\right] = \frac{1}{2\pi }\int _{0}^\infty r^2\exp \left( -\frac{r^2}{2}\right) \mathrm {d}r \int _{-\frac{\pi }{2}+\theta }^{\frac{\pi }{2}} \cos (\phi ) \mathrm {d}\phi = \frac{1+\cos (\theta )}{2\sqrt{2\pi }} \end{aligned}$$
and
$$\begin{aligned} \mathbb {E}\left[ z_2 1_{\{\mathbf {z}^\top \mathbf {w}>0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}} >0\}}\right] = \frac{1}{2\pi }\int _{0}^\infty r^2\exp \left( -\frac{r^2}{2}\right) \mathrm {d}r \int _{-\frac{\pi }{2}+\theta }^{\frac{\pi }{2}} \sin (\phi ) \mathrm {d}\phi = \frac{\sin (\theta )}{2\sqrt{2\pi }}. \end{aligned}$$
Therefore,
$$\begin{aligned} \mathbb {E}\left[ \mathbf {z}1_{\{\mathbf {z}^\top \mathbf {w}>0, \, \mathbf {z}^\top {\tilde{\mathbf {w}}} >0\}}\right] = \frac{\cos (\theta /2)}{\sqrt{2\pi }}[\cos (\theta /2), \sin (\theta /2),\mathbf {0}^\top ]^\top = \frac{\cos (\theta /2)}{\sqrt{2\pi }} \frac{\frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } }{\left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } \right\| }, \end{aligned}$$
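where the first equality uses \(1+\cos (\theta ) = 2\cos ^2(\theta /2)\) and \(\sin (\theta ) = 2\sin (\theta /2)\cos (\theta /2)\), and the last equality holds because \([\cos (\theta /2), \sin (\theta /2),\mathbf {0}^\top ]^\top \) is the unit vector bisecting the unit vectors \(\frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } = [1,0,\mathbf {0}^\top ]^\top \) and \(\frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } = [\cos \theta ,\sin \theta ,\mathbf {0}^\top ]^\top \), which form the angle \(\theta \). \(\square \)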
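The same Monte Carlo approach can sanity-check the two vector-valued identities of Lemma 6, including the bisector direction. This is a sketch under illustrative choices (dimension 3, the vectors below, \(2\cdot 10^5\) samples, a fixed seed); it needs \(\theta <\pi \) so that the bisector is well defined, and the helper names are ours.

```python
import math
import random


def unit(u):
    n = math.sqrt(sum(a * a for a in u))
    return [a / n for a in u]


def predicted_means(w, w_tilde):
    """Right-hand sides of the last two identities of Lemma 6 (requires angle < pi)."""
    u, ut = unit(w), unit(w_tilde)
    theta = math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(u, ut)))))
    bisector = unit([a + b for a, b in zip(u, ut)])
    scale = 1 / math.sqrt(2 * math.pi)
    return [scale * a for a in u], [scale * math.cos(theta / 2) * b for b in bisector]


def sampled_means(w, w_tilde, n_samples=200_000, seed=1):
    """Monte Carlo estimates of E[z 1{z^T w > 0}] and E[z 1{z^T w > 0, z^T w_tilde > 0}]."""
    rng = random.Random(seed)
    d = len(w)
    s_one, s_both = [0.0] * d, [0.0] * d
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if sum(a * b for a, b in zip(z, w)) > 0:
            for k in range(d):
                s_one[k] += z[k]
            if sum(a * b for a, b in zip(z, w_tilde)) > 0:
                for k in range(d):
                    s_both[k] += z[k]
    return [s / n_samples for s in s_one], [s / n_samples for s in s_both]


w, w_tilde = [1.0, 2.0, -0.5], [0.3, -1.0, 2.0]
exact_one, exact_both = predicted_means(w, w_tilde)
mc_one, mc_both = sampled_means(w, w_tilde)
print("E[z 1{z^T w > 0}]:", [round(x, 3) for x in mc_one], "vs", [round(x, 3) for x in exact_one])
print("E[z 1{both > 0}] :", [round(x, 3) for x in mc_both], "vs", [round(x, 3) for x in exact_both])
```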
Lemma 7
For any nonzero vectors \(\mathbf {w}\) and \({\tilde{\mathbf {w}}}\) with \(\Vert {\tilde{\mathbf {w}}}\Vert \ge \Vert \mathbf {w}\Vert = c>0\), we have
1. \(|\theta (\mathbf {w},\mathbf {w}^*)-\theta ({\tilde{\mathbf {w}}},\mathbf {w}^*)|\le \frac{\pi }{2c}\Vert \mathbf {w}- {\tilde{\mathbf {w}}}\Vert \).
2. \(\left\| \frac{1}{\Vert \mathbf {w}\Vert } \frac{\Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } - \frac{1}{\Vert {\tilde{\mathbf {w}}}\Vert } \frac{\Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } \right\| \le \frac{1}{c^2}\Vert \mathbf {w}- {\tilde{\mathbf {w}}}\Vert \).
Proof
1. Since, by the Cauchy–Schwarz inequality and \(\Vert \mathbf {w}\Vert = c\),
$$\begin{aligned} \left\langle {\tilde{\mathbf {w}}}, \mathbf {w}- \frac{c{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert }\right\rangle = {\tilde{\mathbf {w}}}^\top \mathbf {w}- c\Vert {\tilde{\mathbf {w}}}\Vert \le 0, \end{aligned}$$
and \(1-\frac{c}{\Vert {\tilde{\mathbf {w}}}\Vert }\ge 0\), we have
$$\begin{aligned} \Vert {\tilde{\mathbf {w}}}- \mathbf {w}\Vert ^2 =&\; \left\| \left( 1-\frac{c}{\Vert {\tilde{\mathbf {w}}}\Vert } \right) {\tilde{\mathbf {w}}}- \left( \mathbf {w}-\frac{c{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } \right) \right\| ^2 \ge \left\| \left( 1-\frac{c}{\Vert {\tilde{\mathbf {w}}}\Vert } \right) {\tilde{\mathbf {w}}}\right\| ^2 + \left\| \mathbf {w}-\frac{c{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } \right\| ^2 \nonumber \\ \ge&\; \left\| \mathbf {w}-\frac{c{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } \right\| ^2 = c^2 \left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } - \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert }\right\| ^2. \end{aligned}$$
(24)
Therefore,
$$\begin{aligned} \; |\theta (\mathbf {w},\mathbf {w}^*)-\theta ({\tilde{\mathbf {w}}},\mathbf {w}^*)|&\le \theta (\mathbf {w},{\tilde{\mathbf {w}}}) = \theta \left( \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert },\frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert }\right) \\&\le \; \pi \sin \left( \frac{\theta \left( \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert },\frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert }\right) }{2}\right) = \frac{\pi }{2}\left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } - \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert }\right\| \le \frac{\pi }{2c}\Vert \mathbf {w}- {\tilde{\mathbf {w}}}\Vert , \end{aligned}$$
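where the first inequality is the triangle inequality for angles between vectors, and we also used the fact \(\sin (x)\ge \frac{2x}{\pi }\) for \(x\in [0,\frac{\pi }{2}]\) (with \(x = \frac{1}{2}\theta \big (\frac{\mathbf {w}}{\Vert \mathbf {w}\Vert },\frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert }\big )\)), the identity \(\Vert \mathbf {u}-{\tilde{\mathbf {u}}}\Vert = 2\sin \big (\frac{\theta (\mathbf {u},{\tilde{\mathbf {u}}})}{2}\big )\) for unit vectors \(\mathbf {u},{\tilde{\mathbf {u}}}\), and the estimate in (24).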
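A randomized numerical check of Lemma 7.1 (a sanity check, not a proof): the sketch draws random triples \((\mathbf {w},{\tilde{\mathbf {w}}},\mathbf {w}^*)\), orders each pair so that \(\Vert {\tilde{\mathbf {w}}}\Vert \ge \Vert \mathbf {w}\Vert = c\), and records the worst ratio of the left-hand side to the right-hand side of item 1. Dimension, trial count and seed are arbitrary, and half of the trials use small perturbations to probe the regime \({\tilde{\mathbf {w}}}\to \mathbf {w}\).

```python
import math
import random


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def angle(u, v):
    c = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))


def worst_ratio_71(dim=5, trials=20_000, seed=2):
    """Largest observed |theta(w,w*) - theta(w~,w*)| / (pi/(2c) ||w - w~||).

    Lemma 7.1 predicts a value <= 1 whenever ||w~|| >= ||w|| = c > 0.
    """
    rng = random.Random(seed)
    worst = 0.0
    for t in range(trials):
        w_star = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Alternate between well-separated pairs and small perturbations of w.
        spread = 1.0 if t % 2 == 0 else 1e-3
        w_t = [a + spread * rng.gauss(0.0, 1.0) for a in w]
        if norm(w_t) < norm(w):          # enforce ||w~|| >= ||w|| = c
            w, w_t = w_t, w
        c = norm(w)
        gap = abs(angle(w, w_star) - angle(w_t, w_star))
        bound = math.pi / (2 * c) * norm([a - b for a, b in zip(w, w_t)])
        worst = max(worst, gap / bound)
    return worst


print(f"worst ratio: {worst_ratio_71():.3f}  (Lemma 7.1 predicts <= 1)")
```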
2. Since \(\Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*\) is the projection of \(\mathbf {w}^*\) onto the complement space of \(\mathbf {w}\) and likewise for \(\Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*\), the angle between \(\Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*\) and \(\Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*\) is equal to the angle between \(\mathbf {w}\) and \({\tilde{\mathbf {w}}}\). Therefore,
$$\begin{aligned} \left\langle \frac{\left( \mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\right) \mathbf {w}^*}{\left\| \left( \mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\right) \mathbf {w}^*\right\| } , \frac{\left( \mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\right) \mathbf {w}^*}{\left\| \left( \mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\right) \mathbf {w}^*\right\| } \right\rangle = \left\langle \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } , \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert } \right\rangle , \end{aligned}$$
and thus
$$\begin{aligned} \left\| \frac{1}{\Vert \mathbf {w}\Vert } \frac{\left( \mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\right) \mathbf {w}^*}{\left\| \left( \mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\right) \mathbf {w}^*\right\| } - \frac{1}{\Vert {\tilde{\mathbf {w}}}\Vert } \frac{\left( \mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\right) \mathbf {w}^*}{\left\| \left( \mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\right) \mathbf {w}^*\right\| } \right\|&= \left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert ^2} - \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert ^2} \right\| \\&= \frac{\Vert \mathbf {w}- {\tilde{\mathbf {w}}}\Vert }{\Vert \mathbf {w}\Vert \Vert {\tilde{\mathbf {w}}}\Vert }\le \frac{1}{c^2}\Vert \mathbf {w}- {\tilde{\mathbf {w}}}\Vert . \end{aligned}$$
The second equality above holds because
$$\begin{aligned} \left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert ^2} - \frac{{\tilde{\mathbf {w}}}}{\Vert {\tilde{\mathbf {w}}}\Vert ^2} \right\| ^2 = \frac{1}{\Vert \mathbf {w}\Vert ^2} + \frac{1}{\Vert {\tilde{\mathbf {w}}}\Vert ^2} - \frac{2\langle \mathbf {w}, {\tilde{\mathbf {w}}}\rangle }{\Vert \mathbf {w}\Vert ^2 \Vert {\tilde{\mathbf {w}}}\Vert ^2} = \frac{\Vert \mathbf {w}- {\tilde{\mathbf {w}}}\Vert ^2}{\Vert \mathbf {w}\Vert ^2 \Vert {\tilde{\mathbf {w}}}\Vert ^2}. \end{aligned}$$
\(\square \)
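The identity in the last display says that the inversion \(\mathbf {w}\mapsto \mathbf {w}/\Vert \mathbf {w}\Vert ^2\) maps the pair \((\mathbf {w},{\tilde{\mathbf {w}}})\) to points at distance \(\Vert \mathbf {w}-{\tilde{\mathbf {w}}}\Vert /(\Vert \mathbf {w}\Vert \Vert {\tilde{\mathbf {w}}}\Vert )\). A short pure-Python sketch checks it on random pairs; the dimension, sample count and seed are arbitrary.

```python
import math
import random


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def inversion_sides(w, w_t):
    """Both sides of ||w/||w||^2 - w~/||w~||^2|| = ||w - w~|| / (||w|| ||w~||)."""
    nw2, nt2 = sum(a * a for a in w), sum(b * b for b in w_t)
    lhs = norm([a / nw2 - b / nt2 for a, b in zip(w, w_t)])
    rhs = norm([a - b for a, b in zip(w, w_t)]) / math.sqrt(nw2 * nt2)
    return lhs, rhs


rng = random.Random(3)
max_rel_err = 0.0
for _ in range(1_000):
    w = [rng.gauss(0.0, 1.0) for _ in range(4)]
    w_t = [rng.gauss(0.0, 1.0) for _ in range(4)]
    lhs, rhs = inversion_sides(w, w_t)
    max_rel_err = max(max_rel_err, abs(lhs - rhs) / rhs)
print(f"largest relative discrepancy: {max_rel_err:.2e}")
```

Only rounding-level discrepancies should appear, since the identity is exact for any pair of nonzero vectors.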
B. Proofs
Proof of Proposition 1
We rewrite the update (11) as
$$\begin{aligned} \mathbf {w}^{t+1} = \arg \min _{\mathbf {w}\in \mathcal {Q}} \; \langle \mathbf {w}, \nabla f(\mathbf {w}^t) \rangle + \frac{1-\rho }{2\eta } \Vert \mathbf {w}-\mathbf {w}_f^t\Vert ^2 + \frac{\rho }{2\eta } \Vert \mathbf {w}-\mathbf {w}^t\Vert ^2 . \end{aligned}$$
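Since \(\mathbf {w}^{t+1}\) minimizes this objective over \(\mathcal {Q}\) and \(\mathbf {w}^t \in \mathcal {Q}\), comparing the objective values at \(\mathbf {w}^{t+1}\) and \(\mathbf {w}^t\) gives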
$$\begin{aligned}&\langle \mathbf {w}^{t+1}, \nabla f(\mathbf {w}^t) \rangle + \frac{1-\rho }{2\eta } \Vert \mathbf {w}^{t+1}-\mathbf {w}_f^t\Vert ^2 + \frac{\rho }{2\eta } \Vert \mathbf {w}^{t+1}-\mathbf {w}^t\Vert ^2 \\&\quad \le \langle \mathbf {w}^t, \nabla f(\mathbf {w}^t) \rangle + \frac{1-\rho }{2\eta } \Vert \mathbf {w}^t-\mathbf {w}_f^t\Vert ^2, \end{aligned}$$
or equivalently,
$$\begin{aligned}&\langle \mathbf {w}^{t+1}-\mathbf {w}^t, \nabla f(\mathbf {w}^t) \rangle + \frac{1-\rho }{2\eta } \left( \left\| \mathbf {w}^{t+1}-\mathbf {w}_f^t \right\| ^2- \left\| \mathbf {w}^t-\mathbf {w}_f^t \right\| ^2 \right) \nonumber \\&\quad + \frac{\rho }{2\eta }\Vert \mathbf {w}^{t+1}-\mathbf {w}^t\Vert ^2 \le 0. \end{aligned}$$
(25)
On the other hand, since \(\nabla f\) is L-Lipschitz continuous, the descent lemma [2] gives
$$\begin{aligned} f(\mathbf {w}^{t+1})\le f(\mathbf {w}^t) + \langle \nabla f(\mathbf {w}^t), \mathbf {w}^{t+1}-\mathbf {w}^t \rangle + \frac{L}{2}\Vert \mathbf {w}^{t+1}-\mathbf {w}^t\Vert ^2. \end{aligned}$$
(26)
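Adding (25) and (26) yields
$$\begin{aligned} f(\mathbf {w}^{t+1}) + \frac{1-\rho }{2\eta }\Vert \mathbf {w}^{t+1}-\mathbf {w}_f^t\Vert ^2 \le f(\mathbf {w}^t) + \frac{1-\rho }{2\eta }\Vert \mathbf {w}^t-\mathbf {w}_f^t\Vert ^2 - \left( \frac{\rho }{2\eta } - \frac{L}{2}\right) \Vert \mathbf {w}^{t+1}-\mathbf {w}^t\Vert ^2, \end{aligned}$$
which completes the proof. \(\square \)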
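Adding (25) and (26) gives the one-step inequality \(f(\mathbf {w}^{t+1}) + \frac{1-\rho }{2\eta }\Vert \mathbf {w}^{t+1}-\mathbf {w}_f^t\Vert ^2 \le f(\mathbf {w}^t) + \frac{1-\rho }{2\eta }\Vert \mathbf {w}^t-\mathbf {w}_f^t\Vert ^2 - (\frac{\rho }{2\eta }-\frac{L}{2})\Vert \mathbf {w}^{t+1}-\mathbf {w}^t\Vert ^2\). The sketch below checks it numerically under hypothetical choices that are not taken from the paper: \(\mathcal {Q} = \{-1,+1\}^n\) (small enough to minimize over by brute force), a separable quadratic f with \(L=\max _i h_i\), and a random \(\mathbf {w}_f^t\); the inequality does not depend on how \(\mathbf {w}_f^t\) is generated, only on \(\mathbf {w}^t\in \mathcal {Q}\).

```python
import itertools
import random


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def quad(x, h, b):
    """Hypothetical separable quadratic f(x) = sum_i (h_i x_i^2 / 2 - b_i x_i)."""
    return sum(0.5 * hi * xi * xi - bi * xi for hi, bi, xi in zip(h, b, x))


def quantized_step(w, w_f, grad, eta, rho):
    """Brute-force argmin over the hypothetical Q = {-1, +1}^n of
    <x, grad> + (1 - rho)/(2 eta) ||x - w_f||^2 + rho/(2 eta) ||x - w||^2."""
    def objective(x):
        return (sum(a * g for a, g in zip(x, grad))
                + (1 - rho) / (2 * eta) * sq_dist(x, w_f)
                + rho / (2 * eta) * sq_dist(x, w))
    return list(min(itertools.product((-1.0, 1.0), repeat=len(w)), key=objective))


def count_violations(trials=500, n=6, seed=4):
    """Count random instances violating the one-step inequality (expected: 0)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        h = [rng.uniform(0.1, 2.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        L = max(h)                                       # Lipschitz constant of grad f
        eta, rho = rng.uniform(0.01, 1.0), rng.uniform(0.0, 1.0)
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # current iterate w^t, in Q
        w_f = [rng.gauss(0.0, 1.0) for _ in range(n)]    # arbitrary w_f^t
        grad = [hi * xi - bi for hi, bi, xi in zip(h, b, w)]
        w_next = quantized_step(w, w_f, grad, eta, rho)
        lhs = quad(w_next, h, b) + (1 - rho) / (2 * eta) * sq_dist(w_next, w_f)
        rhs = (quad(w, h, b) + (1 - rho) / (2 * eta) * sq_dist(w, w_f)
               - (rho / (2 * eta) - L / 2) * sq_dist(w_next, w))
        violations += lhs > rhs + 1e-9
    return violations


print("violations:", count_violations())
```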
Proof of Lemma 1
We first evaluate \(\mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w})^\top \right] \), \(\mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w}^*)^\top \right] \) and \(\mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w}^*)\sigma (\mathbf {Z}\mathbf {w}^*)^\top \right] \). Let \(\mathbf {Z}_i^\top \) be the i-th row vector of \(\mathbf {Z}\). Since \(\mathbf {w}\ne \mathbf {0}\), using Lemma 6, we have
$$\begin{aligned} \mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w})^\top \right] _{ii} = \mathbb {E}\left[ \sigma (\mathbf {Z}_i^\top \mathbf {w})\sigma (\mathbf {Z}_i^\top \mathbf {w})\right] = \mathbb {E}\left[ 1_{\{\mathbf {Z}_i^\top \mathbf {w}> 0\}}\right] = \frac{1}{2}, \end{aligned}$$
and for \(i\ne j\), by the independence of the rows \(\mathbf {Z}_i\) and \(\mathbf {Z}_j\),
$$\begin{aligned} \mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w})^\top \right] _{ij} = \mathbb {E}\left[ \sigma (\mathbf {Z}_i^\top \mathbf {w})\sigma (\mathbf {Z}_j^\top \mathbf {w})\right] = \mathbb {E}\left[ 1_{\{\mathbf {Z}_i^\top \mathbf {w}> 0\}}\right] \mathbb {E}\left[ 1_{\{\mathbf {Z}_j^\top \mathbf {w}> 0\}}\right] = \frac{1}{4}. \end{aligned}$$
Therefore, \(\mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w})^\top \right] =\mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w}^*)\sigma (\mathbf {Z}\mathbf {w}^*)^\top \right] = \frac{1}{4}\left( \mathbf {I}+ \mathbf {1}\mathbf {1}^\top \right) \). Furthermore,
$$\begin{aligned} \mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w}^*)^\top \right] _{ii} = \mathbb {E}\left[ 1_{\{\mathbf {Z}_i^\top \mathbf {w}> 0, \mathbf {Z}_i^\top \mathbf {w}^*> 0\}}\right] = \frac{\pi -\theta (\mathbf {w},\mathbf {w}^*)}{2\pi }, \end{aligned}$$
and, by the same independence argument, \(\mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w}^*)^\top \right] _{ij}=\frac{1}{4}\) for \(i\ne j\). So,
$$\begin{aligned} \mathbb {E}_\mathbf {Z}\left[ \sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w}^*)^\top \right] = \frac{1}{4}\left( \left( 1-\frac{2\theta (\mathbf {w},\mathbf {w}^*)}{\pi }\right) \mathbf {I}+ \mathbf {1}\mathbf {1}^\top \right) . \end{aligned}$$
We have thus proved (14), noting that
$$\begin{aligned} f(\mathbf {v},\mathbf {w}) =&\; \frac{1}{2}\left( \mathbf {v}^\top \mathbb {E}_\mathbf {Z}[\sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w})^\top ]\mathbf {v}- 2\mathbf {v}^\top \mathbb {E}_\mathbf {Z}[\sigma (\mathbf {Z}\mathbf {w})\sigma (\mathbf {Z}\mathbf {w}^*)^\top ]\mathbf {v}^* \right. \\&\left. +(\mathbf {v}^*)^\top \mathbb {E}_\mathbf {Z}[\sigma (\mathbf {Z}\mathbf {w}^*)\sigma (\mathbf {Z}\mathbf {w}^*)^\top ]\mathbf {v}^*\right) . \end{aligned}$$
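To sanity-check (14), one can compare the closed form assembled from the three expectation matrices above with a Monte Carlo average of the sample loss. This sketch assumes the sample loss \(\ell (\mathbf {v},\mathbf {w};\mathbf {Z})=\frac{1}{2}\big (\mathbf {v}^\top \sigma (\mathbf {Z}\mathbf {w})-(\mathbf {v}^*)^\top \sigma (\mathbf {Z}\mathbf {w}^*)\big )^2\) with the binary activation \(\sigma (x)=1_{\{x>0\}}\), consistent with the expansion above; the sizes \(m=3\), \(n=4\), the vectors and the seed are illustrative.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def f_closed(v, w, v_star, w_star):
    """Population loss assembled from the three expectation matrices computed above."""
    cos_t = dot(w, w_star) / math.sqrt(dot(w, w) * dot(w_star, w_star))
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    s, s_star = sum(v), sum(v_star)
    # x^T (alpha I + 1 1^T) y = alpha x^T y + (1^T x)(1^T y)
    quad_ww = 0.25 * (dot(v, v) + s * s)
    quad_wws = 0.25 * ((1 - 2 * theta / math.pi) * dot(v, v_star) + s * s_star)
    quad_wsws = 0.25 * (dot(v_star, v_star) + s_star * s_star)
    return 0.5 * (quad_ww - 2 * quad_wws + quad_wsws)


def f_monte_carlo(v, w, v_star, w_star, n_samples=80_000, seed=5):
    """Average of the assumed sample loss over Gaussian Z (rows Z_i ~ N(0, I_n))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        out = 0.0
        for vi, vsi in zip(v, v_star):
            z = [rng.gauss(0.0, 1.0) for _ in w]   # one row Z_i of Z
            out += vi * (dot(z, w) > 0) - vsi * (dot(z, w_star) > 0)
        total += 0.5 * out * out
    return total / n_samples


v, v_star = [0.5, -1.0, 0.8], [1.0, 0.3, -0.6]
w, w_star = [1.0, 0.5, -0.3, 0.2], [0.2, 1.0, 0.0, -0.5]
closed = f_closed(v, w, v_star, w_star)
mc = f_monte_carlo(v, w, v_star, w_star)
print(f"closed form: {closed:.4f}, Monte Carlo: {mc:.4f}")
```

Since \(\sigma \) is invariant to positive scaling of its argument, \(\mathbf {w}^*\) need not be normalized for this particular check.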
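Next, since (15) is straightforward, we only show (16). Recalling \(\Vert \mathbf {w}^*\Vert =1\), the angle \(\theta (\mathbf {w},\mathbf {w}^*) = \arccos \left( \frac{\mathbf {w}^\top \mathbf {w}^*}{\Vert \mathbf {w}\Vert }\right) \) is differentiable with respect to \(\mathbf {w}\) whenever \(\theta (\mathbf {w},\mathbf {w}^*)\in (0,\pi )\). Using the chain rule and \(\Big \Vert \Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert = \sqrt{1-\frac{(\mathbf {w}^\top \mathbf {w}^*)^2}{\Vert \mathbf {w}\Vert ^2}}\), we have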
$$\begin{aligned} \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v},\mathbf {w})= & {} \frac{\mathbf {v}^\top \mathbf {v}^*}{2\pi }\frac{\partial \theta }{\partial \mathbf {w}}(\mathbf {w},\mathbf {w}^*) = -\frac{\mathbf {v}^\top \mathbf {v}^*}{2\pi }\frac{\Vert \mathbf {w}\Vert ^2\mathbf {w}^* - (\mathbf {w}^\top \mathbf {w}^*)\mathbf {w}}{\Vert \mathbf {w}\Vert ^3\sqrt{1-\frac{(\mathbf {w}^\top \mathbf {w}^*)^2}{\Vert \mathbf {w}\Vert ^2}}} \\= & {} -\frac{\mathbf {v}^\top \mathbf {v}^*}{2\pi \Vert \mathbf {w}\Vert }\frac{\Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert }. \end{aligned}$$
\(\square \)
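The gradient formula (16) rests on \(\frac{\partial \theta }{\partial \mathbf {w}} = -\frac{1}{\Vert \mathbf {w}\Vert }\frac{(\mathbf {I}-\mathbf {w}\mathbf {w}^\top /\Vert \mathbf {w}\Vert ^2)\mathbf {w}^*}{\Vert (\mathbf {I}-\mathbf {w}\mathbf {w}^\top /\Vert \mathbf {w}\Vert ^2)\mathbf {w}^*\Vert }\) for a unit vector \(\mathbf {w}^*\); multiplying by \(\frac{\mathbf {v}^\top \mathbf {v}^*}{2\pi }\) gives \(\frac{\partial f}{\partial \mathbf {w}}\). A central finite-difference check of this formula at an arbitrary point with \(\theta (\mathbf {w},\mathbf {w}^*)\in (0,\pi )\) can be sketched as follows; the point and the step size are illustrative.

```python
import math


def theta(w, w_star):
    """Angle between w and the unit vector w_star: arccos(w^T w* / ||w||)."""
    c = sum(a * b for a, b in zip(w, w_star)) / math.sqrt(sum(a * a for a in w))
    return math.acos(max(-1.0, min(1.0, c)))


def grad_theta(w, w_star):
    """Closed form -(1/||w||) P w* / ||P w*||, with P = I - w w^T / ||w||^2."""
    nw2 = sum(a * a for a in w)
    coef = sum(a * b for a, b in zip(w, w_star)) / nw2
    p = [b - coef * a for a, b in zip(w, w_star)]   # P w*
    scale = math.sqrt(nw2) * math.sqrt(sum(a * a for a in p))
    return [-a / scale for a in p]


def fd_grad_theta(w, w_star, h=1e-6):
    """Central finite differences of theta with respect to each coordinate of w."""
    g = []
    for k in range(len(w)):
        wp, wm = list(w), list(w)
        wp[k] += h
        wm[k] -= h
        g.append((theta(wp, w_star) - theta(wm, w_star)) / (2 * h))
    return g


w_star = [1 / 3, 2 / 3, 2 / 3]          # unit vector, as assumed in the text
w = [0.5, -1.0, 2.0]
closed, numeric = grad_theta(w, w_star), fd_grad_theta(w, w_star)
print("max abs difference:", max(abs(a - b) for a, b in zip(closed, numeric)))
```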
Proof of Proposition 2
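Suppose \(\mathbf {v}^\top \mathbf {v}^*=0\), so that \(\frac{\partial f}{\partial \mathbf {w}}(\mathbf {v},\mathbf {w}) = \mathbf {0}\) by Lemma 1, and \(\frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w}) = \mathbf {0}\), i.e., \(\mathbf {v} = (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1}\left( \left( 1- \frac{2}{\pi }\theta (\mathbf {w},\mathbf {w}^*)\right) \mathbf {I}+ \mathbf {1}\mathbf {1}^\top \right) \mathbf {v}^*\). Then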
$$\begin{aligned} 0 = \mathbf {v}^\top \mathbf {v}^* = (\mathbf {v}^*)^\top (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1}\left( \left( 1- \frac{2}{\pi }\theta (\mathbf {w},\mathbf {w}^*)\right) \mathbf {I}+ \mathbf {1}\mathbf {1}^\top \right) \mathbf {v}^*. \end{aligned}$$
(27)
From (27), it follows that
$$\begin{aligned} \frac{2}{\pi }\theta (\mathbf {w},\mathbf {w}^*) (\mathbf {v}^*)^\top (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1} \mathbf {v}^*= (\mathbf {v}^*)^\top (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1}\left( \mathbf {I}+ \mathbf {1}\mathbf {1}^\top \right) \mathbf {v}^* = \Vert \mathbf {v}^*\Vert ^2. \end{aligned}$$
(28)
On the other hand, from (27) it also follows that
$$\begin{aligned} \left( \frac{2}{\pi }\theta (\mathbf {w},\mathbf {w}^*)-1\right) (\mathbf {v}^*)^\top (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1} \mathbf {v}^* = (\mathbf {v}^*)^\top (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1} \mathbf {1}(\mathbf {1}^\top \mathbf {v}^*) = \frac{(\mathbf {1}^\top \mathbf {v}^*)^2}{m+1}, \end{aligned}$$
where \(\mathbf {I}\) is the \(m\times m\) identity matrix, and we used \((\mathbf {I}+ \mathbf {1}\mathbf {1}^\top ) \mathbf {1} = (m+1)\mathbf {1}\), i.e., \((\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1}\mathbf {1} = \frac{1}{m+1}\mathbf {1}\). Subtracting this equality from (28) gives
$$\begin{aligned} (\mathbf {v}^*)^\top (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1}\mathbf {v}^* = \Vert \mathbf {v}^*\Vert ^2 - \frac{(\mathbf {1}^\top \mathbf {v}^*)^2}{m+1}. \end{aligned}$$
Substituting this expression into (28) gives \(\theta (\mathbf {w},\mathbf {w}^*) = \frac{\pi }{2}\frac{(m+1)\Vert \mathbf {v}^*\Vert ^2}{(m+1)\Vert \mathbf {v}^*\Vert ^2 - (\mathbf {1}^\top \mathbf {v}^*)^2}\), which is admissible only if
$$\begin{aligned} \frac{\pi }{2}\frac{(m+1)\Vert \mathbf {v}^*\Vert ^2}{(m+1)\Vert \mathbf {v}^*\Vert ^2 - (\mathbf {1}^\top \mathbf {v}^*)^2}<\pi , \; \text{ or } \text{ equivalently, } \; (\mathbf {1}^\top \mathbf {v}^*)^2 < \frac{m+1}{2}\Vert \mathbf {v}^*\Vert ^2. \end{aligned}$$
Otherwise, \(\frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w})\) and \(\frac{\partial f}{\partial \mathbf {w}}(\mathbf {v},\mathbf {w})\) do not vanish simultaneously, and there is no critical point. \(\square \)
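The proof of Proposition 2 is constructive: when \((\mathbf {1}^\top \mathbf {v}^*)^2<\frac{m+1}{2}\Vert \mathbf {v}^*\Vert ^2\), it yields \(\theta (\mathbf {w},\mathbf {w}^*)\) and the corresponding \(\mathbf {v}\). The sketch below computes both, using the Sherman–Morrison formula \((\mathbf {I}+\mathbf {1}\mathbf {1}^\top )^{-1} = \mathbf {I}-\frac{1}{m+1}\mathbf {1}\mathbf {1}^\top \), and checks that \(\mathbf {v}^\top \mathbf {v}^*=0\) and \(\frac{\partial f}{\partial \mathbf {v}}=\mathbf {0}\) with the expression from Lemma 1. The test vectors are arbitrary and the function names are ours.

```python
import math


def critical_point(v_star):
    """Return (theta, v) from the proof of Proposition 2, or None if no critical point exists."""
    m = len(v_star)
    s = sum(v_star)                          # 1^T v*
    nv2 = sum(a * a for a in v_star)         # ||v*||^2
    if s * s >= (m + 1) / 2 * nv2:
        return None
    theta = math.pi / 2 * (m + 1) * nv2 / ((m + 1) * nv2 - s * s)
    # u = ((1 - 2 theta/pi) I + 1 1^T) v*
    u = [(1 - 2 * theta / math.pi) * a + s for a in v_star]
    # v = (I + 1 1^T)^{-1} u, with (I + 1 1^T)^{-1} = I - 1 1^T / (m + 1)
    su = sum(u)
    v = [a - su / (m + 1) for a in u]
    return theta, v


def grad_v(v, v_star, theta):
    """(1/4)(I + 1 1^T) v - (1/4)((1 - 2 theta/pi) I + 1 1^T) v*, from Lemma 1."""
    sv, ss = sum(v), sum(v_star)
    return [0.25 * (a + sv) - 0.25 * ((1 - 2 * theta / math.pi) * b + ss)
            for a, b in zip(v, v_star)]


v_star = [1.0, -0.5, 0.8, 0.2]
theta, v = critical_point(v_star)
print(f"theta = {theta:.4f}, v^T v* = {sum(a * b for a, b in zip(v, v_star)):.2e}")
print("max |df/dv| =", max(abs(g) for g in grad_v(v, v_star, theta)))
print(critical_point([1.0, 1.0, 1.0, 1.0]))  # prints None: (1^T v*)^2 = 16 >= 10
```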
Proof of Lemma 2
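The spectral norm of \(\mathbf {I}+ \mathbf {1}\mathbf {1}^\top \) is \(m+1\), since its eigenvalues are \(m+1\) (with eigenvector \(\mathbf {1}\)) and 1 (on the orthogonal complement of \(\mathbf {1}\)). Invoking Lemma 7.1 gives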
$$\begin{aligned} \left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w}) - \frac{\partial f}{\partial \mathbf {v}}({\tilde{\mathbf {v}}},{\tilde{\mathbf {w}}}) \right\| =&\; \frac{1}{4}\left\| \big (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top \big )(\mathbf {v}-{\tilde{\mathbf {v}}}) + \frac{2}{\pi }(\theta (\mathbf {w},\mathbf {w}^*) - \theta ({\tilde{\mathbf {w}}},\mathbf {w}^*) )\mathbf {v}^* \right\| \\ \le&\; \frac{1}{4}\left( (m+1)\Vert \mathbf {v}- {\tilde{\mathbf {v}}}\Vert + \frac{2\Vert \mathbf {v}^*\Vert }{\pi } |\theta (\mathbf {w},\mathbf {w}^*) - \theta ({\tilde{\mathbf {w}}},\mathbf {w}^*)|\right) \\ \le&\; \frac{1}{4}\left( (m+1)\Vert \mathbf {v}- {\tilde{\mathbf {v}}}\Vert + \frac{\Vert \mathbf {v}^*\Vert }{c} \left\| \mathbf {w}- {\tilde{\mathbf {w}}}\right\| \right) \\ \le&\; \frac{1}{4}\left( m+1 + \frac{\Vert \mathbf {v}^*\Vert }{c} \right) \Vert (\mathbf {v}, \mathbf {w}) - ({\tilde{\mathbf {v}}}, {\tilde{\mathbf {w}}})\Vert . \end{aligned}$$
Using Lemma 7.2, we further have
$$\begin{aligned} \left\| \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v},\mathbf {w}) - \frac{\partial f}{\partial \mathbf {w}}({\tilde{\mathbf {v}}},{\tilde{\mathbf {w}}}) \right\| =&\; \left\| \frac{\mathbf {v}^\top \mathbf {v}^*}{2\pi \Vert \mathbf {w}\Vert } \frac{\Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } - \frac{{\tilde{\mathbf {v}}}^\top \mathbf {v}^*}{2\pi \Vert {\tilde{\mathbf {w}}}\Vert } \frac{\Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } \right\| \\ \le&\; \left\| \frac{\mathbf {v}^\top \mathbf {v}^*}{2\pi \Vert \mathbf {w}\Vert } \frac{\Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } - \frac{\mathbf {v}^\top \mathbf {v}^*}{2\pi \Vert {\tilde{\mathbf {w}}}\Vert } \frac{\Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } \right\| \\&\; + \left\| \frac{\mathbf {v}^\top \mathbf {v}^*}{2\pi \Vert {\tilde{\mathbf {w}}}\Vert } \frac{\Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } - \frac{{\tilde{\mathbf 
{v}}}^\top \mathbf {v}^*}{2\pi \Vert {\tilde{\mathbf {w}}}\Vert } \frac{\Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{{\tilde{\mathbf {w}}}{\tilde{\mathbf {w}}}^\top }{\Vert {\tilde{\mathbf {w}}}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } \right\| \\ \le&\; \frac{|\mathbf {v}^\top \mathbf {v}^*|}{2 \pi c^2 }\Vert \mathbf {w}-{\tilde{\mathbf {w}}}\Vert + \frac{\Vert \mathbf {v}^*\Vert }{2\pi c}\Vert \mathbf {v}-{\tilde{\mathbf {v}}}\Vert \\ \le&\; \frac{(C+c)\Vert \mathbf {v}^*\Vert }{2\pi c^2}\Vert (\mathbf {v}, \mathbf {w}) - ({\tilde{\mathbf {v}}}, {\tilde{\mathbf {w}}})\Vert . \end{aligned}$$
where the last step uses \(|\mathbf {v}^\top \mathbf {v}^*|\le C\Vert \mathbf {v}^*\Vert \). Combining the two inequalities above proves the claim. \(\square \)
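The first bound in the proof of Lemma 2 (for \(\frac{\partial f}{\partial \mathbf {v}}\)) can be probed numerically as in the sketch below, which draws random pairs with \(\Vert {\tilde{\mathbf {w}}}\Vert \ge \Vert \mathbf {w}\Vert =c\) and reports the largest ratio of the left-hand side to \(\frac{1}{4}(m+1+\frac{\Vert \mathbf {v}^*\Vert }{c})\Vert (\mathbf {v},\mathbf {w})-({\tilde{\mathbf {v}}},{\tilde{\mathbf {w}}})\Vert \). Sizes, seed and trial count are illustrative; this is a sanity check, not a proof.

```python
import math
import random


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def angle(u, v):
    c = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))


def grad_v(v, w, v_star, w_star):
    """(1/4)(I + 11^T) v - (1/4)((1 - 2 theta/pi) I + 11^T) v*, theta = theta(w, w*)."""
    t = angle(w, w_star)
    sv, ss = sum(v), sum(v_star)
    return [0.25 * (a + sv) - 0.25 * ((1 - 2 * t / math.pi) * b + ss)
            for a, b in zip(v, v_star)]


def worst_ratio_dv(m=4, n=5, trials=5_000, seed=6):
    """Largest observed ratio of ||grad_v difference|| to the bound (expected <= 1)."""
    rng = random.Random(seed)

    def gauss(k):
        return [rng.gauss(0.0, 1.0) for _ in range(k)]

    v_star, w_star = gauss(m), gauss(n)
    worst = 0.0
    for _ in range(trials):
        v, v_t, w, w_t = gauss(m), gauss(m), gauss(n), gauss(n)
        if norm(w_t) < norm(w):            # Lemma 7.1 needs ||w~|| >= ||w|| = c
            w, w_t = w_t, w
        c = norm(w)
        g, g_t = grad_v(v, w, v_star, w_star), grad_v(v_t, w_t, v_star, w_star)
        lhs = norm([a - b for a, b in zip(g, g_t)])
        dist = norm([a - b for a, b in zip(v + w, v_t + w_t)])
        worst = max(worst, lhs / ((m + 1 + norm(v_star) / c) / 4 * dist))
    return worst


print(f"worst ratio: {worst_ratio_dv():.3f}")
```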
Proof of Lemma 3
Equation (22) is true because \(\frac{\partial \ell }{\partial \mathbf {v}}(\mathbf {v},\mathbf {w};\mathbf {Z})\) is linear in \(\mathbf {v}\). To show (23), by (20) and the fact that \(\mu ^{\prime } = \sigma \), we have
$$\begin{aligned} \mathbb {E}_\mathbf {Z}\left[ \mathbf {g}(\mathbf {v},\mathbf {w};\mathbf {Z})\right] =&\; \mathbb {E}_\mathbf {Z}\left[ \left( \sum _{i=1}^m v_i \sigma (\mathbf {Z}^\top _i\mathbf {w}) - \sum _{i=1}^m v^*_i\sigma (\mathbf {Z}^\top _i\mathbf {w}^*) \right) \left( \sum _{i=1}^m \mathbf {Z}_i v_i \sigma (\mathbf {Z}^\top _i\mathbf {w}) \right) \right] \\ =&\; \mathbb {E}_\mathbf {Z}\left[ \left( \sum _{i=1}^m v_i 1_{\{\mathbf {Z}^\top _i\mathbf {w}>0\}} - \sum _{i=1}^m v^*_i1_{\{\mathbf {Z}^\top _i\mathbf {w}^*>0\}} \right) \left( \sum _{i=1}^m 1_{\{\mathbf {Z}^\top _i\mathbf {w}>0\}} v_i\mathbf {Z}_i \right) \right] . \end{aligned}$$
Invoking Lemma 6, we have
$$\begin{aligned} \mathbb {E}\left[ \mathbf {Z}_i 1_{\{\mathbf {Z}_i^\top \mathbf {w}>0, \mathbf {Z}_j^\top \mathbf {w}>0\}}\right] = {\left\{ \begin{array}{ll} \frac{1}{\sqrt{2\pi }} \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } &{} \text{ if } i=j, \\ \frac{1}{2\sqrt{2\pi }} \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } &{} \text{ if } i\ne j, \end{array}\right. } \end{aligned}$$
(29)
and
$$\begin{aligned} \mathbb {E}\left[ \mathbf {Z}_i 1_{\{\mathbf {Z}_i^\top \mathbf {w}>0, \mathbf {Z}_j^\top \mathbf {w}^* >0\}}\right] = {\left\{ \begin{array}{ll} \frac{\cos (\theta (\mathbf {w},\mathbf {w}^*)/2)}{\sqrt{2\pi }} \frac{\frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \mathbf {w}^* }{\left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \mathbf {w}^* \right\| } &{} \text{ if } i=j, \\ \frac{1}{2\sqrt{2\pi }} \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } &{} \text{ if } i\ne j. \end{array}\right. } \end{aligned}$$
(30)
Therefore,
$$\begin{aligned} \mathbb {E}_\mathbf {Z}\left[ \mathbf {g}(\mathbf {v},\mathbf {w};\mathbf {Z})\right] =&\;\sum _{i=1}^m v_i^2 \mathbb {E}\left[ \mathbf {Z}_i 1_{\{\mathbf {Z}_i^\top \mathbf {w}>0\}}\right] + \sum _{i=1}^m \sum _{\overset{j=1}{j\ne i}}^m v_i v_j \mathbb {E}\left[ \mathbf {Z}_i 1_{\{\mathbf {Z}_i^\top \mathbf {w}>0, \mathbf {Z}_j^\top \mathbf {w}>0\}}\right] \\&\; - \sum _{i=1}^m v_i v_i^* \mathbb {E}\left[ \mathbf {Z}_i 1_{\{\mathbf {Z}_i^\top \mathbf {w}>0, \mathbf {Z}_i^\top \mathbf {w}^*>0\}}\right] \\&- \sum _{i=1}^m \sum _{\overset{j=1}{j\ne i}}^m v_i v_j^* \mathbb {E}\left[ \mathbf {Z}_i 1_{\{\mathbf {Z}_i^\top \mathbf {w}>0, \mathbf {Z}_j^\top \mathbf {w}^*>0\}}\right] \\ =&\; \frac{1}{2\sqrt{2\pi }}\left( \Vert \mathbf {v}\Vert ^2 + (\mathbf {1}^\top \mathbf {v})^2 \right) \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert }\\&-\cos \left( \frac{\theta (\mathbf {w},\mathbf {w}^*)}{2}\right) \frac{\mathbf {v}^\top \mathbf {v}^*}{\sqrt{2\pi }} \frac{\frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \mathbf {w}^* }{\left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \mathbf {w}^* \right\| } \\&\; - \frac{1}{2\sqrt{2\pi }}\left( (\mathbf {1}^\top \mathbf {v})(\mathbf {1}^\top \mathbf {v}^*) - \mathbf {v}^\top \mathbf {v}^* \right) \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert }, \end{aligned}$$
which is exactly (23). \(\square \)
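A Monte Carlo sketch comparing the closed form of \(\mathbb {E}_\mathbf {Z}[\mathbf {g}(\mathbf {v},\mathbf {w};\mathbf {Z})]\) just derived with a direct average of \(\mathbf {g}\), written out as in the first line of the derivation above with \(\sigma (x)=1_{\{x>0\}}\). It assumes \(\Vert \mathbf {w}^*\Vert =1\), as elsewhere in the text; the sizes \(m=n=3\), the vectors, the sample size and the seed are illustrative.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def closed_form_eg(v, w, v_star, w_star):
    """E_Z[g] from the last display of the proof (assumes ||w*|| = 1)."""
    w_hat = unit(w)
    theta = math.acos(max(-1.0, min(1.0, dot(w_hat, w_star))))
    bis = unit([a + b for a, b in zip(w_hat, w_star)])
    s, s_star, vv = sum(v), sum(v_star), dot(v, v_star)
    k = 1 / (2 * math.sqrt(2 * math.pi))
    coef_w = k * (dot(v, v) + s * s) - k * (s * s_star - vv)
    coef_b = math.cos(theta / 2) * vv / math.sqrt(2 * math.pi)
    return [coef_w * a - coef_b * b for a, b in zip(w_hat, bis)]


def monte_carlo_eg(v, w, v_star, w_star, n_samples=100_000, seed=7):
    """Average of g = (sum_i v_i 1{Z_i^T w>0} - sum_i v*_i 1{Z_i^T w*>0}) * sum_i 1{Z_i^T w>0} v_i Z_i."""
    rng = random.Random(seed)
    acc = [0.0] * len(w)
    for _ in range(n_samples):
        residual, direction = 0.0, [0.0] * len(w)
        for vi, vsi in zip(v, v_star):
            z = [rng.gauss(0.0, 1.0) for _ in w]      # row Z_i
            active = dot(z, w) > 0
            residual += vi * active - vsi * (dot(z, w_star) > 0)
            if active:
                direction = [d + vi * zk for d, zk in zip(direction, z)]
        acc = [a + residual * d for a, d in zip(acc, direction)]
    return [a / n_samples for a in acc]


v, v_star = [0.7, -0.2, 1.1], [0.4, 0.9, -0.3]
w, w_star = [2.0, -0.5, 1.0], unit([0.1, 1.0, 0.4])
exact = closed_form_eg(v, w, v_star, w_star)
mc = monte_carlo_eg(v, w, v_star, w_star)
print("closed form:", [round(x, 3) for x in exact])
print("Monte Carlo:", [round(x, 3) for x in mc])
```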
Proof of Lemma 4
Noting that \((\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2})\mathbf {w}= \mathbf {0}\) and \(\Vert \mathbf {w}^*\Vert = 1\), we have, whenever \(\theta (\mathbf {w},\mathbf {w}^*)\ne 0, \pi \),
$$\begin{aligned}&\; \left\langle \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v},\mathbf {w}; \mathbf {Z})\Big ], \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v},\mathbf {w}) \right\rangle \\&\quad = \; \cos \left( \frac{\theta (\mathbf {w},\mathbf {w}^*)}{2}\right) \frac{(\mathbf {v}^{\top }\mathbf {v}^*)^2}{(\sqrt{2\pi })^3} \left\langle \frac{1}{\Vert \mathbf {w}\Vert } \frac{\Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*}{\Big \Vert \Big (\mathbf {I}- \frac{\mathbf {w}\mathbf {w}^\top }{\Vert \mathbf {w}\Vert ^2}\Big )\mathbf {w}^*\Big \Vert } , \frac{\mathbf {w}^*}{\left\| \frac{\mathbf {w}}{\Vert \mathbf {w}\Vert } + \mathbf {w}^*\right\| } \right\rangle \\&\quad = \; \cos \left( \frac{\theta (\mathbf {w},\mathbf {w}^*)}{2}\right) \frac{(\mathbf {v}^{\top }\mathbf {v}^*)^2}{(\sqrt{2\pi })^3} \frac{\Vert \mathbf {w}\Vert ^2 - (\mathbf {w}^\top \mathbf {w}^*)^2}{\Vert \Vert \mathbf {w}\Vert ^2\mathbf {w}^* - \mathbf {w}(\mathbf {w}^\top \mathbf {w}^*)\Vert \, \Vert \mathbf {w}+\Vert \mathbf {w}\Vert \mathbf {w}^*\Vert } \\&\quad = \; \cos \left( \frac{\theta (\mathbf {w},\mathbf {w}^*)}{2}\right) \frac{(\mathbf {v}^{\top }\mathbf {v}^*)^2}{(\sqrt{2\pi })^3} \frac{\Vert \mathbf {w}\Vert ^2 - (\mathbf {w}^\top \mathbf {w}^*)^2}{\sqrt{\Vert \mathbf {w}\Vert ^4 -\Vert \mathbf {w}\Vert ^2(\mathbf {w}^\top \mathbf {w}^*)^2} \sqrt{2(\Vert \mathbf {w}\Vert ^2+ \Vert \mathbf {w}\Vert (\mathbf {w}^\top \mathbf {w}^*))}} \\&\quad = \; \cos \left( \frac{\theta (\mathbf {w},\mathbf {w}^*)}{2}\right) \frac{(\mathbf {v}^{\top }\mathbf {v}^*)^2}{4(\sqrt{\pi \Vert \mathbf {w}\Vert })^3} \frac{\Vert \mathbf {w}\Vert ^2 - (\mathbf {w}^\top \mathbf {w}^*)^2}{\sqrt{\Vert \mathbf {w}\Vert ^2 -(\mathbf {w}^\top \mathbf {w}^*)^2} \sqrt{\Vert \mathbf {w}\Vert + (\mathbf {w}^\top \mathbf {w}^*)}} \\&\quad = \; \cos \left( \frac{\theta (\mathbf {w},\mathbf {w}^*)}{2}\right) \frac{(\mathbf {v}^{\top }\mathbf 
{v}^*)^2\sqrt{1-\frac{\mathbf {w}^\top \mathbf {w}^*}{\Vert \mathbf {w}\Vert }}}{4(\sqrt{\pi })^3\Vert \mathbf {w}\Vert }\\&\quad = \; \cos \left( \frac{\theta (\mathbf {w},\mathbf {w}^*)}{2}\right) \frac{(\mathbf {v}^{\top }\mathbf {v}^*)^2\sqrt{1 - \cos (\theta (\mathbf {w},\mathbf {w}^*))}}{4(\sqrt{\pi })^3\Vert \mathbf {w}\Vert } \\&\quad = \; \frac{\sin \left( \theta (\mathbf {w},\mathbf {w}^*)\right) }{2(\sqrt{2\pi })^3\Vert \mathbf {w}\Vert }(\mathbf {v}^{\top }\mathbf {v}^*)^2. \end{aligned}$$
\(\square \)
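Lemma 4 can be cross-checked numerically by plugging the closed forms of \(\mathbb {E}_\mathbf {Z}[\mathbf {g}]\) (from the proof of Lemma 3) and \(\frac{\partial f}{\partial \mathbf {w}}\) (from Lemma 1) into the inner product and comparing with the final expression above. The sketch assumes \(\Vert \mathbf {w}^*\Vert =1\) and \(\theta (\mathbf {w},\mathbf {w}^*)\in (0,\pi )\); the vectors are arbitrary and deliberately use \(\Vert \mathbf {w}\Vert \ne 1\).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def expected_coarse_grad(v, w, v_star, w_star):
    """Closed form of E_Z[g(v, w; Z)] from the proof of Lemma 3 (||w*|| = 1)."""
    w_hat = unit(w)
    theta = math.acos(max(-1.0, min(1.0, dot(w_hat, w_star))))
    bis = unit([a + b for a, b in zip(w_hat, w_star)])   # (w/||w|| + w*)/||w/||w|| + w*||
    s, s_star, vv = sum(v), sum(v_star), dot(v, v_star)
    k = 1 / (2 * math.sqrt(2 * math.pi))
    coef_w = k * (dot(v, v) + s * s) - k * (s * s_star - vv)
    coef_b = math.cos(theta / 2) * vv / math.sqrt(2 * math.pi)
    return [coef_w * a - coef_b * b for a, b in zip(w_hat, bis)], theta


def grad_w(v, w, v_star, w_star):
    """df/dw = -(v^T v*)/(2 pi ||w||) * P w*/||P w*||, with P = I - w w^T/||w||^2."""
    nw2 = dot(w, w)
    p_hat = unit([b - dot(w, w_star) / nw2 * a for a, b in zip(w, w_star)])
    return [-dot(v, v_star) / (2 * math.pi * math.sqrt(nw2)) * a for a in p_hat]


v, v_star = [0.7, -0.2, 1.1], [0.4, 0.9, -0.3]
w, w_star = [2.0, -0.5, 1.0, 0.3], unit([0.1, 1.0, 0.4, -0.6])
eg, theta = expected_coarse_grad(v, w, v_star, w_star)
lhs = dot(eg, grad_w(v, w, v_star, w_star))
rhs = math.sin(theta) / (2 * (2 * math.pi) ** 1.5 * math.sqrt(dot(w, w))) * dot(v, v_star) ** 2
print(lhs, rhs)
```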
Proof of Lemma 5
Denote \(\theta := \theta (\mathbf {w},\mathbf {w}^*)\). By Lemma 1, we have
$$\begin{aligned} \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w}) = \frac{1}{4}\big (\mathbf {I}+ \mathbf {1}\mathbf {1}^\top \big ) \mathbf {v}- \frac{1}{4}\left( \left( 1-\frac{2\theta }{\pi } \right) \mathbf {I}+ \mathbf {1}\mathbf {1}^\top \right) \mathbf {v}^*. \end{aligned}$$
Since \(\Vert \mathbf {w}\Vert =1\), Lemma 3 gives
$$\begin{aligned} \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v},\mathbf {w}; \mathbf {Z})\Big ] = \frac{h(\mathbf {v},\mathbf {v}^*)}{2\sqrt{2\pi }}\mathbf {w}- \cos \left( \frac{\theta }{2}\right) \frac{\mathbf {v}^\top \mathbf {v}^*}{\sqrt{2\pi }}\frac{\mathbf {w}+ \mathbf {w}^*}{\left\| \mathbf {w}+ \mathbf {w}^*\right\| }, \end{aligned}$$
(31)
where
$$\begin{aligned} h(\mathbf {v},\mathbf {v}^*) =&\; \Vert \mathbf {v}\Vert ^2+ (\mathbf {1}^\top \mathbf {v})^2 - (\mathbf {1}^\top \mathbf {v})(\mathbf {1}^\top \mathbf {v}^*) + \mathbf {v}^\top \mathbf {v}^* \nonumber \\ =&\; \mathbf {v}^\top \left( \mathbf {I}+ \mathbf {1}\mathbf {1}^\top \right) \mathbf {v}- \mathbf {v}^\top (\mathbf {1}\mathbf {1}^\top - \mathbf {I})\mathbf {v}^* \nonumber \\ =&\; \mathbf {v}^\top \left( \mathbf {I}+ \mathbf {1}\mathbf {1}^\top \right) \mathbf {v}- \mathbf {v}^\top \left( \mathbf {1}\mathbf {1}^\top + \left( 1-\frac{2\theta }{\pi }\right) \mathbf {I}\right) \mathbf {v}^* + 2\left( 1 - \frac{\theta }{\pi }\right) \mathbf {v}^\top \mathbf {v}^* \nonumber \\ =&\; 4 \mathbf {v}^\top \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w}) + 2\left( 1 - \frac{\theta }{\pi }\right) \mathbf {v}^\top \mathbf {v}^*, \end{aligned}$$
(32)
and by Lemma 4,
$$\begin{aligned} \left\langle \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v},\mathbf {w}; \mathbf {Z})\Big ], \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v},\mathbf {w}) \right\rangle = \frac{\sin \left( \theta \right) }{2(\sqrt{2\pi })^3}(\mathbf {v}^\top \mathbf {v}^*)^2. \end{aligned}$$
Hence, for some A depending only on C, we have
$$\begin{aligned}&\; \left\| \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v},\mathbf {w}; \mathbf {Z})\Big ] \right\| ^2 \\&\quad = \; \left\| \frac{2 \mathbf {v}^\top \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w})}{\sqrt{2\pi }} \mathbf {w}+ \cos \left( \frac{\theta }{2}\right) \frac{\mathbf {v}^\top \mathbf {v}^*}{\sqrt{2\pi }}\left( \mathbf {w}- \frac{\mathbf {w}+ \mathbf {w}^*}{\left\| \mathbf {w}+ \mathbf {w}^*\right\| } \right) \right. \\&\qquad \left. + \left( 1-\frac{\theta }{\pi }-\cos \left( \frac{\theta }{2}\right) \right) \frac{\mathbf {v}^\top \mathbf {v}^*}{\sqrt{2\pi }}\mathbf {w}\right\| ^2 \\&\quad \le \; \frac{6C^2}{\pi } \left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w})\right\| ^2 + \cos ^2\left( \frac{\theta }{2}\right) \frac{3(\mathbf {v}^\top \mathbf {v}^*)^2}{2\pi }\left\| \mathbf {w}- \frac{\mathbf {w}+ \mathbf {w}^*}{\left\| \mathbf {w}+ \mathbf {w}^*\right\| } \right\| ^2 \\&\qquad \; + \left( 1-\frac{\theta }{\pi }-\cos \left( \frac{\theta }{2}\right) \right) ^2 \frac{3(\mathbf {v}^\top \mathbf {v}^*)^2}{2\pi } \\&\quad \le \; \frac{6C^2}{\pi }\left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w})\right\| ^2 + \cos ^2\left( \frac{\theta }{2}\right) \frac{3\theta ^2}{8\pi } (\mathbf {v}^\top \mathbf {v}^*)^2\\&\qquad +\left( 1-\frac{\theta }{\pi }-\cos \left( \frac{\theta }{2}\right) \right) ^2 \frac{3(\mathbf {v}^\top \mathbf {v}^*)^2}{2\pi } \\&\quad \le \; \frac{6C^2}{\pi }\left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w})\right\| ^2 + \frac{3\pi }{8}\cos ^2\left( \frac{\theta }{2}\right) \sin ^2\left( \frac{\theta }{2}\right) (\mathbf {v}^\top \mathbf {v}^*)^2 + \frac{3\sin (\theta )}{2\pi }(\mathbf {v}^\top \mathbf {v}^*)^2 \\&\quad \le \; A\left( \left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v},\mathbf {w})\right\| ^2 + \left\langle \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v},\mathbf {w}; \mathbf {Z})\Big ], \frac{\partial 
f}{\partial \mathbf {w}}(\mathbf {v},\mathbf {w}) \right\rangle \right) , \end{aligned}$$
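where the equality is due to (31) and (32); the first inequality follows from \(\Vert \mathbf {p}+\mathbf {q}+\mathbf {r}\Vert ^2\le 3(\Vert \mathbf {p}\Vert ^2+\Vert \mathbf {q}\Vert ^2+\Vert \mathbf {r}\Vert ^2)\) (a consequence of the Cauchy–Schwarz inequality), together with \(\Vert \mathbf {w}\Vert =1\) and \(\Vert \mathbf {v}\Vert \le C\); the second inequality holds because the angle between \(\mathbf {w}\) and the unit vector \(\frac{\mathbf {w}+ \mathbf {w}^*}{\left\| \mathbf {w}+ \mathbf {w}^*\right\| }\) is \(\frac{\theta }{2}\), so that \(\left\| \mathbf {w}- \frac{\mathbf {w}+ \mathbf {w}^*}{\left\| \mathbf {w}+ \mathbf {w}^*\right\| } \right\| \le \frac{\theta }{2}\) (a chord is never longer than its arc); and the third inequality is due to \(\sin (x)\ge \frac{2x}{\pi }\), \(\cos (x)\ge 1-\frac{2x}{\pi }\), and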
$$\begin{aligned} \left( 1-\frac{2x}{\pi } -\cos (x)\right) ^2\le & {} \left( \cos (x) - 1+ \frac{2x}{\pi } \right) \left( \cos (x) + 1 - \frac{2x}{\pi }\right) \\\le & {} \sin (x)(2\cos (x)) = \sin (2x), \end{aligned}$$
for all \(x\in [0,\frac{\pi }{2}]\). \(\square \)
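The scalar facts behind the last two inequalities can be sanity-checked numerically. The Python sketch below evaluates them on a uniform grid; it is an illustration under our own choice of grid size and tolerance, not a proof, and the function names are hypothetical. Since the distance between two unit vectors depends only on their angle, the chord bound is checked with the assumed planar choice \(\mathbf {w}=(1,0)\), \(\mathbf {w}^*=(\cos \theta ,\sin \theta )\).

```python
import math


def check_elementary_bounds(num_points: int = 10_001, tol: float = 1e-12) -> bool:
    """Grid check of the scalar inequalities used above, for x in [0, pi/2]."""
    for k in range(num_points):
        x = 0.5 * math.pi * k / (num_points - 1)
        # sin(x) >= 2x/pi (Jordan's inequality).
        if math.sin(x) < 2.0 * x / math.pi - tol:
            return False
        # cos(x) >= 1 - 2x/pi.
        if math.cos(x) < 1.0 - 2.0 * x / math.pi - tol:
            return False
        # (1 - 2x/pi - cos(x))^2 <= sin(2x); with x = theta/2 this bounds the last term.
        if (1.0 - 2.0 * x / math.pi - math.cos(x)) ** 2 > math.sin(2.0 * x) + tol:
            return False
    return True


def check_bisector_chord(num_points: int = 10_001, tol: float = 1e-12) -> bool:
    """Check ||w - (w + w*)/||w + w*|| || <= theta/2 for unit w, w* in the plane."""
    w = (1.0, 0.0)
    # theta = pi is excluded because w + w* = 0 there.
    for k in range(num_points - 1):
        theta = math.pi * k / (num_points - 1)
        w_star = (math.cos(theta), math.sin(theta))
        s = (w[0] + w_star[0], w[1] + w_star[1])
        norm_s = math.hypot(s[0], s[1])
        bisector = (s[0] / norm_s, s[1] / norm_s)
        dist = math.hypot(w[0] - bisector[0], w[1] - bisector[1])
        if dist > theta / 2.0 + tol:
            return False
    return True


if __name__ == "__main__":
    print(check_elementary_bounds(), check_bisector_chord())
```

The chord check reflects the identity \(\Vert \mathbf {w}-\mathbf {b}\Vert = 2\sin (\theta /4)\le \theta /2\) for unit vectors \(\mathbf {w},\mathbf {b}\) at angle \(\theta /2\).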
Proof of Theorem 1
To apply Lemmas 2 and 5, we need the sequence \(\{\mathbf {v}^t\}\) to be bounded. Since f is coercive with respect to \(\mathbf {v}\), there exists \(C_0>0\) such that \(\Vert \mathbf {v}\Vert \le C_0\) for any \(\mathbf {v}\in \{\mathbf {v}\in \mathbb {R}^m: f(\mathbf {v},\mathbf {w})\le f(\mathbf {v}^0,\mathbf {w}^0) \text{ for } \text{ some } \mathbf {w}\}\). In particular, \(\Vert \mathbf {v}^0\Vert \le C_0\). We proceed by induction and suppose that \(f(\mathbf {v}^{t},\mathbf {w}^{t})\le f(\mathbf {v}^0,\mathbf {w}^0)\) and \(\Vert \mathbf {v}^t\Vert \le C_0\). If \(\mathbf {w}^t = \pm \mathbf {w}^*\), then \(\mathbf {w}^{t+1} = \mathbf {w}^{t+2} = \cdots = \pm \mathbf {w}^*\), and the original problem reduces to a quadratic program in \(\mathbf {v}\). For a suitably chosen step size \(\eta \), \(\{\mathbf {v}^t\}\) then converges to \(\mathbf {v}^*\) or \((\mathbf {I}+ \mathbf {1}\mathbf {1}^\top )^{-1}(\mathbf {1}\mathbf {1}^\top - \mathbf {I})\mathbf {v}^*\). In either case, both \(\left\| \mathbb {E}_\mathbf {Z}\Big [\frac{\partial \ell }{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ]\right\| \) and \(\left\| \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ]\right\| \) converge to 0. Otherwise, if \(\mathbf {w}^t \ne \pm \mathbf {w}^*\), we define for \(a\in [0,1]\)
$$\begin{aligned} \mathbf {v}^t(a) := \mathbf {v}^t + a(\mathbf {v}^{t+1} - \mathbf {v}^t) = \mathbf {v}^t - a \eta \mathbb {E}_\mathbf {Z}\left[ \frac{\partial \ell }{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t;\mathbf {Z})\right] \end{aligned}$$
and
$$\begin{aligned} \mathbf {w}^t(a) := \mathbf {w}^t + a(\mathbf {w}^{t+1/2} - \mathbf {w}^t) = \mathbf {w}^t - a\eta \mathbb {E}_\mathbf {Z}\left[ \mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\right] , \end{aligned}$$
which satisfy
$$\begin{aligned} \mathbf {v}^t(0) = \mathbf {v}^t, \; \mathbf {v}^t(1) = \mathbf {v}^{t+1}, \; \mathbf {w}^t(0) = \mathbf {w}^t, \; \mathbf {w}^t(1) = \mathbf {w}^{t+1/2}. \end{aligned}$$
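By the endpoint conditions above and the rightmost expressions in the two definitions, \(a\mapsto (\mathbf {v}^t(a),\mathbf {w}^t(a))\) is the straight segment from \((\mathbf {v}^t,\mathbf {w}^t)\) to \((\mathbf {v}^{t+1},\mathbf {w}^{t+1/2})\). As a short worked step, at every \(a\) where \(\frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t(a),\mathbf {w}^t(a))\) exists, the chain rule gives
$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}a} f(\mathbf {v}^t(a),\mathbf {w}^t(a)) = \left\langle \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t(a),\mathbf {w}^t(a)), \mathbf {v}^{t+1}-\mathbf {v}^t \right\rangle + \left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t(a),\mathbf {w}^t(a)), \mathbf {w}^{t+1/2}-\mathbf {w}^t \right\rangle , \end{aligned}$$
and integrating this identity over [0, 1] gives the integral representation of \(f(\mathbf {v}^{t+1},\mathbf {w}^{t+1/2}) - f(\mathbf {v}^t,\mathbf {w}^t)\) that appears in (33).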
Let us fix \(0<c<1\) and \(C\ge C_0\). By the expressions of \(\mathbb {E}_\mathbf {Z}\left[ \frac{\partial \ell }{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t;\mathbf {Z})\right] \) and \(\mathbb {E}_\mathbf {Z}\left[ \mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\right] \) given in Lemma 3, and since \(\Vert \mathbf {w}^t\Vert =1\), there exists \({\tilde{\eta }}>0\), depending on \(C_0\), such that for every \(\eta \le {\tilde{\eta }}\) we have \(\Vert \mathbf {v}^t(a)\Vert \le C\) and \(\Vert \mathbf {w}^t(a)\Vert \ge c\) for all \(a\in [0,1]\). There may be a point \(a_0\) at which \(\theta (\mathbf {w}^t(a_0),\mathbf {w}^*) = 0\) or \(\pi \), so that \(\frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t(a_0),\mathbf {w}^t(a_0))\) does not exist. Apart from this point, \(\left\| \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t(a),\mathbf {w}^t(a)) \right\| \) is uniformly bounded for all \(a\in [0,1]\setminus \{a_0\}\), which makes it integrable over the interval [0, 1]. Then, we have
$$\begin{aligned} f(\mathbf {v}^{t+1}, \mathbf {w}^{t+1})&= \; f(\mathbf {v}^{t+1}, \mathbf {w}^{t+1/2}) = f(\mathbf {v}^t+ (\mathbf {v}^{t+1} -\mathbf {v}^t), \mathbf {w}^t+ (\mathbf {w}^{t+1/2}-\mathbf {w}^t)) \nonumber \\&= \; f(\mathbf {v}^t, \mathbf {w}^t) + \int _{0}^1 \left\langle \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t(a),\mathbf {w}^t(a)) , \mathbf {v}^{t+1} -\mathbf {v}^t \right\rangle \mathrm {d}a \nonumber \\&\quad + \int _{0}^1 \left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t(a),\mathbf {w}^t(a)), \mathbf {w}^{t+1/2} - \mathbf {w}^t \right\rangle \mathrm {d}a \nonumber \\&= \; f(\mathbf {v}^{t}, \mathbf {w}^{t}) + \left\langle \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t) , \mathbf {v}^{t+1} -\mathbf {v}^t \right\rangle + \left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t,\mathbf {w}^t) , \mathbf {w}^{t+1/2} -\mathbf {w}^t \right\rangle \nonumber \\&\quad + \int _{0}^1 \left\langle \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t(a),\mathbf {w}^t(a)) - \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t) , \mathbf {v}^{t+1} -\mathbf {v}^t \right\rangle \mathrm {d}a \nonumber \\&\quad + \int _{0}^1 \left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t(a),\mathbf {w}^t(a)) - \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t,\mathbf {w}^t) , \mathbf {w}^{t+1/2} - \mathbf {w}^t \right\rangle \mathrm {d}a \nonumber \\&\le \; f(\mathbf {v}^{t}, \mathbf {w}^{t}) -\left( \eta -\frac{L\eta ^2}{2}\right) \left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t) \right\| ^2 \nonumber \\&\quad - \eta \left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t,\mathbf {w}^t), \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ] \right\rangle \nonumber \\&\quad + \frac{L\eta ^2}{2} \left\| \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ] \right\| ^2 \nonumber 
\\&\le \; f(\mathbf {v}^{t}, \mathbf {w}^{t}) -\left( \eta -(1+A)\frac{L\eta ^2}{2}\right) \left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t) \right\| ^2 \nonumber \\&\quad - \left( \eta -\frac{AL\eta ^2}{2}\right) \left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t,\mathbf {w}^t), \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ] \right\rangle . \end{aligned}$$
(33)
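The first equality holds because \(f(\mathbf {v},\cdot )\) is invariant to positive rescaling of \(\mathbf {w}\) and \(\mathbf {w}^{t+1}\) is a positive multiple of \(\mathbf {w}^{t+1/2}\). The third equality is due to the fundamental theorem of calculus. In the first inequality, we invoked Lemma 2 for \((\mathbf {v}^t, \mathbf {w}^t)\) and \((\mathbf {v}^t(a), \mathbf {w}^t(a))\) with \(a\in [0,1]\setminus \{a_0\}\). In the last inequality, we used Lemma 5. So when \(\eta < \eta _0:= \min \left\{ \frac{2}{(1+A)L}, {\tilde{\eta }}\right\} \), both coefficients in the last line of (33) are positive; since the inner product \(\left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t,\mathbf {w}^t), \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ] \right\rangle \) is nonnegative, we obtain \(f(\mathbf {v}^{t+1},\mathbf {w}^{t+1})\le f(\mathbf {v}^{t},\mathbf {w}^{t})\le f(\mathbf {v}^0,\mathbf {w}^0)\) and thus \(\Vert \mathbf {v}^{t+1}\Vert \le C_0\), which completes the induction step.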
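The size of the correction terms in the first inequality of (33) can be traced with a short computation. Here we assume that Lemma 2 supplies a Lipschitz bound with constant L on the full gradient \(\nabla f = \left( \frac{\partial f}{\partial \mathbf {v}}, \frac{\partial f}{\partial \mathbf {w}}\right) \) over the region \(\Vert \mathbf {v}\Vert \le C\), \(\Vert \mathbf {w}\Vert \ge c\); this form of the lemma is our reading, not a restatement. Writing \(\boldsymbol{\delta } := (\mathbf {v}^{t+1}-\mathbf {v}^t, \mathbf {w}^{t+1/2}-\mathbf {w}^t)\), the point \((\mathbf {v}^t(a),\mathbf {w}^t(a))\) lies at distance \(a\Vert \boldsymbol{\delta }\Vert \) from \((\mathbf {v}^t,\mathbf {w}^t)\), so the Cauchy-Schwarz inequality and \(\int _0^1 a\,\mathrm {d}a=\frac{1}{2}\) bound the sum of the two remainder integrals by
$$\begin{aligned}&\int _0^1 \left\langle \nabla f(\mathbf {v}^t(a),\mathbf {w}^t(a)) - \nabla f(\mathbf {v}^t,\mathbf {w}^t), \boldsymbol{\delta } \right\rangle \mathrm {d}a \le \int _0^1 L a \Vert \boldsymbol{\delta }\Vert ^2 \,\mathrm {d}a = \frac{L}{2}\Vert \boldsymbol{\delta }\Vert ^2 \\&\quad = \frac{L\eta ^2}{2}\left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t)\right\| ^2 + \frac{L\eta ^2}{2}\left\| \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ]\right\| ^2 , \end{aligned}$$
where the last equality identifies \(\mathbb {E}_\mathbf {Z}\left[ \frac{\partial \ell }{\partial \mathbf {v}}\right] \) with \(\frac{\partial f}{\partial \mathbf {v}}\) as in (33). These are the two \(\frac{L\eta ^2}{2}\) terms after the first inequality of (33); the integrand is undefined only at \(a_0\), which does not affect the integral.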
Summing (33) over t from 0 to \(\infty \), telescoping the left-hand side, and using \(f\ge 0\), we obtain
$$\begin{aligned}&\; \eta \sum _{t=0}^\infty \left[ \left( 1 -(1+A)\frac{L\eta }{2}\right) \left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t) \right\| ^2 + \left( 1 -\frac{AL\eta }{2}\right) \left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t,\mathbf {w}^t), \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ] \right\rangle \right] \\&\quad \le \; f(\mathbf {v}^0,\mathbf {w}^0)<\infty . \end{aligned}$$
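Since \(0<\eta < \eta _0\), both coefficients in the sum are positive and the inner-product terms are nonnegative, so every summand is nonnegative and must tend to zero. Hence,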
$$\begin{aligned} \lim _{t\rightarrow \infty }\left\| \frac{\partial f}{\partial \mathbf {v}}(\mathbf {v}^t,\mathbf {w}^t)\right\| = 0 \end{aligned}$$
and
$$\begin{aligned} \lim _{t\rightarrow \infty } \left\langle \frac{\partial f}{\partial \mathbf {w}}(\mathbf {v}^t,\mathbf {w}^t), \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ] \right\rangle = 0. \end{aligned}$$
Invoking Lemma 5 again, we further have
$$\begin{aligned} \lim _{t\rightarrow \infty }\left\| \mathbb {E}_\mathbf {Z}\Big [\mathbf {g}(\mathbf {v}^t,\mathbf {w}^t; \mathbf {Z})\Big ]\right\| = 0, \end{aligned}$$
which completes the proof. \(\square \)
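The role of the threshold \(\eta _0\) can be illustrated with a few lines of Python. The sketch below evaluates the two coefficients in the last line of (33) for a given step size; the function names are hypothetical, and the values of L, A and \({\tilde{\eta }}\) used in the example are placeholders, since the proof only asserts that such constants exist.

```python
def eta_0(L: float, A: float, eta_tilde: float) -> float:
    """Step-size threshold eta_0 = min{2 / ((1 + A) L), eta_tilde}."""
    return min(2.0 / ((1.0 + A) * L), eta_tilde)


def descent_coefficients(eta: float, L: float, A: float) -> tuple[float, float]:
    """Coefficients in the last line of (33).

    coef_v multiplies ||df/dv||^2 and coef_w multiplies <df/dw, E[g]>.
    Both are positive for 0 < eta < 2 / ((1 + A) L).
    """
    coef_v = eta - (1.0 + A) * L * eta**2 / 2.0
    coef_w = eta - A * L * eta**2 / 2.0
    return coef_v, coef_w


if __name__ == "__main__":
    # Hypothetical constants, chosen only for illustration.
    L, A, eta_tilde = 4.0, 1.0, 1.0
    threshold = eta_0(L, A, eta_tilde)
    for eta in (0.5 * threshold, 0.99 * threshold, threshold):
        coef_v, coef_w = descent_coefficients(eta, L, A)
        print(f"eta={eta:.4f}  coef_v={coef_v:.6f}  coef_w={coef_w:.6f}")
```

Because \(A < 1+A\), the coefficient of the inner-product term is never smaller than that of \(\left\| \frac{\partial f}{\partial \mathbf {v}}\right\| ^2\), which is why only \(\frac{2}{(1+A)L}\) enters \(\eta _0\); at \(\eta = \frac{2}{(1+A)L}\) the first coefficient vanishes.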