Local Convergence of an Algorithm for Subspace Identification from Partial Data

Abstract

Grassmannian rank-one update subspace estimation (GROUSE) is an iterative algorithm for identifying a linear subspace of \(\mathbb {R}^n\) from data consisting of partial observations of random vectors from that subspace. This paper examines local convergence properties of GROUSE, under assumptions on the randomness of the observed vectors, the randomness of the subset of elements observed at each iteration, and incoherence of the subspace with the coordinate directions. Convergence at an expected linear rate is demonstrated under certain assumptions. The case in which the full random vector is revealed at each iteration allows for much simpler analysis and is also described. GROUSE is related to incremental SVD methods and to gradient projection algorithms in optimization.

References

  1. B. A. Ardekani, J. Kershaw, K. Kashikura, and I. Kanno, Activation detection in functional MRI using subspace modeling and maximum likelihood estimation, IEEE Transactions on Medical Imaging, 18 (1999), pp. 101–114.

  2. L. Balzano, Handling Missing Data in High-Dimensional Subspace Modeling, PhD thesis, University of Wisconsin-Madison, May 2012.

  3. L. Balzano, R. Nowak, and B. Recht, Online identification and tracking of subspaces from highly incomplete information, in 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), September 2010, pp. 704–711. Available at http://arxiv.org/abs/1006.4046.

  4. L. Balzano, B. Recht, and R. Nowak, High-dimensional matched subspace detection when data are missing, in Proceedings of the International Symposium on Information Theory, IEEE, June 2010, pp. 1638–1642.

  5. L. Balzano and S. J. Wright, On GROUSE and incremental SVD, in Proceedings of the 5th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2013, pp. 1–4.

  6. R. Basri and D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (2003), pp. 218–233.

  7. E. Candès and J. Romberg, Sparsity and incoherence in compressive sampling, Inverse Problems, 23 (2007), pp. 969–985.

  8. J. P. Costeira and T. Kanade, A multibody factorization method for independently moving objects, International Journal of Computer Vision, 29 (1998), pp. 159–179.

  9. D. Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE Transactions on Information Theory, 57 (2011), pp. 1548–1566.

  10. J. Gupchup, R. Burns, A. Terzis, and A. Szalay, Model-based event detection in wireless sensor networks, in Proceedings of the Workshop on Data Sharing and Interoperability (DSI), 2007.

  11. N. Halko, P.-G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, 53 (2011), pp. 217–288.

  12. H. Krim and M. Viberg, Two decades of array signal processing research: the parametric approach, IEEE Signal Processing Magazine, 13 (1996), pp. 67–94.

  13. A. Lakhina, M. Crovella, and C. Diot, Diagnosing network-wide traffic anomalies, in Proceedings of SIGCOMM, 2004, pp. 219–230.

  14. D. Manolakis and G. Shaw, Detection algorithms for hyperspectral imaging applications, IEEE Signal Processing Magazine, 19 (2002), pp. 29–43.

  15. J. Nocedal and S. J. Wright, Numerical Optimization, Springer, New York, second ed., 2006.

  16. S. Papadimitriou, J. Sun, and C. Faloutsos, Streaming pattern discovery in multiple time-series, in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB ’05), 2005, pp. 697–708.

  17. B. Recht, A simpler approach to matrix completion, Journal of Machine Learning Research, 12 (2011), pp. 3413–3430.

  18. G. W. Stewart and J. Sun, Matrix Perturbation Theory, Computer Science and Scientific Computing, Academic Press, New York, 1990.

  19. L. Tong and S. Perreau, Multichannel blind identification: From subspace to maximum likelihood methods, Proceedings of the IEEE, 86 (1998), pp. 1951–1968.

  20. P. van Overschee and B. de Moor, Subspace Identification for Linear Systems, Kluwer Academic Publishers, Norwell, Massachusetts, 1996.

  21. L. Vandenberghe, Convex optimization techniques in system identification, in Proceedings of the IFAC Symposium on System Identification, July 2012, pp. 71–76.

  22. G. S. Wagner and T. J. Owens, Signal detection using multi-channel seismic data, Bulletin of the Seismological Society of America, 86 (1996), pp. 221–231.

Acknowledgments

We are grateful to two referees for helpful and constructive comments on the original version of this manuscript.

Author information

Corresponding author

Correspondence to Stephen J. Wright.

Additional information

Communicated by Michael Todd.

Appendices

Appendix 1: Proof of Theorem 2.6

We start with a key result on matrix concentration.

Theorem 1

(Noncommutative Bernstein Inequality [9, 17]) Let \(X_1, \dots , X_m\) be independent, zero-mean \(d \times d\) random matrices. Suppose that

$$\begin{aligned} \rho _k^2 {:=} \max \, \{\Vert \mathbb {E}[X_k X_k^T]\Vert _2, \Vert \mathbb {E}[X_k^T X_k] \Vert _2 \} \end{aligned}$$

and \(\Vert X_k\Vert _2 \le M\) almost surely for all \(k\). Then, for any \(\tau > 0\),

$$\begin{aligned} \mathbb {P} \left[ \left\| \sum _{k=1}^m X_k \right\| _2 > \tau \right] \le 2d \exp \left( \frac{-\tau ^2 / 2}{\sum _{k=1}^m \rho _k^2 + M\tau / 3} \right) . \end{aligned}$$

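In the application below, the matrices \(X_k\) are centered outer products of randomly sampled rows of a matrix with orthonormal columns. As an illustration only, and not part of the argument that follows, the concentration phenomenon that Theorem 1 quantifies can be observed numerically; the dimensions, sample size, and random seed in the sketch below are arbitrary choices.

```python
import numpy as np

# Illustration only: empirical check of the concentration quantified by Theorem 1.
rng = np.random.default_rng(0)
n, d, m = 2000, 10, 400                            # ambient dimension, subspace dimension, |Omega|
U, _ = np.linalg.qr(rng.standard_normal((n, d)))   # n x d matrix with orthonormal columns
Omega = rng.integers(0, n, size=m)                 # row indices drawn uniformly with replacement
S = U[Omega].T @ U[Omega]                          # sum over k in Omega of u_k u_k^T (a d x d matrix)
deviation = np.linalg.norm(S - (m / n) * np.eye(d), 2)
print(f"spectral deviation from (|Omega|/n) I_d: {deviation:.4f}   (|Omega|/n = {m / n:.3f})")
```
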
We proceed with the proof of Theorem 2.6.

Proof

We start by defining the notation

$$\begin{aligned} u_k {:=} U_{\Omega (k) \cdot }^T \in \mathbb {R}^d, \end{aligned}$$

that is, \(u_k\) is the transpose of the row of \(U\) that corresponds to the \(k\hbox {th}\) element of \(\Omega \). We thus define

$$\begin{aligned} X_k {:=} u_k u_k^T - \frac{1}{n} I_d, \end{aligned}$$

where \(I_d\) is the \(d \times d\) identity matrix. Because \(\Omega (k)\) is drawn uniformly from \(\{1,2,\cdots ,n\}\) and the columns of \(U\) are orthonormal, this random matrix has zero mean.

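Explicitly, since \(\Omega (k)\) is equally likely to be each of the \(n\) row indices,

$$\begin{aligned} \mathbb {E}[u_k u_k^T] = \frac{1}{n} \sum _{i=1}^n U_{i \cdot }^T U_{i \cdot } = \frac{1}{n} U^TU = \frac{1}{n} I_d, \end{aligned}$$

so that \(\mathbb {E}[X_k] = 0\); this identity is used again below.
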
To apply Theorem 1, we must compute the values of \(\rho _k\) and \(M\) that correspond to this definition of \(X_k\). Since \(\Omega (k)\) is chosen uniformly with replacement, the \(X_k\) are distributed identically for all \(k\), and \(\rho _k\) is independent of \(k\) (and can thus be denoted by \(\rho \)).

Using the fact that

$$\begin{aligned} \Vert A-B\Vert _2 \le \max \{\Vert A\Vert _2,\Vert B\Vert _2\} \;\; \text{ for } \text{ positive } \text{ semidefinite } \text{ matrices } A \hbox { and } B,\qquad \end{aligned}$$
(6.1)

and recalling that \(\Vert u_k\Vert ^2_2 = \Vert U_{\Omega (k) \cdot }\Vert ^2_2 \le d\mu (U)/n\), we have

$$\begin{aligned} \left\| u_k u_k^T - \frac{1}{n} I_d \right\| _2 \le \max \left\{ \frac{d\mu (U)}{n}, \frac{1}{n} \right\} . \end{aligned}$$

Thus, since \(d\mu (U) \ge 1\), we can define \(M {:=} d\mu (U) / n\). For \(\rho \), we note by symmetry of \(X_k\) that

$$\begin{aligned} \rho ^2 = \left\| \mathbb {E} \left[ X_k^2 \right] \right\| _2&= \left\| \mathbb {E} \left[ u_k u_k^T u_k u_k^T - \frac{2}{n} u_k u_k^T + \frac{1}{n^2} I_d \right] \right\| _2 \nonumber \\&= \left\| \mathbb {E}\left[ u_k (u_k^T u_k) u_k^T\right] - \frac{1}{n^2} I_d \right\| _2, \end{aligned}$$
(6.2)

where the last step follows from linearity of expectation and the identity \(\mathbb {E}[u_k u_k^T] = (1/n) I_d\) noted above.

For the next step, we define \(S\) to be the \(n \times n\) diagonal matrix with diagonal elements \(\Vert U_{i \cdot }\Vert _2^2\), \(i=1,2,\cdots ,n\). We thus have

$$\begin{aligned} \Vert \mathbb {E}[u_k (u_k^T u_k) u_k^T]\Vert _2 = \left\| \frac{1}{n} U^T S U \right\| _2 \le \frac{1}{n} \Vert U \Vert _2^2 \Vert S\Vert _2 = \frac{1}{n} \frac{d \mu (U)}{n} = \frac{d \mu (U)}{n^2}. \end{aligned}$$

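The first equality in the previous display follows by writing the expectation as a uniform average over the rows of \(U\):

$$\begin{aligned} \mathbb {E}\left[ u_k (u_k^T u_k) u_k^T\right] = \frac{1}{n} \sum _{i=1}^n \Vert U_{i \cdot }\Vert _2^2 \, U_{i \cdot }^T U_{i \cdot } = \frac{1}{n} U^T S U. \end{aligned}$$
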
Using (6.1), we have from (6.2) that

$$\begin{aligned} \rho ^2 \le \max \left( \left\| \mathbb {E}\left[ u_k (u_k^T u_k) u_k^T\right] \right\| , \frac{1}{n^2} \right) \le \max \left( \frac{d \mu (U)}{n^2}, \frac{1}{n^2} \right) = \frac{d \mu (U)}{n^2}, \end{aligned}$$

since \(d \mu (U) \ge d \ge 1\).

We now apply Theorem 1. First, we restrict \(\tau \) to be such that \(M\tau \le |\Omega |\, d\mu (U)/n^2\); combined with the bound \(\rho ^2 \le d\mu (U)/n^2\) obtained above, this simplifies the denominator of the exponent. We obtain

$$\begin{aligned} 2d \exp \left( \frac{-\tau ^2 / 2}{|\Omega | \rho ^2 + M\tau /3}\right) \le 2d\exp \left( \frac{-\tau ^2 / 2}{\frac{4}{3} |\Omega | \frac{d\mu (U)}{ n^2}}\right) , \end{aligned}$$

and thus

$$\begin{aligned} \mathbb {P} \left[ \left\| \sum _{k \in \Omega } \left( u_k u_k^T - \frac{1}{n} I_d \right) \right\| > \tau \right] \le 2d \exp \left( \frac{-3 n^2 \tau ^2}{8 |\Omega |d\mu (U)}\right) . \end{aligned}$$

Now take \(\tau = \gamma |\Omega |/n\) with \(\gamma \) defined in the statement of Theorem 2.6. Since \(\gamma <1\) by assumption, the restriction \(M\tau \le |\Omega |\, d\mu (U)/n^2\) holds, and we have

$$\begin{aligned} \mathbb {P} \left[ \left\| \sum _{k \in \Omega } \left( u_k u_k^T - \frac{1}{n} I_d \right) \right\| _2 \le \frac{|\Omega |}{n} \gamma \right] \ge 1 - \delta . \end{aligned}$$
(6.3)

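In more detail, with \(\tau = \gamma |\Omega |/n\) we have

$$\begin{aligned} M\tau = \frac{d\mu (U)}{n} \cdot \frac{\gamma |\Omega |}{n} = \gamma \, \frac{|\Omega |\, d\mu (U)}{n^2} \le \frac{|\Omega |\, d\mu (U)}{n^2}, \qquad \frac{3 n^2 \tau ^2}{8 |\Omega |\, d\mu (U)} = \frac{3 \gamma ^2 |\Omega |}{8\, d\mu (U)}, \end{aligned}$$

so the failure probability in the bound above is \(2d \exp \left( -3\gamma ^2 |\Omega |/(8 d\mu (U)) \right) \); by the definition of \(\gamma \) in Theorem 2.6, this quantity is at most \(\delta \), yielding (6.3).
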
We have, by symmetry of \(\sum _{k \in \Omega } u_k u_k^T\) and the fact that

$$\begin{aligned} \lambda _i \left( \sum _{k \in \Omega } u_k u_k^T - \frac{|\Omega |}{n} I_d\right) = \lambda _i \left( \sum _{k \in \Omega } u_k u_k^T \right) - \frac{|\Omega |}{n}, \end{aligned}$$

that

$$\begin{aligned} \left\| \sum _{k \in \Omega } \left( u_k u_k^T - \frac{1}{n} I_d \right) \right\| _2&= \left\| \left( \sum _{k \in \Omega } u_k u_k^T \right) - \frac{|\Omega |}{n} I_d \right\| _2 \\&= \max _{i=1,2,\cdots ,d} \left| \lambda _i \left( \sum _{k \in \Omega } u_k u_k^T \right) - \frac{|\Omega |}{n} \right| . \end{aligned}$$

From (6.3), we have with probability \(1-\delta \) that

$$\begin{aligned} \lambda _i \left( \sum _{k \in \Omega } u_k u_k^T \right) \in \left[ (1-\gamma ) \frac{|\Omega |}{n}, (1+\gamma ) \frac{|\Omega |}{n} \right] \quad \hbox {for all } \, i=1,2,\cdots ,d, \end{aligned}$$

completing the proof. \(\square \)

Appendix 2: Proof of Lemma 3.1

We drop the subscript “\(t\)” throughout the proof and use \(A_+\) in place of \(A_{t+1}\). From (3.1), and using the definitions (3.2), we have

$$\begin{aligned} A_+^T&= \bar{U}^TU_+ \\&= \bar{U}^TU \!+\! \left\{ (\cos (\sigma \eta ) - 1) \frac{\bar{U}^T UU^T \bar{U} s}{\Vert w\Vert } + \sin (\sigma \eta ) \frac{(I-\bar{U}^T UU^T \bar{U})s}{\Vert r\Vert } \right\} \frac{s^T\bar{U}^TU}{\Vert w\Vert } \\&= \left\{ I + (\cos (\sigma \eta ) - 1) \frac{A^TAss^T}{\Vert w\Vert ^2} + \sin (\sigma \eta ) \frac{(I-A^TA)ss^T}{\Vert r\Vert \Vert w\Vert } \right\} A^T = HA^T, \end{aligned}$$

where \(H {:=} I + (\cos (\sigma \eta ) - 1) \frac{A^TAss^T}{\Vert w\Vert ^2} + \sin (\sigma \eta ) \frac{(I-A^TA)ss^T}{\Vert r\Vert \Vert w\Vert }\) denotes the matrix in braces in the last line. Thus,

$$\begin{aligned} \Vert A_+\Vert _F^2 = \mathrm{trace}(A_+A_+^T) = \mathrm{trace}(AH^THA^T). \end{aligned}$$

Focusing initially on \(H^TH\), we obtain

$$\begin{aligned} H^TH&= I + (\cos (\sigma \eta )-1)^2 \frac{ss^T A^TAA^TA ss^T}{\Vert w\Vert ^4} \\&\quad + (\cos (\sigma \eta )-1) \frac{ss^TA^TA + A^TAss^T}{\Vert w\Vert ^2} \\&\quad + \sin (\sigma \eta ) \frac{2ss^T - ss^TA^TA - A^TAss^T}{\Vert r\Vert \Vert w\Vert } \\&\quad + 2 \sin (\sigma \eta ) (\cos (\sigma \eta )-1) \frac{ss^T A^TA ss^T - ss^T A^TAA^TAss^T}{\Vert r\Vert \Vert w\Vert ^3} \\&\quad + \sin ^2 (\sigma \eta ) \frac{s(s^Ts - 2s^TA^TAs+s^TA^TAA^TAs)s^T}{\Vert r\Vert ^2 \Vert w\Vert ^2}. \end{aligned}$$

It follows immediately that

$$\begin{aligned} A_+ A_+^T&= AA^T + (\cos (\sigma \eta )-1)^2 \frac{Ass^T A^TAA^TA ss^TA^T}{\Vert w\Vert ^4} \\&\quad + (\cos (\sigma \eta )-1) \frac{Ass^TA^TAA^T + AA^TAss^TA^T}{\Vert w\Vert ^2} \\&\quad + \sin (\sigma \eta ) \frac{2Ass^TA^T - Ass^TA^TAA^T - AA^TAss^TA^T}{\Vert r\Vert \Vert w\Vert } \\&\quad + 2 \sin (\sigma \eta ) (\cos (\sigma \eta )-1) \frac{Ass^T A^TA ss^TA^T - Ass^T A^TAA^TAss^TA^T}{\Vert r\Vert \Vert w\Vert ^3} \\&\quad + \sin ^2 (\sigma \eta ) \frac{As(s^Ts - 2s^TA^TAs+s^TA^TAA^TAs)s^TA^T}{\Vert r\Vert ^2 \Vert w\Vert ^2}. \end{aligned}$$

We now use repeatedly the fact that \(\mathrm{trace}\, ab^T= a^Tb\) to deduce that

$$\begin{aligned} \mathrm{trace}(A_+A_+^T)&= \mathrm{trace}(AA^T) + (\cos (\sigma \eta )-1)^2 \frac{(s^TA^TAs)s^T A^TAA^TA s}{\Vert w\Vert ^4} \\&\quad + (\cos (\sigma \eta )-1) \frac{2s^TA^TAA^TAs}{\Vert w\Vert ^2} \\&\quad + \sin (\sigma \eta ) \frac{2s^TA^TAs - 2 s^TA^TAA^TAs}{\Vert r\Vert \Vert w\Vert } \\&\quad + 2 \sin (\sigma \eta ) (\cos (\sigma \eta )-1) \frac{(s^T A^TA s)^2 - (s^T A^TAA^TAs)(s^TA^TAs)}{\Vert r\Vert \Vert w\Vert ^3} \\&\quad + \sin ^2 (\sigma \eta ) \frac{\Vert s\Vert ^2 s^TA^TAs \!-\! 2(s^TA^TAs)^2 \!+\! (s^TA^TAA^TAs)(s^TA^TAs)}{\Vert r\Vert ^2 \Vert w\Vert ^2}. \end{aligned}$$

Now using \(w=As\) (and hence \(s^TA^TAs=\Vert w\Vert ^2\)), we have

$$\begin{aligned} \mathrm{trace}(A_+A_+^T)&= \mathrm{trace}(AA^T) + (\cos (\sigma \eta )-1)^2 \frac{s^T A^TAA^TA s}{\Vert w\Vert ^2} \\&\quad + (\cos (\sigma \eta )-1) \frac{2s^TA^TAA^TAs}{\Vert w\Vert ^2} \\&\quad + 2 \sin (\sigma \eta ) \frac{\Vert w\Vert ^2 - s^TA^TAA^TAs}{\Vert r\Vert \Vert w\Vert } \\&\quad + 2 \sin (\sigma \eta ) (\cos (\sigma \eta )-1) \frac{\Vert w\Vert ^2 - s^T A^TAA^TAs}{\Vert r\Vert \Vert w\Vert } \\&\quad + \sin ^2 (\sigma \eta ) \frac{\Vert s\Vert ^2- 2\Vert w\Vert ^2 + (s^TA^TAA^TAs)}{\Vert r\Vert ^2}. \end{aligned}$$

For the second and third terms on the right-hand side, we use the identity

$$\begin{aligned} (\cos (\sigma \eta )-1)^2 + 2(\cos (\sigma \eta )-1) = \cos ^2 (\sigma \eta )-1 = {-}\sin ^2 (\sigma \eta ), \end{aligned}$$

allowing us to combine these terms with the final \(\sin ^2(\sigma \eta )\) term. Using also the identity \(\Vert r\Vert ^2 = \Vert s\Vert ^2-\Vert w\Vert ^2\), we obtain for the combination of these three terms that

$$\begin{aligned}&\sin ^2 (\sigma \eta ) \left[ 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} + s^TA^TAA^TAs \left( \frac{1}{\Vert r\Vert ^2} - \frac{1}{\Vert w\Vert ^2} \right) \right] \\&\qquad = \sin ^2 (\sigma \eta ) \left( 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2}\right) \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) . \end{aligned}$$

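The factorization in the last display can be checked directly: writing \(b {:=} s^TA^TAA^TAs / \Vert w\Vert ^2\) (an abbreviation used only here), the bracketed quantity equals

$$\begin{aligned} 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} + s^TA^TAA^TAs \left( \frac{1}{\Vert r\Vert ^2} - \frac{1}{\Vert w\Vert ^2} \right) = 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} + b \left( \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} - 1 \right) = \left( 1 - \frac{\Vert w\Vert ^2}{\Vert r\Vert ^2}\right) (1 - b). \end{aligned}$$
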
We can also combine the fourth and fifth terms on the right-hand side above into the single quantity

$$\begin{aligned} 2 \sin (\sigma \eta ) \cos (\sigma \eta ) \frac{\Vert w\Vert }{\Vert r\Vert } \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) . \end{aligned}$$

By substituting these two combined terms into the expression above, we obtain

$$\begin{aligned}&\mathrm{trace}(A_+A_+^T) = \mathrm{trace}(AA^T) \\&\quad + \sin (\sigma \eta ) \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) \left[ \left( 1-\frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} \right) \sin (\sigma \eta ) + 2 \cos (\sigma \eta ) \frac{\Vert w\Vert }{\Vert r\Vert } \right] . \end{aligned}$$

We now use the relations (3.3) to deduce that

$$\begin{aligned} \frac{\Vert w\Vert }{\Vert r\Vert } = \frac{\cos \theta }{\sin \theta }, \quad 1-\frac{\Vert w\Vert ^2}{\Vert r\Vert ^2} = -\frac{\cos (2 \theta )}{\sin ^2 \theta }, \end{aligned}$$

and thus, the increment \(\mathrm{trace}(A_+A_+^T) - \mathrm{trace}(AA^T)\) becomes

$$\begin{aligned}&\sin (\sigma \eta ) \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) \left[ -\frac{\cos (2 \theta )}{\sin ^2 \theta } \sin (\sigma \eta ) + 2 \cos (\sigma \eta ) \frac{\cos \theta }{\sin \theta } \right] \\&\quad = \frac{\sin (\sigma \eta ) \sin (2 \theta - \sigma \eta )}{\sin ^2 \theta } \left( 1- \frac{s^TA^TAA^TAs}{\Vert w\Vert ^2} \right) . \end{aligned}$$

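The last equality uses \(2\sin \theta \cos \theta = \sin (2\theta )\) together with the angle-difference formula:

$$\begin{aligned} 2 \cos (\sigma \eta ) \frac{\cos \theta }{\sin \theta } - \frac{\cos (2\theta )}{\sin ^2\theta } \sin (\sigma \eta ) = \frac{\sin (2\theta ) \cos (\sigma \eta ) - \cos (2\theta ) \sin (\sigma \eta )}{\sin ^2\theta } = \frac{\sin (2\theta - \sigma \eta )}{\sin ^2\theta }. \end{aligned}$$
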
The result (3.6) follows by substituting \(w=As\) and (3.4).

Nonnegativity of the right-hand side follows from \(\theta _t \ge 0\), \(2\theta _t - \sigma _t \eta _t \ge 0\), and \(\Vert A_t^T w_t \Vert \le \Vert \bar{U}^T U_t \Vert \Vert w_t \Vert \le \Vert w_t \Vert \).

To prove that the right-hand side of (3.6) is zero when \(v_t \in \mathcal{S}_t\) or \(v_t \perp \mathcal{S}_t\), we take the former case first. Here, there exists \(\hat{s}_t \in \mathbb {R}^d\) such that

$$\begin{aligned} v_t = \bar{U} s_t = U_t \hat{s}_t. \end{aligned}$$

Thus,

$$\begin{aligned} w_t = A_t s_t = U_t^T \bar{U} s_t = U_t^T U_t \hat{s}_t = \hat{s}_t, \end{aligned}$$

so that \(\Vert v_t \Vert = \Vert U_t \hat{s}_t\Vert = \Vert \hat{s}_t \Vert = \Vert w_t \Vert \) (by orthonormality of the columns of \(U_t\)), and thus \(\theta _t=0\), from (3.3). This implies that the right-hand side of (3.6) is zero. When \(v_t \perp \mathcal{S}_t\), we have \(w_t = U_t^Tv_t=0\) and so \(\theta _t = \pi /2\) and \(\sigma _t=0\), implying again that the right-hand side of (3.6) is zero.

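As a concluding numerical spot-check, and not as part of the proof, the trace-increment formula just derived can be verified in a few lines of Python. The update below is written in the rank-one form implied by the expansion of \(\bar{U}^T U_+\) at the start of this appendix, with \(w = U^T v\) and \(r = v - Uw\) (consistent with the identities \(w = As\) and \(\Vert r\Vert ^2 = \Vert s\Vert ^2 - \Vert w\Vert ^2\) used above); the dimensions, random seed, and the angle \(\sigma \eta \) are arbitrary choices.

```python
import numpy as np

# Spot-check of the trace-increment identity derived in this appendix.
rng = np.random.default_rng(1)
n, d = 50, 5
Ubar, _ = np.linalg.qr(rng.standard_normal((n, d)))   # basis for the underlying subspace
U, _ = np.linalg.qr(rng.standard_normal((n, d)))      # current iterate, orthonormal columns
s = rng.standard_normal(d)
v = Ubar @ s                                          # observed vector from the subspace
w = U.T @ v                                           # coefficients of the projection onto range(U)
r = v - U @ w                                         # residual
a = 0.3                                               # the angle sigma * eta
# Rank-one update in the form implied by the expansion of Ubar^T U_+ above.
step = (np.cos(a) - 1.0) * (U @ w) / np.linalg.norm(w) + np.sin(a) * r / np.linalg.norm(r)
U_plus = U + np.outer(step, w / np.linalg.norm(w))
A, A_plus = U.T @ Ubar, U_plus.T @ Ubar
lhs = np.trace(A_plus @ A_plus.T) - np.trace(A @ A.T)
theta = np.arctan2(np.linalg.norm(r), np.linalg.norm(w))   # cos(theta)/sin(theta) = ||w||/||r||
rhs = (np.sin(a) * np.sin(2 * theta - a) / np.sin(theta) ** 2) \
      * (1 - np.linalg.norm(A.T @ w) ** 2 / np.linalg.norm(w) ** 2)
print(f"trace increment = {lhs:.12f}, formula = {rhs:.12f}")
```
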
Cite this article

Balzano, L., Wright, S.J. Local Convergence of an Algorithm for Subspace Identification from Partial Data. Found Comput Math 15, 1279–1314 (2015). https://doi.org/10.1007/s10208-014-9227-7

Keywords

  • Subspace identification
  • Optimization
  • Incomplete data

Mathematics Subject Classification

  • 90C52
  • 65Y20
  • 68W20