
Nearly optimal stochastic approximation for online principal subspace estimation


Abstract

Principal component analysis (PCA) has been widely used in analyzing high-dimensional data. It converts a set of observations of possibly correlated variables into a set of linearly uncorrelated variables via an orthogonal transformation. To handle streaming data and reduce the complexity of PCA, (subspace) online PCA iterations were proposed to update the orthogonal transformation iteratively, taking one observed data point at a time. Existing works on the convergence of (subspace) online PCA iterations mostly focus on the case where the samples are almost surely uniformly bounded. In this paper, we analyze the convergence of a subspace online PCA iteration under a more practical assumption and obtain a nearly optimal finite-sample error bound. Our convergence rate almost matches the minimax information lower bound. We prove that the convergence is nearly global in the sense that the subspace online PCA iteration converges with high probability for random initial guesses. This work also leads to a simpler proof of a recent result on online PCA for the first principal component only.
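To make the iteration concrete, the following is a minimal sketch (in Python/NumPy) of a subspace online PCA iteration of the Oja type described above: each incoming sample triggers a rank-one update of an orthonormal basis, followed by re-orthonormalization. The function name subspace_online_pca, the constant learning rate eta, and the QR-based re-orthonormalization are illustrative assumptions for exposition, not the exact scheme or step-size schedule analyzed in the paper.

```python
import numpy as np

def subspace_online_pca(stream, d, k, eta=1e-3, rng=None):
    """Sketch of a subspace online PCA (Oja-type) iteration.

    stream : iterable yielding one d-dimensional sample at a time
    d, k   : ambient dimension and target subspace dimension
    eta    : learning rate (constant here for simplicity; assumed, not from the paper)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random initial guess, orthonormalized; the paper shows convergence
    # with high probability from random initial guesses.
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for x in stream:
        x = np.asarray(x, dtype=float).reshape(-1, 1)
        # Rank-one stochastic update: U <- U + eta * x (x^T U)
        U = U + eta * (x @ (x.T @ U))
        # Re-orthonormalize so the columns of U remain an orthonormal basis
        U, _ = np.linalg.qr(U)
    return U  # columns approximate a basis of the top-k principal subspace

if __name__ == "__main__":
    # Toy usage: samples whose covariance has a clear spectral gap, so the
    # top-k principal subspace is span(e_1, ..., e_k).
    rng = np.random.default_rng(0)
    d, k, n = 50, 3, 20000
    scales = np.concatenate([np.full(k, 3.0), np.full(d - k, 0.5)])
    data = rng.standard_normal((n, d)) * scales
    U = subspace_online_pca(data, d, k, eta=1e-3, rng=rng)
    # Singular values of E^T U (E = [e_1, ..., e_k]) are the cosines of the
    # principal angles between the estimate and the true subspace.
    cosines = np.linalg.svd(U[:k, :], compute_uv=False)
    print("sin of largest principal angle ~",
          np.sqrt(max(0.0, 1.0 - cosines.min() ** 2)))
```

The error of such an iteration is naturally measured by the principal angles between the estimated and true subspaces, as in the toy check above.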




Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 11901340), the National Science Foundation of the USA (Grant Nos. DMS-1719620 and DMS-2009689), the Ministry of Science and Technology of Taiwan, the Taiwanese Center for Theoretical Sciences, and the ST Yau Centre at Taiwan Chiao Tung University. The authors are indebted to the referees for their constructive comments and suggestions, which improved the presentation.

Author information

Corresponding author

Correspondence to Ren-Cang Li.


About this article


Cite this article

Liang, X., Guo, ZC., Wang, L. et al. Nearly optimal stochastic approximation for online principal subspace estimation. Sci. China Math. 66, 1087–1122 (2023). https://doi.org/10.1007/s11425-021-1972-5

