ALT 2013: Algorithmic Learning Theory, pp. 98–112

Online PCA with Optimal Regrets

  • Jiazhong Nie
  • Wojciech Kotłowski
  • Manfred K. Warmuth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8139)

Abstract

We carefully investigate the online version of PCA, where in each trial a learning algorithm plays a k-dimensional subspace and suffers the compression loss of projecting the next instance onto the chosen subspace. In this setting, we give regret bounds for two popular online algorithms, Gradient Descent (GD) and Matrix Exponentiated Gradient (MEG). We show that both algorithms are essentially optimal in the worst case when the regret is expressed as a function of the number of trials. This comes as a surprise, since MEG is commonly believed to perform sub-optimally when the instances are sparse. The different behavior of MEG for PCA stems mainly from the non-negativity of the loss in this case, which makes the PCA setting qualitatively different from other settings studied in the literature. Furthermore, we show that when the regret is expressed as a function of a loss budget, MEG remains optimal and strictly outperforms GD.
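As a concrete illustration of the protocol (a minimal sketch, not the exact algorithms analyzed in the paper): the learner can be parametrized by a matrix W in the convex hull of rank-k projection matrices, {W : 0 <= W <= I in the positive semi-definite order, tr W = k}, on which the compression loss x^T (I - W) x is linear. The sketch below implements projected Gradient Descent with a deterministic top-k rounding standing in for the randomized rounding that regret analyses of this kind rely on; the learning rate eta and all function names are illustrative assumptions.

    import numpy as np

    def project_capped_simplex(lam, k, tol=1e-10):
        # Euclidean projection of the eigenvalue vector lam onto
        # {y : 0 <= y_i <= 1, sum_i y_i = k}: shift by theta, then clip.
        # theta is found by bisection, since the clipped sum is monotone in theta.
        lo, hi = -lam.max() - 1.0, 1.0 - lam.min()
        while hi - lo > tol:
            theta = 0.5 * (lo + hi)
            if np.clip(lam + theta, 0.0, 1.0).sum() < k:
                lo = theta
            else:
                hi = theta
        return np.clip(lam + 0.5 * (lo + hi), 0.0, 1.0)

    def gd_online_pca(instances, k, eta=0.05):
        # W lives in the convex hull of rank-k projections,
        # {W : 0 <= W <= I, tr W = k}; the compression loss
        # x^T (I - W) x is linear in W with gradient -x x^T.
        d = len(instances[0])
        W = (k / d) * np.eye(d)                # uniform starting point
        losses = []
        for x in instances:
            _, vecs = np.linalg.eigh(W)        # eigenvalues in ascending order
            U = vecs[:, -k:]                   # top-k eigenvectors: the played subspace
            r = x - U @ (U.T @ x)              # residual of x after projection
            losses.append(float(r @ r))        # compression loss on this trial
            W = W + eta * np.outer(x, x)       # gradient step (descent on the loss)
            vals, vecs = np.linalg.eigh(W)     # project back onto the hull
            W = (vecs * project_capped_simplex(vals, k)) @ vecs.T
        return losses

MEG replaces the additive step and Euclidean projection with a multiplicative update on the matrix-logarithm scale, followed by a relative-entropy projection onto the same constraint set.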

Next, we study a generalization of the online PCA problem in which Nature is allowed to play dense instances: positive matrices with bounded largest eigenvalue. Again we show that MEG is optimal and strictly better than GD in this setting.
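Under the parametrization used in the sketch above, the only change in the dense case is the loss (and hence the gradient): the compression loss x_t^T (I - W) x_t becomes tr((I - W) X_t) for an instance X_t with bounded largest eigenvalue, recovering the sparse case when X_t = x_t x_t^T. A minimal illustration, reading "positive" as positive semi-definite and normalizing the largest eigenvalue to 1 (both assumptions, as is the helper name):

    import numpy as np

    def dense_compression_loss(W, X):
        # Compression loss on a dense instance: tr((I - W) X) for a
        # positive semi-definite X with largest eigenvalue at most 1.
        # With X = x x^T this reduces to the sparse loss x^T (I - W) x;
        # the gradient of the loss with respect to W is -X.
        return float(np.trace(X) - np.trace(W @ X))

    # Hypothetical dense instance: a random PSD matrix rescaled so that
    # its largest eigenvalue equals 1 (d = 5 and k = 2 are arbitrary).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    X = A @ A.T / np.linalg.eigvalsh(A @ A.T).max()
    W = (2 / 5) * np.eye(5)                  # uniform point in the k = 2 hull
    print(dense_compression_loss(W, X))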

Keywords

Online learning · regret bounds · expert setting · k-sets · PCA · Gradient Descent and Matrix Exponentiated Gradient algorithms

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jiazhong Nie (1)
  • Wojciech Kotłowski (2)
  • Manfred K. Warmuth (1)
  1. Department of Computer Science, University of California, Santa Cruz, USA
  2. Institute of Computing Science, Poznań University of Technology, Poland
