ALT 2013: Algorithmic Learning Theory, pp. 98–112
Online PCA with Optimal Regrets
Abstract
We carefully investigate the online version of PCA, where in each trial the learning algorithm plays a k-dimensional subspace and suffers the compression loss of the next instance when projected onto the chosen subspace. In this setting, we give regret bounds for two popular online algorithms, Gradient Descent (GD) and Matrix Exponentiated Gradient (MEG). We show that both algorithms are essentially optimal in the worst case when the regret is expressed as a function of the number of trials. This comes as a surprise, since MEG is commonly believed to perform suboptimally when the instances are sparse. The different behavior of MEG for PCA is mainly due to the non-negativity of the loss in this case, which makes the PCA setting qualitatively different from other settings studied in the literature. Furthermore, we show that when regret bounds are expressed as a function of a loss budget, MEG remains optimal and strictly outperforms GD.
Next, we study a generalization of the online PCA problem in which Nature is allowed to play dense instances: positive matrices with bounded largest eigenvalue. Again, we show that MEG is optimal and strictly better than GD in this setting.
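For intuition, here is a minimal NumPy sketch (not the paper's implementation) of the two updates being compared. It maintains a parameter matrix W in the convex hull of rank-k projection matrices (symmetric, eigenvalues in [0, 1], trace k), uses the fact that the gradient of the compression loss tr((I - W) x x^T) with respect to W is -x x^T, and replaces the exact projections of GD and MEG with a crude clip-and-rescale step; the randomized rounding that turns W into an actual k-dimensional subspace is also omitted.

```python
import numpy as np

def project_capped(W, k):
    """Approximate projection onto {W symmetric, 0 <= eig(W) <= 1, tr(W) = k}.
    A crude clip-and-rescale stand-in for the exact Bregman projection."""
    vals, vecs = np.linalg.eigh(W)
    vals = np.clip(vals, 0.0, 1.0)
    if vals.sum() > 0:
        vals = np.clip(vals * (k / vals.sum()), 0.0, 1.0)
    return (vecs * vals) @ vecs.T

def gd_step(W, x, eta, k):
    """Gradient Descent: additive step against the loss gradient -x x^T."""
    return project_capped(W + eta * np.outer(x, x), k)

def meg_step(W, x, eta, k):
    """Matrix Exponentiated Gradient: multiplicative step in the
    matrix-log domain, W <- exp(log W + eta * x x^T), then project."""
    vals, vecs = np.linalg.eigh(W)
    log_w = (vecs * np.log(np.clip(vals, 1e-12, None))) @ vecs.T
    m_vals, m_vecs = np.linalg.eigh(log_w + eta * np.outer(x, x))
    return project_capped((m_vecs * np.exp(m_vals)) @ m_vecs.T, k)

# Toy run: k = 2 out of n = 5 dimensions, unit-norm instances.
rng = np.random.default_rng(0)
n, k, eta = 5, 2, 0.1
W_gd = W_meg = (k / n) * np.eye(n)      # uniform start
for _ in range(100):
    x = rng.normal(size=n)
    x /= np.linalg.norm(x)
    W_gd = gd_step(W_gd, x, eta, k)
    W_meg = meg_step(W_meg, x, eta, k)
print(np.trace(W_gd), np.trace(W_meg))  # both stay close to k
```

In the full algorithms the projections are exact Bregman projections onto the capped spectral simplex: squared Frobenius distance for GD and quantum relative entropy for MEG.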
Keywords
Online learning, regret bounds, expert setting, k-sets, PCA, Gradient Descent, Matrix Exponentiated Gradient