Adaptive and Optimal Online Linear Regression on ℓ1-Balls

  • Sébastien Gerchinovitz
  • Jia Yuan Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)


We consider the problem of online linear regression on individual sequences. The goal in this paper is for the forecaster to output sequential predictions which are, after T time rounds, almost as good as the ones output by the best linear predictor in a given ℓ1-ball in \(\mathbb{R}^d\). We consider both the cases where the dimension d is small and large relative to the time horizon T. We first present regret bounds with optimal dependencies on the sizes U, X and Y of the ℓ1-ball, the input data and the observations. The minimax regret is shown to exhibit a regime transition around the point \(d = \sqrt{T} U X / (2 Y)\). Furthermore, we present efficient algorithms that are adaptive, i.e., that do not require the knowledge of U, X, Y, and T, but still achieve nearly optimal regret bounds.
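As a small illustration of the transition point quoted in the abstract, the sketch below evaluates \(d = \sqrt{T} U X / (2 Y)\) for given problem sizes and reports on which side of it a dimension d falls. The function names `transition_dimension` and `regime` are hypothetical and not taken from the paper, and the abstract only says the regime change happens "around" this point, so the comparison should be read as a rough guide rather than a sharp boundary.

```python
import math


def transition_dimension(T, U, X, Y):
    """Return sqrt(T) * U * X / (2 * Y), the point quoted in the abstract.

    T: time horizon; U: radius of the l1-ball; X: size of the input data;
    Y: size of the observations. All are assumed to be positive.
    """
    if T <= 0 or U <= 0 or X <= 0 or Y <= 0:
        raise ValueError("T, U, X and Y must be positive")
    return math.sqrt(T) * U * X / (2.0 * Y)


def regime(d, T, U, X, Y):
    """Hypothetical helper: say which side of the transition point d lies on."""
    if d < transition_dimension(T, U, X, Y):
        return "below"
    return "at or above"


# Example: T = 10,000 rounds with U = X = Y = 1 gives sqrt(10000) / 2 = 50.
threshold = transition_dimension(T=10_000, U=1.0, X=1.0, Y=1.0)
print(threshold)                               # 50.0
print(regime(10, 10_000, 1.0, 1.0, 1.0))       # below
print(regime(200, 10_000, 1.0, 1.0, 1.0))      # at or above
```

With these illustrative values, a problem in dimension 10 sits below the transition point and one in dimension 200 sits above it; the abstract states that the minimax regret behaves differently in the two regimes.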


Keywords: Minimax Regret, Scaling Algorithm, Stochastic Setting, Base Forecast, Loss Bound
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Sébastien Gerchinovitz (1)
  • Jia Yuan Yu (1, 2)
  1. École Normale Supérieure, Paris, France
  2. HEC Paris, CNRS, Jouy-en-Josas, France
