A Randomized Online Learning Algorithm for Better Variance Control

  • Jean-Yves Audibert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4005)


We propose a sequential randomized algorithm which, at each step, concentrates on functions having both low risk and low variance with respect to the previous step's prediction function. It satisfies a simple risk bound, which is sharp to the extent that the standard statistical learning approach, based on the supremum of empirical processes, does not lead to algorithms with such a tight guarantee on their efficiency. Our generalization error bounds complement the pioneering work of Cesa-Bianchi et al. [12], in which standard-style statistical results were recovered with tight constants using worst-case analysis.

A nice feature of our analysis of the randomized estimator is that it puts forward the links between the probabilistic and worst-case viewpoints. It also allows us to recover recent model selection results due to Juditsky et al. [16] and to improve on them in least squares regression with heavy noise, i.e., when no exponential moment condition is assumed on the output.
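To illustrate the idea described in the abstract, here is a minimal, schematic sketch of a sequential randomized aggregation scheme over a finite set of candidate prediction functions. It is not the paper's actual algorithm: the weighting, the temperature parameters `lam` and `gamma`, and the use of squared disagreement with the previously drawn function as a variance proxy are all illustrative assumptions. At each round, a candidate is drawn with probability favouring both low cumulative empirical risk and low "variance" relative to the previous step's draw.

```python
import numpy as np

def randomized_sequential_aggregation(preds, y, lam=1.0, gamma=0.5, seed=None):
    """Schematic sketch (not the paper's algorithm) of sequential
    randomized aggregation with a variance-style penalty.

    preds : (d, n) array, predictions of d candidate functions on n points
    y     : (n,) array, observed outputs (squared loss is used here)
    """
    rng = np.random.default_rng(seed)
    d, n = preds.shape
    cum_loss = np.zeros(d)
    prev = rng.integers(d)          # index of the previously drawn function
    draws = []
    for t in range(n):
        # Variance proxy: mean squared disagreement with the function
        # drawn at the previous step (an illustrative choice).
        var_pen = np.mean((preds - preds[prev]) ** 2, axis=1)
        logits = -lam * cum_loss - gamma * var_pen
        w = np.exp(logits - logits.max())   # Gibbs-style weights
        w /= w.sum()
        prev = int(rng.choice(d, p=w))      # randomized prediction for round t
        draws.append(prev)
        cum_loss += (preds[:, t] - y[t]) ** 2   # reveal y_t, update risks
    return draws, cum_loss
```

The exponential weighting concentrates the draw on functions that are both empirically accurate so far and close (in the variance-proxy sense) to the current prediction function, which is the qualitative behaviour the abstract describes.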


Keywords (added by machine, not by the authors): Loss Function · Prediction Function · Empirical Process · Generalization Error · Statistical Learning Theory




References

  1. Alquier, P.: Iterative feature selection in least square regression estimation. Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7 (2005)
  2. Audibert, J.-Y.: Aggregated estimators and empirical complexity for least square regression. Ann. Inst. Henri Poincaré, Probab. Stat. 40(6), 685–736 (2004)
  3. Audibert, J.-Y.: A better variance control for PAC-Bayesian classification. Preprint n.905, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7 (2004)
  4. Audibert, J.-Y.: PAC-Bayesian statistical learning theory. PhD thesis, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7 (2004)
  5. Barron, A.: Are Bayes rules consistent in information? In: Cover, T.M., Gopinath, B. (eds.) Open Problems in Communication and Computation, pp. 85–91. Springer, Heidelberg (1987)
  6. Barron, A., Yang, Y.: Information-theoretic determination of minimax rates of convergence. Ann. Stat. 27(5), 1564–1599 (1999)
  7. Bunea, F., Nobel, A.: Sequential procedures for aggregating arbitrary estimators of a conditional mean. Technical report (2005)
  8. Catoni, O.: Statistical Learning Theory and Stochastic Optimization: Ecole d'été de Probabilités de Saint-Flour XXXI. Lecture Notes in Mathematics. Springer, Heidelberg (2001)
  9. Catoni, O.: A mixture approach to universal model selection. Preprint LMENS 97-30 (1997)
  10. Catoni, O.: Universal aggregation rules with exact bias bound. Preprint n.510 (1999)
  11. Catoni, O.: A PAC-Bayesian approach to adaptive classification. Preprint n.840, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7 (2003)
  12. Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D.P., Schapire, R.E., Warmuth, M.K.: How to use expert advice. J. ACM 44(3), 427–485 (1997)
  13. Cesa-Bianchi, N., Lugosi, G.: On prediction of individual sequences. Ann. Stat. 27(6), 1865–1895 (1999)
  14. Dudley, R.M.: Central limit theorems for empirical measures. Ann. Probab. 6, 899–929 (1978)
  15. Haussler, D., Kivinen, J., Warmuth, M.K.: Sequential prediction of individual sequences under general loss functions. IEEE Trans. on Information Theory 44(5), 1906–1925 (1998)
  16. Juditsky, A., Rigollet, P., Tsybakov, A.B.: Learning by mirror averaging (2005), available on arXiv
  17. Kivinen, J., Warmuth, M.K.: Averaging expert predictions. In: Fischer, P., Simon, H.U. (eds.) EuroCOLT 1999. LNCS, vol. 1572, pp. 153–167. Springer, Heidelberg (1999)
  18. Merhav, N., Feder, M.: Universal prediction. IEEE Transactions on Information Theory 44 (1998)
  19. Vapnik, V.: The Nature of Statistical Learning Theory, 2nd edn. Springer, Heidelberg (1995)
  20. Vovk, V.G.: Aggregating strategies. In: COLT 1990: Proceedings of the Third Annual Workshop on Computational Learning Theory, pp. 371–386. Morgan Kaufmann Publishers Inc., San Francisco (1990)
  21. Vovk, V.G.: A game of prediction with expert advice. Journal of Computer and System Sciences, 153–173 (1998)
  22. Yang, Y.: Combining different procedures for adaptive regression. Journal of Multivariate Analysis 74, 135–161 (2000)
  23. Yaroshinsky, R., El-Yaniv, R., Seiden, S.S.: How to better use expert advice. Mach. Learn. 55(3), 271–309 (2004)
  24. Zhang, T.: Information theoretical upper and lower bounds for statistical estimation. IEEE Transactions on Information Theory (to appear, 2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jean-Yves Audibert
  1. CERTIS, Ecole des Ponts, Marne-la-Vallée, France
