Optimal Oracle Inequality for Aggregation of Classifiers Under Low Noise Condition

  • Guillaume Lecué
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4005)


We consider the problems of minimax optimality and of adaptivity to the margin and to regularity in binary classification. Under the margin assumption (low noise condition), we prove an oracle inequality satisfied by an aggregation procedure that uses exponential weights. This oracle inequality has an optimal residual term, (log M/n)^(κ/(2κ−1)), where κ is the margin parameter, M the number of classifiers to aggregate, and n the number of observations. We use this inequality first to construct minimax classifiers under margin and regularity assumptions, and second to aggregate them into a classifier that is adaptive both to the margin and to regularity. Moreover, by aggregating only log n plug-in classifiers, we provide an easily implementable classifier adaptive both to the margin and to regularity.
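The procedure referred to above assigns each candidate classifier a weight that decays exponentially in its empirical risk. As a rough illustration only (not the paper's exact estimator; the `temperature` parameter is a hypothetical stand-in for the paper's tuning), a generic exponential-weights combination of M classifiers can be sketched as:

```python
import numpy as np

def exponential_weights(emp_risks, n, temperature=1.0):
    """Exponential weights over M classifiers from their empirical risks.

    Computes w_j proportional to exp(-n * R_n(f_j) / temperature).
    This is a generic sketch; the paper's procedure uses a specific
    calibration of the weights, not this default temperature.
    """
    emp_risks = np.asarray(emp_risks, dtype=float)
    logits = -n * emp_risks / temperature
    logits -= logits.max()  # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def aggregate(predictions, weights):
    """Weighted convex combination of 0/1 predictions, thresholded at 1/2.

    predictions: array of shape (M, n_points), one row per classifier.
    """
    avg = np.average(predictions, axis=0, weights=weights)
    return (avg >= 0.5).astype(int)
```

For example, with empirical risks 0.1 and 0.5 on n = 100 observations, nearly all weight concentrates on the first classifier, and the aggregate essentially reproduces its predictions.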


Keywords: Support Vector Machine; Optimal Rate; Prediction Rule; Empirical Risk; Aggregation Procedure





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Guillaume Lecué
  1. Laboratoire de Probabilités et Modèles Aléatoires (UMR CNRS 7599), Université Paris VI, Paris, France
