Abstract
Let \(\cal F\) be a set of \(M\) classification procedures with values in \([-1,1]\). Given a loss function, we want to construct a procedure that mimics, at the best possible rate, the best procedure in \(\cal F\). This fastest rate is called the optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that the optimal rate of aggregation is either \(((\log M)/n)^{1/2}\) or \((\log M)/n\). We also prove that, if all \(M\) classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low-noise condition) when the loss function is somewhat more than convex, whereas in that case aggregation procedures with exponential weights achieve the optimal rate of aggregation.
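To make the exponential-weights aggregation mentioned in the abstract concrete, here is a minimal sketch, not the paper's exact estimator: it assumes empirical risks \(R_n(f_j)\) computed from \(n\) samples under a generic margin loss, and a temperature parameter fixed to 1 for illustration. The names exponential_weights_aggregate, hinge, and temperature are hypothetical.

import numpy as np

def exponential_weights_aggregate(F, X, y, loss, temperature=1.0):
    """Aggregate M classifiers f_1..f_M (values in [-1, 1]) by the convex
    combination with weights proportional to exp(-n * R_n(f_j) / temperature),
    where R_n(f_j) is the empirical risk of f_j."""
    n = len(y)
    # Empirical risk R_n(f_j) = (1/n) * sum_i loss(y_i * f_j(x_i))
    risks = np.array([np.mean(loss(y * np.array([f(x) for x in X]))) for f in F])
    # Shift by the minimum risk for numerical stability (weights are unchanged)
    w = np.exp(-n * (risks - risks.min()) / temperature)
    w /= w.sum()
    # The aggregate stays in [-1, 1] because it is a convex combination
    return lambda x: float(sum(wj * f(x) for wj, f in zip(w, F)))

# Toy usage: three candidate classifiers, hinge loss, labels y = sign(x)
rng = np.random.default_rng(0)
X = rng.normal(size=50)
y = np.sign(X)
F = [lambda x: -1.0, lambda x: 0.0, lambda x: float(np.tanh(3.0 * x))]
hinge = lambda m: np.maximum(0.0, 1.0 - m)
f_agg = exponential_weights_aggregate(F, X, y, hinge, temperature=1.0)
print(f_agg(0.7))  # close to tanh(2.1): the weights concentrate on the best classifier

Unlike empirical risk minimization, which puts all its mass on a single empirical minimizer, this aggregate spreads weight across the dictionary according to empirical risk; this averaging is what allows the faster \((\log M)/n\) rate for sufficiently convex losses, as the paper shows.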
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Lecué, G. (2007). Suboptimality of Penalized Empirical Risk Minimization in Classification. In: Bshouty, N.H., Gentile, C. (eds.) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol. 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_12
DOI: https://doi.org/10.1007/978-3-540-72927-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72925-9
Online ISBN: 978-3-540-72927-3