Abstract
Let \(\cal F\) be a set of \(M\) classification procedures with values in \([-1,1]\). Given a loss function, we want to construct a procedure that mimics, at the best possible rate, the best procedure in \(\cal F\). This fastest rate is called the optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that the optimal rate of aggregation is either \(((\log M)/n)^{1/2}\) or \((\log M)/n\). We also prove that, if all \(M\) classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low-noise condition) when the loss function is somewhat more than convex, whereas in that case aggregation procedures with exponential weights achieve the optimal rate of aggregation.
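To make the exponential-weights aggregation mentioned in the abstract concrete, here is a minimal sketch, not the paper's exact estimator: it assumes empirical risks \(R_n(f_j)\) computed from \(n\) samples under a generic margin loss, and a temperature parameter fixed to 1 for illustration. The names exponential_weights_aggregate, hinge, and temperature are hypothetical.

import numpy as np

def exponential_weights_aggregate(F, X, y, loss, temperature=1.0):
    """Aggregate M classifiers f_1..f_M (values in [-1, 1]) by the convex
    combination with weights proportional to exp(-n * R_n(f_j) / temperature),
    where R_n(f_j) is the empirical risk of f_j."""
    n = len(y)
    # Empirical risk R_n(f_j) = (1/n) * sum_i loss(y_i * f_j(x_i))
    risks = np.array([np.mean(loss(y * np.array([f(x) for x in X]))) for f in F])
    # Shift by the minimum risk for numerical stability (weights are unchanged)
    w = np.exp(-n * (risks - risks.min()) / temperature)
    w /= w.sum()
    # The aggregate stays in [-1, 1] because it is a convex combination
    return lambda x: float(sum(wj * f(x) for wj, f in zip(w, F)))

# Toy usage: three candidate classifiers, hinge loss, labels y = sign(x)
rng = np.random.default_rng(0)
X = rng.normal(size=50)
y = np.sign(X)
F = [lambda x: -1.0, lambda x: 0.0, lambda x: float(np.tanh(3.0 * x))]
hinge = lambda m: np.maximum(0.0, 1.0 - m)
f_agg = exponential_weights_aggregate(F, X, y, hinge, temperature=1.0)
print(f_agg(0.7))  # close to tanh(2.1): the weights concentrate on the best classifier

Unlike empirical risk minimization, which puts all its mass on a single empirical minimizer, this aggregate spreads weight across the dictionary according to empirical risk; this averaging is what allows the faster \((\log M)/n\) rate for sufficiently convex losses, as the paper shows.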
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Lecué, G. (2007). Suboptimality of Penalized Empirical Risk Minimization in Classification. In: Bshouty, N.H., Gentile, C. (eds.) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol. 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_12
DOI: https://doi.org/10.1007/978-3-540-72927-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72925-9
Online ISBN: 978-3-540-72927-3