Abstract
Given a finite set F of estimators, the problem of aggregation is to construct a new estimator whose risk is as close as possible to the risk of the best estimator in F. It was conjectured that empirical minimization performed in the convex hull of F is an optimal aggregation method, but we show that this conjecture is false. Nevertheless, we prove that empirical minimization in the convex hull of a well-chosen, empirically determined subset of F is an optimal aggregation method.
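The objects named in the abstract can be written out as follows. This is only a sketch under assumed notation: the squared loss, the sample (X_i, Y_i), the symbols R, R_n, r(n, M) and \hat{F} are illustrative choices not given in the abstract, and the residual term r(n, M) is left unspecified because the abstract does not state a rate. The block assumes the amsmath and amssymb packages.

```latex
% Assumed setting: F = {f_1, ..., f_M}, an i.i.d. sample (X_i, Y_i), i = 1..n,
% and the squared loss. None of this notation appears in the abstract itself.
\[
  R(f) = \mathbb{E}\,\bigl(f(X) - Y\bigr)^2,
  \qquad
  R_n(f) = \frac{1}{n} \sum_{i=1}^{n} \bigl(f(X_i) - Y_i\bigr)^2 .
\]

% Aggregation goal: build \tilde{f} from the sample so that its risk is close
% to that of the best element of F, up to a residual term r(n, M).
\[
  R(\tilde{f}) \le \min_{f \in F} R(f) + r(n, M).
\]

% Empirical minimization in the convex hull of F (the conjectured procedure).
\[
  \hat{f} \in \operatorname*{argmin}_{f \in \operatorname{conv}(F)} R_n(f),
  \qquad
  \operatorname{conv}(F) = \Bigl\{ \sum_{j=1}^{M} \lambda_j f_j \;:\;
    \lambda_j \ge 0,\ \sum_{j=1}^{M} \lambda_j = 1 \Bigr\}.
\]

% Variant described in the abstract: the same minimization, but over the
% convex hull of a data-dependent subset \hat{F} of F (its construction is
% not described in the abstract and is not specified here).
\[
  \tilde{f} \in \operatorname*{argmin}_{f \in \operatorname{conv}(\hat{F})} R_n(f),
  \qquad \hat{F} \subseteq F .
\]
```

In this notation, the abstract's negative result says that \hat{f} does not in general achieve the best possible residual term, while its positive result says that \tilde{f} does for a suitable empirical choice of \hat{F}.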
Additional information
This paper was supported in part by an Australian Research Council Discovery grant DP0559465 and by an Israel Science Foundation grant 666/06.
Cite this article
Lecué, G., Mendelson, S. Aggregation via empirical risk minimization. Probab. Theory Relat. Fields 145, 591–613 (2009). https://doi.org/10.1007/s00440-008-0180-8