
On aggregation for heavy-tailed classes

Probability Theory and Related Fields

Abstract

We introduce an alternative to the notion of ‘fast rate’ in Learning Theory, which coincides with the optimal error rate when the given class happens to be convex and regular in some sense. While it is well known that such a rate cannot always be attained by a learning procedure (i.e., a procedure that selects a function in the given class), we introduce an aggregation procedure that attains that rate under rather minimal assumptions: for example, that the \(L_q\) and \(L_2\) norms are equivalent on the linear span of the class for some \(q>2\), and that the target random variable is square-integrable. The key components in the proof are a two-sided, isomorphic estimator of distances between class members, based on the median-of-means; and an almost isometric lower bound of the form \(N^{-1}\sum _{i=1}^N f^2(X_i) \ge (1-\zeta )\mathbb {E}f^2\), which holds uniformly in the class. Both results require only that the \(L_q\) and \(L_2\) norms are equivalent on the linear span of the class for some \(q>2\).


Notes

  1. This is sometimes called a proper learning procedure.

  2. Naturally, no nontrivial error rate can hold without some assumption on the class \({\mathcal {F}}\). The aim is to obtain such an estimate under minimal assumptions on the class, as specified in what follows.

  3. Roughly put, the minimax rate is the best possible error rate one may achieve by any learning procedure, i.e., by any \(\Psi :(\Omega \times \mathbb {R})^N \rightarrow {\mathcal {F}}\); one standard way of writing it appears in the first display after these notes.

  4. The reason for calling \(f^*(X)-Y\) the ‘noise’ is the independent-noise case, in which \(Y=f^*(X)-\xi \) and \(\xi \) is mean-zero and independent of X. Thus \(f^*(X)-Y\) is indeed the noise, and its \(L_q\) norm calibrates the noise level of the problem (spelled out in the second display below).


Author information


Correspondence to Shahar Mendelson.

Additional information

Supported in part by the Mathematical Sciences Institute, The Australian National University, Canberra, ACT 2601, Australia, and by an Israel Science Foundation grant 707/14.


About this article


Cite this article

Mendelson, S. On aggregation for heavy-tailed classes. Probab. Theory Relat. Fields 168, 641–674 (2017). https://doi.org/10.1007/s00440-016-0720-6

