Abstract
We introduce an alternative to the notion of ‘fast rate’ in Learning Theory, which coincides with the optimal error rate when the given class happens to be convex and regular in some sense. While it is well known that such a rate cannot always be attained by a learning procedure (i.e., a procedure that selects a function in the given class), we introduce an aggregation procedure that attains that rate under rather minimal assumptions—for example, that the \(L_q\) and \(L_2\) norms are equivalent on the linear span of the class for some \(q>2\), and the target random variable is square-integrable. The key components in the proof include a two-sided isomorphic estimator on distances between class members, which is based on the median-of-means; and an almost isometric lower bound of the form \(N^{-1}\sum _{i=1}^N f^2(X_i) \ge (1-\zeta )\mathbb {E}f^2\) which holds uniformly in the class. Both results only require that the \(L_q\) and \(L_2\) norms are equivalent on the linear span of the class for some \(q>2\).
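The median-of-means device mentioned above can be sketched as follows. This is a generic illustration of the basic estimator, not the paper's actual two-sided distance estimator; the function name and choice of blocks are ours:

```python
import statistics

def median_of_means(sample, num_blocks):
    """Median-of-means estimator of a mean: partition the sample into
    blocks of equal size, average within each block, and return the
    median of the block means. Unlike the empirical mean, the result
    is robust to a small number of heavy-tailed outliers."""
    block_size = len(sample) // num_blocks
    block_means = [
        sum(sample[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)

# A single gross outlier corrupts the empirical mean but not the
# median-of-means: here 5 blocks of size 2 give block means
# [1, 1, 1, 1, 500.5], whose median is 1.
sample = [1.0] * 9 + [1000.0]
print(median_of_means(sample, 5))   # robust estimate: 1.0
print(sum(sample) / len(sample))    # empirical mean: 100.9
```

Because the median discards the blocks contaminated by heavy tails, only a confidence-level-dependent number of blocks is needed, which is why such estimators work under weak moment assumptions of the kind imposed in the paper.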
Notes
This is sometimes called a proper learning procedure.
Naturally, no nontrivial error rate can hold without some assumption on the class \({\mathcal {F}}\). The aim is to obtain such an estimate under minimal assumptions on the class—as specified in what follows.
Roughly put, the minimax rate is the best possible error rate one may achieve by any learning procedure, i.e., by any \(\Psi :(\Omega \times \mathbb {R})^N \rightarrow {\mathcal {F}}\).
The reason for calling \(f^*(X)-Y\) the ‘noise’ is the independent-noise case, in which \(Y=f^*(X)-\xi \) with \(\xi \) mean-zero and independent of X. Then \(f^*(X)-Y=\xi \) is indeed the noise, and its \(L_q\) norm calibrates the noise level of the problem.
Additional information
Supported in part by the Mathematical Sciences Institute, The Australian National University, Canberra, ACT 2601, Australia, and by an Israel Science Foundation grant 707/14.
Cite this article
Mendelson, S. On aggregation for heavy-tailed classes. Probab. Theory Relat. Fields 168, 641–674 (2017). https://doi.org/10.1007/s00440-016-0720-6