
Learning without concentration for general loss functions

Abstract

We study the performance of empirical risk minimization in prediction and estimation problems that are carried out in a convex class and relative to a sufficiently smooth convex loss function. The framework is based on the small-ball method and is thus suited for heavy-tailed problems. Moreover, among its outcomes is that a well-chosen loss, calibrated to fit the noise level of the problem, negates some of the ill effects of outliers and boosts the confidence level, leading to Gaussian-like behaviour even when the target random variable is heavy-tailed.
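As a concrete, purely illustrative instance of the setting in the abstract, the sketch below runs empirical risk minimization over a convex class (here, linear functions) with a smooth convex loss (here, the Huber loss) on data whose noise is heavy-tailed. The names and parameter values (`huber_loss`, `erm_fit`, `gamma`, the step size) are ours, not the paper's, and the plain gradient-descent solver is a minimal stand-in for any convex optimizer.

```python
import numpy as np

def huber_loss(r, gamma):
    """Huber loss: quadratic for |r| <= gamma, linear beyond (smooth and convex)."""
    small = np.abs(r) <= gamma
    return np.where(small, 0.5 * r**2, gamma * (np.abs(r) - 0.5 * gamma))

def erm_fit(X, y, gamma, steps=500, lr=0.1):
    """Minimize the empirical Huber risk over linear predictors by gradient descent."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        r = X @ w - y
        # derivative of the Huber loss is the residual clipped to [-gamma, gamma]
        grad = X.T @ np.clip(r, -gamma, gamma) / n
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d))
w_star = np.ones(d)
noise = rng.standard_t(df=2.5, size=n)   # heavy-tailed noise (infinite third moment)
y = X @ w_star + noise

w_hat = erm_fit(X, y, gamma=1.0)
print(np.linalg.norm(w_hat - w_star))    # estimation error remains small despite outliers
```

The clipping in the gradient is where the calibration discussed in the abstract enters: residuals larger than `gamma` contribute only a bounded gradient, so a few extreme samples cannot dominate the fit.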


Notes

  1. We refer to \(f^*(X)-Y\) as the noise of the problem. This name makes perfect sense when \(Y=f_0(X)-W\) for a mean-zero random variable W that is independent of X, and we use the term ‘noise’ even when the target does not have that particular form.

  2. The log-loss is more commonly used in the context of binary classification problems rather than in the type of real-valued problems we study here. However, because of its convexity properties it is an interesting example of the phenomenon we explore.

  3. Let us mention that it is possible to modify the arguments and tackle situations in which the constants \(\kappa \) and \(\varepsilon \) are not uniform, but to keep this article at a reasonable length we defer this to future work.

  4. One may show that under rather reasonable conditions, if \(f^*_\gamma \) is the true minimizer of the Huber loss with parameter \(\gamma \) and \(f^*\) is the true minimizer of the squared loss then \(\Vert f_\gamma ^*(X)-Y\Vert _{L_2} \lesssim \Vert f^*(X)-Y\Vert _{L_2}\).
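The comparison in this note can be probed numerically. The following sketch, with hypothetical parameter choices (`gamma = 1`, a one-dimensional linear class, Student-\(t\) noise), approximates the minimizers of the Huber risk and the squared risk on a large sample with symmetric heavy-tailed noise and compares the \(L_2\) norms of their residuals. It illustrates, but of course does not prove, the stated bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
X = rng.standard_normal(n)
Y = 2.0 * X + rng.standard_t(df=3.0, size=n)   # symmetric heavy-tailed noise

# Squared-loss minimizer over the linear class {x -> a*x}: least squares.
a_sq = (X @ Y) / (X @ X)

# Huber-loss minimizer with parameter gamma, via gradient descent on the
# empirical Huber risk (the large sample approximates the true minimizer).
gamma, a_h = 1.0, 0.0
for _ in range(300):
    r = a_h * X - Y
    a_h -= 0.5 * (X @ np.clip(r, -gamma, gamma)) / n

l2_sq = np.sqrt(np.mean((a_sq * X - Y) ** 2))
l2_h = np.sqrt(np.mean((a_h * X - Y) ** 2))
# The two residual L2 norms should agree up to a constant factor, as the note claims.
```

With symmetric noise both population minimizers coincide (slope 2 here), so the two residual norms are nearly equal; the interesting regime for the note is when they differ, and even then only by a constant factor.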



Author information

Correspondence to Shahar Mendelson.

Additional information

Partially supported by ISF Grant 707/14.


Cite this article

Mendelson, S. Learning without concentration for general loss functions. Probab. Theory Relat. Fields 171, 459–502 (2018). https://doi.org/10.1007/s00440-017-0784-y
