On Concentration for (Regularized) Empirical Risk Minimization

van de Geer, Sara; Wainwright, Martin J.

doi:10.1007/s13171-017-0111-9

Published: 01 September 2017

Sankhya A, volume 79, pages 159–200 (2017)

Abstract

Rates of convergence for empirical risk minimizers have been well studied in the literature. In this paper, we aim to provide a complementary set of results, in particular by showing that, after normalization, the risk of the empirical minimizer concentrates on a single point. Such results have been established by Chatterjee (The Annals of Statistics, 42(6):2340–2381, 2014) for constrained estimators in the normal sequence model. We first generalize and sharpen this result to regularized least squares with convex penalties, making use of a “direct” argument based on Borell’s theorem. We then study generalizations to other loss functions, including the negative log-likelihood for exponential families combined with a strictly convex regularization penalty. The results in this general setting are based on more “indirect” arguments as well as on concentration inequalities for maxima of empirical processes.
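
As a rough numerical illustration of the concentration phenomenon described above (not the paper's argument), the following sketch simulates ℓ1-penalized least squares in the normal sequence model, where the minimizer has a closed form given by soft-thresholding. The ℓ1 penalty, the sparse truth `theta0`, the dimension `n`, the tuning parameter `lam`, and all function names are assumptions chosen for convenience; none of them come from the article. Because soft-thresholding is nonexpansive, the error norm is a 1-Lipschitz function of the Gaussian noise, so its fluctuations should stay of constant order while its typical size grows with `n`.

```python
import numpy as np

def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5*||y - theta||^2 + lam*||theta||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def simulate_errors(n=2000, n_nonzero=20, signal=5.0, lam=1.0, reps=500, seed=0):
    """Return ||theta_hat - theta0||_2 over independent replications.

    Model (assumed for illustration): Y = theta0 + eps, eps ~ N(0, I_n).
    """
    rng = np.random.default_rng(seed)
    theta0 = np.zeros(n)
    theta0[:n_nonzero] = signal                  # hypothetical sparse truth
    noise = rng.standard_normal((reps, n))       # one row per replication
    theta_hat = soft_threshold(theta0 + noise, lam)
    return np.linalg.norm(theta_hat - theta0, axis=1)

errors = simulate_errors()
mean_err = errors.mean()
std_err = errors.std(ddof=1)
print(f"mean error norm: {mean_err:.3f}")
print(f"std of error norm: {std_err:.3f}")
print(f"relative spread (std/mean): {std_err / mean_err:.4f}")
```

In such a run the relative spread should be small compared with one, which is the sense in which the normalized error sits near a single deterministic value; the precise statements and conditions are those of the paper, not of this simulation.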

References

Borell, C. (1975). The Brunn-Minkowski inequality in Gauss space. Inventiones Mathematicae 30, 2, 207–216.
Boucheron, S. and Massart, P. (2011). A high-dimensional Wilks phenomenon. Probability Theory and Related Fields 150, 3-4, 405–433.
Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press.
Chatterjee, S. (2014). A new perspective on least squares under convex constraint. The Annals of Statistics 42, 6, 2340–2381.
Klein, T. (2002). Une inégalité de concentration à gauche pour les processus empiriques. Comptes Rendus Mathematique 334, 6, 501–504.
Klein, T. and Rio, E. (2005). Concentration around the mean for maxima of empirical processes. The Annals of Probability 33, 3, 1060–1077.
Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: École d’Été de Probabilités de Saint-Flour XXXVIII-2008, volume 38. Springer Science & Business Media.
Ledoux, M. (2001). The concentration of measure phenomenon, volume 89. American Mathematical Society.
Massart, P. (2000). Some applications of concentration inequalities to statistics. Annales de la Faculté des Sciences de Toulouse: Mathématiques, volume 9, pages 245–303.
Muro, A. and van de Geer, S. (2015). Concentration behavior of the penalized least squares estimator. arXiv:1511.08698.
Rockafellar, R.T. (1970). Convex analysis. Princeton University Press.
Saumard, A. (2012). Optimal upper and lower bounds for the true and empirical excess risks in heteroscedastic least-squares regression. Electronic Journal of Statistics 6, 579–655.
Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l’IHES 81, 73–205.
van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer-Verlag, New York. ISBN 0-387-94640-3.

Author information

Authors and Affiliations

Seminar for Statistics, ETH Zürich, Zurich, Switzerland
Sara van de Geer
Department of Statistics and Department of EECS, University of California, Berkeley, CA, 94720, USA
Martin J. Wainwright

Corresponding author

Correspondence to Sara van de Geer.

About this article

Cite this article

van de Geer, S., Wainwright, M.J. On Concentration for (Regularized) Empirical Risk Minimization. Sankhya A 79, 159–200 (2017). https://doi.org/10.1007/s13171-017-0111-9

Received: 25 January 2017
Published: 01 September 2017
Issue Date: August 2017
DOI: https://doi.org/10.1007/s13171-017-0111-9
