ℓ1-regularized linear regression: persistence and oracle inequalities

  • Peter L. Bartlett 1,2
  • Shahar Mendelson 3
  • Joseph Neeman 1

Probability Theory and Related Fields, volume 154, pages 193–224 (2012)

Abstract

We study the predictive performance of ℓ1-regularized linear regression in a model-free setting, including the case where the number of covariates is substantially larger than the sample size. We introduce a new analysis method that avoids the boundedness problems that typically arise in model-free empirical minimization. Our technique provides an answer to a conjecture of Greenshtein and Ritov (Bernoulli 10(6):971–988, 2004) regarding the “persistence” rate for linear regression and allows us to prove an oracle inequality for the error of the regularized minimizer. It also demonstrates that empirical risk minimization gives optimal rates (up to log factors) of convex aggregation of a set of estimators of a regression function.
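
As background, the notion of persistence due to Greenshtein and Ritov [17] can be sketched as follows; the notation here is illustrative and need not match the article's. The ℓ1-constrained empirical minimizer and the (population) risk are

$$
\hat{\beta}_n \in \operatorname*{arg\,min}_{\|\beta\|_1 \le b_n} \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - \langle \beta, X_i\rangle\bigr)^2,
\qquad
R(\beta) = \mathbb{E}\bigl(Y - \langle \beta, X\rangle\bigr)^2,
$$

and the sequence of estimators is called persistent relative to the balls $\{\beta : \|\beta\|_1 \le b_n\}$ if

$$
R(\hat{\beta}_n) \;-\; \inf_{\|\beta\|_1 \le b_n} R(\beta) \;\xrightarrow{\;\mathbb{P}\;}\; 0,
$$

even when the number of covariates is much larger than the sample size $n$. Roughly speaking, the conjecture addressed in the article concerns how quickly the radius $b_n$ may grow with $n$ while persistence is preserved.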

References

  1. Bartlett P.L.: Fast rates for estimation error and oracle inequalities for model selection. Econom. Theory 24(2), 545–552 (2008)

  2. Bartlett P.L., Jordan M.I., McAuliffe J.D.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101(473), 138–156 (2006)

  3. Bartlett P.L., Mendelson S.: Empirical minimization. Probab. Theory Relat. Fields 135(3), 311–334 (2006)

  4. Bickel P.J., Ritov Y., Tsybakov A.B.: Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)

  5. Bunea F., Tsybakov A., Wegkamp M.: Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1, 169–194 (2007)

  6. Bunea F., Tsybakov A.B., Wegkamp M.H.: Aggregation and sparsity via ℓ1 penalized least squares. In: Proceedings of the 19th Annual Conference on Learning Theory (COLT 2006). Lecture Notes in Artificial Intelligence, vol. 4005, pp. 379–391. Springer, Berlin (2006)

  7. Candès E.J., Plan Y.: Near-ideal model selection by ℓ1 minimization. Ann. Stat. 37(5A), 2145–2177 (2009)

  8. Carl B.: Inequalities of Bernstein–Jackson-type and the degree of compactness of operators in Banach spaces. Ann. Inst. Fourier (Grenoble) 35(3), 79–118 (1985)

  9. Catoni O.: Statistical Learning Theory and Stochastic Optimization. École d'Été de Probabilités de Saint-Flour 2001. Lecture Notes in Mathematics, vol. 1851. Springer, Berlin (2004)

  10. de la Peña V.H., Giné E.: Decoupling: From Dependence to Independence. Probability and its Applications. Springer, New York (1999)

  11. Donoho D.L., Elad M., Temlyakov V.N.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52(1), 6–18 (2006)

  12. Donoho D.L., Johnstone I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)

  13. Dudley R.M.: The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. In: Giné E., Koltchinskii V., Norvaisa R. (eds.) Selected Works of R.M. Dudley. Selected Works in Probability and Statistics, pp. 125–165. Springer, New York (2010)

  14. Giné E., Zinn J.: Some limit theorems for empirical processes (with discussion). Ann. Probab. 12(4), 929–998 (1984)

  15. Gordon Y., Litvak A.E., Mendelson S., Pajor A.: Gaussian averages of interpolated bodies and applications to approximate reconstruction. J. Approx. Theory 149(1), 59–73 (2007)

  16. Greenshtein E.: Best subset selection, persistence in high-dimensional statistical learning and optimization under ℓ1 constraint. Ann. Stat. 34(5), 2367–2386 (2006)

  17. Greenshtein E., Ritov Y.: Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10(6), 971–988 (2004)

  18. Guédon O., Mendelson S., Pajor A., Tomczak-Jaegermann N.: Subspaces and orthogonal decompositions generated by bounded orthogonal systems. Positivity 11(2), 269–283 (2007)

  19. Guédon O., Mendelson S., Pajor A., Tomczak-Jaegermann N.: Majorizing measures and proportional subsets of bounded orthonormal systems. Rev. Mat. Iberoamericana 24(3), 1075–1095 (2008)

  20. Hoeffding W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)

  21. Koltchinskii V.: Sparsity in penalized empirical risk minimization. Ann. Inst. Henri Poincaré Probab. Stat. 45(1), 7–57 (2009)

  22. Lecué G., Mendelson S.: General oracle inequalities and applications to high dimensional data analysis (preprint)

  23. Leng C., Lin Y., Wahba G.: A note on the lasso and related procedures in model selection. Stat. Sinica 16(4), 1273–1284 (2006)

  24. Lounici K.: Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2, 90–102 (2008)

  25. Meinshausen N., Bühlmann P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34(3), 1436–1462 (2006)

  26. Meinshausen N., Yu B.: Lasso-type recovery of sparse representations for high-dimensional data. Ann. Stat. 37(1), 246–270 (2009)

  27. Mendelson S.: Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48(7), 1977–1991 (2002)

  28. Mendelson S.: On the performance of kernel classes. J. Mach. Learn. Res. 4(5), 759–771 (2004)

  29. Mendelson S., Neeman J.: Regularization in kernel learning. Ann. Stat. 38(1), 526–565 (2010)

  30. Milman V.D., Schechtman G.: Asymptotic Theory of Finite-Dimensional Normed Spaces. Lecture Notes in Mathematics, vol. 1200. Springer, Berlin (1986)

  31. Pajor A., Tomczak-Jaegermann N.: Remarques sur les nombres d'entropie d'un opérateur et de son transposé. C. R. Acad. Sci. Paris Sér. I Math. 301(15), 743–746 (1985)

  32. Paouris G.: Concentration of mass on convex bodies. Geom. Funct. Anal. 16(5), 1021–1049 (2006)

  33. Pisier G.: Some applications of the metric entropy condition to harmonic analysis. In: Banach Spaces, Harmonic Analysis, and Probability Theory, pp. 123–154 (1983)

  34. Pisier G.: The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Mathematics, vol. 94. Cambridge University Press, Cambridge (1989)

  35. Talagrand M.: Regularity of Gaussian processes. Acta Math. 159, 99–149 (1987). doi:10.1007/BF02392556

  36. Talagrand M.: The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer Monographs in Mathematics. Springer, Berlin (2005)

  37. Tibshirani R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B 58(1), 267–288 (1996)

  38. Tsybakov A.B.: Optimal rates of aggregation. In: Computational Learning Theory (COLT 2003). Lecture Notes in Artificial Intelligence, vol. 2777, pp. 303–313. Springer, Berlin (2003)

  39. van de Geer S.A.: High-dimensional generalized linear models and the lasso. Ann. Stat. 36(2), 614–645 (2008)

  40. van der Vaart A.W., Wellner J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, New York (1996)

  41. Zhang C.H., Huang J.: The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Stat. 36(4), 1567–1594 (2008)

  42. Zhang T.: Some sharp performance bounds for least squares regression with ℓ1 regularization. Ann. Stat. 37(5A), 2109–2144 (2009)

Author information

Authors and Affiliations

  1. Department of Statistics, University of California, Berkeley, CA, 94720, USA

    Peter L. Bartlett & Joseph Neeman

  2. Computer Science Division, University of California, Berkeley, CA, 94720, USA

    Peter L. Bartlett

  3. Department of Mathematics, Technion – Israel Institute of Technology, Haifa, 32000, Israel

    Shahar Mendelson

Corresponding author

Correspondence to Joseph Neeman.

Additional information

The research leading to these results was supported by the Centre for Mathematics and its Applications, The Australian National University, and has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007–2013) / ERC grant agreement no. 203134, from the Israel Science Foundation grant 666/06 and from the Australian Research Council grant DP0986563. We gratefully acknowledge the support of the NSF through grant DMS-0707060.

About this article

Cite this article

Bartlett, P.L., Mendelson, S. & Neeman, J. ℓ1-regularized linear regression: persistence and oracle inequalities. Probab. Theory Relat. Fields 154, 193–224 (2012). https://doi.org/10.1007/s00440-011-0367-2

  • Received: 21 May 2010

  • Revised: 29 March 2011

  • Published: 08 June 2011

  • Issue Date: October 2012

  • DOI: https://doi.org/10.1007/s00440-011-0367-2

Mathematics Subject Classification (2000)

  • 62G08
  • 62J07