TEST

pp 1–24

Prediction error bounds for linear regression with the TREX

  • Jacob Bien
  • Irina Gaynanova
  • Johannes Lederer
  • Christian L. Müller
Original Paper

Abstract

The TREX is a recently introduced approach to sparse linear regression. In contrast to most well-known approaches to penalized regression, the TREX can be formulated without the use of tuning parameters. In this paper, we establish the first known prediction error bounds for the TREX. Additionally, we introduce extensions of the TREX to a more general class of penalties, and we provide a bound on the prediction error in this generalized setting. These results deepen the understanding of the TREX from a theoretical perspective and provide new insights into penalized regression in general.
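For orientation, the sketch below writes out the TREX objective in the form introduced by Lederer and Müller (2015): the least-squares loss is divided by half the sup-norm of the correlations between the predictors and the residual, so the usual tuning parameter is replaced by a fixed constant. The code is a minimal illustration only, not the authors' implementation; the function name, the NumPy-based evaluation, and the simulated data are assumptions made for this example, and the actual minimization of the non-convex objective, which is the computational core of the method, is not attempted here.

import numpy as np

def trex_objective(beta, X, y, c=0.5):
    # Evaluate the TREX objective at a candidate coefficient vector beta:
    #   ||y - X beta||_2^2 / (c * ||X^T (y - X beta)||_inf) + ||beta||_1,
    # with the built-in constant c = 1/2, so no tuning parameter is selected.
    residual = y - X @ beta
    # Scaled sup-norm of the predictor-residual correlations; the objective
    # is undefined if this is exactly zero (e.g., for a perfect fit).
    denom = c * np.max(np.abs(X.T @ residual))
    return (residual @ residual) / denom + np.sum(np.abs(beta))

# Illustrative use on simulated sparse data (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# The objective can be compared across candidate coefficient vectors without
# choosing any tuning parameter.
print(trex_objective(beta_true, X, y))
print(trex_objective(np.zeros(p), X, y))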

Keywords

TREX · High-dimensional regression · Tuning parameters · Oracle inequalities

Mathematics Subject Classification

62J07 

Acknowledgements

We thank the editor and the reviewers for their insightful comments.

Copyright information

© Sociedad de Estadística e Investigación Operativa 2018

Authors and Affiliations

  1. Department of Data Sciences and Operations, University of Southern California, Los Angeles, USA
  2. Department of Statistics, Texas A&M University, College Station, USA
  3. Departments of Statistics and Biostatistics, University of Washington, Seattle, USA
  4. Flatiron Institute, Simons Foundation, New York, USA
