
Prediction error bounds for linear regression with the TREX

  • Jacob Bien
  • Irina Gaynanova
  • Johannes Lederer
  • Christian L. Müller
Original Paper

Abstract

The TREX is a recently introduced approach to sparse linear regression. In contrast to most well-known approaches to penalized regression, the TREX can be formulated without the use of tuning parameters. In this paper, we establish the first known prediction error bounds for the TREX. Additionally, we introduce extensions of the TREX to a more general class of penalties, and we provide a bound on the prediction error in this generalized setting. These results deepen the understanding of the TREX from a theoretical perspective and provide new insights into penalized regression in general.
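For context, a minimal sketch of the tuning-free formulation alluded to above, written as the TREX objective usually appears in the literature (the symbols $Y$ for the response vector, $X$ for the $n \times p$ design matrix, and the constant $c$, commonly set to $1/2$, are notation introduced here, not taken from this abstract):

$$
\hat{\beta}_{\mathrm{TREX}} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}} \left\{ \frac{\|Y - X\beta\|_{2}^{2}}{c\,\|X^{\top}(Y - X\beta)\|_{\infty}} + \|\beta\|_{1} \right\}.
$$

The data-dependent denominator plays the role that a hand-chosen tuning parameter $\lambda$ plays in lasso-type objectives such as $\|Y - X\beta\|_{2}^{2}/n + 2\lambda\|\beta\|_{1}$, which is why no separate calibration step is required.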

Keywords

TREX · High-dimensional regression · Tuning parameters · Oracle inequalities

Mathematics Subject Classification

62J07 

Notes

Acknowledgements

We thank the editor and the reviewers for their insightful comments.


Copyright information

© Sociedad de Estadística e Investigación Operativa 2018

Authors and Affiliations

  1. Department of Data Sciences and Operations, University of Southern California, Los Angeles, USA
  2. Department of Statistics, Texas A&M University, College Station, USA
  3. Departments of Statistics and Biostatistics, University of Washington, Seattle, USA
  4. Flatiron Institute, Simons Foundation, New York, USA
