Volume 28, Issue 2, pp 451–474

Prediction error bounds for linear regression with the TREX

  • Jacob Bien
  • Irina Gaynanova
  • Johannes Lederer
  • Christian L. Müller
Original Paper

Abstract

The TREX is a recently introduced approach to sparse linear regression. In contrast to most well-known approaches to penalized regression, the TREX can be formulated without the use of tuning parameters. In this paper, we establish the first known prediction error bounds for the TREX. Additionally, we introduce extensions of the TREX to a more general class of penalties, and we provide a bound on the prediction error in this generalized setting. These results deepen the understanding of the TREX from a theoretical perspective and provide new insights into penalized regression in general.
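To make the tuning-free idea concrete, here is a minimal numerical sketch. It assumes the TREX objective as introduced by Lederer and Müller (2015), which replaces the lasso's tuning parameter by a data-driven denominator; the toy data, the constant c = 1/2, and the use of a generic local optimizer are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def trex_objective(beta, X, y, c=0.5):
    """TREX objective (Lederer & Mueller 2015, assumed form):
    ||y - X b||_2^2 / (c * ||X^T (y - X b)||_inf) + ||b||_1.
    The denominator adapts to the data, so no tuning parameter is needed."""
    r = y - X @ beta
    denom = c * np.max(np.abs(X.T @ r))
    return (r @ r) / denom + np.sum(np.abs(beta))

# Toy sparse regression problem (illustrative data only).
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
beta_star = np.array([2.0, -1.5, 0.0, 0.0, 0.0])  # two active coefficients
y = X @ beta_star + 0.1 * rng.standard_normal(n)

# The TREX objective is non-convex; a generic local optimizer only
# sketches the idea and gives no global-optimality guarantee.
res = minimize(trex_objective, x0=np.zeros(p), args=(X, y),
               method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
```

Note that dedicated algorithms exist for computing global TREX minimizers (Bien et al. 2018); the local search above is only a stand-in for illustration.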


Keywords

TREX · High-dimensional regression · Tuning parameters · Oracle inequalities

Mathematics Subject Classification




Acknowledgements

We thank the editor and the reviewers for their insightful comments.



Copyright information

© Sociedad de Estadística e Investigación Operativa 2018

Authors and Affiliations

  1. Department of Data Sciences and Operations, University of Southern California, Los Angeles, USA
  2. Department of Statistics, Texas A&M University, College Station, USA
  3. Departments of Statistics and Biostatistics, University of Washington, Seattle, USA
  4. Flatiron Institute, Simons Foundation, New York, USA
