Machine Learning, Volume 97, Issue 1–2, pp 65–78

Leave-one-out cross-validation is risk consistent for lasso

Abstract

The lasso procedure pervades the statistical and signal processing literature, and as such, is the target of substantial theoretical and applied research. While much of this research focuses on the desirable properties that lasso possesses—predictive risk consistency, sign consistency, correct model selection—these results assume that the tuning parameter is chosen in an oracle fashion. Yet, this is impossible in practice. Instead, data analysts must use the data twice, once to choose the tuning parameter and again to estimate the model. But only heuristics have ever justified such a procedure. To this end, we give the first definitive answer about the risk consistency of lasso when the smoothing parameter is chosen via cross-validation. We show that under some restrictions on the design matrix, the lasso estimator is still risk consistent with an empirically chosen tuning parameter.
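The procedure the abstract describes—using the data once to choose the tuning parameter by leave-one-out cross-validation and again to fit the final model—can be sketched as follows. This is an illustrative toy, not the authors' code: it uses a hypothetical single-predictor lasso, where the solution is available in closed form by soft-thresholding, and selects the penalty over a small grid.

```python
# A minimal sketch (assumption: one predictor, so the lasso solution is
# closed-form soft-thresholding) of choosing the lasso penalty by
# leave-one-out cross-validation, then refitting on the full data.
# The data and grid below are hypothetical, purely for illustration.

def lasso_1d(xs, ys, lam):
    """argmin_b (1/2n) * sum_i (y_i - b*x_i)^2 + lam * |b|, in closed form."""
    n = len(xs)
    z = sum(x * y for x, y in zip(xs, ys)) / n   # X'y / n
    s = sum(x * x for x in xs) / n               # X'X / n
    soft = max(abs(z) - lam, 0.0)                # soft-threshold at lam
    return (1.0 if z >= 0 else -1.0) * soft / s

def loo_cv_risk(xs, ys, lam):
    """Average squared prediction error over the n leave-one-out splits."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        xt, yt = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        b = lasso_1d(xt, yt, lam)                # fit without point i
        total += (ys[i] - b * xs[i]) ** 2        # predict the held-out point
    return total / n

# The data are used "twice": once to pick lambda, once to fit the model.
xs = [0.5, -1.2, 0.8, 1.5, -0.3, 2.0, -1.8, 0.9]
ys = [1.1, -2.3, 1.7, 3.2, -0.5, 4.1, -3.6, 1.8]   # roughly y = 2x + noise
grid = [0.0, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
lam_hat = min(grid, key=lambda lam: loo_cv_risk(xs, ys, lam))
b_hat = lasso_1d(xs, ys, lam_hat)                   # final, refit estimator
```

The paper's question is whether the estimator obtained this way—with the empirically chosen `lam_hat` rather than an oracle choice—retains the predictive risk consistency proved for the oracle-tuned lasso.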

Keywords

Stochastic equicontinuity · Uniform convergence · Persistence

References

  1. Bickel, P. J., Ritov, Y., & Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37(4), 1705–1732.
  2. Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. Journal of Machine Learning Research, 2, 499–526.
  3. Bunea, F., Tsybakov, A., & Wegkamp, M. (2007). Sparsity oracle inequalities for the lasso. Electronic Journal of Statistics, 1, 169–194.
  4. Chatterjee, A., & Lahiri, S. (2011). Strong consistency of lasso estimators. Sankhya A: Mathematical Statistics and Probability, 73(1), 55–78.
  5. Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61.
  6. Davidson, J. (1994). Stochastic limit theory: An introduction for econometricians. Oxford: Oxford University Press.
  7. Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.
  8. Knight, K., & Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28(5), 1356–1378.
  9. van de Geer, S., & Lederer, J. (2013). The lasso, correlated design, and improved oracle inequalities. arXiv:1107.0189.
  10. Grandvalet, Y. (1998). Least absolute shrinkage is equivalent to quadratic penalization. In ICANN 98 (pp. 201–206). London: Springer.
  11. Greenshtein, E., & Ritov, Y. A. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10(6), 971–988.
  12. Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). A distribution-free theory of nonparametric regression. New York: Springer.
  13. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
  14. Lee, S., Zhu, J., & Xing, E. P. (2010). Adaptive multi-task lasso: With application to eQTL detection. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems (Vol. 23, pp. 1306–1314).
  15. Leng, C., Lin, Y., & Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statistica Sinica, 16(4), 1273–1284.
  16. Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462.
  17. Newey, W. K. (1991). Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59(4), 1161–1167.
  18. Osborne, M., Presnell, B., & Turlach, B. (2000). On the lasso and its dual. Journal of Computational and Graphical Statistics, 9(2), 319–337.
  19. Schaffer, C. (1993). Selecting a classification method by cross-validation. Machine Learning, 13, 135–143.
  20. Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486–494.
  21. Shi, W., Wahba, G., Wright, S., Lee, K., Klein, R., & Klein, B. (2008). LASSO-Patternsearch algorithm with application to ophthalmology and genomic data. Statistics and Its Interface, 1(1), 137.
  22. Stromberg, K. (1994). Probability for analysts. London: Chapman & Hall.
  23. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288.
  24. Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 273–282.
  25. Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electronic Journal of Statistics, 7, 1456–1490.
  26. Tibshirani, R. J., & Taylor, J. (2012). Degrees of freedom in lasso problems. The Annals of Statistics, 40, 1198–1232.
  27. Wang, H., & Leng, C. (2007). Unified lasso estimation by least squares approximation. Journal of the American Statistical Association, 102(479), 1039–1048.
  28. Xu, H., Mannor, S., & Caramanis, C. (2008). Sparse algorithms are not stable: A no-free-lunch theorem. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing (pp. 1299–1303).
  29. Zou, H., Hastie, T., & Tibshirani, R. (2007). On the degrees of freedom of the lasso. The Annals of Statistics, 35(5), 2173–2192.

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Department of Statistics, Colorado State University, Fort Collins, USA
  2. Department of Statistics, Indiana University, Bloomington, USA