Statistics and Computing

Volume 20, Issue 2, pp 253–266

Sparse conformal predictors

  • Mohamed Hebiri


Conformal predictors, introduced by Vovk et al. (Algorithmic Learning in a Random World, Springer, New York, 2005), build prediction intervals by exploiting a notion of conformity of the new data point with previously observed data. We propose a novel method for constructing prediction intervals for the response variable in multivariate linear models. The main emphasis is on sparse linear models, where only a few of the covariates have a significant influence on the response variable, even if the total number of covariates is very large. Our approach combines the principle of conformal prediction with the ℓ1-penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter ε > 0 and has a coverage probability larger than or equal to 1 − ε. The numerical experiments reported in the paper show that the length of the confidence set is small. Furthermore, as a by-product of the proposed approach, we provide a data-driven procedure for choosing the LASSO penalty. The selection power of the method is illustrated on simulated and real data.
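
The idea of pairing a LASSO fit with a conformal coverage guarantee can be illustrated with a minimal sketch. It is not the procedure of the paper, whose construction is not spelled out in this abstract. As a simplifying assumption, the sketch swaps in split conformal prediction (fit on one half of the data, calibrate on the other) with a fixed, hand-picked penalty. All names (`lasso_cd`, `split_conformal_interval`, `lam`, `eps`) and the simulated data are hypothetical.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def predict(beta, x):
    return sum(b * v for b, v in zip(beta, x))


def split_conformal_interval(X, y, x_new, eps, lam):
    """Assumed simplification: split conformal interval around a lasso prediction."""
    half = len(X) // 2
    beta = lasso_cd(X[:half], y[:half], lam)  # fit on the first half
    # Nonconformity scores: absolute residuals on the held-out second half.
    scores = sorted(abs(yi - predict(beta, xi)) for xi, yi in zip(X[half:], y[half:]))
    m = len(scores)
    k = math.ceil((m + 1) * (1 - eps))  # rank of the conformal quantile
    q = scores[k - 1] if k <= m else math.inf
    centre = predict(beta, x_new)
    return centre - q, centre + q, beta


# Simulated sparse linear model (hypothetical data): 2 active covariates out of 10.
random.seed(0)
n, p = 100, 10
beta_true = [2.0, -1.5] + [0.0] * (p - 2)
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [predict(beta_true, row) + random.gauss(0.0, 0.5) for row in X]
x_new = [random.gauss(0.0, 1.0) for _ in range(p)]

lo, hi, beta_hat = split_conformal_interval(X, y, x_new, eps=0.1, lam=0.1)
print("interval:", (lo, hi))
print("selected covariates:", [j for j, b in enumerate(beta_hat) if b != 0.0])
```

Under exchangeability of the observations, an interval built this way covers the new response with probability at least 1 − ε, which is the same kind of guarantee the abstract states. The sketch fixes `lam` by hand, whereas the paper reports a data-driven choice of the penalty.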

Keywords: LASSO; LARS; Sparsity; Variable selection; Regularization path; Confidence set




  1. Bühlmann, P., Hothorn, T.: Twin boosting: improved feature selection and prediction. Stat. Comput. (2010, this issue)
  2. Bunea, F., Tsybakov, A., Wegkamp, M.: Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1, 169–194 (2007)
  3. Casella, G., Berger, R.L.: Statistical Inference. Duxbury, N. Scituate (2001)
  4. Chen, S.S., Donoho, D.L.: Atomic decomposition by basis pursuit. Technical Report (1995)
  5. Chesneau, Ch., Hebiri, M.: Some theoretical results on the grouped variables Lasso. Math. Methods Stat. 17, 317–326 (2008)
  6. Dalalyan, A., Tsybakov, A.: Aggregation by exponential weighting and sharp oracle inequalities. In: Learning Theory. Lecture Notes in Comput. Sci., vol. 4539, pp. 97–111. Springer, Berlin (2007)
  7. Dalalyan, A., Tsybakov, A.: Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Mach. Learn. 72, 39–61 (2008)
  8. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression—with discussion. Ann. Stat. 32, 407–499 (2004)
  9. Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1, 302–332 (2007)
  10. Garrigues, P., El Ghaoui, L.: An homotopy algorithm for the lasso with online observations. In: Neural Information Processing Systems (NIPS), vol. 21, pp. 489–496. MIT Press, Cambridge (2008)
  11. Györfi, L., Kohler, M., Krzyzak, A., Walk, H.: A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer, New York (2002)
  12. Hebiri, M.: Regularization with the smooth-lasso procedure. Technical Report (2008)
  13. Huang, C., Cheang, G.L.H., Barron, A.: Risk of penalized least squares, greedy selection and l1 penalization for flexible function libraries. Preprint (2008)
  14. Kim, S.J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D.: An interior-point method for large-scale l1-regularized least squares. IEEE J. Sel. Top. Signal Process. 1, 606–617 (2007)
  15. Knight, K., Fu, W.: Asymptotics for lasso-type estimators. Ann. Stat. 28, 1356–1378 (2000)
  16. Langford, J., Li, L., Zhang, T.: Sparse online learning via truncated gradient. J. Mach. Learn. Res. 10, 777–801 (2009)
  17. Meinshausen, N., Bühlmann, P.: High dimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436–1462 (2006)
  18. Osborne, M., Presnell, B., Turlach, B.: On the LASSO and its dual. J. Comput. Graph. Stat. 9, 319–337 (2000a)
  19. Osborne, M.R., Presnell, B., Turlach, B.A.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20, 389–403 (2000b)
  20. Park, M.Y., Hastie, T.: L1-regularization path algorithm for generalized linear models. J. R. Stat. Soc., Ser. B, Stat. Methodol. 69, 659–677 (2007)
  21. Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35, 1012–1030 (2007)
  22. Santosa, F., Symes, W.W.: Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Stat. Comput. 7, 1307–1330 (1986)
  23. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for ℓ1 regularized loss minimization. In: Proceedings of the 26th International Conference on Machine Learning. Omnipress, Montreal (2009)
  24. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc., Ser. B 58, 267–288 (1996)
  25. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc., Ser. B, Stat. Methodol. 67, 91–108 (2005)
  26. Vapnik, V.: Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley, New York (1998)
  27. Vovk, V.: Asymptotic optimality of transductive confidence machine. In: Algorithmic Learning Theory. Lecture Notes in Comput. Sci., vol. 2533, pp. 336–350. Springer, Berlin (2002a)
  28. Vovk, V.: On-line confidence machines are well-calibrated. In: Proceedings of the Forty-Third Annual Symposium on Foundations of Computer Science, pp. 187–196. IEEE Computer Society, Los Alamitos (2002b)
  29. Vovk, V., Gammerman, A., Saunders, C.: Machine-learning applications of algorithmic randomness. In: Proceedings of the 16th International Conference on Machine Learning, pp. 444–453. ICML (1999)
  30. Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer, New York (2005)
  31. Vovk, V., Nouretdinov, I., Gammerman, A.: On-line predictive linear regression. Technical Report (2007)
  32. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc., Ser. B, Stat. Methodol. 68, 49–67 (2006)
  33. Zhao, P., Yu, B.: On model selection consistency of Lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)
  34. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
  35. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc., Ser. B, Stat. Methodol. 67, 301–320 (2005)
  36. Zou, H., Hastie, T., Tibshirani, R.: On the “Degrees of Freedom” of the lasso. Ann. Stat. 35, 2173–2192 (2007)

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Seminar für Statistik, ETH-Zurich, Zürich, Switzerland
  2. Laboratoire de Probabilités et Modèles Aléatoires, CNRS-UMR 7599, Université Paris 7—Diderot, UFR de Mathématiques, Paris, France
