Statistics and Computing, Volume 22, Issue 5, pp 1069–1084

The predictive Lasso

  • Minh-Ngoc Tran
  • David J. Nott
  • Chenlei Leng


We propose a shrinkage procedure for simultaneous variable selection and estimation in generalized linear models (GLMs) with an explicit predictive motivation. The procedure estimates the coefficients by minimizing the Kullback-Leibler divergence of a set of predictive distributions to the corresponding predictive distributions for the full model, subject to an l1 constraint on the coefficient vector. This results in selection of a parsimonious model with similar predictive performance to the full model. Thanks to its similar form to the original Lasso problem for GLMs, our procedure can benefit from available l1-regularization path algorithms. Simulation studies and real data examples confirm the efficiency of our method in terms of predictive performance on future observations.
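
As a rough illustration of the idea, consider the simplest special case: a Gaussian linear model with known error variance, with plug-in predictive distributions evaluated at the observed design points. These are assumptions made here for illustration only, and the paper's construction for general GLMs may differ. In this case the Kullback-Leibler divergence between N(x_i'b_full, s2) and N(x_i'b, s2) equals (x_i'b_full - x_i'b)^2 / (2 s2), so the l1-penalized problem reduces to an ordinary Lasso in which the response is replaced by the fitted values of the full model. The function names `lasso_cd` and `predictive_lasso_gaussian`, the penalty value, and the simulated data are all hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * sum_i (y_i - x_i'b)^2 + lam * sum_j |b_j|
    by cyclic coordinate descent (no intercept, for brevity)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual that
            # excludes column j's own current contribution.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


def predictive_lasso_gaussian(X, y, lam):
    """Sketch of the predictive idea in the Gaussian special case (assumption)."""
    # Full-model fit: with lam = 0 coordinate descent converges to least squares
    # when the design matrix has full column rank.
    beta_full = lasso_cd(X, y, 0.0)
    fitted = [sum(x * b for x, b in zip(row, beta_full)) for row in X]
    # Penalized KL to the full model's predictive distributions becomes a
    # Lasso with the full-model fitted values as the response.
    beta_sparse = lasso_cd(X, fitted, lam)
    return beta_full, beta_sparse


# Simulated data (hypothetical): two active predictors, four noise predictors.
random.seed(0)
n, p = 200, 6
true_beta = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0.0, 0.5) for row in X]

beta_full, beta_sparse = predictive_lasso_gaussian(X, y, lam=0.3)
print("full model:      ", [round(b, 3) for b in beta_full])
print("predictive lasso:", [round(b, 3) for b in beta_sparse])
```

Because the second step is an ordinary Lasso on a modified response, any existing l1-regularization path solver could stand in for the hand-written coordinate descent used here, which is the computational point made in the abstract.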


Generalized linear models; Kullback-Leibler divergence; Lasso; Optimal prediction; Variable selection





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
  2. Australian School of Business, University of New South Wales, Sydney, Australia
