AStA Advances in Statistical Analysis

Volume 97, Issue 4, pp 349–385

Penalized likelihood and Bayesian function selection in regression models

Original Paper


Abstract

Challenging research questions in various fields have driven a wide range of methodological advances in variable selection for regression models with high-dimensional predictors. By comparison, the selection of nonlinear functions in models with additive predictors has received attention only more recently. Several competing approaches were developed at about the same time, often without reference to one another. This article provides a state-of-the-art review of function selection, focusing on penalized likelihood and Bayesian concepts and relating the various approaches to each other in a unified framework. In an empirical comparison that also includes boosting, we evaluate several methods on simulated and real data, thereby providing some guidance on their performance in practice.


Keywords: Generalized additive model · Regularization · Smoothing · Spike-and-slab priors
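A basic building block behind several of the penalized-likelihood approaches reviewed here is the group-lasso penalty (Yuan and Lin 2006), which penalizes the Euclidean norm of each smooth function's whole spline-coefficient block so that entire functions can be set to zero. As a minimal, self-contained sketch (not any particular package's implementation), the corresponding proximal operator applies blockwise soft-thresholding: blocks with small norm are zeroed, deselecting the function; the rest are shrunken.

```python
import numpy as np

def group_soft_threshold(beta, lam):
    """Proximal operator of the group-lasso penalty lam * ||beta||_2.

    Shrinks the whole coefficient block toward zero, and sets it
    exactly to zero when its Euclidean norm is at most lam, which
    removes the corresponding smooth function from the model.
    """
    norm = np.linalg.norm(beta)
    if norm <= lam:
        return np.zeros_like(beta)
    return (1.0 - lam / norm) * beta

# Two hypothetical spline-coefficient blocks: the strong block survives
# (shrunken by the factor 1 - lam/norm), the weak block is zeroed out.
strong = group_soft_threshold(np.array([3.0, 4.0]), lam=1.0)  # norm 5, scaled by 0.8
weak = group_soft_threshold(np.array([0.3, 0.4]), lam=1.0)    # norm 0.5, set to zero
```

In a full fitting algorithm this operator would be applied block by block within a coordinate-descent or proximal-gradient loop; the sketch only illustrates the selection mechanism itself.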


References

  1. Avalos, M., Grandvalet, Y., Ambroise, C.: Parsimonious additive models. Comput. Stat. Data Anal. 51, 2851–2870 (2007)
  2. Belitz, C., Lang, S.: Simultaneous selection of variables and smoothing parameters in structured additive regression models. Comput. Stat. Data Anal. 53, 61–81 (2008)
  3. Belitz, C., Brezger, A., Kneib, T., Lang, S., Umlauf, N.: BayesX: Software for Bayesian inference in structured additive regression models (2012). Version 2.1
  4. Bühlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22, 477–505 (2007)
  5. Bühlmann, P., Yu, B.: Boosting with the \(l_2\) loss: regression and classification. J. Am. Stat. Assoc. 98, 324–339 (2003)
  6. Cottet, R., Kohn, R.J., Nott, D.J.: Variable selection and model averaging in semiparametric overdispersed generalized linear models. J. Am. Stat. Assoc. 103, 661–671 (2008)
  7. Eaton, J.W., Bateman, D., Hauberg, S.: GNU Octave Manual Version 3. Network Theory Limited (2008)
  8. Eilers, P.H.C., Marx, B.D.: Flexible smoothing using B-splines and penalized likelihood. Stat. Sci. 11, 89–121 (1996)
  9. Eugster, M.A., Hothorn, T. (authors), Frick, H., Kondofersky, I., Kuehnle, O.S., Lindenlaub, C., Pfundstein, G., Speidel, M., Spindler, M., Straub, A., Wickler, F., Zink, K. (contributors): hgam: High-dimensional additive modelling (2010). R package version 0.1-0
  10. Fahrmeir, L., Kneib, T.: Bayesian Smoothing and Regression for Longitudinal, Spatial and Event History Data. Oxford Statistical Science Series 36, Oxford (2011)
  11. Fahrmeir, L., Kneib, T., Konrath, S.: Bayesian regularization in structured additive regression: a unifying perspective on shrinkage, smoothing and predictor selection. Stat. Comput. 20, 203–219 (2010)
  12. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
  13. Frank, A., Asuncion, A.: UCI machine learning repository (2010)
  14. George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993)
  15. George, E.I., McCulloch, R.E.: Approaches for Bayesian variable selection. Statistica Sinica 7, 339–374 (1997)
  16. Griffin, J.E., Brown, P.J.: Alternative prior distributions for variable selection with very many more variables than observations. Technical Report UKC/IMS/05/08, IMS, University of Kent (2005)
  17. Gu, C.: Smoothing Spline ANOVA Models. Springer, Berlin (2002)
  18. Hothorn, T., Bühlmann, P., Kneib, T., Schmid, M., Hofner, B.: mboost: Model-based boosting (2012). R package version 2.1-1
  19. Huang, J., Horowitz, J.L., Wei, F.: Variable selection in nonparametric additive models. Ann. Stat. 38, 2282–2313 (2010)
  20. Ishwaran, H., Rao, J.S.: Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Stat. 33(2), 730–773 (2005)
  21. Kneib, T., Hothorn, T., Tutz, G.: Variable selection and model choice in geoadditive regression models. Biometrics 65, 626–634 (2009)
  22. Kneib, T., Konrath, S., Fahrmeir, L.: High-dimensional structured additive regression models: Bayesian regularisation, smoothing and predictive performance. Appl. Stat. 60, 51–70 (2011)
  23. Konrath, S., Kneib, T., Fahrmeir, L.: Bayesian smoothing, shrinkage and variable selection in hazard regression. In: Becker, C., Fried, R., Kuhnt, S. (eds.) Robustness and Complex Data Structures. Festschrift in Honour of Ursula Gather (2013)
  24. Leng, C., Zhang, H.H.: Model selection in nonparametric hazard regression. Nonparametr. Stat. 18, 417–429 (2006)
  25. Lin, Y., Zhang, H.H.: Component selection and smoothing in multivariate nonparametric regression. Ann. Stat. 34, 2272–2297 (2006)
  26. Marra, G., Wood, S.: Practical variable selection for generalized additive models. Comput. Stat. Data Anal. 55, 2372–2387 (2011)
  27. MATLAB: MATLAB version 7.10.0 (R2010a). The MathWorks Inc., Natick, Massachusetts (2010)
  28. Meier, L.: grplasso: Fitting user-specified models with group lasso penalty (2009). R package version 0.4-2
  29. Meier, L., van de Geer, S., Bühlmann, P.: The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B 70, 53–71 (2008)
  30. Meier, L., van de Geer, S., Bühlmann, P.: High-dimensional additive modeling. Ann. Stat. 37, 3779–3821 (2009)
  31. O’Hara, R.B., Sillanpää, M.J.: A review of Bayesian variable selection methods: what, how, and which? Bayesian Anal. 4, 85–118 (2009)
  32. Panagiotelis, A., Smith, M.: Bayesian identification, selection and estimation of semiparametric functions in high-dimensional additive models. J. Econom. 143, 291–316 (2008)
  33. Park, T., Casella, G.: The Bayesian lasso. J. Am. Stat. Assoc. 103, 681–686 (2008)
  34. Polson, N.G., Scott, J.G.: Local shrinkage rules, Lévy processes and regularized regression. J. R. Stat. Soc. Ser. B 74(2), 287–311 (2012)
  35. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2011)
  36. Radchenko, P., James, G.M.: Variable selection using adaptive nonlinear interaction structures in high dimensions. J. Am. Stat. Assoc. 105, 1–13 (2010)
  37. Ravikumar, P., Liu, H., Lafferty, J., Wasserman, L.: Sparse additive models. J. R. Stat. Soc. Ser. B 71, 1009–1030 (2009)
  38. Reich, B.J., Storlie, C.B., Bondell, H.D.: Variable selection in Bayesian smoothing spline ANOVA models: application to deterministic computer codes. Technometrics 51, 110 (2009)
  39. Rue, H., Held, L.: Gaussian Markov Random Fields. Chapman & Hall/CRC (2005)
  40. Sabanés Bové, D.: hypergsplines: Bayesian model selection with penalised splines and hyper-g prior (2012). R package version 0.0-32
  41. Sabanés Bové, D., Held, L., Kauermann, G.: Mixtures of g-priors for generalised additive model selection with penalised splines. Technical report, University of Zurich and University Bielefeld (2011)
  42. Scheipl, F.: Bayesian regularization and model choice in structured additive regression. PhD thesis, Ludwig-Maximilians-Universität München (2011a)
  43. Scheipl, F.: spikeSlabGAM: Bayesian variable selection, model choice and regularization for generalized additive mixed models in R. J. Stat. Softw. 43(14), 1–24 (2011b)
  44. Scheipl, F., Fahrmeir, L., Kneib, T.: Spike-and-slab priors for function selection in structured additive regression models. J. Am. Stat. Assoc. 107(500), 1518–1532 (2012)
  45. Smith, M., Kohn, R.: Nonparametric regression using Bayesian variable selection. J. Econom. 75, 317–344 (1996)
  46. Storlie, C., Bondell, H., Reich, B., Zhang, H.H.: Surface estimation, variable selection, and the nonparametric oracle property. Statistica Sinica 21(2), 679–705 (2011)
  47. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
  48. Tutz, G., Binder, H.: Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics 62, 961–971 (2006)
  49. Umlauf, N., Kneib, T., Lang, S.: R2BayesX: Estimate structured additive regression models with BayesX (2012). R package version 0.1-1
  50. Wahba, G.: Spline Models for Observational Data. SIAM (1990)
  51. Wang, L., Chen, G., Li, H.: Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23, 1486–1494 (2007)
  52. Wood, S.: mgcv: GAMs with GCV/AIC/REML smoothness estimation and GAMMs by PQL (2012). R package version 1.7-18
  53. Wood, S., Kohn, R., Shively, T., Jiang, W.: Model selection in spline nonparametric regression. J. R. Stat. Soc. Ser. B 64, 119–139 (2002)
  54. Xue, L.: Consistent variable selection in additive models. Statistica Sinica 19, 1281–1296 (2009)
  55. Yau, P., Kohn, R., Wood, S.: Bayesian variable selection and model averaging in high-dimensional multinomial nonparametric regression. J. Comput. Graph. Stat. 12, 23–54 (2003)
  56. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68, 49–67 (2006)
  57. Zhang, H.H., Cheng, G., Liu, Y.: Linear or nonlinear? Automatic structure discovery for partially linear models. J. Am. Stat. Assoc. 106(495), 1099–1112 (2011)
  58. Zhang, H.H., Lin, Y.: Component selection and smoothing for nonparametric regression in exponential families. Statistica Sinica 16, 1021–1041 (2006)
  59. Zou, H.: The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Institute of Statistics, Ludwig-Maximilians-University München, Munich, Germany
  2. Chair of Statistics, Georg-August-University Göttingen, Göttingen, Germany
