Abstract
We use Bayesian model selection paradigms, such as group least absolute shrinkage and selection operator (group LASSO) priors, to facilitate generalized additive model selection. Our approach allows the effect of each continuous predictor to be categorized as zero, linear or non-linear. Carefully tailored auxiliary variables lead to Gibbsian Markov chain Monte Carlo schemes for practical implementation of the approach. In addition, mean field variational algorithms with closed-form updates are obtained. Whilst not as accurate as the Markov chain Monte Carlo schemes, this fast variational option enhances scalability to very large data sets. A package in the R language, gamselBayes, aids use in practice.
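The three-way categorization of effects can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that a sampler has already produced 0/1 posterior draws of two hypothetical inclusion indicators per predictor (one for the linear component, one for the non-linear component) and that an effect is labelled by comparing posterior inclusion probabilities with an assumed threshold of 0.5. The function names, the indicator layout and the threshold are all illustrative assumptions.

```python
def column_means(draws):
    """Average each column of a list of equal-length rows."""
    n_draws = len(draws)
    return [sum(column) / n_draws for column in zip(*draws)]


def classify_effects(gamma_linear, gamma_nonlinear, threshold=0.5):
    """Label each predictor's effect as 'zero', 'linear' or 'non-linear'.

    gamma_linear, gamma_nonlinear: lists of rows, one row per posterior draw
    and one column per predictor, holding 0/1 values of hypothetical
    inclusion indicators (an assumption made for this sketch).
    """
    prob_linear = column_means(gamma_linear)
    prob_nonlinear = column_means(gamma_nonlinear)
    labels = []
    for p_lin, p_nonlin in zip(prob_linear, prob_nonlinear):
        if p_nonlin > threshold:
            labels.append("non-linear")   # non-linear component retained
        elif p_lin > threshold:
            labels.append("linear")       # only the linear component retained
        else:
            labels.append("zero")         # predictor dropped from the model
    return labels


# Made-up indicator draws for three predictors (four draws each).
gamma_linear = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
gamma_nonlinear = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0]]
labels = classify_effects(gamma_linear, gamma_nonlinear)
```

In this toy input the first predictor has a non-linear inclusion probability of 0.75, the second has a linear inclusion probability of 0.75 but a non-linear one of 0.25, and the third falls below the threshold on both, so the three labels differ.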
Acknowledgements
We are grateful to two reviewers for their comments that led to improvements. This research was supported by Australian Research Council grant DP180100597.
Funding
Australian Research Council (DP180100597).
Ethics declarations
Conflict of interest
The authors report that there are no competing interests to declare.
About this article
Cite this article
He, V.X., Wand, M.P. Bayesian generalized additive model selection including a fast variational option. AStA Adv Stat Anal (2023). https://doi.org/10.1007/s10182-023-00490-y
DOI: https://doi.org/10.1007/s10182-023-00490-y