Gradient boosting for distributional regression: faster tuning and improved variable selection via noncyclical updates
We present a new algorithm for boosting generalized additive models for location, scale and shape (GAMLSS) that allows to incorporate stability selection, an increasingly popular way to obtain stable sets of covariates while controlling the per-family error rate. The model is fitted repeatedly to subsampled data, and variables with high selection frequencies are extracted. To apply stability selection to boosted GAMLSS, we develop a new “noncyclical” fitting algorithm that incorporates an additional selection step of the best-fitting distribution parameter in each iteration. This new algorithm has the additional advantage that optimizing the tuning parameters of boosting is reduced from a multi-dimensional to a one-dimensional problem with vastly decreased complexity. The performance of the novel algorithm is evaluated in an extensive simulation study. We apply this new algorithm to a study to estimate abundance of common eider in Massachusetts, USA, featuring excess zeros, overdispersion, nonlinearity and spatiotemporal structures. Eider abundance is estimated via boosted GAMLSS, allowing both mean and overdispersion to be regressed on covariates. Stability selection is used to obtain a sparse set of stable predictors.
KeywordsBoosting Additive models GAMLSS GamboostLSS Stability selection
We thank Mass Audubon for the use of common eider abundance data.
- Anderson, D.R., Burnham, K.P.: Avoiding pitfalls when using information-theoretic methods. J. Wildl. Manag. 912–918 (2002)Google Scholar
- Dormann, C.F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carre, G., Marquez, J.R.G., Gruber, B., Lafourcade, B., Leitao, P.J., Münkemüller, T., McClean, C., Osborne, P.E., Reineking, B., Schröder, B., Skidmore, A.K., Zurell, D., Lautenbach, S.: Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography 36, 27–46 (2013)CrossRefGoogle Scholar
- Flack, V.F., Chang, P.C.: Frequency of selecting noise variables in subset regression analysis: a simulation study. Am. Stat. 41(1), 84–86 (1987)Google Scholar
- Hofner, B., Hothorn, T.: stabs: stability selection with error control (2017). http://CRAN.R-project.org/package=stabs. R package version 0.6-2
- Hofner, B., Mayr, A., Fenske, N., Thomas, J., Schmid, M.: gamboostLSS: boosting methods for GAMLSS models (2017). http://CRAN.R-project.org/package=gamboostLSS. R package version 2.0-0
- Hothorn, T., Buehlmann, P., Kneib, T., Schmid, T., Hofner, B.: mboost: model-based boosting (2017). http://CRAN.R-project.org/package=mboost. R package version 2.8-0
- Li, P.: Robust logitboost and adaptive base class (abc) logitboost (2012). arXiv preprint arXiv:1203.3491
- Mayr, A., Binder, H., Gefeller, O., Schmid, M., et al.: Extending statistical boosting. Methods Inf. Med. 53(6), 428–435 (2014)Google Scholar
- Mayr, A., Hofner, B., Schmid, M.: Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection. BMC Bioinf. 17(1), 288 (2016)Google Scholar
- Opelt, A., Fussenegger, M., Pinz, A., Auer, P.: Weak hypotheses and boosting for generic object detection and recognition. In: European Conference on Computer Vision, pp. 71–84. Springer (2004)Google Scholar
- Rigby, R.A., Stasinopoulos, D.M., Akantziliotou, C.: Instructions on how to use the gamlss package in R (2008). http://www.gamlss.org/wp-content/uploads/2013/01/gamlss-manual.pdf
- Smith, A.D., Hofner, B., Osenkowski, J.E., Allison, T., Sadoti, G., McWilliams, S.R., Paton, P.W.C.: Spatiotemporal modelling of sea duck abundance: implications for marine spatial planning (2017). arXiv preprint arXiv:1705.00644