Skip to main content

Gradient boosting for distributional regression: faster tuning and improved variable selection via noncyclical updates

Abstract

We present a new algorithm for boosting generalized additive models for location, scale and shape (GAMLSS) that allows to incorporate stability selection, an increasingly popular way to obtain stable sets of covariates while controlling the per-family error rate. The model is fitted repeatedly to subsampled data, and variables with high selection frequencies are extracted. To apply stability selection to boosted GAMLSS, we develop a new “noncyclical” fitting algorithm that incorporates an additional selection step of the best-fitting distribution parameter in each iteration. This new algorithm has the additional advantage that optimizing the tuning parameters of boosting is reduced from a multi-dimensional to a one-dimensional problem with vastly decreased complexity. The performance of the novel algorithm is evaluated in an extensive simulation study. We apply this new algorithm to a study to estimate abundance of common eider in Massachusetts, USA, featuring excess zeros, overdispersion, nonlinearity and spatiotemporal structures. Eider abundance is estimated via boosted GAMLSS, allowing both mean and overdispersion to be regressed on covariates. Stability selection is used to obtain a sparse set of stable predictors.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

References

  • Aho, K., Derryberry, D.W., Peterson, T.: Model selection for ecologists: the worldviews of AIC and BIC. Ecology 95, 631–636 (2014)

    Article  Google Scholar 

  • Anderson, D.R., Burnham, K.P.: Avoiding pitfalls when using information-theoretic methods. J. Wildl. Manag. 912–918 (2002)

  • Bühlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22, 477–505 (2007)

    MathSciNet  Article  MATH  Google Scholar 

  • Bühlmann, P., Hothorn, T.: Twin boosting: improved feature selection and prediction. Stat. Comput. 20, 119–138 (2010)

    MathSciNet  Article  Google Scholar 

  • Bühlmann, P., Yu, B.: Boosting with the L\(_2\) loss: regression and classification. J. Am. Stat. Assoc. 98, 324–339 (2003)

    Article  MATH  Google Scholar 

  • Bühlmann, P., Yu, B.: Sparse boosting. J. Mach. Learn. Res. 7, 1001–1024 (2006)

    MathSciNet  MATH  Google Scholar 

  • Bühlmann, P., Gertheiss, J., Hieke, S., Kneib, T., Ma, S., Schumacher, M., Tutz, G., Wang, C., Wang, Z., Ziegler, A., et al.: Discussion of “the evolution of boosting algorithms” and “extending statistical boosting”. Methods Inf. Med. 53(6), 436–445 (2014)

    Article  Google Scholar 

  • Dormann, C.F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carre, G., Marquez, J.R.G., Gruber, B., Lafourcade, B., Leitao, P.J., Münkemüller, T., McClean, C., Osborne, P.E., Reineking, B., Schröder, B., Skidmore, A.K., Zurell, D., Lautenbach, S.: Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography 36, 27–46 (2013)

    Article  Google Scholar 

  • Flack, V.F., Chang, P.C.: Frequency of selecting noise variables in subset regression analysis: a simulation study. Am. Stat. 41(1), 84–86 (1987)

    Google Scholar 

  • Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 28(2), 337–407 (2000)

    Article  MATH  Google Scholar 

  • Hastie, T.J., Tibshirani, R.J.: Generalized Additive Models, vol. 43. CRC Press, Boca Raton (1990)

    MATH  Google Scholar 

  • Hofner, B., Boccuto, L., Göker, M.: Controlling false discoveries in high-dimensional situations: boosting with stability selection. BMC Bioinf. 16(1), 144 (2015)

    Article  Google Scholar 

  • Hofner, B., Hothorn, T.: stabs: stability selection with error control (2017). http://CRAN.R-project.org/package=stabs. R package version 0.6-2

  • Hofner, B., Hothorn, T., Kneib, T., Schmid, M.: A framework for unbiased model selection based on boosting. J. Comput. Gr. Stat. 20, 956–971 (2011)

    MathSciNet  Article  Google Scholar 

  • Hofner, B., Mayr, A., Fenske, N., Thomas, J., Schmid, M.: gamboostLSS: boosting methods for GAMLSS models (2017). http://CRAN.R-project.org/package=gamboostLSS. R package version 2.0-0

  • Hofner, B., Mayr, A., Robinzonov, N., Schmid, M.: Model-based boosting in R—A hands-on tutorial using the R package mboost. Comput. Stat. 29, 3–35 (2014)

    MathSciNet  Article  MATH  Google Scholar 

  • Hofner, B., Mayr, A., Schmid, M.: gamboostLSS: an R package for model building and variable selection in the GAMLSS framework. J. Stat. Softw. 74(1), 1–31 (2016)

    Article  Google Scholar 

  • Hothorn, T., Buehlmann, P., Kneib, T., Schmid, M., Hofner, B.: Model-based boosting 2.0. J. Mach. Learn. Res. 11, 2109–2113 (2010)

    MathSciNet  MATH  Google Scholar 

  • Hothorn, T., Buehlmann, P., Kneib, T., Schmid, T., Hofner, B.: mboost: model-based boosting (2017). http://CRAN.R-project.org/package=mboost. R package version 2.8-0

  • Hothorn, T., Müller, J., Schröder, B., Kneib, T., Brandl, R.: Decomposing environmental, spatial, and spatiotemporal components of species distributions. Ecol. Monogr. 81, 329–347 (2011)

    Article  Google Scholar 

  • Huang, S.M.Y., Huang, J., Fang, K.: Gene network-based cancer prognosis analysis with sparse boosting. Genet. Res. 94, 205–221 (2012)

    Article  Google Scholar 

  • Li, P.: Robust logitboost and adaptive base class (abc) logitboost (2012). arXiv preprint arXiv:1203.3491

  • Mayr, A., Binder, H., Gefeller, O., Schmid, M., et al.: The evolution of boosting algorithms. Methods Inf. Med. 53(6), 419–427 (2014)

    Article  Google Scholar 

  • Mayr, A., Binder, H., Gefeller, O., Schmid, M., et al.: Extending statistical boosting. Methods Inf. Med. 53(6), 428–435 (2014)

  • Mayr, A., Fenske, N., Hofner, B., Kneib, T., Schmid, M.: Generalized additive models for location, scale and shape for high-dimensional data—a flexible approach based on boosting. J. R. Stat. Soc. Ser. C Appl. Stat. 61(3), 403–427 (2012)

    MathSciNet  Article  Google Scholar 

  • Mayr, A., Hofner, B., Schmid, M.: Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection. BMC Bioinf. 17(1), 288 (2016)

  • Mayr, A., Hofner, B., Schmid, M., et al.: The importance of knowing when to stop. Methods Inf. Med. 51(2), 178–186 (2012)

    Article  Google Scholar 

  • Meinshausen, N., Bühlmann, P.: Stability selection. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 72(4), 417–473 (2010)

    MathSciNet  Article  Google Scholar 

  • Messner, J.W., Mayr, G.J., Zeileis, A.: Nonhomogeneous boosting for predictor selection in ensemble postprocessing. Mon. Weather Rev. 145(1), 137–147 (2017). doi:10.1175/MWR-D-16-0088.1

    Article  Google Scholar 

  • Mullahy, J.: Specification and testing of some modified count data models. J. Econom. 33(3), 341–365 (1986)

    MathSciNet  Article  Google Scholar 

  • Murtaugh, P.A.: Performance of several variable-selection methods applied to real ecological data. Ecol. Lett. 12, 1061–1068 (2009)

    Article  Google Scholar 

  • Opelt, A., Fussenegger, M., Pinz, A., Auer, P.: Weak hypotheses and boosting for generic object detection and recognition. In: European Conference on Computer Vision, pp. 71–84. Springer (2004)

  • Osorio, J.D.G., Galiano, S.G.G.: Non-stationary analysis of dry spells in monsoon season of Senegal River Basin using data from regional climate models (RCMs). J. Hydrol. 450–451, 82–92 (2012)

    Article  Google Scholar 

  • Rigby, R.A., Stasinopoulos, D.M.: Generalized additive models for location, scale and shape. J. R. Stat. Soc. Ser. C (Appl. Stat.) 54(3), 507–554 (2005)

    MathSciNet  Article  MATH  Google Scholar 

  • Rigby, R.A., Stasinopoulos, D.M., Akantziliotou, C.: Instructions on how to use the gamlss package in R (2008). http://www.gamlss.org/wp-content/uploads/2013/01/gamlss-manual.pdf

  • Schmid, M., Hothorn, T.: Boosting additive models using component-wise P-splines. Comput. Stat. Data Anal. 53(2), 298–311 (2008)

    MathSciNet  Article  MATH  Google Scholar 

  • Schmid, M., Potapov, S., Pfahlberg, A., Hothorn, T.: Estimation and regularization techniques for regression models with multidimensional prediction functions. Stat. Comput. 20(2), 139–150 (2010)

    MathSciNet  Article  Google Scholar 

  • Shah, R.D., Samworth, R.J.: Variable selection with error control: Another look at stability selection. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 75(1), 55–80 (2013)

    MathSciNet  Article  Google Scholar 

  • Smith, A.D., Hofner, B., Osenkowski, J.E., Allison, T., Sadoti, G., McWilliams, S.R., Paton, P.W.C.: Spatiotemporal modelling of sea duck abundance: implications for marine spatial planning (2017). arXiv preprint arXiv:1705.00644

Download references

Acknowledgements

We thank Mass Audubon for the use of common eider abundance data.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Janek Thomas.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 273662 KB)

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Thomas, J., Mayr, A., Bischl, B. et al. Gradient boosting for distributional regression: faster tuning and improved variable selection via noncyclical updates. Stat Comput 28, 673–687 (2018). https://doi.org/10.1007/s11222-017-9754-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11222-017-9754-6

Keywords

  • Boosting
  • Additive models
  • GAMLSS
  • GamboostLSS
  • Stability selection