Inference for \(L_2\)-Boosting

Abstract

We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD), also known as \(L_2\)-Boosting, under a normality assumption for the model errors. The CFGD is one of the most versatile tools for data analysis, because it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models and incorporates built-in variable selection. Due to this variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component to update, necessitates adaptations of and extensions to existing approaches. We propose tests and confidence intervals for linear, grouped and penalized additive model components selected by \(L_2\)-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques that restrict the response space to more complex sets than polyhedra. We apply our framework to an additive model for the sales prices of residential apartments and investigate the properties of our concepts in simulation studies.
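
To make the algorithm concrete, the following is a minimal sketch of component-wise \(L_2\)-Boosting (CFGD) with simple linear base-learners. All names (`l2_boost`, `mstop`, `nu`) are illustrative assumptions for this sketch and not the authors' implementation; it only shows the generic selection-and-update mechanism referred to in the abstract.

```python
import numpy as np

def l2_boost(X, y, mstop=100, nu=0.1):
    """Component-wise L2-Boosting (CFGD) with simple linear base-learners.

    Assumes the columns of X are centered. In each iteration the current
    residuals (the negative gradient of the L2 loss) are fitted by every
    single covariate separately; only the best-fitting component is updated
    by a small step nu, so the same component may be selected repeatedly.
    """
    n, p = X.shape
    beta = np.zeros(p)
    offset = y.mean()                       # constant offset as starting fit
    f = np.full(n, offset)                  # current model fit
    selected = []
    for _ in range(mstop):
        u = y - f                           # negative gradient of the L2 loss
        # univariate least-squares coefficient of u on each covariate
        coefs = np.array([X[:, j] @ u / (X[:, j] @ X[:, j]) for j in range(p)])
        rss = np.array([np.sum((u - coefs[j] * X[:, j]) ** 2) for j in range(p)])
        j_star = int(np.argmin(rss))        # best-fitting component
        beta[j_star] += nu * coefs[j_star]  # weak (slow-learning) update
        f += nu * coefs[j_star] * X[:, j_star]
        selected.append(j_star)
    return offset, beta, selected
```

With a small step length `nu` and a data-driven stopping iteration `mstop`, informative components are typically selected many times along the path, which is the slow-learning, repeated-selection behaviour that motivates the adaptations to existing post-selection inference approaches described above.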


Author information

Corresponding author

Correspondence to David Rügamer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 388 KB)


About this article

Cite this article

Rügamer, D., Greven, S. Inference for \(L_2\)-Boosting. Stat Comput 30, 279–289 (2020). https://doi.org/10.1007/s11222-019-09882-0

Keywords

  • Bootstrap
  • Functional gradient descent boosting
  • Post-selection inference
  • Selective inference
  • Slow learner