# Bootstrap bias corrections for ensemble methods

## Abstract

This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods. Accounting for bias is an important challenge in recent efforts to develop statistical inference for machine learning. We demonstrate empirically that the proposed bootstrap bias correction can substantially reduce bias and improve predictive accuracy. For ensembles of trees, we show that this correction can be approximated at only double the cost of training the original ensemble. On example problems from the UCI repository, our method improves test-set accuracy over random forests by up to 70%.
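The residual-bootstrap correction summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the learner is a hypothetical one-dimensional k-nearest-neighbour smoother standing in for a tree ensemble, residuals are centred before resampling (a common convention, assumed here), and the names `knn_fit`, `residual_bootstrap_bias_correct` and `n_boot` are illustrative. New responses are generated as `y* = f_hat(X) + r*` with `r*` resampled from the residuals, the learner is refit to each `y*`, and the estimated bias `mean(f_hat*) - f_hat` is subtracted from the original prediction, giving `2 * f_hat - mean(f_hat*)`.

```python
import numpy as np


def knn_fit(X, y, k=10):
    """Hypothetical stand-in learner: 1-D k-nearest-neighbour regression.

    Returns a prediction function; any fit(X, y) -> predict callable works.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def predict(X_new):
        X_new = np.asarray(X_new, dtype=float)
        # Absolute distance from each query point to every training point.
        dist = np.abs(X_new[:, None] - X[None, :])
        # Indices of the k closest training points for each query.
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)

    return predict


def residual_bootstrap_bias_correct(fit, X, y, X_test, n_boot=50, seed=0):
    """Residual-bootstrap bias correction for a generic regression learner.

    Returns (corrected predictions, estimated bias) at X_test.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    f_hat = fit(X, y)
    fitted = f_hat(X)
    # Residuals from the original fit, centred (an assumed convention).
    resid = y - fitted
    resid = resid - resid.mean()

    pred_test = f_hat(X_test)
    boot_preds = np.empty((n_boot, len(pred_test)))
    for b in range(n_boot):
        # Treat f_hat as the "true" function and add resampled residuals.
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        boot_preds[b] = fit(X, y_star)(X_test)

    # Bootstrap estimate of bias: average refit minus the original fit.
    bias_hat = boot_preds.mean(axis=0) - pred_test
    # Subtracting it gives 2 * f_hat - mean(f_hat*).
    return pred_test - bias_hat, bias_hat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.sort(rng.uniform(0.0, 1.0, 200))
    y = X**2 + rng.normal(0.0, 0.05, size=200)
    X_test = np.linspace(0.0, 1.0, 11)
    corrected, bias = residual_bootstrap_bias_correct(
        lambda Xb, yb: knn_fit(Xb, yb, k=10), X, y, X_test, n_boot=100
    )
    print("estimated bias:", np.round(bias, 3))
    print("max |corrected - truth|:", np.abs(corrected - X_test**2).max())
```

Because every refit is a full training run, this generic version costs `n_boot + 1` fits. The abstract reports that for tree ensembles the correction can be approximated at roughly double the cost of the original ensemble; this sketch does not attempt to reproduce that shortcut.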

## Keywords

Bagging, Ensemble methods, Bias correction, Bootstrap

## Notes

### Acknowledgements

Supported by NSF grants DMS 1053252 and DEB 1353039.

## Supplementary material

11222_2016_9717_MOESM1_ESM.r (6 kb)

11222_2016_9717_MOESM2_ESM.r (6 kb)

11222_2016_9717_MOESM3_ESM.r (6 kb)

11222_2016_9717_MOESM4_ESM.r (5 kb)

11222_2016_9717_MOESM5_ESM.r (6 kb)

11222_2016_9717_MOESM6_ESM.r (5 kb)

11222_2016_9717_MOESM8_ESM.r (10 kb)

## Copyright information

© Springer Science+Business Media New York 2016