Bootstrap bias corrections for ensemble methods
This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods. Accounting for bias is an important obstacle in recent efforts to develop statistical inference for machine learning. We demonstrate empirically that the proposed bootstrap bias correction can lead to substantial improvements in both bias and predictive accuracy. In the context of ensembles of trees, we show that this correction can be approximated at only double the cost of training the original ensemble. Our method is shown to improve test set accuracy over random forests by up to 70% on example problems from the UCI repository.
KeywordsBagging Ensemble methods Bias correction Bootstrap
Supported by NSF grants DMS 1053252 and DEB 1353039.
- Brooks, T.F., Pope, D.S., Marcolini, M.A.: Airfoil Self-Noise and Prediction, vol. 1218. National Aeronautics and Space Administration, Office of Management, Scientific and Technical Information Division (1989)Google Scholar
- Cortez, P., Morais, A.: A data mining approach to predict forest fires using meteorological data. In: Neves, J., Santos, M.F., Machado, J. (eds.) New Trends in Artificial Intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, pp. 512–523. APPIA, Guimaraes (2007)Google Scholar
- Fanaee-T, H., Gama, J.: Event labeling combining ensemble detectors and background knowledge. Prog. Artif. Intell. (2013). doi: 10.1007/s13748-013-0040-3
- Gerritsma, J., Onnink, R., Versluis, A.: Geometry, Resistance and Stability of the Delft Systematic Yacht Hull Series. Delft University of Technology, Amsterdam (1981)Google Scholar
- Liaw, A., Wiener, M.: Classification and regression by randomforest. R News 2(3), 18–22 (2002). http://CRAN.R-project.org/doc/Rnews/
- Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
- Mentch L, Hooker G (2016a) Formal hypothesis tests for additive structure in random forests. J. Comput. Gr. Stat. (In Press)Google Scholar
- Quinlan, J.R.: Combining instance-based and model-based learning. In: Proceedings of the Tenth International Conference on Machine Learning, pp. 236–243 (1993)Google Scholar
- Scornet, E.: On the asymptotics of random forests (2014). arXiv:1409.2090
- Tüfekci, P.: Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. Int. J. Electr. Power Energy Syst. 60, 126–140 (2014)Google Scholar
- Wager, S.: Asymptotic theory for random forests (2014). arXiv:1405.0352