Bootstrap Aggregating and Random Forest

  • Tae-Hwy Lee
  • Aman Ullah
  • Ran Wang
Part of the Advanced Studies in Theoretical and Applied Econometrics book series (ASTA, volume 52)


Bootstrap Aggregating (Bagging) is an ensemble technique for improving the robustness of forecasts. Random Forest is a successful method that combines Bagging with decision trees. In this chapter, we explore Bagging, Random Forest, and their variants in various aspects of theory and practice, and we discuss applications of these methods to economic forecasting and inference.
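To make the idea concrete, the following is a minimal sketch of Bagging with a regression stump (a one-split decision tree) as the base learner. It is an illustration of the general technique, not the chapter's specific estimators: the base learner is fit on B bootstrap resamples of the training data, and the bagged forecast is the average of the B individual forecasts. All function names here (`fit_stump`, `bagged_predictor`) are hypothetical.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Fit a one-split regression stump (illustrative base learner):
    predict the mean of y on each side of the best threshold on x."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                      # degenerate resample: constant fit
        m = statistics.mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagged_predictor(xs, ys, n_boot=50, seed=0):
    """Bagging: refit the base learner on n_boot bootstrap resamples
    (drawn with replacement) and average the resulting forecasts."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        stumps.append(fit_stump([xs[i] for i in idx],
                                [ys[i] for i in idx]))
    return lambda x: statistics.mean(s(x) for s in stumps)
```

Averaging over resamples smooths the hard threshold of a single stump, which is the variance-reduction mechanism that makes Bagging effective for unstable base learners; Random Forest additionally randomizes the candidate split variables within each tree.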



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Economics, University of California, Riverside, USA
