Comments on: A random forest guided tour


We discuss future challenges in developing statistical theory for random forests. In particular, we suggest that an analysis of bias and extrapolation is vital to understanding the statistical properties of variable importance measures. We further point to the incorporation of random forests within larger statistical models as an important tool for high-dimensional statistical inference.





This work was supported by NSF grants DMS-103252 and DEB-1353039.

Author information



Corresponding author

Correspondence to Giles Hooker.

Additional information

This comment refers to the invited paper available at: doi:10.1007/s11749-016-0481-7.

About this article


Cite this article

Hooker, G., Mentch, L. Comments on: A random forest guided tour. TEST 25, 254–260 (2016).

Keywords


  • Random forests
  • Machine learning
  • Extrapolation
  • Variable importance

Mathematics Subject Classification

  • 62G09