Volume 25, Issue 2, pp 254–260

Comments on: A random forest guided tour

  • Giles Hooker
  • Lucas Mentch


We discuss future challenges in developing statistical theory for random forests. In particular, we suggest that an analysis of bias and extrapolation is vital to understanding the statistical properties of variable importance measures. We further point to the incorporation of random forests within larger statistical models as an important tool for high-dimensional statistical inference.


Random forests, Machine learning, Extrapolation, Variable importance



This work was supported by NSF grants DMS-103252 and DEB-1353039.



Copyright information

© Sociedad de Estadística e Investigación Operativa 2016

Authors and Affiliations

  1. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA
  2. Department of Statistics, University of Pittsburgh, Pittsburgh, USA
