
TEST, Volume 25, Issue 2, pp 197–227

A random forest guided tour

  • Gérard Biau
  • Erwan Scornet
Invited Paper

Abstract

The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
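
As a concrete illustration of the averaging principle described above, the following minimal sketch grows a Breiman-style regression forest: each tree is fit on a bootstrap sample of the data and considers only a random subset of variables at each split, and the forest prediction is the average of the tree predictions. This is an illustrative reconstruction, not the authors' code; it assumes scikit-learn's DecisionTreeRegressor as the base learner, and the parameter choices (100 trees, square-root feature sampling) are conventional defaults rather than recommendations.

    # Illustrative sketch of a random forest regressor, assuming scikit-learn.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_forest(X, y, n_trees=100, max_features="sqrt", seed=0):
        # Grow n_trees CART-style trees; each tree is trained on a bootstrap
        # sample and tries only a random subset of features at every split.
        rng = np.random.default_rng(seed)
        n = len(X)
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, n, size=n)  # bootstrap resample (with replacement)
            tree = DecisionTreeRegressor(
                max_features=max_features,  # feature randomization per split
                random_state=int(rng.integers(2**31)),
            )
            trees.append(tree.fit(X[idx], y[idx]))
        return trees

    def predict_forest(trees, X):
        # Aggregate the randomized trees by averaging their predictions.
        return np.mean([t.predict(X) for t in trees], axis=0)

    # Toy usage: a regression problem with more variables (p) than
    # observations (n), the regime highlighted in the abstract.
    rng = np.random.default_rng(1)
    n, p = 50, 200
    X = rng.normal(size=(n, p))
    y = X[:, 0] + 0.1 * rng.normal(size=n)
    forest = fit_forest(X, y)
    print(predict_forest(forest, X[:5]))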

Keywords

Random forests · Randomization · Resampling · Parameter tuning · Variable importance

Mathematics Subject Classification

62G02 

Acknowledgments

We thank the Editors and three anonymous referees for valuable comments and insightful suggestions.

References

  1. Amaratunga D, Cabrera J, Lee Y-S (2008) Enriched random forests. Bioinformatics 24:2010–2014
  2. Amit Y, Geman D (1997) Shape quantization and recognition with randomized trees. Neural Comput 9:1545–1588
  3. Archer KJ, Kimes RV (2008) Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 52:2249–2260
  4. Arlot S, Genuer R (2014) Analysis of purely random forests bias. arXiv:1407.3939
  5. Auret L, Aldrich C (2011) Empirical comparison of tree ensemble variable importance measures. Chemom Intell Lab Syst 105:157–170
  6. Bai Z-H, Devroye L, Hwang H-K, Tsai T-H (2005) Maxima in hypercubes. Random Struct Algorithms 27:290–309
  7. Banerjee M, McKeague IW (2007) Confidence sets for split points in decision trees. Ann Stat 35:543–574
  8. Barndorff-Nielsen O, Sobel M (1966) On the distribution of the number of admissible points in a vector random sample. Theory Probab Appl 11:249–269
  9. Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2:1–127
  10. Bernard S, Heutte L, Adam S (2008) Forest-RK: a new random forest induction method. In: Huang D-S, Wunsch DC II, Levine DS, Jo K-H (eds) Advanced intelligent computing theories and applications. With aspects of artificial intelligence. Springer, Berlin, pp 430–437
  11. Bernard S, Adam S, Heutte L (2012) Dynamic random forests. Pattern Recognit Lett 33:1580–1586
  12. Biau G (2012) Analysis of a random forests model. J Mach Learn Res 13:1063–1095
  13. Biau G, Devroye L (2010) On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. J Multivar Anal 101:2499–2518
  14. Biau G, Devroye L (2013) Cellular tree classifiers. Electron J Stat 7:1875–1912
  15. Biau G, Devroye L, Lugosi G (2008) Consistency of random forests and other averaging classifiers. J Mach Learn Res 9:2015–2033
  16. Biau G, Cérou F, Guyader A (2010) On the rate of convergence of the bagged nearest neighbor estimate. J Mach Learn Res 11:687–712
  17. Boulesteix A-L, Janitza S, Kruppa J, König IR (2012) Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdiscip Rev Data Mining Knowl Discov 2:493–507
  18. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
  19. Breiman L (2000a) Some infinity theory for predictor ensembles. Technical Report 577, University of California, Berkeley
  20. Breiman L (2000b) Randomizing outputs to increase prediction accuracy. Mach Learn 40:229–242
  21. Breiman L (2001) Random forests. Mach Learn 45:5–32
  22. Breiman L (2003a) Setting up, using, and understanding random forests V3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf
  23. Breiman L (2003b) Setting up, using, and understanding random forests V4.0. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf
  24. Breiman L (2004) Consistency for a simple model of random forests. Technical Report 670, University of California, Berkeley
  25. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall/CRC, Boca Raton
  26. Bühlmann P, Yu B (2002) Analyzing bagging. Ann Stat 30:927–961
  27. Chen C, Liaw A, Breiman L (2004) Using random forest to learn imbalanced data. Technical Report 666, University of California, Berkeley
  28. Clémençon S, Depecker M, Vayatis N (2013) Ranking forests. J Mach Learn Res 14:39–73
  29. Clémençon S, Vayatis N (2009) Tree-based ranking methods. IEEE Trans Inform Theory 55:4316–4336
  30. Criminisi A, Shotton J, Konukoglu E (2011) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7:81–227
  31. Crookston NL, Finley AO (2008) yaImpute: an R package for kNN imputation. J Stat Softw 23:1–16
  32. Cutler A, Zhao G (2001) PERT—perfect random tree ensembles. Comput Sci Stat 33:490–497
  33. Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792
  34. Davies A, Ghahramani Z (2014) The Random Forest Kernel and creating other kernels for big data from random partitions. arXiv:1402.4293
  35. Deng H, Runger G (2012) Feature selection via regularized trees. In: The 2012 international joint conference on neural networks, pp 1–8
  36. Deng H, Runger G (2013) Gene selection with guided regularized random forest. Pattern Recognit 46:3483–3489
  37. Denil M, Matheson D, de Freitas N (2013) Consistency of online random forests. In: International conference on machine learning (ICML)
  38. Denil M, Matheson D, de Freitas N (2014) Narrowing the gap: random forests in theory and in practice. In: International conference on machine learning (ICML)
  39. Désir C, Bernard S, Petitjean C, Heutte L (2013) One class random forests. Pattern Recognit 46:3490–3506
  40. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York
  41. Díaz-Uriarte R, Alvarez de Andrés S (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7:1–13
  42. Dietterich TG (2000) Ensemble methods in machine learning. In: Kittler J, Roli F (eds) Multiple classifier systems. Springer, Berlin, pp 1–15
  43. Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
  44. Efron B (1982) The jackknife, the bootstrap and other resampling plans, vol 38. CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia
  45. Fink D, Hochachka WM, Zuckerberg B, Winkler DW, Shaby B, Munson MA, Hooker G, Riedewald M, Sheldon D, Kelling S (2010) Spatiotemporal exploratory models for broad-scale survey data. Ecol Appl 20:2131–2147
  46. Friedman J, Hastie T, Tibshirani R (2009) The elements of statistical learning, 2nd edn. Springer, New York
  47. Genuer R (2012) Variance reduction in purely random forests. J Nonparametr Stat 24:543–562
  48. Genuer R, Poggi J-M, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognit Lett 31:2225–2236
  49. Geremia E, Menze BH, Ayache N (2013) Spatially adaptive random forests. In: IEEE international symposium on biomedical imaging: from nano to macro, pp 1332–1335
  50. Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63:3–42
  51. Gregorutti B, Michel B, Saint Pierre P (2016) Correlation and variable importance in random forests. Stat Comput. doi:10.1007/s11222-016-9646-1
  52. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
  53. Györfi L, Kohler M, Krzyżak A, Walk H (2002) A distribution-free theory of nonparametric regression. Springer, New York
  54. Ho T (1998) The random subspace method for constructing decision forests. Pattern Anal Mach Intell 20:832–844
  55. Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15:651–674
  56. Howard J, Bowles M (2012) The two most important algorithms in predictive modeling today. In: Strata Conference: Santa Clara. http://strataconf.com/strata2012/public/schedule/detail/22658
  57. Ishioka T (2013) Imputation of missing values for unsupervised data using the proximity in random forests. In: eLmL 2013, The fifth international conference on mobile, hybrid, and on-line learning, pp 30–36. International Academy, Research, and Industry Association
  58. Ishwaran H (2007) Variable importance in binary regression trees and forests. Electron J Stat 1:519–537
  59. Ishwaran H (2013) The effect of splitting on random forests. Mach Learn 99:75–118
  60. Ishwaran H, Kogalur UB (2010) Consistency of random survival forests. Stat Probab Lett 80:1056–1064
  61. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. Ann Appl Stat 2:841–860
  62. Ishwaran H, Kogalur UB, Chen X, Minn AJ (2011) Random survival forests for high-dimensional data. Stat Anal Data Mining ASA Data Sci J 4:115–132
  63. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107–113
  64. Joly A, Geurts P, Wehenkel L (2014) Random forests with random projections of the output space for high dimensional multi-label classification. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, pp 607–622
  65. Kim H, Loh W-Y (2001) Classification trees with unbiased multiway splits. J Am Stat Assoc 96:589–604
  66. Kleiner A, Talwalkar A, Sarkar P, Jordan MI (2014) A scalable bootstrap for massive data. J Royal Stat Soc Ser B (Stat Methodol) 76:795–816
  67. Konukoglu E, Ganz M (2014) Approximate false positive rate control in selection frequency for random forest. arXiv:1410.2838
  68. Kruppa J, Schwarz A, Arminger G, Ziegler A (2013) Consumer credit risk: individual probability estimates using machine learning. Expert Syst Appl 40:5125–5131
  69. Kruppa J, Liu Y, Biau G, Kohler M, König IR, Malley JD, Ziegler A (2014a) Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biometr J 56:534–563
  70. Kruppa J, Liu Y, Diener H-C, Holste T, Weimar C, König IR, Ziegler A (2014b) Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications. Biometr J 56:564–583
  71. Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, New York
  72. Kyrillidis A, Zouzias A (2014) Non-uniform feature sampling for decision tree ensembles. In: IEEE international conference on acoustics, speech and signal processing, pp 4548–4552
  73. Lakshminarayanan B, Roy DM, Teh YW (2014) Mondrian forests: efficient online random forests. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, pp 3140–3148
  74. Latinne P, Debeir O, Decaestecker C (2001) Limiting the number of trees in random forests. In: Kittler J, Roli F (eds) Multiple classifier systems. Springer, Berlin, pp 178–187
  75. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18–22
  76. Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101:578–590
  77. Louppe G, Wehenkel L, Sutera A, Geurts P (2013) Understanding variable importances in forests of randomized trees. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, pp 431–439
  78. Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A (2012) Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inform Med 51:74–81
  79. Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999
  80. Meinshausen N (2009) Forest Garrote. Electron J Stat 3:1288–1304
  81. Mentch L, Hooker G (2014) A novel test for additivity in supervised ensemble learners. arXiv:1406.1845
  82. Mentch L, Hooker G (2015) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J Mach Learn Res (in press)
  83. Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA (2011) On oblique random forests. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, pp 453–469
  84. Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9:141–142
  85. Nicodemus KK, Malley JD (2009) Predictor correlation impacts machine learning algorithms: implications for genomic studies. Bioinformatics 25:1884–1890
  86. Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer, New York
  87. Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199
  88. Qian SS, King RS, Richardson CJ (2003) Two statistical methods for the detection of environmental thresholds. Ecol Model 166:87–97
  89. Rieger A, Hothorn T, Strobl C (2010) Random forests with missing values in the covariates. Technical Report 79, University of Munich, Munich
  90. Saffari A, Leistner C, Santner J, Godec M, Bischof H (2009) On-line random forests. In: IEEE 12th international conference on computer vision workshops, pp 1393–1400
  91. Schwarz DF, König IR, Ziegler A (2010) On safari to random jungle: a fast implementation of random forests for high-dimensional data. Bioinformatics 26:1752–1758
  92. Scornet E (2015a) On the asymptotics of random forests. J Multivar Anal 146:72–83
  93. Scornet E (2015b) Random forests and kernel methods. IEEE Trans Inform Theory 62:1485–1500
  94. Scornet E, Biau G, Vert J-P (2015) Consistency of random forests. Ann Stat 43:1716–1741
  95. Segal MR (1988) Regression trees for censored data. Biometrics 44:35–47
  96. Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: IEEE conference on computer vision and pattern recognition, pp 1297–1304
  97. Stone CJ (1977) Consistent nonparametric regression. Ann Stat 5:595–645
  98. Stone CJ (1980) Optimal rates of convergence for nonparametric estimators. Ann Stat 8:1348–1360
  99. Stone CJ (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10:1040–1053
  100. Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinform 9:307
  101. Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inform Comput Sci 43:1947–1958
  102. Toloşi L, Lengauer T (2011) Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics 27:1986–1994
  103. Truong AKY (2009) Fast growing and interpretable oblique trees via logistic regression models. PhD thesis, University of Oxford, Oxford
  104. Varian H (2014) Big data: new tricks for econometrics. J Econ Perspect 28:3–28
  105. Wager S (2014) Asymptotic theory for random forests. arXiv:1405.0352
  106. Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15:1625–1651
  107. Watson GS (1964) Smooth regression analysis. Sankhyā Ser A 26:359–372
  108. Welbl J (2014) Casting random forests as artificial neural networks and profiting from it. In: Jiang X, Hornegger J, Koch R (eds) Pattern recognition. Springer, Berlin, pp 765–771
  109. Winham SJ, Freimuth RR, Biernacka JM (2013) A weighted random forests approach to improve predictive performance. Stat Anal Data Mining ASA Data Sci J 6:496–505
  110. Yan D, Chen A, Jordan MI (2013) Cluster forests. Comput Stat Data Anal 66:178–192
  111. Yang F, Wang J, Fan G (2010) Kernel induced random survival forests. arXiv:1008.3952
  112. Yi Z, Soatto S, Dewan M, Zhan Y (2012) Information forests. In: 2012 information theory and applications workshop, pp 143–146
  113. Zhu R, Zeng D, Kosorok MR (2015) Reinforcement learning trees. J Am Stat Assoc 110:1770–1784
  114. Ziegler A, König IR (2014) Mining data with random forests: current options for real-world applications. Wiley Interdiscip Rev Data Mining Knowl Discov 4:55–63

Copyright information

© Sociedad de Estadística e Investigación Operativa 2016

Authors and Affiliations

  1. Sorbonne Universités, UPMC Univ Paris 06, CNRS, Laboratoire de Statistique Théorique et Appliquées (LSTA), Paris, France
  2. Institut universitaire de France, Paris, France
