Abstract
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts with easy access to the main ideas.
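To make the averaging mechanism concrete, here is a minimal sketch (ours, not the article's) using scikit-learn's RandomForestRegressor as one widely used implementation; the synthetic data and all parameter values are illustrative assumptions. It exhibits the three ingredients the review emphasizes: bootstrap resampling, random selection of candidate variables at each split, and variable importance scores.

```python
# Minimal random forest sketch (illustrative only; parameters are assumptions).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data; in the p >> n regime mentioned in the abstract,
# n_features would exceed n_samples.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       random_state=0)

forest = RandomForestRegressor(
    n_estimators=500,     # number of randomized trees whose predictions are averaged
    max_features="sqrt",  # candidate variables tried at each split (the "mtry" parameter)
    bootstrap=True,       # resample the training set for each tree
    oob_score=True,       # out-of-bag error, a byproduct of the resampling mechanism
    random_state=0,
)
forest.fit(X, y)

print("OOB R^2:", forest.oob_score_)
# Impurity-based variable importance (MDI); permutation importance is a
# common alternative discussed in the literature this article reviews.
print("Top variables:", np.argsort(forest.feature_importances_)[::-1][:5])
```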
Keywords
Random forests · Randomization · Resampling · Parameter tuning · Variable importance
Mathematics Subject Classification
62G02
Acknowledgments
We thank the Editors and three anonymous referees for valuable comments and insightful suggestions.
References
- Amaratunga D, Cabrera J, Lee Y-S (2008) Enriched random forests. Bioinformatics 24:2010–2014
- Amit Y, Geman D (1997) Shape quantization and recognition with randomized trees. Neural Comput 9:1545–1588
- Archer KJ, Kimes RV (2008) Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 52:2249–2260
- Arlot S, Genuer R (2014) Analysis of purely random forests bias. arXiv:1407.3939
- Auret L, Aldrich C (2011) Empirical comparison of tree ensemble variable importance measures. Chemom Intell Lab Syst 105:157–170
- Bai Z-H, Devroye L, Hwang H-K, Tsai T-H (2005) Maxima in hypercubes. Random Struct Algorithms 27:290–309
- Banerjee M, McKeague IW (2007) Confidence sets for split points in decision trees. Ann Stat 35:543–574
- Barndorff-Nielsen O, Sobel M (1966) On the distribution of the number of admissible points in a vector random sample. Theory Probab Appl 11:249–269
- Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2:1–127
- Bernard S, Heutte L, Adam S (2008) Forest-RK: a new random forest induction method. In: Huang D-S, Wunsch DC II, Levine DS, Jo K-H (eds) Advanced intelligent computing theories and applications. With aspects of artificial intelligence. Springer, Berlin, pp 430–437
- Bernard S, Adam S, Heutte L (2012) Dynamic random forests. Pattern Recognit Lett 33:1580–1586
- Biau G (2012) Analysis of a random forests model. J Mach Learn Res 13:1063–1095
- Biau G, Devroye L (2010) On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. J Multivar Anal 101:2499–2518
- Biau G, Devroye L (2013) Cellular tree classifiers. Electron J Stat 7:1875–1912
- Biau G, Devroye L, Lugosi G (2008) Consistency of random forests and other averaging classifiers. J Mach Learn Res 9:2015–2033
- Biau G, Cérou F, Guyader A (2010) On the rate of convergence of the bagged nearest neighbor estimate. J Mach Learn Res 11:687–712
- Boulesteix A-L, Janitza S, Kruppa J, König IR (2012) Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdiscip Rev Data Mining Knowl Discov 2:493–507
- Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
- Breiman L (2000a) Some infinity theory for predictor ensembles. Technical Report 577, University of California, Berkeley
- Breiman L (2000b) Randomizing outputs to increase prediction accuracy. Mach Learn 40:229–242
- Breiman L (2001) Random forests. Mach Learn 45:5–32
- Breiman L (2003a) Setting up, using, and understanding random forests V3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf
- Breiman L (2003b) Setting up, using, and understanding random forests V4.0. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf
- Breiman L (2004) Consistency for a simple model of random forests. Technical Report 670, University of California, Berkeley
- Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall/CRC, Boca Raton
- Bühlmann P, Yu B (2002) Analyzing bagging. Ann Stat 30:927–961
- Chen C, Liaw A, Breiman L (2004) Using random forest to learn imbalanced data. Technical Report 666, University of California, Berkeley
- Clémençon S, Depecker M, Vayatis N (2013) Ranking forests. J Mach Learn Res 14:39–73
- Clémençon S, Vayatis N (2009) Tree-based ranking methods. IEEE Trans Inform Theory 55:4316–4336
- Criminisi A, Shotton J, Konukoglu E (2011) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7:81–227
- Crookston NL, Finley AO (2008) yaImpute: an R package for kNN imputation. J Stat Softw 23:1–16
- Cutler A, Zhao G (2001) PERT—perfect random tree ensembles. Comput Sci Stat 33:490–497
- Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792
- Davies A, Ghahramani Z (2014) The Random Forest Kernel and creating other kernels for big data from random partitions. arXiv:1402.4293
- Deng H, Runger G (2012) Feature selection via regularized trees. In: The 2012 international joint conference on neural networks, pp 1–8
- Deng H, Runger G (2013) Gene selection with guided regularized random forest. Pattern Recognit 46:3483–3489
- Denil M, Matheson D, de Freitas N (2013) Consistency of online random forests. In: International conference on machine learning (ICML)
- Denil M, Matheson D, de Freitas N (2014) Narrowing the gap: random forests in theory and in practice. In: International conference on machine learning (ICML)
- Désir C, Bernard S, Petitjean C, Heutte L (2013) One class random forests. Pattern Recognit 46:3490–3506
- Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York
- Díaz-Uriarte R, Alvarez de Andrés S (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7:1–13
- Dietterich TG (2000) Ensemble methods in machine learning. In: Kittler J, Roli F (eds) Multiple classifier systems. Springer, Berlin, pp 1–15
- Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
- Efron B (1982) The jackknife, the bootstrap and other resampling plans, vol 38. CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia
- Fink D, Hochachka WM, Zuckerberg B, Winkler DW, Shaby B, Munson MA, Hooker G, Riedewald M, Sheldon D, Kelling S (2010) Spatiotemporal exploratory models for broad-scale survey data. Ecol Appl 20:2131–2147
- Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York
- Genuer R (2012) Variance reduction in purely random forests. J Nonparametr Stat 24:543–562
- Genuer R, Poggi J-M, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognit Lett 31:2225–2236
- Geremia E, Menze BH, Ayache N (2013) Spatially adaptive random forests. In: IEEE international symposium on biomedical imaging: from nano to macro, pp 1332–1335
- Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63:3–42
- Gregorutti B, Michel B, Saint Pierre P (2016) Correlation and variable importance in random forests. Stat Comput. doi: 10.1007/s11222-016-9646-1
- Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
- Györfi L, Kohler M, Krzyżak A, Walk H (2002) A distribution-free theory of nonparametric regression. Springer, New York
- Ho T (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20:832–844
- Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15:651–674
- Howard J, Bowles M (2012) The two most important algorithms in predictive modeling today. In: Strata Conference: Santa Clara. http://strataconf.com/strata2012/public/schedule/detail/22658
- Ishioka T (2013) Imputation of missing values for unsupervised data using the proximity in random forests. In: eLmL 2013, The fifth international conference on mobile, hybrid, and on-line learning, pp 30–36. International Academy, Research, and Industry Association
- Ishwaran H (2007) Variable importance in binary regression trees and forests. Electron J Stat 1:519–537
- Ishwaran H (2015) The effect of splitting on random forests. Mach Learn 99:75–118
- Ishwaran H, Kogalur UB (2010) Consistency of random survival forests. Stat Probab Lett 80:1056–1064
- Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. Ann Appl Stat 2:841–860
- Ishwaran H, Kogalur UB, Chen X, Minn AJ (2011) Random survival forests for high-dimensional data. Stat Anal Data Mining ASA Data Sci J 4:115–132
- Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107–113
- Joly A, Geurts P, Wehenkel L (2014) Random forests with random projections of the output space for high dimensional multi-label classification. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, pp 607–622
- Kim H, Loh W-Y (2001) Classification trees with unbiased multiway splits. J Am Stat Assoc 96:589–604
- Kleiner A, Talwalkar A, Sarkar P, Jordan MI (2014) A scalable bootstrap for massive data. J Royal Stat Soc Ser B (Stat Methodol) 76:795–816
- Konukoglu E, Ganz M (2014) Approximate false positive rate control in selection frequency for random forest. arXiv:1410.2838
- Kruppa J, Schwarz A, Arminger G, Ziegler A (2013) Consumer credit risk: individual probability estimates using machine learning. Expert Syst Appl 40:5125–5131
- Kruppa J, Liu Y, Biau G, Kohler M, König IR, Malley JD, Ziegler A (2014a) Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biometr J 56:534–563
- Kruppa J, Liu Y, Diener H-C, Holste T, Weimar C, König IR, Ziegler A (2014b) Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications. Biometr J 56:564–583
- Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, New York
- Kyrillidis A, Zouzias A (2014) Non-uniform feature sampling for decision tree ensembles. In: IEEE international conference on acoustics, speech and signal processing, pp 4548–4552
- Lakshminarayanan B, Roy DM, Teh YW (2014) Mondrian forests: efficient online random forests. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, pp 3140–3148
- Latinne P, Debeir O, Decaestecker C (2001) Limiting the number of trees in random forests. In: Kittler J, Roli F (eds) Multiple classifier systems. Springer, Berlin, pp 178–187
- Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18–22
- Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101:578–590
- Louppe G, Wehenkel L, Sutera A, Geurts P (2013) Understanding variable importances in forests of randomized trees. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, pp 431–439
- Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A (2012) Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inform Med 51:74–81
- Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999
- Meinshausen N (2009) Forest Garrote. Electron J Stat 3:1288–1304
- Mentch L, Hooker G (2014) A novel test for additivity in supervised ensemble learners. arXiv:1406.1845
- Mentch L, Hooker G (2015) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J Mach Learn Res (in press)
- Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA (2011) On oblique random forests. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, pp 453–469
- Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9:141–142
- Nicodemus KK, Malley JD (2009) Predictor correlation impacts machine learning algorithms: implications for genomic studies. Bioinformatics 25:1884–1890
- Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer, New York
- Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199
- Qian SS, King RS, Richardson CJ (2003) Two statistical methods for the detection of environmental thresholds. Ecol Model 166:87–97
- Rieger A, Hothorn T, Strobl C (2010) Random forests with missing values in the covariates. Technical Report 79, University of Munich, Munich
- Saffari A, Leistner C, Santner J, Godec M, Bischof H (2009) On-line random forests. In: IEEE 12th international conference on computer vision workshops, pp 1393–1400
- Schwarz DF, König IR, Ziegler A (2010) On safari to random jungle: a fast implementation of random forests for high-dimensional data. Bioinformatics 26:1752–1758
- Scornet E (2015a) On the asymptotics of random forests. J Multivar Anal 146:72–83
- Scornet E (2015b) Random forests and kernel methods. IEEE Trans Inform Theory 62:1485–1500
- Scornet E, Biau G, Vert J-P (2015) Consistency of random forests. Ann Stat 43:1716–1741
- Segal MR (1988) Regression trees for censored data. Biometrics 44:35–47
- Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: IEEE conference on computer vision and pattern recognition, pp 1297–1304
- Stone CJ (1977) Consistent nonparametric regression. Ann Stat 5:595–645
- Stone CJ (1980) Optimal rates of convergence for nonparametric estimators. Ann Stat 8:1348–1360
- Stone CJ (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10:1040–1053
- Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinform 9:307
- Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inform Comput Sci 43:1947–1958
- Toloşi L, Lengauer T (2011) Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics 27:1986–1994
- Truong AKY (2009) Fast growing and interpretable oblique trees via logistic regression models. PhD thesis, University of Oxford, Oxford
- Varian H (2014) Big data: new tricks for econometrics. J Econ Perspect 28:3–28
- Wager S (2014) Asymptotic theory for random forests. arXiv:1405.0352
- Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15:1625–1651
- Watson GS (1964) Smooth regression analysis. Sankhyā Ser A 26:359–372
- Welbl J (2014) Casting random forests as artificial neural networks and profiting from it. In: Jiang X, Hornegger J, Koch R (eds) Pattern recognition. Springer, Berlin, pp 765–771
- Winham SJ, Freimuth RR, Biernacka JM (2013) A weighted random forests approach to improve predictive performance. Stat Anal Data Mining ASA Data Sci J 6:496–505
- Yan D, Chen A, Jordan MI (2013) Cluster forests. Comput Stat Data Anal 66:178–192
- Yang F, Wang J, Fan G (2010) Kernel induced random survival forests. arXiv:1008.3952
- Yi Z, Soatto S, Dewan M, Zhan Y (2012) Information forests. In: 2012 information theory and applications workshop, pp 143–146
- Zhu R, Zeng D, Kosorok MR (2015) Reinforcement learning trees. J Am Stat Assoc 110:1770–1784
- Ziegler A, König IR (2014) Mining data with random forests: current options for real-world applications. Wiley Interdiscip Rev Data Mining Knowl Discov 4:55–63