Abstract
In this paper we present a study of the Random Forest (RF) family of classification methods, focusing on two important properties called strength and correlation. Breiman introduced these two properties in the derivation of an upper bound on the generalization error. We therefore propose an experimental study of the actual relation between these properties and the error rate, in order to confirm and extend Breiman's theoretical results. We show that the error rate statistically decreases with the joint maximization of the strength and minimization of the correlation, and that this holds for different RF sizes.
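As a rough illustration of the two quantities the abstract refers to (not the authors' experimental protocol), strength and correlation can be estimated from per-tree votes. The sketch below, using scikit-learn on a synthetic binary problem (all dataset and parameter choices are illustrative assumptions), computes a plug-in estimate of the strength s (the mean margin), the mean correlation ρ̄ between the trees' raw-margin vectors, and Breiman's bound PE* ≤ ρ̄(1 − s²)/s² from the 2001 paper cited below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy binary problem (hypothetical data; any labelled set would do).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Shallow trees so that individual trees make some errors.
rf = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
rf.fit(X_tr, y_tr)

# Per-tree predictions on held-out data: shape (n_trees, n_samples).
preds = np.array([tree.predict(X_te) for tree in rf.estimators_])

# Raw margin of each tree on each example (binary case):
# +1 if the tree votes for the true class, -1 otherwise.
rmg = np.where(preds == y_te, 1.0, -1.0)

# Strength s: expectation over examples of the mean raw margin over trees.
s = rmg.mean(axis=0).mean()

# Mean correlation rho_bar: average pairwise correlation between the
# trees' raw-margin vectors (nan-safe in case a row is constant).
C = np.corrcoef(rmg)
off_diag = C[~np.eye(C.shape[0], dtype=bool)]
rho_bar = np.nanmean(off_diag)

err = 1.0 - rf.score(X_te, y_te)
if s > 0:
    # Breiman's upper bound on the generalization error PE*.
    bound = rho_bar * (1.0 - s ** 2) / s ** 2
    print(f"strength={s:.3f} rho_bar={rho_bar:.3f} "
          f"error={err:.3f} bound={bound:.3f}")
```

Note that the bound is typically loose in practice; the point of the paper is precisely to study how the error rate actually behaves as s and ρ̄ vary.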
References
Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
Ho, T.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
Boinee, P., Angelis, A.D., Foresti, G.: Ensembling classifiers - an application to image data classification from Cherenkov telescope experiment. World Academy of Science, Engineering and Technology 12, 66–70 (2005)
Banfield, R., Hall, L., Bowyer, K., Kegelmeyer, W.: A comparison of decision tree ensemble creation techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(1), 173–180 (2007)
Bernard, S., Heutte, L., Adam, S.: On the selection of decision trees in random forests. In: International Joint Conference on Neural Networks, pp. 302–307 (2009)
Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artificial Intelligence 97(1-2), 273–324 (1997)
Santos, E.D., Sabourin, R., Maupin, P.: A dynamic overproduce-and-choose strategy for the selection of classifier ensembles. Pattern Recognition 41, 2993–3009 (2008)
Asuncion, A., Newman, D.: UCI machine learning repository (2007)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
Chatelain, C., Heutte, L., Paquet, T.: A two-stage outlier rejection strategy for numerical field extraction in handwritten documents. In: International Conference on Pattern Recognition, Hong Kong, China, vol. 3, pp. 224–227 (2006)
Bernard, S., Heutte, L., Adam, S.: Influence of hyperparameters on random forest accuracy. In: International Workshop on Multiple Classifier Systems, pp. 171–180 (2009)
Bernard, S.: Forêts aléatoires: De l’analyse des mécanismes de fonctionnement à la construction dynamique. Thèse de Doctorat, Université de Rouen, France (2009)
© 2010 Springer-Verlag Berlin Heidelberg
Bernard, S., Heutte, L., Adam, S. (2010). A Study of Strength and Correlation in Random Forests. In: Huang, DS., McGinnity, M., Heutte, L., Zhang, XP. (eds) Advanced Intelligent Computing Theories and Applications. ICIC 2010. Communications in Computer and Information Science, vol 93. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14831-6_25
Print ISBN: 978-3-642-14830-9
Online ISBN: 978-3-642-14831-6