Abstract
It is known that feature selection and feature relevance can benefit the performance and interpretation of machine learning algorithms. Here we consider feature selection within a Random Forest framework. A feature selection technique is introduced that combines hypothesis testing with an approximation to the expected performance of an irrelevant feature during Random Forest construction.
It is demonstrated that the lack of implicit feature selection within Random Forest has an adverse effect on the accuracy and efficiency of the algorithm. It is also shown that irrelevant features can slow the rate of error convergence and a theoretical justification of this effect is given.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Freund, Y., Schapire, R.: A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 14, 771–780 (1999)
Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001)
Dietterich, T.: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40, 139–157 (2000)
Rogers, J., Gunn, S.: Ensemble algorithms for feature selection. In: Winkler, J.R., Niranjan, M., Lawrence, N.D. (eds.) Deterministic and Statistical Methods in Machine Learning. LNCS, vol. 3635, pp. 180–198. Springer, Heidelberg (2005)
Ho, T.: Nearest neighbours in random subspaces. In: Amin, A., Pudil, P., Dori, D. (eds.) SPR 1998 and SSPR 1998. LNCS, vol. 1451, pp. 640–648. Springer, Heidelberg (1998)
Roobaert, D., Karakoulas, G., Chawla, N.: Information gain, correlation and support vector machines. In: Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. (eds.) Feature Extraction, Foundations and Applications, Springer, Heidelberg (in press, 2006)
Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Machine Learning, pp. 856–863. AAAI, Menlo Park (2003)
Hall, M.: Correlation-based feature selection for discrete and numeric class machine learning. In: 17th International Conference on Machine Learning, pp. 359–366 (2000)
John, G., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Cohen, W., Hirsh, H. (eds.) Machine Learning, pp. 121–129. Morgan Kaufmann, San Francisco (1994)
Svetnik, V., Liaw, A., Tong, C., Wang, T.: Application of breiman’s random forest to modeling structure-activity relationships of pharmaceutical molecules. In: Roli, F., Kittler, J., Windeatt, T. (eds.) MCS 2004. LNCS, vol. 3077, pp. 334–343. Springer, Heidelberg (2004)
Chen, Y.W., Lin, C.J.: Combining svms with various feature selection strategies. In: Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. (eds.) Feature Extraction, Foundations and Applications. Springer, Heidelberg (in press, 2006)
Borisov, A., Eruhimov, V., Tuv, E.: Tree-based ensembles with dynamic soft feature selection. In: Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. (eds.) Feature Extraction, Foundations and Applications. Springer, Heidelberg (in press, 2006)
Friedman, J.: Flexible metric nearest neighbour classification (1994)
Blake, C., Merz, C.: UCI repository of machine learning databases (1998)
Friedman, J.: Multivariate adaptive regression splines. The Annals of Statistics 19, 1–141 (1991)
Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. (eds.): Feature Extraction, Foundations and Applications. Springer, Heidelberg (in press, 2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rogers, J., Gunn, S. (2006). Identifying Feature Relevance Using a Random Forest. In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds) Subspace, Latent Structure and Feature Selection. SLSFS 2005. Lecture Notes in Computer Science, vol 3940. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752790_12
Download citation
DOI: https://doi.org/10.1007/11752790_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34137-6
Online ISBN: 978-3-540-34138-3
eBook Packages: Computer ScienceComputer Science (R0)