
Identifying Feature Relevance Using a Random Forest

Conference paper: Subspace, Latent Structure and Feature Selection (SLSFS 2005)

Abstract

It is known that feature selection and feature relevance can benefit the performance and interpretation of machine learning algorithms. Here we consider feature selection within a Random Forest framework. A feature selection technique is introduced that combines hypothesis testing with an approximation to the expected performance of an irrelevant feature during Random Forest construction.

It is demonstrated that the lack of implicit feature selection within Random Forest has an adverse effect on the accuracy and efficiency of the algorithm. It is also shown that irrelevant features can slow the rate of error convergence, and a theoretical justification of this effect is given.
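As a rough illustration of the idea described in the abstract (not the authors' exact procedure), the following Python sketch appends a noise "probe" feature that is irrelevant by construction, fits a scikit-learn Random Forest, and applies a one-sided paired t-test to keep only those features whose per-tree importance significantly exceeds the probe's expected importance. All dataset sizes, parameters, and the choice of test are illustrative assumptions.

# Minimal sketch, assuming scikit-learn and SciPy; a hypothesis test against
# an irrelevant "probe" feature stands in for the paper's approximation to the
# expected performance of an irrelevant feature during forest construction.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, random_state=0)

# Irrelevant probe: pure noise, so its importance estimates the baseline an
# irrelevant feature achieves during Random Forest construction.
probe = rng.normal(size=(X.shape[0], 1))
X_aug = np.hstack([X, probe])

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y)

# Per-tree importances give a sample of importance values for each feature.
imp = np.array([tree.feature_importances_ for tree in forest.estimators_])
probe_imp = imp[:, -1]

selected = []
for j in range(X.shape[1]):
    # One-sided paired test: is feature j's importance greater than the probe's?
    _, p_value = ttest_rel(imp[:, j], probe_imp, alternative="greater")
    if p_value < 0.05:
        selected.append(j)

print("Features judged relevant:", selected)

Under these assumptions, features that the forest treats no better than noise fail the test and are discarded, which mirrors the abstract's point that irrelevant features otherwise remain in the forest and degrade accuracy and efficiency.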




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rogers, J., Gunn, S. (2006). Identifying Feature Relevance Using a Random Forest. In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds) Subspace, Latent Structure and Feature Selection. SLSFS 2005. Lecture Notes in Computer Science, vol 3940. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752790_12


  • DOI: https://doi.org/10.1007/11752790_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34137-6

  • Online ISBN: 978-3-540-34138-3

  • eBook Packages: Computer Science (R0)
