
Identifying Feature Relevance Using a Random Forest

Conference paper: Subspace, Latent Structure and Feature Selection (SLSFS 2005)

Abstract

It is known that feature selection and feature relevance can benefit the performance and interpretation of machine learning algorithms. Here we consider feature selection within a Random Forest framework. A feature selection technique is introduced that combines hypothesis testing with an approximation to the expected performance of an irrelevant feature during Random Forest construction.

It is demonstrated that the lack of implicit feature selection within Random Forest has an adverse effect on the accuracy and efficiency of the algorithm. It is also shown that irrelevant features can slow the rate of error convergence, and a theoretical justification of this effect is given.
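As a rough illustration of the idea described in the abstract (not the authors' exact procedure), the following Python sketch appends a noise "probe" feature that is irrelevant by construction, fits a scikit-learn Random Forest, and applies a one-sided paired t-test to keep only those features whose per-tree importance significantly exceeds the probe's expected importance. All dataset sizes, parameters, and the choice of test are illustrative assumptions.

# Minimal sketch, assuming scikit-learn and SciPy; a hypothesis test against
# an irrelevant "probe" feature stands in for the paper's approximation to the
# expected performance of an irrelevant feature during forest construction.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, random_state=0)

# Irrelevant probe: pure noise, so its importance estimates the baseline an
# irrelevant feature achieves during Random Forest construction.
probe = rng.normal(size=(X.shape[0], 1))
X_aug = np.hstack([X, probe])

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y)

# Per-tree importances give a sample of importance values for each feature.
imp = np.array([tree.feature_importances_ for tree in forest.estimators_])
probe_imp = imp[:, -1]

selected = []
for j in range(X.shape[1]):
    # One-sided paired test: is feature j's importance greater than the probe's?
    _, p_value = ttest_rel(imp[:, j], probe_imp, alternative="greater")
    if p_value < 0.05:
        selected.append(j)

print("Features judged relevant:", selected)

Under these assumptions, features that the forest treats no better than noise fail the test and are discarded, which mirrors the abstract's point that irrelevant features otherwise remain in the forest and degrade accuracy and efficiency.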




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rogers, J., Gunn, S. (2006). Identifying Feature Relevance Using a Random Forest. In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds) Subspace, Latent Structure and Feature Selection. SLSFS 2005. Lecture Notes in Computer Science, vol 3940. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752790_12


  • DOI: https://doi.org/10.1007/11752790_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34137-6

  • Online ISBN: 978-3-540-34138-3

  • eBook Packages: Computer Science (R0)
