Efficient All Relevant Feature Selection with Random Ferns
Many machine learning methods can produce variable importance scores that express the usefulness of each feature in the context of the produced model; yet, on their own, those scores are not sufficient to perform feature selection, especially when an all relevant selection is required. Wrapper methods exist that aim to solve this problem, mostly by estimating the expected distribution of the importance of irrelevant features. However, such estimation often requires substantial computational effort.
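To make the wrapper idea concrete, here is a minimal sketch of contrast-based selection (an illustration, not any specific published wrapper). Each round appends shuffled "shadow" copies of all features, whose scores show what an irrelevant feature achieves, and counts a hit for every real feature that beats the best shadow. The absolute Pearson correlation used as `importance` is an assumed stand-in for a model's importance score, and `contrast_wrapper` and `abs_corr` are hypothetical names. With a real learner, every round means retraining the model on real plus shadow features, which is where the computational cost comes from.

```python
import random


def abs_corr(xs, ys):
    """Stand-in importance (assumption): |Pearson correlation| of a feature with the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def contrast_wrapper(X, y, importance=abs_corr, rounds=20, seed=0):
    """Hypothetical contrast-based wrapper: count, per real feature, the rounds in
    which its importance beats the best importance of any shadow feature."""
    rng = random.Random(seed)
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    hits = [0] * p
    for _ in range(rounds):
        # Shadows: shuffled copies that keep each feature's distribution
        # but are irrelevant to y by construction.
        shadows = [rng.sample(col, len(col)) for col in cols]
        # With a real model, this is where it would be retrained on real +
        # shadow features; repeating that every round is the expensive part.
        best_shadow = max(importance(s, y) for s in shadows)
        for j in range(p):
            if importance(cols[j], y) > best_shadow:
                hits[j] += 1
    return hits
```

A feature that wins (nearly) every round is a candidate for being relevant; a proper wrapper would test the hit counts statistically rather than apply a fixed cutoff.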
In this paper I propose a method of incorporating such estimation into the training process of a random ferns classifier, and evaluate it as an all relevant feature selector, both directly and as part of a dedicated wrapper approach. The obtained results demonstrate its effectiveness and computational efficiency.
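The abstract does not spell out the mechanism, so the following is only a hypothetical sketch of the general idea, not the paper's method and not the API of any random ferns package. A pure-Python random ferns ensemble is trained once on the data augmented with shuffled shadow copies of every feature, and each fern's out-of-bag (OOB) permutation importance is accumulated for real and shadow features alike, so the irrelevant-importance reference comes out of the same training run. The names `train_ferns_with_shadows` and `select_all_relevant`, the midpoint thresholds, the Laplace-smoothed log score and the max-shadow cutoff are all assumptions.

```python
import math
import random


def _leaf(row, tests):
    # A fern maps a sample to one of 2**depth leaves via its binary tests.
    return sum(1 << d for d, (f, t) in enumerate(tests) if row[f] > t)


def _mean_log_score(rows, labels, tests, counts, classes):
    # Mean Laplace-smoothed log P(true class | leaf) over the given rows.
    k = len(classes)
    total = 0.0
    for row, lab in zip(rows, labels):
        c = counts[_leaf(row, tests)]
        total += math.log((c[classes.index(lab)] + 1) / (sum(c) + k))
    return total / len(rows)


def train_ferns_with_shadows(X, y, n_ferns=300, depth=3, seed=0):
    """Hypothetical sketch: returns (real_importance, shadow_importance)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    # Shadow columns: each feature shuffled independently, so unrelated to y.
    shadows = [rng.sample([row[j] for row in X], n) for j in range(p)]
    Z = [list(X[i]) + [shadows[j][i] for j in range(p)] for i in range(n)]
    m = 2 * p
    drop_sum, used = [0.0] * m, [0] * m
    for _ in range(n_ferns):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        # Random tests: a feature and a threshold halfway between two bagged values.
        tests = []
        for _ in range(depth):
            f = rng.randrange(m)
            tests.append((f, (Z[rng.choice(bag)][f] + Z[rng.choice(bag)][f]) / 2))
        counts = [[0] * len(classes) for _ in range(2 ** depth)]
        for i in bag:
            counts[_leaf(Z[i], tests)][classes.index(y[i])] += 1
        oob_rows, oob_y = [Z[i] for i in oob], [y[i] for i in oob]
        base = _mean_log_score(oob_rows, oob_y, tests, counts, classes)
        # Importance of each feature this fern uses: drop in OOB score after
        # permuting that feature's values among the OOB samples.
        for f in {f for f, _ in tests}:
            perm = rng.sample([row[f] for row in oob_rows], len(oob_rows))
            permuted = [row[:f] + [v] + row[f + 1:] for row, v in zip(oob_rows, perm)]
            drop_sum[f] += base - _mean_log_score(permuted, oob_y, tests, counts, classes)
            used[f] += 1
    imp = [drop_sum[j] / used[j] if used[j] else 0.0 for j in range(m)]
    return imp[:p], imp[p:]


def select_all_relevant(X, y, **kwargs):
    real, shadow = train_ferns_with_shadows(X, y, **kwargs)
    threshold = max(shadow)  # crude cutoff: beat the best irrelevant feature
    return [j for j, v in enumerate(real) if v > threshold], real, shadow
```

Averaging the drops only over the ferns that actually used a feature keeps real and shadow scores on the same scale regardless of how often each was drawn. A single max-shadow cutoff is the simplest possible decision rule; a dedicated wrapper would repeat or statistically test it.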
Keywords: Feature importance, Feature selection, Random Forest, Random ferns
This work was financed by the National Science Centre, grant 2011/01/N/ST6/07035, and supported by the OCEAN (Open Centre for Data and Data Analysis) Project, co-financed by the European Regional Development Fund under the Innovative Economy Operational Programme. Computations were performed at ICM, grant G48-6.