A Hybrid Embedded-Filter Method for Improving Feature Selection Stability of Random Forests
Many domains deal with high-dimensional data in which the number of observations is small compared to the number of features. Feature selection is frequently used as a pre-processing step to make mining such data more efficient. An important issue in feature selection is stability, that is, the sensitivity of the selected features to variations in the training set. Random forests are classification algorithms that are also regarded as embedded feature selection methods, because selection occurs within the learning algorithm itself. However, this method suffers from instability of selection. The purpose of our work is to investigate the classification and feature selection properties of random forests, with a particular focus on enhancing the stability of this algorithm as an embedded feature selection method. A hybrid filter-embedded version of the algorithm is proposed, and results show its efficiency.
Keywords: Stability, Feature selection, Classification, High dimensional data, Random forests
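The notion of selection stability described in the abstract can be made concrete with a small sketch. The abstract does not say which stability measure the authors use, so the example below assumes one common choice: the mean pairwise Jaccard similarity between the feature subsets selected on different resamples of the training set. The function names and the example feature indices are hypothetical and serve only as illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def selection_stability(subsets):
    """Mean pairwise Jaccard similarity over subsets selected on different resamples.

    Assumption: this is one possible stability measure, not necessarily the
    one used in the paper.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices selected on three resampled training sets.
runs = [{0, 3, 7}, {0, 3, 9}, {0, 3, 7}]
stability = selection_stability(runs)
```

A value near 1 means that nearly the same features are selected regardless of how the training set is perturbed, while a value near 0 means that the selected subsets barely overlap. In practice, each subset would come from ranking features by random forest importance on one bootstrap sample.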