A Hybrid Embedded-Filter Method for Improving Feature Selection Stability of Random Forests

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 552)

Abstract

Many domains deal with high-dimensional data in which the number of features greatly exceeds the number of observations. Feature selection is frequently used as a pre-processing step to make mining such data more efficient. A key issue in feature selection is stability, i.e., the sensitivity of the selected features to variations in the training set. Random forests are a classification algorithm that can also be regarded as an embedded feature selection method, thanks to the selection that occurs within the learning algorithm. However, this method suffers from instability of the selection. The purpose of our work is to investigate the classification and feature selection properties of random forests, with a particular focus on enhancing the stability of this algorithm as an embedded feature selection method. A hybrid filter-embedded version of the algorithm is proposed, and results show its efficiency.
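The full method is described in the paper body, not in this abstract. The sketch below is only an illustrative reading of the idea: a hybrid pipeline that first applies a univariate filter, then uses Random Forest importances as the embedded step, and quantifies selection stability as the average pairwise Jaccard similarity of the subsets selected across bootstrap resamples of the training set. The function names (select_hybrid, jaccard_stability, stability_over_bootstraps), the choice of ANOVA F-score as the filter, and all parameter values are assumptions for illustration, not the authors' exact procedure.

```python
# Minimal sketch of a hybrid filter-embedded selection with a stability estimate.
# Assumptions: ANOVA F-score filter, RF importances for the embedded step,
# Jaccard similarity over bootstrap runs as the stability measure.
import numpy as np
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

def select_hybrid(X, y, n_filter=200, n_final=20, seed=0):
    """Filter step (ANOVA F-score) followed by an embedded RF-importance step."""
    n_filter = min(n_filter, X.shape[1])
    filt = SelectKBest(f_classif, k=n_filter).fit(X, y)
    kept = np.flatnonzero(filt.get_support())            # features surviving the filter
    rf = RandomForestClassifier(n_estimators=500, random_state=seed)
    rf.fit(X[:, kept], y)
    order = np.argsort(rf.feature_importances_)[::-1]    # rank by RF importance
    return kept[order[:n_final]]                          # indices of the final subset

def jaccard_stability(subsets):
    """Average pairwise Jaccard similarity between selected feature subsets."""
    sims = [len(set(a) & set(b)) / len(set(a) | set(b))
            for a, b in combinations(subsets, 2)]
    return float(np.mean(sims))

def stability_over_bootstraps(X, y, n_runs=10, seed=0):
    """Re-run the hybrid selection on bootstrap resamples and score stability."""
    rng = np.random.default_rng(seed)
    subsets = []
    for r in range(n_runs):
        idx = rng.choice(len(y), size=len(y), replace=True)   # bootstrap resample
        subsets.append(select_hybrid(X[idx], y[idx], seed=r))
    return jaccard_stability(subsets)
```

A stability value close to 1 means the same features are selected regardless of training-set perturbations; values near 0 indicate the instability the paper aims to reduce.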

Keywords

Stability · Feature selection · Classification · High dimensional data · Random forests


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. LARODEC, Institut Supérieur de Gestion, Université de Tunis, Le Bardo, Tunisia
  2. LARODEC, Tunis Business School, Université de Tunis, El Mourouj, Tunisia
