Robust Feature Selection Using Ensemble Feature Selection Techniques

  • Yvan Saeys
  • Thomas Abeel
  • Yves Van de Peer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5212)

Abstract

Robustness, or stability, of feature selection techniques is a topic of recent interest and an important issue when selected feature subsets are subsequently analysed by domain experts to gain more insight into the problem being modelled. In this work, we investigate ensemble feature selection techniques, in which multiple feature selection methods are combined to yield more robust results. We show that these techniques hold great promise for high-dimensional domains with small sample sizes and provide more robust feature subsets than a single feature selection technique. We also investigate the effect of ensemble feature selection techniques on classification performance, giving rise to a new model selection strategy.
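
The idea of combining several feature selection runs can be illustrated with a minimal sketch. The abstract does not specify how the ensemble is built or aggregated, so everything below is an assumption made for illustration: the ensemble members are produced by bootstrap resampling of the training samples, each member ranks features with a simple univariate class-separation score, and the rankings are combined by summing ranks. The names `score_features`, `to_ranks`, `ensemble_select`, `X_toy` and `y_toy` are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean, pstdev


def score_features(X, y):
    """Score each feature by class separation: |mean difference| / summed class spread."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-12  # epsilon avoids division by zero
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def to_ranks(scores):
    """Convert scores to ranks, where rank 1 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def ensemble_select(X, y, k, n_bootstraps=25, seed=0):
    """Return the k features with the lowest summed rank over bootstrap samples.

    Assumed design (not from the paper): bootstrap resampling of the rows,
    one univariate ranking per resample, rank-sum aggregation.
    """
    if set(y) != {0, 1}:
        raise ValueError("y must contain both class labels 0 and 1")
    rng = random.Random(seed)
    n = len(X)
    rank_sums = [0] * len(X[0])
    completed = 0
    while completed < n_bootstraps:
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        if len({y[i] for i in idx}) < 2:
            continue  # resample: the scorer needs both classes present
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for j, rank in enumerate(to_ranks(score_features(Xb, yb))):
            rank_sums[j] += rank
        completed += 1
    by_rank = sorted(range(len(rank_sums)), key=lambda j: (rank_sums[j], j))
    return by_rank[:k]


# Toy data (made up for this sketch): feature 0 separates the classes,
# features 1 to 3 carry no class signal.
X_toy = [
    [10.0, 0.1, 1.0, 0.3],
    [10.2, 0.9, 0.2, 0.7],
    [9.8, 0.4, 0.8, 0.1],
    [10.1, 0.6, 0.5, 0.9],
    [1.0, 0.2, 0.9, 0.4],
    [0.9, 0.8, 0.3, 0.6],
    [1.2, 0.5, 0.7, 0.2],
    [1.1, 0.7, 0.4, 0.8],
]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

selected = ensemble_select(X_toy, y_toy, k=2)
print(selected)
```

On this toy data the informative feature (index 0) is placed first in the selection, while the order of the remaining features depends on the resamples drawn. Aggregating ranks rather than raw scores is one possible design choice; it keeps a single resample with extreme scores from dominating the combined result.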



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yvan Saeys (1, 2)
  • Thomas Abeel (1, 2)
  • Yves Van de Peer (1, 2)
  1. Department of Plant Systems Biology, VIB, Gent, Belgium
  2. Department of Molecular Genetics, Ghent University, Gent, Belgium
