Selection of Significant Features Using Monte Carlo Feature Selection

Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 605)

Abstract

Feature selection methods identify subsets of features in large datasets. Such methods have become popular in data-intensive areas, and performing feature selection prior to model construction may reduce the computational cost and improve the model quality. Monte Carlo Feature Selection (MCFS) is a feature selection method aimed at finding features to use for classification. Here we suggest a strategy using a z-test to compute the significance of a feature using MCFS. We have used simulated data with both informative and random features, and compared the z-test with a permutation test and a test implemented into the MCFS software. The z-test had a higher agreement with the permutation test compared with the built-in test. Furthermore, it avoided a bias related to the distribution of feature values that may have affected the built-in test. In conclusion, the suggested method has the potential to improve feature selection using MCFS.

Keywords

Feature selection MCFS Monte Carlo Feature significance  Classification 

Notes

Acknowledgments

We wish to thank the reviewers for insightful comments that helped improve this paper.  The authors were in part supported by an ESSENCE grant, by Uppsala University and by the Institute of Computer Science, Polish Academy of Sciences.

References

  1. 1.
    Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J. Mach. Learn. Res. 3:1157–1182MATHGoogle Scholar
  2. 2.
    Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23:2507–2517CrossRefGoogle Scholar
  3. 3.
    Draminski M, Rada-Iglesias A, Enroth S, Wadelius C, Koronacki J, Komorowski J (2008) Monte Carlo feature selection for supervised classification. Bioinformatics 24:110–117CrossRefGoogle Scholar
  4. 4.
    Kierczak M, Ginalski K, Draminski M, Koronacki J, Rudnicki W, Komorowski J (2009) A rough set-based model of HIV-1 reverse transcriptase resistome. Bioinform. Biol. Insights 3:109–127Google Scholar
  5. 5.
    Draminski M, Kierczak M, Koronacki J, Komorowski J (2010) Monte Carlo feature selection and interdependency discovery in supervised classification. Stud Comput Intell 263:371–385CrossRefGoogle Scholar
  6. 6.
    Enroth S, Bornelöv S, Wadelius C, Komorowski J (2012) Combinations of histone modifications mark exon inclusion levels. PLoS ONE 7:e29911CrossRefMATHGoogle Scholar
  7. 7.
    Bornelöv S, Sääf A, Melen E, Bergström A, Moghadam BT, Pulkkinen V, Acevedo N, Pietras CO, Ege M, Braun-Fahrlander C, Riedler J, Doekes G, Kabesch M, van Hage M, Kere J, Scheynius A, Söderhäll C, Pershagen G, Komorowski J (2013) Rule-based models of the interplay between genetic and environmental factors in Childhood Allergy. PLoS ONE 8(11):e80080Google Scholar
  8. 8.
    Kruczyk M, Zetterberg H, Hansson O, Rolstad S, Minthon L, Wallin A, Blennow K, Komorowski J, Andersson M (2012) Monte Carlo feature selection and rule-based models to predict Alzheimer’s disease in mild cognitive impairment. J Neural Transm 119:821–831CrossRefGoogle Scholar
  9. 9.
  10. 10.
    Van AHT, Saeys Y, Wehenkel L, Geurts P (2012) Statistical interpretation of machine learning-based feature importance scores for biomarker discovery. Bioinformatics 28:1766–1774CrossRefGoogle Scholar
  11. 11.
    Dramiński M, Kierczak M, Nowak-Brzezińska A, Koronacki J, Komorowski J (2011) The Monte Carlo feature selection and interdependency discovery is unbiased, vol 40, pp 199–211. Systems Research Institute, Polish Academy of SciencesGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. 1.Department of Cell and Molecular Biology, Science for Life LaboratoryUppsala UniversityUppsalaSweden
  2. 2.Department of Medical Biochemistry and MicrobiologyUppsala UniversityUppsalaSweden
  3. 3.Institute of Computer Science, Polish Academy of SciencesWarsawPoland

Personalised recommendations