Selection of Significant Features Using Monte Carlo Feature Selection
Feature selection methods identify subsets of features in large datasets. Such methods have become popular in data-intensive areas, and performing feature selection prior to model construction may reduce the computational cost and improve the model quality. Monte Carlo Feature Selection (MCFS) is a feature selection method aimed at finding features to use for classification. Here we suggest a strategy that uses a z-test to compute the significance of a feature selected by MCFS. We used simulated data containing both informative and random features, and compared the z-test with a permutation test and with a test built into the MCFS software. The z-test agreed more closely with the permutation test than the built-in test did. Furthermore, it avoided a bias related to the distribution of feature values that may have affected the built-in test. In conclusion, the suggested method has the potential to improve feature selection using MCFS.
Keywords: Feature selection, MCFS, Monte Carlo, Feature significance, Classification
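The abstract does not spell out how the z-statistic is formed or how the null distribution is obtained. As a minimal sketch only, assume that each feature has one observed importance score and a set of null scores (for example, from MCFS runs on data with permuted class labels), and that the null scores are approximately normal. The function names and the example numbers below are hypothetical and are not taken from the MCFS software.

```python
import math
import random
import statistics


def z_test_p_value(observed, null_scores):
    """One-sided p-value for `observed`, assuming the null scores are roughly normal."""
    mu = statistics.mean(null_scores)
    sigma = statistics.stdev(null_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("null scores have zero variance")
    z = (observed - mu) / sigma
    # Upper-tail probability of the standard normal, P(Z >= z)
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


def permutation_p_value(observed, null_scores):
    """Empirical one-sided p-value with the usual +1 correction."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical null importance scores (assumption: from label-permuted runs)
    null_scores = [rng.gauss(0.10, 0.02) for _ in range(200)]
    z, p_z = z_test_p_value(0.20, null_scores)
    p_perm = permutation_p_value(0.20, null_scores)
    print(f"z = {z:.2f}, z-test p = {p_z:.3g}, permutation p = {p_perm:.3g}")
```

Under these assumptions, the empirical permutation p-value can never fall below 1/(n + 1) for n null scores, whereas the normal approximation can report smaller p-values from the same number of runs. The price is the normality assumption, which should be checked on the null scores before relying on the z-test.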
We wish to thank the reviewers for insightful comments that helped improve this paper. The authors were in part supported by an ESSENCE grant, by Uppsala University and by the Institute of Computer Science, Polish Academy of Sciences.