Feature Selection for Microarray Data Analysis Using Mutual Information and Rough Set Theory
Cancer classification is a major application of microarray data analysis. Because gene expression data are extremely high-dimensional, efficient feature selection methods are needed to identify a small number of informative genes. In this paper, we propose MIRS, a novel feature selection method based on mutual information and rough set theory. First, we select the top-ranked features, namely those with the highest mutual information with the target class. Rough set theory is then applied to remove redundancy among the selected genes; binary particle swarm optimization (BPSO) is proposed, for the first time, for attribute reduction in rough sets. Finally, the effectiveness of the proposed method is evaluated by the classification accuracy of an SVM classifier. Experimental results show that MIRS outperforms several classical feature selection methods and achieves higher prediction accuracy with a small number of features. Overall, the results are promising.
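The first two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the expression values have already been discretized, the function names (`mutual_information`, `rank_by_mi`, `dependency_degree`) and the toy data are hypothetical, and the BPSO search and the SVM evaluation are omitted. The rough-set dependency degree shown here is the standard measure of how well a gene subset determines the class, which is one plausible quantity a reduct search could use.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_by_mi(columns, labels, k):
    """Indices of the k genes (columns) with the highest MI with the class."""
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def dependency_degree(columns, labels, subset):
    """Rough-set dependency: share of samples whose equivalence class,
    induced by the genes in `subset`, contains a single class label."""
    classes = {}
    for i, label in enumerate(labels):
        key = tuple(columns[j][i] for j in subset)
        classes.setdefault(key, []).append(label)
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(labels)


# Toy data (hypothetical): six samples, three discretized genes.
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [0, 0, 0, 1, 1, 1],  # gene 0: fully informative
    [0, 0, 0, 1, 1, 1],  # gene 1: duplicate of gene 0, hence redundant
    [0, 1, 0, 1, 0, 1],  # gene 2: weakly informative
]

top = rank_by_mi(genes, labels, k=2)           # stage 1: MI ranking
gamma_one = dependency_degree(genes, labels, [0])
gamma_both = dependency_degree(genes, labels, top)
```

In this toy case gene 0 alone reaches the same dependency degree as the pair of top-ranked genes, so the duplicate gene adds nothing. Redundancy of this kind is what an attribute-reduction step is meant to remove.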
Keywords: Support Vector Machine, Feature Selection, Mutual Information, Support Vector Machine Classifier, Feature Selection Method