Feature Selection for Microarray Data Analysis Using Mutual Information and Rough Set Theory

  • Wengang Zhou
  • Chunguang Zhou
  • Hong Zhu
  • Guixia Liu
  • Xiaoyu Chang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4115)


Cancer classification is a major application of microarray data analysis. Because gene expression data are extremely high-dimensional, efficient feature selection methods are needed to select a small number of informative genes. In this paper, we propose MIRS, a novel feature selection method based on mutual information and rough set theory. First, we select the top-ranked features, those with the highest mutual information with the target class. Rough set theory is then applied to remove redundancy among the selected genes; for this step, binary particle swarm optimization (BPSO) is proposed for the first time for attribute reduction in rough sets. Finally, the effectiveness of the method is evaluated by the classification accuracy of an SVM classifier. Experimental results show that MIRS outperforms several classical feature selection methods and achieves higher prediction accuracy with a small number of features. Overall, the results are highly promising.
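
The two measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes expression values have already been discretized (the abstract does not say how), the function names and toy data are invented for illustration, and the BPSO search over gene subsets is not shown. The sketch only computes the mutual information used for ranking and the rough-set dependency degree that an attribute-reduction search could use as its quality measure.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_by_mi(columns, labels, k):
    """Indices of the k columns with the highest MI with the labels."""
    scores = [mutual_information(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])[:k]


def dependency_degree(columns, labels, subset):
    """Rough-set dependency degree: the fraction of samples whose
    equivalence class (under the chosen attributes) has a single label."""
    classes = {}
    for i, label in enumerate(labels):
        key = tuple(columns[j][i] for j in subset)
        classes.setdefault(key, []).append(label)
    positive = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return positive / len(labels)


# Toy, made-up data: 3 discretized "genes" measured on 6 samples.
genes = [
    [0, 0, 0, 1, 1, 1],  # g0: determines the class
    [0, 0, 0, 1, 1, 1],  # g1: copy of g0, hence redundant
    [0, 1, 0, 1, 0, 1],  # g2: nearly uninformative
]
labels = [0, 0, 0, 1, 1, 1]

top = rank_by_mi(genes, labels, k=2)
print(top)
print(dependency_degree(genes, labels, [0]))
print(dependency_degree(genes, labels, [0, 1]))
```

In this toy case g0 and g1 both rank at the top, yet g0 alone already gives the same dependency degree as the pair. That is the kind of redundancy the rough-set step is meant to remove after the mutual-information ranking.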


Keywords (added by machine, not by the authors): Support Vector Machine; Feature Selection; Mutual Information; Support Vector Machine Classifier; Feature Selection Method





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wengang Zhou (1)
  • Chunguang Zhou (1)
  • Hong Zhu (1)
  • Guixia Liu (1)
  • Xiaoyu Chang (1)
  1. College of Computer Science and Technology, Jilin University, Changchun, P.R. China
