Comparative Performance Analysis of Different Measures to Select Disease Related Informative Genes from Microarray Gene Expression Data

  • Chandra Das
  • Shilpi Bose (corresponding author)
  • Abhik Banerjee
  • Sourav Dutta
  • Kuntal Ghosh
  • Matangini Chattopadhyay
Conference paper
Part of the Learning and Analytics in Intelligent Systems book series (LAIS, volume 12)


Classifying diseased samples is an important application of microarray gene expression data. The main obstacle is the high dimensionality of the data: each sample is described by a very large number of genes (features), yet only a small number of them carry disease-related information. To improve classification accuracy, the gene dimension must be reduced by selecting informative, non-redundant genes, and various feature selection methodologies are applied for this purpose. This paper presents a comparative study of different measures for selecting informative and non-redundant genes. The effectiveness of each measure is assessed by the classification accuracy that different classifiers achieve on several microarray gene expression datasets.
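As a rough illustration of what a univariate filter measure does, the sketch below ranks genes by Fisher score, one of the measures named in the keywords. It uses a common textbook form of the score (between-class spread of the class means divided by the weighted within-class variance), which may differ in detail from the formulation used in the paper. The function names `fisher_scores` and `top_k_genes` and the toy expression matrix are assumptions made up for this example; they are not taken from the paper.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per gene (column of X).

    X: list of samples, each a list of expression values (samples x genes).
    y: list of class labels, one per sample.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # size-weighted spread of class means around the overall mean
        within = 0.0   # size-weighted sum of within-class variances
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within == 0:
            # Gene is constant inside every class: perfectly separating or uninformative.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def top_k_genes(X, y, k):
    """Indices of the k genes with the highest Fisher score, best first."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: 6 samples x 3 genes, two classes (0 = normal, 1 = diseased).
X = [
    [1.0, 5.0, 2.0],
    [1.2, 4.0, 2.1],
    [0.9, 6.0, 1.9],
    [3.0, 5.5, 2.3],
    [3.1, 4.5, 2.2],
    [2.9, 5.0, 2.4],
]
y = [0, 0, 0, 1, 1, 1]

selected = top_k_genes(X, y, 2)
print(selected)
```

In this toy matrix the first gene separates the two classes strongly, the third separates them weakly, and the second has identical class means, so it receives a score of zero. A filter of this kind scores each gene independently and therefore does not address redundancy among the selected genes; that would need an additional step.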


Classification; Microarray gene expression data; Feature selection; Univariate filter; t-test; Fisher score; Chi square; Mutual information; ReliefF; Symmetric uncertainty



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Chandra Das (1)
  • Shilpi Bose (1, corresponding author)
  • Abhik Banerjee (1)
  • Sourav Dutta (1)
  • Kuntal Ghosh (2)
  • Matangini Chattopadhyay (3)

  1. Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
  2. Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
  3. School of Education Technology, Jadavpur University, Kolkata, India
