Two-Stage Hybrid Gene Selection Using Mutual Information and Genetic Algorithm for Cancer Data Classification

  • Image & Signal Processing
Journal of Medical Systems

Abstract

Cancer is a deadly disease whose treatment is complex and costly. Microarray data classification plays an important role in cancer treatment, and accurate cancer classification requires an efficient gene selection technique that picks out the most promising genes. Here, we propose a two-stage MI-GA gene selection algorithm for selecting informative genes in cancer data classification. In the first stage, Mutual Information (MI) based gene selection retains only the genes that carry high information related to the cancer. The genes with high mutual information values are passed as input to the second stage, where Genetic Algorithm (GA) based gene selection identifies the optimal set of genes required for accurate classification. A Support Vector Machine (SVM) is used for classification. The proposed MI-GA gene selection approach is applied to Colon, Lung and Ovarian cancer datasets, and the results show that it achieves higher classification accuracy than the existing methods.
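The two-stage idea described above (an MI filter followed by a GA search over the surviving genes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the equal-width binning used to estimate MI, the GA operators (binary tournament, one-point crossover, bit-flip mutation, elitism), the subset-size penalty, all parameter values, and the synthetic data are hypothetical choices. To keep the sketch dependency-free, the GA fitness uses a leave-one-out nearest-centroid classifier as a stand-in; the abstract names an SVM as the classifier, and how the paper scores candidate subsets is not stated here.

```python
import math
import random
from collections import Counter


def mutual_information(values, labels, bins=3):
    """MI in nats between one gene (equal-width binned) and the class label."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant gene: avoid division by zero
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint, px, py = Counter(zip(binned, labels)), Counter(binned), Counter(labels)
    return sum(c / n * math.log(c * n / (px[x] * py[t]))
               for (x, t), c in joint.items())


def mi_filter(X, y, top_k):
    """Stage 1: indices of the top_k genes ranked by MI with the label."""
    scores = [mutual_information([row[g] for row in X], y)
              for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]


def loo_accuracy(X, y, genes):
    """Stand-in fitness: leave-one-out accuracy of a nearest-centroid rule."""
    if not genes:
        return 0.0
    correct = 0
    for i, held_out in enumerate(X):
        sums, counts = {}, Counter()
        for j, row in enumerate(X):
            if j != i:
                acc = sums.setdefault(y[j], [0.0] * len(genes))
                for k, g in enumerate(genes):
                    acc[k] += row[g]
                counts[y[j]] += 1

        def dist(c):
            return sum((held_out[g] - sums[c][k] / counts[c]) ** 2
                       for k, g in enumerate(genes))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)


def ga_select(X, y, candidates, pop_size=12, generations=10, p_mut=0.05, seed=0):
    """Stage 2: GA over bit masks marking which candidate genes are kept."""
    rng = random.Random(seed)
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            genes = [g for g, bit in zip(candidates, mask) if bit]
            # Small size penalty (an assumption) that favours compact subsets.
            cache[key] = loo_accuracy(X, y, genes) - 0.001 * len(genes)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = [bit ^ (rng.random() < p_mut)    # bit-flip mutation
                     for bit in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return [g for g, bit in zip(candidates, best) if bit]


data_rng = random.Random(1)
y = [i % 2 for i in range(24)]
# Synthetic data: genes 0 and 1 shift with the class, the other 28 are noise.
X = [[data_rng.gauss(3.0 * label if g < 2 else 0.0, 1.0) for g in range(30)]
     for label in y]
candidates = mi_filter(X, y, top_k=6)
selected = ga_select(X, y, candidates)
print("stage 1 candidates:", candidates)
print("stage 2 selection:", selected)
```

The filter stage is cheap and shrinks the search space, so the GA only has to explore subsets of a handful of candidate genes rather than every gene on the array. Swapping `loo_accuracy` for cross-validated SVM accuracy would bring the sketch closer to the setup the abstract describes.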

Author information

Corresponding author

Correspondence to M. Jansi Rani.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Image & Signal Processing

About this article

Cite this article

Jansi Rani, M., Devaraj, D. Two-Stage Hybrid Gene Selection Using Mutual Information and Genetic Algorithm for Cancer Data Classification. J Med Syst 43, 235 (2019). https://doi.org/10.1007/s10916-019-1372-8

  • DOI: https://doi.org/10.1007/s10916-019-1372-8
