A Survey on Feature Selection and Extraction Techniques for High-Dimensional Microarray Datasets

  • G. Manikandan
  • S. Abirami


In recent years, large volumes of data have been generated and stored in fields such as information technology, bioinformatics, text mining, face recognition, microarray data analysis, and image processing. Among these, microarray gene expression data analysis has gained particular importance because of its role in disease diagnosis and prognosis, which guides the choice of appropriate treatment for patients. Gene expression data are generally high-dimensional, with a small number of observations and a large number of attributes. Interpreting results from such data is difficult because of the "curse of dimensionality." Dimensionality reduction addresses this issue by reducing the number of variables through techniques such as feature selection and feature extraction. The main aim of these approaches is to reduce the high-dimensional feature space to a low-dimensional representation while limiting the effect on classification accuracy. The objective of this chapter is therefore to gather and present up-to-date knowledge on feature selection methods applied to microarray data analysis. Sect. 1 gives a brief introduction to feature selection methods in DNA microarray analysis and presents the taxonomy of dimensionality reduction methods in diagrammatic form. Sect. 2 discusses five families of feature selection methods (filter, wrapper, embedded, hybrid, and ensemble) in detail, tabulates recently proposed algorithms together with the datasets used and the accuracy achieved, and reviews the advantages and disadvantages of each family. Sect. 3 discusses supervised, unsupervised, and semi-supervised gene selection methods, along with their advantages and disadvantages. Sect. 4 discusses the feature extraction techniques applied in microarray data analysis. Finally, Sect. 5 describes the intrinsic characteristics of microarray data with respect to feature selection.
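
As an illustration of the filter approach to feature selection mentioned above, the following is a minimal sketch rather than any method taken from the chapter. It builds a synthetic dataset with few samples and many "genes", scores each gene with a signal-to-noise style statistic, and keeps the top-ranked genes. The dataset, the function names (`make_toy_data`, `snr_score`, `filter_select`), and the choice of scoring statistic are all assumptions made for this example.

```python
import random
import statistics


def make_toy_data(n_samples=20, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for microarray data: few samples, many genes.

    Hypothetical data: the first n_informative genes are shifted for class 1,
    all other genes are pure Gaussian noise.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # balanced binary classes
    data = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 4.0 * y
        data.append(row)
    return data, labels


def snr_score(values, labels):
    """Absolute class-mean difference divided by the sum of class std devs."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    if spread == 0:
        return 0.0
    return abs(statistics.mean(pos) - statistics.mean(neg)) / spread


def filter_select(data, labels, k):
    """Rank every gene independently of any classifier and keep the top k."""
    n_genes = len(data[0])
    scores = [snr_score([row[g] for row in data], labels) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:k])


data, labels = make_toy_data()
selected = filter_select(data, labels, k=5)
# Reduced representation: same samples, only the selected gene columns.
reduced = [[row[g] for g in selected] for row in data]
print("selected gene indices:", selected)
print("shape before:", (len(data), len(data[0])), "after:", (len(reduced), len(reduced[0])))
```

Because the score is computed per gene without training a classifier, this corresponds to the filter family; a wrapper method would instead evaluate candidate gene subsets by the accuracy of a classifier trained on them.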



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of IST, Anna University, Chennai, India
