Efficient Feature Selection Algorithm for Gene Classification

  • Conference paper
Cognition and Recognition (ICCR 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1697)

Abstract

Microarray technology has become one of the standard tools for analyzing gene expression levels in an organism. Microarray gene expression datasets contain a very large number of features (genes, typically thousands) and a comparatively small number of samples (typically hundreds). These characteristics make the analysis of such data complex, so efficient feature selection is an immediate requirement. Feature selection and classification are the two essential steps of microarray gene expression data analysis. Although many feature selection methods have been developed, the support vector machine (SVM) combined with recursive feature elimination, termed SVM-RFE, has proven to be a promising one: genes are ranked while the SVM classification model is trained, and the critical features are selected by recursive feature elimination (RFE). The main drawback of SVM-RFE is the significant amount of time the process consumes. To overcome this, an efficient large-scale linear support vector machine (LLSVM) is used in this work, and RFE is improved with a variable step size (RFEVSS). In addition, an effective resampling technique is proposed to preprocess the datasets and address the class imbalance problem; after resampling, the classes are balanced and drawn from the same distribution, which yields better classification results. Detailed experiments were conducted with three classifiers on four benchmark microarray gene expression datasets, and the results are presented in graphical form for easier interpretation.
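
The abstract does not state the step-size rule, so the following Python sketch only illustrates the general idea of recursive feature elimination with a variable step size. It assumes a commonly used schedule: drop a fixed fraction of the remaining features while many remain, then drop one feature per iteration near the end. The names `rfe_variable_step`, `toy_scorer`, `fraction`, and `fine_threshold` are hypothetical, and the toy scorer merely stands in for retraining a linear SVM and returning the squared weight of each remaining feature.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rfe_variable_step(
    n_features: int,
    score_features: Callable[[List[int]], Sequence[float]],
    n_select: int = 1,
    fraction: float = 0.5,      # assumed: share of features dropped per coarse step
    fine_threshold: int = 16,   # assumed: below this, drop one feature at a time
) -> Tuple[List[int], List[int]]:
    """Return (selected feature indices, eliminated indices in removal order)."""
    remaining = list(range(n_features))
    eliminated: List[int] = []
    while len(remaining) > n_select:
        # One score per remaining feature; in SVM-RFE this would be w_i ** 2
        # from a linear SVM retrained on the remaining features.
        scores = score_features(remaining)
        if len(remaining) > fine_threshold:
            step = int(len(remaining) * fraction)   # coarse step
        else:
            step = 1                                # fine step
        # Never remove more than needed to reach n_select, and at least one.
        step = max(1, min(step, len(remaining) - n_select))
        # Positions of the `step` lowest-scoring features.
        worst = sorted(range(len(remaining)), key=lambda i: scores[i])[:step]
        worst_set = set(worst)
        eliminated.extend(remaining[i] for i in worst)
        remaining = [f for i, f in enumerate(remaining) if i not in worst_set]
    return remaining, eliminated


def toy_scorer(weights: Sequence[float]):
    """Hypothetical stand-in for SVM training: fixed weights, counts calls."""
    calls: Dict[str, int] = {"n": 0}

    def score(remaining: List[int]) -> List[float]:
        calls["n"] += 1  # each call represents one model training run
        return [weights[f] ** 2 for f in remaining]

    return score, calls


# Toy data: 40 features whose importance grows with the index.
weights = [float(i + 1) for i in range(40)]
score, calls = toy_scorer(weights)
selected, eliminated = rfe_variable_step(40, score, n_select=3, fine_threshold=8)
print(selected, calls["n"])
```

With these toy settings the loop needs 5 scoring rounds to reach 3 features, whereas removing one feature per round would need 37. That reduction in training runs is the kind of saving a variable step size is meant to provide.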

Author information

Corresponding author

Correspondence to Narayan Naik.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Naik, N., Sharath Kumar, Y.H. (2022). Efficient Feature Selection Algorithm for Gene Classification. In: Guru, D.S., Y. H., S.K., K., B., Agrawal, R.K., Ichino, M. (eds) Cognition and Recognition. ICCR 2021. Communications in Computer and Information Science, vol 1697. Springer, Cham. https://doi.org/10.1007/978-3-031-22405-8_14

  • DOI: https://doi.org/10.1007/978-3-031-22405-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22404-1

  • Online ISBN: 978-3-031-22405-8

  • eBook Packages: Computer Science, Computer Science (R0)
