Efficient Feature Selection Algorithm for Gene Classification

Naik, Narayan; Sharath Kumar, Y. H.

doi:10.1007/978-3-031-22405-8_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1697)

Included in the following conference series:

International Conference on Cognition and Recognition

Abstract

Microarray technology has evolved into one of the authoritative mechanisms for analyzing the gene expression levels of an organism. Microarray gene expression datasets contain a large number of features (genes), on the order of thousands, and a comparatively small number of samples, on the order of hundreds. These characteristics make microarray gene expression data analysis complex, so efficient feature selection is an immediate requirement. The essential aspects of microarray gene expression data analysis are feature selection and classification. Although many feature selection methods have been developed, the support vector machine (SVM) combined with recursive feature elimination (RFE), termed SVM-RFE, has proved to be a promising method. Genes are ranked while the SVM classification model is trained, and the critical features are selected by recursively eliminating the lowest-ranked genes. The main drawback of SVM-RFE is the significant amount of time the process consumes. To overcome this issue, an efficient implementation of the linear SVM, the large-scale linear support vector machine (LLSVM), is used, and RFE is improved with a technique known as variable step size. In addition, an effective resampling technique is proposed to preprocess the datasets and overcome the class imbalance problem. With this method, the samples become balanced while following the same distribution, which yields better classification results. This work presents recursive feature elimination with variable step size (RFEVSS) combined with the effective resampling method in order to improve classifier performance. Detailed experiments were conducted with three classifiers on four benchmark microarray gene expression datasets, and the results are presented in graphical form for better understanding.
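The pipeline outlined in the abstract (rebalance the classes, then run SVM-based RFE with a step size that shrinks as features are removed) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the step-size schedule or the resampling scheme, so the `fraction` and `switch_at` parameters and the plain random oversampling used here are hypothetical stand-ins. scikit-learn's `LinearSVC` (which is backed by LIBLINEAR) stands in for the LLSVM, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC


def oversample_minority(X, y, rng):
    """Random oversampling (a stand-in for the paper's resampling method):
    duplicate rows of smaller classes until every class matches the largest."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def rfe_variable_step(X, y, n_keep, fraction=0.5, switch_at=64):
    """SVM-RFE with an assumed variable step size: while more than
    `switch_at` features remain, drop `fraction` of them per round;
    afterwards drop one feature per round."""
    remaining = np.arange(X.shape[1])
    steps = []
    while remaining.size > n_keep:
        clf = LinearSVC(C=1.0, dual=True, max_iter=5000, random_state=0)
        clf.fit(X[:, remaining], y)
        scores = (clf.coef_ ** 2).sum(axis=0)  # ranking criterion: squared weights
        step = int(remaining.size * fraction) if remaining.size > switch_at else 1
        step = max(1, min(step, remaining.size - n_keep))  # never overshoot n_keep
        lowest = np.argsort(scores)[:step]  # positions of the lowest-ranked features
        remaining = np.delete(remaining, lowest)
        steps.append(step)
    return remaining, steps


# Synthetic, imbalanced "microarray-like" data: 40 samples, 500 genes,
# with only the first 5 genes shifted in the minority class.
rng = np.random.default_rng(0)
n_major, n_minor, n_genes = 30, 10, 500
y = np.array([0] * n_major + [1] * n_minor)
X = rng.normal(size=(n_major + n_minor, n_genes))
X[y == 1, :5] += 2.0

X_bal, y_bal = oversample_minority(X, y, rng)
selected, steps = rfe_variable_step(X_bal, y_bal, n_keep=10)
print("class counts after resampling:", np.bincount(y_bal))
print("features removed per round:", steps[:5], "...")
print("selected gene indices:", np.sort(selected))
```

Large early steps cut the number of SVM fits sharply compared with removing one gene per round, while the final single-feature steps keep the ranking fine-grained where it matters most. In a real evaluation, resampling and selection would normally be done inside each training fold to avoid leaking information into the test data.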

Author information

Authors and Affiliations

Canara Engineering College, Mangaluru, India
Narayan Naik & Y. H. Sharath Kumar
Maharaja Institute of Technology Mysore, Srirangapatna, India
Narayan Naik & Y. H. Sharath Kumar

Corresponding author

Correspondence to Narayan Naik.

Editor information

Editors and Affiliations

University of Mysore, Mysore, India
D. S. Guru
Maharaja Institute of Technology Mysore, Mandya, India
Sharath Kumar Y. H.
Maharaja Institute of Technology Mysore, Mandya, India
Balakrishna K.
Jawaharlal Nehru University, New Delhi, India
R. K. Agrawal
Tokyo Denki University, Tokyo, Japan
Manabu Ichino

About this paper

Cite this paper

Naik, N., Sharath Kumar, Y.H. (2022). Efficient Feature Selection Algorithm for Gene Classification. In: Guru, D.S., Sharath Kumar, Y.H., Balakrishna, K., Agrawal, R.K., Ichino, M. (eds) Cognition and Recognition. ICCR 2021. Communications in Computer and Information Science, vol 1697. Springer, Cham. https://doi.org/10.1007/978-3-031-22405-8_14

DOI: https://doi.org/10.1007/978-3-031-22405-8_14
Published: 01 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22404-1
Online ISBN: 978-3-031-22405-8
eBook Packages: Computer Science, Computer Science (R0)
