Abstract
Microarray technology has evolved into one of the most authoritative tools for analysing gene expression levels in an organism. Microarray gene expression datasets contain a very large number of features (genes), in the thousands, and a comparatively small number of samples, in the hundreds. These characteristics make the analysis of microarray gene expression data complex, so efficient feature selection is an immediate requirement. Feature selection and classification are the essential aspects of microarray gene expression data analysis. Although many feature selection methods have been developed, the support vector machine combined with recursive feature elimination, termed SVM-RFE, has proved to be a promising method. The genes are ranked while the SVM classification model is trained, and the critical features are selected by recursive feature elimination (RFE). The main drawback of SVM-RFE is the significant amount of time the process consumes. To overcome this issue, an efficient implementation of the linear support vector machine is introduced, and RFE is improved with a technique known as variable step size. In addition, an effective resampling technique is proposed to preprocess the datasets and overcome the class imbalance problem. With this method, the samples become balanced and come from the same distribution, which provides better classification results. This work presents recursive feature elimination with variable step size (RFEVSS), combined with the effective resampling method, to improve classifier performance, and implements a large-scale linear support vector machine (LLSVM) to increase efficiency.
Detailed experiments were conducted with three classifiers on four benchmark microarray gene expression datasets. The results are presented in graphical form for better understanding.
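The general idea of recursive feature elimination with a variable step size can be illustrated with a minimal sketch. It is not the authors' implementation: the Pegasos-style trainer stands in for the large-scale linear SVM, the resampling step is omitted, and the names `train_linear_svm`, `rfe_variable_step`, `fraction`, and `threshold`, as well as the synthetic data, are assumptions made for illustration. Features are ranked by their squared SVM weight; while many features remain, a fraction of them is dropped per round, and once few remain they are dropped one at a time.

```python
import random


def train_linear_svm(X, y, features, epochs=50, lam=0.01, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the hinge loss.

    Stand-in for a large-scale linear SVM solver (assumption). Returns one
    weight per active feature; the bias term is omitted for brevity.
    """
    rng = random.Random(seed)
    w = {f: 0.0 for f in features}
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying learning rate
            margin = y[i] * sum(w[f] * X[i][f] for f in features)
            for f in features:
                w[f] *= 1.0 - eta * lam          # regularisation shrinkage
                if margin < 1.0:                 # hinge loss is active
                    w[f] += eta * y[i] * X[i][f]
    return w


def rfe_variable_step(X, y, n_keep, fraction=0.5, threshold=8):
    """Recursive feature elimination with a variable step size.

    While more than `threshold` features remain, drop `fraction` of them per
    round; afterwards drop one feature per round. Both parameters are
    hypothetical choices, not values taken from the paper.
    """
    active = list(range(len(X[0])))
    eliminated, steps = [], []
    while len(active) > n_keep:
        w = train_linear_svm(X, y, active)
        ranked = sorted(active, key=lambda f: w[f] ** 2)  # least important first
        step = max(1, int(len(active) * fraction)) if len(active) > threshold else 1
        step = min(step, len(active) - n_keep)
        dropped = set(ranked[:step])
        eliminated.extend(ranked[:step])
        steps.append(step)
        active = [f for f in active if f not in dropped]
    return active, eliminated, steps


# Synthetic data (assumption): features 0 and 1 carry the class signal,
# the other 18 features are bounded noise.
rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = []
for label in y:
    row = [label * 2.0 + rng.uniform(-0.2, 0.2),
           label * 1.5 + rng.uniform(-0.2, 0.2)]
    row += [rng.uniform(-0.5, 0.5) for _ in range(18)]
    X.append(row)

selected, eliminated, steps = rfe_variable_step(X, y, n_keep=2)
print(selected)  # [0, 1]
print(steps)     # [10, 5, 1, 1, 1]
```

Dropping many features in the early rounds reduces the number of times the SVM has to be retrained, which is where plain SVM-RFE spends most of its time. The single-feature steps near the end keep the final ranking fine-grained.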
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Naik, N., Sharath Kumar, Y.H. (2022). Efficient Feature Selection Algorithm for Gene Classification. In: Guru, D.S., Y. H., S.K., K., B., Agrawal, R.K., Ichino, M. (eds) Cognition and Recognition. ICCR 2021. Communications in Computer and Information Science, vol 1697. Springer, Cham. https://doi.org/10.1007/978-3-031-22405-8_14
DOI: https://doi.org/10.1007/978-3-031-22405-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22404-1
Online ISBN: 978-3-031-22405-8
eBook Packages: Computer Science, Computer Science (R0)