Abstract
A significant application of microarray gene expression data is the classification and prediction of biological models. An essential component of data analysis is dimension reduction. This study compares two feature selection techniques for dimension reduction, Analysis of Variance (ANOVA) and Recursive Feature Elimination (RFE), and evaluates the relative performance of a Support Vector Machine (SVM) classifier on the reduced data. Accuracy and computational performance were measured on a microarray colon cancer dataset. SVM-RFE achieved 93% classification accuracy, compared with 87% for ANOVA.
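The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, substitutes a synthetic matrix for the colon cancer dataset, and uses hypothetical choices for the number of retained features (20), the linear kernel, and 5-fold cross-validation, none of which are stated in the abstract. The accuracies it prints will therefore not match the reported 93% and 87%.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a microarray matrix (assumption): few samples,
# many features, only a small subset of them informative.
X, y = make_classification(
    n_samples=62, n_features=500, n_informative=20,
    n_redundant=0, random_state=0,
)

def evaluate(selector):
    """Mean cross-validated accuracy of scaling -> selection -> linear SVM.

    Selection sits inside the pipeline so it is refit on each training
    fold and never sees the held-out samples.
    """
    pipe = make_pipeline(StandardScaler(), selector, SVC(kernel="linear"))
    return cross_val_score(pipe, X, y, cv=5).mean()

# ANOVA filter: keep the 20 features with the highest F-statistic.
anova_acc = evaluate(SelectKBest(f_classif, k=20))

# SVM-RFE wrapper: repeatedly fit a linear SVM and discard the
# lowest-weight features (10% of the original count per round)
# until 20 remain.
rfe_acc = evaluate(RFE(SVC(kernel="linear"), n_features_to_select=20, step=0.1))

print(f"ANOVA + SVM accuracy:   {anova_acc:.3f}")
print(f"SVM-RFE + SVM accuracy: {rfe_acc:.3f}")
```

Keeping the selector inside the cross-validated pipeline is a deliberate choice here: selecting features on the full dataset before splitting would leak label information into the test folds and inflate both accuracies.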
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Abdulsalam, S.O. et al. (2020). Performance Evaluation of ANOVA and RFE Algorithms for Classifying Microarray Dataset Using SVM. In: Themistocleous, M., Papadaki, M., Kamal, M.M. (eds) Information Systems. EMCIS 2020. Lecture Notes in Business Information Processing, vol 402. Springer, Cham. https://doi.org/10.1007/978-3-030-63396-7_32
DOI: https://doi.org/10.1007/978-3-030-63396-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63395-0
Online ISBN: 978-3-030-63396-7
eBook Packages: Computer Science, Computer Science (R0)