Abstract
The healthcare industry is moving toward personalized medicine, which relies on individual genetic information so that treatment can be tailored to the specific characteristics of each patient. The DNA sequence of a genome contains many genes, which are the basic building blocks of an organism. A human genome consists of 20–30 thousand genes. Some of these genes are involved in the growth and development of the body, some are responsible for serious diseases (influenza, Ebola, dengue), and the remainder are non-coding (junk) genes. Identifying and classifying these genes into a few biologically meaningful groups (coding, non-coding, and viral) is useful for diagnosis and treatment. In this paper, an approach based on k-mer (substrings of length k) frequency decomposition and soft computing is used to identify and classify a large set of unknown genes into meaningful groups. It first takes a DNA sequence and computes a vector of the proportions of every possible k-mer. These vectors serve as feature vectors on which a well-known supervised classification algorithm (multi-mode Naive Bayes classifier) is trained. Experiments show that the proposed approach achieves the highest accuracy (\(\ge 90\%\)) along with the lowest running time among the state-of-the-art methods compared.
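The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names, the toy sequences, the class labels "gc" and "at", the smoothing constant alpha, and the reading of the classifier as a multinomial-style Naive Bayes are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_proportions(sequence, k):
    """Proportion of every possible k-mer over ACGT, in lexicographic order."""
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing symbols outside ACGT (for example N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def train_naive_bayes(vectors, labels, alpha=1e-3):
    """Fit log-priors and smoothed per-feature log-likelihoods per class.

    Assumption: a multinomial-style Naive Bayes with additive smoothing.
    """
    model = {}
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        sums = [sum(column) + alpha for column in zip(*rows)]
        total = sum(sums)
        log_prior = math.log(len(rows) / len(labels))
        model[label] = (log_prior, [math.log(s / total) for s in sums])
    return model


def predict(model, vector):
    """Return the class with the highest posterior log-score."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(x * w for x, w in zip(vector, log_likelihood))
    return max(model, key=score)


# Toy data (hypothetical): two GC-rich and two AT-rich sequences.
K = 2
training = [
    ("GCGCGGCC", "gc"),
    ("CCGGCGCG", "gc"),
    ("ATATTAAT", "at"),
    ("TTAATATA", "at"),
]
features = [kmer_proportions(seq, K) for seq, _ in training]
targets = [label for _, label in training]
model = train_naive_bayes(features, targets)
print(predict(model, kmer_proportions("GGCCGCGC", K)))
```

The feature vector has \(4^k\) entries, so its length grows quickly with k. The choice of k therefore trades discriminative detail against memory and running time.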
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kumar, S. (2021). Gene Sequence Classification Using K-mer Decomposition and Soft-Computing-Based Approach. In: Sharma, T.K., Ahn, C.W., Verma, O.P., Panigrahi, B.K. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol 1381. Springer, Singapore. https://doi.org/10.1007/978-981-16-1696-9_17
DOI: https://doi.org/10.1007/978-981-16-1696-9_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1695-2
Online ISBN: 978-981-16-1696-9
eBook Packages: Intelligent Technologies and Robotics (R0)