Abstract
The SARS-CoV-2 virus has demonstrated its ability to adapt and spread in various environments, making it a challenging target for identification and prediction. While current studies in the field concentrate on transcriptome sequence classification to identify the virus, circular RNAs (circRNAs) have shown potential as diagnostic markers for viral diseases. These single-stranded, covalently closed RNA molecules possess unique features such as RNA-binding capacity and expression regulation, making them a promising source of biomarkers for a new classification model. In this study, we propose a circRNA-based classification model that uses the dna2vec algorithm to extract distributed representations of variable-length k-mers, combined with classical machine learning algorithms. The model performs strongly, with the Random Forest classifier achieving an accuracy of 99.99%, highlighting the efficacy of circRNA-based classification for SARS-CoV-2 identification and the potential of circRNAs as diagnostic markers for viral diseases.
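The feature-extraction idea named in the abstract (variable-length k-mers mapped to distributed vectors, then passed to a classical classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the k range of 3 to 8, the DNA-style alphabet, the toy lookup table standing in for trained dna2vec vectors, and the mean-pooling step are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence


def variable_length_kmers(seq: str, k_low: int = 3, k_high: int = 8,
                          seed: int = 0) -> List[str]:
    """Split a sequence into overlapping k-mers of varying length.

    At each start position a length k is drawn uniformly from
    [k_low, k_high]; k-mers that would run past the end are skipped.
    The k range here is an illustrative assumption.
    """
    rng = random.Random(seed)
    kmers = []
    for start in range(len(seq)):
        k = rng.randint(k_low, k_high)
        if start + k <= len(seq):
            kmers.append(seq[start:start + k])
    return kmers


def mean_embedding(kmers: Sequence[str],
                   table: Dict[str, Sequence[float]],
                   dim: int) -> List[float]:
    """Average the vectors of all k-mers found in the lookup table.

    `table` stands in for trained k-mer embeddings (hypothetical here);
    k-mers missing from it are ignored. Mean pooling is an assumption,
    not a step confirmed by the abstract.
    """
    total = [0.0] * dim
    hits = 0
    for kmer in kmers:
        vec = table.get(kmer)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
        hits += 1
    if hits == 0:
        return total
    return [t / hits for t in total]


if __name__ == "__main__":
    # Toy sequence and toy random vectors, for demonstration only.
    sequence = "ACGTTGCAACGTAGCTAGCTA"
    kmers = variable_length_kmers(sequence, k_low=3, k_high=5, seed=42)
    rng = random.Random(7)
    toy_table = {k: [rng.uniform(-1, 1) for _ in range(4)] for k in set(kmers)}
    features = mean_embedding(kmers, toy_table, dim=4)
    print(len(kmers), features)
```

Each sequence then becomes one fixed-length feature vector, which could be given to a classical classifier such as scikit-learn's `RandomForestClassifier`, with one label per virus.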
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vinayak, M., Anandaram, H., Sachin Kumar, S., Soman, K.P. (2023). Circ RNA Based Classification of SARS CoV-2, SARS CoV-1 and MERS-CoV Using Machine Learning. In: Singh, M., Tyagi, V., Gupta, P., Flusser, J., Ören, T. (eds) Advances in Computing and Data Sciences. ICACDS 2023. Communications in Computer and Information Science, vol 1848. Springer, Cham. https://doi.org/10.1007/978-3-031-37940-6_35
DOI: https://doi.org/10.1007/978-3-031-37940-6_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37939-0
Online ISBN: 978-3-031-37940-6
eBook Packages: Computer Science, Computer Science (R0)