Abstract
The COrona VIrus Disease (COVID-19) pandemic has seen several variants emerge over time, which has made understanding COVID-19 sequence data increasingly important. In this chapter, we propose an alignment-free, k-mer-based LSTM (Long Short-Term Memory) deep learning model that classifies 20 different COVID-19 variants. We handle the class imbalance problem by sampling a fixed number of sequences for each class label. We address the vanishing gradient problem that LSTMs face on long sequences by dividing each sequence into fixed-length segments and obtaining results on individual runs. Our results show that, with tuned hyperparameters, the one-vs-all classifiers reach test accuracies as high as 92.5%, higher than those of the multi-class classifier model. The one-vs-all classifiers achieve higher overall accuracies for B.1.1.214, B.1.177.21, B.1.1.7, B.1.526, and P.1, suggesting the presence of distinct mutations in these variants. Changing the embedding vector size and batch size yields insignificant improvements in accuracy, whereas changing from 2-mers to 3-mers mostly improves accuracy. An analysis of individual runs also shows that most accuracies improved after the 20th run, indicating that the corresponding sequence positions may contribute more to distinguishing among COVID-19 variants.
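The preprocessing steps named in the abstract (k-mer tokenization, fixed-size sampling per class, splitting long sequences into fixed-length segments, and one-vs-all labels) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. The following choices are all illustrative: overlapping k-mers with stride 1, an integer vocabulary that reserves 0 for k-mers containing ambiguous bases such as N, dropping the final partial segment, and skipping classes with fewer than n sequences. Every function name is hypothetical.

```python
import random
from itertools import product


def kmers(seq, k):
    """Split a nucleotide sequence into overlapping k-mers (stride 1 is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer id starting at 1.

    Id 0 is reserved (an assumption) for k-mers with ambiguous bases such as N.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, k, vocab):
    """Turn a sequence into integer k-mer ids suitable for an embedding layer."""
    return [vocab.get(km, 0) for km in kmers(seq, k)]


def chunk(tokens, length):
    """Divide a token list into consecutive fixed-length segments, dropping the short tail."""
    return [tokens[i:i + length] for i in range(0, len(tokens) - length + 1, length)]


def sample_per_class(records, n, seed=0):
    """Balance classes by drawing exactly n (sequence, label) pairs per label.

    Labels with fewer than n sequences are skipped (a hypothetical policy).
    """
    rng = random.Random(seed)
    by_label = {}
    for seq, label in records:
        by_label.setdefault(label, []).append(seq)
    balanced = []
    for label in sorted(by_label):
        seqs = by_label[label]
        if len(seqs) >= n:
            balanced.extend((s, label) for s in rng.sample(seqs, n))
    return balanced


def one_vs_all_labels(labels, positive):
    """Binary targets for a one-vs-all classifier: 1 for the chosen variant, 0 otherwise."""
    return [1 if lab == positive else 0 for lab in labels]


if __name__ == "__main__":
    vocab = build_vocab(2)
    tokens = encode("ACGTNACG", 2, vocab)
    print(tokens)            # [2, 7, 12, 0, 0, 2, 7]
    print(chunk(tokens, 3))  # [[2, 7, 12], [0, 0, 2]]
```

In this reading, the i-th segment of every sampled sequence would serve as the input for the i-th run of an embedding-plus-LSTM model. Under that assumption, per-run accuracies can be compared across sequence positions, as in the run-level analysis described above.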
Acknowledgements
This project has been funded by the Jump ARCHES endowment through the Health Care Engineering Systems Center.
This work uses resources from GISAID (https://www.gisaid.org). We would like to acknowledge all laboratories that have contributed their COVID-19 sequence data to GISAID.
This work utilizes resources supported by the National Science Foundation’s Major Research Instrumentation program, grant #1725729, as well as the University of Illinois at Urbana-Champaign.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Basu, S., Campbell, R.H. (2023). Classifying COVID-19 Variants Based on Genetic Sequences Using Deep Learning Models. In: Wang, L., Pattabiraman, K., Di Martino, C., Athreya, A., Bagchi, S. (eds) System Dependability and Analytics. Springer Series in Reliability Engineering. Springer, Cham. https://doi.org/10.1007/978-3-031-02063-6_19
DOI: https://doi.org/10.1007/978-3-031-02063-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02062-9
Online ISBN: 978-3-031-02063-6
eBook Packages: Engineering, Engineering (R0)