Classifying COVID-19 Variants Based on Genetic Sequences Using Deep Learning Models

System Dependability and Analytics

Part of the book series: Springer Series in Reliability Engineering (RELIABILITY)

Abstract

The COrona VIrus Disease (COVID-19) pandemic has seen several variants emerge over time, which has made understanding COVID-19 sequence data increasingly important. In this chapter, we propose an alignment-free, k-mer-based LSTM (Long Short-Term Memory) deep learning model that classifies 20 different COVID-19 variants. We address class imbalance by sampling a fixed number of sequences for each class label. We mitigate the vanishing gradient problem that LSTMs face on long sequences by dividing each sequence into fixed-length segments and obtaining results on individual runs over these segments. Our results show that one-vs-all classifiers reach test accuracies as high as 92.5% with tuned hyperparameters, outperforming the multi-class classifier model. The one-vs-all classifiers achieve higher overall accuracies for B.1.1.214, B.1.177.21, B.1.1.7, B.1.526, and P.1, suggesting that these variants carry distinct mutations. Changing the embedding vector size and the batch size yields no significant improvement in accuracy, whereas moving from 2-mers to 3-mers improves accuracy in most cases. An analysis of individual runs also shows that most accuracies improve after the 20th run, indicating that the corresponding sequence positions may contribute more to distinguishing among COVID-19 variants.
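The preprocessing steps summarized above (sampling a fixed number of sequences per variant, alignment-free k-mer tokenization, splitting long sequences into fixed-length segments, and building one-vs-all targets) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the overlapping stride-1 k-mers, the padding and unknown-token ids, splitting after tokenization rather than before, and every function name below are hypothetical choices made for the example.

```python
import random
from collections import defaultdict
from itertools import product

PAD_ID = 0  # hypothetical: id used to pad the last, shorter segment
UNK_ID = 1  # hypothetical: id for k-mers containing ambiguous bases such as N


def balanced_sample(records, per_class, seed=0):
    """Undersample so every variant label contributes exactly `per_class` sequences.

    `records` is a list of (sequence, label) pairs.
    """
    by_label = defaultdict(list)
    for seq, label in records:
        by_label[label].append(seq)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for label in sorted(by_label):
        seqs = by_label[label]
        if len(seqs) < per_class:
            raise ValueError(f"label {label!r} has only {len(seqs)} sequences")
        sample.extend((s, label) for s in rng.sample(seqs, per_class))
    return sample


def build_vocab(k, alphabet="ACGT"):
    """Map every k-mer over `alphabet` to an integer id, starting after PAD/UNK."""
    return {"".join(p): i + 2 for i, p in enumerate(product(alphabet, repeat=k))}


def encode_kmers(seq, k, vocab):
    """Tokenize `seq` into overlapping k-mers (stride 1) and look up their ids."""
    return [vocab.get(seq[i:i + k], UNK_ID) for i in range(len(seq) - k + 1)]


def split_into_segments(ids, seg_len):
    """Cut a long id sequence into consecutive fixed-length segments, padding the last."""
    segments = []
    for start in range(0, len(ids), seg_len):
        seg = ids[start:start + seg_len]
        segments.append(seg + [PAD_ID] * (seg_len - len(seg)))
    return segments


def one_vs_all_labels(labels, positive):
    """Binary targets for a one-vs-all classifier: 1 for `positive`, 0 otherwise."""
    return [1 if y == positive else 0 for y in labels]


if __name__ == "__main__":
    # Toy records standing in for GISAID sequences (illustrative only).
    records = [("ACGTACGTAC", "B.1.1.7"), ("ACGTTTGTAC", "B.1.1.7"),
               ("TTGCAACGTA", "P.1"), ("TTGCANCGTA", "P.1"), ("TTGCAACGTT", "P.1")]
    data = balanced_sample(records, per_class=2)
    vocab = build_vocab(k=3)
    for seq, label in data:
        segments = split_into_segments(encode_kmers(seq, 3, vocab), seg_len=4)
        print(label, segments)
```

In a full pipeline, each padded segment would be fed as a sequence of integer ids to an embedding layer followed by an LSTM, with k (2 or 3), the embedding vector size, and the batch size treated as the hyperparameters the abstract discusses. Undersampling to a fixed count per label discards some data but gives every variant equal weight during training.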

Acknowledgements

This project has been funded by the Jump ARCHES endowment through the Health Care Engineering Systems Center.

This work uses resources from GISAID (https://www.gisaid.org). We would like to acknowledge all laboratories that have contributed their COVID-19 sequence data to GISAID.

This work utilizes resources supported by the National Science Foundation’s Major Research Instrumentation program, grant #1725729, as well as the University of Illinois at Urbana-Champaign.

Author information

Corresponding author

Correspondence to Roy H. Campbell.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Basu, S., Campbell, R.H. (2023). Classifying COVID-19 Variants Based on Genetic Sequences Using Deep Learning Models. In: Wang, L., Pattabiraman, K., Di Martino, C., Athreya, A., Bagchi, S. (eds) System Dependability and Analytics. Springer Series in Reliability Engineering. Springer, Cham. https://doi.org/10.1007/978-3-031-02063-6_19

  • DOI: https://doi.org/10.1007/978-3-031-02063-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-02062-9

  • Online ISBN: 978-3-031-02063-6

  • eBook Packages: Engineering, Engineering (R0)
