Futuristic Methods in Virus Genome Evolution Using the Third-Generation DNA Sequencing and Artificial Neural Networks

  • Hyunjin Shim


Third-Generation DNA sequencing has emerged in the last few years, using new technologies that allow the production of long-read sequences. Applications of Third-Generation sequencing enable real-time data production, changing research paradigms in environmental sampling and facilitating medical sampling in virology. To take full advantage of the large-scale data generated by long-read sequencing, innovation in downstream data analysis is necessary. Here, we discuss futuristic methods that use machine learning to analyze big genetic data. We discuss the future of twenty-first-century virology by presenting advanced approaches for virus studies that combine real-time data production and on-site data analysis with Third-Generation sequencing and machine learning methods. We first introduce the basic concepts of conventional statistical models and methods in virology, building gradually toward the need to innovate downstream data analysis to keep pace with advances in sequencing technologies. We argue that artificial neural networks can innovate downstream data analysis because they can learn from big datasets without model assumptions or feature specifications, in contrast to current data analysis in bioinformatics. Furthermore, we discuss how futuristic methods using artificial neural networks, combined with long-read sequences, can revolutionize virus studies, using specific examples in supervised and unsupervised settings.
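As a minimal illustration of the supervised setting, the sketch below trains a very small artificial neural network directly on one-hot encoded nucleotide strings, with no hand-crafted summary statistics. It is not the method of this chapter: the toy task (separating GC-rich from AT-rich synthetic reads), the network size, the learning rate, and every name (`one_hot`, `make_toy_data`, `TinyNet`, `mean_loss`, `accuracy`) are assumptions made only for illustration.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Flatten a nucleotide string into a 0/1 vector, four entries per base."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec

def make_toy_data(rng, n_per_class=40, length=12):
    """Synthetic reads (hypothetical task): label 1 is GC-rich, label 0 is AT-rich."""
    data = []
    for label, weights in ((1, [1, 4, 4, 1]), (0, [4, 1, 1, 4])):
        for _ in range(n_per_class):
            seq = "".join(rng.choices(NUCLEOTIDES, weights=weights, k=length))
            data.append((one_hot(seq), label))
    rng.shuffle(data)
    return data

class TinyNet:
    """One hidden tanh layer and a sigmoid output, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        dz = p - y  # gradient of the cross-entropy loss at the output pre-activation
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # backpropagate through tanh
            self.w2[j] -= lr * dz * hj
            self.b1[j] -= lr * dh
            for i, xi in enumerate(x):
                if xi:  # one-hot input: only active positions receive a gradient
                    self.w1[j][i] -= lr * dh
        self.b2 -= lr * dz

def mean_loss(net, data):
    """Average binary cross-entropy over a dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = net.forward(x)[1]
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def accuracy(net, data):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    return sum((net.forward(x)[1] >= 0.5) == bool(y) for x, y in data) / len(data)

rng = random.Random(0)
data = make_toy_data(rng)
train, held_out = data[:60], data[60:]
net = TinyNet(n_in=4 * 12, n_hidden=4, rng=rng)

loss_before = mean_loss(net, train)
for _ in range(50):
    for x, y in train:
        net.train_step(x, y, lr=0.1)
loss_after = mean_loss(net, train)

print(f"training loss: {loss_before:.3f} -> {loss_after:.3f}")
print(f"held-out accuracy: {accuracy(net, held_out):.2f}")
```

The point of the sketch is only that the network receives the raw encoded sequence and learns its own internal features. Real long-read data would call for a far larger model, a proper validation scheme, and handling of variable read lengths and sequencing errors.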


Keywords: Artificial neural networks; Supervised learning; Unsupervised learning; Third-Generation DNA sequencing; Long-read DNA/RNA; Experimental evolution; Global virology; Likelihood-free; Model-free; Data-driven; Big data in virology



We thank Sunil Kumar Dogga, Ana K. Pitol, and Hyun Jeong Shim for helpful discussions.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Earth and Planetary Sciences, University of California, Berkeley, Berkeley, USA
