Improved Method for Rooting and Tip-Dating a Viral Phylogeny

Handbook of Statistical Bioinformatics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

Each viral outbreak caused by a zoonotic transmission raises two urgent questions, “When” and “Where”. The “When” question concerns the time of the zoonotic event, and the “Where” question concerns its geographic location. Both questions become difficult to answer when there is no good outgroup for rooting the viral tree. Viral outbreaks and the subsequent intensive sequencing of viral genomes typically yield many nearly identical viral strains isolated from human patients, with no closely related virus of animal origin to serve as an outgroup for rooting the tree. For example, SARS-CoV-2 genomes are very closely related to one another, with an average distance of ~0.0002, whereas the most closely related virus of animal origin (RaTG13, from a bat) has a sequence divergence of about 0.04. Including such a distant relative in the tree would effectively shrink the SARS-CoV-2 genomes into a single point, so that the tree would be roughly equivalent to a single branch with RaTG13 at one end and all SARS-CoV-2 genomes at the other. Based on the assumption of a constant molecular clock, a least-squares method for rooting a viral phylogeny without an outgroup was previously developed and applied to the “When” and “Where” questions. However, the assumption of a constant evolutionary rate is often violated during viral evolution, especially when the viral population size increases during the initial spread but then decreases dramatically under isolation and mass-vaccination measures. I present an extended method that models the evolutionary rate as a linear function of time instead of as a constant. This substantially improves the accuracy of dating the common ancestor of sampled SARS-CoV-2 genomes. Based on two large viral trees, one with 83,688 leaves and the other with 455,251 leaves, the common ancestor was dated to May 27, 2019, and June 2, 2019, respectively.
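The difference between the two clock models can be illustrated with a small root-to-tip regression sketch. It is not the chapter's own implementation (TRAD/DAMBE) and its function names (`date_root`, `fit_poly`, `_solve`) are hypothetical. It assumes the tree is already rooted and that, for every tip, the sampling date t_i (in decimal years) and the root-to-tip distance d_i have already been extracted; the rooting step itself (choosing where to place the root) is not shown. Under a constant rate r, d_i = r(t_i - t_0) is a straight line whose zero crossing estimates the ancestor date t_0. If the rate is instead assumed to be linear in time, r(t) = a + bt, then integrating it from t_0 to t_i makes d_i a quadratic in t_i, and t_0 is taken as the root of the fitted quadratic at which the rate is positive.

```python
import math


def _solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poly(x, d, degree):
    """Ordinary least-squares fit of d ~ c0 + c1*x + ... via the normal equations."""
    k = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xtd = [sum(di * xi ** i for xi, di in zip(x, d)) for i in range(k)]
    return _solve(xtx, xtd)


def date_root(times, dists, model="linear_rate"):
    """Estimate the date at which the root-to-tip distance extrapolates to zero.

    model="constant_rate": d = r * (t - t0), a straight line in t.
    model="linear_rate":   r(t) = a + b*t, so d is the integral of r from t0
                           to t, which is a quadratic in t.
    """
    if model not in ("constant_rate", "linear_rate"):
        raise ValueError(f"unknown model: {model}")
    tbar = sum(times) / len(times)
    x = [t - tbar for t in times]  # centre times so the normal equations stay well conditioned
    if model == "constant_rate":
        c0, c1 = fit_poly(x, dists, 1)
        return tbar - c0 / c1
    c0, c1, c2 = fit_poly(x, dists, 2)
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc < 0:
        raise ValueError("fitted curve never reaches zero distance")
    s = math.sqrt(disc)
    # Of the two roots, keep the one where the rate dd/dt = c1 + 2*c2*x is
    # positive (it equals +s there). This form also avoids dividing by c2 ~ 0.
    return tbar - 2.0 * c0 / (c1 + s)


if __name__ == "__main__":
    # Synthetic, noise-free toy data (hypothetical numbers, not SARS-CoV-2 data):
    # ancestor at 2019.4, rate 0.0005 at the root, rising by 0.0006 per year.
    t0, alpha, beta = 2019.4, 0.0005, 0.0006
    times = [2020.0 + 0.1 * i for i in range(11)]
    dists = [alpha * (t - t0) + 0.5 * beta * (t - t0) ** 2 for t in times]
    print(round(date_root(times, dists, "linear_rate"), 4))    # 2019.4
    print(round(date_root(times, dists, "constant_rate"), 4))  # about 2019.687 (too late)
```

On this toy data, where the rate rises over time, the straight-line fit extrapolates to a date several months after the true ancestor, while the quadratic fit recovers it. Real data are noisy and the rate need not change linearly, so this only illustrates why a non-constant rate can shift the estimated date.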



Acknowledgments

This research was funded by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN/2018-03878). I thank G. B. Golding, B. Foley, D. Gray, A. Rambaut, Y. Wei, Z. Xie, and J. Xu for discussions, and Henry Lu for his invitation to write this paper.

Author information

Correspondence to Xuhua Xia.

Copyright information

© 2022 The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature

About this chapter

Cite this chapter

Xia, X. (2022). Improved Method for Rooting and Tip-Dating a Viral Phylogeny. In: Lu, H.HS., Schölkopf, B., Wells, M.T., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-65902-1_19