
Identifiability and Reconstructibility of Species Phylogenies Under a Modified Coalescent

Abstract

Coalescent models of evolution account for incomplete lineage sorting by specifying a species tree parameter that determines a distribution on gene trees and, consequently, a probability distribution on site patterns. It has been shown that the unrooted topology of the species tree parameter of the multispecies coalescent is generically identifiable, and a reconstruction method called SVDQuartets has been developed to infer this topology. In this paper, we describe a modified multispecies coalescent model that allows for varying effective population sizes and violations of the molecular clock. We show that the unrooted topology of the species tree parameter under this modified model is generically identifiable and that SVDQuartets can still be used to infer this topology.
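The quartet-scoring idea behind SVDQuartets can be sketched roughly as follows. For four taxa, the site pattern probabilities form a 4×4×4×4 array. Each of the three possible unrooted splits of the quartet rearranges ("flattens") that array into a 16×16 matrix, and the split whose flattening lies closest to a matrix of rank at most 10 is selected. The rank bound of 10 for four-state data and the Frobenius-distance score are taken here as assumptions from the SVDQuartets literature. The function names (`site_pattern_frequencies`, `flattening`, `svd_score`, `best_split`) are hypothetical and do not come from the paper's implementation. The sketch handles a single quartet and ignores any site that contains a non-ACGT character.

```python
import numpy as np

BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}

# The three unrooted splits of a quartet whose taxa are labelled 0..3.
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def site_pattern_frequencies(alignment):
    """Empirical 4x4x4x4 site pattern frequencies for four aligned sequences.

    Assumption: columns containing any character outside ACGT are skipped.
    """
    counts = np.zeros((4, 4, 4, 4))
    for column in zip(*alignment):
        if all(c in INDEX for c in column):
            counts[tuple(INDEX[c] for c in column)] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def flattening(p, split):
    """16x16 matrix: rows indexed by the joint states of the two taxa on one
    side of the split, columns by the joint states of the other two."""
    left, right = split
    return np.transpose(p, axes=left + right).reshape(16, 16)


def svd_score(p, split, rank=10):
    """Frobenius distance from the flattening to the nearest matrix of the
    given rank (Eckart-Young): root sum of squares of trailing singular values."""
    sigma = np.linalg.svd(flattening(p, split), compute_uv=False)
    return float(np.sqrt(np.sum(sigma[rank:] ** 2)))


def best_split(p):
    """Return the split with the smallest score, together with all scores."""
    scores = {split: svd_score(p, split) for split in SPLITS}
    return min(scores, key=scores.get), scores
```

A typical call is `best_split(site_pattern_frequencies(alignment))`, where `alignment` is a list of four equal-length DNA strings. Computing only singular values (`compute_uv=False`) suffices, because the score depends on nothing else.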


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5


Acknowledgements

This research has been supported in part by the Mathematical Biosciences Institute and the National Science Foundation under Grant DMS 1440386. We would like to thank the two anonymous reviewers for several helpful remarks.

Author information


Correspondence to Colby Long.


About this article


Cite this article

Long, C., Kubatko, L. Identifiability and Reconstructibility of Species Phylogenies Under a Modified Coalescent. Bull Math Biol 81, 408–430 (2019). https://doi.org/10.1007/s11538-018-0456-9


  • DOI: https://doi.org/10.1007/s11538-018-0456-9

Keywords

  • Molecular clock
  • SVDQuartets
  • Multispecies Coalescent