Abstract
Coalescent models of evolution account for incomplete lineage sorting by specifying a species tree parameter that determines a distribution on gene trees and, consequently, a site pattern probability distribution. It has been shown that the unrooted topology of the species tree parameter of the multispecies coalescent is generically identifiable, and a reconstruction method called SVDQuartets has been developed to infer this topology. In this paper, we describe a modified multispecies coalescent model that allows for varying effective population size and violations of the molecular clock. We show that the unrooted topology of the species tree parameter for these models is generically identifiable and that SVDQuartets can still be used to infer this topology.
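To make the quartet-based idea concrete, the following is a minimal sketch of the linear-algebra step that SVDQuartets-style inference relies on. It is not the authors' implementation. It assumes, as a fact not stated in this abstract, that for four taxa the 16 x 16 flattening of the site pattern probability array along the true split has rank at most 10, so the split whose flattening is closest to rank 10 is preferred. The function names `flattening`, `svd_score`, and `best_split`, and the toy input `toy`, are hypothetical and introduced only for illustration.

```python
import numpy as np


def flattening(tensor, split):
    """Reshape a 4x4x4x4 site-pattern array into a 16x16 matrix for a split.

    `split` is a pair of index pairs, e.g. ((0, 1), (2, 3)) for taxa 01|23:
    the first pair indexes the rows, the second pair indexes the columns.
    """
    rows, cols = split
    return np.transpose(tensor, rows + cols).reshape(16, 16)


def svd_score(matrix, rank=10):
    """Frobenius distance from `matrix` to the nearest matrix of rank `rank`.

    The threshold rank=10 is the assumption described in the lead-in.
    """
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sqrt(np.sum(singular_values[rank:] ** 2)))


def best_split(tensor):
    """Score the three possible quartet splits and return the lowest-scoring one."""
    splits = {
        "01|23": ((0, 1), (2, 3)),
        "02|13": ((0, 2), (1, 3)),
        "03|12": ((0, 3), (1, 2)),
    }
    scores = {name: svd_score(flattening(tensor, s)) for name, s in splits.items()}
    return min(scores, key=scores.get), scores


# Toy input (NOT simulated under any coalescent model): an array built so that
# its 01|23 flattening has rank at most 3, which mimics a low-rank true split.
rng = np.random.default_rng(0)
toy = sum(
    np.einsum("ab,cd->abcd", rng.random((4, 4)), rng.random((4, 4)))
    for _ in range(3)
)
toy /= toy.sum()  # normalize so the entries sum to 1, like a probability array

name, scores = best_split(toy)
print(name, scores)
```

With real data, the array would instead hold observed site pattern frequencies for a chosen quartet of taxa, and the scores would only be approximately zero for the best split because of sampling noise.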
Acknowledgements
This research has been supported in part by the Mathematical Biosciences Institute and the National Science Foundation under Grant DMS 1440386. We would like to thank the two anonymous reviewers for several helpful remarks.
Cite this article
Long, C., Kubatko, L. Identifiability and Reconstructibility of Species Phylogenies Under a Modified Coalescent. Bull Math Biol 81, 408–430 (2019). https://doi.org/10.1007/s11538-018-0456-9
DOI: https://doi.org/10.1007/s11538-018-0456-9