The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, and these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed: the ‘equal input model’. This model generalizes the ‘Felsenstein 1981’ model (and thereby the Jukes–Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a ‘random cluster’ process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called ‘model invariants’), on any number \(n\) of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of \(n=4\) leaves. By combining techniques from discrete random processes and (multi-)linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167–191, 1987).
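As a rough illustration of the model class named above, the following sketch builds the transition matrix of an equal input model on a finite state space. It assumes the standard parametrisation in which the substitution rate from state i to a different state j is proportional to the stationary frequency of j, so that \(P_{ij}(t) = e^{-\mu t}\delta_{ij} + (1 - e^{-\mu t})\pi_j\). The function name `equal_input_transition` and the parameter `rate` (playing the role of \(\mu\)) are illustrative choices, not notation taken from the paper.

```python
import math


def equal_input_transition(pi, t, rate=1.0):
    """Transition matrix P(t) of an equal input model (illustrative sketch).

    Assumption: the rate of change from state i to a different state j is
    rate * pi[j], which yields
        P[i][j] = exp(-rate * t) * [i == j] + (1 - exp(-rate * t)) * pi[j].
    """
    if any(p < 0 for p in pi) or not math.isclose(sum(pi), 1.0):
        raise ValueError("pi must be a probability distribution")
    if t < 0 or rate < 0:
        raise ValueError("t and rate must be non-negative")

    # Probability that no substitution event occurs on a branch of length t.
    stay = math.exp(-rate * t)
    k = len(pi)
    return [
        [(stay if i == j else 0.0) + (1.0 - stay) * pi[j] for j in range(k)]
        for i in range(k)
    ]


if __name__ == "__main__":
    # Four states with unequal frequencies: a Felsenstein 1981-type model.
    f81 = equal_input_transition([0.1, 0.2, 0.3, 0.4], t=0.7)
    for row in f81:
        print([round(x, 4) for x in row])

    # Uniform frequencies on four states reduce to the Jukes-Cantor form.
    jc = equal_input_transition([0.25] * 4, t=0.7)
    print(round(jc[0][0], 4), round(jc[0][1], 4))
```

Read this way, each branch either copies the parent state (with probability \(e^{-\mu t}\)) or draws a fresh state from \(\pi\), which is one informal way to picture the ‘random cluster’ description mentioned in the abstract.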
Allman ES, Rhodes JA, Sullivant S (2012) When do phylogenetic mixture models mimic other phylogenetic models? Syst Biol 61:1049–1059
Casanellas M, Fernández-Sánchez J, Kedzierska AM (2012) The space of phylogenetic mixtures for equivariant models. Algorithms Mol Biol 7:33
Casanellas M, Fernández-Sánchez J (2011) Relevant phylogenetic invariants of evolutionary models. J Math Pures Appl 96:207–229
Chang JT (1996) Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math Biosci 137:51–73
Felsenstein J (2004) Inferring Phylogenies. Sinauer Associates, Sunderland
Fernández-Sánchez J, Casanellas M (2016) Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages. Syst Biol 65:280–291
Fu YX (1995) Linear invariants under Jukes’ and Cantor’s one-parameter model. J Theor Biol 173:339–352
Fu YX, Li WH (1991) Necessary and sufficient conditions for the existence of certain quadratic invariants under a phylogenetic tree. Math Biosci 105:229–238
Kedzierska A, Drton M, Guigó R, Casanellas M (2012) SPIn: model selection for phylogenetic mixtures via linear invariants. Mol Biol Evol 29:929–937
Kemeny JG, Snell JL (1976) Finite Markov chains. Springer, New York
Lake J (1987) A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol 4:167–191
Matsen FA, Mossel E, Steel M (2008) Mixed-up trees: the structure of phylogenetic mixtures. Bull Math Biol 70:1115–1139
Mossel E, Steel M (2004) A phase transition for a random cluster model on phylogenetic trees. Math Biosci 187:189–203
Semple C, Steel M (2003) Phylogenetics. Oxford University Press, Oxford
Steel MA, Székely LA, Hendy MD (1994) Reconstructing trees when sequence sites evolve at variable rates. J Comput Biol 1:153–163
Steel M (2011) Can we avoid ‘SIN’ in the house of ‘no common mechanism’? Syst Biol 60:96–109
Steel MA, Fu YX (1995) Classifying and counting linear phylogenetic invariants for the Jukes–Cantor model. J Comput Biol 2:39–47
Štefankovič D, Vigoda E (2007) Phylogeny of mixture models: robustness of maximum likelihood and non-identifiable distributions. J Comput Biol 14:156–189
Sturmfels B, Sullivant S (2005) Toric ideals of phylogenetic invariants. J Comput Biol 12:204–228
We thank the two anonymous reviewers for their helpful comments on an earlier version of this manuscript. Part of this research was performed while MC was visiting the Biomathematics Research Center of the University of Canterbury. MC would like to thank the Biomathematics Research Center (and especially its director) for the invitation, the support provided, and the great working atmosphere. MC is partially supported by MTM2012-38122-C03-01, MTM2015-69135-P (MINECO/FEDER) and Generalitat de Catalunya 2014 SGR-634.
Cite this article
Casanellas, M., Steel, M. Phylogenetic mixtures and linear invariants for equal input models. J. Math. Biol. 74, 1107–1138 (2017). https://doi.org/10.1007/s00285-016-1055-8
- Phylogenetic tree
- Markov processes
- Linear invariants