Phylogenetic mixtures and linear invariants for equal input models

Abstract

The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, and these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes, the ‘equal input model’, that supports linear invariants once the stationary distribution is fixed. This model generalizes the ‘Felsenstein 1981’ model (and thereby the Jukes–Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a ‘random cluster’ process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called ‘model invariants’), on any number \(n\) of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of \(n=4\) leaves. By combining techniques from discrete random processes and (multi-)linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167–191, 1987).
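As an illustration of the model named in the abstract, the following minimal sketch assumes the standard description of the equal input model, in which the rate of substitution to a state \(j\) is proportional to its stationary frequency \(\pi_j\), whatever the current state. Under that assumption the transition matrix along an edge takes the form \(P_{ij} = e^{-\mu t}\delta_{ij} + (1-e^{-\mu t})\pi_j\). The function names and the single parameter `mu_t` (standing for the product \(\mu t\)) are hypothetical and are not taken from the paper.

```python
import math


def equal_input_transition(pi, mu_t):
    """Return P with P[i][j] = e^{-mu_t} * [i == j] + (1 - e^{-mu_t}) * pi[j].

    pi   : stationary distribution over k states (assumed strictly positive)
    mu_t : rate multiplied by edge length (hypothetical parameterisation)
    """
    if abs(sum(pi) - 1.0) > 1e-12 or min(pi) <= 0:
        raise ValueError("pi must be a strictly positive probability vector")
    if mu_t < 0:
        raise ValueError("mu_t must be non-negative")
    stay = math.exp(-mu_t)  # probability that no resampling event occurs
    k = len(pi)
    return [
        [stay * (1.0 if i == j else 0.0) + (1.0 - stay) * pi[j] for j in range(k)]
        for i in range(k)
    ]


def left_multiply(pi, P):
    """Return the row vector pi * P."""
    k = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]


# Four states with unequal frequencies (the setting of the Felsenstein 1981 model).
pi_f81 = [0.1, 0.2, 0.3, 0.4]
P_f81 = equal_input_transition(pi_f81, 0.5)

# Four states with uniform frequencies (the setting of the Jukes-Cantor model).
pi_jc = [0.25, 0.25, 0.25, 0.25]
P_jc = equal_input_transition(pi_jc, 0.5)
```

In this sketch each row of the matrix sums to one and `left_multiply(pi_f81, P_f81)` returns `pi_f81` up to rounding, so \(\pi\) is stationary. With uniform \(\pi\) all off-diagonal entries coincide, which is the Jukes–Cantor form, and nothing in the construction restricts the number of states to four.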

References

  1. Allman ES, Rhodes JA, Sullivant S (2012) When do phylogenetic mixture models mimic other phylogenetic models? Syst Biol 61:1049–1059

  2. Casanellas M, Fernández-Sánchez J, Kedzierska AM (2012) The space of phylogenetic mixtures for equivariant models. Algorithms Mol Biol 7:33

  3. Casanellas M, Fernández-Sánchez J (2011) Relevant phylogenetic invariants of evolutionary models. J Math Pures Appl 96:207–229

  4. Chang JT (1996) Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math Biosci 137:51–73

  5. Felsenstein J (2004) Inferring Phylogenies. Sinauer Associates, Sunderland

  6. Fernández-Sánchez J, Casanellas M (2016) Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages. Syst Biol 65:280–291

  7. Fu YX (1995) Linear invariants under Jukes’ and Cantor’s one-parameter model. J Theor Biol 173:339–352

  8. Fu YX, Li WH (1991) Necessary and sufficient conditions for the existence of certain quadratic invariants under a phylogenetic tree. Math Biosci 105:229–238

  9. Kedzierska A, Drton M, Guigó R, Casanellas M (2012) SPIn: model selection for phylogenetic mixtures via linear invariants. Mol Biol Evol 29:929–937

  10. Kemeny JG, Snell JL (1976) Finite Markov chains. Springer, New York

  11. Lake J (1987) A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol 4:167–191

  12. Matsen FA, Mossel E, Steel M (2008) Mixed-up trees: the structure of phylogenetic mixtures. Bull Math Biol 70:1115–1139

  13. Mossel E, Steel M (2004) A phase transition for a random cluster model on phylogenetic trees. Math Biosci 187:189–203

  14. Semple C, Steel M (2003) Phylogenetics. Oxford University Press, Oxford

  15. Steel MA, Székely LA, Hendy MD (1994) Reconstructing trees when sequence sites evolve at variable rates. J Comput Biol 1:153–163

  16. Steel M (2011) Can we avoid ‘SIN’ in the house of ‘no common mechanism’? Syst Biol 60:96–109

  17. Steel MA, Fu YX (1995) Classifying and counting linear phylogenetic invariants for the Jukes–Cantor model. J Comput Biol 2:39–47

  18. Štefakovič D, Vigoda E (2007) Phylogeny of mixture models: robustness of maximum likelihood and non-identifiable distributions. J Comput Biol 14:156–189

  19. Sturmfels B, Sullivant S (2005) Toric ideals of phylogenetic invariants. J Comput Biol 12:204–228

Acknowledgments

We thank the two anonymous reviewers for their helpful comments on an earlier version of this manuscript. Part of this research was performed while MC was visiting the Biomathematics Research Center of the University of Canterbury. MC would like to thank the Biomathematics Research Center (and especially its director) for the invitation, the support provided, and the great working atmosphere. MC is partially supported by MTM2012-38122-C03-01, MTM2015-69135-P (MINECO/FEDER) and Generalitat de Catalunya 2014 SGR-634.

Author information

Corresponding author

Correspondence to Mike Steel.

About this article

Cite this article

Casanellas, M., Steel, M. Phylogenetic mixtures and linear invariants for equal input models. J. Math. Biol. 74, 1107–1138 (2017). https://doi.org/10.1007/s00285-016-1055-8

Keywords

  • Phylogenetic tree
  • Markov processes
  • Linear invariants

Mathematics Subject Classification

  • 05C05
  • 60J28
  • 92D15