Algebraic Methods in Phylogenetics

  • Marta Casanellas
  • John A. Rhodes

To those outside the field, and even to some focused on empirical applications, phylogenetics may appear to have little to do with algebra. Probability and statistics are clearly important ingredients, as modeling and inferring evolutionary relationships motivate the field. Combinatorics is also an obvious component, as the graph-theoretic notions of trees, and more recently networks, are used to describe the relationships. But where does the algebra arise?

The models used in phylogenetics are necessarily complex. At their simplest, they depend on a tree structure together with Markov matrices describing changes in nucleotide sequences along the edges. These two components result in probability distributions given by rather complicated polynomials in the parameters of the models, whose precise form reflects the structure of the tree. Even following standard statistical paradigms for inference, efficient calculation, such as by the Felsenstein pruning algorithm (Felsenstein 1981) used in...
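To make the polynomial structure concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' construction. It uses a hypothetical three-leaf rooted tree, a two-state symmetric model instead of four nucleotides, and arbitrarily chosen edge probabilities. The names `TREE`, `markov_matrix`, `pattern_probability`, and `brute_force_probability` are invented for this example. The pruning recursion and the explicit sum over internal states compute the same site-pattern probability. The explicit sum shows that this probability is a polynomial in the Markov matrix entries, with one monomial per assignment of states to the internal nodes.

```python
from itertools import product

STATES = range(2)  # two states for brevity; nucleotide models would use four


def markov_matrix(p):
    """Symmetric 2-state Markov matrix with change probability p (an assumption)."""
    return [[1 - p, p], [p, 1 - p]]


# Hypothetical rooted tree: root -> (internal node "v", leaf "c"); v -> (leaf "a", leaf "b").
# Each entry is (child name, Markov matrix on the edge leading to that child).
TREE = {
    "root": [("v", markov_matrix(0.1)), ("c", markov_matrix(0.3))],
    "v": [("a", markov_matrix(0.2)), ("b", markov_matrix(0.25))],
}
ROOT_DIST = [0.5, 0.5]  # assumed root state distribution


def partial_likelihood(node, pattern):
    """Pruning step: L[x] = P(observed leaf states below node | node is in state x)."""
    if node not in TREE:  # leaf: indicator vector of the observed state
        return [1.0 if x == pattern[node] else 0.0 for x in STATES]
    result = [1.0 for _ in STATES]
    for child, M in TREE[node]:
        child_L = partial_likelihood(child, pattern)
        for x in STATES:
            result[x] *= sum(M[x][y] * child_L[y] for y in STATES)
    return result


def pattern_probability(pattern):
    """Probability of a site pattern at the leaves, computed by pruning."""
    L = partial_likelihood("root", pattern)
    return sum(ROOT_DIST[x] * L[x] for x in STATES)


def brute_force_probability(pattern):
    """Same probability written as an explicit polynomial in the matrix entries."""
    (_, M_v), (_, M_c) = TREE["root"]
    (_, M_a), (_, M_b) = TREE["v"]
    return sum(
        ROOT_DIST[r]
        * M_v[r][s]
        * M_c[r][pattern["c"]]
        * M_a[s][pattern["a"]]
        * M_b[s][pattern["b"]]
        for r, s in product(STATES, repeat=2)  # states at root and at v
    )


if __name__ == "__main__":
    for leaf_states in product(STATES, repeat=3):
        pattern = dict(zip("abc", leaf_states))
        print(pattern, pattern_probability(pattern), brute_force_probability(pattern))
```

On this small tree the two computations agree for every pattern. The explicit sum grows exponentially with the number of internal nodes, while the pruning recursion visits each edge once.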



Marta Casanellas is partially funded by AGAUR Project 2017 SGR-932, MINECO/FEDER Projects MTM2015-69135 and MDM-2014-0445. John A. Rhodes is supported by NIH grant R01 GM117590.


  1. Allman ES, Degnan JH, Rhodes JA (2018) Split probabilities and species tree inference under the multispecies coalescent model. Bull Math Biol 80(1):64–103
  2. Allman ES, Rhodes JA (2003) Phylogenetic invariants of the general Markov model of sequence mutation. Math Biosci 186:113–144
  3. Allman ES, Rhodes JA (2009) The identifiability of covarion models in phylogenetics. IEEE ACM Trans Comput Biol Bioinform 6:76–88
  4. Cavender JA, Felsenstein J (1987) Invariants of phylogenies in a simple case with discrete states. J Class 4:57–71
  5. Chifman J, Kubatko L (2014) Quartet inference from SNP data under the coalescent model. Bioinformatics 30(23):3317–3324
  6. Evans SN, Speed TP (1993) Invariants of some probability models used in phylogenetic inference. Ann Stat 21(1):355–377
  7. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
  8. Fernández-Sánchez J, Casanellas M (2016) Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages. Syst Biol 65(2):280–291
  9. Hendy MD, Penny D (1989) A framework for the quantitative study of evolutionary trees. Syst Zool 38:297–309
  10. Hendy MD, Penny D, Steel M (1994) A discrete Fourier analysis for evolutionary trees. Proc Natl Acad Sci 91:3339–3343
  11. Huelsenbeck JP (1995) Performance of phylogenetic methods in simulation. Syst Biol 44:17–48
  12. Lake JA (1987) A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol 4:167–191
  13. Pearson K (1894) Contributions to the mathematical theory of evolution. Philos Trans R Soc Lond A Math Phys Eng Sci 185:71–110
  14. Pachter L, Sturmfels B (2005) Algebraic statistics for computational biology. Cambridge University Press, New York
  15. Sturmfels B, Sullivant S (2005) Toric ideals of phylogenetic invariants. J Comput Biol 12:204–228

Copyright information

© Society for Mathematical Biology 2018

Authors and Affiliations

  1. Department of Mathematics, Universitat Politècnica de Catalunya, Barcelona, Spain
  2. Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, USA
