# Dimensional Reduction for the General Markov Model on Phylogenetic Trees

## Abstract

We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multilinear dependence in the full space. We discuss potential applications, including the computation of split (edge) weights on phylogenetic trees from observed sequence data.
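To make the scale of the reduction concrete, the following is a minimal sketch, not the invariant subspace constructed in the paper. It assumes a toy alignment and uses pairwise marginal tables as one simple family of linear combinations of site pattern counts whose total size grows quadratically with the number of taxa. The function names `site_pattern_counts` and `pairwise_marginal_counts` and the example alignment are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def site_pattern_counts(alignment):
    """Count each column (site pattern) of an alignment of equal-length strings."""
    return Counter(zip(*alignment))


def pairwise_marginal_counts(pattern_counts, n_taxa):
    """Sum site pattern counts down to one joint table per pair of taxa.

    Each table entry is a linear combination (with 0/1 coefficients)
    of the full site pattern counts.
    """
    tables = {pair: Counter() for pair in combinations(range(n_taxa), 2)}
    for pattern, count in pattern_counts.items():
        for i, j in tables:
            tables[(i, j)][(pattern[i], pattern[j])] += count
    return tables


# Hypothetical toy alignment: 4 taxa, 6 sites, 4 character states (A, C, G, T).
alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "GCGTAC"]
n_taxa, n_states = len(alignment), 4

counts = site_pattern_counts(alignment)
tables = pairwise_marginal_counts(counts, n_taxa)

# Number of possible site patterns: exponential in the number of taxa.
full_dim = n_states ** n_taxa
# Number of pairwise table entries: quadratic in the number of taxa.
reduced_dim = len(tables) * n_states * n_states
```

For four taxa the gap is modest, but the first quantity grows as the number of states raised to the number of taxa, whereas the second grows with the number of pairs of taxa. Whether a given family of linear combinations retains the phylogenetic signal is exactly the question the paper's invariant subspace addresses; this sketch makes no such claim for pairwise marginals.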

### Keywords

Representation theory, Markov chains, Affine group

## Notes

### Acknowledgements

This work was inspired by a question Alexei Drummond put to Barbara Holland during her presentation at the New Zealand Phylogenetics Meeting, DOOM 2016. I would also like to thank the anonymous reviewer for their careful and substantive comments, which led to a greatly improved manuscript.

**Funding** This work was supported by the Australian Research Council Discovery Early Career Fellowship DE130100423.
