The Algebra of the General Markov Model on Phylogenetic Trees and Networks
It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and an analogous extension of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges and, further, to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we first discuss the two-state case and then show that our results extend to more states with little complication. Intriguingly, upon restriction of the two-state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced, the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we argue that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages, a property that is of significant interest for biological applications.
Keywords: Split system, Cluster system, Markov process, Maximum likelihood
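To make the tree case of the construction concrete, the following is a minimal sketch of the two-state setting, not the paper's own formulation. It assumes a column-stochastic convention with rate generator Q = [[-a, b], [a, -b]], and it assumes the splitting operator acts on basis vectors as e_i -> e_i ⊗ e_i, so that a root distribution becomes a diagonal two-leaf array. The function names `transition_matrix`, `split`, and `two_leaf_distribution` are hypothetical, and the network extension discussed in the abstract is not covered.

```python
import math


def transition_matrix(a, b, t):
    """Return M(t) = exp(Q t) for Q = [[-a, b], [a, -b]] (assumed convention).

    Columns of Q sum to zero, so Q @ Q = -(a + b) Q and the exponential
    series collapses to I + (1 - exp(-(a + b) t)) / (a + b) * Q.
    """
    s = a + b
    if s == 0:
        return [[1.0, 0.0], [0.0, 1.0]]
    f = (1.0 - math.exp(-s * t)) / s
    q = [[-a, b], [a, -b]]
    return [[(1.0 if i == j else 0.0) + f * q[i][j] for j in range(2)]
            for i in range(2)]


def split(pi):
    """Assumed splitting operator: e_i -> e_i (x) e_i.

    A distribution pi on one lineage becomes a diagonal array on two
    identical copies of that lineage.
    """
    return [[pi[i] if i == j else 0.0 for j in range(2)] for i in range(2)]


def two_leaf_distribution(pi, m1, m2):
    """Joint leaf distribution (m1 (x) m2) applied to split(pi)."""
    d = split(pi)
    return [[sum(m1[i][k] * m2[j][l] * d[k][l]
                 for k in range(2) for l in range(2))
             for j in range(2)]
            for i in range(2)]


# Binary symmetric special case: equal rates and a uniform root distribution.
m1 = transition_matrix(0.3, 0.3, 1.0)
m2 = transition_matrix(0.3, 0.3, 2.0)
joint = two_leaf_distribution([0.5, 0.5], m1, m2)
```

Setting a = b restricts the sketch to the binary symmetric model mentioned in the abstract; unequal rates give the general two-state case on the same two-leaf tree.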