Abstract
Continuous-time Markov chains are a standard tool in phylogenetic inference. If homogeneity is assumed, the chain is formulated by specifying time-independent rates of substitutions between states in the chain. In applications, there are usually extra constraints on the rates, depending on the situation. If a model is formulated in this way, it is possible to generalise it and allow for an inhomogeneous process, with time-dependent rates satisfying the same constraints. It is then useful to require that, under some time restrictions, there exists a homogeneous average of this inhomogeneous process within the same model. This leads to the definition of “Lie Markov models” which, as we will show, are precisely the class of models where such an average exists. These models form Lie algebras and hence concepts from Lie group theory are central to their derivation. In this paper, we concentrate on applications to phylogenetics and nucleotide evolution, and derive the complete hierarchy of Lie Markov models that respect the grouping of nucleotides into purines and pyrimidines—that is, models with purine/pyrimidine symmetry. We also discuss how to handle the subtleties of applying Lie group methods, most naturally defined over the complex field, to the stochastic case of a Markov process, where parameter values are restricted to be real and positive. In particular, we explore the geometric embedding of the cone of stochastic rate matrices within the ambient space of the associated complex Lie algebra.
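The closure property behind the phrase "these models form Lie algebras" can be illustrated with a small check. The sketch below is not the authors' code. It assumes that a model is given as the linear span of a few basis rate matrices, that states are ordered A, G, C, T, and that rows sum to zero (the paper itself may use the column convention). The helper names `rate_matrix`, `bracket`, `rank` and `is_lie_closed` are hypothetical. The test is whether every commutator \([Q_1, Q_2] = Q_1 Q_2 - Q_2 Q_1\) of basis elements stays inside the span.

```python
from fractions import Fraction
from itertools import combinations

N = 4  # states ordered A, G, C, T (an ordering assumed for this sketch)


def rate_matrix(pairs):
    """Row-convention rate matrix with unit rate for each substitution (i, j)."""
    q = [[Fraction(0)] * N for _ in range(N)]
    for i, j in pairs:
        q[i][j] += 1
        q[i][i] -= 1  # keep each row sum equal to zero
    return q


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def bracket(a, b):
    """Lie bracket (commutator) [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(N)] for i in range(N)]


def flatten(m):
    return [x for row in m for x in row]


def rank(vectors):
    """Rank of a list of vectors by exact Gaussian elimination."""
    rows = [list(v) for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col] / rows[r][col]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_lie_closed(basis):
    """True if every pairwise commutator lies in the linear span of the basis."""
    vecs = [flatten(m) for m in basis]
    base_rank = rank(vecs)
    return all(rank(vecs + [flatten(bracket(a, b))]) == base_rank
               for a, b in combinations(basis, 2))


# Kimura (1980) two-parameter model: one rate for transitions, one for transversions.
transitions = rate_matrix([(0, 1), (1, 0), (2, 3), (3, 2)])
transversions = rate_matrix([(0, 2), (0, 3), (1, 2), (1, 3),
                             (2, 0), (2, 1), (3, 0), (3, 1)])
k80_basis = [transitions, transversions]

# A span that is not closed: only A->G and G->C substitutions are allowed.
chain_basis = [rate_matrix([(0, 1)]), rate_matrix([(1, 2)])]

print(is_lie_closed(k80_basis))
print(is_lie_closed(chain_basis))
```

The two Kimura basis matrices commute, so their span is trivially closed. The second span fails because the commutator of the A→G and G→C generators has non-zero entries in positions that neither generator occupies. Exact rational arithmetic is used so that the membership test does not depend on a floating-point tolerance.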
Notes
The reader may notice that we have changed the terminology of Sumner et al. (2012a) and we refer to the desired property as “locally multiplicative closure” instead of “multiplicative closure”. The problem of global multiplicative closure for a continuous-time Markov model is a deep problem related to the convergence of the Baker–Campbell–Hausdorff formula (see Blanes and Casas 2004). Notice that this is not a serious drawback as the nature of the problem is local.
Note that this group is isomorphic to the dihedral group \({\mathbf {D}}_4\), which describes the symmetries of a square. However, it also admits a more natural description in our setting as \({\mathfrak {S}}_2 \wr {\mathfrak {S}}_2\), the wreath product of \({\mathfrak {S}}_2\) with itself (see Rotman 1995, Chapter VII).
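The size and structure of this group can be checked by brute force. The sketch below assumes that "this group" means the permutations of the four nucleotides that preserve the partition into purines {A, G} and pyrimidines {C, T}, either fixing each block or swapping the two. The names `preserves_partition` and `order` are hypothetical helpers introduced for illustration.

```python
from itertools import permutations

NUCLEOTIDES = "AGCT"
PURINES, PYRIMIDINES = frozenset("AG"), frozenset("CT")


def preserves_partition(mapping):
    """True if purines map onto one whole block (purines or pyrimidines)."""
    image_of_purines = frozenset(mapping[x] for x in PURINES)
    return image_of_purines in (PURINES, PYRIMIDINES)


def order(mapping):
    """Smallest n >= 1 such that applying the permutation n times is the identity."""
    n, current = 1, dict(mapping)
    while any(current[x] != x for x in NUCLEOTIDES):
        current = {x: mapping[current[x]] for x in NUCLEOTIDES}
        n += 1
    return n


all_permutations = [dict(zip(NUCLEOTIDES, p)) for p in permutations(NUCLEOTIDES)]
symmetries = [m for m in all_permutations if preserves_partition(m)]

print(len(symmetries))                      # 8
print(sorted(order(m) for m in symmetries))  # [1, 2, 2, 2, 2, 2, 4, 4]
```

Eight elements with one identity, five involutions and two elements of order 4 is the order profile of \({\mathbf {D}}_4\). The wreath-product description corresponds to choosing independently whether to swap within each block (two copies of \({\mathfrak {S}}_2\)) and whether to swap the blocks themselves (the outer \({\mathfrak {S}}_2\)).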
References
Alexandrov AD (2005) Convex polyhedra. Springer Monographs in Mathematics. Springer, Berlin. ISBN 3-540-23158-7 (translated from the 1950 Russian edition by N. S. Dairbekov, S. S. Kutateladze and A. B. Sossinsky, with comments and bibliography by V. A. Zalgaller and appendices by L. A. Shor and Yu. A. Volkov)
Birkhoff G (1938) Analytical groups. Trans Am Math Soc 43(1):61–101. ISSN 0002-9947. doi:10.2307/1989902
Blanes S, Casas F (2004) On the convergence and optimization of the Baker–Campbell–Hausdorff formula. Linear Algebra Appl 378:135–158. ISSN 0024-3795. doi:10.1016/j.laa.2003.09.010
Bogopolski O (2008) Introduction to group theory. EMS Textbooks in Mathematics, European Mathematical Society (EMS), Zürich. ISBN 978-3-03719-041-8. doi:10.4171/041 (translated, revised and expanded from the Russian original)
Campbell JE (1897) On a law of combination of operators (second paper). Proc Lond Math Soc 28:381–390
Casanellas M, Fernández-Sánchez J (2010) Relevant phylogenetic invariants of evolutionary models. J Math Pure Appl 96:207–229
Casanellas M, Sullivant S (2005) The strand symmetric model. In: Algebraic statistics for computational biology. Cambridge University Press, New York, pp 305–321. doi:10.1017/CBO9780511610684.020
Casanellas M, Fernández-Sánchez J, Kedzierska A (2012) The space of phylogenetic mixtures for equivariant models. Algorithms Mol Biol 7:33
Davies EB (2010) Embeddable Markov matrices. Electron J Probab 15(47):1474–1486. ISSN 1083-6489. doi:10.1214/EJP.v15-733
Donten-Bury M, Michałek M (2012) Phylogenetic invariants for group-based models. J Algebr Stat 3(1):44–63. ISSN 1309-3452
Draisma J, Kuttler J (2008) On the ideals of equivariant tree models. Math Ann 344:619–644
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
Fernández-Sánchez J (2013) Code for Lie Markov models with purine/pyrimidine symmetry. http://www.pagines.ma1.upc.edu/jfernandez/purine_pyrimidine.html
Hasegawa M, Kishino H, Yano T (1988) Phylogenetic inference from DNA sequence data. Statistical theory and data analysis, II (Tokyo, 1986). North-Holland, Amsterdam
James G, Liebeck M (2001) Representations and characters of groups, 2nd edn. Cambridge University Press, New York
Johnson JE (1985) Markov-type Lie groups in \(GL(n,{R})\). J Math Phys 26:252–257
Jukes T, Cantor C (1969) Evolution of protein molecules. In: Mammalian protein metabolism, pp 21–132
Kimura M (1980) A simple method for estimating evolutionary rates of base substitution through comparative studies of nucleotide sequences. J Mol Evol 16:111–120
Kimura M (1981) Estimation of evolutionary distances between homologous nucleotide sequences. Proc Natl Acad Sci 78:1454–1458
Michałek M (2011) Geometry of phylogenetic group-based models. J Algebra 339:339–356. ISSN 0021-8693. doi:10.1016/j.jalgebra.2011.05.016
Posada D, Crandall KA (1998) Modeltest: testing the model of DNA substitution. Bioinformatics 14:817–818
Rotman J (1995) An introduction to the theory of groups, 4th edn, volume 148 of Graduate Texts in Mathematics. Springer, New York. ISBN 0-387-94285-8
Sagan BE (2001) The symmetric group: representations, combinatorial algorithms, and symmetric functions, 2nd edn. Graduate Texts in Mathematics. Springer, Berlin
Semple C, Steel M (2003) Phylogenetics. Oxford University Press, Oxford
Stein W et al (2012) Sage Mathematics Software (Version 4.8). The Sage Development Team. http://www.sagemath.org
Sumner JG, Fernández-Sánchez J, Jarvis PD (2012a) Lie Markov models. J Theor Biol 298:16–31. ISSN 0022-5193. doi:10.1016/j.jtbi.2011.12.017
Sumner JG, Jarvis PD, Fernández-Sánchez J, Kaine BT, Woodhams MD, Holland BR (2012b) Is the general time-reversible model bad for molecular phylogenetics? Syst Biol 61:1069–1074
Tavaré S (1986) Some probabilistic and statistical problems in the analysis of DNA sequences. Lect Math Life Sci (American Mathematical Society) 17:57–86
Yap V, Pachter L (2004) Identification of evolutionary hotspots in the rodent genomes. Genome Res 14(4):574–579
Acknowledgments
JFS was partially supported by Ministerio de Educación y Ciencia MTM2009-14163-C02-02, MTM2012-38122-C03-01 and Generalitat de Catalunya, 2009 SGR 1284. JGS and PDJ were partially supported by Australian Research Council grant DP0877447. MDW was partially supported by Australian Research Council grant FT100100031.
Cite this article
Fernández-Sánchez, J., Sumner, J.G., Jarvis, P.D. et al. Lie Markov models with purine/pyrimidine symmetry. J. Math. Biol. 70, 855–891 (2015). https://doi.org/10.1007/s00285-014-0773-z
DOI: https://doi.org/10.1007/s00285-014-0773-z