Embeddability and rate identifiability of Kimura 2-parameter matrices
Deciding whether a substitution matrix is embeddable (i.e. whether the corresponding Markov process has a continuous-time realization) is an open problem even for \(4\times 4\) matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these \(4\times 4\) matrices, we fully characterize the set of embeddable K80 Markov matrices and the set of embeddable matrices for which rates are identifiable. In particular, we describe an open subset of embeddable matrices with non-identifiable rates. This set contains matrices with positive eigenvalues and also diagonal-largest-in-column matrices, which may have consequences for parameter estimation in phylogenetics. Finally, we compute the relative volumes of embeddable K80 matrices and of embeddable matrices with identifiable rates. This study concludes the embedding problem for the more general K81 model and its submodels, which was initiated by the last two authors in a separate work.
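To make the embedding question concrete, the following is a minimal sketch, not the characterization obtained in the paper. It assumes a K80 Markov matrix parametrized by a transition probability `b` and a transversion probability `c` (so the diagonal entry is `1 - b - 2c`), and uses the standard closed form of the K80 model with transition rate `alpha` and transversion rate `beta` at time 1. The function names are hypothetical. The check only asks whether the principal logarithm is a K80 rate matrix, so a `False` result does not by itself mean the matrix is not embeddable.

```python
import math


def k80_probs(alpha, beta):
    """Transition (b) and transversion (c) probabilities of exp(Q) for a
    K80 rate matrix Q with transition rate alpha and transversion rate beta."""
    ex = math.exp(-4.0 * beta)             # eigenvalue 1 - 4c
    ey = math.exp(-2.0 * (alpha + beta))   # double eigenvalue 1 - 2b - 2c
    b = 0.25 + 0.25 * ex - 0.5 * ey
    c = 0.25 - 0.25 * ex
    return b, c


def principal_log_rates(b, c):
    """Rates (alpha, beta) read off the principal logarithm of the K80
    matrix given by (b, c), or None if an eigenvalue is not positive."""
    x = 1.0 - 4.0 * c            # simple eigenvalue
    y = 1.0 - 2.0 * b - 2.0 * c  # eigenvalue of multiplicity two
    if x <= 0.0 or y <= 0.0:
        return None
    beta = -math.log(x) / 4.0
    alpha = -math.log(y) / 2.0 - beta
    return alpha, beta


def principal_log_is_rate_matrix(b, c):
    """True when the principal logarithm has non-negative off-diagonal
    entries, i.e. x > 0, y > 0 and y**2 <= x (a sufficient condition only)."""
    rates = principal_log_rates(b, c)
    if rates is None:
        return False
    alpha, beta = rates
    return alpha >= 0.0 and beta >= 0.0
```

For example, `principal_log_is_rate_matrix(*k80_probs(0.3, 0.1))` holds by construction, whereas for `b = 0.01, c = 0.1` the recovered `alpha` is negative, so the principal logarithm is not a rate matrix.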
Keywords: Nucleotide substitution model; Markov matrix; Markov generator; Matrix logarithm; Embedding problem; Rate identifiability
Mathematics Subject Classification: 60J10, 60J27, 15B51, 15A16, 92D15
All authors are partially funded by AGAUR Project 2017 SGR-932 and MINECO/FEDER Projects MTM2015-69135 and MDM-2014-0445. J Roca-Lacostena has also received funding from Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (AGAUR 2018FI_B_00947) and European Social Funds. The authors would like to express their gratitude to Jeremy Sumner for his remarks and interesting conversations on the topic. They are also grateful to the anonymous reviewers for useful comments on the first version of the manuscript, which greatly improved the paper.
MC and JFS conceived the project, revised the proofs and computations, and drafted part of the manuscript. JRL wrote the core of the manuscript and worked out the proofs and computations. All authors read, revised, and approved the final manuscript.