Embeddability and rate identifiability of Kimura 2-parameter matrices

  • Marta Casanellas
  • Jesús Fernández-Sánchez
  • Jordi Roca-Lacostena


Deciding whether a substitution matrix is embeddable (i.e. whether the corresponding Markov process has a continuous-time realization) is an open problem even for \(4\times 4\) matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these \(4\times 4\) matrices, we fully characterize the set of embeddable K80 Markov matrices and the set of embeddable matrices for which rates are identifiable. In particular, we describe an open subset of embeddable matrices with non-identifiable rates. This set contains matrices with positive eigenvalues and also diagonal largest in column matrices, which may have consequences for parameter estimation in phylogenetics. Finally, we compute the relative volumes of embeddable K80 matrices and of embeddable matrices with identifiable rates. This study completes the solution of the embedding problem for the more general K81 model and its submodels, a line of work initiated by the last two authors in a separate paper.
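The two central notions of the abstract, embeddability and (non-)identifiability of rates, can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' characterization or code. It uses only the standard K80 parametrization (states ordered A, G, C, T; \(a\), \(b\) and \(c\) are the probabilities of no change, of a transition and of each transversion, so \(a+b+2c=1\)). Such a matrix has eigenvalues \(1\), \(\lambda_1=a+b-2c\) and \(\lambda_2=a-b\), the last with multiplicity two. When \(\lambda_1,\lambda_2>0\), the principal logarithm is the K80 rate matrix with transversion rate \(\beta=-\tfrac14\log\lambda_1\) and transition rate \(\alpha=-\tfrac12\log\lambda_2-\beta\), so it is a rate matrix exactly when \(\lambda_2^2\le\lambda_1\). Because \(\lambda_2\) is repeated, adding \(2\pi k J\), with \(J\) a rotation generator on its eigenspace, gives further real logarithms of the same matrix; for the particular \(J\) used below (a hypothetical choice, one among infinitely many) these are rate matrices whenever \(\alpha\ge 0\) and \(\beta\ge\pi|k|\), which yields matrices with positive eigenvalues and more than one generator. All helper names (`k80_markov`, `principal_k80_rates`, `k80_log_branch`, `is_rate_matrix`) are hypothetical, and the hand-written `expm` is a plain scaling-and-squaring Taylor series standing in for a library routine such as `scipy.linalg.expm`.

```python
import math

# Illustrative sketch only: helper names and the rotation J are assumptions,
# not taken from the paper. States are ordered A, G, C, T.


def k80_markov(a, b, c):
    """K80 Markov matrix: a = no change, b = transition, c = each transversion."""
    return [[a, b, c, c], [b, a, c, c], [c, c, a, b], [c, c, b, a]]


def k80_rate_matrix(alpha, beta):
    """K80 rate matrix with transition rate alpha and transversion rate beta."""
    d = -(alpha + 2 * beta)
    return [[d, alpha, beta, beta], [alpha, d, beta, beta],
            [beta, beta, d, alpha], [beta, beta, alpha, d]]


def principal_k80_rates(a, b, c):
    """(alpha, beta) of the principal logarithm; positive eigenvalues only."""
    lam1, lam2 = a + b - 2 * c, a - b  # lam2 has multiplicity two
    if lam1 <= 0 or lam2 <= 0:
        raise ValueError("this sketch only handles positive eigenvalues")
    beta = -math.log(lam1) / 4
    alpha = -math.log(lam2) / 2 - beta
    return alpha, beta


# Hypothetical J: maps (1,-1,1,-1) -> (1,-1,-1,1) -> -(1,-1,1,-1) and
# annihilates (1,1,1,1) and (1,1,-1,-1), so J^2 = -I on the lam2-eigenspace.
J = [[0.0, 0.0, 0.5, -0.5], [0.0, 0.0, -0.5, 0.5],
     [-0.5, 0.5, 0.0, 0.0], [0.5, -0.5, 0.0, 0.0]]


def k80_log_branch(a, b, c, k):
    """Real logarithm Q0 + 2*pi*k*J, where Q0 is the principal logarithm."""
    q0 = k80_rate_matrix(*principal_k80_rates(a, b, c))
    return [[q0[i][j] + 2 * math.pi * k * J[i][j] for j in range(4)]
            for i in range(4)]


def is_rate_matrix(q, tol=1e-12):
    """Non-negative off-diagonal entries and zero row sums (up to tol)."""
    off_ok = all(q[i][j] >= -tol for i in range(4) for j in range(4) if i != j)
    return off_ok and all(abs(sum(row)) <= tol for row in q)


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][l] * y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(q)
    norm = max(sum(abs(v) for v in row) for row in q)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[v / 2 ** s for v in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[v / m for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):
        result = mat_mul(result, result)
    return result


if __name__ == "__main__":
    alpha, beta = 0.5, 4.0  # assumed rates, chosen so that beta >= pi
    lam1, lam2 = math.exp(-4 * beta), math.exp(-2 * (alpha + beta))
    a, b, c = 0.25 + lam1 / 4 + lam2 / 2, 0.25 + lam1 / 4 - lam2 / 2, 0.25 - lam1 / 4
    M = k80_markov(a, b, c)
    for k in (0, 1, 2):
        Q = k80_log_branch(a, b, c, k)
        err = max(abs(x - y) for r1, r2 in zip(expm(Q), M) for x, y in zip(r1, r2))
        print(k, is_rate_matrix(Q), err < 1e-9)
    # expected: "0 True True", "1 True True", "2 False True"
```

All three branches exponentiate back to the same Markov matrix, but only \(k=0\) and \(k=1\) are rate matrices (for \(k=2\), \(\beta<2\pi\) makes a transversion entry negative), so in this example the rates cannot be recovered uniquely from the matrix. Conversely, `k80_log_branch(0.735, 0.015, 0.125, 0)` (so \(\lambda_1=0.5\), \(\lambda_2=0.72\), \(\lambda_2^2>\lambda_1\)) is not a rate matrix; the sketch says nothing about the other real logarithms of that matrix, which is where the full characterization in the paper is needed.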


Keywords

Nucleotide substitution model; Markov matrix; Markov generator; Matrix logarithm; Embedding problem; Rate identifiability

Mathematics Subject Classification

60J10 60J27 15B51 15A16 92D15 



All authors are partially funded by AGAUR Project 2017 SGR-932 and MINECO/FEDER Projects MTM2015-69135 and MDM-2014-0445. J. Roca-Lacostena has also received funding from the Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (AGAUR 2018FI_B_00947) and the European Social Fund. The authors would like to express their gratitude to Jeremy Sumner for his remarks and interesting conversations on the topic. They are also grateful to the anonymous reviewers for useful comments on the first version of the manuscript, which greatly improved the paper.

Author Contributions

MC and JFS conceived the project, revised the proofs and computations and drafted part of the manuscript. JRL wrote the core of the manuscript and worked out the proofs and computations. All authors read, revised and approved the final manuscript.


References

  1. Barry D, Hartigan JA (1987) Statistical analysis of hominoid molecular evolution. Stat Sci 2:191–207
  2. Chang JT (1996) Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math Biosci 137(1):51–73
  3. Culver WJ (1966) On the existence and uniqueness of the real logarithm of a matrix. Proc Am Math Soc 17:1146–1151
  4. Cuthbert JR (1972) On uniqueness of the logarithm for Markov semi-groups. J Lond Math Soc 2(4):623–630
  5. Cuthbert JR (1973) The logarithm function for finite-state Markov semi-groups. J Lond Math Soc 2(3):524–532
  6. Davies EB (2010) Embeddable Markov matrices. Electron J Probab 15(47):1474–1486
  7. Duchene S, Holt KE, Weill F-X, Le Hello S, Hawkey J, Edwards D, Fourment M, Holmes E (2016) Genome-scale rates of evolutionary change in bacteria. Microbial Genomics 2:e000094
  8. Evans SN, Speed TP (1993) Invariants of some probability models used in phylogenetic inference. Ann Stat 21:355–377
  9. Fernández-Sánchez J, Sumner JG, Jarvis PD, Woodhams MD (2015) Lie Markov models with purine/pyrimidine symmetry. J Math Biol 70(4):855–891
  10. Gantmacher FR (1959) The theory of matrices, vol 1. Chelsea Publishing Company, Vermont
  11. Goodman GS (1970) An intrinsic time for non-stationary finite Markov chains. Probab Theor Relat Field 16:165–180
  12. Guerry M-A (2013) On the embedding problem for discrete-time Markov chains. J Appl Probab 50(4):918–930
  13. Guerry M-A (2019) Sufficient embedding conditions for three-state discrete-time Markov chains with real eigenvalues. Linear Multilinear Algebra 67(1):106–120
  14. Hendy MD, Penny D (1993) Spectral analysis of phylogenetic data. J Classif 10(1):5–24
  15. Higham NJ (2008) Functions of matrices: theory and computation. SIAM, Philadelphia
  16. Ho SYW, Shapiro B, Phillips MJ, Cooper A, Drummond AJ (2007) Evidence for time dependency of molecular rate estimates. Syst Biol 56(3):515–522
  17. Israel RB, Rosenthal JS, Wei JZ (2001) Finding generators for Markov chains via empirical transition matrices, with applications to credit ratings. Math Finance 11(2):245–265
  18. Jia C (2016) A solution to the reversible embedding problem for finite Markov chains. Stat Probab Lett 116:122–130
  19. Jia C, Qian M, Jiang D (2014) Overshoot in biological systems modelled by Markov chains: a non-equilibrium dynamic phenomenon. IET Syst Biol 8(4):138–145
  20. Jukes TH, Cantor C (1969) Evolution of protein molecules. Mamm Protein Metab 3(21):132
  21. Kaehler BD, Yap VB, Zhang R, Huttley GA (2015) Genetic distance for a general non-stationary Markov substitution process. Syst Biol 64(2):281–293
  22. Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16(2):111–120
  23. Kimura M (1981) Estimation of evolutionary distances between homologous nucleotide sequences. Proc Natl Acad Sci 78(1):454–458
  24. Kosta D, Kubjas K (2017) Geometry of symmetric group-based models. ArXiv e-prints arXiv:1705.09228
  25. Roca-Lacostena J, Fernández-Sánchez J (2018) Embeddability of Kimura 3st Markov matrices. J Theor Biol 445:128–135
  26. Singer B, Spilerman S (1976) The representation of social processes by Markov models. Am J Sociol 82(1):1–54
  27. Steel M (2016) Phylogeny: discrete and random processes in evolution. In: CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM
  28. Van-Brunt A (2018) Infinitely divisible nonnegative matrices, m-matrices, and the embedding problem for finite state stationary Markov chains. Linear Algebra Appl 541:163–176
  29. Verbyla KL, Yap VB, Pahwa A, Shao Y, Huttley GA (2013) The embedding problem for Markov models of nucleotide substitution. PLoS ONE 8:e69187
  30. Zou L, Susko E, Field C, Roger AJ (2011) The parameters of the Barry and Hartigan general Markov model are statistically nonidentifiable. Syst Biol 60(6):872–875

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Dpt. Matemàtiques, Universitat Politècnica de Catalunya and BGSMath, Barcelona, Spain
