Embeddability and rate identifiability of Kimura 2-parameter matrices

Abstract

Deciding whether a substitution matrix is embeddable (i.e. whether the corresponding Markov process has a continuous-time realization) is an open problem even for \(4\times 4\) matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these \(4\times 4\) matrices, we fully characterize the set of embeddable K80 Markov matrices and the set of embeddable matrices for which rates are identifiable. In particular, we describe an open subset of embeddable matrices with non-identifiable rates. This set contains matrices with positive eigenvalues and also diagonal-largest-in-column matrices, which may have consequences for parameter estimation in phylogenetics. Finally, we compute the relative volumes of embeddable K80 matrices and of embeddable matrices with identifiable rates. This study completes the analysis of the embedding problem for the more general K81 model and its submodels, which the last two authors initiated in a separate work.
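
As a rough illustration of what embeddability means for a K80 matrix, the sketch below parameterizes the matrix by a transition probability `b` and a transversion probability `c`. These names, the state ordering A, G, C, T, and all function names are assumptions made for this example, not notation taken from the paper. The code computes the eigenvalues in closed form and checks whether the principal logarithm is a rate matrix. That is only a sufficient condition for embeddability: a matrix that fails it might still be embeddable through another logarithm, so the check should not be read as the full characterization announced above.

```python
import math


def k80_matrix(b, c):
    """Return a 4x4 K80 Markov matrix as nested lists.

    Assumed state order: A, G, C, T.
    b: probability of a transition (A<->G, C<->T).
    c: probability of each of the two transversions.
    """
    a = 1.0 - b - 2.0 * c  # diagonal entry, so every row sums to 1
    return [
        [a, b, c, c],
        [b, a, c, c],
        [c, c, a, b],
        [c, c, b, a],
    ]


def k80_eigenvalues(b, c):
    """Eigenvalues other than 1: x is simple, y has multiplicity 2."""
    x = 1.0 - 4.0 * c            # eigenvector (1, 1, -1, -1)
    y = 1.0 - 2.0 * b - 2.0 * c  # eigenvectors (1, -1, 0, 0), (0, 0, 1, -1)
    return x, y


def principal_log_rates(b, c):
    """Off-diagonal rates (beta, gamma) of the principal logarithm.

    Returns None when an eigenvalue is not positive, in which case the
    principal logarithm is not a real matrix.
    """
    x, y = k80_eigenvalues(b, c)
    if x <= 0.0 or y <= 0.0:
        return None
    # A K80 rate matrix with rates beta (transitions) and gamma
    # (transversions) has eigenvalues 0, -4*gamma, -2*(beta + gamma).
    gamma = -math.log(x) / 4.0
    beta = -math.log(y) / 2.0 - gamma
    return beta, gamma


def principal_log_is_rate_matrix(b, c, tol=1e-12):
    """True when the principal logarithm has non-negative off-diagonal rates."""
    rates = principal_log_rates(b, c)
    if rates is None:
        return False
    beta, gamma = rates
    return beta >= -tol and gamma >= -tol


if __name__ == "__main__":
    for b, c in [(0.1, 0.05), (0.01, 0.2), (0.3, 0.3)]:
        print(b, c, principal_log_is_rate_matrix(b, c))
```

In terms of the eigenvalues, the check passes exactly when both are positive and x is at least y squared. The hypothetical helper `principal_log_rates` also shows why identifiability is a separate question: it returns only the rates of one particular logarithm.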

Acknowledgements

All authors are partially funded by AGAUR Project 2017 SGR-932 and MINECO/FEDER Projects MTM2015-69135 and MDM-2014-0445. J Roca-Lacostena has also received funding from Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (AGAUR 2018FI_B_00947) and European Social Funds. The authors would like to express their gratitude to Jeremy Sumner for his remarks and interesting conversations on the topic. They are also grateful to the anonymous reviewers for useful comments on the first version of the manuscript, which greatly improved the paper.

Author information

Contributions

MC and JFS conceived the project, revised the proofs and computations and drafted part of the manuscript. JRL wrote the core of the manuscript and worked out the proofs and computations. All authors read, revised and approved the final manuscript.

Corresponding author

Correspondence to Marta Casanellas.

About this article

Cite this article

Casanellas, M., Fernández-Sánchez, J. & Roca-Lacostena, J. Embeddability and rate identifiability of Kimura 2-parameter matrices. J. Math. Biol. 80, 995–1019 (2020). https://doi.org/10.1007/s00285-019-01446-0

Keywords

  • Nucleotide substitution model
  • Markov matrix
  • Markov generator
  • Matrix logarithm
  • Embedding problem
  • Rate identifiability

Mathematics Subject Classification

  • 60J10
  • 60J27
  • 15B51
  • 15A16
  • 92D15