Identifiability of Large Phylogenetic Mixture Models

  • Original Article
  • Bulletin of Mathematical Biology

Abstract

Phylogenetic mixture models are statistical models of character evolution allowing for heterogeneity. Each class in some unknown partition of the characters may evolve by a different process, or even along a different tree. Such models are of increasing interest for data analysis, as they can capture the variety of evolutionary processes that may be occurring across long sequences of DNA or proteins. The fundamental question of whether the parameters of such a model are identifiable is difficult to address, due to the complexity of the parameterization. Identifiability is, however, essential to the use of these models for statistical inference.
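
As a hedged illustration of the question posed above, a mixture model and its identifiability can be written as follows. The notation (r classes with weights \pi_i, trees T_i, numerical parameters \theta_i) is assumed here for exposition and is not taken from the article.

```latex
% Assumed notation: r classes; class i has weight \pi_i > 0, tree T_i,
% and numerical parameters \theta_i. P_{T_i,\theta_i}(x) is the probability
% of site pattern x under class i alone.
P_{(\pi,T,\theta)}(x) \;=\; \sum_{i=1}^{r} \pi_i \, P_{T_i,\theta_i}(x),
\qquad \sum_{i=1}^{r} \pi_i = 1 .

% Identifiability, up to relabeling of the classes: equal pattern
% distributions force equal parameters after some permutation \sigma
% of the class labels \{1,\dots,r\}.
P_{(\pi,T,\theta)} = P_{(\pi',T',\theta')}
\;\Longrightarrow\;
(\pi'_i, T'_i, \theta'_i) = (\pi_{\sigma(i)}, T_{\sigma(i)}, \theta_{\sigma(i)})
\quad \text{for all } i .
```

The case in which all trees are the same corresponds to T_1 = \dots = T_r in this notation.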

We analyze mixture models on large trees, with many mixture components, showing that both numerical and tree parameters are indeed identifiable in these models when all trees are the same. This provides a theoretical justification for some current empirical studies, and indicates that extensions to even more mixture components should be theoretically well behaved. We also extend our results to certain mixtures on different trees, using the same algebraic techniques.
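
A minimal sketch of why identifiability of a mixture can only be expected up to relabeling of the classes is given below. It assumes a two-state symmetric model on a three-leaf star tree, chosen purely for brevity; the function names and parameter values are hypothetical, and the setting is far simpler than the large-tree, many-component models analyzed in the article.

```python
from itertools import product


def star_tree_distribution(flip_probs):
    """Site-pattern distribution on a star tree under a 2-state symmetric model.

    Assumption for illustration: the root state is uniform on {0, 1} and the
    edge to leaf k changes the state with probability flip_probs[k].
    """
    dist = {}
    for pattern in product((0, 1), repeat=len(flip_probs)):
        total = 0.0
        for root in (0, 1):
            prob = 0.5  # uniform root distribution
            for state, p in zip(pattern, flip_probs):
                prob *= p if state != root else 1.0 - p
            total += prob
        dist[pattern] = total
    return dist


def mixture(weights, components):
    """Weighted sum of component site-pattern distributions."""
    patterns = list(components[0].keys())
    return {
        x: sum(w * comp[x] for w, comp in zip(weights, components))
        for x in patterns
    }


# Two hypothetical classes on the same tree with different edge parameters.
class_a = star_tree_distribution([0.05, 0.10, 0.20])
class_b = star_tree_distribution([0.30, 0.25, 0.40])

mix = mixture([0.7, 0.3], [class_a, class_b])
# Swapping the class labels (weights together with parameters) gives the
# same observable distribution, so labels can never be recovered from data.
swapped = mixture([0.3, 0.7], [class_b, class_a])

same = all(abs(mix[x] - swapped[x]) < 1e-12 for x in mix)
print("distributions agree after relabeling:", same)
```

The sketch shows only the trivial relabeling symmetry; whether the parameters are otherwise determined by the distribution is the substantive question the article addresses.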

Author information

Correspondence to John A. Rhodes.

About this article

Cite this article

Rhodes, J.A., Sullivant, S. Identifiability of Large Phylogenetic Mixture Models. Bull Math Biol 74, 212–231 (2012). https://doi.org/10.1007/s11538-011-9672-2

  • DOI: https://doi.org/10.1007/s11538-011-9672-2
