Identifiability of Large Phylogenetic Mixture Models

Rhodes, John A.; Sullivant, Seth

doi:10.1007/s11538-011-9672-2

Original Article
Published: 30 June 2011

Volume 74, pages 212–231 (2012)

Bulletin of Mathematical Biology

John A. Rhodes & Seth Sullivant

Abstract

Phylogenetic mixture models are statistical models of character evolution allowing for heterogeneity. Each of the classes in some unknown partition of the characters may evolve by different processes, or even along different trees. Such models are of increasing interest for data analysis, as they can capture the variety of evolutionary processes that may be occurring across long sequences of DNA or proteins. The fundamental question of whether parameters of such a model are identifiable is difficult to address, due to the complexity of the parameterization. Identifiability is, however, essential to their use for statistical inference.

We analyze mixture models on large trees, with many mixture components, showing that both numerical and tree parameters are indeed identifiable in these models when all trees are the same. This provides a theoretical justification for some current empirical studies, and indicates that extensions to even more mixture components should be theoretically well behaved. We also extend our results to certain mixtures on different trees, using the same algebraic techniques.
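
As a rough illustration of what a same-tree mixture distribution looks like, the following sketch builds a two-class mixture on a toy example. Everything in it is an assumption made for illustration and is not taken from the article: a three-leaf star tree, a two-state symmetric substitution model, the particular edge probabilities, and the helper names `symmetric_matrix`, `star_tree_distribution`, and `mixture_distribution`. It also shows the trivial relabeling symmetry: swapping the class labels leaves the mixture unchanged, so identifiability of such models is usually understood up to a permutation of the classes.

```python
from itertools import product


def symmetric_matrix(p):
    """2-state symmetric Markov matrix with substitution probability p (toy model)."""
    return [[1 - p, p], [p, 1 - p]]


def star_tree_distribution(root, edge_probs):
    """Joint distribution of leaf states on a star tree with a single hidden root.

    root       -- distribution of the state at the root
    edge_probs -- one substitution probability per leaf edge
    """
    mats = [symmetric_matrix(p) for p in edge_probs]
    dist = {}
    for pattern in product(range(2), repeat=len(mats)):
        total = 0.0
        for r in range(2):  # marginalize over the unobserved root state
            term = root[r]
            for m, s in zip(mats, pattern):
                term *= m[r][s]
            total += term
        dist[pattern] = total
    return dist


def mixture_distribution(weights, components):
    """Weighted sum of the per-class site-pattern distributions."""
    mixed = {}
    for w, comp in zip(weights, components):
        for pattern, prob in comp.items():
            mixed[pattern] = mixed.get(pattern, 0.0) + w * prob
    return mixed


# Two classes evolving on the same tree with different (made-up) edge parameters.
root = [0.5, 0.5]
class_a = star_tree_distribution(root, [0.1, 0.2, 0.3])
class_b = star_tree_distribution(root, [0.3, 0.1, 0.05])

mix = mixture_distribution([0.7, 0.3], [class_a, class_b])
# Relabeling the classes gives the same observable distribution.
swapped = mixture_distribution([0.3, 0.7], [class_b, class_a])
```

Only the mixed distribution `mix` is observable from data; the question of identifiability is whether the weights and per-class parameters can be recovered from it, apart from relabelings such as the one producing `swapped`.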

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, 99775, USA
John A. Rhodes
Department of Mathematics, North Carolina State University, Raleigh, NC, 27695, USA
Seth Sullivant

Corresponding author

Correspondence to John A. Rhodes.

About this article

Cite this article

Rhodes, J.A., Sullivant, S. Identifiability of Large Phylogenetic Mixture Models. Bull Math Biol 74, 212–231 (2012). https://doi.org/10.1007/s11538-011-9672-2

Received: 16 November 2010
Accepted: 06 June 2011
Published: 30 June 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s11538-011-9672-2
