Low degree equations for phylogenetic group-based models

Abstract

Motivated by phylogenetics, our aim is to obtain a system of low degree equations that define a phylogenetic variety on an open set containing the biologically meaningful points. In this paper we consider phylogenetic varieties defined via group-based models. For any finite abelian group \(G\), we provide an explicit construction of \(\mathrm{codim}\,X\) polynomial equations (phylogenetic invariants) of degree at most \(|G|\) that define the variety \(X\) on a Zariski open set \(U\). The set \(U\) contains all biologically meaningful points when \(G\) is the group of the Kimura 3-parameter model. In particular, our main result confirms Conjecture 7.9 of Michałek (Toric varieties: phylogenetics and derived categories, PhD thesis, 2012) and, on the set \(U\), Conjectures 29 and 30 of Sturmfels and Sullivant (J Comput Biol 12:204–228, 2005).

Notes

  1. Formally, by a signed multiset we mean a pair of multisets on the same base set. The first multiset represents the positive multiplicities, the second one the negative ones.

  2. Formally, if an element belongs to both multisets (the negative and the positive one) we cancel it.

References

  1. Allman, E.S., Rhodes, J.A.: Phylogenetic invariants for the general Markov model of sequence mutation. Math. Biosci. 186(2), 113–144 (2003)

  2. Allman, E.S., Rhodes, J.A.: Quartets and parameter recovery for the general Markov model of sequence mutation. Appl. Math. Res. Express 2004(4), 107–131 (2004)

  3. Allman, E.S., Rhodes, J.A.: Phylogenetic invariants. In: Gascuel, O., Steel, M.A. (eds.) Reconstructing Evolution. Oxford University Press, Oxford (2007)

  4. Allman, E.S., Rhodes, J.A.: Phylogenetic ideals and varieties for the general Markov model. Adv. Appl. Math. 40(2), 127–148 (2008)

  5. Bruns, W.: The quest for counterexamples in toric geometry. arXiv:1110.1840 (2011)

  6. Buczyńska, W., Wiśniewski, J.A.: On geometry of binary symmetric models of phylogenetic trees. J. Eur. Math. Soc. 9(3), 609–635 (2007)

  7. Casanellas, M.: Algebraic tools for evolutionary biology. EMS Newsl. 86, 12–18 (2012)

  8. Casanellas, M., Fernández-Sánchez, J.: Performance of a new invariants method on homogeneous and nonhomogeneous quartet trees. Mol. Biol. Evol. 24(1), 288–293 (2007)

  9. Casanellas, M., Fernández-Sánchez, J.: Geometry of the Kimura 3-parameter model. Adv. Appl. Math. 41, 265–292 (2008)

  10. Casanellas, M., Fernández-Sánchez, J.: Relevant phylogenetic invariants of evolutionary models. J. de Mathémat. Pures et Appl. 96, 207–229 (2011)

  11. Chang, J.T.: Full reconstruction of Markov models on evolutionary trees: identifiability and consistency, pp. 51–73

  12. Cox, D.A., Little, J.B., Schenck, H.K.: Toric Varieties. American Mathematical Society, Providence (2011)

  13. Cohen, J.E.: Mathematics is biology’s next microscope, only better; biology is mathematics’ next physics, only better. PLoS Biol. 2(12) (2004)

  14. Chifman, J., Petrović, S.: Toric ideals of phylogenetic invariants for the general group-based model on claw trees \(k_{1, n}\). In: Proceedings of the 2nd International Conference on Algebraic Biology, pp. 307–321 (2007)

  15. Donten-Bury, M., Michałek, M.: Phylogenetic invariants for group-based models. J. Algebr. Stat. 3(1) (2012)

  16. Draisma, J., Kuttler, J.: On the ideals of equivariant tree models. Math. Ann. 344(3), 619–644 (2009)

  17. Fulton, W.: Introduction to toric varieties. Annals of Mathematics Studies, vol. 131, The William H. Roever Lectures in Geometry. Princeton University Press, Princeton (1993)

  18. Hendy, M., Penny, D.: A framework for the quantitative study of evolutionary trees. Syst. Zool. 38, 297–309 (1989)

  19. Lasoń, M., Michałek, M.: On the toric ideal of a matroid. Adv. Math. 259 (2014)

  20. Michałek, M.: Geometry of phylogenetic group-based models. J. Algebr. 339(1), 339–356 (2011)

  21. Michałek, M.: Toric varieties: phylogenetics and derived categories, PhD thesis (2012)

  22. Michałek, M.: Constructive degree bounds for group-based models. J. Combin. Theory Ser. A 120(7), 1672–1694 (2013)

  23. Michałek, M.: Toric geometry of the 3-Kimura model for any tree. Adv. Geom. 14(1), 11–30 (2014)

  24. Miller, E., Sturmfels, B.: Combinatorial commutative algebra, Graduate Texts in Mathematics, vol. 227. Springer, New York (2005)

  25. Pachter, L., Sturmfels, B.: Tropical geometry of statistical models. Proc. Natl. Acad. Sci. 101, 16132–16137 (2004)

  26. Pachter, L., Sturmfels, B.: Algebraic Statistics for Computational Biology. Cambridge University Press, Cambridge (2005)

  27. Sturmfels, B., Sullivant, S.: Toric ideals of phylogenetic invariants. J. Comput. Biol. 12, 204–228 (2005)

  28. Sturmfels, B.: Gröbner bases and convex polytopes, University Lecture Series, vol. 8. American Mathematical Society, Providence (1996)

  29. Sullivant, S.: Toric fiber products. Computational Algebra. J. Algebr. 316(2), 560–577 (2007)

  30. White, N.: A unique exchange property for bases. Linear Algebr. Appl. 31, 81–91 (1980)

Acknowledgments

M. Michałek would like to thank the Centre de Recerca Matemàtica (CRM), the Institut de Matemàtiques de la Universitat de Barcelona (IMUB), and the Universitat Politècnica de Catalunya, and in particular Rosa-Maria Miró-Roig, for the invitation and the great working atmosphere.

Author information

Corresponding author

Correspondence to Marta Casanellas.

Additional information

M. Casanellas and J. Fernández-Sánchez are partially supported by Spanish government MTM2012-38122-C03-01/FEDER and Generalitat de Catalunya 2009SGR1284. M. Michałek was supported by Polish National Science Centre grant number DEC-2012/05/D/ST1/01063.

Appendix

Proof of Proposition 2.15

The last part of the Proposition is implied by:

$$\begin{aligned} \mathbb {C}[\tilde{M}_0]=\mathbb {C}[M_0]^{(G^N)}. \end{aligned}$$

Thus it is enough to prove the above equality.

Clearly the elements of \(\tilde{M}_0\) are invariant under the action of \(G^N\), hence \(\mathbb {C}[\tilde{M}_0]\subset \mathbb {C}[M_0]^{(G^N)}\). The elements of \(M_0\) form a basis of \(\mathbb {C}[M_0]\) consisting of eigenvectors with respect to the \(G^N\) action. Thus any invariant vector must be a linear combination of invariant elements of \(M_0\). It remains to prove that an element of \(M_0\) that is invariant with respect to \(G^N\) belongs to \(\tilde{M}_0\). We prove this by induction on the number of interior nodes of the tree \(T\).

First suppose that \(T\) has one interior node, that is, \(T\) is a claw tree with \(\mathtt{l}\) leaves. Consider an invariant element of \(M_0\) given by \(R:=\sum _{j=1}^\mathtt{l}\sum _{g\in G} a_{(j,g)} b_{(j,g)}\) with the condition \(\sum _{g\in G} a_{(1,g)}=\dots =\sum _{g\in G} a_{(\mathtt{l},g)}=0\). We will reduce \(R\) to zero modulo \(\tilde{M}_0\). Notice that for any \(1\le j\le \mathtt{l}\) and \(g_1,g_2\in G\) the element \(S_{j,g_1,g_2}:=b_{(j,g_1)}+b_{(j,g_2)}-b_{(j,g_1+g_2)}-b_{(j,\mathbf {0})}\) belongs to \(\tilde{M}_0\). Indeed, for example for \(j=1\) it equals:

$$\begin{aligned} Q_{[g_1,-g_1,0,\dots ,0]}+Q_{[g_2,0,-g_2,0,\dots ,0]}-Q_{[g_1+g_2,-g_1,-g_2,0,\dots ,0]}-Q_{[0,\dots ,0]}. \end{aligned}$$
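The cancellation behind this identity can be written out coordinate by coordinate. The following is a sketch under two assumptions that are not stated in this appendix: that \(Q_{[f_1,\dots ,f_\mathtt{l}]}=\sum _{j=1}^\mathtt{l} b_{(j,f_j)}\), as the notation suggests, and that \(\mathtt{l}\ge 3\). Grouping the four terms by leaf gives

$$\begin{aligned} &\big (b_{(1,g_1)}+b_{(1,g_2)}-b_{(1,g_1+g_2)}-b_{(1,\mathbf {0})}\big ) +\big (b_{(2,-g_1)}+b_{(2,\mathbf {0})}-b_{(2,-g_1)}-b_{(2,\mathbf {0})}\big ) \\ &\quad +\big (b_{(3,\mathbf {0})}+b_{(3,-g_2)}-b_{(3,-g_2)}-b_{(3,\mathbf {0})}\big ) =S_{1,g_1,g_2}, \end{aligned}$$

where every leaf \(j\ge 4\) contributes \(b_{(j,\mathbf {0})}+b_{(j,\mathbf {0})}-b_{(j,\mathbf {0})}-b_{(j,\mathbf {0})}=0\). Only the coordinates on the first leaf survive.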

Using elements as above we can reduce \(R\) and assume that, for each \(1\le j\le \mathtt{l}\), all coefficients \(a_{(j,g)}\) with \(g\ne 0\) are zero, except for at most one \(g\), whose coefficient equals one. Precisely, if for some \(j\) two coefficients \(a_{(j,g_1)},a_{(j,g_2)}\) with \(g_1\ne g_2\) are positive (resp. negative) we subtract (resp. add) \(S_{j,g_1,g_2}\). If there is a positive entry \(a_{(j,g_1)}\) and a negative entry \(a_{(j,g_2)}\) we add \(S_{j,g_2,g_1-g_2}\). If a coefficient \(a_{(j,g)}\) is negative we add \(S_{j,g,-g}\). If a coefficient \(a_{(j,g)}>1\) we subtract \(S_{j,g,g}\). All these operations either strictly decrease \(\sum _{g\ne 0} |a_{(j,g)}|\) or leave this sum unchanged and increase the sum of the negative coefficients. Thus the procedure must terminate.

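In other words, \(R=\sum _{j=1}^\mathtt{l}b_{(j,g_j)}-Q_{[0,\dots ,0]}\) modulo \(\tilde{M}_0\) for some \(g_1,\dots ,g_\mathtt{l}\in G\). As \(R\) is invariant, we obtain \(\sum _{j=1}^\mathtt{l}g_j=0\), so \([g_1,\dots ,g_\mathtt{l}]\) is a flow and \(R=Q_{[g_1,\dots ,g_\mathtt{l}]}-Q_{[0,\dots ,0]}\) modulo \(\tilde{M}_0\), which belongs to \(\tilde{M}_0\). This finishes the base case of the induction.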
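The reduction used in the claw-tree case can be tried out on small examples. The following is a minimal sketch, not the authors' implementation. It assumes that \(G\) is the cyclic group \(\mathbb {Z}_n\) (the Kimura 3-parameter group \(\mathbb {Z}_2\times \mathbb {Z}_2\) would need a different encoding), that each leaf is stored as a list of integer coefficients indexed by group elements, and that \(Q_f\) is the sum of the \(b_{(j,f_j)}\). The names `s_element`, `reduce_leaf` and `element_from_flows` are hypothetical.

```python
def s_element(n, g1, g2):
    """Coefficients of S_{j,g1,g2} = b_{g1} + b_{g2} - b_{g1+g2} - b_0 on one leaf."""
    s = [0] * n
    s[g1 % n] += 1
    s[g2 % n] += 1
    s[(g1 + g2) % n] -= 1
    s[0] -= 1
    return s


def reduce_leaf(a, n):
    """Reduce one leaf's coefficient vector a (indexed by Z_n) using the S elements.

    Returns the reduced vector and the list of moves (sign, g1, g2), where
    sign = +1 means S_{j,g1,g2} was added and sign = -1 means it was subtracted.
    """
    a = list(a)
    moves = []
    while True:
        pos = [g for g in range(1, n) if a[g] > 0]
        neg = [g for g in range(1, n) if a[g] < 0]
        if pos and neg:                      # one positive, one negative entry
            move = (+1, neg[0], (pos[0] - neg[0]) % n)
        elif len(neg) >= 2:                  # two negative entries
            move = (+1, neg[0], neg[1])
        elif neg:                            # a single negative entry
            move = (+1, neg[0], (-neg[0]) % n)
        elif len(pos) >= 2:                  # two positive entries
            move = (-1, pos[0], pos[1])
        elif pos and a[pos[0]] > 1:          # a coefficient larger than one
            move = (-1, pos[0], pos[0])
        else:                                # nothing left to reduce
            return a, moves
        sign, g1, g2 = move
        a = [x + sign * y for x, y in zip(a, s_element(n, g1, g2))]
        moves.append(move)


def element_from_flows(n, terms):
    """Per-leaf coefficient vectors of sum_k c_k * Q_{f_k} on a claw tree."""
    leaves = len(terms[0][1])
    a = [[0] * n for _ in range(leaves)]
    for c, flow in terms:
        for j, g in enumerate(flow):
            a[j][g % n] += c
    return a


# R = Q_[1,1,2] + Q_[3,2,3] - 2 Q_[0,0,0] on the claw tree with 3 leaves, G = Z_4.
n = 4
R = element_from_flows(n, [(1, [1, 1, 2]), (1, [3, 2, 3]), (-2, [0, 0, 0])])
reduced = [reduce_leaf(a, n)[0] for a in R]
# g[j] is the unique nonzero element with coefficient one on leaf j, or 0 if none.
g = [next((h for h in range(1, n) if a[h] == 1), 0) for a in reduced]
print(reduced, g)
```

In this example each leaf ends with at most one nonzero group element of coefficient one, and the resulting \(g_j\) sum to zero in \(\mathbb {Z}_4\), in line with the statement that invariance of \(R\) forces \(\sum _j g_j=0\). The order in which the rules are tried is a choice of this sketch; the termination argument does not depend on it.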

Suppose now that \(T\) has more than one interior node. Consider an invariant element \(R\in M_0\) as before. By choosing an interior edge \(m\in E\) we can present \(T=T_1\star T_2\). The element \(R\) induces two invariant elements \(R_i\in M_{0,T_i}\) for \(i=1,2\). By the inductive assumption we obtain \(R_i=\sum _j c_{i,j}Q_{f_{i,j}}\), where \(c_{i,j}\in \mathbb {Z}\), \(\sum _j c_{i,j}=0\) and \(Q_{f_{i,j}}\in P_{T_i}\) correspond to flows \(f_{i,j}\) on the tree \(T_i\). Let us consider the signed multisets (see Note 1) \(Z_i\) of group elements that are the projections of \(\sum _j c_{i,j}Q_{f_{i,j}}\) onto the edge \(m\): each \(f_{i,j}\) distinguishes an element on \(m\), and \(Z_i\) contains \(|c_{i,j}|\) copies of the element distinguished by \(f_{i,j}\), taken with a minus sign if \(c_{i,j}<0\). Let \(Z_i'\) be the signed multiset obtained from \(Z_i\) by cancelling positive copies of an element \(g\) against negative copies of the same element (see Note 2). The multiset \(Z_1'\) is just the signed multiset of group elements corresponding to the projection of \(R\) to \(m\), and the same holds for \(Z_2'\). Thus \(Z_2'\) is the same multiset as \(Z_1'\). This means that we can pair together elements from \(Z_1'\) and \(Z_2'\) such that each pair gives rise to a flow on the tree \(T\).

The image of the sum of these flows does not have to equal \(R\) yet: we also have to lift the flows that were cancelled when passing from \(Z_i\) to \(Z_i'\). This is done as follows. Suppose that two flows \(f_{1,j_0}\) and \(f_{1,j_1}\) on \(T_1\) associate \(g\) to the edge \(m\), but \(c_{1,j_0}>0\) and \(c_{1,j_1}<0\). Then \(f_{1,j_0}\) and \(-f_{1,j_1}\) were cancelling each other in \(Z_1\). We choose any flow \(s\) on \(T_2\) that associates \(g\) to the edge \(m\). We can glue together \(f_{1,j_0}\) and \(s\), obtaining a flow \(f_{1,j_0}\star s\) on the tree \(T\), and analogously \(f_{1,j_1}\star s\). The difference \(Q_{f_{1,j_0}\star s}-Q_{f_{1,j_1}\star s}\) has the same coordinates \(b_{(e,g)}\) on the edges \(e\) of the tree \(T_1\) as \(Q_{f_{1,j_0}}-Q_{f_{1,j_1}}\). Moreover, its coordinates \(b_{(e,g)}\) for the edges \(e\) belonging to \(T_2\) are equal to zero. In this way we obtain flows on \(T\) whose signed sum equals \(\sum _j c_{i,j}Q_{f_{i,j}}\) on \(T_i\), hence equals \(R\). \(\square \)
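The lifting step of the inductive argument can be illustrated on a small example. This is a rough sketch under several assumptions that are not part of the paper: a flow is stored as a dictionary from edge labels to elements of \(\mathbb {Z}_4\), the tree is a quartet with leaf edges `a`, `b` in \(T_1\), leaf edges `c`, `d` in \(T_2\) and shared edge `m`, and the dictionaries are taken to be flows without the code checking the flow condition. The helper names `glue` and `q_difference` are hypothetical.

```python
from collections import Counter


def glue(f1, f2, m):
    """Glue a flow on T_1 with a flow on T_2 that agree on the shared edge m."""
    if f1[m] != f2[m]:
        raise ValueError("flows disagree on the shared edge")
    return {**f1, **f2}


def q_difference(f, h):
    """Nonzero coordinates of Q_f - Q_h in the basis b_(e,g)."""
    d = Counter(f.items())      # Q_f has coordinate 1 at each (edge, value) pair
    d.subtract(h.items())       # subtract the coordinates of Q_h
    return {key: c for key, c in d.items() if c != 0}


f_pos = {"a": 1, "b": 2, "m": 3}   # plays the role of f_{1,j_0}
f_neg = {"a": 0, "b": 3, "m": 3}   # plays the role of f_{1,j_1}
s = {"m": 3, "c": 1, "d": 2}       # some flow on T_2 with the same value on m

lifted = q_difference(glue(f_pos, s, "m"), glue(f_neg, s, "m"))
print(lifted)
```

Here `lifted` agrees with `q_difference(f_pos, f_neg)` and has no coordinates on `c`, `d` or `m`, which matches the claim that gluing the same flow \(s\) to both cancelled flows changes nothing on \(T_1\) and contributes zero on \(T_2\).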

Cite this article

Casanellas, M., Fernández-Sánchez, J. & Michałek, M. Low degree equations for phylogenetic group-based models. Collect. Math. 66, 203–225 (2015). https://doi.org/10.1007/s13348-014-0120-0

Mathematics Subject Classification

  • 92D15
  • 14H10
  • 60J20