## Abstract

Motivated by phylogenetics, our aim is to obtain a system of low degree equations that define a phylogenetic variety on an open set containing the biologically meaningful points. In this paper we consider phylogenetic varieties defined via group-based models. For any finite abelian group \(G\), we provide an explicit construction of \({{\mathrm{codim}}}X\) polynomial equations (phylogenetic invariants) of degree at most \(|G|\) that define the variety \(X\) on a Zariski open set \(U\). The set \(U\) contains all biologically meaningful points when \(G\) is the group of the Kimura 3-parameter model. In particular, our main result confirms (Michałek, Toric varieties: phylogenetics and derived categories, PhD thesis, Conjecture 7.9, 2012) and, on the set \(U\), Conjectures 29 and 30 of Sturmfels and Sullivant (J Comput Biol 12:204–228, 2005).

## Notes

1. Formally, by a signed multiset we mean a pair of multisets on the same base set. The first multiset represents the positive multiplicities, the second one the negative ones.
2. Formally, if an element belongs to both multisets (the negative and the positive one) we cancel it.

## References

1. Allman, E.S., Rhodes, J.A.: Phylogenetic invariants for the general Markov model of sequence mutation. Math. Biosci. **186**(2), 113–144 (2003)
2. Allman, E.S., Rhodes, J.A.: Quartets and parameter recovery for the general Markov model of sequence mutation. Appl. Math. Res. Express **2004**(4), 107–131 (2004)
3. Allman, E.S., Rhodes, J.A.: Phylogenetic invariants. In: Gascuel, O., Steel, M.A. (eds.) Reconstructing Evolution. Oxford University Press, Oxford (2007)
4. Allman, E.S., Rhodes, J.A.: Phylogenetic ideals and varieties for the general Markov model. Adv. Appl. Math. **40**(2), 127–148 (2008)
5. Bruns, W.: The quest for counterexamples in toric geometry. arXiv:1110.1840 (2011)
6. Buczyńska, W., Wiśniewski, J.A.: On geometry of binary symmetric models of phylogenetic trees. J. Eur. Math. Soc. **9**(3), 609–635 (2007)
7. Casanellas, M.: Algebraic tools for evolutionary biology. EMS Newsl. **86**, 12–18 (2012)
8. Casanellas, M., Fernández-Sánchez, J.: Performance of a new invariants method on homogeneous and nonhomogeneous quartet trees. Mol. Biol. Evol. **24**(1), 288–293 (2007)
9. Casanellas, M., Fernández-Sánchez, J.: Geometry of the Kimura 3-parameter model. Adv. Appl. Math. **41**, 265–292 (2008)
10. Casanellas, M., Fernández-Sánchez, J.: Relevant phylogenetic invariants of evolutionary models. J. de Mathémat. Pures et Appl. **96**, 207–229 (2011)
11. Chang, J.T.: Full reconstruction of Markov models on evolutionary trees: identifiability and consistency, pp. 51–73
12. Cox, D.A., Little, J.B., Schenck, H.K.: Toric Varieties. American Mathematical Soc., Providence (2011)
13. Cohen, J.E.: Mathematics is biology’s next microscope, only better; biology is mathematics’ next physics, only better. PLoS Biol **2**(12) (2004)
14. Chifman, J., Petrović, S.: Toric ideals of phylogenetic invariants for the general group-based model on claw trees \(K_{1, n}\). In: Proceedings of the 2nd International Conference on Algebraic Biology, pp. 307–321 (2007)
15. Donten-Bury, M., Michałek, M.: Phylogenetic invariants for group-based models. J. Algebr. Stat. **3**(1) (2012)
16. Draisma, J., Kuttler, J.: On the ideals of equivariant tree models. Math. Ann. **344**(3), 619–644 (2009)
17. Fulton, W.: Introduction to toric varieties. Annals of Mathematics Studies, vol. 131, The William H. Roever Lectures in Geometry. Princeton University Press, Princeton (1993)
18. Hendy, M., Penny, D.: A framework for the quantitative study of evolutionary trees. Syst. Zool. **38**, 297–309 (1989)
19. Lasoń, M., Michałek, M.: On the toric ideal of a matroid. Adv. Math. **259** (2014)
20. Michałek, M.: Geometry of phylogenetic group-based models. J. Algebr. **339**(1), 339–356 (2011)
21. Michałek, M.: Toric varieties: phylogenetics and derived categories, PhD thesis (2012)
22. Michałek, M.: Constructive degree bounds for group-based models. J. Combin. Theory Ser. A **120**(7), 1672–1694 (2013)
23. Michałek, M.: Toric geometry of the 3-Kimura model for any tree. Adv. Geom. **14**(1), 11–30 (2014)
24. Miller, E., Sturmfels, B.: Combinatorial commutative algebra, Graduate Texts in Mathematics, vol. 227. Springer, New York (2005)
25. Pachter, L., Sturmfels, B.: Tropical geometry of statistical models. Proc. Natl. Acad. Sci. **101**, 16132–16137 (2004)
26. Pachter, L., Sturmfels, B.: Algebraic Statistics for Computational Biology. Cambridge University Press, Cambridge (2005)
27. Sturmfels, B., Sullivant, S.: Toric ideals of phylogenetic invariants. J. Comput. Biol. **12**, 204–228 (2005)
28. Sturmfels, B.: Gröbner bases and convex polytopes, University Lecture Series, vol. 8. American Mathematical Society, Providence (1996)
29. Sullivant, S.: Toric fiber products. Computational Algebra. J. Algebr. **316**(2), 560–577 (2007)
30. White, N.: A unique exchange property for bases. Linear Algebr. Appl. **31**, 81–91 (1980)

## Acknowledgments

M. Michałek would like to thank the Centre de Recerca Matemàtica (CRM), the Institut de Matemàtiques de la Universitat de Barcelona (IMUB), the Universitat Politècnica de Catalunya, and in particular Rosa-Maria Miró-Roig, for the invitation and the great working atmosphere.

## Additional information

M. Casanellas and J. Fernández-Sánchez are partially supported by the Spanish government (MTM2012-38122-C03-01/FEDER) and the Generalitat de Catalunya (2009SGR1284). M. Michałek was supported by the Polish National Science Centre, grant number DEC-2012/05/D/ST1/01063.

## Appendix

*Proof of Proposition 2.15*

The last part of the Proposition is implied by the equality

\[
\mathbb {C}[\tilde{M}_0]=\mathbb {C}[M_0]^{(G^N)}.
\]

Thus it is enough to prove the above equality.

Clearly the elements of \(\tilde{M}_0\) are invariant under the action of \(G^N\), hence \(\mathbb {C}[\tilde{M}_0]\subset \mathbb {C}[M_0]^{(G^N)}\). The elements of \(M_0\) form a basis of \(\mathbb {C}[M_0]\) consisting of eigenvectors with respect to the \(G^N\) action. Thus any invariant vector must be a linear combination of invariant elements of \(M_0\). It remains to prove that an element of \(M_0\) that is invariant with respect to \(G^N\) belongs to \(\tilde{M}_0\). The proof is inductive on the number of nodes of the tree \(T\).

First suppose that \(T\) has one interior node, that is, \(T\) is a claw tree with \(\mathtt{l}\) leaves. Consider an invariant element of \(M_0\) given by \(R:=\sum _{j=1}^\mathtt{l}\sum _{g\in G} a_{(j,g)} b_{(j,g)}\) with the condition \(\sum _{g\in G} a_{(1,g)}=\dots =\sum _{g\in G} a_{(\mathtt{l},g)}=0\). We will reduce \(R\) to zero modulo \(\tilde{M}_0\). Notice that for any \(1\le j\le \mathtt{l}\), \(g_1,g_2\in G\) the element \(S_{j,g_1,g_2}:=b_{(j,g_1)}+b_{(j,g_2)}-b_{(j,g_1+g_2)}-b_{(j,\mathbf {0})}\) belongs to \(\tilde{M}_0\). Indeed, for example for \(j=1\) it equals:
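One identity of this form can be written out as a sketch, under two assumptions that are not stated in this section: that \(Q_{[h_1,\dots ,h_\mathtt{l}]}\) denotes \(\sum _{j=1}^\mathtt{l} b_{(j,h_j)}\) for a flow \([h_1,\dots ,h_\mathtt{l}]\) with \(\sum _j h_j=\mathbf {0}\) (the notation \(Q_{[0,\dots ,0]}\) used later suggests this), and that \(\mathtt{l}\ge 3\):

\[
S_{1,g_1,g_2}=Q_{[g_1,-g_1,\mathbf {0},\mathbf {0},\dots ,\mathbf {0}]}+Q_{[g_2,\mathbf {0},-g_2,\mathbf {0},\dots ,\mathbf {0}]}-Q_{[g_1+g_2,-g_1,-g_2,\mathbf {0},\dots ,\mathbf {0}]}-Q_{[\mathbf {0},\dots ,\mathbf {0}]}.
\]

Each of the four index vectors sums to zero in \(G\), and the four coefficients sum to zero. On the right-hand side the terms \(b_{(2,-g_1)}\), \(b_{(3,-g_2)}\) and all \(b_{(j,\mathbf {0})}\) with \(j\ge 2\) cancel, leaving \(b_{(1,g_1)}+b_{(1,g_2)}-b_{(1,g_1+g_2)}-b_{(1,\mathbf {0})}\).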

Using elements as above we can reduce \(R\) and assume that, for each \(1\le j\le \mathtt{l}\), at most one \(g\ne \mathbf {0}\) has a nonzero coefficient \(a_{(j,g)}\), and that this coefficient equals one. Precisely, if for some \(j\) the coefficients \(a_{(j,g_1)},a_{(j,g_2)}\) are both positive (resp. both negative) we subtract (resp. add) \(S_{j,g_1,g_2}\). If there is a positive entry \(a_{(j,g_1)}\) and a negative entry \(a_{(j,g_2)}\) we add \(S_{j,g_2,g_1-g_2}\). If a coefficient \(a_{(j,g)}\) is negative we add \(S_{j,g,-g}\). If a coefficient \(a_{(j,g)}>1\) we subtract \(S_{j,g,g}\). All these operations either strictly decrease \(\sum _{g\ne \mathbf {0}} |a_{(j,g)}|\) or leave the sum unchanged and increase the sum of negative coefficients. Thus the procedure must finish.
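The reduction for a single leaf \(j\) can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes \(G=\mathbb {Z}_n\) is cyclic, the function name `reduce_leaf` is hypothetical, and it applies the moves in a simplified order (first removing negative coefficients with \(S_{j,g,-g}\), then merging positive ones with \(S_{j,g_1,g_2}\)) rather than following the case analysis above literally.

```python
from collections import Counter


def reduce_leaf(coeffs, n):
    """Reduce the coefficients a_(j,g) of one leaf j, for G = Z_n (assumed).

    coeffs maps g in {0, ..., n-1} to the integer a_(j,g); the values are
    expected to sum to zero. Only multiples of
    S_{g1,g2} = b_g1 + b_g2 - b_(g1+g2) - b_0 are added or subtracted.
    """
    a = Counter(coeffs)

    def apply_S(g1, g2, sign):
        # Add sign * S_{g1,g2} to the current coefficient vector.
        a[g1] += sign
        a[g2] += sign
        a[(g1 + g2) % n] -= sign
        a[0] -= sign

    # Phase 1: remove negative coefficients at g != 0 by adding S_{g,-g}.
    while True:
        negative = [g for g in range(1, n) if a[g] < 0]
        if not negative:
            break
        g = negative[0]
        apply_S(g, (-g) % n, +1)

    # Phase 2: all coefficients at g != 0 are now non-negative; merge them
    # pairwise by subtracting S_{g1,g2} until at most one unit remains.
    while sum(a[g] for g in range(1, n)) > 1:
        support = [g for g in range(1, n) for _ in range(a[g])]
        apply_S(support[0], support[1], -1)

    return {g: a[g] for g in range(n) if a[g] != 0}


if __name__ == "__main__":
    # a_1 = 2, a_2 = 1, a_3 = -1, a_0 = -2 on Z_4: coefficients sum to zero.
    print(reduce_leaf({1: 2, 2: 1, 3: -1, 0: -2}, 4))
```

Every move preserves both the coefficient sum and the group element \(\sum _g a_{(j,g)}\, g\), so the surviving element \(g_j\) is determined by the input; in the example it is \(2\cdot 1+2-3=1\) in \(\mathbb {Z}_4\).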

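In other words, \(R=\sum _{j=1}^\mathtt{l}b_{(j,g_j)}-Q_{[0,\dots ,0]}\) modulo \(\tilde{M}_0\) for some \(g_1,\dots ,g_\mathtt{l}\in G\). As \(R\) is invariant, we obtain \(\sum _{j=1}^\mathtt{l}g_j=0\), so \([g_1,\dots ,g_\mathtt{l}]\) is a flow and \(R\in \tilde{M}_0\). This finishes the base case of the induction.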

Suppose now that \(T\) has more than one interior node. Consider an invariant element \(R\in M_0\) as before. By choosing an interior edge \(m\in E\) we can present \(T=T_1\star T_2\). The element \(R\) induces two invariant elements \(R_i\in M_{0,T_i}\) for \(i=1,2\). By the inductive assumption we obtain \(R_i=\sum _j c_{i,j}Q_{f_{i,j}}\), where \(c_{i,j}\in \mathbb {Z}\), \(\sum _j c_{i,j}=0\) and \(Q_{f_{i,j}}\in P_{T_i}\) correspond to flows \(f_{i,j}\) on the tree \(T_i\). Let us consider the signed multisets (see Note 1) \(Z_i\) that are the projections of \(\sum _j c_{i,j}Q_{f_{i,j}}\) onto the edge \(m\): each \(f_{i,j}\) distinguishes an element on \(m\). The multiset \(Z_i\) has \(|c_{i,j}|\) elements distinguished by \(f_{i,j}\), taken with a minus sign if \(c_{i,j}<0\). Thus \(Z_i\) is a signed multiset of group elements. Let \(Z_i'\) be the signed multiset obtained from \(Z_i\) by reductions cancelling \(g\) with \(-g\), that is, a positive copy of \(g\) against a negative copy of \(g\) (see Note 2). The multiset \(Z_1'\) is just the signed multiset of group elements corresponding to the projection of \(R\) to \(m\). The same holds for \(Z_2'\), so \(Z_2'\) is the same multiset as \(Z_1'\). This means that we can pair together elements from \(Z_1'\) and \(Z_2'\) such that each pair gives rise to a flow on the tree \(T\).

The image of the sum of these flows does *not* have to equal \(R\) yet. We also have to lift the flows that we cancelled by passing from \(Z_i\) to \(Z_i'\). This is done as follows. Suppose that two flows \(f_{1,j_0}\) and \(f_{1,j_1}\) on \(T_1\) associate \(g\) to the edge \(m\), but \(c_{1,j_0}>0\) and \(c_{1,j_1}<0\). Then \(f_{1,j_0}\) and \(-f_{1,j_1}\) were cancelling each other in \(Z_1\). We choose any flow \(s\) on \(T_2\) that associates \(g\) to the edge \(m\). We can glue together \(f_{1,j_0}\) and \(s\), obtaining a flow \(f_{1,j_0}\star s\) on the tree \(T\), and analogously \(f_{1,j_1}\star s\). The difference \(Q_{f_{1,j_0}\star s}-Q_{f_{1,j_1}\star s}\) has the same coordinates \(b_{(e,g)}\) on the edges \(e\) of the tree \(T_1\) as \(Q_{f_{1,j_0}}-Q_{f_{1,j_1}}\). Moreover, its coordinates \(b_{(e,g)}\) for the edges \(e\) belonging to \(T_2\) are equal to zero. In this way we obtain flows on \(T\) whose signed sum equals \(\sum _j c_{i,j}Q_{f_{i,j}}\) on \(T_i\) for \(i=1,2\), and hence equals \(R\). \(\square \)

## About this article

### Cite this article

Casanellas, M., Fernández-Sánchez, J. & Michałek, M. Low degree equations for phylogenetic group-based models. *Collect. Math.* **66**, 203–225 (2015). https://doi.org/10.1007/s13348-014-0120-0

### Mathematics Subject Classification

- 92D15
- 14H10
- 60J20