Decoupled molecules with binding polynomials of bidegree (n, 2)

We present a result on the number of decoupled molecules for systems binding two different types of ligands. In the case of n and 2 binding sites respectively, we show that there are \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$2(n!)^{2}$$\end{document}2(n!)2 decoupled molecules to a generic binding polynomial. For molecules with more binding sites for the second ligand, we provide computational results.


Introduction
In biology, a ligand is a substance that binds to a target molecule to serve a given purpose. A classical (Bohr et al. 1904;Hasselbalch 1917) and intensively studied (Barcroft 1913;Hill 1913) example is oxygen, which binds reversibly to hemoglobin to be transported through the bloodstream. Reversible mutual binding of different molecules is also a key feature in biological signal transduction (Changeux and Edel-stein 2005;Cho et al. 1996;Gutierrez et al. 2009;Ha and Ferrell 2016) and gene regulation (Gutierrez et al. 2012).
A common model for describing equilibrium and steady states of a ligand L binding to the sites of a target molecule M comes from the grand canonical ensemble of statistical mechanics (Ben-Naim 2001;Hill 1985;Schellman 1975;Wyman and Gill 1990). The grand partition function, in our context also known as the binding polynomial, arises as the denominator of the rational function describing the average number of occupied binding sites as a function of ligand activity. In the case of a target molecule with only one binding site, this rational function is given by where the variable Λ denotes the activity of the ligand in the environment, and a is a transformation of the binding energy depending on the temperature, which is usually assumed to be constant. This equation is also known as the (sigmoid) Henderson-Hasselbalch titration curve. Titration refers to the laboratory method used to obtain this curve. For systems of molecules with n binding sites it generalizes to the Adair equation (Adair et al. 1925;Stefan and Le Novère 2013): (Λ) = na n Λ n + (n − 1)a n−1 Λ n−1 + · · · + a 1 Λ a n Λ n + a n−1 Λ n−1 + · · · + a 1 Λ + 1 .
In this model, the binding polynomial and its roots play an important role for the characterization of the binding behavior of the ligand to the target molecule (Briggs 1983(Briggs , 1984(Briggs , 1985Connelly et al. 1986), in particular in the context of cooperativity (Abeliovich 2016;Stefan 2017). The rational functions of systems with n binding sites can be represented as sums of n Henderson-Hasselbalch curves (Martini and Ullmann 2013;Onufriev et al. 2001;Onufriev and Ullmann 2004), which means that any given system of interacting binding sites can be represented by a hypothetical molecule consisting of stochastic independent binding sites (Martini et al. 2013a) with the same titration curve. The roots of the binding polynomial determine the binding energies of the independent pseudo-sites in this so-called decoupled sites representation.
For two different types of ligands, the binding polynomial has two variables (representing the activities of both ligands in the environment) and (n 1 + 1) · (m 1 + 1) coefficients a i, j i=0...n 1 ; j=0,...,n 2 . Given an arbitrary binding polynomial for two types of ligands, it is not in general possible to find a molecule without interactions between all binding sites and possessing this binding polynomial. However, molecules can be found in which the binding sites for the same type of ligand do not interact and only interactions between sites for different ligands are non-trivial (Martini et al. 2013b, c). Contrary to the case of one type of ligand, where the decoupled sites representation is unique up to permutation of the roots, there are several different decoupled molecules. It has been shown previously that in the case of n and 1 binding sites for the two ligands, respectively, there are n! decoupled molecules. The situation becomes more complicated for general systems of n 1 and n 2 binding sites. The main goal of this paper is to prove the following theorem.  (n, 2) are the solutions to a system of 3n + 2 unknowns: the n + 2 binding energies and the 2n interaction energies. For generic binding polynomials, the number of complex solutions to this system equals 4(n!) 3 . These come in 2(n!) 2 classes under relabeling of the sites.
The article is structured as follows: In Sect. 2, we recall the definition of the binding polynomial and formulate the central question addressed in this work. In Sect. 3, we recall some results and techniques of numerical algebraic geometry, which are necessary to prove the main theorem in Sect. 4. We conclude the article with some experimental results in Sect. 5 and some open questions in Sect. 6.

Background and framework
In this section, we briefly recap the algebraic framework as well as past results, and, in doing so, fix various notations. Most importantly, we introduce some shorthand notation for molecules with (n, 2) sites for Sects. 3 and 4.

Single type of ligand
The binding behavior of systems with one type of ligand is governed by the energies required to bind to each site of the target molecule and the way different binding sites interact with each other. Following the notation of Martini and Ullmann (2013), we identify target molecules with these parameters.
Definition 2.1 A molecule M with n sites for one type of ligand is a point The g i are called the binding energies and the w i, j are called the interaction energies; they measure, respectively, the energy at each site i and the interaction energy between sites i and j (see Fig. 1). We call M decoupled if w i, j = 1 for all 1 ≤ i < j ≤ n.
We will consider the natural S n action that corresponds to relabelling the sites: σ · (g 1 , . . . , g n , w 1,2 , . . . , w n−1,n ) := (g σ (1) , . . . , g σ (n) , w σ (1),σ (2) , . . . , w σ (n−1),σ (n) ) for σ ∈ S n . Definition 2.2 Given a target molecule M with n sites, we refer to the power set K := 2 {1,...,n} as the set of all microstates. Each microstate I ∈ K describes a binding state by indicating whether binding site i is occupied (i ∈ I ) or not (i / ∈ I ). To each microstate I we associate a microstate constant The binding polynomial is then defined as It is a polynomial of degree n with constant term 1, and the map M → P M is constant on the S n orbits, i.e. P M (Λ) = P σ (M) (Λ) for every M ∈ (C * ) n(n+1) 2 and all σ ∈ S n .
The following theorem is also known as the decoupled sites representation. It implies that any molecule with real binding and interaction energies can be uniquely represented by a molecule with neutral interaction energy, provided that complex binding energies are allowed. Its proof consists of a reformulation of Vieta's formulas.

Multiple types of ligands
In case of d > 1 types of ligands, we consider each binding site to be only able to take up to one type of ligand (Martini et al. 2013c). This is sensible, as we can model a single binding site capable of binding to two types of ligands as two binding sites with interaction energies set so that the two sites can never be saturated at the same time.
For our purposes, let us assume that d = 2. We write n 1 and n 2 for the number of sites capable of binding to the first and second ligand, respectively.
Similar to the case d = 1, there is a natural S n 1 × S n 2 action that corresponds to relabelling the sites.
Definition 2.5 Similar to the case d = 1, we can define the binding polynomial P M of a molecule M. Explicitly, for decoupled molecules M, P M is a bivariate polynomial in the two (ligand) variables Λ 1 and Λ 2 , where the coefficients a i, j are given by It is a bivariate polynomial of bidegree (n 1 , n 2 ) with constant term 1. Moreover, the map M → P M is constant on the S n 1 × S n 2 -orbits, i.e. P M (Λ) = P σ (M) (Λ) for every M ∈ (C * ) n 1 +n 2 × (C * ) ( n 1 +n 2 2 ) and all σ ∈ S n 1 × S n 2 .
For (n 1 , n 2 ) = (n, 1), the decoupled sites representation takes the following form: Theorem 2.6 (Martini et al. 2013c, Corollary 2) For any molecule N with (n, 1) sites there exist, up to relabelling of the sites, and counted with multiplicity, n! decoupled molecules M of the same type such that P N = P M .

Decoupled molecules with (n, 2) sites
The main focus of this article are decoupled molecules with (n, 2) sites, for which we will simplify the notation as follows: instead of T 1 , . . . , T n , we label the n binding sites of the first type with 1, . . . , n, and, instead of S 1 , S 2 , we label the two binding sites of the second type with A, B (see Fig. 3), so that . . . a n,2 = g A g B g 1 w 1,A w 1,B . . . g n w n,A w n,B .
The formulas for the coefficients of the binding polynomial then simplify to the polynomials in System (2) (see Fig. 4). For an explicit instance of the equations and their solutions, see Sect. 5.1. We denote the pair set of decoupled molecules with (n, 2) sites and their binding polynomials of bidegree (n, 2) (ignoring its constant term 1)

Numerical algebraic geometry
In this section we recall some basic notions of numerical algebraic geometry and its main workhorse: homotopy continuation. For that, we regard M as the preimage of 0 of the polynomial map where g := (g 1 , . . . , g n , g A , g B ) and w := (w 1,A , . . . , w n,A , w 1,B , . . . , w n,B ) are referred to as unknowns, and a = (a 1,0 , . . . , a n,2 ) are regarded as parameters. We fix the projection One fundamental and important concept is that solutions vary continuously in the parameters, which is summarized in the following theorem.

Theorem 3.1 (Sommese and Wampler 2005, Theorem
(2) for fixed a ∈ V, f (g, w; a ) = 0 has only isolated solutions for (g, w) ∈ U; (3) for fixed a ∈ V, the multiplicity of (g * , w * ; a * ) as a solution of f (g, w; a * ) = 0 is the sum of the multiplicities of the solutions of f (g, w; a ) = 0 for (g, w) ∈ U.
Given any parameter a ∈ C n , one can show that any solution g ∈ C n to f (g; a ) = 0 consists of the roots of the univariate polynomial t n + a 1 t n−1 + · · · + a n . This is commonly known as Vieta's formula (Hazewinkel 2013). Hence there exist a Zariskiopen set U := C n \ Disc x (x n + a 1 x n−1 + · · · + a n ) such that for any a ∈ U there are n! distinct simple solutions to f (g; a ) = 0. We say that there are generically n! solutions and refer to a ∈ U as a generic choice of parameters. Should x n +a 1 x n−1 +· · ·+a n = (x −1) n , then the only solution is g = (1, . . . , 1). Theorem 3.1 implies that this solution is of multiplicity n!. This will be important in the proof of Lemma 4.2. The arguably most essential tool in numerical algebraic geometry is path tracking: Given there exist under good circumstances (Sommese and Wampler 2005, Theorem 7 However, the solution path might diverge, which is why these problems are commonly studied in projective space. with two starting solutions (0; 1), (1; 1), the target parameter 0 and the continuous, The two solution paths are of which the first obviously converges, while the second diverges. Note that diverging paths can only appear if parameters occur in the coefficients of non-constant monomials, which is not the case in System (2), see proof of Theorem 4.4.

Generic decoupled molecules with (n, 2) sites
In this section, we show that a generic binding polynomial represents 4 · (n!) 3 decoupled molecules with (n, 2) sites. Due to the complexity of the system of polynomial equations, the proof is split in two parts. First, we study a special class of decoupled molecules and their binding polynomials. In a second step, we study their implication to the generic case.
The multiplicity follows from the fact that: • any solution (g A , g B ; a 0, j ) to the two equations in the first row is of multiplicity 2 (see Example 3.2), • the solution (g 1 , . . . , g n ; a i,0 ) = (1, . . . , 1; n i ) to the latter equations in the first column is of multiplicity n!, • given g A = g B = g i = 1, the solution (w i,A , w i,B ; a i,2 ) = (1, . . . , 1; n i ) to the latter equations in the second column is of multiplicity n!. and from the fact that the multiplicity of the entire system equals the product of the multiplicities of the three smaller systems in our case (Eisenbud and Harris 2016, Proposition 1.29). Proposition 4.3 A generic normalized binding polynomial represents 2n! decoupled molecules, each of multiplicity 2(n!) 2 .
Proof By Lemma 4.2, it suffices to show that System (3) has 2n! simple solutions for generic a = (a 1,1 , . . . , a n,1 ) ∈ C n , or rather for (a 1,1 , . . . , a n,1 ) ∈ U for some Euclidean open subset U ⊆ C n . For the sake of simplicity, we abbreviate a i := a i,1 and w i := w i,A for i = 1, . . . , n. Next, we introduce n new variables μ 1 , . . . , μ n and consider the following equivalent system of 2n equations in the 2n variables μ 1 , . . . , μ n , w 1 . . . , w n : . . . . . . a n − μ n = 1 w 1 . . . w n (−n) Let N be the variety cut out by the system above and let π μ and π a denote the three projections onto μ i and a i respectively.

A generic decoupled sites representation
In this subsection, we will infer from Proposition 4.3 the number of molecules a generic binding polynomial represents.
Theorem 4.4 A generic binding polynomial of bidegree (n, 2) represents 4(n!) 3 decoupled molecules with (n, 2) sites. These come in 2(n!) 2 classes modulo the S n ×S 2 action that corresponds to relabelling of the sites. to f (g, w; a * ) = 0 of multiplicity 2(n!) 2 , and we will now argue why each of them yields 2(n!) 2 solutions if we perturb a * slightly, see Fig. 6. Applying Theorem 3.1 to each of the solutions, we obtain an open subset U ⊆ X ×Y , where X := (C * ) n+2 × (C * ) n·2 , Y := C 3n+2 , such that • for any a ∈ π a (U ), U ∩ X × {a }) has only isolated solutions of f (g, w; a ) = 0, • the sum of the multiplicities of those isolated solutions is 4(n!) 3 .
It remains to show that U contains all isolated solutions of f (g, w; a ) = 0 for a ∈ π a (U ) and that the solutions are simple. Both follow from the fact that our parameters are exactly the constant terms, i.e. we can regard M as the graph of the polynomial map To see that U contains all solutions to f (g, w; a ) = 0, observe that any solution (g , w ; a ) has a solution path such that f (z(t); φ(t)) = 0 for all t ∈ (0, 1]. As lim t→0 h(z(t)) = lim t→0 φ(t) = a * and h is polynomial, lim t→0 z(t) has to converge. Therefore (lim t→0 z(t), a * ) is one of our 2n! solutions, which implies (g , w ; a ) ∈ U .
To see that solutions are simple for generic binding polynomial note that a solution (g , w ; a ) to f (g, w; a ) = 0 is singular if and only if the point (g , w ) is a critical point of h. Hence, any solution in the following open set will be simple (g , w ) critical point} consists of the images of all critical points of h. It remains to show that that S is not the entire ambient space Y , so that the set of all critical points of h is a proper subvariety of X of positive codimension and so is the Zariski-closure of S. This is equivalent to the fact that the Jacobian of h has a non-zero determinant, which we will show defining an ordering on the set of all monomials and proving that the determinant has a non-zero maximal monomial with respect to it. Let > be the monomial ordering defined by where > lex is the lexicographical ordering on N k for arbitrary k, i.e.
Letting h i, j denote the component of h consisting of the polynomial on the righthand-side of a i, j in System (2), the Jacobian of h is of the form , and it remains to show that both J g and J w have non-zero determinant. Explicitly, J g is of the form The following monomial, which is contained in the product of all diagonal entries, is maximal among the monomial appearing in the Leibniz formula for determinants: Moreover, this monomial can only be obtained by multiplying the diagonal elements, because each s i maximal in its row and and the earlier entries consists of monomials strictly smaller than it. This implies that s g occurs only once in the Leibniz formula for determinants, making it the maximal monomial in the thus non-zero determinant.
Ignoring g A , g B , g 1 , . . . , g n , which can be ignored in the search of the largest monomial due to our choice of ordering, and abbreviating w i,AB := w i,A w i,B , the matrix J w is of the form: ⎛ The following monomial, contained in the product of all diagonal entries, is maximal among the monomials appearing in the Leibniz formula for determinants: s w := (1) · (w 1,A ) · · · (w 1,A · · · w n−1,A ) · (w 1,A ) · (w 1,A w 2,AB ) · · · (w 1,A w 2,AB · · · w n−1,AB ) As before, this polynomial can only be obtained by multiplying all diagonal entries, as the earlier entries in a row consists of strictly smaller monomials. This implies that s w occurs only once in the Leibniz formula for determinants, making it the maximal monomial in the thus non-zero determinant.

Further experimental results
In this section, we provide some experimental results for (n, 2) and beyond. For simplicity, we will use randomly chosen a ∈ C (n+1)(2+1)−1 . Moreover, we will also fix a choice of g S 1 , . . . , g S n 1 , g T 1 , . . . , g T n 2 to factor out the natural S n 1 × S n 2 action on the roots of System (1), see Sect. 5.1.
All computations are done using one of the following three programs: bertini (https://bertini.nd.edu/): A solver for polynomial equations using numerical algebraic geometry. It has built-in features for parallel path-tracking, which proved to be particularly useful for big examples. gfan (http://home.math.au.dk/jensen/software/gfan/gfan.html): A software package for computing Gröbner fans and tropical varieties. It features a new algorithm for computing mixed volumes using tropical homotopy methods (Jensen 2016). Singular (Decker et al. 2016): A computer algebra system for polynomial computations, with special emphasis on commutative and non-commutative algebra, algebraic geometry, and singularity theory.
Scripts, tutorials and other auxiliary files for the computations are available at https:// software.mis.mpg.de.

-
N variables and with only finitely many solutions, the mixed volume of their Newton polytopes is an upper bound on the number of solutions that is attained provided the non-zero coefficients are generic. This is known as the Bernstein-Khovanskii-Kushnirenko Theorem (Bernstein 1975), or BKK Theorem in short. Figure 8 shows a table with the mixed volume for various (n 1 , n 2 ) computed using gfan. We see that the number for (n 1 , 1) and (n 1 , 2) corresponds with the theoretical results. Sadly, there is no apparent pattern for (n 1 , n 2 ) with n 2 > 2.
Note that there exist criteria on so-called Newton-degeneracy which guarantee that the mixed volume equals the number of roots (Huber and Sturmfels 1995). However, due to the high dimension of the Newton polytopes, these were infeasible to verify for the cases of interest such as (4, 3).

Counting solutions using Gröbner bases
Given a zero-dimensional polynomial ideal I C[x], the dimension of C[x]/I as a C-vector space equals the number of solutions counted with multiplicity, though in our specific case we only have solutions with multiplicity 1. The vector space dimension can be easily read off any Gröbner basis, but computing the Gröbner basis itself is a highly challenging task (Greuel and Pfister 2008, Section 1.8.5). In Fig. 8, red numbers mark all cases for which Gröbner bases were computable in Singular. The respective vector space dimensions (computed using the Singular command vdim) all coincided with the mixed volume. This shows that their coefficients are generic in the context of the BKK Theorem, though for all cases but (3, 3) this was already proven.

Open questions
We close with three open questions. Question 6.1 What is the number of solutions for (n 1 , n 2 )?
For binding polynomials of bidegree (n, 1) and (n, 2), the number of decoupled molecules is given by relatively simple expressions. Assuming that the mixed volumes of the Newton polytopes equals the number of solutions, Table 8 indicates a more complicated pattern in the number of decoupled molecules for (n, 3). The smallest interesting example is the case (4, 3) for which we conjecture that the number equals 162432.
Question 6.2 How many solutions with real, positive values for g i and w i, j exist?
For univariate binding polynomials, the existence of complex roots suggests that the system does strongly interact and cannot be represented by a real decoupled system. In particular it is an indicator for "cooperativity" (Martini et al. 2016). It is neither clear how this concept can be translated to decoupled molecules for two types of ligands nor which characteristic different decoupled molecules share. To develop an understanding, it would be helpful to determine the number of real, positive solutions for small examples.

Question 6.3 Find an algorithm to compute the minimal interaction energy that a molecule with prescribed binding polynomial has.
For univariate binding polynomials, a quantitative measure for "cooperativity" is mapping the polynomial to the minimal interaction energy which is required to generate it (Martini 2017). In more detail, the norm of a molecule is the product of all the absolutes of its interaction energies, |M| = |w i, j |, where |w| := max w, w −1 , while the norm of a polynomial is the minimal norm of all molecules that give rise to this polynomial. How can we calculate | |? It would be interesting to investigate whether the machinery that has been developed for Euclidean distance degree is applicable in our setting (Draisma et al. 2016).