Families of Toric Chemical Reaction Networks

We study families of chemical reaction networks whose positive steady states are toric, and therefore can be parameterized by monomials. Families are constructed algorithmically from a core network; we show that if a family member is multistationary, then so are all subsequent networks in the family. Further, we address the questions of model selection and experimental design for families by investigating the algebraic dependencies of the chemical concentrations using matroids. Given a family with toric steady states and a constant number of conservation relations, we construct a matroid that encodes important information regarding the steady state behaviour of the entire family. Among other things, this gives necessary conditions for the distinguishability of families of reaction networks with respect to a data set of measured chemical concentrations. We illustrate our results using multi-site phosphorylation networks.


Introduction
Many of the fundamental processes in biological cells can be described by a set of interlinked chemical reactions. Prominent examples of cellular processes regulated via biochemical interactions include immune response [1], cell signalling [2], cell death [3,4], and toxin formation [5]. For this reason the study of chemical reaction networks forms a central part of algebraic systems biology [6,7,8,9,10]. One approach focuses on the long term behaviour of networks by investigating their steady states and the relation of the number and stability of steady states to the network structure [10,11,13]. In this paper we investigate the positive steady states for algorithmically constructed reaction networks, which we call families, for which the positive steady states may be parameterized by monomials (i.e. polynomials with a single term). Further, we use the algebraic dependencies of the variables representing the chemical concentrations to investigate experimental design and model identification for entire families.
Families of networks are formally defined in Definition 3.2. To obtain an intuition for what could be described as a family, we introduce phosphorylation networks [13,19]. Phosphorylation is a vital signalling process in biochemistry and it is one of the most widely studied protein modifications. During phosphorylation a phosphoryl group (P O − 3 ) is added to an organic molecule which acts as a substrate. The chemical reaction is catalysed by enzymes and often many phosphoryl groups can be added to the same substrate. This process is known as multisite phosphorylation. There are two limiting mechanisms which have been investigated in the literature and which will serve as the running examples in this paper, namely, distributive phosphorylation and processive phosphorylation. In the distributive system the enzyme unbinds from the substrate after every time a P O − 3 is added and in the processive mechanism the enzyme stays bound until the substrate is fully phosphorylated. The desphosphorylation mechanisms work analogously. The distributive and processive mechanisms for one site substrates follow the same reaction scheme However, on a two-site substrate the mechanisms differ, namely, the distributive mechanism is given by and the processive system is It is clear that the constructions for both the distributive and the processive system can be extended to an N -site substrate and that two different procedures are needed to do so. Hence, we can consider these two mechanisms to be in two distinct families of networks.
In this paper we study graph theoretic constructions which allows us to rigorously identify families of networks and also construct them from a base graph. Most importantly, though, we investigate which steady state properties are conserved throughout a family. One of the central goals of this paper is to develop criteria to establish which families have members with multiple positive steady states, so-called multistationary networks. Confirming that a network is multistationary and finding the associated parameter regions is highly nontrivial; a range of different approaches have been applied previously (see [20] for a survey). In particular, monomial positive steady state parameterizations have proved fruitful due to their relations to toric varieties which are well understood in algebraic geometry [18].
Previous work relating to the concept of families in this paper considers so-called atoms of multistationarity, which are the smallest multistationary subnetworks which can induce multistationarity in their parent networks [25]. Network properties resulting from the gluing of networks are investigated in [26]. Other network modifications which preserve or destroy multistationarity are studied in [27]. Recent results extend the techniques for identifying multistationarity to highly structured networks [23] and networks with intermediate species [24,42]. In this paper we develop a concept of families of networks which unifies the notions of highly structured networks, embedded subnetworks as defined in [25] (i.e. distributive systems), and networks with intermediate species [42] (i.e. processive systems). We show in Theorem 5.2 that if a member of a family is multistationary then so are all larger networks obtained by the recursive construction of Definition 3.2.
Going beyond multistationarity, we also investigate the necessary conditions for model rejection among members of a family if only limited steady state data is available. In particular, in Section 4, we encode algebraic dependencies between the variables using a combinatorial object called an algebraic matroid [29,30]. From the algebraic matroid we find binomial relations which have to be satisfied by the chemical concentrations at any positive steady state, so-called steady state invariants. The results in Section 4.1 extend the previous research of [33,34,32] and the application of matroids for experimental design presented in [6,34]. Using these previous results, we give novel necessary conditions for the distinguishability of two members of a family of reaction networks with respect to a data set of measured chemical concentrations. Consequently, we can also prescribe which species to measure to be able to reject a family member.
In summary, the biochemical questions we would like to address are: 1. What conditions are needed to define families of chemical reaction networks and what is the relation between their steady states? (Section 3) 2. Can we use the family construction in model selection or parameter estimation for the entire family? (Section 4) 3. Can we find conditions such that multistationarity of one family member implies multistationarity for all subsequent members? (Section 5) These motivating questions from chemistry translate into the following algebraic questions which we answer using techniques from toric geometry and matroid theory: 1. What are the relations between the toric varieties defined by recursively constructed reaction graphs?
2. What is the connection between the circuit polynomials of matroids associated to different family members?
3. What is the relation between the positive parts of the steady state varieties of subsequent family members?
This paper is organised as follows. Section 2 introduces chemical reaction networks and relevant definitions from toric geometry and matroid theory. In Section 3 we give a rigorous definition of a family of reaction network graphs and some preliminary results. In Section 4 we focus on biochemical and algebraic question 2 using matroid theory. We also introduce some new terminology which simplifies the proofs in the remainder of the paper. In Section 5 we prove the main result on multistationarity using the matroidal language developed in the previous section. We discuss our results and suggest further directions in Section 6.

Background
In this section we briefly introduce aspects of chemical reaction network theory and discuss essential notions of algebraic geometry and matroid theory.

Chemical Reaction Network Theory
Informally a chemical reaction network (CRN) can be described by a multiset N = {S, C, R}, where S is the set of species, C is the set of linear combinations of species (complexes) and R is the set of reactions.
The multiset N defines a directed graph (digraph) G whose vertex set is C and whose edge set is defined by the reaction set R. The reaction from complex C i to C j is an element of the reaction set R if and only if there is a directed edge C i → C j in G. Let X l ∈ S and {α il } ∈ Z ≥0 . A reaction from complex C i = l α il X l to C j = l α jl X n , with rate constant κ is written as the constants α il are called the stoichiometric coefficients of the complex C i . Let the reaction vector for the th reaction C i → C j be r = α j − α i where α i , α j are the column vectors of the stoichiometric coefficients of the complexes C i and C j . The n × m matrix of all reaction vectors Γ = (r 1 , . . . , r m ) is called the stoichiometric matrix. The rate constant κ assigns a weight to each edge of the digraph G, making G a weighted digraph. Definition 2.2 gives the description of a CRN which we are going to adopt for the remainder of this paper. While this is not the standard definition of a CRN, but equivalent to it, it allows us to focus more on the graphical structure of CRNs.
Definition 2.2. A chemical reaction network is a weighted directed graph G = (C, R) with vertex set C, edge set R and edge weights κ = (κ 1 , . . . , κ m ) T .
To connect the graphical structure of a CRN to its dynamical properties a description of reaction kinetics is needed. Previous work introduced a number of reaction laws such as mass action, rational function kinetics, Michaelis-Menten kinetics or Hill function kinetics [14]. In this paper we use mass action kinetics which assigns a monomial to each complex in the network. Let the chemical concentration of chemical species X n be x n , then the monomial for complex C i is obtained by Hence, using the representation of complexes as monomials we represent C as a vector of monomials x α = (x α 1 , . . . , x αm ) T .
Example 2.3. Revisiting the one site phosphorylation network (1), we map each species to its concentration E → x 1 , F → x 2 , S 0 → x 3 , S 1 → x 4 , ES 0 → x 5 , F S 1 → x 6 and introduce a vector of rate constants κ = (κ 1 , . . . , κ 6 ) T . Then, the reaction network is represented by the weighted digraph The dynamics of the network can be expressed in terms of the network structure and the stoichiometric coefficients as a set of ordinary differential equations where A κ is the negative weighted graph Laplacian of G, α T is the matrix of stoichiometric coefficients and x α is a vector of monomials. An alternative representation of equation (6) assigns a monomial κ x α to the th reaction in the network to build a vector R(x) = (κ 1 x α 1 , . . . , κ m x αm ) T . Then, the dynamical system (6) is given by dx/dt = ΓR(x), where Γ is the stoichiometric matrix as above.
The left kernel of the stoichiometric matrix, Γ, is of biochemical importance as it describes conservation relations. Conservation relations induce linear relations between the variables. Informally we say that a set of species is conserved if their total concentration, c, is constant. In particular, suppose z ∈ ker Γ T , then z T (dx/dt) = 0 which implies z T x = c ∈ R >0 ; this latter equation is then a conservation relation.
Remark 2.4. Given a CRN, N, with a maximum of d independent conservation relations, the linear subspace defined by the conservation relations, often referred to as compatibility class, may be compactly written as Zx − c = 0, where c ∈ R d ≥0 . The rows of Zx − c define the subspace and the matrix Z = (z 1 , . . . , z d ) T represents the conservation relations.
Example 2.5. In the one site phosphorylation system (1) (also Example 2.3), the enzyme E is conserved, but can exist in two states, the free state E and the bound state ES 1 . Thus, the total concentration c 1 = x 1 +x 3 is conserved. In total there are three independent conservation relations as the total mass of E, F , and the substrate is conserved respectively.
We now proceed to defining the steady states of a chemical reaction network [19].
Often one is interested in whether a chemical reaction network can have multiple positive steady states for a given set of rate constants κ and total concentrations c.

Algebraic Geometry and Toric Steady States
In this subsection we briefly introduce the notion of toric varieties from algebraic geometry, and their connections to chemical reaction networks. For an introduction to toric varieties we refer the reader to [18,38].
Given polynomial equations f 1 (x 1 , . . . , x n ), . . . , f r (x 1 , . . . , x n ) in the polynomial ring C[x 1 , . . . , x n ] we define the algebraic variety Note that, by definition, g(x) = 0 for any x ∈ V (f 1 , . . . , f r ) and any polynomial g in the ideal . , x n ]; hence we may also associate a variety V (I) = V (f 1 , . . . , f r ) to an ideal I = f 1 , . . . , f r . As in §2.1 we will consider a chemical reaction network G = (C, R) defining a ODE model as in (6). The set of all steady states of this system of ODEs (in an affine space C n ) defines an algebraic variety generated by the steady state ideal The ring R is generated by the chemical concentrations and defined over the field C.
A toric variety is an algebraic variety which can be described as the image of a monomial map. Work over the field C and fix an integer d × n-matrix A, with columns a 1 , a 2 , . . . , a n , and rank d. A given column vector a i defines a (Laurent) monomial t a i = t a 1i 1 t a 2i 2 · · · t a di d where t ∈ (C * ) d , and C * = C\{0} denotes the non-zero elements of the field (this is sometimes called the complex torus). The affine toric variety over C defined by A is In words, the variety X A is the (Zariski) closure in C n of the image of the monomial map defined by A. The defining implicit equations for X A can always be chosen to be binomials, that is X A = V (I) where I is an ideal defined by binomial equations. Specifically by [12,Corrollary 4.3] we have that where c + i is equal to c i if c i > 0 and 0 otherwise, and where c − i is equal to |c i | if c i < 0 and 0 otherwise. The ideal I above is, additionally, a prime ideal; we say an ideal I ⊂ C[x 1 , . . . , x n ] is prime if whenever f ·g ∈ I then either f ∈ I or g ∈ I (or both are in I). Geometrically this means the associated variety is irreducible (i.e. it cannot be written as a non-trivial union of other varieties) and reduced (meaning a generic point has multiplicity one, e.g. the roots of x 2 = 0 have multiplicity two).
Conversely, any prime polynomial ideal f 1 , . . . , f m in C[x 1 , . . . , x n ], where each f i is a binomial, defines a toric variety X A in C n . Chemical reaction networks whose steady state ideal is generated by such prime ideals have been studied in the literature e.g. [19]. To simplify notation we make the following definition. Definition 2.8. A chemical reaction network whose steady state ideal has an associated prime that is binomial is a toric chemical reaction network.
The fact that toric varieties can be parameterized by monomials over C * also yields a parametric representation of the positive part of the toric variety, i.e. of the positive steady states of the associated reaction network. For this reason we restrict our analysis to toric networks. We will also wish to explicitly incorporate the reaction rates in our analysis, hence our starting point will be a binomial ideal with coefficients in R(κ) := R(κ 1 , . . . , κ m ), the field of rational functions of the reaction rates with real coefficients. More precisely we consider a prime binomial steady state ideal defined by ν equations, The vectors b + i and b − i are positive column vectors with disjoint support and let B = (b 1 , . . . , b ν ) be the matrix with columns called the exponent matrix of I. Then there exists a matrix A such that b i ∈ ker(A), for i = 1, . . . , ν. The matrix A is a d × n integer matrix where d is the dimension of the toric variety and n is the dimension of the ambient affine space. Note that, since the dimension of a variety cannot exceed the dimension of its ambient space, rank(A) ≤ d ≤ n.
Remark 2.9. Note that there is some freedom in the choice of the matrix A, as integer row operations on rows of A will not change its kernel and, hence, the variety it parameterizes is left unaltered. Therefore, we informally refer to any matrix A as the A-matrix (associated to a toric variety X A = V (I)).
To explicitly incorporate the reaction rates in the monomial parameterization we will now define the scaled toric variety of a matrix A and a vector x * ∈ C n . The value x * can be thought of a particular steady state which in turn depends on the reaction rates, see Remark 2.13. Definition 2.10 (Scaled Toric Variety). Given an x * = (x * 1 , . . . , x * n ) ∈ C n define a (Laurent) monomial map ψ for a i a column of A. The (Zariski) closure of the image of this monomial map defines a scaled toric variety, The monomial map ψ A also induces a parameterization map.
Definition 2.11. The parameterization map defined by the A-matrix is the C-algebra homomorphism Note, as above, the map φ A depends on x * ∈ C n .
Example 2.12. The steady state ideal of (1) is which is a (prime) binomial ideal in the ring R(κ 1 , . . . , κ 6 )[x 1 , . . . , x 6 ]. Hence, we can find a parameterization Remark 2.13. As shown in [19] the entries of x * are algebraic functions in κ. In practice we wish to restrict our choices of κ to those such that when we evaluate x * at some fixed The proposition below connects the number of conserved quantities to the dimension of the toric variety and is straightforward, however, we include a proof as we were unable to locate one in the literature.
Proof. Each linear form j contains a constant term c j ∈ R which can be chosen freely. For a sufficiently general choice of Because V is toric it is parameterized by monomials, and hence its dimension in the positive orthant is the same as its dimension over R n . Remark 2.15. In the notation of Proposition 2.14 any d > 0 implies that there is an infinite number of steady states for each general choice of c 1 , . . . , c d . Hence, the case d > 0 renders the discussion of multistationarity irrelevant. Therefore for the remainder of this paper we assume that d = 0 meaning that the dimension of the toric variety is equal to the number of conservation relations.
Example 2.16 (Example 2.12 cont.). The network of Example 2.12 it is known to have a finite number of positive steady states for each set of initial conditions (d = 0) and there are three independent conservation relations x 1 + x 5 = c 1 , x 2 + x 6 = c 2 and x 3 + x 4 + x 5 + x 6 = c 3 . This implies that the dimension of the (toric) steady state variety, V (I), is dim(V (I)) = 3 which is indeed the case.

Matroids
One of the focuses of this paper is to find and study independent subsets of chemical species. Independent subsets can give valuable information about which species concentrations have to be measured and which concentrations can be determined from measurements [6]. A simple example of linear independence is the set of three vectors The vectors v 1 and v 2 are said to from a basis of the set {v 1 , v 2 , v 3 } which is a familiar notion of linear algebra. The set {v 1 , v 2 , v 3 } is minimally dependent as it contains only a single element other than a basis and is a circuit. Matroid theory extends the notion of independence to polynomials rings [29]. First, we define a matroid.

Definition 2.17 (Matroid). A matroid M(E, I) is a pair of two finite sets (E, I)
where E is the ground set and I is a set of subsets of E, called independent sets, satisfying the following conditions.
2. If i ∈ I and i ⊆ i then i ∈ I. This is called the hereditary property.
This is the exchange property.
The notions of matroid basis, rank and circuit will also be useful.
Definition 2.18. Matroid bases, rank and circuits are defined as follows: • A basis S of a matroid M is a maximally independent subset, i.e. a subset S ∈ I such that S ∪ k is a dependent set for all k ∈ E − S. Define the set of all bases to be B.
• The rank ρ is a function on a subset of E which takes a set e ⊆ E, and returns the cardinality of the largest subset i ⊆ e which also satisfies i ∈ I.
• The circuits C are minimally dependent sets, i.e. subsets of E which satisfy C ∈ I and (C − z) ∈ I for any z ∈ C.
In particular for a matroid M it holds that all bases of M are equicardinal. An important notion in matroid theory is also the rank of a matroid. Due to the general definition of a matroid many mathematical objects have a matroid structure such as sets of vectors in a vector space or graphs. One class of matroids relevant for chemical reaction networks are algebraic matroids. Algebraic matroids encode the algebraic dependencies between the variables of a polynomial ring C[x 1 , . . . , x n ] in a prime ideal P . A more detailed discussion of these ideas in given in Section 4. For information on how to compute algebraic matroids we refer the reader to [29,39]. Since we are interested in the relation between the matroids of families of reaction networks we introduce the notion of a submatroid.
Submatroids of algebraic matroids contain a subset of the polynomial relations between the variables of circuits of the original matroid. The relations between the variables of circuits are referred to as circuit polynomials. Below we will consider algebraic matroids of toric networks and the relations of matroids within a family of networks.
Example 2.21. The algebraic matroid of 2.12 can be calculated using the computer algebra system Macaulay2 [31]. It is a matroid on the ground set E = {x 1 , . . . , x 6 } of rank 3 with 12 bases. Hence, as we will show in later sections, measuring only four species will suffice to reject the one site model, any three species in a basis and one extra species. Further, the matroid of the two site distributive system is a rank 3 matroid with 61 bases and the matroid of the two site processive network is a rank 3 matroid with 19 bases. Both two-site model matroids contain the one site matroid as a submatroid.

Families of Reaction Networks
In this paper we obtain results which hold for a range of networks rather than a single network. When the properties studied (i.e. multistationarity) are present in a "small network", these properties lift to all larger networks constructed by the procedure outlined below. We call the collection of all such networks a family N with members N i . In this section we rigorously define families of networks and give examples of chemical systems fulfilling the family property. Furthermore, we prove some results on families which have toric steady states.

Family Definition and Properties
Families of networks can be found in a wide range of biochemical settings such as multisite phosphorylation [13] (e.g. cellular signalling, DNA transcription, cell death), kinetic proofreading [1] (immune response) or compartmentalised diffusion [40] (spatial models).
We identify a family by properties of its reaction graphs. If we can construct the graph of another network from a given network by a fixed set of procedures, the networks are in the same family; an example of such a construction is given in Figure 1. 1. Add a set E of edges adjacent to both some vertices of M and some vertices of G n ; the resulting graph G int n is referred to as the intermediate graph (see also Definition 3.8).
2. Delete some subset of the edges of G n from the graph G int n to form G n+1 .
(c) Deleting edges to form the graph G 1 We formalise this construction using a definition from graph theory [41]. Graphs constructed as in Remark 3.1 form a family if they satisfy the conditions of Definition 3.2 below. Definition 3.2 (See also Section 2 of [41]). For a reaction graph G = (C, R) and for U ⊆ C let G[U ] be the subgraph induced by U and N G (U ) the set of vertices adjacent to some vertex in U . Consider a set of graphs Further, there exists a positive integer n, a labelled graph M and a set of reactions Y , such that the set {G i } i≥0 is called a family of graphs if the following properties hold: is equal to M for n > r. In addition, G n [W n ] is always the same graph.
Put simply the last condition says that the graph "added" to the previous graph must be the same for each step throughout the family; this is illustrated in Figure 2 and further elucidated in [41]. In the context of chemical reaction networks we add another condition, namely, that in every subsequent family member there appears at least one new chemical species: We assume that in a network with a given set of chemical species every complex which is chemically possible and that can be formed with this set of species is already present in the network; i. e. if in a network with species H 2 and O 2 chemistry dictates that the only possible reaction is 2 H 2 + O 2 → 2 H 2 O then only the complexes 2 H 2 + O 2 and 2 H 2 O exist for this species set. Therefore, to give rise to a larger network new species have to be added.  Remark 3.4. For ease of notation, the unlabelled graph M is drawn in a "dot and arrow" representation; e.g. the graph M = {·, · → ·} corresponds to a graph with two connected components, an isolated vertex and a component with two unlabelled vertices and a directed edge between them.
Example 3.5 (Distributive Phosphorylation). Distributive phosphorylation with N = 1 phosphorylation sites follows the reaction scheme The reaction scheme is a digraph Hence (in the notation of Definition 3.2) , M = {· → ·, · ·} and r = 1. The digraphs G 1 and G 2 define the family members N 1 and N 2 . Inductively, this procedure can be continued to member N N . By inspection of the green subgraph we can see that condition 3 of Definition 3.2 is fulfilled for any N > 0 and, therefore, the distributive phosphorylation networks form a family. Further, N 1 is a subnetwork of N 2 as defined in [25, Definition 2.2].
Example 3.6 (Processive Phosphorylation). Processive phosphorylation of a substrate with N = 1 phosphorylation sites follows the reaction scheme For N = 1 this is the same reaction scheme as for distributive phosphorylation, except for the relabelling of the fully phosphorylated substrate as product P . However, to construct the next family member from the digraph G 1 we use Where the vertices M = {·, ·} were added and r = 1 as in Definition 3.2. Again, the digraphs G 1 and G 2 define the family members N 1 and N 2 . Condition 3 of Definition 3.2 is fulfilled for any N > 1 and, therefore, the processive phosphorylation networks form a family.
Example 3.7 (Non-example). As mentioned in Remark 3.3 not every infinite sequence of graphs forms a family. Consider the autocatalytic networks . . , N and setting X N +1 = X 1 .
In the reaction above ∅ represents production and degradation of a molecule by a mechanism not further studied. Without loss of generality consider N 7 and N 8 ; we see that C 8 is not a union of C 7 and another graph as X 7 + X 1 ∈ C 7 ⊆ C 8 .
We conclude this subsection by considering the polynomial equations arising from the reaction graphs of successive members of a family. First, we see from Section 2.1 that every reaction has a unique rate constant, κ i , which is the edge weight of the reaction digraph. Further, in the dynamical system given by the equations (6) every monomial, which represents a reactant complex, is multiplied by a constant κ i . Note that isolated vertices in the reaction graph do not contribute to the dynamics of the network. Hence, for families of networks which satisfy Definition 3.2 with Y = ∅, we can derive the equations of the N th member of a family from the (N + 1) th member by setting all the edge weights of the edges added to the reaction graph to zero. Formally, we define the evaluation map For networks which require the deletion of a set of edges, Y , we need the concept of an intermediate network.  The dynamical equations of the N th and (N + 1) th member can be constructed from their intermediate network by defining the appropriate evaluation maps. As can be seen from condition 2 of Definition 3.2 the reaction (edge) set of the (N + 1) th member of a family is given by The edge sets for the N th and intermediate network are given by R N and R N ∪ E N respectively. Define a function µ which associates a unique edge weight, κ i,j , to every edge of the reaction graph, This results in the edge weights π + (µ (R N ∪ E N )) = {κ m +1 , . . . , κ m+m } = µ ((R N − Y ) ∪ E N ) and π − (µ (R N ∪ E N )) = {κ 1 , . . . , κ m } = µ (R N ), which, after taking the inverse of µ, can be associated to the reaction sets of the (N + 1) th and N th network respectively. In particular, if G int N is the intermediate graph associated to G N then applying π − to (the edge set of) G int N will yield G N and applying π + to (the edge set of) G int N will yield G N +1 .
The reaction scheme for the first intermediate network of the processive phosphorylation family. If κ 1 = κ 2 = κ 3 = 0 then the N = 2 site network is obtained. Equivalently, if we let κ 7 = κ 8 = κ 9 = κ 10 = κ 11 = 0, then we obtain the N = 1 site network. The corresponding steady state ideals can be obtained in this manner from the ideal from the intermediate network.

Toric Families
As described in Definition 2.8, a reaction network is called toric if it has toric steady states. Showing whether a chemical reaction network is toric is a non-trivial task; previous results have used deficiency theory [11], network structure based methods [19] or methods based on detecting binomiality [44]. Formally, one needs to find the associated primes of a steady state ideal and compute a Gröbner basis of each. If the reduced Gröbner basis is binomial, then the ideal is a prime binomial ideal [43]. However, as every toric variety has nonzero components, removing any components of the steady state variety contained in the coordinate axes is often a first step towards identifying the toric components. Algebraically this removal is accomplished by computing the saturation I ∞ = I : (x 1 · · · x n ) ∞ , [45]. In this subsection we prove a series of small results regarding the steady state ideals of families of toric networks. First, we define a toric family.  Lemma 3.13 states that saturation and the evaluation map, π, commute. Therefore, the network operation of deleting edges in the reaction graph can be carried out before, or after, finding the prime binomial ideal in the steady state ideal. Lemma 3.13. Let π − be the evaluation map (12) and let I N +1 , I N ⊆ R[κ 1 , . . . , κ m+m , x 1 , . . . , x n+n ] be the steady state ideals of the (N + 1) th and N th family member respectively. Then it holds that The proof of this lemma uses the the geometric interpretation of the statements of the lemma and can be found in Appendix A. Suppose that N is a toric family of reaction networks with the (N + 1) th member of the family having toric steady states specified by the binomial ideal I N +1 . Lemma 3.13 establishes a useful connection between species (corresponding to variables x), and reactions (corresponding to κ), or, equivalently, between vertices and edges of a reaction graph. By the extra Condition 4 of Definition 3.2 every new edge must either originate or end on a complex containing a new species. Hence, by applying the evaluation map π − an ideal of an intermediate network I int N (= I N +1 for nested subnetworks) is automatically mapped from R[x 1 , . . . , x n+n ] (corresponding to a network with n + n chemical species) to an ideal in the ring R[x 1 , . . . , x n ] (corresponding to a network with n species).
Example 3.14 (Processive Phosphorylation (cont.)). The steady state ideal corresponding the network of Example 3.10 is As can be seen using the evaluation map π − , which sets κ 7 = · · · = κ 11 = 0, the variables x 7 and x 8 disappear from the ideal. Further, computing the saturation I : (x 1 · · · x 8 ) ∞ and the appropriate evaluation maps we can show that the relations of Lemma 3.13 hold using Macaulay2 [31].
Proposition 3.15 below shows that the image of a binomial ideal under the evaluation map π is either the constant ideal (corresponding to an empty variety) or another binomial ideal. The proof can be found in Appendix A.
A relationship between the A-matrices of successive members of toric families is now found by considering the exponent matrices of the binomial ideals and their corresponding A-matrices. Theorem 3.17. Let I N +1 : (x 1 · · · x n+n ) ∞ ⊂ R(κ 1 , . . . , κ m+m )[x 1 , . . . , x n+n ] be a binomial ideal defining the positive steady states of the (N + 1) th member of a family of reaction networks. Also let π be the evaluation map which sends κ m+1 = · · · = κ m+m = 0 and assume that I N = π(I N +1 ) ⊂ R(κ 1 , . . . , κ m )[x 1 , . . . , x n ]. Then I N : (x 1 · · · x n ) ∞ is binomial. Further, for a choice of generators, let B N +1 , B N be the matrices associated to the exponents of the binomial ideals I N +1 and I N . Similarly let A N +1 and A N be the A-matrices corresponding to B N +1 and B N . Then: , if the submatrix of B N +1 formed by the columns which have at least one non-zero entry in the last n rows has rank at most n .
The proof can be found in Appendix A. An interesting Corollary of the above Theorem occurs if the number of conservation relations is constant. This section is concluded by relating the mathematical insights of the above theorems to families of toric chemical reaction networks.

Matroid Theory for Toric Chemical Reaction Networks
In this section we study biochemical and algebraic question 2 regarding parameter estimation and model rejection. We start by showing the equivalence of two matroids, an algebraic matroid and a much simpler linear matroid. We then decorate the linear matroid with binomials and finally apply these results to model rejection. First, we prove the that following is a matroid.  The next theorem shows that the PSS matroid and the algebraic matroid defined by the binomial ideal are identical. Therefore, the algebraic matroid can be studied directly using linear algebra operations on the columns of the A-matrix defining the PSS matroid. This way bases, circuits and even circuit polynomials can be inferred trivially. The proof can be found in Appendix A. We now proceed to find decorations of the PSS matroid's circuits which reveal the full power of the PPS matroid formulation.
with λ i l ∈ Z chosen such that j−1 l=1 λ i l a i l = λ i j a i j and λ i j is positive (this is possible since C is a circuit). The expression Φ(C) is called the Laurent binomial associated to C.
and clearing the denominator gives the binomial By the following lemma the (Laurent) binomials associated to a PSS matroid are zero on the steady state variety. Hence, they form a model invariant.
Proof. Given a linearly dependent set of vectors it holds that, without loss of generality, Taking the preimage under φ A this can be rewritten as The following theorem states that a set of suitably chosen binomials contains only all positive steady states as our original ideal.
Theorem 4.8. Let M(A) be the matroid associated to a toric chemical reaction network N and choose a basis S and n − d circuits C i such that i C i = S. Then, the following holds Hence, proving that the variety V ( Φ(C 1 ), . . . , Φ(C n−d ) ), when intersected with the subspace spanned by the conservation relations, contains at least two positive points is necessary for proving multistationarity of the original toric network given by I N .
Proof. Let F k (x 1 , . . . , x n ) = Φ(C k ) to give W = V (F 1 , . . . , F n−d ) and also let The containment X A,x * ∩ R n >0 ⊆ W ∩ R n >0 follows from Lemma 4.7. Now prove the other containment. For a positive real point w ∈ W ∩ R n >0 we must have for each k = 1, . . . , n − d that F k (w) = 0. Hence, by Definition 4.5, For each circuit define the vectorλ ∈ Z n as the vector with non-zero entries i corresponding to the values of the coefficients of λ i j a i j − j−1 y=1 λ iy a iy = 0. and let Λ be the matrix with rowsλ k ; note that the rows of Λ form a basis of ker(A). Setw = (w i 1 , . . . , w i j ) and set x * = (x * i 1 , . . . , x * i j ); then w x * λ k = 1, which gives,λ k · log(w/x * ) = 0.
Further, log(w/x * ) ∈ ker(Λ) and, hence,w/x * is in the image of x * t A .
As a result of Theorem 4.8 we can restrict the study of positive steady states to the study of the PSS matroid. The next proposition establishes a connection between the PSS matroids of all members of a family. Proof. If the dimensions of the varieties are the same, then by Theorem 4.3 the A-matrix of N M is a submatrix of the A-matrix of N N . Therefore, E M ⊂ E N and, trivially, any set of linearly independent columns of any matrix is also linearly independent set of columns in any submatrix containing these. Hence, I M = I N ∩ P(E M ).

Experimental Design and Compatibility
We now study how steady state matroids and submatroids can be employed in experimental design and model rejection. For the remainder of this subsection we assume that the number of conservation relations within a family is fixed. Hence, by Proposition 4.9 the matroids of smaller family members are submatroids of the matroids of larger family members.
Previous related work includes the study of "complex-linear steady state invariants" [32] and data coplanarity [33]. A study using the language of algebraic matroids explicitly can be found in [6]. In this section we obtain results similar to [33,34] using the PSS matroid and the binomials of Definition 4.5 and show how they can be used in model rejection and experimental design for entire toric families. In particular, we focus on two extreme cases, the one of all parameters being known and the other of perfect, noise-free measurements. Statistical version of our results could form part of a future work.
First, we determine which species need to be measured to be able to construct the steady state locus for an entire family (if the rate constants are known).
Proposition 4.10. Fix a family of chemical reaction networks N for which the rate constants κ associated to every family member are known. There exists a subset of species of the smallest family member which is sufficient to measure at steady state in order to construct a corresponding positive steady state for every subsequent family member.
Proof. Recall that x * is a positive vector for known constants. Choose a basis S of the PSS matroid of the smallest family member N 1 . Hence, by Definition 4.5 binomial relations can be constructed to determine the steady state concentrations of the chemical species not in the basis. By Proposition 4.9 any basis of N 1 is a basis of the subsequent family members and, hence, Definition 4.5 applies.
Hence, by Proposition 4.10, measuring the concentrations of all species in a basis of the smallest network in a family is sufficient to determine the expected values of the chemical concentrations at steady state of the entire family.
Example 4.11. A detailed study of positive steady state parameterizations for the processive system has been carried out in [13]. Suppose we know all 11 rate constants of the one and two site processive phosphorylation networks of the running example. In this case the functions x * i (κ) can be found and evaluated explicitly (e.g. see [13]). Further, the columns a 1 , a 2 , a 3 form a basis of R 3 and hence the column space of the A matrix. Therefore, given measurements of x 1 , x 2 , x 3 , the expected steady state concentrations of any species x j with j > 3 can be found as a zero of the Laurent binomial j . Next, we use the PSS matroids for model selection or model rejection for perfect, that is noise-free, data. However, the result of Lemma 4.12 could also be applied to noisy data by following the construction in [34]. To determine whether a model is compatible with observed data it is necessary to determine whether there exists a set of parameters {κ 1 , . . . , κ m } > 0 such that a measured data point {ξ i 1 , . . . , ξ i j } is an element of the steady state variety. We proceed by formulating a parameter-free condition for model compatibility.
Lemma 4.12. Let N be a reaction network with PSS matroid M(A). Assume that the rate constants are fixed but unknown, but the initial conditions are varied. Fix a circuit C corresponding to the linear relations among the columns of A via j−1 l=1 λ i l a i l = λ i j a i j . Given a pair of steady state measurements {ξ i 1 , . . . , ξ i j } and {ζ i 1 , . . . , ζ i j } of the concentrations of C, the corresponding model is compatible only if Proof. As in Remark 2.13 we fix rate constants κ = (κ 1 , . . . , κ m ) T ∈ R m >0 such that x * ∈ R n >0 . Rearrange Φ(C) of Definition 4.4 to give For the measurements to be compatible we must have that when we evaluate the expression above at x i l = ξ i l and at x i l = ζ i l we obtain the same value, θ. The conclusion follows.
Example 4.13. Suppose we wanted to test whether some data we obtained could originate from a processive phosphorylation system. We set up the experiment using two different initial conditions and measured the concentrations of the elements of the circuit {x 1 , x 2 , x 3 , x 4 } and based on the results of the previous section we know that (x 1 x 3 )/(x 2 x 4 ) = const. Let the two ideal data points be ξ = {5/2, 3/10, 10/8, 875/96} and ζ = {1/2, 3/5, 1/8, 35/384}. A priori these data seem unrelated, however, by using the circuit information we find that Therefore, the data are compatible with the processive model. Note that the relation holds equally for the one-site and two-site model and we can conclude that the entire family is compatible.
Due to the additional structure provided by the PSS matroid the condition of Lemma 4.12 is much simpler than the linear algebra condition of [33]. By Proposition 4.9 it is easy to see that if Lemma 4.12 holds for a given PSS matroid it holds for all of its submatroids. Hence, measuring only a subset of species which belongs to a small member of the family tells us that the measurements of this subset of species is compatible for all subsequent family members. This allows us to determine that a given data set is compatible with a family of networks, but we cannot specify which network in the family is 'most' compatible with the data.
Identifying model parameters for perfect data has been studied extensively in previous work [49,50,51] and, therefore, the discussion in this chapter is restricted to the novel methods obtained by using the PSS matroid of a family of networks. In particular, the PSS matroid allows for straightforward conversion of a variety in the space of species to a variety in the space of parameters. We first show that, in the case of toric steady states the biochemically viable parameter sets, κ = (κ 1 , . . . , κ m ) T ∈ R m >0 , are the positive part of an algebraic variety and then generalise this result to the entire family.
Proposition 4.14. Let A be a full rank d × n integer matrix and let C 1 , . . . , C n−d be a collection of circuits of the matroid M(A), each containing the same basis S. Using Definition 4.4 to obtain the ideal J = Φ(C 1 ), . . . , Φ(C n−d ) ⊆ R = R(κ 1 , . . . , κ m )[x 1 , . . . , x n ] and denoting the variables present in a circuit as x(C 1 ) ⊆ {x 1 , . . . , x n }, then the intersection ideal Proof. By construction both, the numerator and the denominator of Φ(C i ) contain only variables in C i and, hence, after clearing the denominator the resulting polynomial Φ(C i ) also only contains variables in C i . Since C i is a circuit, the ideal J C i has codimension 1 in R ∩ R(κ 1 , . . . , κ m )[x(C i )] and, hence, by [52, I., §7, Proposition 4] it is principal. Further, Proposition 4.14 shows that the positive part of the steady state variety can easily be projected onto coordinate subspaces by dropping circuits. Hence, the PSS matroid allows for some freedom to "pick and mix" variables according to measurements. The picking and mixing corresponds to the geometric operation of projection of the variety X = V ( Φ(C 1 ), . . . , Φ(C n−d ) ) ⊆ R(κ 1 , . . . , κ m ) n . Next, suppose there exists a measurement ξ = {ξ i 1 , . . . , ξ i +d } containing measurements for the species in a basis S of the PSS and the species in the circuits C 1 , . . . , C , all containing S. Combining the idea of projection and measurement (evaluation) leads to the definition of a parameter variety.
Definition 4.15. Keeping the same notation as above and, by choosing an appropriate set of generators, i.e. "clearing the denominators", let J = Φ(C 1 ), . . . , Φ(C ) ⊆ R[κ 1 , . . . , κ m , x 1 , . . . , x n ]. Hence, V (J) ⊂ R m × R n . The parameter variety, X m , is obtained from V (J) by the evaluation The parameter variety is obtained by the selective projection and evaluation of the binomials obtained from Definition 4.5 and is a variety in the space of parameters only. Every parameter vector compatible with the measurement ξ is on the parameter variety. Every set of measurements gives rise to a parameter variety and, in order to be compatible with a model, their intersection needs to be non empty, in fact, their intersection needs to be non empty in the positive orthant. In order to uniquely identify a model based on a given measurement the positive orthant of X m needs to consist of a single point only. Various algebraic techniques can be applied to show when this is the case, e.g. [45,53].
The parameter varieties of families of toric networks can be related by applications of projections (selective application of Definition 4.5) and the partial evaluation map π. Suppose J N and J int are the ideals of the N th member of a family and the intermediate model between the N th and (N + 1) th member, respectively. Let both ideals (or a projection of them) contain the same circuits C 1 , . . . , C . Then, by Proposition 3.15, J N = π(J int ).

Inheritance of Multistationarity for Toric Families
In this section we investigate the inheritance of multistationarity among members of families of toric chemical reaction networks. Our main result is Theorem 5.2, where we apply results of [24,53] to show that if we can find a multistationary member of a family (satisfying certain conditions), then every larger member of the same family is multistationary for some parameter values. We begin by introducing some notation. where Z is an integer matrix. Further, let for some values of (λ 1 , . . . , λ n ).
We now state the main theorem of this section. (C3) The matrix J λ for the N th network is regular.
Then, if the N th member of the family is capable of multistationarity, every member of the family for which M ≥ N is also capable of multistationarity.
Before proving the theorem above we prove the following lemma. where Γ N = (r 1 , . . . , r m ) is the n × m stoichiometric matrix of the N th network and,Γ N = (r m+1 , . . . , r m+m ) is the (n + n ) × m matrix of new reactions. Hence, any basis of the left kernel of Γ int N can be written asz i = (z i , z i ), where z i is an element of a basis of the left kernel of Γ N or zero and z i is a vector to be determined byΓ N . The assumption of equidimensionality of the left kernels of Γ N and Γ int N can be satisfied only if rank(Γ N ) = n and, in the same fashion as Corollary 3.18, it follows, that Z N is a submatrix of Z int N . Now, delete m of the first m columns of Γ int N to form Γ N +1 . By the equidimensionality assumption it follows, that the rows of Z int N form a basis of the left kernel of Γ N +1 . Hence, Z int N = Z N +1 and Z N is a submatrix of Z N +1 . From Lemma 5.3 and Theorem 4.8 it becomes apparent why (C2) and toric steady states are required. Namely, toric steady states are needed to construct a matrix B and, hence, J λ N and (C2) is required to ensure that the conservation relation matrices are submatrices. The proof of Theorem 5.2 is now stated.
Proof. Fix a vector x * ∈ R n >0 as in Definition 4.4 and choose a basis S and n − d circuits {C 1 , . . . , C n−d } of the PSS matroid associated to the N th member of a family such that Further, let B N denote the exponent matrix of the binomials Φ(C 1 ), . . . , Φ(C n−d ). Hence, following the construction of Definition 5.1 we obtain the square matrix By construction (see Theorem 4.8), V (Φ(C 1 )(x), · · · , Φ(C n−d )(x)) ∩ R n >0 = ∅. By [24, Theorem 2.7] the system (15) is multistationary if and only if either det(J λ N ) = 0 or det(J λ N ) = 0 and the polynomial det(J λ N ) in λ 1 , . . . , λ n has a positive and a negative term. Suppose the N th network is multistationary and, by condition (C3), det(J λ N ) = 0. Next, build the (N + 1) th network by adding n new species and consider its matrix J λ N +1 . To obtain this matrix consider the new circuit binomials Φ(C n−d+1 )(x) = · · · = Φ(C n+n −d )(x) = 0, where each circuit contains the basis S and one new variable x n+i with its corresponding positive exponent b n+i , where i = 1, . . . , n . The exponent vectors will therefore have exactly one non-zero element (namely the positive integer b n+i ) in the last n rows. Therefore, the matrix J λ N +1 has the block form where A| [y 1 ...ym×a 1 ...an] denotes the restriction of a matrix A to the rows y 1 . . . y m and columns a 1 . . . a n . The matrix diag(b n+1 λ n+1 , . . . , b n+n λ n+n ) is a diagonal matrix with diagonal elements b n+1 λ n+1 , . . . , b n+n λ n+n . Hence, the expression T = n+n i=n+1 b i λ i det(J λ N ) = 0 must appear in det(J λ N +1 ) and, in particular, no term in T can be cancelled by any other term appearing in det(J λ N +1 ). Hence, if det(J λ N ) has coefficients of opposite sign so does det(J λ N +1 ) and, therefore the network is multistationary. The proof is completed by induction.
Remark 5.4. Note that, since we are allowed to choose different bases of the PSS matroid, and hence, different binomial systems it may be possible to satisfy condition (C3) of Theorem 5.2 for one particular choice of basis but not for a different choice of basis.

Conclusion
In this paper we studied families of chemical reaction networks with toric steady states, which we called toric families. First, we investigated the dimensions and parameterizations of toric steady state varieties and connected them to network properties whenever possible. In particular, if there is a finite number of positive steady states for all choices of c 1 , . . . , c d , then the number of conservation relations determines the dimension of the toric steady state variety and, with certain restrictions, the monomial parameterization of a chemical species X i is preserved throughout the family.
We next studied the PSS matroid defined by the parameterization of the positive steady states. In particular, we proved that the algebraic matroid defined by the binomial steady state ideal is equivalent to the PSS matroid. We showed how the PSS matroid can be decorated with binomials and how they can be used for model selection, experimental design or even parameter identification.
The final section investigated the multistationarity structure of toric families. The main result of the section showed that, under some mild restrictions, if a member of a family is capable of multistationarity then all larger members are too. This result was proved using the binomials constructed from the PSS matroids. We illustrated our results on the multisite distributive phosphorylation network.
Further research could include applying the results of this paper to other meaningful biochemical families such as different models for immune system reactions, e.g. [1,55]. Another direction could be to study the parameter varieties defined in this paper in the context of previous identifiability research and aim to include noisy data. Proof. Note that V (L) − V (x 1 · · · x n+n ) = V (L). Hence, which proves the first statement. For the second statement note that, by construction of the evaluation map, X N +1 ∩ V (L) = X N ⊂ C m κ × C n x × C n . Hence, X N +1 − V (x 1 · · · x n+n ) ∩ V (L) = X N +1 ∩ V (L) − V (x 1 · · · x n+n ) = X N − V (x 1 · · · x n+n ) = X N − V (x 1 · · · x n ).
Note that no variables in x n+1 , . . . , x n+n appear in I N and, hence, the subset of exponent vectors b i = b + i − b − i defining I N : (x 1 · · · x n ) ∞ have the general form b i = (b i 1 , . . . , b in , 0, . . . , 0) T . Further, the entries of the remaining ν − µ exponent vectors have at least one non-zero entry in the last n rows and these vectors can be collected into a matrix B. Hence, B N +1 may be written in block form as It is clear that the first n entries of any basis vector of the left kernel of B N +1 must form a vector in the left kernel of B N . Hence, the left kernel of B N +1 has a basis of the form a i = (a i ,ā i ), where a i is an element of a given basis of the left kernel of B N or the zero vector andā i ∈ Z n is a vector of variables to be determined. From the rank condition it is clear that the matrixB has a rank of at most n and, for any given i, a system of linear equations is obtained,ã ib = 0, for all columnsb ofB. Due to the rank condition, this system has always at least one (rational) solution and, hence, the collection of all solutions form a basis of the left kernel of B N +1 to give the matrix A N +1 (after clearing the denominators by multiplications of rows with scalars). Therefore, A N can be chosen as a submatrix of A N +1 , proving (ii).

Proof of Theorem 3.19
Proof. Part (i) follows immediately from Theorem 3.17. To prove (ii), suppose I N and I N +1 are the (binomial) steady state ideals of two members of a family and I N +1 is the steady state ideal of their intermediate network. It follows from Theorem 3.17 that the Amatrices of I N and I N +1 are submatrices of the A-matrix of the intermediate. Each column of the A-matrix corresponds to a variable x i . Since both I N +1 and I N +1 are ideals with species {x 1 , . . . , x n+n } then their associated A-matrices have the same number of columns. Further, by assumption, the number of conservation relations stays constant. Therefore, by Proposition 2.14 the A matrices associated to I N +1 and I N +1 are the same, since the ideals have the same dimension, meaning the A-matrices must have the same number of rows. Hence, the A-matrices associated to I N and I N +1 are submatrices; i.e. A N is a submatrix of A N +1 .

Proof of Theorem 4.3
Proof. The variables of the ground set of M(A) have algebraic dependencies as defined in Proposition 4.1. Hence, their algebraic dependencies can be found by solving the implicitization problem I = J ∩ R(κ 1 , . . . , κ m )[x 1 , . . . , x n ] where J = x 1 − x * 1 t a 1 , . . . , x n − x * n t an . However, the same ideal defines the implicit equations of a toric variety defined by ψ A : t → (x * 1 t a 1 , . . . , x * n t an ). Hence, the algebraic relations between the monomials φ A (x) are identical to the algebraic relations defined by the binomial ideal I b . This implies that M(A) = M(I b ).