Product-Form Stationary Distributions for Deficiency Zero Networks with Non-mass Action Kinetics

In many applications, for example when computing statistics of fast subsystems in a multiscale setting, we wish to find the stationary distributions of systems of continuous-time Markov chains. Here we present a class of models that appears naturally in certain averaging approaches whose stationary distributions can be computed explicitly. In particular, we study continuous-time Markov chain models for biochemical interaction systems with non-mass action kinetics whose network satisfies a certain constraint. Analogous with previous related results, the distributions can be written in product form.


Introduction
Biological interaction systems are typically modeled in one of three ways. If the counts of the constituent species are high, then their concentrations are often modeled via a system of ordinary differential equations with state space R d ≥0 , where d > 0 is the number of species. If the counts are moderate (perhaps order 10 2 or 10 3 ), then they may be approximated by some form of continuous diffusion process (Gillespie 2000;Van Kampen 2007). If the counts are low, then the system is typically modeled stochastically as a continuous-time Markov chain in Z d ≥0 Kurtz 2011, 2015). We often want to understand the stationary behavior of the model under consideration. For deterministic models, understanding the stationary behavior usually entails characterizing the stable fixed points of the system, whereas for stochastic models, we require the calculation of the stationary distribution.
Stationary distributions are also useful in a multiscale setting, where the stationary statistics of a fast subsystem can be utilized in the approximation of the dynamics of the slow variables, which are typically of most interest. When an analytical form for the stationary distribution of the fast subsystem is not known, numerical approximations can be used. However, these computations are often expensive and part of an "inner loop," typically making this calculation the rate-limiting step of the analysis. In the context of biochemical reaction networks, quasi-equilibrium (QE)-based approximations lead to fast subsystems which preserve mass action kinetics (Goutsias 2005;Janssen 1989; Thomas et al. 2012;Cao et al. 2005;Weinan et al. 2005). However, more recent improvements in stochastic averaging can lead to fast subsystems with non-mass action kinetics, and this observation was the motivation for the present work (Cotter 2016;Cotter and Erban 2016;Cotter et al. 2011).
One class of interaction networks that has been quite successfully analyzed, and that appears ubiquitously as fast subsystems, are those that are weakly reversible and have a deficiency of zero (see Appendix). For this class of models, and under the assumption of mass action kinetics, the fixed points of the deterministic models (Anderson 2011(Anderson , 2008Craciun 2015;Feinberg 1979Feinberg , 1987Gunawardena 2003) and the stationary distributions for the stochastic models, have been fully characterized (Anderson et al. 2010Cappelletti and Wiuf 2016;Van Kampen 1976). In fact, it is the study of this class of networks that is largely responsible for the development of the field of chemical reaction network theory (Feinberg 1972(Feinberg , 1979Gunawardena 2003;Horn 1972), a branch of applied mathematics in which the dynamical properties of the mathematical model are related to the structural properties of the interaction network.
In this article, we return to stochastically modeled interaction networks that are weakly reversible and have a deficiency of zero, though we consider propensity functions (also called intensity functions or rate functions) that are more general than mass action kinetics. However, we add a certain condition to the rates within the reaction network (see Assumption 1 below). Following Anderson et al. (2010), which was motivated by the work of Kelly (1979) who discovered the product-form stationary distribution of certain stochastically modeled queuing networks, we provide the form of the stationary distribution for this class of models. In particular, and in similarity with the main results of Anderson et al. (2010), in Theorem 2 we show that the distribution is of product form and that the key parameter of the distribution is a complex-balanced equilibrium value of an associated deterministically modeled system with mass action kinetics. The result of Anderson et al. (2010) can now be viewed as a special case of this new result.
The paper proceeds as follows: In Sect. 2, we introduce the formal mathematical model of interest. In Sect. 3, we provide the main theorem of this article, which characterizes the stationary distribution for the class of models of interest. In Sect. 4, we provide a series of examples which demonstrate the usefulness of the main result. Some brief concluding remarks are given in Sect. 5.
We assume throughout that the reader is familiar with terminology from chemical reaction network theory. However, we provide in Appendix all necessary terminology and results from this field that are used in the present work.

Mathematical Model
We consider a system with d chemical species, {S 1 , . . . , S d }, undergoing reactions which alter the state of the system. For concreteness, we suppose there are K > 0 distinct reaction channels. For the kth reaction channel, we denote by ν k , ν k ∈ Z d ≥0 the vectors representing the number of molecules of each species consumed and created in one instance of that reaction, respectively. Note that ν k − ν k ∈ Z d is the net change in the system due to one instance of the kth reaction. We associate each such ν k and ν k with a linear combination of the species in which the coefficient of S i is ν ki , the ith element of ν k . For example, if ν k = [1, 2] T for a system consisting of two species, then we associate with ν k the linear combination S 1 + 2S 2 . Under this association, each ν k and ν k is termed a complex of the system. We denote any reaction by the notation ν k → ν k , where ν k is the source, or reactant, complex and ν k is the product complex. We note that each complex may appear as both a source complex and a product complex in the system. The set of all complexes will be denoted by {ν k }.
Definition 1 Let S = {S i }, C = {ν k }, and R = {ν k → ν k } denote the sets of species, complexes, and reactions, respectively. The triple {S, C, R} is called a chemical reaction network.

Deterministic Model
The usual deterministic model for a chemical reaction network {S, C, R} assumes that the vector of concentrations for the species satisfies a differential equation of the forṁ where r k : R d ≥0 → R ≥0 is the (state-dependent) rate of the kth reaction channel. If we assume that each function r k satisfies deterministic mass action kinetics, then Definition 2 An equilibrium value c ∈ R d ≥0 of (1) is said to be complex balanced if for each η ∈ C, where the sum on the left, respectively right, is over those reactions with η as source complex, respectively product complex. In the special case of mass action kinetics, c is complex balanced if and only if

Stochastic Model, Previous Results, and Assumptions
The usual stochastic model for a reaction network {S, C, R} treats the system as a continuous-time Markov chain for which the rate of transition from state is a suitably chosen intensity function (the intensity functions are also termed propensity functions in the literature). This stochastic process can be characterized in a variety of useful ways Kurtz 2011, 2015). For example, it is the stochastic process with state space Z d ≥0 and infinitesimal generator Kolmogorov's forward equation for this model, termed the chemical master equation in the biology literature, is where p μ (t, x) is the probability the process is in state x ∈ Z d ≥0 at time t ≥ 0, given an initial distribution of μ, and A * is the adjoint of A. Note that (3) implies that a stationary distribution for the model, π , must satisfy Note also that (4) implies that π is in the null space of A * . Thus, and as is well known, finding π reduces to solving π A = 0, with x π(x) = 1, where A is reinterpreted as a generator matrix.
Of particular interest to us are the rate functions, λ k . Under the assumption of stochastic mass action kinetics, we have where κ k > 0 are the reaction rate constants. Here and throughout, we interpret any product of the form −1 i=0 a i to be equal to one. In Anderson et al. (2010), the authors considered a class of stochastically modeled reaction networks with mass action kinetics (those with a deficiency of zero and are weakly reversible, see Appendix) and characterized their stationary distributions as products of Poisson distributions. However, they did not just consider mass action kinetics in Anderson et al. (2010), but any kinetics satisfying the functional form so long as θ i (0) = 0 for each i. In particular, they proved the following.
Theorem 1 (Anderson et al. 2010) Let {S, C, R} be a reaction network, and let (κ 1 , . . . , κ k ) be a choice of positive rate constants. Suppose that modeled deterministically with mass action kinetics and rate constants κ k , the system is complex balanced with complex-balanced equilibrium c ∈ R d >0 . Then, the stochastically modeled system with intensity functions (6) admits the invariant measure The measure π can be normalized to a stationary distribution so long as it is summable.
The connection between weakly reversible, deficiency zero networks, and complexbalanced equilibria is given in Appendix.
In this paper, we generalize Theorem 1 by showing how to find the stationary distributions for models with rates that do not seem to satisfy the form (6). However, we add an assumption pertaining to the form of the rates within the reaction network, which we describe now. We begin by partitioning the set of species into two sets, S = S 1 ∪ S 2 . We will say that S i ∈ S 2 if α i = gcd{ν 1i , ν 2i , . . . , ν K i } > 1. Otherwise, we say that S i ∈ S 1 , noting that α i = 1. We now assume that the intensity functions for the reaction network are of the form where κ k > 0 and θ i ( Assumption 1 A stochastically modeled reaction network satisfies this assumption if it satisfies the partition described above and has intensity functions of the form (8).
We provide an example to clarify the notation.
Example 1 Consider the stochastically modeled system with reaction network where the intensity functions are placed next to the reaction arrows. Here S 1 ∈ S 2 , with α 1 = 2, and S 2 , S 3 ∈ S 1 . The assumption (8) then supposes that for appropriate and θ 3 (x 3 ) = x 3 1+x 3 , in which case the form for the stationary distribution does not follow immediately from Theorem 1.

Main Result
Here we state and prove our main result.
Theorem 2 Let {S 1 ∪S 2 , C, R} be a reaction network satisfying Assumption 1, and let (κ 1 , . . . , κ K ) be a choice of positive rate constants. Suppose that modeled deterministically with mass action kinetics and rate constants κ k the system is complex balanced with complex-balanced equilibrium c ∈ R d >0 . Then the stochastically modeled system with intensity functions (8) admits the invariant measure The measure π can be normalized to a stationary distribution so long as it is summable.
Note that the theorem applies to models with reaction networks satisfying Assumption 1 and that are weakly reversible and have a deficiency of zero. See Appendix.
Proof First note that if S 2 = ∅, then Theorem 2 is the same as Theorem 1 and there is nothing to show. Thus, we suppose S 2 = ∅.
The proof proceeds in the following manner. First, for each S i ∈ S 2 we will demonstrate the existence of a function ϕ i : Z ≥0 → R ≥0 for which Next, we will apply Theorem 1 and prove that the resulting distribution is indeed given by (10). Let S i ∈ S 2 . We begin by setting For z ≥ α i an integer, we may define ϕ i recursively via the formula Note that ϕ i is a well-defined function since θ i (z) > 0 for each z ≥ α i by assumption. It is clear that (11) is satisfied with this choice of ϕ i . For x ∈ Z d ≥0 , we may now write Hence, we may apply Theorem 1 and conclude that is an invariant measure for the system, where c is a complex-balanced fixed point for the deterministic system. It remains to show that First note that if x i < α i , then both sides of (13) are equal to one. For the time being, assume that α i ≤ x i < 2α i . Under this assumption, the right-hand side of (13) is where in the final equality we used that ϕ i ( ) = 1 for 1 ≤ < α i (and that x i − α i < α i ), and the left-hand side is where we again used that ϕ i ( ) = 1 when 1 ≤ < α i . Hence, (13) is verified when We will now prove that (13) holds in general by induction. We suppose that (13) holds for all z ≤ x i , where x i ≥ 2α i − 1, and will show it to hold at x i + 1. Using (12), the left-hand side of (13) evaluated at x i + 1 is where the final equality is an application of (12) with x i + 1 in place of the variable z. Continuing, we have where the final equalities are straightforward. Note that (14) is the right-hand side of (13) evaluated at x i + 1, and so the proof is complete.

Example 1: Motivating Example
First, we consider a motivating example arising from model reduction, through constrained averaging (Cotter 2016;Cotter and Erban 2016;Cotter et al. 2011), of the following system: where the intensity functions are placed next to the reaction arrows. Note that the intensities of all the reactions follow mass action kinetics. We consider this system in a parameter regime where the reversible dimerization reactions 2S 1 S 2 are occurring more frequently than the production of S 2 and the degradation of S 1 . Both S 1 and S 2 are changed by the fast reactions, but the quantity S = S 1 + 2S 2 is invariant with respect to the fast reactions, and as such is the slow variable in this system. We wish to reduce the dynamics of this system to a model only concerned with the possible changes in S: whereλ 3 (s) andλ 4 (s) are the effective rates of the system. Using the QE approximation (QEA),λ 3 (s) = κ 3 andλ 4 (s) = κ 4 E π QEA (s) [X 1 ], where π QEA (s) is the stationary distribution for the system under the assumption that X 1 (0) + 2X 2 (0) = s. Since the system (17)  In comparison, the constrained approach requires us to find the invariant distribution π Con of the following system: subject to X 1 (0)+2X 2 (0) = s. Readers interested in seeing how this is derived should refer to Cotter (2016). This network is weakly reversible and has a deficiency of zero. However, the form of the rates in this system does not satisfy the conditions specified in Anderson et al. (2010). In the context of constrained averaging, this lack of a closed form for the stationary distribution would result in the need for some form of approximation of the stationary distribution. There are two common methods utilized for performing this approximation. One possibility would be to perform exhaustive stochastic simulation of the system (18). Another option involves finding the distribution by finding the null space of the adjoint of the generator (see the discussion in and around (5)). However, as the state space of (18) will typically be huge, the latter method often involves truncating the state space and approximating the actual distribution with that of the stationary distribution of the truncated system (Cotter 2016). Both approaches will lead to approximation errors and varying amounts of computational cost. However, note that the system (18) does satisfy Assumption 1, with α 1 = 2 and α 2 = 1. We denote the rate of dimerization by λ D and its reverse by λ −D . Therefore, with θ 1 defined in the final equality. The form of the rate of the reverse reaction is much simpler and is given by which defines θ 2 . By Theorem 2, we can write down the stationary distribution of this system. The complex-balanced equilibrium of the associated deterministically modeled system with c 1 + 2c 2 = 1 is given by (c 1 , c 2 ) = √ (k 2 +k 4 )(k 2 +8k 1 +k 4 )−k 2 −k 4 4k 1 , 1−c 1 2 . Then by Theorem 2, and by recalling that all states (x 1 , x 2 ) in the domain satisfy s = x 1 + 2x 2 , the stationary distribution for S 2 is given by where Con is a normalization constant and s is the conserved quantity. Note that the indicator function in θ 1 (in the denominator) has disappeared since it is always equal to one over the domain of the product. We can compare (19) with the distribution of (17), which arises from the QEA, and also with the distribution of the full system (15) conditioned on S 1 + 2S 2 = s (which can be approximated by finding the null space of the adjoint of the generator of the full system on the truncated domain). First, we consider the QEA approximation. The invariant distribution of the fast subsystem (17) can be found using Theorem 1 and is given by where is the complex-balanced equilibrium for this system satisfying d 1 + 2d 2 = 1 and QEA is a normalizing constant.
Since the full system (15) does not have a deficiency of zero, we are not able to find its invariant distribution directly. However, by truncating the state space appropriately, we are able to approximate the full distribution by constructing the generator on this truncated state space and finding the null space of the adjoint.
Once we have approximated the null space of the truncated generator, we can find the approximation of P(X 2 = x 2 |X 1 + 2X 2 = s) by taking the probabilities of all states with x 1 + 2x 2 = s and renormalizing. In what follows, we truncated the domain of the generator to x ∈ {0, 1, . . . , 1000} × {0, 1, . . . , 500}. We consider the system (15) with parameters given by: Note that it is not obvious from these rates that the reactions with rates k 3 and k 4 are in fact the slow reactions in this system. The invariant density is largely concentrated in a small region centered close to the point x = (99, 114). By using the approximation of the invariant density that we have computed on the truncated domain, we can compute the expected ratio between occurrences of the fast reactions with rates k 1 and k 2 with the slow reactions with rates k 3 and k 4 . For this choice of parameters, the expected proportion of the total reactions which are fast reactions (dimerization/disassociation) is 82.68 %. This indicates a difference in timescales between these reactions, but the difference is not particularly stark, and as such we would expect there to be significant error in any approximation relying on the QEA. Figure 1 shows the three approximations of the distribution P(X 2 = x 2 |X 1 +2X 2 = 300) for the system (15) with parameters given by (21). The constrained and QEA approximations are computed using (19) and (20), respectively, with the normalizing constants computed numerically. As would be expected in this parameter regime, the constrained approximation is far more accurate than the QEA.
We can quantify the accuracy of each of the approximations by computing the relative l 2 differences with the distribution computed using the full generator on the truncated domain. This relative difference was 4.464 × 10 −1 for the QEA, in com- parison with 5.2337 × 10 −2 for the constrained approximation. This demonstrates the improvement in approximation that can be achieved by using constrained averaging, and which motivates the need for results like Theorem 2 which take non-mass action kinetics into account.

Example 2: Dimerization
We begin by considering a model consisting of only proteins, denoted P, and dimers, denoted D. We suppose there are two mechanisms by which the proteins are dimerized: random interactions between the protein molecules, and via a catalyst. The rate at which random interactions lead to the formation of dimers can be taken to be of mass action form. Assuming the concentrations are such that the catalyst is acting at capacity, the rate of formation of the dimers due to the catalyst can be faithfully modeled as a constant (so long as there are at least two proteins to make the reaction happen). Thus, letting x p and x d denote the numbers of proteins and dimers in the model, respectively, the reaction system can be represented as where κ p→d , κ d→ p and ρ are given parameters. Note that after an obvious change of variables the stationary distribution of this particular model is provided in (19). The reactions in (22) are typically a subset of the reactions in a larger system. For example, the actual model of interest may be where M represents an mRNA molecule. This is a standard model for dimer production. Depending upon the relevant timescales in the system, we may want to take d D = κ d = 0. If the reactions 2P D are appreciably faster than those of (23), then an obvious path simulation strategy presents itself in the mold of the slow-scale SSA (Cao et al. 2005) (i.e., stochastic averaging): 1. for the current values of P and D, find the stationary distribution of the fast subsystem analytically, 2. determine the effective rates of the reduced model where the expectations are with respect to the distribution found in step 1. 3. simulate forward in time using the stochastic simulation algorithm (Gillespie 1976) or the next reaction method (Anderson 2007;Gibson and Bruck 2000), and return to step 1.
Being able to analytically calculate the stationary distribution in step 1 allows us to bypass the need to numerically approximate the stationary distribution, as is commonly done (Weinan et al. 2005;Weinan and Vanden-Eijnden 2007).

Example 3
In Sects. 4.1 and 4.2, we applied Theorem 2 on the common motif 2S 1 S 2 . In this example, we present another common motif, 2S ∅, for which Theorem 2 is also useful. As opposed to the specific models we considered in the previous examples, here we present a more general framework in which the specific form of the propensity function for the reaction 2S → ∅ is arbitrary. This situation is common when undertaking certain types of averaging arguments (Cotter 2016).
Let us suppose that the effective dynamics of a slow variable in a larger system can be modeled as In practice, the normalization constant could be approximated by summing over an appropriate domain.

Example 4
This example will demonstrate the difficulties that can arise when a reaction is added that involves a species in the set S 2 with multiplicity not equal to α i . In particular, consider the system where where θ 1 (x 1 ) = 1 {x 1 >1} (10 + x 1 + 6 sin(π x 1 /5)) . Note that we have a non-mass action kinetics rate λ 1 for the reaction 2S 1 → ∅. There is another reaction involving S 1 , but the amount of S 1 molecules involved in this reaction is not a multiple of 2. This means that we cannot apply the result of Theorem 2 to this system unless ϕ satisfies the recurrence relation detailed in the proof of Theorem 2. In particular, it must satisfy ϕ(x 1 )ϕ(x 1 − 1) = θ 1 (x 1 ), or This recurrence relation defines a unique function ϕ : Z ≥0 → R ≥0 for each C ∈ R ≥0 . In general, the function ϕ can oscillate wildly. Let us consider, for example, ϕ when θ 1 is given as above. In this case, Figure 2 demonstrates how ϕ(x 1 ) and the amplitude of the oscillations grow with x i , for the case C = 1. It is clear that this function does not represent any physical reaction rate arising from chemistry, but we can still write down the invariant distribution of the system (25).  The complex balance equilibrium of the associated mass action kinetics system to (25) is given by (c 1 , c 2 ) = k 2 k 1 , k 3 k 4 k 2 k 1 . Therefore, the stationary distribution is given by: where is a normalizing constant and ϕ is given by (26) with C = 1. Note that the value of a here dictates the oddness or evenness of the quantity x 1 + x 2 , which is preserved by each of the reactions. Figure 3 shows the invariant distribution (27) of the system (25) with an even initial value of x 1 + x 2 , and for the following parameters: The normalization constant was approximated by summing the values of all states x ∈ {0, 1, . . . , 1000} × {0, 1, . . . , 1000}.

Conclusions
In this paper, we provided the stationary distributions for the class of stochastically modeled reaction networks with non-mass action kinetics that satisfy our Assumption 1 and that admit a complex-balanced equilibrium when modeled deterministically with mass action kinetics. Similarly to the results of Anderson et al. (2010), we showed that the stationary distributions are of product form. We motivated the need for such results through consideration of modern averaging techniques. In particular, Theorem 2 significantly reduces the computational cost of finding the invariant distribution of most fast subsystems in a multiscale setting, therefore making accurate approximations very cheap to compute. This in turn opens up more possibilities, for example approximation of likelihoods via multiscale reductions in the context of parameter inference for biochemical networks.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix: Terminology and Results from Chemical Reaction Network Theory
For each reaction network, {S, C, R}, there is a unique directed graph constructed in the following manner. The nodes of the graph are the complexes, C. A directed edge is then placed from complex ν k to complex ν k if ν k → ν k ∈ R. Each connected component of the resulting graph is termed a linkage class. We denote the number of linkage classes by . For example, in the reaction network (9) there are two linkage classes.
Definition 3 A chemical reaction network, {S, C, R}, is called weakly reversible if each linkage classes is strongly connected. A network is called reversible if ν k → ν k ∈ R whenever ν k → ν k ∈ R.
Note that a network is weakly reversible if and only if for any reaction ν k → ν k , there is a sequence of directed reactions beginning with ν k as a source complex and ending with ν k as a product complex. That is, there exist complexes ν 1 , . . . , ν r such that ν k → ν 1 , ν 1 → ν 2 , . . . , ν r → ν k ∈ R.
Definition 4 S = span {ν k →ν k ∈R} {ν k − ν k } is the stoichiometric subspace of the network. For z ∈ R d , we say z + S and (z + S) ∩ R d >0 are the stoichiometric compatibility classes and positive stoichiometric compatibility classes of the network, respectively. Denote dim(S) = s.
The final definition is that of the deficiency of a network (Feinberg 1987).

Definition 5
The deficiency of a chemical reaction network, {S, C, R}, is δ = |C| − − s, where |C| is the number of complexes, is the number of linkage classes of the network graph, and s is the dimension of the stoichiometric subspace of the network.
We state a classical result that can be found in Feinberg (1972Feinberg ( , 1979Feinberg ( , 1987, Gunawardena (2003), Horn (1972, which relates networks that are weakly reversible and have a deficiency of zero to those that admit complex-balanced equilibria. Theorem 3 If the reaction network {S, C, R} is weakly reversible and has a deficiency of zero, then for any choice of rate constants κ k the deterministic mass action system admits a complex-balanced equilibrium, c, satisfying (2). Moreover, within each positive stoichiometric compatibility class, there is precisely one equilibrium, and it is complex balanced.