Stochastic approximation of high-molecular by bi-molecular reactions

Biochemical networks, under mass-action kinetics, containing reactions with three or more reactants (called high-molecular reactions) are investigated. An algorithm for stochastically approximating the high-molecular reactions with a set of bi-molecular ones, involving at most two reactants, is presented. Properties of the algorithm and convergence are established by applying singular perturbation theory on the underlying chemical master equation. The algorithm is applied to a variety of examples from both systems and synthetic biology, demonstrating that biochemically plausible bi-molecular reaction networks may display a variety of noise-induced phenomena, and may be designed in a systematic manner.


Introduction
Reaction networks [1,2] are a central mathematical framework utilized for analyzing biochemical circuits from systems biology [3,4,5], and are a powerful programming language for designing synthetic molecular systems [6,7,8,9,10]. Molecular networks in the context of classical biochemistry typically do not include reactions of order three or higher (referred to in this paper as higher-order, or high-molecular, reactions), since reactive collisions between three or more molecules are unlikely to take place [2,11]. However, let us note that the higher-order reactions are in principle realizable in the context of nucleic-acid-based synthetic biology [12]. Despite the overall experimental implausibility of such reactions, they appear in both theoretical systems and synthetic biology. For example, third-order (tri-molecular) reactions appear in the Schlögl system [13], where they allow for bistationarity (coexistence of two stable equilibria), in the Brusselator [14] and Schnakenberg systems [15], which display oscillations (existence of a stable limit cycle), as well as in two-species networks displaying bicyclicity (coexistence of two stable limit cycles) [16], and a variety of bifurcation structures, such as homoclinic and saddle-node on invariant circle (SNIC) bifurcations [8,17]. Aside from well-mixed settings, third-order reactions also play a role in pattern formation [18], and, more broadly, are a subject of research within reaction-diffusion modeling in systems biology [19]. On the other hand, in synthetic biology, higher-order reactions appear e.g. in the so-called noisecontrol algorithm, put forward in [9], where such reactions allow for a precise state-dependent control of the biochemical stochastic dynamics. In order to experimentally realize the higher-order reactions in synthetic biology, they should, in general, first be mapped to suitable second-order (bimolecular) reactions [6]. In the context of systems biology, such a mapping allows for a biochemical interpretation of higher-order reactions. Thus, it is of both practical and theoretical interest to develop a suitable order-reduction algorithm, mapping high-molecular into bi-molecular networks.
An algorithm for approximating higher-order reaction networks by second-order ones at the deterministic level (i.e. at the level of reaction-rate equations) has been used for decades, e.g. see [20,21,22,23]. The algorithm has been rigorously justified in the deterministic setting in [24,Section 3] for general third-and fourth-order reactions using perturbation theory. Its performance depends upon an asymptotic parameter, appearing as a rate coefficient of some of the underlying chemical reactions. Another, more elaborate, order-reduction procedure has been presented in [25,26]. While the latter procedure does not depend on an asymptotic parameter, it depends on the precise initial conditions of some of the underlying species, which, from the perspective of synthetic biology, may pose significant challenges [27,28]. Less attention has been paid to the validity of such approximations at the stochastic level (i.e. at the level of the chemical master equation (CME) [29]): it has been formally shown in [30] that the specific third-order reaction, given by 3s → 2s, may be stochastically approximated by a second-order network using the algorithm from [24], and this has also been qualitatively described in [11]. However, the questions of whether the formal deterministic results from [24] extend into the stochastic setting, and whether dynamics of the approximating second-order networks converge in a suitable limit to the dynamics of the original higher-order network, remain unanswered. In particular, validity of perturbation results at the deterministic level does not generally guarantee validity at the stochastic level [31,32,33].
In this paper, we apply singular perturbation theory involving the CME, known in the context of chemical reaction networks as stochastic quasi-stationary approximations (QSAs) [34,35], in order to show that networks of arbitrarily high-order may be approximated by second-order ones at the stochastic level using the algorithm from [24]. The paper is organized as follows. In Section 2, we start by showing that any third-order reaction, involving identical reactants, may be approximated stochastically by a family of second-order networks, stated as Lemma 2.1. In Section 2.3, we apply Lemma 2.1 on the Schlögl system [13], given as the third-order test network (23), which displays both deterministic and stochastic bistabilities for the chosen rate coefficients. We demonstrate that an approximating second-order network displays the same stationary probability mass function (PMF) and switching pattern between the two PMF maxima as the original third-order network in an asymptotic limit (see also Figures 1 and 2). In Section 3, we show that reaction networks of arbitrarily high order may be stochastically approximated by a family of second-order networks, by generalizing Lemma 2.1 to Theorem 3.1, and establish a weak-convergence result in Theorem 3.2 for a particular sub-family of the approximating second-order networks. We also present two extended examples in Sections 3.1 and 3.2, involving applications of Theorem 3.1 to the test networks (44) and (48), respectively, which display purely stochastic phenomena, and demonstrate validity of Theorem 3.1 (see also Figure 3). The notation used in the paper is introduced as needed, and is summarized in Appendix A. Proofs of Theorems 3.1 and 3.2 can be found in Appendices B and C, respectively.

Special case: Third-order homoreactions
Let us start our analysis by considering an arbitrary third-order (tri-molecular) homoreaction, i.e. a reaction involving three reactant molecules of the same species, under mass-action kinetics, which is given by where {s l } N l=1 are the interacting biochemical species, {ν l } N l=1 ∈ Z ≥ the stoichiometric coefficients, and k ∈ R > is a dimensionless rate coefficient, where Z ≥ , and R > , denote the set of non-negative integers, and positive real numbers, respectively (see also Appendix A). Let us also consider the second-order (bi-molecular) network, under mass-action kinetics, given by with the sub-networks where Q 1 is an auxiliary species,ν 1 ,γ 1 ∈ Z ≥ , and κ 1 , κ 2 , ε ∈ R > are positive dimensionless parameters. Here, for convenience, we denote two irreversible reactions (ν →ν) ∈ R, called the forward reaction, and (ν → ν) ∈ R, called the backward reaction, jointly as the single reversible reaction (ν − −ν ) ∈ R. We say that network (2)-(3) is of second-order, because its highest-order reaction is of second-order (see also Appendix A). For mathematical convenience, we also define the fast sub-network In what follows, we assume ε is a small parameter, 0 < ε 1, and that the rate coefficients of the forward reaction from R ε 1 and reaction R 2 are much smaller than the rate coefficient of the backward reaction from R ε 1 for sufficiently small ε, i.e. κ 1 , κ 2 1/ε as ε → 0. We now proceed to use perturbation theory to formally show that, under appropriate conditions, stochastic dynamics of the third-order input network (1) and the second-order output network (2)-(3) are approximately the same in a weak sense. To this end, let us first introduce the chemical master equation (CME) [36,37,38] of the output network (see also Appendix A).
The chemical master equation. Let x = (x 1 , x 2 , . . . , x N ) ∈ Z N ≥ be the vector of copy-numbers of the species s 1 , s 2 , . . . , s N , and y 1 ∈ Z ≥ the copy-number of the auxiliary species Q 1 , appearing in the network (2)-(3). The forward operator of the fast sub-network R ε 0 is given by The fast process has a linear conservation law: x 1 + 2y 1 =x 1 , wherex 1 is a time-independent constant for the process induced by L 0 . Let us introduce a new vectorx = (x 1 , x 2 , . . . , x N ) ∈ Z N ≥ , with the first coordinate given byx Under the coordinate transformation (5), the fast sub-network (4) simplifies to the network Q 1 − → ∅, with the forward operator L 0 = 1/ε(E +1 y 1 − 1)y 1 The CME, governing the time-evolution of the probability mass function (PMF) underlying the network (2)-(3), on the slow time-scale τ = O(1), defined by t = τ /ε (the time is rescaled in order to capture nontrivial dynamics, as explained in what follows), and expressed in terms of the new coordinatesx, reads Here, operators L 0 , and L 1 , are induced by the fast sub-network R 1 0 , and the remaining (slow) sub-network R ε \ R ε 0 , respectively, and given by Functions α 1 (x 1 , y 1 ), and y 1 α 2 (x 1 , y 1 ), are the propensity functions of the reactions from R ε \ R ε 0 , expressed in terms of (5), with The jump vector ∆x is given by ∆x = (∆x 1 ,ν 2 , . . . ,ν N ), where ∆x 1 may be formally obtained by applying the difference operator ∆ on (5), and reads

Perturbation analysis
Let us write the solution of (6) as the perturbation series We require the zero-order approximation p 0 (x, y 1 , τ ) from (10) to be a PMF (i.e. it must be nonnegative and normalized). Substituting (10) into (6), and equating terms of equal powers in ε, the following system of three equations is obtained: The first equation in (11) is homogeneous, and forms a zero-eigenvalue problem, while the remaining two are inhomogeneous. Order 1/ε 2 equation. It follows from the structure of L 0 that we may write p 0 (x, y 1 , τ ) = p 0 (y 1 )p 0 (x, τ ). Assuming p 0 (x, τ ) > 0 for anyx ∈ Z N ≥ and any τ > 0, the first equation from (11) reduces to L 0 p 0 (y 1 ) = 0. Operator L 0 has a one-dimensional null-space, N (L 0 ) = {1 y 1 δ y 1 ,0 }, where 1 y 1 denotes functions independent of y 1 , and δ i,j is the Kronecker-delta function, non-zero only at i = j, where it is equal to one. We seek an element of this one-dimensional space which is normalized, resulting in In other words, under the infinitely fast reaction Q 1 − → ∅, the y 1 -marginal PMF is concentrated at zero, making Q 1 a short-lived species.
Order 1/ε equation. Using (7) and (12), the right-hand side (RHS) of the second equation from (11) becomes Let us denote the l 2 -adjoint operator of L 0 by L * 0 , called the backward operator, which is given by L * 0 = y 1 E −1 y 1 − 1 (see Appendix A). The second equation from (11) has either no solutions or infinitely many solutions, since the backward operator L * 0 has a non-trivial null-space, given by N (L * 0 ) = {1 y 1 }. In order to achieve the latter, the Fredholm alternative theorem [39] implies that the RHS of the equation has to be orthogonal to the null-space of the backward operator L * 0 . Using (13), the solvability condition becomes where f, g = ∞ y 1 =0 f (y 1 )g(y 1 ) denotes an l 2 inner-product. Constraint (14) is unconditionally satisfied, since (E +1 y 1 − 1)1 y 1 = 0. Note that if the time had not been rescaled according to t = τ /ε in (6), the second equation from (11) would read L 0 p 1 (x, y 1 , t) = ( ∂ ∂t − L 1 )p 0 (x, y 1 , t), and the solvability condition would give a trivial effective CME, ∂ ∂t p 0 (x, t) = 0. For this reason, we have rescaled the time to a longer scale.
Since the operator L 0 acts only on y 1 variable, and because of equality (13), the solution of the second equation in (11) may be written in the separable form Note that the factor p 1 (y 1 ;x) generally cannot be interpreted as a (conditional) PMF (e.g. it may be negative), and hence we write p 1 = p 1 (y 1 ;x) instead of p 1 = p 1 (y 1 |x). Substituting (15) into the second equation in (11), using (13) and the operator equality (E −1 y 1 − 1) = −(E +1 y 1 − 1)E −1 y 1 , and assuming positivity of p 0 (x, τ ), one obtains implying that the solutions p 1 (y 1 ;x) satisfy Order 1 equation. The solvability condition for the third equation from (11) is given by Equation (12) implies that the first term from (17) reads 1, ∂/∂τ p 0 (x, y 1 , τ ) = ∂/∂τ p 0 (x, τ ). Using equations (7) and (15), the second term from (17) reads which, upon substituting (16), becomes Substituting (18) into (17), using (8), and changing the time back to the original scale, τ = εt, one obtains the effective CME The presence of the factor ε on the RHS of the effective CME (19) stems from the fact that non-trivial dynamics (i.e. an effective CME with a non-zero RHS) occur at t = O(1/ε). Put it another way, the most common firing pattern of the network R ε , given by (2)-(3), consists of a firing of the reaction 2s 1 − → Q 1 , which produces the short-lived species Q 1 , followed by the fast reaction Q 1 − → 2s 1 , which quickly converts Q 1 back to 2s 1 , which leads to a trivial dynamics of s 1 . However, after long enough time, another firing pattern is also occasionally observed: after Q 1 is produced from 2s 1 , instead of being quickly converted back to 2s 1 , it participates in the target slow reaction s 1 + Q 1 − →ν 1 s 1 + M l=2ν l s l +γ 1 Q 1 , which gives rise to non-trivial dynamics of s 1 .

Validity of the effective CME
Let us now compare the CME of the input network (1) with the effective CME of the output network (2)-(3), given by (19). Before doing so, note that (19) is expressed in terms of the variablē x 1 , defined in (5), and not the original copy-number x 1 . Denoting by Y 1 (t) the time-dependent copy-number of Q 1 , and assuming convergence of the perturbation series (10) (see also Theorem 3.2 in Section 3), it follows from (12) that Y 1 (t) converges to zero. As a consequence, equation (5) implies thatX 1 (t) converges to X 1 (t) as ε → 0, so that we may replacex 1 by x 1 in (19), ensuring that the effective CME describes copy-number of the species s 1 . Stoichiometric and kinetic conditions. The effective propensity function appearing in (19) is . It has the same form as the the propensity function of the input network (1), given by α(x) ≡ kx 1 (x 1 − 1)(x 1 − 2). However, the input reaction and the effective output reaction generally have different reaction vectors and rate coefficients. In order for the effective CME (19) of the network (2)-(3) to match the CME of network (1), we must hence require two conditions. Firstly, we require that the effective reaction vector element ∆x 1 , given by (9), is equal to the original one, ∆x 1 =ν 1 − 3. This imposes the stoichiometric condition onν 1 andγ 1 :ν Secondly, we require that the effective propensity function is equal to the original one. This imposes the kinetic condition: Let us now summarize the established results.
Lemma 2.1. Consider the third-order input network R 0 , given by (1), and the second-order output network R ε , given by (2)- (3). The x-marginal zero-order PMF of the output network R ε , with 0 < ε 1, from the perturbation series (10), satisfies the effective CME (19). Furthermore, assume the stoichiometric and kinetic conditions (20) and (21) are satisfied, respectively. Then, the effective CME (19) of the output network R ε is identical to the CME of the input network R 0 .
Convergence. Lemma 2.1 provides conditions under which the effective CME of the output network matches the CME of the input network. This has been obtained by means of formal asymptotics, under the assumption that the rate coefficients κ 1 , κ 2 1/ε, with 0 < ε 1. In order to establish convergence of the perturbation series (10) as ε → 0, one may choose the coefficients κ 1 and κ 2 according to (21), expand the PMF of the output network into an appropriate perturbation series, and then study a convergence. For example, by choosing κ 1 =κ 1 ε −1/2 and κ 2 =κ 2 ε −1/2 , withκ 1κ2 = k, one obtains for any finite-time interval, where · 1 denotes the l 1 -norm over the bounded state-spaces of x and y, which we prove in a more general setting in Section 3 (in particular, see Theorem 3.2 with n = 3). Different scalings of κ 1 and κ 2 generally lead to different orders of convergence. For example, one can readily show that choosing κ 1 = O(ε −1/3 ) and κ 2 = O(ε −2/3 ) leads to a slower convergence: , for sufficiently small ε. In general, the convergence is limited by the slowest reaction in the output network (3).
Assuming convergence, Lemma 2.1 implies that the statistics of the common species from the input and output networks are close for each fixed finite-time interval, when 0 < ε 1. In the next section, we numerically investigate how well the long-time statistics of the input and output networks match for a particular test model.
Stationary PMF. Let us compare the stationary PMFs of the input network (23), denoted by p 0 = p 0 (x), with the x-marginal stationary PMF of the output network (24), denoted by p ε,x = p ε,x (x), which capture the long-time behavior of the systems, and are thus of practical importance. In Figure 1(a), we display the stationary PMF p ε,x (x) for various values of the asymptotic parameter ε. In particular, we show the input (target) PMF p 0 (x) as the black curve, while the output PMF p ε,x (x) for ε = 10 −3 and ε = 10 −6 as the dashed purple curve and the blue histogram, respectively. When ε = 10 −3 , the output PMF is bimodal, but the PMF is inaccurately distributed. Such a mismatch arises from the fact that the asymptotic parameter is not sufficiently small: 1/ε should be larger than the largest rate coefficient appearing in the input network (23) (which is k 3 = O(10 3 )). On the other hand, when ε = 10 −6 , i.e. when 1/ε is three orders of magnitude higher than k 3 , the matching between the input and output networks is excellent. In Figure 1(b), we show the y-marginal PMF for the output network when ε = 10 −6 , and one can notice that it is concentrated around y = 0. In Figures 1(c)-(d), we show representative sample paths corresponding to the histograms from (a)-(b), respectively, where one can notice a stochastic switching phenomenon for the species s.
To gain more quantitative information, let us measure the distance (error) between the input and output PMFs as a function of the asymptotic parameter ε by using the l 1 -norm: In Figure 2(a), we display a log-log plot of p ε,x −p 0 1 , as a function of the asymptotic parameter ε, as the black dots interpolated with the black lines, which was obtained by numerically solving the underlying stationary CMEs. We also show as the dashed blue line a log-log plot of the reference curve p ε,x − p 0 1 = √ ε. One can notice an excellent match in the slopes of the two curves, in accordance with equation (22). In Figure 2(b), we plot the stationary y-marginal PMF evaluated at zero, p ε,y (0), which converges to 1 as ε → 0, in agreement with (12) and (22).   (23) and (24), respectively, given as equation (25), as a function of the asymptotic parameter ε. Also shown as the dashed blue line is the reference curve y = √ ε. Panel (b) shows the y-marginal PMF of the auxiliary species Q from the output network (24), evaluated at y = 0, as a function of ε. Panel (c) shows the MFPT for the input network (23) as the black curve, while for the output network (24) as the blue histogram when ε = 10 −6 . Similar to panel (a), panel (d) displays a log-log plot of the l 1 -distance between mean first passage times (MFPTs) of the input and output networks, obtained by solving equations (26)- (27), as described in the main text. The parameters are fixed as in Figure 1.
Mean first passage time. PMFs provide the probability that the underlying sample paths are at particular states. For multimodal PMFs, another quantity of practical importance is the mean switching time, i.e. the average time it takes the sample paths, starting near one of the PMF maximum, to reach another PMF maximum for the first time. Such information is not provided in Figure 1, as PMFs do not uniquely capture time-parametrization of the underlying sample paths, i.e. sample paths with different switching times may give rise to identical PMFs.
In order to compare the switching times of the input and output networks (23) and (24), respectively, as a function of the asymptotic parameter ε, let us note that the three deterministic equilibria of the input network (23) are approximately given by (x * s,1 , x * u , x * s,2 ) ≈ (53, 105, 202), where x s,1 and x s,2 are stable, while x u is the unstable equilibrium. The mean switching time from the lower to the higher PMF modes may be measured by the average time it takes X(t), starting in a neighborhood of x * s,1 , to reach R ≈ (x * u + x * s,2 )/2 for the first time [17]. This can be formulated for the input network (23) as the boundary-value problem (BVP) [38] where τ 0 (x) is the mean first passage time (MFPT), given that the initial condition is fixed to x, and L * is the backward operator of (23). The second line in (26) is an absorbing boundary condition, expressing the fact that the MFPT, with the starting point x = R, is zero. Numerically solving the BVP (26) with R = 150 produces the black curve shown in Figure 2(c). Analogous to (26), the BVP for the MFPT of the output network (24), denoted τ ε (x, y), reads: where L * ε is the backward operator of (24). We take R x = R = 150, and truncate the y-state-space by taking R y = 100. In what follows, we compare the one-variable quantity τ 0 (x) with the twovariable quantity τ ε (x, y). Since Y (t) spends most of the time at y = 0, i.e. since p ε,y (y) ≈ δ y,0 for 0 < ε 1, we compare τ 0 (x) with τ ε,x (x) ≡ τ ε (x, 0) (i.e. we set the initial condition for y at y = 0), and measure the error using the l 1 -norm, τ ε,x − τ 0 1 . The error is shown Figure 2(d), and, as predicted by (22), one can notice a √ ε-convergence to zero. In Figure 2(c), we show τ ε,x (x) for ε = 10 −6 as the blue histogram, which is in an excellent agreement with τ 0 (x) shown in black.
Proof. See Appendix B.
Theorem 3.2. Consider the output network (29)- (30), satisfying the condition (32), with the following choice of the rate coefficients: Then, the PMF of the output network converges to its zero-order approximation for any finite-time interval, p ε → p 0 as ε → 0, with where c(T ) is constant independent of ε.
Proof. See Appendix C.
Note that the order of convergence predicted by equation (34), given by 1/(n − 1), decreases as the order of the input network (28) increases. Generally, note also that the convergence order depends on the kinetic condition (32), i.e. it depends on how the rate coefficients κ i are scaled, but is independent of the stoichiometric conditions (31).
The family of output networks (29)-(30), parametrized by the stoichiometric coefficients {ν l } m l=1 andγ n−2 , is not a unique approximation of the input network (28) with a second-order network, i.e. the ordering of the reactants and reactions in (29)-(30) is not unique.
Example 3.1. Consider the third-order input reaction Under the conditions of Theorem 3.1, reaction (35) may be approximated according to (29)- (30) with an output network R ε = R ε 1 (s 1 , s 2 ) ∪ R 2 (s 2 ), with the sub-networks i.e. the forward reaction from R ε 1 is a heteroreaction, involving the distinct reactants s 1 and s 2 . However, the second-order output network R ε 1 (s 2 , s 2 ) ∪ R 2 (s 1 ), given by for which the forward reaction from R ε 1 is a homoreaction, also approximates the input reaction (35) under the conditions of Theorem 3.1, which is readily proved analogously as in Appendix B.
In order to facilitate the proof of Theorem 3.1 in Appendix B, we have followed the convention of ordering the reactants in the input reaction (28) according to increasing stoichiometric coefficients, and we have also fixed the ordering of reactants and reactions in the output networks (29)- (30). For example, if the input reaction is 2s 1 +s 2 − → ∅, we would re-label the species to obtain s 1 +2s 2 − → ∅. In what follows, we are no longer concerned with proving Theorem 3.1 and, to gain a greater flexibility, we allow arbitrary orderings, and use both of the designs (36) and (37), depending on convenience.
Let us now interpret conditions (31) and (32) from Theorem 3.1. The kinetic condition (32) simply states that the product of the rate coefficients of all of the (n − 1) forward (slower) reactions from the output network (30), κ 1 κ 2 . . . κ n−1 , divided by the product of all of the (n − 2) backward (faster) reactions, (1/ε) n−2 , must be equal to the rate coefficient of the input reaction (28). Furthermore, the asymptotic conditions {εκ i } n−1 i=1 1 state that we require the forward reactions to be slower than the backward ones, ensuring validity of the perturbation analysis. The stoichiometric condition (31) also has an intuitive interpretation, as we now exemplify.

Example 3.2. Consider the fourth-order input reaction
where, using the notation from Theorem 3.1, ν 1 = ν 2 = 2 andν 1 =ν 2 = 4. A suitable family of output networks is given by Ifγ 2 = 0, then (31) implies thatν 1 =ν 2 = 4, i.e. that R 2 (s 2 ) from (39) has the same products as the input network (38), When 0 < ε 1, intuitively speaking, at the network level, we formally have Q 1 = s 1 + s 2 and Q 2 = s 1 + Q 1 = 2s 1 + s 2 . With this notation, condition (31) states that we may add the complex ∅ = (Q 2 − 2s 1 − s 2 ) to the products of R 2 (s 2 ) from (39) as many times as desired, as long as the resulting complex contains nonnegative stoichiometric coefficients (so that reaction R 2 (s 2 ) retains a chemical interpretation). By adding the product complex ∅ = (Q 2 − 2s 1 − s 2 ) once to (40), one obtains while by adding it twice, one gets Each of the three members (40)-(42) of the family of output networks (39) is valid. However, note that adding the product complex ∅ = (Q 2 − 2s 1 − s 2 ) three times to (40) leads to for which the product complex is not nonnegative: the reaction is non-chemical, and hence not a valid approximation of the input network (38).
In the remainder of this section, we introduce two more examples. In Section 3.1, we provide an example highlighting how the rate coefficients {κ} n−1 i=1 from (30) may be chosen in order for the underlying perturbation results to be valid at larger values of ε. In Section 3.2, we illustrate how Theorem 3.1 can be efficiently applied to multi-input networks, i.e. to input networks containing multiple higher-order reactions.

Example: Chemical Kronecker-delta
Let us consider the fifth-order input reaction network given by where we fix the dimensionless parameters to k 1 = k 2 = 1, and denote the copy-number of species s by x. The stationary PMF of (44) is shown in Figure 3(a) as the black dots, which are interpolated with the solid black lines. The PMF is approximately the Kronecker-delta function that peaks at x = 4, which may also be seen directly from the reaction network (44) The stoichiometric condition (31) becomesν = (4 − 4γ), and there are two options: (ν,γ) = (4, 0), and (ν,γ) = (0, 1). In what follows, we arbitrarily take (ν,γ) = (0, 1), and denote the copy-number of species s by x, while of species Q i by y i , for i ∈ {1, 2, 3}.
Having met the stoichimetric condition, it remains to also satisfy the kinetic condition (32). In particular, the rate coefficients from (45) must satisfy: A particular choice of the rate coefficients is given by (33) from Theorem 3.2 with n = 5: κ i = ε −3/4 (k 2 ) 1/4 , for i ∈ {1, 2, 3, 4}. While such a simple choice of the rate coefficients is valid, ensuring convergence of order 1/4 for sufficiently small 0 < ε 1, it may take very small values of ε for the validity of the underlying perturbation analysis. Let us now discuss how the rate coefficients {κ i } 4 i=1 may be chosen, so that the perturbation analysis is valid already at larger values of ε.
The perturbation analysis employed in this paper relies on scaling the rate coefficients of reactions. A more detailed analysis would consider the whole propensity functions of the reactions. While we do not pursue such an analysis in this paper, let us make a remark. The output network (45) consists of an ordered chain of reactions: in order for the sub-network R 4 from (45) to fire, and mimic the action of reaction 5s − → 4s from (44), one first requires that the forward reactions in the sub-networks R ε 1 , R ε 2 and R ε 3 fire. The reactant complex of the forward reaction from R ε 1 , forming the start of the chain, is given by 2s, while the propensity function is α 1 (x) = κ 1 x(x − 1). On the other hand, the forward reactions from R ε 2 and R ε 3 , and the reaction R 4 , involve the shortlived low-copy-number intermediates Q 1 , Q 2 and Q 3 as reactants, with the propensity functions α 2 (x, y 1 ) = κ 2 xy 1 , α 3 (x, y 2 ) = κ 3 xy 2 and α 4 (x, y 3 ) = κ 4 xy 3 , respectively. If the induced stochastic dynamics predominantly take place near the x-axis and sufficiently far from the origin (i.e. for lower values of y 1 , y 2 , y 3 , and higher values of x), then one may observe that α i (x, y 1 )/κ i α 1 (x)/κ 1 , for i ∈ {2, 3, 4}. In such settings, in order to speed up the slower forward reactions from R ε 2 and R ε 3 , and the reaction R 4 , a more efficient choice of the rate coefficients involves larger values of κ 2 , κ 3 and κ 4 , and, due to the constraint (32), smaller values of κ 1 . However, this should be balanced with the fact that, by taking a smaller κ 1 , the chain of reactions from (45) is triggered less often.
Going back to equation (46), let us then choose the rate coefficients e.g. as follows where the condition on the corrective factor α, namely 0 ≤ α < 1/4, ensures that {εκ i } 4 i=2 1 as ε → 0. Fixing e.g. α = 1/8, in Figure 3(a) we display the stationary x-marginal PMF of the output network (45) for ε = 10 −6 as the blue histogram, which is in an excellent agreement with the PMF of the input network (44), shown in black.
with the sub-networks given by (49)-(50), as the dashed purple curve when the asymptotic parameter is fixed to ε = 10 −3 , and as the blue histogram when ε = 10 −6 . The rest of the dimensionless parameters are fixed to: , andκ 1 ,κ 2 are chosen according to (52), with α = 1/12. The conservation constant for the input network is fixed to c = 7, and the initial conditions for all of the auxiliary species from the output network, {Q i } 5 i=1 ,Q 4 are fixed to 0.

Example: Noise-induced trimodality
Theorem 3.1 provides conditions under which a single higher-order input reaction (28) may be approximated by a second-order network (29)- (30). A generalization to the case of an input network containing multiple higher-order reactions is straightforward: Theorem 3.1 may be applied iteratively to each desired (higher-order) input reaction, one at a time, while the other input reactions, which one does not wish to approximate, are copied directly from the input to the output network without any modifications. Furthermore, if some of the input reactions involve common reactant sub-complexes, then some of the intermediate species Q i may be re-used to simultaneously approximate such reactions. Put it another way, reactions with common sub-complexes may be approximated by multiple chains of reactions of the form (30), which all branch out from a suitable common sub-chain. These statements can be readily proved as in Appendix B, under straightforward modifications of the coordinate transformations (57)- (58).
Let us now illustrate an iterative application of Theorem 3.1, and show how the intermediates may be re-used in order to reduce the number of auxiliary species and reactions in the output networks. To this end, consider the seventh-order input reaction network R 0 = R ∪ R 2,5 ∪ R 4,2 , given by In what follows, we fix the dimensionless parameters to k 1 = k 2 = k 2,5 = k 4,2 = 1. Denoting the copy-numbers of species s ands by x andx, respectively, note that their sum is conserved: In what follows, we fix the conservation constant to c = 7. Reaction network (48) has been obtained by applying the so-called noise-control algorithm [9] on the network R(s,s), in order to control its stochastic dynamics. Here, sub-networks R 2,5 and R 4,2 , called the zero-drift networks, introduce a state-dependent noise at x = 2 and x ∈ {4, 5}, respectively, thereby decreasing the PMF at those states, while preserving the mean of the PMF. In Figure 3(b), we display the stationary PMF underlying (48) as the black dots, which are interpolated with the solid black lines for visual clarity. One can notice that the network displays noise-induced trimodality, which is not observed at the deterministic level, where network (48) is monostable, displaying a globally stable equilibrium x * = k 1 /(k 1 + k 2 )c = 3.5. Let us now apply Theorem 3.1 on the network (48) in order to reduce its order to two, thereby demonstrating that the noise-control algorithm may be useful for molecular computing, where one typically experimentally implements up-to second-order reactions [6]. Applying Theorem 3.1 to independently reduce the order of each of the four reactions underlying R 2,5 ∪ R 4,2 requires 18 auxiliary species and 40 reactions in total, i.e. 10 independent auxiliary species for the network R 2,5 (5 species for each of the underlying reactions) and, similarly, 8 species for R 4,2 . However, 5 auxiliary species are in fact sufficient to reduce the order of network R 2,5 , since each of the two underlying reactions involve the same reactants, so that we may re-use the auxiliary species. Similar reasoning implies that we require only 4 auxiliary species to approximate the network R 4,2 . Hence, 9 auxiliary species are sufficient to approximate R 2,5 ∪ R 4,2 by a secondorder network. Furthermore, all of the reactions from R 2,5 ∪ R 4,2 involve a common reactant sub-complex (2s + 2s), i.e. they involve at least 2 molecules of s, and 2 ofs, which we may further exploit to reduce the number of the auxiliary species. In particular, since (2s+2s) may be converted into an auxiliary species in three steps, we may re-use 3 species in both networks R 2,5 and R 4,2 , hence requiring in total 6 auxiliary species. More precisely, sub-network R 2,5 may be approximated by R ε 1 (s) : 2s − → 3s + 4s, and we may re-use the species Q 3 from (49), formally satisfying Q 3 = (2s + 2s), to make the approximation of the sub-network R 4,2 more efficient, as follows: In summary, we have approximated the seventh-order input reaction network R 0 = R ∪ R 2,5 ∪ R 4,2 , given by (48), involving 2 species and 6 reactions, with the second-order output network R ε ≡ R ∪ (∪ 5 i=1 R ε i ∪R ε 4 ) ∪ R 6 ∪R 5 , involving 8 species and 18 reactions. The sub-network R 2,5 from (48) is approximated by The output network involving sub-networks (49)-(50) satisfies the stoichiometric conditions (31), while the kinetic conditions (32) are given by ε 5 κ 1 κ 2 κ 3 κ 4 κ 5 κ 6 = k 2,5 , Guided by the discussion in Section 3.1, let us choose the rate coefficients e.g. as follows Fixing e.g. α = 1/12 in (52), we display in Figure 3(b) the stationary x-marginal PMF of the output network R ε , containing sub-networks (49)-(50), for ε = 10 −3 , and ε = 10 −6 , as the purple squares, interpolated with the purple dashed lines, and the blue histogram, respectively. Similar to Figure 1(a), when ε = 10 −3 , the output PMF is qualitatively accurate, capturing the multimodality, but is quantitatively inaccurately distributed. On the other hand, when ε = 10 −6 , one can notice a good matching with the stationary PMF of the input network (48), which is shown in black.

Discussion
In this paper, we have shown that higher-order (high-molecular) input biochemical networks, containing reactions with three or more reactants, may be stochastically approximated with appropriate second-order (bi-molecular) output ones, containing reactions with up-to two reactants. In particular, it is shown that the probability distributions of the input and output networks match over arbitrarily long time-intervals, when appropriate rate coefficients in the output networks are sufficiently large. The approximation relies on a dimension-expansion: a single nth-order input reaction is replaced by an output network consisting of (2n − 3) reactions of up-to second-order, and involving (n − 2) additional auxiliary species. This has been established by applying singular perturbation theory on the underlying chemical master equation (CME), and stated in Theorems 3.1 and 3.2 in Section 3. We have, thereby, generalized the well-known order-reduction algorithm, which has been used for decades at the deterministic level [20,21,22,23,24], to the stochastic level and arbitrarily high-order. In particular, in Section 2, we have first established the result in the special setting when the input reaction is of third-order (tri-molecular), and involves identical reactants, stated as Lemma 2.1. We have applied Lemma 2.1 in Section 2.3, by mapping the third-order Schlögl network [13], given by (23) and displaying a bimodal stationary probability mass function (PMF), to an approximating second-order network, given by (24). It has been verified that the stationary PMF of the approximating network matches that of the Schlögl network in an asymptotic limit of the appropriate rate coefficients, see Figure 1. Furthermore, we have demonstrated that the mean switching time between the two PMF maxima is also preserved under applications of Lemma 2.1, and, in Figure 2, we have numerically verified the theoretically predicted convergence order, given in equation (22).
In Section 3, we have presented our main result: Theorem 3.1, which states that input reactions of arbitrarily high-order, given by (28), may be approximated by families of second-order output ones, given by (29)- (30). In particular, in order for the output networks to replicate the stochastic behavior of the input ones, constraints are imposed on the stoichiometry and kinetics of the output networks, given as conditions (31) and (32), respectively. We have also presented Theorem 3.2, which establishes a weak-convergence result for a sub-family of the output networks, with a convergence order that depends only on the kinetics of the output networks, and not the underlying stoichiometry. In Section 3.1, we have applied Theorem 3.1 on the fifth-order reaction network (44), whose probability distribution is approximately a Kronecker-delta function, arising as a result of a discrete state-space and stochastic dynamics. We have mapped the input network (44) to a family of output networks, given by (45), and discussed how the rate coefficients appearing in (45) may be chosen efficiently. It has been verified that the biochemically realistic network (45), consisting of up-to second-order reactions, also displays the approximate Kroneckerdelta probability distribution in an asymptotic limit, see Figure 3(a). Finally, in Section 3.2, we have considered the seventh-order input network (48), which has been designed using the so-called noise-control algorithm [9]. The input network displays noise-induced trimodality, and consists of four higher-order reactions. We have highlighted that Theorem 3.1 can be applied iteratively for multi-input reaction networks, i.e. networks involving multiple higher-order reactions. An efficient iterative application of Theorem 3.1 on the input network (48) has been presented, which significantly reduces the number of reactions and auxiliary species in the resulting output network, given by (49)-(50), whose PMF is shown to converge in Figure 3(b). The noise-control algorithm [9], and Theorem 3.1, jointly provide a systematic framework for designing bi-molecular chemical reaction networks, whose stochastic dynamics are controlled in a state-dependent manner, and which may be experimentally realized in nucleic-acid-based synthetic biology [6].

A Background Theory
Notation. Set R is the space of real numbers, R ≥ the space of nonnegative real numbers, and R > the space of positive real numbers. Similarly, Z is the space of integer numbers, Z ≥ the space of nonnegative integer numbers, and Z > the space of positive integer numbers. Euclidean row-vectors are denoted in boldface, Reaction networks. In this paper, we consider reaction networks R = R(s 1 , s 2 , . . . , s N ), firing in unit-volume reactors under mass-action kinetics, involving N reacting species, {s i } N i=1 = {s 1 , s 2 , . . . , s N }, and M reactions given by Here, the non-negative linear combinations of the species N i=1 ν ji s i and N i=1ν ji s i are called the reactant complex and the product complex of the j-th reaction, respectively, ν j = (ν j1 , ν j2 , . . . , ν jN ) ∈ Z N ≥ andν j = (ν j1 ,ν j2 , . . . ,ν jN ) ∈ Z N ≥ are the corresponding reactant and product stoichiometric vectors, respectively, while k j ∈ R ≥ is the rate coefficient of the j-th reaction [1,36]. Abusing the notation slightly, we denote the complex N i=1 ν ji s i by ν j , and reaction ( N i=1 ν ji s i → N i=1ν ji s i ) ∈ R by (ν j →ν j ) ∈ R, when convenient. The order of reaction (ν j →ν j ) ∈ R is given by ν j · 1 < ∞, where 1 = (1, 1, . . . , 1) ∈ Z N , and · denotes the vector dot-product. The order of reaction network R is given by the order of its highest-order reaction.
The stochastic model. Let us now consider reaction networks firing in well-mixed reactors, with discrete species counts, and stochastic dynamics. A suitable stochastic description models the time-evolution of species copy-number vector X(t) = (X 1 (t), . . . , X N (t)) ∈ Z N ≥ as a continuous-time discrete-space Markov chain [29], where t ∈ R ≥ is the time-variable. The underlying probability mass function (PMF) satisfies the partial difference-differential equation, called the chemical master equation (CME) [37,38], given by where p(x, t) is the PMF, i.e. the probability that the copy-number vector X = X(t) ∈ Z N ≥ at time t > 0 is given by x ∈ Z N ≥ . Here, the linear operator L is called the forward operator, while the step operator E −∆x is the propensity (intensity) function of the j-th reaction, and is given by where k j ≥ 0 is the rate coefficient of reaction of the j-th reaction. Here, denotes the ν ji -th factorial power of x i , with the convention Let p, q = x∈Z N ≥ p(x)q(x) denote the l 2 inner-product. The l 2 -adjoint operator of L, denoted by L * and called the backward operator [39], is given by On the other hand, if m ≥ 2, then x m + δ m,2 In what follows, for mathematical convenience, we define the fast network The CME corresponding to (29)-(30), rescaled in time according to t = τ /ε n−2 , and expressed in terms of the new variablesx, is given by where L 0 is the forward operator of network R 1 0 in the new coordinates, given by while L 1 is the forward operator of the slow sub-network R ε \ R ε 0 , and reads Here, we take the convention that y 0 ≡ 1, and that the operators E ±1 y 0 are identity operators, denoted E ±1 y 0 ≡ 1. Function y i−1 α i (x, y) is the propensity function of the forward reaction in the sub-network R ε i from (30), expressed in terms of the new variables (57)- (58). Let us note that L n−1 1 is the only operator which acts on the variablex, while the rest of the operators act only on y.