Hybrid Master Equation for Jump-Diffusion Approximation of Biomolecular Reaction Networks

Cellular reactions have multi-scale nature in the sense that the abundance of molecular species and the magnitude of reaction rates can vary in a wide range. This diversity leads to hybrid models that combine deterministic and stochastic modeling approaches. To reveal this multi-scale nature, we proposed jump-diffusion approximation in a previous study. The key idea behind the model was to partition reactions into fast and slow groups, and then to combine Markov chain updating scheme for the slow set with diffusion (Langevin) approach updating scheme for the fast set. Then, the state vector of the model was defined as the summation of the random time change model and the solution of the Langevin equation. In this study, we have proved that the joint probability density function of the jump-diffusion approximation over the reaction counting process satisfies the hybrid master equation, which is the summation of the chemical master equation and the Fokker-Planck equation. To solve the hybrid master equation, we propose an algorithm using the moments of reaction counters of fast reactions given the reaction counters of slow reactions. Then, we solve a constrained optimization problem for each conditional probability density at the time point of interest utilizing the maximum entropy approach. Based on the multiplication rule for joint probability density functions, we construct the solution of the hybrid master equation. To show the efficiency of the method, we implement it to a canonical model of gene regulation.


Introduction
Reaction networks in systems of biology have discrete and stochastic nature [11,14,15]. Ignoring the randomness of the stochastic fluctuations and the discreteness of the number of molecules of species result in inappropriate models which cannot correctly describe the dynamics of the whole cell. Stochastic modeling approach explains the dynamics of these systems using discrete-state continuous-time Markov chains and describes the state of the system by integervalued number of molecules of species. In this approach, the state vector of the system satisfies the random time change model (RTCM), which defines the reaction counting processes using Poisson processes [2]. Also, the probability mass function of these systems satisfies a set of differential equations referred to as the chemical master equation (CME) in the literature [18]. When the number of molecules of the species in the system of interest is very high, the state vector of the system can be defined by real-valued concentrations instead of integer-valued particle numbers. Dynamics of such systems can be modeled through diffusion approximation, and the state vector of the system satisfies an Itô stochastic differential equation (SDE) known as the chemical Langevin equation (CLE). Similarly, probability density function of these systems suffices the Fokker-Planck equation (FPE) [20,21]. In the thermodynamic limit, in which the number of molecules of species and the system volume both approach to infinity while the concentrations of species stay constant, the state of the system is given by the reaction rate equation (RRE) of the traditional deterministic modeling approach.
Cellular reaction systems involve reactions with very different rates and species with very different abundances. Models only based on the traditional deterministic modeling approach fail to account for this nature. Therefore, different hybrid methods that couple the stochastic and deterministic modeling approaches are needed. In general, hybrid methods separate reactions and/or species into different groups of reactions and/or species, and they use the diffusion or the deterministic modeling approach to describe the dynamics of fast reactions and/or species with high copy numbers, while Markov chain representation is utilized for slow reactions and/or species with low copy numbers [6,7,8,10,13,27].
A major challenge of modeling the reaction networks using the CME is the curse of dimensionality. Each state of the system under consideration adds one dimension to the corresponding CME. Therefore, when the number of reachable states is very high, it is very difficult to obtain the numerical solution of the CME. To avoid this drawback, different simulation algorithms, such as Gillespie's stochastic simulation algorithms (SSAs) and their versions, were proposed to obtain the trajectories of the biochemical system of interest [17,22]. The computational cost of these algorithms increases with the size of the model; therefore, it is not appropriate to use them for very complicated systems involving many reactions and reactants.
Moment approximations that analyze the dynamics of the reaction network under consideration using moments of the probability distribution satisfying the corresponding CME are considered as an alternative. In [12], the author proposed the method of moments that computes the moments for any reaction network from the corresponding CME. In [30], a moment closure approximation that obtains finite dimensional ordinary differential equation (ODE) system for the mean and the central moments by truncating the moment equations at a certain order and using the Taylor series is introduced. Another moment closure method that approximates the moments with higher order, compared to the order of truncation, utilizing nonlinear functions of the lower order moments is introduced in [35].
In [26], the authors introduced the method of conditional moments (MCM) that can be considered as the combination of a hybrid method and a moment approximation method. The MCM separates species into two different classes involving species with high copy number of molecules and species with low copy number of molecules. Based on this decomposition, the joint probability density function satisfying the corresponding CME is also represented as a product of the marginal probabilities of species with low copy number of molecules and the conditional probabilities of species with high copy number of molecules conditioned on the remaining species with low copy numbers of molecules. To describe the dynamics of species with low copy number of molecules, the authors used marginal probabilities, while the conditional means and the centered conditional moments are used to model the dynamics of species with high copy numbers. In comparison to [26], in [3], the authors obtained moments of the system of interest directly from the corresponding CME without using any partitioning of the species, and the maximum entropy approach is used to construct the corresponding probability distribution.
In [16], we developed a jump-diffusion approximation to model multi-scale behavior of cellular reactions. Based on an error bound, we separated reactions into fast and slow groups. We employed diffusion approximation for the fast reactions, while Markov jump process was kept for the slow ones. As a result, the state of the system was defined as the summation of the RTCM and the solution of the corresponding CLE. In this paper, based on this representation, we present the hybrid master equation (HME), which is the evolution equation for the joint probability density function of the jump diffusion approximation over the reaction counting process. We prove that the HME is the summation of the corresponding CME and the corresponding FPE [32]. To solve the HME, we obtain the evolution equation for the marginal probability of slow reactions and the evolution equations for the conditional moments of the fast reactions given slow reactions [26]. Using the maximum entropy approach, we construct the corresponding conditional probability at the time point of interest, which in turn gives the approximate solution of the corresponding HME.
The rest of the paper is organized as follows: In Section 2, we describe the basic concepts of the stochastic modeling approach. In Section 3, we give a brief summary of the jump-diffusion approximation. We introduce the HME in Section 4. In Section 5, we construct an ODE system that will be used to obtain the approximate solution of the HME. In Section 6, we introduce the maximum entropy approach. In Section 7, we present numerical results and also explain the details of how we use the maximum entropy approach to construct the joint probability density function describing the HME. Section 8 concludes the paper. Notation Before we give the details of the mathematical derivations, we present the basic notations used through the present paper. We represent all random variables and their realizations by upper-case (i.e. A) and lower case (i.e. a) symbols, respectively, and we use bold symbols to represent the support of a random variable (i.e. A). Also, e j ,ē j denote (R − L) × 1, L × 1, unit vectors with 1 in the j−th component and 0 in other coordinates.

Stochastic Modeling of Chemical Kinetics
In this study, we consider a well-mixed reaction system of M species, S 1 , S 2 , . . . , S M , interacting through R ≥ 1 reaction channels R 1 , R 2 , . . . , R R inside the reaction compartment with volume V . The k−th reaction channel of the system is described as follows: where r jk , p jk ∈ N, j = 1, 2, . . . , M , represent the number of molecules of species S j consumed and produced with a single occurrence of the reaction R k , respectively, and k is the real-valued stochastic reaction rate constant. Let X i (t) ∈ N 0 denote the number of molecules of species S i , i = 1, 2, . . . , M , at time t ≥ 0. Then, the state of the system at time t is X(t) = (X 1 (t), X 2 (t), . . . , X M (t)) T ∈ N M 0 . The classical stochastic modeling of biochemical networks assumes that the process of X is a continuous time Markov chain (CTMC). In this approach, the state vector, X(t), is defined as a random variable of the Markov jump process. Each reaction channel R k , k = 1, 2, . . . , R, is specified by its stoichiometric vector (state-change vector) and its propensity function. The stoichiometric vector ν k = (ν 1k , ν 2k , . . . , ν M k ) ∈ Z M with ν jk = p jk − r jk , j = 1, 2, . . . , M , represents the change in the state of the system after one occurrence of the reaction R k . In other words, when the reaction R k fires, the system state X(t) = x jumps to a new state x + ν k . Given X(t) = x, the probability that one R k reaction takes place in the time represents the propensity function calculated by the law of mass action kinetics, i.e., a k ( denote the number of occurrence of the reaction R k by the time t, then the state of the system at time t can be obtained as follows: If we represent the counting process Z k (t) in terms of the independent Poisson process denoted by ξ k , such that Z k (t) = ξ k t 0 a k (X(s))ds , then the state vector of the above CTMC satisfies the following RTCM [2] Let define the following probability mass function p t (x) = P(X(t) = x).
Another way of analyzing this CTMC process is to consider the time evolution of the probability function p t (x). This probability mass function is the solution of the following Kolmogorov's forward equation, which is known as the CME [23] ∂p When the number of molecules in the system is very high, then the abundance of the species at time t can be represented by the real valued concentrations of the form U (t) = V −1 X(t) ∈ R M ≥0 . In most cases, reaction channels in biochemical systems are bimolecular or monomolecular. If the k−th reaction channel R k is bimolecular or monomolecular, then its propensity function satisfies the equality a k (x) = V a k (u) where a k is the propensity function obtained using the deterministic reaction rate k [36].
It is well known that the centered version of each Poisson process, ξ k , in Equation (2.1) can be approximated through the independent Brownian motions W k (t) [2,29]. Considering the fact that (ξ k (V t) − V t)/ √ V converges in distribution to the Brownian motion W k (t) for large V , we obtain the diffusion approximation of Equation (2.1) as given below: 3) The first and the second summand in the right hand-side of Equation (2.3) are called drift and diffusion terms, respectively. The time derivative of the state vector U (t) satisfies an SDE, namely the CLE.
Let define the following probability density function Then, analog of the CME for this continuous process is represented by the following FPE [20,21] Cellular processes consist of bimolecular reactions of very different speeds involving reactants of largely different abundances. Therefore, the models based only on the RTCM or only the diffusion approximation may be inappropriate to dynamics of such multi-scale processes. In [16], we developed a jump-diffusion approximation to model such processes. In the following section, we will give a summary of this approximation.

Jump Diffusion Approximation
In jump-diffusion approximation [16], we partition the reactions into the fast subgroup, C, and the slow subgroup, D, and model the fast group using a diffusion process, while Markov chain representation is kept for the slow group.
In this approach instead of the CTMC process represented by X, we focus on the scaled abundancesX N i = X i /N ζi , i = 1, 2, . . . , M , and the scaled stochastic reaction rates κ j = j /N ηj , j = 1, 2, . . . , R, such thatX N i = O (1), κ j = O (1). Naturally, these scaled quantities will produce new scaled propensity functions as follows where r k = (r 1k , r 2k , . . . , r M k ) and ζ = (ζ 1 , ζ 2 , . . . , ζ M ). It must be noted thatā k (.) functions are also O(1). Finally, scaling the time t → tN θ and defining X N (t) =X N (tN θ ), we transform the state vector, X(t), given by Equation (2.1) into the following scaled state vector Modeling the fast reactions through diffusion approximation and modeling the slow reactions through Markov chains give the state vector of the jump diffusion approximation as follows: where Y (0) = X N (0), and W j is a standard Brownian motion. If τ 1 , τ 2 denote the successive firing times of reactions from the slow group, then for τ 1 < t < τ 2 , only reactions from the fast group can fire. Therefore, in this time interval, the state vector of the system is given by The main contribution of this study is the derivation of an error bound for the mean e(t) = E | X N (t) − Y (t) |, which is used to partition the reaction set into fast and slow subgroups. Based on this error bound, we construct a dynamic partitioning algorithm that takes into account the fact that a fast reaction can return to a slow reaction or vice versa during the course of time. By describing the state vector of the system as the summation of purely discrete and purely continuous components, we can introduce the HME, which defines the joint probability density function of the jump diffusion approximation over the reaction counting process. In the following section, we will obtain the HME.

Hybrid Master Equation
In jump diffusion approximation, we partition the reaction set into two subsets. As mentioned before, the first subset C involves reactions modeled by diffusion approximation, while the rest of the reactions constituting the slow set D are modeled by Markov chains. In the rest of the study, we will consider that there are L slow reactions, i.e., | D |= L, and R − L fast reactions in the system, i.e., | C |= R − L. Let T be a vector of reaction counters such that Z i (t) denotes the number of occurrences of the reaction R i , i = 1, 2, . . . , R, during the time of the process until time t > 0. Similar to the idea of splitting the state vector of the system into purely discrete and purely continuous parts, we also separate Z(t) = (D(t), C(t)) T into purely discrete and continuous parts corresponding to the reaction counters of the slow, D(t) ∈ N L , and the fast reaction set, We also separate the stoichiometric vectors such that µ D i = µ i , i ∈ D, and µ C j = µ j , j ∈ C. By using Equation (3.5), we will define reaction counters as follows: It must be noted that if τ 1 , τ 2 denote the successive firing times of reactions from the slow group, then for τ 1 < t < τ 2 , C(t) satisfies the following equation where d denotes the number of slow reactions fired until time τ 1 > 0.
The HME is the time derivative of the joint probability density function p t : Then, we can write To obtain the evolution equation for p t (d, c), which is called the HME, we need the following result whose details can be found in [32].
be a continuous process. Define the joint probability density function as follows: Then, the time derivative of this joint probability function, which is referred to as generalized Fokker-Planck equation (GFPE), has the following form where A n1,n2,...,n R−L = lim and and Define Y as a multi-scale process whose state vector is given in Equation (3.5). Then, the joint counting probability density function given in Equation (4.9) satisfies the following GFPE, which is referred to as the HME in the present paper.
Proof. By using Equation (4.10) and Equation (4.13), we obtain (4.14) Now, let's focus on the first summand on the right hand-side of Equation (4.14), which can be rewritten as follows: Using Equation (4.11) gives which can be reformulated as follows By using this representation, we can rewrite Equation (4.15) in the following form In our multi-scale process, we have L slow reactions, and one firing of the reaction R j in this set updates d to d +ē j . Starting from d, the system can jump to d = d +ē j , meaning that a d+ēj ,d = α j (d, c). In the same vein, to reach d, the system must supervene on d −ē j , by definition a d,d−ēj = α j (d −ē j , c). As a result, we obtain the desired summand as follows: Now, we can concentrate on the second and the third summands of Equation (4.14). Jump diffusion approximation is based on the idea that between two successive firing times of the slow reactions, the fast reactions continue to fire. Hence, the state vector and also the reaction counting process of the fast reaction set will satisfy diffusion processes (see Equation 3.6,4.8). Therefore, B j and B ij values have the forms [19,21,28] Substitution of B j and B ij values and Equation (4.16) into Equation (4.14) gives which completes the proof.
Based on the properties of the joint counting probability density function, we can write Since we partition reaction counters into two subsets, we will also decompose the propensity functions. Using mass action kinetics to compute propensities is very popular, and for this large class we partition the propensity function of the reaction R k , α k (d, c), k = 1, 2, . . . , R, as follows: c j µ C sj and K is a real constant that will be ignored to simplify the notation for the reader. Based on this representation, the HME given in Equation → R be any functions of d and c variables, respectively. To simplify the notation, we introduce one step operator in the following form Based on this representation, we define Then, we can write Equation (4.17) in the following form In the rest of the study, we will assume that p t (d, c) is zero at c = 0, c = ∞ [25,33]. In the folllowing section, we will explain how we obtain the solution of this HME.

Solution of the Hybrid Master Equation
To obtain the joint counting probability density function, p t (d, c), described by the HME given in Equation (4.18), we will approximate the process C(t) | D(t) using its moments. Solving a maximum entropy problem for each conditional moment will produce the conditional probability function, p t (c | d). The multiplication of p t (c | d) with the marginal probabilities of the remaining discrete states, i.e., p t (d) = p t (d, c) dc, will give us the desired joint probability density function p t (d, c) .
In the rest of the study, time dependent conditional means and the centered conditional moments of the process C(t) | D(t) will be denoted by Now, based on the study [26], we want to construct a differential equation system to obtain p t (d), To construct this system, we will need the following Lemma [12,26].
−→ R be a polynomial function of c, and p t (d, c) satisfy differential Equation (4.18). Assume that sufficiently many moments of p t (d, c) with respect to c exist, and the joint counting probability density vanishes at c = 0 and c = ∞. Define the following conditional mean where Proof. The proof of the Lemma can be found in Appendix A.1.
When F t (c) = 1 in Lemma 5.1, we obtain the time derivative of the marginal probability p t (d), which is given in the following proposition.
The strategy of our method is to obtain p t (d) and p t (c | d) separately and construct the joint probability function using the equality p t (d, c) = p t (c | d)p t (d). To obtain the conditional probability p t (c | d), we will use evolution equations of the conditional means E t [C m | d], m ∈ C, and the centered conditional . Equation (5.19) will be the first equation of our system. It must be noted that differential equation defining the marginal probability only depends on the slow reactions. To solve this differential equation, we need to reformulate the unknown conditional means The details of this transformation can be found in Appendix A.2.
In the following proposition, we will obtain the time evolution equation for the conditional means E t [C m | d], m ∈ C.
where δ jm is the Kronecker delta function.
Proof. The proof of the proposition can be found in Appendix A.3.
In the following proposition, we will obtain p t (d) Proposition 5.4.
Proof. The proof of the theorem can be found in Appendix Section A.4.
Up to this section, we have obtained the time derivatives of the marginal probabilities as well as those of the conditional means and the centered conditional moments. These three equations will give us the following differential equation system.
where δ jm is kronecker delta function. Also, Fē i is a one step operator as follows: In the following section, we will explain the details of the maximum entropy method which will be used to construct the conditional probability distribution p t (c | d).

Maximum Entropy
Assume that we want to obtain the solution of the HME under consideration at a specific time point τ > 0. Solving the ODE system in (5.22) , m ∈ C, M = (M 1 , M 2 , . . . , M R−L ) T ∈ N R−L values for the system of interest. Although the marginal probabilities, p τ (d), can directly be obtained from the ODE system, we still do not know the corresponding conditional probability density function, p τ (c | d), which will be used to construct the joint probability, p τ (d, c), solving the corresponding HME.
To estimate the unknown conditional probability density functions using its moments, we will use the maximum entropy approach proposed by Shannon [34]. Assume that we have a state space Ω = D × R R−L ≥0 and our goal is to estimate the unknown probability density function p τ : Ω → R ≥0 . Let denote the moments of the joint probability density function at time point τ . It must be noted that when M = e m , we To guarantee that p τ (c | d) is a probability function, we must impose the condition S 0 τ = 1. Then, the approximation for the conditional probability density function p τ (c | d) will be obtained solving the following constrained convex optimization problem Let N be the number of moment constraints and M k , k = 0, 1, . . . , N , denote different choices of vectors To impose the conditions given above, we will have M 0 = 0, M j = e j , j = 1, 2, . . . , R − L. Then, the solution of this constrained optimization problem can be obtained maximizing the following Lagrange function where λ k ∈ R, k = 1, 2, . . . , N are referred to as Lagrange multipliers. Taking the derivative of L(p τ (c | d), λ(τ )) with respect to p τ (c | d) will give the approximate solution of the conditional probability density for p τ (c | d) in the following form where Z(N, λ(τ )) is a normalization constant [1,4,5]. Now, we can obtain the approximate solution of the joint probability density function which solves the HME under consideration by multiplying the obtained conditional probability function p * τ (c | d) with the marginal probability function p τ (d).

Application
In this section of the present study, we will implement our proposed method to the following reaction system The state vector of the system at time t ≥ 0 is defined by The joint probability density function, p t (d, c), satisfies the following CME We separate reactions and stoichiometric vectors as follows: Propensity functions of the reactions are assumed to be where β 1 (d) = y 1 (0) − 2d, γ 1 (c) = c, β 2 (d) = y 2 (0) + 2d, γ 2 (c) = −c.
Then, the HME for the joint probability density function, p t (d, c), is defined as given below: The system of differential equation defining the marginal probabilities, the conditional means and the centered conditional moments has the following form Based on our previous discussions, we get the following system of differential equation which will be referred to as the moment equation system of the HME in the rest of the study give a system of differential equations that is expressed only in terms of the marginal probabilities, the conditional means and the second centered moments. In our application, we close moment equations setting the third and the higher moments to zero. If p t (d) = 0 , then we will not be able to obtain To avoid this drawback, in [26], the authors proposed a successful initialization procedure. Based on the fact that propensity functions must be non negative, we define the state space of the system as follows: To obtain each conditional probability density function by solving the corresponding convex optimization problem on the state space of interest, we use the CVX toolbox of the MATLAB [24]. When the size of Ω is very high, the dimensionality of the optimization problem increases. Therefore, the CVX cannot produce accurate results.
To keep the dimension of the optimization problems small for the CVX, we construct state space iteratively using a similar strategy to the sliding window method [37]. In summary, our strategy is to solve the moment equation system of the HME using an appropriate discretization method. At each discretization step, we check the marginal probabilities. If they are higher than a given threshold, then we extend the state space of the variable d. This procedure continues until the time point of the interest is reached. Finally, depending on the state space of the variable d, we construct a state space for the variable c. Now, we can explain the details of the method.
In the first step of this construction, we define a feasible subset Ω 0 of Ω, Ω 0 = D 0 × C 0 , in which the dimension of the optimization problem is acceptable for the CVX. To avoid the problem of having p 0 (d) = 0, we choose an initial Poisson distribution, p 0 (d, c), in the state space Ω 0 and compute the corresponding which will be considered as the initial conditions for the moment equation of the system of the HME.
Assume that we want to obtain the conditional counting probability density at time point τ > 0. Then, we approximate the solution of the moment equation system of the HME on [0, τ ] time interval using a numerical method. We choose a discretization time step ∆ and define t j = j∆, j = 0, 1, . . . , J such that t 0 = 0, t J = τ . As a result, we obtain subintervals [t j , t j+1 ], j = 0, 1, . . . , J − 1.
represent the approximate solution of the ODE system given in Equation (7.26) and D j represents the state space of d at time point t j . To construct D 1 , we will solve the moment equation system of the HME using initial conditions p 0 (d) Then, we will obtain p 1 (d), To extend D 0 , we define a threshold ε > 0 and check the marginal probability p 1 max ≡ p 1 (max(D 0 )). If p 1 max > ε, then we extend D 0 as follows: To approximate the solution of Equation (7.26) at time point t 2 , we need to initialize the system on D 1 . Although we know where | D 0 | denotes the cardinality of the subset D 0 . We employ this procedure successively until the desired time point τ is reached. Let D * denote the state space of d at time point τ . Here, we must choose ε > 0 such that, D * must also be in the feasible region of the CVX. Now, we can construct the feasible state space for c denoted by C * . Since we know initial domain Ω 0 = D 0 × C 0 , we only need to obtain (d, c) pairs for d ∈ D * \ D 0 . Then, for a given > 0, we construct the feasible region C * for variable c as follows: Here max(C 0 ) and min(C 0 ) denote the maximum and the minimum values of c of pairs (c, max(D 0 )) ∈ Ω 0 ,respectively. As a result, we have a feasible region Ω * = D * × C * for the CVX. Then, we can solve the corresponding convex optimization problems for each conditional counting probability density p τ (c | d), d ∈ D * using the CVX. Finally, we can compute the approximate solution of p τ (d, c). The resulting algorithm is presented in Algorithm 1. In our numerical simulation study, the state of the system is initialized y(0) = (50, 0) T and the reaction rate constants of R 1 , R 2 are given by κ 1 = 0.2s −1 , κ 2 = 0.4s −1 , respectively. We define  18 For each d ∈ D * obtain p τ (c | d)) using the CVX.
In Figures (1a) and (2a), one can see the state space Ω shown by the points with only green markers and Ω 0 shown by the points with black edged markers. The threshold for extending the region of the variable d is ε = 10 −6 , and = 2. We obtain joint counting probability density function at time points τ = 0.5 and τ = 1. We have used the Euler method with fixed time step ∆ = 10 −4 . Figures (1a) and (2a) also show the Ω * at time points τ = 0.5 and τ = 1, respectively. In both figures, the state space Ω * is the union of the points denoted by markers with black and red edges. Figures (1b) and (2b) show the joint counting probability satisfying the CME given in Equation (7.23) at time points τ = 0.5 and τ = 1, respectively. Figures (1c) and (2c) indicate the approximate solution of the p τ (d, c) satisfying Equation (7.24) obtained with Algorithm 1 at time points τ = 0.5 and τ = 1, respectively.

Conclusion
In this study, we present the hybrid master equation for jump-diffusion approximation, which models systems with multi-scale nature. The idea of jump diffusion approximation is to separate reactions into fast and slow groups based on an obtained error bound. Fast reactions are modeled using diffusion approximation, while Markov chain representation is employed for slow reactions. As a result, the state vector of the system is defined as the summation of the random time change model and the solution of the Langevin equation. In this study, based on the study of Pawula [32], we prove that joint probability density of this hybrid model over reaction counting process satisfies the hybrid master equation, which is the summation of the corresponding chemical master equation and the Fokker-Planck equation. It can be said that while [16] presents a state vector representation for reaction networks with multi-scale nature, the The joint counting probability density function satisfying the CME given in Equation (7.23) (c)The joint counting probability density function satisfying the HME given in Equation (7.24) current study complements it by obtaining evolution equation for the corresponding joint probability density over reaction counting process. To solve this equation, we use the same strategy with [26]. We write the joint probability density function as the product of the conditional counting probability density of the fast reactions conditioned on the counting process of the slow reactions and the marginal probability of the counting process of the slow reactions.
To construct the conditional probability density functions at a specific time point, we used the maximum entropy approach. We use the CVX toolbox of the MATLAB to solve the constrained optimization problems. Based on restrictions of the CVX on the dimensionality of the optimization problems, we present a method which constructs feasible regions for the CVX. We apply the method to a gene model.

A Appendix
A.1 Proof of Lemma 5.1 Proof. Using Leibniz integral rule and the boundary conditions gives Inserting Equation (4.17) into the first integral yields Since F t (c) is a polynomial function and sufficiently many moments of p t (d, c) with respect to c exist, we can manipulate the integral as follows: Here, we want to draw the attention of the reader to the following mean which is used in our equations Using this equality and the properties of the FPE will give us the following equation [9,31]: where ∂ j = ∂ ∂c j , = k 1 , . . . , k R−L . In general cellular reactions are unimolecular or bimolecular. Therefore, the third and the higher order derivatives will be zero, meaning that the conditional mean of the function γ rsi−n s (C) satisfies [26] (1.28) Here, we use the fact that As a result, if we have a reaction with linear propensity, the second term in Equation (1.28) will also be zero and we will get E t [γ rsi−n

A.3 Proof of Proposition 5.3
Proof. We will use the product rule for derivatives as follows: (1.29) By setting F t (c) = c m in Lemma 5.1, we can obtain the first derivative on the right hand-side of Equation (1.29) as follows: By using equalities ∂ ∂t C m = 0, ∂ 2 ∂c 2 j C m = 0, we get where δ jm is the Kronecker delta function.
In Equation (  Proof. Similar to our previous proofs, again we will use the product rule for derivatives as follows: The first term in the right hand-side of the equation above can be obtained from Lemma 5.1 choosing F (c) = c M . Then, we obtain Since, we have We get