General solution of the chemical master equation and modality of marginal distributions for hierarchic first-order reaction networks

Multimodality is a phenomenon which complicates the analysis of statistical data based exclusively on mean and variance. Here, we present criteria for multimodality in hierarchic first-order reaction networks, consisting of catalytic and splitting reactions. Those networks are characterized by independent and dependent subnetworks. First, we prove the general solvability of the Chemical Master Equation (CME) for this type of reaction network and thereby extend the class of solvable CMEs. Our general solution is analytical in the sense that it allows for a detailed analysis of its statistical properties. Given Poisson/deterministic initial conditions, we then prove the independent species to be Poisson/binomially distributed, while the dependent species exhibit generalized Poisson/Khatri Type B distributions. Generalized Poisson/Khatri Type B distributions are multimodal for an appropriate choice of parameters. We illustrate our criteria for multimodality by several basic models, as well as the well-known two-stage transcription–translation network and Bateman’s model from nuclear physics. For both examples, multimodality was previously not reported.


Introduction
The development of single-molecule methods such as Fluorescence in situ Hybridization (FISH) has resulted in considerable advances in fundamental research in biology (Trcek et al. 2011, 2012). Those methods provide researchers with discrete, single-molecule data on (bio-)chemical reaction networks. The measured molecule numbers are often small, requiring a modeling approach that accounts for the discreteness of the molecule numbers (Klipp et al. 2009).
Networks of stochastically reacting molecule species are often described by the Chemical Master Equation (CME) (Gardiner 2009; Kampen 2011). The CME is a difference-differential equation. By using Gillespie's Stochastic Simulation Algorithm (Gillespie 1977), an exact realization of the CME's underlying Markov jump process is sampled and an indirect solution is obtained (Gillespie 1992). The CME may also be studied by approximation methods such as finite state projection (Munsky and Khammash 2006), or exactly using analytical approaches. Stationary solutions to the CME are known for a wide range of chemical reaction networks (Kampen 1976; Anderson et al. 2010). However, time-dependent analytical solutions only exist for monomolecular reaction networks (Gans 1960; Jahnke and Huisinga 2007), consisting exclusively of conversion S_i ↔ S_j, degradation S_i → ∅ and production ∅ → S_j reactions, where S_i defines a species of molecules. In contrast to numerical solution methods, analytical solutions allow for an abstract study of reaction networks, independently of the concrete value of reaction rate constants and network graphs.
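The simulation route mentioned above can be sketched in a few lines. The following is a minimal direct-method SSA for the toy degradation network S → ∅; function name and parameter values are our own illustration, not taken from the paper.

```python
import math
import random

def gillespie_degradation(x0, k_deg, t_end, seed=0):
    """Direct-method SSA (Gillespie 1977) for the single reaction S -> 0.

    The only propensity is a(x) = k_deg * x; waiting times are exponential
    with rate a(x), and each firing removes one molecule.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while x > 0:
        a = k_deg * x                            # total propensity
        t += -math.log(1.0 - rng.random()) / a   # exponential waiting time
        if t > t_end:
            break
        x -= 1                                   # fire the degradation reaction
        trajectory.append((t, x))
    return trajectory

traj = gillespie_degradation(x0=50, k_deg=1.0, t_end=5.0)
```

Averaging many such trajectories recovers the CME solution only statistically, which is the sense in which the SSA provides an "indirect" solution.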
Here, we present an extension of the class of analytically solvable CME's to a subset of first-order reactions, namely hierarchic first-order networks. In addition to monomolecular reactions, first-order reactions include (auto-)catalytic and splitting reactions, and therefore allow for more real-world applications. As a subset of those networks, we define hierarchic networks (Def. 3), characterized by a division into independent and dependent subnetworks. Common examples of hierarchic networks are the transcription-translation model in molecular biology (Friedman et al. 2006;Shahrezaei and Swain 2008), or nuclear decay chains in physics (Bateman 1910;Pressyanov 2002), both of which we discuss here.
In contrast to monomolecular networks, first-order networks result in marginal distributions of individual species that are not unimodal in general, as we show in Theorems 3 and 4. These distributions are poorly characterized by mean and variance, two essential measures used to describe statistical data in natural sciences. On one hand, characterizing multimodal distributions by mean and variance is unintuitive, since the mean is often a very unlikely outcome. On the other hand, many unimodal distributions, such as the Gaussian distribution, are fully specified by the first two cumulants, i.e. mean and variance, or just the mean in case of the Poisson distribution. Contrastingly, an infinite number of cumulants can be necessary to fully specify the multimodal distributions derived in this article. We therefore argue that CME approximation methods based on the first two cumulants or moments, such as the moment-closure method (Milner et al. 2012), yield insufficient results when reaction networks lead to multimodal molecule distributions. Albeit to a lesser degree, this holds true even when more than the first two moments are used to approximate the CME's solution in approaches such as Constantino et al. (2016). Such distributions have already been reported for networks that include two-state or three-state random variables such as promoter switches studied in the context of gene regulation (Shahrezaei and Swain 2008; Thomas et al. 2014). In the same context, a maximum-entropy approach has been used to approximate multimodal molecule distributions successfully using the first seven moments (Andreychenko et al. 2017).
In the present paper, we develop an exact theoretical description for hierarchic first-order networks, based on probability generating functions. Hierarchic first-order networks are introduced in Sect. 2, followed by the main results in Sect. 3. In Sect. 3.1, we establish the existence of analytical solutions (Theorem 1) and derive the joint probability generating functions (Proposition 2), which are complete characterizations of the underlying statistics. Then, the analytical form of the marginal distributions of individual species is derived in Sect. 3.2. We show that the independent part of the network exhibits Poissonian and Binomial marginal distributions, while the dependent part is described by Discrete Compound Poisson (DCP) and Khatri Type B (KTB) marginal distributions. Next, we present criteria for the marginal distributions to be multimodal in Sect. 3.3. Since Poissonian/Binomial distributions are generally unimodal, the independent part exhibits unimodal marginal distributions (Theorem 2). In contrast, DCP/KTB marginal distributions from the dependent part can lead to multimodality under quite general conditions (Theorems 3 and 4). We illustrate these general results by several basic models (Sect. 4.1) and two real-world models (Sect. 4.2). Among these are the transcription–translation model (Example 5) and Bateman's model for nuclear decay chains (Example 6). For the former, multimodality was previously only discussed in an extended three-stage model variant (Shahrezaei and Swain 2008), whereas Theorem 3 proves the protein numbers to be multimodally distributed even in the simple two-species version. To the best of our knowledge, our exact solution to the CME for Bateman's model of nuclear decay chains is a novel result, applying the techniques developed in the preceding sections. We apply Theorem 4 to show multimodality for Bateman's model.
The derivation of probability mass functions from generating functions is reviewed in Appendix A.1.

Chemical master equation and characteristic ODEs
First of all, we define a network of n species with m chemical reactions by the n × m stoichiometric matrices Q = (Q_ij) and R = (R_ij), where Q_ij, R_ij ∈ N_0. These matrices enable us to write the reaction network compactly as

Q^T S → R^T S,

with S = (S_1, . . . , S_n)^T being the vector of species and k = (k_1, . . . , k_m)^T the vector of rate constants.
In the present paper, we consider discrete and stochastic models of biochemical reaction networks. Let P(x, t|x_0, t_0) be the probability distribution for the number of molecules x of each species at time t, given the initial condition x_0 at t_0. This distribution is described by the Chemical Master Equation, a difference-differential equation defined by

∂P(x, t|x_0, t_0)/∂t = Σ_{i=1}^m [a_i(x − ν_i) P(x − ν_i, t|x_0, t_0) − a_i(x) P(x, t|x_0, t_0)],   (1)

where ν_i := col_i(R − Q) is the net change of molecule numbers caused by reaction i and a_i(x) denotes its propensity. In the last equation, col_i denotes the ith column of a matrix. The CME (1) is based on the Markovian assumption and is derived in detail in Gardiner (2009). We characterize the distribution by its probability generating function (PGF)

g(s, t) := Σ_x P(x, t|x_0, t_0) s^x,

defined for s ∈ C^n within some radius of convergence ≤ 1 in each coordinate respectively.
In the univariate case, a large list of correspondences between PGFs and distributions is provided by Johnson et al. (2005). Next, we represent the Master Eq. (1) as a partial differential equation (PDE) in terms of the generating function: (1) can equivalently be expressed as

∂g(s, t)/∂t = Σ_{i=1}^m k_i (Π_{l=1}^n s_l^{R_li} − Π_{l=1}^n s_l^{Q_li}) Π_{l=1}^n (∂/∂s_l)^{Q_li} g(s, t).   (4)

This result can be found in textbooks such as Gardiner (2009), Sect. 11.6.4, p. 297. We provide a proof in "Appendix A.2" for the convenience of the reader. We define first-order networks as those obeying a first-order PDE for the generating function (Σ_{j=1}^n Q_ji ≤ 1), or equivalently a linear ordinary differential equation (ODE) system for the moments. In terms of chemical reactions, first-order networks consist exclusively of reactions of the type S_j → Σ_l R_li S_l, i.e. monomolecular plus splitting and (auto-)catalytic reactions. First-order PDEs are solvable by the method of characteristics, a method to convert the PDE solution problem into ODEs. The basic idea is that the solution of the PDE is given by a set of integral curves s(τ), called characteristic curves, parametrized by some τ ∈ R. We obtain the ODE system describing these curves s(τ) by interpreting (4) as a total derivative, i.e.

dg(s(τ), t(τ))/dτ = (∂g/∂t) dt/dτ + Σ_{i=1}^n (∂g/∂s_i) ds_i/dτ.   (5)
Then, the characteristic system of ODEs, obtained by comparing the coefficients of (5) and (4), reads

dt(τ)/dτ = 1,
ds_j(τ)/dτ = −Σ_{i=1}^m k_i Q_ji (Π_{l=1}^n s_l^{R_li} − s_j),   (6)
dg(s(τ), t(τ))/dτ = Σ_{i: col_i(Q) = 0} k_i (Π_{l=1}^n s_l^{R_li} − 1) g(s(τ), t(τ)).

In case we not only require Σ_{j=1}^n Q_ji ≤ 1, but also Σ_{j=1}^n R_ji ≤ 1, the equations for ds_i(τ)/dτ and dg(s(τ), t(τ))/dτ are linear and we are restricted to monomolecular reactions. In consequence, the characteristic ODE system is solvable and we obtain a general solution of the CME, as shown in Gans (1960). For the complete definition of the ODE system, we specify the initial conditions

s(0) = s_0,  t(0) = 0,  g(s_0, 0) = d(s_0).   (7)

The generating function d(s_0) specifies the initial distribution. Here, we study deterministic initial conditions d(s_0) = s_0^{x_0} and product Poissonian initial conditions

d(s_0) = exp(x_0 · (s_0 − 1)),   (8)

where x_0 is the expected number of molecules at t = 0.
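For the simplest monomolecular example, degradation S → ∅ with rate k, the characteristics can be integrated in closed form: with Poissonian initial conditions the PGF stays Poissonian with mean x_0 e^{−kt}. A stdlib-only sketch of this solution; the finite-difference extraction of probabilities from the PGF is our own illustrative helper, not a method from the paper.

```python
import math

def pgf_degradation(s, t, x0, k):
    """PGF of pure degradation S -> 0 with Poisson(x0) initial condition.

    The characteristic curves give s(t') = 1 + (s - 1) e^{-k (t - t')},
    hence g(s, t) = exp(x0 e^{-k t} (s - 1)): a Poisson distribution whose
    mean x0 e^{-k t} follows the deterministic rate equation.
    """
    return math.exp(x0 * math.exp(-k * t) * (s - 1))

def pmf_from_pgf(g, n, h=1e-3):
    """Extract P(n) = g^{(n)}(0) / n! via an n-th forward difference at 0."""
    deriv = sum((-1) ** (n - j) * math.comb(n, j) * g(j * h)
                for j in range(n + 1)) / h ** n
    return deriv / math.factorial(n)

x0, k, t = 3.0, 0.5, 1.0
lam = x0 * math.exp(-k * t)              # Poisson mean predicted by the rate equation
g = lambda s: pgf_degradation(s, t, x0, k)
p2 = pmf_from_pgf(g, 2)                  # should match the Poisson pmf at n = 2
poisson2 = math.exp(-lam) * lam ** 2 / 2
```

The agreement of `p2` with `poisson2` illustrates how the time-dependent distribution is read off directly from the characteristic solution.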

Hierarchically linear ODE systems and reaction networks
Compared to monomolecular reaction networks, a more general subset of first-order networks are hierarchically linear networks, which we characterize in this article.
Definition 2 (Hierarchically linear ODE system) Let x_i(t), f_i(·, t) and h_i(·, t) be continuously differentiable, vector- and matrix-valued functions respectively. An ODE system exhibiting the structure

dx_i(t)/dt = h_i(x_{i+1}(t), . . . , x_n(t), t) x_i(t) + f_i(x_{i+1}(t), . . . , x_n(t), t),  i = 1, . . . , n,

i.e. linear in x_i at each level, with coefficients depending only on the levels below, is called a hierarchically linear ODE system.
The next proposition gives a reason for our interest in hierarchically linear ODE systems: Proposition 1 All hierarchically linear ODE systems are solvable by means of matrix exponentials.
Proof We inspect a system of two levels only, since an extension to the n-dimensional case follows by mathematical induction. The lower level x_2(t) is autonomous and solved by a matrix exponential. The upper level for n = 2 then has a solution in terms of simple matrix exponentials if h_1(x_2(t_1)) and h_1(x_2(t_2)) commute for all pairs of times, i.e.

[h_1(x_2(t_1)), h_1(x_2(t_2))] = 0.

In this case, the solution can be written as a matrix exponential of the time-integrated coefficient, together with an inhomogeneous part driven by f_1. If the matrix-valued function h_1 does not commute for two different instants of time, i.e. [h_1(x_2(t_1)), h_1(x_2(t_2))] ≠ 0, the matrix exponent is expressed in terms of a Magnus series (see Magnus 1954). Even in that case, it is still possible to write down an analytical expression in terms of matrix exponentials. (The temporal dependence of the functions is suppressed as usual.)
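In the scalar case the commutator condition holds trivially and the "matrix exponential" collapses to an ordinary exponential of the time-integrated coefficient, which makes Proposition 1 easy to check numerically. A sketch with illustrative rates a, b, c (our own toy system, not from the paper):

```python
import math

def closed_form_x1(t, x1_0, x2_0, a, b, c):
    """Closed-form solution of the scalar two-level hierarchic system
        dx2/dt = -a * x2              (autonomous lower level)
        dx1/dt = (-b + c * x2) * x1   (upper level, coefficient driven by x2)
    Here the exponent is simply the time integral of the coefficient.
    """
    integral = -b * t + c * x2_0 * (1.0 - math.exp(-a * t)) / a
    return x1_0 * math.exp(integral)

def euler_x1(t, x1_0, x2_0, a, b, c, steps=200_000):
    """Forward-Euler cross-check of the closed form at small step size."""
    dt = t / steps
    x1, x2 = x1_0, x2_0
    for _ in range(steps):
        x1 += dt * (-b + c * x2) * x1
        x2 += dt * (-a) * x2
    return x1

t = 2.0
exact = closed_form_x1(t, 1.0, 3.0, a=1.0, b=0.5, c=0.4)
approx = euler_x1(t, 1.0, 3.0, a=1.0, b=0.5, c=0.4)
```

For matrix-valued h_1 the same construction applies whenever the commutator vanishes; otherwise the exponent must be replaced by a Magnus series.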
A hierarchically linear ODE system translates to the study of chemical reaction networks modeled by the CME as follows: Definition 3 (Hierarchic first-order reaction network) A first-order reaction network, described by a hierarchically linear characteristic ODE system is called hierarchic first-order reaction network.

Two-level hierarchic networks
In contrast to the general form of the preceding definition, we focus on two-level hierarchic networks, composed of two subsystems. That is, we have n_ind independent species S_ind = (S_1, . . . , S_{n_ind}, 0, . . . , 0)^T and n_dep dependent species S_dep = (0, . . . , 0, S_{n_ind+1}, . . . , S_n)^T, where n_ind + n_dep = n. We express a two-level hierarchic network in terms of chemical reactions as

Q^T S_ind → R^T S,   (12)
Q^T S_dep → R^T S_dep.   (13)

The first-order reactions' products are split into two parts: S_ind appears only once per reaction (Σ_{j=1}^{n_ind} R_ji ≤ 1 for i ∈ {1, . . . , m}), while an arbitrary number of molecules S_dep is allowed. The monomolecular reactions (13) are defined by Σ_{j=n_ind+1}^n R_ji ≤ 1 for i ∈ {1, . . . , m}.
Note that by ignoring S_dep, we might study Q^T S_ind → R^T S_ind independently of the monomolecular reactions (13), and obtain another monomolecular network, since Σ_{j=1}^{n_ind} R_ji ≤ 1 for i ∈ {1, . . . , m}. We therefore refer to (12) as the independent part and (13) as the dependent part of the hierarchic system, since the latter cannot be modeled independently of system I. Figure 1 depicts an example of such a hierarchic first-order two-level system.

Fig. 1 Scheme depicting a hierarchic first-order network. In system I, all non-monomolecular reactions take place, such as splitting or catalytic reactions, while in system II only monomolecular reactions take place.
Mathematically, (12) and (13) translate to a hierarchically linear characteristic ODE system. That is, the general form (6) is constrained to an equation system (14) for the independent variables s_ind and an autonomous equation system (15) for the dependent variables s_dep. Note that the algebraic hierarchy is inverse to the reaction network hierarchy. The dependent part of the reaction network (system II) is described by an autonomous ODE system, while the independent part (system I) is non-autonomous due to the term Π_{l=n_ind+1}^n s_l^{R_li}. The monomolecular system II can be expressed more compactly in terms of the conversion rates α_jk, where j, k ∈ {n_ind + 1, . . . , n} and A := (α_ij) is the matrix holding the conversion rates. Using that matrix, we rewrite (15) as

ds_dep(τ)/dτ = −A^T (s_dep(τ) − 1).   (16)

The inversion of the hierarchic and algebraic structure, mentioned in the previous paragraph, also becomes clear from the traditional rate equations for the concentration vector C_dep of system II,

dC_dep/dt = A C_dep + influx from system I.   (17)
It was shown by Jahnke and Huisinga (2007), using a statistical argument, that the solution to the monomolecular CME is a product Poisson distribution, given Poissonian initial conditions, and a multinomial distribution, given deterministic initial conditions. Thereby, the parameters of the distributions are given by the traditional rate equations (17). This parametrization is also evident from the characteristic ODEs (16), because these can be expressed using the transposed Jacobian matrix of (17). However, it also becomes clear that the exact solution to the CME for a general first-order network is not fully parametrized by the traditional rate equations: the characteristic ODE system (14) is structurally different from the traditional rate equations for system I due to the time-dependent coefficients Π_{l=n_ind+1}^n s_l^{R_li}. In contrast to the characteristic Eq. (14), the traditional rate equation system is generally autonomous if the rate constants k_i are time-independent, as we assume.
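The parametrization by the rate equations can be made concrete for a conversion chain S1 → S2 → S3: given product Poisson initial conditions, the vector of Poisson means simply propagates with the matrix exponential of the rate-equation generator. A stdlib-only sketch; the truncated-series expm is our own helper, adequate for small matrices, and the rates are illustrative:

```python
import math

def expm(M, terms=40):
    """Truncated Taylor series e^M for a small square matrix (list of lists)."""
    n = len(M)
    out = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in out]
    for k in range(1, terms):
        term = [[sum(term[i][l] * M[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

# Rate-equation generator of dC/dt = A C for the chain S1 -> S2 -> S3
k1, k2, t = 1.0, 0.5, 2.0
A = [[-k1, 0.0, 0.0],
     [ k1, -k2, 0.0],
     [0.0,  k2, 0.0]]
Phi = expm([[A[i][j] * t for j in range(3)] for i in range(3)])
lam0 = [5.0, 0.0, 0.0]   # initial Poisson means
lam_t = [sum(Phi[i][j] * lam0[j] for j in range(3)) for i in range(3)]
```

Since the columns of A sum to zero, the total mean Σ_i λ_i(t) is conserved, and λ_1(t) = 5 e^{−k1 t}, exactly as the deterministic rate equations predict.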
In order to highlight the relationship between traditional rate equations and characteristic ODEs for system II, we use the convention (18) instead of (12) and (13) to fully express the two-level hierarchic reaction network. Note that we summarized influx reactions to both systems I and II in (18), which yields modified characteristic equations in place of (14) and (15).

Solvability of hierarchic first-order networks
First, we show that the CME is solvable for any hierarchic first-order reaction network.
Theorem 1 (Existence of analytical solutions for hierarchic first-order networks) The Master equation corresponding to a hierarchic first-order reaction network is analytically solvable.
Proof Applying Proposition 1, we find that the characteristic system (6) is solvable, with a solution of the linear form s = e^C s_0 + v, where C is a square matrix and v a column vector. For the initial condition (7), we need s_0 in terms of s. Since a matrix exponential e^C is always invertible, the relation s = e^C s_0 + v can be solved for s_0 = e^{−C}(s − v). This in turn yields the generating function and the distribution, defined as the coefficients of the former.

Remark 1
The term "analytical solution", which is often used in a more intuitive sense, requires some explanation. One may argue that the Master equation is a system of linear ODEs and can always be formally solved by a matrix exponential as such. This remains true even for bimolecular networks such as the Lotka-Volterra model (Lotka 1925). However, a formal solution of the Master equation does not tell us anything about the structure of the distribution, if the matrix exponential cannot be calculated explicitly. Contrary to this non-analytical case, the generating function stated in the next proposition earns the word "analytical", even if it also contains a matrix exponential. As shown in Sect. 3.2, the matrix exponential from the next proposition enables us to analyze properties of the distribution and may be used to find the stationary distribution, to compute moments and cumulants, etc.
Proposition 2 Given that the molecules are initially product Poisson distributed, i.e. g(s, 0) = exp(x_0 · (s − 1)), the generating function is given by Eq. (19). In the last expression, the Jacobian J is defined by (20). The corresponding expression for deterministic initial conditions, d(s_0) = s_0^{x_0}, is given by Eq. (23).

(Footnote 7: Without this requirement, we obtain an additional factor exp(∫_0^t b · (s_ind(t′) − 1) dt′), where s_ind(t′) is defined in (55). For more compact expressions, this factor was left out without loss of generality.)
We postpone the proof to "Appendix A.3" and interpret this result first. Remember that the deterministic or product Poisson distributions, assumed as initial distributions in the preceding proposition, imply that all random variables are uncorrelated for a fixed time t. This can be shown by using the cumulant generating function κ(ξ) = log(g(e^ξ)), which for a product Poisson distribution is given by

κ(ξ) = x_0 · (e^ξ − 1).

Here, "uncorrelated" means that the covariances and higher-order mixed cumulants, defined as the coefficients of κ(ξ), are zero. For monomolecular reaction networks, the variables remain uncorrelated after some time t > 0, since a monomolecular system given Poissonian initial conditions stays Poissonian, as shown in Jahnke and Huisinga (2007), Proposition 2. Contrastingly, for first-order processes, covariances between the variables appear for t > 0, because the exponent of Eq. (19) is in general not a first-order polynomial. In other words, monomolecular systems stay uncorrelated for uncorrelated initial conditions, while first-order systems do not.
We understand the correlations in terms of chemical reactions by the difference between the first-order splitting reaction S j → S k + S l and the two monomolecular reactions S j → S k and S j → S l : The molecules S k and S l appear simultaneously in the splitting reaction, while the two monomolecular conversions are statistically independent events.
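This difference is visible directly in simulated covariances. The following Monte-Carlo sketch (illustrative parameters, not a computation from the paper) contrasts the splitting reaction with the pair of conversions:

```python
import random

rng = random.Random(1)

def covariance_after_reactions(n_j, p_react, splitting, trials=20000):
    """Monte-Carlo estimate of Cov(X_k, X_l) after each of n_j molecules
    of S_j has reacted with probability p_react.

    splitting=True : S_j -> S_k + S_l   (both products appear together)
    splitting=False: S_j -> S_k or S_j -> S_l, chosen 50/50 (monomolecular)
    """
    samples = []
    for _ in range(trials):
        xk = xl = 0
        for _ in range(n_j):
            if rng.random() < p_react:
                if splitting:
                    xk += 1
                    xl += 1
                elif rng.random() < 0.5:
                    xk += 1
                else:
                    xl += 1
        samples.append((xk, xl))
    mk = sum(x for x, _ in samples) / trials
    ml = sum(y for _, y in samples) / trials
    return sum((x - mk) * (y - ml) for x, y in samples) / trials

cov_split = covariance_after_reactions(20, 0.5, splitting=True)
cov_mono = covariance_after_reactions(20, 0.5, splitting=False)
```

The splitting reaction yields a strictly positive covariance (the products appear in pairs), while the two competing monomolecular conversions yield the negative covariance of a multinomial split.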

Analytical form of marginal distributions
As we show in the following, the hierarchic network structure yields marginal distributions that are best interpreted as generalized distributions. We follow Johnson et al. (2005) for the next definition.
Definition 4 (Generalizing distribution) The distribution P_1 is generalized by the generalizing distribution P_2 if the resulting distribution has the generating function

g(s) = g_1(g_2(s)),

where g_1(s) and g_2(s) are the corresponding generating functions.
Generalized distributions are related to mixing distributions by Gurland's theorem (Gurland 1957), if certain requirements are met. Next, we define two classes of generalized distributions that arise as marginal distributions for hierarchic networks.
Definition 5 (Discrete compound Poisson distribution (DCP)) A discrete compound Poisson distribution P_DCP is a univariate distribution (Zhang et al. 2014), generalizing the Poisson distribution with another distribution P_2. The generating function is defined accordingly:

g_DCP(s) = exp(λ(g_2(s) − 1)),  λ > 0.

The next type of distribution was previously defined by Khatri and Patel (1961) and will be important for our study of reaction networks given deterministic initial conditions:

Definition 6 (Khatri's Type B distribution) Khatri's Type B distribution P_KTB is a univariate distribution, generalizing the deterministic distribution with another distribution P_2. The generating function is defined accordingly:

g_KTB(s) = (g_2(s))^ν,

where the coefficients α_i of g_2(s) = Σ_i α_i s^i are parameters and ν ∈ N_0 is the number of trials. Furthermore, g_2(s) must be a generating function, i.e. Σ_i α_i = 1. We introduce the notation DCP_N and KTB_N for the subclasses whose generalizing generating function g_2(s) is a polynomial of degree

N = deg g_2(s).   (24)

These distributions specify the marginal distributions of the dependent part (system II) of hierarchic first-order reaction networks. We will show in Sect. 3.3 that the DCP_N and KTB_N distributions are conditionally multimodal for N > 1.
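Definitions 4–6 suggest a direct sampling recipe: draw the number of "bursts" from P_1 and add that many independent draws from P_2. A sketch for a Poisson number of bursts with Poisson burst sizes, i.e. a DCP distribution; the sampler and parameter values are our own illustration:

```python
import math
import random

rng = random.Random(7)

def sample_poisson(lam):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_generalized(sample_p1, sample_p2):
    """Draw from the distribution with PGF g1(g2(s)) (Def. 4):
    P1 counts the bursts, each burst adds an iid P2-distributed size."""
    return sum(sample_p2() for _ in range(sample_p1()))

# Poisson(lam) bursts of Poisson(phi) size: a DCP distribution with
# generalizing distribution P2 = Pois(phi); values are illustrative
lam, phi = 1.0, 10.0
draws = [sample_generalized(lambda: sample_poisson(lam),
                            lambda: sample_poisson(phi))
         for _ in range(5000)]
mean = sum(draws) / len(draws)
```

By Wald's identity the mean of the generalized distribution is the product of the means, here λφ = 10, which the sample average reproduces.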

Proposition 3 (Marginal distributions given ∂_{s_X} λ_i(s_X) = 0)
Given Poissonian initial conditions, the marginal distribution of any species X from the dependent part of the network is DCP_N, where N < ∞. Given deterministic initial conditions, this distribution is KTB_N.
To prove this statement, we need an auxiliary result:

Lemma 2 Let H(x) be an n × n matrix whose entries depend polynomially on x ∈ C. If the eigenvalues of H(x) do not depend on x, i.e. ∂_x λ_i(x) = 0, then the entries of e^{H(x)} depend polynomially on x.
Proof By the Cayley–Hamilton theorem, H(x) fulfills its own characteristic polynomial Δ(λ), i.e. Δ(H(x)) = 0. Furthermore, any polynomial p(λ) might be expressed as p(λ) = q(λ)Δ(λ) + r(λ), where q is found by long division p/Δ with remainder r of degree ≤ n − 1. Since Δ(H(x)) = 0, we might express e^{H(x)} as

e^{H(x)} = Σ_{k=0}^{n−1} α_k(x) H(x)^k,  with  e^{λ_i(x)} = Σ_{k=0}^{n−1} α_k(x) λ_i(x)^k.   (26)

Here, the second relation follows again from long division by the characteristic polynomial of H(x). For ∂_x λ_i(x) = 0, the coefficients of the characteristic polynomial do not depend on x and thus ∂_x α_k(x) = 0. Therefore, the expression (26) introduces only a finite-order dependence on x.
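The mechanism of Lemma 2 can be made explicit for n = 2: Cayley–Hamilton reduces e^H to α_0 I + α_1 H, with the α_k fixed by the scalar relations e^{λ_i} = α_0 + α_1 λ_i. A sketch for distinct real eigenvalues (our own helper, not code from the paper):

```python
import math

def expm_via_cayley_hamilton(H):
    """e^H for a 2x2 matrix with distinct real eigenvalues, written as
    alpha_0 * I + alpha_1 * H; all higher powers of H reduce by
    Cayley-Hamilton."""
    a, b, c, d = H[0][0], H[0][1], H[1][0], H[1][1]
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)    # assumes distinct real eigenvalues
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2
    # solve alpha_0 + alpha_1 * l_i = exp(l_i) for i = 1, 2
    alpha1 = (math.exp(l1) - math.exp(l2)) / (l1 - l2)
    alpha0 = math.exp(l1) - alpha1 * l1
    return [[alpha0 + alpha1 * a, alpha1 * b],
            [alpha1 * c, alpha0 + alpha1 * d]]

E = expm_via_cayley_hamilton([[0.0, 1.0], [0.0, -1.0]])
```

If H depends polynomially on a parameter x while its eigenvalues do not, the α_k are constants, so e^{H(x)} inherits only the polynomial x-dependence of the entries of H(x), which is exactly the statement of the lemma.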
Using this Lemma, we prove Proposition 3:

Proof We obtain the marginal distribution of species X by setting all s_i = 1 except s_X in Eqs. (19) and (23) respectively. First of all, the integrals in Eq. (19) do not change the order of the polynomial in the exponent with respect to s_X. Because J(s_X) depends polynomially on s_X and its eigenvalues do not depend on s_X, we apply Lemma 2 to conclude that the exponent is a polynomial in s_X of the finite degree N given by (24). Therefore, we have a DCP_N distribution; for deterministic initial conditions, (23) yields a KTB_N distribution.

For simple reaction networks such as those from the next proposition, we show the order of the marginal distributions to be larger than one. We first investigate networks whose independent part is mass-conservative, i.e. f = 0:

Proposition 4 The marginal distribution of any dependent species X is DCP_N (KTB_N for deterministic initial conditions), with N > 1 for the following minimal reaction networks:

1. S_1^ind → S_2^ind + R X (Type I), where R > 1 and n_ind = 2.
2. S_1^ind → S_2^ind + R_1 X and S_2^ind → S_3^ind + R_2 X (Type II), where R_1 ≥ 1, R_2 ≥ 1 and n_ind = 3.

(We suppress the time-dependence of J in the following.)
Proof We expand the matrix powers in the exponential series (30) for e^{J(s_X)}, starting with J^1. For reaction Type I, it suffices to consider the first order J^1 of (30). Since there are no reactions within the dependent part, we have A = 0. In consequence of (31), we get deg_{s_X} max_{p,q} (e^{J(s_X)})_{pq} > 1 whenever R_i > 1, which translates to a splitting reaction of Type I; the eigenvalues remain independent of s_X, i.e. ∂_{s_X} λ_i(s_X) = 0.

For reaction Type II, n_ind = 3 and we need to calculate the second order J^2 in (30). Note that only the i = j terms are non-zero due to Σ_{l=1}^{n_ind} R_li ≤ 1. This implies that the products from the independent part must appear in different reactions. We arbitrarily chose p = 1, r = 2, q = 3 to obtain the result; again ∂_{s_X} λ_i(s_X) = 0.

Proposition 5 Let the independent part be open with ∂_{s_X} f ≠ 0. Then the marginal distribution of any species X from the dependent part is DCP_N (KTB_N for deterministic initial conditions), with N > 1 for the following minimal reaction network:

S_1^ind → R X,

where R > 1 and n_ind = 1.
Proof We express the reaction S_1^ind → R X by the inhomogeneity f_1(s_X) ∝ s_X^R and plug this expression into (19) to obtain log g(s_X, t) ∝ s_X^R. In consequence, we get a DCP_N distribution with N > 1 whenever R > 1. The analogous statement for deterministic initial conditions follows by plugging f_1(s_X) into (23).
The next proposition holds for the transcription-translation model, as examined in detail in Example 5.
Proposition 6 (Marginal distributions given ∂_{s_X} λ_i(s_X) ≠ 0) Let λ_i(s_X) be the ith eigenvalue of J(s_X, t′, t) and let ∂_{s_X} λ(s_X) ≠ 0. Given Poissonian initial conditions, the marginal distribution of any species X from the dependent part of the network is DCP_∞ (KTB_∞ for deterministic initial conditions).
Proof In case all eigenvalues are distinct, we solve the system (27) by inverting the matrix L(s_X), so that

α_k(s_X) = Σ_i (L^{−1}(s_X))_ki e^{λ_i(s_X)}.   (32)

Since (L^{−1})_ki is a rational function of the eigenvalues, the terms e^{λ_i(s_X)} do not cancel and the exponent depends non-polynomially on s_X (33). In case there is an eigenvalue λ_j of multiplicity μ, the matrix L is not invertible. We then derive (27) with respect to λ and obtain μ − 1 additional equations, which we solve together with (27) for α_k(s_X) by inverting a matrix. The entries of this matrix are rational functions of λ, so the terms e^{λ_i(s_X)} in (32) do not cancel, i.e. Equation (33) holds. In the last step, we plug Eq. (32) into (19) for Poissonian and into (23) for deterministic initial conditions to obtain the generating function for the marginal distribution of X. Since the exponent of (19) is of infinite degree in s_X, we obtain a DCP_∞ class distribution. For deterministic initial conditions, we obtain a KTB_∞ class distribution.

Modality of marginal distributions
In this section, we investigate under which conditions the generalized distributions from Proposition 2 have infinitely many modes. We need several definitions for this end: Definition 7 (Unimodality) A distribution p n is said to be unimodal with mode a ∈ N 0 if p n ≥ p n−1 , ∀ n ≤ a and p n+1 ≤ p n , ∀ n ≥ a.
If this property does not depend on the parameters of the distribution, the latter is said to be unconditionally unimodal. A unimodal distribution that results in a unimodal distribution upon convolution with another unimodal distribution is called strongly unimodal (Keilson and Gerber 1971). The classes DCP_1 and KTB_1 are unconditionally unimodal, since the only members of these sets are the Poissonian and Binomial distributions respectively. For the independent species (system I) of a hierarchic first-order reaction network, we obtain unimodal marginal distributions:

Theorem 2 (Unconditional unimodality of marginal distributions of independent species (system I)) Given deterministic or Poissonian initial conditions, the species from the independent part of the hierarchic network exhibit unconditionally unimodal marginal distributions at all times t > 0.
Proof Upon setting all s i = 1, except one s X from the independent part, we obtain a Poissonian distribution for Eq. (19). For deterministic initial conditions, Eq. (23) yields a convolution of binomial and Poisson distributions. Since the binomial and Poisson distribution are strongly unimodal, the resulting distribution is unimodal as well, as shown by Keilson and Gerber (1971).
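The closure of unimodality under this convolution is easy to verify numerically. A sketch with illustrative parameters; the mode counter is our own helper:

```python
import math

def poisson_pmf(lam, n_max):
    return [math.exp(-lam) * lam ** n / math.factorial(n) for n in range(n_max + 1)]

def binom_pmf(n, p):
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def n_modes(pmf, tol=1e-12):
    """Count local maxima via sign changes of the difference sequence."""
    d = [b - a for a, b in zip(pmf, pmf[1:])]
    modes, rising = 0, True
    for x in d:
        if rising and x < -tol:
            modes += 1
            rising = False
        elif not rising and x > tol:
            rising = True
    return modes + (1 if rising else 0)

mix = convolve(poisson_pmf(4.0, 60), binom_pmf(10, 0.3))
```

As the strong-unimodality result of Keilson and Gerber (1971) guarantees, the convolution of the Poisson and binomial pmfs still has exactly one mode.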
For the second part of a hierarchic first-order network, we need the following definition: Definition 8 (Conditional multimodality) A distribution that is not unconditionally unimodal is called conditionally multimodal.
As an example for the preceding definition, we have Neyman's Type A distribution (Neyman 1939), defined by the generating function exp(λ(e^{φ(s−1)} − 1)), where λ > 0 and φ > 0. Because of the double-exponential generating function, it belongs to the class of DCP_∞ distributions, which we prove to be conditionally multimodal in the next proposition.
Proposition 7 The classes DCP_N and KTB_N are conditionally multimodal for N ∈ {2, 3, ∞}.

Proof Since conditional multimodality is defined to be the contrary of unconditional unimodality, a counterexample is enough to prove conditional multimodality. Respective counterexamples are shown in Fig. 2. Figure 2a shows that Neyman's Type A distribution, defined by the DCP_∞ generating function g(s) = exp(λ(e^{φ(s−1)} − 1)), can be multimodal. (Note that the coefficients of e^{φ(s−1)} are defined as the parameters of DCP_∞, not φ.) The same is done for the Hermite distribution, defined by the generating function g(s) = e^{a_1(s−1)+a_2(s²−1)}. As a DCP_2 distribution, it is conditionally multimodal, as demonstrated in Fig. 2b. A counterexample for DCP_3 is given in Fig. 2c. Figure 2d–f shows counterexamples for the generalized binomial distributions KTB_2, KTB_3 and KTB_∞. Since we cannot provide an infinite number of counterexamples, one for each DCP_N or KTB_N class, where N > 1, we extend the result of the last proposition as a conjecture.
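The Hermite counterexample can be reproduced numerically from the standard compound-Poisson recursion n p_n = Σ_i i c_i p_{n−i}; the parameter values below are illustrative, not those of Fig. 2:

```python
import math

def dcp_pmf(coeffs, n_max):
    """pmf of a DCP distribution with log g(s) = sum_i c_i (s^i - 1),
    where coeffs = [c_1, c_2, ...]. Uses the compound-Poisson recursion
    n * p_n = sum_i i * c_i * p_{n-i}.
    """
    p = [math.exp(-sum(coeffs))]
    for n in range(1, n_max + 1):
        p.append(sum((i + 1) * c * p[n - i - 1]
                     for i, c in enumerate(coeffs) if i + 1 <= n) / n)
    return p

# Hermite distribution (DCP_2): log g(s) = a1 (s - 1) + a2 (s^2 - 1)
a1, a2 = 0.3, 2.0   # illustrative parameters, chosen to break unimodality
pmf = dcp_pmf([a1, a2], n_max=40)
```

With a_1 small and a_2 large, odd outcomes are suppressed (events mostly arrive in pairs), so the pmf dips at n = 1 between the peaks at n = 0 and n = 2 and is therefore not unimodal.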

Conjecture 1 The classes DCP N and KTB N are conditionally multimodal for N > 3.
The intuition behind this conjecture is that the DCP_N and KTB_N classes model events that can occur in bursts. In terms of Definition 4, the generalizing distribution P_2 models the (stochastic) number of events per burst, or burst size, whereas the generalized distribution P_1 models the (stochastic) number of bursts, yielding the total number of events as P ∼ P_1 ∘ P_2. In consequence, the distances between the modes seen in Fig. 2 correspond to typical burst sizes; in Fig. 2c, for example, the modes are clearly visible at multiples of 18.1. Thus, the bursting behavior is linked to conditional multimodality. In consequence, what we are conjecturing is that this link stays the same as we increase the number of possible bursts beyond 3.
In the next Theorems, we summarize the cases in which conditional multimodality appears for the species of the dependent system II.
Theorem 3 (Sufficient criterion for conditional multimodality, given deterministic or Poissonian initial conditions and ∂_{s_X} λ(s_X) ≠ 0) Let ∂_{s_X} λ(s_X) ≠ 0. Then the dependent species X in a hierarchic first-order network obeys a conditionally multimodal marginal distribution at all times t > 0.
Theorem 4 (Minimal reaction networks exposing conditional multimodality, given deterministic or Poissonian initial conditions and ∂_{s_X} λ(s_X) = 0) Let ∂_{s_X} λ(s_X) = 0, b = 0 and assume Conjecture 1 is true. Then, given Poissonian or deterministic initial conditions, the following minimal hierarchic first-order networks exhibit a conditionally multimodal marginal distribution at all times t > 0 for the dependent species X:

1. For mass-conservative independent networks, i.e. f = 0:
(a) S_1^ind → S_2^ind + R X (Type I), where R > 1 and n_ind = 2.

2. For open independent networks with Σ_{l=1}^{n_ind} R_li = 0 for all reactions, i.e. ∂_{s_X} J = 0 and ∂_{s_X} f ≠ 0, the minimal network is

S_1^ind → R X,

where R > 1 and n_ind = 1.
Proof By Proposition 4 we know that the distribution class is DCP_{N_Prop4} whenever f = 0, where the degree N_Prop4 > 1 for both types of minimal reaction networks. Proposition 5 applies if ∂_{s_X} f ≠ 0, and yields the minimal reaction network S_1^ind → R X, for which N_Prop5 > 1. These distribution classes are conditionally multimodal if N_Prop4 and N_Prop5 are larger than one, according to Proposition 7 and Conjecture 1.
In general, conditional multimodality arises if the degree of the exponent of (19) is larger than one upon setting all s_i = 1 except s_X.

Examples
In the following two sections, we illustrate how our results can be used to predict conditional multimodality by several examples. The calculations can be verified using the supplemented Wolfram Mathematica files.

Basic models
Example 1 (Catalysis) The model of the simple catalytic reaction X →(k_cat) X + Y is a trivial example of a hierarchic system, since system II is governed by a time-independent characteristic ODE. By Eqs. (6)–(7) we obtain the characteristic equations (34) for s_X and (35) for s_Y. The solution of (35) simply reads s_Y(t) = s_Y^0. We reorder the terms in (34) and solve the same equation.

Fig. 3 Marginal distributions for the catalysis product, p_y := P(y, t|y_0 = 0, 0), given the parameter k_cat = 20 s⁻¹, deterministic (y_0 = 0) and Poissonian initial conditions for x_0 = 2. The thick gray lines mark the expectation value, and the dashed lines the range of one standard deviation from the mean. The arrow marks the probability of a zero outcome.

By replacing
The marginal distribution of Y can be represented as a convolution of Neyman's type A distribution and a Poisson distribution: In case we set y 0 = 0 as a deterministic initial condition, the distribution is identical to Neyman's Type A distribution. In Fig. 3, we depict a numerical evaluation of this multimodal distribution. Conditional multimodality is expected by Theorem 3, since the Jacobian exhibits eigenvalues dependent on s Y . Note that the most likely outcome is zero, is far from the mean (solid gray line in Fig. 3), even outside the standard deviation (dashed gray line). This clearly demonstrates the misleading character of mean and standard deviation for multimodal distributions. The high probability for a zero outcome, depicted by the black arrow in Fig. 3, can be explained by the high likelihood of an "extinction scenario": If there are initially no molecules of species X , the number of molecules of type Y must always be zero. This phenomenon can be found in many hierarchic networks, as we show in the following figures. Using deterministic initial conditions, we obtain a contrasting picture, since the resulting marginal distributions are Poissonian: In consequence, no multimodality is observed. However, since P cat ∈ KTB ∞ , Proposition 7 predicts conditionally multimodality. This is not a contradiction because conditional multimodality is a notion defined by the order of the generalizing distribution g 2 (s) (see Def. 6), not by the model parameters (here k cat ). For a concrete Fig. 4 Plot of the joint distribution for the educt X and product Y for the same parameters and time points as Fig. 3 model, only subsets of the parameter space, defined by the set of coefficients of g 2 (s), are reachable by appropriate choices of the model parameters (here k cat ). In our model this subset does not contain to multimodal distributions. The joint distribution is plotted in Fig. 4. 
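For the deterministic case y_0 = 0 with Poissonian X, the marginal of Y is Neyman's Type A distribution with parameters λ = x_0 and μ = k_cat·t. The following is a minimal numerical sketch, independent of the supplemented Mathematica files; the parameter values λ = 2 and μ = 8 are purely illustrative. It evaluates the pmf as a Poisson mixture and exposes the "extinction" spike at zero together with the interior modes:

```python
import math

def neyman_type_a_pmf(y, lam, mu, n_trunc=60):
    """P(Y = y) for Neyman's Type A distribution, a Poisson(lam) mixture
    of Poisson(n*mu) components; here lam = x_0 and mu = k_cat*t."""
    total = 0.0
    for n in range(n_trunc):
        log_p_n = -lam + n * math.log(lam) - math.lgamma(n + 1)  # log P(N = n)
        if n == 0:
            p_y_given_n = 1.0 if y == 0 else 0.0                 # empty sum
        else:
            # Poisson(n*mu) pmf at y, computed in log space to avoid overflow
            p_y_given_n = math.exp(-n * mu + y * math.log(n * mu)
                                   - math.lgamma(y + 1))
        total += math.exp(log_p_n) * p_y_given_n
    return total

lam, mu = 2.0, 8.0                       # x_0 = 2 and k_cat*t = 8 (illustrative)
pmf = [neyman_type_a_pmf(y, lam, mu) for y in range(201)]
interior_modes = [y for y in range(1, 200)
                  if pmf[y] > pmf[y - 1] and pmf[y] > pmf[y + 1]]
# P(Y = 0) = exp(-lam*(1 - exp(-mu))) is the dominant "extinction" mode;
# further local modes appear near mu, 2*mu, ...
```

The zero spike dominates because with probability e^{-λ} there are no catalyst molecules at all, reproducing the extinction scenario discussed above.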
To obtain a multimodal distribution for deterministic initial conditions, we add a degradation reaction, as shown in the next example.
Example 2 (Catalysis with degradation) We see a different picture for deterministic initial conditions once we add a degradation mechanism. These reactions result in the characteristic system given below; deterministic initial conditions then result in the following generating function. Figure 5 shows the comparison between Poissonian and deterministic initial conditions. For Poissonian initial conditions, the scenario of starting without any molecules of species X is possible. This "extinction scenario" does not lead to the production of any Y molecules at any time and is visible as the large peak at y = 0 in Fig. 5a. For deterministic initial conditions, we set x_0 = 1, and Fig. 5b therefore does not exhibit a peak of the same size at y = 0. However, after some time, the species X may well become extinct, and therefore the mode at y = 0 still appears, albeit smaller than in Fig. 5a.
Example 3 (Simple splitting) The simple splitting reaction X → Y + Z, with rate constant k_1, obeys the characteristic system given below, which is readily solved. Poissonian initial conditions yield the generating function (40). Therefore, the marginal distributions are Poissonian and the joint distribution is a multivariate Poisson distribution, studied in detail by Kawamura (1979). The multivariate Poisson distribution differs from the product Poisson distribution in that the variables Y and Z correlate; nevertheless, the marginal distributions stay unimodal for both. Even without the explicit solution (40), we can prove unconditional unimodality by noting that either Y or Z may be assigned to the independent part of the network. In consequence, Theorem 2 can be applied to both Y and Z.
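The Poisson form of the marginals can be checked directly by a thinning argument: each of the initially Poisson-many X molecules has split by time t with probability p = 1 − e^{−k_1 t}, so Y (and likewise Z) is a Poisson mixture of binomials, which is again Poisson. A quick sketch with illustrative parameters:

```python
import math

def poisson(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

k1, t, x0_mean = 1.0, 0.7, 3.0
p = 1.0 - math.exp(-k1 * t)      # prob. that a given X molecule has split by t
# Y | (initial X = n) ~ Binomial(n, p); mixing over X ~ Poisson(x0_mean)
# gives the thinned distribution Y ~ Poisson(x0_mean * p):
for y in range(15):
    mixed = sum(poisson(n, x0_mean) * binom(y, n, p) for n in range(y, 120))
    assert abs(mixed - poisson(y, x0_mean * p)) < 1e-12
```

Note that with y_0 = z_0 = 0 each split produces one Y and one Z simultaneously, so Y and Z are perfectly coupled; this is exactly the correlation mentioned above, while the marginals stay unimodal Poisson.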
Example 4 (Additional conversion) Things get more interesting once we add a conversion reaction to the same network. The characteristic system and the Jacobian are given below. The characteristic polynomial depends on s_Z, which implies ∂_{s_Z} λ(s_Z) ≠ 0. Proposition 6 therefore applies and the resulting distributions are conditionally multimodal. Multiple modes, obtained by assuming Poisson-distributed initial molecule numbers, are visible in the marginal distribution (Fig. 6) as well as in the joint distribution (Fig. 7).
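Such a dependence of the eigenvalues on s_Z can also be verified numerically. The sketch below assumes, hypothetically (the displayed equations are not reproduced here), that the added conversion is Y → X, which closes a loop and makes the independent-part Jacobian depend on s_Z; the rate constants are illustrative:

```python
import numpy as np

def jacobian(k1, k2, s_z):
    # Independent-part Jacobian for X -> Y + Z (rate k1) plus an ASSUMED
    # conversion Y -> X (rate k2); the s_z entry stems from the splitting term.
    return np.array([[-k1, k1 * s_z],
                     [k2, -k2]])

# evaluate the eigenvalues at two values of s_z (they are real here, since
# the characteristic polynomial is lambda**2 + (k1+k2)*lambda + k1*k2*(1 - s_z))
ev_a = np.sort(np.linalg.eigvals(jacobian(1.0, 2.0, 1.0)).real)
ev_b = np.sort(np.linalg.eigvals(jacobian(1.0, 2.0, 0.5)).real)
assert not np.allclose(ev_a, ev_b)   # eigenvalues move with s_z
```

At s_z = 1 the matrix is a proper rate matrix with eigenvalues 0 and −(k1 + k2); moving s_z away from 1 shifts both eigenvalues, which is the dependence Proposition 6 requires.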
By adding a degradation reaction X → ∅ with rate constant k_3 to this model and setting k_2 ≫ k_1, we obtain a model that behaves like Example 2. In consequence, multimodal distributions also arise for deterministic initial conditions.

Real world models
Example 5 (Transcription-translation model) The two-stage model, which excludes any effects of promoter activity, consists of the reactions ∅ → X, X → X + Y, X → ∅ and Y → ∅, where X denotes mRNA molecules and Y proteins. We use the implementation of Gillespie's algorithm by Maarleveld et al. (2013) to simulate this model.
The appearance of multimodal distributions is due to the reaction X → X + Y, studied in Example 2. As mentioned there, the Jacobian's eigenvalues depend on s_Y, so conditional multimodality is expected. Figures 8 and 9 show that the protein distribution can be multimodal for deterministic and Poissonian initial conditions, respectively. Previously, the two-stage transcription-translation model had not been reported to be conditionally multimodal (Shahrezaei and Swain 2008). Given small degradation rates, the apparent difference between the shape of Neyman's Type A distribution (see Figs. 2c and 3), which solves the CME for X → X + Y, and the simulation results is rather small. Furthermore, the previously described "extinction scenario" is visible as the large peak at y = 0 in Figs. 8 and 9. This highlights the usefulness of basic models for the study of more complex models. The right panel in Fig. 8 shows that the probability of still having zero proteins after 15 s is much smaller than at 2.6 s. Therefore, the corresponding mode disappears. However, the distribution class is still conditionally multimodal, as predicted by Theorem 3. Remember that conditional multimodality is a notion defined by the order of the generalizing distribution g_2(s) (see Def. 6), not by the model parameters.
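The simulations in the paper use StochPy (Maarleveld et al. 2013); as a self-contained alternative, a direct-method SSA for the two-stage model fits in a few lines. The rate constants below are illustrative and not taken from the paper:

```python
import random

def ssa_two_stage(k_tx, k_tl, d_m, d_p, t_end, seed):
    """Direct-method Gillespie SSA for the two-stage model:
    0 -> X (k_tx), X -> X + Y (k_tl), X -> 0 (d_m), Y -> 0 (d_p)."""
    rng = random.Random(seed)
    x = y = 0                # mRNA and protein counts
    t = 0.0
    while True:
        props = [k_tx, k_tl * x, d_m * x, d_p * y]
        a0 = sum(props)                  # a0 >= k_tx > 0 always
        t += rng.expovariate(a0)
        if t > t_end:
            return x, y
        r = rng.random() * a0
        if r < props[0]:
            x += 1                       # transcription
        elif r < props[0] + props[1]:
            y += 1                       # translation
        elif r < props[0] + props[1] + props[2]:
            x -= 1                       # mRNA degradation
        else:
            y -= 1                       # protein degradation

# many runs reveal the "extinction" peak at y = 0 at early times
proteins = [ssa_two_stage(0.5, 5.0, 1.0, 0.1, t_end=2.6, seed=s)[1]
            for s in range(2000)]
```

A histogram of `proteins` shows the zero spike: runs in which no mRNA has been transcribed by t = 2.6 (probability e^{-0.5*2.6} ≈ 0.27 for these rates) cannot have produced any protein.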
Example 6 (Natural decay series including decay particles) In an extension of the classical model by Bateman (1910), we now formulate the reaction network for nuclear decay chains including the decay particles α, β and γ. The stoichiometry of the model is given below, where R_{i,α}, R_{i,β} and R_{i,γ} are the numbers of decay particles involved in each reaction.
The generating function PDE corresponding to the Master equation is Eq. (41). As before, we solve the PDE (41) using the method of characteristics. The characteristic ODEs corresponding to Eq. (41) are given below. These equations represent a hierarchically linear ODE system. It can be interpreted as being composed of two subsystems, just as in Theorem 2. System I consists of all isotopes, while system II represents the α, β and γ particles. System II is a trivial example of a monomolecular reaction network: since no reactions take place within it, we only have to consider the influx of particles from system I. We note that the Jacobian matrix of the characteristic system is identical to the transpose of the rate equation matrix, as discussed in Sect. 2.3. The ODE system is homogeneous, with Jacobian A(s_{α,β,γ}) = (∂_{s_j} ds_i/dt). We express the generating function for Poissonian initial conditions by this matrix as

g(s, t) = exp(x_0 · (s_0 − 1))
        = exp(x_0^{α,β,γ} · (s_{α,β,γ} − 1) + x_0^{1,...,n} · (exp(−A(s_{α,β,γ}) t) s − 1))
        = exp(x_0^{α,β,γ} · (s_{α,β,γ} − 1) + exp(−A(s_{α,β,γ})^T t) x_0^{1,...,n} · s − x_0^{1,...,n} · 1).
In the last line, we have used the identity Ab · c = b · A^T c = A^T c · b, which holds for any square matrix A and two vectors b, c of the same dimension (this identity can be seen by writing out the components of the scalar and matrix products). The insight that the characteristic system is equivalent to the transposed rate equation system, as discussed in Sect. 2.3, greatly simplifies the following computations. In consequence, the solution of the Master equation is not much more difficult than the solution of the rate equations, as originally provided by Bateman (1910). The difference to Bateman's system is that the negative Jacobian matrix A^T depends on the constants s_α, s_β and s_γ. Since this matrix is triangular, the eigenvalues are the diagonal entries. Because the eigenvalues do not depend on s_{α,β,γ}, the marginal distribution of the corresponding particles obeys Proposition 3. Furthermore, Theorem 4 predicts conditional multimodality depending on the choice of R_{α,i}, R_{β,i} and R_{γ,i}. Note that Theorem 4 gives us this information without the exact solution of the system presented here. We now proceed along the lines of Pressyanov (2002) and solve the rate equation system, transposed to the characteristic system, to obtain exp(−A^T(s_{α,β,γ}) t) x_0^{1,...,n}. The computations can be found in "Appendix A.4". In consequence, we use the matrix exponential to express the generating function in closed form. Interestingly, the further the isotopes are downstream of the decay chain, the more they contribute to the higher-order cumulants of the α, β and γ particles. An insight like this is not easily gained from stochastic simulations, because the simulation results alone would not directly point to this fact. We also see that the distribution is not simply characterized by mean and covariance. Next, we consider multimodality of the marginal distributions.
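Setting s_α = s_β = s_γ = 1 reduces the propagator to Bateman's classical rate equations, which gives a quick numerical consistency check. A sketch using SciPy's matrix exponential, with illustrative decay constants (not taken from the paper):

```python
import numpy as np
from scipy.linalg import expm

# Rate matrix M of the chain X1 -> X2 -> X3 (dx/dt = M x); with all
# s-variables set to 1 this coincides with Bateman's equations.
k1, k2 = 1.0, 0.3                       # illustrative decay constants
M = np.array([[-k1, 0.0, 0.0],
              [ k1, -k2, 0.0],
              [0.0,  k2, 0.0]])
x0 = np.array([100.0, 0.0, 0.0])        # only the mother isotope present

t = 2.0
x_t = expm(M * t) @ x0

# Bateman's closed form for the daughter isotope X2
x2 = x0[0] * k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
assert np.isclose(x_t[1], x2)
assert np.isclose(x_t.sum(), x0.sum())  # the chain conserves total mass
```

Because M is triangular, its eigenvalues are the diagonal entries −k_i, exactly as used in the eigenvalue argument above.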
Since the eigenvalues of A^T(s_{α,β,γ}) do not depend on s_{α,β,γ}, Theorem 4 applies and predicts conditional multimodality for several minimal networks. First, we consider networks corresponding to case 1. We evaluate the solution for the decay chain and plot the marginal distributions for Y in Fig. 10. The plots, which depict the temporal evolution of the marginal distribution, clearly demonstrate multimodality. Here, (45-46) represent the minimal reaction network of Type I from Theorem 4. A minimal reaction network of Type II is obtained by reducing the network further. Finally, by assigning X_2 to the dependent part of the network, we obtain an even smaller network, X_1 → (X_2) + 2Y with rate constant k_1. Now the independent part of the network is open. If we ignore X_2, this represents case 2 in Theorem 4.
Note that X 2 can either be assigned to the independent or the dependent part of the reaction network. However, in either case Theorem 4 applies and predicts conditional multimodality.

Conclusions
In this article, we analyzed the CME for hierarchic first-order reaction networks, based on a general solution method. We showed in Theorem 1 that hierarchic reaction networks are generally treatable in an analytic manner. We derived the analytical solution for the joint probability generating function for Poisson and deterministic initial conditions. Next, we analyzed the multimodality of the resulting marginal distributions of individual species. The analysis revealed that the marginal distributions of species from the independent part of the network are unimodal (Theorem 2), while the dependent part yields conditionally multimodal distributions (Theorems 3 and 4). Furthermore, we presented several basic models of hierarchic reaction networks, which we consider insightful for the understanding of larger reaction networks. We illustrated this point by showing the similarity between the catalysis basic model and the transcription-translation model. As a proof of principle, we showed that even more complex models, such as the nuclear decay chain presented here, are amenable to an exact analytical treatment.
Underlining the prevalence of multimodality, we saw that even trivial networks like the one from Example 1, consisting of the reaction X → X + Y , are conditionally multimodal given Poissonian initial conditions. We also showed that deterministic initial conditions give rise to multimodal distributions in Examples 2 and 5. For the two-stage transcription-translation model, multimodality was previously not reported (Shahrezaei and Swain 2008).
Since the main Theorems 3 and 4 only require knowledge about the dependence of the eigenvalues of the Jacobian matrix on the variable of interest, it suffices to compute the characteristic polynomial of the Jacobian matrix. The characteristic polynomial of any square matrix is algorithmically computable, so our theorems are applicable in practice. We furthermore demonstrated how insight about a network may be gained by simply dividing the network into dependent and independent parts (see Example 3), without doing any calculations or simulations. Therefore, we believe that our abstract treatment of hierarchic reaction networks may inspire other researchers to analyze and design networks with respect to their multimodality properties.
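As an illustration of this algorithmic computability, the characteristic polynomial can be obtained exactly, for instance via the Faddeev-LeVerrier recursion. A minimal stdlib-only sketch (the matrices below are illustrative, not taken from the paper):

```python
from fractions import Fraction

def char_poly(A):
    """Faddeev-LeVerrier recursion: exact coefficients [1, c1, ..., cn] of
    det(lambda*I - A) = lambda^n + c1*lambda^(n-1) + ... + cn."""
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # M1 = I
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][l] * M[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k        # c_k = -tr(A M_k)/k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]                          # M_{k+1} = A M_k + c_k I
    return coeffs

# an upper-triangular example: char. poly (lambda - 2)(lambda - 3)
assert char_poly([[2, 1], [0, 3]]) == [1, -5, 6]
```

Working over `Fraction` keeps the coefficients exact, so the dependence of the polynomial on a parameter can be tested by evaluating at a few rational parameter values.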

Further developments
As a future project, we suggest the development of numerical software to determine the parameter regions of multimodality of our basic models. In such a study, the collection of basic models may be extended, since the ones presented here are far from the only ones whose generating function may be written in a compact form. It might also be interesting to study n-fold hierarchies instead of the two-level hierarchies presented here.
Furthermore, we note that the Luria-Delbrück model (Zheng 1999), describing the division of wild-type cells X into mutants Y, may be investigated in a way similar to the one we proposed. One of the many variants of the model is defined by the reactions given below. Even though the reaction equations show the hierarchic character of the model, they do not fit our definition of hierarchic first-order networks. This fact can be seen from the characteristic ODEs: the system is not hierarchically linear due to the s_X^2 and s_Y^2 terms, which is why we excluded autocatalytic reactions X → 2X from our study. However, (48) and (49) are examples of Riccati differential equations, and may be linearized as such by a coordinate transformation. In this way, an exact solution of the system can be obtained, even though it involves large expressions containing hypergeometric functions.

In practice, the sum ∑_{j=0}^{k} often converges after a constant number of terms (i.e. the convergence is linear in the sense of Deuflhard and Hohmann 2003, p. 85, Definition 7), so the computational cost of evaluating n probabilities is O(n). Khatri and Patel (1961) provide recurrences for KTB_N distributions, needed for the deterministic initial condition setting.
The generating function can also be used directly for parameter estimation, without calculating the distribution first. Marcheselli et al. (2008) propose a method for parameter estimation that is based exclusively on generating functions.

A.2 Proof of Lemma 1
Proof The proof is adapted from Gardiner (2009), Sect. 11.6.4, p. 297. We first multiply by s^x and sum over all x to obtain (52). Using the definition of the stochastic propensities α_i(x) = k_i ∏_{j=1}^{n} x_j!/(x_j − Q_{ji})!, we find the identities given below. Plugging these into Eq. (52), the desired Eq. (4) is obtained.

A.3 Proof of solution formula for twofold hierarchy
Proof We start by writing out the PDE for a system of two parts, corresponding to the variables s_ind := (s_1, ..., s_{n_ind})^T and s_dep := (s_{n_ind+1}, ..., s_n)^T, as depicted in an exemplary manner in Fig. 1. The PDE for the generating function, derived from Lemma 1, then follows, where f is defined to be the function representing the produced molecules that do not belong to part I. The characteristic ODEs are obtained in the same manner as for monomolecular systems; part II is monomolecular and obeys its own characteristic equations. We then consider the total derivative of g(s(t), t) and the Jacobian matrix for part I. If the Jacobian commutes for two different times, i.e. J(t_1)J(t_2) = J(t_2)J(t_1), the homogeneous part of this system is solved by exp(∫_0^t J(s_dep(t')) dt'). If it does not commute, exp(∫_0^t J(s_dep(t')) dt') is the first term of the Magnus expansion. Adding the nonhomogeneous part for the l-th component of system I yields the complete solution. From Eq. (53), we know the form of the solution up to an arbitrary function d(s_0), to be determined by the initial conditions. The integral over c_i(s_dep(t', t)) = ∏_{l=n_ind+1}^{n} (s_l(t', t))^{R_li} appears both in the homogeneous and in the inhomogeneous part of the equations for s_ind(t', t). In the last line, we made use of the identity Ab · c = b · A^T c = A^T c · b. We note that the integral can be expressed completely in terms of the fundamental matrix of the second, monomolecular reaction network. Thus, the property that the statistics of monomolecular networks can be fully parametrized by the solution of their traditional reaction rate equations translates to hierarchic first-order networks. Using Poissonian initial conditions, we find d(s_0) and solve Eq. (55) for s_0^ind. The last integral is over a double exponential function and a simple exponential, and is not easily expressed in terms of elementary functions.
A series expansion can be found, but it is not very simple, which is why we keep the integral as it is. For s_0^dep, we obtain the same result as in the monomolecular case. If the initial condition is deterministic, that is P(x, 0) = δ_{x,x_0} with x_0 = (x_1^0, ..., x_n^0)^T, the result follows using e_i := col_i I_{n×n}, e_i^(1) := (e_{i,1}, ..., e_{i,n_ind})^T and e_i^(2) := (e_{i,n_ind+1}, ..., e_{i,n})^T.

A.4 Solution of characteristic ODE system for decay chains

The ODE system describing the temporal evolution of the concentration of isotope x_i is the transpose of the characteristic system (43-44) of the PDE (41). More precisely, the system for x_i is identical to the original one from Bateman (1910), with dx_n/dt = k_{n−1} x_{n−1} − k_n x_n for the last member of the chain.
Due to this similarity, we are able to adapt the calculations from Pressyanov (2002) almost one-to-one. We take the Laplace transform of both sides of Eqs. (63) and (64). In the Laplace domain, we solve a linear equation system and invert the Laplace transform x̃(σ) to obtain x(t). This yields the solution of Eqs. (65-66).
The last equation (67) represents the starting point of a linear recurrence which, if the initial conditions x_2, ..., x_n are set to zero, can be solved in closed form for x̃_n(σ). Here, δ defines an integration path parallel to the imaginary axis, situated to the right of all singularities of x̃_n(σ). We may choose δ = 0, since all poles lie on the negative part of the real axis. By closing the integration path through the addition of a semicircle (see Fig. 11), we obtain an integral that can be solved by applying the residue theorem, since the contour is closed (see Boas 2006 for a review of the solution methods for this kind of integral). The contribution of the circular part to the value of the integral along C is zero, as can be seen by applying the integral triangle inequality. We may therefore calculate x_n(t) by the residue formula

x_n(t) = (1/(2πi)) ∮_C e^{σt} x̃_n(σ) dσ = Σ_{j=1}^{n} Res_j(e^{σt} x̃_n(σ)),

where the sum is over all poles. In our case, we have n simple poles, and the residue theorem yields

Res_j(e^{σt} x̃_n(σ)) = lim_{σ→−k_j} ((σ + k_j) e^{σt} x̃_n(σ)).
Summing over all residues, we obtain the result. The only thing left to do is to generalize the initial conditions for x_2, ..., x_{n−1}. If we set only x_m ≠ 0, we obtain the analogous expression for x_n(t).
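For pairwise distinct decay constants, the residue sum evaluates to the familiar Bateman form. A stdlib-only sketch with illustrative constants, cross-checked against a brute-force RK4 integration of the chain (this code is not part of the original appendix):

```python
import math

def bateman_residue_sum(x1_0, ks, t):
    """x_n(t) from the residue formula: n simple poles at sigma = -k_j,
    with x~_n(sigma) = x1_0 * prod(k_1..k_{n-1}) / prod_j(sigma + k_j).
    The decay constants ks = [k_1, ..., k_n] must be pairwise distinct."""
    n = len(ks)
    prefac = x1_0 * math.prod(ks[:-1])
    total = 0.0
    for j in range(n):
        denom = math.prod(ks[l] - ks[j] for l in range(n) if l != j)
        total += math.exp(-ks[j] * t) / denom
    return prefac * total

def rk4_chain(x1_0, ks, t, steps=4000):
    """Brute-force RK4 integration of dx_i/dt = k_{i-1} x_{i-1} - k_i x_i."""
    n = len(ks)
    x = [x1_0] + [0.0] * (n - 1)
    def f(x):
        return [(ks[i-1] * x[i-1] if i > 0 else 0.0) - ks[i] * x[i]
                for i in range(n)]
    h = t / steps
    for _ in range(steps):
        k1 = f(x)
        k2 = f([xi + h/2 * ki for xi, ki in zip(x, k1)])
        k3 = f([xi + h/2 * ki for xi, ki in zip(x, k2)])
        k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h/6 * (a + 2*b + 2*c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x[-1]

ks = [1.0, 0.4, 0.15]
assert math.isclose(bateman_residue_sum(100.0, ks, 3.0),
                    rk4_chain(100.0, ks, 3.0), rel_tol=1e-6)
```

For n = 2 the residue sum reduces to the classical x_2(t) = x_1(0) k_1/(k_2 − k_1) (e^{−k_1 t} − e^{−k_2 t}).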