Tighter bounds on transient moments of stochastic chemical systems

The use of approximate solution techniques for the Chemical Master Equation is common practice for the analysis of stochastic chemical systems. Despite their widespread use, however, many such techniques rely on unverifiable assumptions, and only a few provide mechanisms to control the approximation error quantitatively. Addressing this gap, Dowdy and Barton [The Journal of Chemical Physics, 149(7), 074103 (2018)] proposed a method for the computation of guaranteed bounds on the moment trajectories associated with stochastic chemical systems described by the Chemical Master Equation, thereby providing a general framework for error quantification. Here, we present an extension of this method. The key contribution is a new hierarchy of convex necessary moment conditions that crucially reflect the temporal causality and other regularity conditions inherent to the moment trajectories of stochastic processes described by the Chemical Master Equation. As in the original method, these conditions generate a hierarchy of semidefinite programs that furnishes monotonically improving bounds on the trajectories of the moments and related statistics. Compared to its predecessor, the presented hierarchy produces bounds that are at least as tight, and it often enables the computation of dramatically tighter bounds, as it enjoys superior scaling properties and the generated semidefinite programs are highly structured. We analyze the properties of the presented hierarchy, discuss aspects of its practical implementation, and demonstrate its merits with several examples.


I. INTRODUCTION
The analysis of systems undergoing chemical reactions lies at the heart of many scientific and engineering activities. While deterministic models have proved adequate for the analysis of systems at the macroscopic scale, they often fall short for meso- and microscopic systems, in particular for those that feature low molecular counts. In this regime, the complex and chaotic motion of molecules reacting upon collision causes effectively stochastic fluctuations of the molecular counts that are large compared to the mean and, as a consequence, can have a profound effect on the system's characteristics, a situation frequently encountered in cellular biology [1-4]. In the context of the continuously growing capabilities of synthetic biology, this fact motivates the use of stochastic models for the identification, design and control of biochemical reaction networks. However, while in these applications stochastic models provide the essential fidelity relative to their deterministic counterparts, their analysis is generally more involved.
Stochastic chemical systems are canonically modeled as jump processes. Specifically, the system state, as encoded by the molecular counts of the individual chemical species, is modeled to change only discretely in response to reaction events triggered by the arrivals of Poisson processes whose rates depend on the underlying reaction mechanism. The so-called Chemical Master Equation (CME) describes how the probability distribution of the state of such a process evolves over time. To that end, the CME tracks the probability to observe the system in any reachable state over time. This introduces a major challenge for the analysis of stochastic chemical systems, which routinely feature millions or even infinitely many reachable states, rendering a direct solution of the CME intractable. As a consequence, sampling techniques such as Gillespie's Stochastic Simulation Algorithm 5,6 have become the most prominent approach for the analysis of systems described by the CME. Although these techniques perform remarkably well across a wide range of problems, they are inadequate in certain settings. Most notably, they do not scale well for stiff systems, generally do not provide hard error bounds, and make the evaluation of sensitivity information challenging 7. The two latter shortcomings in particular limit their utility in the context of identification, design and control. Alternatives such as the finite state projection algorithm 8 come with guaranteed error bounds and straightforward sensitivity evaluation; however, they generally suffer more severely from high dimensionality.
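As an aside, the core of Gillespie's Stochastic Simulation Algorithm is only a few lines long. The following minimal sketch simulates a single-species birth-death network; the network, rate constants, and function names are our own illustrative choices and are not taken from the text.

```python
import math
import random

def ssa(x0, propensities, stoich, t_end, seed=1):
    """Minimal Gillespie SSA sketch for a one-species network.

    propensities: list of functions a_r(x); stoich: list of state changes.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while t < t_end:
        a = [a_r(x) for a_r in propensities]      # propensities in current state
        a0 = sum(a)
        if a0 == 0.0:                             # absorbing state: nothing can fire
            break
        t += -math.log(1.0 - rng.random()) / a0   # exponential waiting time
        u, r = rng.random() * a0, 0
        while r < len(a) - 1 and u > a[r]:        # pick reaction r with prob a_r/a0
            u -= a[r]
            r += 1
        x += stoich[r]
        times.append(t)
        states.append(x)
    return times, states

# birth-death network: 0 -> S at rate k1 = 1.0, S -> 0 at rate k2*x with k2 = 0.1
times, states = ssa(0, [lambda x: 1.0, lambda x: 0.1 * x], [1, -1], t_end=50.0)
```

Repeating such runs and averaging yields moment estimates, but, as noted above, without hard error bounds.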
From a practical perspective, stochastic reaction networks are often sufficiently characterized by only a few low-order moments of the associated distribution of the state, for example through means and variances. In that case, tractability of the CME may be recovered by solving for the moments of its solution directly. The dynamics of a finite sequence of moments associated with the distribution described by the CME, however, do not generally form a closed, square system and hence do not admit a solution by simple numerical integration. Numerous moment closure approximations [9-12] have been proposed to remedy this problem. A major shortcoming of moment closure approximations, however, is that they generally rely on unverifiable assumptions about the underlying distribution and therefore introduce an uncontrolled error. In fact, it is well-known that their application can lead to unphysical results such as spurious oscillations and negative mean molecular counts [13-15]. Addressing this shortcoming, several authors have recently proposed methods for the computation of rigorous, theoretically guaranteed bounds on the moments (or related statistics) associated with stochastic reaction networks; such bounding schemes have been proposed for the steady-state [16-19] and transient [20-23] settings, and their utility for the design of biochemical systems has been demonstrated 24. The key insight underpinning these methods, rooted in real algebraic geometry, is that the moment sequence associated with the true solution of the CME must satisfy a rich set of algebraic conditions reflecting the support and dynamics of the true distribution. Crucially, these conditions depend only on the problem data and do not require explicit knowledge of the true solution. The use of mathematical programming to identify a truncated moment sequence which minimizes (maximizes) a given moment or related statistic of interest subject to these conditions then furnishes a valid
lower (upper) bound on the true value. While such bounds provide a mechanism to quantify errors or verify the consistency of approximation techniques, they are frequently found to be sufficiently tight to serve directly as a proxy for the true solution [16-19].
In this work, we extend the bounding scheme proposed by Dowdy and Barton 20 for transient moments of the solutions of the CME. To that end, we introduce new necessary moment conditions that improve the tightness of the semidefinite relaxations on which Dowdy and Barton's approach is based. These necessary moment conditions crucially reflect the temporal causality that is inherent to solutions of the CME. The conditions lend themselves to being organized in a hierarchy that provides a mechanism to trade off computational cost against bound quality. Moreover, we show that the conditions exhibit favorable scaling properties and structure when compared to the conditions employed in the original method.
This article is organized as follows. In Section II, we introduce definitions and assumptions, formally define the moment bounding problem, and review essential preliminaries. Section III is devoted to the development and analysis of the proposed hierarchy of necessary moment conditions. In Section IV, we discuss certain aspects pertaining to the use of these conditions for the computation of moment bounds in practice. The potential of the developed methodology is demonstrated with several examples in Sections V and VI before we conclude with some open questions in Section VII.

II. PRELIMINARIES

A. Notation
We denote scalars by lowercase symbols without emphasis, while vectors and matrices are denoted by bold lower- and uppercase symbols, respectively. Throughout, vectors are assumed to be column vectors. Generic sets are denoted by uppercase symbols without emphasis. For special or commonly used sets we use standard notation; for example, for the (non-negative) n-dimensional reals and integers, we write R^n (R^n_+) and Z^n (Z^n_+), respectively. Similarly, we refer to the sets of symmetric and symmetric positive semidefinite (psd) n-by-n matrices by S^n and S^n_+, respectively, and use the usual shorthand A ⪰ B for A − B ∈ S^n_+. The sets of n-dimensional vector and symmetric matrix polynomials with real coefficients (of degree at most k) in the variables x = [x_1 . . . x_n] will be denoted by R[x]^n (R_k[x]^n) and S[x] (S_k[x]), respectively. In order to concisely denote multivariate monomials, we employ multi-index notation: for a monomial in n variables corresponding to the multi-index j = [j_1 . . . j_n] ∈ Z^n_+, we write x^j = ∏_{i=1}^{n} x_i^{j_i}. The indicator function of a set A is denoted by 1_A. Lastly, we denote the set of n times continuously differentiable functions on an interval I ⊂ R by C^n(I), while the set of absolutely continuous functions is denoted by A C(I). The remaining symbols will be defined as they are introduced.

B. Problem Statement, Definitions & Assumptions
We consider a chemical system featuring n chemical species S_1, . . ., S_n undergoing n_R different reactions. The system state x is encoded by the molecular counts of the individual species, i.e., x = [x_1 . . . x_n] ∈ Z^n_+. It changes in response to reaction events according to the stoichiometry: upon occurrence of reaction r, the system state changes by the associated stoichiometric vector ν_r ∈ Z^n, i.e., x → x + ν_r. We will restrict ourselves to the framework of stochastic chemical kinetics for modeling such systems.
The notion of stochastic chemical kinetics treats the positions and velocities of all molecules in the system as random variables; reactions are assumed to occur at collisions with a prescribed probability. Consequently, the evolution of the system state is a continuous-time jump process. Here, we will assume that this jump process can be described by the Chemical Master Equation (CME).
Assumption 1. Let P_π(x,t) be the probability to observe the system in state x at time t given the distribution π of the initial state of the system. Then, P_π(x,t) satisfies

∂P_π/∂t (x,t) = ∑_{r=1}^{n_R} [ a_r(x − ν_r) P_π(x − ν_r, t) − a_r(x) P_π(x,t) ],   (CME)

where a_r denotes the propensity of reaction r, i.e., in state x, a_r(x)dt quantifies the probability that reaction r occurs in [t, t + dt) as dt → 0.
Moreover, we will restrict our considerations to the case of polynomial reaction propensities.
Assumption 2. The reaction propensities a r in (CME) are polynomials.
To ensure the moment trajectories remain well-defined at all times, we will further assume that the stochastic process is well-behaved in the following sense.
Assumption 3. The number of reaction events occurring in the system within finite time is finite with probability 1.
A consequence of Assumption 3 is that the continuous-time jump process associated with (CME) is regular 25, i.e., it does not explode in finite time. We wish to emphasize that Assumptions 1-3 are rather weak; Assumptions 1 and 2 are in line with widely accepted microscopic models 26, while Assumption 3 should intuitively be satisfied for any practically relevant system for which the CME is a reasonable modeling approach. Furthermore, Assumption 3 is formally necessary for (CME) to be valid on R_+ 25. For a detailed, physically motivated derivation of the CME alongside a discussion of the underlying assumptions and potential relaxations thereof, the interested reader is referred to Gillespie 26.
Instead of studying the probability distribution P π as a description of the system behavior, in this paper we will focus on its moments defined as follows.
Definition 1. Let X be the reachable set of the system, i.e., X = {x ∈ Z^n_+ | ∃ t ≥ 0 : P_π(x,t) > 0}, and let j ∈ Z^n_+ be a multi-index. The j-th moment of P_π(·,t) is defined as y_j(t) = ∑_{x∈X} x^j P_π(x,t). y_j is said to be of order |j| = ∑_{i=1}^{n} j_i. The function y_j(·) is called the trajectory of the j-th moment.
Additionally, it will prove useful to introduce the following notion of generalized moments.

Definition 2. Let y_j be as in Definition 1 and t_T > 0. Consider a uniformly bounded Lebesgue integrable function g : [0,t_T] → R. The j-th generalized moment of P_π with respect to g is defined by z_j(g;t) = ∫_0^t g(τ) y_j(τ) dτ for t ∈ [0,t_T]. We say g is a test function and generates z_j(g;t).
Under Assumptions 1 and 2, it is well-known that the dynamics of the j-th moment are described by a linear time-invariant ordinary differential equation (ODE) of the form

dy_j/dt (t) = c_j^T y(t),   (1)

where y(t) collects the moments up to order |j| + q and q = max_{1≤r≤n_R} deg(a_r) − 1. The coefficient vector c_j can be readily computed from the reaction propensities and stoichiometry; see for example Gillespie 27 for details. For q > 0, it is clear from (1) that the dynamics of moments of a certain order in general depend on moments of a higher order. This issue is commonly termed the moment closure problem.
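For intuition, consider a network whose propensities all have degree at most one, so that q = 0 and the moment equations close. The following minimal numerical sketch (our own illustrative birth-death example, not taken from the text) integrates the closed mean equation and compares it against its closed-form solution:

```python
import math

# Birth-death network: 0 -> S at rate k1, S -> 0 at rate k2*x.  Both
# propensities have degree <= 1, so q = 0 and the mean y1 obeys the
# closed ODE  d y1/dt = k1 - k2*y1(t).  Parameter values are illustrative.
k1, k2 = 1.0, 0.1
y1, dt, t_end = 0.0, 1e-3, 50.0
for _ in range(int(t_end / dt)):
    y1 += dt * (k1 - k2 * y1)          # explicit Euler on the moment ODE

exact = (k1 / k2) * (1.0 - math.exp(-k2 * t_end))   # closed-form solution
```

For q > 0 (e.g., bimolecular reactions) no such closed integration is possible, which is precisely the moment closure problem.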
If we denote by y_L the vector of "lower" order moments up to a specified order, say m, and by y_H the vector of "higher" order moments of order m + 1 to m + q, it is clear from (1) that we obtain a linear time-invariant ODE system of the form

dy_L/dt (t) = A_L y_L(t) + A_H y_H(t),

where n_L = C(n+m, n) and n_H = C(n+m+q, n) − n_L denote the number of lower and higher order moments, respectively. For the sake of a more concise notation, throughout we will often omit these subscripts and instead write

dy_L/dt (t) = A y(t),   (mCME)

where A = [A_L A_H] and y = [y_L^T y_H^T]^T. In the presence of the moment closure problem, it is clear from the setup of Equation (mCME) that it does not provide sufficient information to uniquely determine the moment trajectories associated with the solution of (CME). In the following, we therefore address the question of how to compute hard, theoretically guaranteed bounds on the true moment trajectory y_j(·) associated with the solution of (CME) in this setting. To that end, we build on the work of Dowdy and Barton 20, who have recently proposed an approach to answer this question. In broad strokes, they generate upper and lower bounds by optimizing a moment sequence truncated at a given order subject to a set of necessary moment conditions, i.e., conditions that the true moment trajectories are guaranteed to satisfy. By increasing the truncation order, the bounds can be successively improved. Our contribution is an extension of Dowdy and Barton's work in the form of a hierarchy of new necessary moment conditions. We show that these conditions provide additional, more scalable bound tightening mechanisms beyond increasing the truncation order and moreover give rise to highly structured optimization problems that can potentially be solved more efficiently than the unstructured problems arising in Dowdy and Barton's method.

C. Necessary Moment Conditions
The bounding method proposed by Dowdy and Barton 20 hinges on necessary moment conditions which restrict the set of potential solutions of (mCME) as much as possible, yet allow efficient computation. Necessary moment conditions in the form of affine equations and linear matrix inequalities (LMIs) have proved to fit that bill. Conditions of this form are of particular practical value as they allow for the computation of the desired bounds via semidefinite programming (SDP). As shown by Dowdy and Barton 20, such affine equations arise from the system dynamics, while the LMIs reflect constraints on the support of the underlying probability distribution. In the following, we will sketch their derivation and summarize the key properties that will be leveraged in Section III to construct additional necessary moment conditions and establish their properties.

Linear Matrix Inequalities
As claimed above, the fact that the solution of the CME P_π(·,t) is a non-negative measure on R^n supported only on X implies that its truncated moment sequences satisfy certain LMIs [16-19,28]. The following argument reveals this fact: Consider a polynomial f ∈ R[x] that is non-negative on X; further, let b be a vector polynomial obtained by arranging the elements of the monomial basis of the polynomials up to degree d = ⌊(m + q − deg(f))/2⌋ in a vector. Then, the generalized inequality

E[ f(x(t)) b(x(t)) b(x(t))^T ] ⪰ 0,

where E denotes the expectation with respect to P_π(·,t), follows immediately. It is easy to verify that the above relation can be concisely written as an LMI involving the moment trajectory of P_π. Concretely, we can write

M_f(y(t)) ⪰ 0,   (LMI)

where M_f : R^{n_L+n_H} → S^{C(n+d,d)} is a linear map. The precise structure of M_f depends on f and is immaterial for all arguments presented in this paper; the interested reader is referred to Lasserre 29 or Dowdy and Barton 16 for a detailed and formal description of the structure of M_f. As is clear from the above argument, the construction of valid LMIs of the form (LMI) relies merely on polynomials that are non-negative on X. For stochastic chemical systems, natural choices of such polynomials include f(x) = 1 and f(x) = x_i for i = 1, . . ., n [16-19], reflecting that P_π is non-negative and, in particular, not supported on states with negative molecular counts, respectively. More generally, the support of P_π(·,t) on any basic closed semialgebraic set can be reflected this way, most importantly including the special cases of polyhedra and bounded integer lattices. To account for this flexibility while simplifying notation, we will make use of the following definition and shorthand notation.

Definition 3. Let f_0, . . ., f_{n_p} be polynomials that are non-negative on the reachable set X. The convex cone described by the associated LMIs is denoted by C(X), i.e., C(X) = {y ∈ R^{n_L+n_H} | M_{f_i}(y) ⪰ 0, i = 0, . . ., n_p}.
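In the one-dimensional case, the moment matrices M_f reduce to Hankel matrices built from the raw moment sequence. The following sketch (our own illustrative example; the Poisson moment values and helper name are assumptions, not from the text) checks the LMIs for f(x) = 1 and f(x) = x numerically:

```python
import numpy as np

def moment_matrix(ms, d):
    """Hankel moment matrix with entries y_{i+j}; the 1-D instance of M_f for f = 1."""
    return np.array([[ms[i + j] for j in range(d + 1)] for i in range(d + 1)], float)

# Raw moments y_0..y_4 of a Poisson distribution with rate 2, the stationary
# distribution of the birth-death network 0 -> S, S -> 0 (illustrative choice).
ms = [1.0, 2.0, 6.0, 22.0, 94.0]
M0 = moment_matrix(ms, 2)        # moment matrix, f(x) = 1
M1 = moment_matrix(ms[1:], 1)    # localizing matrix, f(x) = x, reflects x >= 0
eig0 = float(np.linalg.eigvalsh(M0).min())
eig1 = float(np.linalg.eigvalsh(M1).min())
```

Both minimum eigenvalues are strictly positive here, as they must be for a genuine moment sequence; a truncated sequence violating either LMI cannot come from any distribution supported on Z_+.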
Lastly, we note that the validity of LMIs of the form (LMI) carries over to the generalized moments generated by non-negative test functions. To see this, observe that the linearity of M_f implies that

M_f(z(g;t)) = ∫_0^t g(τ) M_f(y(τ)) dτ

holds. Now assuming g is non-negative on R_+ and applying Jensen's inequality to the extended convex indicator function of the positive semidefinite cone, 1^∞_{S_+}, therefore yields

1^∞_{S_+}( M_f(z(g;t)) ) ≤ ∫_0^t g(τ) 1^∞_{S_+}( M_f(y(τ)) ) dτ = 0,

and hence M_f(z(g;t)) ⪰ 0 must hold for any t ≥ 0, in analogy to (LMI).

Affine Constraints
As noted in the beginning of this section, the moment dynamics (mCME) give rise to affine constraints that the moments and generalized moments must satisfy. To see this, consider a test function g ∈ A C([0,t_T]) and a final time t_f ≤ t_T. Then, as proposed by Dowdy and Barton 20, integrating ∫_0^{t_f} g(t) (dy_L/dt)(t) dt by parts yields the following set of affine equations:

g(t_f) y_L(t_f) − g(0) y_L(0) − z_L(g′;t_f) = A z(g;t_f).   (2)

We wish to emphasize here that the above constraints are vacuous if z(g;t) and z(g′;t) are not further restricted. This observation motivates necessary restrictions on g to generate "useful" generalized moments. Recalling the discussion in Section II C, one may be tempted to argue that g and g′ shall be non-negative (or non-positive) on [0,t_f] so that the generated generalized moments satisfy LMIs of the form (LMI). In fact, Dowdy and Barton 20 as well as Sakurai and Hori 21 demonstrate that this is indeed a reasonable strategy; they use exponential and monomial test functions, respectively. However, in principle a wider range of test functions can be used. We defer the discussion of this issue to Section III.
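The integration-by-parts identity underlying these affine equations is easy to verify numerically. The sketch below does so for the closed-form mean of a birth-death process and an exponential test function; all parameter values and helper names are our own illustrative choices:

```python
import numpy as np

def trap(f, t):
    """Trapezoidal quadrature of samples f over grid t."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0)

# Verify, for a scalar moment trajectory y, the identity
#   g(tf)*y(tf) - g(0)*y(0) - int_0^tf g'(t) y(t) dt = int_0^tf g(t) y'(t) dt,
# using the closed-form mean of the birth-death process 0 -> S (rate k1),
# S -> 0 (rate k2*x) and the test function g(t) = exp(rho*t).
k1, k2, rho, tf = 1.0, 0.1, -0.5, 10.0
t = np.linspace(0.0, tf, 200001)
y = (k1 / k2) * (1.0 - np.exp(-k2 * t))       # mean trajectory y(t)
dy = k1 - k2 * y                              # y'(t), from the moment ODE
g = np.exp(rho * t)                           # test function g(t)
dg = rho * g                                  # g'(t)

lhs = g[-1] * y[-1] - g[0] * y[0] - trap(dg * y, t)
rhs = trap(g * dy, t)
```

The two sides agree to quadrature accuracy; in the bounding method, the right-hand side is eliminated via the dynamics, leaving an affine relation between y(t_f), y(0), and the generalized moments.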

III. TIGHTER BOUNDS
A. An Optimal Control Perspective

Some of the conservatism in the original method of Dowdy and Barton 20 stems from the fact that the moments are only constrained in an integral or weak sense, i.e., z(g;t_f) = ∫_0^{t_f} g(τ) y(τ) dτ is constrained as opposed to y(t) for all t ∈ [0,t_f]. This is potentially a strong relaxation, as in fact the entire trajectory must satisfy the necessary moment conditions. Moreover, by Assumption 3, the moment trajectories remain bounded at all times, which, taken together with the fact that they satisfy the ODE (mCME), shows that they are guaranteed to be infinitely differentiable. Using these two additional pieces of information, we argue that the following continuous-time optimal control problem provides an elementary starting point for addressing the question of how to bound the moment trajectories associated with a stochastic chemical system evaluated at a given time point t_f:

min/max  y_j(t_f)
s.t.  dy_L/dt (t) = A y(t),  t ∈ [0,t_f],
      y(t) ∈ C(X),  t ∈ [0,t_f],
      y_L(0) = y_{L,0}.   (OCP)

Here, the "lower" order moments y_L act as the state variables while the "higher" order moments y_H can be viewed as control inputs. Although the infinite dimensional nature of Problem (OCP) leaves it with little immediate practical relevance, this representation is conceptually informative. In fact, it is not hard to verify that the method proposed by Dowdy and Barton 20 provides a systematic way to construct tractable relaxations of (OCP) in the form of SDPs. However, Dowdy and Barton's method does in no way reflect the dependence of y(t_f) on past values of y(t) other than y(0), nor the fact that y ∈ C^∞(R_+). As we will show in the following, these observations motivate new necessary moment conditions giving rise to a hierarchy of tighter SDP relaxations of Problem (OCP) than those constructed by Dowdy and Barton's method 20.

B. A New Hierarchy of Necessary Moment Conditions
In this section, we present the key contribution of this article: a new hierarchy of convex necessary moment conditions that reflect the temporal causality and regularity conditions inherent to the moment trajectories associated with the distribution described by the CME. To provide some intuition for these results, we will first discuss some special cases of the proposed conditions which permit a clear interpretation. To that end, recall that the moment trajectory y_j(·) must be infinitely differentiable on R_+, as all moment trajectories remain bounded by Assumption 3 and obey the linear time-invariant dynamics (mCME). As a consequence, the Taylor polynomial

T_l(y_j;t_1,t_2) = ∑_{k=0}^{l} y_j^{(k)}(t_1) (t_2 − t_1)^k / k!

and remainder

R_l(y_j;t_1,t_2) = (1/l!) ∫_{t_1}^{t_2} (t_2 − τ)^l y_j^{(l+1)}(τ) dτ

are well-defined for any 0 ≤ t_1 ≤ t_2 < +∞ and order l ≥ 0. A key observation here is that if |j| and l are sufficiently small 30, then T_l(y_j;t_1,t_2) and R_l(y_j;t_1,t_2) depend linearly on y(t_1) and z(g_l;t_2) with g_l(t) = 1_{[t_1,t_2]}(t)(t_2 − t)^l, respectively. Formally, we can write

T_l(y_j;t_1,t_2) = c_{l,j}^T y(t_1)  and  R_l(y_j;t_1,t_2) = d_{l,j}^T z(g_l;t_2)   (3)

for an appropriate choice of the coefficient vectors c_{l,j} and d_{l,j}.
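The Taylor polynomial and integral-form remainder can be checked numerically for any smooth trajectory. The sketch below does so for the closed-form mean of a birth-death process; the trajectory, parameters, and helper names are our own illustrative choices:

```python
import math

# Check the Taylor identity y(t2) = T_l(y; t1, t2) + R_l(y; t1, t2) with the
# integral-form remainder R_l = (1/l!) int_{t1}^{t2} (t2 - tau)^l y^(l+1)(tau) dtau,
# for y(t) = c*(1 - exp(-k2*t)), the mean of an illustrative birth-death process.
c, k2, t1, t2, l = 10.0, 0.1, 1.0, 4.0, 3

def deriv(k, t):
    """k-th derivative of y(t) = c*(1 - exp(-k2*t))."""
    if k == 0:
        return c * (1.0 - math.exp(-k2 * t))
    return -c * (-k2) ** k * math.exp(-k2 * t)

taylor = sum(deriv(k, t1) * (t2 - t1) ** k / math.factorial(k) for k in range(l + 1))

n = 50000                                   # midpoint-rule quadrature for R_l
h = (t2 - t1) / n
remainder = h * sum(
    (t2 - (t1 + (i + 0.5) * h)) ** l * deriv(l + 1, t1 + (i + 0.5) * h)
    for i in range(n)
) / math.factorial(l)
```

The sum taylor + remainder reproduces y(t_2) to quadrature accuracy, which is exactly the consistency that the conditions below enforce on candidate moment trajectories.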
Overall, this observation suggests employing conditions of the form

y_j(t_2) = T_l(y_j;t_1,t_2) + R_l(y_j;t_1,t_2)

at different time points along the trajectory as necessary moment conditions. In fact, these conditions achieve exactly what we set out to do: they establish a connection between y_j(t_2) and its past using the smoothness properties of the trajectory y_j. Further, it is straightforward to see that analogous conditions are readily obtained for any generalized moment generated by a sufficiently smooth test function. The above conditions hence appear to be a promising starting point. From a practical perspective, however, they merely suggest a particular choice of test functions, as revealed by the following proposition.
Proposition 1. Let 0 ≤ t_1 ≤ t_2, n_I ∈ Z_+, and let g_l(t) = 1_{[t_1,t_2]}(t)(t_2 − t)^l. If y(t_1), y(t_2) and z(g_l;t_2) satisfy

g_l(t_2) y_L(t_2) − g_l(t_1) y_L(t_1) − z_L(g_l′;t_2) = A z(g_l;t_2)   (4)

for l = 0, . . ., n_I, then y(t_1), y(t_2) and z(g_l;t_2) also satisfy

y_j(t_2) = c_{l,j}^T y(t_1) + d_{l,j}^T z(g_l;t_2)   (5)

for l = 0, . . ., n_I and j such that |j| ≤ m − lq, where c_{l,j} and d_{l,j} are defined as in (3).
Proof.The proof is deferred to Supplementary Information (SI).
Remark 1. Note that Condition (4) is analogous to Condition (2) as obtained for the test function g_l with a shifted origin; hence it is a necessary moment condition. Further, we wish to emphasize that Condition (4) is in general more stringent than Condition (5), as is made clear in the proof.
Beyond a specific choice of test functions, the above considerations motivate a broader strategy to generate necessary moment conditions that reflect causality. This strategy can be summarized as "discretize and constrain": instead of imposing Condition (2) on the entire time horizon [0,t_f] as proposed by Dowdy and Barton 20, the time horizon can be partitioned by time points 0 = t_0 < t_1 < · · · < t_{n_T} = t_f, and the analogous conditions obtained by integrating ∫_{t_{i−1}}^{t_i} g(t) (dy_L/dt)(t) dt by parts can be imposed:

g(t_i) y_L(t_i) − g(t_{i−1}) y_L(t_{i−1}) − (z_L(g′;t_i) − z_L(g′;t_{i−1})) = A (z(g;t_i) − z(g;t_{i−1})),  i = 1, . . ., n_T.   (6)

While by itself this does not provide any restriction over Condition (2), the following observation makes it worthwhile: the generalized moments generated by a non-negative test function g form a monotonically increasing sequence with respect to the convex cone C(X). This follows immediately from the definition of z(g;t) and Jensen's inequality as described in Section II C; more formally,

z(g;t_i) − z(g;t_{i−1}) ∈ C(X),  i = 1, . . ., n_T   (7)

are necessary moment conditions. The Conditions (6) & (7) are generally a non-trivial restriction of the Conditions (2) & z(g;t_f) ∈ C(X) as employed by Dowdy and Barton 20. To see this, simply observe that we recover Equation (2) by summing the Equations (6) over i = 1, . . ., n_T and likewise obtain

z(g;t_f) = ∑_{i=1}^{n_T} (z(g;t_i) − z(g;t_{i−1})) ∈ C(X)

using that C(X) is a convex cone and z(g;0) = 0 by definition.

The above described strategies lend themselves to generalization in terms of a hierarchy of necessary moment conditions. This generalization can be performed in several equivalent ways. Next, we will present one such generalization utilizing a concept which we refer to as iterated generalized moments:

Definition 4. Let z_j(g;t) be the j-th generalized moment as per Definition 2. Then, the iterated generalized moment of Level l ≥ 0 is defined by z_j^0(g;t) = g(t) y_j(t) and z_j^l(g;t) = ∫_0^t z_j^{l−1}(g;τ) dτ for l ≥ 1; in particular, z_j^1(g;t) = z_j(g;t).

For the sake of simplified notation and analysis, it will further prove useful to introduce the following left and right integral operators acting on functions of two time arguments:

(I_L f)(t_1, t_2) = ∫_0^{t_1} f(τ, t_2) dτ  and  (I_R f)(t_1, t_2) = ∫_0^{t_2} f(t_1, τ) dτ.

For vector-valued functions, I_L and I_R shall be understood as being applied componentwise.
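The cone monotonicity invoked above, namely that increments z(g;t_i) − z(g;t_{i−1}) of the generalized moments again satisfy the moment LMIs for non-negative g, can be illustrated numerically. The sketch below uses the standard fact that a birth-death process started at zero is Poisson distributed at every time; the parameters and helper names are our own illustrative choices:

```python
import numpy as np

# The birth-death process 0 -> S (rate k1), S -> 0 (rate k2*x) started at
# x(0) = 0 has a Poisson distribution with rate lam(t) = (k1/k2)(1 - e^{-k2 t})
# at time t.  We check that the increment z(g;t2) - z(g;t1) of the generalized
# moments, for the non-negative test function g(t) = exp(-t), yields a
# positive semidefinite Hankel (moment) matrix.
k1, k2 = 1.0, 0.1

def poisson_moments(lam):
    """Raw moments y_0..y_4 of a Poisson(lam) distribution (Touchard polynomials)."""
    return [1.0, lam, lam**2 + lam, lam**3 + 3*lam**2 + lam,
            lam**4 + 6*lam**3 + 7*lam**2 + lam]

def z_increment(ta, tb, n=20001):
    """Trapezoidal approximation of int_ta^tb g(tau) * y(tau) dtau, g(t) = exp(-t)."""
    taus = np.linspace(ta, tb, n)
    lam = (k1 / k2) * (1.0 - np.exp(-k2 * taus))
    vals = np.array([poisson_moments(lm) for lm in lam])   # shape (n, 5)
    integ = np.exp(-taus)[:, None] * vals
    h = taus[1] - taus[0]
    return ((integ[1:] + integ[:-1]) / 2.0).sum(axis=0) * h

dz = z_increment(2.0, 5.0)                  # z(g;5) - z(g;2), moments 0..4
H = np.array([[dz[0], dz[1], dz[2]],
              [dz[1], dz[2], dz[3]],
              [dz[2], dz[3], dz[4]]])
min_eig = float(np.linalg.eigvalsh(H).min())
```

The minimum eigenvalue is strictly positive: the increment is a non-negative mixture of valid moment vectors, so its moment matrix inherits positive semidefiniteness, which is exactly what Condition (7) exploits.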
With these two concepts in hand, the following proposition formalizes the proposed hierarchy of necessary moment conditions.
Proposition 2. Let t_T > 0 and consider a non-negative test function g ∈ A C([0,t_T]). Further, let y be the truncated sequence of moment trajectories associated with the solution of (CME), and z^l the corresponding iterated generalized moments. Then, for any l ≥ 1 and any 0 ≤ t_1 ≤ t_2 ≤ t_T, the following conditions hold:

(i) z_L^{l−1}(g;t_i) − (t_i^{l−1}/(l−1)!) g(0) y_L(0) − z_L^l(g′;t_i) = A z^l(g;t_i) for i = 1, 2,

(ii) (I_L^k I_R^{l−1−k} f)(t_1,t_2) ∈ C(X) with f(t_1,t_2) = z^1(g;t_2) − z^1(g;t_1), for k = 0, . . ., l − 1.

Proof. It is easily verified that Condition (i) is obtained by integrating Equation (2) l − 1 times. Validity of Condition (ii) follows by a similar inductive argument: Since y(t) ∈ C(X) for all t ∈ [0,t_T], it follows by non-negativity of g on [0,t_T] and Jensen's inequality that

z^1(g;t_2) − z^1(g;t_1) = ∫_{t_1}^{t_2} g(τ) y(τ) dτ ∈ C(X)

for any 0 ≤ t_1 ≤ t_2 ≤ t_T. Now suppose Condition (ii) is satisfied for l − 1. Then, it follows by Jensen's inequality that, for any 0 ≤ t_1 ≤ t_2 ≤ t_T and k = 0, . . ., l − 2, applying I_R to the corresponding Level-(l−1) expression amounts to integration against a non-negative weight and hence preserves membership in C(X). For k = l − 1, an analogous argument based on I_L applies.
Before we proceed, a few remarks are in order to contextualize this result.
Remark 2. Choosing l = 1, t_1 = 0 and t_2 = t_f reproduces the necessary moment conditions proposed by Dowdy and Barton 20.
Remark 3. Regarding Condition (ii), one might be tempted to argue that any permutation of the operator products of I_L and I_R of length l − 1 applied to f(x, y) = z^1(g;y) − z^1(g;x) gives rise to a new valid necessary moment condition. It can be confirmed, however, that I_L and I_R commute, such that Condition (ii) is invariant under permutation of I_L and I_R; a more detailed discussion of this claim can be found in the SI.
Remark 4. We wish to emphasize that Conditions (i) and (ii) depend affinely on the iterated generalized moments up to Level l evaluated at t_1 and t_2, respectively. Accordingly, they preserve the computational advantages of the original necessary moment conditions. To avoid notational clutter in the remainder of this article, however, we will disguise this fact and concisely denote the left-hand side of Condition (ii) by Ω_{l,k}(g;t_1,t_2). An explicit algebraic expression for Ω_{l,k} is provided in the SI.
Remark 5. For the 0-th order moment, additional constraints arise from the definition: since y_0(t) = 1 for all t, the associated iterated generalized moments z_0^l(g;t) can be evaluated explicitly.
It is crucial to mention that Condition (i) in Proposition 2 is effectively unrestrictive unless z^l(g′;t) can be further constrained. The following proposition provides a concrete guideline as to which test functions allow one to circumvent this issue.
Proposition 3. Let F be a finite set of test functions such that span(F) is closed under differentiation. Then, for any f ∈ F, there exists a linear map Γ_f such that z^l(f′;t) = Γ_f({z^l(h;t)}_{h∈F}), rendering Condition (i) in Proposition 2 equivalent to a condition that involves only (iterated) generalized moments generated by test functions in F.

We omit the elementary proof of Proposition 3 and instead provide a concrete example that shows how to construct the maps Γ_f for a given set of exponential test functions.
Example 1. Consider a set of exponential test functions F = {f_1, . . ., f_{n_F}} with f_i(t) = e^{ρ_i t} for distinct rates ρ_i. Now let f(t) = e^{ρ_i t}. By definition of the iterated generalized moments and the fact that f′(t) = ρ_i f(t), it follows that z^l(f′;t) = ρ_i z^l(f;t); comparing with Condition (i) in Proposition 2 shows that the map Γ_f is simply given by multiplication with ρ_i.

Example 1 indicates the significance of the hypotheses of Proposition 3. In particular, it emphasizes that the closedness of span(F) under differentiation is precisely what is needed in order to guarantee that the associated necessary moment conditions described in Proposition 2 are "self-contained"; that is, the conditions only depend on generalized moments as generated by test functions in F. It is further noteworthy that there exist rich function classes beyond exponentials that can be used to assemble test function sets that satisfy the hypotheses of Proposition 3, for example polynomials and trigonometric functions.
Another issue is that Proposition 2 requires non-negativity of the test functions. This problem can be alleviated by a simple reformulation and shift of the time horizon. For example, if a test function g is non-negative on [0,t_+] and non-positive on [t_+,t_T], we can simply consider the two test functions g and −g in place of g and impose the necessary moment conditions on the intervals [0,t_+] and [t_+,t_T], respectively. This construction naturally extends to test functions with any finite number of sign changes.
We conclude this section by establishing some compelling properties of the hierarchy of necessary moment conditions put forward in Proposition 2. On the one hand, Conditions (i) and (ii) in Proposition 2 include the conditions considered in Proposition 1 as special cases. In particular, they enforce consistency with higher-order Taylor expansions of the true moment trajectories, as discussed in the beginning of this section. The following corollary to Proposition 2 formalizes this claim.
Corollary 1. Let n_I ∈ Z_+ and t_T > 0 be fixed. Further, suppose g ∈ A C([0,t_T]) is non-negative, and let y and z^l(g;·) be arbitrary functions such that z^l(g;·) is linear in the first argument and z^0(g;t) = g(t)y(t) holds. Fix 0 ≤ t_1 ≤ t_2 ≤ t_T. If Conditions (i) and (ii) of Proposition 2 are satisfied by {z^l(g;t_i)}_{l=0}^{n_I+1} for i = 1, 2, then there exist functions z(h_l g;·) that are linear in the first argument and satisfy the analogs of Conditions (4) and (5) with test functions h_l g for all l ∈ {0, . . ., n_I}.
Proof. The proof is deferred to the SI.

Remark 6. To see the connection to Condition (4) in Proposition 1, simply consider the case where g(t) = 1. Moreover, note that Corollary 1 also shows that necessary moment conditions of the form of (6) & (7) are implied, as they are recovered for l = 0 since we can simply identify z(h_0 g;t_2) with z^1(g;t_2) − z^1(g;t_1).
On the other hand, the proposed necessary moment conditions display benign scaling behavior in the following sense:

Corollary 2. Let 0 ≤ t_1 ≤ t_2 ≤ t_3 < +∞ and let n_I be a fixed positive integer. Suppose {z^s}_{s=1}^{n_I} is a set of functions that satisfies the Conditions of Proposition 2 on the pairs of time points (t_1,t_2) and (t_2,t_3) for all k, l ∈ Z_+ such that k < l ≤ n_I. Then, the Conditions of Proposition 2 are also satisfied on the pair (t_1,t_3).

Proof. The proof is deferred to the SI.
In essence, Corollary 2 shows that the cost of imposing the proposed necessary moment conditions at multiple time points along a time horizon scales linearly with the number of time points considered, as it suffices to impose them on pairs of neighboring time points.

C. An Augmented Semidefinite Program
In this section, we construct an SDP based on Proposition 2 whose optimal value furnishes bounds on the moment solutions of (CME) at a given time point t_f ∈ [0,t_T]. To that end, we consider the truncation order m to be fixed and the following user choices as known:

(i) T = {t_1, . . ., t_{n_T}} - A finite set of time points in (0,t_T] including t_f.

(ii) F = {g_1, . . ., g_{n_F}} - A finite set of test functions that satisfies the hypotheses of Propositions 2 and 3.
(iii) n_I - A non-negative integer controlling the hierarchy level in Proposition 2.
These quantities parametrize a spectrahedron S(F, T, n_I) described by the necessary moment conditions of Proposition 2 as imposed for all test functions in F, at all time points in T, and for all hierarchy levels up to n_I. In the formulation of S(F, T, n_I), however, we use a slightly different but equivalent formulation of Condition (i) of Proposition 2. The reason for this modification is that it results in weakly coupled conditions that allow the resultant SDPs to be decomposed in a natural way, as we will discuss in Section IV A. Details on this reformulation can be found in the SI. S(F, T, n_I) is stated explicitly below; for the sake of concise notation, we introduce the shorthand n(t) for the left neighboring point of any t ∈ T, i.e., n(t_i) = t_{i−1} for i = 2, . . ., n_T and n(t_1) = 0.
By construction, the set S(F, T, n_I) contains the sequences {y(t) : t ∈ T} and {z_l(g; t) : (g, t, l) ∈ F × T × {0, . . ., n_I}} as generated by the true moment trajectories associated with the solution of (CME). Another piece of information that can be used to further restrict the set of candidates for the true moment solutions of (CME) is knowledge of the moments of the initial distribution. We know, for example, from the definition that any iterated generalized moment z_l(g; t) with l ≥ 1 must vanish at t = 0. Moreover, one usually has specific information about the initial distribution of the system state, hence also about y(0). Here, we assume that the initial moments and iterated generalized moments are confined to a spectrahedral set denoted by S_0(F, n_I). In the common setting in which the moments y_0 of the initial distribution are known exactly, S_0(F, n_I) is given by S_0(F, n_I) = { (y(0), {z^l_{g,0}}) : y(0) = y_0, z^l_{g,0} = 0, ∀(g, l) ∈ F × {1, . . ., n_I} }.
Although adding the corresponding constraints to the description of S(F, T, n_I) appears natural, we deliberately choose to reflect this piece of information separately. Our motivation for this distinction is twofold: On the one hand, we want to emphasize that the presented approach naturally extends to the setting of uncertain or imperfect knowledge of the moments of the initial distribution of the system. Specifically, if the moments of the initial distribution are not known exactly but are known to be confined to a spectrahedral set, the proposed bounding procedure applies without modification. On the other hand, we will argue in Section IV A that, based on this specific feature, the arising optimization problems lend themselves to decomposition. The distinction in notation made here will simplify our exposition there.
The following theorems summarize the key feature of the proposed methodology: the ability to generate a sequence of monotonically improving bounds on the moment trajectories associated with the solution of (CME). Theorem 1 shows that these bounds can be practically obtained via the solution of a hierarchy of SDPs.
Theorem 1. Let y(t) be as in Definition 1 and t_f ∈ T. Consider a multi-index i with |i| ≤ m and define

Proof. Set y_t = y(t) for all t ∈ T and z^l_{g,t} = z_l(g; t) for all (g, t, l) ∈ F × T × {0, . . ., n_I} with z_l(g; t) as in Definition 4. By Proposition 2, ({y_t}, {z^l_{g,t}}) ∈ S(F, T, n_I), and the result follows.
Remark 7. For equal truncation orders and choice of C(X), Remark 2 implies that the bounds obtained from (SDP) are at least as tight as those obtained by the approach of Dowdy and Barton 20 .

Remark 8. The lower bound y*_{i,t_f} can be evaluated using off-the-shelf solvers for SDPs such as MOSEK 31 , SeDuMi 32 , or SDPT3 33 .

Remark 9. Problems similar to (SDP) can be formulated to bound properties that can be described in terms of moments of non-negative measures on the reachable set; examples include variances 16 , the volume of confidence ellipsoids 24 , and the value that the probability measure assigns to a semialgebraic set 16 .
The formulation of (SDP) provides several mechanisms to improve the bounds by adjusting the parameters T, F and n_I. Theorem 2 shows that appropriate adjustments lead to a sequence of monotonically improving bounds.
Theorem 2. Let y*_{i,t_f} be defined as in Theorem 1. Let t ∈ [0, t_T] and define T′ = T ∪ {t}. Further, let g be an absolutely continuous function that is non-negative on [0, t_T] and define

Likewise,

s.t.
Proof. (8) is obvious if t ∈ T and g ∈ F. If t ∉ T and/or g ∉ F, any feasible point of the right-hand side of (8) can be used to construct a feasible point of (SDP); simply remove the decision variables that correspond to time point t and/or test function g. Similarly, removing the iterated generalized moments of level n_I + 1 from the right-hand side of (9) yields a feasible point for (SDP).
Remark 10. Increasing the truncation order also gives rise to monotonically improving bounds. For the sake of brevity, we omit a formal statement and proof here as many easily adapted results of this type exist; see for example Corollary 6 in Kuntz et al. 18 .
We conclude this section with a brief discussion of the scalability of (SDP). Table I summarizes how the number of variables, affine constraints and LMIs, as well as their dimension, scale with F, T, n_I and the truncation order m. The results demonstrate the value of the proposed formulation if the number of species n in the system under investigation is large. In that case, the bound tightening mechanisms offered by adjusting F, T and n_I scale much more moderately than increasing the truncation order. Furthermore, it should be emphasized that the invariance of the LMI size with respect to F, T and n_I is a very desirable property to achieve scalability of SDP hierarchies in practice 34,35 . Lastly, it is worth noting that moment-based SDPs are notorious for becoming numerically ill-conditioned as the truncation order increases. Thus, the presented hierarchy provides a mechanism to circumvent this issue to some extent.

IV. PRACTICAL CONSIDERATIONS
A. Leveraging Causality for Decomposition

Techniques for the efficient numerical integration of ODEs hinge fundamentally on the causality that is inherent to the solution of ODEs. Specifically, causality enables the original problem, namely integration over a long time horizon, to be decomposed into a sequence of simpler, more tractable subproblems, each corresponding to integration over only a small fraction of the horizon. In this section, we discuss how the structure of the presented optimization problems can be exploited in a similar spirit. Additionally, we show that such exploitation of structure gives rise to a mechanism for trading off tractability and scalability.
Suppose we are interested in computing moment bounds at the end of a long time horizon [0, t_f]. In light of the arguments made in Section III B, it is reasonable to expect that the set T should ideally be populated with a large number of time points in this setting. Accordingly, solving the resultant optimization problem in one go may become prohibitively costly, even despite the benign scaling of the SDP size with respect to |T|. As alluded to in the beginning of this section, this limitation may be circumvented by decomposing the problem into a sequence of simpler subproblems, each of which covers only a fraction of the time horizon. To that end, suppose that T = {t_1, . . ., t_{n_T}} is ordered with t_{n_T} = t_f, and let t_0 = 0. Further, consider the subsets T_1, . . ., T_{n_T} of T such that T_k = {t_k}. We now define

At this point, it is worth emphasizing the meaning of each S_k and how its construction directly exploits the way we formulated the necessary moment conditions in S(F, T, n_I).
To that end, note that each condition in S(F, T, n_I) only links variables corresponding to adjacent time points. As a consequence, the set S(F, T_k, n_I) constrains only the variables ({y_{t_{k−1}}, y_{t_k}}, {z^l_{g,t_{k−1}}, z^l_{g,t_k}}). By construction of S_k, we project out the variables (y_{t_{k−1}}, {z^l_{g,t_{k−1}}}) while imposing their membership in S_{k−1}. It follows by induction that S_k precisely describes the projection of S(F, ∪_{i=1}^k T_i, n_I) onto the variables (y_{t_k}, {z^l_{g,t_k}}) under the condition that (y_0, {z_{g,0}}) ∈ S_0(F, n_I). By this argument, it follows that the original problem (SDP) is equivalent to the following reduced space formulation:

inf

where all decision variables that correspond to time points before t_f have been projected out.

Algorithm 1:
1: procedure
2: Set S̄_0 = S_0
3: for k = 1, . . ., n_T do
4: Compute a conic overapproximation S̄_k of S_k
5: end for
6: return S̄_{n_T}
7: end procedure

Note that Algorithm 1 parallels the decomposition approach taken in classical numerical integration of ODEs: the task of finding moment bounds over the entire time horizon [0, t_f] is decomposed into a sequence of smaller subproblems corresponding to finding moment bounds over smaller subintervals of the horizon, and each subproblem is solved by using the solution of the previous subproblem as input data. In other words, Algorithm 1 propagates the moment bounds forward in time, successively subinterval by subinterval, in the same way as a numerical integrator propagates values of the state of a dynamical system forward in time.
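The forward propagation performed by Algorithm 1 can be sketched as follows. This is only an illustrative stand-in: the hypothetical `step` callback plays the role of solving the per-interval subproblem and computing a conic overapproximation, and scalar intervals [lo, hi] stand in for the actual spectrahedra.

```python
def propagate_bounds(S0, step, n_T):
    """Algorithm-1-style propagation: starting from the initial set S0,
    successively compute an overapproximation of the feasible moment set
    at the next time point from the one at the current time point.
    `step(S)` is a hypothetical callback standing in for the SDP solve
    and conic overapproximation over one subinterval."""
    S = S0
    for _ in range(n_T):
        S = step(S)
    return S

# Toy stand-in: scalar moment bounds as intervals (lo, hi); each step
# contracts the moment by a factor 0.5 and inflates the interval by a
# small slack, mimicking the conservatism of the overapproximation.
step = lambda S: (0.5 * S[0] - 0.01, 0.5 * S[1] + 0.01)
lo, hi = propagate_bounds((1.0, 2.0), step, n_T=3)
```

The key point mirrored here is that each subproblem consumes only the output of its predecessor, never the full horizon at once.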
We conclude this section with some final remarks. First, we would like to emphasize that the specific choices of T and T_k in this section were made purely for clarity of exposition. In general, t_f need not be the last element of T and the partition can be chosen as coarse as desired, i.e., each T_k can comprise multiple time points. In that case, however, Algorithm 1 needs to be adjusted accordingly. Second, computing and representing the conic overapproximations in Algorithm 1 may be expensive, in particular if many moments are considered. For example, computing a polyhedral outer approximation of the positive semidefinite cone is known to converge exponentially slowly in the worst case 36 . Second-order cone approximations perform better empirically 34,37 and theoretically 38 , but are more expensive to compute and represent. On the other hand, it may not be necessary to find overapproximations that are globally tight, but only ones that are tight near the optimal solution of the original problem. Finally, with the decisions on the accuracy of the overapproximation and the coarseness of the partition of T required in Algorithm 1, one is left with mechanisms to trade off accuracy and computational cost.

B. Quantifying Approximation Quality
A natural question that arises from the formulation of problem (SDP) is how to choose the parameters required for its construction, i.e., the sets F and T, and the level n_I of the proposed constraint hierarchy. We will show that an approximation of Problem (OCP) can provide useful guidance for these choices. Specifically, using (OCP) as a baseline, we show that an approximation of problem (OCP) provides rigorous information on the best attainable bounds given that the truncation order m is fixed. To that end, recall that problem (OCP) requires optimization over an infinite dimensional vector space, namely C^∞(R_+). To overcome this challenge, we will make two restrictions. On the one hand, we will restrict our considerations to a compact interval [0, t_T] and, on the other hand, we will restrict the search space to the set of univariate polynomials up to a fixed but arbitrary maximum degree d ∈ Z_+. Note that the latter restriction is in some sense arbitrarily weak, as polynomials can approximate any continuous function on a compact interval to arbitrary accuracy 39 .
The above discussed restrictions enable the use of the following result to construct a tractable approximation of (OCP).
The maps α and β in Proposition 4 are remarkably simple, and freely available software tools for sum-of-squares programming allow for a simple, concise implementation. The interested reader is referred to Ahmadi and Khadir 40 for an explicit description of α and β alongside a simple proof of Proposition 4.
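As background, results of this type typically rest on the classical Markov–Lukács characterization of univariate polynomials that are non-negative on an interval; we recall it here (stated for [0, t_T]) as a standard result, not as the precise form of Proposition 4:

```latex
% Markov-Lukacs: non-negativity of a univariate polynomial on [0, t_T]
% admits an exact sum-of-squares certificate.
p(t) \ge 0 \quad \forall t \in [0, t_T]
\iff
\begin{cases}
p = \sigma_0 + t\,(t_T - t)\,\sigma_1, & \deg p = 2d,\\
p = t\,\sigma_0 + (t_T - t)\,\sigma_1, & \deg p = 2d + 1,
\end{cases}
```

where σ_0 and σ_1 are sums of squares (of degree at most 2d and 2d − 2, respectively, in the even case, and at most 2d in the odd case). Since a polynomial is a sum of squares if and only if it admits a positive semidefinite Gram matrix representation, each case translates into an LMI in the coefficients of p.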
Proposition 4 allows us to construct a tractable restriction of (OCP) on a compact horizon. The following theorem, which may be regarded as a special case of the results of Ahmadi and Khadir 40 , formalizes this claim.
Theorem 3. Let d ∈ Z_+. Then, the following semi-infinite optimization problem is equivalent to a finite SDP.
Proof. First, note that all equality constraints in the above optimization problem require equality of polynomials of fixed maximum degree. Accordingly, equality can be enforced by matching the coefficients of the polynomials when expressed in a common basis, which in turn can be done via finitely many affine equality constraints. Additionally, recall that C(X) is described in terms of finitely many LMIs. Thus, by Proposition 4, the constraint y(t) ∈ C(X) for all t ∈ [0, t_T] can be enforced via finitely many LMIs as well.
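The coefficient-matching step can be illustrated with a toy example; the coefficient vectors below are hypothetical and given in the monomial basis:

```python
def poly_eq_constraints(p, q):
    """Residuals whose vanishing encodes the polynomial identity p == q:
    pad both coefficient vectors to a common length (i.e., express both
    polynomials in a common basis) and subtract coefficient-wise, yielding
    finitely many affine equality constraints."""
    n = max(len(p), len(q))
    pad = lambda c: list(c) + [0.0] * (n - len(c))
    return [a - b for a, b in zip(pad(p), pad(q))]

# p(t) = 1 + 2t + 3t^2 equals q(t) = 1 + 2t + 3t^2, so all residuals vanish:
assert poly_eq_constraints([1, 2, 3], [1, 2, 3]) == [0, 0, 0]
# p(t) = 1 + 2t differs from q(t) = 1 + 2t + t^2 in the quadratic coefficient:
assert poly_eq_constraints([1, 2], [1, 2, 1]) == [0, 0, -1]
```

In an SDP, these residuals would appear as affine equality constraints on the decision variables rather than be evaluated numerically.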
Unfortunately, (pOCP) may be a strong restriction and is often even infeasible. However, the formulation of (pOCP) can be further relaxed without giving up too much relevant information. Specifically, we propose to restrict the solution space to piecewise polynomial functions, in analogy to the collocation approach to optimal control 41 . The following corollary to Theorem 3 formalizes this approach.
Corollary 3. Let d, n_T ∈ Z_+ and consider n_T + 1 time points t_0, . . ., t_{n_T} such that 0 = t_0 < t_1 < · · · < t_{n_T} ≤ t_T. Further suppose t_f ∈ [t_{k−1}, t_k] for some k. Then, the following semi-infinite optimization problem is equivalent to a finite SDP. Further, (pwpOCP) is a valid restriction of (SDP).
Proof. That (pwpOCP) is equivalent to a finite SDP follows immediately from Theorem 3. Further, let y_i be feasible for (pwpOCP) and consider the piecewise polynomial obtained by patching the y_i together as ỹ(t) = y_i(t), ∀t ∈ (t_{i−1}, t_i] and ∀i ∈ {1, . . ., n_T}. By construction, ỹ satisfies (mCME) and ỹ(t) ∈ C(X), ∀t ∈ [0, t_f]. Accordingly, the iterated generalized moments obtained from ỹ satisfy Conditions (i) and (ii) in Proposition 2. Thus, it is straightforward to generate a feasible point for (SDP) from ỹ.
Since (pwpOCP) is fully independent of the choice of F, T and n_I, it provides a way to rigorously check the approximation quality of (SDP) against the baseline of (OCP). This can guide the user's choice of the truncation order m and the parameters F, T and n_I. Specifically, the difference between the optimal values of (pwpOCP) and (SDP) quantifies the potential for improvement by adding elements to F and T or by moving to a higher level in the proposed hierarchy.

V. EXAMPLES
In this section, we present several case studies that demonstrate the effectiveness of the proposed bounding hierarchy. We put special emphasis on showcasing that the proposed method enables the computation of substantially tighter bounds than can be obtained by the method of Dowdy and Barton 20 . Throughout, we use the subscript DB to indicate any results obtained with Dowdy and Barton's method 20 and the subscript HB for those generated with the method presented in this paper.
A. Preliminaries

Reaction Kinetics
The reaction networks in all considered examples are assumed to follow mass action kinetics.
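For concreteness, under mass action kinetics the propensity of a reaction is its rate constant multiplied by the number of distinct combinations of its reactant molecules. A minimal sketch follows; the reactions and numbers below are hypothetical, not drawn from the case studies:

```python
from math import comb

def propensity(c, reactants, x):
    """Mass-action propensity a(x) = c * prod_i binom(x_i, nu_i): the rate
    constant c times the number of distinct reactant combinations, where
    nu_i is the stoichiometric coefficient of reactant species i."""
    a = c
    for species, nu in reactants.items():
        a *= comb(x[species], nu)
    return a

# Hypothetical reactions evaluated in a state with 10 A and 5 B molecules:
x = {"A": 10, "B": 5}
r1 = propensity(0.1, {"A": 1, "B": 1}, x)   # A + B -> ..., a = 0.1 * 10 * 5
r2 = propensity(2.0, {"A": 2}, x)           # 2A -> ...,    a = 2 * C(10, 2)
```

The binomial counting (rather than the plain power x^nu of deterministic mass action) is what makes the stochastic propensity vanish correctly when too few reactant molecules remain.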

State Space & LMIs
Following Dowdy and Barton 20 , we reduce the state space of every reaction network explicitly to the minimum number of independent species by eliminating reaction invariants. Further, we employ the set of LMIs suggested by Dowdy and Barton 20 . These comprise LMIs of the form (LMI), which reflect non-negativity of the underlying probability measure as well as non-negativity of the molecular counts of all species, including those eliminated via reaction invariants.

Hierarchy Parameters
Applying the proposed bounding scheme requires the user to specify a range of parameters, namely the truncation order m, the hierarchy level n_I, the test function set F and the set of time points T used to discretize the time domain. While all these hierarchy parameters can in principle be chosen arbitrarily (assuming the test functions satisfy the hypotheses of Propositions 2 and 3) and independently, a careful choice is essential to achieve a good trade-off between bound quality and computational cost. We discuss this issue in greater detail in Section VI. While we are at present not aware of a systematic way of choosing the hierarchy parameters optimally in the aforementioned sense, we found that the following set of simple heuristics performs well in practice:
• The set of time points T is chosen by equidistantly discretizing the entire time horizon [0, t_f], where t_f denotes the time point at which the bounds are to be evaluated, into n_T intervals.
• In line with Dowdy and Barton's original work 20 , we employ exponential test functions of the form g(t) = e^{ρ(t_T − t)}. As argued in Example 1, any set of test functions of this form satisfies the hypotheses of Propositions 2 and 3. Throughout, we choose t_T to coincide with the end of the time horizon on which the bounds are to be evaluated. As t_T merely controls the scale of the generalized moments generated by g, this choice is somewhat arbitrary, but in our experience it contributes to improved numerical conditioning of (SDP). For the choice of the parameters ρ, we draw motivation from linear time-invariant systems theory and choose ρ based on the singular values of the coefficient matrix A of the moment dynamics (mCME). Concretely, we choose the parameters ρ_1, . . ., ρ_{n_F} assembled from the smallest n_F unique singular values σ_1, . . ., σ_{n_F} of A.
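A minimal sketch of this heuristic is given below. The 2×2 coefficient matrix is hypothetical; in practice A comes from the truncated moment dynamics and its singular values would be computed with a linear algebra library rather than the hand-rolled 2×2 helper used here:

```python
from math import sqrt, exp

def singular_values_2x2(A):
    """Singular values of a 2x2 matrix: square roots of the eigenvalues
    of the Gram matrix G = A^T A, computed via its characteristic polynomial."""
    (a, b), (c, d) = A
    g11, g12, g22 = a*a + c*c, a*b + c*d, b*b + d*d
    tr, det = g11 + g22, g11*g22 - g12*g12
    disc = sqrt(max(tr*tr - 4*det, 0.0))
    return sorted(sqrt(max(lam, 0.0)) for lam in ((tr + disc)/2, (tr - disc)/2))

def make_test_functions(A, n_F, t_T):
    """Exponential test functions g(t) = exp(rho*(t_T - t)) with rho drawn
    from the smallest n_F unique singular values of A, per the heuristic
    described in the text."""
    rhos = sorted(set(singular_values_2x2(A)))[:n_F]
    return [lambda t, r=rho: exp(r * (t_T - t)) for rho in rhos]

A = [[-1.0, 0.5], [0.5, -2.0]]        # hypothetical coefficient matrix
gs = make_test_functions(A, n_F=2, t_T=1.0)
```

Note that every such test function satisfies g(t_T) = 1, so the scale parameter t_T only rescales the generalized moments, as remarked in the text.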
• Motivated by the scaling of the size of the bounding SDP (see Table I), we use the following greedy procedure to ultimately choose m, n_I, n_T and n_F:
1. Fix m = 2, n_I = 2, n_F = 1 and successively increase n_T until no significant bound tightening effect is observed.
2. Increase n F successively until no significant bound tightening effect is observed.
3. Increase m until bounds are sufficiently tight or computational cost exceeds a tolerable amount.
Note that the above procedure fixes the hierarchy level n_I at 2. While increasing n_I generally also has a bound tightening effect, in our experience it promotes numerical ill-conditioning and is rarely significantly more efficient than the other bound tightening mechanisms.
In all following case studies, we employ the above heuristics to choose the hierarchy parameters. As the time point and test function sets are systematically generated once the parameters n_T and n_F are chosen, we report these parameters in place of T and F.
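The greedy procedure above can be sketched as follows. Here `solve_bound` is a hypothetical callback standing in for assembling and solving (SDP) with the given parameters, the starting value n_T = 1 and the stopping tolerance are assumptions, and the toy bound function below exists only to exercise the loop:

```python
def greedy_parameters(solve_bound, tol=1e-3):
    """Greedy selection of (m, n_I, n_T, n_F): grow n_T, then n_F, then m,
    each until the lower bound stops improving by more than `tol`.
    `solve_bound(m, n_I, n_T, n_F)` is a hypothetical callback returning
    the lower bound furnished by (SDP) for the given parameters."""
    p = {"m": 2, "n_I": 2, "n_T": 1, "n_F": 1}
    best = solve_bound(**p)
    for key in ("n_T", "n_F", "m"):        # steps 1-3 of the procedure
        while True:
            trial = dict(p, **{key: p[key] + 1})
            cand = solve_bound(**trial)
            if cand - best <= tol:         # bound stalled: stop growing key
                break
            p, best = trial, cand
    return p, best

# Toy stand-in: a lower bound that saturates in each parameter.
toy = lambda m, n_I, n_T, n_F: 10 - 1/n_T - 1/n_F - 1/m
params, bound = greedy_parameters(toy, tol=1e-3)
```

The ordering n_T, then n_F, then m mirrors the cost ranking in Table I: the cheapest tightening mechanism is exhausted first.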

Numerical Considerations
Sum-of-squares and moment problems are notorious for poor numerical conditioning, and the problems (SDP) and (pwpOCP) are no exception. While the specific reasons for this problem remain largely unclear, it is widely suspected to originate from the fact that moments and the coefficients of polynomials expressed in the monomial basis often vary over several orders of magnitude. To circumvent this deficiency, we employ a simple scaling strategy for the decision variables in the bounding problems. This strategy is applicable whenever bounds are computed at multiple time points t_1 < t_2 < · · · along a trajectory and can be summarized as follows: we solve the SDPs in chronological order and scale the decision variables in the bounding problem corresponding to the time point t_k by the values attained at the solution of the bounding problem associated with the previous time point t_{k−1}. For the initial problem, we perform scaling based on the moments of the distribution of the initial state of the system. While an appropriate scaling of the decision variables is crucial to avoid numerical issues, it is not always sufficient to achieve convergence of the solver to a desired degree of accuracy with respect to optimality. We hedge against potentially inaccurate, suboptimal solutions and ensure validity of the computed bounds by verifying that the solver converged to a dual feasible point and reporting the associated dual objective value.
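A minimal sketch of the chronological scaling bookkeeping follows; `solve_sdp` is a hypothetical callback for the bounding problem at one time point, and the toy solver below only illustrates how each solution seeds the scaling of the next problem:

```python
def solve_scaled_sequence(solve_sdp, times, y0_moments, eps=1e-12):
    """Solve the bounding problems in chronological order, scaling the
    decision variables of each problem by the values attained at the
    solution of the previous one. `solve_sdp(t, scale)` is a hypothetical
    callback returning the moment vector bounding solution at time t.
    The floor `eps` guards against scaling by (near-)zero moments."""
    scale = [max(abs(m), eps) for m in y0_moments]   # initial problem: scale
    solutions = {}                                   # by the initial moments
    for t in times:
        y = solve_sdp(t, scale)
        solutions[t] = y
        scale = [max(abs(v), eps) for v in y]        # seed the next problem
    return solutions

# Toy stand-in: "moments" that grow by 50% per step; the hypothetical
# solver always sees decision variables scaled to order one.
toy_solver = lambda t, scale: [s * 1.5 for s in scale]
sols = solve_scaled_sequence(toy_solver, [0.1, 0.2, 0.3], [1.0, 2.0])
```

In the real implementation the scaling enters the SDP data, not the solution values; the sketch only captures the chronological propagation of scale factors.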

Implementation
All semidefinite programs solved for the case studies presented in this section were assembled using JuMP 42 and solved with MOSEK v9.0.97 31 .Our implementation is openly available at https://github.com/FHoltorf/StochMP.

B. Generic Examples
To contrast the performance of the proposed methodology with its predecessor, we first consider three generic reaction networks that were studied by Dowdy and Barton 20 .

Simple Reaction Network
First, we study the bound quality for the means and variances of the molecular counts of the species A and C in the simple reaction network

Figure 1 shows a comparison between the bounds obtained by both methods. For reference, the trajectories obtained with Gillespie's Stochastic Simulation Algorithm are also provided. The results showcase that the presented necessary moment conditions have the potential to tighten the obtained bounds significantly. In particular, the bounds on the variance of both species are dramatically improved at the relatively low truncation order of m = 4.
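For readers unfamiliar with it, Gillespie's SSA can be sketched in a few lines. The isomerization network below is a hypothetical stand-in, not the network studied in this section:

```python
import random
from math import log

def ssa(x, reactions, t_end, rng=random.Random(0)):
    """Gillespie's Stochastic Simulation Algorithm: repeatedly sample the
    waiting time to the next reaction (exponential with rate equal to the
    total propensity) and which reaction fires (with probability
    proportional to its propensity). `reactions` is a list of
    (propensity(x), state_change) pairs."""
    x, t = list(x), 0.0
    while True:
        props = [prop(x) for prop, _ in reactions]
        a0 = sum(props)
        if a0 == 0.0:
            return x                        # no reaction can fire
        t += -log(1.0 - rng.random()) / a0  # exponential waiting time
        if t > t_end:
            return x
        u, acc = rng.random() * a0, 0.0     # pick reaction j w.p. props[j]/a0
        for (_, change), p in zip(reactions, props):
            acc += p
            if u <= acc:
                x = [xi + d for xi, d in zip(x, change)]
                break

# Hypothetical isomerization A <-> B with both rate constants equal to 1:
reactions = [
    (lambda x: 1.0 * x[0], (-1, +1)),   # A -> B, propensity k1 * #A
    (lambda x: 1.0 * x[1], (+1, -1)),   # B -> A, propensity k2 * #B
]
x_final = ssa([20, 0], reactions, t_end=5.0)
```

Averaging many such trajectories yields the empirical moment estimates that the figures use as reference; unlike the bounds, these estimates carry sampling error.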

Cyclic Reaction Network
Second, we investigate the cyclic reaction network illustrated in Figure 2. This reaction network exhibits characteristics similar to the simple network studied in the previous section, i.e., a bounded state space and two independent species. In contrast to the simple network, however, the situation is a bit more complex since the network is fully reversible. Therefore, the reachable set of the cyclic reaction network is at least as large as that of the simple network given an identical initial state.
Figure 2 shows a comparison between the bounds obtained from (SDP) and Dowdy and Barton's method. Although the bounds are slightly looser than before, the results again demonstrate the potential of the proposed methodology.

Large Reaction Network
Last, we investigate the reaction network illustrated in Figure 4. In contrast to the previous networks, this large reaction network poses a challenge for sampling-based analysis techniques. The underlying reason for this is twofold. On the one hand, the network is characterized by a large, 7-dimensional state space. On the other hand, the system is extremely stiff. These properties frustrate sampling-based techniques as they exacerbate the need for large sample sizes and render each sample evaluation expensive.
Figure 5 shows bounds on the mean molecular counts of species A and H. In line with the results of the previous sections, the bounds obtained by the proposed method are again considerably tighter. In this example, however, this result carries more weight as increasing the truncation order leads to a prohibitive increase in problem size for the method of Dowdy and Barton 20 . Accordingly, the proposed method offers bounds of a quality that was previously not attainable for problems of such complexity.

C. Open Systems
We now depart from systems with a bounded state space and turn to open systems. Such systems are of particular interest as their moments can rarely be found analytically and the corresponding CME gives rise to a countably infinite system of coupled ODEs, precluding direct numerical integration. To showcase the ability to compute tight moment bounds for systems of this type, we study two birth-death processes with different "degrees of nonlinearity". The first system we investigate is the following simple nonlinear birth-death process:

Figure 6 draws a comparison between the bounds for the mean and variance of the molecular count of species A. The proposed method again yields substantially tighter bounds than its predecessor. In fact, the bounds on the mean are even tight enough to essentially recover the true solution.
Next, we consider Schlögl's system 43 :

As a canonical example of a chemical bifurcation, Schlögl's system was previously studied by different authors 16,18 to illustrate bounding methods for the moments of stationary solutions of the CME. We provide the first analysis of the dynamic case here. Figure 7 illustrates the results for Schlögl's system. Although the proposed methodology again strongly outperforms its predecessor, the obtained bounds are rather loose. We emphasize, however, that we could not reproduce bounds of similar quality by increasing the truncation order in Dowdy and Barton's 20 method. The bounds started stalling before poor numerical conditioning eventually prohibited solution of the SDPs altogether.

D. Biochemical Reaction Networks
To finally demonstrate that the proposed methodology may be useful in practice, we examine three reaction networks drawn from biochemical applications. In these applications, molecular counts are often on the order of tens to hundreds, necessitating the consideration of stochasticity.

Michaelis-Menten Kinetics
Michaelis-Menten kinetics underpin a vast range of metabolic processes. Understanding the behavior and noise present in the associated reaction networks is of particular value for the investigation of the metabolic degradation of trace substances in biological organisms. We examine the basic Michaelis-Menten reaction network:

The reaction network features a two-dimensional state space. Accordingly, we bound the means and variances of the molecular counts of the product P and substrate S. The results are illustrated in Figure 8. For the sake of completeness, Figure 8 also features a comparison with the bounds obtained by Dowdy and Barton's 20 method. The proposed method reproduces essentially the exact solution for the means while providing reasonably tight bounds for the variances. Further, it again outperforms its predecessor, especially for bounds on the variances.

Negative Feedback Biocircuit
Many efforts of modern synthetic biology culminate in the design of biocircuits subject to stringent constraints on robustness and performance. Upon successful design, the implications of such tailored biocircuits are often far-reaching, even addressing global challenges such as water pollution 44 and energy 45 . Accordingly, in recent years the use of systems-theoretic techniques to conceptualize, better understand and speed up the design process of biocircuits has received considerable attention 22 . In this context, Sakurai and Hori 24 demonstrated the utility of stationary moment bounds for the design of biocircuits subject to robustness constraints. Here, we demonstrate that the proposed methodology could enable an extension of their analysis to the dynamic case.
We examine the negative feedback biocircuit illustrated in Figure 9 and studied by Sakurai and Hori 24 . The corresponding reaction network is given by

DNA

Figure 10 illustrates the obtained bounds on the means and variances of the molecular counts of the species mRNA and P. The bounds are of high quality and may provide useful information for robustness analysis, as the noise level measured by the variance changes significantly over the time horizon until the steady-state value is reached.

Viral Infection
As our last example, we consider the following reaction network from Srivastava et al. 46 , used to model the intracellular kinetics of a virus:

In this network, S represents the viral structural protein while T and G represent the viral nucleic acids categorized as template and genomic, respectively. Studying a system undergoing the reactions described by the above network with sampling-based approaches is computationally expensive. This is due to two distinct reasons. On the one hand, the state space is infinite. On the other hand, the molecular counts of S, T and G vary over several orders of magnitude, so that they evolve at different time scales and noise levels. This characteristic is found in many biochemical reaction networks and has motivated the development of hybrid simulation techniques; see for example Haseltine and Rawlings 47 . Such hybrid approaches combine the use of deterministic models for the dynamics of species present at large counts with stochastic models for rare events and species present at low counts. Although such methods accelerate simulation substantially, they generally introduce a range of assumptions accompanied by an unknown error. The methodology presented in this article provides a framework to quantify this error. Figure 11 shows the bounds obtained for the mean molecular counts of all three species. Although the bounds are not tight, we argue that they may be informative enough to assess whether approximate solutions are reasonable. This is in stark contrast to Dowdy and Barton's method 20 , which in this case provides extremely loose bounds that could not be substantially improved due to numerical difficulties and prohibitive computational cost at high truncation orders.

VI. BOUND TIGHTENING MECHANISMS
In this section, we briefly assess the effect of the different bound tightening mechanisms provided by the proposed bounding hierarchy. We conduct this empirical analysis on the basis of the birth-death process (11). Furthermore, we restrict our considerations here to studying the effect of increasing the truncation order m, the hierarchy level n_I and the number of time points n_T.

Figure 12 shows the effect of isolated changes in the different hierarchy parameters on the bounds obtained for the mean molecular count and its variance. The results indicate that all bound tightening mechanisms, when used in isolation, appear to suffer from diminishing returns, eventually causing the bounds to stall. Moreover, solely increasing the truncation order m appears insufficient to provide informative bounds over a long time horizon in this example; increasing either the number of time points or the hierarchy level n_I is significantly more effective in comparison.
Figure 13 shows the effect of joint changes in the considered hierarchy parameters on the tightness of bounds on the mean molecular count. The figure indicates that jointly changing the hierarchy parameters effectively mitigates stalling of the bounds in this example, such that significantly tighter bounds are obtained overall. While the general trends illustrated by Figures 12 and 13 align well with our experience for a range of other examples, we wish to emphasize that it is in general hard to predict which combination of hierarchy parameters provides the best trade-off between computational cost and bound quality; when the choice of test functions is added to the equation, the situation becomes even more complicated. Moreover, as Figure 14 illustrates, the feasible region of the bounding SDPs shrinks anisotropically and, more importantly, with different intensity along different directions for different bound tightening mechanisms. As a consequence, the optimal choice of the hierarchy parameters is in general not only dependent on the system under investigation but also on the statistical quantity to be bounded. In summary, the results presented in this section underline the value of the additional bound tightening mechanisms offered by the proposed hierarchy; however, they also emphasize the need for better guidelines to enable an effective use of these mechanisms in practice.

VII. CONCLUSION A. Summary
We have extended the results of Dowdy and Barton 20 by constructing a new hierarchy of convex necessary moment conditions for the moment trajectories of stochastic chemical systems described by the CME. Building on a discretization of the time domain of the problem, the conditions reflect temporal causality and regularity properties of the true moment trajectories. It is proved that the conditions give rise to a hierarchy of highly structured SDPs whose optimal values form a sequence of monotonically improving bounds on the true moment trajectories. Furthermore, the conditions provide new mechanisms to tighten the obtained bounds when compared to the original conditions proposed by Dowdy and Barton 20 . These tightening mechanisms are often a substantially more scalable alternative to the primary tightening mechanism of increasing the truncation order in Dowdy and Barton's approach 20 ; most notably, refining the time discretization results in linearly increasing problem sizes, independent of the state space dimension of the system. As an additional advantage, this bound tightening mechanism provides a way to sidestep the poor numerical conditioning of moment-based SDPs featuring high-order moments. Finally, it is demonstrated with several examples that the proposed hierarchy provides bounds that may indeed be useful in practice.

B. Open Questions
We close by stating some open questions motivated by our results.

1. In the presented case studies, we naively chose the time points at which the proposed necessary moment conditions are imposed to be equidistant. Several results from numerical integration, perhaps most notably Gauß quadrature, suggest that this is likely not the optimal choice. It would be interesting to examine if and how results from numerical integration can inform improvements of this choice.
2. The choice of the hierarchy parameters in the proposed bounding scheme is crucial to achieve a good trade-off between bound quality and computational cost. As indicated by the discussion in Section VI, however, the interplay between the bound tightening mechanisms associated with the different hierarchy parameters and their effect on the bound quality remains poorly understood. Accordingly, we believe that assessing the trade-offs offered by the different bound tightening mechanisms in greater detail and developing more rigorous guidelines on how to utilize them effectively constitutes an important step towards improving the practicality of the proposed method.
3. The ideas discussed in Section IV constitute promising research avenues towards improving the practicality of the proposed method. Specifically, there are several open questions pertaining to the concrete implementation of Algorithm 1 and the way Problem (pwpOCP) can be used to inform an effective use of the different bound tightening mechanisms. Furthermore, the decomposable, weakly coupled structure of the bounding SDPs motivates other forms of exploitation than Algorithm 1; in particular, the use of distributed optimization techniques such as ADMM 48 or Schwarz-like approaches 49,50 appears promising.
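To illustrate the distributed-optimization idea in the last item, here is a minimal consensus-ADMM sketch in Python (NumPy assumed; all names hypothetical). It solves a toy scalar problem, not the bounding SDPs: each "block" holds a local quadratic objective, loosely mimicking weakly coupled per-time-point subproblems, and the blocks are driven to agreement on a shared variable.

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=200):
    """Toy consensus ADMM: minimize sum_i 0.5*(x - a_i)^2 over a shared x.

    The minimizer is the mean of a; each a_i plays the role of a local block.
    """
    n = len(a)
    x = np.zeros(n)   # local copies of the shared variable, one per block
    z = 0.0           # global consensus variable
    u = np.zeros(n)   # scaled dual variables
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # local proximal steps (closed form)
        z = np.mean(x + u)                     # consensus (averaging) step
        u = u + x - z                          # dual update
    return z

a = np.array([5.5, 6.0, 8.5])
z_star = consensus_admm(a)  # converges to np.mean(a)
```

In a distributed implementation, the local proximal steps decouple across blocks and can be solved in parallel, with only the averaging step requiring communication.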
and finally using that
$$\mathbf{A}\mathbf{z}_{k+1}(g;t_i) + \mathbf{K}\mathbf{z}_{k+1}(g';t_i) = \mathbf{K}\mathbf{z}_k(g;t_i) - \frac{(t_2 - t_1)^k}{k!}\, g(0)\mathbf{y}(0)$$
holds for all $k \in \{0, \dots, n_I\}$ and $i \in \{1, 2\}$ with $\mathbf{z}_0(g;t_i) = g(t_i)\mathbf{y}(t_i)$, it is easily verified that (6) is indeed satisfied for $\mathbf{z}(h_l g; t_2)$ as defined in (8).

PROOF OF COROLLARY 2
Our proof of Corollary 2 relies in large part on explicit computations. To that end, it is first essential to provide an explicit algebraic relation for $\Omega_{l,k}(\{\mathbf{z}_s(g;t_1)\}_{s=1}^{l}, \{\mathbf{z}_s(g;t_2)\}_{s=1}^{l}, t_1, t_2)$. Recall that $\Omega_{k,l}$ was introduced to compactly denote expressions of the form $(I_n^L I_m^R \mathbf{f})(t_1, t_2)$ with $\mathbf{f}(x, y) = \mathbf{z}_1(g; y) - \mathbf{z}_1(g; x)$. To avoid unnecessary clutter of notation, we will give an explicit algebraic expression for the latter; adjusting the indices to obtain an expression for $\Omega_{l,k}$ is straightforward and can be found in the proof of Corollary 2. It is easily verified by induction that for any $m, n \in \mathbb{Z}_+$, the following identity holds. With this in hand, we will first prove the following lemma, which answers the essential question behind Corollary 2.
Lemma 2. Let $\mathbf{z}_l$ be a sequence of functions that satisfies $\mathbf{z}_l(t) = \int_0^t \mathbf{z}_{l-1}(\tau)\,\mathrm{d}\tau$ for $l \geq 2$. Further, define $\mathbf{f}(x, y) = \mathbf{z}_1(y) - \mathbf{z}_1(x)$ and consider a convex cone $C$. Then, for any $0 \leq t_1 \leq t_2 \leq t_3 < +\infty$ and $n, m \in \mathbb{Z}_+$, the following implications hold: and Proof. We focus on proving (10). To that end, note that in order to establish (10), it suffices to show that
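The iterated-integral relation underlying Lemma 2 can be sanity-checked numerically. The following snippet (purely illustrative, not part of the proof; NumPy assumed, all names ours) builds $\mathbf{z}_2, \dots, \mathbf{z}_4$ from the concrete choice $z_1(t) = t$ by cumulative trapezoidal integration and compares against the resulting closed form $z_l(t) = t^l / l!$.

```python
import numpy as np

# With z_1(t) = t, the recursion z_l(t) = \int_0^t z_{l-1}(tau) dtau
# yields z_l(t) = t**l / l! in closed form.
t = np.linspace(0.0, 2.0, 20001)  # fine grid on [0, 2]
z = t.copy()                      # z_1 sampled on the grid
for l in range(2, 5):             # construct z_2, z_3, z_4 in turn
    slices = 0.5 * (z[1:] + z[:-1]) * np.diff(t)       # trapezoidal slices
    z = np.concatenate(([0.0], np.cumsum(slices)))     # cumulative integral, z_l(0) = 0
# compare z_4(2) against the closed form 2**4 / 4! = 2/3
error = abs(z[-1] - 2.0**4 / 24.0)
```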

FIG. 12: Bounds on the trajectories of the mean molecular count and variance of the birth-death process (11) for increasing $m$ (a,b), $n_I$ (c,d), and $n_T$ (e,f), compared against the empirical sample mean and variance generated with Gillespie's SSA. In each figure, only one parameter is varied while the others are held constant at the level indicated in the subcaptions.

FIG. 13: Maximum gap between upper and lower bounds on mean molecular counts among the time points probed along the time horizon for joint changes in $m$ and $n_I$ (a), $m$ and $n_T$ (b), and $n_I$ and $n_T$ (c).

FIG. 14: Projection of the feasible set of (SDP) corresponding to the birth-death process (11) for (a) increasing truncation order, (b) increasing number of time points $n_T$, and (c) jointly increasing truncation order and number of time points. All projections are obtained for $n_I = 2$ and $t_f = 20$.