1 Introduction

The analysis of systems undergoing chemical reactions lies at the heart of many scientific and engineering activities. While deterministic models have proved adequate for the analysis of systems at the macroscopic scale, they often fall short for meso- and microscopic systems, in particular for those that feature low molecular counts. In this regime, the complex and chaotic motion of molecules reacting upon collision causes effectively stochastic fluctuations of the molecular counts that are large compared to the mean and, as a consequence, can have a profound effect on the system’s characteristics—a situation frequently encountered in cellular biology [7, 8, 24, 49]. In the context of the continuously growing capabilities of synthetic biology, this fact motivates the use of stochastic models for the identification, design and control of biochemical reaction networks. However, while in these applications stochastic models provide the essential fidelity relative to their deterministic counterparts, their analysis is generally more involved, often requiring heuristic approximations and simplifications. In this article, we address this gap by presenting a convex optimization-based framework for the rigorous quantification of uncertainty in stochastic reaction networks.

Stochastic chemical systems are canonically modeled as jump processes; the system state, as encoded by the molecular counts of the individual chemical species, is modeled to change discretely in response to reaction events as triggered by the arrivals of Poisson processes whose rates depend on the underlying reaction mechanism. The Chemical Master Equation (CME) is an ordinary differential equation that describes how the probability distribution of the state of such a process evolves over time. Specifically, a solution of the CME tracks the probability of observing the system in any reachable state over time. This poses a major challenge for the analysis of stochastic reaction networks in practice, as such systems routinely feature millions or even infinitely many reachable states, rendering a direct solution of the CME intractable. As a consequence, sampling techniques such as Gillespie’s Stochastic Simulation Algorithm [27, 28] have become the most prominent approach for the analysis of systems described by the CME. Although sampling techniques perform remarkably well across a wide range of problems, they are inadequate in certain settings. Most notably, they do not scale well for stiff systems, generally do not provide hard error bounds, and the evaluation of sensitivity information is challenging [30]. In particular, the latter two shortcomings limit their utility in the context of identification, design and control. Approaches based on finite state projection [50] come with guaranteed error bounds and straightforward sensitivity evaluation; however, they generally suffer severely from a large number of reachable states.
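For readers who prefer a concrete reference point, the following is a minimal sketch of Gillespie's direct-method SSA applied to a hypothetical birth-death network; the network and all rate values are illustrative assumptions, not a system studied in this article.

```python
import numpy as np

def ssa(x0, nu, propensities, t_final, seed=None):
    """Gillespie's direct method: simulate one trajectory up to t_final.

    x0           -- initial molecular counts (length-n array)
    nu           -- (n_R, n) array of stoichiometric change vectors
    propensities -- function mapping a state x to its n_R propensity values
    """
    rng = np.random.default_rng(seed)
    t, x = 0.0, np.asarray(x0, dtype=float).copy()
    times, states = [t], [x.copy()]
    while True:
        a = propensities(x)
        a0 = a.sum()
        if a0 == 0.0:                          # absorbing state reached
            break
        t += rng.exponential(1.0 / a0)         # waiting time ~ Exp(a0)
        if t > t_final:
            break
        x += nu[rng.choice(len(a), p=a / a0)]  # fire one reaction
        times.append(t); states.append(x.copy())
    return np.array(times), np.array(states)

# Birth-death example: 0 -> S at rate k1, S -> 0 at rate k2*x.
k1, k2 = 10.0, 1.0
t, x = ssa([0], np.array([[1], [-1]]),
           lambda x: np.array([k1, k2 * x[0]]), t_final=10.0, seed=0)
```

Each sample path is unbiased, but statistics estimated from finitely many paths carry sampling error with no hard bound, which is precisely the shortcoming the bounding framework discussed below avoids.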

From a practical perspective, stochastic reaction networks are often sufficiently characterized by only a few low-order moments of the distribution of the state, for example through means and variances. In that case, tractability of the CME may be recovered by solving for the moments of its solution directly. The dynamics of a finite sequence of moments associated with the distribution described by the CME, however, generally do not form a closed system of differential equations and hence do not admit a solution by simple (numerical) integration. Numerous moment closure approximations [5, 40, 52, 67] have been proposed to remedy this problem. A major shortcoming of moment closure approximations, however, is that they generally rely on unverifiable assumptions and therefore introduce an uncontrolled error. In fact, it is well-known that their application can lead to unphysical results such as spurious oscillations and negative mean molecular counts, especially when applied to systems with low molecular counts [31, 63, 64].

In order to address the shortcomings of moment closure approximations while preserving the advantages of the moment-based description, several authors have recently proposed schemes for the computation of theoretically guaranteed bounds for the moments (or related statistics) associated with stochastic reaction networks involving low molecular counts; such bounding schemes have been proposed for stationary [21, 25, 43, 59, 62], transient [19, 22, 61] and exit time distributions [10] of stochastic chemical systems and analogous techniques have been successfully applied for the study of other types of stochastic processes [32, 33, 35, 39, 42, 48]. The key insight underpinning all these bounding schemes is that the moment-sum-of-squares hierarchy [45, 54] generates a rich set of convex conic conditions characterizing the true moments associated with stochastic chemical systems in many practically relevant settings. This insight enables the use of conic optimization to approximate the true moment sequence by one that conforms with a finite subset of these “necessary moment conditions” while maximizing (or minimizing) a moment of interest to bound its true value. By imposing necessary moment conditions involving higher and higher-order moments, a sequence of monotonically improving bounds of the true moment of interest is generated.

The unique ability to rigorously quantify errors and uncertainty makes bounding schemes particularly attractive for tasks such as robustness analysis and the verification of approximation techniques like moment closure approximations. The utility of moment bounds for such tasks, however, is directly tied to the bound quality. While for stationary moments the existing bounding schemes are found to produce remarkably good bounds in a wide range of cases [21, 25, 43, 59, 60], bounds on transient moments are often lacking in tightness, even for simple reaction networks [22, 61]. For transient problems, necessary moment conditions involving moments of rather high order need to be considered to obtain informative bounds. This imposes strong practical limitations as the bounding schemes suffer from the curse of dimensionality, i.e., the size of the bounding problems grows combinatorially with the order of moments considered and the number of chemical species in the system. These limitations are exacerbated by the notorious numerical ill-conditioning of moment problems [10, 20, 21]. In order to address this issue, in this work we extend the bounding scheme proposed by Dowdy and Barton [22] for transient moments of the solutions of the CME. We devise a moment bounding scheme involving a new hierarchy of necessary moment conditions which is motivated by the characteristics of stochastic chemical systems and not generally considered in the standard moment-sum-of-squares hierarchy. In broad strokes, these conditions arise from a partitioning of the time domain in a way that is akin to discretization techniques commonly used for solving differential equations. As such, they reflect the temporal causality that is inherent to solutions of the CME. These conditions give rise to new bound tightening mechanisms beyond increasing the order of moments considered. For example, a simple refinement of the time domain partition results in a strengthening of the conditions and thus acts as a bound tightening mechanism. Critically, these bound tightening mechanisms avoid augmentation of the order of moments considered and thus enjoy favorable scaling properties in addition to alleviating numerical difficulties associated with the consideration of high-order moments. While these mechanisms do not obviate increasing the moment truncation order and so do not avoid the curse of dimensionality entirely, we find that they greatly improve upon the practicality of Dowdy and Barton’s [22] proposal. Cibulka et al. [17] recently studied a closely related spatio-temporal partitioning approach in the context of overapproximating the region of attraction of deterministic control systems via sum-of-squares techniques [36] and report similar improvements in terms of practicality. While their approach is similar in spirit to ours, it takes a different (dual) perspective of approximating continuous functions by piecewise polynomials as opposed to the moment-centered view presented here. Moreover, their approach is geared toward deterministic control systems and does not apply without modification to bounding the moment trajectories of stochastic chemical systems.

The remainder of this article is organized as follows. In Sect. 2, we introduce definitions and assumptions, formally define the problem of bounding the transient moments of stochastic chemical systems and review essential preliminaries. Section 3 is devoted to the development and analysis of the proposed hierarchy of necessary moment conditions. In Sect. 4, we discuss several aspects pertaining to the use of these conditions for computation of moment bounds in practice. The utility of the resultant bounding scheme is demonstrated with several examples in Sect. 5 before we conclude with some open questions in Sect. 6.

2 Preliminaries

2.1 Notation

We denote scalars with lowercase symbols without emphasis, while vectors and matrices are denoted by bold lower- and uppercase symbols, respectively. Throughout, vectors are assumed to be column vectors. Generic sets are denoted by uppercase symbols without emphasis. For special or commonly used sets, we use the standard notation; for example, the (non-negative) n-dimensional reals and integers are denoted by \(\mathbb {R}^n\) (\(\mathbb {R}^n_+\)) and \(\mathbb {Z}^n\) (\(\mathbb {Z}^n_+\)), respectively. Similarly, we refer to the set of symmetric and symmetric positive semidefinite (psd) n-by-n matrices with \(\mathbb {S}^n\) and \(\mathbb {S}^n_+\), respectively, and use the usual shorthand notation \(\varvec{A}\succeq \varvec{B}\) for \(\varvec{A}-\varvec{B}\in \mathbb {S}_+^n\). The set of n-dimensional vector and symmetric matrix polynomials with real coefficients (of degree at most k) in the variables \(\varvec{x} = [x_1 \ \dots \ x_n]^\top \) will be denoted by \(\mathbb {R}^n[\varvec{x}]\) (\(\mathbb {R}^n_k[\varvec{x}]\)) and \(\mathbb {S}^n[\varvec{x}]\) (\(\mathbb {S}^n_k[\varvec{x}]\)), respectively. In order to concisely denote multivariate monomials, we employ the multi-index notation: for a monomial in n variables corresponding to the multi-index \(\varvec{j} = [j_1\,\dots \,j_n]^{\top } \in \mathbb {Z}_+^n\), we write \(\varvec{x}^{\varvec{j}} = \prod _{i=1}^{n} x_i^{j_i}\). The indicator function of a set A is denoted by \(\mathbbm {1}_{A}\). Lastly, we denote the set of n times continuously differentiable functions on an interval \(I\subset \mathbb {R}\) by \(\mathcal {C}^n(I)\) while the set of absolutely continuous functions is denoted by \(\mathcal{A}\mathcal{C}(I)\). The remaining symbols will be defined as they are introduced.

2.2 Problem Statement, Definitions & Assumptions

We consider a reaction system featuring n chemical species \(S_1,\dots ,S_n\) undergoing \(n_R\) different reactions. The system state \(\varvec{x}\) is encoded by the molecular counts of the individual species, i.e., \(\varvec{x} = [x_1\, \dots \, x_n]^{\top } \in \mathbb {Z}^n_+\). It changes in response to reaction events according to a given stoichiometry:

$$\begin{aligned} \nu ^{-}_{1,r}S_1 + \cdots + \nu ^{-}_{n,r}S_n \rightarrow \nu ^{+}_{1,r} S_1 + \cdots + \nu ^{+}_{n,r} S_n, \quad r = 1, \dots , n_R. \end{aligned}$$

In other words, the system state changes by \(\varvec{\nu }_r = [\nu _{1,r}^+-\nu _{1,r}^- \ \dots \ \nu _{n,r}^+-\nu _{n,r}^-]^{\top }\in \mathbb {Z}^n\) upon the event of reaction r. We will restrict ourselves to the framework of stochastic chemical kinetics for modeling such systems.

The notion of stochastic chemical kinetics treats the positions and velocities of all molecules in the system as random variables; reactions are assumed to occur upon collision with a prescribed probability. Consequently, the evolution of the system state is a continuous-time jump process. Here, we will assume that this jump process can be described by the Chemical Master Equation (CME).

Assumption 1

Let \(P_{\pi }(\varvec{x},t)\) be the probability to observe the system in state \(\varvec{x}\) at time t given the distribution \(\pi \) of the initial state of the system. Then, \(P_{\pi }(\varvec{x},t)\) satisfies

$$\begin{aligned} \frac{\partial P_{\pi }}{\partial t}(\varvec{x},t) = \sum _{r=1}^{n_R} a_r(\varvec{x}-\varvec{\nu }_r) P_{\pi }(\varvec{x}-\varvec{\nu }_r,t) - a_r(\varvec{x}) P_{\pi }(\varvec{x}, t), \ P_{\pi }(\cdot ,0) = \pi , \end{aligned}$$
(CME)

where \(a_r\) denotes the propensity of reaction r, i.e., \(a_r(\varvec{x})\,dt\) quantifies the probability that reaction r occurs within \([t, t+dt)\) as \(dt\rightarrow 0\), given that the system is in state \(\varvec{x}\) at time t.

Moreover, we will restrict our considerations to the case of polynomial reaction propensities.

Assumption 2

The reaction propensities \(a_r\) in (CME) are polynomials.

To ensure the moment trajectories remain well-defined at all times, we will further assume that the stochastic process is well-behaved in the following sense.

Assumption 3

The number of reaction events occurring in the system within finite time is finite with probability 1.

A consequence of Assumption 3 is that the continuous-time jump process associated with (CME) is regular [57], i.e., it does not explode in finite time. We wish to emphasize that Assumptions 1–3 are rather weak; Assumptions 1 and 2 are in line with widely accepted microscopic models [29], while Assumption 3 should intuitively be satisfied for any practically relevant system for which the CME is a reasonable modeling approach. Furthermore, Assumption 3 is formally necessary for (CME) to be valid on an indefinite time horizon [57]. For a detailed, physically motivated derivation of the CME alongside discussion of the underlying assumptions and potential relaxations thereof, the interested reader is referred to Gillespie [29].
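As an illustration of Assumption 1, the sketch below assembles the right-hand side of the CME for a hypothetical birth-death process on a truncated state space and integrates it directly; the truncation bound N and the rate values are assumptions made purely so that the ODE system is finite and concrete. For realistic networks the state space is far too large for this brute-force approach, which motivates the moment-based treatment that follows.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Birth-death process on the truncated state space {0, ..., N}:
# a_1(x) = k1 (birth, x -> x+1) and a_2(x) = k2*x (death, x -> x-1).
k1, k2, N = 10.0, 1.0, 60
states = np.arange(N + 1)

# Assemble the generator Q so that the (truncated) CME reads dP/dt = Q @ P;
# column x collects the gain/loss terms of the master equation for state x.
Q = np.zeros((N + 1, N + 1))
for x in states:
    if x < N:                     # birth
        Q[x + 1, x] += k1
        Q[x, x] -= k1
    if x > 0:                     # death
        Q[x - 1, x] += k2 * x
        Q[x, x] -= k2 * x

P0 = np.zeros(N + 1); P0[0] = 1.0           # pi puts all mass on x = 0
sol = solve_ivp(lambda t, P: Q @ P, (0.0, 10.0), P0, rtol=1e-8, atol=1e-10)
mean = states @ sol.y                        # E[x](t) along sol.t
```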

Instead of studying the probability distribution \(P_{\pi }\) as a description of the system behavior, in this paper we will focus on its moments defined as follows.

Definition 2.1

Let X be the reachable set of the system, i.e., \(X = \{ \varvec{x} \in \mathbb {Z}_+^n \mid \exists t \ge 0: P_{\pi }(\varvec{x},t) > 0 \}\), and \(\varvec{j} \in \mathbb {Z}_+^n\) be a multi-index. The \(\varvec{j}\)th moment of \(P_{\pi }(\cdot ,t)\) is defined as \(y_{\varvec{j}}(t) = \sum _{\varvec{x}\in X} \varvec{x}^{\varvec{j}} P_{\pi }(\varvec{x},t)\). \(y_{\varvec{j}}\) is said to be of order \(|\varvec{j}| = \sum _{i=1}^n j_i\). The function \(y_{\varvec{j}}(\cdot )\) is called the trajectory of the \(\varvec{j}\)th moment or the \(\varvec{j}\)th transient moment. If \(y_{\varvec{j}}\) further converges to a stationary value in the limit of long times, we refer to that value as the corresponding stationary moment.

Additionally, it will prove useful to introduce the following notion of generalized moments.

Definition 2.2

Let \(y_{\varvec{j}}\) be as in Definition 2.1 and \(t_T > 0\). Consider a uniformly bounded Lebesgue integrable function \(g:[0,t_T] \rightarrow \mathbb {R}\). The \(\varvec{j}\)th generalized moment of \(P_{\pi }\) with respect to g is defined by \(z_{\varvec{j}}(g;t) = \int _{0}^{t} g(\tau )y_{\varvec{j}}(\tau ) \,\, d\tau \) for \(t\in [0,t_T]\). We say g is a test function and generates \(z_{\varvec{j}}(g;t)\).

Under Assumptions 1 and 2, it is well-known that the dynamics of the \(\varvec{j}\)th moment are described by a linear time-invariant ordinary differential equation (ODE) of the form

$$\begin{aligned} \frac{dy_{\varvec{j}}}{dt}(t) = \sum _{|\varvec{k}|\le |\varvec{j}| + q} c_{\varvec{k}} y_{\varvec{k}}(t) = \varvec{c}^{\top } \varvec{y}(t), \end{aligned}$$
(1)

where \(q = \max _{1 \le r \le n_R} \text {deg}(a_r) - 1\). The coefficient vector \(\varvec{c}\) can be readily computed from the reaction propensities and stoichiometry; see for example Gillespie [26] for details. For \(q>0\), it is clear from Eq. (1) that the dynamics of moments of a certain order in general depend on moments of a higher order. This issue is commonly termed the moment closure problem. If we denote by \(\varvec{y}_L\) the vector of “lower-order” moments up to a specified order, say m, and by \(\varvec{y}_H\) the vector of “higher-order” moments of order \(m+1\) to \(m+q\), Eq. (1) yields a linear time-invariant ODE system of the form

$$\begin{aligned} \frac{d\varvec{y}_L}{dt}(t) = \varvec{A}_L \varvec{y}_L(t) + \varvec{A}_H \varvec{y}_H(t) \end{aligned}$$

with \(\varvec{A}_L \in \mathbb {R}^{n_L\times n_L}\) and \(\varvec{A}_H \in \mathbb {R}^{n_L\times n_H}\), where \(n_L=\binom{n+m}{n}\) and \(n_H = \binom{n+m+q}{n} - n_L\) denote the number of lower- and higher-order moments, respectively. For the sake of a more concise notation, throughout we will often omit these subscripts and instead write

$$\begin{aligned} \varvec{K} \frac{d\varvec{y}}{dt}(t) = \varvec{A} \varvec{y}(t), \end{aligned}$$
(mCME)

where \(\varvec{A} = \left[ \varvec{A}_L \,\, \varvec{A}_H\right] \), \(\varvec{K} = \left[ \varvec{I}_{n_L \times n_L} \,\, \varvec{0}_{n_L \times n_H}\right] \) and \(\varvec{y}= \left[ \varvec{y}_L^{\top }\, \varvec{y}_H^{\top }\right] ^{\top }\).
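To make Eq. (1) and the closure problem concrete, the following sympy sketch expands the moment dynamics for a hypothetical dimerization reaction \(2S \rightarrow \emptyset \) with mass-action propensity \(a(x) = c\,x(x-1)/2\) (so \(q=1\)); taking expectations of the printed polynomials reads off the coefficient vector \(\varvec{c}\).

```python
import sympy as sp

# d/dt E[x^j] = E[ a(x) * ((x + nu)^j - x^j) ]  for a single reaction
# with propensity a and state change nu (here: 2S -> 0, nu = -2).
x, c = sp.symbols('x c')
a, nu = c * x * (x - 1) / 2, -2

for j in (1, 2):
    print(f"d/dt E[x^{j}] =", sp.expand(a * ((x + nu)**j - x**j)))
# d/dt E[x^1] = -c*x**2 + c*x                   -> depends on E[x^2]
# d/dt E[x^2] = -2*c*x**3 + 4*c*x**2 - 2*c*x    -> depends on E[x^3]
```

The order-j dynamics involve moments up to order j + 1, so no finite truncation closes: exactly the situation formalized by (mCME).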

In the presence of the moment closure problem, Eq. (mCME) alone clearly does not provide sufficient information to uniquely determine the moment trajectories associated with the solution of (CME). In the following, we therefore address the question of how to compute hard, theoretically guaranteed bounds on the true moment trajectory \(y_{\varvec{j}}(\cdot )\) associated with the solution of (CME) in this setting. To that end, we build on the work of Dowdy and Barton [22] who have recently proposed an approach to answer this question. In broad strokes, they generate upper and lower bounds by optimizing a moment sequence truncated at a given order subject to a set of necessary moment conditions, i.e., conditions that the true moment trajectories are guaranteed to satisfy. By increasing the truncation order, the bounds can be improved. Our contribution is an extension of Dowdy and Barton’s work in the form of a hierarchy of new necessary moment conditions based upon partitioning of the time domain of the problem. We show that these conditions provide additional, more scalable bound tightening mechanisms beyond increasing the truncation order.

2.3 Necessary Moment Conditions

The bounding method proposed by Dowdy and Barton [22] hinges on necessary moment conditions which restrict the set of potential solutions of Eq. (mCME) as much as possible while remaining computationally tractable. Necessary moment conditions in the form of affine constraints and linear matrix inequalities (LMI) have proved to fit that bill [38, 46]. Conditions of this form are of particular practical value as they allow for the computation of the desired bounds via semidefinite programming (SDP). In fact, analogous conditions to those proposed by Dowdy and Barton [22] have been employed for a wide range of applications concerned with the moment-based study of dynamical systems [38], for example for optimal control [37, 47], region of attraction computation [17, 36], the analysis of PDEs [11, 41] or options pricing [48]. In general, affine moment conditions arise from the system dynamics, while the LMIs reflect constraints on the support of the underlying probability distributions. In the following, we will sketch the derivation of these conditions and highlight key properties which will be leveraged for the construction of new necessary moment conditions in Sect. 3.

2.3.1 Linear Matrix Inequalities

The fact that the solution of the CME, \(P_{\pi }(\cdot ,t)\), is a non-negative measure on \(\mathbb {R}^n\) and supported only on the reachable set X of the underlying reaction system implies that its truncated moment sequences satisfy certain LMIs [21, 25, 43, 45, 59]. The following argument reveals this fact: Consider a polynomial \(f \in \mathbb {R}[\varvec{x}]\) that is non-negative on X; further, let \(\varvec{b}\) be the vector polynomial whose elements are the monomials of degree at most \(d = \lfloor \frac{m + q - \text {deg}(f)}{2} \rfloor \). The following generalized inequality

$$\begin{aligned} \mathbb {E}\big [f\varvec{b}\varvec{b}^{\top } \big ] \succeq P_{\pi }(\hat{\varvec{x}},t)f(\hat{\varvec{x}})\varvec{b}(\hat{\varvec{x}})\varvec{b}(\hat{\varvec{x}})^{\top } \succeq \varvec{0}, \quad \forall (\hat{\varvec{x}},t) \in \mathbb {R}^n\times \mathbb {R}_+, \end{aligned}$$

where \(\mathbb {E}\) denotes the expectation with respect to \(P_{\pi }(\cdot , t)\), follows immediately. It is easy to verify that the above relation can be concisely written as an LMI involving the moment trajectory of \(P_\pi \). Concretely, we can write

$$\begin{aligned} \varvec{M}_f(\varvec{y}(t)) \succeq \varvec{0}, \end{aligned}$$
(LMI)

where \(\varvec{M}_f:\mathbb {R}^{n_L+n_H} \rightarrow \mathbb {S}^{\binom{n+d}{d}}\) is a linear map. The precise structure of \(\varvec{M}_f\) depends on f and is immaterial for all arguments presented in this paper; however, the interested reader is referred to Lasserre [46] or Dowdy and Barton [21] for a detailed and formal description of the structure of \(\varvec{M}_f\). As is clear from the above argument, the construction of valid LMIs of form (LMI) relies merely on polynomials that are non-negative on X. For stochastic chemical systems, natural choices of such polynomials, reflecting that \(P_{\pi }\) is non-negative and in particular not supported on states with negative molecular counts, are \(f(\varvec{x}) = 1\) and \(f(\varvec{x}) = x_i\) for \(i = 1,\dots ,n\) [21, 25, 43, 59]. More generally, the support of \(P_{\pi }(\cdot , t)\) on any basic closed semialgebraic set can be reflected this way, most importantly including the special cases of polyhedra and bounded integer lattices. To account for this flexibility while simplifying notation, we will make use of the following definition and shorthand notation.

Definition 2.3

Let \(f_0, \dots , f_{n_p}\) be polynomials that are non-negative on the reachable set X. The convex cone described by LMIs generated by these polynomials is denoted by C(X), i.e., \(C(X) = \{ \varvec{y} \in \mathbb {R}^{n_L+n_H} \mid \varvec{M}_{f_i}(\varvec{y}) \succeq \varvec{0}, \ i = 0,\dots ,n_p \}\).
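As a minimal univariate illustration of Definition 2.3, the sketch below encodes \(C(X)\) for a single species with \(X \subseteq \mathbb {Z}_+\) (so \(f_0(x) = 1\) and \(f_1(x) = x\) are non-negative on X) and a moment sequence truncated at order 4, using cvxpy as an assumed modeling layer; any SDP front end would do. The matrices are the standard moment and localizing matrices and are symmetrized explicitly so the psd constraints are well-posed for the modeling layer.

```python
import cvxpy as cp

y = cp.Variable(5)                 # moment sequence y_0, ..., y_4

# Moment matrix for f0(x) = 1: entry (i, j) is y_{i+j}.
M0 = cp.bmat([[y[i + j] for j in range(3)] for i in range(3)])
# Localizing matrix for f1(x) = x: entry (i, j) is y_{i+j+1}.
M1 = cp.bmat([[y[i + j + 1] for j in range(2)] for i in range(2)])

sym = lambda M: 0.5 * (M + M.T)
C_X = [sym(M0) >> 0, sym(M1) >> 0]  # "y in C(X)" as cvxpy constraints
```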

Lastly, we note that the validity of LMIs of form (LMI) carries over to the generalized moments that are generated by non-negative test functions. To see this, observe that the linearity of \(\varvec{M}_{f}\) implies that

$$\begin{aligned} \varvec{M}_{f}(\varvec{z}(g;t)) = \int _{0}^{t} g(\tau ) \varvec{M}_{f}(\varvec{y}(\tau )) \, d\tau \end{aligned}$$

holds. Now assuming g is non-negative on \(\mathbb {R}_+\) and applying Jensen’s inequality to the extended convex indicator function of the positive semidefinite cone, \(\mathbbm {1}^{\infty }_{\mathbb {S}_+}\), yields that

$$\begin{aligned} 0 \le \mathbbm {1}^{\infty }_{\mathbb {S}_+}\left( \varvec{M}_{f}(\varvec{z}(g;t)) \right) \le \int _{0}^t g(\tau ) \mathbbm {1}^{\infty }_{\mathbb {S}_+}\left( \varvec{M}_{f}(\varvec{y}(\tau )) \right) \, d\tau = 0 \end{aligned}$$

and hence shows that \(\varvec{M}_{f}(\varvec{z}(g;t)) \succeq \varvec{0}\) must hold for any \(t \ge 0\) in analogy to Eq. (LMI).

2.3.2 Affine Constraints

The moment dynamics (mCME) give rise to affine constraints that the moments and generalized moments must satisfy [11, 13, 37, 41, 47, 48, 61]. To see this, consider a test function \(g \in \mathcal{A}\mathcal{C}([0,t_T])\) and final time \(t_f \le t_T\). Integrating \(\int _0^{t_f} g(t)\frac{d\varvec{y}_L}{dt}(t) \, dt\) by parts yields the following set of affine equations:

$$\begin{aligned} \varvec{K}\left( g(t_f)\varvec{y}(t_f) - g(0) \varvec{y}(0)\right) = \varvec{A} \varvec{z}(g;t_f) + \varvec{K} \varvec{z}(g';t_f). \end{aligned}$$
(2)

We wish to emphasize here that the above constraints are vacuous if \(\varvec{z}(g;t)\) and \(\varvec{z}(g';t)\) are not further restricted. This observation motivates necessary restrictions on the test function g so that it generates “useful” generalized moments. Recalling the discussion in Sect. 2.3.1, one may be tempted to argue that g and \(g'\) shall be non-negative (or non-positive) on \([0,t_f]\) so that the generated generalized moments satisfy LMIs of form (LMI). In fact, Dowdy and Barton [22] as well as Sakurai and Hori [61] demonstrate that this is indeed a reasonable strategy; they use exponential and monomial test functions, respectively. However, in principle a wider range of test functions can be used. We defer the discussion of this issue to Sect. 3.
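Equation (2) can be verified numerically whenever the moment system happens to close. The sketch below does so for a hypothetical birth-death process, whose affine propensities give \(q = 0\) (so \(\varvec{K} = \varvec{I}\) and \(\varvec{A}\) is square), an exponential test function, and arbitrary illustrative rate values: \(\varvec{y}\) and \(\varvec{z}(g;\cdot )\) are integrated jointly and the affine constraint is checked at \(t_f\).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Birth-death process (0 -> S at rate k1, S -> 0 at rate k2*x): the
# moments y = (E[1], E[x], E[x^2]) close and satisfy dy/dt = A y.
k1, k2, rho, tf = 10.0, 1.0, -0.5, 2.0
A = np.array([[0.0,  0.0,        0.0],
              [k1,  -k2,         0.0],
              [k1,   2*k1 + k2, -2*k2]])
g = lambda t: np.exp(rho * t)             # test function with g' = rho*g

def rhs(t, w):                            # w stacks y and z(g; t)
    y, z = w[:3], w[3:]
    return np.concatenate([A @ y, g(t) * y])

y0 = np.array([1.0, 0.0, 0.0])            # all probability mass on x = 0
w = solve_ivp(rhs, (0.0, tf), np.concatenate([y0, np.zeros(3)]),
              rtol=1e-10, atol=1e-12).y[:, -1]
y_tf, z_tf = w[:3], w[3:]
# Eq. (2): K (g(tf) y(tf) - g(0) y(0)) = A z(g;tf) + K z(g';tf), K = I here
print(np.allclose(g(tf) * y_tf - g(0.0) * y0, A @ z_tf + rho * z_tf))  # True
```

When q > 0, the same constraint holds but no longer pins down the trajectory, which is precisely what the bounding formulation exploits.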

3 Tighter Bounds

3.1 An Optimal Control Perspective

Some of the conservatism in the original method of Dowdy and Barton [22] stems from the fact that moments are only constrained in an integral or weak sense; i.e., \(\varvec{z}(g;t_f) = \int _0^{t_f} g(\tau ) \varvec{y}(\tau ) \, d\tau \) is constrained as opposed to \(\varvec{y}(t)\) for all \(t \in [0,t_f]\). This is potentially a strong relaxation as in fact the entire trajectory must satisfy the necessary moment conditions. Moreover, by Assumption 3, the moment trajectories remain bounded at all times, which, taken together with the fact that they satisfy the ODE (mCME), implies that they are guaranteed to be smooth. Using these two additional pieces of information, we argue that the following continuous-time optimal control problem provides an elementary starting point for addressing the question of how to bound the moment trajectories associated with a stochastic chemical system evaluated at a given time point \(t_f\):

$$\begin{aligned} \inf _{\varvec{y} \in \mathcal {C}^{\infty }(\mathbb {R}_+)} \quad&y_{\varvec{j}}(t_f) \nonumber \\ \text {s.t.} \quad&\frac{d\varvec{y}_L}{dt}(t) = \varvec{A}_L\varvec{y}_L(t) + \varvec{A}_H\varvec{y}_H(t),\quad \forall t \in \mathbb {R}_+, \nonumber \\&\varvec{y}(0) = \varvec{y}_{0}, \nonumber \\&\varvec{y}(t) \in C(X), \quad \forall t \in \mathbb {R}_+. \end{aligned}$$
(OCP)

Here, the lower-order moments \(\varvec{y}_L\) act as the state variables, while the higher-order moments \(\varvec{y}_H\) can be viewed as control inputs. Although the infinite-dimensional nature of Problem (OCP) leaves it with little immediate practical relevance, this representation is conceptually informative. It is not hard to verify that the method proposed by Dowdy and Barton [22] provides a systematic way to construct tractable SDP relaxations of (OCP). However, Dowdy and Barton’s method in no way reflects the dependence of \(\varvec{y}(t_f)\) on past values of \(\varvec{y}(t)\) other than \(\varvec{y}(0)\), nor the fact that \(\varvec{y}\) is smooth or even continuous. As we will show in the following, ideas from the numerical analysis of ODEs allow us to reflect these features in the form of new necessary moment conditions constructed based on a discretization of the time domain of the problem. These conditions in turn yield tighter SDP relaxations of Problem (OCP) than those constructed by Dowdy and Barton’s method [22].

3.2 A New Hierarchy of Necessary Moment Conditions

In this section, we present the key contribution of this article—a new hierarchy of convex necessary moment conditions that reflect the temporal causality and regularity conditions inherent to the moment trajectories associated with the distribution described by the CME. To provide some intuition for these results, we will first discuss special cases of the proposed conditions which admit a clear interpretation. To derive these special cases, we specifically draw on two common ideas for the numerical analysis and solution of ODEs: temporal discretization and the analysis of the Taylor expansion of the moment trajectories [9, 16].

Recall that the moment trajectory \(y_{\varvec{j}}(\cdot )\) must be infinitely differentiable on \(\mathbb {R}_+\) as all moment trajectories remain bounded by Assumption 3 and obey the linear time-invariant dynamics (mCME). As a consequence, the Taylor polynomial

$$\begin{aligned} \varvec{T}_l(y_{\varvec{j}};t_1,t_2)&= \sum _{k=0}^l \frac{(t_2 - t_1)^k}{k!} \frac{d^ky_{\varvec{j}}}{dt^k}(t_1) \end{aligned}$$
(3)

and remainder

$$\begin{aligned} \varvec{R}_l(y_{\varvec{j}};t_1,t_2)&= \frac{1}{l!} \int ^{t_2}_{t_1} (t_2-t)^{l} \frac{d^{l+1} y_{\varvec{j}}}{dt^{l+1}}(t) \, dt \end{aligned}$$
(4)

are well-defined for any \(0 \le t_1 \le t_2 < +\infty \) and order \(l \ge 0\). Now two things can be observed. First, higher-order time derivatives of \(y_{\varvec{j}}\) as in Eqs. (3) and (4) are simply linear combinations of high-order moments due to the linear dynamics (mCME). Second, the remainder term (4) can then be characterized as a linear combination of generalized moments for the test function \(g_l(t)=\mathbbm {1}_{[t_1,t_2]}(t)(t_2-t)^l\). Thus, if \(|\varvec{j}|\) and l are sufficiently small, then \(\varvec{T}_l(y_{\varvec{j}};t_1,t_2)\) and \(\varvec{R}_l(y_{\varvec{j}};t_1,t_2)\) depend linearly on \(\varvec{y}(t_1)\) and \(\varvec{z}(g_l;t_2)\); formally,

$$\begin{aligned} \begin{array}{l} \varvec{T}_l(y_{\varvec{j}};t_1,t_2) =\varvec{c}_{l,\varvec{j}}(t_1,t_2)^\top \varvec{y}(t_1), \\ \varvec{R}_l(y_{\varvec{j}};t_1,t_2) = \varvec{d}_{l,\varvec{j}}(t_1,t_2)^\top \varvec{z}(g_l;t_2), \end{array} \end{aligned}$$
(5)

for an appropriate choice of the coefficient vectors. Overall, this observation suggests employing conditions of the form

$$\begin{aligned} y_{\varvec{j}}(t_2) = \varvec{T}_l(y_{\varvec{j}};t_1,t_2) + \varvec{R}_l(y_{\varvec{j}};t_1,t_2) \end{aligned}$$

at different time points along the trajectory as necessary moment conditions. These conditions achieve exactly what we set out to do: they establish a connection between \(y_{\varvec{j}}(t_2)\) and the moments at any past time point \(t_1\) using the smoothness properties of the moment trajectories \(\varvec{y}(t)\). In fact, for the same reason, similar conditions are also used to derive and analyze numerical integration routines for ODEs such as Runge–Kutta or linear multistep methods [9, 16]. Further, it is straightforward to see that analogous conditions are readily obtained for any generalized moment generated by a sufficiently smooth test function. The above conditions hence appear to be a promising starting point. From a practical perspective, however, they merely call for a particular choice of (local) polynomial test functions of the form \(g_l(t) = \mathbbm {1}_{[t_1,t_2]}(t) (t_2-t)^l\), as indicated by the integral form of the remainder in Eq. (4). This claim is formalized in the following proposition.
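For intuition, the Taylor identity underlying these conditions is easy to check numerically on a closed moment system. The sketch below reuses the birth-death example from Sect. 2.3.2 (all values arbitrary): since \(d\varvec{y}/dt = \varvec{A}\varvec{y}\), the derivatives in Eqs. (3) and (4) are \(\varvec{y}^{(k)} = \varvec{A}^k \varvec{y}\), and the identity \(y_{\varvec{j}}(t_2) = \varvec{T}_l + \varvec{R}_l\) holds to quadrature accuracy.

```python
import math
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

k1, k2 = 10.0, 1.0
A = np.array([[0.0, 0.0, 0.0], [k1, -k2, 0.0], [k1, 2*k1 + k2, -2*k2]])
y0 = np.array([1.0, 0.0, 0.0])
y = lambda t: expm(A * t) @ y0            # exact moment trajectory

t1, t2, l = 0.5, 2.0, 2
taylor = sum((t2 - t1)**k / math.factorial(k)
             * (np.linalg.matrix_power(A, k) @ y(t1)) for k in range(l + 1))
Al1 = np.linalg.matrix_power(A, l + 1)    # gives y^{(l+1)}(t) = Al1 @ y(t)
rem = np.array([quad(lambda t, i=i: (t2 - t)**l * (Al1 @ y(t))[i],
                     t1, t2)[0] for i in range(3)]) / math.factorial(l)
print(np.allclose(y(t2), taylor + rem))   # True
```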

Proposition 3.1

Let \(0 \le t_1 \le t_2 < + \infty \) and \(n_I \le \left\lfloor \frac{m}{q} \right\rfloor \). Further, consider test functions of the form \(g_l(t) = \mathbbm {1}_{[t_1,t_2]}(t)(t_2-t)^l\). If \(\varvec{y}_{t_1}, \varvec{y}_{t_2} \in \mathbb {R}^{n_L+n_H}\) and \(\varvec{z}_{g_l,t_2} \in \mathbb {R}^{n_L+n_H}\) satisfy

$$\begin{aligned} \varvec{K}\left( g_l(t_2) \varvec{y}_{t_2} - g_l(t_1)\varvec{y}_{t_1} \right) = \varvec{A} \varvec{z}_{g_l,t_2} - l\varvec{K}\varvec{z}_{g_{l-1},t_2} \end{aligned}$$
(6)

for \(l = 0,\dots ,n_I\), then \(\varvec{y}_{t_1}, \varvec{y}_{t_2}\) and \(\varvec{z}_{g_l,t_2}\) also satisfy

$$\begin{aligned} y_{\varvec{j},t_2} = \varvec{c}_{l,\varvec{j}}(t_1,t_2)^\top \varvec{y}_{t_1} + \varvec{d}_{l,\varvec{j}}(t_1,t_2)^\top \varvec{z}_{g_l,t_2} \end{aligned}$$
(7)

for \(l=0,\dots ,n_I\) and \(\varvec{j}\) such that \(|\varvec{j}|\le m-lq\), where \(\varvec{c}_{l,\varvec{j}}\) and \(\varvec{d}_{l,\varvec{j}}\) are defined as in Eq. (5).

Proof

The proof is deferred to Appendix A. \(\square \)

Remark 3.1

Note that Condition (6) is analogous to Condition (2) as obtained for the test function \(g_l\) with a shifted origin, hence it is a necessary moment condition. Further, we wish to emphasize that Condition (6) is in general more stringent than Condition (7) as is made clear in the proof.

Beyond a specific choice of test functions, the above considerations motivate a broader strategy to generate necessary moment conditions that reflect causality. This strategy can be summarized as “discretize and constrain” and was previously shown to be effective in other contexts [17]. Instead of imposing Condition (2) on the entire time horizon \([0,t_f]\) as proposed by Dowdy and Barton [22], the time horizon can be partitioned into \(n_{\textsf{T}}\) subintervals \([t_{i-1}, t_i]\) with \(0= t_0< t_1< \cdots < t_{n_{\textsf{T}}} = t_f\) on which analogous conditions obtained from integrating \(\int _{t_{i-1}}^{t_i} g(\tau ) \varvec{K} \frac{d\varvec{y}}{dt}(\tau ) \, d\tau \) by parts can be imposed:

$$\begin{aligned} \varvec{K}\left( g(t_i) \varvec{y}(t_i) - g(t_{i-1})\varvec{y}(t_{i-1}) \right) = \varvec{A} \left( \varvec{z}(g;t_i) - \varvec{z}(g;t_{i-1}) \right) + \varvec{K}\left( \varvec{z}(g';t_i) - \varvec{z}(g';t_{i-1})\right) . \end{aligned}$$
(8)

While by itself this does not provide any restriction over Condition (2), the following observation makes it worthwhile: the generalized moments generated by a non-negative test function g form a monotonically increasing sequence with respect to the convex cone \(C(X)\). This follows immediately from the definition of \(\varvec{z}(g;t)\) and Jensen’s inequality as described in Sect. 2.3.1; formally,

$$\begin{aligned} \varvec{z}(g;t_i) - \varvec{z}(g;t_{i-1}) \in C(X), \quad i = 1,\dots ,n_{\textsf{T}} \end{aligned}$$
(9)

are necessary moment conditions. Conditions (8) & (9) are generally a non-trivial restriction of Condition (2) & \(\varvec{z}(g;t_f) \in C(X)\) as employed by Dowdy and Barton [22]. To see this, simply observe that we recover Eq. (2) by summing Eq. (8) over \(i=1,\dots ,n_{\textsf{T}}\) and likewise obtain

$$\begin{aligned} \varvec{z}(g;t_f) = \sum _{i=1}^{n_{\textsf{T}}}\left( \varvec{z}(g;t_i) - \varvec{z}(g;t_{i-1}) \right) \in C(X), \end{aligned}$$

using that \(C(X)\) is a convex cone and \(\varvec{z}(g;0) = \varvec{0}\) by definition.
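The monotonicity conditions (9) are easily visualized numerically. For the closed birth-death moment system from Sect. 2.3.2 and a non-negative exponential test function (all values arbitrary), every increment \(\varvec{z}(g;t_i) - \varvec{z}(g;t_{i-1})\) of the order-0 to order-2 generalized moments assembles into a psd Hankel matrix:

```python
import numpy as np
from scipy.integrate import solve_ivp

k1, k2, rho = 10.0, 1.0, -0.5
A = np.array([[0.0, 0.0, 0.0], [k1, -k2, 0.0], [k1, 2*k1 + k2, -2*k2]])
g = lambda t: np.exp(rho * t)
rhs = lambda t, w: np.concatenate([A @ w[:3], g(t) * w[:3]])

grid = np.linspace(0.0, 5.0, 11)              # partition 0 = t_0 < ... < t_f
sol = solve_ivp(rhs, (0.0, 5.0), np.array([1.0, 0, 0, 0, 0, 0]),
                t_eval=grid, rtol=1e-10, atol=1e-12)
Z = sol.y[3:, :]                              # z(g; t) at the grid points
for i in range(1, len(grid)):
    dz = Z[:, i] - Z[:, i - 1]                # z(g; t_i) - z(g; t_{i-1})
    H = np.array([[dz[0], dz[1]], [dz[1], dz[2]]])
    assert np.linalg.eigvalsh(H).min() > -1e-9   # increment lies in C(X)
```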

The above-described strategies lend themselves to generalization in terms of a hierarchy of necessary moment conditions. This generalization can be performed in several equivalent ways. Next we will present one such generalization utilizing a concept which we refer to as iterated generalized moments.

Definition 3.1

Let \(z_{\varvec{j}}(g;t)\) be the \(\varvec{j}\)th generalized moment as per Definition 2.2. Then, the iterated generalized moment of Level \(l \ge 0\) is defined by

$$\begin{aligned} z^l_{\varvec{j}}(g;t) = {\left\{ \begin{array}{ll} \int _{0}^t z_{\varvec{j}}^{l-1}(g;\tau )\,d\tau , &{}\quad l \ge 1 \\ g(t) y_{\varvec{j}}(t), &{}\quad l=0 \end{array}\right. }. \end{aligned}$$

For the sake of simplified notation and analysis, it will further prove useful to introduce the left and right integral operators \(I_{L},I_{R}: \mathcal {C}(\mathbb {R}^2) \rightarrow \mathcal {C}(\mathbb {R}^2)\) given by

$$\begin{aligned} (I_L f)(t_1,t_2) = \int _{t_1}^{t_2} f(t_1, t) \, dt \text { and } (I_R f)(t_1,t_2) = \int _{t_1}^{t_2} f(t, t_2) \, dt. \end{aligned}$$

For vector-valued functions, \(I_L\) and \(I_R\) shall be understood as being applied componentwise.
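Iterated generalized moments are straightforward to tabulate numerically by repeated cumulative quadrature. The sketch below does so for the closed-form mean \(y_1(t) = (k_1/k_2)(1 - e^{-k_2 t})\) of the birth-death example and an exponential test function, purely to illustrate Definition 3.1; all values are again arbitrary.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Level 0:  z^0(g; t) = g(t) * y_1(t)  on a time grid.
k1, k2, rho = 10.0, 1.0, -0.5
t = np.linspace(0.0, 10.0, 2001)
levels = [np.exp(rho * t) * (k1 / k2) * (1.0 - np.exp(-k2 * t))]

# Levels l >= 1 integrate the previous level from 0 to t.
for l in range(1, 4):
    levels.append(cumulative_trapezoid(levels[-1], t, initial=0.0))
# levels[l][i] approximates z^l(g; t[i]) for l = 0, ..., 3.
```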

With these two notions in hand, a hierarchy of necessary moment conditions can be constructed by repeatedly integrating Eqs. (8) and (9) over an increment of the time domain partition. The resultant hierarchy of necessary moment conditions is summarized in the following proposition.

Proposition 3.2

Let \( t_T > 0\) and consider a non-negative test function \(g \in \mathcal{A}\mathcal{C}([0,t_T])\). Further, let \(\varvec{y}\) be the truncated sequence of moment trajectories associated with the solution of Eq. (CME), and \(\varvec{z}^l\) be the corresponding iterated generalized moments. Then, the following conditions hold for any \(l \ge 1\):

  1. (i)

    For any \(t\in [0,t_T]\) it holds that

    $$\begin{aligned} \varvec{A}\varvec{z}^l(g;t) + \varvec{K} \varvec{z}^l(g';t) = \varvec{K} \left( \varvec{z}^{l-1}(g;t) - \frac{t^{l-1}}{(l-1)!} g(0) \varvec{y}(0)\right) . \end{aligned}$$
  2. (ii)

    Let \(\varvec{f}(x,y) = \varvec{z}^1(g;y) - \varvec{z}^1(g;x)\). Then, for any \(0 \le t_1 \le t_2 \le t_T\) and \(k \in \left\{ 0,\dots ,l-1 \right\} \) it holds that \((I_L^{l-1-k}I^{k}_R \varvec{f})(t_1,t_2) \in C(X)\).

Proof

It is easily verified that Condition (i) is obtained from integrating Eq. (2) \(l-1\) times. Validity of Condition (ii) follows by a similar inductive argument: Since \(\varvec{y}(t) \in C(X)\) for all \(t \in [0,t_T]\), it follows by non-negativity of g on \([0,t_T]\) and Jensen’s inequality that

$$\begin{aligned} \varvec{z}^1(g;t_2) - \varvec{z}^1(g;t_1) = \int _{t_1}^{t_2} g(t) \varvec{y}(t)\, dt \in C(X)\end{aligned}$$

for any \(0 \le t_1 \le t_2 \le t_T\). Now suppose Condition (ii) is satisfied for \(l-1\). Then, it follows by Jensen’s inequality that for any \(0 \le t_1 \le t_2 \le t_T\) and \(k=0,\dots ,l-2\)

$$\begin{aligned} (I_L^{l-1-k} I_R^{k}\varvec{f})(t_1,t_2) = \int _{t_1}^{t_2} (I_L^{l-2-k} I_R^{k}\varvec{f})(t_1,t) \, dt \in C(X). \end{aligned}$$

For \(k = l-1\), an analogous argument applies. \(\square \)

Before we proceed, a few remarks are in order to contextualize this result.

Remark 3.2

Choosing \(l=1\), \(t_1=0\), \(t_2 = t_f\) and exponential test functions of the form \(g(t)=e^{\rho (t_T -t)}\) reproduces the necessary moment conditions proposed by Dowdy and Barton [22].

Remark 3.3

Regarding Condition (ii), one might be tempted to argue that any permutation of the operator products \(I_L\) and \(I_R\) of length \(l-1\) applied to \(\varvec{f}(x,y) = \varvec{z}^1(g;y) - \varvec{z}^1(g;x)\) gives rise to a new valid necessary moment condition. It can be confirmed, however, that \(I_L\) and \(I_R\) commute such that Condition (ii) is invariant under permutation of \(I_L\) and \(I_R\) (see Appendix B).

Remark 3.4

We wish to emphasize that Conditions (i) and (ii) depend affinely on the iterated generalized moments up to Level l evaluated at \(t_1\) and \(t_2\), respectively. Accordingly, they preserve the computational advantages of Dowdy and Barton’s necessary moment conditions. To avoid notational clutter in the remainder of this article we will disguise this fact and concisely denote the left-hand side of Condition (ii) by

$$\begin{aligned} \varOmega _{l,k}\left( \left\{ {\varvec{z}^i(g;t_1)} \right\} _{i=1}^{l}, \left\{ {\varvec{z}^i(g;t_2)} \right\} _{i=1}^{l}, t_1, t_2\right) . \end{aligned}$$

An explicit algebraic expression for \(\varOmega _{l,k}\) is provided in Appendix D.

Remark 3.5

For the \(\varvec{0}\)th generalized moments, additional constraints arise from the definition, as

$$\begin{aligned} z_{\varvec{0}}^l(g;t) = {\left\{ \begin{array}{ll} \int _0^t z_{\varvec{0}}^{l-1}(g;\tau )\, d\tau , \ {} &{}l \ge 1\\ g(t), \ {} &{}l=0 \end{array}\right. } \end{aligned}$$

can be evaluated explicitly.

A careful choice of test functions is critical to endow the conditions put forth in Proposition 3.2 with restrictive power. In particular, Condition (i) in Proposition 3.2 is effectively unrestrictive unless \(\varvec{z}^l(g';t)\) can be further constrained. Fortunately, one can draw from rich function classes including polynomials [61], trigonometric functions [11, 44] and exponentials [22] to assemble a set of test functions that render Condition (i) in Proposition 3.2 “self-contained.” For such test function sets, \(\varvec{z}^l(g';t)\) reduces to linear combinations of the generalized moments generated by the test function set itself. The following examples showcase how such test function sets can be constructed in practice from exponential and monomial test functions.

Example 3.1

Consider test functions of the form \(g(t) = e^{\rho t}\). Then, the associated generalized moments satisfy \( \varvec{z}^l(g';t) = \rho \varvec{z}^l(g; t)\) due to the linearity of \(\varvec{z}^l\) in its first argument. Condition (i) of Proposition 3.2 thus reduces to

$$\begin{aligned} \left( \varvec{A} + \rho \varvec{K} \right) \varvec{z}^l(g;t) = \varvec{K}\left( \varvec{z}^{l-1}(g;t) - \frac{t^{l-1}}{(l-1)!}g(0) \varvec{y}(0)\right) . \end{aligned}$$

Example 3.2

Consider a set of test functions of the form \(g_k(t) = t^k\) for \(k=0,\dots ,n\). By linearity of \(\varvec{z}^l\) in its first argument, the identity \(\varvec{z}^l(g_k';t) = k\varvec{z}^l(g_{k-1}; t)\) must hold. Accordingly, Condition (i) in Proposition 3.2 reduces to

$$\begin{aligned} \varvec{A} \varvec{z}^l(g_k;t) +k \varvec{K} \varvec{z}^l(g_{k-1};t) = \varvec{K}\left( \varvec{z}^{l-1}(g_k;t) - \frac{t^{l-1}}{(l-1)!}g_k(0) \varvec{y}(0)\right) . \end{aligned}$$

The preceding discussion and examples indicate that the span of an appropriate test function set should be closed under differentiation. The following proposition formalizes this guideline.

Proposition 3.3

Let F be a finite set of test functions such that \(\text {span}\left( F\right) \) is closed under differentiation. Then, for any \(f \in F\), there exists a linear map \(\varGamma _f\) such that Condition (i) in Proposition 3.2 is equivalent to

$$\begin{aligned} \varGamma _f\left( \left\{ \varvec{z}^l(g;t) \right\} _{g\in F} \right) = \varvec{K} \left( \varvec{z}^{l-1}(f;t) -\frac{t^{l-1}}{(l-1)!} f(0) \varvec{y}(0) \right) . \end{aligned}$$

We omit the elementary proof of Proposition 3.3 and instead refer back to Examples 3.1 and 3.2 for how the linear map \(\varGamma _f\) can be constructed in practice.
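For the monomial test functions of Example 3.2, the map \(\varGamma _f\) amounts to a differentiation matrix on the basis \(\{1, t, \dots , t^n\}\); the sketch below makes this explicit (the names D and Z are ours, introduced only for illustration).

```python
import numpy as np

# Differentiation matrix on F = {1, t, ..., t^n}: d/dt t^k = k * t^{k-1},
# so span(F) is closed under differentiation.
n = 4
D = np.zeros((n + 1, n + 1))
for k in range(1, n + 1):
    D[k - 1, k] = float(k)

# Stacking the generalized moments column-wise, Z = [z^l(g_0;t) ... z^l(g_n;t)],
# linearity gives z^l(g_k'; t) = (Z @ D)[:, k] = k * z^l(g_{k-1}; t), so the
# left-hand side of Condition (i) for f = g_k reads A @ Z[:, k] + K @ (Z @ D)[:, k].
```

For exponential test functions \(g(t) = e^{\rho t}\), D reduces to the diagonal matrix of the rates \(\rho \), recovering Example 3.1.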

Another issue pertaining to the choice of test functions is the requirement of non-negativity in Proposition 3.2. This problem can be alleviated by a simple reformulation and shift of the time horizon in Proposition 3.2. For example, if a test function g is non-negative on \([0,t_+]\) and non-positive on \([t_+,t_T]\), we can simply consider the two test functions \(g_+(t) = \mathbbm {1}_{[0,t_+]}(t) g(t)\) and \(g_-(t) = -\mathbbm {1}_{[t_+,t_T]}(t) g(t)\) in place of g and impose the necessary moment conditions on the intervals \([0,t_+]\) and \([t_+,t_T]\), respectively. This construction naturally extends to test functions with any finite number of sign changes.

We conclude this section by establishing some compelling properties of the hierarchy of necessary moment conditions put forward in Proposition 3.2. On the one hand, Conditions (i) and (ii) in Proposition 3.2 include the conditions considered in Proposition 3.1 as special cases. So in particular, they enforce consistency with higher-order Taylor expansions of the true moment trajectories as discussed in the beginning of this section. The following corollary to Proposition 3.2 formalizes this claim.

Corollary 3.1

Let \(n_I \in \mathbb {Z}_+\) and \(t_T > 0\) be fixed. Further, suppose \(g \in \mathcal{A}\mathcal{C}([0,t_T])\) is non-negative, and let \(\varvec{y}\) and \(\varvec{z}^l(\cdot ;\cdot )\) be arbitrary functions such that \(\varvec{z}^l(\cdot ;\cdot )\) is linear in the first argument and \(\varvec{z}^0(g;t)= g(t)\varvec{y}(t)\) holds. Fix \(0 \le t_1 \le t_2 \le t_T\) and define \(h_l(t) = \mathbbm {1}_{[t_1,t_2]}(t)(t_2-t)^l\) for \(l=0,1,\dots ,n_I\). If Conditions (i) and (ii) of Proposition 3.2 are satisfied by \(\left\{ \varvec{z}^l(g;t_i) \right\} _{l=0}^{n_I+1}\) for \(i=1,2\), then there exist functions \(\varvec{z}(\cdot ;\cdot )\) that are linear in the first argument, and satisfy

$$\begin{aligned} \varvec{K}\left( h_l(t_2)g(t_2)\varvec{y}(t_2) - h_l(t_1)g(t_1)\varvec{y}(t_1) \right) = \varvec{A} \varvec{z}(h_l g;t_2) + \varvec{K}\varvec{z}((h_lg)';t_2) \end{aligned}$$
(10)

and

$$\begin{aligned} \varvec{z}(h_l g; t_2) \in C(X) \end{aligned}$$
(11)

for all \(l \in \left\{ 0,\dots ,n_I \right\} \).

Proof

The proof is deferred to Appendix C. \(\square \)

Remark 3.6

To see the connection between Condition (10) and Condition (6) in Proposition 3.1, simply consider the case where \(g(t) = 1\). Moreover, note that Corollary 3.1 also shows that necessary moment conditions of the form of (8) & (9) are implied as they are recovered for \(l=0\) since we can simply identify \(\varvec{z}(h_0g;t_2)\) with \(\varvec{z}^1(g;t_2) - \varvec{z}^1(g;t_1)\).

On the other hand, the proposed necessary moment conditions display benign scaling behavior. Condition (ii) in Proposition 3.2 scales quadratically with respect to the level l of the hierarchy, independent of the state space dimension. Moreover, a strengthening of the conditions is obtained by refining the partition of the time horizon. This strengthening mechanism not only provides a desirable degree of flexibility but also enjoys linear scaling with respect to the number of intervals in the partition, which of course is likewise independent of the state space dimension. In particular, Condition (ii) must only be imposed between the endpoints of each interval of the partition to ensure that it holds between any two endpoints. This claim is formalized in the following corollary.

Corollary 3.2

Let \(0 \le t_1 \le t_2 \le t_3 < +\infty \) and \(n_I\) be a fixed positive integer. Suppose \(\left\{ \varvec{z}^s \right\} _{s=1}^{n_I}\) is a set of functions such that

$$\begin{aligned} \varOmega _{l,k}\left( \left\{ {\varvec{z}^s(t_{i})} \right\} _{s=1}^{l}, \left\{ {\varvec{z}^s(t_{i+1})} \right\} _{s=1}^{l}, t_i, t_{i+1}\right) \in C(X)\end{aligned}$$

for all \(i \in \{1,2\}\) and \(k,l \in \mathbb {Z}_+\) such that \(k < l \le n_I\). Then,

$$\begin{aligned} \varOmega _{l,k}\left( \left\{ {\varvec{z}^s(t_{1})} \right\} _{s=1}^{l}, \left\{ {\varvec{z}^s(t_{3})} \right\} _{s=1}^{l}, t_1, t_{3}\right) \in C(X)\end{aligned}$$

holds for all \(k,l \in \mathbb {Z}_+\) such that \(k < l \le n_I\).

Proof

The proof is deferred to Appendix D. \(\square \)

3.3 An Augmented Semidefinite Program

In this section, we construct an SDP based on the hierarchy of necessary moment conditions put forth in Proposition 3.2. The optimal value of this SDP furnishes bounds on the moment solutions of Eq. (CME) at a given time point \(t_f \in [0,t_T]\). To that end, we consider the moment truncation order m to be fixed and the following user choices as known:

  1. (i)

    \(\textsf{T} = \left\{ t_1,\dots ,t_{n_{\textsf{T}}} \right\} \)—A finite, ordered set of time points framing the partition of the time horizon. For notational simplicity, we assume that \(0< t_1< t_2< \cdots < t_{n_{\textsf{T}}} \le t_T\) and \(t_f \in \textsf{T}\).

  2. (ii)

    \(\textsf{F} = \left\{ g_1,\dots ,g_{n_{\textsf{F}}} \right\} \)—A finite set of test functions that satisfies the hypotheses of Propositions 3.2 and 3.3.

  3. (iii)

    \(n_I\)—A non-negative integer controlling the hierarchy level in Proposition 3.2.

These quantities parametrize a spectrahedron \(\textsf{S}(\textsf{F},\textsf{T}, n_I)\) described by the necessary moment conditions of Proposition 3.2 as imposed for all test functions in \(\textsf{F}\), at all time points in \(\textsf{T}\) and for all hierarchy levels up to \(n_I\). In the formulation of \(\textsf{S}(\textsf{F},\textsf{T},n_I)\), however, we use a slightly different but equivalent form of Condition (i) of Proposition 3.2. The reason for this modification is that it results in weakly coupled conditions that allow the resultant SDPs to be decomposed in line with the temporally causal structure of the constraints, as we will discuss in Sect. 4.1. Details on this reformulation can be found in Appendix E. \(\textsf{S}(\textsf{F},\textsf{T},n_I)\) is stated explicitly below; for the sake of concise notation, we introduce the shorthand n(t) for the left adjacent time point of \(t \in \textsf{T}\), i.e., \(n(t_i) = t_{i-1}\) for \(i = 2,\dots ,n_{\textsf{T}}\) and \(n(t_1) = 0\).

$$\begin{aligned}&\textsf{S}(\textsf{F},\textsf{T},n_I) \\&\qquad = \left\{ \{\varvec{y}_t\}, \{\varvec{z}^{l}_{g,t}\} \left| \begin{array}{l} \varvec{z}^0_{g,t} = g(t) \varvec{y}_t, \quad \forall (g,t) \in \textsf{F}\times \textsf{T},\\ \varvec{y}_t \in C(X), \quad \forall t \in \textsf{T},\\ \varGamma _g\left( \{\varvec{z}_{f,t}^{l}\}_{f \in \textsf{F}}\right) = {\left\{ \begin{array}{ll} \varvec{K} \left( \varvec{z}^{l-1}_{g,t} - \frac{t^{l-1}}{(l-1)!}g(0) \varvec{y}_0\right) , &{}\text {if } t = t_1 \\ \left( \frac{t}{n(t)}\right) ^{l-1}\varGamma _g \left( \{\varvec{z}_{f,n(t)}^{l}\}_{f \in \textsf{F}}\right) \\ \qquad + \varvec{K} \left( \varvec{z}^{l-1}_{g,t} - \left( \frac{t}{n(t)}\right) ^{l-1} \varvec{z}_{g,n(t)}^{l-1} \right) , &{}\text {if } t\ne t_1 \end{array}\right. }, \\ \qquad \qquad \qquad \qquad \forall (g,t,l) \in \textsf{F} \times \textsf{T} \times \left\{ 1,\dots ,n_I \right\} ,\\ \varOmega _{l,k}\left( \{\varvec{z}^s_{g,n(t)}\}_{s=1}^{l},\{\varvec{z}^s_{g,t}\}_{s=1}^{l},n(t),t\right) \in C(X), \\ \qquad \qquad \qquad \qquad \forall (g,t) \in \textsf{F} \times \textsf{T} \text { and } \forall k,l \in \mathbb {Z}_+ \text { such that } k < l \le n_I \end{array}\right. \right\} \end{aligned}$$

By construction, the set \(\textsf{S}(\textsf{F},\textsf{T},n_I)\) contains the sequences \(\{\varvec{y}(t):{t\in \textsf{T}}\}\) and \(\{ \varvec{z}^l(g;t): (g,t,l) \in \textsf{F} \times \textsf{T} \times \{0,\dots ,n_I\}\}\) as generated by the true moment trajectories associated with the solution of Eq. (CME). Another piece of information that can be used to further restrict the set of candidates for the true moment solutions to Eq. (CME) is knowledge of the moments of the initial distribution. We know for example from the definition that any iterated generalized moment \(\varvec{z}^l(g;t)\) for \(l\ge 1\) must vanish at \(t=0\). Moreover, one usually has specific information about the initial distribution of the system state, hence also about \(\varvec{y}(0)\). Here, we assume that the initial moments and iterated generalized moments are confined to a spectrahedral set denoted by \(\textsf{S}_0(\textsf{F},n_I)\). In the common setting in which the moments of the initial distribution are known exactly, \(\textsf{S}_0(\textsf{F},n_I)\) would be given by

$$\begin{aligned} \textsf{S}_0(\textsf{F},n_I) = \left\{ \varvec{y}_0, \{\varvec{z}^{l}_{g,0}\} \left| \begin{array}{l} \varvec{y}_0 = \varvec{y}(0), \\ \varvec{z}^l_{g,0} = \varvec{0}, \ \forall (g,l) \in \textsf{F}\times \left\{ {1,\dots ,n_I} \right\} \end{array}\right. \right\} . \end{aligned}$$

Although adding the corresponding constraints to the description of \(\textsf{S}(\textsf{F},\textsf{T},n_I)\) appears natural, we deliberately choose to reflect this piece of information separately. Our motivation for this distinction is twofold: On the one hand, we want to emphasize that the presented approach naturally extends to the setting of uncertain or imperfect knowledge of the moments of the initial distribution of the system. Specifically, if the moments of the initial distribution are not known exactly but are known to be confined to a spectrahedral set, the proposed bounding procedure applies without modification. On the other hand, we will argue in Sect. 4.1 that the arising optimization problems lend themselves to be decomposed according to the temporal structure of the constraints. The distinction in notation made here will simplify our exposition there.

The following theorems finally summarize the key feature of our proposed bounding approach—the ability to generate a sequence of monotonically improving, practically computable bounds on the moment trajectories associated with the solution of Eq. (CME). Theorem 3.1 shows that these bounds can be obtained by way of solving an SDP.

Theorem 3.1

Let \(\varvec{y}\) denote the transient moments as described in Definition 2.1. Further, for any time point \(t_f \in \textsf{T}\) and multi-index \(\varvec{i}\) with \(|\varvec{i}| \le m\), define

$$\begin{aligned} y_{\varvec{i},t_f}^* = \inf _{\{\varvec{y}_t\}, \{\varvec{z}^{l}_{g,t}\}} \qquad&y_{\varvec{i},t_f} \nonumber \\ s.t. \qquad&(\{\varvec{y}_t\}, \{\varvec{z}^{l}_{g,t}\}) \in \textsf{S}(\textsf{F},\textsf{T},n_I), \nonumber \\&(\varvec{y}_0,\{\varvec{z}^{l}_{g,0}\}) \in \textsf{S}_0(\textsf{F},n_I). \end{aligned}$$
(SDP)

Then, \(y_{\varvec{i},t_f}^* \le y_{\varvec{i}}(t_f)\).

Proof

Set \(\varvec{y}_t = \varvec{y}(t)\) for all \(t \in \textsf{T}\) and \(\varvec{z}^{l}_{g,t} = \varvec{z}^{l}(g;t)\) for all \((g,t,l) \in \textsf{F}\times \textsf{T}\times \left\{ 0,\dots , n_I \right\} \) with \(\varvec{z}^l(g;t)\) as in Definition 3.1. By Proposition 3.2, \((\{\varvec{y}_t\}, \{\varvec{z}^{l}_{g,t}\}) \in \textsf{S}(\textsf{F},\textsf{T},n_I)\), and the result follows. \(\square \)

Remark 3.7

For equal truncation orders and choice of \(C(X)\), Remark 3.2 implies that the bounds obtained from (SDP) are at least as tight as those obtained by the approach of Dowdy and Barton [22].

Remark 3.8

The lower bound \(\varvec{y}_{\varvec{i},t_f}^*\) can be evaluated using off-the-shelf solvers for SDPs such as MOSEK [6], SeDuMi [68] or SDPT3 [69].
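To indicate what an implementation might look like, the following cvxpy sketch assembles a deliberately small instance in the spirit of (SDP): a single species undergoing dimerization \(2S \rightarrow \emptyset \) with \(a(x) = c\,x(x-1)/2\) and a deterministic initial state \(x_0\). Only Eq. (2) with monomial test functions \(g_k(t) = t^k\) on \([0,t_f]\) is imposed, together with Hankel-type LMIs for \(C(X)\) using \(f \in \{1, x, x_0 - x\}\) (valid since the molecular count can only decrease) and the explicit values of the \(\varvec{0}\)th generalized moments (Remark 3.5). This is a toy instance rather than the full hierarchy of Proposition 3.2, and all numerical values are arbitrary; the resulting bounds are hard but possibly loose at this low order.

```python
import numpy as np
import cvxpy as cp

c, x0, tf, K_deg = 0.1, 20, 1.0, 4
nm = 5                                     # moments y_0, ..., y_4 (m = 3, q = 1)

# Rows of d/dt E[x^j] = E[ a(x) ((x-2)^j - x^j) ] for j = 0, ..., 3.
A = c * np.array([[0,  0,   0,  0,  0],
                  [0,  1,  -1,  0,  0],
                  [0, -2,   4, -2,  0],
                  [0,  4, -10,  9, -3]], dtype=float)
Kmat = np.eye(4, nm)                       # K = [I 0] selects y_0, ..., y_3
y_init = np.array([float(x0)**j for j in range(nm)])

y_tf = cp.Variable(nm)
z = [cp.Variable(nm) for _ in range(K_deg + 1)]        # z(g_k; tf)

def in_CX(v):
    # Moment, localizing-x and localizing-(x0 - x) matrices, symmetrized.
    M0 = cp.bmat([[v[i + j] for j in range(3)] for i in range(3)])
    M1 = cp.bmat([[v[i + j + 1] for j in range(2)] for i in range(2)])
    M2 = cp.bmat([[x0 * v[i + j] - v[i + j + 1] for j in range(2)]
                  for i in range(2)])
    s = lambda M: 0.5 * (M + M.T)
    return [s(M0) >> 0, s(M1) >> 0, s(M2) >> 0]

cons = in_CX(y_tf) + [y_tf[0] == 1]
for k in range(K_deg + 1):
    cons += in_CX(z[k])                                # g_k >= 0 on [0, tf]
    cons += [z[k][0] == tf**(k + 1) / (k + 1)]         # Remark 3.5
    # Eq. (2) for g_k: K (g_k(tf) y(tf) - g_k(0) y(0)) = A z_k + k K z_{k-1}
    gk0 = 1.0 if k == 0 else 0.0
    rhs = A @ z[k] + (k * (Kmat @ z[k - 1]) if k >= 1 else 0)
    cons += [Kmat @ (tf**k * y_tf) - gk0 * (Kmat @ y_init) == rhs]

lb = cp.Problem(cp.Minimize(y_tf[1]), cons).solve()
ub = cp.Problem(cp.Maximize(y_tf[1]), cons).solve()
print(lb, ub)        # hard lower/upper bounds on E[x](t_f) at this order
```

Adding more test functions, time points or hierarchy levels tightens the feasible set exactly as described in Theorem 3.2.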

Remark 3.9

Problems similar to (SDP) can be formulated to bound other properties that can be described in terms of moments of non-negative measures on the reachable set; examples include variances [21], the volume of a confidence ellipsoid [60] and the mass that the probability measure assigns to a semialgebraic set [21].

The formulation of (SDP) provides several mechanisms to improve the bounds by adjusting the parameters \(\textsf{T}\), \(\textsf{F}\) and \(n_I\). Theorem 3.2 shows that appropriate adjustments lead to a sequence of monotonically improving bounds.

Theorem 3.2

Let \(y_{\varvec{i},t_f}^*\) be defined as in Theorem 3.1. Let \(\tilde{t} \in [0,t_T]\) and define \(\tilde{\textsf{T}} = \textsf{T}\cup \left\{ \tilde{t} \right\} \). Further, let \(\tilde{g}\) be an absolutely continuous function that is non-negative on \([0,t_T]\) and define \(\tilde{\textsf{F}} = \textsf{F} \cup \left\{ \tilde{g} \right\} \). Then,

$$\begin{aligned} y_{\varvec{i},t_f}^* \le \inf _{\{\varvec{y}_t\}, \{\varvec{z}^{l}_{g,t}\}} \qquad&y_{\varvec{i},t_f} \nonumber \\ s.t. \qquad&(\{\varvec{y}_t\}, \{\varvec{z}^{l}_{g,t}\}) \in \textsf{S}(\tilde{\textsf{F}},\tilde{\textsf{T}},n_I), \nonumber \\&(\varvec{y}_0,\{\varvec{z}^{l}_{g,0}\}) \in \textsf{S}_0(\tilde{\textsf{F}},n_I). \end{aligned}$$
(12)

Likewise,

$$\begin{aligned} y_{\varvec{i},t_f}^* \le \inf _{\{\varvec{y}_t\}, \{\varvec{z}^{l}_{g,t}\}} \qquad&y_{\varvec{i},t_f} \nonumber \\ s.t. \qquad&(\{\varvec{y}_t\}, \{\varvec{z}^{l}_{g,t}\}) \in \textsf{S}(\textsf{F},\textsf{T},n_I+1), \nonumber \\&(\varvec{y}_0,\{\varvec{z}^{l}_{g,0}\}) \in \textsf{S}_0(\textsf{F},n_I+1). \end{aligned}$$
(13)

Proof

Inequality (12) is clearly true if \(\tilde{t} \in \textsf{T}\) and \(\tilde{g} \in \textsf{F}\). If \(\tilde{t} \notin \textsf{T}\) and/or \(\tilde{g} \notin \textsf{F}\), any feasible point of the right-hand side of (12) can be used to construct a feasible point of (SDP); simply remove the decision variables that correspond to time point \(\tilde{t}\) and/or test function \(\tilde{g}\). Similarly, removing the iterated generalized moments of Level \(n_I+1\) of the right-hand side of (13) yields a feasible point for (SDP). \(\square \)

Remark 3.10

Increasing the truncation order also gives rise to monotonically improving bounds. For the sake of brevity, we omit a formal statement and proof here as many easily adapted results of this type exist; see for example Corollary 6 in Kuntz et al. [43].

Theorems 3.1 and 3.2 establish that the proposed necessary moment conditions provide a practical way to compute hard bounds on the transient moments associated with stochastic chemical systems alongside several mechanisms to tighten the bounds. A natural question in this context is whether these bounds converge to the true transient moments as the bound tightening mechanisms are taken to the limit. While we find that the bounds often become tight enough to be of practical value, we cannot claim convergence. This contrasts with many applications of the standard moment-sum-of-squares hierarchy, such as polynomial optimization [45], solving nonlinear PDEs [41], option pricing [48] and deterministic optimal control [37, 47], where convergence guarantees can be established under regularity conditions. The main obstacle to extending these results to bounding schemes for stochastic chemical systems is the discrete nature of the state space of such systems. On the one hand, the state space of open systems is an unbounded integer lattice and hence not basic closed semialgebraic. This violates a key assumption underpinning existing convergence guarantees. On the other hand, even though the state space of closed systems is a bounded integer lattice and thus a basic closed semialgebraic set, it typically features an exceedingly large number of elements, rendering the reflection of all individual elements of this set through LMIs intractable. To the best of our knowledge, the only moment bounding scheme for stochastic chemical systems that circumvents these complications and provides convergence guarantees is that of Kuntz et al. [43]. However, their construction relies on enumerating the discrete states through growing, finite truncations of the state space, complemented by moment bounds accounting for the truncated states. As such, their bounding scheme forfeits the perks of a purely moment-based description and is therefore more limited in practice.

We conclude this section with a brief discussion of the scalability of (SDP). Table 1 summarizes how the number of variables, affine constraints and LMIs, as well as the LMI dimensions, scale with \(\textsf{F}\), \(\textsf{T}\), \(n_I\) and the truncation order m. The results demonstrate the value of the proposed formulation if the number of species n in the system under investigation is large. In that case, the bound tightening mechanisms offered by adjusting \(\textsf{F}\), \(\textsf{T}\) and \(n_I\) scale much more moderately than increasing the truncation order. Furthermore, it should be emphasized that the invariance of the LMI size with respect to \(\textsf{F}\), \(\textsf{T}\) and \(n_I\) is a particularly desirable property for achieving scalability of SDP hierarchies in practice [1, 3]. Lastly, it is worth noting that moment-based SDPs are notorious for becoming numerically ill-conditioned as the truncation order increases; the presented hierarchy thus provides a mechanism to circumvent this issue to an extent.

Table 1 Scaling of (SDP)

4 Practical Considerations

4.1 Leveraging Causality for Decomposition

Techniques for the efficient numerical integration of ODEs hinge fundamentally on the causality that is inherent to the solution of ODEs. It enables the original problem, namely integration over a long time horizon, to be decomposed into a sequence of simpler, more tractable subproblems, each corresponding to integration over only a small fraction of the time horizon. In this section, we discuss how the structure of the presented optimization problems can be exploited in a similar spirit. Additionally, we show that such exploitation of structure gives rise to a mechanism for trading off tractability and bound quality.

Suppose we are interested in computing moment bounds at the end of a long time horizon \([0,t_f]\). In light of the arguments made in Sect. 3.2, it is reasonable to expect that the set \(\textsf{T}\) should ideally be populated with a large number of time points in this setting. Accordingly, solving the resultant optimization problem in one go may become prohibitively costly, despite the benign scaling of the SDP size with respect to \(|\textsf{T}|\). As alluded to at the beginning of this section, this limitation may be circumvented by decomposing the problem into a sequence of simpler subproblems, each of which covers only a fraction of the time horizon. To that end, suppose that \(\textsf{T} = \left\{ t_1, \dots , t_{n_{\textsf{T}}} \right\} \) is ordered with \(t_{n_{\textsf{T}}}=t_f\), and let \(t_0 = 0\). Further consider the subsets \(\textsf{T}_1, \dots , \textsf{T}_{n_{\textsf{T}}}\) of \(\textsf{T}\) such that \(\textsf{T}_k = \left\{ t_k \right\} \). We now define

$$\begin{aligned} \textsf{S}_{k} = {\left\{ \begin{array}{ll} \textsf{S}_0(\textsf{F},n_I), &{}\text {if } k = 0,\\ \left\{ \varvec{y}_{t_k}, \{\varvec{z}^{l}_{g,t_k}\} \left| \begin{array}{l} \exists (\varvec{y}_{t_{k-1}}, \{\varvec{z}^{l}_{g,t_{k-1}}\}) \in \textsf{S}_{k-1} \text { such that } \\ (\{\varvec{y}_{t_{k-1}}, \varvec{y}_{t_k}\}, \{\varvec{z}^{l}_{g,t_{k-1}}, \varvec{z}^{l}_{g,t_k}\}) \in \textsf{S}(\textsf{F},\textsf{T}_k, n_I) \end{array} \right\} , \right.&\text {if } k \ge 1. \end{array}\right. } \end{aligned}$$

At this point, it is worth emphasizing the meaning of each \(\textsf{S}_k\) and how its construction directly exploits the way we formulated the necessary moment conditions in \(\textsf{S}(\textsf{F},\textsf{T},n_I)\). To that end, note that each condition in \(\textsf{S}(\textsf{F},\textsf{T},n_I)\) only links variables corresponding to adjacent time points. As a consequence, the set \(\textsf{S}(\textsf{F},\textsf{T}_k,n_I)\) constrains only the variables \((\{\varvec{y}_{t_{k-1}}, \varvec{y}_{t_k}\}, \{\varvec{z}^{l}_{g,t_{k-1}}, \varvec{z}^{l}_{g,t_k}\})\). By construction of \(\textsf{S}_k\), we project out the variables \((\varvec{y}_{t_{k-1}}, \{\varvec{z}^{l}_{g,t_{k-1}}\})\) while imposing their membership in \(\textsf{S}_{k-1}\). It follows by induction that \(\textsf{S}_k\) precisely describes the projection of \(\textsf{S}(\textsf{F}, \cup _{i=1}^k \textsf{T}_{i}, n_I)\) onto the variables \((\varvec{y}_{t_{k}}, \{\varvec{z}^{l}_{g,t_{k}}\})\) under the condition that \((\varvec{y}_0, \{\varvec{z}^{l}_{g,0}\}) \in \textsf{S}_0(\textsf{F},n_I)\). By this argument, the original problem (SDP) is equivalent to the following reduced-space formulation:

$$\begin{aligned} \inf _{\varvec{y}_{t_f}, \{\varvec{z}^{l}_{g,t_f}\}} \qquad&y_{\varvec{i},t_f}\\ \text {s.t.} \qquad&(\varvec{y}_{t_f}, \{\varvec{z}^{l}_{g,t_f}\}) \in \textsf{S}_{n_{\textsf{T}}}, \end{aligned}$$

where all decision variables that correspond to time points before \(t_f\) have been projected out. It should be clear that the above optimization problem provides a computational advantage over the original problem only if the set \(\textsf{S}_{n_{\textsf{T}}}\) can be represented, or at least tightly approximated, in a “simple” way. To that end, we suggest successively computing conic outer approximations of the projections \(\textsf{S}_k\) according to Algorithm 1.

Algorithm 1 (Successive Overapproximation)

Note that Algorithm 1 parallels the decomposition approach taken in classical numerical integration of ODEs: the task of finding moment bounds at the end of the time horizon \([0,t_f]\) is decomposed into a sequence of smaller subproblems corresponding to finding moment bounds over smaller subintervals of the horizon, and each subproblem requires the solution of the previous subproblem as input data. In other words, Algorithm 1 propagates the moment bounds forward in time, subinterval by subinterval, in the same way as a numerical integrator propagates the state of a dynamical system forward in time.
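To make the propagation concrete, the following Julia sketch implements one plausible instantiation of Algorithm 1 with polyhedral outer approximations. The callback add_step_constraints!, which is assumed to impose the conditions of \(\textsf{S}(\textsf{F},\textsf{T}_k,n_I)\) linking consecutive time points, and the random sampling of support directions are assumptions of this sketch, not prescriptions of Algorithm 1.

```julia
using JuMP, MosekTools, LinearAlgebra

# Sketch of successive overapproximation: propagate a polyhedral outer
# approximation of the projection S_k forward in time. `H` is a list of
# halfspaces (a, b) representing {y : a'y <= b}.
function propagate_outer_approximation(add_step_constraints!, H0, nvars, nT; ndirs = 50)
    H = H0                                    # outer approximation of S_0
    for k in 1:nT
        Hk = Tuple{Vector{Float64},Float64}[]
        for _ in 1:ndirs
            d = normalize(randn(nvars))       # random support direction
            model = Model(Mosek.Optimizer)
            set_silent(model)
            @variable(model, y_prev[1:nvars]) # variables at t_{k-1}
            @variable(model, y_next[1:nvars]) # variables at t_k
            for (a, b) in H                   # y_prev in approximation of S_{k-1}
                @constraint(model, dot(a, y_prev) <= b)
            end
            add_step_constraints!(model, y_prev, y_next, k)
            @objective(model, Max, dot(d, y_next))
            optimize!(model)
            # d'y <= (max value) is a valid halfspace for the projection S_k
            push!(Hk, (d, objective_value(model)))
        end
        H = Hk
    end
    return H                                  # outer approximation of S_{n_T}
end
```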

We conclude this section with some final remarks. First, we would like to emphasize that the specific choices of the subdomains \(\textsf{T}_k\) in this section were made purely for clarity of exposition. The partition can be chosen as coarse as desired, i.e., each \(\textsf{T}_k\) can comprise multiple time points, requiring only minimal adjustments of Algorithm 1. Second, computing and representing the conic overapproximations in Algorithm 1 may be expensive, in particular if many moments are considered. For example, computing a polyhedral outer approximation of the positive semidefinite cone is known to converge exponentially slowly in the worst case [15]. Second-order cone approximations perform better empirically [1, 4] and theoretically [12]; however, they are more expensive to compute and represent. On the other hand, it may not be necessary to find overapproximations that are globally tight, but only ones that are tight near the optimal solution of the original problem. Finally, with the decisions on the accuracy of the overapproximation and the coarseness of the partition of \(\textsf{T}\) required by Algorithm 1, one is left with two mechanisms to trade off accuracy and computational cost.

4.2 Quantifying Approximation Quality

A natural question that arises from the formulation of Problem (SDP) is how to choose the parameters required for its construction, i.e., the sets \(\textsf{F}\) and \(\textsf{T}\) and the level \(n_I\) of the proposed constraint hierarchy. We will show that an approximation of Problem (OCP) can provide useful guidance for these choices. Specifically, we show that an approximation of (OCP) provides rigorous information on the best attainable bounds given that the truncation order m is fixed. To that end, recall that (OCP) requires optimization over an infinite-dimensional vector space, namely \(\mathcal {C}^\infty (\mathbb {R}_+)\). To overcome this challenge, we make two restrictions: on the one hand, we restrict our considerations to a compact interval \([0,t_T]\); on the other hand, we restrict the search space to the set of univariate polynomials up to a fixed but arbitrary maximum degree \(d \in \mathbb {Z}_+\). Note that the latter restriction is in some sense arbitrarily weak, as \(\mathbb {R}[t]\) is dense in \(\mathcal {C}^\infty ([0,t_T])\) [58].

The above-discussed restrictions enable the construction of a tractable approximation of (OCP) using the following result, which can be traced back to the work of Nesterov [53] as well as Powers and Reznick [56].

Proposition 4.1

Let m and d be positive integers. If d is odd, let \(r=k=\frac{d+1}{2}m\). Otherwise, let \(k=\left( \frac{d}{2}+1\right) m\) and \(r=\frac{dm}{2}\). Then, there exist two linear maps \(\alpha :\mathbb {S}^k \rightarrow \mathbb {S}^{m}[t]\) and \(\beta :\mathbb {S}^r \rightarrow \mathbb {S}^m[t]\) such that the matrix polynomial \(\varvec{X} \in \mathbb {S}^m[t]\) satisfies \(\varvec{X}(t) \succeq \varvec{0}\) on [0, 1] if and only if there exist two matrices \(\varvec{Q}_\alpha \in \mathbb {S}^k_+\) and \(\varvec{Q}_\beta \in \mathbb {S}^r_+\) such that \(\varvec{X} = \alpha (\varvec{Q}_\alpha ) + \beta (\varvec{Q}_\beta )\).
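For orientation, in the scalar case Proposition 4.1 recovers the classical Markov–Lukács theorem: a univariate polynomial p of even degree is non-negative on [0, 1] if and only if

$$\begin{aligned} p(t) = \sigma _0(t) + t(1-t)\,\sigma _1(t) \end{aligned}$$

for sum-of-squares polynomials \(\sigma _0\) and \(\sigma _1\) of appropriate degree, while for odd degree the certificate takes the form \(p = t\,\sigma _0 + (1-t)\,\sigma _1\); the maps \(\alpha \) and \(\beta \) encode the matrix-valued analogue of this decomposition.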

The maps \(\alpha \) and \(\beta \) in Proposition 4.1 are remarkably simple, and freely available software tools for sum-of-squares programming allow for a simple, concise implementation. The interested reader is referred to Ahmadi and El Khadir [2, Proposition 2] for an explicit description of \(\alpha \) and \(\beta \) alongside a simple proof of Proposition 4.1.
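As an illustration, the following sketch enforces positive semidefiniteness of a toy \(2\times 2\) matrix polynomial on [0, 1] using SumOfSquares.jl, assuming its matrix-SOS interface; the matrix X and the coefficient c are placeholders for illustration and not objects from this paper. The weighted-SOS certificate generated on the domain \(t(1-t)\ge 0\) plays the role of the maps \(\alpha \) and \(\beta \).

```julia
using DynamicPolynomials, SumOfSquares, MosekTools

# Toy stand-in for Proposition 4.1: find the smallest c such that
# X(t) is positive semidefinite for all t in [0, 1].
@polyvar t
model = SOSModel(Mosek.Optimizer)
@variable(model, c)

X11 = t^2 + c
X12 = t
X22 = (1 - t)^2 + c
X = [X11 X12; X12 X22]

interval = @set t * (1 - t) >= 0      # semialgebraic description of [0, 1]
@constraint(model, X in PSDCone(), domain = interval)
@objective(model, Min, c)
optimize!(model)
```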

Proposition 4.1 allows us to construct a tractable restriction of (OCP) on a compact horizon. The following theorem, which may be regarded as a special case of the results of Ahmadi and El Khadir [2], formalizes this claim.

Theorem 4.1

Let \(d \in \mathbb {Z}_+\). Then, the following semi-infinite optimization problem

$$\begin{aligned} \inf _{\varvec{y} \in \mathbb {R}_d^{n_L+n_H}[t]} \quad&y_{\varvec{j}}(t_f) \nonumber \\ s.t. \quad&\frac{d\varvec{y}_L}{dt}(t) = \varvec{A}_L\varvec{y}_L(t) + \varvec{A}_H\varvec{y}_H(t), \quad \forall t \in [0, t_T], \nonumber \\&\varvec{y}(0) = \varvec{y}_0, \nonumber \\&\varvec{y}(t) \in C(X), \quad \forall t \in [0, t_T]. \end{aligned}$$
(pOCP)

is equivalent to a finite SDP.

Proof

First, note that all equality constraints in the above optimization problem require equality of polynomials of fixed maximum degree. Accordingly, equality can be enforced by matching the coefficients of the polynomials when expressed in a common basis, which in turn can be done via finitely many affine equality constraints. Additionally, recall that \(C(X)\) is described in terms of finitely many LMIs. Thus, by Proposition 4.1 (applied after affinely rescaling \([0,t_T]\) to [0, 1]), the constraint \(\varvec{y}(t) \in C(X)\) for all \(t \in [0, t_T]\) can likewise be enforced via finitely many LMIs. \(\square \)
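To make the coefficient matching concrete: writing \(\varvec{y}(t) = \sum _{k=0}^{d} \varvec{y}_k t^k\) in the monomial basis, the dynamic constraint in (pOCP) is equivalent to the finitely many affine equality constraints

$$\begin{aligned} (k+1)\,\varvec{y}_{L,k+1} = \varvec{A}_L \varvec{y}_{L,k} + \varvec{A}_H \varvec{y}_{H,k}, \quad k = 0,\dots ,d, \end{aligned}$$

with the convention \(\varvec{y}_{L,d+1} = \varvec{0}\).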

Unfortunately, (pOCP) may be a severe restriction and is often even infeasible. However, the formulation of (pOCP) can be further relaxed without giving up too much relevant information. Specifically, we propose to restrict the solution space to piecewise polynomial functions, in analogy to the collocation approach to optimal control [18]. The following corollary to Theorem 4.1 formalizes this approach.

Corollary 4.1

Let \(d,n_{\textsf{T}} \in \mathbb {Z}_+\) and consider \(n_{\textsf{T}}+1\) time points \(t_0,\dots ,t_{n_{\textsf{T}}}\) such that \(0=t_0<t_1<\cdots <t_{n_{\textsf{T}}} \le t_T\). Further, suppose that \(t_f \in [t_{k-1},t_k]\) for some k. Then, the following semi-infinite optimization problem

$$\begin{aligned} \inf _{\varvec{y}^i \in \mathbb {R}_d^{n_L+n_H}[t]} \quad&y^k_{\varvec{j}}(t_f) \nonumber \\ s.t. \quad&\frac{d\varvec{y}^i_L}{dt}(t) = \varvec{A}_L\varvec{y}^i_L(t) + \varvec{A}_H\varvec{y}^i_H(t), \quad \forall t \in [t_{i-1}, t_{i}], \ \forall i \in \left\{ 1,\dots ,n_{\textsf{T}} \right\} , \nonumber \\&\varvec{y}^i(t_{i}) = \varvec{y}^{i+1}(t_{i}), \quad \forall i \in \left\{ 1,\dots ,n_{\textsf{T}}-1 \right\} , \nonumber \\&\varvec{y}^1(0) = \varvec{y}_0, \nonumber \\&\varvec{y}^i(t) \in C(X), \quad \forall t \in [t_{i-1}, t_{i}],\ \forall i \in \left\{ 1,\dots ,n_{\textsf{T}} \right\} \end{aligned}$$
(pwpOCP)

is equivalent to a finite SDP. Further, (pwpOCP) is a valid restriction of (SDP).

Proof

That (pwpOCP) is equivalent to a finite SDP follows immediately from Theorem 4.1. Further, let \(\left\{ \varvec{y}^i \right\} \) be feasible for (pwpOCP) and consider the piecewise polynomial obtained by piecing the \(\varvec{y}^i\) together as

$$\begin{aligned} \tilde{\varvec{y}}(t) = \varvec{y}^i(t), \ \forall t \in (t_{i-1},t_i] \text { and } \forall i \in \left\{ 1,\dots ,n_{\textsf{T}} \right\} . \end{aligned}$$

By construction \(\tilde{\varvec{y}}\) satisfies Eq. (mCME) and \(\tilde{\varvec{y}}(t) \in C(X)\), \(\forall t \in [0,t_f]\). Accordingly, the iterated generalized moments obtained from \(\tilde{\varvec{y}}\) satisfy Conditions (i) and (ii) in Proposition 3.2. Thus, it is straightforward to generate a feasible point for (SDP) from \(\tilde{\varvec{y}}\). \(\square \)

Since (pwpOCP) is fully independent of the choice of \(\textsf{F}\), \(\textsf{T}\) and \(n_I\), it provides a way to rigorously check the approximation quality of (SDP) against the baseline of (OCP). This can guide the user's choice of the truncation order m and the parameters \(\textsf{F}\), \(\textsf{T}\) and \(n_I\). Specifically, the difference between the optimal values of (pwpOCP) and (SDP) quantifies the potential for improvement from adding elements to \(\textsf{F}\) and \(\textsf{T}\) versus moving to a higher level in the proposed hierarchy.
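In particular, since the proof of Corollary 4.1 turns any feasible point of (pwpOCP) into a feasible point of (SDP) irrespective of the choice of \(\textsf{F}\), \(\textsf{T}\) and \(n_I\), the optimal values satisfy

$$\begin{aligned} \text {opt(SDP)} \le \text {opt(pwpOCP)} \end{aligned}$$

for any admissible choice of \(\textsf{F}\), \(\textsf{T}\) and \(n_I\) at fixed truncation order m. A small gap therefore signals that little can be gained by enriching \(\textsf{F}\), \(\textsf{T}\) or \(n_I\) alone and that a higher truncation order is required instead.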

5 Examples

In this section, we present several case studies that demonstrate the effectiveness of the proposed bounding hierarchy. We put special emphasis on showcasing that the proposed method enables the computation of substantially tighter bounds than can be obtained by the method of Dowdy and Barton [22]. Throughout, we use the subscript DB to indicate results obtained with Dowdy and Barton's method [22] and the subscript HB for those generated with the method presented in this paper.

5.1 Preliminaries

5.1.1 Reaction Kinetics

All considered reaction networks are modeled via the CME and are assumed to obey mass action kinetics such that Assumptions 1 and 2 are naturally satisfied.

5.1.2 State Space & LMIs

Following Dowdy and Barton [22], we reduce the state space of every reaction network explicitly to the minimum number of independent species by eliminating reaction invariants. Further, we employ the same set of LMI generating polynomials as suggested by Dowdy and Barton [22]. The resultant LMIs of the form (LMI) reflect non-negativity of the probability measure as well as its support on states with non-negative molecular counts.

5.1.3 Hierarchy Parameters

Applying the proposed bounding scheme requires the user to specify a range of parameters, namely the truncation order m, the hierarchy level \(n_I\), the test function set \(\textsf{F}\) and the set of time points \(\textsf{T}\) used to discretize the time horizon. While all these hierarchy parameters can in principle be chosen arbitrarily (assuming the test functions satisfy the hypotheses of Propositions 3.2 and 3.3) and independently of one another, a careful choice is essential to achieve a good trade-off between bound quality and computational cost. At present, we are not aware of a systematic way of choosing the hierarchy parameters optimally in that sense; however, our findings indicate that the following set of simple heuristics leads to good performance in practice.

  • The set of time points \(\textsf{T}\) is chosen by equidistantly discretizing the entire time horizon \([0,t_f]\), where \(t_f\) denotes the time point at which the bounds are to be evaluated, into \(n_{\textsf{T}}\) intervals.

  • In line with Dowdy and Barton’s original work [22], we employ exponential test functions of the form \(g(t) = e^{\rho (t_T-t)}\). As argued in Example 3.1, any set of test functions of this form satisfies the hypotheses of Propositions 3.2 and 3.3. Throughout, we choose \(t_T\) to coincide with the end of the time horizon on which the bounds are to be evaluated. As \(t_T\) merely controls the scale of the generalized moments generated by g, this choice is somewhat arbitrary but contributes in our experience to improved numerical conditioning of problem (SDP). For the choice of the parameters \(\rho \), we draw motivation from linear time-invariant systems theory and choose \(\rho \) based on the singular values of the coefficient matrix \(\varvec{A}\) of the moment dynamics (mCME). Overall, we choose the test function set \(\textsf{F} = \left\{ e^{-\sigma _i(t_T-t)} \right\} _{i=1}^{n_{\textsf{F}}}\) assembled from the smallest \(n_{\textsf{F}}\) unique singular values \(\sigma _1,\dots ,\sigma _{n_{\textsf{F}}}\) of \(\varvec{A}\).

  • Motivated by the scaling of the size of the bounding SDP (see Table 1), we use the following greedy procedure to ultimately choose m, \(n_{I}\), \(n_{\textsf{T}}\) and \(n_{\textsf{F}}\) (a code sketch follows after this list):

    1. Fix \(m = 2\), \(n_{I} = 2\), \(n_{\textsf{F}} = 1\) and successively increase \(n_{\textsf{T}}\) until no significant bound tightening effect is observed.

    2. Increase \(n_{\textsf{F}}\) successively until no significant bound tightening effect is observed.

    3. Increase m until bounds are sufficiently tight or the computational cost exceeds a tolerable amount.

    Note that the above procedure fixes the hierarchy level \(n_I\) at 2. While increasing \(n_I\) generally also has a bound tightening effect, in our experience it promotes numerical ill-conditioning and is rarely significantly more efficient than the other bound tightening mechanisms.
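The following Julia sketch outlines the above heuristics. Here, solve_bound, a function returning the optimal value of (SDP) for given hierarchy parameters, and the stalling tolerance tol are hypothetical placeholders; the helper test_function_rates reflects the singular-value-based choice of \(\rho \) described above.

```julia
using LinearAlgebra

# Rates ρ_i = -σ_i from the n_F smallest unique singular values of the
# coefficient matrix A of the moment dynamics (mCME).
test_function_rates(A, nF) = -first(sort(unique(svdvals(A))), nF)

# Greedy parameter selection; `solve_bound(m, nI, nF, nT)` is assumed to
# return the optimal value of the corresponding bounding SDP.
function greedy_parameters(solve_bound; tol = 1e-3)
    m, nI, nF, nT = 2, 2, 1, 1
    bound = solve_bound(m, nI, nF, nT)
    while true                           # Step 1: refine the time grid
        candidate = solve_bound(m, nI, nF, nT + 1)
        abs(candidate - bound) <= tol * abs(bound) && break
        nT += 1; bound = candidate
    end
    while true                           # Step 2: enrich the test function set
        candidate = solve_bound(m, nI, nF + 1, nT)
        abs(candidate - bound) <= tol * abs(bound) && break
        nF += 1; bound = candidate
    end
    # Step 3: increase m until the bounds are tight enough or the
    # computational budget is exhausted (budget-dependent, omitted here).
    return m, nI, nF, nT, bound
end
```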

We study the bound tightening effect of the different hierarchy parameters in greater detail with an example in Sect. 5.2. In all other case studies, we employ the above heuristics to choose the hierarchy parameters. As the time point and test function sets are systematically generated once the parameters \(n_{\textsf{T}}\) and \(n_{\textsf{F}}\) are chosen, we instead report these parameters in place of \(\textsf{T}\) and \(\textsf{F}\) throughout.

5.1.4 Numerical Considerations

Sum-of-squares and moment problems are notorious for poor numerical conditioning, and the Problems (SDP) and (pwpOCP) are no exception. While promising general remedies exist in theory, such as the use of better-suited polynomial bases like Chebyshev polynomials [34], how to deploy them effectively in practice remains a largely open research problem. To circumvent this deficiency, we instead employ a simple scaling strategy for the decision variables in the bounding problems. This strategy is applicable whenever bounds are computed at multiple time points \(t_1< t_2< \cdots \) along a trajectory and can be summarized as follows: we solve the SDPs in chronological order and scale the decision variables in the bounding problem corresponding to time point \(t_k\) by the values attained at the solution of the bounding problem associated with the previous time point \(t_{k-1}\). For the initial problem, we scale the system state such that the moments of the initial distribution lie between 0 and 1. This strategy shifts the orders of magnitude of the moments into the coefficients of the constraints of Problem (SDP) and thereby exposes them directly to the SDP solver, which is greatly beneficial in our experience.

While an appropriate scaling of the decision variables is crucial to avoid numerical issues, it is not always sufficient for the solver to converge to the desired accuracy with respect to optimality. We hedge against potentially inaccurate, suboptimal solutions and ensure validity of the computed bounds by verifying that the solver converged to a dual feasible point and reporting the associated dual objective value.
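The following JuMP sketch outlines this workflow. Here, build_sdp, a hypothetical constructor assembling the bounding SDP at the k-th time point with decision variables rescaled by scale, stands in for the problem assembly described above.

```julia
using JuMP

# Solve the bounding SDPs in chronological order, rescaling each problem
# by the solution of its predecessor, and report dual objective values.
function solve_scaled_sequence(build_sdp, nT, nvars)
    scale = ones(nvars)                   # initial moments scaled into [0, 1]
    bounds = Float64[]
    for k in 1:nT
        model, y = build_sdp(k, scale)
        optimize!(model)
        # Validity of the reported bound only requires dual feasibility.
        @assert dual_status(model) == MOI.FEASIBLE_POINT
        push!(bounds, dual_objective_value(model))
        scale = abs.(value.(y)) .+ eps()  # rescale the next problem
    end
    return bounds
end
```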

5.1.5 Implementation

All semidefinite programs solved for the case studies presented in this section were assembled using JuMP [23] and solved with MOSEK v9.0.97 [6]. Our implementation is publicly available at https://github.com/FHoltorf/StochMP.

5.2 Bound Tightening Mechanisms

In this section, we assess the effect of the different bound tightening mechanisms provided by the proposed bounding hierarchy. We conduct this empirical analysis on the basis of the nonlinear birth-death process

$$\begin{aligned} \emptyset {\mathop {\rightarrow }\limits ^{c_1}}\text{ A }, \qquad 2\text{ A } {\mathop {\rightarrow }\limits ^{c_2}} \emptyset . \end{aligned}$$
(14)

Note that this reaction system has an unbounded state space. Such systems are of particular interest for the application of moment bounding schemes, as their moments can rarely be found analytically and the corresponding CME gives rise to a countably infinite system of coupled ODEs, precluding direct numerical integration.
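To make this concrete, under stochastic mass action kinetics (assuming the propensity \(c_2 x(x-1)/2\) for the bimolecular channel), the mean molecular count x of A in (14) evolves according to

$$\begin{aligned} \frac{d}{dt}\mathbb {E}[x] = c_1 - c_2\left( \mathbb {E}[x^2] - \mathbb {E}[x]\right) , \end{aligned}$$

so that the dynamics of each moment invoke the next higher-order moment and the moment hierarchy never closes.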

For the sake of simplicity, we restrict our considerations here to studying the bound tightening effect of increasing the truncation order m, the hierarchy level \(n_I\) and the number of time points \(n_{\textsf{T}}\) used to discretize the horizon; throughout, we only use the constant test function (\(n_{\textsf{F}} = 1\)).
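The empirical references shown in Fig. 1 are generated with Gillespie's Stochastic Simulation Algorithm. For a network as small as (14), a minimal implementation suffices; the following Julia sketch uses placeholder rate constants, and the propensity \(c_2 x(x-1)/2\) for the bimolecular channel again reflects the usual convention for stochastic rate constants.

```julia
using Random, Statistics

# Minimal Gillespie SSA for the birth-death process (14):
#   ∅ --c1--> A,   2A --c2--> ∅
function ssa_final_state(c1, c2, x0, tf; rng = Random.default_rng())
    t, x = 0.0, x0
    while true
        a1 = c1                       # birth propensity
        a2 = c2 * x * (x - 1) / 2     # pair-annihilation propensity
        t += randexp(rng) / (a1 + a2) # exponential waiting time
        t > tf && return x
        if rand(rng) * (a1 + a2) < a1
            x += 1                    # birth fires
        else
            x -= 2                    # annihilation fires
        end
    end
end

# Empirical mean and variance at t_f from 10^4 sample paths
# (c1, c2, and t_f are placeholder values):
samples = [ssa_final_state(1.0, 0.01, 0, 20.0) for _ in 1:10_000]
mean(samples), var(samples)
```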

Figure 1 shows the effect of isolated changes in the different hierarchy parameters on the bounds obtained for the mean molecular count of A and its variance. The results indicate that all bound tightening mechanisms, when used in isolation, appear to suffer from diminishing returns, eventually causing the bounds to stall. Moreover, solely increasing the truncation order m appears insufficient to provide informative bounds over a long time horizon in this example; increasing either the number of time points or the hierarchy level \(n_I\) is significantly more effective in comparison.

Figure 2 shows the effect of joint changes in the considered hierarchy parameters on the tightness of the bounds on the mean molecular count. The figure indicates that jointly changing the hierarchy parameters effectively mitigates stalling of the bounds in this example, such that significantly tighter bounds are obtained overall. While the general trends illustrated in Figs. 1 and 2 align well with our experience across a range of other examples, we wish to emphasize that it is in general hard to predict which combination of hierarchy parameters provides the best trade-off between computational cost and bound quality; when the choice of test functions is added to the equation, the situation becomes even more complicated. Moreover, as Fig. 3 illustrates, the feasible region of the bounding SDPs shrinks anisotropically, and different bound tightening mechanisms contract it with different intensity along different directions. This indicates that the optimal choice of the hierarchy parameters generally depends not only on the system under investigation but also on the statistical quantity to be bounded.

In summary, the results presented in this section underline the value of the additional bound tightening mechanisms offered by the proposed hierarchy; however, they also emphasize the need for better guidelines to enable an effective use of the tightening mechanisms in practice.

Fig. 1 Bounds on the trajectories of the mean molecular count and variance of the birth-death process (14) for increasing m (a, b), \(n_I\) (c, d), and \(n_{\textsf{T}}\) (e, f), compared against the empirical sample mean and variance generated with Gillespie's Stochastic Simulation Algorithm. In each figure, only one parameter is varied while the others are held constant at the level indicated in the subcaptions

Fig. 2 Maximum gap between upper and lower bounds on mean molecular count of the birth-death process (14) among the time points probed along the time horizon for joint changes in m and \(n_I\) (a), m and \(n_{\textsf{T}}\) (b), and \(n_I\) and \(n_{\textsf{T}}\) (c)

Fig. 3 Projection of the feasible set of (SDP) corresponding to the birth-death process (14) for (a) increasing truncation orders m, (b) increasing number of time points \(n_{\textsf{T}}\), and (c) jointly increasing truncation orders and number of time points. All projections are obtained for \(n_{I} = 2\) and \(t_f = 20\)

5.3 Generic Examples

To contrast the performance of the proposed methodology with its predecessor, we first consider two generic reaction networks that were studied by Dowdy and Barton [22].

5.3.1 Simple Reaction Network

First, we study the bound quality for means and variances of the molecular counts of the species A and C following the simple reaction network

$$\begin{aligned} \text{ A } + \text{ B } {\mathop {\rightarrow }\limits ^{c_1}} \text{ C } \underset{c_{2}}{\overset{c_{3}}{\rightleftharpoons }}\text{ D }. \end{aligned}$$
(15)

Figure 4 shows a comparison between the bounds obtained by both methods. For reference, trajectories obtained with Gillespie's Stochastic Simulation Algorithm are also provided. The results showcase that the presented necessary moment conditions have the potential to tighten the obtained bounds significantly. In particular, the bounds on the variance of both species are improved dramatically at the relatively low truncation order of \(m=4\).

Fig. 4 Bounds on (a) means and (b, c) variances of molecular counts of species A and C in reaction network (15); initial state: \(x_{A,0} = 40\), \(x_{B,0} = 41\), and \(x_{C,0}=x_{D,0}=0\); kinetic parameters: \(\varvec{c} = (1, 2.1, 0.3) \, \hbox {s}^{-1}\); hierarchy parameters: \(m = 4\), \(n_{\textsf{F}}=3\), \(n_{\textsf{T}} = 10\)

5.3.2 Large Reaction Network

Departing from toy-sized examples, we now study the reaction network shown in Fig. 5. In contrast to the previous reaction systems, this large reaction network poses a major challenge for sampling-based analysis techniques. The underlying reason is twofold. On the one hand, the network is characterized by a large, 7-dimensional state space containing hundreds of millions of reachable states [22]. On the other hand, the system is extremely stiff. These properties frustrate sampling-based techniques as they exacerbate the need for large sample sizes and render each sample path evaluation expensive.

Figure 6 shows bounds on the mean molecular counts of species A and H. In line with the results of the previous sections, the bounds obtained by the proposed method are again considerably tighter. In this example, however, this result carries more weight, as increasing the truncation order leads to a prohibitive increase in problem size for the method of Dowdy and Barton [22]. Accordingly, the proposed bounding scheme offers bounds of a quality that was previously unattainable for problems of this complexity.

Fig. 5 Large reaction network from Dowdy and Barton [22]

Fig. 6 Bounds on the mean molecular counts of species A and H in the large reaction network shown in Fig. 5; initial state: \(x_{\text{ A },0}=x_{\text{ F },0} = 53\), \(x_{\text{ B },0} = x_{\text{ C },0} = x_{\text{ D },0} = x_{\text{ E },0} = x_{\text{ G },0} = x_{\text{ H },0} = x_{\text{ I },0} = x_{\text{ J },0} = 0\); kinetic parameters: \(\varvec{c} = (1, 1, 1, 10^4, 1, 1, 10^5, 1) \, \hbox {s}^{-1}\); hierarchy parameters: \(m=2\), \(n_{\textsf{F}}=5\), \(n_{\textsf{T}} = 5\)

5.4 Biochemical Reaction Networks

To finally demonstrate that the proposed methodology may in fact be useful in practice, we examine two reaction networks drawn from biochemical applications. In these applications, molecular counts are often on the order of tens to hundreds, necessitating the consideration of stochasticity.

5.4.1 Michaelis–Menten Kinetics

Michaelis–Menten kinetics underpin a vast range of metabolic processes. Understanding the behavior and noise present in the associated reaction networks is of particular value for the investigation of the metabolic degradation of trace substances in biological organisms. We examine the basic Michaelis–Menten reaction network:

$$\begin{aligned}&\text{ S } + \text{ E } \underset{c_{2}}{\overset{c_{1}}{\rightleftharpoons }} \text{ S }:\text{ E } {\mathop {\rightarrow }\limits ^{c_3}} \text{ P } + \text{ E } \nonumber \\&\text{ P } {\mathop {\rightarrow }\limits ^{c_4}} \text{ S }. \end{aligned}$$
(16)
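For reference, network (16) conserves both the total amount of enzyme and the total amount of substrate material, giving the two reaction invariants

$$\begin{aligned} x_{\text{ E }} + x_{\text{ S }:\text{ E }} = \text {const}, \qquad x_{\text{ S }} + x_{\text{ S }:\text{ E }} + x_{\text{ P }} = \text {const}. \end{aligned}$$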

Due to these reaction invariants, the reaction network is characterized entirely by a two-dimensional state space. Accordingly, we bound the means and variances of the molecular counts of only the product P and the substrate S. The results are illustrated in Fig. 7. The proposed method produces high-quality bounds for the means, while the bounds for the variances remain tight enough to be informative. Further, it again outperforms its predecessor, especially with respect to the bounds on the variances.

Fig. 7 Bounds on means (a) and variances (b, c) of the molecular counts of species S and P in the metabolic reaction network (16); initial state: \(x_{\text{ S },0} = x_{\text{ E },0} = 100\), \(x_{\text{ P },0} = x_{\text{ S }:\text{ E },0}=0\); kinetic parameters: \(\varvec{c} = (1, 1, 1, 1) \, \hbox {s}^{-1}\); hierarchy parameters: \(m = 4\), \(n_{\textsf{F}} =3\), \(n_{\textsf{T}} = 20\)

To showcase further that the proposed hierarchy can provide notable computational savings in practice, we also investigate the trade-off between computational cost and bound quality for a wide range of hierarchy parameters. As the standard moment-sum-of-squares hierarchy, i.e., the use of monomial test functions, provides an established alternative for bounding moment trajectories of stochastic chemical systems [61], we compare against this baseline. Figure 8 shows this comparison for bounds on the mean molecular count of species \(\text{ S }\) at the end of the time horizon \(t_f = 5\,\hbox {s}\). The proposed bounding problems allow the computation of tighter bounds overall, often at a dramatic cost reduction. This advantage persists for almost all choices of the hierarchy parameters from the tested range, although some choices are significantly better than others. The results further indicate that the new bound tightening mechanisms of refining the time domain partition and increasing the hierarchy level \(n_I\) can be more effective than increasing the truncation order m, emphasizing the practical improvement upon Dowdy and Barton's method.

Fig. 8 Trade-off between bound quality and computational cost for the proposed hierarchy and the standard moment-sum-of-squares hierarchy when applied to the metabolic reaction network (16). Small markers refer to results obtained with the proposed bounding problems. For the proposed bounding problems, instances were constructed using hierarchy parameters in the range \(m \in \{2, 4, 6\}\), \(n_{\textsf{T}} \in \{1, \dots , 100\}\), \(n_I \in \{1, \dots , 7\}\) and \(n_{\textsf{R}} \in \{1, 2, 3\}\) such that the computation time does not exceed 200 s. For the moment-sum-of-squares hierarchy (M-SOS), the total degree m of the monomial test functions was varied

5.4.2 Negative Feedback Biocircuit

Many efforts of modern synthetic biology culminate in the design of biocircuits subject to stringent constraints on robustness and performance. Upon successful design, the implications of such tailored biocircuits are often far-reaching, even addressing global challenges such as water pollution [66] and energy [55]. Accordingly, the use of systems-theoretic techniques to conceptualize and speed up the design process of biocircuits has received considerable attention in recent years [19]. In this context, Sakurai and Hori [60] demonstrated the utility of bounds on stationary moments for the design of biocircuits subject to robustness constraints.

We show here that the proposed bounding method could enable an extension of Sakurai and Hori’s analysis to the dynamic case. To that end, we consider the same negative feedback biocircuit as studied in [60]. The reaction network is illustrated in Fig. 9 and formally given by the following set of reactions

$$\begin{aligned} \begin{array}{l} \text{ DNA } {\mathop {\rightarrow }\limits ^{c_1}} \text{ DNA } + \text{ mRNA } \\ \text{ mRNA } {\mathop {\rightarrow }\limits ^{c_2}} \emptyset \\ \text{ mRNA } {\mathop {\rightarrow }\limits ^{c_3}} \text{ mRNA } + \text{ P } \\ \text{ P } {\mathop {\rightarrow }\limits ^{c_4}} \emptyset \\ \text{ P } + \text{ DNA } \underset{c_{5}}{\overset{c_{6}}{\rightleftharpoons }} \text{ P }:\text{ DNA } \end{array}. \end{aligned}$$
(17)

Figure 10 illustrates the obtained bounds on means and variances of the molecular counts of mRNA and protein (P). The bounds are of high quality and may provide useful information for robustness analysis as the noise level measured by the variance changes significantly over the time horizon until a stationary value is reached. Figure 10 also underlines that the proposed bounding scheme is capable of furnishing considerably tighter bounds at the same truncation order as its predecessor. Figure 11 further provides a comparison of the bound quality vs. cost trade-off against the baseline of the standard moment-sum-of-squares hierarchy. While the proposed bounding scheme does not outperform the standard moment-sum-of-squares hierarchy at low precision for this example, it enables the computation of tighter bounds at similar cost overall. Most critically, it needs to be emphasized that the bounding problems of the moment-sum-of-squares hierarchy could not be solved beyond the truncation order \(m = 14\) due to numerical issues. This highlights the practical value of the proposed tightening mechanisms that avoid increasing the truncation order.

Fig. 9 Negative feedback biocircuit. Reproduced from Sakurai and Hori [60]

Fig. 10 Bounds on means (a) and variances (b) of the molecular counts of \(\text{ mRNA }\) and P in the negative feedback biocircuit illustrated in Fig. 9; initial state: \(x_{\text{ mRNA },0} = 10\), \(x_{\text{ P },0} = x_{\text{ P }:\text{ DNA },0} = 0\), \(x_{\text{ DNA },0} = 20\); kinetic parameters: \(\varvec{c} = (0.2, \ln (2)/5, 0.5, \ln (2)/20, 5, 1) \, \hbox {min}^{-1}\); hierarchy parameters: \(m = 4\), \(n_{\textsf{F}} = 3\), \(n_{\textsf{T}} = 20\)

Fig. 11 Trade-off between bound quality and computational cost for the proposed hierarchy and the standard moment-sum-of-squares hierarchy when applied to the biocircuit (17). Small markers refer to results obtained with the proposed bounding problems. For the proposed bounding problems, instances were constructed using hierarchy parameters in the range \(m \in \{2, 4, 6\}\), \(n_{\textsf{T}} \in \{1, \dots , 100\}\), \(n_I \in \{1, \dots , 7\}\) and \(n_{\textsf{R}} \in \{1, 2, 3\}\) such that the computation time does not exceed 1000 s. For the moment-sum-of-squares hierarchy (M-SOS), the total degree m of the monomial test functions was varied

6 Conclusion

6.1 Summary

We have extended the results of Dowdy and Barton [22] by constructing a new hierarchy of convex necessary moment conditions for the moment trajectories of stochastic chemical systems described by the CME. Building on a discretization of the time domain of the problem, the conditions reflect temporal causality and regularity properties of the true moment trajectories. It is proved that the conditions give rise to a hierarchy of SDPs whose optimal values form a sequence of monotonically improving bounds on the true moment trajectories. Furthermore, the conditions provide new mechanisms to tighten the obtained bounds when compared to the original conditions proposed by Dowdy and Barton [22]. These tightening mechanisms are often a more scalable and practical alternative to the primary tightening mechanism of increasing the moment truncation order in Dowdy and Barton’s moment bounding scheme [22]; most notably, refining the time discretization results in linearly increasing problem sizes, independent of the state space dimension of the system. As an additional advantage, this bound tightening mechanism provides a way to sidestep the poor numerical conditioning of moment-based SDPs featuring high-order moments. Finally, it is demonstrated with several examples that the proposed hierarchy provides bounds that may indeed be useful in practice.

6.2 Open Questions

We close by stating some open questions motivated by our results.

  1. In the presented case studies, we naively chose the time points at which the proposed necessary moment conditions are imposed to be equidistant. Several results from numerical integration, perhaps most notably Gauß quadrature, suggest that this is likely a suboptimal choice. It would be interesting to examine whether and how results from numerical integration can inform improvements of this choice.

  2. The choice of the hierarchy parameters in the proposed bounding scheme is crucial to achieve a good trade-off between bound quality and computational cost. As indicated by the discussion in Sect. 5.2, however, the interplay between the bound tightening mechanisms associated with the different hierarchy parameters and their effect on the bound quality remains poorly understood. Accordingly, we believe that assessing the trade-offs offered by the different bound tightening mechanisms in greater detail and developing more rigorous guidelines on how to utilize them effectively constitutes an important step toward improving the practicality of the proposed method.

  3. The ideas discussed in Sect. 4 constitute additional promising research avenues toward improving the practicality of the proposed method. Specifically, there are several open questions pertaining to the concrete implementation of Algorithm 1 and the way Problem (pwpOCP) can be used to inform an effective use of the different bound tightening mechanisms. Furthermore, the decomposable, weakly coupled structure of the bounding SDPs motivates forms of exploitation other than Algorithm 1; in particular, the use of distributed optimization techniques such as ADMM [14] or overlapping Schwarz decomposition [51, 65] appears promising.