Abstract
We present a new, efficient algorithm for inferring stochastic rate parameters for chemical reaction network models from time-series data or high-throughput data (e.g., flow cytometry). Our algorithm combines the Gillespie stochastic simulation algorithm (including approximate variants such as tau-leaping) with the cross-entropy method. It can also work with incomplete datasets missing some model species, and with multiple datasets originating from experiment repetitions. We evaluate our algorithm on a number of challenging case studies, including bistable systems (the Schlögl model and the toggle switch) and experimental data.
1 Introduction
In this paper we are concerned with the inference of biochemical reaction stochastic rate parameters from data. Reactions are discrete events that can occur randomly at any time with a rate dependent on the chemical kinetics [40]. It has recently become clear that stochasticity can produce dynamics profoundly different from those of the corresponding deterministic models. This is the case, e.g., in genetic systems where key species are present in small numbers or where key reactions occur at a low rate [23], resulting in transient, stochastic bursts of activity [4, 24]. The standard model for such systems is the Markov jump process popularised by Gillespie [13, 14]. Given a collection of reactions modelling a biological system and time-course data, the stochastic parameter inference problem is to find parameter values for which the Gillespie model’s temporal behaviour is most consistent with the data. This is a very difficult problem, much harder, both theoretically and computationally, than the corresponding problem for deterministic kinetics (see, e.g., [41, Sect. 1.3]). One simple reason is that stochastic models can behave very differently from the same initial conditions. (The related issue of parameter non-identifiability is outside the scope of this paper, but the interested reader can find more in, e.g., [37, 38] and references therein.) Additionally, experimental data is usually sparse and most often involves only a limited subset of a model’s species, and the system under study might exhibit multimodal behaviour. Also, data might not relate directly to a species: it might be measured in arbitrary units (e.g., fluorescence measurements), thus requiring the estimation of scaling factors, or it might be described by frequency distributions (e.g., high-throughput data such as flow cytometry). Stochastic parameter inference is thus a fundamental and challenging problem in systems biology, and it is crucial for obtaining validated and predictive models.
In this paper we propose an approach to the parameter inference problem that combines Gillespie’s Stochastic Simulation Algorithm (SSA) with the cross-entropy (CE) method [27]. The CE method has been successfully used in optimisation, rare-event probability estimation, and other domains [29]. For parameter inference, Daigle et al. [8] combined a stochastic Expectation-Maximisation (EM) algorithm with a modified cross-entropy method. We instead develop the cross-entropy method in its own right, discarding the costly EM algorithm steps. We also show that our approach can utilise approximate, faster SSA variants such as tau-leaping [15]. In summary, the main contributions of this paper are:

- we present a new, cross-entropy-based algorithm for the stochastic parameter inference problem that outperforms previous state-of-the-art approaches;

- our algorithm can work with multiple, incomplete, and distribution datasets;

- we show that tau-leaping can be used within our technique;

- we provide a thorough evaluation of our algorithm on a number of challenging case studies, including bistable systems (the Schlögl model and the toggle switch) and experimental data.
2 Background
Notation. Given a system with n chemical species, the state of the system at time t is represented by the vector \(\varvec{x}(t) = (x_1(t), \ldots , x_n(t))\), where \(x_i\) represents the number of molecules of the ith species, \(S_i\), for \(i \in \{1,\ldots ,n\}\). A well-mixed system within a fixed volume at a constant temperature can be modelled by a continuous-time Markov chain (CTMC) [13, 14]. The CTMC state changes are triggered by the (probabilistic) occurrences of chemical reactions. Given m chemical reactions, let \(\mathcal {R}_j\) denote the jth reaction of type:
\[\mathcal {R}_j:\ \nu _{j,1}^- S_1 + \cdots + \nu _{j,n}^- S_n \xrightarrow {\theta _j} \nu _{j,1}^+ S_1 + \cdots + \nu _{j,n}^+ S_n,\]
where the vectors \(\varvec{\nu }_j^-\) and \(\varvec{\nu }_j^+\) represent the stoichiometries of the underlying chemical kinetics for the reactants and products, respectively. Let \(\varvec{\nu }_j \in \mathbb {Z}^n\) denote the overall (non-zero) state-change vector for the jth reaction type, specifically \(\varvec{\nu }_j = \varvec{\nu }_j^{+} - \varvec{\nu }_j^{-}\), for \(j \in \{1,\ldots ,m\}\). Assuming mass action kinetics (and omitting time dependency for \(\varvec{x}(t)\)), the reaction \(\mathcal {R}_j\) leads to the propensity [41]:
\[h_j(\varvec{x},\theta _j) = \theta _j\, \alpha _j(\varvec{x}) = \theta _j \prod _{i=1}^{n} \binom{x_i}{\nu _{j,i}^-}, \qquad (1)\]
where \(\varvec{\theta } = (\theta _1,\ldots ,\theta _m)^{\intercal }\) is the vector of rate constants. In general, \(\varvec{\theta }\) is unknown and must be estimated from experimental data—that is the aim of our work. Our algorithm can work with propensity functions factorisable as in (1), but it is not restricted to mass action kinetics (i.e., the functions \(\alpha _j\)’s can be arbitrary).
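To make the factorisation \(h_j = \theta _j\,\alpha _j(\varvec{x})\) concrete, the following minimal Python sketch computes mass-action propensities, assuming the usual combinatorial form \(\alpha _j(\varvec{x}) = \prod _i \binom{x_i}{\nu ^-_{j,i}}\). The function names and the two-reaction example are hypothetical illustrations, not part of the SPICE code base.

```python
from math import comb


def alpha(x, nu_minus):
    """alpha_j(x) under mass-action kinetics: the number of distinct reactant
    combinations, prod_i C(x_i, nu^-_{j,i})."""
    a = 1
    for xi, vi in zip(x, nu_minus):
        a *= comb(xi, vi)  # comb(n, k) is 0 when k > n: too few molecules to react
    return a


def propensities(x, theta, reactant_stoich):
    """h_j(x, theta_j) = theta_j * alpha_j(x) for every reaction j."""
    return [th * alpha(x, nu) for th, nu in zip(theta, reactant_stoich)]


# Hypothetical two-reaction example over species (A, B, C):
#   R1: A + B -> C   (nu^-_1 = (1, 1, 0))
#   R2: 2A -> B      (nu^-_2 = (2, 0, 0))
h = propensities((10, 5, 0), theta=[0.1, 0.01], reactant_stoich=[(1, 1, 0), (2, 0, 0)])
print(h)  # theta_1 * 10 * 5 and theta_2 * C(10, 2) = theta_2 * 45
```

Because only \(\alpha _j\) depends on the state, any other kinetic law can be plugged in by replacing `alpha`, which mirrors the remark that the \(\alpha _j\)’s can be arbitrary.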
Cross-Entropy Method for Optimisation. The Kullback-Leibler divergence [20] or cross-entropy (CE) between two probability densities g and h is:
\[\mathcal {D}(g,h) = \mathbb {E}_g\left[ \ln \frac{g(\varvec{X})}{h(\varvec{X})}\right] ,\]
where \(\varvec{X}\) is a random variable with density g, and \(\mathbb {E}_g\) is expectation w.r.t. g. Note that \( \mathcal {D}(g,h) \ge 0\), with equality iff \(g=h\) (almost everywhere). (However, in general \(\mathcal {D}(g,h)\ne \mathcal {D}(h,g)\).) The CE method has been successfully adopted for a wide range of hard problems, including rare event simulation for biological systems [7], and discrete and continuous optimisation [28, 29]. Consider the minimisation of an objective function J over a space \(\chi \) (assuming such a minimum exists), \(\gamma ^*=\min \limits _{x \in \chi } J(x)\). The CE method performs a Monte Carlo search over a parametric family of densities \(\{f(\cdot ;\varvec{v}),\varvec{v}\in \mathcal {V}\}\) on \(\chi \) that contains as a limit the (degenerate) Dirac density that puts its entire mass on a value \(x^*\in \chi \) such that \(J(x^*) = \gamma ^*\), the so-called optimal density. The key idea is to use the CE to measure how far a candidate density is from the optimal density. In particular, the method solves a sequence of optimisation problems of the type below, for different values of \(\gamma \), by minimising the CE between a putative optimal density \(g^*(\varvec{x}) \propto I_{\{J(\varvec{x})\le \gamma \}}f(\varvec{x}; \varvec{v}^*)\), for some \(\varvec{v}^*\in \mathcal {V}\), and the density family \(\{f(\cdot ;\varvec{v}),\varvec{v}\in \mathcal {V}\}\):
\[\hat{\varvec{v}}^* = \mathop {\arg \max }\limits _{\varvec{v}\in \mathcal {V}}\ \mathbb {E}_{\varvec{u}}\left[ I_{\{J(\varvec{X})\le \gamma \}} \ln f(\varvec{X};\varvec{v})\right] , \qquad (2)\]
where I is the indicator function and \(\varvec{X}\) has density \(f(\cdot ;\varvec{u})\) for \(\varvec{u}\in \mathcal {V}\). The definition of density \(g^*\) above essentially means that, for a given \(\gamma \), we only consider densities that are positive only for arguments \(\varvec{x}\) for which \(J(\varvec{x}) \leqslant \gamma \). The generic CE method involves a two-step procedure that alternates solving (2) for a candidate \(g^*\) with adaptively updating \(\gamma \). In practice, problem (2) is solved approximately via a Monte Carlo adaptation, i.e., by taking sample averages as estimators for \(\mathbb {E}_{\varvec{u}}\). The output of the CE method is a sequence of putative optimal densities identified by their parameters \(\hat{\varvec{v}}_0, \hat{\varvec{v}}_1, \ldots , \hat{\varvec{v}}^*\), and performance scores \(\hat{\gamma }_0, \hat{\gamma }_1, \ldots , \hat{\gamma }^*\), which improve with probability 1. For our problem, a key benefit of the CE method is that an analytic solution for (2) can be found when \(\{f(\cdot ;\varvec{v}),\varvec{v}\in \mathcal {V}\}\) belongs to the exponential family of distributions. (More details can be found in [29].)
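As an illustration of the two-step CE scheme (update \(\gamma \) from the elite quantile, then solve (2) in closed form), the sketch below minimises a one-dimensional function with a Gaussian sampling family. It assumes a Gaussian family, a fixed elite fraction \(\rho \) and no smoothing; it is a generic textbook CE loop rather than SPICE itself, and all names are hypothetical.

```python
import random
import statistics


def ce_minimise(J, mu=0.0, sigma=10.0, K=200, rho=0.1, iters=50, seed=1):
    """Cross-entropy minimisation of a scalar function J with a Gaussian
    sampling family f(.; v), v = (mu, sigma). The Gaussian is an exponential
    family, so the CE program has a closed-form solution: the sample mean and
    standard deviation of the elite samples."""
    rng = random.Random(seed)
    n_elite = max(2, int(rho * K))          # elites = best rho-fraction of samples
    gamma = float("inf")
    for _ in range(iters):
        xs = sorted((rng.gauss(mu, sigma) for _ in range(K)), key=J)  # J_(1) <= ... <= J_(K)
        elite = xs[:n_elite]
        gamma = J(elite[-1])                # level gamma: worst elite score
        mu = statistics.fmean(elite)        # closed-form CE update of the mean
        sigma = statistics.stdev(elite) + 1e-12  # ... and of the spread (kept > 0)
    return mu, gamma


mu, gamma = ce_minimise(lambda x: (x - 3.0) ** 2)
print(mu, gamma)  # mu approaches the minimiser 3.0 and gamma approaches 0
```

The sequence of \(\gamma \) values plays the role of \(\hat{\gamma }_0, \hat{\gamma }_1, \ldots \) above, while (mu, sigma) plays the role of \(\hat{\varvec{v}}_0, \hat{\varvec{v}}_1, \ldots \).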
Cross-Entropy Method for the SSA. We denote by \(r_j\) the number of firings of the jth reaction channel, by \(\tau _i\) the time between the \((i-1)\)th and ith reaction, and by \(\tau _{r+1}\) the final time interval at the end of the simulation in which no reaction occurs. It can be shown that an exact SSA trajectory \(\varvec{z}=(\varvec{x}_0,\ldots ,\varvec{x}_r)\), where r is the total number of reaction events \(r=\sum _{j=1}^mr_j\), belongs to the exponential family of distributions [41], whose optimal CE parameter can be found analytically. Daigle et al. [8] showed that the solution of (2) for the SSA likelihood yields the following Monte Carlo estimate of the optimal CE parameter \(v_j^*\) (writing \(r_k\) for the total number of reaction events in the kth trajectory):
\[\hat{v}_j^* = \frac{\sum _{k=1}^{K} I_{\{J(\varvec{z}_k)\le \gamma \}}\, r_{jk}}{\sum _{k=1}^{K} I_{\{J(\varvec{z}_k)\le \gamma \}} \sum _{i=1}^{r_k+1} \alpha _j(\varvec{x}_{i,k})\, \tau _{ik}}, \qquad (3)\]
where K is the number of SSA trajectories in the Monte Carlo approximation of (2), \(\varvec{z}_k\) is the kth trajectory, \(r_{jk}\) and \(\tau _{ik}\) are as before but w.r.t. the kth trajectory, \(\varvec{x}_{i,k}\) denotes the state after the \((i-1)\)th reaction in the kth trajectory, and the fraction is defined only when the denominator is non-zero (i.e., there is at least one trajectory \(\varvec{z}_k\) for which \(J(\varvec{z}_k) \le \gamma \), a so-called elite sample). Note that for \(\gamma =0\) the CE estimator (3) coincides with the maximum likelihood estimator (MLE) for \(\theta _j\) over the same trajectory. Following [7] and [26, Sect. 5.3.4], it is easy to show that a Monte Carlo estimator of the covariance matrix of the optimal parameter estimators (3) is given (written in operator style) by the matrix:
where E is the set of elite samples, \(K_E=|E|\), the operator \(\frac{\partial ^2}{\partial \theta ^2}\) returns an \(m\,\mathord {\times }\,m\) matrix, \(\frac{\partial }{\partial \theta }\) returns an m-dimensional vector (an \(m\,\mathord {\times }\,1\) matrix), and \(\frac{\partial }{\partial \theta }^\text {T}\) denotes its transpose. Parameter variance estimates can be readily derived from Eq. (4). However, a more numerically stable option is to approximate the variance of the jth parameter estimator using the sample variance
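The sketch below illustrates estimator (3) under stated assumptions: a plain Gillespie direct method records, for each trajectory, the firing counts \(r_{jk}\) and the sums \(\sum _i \alpha _j(\varvec{x}_{i,k})\tau _{ik}\) (including the final reaction-free interval), and the estimate is the ratio of their elite totals. The birth-death model, the score used to rank trajectories and all function names are hypothetical; with \(\gamma = \infty \) every trajectory is elite and the ratio reduces to a pooled maximum likelihood estimate.

```python
import math
import random


def ssa_stats(x0, theta, alphas, stoich, t_end, rng):
    """Gillespie direct method on [0, t_end]. Besides the final state, it returns
    the sufficient statistics used by estimator (3): the firing counts r_j and
    the sums over intervals of alpha_j(x_i) * tau_i, including the final
    interval tau_{r+1} in which no reaction fires."""
    x, t, m = list(x0), 0.0, len(theta)
    r, integ = [0] * m, [0.0] * m
    while True:
        a = [f(x) for f in alphas]                     # alpha_j(x)
        h = [th * aj for th, aj in zip(theta, a)]      # propensities theta_j * alpha_j(x)
        h0 = sum(h)
        tau = rng.expovariate(h0) if h0 > 0 else math.inf
        if t + tau >= t_end:                           # final interval: no reaction fires
            for j in range(m):
                integ[j] += a[j] * (t_end - t)
            return x, r, integ
        for j in range(m):
            integ[j] += a[j] * tau
        t += tau
        u, j = rng.random() * h0, 0                    # pick reaction j w.p. h_j / h0
        while j < m - 1 and u >= h[j]:
            u -= h[j]
            j += 1
        r[j] += 1
        x = [xi + d for xi, d in zip(x, stoich[j])]


def ce_estimate(samples, gamma):
    """Estimator (3): elite-summed firing counts over elite-summed alpha*tau sums.
    samples holds (score J(z_k), r_k, integ_k) triples; elites have J <= gamma."""
    elite = [(r, g) for J, r, g in samples if J <= gamma]
    if not elite:
        raise ValueError("no elite samples: estimator (3) is undefined")
    m = len(elite[0][0])
    return [sum(r[j] for r, _ in elite) / sum(g[j] for _, g in elite) for j in range(m)]


# Hypothetical birth-death model: R1: 0 -> X (alpha_1 = 1), R2: X -> 0 (alpha_2 = x)
rng = random.Random(0)
alphas = [lambda x: 1.0, lambda x: x[0]]
stoich = [(1,), (-1,)]
samples = []
for _ in range(20):
    x_end, r, g = ssa_stats([0], [10.0, 0.1], alphas, stoich, 200.0, rng)
    samples.append((abs(x_end[0] - 100), r, g))   # illustrative score only
print(ce_estimate(samples, gamma=math.inf))       # all samples elite: close to [10.0, 0.1]
```

In an actual CE iteration, \(\gamma \) would instead be the elite level, so only the best-scoring trajectories contribute to both sums.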
3 Methods
In this section, we present our stochastic rate parameter inference with cross-entropy (SPICE) algorithm.
Overview. To efficiently sample the parameter space, we treat each stochastic rate parameter as being log-normally distributed, i.e., \(\theta _j \sim \text {Lognormal}(\omega _j,\text {var}(\omega _j))\), where \(\omega _j = \log (\theta _j)\) is the log-transformed parameter, and \(\omega _j\) and \(\text {var}(\omega _j)\) are calculated analogously to (3) and (4), respectively. For the initial iteration, we sample the parameter vector \(\varvec{\theta }\) from the (log-transformed) desired parameter search space \([\varvec{\theta }_{\textsc {min}}^{(0)}, \varvec{\theta }_{\textsc {max}}^{(0)}]\) using a Sobol low-discrepancy sequence [33] to ensure adequate coverage. Subsequent iterations then generate a sequence of distribution parameters \(\{(\gamma _n,\varvec{\theta }_n,\varvec{\varSigma }_n)\}\) that aim to converge to the optimal parameters as follows:

1. Updating of \(\gamma _n\): Generate K sample trajectories \(\varvec{z}_1,\ldots ,\varvec{z}_K\) using the SSA, from the model \(f(\cdot ;\varvec{\theta }^{(n-1)})\) with \(\varvec{\theta }^{(n-1)}\) sampled from the lognormal distribution, and sort them in order of their performances \(J_{(1)} \le \cdots \le J_{(K)}\) (see Eqs. (6) and (7) for the actual definition of the performance, or score, function we adopt). For a fixed small \(\rho \), say \(\rho =10^{-2}\), let \(\hat{\gamma }_n\) be the \(\rho \)-quantile of \(J(\varvec{z})\), i.e., \( \hat{\gamma }_n=J_{(\lceil \rho K\rceil )}\).

2. Updating of \(\varvec{\theta }_n\): Using the estimated level \(\hat{\gamma }_n\), use the same K sample trajectories \(\varvec{z}_1,\ldots ,\varvec{z}_K\) to derive \(\hat{\varvec{\theta }}_{n}\) and \(\hat{\varvec{\sigma }}^2_n\) from the solution of Eqs. (3) and (4). In our implementation, in case of numerical issues (or undersampling), we switch to (5) for updating the variance.
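The sampling and quantile parts of step 1 can be sketched as follows, assuming one parameter vector is drawn per trajectory and that a user-supplied callback (here the hypothetical `simulate_and_score`) runs the SSA and returns the score \(J(\varvec{z})\). Python’s `random.lognormvariate(mu, sigma)` draws \(\exp (\mathcal {N}(\mu ,\sigma ^2))\), matching the lognormal parametrisation used above; the Sobol initialisation and the update of step 2 are not shown.

```python
import math
import random


def sample_theta(omega, var, rng):
    """Draw theta_j ~ Lognormal(omega_j, var_j): a normal draw on the log scale,
    exponentiated, so every sampled rate is strictly positive."""
    return [rng.lognormvariate(w, math.sqrt(v)) for w, v in zip(omega, var)]


def gamma_level(scores, rho):
    """gamma_n = J_(ceil(rho * K)): the ceil(rho*K)-th smallest of the K scores."""
    s = sorted(scores)
    return s[max(1, math.ceil(rho * len(s))) - 1]   # order statistics are 1-based


def step_one(omega, var, simulate_and_score, K, rho, rng):
    """Step 1 of the iteration: sample K parameter vectors, simulate and score
    one trajectory for each (simulate_and_score is a placeholder callback),
    and return the level gamma_n together with the elite (score, theta) pairs."""
    draws = []
    for _ in range(K):
        theta = sample_theta(omega, var, rng)
        draws.append((simulate_and_score(theta), theta))
    gamma = gamma_level([J for J, _ in draws], rho)
    return gamma, [d for d in draws if d[0] <= gamma]


# Toy usage: the "score" is how far log(theta_1) is from log(2)
gamma, elite = step_one([0.0], [1.0], lambda th: abs(math.log(th[0] / 2.0)),
                        K=500, rho=0.02, rng=random.Random(3))
print(gamma, len(elite))
```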
The SPICE algorithm’s pseudocode is shown in Algorithm 1. This two-step approach provides a simple iterative scheme that converges asymptotically to the optimal density. A reasonable termination criterion is to stop when \(\hat{\gamma }_n\) has not decreased for a fixed number of consecutive iterations, i.e., \(\hat{\gamma }_n \nless \hat{\gamma }_{n-1} \nless \ldots \). In general, more samples are required as the mean and variance of the estimates approach their optima.
Adaptive Sampling. We adaptively update the number of samples \(K_n\) taken at each iteration. The aim is to ensure that the parameter estimates improve with statistical significance at each step. Thus, our method allows the algorithm to make faster evaluations early on in the iterative process, and to concentrate simulation time on later iterations, where it becomes increasingly hard to distinguish significant improvements of the estimated parameters. We update our parameters based on a fixed number of elite samples, \(K_{{E}}\), satisfying \(J(\varvec{z})\le \gamma \). The performance of the ‘best’ elite sample is denoted \(J_n^*\), while the performance of the ‘worst’ elite sample (previously given by the \(\rho \)-quantile of \(J(\varvec{z})\)) is \(\hat{\gamma }_n\). The quantile parameter \(\rho \) is adaptively updated each iteration as \(\rho _n=K_{{E}}/K_n\), where \(K_{E}\) is typically taken to be 1–10% of the base number of samples \(K_0\). At each iteration, we check for an improvement in either the best or the worst performing elite sample, i.e., if \( J^*_n < J^*_{n-1}\) or \( \hat{\gamma }_n < \hat{\gamma }_{n-1}\), then we update our parameters and proceed to the next iteration. If neither value improves, the number of samples \(K_n\) in the current iteration is increased in increments, up to a maximum \(K_{\text {max}}\). If we hit the maximum number of samples \(K_{\text {max}}\) for c iterations (e.g., \(c=3\)), then this suggests that no further significant improvement can be made given the restriction on the number of samples.
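A minimal sketch of the adaptive sample-size rule follows. The increment size `K_inc`, the reuse of already simulated scores when \(K_n\) grows, and the callback `draw_scores` are assumptions made for illustration; the text does not fix them.

```python
import random


def adaptive_iteration(draw_scores, K_base, K_E, K_inc, K_max, best_prev, gamma_prev):
    """Grow the sample size K_n until the best elite score J*_n or the worst elite
    score gamma_n beats the previous iteration, or K_max is reached.
    draw_scores(k) is a placeholder returning k new trajectory scores; keeping the
    earlier scores when topping up is an assumption of this sketch."""
    scores, K = [], 0
    target = K_base
    while True:
        scores.extend(draw_scores(target - K))   # simulate only the extra samples
        K = target
        elite = sorted(scores)[:K_E]             # fixed number K_E of elite samples
        best, gamma = elite[0], elite[-1]        # J*_n and gamma_n
        rho = K_E / K                            # adaptive quantile rho_n = K_E / K_n
        improved = best < best_prev or gamma < gamma_prev
        if improved or K >= K_max:
            return {"K": K, "rho": rho, "best": best, "gamma": gamma, "improved": improved}
        target = min(K + K_inc, K_max)           # increase K_n by a fixed increment


rng = random.Random(0)
out = adaptive_iteration(lambda k: [rng.random() for _ in range(k)],
                         K_base=100, K_E=10, K_inc=100, K_max=1000,
                         best_prev=0.5, gamma_prev=0.5)
print(out["K"], out["rho"], out["improved"])
```

An outer loop (not shown) would count the iterations that end at \(K_{\text {max}}\) without improvement and stop after c of them.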
Objective Function. The SPICE algorithm has been developed to handle an arbitrary number of datasets. Given N time series datasets, SPICE associates N objective function scores with each simulated trajectory. Each objective value is the standard sum of \(L^2\) distances between the trajectory and the respective dataset across all its time points:
\[J_n(\varvec{z}) = \sum _{t} \Vert \varvec{x}_t - \varvec{y}_{n,t}\Vert _2, \qquad n \in \{1,\ldots ,N\}, \qquad (6)\]
where \(\varvec{x}_t=\varvec{x}(t)\) and \(\varvec{y}_{n,t}\) is the datapoint at time t in the nth dataset. To ensure adequate coverage of the data, we choose our elite samples to be the best performing quantile of trajectories for each individual dataset (with scores \(J_n\)).
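The per-dataset scoring and elite selection described above might look as follows. This sketch assumes that trajectories and datasets are stored as dictionaries from time point to species counts, and it takes the union of the per-dataset elites, which is one plausible reading of the text rather than a confirmed detail of SPICE.

```python
import math


def l2_scores(traj, datasets):
    """One score per dataset: J_n = sum over the dataset's time points t of the
    Euclidean distance ||x_t - y_{n,t}||. traj and every dataset are dicts
    mapping a time point to a tuple of species counts."""
    return [sum(math.dist(traj[t], y[t]) for t in y) for y in datasets]


def elites_per_dataset(score_table, n_elite):
    """score_table[k][n] is J_n of trajectory k. Take the n_elite best trajectories
    for each dataset separately and return the union of their indices, so that
    every dataset contributes elites (the union is an assumption of this sketch)."""
    chosen = set()
    for n in range(len(score_table[0])):
        ranked = sorted(range(len(score_table)), key=lambda k: score_table[k][n])
        chosen.update(ranked[:n_elite])
    return sorted(chosen)


traj = {0: (0, 0), 1: (3, 4)}
print(l2_scores(traj, [{0: (0, 0), 1: (0, 0)}, {0: (1, 0), 1: (3, 4)}]))  # [5.0, 1.0]
```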
In the absence of temporal correlation within the data (e.g., when measurements between time points are independent, or when individual cells cannot be tracked, as in flow cytometry data), we instead construct an empirical Gaussian mixture model for each time point within the data. Each mixture model at time t comprises N multivariate normal distributions, each with a vector of mean values \(\varvec{y}_{n,t}\) corresponding to the observed species in the nth dataset, and a diagonal covariance matrix \(\varvec{\sigma }^2_n\) corresponding to an error estimate or variance of the measurements on the species. In our experiments we used a 10% standard deviation, as we did not have any information about measurement noise. We then take the objective score function to be proportional to the negative log-likelihood of the simulated trajectory w.r.t. the data:
\[J(\varvec{z}) \propto -\sum _{t} \ln \left( \frac{1}{N}\sum _{n=1}^{N} \mathcal {N}\!\left( \varvec{x}_t;\, \varvec{y}_{n,t},\, \varvec{\sigma }^2_n\right) \right) . \qquad (7)\]
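The mixture-based score can be sketched as below, assuming equal mixture weights \(1/N\), datasets that share the same time points, a per-species standard deviation of 10% of the observed value (as in the text) and a hypothetical floor of one molecule so that zero observations do not produce zero variances. A log-sum-exp is used so that poorly matching trajectories do not underflow to \(\ln 0\).

```python
import math


def gmm_nll(traj, datasets, rel_sd=0.1, min_sd=1.0):
    """Negative log-likelihood of a simulated trajectory under an equally weighted
    mixture of N diagonal Gaussians per time point, centred on the datapoints
    y_{n,t}. rel_sd=0.1 mirrors the 10% standard deviation in the text; the
    min_sd floor (one molecule) is an assumption that avoids zero variances."""
    N, total = len(datasets), 0.0
    for t in datasets[0]:                  # assumes all datasets share time points
        log_comp = []
        for y in datasets:
            lp = -math.log(N)              # mixture weight 1/N
            for xi, yi in zip(traj[t], y[t]):
                sd = max(rel_sd * abs(yi), min_sd)
                lp += -0.5 * ((xi - yi) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
            log_comp.append(lp)
        top = max(log_comp)                # log-sum-exp for numerical stability
        total -= top + math.log(sum(math.exp(v - top) for v in log_comp))
    return total


data = [{0: (100.0,), 1: (200.0,)}, {0: (110.0,), 1: (190.0,)}]
near = {0: (105.0,), 1: (195.0,)}
far = {0: (300.0,), 1: (10.0,)}
print(gmm_nll(near, data) < gmm_nll(far, data))   # True: closer trajectories score lower
```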
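Smoothed Updates. We implement the parameter smoothing update formula
\[\hat{\varvec{\theta }}_n = \lambda \tilde{\varvec{\theta }} + (1-\lambda )\,\hat{\varvec{\theta }}_{n-1}, \qquad \hat{\varvec{\sigma }}_n = \beta _n \tilde{\varvec{\sigma }} + (1-\beta _n)\,\hat{\varvec{\sigma }}_{n-1}, \qquad (8)\]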
where \( \beta _n = \beta - \beta \left( 1-\frac{1}{n}\right) ^q\); \(\lambda \,\mathord {\in }\, (0,1]\), \(q\,\mathord {\in }\,\mathbb {N}^+\) and \(\beta \,\mathord {\in }\,(0,1)\) are smoothing constants; and \(\tilde{\varvec{\theta }},\tilde{\varvec{\sigma }}\) are outputs from the solution of the cross-entropy problem (2), approximated by (3) and (4), respectively. Parameter smoothing between iterations has three important benefits: (i) the parameter estimates converge to a more stable value, (ii) it reduces the probability of a parameter value tending towards zero within the first few iterations, and (iii) it prevents the sampling distribution from converging too quickly to a degenerate point mass at a local minimum. Furthermore, [6] provides a proof that the CE method converges to an optimal solution with probability 1 in the case of smoothed updates.
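A small sketch of the smoothed update follows, assuming the common CE form \(\hat{\varvec{\theta }}_n = \lambda \tilde{\varvec{\theta }} + (1-\lambda )\hat{\varvec{\theta }}_{n-1}\) for the means and \(\hat{\varvec{\sigma }}_n = \beta _n \tilde{\varvec{\sigma }} + (1-\beta _n)\hat{\varvec{\sigma }}_{n-1}\) for the spreads, with default constants taken from the experimental settings \((\lambda ,\beta ,q)=(0.7,0.8,5)\). Since \(\beta _1=\beta \) and \(\beta _n\to 0\) as n grows, later iterations change the spread only slowly, which guards against premature collapse to a point mass.

```python
def beta_n(n, beta=0.8, q=5):
    """Dynamic smoothing weight beta_n = beta - beta * (1 - 1/n)**q, for n >= 1."""
    return beta - beta * (1.0 - 1.0 / n) ** q


def smooth(prev_theta, prev_sigma, new_theta, new_sigma, n, lam=0.7, beta=0.8, q=5):
    """Blend the new CE solutions (theta~, sigma~) with the previous iterate:
    constant weight lam for the means, decaying weight beta_n for the spreads."""
    b = beta_n(n, beta, q)
    theta = [lam * t + (1 - lam) * p for t, p in zip(new_theta, prev_theta)]
    sigma = [b * s + (1 - b) * p for s, p in zip(new_sigma, prev_sigma)]
    return theta, sigma


for n in (1, 2, 10, 100):
    print(n, round(beta_n(n), 4))   # the variance weight shrinks as n grows
```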
Multiple Shooting and Particle Splitting. SPICE can optionally utilise these two techniques for trajectory simulation between time intervals. For multiple shooting we construct a sample trajectory comprising T intervals matching the time stamps within the data \(\varvec{y}\). Originally [42], each segment from \(\varvec{x}_{t-1}\) to \(\varvec{x}_t\) was simulated using an ODE model with the initial conditions set to the previous time point of the dataset, i.e., \(\varvec{x}_{t-1} = \varvec{y}_{t-1}\). We instead treat the data as mixture-normally distributed, and thus sample our initial conditions \(\varvec{x}_{t-1}\sim \mathcal {N}(\varvec{y}_{n,t-1},\varvec{\sigma }^2_{n,t-1})\), where the index n of the time series is first sampled uniformly. Using the SSA, each piecewise section of a trajectory belonging to sample k is then simulated with the same parameter vector \(\varvec{\theta }\). For particle splitting we adopt a multilevel splitting approach as in [8], and the objective function is calculated after the simulation of each segment from \(\varvec{x}_{t-1}\) to \(\varvec{x}_t\). The trajectories \(\varvec{z}_k\) satisfying \(J(\varvec{z}_k)\le \hat{\gamma }\) are then resampled with replacement \(K_n\) times before simulation continues (recall that \(K_n\) is the number of samples in the nth iteration). This process aims to discard poorly performing trajectories in favour of those ‘closest’ to the data. This in turn creates an enriched sample, at the cost of introducing some bias propagation.
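The two optional techniques reduce to two small sampling steps, sketched below under assumptions: the segment initial state is drawn from the normal centred on a uniformly chosen dataset, with a standard deviation of `rel_sd` times the observed value, and then rounded and clipped to non-negative molecule counts (the rounding and clipping are our additions); splitting resamples the surviving trajectories with replacement via `random.choices`. The function names are hypothetical.

```python
import random


def shooting_initial_state(datasets, t_prev, rel_sd, rng):
    """Multiple shooting: choose a dataset index n uniformly at random, then draw
    the segment's initial state from N(y_{n,t-1}, sigma^2_{n,t-1}) with
    sigma = rel_sd * |y| per species (an assumption). Draws are rounded and
    clipped so they remain valid molecule counts."""
    y = rng.choice(datasets)[t_prev]
    return [max(0, round(rng.gauss(yi, rel_sd * abs(yi)))) for yi in y]


def split_particles(particles, scores, gamma, K_n, rng):
    """Particle splitting: keep the trajectories with J(z_k) <= gamma and resample
    them with replacement K_n times before simulating the next segment."""
    survivors = [p for p, J in zip(particles, scores) if J <= gamma]
    if not survivors:
        raise ValueError("no trajectory reached the level gamma")
    return rng.choices(survivors, k=K_n)


rng = random.Random(0)
print(shooting_initial_state([{0: (100, 50)}, {0: (120, 40)}], 0, 0.1, rng))
print(split_particles(["z1", "z2", "z3"], [1.0, 5.0, 2.0], gamma=2.0, K_n=6, rng=rng))
```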
Hyperparameters. SPICE allows for the inclusion of hyperparameters \(\varvec{\phi }\) (e.g., scaling constants and non-kinetic-rate parameters), which are sampled (logarithmically) alongside \(\varvec{\theta }\). These hyperparameters are updated at each iteration via the standard CE method.
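For a single log-sampled hyperparameter, a standard CE update amounts to refitting the mean and variance of \(\log \phi \) on the elite samples. The sketch below assumes this log-normal parametrisation (mirroring the treatment of \(\varvec{\theta }\)) and uses hypothetical elite values; it is not taken from the SPICE implementation.

```python
import math
import random
import statistics


def update_hyper(elite_phi):
    """Standard CE update for one log-sampled hyperparameter: mean and sample
    variance of log(phi) over the elite samples (needs at least two values)."""
    logs = [math.log(p) for p in elite_phi]
    return statistics.fmean(logs), statistics.variance(logs)


def sample_hyper(mu, var, rng):
    """Next-iteration draw of the hyperparameter on the log scale."""
    return rng.lognormvariate(mu, math.sqrt(var))


# Hypothetical elite values of a scale factor (e.g. fluorescence a.u. per molecule)
mu, var = update_hyper([120.0, 150.0, 180.0, 135.0])
print(math.exp(mu), var, sample_hyper(mu, var, random.Random(0)))
```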
Tau-Leaping. With inexact, faster methods such as tau-leaping [15], a degree of accuracy is traded off in favour of computational performance. Thus, we are interested in replacing the SSA with tau-leaping in our SPICE algorithm. The next proposition shows that with a tau-leaping trajectory we get the same form for the optimal CE estimator as in (3).
Proposition 1
The CE solution for the optimal rate parameter over a tau-leaping trajectory is the same as that for a standard SSA trajectory.
Proof
We shall use the same notation as in Sect. 2 and further assume a trajectory in which state changes occur at times \(t_l\), for \( l\, \mathord {\in }\, \{0,1,\ldots ,L\}\). For each time interval of size \(\tau _l\) of the tau-leaping algorithm, the number \(k_{jl} \in \mathbb {Z}_{\ge 0}\) of firings of each reaction channel \(\mathcal {R}_j\) is sampled from a Poisson distribution with mean \(\lambda _{jl} = \theta _j \alpha _j(\varvec{x}_{t_l})\tau _l\). Thus, the probability of firing \(k_{jl}\) reactions in the interval \([t_l, t_l+\tau _l)\), given the initial state \(\varvec{x}_{t_l}\), is \(P(k_{jl}\mid \varvec{x}_{t_l}, \lambda _{jl}) = \exp \{-\lambda _{jl}\}(\lambda _{jl})^{k_{jl}}/{k_{jl}!}\), where \(P(0\mid \varvec{x}_{t_l}, 0) = 1\). Therefore, the combined probability across all reaction channels is:
\[P(k_{1l},\ldots ,k_{ml}\mid \varvec{x}_{t_l}) = \prod _{j=1}^{m} \frac{\exp \{-\lambda _{jl}\}(\lambda _{jl})^{k_{jl}}}{k_{jl}!}.\]
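Extending this to the entire trajectory, the complete likelihood is given by:
\[\mathcal {L} = \prod _{l=0}^{L}\prod _{j=1}^{m} \frac{\exp \{-\lambda _{jl}\}(\lambda _{jl})^{k_{jl}}}{k_{jl}!}.\]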
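We can conveniently factorise the likelihood into component likelihoods associated with each reaction channel as \(\mathcal {L} = \prod _{j=1}^m \mathcal {L}_j \), where each component \(\mathcal {L}_j\) is given by \( \mathcal {L}_j = \prod _{l=0}^L \frac{\exp \{-\lambda _{jl}\}(\lambda _{jl})^{k_{jl}}}{k_{jl}!}\). Expanding \(\lambda _{jl}\):
\[\mathcal {L}_j = \prod _{l=0}^{L} \frac{\exp \{-\theta _j\alpha _j(\varvec{x}_{t_l})\tau _l\}\,\big (\theta _j\alpha _j(\varvec{x}_{t_l})\tau _l\big )^{k_{jl}}}{k_{jl}!} = \theta _j^{r_j}\exp \Big \{-\theta _j\sum _{l=0}^{L}\alpha _j(\varvec{x}_{t_l})\tau _l\Big \}\prod _{l=0}^{L}\frac{\big (\alpha _j(\varvec{x}_{t_l})\tau _l\big )^{k_{jl}}}{k_{jl}!},\]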
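where \(r_j=\sum _{l=0}^L k_{jl}\), i.e., the total number of firings of reaction channel \(\mathcal {R}_j\). From [29], writing \(\varvec{Z}\) for a random tau-leaping trajectory, the solution to (2) can be found by solving:
\[\mathbb {E}_{\varvec{u}}\left[ I_{\{J(\varvec{Z})\le \gamma \}}\, \nabla \ln \mathcal {L}_j\right] = 0,\]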
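given that the differentiation and expectation operators can be interchanged. Expanding \(\ln \mathcal {L}_j\) and simplifying, we get:
\[\mathbb {E}_{\varvec{u}}\left[ I_{\{J(\varvec{Z})\le \gamma \}}\, \nabla \Big ( r_j\ln \theta _j - \theta _j\sum _{l=0}^{L}\alpha _j(\varvec{x}_{t_l})\tau _l + \sum _{l=0}^{L}\big [k_{jl}\ln \big (\alpha _j(\varvec{x}_{t_l})\tau _l\big ) - \ln k_{jl}!\big ]\Big )\right] = 0.\]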
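We can then take the derivative, \(\nabla \), with respect to \(\theta _j\):
\[\mathbb {E}_{\varvec{u}}\left[ I_{\{J(\varvec{Z})\le \gamma \}} \Big ( \frac{r_j}{\theta _j} - \sum _{l=0}^{L}\alpha _j(\varvec{x}_{t_l})\tau _l\Big )\right] = 0.\]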
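Solving for \(\theta _j\), the previous equation holds when \(\theta _j\,\mathbb {E}_{\varvec{u}}\big [I_{\{J(\varvec{Z})\le \gamma \}}\sum _{l=0}^L \alpha _j(\varvec{x}_{t_l})\tau _l\big ] = \mathbb {E}_{\varvec{u}}\big [I_{\{J(\varvec{Z})\le \gamma \}}\, r_j\big ]\); replacing the expectations with sample averages over K tau-leaping trajectories yields the Monte Carlo estimate
\[\hat{\theta }_j = \frac{\sum _{k=1}^{K} I_{\{J(\varvec{z}_k)\le \gamma \}}\, r_{jk}}{\sum _{k=1}^{K} I_{\{J(\varvec{z}_k)\le \gamma \}} \sum _{l=0}^{L_k} \alpha _j(\varvec{x}_{t_l,k})\,\tau _{lk}},\]
where \(L_k\), \(\varvec{x}_{t_l,k}\) and \(\tau _{lk}\) are as before but w.r.t. the kth trajectory \(\varvec{z}_k\). This has the same form as (3).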
\(\square \)
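The proposition means that only the trajectory generator changes, not the estimator. The following sketch assumes a fixed leap size \(\tau \) (the experiments use optimised tau-leaping with \(\varepsilon =0.1\), which chooses \(\tau \) adaptively and is not reproduced here); it accumulates \(r_j\) and \(\sum _l \alpha _j(\varvec{x}_{t_l})\tau _l\) over tau-leaping trajectories and feeds them to the same ratio as (3). Python’s standard library has no Poisson sampler, so a small exact one is included; the birth-death model and all function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Exact Poisson(lam) draw via Knuth's multiplication method, applied to
    chunks of mean <= 30 and summed (Poisson(a) + Poisson(b) ~ Poisson(a + b))
    so that exp(-chunk) never underflows."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit, k, p = math.exp(-chunk), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total


def tau_leap_stats(x0, theta, alphas, stoich, tau, t_end, rng):
    """Fixed-step tau-leaping: in [t_l, t_l + tau_l), reaction j fires
    k_jl ~ Poisson(theta_j * alpha_j(x_{t_l}) * tau_l) times. Returns the final
    state, r_j = sum_l k_jl and sum_l alpha_j(x_{t_l}) * tau_l."""
    x, t, m = list(x0), 0.0, len(theta)
    r, integ = [0] * m, [0.0] * m
    while t < t_end:
        dt = min(tau, t_end - t)
        a = [f(x) for f in alphas]                 # alpha_j at the start of the leap
        for j in range(m):
            k = poisson(theta[j] * a[j] * dt, rng)
            r[j] += k
            integ[j] += a[j] * dt
            x = [xi + k * d for xi, d in zip(x, stoich[j])]
        x = [max(0, xi) for xi in x]               # crude guard: clip negative counts
        t += dt
    return x, r, integ


def ce_estimate(samples, gamma):
    """Same ratio as (3): elite-summed r_j over elite-summed alpha_j * tau sums."""
    elite = [(r, g) for J, r, g in samples if J <= gamma]
    m = len(elite[0][0])
    return [sum(r[j] for r, _ in elite) / sum(g[j] for _, g in elite) for j in range(m)]


# Hypothetical birth-death model: 0 -> X (alpha = 1), X -> 0 (alpha = x)
rng = random.Random(1)
alphas = [lambda x: 1.0, lambda x: x[0]]
stoich = [(1,), (-1,)]
samples = []
for _ in range(10):
    x_end, r, g = tau_leap_stats([0], [10.0, 0.1], alphas, stoich, 0.1, 200.0, rng)
    samples.append((abs(x_end[0] - 100), r, g))    # illustrative score only
print(ce_estimate(samples, gamma=math.inf))        # close to [10.0, 0.1]
```

Clipping negative counts is a simplification of this sketch; practical tau-leaping schemes control the leap size to keep such events rare.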
4 Experiments
We apply our SPICE algorithm to four commonly investigated systems: (i) the Lotka-Volterra predator–prey model, (ii) a Yeast Polarization model, (iii) the bistable Schlögl system, and (iv) the Genetic Toggle Switch. We present results for each system obtained using both the standard SSA and optimised tau-leaping (with an error control parameter of \(\varepsilon =0.1\)) to drive our simulations.
For each run of the algorithm we set the sample parameters \(K_{E}=10\), \(K_{\text {min}}=1,000\), \(K_{\text {max}}=20,000\), and set an upper limit of 250 on the number of iterations. The smoothing parameters \((\lambda , \beta , q)\) were set to (0.7, 0.8, 5), respectively. For our analysis, we define the mean relative error (MRE) between a parameter estimate \(\hat{\varvec{\theta }}\) and the truth \(\varvec{\theta }^*\) as \(\text {MRE}(\%_{\textsc {ERR}})=M^{-1}\sum _{j=1}^M|\hat{\theta }_j - \theta ^*_j| / \theta ^*_j \times 100 \), where M is the number of parameters. All our experiments were performed on an Intel Xeon 2.9 GHz Linux system without using multiple cores; all reported CPU times are single-core. SPICE has been implemented in Julia and is open source (https://github.com/pzuliani/SPICE).
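The MRE definition translates directly into code. The estimate below is a hypothetical example (not a result from Table 1), chosen so that the per-parameter relative errors against the Lotka-Volterra truth are 2%, 0% and 1%, giving an MRE of 1%.

```python
def mre_percent(estimate, truth):
    """MRE(%) = (1/M) * sum_j |theta^_j - theta*_j| / theta*_j * 100."""
    M = len(truth)
    return 100.0 * sum(abs(e - t) / t for e, t in zip(estimate, truth)) / M


truth = (0.5, 0.0025, 0.3)          # Lotka-Volterra true parameters from the text
estimate = (0.51, 0.0025, 0.297)    # hypothetical estimate: 2%, 0% and 1% relative errors
print(round(mre_percent(estimate, truth), 6))   # 1.0
```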
For models (i)–(iii), we use synthetic data where the true solution is known, and compare the results of SPICE against some commonly used parameter estimation techniques implemented in COPASI 4.16 [17]. Specifically, we check the performance of SPICE against the genetic algorithm (GA), evolution strategy (ES), evolutionary programming (EP), and particle swarm (PS) implementations. For the ES and EP algorithms we allow 250 generations with a population of 1,000 particles. For the GA, we run 500 generations with 2,000 particles. For the PS, we allow 1,000 iterations with 1,000 particles (see Footnote 1). For model (iv), the Genetic Toggle Switch, we show results for SPICE using real experimental data.
All statistics presented are based on 100 runs of each algorithm using fixed datasets. For each approach we also compared the performance of the standard SSA versus tau-leaping, alongside multiple-shooting and particle splitting approaches. However, for the models tested, neither multiple shooting nor particle splitting helped in reducing CPU times or improving the accuracy of the estimates.
Lotka-Volterra Predator–Prey Model. We implement the standard Lotka-Volterra model below with true parameters \((\theta _1,\theta _2,\theta _3) =\) (0.5, 0.0025, 0.3) and initial population \((X_1, X_2) =\) (50, 50).
We artificially generated 5 datasets, each consisting of 40 time points, using Gillespie’s SSA, and performed parameter estimation based on these datasets. For the initial iteration, we placed bounds on the Sobol sequence parameter search space of \(\theta _j \,\mathord {\in }\, [1\mathrm {e}{-6},10]\), for \(j = 1,2,3\). The minimum, maximum, and average MRE between the true parameters and their estimates across all 100 runs of each algorithm (using the standard SSA) are summarised in Table 1, together with the corresponding CPU run times. Box plots summarising the obtained parameter estimates across all runs of each method are displayed in Fig. 1.
In the previous Lotka-Volterra predator–prey example, SPICE was provided with the complete data for both species \(X_1,X_2\). However, we are also concerned with cases where the data is not fully observed, i.e., when we have latent species. To assess the effect of latent species on the quality of parameter estimates, we ran SPICE again (averaging across 100 runs), this time supplying information about species \(X_1\) alone. The results are presented in Table 1.
Yeast Polarization Model. We implement the Yeast Polarization model (see below) with true parameters \((\theta _1,\ldots ,\theta _8) =\) (0.38, 0.04, 0.082, 0.12, 0.021, 0.1, 0.005, 13.21) and initial population \((R,L,RL,G,G_a,G_{bg},G_d) =\) (500, 4, 110, 300, 2, 20, 90). The reactions of the model are [8]:
We artificially generated 5 datasets, each consisting of 17 time points, using Gillespie’s SSA, and performed parameter estimation based on these datasets. For the initial iteration, we placed bounds on the parameter search space of \(\theta _j \,\mathord {\in }\, [1\mathrm {e}{-6},10]\) for \(1\leqslant j\leqslant 7\), and \(\theta _8 \,\mathord {\in }\, [1\mathrm {e}{-6},100]\). The average relative errors between the estimated and the true parameters across 100 runs of the algorithm are summarised in Table 1, along with the corresponding CPU run times. The variability of the estimates obtained using SPICE (and other methods) is shown in Fig. 2.
Schlögl System. We use the Schlögl model [30] with parameters \((\theta _1,\theta _2,\theta _3,\theta _4) = (3\mathrm {e}{-7}, 1\mathrm {e}{-4}, 1\mathrm {e}{-3}, 3.5)\) and initial population \((X, A, B) = (250, 1\mathrm {e}{5}, 2\mathrm {e}{5})\). This model is well known to produce bistable dynamics (see Fig. 4).
We artificially generated 10 datasets (in order to partially capture the bistable dynamics), each consisting of 100 time points, and performed parameter estimation based on these datasets (see also Fig. 4). For the initial iteration, we placed bounds on the parameter search space of \(\theta _1\, \mathord {\in }\, [1\mathrm {e}{-9},1\mathrm {e}{-5}]\), \(\theta _2\, \mathord {\in }\, [1\mathrm {e}{-6},0.01]\), \(\theta _3\, \mathord {\in }\, [1\mathrm {e}{-5},10]\), \(\theta _4 \,\mathord {\in }\, [0.01,100]\). Unlike for the previous models, we ran all algorithms on the Schlögl system using tau-leaping, because the computation time under the same conditions was largely infeasible (4.5 h in SPICE, 48+ h in COPASI). The MRE of all the estimated parameters, together with CPU times for each algorithm, are summarised in Table 1. Box plots of the SPICE algorithm’s performance are presented in Fig. 3. Note that the Schlögl system is sensitive to its initial conditions and parameters: even slight perturbations of its parameters can cause the system to fail to produce bimodality.
Toggle Switch Model. The genetic toggle switch is a well-studied bistable system of particular importance to synthetic biology. The toggle switch comprises two repressors and two promoters, often mediated in practice through IPTG (isopropyl β-D-1-thiogalactopyranoside) and aTc (anhydrotetracycline) induction. We perform parameter inference based on real high-throughput data (see Fig. 5), using a simple model (see below) based on [12]. For our model, we define the following reaction propensities:
where GFP and mCherry are the two model species (reporter molecules), and (\(\theta _1,\ldots ,\theta _4\)) are the stochastic rate parameters. The data used for parameter inference was obtained through fluorescent flow cytometry in [21], via the GFP and mCherry reporters, and consists of 40,731 measurements across 7 time points over 6 h. We look specifically at the case where the switch starts in the low-GFP (high-mCherry) state and switches to the high-GFP (low-mCherry) state over the time course after aTc induction of the cells. The inclusion of real, noisy data requires a degree of additional care, as the data needs to be rescaled from arbitrary units (a.u.) to discrete molecular counts. We assume a linear (multiplicative) scale such that GFP (a.u.) \(= \phi _5\,\times \) GFP molecules. Furthermore, we can no longer assume that all the cells begin in the same state, and we must assume that the initial state follows a distribution. This introduces extra so-called ‘hyperparameters’, specifically the GFP molecule count to fluorescence (a.u.) scale factor \(\phi _5\) and the respective mCherry scale factor \(\phi _6\). In addition, the model now contains 4 additional parameters, \(\phi _1,\ldots ,\phi _4\), which also need to be estimated. Each hyperparameter is initially sampled as before using the low-discrepancy Sobol sequence, and updated using the means and variances of the generated elite samples as per the CE method.
The bounds on the initial kinetic parameter search space, based upon reported half-lives for the variants of GFP [2] and mCherry [31], were \(\theta _{1,3}\, \mathord {\in }\, [1\mathrm {e}{-3},1]\) and \(\theta _{2,4}\, \mathord {\in } \,[1,50]\). The respective bounds on the search space for the hyperparameters were \(\phi _{1,2,3,4}\, \mathord {\in }\, [1\mathrm {e}{-3},10]\) and \(\phi _{5,6}\, \mathord {\in }\, [50,500]\). To generate the parameter estimates, we used SPICE with tau-leaping (\(\varepsilon =0.1\), CPU time = 4,293 s). The estimated parameters and the resulting fit of the model against the data can be seen in Fig. 5.
5 Discussion
We can see from the presented results that our SPICE algorithm performs well on the models studied. For the Lotka-Volterra model the quality of the estimates is always good: there is no relative error larger than 2.1% in Table 1 for SPICE. The CPU times are reasonable in absolute terms (about 20 min, single core) and much smaller than those of the methods implemented in COPASI, which also produce larger errors. Also, having one unobserved species (\(X_2\)) in the data does not seem to impact the results very much. As expected, Table 1 shows that the latent model has a higher error than the fully observed model; however, the error is always smaller than 10%, which is acceptable.
The Yeast Polarization model is a more difficult system: we can indeed see from Table 1 that a number of parameter estimates have large relative errors. These are the same ‘hard’ parameters estimated by MCEM\(^2\) [8] with similar errors. However, in terms of CPU time, our SPICE algorithm does much better than MCEM\(^2\): SPICE can return a quite good estimate (in line with MCEM\(^2\)’s) in about 18 min on average using the direct method, while MCEM\(^2\) would need about 30 days [8], a speedup of 2,400 times. Furthermore, for this model one could use tau-leaping instead of the direct method, gaining a roughly 3x speedup while sacrificing little accuracy (the Min., Av., and Max. MRE \(\%_\text {ERR}\) were 31.2, 41.5, and 56.3, respectively; Av. CPU time was 303 s).
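For reference, the speedup figures for the Yeast Polarization model follow from the reported CPU times; since the 18 min figure is an average and the MCEM\(^2\) time is the estimate from [8], both ratios are approximate:
\[\frac{30\ \text {days}}{18\ \text {min}} = \frac{30 \times 24 \times 60\ \text {min}}{18\ \text {min}} = \frac{43{,}200}{18} = 2{,}400, \qquad \frac{18 \times 60\ \text {s}}{303\ \text {s}} = \frac{1{,}080}{303} \approx 3.6.\]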
The Schlögl system is another challenging case study, as clearly shown by the results in Table 1, which were obtained using tau-leaping (in fact, for the Schlögl model the average accuracy of SPICE increases with the use of tau-leaping). Our choice was motivated by the large CPU time of the direct method: the upper steady state of X in the model has a large molecule number (about 600), which increases the running time of the direct method samples. The results of Table 1 show that there is no clear winner: the Evolutionary Programming method in COPASI has the smallest runtime, but twice the error achieved by SPICE, which has the best accuracy. As noted before, running the COPASI implementations with larger populations and more iterations did not significantly improve accuracy for the increased cost.
Lastly, the genetic Toggle Switch presents an interesting real-world case study with high-throughput data. The model now comprises six hyperparameters (\(\phi _1,\ldots ,\phi _6\)), each of which must be estimated alongside the four kinetic rate constants. In addition, the non-discrete (and noisy) data is no longer known to be generated from a convenient mathematical model. In other words, there is no guarantee that the model reflects the true underlying biochemical reaction network. Despite these challenges, our SPICE algorithm does a very good job (in little more than an hour of CPU time) of computing parameter estimates for which the model closely matches the experimental data: Fig. 5 shows that the model simulations fall inside the data, with very few exceptions, and that the empirical and simulated distributions closely match.
Related Work. Techniques for stochastic rate parameter estimation fall into four categories. Early efforts included methods based on maximum likelihood estimation (MLE): simulated maximum likelihood uses Monte Carlo simulation and a genetic algorithm to maximise an approximated likelihood [34]. Efforts have also been made to combine the Expectation-Maximisation (EM) algorithm with the SSA [18]. Stochastic gradient descent has been combined with a Markov chain Monte Carlo sampler using a Metropolis-Hastings update step [39]. In [25] a hidden Markov model is used for the system state, and the parameters are then estimated by (approximate) likelihood maximisation. Lastly, a recent work [8] has combined an ascent-based EM algorithm with a modified cross-entropy method. A second category of methodologies comprises Bayesian inference. In particular, approximate Bayesian computation (ABC) has the advantage of being ‘likelihood-free’, and recent advances in sequential Monte Carlo (SMC) samplers have further improved these methods [32, 35]. We note the similarities between ABC(SMC) approaches and SPICE: both can use ‘elite’ samples to produce better parameter estimates. A key difference is that ABC(SMC) uses the accepted simulation parameters to construct a posterior distribution, whereas SPICE uses complete trajectory information to compute optimal updates of an underlying parameter distribution. The Bayesian approach presented in [5] can handle partially observed systems, including notions of experimental error. Linear noise approximation techniques have been used alongside Bayesian analysis [19]. A very recent work [36] combines Bayesian analysis with statistical emulation in an attempt to reduce the cost of the SSA simulations. A third class of methodologies centres on the numerical solution of the chemical master equation (CME), which is often intractable for all but the simplest systems.
One approach is to use dynamic state space truncation [3] or finite state projection methods [9], which truncate the CME state space by ignoring the states with the smallest probability. Another is to use a method of moments approximation [10, 16] to construct ordinary differential equations (ODEs) describing the time evolution of the mean, variance, etc., of the underlying distribution. Other CME approximations include the system size expansion using van Kampen’s expansion [11], and solutions of the Fokker-Planck equation [22] using a form of linear noise approximation. Finally, another method [42] treats the intervals between measurement times piecewise, using an ODE approximation of the objective function within each interval. This method has recently been extended using the linear noise approximation [43]. A recent work [1], tailored to high-throughput data, proposes a stochastic parameter inference approach based on the comparison of distributions.
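As an illustration of the finite state projection idea mentioned above, the sketch below truncates the CME of a simple birth-death process (0 → X at rate k, X → 0 at rate g*x; an assumed toy model, not one of this paper's case studies) to the states 0 to n_max and integrates the resulting linear ODEs with the classical fourth-order Runge-Kutta scheme. Because transitions that leave the truncated set are dropped, the probability mass 1 − Σp that has leaked out measures the truncation error. The function name and parameter values are hypothetical.

```python
def fsp_birth_death(k, g, n_max, t_end, dt=0.01):
    """Finite state projection of the CME for 0 -> X (rate k), X -> 0 (rate g*x).

    Keeps states 0..n_max, starting from X = 0. Births out of n_max leave the
    projection, so 1 - sum(p) is the probability lost to truncation."""

    def rhs(p):
        dp = [0.0] * (n_max + 1)
        for n, pn in enumerate(p):
            dp[n] -= (k + g * n) * pn  # total outflow from state n
            if n < n_max:
                dp[n + 1] += k * pn  # birth n -> n+1 stays inside the projection
            if n > 0:
                dp[n - 1] += g * n * pn  # death n -> n-1
        return dp

    p = [1.0] + [0.0] * n_max
    for _ in range(round(t_end / dt)):  # classical RK4 steps
        k1 = rhs(p)
        k2 = rhs([pi + 0.5 * dt * d for pi, d in zip(p, k1)])
        k3 = rhs([pi + 0.5 * dt * d for pi, d in zip(p, k2)])
        k4 = rhs([pi + dt * d for pi, d in zip(p, k3)])
        p = [
            pi + dt / 6 * (a + 2 * b + 2 * c + d)
            for pi, a, b, c, d in zip(p, k1, k2, k3, k4)
        ]
    return p


wide = fsp_birth_death(k=10.0, g=1.0, n_max=40, t_end=10.0)
narrow = fsp_birth_death(k=10.0, g=1.0, n_max=12, t_end=10.0)
lost_wide, lost_narrow = 1 - sum(wide), 1 - sum(narrow)
mean_wide = sum(n * pn for n, pn in enumerate(wide))
print(f"lost mass: n_max=40 -> {lost_wide:.2e}, n_max=12 -> {lost_narrow:.2f}")
print(f"mean with n_max=40: {mean_wide:.4f}")  # exact mean is 10 * (1 - e^-10)
```

The generous truncation loses a negligible amount of probability and reproduces the exact mean, while the tight one leaks a large share of the mass; this is why such methods become expensive when, as in the Schlögl model, probable states extend to large molecule counts.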
6 Conclusions
In this paper we have introduced the SPICE algorithm for rate parameter inference in stochastic reaction networks. Our algorithm is based on the cross-entropy method and Gillespie’s algorithm, with a number of significant improvements. Key strengths of our algorithm are its ability to use multiple, possibly incomplete datasets (including distribution data), and its (theoretically justified) use of tau-leaping methods for model simulation. We have shown that SPICE works well in practice, in terms of both computational cost and estimate accuracy (which was often the best on the models tested), even on challenging case studies involving bistable systems and real high-throughput data. On a non-trivial case study, SPICE can be orders of magnitude faster than other approaches while offering comparable accuracy in the estimates.
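For readers unfamiliar with the cross-entropy method, the sketch below shows a generic continuous cross-entropy optimiser in the style of Rubinstein and Kroese [29]; it is not SPICE, which combines the method with Gillespie’s algorithm and uses complete trajectory information. Parameters are drawn from a Gaussian sampling distribution, the lowest-cost ‘elite’ fraction is kept, and the Gaussian’s mean and standard deviation are refitted to the elites with smoothing. As an assumed toy problem, it fits the deterministic mean curve \(m(t) = s(1 - e^{-gt})\) of a birth-death process, with steady-state level \(s = k/g\), to noise-free synthetic data; the model, cost function, sample sizes, and smoothing weight are illustrative choices.

```python
import math
import random
import statistics

TIMES = [0.5 * i for i in range(1, 21)]  # observation times 0.5, 1.0, ..., 10.0


def mean_curve(s, g, t):
    """Toy model: s * (1 - exp(-g t)), the mean of a birth-death process
    started at 0 with steady-state level s = k / g."""
    return s * (1.0 - math.exp(-g * t))


DATA = [mean_curve(20.0, 0.5, t) for t in TIMES]  # synthetic, noise-free


def cost(theta):
    """Sum of squared errors; theta = (log s, log g) keeps both positive."""
    s, g = math.exp(theta[0]), math.exp(theta[1])
    return sum((mean_curve(s, g, t) - y) ** 2 for t, y in zip(TIMES, DATA))


def cross_entropy(n_samples=500, n_elite=50, alpha=0.8, iters=60, seed=0):
    """Generic continuous CE minimisation with an independent Gaussian per
    coordinate; returns the final mean of the sampling distribution."""
    rng = random.Random(seed)
    mu, sigma = [0.0, 0.0], [2.0, 2.0]
    for _ in range(iters):
        samples = [[rng.gauss(m, sd) for m, sd in zip(mu, sigma)]
                   for _ in range(n_samples)]
        elite = sorted(samples, key=cost)[:n_elite]  # lowest-cost samples
        for i in range(len(mu)):
            col = [theta[i] for theta in elite]
            # Smoothed refit of the Gaussian to the elite samples.
            mu[i] = alpha * statistics.fmean(col) + (1 - alpha) * mu[i]
            sigma[i] = alpha * statistics.pstdev(col) + (1 - alpha) * sigma[i]
        if max(sigma) < 1e-6:
            break
    return mu


log_s, log_g = cross_entropy()
s_hat, g_hat = math.exp(log_s), math.exp(log_g)
k_hat = s_hat * g_hat  # production rate recovered from s = k / g
print(f"k = {k_hat:.3f}, g = {g_hat:.3f}")  # data were generated with k = 10, g = 0.5
```

Sampling \((\log s, \log g)\) rather than \((\log k, \log g)\) is a deliberate choice: the latter pair is strongly correlated for this model, which slows a sampler with independent coordinates, whereas the smoothing weight `alpha` guards against the sampling distribution collapsing before the mean has converged.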
Notes
1. NB: we also tested the COPASI implementations using larger populations and more iterations (not shown), but found little improvement relative to the significant increase in computational cost.
2. Isopropyl \(\beta \)-D-1-thiogalactopyranoside.
3. Anhydrotetracycline.
References
1. Aguilera, L.U., Zimmer, C., Kummer, U.: A new efficient approach to fit stochastic models on the basis of high-throughput experimental data using a model of IRF7 gene expression as case study. BMC Syst. Biol. 11(1), 26 (2017)
2. Andersen, J.B., Sternberg, C., Poulsen, L.K., Bjørn, S.P., Givskov, M., Molin, S.: New unstable variants of green fluorescent protein for studies of transient gene expression in bacteria. Appl. Environ. Microbiol. 64(6), 2240–2246 (1998)
3. Andreychenko, A., Mikeev, L., Spieler, D., Wolf, V.: Approximate maximum likelihood estimation for stochastic chemical kinetics. EURASIP J. Bioinform. Syst. Biol. 2012(1), 9 (2012)
4. Blake, W.J., Kærn, M., Cantor, C.R., Collins, J.J.: Noise in eukaryotic gene expression. Nature 422(6932), 633–637 (2003)
5. Boys, R., Wilkinson, D., Kirkwood, T.: Bayesian inference for a discretely observed stochastic kinetic model. Stat. Comput. 18, 125–135 (2008)
6. Costa, A., Jones, O.D., Kroese, D.: Convergence properties of the cross-entropy method for discrete optimization. Oper. Res. Lett. 35(5), 573–580 (2007)
7. Daigle, B.J., Roh, M.K., Gillespie, D.T., Petzold, L.R.: Automated estimation of rare event probabilities in biochemical systems. J. Chem. Phys. 134(4), 044110 (2011)
8. Daigle, B.J., Roh, M.K., Petzold, L.R., Niemi, J.: Accelerated maximum likelihood parameter estimation for stochastic biochemical systems. BMC Bioinform. 13(1), 68 (2012)
9. Dandach, S.H., Khammash, M.: Analysis of stochastic strategies in bacterial competence: a master equation approach. PLoS Comput. Biol. 6(11), 1–11 (2010)
10. Engblom, S.: Computing the moments of high dimensional solutions of the master equation. Appl. Math. Comput. 180(2), 498–515 (2006)
11. Fröhlich, F., Thomas, P., Kazeroonian, A., Theis, F.J., Grima, R., Hasenauer, J.: Inference for stochastic chemical kinetics using moment equations and system size expansion. PLoS Comput. Biol. 12(7), 1–28 (2016)
12. Gardner, T.S., Cantor, C.R., Collins, J.J.: Construction of a genetic toggle switch in Escherichia coli. Nature 403(6767), 339–342 (2000)
13. Gillespie, D.T.: A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22(4), 403–434 (1976)
14. Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977)
15. Gillespie, D.T.: Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys. 115(4), 1716–1733 (2001)
16. Hasenauer, J., Wolf, V., Kazeroonian, A., Theis, F.J.: Method of conditional moments (MCM) for the chemical master equation. J. Math. Biol. 69(3), 687–735 (2014)
17. Hoops, S., et al.: COPASI – a complex pathway simulator. Bioinformatics 22(24), 3067–3074 (2006)
18. Horváth, A., Martini, D.: Parameter estimation of kinetic rates in stochastic reaction networks by the EM method. In: BMEI, pp. 713–717. IEEE (2008)
19. Komorowski, M., Finkenstädt, B., Harper, C.V., Rand, D.A.: Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinform. 10(1), 343 (2009)
20. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
21. Leon, M.: Computational design and characterisation of synthetic genetic switches. Ph.D. thesis, University College London, UK (2017). http://discovery.ucl.ac.uk/1546318/1/Leon_Miriam_thesis_final.pdf
22. Liao, S., Vejchodský, T., Erban, R.: Tensor methods for parameter estimation and bifurcation analysis of stochastic reaction networks. J. Roy. Soc. Interface 12(108), 20150233 (2015)
23. McAdams, H.H., Arkin, A.: Stochastic mechanisms in gene expression. PNAS 94(3), 814–819 (1997)
24. Pirone, J.R., Elston, T.C.: Fluctuations in transcription factor binding can explain the graded and binary responses observed in inducible gene expression. J. Theoret. Biol. 226(1), 111–112 (2004)
25. Reinker, S., Altman, R.M., Timmer, J.: Parameter estimation in stochastic biochemical reactions. IEE Proc. Syst. Biol. 153(4), 168–178 (2006)
26. Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer, Heidelberg (2004). https://doi.org/10.1007/978-1-4757-4145-2
27. Rubinstein, R.Y.: Optimization of computer simulation models with rare events. Eur. J. Oper. Res. 99(1), 89–112 (1997)
28. Rubinstein, R.Y.: The cross-entropy method for combinatorial and continuous optimization. Methodol. Comput. Appl. Prob. 1(2), 127–190 (1999)
29. Rubinstein, R.Y., Kroese, D.P.: The Cross-Entropy Method. Springer, Heidelberg (2004)
30. Schlögl, F.: Chemical reaction models for non-equilibrium phase transitions. Zeitschrift für Physik 253(2), 147–161 (1972)
31. Shaner, N.C., Campbell, R.E., Steinbach, P.A., Giepmans, B.N.G., Palmer, A.E., Tsien, R.Y.: Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma sp. red fluorescent protein. Nat. Biotechnol. 22, 1567–1572 (2004)
32. Sisson, S.A., Fan, Y., Tanaka, M.M.: Sequential Monte Carlo without likelihoods. PNAS 104(6), 1760–1765 (2007)
33. Sobol’, I.M.: On the distribution of points in a cube and the approximate evaluation of integrals. USSR Comput. Math. Math. Phys. 7(4), 86–112 (1967)
34. Tian, T., Xu, S., Gao, J., Burrage, K.: Simulated maximum likelihood method for estimating kinetic rates in gene expression. Bioinformatics 23(1), 84–91 (2007)
35. Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.P.H.: Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. Roy. Soc. Interface 6(31), 187–202 (2009)
36. Vernon, I., Liu, J., Goldstein, M., Rowe, J., Topping, J., Lindsey, K.: Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions. BMC Syst. Biol. 12(1), 1 (2018)
37. Villaverde, A.F., Banga, J.R.: Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J. Roy. Soc. Interface 11(91), 20130505 (2013)
38. Voit, E.O.: The best models of metabolism. Wiley Interdisc. Rev.: Syst. Biol. Med. 9(6), e1391 (2017)
39. Wang, Y., Christley, S., Mjolsness, E., Xie, X.: Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Syst. Biol. 4(1), 99 (2010)
40. Wilkinson, D.J.: Stochastic modelling for quantitative description of heterogeneous biological systems. Nat. Rev. Genet. 10(2), 122–133 (2009)
41. Wilkinson, D.J.: Stochastic Modelling for Systems Biology. CRC Press, Boca Raton (2012)
42. Zimmer, C., Sahle, S.: Parameter estimation for stochastic models of biochemical reactions. J. Comput. Sci. Syst. Biol. 6(1), 11–21 (2012)
43. Zimmer, C., Sahle, S.: Deterministic inference for stochastic systems using multiple shooting and a linear noise approximation for the transition probabilities. IET Syst. Biol. 9, 181–192 (2015)
Acknowledgements
This work has been supported by a BBSRC DTP PhD studentship and the EPSRC Portabolomics project (EP/N031962/1).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Author(s)
About this paper
Cite this paper
Revell, J., Zuliani, P. (2018). Stochastic Rate Parameter Inference Using the Cross-Entropy Method. In: Češka, M., Šafránek, D. (eds) Computational Methods in Systems Biology. CMSB 2018. Lecture Notes in Computer Science, vol 11095. Springer, Cham. https://doi.org/10.1007/978-3-319-99429-1_9
DOI: https://doi.org/10.1007/978-3-319-99429-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99428-4
Online ISBN: 978-3-319-99429-1
eBook Packages: Computer Science, Computer Science (R0)