# Data-Driven Method for Efficient Characterization of Rare Event Probabilities in Biochemical Systems

## Abstract

As mathematical models and computational tools have become sophisticated and powerful enough to accurately depict system dynamics, numerical methods that were previously considered computationally impractical have begun to be used for large-scale simulations. Methods that characterize rare events in biochemical systems are part of this trend, as many of them are computationally expensive and require high-performance computing. In this paper, we introduce an enhanced version of the doubly weighted stochastic simulation algorithm (dwSSA) (Daigle et al. in J Chem Phys 134:044110, 2011), called dwSSA\(^{++}\), that significantly improves the speed of convergence to the rare event of interest when the conventional multilevel cross-entropy method in dwSSA is either unable to converge or converges very slowly. This achievement is enabled by a novel polynomial leaping method that uses past data to detect slow convergence and attempts to push the system toward the rare event. We demonstrate the performance of dwSSA\(^{++}\) on two systems—a susceptible–infectious–recovered–susceptible disease dynamics model and a yeast polarization model—and compare its computational efficiency to that of dwSSA.

## Keywords

Stochastic simulation · Rare event probability estimation · SSA · dwSSA · Gillespie algorithm · Importance sampling

## 1 Introduction

When Gillespie (1976, 1977) introduced the stochastic simulation algorithm (SSA), its use was deemed purely academic, as computers were not powerful enough to support SSA simulations except for toy models. The SSA is an exact numerical method in that its trajectories can be used to construct the solution of the chemical master equation (CME) as the number of simulations approaches infinity. Every reaction is simulated explicitly (reaction time and index) until the final simulation time is reached for each trajectory. This can be computationally infeasible for a large system, or even for a small system with many reaction firings. However, as computer processors became more affordable and powerful, an increasing number of researchers started using the SSA to model biological systems and gained useful insights from numerical simulations. The dramatic increase in usage can be seen in the number of citations the SSA has received; Gillespie's paper (1977) was cited fewer than 100 times annually until 2003, and the number of annual citations spiked to over 500 after 2007 (https://scholar.google.com/citations?user=QwXwK6UAAAAJ#).

With the popularity of SSA came new algorithms derived from it. Some were developed to increase the computational efficiency of the exact method (Gibson and Bruck 2000; Ramaswamy et al. 2009; Slepoy et al. 2008), while others featured faster computation at the expense of accuracy (Cao et al. 2007; Ben Hammouda et al. 2017; Tian and Burrage 2004; Auger et al. 2006; Gillespie 2001; Munsky and Khammash 2006). Specialized methods stemmed from SSA as well, as researchers realized that various scientific communities shared an interest in specific system behaviors or characteristics, such as multiple timescale simulation (Chevalier and El-Samad 2009; Ball et al. 2006; Goutsias 2005; Cao et al. 2004, 2005), model reduction (Kang and Kurtz 2013; Gillespie et al. 2009), steady-state dynamics (Mauch and Stalzer 2010; Grima et al. 2012), and rare event characterization (Donovan et al. 2013; Zelnik et al. 2015; Xu and Cai 2011; Kuwahara and Mura 2008; Gillespie et al. 2009; Roh et al. 2010, 2011). The last area, rare event characterization, is relatively new because of the exceptionally high computational requirements associated with estimating a rare event probability. In order to obtain an accurate estimate, an exact method must be used; the accuracy lost by using an approximate method is likely to be much greater than the magnitude of the rare event probability itself. Moreover, the standard error of the estimate decreases slowly, inversely proportional to the square root of the total number of simulations. Despite these hurdles, many important events in biology, chemistry, and epidemiology are rare and stochastic by nature. Examples of significant rare events include mutation of a normal cell into a cancerous cell (Wang et al. 2014; Luebeck and Moolgavkar 2002; Moolgavkar and Knudson 1981), phage \(\lambda \) (Cao et al. 2010; Arkin et al. 1998), development of multidrug-resistant bacteria (Nikaido 2009; Maisonneuve et al. 2013), and resurgence of a disease (Watts et al. 2005).

Development of the weighted stochastic simulation algorithm (wSSA) by Kuwahara and Mura (2008) alleviated some of the computational toll by using importance sampling (IS) in the reaction selection process. In wSSA, the bias introduced by IS parameters is recorded at each reaction selection step and used at the end of the simulation to obtain an unbiased estimate of the rare event probability. Doing so does not affect the accuracy of wSSA, and with a good choice of IS parameters, a significant reduction in variance can be achieved. However, there are two major drawbacks to the wSSA. The first is that the method does not provide any means to assess the accuracy of the resulting estimate. It is well known that a bad choice of IS parameters can yield an estimate whose variance is higher than that of an unbiased estimator. This problem was solved when Gillespie et al. (2009) demonstrated that the running sum of trajectory weights can be used to compute the uncertainty of the final estimate without affecting the time complexity of wSSA. The second drawback of wSSA is that it does not provide a principled way to choose a good set of IS parameters. Having to guess the value of each IS parameter, one for every reaction, is unreasonable even for a modeler with considerable insight into the system, especially in the presence of nonlinear reactions. This predicament was addressed by Daigle et al. (2011) with the doubly weighted SSA (dwSSA), where both the time to the next reaction and the reaction index are biased. The significance of double weighting (biasing) is that the mathematical form of its trajectory weight can be used to compute a closed-form solution for the optimal IS parameters that minimize cross-entropy, which serves as a proxy for minimum variance. Calculating the variance directly involves, except for a few simple toy models, computation of higher moments, which in turn depend on even higher moments.
Being able to obtain a closed-form solution is critical for computational efficiency and accuracy, and dwSSA provides an automated and principled way to compute good IS parameters that yield a low-variance estimate. In order to achieve this, Daigle et al. modified and incorporated a *multilevel* version of the cross-entropy (CE) method by Rubinstein and his colleagues (Rubinstein and Kroese 2004; Rubinstein 1997) into the SSA.

While dwSSA offers automatic selection of good IS parameters, its performance depends heavily on the convergence rate of the multilevel CE method that computes the optimal IS parameters. If the system exhibits low stochasticity, dwSSA is likely to converge very slowly to the rare event. In the worst case, the multilevel CE method does not converge at all and is unable to return IS parameters. Since a good set of IS parameters is necessary for obtaining a low-variance estimate, failure of the multilevel CE method is detrimental to the performance of dwSSA. In this paper we introduce dwSSA\(^{++}\), which contains a novel and improved method for computing optimal importance sampling parameters when the system is unable to reach the rare event with sufficient speed. In Sect. 2, we review the doubly weighted stochastic simulation algorithm and present the polynomial leaping method that is used to improve the speed of convergence. Pseudo-algorithms are provided in addition to the MATLAB code (https://github.com/minroh/dwSSA_pp) for ease of understanding. We then apply dwSSA and dwSSA\(^{++}\) to a susceptible–infectious–recovered–susceptible (SIRS) model and a yeast polarization model to compare their computational efficiency and accuracy in Sect. 3. Finally, we summarize our contributions in Sect. 4.

## 2 Method

### 2.1 Stochastic Simulation Algorithm and Stochastic Chemical Kinetics

We focus on a well-stirred stochastic system with *N* species \(\{S_1, \ldots , S_N\}\), which interact through any of *M* reaction channels \(\{R_1, \ldots , R_M\}\) to change their populations in discrete amounts. The state of the system at time *t* is denoted by \(\mathbf {X}(t) \equiv (X_1(t), \ldots , X_N(t))\), where \(X_i(t)\) corresponds to the number of molecules of \(S_i\) at time *t*. The probability that reaction \(R_j\) fires in the interval \([t, t+\mathrm {d}t)\) is given by its propensity function \(a_j(\mathbf {x}) \equiv a_j(\mathbf {X}(t)), \; j \in \{1, \ldots , M\}\), where \(\mathrm {d}t\) is an infinitesimal time increment. The sum of all *M* propensity functions is denoted \(a_0(\mathbf {x})\).

The SSA simulates the time evolution of \(\mathbf {x}\) by generating a sequence of samples of two random variables: \(\tau \), the time elapsing between the current and the next reaction firing; and \(j'\), the index of the reaction fired at time \(t + \tau \). The first random variable \(\tau \) is exponentially distributed with mean \(1/a_0(\mathbf {x})\), while \(j'\) is a categorical random variable where the probability of \(R_j\) being chosen as the next reaction is \(a_j(\mathbf {x})/a_0(\mathbf {x}), \; j \in \{1,\ldots , M\}\). After \(\tau \) and \(j'\) are computed, we update the state of the system using an \(M \times N\) stoichiometry matrix \(\mathbf {V}\), whose *j*th row \(\nu _j\) indicates the amount of change in \(\mathbf {x}\) due to one \(R_j\) reaction firing, i.e., \(\mathbf {X}(t + \tau ) = \mathbf {X}(t) + \nu _{j'}\).
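To make the procedure above concrete, here is a minimal Python sketch of Gillespie's direct method (the paper's reference implementation is in MATLAB; the function and variable names here are our own illustrative choices):

```python
import numpy as np

def ssa(x0, V, propensities, t_final, rng=None):
    """Gillespie direct method: simulate one trajectory until t_final.

    x0           -- initial state, length-N array
    V            -- M x N stoichiometry matrix; row j is the state change of R_j
    propensities -- function mapping state x to a length-M array of a_j(x)
    """
    if rng is None:
        rng = np.random.default_rng()
    t, x = 0.0, np.array(x0, dtype=float)
    while True:
        a = propensities(x)
        a0 = a.sum()
        if a0 == 0.0:                      # no reaction can fire; system is frozen
            return t, x
        tau = rng.exponential(1.0 / a0)    # tau ~ Exp with mean 1/a0(x)
        if t + tau > t_final:
            return t_final, x              # final time reached before next firing
        j = rng.choice(len(a), p=a / a0)   # next reaction index, P(R_j) = a_j/a0
        t += tau
        x += V[j]                          # apply row j of the stoichiometry matrix
    return t, x
```

For example, a single decaying species (\(A \rightarrow \emptyset\) with propensity \(c \cdot x_A\)) is simulated with `ssa([10], np.array([[-1.0]]), lambda s: np.array([2.0 * s[0]]), t_final)`.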

### 2.2 Doubly Weighted Stochastic Simulation Algorithm

We give a brief description of dwSSA here. Further details can be found in Daigle et al. (2011). The goal of dwSSA is to generate trajectories to characterize the probability of reaching a rare event \({\mathscr {E}}\) by final time \(t_f\). Thus, a trajectory is simulated until either \(t_f\) is reached or event \({\mathscr {E}}\) is observed at a stopping time \({\mathscr {T}} < t_f\), whichever occurs sooner. The form of rare event probability on which the dwSSA operates is \(p(\mathbf {x}_0, {\mathscr {E}}; t_f)\); it is defined as the probability that the system starting at time 0 in state \(\mathbf {x}_0\) will first reach rare event \({\mathscr {E}}\) by some time \(\le t_f\).

To bias trajectories toward \({\mathscr {E}}\), the dwSSA samples both \(\tau \) and \(j'\) using a *predilection* function \(b_j(\mathbf {x})\) instead of the propensity function \(a_j(\mathbf {x})\), where \(b_j \equiv a_j \times \gamma _j, \ b_0 = \sum _{j=1}^{M} b_j(\mathbf x),\) and \(\gamma _j \in \mathbb {R}^+\). Using the predilection function, \(\tau \) now has mean \(1/b_0(\mathbf {x})\), and \(j'\) is categorically distributed with probability \(b_j(\mathbf {x})/b_0(\mathbf {x})\). Thus, denoting by \(N_{{\mathscr {T}}}\) the total number of reactions that fire in the interval \([0,{\mathscr {T}}]\), the probability of a single dwSSA trajectory \(\mathbf {J} \equiv (\tau _1, j'_1, \ldots , \tau _{N_{{\mathscr {T}}}}, j'_{N_{{\mathscr {T}}}})\) takes the form

The closed-form CE solution, however, can only be computed from trajectories that actually reach \({\mathscr {E}}\), and by definition few, if any, unbiased trajectories do so. Daigle et al. solved this problem by using a *multilevel* version of the cross-entropy method (Rubinstein and Kroese 2004), which takes the system closer to \({\mathscr {E}}\) in an iterative manner using favorable signals obtainable from the current state. Starting with \(s=1\) and \({\gamma }^{(0)}=\mathbf {1}\), we define an *intermediate* rare event \({\mathscr {E}}^{(s)}\) as the value closest to \({\mathscr {E}}\) that is reachable by the top \(\rho \) fraction of all trajectories simulated with \({\gamma }^{(s-1)}\). We note that no computation of \(\gamma \) is required in the beginning (\(s=1\)), as the system starts unbiased, i.e., \({\gamma }^{(s-1)}={\gamma }^{(0)}=\mathbf {1}\). After computing \({\mathscr {E}}^{(s)}\), we compute the following closed-form solution to obtain \(\hat{\gamma }_j^{(s)}, \, j \in \{1, \ldots , M\}\):

where *k* iterates only over trajectories reaching the intermediate rare event \({\mathscr {E}}^{(s)}\). This procedure repeats until the intermediate rare event \({\mathscr {E}}^{(s)}\) reaches or surpasses \({\mathscr {E}}\). At that point we terminate the multilevel CE method and set \(\hat{{\gamma }}^* \equiv \hat{{\gamma }}^{(s)}\). The final step is to obtain an estimate for \(p(\mathbf {x}_0, {\mathscr {E}}; t_f)\) using \(\hat{{\gamma }}^*\). We note that a closed-form solution for \(\hat{\gamma }^*\) cannot be derived from the probability expression of the wSSA or the swSSA; it is a unique feature of the dwSSA and the sdwSSA (Roh et al. 2011), the latter of which also employs double biasing but with state-dependent IS parameters. Both the closed-form solution and the automatic determination of importance sampling parameters are needed for the algorithm to be of practical use, especially for systems that contain nonlinear reactions.

The uncertainty in **step** 21 can be used to assess the quality of the estimate \(\hat{p}(\mathbf {x}_0, {\mathscr {E}}; t_f)\). Denoting the true probability by \(p(\mathbf {x}_0, {\mathscr {E}}; t_f)\), the probability that \(\left( \hat{p}(\mathbf {x}_0, {\mathscr {E}}; t_f) {-} \sigma /K\right) \le p(\mathbf {x}_0, {\mathscr {E}}; t_f) \le \left( \hat{p}(\mathbf {x}_0, {\mathscr {E}}; t_f)+\sigma /K\right) \) is approximately 68%. Doubling the interval (\(2\sigma /K\)) raises the confidence level to 95%, and tripling it to 99.7%. Thus, the smaller the uncertainty, the tighter the confidence interval. If the uncertainty has the same order of magnitude as the rare event probability estimate, then there is little to no trust in the value of \(\hat{p}_{dwSSA}(\mathbf {x}_0, {\mathscr {E}}; t_f)\), and the user is advised to increase *K* and rerun the algorithm.
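The estimate and its uncertainty can be sketched in Python as follows. This is our own illustration, under the assumption that \(\sigma\) denotes the square root of the summed squared deviations of the trajectory weights, so that \(\sigma/K\) coincides with the usual standard error of the mean; the paper's **step** 21 is the authoritative definition:

```python
import numpy as np

def rare_event_estimate(weights):
    """Estimate p and its uncertainty from per-trajectory weights.

    weights -- length-K array; w_k is the trajectory weight if trajectory k
               reached the rare event, and 0 otherwise.
    """
    K = len(weights)
    p_hat = weights.sum() / K
    # Assumed convention: sigma is the root of the summed squared deviations,
    # so sigma / K matches the standard error of the mean of the weights.
    sigma = np.sqrt(((weights - p_hat) ** 2).sum())
    return p_hat, sigma / K
```

With this convention, \(\hat{p} \pm u\), \(\hat{p} \pm 2u\), and \(\hat{p} \pm 3u\) (where `u` is the returned uncertainty) correspond to roughly 68%, 95%, and 99.7% confidence intervals.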

### 2.3 Extrapolation of Biasing Parameters Using Past Simulation Data

The only difference between dwSSA and dwSSA\(^{++}\) lies in how \(\hat{{\gamma }}^*\) is computed; given a set of IS parameters, both algorithms compute \(\hat{p}(\mathbf {x}_0, {\mathscr {E}}; t_f)\) using Algorithm 1. However, the automatic computation of \(\hat{{\gamma }}^*\) is the most important component that makes dwSSA efficient and practical compared to earlier and related methods (Kuwahara and Mura 2008; Roh et al. 2010, 2011). Without it, dwSSA becomes impractical, as the user is expected to provide an importance sampling parameter for each reaction in the system. Although a user may be able to guess the general direction of biasing, i.e., encouraging (\(\gamma _j > 1\)) or discouraging (\(\gamma _j < 1\)), it is almost impossible for the user to guess values for all IS parameters in the system that together produce a low-variance estimate. In addition, manually tuning IS parameters (Roh et al. 2010, 2011) is not computationally feasible for any large system. Therefore, except for very simple models, the multilevel CE method is expected to be run to obtain \(\hat{{\gamma }}^*\) prior to starting Algorithm 1.

While the multilevel CE method allows for automatic computation of \(\hat{{\gamma }}^*\) that minimizes cross-entropy, its performance largely depends on the speed of convergence to the rare event. For many applications, the computational cost of the multilevel CE method is negligible compared to the total cost of the simulation, since the number of simulations used in the multilevel CE method is often orders of magnitude less than that used in Algorithm 1 (Daigle et al. 2011; Roh et al. 2011). It is possible, however, for the computation time of the multilevel CE method to dominate the total simulation time. This can happen when the system under study exhibits low stochasticity. If the population count is high for all species, there will be little variability among trajectories. Even for a system with small populations, IS parameters computed in a prior iteration can bring the system to a strongly stable stochastic equilibrium. In both cases, the lack of variability in \(\mathbf {x}\) among trajectories is likely to result in an intermediate event that is close or equal to the system's average behavior. In fact, it is possible that \({\mathscr {E}}^{(s)}\) is farther from \({\mathscr {E}}\) than \({\mathscr {E}}^{(s-1)}\). In the worst case, \({\mathscr {E}}^{(s)}\) may never converge to \({\mathscr {E}}\) and no \(\hat{{\gamma }}^*\) is computed. For this reason, it is recommended that the user set a limit on the number of iterations of the multilevel CE method to avoid running *ad infinitum*. For the simulations in this paper, we set this number to 20.

Here, \(s_{\text {max}}\) denotes the maximum number of iterations allowed for computing \(\hat{{\gamma }}^*\) before declaring that the algorithm failed to converge. For the examples shown in Sect. 3, we set \(s_{\text {max}}\) to 20. The number of past data points used to assess the convergence rate and form an interpolant is denoted \(l_d\), which appears in **step** 6 of Algorithm 2. While \(l_d\) can be any integer greater than 1, we recommend that it not exceed 5. The reason is that increasing the number of data points required for interpolation is not likely to increase the quality of the resulting interpolant. If good progress is made toward \({\mathscr {E}}\) with the conventional multilevel CE method, the polynomial leaping method will not be called. On the other hand, if the system is converging slowly or not at all, a large value of \(l_d\) delays the initial call of the polynomial leaping method until at least \(l_d\) iterations of the multilevel CE method have executed. Invoking the polynomial leaping method also implies that the past intermediate rare events (IREs) are similar in value; the same must then be true for the importance sampling parameters corresponding to these IREs. Thus, requiring a large number of past data points is not expected to significantly increase the quality of the resulting interpolant and will only delay the system from leaping. For these reasons, we set the default value of \(l_d\) to 3.

Algorithm 2 checks two conditions for slow convergence (**step** 6). If either of the two conditions evaluates to true, then the polynomial leaping method (Algorithm 3) is used to compute \({\gamma }^{(s)}\) instead of the conventional multilevel CE update in **step** 15. The first condition is true when the \(l_d\) past intermediate rare events form a non-strictly converging sequence to \({\mathscr {E}}\). This means any stalling or regressing in \({\mathscr {E}}^{(s)}\) values during \(l_d\) stages of the multilevel CE method will trigger polynomial leaping. The second condition is satisfied if the estimated number of iterations to reach the rare event exceeds a preset threshold \(\sigma \), which is set to 5 by default. We obtain the estimated number of iterations, \(\mu \), by first computing the speed of progress based on the last two multilevel CE iterations:

In order to determine leaping eligibility, Algorithm 2 executes a series of diagnostic questions via the binary decision tree shown in Fig. 1. The two conditions that trigger leaping correspond to the first node and its left child node, respectively. If neither condition is met, then the multilevel CE method resumes to determine \(\gamma ^{(s)}\), as sufficient progress is being made toward \({\mathscr {E}}\), i.e., \(\mu \le \sigma \). This case corresponds to the leaf node with value **Run mCE** in Fig. 1. On the other hand, if the underlying system is neither making progress toward the rare event nor exhibiting any signal, Algorithm 2 is unable to determine the direction of bias required to reach \({\mathscr {E}}\). While unlikely to occur for most systems, this is theoretically possible. For example, it may happen if the chosen initial state coincides with the system's strong equilibrium state with very low variance. This case is indicated by the leaf node with value **No signal** in Fig. 1. In all other cases, the binary decision tree returns the two pieces of information required to initiate leaping (Algorithm 3): the method of extrapolation and the type of input data. The method can be either polynomial interpolation or bisection and is decided based on the number of input data points available. Bisection is employed only when there is a single eligible data point for extrapolation; polynomial interpolation is used otherwise. Here, we fit a low-degree polynomial according to the specifications returned by the binary decision tree. Interpolants constructed by the polynomial leaping method are kept at low degree (1 or 2), since (7) was derived assuming convexity (Daigle et al. 2011) and a small number of data points, \(l_d\), is used to compute the interpolants. We note that the default value of \(l_d\) (=3) is set such that it is the minimum number of data points required to construct a polynomial interpolant of degree 2. Leaves of the binary decision tree that correspond to polynomial interpolation contain the value **Poly.** with its degree (**Deg. 1** or **Deg. 2**). Bisection is indicated by the keyword **Bisection**.
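The two trigger conditions can be sketched as follows. This is our own simplified illustration, assuming the IREs approach the rare event threshold from below (as the population of *I* does in the SIRS example); the authoritative logic is Algorithm 2 and the decision tree in Fig. 1:

```python
def should_leap(ires, target, l_d=3, sigma=5):
    """Decide whether polynomial leaping should replace the next CE update.

    ires   -- intermediate rare events, one per completed CE iteration
    target -- the rare event threshold, assumed approached from below
    """
    if len(ires) < l_d:
        return False                      # not enough history to judge yet
    recent = ires[-l_d:]
    # Condition 1: stalling or regressing during the last l_d iterations,
    # i.e., the sequence is not strictly converging to the target.
    if any(b <= a for a, b in zip(recent, recent[1:])):
        return True
    # Condition 2: at the current speed of progress h, the estimated number
    # of remaining iterations mu exceeds the threshold sigma.
    h = recent[-1] - recent[-2]           # progress made in the last iteration
    mu = (target - recent[-1]) / h        # iterations left at constant speed h
    return mu > sigma
```

For example, IRE history `[40, 42, 42]` with target 60 triggers leaping via the first condition (stalling), while `[40, 45, 51]` does not, since only \((60-51)/6 = 1.5\) further iterations are projected.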

The second piece of information returned by the binary decision tree, the type of input data, can be either past counter values or past IRE values. Between these two types, the former is preferred. Counter data represent the number of trajectories that reached \({\mathscr {E}}\) in each of the past \(l_d\) iterations of the multilevel CE method. The cardinality of the set of possible counter values is \(\mathbf {card}(\{0,1,\ldots ,\lfloor K \times \rho \rfloor \})\), which is large for commonly chosen values of \(K (10^5 \, \text {to} \,10^8)\) and \(\rho (10^{-4}\, \text {to} \,10^{-2})\), where a smaller value of \(\rho \) is associated with a larger *K*. The upper limit of this set is \(\lfloor K \times \rho \rfloor \), as the multilevel CE method is able to compute \(\hat{{\gamma }}^*\) once \((K \times \rho )\) or more trajectories reach \(\mathscr {E}\). This large range allows the algorithm to easily assess the effect of changes in biasing parameter values and compute reliable interpolants. On the other hand, the range of intermediate rare events varies greatly depending on the definition of a rare event for a given system; a wide range of biasing parameters may correspond to the same intermediate rare event. There is, however, one notable advantage in using past IREs: we do not need to worry about their existence. Unlike counter data, past IRE data are always available regardless of the system's proximity to \(\mathscr {E}\). Unless the system starts in a strong stochastic equilibrium, which is very unlikely given the myriad possible combinations of random initial reaction rate values, the multilevel CE method will make progress toward the rare event. This progress does not guarantee that any trajectories reach the rare event, and thus the counter data may be 0; nevertheless, the IRE value will move closer to \(\mathscr {E}\). If the system reaches a strong equilibrium during the simulation and produces \(l_d\) IREs with the same value, we can extract more signal by accessing IRE values beyond the past \(l_d\) iterations. This is why queries in the binary decision tree include checks on all past IREs when the last \(l_d\) IRE values are identical. Thus, the order of preference for data types in the algorithm is counter data, the past \(l_d\) IRE values, and all past IRE values.

Once the interpolants are constructed, we decide on the value of the output variable \(\xi \) that we want the system to produce on the next iteration of Algorithm 2. This value is assigned as the RHS of each of the *M* interpolants to compute \(\gamma _j^{(s)}\). Since \(\xi \) is an unobserved value outside the range of past behavior, obtaining \(\gamma _j^{(s)}\) this way is extrapolation. We note that computing \(\gamma ^{(s)}\) via extrapolation replaces the traditional multilevel CE routine (Algorithm 2, **step** 15), saving *K* trajectory simulations per leap.
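One simple way to realize this extrapolation step is sketched below. Here each reaction's interpolant is fitted with \(\gamma_j\) as a function of the output variable (counter value or IRE) and evaluated at the target \(\xi\), which, for a monotone relationship, is equivalent to placing \(\xi\) on the RHS and solving for \(\gamma_j\); the function and parameter names are our own:

```python
import numpy as np

def extrapolate_gammas(outputs, gammas, xi, degree=2):
    """Extrapolate each reaction's biasing parameter to a target output value.

    outputs -- length-l_d array of past per-iteration outputs
               (counter values or intermediate rare events)
    gammas  -- l_d x M array; row s holds the M biasing parameters of iteration s
    xi      -- target output value for the next iteration
    """
    outputs = np.asarray(outputs, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    new_gamma = np.empty(gammas.shape[1])
    for j in range(gammas.shape[1]):
        # Low-degree (1 or 2) interpolant of gamma_j versus the output variable;
        # evaluating it at xi, outside the observed range, is the extrapolation.
        deg = min(degree, len(outputs) - 1)
        coeffs = np.polyfit(outputs, gammas[:, j], deg=deg)
        new_gamma[j] = np.polyval(coeffs, xi)
    return new_gamma
```

With the default \(l_d = 3\) data points, the fit is exact, so the sketch reproduces the degree-2 interpolant described above.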

A counter value of 0 indicates that the system is still *far* from observing the rare event, and we set the target to a more conservative value of \(\xi = \lceil \rho K \rceil \).

The quantity *h* reflects the absolute amount of progress made in the IRE in the most recent iteration, and \(\mu \) denotes the number of iterations required to reach the rare event assuming the amount of convergence per iteration stays at *h*. We then compute the desired amount of progress for the next iteration, \(\delta \), which is the lesser of \(\mu /2\) and 3*h*. The first quantity, \(\mu /2\), indicates that we aim to halve the distance to \(\mathscr {E}\) in the next simulation by utilizing leaping. The fact that past IRE values are used to construct the interpolants instead of past counter data indicates that the system is not producing trajectories that observe \(\mathscr {E}\) under the current parametrization. Therefore, setting the next target to \(\mathscr {E}\) would be too aggressive and would likely result in extrapolation beyond what the data can reliably predict. The second quantity, 3*h*, sets a maximum limit on the target progress of three times the current progress. This limit also ensures that the extrapolation is not too extreme relative to the absolute distance to the rare event: if the current state is far from \(\mathscr {E}\), the halfway point between the latest IRE and \(\mathscr {E}\) may still be too far for an accurate extrapolation. By imposing these two limits, we compute \(\xi \) more conservatively with IREs than with counter data, to account for the lack of trajectories reaching the rare event. A pseudo-algorithm for the polynomial leaping method is shown in Algorithm 3.

We note that the extrapolation of biasing parameters with the leaping method can be selectively applied to large systems, where only a few reactions may play an important role in observing the rare event. The second example in Sect. 3 illustrates this point.

## 3 Results

In this section, we illustrate the performance of dwSSA\(^{++}\) by comparing it to that of dwSSA on two example systems—a susceptible–infectious–recovered–susceptible (SIRS) disease dynamics model and a yeast polarization model. In order to minimize differences in results due to stochasticity, the same random number seeds were used for the corresponding dwSSA and dwSSA\(^{++}\) simulations. Default parameterizations are used for the dwSSA\(^{++}\)-specific parameters, i.e., \(l_d = 3 \) and \(\sigma = 5\). We emphasize again that the two algorithms differ only in the method for computing optimal biasing parameters, i.e., the conventional multilevel CE method vs the modified multilevel CE method with polynomial leaping. Once \({\hat{\gamma }^*}\) is computed, both dwSSA and dwSSA\(^{++}\) run Algorithm 1 to estimate the rare event probability. All simulations were run using MATLAB\(^{\tiny {\textregistered }}\) 2017a and Parallel Computing Toolbox™ on an Intel\(^{\tiny {\textregistered }}\) Core™ i7-6400U CPU. All code used in the simulations is available at https://github.com/minroh/dwSSA_pp.

### 3.1 SIRS

A susceptible individual in *S* becomes infected by an infectious individual in *I* at rate \(\beta \). Infectious individuals recover at rate \(\lambda \). However, immunity wanes, and members of *R* rejoin the susceptible pool *S* at rate \(\omega \). For this system we examine the event probability \(p(\mathbf {x}_0, \theta ^I; t_f) \equiv p([100 \; 1 \; 0], 60;30)\), i.e., the probability that the population of *I* reaches 60 before \(t_f=30\), given \(\mathbf {x}_0\) and \(k_0 = [\beta \; \lambda \; \omega ]\). Although the populations of all three species stay small throughout the simulation, this particular parameter combination causes the system to exhibit low stochasticity, and the multilevel CE method of dwSSA does not converge by iteration 20 when the default simulation parameter values (\(\rho = 0.01, K = 10^5\)) are used. The most extreme IRE observed in this simulation is 45.
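The SIRS model can be encoded for the SSA as below. This is an illustrative sketch: the species ordering \([S\;I\;R]\) and initial state follow \(p([100\;1\;0], 60; 30)\) above, but the rate constants are placeholders, not the paper's \(k_0\) values:

```python
import numpy as np

# SIRS reactions: S + I -> 2I (rate beta), I -> R (rate lambda), R -> S (rate omega).
# Placeholder rate constants for illustration only; see the paper for k0.
beta, lam, omega = 0.3, 0.1, 0.05

x0 = np.array([100.0, 1.0, 0.0])     # [S I R], as in p([100 1 0], 60; 30)
V = np.array([[-1.0,  1.0,  0.0],    # infection: S -> I (nonlinear propensity)
              [ 0.0, -1.0,  1.0],    # recovery:  I -> R
              [ 1.0,  0.0, -1.0]])   # waning:    R -> S

def propensities(x):
    s, i, r = x
    return np.array([beta * s * i,   # bilinear term makes this reaction nonlinear
                     lam * i,
                     omega * r])

def rare_event(x, threshold=60):
    """True once the infectious population reaches the threshold."""
    return x[1] >= threshold
```

Note that each row of `V` sums to zero, reflecting that the total population \(S + I + R\) is conserved.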

There are two algorithmic parameters—\(\rho \) and *K*—that can be tuned to improve the speed of convergence, although each has an associated drawback. The first parameter, \(\rho \), indicates the fraction of trajectories used to determine an intermediate rare event. Lowering the value of \(\rho \) will likely result in an IRE closer to the rare event. However, it also lowers the number of data points used to compute the corresponding biasing parameters. Biasing parameters computed with only a few data points may not be reliable and can yield an estimate with high variance. This drawback can be mitigated by increasing the total simulation size *K*. The big disadvantage of increasing *K* is, for most systems, a longer simulation time; however, doing so could lead to convergence for some systems that do not converge with a smaller *K*. The precise relationship between the convergence rate and the two parameters is system dependent and often difficult to gauge when nonlinear reactions, such as \(R_2\) in the SIRS model, are present.

**Table 1** Results of the multilevel CE method (dwSSA) and Algorithm 2 (dwSSA\(^{++}\)) applied to the SIRS model. A parenthesized value in the "No. iter." column gives the number of polynomial leaps employed; "No (v)" indicates non-convergence, with v the most extreme IRE reached. An asterisk in the "Gain" column marks parameterizations for which dwSSA\(^{++}\) did not employ polynomial leaping, making the two algorithms equivalent

| \(K\) | \(\rho \) | Method | No. iter. | Convergence | \(\hat{\gamma }^{*}\) | Tot. time (hr) | Gain \(\left( \frac{(\text {dwSSA})}{(\text {dwSSA}^{++})}\right) \) |
|---|---|---|---|---|---|---|---|
| \(10^5\) | 0.01 | dwSSA | 20 | No (45) | NA | 1.97 | \(\infty \) |
| | | dwSSA\(^{++}\) | 11 (4) | Yes | (1.222 0.688 1.122) | 1.22 | |
| | 0.005 | dwSSA | 20 | No (47) | NA | 2.10 | \(\infty \) |
| | | dwSSA\(^{++}\) | 7 (2) | Yes | (1.231 0.728 1.130) | 0.83 | |
| | 0.001 | dwSSA | 9 | Yes | (1.350 0.597 1.230) | 1.20 | 2.27 |
| | | dwSSA\(^{++}\) | 5 (1) | Yes | (1.265 0.680 1.116) | 0.53 | |
| | \(5\times 10^{-4}\) | dwSSA | 7 | Yes | (1.445 0.525 1.234) | 0.78 | 1.03 |
| | | dwSSA\(^{++}\) | 7 (2) | Yes | (1.342 0.611 1.293) | 0.76 | |
| | \(10^{-4}\) | dwSSA | 5 | Yes | (1.356 0.588 1.318) | 0.59 | 1.26* |
| | | dwSSA\(^{++}\) | 4 | Yes | (1.197 0.793 1.264) | 0.47 | |
| \(10^6\) | 0.01 | dwSSA | 20 | No (45) | NA | 18.92 | \(\infty \) |
| | | dwSSA\(^{++}\) | 8 (3) | Yes | (1.256 0.666 1.207) | 8.09 | |
| | 0.005 | dwSSA | 20 | No (46) | NA | 19.43 | \(\infty \) |
| | | dwSSA\(^{++}\) | 12 (3) | Yes | (1.369 0.573 1.250) | 16.50 | |
| | 0.001 | dwSSA | 20 | No (50) | NA | 21.90 | \(\infty \) |
| | | dwSSA\(^{++}\) | 7 (2) | Yes | (1.175 0.757 1.051) | 7.58 | |
| | \(5\times 10^{-4}\) | dwSSA | 15 | Yes | (1.397 0.641 1.174) | 19.40 | 2.19 |
| | | dwSSA\(^{++}\) | 7 (2) | Yes | (1.310 0.605 1.222) | 8.87 | |
| | \(10^{-4}\) | dwSSA | 5 | Yes | (1.146 0.810 1.050) | 6.12 | 1.19 |
| | | dwSSA\(^{++}\) | 5 (1) | Yes | (1.460 0.586 1.257) | 5.14 | |
| | \(5\times 10^{-5}\) | dwSSA | 5 | Yes | (1.176 0.816 1.030) | 5.60 | 1.16* |
| | | dwSSA\(^{++}\) | 4 | Yes | (1.191 0.747 1.090) | 4.82 | |
| | \(10^{-5}\) | dwSSA | 4 | Yes | (1.247 0.661 1.251) | 4.84 | 1.07* |
| | | dwSSA\(^{++}\) | 4 | Yes | (1.341 0.571 1.097) | 4.52 | |

Several interesting observations can be made from Table 1. First, it is clear that lowering \(\rho \) for a given *K* increases the rate of convergence to the rare event, especially for the conventional multilevel CE method simulations. However, the number of data points used to compute \(\hat{\gamma }^{*}\) decreases too, and this results in high variability in \(\hat{\gamma }^{*}\). For example, dwSSA\(^{++}\) does not employ polynomial leaping when using \(\rho = 10^{-4}\) and \(K=10^5\) (Table 1), making the algorithm equivalent to the conventional multilevel CE method. Yet the two simulations yielded \(\hat{\gamma }^{*}\) values that are noticeably different, e.g., a 26% difference in \(\hat{\gamma }^{*}_{2}\). The difference is not due to \(R_2\) being insignificant in producing \(\theta ^I\), since \(\gamma _2\) stays consistently below 1 throughout the simulation. Rather, it is because each iteration of the multilevel CE method relied on only the top 10 data points (\(10^5 \times 10^{-4} = 10\)) to compute the next IRE and its corresponding biasing parameters. When either \(\rho \) or *K* increases, this variability disappears. For dwSSA\(^{++}\) runs that employ polynomial leaping, extrapolation deviates from minimizing cross-entropy, and the resulting \(\hat{\gamma }^{*}\) is expected to differ from the one obtained with the conventional multilevel CE method; the difference does not imply better or worse performance. However, \(\hat{\gamma }^{*}_{j}\) values obtained from multiple simulations using the same algorithm and parameterization should be consistent, provided \(R_j\) is involved in rare event production.

Although a smaller *K* results in faster convergence, it is not worth the computational gain if the resulting biasing parameters yield a high-variance estimate.

It is worth noting that the conventional multilevel CE method was not able to compute \(\hat{\gamma }^{*} \) for five of the twelve runs in this parameter sweep. We see from Table 1 that dwSSA simulations using \(\rho \in \{0.005, \; 0.01\}\) for \(K = 10^5\) and \(\rho \in \{0.001, \; 0.005, \; 0.01\}\) for \(K = 10^6\) did not converge within 20 iterations, while all dwSSA\(^{++}\) simulations converged by iteration 12. It is also clear from Table 1 that the performance of the conventional multilevel CE method is sensitive to changes in both \(\rho \) and *K*. On the other hand, the performance of Algorithm 2 is robust with respect to both parameters and exhibits superior convergence. Furthermore, because Algorithm 2 utilizes leaping only when it detects slow convergence, it reduces to the conventional CE method when enough progress is being made toward the rare event. This is illustrated by the gradual decline in the number of times polynomial leaping is employed as \(\rho \) decreases (Row 3 in Table 1).
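The trigger that makes the algorithm fall back to plain CE iterations when progress is adequate can be sketched as follows; the window length and progress threshold below are illustrative assumptions, not the paper's actual parameterization:

```python
def slow_convergence(ire_history, min_progress=1, window=2):
    """Detect a stalled multilevel CE run: return True when the
    intermediate rare event (IRE) levels have advanced less than
    `min_progress` over the last `window` iterations, signaling
    that polynomial leaping should be attempted."""
    if len(ire_history) < window + 1:
        return False                        # not enough history yet
    gain = ire_history[-1] - ire_history[-1 - window]
    return gain < min_progress

# IRE levels stalling near a stochastic equilibrium:
levels = [120, 180, 230, 260, 260, 260]
assert not slow_convergence(levels[:4])     # steady progress: plain CE step
assert slow_convergence(levels)             # stalled: trigger leaping
```

When every iteration clears the progress threshold, the check never fires and the run is indistinguishable from the conventional multilevel CE method.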

### 3.2 Yeast Polarization

Our second example is a model of pheromone-induced G-protein signaling during polarization in *Saccharomyces cerevisiae*, modified from Drawert et al. (2010) in a similar fashion to Daigle et al. (2011). Our modified system consists of seven species \(\mathbf {x} = [R \; L \; RL \; G \; G_a \; G_{bg} \; G_d]\) and is characterized by the following eight reactions:

Results of multilevel CE method and Algorithm 2 applied to the yeast polarization model

| *K* | \(\rho \) | Method | No. iter. (leaps) | Convergence | \([\hat{\gamma }^*_6 \; \hat{\gamma }^*_8 ]\) | Tot. time (hr) | Gain \(\left( \frac{(\text {dwSSA})}{(\text {dwSSA}^{++})}\right) \) |
|---|---|---|---|---|---|---|---|
| \(10^5\) | 0.01 | dwSSA | 20 | No (298) | NA | 1.61 | \(\infty \) |
| | | dwSSA\(^{++}\) | 10 (2) | Yes | (0.120 4.445) | 0.72 | |
| | 0.005 | dwSSA | 16 | Yes | (0.112 3.194) | 1.20 | 1.47 |
| | | dwSSA\(^{++}\) | 12 (3) | Yes | (0.0889 3.352) | 0.82 | |
| | 0.001 | dwSSA | 8 | Yes | (0.146 3.174) | 0.62 | 1.20 |
| | | dwSSA\(^{++}\) | 7 (1) | Yes | (0.0846 3.062) | 0.52 | |
| | \(5\times 10^{-4}\) | dwSSA | 7 | Yes | (0.124 3.420) | 0.55 | 1.01* |
| | | dwSSA\(^{++}\) | 7 | Yes | (0.130 2.896) | 0.55 | |
| | \(10^{-4}\) | dwSSA | 5 | Yes | (0.169 2.865) | 0.38 | 0.82* |
| | | dwSSA\(^{++}\) | 6 | Yes | (0.176 3.331) | 0.47 | |

Similar to the SIR model, we see a gradual increase in performance with decreasing \(\rho \) when using the conventional multilevel CE method, while the performance of Algorithm 2 is relatively robust with respect to \(\rho \). Algorithm 2 converges in all five sets of simulations, while the multilevel CE method does so in only three. As the convergence rate increases with decreasing \(\rho \), the leaping method is triggered less frequently, and the two methods eventually become equivalent when no leaping is employed; in that case, any difference in performance is purely due to stochasticity. We note that it is possible to modify Algorithm 2 to dynamically choose the biasing parameters used for extrapolation when the leaping method is triggered: when slow convergence is detected, past biasing parameter values can be scanned to select leaping indices prior to entering Algorithm 3. It is therefore not necessary to run preliminary simulations to decide which reactions are to be extrapolated.
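The dynamic selection of leaping indices suggested here could, for instance, scan the recent history of biasing parameters and flag the reactions whose \(\gamma _j\) values are still changing. This is a hypothetical heuristic sketch, with names and the tolerance chosen for illustration only:

```python
import numpy as np

def leaping_indices(gamma_history, tol=0.05):
    """Given gamma_history[iteration, reaction], flag the reactions whose
    biasing parameters changed by more than `tol` (relative) between the
    last two iterations -- candidates for extrapolation."""
    last, prev = gamma_history[-1], gamma_history[-2]
    rel_change = np.abs(last - prev) / np.abs(prev)
    return [j for j in range(gamma_history.shape[1]) if rel_change[j] > tol]

# Reaction 0 has settled near 1, while reactions 1 and 2 are still moving:
history = np.array([[1.00, 1.5, 2.0],
                    [1.01, 1.8, 2.6],
                    [1.01, 2.1, 3.3]])
idx = leaping_indices(history)   # selects reactions 1 and 2
```

A scan of this kind mirrors how \(\gamma _6\) and \(\gamma _8\) emerge as the extrapolation targets in the yeast polarization example.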

We illustrate the effectiveness of leaping with \(\rho = 0.01\) and \(K=10^5\) in Fig. 4. Only dwSSA\(^{++}\) converges with this parameter combination, after utilizing leaping twice, in iterations 7 and 9. The conventional multilevel CE method gets close to producing the rare event but never reaches it by iteration 20. We see from Fig. 4a that lack of time is not the main cause, as the dwSSA observes \(G_{\beta \gamma } > 290\) after iteration 6. The maximum \(G_{\beta \gamma }\) population in this simulation is 298, and it is first observed during iteration 13. We hypothesize that the system entered a stochastic equilibrium around this time, which prevented the algorithm from converging. On the other hand, dwSSA\(^{++}\) recognizes slow convergence first at iteration 7 and then again at iteration 9. By extrapolating the \(\gamma _6\) and \(\gamma _8\) values using past IRE data, the algorithm reaches the rare event by iteration 10 and successfully computes \(\hat{\gamma }^{*} \). Figure 4b shows the \(\gamma _6\) and \(\gamma _8\) values computed by both algorithms. We see that the most significant change in \(\gamma _6\) from dwSSA\(^{++}\) occurred during the first leaping, and in \(\gamma _8\) during the second leaping.
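The extrapolation step can be mimicked with a toy example: fit a low-order polynomial to a biasing parameter's recent values and evaluate it a few iterations ahead. The \(\gamma \) history below is made up for illustration, and the least-squares fit is only a sketch of the idea, not the paper's exact leaping rule:

```python
import numpy as np

def leap_gamma(gamma_history, degree=2, leap=2):
    """Extrapolate one biasing parameter `leap` iterations ahead by
    fitting a degree-`degree` polynomial to its past values."""
    iters = np.arange(len(gamma_history))
    coeffs = np.polyfit(iters, gamma_history, degree)
    return float(np.polyval(coeffs, len(gamma_history) - 1 + leap))

# Made-up gamma values that follow y = 0.05*x**2 + 0.15*x + 1 exactly,
# so the quadratic fit recovers them and predicts 4.5 two steps ahead.
history = [1.0, 1.2, 1.5, 1.9, 2.4, 3.0]
leaped = leap_gamma(history, degree=2, leap=2)   # ~4.5
```

Jumping to the extrapolated value, rather than taking another CE step, is what pushes the biased system past a stochastic equilibrium.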

## 4 Conclusion and Discussion

This paper describes dwSSA\(^{++}\) and its novel contribution to improving the automatic computation of the biasing parameters required to characterize a rare event probability. Numerical results from the two example systems in Sect. 3 support our claim that the polynomial leaping method employed by dwSSA\(^{++}\) can significantly shorten the simulation time needed to compute biasing parameters. We showed that the 12 simulations that employed polynomial leaping at least once performed better than their corresponding dwSSA simulations. Furthermore, dwSSA\(^{++}\) converged on all 17 sets, while dwSSA failed to compute biasing parameters on 6 of them. Thus, the benefit of using dwSSA\(^{++}\) is not limited to computational efficiency; it also lowers the failure rate in computing biasing parameters.

We note that the main contribution of dwSSA is the automatic computation of biasing parameters. Methods that utilized importance sampling to efficiently estimate a rare event probability existed prior to dwSSA (Kuwahara and Mura 2008; Gillespie et al. 2009; Roh et al. 2010), but they were all impractical for large systems because there was no principled way to compute biasing parameters that could yield a low-variance probability estimate. Without automatic computation of the biasing parameters, the dwSSA would be as impractical as its predecessors. Although the multilevel CE method used in dwSSA works well for many systems, it can fail to converge when a system is in a stochastic equilibrium or exhibits low stochasticity. The dwSSA\(^{++}\) attempts to resolve this problem by extrapolating biasing parameters from past simulation data when slow convergence is detected. The algorithm also offers tuning parameters that define the threshold for slow convergence and the amount of past data utilized in the polynomial leaping method, allowing flexible control of the algorithm.

## Notes

### Acknowledgements

The author thanks Bill and Melinda Gates for their active support of this work and their sponsorship through the Global Good Fund.

## References

- Arkin A, Ross J, McAdams HH (1998) Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected Escherichia coli cells. Genetics 149(4):1633–1648. http://www.genetics.org/content/149/4/1633
- Auger A, Chatelain P, Koumoutsakos P (2006) R-leaping: accelerating the stochastic simulation algorithm by reaction leaps. J Chem Phys 125(8):084103. https://doi.org/10.1063/1.2218339
- Ball K, Kurtz TG, Popovic L, Rempala G (2006) Asymptotic analysis of multiscale approximations to reaction networks. Ann Appl Probab 16(4):1925–1961. https://doi.org/10.1214/105051606000000420
- Bar EE, Ellicott AT, Stone DE (2003) G recruits Rho1 to the site of polarized growth during mating in budding yeast. J Biol Chem 278(24):21798–21804. https://doi.org/10.1074/jbc.M212636200
- Ben Hammouda C, Moraes A, Tempone R (2017) Multilevel hybrid split-step implicit tau-leap. Numer Algorithms 74(2):527–560. https://doi.org/10.1007/s11075-016-0158-z
- Cao Y, Gillespie D, Petzold L (2005) Multiscale stochastic simulation algorithm with stochastic partial equilibrium assumption for chemically reacting systems. J Comput Phys 206(2):395–411. https://doi.org/10.1016/j.jcp.2004.12.014
- Cao Y, Gillespie DT, Petzold LR (2004) The slow-scale stochastic simulation algorithm. J Chem Phys 122(1):014116. https://doi.org/10.1063/1.1824902
- Cao Y, Gillespie DT, Petzold LR (2007) Adaptive explicit–implicit tau-leaping method with automatic tau selection. J Chem Phys 126(22):224101. https://doi.org/10.1063/1.2745299
- Cao Y, Lu HM, Liang J (2010) Probability landscape of heritable and robust epigenetic state of lysogeny in phage lambda. Proc Natl Acad Sci 107(43):18445–18450. https://doi.org/10.1073/pnas.1001455107
- Chevalier MW, El-Samad H (2009) A rigorous framework for multiscale simulation of stochastic cellular networks. J Chem Phys 131(5):054102. https://doi.org/10.1063/1.3190327
- Daigle BJ, Roh MK, Gillespie DT, Petzold LR (2011) Automated estimation of rare event probabilities in biochemical systems. J Chem Phys 134(4):044110. https://doi.org/10.1063/1.3522769
- Donovan RM, Sedgewick AJ, Faeder JR, Zuckerman DM (2013) Efficient stochastic simulation of chemical kinetics networks using a weighted ensemble of trajectories. J Chem Phys 139(11):115105. https://doi.org/10.1063/1.4821167
- Drawert B, Lawson MJ, Petzold L, Khammash M (2010) The diffusive finite state projection algorithm for efficient simulation of the stochastic reaction–diffusion master equation. J Chem Phys 132(7):074101. https://doi.org/10.1063/1.3310809
- Gibson MA, Bruck J (2000) Efficient exact stochastic simulation of chemical systems with many species and many channels. J Phys Chem A 104(9):1876–1889. https://doi.org/10.1021/jp993732q
- Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22(4):403–434. https://doi.org/10.1016/0021-9991(76)90041-3
- Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361. https://doi.org/10.1021/j100540a008
- Gillespie DT (2001) Approximate accelerated stochastic simulation of chemically reacting systems. J Chem Phys 115(4):1716–1733
- Gillespie DT, Cao Y, Sanft KR, Petzold LR (2009) The subtle business of model reduction for stochastic chemical kinetics. J Chem Phys 130(6):064103. https://doi.org/10.1063/1.3072704
- Gillespie DT, Roh M, Petzold LR (2009) Refining the weighted stochastic simulation algorithm. J Chem Phys 130(17):174103. https://doi.org/10.1063/1.3116791
- Goutsias J (2005) Quasiequilibrium approximation of fast reaction kinetics in stochastic biochemical systems. J Chem Phys 122(18):184102. https://doi.org/10.1063/1.1889434
- Grima R, Schmidt DR, Newman TJ (2012) Steady-state fluctuations of a genetic feedback loop: an exact solution. J Chem Phys 137(3):035104. https://doi.org/10.1063/1.4736721
- Kang HW, Kurtz TG (2013) Separation of time-scales and model reduction for stochastic reaction networks. Ann Appl Probab 23(2):529–583. https://doi.org/10.1214/12-AAP841
- Kuwahara H, Mura I (2008) An efficient and exact stochastic simulation method to analyze rare events in biochemical systems. J Chem Phys 129(16):165101. https://doi.org/10.1063/1.2987701
- Luebeck EG, Moolgavkar SH (2002) Multistage carcinogenesis and the incidence of colorectal cancer. Proc Natl Acad Sci 99(23):15095–15100. https://doi.org/10.1073/pnas.222118199
- Maisonneuve E, Castro-Camargo M, Gerdes K (2013) (p)ppGpp controls bacterial persistence by stochastic induction of toxin–antitoxin activity. Cell 154(5):1140–1150. https://doi.org/10.1016/j.cell.2013.07.048
- Mauch S, Stalzer M (2010) An efficient method for computing steady state solutions with Gillespie’s direct method. J Chem Phys 133(14):144108. https://doi.org/10.1063/1.3489354
- McClure AW, Minakova M, Dyer JM, Zyla TR, Elston TC, Lew DJ (2015) Role of polarized G protein signaling in tracking pheromone gradients. Dev Cell 35(4):471–482. https://doi.org/10.1016/j.devcel.2015.10.024
- Moolgavkar SH, Knudson AGJ (1981) Mutation and cancer: a model for human carcinogenesis. J Natl Cancer Inst 66(6):1037–1052
- Munsky B, Khammash M (2006) The finite state projection algorithm for the solution of the chemical master equation. J Chem Phys 124(4):044104. https://doi.org/10.1063/1.2145882
- Nikaido H (2009) Multidrug resistance in bacteria. Annu Rev Biochem 78:119–146. https://doi.org/10.1146/annurev.biochem.78.082907.145923
- Ramaswamy R, Gonzalez-Segredo N, Sbalzarini IF (2009) A new class of highly efficient exact stochastic simulation algorithms for chemical reaction networks. J Chem Phys 130(24):244104–244113
- Roh MK, Daigle BJ, Gillespie DT, Petzold LR (2011) State-dependent doubly weighted stochastic simulation algorithm for automatic characterization of stochastic biochemical rare events. J Chem Phys 135(23):234108. https://doi.org/10.1063/1.3668100
- Roh MK, Gillespie DT, Petzold LR (2010) State-dependent biasing method for importance sampling in the weighted stochastic simulation algorithm. J Chem Phys 133(17):174106. https://doi.org/10.1063/1.3493460
- Rubinstein RY (1997) Optimization of computer simulation models with rare events. Eur J Oper Res 99(1):89–112. https://doi.org/10.1016/S0377-2217(96)00385-2
- Rubinstein RY, Kroese DP (2004) The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning. Springer, New York
- Slepoy A, Thompson AP, Plimpton SJ (2008) A constant-time kinetic Monte Carlo algorithm for simulation of large biochemical reaction networks. J Chem Phys 128(20):205101. https://doi.org/10.1063/1.2919546
- Tian T, Burrage K (2004) Binomial leap methods for simulating stochastic chemical kinetics. J Chem Phys 121(21):10356–10364. https://doi.org/10.1063/1.1810475
- Wang Y, Waters J, Leung ML, Unruh A, Roh W, Shi X, Chen K, Scheet P, Vattathil S, Liang H, Multani A, Zhang H, Zhao R, Michor F, Meric-Bernstam F, Navin NE (2014) Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512(7513):155–160. https://doi.org/10.1038/nature13600
- Watts DJ, Muhamad R, Medina DC, Dodds PS (2005) Multiscale, resurgent epidemics in a hierarchical metapopulation model. Proc Natl Acad Sci USA 102(32):11157–11162. https://doi.org/10.1073/pnas.0501226102
- Xu Z, Cai X (2011) Weighted next reaction method and parameter selection for efficient simulation of rare events in biochemical reaction systems. EURASIP J Bioinform Syst Biol 2011(1):4. https://doi.org/10.1186/1687-4153-2011-797251
- Zelnik YR, Solomon S, Yaari G (2015) Species survival emerge from rare events of individual migration. Sci Rep 5:7877. https://doi.org/10.1038/srep07877

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.