Data-Driven Method for Efficient Characterization of Rare Event Probabilities in Biochemical Systems

As mathematical models and computational tools become more sophisticated and powerful to accurately depict system dynamics, numerical methods that were previously considered computationally impractical started being utilized for large-scale simulations. Methods that characterize a rare event in biochemical systems are part of such phenomenon, as many of them are computationally expensive and require high-performance computing. In this paper, we introduce an enhanced version of the doubly weighted stochastic simulation algorithm (dwSSA) (Daigle et al. in J Chem Phys 134:044110, 2011), called dwSSA\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{++}$$\end{document}++, that significantly improves the speed of convergence to the rare event of interest when the conventional multilevel cross-entropy method in dwSSA is either unable to converge or converges very slowly. This achievement is enabled by a novel polynomial leaping method that uses past data to detect slow convergence and attempts to push the system toward the rare event. We demonstrate the performance of dwSSA\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{++}$$\end{document}++ on two systems—a susceptible–infectious–recovered–susceptible disease dynamics model and a yeast polarization model—and compare its computational efficiency to that of dwSSA.


Introduction
When Gillespie (1976Gillespie ( , 1977 introduced the stochastic simulation algorithm (SSA), its use was deemed purely academic as computers were not powerful enough to support SSA simulations except for toy models. SSA is an exact numerical method in that its trajectories can be used to construct the chemical master equation (CME) as the number of simulations reach infinity. Every reaction is B Min K. Roh mroh@idmod.org 1 Institute for Disease Modeling, 3150 139th Ave SE, Bellevue, WA 98005, USA simulated explicitly (reaction time and index) until the final simulation time is reached for each trajectory. This can be computationally infeasible for a large system or even for a small system with many reaction firings. However, as computer processors became more affordable and powerful, increasing number of researchers started using the SSA to model a biological system and gained useful insight from numerical simulations. The dramatic increase in the usage can be seen by the number of citations SSA received; Gillespie's paper (1977) was cited less than 100 times annually until 2003, and the number of annual citations spiked up to over 500 after 2007 (https: //scholar.google.com/citations?user=QwXwK6UAAAAJ#).
With the popularity of SSA came new algorithms derived from it. Some were developed to increase the computational efficiency of the exact method (Gibson and Bruck 2000;Ramaswamy et al. 2009;Slepoy et al. 2008), while others featured faster computation at the expense of accuracy (Cao et al. 2007;Ben Hammouda et al. 2017;Tian and Burrage 2004;Auger et al. 2006;Gillespie 2001;Munsky and Khammash 2006). Specialized methods stemmed from SSA as well when researchers realized various scientific communities shared an interest in specific system behavior or characteristics, such as multiple timescale simulation (Chevalier and El-Samad 2009;Ball et al. 2006;Goutsias 2005;Cao et al. 2004Cao et al. , 2005, model reduction (Kang and Kurtz 2013;), steady-state dynamics (Mauch and Stalzer 2010;Grima et al. 2012), and rare event characterization (Donovan et al. 2013;Zelnik et al. 2015;Xu and Cai 2011;Kuwahara and Mura 2008;Roh et al. 2010Roh et al. , 2011. The last area, field of rare event characterization, is relatively new because of the exceptionally high computational requirements associated with estimating a rare event probability. In order to obtain an accurate estimate, an exact method must be used. Accuracy lost from using an approximate method is likely to be much greater than the magnitude of the rare event probability. Moreover, variance of the estimate decreases slowly, proportional to the square root of the total number of simulations. Despite these hurdles, many important events in biology, chemistry, and epidemiology are rare and stochastic by nature. Examples of a significant rare event include mutation of a normal cell into a cancerous cell (Wang et al. 2014;Luebeck and Moolgavkar 2002;Moolgavkar and Knudson 1981), phage λ (Cao et al. 2010;Arkin et al. 1998), development of multidrug-resistant bacteria (Nikaido 2009;Maisonneuve et al. 2013), and resurgence of a disease Watts et al. (2005).
Development of the weighted stochastic simulation algorithm (wSSA) by Kuwahara and Mura (2008) alleviated some of the computational tolls by using importance sampling (IS) in the reaction selection process. In wSSA bias introduced by IS parameters is recorded at each reaction selection step and used at the end of the simulation to obtain an unbiased estimate of the rare event probability. Doing so does not affect the accuracy of wSSA, and with a good choice of IS parameters, a significant reduction in variance can be achieved. However, there are two major drawbacks with the wSSA. First is that the method does not provide any means to assess the accuracy of the resulting estimate. It is well known that a bad choice of IS parameters can yield an estimate whose variance is higher than that of an unbiased estimator. This problem was solved when  demonstrated that running sum of trajectory weights can be used to compute the uncertainty of the final estimate without affecting the time complexity of wSSA. Second drawback of wSSA is that it did not provide a principled way to choose a good set of IS parameters. Having to guess the value of each IS parameter, one for every reaction is unreasonable even for a modeler who has a considerable insight into the system, especially in the presence of nonlinear reactions. This predicament was addressed by Daigle et al. (2011) with doubly weighted SSA (dwSSA), where both the time to the next reaction and the reaction index are biased. Significance of double weighting (biasing) is that its mathematical form of a trajectory weight can be used to compute a closed-form solution for the optimal IS parameters that minimize cross-entropy, which is used as a proxy to minimum variance. Calculating variance involves, except for a few simple toy models, computation of higher moments, which in turn depend on higher moments. Being able to obtain a closed-form solution is critical for computational efficiency and accuracy, and dwSSA provides an automated and principled way to compute good IS parameters that yield a low-variance estimate. In order to achieve this, Daigle et al. modified and incorporated a multilevel version of the cross-entropy (CE) method by Rubinstein and his colleagues (Rubinstein and Kroese 2004;Rubinstein 1997) into the SSA.
While dwSSA offers automatic selection of good IS parameters, its performance highly depends on the convergence rate of the multilevel CE method that computes optimal IS parameters. If the system exhibits low stochasticity, it is likely that dwSSA converges very slowly to the rare event. The worst-case scenario is that the multilevel CE method does not converge and is unable to return IS parameters. Since having a good set of IS parameters is necessary for obtaining a low-variance estimate, failure in the multilevel CE method is detrimental to the performance of dwSSA. In this paper we introduce dwSSA++ that contains a novel and improved method for computing optimal importance sampling parameters when the system is unable to reach the rare event with sufficient speed. In Sect. 2, we review the doubly weighted stochastic simulation algorithm and present the polynomial leaping method that is used to improve the speed of convergence. Pseudo-algorithms are provided in addition to the MATLAB code (https://github.com/minroh/dwSSA_pp) for ease of understanding. We then apply the dwSSA and the dwSSA++ to a susceptible-infectious-recovered-susceptible (SIRS) model to compare their computational efficiency and accuracy in Sect. 3. Finally, we summarize our contributions in Sect. 4.

Stochastic Simulation Algorithm and Stochastic Chemical Kinetics
We focus on a well-stirred stochastic system with N species {S 1 , . . . , S N }, who interact through any of M reaction channels {R 1 , . . . , R M } to change its population in discrete values. The state of the system at time t is denoted by X(t) ≡ (X 1 (t), . . . , X N (t)), where X i (t) corresponds to the number of molecules of S i at time t. Probability that reaction R j fires in the interval [t, t + dt) is given by its propensity function a j (x) ≡ a j (X(t)), j ∈ 1, . . . , M, where dt is an infinitesimal time increment. The sum of all M propensity functions is denoted a 0 (x).
The SSA simulates time evolution of x by generating a sequence of samples on two random variables: τ , time elapsing between the current and the next reaction firings; and j , index of the reaction fired at time t +τ . First random variable τ is exponentially distributed with mean 1/a 0 (x), while j is a categorical random variable where the probability of R j being chosen as the next reaction is a j (x)/a 0 (x), j ∈ {1, . . . , M}. After τ and j are computed, we update the state of the system using a M × N stoichiometry matrix V, whose jth row ν j indicates the amount of change in x due to one R j reaction firing, i.e., X(t + τ ) = X(t) + V :, j .

Doubly Weighted Stochastic Simulation Algorithm
We give a brief description of dwSSA here. Further details can be found in Daigle et al. (2011). The goal of dwSSA is to generate trajectories to characterize the probability of reaching a rare event E by final time t f . Thus, a trajectory is simulated until either t f is reached or event E is observed at a stopping time T < t f , whichever occurs sooner. The form of rare event probability on which the dwSSA operates is p(x 0 , E ; t f ); it is defined as the probability that the system starting at time 0 in state x 0 will first reach rare event E by some time ≤ t f .
Unlike the wSSA (Kuwahara and Mura 2008) or the swSSA (Roh et al. 2010) that limit importance sampling to reaction selection, dwSSA biases both the time to the next reaction τ and the next reaction index j . There are two significant advantages of dwSSA over wSSA and swSSA. First, the dwSSA makes possible characterization of rare events in some systems that cannot be achieved with the wSSA or the swSSA; second, the dwSSA offers an automated method for choosing importance sampling parameters, γ = [γ 1 , . . . , γ M ], that yield a low-variance estimate. Under dwSSA, probability that the reaction R j fires in the interval [t, t +dt) is given by its predilection function b j (x) instead of the propensity function a j (x), where b j ≡ a j × γ j , b 0 = M j=1 b j (x), and γ j ∈ R + . Using the predilection function, τ now has a mean 1/b 0 (x), and j is categorically distributed with probability b j (x)/b 0 (x). Thus, denoting N T as the total number of reactions that fire in the interval [0, T ], the probability of a single dwSSA trajectory J ≡ (τ 1 , j 1 , . . . , τ N T , j N T ) takes the form where t i ≡ i j=1 τ j . This probability is clearly biased for γ = 1. Correction factor for the dwSSA trajectory J, whose product with the probability in (1) equals the probability of an unbiased SSA trajectory, is (2) It is trivial to check that the product of (1) and (2) is equal to the unbiased probability of a SSA trajectory J: The Monte Carlo estimate for p(x 0 , E ; t f ) using dwSSA is thuŝ where J k represents the kth dwSSA trajectory and I {S(J k )∩E } is 1 if any of the states visited by J k (denoted by S(J k )) includes E and 0 otherwise. Daigle et al. incorporated a modified version of Rubinstein's cross-entropy method (Rubinstein 1997;Rubinstein and Kroese 2004) in order to minimize the cross-entropy between the unknown optimal γ * and its numerical estimateγ * , which is used to compute W dwSS A (J k ) in (2). Significance of minimizing cross-entropy instead of variance is that the former allows for a closed-form solution forγ * while the latter does not. Minimizing cross-entropy is equivalent to maximizing the following formula: For many applications, argument inside (5) is convex function of γ (Rubinstein and Kroese 2004). Assuming convexity, we can obtain a closed-form solution by taking partial derivatives with respect to each γ j and setting the right-hand side to 0: In rare event simulation, (6) can be problematic as most trajectories will not reach E , i.e., I {S(J k )∩E } will be 0 for most k. Daigle et al. solved this problem by using a multilevel version of the cross-entropy method (Rubinstein and Kroese 2004), which takes the system closer to E in an iterative manner using favorable signals obtainable from the current state. Starting with s = 1 and γ (0) = 1, we define an intermediate rare event E (s) as the value closest to E that is reachable by top ρ fraction of all trajectories simulated with γ (s−1) . We note that no computation of γ is required in the beginning (s = 1) as the system starts unbiased, i.e., γ (s−1) = γ (0) = 1. After computing E (s) , we compute the following closed-form solution to obtainγ where n k j is the total number of times R j fires in the kth dwSSA trajectory. In the above expression, rare event indicator function I {S(J k )∩E } in (6) (s) . The final step is to obtain an estimate for p(x 0 , E ; t f ) usingγ * . We note that we cannot derive a closed-form solution forγ * using the probability expression for wSSA or swSSA. It is a unique feature of the dwSSA and the sdwSSA ), latter of which also employs double biasing but with state-dependent IS parameters. Both the closed-form solution and automatic determination of importance sampling parameters are needed for the algorithm to be of practical use, especially for systems that contain nonlinear reactions. The algorithm for estimating p(x 0 , E ; t f ) with uncertainty ) usingγ * is as follows: Algorithm 1 Estimation of the Rare Event Probability break out of the while loop 11: end if 12: generate two unit-interval uniform random numbers r 1 and r 2 13: update all a j (x) and b j (x); recalculate a 0 (x) and b 0 (x) 18: end while 19: end for 20: The uncertainty in step 21 can be used to assess quality of the estimatep(x 0 , E ; t f ).
Denoting the true probability as p( Doubling the interval (2σ/K ) raises the confidence level to 95% and tripling to 99.7%. Thus, the smaller the uncertainty is, the tighter the confidence interval will be. If the uncertainty has the same order magnitude as the rare event probability estimate, then there is little to no trust in the value ofp dwSS A (x 0 , E ; t f ), and the user is advised to increase K and rerun the algorithm.

Extrapolation of Biasing Parameters Using Past Simulation Data
The only difference between the dwSSA and dwSSA ++ lies on howγ * is computed; given a set of IS parameters, both algorithms computep(x 0 , E ; t f ) using Algorithm 1. However, automatic computation ofγ * is the most important component that makes the dwSSA efficient and practical compared to the earlier and related methods (Kuwahara and Mura 2008; Roh et al. 2010Roh et al. , 2011. Without automatic computation ofγ * , dwSSA becomes impractical as the user is expected to provide an importance sampling parameter for each reaction in the system. Although a user may be able to guess the general direction of biasing, i.e., encouraging (γ j > 1) or discouraging (γ j < 1), it is almost impossible for the user to guess values of all IS parameters in the system that can produce a low-variance estimate. In addition, manually tuning IS parameters (Roh et al. 2010 is not computationally feasible for any large systems. Therefore, except for very simple models, multilevel CE method is expected to run to obtain γ * prior to starting Algorithm 1. While the multilevel CE method allows for automatic computation ofγ * that minimizes cross-entropy, its performance largely depends on the speed of convergence to the rare event. For many applications, computational cost of multilevel CE method is negligible compared to the total cost of the simulation since the number of simulations used in multilevel CE method is often orders of magnitude less than that used in Algorithm 1 Roh et al. 2011). It is possible, however, for the computation time in multilevel CE method to dominate the total simulation time. This can happen when the system under study exhibits low stochasticity. If population count is high for all species, there will be little variability among trajectories. Even for a system with small population, IS parameters computed in a prior iteration can bring the system to a strongly stable stochastic equilibrium. In both cases, lack of variability in x among trajectories is likely to result in an intermediate event that is either close or equal to the system's average behavior. In fact, it is possible that E (s) is farther from E than E (s−1) . In the worst case, E (s) may never converge to E and noγ * is computed. For this reason, it is recommended that the user sets a limit of iterations on the multilevel CE method to avoid running ad infinitum. For simulations in this paper, we set this number to 20.
In an attempt to address such slow or no-convergence scenarios, we developed a method called polynomial leaping that tries to take the system out of the low-variance region and toward the rare event using past simulation data. When sufficient signal is present, a polynomial interpolant is constructed for each reaction, where the input values are past importance sampling parameter values. Depending on the system's behavior, polynomial leaping utilizes one of the following two data types as the output variable for the interpolant: number of trajectories that reached E and the value of intermediate rare events from past simulations. Once the output variable is chosen, a lowdegree (1 or 2) polynomial interpolant is computed for each reaction, which is then used to extrapolate the next set of IS parameters. The amount of extrapolation depends on the system's proximity to the rare event as well as on the current speed of convergence. Whenever polynomial leaping method is used to compute the next set of IS parameters, computation of (7) is omitted in the multilevel CE method. Thus, employing polynomial leaping method could not only increase the speed of convergence but also reduce the total number of simulations required to obtainp(x 0 , E ; t f ). The modified multilevel CE method for dwSSA ++ with polynomial leaping is described in Algorithm 2.

Algorithm 2 Optimal dwSSA++ Parameter Estimation
if binary decision tree in Figure 1 returns mCE then 8: go to step 15 9: else if binary decision tree in Figure 1  Here, s max denotes the maximum number of iterations allowed to computeγ * before declaring the algorithm failed to converge. For examples shown in Sect. 3, we set s max to 20. The number of past data used to assess convergence rate and form an interpolant is defined as l d , which is shown in step 6 of Algorithm 2. While l d can be any integer greater than 1, we recommend that it does not exceed 5. The reason is that increasing the number of data required for interpolation is not likely to increase the quality of the resulting interpolant. If good progress is made toward E with the conventional multilevel CE method, polynomial leaping method will not be called. On the other hand, if the system is converging slowly or not at all, having a large value of l d delays the initial calling of the polynomial leaping method until at least l d iterations of multilevel CE method are executed. Starting the polynomial leaping method also implies that past intermediate rare events (IREs) are similar in their values; the same must be true for the importance sampling parameters corresponding to these IREs. Thus, requiring a large number of past data is not expected to significantly increase the quality of resulting interpolant and will delay the system from leaping. For these reasons, we set the default value for l d to 3.
There are two conditions that can prompt leaping in Algorithm 2 (step 6). If any one of the two conditions evaluates to be true, then the polynomial leaping method (Algorithm 3) is used to compute γ (s) instead of step 15. First condition is true when l d past intermediate rare events form a non-strictly converging sequence to E . This means any stalling or regressing in E (s) values during l d stages of multilevel CE method will trigger polynomial leaping. Second condition is satisfied if the estimated number of iterations to reach the rare event exceeds a preset threshold σ , which is set to 5 by default. We obtain the estimated number of iterations, μ, by first computing the speed of progress based on the last two multilevel CE iterations: We note that the above method is based on the relative rate of convergence to E and does not depend on the absolute distance to the rare event, which depends on the randomly assigned initial reaction rate values. In order to determine the leaping eligibility, Algorithm 2 executes a series of diagnostic questions via binary decision tree shown in Fig. 1. The two conditions that trigger leaping correspond to the first node and its left child node, respectively. If neither condition is met, then the multilevel CE method is resumed to determine γ (s) as sufficient progress is being made toward E , i.e., μ <= σ . This case corresponds to the leaf node with value Run mCE in Fig. 1. On the other hand, if the underlying system is neither making a progress toward the rare event nor exhibiting any signal, Algorithm 2 is unable to determine the direction of bias required to reach E . While unlikely to occur for most systems, it is theoretically possible. For example, this may happen if the chosen initial state coincides with the system's strong equilibrium state with very low variance. This case is indicated by leaf node with value No signal in Fig. 1. In all other cases, the binary decision tree returns two pieces of information required to initiate leaping (Algorithm 3): method of extrapolation and the type of input data. The method can be either polynomial interpolation or bisection and is decided based on the number of input data available. Bisection is employed only when there is a single eligible data for extrapolation. Polynomial interpolation is used otherwise. Here, we fit a low-degree polynomial according to specifications returned by the binary decision tree. Interpolants constructed by the polynomial leaping method are kept at low degree (1 or 2) since (7) was derived assuming convexity Daigle et al. (2011) and a small number of data, l d , is used to compute the interpolants. We note that the default value for l d (=3) is set such that it is the minimum number of data required to construct a polynomial interpolant of degree 2. Leaves of the binary decision tree that correspond Fig. 1 Binary decision tree used in polynomial leaping method. Larger boxes in the figure contain questions used in the decision making process, and its outline color indicates the type of response from its parent node. Box outlined in green is reached if the response to its parent node is positive. Similarly, red box is reached if the response is negative. Leaves of the decision tree represent the type of acceleration method and input data type to polynomial interpolation contain value Poly. with its degree (Deg. 1 or Deg. 2). Bisection is indicated by the keyword Bisection.
Second piece of information returned by the binary decision tree, type of input data, can be either past counter values or past IRE values. Between these two types, the former is preferred to the latter. Counter data represent the number of trajectories that reached E from past l d iterations of multilevel CE method. Cardinality of the set of possible values for counters is card({0, 1, . . . , K × ρ }), which is large for commonly chosen values of K (10 5 to 10 8 ) and ρ(10 −4 to 10 −2 ), where smaller value of ρ is associated with larger K . Upper limit of this set is K × ρ , as the multilevel CE method is able to computeγ * once (K × ρ) or more number of trajectories reach E . The large range allows the algorithm to easily assess the effect of change in biasing parameter values and compute reliable interpolants. On the other hand, the range of intermediate rare events varies greatly depending on the definition of a rare event for a given system; a wide range of biasing parameters may correspond to the same intermediate rare event. There is one notable advantage in using past IREs, however. We do not need to worry about its existence; unlike counters data, past IRE data is always available regardless of the system's proximity to E . Unless the system starts in a strong stochastic equilibrium, which is very unlikely given myriad possible combinations of random initial reaction rate values, multilevel CE method will make a progress toward the rare event. The progress does not guarantee any trajectories to reach the rare event, and thus counter data may be 0. Nevertheless, its IRE value will be closer to E due to the progress. And if the system reaches a strong equilibrium during the simulation and produce l d IREs with the same value, we can extract more signal by accessing IRE values beyond past l d iterations. This is the reason queries in the binary decision tree contain checks for all past IREs if the last l d IRE values are identical. Thus, the order of preferred data type in the algorithm is counters, past l d IRE values, and all past IRE values.
Once interpolants are constructed, we decide on the value of the output variable ξ that we want the system to produce on the next iteration of Algorithm 2. This value is assigned as the RHS of each M interpolant to compute γ (s) j . Since ξ is an unobserved value outside the range of past behavior, obtaining γ (s) j is considered extrapolation. We note that computing γ (s) via extrapolation replaces the traditional multilevel CE routine (Algorithm 2, step 15), saving K trajectory simulations per each leaping.
Computing a robust target ξ using past counters is straight forward: where y past represents past counters data. Ideally, we would like to compute IS parameters for E , but doing so can result in a large extrapolation error if the current system is far from the rare event. When the maximum of the past counters is greater than half the minimum number of data required to computeγ * , we set the target counter ξ to twice this minimum value ( 2ρ K ) to ensure enough trajectories reach E without over-perturbing the system. When this condition is not met, system is considered far from observing the rare event, and we set the target to a more conservative value of ξ = ρ K . Using the past IREs for extrapolation is not as straightforward. Speed of convergence can vary greatly depending on the function that defines a rare event. In order to assess the convergence speed, we compute the first order approximation using the amount of progress made by two most recent intermediate rare events. Denoting these two values as y 1 and y 2 , where y 1 is the last computed IRE, the target output is computed as following: In the above equation, the quantity h reflects the absolute amount of progress made in IRE from the most recent simulation, and μ denotes the number of iterations required to reach the rare event assuming the amount convergence per iteration stays at h. We then compute the desired amount of progress for the next iteration, δ, which is the lesser of μ/2 and 3h. The first quantity, μ/2, indicates that we aim to halve the distance to E in the next simulation by utilizing leaping. The fact that past IRE values are used to construct interpolants instead of past counter data indicates that the system is not producing trajectories that observe E under the current parametrization. Therefore, setting the next target to E would be too aggressive and likely result in extrapolation beyond what the data can reliably predict. The second quantity 3h sets a maximum limit on the target progress to three times the size of current progress. This limit also ensures extrapolation is not too extreme using the absolute distance to the rare event. If the current state is far from E , halfway point between the latest IRE and E still may be too far for an accurate extrapolation. By imposing these two limits, we compute ξ more conservatively with IREs than with counters data to account for lack of trajectories reaching the rare event. Pseudo-algorithm for polynomial leaping method is shown in Algorithm 3.

Algorithm 3 Importance Sampling Parameter Computation with Polynomial Leaping
1: y past ← relevant past data returned by the binary decision tree in Figure 1 2: γ past ← importance sampling parameters corresponding to y past 3: if binary decision tree in Figure 1  We note that extrapolation of biasing parameters with leaping method can be selectively applied for large systems, where only few reactions may play an important role in observing the rare event. Second example in Sect. 3 illustrates this point.

Results
In this section, we illustrate the performance of dwSSA ++ by comparing it to that of dwSSA on two example systems-a susceptible-infectious-recovered-susceptible (SIRS) disease dynamics model and a yeast polarization model. In order to minimize the difference in results due to stochasticity, same random number seeds were used for the corresponding dwSSA and dwSSA ++ simulations. Default parameterizations are used for dwSSA ++ -specific parameters, i.e., l d = 3 and σ = 5. We emphasize again that the two algorithms differ only in the method for computing optimal biasing parameters, i.e., conventional multilevel CE method vs modified multilevel CE method with polynomial leaping. Onceγ * is computed, both dwSSA and dwSSA ++ run Algorithm 1 to estimate the rare event probability. All simulations were run using MATLAB ® 2017a and Parallel Computing Toolbox™ on Intel ® Core™ i7-6400U CPU. All codes used in simulations are available at https://github.com/minroh/dwSSA_pp.

SIRS
Our first example is a susceptible-infectious-recovered-susceptible (SIRS) disease transmission model, which consists of the following three reactions: There are two algorithmic parameters-ρ and K -that can be tuned to improve speed of convergence albeit each having an associated drawback. The first parameter ρ indicates the fraction of trajectories used to determine an intermediate rare event. Lowering the value of ρ will likely result in an IRE closer to the rare event. However, this also lowers the number of data used to compute the corresponding biasing parameters. Biasing parameters computed with only few data may not be reliable and yield an estimate with high variance. This drawback can be mitigated by increasing the total simulation size K . A big disadvantage of increasing the value of K for most systems is longer simulation time. However, doing so could lead to convergence for some systems that do not converge with a smaller K . Precise relationship between convergence rate and the two parameters is system dependent and often difficult to gauge when nonlinear reactions, such as R 2 in the SIRS model, are present.
In order to study the sensitivity of these two parameters in the SIRS model, we computedγ * using both the dwSSA and dwSSA ++ with ρ ∈ {10 −4 , 5 × 10 −4 , 0.001, 0.005, 0.01} for K = 10 5 and ρ ∈ {10 −5 , 5 × 10 −5 , 10 −4 , 5 × 10 −4 , 0.001, 0.005, 0.01} for K = 10 6 . We set the maximum number of iterations (s max in Algorithm 2) to 20 in order to avoid running simulations ad infinitum. If either method did not compute the final biasing parameters by iteration 20, we declared the simulation to be inconvergent. For each run we measured the total simulation time and used it to calculate computational gain of using Algorithm 2 over the conventional multilevel CE method, where Gain := t(dwSSA)/t(dwSSA ++ ). Results from this parameter sweep is summarized in Table 1.
Several interesting observations can be made from Table 1. First, it is clear that lowering ρ for a given K increases the rate of convergence to the rare event, especially for the conventional multilevel CE method simulations. However, the number of data used to computeγ * decreases too, and this results in high variability inγ * . For example, dwSSA ++ does not employ polynomial leaping when using ρ = 10 −4 and K = 10 5 (6th row in Table 1), making the algorithm equivalent to the conventional multilevel CE method. However, the two simulations yieldedγ * values that are noticeably different, e.g., 26% difference inγ * 2 . The difference is not due to R 2 being insignificant in producing θ I since γ 2 stays consistently below 1 throughout the simulation. This is because each iteration of multilevel CE method relied on only the top 10 data (10 5 × 10 −4 = 10) to compute the next IRE and its corresponding biasing parameters. When either ρ or K increases, we see that this variability disappears. For dwSSA ++ runs that employ polynomial leaping, extrapolation leads to deviation from minimizing cross-entropy, and resultingγ * is expected to differ from the one obtained by using the conventional multilevel CE method. And the difference does not imply better or worse performance. However,γ * j values obtained from multiple simulations using the same algorithm and parameterization should be consistent given R j is involved in rare event production.
In order to demonstrate how high variance inγ * could negatively affect a rare event probability estimate, we computep([100 1 0], 60; 30) usingγ * obtained by both algorithms with ρ = 10 −4 and K = 10 5 . Running Algorithm 1 with K = 10 7 and γ * dwSS A = [1.356 0.588 1.318] yields the following estimate with a 95% confidence interval ):  The first column denotes the ensemble size K in multilevel CE method and Algorithm 2, the second column specifies ρ, the third column lists the total number of iterations required to computeγ * if converged (maximum iteration of 20 if not converged), the forth column denotes the observation of rare event where a value in parenthesis indicates the IRE closest to the rare event when the simulation did not converge, the fifth column presents the optimal biasing parameters if computed, the sixth column records the total simulation time, and the seventh column displays computational gain by using the dwSSA ++ over the dwSSA to computeγ * . Results from using K and ρ values from the first two columns are split into two rows-dwSSA(top) and dwSSA ++ (bottom). Number inside the parenthesis in the third column for the dwSSA ++ simulations indicates the number of iterations that utilized polynomial leaping. Computational gain with * denotes simulations where the difference in results is purely due to stochasticity and not algorithmic difference We can see that the uncertainty from latter simulation is almost three times as large as the one from former even though both were obtained using the same method and same parameter values. Poor sampling and insufficient number of data resulted in this discrepancy. If we increase the total sampling rate by using K = 10 6 but keep the sample size at 10 by decreasing ρ = 10 −5 , the dwSSA ++ simulation still remains equivalent to the dwSSA simulation as no leaping is triggered. However, since the total sample size is ten times larger than the former simulation, we see more consistency in the rare event estimate: The only way to assess the performance ofγ * is by computing a rare event probability estimate with a large number of simulations, which is often orders of magnitude greater than the number of simulations used to computeγ * . The goal of running either the dwSSA or the dwSSA ++ is to produce a low-variance estimate, and thus it is important to obtain reliable biasing parameters using sufficient number of data and sampling size. Although lowering ρ and K results in faster convergence, it is not worth the computational gain if the resulting biasing parameters yield a high variance estimate.
It is worth noting that the conventional multilevel CE method was not able to computeγ * for five of the twelve runs in this parameter sweep. We see from Table 1 that dwSSA simulations using ρ ∈ {0.005, 0.01} for K = 10 5 and ρ ∈ {0.001 0.005, 0.01} for K = 10 6 did not converge within 20 iterations, while all dwSSA ++ simulations converged by iteration 12. It is also clear from Table 1 that performance of conventional multilevel CE method is sensitive to changes in both ρ and K . On the other hand, performance of Algorithm 2 is robust with respect to both parameters and exhibits superior convergence. Furthermore, because Algorithm 2 utilizes leaping only when it detects slow convergence, it reduces to the conventional CE method when enough progress is being made toward the rare event. This is illustrated by a gradual decline in the number of times polynomial leaping is employed with decreasing ρ (Row 3 in Table 1). Figure 2 displays the spread of biasing parameter values listed in Table 1. We see thatγ * computed by dwSSA ++ are more consistent in their values and slightly more perturbing thanγ * computed by the dwSSA. We hypothesize that the former phenomenon may be due to dwSSA data having a smaller sample size (7 vs 12 from dwSSA ++ ) and the latter due to polynomial leaping, as it pushes the system further than what the multilevel CE method observes.

Yeast Polarization
For our second example, we use a modified version of the pheromone-induced G-protein cycle in Saccharomyces cerevisiae Drawert et al. (2010) in a similar fashion as Daigle et al. (2011). Our modified system consists of seven species x = [R L RL G G a G bg G d ] and is characterized by the following eight reac-   Bar et al. 2003). Here we aim to characterize the probability p(x 0 , θ G bg ; t f ), where θ G bg is defined as the population of G bg reaching 300 and t f = 5. Daigle et al. (2011) studied this system under a different parameterization. While the reaction rates and the rare event definition differ, their rare event also examines the population of G βγ (θ G bg = 50). The authors note that the two reaction rates that are most consistently differed from 1 are γ 6 and γ 8 , both of which do not directly affect the population of G βγ . This is illustrated in their paper by running 100 independent runs of multilevel cross-entropy method and computing variability among final biasing parameters. They also show that most simulations converge in three iterations. In our example, multilevel CE method with the same simulation parameters (ρ = 0.01 and K = 10 5 ) does not converge by iteration 20 and no biasing parameters are computed to estimate the rare event probability. However, when analyzing the resulting biasing parameters in each iteration of the multilevel CE method, same trend emerges; only biasing parameters γ 6 and γ 8 are consistently and significantly different from 1. Rest of the biasing parameters are either not consistent in the direction of basing or remain close to 1. We show the two different trends in Fig. 3. In Fig. 3a we see that all six parameters either fluctuate above and below 1 (γ 1 , γ 2 , γ 4 , and γ 7 ) or stays very close to it (γ 3 and γ 5 ). On the other hand, γ 6 and γ 8 values in Fig. 3b are consistently and significantly below and above 1, respectively. It is worth pointing out that γ 1 , γ 2 , and γ 4 all degenerate to 0 as intermediate rare event gets closer to the rare event. The first two parameters quickly approach 0 by iteration 5, and γ 4 is "turned off" by iteration 8. We can interpret this as multilevel CE method reducing the system to a lower dimensional model as some reactions become unnecessary in observing the rare event. When there is no observation of a reaction, we set its biasing parameter to the minimum positive number allowed in the computing hardware. This way the reaction is always possible to fire in the next iteration, however unlikely it may be.
As system size grows, it is likely that some, if not most, of the reactions do not affect observation of the rare event. Biasing parameters for reactions that do not matter in rare event production will show high variability in their values and are easy to detect Daigle et al. (2011). When leaping method is utilized, these biasing parameters should be excluded from being extrapolated for computational efficiency. Incorporating these spurious parameters in polynomial leaping will not only increase the total computation time but also produce mathematically meaningless interpolant. Therefore, we apply polynomial leaping on only the two parameters, γ 6 and γ 8 , that affect observation of the rare event. The rest of the biasing parameters retain their value from the previous iteration as they are deemed unimportant in rare event observation. However, their values are updated when the multilevel CE method (Algorithm 2 step 15) is run instead of the polynomial leaping method. This enables the algorithm to collect data and use them to decide which reactions need to be extrapolated when there is slow or no convergence. Table 2 summarizes simulation results from running both dwSSA and dwSSA ++ to computeγ * for p(x 0 , θ G bg ; t f ) using K = 10 5 and ρ ∈ {10 −4 5 × 10 −4 0.001 0.005 0.01}.
Similar to the SIR model, we see a gradual increase in performance with decreasing value of ρ when using the conventional multilevel CE method, while the performance of Algorithm 2 is relatively robust with respect to ρ. Algorithm 2 converges in all five sets of simulations while the multilevel CE method does in only three. As the convergence rate increases with decreasing ρ, leaping method is triggered less frequently, and the two methods eventually become equivalent when no leaping is employed. When there is no leaping, any difference in the performance is purely due to stochasticity. We note that it is possible to modify Algorithm 2 to dynamically choose biasing parameters that can be used for extrapolation when leaping method is triggered. When slow convergence is detected, past biasing parameter values can be scanned to select leaping indices prior to entering Algorithm 3. Therefore, it is not necessary to run simulations prior to decide which reactions are to be extrapolated.
We illustrate effectiveness of leaping with ρ = 0.01 and K = 10 5 on Fig. 4. Only the dwSSA ++ converges using this parameter combination after utilizing leaping twice in iterations 7 and 9. The conventional multilevel CE method gets close to producing the rare event but never reaches it by iteration 20. We see from Fig. 4a that lack of time is not the main cause, as the dwSSA observes G βγ > 290 after iteration 6. The maximum G βγ population in this simulation is 298, and it is first observed during iteration 13. We hypothesize that the system entered a stochastic equilibrium around this time, and that prevented the algorithm from converging. On the other hand, dwSSA ++ recognizes slow convergence first at iteration 7 and then again at iteration 9. By extrapolating γ 6 and γ 8 values using past IRE data, the algorithm reaches the rare event by iteration 10 and successfully computesγ * . Figure 4b shows γ 6 and γ 8 values computed by both algorithms. We see that the most significant change in γ 6 from dwSSA ++ occurred during the first leaping and γ 8 during the second leaping.

Conclusion and Discussion
This paper describes dwSSA ++ and its novel contribution in improving automatic computation of biasing parameters required to characterize a rare event probability. Numerical results from two example systems in Sect. 3 support our claim that the polynomial leaping method employed by dwSSA ++ can significantly shorten simulation time in computing biasing parameters. We showed that the 12 simulations that employed polynomial leaping at least once performed better than its corresponding dwSSA simulations. Furthermore, the dwSSA ++ converged on all 17 sets while the dwSSA failed to compute biasing parameters on 6 of them. Thus, the benefit of using dwSSA ++ is not limited to computational efficiency but also lowering the failure rate in computing biasing parameters.  We note that the main contribution of dwSSA is in automatic computation of biasing parameters. Similar methods existed prior to dwSSA that utilized importance sampling to efficiently estimate a rare event probability (Kuwahara and Mura 2008;Roh et al. 2010), but they were all impractical for large systems because there was no principled method to compute biasing parameters that could yield a low-variance probability estimate. Therefore, the dwSSA is as impractical as its predecessors without automatic computation of the biasing parameters. Although the multilevel CE method used in dwSSA works well for many systems, it can fail to converge when a system is in a stochastic equilibrium or exhibiting low stochasticity. The dwSSA ++ attempts to resolve this problem by extrapolating biasing parameters using past simulation data when slow convergence is detected. The algorithm also offers tuning parameters that define the threshold for slow convergence and the amount of past data utilized in polynomial leaping method. This allows for flexible controlling of the algorithm.
We point out that simulations run with the default parameter values (ρ = 0.01 and K = 10 5 ) described in Daigle et al. (2011) are inconvergent for both examples shown in this paper. However, we note that these values work well for many systems Roh et al. 2011) that do not suffer from low stochasticity. Default parameter values ρ = 0.01 and K = 10 5 ensure both sampling frequency (10 5 ) and the number of data used to compute IREs (10 5 × 0.01 = 10 3 ) are adequate. Thus, unless a user is aware that the system under study exhibits low stochasticity prior to simulation, we suggest the user to run the dwSSA ++ with default parameter values and allow the algorithm to extrapolate the biasing parameters when necessary. As shown with examples in Sect. 3, dwSSA ++ adaptively switches between the polynomial leaping method and the multilevel CE method depending on the system behavior.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.