Learning-Based Importance Sampling via Stochastic Optimal Control for Stochastic Reaction Networks

We explore efficient estimation of statistical quantities, particularly rare event probabilities, for stochastic reaction networks. Consequently, we propose an importance sampling (IS) approach to improve the Monte Carlo (MC) estimator efficiency based on an approximate tau-leap scheme. The crucial step in the IS framework is choosing an appropriate change of probability measure to achieve substantial variance reduction. This task is typically challenging and often requires insights into the underlying problem. Therefore, we propose an automated approach to obtain a highly efficient path-dependent measure change based on an original connection in the stochastic reaction network context between finding optimal IS parameters within a class of probability measures and a stochastic optimal control formulation. Optimal IS parameters are obtained by solving a variance minimization problem. First, we derive an associated dynamic programming equation. Analytically solving this backward equation is challenging, hence we propose an approximate dynamic programming formulation to find near-optimal control parameters. To mitigate the curse of dimensionality, we propose a learning-based method to approximate the value function using a neural network, where the parameters are determined via a stochastic optimization algorithm. Our analysis and numerical experiments verify that the proposed learning-based IS approach substantially reduces MC estimator variance, resulting in a lower computational complexity in the rare event regime, compared with standard tau-leap MC estimators.


Introduction
We propose an approach to efficiently estimate statistical quantities, particularly rare event probabilities, for a particular class of continuous-time Markov chains known as stochastic reaction networks (SRNs). Consequently, we develop a learning-based importance sampling (IS) algorithm to improve the Monte Carlo (MC) estimator efficiency based on an approximate tau-leap (TL) scheme. The automated approach is based on an original connection between optimal IS parameter determination within a class of probability measures and a stochastic optimal control (SOC) formulation.
SRNs (see Section 1.1 for a short introduction and [9] for more details) describe the time evolution of biochemical reactions, epidemic processes [13,5], and transcription and translation in genomics and virus kinetics [49,32], among other important applications. For the current study, let X be an SRN that takes values in N^d and is defined on the time interval [0, T], where T > 0 is a user-selected final time. We aim to provide accurate and computationally efficient MC estimations of the expected value E[g(X(T))], where g : N^d → R is a scalar observable for X. In particular, we study estimating rare event probabilities with g(x) = 1_{x∈B} (i.e., the indicator function for a set B ⊂ R^d).
The quantity of interest, E[g(X(T))], can be computed by solving the corresponding Kolmogorov backward equations [8]. For most SRNs, deriving a closed-form solution for these ordinary differential equations is infeasible, and numerical approximations based on discretized schemes are commonly used. However, the computational cost scales exponentially with the number of species d. Therefore, we are particularly interested in estimating E[g(X(T))] using MC methods, an attractive alternative to avoid the curse of dimensionality.
Many schemes have been developed to simulate exact sample paths for SRNs, such as the stochastic simulation algorithm [25] and the modified next reaction method [4]. Pathwise exact SRN realizations can incur high computational costs if any reaction channels have high reaction rates. Gillespie [26] and Aparicio and Solari [6] independently proposed the explicit TL method (see Section 1.2) to overcome this issue by simulating approximate paths of X, evolving the process with fixed time steps and keeping reaction rates fixed within each time step. Various simulation schemes have been subsequently proposed to deal with situations incorporating well-separated fast and slow time scales [14,46,1,2,40,11].
The current paper proposes a path-dependent IS approach based on an approximate TL scheme to improve the MC estimator efficiency, and hence efficiently estimate various statistical quantities for SRNs (particularly rare event probabilities). Our class of probability measure change is based on modifying the Poisson random variable rates used to construct the TL paths. In particular, optimal IS parameters are obtained by minimizing the second moment of the IS estimator (equivalently the variance), which represents the cost function for the associated SOC problem. We show that the corresponding value function solves a dynamic programming relation that is challenging to solve analytically (see Section 2.1). We approximate the dynamic programming equation to derive a closed-form solution and near-optimal control parameters. The cost to solve the associated backward equation numerically in multi-dimensional settings increases exponentially with respect to the dimension (i.e., the curse of dimensionality). Thus, we propose approximating the resulting value function using a neural network to overcome this issue. Utilizing the optimality criterion for the SOC problem, we obtain a relationship between optimal IS parameters and the value function.
Finally, we employ a stochastic optimization algorithm to learn the corresponding neural network parameters. Our analysis and numerical results for different dimensions confirm that the proposed estimator considerably reduces the variance compared with the standard TL-MC method with a negligible additional cost. This allows rare event probabilities to be efficiently computed in a regime where standard TL-MC estimators commonly fail.
The proposed approach is more computationally efficient than previously proposed IS schemes in this context ([36,24,48,16,15,23,47]) because it is based on an approximate TL scheme rather than the exact scheme. In contrast to previous approaches, the change of measure is systematically derived to ensure convergence to the optimal measure within the chosen class of probability measures, minimizing the MC estimator variance. The novelty of this work is establishing a connection between IS and SOC in the context of pure jump processes, particularly for SRNs, with an emphasis on related practical and numerical aspects. Note that some previous studies [20,17,7,28,50,30,33,31,29,42] have established a similar connection, mainly in the diffusion dynamics context, with less focus on pure jump dynamics. The proposed methodology is based on an approximate explicit TL scheme, which could be subsequently extended in future work to continuous-time formulations (exact schemes) and to implicit TL schemes, which are relevant for systems with fast and slow time scales.
The remainder of this paper is organized as follows. Sections 1.1, 1.2, 1.3, and 1.4 define relevant SRN, TL, MC, and IS concepts, respectively. Section 2 establishes the connection between IS and SOC, formulating the SOC problem and defining its main ingredients (controls, cost function, and value function); it then presents the dynamic programming relation solved by the optimal controls. Section 2.3 develops the proposed learning-based IS approach appropriate for multi-dimensional SRNs. Section 3 provides selected numerical experiments for different dimensions to illustrate the proposed approach's efficiency compared with standard MC approaches. Finally, Section 4 summarizes and concludes the work, and discusses possible future research directions.

Stochastic reaction networks (SRNs)
We are interested in the time evolution of a homogeneously mixed chemically reacting system described by the Markovian pure jump process X : [0, T] × Ω → N^d, where (Ω, F, P) is a probability space. In this framework, we assume that d different species interact through J reaction channels. The i-th component, X_i(t), describes the abundance of the i-th species present in the chemical system at time t. This work studies the time evolution of the state vector (1.1). Each reaction channel R_j is a pair (a_j, ν_j) defined by its propensity function a_j : R^d → R_+ and its stoichiometric vector ν_j = (ν_{j,1}, ν_{j,2}, ..., ν_{j,d})^⊤. Thus, the probability of observing a jump in the process X from state x to state x + ν_j, a consequence of reaction R_j firing during the small time interval (t, t + ∆t], is proportional to the time interval length ∆t, with a_j(x) as the proportionality constant. We set a_j(x) = 0 for x such that x + ν_j ∉ N^d (i.e., the non-negativity assumption: the system can never produce negative population values).
Hence, from (1.2), the process X is a continuous-time, discrete-space Markov chain that can be characterized by Kurtz's random time change representation (1.3) [19], where Y_j : R_+ × Ω → N are independent unit-rate Poisson processes. Conditions on the reaction channels can be imposed to ensure uniqueness [5] and to avoid explosions in finite time [18,45,27].
Applying the stochastic mass-action kinetics principle, we assume that the propensity function a_j(·) for reaction channel R_j takes the form (1.5), where {θ_j}_{j=1}^J are positive constant reaction rates and x_i is the counting number for species S_i.

Explicit tau-leap approximation
The explicit-TL scheme is a pathwise approximate method [26,6] to overcome computational drawbacks of exact methods (i.e., when many reactions fire during a short time interval). This scheme can be derived from the random time change representation (1.3) by approximating the integral ∫_{t_i}^{t_{i+1}} a_j(X(s)) ds by a_j(X(t_i)) (t_{i+1} − t_i), i.e., using the forward-Euler method with time mesh {t_0 = 0, t_1, ..., t_N = T} of size ∆t = T/N. Thus, the explicit-TL approximation for X should satisfy (1.6) for k ∈ {1, 2, ..., N}. Given X_0 := x_0, we iteratively simulate a path of X^∆t via (1.7), where, conditioned on the current state X^∆t_k, {P_{k,j}(r_{k,j})}_{1≤j≤J} are independent Poisson random variables with respective rates r_{k,j} := a_j(X^∆t_k)∆t.
The explicit-TL path X^∆t is defined only at the time mesh points but can be naturally extended to [0, T] as a piecewise constant path. We apply a projection to zero to prevent the process from exiting the lattice (i.e., producing negative values); hence (1.7) becomes (1.8), where the maximum is applied entry-wise. In this work, we use uniform time steps of length ∆t, but the explicit-TL scheme and the proposed IS scheme (see Section 2) can also be applied to non-uniform time meshes.
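To make the scheme concrete, here is a minimal Python sketch of (1.7)-(1.8): one explicit-TL path with uniform steps and the entry-wise projection to zero. The pure-decay network and its rate constant are illustrative choices, not taken from the examples in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def explicit_tau_leap(x0, propensities, nu, T, N):
    """Simulate one explicit tau-leap path on the uniform mesh t_k = k*T/N.

    x0           : initial state, shape (d,)
    propensities : callable x -> rates a_j(x), shape (J,)
    nu           : stoichiometric matrix, shape (J, d)
    Returns the path at the mesh points, shape (N+1, d).
    """
    dt = T / N
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(N):
        rates = propensities(x) * dt          # r_{k,j} = a_j(x) * dt
        fires = rng.poisson(rates)            # independent Poisson counts P_{k,j}
        x = np.maximum(x + fires @ nu, 0.0)   # entry-wise projection to zero, cf. (1.8)
        path.append(x.copy())
    return np.array(path)

# Illustrative pure-decay network X -> 0 with propensity a(x) = theta * x
theta = 1.0
path = explicit_tau_leap([100], lambda x: np.array([theta * x[0]]),
                         np.array([[-1]]), T=1.0, N=16)
```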

Biased Monte Carlo estimator
Let X be a stochastic process and g : R^d → R a scalar observable. We want to approximate E[g(X(T))], but rather than sampling directly from X(T), we sample from X^∆t(T), a random variable generated by a numerical scheme with step size ∆t. We assume that the variates X^∆t(T) are generated by an algorithm of weak order O(∆t), i.e., (1.9) holds for sufficiently small ∆t, where C > 0. Let μ_M be the standard MC estimator (1.10) for E[g(X^∆t(T))], where {X^∆t_[m](T)}_{m=1}^M are independent and distributed as X^∆t(T).
The global error for the proposed MC estimator admits a bias-statistical error decomposition. To achieve the desired accuracy TOL, it is sufficient to bound the bias and the statistical error by TOL/2 each. From (1.9), choosing the step size accordingly ensures a bias of TOL/2. Thus, by the central limit theorem, the statistical error can be approximated as in (1.13), where the constant C_α is the (1 − α/2)-quantile of the standard normal distribution. We choose C_α = 1.96 for a 95% confidence level, corresponding to α = 0.05. Choosing the number of sample paths as in (1.14) ensures the statistical error to be approximately bounded by TOL/2. Given that the computational cost to simulate a single path is O(∆t^{-1}), the expected total computational complexity is O(TOL^{-3}); and the complexity scales with Var[g(X^∆t(T))] (see (1.14)).
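The error-splitting recipe above can be sketched in a few lines; the weak-error constant C and the variance below are placeholder values (both are problem dependent), and C_α = 1.96 as in the text.

```python
import math

def mc_work_parameters(tol, C=1.0, var=1.0, C_alpha=1.96):
    """Split the accuracy budget TOL equally between bias and statistical error.

    Bias <= TOL/2 via the weak-order-one bound (1.9):  dt = TOL / (2 C).
    Statistical error <= TOL/2 via the CLT:
        C_alpha * sqrt(var / M) <= TOL/2  =>  M = ceil((2 C_alpha / TOL)^2 * var).
    With cost O(1/dt) per path, the total work M / dt is O(TOL^{-3}).
    """
    dt = tol / (2.0 * C)
    M = math.ceil((2.0 * C_alpha / tol) ** 2 * var)
    return dt, M, M / dt

dt, M, work = mc_work_parameters(tol=1e-2)
```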

Importance sampling
Importance sampling (IS) techniques improve the computational cost of the crude MC estimator through variance reduction when used appropriately. To motivate these techniques, consider estimating rare event probabilities, for which the crude MC method is substantially expensive. In particular, consider estimating the probability q := P(Y > γ), where Y is a random variable taking values in R with probability density function ρ_Y. Let γ be sufficiently large that q becomes sufficiently small. We can approximate q using the MC estimator (1/M) Σ_{i=1}^M 1_{Y^{(i)} > γ}, where {Y^{(i)}}_{i=1}^M are independent and identically distributed (i.i.d.) realizations sampled according to ρ_Y. The MC estimator variance is given by (1.16). For a sufficiently small q, we can use (1.16) and the central limit theorem to approximate the relative error as C_α · sqrt((1 − q)/(q M)), where C_α is chosen as in (1.13).
Setting this relative error equal to a tolerance TOL_rel yields the number of required samples, M ≈ C_α² (1 − q) / (q · TOL_rel²). Thus, for q of the order of 10^{-8}, the number of samples required so that TOL_rel = 5% is approximately 1.5 · 10^{11}.
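As a quick sanity check of the sample-size estimate, the relation M ≈ C_α²(1 − q)/(q · TOL_rel²) reproduces the 1.5 · 10^{11} figure:

```python
def required_samples(q, tol_rel, C_alpha=1.96):
    """Samples needed so that the relative error C_alpha * sqrt((1-q)/(q*M))
    falls below tol_rel."""
    return C_alpha**2 * (1.0 - q) / (q * tol_rel**2)

M = required_samples(q=1e-8, tol_rel=0.05)   # on the order of 1.5e11
```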
To demonstrate the IS concept, consider the general problem of estimating E[g(Y)], where g is a given observable. In the previous example, g was chosen as g(y) = 1_{y>γ}. Let ρ_Z be the probability density function of a new real random variable Z such that g · ρ_Y is dominated by ρ_Z, i.e., (1.18) holds for all x ∈ R. This permits the quantity of interest to be expressed under the new measure, where L(·) is the likelihood ratio. Hence the expected value under the new measure remains unchanged, but the variance could be reduced owing to a different second moment E[(g(Z) · L(Z))²].
The MC estimator under the IS measure is (1.20), i.e., the sample average of g(Z^{[j]}) L(Z^{[j]}), where the Z^{[j]} are i.i.d. samples from ρ_Z for j = 1, ..., M.
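A one-dimensional toy illustration (not from this paper): estimating a Gaussian tail probability P(Y > γ) with a mean-shifted sampling density. The shift to N(γ, 1) and the resulting likelihood ratio exp(γ²/2 − γz) are standard exponential-tilting choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, M = 4.0, 100_000

# Crude MC: almost all samples miss the rare set {y > gamma}
y = rng.standard_normal(M)
crude = np.mean(y > gamma)

# IS: sample Z ~ N(gamma, 1) and reweight by the likelihood ratio
# L(z) = rho_Y(z) / rho_Z(z) = exp(gamma**2 / 2 - gamma * z)
z = rng.standard_normal(M) + gamma
is_est = np.mean((z > gamma) * np.exp(gamma**2 / 2 - gamma * z))
# Both estimators are unbiased for P(Y > gamma) ~ 3.17e-5, but the IS
# estimator attains a far smaller relative error for the same M.
```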
The main challenge when using IS is choosing a new probability measure that substantially reduces the variance compared with the original measure. This step strongly depends on the structure of the problem under consideration. Further, the new measure should be obtained with negligible computational cost to ensure a computationally efficient IS scheme. This is particularly challenging in the present problem, since we consider path-dependent probability measures. In particular, the aim is to introduce a path-dependent change of probability measure that corresponds to changing the Poisson random variable rates used to construct the TL paths. Section 2.1 shows how the optimal IS parameters can be obtained using a novel connection with SOC.

Dynamic programming for the importance sampling parameters
This section establishes the connection between optimal IS measure determination within a class of probability measures and SOC. Let X be an SRN as defined in Section 1.1 and let X^∆t denote its TL approximation as given by (1.8). We aim to find a near-optimal IS measure to improve the MC estimator's computational performance when estimating E[g(X(T))]. Since finding the optimal path-dependent change of measure within all measure classes presents a challenging problem, we limit ourselves to a parameterized class obtained by modifying the Poisson random variable rates of the TL paths. This class of measure change was previously used in [10] to improve the MLMC estimator robustness and performance in this context; here we focus on a single-level MC setting and seek to automate the task of finding a near-optimal IS measure within this class. We introduce the change of measure resulting from changing the Poisson random variable rates in the TL scheme, (2.1), where δ^∆t_{n,j}(x) ∈ A_{x,j} is the control parameter at time step n, under reaction j, and in state x ∈ N^d; and, conditioned on X^∆t_n, the P_{n,j}(r_{n,j}) are independent Poisson random variables with respective rates r_{n,j} := δ^∆t_{n,j}(X^∆t_n)∆t. The admissible set (2.2) is chosen such that (1.18) is fulfilled and to avoid infinite variance for the IS estimator. The control δ^∆t_{n,j}(x) ∈ A_{x,j} depends deterministically on the current time step n, the reaction channel j, and the current state x = X^∆t_n of the TL-IS approximation in (2.3). Therefore, the resulting scheme under the new measure is (2.3), and the likelihood ratio at step n associated with the new IS measure is (2.4), where δ^∆t_n(x) ∈ ×_{j=1}^J A_{x,j} are the IS parameters with (δ^∆t_n(x))_j = δ^∆t_{n,j}(x), and the Poisson realizations are denoted by P_n with (P_n)_j := P_{n,j} for j = 1, ..., J. Equation (2.4) uses the convention from (2.2); this results in a factor of one in the likelihood ratio for reactions with a_j(X^∆t_n) = 0. Therefore, the likelihood ratio for {X^∆t_n : n = 0, ..., N} across one path is (2.5). This likelihood ratio completes the characterization of the proposed IS approach and allows the quantity of interest to be expressed with respect to the new measure as (2.6), with the expectation on the right-hand side of (2.6) taken with respect to the dynamics in (2.3). Hereinafter, we aim to determine optimal parameters {δ^∆t_n(x)}_{n=0,...,N−1; x∈N^d} that minimize the second moment (and hence the variance) of the IS estimator, given that X^∆t_0 = x_0. To that end, we derive an associated SOC formulation. First, we introduce the cost function for the proposed SOC problem in Definition 2.1, then derive a dynamic programming equation in Theorem 2.4 that is satisfied by the value function u^∆t(·,·) in Definition 2.3. The proof of Theorem 2.4 is given in Appendix A.
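The dynamics (2.3) with the running likelihood ratio (2.4)-(2.5) can be sketched as follows. The per-step, per-reaction factor is taken to be the ratio of the two Poisson probability mass functions evaluated at the sampled count, (a_j/δ_j)^{P_{n,j}} exp((δ_j − a_j)∆t); this form is the sketch's assumption, and the specific network and controls passed in are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def tau_leap_is_path(x0, propensities, controls, nu, T, N):
    """One tau-leap path sampled under modified rates delta_{n,j}, together
    with the accumulated likelihood ratio.

    controls(n, x) returns the IS rates delta_{n,j}(x); by the convention
    of (2.2), delta_{n,j}(x) = 0 whenever a_j(x) = 0, so such reactions
    contribute a unit factor to the likelihood.
    """
    dt = T / N
    x = np.array(x0, dtype=float)
    lik = 1.0
    for n in range(N):
        a = propensities(x)
        delta = controls(n, x)
        fires = rng.poisson(delta * dt)           # counts under the IS measure
        active = a > 0
        safe = np.where(active, delta, 1.0)       # avoid 0/0 for inactive reactions
        factors = (a / safe) ** fires * np.exp((delta - a) * dt)
        lik *= np.prod(np.where(active, factors, 1.0))
        x = np.maximum(x + fires @ nu, 0.0)       # projection as in (1.8)
    return x, lik

# With delta = a (no measure change) every factor is one, so lik == 1.
x_T, lik = tau_leap_is_path([50], lambda x: np.array([x[0]]),
                            lambda n, x: np.array([x[0]]),
                            np.array([[-1]]), T=1.0, N=8)
```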
Definition 2.1 (Second moment of the proposed importance sampling estimator). Let 0 ≤ n ≤ N. Given that X^∆t_n = x, the second moment of the proposed IS estimator can be expressed as (2.7). Compared with the classical SOC formulation, (2.7) can be interpreted as the expected total cost; the main difference is that (2.7) has a multiplicative cost structure rather than the standard additive one. Therefore, in Theorem 2.4, we derive a dynamic programming relation associated with this cost structure that is fulfilled by the corresponding value function (see Definition 2.3) in the SRN context.

Remark 2.2 (Structure of the cost function). One can derive an optimal control formulation with an additive structure (similar to [30] in the stochastic differential equation setting) by applying a logarithmic transformation together with Jensen's inequality to (2.7). This reduces the control problem to a Kullback-Leibler minimization. In [42,43], this Kullback-Leibler minimization leads to the same optimal change of measure as the variance minimization approach. However, this conclusion needs more investigation in the setting of SRNs, which we leave for potential future work.
is the admissible set for the IS parameters, and u^∆t(N, x) = g²(x) for any x ∈ N^d.

Theorem 2.4 (Dynamic programming for importance sampling parameters). For x ∈ N^d, the value function u^∆t(n, x) fulfills the dynamic programming relation (2.9) for n = N − 1, ..., 0, where ν = (ν_1, ..., ν_J) ∈ Z^{d×J}.
Theorem 2.4 breaks the minimization problem down into simpler optimizations that can be solved stepwise, backward in time, starting from the final time T. Solving the minimization problem (2.9) analytically is difficult due to the infinite sum. Section 2.2 shows how to overcome this issue by approximating (2.9) to derive near-optimal parameters {δ^∆t_n(x)}_{n=0,...,N−1; x∈N^d} for the proposed IS approach.

Approximate dynamic programming
Theorem 2.4 gives an exact solution for the optimal IS parameters resulting from modifying the Poisson random variable rates in the TL paths. However, the infinite sum must be evaluated in closed form to solve (2.9) analytically, which is generally difficult. Therefore, we propose approximating the value function u^∆t(n, x) in (2.9) by ū^∆t(n, x) for all time steps n = 0, ..., N, reaction channels j = 1, ..., J, and states x ∈ N^d. First, both u^∆t(n, x) and ū^∆t(n, x) satisfy the same final condition. Next, to derive the approximate dynamic programming relation for ū^∆t(·,·), we presume Assumption 2.5 to hold. This assumption is motivated by the behavior of the original propensities, which are of O(1) due to the mass-action kinetics principle (refer to (1.5)).
Given Assumption 2.5 and that the {a_j(·)}_{j=1}^J are of O(1), we apply a Taylor expansion around ∆t = 0 to the exponential term in (2.9), then truncate the expression within the infimum such that the remaining terms are O(∆t). This truncates the infinite sum and linearizes the exponential term. Thus, we obtain (2.11) for x ∈ N^d and n = N − 1, ..., 0, where δ_j ∈ A_{x,j}, j = 1, ..., J, are the SOC parameters at state x for reaction j. The admissible set A_{x,j} is defined in (2.2). Assumption 2.5 ensures that (i) we can apply the Taylor expansion to the exponential term as ∆t decreases, and (ii) the approximation structure of (2.11) is exact, with no further ∆t-dependent terms of order less than ∆t².
In this case, the approximate optimal SOC parameter δ^∆t_{n,j}(x) can be determined analytically as (2.13). Note that (2.13) includes the particular case where a_j(x) = 0 for some j ∈ {1, ..., J}; in that case, δ^∆t_{n,j}(x) = 0, which agrees with (2.2).
An important advantage of this numerical approximation ū^∆t(·,·) is that, at each step, we reduce the complexity of the original optimization problem in (2.9) from a simultaneous optimization over J variables to J independent one-dimensional optimization problems that can be solved in parallel using (2.13).
Remark 2.6 (Assumption (2.12)). Whether the assumption in (2.12) is generally fulfilled depends on the method employed to solve the dynamic programming principle in (2.11). For example, in a direct numerical implementation, either special numerical treatment is required for the cases where (2.12) is violated, or some regularization is required to ensure well-posedness. The proposed approach from Section 2.3 avoids this issue since we model ū^∆t(·,·) with a strictly positive ansatz function, which guarantees that condition (2.12) holds for any state x and all time steps n.
Remark 2.7 (Computational cost of dynamic programming). To derive a practical numerical algorithm for a finite number of states, we truncate the infinite state space to ×_{i=1}^d [0, S_i], where S_1, ..., S_d are sufficiently large upper bounds. The computational cost to numerically solve the dynamic programming equation (2.11) for step size ∆t and state space ×_{i=1}^d [0, S_i] can then be expressed as (2.14), where S* = max_{i=1,...,d} S_i.
The cost in (2.14) scales exponentially with the dimension d. Section 2.3 proposes an alternative approach to address this curse of dimensionality. In future work, however, we aim to combine dimension reduction techniques for SRNs with a direct numerical implementation of dynamic programming.

Learning-based approach
Using the SOC formulation derived in Section 2.2, we propose approximating the value function ū^∆t(·,·) with a parameterized ansatz function u(t, x; β).

Remark 2.8 (Choosing the ansatz function). The parameterized ansatz function u(t, x; β) should respect the final condition of the value function (2.9), and its choice depends on the given SRN and observable g(x). For linear observables, such as g(x) = x_i, we can consider polynomial basis functions as an ansatz. For more complex problems, the ansatz function is a small neural network.
For rare event applications with observable g(x) = 1_{x_i > γ}, we consider a sigmoid with learning parameters β = (β_space, β_time) ∈ R^{d+1} as the ansatz function (2.15), where ⟨·,·⟩ denotes the inner product and time is rescaled so that t ∈ [0, 1].
The parameters b_0 and β_0 are not learned through optimization but determined by fitting the final condition from Theorem 2.4, which imposes u(1, x; β) ≈ g²(x) = 1_{x_i > γ}. Therefore, the discontinuous indicator function is approximated by a sigmoid, and the fit is characterized by the position of the sigmoid's inflection point and the sharpness of its slope. The position and value of local and global minima with respect to the learned parameters β_space and β_time depend on the choices of b_0 and β_0.
To derive IS parameters from the ansatz function, we use the previous SOC result from (2.13), i.e., (2.16). We define u(t, ·; ·) in (2.15) as a time-continuous function for t ∈ [0, 1], whereas the IS controls δ^∆t_j(n, ·; ·) are discrete in time for n = 0, ..., N − 1 and depend on the time step size ∆t. Therefore, u(·, ·; β) can be used in (2.16) to derive control parameters for arbitrary ∆t.
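A sketch of how the ansatz and the controls fit together. Both the precise sigmoid parameterization below (slope β_0, offset b_0 = γ, and the (1 − t) time term) and the square-root form used for (2.16), δ_j(n, x) = a_j(x) · sqrt(u(t_{n+1}, x + ν_j) / u(t_{n+1}, x)), are assumptions made for illustration; only the general shape (a strictly positive sigmoid fitting u(1, x) ≈ 1_{x_i > γ}) is taken from the text.

```python
import numpy as np

def sigmoid(z):
    # Clipping keeps the output strictly positive in floating point,
    # matching the strictly positive ansatz requirement of Remark 2.6.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def make_ansatz(i, gamma, beta0=10.0):
    """Hypothetical sigmoid ansatz u(t, x; beta). The fixed constants
    b0 = gamma and beta0 (slope) fit the final condition
    u(1, x) ~ 1_{x_i > gamma}; beta = (beta_space, beta_time) is learned."""
    def u(t, x, beta_space, beta_time):
        return sigmoid(beta0 * (x[i] - gamma)
                       + np.dot(beta_space, x) + beta_time * (1.0 - t))
    return u

def make_controls(u, propensities, nu, N, beta_space, beta_time):
    """Controls from the assumed closed-form relation (cf. (2.16)):
    delta_j(n, x) = a_j(x) * sqrt(u(t_{n+1}, x + nu_j) / u(t_{n+1}, x))."""
    def controls(n, x):
        t_next = (n + 1) / N                      # time scaled to [0, 1]
        a = propensities(x)
        ux = u(t_next, x, beta_space, beta_time)
        ratios = np.array([u(t_next, x + nu[j], beta_space, beta_time) / ux
                           for j in range(len(a))])
        return a * np.sqrt(ratios)                # zero whenever a_j(x) = 0
    return controls

u = make_ansatz(i=0, gamma=5.0)
controls = make_controls(u, lambda x: np.array([2.0 * x[0]]),
                         np.array([[-1.0]]), N=16,
                         beta_space=np.zeros(1), beta_time=0.0)
delta = controls(0, np.array([10.0]))
```

Because the sigmoid is strictly positive, the ratio in the controls is always well defined, which is the point of Remark 2.6.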
The parameters β of the ansatz function are then chosen to minimize the second moment (2.17), where {X^∆t,β_n}_{n=1,...,N} is the IS path generated using the IS parameters from (2.16) and δ^∆t(n, x; β)_j = δ^∆t_j(n, x; β) for 1 ≤ j ≤ J. We use a gradient-based stochastic optimization method to solve (2.17) and derive Lemma 2.9 (proof in Appendix B) for the gradient of the second moment with respect to the parameters β.
For an ansatz function different from (2.15), the gradient is still given by Lemma 2.9; only the derivation of ∂u(t, x; β)/∂β_l in (2.20) changes accordingly. By estimating the gradient in (2.18) using an MC estimator, we iteratively optimize the parameters β to reduce the variance. For this optimization, we use the Adam optimizer with the parameter values suggested in [34], with the only difference that the step size is tuned to fit our problem setting.
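For reference, one Adam update with the default moment parameters of [34] and the step size α = 0.1 used in this paper looks as follows; the quadratic objective in the usage check is a stand-in, not the second moment (2.17).

```python
import numpy as np

def adam_step(beta, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its
    square, bias-corrected, then a scaled step of size lr."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    beta = beta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return beta, (m, v, t)

# Stand-in check: minimize f(beta) = (beta - 3)^2 with exact gradients.
beta, state = np.array([0.0]), (np.zeros(1), np.zeros(1), 0)
for _ in range(500):
    beta, state = adam_step(beta, 2.0 * (beta - 3.0), state)
```

In the paper's setting, the exact gradient above is replaced by an MC estimate of (2.18) over M_0 IS-TL paths per iteration.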
In Section 3, we numerically illustrate the potential of the new learning-based IS method in terms of variance reduction. Further theoretical and numerical analysis of this approach is left for future work, particularly the initialization of the learned parameters β_time and β_space in (2.15) and the investigation of a stopping rule.
To derive an estimator for E[g(X(T))] using the proposed IS change of measure, we first solve the related SOC problem using the approach from this section; then we simulate M paths under the new IS sampling measure. Thus, the MC estimator using the proposed IS change of measure over M paths becomes (2.21), where X^∆t,β_{[i],N} is the i-th IS sample path and the corresponding likelihood factor from (2.5) is given by (2.22).

Remark 2.10. The explicit pathwise derivatives in Lemma 2.9 have the following advantages compared with a finite difference approach: (i) the explicit pathwise derivatives are unbiased with respect to the TL scheme, so only the MC error for evaluating the expectation remains (i.e., there is no additional finite difference error), and (ii) the gradient computation in (2.18) requires estimating an expected value with a high relative error because g is fitted to an indicator function; using the IS-TL paths, we better control the related statistical error.

Computational cost for the learning-based approach
This section discusses the computational complexity of the learning approach to achieve a prescribed tolerance TOL. Recall that the proposed approach comprises two steps; hence, two types of cost occur: (i) the offline cost of learning the ansatz function parameters β, and (ii) the online cost of obtaining the MC estimator (2.21) based on M simulated paths using the derived IS measure (see (2.16)).
The offline cost for (i) can be expressed as (2.23), where I is the number of optimizer steps, M_0 is the number of paths needed to estimate the gradient per optimizer step, C_Poi is the cost of generating one Poisson random variable, C_grad is the cost of the algebraic evaluation of the gradient update (2.18), and ∆t_pl is the step size. In contrast to (2.14), this offline cost does not scale exponentially with the dimension d.
The cost of one IS-TL path based on u(·, ·; β) is the same as that of a TL path, up to negligible additional factors C_δ for evaluating (2.16) and C_lik for the likelihood update in (2.4), where ∆t_f is the forward step size. Following the same derivation as in (1.11)-(1.14), we choose ∆t_f = TOL/(2C), where C is the constant from (1.9), to obtain the total computational complexity needed to achieve a prescribed tolerance TOL, where L is the likelihood factor corresponding to the IS path X^∆t,β (refer to (2.22)).
Our numerical simulations suggest that the achieved variance reduction is not related to ∆t_pl (see Figure 3.4). Therefore, we can achieve a low offline parameter learning cost W_pl(I, M_0, ∆t_pl) by using ∆t_pl ≫ ∆t_f. For comparison, Section 1.3 shows that the standard MC-TL approach has total computational complexity O(TOL^{-3}).
The proposed IS approach reduces this cost by reducing the variance Var[g(X^∆t(T))] (see Section 3). The TL variance becomes increasingly large in the asymptotic regime of very rare event probabilities, such that the additional cost W_pl(I, M_0, ∆t_pl) for learning β in (2.23) becomes negligible. Therefore, we obtain W_IS-TL(TOL) ≪ W_MC-TL(TOL) in the rare event regime.

Numerical Experiments and Results
Through Examples 3.1, 3.2, and 3.3, we demonstrate the advantages of the proposed IS approach compared with the standard MC approach. We numerically show that the proposed approach achieves substantial variance reduction compared with standard MC estimators when applied to SRNs of different dimensions.
Since all three are rare event examples with observable g(x) = 1_{x_i > γ}, we use the ansatz function (2.15) with initial parameters β_space = 0 and β_time = 0. The relative error is more relevant for rare event occurrences than the absolute error; hence, we use a relative version of the variance, the squared coefficient of variation [12,35], which, for a random variable X, is given by Var[X]/E[X]². To judge the robustness of our variance estimators, we estimate the kurtosis, κ := E[(X − E[X])⁴]/(Var[X])², because the standard deviation of the sample variance [10] depends on κ and the number of samples M. We set the Adam optimizer step size to α = 0.1 for the simulations.

Figure 3.1 shows 100 Adam optimization steps for the decay example (Example 3.1) with step size ∆t_pl = ∆t_f = 1/2⁴. The quantity of interest is a rare event probability of magnitude 10^{-3}. To estimate the gradient, we use M = 10⁴ samples per Adam iteration. The squared coefficient of variation is reduced by a factor of 10² compared with the standard MC-TL variance after 13 Adam iterations. After reaching this minimum, the squared coefficient of variation increases over the next iteration steps; this behavior might be avoided by employing a smaller step size in the Adam algorithm. Figure 3.1(d) confirms that the kurtosis is bounded at a level below that of the standard TL approach, indicating a robust variance estimator.
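The two robustness diagnostics used throughout this section are plain sample estimators; a minimal sketch (the Gaussian data in the check is illustrative):

```python
import numpy as np

def squared_cv(samples):
    """Squared coefficient of variation Var[X] / E[X]^2: a relative,
    scale-free version of the variance, suited to rare event magnitudes."""
    m = np.mean(samples)
    return np.var(samples) / m ** 2

def kurtosis(samples):
    """Sample kurtosis E[(X - E[X])^4] / Var[X]^2; a bounded, moderate value
    indicates a robust (reliable) sample-variance estimator."""
    c = samples - np.mean(samples)
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2

s = np.random.default_rng(0).normal(10.0, 1.0, 100_000)
cv2, kurt = squared_cv(s), kurtosis(s)   # near 0.01 and near 3 for this Gaussian
```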
For the 4-dimensional stochastic reaction network (Example 3.2), the rare event probability of the event {X_3(T) > 22} is of magnitude 10^{-5}. Figure 3.2(b) confirms that the proposed learning-based approach reduces the variance by a factor of 4 × 10³ compared with standard TL for step size ∆t_pl = ∆t_f = 1/2⁴. Figure 3.2(d) confirms that the kurtosis of the proposed approach is substantially reduced compared with that of the standard TL approach.

The 6-dimensional example (Example 3.3) has a rare event probability of magnitude 10^{-6}. Figure 3.3 shows the Adam optimization results for step size ∆t_pl = ∆t_f = 1/2⁴. The TL mean differs from the mean of the proposed approach (Fig. 3.3(a)) because the standard MC-TL estimator requires more than 10⁶ runs to accurately estimate a probability of order 10^{-6}. The proposed learning-based approach reduces the variance by a factor of more than 50 after 43 iterations. The kurtosis is bounded and lower than that of the TL approach, confirming that the proposed approach yields a robust variance estimator.
Examples 3.2 and 3.3 show that a good choice of the ansatz function, in combination with reasonable initial parameters, provides substantial variance reduction from the first optimization step. However, we do not expect this behavior in general, particularly in high dimensions, and we therefore performed some optimization iterations.
The examples above used step size ∆t_pl = 1/2⁴ and reported the squared coefficient of variation with respect to the same step size. To demonstrate that the learned parameters β can be used for forward runs with smaller step sizes (i.e., ∆t_f ≪ ∆t_pl), as claimed in Section 2.4, we consider Example 3.2 and the final learned parameters for forward runs with different ∆t_f (Figure 3.4). The results show that the variance reduction is constant with respect to ∆t_f, suggesting that a coarse ∆t_pl is sufficient for parameter learning. The same behavior was observed for the other tested examples.
Remark 3.4. We used the ansatz (2.15), based on a single sigmoid, for the numerical experiments to demonstrate the potential of the proposed learning-based IS. Further variance reduction may be achieved either by summing several sigmoid functions in the ansatz or by selecting a different basis function shape. The relevant analyses will be pursued in future work.

[Figure caption: learned IS parameters (final optimizer step in Figure 3.2) applied to forward runs with different ∆t_f values; the squared coefficient of variation was estimated with M = 10^6 sample paths, with the standard MC-TL approach as reference (dashed red line).]

Conclusion
This work developed an efficient path-dependent IS scheme to estimate statistical quantities for SRN processes, particularly rare event probabilities. Optimal IS parameters were obtained within a pre-selected class of measure changes using the proposed connection to an associated SOC problem, which could be solved via dynamic programming. To mitigate the curse of dimensionality encountered by the dynamic programming relation, we proposed a method for multi-dimensional SRNs based on approximating the value function via an ansatz function (i.e., a neural network), where the parameters were learned using a stochastic optimization algorithm. Numerical examples and subsequent analyses verified that the proposed estimator achieves substantial variance reduction compared with the standard MC method, lowering the computational complexity in the rare event regime. Future work will further analyze the proposed learning-based approach and extend it to derive a multilevel MC estimator. We also plan to combine an implementation of the dynamic programming principle derived in Section 2.2 with dimension reduction methods for SRNs.
A Proof for Theorem 2.4
Proof of Theorem 2.4. To show (2.9), we first reformulate C_{n,x}(δ^{∆t}_n, . . ., δ^{∆t}_{N−1}) using the definition of the likelihood; applying the notion of conditional expectation, we can then reformulate (A.1) and derive the corresponding recursion. We can prove Theorem 2.4 using the above results. We split the proof into two parts, corresponding to two inequalities. The first inequality follows directly from the reformulation above. To prove the second inequality, we choose the control at the n-th time step to be an arbitrary δ^{∆t,+}_n > 0 and, for the remaining controls, we choose the elements of a minimizing sequence of controls. The resulting inequality holds for any arbitrary δ^{∆t,+}_n > 0, and taking the infimum over δ^{∆t,+}_n yields the claim. This completes the proof.
B Proof for Lemma 2.9
The partial derivatives of the second moment C_{0,x}(δ^{∆t}_0, . . ., δ^{∆t}_{N−1}; β) in (2.7) with respect to β_l, l = 1, . . ., d + 1, can be expressed as E[∂/∂β_l (g²(X̄^{∆t}_N) S(X̄^{∆t}; β))], where the Poisson increments with respect to the TL measure are collected in P̄_n with (P̄_n)_j := P̄_{n,j} and P̄_{n,j} ∼ Poi(a_j(X̄^{∆t}_n)∆t) for j = 1, . . ., J. In the first equality, we assume that the expected value and the derivative commute (see [37], Assumption A1(1), for sufficient conditions). In the second equality, we use that g²(X̄^{∆t}_N) is sampled under the original TL measure and hence does not depend on β_l. In (B.1), the likelihood term depends on β only deterministically, since X̄^{∆t}_k is independent of β and P̄_{k,j} ∼ Poi(a_j(X̄^{∆t}_k)∆t). Thus, the derivative can be computed in closed form using the derivative identity for the Poisson probability mass function. We compute the derivative from (B.2) in several steps, where ∂/∂β_l u^{∆t}(t, x; β) depends on the chosen ansatz.
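The closed-form computation rests on the standard derivative identity for the Poisson probability mass function p(k; λ) = λ^k e^{−λ}/k!; with rate λ_j(β) = δ^{∆t}_j(β) a_j(x̄) ∆t, the chain rule then gives (a standard calculation, stated here for completeness):

```latex
\frac{\partial}{\partial \lambda}\, p(k;\lambda)
  = \frac{\partial}{\partial \lambda}\,\frac{\lambda^{k} e^{-\lambda}}{k!}
  = \left(\frac{k}{\lambda} - 1\right) p(k;\lambda),
\qquad
\frac{\partial}{\partial \beta_l}\, p\!\left(k;\lambda_j(\beta)\right)
  = \left(\frac{k}{\lambda_j(\beta)} - 1\right) p\!\left(k;\lambda_j(\beta)\right)
    a_j(\bar{x})\,\Delta t\,
    \frac{\partial \delta_j^{\Delta t}}{\partial \beta_l}(\beta).
```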
Combining the previous steps, the gradient can be expressed in closed form, where the gradient of δ^{∆t}_j depends on the ansatz used and is given by (B.5). Since the MC estimator (B.1) may have a large variance, we again apply IS, multiplying by the likelihood factor S(X̄^{∆t,β}; β) as in (B.7).
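The gradient expressions in this appendix are of likelihood-ratio (score-function) type. The following minimal sketch illustrates the principle on a single Poisson variable, estimating d/dλ E[f(P)] via E[(P/λ − 1) f(P)]; this uses only the standard Poisson score identity and is not the paper's full gradient estimator.

```python
import numpy as np

def score_grad(f, lam, n_samples, rng):
    """Score-function estimate of d/dlam E_{P ~ Poisson(lam)}[f(P)],
    using the score d/dlam log p(P; lam) = P/lam - 1."""
    p = rng.poisson(lam, size=n_samples)
    return np.mean((p / lam - 1.0) * f(p))
```

For f(p) = p, the exact gradient is d/dλ E[P] = 1, which the estimator recovers up to MC error; in the paper's setting, f corresponds to the weighted functional of the full path and λ to the controlled channel rates.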

Definition 2.3 (Value function). The value function u^{∆t}(·, ·) is defined as the optimal (infimum) second moment of the proposed IS estimator at time step 0 ≤ n ≤ N and state x ∈ N^d.
