Quasi-Monte Carlo Methods Applied to Tau-Leaping in Stochastic Biological Systems

Quasi-Monte Carlo methods have proven to be effective extensions of traditional Monte Carlo methods in, amongst others, problems of quadrature and the sample path simulation of stochastic differential equations. By replacing the random number input stream in a simulation procedure by a low-discrepancy number input stream, variance reductions of several orders have been observed in financial applications. Analysis of stochastic effects in well-mixed chemical reaction networks often relies on sample path simulation using Monte Carlo methods, even though these methods suffer from typical slow \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {O}}(N^{-1/2})$$\end{document}O(N-1/2) convergence rates as a function of the number of sample paths N. This paper investigates the combination of (randomised) quasi-Monte Carlo methods with an efficient sample path simulation procedure, namely \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document}τ-leaping. We show that this combination is often more effective than traditional Monte Carlo simulation in terms of the decay of statistical errors. The observed convergence rate behaviour is, however, non-trivial due to the discrete nature of the models of chemical reactions. We explain how this affects the performance of quasi-Monte Carlo methods by looking at a test problem in standard quadrature.


Introduction
In the last few decades, research in molecular biology has generated vast amounts of quantitative data. This growing amount of data has inspired the development of a B Casper H. L. Beentjes beentjes@maths.ox.ac.uk 1 Mathematical Institute, Woodstock Road, Oxford OX2 6GG, UK variety of mathematical modelling and simulation techniques aiming to support the experimental study of the intricate processes taking place in cells and other molecular systems. As a result, we now often have the option to perform in silico experiments alongside the more traditional in vivo and in vitro approaches to study complex cellular pathways, allowing us to perform model fitting and inference, thus yielding a detailed view of the different components in these often intricate networks (Wilkinson 2009).
One feature which nowadays appears prominently in many models of chemical reaction networks is randomness. The aim of including randomness is to mimic the effects of intrinsic and extrinsic noise sources present in molecular systems, as found in experiments (McAdams and Arkin 1999;Elowitz et al. 2002;Blake et al. 2003;Cai et al. 2006). Stochasticity can be responsible for a wide variety of observed phenomena, such as stochastic focusing (Paulsson et al. 2000) or resonance-inducing oscillations (Hou and Xin 2003). The addition of noise, however, comes at a price in terms of our in silico experiments. Single experiments have to be run many times using Monte Carlo (MC) methods to yield results in the form of summary statistics to a satisfactory degree of certainty. This requirement can result in large computation times or even make a problem intractable with existing computational methods.
As such, a key component in the development of computational techniques for these stochastic models is finding ways to reduce the variance of the statistics returned by the simulation algorithms used. In other fields that rely heavily on MC computations, such as computational finance, a commonly applied variance reduction approach involves the use of quasi-Monte Carlo (QMC) techniques. By changing the random number input from pseudo-random numbers to low-discrepancy numbers, a gain in performance of sometimes several orders can be achieved. In the context of the simulation of chemical reaction networks, this idea has received very little attention. In Hellander (2008), an exploration of the combination of QMC methods with the exact simulation of continuous time Markov chain (CTMC) models is presented which shows some of the benefits of QMC over standard methods. In this work, we specifically look at an approximate simulation method, τ -leaping, which allows for an easier incorporation of QMC methods. We show that the benefits from using QMC methods in this case are perhaps less striking than anticipated based on the success of QMC methods in the numerical solution of stochastic differential equations (SDEs). We then give a detailed explanation for why this is the case, which serves as an explanation for the observations made in Hellander (2008) as well.

Outline
We start in Sect. 2 with an overview of mathematical modelling of chemical reactions of well-mixed species. These models need different simulation methods to compute summary statistics, which are discussed next, in Sect. 3. This section also includes a discussion about MC methods and the resulting statistical error in summary statistics. To reduce the statistical error, we explore the use of QMC methods, and we provide a practitioners introduction to QMC in Sect. 4. In Sect. 5, we show results from the application of QMC methods in the simulation of chemical reaction networks. We compare the results with a simple toy model from classical quadrature to elucidate the difference in performances observed. Finally, we conclude in Sect. 6 with a discussion of our observations on the combination of QMC methods and the simulation of stochastic biological systems.

Mathematical Models of Chemical Reactions
For the main part of this work, we will look at models that describe the temporal evolution of molecule copy numbers where we assume that the molecules are spatially distributed in a homogeneous way, i.e. we assume the systems to be well mixed in a volume V . Note that it is possible to include the spatial movement of molecules in this framework and we refer to Erban et al. (2007) for more detail.
Suppose we have a collection of n types of chemical species S 1 , . . . , S n that can interact via K different types of reactions R 1 , . . . , R K , often denoted as reaction channels. In generic form, we can describe such an interaction R k as α 1,k S 1 + · · · + α n,k S n c k − → β 1,k S 1 + · · · + β n,k S n , k = 1, . . . , K , where α i,k , β i,k ∈ N and c k is the reaction rate constant for reaction channel R k . We define X(t) to be the state vector, describing the evolution of all the species as time evolves, i.e. X i (t) is the number of molecules S i at time t. Upon the firing of reaction R k the number of molecules can change and we will use ζ k , the stoichiometric vector for reaction R k , to denote the change in the copy number when reaction R k fires, i.e. due to reaction R k we see X → X + ζ k . Or, in terms of the α i,k and β i,k , we thus have the i-th component of ζ k equal to β i,k − α i,k . We can now describe the temporal evolution of the chemical species using where N k (t) denotes the number of times that reaction channel R k fires in the time interval [0, t). This is, of course, not insightful yet as we have not described a means to calculate the N k (t). In order to model these counting functions, N k , we will assign to every reaction channel, R k , a propensity function, a k (X(t)), which describes the probability that the reaction channel fires in the infinitesimal time interval [t, t + dt). Note that the selection of this function is a modelling choice. A commonly employed choice is the Law of Mass Action which, in essence, looks at the number of combinations of reactants that can be made using the state vector X(t) to let reaction R k fire, see for example Higham (2008) (but note that the choice of a k is in no way essential to what follows).
We will now present an inverse historical way to define N k (t) based on these propensity functions. An accurate way to model this counting function would be to view the above description as a CTMC where, given the current state, X(t), we can experience K different state transitions based on the various reaction channels. An inhomogeneous Poisson process, Y k , will then describe the number of times reaction R k fires. This leads to the Kurtz random time change representation (RTCR) (Anderson and Kurtz 2011) where we see K independent inhomogeneous Poisson counting processes Y k . This representation of the CTMC is a pathwise description of the dynamics. Alternatively, one can describe the CTMC by the chemical master equation (CME), a (high) dimensional system of ordinary differential equations (ODEs) that describes the time evolution of the probability to occupy parts of the state space of X(t). Note that the CME can only be solved in special cases (Jahnke and Huisinga 2007). Both the RTCR and the CME form the basis for many simulation approaches, some of which we will touch upon later.
Going back to Eq.
(2) note that we can rewrite the time evolution of the CTMC as which now describes the evolution of the system over a time interval [t, t + τ ). If we assume that τ is small enough such that a k (X(s)) ≈ a k (X(t)) over the interval [t, t + τ ), but still such that a k (X(t))τ 1, we can use two approximations to derive the chemical Langevin equation (CLE) (Gillespie 2000). First we note that because a k (X(s)) ≈ a k (X(t)) we can write where we now have K homogeneous Poisson processes. Next we use the normal approximation of a Poisson process with large rate parameter a k (X(t))τ 1 to give where N k (μ, σ 2 ) is a normal random variable with mean μ and variance σ 2 . Equation (5), in the limit τ → 0, gives an evolution equation for X(t) in the form of an SDE, which can be written in the form where now the W t,k denote K independent Wiener processes. Comparing the RTCR, Eq.
(2), with Eq. (6), we see that both are pathwise descriptions of the dynamics of the species. In the case of the CLE, it is also possible to describe the dynamics in a manner akin to the CME, i.e. in terms of occupation probability of the state space of X(t). This approach yields the chemical Fokker-Planck equation (CFPE), a system of partial differential equations (PDEs) that, in general, again is high dimensional, and not many systems that yield exact results are known.
Using the SDE formulation, we can finally derive the deterministic reaction rate equations (RREs) which have been in use for over a century. We take the thermodynamic limit, letting the number of molecules and the volume V go to infinity, whilst their ratio remains constant. In this limit, the random fluctuations become negligibly small compared to the deterministic terms and we convert Eq. (5) into which we can rewrite into a system of ODEs by taking the limit τ → 0: Because they constitute a relatively smaller system of ODEs, the RREs can be studied using analytical tools, or numerical ODE solvers. On the flip side, however, this system of ODEs is completely deterministic and therefore does not incorporate stochastic effects. We thus have three different mathematical models describing the temporal evolution of well-mixed molecular species S 1 , . . . , S N , namely the RTCR (2), the CLE (5) and the RREs (8), respectively. These models can be seen as a chain of approximations going from a CTMC with discrete state space to a CTMC with a continuous state space to a deterministic ODE system. Each of these models can be analysed and simulated in different ways and at different cost.

Simulation of Well-Mixed Systems
Having presented three different models for the evolution of interacting chemical species in the preceding section, we can now ask how to perform a mathematical analysis on them. For the deterministic rate equations (8), we can use a wide array of well-known analytical and numerical techniques: we will not go in detail here but rather refer to Higham (2008) and references therein. If, however, noise and nonlinear reactions (for example second order or higher with mass action) are important, the RRE model will not yield correct results and one has to resort to one of the two stochastic models mentioned previously to investigate system behaviour.
The formulations using CTMCs incorporate stochasticity and therefore are often harder to interrogate using analytic methods; only for a handful of cases this has proven to be possible (Jahnke and Huisinga 2007). Repeated simulation of sample paths from these models is therefore crucial in order to gain insight into the dynamics of the species numbers. For the CLE, we can use standard simulation procedures that are used for SDEs, such as the Euler-Maruyama (EM) or Milstein methods, and we refer to Kloeden and Platen (1992) for an extensive exposition of that subject. These computational techniques for SDEs are generally efficient compared to methods for the RTCR that we will discuss next. However, they do suffer from both a numerical error depending on the integration scheme used and a bias error, because the CLE (6) is only an approximation to the RTCR (2).
The dynamics of the RTCR (2) can generally be studied in two different ways. The first is by the use of the CME. However, one of the disadvantages of this approach is the dimension of this system of ODEs; it is equal to the size of the state space. This will generally be so high that the problem becomes intractable, and we refer to Schnoerr et al. (2017) for a general overview of computational approximations related to the CME.
This means that one is often forced to rely on a different approach to get a handle on the model dynamics. Instead of looking at the evolution of the probability over the whole state space at once, we generate single sample paths which evolve according to the rules of Eq. (2). An exact algorithm to compute such sample paths in the context of chemical reaction networks is called the Stochastic Simulation Algorithm (SSA) or Gillespie's Direct Method (Gillespie 1977). Given a current state X(t), the algorithm provides a way to compute the time until the next reaction fires and determines which reaction this is. In that way, we can progress the Markov chain one reaction at a time. This approach can be made more computationally efficient by, for example, using the Next Reaction Method by Gibson and Bruck (2000) or the Modified Next Reaction Method by Anderson (2007).
Still these methods suffer from a drawback, namely their computational costs. As they simulate each reaction individually their run time can be significant if we have many molecules and reactions involved. This is the rationale behind the development of approximate methods to simulate sample paths from Eq. (2). One of the most widely used methods is the τ -leap scheme, also developed by Gillespie (2001). We go back to Eq. (4), but this time we do not approximate the Poisson process by Gaussian random variates to yield the CLE. In essence, the τ -leap method follows from the rationale that, given a small enough τ , the propensities of the reactions do not change much in the time interval [t, t + τ ) and therefore can be assumed constant. This approach yields a discrete-time Markov chain (DTMC) with a discrete state space, where the time between each state is given by the time step τ and the transitions are computed by The computational gain with this method is that in order to calculate Y k (a k (X(t))τ ) we can simply generate a single Poisson random variable p k with parameter a k (X(t))τ . This means that we can fire multiple reactions at once and therefore progress quicker than is the case for the SSA. An algorithmic representation of the τ -leap method is depicted in Algorithm 3.1. Being an approximate method, the τ -leap method does not come without caveats, one being the possibility of achieving negative molecule numbers. Many possible workarounds to avoid negative copy numbers have been proposed (Tian and Burrage 2004;Chatterjee et al. 2005;Cao et al. 2005Cao et al. , 2006. Furthermore, because τ -leaping is an approximate method, it yields a bias depending on selection of the magnitude of the step size, τ . Therefore one has to balance the effect of this bias with the computational costs, which are O(τ −1 ).
A different view on the τ -leap method is that it is a variant of the explicit Euler method for ODEs applied to Eq. (2), where we have approximated the time integral by a left Riemann sum. This method parallels the widely used EM scheme for SDEs. One could therefore ask the question whether it is possible to adapt other ODE time stepping approaches to the CTMC simulation case. Indeed, this is possible for a small class of methods, such as implicit Euler, which yields implicit τ -leap approaches (Rathinam et al. 2003). These methods can perform better for some systems, e.g. those exhibiting stiff behaviour.

Monte Carlo Methods and Errors
As a final note on the simulation of well-mixed systems, we now reflect on the different methods mentioned above. Many commonly used methods provide means to generate (approximate) sample paths of chemical reaction networks, but how can we infer information from these? In many instances, we are interested in expressions like g(X(t)), where g is a function of the state variable X(t) at time t. However, as X(t) is a random variable, we will often have to look at the expectation E [g(X(t))] of this function g. A common example would be g(x) = x k , where taking the expectation yields the k-th moment of the process X(t).
If we can only generate sample paths from the distribution of possible outcomes in the state space, we have to employ MC methods to estimate the required statistics.
We generate N independent, possibly approximate, sample pathsX (1) (t), . . . ,X (N ) (t) and construct from this the MC estimator The more samples N used, the more certain one can be of the closeness of E N to the expected value E [g(X(t))]. We can make this precise by considering the mean squared error (MSE) given by which can be decomposed into two separate sources of error. First of all there can be a bias. This could be the result of a modelling choice, for example, the CLE and its related computational methods form a biased approximation of the RTCR from which it was derived. Alternatively the bias could stem from algorithm parameters such as the time step τ in the τ -leap method. This type of error will not be investigated in this paper, and we will assume that, for a specific computational method, it is given and fixed. On the other hand, there are statistical errors, in the form of the variance V [E N ] of the estimator, which can be controlled. Statistical errors will therefore form the main focus of interest in this manuscript and are discussed next.
It goes without saying that it is desirable to have this statistical error as small as possible; we aim to control statistical uncertainty as tightly as possible given our computational resources. For a standard MC method, given a sample variance σ 2 , which is determined by the model being studied, the variance of the estimator E N is given by σ 2 /N . Reducing the statistical error can now be done in two ways. Firstly, by taking more samples the variance decays to zero as N → ∞. This approach requires the development of more efficient algorithms in order to bring the cost per simulation down. Alternatively, one could hope to reduce the sample path variance, σ , by applying a variance reduction technique, see Lemieux (2009, Chapter 4) for more detail in the context of MC methods. Note that employing standard variance reduction techniques results in a smaller σ and therefore these techniques only improve the constant of convergence for V [E N ], the convergence rate behaviour as N → ∞ does not change. For the remainder of the manuscript, however, we look at a variance reduction method different from MC that aims to reduce the variance decay as a function of increasing N .

Quasi-Monte Carlo Methods
One of the drawbacks of general MC methods is the slow convergence rate, often of the order O(N −1/2 ) for the root mean squared error (RMSE). A way to improve on plain MC methods is the use of QMC methods. Originally QMC methods were developed to approximate multidimensional integrals of the form Shown are N = 300 points u (i) scattered at random. If a perfect uniform distribution was attained by these points the number of points in B would be equal to Vol(B) · N and this would hold true for every box where s is the dimension of the problem. In standard MC, we would generate a sequence u (i) with i = 1, . . . , N of s-dimensional uniform random variates and calculate The convergence I N → I as N → ∞ for MC methods is based on the Law of Large Numbers (LLN), but this is not necessary for convergence. For example, deterministic quadrature methods such as the midpoint-rule exist and have no relation to the LLN. It turns out that, by virtue of the Koksma-Hlawka inequality, we can link the rate of convergence of I N as N → ∞ to the uniformity of the points u (i) ⊂ [0, 1) s used. More precisely, the Koksma-Hlawka inequalities link the discrepancy D * N of the point set u (i) and convergence of the approximate integral. This is given in the most common form by where V [ f ] is the total variation of the integrand f in the sense of Hardy and Krause. This approximation error inequality can be thought of as the equivalent of Eq. (11) for MC methods. Note that Eq. (11) is an equality and holds in probability, whereas Eq. (14) is a deterministic worst-case inequality. Comparing the two error bounds we see that V [ f ] takes the place of the variance σ , both quantities depending on the integrand f . Furthermore we see that, rather than having an error decay like N −1/2 , we now have a factor D * N determining the behaviour as N increases. The discrepancy, or the star-discrepancy D * N in particular, measures the greatest deviation of a point set from a perfect uniform distribution on [0, 1) s , which is illustrated in Fig. 1. Taking the supremum over all the boxes B with one corner at the origin, we measure the difference between the expected number of boxes in the perfect uniform case and reality. The total variation V [ f ] of the integrand is for all practical purposes impossible to calculate and harder to estimate than the actual integral I . Furthermore in practical applications one can encounter functions with infinite V [ f ], which voids the use of Eq. (14).
It turns out that it is possible to construct low-discrepancy sequences that will cover the integration domain more uniformly than random numbers, i.e. their discrepancy Comparison between a (pseudo) random point set (left) and a low-discrepancy Sobol' point set (right) in [0, 1) 2 , both of length N = 2 9 decays quicker than for equivalent random sequences, which have D * N = O(N −1/2 ). An example comparison between a (pseudo) random sequence and a low-discrepancy sequence is depicted in Fig. 2, which shows that the low-discrepancy sequence attains a much better spread over the integration domain [0, 1) 2 .
For the QMC method, we replace the (pseudo) random sequence {u (i) } by a deterministic sequence of low-discrepancy numbers {v (i) } (Lemieux 2009, Chapter 5). By their deterministic construction, these sequences can attain convergence orders like O((ln N ) s N −1 ) for a wide range of integrands f by virtue of (14). This O((ln N ) s N −1 ) convergence rate in the limit of N → ∞ will always be better than can be attained with standard MC. However, if the dimension, s, is large but N is not very large it is not clear, based on theoretical results, whether QMC will provide an improvement. There are, however, various reports in the literature, albeit without a theoretical justification, of QMC methods seemingly outperforming MC methods. Nowadays, QMC finds its application in many more areas than just integration, such as finance, see for example Lemieux (2009, Chapter 7), and Bayesian inference (Gerber and Chopin 2015).

Randomised Quasi-Monte Carlo
A weakness of QMC methods compared to other quadrature methods is the lack of a measure of error. For MC methods, we can use the LLN to estimate the variance and obtain confidence intervals. However, for QMC methods, the points used are deterministic and therefore do not allow the application of the LLN. The Koksma-Hlawka inequality (14) does provide deterministic error bounds, but for all practical purposes the quantities involved, V [ f ] and D * N , cannot be calculated or computed. Furthermore we note that, because the low-discrepancy numbers are a deterministic set, the QMC estimator is not unbiased.
We can, however, consider a hybrid of MC and QMC methods. This type of approach introduces randomness into QMC methods in such a way that we keep their good convergence properties whilst at the same time allowing for error estimation with the LLN. The resulting methods are also known as randomised quasi-Monte Carlo (RQMC) methods.
A common idea in such RQMC methods is to take a low-discrepancy point set {v (i) } and apply a randomisation to get a new set {ṽ (i) }. Good randomisations (specific for the low-discrepancy sequence used) exist such that this new set is still a lowdiscrepancy point set but, at the same time, for all points in this setṽ (i) ∼ U([0, 1) s ) holds. As a result of such a randomisation, I N with {ṽ (i) } will be an unbiased estimator of I . We refer to Lemieux (2009) and references therein for more information on such randomisations.
To construct a measure of the statistical error, we create M different randomised low-discrepancy point sets N of the objective I if we use equation (13). Combining these M randomisations gives rise to a new estimator which we note again is an unbiased estimator of I . At the same time, we can now estimate the variance like we did for MC methods, because we effectively have M independent unbiased estimates of I . This allows for an unbiased estimator of the sample path variance just as one can obtain for standard MC simulationŝ We can now incorporate this into the MC framework to find an unbiased empirical estimator of V I M,RQMC As a result, there are two ways one can reduce the variance of an RQMC estimator, either by taking more samples, N , per randomisation or by taking more randomisations, M. It is not always clear what choice one should make in this regard, but we can make some general observations. We note that increasing N means that within each randomisation more points of the low-discrepancy set will be used. This will therefore take advantage of the better spread of low-discrepancy point sets by loweringσ 2 RQMC , possibly at a rate faster than O(N −1/2 ). On the other hand, M only controls the number of randomisations, which ties in with the standard MC framework. Therefore M has limited influence on the statistical error convergence (O(M −1/2 ) for the RMSE). However, it should be large enough to make the variance estimation equation (17) sufficiently accurate, which can often already happen for M ≥ 10 (Lemieux 2009). Note that to get an RQMC estimator and sample variance we use M N sample points Fig. 3 Illustration of the combination of two one-dimensional point sets into a two-dimensional set, both for randomised Sobol' sets (black circle) and pseudo-random sets (grey circle). This approach for pseudo-random numbers results in a new two-dimensional pseudo-random number set, but this is not true for low-discrepancy numbers v 2 v 1 and thus for a fair comparison an RQMC method should be compared to standard MC with M N sample points.
In this section, RQMC was introduced as a variation on standard QMC methods by adding MC style randomisations. An alternative perspective of RQMC is starting from a MC method and then adding the low-discrepancy points to make it into a variance reduction method for standard MC methods. In Appendix B, we give more detail on this viewpoint of RQMC.

Application to Stochastic Simulations
(R)QMC methods were introduced in the previous sections in the context of quadrature, but the framework applies equally well to many stochastic simulation approaches. This is due to the fact that the object of interest often takes the form of an expectation, which can also be written as an integral. Therefore it can be sufficient, just as for quadrature, for stochastic simulations to substitute pseudo-random numbers in a MC simulation method with low-discrepancy numbers to get an equivalent (R)QMC method.
A crucial difference, however, is that for most standard low-discrepancy numbers we need to know the dimensionality of the problem a priori. This is due to the fact that one cannot make a low-discrepancy point set in two dimensions by simply combining two one-dimensional point sets (note that this does work for pseudo-random numbers!), which can be clearly seen in Fig. 3. This difference between the two types of points is caused by the way low-discrepancy point sets are generated, in a well-defined deterministic manner, which introduces correlation between the individual points.
It is therefore not straightforward to combine QMC methods with, e.g. Gillespie's SSA, as it is not clear, a priori, how many random numbers will be used in the simulation of a single path, i.e. the dimension is unknown and possibly infinite. There do exist ways to deal with possibly infinite integration problems in the context of QMC using (extensible) lattice rules and sequences, see for example Dick et al. (2013, Section 5) for an overview and L'Ecuyer and Munger (2016) for a software implementation of such constructions. For chemical reactions, a workaround for the simulation of CTMCs, using uniformisation of the CTMC, was presented in Hellander (2008).
In this paper, however, we focus instead on the (approximate) τ -leaping method, which in its simplest form (fixed τ ) does allow for an a priori determination of the dimension of the problem. Given K reaction channels and a simulation that runs with time step τ until final time T we find the dimension to be K T /τ , representing the total amount of random numbers used for a single path. The low-discrepancy numbers are then used in step 5 of Algorithm 3.1 to generate the Poisson random variables p k by applying an inverse transformation. Note that if this is done using a fast inverse transform, such as in Giles (2016), the process is not slower than direct methods for generating Poisson random variables in the current implementations of MATLAB and Python (R2017b and Numpy 1.14.0, respectively).

Numerical Experiments
We now test the effect of the combination of RQMC and τ -leaping on a set of example chemical reaction systems. We compare the results using τ -leaping to the results from numerically solving the CLE (6) using the EM discretisation as QMC methods have proven to be very effective for numerical simulation of SDEs in the past (Glasserman 2003). We note that the two computational methods are based on different models, the RTCR (2) and the CLE (6), respectively. As a result, the bias of the methods will be different and we therefore do not directly compare the summary statistics computed. Instead, we ignore bias and only measure the convergence rate of statistical errors for both methods. For work on the bias error incurred from using τ -leaping, we refer to Anderson and Koyama (2012);  and Rathinam (2016).
All numerical results for RQMC methods were obtained using the Sobol' sequences (Sobol' 1967). A random linear scramble was combined with a random digital shift (Matoušek 1998) to create randomised low-discrepancy points from the Sobol' sequences.

Monomolecular Reaction Networks
First we look at some elementary test systems to be able to closely compare the CLEbased method and the τ -leap method. The benefit of these systems is that the bias due to the finite step size τ is exactly known. In addition to this, the first two moments of the sample paths can be calculated analytically for both the τ -leap method and the EM discretisation scheme.

Linear Birth-Death Process
The first example is a single species linear birth-death system which models autocatalytic production and degradation of the species S 1 . For simplicity, we take the two reaction rates equal to each other so that we have E [X(t)] = X(0) and V [X(t)] = 2ctX(0), i.e. the system will exhibit fluctuations around the steady state given by the initial state X(0). Note that these identities also hold for the EM discretisation of the CLE and the τ -leap scheme applied to the RTCR, both computational methods are thus unbiased with respect to the CTMC model. In Fig. 4, we show the convergence results of the RMSE at time T = 1.6 for a system with c = 1 and X(0) = 10 3 . Both the Euler-Maruyama discretisation of the CLE and the τ -leap method use a time step τ = 0.2, i.e. we take eight steps in both methods. The dimension of the problem is therefore 16 (two reaction channels and eight time steps), which is generally thought to be within the realm of possibilities with (R)QMC methods.
We can clearly see that RQMC applied to both τ -leap and the CLE gives a strong improvement over the same method with standard pseudo random numbers. However, it is also clear that, contrary to the MC method, where both the CLE-based discretisation and τ -leap show equal convergence in terms of the RMSE, the RQMC method shows a difference in performance benefit. The SDE-based method has a convergence rate of roughly O(N −1 ) for all N . The same behaviour is not observed, however, for the τ -leap method which starts at an O(N −1 ) rate, but for N 10 2 seems to switch to the standard MC rate O(N −1/2 ). This might come as a surprise, because in the regime of high molecule numbers and reaction propensities the CLE and derived methods are expected to form an excellent approximation to the RTCR and τ -leap method.
We note that the decrease in convergence rate is not due to sample paths reaching low molecule numbers, which could result in a discrepancy between CLE-based methods and discrete molecule number methods such as the τ -leap method. With the initial conditions given above, such sample paths are very unlikely to happen and are not observed in the simulations used to produce Fig. 4. This also means that a strategy to  2005)), was not needed for this example. A clear difference between the τ -leap method and CLE-based method stems from their respective update formulas, (9) and (5), which are related but not equal. Therefore the results from the two methods can differ subtly. By increasing the reaction rate parameters of the system, the Poisson updates used for τ -leaping are better approximated by normal random variables, which is what is used in CLE-based methods. Furthermore, as a result of the difference in updates, the state space of the variable X(t) is continuous for the CLE-based methods and discontinuous, only taking integer values, for the RTCR-based τ -leap method. We now investigate what differences between the τ -leap method and CLE-based method exactly lead to the two contrasting convergence rate behaviours observed in Fig. 4.
Firstly we test whether the closeness of the τ -leap method and the equivalent discretisation of the CLE changes this observed behaviour of switching between convergence regimes. This is done by running a similar set of simulations with varying initial conditions, and therefore molecule number regimes. We set X(0) = ε −1 , so that as ε → 0 we expect better agreement between the τ -leap method and the CLE method. Note that as we vary ε the sample path variance for both methods has the form V [X(t)] = 2ctε −1 and therefore grows as ε → 0. In Fig. 5, we show the resulting comparison between the two methods, with the RMSE rescaled by ε −1/2 . This is done to normalise the RMSE by the sample path variance as ε is changed. Note that this rescaling does not influence the convergence rate behaviour as a function of N .
It is clear from Fig. 5 that for the EM discretisation of the continuous CLE the value of ε does not influence the convergence rate of the RMSE, i.e. it remains O(N −1 ) under changes in ε. The same cannot be said for the τ -leap method as now ε influences the transition between two different convergence regimes, fast O(N −1 ) and slow O(N −1/2 ) convergence, respectively. We observe that a smaller ε means that the transition takes place later, i.e. for higher N . Note that varying ε in the context of this system means changing the average copy number of S 1 encountered, and with that also the average reaction propensities. As a result, ε toggles how good the Poisson random variables in the τ -leap method can be approximated by normal variables, and therefore how good the CLE is as an approximation to the discrete dynamics. One might therefore think that RQMC performance depends on the 'closeness' of a discrete RTCR system is to its continuous CLE approximation. We now show that this is not necessarily the case. We consider an additional rescaling of the reaction rate constant of the form c = c 0 ε in combination with the previous rescaling of the initial condition. Note that now as ε → 0 this keeps the reaction propensities on average constant and of the order O(c 0 τ ) during a time step. As a result the value of ε does not change whether the EM discretisation of the CLE forms a good approximation to the τ -leap method. We perform a test to see what happens to the convergence rate if we change ε → 0 in this case. The results are shown in Fig. 6 and show similar behaviour compared to the previous example, where c was fixed. It is therefore not a 'closeness' of the RTCR to the CLE which governs the convergence rate, as this is determined by the propensities of the reaction channels. Rather it seems to be the copy number of S 1 molecules that is crucial for this system.

Reversible Isomerisation System
The previous example showed that in the case of molecule numbers in the system being not too small RQMC in combination with τ -leaping performed well. In the following example, we show that having large molecule numbers for some species in the system, does not guarantee good convergence behaviour of RQMC in combination with τ -leaping. We consider the two species system and start with X(0) = (X 1 , X 2 ) initial molecules. Define N 0 = X 1 + X 2 and note that this simple system is closed, which means that the sum of the number of S 1 and S 2 molecules at all times will be equal to N 0 . This information can be used to decouple the dynamics of S 1 and S 2 . Note that this system, under the CTMC model, converges to an equilibrium state of (α/(1 + α), 1/(1 + α)) N 0 . In order to ignore a transient regime in which the system goes to this equilibrium, we start the simulations with N 0 = α −1 (1+α)ε −1 and X(0) proportional to this equilibrium state, i.e. X(0) = ε −1 (1, α −1 ) . We note that under this X(0) initial condition for both the τ -leap method and the EM discretisation of the CLE we have E [X(t)] = X(0) and V [X(t)] ∝ X(0), like we saw in the previous system. This also means that both computational methods are unbiased for this system.
In Fig. 7, we show the results for a simulation until T = 1.6 with time step τ = 0.2 and parameters c = 1, α = 10 −4 and ε = 10 −2 . This means that S 2 has copy numbers of the order 10 6 , which one might reasonably say is large. We note again that there is a gain in performance in terms of RMSE if we compare RQMC and the equivalent MC method. However, we observe that, despite S 2 having large copy numbers, the RMSE for S 2 from τ -leaping quickly goes to O(N −1/2 ) convergence.
We can understand this quick slow down of convergence by again noting that N 0 = X 1 + X 2 remains constant. Therefore, the dynamics of X 2 , and thus also the RMSE, is slaved to the dynamics of S 1 molecules (and vice versa), i.e. RMSE(X 1 ) = RMSE(X 2 ). The RMSE of X 1 will attain a O(N −1/2 ) convergence rate relatively quick, because the number of S 1 molecules is moderate (on the order of 10 2 ), rather than large. The RMSE of S 2 molecules mimics this behaviour, because of the coupling via reactions between S 1 and S 2 molecules, and will therefore also change to O(N −1/2 ) convergence after the same number of samples N .
This example therefore shows that, by virtue of species being linked through reaction channels, it can be that high copy numbers for part of the reacting species do not guarantee faster convergence rates for RQMC methods than the standard O(N −1/2 ) rate. This is even the case when we use summary statistics that involve just those high copy number species (in our example the number of S 2 molecules).

Discrete Toy Model
To explain the observations from the previous examples, we consider a problem in traditional quadrature. We consider the integration of the following s-dimensional test functions over the domain [0, 1) s : Both functions integrate to zero over the s-dimensional hypercube and have variance regardless of s. We note that (20a) is an easy test function for (R)QMC methods as it represents a linear combination of one-dimensional functions (for which (R)QMC methods perform well). The effective dimension in the superposition sense of these additive functions is equal to one (Caflisch et al. 1997) and the convergence rate for RQMC 1 is O(N −3/2 ) regardless of dimension s. The second function (20b) was considered previously in Owen (1998) and is a much harder integrand for RQMC and MC methods. It has the property that RQMC methods for a low number of points have O(N −1/2 ) RMSE convergence which turns into O(N −3/2 ) if sufficiently many points are used [the definition of sufficient, which depends on s, is found in Owen (1998)]. RMSE convergence for these test functions for some dimensions s is depicted in Fig. 8. This shows that RQMC does indeed do a very good job at integrating (20a) and for N large enough the same holds for (20b). For (20a), we see that in terms of RMSE convergence there is no dependency on s. Note that for the chemical test systems previously discussed there was a clear difference in performance for RQMC methods between the continuous CLE and the discrete RTCR. In terms of quadrature, the integrand f in the first case is continuous, whereas in the second case it is discontinuous. Most convergence results for RQMC are based on the assumption that the integrand is continuous and it has been observed before that discontinuities can have an adverse effect on the convergence rate (Morokoff and Caflisch 1995;Moskowitz and Caflisch 1996;Berblinger et al. 1997;He and Wang 2015). We now show that a certain type of discontinuity, closely resembling the chemical reaction system case, can replicate the convergence behaviour that we have observed in the previous section.
We introduce the following transformation of the test functions f , which acts upon the input of the function f , where ε is a parameter which tunes the level of discontinuity. Note that as ε → 0 the function becomes smoother. In Fig. 9, we show the effect of varying ε on the onedimensional function (20a) and the filled contour plot for (20b) for ε = 0.07. Note that by applying transformation (22) we create a function which has discontinuities parallel to the axes of the integration domain [0, 1) s . In He and Wang (2015), it was proven that such axes-parallel discontinuities have a relatively mild effect on the convergence of RQMC methods. In Fig. 10, we see the effect that the introduction of discontinuity by (22) has on the RMSE convergence. Where the continuous functions showed O(N −3/2 ) convergence (recall Fig. 8), the discontinuous counterparts have, for large enough N , a slower O(N −1 ) convergence rate. The results in Fig. 10 hold for a wide range of dimensions s. As expected, results for (20a) are not affected by s due the fact that the function after transformation still is one-dimensional in superposition sense. On the other hand, for (20b), the effect of transformation (22) only shows once enough points have been used to leave the O(N −1/2 ) initial convergence, and after that convergence rates seem to drop from O(N −3/2 ) to O(N −1 ) as well.
Next we introduce a different transformation that converts continuous functions into discontinuous ones, Note that, in contrast to (22), this transformation acts upon the function output values. As a result, discontinuities introduced by (23) do not necessarily align with the axes of [0, 1) s , as can be seen in Fig. 11 in two dimensions. Results for the RMSE convergence for varying ε is shown in Fig. 12. We observe again that for small values of N the convergence rate is O(N −3/2 ), similar to the continuous case. However, we see that with transformation (23), for N large enough, the convergence rate becomes O(N −1/2 ), rather than O(N −1 ) which was observed for transformation (22). This comes back to the fact that the discontinuities introduced by (23) do not align with the axes of the integration domain [0, 1) s . One can understand this from the way many RQMC point sets are constructed (in particular digital nets, of which Sobol' point sets are a special case). For such sets, the points are equidistributed with respect to axes-aligned hyperrectangles. If the discontinuities of the integrand do not align with the domain axes, such as for transformation (23), then the RQMC points will not be able to sample of the integrand's different contributions uniformly. Discontinuities that do not align with the domain axes were also shown to be of a more detrimental type of discontinuity if one wants to use RQMC methods in He and Wang (2015). The limiting convergence rate is given by the MC rate O(N −1/2 ). This agrees with the fact that RQMC methods will, in the worst-case scenario, behave very much like a standard MC method and have a convergence rate which is not more than a constant times the MC rate (Owen 1998).
To further explain the convergence behaviour, we consider the decomposition of the discontinuous function into a continuous part, F(x), and discontinuous part, Note that |G(x)| ≤ ε and as a result the variance of G(x) is generally O(ε 2 ). We can then decompose the MSE of the estimator of the integral of f ε (x) by an unbiased RQMC rule as the sum of the MSE of the integration of F(x) and G(x). We note that the MSE for the continuous part, F(x), behaves like O(N −3 ), as observed in Fig. 8.
In the case of transformation (23) the RQMC method achieves MC like error rates for the discontinuous part, G(x). We therefore have the following decomposition of the MSE This yields a switch from fast O(N −3 ) convergence to slow O(N −1 ) when N = O(ε −1 ), i.e. at this point the error made for the discontinuous component of the function dominates the MSE. The same holds true for the RMSE and this scaling of the switch point as a function of ε is also observed in Fig. 12.
In the case of transformation (22) the RQMC method does not perform like a MC method and instead achieves O(N −2 ) convergence for the MSE. Note that the scaling of the variance now does not come into play, because the convergence is not MC like. Instead we observe a rescaling of the switching point N = O(ε −1 ) as well in Fig. 10. This leads to the following decomposition of the MSE This shows that, even in the case of a discontinuous integrand, RQMC methods can achieve lower MSE if the function can be decomposed in a continuous part and a discontinuous part that is relatively smaller in magnitude. RQMC performs superiorly on the continuous component of the integrand, giving fast error decay for a moderate number of points N . In the worst-case scenario, a MC convergence rate is achieved by RQMC on the discontinuous part, which will dominate the convergence order for large N .
This observation can be linked to observations made in Caflisch (1998). Caflisch notes that low-discrepancy point sets differ subtly from pseudo-random point sets in the sense that for a pseudo-random point set every point is an independent estimate of the integral. This is not true for a low-discrepancy point set, which has a deterministic structure. For these point sets, the initial points sample the integration domain on a coarse scale, whereas the later points are used for progressively finer scales. Therefore initially RQMC will perform well on a function like f ε , because on a coarse scale it is dominated by its continuous part, F(x). If more points are used the fine, discontinuous, structure due to G(x) starts to dominate and this is where the convergence stalls.
Note that for a general chemical reaction network it is not clear a priori how the summary statistic of interest can be decomposed into a continuous part and discontinuous part, or what the value of ε is. Or, in other words, it is not clear how important coarse scale continuous contributions are in relation to finer scale discrete ones. Therefore the performance benefit from using RQMC methods over MC methods can be hard to estimate a priori. We do, however, note that the implementation of low-discrepancy point sets is often relatively simple and does not need to increase the runtime of the simulation procedures (Appendix A). As a result, RQMC methods have a potential to provide computational savings over MC methods in the simulation approaches of chemical reaction networks by attaining lower statistical errors for similar computational time.

Schlögl System
As a final example, we look at the bistable Schlögl system, as encountered in Cao et al. (2004), which incorporates nonlinear interactions where we assume that the copy numbers for S 2 and S 3 are constant and large. The initial condition for S 1 is 250 molecules. Non-dimensional parameters are given by c 1 = 3 · 10 −7 , c 2 = 10 −4 , c 3 = 10 −3 , c 4 = 3.5 and the copy numbers for S 2 and S 3 are taken as 10 5 and 2 · 10 5 , respectively. The system is bistable for these parameters, with stable states around 100 copy numbers and 550 copy numbers for S 1 . We simulate the system up until final time T = 4 with time step τ = 0.4. We take the approach in Anderson and Higham (2012) to deal with sample paths with zero or fewer molecule numbers at a given time. We look at the mean number of S 1 molecules, though more meaningful summary statistics can be constructed for bistable systems.
In Fig. 13, we show results comparing the τ -leap method and EM discretisation of the CLE using both pseudo-random points and low-discrepancy points. We see that, although the RQMC method does not attain a much higher convergence rate than the standard MC rate of O(N −1/2 ), the RQMC method is superior to the standard MC method. Numerical experiments suggest that a similar situation as in Fig. 13 holds for at least the first few moments of S 1 copy numbers.
We also observe that, even though the CLE is continuous, the convergence rate for the EM discretisation is equal to that of τ -leap. This indicates that for this specific problem it might not be the discrete nature of the S 1 dynamics that causes the observed O(N −1/2 ) convergence rate. The behaviour is likely due to the fact that the system has four reaction channels and 10 time steps, leading to a dimensionality of 40 for this specific problem. Such a number of dimensions can be challenging for naïve QMC methods as applied here. One might benefit from applying a change of variables which transforms the effective dimension of the problem, and therefore improves the RQMC convergence. Techniques such as the Brownian bridge and principal components construction are available for SDEs and can help in making RQMC methods effective even for high-dimensional problems (Glasserman 2003, Chapter 5). For dynamics following the RTCR (2) such transformations are, however, not known and we leave this direction for future research.

Discussion and Outlook
It is known that the use of low-discrepancy numbers instead of pseudo-random numbers can greatly improve the convergence speed for problems involving traditional quadrature and SDEs. In this paper, we explored the application of RQMC methods in the framework of simulation of stochastic biological systems. In particular, we looked at the combination of low-discrepancy numbers with the τ -leap method. For simplicity, the fixed step τ -leap method was considered so as to allow for a simple implementation of low-discrepancy points without negative effects on the runtime. We note that the question of whether this is a good procedure has been addressed in the literature before (Cao et al. 2004(Cao et al. , 2005(Cao et al. , 2006Anderson 2008). This paper, however, does not focus on the question of whether τ -leaping forms a good approximation to the CTMC dynamics, which is the motivation therein for the discussion about time step selection. Rather, we focus on the question of how quick statistical errors in desired summary statistics decay as a function of the number of sample paths simulated. We answer this question in the simplest possible case, namely using fixed time step τ -leap, though we expect our conclusions below to be general enough to hold for a large class of simulation procedures for stochastic biological systems.
Theory suggests that the convergence rate for an RQMC method is not worse than for the equivalent MC method [up to a constant (Owen 1998)]. Reality seems to show that in case of chemical reaction networks RQMC is superior to MC, as evidenced by numerical experiments in Sect. 5. As a result, if one chooses the fixed time step τ -leap approach to simulate a chemical reaction network, the use of RQMC methods gives a better convergence behaviour as compared to the traditional MC implementation at no extra cost.
However, the benefits from using low-discrepancy numbers are smaller than anticipated based on results seen in the simulation of SDEs. In particular, if one chooses to model chemical reaction systems by SDEs in the form of the CLE, one sees a greater advantage in the use of low-discrepancy numbers. This effect can be caused by at least two factors.
Firstly, the inherently discrete nature of stochastic simulations of chemical reaction networks hinders RQMC convergence. It has been reported in the literature that discontinuous integrands experience less benefit from RQMC methods over standard MC methods (Morokoff and Caflisch 1995;Moskowitz and Caflisch 1996;Berblinger et al. 1997;He and Wang 2015). In Sect. 5, we showed through the use of a simplified test system that the behaviour observed in simulating chemical reactions can be replicated by introducing certain types of discontinuity in classical quadrature. The simple test systems in Sect. 5 allow for a detailed understanding of the RMSE convergence rate observed when applying RQMC. It is, however, not always possible to choose the biological model or its parameters such that the effect of discontinuities will be small. It would therefore be advantageous to have techniques that leave the desired summary statistic intact, but diminish the effect of discontinuities on the RMSE convergence.
Smoothing techniques have previously been considered to mitigate the effects of discontinuity in other contexts (Caflisch 1998;Moskowitz and Caflisch 1996) and we aim to address similar techniques in the context of chemical reaction networks in future work.
Secondly, it is known that the performance of (R)QMC methods can strongly depend on the dimension of the problem. As illustrated in the last example in Sect. 5, a higher dimension can lead to a much smaller performance benefit, regardless of the smoothness of the underlying problem. Methods to reduce the effective dimension of the problem by a change of variables have proven to be effective in other fields and it is an open question as to whether such transformations can be found for the simulation of biological systems. Another method which has proven to be fruitful in the simulation of DTMCs of potentially large dimension is array-RQMC (L'Ecuyer et al. 2008). This method is the cornerstone of the only other known QMC work in the area of stochastic biological systems (Hellander 2008). Observations in this paper about the effect of the discrete nature of chemical reaction systems support and explain the observation of a smaller than expected performance gain in Hellander (2008). In future work, we will explore the effect of discontinuities on the array-RQMC method and its combination with τ -leaping, with both fixed and adaptive time stepping.
We also point out that the original article introducing QMC methods in 1951 by Richtmyer (1951) considered a discrete linear birth process. He observed a smaller performance gain than expected and this might have impeded the further exploration of QMC methods in stochastic simulation for a few decades. Richtmyer's results can now be understood to be caused by the unfortunate choice of his chosen model problem, which is discontinuous in nature.
A further topic of future research is the effective dimension in the simulation of the RTCR. The concept of effective dimension and techniques to reduce said dimension are widely studied in the context of financial applications and in the future we aim to explore its implications for the models of interest in biology. and New College, Oxford, for funding. Ruth E. Baker is a Royal Society Wolfson Research Merit Award holder and a Leverhulme Research Fellow, and also acknowledges the BBSRC for funding via Grant No. BB/R000816/1.

Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix A: Computational Effort to Generate Quasi-random Numbers
One should question whether the time taken to generate scrambled low-discrepancy sequences for an RQMC method is much longer than the time needed for pseudorandom numbers to be generated as this could void any observed performance gains. We therefore perform a small test to time the generation of the various random numbers. We time how long it takes to generate a point set of length N in s dimensions (averaged over 50 trials). Timing experiments were performed using MATLAB R2017b on an Ubuntu desktop PC with a 3.40 GHz Intel Core i7-2600K CPU and 16 GB of random access memory. We test the standard pseudo-random number generator (which uses the Mersenne Twister algorithm) versus Sobol' points with linear matrix scrambling and a random digital shift. The results are depicted in Fig. 14 and show that only for relatively small point sets the generation of pseudo-random numbers is distinctly faster than the Sobol' points (on the order of milliseconds). For point sets of lengths not uncommon in simulations (10 5 or more points), the difference is negligible. Therefore the completion time for an algorithm which has replaced pseudo-random numbers with low-discrepancy numbers will not differ noticeably. These findings agree with practical timing results for simulations of various financial applications in Lemieux (2009).

Appendix B: RQMC as a MC Variance Reduction Technique
An alternative look on RQMC is as a variance reduction technique within the standard MC framework as noted in L'Ecuyer (2016). After randomisation of the low-discrepancy point set, the estimator I (m) N becomes an unbiased estimator of the integral I in Eq. (12). The variance of the estimator can, by linearity, be written as In standard MC methods, the points v (i,m) used are independent and therefore the covariances are zero. For an RQMC method, however, this is not the case because of the deterministic construction of the points used. Note that this remains true despite the randomisation, because the point set as a whole still is a low-discrepancy set. In order to reduce the variance, one wants the contribution of the sum of covariances to be as negative as possible. This is attempted by RQMC methods through the construction of the points used in the quadrature and it places RQMC methods on equal footing with, for example, the standard variance reduction techniques of antithetic sampling and common random numbers (Lemieux 2009, Chapter 4).