Background

Biochemical systems frequently consist of reactions evolving on disparate timescales. The species regulated by fast reactions quickly equilibrate to a “quasi-steady-state (QSS)” [1], and hence these fast species can be assumed to be in a quasi-equilibrium that is dependent on the state of the slow species. This assumption allows one to eliminate the variables describing the fast species from deterministic models via non-elementary reaction functions. The deterministic quasi-state-state approximation (QSSA) can thus be used to reduce the dimensionality of a system and avoid stiffness in numerical simulations. QSSA has been widely used in both numerical and theoretical studies and its validity condition in deterministic models is well understood [111].

Timescale separation has also been used to reduce and accelerate simulations of stochastic models [1230]. The QSS of a fast species in the chemical master equation (CME) can be defined as the conditional average of the species which depends on the instantaneous state of the slow species [1618]. This approximation obviates the need to simulate fast reactions explicitly. However, to calculate the averages of the fast species, knowledge of the solution to the full CME is generally required, giving rise to a “chicken or the egg” problem – one introduces a reduced system to avoid solving the full system, but carrying out the reduction requires solving the full system. As an alternative, it has been proposed that one can approximate the needed averages using the QSS of the fast species obtained from the corresponding deterministic systems [16, 23, 24, 26]. For instance, Michaelis-Menten or Hill functions derived using the deterministic QSSA have been used as propensity functions in stochastic simulations – a method known as “stochastic QSSA”.

The validity of the stochastic QSSA relies on two assumptions: 1) the separation of timescales between the slow and fast reactions, and 2) the accurate approximation of the stochastic QSS (i.e. the conditional average of the fast species) by the deterministic QSS [30]. It is not well understood when these assumptions hold. In many previous studies, it has been assumed that the stochastic QSSA is accurate whenever the corresponding deterministic QSSA is accurate [3134]. However, recently introduced examples show that this may not always be true, as the reduced stochastic model may poorly approximate the full model even when their deterministic counterparts agree [27, 30, 35, 36]. Due to this discrepancy, previous studies concluded that the stochastic QSSA cannot be validated using the deterministic QSSA, leaving open the question of alternative validation methods. Nevertheless, the stochastic QSSA is still used widely in simulations of complex models that would otherwise be intractable [3134, 3747].

Here, we identify a clear correspondence between the validities of the deterministic and the stochastic QSSA for examples widely used in biological systems. This relation provides a simple and computationally inexpensive method for validating the stochastic QSSA. Specifically, we find that discrepancies between the stochastic and the deterministic QSSA stem from the fact that, due to the random fluctuations, the stochastic system explores a wider range of states than its deterministic counterpart. We provide an analysis of a two-state promoter model, and use numerical simulations for a variety of other models to show that the stochastic QSSA is accurate only when the deterministic QSSA is accurate over a range of initial conditions that cover the most likely states explored by the stochastic system. Our finding suggests that in many cases the validity of the stochastic QSSA can be checked post facto by examining the deterministic QSSA over a range of initial conditions obtained by simulating the reduced stochastic model.

Results and discussion

Discrepancy between stochastic and deterministic QSSA

We began investigating the relationship between the stochastic and the deterministic QSSA with a simple transcriptional negative feedback loop model (Fig. 1 a):

$$\begin{array}{*{20}l} \dot {M} &= \alpha_{M} D_{A} - \beta_{M} M, \end{array} $$
((1))
Fig. 1
figure 1

The relationship between the accuracy of the deterministic and the stochastic QSSA. a The diagrams for the full model (Eqs. 13) and the reduced model (Eq. 5). bc The deterministic QSSA is accurate when both k f =10−1 h −1 and k f =101 h −1 (the insets). However, the corresponding stochastic QSSA is accurate only when k f =101. The colored ranges and histograms represent a standard deviation of R from its mean and the distribution of R at steady state, respectively. Here, K D =10, α M =300h −1, β M =β R =1h −1. M(0)=R(0)=100 and D A (0)=0. de The phase plots of deterministic trajectories with various initial conditions when k f =10−1 (d) and k f =101 (e). The star and the square indicate initial conditions used in (b) and (c) and the insets of (d) and (e), respectively. The black circle indicates the fixed point. Here, we assumed M(0)=R(0)

$$\begin{array}{*{20}l} \dot {R} &= \alpha_{R} M - \beta_{R} R -k_{f} R D_{A} + k_{b} D_{R}, \end{array} $$
((2))
$$\begin{array}{*{20}l} \dot {D}_{A} &= -k_{f} R D_{A} + k_{b} D_{R}, \end{array} $$
((3))

where M, R, D A , and D R are the concentrations of mRNA, repressor, active DNA, and repressed DNA, respectively, measured relative to the total amount of DNA, D T ; α M and α R are the transcription and translation rates, respectively; k f and k r are the forward and reverse rates of repressor binding to DNA; and β M and β R are the degradation rates of mRNA and repressor, respectively. The total amount of DNA is conserved, and hence our choice of units implies that D A +D R =1.

If D A evolves faster than M and R, Eq. 3 equilibrates faster than Eqs. 1-2. Thus, D A can be assumed to be in steady state with respect to the instantaneous state of M and R [1, 48, 10]. The QSS of D A can be obtained by solving the QSS equation, \(\dot {D}_{A}=0\), giving

$$ D_{A} (R) = \frac {K_{D}}{K_{D}+R},\ $$
((4))

where K D =k b /k f . Equation 4 allows us to close the remaining equation (Eqs. 12) giving the reduced system (Fig. 1 a):

$$\begin{array}{*{20}l} \dot {M} &= \alpha_{M} \frac {K_{D}}{R + K_{D}} - \beta_{M} M,\\ \dot {R} &= \alpha_{R} M - \beta_{R} R.\ \end{array} $$
((5))

Note that different choices of k b and k f for Eqs. 13 result in the same reduction (Eq. 5), provided the ratio K D =k b /k f is fixed. We asked whether values of k f that lead to accurate deterministic reductions also provide accurate stochastic approximations. To address this question we varied k f while keeping K D =10 fixed. For both k f =10−1 and k f =101, the reduced model provides an accurate deterministic approximation to the full model for certain initial condition (insets of Fig. 1 b and c). However, the corresponding stochastic simulations of the reduced and the full model disagree when k f =10−1 (Fig. 1 b and c) (see ‘Methods’ for details of stochastic simulations). These results agree with previous studies showing that the accuracy of the deterministic reduced model does not guarantee the accuracy of stochastic simulations [27, 30, 35, 36].

Interestingly, the reduced deterministic model becomes inaccurate when initial conditions are changed for k f =10−1, but remains accurate over a wide range of initial conditions when k f =101 (Fig. 1 d and e). Thus, we hypothesized that the stochastic reduction is accurate if the deterministic reduction is valid over a range of initial conditions. We next investigated this hypothesis analytically and numerically.

A condition for an accurate stochastic QSSA

In the presence of timescale separation, deterministic solutions evolve in two phases: an initial transient (IT) phase and a QSS phase. The two phases correspond to times before and after the solutions are in QSS, respectively. During the IT phase the solution approaches the slow manifold defined by the QSS equation (e.g. the red dashed lines in Fig. 1 d and e). In this phase of the full model the “fast” variables have not equilibrated to their QSS. In the reduced model, however, fast variables are assumed to equilibrate instantaneously. Thus, the reduction is valid only when the “slow” variables change little during the IT phase [1, 4]. If this condition is satisfied, trajectories in the phase plane horizontally approach the slow manifold from their initial conditions (Fig. 1 d and e) [1, 3, 4].

The trajectories of the deterministic system evolve along the slow manifold during the QSS phase (Fig. 1 d and e). However, trajectories of the corresponding stochastic system fluctuate around the slow manifold. Therefore, any trajectory starting away from the slow manifold that does not horizontally relax onto it on average will contribute to the error in the stochastic QSSA. Thus, during the return of the fast variables to the slow manifold after a random fluctuation, the slow variables should change little. This leads to the hypothesis that for the stochastic QSSA to be accurate, the deterministic QSSA should be accurate for a range of initial conditions which contain the most likely fluctuations away from the slow manifold.

To investigate this hypothesis, following Segel and Slemrod [1], we estimated how much a slow variable (R) changes compared to its initial condition (R(0)) during the IT phase:

$$ \left|\frac{\Delta R}{R(0)}\right| \approx \frac{\beta_{R} / k_{f} +D_{A}^{\text{max}}}{R(0)+K_{D}}. \ $$
((6))

Here \(D_{A}^{max}\) is the maximum value of D A during the IT phase (See ‘Methods’ for details). This quantity should be small for the deterministic QSSA to be accurate. Equation 6 predicts that the error of the deterministic QSSA increases as either R(0) or k f decrease, in agreement with simulations (Fig. 1 d and e).

In previous work [30] we showed that the stochastic QSSA is accurate when (See ‘Methods’ for details):

$$ k_{f} \gg \beta_{R} \qquad \text{and} \qquad \frac{\text{Var}(D_{A})}{R+K_{D}} \ll1. \ $$
((7))

The first condition guarantees that the reactions regulating D A is fast and the second condition ensures that the stochastic QSS of D A (i.e. the conditional average of D A ) is accurately approximated by a deterministic QSS equation for D A (Eq. 4). Interestingly, the second condition is similar in form to the condition for timescale separation for the deterministic QSSA during the IT phases (Eq. 6). Thus, the conditions that imply the accuracy of the stochastic QSSA (Eq. 7) are satisfied if Eq. 6 is small for those values of R(0) that cover the range of states realized during the stochastic simulations. This supports our hypothesis that the stochastic QSSA is accurate when the deterministic QSSA is accurate for a range of initial conditions. This also explains why the parameter region for which the stochastic QSSA is valid is smaller than for the deterministic QSSA (Fig. 1 b and c).

A numerical method for testing the accuracy of the stochastic QSSA

Equations 6 and 7 predict that both the stochastic and the deterministic QSSA become increasingly accurate as k f or K D increases. We tested these predictions by numerically estimating the error of the deterministic QSSA using:

$$ \max_{\substack{\mathcal{N}}}\frac {{\int_{0}^{T}} \! \left|X_{\text{full}} (t) - X_{\text{QSSA}} (t)\right| dt}{{\int_{0}^{T}} \! \left|X_{\text{QSSA}} (t)\right| dt}, \ $$
((8))

where X full(t) and X QSSA(t) represent the solutions (R(t)) of the full (Eqs. 13) and the reduced model (Eq. 5), respectively. We chose the range of initial conditions (\(\mathcal {N}\)) to be three standard deviations from the expected values of the slow variables (M and R) at steady state of the reduced model (Red histograms in Fig. 1 b and c). The entire range of the fast variable (D A ) was included in \(\mathcal {N}\) because we cannot determine its expectation and standard deviation with the reduced model. Equation 8 provides a measure of the accuracy of the deterministic QSSA over the range of initial conditions that contains the likely stochastic perturbations from the steady state.

We used the relative error of the coefficient of variation at steady state to test the accuracy of the stochastic QSSA. Both Eq. 8 and the error of the stochastic QSSA decrease as k f and K D increase (Fig. 2), matching our analysis (Eqs. 67). This again shows the close relation between the error of deterministic and the stochastic QSSA. In more complex models an analytical approach may not be possible. For such models, the accuracy of the deterministic QSSA estimated numerically using Eq. 8 can inform the validity of the stochastic QSSA as described in the next section.

Fig. 2
figure 2

The numerically estimated error of deterministic and stochastic QSSA show strong correlation. ab The error of the stochastic QSSA is closely corelated to the error in the deterministic QSSA when k f (a) and K D (b) change. Here, the error in the deterministic QSSA is measured using the relative differences in deterministic solutions R(t) given in Eq. 8. The error of the stochastic QSSA is measured using the relative difference of coefficient of variation at steady state between the full and the reduced model. Note that since the error of stochastic and deterministic QSSA were measured in different ways, they can potentially differ, even by orders of magnitude. See Additional file 1: Figure S2 for the difference of the distributions of R between the stochastic QSSA and full model. The error bars were estimated with the boot strap method as described in ‘Methods’. See Fig. 1 for the details of parameters

The proposed numerical method for testing the accuracy of the stochastic QSSA with Eq. (8) is simple, but has limits. Importantly, we do not require stochastic simulations of the full model which can be computationally expensive. Hence, the distribution and the range of fluctuations of the fast species is not required to be known. Our method requires that the reduced deterministic system is accurately approximated the full deterministic system over a range of initial conditions that includes all plausible values of the fast species. This range of initial conditions must include most of the mass of the distribution of the fast species, but can be larger. Since we do not know the distribution of the fast species this could include initial conditions that are outside the fluctuation range of the fast species. Hence, our condition can be too restrictive, as we could require it to hold for initial conditions that are never visited by the stochastic system.

As an example, consider the unscaled model in Eqs. 13 (i.e. variables are not scaled by D T ). For fixed volume Ω, as D T increases the number of molecules of the fast species, D A , increases. This reduces fluctuations of D A and hence the range of the likely values of D A (Additional file 1: Figure S1A). Thus, as D T increases, requiring the deterministic QSSA to be accurate for all possible values of D A is a more restrictive condition than necessary to ensure the accuracy of the stochastic QSSA. As a result, when the error of the deterministic QSSA is estimated for a range of initial conditions that includes all possible values of the fast species, the error of the deterministic and stochastic QSSA shows a discrepancy as D T increases (Additional file 1: Figure S1B), and our method will give a false negative. This obvious limitation of our method can be resolved by estimating the distribution of D A at least approximately.

We also found that normalizing the fast species as we scaled the system with D T can help overcome the limitation of our method. In the scaled system (Eqs. 13 and Eq. 5), D T only appears implicitly in the parameter \(k_{f}=D_{T} k_{f}^{*}\) (\(k_{f}^{*}\) is the binding rate before scale) instead of directly affecting the DNA concentration. In the scaled system, our method predicts that as k f increases, the stochastic QSSA becomes more accurate (Fig. 2 a). This result can be used to predict that, since \(k_{f}=D_{T} k_{f}^{*}\) as either \(k_{f}^{*}\) or D T increases, the accuracy of the stochastic QSSA for the unscaled model would equally increase. This agrees with the numerical error analysis of the stochastic QSSA for the original model (Additional file 1: Figure S1C). In summary, the effect of D T on the accuracy of the unscaled model can be accurately captured by applying our method Eq. 8 to the scaled model. Hence, when testing the parameter dependence of the stochastic QSSA, normalization of the fast species can improve the reliability of our criterion using Eq. 8.

Key parameters for an accurate stochastic QSSA

Next, we tested whether our finding can explain the previously observed discrepancies between the stochastic and the deterministic QSSA [27, 30]. We began with a model of cooperative enzyme kinetics (Fig. 3 a) [5, 27]:

$$ \begin{aligned} \dot {S} &= k_{in} - k_{1} SE + k_{-1}E_{S} - k_{2}SE_{S} +k_{-2}E_{S_{2}}, \\ \dot {E_{S}} &= k_{1} SE - k_{-1}E_{S} - k_{2}SE_{S} +k_{-2}E_{S_{2}}+k_{p} E_{S_{2}}, \\ \dot {E}_{S_{2}} &= k_{2}SE_{S} - k_{-2}E_{S_{2}} - k_{p} E_{S_{2}},\\ \dot {P} &= k_{p} E_{S_{2}}.\ \end{aligned} $$
((9))
Fig. 3
figure 3

The deterministic QSSA can identify key parameters determining the accuracy of the stochastic QSSA. a The diagrams of the full model (Eqs. 9) and the reduced model (Eq. 10). bc) Whereas the deterministic QSSA is accurate (b), but the stochastic QSSA is inaccurate (c) with the same initial condition: \( S(0)=E_{S}(0)=E_{S_{2}}(0)=0\protect \phantom {\dot {i}\!}\). The deterministic QSSA becomes inaccurate with a different initial condition: S(0)=410 or 500, E S (0)=0, \(E_{S_{2}}(0)=1\) (the inset of (b)). The colored ranges and histograms represent a standard deviation of S from its mean and the distribution of S at steady state, respectively (c). The parameters of the model is adopted from Thomas et al. [27]: k in =0.5s −1, k −1=k −2=100s −1, k p =1s −1, \(K_{m_{1}}=2 \cdot 10^{6}\), \(K_{m_{2}}=0.101\). de Both the errors of the stochastic and the deterministic QSSA depends on k −1 (d), but not k −2 (e). The errors were measured as in Fig. 2. In particular, the error of the deterministic QSSA is estimated with Eq. 8, where T=4000 and X(t)=S(t) were used. See Additional file 1: Figure S3 for the distribution of S from the stochastic simulations

Here, the substrate (S) reversibly binds with enzyme (E) to form a complex (E S ). This complex can bind another substrate (S) to form a second complex (\(E_{S_{2}}\)), which dissociates to the first complex (E S ) and product (P). We scale all concentration relative to the total enzyme concentration, E T . The total enzyme concentration is constant, so that \(E+E_{S}+E_{S_{2}}=1\phantom {\dot {i}\!}\).

If E S and \(E_{S_{2}}\) equilibrate more quickly than S, and \(\frac {k_{-1}}{k_{1}} \gg \frac {k_{-2}+k_{p}}{k_{2}}\) leading cooperative substrate binding, then the full model can be reduced with QSSA [5, 8, 27] giving:

$$ \dot {S} = k_{in} - \frac{k_{p} S^{2}}{{K^{2}_{m}} + S^{2}}, \qquad \quad \dot {P} = \frac{k_{p} S^{2}}{{K^{2}_{m}} + S^{2}}, $$
((10))

where \({K^{2}_{m}} =\frac {k_{-1}}{k_{1}} \frac {k_{-2}+k_{p}}{k_{2}}\) (Fig. 3 a). Thomas et al. [27] showed that, in the deterministic case, the reduced model accurately captures the behavior of the full model (Fig. 3 b), but its stochastic equivalent with the same initial condition does not (Fig. 3 c). This discrepancy between the deterministic and the stochastic QSSA can be explained by observing that the deterministic QSSA becomes inaccurate for other initial conditions that correspond to fluctuations from the slow manifold in the stochastic system (see the inset of Fig. 3 b).

If we change k −1 and k −2 while fixing the values of \(K_{m_{1}}=\frac {k_{-1}}{k_{1}}\) and \(K_{m_{2}}=\frac {k_{-2}+k_{p}}{k_{2}} \), then \({K^{2}_{m}}\) in Eq. 10 does not change and the full model has the same reduction. We varied k −1 and k −2 while keeping \({K_{m}^{2}}\) fixed to determine when the reduced model is accurate. The error of the deterministic QSSA estimated with Eq. 8 decreases with the increase of k −1, but not k −2 (Fig. 3 d and e). This suggests that the error of the stochastic reduction will depend on k −1, but not on k −2. We confirmed this prediction in further stochastic simulations (Fig. 3 d and e). This illustrates how the key parameters determining the accuracy of the stochastic QSSA can be identified by examining the accuracy of the deterministic QSSA over an appropriate range of initial conditions.

Comparison of different stochastic QSSAs

Up to this point we have only considered the standard QSSA (sQSSA), but other versions of QSSAs have been proposed [4, 6, 7, 10, 12, 30]. Therefore, we next investigated whether the same relationship between the stochastic and the deterministic QSSA holds for other QSS reduction techniques, such as the total QSSA (tQSSA) [4, 6, 10] and the pre-factor QSSA (pQSSA) [7, 10, 12]. For illustration, we chose a transcriptional negative feedback loop model (Fig. 4 a) given by the system:

$$\begin{array}{*{20}l} \dot {M} &=\, \alpha_{M} D_{A} - \beta_{M} M, \end{array} $$
((11))
Fig. 4
figure 4

Different types of stochastic QSSA. a The diagram of full model (Eqs. 11-14) and two types of reduced models (Eqs. 17 and 18). b When an initial condition is on the limit cycle, only the stochastic tQSSA is accurate while both deterministic pQSSA and tQSSA are accurate (Additional file 1: Figure S4A). A detailed numerical error analysis with Fourier transform is provided in Additional file 1: Figure S5. c Even with an initial condition on the limit cycle (black circle), for which deterministic pQSSA is accurate (Additional file 1: Figure S4A), the stochastic trajectory escapes the limit cycle due to the fluctuation, where the deterministic pQSSA is inaccurate (e.g. black square) (Additional file 1: Figure S4B). The parameters of the model are adopted from [30]: α M =15.1745h −1, α P =α F =β M =β P =β F =1h −1, k b =50h −1, k f =200n M −1 h −1, D T =164.75n M

$$\begin{array}{*{20}l} \dot {P} &=\, \alpha_{P} M - \beta_{P} P, \end{array} $$
((12))
$$\begin{array}{*{20}l} \dot {R} &=\, \alpha_{R} P - \beta_{R} R -k_{f} R D_{A} + k_{b} D_{R}, \end{array} $$
((13))
$$\begin{array}{*{20}l} \dot {D}_{R} &= k_{f} R D_{A} - k_{b} D_{R} - \beta_{R} D_{R},\ \end{array} $$
((14))

where the transcription of mRNA (M) occurs proportional to active DNA (D A ) and M is translated to the protein (P), which transforms to the active repressor (R). The repressor reversibly binds with D A to form repressed DNA (D R ). The total DNA concentration D T =D A +D R is constant.

Our previous study shows that the tQSSA, but not the sQSSA, provides a valid reduction of the full model when binding and unbinding are fast [30]. For the tQSSA, we introduce a new variable, the total amount of repressor, TR+D R , and replace Eqs. 13 and 14 with

$$ \begin{aligned} \dot T &= \alpha_{R} P - \beta_{R} T, \\ \dot D_{R} &= k_{f} (T-D_{R}) D_{A} - k_{b} D_{R} - \beta_{R} D_{R}.\ \end{aligned} $$
((15))

Note that T does not depend on the fast reversible binding unlike F. By using D R =D T D A and solving the QSS equations for the fast species (\(\dot {D}_{R}=0\)), we can obtain the equilibrium values of D A in terms of T [30, 48]:

$$ D_{A} (T) = \frac{D_{T} - T - K_{d}+ \sqrt{\left(D_{T} - T - K_{d}\right)^{2} + 4D_{T} K_{d}}}{2}, $$
((16))

where K d =(k f +β R )/k b . With this equation, we can reduce the system (Fig. 4 a):

$$ \begin{aligned} \dot M =&\, \alpha_{M} D_{A} (T) - \beta_{M} M,\\ \dot P =&\, \alpha_{P} M - \beta_{P} P,\\ \dot T =&\, \alpha_{R} P - \beta_{R} T.\ \end{aligned} $$
((17))

Because the tQSS solution (Eq. 16) is less intuitive than the Michaelis-Menten-like form of the sQSS solution (Eq. 4), the reduced system can be transformed into a more intuitive form by expressing Eq. 17 in terms of the original free protein variable, R. Using \(D_{A}(R)= \frac {D_{T} K_{d}}{R + K_{d}} \) and \(\dot {T}=\frac {\partial T}{\partial R} \dot {R}\), we obtain:

$$ \begin{aligned} \dot {M} =&\, \alpha_{M} \frac {D_{T} K_{d}}{R + K_{d}} - \beta_{M} M,\\ \dot {P} =&\, \alpha_{P} M - \beta_{P} P,\\ p(R) \dot {R} =&\, \alpha_{R} P - \beta_{R} \left(R + \frac {D_{T} R}{R + K_{d}}\right),\ \end{aligned} $$
((18))

where \(p(R) \equiv \frac {\partial T}{\partial R} = \frac {\partial R}{\partial R}+\frac {\partial D_{R}}{\partial R}=1+ \frac {D_{T} K_{d}}{(R + K_{d})^{2}}\) (Fig. 4 a). Due to the pre-factor (p(R)) in Eq. 18, this reduction is known as pQSSA [7, 10, 12]. We have shown previously [30] that both the tQSSA and the pQSSA accurately approximate sustained oscillations of the full deterministic model (Additional file 1: Figure S4A). However, only the tQSSA provides an accurate approximation to the full stochastic model (Fig. 4 b and Additional file 1: Figure S5).

The discrepancy between the stochastic and the deterministic pQSSA can again be explained by examining the initial transient for a range of initial conditions (Fig. 4 c). In the deterministic case, initial conditions on the limit cycle (e.g. the black circle on blue loop in Fig. 4 c) lead to solutions that are accurately approximated using the pQSSA (Additional file 1: Figure S4A). However, initial conditions off the limit cycle (e.g. the black square in Fig. 4 c), the deterministic pQSSA becomes inaccurate (Additional file 1: Figure S4B). We thus expect the stochastic pQSSA to fail as well even for initial conditions on limit cycle (Fig. 4 b). For a stochastic reduction to be accurate, the corresponding reduced deterministic model needs to agree with the full model in a neighborhood of the limit cycle – a neighborhood that contains most of the mass of the stationary distribution of the stochastic system. This indicates that accurately approximating limit cycle period or amplitude – the focus of most previous deterministic models [5, 7, 8, 48, 49] – is not sufficient to guarantee an accurate stochastic approximation [50]. It is also necessary to check whether the deterministic model captures global dynamical features around the neighborhood of limit cycle [5154].

Both the deterministic pQSSA and tQSSA have been developed to improve the accuracy of sQSSA [4, 6, 7, 10]. The deterministic tQSSA broadens the range of initial conditions over which the approximation is valid during both the IT phase and QSS phase [4, 6, 10]. On the other hand, the transformations required for obtaining the pQSSA assume that the fast variables are initially in QSS (note \(D_{A}(R)= \frac {D_{T} K_{d}}{R + K_{d}} \) was used) [7, 10]. Therefore, the pQSSA is not necessarily valid during the IT period unlike tQSSA. This explains why the stochastic tQSSA, but not the stochastic pQSSA, is more accurate than the stochastic sQSSA [23, 24, 30].

Stochastic QSSA with composite reductions

We next investigated whether our findings can be extended to more complex models that contain multiple non-elementary functions. We considered the transcriptional negative feedback loop model with enzymatic degradation (Fig. 5 a) can be described using the following equations [27]:

$$ \begin{aligned} \dot {M} &= \alpha_{M}(D+D_{R}) -\beta_{M} M,\\ \dot {R} &= \alpha_{R} M -k_{1} RD + k_{-1}D_{R} -k_{2} R D_{R} \\ &\quad+k_{-2}D_{R_{2}} - k_{3}ER + k_{-3}E_{R}, \\ \dot {D} &= -k_{1} RD + k_{-1}D_{R}, \\ \dot {D_{R}} &= k_{1} RD - k_{-1}D_{R} -k_{2} RD_{R} + k_{-2}D_{R_{2}}, \\ \dot {E_{R}} &= k_{3}ER - k_{-3}E_{R} -k_{4}E_{R}. \end{aligned} $$
((19))
Fig. 5
figure 5

The deterministic QSSA can inform the validity of the stochastic QSSA with composite reduction. a The diagram of the full model (Eqs. 19) and the reduced model (Eqs. 20). b The three different types of reduced models (ER, DR and EDR) (Additional file 1: Figure S6) accurately approximate the deterministic simulations of full model with an initial condition, which will be used for stochastic simulation: M(0)=R(0)=E R (0)=G(0)=G R (0)=0. However, only ER model accurately approximates the full model if R(0)=100. ce The stochastic simulation of only ER model is accurate for an initial condition with which all three types of reduced model are deterministically accurate (Fig. 5b). The colored ranges and histograms represent a standard deviation of R from its mean and the distribution of R at the steady state. The parameters of the model are adopted from [27]: α M =50s −1, β M =k R =k 4=1s −1, k −1=k −2=k −3=10s −1, k 1=10−5 s −1, k 2=102 s −1, K M =110, D T =0.01

The rate of mRNA (M) transcription is proportional to the concentration of unbound DNA (D), and DNA bound with one protein (D R ). DNA bound with two proteins (\(D_{R_{2}}\)) is reposed status. mRNA is translated to protein (R), which reversibly binds with D and D R to form D R and \(D_{R_{2}}\), respectively. The protein also reversibly binds with the enzyme (E) to form a complex (E R ), which decays. \(D_{T} = D+D_{R}+D_{R_{2}}\phantom {\dot {i}\!}\) and E T =E+E R are constant. If E R , D, and D R equilibrate faster than M and R, and \(\frac {k_{-1}}{k_{1}} \gg \frac {k_{-2}}{k_{2}}\), the model can be reduced to [27]):

$$ \dot {M} = \frac{\alpha_{M} D_{T} {K^{2}_{D}}}{{K^{2}_{D}} +R^{2}} - \beta_{M}M, \quad \dot {R} = k_{R} M - \frac{k_{4} E_{T} R}{K_{M} + R}, $$
((20))

where K M =(k −3+k 4)/k 3 and \({K^{2}_{D}} =k_{-1}k_{-2}/k_{1}k_{2}\) [27]. The reduced model is obtained through composite reductions resulting in two non-elementary terms: the Hill function, \( \frac {\alpha _{M} D_{T} {K^{2}_{G}}}{{K^{2}_{D}} +R^{2}},\) describing transcriptional repression and the Michaelis-Menten function, \(\frac {k_{4} E_{T} R}{K_{M} + R},\) describing enzymatic degradation.

The reduced stochastic model is inaccurate [27]. To investigate whether a particular step in the composite reduction leads to this inaccuracy, we compared four models: the full model, the model with reduced enzymatic degradation (ER), the model with reduced transcriptional repression (DR) and the fully reduced model obtained using both reductions (EDR) (Fig. 5 a and Additional file 1: Figure S6). The solutions of these three deterministic reductions agree for a particular initial condition, which will be used for the stochastic simulation (Fig. 5 b). However, when we varied the initial condition, only the ER model accurately approximated the full deterministic model (Fig. 5 b). This suggests that an accurate stochastic reduction needs to contain an elementary representation of the repressor to DNA binding process. Indeed, only the stochastic ER model provided an accurate approximation (Fig. 5 c-e). The deterministic QSSA pointed to the reduction step which caused the inaccuracy in the stochastic QSSA. This suggests that our theory can identify the reduction steps that lead to a valid approximation (e.g. ER model).

Stochastic QSSA with an unbounded fast variable

Finally, we investigate an example in which the values of the fast variable are not bounded. Namely, consider the constitutive transcription of first mRNA and then translation of protein described by the deterministic system:

$$ \dot {M} = \alpha_{M} - k_{M} M, ~~\dot {P} = \alpha_{P} M - k_{P} P. $$
((21))

If k M k P , M rapidly equilibrates to its QSS (α M /k M ) and the reduced model becomes

$$\begin{array}{@{}rcl@{}} \dot {P} &=& \frac{\alpha_{M} \alpha_{P}}{k_{M}} - k_{P} P. \end{array} $$
((22))

Since the system is linear, the exact variance of P at the steady state can be calculated [27, 55]:

$$\begin{array}{@{}rcl@{}} \sigma^{2}_{\text{full}}=\langle P\rangle\left(\frac{\alpha_{P}/k_{M}}{1+k_{P}/k_{M}}+1\right), ~~\sigma^{2}_{\text{QSSA}}=\langle P\rangle. \end{array} $$

where 〈P〉 is the average of P at the steady state (α M α P /k M k P ).

Previous studies note that even if k P /k M ≪1, which ensures the accuracy of the deterministic QSSA, the stochastic QSSA has an error that depends on α P /k M [27, 55]. This discrepancy can also be explained by our finding: the accuracy of the stochastic QSSA depends not only on the accuracy of the deterministic QSSA, but also on the range of fluctuations in the full stochastic system. To see how this applies to the present example, assume that k M and k P are fixed and k P /k M ≪1. Then, as long as α M α P is fixed, the full model (Eq. 21) always has the same reduction (Eq. 22), and the differences between trajectories of the full model and the reduced deterministic model remain the same for different initial values of normalized M(0) by the steady state of M(α M /k M ) (Fig. 6 a). However, a larger α P (and, accordingly, a smaller α M ) results in a lower equilibrium concentration, and hence larger fluctuation in the normalized M (See Fig. 6 b, the coefficient of variation of M is \(\sqrt {k_{M}/\alpha _{M}}\)). This means that we need to require the deterministic reduction to be accurate for a larger range of initial conditions of the normalized M, and, in particular, small M(0)/(α M /k M ) when α P is large (Fig. 6 a and b). Thus, while α P does not affect the accuracy of deterministic QSSA, it does impact the error of the stochastic QSSA.

Fig. 6
figure 6

A model with an unbounded fast variable. a The trajectories of the full model (Eq. 21) and the reduced deterministic model (Eq. 22) during the IT period are the same for the same initial values of normalized M(0) by the steady state of M (α M /k M ) when α M =1,α P =10 and when α M =10,α P =1. The difference between the trajectories increases as M(0) decreases. Here, k M =1,k P =0.1. b When α M =1,α P =10, fluctuation in the normalized M by its steady state is larger than for α M =10,α P =1. c When k P =0.01, the relative differences between the trajectories of the full and reduced deterministic model during the IT period are smaller than when k P =0.1 (c.f. panel a)

As k P decreases, the deterministic QSSA becomes more accurate (Fig. 6 c), but the difference between \(\sigma ^{2}_{\text {full}}\) and \(\sigma ^{2}_{\text {QSSA}}\) increases, apparently contradicting our claim. However, unlike above, the change in k P leads to different reductions with different scales (Eq. 22) and therefore the accuracy of the reduction should be compared in a relative sense (Fig. 6 a, c). Indeed, when we compare the relative accuracy of the stochastic QSSA (the coefficient of variation), we see that :

$$\begin{array}{@{}rcl@{}} \frac{\sigma_{\text{full}}-\sigma_{\text{QSSA}}}{\langle P\rangle} \approx \frac{\sqrt{\alpha_{P}/k_{M}+1}-1}{ \sqrt{\alpha_{M}\alpha_{P}/k_{P}k_{M}}} \propto \sqrt{k_{P}}, \end{array} $$

where we used k M k P in the first approximation. Hence, the relative error of the stochastic QSSA decreases with decreasing k P , as does the relative error of the deterministic QSSA.

This example shows that the deterministic and stochastic QSSA are related as described earlier, even when the fast variable is not bounded. However, applying the simple numerical method (Eq. 8) to this example is not possible since it would require one to test the accuracy of deterministic QSSA for all possible values of the fast variables, a point we consider further in the Conclusions.

Conclusions

Numerous methods have been developed to accelerate and simplify stochastic simulations [1230]. The stochastic QSSA is the most widely used reduction technique due to its simplicity and general applicability [3134, 3747], but its validity is rarely justified rigorously [30]. Often it is tacitly assumed that the stochastic QSSA is accurate whenever its deterministic counterpart is valid [3134]. However, recent counterexamples have brought this assumption into question [27, 30, 35, 36]. Here, we demonstrated a clear relationship between the accuracy of the two reductions. Our analysis and simulations reveal that the stochastic QSSA is valid if the deterministic QSSA is accurate over a range of initial conditions that include the most probable fluctuations (Figs. 16). If the deterministic QSSA is not accurate in this neighborhood, the stochastic QSSA will fail. On the other hand, if the deterministic QSSA is accurate regardless of initial conditions, the stochastic QSSA will be accurate.

We have discussed the relationship between the accuracy of the deterministic and the stochastic QSSA using common examples (e.g. Michaelis-Menten and Hill kinetics). Based on these examples, we conjecture that a similar relation holds more generally. We leave a full theoretical investigation of this conjectured relationship for future work. Thomas et al. [56] used a projector operator technique to show that, if fast reactions do not affect slow species – a condition which is not generally satisfied when the QSSA can be applied – then the propensities derived from the deterministic QSSA can be used to provide an accurate linear noise approximation. This is consistent with our result showing that the tQSSA, which holds when slow species are insulated from fast reactions, leads to a more accurate stochastic QSSA (Fig. 4). Furthermore, we considered more general cases in which the fast reactions affect the slow species, which is common in QSSA reductions (Figs. 1, 3 and 5). Our work indicates that if fast reactions do affect slow species the stochastic QSSA is more accurate when the slow species do not change much during the IT period. This indicates that the theoretical results of Thomas et al. [56] can be generalized to the case when fast reactions do affect slow species.

Based on the above relationship, we provide a simple numerical method (Eq. 8) for testing the validity of stochastic reductions that include non-elementary propensity functions (Fig. 7). In applying this method, we assume that the distribution and the range of fluctuations of the fast species is not known. Without this knowledge, we assume that fast variables can vary over all possible states, a range that may be much larger than that covered by their actual fluctuations. As a result, our method is more conservative than necessary and can produce false negatives. Our method can be improved by identifying a plausible range of initial conditions that need to be tested by estimating the actual fluctuations in the fast variables via analysis (e.g. linear noise approximation) or numerical simulation (Fig. 6 b). Our approach works better if one normalizes the fast variables (Additional file 1: Figure S1), which can help reduce the parameter dependence of the fast variables’ fluctuation range.

Fig. 7
figure 7

Procedure for validating the stochastic QSSA with Eq. 8. Step 1. Perform stochastic simulations with the reduced model. Step 2. From these stochastic simulations, estimate a range of initial conditions for the slow variables. For the fast variables, use all possible states for their range of initial conditions. Step 3. Using these ranges of initial conditions for the slow and fast variables, test the accuracy of deterministic QSSA. If the deterministic QSSA is accurate for all of these initial conditions, the stochastic QSSA is accurate

Furthermore, to avoid stochastic simulations of the full model, we proposed to test the accuracy of the deterministic QSSA over a range of values for the slow variable determined by simulations of the reduced model (Fig. 7). It is possible that the accuracy of the deterministic QSSA inside this range does not imply validity outside the range. In such cases our method can lead to false positives and suggest the stochastic QSSA is accurate when it is not. A singular perturbation analysis of the accuracy of the deterministic QSSA can help to avoid this problem [13, 9, 11]. For instance, Eq. 6 shows that the accuracy of the deterministic QSSA across an arbitrary range of initial values of the slow variable (R(0)) is determined by the same condition. In such cases, false positives can be avoided (Fig. 2).

Diverse software packages for simulating stochastic biochemical systems are available. A common issue with all these packages is how to deal with non-elementary reaction functions. Some packages, such as STOCKS [57], require users to convert non-elementary reaction functions to elementary ones. Such simulations can be prohibitively slow. Thus other packages, such as COPASI [58], StochSS (http://www.stochss.org/) and Cain [59] allow the use of non-elementary propensity functions. The resulting simulations are faster, but may be inaccurate. Our method can provide a quick way to guarantee when such simulations can be trusted (Fig. 7). Because this method is based on our finding of a relative relationship between the error of the deterministic and stochastic QSSA (Figs. 2 and 3 d-e), it could be improved if a more direct relationship can be derived.

Our work could also be extended to more general singularly perturbed systems such as reaction diffusion system [3, 60, 61]. The relation we found could also provide a bridge between theories of singularly perturbed deterministic systems [2, 3, 9, 11], and their stochastic counterparts.

Methods

Stochastic simulation

All stochastic simulations were performed with the Gillespie algorithm [62]. The propensity functions of both elementary and non-elementary reactions were obtained by substituting \( X=\frac {n_{X}}{\Omega }\) to the macroscopic reaction rates (Additional file 1: Table S1). X and n X represent concentration and the number of species, respectively. Ω is the volume of system. We used Ω=1 unless it is specified. The variance and expectation values of species were estimated with 106 simulations (Figs. 2 and 3 de). The standard deviation of errors in Figs. 2 and 3 de were estimated with the bootstrap method with 100 block length.

Validity condition of deterministic QSSA

The deterministic QSSA of the model (Eq. 5) is accurate when a slow variable, R, changes little while a fast variable, D A , approaches its QSS in the model (Eqs. 13). Let us estimate how much R changes while D A approaches its QSS. To follow the estimation method of Segel and Slemrod [1], we consider the case when R decays from its initial condition. The timescale of D A during the IT period (RR(0)) can be estimated by

$$ t^{-1}_{D_{A}}=\left| \frac{\partial \dot{D}_{A}}{\partial D_{A}} \right|=k_{f} R (0) +k_{b} \ $$

Then, the relative change of R during the IT period becomes

$$\begin{array}{@{}rcl@{}} \left| \frac{\Delta R}{R(0)} \right| &\approx& \left| \frac{\dot{R}^{\text{max}}}{R(0)} \right| t_{D_{A}} \ \\ &\approx&\left| \frac{\beta_{R} R(0)+ k_{f} R(0)D_{A}^{\text{max}}}{R(0)} \right| \frac{1}{k_{f} R (0) +k_{b}}\ \\ &=&\frac{\beta_{R}/k_{f}+ D_{A}^{\text{max}}}{R(0)+K_{D}}, \end{array} $$
((23))

where \( \dot {R}^{max}\) and \( D_{A}^{\text {max}}\) represent the maximum of \( \dot {R}\) and D A during the IT period, respectively. Note that when we derived \(\dot {R^{\text {max}}} \approx \beta _{R} R(0)+k_{f} R(0) D_{A}^{\text {max}}\), we assumed that mRNA evolves much more slowly than D A during the IT period. Equation 23 estimates the maximal relative change of R during the IT period.

Validity condition of stochastic QSSA

Our previous study showed that the stochastic QSSA of the model (Eq. 5) is accurate when: 1) binding and unbinding between R and D A are faster than other reactions; and 2) the QSS of D A of the deterministic and stochastic systems are similar [30]. When the first condition is satisfied, it was shown that replacing the fast variable D A with its QSS of the stochastic system leads to an accurate reduction [1618, 29]. Thus, when the second condition is satisfied, the stochastic QSSA becomes accurate because the stochastic QSSA uses the QSS of D A driven with the deterministic system to replace the fast variables. For the first condition, we need k f β R =β M because K D =k b /k f is fixed in this study. For the second condition, the relative difference between the QSS of the deterministic (D A (R)) and stochastic (\( \left <D_{A}\right >\)) systems should be small. A theorem provided in our previous study (p. 788 of [30]) shows that the difference mainly depends on the Fano factor of the fast variables (Var(D A )/D A (R)) and the sensitivity of the QSS solution (d D A (R)/d R). Since the error stems from ignoring the effect of fluctuations of the fast species on the slow species ([27, 56]), the error depends on the Fano factor of fast variables – and this error is magnified by the sensitivity of the QSS solution:

$$\begin{array}{@{}rcl@{}} \left| \frac{D_{A}(R)-\left<D_{A}\right>}{D_{T}} \right| &\approx& \frac{1}{D_{T}} \frac{\text{Var}(D_{A})}{D_{A}(R)} \left| \frac{dD_{A}(R)}{dR} \right| \\\ &=& \text{Var} \left(\frac{D_{A}}{D_{T}}\right) \frac{D_{T}}{R+K_{D}} \end{array} $$
((24))

The second equality comes from Eq. 4. Under nondimensionalization, where D T =1, Eq. 24 becomes Eq. 7.

Availability of supporting data

No supporting data sets are associated with this manuscript.