Holimap: an accurate and efficient method for solving stochastic gene network dynamics

Jia, Chen; Grima, Ramon

doi:10.1038/s41467-024-50716-z

Article
Open access
Published: 02 August 2024

Volume 15, article number 6557, (2024)

Abstract

Gene-gene interactions are crucial to the control of sub-cellular processes but our understanding of their stochastic dynamics is hindered by the lack of simulation methods that can accurately and efficiently predict how the distributions of gene product numbers vary across parameter space. To overcome these difficulties, here we present Holimap (high-order linear-mapping approximation), an approach that approximates the protein or mRNA number distributions of a complex gene regulatory network by the distributions of a much simpler reaction system. We demonstrate Holimap’s computational advantages over conventional methods by applying it to predict the stochastic time-dependent dynamics of various gene networks, including transcriptional networks ranging from simple autoregulatory loops to complex randomly connected networks, post-transcriptional networks, and post-translational networks. Holimap is ideally suited to study how the intricate network of gene-gene interactions results in precise coordination and control of gene expression.

Introduction

Genetic regulation occurs through intricate interactions between a number of genes^1,2,3,4. A gene “X” may express a protein which acts as a transcription factor (TF), promoting or inhibiting the RNA polymerase assembly on another target gene “Y” (or on itself) and thus regulating the extent to which the latter is expressed⁵. These gene-gene interactions can be simply visualized as a directed graph with the genes being the nodes (vertices) and the directed edges (links) representing the interactions^6,7. Networks inferred from gene expression data, commonly called gene regulatory networks⁸, have been reconstructed by several methods^9,10,11,12,13. The complex connectivity of these networks makes intuitive understanding of their dynamics challenging. Consequently, the construction, mathematical analysis, and simulation of models of gene regulatory networks are indispensable tools in a quantitative biologist’s arsenal.

Several formalisms have been employed to predict gene regulatory network dynamics, including Boolean networks, ordinary differential equations (ODEs), and chemical master equations (CMEs)—for reviews covering these approaches and more, please see refs. ^14,15. These approaches have various advantages and disadvantages. In Boolean networks, the expression of each gene is tracked by a binary variable and hence large networks can be examined in a computationally efficient way. A more refined description is provided by the use of ODEs, where the time-dependent concentrations of RNAs, proteins, and other molecules are predicted as a function of the rate constants of the reactions in the network^16,17. An even more realistic description makes use of the CME approach where one predicts not only the mean expression levels of various genes but also the distributions of the discrete numbers of mRNAs and/or proteins measured across a population of cells¹⁸. This stochasticity has various sources (biological intrinsic and extrinsic noise, and technical noise introduced by experimental protocols), all of which lead to the large differences in gene expression observed from one cell to another^19,20,21.

Unfortunately, with an increasing level of sophistication and predictive power, simulations also rapidly become computationally expensive. Unraveling the stochastic dynamics of gene networks requires solving a set of coupled CMEs for the probability of the system being in each possible state. Since the number of states of a gene network is typically infinite, direct solution of these equations is impossible. The finite-state projection algorithm (FSP)²² truncates the infinite state space to a finite one; this renders numerical solutions possible because we only need to solve a finite-dimensional CME. However, the immense number of states limits its applications to very small networks with one or two interacting genes. For larger networks with multiple interacting genes, Monte Carlo simulations based on the stochastic simulation algorithm (SSA)²³ become more practical. Specifically, given the current state of the system, the SSA generates two random numbers to predict the time when the next reaction event occurs and which particular reaction event will occur. The output is a number of statistically correct trajectories (molecule number versus time data), one for each cell, from which the copy number distributions of all biochemical species can be calculated. However, the issue remains that a large sampling size is typically required to obtain smooth distributions and hence the computational time can still be very considerable. For an introduction to simulation methods in stochastic biology, we refer the reader to refs. ^24,25,26.
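The two-random-number step of the SSA described above can be sketched in a few lines. This is a minimal illustration for a simple birth-death process (constant synthesis at rate `rho`, degradation at rate `d` per molecule), not one of the gene networks studied in this paper; the function name and parameter values are assumptions chosen for the example.

```python
import numpy as np

def ssa_birth_death(rho, d, t_end, rng, n0=0):
    """One SSA (Gillespie direct-method) trajectory of
    0 -> P (rate rho) and P -> 0 (rate d * n).
    Returns the molecule number at time t_end."""
    t, n = 0.0, n0
    while True:
        a_birth, a_death = rho, d * n          # propensities in the current state
        a_total = a_birth + a_death
        t += rng.exponential(1.0 / a_total)    # random number 1: time to next event
        if t > t_end:
            return n
        if rng.random() * a_total < a_birth:   # random number 2: which reaction fires
            n += 1
        else:
            n -= 1

# Illustrative values (assumed): one trajectory per "cell".
rng = np.random.default_rng(0)
samples = np.array([ssa_birth_death(20.0, 1.0, 8.0, rng) for _ in range(500)])
hist = np.bincount(samples) / samples.size     # empirical copy-number distribution
```

With only a few hundred trajectories the histogram `hist` is visibly rough even though the sample mean is already close to its expected value, which is the sampling issue noted in the text.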

In this paper, we overcome the difficulties of conventional stochastic simulation methods for gene networks by devising an efficient approach—the high-order linear-mapping approximation (Holimap). The basic idea is to map the dynamics of a complex gene network with second or higher-order interactions (a system with nonlinear propensities and hence a nonlinear network) to the dynamics of a much simpler system where all reactions are first-order (a linear network). The reaction rates of this system are generally time-dependent and complex functions of the reaction rates of the original gene network and they are found by conditional moment-matching. The linear network has a much smaller state space than the nonlinear network which means that now simulation using FSP becomes feasible, leading to smooth distributions of protein numbers in a fraction of the time taken by SSA simulations. For an illustration of Holimap see Fig. 1.

**Fig. 1: Illustration of Holimap and its advantages over the SSA.**

The paper is structured as follows. The Holimap method is introduced by means of a simple autoregulatory feedback loop example where we show step-by-step how the approximation is constructed when second or higher-order interactions are only between a protein and a gene. The method is then extended to show the application to more complex networks with multiple protein-gene interactions and also to networks with gene product interactions such as those with RNA-RNA, RNA-protein, and protein-protein high-order reactions. By comparison with the SSA or FSP, we show that independent of the type of interactions in a gene network, Holimap provides highly accurate time-dependent distributions of protein or mRNA numbers over large swathes of parameter space including those regions where the system displays oscillatory or multistable dynamics. Finally, we show that the computation time of Holimap can be significantly reduced while maintaining its accuracy by devising a hybrid method which combines both Holimap and the SSA.

Results

Fundamental principles of Holimap illustrated by an autoregulation example

Consider a simple autoregulatory feedback loop^27,28, whereby protein expressed from a gene regulates its own transcription (Fig. 2a). Feedback is mediated by cooperative binding of h protein copies to the gene^29,30,31,32. In agreement with experiments³³, protein synthesis is assumed to occur in bursts of random size k sampled from a geometric distribution with parameter p, i.e., ${\mathbb{P}}(k=n)={p}^{n}(1-p)$. Here σ_b is the binding rate of protein to the gene; σ_u is the unbinding rate; ρ_b and ρ_u are the burst frequencies of protein, i.e., the frequencies with which bursts are produced, when the gene is in the bound and unbound states, respectively; d is the rate of protein degradation and dilution (due to cell division). The reaction system describes a positive feedback loop when ρ_b > ρ_u (since in this case, binding of a protein increases its own expression) and describes a negative feedback loop when ρ_b < ρ_u (binding of a protein decreases its own expression).

**Fig. 2: Holimaps for autoregulatory gene networks in steady-state conditions.**

Let p_i,n denote the probability of having n protein copies in an individual cell when the gene is in state i with i = 0, 1 corresponding to the unbound and bound states, respectively. To proceed, let ${g}_{i}={\sum }_{n=0}^{\infty }{p}_{i,n}$ be the probability of observing the gene in state i and let ${\mu }_{m,i}=\mathop{\sum }_{n=0}^{\infty }n(n-1)\cdots (n-m+1){p}_{i,n}$ be the mth factorial moment of protein numbers when the gene is in this state. For simplicity, we first focus on the case of non-cooperative binding (h = 1). From the CME, it is straightforward to obtain the following time evolution equations for the moments:

$$\begin{aligned}{\dot{g}}_{0}&={\sigma }_{u}{g}_{1}-{\sigma }_{b}{\mu }_{1,0},\\ {\dot{\mu }}_{1,0}&={\rho }_{u}B{g}_{0}-d{\mu }_{1,0}+{\sigma }_{u}({\mu }_{1,1}+{g}_{1})-{\sigma }_{b}({\mu }_{2,0}+{\mu }_{1,0}),\\ {\dot{\mu }}_{1,1}&={\rho }_{b}B{g}_{1}-d{\mu }_{1,1}-{\sigma }_{u}{\mu }_{1,1}+{\sigma }_{b}{\mu }_{2,0},\\ {\dot{\mu }}_{2,0}&=2{\rho }_{u}B({\mu }_{1,0}+B{g}_{0})-2d{\mu }_{2,0}\\ &\quad+{\sigma }_{u}({\mu }_{2,1}+2{\mu }_{1,1})-{\sigma }_{b}({\mu }_{3,0}+2{\mu }_{2,0}),\\ {\dot{\mu }}_{2,1}&=2{\rho }_{b}B({\mu }_{1,1}+B{g}_{1})-2d{\mu }_{2,1}-{\sigma }_{u}{\mu }_{2,1}+{\sigma }_{b}{\mu }_{3,0},\end{aligned}$$

(1)

where g₁ = 1 − g₀ and B = 〈k〉 = p/(1 − p) is the mean protein burst size, i.e., the mean number of protein molecules produced in a single burst. For clarity, we have suppressed the explicit time dependence of all moments. Note that this system of equations is not closed, i.e., the equation for a moment of a certain order depends on moments of higher orders, and hence an exact solution is generally impossible. This difficulty stems from the nonlinear dependence on molecule numbers of the bimolecular propensity modeling protein-gene interactions³⁴.
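The burst terms in Eq. (1) rely on two properties of the geometric burst-size distribution: its mean is B = p/(1 − p), and its second factorial moment is 〈k(k − 1)〉 = 2B², which is where the factor 2ρB(μ + Bg) in the second-order equations comes from. The short numerical check below is a sketch with an arbitrarily chosen value of p; the variable names are illustrative.

```python
import numpy as np

p = 0.6                               # assumed burst parameter, for illustration only
k = np.arange(0, 500)                 # burst sizes; the tail beyond 500 is negligible
pk = p**k * (1 - p)                   # P(k = n) = p^n (1 - p)

B = p / (1 - p)                       # mean burst size quoted in the text
mean_k = (k * pk).sum()               # numerical <k>
fact2_k = (k * (k - 1) * pk).sum()    # numerical <k(k-1)>

print(mean_k, B)                      # the two values agree
print(fact2_k, 2 * B**2)              # the two values agree
```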

In contrast, a linear gene network (one composed of only first-order reactions, i.e., the propensity of each reaction has a linear dependence on molecule numbers) is much easier to solve both analytically and numerically than a gene network with nonlinear propensities; for example, the moment equations are closed and thus can be solved exactly in this case. A basic idea of the linear-mapping approximation (LMA) developed in ref. ³⁵ is to transform a complex nonlinear network into a linear one by replacing all second or higher-order reactions between proteins and genes by effective first-order reactions. Specifically, for the network in Fig. 2a, we replace the reactions G + hP ⇌ G^* by G ⇌ G^*. The LMA maps the nonlinear network to the linear one shown in Fig. 2b, where the binding rate σ_b for the former is replaced by the effective gene switching rate ${\hat{\sigma }}_{b}$ for the latter, while the other parameters remain unchanged. In the LMA, ${\hat{\sigma }}_{b}$ is chosen to be σ_b multiplied by the conditional mean of protein numbers in the unbound gene state, i.e.,

$${\hat{\sigma }}_{b}={\sigma }_{b}\langle n| i=0\rangle=\frac{{\sigma }_{b}{\mu }_{1,0}}{{g}_{0}},$$

(2)

where g₀ and μ_1,0 can be calculated by a natural moment-closure method (“Methods”)³⁵. There are two approximations involved in the LMA: (i) in reality, the effective parameter ${\hat{\sigma }}_{b}$ should be stochastic rather than deterministic since it is proportional to the instantaneous protein number in the unbound state; (ii) any moment-closure method inevitably leads to some errors³⁶.

Next we propose an efficient method—Holimap, which we will show to perform much better than the LMA. There are two types of Holimaps. The first type is the 2-parameter Holimap (2-HM) which transforms the nonlinear gene network into the linear one illustrated in Fig. 2c, where both the binding and unbinding rates σ_b and σ_u for the former are replaced by the effective gene switching rates ${\widetilde{\sigma }}_{b}$ and ${\widetilde{\sigma }}_{u}$ for the latter. The remaining question is how to determine ${\widetilde{\sigma }}_{b}$ and ${\widetilde{\sigma }}_{u}$ so that the solution of the linear network accurately approximates that of the nonlinear one. For the linear network, the evolution of moments is governed by

$$\begin{aligned}{\dot{g}}_{0}&={\widetilde{\sigma }}_{u}{g}_{1}-{\widetilde{\sigma }}_{b}{g}_{0},\\ {\dot{\mu }}_{1,0}&={\rho }_{u}B{g}_{0}-d{\mu }_{1,0}+{\widetilde{\sigma }}_{u}{\mu }_{1,1}-{\widetilde{\sigma }}_{b}{\mu }_{1,0},\\ {\dot{\mu }}_{1,1}&={\rho }_{b}B{g}_{1}-d{\mu }_{1,1}-{\widetilde{\sigma }}_{u}{\mu }_{1,1}+{\widetilde{\sigma }}_{b}{\mu }_{1,0},\\ {\dot{\mu }}_{2,0}&=2{\rho }_{u}B({\mu }_{1,0}+B{g}_{0})-2d{\mu }_{2,0}+{\widetilde{\sigma }}_{u}{\mu }_{2,1}-{\widetilde{\sigma }}_{b}{\mu }_{2,0},\\ {\dot{\mu }}_{2,1}&=2{\rho }_{b}B({\mu }_{1,1}+B{g}_{1})-2d{\mu }_{2,1}-{\widetilde{\sigma }}_{u}{\mu }_{2,1}+{\widetilde{\sigma }}_{b}{\mu }_{2,0}.\end{aligned}$$

(3)

The effective rates ${\widetilde{\sigma }}_{b}$ and ${\widetilde{\sigma }}_{u}$ are chosen so that the two systems have the same zeroth and first-order moment equations (for the latter, we mean the first-order moment when the gene is in the bound state). Matching the first and third identities in Eqs. (1) and (3), we find that ${\widetilde{\sigma }}_{b}$ and ${\widetilde{\sigma }}_{u}$ should satisfy

$$\begin{array}{rcl}&&{\widetilde{\sigma }}_{u}{g}_{1}-{\widetilde{\sigma }}_{b}{g}_{0}={\sigma }_{u}{g}_{1}-{\sigma }_{b}{\mu }_{1,0},\\ &&{\widetilde{\sigma }}_{u}{\mu }_{1,1}-{\widetilde{\sigma }}_{b}{\mu }_{1,0}={\sigma }_{u}{\mu }_{1,1}-{\sigma }_{b}{\mu }_{2,0}.\end{array}$$

(4)

The remaining question is how to use these equations to obtain formulae for the effective rates. This can be done as follows: we first solve for ${\widetilde{\sigma }}_{b}$ and ${\widetilde{\sigma }}_{u}$ using Eq. (4) and then substitute these into Eq. (3) to obtain a set of closed moment equations. These equations can be solved for the values of all zeroth, first, and second-order moments, i.e., g_i, μ_1,i, and μ_2,i. Finally substituting these into Eq. (4) gives the values of the effective parameters ${\widetilde{\sigma }}_{b}$ and ${\widetilde{\sigma }}_{u}$ for the linear network. See Supplementary Note 2 for a more detailed explanation of the Holimap algorithm.
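As a worked step, Eq. (4) is a pair of linear equations in the two unknown rates and can be solved explicitly by Cramer's rule. The rearrangement below is an illustrative sketch rather than a formula quoted from the original derivation; the shorthand $D$ is introduced here only for convenience, and the expressions assume $D\ne 0$:

$$\begin{aligned}D&={g}_{0}{\mu }_{1,1}-{g}_{1}{\mu }_{1,0},\\ {\widetilde{\sigma }}_{u}&={\sigma }_{u}-\frac{{\sigma }_{b}({g}_{0}{\mu }_{2,0}-{\mu }_{1,0}^{2})}{D},\\ {\widetilde{\sigma }}_{b}&=\frac{{\sigma }_{b}({\mu }_{1,0}{\mu }_{1,1}-{g}_{1}{\mu }_{2,0})}{D}.\end{aligned}$$

Written this way, one can check that if ${g}_{0}{\mu }_{2,0}={\mu }_{1,0}^{2}$ (the conditional second factorial moment in the unbound state equals the square of the conditional mean, as it would for a Poisson distribution), then ${\widetilde{\sigma }}_{u}={\sigma }_{u}$ and ${\widetilde{\sigma }}_{b}={\sigma }_{b}{\mu }_{1,0}/{g}_{0}$, which coincides with the LMA rate in Eq. (2).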

In steady-state, the values of ${\widetilde{\sigma }}_{b}$ and ${\widetilde{\sigma }}_{u}$ are constants independent of time, and hence we can use the steady-state protein distribution of the linear network to approximate that of the nonlinear one—this can be computed analytically³⁷ or using FSP. When the system has not reached steady-state, the values of ${\widetilde{\sigma }}_{b}$ and ${\widetilde{\sigma }}_{u}$ depend on time t. In this case, we can use the time evolution of the linear network with time-dependent rates to predict that of the nonlinear one—while analytical solutions are not generally available in this case, the distributions can be efficiently computed using FSP.
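For the steady-state case, the FSP step for the linear network can be sketched as follows. This is a minimal illustration, assuming the effective switching rates are already known: the function name, the truncation size `N`, and the choice to discard bursts that would leave the truncated state space are assumptions made for this example and are not taken from the paper's implementation.

```python
import numpy as np

def fsp_steady_state(sigma_b_eff, sigma_u_eff, rho_u, rho_b, p, d, N=200):
    """Steady-state protein distribution P(n), n = 0..N-1, of the linear
    network G <-> G* with bursty synthesis and first-order degradation."""
    idx = lambda i, n: i * N + n               # state (gene state i, protein number n)
    A = np.zeros((2 * N, 2 * N))               # A[target, source] = transition rate
    rho = (rho_u, rho_b)
    for i in (0, 1):
        for n in range(N):
            s = idx(i, n)
            # gene switching at the effective first-order rates
            A[idx(1 - i, n), s] += sigma_b_eff if i == 0 else sigma_u_eff
            # degradation n -> n - 1
            if n > 0:
                A[idx(i, n - 1), s] += d * n
            # bursts of size k >= 1 with P(k) = p^k (1 - p); bursts that would
            # exceed N - 1 are discarded so probability stays in the truncated space
            k = np.arange(1, N - n)
            A[s + k, s] += rho[i] * p**k * (1 - p)
    A -= np.diag(A.sum(axis=0))                # make every column sum to zero
    A[-1, :] = 1.0                             # replace one balance equation by normalisation
    rhs = np.zeros(2 * N)
    rhs[-1] = 1.0
    pi = np.linalg.solve(A, rhs)
    return pi[:N] + pi[N:]                     # marginal over the two gene states

# Illustrative parameter values (assumed).
P = fsp_steady_state(0.5, 0.5, rho_u=1.0, rho_b=10.0, p=0.5, d=1.0)
```

A quick sanity check is that the mean of `P` should match the value implied by the closed first-order moment equations of the linear network, B(ρ_u g₀ + ρ_b g₁)/d, provided `N` is large enough for the truncated tail to be negligible.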

In some regions of parameter space, the 2-HM may still not be accurate enough. To solve this problem, we devise a second type of Holimap—the 4-parameter Holimap (4-HM), which transforms the nonlinear network into the linear one illustrated in Fig. 2d. Here the binding rate σ_b, unbinding rate σ_u, and the protein burst frequencies ρ_b and ρ_u for the former are replaced by four effective parameters ${\bar{\sigma }}_{b},{\bar{\sigma }}_{u},{\bar{\rho }}_{b}$, and ${\bar{\rho }}_{u}$ for the latter, which can be determined by matching the moment equations for the two networks (“Methods”). Note that while for the 2-HM, we matched only the zeroth and first-order moments, for the 4-HM, we matched these and also the second-order moments. The 2-HM and 4-HM will be collectively referred to as Holimaps in what follows.

Thus far, we have only considered the case of h = 1. For the case of cooperative binding (h ≥ 2), the Holimap approximation procedure can be similarly performed, except that higher-order moment equations need to be solved (Supplementary Note 2)—the algorithm for finding the effective parameters requires the solution of (h + 1)-order moment equations. For example, when h = 2, third-order moment equations need to be solved and the effective parameters depend on the values of zeroth, first, second, and third-order moments. We emphasize that the computational cost of Holimap is mainly determined by the number of moment equations, L, to be solved. For autoregulatory loops, L = 1 + 2h for the LMA and L = 3 + 2h for Holimap. Note that the 2-HM and 4-HM have the same L.

The principles used to construct Holimaps for autoregulated networks can be used to obtain Holimaps for an arbitrarily complex network consisting of a system of interacting genes that regulate each other via positive or negative feedback. A flow chart of the Holimap algorithm for a general regulatory network can be found in Supplementary Fig. 1. The computational time of Holimap depends on the complexity of the network—an increased number of nodes (genes) or edges (regulatory reactions) results in an increased number of moment equations L to be solved. In Supplementary Note 3, we prove that for a general network, L scales polynomially with the cooperativity h and scales exponentially with respect to the network size M (number of genes).

Applications to one-node (autoregulatory) networks

We now assess the performance of Holimap based on the Hellinger distance (HD) between the steady-state protein distribution obtained by applying FSP to the nonlinear network and the approximate distribution computed using the LMA and the two types of Holimaps. Note that while the direct application of FSP also leads to an approximate distribution, in effect, it can be considered exact since the error is very small provided the state space is truncated to a large enough value²². Here we choose the HD because it is bounded between 0 and 1; a visually accurate approximation is obtained when the HD ≪ 0.1.
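For reference, the Hellinger distance between two discrete distributions can be computed as in the sketch below, which uses the standard definition HD = [½ Σ_n (√p_n − √q_n)²]^{1/2}. The function name and the zero-padding of the shorter vector are choices made for this example.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as
    probability vectors over n = 0, 1, 2, ... (shorter one is zero-padded)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = max(p.size, q.size)
    p = np.pad(p, (0, m - p.size))
    q = np.pad(q, (0, m - q.size))
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
```

Identical distributions give 0 and distributions with disjoint support give 1, which is the boundedness property mentioned in the text.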

Figure 2e illustrates the HD for the LMA as a function of ρ_u and ρ_b. Clearly, the LMA performs well when ρ_u and ρ_b are not very different from each other. However, it results in larger deviations from FSP when the protein burst frequency in one gene state is significantly larger than that in the other. We also find that the LMA is much more accurate for negative feedback loops (ρ_u > ρ_b) than for positive feedback loops (ρ_b > ρ_u). In the LMA, the effective stochastic parameter ${\hat{\sigma }}_{b}$ is approximated by σ_b multiplied by the conditional mean of protein numbers in the unbound state. Hence it must give rise to inaccurate approximations when protein noise in the unbound gene state is large. This is exactly what happens in the positive feedback case where the low synthesis rate in the unbound state results in a small conditional mean and thus large protein noise.

We next examine whether Holimap outperforms the LMA when it is applied to positive feedback loops. Figure 2f shows the HD against σ_u/d and σ_b/d for the LMA, 2-HM, and 4-HM when ρ_b ≫ ρ_u. It is clear that the LMA (Fig. 2f, left) performs well when σ_u and σ_b are both small, but it becomes highly inaccurate when σ_u and σ_b are larger. The protein distribution can be unimodal or bimodal. The bimodal one is of particular interest because it indicates the separation of isogenic cells into two different phenotypes. In particular, we find that the LMA results in poor approximations when σ_u≥ d and when the distribution is bimodal. This can be explained as follows. Recall that the LMA transforms a nonlinear network into a linear one with unchanged σ_u, which is commonly known as the telegraph model of stochastic gene expression³⁸. In ref. ³⁹, it has been proved that the telegraph model can produce a bimodal steady-state distribution only when both gene switching rates are smaller than the protein decay rate (${\sigma }_{u},{\hat{\sigma }}_{b} \, < \, d$). When σ_u≥ d, the linear network can never exhibit bimodality, while the bimodality in the nonlinear network can be apparent.

We emphasize that σ_u ≥ d is biologically relevant since in naturally occurring systems, protein is usually very stable⁴⁰ and hence its decay rate is often smaller than the rates of gene state switching. For example, in mouse fibroblasts, it has been measured⁴¹ that the median protein half-life is 46 h and the mean cell cycle duration is 27.5 h; hence the mean decay rate of protein is estimated to be $d=[(\log 2)/46+(\log 2)/27.5]\,{\rm{h}}^{-1}=6.7\times {10}^{-4}\,{\min }^{-1}$. In the same cell type, the mean activation and inactivation rates for thousands of genes are estimated to be 0.002 min⁻¹ and 0.24 min⁻¹ (ref. ⁴²). In another study, the mean activation and inactivation rates are estimated to be 0.014 min⁻¹ and 0.17 min⁻¹ (ref. ⁴³). Hence σ_u ≥ d is indeed satisfied for most genes.
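The decay-rate estimate above can be reproduced with a few lines of arithmetic. The variable names below are illustrative; the numbers are the ones quoted in the text.

```python
import math

half_life_h = 46.0       # median protein half-life (hours)
cell_cycle_h = 27.5      # mean cell cycle duration (hours)

# Effective decay rate = degradation + dilution, converted from per hour to per minute.
d_per_h = math.log(2) / half_life_h + math.log(2) / cell_cycle_h
d_per_min = d_per_h / 60.0

# Ratios of the quoted gene switching rates (per minute) to the decay rate.
switching_rates = [0.002, 0.24, 0.014, 0.17]
ratios = [r / d_per_min for r in switching_rates]
print(f"d = {d_per_min:.2e} per min")
print([round(x, 1) for x in ratios])
```

All four ratios exceed 1, consistent with the statement that gene switching is typically faster than protein decay.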

In contrast to the LMA, both the 2-HM and 4-HM markedly reduce the HD values (Fig. 2f, center and right). The LMA has a maximum HD of 0.7, while for the two types of Holimaps, the maximum HDs are only 0.2 and 0.16. The 4-HM performs marginally better than the 2-HM in capturing steady-state protein distributions. We also compare the region of parameter space where bimodality is predicted to exist (region enclosed by the orange curves) with the actual region where bimodality manifests according to FSP (region enclosed by the red curves). We note that while the LMA fails to capture the bimodal region of the protein distribution, especially when σ_u≥ d, both the 2-HM and 4-HM capture the vast majority of the bimodal region. In summary, the deficiencies of the LMA for positive feedback loops are remedied by the use of Holimaps (Fig. 2g).

Finally, we examine how the cooperativity in protein binding affects the accuracy of various approximation methods. Figure 2h shows the maximum HD as a function of h for the LMA, 2-HM, and 4-HM, where the maximum HD is computed when σ_u and σ_b vary over large ranges and other parameters remain fixed. Clearly, for the LMA, the maximum HD increases approximately linearly with respect to h when h ≤ 4; for Holimaps, the maximum HD is insensitive to h. Since TF cooperativity is the norm rather than the exception⁵, our results suggest Holimap’s accuracy remains high over the physiologically meaningful range of parameter values.

The results that we have presented assume steady-state conditions. However, the 2-HM can also accurately reproduce the time evolution of the protein distribution for nonlinear gene networks (Supplementary Fig. 2). The 4-HM is also accurate; however, depending on parameter values, it may lead to numerical instability at short times, which usually occurs when σ_u and σ_b are large for negative feedback loops (Supplementary Fig. 3). We did not observe numerical instability for the 2-HM. As a result, the 2-HM might be the preferable choice when dynamics are of major interest. In steady-state, although the improvement in accuracy of the 4-HM may be marginal, the two types of Holimaps require the solution of the same number of moment equations, and hence the 4-HM is more advantageous when dynamics are not of interest.

Applications to two-node networks with deterministic mono- and bistability

We next evaluate the performance of Holimaps when applied to study the steady-state behavior of two-node gene networks, where two genes regulate each other (Fig. 3a, left). Feedback is mediated by cooperative binding of h₁ copies of protein P₁ to gene G₂ and cooperative binding of h₂ copies of protein P₂ to gene G₁. Here σ_bi and σ_ui are the binding and unbinding rates for gene G_i, respectively; ρ_bi and ρ_ui are the synthesis rates of protein P_i when the gene is in the bound and unbound states, respectively; d_i is the degradation rate of protein P_i. For simplicity, we do not take protein bursting into account, although it can be included easily. Depending on whether ρ_ui < ρ_bi or ρ_ui > ρ_bi for i = 1, 2, there are four different types of effective system dynamics that constitute either a positive feedback or a negative feedback loop (Fig. 3b). For example, a toggle switch (two negative regulations)⁴⁴ corresponds to the case of ρ_u1 > ρ_b1 and ρ_u2 > ρ_b2. For two-node networks, Holimaps can be performed in a similar way as we have previously shown for autoregulatory loops, i.e., by replacing all protein-gene binding reactions by effective first-order reactions with new parameters and also allowing some of the other reactions to have different rate constants than those in the original network (Fig. 3a, center and right).

**Fig. 3: Holimaps for two-node gene networks in steady-state conditions.**

We first focus on a negative feedback loop without cooperative binding (Fig. 3c). Since the LMA performs well when the unbinding rate σ_ui is much smaller than the degradation rate d_i, here we consider the case of σ_ui ≫ d_i. We use the HD between the actual and approximate steady-state distributions of protein P₁ to test the accuracy of Holimap. Figure 3d illustrates the HDs for the LMA and 4-HM as functions of σ_b1 and σ_b2. We find that the network displays bimodality when σ_b1 is large and σ_b2 is small. This is surprising because in the literature there are two well-accepted origins for bimodality: (i) a positive feedback loop with ultra-sensitivity (type-I)⁴⁴ and (ii) slow switching between gene states (type-II), independent of the type of feedback loop³⁷. Here the network is a negative feedback loop without cooperative binding, and thus there is neither a positive feedback loop nor ultra-sensitivity. Moreover, since both σ_u1 and σ_b1 are large, gene G₁ switches rapidly between the two states. Hence the bimodality observed is neither type-I nor type-II, and in the following, we refer to it as type-III bimodality.

From Fig. 3d, it is clear that the LMA performs poorly in this bimodal region. Again, the LMA cannot capture type-III bimodality since it transforms the nonlinear network into a linear one with unchanged σ_ui, which is unable to produce a bimodal distribution when σ_ui≥ d_i³⁹. On the other hand, the 4-HM significantly reduces the HD values and performs exceptionally well in capturing the bimodal region (Fig. 3e). Here we do not show the 2-HM because it leads to similar results as the 4-HM except for a slightly larger HD value.

We next consider a toggle switch with cooperative binding, where two genes repress each other (Fig. 3f). Note that this is a positive feedback loop with ultra-sensitivity and hence it can produce deterministic bistability (type-I bimodality), which means that the corresponding system of deterministic rate equations (Supplementary Note 4) is capable of having two stable fixed points and one unstable fixed point. Again, we only focus on the situation of σ_ui ≫ d_i. Figure 3g illustrates the HDs for the LMA and 4-HM against σ_b1 and σ_b2. The yellow curve encloses the region of deterministic bistability, which is markedly smaller than the true bimodal region enclosed by the red curve. According to simulations, bimodality can be observed when both σ_b1 and σ_b2 are large. The LMA fails to reproduce the bimodal distribution since σ_ui ≥ d_i, as expected. The 4-HM not only successfully captures the bimodal region (enclosed by the orange curve), but also yields small HD values. The maximum HD for the LMA is as large as 0.7, while it is only 0.13 for the 4-HM. In particular, in the deterministically bistable region, both the 2-HM and 4-HM accurately predict the protein distribution while the LMA completely fails (Fig. 3h).

Applications to three-node networks with deterministic oscillations

We now focus on three-node networks, where three genes regulate each other in a cyclic manner (Fig. 4a, left). Feedback is mediated by cooperative binding of h_i copies of protein P_i to gene G_i+1 for i = 1, 2, 3, where G₄ = G₁. Again, depending on whether ρ_ui < ρ_bi or ρ_ui > ρ_bi for i = 1, 2, 3, the network can be a repressilator (three negative regulations)⁴⁵, a Goodwin model (one negative regulation and two positive regulations)⁴⁶, or a positive feedback loop⁴⁷.

**Fig. 4: Holimaps for three-node gene networks.**

As for previous examples, Holimap transforms the nonlinear network into a linear one (Fig. 4a, right). We now focus on the repressilator illustrated in Fig. 4b, where the cooperativities are chosen as h₁ = h₂ = h₃ = 3. Here high cooperativities are chosen since we require the corresponding deterministic system of rate equations (Supplementary Note 4) to produce sustained oscillations. According to simulations, deterministic oscillations are not observed when h_i≤ 2. Figure 4c illustrates the oscillatory time evolution of the mean and Fano factor (the variance divided by the mean) of fluctuations in the number of protein P₁ computed using the SSA, LMA, and 2-HM. Note that here we do not consider the 4-HM because, as previously mentioned, it may cause numerical instability when computing time-dependent distributions. The LMA fails to reproduce damped oscillations in the time evolution of the mean and Fano factor, while Holimap excellently captures these oscillations. Note also that the LMA significantly underestimates the variance of fluctuations and hence leads to a much smaller Fano factor in the limit of long times.

Figure 4d compares the time-dependent protein distributions computed using the SSA, LMA, and 2-HM. Interestingly, both the LMA and 2-HM accurately reproduce the protein distribution at small times (t ≤ 3). However, the LMA fails to reproduce bimodality at intermediate and large times since it underestimates noise. In contrast, Holimap performs remarkably well in predicting the complete time evolution of the protein distribution.

Thus far, we have considered regulatory networks where each gene is regulated by one TF; however, many genes are regulated by a multitude of TFs which are often shared between multiple genes⁴⁸. In Supplementary Note 5, we investigate gene networks with two TF binding sites. We show that Holimap performs excellently in capturing the protein distributions as well as the bimodal region, independent of the type of network topology and the type of TF binding (independent, positive cooperative, and negative cooperative binding⁴⁹).

A hybrid combination of SSA and Holimap provides highly efficient computation of complex gene network dynamics

The FSP and SSA are two widely used methods for solving the dynamics of stochastic chemical reaction systems. While FSP yields an accurate distribution, from a practical point of view, it is only applicable to small networks where protein numbers are not very large; for large networks, the size of the state space leads to an enormous computational cost²². The SSA can also be computationally expensive, particularly when the network has multiple reaction time scales²³. When fluctuations are large, it can yield a non-smooth distribution, from which it is sometimes even difficult to determine the number of modes. To overcome this, a huge number of stochastic trajectories may be needed to obtain statistically accurate results. Holimap provides an accurate and smooth approximation of the protein distributions; however, it becomes computationally slow when the network is complex or the cooperativity is large since in these cases we have to solve a large number of moment equations. This raises an important question: is it possible to develop a highly efficient and accurate computation method of stochastic gene network dynamics that yields a smooth distribution?

To address this question, we propose a hybrid method that combines the SSA and Holimap. This method consists of three steps (Fig. 5a). First we use the SSA to generate a small number of trajectories (usually a few thousand trajectories are enough) from which we compute the steady-state or time-dependent sample moments of protein numbers. We then use the latter to compute the approximate effective parameters of the linear network. Finally, we use FSP to compute the protein distribution of the linear network with effective parameters to approximate that of the nonlinear one. For example, for the autoregulatory circuit illustrated in Fig. 2a, we substitute the sample moments obtained from the SSA into Eq. (4) to compute the approximate values of ${\widetilde{\sigma }}_{u}$ and ${\widetilde{\sigma }}_{b}$, and then use the marginal protein distribution of the linear network to construct the 2-HM of the nonlinear network. In other words, for Holimap, the determination of the effective parameters can be done independently of other computational methods while the hybrid method requires the running of the SSA.
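The first two steps of the hybrid method can be sketched in Python as follows. This is a minimal illustration for the autoregulatory circuit, not the authors' MATLAB implementation. It assumes that Eq. (4) has the form obtained from Eq. (10) by setting ${\bar{\rho }}_{b}={\rho }_{b}$, that state 0 denotes the unbound gene, and that second-order moments are factorial moments, as suggested by the moment equations in Methods. The function names `sample_moments` and `effective_switching_rates` are hypothetical.

```python
import numpy as np

def sample_moments(n, s):
    """Sample moments from SSA snapshots taken at one time point.

    n: protein numbers, s: gene states (0 = unbound G, 1 = bound G*).
    Returns g[i], mu1[i], mu2[i] for i = 0, 1, where mu2[i] is the
    second factorial moment E[n(n-1); gene in state i] (assumed convention).
    """
    n = np.asarray(n, dtype=float)
    s = np.asarray(s)
    g, mu1, mu2 = [], [], []
    for i in (0, 1):
        ind = (s == i).astype(float)      # indicator of gene state i
        g.append(ind.mean())
        mu1.append((n * ind).mean())
        mu2.append((n * (n - 1.0) * ind).mean())
    return g, mu1, mu2

def effective_switching_rates(g, mu1, mu2, sigma_u, sigma_b):
    """Solve the assumed 2x2 moment-matching system for the 2-HM.

    Unknowns are the effective unbinding and binding rates of the
    linear network, returned in that order.
    """
    A = np.array([[g[1], -g[0]],
                  [mu1[1], -mu1[0]]])
    rhs = np.array([sigma_u * g[1] - sigma_b * mu1[0],
                    sigma_u * mu1[1] - sigma_b * mu2[0]])
    return np.linalg.solve(A, rhs)
```

For time-dependent distributions, the same two functions would be applied to the SSA snapshots at each time point, giving time-dependent effective parameters.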

**Fig. 5: A hybrid method combining the SSA and Holimap.**

This hybrid SSA + Holimap method is computationally much faster than the SSA because the number of trajectories needed to obtain good approximations to low-order moments is much less than that needed to obtain smooth protein distributions. It is also computationally less expensive than Holimap since there is no need to solve a large number of moment equations. To test this hybrid method, we compare the time-dependent distributions for the repressilator calculated using the SSA, LMA, 2-HM, and SSA + 2-HM (Fig. 4d). In Fig. 5b, we also compare the CPU times and accuracy of these methods. The number of SSA trajectories N needed for SSA + 2-HM is chosen such that the distributions obtained from N trajectories and those obtained from 3N trajectories have an HD (averaged over all time points) less than 0.02, i.e., increasing the sample size will not substantially improve the approximation accuracy—a sample size of N = 2000 is sufficient to satisfy this criterion. Notably with almost the same CPU time, SSA + 2-HM yields distributions that are significantly more accurate than those obtained from only the SSA with the same number of trajectories—the HD for the former is only 0.04–0.06, while for the latter it is 0.11–0.13; here the distributions obtained from the SSA with 10⁵ trajectories are used as a proxy of ground truth when computing the HDs. We also note that SSA + 2-HM yields distributions that are practically as accurate as the 2-HM but are over 16 times faster (28 s vs 7 min 39 s).
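The criterion used to choose N can be sketched as below. The sketch assumes that HD denotes the Hellinger distance in its standard normalized form (values between 0 and 1) and that the samples are non-negative integer protein numbers; the function names and the `tol` argument are illustrative.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def hellinger_from_samples(a, b):
    """Hellinger distance between the empirical distributions of two samples."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    size = max(a.max(), b.max()) + 1          # common support 0..max
    p = np.bincount(a, minlength=size) / a.size
    q = np.bincount(b, minlength=size) / b.size
    return hellinger(p, q)

def enough_trajectories(samples_n, samples_3n, tol=0.02):
    """Check the N versus 3N criterion.

    samples_n, samples_3n: sequences with one sample array per time point,
    obtained from N and 3N trajectories, respectively.
    """
    hds = [hellinger_from_samples(a, b) for a, b in zip(samples_n, samples_3n)]
    return float(np.mean(hds)) < tol
```

In practice one would increase N until `enough_trajectories` returns `True`, and then use the N-trajectory moments to compute the effective parameters.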

To further test the accuracy of SSA + Holimap, we apply it to a random M-node gene network (Fig. 5c), where any pair of nodes has a probability of 2/M to be connected. This ensures that each gene on average regulates two genes. When connected, each directed edge has an equal probability to be positive or negative regulation; autoregulation is also allowed. The details of the stochastic model are described in Methods. We then apply the 2-HM to transform the nonlinear random network into a linear one and then use 2000 SSA trajectories to estimate the effective parameters of the linear network. Figure 5d illustrates the CPU times and HDs against the number of nodes M for SSA + 2-HM and the SSA with the same number of trajectories. Again an SSA with 10⁵ trajectories is used to generate a proxy of the ground truth distribution when computing the HDs. Clearly, the two methods yield almost the same CPU times, which scale approximately linearly with M. This is because for SSA + 2-HM, almost all time is spent on simulating the SSA trajectories, while solving the linear network consumes very little time. However, compared with an SSA with 2000 trajectories, SSA + 2-HM gives rise to markedly lower HD values, which are insensitive to M.
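One way to generate such a random network is sketched below. It assumes that "any pair of nodes" refers to ordered pairs (i, j), including i = j, so that the expected number of targets per gene is M × 2/M = 2. The function name and the sign-matrix encoding are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def random_regulatory_network(M, rng):
    """Return an M x M sign matrix W for a random gene network.

    W[j, i] = +1 if G_i activates G_j, -1 if G_i inhibits G_j,
    and 0 if there is no edge from G_i to G_j. Autoregulation is allowed.
    """
    connected = rng.random((M, M)) < 2.0 / M     # each ordered pair with prob. 2/M
    signs = rng.choice([1, -1], size=(M, M))     # activation or inhibition, equal odds
    return np.where(connected, signs, 0)

rng = np.random.default_rng(0)
W = random_regulatory_network(200, rng)
mean_targets = (W != 0).sum(axis=0).mean()       # average number of genes regulated per gene
```

The regulation rates of the stochastic model in Methods can then be set from `W`, for example `sigma0 = 0.01 * (W > 0)` and `sigma1 = 0.01 * (W < 0)`.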

Generalization to networks with post-translational or post-transcriptional regulation

Thus far, we have showcased Holimap in transcriptional networks with protein-gene interactions. A crucial question is whether Holimap can also be applied to solve the dynamics of post-translational and post-transcriptional networks with complex protein-protein, protein-RNA, and RNA-RNA interactions. To see this, we first focus on two post-translational networks (Fig. 6a, b).

**Fig. 6: Holimap for post-translational and post-transcriptional networks.**

Figure 6a shows a two-node synthetic network with autoregulation and protein sequestration⁵⁰. Here protein P_i produced from gene G_i regulates its own expression; the two proteins P₁ and P₂ can bind to each other and form an inactive complex C. We then devise a three-parameter Holimap (3-HM) which transforms the nonlinear gene network into the linear one shown in Fig. 6c. In principle, Holimap replaces all high-order interactions between genes, proteins, and RNAs by effective first-order reactions. We first replace the protein-gene binding reactions ${G}_{i}+{h}_{i}{P}_{i} \, \rightleftharpoons \, {G}_{i}^{*}$ by ${G}_{i} \, \rightleftharpoons \, {G}_{i}^{*}$ with effective parameters ${\widetilde{\sigma }}_{ui}$ and ${\widetilde{\sigma }}_{bi}$, and then we replace the protein-protein binding reaction P₁ + P₂ → C by ${P}_{i}\to \varnothing$ with effective parameter ${\widetilde{d}}_{i}$. Again, using moment-matching, the three effective parameters ${\widetilde{\sigma }}_{ui},{\widetilde{\sigma }}_{bi}$, and ${\widetilde{d}}_{i}$ can be represented by low-order moments of the nonlinear network (Supplementary Note 7) and hence can be computed approximately using the SSA with a small number of trajectories. In this way, the hybrid SSA + Holimap can be applied to predict the dynamics of the nonlinear network.

Note that since Holimap replaces the binding reaction P₁ + P₂ → C by ${P}_{1}\to \varnothing$ with a new parameter $\widetilde{d}$, intuitively, one may deduce that this approximation is only valid when protein P₂ is very abundant compared to protein P₁ so that noise in protein P₂ number can be ignored. However, unexpectedly, we find that Holimap makes accurate predictions not only in this special case but also in scenarios where the two proteins interact at comparable concentrations and where P₂ is very scarce compared to P₁ (Supplementary Fig. 5). This again confirms the high accuracy of Holimap over large regions of parameter space.

As another example of post-translational regulation, we consider a gene network with autoregulation and protein phosphorylation (Fig. 6b), which has been used to account for circadian oscillations in Drosophila and Neurospora⁵¹. Here the free protein P can be reversibly phosphorylated into the forms P₁ and P₂, successively. The latter form P₂ can bind to the gene and regulate its expression. Both phosphorylation and dephosphorylation are enzyme-catalyzed and are described using Michaelis-Menten kinetics. Holimap can also be applied to this network, where protein-gene interactions are replaced by the switching reactions G $\rightleftharpoons$ G^* with effective parameters ${\widetilde{\sigma }}_{u}$ and ${\widetilde{\sigma }}_{b}$, and the complex post-translational regulation is replaced by the degradation reaction $P\to \varnothing$ with effective parameter ${\widetilde{d}}$ (Fig. 6c and Supplementary Note 7).

Furthermore, we apply Holimap to two post-transcriptional networks (Fig. 6d, e). Figure 6d illustrates a gene network with autoregulation and mRNA degradation control⁵². Here the enzyme can convert between an inactive form E and an active form E^*. The degradation of the mRNA of interest can occur spontaneously with rate d and can be catalyzed by the active form of the enzyme with rate α. Holimap transforms the nonlinear network into the linear one shown in Fig. 6f by removing all high-order interactions between molecules. In particular, the enzyme-catalyzed degradation reaction M + E^* → E^* is replaced by the effective degradation reaction $M\to \varnothing$ with new parameter $\widetilde{d}$ (Supplementary Note 7).

Figure 6e illustrates another network with microRNA-mRNA interactions, which has been shown to be capable of producing complex emergent behaviors such as bistability and sustained oscillations⁵³. Here the mRNA of interest, expressed from gene G₁, has two microRNA binding sites. The microRNA, produced from gene G₂, can bind to its mRNA target and form two inactive complexes C₁ (only one binding site is occupied) and C₂ (both binding sites are occupied). The free mRNA and microRNA are degraded with rates d₁ and d₂, respectively. Once the complex C₁ (C₂) is formed, the mRNA and microRNA are degraded with rates a₁ (b₁) and a₂ (b₂), respectively. The mRNA dynamics for this network can also be predicted by Holimap, which replaces the complex post-transcriptional regulation by the effective reaction $M\to \varnothing$ with new parameter $\widetilde{d}$ (Fig. 6f and Supplementary Note 7).

Note that for transcriptional networks, Holimap does not change the degradation rate; however, for post-transcriptional and post-translational networks, both the binding/unbinding rate and degradation rate need to be modified. To test the accuracy of the three-parameter Holimap, we compare the time-dependent distributions for the above four gene networks computed using the SSA with 10⁵ trajectories, SSA with 2000 trajectories, and hybrid SSA + Holimap with 2000 trajectories (Fig. 6g). Clearly, SSA + Holimap is accurate for all networks. In particular, the distributions predicted by SSA + Holimap with a small number of trajectories have almost the same accuracy as those predicted by the SSA with a huge number of trajectories (HD < 0.03) while the CPU time is reduced by over 60-fold (Fig. 6h).

Discussion

In this paper, we have constructed a computational method, Holimap, for the accurate and efficient prediction of the protein/mRNA number distributions of a general gene regulatory network. We have showcased the method by applying it to a variety of networks including transcriptional networks with protein-gene interactions, post-translational networks with protein-protein interactions, and post-transcriptional networks with protein-RNA or RNA-RNA interactions. For transcriptional networks, we have tested Holimap in simple autoregulatory loops where a gene influences its own expression, two-gene systems such as the toggle switch, three-gene systems such as the repressilator, and complex randomly connected networks with numerous interacting genes. Notably, we have demonstrated that a hybrid method that uses both Holimap and the SSA leads to much more accurate distributions than solely using the SSA, with practically no increase in the CPU time and high accuracy that is independent of the number of interacting genes in the network.

We devised three types of Holimaps (the 2-HM, 3-HM, and 4-HM), all of them decoupling gene-gene interactions in a nonlinear regulatory network and transforming it into a linear one with multiple effective parameters. The 2-HM and 4-HM apply to transcriptional networks, while the 3-HM is only applicable to post-translational and post-transcriptional networks. The 4-HM is more accurate than the 2-HM, although the improvement in accuracy is marginal. Depending on parameters, the 4-HM may lead to numerical instability at short times. Hence the 4-HM is preferred if our aim is to compute the steady-state distribution, and the 2-HM is a preferable choice if our aim is to compute the time-dependent distribution. The 2-HM and 4-HM require the solution of the same number of moment equations and hence give rise to similar CPU times. Since the number of equations to be solved increases exponentially with the network size, the standard Holimap is only recommended when the scale of the network is not too large. For medium and large-scale networks, the hybrid SSA + Holimap approach is more advantageous since it significantly reduces the CPU time while maintaining high accuracy.

Some of the advantages of our method over other common approximations in the literature are as follows: (i) Holimap does not sacrifice the discrete nature of molecular reactions since the approximate distributions are solutions of the CME of the effective linear network. This is unlike many common methods that achieve a speed increase by making use of a continuum approximation of the CME such as the Fokker-Planck / Langevin equations^54,55 or partial integrodifferential equations^56,57; (ii) Holimap does not assume the protein number distribution to be of a simple type such as the Gaussian, Poisson, Lognormal or Gamma distributions, as commonly assumed by conventional moment-closure methods^58,59—the solution of the linear network that Holimap utilizes is very flexible and spans a very large number of possible distribution shapes including those with multiple modes and significant skewness. For example, if each gene in a complex regulatory network switches between a number of states for which only one is active, then Holimap approximates the protein distribution for each gene by that of a multi-state gene expression model with no regulatory interactions (Supplementary Note 5) for which the analytical steady-state solution is known to be a generalized hypergeometric function^60,61, which includes a large number of special functions as special cases.

Our hybrid SSA + Holimap method shares some similarities with neural network-based approaches⁶², which can also be used to predict complex gene network dynamics. The former uses the SSA to generate the sample moments which are then used to compute the values of effective parameters, while the latter uses the SSA to train the surrogate neural network model. While both methods can accurately capture the protein/mRNA distribution, our method outperforms the neural network-based ones in three aspects: (i) while neural network models perform well in the parameter ranges which are used to train the surrogate model, their extrapolation ability is usually weak. Our method is mechanism-based and provides accurate results over wide parameter ranges; (ii) neural network-based methods require a very long time to train the surrogate model. When the network is complex, the training time may take tens of hours to several days and may also require multiple rounds of hyperparameter tuning. In contrast, Holimap avoids the long training time; (iii) neural network models have good predictive ability but their learned approximation does not typically have a clear biophysical interpretation. Holimap transforms the complex nonlinear network into a linear one which not only has a clear physical meaning but also allows an approximate analytical solution. In addition, SSA + Holimap can be combined with neural network-based methods to increase the speed and accuracy of the latter. Since SSA + Holimap can be used to generate distributions comparable in accuracy to those from the SSA with a much larger number of trajectories, it follows that SSA + Holimap can reduce the time to generate an accurate training dataset as input to the neural network.

The main limitation of the present method is that there are no analytical guarantees that the effective parameters of the linear network are positive for all times. Nevertheless, for all examples using the 2-HM and 3-HM in this paper, we have numerically found this to be the case and hence we are confident that the linear network obtained by the 2-HM or 3-HM procedure is generally physically interpretable. In contrast, we observed that the 4-HM procedure can occasionally give rise to negative parameter values (typically when the binding and unbinding rates are large) and hence should be used more cautiously. Ongoing work aims to extend the method to predict both mRNA and protein dynamics, including their joint distribution for pairs of genes.

In conclusion, we have devised a method that overcomes many of the known difficulties encountered when simulating complex stochastic gene network dynamics. We anticipate that Holimap will be useful for investigating noisy dynamical phenomena in complex regulatory networks where intuitive understanding is challenging to attain and simulations using the SSA become computationally prohibitive.

Methods

Determining the effective parameter for the LMA

For the linear network in Fig. 2b, the evolution of moments is governed by

$$\begin{aligned}
\dot{g}_{0} &= \sigma_{u}g_{1}-\hat{\sigma}_{b}g_{0},\\
\dot{\mu}_{1,0} &= \rho_{u}Bg_{0}-d\mu_{1,0}+\sigma_{u}\mu_{1,1}-\hat{\sigma}_{b}\mu_{1,0},\\
\dot{\mu}_{1,1} &= \rho_{b}Bg_{1}-d\mu_{1,1}-\sigma_{u}\mu_{1,1}+\hat{\sigma}_{b}\mu_{1,0}.
\end{aligned}$$

(5)

Inserting Eq. (2) into Eq. (5) gives a closed set of moment equations, from which the values of ${g}_{0}$, ${\mu }_{1,1}$, and ${\mu }_{1,0}$ can be computed approximately. Finally, using these values, the effective parameter ${\hat{\sigma }}_{b}$ can be obtained from Eq. (2). The remaining steps for the LMA are the same as for the 2-HM.

Determining the effective parameters for the 4-HM

For the autoregulatory circuit, it follows from Eq. (1) that

$$\begin{aligned}
\dot{\mu}_{1,0}+\dot{\mu}_{1,1} &= \rho_{u}Bg_{0}+\rho_{b}Bg_{1}-d(\mu_{1,0}+\mu_{1,1})+\sigma_{u}g_{1}-\sigma_{b}\mu_{1,0},\\
\dot{\mu}_{2,0}+\dot{\mu}_{2,1} &= 2\rho_{u}B(\mu_{1,0}+Bg_{0})+2\rho_{b}B(\mu_{1,1}+Bg_{1})\\
&\quad -2d(\mu_{2,0}+\mu_{2,1})+2\sigma_{u}\mu_{1,1}-2\sigma_{b}\mu_{2,0}.
\end{aligned}$$

(6)

For the linear network in Fig. 2d, the evolution of moments is governed by

$$\begin{aligned}
\dot{g}_{0} &= \bar{\sigma}_{u}g_{1}-\bar{\sigma}_{b}g_{0},\\
\dot{\mu}_{1,0} &= \bar{\rho}_{u}Bg_{0}-d\mu_{1,0}+\bar{\sigma}_{u}\mu_{1,1}-\bar{\sigma}_{b}\mu_{1,0},\\
\dot{\mu}_{1,1} &= \bar{\rho}_{b}Bg_{1}-d\mu_{1,1}-\bar{\sigma}_{u}\mu_{1,1}+\bar{\sigma}_{b}\mu_{1,0},\\
\dot{\mu}_{2,0} &= 2\bar{\rho}_{u}B(\mu_{1,0}+Bg_{0})-2d\mu_{2,0}+\bar{\sigma}_{u}\mu_{2,1}-\bar{\sigma}_{b}\mu_{2,0},\\
\dot{\mu}_{2,1} &= 2\bar{\rho}_{b}B(\mu_{1,1}+Bg_{1})-2d\mu_{2,1}-\bar{\sigma}_{u}\mu_{2,1}+\bar{\sigma}_{b}\mu_{2,0}.
\end{aligned}$$

(7)

From these equations, it is easy to show that

$$\begin{aligned}
\dot{\mu}_{1,0}+\dot{\mu}_{1,1} &= \bar{\rho}_{u}Bg_{0}+\bar{\rho}_{b}Bg_{1}-d(\mu_{1,0}+\mu_{1,1}),\\
\dot{\mu}_{2,0}+\dot{\mu}_{2,1} &= 2\bar{\rho}_{u}B(\mu_{1,0}+Bg_{0})+2\bar{\rho}_{b}B(\mu_{1,1}+Bg_{1})-2d(\mu_{2,0}+\mu_{2,1}).
\end{aligned}$$

(8)

Matching Eqs. (6) and (8), we find that ${\bar{\rho }}_{b}$ and ${\bar{\rho }}_{u}$ should satisfy the following system of linear equations:

$$\begin{aligned}
\bar{\rho}_{u}Bg_{0}+\bar{\rho}_{b}Bg_{1} &= \rho_{u}Bg_{0}+\rho_{b}Bg_{1}+\sigma_{u}g_{1}-\sigma_{b}\mu_{1,0},\\
\bar{\rho}_{u}B(\mu_{1,0}+Bg_{0})+\bar{\rho}_{b}B(\mu_{1,1}+Bg_{1}) &= \rho_{u}B(\mu_{1,0}+Bg_{0})+\rho_{b}B(\mu_{1,1}+Bg_{1})+\sigma_{u}\mu_{1,1}-\sigma_{b}\mu_{2,0}.
\end{aligned}$$

(9)

Matching the first and third identities in Eqs. (1) and (7), it is clear that ${\bar{\sigma }}_{b}$ and ${\bar{\sigma }}_{u}$ should satisfy the following system of linear equations:

$$\begin{array}{rcl}&& {\bar{\sigma }}_{u}{g}_{1}-{\bar{\sigma }}_{b}{g}_{0}={\sigma }_{u}{g}_{1}-{\sigma }_{b}{\mu }_{1,0},\\ &&{\bar{\sigma }}_{u}{\mu }_{1,1}-{\bar{\sigma }}_{b}{\mu }_{1,0}={\sigma }_{u}{\mu }_{1,1}-{\sigma }_{b}{\mu }_{2,0}+({\bar{\rho }}_{b}-{\rho }_{b})B{g}_{1},\end{array}$$

(10)

where ${\bar{\rho }}_{b}$ has been determined by solving Eq. (9). Compared with Eq. (4), Eq. (10) has an additional term $({\bar{\rho }}_{b}-{\rho }_{b})B{g}_{1}$. This is because ρ_b remains unchanged for the 2-HM but is changed for the 4-HM.
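Given values of the moments, Eqs. (9) and (10) are two 2×2 linear systems that can be solved one after the other. The sketch below illustrates this step only; the function name and argument layout are hypothetical, and in the full 4-HM the moments themselves come from solving the closed moment equations rather than being supplied by hand.

```python
import numpy as np

def four_hm_parameters(g, mu1, mu2, rho_u, rho_b, sigma_u, sigma_b, B):
    """Solve Eqs. (9) and (10) for the four effective parameters.

    g = (g0, g1), mu1 = (mu_{1,0}, mu_{1,1}), mu2 = (mu_{2,0}, mu_{2,1}).
    Returns (rho_u_bar, rho_b_bar, sigma_u_bar, sigma_b_bar).
    """
    g0, g1 = g
    m10, m11 = mu1
    m20 = mu2[0]

    # Eq. (9): linear system for the effective synthesis rates.
    A_rho = B * np.array([[g0, g1],
                          [m10 + B * g0, m11 + B * g1]])
    rhs_rho = np.array([
        rho_u * B * g0 + rho_b * B * g1 + sigma_u * g1 - sigma_b * m10,
        rho_u * B * (m10 + B * g0) + rho_b * B * (m11 + B * g1)
        + sigma_u * m11 - sigma_b * m20,
    ])
    rho_u_bar, rho_b_bar = np.linalg.solve(A_rho, rhs_rho)

    # Eq. (10): linear system for the effective switching rates,
    # including the extra term (rho_b_bar - rho_b) * B * g1.
    A_sig = np.array([[g1, -g0],
                      [m11, -m10]])
    rhs_sig = np.array([
        sigma_u * g1 - sigma_b * m10,
        sigma_u * m11 - sigma_b * m20 + (rho_b_bar - rho_b) * B * g1,
    ])
    sigma_u_bar, sigma_b_bar = np.linalg.solve(A_sig, rhs_sig)
    return rho_u_bar, rho_b_bar, sigma_u_bar, sigma_b_bar
```

Because nothing constrains the solutions to be positive, a practical implementation would check the signs of the returned values before using them as rate constants.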

Finally, inserting Eqs. (9) and (10) into Eq. (7) gives a system of closed moment equations and hence the values of all zeroth, first, and second-order moments can be approximately calculated. Substituting these moments into Eqs. (9) and (10), one can finally solve for the four effective parameters ${\bar{\rho }}_{u},{\bar{\rho }}_{b},{\bar{\sigma }}_{u}$, and ${\bar{\sigma }}_{b}$ of the linear network. The 4-HM predicts the protein distribution of the nonlinear network by solving the CME of the linear one in Fig. 2d with the values of the effective parameters found above.
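The last step, solving the CME of the effective linear network, can be illustrated for the steady state with a truncated state space. The sketch below makes several assumptions that are not stated in this section: burst sizes are geometrically distributed with mean B (consistent with the second-moment terms in Eq. (7)), gene switching in the linear network leaves the protein number unchanged, and bursts that would leave the truncated range 0..N are discarded instead of being routed to an absorbing sink as in standard FSP. The function name is hypothetical.

```python
import numpy as np

def linear_network_steady_state(rho_u, rho_b, sigma_u, sigma_b, B, d, N):
    """Steady-state marginal protein distribution of the two-state linear
    network, computed on the truncated state space n = 0..N."""
    q = B / (1.0 + B)
    burst = (1.0 - q) * q ** np.arange(N + 1)      # P(burst size = k), geometric
    size = 2 * (N + 1)
    Q = np.zeros((size, size))                     # Q[target, source] transition rates

    def idx(i, n):
        return i * (N + 1) + n                     # flatten (gene state, protein number)

    # State 0 leaves at the effective binding rate, state 1 at the unbinding rate.
    for i, rho, switch in ((0, rho_u, sigma_b), (1, rho_b, sigma_u)):
        for n in range(N + 1):
            src = idx(i, n)
            Q[idx(1 - i, n), src] += switch        # gene switching, n unchanged
            if n > 0:
                Q[idx(i, n - 1), src] += d * n     # degradation
            for k in range(1, N - n + 1):          # bursts that stay within 0..N
                Q[idx(i, n + k), src] += rho * burst[k]
    Q[np.diag_indices(size)] = -Q.sum(axis=0)      # columns of a generator sum to zero

    # Replace one balance equation by the normalization condition sum(p) = 1.
    A = Q.copy()
    A[-1, :] = 1.0
    b = np.zeros(size)
    b[-1] = 1.0
    p = np.linalg.solve(A, b)
    return p[: N + 1] + p[N + 1:]                  # marginalize over gene states
```

N should be chosen large enough that the probability mass near the truncation boundary is negligible; for time-dependent distributions, the same generator `Q` would instead be used to integrate the master equation forward in time.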

Stochastic model for complex gene networks

Here we consider the stochastic model of an arbitrary gene regulatory network involving protein synthesis, protein degradation, gene state switching, and complex gene regulation mechanisms⁶³. Specifically, we assume that the network involves M distinct genes, each of which can be in two states: an inactive state G_j and an active state ${G}_{j}^{*}$. The protein associated with gene G_j is denoted by P_j. The network can be described by the following reactions:

$$\begin{aligned}
&G_j \xrightarrow{\alpha_j^0} G_j^{\ast},\quad G_j^{\ast} \xrightarrow{\alpha_j^1} G_j,\\
&G_j+h_{ji}P_i \xrightarrow{\sigma_{ji}^0} G_j^{\ast},\quad G_j^{\ast}+h_{ji}P_i \xrightarrow{\sigma_{ji}^1} G_j,\\
&G_j \xrightarrow{\rho_j^0} G_j+P_j,\quad G_j^{\ast} \xrightarrow{\rho_j^1} G_j^{\ast}+P_j,\\
&P_j \xrightarrow{d_j} \varnothing,\quad i,\,j=1,2,\ldots,M,
\end{aligned}$$

(11)

where the reactions in the first row describe spontaneous gene state switching, the reactions in the second row describe gene regulation, the reactions in the third row describe protein synthesis in the two gene states, and the last reaction describes protein degradation or dilution. Since G_j is the inactive state and ${G}_{j}^{*}$ is the active state, we have ${\rho }_{j}^{1} \, > \, {\rho }_{j}^{0}$. Due to complex gene regulation, each gene G_j may be regulated by all genes. If gene G_i activates gene G_j, then ${\sigma }_{ji}^{0} \, > \, 0$ and ${\sigma }_{ji}^{1}=0$ since the binding of protein P_i induces the switching from G_j to ${G}_{j}^{*}$; on the contrary, if gene G_i inhibits gene G_j, then ${\sigma }_{ji}^{0}=0$ and ${\sigma }_{ji}^{1} \, > \, 0$ since the binding of protein P_i induces the switching from ${G}_{j}^{*}$ to G_j. When performing simulations (SSA and SSA + Holimap), the parameters are chosen as ${d}_{i}=1,\, {h}_{ji}=1,\, {\rho }_{j}^{1}=81,\, {\rho }_{j}^{0}=5.4,\, {\alpha }_{j}^{0}={\alpha }_{j}^{1}=0.5$, with ${\sigma }_{ji}^{0}=0.01,\, {\sigma }_{ji}^{1}=0$ when G_i activates G_j, and ${\sigma }_{ji}^{0}=0,\, {\sigma }_{ji}^{1}=0.01$ when G_i inhibits G_j. The presence or absence of a gene-gene interaction and its type are determined randomly. Here we assume that protein-gene interactions are non-cooperative (h_ji = 1). The spontaneous switching rates between G_j and ${G}_{j}^{*}$ are chosen to be ${\alpha }_{j}^{0}={\alpha }_{j}^{1}=0.5$. Since each gene is on average regulated by two genes (one positive regulation and one negative regulation), the switching rates due to gene regulation are roughly equal to the regulation rate constant 0.01 multiplied by the copy number of the regulator P_i, which is ~50. Hence the total switching rates due to spontaneous contributions and gene regulation are roughly 0.5 + 0.01 × 50 = 1, i.e., they are comparable with the degradation rate d_i = 1.
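A single SSA trajectory of the reaction scheme in Eq. (11) can be sketched with Gillespie's direct method as follows. This is an illustrative implementation, not the authors' code. It assumes non-cooperative binding (h_ji = 1), identical parameters for all genes with the default values listed above, and an initial condition with all genes inactive and no proteins. Whether a regulatory switching event consumes one copy of P_i is not specified here, so it is exposed as the hypothetical flag `consume_regulator`, which by default follows the stoichiometry of Eq. (11) literally.

```python
import numpy as np

def ssa_network(sigma0, sigma1, t_end, rng, rho0=5.4, rho1=81.0,
                alpha0=0.5, alpha1=0.5, d=1.0, consume_regulator=True):
    """Simulate one trajectory of the M-gene network up to time t_end.

    sigma0[j, i], sigma1[j, i]: rates at which protein P_i switches gene j
    from inactive to active and from active to inactive, respectively.
    Returns (s, n): gene states (0 inactive, 1 active) and protein numbers.
    """
    M = sigma0.shape[0]
    s = np.zeros(M, dtype=int)
    n = np.zeros(M, dtype=int)
    t = 0.0
    while True:
        # Propensities: regulated switching (M*M), spontaneous switching (M),
        # protein synthesis (M), protein degradation (M).
        reg = np.where(s[:, None] == 0, sigma0, sigma1) * n[None, :]
        spont = np.where(s == 0, alpha0, alpha1)
        synth = np.where(s == 0, rho0, rho1)
        cum = np.cumsum(np.concatenate([reg.ravel(), spont, synth, d * n]))
        total = cum[-1]
        t += rng.exponential(1.0 / total)          # waiting time to the next reaction
        if t > t_end:
            return s, n
        k = int(np.searchsorted(cum, rng.random() * total, side="right"))
        if k < M * M:                              # gene j switched by protein i
            j, i = divmod(k, M)
            s[j] = 1 - s[j]
            if consume_regulator:
                n[i] -= 1
        elif k < M * M + M:                        # spontaneous switching
            j = k - M * M
            s[j] = 1 - s[j]
        elif k < M * M + 2 * M:                    # synthesis of P_j
            n[k - M * M - M] += 1
        else:                                      # degradation of P_j
            n[k - M * M - 2 * M] -= 1
```

Repeating the call with independent random draws yields the ensemble of trajectories from which sample moments or empirical distributions are computed. Without regulation, the mean protein number at long times is (5.4 + 81)/2 ≈ 43, in line with the estimate of ~50 regulator molecules used above.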

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

MATLAB R2019a was used to analyze the data. Source data are provided with this paper.

Code availability

The MATLAB codes for Holimap and SSA + Holimap can be found in the GitHub repository⁶⁴.

References

Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68 (2002).
Davidson, E. H. & Erwin, D. H. Gene regulatory networks and the evolution of animal body plans. Science 311, 796–800 (2006).
Olson, E. N. Gene regulatory networks in the evolution and development of the heart. Science 313, 1922–1927 (2006).
Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
Spitz, F. & Furlong, E. E. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626 (2012).
Pavlopoulos, G. A. et al. Using graph theory to analyze biological networks. BioData Min. 4, 1–27 (2011).
Koutrouli, M., Karatzas, E., Paez-Espino, D. & Pavlopoulos, G. A. A guide to conquer the biological network era using graph theory. Front. Bioeng. Biotech. 8, 34 (2020).
Emmert-Streib, F., Dehmer, M. & Haibe-Kains, B. Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. Front. Cell Dev. Biol. 2, 38 (2014).
Margolin, A. A. et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinform. 7, 1–15 (2006).
Stolovitzky, G., Prill, R. J. & Califano, A. Lessons from the DREAM2 challenges: a community effort to assess biological network inference. Ann. N. Y. Acad. Sci. 1158, 159–195 (2009).
Emmert-Streib, F., Glazko, G. V., Altay, G. & de Matos Simoes, R. Statistical inference and reverse engineering of gene regulatory networks from observational expression data. Front. Genet. 3, 8 (2012).
Chan, T. E., Stumpf, M. P. & Babtie, A. C. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 5, 251–267 (2017).
Badia-i Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 24, 1–16 (2023).
De Jong, H. Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol. 9, 67–103 (2002).
Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).
Edelstein-Keshet, L. Mathematical models in biology (SIAM, 2005).
Ingalls, B. P. Mathematical modeling in systems biology: an introduction (MIT press, 2013).
Schnoerr, D., Sanguinetti, G. & Grima, R. Approximation and inference methods for stochastic biochemical kinetics—a tutorial review. J. Phys. A Math. Theor. 50, 093001 (2017).
Elowitz, M. B., Levine, A. J., Siggia, E. D. & Swain, P. S. Stochastic gene expression in a single cell. Science 297, 1183 (2002).
Ozbudak, E. M., Thattai, M., Kurtser, I., Grossman, A. D. & van Oudenaarden, A. Regulation of noise in the expression of a single gene. Nat. Genet. 31, 69–73 (2002).
Munsky, B., Neuert, G. & Van Oudenaarden, A. Using gene expression noise to understand gene regulation. Science 336, 183–187 (2012).
Munsky, B. & Khammash, M. The finite state projection algorithm for the solution of the chemical master equation. J. Chem. Phys. 124, 044104 (2006).
Gillespie, D. T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22, 403–434 (1976).
Székely Jr, T. & Burrage, K. Stochastic simulation in systems biology. Comput. Struct. Biotechnol. J. 12, 14–25 (2014).
Klipp, E., Liebermeister, W., Wierling, C. & Kowald, A. Systems biology: a textbook (John Wiley & Sons, 2016).
Munsky, B., Hlavacek, W. S. & Tsimring, L. S. Quantitative biology: theory, computational methods, and models (MIT Press, 2018).
Bateman, E. Autoregulation of eukaryotic transcription factors. Prog. Nucleic Acid Res. Mol. Biol. 60, 133–168 (1998).
Becskei, A. & Serrano, L. Engineering stability in gene networks by autoregulation. Nature 405, 590–593 (2000).
Crews, S. T. & Pearson, J. C. Transcriptional autoregulation in development. Curr. Biol. 19, R241–R246 (2009).
Hermsen, R., Ursem, B. & Ten Wolde, P. R. Combinatorial gene regulation using auto-regulation. PLoS Comput. Biol. 6, e1000813 (2010).
Nie, Y., Shu, C. & Sun, X. Cooperative binding of transcription factors in the human genome. Genomics 112, 3427–3434 (2020).
Jia, C. & Grima, R. Dynamical phase diagram of an auto-regulating gene in fast switching conditions. J. Chem. Phys. 152, 174110 (2020).
Cai, L., Friedman, N. & Xie, X. S. Stochastic protein expression in individual cells at the single molecule level. Nature 440, 358–362 (2006).
Singh, A. & Hespanha, J. P. Lognormal moment closures for biochemical reactions. In Proc. of the 45th IEEE Conference on Decision and Control, 2063–2068 (IEEE, 2006).
Cao, Z. & Grima, R. Linear mapping approximation of gene regulatory networks with stochastic dynamics. Nat. Commun. 9, 3305 (2018).
Grima, R. A study of the accuracy of moment-closure approximations for stochastic chemical kinetics. J. Chem. Phys. 136, 154105 (2012).
Jia, C. & Grima, R. Small protein number effects in stochastic models of autoregulated bursty gene expression. J. Chem. Phys. 152, 084115 (2020).
Ko, M. S. A stochastic model for gene induction. J. Theor. Biol. 153, 181–194 (1991).
Jiao, F., Sun, Q., Tang, M., Yu, J. & Zheng, B. Distribution modes and their corresponding parameter regions in stochastic gene transcription. SIAM J. Appl. Math. 75, 2396–2420 (2015).
Jia, C. & Grima, R. Frequency domain analysis of fluctuations of mRNA and protein copy numbers within a cell lineage: theory and experimental validation. Phys. Rev. X 11, 021032 (2021).
Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337 (2011).
Larsson, A. J. et al. Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254 (2019).
Suter, D. M. et al. Mammalian genes are transcribed with widely different bursting kinetics. Science 332, 472–474 (2011).
Gardner, T. S., Cantor, C. R. & Collins, J. J. Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339 (2000).
Elowitz, M. B. & Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338 (2000).
Goodwin, B. C. Oscillatory behavior in enzymatic control processes. Adv. Enzym. Regul. 3, 425–437 (1965).
Bragdon, M. D. et al. Cooperative assembly confers regulatory specificity and long-term genetic circuit stability. Cell 186, 3810–3825 (2023).
Lammers, N. C., Kim, Y. J., Zhao, J. & Garcia, H. G. A matter of time: using dynamics and theory to uncover mechanisms of transcriptional bursting. Curr. Opin. Cell Biol. 67, 147–157 (2020).
Goentoro, L., Shoval, O., Kirschner, M. W. & Alon, U. The incoherent feedforward loop can provide fold-change detection in gene regulation. Mol. Cell 36, 894–899 (2009).
Zhu, R., del Rio-Salgado, J. M., Garcia-Ojalvo, J. & Elowitz, M. B. Synthetic multistability in mammalian cells. Science 375, eabg9765 (2022).
Gonze, D., Halloy, J. & Goldbeter, A. Robustness of circadian rhythms with respect to molecular noise. Proc. Natl Acad. Sci. USA 99, 673–678 (2002).
Kuwahara, H. & Schwartz, R. Stochastic steady state gain in a gene expression process with mRNA degradation control. J. R. Soc. Interface 9, 1589–1598 (2012).
Nordick, B., Yu, P. Y., Liao, G. & Hong, T. Nonmodular oscillator and switch based on RNA decay drive regeneration of multimodal gene expression. Nucleic Acids Res. 50, 3693–3708 (2022).
Tian, T., Burrage, K., Burrage, P. M. & Carletti, M. Stochastic delay differential equations for genetic regulatory networks. J. Comput. Appl. Math. 205, 696–707 (2007).
Tomioka, R., Kimura, H., Kobayashi, T. J. & Aihara, K. Multivariate analysis of noise in genetic regulatory networks. J. Theor. Biol. 229, 501–521 (2004).
Friedman, N., Cai, L. & Xie, X. S. Linking stochastic dynamics to population distribution: an analytical framework of gene expression. Phys. Rev. Lett. 97, 168302 (2006).
Bokes, P. & Singh, A. Protein copy number distributions for a self-regulating gene in the presence of decoy binding sites. PLoS ONE 10, e0120555 (2015).
Schnoerr, D., Sanguinetti, G. & Grima, R. Comparison of different moment-closure approximations for stochastic chemical kinetics. J. Chem. Phys. 143, 185101 (2015).
Lakatos, E., Ale, A., Kirk, P. D. & Stumpf, M. P. Multivariate moment closure techniques for stochastic kinetic models. J. Chem. Phys. 143, 094107 (2015).
Zhou, T. & Zhang, J. Analytical results for a multistate gene model. SIAM J. Appl. Math. 72, 789–818 (2012).
Jia, C. & Li, Y. Analytical time-dependent distributions for gene expression models with complex promoter switching mechanisms. SIAM J. Appl. Math. 83, 1572–1602 (2023).
Sukys, A., Öcal, K. & Grima, R. Approximating solutions of the chemical master equation using neural networks. iScience 25 (2022).
Wang, X., Li, Y. & Jia, C. Poisson representation: a bridge between discrete and continuous models of stochastic gene regulatory networks. J. R. Soc. Interface 20, 20230467 (2023).
Jia, C. & Grima, R. Holimap: an accurate and efficient method for solving stochastic gene network dynamics. chenjiacsrc/Holimap https://doi.org/10.5281/zenodo.12725485 (2024).

Acknowledgements

We thank Augustinas Sukys for comments on the manuscript. C.J. acknowledges support from the National Natural Science Foundation of China with grant Nos. U2230402 and 12271020. R.G. acknowledges support from the Leverhulme Trust (RPG-2020-327).

Author information

Authors and Affiliations

Applied and Computational Mathematics Division, Beijing Computational Science Research Center, Beijing, China
Chen Jia
School of Biological Sciences, University of Edinburgh, Edinburgh, UK
Ramon Grima

Contributions

R.G. conceived the original idea. C.J. performed the theoretical derivations and numerical simulations. C.J. and R.G. interpreted the theoretical results and jointly wrote the manuscript.

Corresponding author

Correspondence to Ramon Grima.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jia, C., Grima, R. Holimap: an accurate and efficient method for solving stochastic gene network dynamics. Nat Commun 15, 6557 (2024). https://doi.org/10.1038/s41467-024-50716-z

Received: 02 March 2024
Accepted: 13 July 2024
Published: 02 August 2024
DOI: https://doi.org/10.1038/s41467-024-50716-z
Springer Nature Limited