Harissa: Stochastic Simulation and Inference of Gene Regulatory Networks Based on Transcriptional Bursting

  • Conference paper
Computational Methods in Systems Biology (CMSB 2023)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 14137)

Abstract

Gene regulatory networks, as a powerful abstraction for describing complex biological interactions between genes through their expression products within a cell, are often regarded as virtually deterministic dynamical systems. However, this view is now being challenged by the fundamentally stochastic, ‘bursty’ nature of gene expression revealed at the single cell level. We present a Python package called Harissa which is dedicated to simulation and inference of such networks, based upon an underlying stochastic dynamical model driven by the transcriptional bursting phenomenon. As part of this tool, network inference can be interpreted as a calibration procedure for a mechanistic model: once calibrated, the model is able to capture the typical variability of single-cell data without requiring ad hoc external noise, unlike ordinary or even stochastic differential equations frequently used in this context. Therefore, Harissa can be used both as an inference tool, to reconstruct biologically relevant networks from time-course scRNA-seq data, and as a simulation tool, to generate quantitative gene expression profiles in a non-trivial way through gene interactions.

Code Availability

The code of the package, a tutorial and some basic usage scripts are available at https://github.com/ulysseherbach/harissa. In addition, Harissa is indexed in the Python Package Index and can be installed via pip.

References

  1. Benaïm, M., Le Borgne, S., Malrieu, F., Zitt, P.A.: Qualitative properties of certain piecewise deterministic Markov processes. Annales de l’Institut Henri Poincaré - Probabilités et Statistiques 51(3), 1040–1075 (2015). https://doi.org/10.1214/14-AIHP619

  2. Faggionato, A., Gabrielli, D., Crivellari, M.: Averaging and large deviation principles for fully-coupled piecewise deterministic Markov processes and applications to molecular motors. Markov Process. Rel. Fields 16(3), 497–548 (2010). https://doi.org/10.48550/arXiv.0808.1910

  3. Herbach, U.: Modélisation stochastique de l’expression des gènes et inférence de réseaux de régulation. Ph.D. thesis, Université de Lyon (2018)

  4. Herbach, U.: Stochastic gene expression with a multistate promoter: breaking down exact distributions. SIAM J. Appl. Math. 79(3), 1007–1029 (2019). https://doi.org/10.1137/18M1181006

  5. Herbach, U., Bonnaffoux, A., Espinasse, T., Gandrillon, O.: Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Syst. Biol. 11(1), 105 (2017). https://doi.org/10.1186/s12918-017-0487-0

  6. Malrieu, F.: Some simple but challenging Markov processes. Annales de la Faculté de Sciences de Toulouse 24(4), 857–883 (2015). https://doi.org/10.5802/afst.1468

  7. Richard, A., et al.: Single-cell-based analysis highlights a surge in cell-to-cell molecular variability preceding irreversible commitment in a differentiation process. PLoS Biol. 14(12), e1002585 (2016). https://doi.org/10.1371/journal.pbio.1002585

  8. Sarkar, A., Stephens, M.: Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis. Nat. Genet. 53(6), 770–777 (2021). https://doi.org/10.1038/s41588-021-00873-4

  9. Schwanhäusser, B., et al.: Global quantification of mammalian gene expression control. Nature 473(7347), 337–342 (2011). https://doi.org/10.1038/nature10098

  10. Semrau, S., Goldmann, J.E., Soumillon, M., Mikkelsen, T.S., Jaenisch, R., van Oudenaarden, A.: Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat. Commun. 8(1), 1096 (2017). https://doi.org/10.1038/s41467-017-01076-4

  11. Shahrezaei, V., Swain, P.S.: The stochastic nature of biochemical networks. Curr. Opin. Biotechnol. 19(4), 369–374 (2008). https://doi.org/10.1016/j.copbio.2008.06.011

  12. Stumpf, P.S., et al.: Stem cell differentiation as a non-Markov stochastic process. Cell Syst. 5(3), 268–282 (2017). https://doi.org/10.1016/j.cels.2017.08.009

  13. Tunnacliffe, E., Chubb, J.R.: What is a transcriptional burst? Trends Genet. 36(4), 288–297 (2020). https://doi.org/10.1016/j.tig.2020.01.003

  14. Ventre, E.: Reverse engineering of a mechanistic model of gene expression using metastability and temporal dynamics. In Silico Biol. 14(3–4), 89–113 (2021). https://doi.org/10.3233/ISB-210226

  15. Ventre, E., Espinasse, T., Bréhier, C.E., Calvez, V., Lepoutre, T., Gandrillon, O.: Reduction of a stochastic model of gene expression: lagrangian dynamics gives access to basins of attraction as cell types and metastabilty. J. Math. Biol. 83(5), 59 (2021). https://doi.org/10.1007/s00285-021-01684-1

  16. Ventre, E., Herbach, U., Espinasse, T., Benoit, G., Gandrillon, O.: One model fits all: combining inference and simulation of gene regulatory networks. PLoS Comput. Biol. 19(3), e1010962 (2023). https://doi.org/10.1371/journal.pcbi.1010962

Acknowledgements

The author is very grateful to Elias Ventre and Olivier Gandrillon for fruitful discussions that led to improvements in the Harissa package.

Author information

Corresponding author

Correspondence to Ulysse Herbach.

A Appendices

A.1 Reduced Model

The inference procedure is based on analytical results which are not available for the two-stage ‘mRNA-protein’ model (3). On the other hand, such results exist for a one-stage ‘protein-only’ model that is a valid approximation of the former when proteins are more stable than mRNA (i.e. \(d_{0,i}/d_{1,i} \gg 1\)). The resulting process \((Z(t))_{t\ge 0} \in \mathbb {R}_+^n\) is also a PDMP, whose master equation can be interpreted in terms of simplified trajectories (Fig. 1B):

$$\begin{aligned} \begin{array}{rll} \displaystyle \frac{\partial }{\partial t}p(z,t) &{} \displaystyle = \sum _{i=1}^n \left[ d_{1,i}\frac{\partial }{\partial z_i}\{z_ip(z,t)\} \right. &{}\\ &{} \qquad \displaystyle + \left. \int _0^{z_i}k_{\text {on},i}(z-he_i) p(z-he_i,t) c_i e^{-c_i h} \textrm{d}h - k_{\text {on},i}(z) p(z,t) \right] . &{} \end{array} \end{aligned}$$
(4)

Given Z(t), mRNA levels X(t) are obtained by sampling independently for every \(i\in \{1,\dots ,n\}\) and \(t > 0\) from \(X_i(t) \sim \textrm{Gamma}(k_{\textrm{on},i}(Z(t))/d_{0,i},b_i)\), which is the quasi-steady-state (QSS) distribution of the complete model [5, 6].
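The sampling step above can be sketched in a few lines of plain Python. This is only an illustration, not the Harissa implementation: the function name `sample_mrna_qss` and the representation of each \(k_{\textrm{on},i}\) as a Python callable are assumptions, and \(b_i\) is read as a rate parameter, which is the reading consistent with the negative binomial form of (6).

```python
import random

def sample_mrna_qss(z, k_on, d0, b, rng):
    """Draw X_i ~ Gamma(shape = k_on_i(z) / d0_i, rate = b_i) for each gene i.

    z    : list of protein levels Z(t)
    k_on : list of callables, k_on[i](z) gives the burst frequency of gene i
           (hypothetical representation chosen for this sketch)
    d0   : list of mRNA degradation rates d_{0,i}
    b    : list of rate parameters b_i
    rng  : a random.Random instance
    """
    x = []
    for i in range(len(z)):
        shape = k_on[i](z) / d0[i]
        # random.gammavariate expects a scale parameter, i.e. 1 / rate
        x.append(rng.gammavariate(shape, 1.0 / b[i]))
    return x

# Example with one gene and a constant burst frequency (arbitrary values)
rng = random.Random(0)
x = sample_mrna_qss([1.0], [lambda z: 2.0], [1.0], [0.5], rng)
```

With these arbitrary values the distribution has mean shape/rate = 2/0.5 = 4, which gives a quick sanity check on the parametrization.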

A.2 Inference Algorithm

Now consider mRNA counts measured in m cells, assumed independent, along a time-course experiment following a stimulus. Each cell \(k = 1, \dots , m\) is associated with an experimental time point \(t_k\). We introduce the following notation:

  • \(\textbf{x}_k = (x_{ki})\in \{0,1,2,\dots \}^{n}\) : mRNA counts (cell k, gene i);

  • \(\textbf{z}_k = (z_{ki})\in (0,+\infty )^{n}\) : latent protein levels (cell k, gene i);

  • \(\alpha = (\alpha _{ij}(t_k))\in \mathbb {R}^{n \times n}\) : effective interaction \(i \rightarrow j\) at time \(t_k\).

A stimulus is represented as gene \(i=0\) and we therefore add parameters \(\alpha _{0j}(t_k)\) for \(j=1, \dots , n\) and \(k=1, \dots , m\). We further set \(z_{k0} = 0\) if \(t_k \le 0\) (before stimulus) and \(z_{k0} = 1\) if \(t_k > 0\) (after stimulus). Then, writing \(a_i = k_{1,i}/d_{0,i}\), the underlying statistical model of Harissa is defined by

$$\begin{aligned} p(\textbf{z}_k)&= \prod _{i=1}^n {z_{ki}}^{c_i \sigma _{ki} - 1} e^{-c_i z_{ki}} \frac{{c_i}^{c_i\sigma _{ki}}}{{\Gamma }(c_i\sigma _{ki})} \;, \end{aligned}$$
(5)
$$\begin{aligned} p(\textbf{x}_k | \textbf{z}_k)&= \prod _{i=1}^n \frac{1}{x_{ki}!} \frac{{\Gamma }(a_i z_{ki} + x_{ki})}{{\Gamma }(a_i z_{ki})} \frac{{b_i}^{a_i z_{ki}}}{(b_i+1)^{a_i z_{ki} + x_{ki}}} \;, \end{aligned}$$
(6)

with

$$\begin{aligned} \sigma _{ki} = {\left[ 1+\exp (-\{\beta _i + \alpha _{0i}(t_k) z_{k0} + \textstyle \sum _{j=1}^n \alpha _{ji}(t_k) z_{kj}\})\right] }^{-1} . \end{aligned}$$
(7)

Details of this derivation can be found in [4, 5, 8, 14]. Roughly, (5) comes from a ‘Hartree’ approximation of (4), while (6) corresponds to a Poisson distribution with random parameter sampled from the QSS distribution of X(t) given Z(t). Note that \(p(\textbf{z}_k)\) is in general only a pseudo-likelihood as \(\sigma _{ki}\) depends on \(\textbf{z}_k\).
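To make the statistical model concrete, equations (5) to (7) can be evaluated for a single cell as in the sketch below. It uses only the standard library; the function names and argument layout are assumptions made for illustration and do not reflect the Harissa API.

```python
import math

def sigma(beta_i, alpha_col, z, alpha0_i=0.0, z0=0.0):
    """Eq. (7): sigmoid of the total input to gene i.

    alpha_col[j] holds alpha_{ji}(t_k), the effect of gene j+1 on gene i.
    """
    s = beta_i + alpha0_i * z0 + sum(a * zj for a, zj in zip(alpha_col, z))
    return 1.0 / (1.0 + math.exp(-s))

def log_p_z(z, c, sig):
    """Log of eq. (5): product of Gamma(shape = c_i * sigma_ki, rate = c_i) densities."""
    total = 0.0
    for zi, ci, si in zip(z, c, sig):
        shape = ci * si
        total += ((shape - 1.0) * math.log(zi) - ci * zi
                  + shape * math.log(ci) - math.lgamma(shape))
    return total

def log_p_x_given_z(x, z, a, b):
    """Log of eq. (6): product of negative binomial probabilities."""
    total = 0.0
    for xi, zi, ai, bi in zip(x, z, a, b):
        r = ai * zi
        total += (math.lgamma(r + xi) - math.lgamma(r) - math.lgamma(xi + 1)
                  + r * math.log(bi) - (r + xi) * math.log(bi + 1.0))
    return total
```

Working with logarithms and `math.lgamma` avoids overflow in the Gamma functions for large counts. Because each \(\sigma_{ki}\) is computed from the same \(\textbf{z}_k\) that is being scored, `log_p_z` evaluates a pseudo-likelihood, as noted above.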

Since the preliminary version of Harissa [5], the global inference procedure has been substantially improved using identifiability results from [14, 15]. The final algorithm consists of three steps:

  1. Model calibration: estimate \(a_i\) and \(b_i\) for each gene individually from (6);

  2. Bursting mode inference: estimate the frequency mode (\(k_{0,i}\) or \(k_{1,i}\)) for each gene in each cell (can be seen as a binarization step with specific thresholds);

  3. Network inference: consider \(\textbf{z}_k\) as observed from step 2 and maximize (5) with respect to \(\alpha \) after adding an appropriate penalization term [14]. Each parameter \(\theta _{ij}\) is then set to \(\alpha _{ij}(t_k)\) with \(t_k\) that maximizes \(|\alpha _{ij}(t_k)|\).
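The last operation of step 3, collapsing the time-dependent \(\alpha _{ij}(t_k)\) into a single \(\theta _{ij}\), can be sketched as follows. The function name and the nested-list layout `alpha[k][i][j]` are assumptions made for this example; the penalized maximization itself is not shown.

```python
def collapse_over_time(alpha):
    """Return theta with theta[i][j] = alpha_ij(t_k) at the time point t_k
    where |alpha_ij(t_k)| is largest.

    alpha[k][i][j] holds alpha_ij at the k-th time point (hypothetical layout).
    """
    n_rows = len(alpha[0])
    n_cols = len(alpha[0][0])
    theta = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Keep the signed value, but select it by absolute magnitude
            theta[i][j] = max((a[i][j] for a in alpha), key=abs)
    return theta

# Two time points, two genes (arbitrary values)
alpha = [
    [[0.1, -0.5], [0.0, 0.2]],
    [[-0.3, 0.4], [0.0, -0.9]],
]
theta = collapse_over_time(alpha)
```

Selecting by absolute value while keeping the sign preserves the distinction between activation and repression in the resulting network.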

A.3 Repressilator Network

Considering an instance model = NetworkModel(3), the repressilator network simulated in Fig. 3 is defined as follows:

[figure a: code listing, not reproduced in this text]
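Since the listing itself is not available here, the sketch below only illustrates what a repressilator topology means in terms of an interaction matrix: three genes in a cycle, each repressing the next. It deliberately does not use the Harissa API. The variable names, the basal value 5 and the interaction strength -10 are arbitrary placeholders and should not be read as the parameters used for Fig. 3.

```python
import math

n = 3
# theta[i][j]: effect of gene i on gene j; index 0 is reserved for the stimulus
theta = [[0.0] * (n + 1) for _ in range(n + 1)]
basal = [0.0] * (n + 1)
for g in (1, 2, 3):
    basal[g] = 5.0       # placeholder basal input

theta[1][2] = -10.0      # gene 1 represses gene 2 (placeholder strength)
theta[2][3] = -10.0      # gene 2 represses gene 3
theta[3][1] = -10.0      # gene 3 represses gene 1

def activation(j, z):
    """Sigmoid of the total input to gene j, in the form of eq. (7)."""
    s = basal[j] + sum(theta[i][j] * z[i] for i in range(n + 1))
    return 1.0 / (1.0 + math.exp(-s))

# State where only gene 1 is expressed: gene 2 is repressed, gene 3 is not
z = [0.0, 1.0, 0.0, 0.0]
levels = [activation(j, z) for j in (1, 2, 3)]
```

In this state gene 2 receives a strongly negative input while genes 1 and 3 keep their basal input, which is the asymmetry that drives oscillations around the cycle.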

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Herbach, U. (2023). Harissa: Stochastic Simulation and Inference of Gene Regulatory Networks Based on Transcriptional Bursting. In: Pang, J., Niehren, J. (eds) Computational Methods in Systems Biology. CMSB 2023. Lecture Notes in Computer Science, vol 14137. Springer, Cham. https://doi.org/10.1007/978-3-031-42697-1_7

  • DOI: https://doi.org/10.1007/978-3-031-42697-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42696-4

  • Online ISBN: 978-3-031-42697-1

  • eBook Packages: Computer Science, Computer Science (R0)
