Harissa: Stochastic Simulation and Inference of Gene Regulatory Networks Based on Transcriptional Bursting

Herbach, Ulysse

doi:10.1007/978-3-031-42697-1_7

Ulysse Herbach ORCID: orcid.org/0000-0002-0972-385X

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 14137)

Included in the following conference series:

International Conference on Computational Methods in Systems Biology


Abstract

Gene regulatory networks, as a powerful abstraction for describing complex biological interactions between genes through their expression products within a cell, are often regarded as virtually deterministic dynamical systems. However, this view is now being challenged by the fundamentally stochastic, ‘bursty’ nature of gene expression revealed at the single cell level. We present a Python package called Harissa which is dedicated to simulation and inference of such networks, based upon an underlying stochastic dynamical model driven by the transcriptional bursting phenomenon. As part of this tool, network inference can be interpreted as a calibration procedure for a mechanistic model: once calibrated, the model is able to capture the typical variability of single-cell data without requiring ad hoc external noise, unlike ordinary or even stochastic differential equations frequently used in this context. Therefore, Harissa can be used both as an inference tool, to reconstruct biologically relevant networks from time-course scRNA-seq data, and as a simulation tool, to generate quantitative gene expression profiles in a non-trivial way through gene interactions.


Code Availability

The code of the package, a tutorial and some basic usage scripts are available at https://github.com/ulysseherbach/harissa. In addition, Harissa is indexed in the Python Package Index and can be installed via pip.

References

1. Benaïm, M., Le Borgne, S., Malrieu, F., Zitt, P.A.: Qualitative properties of certain piecewise deterministic Markov processes. Annales de l’Institut Henri Poincaré - Probabilités et Statistiques 51(3), 1040–1075 (2015). https://doi.org/10.1214/14-AIHP619
2. Faggionato, A., Gabrielli, D., Crivellari, M.: Averaging and large deviation principles for fully-coupled piecewise deterministic Markov processes and applications to molecular motors. Markov Process. Rel. Fields 16(3), 497–548 (2010). https://doi.org/10.48550/arXiv.0808.1910
3. Herbach, U.: Modélisation stochastique de l’expression des gènes et inférence de réseaux de régulation. Ph.D. thesis, Université de Lyon (2018)
4. Herbach, U.: Stochastic gene expression with a multistate promoter: breaking down exact distributions. SIAM J. Appl. Math. 79(3), 1007–1029 (2019). https://doi.org/10.1137/18M1181006
5. Herbach, U., Bonnaffoux, A., Espinasse, T., Gandrillon, O.: Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Syst. Biol. 11(1), 105 (2017). https://doi.org/10.1186/s12918-017-0487-0
6. Malrieu, F.: Some simple but challenging Markov processes. Annales de la Faculté de Sciences de Toulouse 24(4), 857–883 (2015). https://doi.org/10.5802/afst.1468
7. Richard, A., et al.: Single-cell-based analysis highlights a surge in cell-to-cell molecular variability preceding irreversible commitment in a differentiation process. PLoS Biol. 14(12), e1002585 (2016). https://doi.org/10.1371/journal.pbio.1002585
8. Sarkar, A., Stephens, M.: Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis. Nat. Genet. 53(6), 770–777 (2021). https://doi.org/10.1038/s41588-021-00873-4
9. Schwanhäusser, B., et al.: Global quantification of mammalian gene expression control. Nature 473(7347), 337–342 (2011). https://doi.org/10.1038/nature10098
10. Semrau, S., Goldmann, J.E., Soumillon, M., Mikkelsen, T.S., Jaenisch, R., van Oudenaarden, A.: Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat. Commun. 8(1), 1096 (2017). https://doi.org/10.1038/s41467-017-01076-4
11. Shahrezaei, V., Swain, P.S.: The stochastic nature of biochemical networks. Curr. Opin. Biotechnol. 19(4), 369–374 (2008). https://doi.org/10.1016/j.copbio.2008.06.011
12. Stumpf, P.S., et al.: Stem cell differentiation as a non-Markov stochastic process. Cell Syst. 5(3), 268–282 (2017). https://doi.org/10.1016/j.cels.2017.08.009
13. Tunnacliffe, E., Chubb, J.R.: What is a transcriptional burst? Trends Genet. 36(4), 288–297 (2020). https://doi.org/10.1016/j.tig.2020.01.003
14. Ventre, E.: Reverse engineering of a mechanistic model of gene expression using metastability and temporal dynamics. In Silico Biol. 14(3–4), 89–113 (2021). https://doi.org/10.3233/ISB-210226
15. Ventre, E., Espinasse, T., Bréhier, C.E., Calvez, V., Lepoutre, T., Gandrillon, O.: Reduction of a stochastic model of gene expression: lagrangian dynamics gives access to basins of attraction as cell types and metastabilty. J. Math. Biol. 83(5), 59 (2021). https://doi.org/10.1007/s00285-021-01684-1
16. Ventre, E., Herbach, U., Espinasse, T., Benoit, G., Gandrillon, O.: One model fits all: combining inference and simulation of gene regulatory networks. PLoS Comput. Biol. 19(3), e1010962 (2023). https://doi.org/10.1371/journal.pcbi.1010962


Acknowledgements

The author is very grateful to Elias Ventre and Olivier Gandrillon for fruitful discussions that helped improve the Harissa package.

Author information

Authors and Affiliations

Université de Lorraine, CNRS, Inria, IECL, 54000, Nancy, France
Ulysse Herbach


Corresponding author

Correspondence to Ulysse Herbach.

Editor information

Editors and Affiliations

University of Luxembourg, Esch-sur-Alzette, Luxembourg
Jun Pang
Inria Lille, Villeneuve d’Ascq, France
Joachim Niehren

A Appendices

A.1 Reduced Model

The inference procedure is based on analytical results which are not available for the two-stage ‘mRNA-protein’ model (3). On the other hand, such results exist for a one-stage ‘protein-only’ model that is a valid approximation of the former when proteins are more stable than mRNA (i.e. $d_{0,i}/d_{1,i} \gg 1$). The resulting process $(Z(t))_{t\ge 0} \in \mathbb {R}_+^n$ is also a PDMP, whose master equation can be interpreted in terms of simplified trajectories (Fig. 1B):

$$\begin{aligned} \frac{\partial }{\partial t}p(z,t) = \sum _{i=1}^n \Big[ & d_{1,i}\frac{\partial }{\partial z_i}\{z_i p(z,t)\} \\ & + \int _0^{z_i}k_{\text {on},i}(z-he_i)\, p(z-he_i,t)\, c_i e^{-c_i h}\, \textrm{d}h - k_{\text {on},i}(z)\, p(z,t) \Big] . \end{aligned}$$

(4)

Given Z(t), mRNA levels X(t) are obtained by sampling independently for every $i\in \{1,\dots ,n\}$ and $t > 0$ from $X_i(t) \sim \textrm{Gamma}(k_{\textrm{on},i}(Z(t))/d_{0,i},\, b_i)$, which is the quasi-steady-state (QSS) distribution of the complete model [5, 6].
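To make the reduced model concrete, here is a minimal single-gene sketch in plain Python (standard library only; it does not use the Harissa API). It assumes a user-supplied burst rate `k_on(z)` bounded above by a constant `k_max`, and simulates the PDMP of (4) by thinning: candidate bursts arrive at rate `k_max` and each is accepted with probability `k_on(z)/k_max`, with exponential decay at rate $d_{1}$ between bursts and burst sizes drawn from an exponential law with rate $c$. It then draws an mRNA level from the Gamma QSS distribution, reading $b$ as a rate parameter (consistent with (6)). The function names and the thinning scheme are illustrative choices, not necessarily how Harissa itself simulates the model.

```python
import math
import random
from typing import Callable


def simulate_protein(k_on: Callable[[float], float], k_max: float, d1: float,
                     c: float, z0: float, t_final: float,
                     rng: random.Random) -> float:
    """Return Z(t_final) for one gene of the reduced 'protein-only' PDMP.

    Between bursts dz/dt = -d1 * z; bursts occur at rate k_on(z) and add an
    Exp(rate c) amount. Thinning requires k_on(z) <= k_max for every z.
    """
    t, z = 0.0, z0
    while True:
        tau = rng.expovariate(k_max)          # next candidate burst time
        if t + tau > t_final:
            # no more candidates before t_final: decay until the end
            return z * math.exp(-d1 * (t_final - t))
        t += tau
        z *= math.exp(-d1 * tau)              # deterministic decay up to t
        if rng.random() * k_max < k_on(z):    # accept with prob k_on(z)/k_max
            z += rng.expovariate(c)           # burst size ~ Exp(rate c)


def sample_mrna(k_on_z: float, d0: float, b: float, rng: random.Random) -> float:
    """Draw X given Z from the QSS law Gamma(shape k_on(Z)/d0, rate b)."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate
    return rng.gammavariate(k_on_z / d0, 1.0 / b)


def constant_rate(z: float) -> float:
    """Illustrative burst rate that ignores z (no regulation)."""
    return 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    zs = [simulate_protein(constant_rate, 2.0, d1=0.5, c=4.0, z0=0.0,
                           t_final=20.0, rng=rng) for _ in range(2000)]
    # With a constant rate k, the stationary law is Gamma(k/d1, rate c),
    # whose mean is k / (d1 * c) = 1 for these values
    print(sum(zs) / len(zs))
    xs = [sample_mrna(constant_rate(z), d0=1.0, b=0.5, rng=rng) for z in zs]
    print(sum(xs) / len(xs))
```

Replacing `constant_rate` with a bounded, z-dependent function (for example a sigmoid between the two frequency modes) would introduce regulation; `k_max` must then be at least the largest value that function can take.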

A.2 Inference Algorithm

Now consider mRNA counts measured in m cells, assumed independent, along a time-course experiment following a stimulus. Each cell $k = 1, \dots , m$ is associated with an experimental time point $t_k$. We introduce the following notation:

$\textbf{x}_k = (x_{ki})\in \{0,1,2,\dots \}^{n}$ : mRNA counts (cell k, gene i);
$\textbf{z}_k = (z_{ki})\in (0,+\infty )^{n}$ : latent protein levels (cell k, gene i);
$\alpha = (\alpha _{ij}(t_k))\in \mathbb {R}^{n \times n}$ : effective interaction $i \rightarrow j$ at time $t_k$.

A stimulus is represented as gene $i=0$ and we therefore add parameters $\alpha _{0j}(t_k)$ for $j=1, \dots , n$ and $k=1, \dots , m$. We further set $z_{k0} = 0$ if $t_k \le 0$ (before stimulus) and $z_{k0} = 1$ if $t_k > 0$ (after stimulus). Then, writing $a_i = k_{1,i}/d_{0,i}$, the underlying statistical model of Harissa is defined by

$$\begin{aligned} p(\textbf{z}_k)&= \prod _{i=1}^n {z_{ki}}^{c_i \sigma _{ki} - 1} e^{-c_i z_{ki}} \frac{{c_i}^{c_i\sigma _{ki}}}{{\Gamma }(c_i\sigma _{ki})} \;, \end{aligned}$$

(5)

$$\begin{aligned} p(\textbf{x}_k | \textbf{z}_k)&= \prod _{i=1}^n \frac{1}{x_{ki}!} \frac{{\Gamma }(a_i z_{ki} + x_{ki})}{{\Gamma }(a_i z_{ki})} \frac{{b_i}^{a_i z_{ki}}}{(b_i+1)^{a_i z_{ki} + x_{ki}}} \;, \end{aligned}$$

(6)

with

$$\begin{aligned} \sigma _{ki} = {\left[ 1+\exp (-\{\beta _i + \alpha _{0i}(t_k) z_{k0} + \textstyle \sum _{j=1}^n \alpha _{ji}(t_k) z_{kj}\})\right] }^{-1} . \end{aligned}$$

(7)

Details of this derivation can be found in [4, 5, 8, 14]. Roughly, (5) comes from a ‘Hartree’ approximation of (4), while (6) corresponds to a Poisson distribution whose random parameter is sampled from the QSS distribution of X(t) given Z(t), i.e. a negative binomial distribution. Note that $p(\textbf{z}_k)$ is in general only a pseudo-likelihood, since $\sigma _{ki}$ itself depends on $\textbf{z}_k$.
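Equations (5) to (7) can be evaluated directly. The following plain-Python sketch (not Harissa's own code) computes the stimulus indicator $z_{k0}$, the sigmoid (7), the log of the pseudo-likelihood (5) and the log of the negative binomial likelihood (6) for a single cell. Parameters are passed as plain lists; the layout `alpha[j][i]` $= \alpha_{ji}(t_k)$ for $j=0,\dots,n$, with row 0 for the stimulus, is an assumption made for this illustration, as are the numbers in the usage block.

```python
import math
from typing import Sequence


def stimulus_indicator(t_k: float) -> float:
    """z_k0 = 0 before the stimulus (t_k <= 0) and 1 after it (t_k > 0)."""
    return 1.0 if t_k > 0 else 0.0


def sigma(beta_i: float, alpha_col: Sequence[float], z_full: Sequence[float]) -> float:
    """Eq. (7): alpha_col[j] = alpha_ji(t_k), z_full[j] = z_kj, j = 0..n (0 = stimulus)."""
    s = beta_i + sum(a * z for a, z in zip(alpha_col, z_full))
    if s >= 0:                       # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def log_p_z(z: Sequence[float], sig: Sequence[float], c: Sequence[float]) -> float:
    """Log of Eq. (5): sum of log Gamma(shape c_i*sigma_ki, rate c_i) densities."""
    total = 0.0
    for zi, si, ci in zip(z, sig, c):
        shape = ci * si
        total += ((shape - 1.0) * math.log(zi) - ci * zi
                  + shape * math.log(ci) - math.lgamma(shape))
    return total


def log_p_x_given_z(x: Sequence[int], z: Sequence[float],
                    a: Sequence[float], b: Sequence[float]) -> float:
    """Log of Eq. (6): negative binomial with size a_i*z_ki and p = b_i/(b_i+1)."""
    total = 0.0
    for xi, zi, ai, bi in zip(x, z, a, b):
        r = ai * zi
        total += (math.lgamma(r + xi) - math.lgamma(r) - math.lgamma(xi + 1)
                  + r * math.log(bi) - (r + xi) * math.log(bi + 1.0))
    return total


if __name__ == "__main__":
    # One cell, two genes, measured after the stimulus (illustrative numbers)
    z = [0.8, 0.3]
    z_full = [stimulus_indicator(24.0)] + z
    alpha = [[2.0, 0.0], [0.0, -1.0], [1.5, 0.0]]   # alpha[j][i], row 0 = stimulus
    beta, c = [-1.0, 0.5], [5.0, 5.0]
    sig = [sigma(beta[i], [row[i] for row in alpha], z_full) for i in range(2)]
    print(log_p_z(z, sig, c), log_p_x_given_z([3, 0], z, [10.0, 10.0], [1.0, 1.0]))
```

Working in log space with `math.lgamma` avoids overflow of the Gamma functions for large counts.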

Since the preliminary version of Harissa [5], the global inference procedure has been substantially improved using important identifiability results from [14, 15]. The final algorithm consists of three steps:

1. Model calibration: estimate $a_i$ and $b_i$ for each gene individually from (6);
2. Bursting mode inference: estimate the frequency mode ($k_{0,i}$ or $k_{1,i}$) of each gene in each cell (this can be seen as a binarization step with specific thresholds);
3. Network inference: treat $\textbf{z}_k$ as observed (from step 2) and maximize (5) with respect to $\alpha $ after adding an appropriate penalization term [14]. Each parameter $\theta _{ij}$ is then set to $\alpha _{ij}(t_k)$ for the time point $t_k$ that maximizes $|\alpha _{ij}(t_k)|$.
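The final selection rule of step 3 is fully specified above and easy to sketch. Step 2 is not detailed here, so the binarization below is only a hypothetical illustration: it assumes, for this sketch only, that the two bursting modes correspond to $z_{ki}\in\{z_{\text{low}}, 1\}$ in (6) (for instance $z_{\text{low}} = k_{0,i}/k_{1,i}$, given $a_i = k_{1,i}/d_{0,i}$) and keeps whichever value makes the observed count more likely. Harissa's actual thresholds may be computed differently, and the penalized maximization of (5) itself is omitted. The helper names are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def nb_loglik(x: int, z: float, a: float, b: float) -> float:
    """Single-gene term of Eq. (6) (negative binomial, size a*z)."""
    r = a * z
    return (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(b) - (r + x) * math.log(b + 1.0))


def binarize(x: int, a: float, b: float, z_low: float) -> float:
    """Hypothetical step 2: choose z in {z_low, 1} maximizing Eq. (6) for count x."""
    return 1.0 if nb_loglik(x, 1.0, a, b) >= nb_loglik(x, z_low, a, b) else z_low


def select_theta(alpha_by_time: Dict[float, Matrix]) -> Matrix:
    """Step 3: theta_ij = alpha_ij(t_k) for the t_k maximizing |alpha_ij(t_k)|.

    Matrices may be rectangular (e.g. an extra row for the stimulus).
    """
    times = list(alpha_by_time)
    first = alpha_by_time[times[0]]
    rows, cols = len(first), len(first[0])
    theta = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # max() evaluates the key immediately, so capturing i, j is safe
            best_t = max(times, key=lambda t: abs(alpha_by_time[t][i][j]))
            theta[i][j] = alpha_by_time[best_t][i][j]
    return theta


if __name__ == "__main__":
    alphas = {1.0: [[0.0, 0.5], [-2.0, 0.1]], 2.0: [[0.0, -1.5], [1.0, 0.2]]}
    print(select_theta(alphas))
    print([binarize(x, a=10.0, b=1.0, z_low=0.05) for x in (0, 12)])
```

Keeping the signed value at the time of largest magnitude, rather than averaging over time points, preserves both the sign (activation or inhibition) and the strongest observed effect of each interaction.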

A.3 Repressilator Network

Considering an instance model = NetworkModel(3), the repressilator network simulated in Fig. 3 is defined as follows:


About this paper

Cite this paper

Herbach, U. (2023). Harissa: Stochastic Simulation and Inference of Gene Regulatory Networks Based on Transcriptional Bursting. In: Pang, J., Niehren, J. (eds) Computational Methods in Systems Biology. CMSB 2023. Lecture Notes in Computer Science, vol 14137. Springer, Cham. https://doi.org/10.1007/978-3-031-42697-1_7


DOI: https://doi.org/10.1007/978-3-031-42697-1_7
Published: 09 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42696-4
Online ISBN: 978-3-031-42697-1
eBook Packages: Computer Science, Computer Science (R0)
