Abstract
Gene regulatory networks, a powerful abstraction for describing complex biological interactions between genes through their expression products within a cell, are often regarded as virtually deterministic dynamical systems. However, this view is now being challenged by the fundamentally stochastic, ‘bursty’ nature of gene expression revealed at the single-cell level. We present a Python package called Harissa, dedicated to the simulation and inference of such networks, based on an underlying stochastic dynamical model driven by the transcriptional bursting phenomenon. Within this tool, network inference can be interpreted as a calibration procedure for a mechanistic model: once calibrated, the model is able to capture the typical variability of single-cell data without requiring ad hoc external noise, unlike the ordinary or even stochastic differential equations frequently used in this context. Therefore, Harissa can be used both as an inference tool, to reconstruct biologically relevant networks from time-course scRNA-seq data, and as a simulation tool, to generate quantitative gene expression profiles in a non-trivial way through gene interactions.
Code Availability
The code of the package, a tutorial and some basic usage scripts are available at https://github.com/ulysseherbach/harissa. In addition, Harissa is indexed in the Python Package Index and can be installed via pip.
References
[1] Benaïm, M., Le Borgne, S., Malrieu, F., Zitt, P.A.: Qualitative properties of certain piecewise deterministic Markov processes. Annales de l’Institut Henri Poincaré - Probabilités et Statistiques 51(3), 1040–1075 (2015). https://doi.org/10.1214/14-AIHP619
[2] Faggionato, A., Gabrielli, D., Crivellari, M.: Averaging and large deviation principles for fully-coupled piecewise deterministic Markov processes and applications to molecular motors. Markov Process. Rel. Fields 16(3), 497–548 (2010). https://doi.org/10.48550/arXiv.0808.1910
[3] Herbach, U.: Modélisation stochastique de l’expression des gènes et inférence de réseaux de régulation. Ph.D. thesis, Université de Lyon (2018)
[4] Herbach, U.: Stochastic gene expression with a multistate promoter: breaking down exact distributions. SIAM J. Appl. Math. 79(3), 1007–1029 (2019). https://doi.org/10.1137/18M1181006
[5] Herbach, U., Bonnaffoux, A., Espinasse, T., Gandrillon, O.: Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Syst. Biol. 11(1), 105 (2017). https://doi.org/10.1186/s12918-017-0487-0
[6] Malrieu, F.: Some simple but challenging Markov processes. Annales de la Faculté de Sciences de Toulouse 24(4), 857–883 (2015). https://doi.org/10.5802/afst.1468
[7] Richard, A., et al.: Single-cell-based analysis highlights a surge in cell-to-cell molecular variability preceding irreversible commitment in a differentiation process. PLoS Biol. 14(12), e1002585 (2016). https://doi.org/10.1371/journal.pbio.1002585
[8] Sarkar, A., Stephens, M.: Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis. Nat. Genet. 53(6), 770–777 (2021). https://doi.org/10.1038/s41588-021-00873-4
[9] Schwanhäusser, B., et al.: Global quantification of mammalian gene expression control. Nature 473(7347), 337–342 (2011). https://doi.org/10.1038/nature10098
[10] Semrau, S., Goldmann, J.E., Soumillon, M., Mikkelsen, T.S., Jaenisch, R., van Oudenaarden, A.: Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat. Commun. 8(1), 1096 (2017). https://doi.org/10.1038/s41467-017-01076-4
[11] Shahrezaei, V., Swain, P.S.: The stochastic nature of biochemical networks. Curr. Opin. Biotechnol. 19(4), 369–374 (2008). https://doi.org/10.1016/j.copbio.2008.06.011
[12] Stumpf, P.S., et al.: Stem cell differentiation as a non-Markov stochastic process. Cell Syst. 5(3), 268–282 (2017). https://doi.org/10.1016/j.cels.2017.08.009
[13] Tunnacliffe, E., Chubb, J.R.: What is a transcriptional burst? Trends Genet. 36(4), 288–297 (2020). https://doi.org/10.1016/j.tig.2020.01.003
[14] Ventre, E.: Reverse engineering of a mechanistic model of gene expression using metastability and temporal dynamics. In Silico Biol. 14(3–4), 89–113 (2021). https://doi.org/10.3233/ISB-210226
[15] Ventre, E., Espinasse, T., Bréhier, C.E., Calvez, V., Lepoutre, T., Gandrillon, O.: Reduction of a stochastic model of gene expression: lagrangian dynamics gives access to basins of attraction as cell types and metastabilty. J. Math. Biol. 83(5), 59 (2021). https://doi.org/10.1007/s00285-021-01684-1
[16] Ventre, E., Herbach, U., Espinasse, T., Benoit, G., Gandrillon, O.: One model fits all: combining inference and simulation of gene regulatory networks. PLoS Comput. Biol. 19(3), e1010962 (2023). https://doi.org/10.1371/journal.pcbi.1010962
Acknowledgements
The author is very grateful to Elias Ventre and Olivier Gandrillon for fruitful discussions that led to improvements in the Harissa package.
A Appendices
A.1 Reduced Model
The inference procedure relies on analytical results that are not available for the two-stage ‘mRNA-protein’ model (3). However, such results exist for a one-stage ‘protein-only’ model, which is a valid approximation of the former when proteins are more stable than mRNA (i.e. \(d_{0,i}/d_{1,i} \gg 1\)). The resulting process \((Z(t))_{t\ge 0} \in \mathbb {R}_+^n\) is also a PDMP, whose master equation can be interpreted in terms of simplified trajectories (Fig. 1B):
Given Z(t), mRNA levels X(t) are obtained by sampling independently for every \(i\in \{1,\dots ,n\}\) and \(t > 0\) from \(X_i(t) \sim \textrm{Gamma}(k_{\textrm{on},i}(Z(t))/d_{0,i}, b_i)\), which is the quasi-steady-state (QSS) distribution of the complete model [5, 6].
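The QSS sampling step can be sketched with NumPy. This is a minimal illustration, not Harissa's internal code: it takes the burst frequencies \(k_{\textrm{on},i}(Z(t))\) as a precomputed array (their dependence on Z is defined in the main model and not reproduced here), and it assumes \(b_i\) is the rate (inverse scale) parameter of the Gamma law, so NumPy's `scale` argument is \(1/b_i\). The function name `sample_mrna_qss` and all numerical values are hypothetical.

```python
import numpy as np


def sample_mrna_qss(kon, d0, b, rng=None):
    """Draw X_i ~ Gamma(shape = kon_i / d0_i, rate = b_i) independently for each gene.

    kon : burst frequencies k_on,i(Z(t)), precomputed from the protein levels
    d0  : mRNA degradation rates d_{0,i}
    b   : Gamma rate parameters b_i (assumed inverse-scale convention)
    Arrays broadcast, so kon may hold one row per cell.
    """
    rng = np.random.default_rng() if rng is None else rng
    kon, d0, b = (np.asarray(v, dtype=float) for v in (kon, d0, b))
    # NumPy parameterizes the Gamma law by shape and scale = 1 / rate
    return rng.gamma(kon / d0, 1.0 / b)


# Hypothetical parameters for n = 3 genes
rng = np.random.default_rng(0)
kon = np.array([0.5, 2.0, 1.0])
d0 = np.array([1.0, 1.0, 0.5])
b = np.array([0.1, 0.2, 0.05])
x = sample_mrna_qss(kon, d0, b, rng)
print(x)
```

With many draws, the sample mean for gene i approaches \(k_{\textrm{on},i}/(d_{0,i} b_i)\), the mean of this Gamma law; for the values above that is 5, 10 and 40.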
A.2 Inference Algorithm
Now consider mRNA counts measured in m cells, assumed independent, along a time-course experiment following a stimulus. Each cell \(k = 1, \dots , m\) is associated with an experimental time point \(t_k\). We introduce the following notation:
- \(\textbf{x}_k = (x_{ki})\in \{0,1,2,\dots \}^{n}\): mRNA counts (cell k, gene i);
- \(\textbf{z}_k = (z_{ki})\in (0,+\infty )^{n}\): latent protein levels (cell k, gene i);
- \(\alpha = (\alpha _{ij}(t_k))\in \mathbb {R}^{n \times n}\): effective interaction \(i \rightarrow j\) at time \(t_k\).
A stimulus is represented as gene \(i=0\) and we therefore add parameters \(\alpha _{0j}(t_k)\) for \(j=1, \dots , n\) and \(k=1, \dots , m\). We further set \(z_{k0} = 0\) if \(t_k \le 0\) (before stimulus) and \(z_{k0} = 1\) if \(t_k > 0\) (after stimulus). Then, writing \(a_i = k_{1,i}/d_{0,i}\), the underlying statistical model of Harissa is defined by
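The stimulus convention amounts to prepending a column to the matrix of latent levels. The sketch below assumes the latent levels are stored as an m × n NumPy array and the time points as a length-m vector; the function name `add_stimulus_column` and the example values are hypothetical.

```python
import numpy as np


def add_stimulus_column(z, t):
    """Prepend the stimulus 'gene' 0 to latent levels z of shape (m, n).

    Convention: z_k0 = 0 if t_k <= 0 (before stimulus), z_k0 = 1 if t_k > 0 (after).
    """
    z = np.asarray(z, dtype=float)
    t = np.asarray(t, dtype=float)
    if z.shape[0] != t.shape[0]:
        raise ValueError("z and t must have one row/entry per cell")
    stimulus = (t > 0).astype(float)
    return np.column_stack([stimulus, z])


# Hypothetical example: 4 cells, 2 genes, the first cell measured at t = 0
t = np.array([0.0, 6.0, 12.0, 24.0])
z = np.array([[0.1, 0.2], [0.4, 0.3], [0.7, 0.1], [0.9, 0.05]])
z_aug = add_stimulus_column(z, t)
print(z_aug[:, 0])  # → [0. 1. 1. 1.]
```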
with
Details of this derivation can be found in [4, 5, 8, 14]. Roughly, (5) comes from a ‘Hartree’ approximation of (4), while (6) corresponds to a Poisson distribution whose parameter is randomly sampled from the QSS distribution of X(t) given Z(t). Note that \(p(\textbf{z}_k)\) is in general only a pseudo-likelihood, since \(\sigma _{ki}\) depends on \(\textbf{z}_k\).
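The Poisson-with-random-parameter construction behind (6) can be checked numerically. When a Poisson parameter is drawn from a Gamma law with shape r and rate b, the resulting count follows a negative binomial law with mean r/b and variance r/b + r/b², i.e. with extra variance beyond the Poisson level. The NumPy sketch below samples counts both ways and compares them with these moments; r and b are hypothetical, and the snippet illustrates only this mixture structure, not the exact form of (6).

```python
import numpy as np

rng = np.random.default_rng(42)
r, b = 2.0, 0.25          # hypothetical Gamma shape and rate
n_cells = 200_000

# Two-level sampling: latent mRNA level first, then a Poisson count given that level
lam = rng.gamma(r, 1.0 / b, size=n_cells)
counts = rng.poisson(lam)

# Equivalent closed form: negative binomial with success probability p = b / (1 + b)
counts_nb = rng.negative_binomial(r, b / (1.0 + b), size=n_cells)

mean_theory = r / b                   # 8.0
var_theory = r / b + r / b**2         # 8.0 + 32.0 = 40.0
print(counts.mean(), counts.var())
print(counts_nb.mean(), counts_nb.var())
```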
Since the preliminary version of Harissa [5], the global inference procedure has been substantially improved using important identifiability results from [14, 15]. The final algorithm consists of three steps:
1. Model calibration: estimate \(a_i\) and \(b_i\) for each gene individually from (6);
2. Bursting mode inference: estimate the frequency mode (\(k_{0,i}\) or \(k_{1,i}\)) for each gene in each cell (this can be seen as a binarization step with specific thresholds);
3. Network inference: consider \(\textbf{z}_k\) as observed from step 2 and maximize (5) with respect to \(\alpha \) after adding an appropriate penalization term [14]. Each parameter \(\theta _{ij}\) is then set to \(\alpha _{ij}(t_k)\), where \(t_k\) is the time point that maximizes \(|\alpha _{ij}(t_k)|\).
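The three steps can be outlined as a small pipeline. This is a simplified sketch built only from the description above, not Harissa's implementation: step 1 uses a moment-matching fit of a Gamma-Poisson (negative binomial) law per gene as a stand-in for the calibration from (6), and treating its shape and rate as \(a_i\) and \(b_i\) is an assumption; step 2 binarizes counts with per-gene thresholds supplied by the caller, since the actual threshold rule is not reproduced here; step 3 skips the penalized maximization of (5) and applies only the final selection of \(\theta_{ij}\) to a precomputed array of \(\alpha_{ij}(t_k)\). All function names and values are hypothetical.

```python
import numpy as np


def fit_gamma_poisson_moments(x):
    """Step 1 stand-in: moment-matching fit of a Gamma-Poisson law, gene by gene.

    x has shape (m cells, n genes). If counts are Poisson with a Gamma(shape, rate)
    parameter, then mean = shape / rate and variance = mean + mean / rate.
    Returns (shape, rate) arrays of length n; genes without overdispersion get NaN.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    excess = x.var(axis=0) - mean          # extra variance beyond Poisson
    with np.errstate(divide="ignore", invalid="ignore"):
        rate = np.where(excess > 0, mean / excess, np.nan)
    return mean * rate, rate


def binarize_modes(x, thresholds):
    """Step 2 stand-in: 1 (mode k1) if a count exceeds its gene's threshold, else 0 (mode k0)."""
    return (np.asarray(x) > np.asarray(thresholds)[None, :]).astype(int)


def select_theta(alpha):
    """Step 3 final selection from precomputed alpha with time points on axis 0.

    For each pair (i, j), keep alpha_ij(t_k) at the time point that maximizes |alpha_ij(t_k)|.
    """
    alpha = np.asarray(alpha, dtype=float)
    k_star = np.argmax(np.abs(alpha), axis=0)                # best time index per (i, j)
    return np.take_along_axis(alpha, k_star[None, :, :], axis=0)[0]


# Hypothetical data: 100,000 cells, 3 genes, negative binomial counts (shape 2, rate 0.25)
rng = np.random.default_rng(0)
x = rng.negative_binomial(2.0, 0.2, size=(100_000, 3))
shape, rate = fit_gamma_poisson_moments(x)
modes = binarize_modes(x, thresholds=np.array([5, 5, 5]))   # hypothetical thresholds

# Hypothetical alpha for T = 2 time points and a 2 x 2 block of interactions
alpha = np.array([[[0.0, 1.0], [-0.5, 0.0]],
                  [[0.0, -2.0], [0.3, 0.0]]])
theta = select_theta(alpha)
print(theta)
```

Keeping the signed value (rather than its absolute value) in step 3 preserves whether the inferred interaction is an activation or a repression.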
A.3 Repressilator Network
Considering an instance model = NetworkModel(3), the repressilator network simulated in Fig. 3 is defined as follows:
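Independently of the Harissa API, the repressilator topology (gene 1 represses gene 2, gene 2 represses gene 3, and gene 3 represses gene 1) can be written as a signed interaction matrix. The sketch below follows the appendix convention of reserving index 0 for the stimulus, so three genes give a 4 × 4 array with \(\theta_{ij}\) the effect of gene i on gene j. The helper `repressilator_matrix` is hypothetical and not part of Harissa, and the strength -10 is an illustrative value, not the parameter used for Fig. 3.

```python
import numpy as np


def repressilator_matrix(strength=-10.0):
    """Signed interaction matrix theta for a 3-gene repressilator.

    theta[i, j] is the effect of gene i on gene j; index 0 is the stimulus
    (left at zero here), genes are 1, 2, 3. The strength is hypothetical.
    """
    if strength >= 0:
        raise ValueError("a repression must have a negative strength")
    theta = np.zeros((4, 4))
    for i, j in [(1, 2), (2, 3), (3, 1)]:
        theta[i, j] = strength   # gene i represses gene j
    return theta


theta = repressilator_matrix()
print(theta)
# Each gene has exactly one repressor and represses exactly one other gene
print((theta < 0).sum(axis=0)[1:], (theta < 0).sum(axis=1)[1:])  # → [1 1 1] [1 1 1]
```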
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Herbach, U. (2023). Harissa: Stochastic Simulation and Inference of Gene Regulatory Networks Based on Transcriptional Bursting. In: Pang, J., Niehren, J. (eds) Computational Methods in Systems Biology. CMSB 2023. Lecture Notes in Computer Science, vol 14137. Springer, Cham. https://doi.org/10.1007/978-3-031-42697-1_7
DOI: https://doi.org/10.1007/978-3-031-42697-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42696-4
Online ISBN: 978-3-031-42697-1
eBook Packages: Computer Science, Computer Science (R0)