Phenotype Control techniques for Boolean gene regulatory networks

Plaugher, Daniel; Murrugarra, David

doi:10.1007/s11538-023-01197-6


Review
Published: 30 August 2023

Volume 85, article number 89, (2023)

Bulletin of Mathematical Biology


Abstract

Modeling cell signal transduction pathways via Boolean networks (BNs) has become an established method for analyzing intracellular communications over the last few decades. What’s more, BNs provide a coarse-grained approach, not only to understanding molecular communications, but also for targeting pathway components that alter the long-term outcomes of the system. This has come to be known as phenotype control theory. In this review, we study the interplay of various approaches for controlling gene regulatory networks, such as algebraic methods, control kernel, feedback vertex set, and stable motifs. The study also includes a comparative discussion of the methods, using an established cancer model of T-Cell Large Granular Lymphocyte Leukemia. Further, we explore possible options for making the control search more efficient using reduction and modularity. Finally, we discuss challenges such as the complexity and the availability of software for implementing each of these control techniques.


References

Aguilar B, Gibbs DL, Reiss DJ, McConnell M, Danziger SA, Dervan A, Trotter M, Bassett D, Hershberg R, Ratushny AV, Shmulevich I (2020) A generalizable data-driven multicellular model of pancreatic ductal adenocarcinoma. Gigascience 9(7):07
Aguilar B, Fang P, Laubenbacher R, Murrugarra D (2020) A near-optimal control method for stochastic Boolean networks. Lett Biomath 7(1):67
Akutsu T, Hayashida M, Ching W-K, Michael KN (2007) Control of Boolean networks: hardness results and algorithms for tree structured networks. J Theor Biol 244(4):670–679
Arkin A, Ross J, McAdams HH (1998) Stochastic kinetic analysis of developmental pathway bifurcation in phage $\lambda $-infected Escherichia coli cells. Genetics 149(4):1633–1648
Baker RE, Pena J-M, Jayamohan J, Jérusalem A (2018) Mechanistic models versus machine learning, a fight worth fighting for the biological community? Biol Lett 14(5):20170660
Bender EA, Williamson SG (2010) Lists, decisions and graphs. S. Gill Williamson
Bertsekas D (2019) Reinforcement learning and optimal control. Athena Scientific, Nashua
Borriello E, Daniels BC (2021) The basis of easy controllability in Boolean networks. Nat Commun 12(1)
Cheng D, Qi H, Li Z, Liu JB (2011) Stability and stabilization of Boolean networks. Int J Robust Nonlinear Control 21(2):134–156
Choo S-M, Ban B, Joo JI, Cho K-H (2018) The phenotype control kernel of a biomolecular regulatory network. BMC Syst Biol 12(1):49
Cifuentes-Fontanals L, Tonello E, Siebert H (2022) Control in Boolean networks with model checking. Front Appl Math Stat 8
Cifuentes-Fontanals L, Tonello E, Siebert H (2022) Node and edge control strategy identification via trap spaces in Boolean networks
Creative Proteomics (2018) Brief introduction of post-translational modifications (PTMS). Creative Proteomics Blog
Didier G, Remy E, Chaouiya C (2011) Mapping multivalued onto Boolean dynamics. J Theor Biol 270(1):177–184
Erkan M, Reiser-Erkan C, Michalski C, Kleeff J (2010) Tumor microenvironment and progression of pancreatic cancer. Exp Oncol 32:128–31
Farrow B, Albo D, Berger DH (2008) The role of the tumor microenvironment in the progression of pancreatic cancer. J Surg Res 149(2):319–328
Feig C, Gopinathan A, Neesse A, Chan DS, Cook N, Tuveson DA (2012) The pancreas cancer microenvironment. Clin Cancer Res 18(16):4266–4276
Festa P, Pardalos P, Resende M (1999) Feedback set problems. Encyclopedia of optimization 2
Fiedler B, Mochizuki A, Kurosawa G, Saito D (2013) Dynamics and control at feedback vertex sets. I: informative and determining nodes in regulatory networks. J Dyn Differ Equ 25(3):563–604
Galinier P, Lemamou E, Bouzidi M (2013) Applying local search to the feedback vertex set problem. J Heuristics 19:10
Gong C, Milberg O, Wang B, Vicini P, Narwal R, Roskos L, Popel AS (2017) A computational multiscale agent-based model for simulating spatio-temporal tumour immune response to pd1 and pdl1 inhibition. J R Soc Interface 14(134):20170320
Gore J, Korc M (2014) Pancreatic cancer stroma: friend or foe? Cancer Cell 25:711–712
Grayson DR, Stillman ME (2002) Macaulay2, a software system for research in algebraic geometry. http://www.math.uiuc.edu/Macaulay2/
Heinz S, Urszula L (2016) Optimal control for mathematical models of cancer therapies: an application of geometric methods, vol 42. Springer, New York
Hinkelmann F, Brandon M, Guang B, McNeill R, Blekherman G, Veliz-Cuba A, Laubenbacher R (2011) ADAM: analysis of discrete models of biological systems using computer algebra. BMC Bioinform 12:295
Johnson K, Plaugher D, Murrugarra D (2023) Investigating the effect of changes in model parameters on optimal control policies, time to absorption, and mixing times
Kadelka C, Laubenbacher R, Murrugarra D, Veliz-Cuba A, Matthew W (2022) Decomposition of Boolean networks: an approach to modularity of biological systems
Kauffman SA (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 22(3):437–467
Kleeff J, Beckhove P, Esposito I, Herzig S, Huber PE, Matthias Löhr J, Friess H (2007) Pancreatic cancer microenvironment. Int J Cancer 121(4):699–705
Lenhart S, Workman JT (2007) Optimal control applied to biological models, 1st edn. Chapman Hall/CRC, Boca Raton
Loughran TP (2006) Large granular lymphocytic leukemia. Leukemia and Lymphoma Society
Macklin P (2019) Key challenges facing data-driven multicellular systems biology. Gigascience 8(10):giz127
Mochizuki A, Fiedler B, Kurosawa G, Saito D (2013) Dynamics and control at feedback vertex sets. II: a faithful monitor to determine the diversity of molecular activities in regulatory networks. J Theor Biol 335:130–146
Moore H (2018) How to mathematically optimize drug regimens using optimal control. J Pharmacokinet Pharmacodyn 45(1):127–137
Motter AE (2015) Networkcontrology. Chaos Interdiscip J Nonlinear Sci 25(9):097621
Murrugarra D, Aguilar B (2018) Algebraic and combinatorial computational biology, chapter 5. Academic Press, New York, pp 149–150
Murrugarra D, Dimitrova ES (2015) Molecular network control through Boolean canalization. EURASIP J Bioinform Syst Biol 2015(1):9
Murrugarra D, Dimitrova E (2021) Quantifying the total effect of edge interventions in discrete multistate networks. Automatica 125:109453
Murrugarra D, Veliz-Cuba A, Aguilar B, Arat S, Laubenbacher R (2012) Modeling stochasticity and variability in gene regulatory networks. EURASIP J Bioinf Syst Biol 2012(1):5
Murrugarra D, Veliz-Cuba A, Aguilar B, Laubenbacher R (2016) Identification of control targets in Boolean molecular network models via computational algebra. BMC Syst Biol 10(1):94
Murrugarra D, Miller J, Mueller AN (2016) Estimating propensity parameters using google PageRank and genetic algorithms. Front Neurosci 10:513
Padoan A, Plebani M, Basso D (2019) Inflammation and pancreatic cancer: focus on metabolism, cytokines, and immunity. Int J Mol Sci 20:676
Plaugher D (2022) An integrated computational pipeline to construct patient-specific cancer models
Plaugher D, Aguilar B, Murrugarra D (2022) Uncovering potential interventions for pancreatic cancer patients via mathematical modeling. J Theor Biol 548:111197
Plaugher D, Murrugarra D (2021) Modeling the pancreatic cancer microenvironment in search of control targets. Bull Math Biol 83
Rozum J, Albert R (2022) Leveraging network structure in nonlinear control. NPJ Syst Biol Appl 8(1):36
Saadatpour A, Albert I, Albert R (2010) Attractor analysis of asynchronous Boolean models of signal transduction networks. J Theor Biol 266(4):641–56
Saadatpour A, Wang R-S, Liao A, Liu X, Loughran TP, Albert I, Albert R (2011) Dynamical and structural analysis of a T cell survival network identifies novel candidate therapeutic targets for large granular lymphocyte leukemia. PLoS Comput Biol 7(11):e1002267
Saadatpour A, Albert R, Reluga T (2013) A reduction method for Boolean network models proven to conserve attractors. SIAM J Appl Dyn Syst 12:1997–2011
Shmulevich I, Dougherty ER, Kim S, Zhang W (2002) Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 18(2):261–274
Shmulevich I, Dougherty ER (2010) Probabilistic Boolean networks: the modeling and control of gene regulatory networks. SIAM
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge
Taylor BP, Dushoff J, Weitz JS (2016) Stochasticity and the limits to confidence when estimating r0 of Ebola and other emerging infectious diseases. J Theor Biol 408:145–154
Thomas R (1973) Boolean formalization of genetic control circuits. J Theor Biol 42(3):563–585
Veliz-Cuba A (2011) Reduction of Boolean network models. J Theor Biol 289:167–172
Veliz-Cuba A, Aguilar B, Hinkelmann F, Laubenbacher R (2014) Steady state analysis of Boolean molecular network models via model reduction and computational algebra. BMC Bioinform 15:221
Veliz-Cuba A, Voss SR, Murrugarra D (2022) Building model prototypes from time-course data. Lett Biomath 9(1):107–120
Vieira LS, Laubenbacher RC, Murrugarra D (2020) Control of intracellular molecular networks using algebraic methods. Bull Math Biol 82(1):1–22
Waddington CH (1957) The strategy of the genes: a discussion of some aspects of theoretical biology. Allen & Unwin, London
Yang J-M, Lee C-K, Cho K-H (2020) Stabilizing control of complex biological networks based on attractor-specific network reduction. IEEE Trans Control Netw Syst 8(2):928–939
Yang J-M, Lee C-K, Cho K-H (2021) Stabilizing control of complex biological networks based on attractor-specific network reduction. IEEE Trans Control Netw Syst 8(2):928–939
Yang G, Zañudo JGT, Albert R (2018) Target control in logical models using the domain of influence of nodes. Front Physiol 9
Yousefi MR, Datta A, Dougherty ER (2012) Optimal intervention strategies for therapeutic methods with fixed-length duration of drug effectiveness. IEEE Trans Signal Process 60(9):4930–4944
Zañudo J, Albert R (2013) An effective network reduction approach to find the dynamical repertoire of discrete dynamic networks. Chaos (Woodbury, NY) 23:025111
Zañudo JGT, Albert R (2015) Cell fate reprogramming by control of intracellular network dynamics. PLoS Comput Biol 11(4):e1004193
Zañudo JGT, Yang G, Albert R (2017) Structure-based control of complex networks with nonlinear dynamics. Proc Natl Acad Sci USA 114(28):7234–7239


Acknowledgements

The authors would like to thank Reinhard Laubenbacher and Reka Albert for their discussions and suggestions during the initial stage of this project. Further, D.P. was supported by the NIH Training Grant T32CA165990. D.M. was partially supported by a Collaboration grant (850896) from the Simons Foundation.

Author information

Authors and Affiliations

Department of Toxicology and Cancer Biology, University of Kentucky, Lexington, KY, USA
Daniel Plaugher
Department of Mathematics, University of Kentucky, Lexington, KY, USA
David Murrugarra


Corresponding author

Correspondence to Daniel Plaugher.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

7 Appendix

7.1 Finite Dynamical Systems

For the last few decades, a popular modeling approach for gene regulation has been to implement dynamical systems over finite fields. Here, functions can be interpreted as modeling information processing within cells, which determines cellular behavior. As depicted in Fig. 8, $\{x_{i_1},\dots ,x_{i_m}\} $ represent the input genes or predictor genes, $f_i(x_{i_1},...,x_{i_m})$ is the internal update function or predictor rule, and $x_i$ is the target gene.

First, let $X=X_1\times X_2\times \dots \times X_n$ be the Cartesian product of finite sets. A local model over a finite set X is an n-tuple of coordinate functions $F=(f_1, f_2,\dots , f_n)$, where $f_i:X^n\rightarrow X$. Each function $f_i$ uniquely determines a function

$$\begin{aligned} F_i: (x_1,\dots ,x_n) \mapsto (x_1,\dots ,f_i(x),\dots ,x_n) \end{aligned}$$

where $x=(x_1,\dots ,x_n)$. Every local model defines a canonical finite dynamical system (FDS) map, in which all coordinate functions are applied at once:

$$\begin{aligned} f:X^n\rightarrow X^n,\quad f:(x_1,\dots ,x_n)\mapsto (f_1(x),\dots ,f_n(x)). \end{aligned}$$

Note that discrete does not necessarily imply finite. Take the natural numbers ${\mathbb {N}}=\{1, 2, 3, 4, \dots \}$, for example. The set is clearly discrete, yet its cardinality is infinite. In general, we cannot always write a function as a tuple if the space is simply “discrete”. In order to provide structure to each $X_i$, we embed $X_i$ into a finite field where, for some prime p,

$$\begin{aligned} X_i\hookrightarrow {\mathbb {F}},\quad |{\mathbb {F}}|=p^k. \end{aligned}$$

For example, if we desire states of Low, Medium, and High to represent levels of gene expression, then $X_i=\{L,M,H\} \hookrightarrow {\mathbb {F}}_3=\{0,1,2\}$. We call these mixed-state models when states are non-binary. For the case when all states are binary (i.e. ON or OFF, HIGH or LOW, 1 or 0), we call these models Boolean networks (Plaugher 2022).

7.1.1 Boolean Networks

Boolean networks (BNs) are popular because we can build effective models without the use of constants or rates. This eliminates the need for tedious parameter discovery. Rather, BNs focus on the mechanics and logic of the system. BN models were originally introduced by Kauffman and Thomas to provide a coarse-grained description of gene regulatory networks (Kauffman 1969; Thomas 1973). Within a BN there are three main components: structure (wiring diagram), functions (regulatory rules), and dynamics (attractors). As we begin to define our terms, it may be helpful to keep Fig. 9 in mind as a basic example. Given n binary variables, define a Boolean network as an n-tuple of coordinate functions

$$\begin{aligned} F=(f_1,\dots ,f_n): \{0,1\}^n\rightarrow \{0,1\}^n, \quad f_i:\{0,1\}^n\rightarrow \{0,1\}. \end{aligned}$$

The wiring diagram of F, call it W, is then defined as a directed graph with n nodes $\{x_1, x_2,\dots , x_n\}$ such that there is an edge in W from $x_j$ to $x_i$ if $f_i$ depends on $x_j$. That is,

$$\begin{aligned} x_j\rightarrow x_i \quad \text {if} \quad f_i=f(x_{i_1},\dots ,x_{i_j},\dots ,x_{i_k}) \end{aligned}$$

Within W we denote positive edges as $x_j\rightarrow x_i$ and negative edges as $x_j\dashv x_i$ (or sometimes $x_j\multimap x_i$). Biologically, a positive edge is representative of activation while a negative edge represents inhibition. For example, in Fig. 9 we see the wiring diagram of $F=(f_1, f_2)=(x_2,x_1)$.

Now that we have structure and functions, the dynamics of F are traditionally described as: (1) trajectories for all $2^n$ possible initial conditions, or (2) a directed graph with nodes in ${\mathbb {F}}^n_2=\{0,1\}^n$. In the first case, a trajectory is a sequence $(x(t))_{t=0}^\infty $ given by the difference equations $x(t+1)=F(x(t))$ for all $t\ge 0$ (Kadelka et al. 2022). For example, Fig. 9 would yield deterministic trajectories

$$\begin{aligned} T_1&=(00, 00, 00,\dots )\\ T_2&=(11, 11, 11,\dots )\\ T_3&=(01, 10, 01, 10,\dots )\\ T_4&=(10, 01, 10, 01,\dots ). \end{aligned}$$
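The four trajectories above can be reproduced by iterating the difference equation directly. The following is a minimal sketch for the two-node example $F=(x_2,x_1)$; the helper name `trajectory` is illustrative and not taken from the paper.

```python
def F(state):
    """Synchronous update for F = (f1, f2) = (x2, x1)."""
    x1, x2 = state
    return (x2, x1)

def trajectory(state, steps):
    """Return the first `steps` states of the trajectory x(t+1) = F(x(t))."""
    path = [state]
    for _ in range(steps - 1):
        state = F(state)
        path.append(state)
    return path

# Print the trajectories T1..T4 as bit strings.
for start in [(0, 0), (1, 1), (0, 1), (1, 0)]:
    print(["".join(map(str, s)) for s in trajectory(start, 4)])
```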

The phase space (also called state space) of F is the directed graph with vertex set $S=\{0,1\}^n$ and edge set $\{(s,F(s))\mid s\in S\}$. Simply put, in a BN, S is the set of all possible states, and their respective transitions according to the model F form the state space (see Fig. 10). A node $s\in S$ is called transient if $F^k(s)\ne s$ for all $k\ge 1$, a node $s\in S$ is called periodic (or cyclic) if $F^k(s)= s$ for some $k\ge 1$, and a node $s\in S$ is called a fixed point if $F(s)= s$. We can also think of the phase space as having strongly connected components (SCCs), where an SCC is said to be terminal if it has no outgoing edges. Thus, a transient state is not in a terminal SCC, a cyclic attractor is a terminal k-cycle ($k=1$ is a fixed point), and any other terminal SCC is a complex attractor. In other words, we define an attractor as a set of states from which there is no escape as the system evolves, and an attractor with a single state is called a fixed point. Thus, given sufficient time, the dynamics of a BN always end up in a fixed point or a (complex) attractor.

Table 4 Dynamic truth table for Fig. 9


For example, recall that $F=(f_1, f_2)=(x_2,x_1)$. To find the dynamics of the corresponding state space $S=\{00,01,10,11\}$, one can construct the truth table in Table 4 using lexicographic ordering. It is important to point out that we denote the states in order of the variables so that

$$\begin{aligned} s_2 = \{0,1\} = 01 = \{x_1=0,x_2=1\}, \end{aligned}$$

because maintaining order is highly important for correct interpretation of state values. The left columns indicate the possible states of our nodes $x_1$ and $x_2$, whereas the right columns indicate their deterministic updates according to the functions $f_1$ and $f_2$. Therefore, from the framework we see in Fig. 10 that we have two fixed points and one cycle.
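As a small illustration of how such a truth table leads to the attractors, the sketch below enumerates the states of $F=(x_2,x_1)$ in lexicographic order, tabulates their synchronous updates, and follows each state until it repeats. The names `table` and `attractor_of` are illustrative assumptions.

```python
from itertools import product

def F(state):
    x1, x2 = state
    return (x2, x1)  # f1 = x2, f2 = x1

# Lexicographic ordering of the states: 00, 01, 10, 11.
states = list(product([0, 1], repeat=2))
table = {s: F(s) for s in states}  # left columns -> right columns

fixed_points = [s for s in states if table[s] == s]

def attractor_of(s):
    """Follow the trajectory from s until a state repeats; return the cycle reached."""
    seen = []
    while s not in seen:
        seen.append(s)
        s = table[s]
    return frozenset(seen[seen.index(s):])

attractors = {attractor_of(s) for s in states}
cycles = [a for a in attractors if len(a) > 1]

for s in states:
    print(s, "->", table[s])
```

With this deterministic (synchronous) update, the enumeration recovers the two fixed points and the single 2-cycle described in the text.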

Table 5 Standard Boolean logical rules


Up to this point we have only discussed linear BNs, but real-world models are almost always highly nonlinear (see Fig. 11). To accommodate these nonlinear regulatory networks, we implement various classes of functions based on three main Boolean logical rules - AND, OR, NOT. Some use XOR (exclusive OR), but for simplicity it is excluded here. Assume the variables x and y are given in a BN. Then Table 5 summarizes the functionality and notation used for each of the three main rules.
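The body of Table 5 is not reproduced in this text. As a hedged illustration, the sketch below uses the standard translation of the three rules into polynomials over ${\mathbb {F}}_2$ (AND as $xy$, OR as $x+y+xy$, NOT as $x+1$), which agrees with the polynomial forms used in Sect. 7.2.1, and checks it against ordinary Boolean logic on all inputs. The function names are illustrative.

```python
from itertools import product

def AND(x, y):
    return (x * y) % 2          # x AND y  ->  xy

def OR(x, y):
    return (x + y + x * y) % 2  # x OR y   ->  x + y + xy

def NOT(x):
    return (x + 1) % 2          # NOT x    ->  x + 1

# Compare each polynomial with the corresponding logical operation.
verified = all(
    AND(x, y) == int(bool(x) and bool(y))
    and OR(x, y) == int(bool(x) or bool(y))
    and NOT(x) == int(not x)
    for x, y in product([0, 1], repeat=2)
)
print("polynomial forms agree with logic:", verified)
```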

A common criticism of using discrete models for regulatory networks such as BNs is that deterministic dynamics are artificial. In reality, biological systems do not contain a “central clock”; instead, the concentration levels of gene products change and respond to stimuli on varying time-scales. Thus, the update schedules chosen play a significant role in the accuracy of the model. Synchronous update schedules produce deterministic dynamics, wherein nodes are all updated simultaneously so that

$$\begin{aligned} x(0)\rightarrow x(1)=F(x(0))\rightarrow x(2)=F(x(1))\rightarrow \cdots . \end{aligned}$$

On the other hand, asynchronous update schedules produce stochastic dynamics, wherein a randomly selected node is updated at each time step so that

$$\begin{aligned} x(0)\rightarrow x(1)=(x_1(0),\dots ,f_i(x(0)),\dots ,x_n(0))\rightarrow \cdots . \end{aligned}$$

Lastly, sequential update schedules are performed asynchronously according to a designated permutation $\sigma = (\sigma _1,\dots ,\sigma _n)$ of $(1,\dots , n)$. Specifically, if we define $F_i(x_1,\dots , x_n)=(x_1,\dots ,f_i(x),\dots ,x_n)$, then the update is given by

$$\begin{aligned} F_\sigma (x)=F_{\sigma _n}(F_{\sigma _{n-1}}(\cdots (F_{\sigma _{1}}(x))\cdots )) \end{aligned}$$

according to the order designated by $\sigma $. This is sometimes done when the ordering of gene updates is known, as some may update faster than others. For the simple example in Fig. 9, Fig. 12 shows the varying impacts of these three update schedules.
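The three schedules can be sketched in a few lines for the example $F=(x_2,x_1)$. This is an illustrative implementation only: nodes are indexed from 0, and the uniform random choice of node in `asynchronous` is an assumption about how the "randomly selected node" is drawn.

```python
import random

def f1(x):
    return x[1]  # f1 = x2

def f2(x):
    return x[0]  # f2 = x1

FUNCS = [f1, f2]

def synchronous(x):
    """Update every node simultaneously: x -> F(x)."""
    return tuple(f(x) for f in FUNCS)

def update_node(x, i):
    """F_i: update only coordinate i (0-based), leaving the others unchanged."""
    y = list(x)
    y[i] = FUNCS[i](x)
    return tuple(y)

def asynchronous(x, rng=random):
    """Update one node chosen uniformly at random (assumed distribution)."""
    return update_node(x, rng.randrange(len(x)))

def sequential(x, sigma):
    """Apply F_{sigma_1} first, then F_{sigma_2}, and so on."""
    for i in sigma:
        x = update_node(x, i)
    return x

print(synchronous((0, 1)))         # the state 01 stays on the 2-cycle
print(sequential((0, 1), (0, 1)))  # updating x1 first, then x2
print(sequential((0, 1), (1, 0)))  # updating x2 first, then x1
```

In this small example, the fixed points 00 and 11 are unchanged by every schedule, while the synchronous 2-cycle between 01 and 10 is not a cycle under either sequential order.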

We can easily observe from Fig. 12 that fixed points are maintained across all update schedules. However, cycles are not necessarily preserved. As a result, different update schedules lead to different dynamics in the state space, which could lead to different attractors (or eliminate attractors), which would result in different target discoveries for interventions. This is where the framework of Stochastic Discrete Dynamical Systems (SDDS) is beneficial (Murrugarra and Aguilar 2018; Plaugher and Murrugarra 2021; Plaugher et al. 2022; Plaugher 2022). Developed in Murrugarra and Aguilar (2018), SDDS incorporates Markov chain tools to study long-term dynamics of Boolean networks. SDDS uses parameters based on designated propensities to model node (and pathway) signal activation and deactivation, also referred to as degradation. In essence, SDDS merges the synchronous and asynchronous update schedules described above. One propensity is used when the update positively impacts the node, in the sense that the node increases its value from OFF to ON. Another propensity is used when the update negatively affects the node in the sense that the node decreases its value from ON to OFF. More precisely, an SDDS of the variables $(x_1, x_2,..., x_{n})$ is a collection of n triples

$$\begin{aligned} \hat{F}=\{f_k, p_k^\uparrow , p_k^\downarrow \}_{k=1}^n \end{aligned}$$

where for $k=1,..., n$,

$f_k:\{0,1\}^n\rightarrow \{0,1\}$ is the update function for $x_k$
$p_k^\uparrow \in [0,1]$ is the activation propensity
$p_k^\downarrow \in [0,1]$ is the deactivation propensity

Here, the parameters $p_k^\uparrow $ and $p_k^\downarrow $ introduce stochasticity. For example, an activation of $x_k(t)$ at the next time step (i.e. $x_k(t)=0$, $f_k(x_1(t),...,x_n(t))=1$, and $x_k(t+1)=1$) occurs with probability $p_k^\uparrow $. An SDDS can be represented as a Markov Chain via its transition matrix, which can be viewed as transition probabilities between various states of the network. Elements of the transition matrix A are determined as follows: consider the set ${S}=\{0,1\}^n$ consisting of all possible states of the network. Suppose $x=(x_1,...,x_n)\in {S}$ and $y=(y_1,...,y_n)\in {S}$. Then, the probability of transitioning from x to y is

$$\begin{aligned} a_{y,x}=\prod _{i=1}^n P(x_i\rightarrow y_i) \end{aligned}$$

(23)

where entries are stored column-wise and

$$\begin{aligned} P(x_i\rightarrow f_i(x))=\left\{ \begin{matrix} p_i^\uparrow , &{}\text {if } x_i<f_i(x)\\ p_i^\downarrow , &{}\text {if } x_i>f_i(x)\\ 1, &{}\text {if } x_i = f_i(x) \end{matrix}\right. \quad \text {and}\quad P(x_i\rightarrow x_i)=\left\{ \begin{matrix} 1-p_i^\uparrow , &{}\text {if } x_i<f_i(x)\\ 1-p_i^\downarrow , &{}\text {if } x_i>f_i(x)\\ 1, &{}\text {if } x_i = f_i(x) \end{matrix}\right. . \end{aligned}$$

It follows that $P(x_i\rightarrow y_i)=0$ for any $y_i\notin \{x_i,f_i(x)\}$. Therefore, we achieve $A=[a_{y,x}]_{x,y\in {S}}$. Note that when propensities are set to $p=1$, we have a traditional BN. With this framework, we built a simulator that takes random initial states as inputs and then tracks the trajectory of each node through time. Long-term phenotype expression probabilities can then be estimated, as well as network dynamics with (and without) controls (Plaugher 2022).
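To make the construction of $A$ concrete, the sketch below builds the transition matrix from Eq. 23 for a small network. It is not the authors' simulator: the function name `sdds_transition_matrix`, the list-of-lists storage, and the lexicographic indexing of states are assumptions made for illustration. Entries are stored column-wise, so `A[y][x]` holds $a_{y,x}$ and every column sums to 1.

```python
from itertools import product

def sdds_transition_matrix(funcs, p_up, p_down):
    """Return (states, A) with A[y][x] = probability of moving from x to y."""
    n = len(funcs)
    states = list(product([0, 1], repeat=n))
    index = {s: k for k, s in enumerate(states)}
    A = [[0.0] * len(states) for _ in states]
    for x in states:
        for y in states:
            prob = 1.0
            for i in range(n):
                target = funcs[i](x)
                if target == x[i]:
                    # No change is proposed, so node i must stay where it is.
                    step = 1.0 if y[i] == x[i] else 0.0
                else:
                    # Activation uses p_up, degradation uses p_down.
                    p = p_up[i] if x[i] < target else p_down[i]
                    step = p if y[i] == target else 1.0 - p
                prob *= step
            A[index[y]][index[x]] = prob
    return states, A

# Example: F = (x2, x1) with all propensities equal to 0.5.
funcs = [lambda x: x[1], lambda x: x[0]]
states, A = sdds_transition_matrix(funcs, [0.5, 0.5], [0.5, 0.5])
for row in A:
    print(row)
```

Setting every propensity to 1 reduces the matrix to the deterministic synchronous transitions, consistent with the remark above.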

7.2 Elementary Examples for Control Methods

7.2.1 Computational Algebra

Consider the network in Fig. 13, with the following regulatory functions.

$$\begin{aligned} f_1&= (\sim x_3) \wedge (\sim x_5)\\ f_2&= (\sim x_1) \vee x_4\\ f_3&= (\sim x_2)\vee x_5\\ f_4&= x_3\\ f_5&= \sim x_4 \end{aligned}$$

Using Table 5, we rewrite our functions as the following simplified polynomials.

$$\begin{aligned} f_1&= 1+x_3+x_5+x_3x_5\\ f_2&= 1+x_1+x_1x_4\\ f_3&= x_2x_5+x_2+1\\ f_4&= x_3\\ f_5&= 1+x_4\\ \end{aligned}$$

We can then find the fixed points of the system by solving $f_i=x_i$ for $i=1,\dots , 5$. Equivalently, one finds the common roots of $g_i=0$, where $g_i=f_i-x_i$, by computing a Gröbner basis of the ideal $I=\langle g_1,\dots ,g_5\rangle $. In any case, the example in Fig. 13 does not contain any fixed points. However, further state space analysis does reveal two attractors: $\{01011, 01100\}$ and $\{00101, 01010, 01110, 01111, 10001, 11000\}$. Now, we encode our edge controls as

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= 1+(u_{3,1}+1)x_3+(u_{5,1}+1)x_5+(u_{3,1}+1)x_3(u_{5,1}+1)x_5\\ {\mathcal {F}}_2&= 1+(u_{1,2}+1)x_1+(u_{1,2}+1)x_1(u_{4,2}+1)x_4\\ {\mathcal {F}}_3&= (u_{2,3}+1)x_2(u_{5,3}+1)x_5+(u_{2,3}+1)x_2+1\\ {\mathcal {F}}_4&= (u_{3,4}+1)x_3\\ {\mathcal {F}}_5&= 1+(u_{4,5}+1)x_4 \end{aligned} \end{aligned}$$

(24)

and node controls as

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= (u_1^-+u_1^++1)(1+x_3+x_5+x_3x_5)+u_1^+\\ {\mathcal {F}}_2&= (u_2^-+u_2^++1)(1+x_1+x_1x_4)+u_2^+\\ {\mathcal {F}}_3&= (u_3^-+u_3^++1)(x_2x_5+x_2+1)+u_3^+\\ {\mathcal {F}}_4&= (u_4^-+u_4^++1)x_3+u_4^+\\ {\mathcal {F}}_5&= (u_5^-+u_5^++1)(1+x_4)+u_5^+. \end{aligned} \end{aligned}$$

(25)
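Before any control is applied, the uncontrolled dynamics quoted above can be checked by brute force. The sketch below assumes a synchronous update of the five regulatory functions and enumerates all 32 states; it is an exhaustive check rather than the Gröbner basis computation, and the helper names are illustrative.

```python
from itertools import product

def F(x):
    x1, x2, x3, x4, x5 = x
    return (
        (1 - x3) & (1 - x5),  # f1 = (~x3) AND (~x5)
        (1 - x1) | x4,        # f2 = (~x1) OR x4
        (1 - x2) | x5,        # f3 = (~x2) OR x5
        x3,                   # f4 = x3
        1 - x4,               # f5 = ~x4
    )

def attractors(update, n):
    """Collect the cycle reached from every initial state."""
    found = set()
    for s in product([0, 1], repeat=n):
        seen = []
        while s not in seen:
            seen.append(s)
            s = update(s)
        found.add(frozenset(seen[seen.index(s):]))
    return found

def label(s):
    return "".join(map(str, s))

atts = attractors(F, 5)
cycles = {frozenset(label(s) for s in a) for a in atts}
fixed_points = [s for s in product([0, 1], repeat=5) if F(s) == s]

for c in cycles:
    print(sorted(c))
```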

Let’s consider the objective of generating new attractors, and assume we want our steady state to be $y=11110$. In general, one can search the entire system for controls, but there may be special cases where limiting decisions can be made amongst collaborators. For argument's sake, suppose we want to find edge knockouts and limit our search to edges $x_3\rightarrow x_1$, $x_5\rightarrow x_1$, and $x_2\rightarrow x_3$. Then the updated edge equations (Eq. 24) become

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= 1+(u_{3,1}+1)x_3+(u_{5,1}+1)x_5+(u_{3,1}+1)x_3(u_{5,1}+1)x_5\\ {\mathcal {F}}_2&= 1+x_1+x_1x_4\\ {\mathcal {F}}_3&= (u_{2,3}+1)x_2x_5+(u_{2,3}+1)x_2+1\\ {\mathcal {F}}_4&= x_3\\ {\mathcal {F}}_5&= 1+x_4. \end{aligned} \end{aligned}$$

(26)

Evaluating at $y=11110$ yields

$$\begin{aligned} {\mathcal {F}}_1=u_{3,1},\quad {\mathcal {F}}_2=1,\quad {\mathcal {F}}_3= u_{2,3},\quad {\mathcal {F}}_4=1,\quad {\mathcal {F}}_5=0. \end{aligned}$$

Therefore, the desired fixed point is achieved if and only if $u_{3,1}=u_{2,3}=1$. That is, the controls $u_{3,1}$ and $u_{2,3}$ are both active, so we must delete both corresponding edges. Similarly, we can determine node controls that achieve the new fixed point $y=11110$. Again, for simplicity, we limit ourselves to $x_1$ knock-in, $x_3$ knock-out and knock-in, and $x_4$ knock-in. The updated node equations (Eq. 25) then become

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= (u_1^++1)(1+x_3+x_5+x_3x_5)+u_1^+\\ {\mathcal {F}}_2&= 1+x_1+x_1x_4\\ {\mathcal {F}}_3&= (u_3^-+u_3^++1)(x_2x_5+x_2+1)+u_3^+\\ {\mathcal {F}}_4&= (u_4^++1)x_3+u_4^+\\ {\mathcal {F}}_5&= 1+x_4. \end{aligned} \end{aligned}$$

(27)

Evaluating at $y=11110$ yields

$$\begin{aligned} {\mathcal {F}}_1=u_1^+,\quad {\mathcal {F}}_2=1,\quad {\mathcal {F}}_3=u_3^+,\quad {\mathcal {F}}_4=1,\quad {\mathcal {F}}_5=0. \end{aligned}$$

Thus, the desired fixed point is achieved if and only if $u_1^+=1$ and $u_3^+=1$. Importantly, this means that the controls are insufficient individually, but together they achieve the desired goal. One can easily see that requiring numerous controls in much larger systems may not be biologically feasible, which is why alternate objectives can prove useful.
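Both evaluations at $y=11110$ can be verified by substituting every control assignment into Eq. 26 and Eq. 27 and keeping those for which $y$ is a fixed point. The sketch below is an exhaustive check over ${\mathbb {F}}_2$, not the computational algebra routine itself; the function and argument names (for example `u3m` for $u_3^-$ and `u3p` for $u_3^+$) are illustrative.

```python
from itertools import product

def edge_controlled(x, u31, u51, u23):
    """Eq. 26 over F_2; a control equal to 1 deletes the corresponding edge."""
    x1, x2, x3, x4, x5 = x
    a, b, c = (u31 + 1) * x3, (u51 + 1) * x5, (u23 + 1) * x2
    return tuple(v % 2 for v in (
        1 + a + b + a * b,
        1 + x1 + x1 * x4,
        c * x5 + c + 1,
        x3,
        1 + x4,
    ))

def node_controlled(x, u1p, u3m, u3p, u4p):
    """Eq. 27 over F_2; '+' controls are knock-ins, '-' controls are knock-outs."""
    x1, x2, x3, x4, x5 = x
    return tuple(v % 2 for v in (
        (u1p + 1) * (1 + x3 + x5 + x3 * x5) + u1p,
        1 + x1 + x1 * x4,
        (u3m + u3p + 1) * (x2 * x5 + x2 + 1) + u3p,
        (u4p + 1) * x3 + u4p,
        1 + x4,
    ))

y = (1, 1, 1, 1, 0)
edge_solutions = [u for u in product([0, 1], repeat=3)
                  if edge_controlled(y, *u) == y]
node_solutions = [u for u in product([0, 1], repeat=4)
                  if node_controlled(y, *u) == y]

print("edge controls (u31, u51, u23):", edge_solutions)
print("node controls (u1+, u3-, u3+, u4+):", node_solutions)
```

In every solution the required controls are active together ($u_{3,1}=u_{2,3}=1$ for edges, $u_1^+=u_3^+=1$ for nodes), while the remaining parameters are free, which matches the combinatorial conclusion in the text.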

Suppose we determine that $y=01111$ is in a diseased attractor which we want to destroy. We can then aim to block the transition from y to $F(y)=01110$. We limit ourselves to considering edges from $x_3\rightarrow x_1$, $x_5\rightarrow x_1$, $x_3\rightarrow x_4$, and $x_4\rightarrow x_5$. The updated edge equations (Eq. 24) become

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= 1+(u_{3,1}+1)x_3+(u_{5,1}+1)x_5+(u_{3,1}+1)x_3(u_{5,1}+1)x_5\\ {\mathcal {F}}_2&= 1+x_1+x_1x_4\\ {\mathcal {F}}_3&= x_2x_5+x_2+1\\ {\mathcal {F}}_4&= (u_{3,4}+1)x_3\\ {\mathcal {F}}_5&= 1+(u_{4,5}+1)x_4. \end{aligned} \end{aligned}$$

(28)

Evaluating at $y=01111$ yields

$$\begin{aligned} {\mathcal {F}}_1=u_{3,1}u_{5,1},\quad {\mathcal {F}}_2=1,\quad {\mathcal {F}}_3=1,\quad {\mathcal {F}}_4=u_{3,4}+1,\quad {\mathcal {F}}_5=u_{4,5}. \end{aligned}$$

This means that Eq. 2 becomes

$$\begin{aligned} (u_{3,1}u_{5,1}+1)(u_{3,4})(u_{4,5}+1)=0 \end{aligned}$$

giving three possible solutions: $u_{3,1}=u_{5,1}=1$, $u_{3,4}=0$, or $u_{4,5}=1$. Notice that we again have a combinatorial solution in $u_{3,1},u_{5,1}$ since they are insufficient individually but successful together, $u_{3,4}=0$ means that the control is inactive, and $u_{4,5}$ is a singleton control.

Lastly, consider the objective of region blocking. Suppose we want to avoid regions where $x_3=0$, and we will limit ourselves to nodes $x_2$ knock-out, $x_3$ knock-in, and $x_4$ knock-in. Then the updated node equations (Eq. 25) become

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= 1+x_3+x_5+x_3x_5\\ {\mathcal {F}}_2&= (u_2^-+1)(1+x_1+x_1x_4)\\ {\mathcal {F}}_3&= (u_3^++1)(x_2x_5+x_2+1)+u_3^+\\ {\mathcal {F}}_4&= (u_4^++1)x_3+u_4^+\\ {\mathcal {F}}_5&= 1+x_4. \end{aligned} \end{aligned}$$

(29)

Next, we see that Eq. 3 yields

$$\begin{aligned} \begin{aligned} 0&= 1+x_3+x_5+x_3x_5 +x_1\\ 0&= (u_2^-+1)(1+x_1+x_1x_4) +x_2\\ 0&= (u_3^++1)(x_2x_5+x_2+1)+u_3^+ +x_3\\ 0&= (u_4^++1)x_3+u_4^+ +x_4\\ 0&= 1+x_4 +x_5\\ 0&=x_3 \end{aligned} \end{aligned}$$

(30)

Using computational algebra tools to compute the Gröbner basis of the ideal associated with the above equations, we obtain:

$$\begin{aligned} I=\langle x_1+1, u_2^-,x_2+1,u^+_3,x_3,u^+_4+1,x_4+1,x_5 \rangle . \end{aligned}$$

This means the original system has the same solutions as the following system.

$$\begin{aligned} x_1+1&=0&u_2^-&=0&x_2+1&=0&u_3^+&=0 \\ x_3&=0&u_4^++1&=0&x_4+1&=0&x_5&=0 \end{aligned}$$

Recall that our goal is to block the region $x_3=0$ by finding parameters that guarantee the above system has no solutions. Utilizing the equations that only contain control parameters, we have $u_2^-=0$, $u_3^+=0$, and $u_4^++1=0$. Thus, if we allow either $u_2^-=1$, $u_3^+=1$, or $u_4^+=0$, then our system will have no solution, as needed. Since $x_3$ is the limiting criterion and $u_4^+=0$ is an inactive control, that leaves $u_2^-=1$ as the desired target. As one can see, the computational algebra method is quite versatile (Plaugher 2022).
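Because the system in Eq. 30 has only eight binary unknowns, the Gröbner basis result can be cross-checked by exhaustive search. The sketch below is such a brute-force check (it does not compute a Gröbner basis); the names `residuals`, `u2m`, `u3p`, and `u4p` are illustrative.

```python
from itertools import product

def residuals(x1, x2, x3, x4, x5, u2m, u3p, u4p):
    """Right-hand sides of Eq. 30, reduced mod 2; all must vanish at a solution."""
    return [v % 2 for v in (
        1 + x3 + x5 + x3 * x5 + x1,
        (u2m + 1) * (1 + x1 + x1 * x4) + x2,
        (u3p + 1) * (x2 * x5 + x2 + 1) + u3p + x3,
        (u4p + 1) * x3 + u4p + x4,
        1 + x4 + x5,
        x3,
    )]

solutions = [v for v in product([0, 1], repeat=8) if not any(residuals(*v))]

# Control assignments (u2-, u3+, u4+) for which a fixed point with x3 = 0 exists.
unblocked = {v[5:] for v in solutions}

print("solutions (x1..x5, u2-, u3+, u4+):", solutions)
print("control assignments that fail to block x3 = 0:", unblocked)
```

The search finds a single solution, which agrees with the ideal above: any control assignment other than $(u_2^-,u_3^+,u_4^+)=(0,0,1)$ leaves the system without solutions.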

7.2.2 Control Kernel

Consider the network in Fig. 14. Steady state analysis reveals two fixed points: 000100 and 111011. Suppose our control objective is $x_4=0$, which corresponds to the second fixed point. We first notice that there are no input nodes, which means we move on to distinguishing nodes. Then the CK method (correctly) indicates that $x_1=1$ will direct the system into the desired fixed point. Admittedly, while the CK method is straightforward, the documentation for the software used to implement the search can be difficult to navigate (Plaugher 2022).

7.2.3 Feedback Vertex Set

Figure 15 contains a simple example of identifying an FVS. The input node ($x_1$) is always in the control set, while the only other node required is one of those in the 3-cycle. As seen in the figure, Fig. 15a is the example wiring diagram and Fig. 15b–d show the three possible FVSs. One can easily see that the strategy for FVS is quite simple, yet it can produce larger control sets than necessary. Further, we may not obtain all FVSs if the system has many attractors (Plaugher 2022).
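The search described above can be sketched by brute force for small graphs. The Python snippet below is only a sketch and not the algorithm implemented in the FVS software listed in Sect. 7.4. The edge list is a hypothetical stand-in for the wiring diagram of Fig. 15a (an input node $x_1$ feeding a 3-cycle on $x_2$, $x_3$, $x_4$), and the function names are introduced here for illustration.

```python
from itertools import combinations

# Hypothetical wiring diagram: x1 is an input node, x2 -> x3 -> x4 -> x2 is a 3-cycle.
edges = {"x1": ["x2"], "x2": ["x3"], "x3": ["x4"], "x4": ["x2"]}

def is_acyclic(nodes, edges):
    """Kahn's algorithm restricted to the subgraph induced by `nodes`."""
    indeg = {n: 0 for n in nodes}
    for s in nodes:
        for t in edges.get(s, []):
            if t in indeg:
                indeg[t] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while queue:
        n = queue.pop()
        visited += 1
        for t in edges.get(n, []):
            if t in indeg:
                indeg[t] -= 1
                if indeg[t] == 0:
                    queue.append(t)
    return visited == len(nodes)

def minimal_fvs_control_sets(edges):
    """Input nodes plus every minimum-size set whose removal breaks all cycles."""
    targets = {t for ts in edges.values() for t in ts}
    nodes = sorted(set(edges) | targets)
    sources = [n for n in nodes if n not in targets]
    rest = [n for n in nodes if n in targets]
    for k in range(len(rest) + 1):
        found = [
            sorted(sources + list(c))
            for c in combinations(rest, k)
            if is_acyclic([n for n in nodes if n not in c], edges)
        ]
        if found:
            return found
    return []

print(minimal_fvs_control_sets(edges))
```

For the assumed graph this yields three control sets, each containing $x_1$ and one node of the 3-cycle. Exhaustive search scales exponentially with the number of nodes, so it is suitable only for toy examples.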

7.2.4 Stable Motifs

Consider the example network in Fig. 16a, with the following functions and negated functions.

$$ \begin{aligned} \begin{array}{llll} f_1&{}= x_2 | x_3 &{}\qquad \qquad \qquad \sim f_1 = (\sim x_2) \& (\sim x_3)\\ f_2&{}= x_1 \& (\sim x_3) &{}\qquad \qquad \qquad \sim f_2= (\sim x_1) | x_3\\ f_3&{}= (\sim x_1) | (\sim x_2) &{}\qquad \qquad \qquad \sim f_3= x_1 \& x_2 \end{array} \end{aligned}$$

Using the aforementioned steps, the expanded graph obtained is shown in Fig. 16b. Notice there are two stable motifs (circled in orange and green), which indicate a fixed point (110) and a partial fixed point (X01). To find the rest of the partial fixed point, substitute the known values into the original functions. Therefore,

$$\begin{aligned} f_1= x_2 | x_3 = 0 | 1 = 1 \end{aligned}$$

which gives 101 as the second fixed point. Since the control sets are subsets of the stable motifs, we have $\{x_2=1, x_3=0\}$ or $\{x_1=1, x_3=0\}$ for fixed point 110, and $\{x_2=0\}$ or $\{x_3=1\}$ for fixed point 101 (Plaugher 2022).
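The fixed points and control sets listed above can be checked directly from the three update functions. The following Python sketch assumes synchronous updates and permanent node overrides, both of which are modeling choices made here for illustration; the helper names `update`, `pin`, and `reachable_end_states` are likewise hypothetical. Nodes are indexed from 0, so index 0 is $x_1$.

```python
from itertools import product

def update(state):
    """Synchronous update with f1 = x2 | x3, f2 = x1 & ~x3, f3 = ~x1 | ~x2."""
    x1, x2, x3 = state
    return (x2 | x3, x1 & (1 - x3), (1 - x1) | (1 - x2))

# Fixed points of the uncontrolled network.
fixed_points = [s for s in product((0, 1), repeat=3) if update(s) == s]

def pin(state, control):
    """Overwrite controlled nodes; `control` maps 0-based index -> value."""
    return tuple(control.get(i, v) for i, v in enumerate(state))

def reachable_end_states(control, steps=10):
    """States reached after `steps` controlled updates from every initial state."""
    ends = set()
    for s in product((0, 1), repeat=3):
        s = pin(s, control)
        for _ in range(steps):
            s = pin(update(s), control)
        ends.add(s)
    return ends

print(fixed_points)
for control in ({1: 1, 2: 0}, {0: 1, 2: 0}, {1: 0}, {2: 1}):
    print(control, reachable_end_states(control))
```

In this sketch each of the four control sets drives every initial state to the fixed point it is paired with in the text.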

7.3 Simulating Target Efficacy

To determine the efficacy of controls, we compare uncontrolled simulations with the appropriate target control simulations. A good control will produce low disease levels and high health levels (Plaugher 2022). We can do so by utilizing a stochastic simulator based on SDDS (Murrugarra and Aguilar 2018; Plaugher and Murrugarra 2021; Plaugher et al. 2022; Plaugher 2022), which requires several inputs before it can begin. The number of input variables in each Boolean function is given by the vector nv. Next, we need the variables for each gene in the form of an $m\times n$ matrix called varF, where $m$ is the maximum number of inputs, $n$ is the number of genes, and information is stored column-wise. The number of variables will vary between functions. Since only the first nv(i) elements of the $i$th column are relevant, all remaining entries are set to $-1$. We then construct the truth table F in compact form with size $2^m \times n$. Again, the number of relevant entries in each column $i$ will vary: only the first $2^{nv(i)}$ entries are relevant, so all remaining entries are set to $-1$. It is vitally important to maintain numerical ordering, which is why the columns of F are stored as lexicographically ordered binary arrays (Veliz-Cuba et al. 2022).
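As a concrete illustration of these inputs, the sketch below builds nv, varF, and F for the uncontrolled functions of Eq. 29 (all control parameters set to zero). It is a minimal Python sketch rather than the simulator's own code: the `rules` structure and the function `build_tables` are hypothetical, gene indices are 1-based, and the truth-table rows are assumed to run from $00\ldots0$ to $11\ldots1$ with the first listed input as the most significant bit. The ordering convention of the actual software should be checked against its documentation.

```python
from itertools import product

# One entry per gene: (1-based indices of its inputs, rule on those inputs).
rules = [
    ([3, 5], lambda x3, x5: (1 + x3 + x5 + x3 * x5) % 2),
    ([1, 4], lambda x1, x4: (1 + x1 + x1 * x4) % 2),
    ([2, 5], lambda x2, x5: (x2 * x5 + x2 + 1) % 2),
    ([3], lambda x3: x3),
    ([4], lambda x4: (1 + x4) % 2),
]

def build_tables(rules):
    """Return (nv, varF, F) with unused entries padded by -1."""
    n = len(rules)
    nv = [len(inputs) for inputs, _ in rules]
    m = max(nv)
    varF = [[-1] * n for _ in range(m)]   # m x n, column i lists inputs of gene i
    F = [[-1] * n for _ in range(2 ** m)]  # 2^m x n, column i is a truth table
    for i, (inputs, rule) in enumerate(rules):
        for r, gene in enumerate(inputs):
            varF[r][i] = gene
        # Rows in assumed lexicographic order: (0,...,0), (0,...,1), ..., (1,...,1).
        for r, bits in enumerate(product((0, 1), repeat=nv[i])):
            F[r][i] = rule(*bits)
    return nv, varF, F

nv, varF, F = build_tables(rules)
print(nv)
for row in varF:
    print(row)
for row in F:
    print(row)
```

The last two columns show the padding: genes 4 and 5 have a single input, so only the first entry of their varF column and the first two entries of their F column are meaningful.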

We must also establish propensities in the form of a $2\times n$ matrix c that contains values for $p_k^\uparrow $ and $p_k^\downarrow $. The values chosen for propensities may perturb results, as we saw in Fig. 12. In practice, we typically use $p_k^\uparrow = p_k^\downarrow = 0.9$ (i.e. follow the function rules $90\%$ of the time). Finally, we can run simulations using inputs: F, varF, nv, number of states (usually Boolean), c, n, number of steps, and number of random initializations. We have also implemented versions that allow for mutation induction and specified initial states. As a result, we obtain time-course trajectories, and we can use the Markov chain structure of SDDS to analyze features such as time to absorption, stationary distributions, and more.
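The role of the propensities can be illustrated with a minimal SDDS-style update. The Python sketch below is not the simulator referenced above. It assumes the standard SDDS rule: when a function calls for activation, the node switches on with probability $p_k^\uparrow$; when it calls for degradation, the node switches off with probability $p_k^\downarrow$; otherwise the node keeps its value. The network is the three-node example of Sect. 7.2.4, and the names `sdds_step` and `simulate` are hypothetical.

```python
import random

def update(state):
    """Deterministic rules of the three-node example in Sect. 7.2.4."""
    x1, x2, x3 = state
    return (x2 | x3, x1 & (1 - x3), (1 - x1) | (1 - x2))

def sdds_step(state, c, rng):
    """One stochastic update; c[0][k] = p_up and c[1][k] = p_down for node k."""
    target = update(state)
    new_state = []
    for k, (old, tgt) in enumerate(zip(state, target)):
        if tgt > old:
            p = c[0][k]      # activation propensity
        elif tgt < old:
            p = c[1][k]      # degradation propensity
        else:
            p = 0.0          # rule agrees with current value: nothing to do
        new_state.append(tgt if rng.random() < p else old)
    return tuple(new_state)

def simulate(initial, c, steps, seed=0):
    """Return the trajectory [x(0), x(1), ..., x(steps)]."""
    rng = random.Random(seed)
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(sdds_step(trajectory[-1], c, rng))
    return trajectory

c = [[0.9] * 3, [0.9] * 3]   # p_up and p_down, both 0.9 for every node
trajectory = simulate((0, 0, 0), c, steps=100, seed=1)
print(trajectory[:5], trajectory[-1])
```

Setting all propensities to 1 recovers the deterministic synchronous update, while setting them to 0 freezes the state. Averaging many such trajectories over random initial states gives the kind of time-course curves discussed in this section.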

As an example, consider the simple 3-cycle in Fig. 17. This particular system has two fixed points ($\{000\}$ and $\{111\}$) as well as two cyclic attractors ($\{001, 100, 010\}$ and $\{011, 101, 110\}$). Simulations were conducted using the variables in Table 6, with 1000 random initializations, 100 time steps (function updates), and 1$\%$ injected noise. The overall state space is shown in Fig. 18. In Fig. 19a, the uncontrolled simulation shows the oscillatory nature of the cyclic attractors. However, Fig. 19b, c show that inducing control on $x_1$ is enough to drive the system to one fixed point or the other. Therefore, the SDDS simulator is able to show long-term trajectories and the impact of controls over time.
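The attractor structure of the 3-cycle can be reproduced with a few lines of deterministic code. The sketch below assumes that each node copies its predecessor ($x_1 \leftarrow x_3$, $x_2 \leftarrow x_1$, $x_3 \leftarrow x_2$) under synchronous updates. The direction of the cycle is an assumption about Fig. 17, although reversing it gives the same attractor sets. Noise is not modeled here, and the function names are hypothetical.

```python
from itertools import product

def step(state, control=None):
    """Synchronous update of the assumed 3-cycle, with optional pinned nodes."""
    x1, x2, x3 = state
    nxt = [x3, x1, x2]
    for i, v in (control or {}).items():   # 0-based index -> forced value
        nxt[i] = v
    return tuple(nxt)

def attractors(control=None):
    """Follow every initial state until it repeats and collect the cycles."""
    found = set()
    for s in product((0, 1), repeat=3):
        seen = []
        while s not in seen:
            seen.append(s)
            s = step(s, control)
        found.add(frozenset(seen[seen.index(s):]))
    return found

print(attractors())            # uncontrolled: two fixed points, two 3-cycles
print(attractors({0: 1}))      # x1 held at 1
print(attractors({0: 0}))      # x1 held at 0
```

With $x_1$ held at 1 the only remaining attractor is 111, and with $x_1$ held at 0 it is 000, which matches the controlled behavior described above.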

Table 6 Variable tables for simple 3-cycle simulations in Fig. 17 (Plaugher 2022)


7.4 Software

Cumulative files for all control techniques and examples, as well as “how-to” documentation (Plaugher 2022)
- https://github.com/drplaugher/SMATA_pipeline
CA: used to find fixed points, controls, and run simulations (Plaugher and Murrugarra 2021; Plaugher et al. 2022; Grayson and Stillman 2002)
- use the example files above
- see also, https://github.com/drplaugher/PCC_Mutations
CK: used to find control kernels (Borriello and Daniels 2021)
- https://doi.org/10.5281/zenodo.5172898
FVS: used to find FVSs (Mochizuki et al. 2013; Zañudo et al. 2017)
- https://github.com/jgtz/FVS_python3
Modularity: used to find strongly connected components (modules) (Kadelka et al. 2022)
- use the example files above
SM: used to find stable motifs and dynamic attractors (Zañudo and Albert 2015, 2013)
- https://github.com/jgtz/StableMotifs
- https://github.com/jcrozum/pystablemotifs

7.5 Appendix Tables

See Tables 7 and 8.

Table 7 Small T-LGL rules


Table 8 Functions for large T-LGL model


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Plaugher, D., Murrugarra, D. Phenotype Control techniques for Boolean gene regulatory networks. Bull Math Biol 85, 89 (2023). https://doi.org/10.1007/s11538-023-01197-6


Received: 19 April 2023
Accepted: 11 August 2023
Published: 30 August 2023
DOI: https://doi.org/10.1007/s11538-023-01197-6
