Phenotype Control techniques for Boolean gene regulatory networks

  • Review
Bulletin of Mathematical Biology

Abstract

Modeling cell signal transduction pathways via Boolean networks (BNs) has become an established method for analyzing intracellular communications over the last few decades. What’s more, BNs provide a coarse-grained approach, not only to understanding molecular communications, but also to targeting pathway components that alter the long-term outcomes of the system. This has come to be known as phenotype control theory. In this review we study the interplay of various approaches for controlling gene regulatory networks, such as algebraic methods, control kernel, feedback vertex set, and stable motifs. The study also includes a comparative discussion of the methods, using an established cancer model of T-Cell Large Granular Lymphocyte Leukemia. Further, we explore possible options for making the control search more efficient using reduction and modularity. Finally, we discuss challenges such as the complexity of these control techniques and the availability of software for implementing each of them.

References

  • Aguilar B, Gibbs DL, Reiss DJ, McConnell M, Danziger SA, Dervan A, Trotter M, Bassett D, Hershberg R, Ratushny AV, Shmulevich I (2020) A generalizable data-driven multicellular model of pancreatic ductal adenocarcinoma. Gigascience 9(7):07

  • Aguilar B, Fang P, Laubenbacher R, Murrugarra D (2020) A near-optimal control method for stochastic Boolean networks. Lett Biomath 7(1):67

  • Akutsu T, Hayashida M, Ching W-K, Michael KN (2007) Control of Boolean networks: hardness results and algorithms for tree structured networks. J Theor Biol 244(4):670–679

  • Arkin A, Ross J, McAdams HH (1998) Stochastic kinetic analysis of developmental pathway bifurcation in phage \(\lambda \)-infected Escherichia coli cells. Genetics 149(4):1633–1648

  • Baker RE, Pena J-M, Jayamohan J, Jérusalem A (2018) Mechanistic models versus machine learning, a fight worth fighting for the biological community? Biol Lett 14(5):20170660

  • Bender EA, Williamson SG (2010) Lists, decisions and graphs. S. Gill Williamson

  • Bertsekas D (2019) Reinforcement learning and optimal control. Athena Scientific, Nashua

  • Borriello E, Daniels BC (2021) The basis of easy controllability in Boolean networks. Nat Commun 12(1)

  • Cheng D, Qi H, Li Z, Liu JB (2011) Stability and stabilization of Boolean networks. Int J Robust Nonlinear Control 21(2):134–156

  • Choo S-M, Ban B, Joo JI, Cho K-H (2018) The phenotype control kernel of a biomolecular regulatory network. BMC Syst Biol 12(1):49

  • Cifuentes-Fontanals L, Tonello E, Siebert H (2022) Control in Boolean networks with model checking. Front Appl Math Stat 8

  • Cifuentes-Fontanals L, Tonello E, Siebert H (2022) Node and edge control strategy identification via trap spaces in Boolean networks

  • Creative Proteomics (2018) Brief introduction of post-translational modifications (PTMS). Creative Proteomics Blog

  • Didier G, Remy E, Chaouiya C (2011) Mapping multivalued onto Boolean dynamics. J Theor Biol 270(1):177–184

  • Erkan M, Reiser-Erkan C, Michalski C, Kleeff J (2010) Tumor microenvironment and progression of pancreatic cancer. Exp Oncol 32:128–31

  • Farrow B, Albo D, Berger DH (2008) The role of the tumor microenvironment in the progression of pancreatic cancer. J Surg Res 149(2):319–328

  • Feig C, Gopinathan A, Neesse A, Chan DS, Cook N, Tuveson DA (2012) The pancreas cancer microenvironment. Clin Cancer Res 18(16):4266–4276

  • Festa P, Pardalos P, Resende M (1999) Feedback set problems. Encyclopedia of optimization 2

  • Fiedler B, Mochizuki A, Kurosawa G, Saito D (2013) Dynamics and control at feedback vertex sets. I: informative and determining nodes in regulatory networks. J Dyn Differ Equ 25(3):563–604

  • Galinier P, Lemamou E, Bouzidi M (2013) Applying local search to the feedback vertex set problem. J Heuristics 19:10

  • Gong C, Milberg O, Wang B, Vicini P, Narwal R, Roskos L, Popel AS (2017) A computational multiscale agent-based model for simulating spatio-temporal tumour immune response to pd1 and pdl1 inhibition. J R Soc Interface 14(134):20170320

  • Gore J, Korc M (2014) Pancreatic cancer stroma: friend or foe? Cancer Cell 25:711–712

  • Grayson DR, Stillman ME (2002) Macaulay2, a software system for research in algebraic geometry. http://www.math.uiuc.edu/Macaulay2/

  • Heinz S, Urszula L (2016) Optimal control for mathematical models of cancer therapies: an application of geometric methods, vol 42. Springer, New York

  • Hinkelmann F, Brandon M, Guang B, McNeill R, Blekherman G, Veliz-Cuba A, Laubenbacher R (2011) ADAM: analysis of discrete models of biological systems using computer algebra. BMC Bioinform 12:295

  • Johnson K, Plaugher D, Murrugarra D (2023) Investigating the effect of changes in model parameters on optimal control policies, time to absorption, and mixing times

  • Kadelka C, Laubenbacher R, Murrugarra D, Veliz-Cuba A, Matthew W (2022) Decomposition of Boolean networks: an approach to modularity of biological systems

  • Kauffman SA (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 22(3):437–467

  • Kleeff J, Beckhove P, Esposito I, Herzig S, Huber PE, Matthias Löhr J, Friess H (2007) Pancreatic cancer microenvironment. Int J Cancer 121(4):699–705

  • Lenhart S, Workman JT (2007) Optimal control applied to biological models, 1st edn. Chapman Hall/CRC, Boca Raton

  • Loughran TP (2006) Large granular lymphocytic leukemia. Leukemia and Lymphoma Society

  • Macklin P (2019) Key challenges facing data-driven multicellular systems biology. Gigascience 8(10):giz127

  • Mochizuki A, Fiedler B, Kurosawa G, Saito D (2013) Dynamics and control at feedback vertex sets. II: a faithful monitor to determine the diversity of molecular activities in regulatory networks. J Theor Biol 335:130–146

  • Moore H (2018) How to mathematically optimize drug regimens using optimal control. J Pharmacokinet Pharmacodyn 45(1):127–137

  • Motter AE (2015) Networkcontrology. Chaos Interdiscip J Nonlinear Sci 25(9):097621

  • Murrugarra D, Aguilar B (2018) Algebraic and combinatorial computational biology, chapter 5. Academic Press, New York, pp 149–150

  • Murrugarra D, Dimitrova ES (2015) Molecular network control through Boolean canalization. EURASIP J Bioinform Syst Biol 2015(1):9

  • Murrugarra D, Dimitrova E (2021) Quantifying the total effect of edge interventions in discrete multistate networks. Automatica 125:109453

  • Murrugarra D, Veliz-Cuba A, Aguilar B, Arat S, Laubenbacher R (2012) Modeling stochasticity and variability in gene regulatory networks. EURASIP J Bioinf Syst Biol 2012(1):5

  • Murrugarra D, Veliz-Cuba A, Aguilar B, Laubenbacher R (2016) Identification of control targets in Boolean molecular network models via computational algebra. BMC Syst Biol 10(1):94

  • Murrugarra D, Miller J, Mueller AN (2016) Estimating propensity parameters using google PageRank and genetic algorithms. Front Neurosci 10:513

  • Padoan A, Plebani M, Basso D (2019) Inflammation and pancreatic cancer: focus on metabolism, cytokines, and immunity. Int J Mol Sci 20:676

  • Plaugher D (2022) An integrated computational pipeline to construct patient-specific cancer models

  • Plaugher D, Aguilar B, Murrugarra D (2022) Uncovering potential interventions for pancreatic cancer patients via mathematical modeling. J Theor Biol 548:111197

  • Plaugher D, Murrugarra D (2021) Modeling the pancreatic cancer microenvironment in search of control targets. Bull Math Biol 83

  • Rozum J, Albert R (2022) Leveraging network structure in nonlinear control. NPJ Syst Biol Appl 8(1):36

  • Saadatpour A, Albert I, Albert R (2010) Attractor analysis of asynchronous Boolean models of signal transduction networks. J Theor Biol 266(4):641–56

  • Saadatpour A, Wang R-S, Liao A, Liu X, Loughran TP, Albert I, Albert R (2011) Dynamical and structural analysis of a T cell survival network identifies novel candidate therapeutic targets for large granular lymphocyte leukemia. PLoS Comput Biol 7(11):e1002267

  • Saadatpour A, Albert R, Reluga T (2013) A reduction method for Boolean network models proven to conserve attractors. SIAM J Appl Dyn Syst 12:1997–2011

  • Shmulevich I, Dougherty ER, Kim S, Zhang W (2002) Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 18(2):261–274

  • Shmulevich I, Dougherty ER (2010) Probabilistic Boolean networks: the modeling and control of gene regulatory networks. SIAM

  • Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge

  • Taylor BP, Dushoff J, Weitz JS (2016) Stochasticity and the limits to confidence when estimating r0 of Ebola and other emerging infectious diseases. J Theor Biol 408:145–154

  • Thomas R (1973) Boolean formalization of genetic control circuits. J Theor Biol 42(3):563–585

  • Veliz-Cuba A (2011) Reduction of Boolean network models. J Theor Biol 289:167–172

  • Veliz-Cuba A, Aguilar B, Hinkelmann F, Laubenbacher R (2014) Steady state analysis of Boolean molecular network models via model reduction and computational algebra. BMC Bioinform 15:221

  • Veliz-Cuba A, Voss SR, Murrugarra D (2022) Building model prototypes from time-course data. Lett Biomath 9(1):107–120

  • Vieira LS, Laubenbacher RC, Murrugarra D (2020) Control of intracellular molecular networks using algebraic methods. Bull Math Biol 82(1):1–22

  • Waddington CH (1957) The strategy of the genes: a discussion of some aspects of theoretical biology. Allen & Unwin, London

  • Yang J-M, Lee C-K, Cho K-H (2020) Stabilizing control of complex biological networks based on attractor-specific network reduction. IEEE Trans Control Netw Syst 8(2):928–939

  • Yang J-M, Lee C-K, Cho K-H (2021) Stabilizing control of complex biological networks based on attractor-specific network reduction. IEEE Trans Control Netw Syst 8(2):928–939

  • Yang G, Zañudo JGT, Albert R (2018) Target control in logical models using the domain of influence of nodes. Front Physiol 9

  • Yousefi MR, Datta A, Dougherty ER (2012) Optimal intervention strategies for therapeutic methods with fixed-length duration of drug effectiveness. IEEE Trans Signal Process 60(9):4930–4944

  • Zañudo J, Albert R (2013) An effective network reduction approach to find the dynamical repertoire of discrete dynamic networks. Chaos (Woodbury, NY) 23:025111

  • Zañudo JGT, Albert R (2015) Cell fate reprogramming by control of intracellular network dynamics. PLoS Comput Biol 11(4):e1004193

  • Zañudo JGT, Yang G, Albert R (2017) Structure-based control of complex networks with nonlinear dynamics. Proc Natl Acad Sci USA 114(28):7234–7239

Acknowledgements

The authors would like to thank Reinhard Laubenbacher and Reka Albert for their discussions and suggestions during the initial stage of this project. Further, D.P. was supported by the NIH Training Grant T32CA165990. D.M. was partially supported by a Collaboration grant (850896) from the Simons Foundation.

Author information

Corresponding author

Correspondence to Daniel Plaugher.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

7 Appendix

7.1 Finite Dynamical Systems

For the last few decades, a popular modeling approach for gene regulation has been to implement dynamical systems over finite fields. Here, functions can be interpreted as modeling information processing within cells, which determines cellular behavior. As depicted in Fig. 8, \(\{x_{i_1},\dots ,x_{i_m}\} \) represent the input genes or predictor genes, \(f_i(x_{i_1},...,x_{i_m})\) is the internal update function or predictor rule, and \(x_i\) is the target gene.

Fig. 8 FDS for gene regulation (Plaugher 2022)

First, let \(X=X_1\times X_2\times \dots \times X_n\) be the Cartesian product of finite sets \(X_1,\dots ,X_n\). A local model over X is an n-tuple of coordinate functions \(F=(f_1, f_2,\dots , f_n)\), where \(f_i:X\rightarrow X_i\). Each function \(f_i\) uniquely determines a function

$$\begin{aligned} F_i:X\rightarrow X,\quad F_i: (x_1,\dots ,x_n) \mapsto (x_1,\dots ,f_i(x),\dots ,x_n), \end{aligned}$$

where \(x=(x_1,\dots ,x_n)\). Every local model defines a canonical finite dynamical system (FDS) map, in which all coordinates are updated simultaneously:

$$\begin{aligned} f:X\rightarrow X,\quad f:(x_1,\dots ,x_n)\mapsto (f_1(x),\dots ,f_n(x)). \end{aligned}$$

Note that discrete does not necessarily imply finite. Take the natural numbers \({\mathbb {N}}=\{1, 2, 3, 4, \dots \}\), for example. The set is clearly discrete, yet its cardinality is infinite. In general, we cannot always write a function as a tuple if the space is simply “discrete”. In order to provide structure to each \(X_i\), we embed \(X_i\) into a finite field \({\mathbb {F}}\) where, for some prime p and integer \(k\ge 1\),

$$\begin{aligned} X_i\hookrightarrow {\mathbb {F}},\quad |{\mathbb {F}}|=p^k. \end{aligned}$$

For example, if we desire states of Low, Medium, and High to represent levels of gene expression, then \(X_i=\{L,M,H\} \hookrightarrow {\mathbb {F}}_3=\{0,1,2\}\). We call these mixed-state models when states are non-binary. For the case when all states are binary (i.e. ON or OFF, HIGH or LOW, 1 or 0), we call these models Boolean networks (Plaugher 2022).

7.1.1 Boolean Networks

Boolean networks (BNs) are popular because we can build effective models without the use of constants or rates. This eliminates the need for tedious parameter discovery. Rather, BNs focus on the mechanics and logic of the system. BN models were originally introduced by Kauffman and Thomas to provide a coarse-grained description of gene regulatory networks (Kauffman 1969; Thomas 1973). Within a BN there are three main components: structure (wiring diagram), functions (regulatory rules), and dynamics (attractors). As we begin to define our terms, it may be helpful to keep Fig. 9 in mind as a basic example. Given n binary variables, define a Boolean network as an n-tuple of coordinate functions

$$\begin{aligned} F=(f_1,\dots ,f_n): \{0,1\}^n\rightarrow \{0,1\}^n, \quad f_i:\{0,1\}^n\rightarrow \{0,1\}. \end{aligned}$$

The wiring diagram of F, call it W, is then defined as a directed graph with n nodes \(\{x_1, x_2,\dots , x_n\}\) such that there is an edge in W from \(x_j\) to \(x_i\) if \(f_i\) depends on \(x_j\). That is,

$$\begin{aligned} x_j\rightarrow x_i \quad \text {if} \quad f_i=f(x_{i_1},\dots ,x_{i_j},\dots ,x_{i_k}) \end{aligned}$$

Within W we denote positive edges as \(x_j\rightarrow x_i\) and negative edges as \(x_j\dashv x_i\) (or sometimes \(x_j\multimap x_i\)). Biologically, a positive edge is representative of activation while a negative edge represents inhibition. For example, in Fig. 9 we see the wiring diagram of \(F=(f_1, f_2)=(x_2,x_1)\).

Fig. 9 Simple Boolean network (Plaugher 2022)

Now that we have structure and functions, the dynamics of F are traditionally described as: (1) trajectories for all \(2^n\) possible initial conditions, or (2) a directed graph with nodes in \({\mathbb {F}}^n_2=\{0,1\}^n\). In the first case, a trajectory is a sequence \((x(t))_{t=0}^\infty \) given by the difference equations \(x(t+1)=F(x(t))\) for all \(t\ge 0\) (Kadelka et al. 2022). For example, Fig. 9 would yield deterministic trajectories

$$\begin{aligned} T_1&=(00, 00, 00,\dots )\\ T_2&=(11, 11, 11,\dots )\\ T_3&=(01, 10, 01, 10,\dots )\\ T_4&=(10, 01, 10, 01,\dots ). \end{aligned}$$
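The four trajectories above can be reproduced with a few lines of code. The following is only a minimal sketch for the two-node network \(F=(x_2,x_1)\) of Fig. 9; the helper names `F` and `trajectory` are illustrative and do not come from any software cited in this review.

```python
from itertools import product


def F(state):
    """Synchronous update of the two-node network F = (f1, f2) = (x2, x1)."""
    x1, x2 = state
    return (x2, x1)


def trajectory(start, steps):
    """Return the first `steps` states of the trajectory x(t+1) = F(x(t))."""
    states = [start]
    for _ in range(steps - 1):
        states.append(F(states[-1]))
    return states


def label(state):
    """Write a state tuple such as (0, 1) as the string '01'."""
    return "".join(map(str, state))


# One trajectory per initial condition, 2^n = 4 in total.
for start in product((0, 1), repeat=2):
    print(", ".join(label(s) for s in trajectory(start, 4)))
```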

The phase space (also called state space) of F is the directed graph with vertex set \(S=\{0,1\}^n\) and edge set \(\{(s,F(s)) \mid s\in S\}\). Simply put, in a BN, S is the set of all possible states, and their respective transitions according to the model F form the state space (see Fig. 10). A node \(s\in S\) is called transient if \(F^k(s)\ne s\) for all \(k\ge 1\), periodic (or cyclic) if \(F^k(s)= s\) for some \(k\ge 1\), and a fixed point if \(F(s)= s\). We can also think of the phase space in terms of its strongly connected components (SCCs), where an SCC is said to be terminal if it has no outgoing edges. Thus, a transient state is not in a terminal SCC, the states of a cyclic attractor form a terminal k-cycle (\(k=1\) is a fixed point), and any other terminal SCC is a complex attractor. In other words, we define an attractor as a set of states from which there is no escape as the system evolves, and an attractor with a single state is called a fixed point. Thus, given sufficient time, the dynamics of a BN always end up in a fixed point or a larger (cyclic or complex) attractor.

Fig. 10 Phase space of the network in Fig. 9 (Plaugher 2022)

Table 4 Dynamic truth table for Fig. 9

For example, recall that \(F=(f_1, f_2)=(x_2,x_1)\). To find the dynamics on the corresponding state space \(S=\{00,01,10,11\}\), one can construct the truth table in Table 4 using lexicographic ordering. It is important to point out that we denote the states in order of the variables so that

$$\begin{aligned} s_2 = (0,1) = 01 = (x_1=0,\; x_2=1), \end{aligned}$$

because maintaining order is essential for the correct interpretation of state values. The left columns indicate the possible states of our nodes \(x_1\) and \(x_2\), whereas the right columns indicate their deterministic updates according to the functions \(f_1\) and \(f_2\). Reading off these transitions, we see in Fig. 10 that we have two fixed points and one cycle.
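The body of Table 4 is not reproduced in this text. Its entries follow directly from \(f_1=x_2\) and \(f_2=x_1\), so a reconstruction (with the layout assumed, not taken from the original table) would read:

$$\begin{array}{cc|cc} x_1 & x_2 & f_1 & f_2 \\ \hline 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{array}$$

The rows 00 and 11 map to themselves (the two fixed points), while 01 and 10 map to each other (the cycle).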

Fig. 11 Nonlinear Boolean network (Plaugher 2022)

Table 5 Standard Boolean logical rules

Up to this point we have only discussed linear BNs, but real-world models are almost always highly nonlinear (see Fig. 11). To accommodate these nonlinear regulatory networks, we implement various classes of functions based on three main Boolean logical rules: AND, OR, and NOT. Some use XOR (exclusive OR), but for simplicity it is excluded here. Assume the variables x and y are given in a BN. Then Table 5 summarizes the functionality and notation used for each of the three main rules.
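The body of Table 5 is likewise not reproduced in this text. As an aid, the standard translations of the three rules into polynomials over \({\mathbb {F}}_2\) (a well-known identity, with arithmetic taken modulo 2; the notation of the original table may differ) are:

$$\begin{aligned} x \wedge y&= xy,\\ x \vee y&= x+y+xy,\\ \sim x&= 1+x. \end{aligned}$$

For instance, \((\sim x)\vee y = (1+x)+y+(1+x)y = 1+x+xy\), since \(y+y=0\) modulo 2.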

A common criticism of using discrete models such as BNs for regulatory networks is that deterministic dynamics are artificial. In reality, biological systems do not contain a “central clock”; instead, the concentration levels of gene products change and respond to stimuli on varying time-scales. Thus, the update schedule chosen plays a significant role in the accuracy of the model. Synchronous update schedules produce deterministic dynamics, wherein nodes are all updated simultaneously so that

$$\begin{aligned} x(0)\rightarrow x(1)=F(x(0))\rightarrow x(2)=F(x(1))\rightarrow \cdots . \end{aligned}$$

On the other hand, asynchronous update schedules produce stochastic dynamics, wherein a randomly selected node is updated at each time step so that

$$\begin{aligned} x(0)\rightarrow x(1)=(x_1(0),\dots ,f_i(x(0)),\dots ,x_n(0))\rightarrow \cdots . \end{aligned}$$

Lastly, sequential update schedules are performed asynchronously according to a designated permutation \(\sigma = (\sigma _1,\dots ,\sigma _n)\) of \((1,\dots , n)\). Specifically, if we define \(F_i(x_1,\dots , x_n)=(x_1,\dots ,f_i(x),\dots ,x_n)\), then the update is given by

$$\begin{aligned} F_\sigma (x)=F_{\sigma _n}(F_{\sigma _{n-1}}(\cdots (F_{\sigma _{1}}(x))\cdots )) \end{aligned}$$

according to the order designated by \(\sigma \). This is sometimes done when the ordering of gene updates is known, as some genes may update faster than others. For example, using our simple example in Fig. 9, Fig. 12 shows the varying impacts of these three update schedules.

Fig. 12 State-space dynamical variants according to update schedules (Plaugher 2022)
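To make the three update schedules concrete, the sketch below implements each of them for the network \(F=(x_2,x_1)\) of Fig. 9. It is an illustration under stated assumptions only: the function names are hypothetical, nodes are indexed from 0, and the asynchronous schedule picks the node to update uniformly at random.

```python
import random

# Update rules of the Fig. 9 network: f1 = x2 and f2 = x1 (0-based indices).
RULES = (lambda x: x[1], lambda x: x[0])


def update_node(x, i):
    """Apply F_i: replace coordinate i by f_i(x) and keep the rest."""
    y = list(x)
    y[i] = RULES[i](x)
    return tuple(y)


def sync_step(x):
    """Synchronous schedule: every node is updated from the same state."""
    return tuple(f(x) for f in RULES)


def async_step(x, rng):
    """Asynchronous schedule: one uniformly chosen node is updated."""
    return update_node(x, rng.randrange(len(x)))


def sequential_step(x, sigma):
    """Sequential schedule: apply F_sigma1 first, then F_sigma2, and so on."""
    for i in sigma:
        x = update_node(x, i)
    return x


rng = random.Random(0)
for state in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(state,
          "sync:", sync_step(state),
          "async:", async_step(state, rng),
          "sequential:", sequential_step(state, (0, 1)))
```

Under the sequential order \(\sigma =(1,2)\), the states 01 and 10 no longer form a cycle but fall into the fixed points 11 and 00, respectively, while the fixed points themselves are unchanged under all three schedules.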

We can easily observe from Fig. 12 that fixed points are maintained across all update schedules. However, cycles are not necessarily preserved. As a result, different update schedules lead to different dynamics in the state space, which can produce different attractors (or eliminate attractors) and, in turn, change the targets identified for interventions. This is where the framework of Stochastic Discrete Dynamical Systems (SDDS) is beneficial (Murrugarra and Aguilar 2018; Plaugher and Murrugarra 2021; Plaugher et al. 2022; Plaugher 2022). Developed in Murrugarra and Aguilar (2018), SDDS incorporates Markov chain tools to study long-term dynamics of Boolean networks. SDDS uses parameters based on designated propensities to model node (and pathway) signal activation and deactivation, the latter also referred to as degradation. In essence, SDDS merges the synchronous and asynchronous update schedules described above. One propensity is used when the update positively impacts the node, in the sense that the node increases its value from OFF to ON. Another propensity is used when the update negatively affects the node, in the sense that the node decreases its value from ON to OFF. More precisely, an SDDS of the variables \((x_1, x_2,\dots , x_{n})\) is a collection of n triples

$$\begin{aligned} \hat{F}=\{f_k, p_k^\uparrow , p_k^\downarrow \}_{k=1}^n \end{aligned}$$

where for \(k=1,..., n\),

  • \(f_k:\{0,1\}^n\rightarrow \{0,1\}\) is the update function for \(x_k\)

  • \(p_k^\uparrow \in [0,1]\) is the activation propensity

  • \(p_k^\downarrow \in [0,1]\) is the deactivation propensity

Here, the parameters \(p_k^\uparrow \) and \(p_k^\downarrow \) introduce stochasticity. For example, an activation of \(x_k(t)\) at the next time step (i.e. \(x_k(t)=0\), \(f_k(x_1(t),\dots ,x_n(t))=1\), and \(x_k(t+1)=1\)) occurs with probability \(p_k^\uparrow \). An SDDS can be represented as a Markov chain via its transition matrix, whose entries are the transition probabilities between states of the network. Elements of the transition matrix A are determined as follows: consider the set \({S}=\{0,1\}^n\) consisting of all possible states of the network. Suppose \(x=(x_1,\dots ,x_n)\in {S}\) and \(y=(y_1,\dots ,y_n)\in {S}\). Then, the probability of transitioning from x to y is

$$\begin{aligned} a_{y,x}=\prod _{i=1}^n P(x_i\rightarrow y_i) \end{aligned}$$
(23)

where entries are stored column-wise and

$$\begin{aligned} P(x_i\rightarrow f_i(x))=\left\{ \begin{matrix} p_i^\uparrow , &{}\text {if } x_i<f_i(x)\\ p_i^\downarrow , &{}\text {if } x_i>f_i(x)\\ 1, &{}\text {if } x_i = f_i(x) \end{matrix}\right. \quad \text {and}\quad P(x_i\rightarrow x_i)=\left\{ \begin{matrix} 1-p_i^\uparrow , &{}\text {if } x_i<f_i(x)\\ 1-p_i^\downarrow , &{}\text {if } x_i>f_i(x)\\ 1, &{}\text {if } x_i = f_i(x) \end{matrix}\right. . \end{aligned}$$

It follows that \(P(x_i\rightarrow y_i)=0\) for any \(y_i\notin \{x_i,f_i(x)\}\). Therefore, we obtain \(A=[a_{y,x}]_{x,y\in {S}}\). Note that when all propensities are set to 1, we recover a traditional BN. With this framework, we built a simulator that takes random initial states as inputs and then tracks the trajectory of each node through time. Long-term phenotype expression probabilities can then be estimated, as well as network dynamics with (and without) controls (Plaugher 2022).
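As an illustration of Eq. 23, the sketch below assembles the transition matrix of a small SDDS directly from the definition. It is not the simulator mentioned above; the function name `sdds_transition_matrix`, the nested-list matrix layout, and the propensity value 0.5 in the usage example are all assumptions made for this example.

```python
from itertools import product


def sdds_transition_matrix(funcs, p_up, p_down):
    """Return (states, A) with A[row of y][column of x] = a_{y,x}.

    funcs[i] is the update function f_i; p_up[i] and p_down[i] are the
    activation and deactivation propensities of node i (0-based indices).
    """
    n = len(funcs)
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    A = [[0.0] * len(states) for _ in states]
    for x in states:
        targets = [f(x) for f in funcs]
        for y in states:
            prob = 1.0
            for i in range(n):
                if targets[i] == x[i]:
                    # The rule proposes no change, so x_i stays put.
                    p = 1.0 if y[i] == x[i] else 0.0
                else:
                    # The rule proposes a flip, accepted with propensity p_i.
                    p_i = p_up[i] if x[i] < targets[i] else p_down[i]
                    p = p_i if y[i] == targets[i] else 1.0 - p_i
                prob *= p
            A[index[y]][index[x]] = prob  # entries stored column-wise
    return states, A


# Usage on the Fig. 9 network F = (x2, x1) with all propensities 0.5.
funcs = [lambda x: x[1], lambda x: x[0]]
states, A = sdds_transition_matrix(funcs, [0.5, 0.5], [0.5, 0.5])
for row in A:
    print(row)
```

Each column of A sums to 1, and setting every propensity to 1 yields a 0/1 matrix that encodes the synchronous BN.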

7.2 Elementary Examples for Control Methods

7.2.1 Computational Algebra

Fig. 13 CA example (Plaugher 2022)

Consider the network in Fig. 13, with the following regulatory functions.

$$\begin{aligned} f_1&= (\sim x_3) \wedge (\sim x_5)\\ f_2&= (\sim x_1) \vee x_4\\ f_3&= (\sim x_2)\vee x_5\\ f_4&= x_3\\ f_5&= \sim x_4 \end{aligned}$$

Using Table 5, we rewrite our functions as the following simplified polynomials.

$$\begin{aligned} f_1&= 1+x_3+x_5+x_3x_5\\ f_2&= 1+x_1+x_1x_4\\ f_3&= x_2x_5+x_2+1\\ f_4&= x_3\\ f_5&= 1+x_4\\ \end{aligned}$$
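As a sanity check on this translation, the following sketch compares the Boolean rules with their polynomial forms on all 32 states and then enumerates the synchronous attractors of the network by brute force. Exhaustive enumeration is used here as a stand-in for the computer algebra approach and is only practical for small networks; the function names are illustrative.

```python
from itertools import product


def F_bool(x):
    """Regulatory functions of Fig. 13 written with Boolean operators."""
    x1, x2, x3, x4, x5 = x
    return (int((not x3) and (not x5)),
            int((not x1) or x4),
            int((not x2) or x5),
            x3,
            int(not x4))


def F_poly(x):
    """The same functions as polynomials, evaluated modulo 2."""
    x1, x2, x3, x4, x5 = x
    return ((1 + x3 + x5 + x3 * x5) % 2,
            (1 + x1 + x1 * x4) % 2,
            (x2 * x5 + x2 + 1) % 2,
            x3,
            (1 + x4) % 2)


STATES = list(product((0, 1), repeat=5))


def attractors(F):
    """Follow every state until it repeats and collect the cycles reached."""
    found = set()
    for s in STATES:
        seen = []
        while s not in seen:
            seen.append(s)
            s = F(s)
        found.add(frozenset(seen[seen.index(s):]))
    return found


def label(state):
    return "".join(map(str, state))


# Both encodings must agree on every state.
assert all(F_bool(s) == F_poly(s) for s in STATES)

for cycle in attractors(F_bool):
    print(sorted(label(s) for s in cycle))
```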

We can then find the fixed points of the system by solving \(f_i=x_i\) for \(i=1,\dots , 5\). Equivalently, we can set \(g_i=f_i-x_i\), compute the Gröbner basis of the ideal \(I=\langle g_1,\dots ,g_5\rangle \), and read off the common roots of \(g_i=0\). In any case, the example in Fig. 13 does not contain any fixed points. However, further state space analysis does reveal two attractors: \(\{01011, 01100\}\) and \(\{00101, 01010, 01110, 01111, 10001, 11000\}\). Now, we encode our edge controls as

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= 1+(u_{3,1}+1)x_3+(u_{5,1}+1)x_5+(u_{3,1}+1)x_3(u_{5,1}+1)x_5\\ {\mathcal {F}}_2&= 1+(u_{1,2}+1)x_1+(u_{1,2}+1)x_1(u_{4,2}+1)x_4\\ {\mathcal {F}}_3&= (u_{2,3}+1)x_2(u_{5,3}+1)x_5+(u_{2,3}+1)x_2+1\\ {\mathcal {F}}_4&= (u_{3,4}+1)x_3\\ {\mathcal {F}}_5&= 1+(u_{4,5}+1)x_4 \end{aligned} \end{aligned}$$
(24)

and node controls as

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= (u_1^-+u_1^++1)(1+x_3+x_5+x_3x_5)+u_1^+\\ {\mathcal {F}}_2&= (u_2^-+u_2^++1)(1+x_1+x_1x_4)+u_2^+\\ {\mathcal {F}}_3&= (u_3^-+u_3^++1)(x_2x_5+x_2+1)+u_3^+\\ {\mathcal {F}}_4&= (u_4^-+u_4^++1)x_3+u_4^+\\ {\mathcal {F}}_5&= (u_5^-+u_5^++1)(1+x_4)+u_5^+. \end{aligned} \end{aligned}$$
(25)

Let’s consider the objective of generating new attractors, and assume we want our steady state to be \(y=11110\). In general, one can search the entire system for controls, but there may be special cases where the search can be restricted in consultation with collaborators. For argument’s sake, suppose we want to find edge knockouts and limit our search to edges \(x_3\rightarrow x_1\), \(x_5\rightarrow x_1\), and \(x_2\rightarrow x_3\). Then the updated edge equations (Eq. 24) become

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= 1+(u_{3,1}+1)x_3+(u_{5,1}+1)x_5+(u_{3,1}+1)x_3(u_{5,1}+1)x_5\\ {\mathcal {F}}_2&= 1+x_1+x_1x_4\\ {\mathcal {F}}_3&= (u_{2,3}+1)x_2x_5+(u_{2,3}+1)x_2+1\\ {\mathcal {F}}_4&= x_3\\ {\mathcal {F}}_5&= 1+x_4. \end{aligned} \end{aligned}$$
(26)

Evaluating at \(y=11110\) yields

$$\begin{aligned} {\mathcal {F}}_1=u_{3,1},\quad {\mathcal {F}}_2=1,\quad {\mathcal {F}}_3= u_{2,3},\quad {\mathcal {F}}_4=1,\quad {\mathcal {F}}_5=0. \end{aligned}$$

Therefore, the desired fixed point is achieved if and only if \(u_{3,1}=u_{2,3}=1\). That is, the controls \(u_{3,1}\) and \(u_{2,3}\) are both active, so we must delete both corresponding edges. Similarly, we can determine node controls that achieve the new fixed point \(y=11110\). Again, for simplicity, we limit ourselves to \(x_1\) knock-in, \(x_3\) knock-out and knock-in, and \(x_4\) knock-in. The updated node equations (Eq. 25) then become

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= (u_1^++1)(1+x_3+x_5+x_3x_5)+u_1^+\\ {\mathcal {F}}_2&= 1+x_1+x_1x_4\\ {\mathcal {F}}_3&= (u_3^-+u_3^++1)(x_2x_5+x_2+1)+u_3^+\\ {\mathcal {F}}_4&= (u_4^++1)x_3+u_4^+\\ {\mathcal {F}}_5&= 1+x_4. \end{aligned} \end{aligned}$$
(27)

Evaluating at \(y=11110\) yields

$$\begin{aligned} {\mathcal {F}}_1=u_1^+,\quad {\mathcal {F}}_2=1,\quad {\mathcal {F}}_3=u_3^+,\quad {\mathcal {F}}_4=1,\quad {\mathcal {F}}_5=0. \end{aligned}$$

Thus, the desired fixed point is achieved if and only if \(u_1^+=1\) and \(u_3^+=1\). Importantly, this means that neither control is sufficient by itself, but together they achieve the desired goal. One can easily see that requiring numerous controls in much larger systems may not be biologically feasible, which is why alternate objectives can prove useful.
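Both fixed-point searches above can be cross-checked by trying every control combination. The sketch below does this for the restricted edge equations (Eq. 26) and node equations (Eq. 27); it uses plain enumeration instead of computational algebra, and the names `F_edge`, `F_node`, `u1p`, `u3m`, `u3p`, and `u4p` are illustrative stand-ins for \(u_1^+\), \(u_3^-\), \(u_3^+\), and \(u_4^+\).

```python
from itertools import product

Y = (1, 1, 1, 1, 0)  # desired fixed point y = 11110


def F_edge(x, u31, u51, u23):
    """Eq. 26: edge knockouts on x3->x1, x5->x1 and x2->x3."""
    x1, x2, x3, x4, x5 = x
    a, b, c = (u31 + 1) * x3, (u51 + 1) * x5, (u23 + 1) * x2
    return ((1 + a + b + a * b) % 2,
            (1 + x1 + x1 * x4) % 2,
            (c * x5 + c + 1) % 2,
            x3,
            (1 + x4) % 2)


def F_node(x, u1p, u3m, u3p, u4p):
    """Eq. 27: x1 knock-in, x3 knock-out and knock-in, x4 knock-in."""
    x1, x2, x3, x4, x5 = x
    return (((u1p + 1) * (1 + x3 + x5 + x3 * x5) + u1p) % 2,
            (1 + x1 + x1 * x4) % 2,
            ((u3m + u3p + 1) * (x2 * x5 + x2 + 1) + u3p) % 2,
            ((u4p + 1) * x3 + u4p) % 2,
            (1 + x4) % 2)


# Keep every control assignment for which Y becomes a fixed point.
edge_solutions = [u for u in product((0, 1), repeat=3) if F_edge(Y, *u) == Y]
node_solutions = [u for u in product((0, 1), repeat=4) if F_node(Y, *u) == Y]

print("edge controls (u31, u51, u23):", edge_solutions)
print("node controls (u1+, u3-, u3+, u4+):", node_solutions)
```

Every edge solution has \(u_{3,1}=u_{2,3}=1\) and every node solution has \(u_1^+=u_3^+=1\); the remaining parameters are unconstrained because they do not affect the evaluation at \(y\).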

Suppose we determine that \(y=01111\) is in a diseased attractor which we want to destroy. We can then aim to block the transition from y to \(F(y)=01110\). We limit ourselves to considering edges from \(x_3\rightarrow x_1\), \(x_5\rightarrow x_1\), \(x_3\rightarrow x_4\), and \(x_4\rightarrow x_5\). The updated edge equations (Eq. 24) become

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= 1+(u_{3,1}+1)x_3+(u_{5,1}+1)x_5+(u_{3,1}+1)x_3(u_{5,1}+1)x_5\\ {\mathcal {F}}_2&= 1+x_1+x_1x_4\\ {\mathcal {F}}_3&= x_2x_5+x_2+1\\ {\mathcal {F}}_4&= (u_{3,4}+1)x_3\\ {\mathcal {F}}_5&= 1+(u_{4,5}+1)x_4. \end{aligned} \end{aligned}$$
(28)

Evaluating at \(y=01111\) yields

$$\begin{aligned} {\mathcal {F}}_1=u_{3,1}u_{5,1},\quad {\mathcal {F}}_2=1,\quad {\mathcal {F}}_3=1,\quad {\mathcal {F}}_4=u_{3,4}+1,\quad {\mathcal {F}}_5=u_{4,5}. \end{aligned}$$

This means that Eq. 2 becomes

$$\begin{aligned} (u_{3,1}u_{5,1}+1)(u_{3,4}+1)(u_{4,5}+1)=0 \end{aligned}$$

giving three possible solutions: \(u_{3,1}=u_{5,1}=1\), \(u_{3,4}=1\), or \(u_{4,5}=1\). Notice that we again have a combinatorial solution in \(u_{3,1},u_{5,1}\) since they are insufficient individually but successful together, whereas \(u_{3,4}\) and \(u_{4,5}\) are singleton controls: with \(u_{3,4}=1\) we get \({\mathcal {F}}_4(y)=0\ne 1\), and with \(u_{4,5}=1\) we get \({\mathcal {F}}_5(y)=1\ne 0\), so in either case the transition to 01110 is blocked.

Lastly, consider the objective of region blocking. Suppose we want to avoid regions where \(x_3=0\), and we will limit ourselves to nodes \(x_2\) knock-out, \(x_3\) knock-in, and \(x_4\) knock-in. Then the updated node equations (Eq. 25) become

$$\begin{aligned} \begin{aligned} {\mathcal {F}}_1&= 1+x_3+x_5+x_3x_5\\ {\mathcal {F}}_2&= (u_2^-+1)(1+x_1+x_1x_4)\\ {\mathcal {F}}_3&= (u_3^++1)(x_2x_5+x_2+1)+u_3^+\\ {\mathcal {F}}_4&= (u_4^++1)x_3+u_4^+\\ {\mathcal {F}}_5&= 1+x_4. \end{aligned} \end{aligned}$$
(29)

Next, we see that Eq. 3 yields

$$\begin{aligned} \begin{aligned} 0&= 1+x_3+x_5+x_3x_5 +x_1\\ 0&= (u_2^-+1)(1+x_1+x_1x_4) +x_2\\ 0&= (u_3^++1)(x_2x_5+x_2+1)+u_3^+ +x_3\\ 0&= (u_4^++1)x_3+u_4^+ +x_4\\ 0&= 1+x_4 +x_5\\ 0&=x_3 \end{aligned} \end{aligned}$$
(30)

Using computational algebra tools to compute the Gröbner basis of the ideal associated with the above equations, we obtain:

$$\begin{aligned} I=\langle x_1+1, u_2^-,x_2+1,u^+_3,x_3,u^+_4+1,x_4+1,x_5 \rangle . \end{aligned}$$

This means the original system has the same solutions as the following system.

$$\begin{aligned} x_1+1&=0&u_2^-&=0&x_2+1&=0&u_3^+&=0 \\ x_3&=0&u_4^++1&=0&x_4+1&=0&x_5&=0 \end{aligned}$$

Recall that our goal is to block the region \(x_3=0\) by finding parameters that guarantee the above system has no solutions. Utilizing equations that only contain control parameters we have \(u_2^-=0\), \(u_3^+=0\), and \(u_4^++1=0\). Thus, if we allow either \(u_2^-=1\), \(u_3^+=1\), or \(u_4^+=0\), then our system will have no solution, as needed. Since \(x_3\) is the limiting criterion and \(u_4^+=0\) is an inactive control, that leaves \(u_2^-=1\) as the desired target. As one can see, the computational algebra method is quite versatile (Plaugher 2022).
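For a network this small, the Gröbner basis result can be cross-checked by exhaustive search. The sketch below evaluates the controlled node equations (Eq. 29) for every choice of the three control parameters and lists the fixed points that lie in the region \(x_3=0\). It replaces the algebraic computation with enumeration, and the names `F_region`, `u2m`, `u3p`, and `u4p` are illustrative stand-ins for \(u_2^-\), \(u_3^+\), and \(u_4^+\).

```python
from itertools import product


def F_region(x, u2m, u3p, u4p):
    """Eq. 29: x2 knock-out, x3 knock-in and x4 knock-in controls."""
    x1, x2, x3, x4, x5 = x
    return ((1 + x3 + x5 + x3 * x5) % 2,
            ((u2m + 1) * (1 + x1 + x1 * x4)) % 2,
            ((u3p + 1) * (x2 * x5 + x2 + 1) + u3p) % 2,
            ((u4p + 1) * x3 + u4p) % 2,
            (1 + x4) % 2)


def fixed_points_in_region(u):
    """Fixed points with x3 = 0 under the control assignment u."""
    return [x for x in product((0, 1), repeat=5)
            if x[2] == 0 and F_region(x, *u) == x]


# Map each control assignment to its fixed points inside the region,
# keeping only assignments for which at least one such fixed point exists.
admitting = {}
for u in product((0, 1), repeat=3):
    fps = fixed_points_in_region(u)
    if fps:
        admitting[u] = fps

print(admitting)
```

Any control assignment that does not appear in `admitting` leaves the system of Eq. 30 without solutions, which is the condition sought above.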

1.2.2 7.2.2 Control Kernel

Fig. 14 CK example (Plaugher 2022)

Consider the network in Fig. 14. Steady state analysis reveals two fixed points: 000100 and 111011. Suppose our control objective is \(x_4=0\), which corresponds to the second fixed point. We first notice that there are no input nodes, which means we move on to distinguishing nodes. Then the CK method (correctly) indicates that \(x_1=1\) will direct the system into the desired fixed point. Admittedly, while the CK method is straightforward, the documentation for the software used to implement the search can be difficult to navigate (Plaugher 2022).

1.2.3 7.2.3 Feedback Vertex Set

Figure 15 contains a simple example of identifying an FVS. The input node (\(x_1\)) is always in the control set, and the only other node required is any one of those in the 3-cycle. As seen in the figure, Fig. 15a is the example wiring diagram and Fig. 15b–d show the three possible FVSs. The strategy for FVS is quite simple, yet it can produce larger control sets than necessary. Further, we may not obtain all FVSs if the system has many attractors (Plaugher 2022).

Fig. 15 FVS example (Plaugher 2022)
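The search for an FVS on a small graph can be sketched by brute force. The wiring diagram of Fig. 15a is not reproduced in the text, so the sketch below assumes a hypothetical graph with one input node \(x_1\) feeding a 3-cycle \(x_2 \rightarrow x_3 \rightarrow x_4 \rightarrow x_2\). The edge list and all function names are illustrative assumptions, and exhaustive search is only practical for small networks.

```python
from itertools import combinations

# Hypothetical wiring diagram (assumption): x1 is an input to the 3-cycle x2 -> x3 -> x4 -> x2.
edges = {"x1": ["x2"], "x2": ["x3"], "x3": ["x4"], "x4": ["x2"]}
nodes = sorted(edges)

def is_acyclic(remaining):
    """Repeatedly strip nodes with no incoming edge; a cycle exists if none can be stripped."""
    remaining = set(remaining)
    while remaining:
        targets = {t for s in remaining for t in edges[s] if t in remaining}
        sources = remaining - targets
        if not sources:
            return False  # every remaining node has an incoming edge, so a cycle is left
        remaining -= sources
    return True

# Input nodes have no incoming edges and always belong to the control set.
inputs = {n for n in nodes if all(n not in edges[s] for s in nodes)}
candidates = [n for n in nodes if n not in inputs]

def minimal_fvs():
    """Smallest sets of non-input nodes whose removal leaves an acyclic graph."""
    for k in range(len(candidates) + 1):
        hits = [set(c) for c in combinations(candidates, k)
                if is_acyclic(set(nodes) - set(c))]
        if hits:
            return hits
    return []

control_sets = [sorted(inputs | fvs) for fvs in minimal_fvs()]
print(control_sets)
```

For the assumed graph this yields three control sets, each consisting of the input node together with one node of the 3-cycle.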

1.2.4 7.2.4 Stable Motifs

Consider the example network in Fig. 16a, with the following functions and negated functions.

$$ \begin{aligned} \begin{array}{llll} f_1&{}= x_2 | x_3 &{}\qquad \qquad \qquad \sim f_1 = (\sim x_2) \& (\sim x_3)\\ f_2&{}= x_1 \& (\sim x_3) &{}\qquad \qquad \qquad \sim f_2= (\sim x_1) | x_3\\ f_3&{}= (\sim x_1) | (\sim x_2) &{}\qquad \qquad \qquad \sim f_3= x_1 \& x_2 \end{array} \end{aligned}$$

Using the aforementioned steps, the expanded graph obtained is Fig. 16b. Notice there are two stable motifs (circled in orange and green), which indicate a fixed point (110) and a partial fixed point (X01). To find the rest of the partial fixed point, substitute the known values into the original functions. Therefore,

$$\begin{aligned} f_1= x_2 | x_3 = 0 | 1 = 1 \end{aligned}$$

which gives 101 as the second fixed point. Since the control sets are subsets of the stable motifs, we have \(\{x_2=1, x_3=0\}\) or \(\{x_1=1, x_3=0\}\) for fixed point 110, and \(\{x_2=0\}\) or \(\{x_3=1\}\) for fixed point 101 (Plaugher 2022).

Fig. 16 Stable motif example (Plaugher 2022)
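The fixed points and control sets of this three-node example can be checked directly. The sketch below is not the stable motif algorithm: it enumerates the state space, pins the controlled nodes, and iterates the rules. It assumes a synchronous update, which the text does not specify, and the names `update`, `long_term_states`, and `controls` are illustrative.

```python
from itertools import product

def update(state, pinned):
    """One synchronous update of f1, f2, f3; pinned maps node index -> forced value."""
    x1, x2, x3 = state
    nxt = [x2 | x3, x1 & (1 - x3), (1 - x1) | (1 - x2)]
    for i, v in pinned.items():
        nxt[i] = v
    return tuple(nxt)

# Fixed points of the uncontrolled network.
fixed_points = [s for s in product((0, 1), repeat=3) if update(s, {}) == s]

def long_term_states(pinned, steps=10):
    """States reached after `steps` updates from every initial state (10 is ample here)."""
    ends = set()
    for s in product((0, 1), repeat=3):
        for _ in range(steps):
            s = update(s, pinned)
        ends.add(s)
    return ends

# Control sets from the text, written as {node index: value} with x1 at index 0.
controls = {
    "x2=1, x3=0": {1: 1, 2: 0},
    "x1=1, x3=0": {0: 1, 2: 0},
    "x2=0": {1: 0},
    "x3=1": {2: 1},
}
for label, pinned in controls.items():
    print(label, "->", long_term_states(pinned))
```

Under the synchronous assumption, the first two control sets drive every initial state to 110 and the last two drive every initial state to 101.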

1.3 7.3 Simulating Target Efficacy

To determine the efficacy of controls, we compare uncontrolled simulations with the corresponding controlled simulations. A good control will produce low disease levels and high health levels (Plaugher 2022). We do so with a stochastic simulator based on SDDS (Murrugarra and Aguilar 2018; Plaugher and Murrugarra 2021; Plaugher et al. 2022; Plaugher 2022), which requires several inputs before it can begin. The number of input variables in each Boolean function is given by the vector nv. Next, the input variables for each gene are stored column-wise in an \(m\times n\) matrix called varF, where m is the maximum number of inputs and n is the number of genes. Because the number of variables varies between functions, only the first nv(i) elements of the ith column are relevant, and all remaining entries are set to \(-1\). We then construct the truth table F in compact form, with size \(2^m \times n\). Again, only the first \(2^{nv(i)}\) entries of column i are relevant, and all remaining entries are set to \(-1\). It is vitally important to maintain a consistent numerical ordering, which is why the columns of F follow the lexicographic ordering of the binary input arrays (Veliz-Cuba et al. 2022).
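As an illustration of these inputs, the following Python sketch builds nv, varF, and F for the uncontrolled rules of Eq. 29 (all control parameters set to 0). It is a sketch under stated assumptions, not the simulator's own input code: the 1-based variable indices, the row ordering 00, 01, 10, 11, and the `rules` structure are all assumptions made for this example.

```python
from itertools import product

# Uncontrolled rules of Eq. 29 as (1-based input indices, polynomial over F_2).
rules = [
    ((3, 5), lambda x3, x5: (1 + x3 + x5 + x3 * x5) % 2),
    ((1, 4), lambda x1, x4: (1 + x1 + x1 * x4) % 2),
    ((2, 5), lambda x2, x5: (x2 * x5 + x2 + 1) % 2),
    ((3,), lambda x3: x3),
    ((4,), lambda x4: (1 + x4) % 2),
]

n = len(rules)                              # number of genes
nv = [len(inputs) for inputs, _ in rules]   # number of inputs per function
m = max(nv)                                 # maximum number of inputs

# varF is m x n: column j lists the inputs of gene j, padded with -1.
varF = [[rules[j][0][i] if i < nv[j] else -1 for j in range(n)] for i in range(m)]

# F is 2^m x n: column j holds the truth table of gene j, padded with -1.
F = [[-1] * n for _ in range(2 ** m)]
for j, (inputs, f) in enumerate(rules):
    # product((0, 1), ...) yields inputs in lexicographic order: 00, 01, 10, 11.
    for row, bits in enumerate(product((0, 1), repeat=nv[j])):
        F[row][j] = f(*bits)

print("nv   =", nv)
print("varF =", varF)
print("F    =", F)
```

The last two columns show the padding: genes 4 and 5 have a single input each, so only the first entry of their varF column and the first two entries of their F column are meaningful.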

We must also establish propensities in the form of a \(2\times n\) matrix c that contains values for \(p_k^\uparrow \) and \(p_k^\downarrow \). The values chosen for the propensities may perturb results, as we saw in Fig. 12. In practice, we typically use \(p_k^\uparrow = p_k^\downarrow = 0.9\) (i.e., the function rules are followed \(90\%\) of the time). Finally, we can run simulations using the inputs F, varF, nv, number of states (usually Boolean), c, n, number of steps, and number of random initializations. We have also implemented versions that allow for mutation induction and specified initial states. As a result, we obtain time-course trajectories, and we can use the Markov chain structure of SDDS to analyze features such as time to absorption, stationary distributions, and more.
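The role of the propensities can be sketched in a few lines of Python. The sketch assumes the usual SDDS rule: when a function calls for activation the node switches on with probability \(p_k^\uparrow \), when it calls for degradation the node switches off with probability \(p_k^\downarrow \), and otherwise the node keeps its value. The functions `sdds_step` and `simulate`, their signatures, and the use of the three-node network from the stable motif example are illustrative assumptions, not the interface of the simulator described above.

```python
import random

def sdds_step(state, functions, p_up, p_down, rng):
    """One stochastic update: each node follows its rule with its own propensity."""
    nxt = []
    for k, f in enumerate(functions):
        target = f(state)
        if target > state[k]:      # activation, accepted with probability p_up[k]
            nxt.append(target if rng.random() < p_up[k] else state[k])
        elif target < state[k]:    # degradation, accepted with probability p_down[k]
            nxt.append(target if rng.random() < p_down[k] else state[k])
        else:                      # the rule agrees with the current value
            nxt.append(state[k])
    return tuple(nxt)

def simulate(functions, p_up, p_down, n_steps, n_inits, seed=0):
    """Average node activity at each time step over random initial states."""
    rng = random.Random(seed)
    n = len(functions)
    totals = [[0] * n for _ in range(n_steps + 1)]
    for _ in range(n_inits):
        state = tuple(rng.randint(0, 1) for _ in range(n))
        for t in range(n_steps + 1):
            for k in range(n):
                totals[t][k] += state[k]
            state = sdds_step(state, functions, p_up, p_down, rng)
    return [[c / n_inits for c in row] for row in totals]

# Illustrative network: f1 = x2 | x3, f2 = x1 & ~x3, f3 = ~x1 | ~x2.
functions = [
    lambda x: x[1] | x[2],
    lambda x: x[0] & (1 - x[2]),
    lambda x: (1 - x[0]) | (1 - x[1]),
]
averages = simulate(functions, [0.9] * 3, [0.9] * 3, n_steps=100, n_inits=1000)
print("average activity at the final step:", averages[-1])
```

Each row of `averages` is the fraction of runs in which each node is active at that time step, which is one simple way to summarize time-course trajectories.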

Fig. 17 Simple 3-cycle (Plaugher 2022)

As an example, consider the simple 3-cycle in Fig. 17. This particular system has two fixed points (\(\{000\}\) and \(\{111\}\)) as well as two cyclic attractors (\(\{001, 100, 010\}\) and \(\{011, 101, 110\}\)). Simulations were conducted using the variables in Table 6, with 1000 random initializations, 100 time steps (function updates), and 1\(\%\) injected noise. The overall state-space is shown in Fig. 18. In Fig. 19a, the uncontrolled simulation shows the oscillatory nature of the cyclic attractors. However, Fig. 19b, c show that inducing control on \(x_1\) is enough to drive the system to one fixed point or the other. Therefore, the SDDS simulator can show long-term trajectories and the impact of controls over time.
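The attractor structure of this example can be reproduced with a short deterministic sketch. The rules are not written out in the text, so the code assumes \(f_1=x_3\), \(f_2=x_1\), and \(f_3=x_2\), which is consistent with the order of states in the cycles listed above. Noise is omitted, and the names `step` and `attractors` are illustrative.

```python
from itertools import product

def step(state, pinned=None):
    """Synchronous update of the assumed 3-cycle rules f1 = x3, f2 = x1, f3 = x2."""
    x1, x2, x3 = state
    nxt = [x3, x1, x2]
    for i, v in (pinned or {}).items():
        nxt[i] = v  # a controlled node is held at its forced value
    return tuple(nxt)

def attractors(pinned=None):
    """Follow every initial state until it revisits a state and collect the cycles."""
    found = set()
    for s in product((0, 1), repeat=3):
        seen = []
        while s not in seen:
            seen.append(s)
            s = step(s, pinned)
        found.add(frozenset(seen[seen.index(s):]))
    return found

print("uncontrolled:", attractors())
print("x1 held at 1:", attractors({0: 1}))
print("x1 held at 0:", attractors({0: 0}))
```

Without control the sketch recovers the two fixed points and the two 3-cycles, while holding \(x_1\) at 1 or 0 leaves only the fixed point 111 or 000, respectively.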

Table 6 Variable tables for simple 3-cycle simulations in Fig. 17 (Plaugher 2022)

Fig. 18 Phase-space of simple 3-cycle. Here we show the state-space of the example from Fig. 17, using SDDS with transition probabilities, with nodes written in lexicographical ordering.

Fig. 19 Simulation examples for a simple 3-cycle with 1% noise (Plaugher 2022)

1.4 7.4 Software

1.5 7.5 Appendix Tables

See Tables 7 and 8.

Table 7 Small T-LGL rules
Table 8 Functions for large T-LGL model

Cite this article

Plaugher, D., Murrugarra, D. Phenotype Control techniques for Boolean gene regulatory networks. Bull Math Biol 85, 89 (2023). https://doi.org/10.1007/s11538-023-01197-6
