# The phenotype control kernel of a biomolecular regulatory network

**Part of the following topical collections:**

## Abstract

### Background

Controlling complex molecular regulatory networks is getting a growing attention as it can provide a systematic way of driving any cellular state to a desired cell phenotypic state. A number of recent studies suggested various control methods, but there is still deficiency in finding out practically useful control targets that ensure convergence of any initial network state to one of attractor states corresponding to a desired cell phenotype.

### Results

To find out practically useful control targets, we introduce a new concept of phenotype control kernel (PCK) for a Boolean network, defined as the collection of all minimal sets of control nodes having their fixed state values that can generate all possible control sets which eventually drive any initial state to one of attractor states corresponding to a particular cell phenotype of interest. We also present a detailed method with which we can identify PCK in a systematic way based on the layered network and converging tree of a given network. We identify all candidates for control nodes from the layered network and then hierarchically search for all possible minimal sets by using the converging tree. We show the usefulness of PCK by applying it to cell proliferation and apoptosis signaling networks and comparing the results with other control methods. PCK is the unique control method for Boolean network models that can be used to identify all possible minimal sets of control nodes. Interestingly, many of the minimal sets have only one or two control nodes.

### Conclusions

Based on the new concept of PCK, we can identify all possible minimal sets of control nodes that can drive any molecular network state to one of multiple attractor states representing a same desired cell phenotype.

## Keywords

Biological network Boolean network model Attractor Basin Target control Network control Phenotype control kernel Layered network Converging tree## Abbreviations

- MAPK
Mitogen-activated protein kinase

- MFVS
Minimum feedback vertex set

- PCK
Phenotype control kernel

## Background

The ultimate goal of systems biology is to control a cellular state which is determined by the dynamics of the underlying molecular regulatory network. Here, the cellular state transition dynamics is governed by both topology (structural information on interaction between molecules) and regulatory functions (operations of the types of interactions).

There have been attempts (presented by various ‘control methods’) to find out ‘control nodes’ that can drive the network state to a desired one by fixing the state values of the control nodes or directly controlling the control nodes with external signals. One attempt was to find out a dominating set from the topology of undirected networks [1, 2, 3, 4, 5, 6, 7], which is defined as a set of central nodes that are connected to all the other nodes in undirected networks. Another bunch of works suggested driver nodes [8, 9], steering kernel [10] and feedback vertex sets [11, 12] based only on the topology of directed networks. To drive any given initial state composed of all the nodes in the network into any other final state within a finite time (‘full control’), it is enough to control driver nodes in [8] with external input signals directly acting on the driver nodes, where the dynamics of the network is modelled by a system of linear differential equations and the signals can be explicitly defined. The control method in [9] was developed to find out driver nodes to drive some nodes of interest instead of all the nodes as in [8], which can be considered as a generalization of [8] and is called ‘target control’ compared to full control. For transition between two specific states of a network instead of any two states as in [8], the structure-based method in [10] can be used where the dynamics is also modelled by a system of linear differential equations. It is possible that fewer nodes are enough to control some networks by using the methods in [9, 10] than in [8]. The strategy in [11, 12] is to make a given network acyclic by removing its feedback loops, where the removal is implemented by fixing state values of some nodes in the loops. When applying the structure-based method in [11, 12], the dynamics is modelled by a system of first order differential equations and the fixed values must be the values of the nodes in the desired attractor, where the point attractor for the differential equations is defined as a vector composed of all the node values at which the first derivatives of the nodes are zero. Despite the differential equations used in [11, 12] must satisfy some properties, the equations can be nonlinear.

Although the control methods based only on the topology can provide control nodes for a large size of networks, topology itself is not enough to identify control nodes that can ensure transition towards a desired attractor, which was shown by using Boolean network models [13]. Note that the Boolean attractor is defined as a set of vectors of Boolean state values of all the nodes that is closed with respect to transition (i.e., if a state vector is in the set, then next state vector is also included in the set). So, a number of other studies were carried out to use regulatory functions of the network in finding out control nodes. The control method in [14] was developed for continuous models as in [8, 9, 10, 11, 12] whose goal is to identify a sequence of perturbations to the undesired state of the system that can drive it to the attraction basin of the desired stable state, where the basin is the set of states converging to the desired state. However, there is no efficient algorithm of finding out the exact basin. Among various types of regulatory functions, the Boolean function is particularly useful for modeling large-scale regulatory networks as it is a parameter-free logical function and thereby we can avoid parameter estimation which is often a critical limitation in mathematical modeling of such large-scale networks [15, 16, 17], so many control methods for Boolean network models were developed [18, 19, 20, 21, 22, 23, 24, 25]. The basic idea in [18] is to numerically find out a minimal control set by using a genetic algorithm. The control method integrating the structure of a given network and the Boolean regulatory functions was developed in [19] by using both minimal strongly connected components and their fixed points of the Boolean model. The approach in [19] is based on the concept in [20]. The desired final state used in the control methods in [18, 19, 20] must be a Boolean attractor before applying the control methods, but there is no such restriction on the desired final state in [21, 22, 23, 24, 25] as in [8, 9, 10]. The state values of control nodes found in [18, 19, 20] and [21, 22, 23, 24, 25] are fixed and directly controlled with external Boolean inputs as in [11, 12] and [8, 9, 10], respectively. Given a set of external control nodes for a Boolean network which has a tree structure, the polynomial time algorithm in [21] and integer programming-based approach in [22] were introduced to find out a sequence of the state values of the external control nodes to force an initial state to transit toward a desired state within desired time steps. Necessary and sufficient conditions for controllability and observability of Boolean control networks are considered in [23] based on semi-tensor product of matrices, which is a new matrix product and requires high computational cost. The control method in [24] is based on computational algebra, which forces the desired state to become a fixed point attractor. This method is only applicable to the case of controlling a desired state to become a point attractor. To avoid undesirable state transitions, edge-deletion strategy can be applied as in [25].

In most cases, there are multiple attractor states corresponding to a particular cell phenotype of interest, which is defined by the state values of some nodes instead of all nodes. So, in case a control method for Boolean models is to drive a given network state to converge to any attractor corresponding to the phenotype, the desired final state depends only on the state values of the phenotype nodes instead of all nodes, which is a type of target control as in [9] for continuous models. However, there is no such target control method for Boolean network models as far as we know. In addition, there can be multiple control sets, which is the set of control nodes, for a given network and so we need target control methods whose goal is to find out all possible control sets, but there is no such target control method yet. Therefore, in this paper, we present a novel and practical target control method for Boolean network models with which we can identify all minimal control sets and show its usefulness by applying it to biological network examples.

## Results

### The layered network and converging tree of an example network

*P*= 0. The procedure of finding minimal control sets for

*P*= 0 is largely composed of two parts: The first part is hierarchically constructing a layered network (Fig. 1b) from the given network (Fig. 1a), where each layer consists of nodes of the given network. The second part is hierarchically finding out minimal control sets (Fig. 2a-d) and constructing the converging tree (Fig. 2e) which consists of the minimal control sets.

The layered network can be constructed as follows. The 0th layer of the layered network (Fig. 1b) is composed of the phenotype node P. The input nodes to P are C and E, which comprise the 1st layer. Likewise, among the nodes of A, B, D and F which are not used in the 0th nor 1st layer, the input nodes to C or E are B, D and F, which comprise the 2nd layer. Finally, the remaining node A is an input node to D, so the 3rd layer is composed only of A. Since there is no remaining input node to a node in the 3rd layer, the 4th layer does not exist and therefore the 3rd layer becomes the last layer in the layered network. The step-by-step procedure for construction of the layered network implies the uniqueness of the layered network. Using the layered network, we can separate the nodes in a given network into two groups: One is the set of nodes which have an influence on the phenotype nodes and the other is the set of nodes which have no influence. The two groups are presented for a simplified mitogen-activated protein kinase network. In addition, to find out a minimal control set containing a control node ϒ of interest, we can use the information of the location of the node ϒ in the layered network, which is shown for a simplified cancer cell signaling network. In particular, information on the influential nodes obtained from the layered network is used to implement the algorithm for construction of the converging tree. We can then find out minimal control sets in the converging tree by hierarchically applying necessary and sufficient conditions for the phenotype node to have the desired value as follows:

Step 0. Determine the desired value of the phenotype node. The 0th level of the converging tree (Fig. 2e) is the singleton set {*P* = 0} of the desired steady state value of the phenotype node P (Fig. 2a). For simplicity, we use the notation {P = 0} instead of the set {P|*P* = 0}.

*P*= 0} in the 0th level. To define the concepts of child and parent sets, let us consider that a minimal control set

*S*

_{1}is located in the

*i*

^{ th }level. If inserting

*S*

_{1}into the update rules results in a minimal control set

*S*

_{2}in the (

*i*− 1)

^{ th }level, then the minimal control set

*S*

_{1}is called the child set of the parent set

*S*

_{2}. The parent set of

*S*

_{2}is also called the ancestor set of

*S*

_{1}. We often omit the term ‘minimal’ in ‘minimal control sets’ if there is no confusion. The candidate nodes of the children sets of {

*P*= 0} are P, C and E because of P* = C&E. Then the steady state value P = 0 is directly generated by substituting one of the three perturbations {

*P*= 0}, {C = 0} and {E = 0} to the update rules (Fig. 1a) and, as a result, these three perturbation targets constitute the control sets in the 1st level. For a new child set

*S*, we apply the following two rules of removing ‘included’ control sets or ‘contradictory’ children sets to both

*S*∪

*S*

^{ before }and

*S*

^{ ancester }, where

*S*

^{ before }is the union of control sets found in the previous process and

*S*

^{ ancester }the union of the parent and ancestor sets of the child set

*S*:

- (1)
The first rule is to remove ‘included’ control sets in the set

*S*∪*S*^{ before }(see the first removal rule in Methods for details). - (2)
The second rule is to remove

*S*if the value of a control node*N*in*S*is ‘contradictory’ to the value of the control node*N*in*S*^{ ancester }(see the second removal rule in Methods for details).

For the control set *S*={*P* = 0} in the 1st level, we apply the removal rules to *S* ∪ *S*^{ before } = {*P* = 0}∪{*P* = 0} = {*P* = 0} and *S*^{ ancester } = {*P* = 0} and, as a result, {*P* = 0} in the 1st level is included in {P = 0} in the 0th level. Therefore, the control set {*P* = 0} in the 1st level is removed by the first rule. For the control set *S*={C = 0} in the 1st level, we apply the removal rules to *S* ∪ *S*^{ before }={C = 0}∪{*P* = 0} and *S*^{ ancester }={*P* = 0} and, as a result, we find that no control set is removed until the 1st level except {*P* = 0} in the 1st level. Similarly, in the case of *S*={E = 0} in the 1st level, we have *S* ∪ *S*^{ before }={E = 0}∪{C = 0}∪{*P* = 0} and *S*^{ ancester }={*P* = 0}, and therefore we find that no control set is removed until the 1st level except {*P* = 0} in the 1st level. As a result, the 1st level consists of two control sets {C = 0} and {E = 0} which are referred to as signals for the control set {*P* = 0} in the 0th level (Fig. 2b). In the following, for a singleton and parent set {K = 0}, we exclude such a node K from the candidate nodes of its children sets. Since the level of the minimal control set {C = 0} is the 1st level, {C = 0} can directly control the phenotype value P = 0 in the 0th level.

Step 2. Find children sets that directly generate each parent set in the 1st level. Since there exist two parent sets {C = 0} and {E = 0} in the 1st level (Fig. 2b), the children sets can be found for each parent set (Fig. 2c).

Step2–1. Parent set {C = 0} in the 1st level. The candidate nodes of the children sets are B, D and E because of the update rule C* = (!B)&D&E for the node C. Then the steady state value C = 0 is directly generated by substituting one of the three perturbations {B = 1}, {D = 0}, {E = 0} to the update rule. Applying the first removal rule, we find that {E = 0} in the 2nd level is removed due to {E = 0} in the 1st level and that the remaining control sets in the 2nd level are {B = 1} and {D = 0}.

Step2–2. Parent set {E = 0} in the 1st level. The candidate nodes are D and F because of the update rule E* = D|F. Then the steady state value E = 0 is directly generated by substituting the perturbation {(D, F) = (0,0)} to the update rule. Applying the first removal rule, we find that {(D,F) = (0,0)} in the 2nd level is removed due to {D = 0} in the 2nd level and that the parent set {E = 0} in the 1st level does not have a child set, resulting in that {E = 0} becomes a leaf set in the 1st level.

*P*= 0 in the 0th level via the parent set {C = 0} in the 1st level. Following the similar process, we can construct the converging tree in Fig. 2e (see Additional file 1 for details). Finally, we find out six minimal control sets: {C = 0} and {E = 0} in the 1st level, {B = 1} and {D = 0} in the 2nd level (children sets of {C = 0}), {A = 0} in the 3rd level (child set of {D = 0}), and {F = 0} in the last level (child set of {A = 0}). Using the converging tree, we find out that the minimal control set {F = 0} generates the minimal control sets which are the parent or ancestor sets {A = 0}, {D = 0} and {C = 0}. Since all possible sets of control targets must contain at least one minimal control set (see the proof in Additional file 1), we call the collection of such minimal control sets ‘phenotype control kernel (PCK)’ for a given Boolean network model. Therefore, PCK consists of six minimal control sets in this case,

### Two biological network models

We apply our method to two biological network models. The first one is the mitogen-activated protein kinase (MAPK) model [26] with 53 nodes and 88 links, where Boolean update rules are described with logic functions (the first tab in Additional file 2). To show that our method can also be applied to models with threshold functions instead of logic functions, we employed another cancer cell signaling network model [27] of 96 nodes and 265 links, where the Boolean update rules are described with threshold functions (the first tab in Additional file 3).

### The simplified MAPK network and control strategy

The MAPK model has four stimuli (DNA_damage, EGFR_stimulus, FGFR3_stimulus, TGFBR_stimulus) and three phenotype nodes (Apoptosis, Growth_Arrest, Proliferation) (Additional file 4: Figure S1).

which indicates that all attractors represent a proliferation phenotype. Hence, we find that any initial state of the MAPK network will converge to a cancer state under this condition (the last tab in Additional file 2).

To identify control targets under the aforementioned condition, we first need to determine how to represent the mutations in the MAPK network. For this purpose, let us consider the cancer therapies targeting oncogene addiction and synthetic lethality effects [28]: A cancer therapy targeting the oncogene addiction identifies a mutated gene and attempts to suppress oncogenic signal from the mutated gene [29]. Similarly, to represent the control situation in the MAPK network, we consider two mutant PPP2CA and CREB as input nodes generating oncogenic signals. As a result, we consider MEK1_2, ERK, p70, DUSP1, p38, MSK and MYC in the downstream of the two mutants as control targets that can suppress the oncogenic signals from CREB and PPP2CA.

Let us consider a case that we can only control non-phenotype nodes except CREB and PPP2CA in the simplified MAPK network and that we cannot change the determined phenotypic values (Apoptosis, Growth_Arrest) = (0,0) whereas we can obtain Proliferation = 0 by controlling some of the 26 nodes. In this case, the goal of control is to stop proliferation in the original network, indicating that all attractors of the original network should have the phenotypic values (Apoptosis, Growth_Arrest, Proliferation) = (0,0,0). To achieve this goal, we apply our method to the simplified MAPK network model and, as a result, we can find all control sets.

### The layered network of the simplified MAPK network

The nodes of the simplified MAPK network can be further partitioned according to where the nodes are located in the layered network as shown in Fig. 3, where 14 red nodes are the candidates for control nodes for the desired value of the phenotype node Proliferation (the green node in Fig. 3). On the other hand, any perturbation of the other 12 white nodes in Fig. 3 cannot drive the network to have the desired value of Proliferation.

### The converging tree of the simplified MAPK network

In the following, we describe the construction procedure of the converging tree step by step.

Step1. Find children sets that directly generate the parent set {Proliferation = 0} in the 0th level. Since the possible control nodes in the 1st level are input nodes to Proliferation, we find that the candidates for control nodes are p70 and MYC from Proliferation* = p70&MYC. Then the steady state value Proliferation = 0 is determined by one of the two perturbations {p70 = 0} and {MYC = 0}. Applying the two removal rules, we find that no control set is removed up to the present level. Therefore the 1st level consists of {p70 = 0} and {MYC = 0}.

Step2. Find children sets that directly generate each parent set in the 1st level. Since the 1st level contains two parent sets {p70 = 0} and {MYC = 0}, the children sets can be found for each parent set.

Step2–1. Parent set {p70 = 0} in the 1st level. The possible control node is ERK because of p70* = ERK. Then the steady state value p70 = 0 is determined by {ERK = 0}. Applying the two removal rules, we find that no control set is removed up to the present level.

Step2–2. Parent set {MYC = 0} in the 1st level. The possible control node is MSK because of MYC* = MSK. Then the steady state value MYC = 0 is determined by the perturbation {MSK = 0}. Applying the two removal rules, we find that no control set is removed up to the present level.

It follows from Step2–1 and Step2–2 that the control sets in the 2nd level are {ERK = 0} and {MSK = 0}.

Step3. Find children sets that directly generate each parent set in the 2nd level. The 2nd level contains two control sets {ERK = 0} and {MSK = 0}. We then follow the two steps as in Step 2.

Step3–1. Parent set {ERK = 0} in the 2nd level. The possible control node is MEK1_2 because of ERK* = MEK1_2. Then the steady state value ERK = 0 is determined by {MEK1_2 = 0}. Applying the two removal rules, we find that no control set is removed up to the present level.

Step3–2. Parent set {MSK = 0} in the 2nd level. The candidates for control nodes are ERK and p38 because of MSK* = ERK|p38. Then the steady state value MSK = 0 is determined by the perturbation {(ERK, p38) = (0,0)}. Applying the first removal rule, we find that {(ERK, p38) = (0,0)} is removed since the control set {ERK = 0} exists in the 2nd level. Thus {MSK = 0} becomes a leaf set in the 2nd level.

It follows from Step3–1 and Step3–2 that the control set in the 3rd level is {MEK1_2 = 0}.

Step4. Find children sets that directly generate the parent set {MEK1_2 = 0} in the 3rd level. The candidates for control nodes are PPP2CA and AP1 because of MEK1_2* =!(PPP2CA|AP1), which is simplified as MEK1_2* =!AP1 from the mutation PPP2CA = 0. Then the steady state value MEK1_2 = 0 is determined by {AP1 = 1}. Applying the two removal rules, we find that no control set is removed up to the present level. Therefore the 4th level has only one control set {AP1 = 1}.

Following the similar procedure with the two removal rules, we can obtain the converging tree in Fig. 4 (see Additional file 5 for the other levels).

### Comparison with other control methods

Our method can be considered as a target control method for Boolean network models, which drives some target nodes of interest to have desired values instead of all nodes in the network. Hence, previous target control methods for Boolean network models, if any, can be compared with our method. However, there is no such a target control method for Boolean network models, so direct comparison is not possible at present.

Although the method introduced in [9] is a target control method which is not for Boolean network models, we can still compare our method with it by using the simplified MAPK network model where the target node set consists of the Proliferation node (filled orange circle in Fig. 4a). The method in [9] provides a set of driver nodes that is enough to drive the Proliferation node to have its desired state value by using the greedy algorithm which constructs a series of bipartite graphs as shown in Additional file 6. As a result, the PPP2CA node becomes the unique driver node (filled red circle in Fig. 4a). For convenience, the set of all driver nodes is also called a control set from now on. Note that if a different maximum matching is chosen, a different control set {CREB} can be obtained. Then the method in [9] needs to control one node (denoted by number 1 before the green bar in Fig. 4a) and similarly our method needs only one or two layered nodes in the layered network for the phenotype control (denoted by numbers 1 and 2 before the blue bars in Fig. 4a), where the layered nodes are marked with purple circles in the bottom box in Fig. 4a. As expected, fewer driver nodes are needed for target control methods than the full control methods in [8] and [11, 12] (see Additional file 7).

Applying the greedy algorithm and our method to the simplified MAPK network model, we obtain one and eleven control sets in Fig. 4b, respectively. If the greedy algorithm is applied multiple times and different maximum matchings are chosen at each time, multiple control sets can be obtained. The difference of the number of control sets between the method in [9] and our method originates from the fact that driver nodes can be obtained only from the structure of the network but our method depends on Boolean update rules. Note that if a set Ψ of some nodes is not a minimal control set and controlling the set Ψ leads the simplified MAPK Boolean network to have the desired value Proliferation = 0, then a subset of Ψ is one of the 11 control sets in the converging tree and thereby the set Ψ is unnecessary with respect to perturbation.

### The simplified cancer cell signaling network with threshold update functions

The cancer cell signaling network has six inputs (Mutagen, GFs, Nutrients, TNFa, Hypoxia, Gli) and four output nodes (AcidLactic, Apoptosis, Glut_1 and DNARepair) as in Additional file 8: Figure S2.

Hypoxia condition is a common condition of cancer cells in vivo, so let us consider a simplified network model under this condition and find out control targets that can induce apoptosis [32]. Note that there are some fixed values (PTEN, APC, Max, p14, FOXO, ROS) = (1,1,1,0,1,0) in the original update rules. Substituting these fixed values as well as input Hypoxia = 1 into the original update rules, we finally have fixed values for 40 nodes and simplified update rules for the remaining 56 nodes including three output nodes Apoptosis, DNA_Repair and Glut_1 (the second tab in Additional file 3). Note that the attractors of the cancer cell signaling network are preserved after the substitution [33]. We refer to the cancer cell signaling network obtained from the simplified update rules as a simplified cancer cell signaling network.

### Construction of the converging tree based on the layered network of the simplified cancer cell signaling network

## Discussion

In case a desired phenotype is defined by the state values of multiple nodes as in [34, 35, 36], our method can also be applied if the number of children sets does not increase too much owing to the two removal rules, as shown in Additional file 10. However, if the number of children sets is enormously increased and the children sets are not efficiently eliminated by the two removal rules, then the complexity can become high. As a result, if we cannot complete the construction of a converging tree, we need to modify our algorithm such that the number of children sets is restricted.

Even if the construction of a converging tree is incomplete, the converging tree provides us with useful information since we can find control sets hierarchically from the root set. Let us consider a case when we need to know the level of a converging tree, in which there is a control set containing a given control node *N* before we construct the converging tree. If the node *N* is not a direct or indirect input node to the phenotype nodes of interest, it is not possible to find such a control set and therefore we cannot find such a level even after completion of the converging tree. Otherwise, we can use the relationship between such a level and the shortest length of paths from the given node *N* to the phenotype nodes. To deal with both cases, we have introduced the layered network which hierarchically consists of direct or indirect input nodes to the phenotype nodes. We need to note that the layered network can be constructed by using only the topology of a given directed network and it does not require the information about regulatory functions. If the node *N* is not included in the layered network, then there is no control set having the node *N* as its control node. For instance, SOS, FRS2, SPRY, PKC, FGFR3, GRB2, PLCG, EGFR, BCL2, MAX, p14 and MDM2 in Fig. 3 are not included in the layered network. Otherwise, we need to construct the converging tree with multiple levels. For instance, if *N* is the node *N* = *CHK*1/2 in Fig. 5, then the node *N*is located in the 4th layer of the layered network and therefore we need to construct the converging tree with at least 4 levels as in Fig. 6. In this respect, our method can be considered as a hierarchical control strategy based on both topology and regulatory functions of the network in finding out control nodes.

## Conclusions

There is a growing interest in controlling complex biological networks, but no practical method is available that can be used to find out all possible combinations of control targets ensuring convergence to a desired cell phenotype. To resolve this problem, in this study, we introduced the concept of PCK and presented a detailed method of identifying PCK based on layered network and converging tree. We showed that PCK can generate all control sets. The converging tree can be useful for biologists or clinicians since it can be used to find out drug targets or to realize precision medicine (or personalized medicine) by identifying control targets that are interpreted or serve as drug targets. For instance, we can construct a patient-specific cancer cell signaling network model by reflecting the genomic variation of the tumor sample and apply this method to identify the most effective drug targets in consideration of such genomic variation effects in the network dynamics.

In this paper, we considered an intracellular regulatory network model and developed one pair of the layered network and converging tree of the single network model. Considering the heterogeneous cell population of cancer, we need to further expand the present approach by considering multiple network models at the same time. For instance, for two Boolean network models representing two cancer cell types, we can construct two pairs of the layered network and converging tree. In this way, we might be able to find out a set of common control sets that can be applied to the two cancer networks for the desired values of the phenotype nodes. Such a control set can be found by combining the layered network (topological property) and the converging tree (dynamical property). This remains as a future study.

## Methods

Let *P*_{1}, ⋯, *P*_{ℓ} be some of the phenotype nodes of a given Boolean network, which are used to define the desired state values *P*_{ i } = *d*_{ i } (1 ≤ *i* ≤ ℓ) for a positive integer ℓ and constants *d*_{ i } ∈ {0, 1}. The term ‘minimal control set’ denotes a minimal set of nodes having their fixed state values that can ensure convergence of the Boolean network to the desired values *P*_{ i } = *d*_{ i } (1 ≤ *i* ≤ ℓ) by inserting the fixed state values into the update rules of the Boolean network, where the elements of the minimal control set are referred to as ‘control nodes’ and it is said that the minimal control set generates (or determines) the desired state values *P*_{ i } = *d*_{ i } (1 ≤ *i* ≤ ℓ). Any minimal control set can be hierarchically found from a tree structure obtained by solving a system of some Boolean equations constructed from the update rules, so some terminologies are employed here to indicate the location of minimal control sets: root, child, parent, ancestor, descendant and leaf sets. The root set is {(*P*_{1}, ⋯, *P*_{ℓ}) = (*d*_{1}, ⋯, *d*_{ℓ})}. If a minimal control set *S*_{1} directly generates the root set {(*P*_{1}, ⋯, *P*_{ℓ}) = (*d*_{1}, ⋯, *d*_{ℓ})}, then the minimal control set *S*_{1} is called the child set of the parent set {(*P*_{1}, ⋯, *P*_{ℓ}) = (*d*_{1}, ⋯, *d*_{ℓ})}. If *S*_{2} is a child set of *S*_{1}, then {(*P*_{1}, ⋯, *P*_{ℓ}) = (*d*_{1}, ⋯, *d*_{ℓ})} is called the ancestor set of *S*_{2}. In addition, *S*_{2} is called the descendant set of {(*P*_{1}, ⋯, *P*_{ℓ}) = (*d*_{1}, ⋯, *d*_{ℓ})}. If *S*_{1} has no child set, then *S*_{1} is called a leaf set.

### Layered network

The layered network of a given network consists of layers, where the 0^{ th } layer consists of the phenotype nodes of interest. Let us denote the nodes located in the *k*^{ th } layer for any *k* in {0, ⋯, *i*} as \( {N}_1^k,\cdots, {N}_{\gamma_k}^k \) for some nonnegative integers *i* and *γ*_{ k }. Then the (*i* + 1)^{ th } layer consists of the nodes which are inputs to at least one of \( {N}_1^i,\cdots, {N}_{\gamma_i}^i \) and not contained in \( {\cup}_{k=0}^i\left\{{N}_1^k,\cdots, {N}_{\gamma_k}^k\right\} \). Each node of the layered network is referred as a layered node. Since the input nodes to each node in the network are unique, the layered network is unique.

### The first removal rule for included control sets

Let \( {C}^k=\left\{\left({N}_1^k,\cdots, {N}_{\xi_k}^k\right)=\left({n}_1^k,\cdots, {n}_{\xi_k}^k\right)\right\} \) be a control set in the *k*^{ th } level. If a control set \( {C}^i=\left\{\left({N}_1^i,\cdots, {N}_{\varsigma}^i\right)=\left({n}_1^i,\cdots, {n}_{\varsigma}^i\right)\right\} \) in the *i*^{ th } level satisfies

\( \left({N}_m^k,{n}_m^k\right)=\left({N}_{g_{mki}}^i,{n}_{g_{mki}}^i\right) \) for some *k*, all 1 ≤ *m* ≤ *ξ*_{ k }, 1 ≤ *g*_{ mki } ≤ *ς* and *ξ*_{ k } ≤ *ς*.

then *C*^{ i } is removed. In this case, *C*^{ i } is said to be ‘included’ in *C*^{ k } from the point of view of perturbations. For instance, we define that a control set {A = 0} includes a control set {(A, B) = (0,1)} from the point of view of perturbations and that the control set {(A, B) = (0,1)} is said to be an included control set. Therefore, the control set {(A, B) = (0,1)} can be removed. Such a rule is referred to as the first removal rule for included control sets.

### The second removal rule for contradictory children sets

Let a candidate \( {C}^{i+1}=\left\{\left({N}_1^{i+1},\cdots, {N}_{\varsigma}^{i+1}\right)=\left({n}_1^{i+1},\cdots, {n}_{\varsigma}^{i+1}\right)\right\} \)for a minimal control set in the (*i* + 1)^{ th } level have parent or ancestor sets \( {C}_1^k,\cdots, {C}_{\gamma_k}^k \) in the *k*^{ th } level for 0 ≤ *k* ≤ *i* and a positive integer *γ*_{ k }**,** where \( {C}_j^k=\left\{\left({N}_1^{k,j},\cdots, {N}_{\xi_{k,j}}^{k,j}\right)=\left({n}_1^{k,j},\cdots, {n}_{\xi_{k,j}}^{k,j}\right)\right\} \) for 1 ≤ *j* ≤ *γ*_{ k } and \( {n}_{\mathrm{\ell}}^{k,j}\in \left\{0,1\right\}\left(1\le \mathrm{\ell}\le {\xi}_{k,j}\right) \). If *C*^{i + 1} satisfies

\( {N}_m^{i+1}={N}_{g_{mikj}}^{k,j} \) and \( {n}_m^{i+1}\ne {n}_{g_{mikj}}^{k,j} \) for some *m*, *k*, *j* and *g*_{ mikj },

then the candidate *C*^{i + 1} is removed. In this case *C*^{i + 1} in the (*i* + 1)^{ th } level is said to be a contradictory child set. For example, as in Fig. 2d, the control set {(C,F) = (1,0)} in the 3rd level has an ancestor set {C = 0} in the 1st level. Applying the child set {(C,F) = (1,0)} into the given network, we can obtain the parent set {B = 1} but cannot the ancestor set {C = 0}, so that we cannot obtain the desired phenotype value. Therefore, we need to remove the minimal control set {(C,F) = (1,0)} in the 3rd level, where the removal is referred to as the second removal rule for ‘contradictory’ children sets.

### Algorithm for the converging tree

The converging tree of a given network can be hierarchically constructed as follows.

Step 0. Determine the desired steady state values of the phenotype nodes. Let *P*_{1}, ⋯, *P*_{ℓ} (1 ≤ ℓ) be some of the phenotype nodes of a given Boolean network such that the phenotype nodes have desired steady state values (*P*_{1}, ⋯, *P*_{ℓ}) = (*d*_{1}, ⋯, d_{ℓ}) for a positive integer ℓ and constants *d*_{ υ } (1 ≤ *υ* ≤ ℓ) of values 0 or 1. Then the 0th level of the converging tree is the set {(*P*_{1}, ⋯, *P*_{ℓ}) = (*d*_{1}, ⋯, d_{ℓ})}, which is called the root set of the converging tree.

Step 1. Let the control sets *C*_{1}, ⋯, *C*_{ k } be non-leaf sets of the *i*^{ th } level of the converging tree for a nonnegative integer *i* and a positive integer *k*, where \( {C}_j=\left\{\left({N}_1^j,\cdots, {N}_{\gamma_j}^j\right)=\left({n}_1^j,\cdots, {n}_{\gamma_j}^j\right)\right\} \) for 1 ≤ *j* ≤ *k*, a positive integer *γ*_{ j } and \( {n}_m^j\in \left\{0,1\right\}\left(1\le m\le {\gamma}_j\right) \). Here \( {N}_m^j \) is a control node of the control set *C*_{ j } with the fixed state value \( {N}_m^j={n}_m^j \).

Step 1–1. Find children sets \( {C}_j^{child} \) of each parent set *C*_{ j } in the *i*^{ th } level.

Before applying our algorithm, we can prepare the collection of solutions of each Boolean equation \( {n}^i={f}_i\left({x}_1^i,\cdots, {x}_{\alpha_i}^i\right) \), *n*^{ i } ∈ {0, 1}, where \( {x}^i={f}_i\left({x}_1^i,\cdots, {x}_{\alpha_i}^i\right) \) is the update rule for *x*^{ i }. Then the solutions of the system can be obtained by the Cartesian product of sets of solutions of each Boolean equation.

Step 1–1-1. Find included sets among \( {C}_j^{child} \) by applying the first removal rule.

Case 1-1. All \( {C}_j^{child} \) are included.

If *j* < *k*, replace *j* with *j* + 1 and go to Step 1–1.

Otherwise, go to Step2.

Case 1-2. Some of \( {C}_j^{child} \) are not included.

Denote by \( {C}_{j,\mathrm{I}}^{child} \) the children sets that are not included and go to Step 1–1-2.

Step 1–1-2. Find contradictory children sets among \( {C}_{j,\mathrm{I}}^{child} \) by applying the second removal rule.

Case 2-1. All \( {C}_{j,\mathrm{I}}^{child} \)are contradictory.

If *j* < *k*, replace *j* with *j* + 1 and go to Step 1–1.

Otherwise, go to Step2.

Case 2-2. Some of \( {C}_{j,\mathrm{I}}^{child} \)are not contradictory.

Denote by \( {C}_{j,\mathrm{I},\mathrm{I}\mathrm{I}}^{child} \) the children sets that are not contradictory and go to Step 1–1-3.

Step 1–1-3. Find control sets in the *τ*^{ th } level (1 ≤ *τ* ≤ *i* + 1) which are included in one of \( {C}_{j,\mathrm{I},\mathrm{I}\mathrm{I}}^{child} \). If there exists an included control set, go to Case 3. Otherwise, replace *j* with *j* + 1 and go to Step 1–1 when *j* < *k*, and go to Step2 when *j* = *k*.

Case 3-1. The included control sets in the *τ*^{ th } level are not parent or ancestors of any one of \( {C}_{j,\mathrm{I},\mathrm{I}\mathrm{I}}^{child} \).

Remove the included control sets and their descendant sets. If there exists a next parent *C*_{ υ }(*j* + 1 ≤ *υ* ≤ *k*), repeat Step 1–1 with *C*_{ υ } instead of *C*_{ j }. Otherwise, go to Step 2 .

Case 3-2. Other case.

Let the *L*^{ th } level (0 ≤ *L* ≤ *i*) be the lowest level among the levels in which there exists a parent or ancestor set included in one of \( {C}_{j,\mathrm{I},\mathrm{I}\mathrm{I}}^{child} \). Denote the included control sets in the *L*^{ th } level by *C*_{ L }. Replace *C*_{ L } with the children sets of \( {C}_{j,\mathrm{I},\mathrm{I}\mathrm{I}}^{child} \) which include *C*_{ L } and remove the *ξ*^{ th } levels (*L* < *ξ*) as well as included control sets in the *L*^{ th } level. Repeat Step 1–1 from the *L*^{ th } level instead of the *i*^{ th } level.

Step 2. Let the control sets *D*_{1}, ⋯, *D*_{ θ } consist of the (*i* + 1)^{ th }level of the converging tree for a positive integer *θ*. If all *D*_{1}, ⋯, *D*_{ θ } are leaf sets, then construction of the converging tree is completed and PCK is the collection of all minimal control sets in the converging tree. Otherwise, repeat Step 1 for non-leaf sets in the (*i* + 1)^{ th } level instead of *C*_{1}, ⋯, *C*_{ k }.

Remark. The algorithm for the converging tree can be summarized as two parts: The first part is to find out the layered nodes that have influence on the desired phenotype values. The second part is to apply the two removal rules to eliminate ‘included’ or ‘contradictory’ control sets. As a result, we can obtain the minimal control sets in each level of the converging tree. Before applying our algorithm, we can prepare the collection of solutions of each Boolean equation \( {n}^i={f}_i\left({x}_1^i,\cdots, {x}_{\alpha_i}^i\right) \),*n*^{ i } ∈ {0, 1}, where \( {x}^i={f}_i\left({x}_1^i,\cdots, {x}_{\alpha_i}^i\right) \) is the update rule for *x*^{ i }. Then the solutions of system of Boolean equations can be obtained by the Cartesian product of sets of solutions of each Boolean equation. So, the computational complexity is as follows.

*p*

_{ i }(1 ≤

*i*≤

*η*

_{ pi }) be a parent set at the

*i*

^{ th }level, where the nodes of the set

*p*

_{ i }are denoted by (

*pn*)

_{i, j}(1 ≤

*j*≤

*η*

_{ pni }). Since we can find out children nodes (

*cn*)

_{i, j, k}(1 ≤

*k*≤

*η*

_{ cnij }) of the parent node (

*pn*)

_{i, j}without solving Boolean equations by using the collection prepared a priori, the number of children sets of the parent set

*p*

_{ i }is \( {\prod}_{j=1}^{\eta_{pni}}{\eta}_{cnij} \) and the number of children sets of all the parent sets in the

*i*

^{ th }level is \( {\eta}_{pi}\times {\prod}_{j=1}^{\eta_{pni}}{\eta}_{cnij} \). Hence, the computational complexity for finding out children sets of all the parent sets in the

*i*

^{ th }level is

If the number of children sets is not increased owing to the two removal rules, the computational complexity would be not too high, as shown in Additional file 10. However, if the number of children sets becomes enormously increased since the children sets are not eliminated by the two removal rules, then the complexity might become high. As a result, if we cannot complete the construction of converging tree, we need to modify our algorithm such that the number of children sets is restricted.

## Notes

### Acknowledgments

Not applicable.

## Funding

This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea Government, the Ministry of Science and ICT (2017R1A2A1A17069642, 2015M3A9A7067220, and 2013M3A9A7046303). It was also supported by the KAIST Future Systems Healthcare Project from the Ministry of Science, ICT & Future Planning, the KUSTAR-KAIST Institute, Korea under the R&D program supervised by the KAIST, and the grant of the Korean Health Technology R&D Project, Ministry of Health & Welfare, Republic of Korea (HI13C2162).

## Availability of data and materials

The data supporting the results of this article are included and cited within the article and its additional files.

### Authors’ contributions

Conceived and designed the experiments: SMC and KHC. Performed the experiments: SMC, BB and JIJ. Analyzed the data: SMC, BB, JIJ and KHC. Wrote the paper: SMC and KHC. All authors read and approved the final manuscript.

## Ethics approval and consent to participate

Not applicable.

## Consent for publication

Not applicable.

## Competing interests

The authors declare that they have no competing interests.

## Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Supplementary material

## References

- 1.Milenkovic T, Memisevic V, Bonato A, Przulj N. Dominating biological networks. PLoS One. 2011;6(8):e23016.CrossRefPubMedPubMedCentralGoogle Scholar
- 2.Nacher JC, Akutsu T. Dominating scale-free networks with variable scaling exponent: heterogeneous networks are not difficult to control. New J Phys. 2012;14(7):073005.CrossRefGoogle Scholar
- 3.Wuchty S. Controllability in protein interaction networks. P Natl Acad Sci USA. 2014;111(19):7156–60.CrossRefGoogle Scholar
- 4.Khuri S, Essentiality WS. Centrality in protein interaction networks revisited. Bmc Bioinformatics. 2015;16:109.CrossRefPubMedPubMedCentralGoogle Scholar
- 5.Nacher JC, Akutsu T. Minimum dominating set-based methods for analyzing biological networks. Methods. 2016;102:57–63.CrossRefPubMedGoogle Scholar
- 6.Zhang XF, Ou-Yang L, Zhu Y, Wu MY, Dai DQ. Determining minimum set of driver nodes in protein-protein interaction networks. Bmc Bioinformatics. 2015;16:146.CrossRefPubMedPubMedCentralGoogle Scholar
- 7.Wang H, Zheng H, Browne F, Wang C, editors. Minimum dominating sets in cell cycle specific protein interaction networks. 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2014. p. 2–5.Google Scholar
- 8.Liu YY, Slotine JJ, Barabasi AL. Controllability of complex networks. Nature. 2011;473(7346):167–73.CrossRefPubMedGoogle Scholar
- 9.Gao J, Liu YY, D'Souza RM, Barabasi AL. Target control of complex networks. Nat Commun. 2014;5:5415.CrossRefPubMedPubMedCentralGoogle Scholar
- 10.Wu FX, Wu L, Wang J, Liu J, Chen L. Transittability of complex networks and its applications to regulatory biomolecular networks. Sci Rep-Uk. 2014;4:4819.CrossRefGoogle Scholar
- 11.Fiedler B, Mochizuki A, Kurosawa G, Saito D. Dynamics and control at feedback vertex sets. I: informative and determining nodes in regulatory networks. J Dyn Differ Equ. 2013;25(3):563–604.CrossRefGoogle Scholar
- 12.Mochizuki A, Fiedler B, Kurosawa G, Saito D. Dynamics and control at feedback vertex sets. II: a faithful monitor to determine the diversity of molecular activities in regulatory networks. J Theor Biol. 2013;335:130–46.CrossRefPubMedGoogle Scholar
- 13.Gates AJ, Rocha LM. Control of complex networks requires both structure and dynamics. Sci Rep-Uk. 2016;6:24456.CrossRefGoogle Scholar
- 14.Cornelius SP, Kath WL, Motter AE. Realistic control of network dynamics. Nat Commun. 2013;4:1942.CrossRefPubMedPubMedCentralGoogle Scholar
- 15.Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. 1969;22(3):437–67.CrossRefPubMedGoogle Scholar
- 16.Glass L, Kauffman SA. The logical analysis of continuous, non-linear biochemical control networks. J Theor Biol. 1973;39(1):103–29.CrossRefPubMedGoogle Scholar
- 17.Wang RS, Saadatpour A, Albert R. Boolean modeling in systems biology: an overview of methodology and applications. Phys Biol. 2012;9(5):055001.CrossRefPubMedGoogle Scholar
- 18.Kim J, Park SM, Cho KH. Discovery of a kernel for controlling biomolecular regulatory networks. Sci Rep-Uk. 2013;3:2223.CrossRefGoogle Scholar
- 19.Zanudo JG, Albert R. Cell fate reprogramming by control of intracellular network dynamics. PLoS Comput Biol. 2015;11(4):e1004193.CrossRefPubMedPubMedCentralGoogle Scholar
- 20.Wang RS, Albert R. Elementary signaling modes predict the essentiality of signal transduction network components. BMC Syst Biol. 2011;5:44.CrossRefPubMedPubMedCentralGoogle Scholar
- 21.Akutsu T, Hayashida M, Ching WK, Ng MK. Control of Boolean networks: hardness results and algorithms for tree structured networks. J Theor Biol. 2007;244(4):670–9.CrossRefPubMedGoogle Scholar
- 22.Akutsu T, Zhao Y, Hayashida M, Tamura T. Integer programming-based approach to attractor detection and control of Boolean networks. Ieice T Inf Syst. 2012;E95d(12):2960–70.CrossRefGoogle Scholar
- 23.Cheng DZ, Qi HS. Controllability and observability of Boolean control networks. Automatica. 2009;45(7):1659–67.CrossRefGoogle Scholar
- 24.Murrugarra D, Veliz-Cuba A, Aguilar B, Laubenbacher R. Identification of control targets in Boolean molecular network models via computational algebra. BMC Syst Biol. 2016;10:94.CrossRefPubMedPubMedCentralGoogle Scholar
- 25.Murrugarra D, Dimitrova ES. Molecular network control through boolean canalization. EURASIP J Bioinform Syst Biol. 2015;2015(1):9.CrossRefPubMedPubMedCentralGoogle Scholar
- 26.Grieco L, Calzone L, Bernard-Pierrot I, Radvanyi F, Kahn-Perles B, Thieffry D. Integrative modelling of the influence of MAPK network on cancer cell fate decision. PLoS Comput Biol. 2013;9(10):e1003286.CrossRefPubMedPubMedCentralGoogle Scholar
- 27.Fumia HF, Martins ML. Boolean network model for cancer pathways: predicting carcinogenesis and targeted therapy outcomes. PLoS One. 2013;8(7):e69008.CrossRefPubMedPubMedCentralGoogle Scholar
- 28.Lord CJ, Ashworth A. Mechanisms of resistance to therapies targeting BRCA-mutant cancers. Nat Med. 2013;19(11):1381–8.CrossRefPubMedGoogle Scholar
- 29.Luo J, Solimini NL, Elledge SJ. Principles of cancer therapy: oncogene and non-oncogene addiction. Cell. 2009;136(5):823–37.CrossRefPubMedPubMedCentralGoogle Scholar
- 30.Wang T, Yu H, Hughes NW, Liu B, Kendirli A, Klein K, et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with Oncogenic Ras. Cell. 2017;168(5):890–903. e15CrossRefPubMedPubMedCentralGoogle Scholar
- 31.Kaelin WG Jr. The concept of synthetic lethality in the context of anticancer therapy. Nat Rev Cancer. 2005;5(9):689–98.CrossRefPubMedGoogle Scholar
- 32.Wilson WR, Hay MP. Targeting hypoxia in cancer therapy. Nat Rev Cancer. 2011;11(6):393–410.CrossRefPubMedGoogle Scholar
- 33.Saadatpour A, Albert R, Reluga TC. A reduction method for Boolean network models proven to conserve attractors. SIAM J Appl Dyn Syst. 2013;12(4):1997–2011.CrossRefGoogle Scholar
- 34.Steinway SN, Zanudo JGT, Michel PJ, Feith DJ, Loughran TP, Albert R. Combinatorial interventions inhibit TGFbeta-driven epithelial-to-mesenchymal transition and support hybrid cellular phenotypes. NPJ Syst Biol Appl. 2015;1:15014.CrossRefPubMedPubMedCentralGoogle Scholar
- 35.Cahan P, Li H, Morris SA, Lummertz da Rocha E, Daley GQ, Collins JJ. CellNet: network biology applied to stem cell engineering. Cell. 2014;158(4):903–15.CrossRefPubMedPubMedCentralGoogle Scholar
- 36.Cho KH, Joo JI, Shin D, Kim D, Park SM. The reverse control of irreversible biological processes. Wiley Interdiscip Rev Syst Biol Med. 2016;8(5):366–77.CrossRefPubMedPubMedCentralGoogle Scholar

## Copyright information

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.