Critical nodes identification in complex systems

To control complex systems with limited resources, critical nodes need to be identified for protection or removal. Loss of critical nodes decreases or minimizes a system’s ability to diffuse entities such as information, goods, or diseases. We design three metrics to assess system homogeneity, diffusion speed, and diffusion scale, and investigate their performance over complex systems. Six algorithms using the three metrics to identify critical nodes are examined. The three nonpolynomial-time algorithms identify the most critical nodes (global optimum). The three polynomial-time algorithms identify critical nodes step by step (local optima), but do not guarantee the global optimum. The three polynomial-time algorithms are compared to other critical nodes identification algorithms and have better performance; they may be applied to practical problems to efficiently identify critical nodes in complex systems.


Introduction
Vulnerabilities to natural disasters and terrorist violence require a better understanding of complex systems and new methods of protection and prevention [4]. The objective of this research is to apply informative performance metrics to identify critical nodes in complex systems such as smart grids, social networks, information systems, and criminal organizations. For example, to protect an electrical power grid, the most vulnerable transformers and stations, whose failures would cause large-scale blackouts (hence they are also critical nodes), need to be provided with backup capacity or enhanced security. For another example, to ensure public safety and security, key members of a criminal or terrorist organization need to be neutralized (e.g., captured or isolated).
Ideally, we would like to protect all nodes in an electrical power grid or an information system. In reality, resources are limited; protecting all nodes in a large system is not affordable, and the time to respond to a criminal activity or terrorist attack is short. It is necessary to apply limited resources to the most critical nodes to maximize the effect of either protecting a system or destroying a criminal or terrorist organization. Previous research on critical nodes identification (CNI) suggested a few metrics, but did not indicate why and how they might be useful in practical problems. Some metrics and related CNI algorithms (e.g., [23]) developed earlier can only be applied to systems with special structures. In addition, previous research predominantly studied different CNI algorithms in order to maximize or minimize performance metrics. The characteristics of performance metrics were not studied or used to identify critical nodes.
This research investigates a set of informative performance metrics and applies them to identify critical nodes in complex systems. The main contributions of this research include: (a) development of three new system performance metrics that describe desired properties of complex systems; (b) design of algorithms that use the metrics to identify critical nodes; and (c) analysis of characteristics of the metrics and algorithm complexity.

Background
Nodes in a complex system represent machines, equipment, workstations, computers, generators, control units, operators, and other components, each of which is modeled as a separate entity. Edges (links) between nodes represent the flow of entities including products, services, and information. Nodes are linked directly or indirectly. If a node i is linked to a node j directly, there is an edge between the two nodes. When two nodes i and j are linked indirectly, there is at least one path between i and j through other nodes so that entities can be diffused from i to j and/or from j to i. When two nodes are not linked, there is no path between the two nodes. Entities may flow along both directions between two nodes connected by an undirected edge [10,11,13]. Directed edges are also called arcs; entities may flow along only one direction between two nodes connected by an arc.
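These definitions map directly onto an adjacency-set representation (a minimal sketch; node labels and helper names are illustrative, not from the paper):

```python
from collections import deque

def add_undirected_edge(adj, i, j):
    # An undirected edge lets entities flow in both directions.
    adj.setdefault(i, set()).add(j)
    adj.setdefault(j, set()).add(i)

def linked(adj, i, j):
    # Two nodes are linked (directly or indirectly) if a path exists.
    seen, queue = {i}, deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

adj = {}
add_undirected_edge(adj, 1, 2)   # 1 and 2 are linked directly
add_undirected_edge(adj, 2, 3)   # 1 and 3 are linked indirectly via 2
adj.setdefault(4, set())         # 4 is not linked to the others
```

A directed system would store each arc in only one direction, so `linked` would no longer be symmetric.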
There have been many studies on CNI, and complex networks and complex systems in general [29]. Most recently, Shen and Smith [23] used dynamic programming algorithms to identify critical nodes in trees and series-parallel systems. The performance metrics of interest were: (a) the number of components, which was to be maximized; and (b) the largest component order (i.e., the number of nodes in a component), which was to be minimized. For systems which can be modeled as trees and series-parallel graphs, the complexity of the dynamic programming algorithms was at most O(n^5 log n). These algorithms may be extended to generalized connected systems, which are interpreted as k-hole systems. The complexity of the algorithms, however, increases exponentially as k increases. The algorithms are not applicable to systems with disconnected components.
Analysis of system vulnerability is related to CNI. Dinh et al. [16] used dynamic programming to identify critical nodes and links. The algorithms developed were approximation algorithms with a ratio of at most O(log^1.5 n). The performance metric of interest was pairwise connectivity. The pairwise connectivity between two nodes is one if this pair is connected and zero otherwise. In an undirected system, a pair of nodes is connected if and only if there exists at least one path (in both directions) between the two nodes. The system pairwise connectivity is the sum of the pairwise connectivity over all pairs of nodes. For a given level of degradation in pairwise connectivity, it was shown that to find a minimum set of nodes or edges (called a β-disruptor), whose removal causes the specified level of degradation, is an NP-complete problem for undirected systems.
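Pairwise connectivity as defined above can be computed from component orders, since a component of order k contributes k(k − 1)/2 connected pairs (a sketch; names are illustrative):

```python
from collections import deque

def component_orders(adj):
    # BFS from each unvisited node to find the order of every component.
    seen, orders = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, k = deque([s]), 0
        while queue:
            u = queue.popleft()
            k += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        orders.append(k)
    return orders

def pairwise_connectivity(adj):
    # Sum of pairwise connectivity over all node pairs:
    # one per connected pair, zero otherwise.
    return sum(k * (k - 1) // 2 for k in component_orders(adj))

# A chain 0-1-2 plus an isolated node 3: three connected pairs in total.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
```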
CNI was also studied in the context of system reliability [28]. A reliability metric was used to describe the average reliability between every pair of nodes in a system. An evolutionary algorithm was used to identify critical components (nodes), the removal of which aimed to minimize both the reliability metric and the cost of incapacitating links (multiple-objective optimization). The evolutionary algorithm provided good-quality solutions; its Pareto fronts (efficient frontiers) were close to the real Pareto optimal solutions.
In the study of the Internet and World Wide Web [1,7] and more generally complex systems such as airline routes, electric power grids, and disease propagation [8], the diameter of a system, which is the average length of the shortest paths between any two nodes in the system, was analyzed when nodes were removed according to the number of links they have; nodes with more links were removed first. Compared to removing randomly selected nodes, the link-based node removal increased the diameter faster, although the diameter does not always increase when a node is removed.
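The diameter in the sense used here (the average length of the shortest paths between node pairs) can be computed by breadth-first search from every node; a seven-node ring, where every node has the same degree, gives an average distance of exactly two (a sketch; names are illustrative):

```python
from collections import deque

def bfs_dist(adj, src):
    # Shortest-path (geodesic) distances from src to all reachable nodes.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def avg_shortest_path(adj):
    # Average geodesic distance over all ordered, connected node pairs.
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_dist(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

# A ring of 7 nodes: every node has degree 2; average distance is 2.
ring = {i: {(i - 1) % 7, (i + 1) % 7} for i in range(7)}
```

Recomputing this quantity after removing the highest-degree nodes reproduces the link-based removal experiment described above.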
Earlier studies on CNI focused on general systems and algorithm complexity. For a fixed system property (i.e., a value or a range of values for a performance metric), a node-deletion problem aims to find the minimum number of nodes which must be deleted from a given system so that the result satisfies the property. It was shown that the node-deletion problem for a system property is NP-hard or NP-complete if the property is nontrivial and hereditary [19,25]. A system property is nontrivial if it is true for infinitely many systems and false for infinitely many systems. A property is hereditary if, for any system satisfying the property, all node-induced subsystems also satisfy the property.
Results of recent studies on CNI validate the node-deletion problem analyzed more than 30 years ago. The largest component order in a system [23] is nontrivial and hereditary. Testing (certificate-checking) for the largest component order can be performed in polynomial time. For instance, a depth-first algorithm such as Tarjan's algorithm may be used to identify the largest component order with a complexity of O(nodes + edges). A system with n nodes has at most n(n − 1)/2 edges; the certificate-checking is therefore bounded by O(n^2). When the largest component order is the performance metric of interest, to identify the most critical nodes, whose removal minimizes the largest component order, is NP-complete according to Lewis and Yannakakis [19]. Similarly, the system pairwise connectivity [16] is nontrivial and hereditary. Tarjan's algorithm may be used to find the pairwise connectivity between any pair of nodes and further calculate the system pairwise connectivity. The CNI problem is therefore NP-complete when the system pairwise connectivity is of concern. Not all system performance metrics are hereditary, however. The system diameter [1] is not hereditary. Deleting a node may decrease the diameter although it increases the diameter most of the time. The CNI problem might belong to class P if the objective is to maximize the system diameter.
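The certificate-checking bound quoted above can be seen in code: one linear graph traversal finds the largest component order, touching each node and edge a constant number of times (a sketch; names are illustrative):

```python
def largest_component_order(adj):
    # Iterative DFS; each node and edge is visited O(1) times,
    # so the running time is O(nodes + edges).
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        stack, k = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            k += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, k)
    return best

# Components {0, 1, 2} and {3, 4}: the largest component order is 3.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
```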
Some other recent work is related to CNI. In physics and engineering, research focused on developing mathematical models and analyzing system properties [2,12,22]. In social science, many measures including centrality [17], complement [14], and reciprocal [5] were developed to describe system properties. In industrial process monitoring and control, where multiple processes form a complex manufacturing system, data-driven approaches were applied for fault prognostics and diagnostics [26,27]. Borgatti [5] defined two types of problems to assess the importance of nodes. The Key Player Problem-Positive studied the extent to which a node is embedded in the system. The Key Player Problem-Negative studied the amount of reduction in cohesiveness of a system after elimination of a node. Dynamic Network Analysis (DNA) [4,9,20,24] was developed to model and analyze complex systems. DNA was successfully applied to terrorist networks and used to identify critical nodes through simulation experiments.
In summary, previous research on CNI validated certain analytical results and alluded to their applications to practical problems, but failed to design metrics that are meaningful for practical problems. There are two types of CNI problems: the optimization problem and the recognition problem. In the optimization problem, given limited resources (i.e., a certain number of nodes that can be protected or removed), which nodes' removal can minimize or maximize a performance metric? In the recognition problem, given a desired property (e.g., a performance metric being at least, or at most, a given value), what is the minimum number of nodes that need to be protected or removed to satisfy the property? These two types of problems are equivalent in terms of algorithm complexity. This research is focused on the optimization problem, namely, to identify the most critical nodes. What was missing in previous research, which is the focus of this study, lies in two areas: (a) Which properties of a complex system need to be measured? A set of performance metrics must be designed to assess the impact of removing a portion of nodes from a complex system. Previous research suggested a few metrics, but did not indicate why and how they might be useful in practical problems. In addition, some metrics and related CNI algorithms (e.g., [23]) can only be applied to systems with special structures. This research designs three metrics that indicate a complex system's speed and scale of diffusing entities, which are useful in many real-world applications; and (b) What are the characteristics of performance metrics?
Previous research predominantly studied different CNI algorithms in order to maximize or minimize performance metrics. The characteristics of performance metrics affect the efficiency and effectiveness of CNI algorithms. Moreover, performance metrics themselves can be used to identify critical nodes. This research identifies the values for each of the three new performance metrics and their corresponding network structures of a complex system, which provide important insight into how these metrics may be used to identify critical nodes.

Problem definition
Let G(V, E) represent a system of n vertices (nodes) and s edges. n and s are the order and size of G(V, E), respectively. V is the set of all nodes in G(V, E): v_1, v_2, ..., v_n. E is the set of all edges in G(V, E): e_1, e_2, ..., e_s. A path from v_i to v_j is a set of nodes and edges that connect v_i and v_j, with which entities may be diffused from v_i to v_j. In an undirected G(V, E), any edge is bidirectional, or undirected; a path from v_i to v_j is also a path from v_j to v_i. A directed G(V, E) has at least one directed edge (arc). An arc from v_i to v_j connects v_i and v_j; entities may be diffused from v_i to v_j, but not from v_j to v_i. Since information is diffused in both directions in most social and information networks, this research focuses on CNI for undirected systems. G(V, E)'s ability to diffuse entities such as information, goods, or diseases [15] is reflected in homogeneity, diffusion speed, and diffusion scale, all of which are performance metrics that describe the properties of G(V, E) [3]. In a system with high homogeneity (i.e., a homogeneous network), relationships between nodes are the same or similar. In a network with low homogeneity (i.e., a heterogeneous network), relationships between nodes are much different. Diffusion speed indicates how fast entities are diffused between nodes. Diffusion scale indicates how many nodes in G(V, E) diffuse (or receive) entities to (or from) other nodes. This research investigates using performance metrics to identify critical nodes in G(V, E); removal of critical nodes minimizes homogeneity, diffusion speed, and/or diffusion scale.
When resources are limited, CNI helps control systems in order to, for example, (a) minimize the possibility and scale of criminal organizations or terrorist attacks by neutralizing the most critical criminals or terrorists; (b) maximize a computer network's resilience to incidents (e.g., cyber-attacks) and accidents (e.g., random computer failures) by protecting the most critical nodes such as routers and servers; and (c) minimize disruptions in sensor or logistics networks by providing backup capacity to the most critical nodes.

Performance metrics
Homogeneity: normalized expected geodesic distance

Nodes v_i and v_j are neighbors if an edge connects v_i and v_j. The geodesic distance, d(v_i, v_j), is the distance of the shortest path(s) between v_i and v_j. If v_i and v_j are neighbors, d(v_i, v_j) = 1. Suppose v_i diffuses entities to other nodes in a system with a total of n nodes; the expected geodesic distance (EGD) from v_i to other nodes is

EGD_{v_i} = (1/(n − 1)) Σ_{j ≠ i} d(v_i, v_j).

Further, suppose all nodes, v_1, v_2, ..., v_n, have equal probability, 1/n, of being the source node that begins the diffusion of entities to other nodes; the EGD of the network is

EGD = (1/n) Σ_{i=1}^{n} EGD_{v_i}.

At each step of diffusion, a node diffuses entities to all its neighbors. The EGD indicates the expected number of steps it takes to diffuse entities from the source node to other nodes. If diffusion time is proportional to the number of steps, the larger the EGD is, the longer is the total diffusion time and the lower is the diffusion speed. Figure 1 depicts two systems, 1(a) and 1(b). Both systems have the same EGD of two (i.e., on average, it takes the same amount of time, two steps, to diffuse entities). The EGD indirectly measures a network's diffusion speed and does not take into consideration the order of the network, n. For instance, entities can be diffused among a total of five nodes in 1(a) (n = 5), whereas they can be diffused among a total of seven nodes in 1(b) (n = 7). Moreover, 1(b) is more homogeneous than 1(a). The largest geodesic distance (LGD) of 1(b) is three, whereas the LGD of 1(a) is four (the distance of the shortest path between nodes 1 and 5). System 1(a) is less homogeneous than 1(b) because there is a larger difference between the smallest geodesic distance and the LGD in 1(a) than in 1(b).
To accurately measure a system's homogeneity, the normalized expected geodesic distance (NEGD) is defined in Eq. (1):

NEGD = EGD / LGD,   (1)

where LGD is the largest geodesic distance in G(V, E), the geodesic distance of an unconnected pair is taken as zero, and 0/0 is defined as 0.

Fig. 1 Two systems with the same EGD: (a) a system of five nodes; (b) a system of seven nodes

The larger the NEGD is, the more homogeneous is a system. NEGD = 1 for a clique. For a fully disconnected system, it is defined that NEGD = 0. Although such a system is homogeneous, it is the most desirable case when the goal is to destroy a system and the worst case when the goal is to protect a system. Let θ(v_i) be the degree of v_i, the number of edges that connect v_i and its neighbors.
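The metric can be sketched in code, assuming the definition NEGD = EGD/LGD with the distance of an unconnected pair taken as zero and 0/0 defined as 0 (an assumption reconstructed from the values quoted in the text, e.g. NEGD = 5/9 for a chain of four nodes and NEGD = 1 for a clique; helper names are illustrative):

```python
from collections import deque

def bfs_dist(adj, src):
    # Geodesic distances from src to all reachable nodes.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def negd(adj):
    # NEGD = EGD / LGD; unconnected pairs contribute distance 0,
    # and 0/0 is defined as 0 (fully disconnected system).
    n = len(adj)
    if n < 2:
        return 0.0
    total, lgd = 0, 0
    for s in adj:
        for t, d in bfs_dist(adj, s).items():
            if t != s:
                total += d
                lgd = max(lgd, d)
    if lgd == 0:
        return 0.0
    egd = total / (n * (n - 1))   # mean over all ordered pairs
    return egd / lgd

# A chain of four nodes: EGD = 5/3, LGD = 3, so NEGD = 5/9.
chain4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```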
(i) LGD = n − 1. If n = 3, then n_f = 1, which is contradictory to n_f = 0; therefore n ≥ 4. G(V, E) comprises a chain of n nodes (an example is shown in Fig. 1a), and NEGD = 1/3 + 2/(3(n − 1)). NEGD is strictly monotonically decreasing in n. When n = 4, NEGD = 5/9. Since lim_{n→∞} NEGD = 1/3, it follows that 1/3 < NEGD ≤ 5/9 for n ≥ 4. Table 1 summarizes the NEGD of G(V, E) with a total of n nodes and n_f fully connected nodes; for LGD = 0, 0/0 is defined as 0. (ii) 1 ≤ LGD ≤ n − 2. There exists at least one chain of nodes, say V_1; the distance of the shortest path between the two end nodes of the chain is LGD.
The desired system property in the recognition problem is NEGD ≤ c, where 0 ≤ c ≤ 1. This property is nontrivial since it is true for infinitely many systems and false for infinitely many systems for a given c. NEGD ≤ 1 is hereditary since the maximum of NEGD is one. NEGD ≤ 0 holds if and only if G(V, E) does not have any edge, in which case no subsystem of G(V, E) has any edge either; NEGD ≤ 0 is therefore also hereditary. When 0 < c < 1, NEGD ≤ c is nonhereditary. The proof is as follows.
For any value of c, 0 < c < 1, G(V, E) may be constructed such that it includes only one edge and NEGD = 2/(n(n − 1)) ≤ c < 1, n ≥ 3. A subsystem of G(V, E), which has two nodes and the one edge that connects them, may be produced by removing all nodes without any edge. For this subsystem, NEGD = 1 > c, so the property NEGD ≤ c does not hold for the subsystem. This concludes the proof of Proposition 1.
(a) If G(V, E) is a clique (i.e., n_f = n), NEGD = 1. A clique is the most homogeneous among all system structures. NEGD remains one regardless of which nodes are removed from G(V, E) (hereditary). All nodes in G(V, E) are equally critical; (b) If G(V, E) has at least one node that is fully connected but is not a clique (i.e., 1 ≤ n_f ≤ n − 2), NEGD is greater than 0.5 and less than one; (c) If G(V, E) does not have any node that is fully connected (i.e., n_f = 0), but comprises a chain of n nodes (i.e., LGD = n − 1), NEGD is described by a closed form, NEGD = 1/3 + 2/(3(n − 1)). NEGD is strictly monotonically decreasing as n increases; as a chain becomes longer, it is less homogeneous; (d) If G(V, E) does not have any node that is fully connected (i.e., n_f = 0), and does not comprise a chain of n nodes (i.e., 1 ≤ LGD ≤ n − 2), NEGD is greater than zero and less than one. Most real-world networks follow this structure (n_f = 0 and 1 ≤ LGD ≤ n − 2), and NEGD can be used to describe their homogeneity and help identify critical nodes; (e) If G(V, E) does not have any edge (i.e., LGD = 0), NEGD = 0; and (f) Compared to the NEGD (1/3 < NEGD ≤ 1/2) of a G(V, E) (n ≥ 5) that comprises a chain, the NEGD (1/2 < NEGD < 1) of a G(V, E) that has at least one fully connected node is higher. Systems with at least one fully connected node are more homogeneous than those with the chain structure.

Diffusion speed: normalized expected minimum speed

Let LGD_{v_i} be the largest geodesic distance of the component to which v_i belongs; LGD_{v_i} is the maximum number of steps required for all nodes in the component to receive entities diffused from a randomly selected node in the component. Let n_{v_i} be the total number of nodes in the component to which v_i belongs; n_{v_i} is the order of the component. n_{v_i} − 1 is the total number of nodes that receive entities diffused from v_i.
(n_{v_i} − 1)/LGD_{v_i} is the expected minimum number of nodes that receive entities diffused from v_i at each step.
(n_{v_i} − 1)/LGD_{v_i} is the minimum diffusion speed of the component to which v_i belongs. All nodes in the same component have the same minimum diffusion speed.
(1/n) Σ_{i=1}^{n} (n_{v_i} − 1)/LGD_{v_i} is the expected minimum speed, which indicates the expected minimum number of nodes in G(V, E) that receive entities at each step of diffusion. The maximum value of the expected minimum speed is n − 1, which is attained by a clique. To compare systems of different orders, the normalized expected minimum speed (NEMS) is defined in Eq. (2) for n > 1:

NEMS = (1/(n(n − 1))) Σ_{i=1}^{n} (n_{v_i} − 1)/LGD_{v_i},   (2)

where 0/0 is defined as 0 for isolated nodes. NEMS = 0 for a fully disconnected G(V, E) and NEMS = 1 for a clique. The larger the NEMS is, the higher is the diffusion speed of G(V, E). For the two systems in Fig. 1, NEMS = 0.250 for 1(a) and NEMS = 0.333 for 1(b); system 1(b) has a higher diffusion speed than 1(a).
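A sketch of this metric, assuming NEMS = (1/(n(n − 1))) Σ (n_{v_i} − 1)/LGD_{v_i} with LGD_{v_i} the largest geodesic distance of v_i's component and 0/0 defined as 0; the seven-node system of Fig. 1b is assumed here to be a ring, which reproduces the quoted EGD of two and NEMS of 0.333 (names are illustrative):

```python
from collections import deque

def bfs_dist(adj, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def components(adj):
    # Node sets of the connected components of an undirected graph.
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp = set(bfs_dist(adj, s))
        seen |= comp
        comps.append(comp)
    return comps

def nems(adj):
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for comp in components(adj):
        k = len(comp)
        if k == 1:
            continue                 # isolated node: 0/0 := 0
        # Component LGD: largest geodesic distance inside the component.
        lgd = max(d for s in comp for d in bfs_dist(adj, s).values())
        total += k * (k - 1) / lgd   # each node contributes (k-1)/lgd
    return total / (n * (n - 1))

chain5 = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}
ring7 = {i: {(i - 1) % 7, (i + 1) % 7} for i in range(7)}
```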
The desired system property in the recognition problem is NEMS ≤ c, where 0 ≤ c ≤ 1. This property is nontrivial since it is true for infinitely many systems and false for infinitely many systems for a given c. (a) If G(V, E) is a clique, LGD = 1 and NEMS = 1; (b) When G(V, E) has at least one node that is fully connected but is not a clique (i.e., 1 ≤ n_f ≤ n − 2), G(V, E) comprises a single component and LGD = 2; NEMS = 0.5. When G(V, E) does not have any node that is fully connected (i.e., n_f = 0), but comprises a chain of n nodes (i.e., LGD = n − 1), G(V, E) comprises a single component and NEMS = 1/(n − 1). In this case, NEMS is strictly monotonically decreasing as n increases; as a chain becomes longer, its diffusion speed decreases; (c) If G(V, E) does not have any node that is fully connected (i.e., n_f = 0), and does not comprise a chain of n nodes (i.e., 1 ≤ LGD ≤ n − 2), NEMS is greater than zero and less than one. NEMS approaches zero as n increases when G(V, E) comprises a chain of LGD + 1 nodes and n − (LGD + 1) fully disconnected nodes; in this case, NEMS = (LGD + 1)/(n(n − 1)), with 0/0 defined as 0 for the disconnected nodes. Since LGD is bounded by n − 2, lim_{n→∞} NEMS = 0; (d) Compared to the NEMS (NEMS ≤ 1/3) of a G(V, E) that comprises a chain, the NEMS (NEMS = 0.5 or 1) of a G(V, E) that has at least one fully connected node is higher. When G(V, E) represents the typical structure of most real-world systems (n_f = 0 and 1 ≤ LGD ≤ n − 2), 0 < NEMS < 1 and NEMS is a useful performance metric to describe how fast entities diffuse in G(V, E).

Diffusion scale: normalized largest component order
Suppose v_i is the source node that diffuses entities to other nodes in G(V, E); n_{v_i} − 1 is the number of nodes that receive entities. Suppose any v_i ∈ V might be the source node; max(n_{v_i}) − 1 is then the maximum number of nodes that receive entities if there is only one source node. max(n_{v_i}) is the largest component order in G(V, E). The normalized largest component order (NLCO) is defined in Eq. (3):

NLCO = (max(n_{v_i}) − 1)/(n − 1),   (3)

where 0/0 is defined as 0. The NLCO indicates the maximum scale of (a) terrorist or criminal activities by connected terrorists or criminals; and (b) networked computers, communication devices, and sensors.
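Computing the metric reduces to finding the largest component order (a sketch assuming NLCO = (max component order − 1)/(n − 1), with NLCO defined as zero for n ≤ 1; names are illustrative):

```python
def nlco(adj):
    # NLCO = (largest component order - 1) / (n - 1), with 0/0 := 0.
    n = len(adj)
    if n < 2:
        return 0.0
    seen, best = set(), 1
    for s in adj:
        if s in seen:
            continue
        stack, k = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            k += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, k)
    return (best - 1) / (n - 1)

# One edge among four nodes: largest component order 2, NLCO = 1/3.
adj = {0: {1}, 1: {0}, 2: set(), 3: set()}
```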

Proposition 3 Suppose G(V, E) has a total of n nodes and n_f (0 ≤ n_f ≤ n) fully connected nodes. Table 3 depicts the values of NLCO for these system structures. NLCO ≤ c, where 0 < c < 1, is a nonhereditary property of G(V, E).

Proof of Proposition 3 When n = 1, NLCO is defined to be zero. For any value of c, 0 < c < 1, G(V, E) can be constructed such that it includes only one edge and NLCO = 1/(n − 1) ≤ c < 1, n ≥ 3. A subsystem of G(V, E), which has two nodes and the one edge that connects them, may be produced by removing all nodes without any edge. For this subsystem, NLCO = 1 > c, so the property NLCO ≤ c does not hold for the subsystem. This concludes the proof of Proposition 3.

CNI algorithms and complexity
All three performance metrics may be applied to identify critical nodes. For example, the scale and complexity of terrorist attacks are related to the size of terrorist groups; smaller groups are less likely to launch coordinated large-scale attacks. The NLCO may be used to identify critical terrorists. For another example, to enhance information security, it is important to slow down the spread of viruses in a computer network. The NEMS may be used to identify vulnerable computers. For engineered systems such as the Internet and wireless sensor networks, the significance of using the performance metrics is twofold. First, the performance metrics may be applied to compare different system designs. For instance, the NEMS may be used to gauge the speed of information diffusion in a wireless sensor network and help select the best network designs. Second, the performance metrics may be applied to identify critical nodes. In a computer network, for instance, the NEGD may be used to identify critical computers whose removal results in a heterogeneous network, which is vulnerable to targeted attacks [1]. Propositions 1 through 3 summarize the performance of the three metrics over generalized systems, and enable the assessment and comparison of systems of different orders and sizes. Algorithms are needed to apply the three metrics to identify critical nodes.

Nonpolynomial-time CNI algorithms
Given that a maximum of m nodes may be removed from a system of n nodes (0 < m ≤ n), Algorithm A uses the NEGD to identify the most critical nodes (i.e., the global optimal solution) in the system.
The NEGD in Algorithm A may be replaced with NEMS or NLCO to identify the most critical nodes. Since any performance metric of the three may be substituted without changing the search itself, all three resulting algorithms examine every candidate set of at most m nodes and are therefore nonpolynomial-time.
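Algorithm A itself is not reproduced in this excerpt; the following brute-force sketch illustrates the idea of a global search over all removal sets of size at most m, here with NLCO as the metric (function and variable names are illustrative, and the exponential number of subsets is exactly why such algorithms are nonpolynomial):

```python
from itertools import combinations

def remove_nodes(adj, drop):
    # Node-induced subsystem after deleting the nodes in drop.
    return {u: {v for v in nbrs if v not in drop}
            for u, nbrs in adj.items() if u not in drop}

def nlco(adj):
    n = len(adj)
    if n < 2:
        return 0.0
    seen, best = set(), 1
    for s in adj:
        if s in seen:
            continue
        stack, k = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            k += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, k)
    return (best - 1) / (n - 1)

def exhaustive_cni(adj, m, metric):
    # Global optimum: try every removal set of size <= m.
    best_set, best_val = (), metric(adj)
    for r in range(1, m + 1):
        for subset in combinations(sorted(adj), r):
            val = metric(remove_nodes(adj, set(subset)))
            if val < best_val:
                best_set, best_val = subset, val
    return best_set, best_val

chain5 = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}
best_set, best_val = exhaustive_cni(chain5, 1, nlco)
```

On a five-node chain with m = 1, removing the middle node splits the chain into two components of order two.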

Polynomial-time CNI algorithms
To reduce complexity, Algorithm B uses the NEGD to identify critical nodes in a system step by step.
Algorithm B identifies the locally optimal solution at each step and removes the node whose removal minimizes the NEGD. This algorithm does not guarantee the global optimum, which is achieved by the nonpolynomial-time CNI algorithms (e.g., Algorithm A). Two algorithms similar to Algorithm B may be written by replacing the NEGD with NEMS or NLCO. Calculating NEGD, NEMS, or NLCO requires O(n^3) time. At step k, a performance metric is calculated for n + 1 − k different systems. Since there are m steps, the overall complexity of each of the three CNI algorithms with local optima is O(mn^4).
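The greedy, step-by-step scheme of Algorithm B can be sketched as follows (names are illustrative; NLCO is used as the metric here, and ties are broken by node order):

```python
def remove_nodes(adj, drop):
    # Node-induced subsystem after deleting the nodes in drop.
    return {u: {v for v in nbrs if v not in drop}
            for u, nbrs in adj.items() if u not in drop}

def nlco(adj):
    n = len(adj)
    if n < 2:
        return 0.0
    seen, best = set(), 1
    for s in adj:
        if s in seen:
            continue
        stack, k = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            k += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, k)
    return (best - 1) / (n - 1)

def greedy_cni(adj, m, metric):
    # At each of m steps, remove the single node that minimizes the metric.
    removed, current = [], adj
    for _ in range(m):
        if not current:
            break
        target = min(sorted(current),
                     key=lambda u: metric(remove_nodes(current, {u})))
        current = remove_nodes(current, {target})
        removed.append(target)
    return removed, current

chain5 = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}
removed, rest = greedy_cni(chain5, 1, nlco)
```

On the five-node chain, the first greedy step removes the middle node, which here coincides with the global optimum.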
These three algorithms are applied to the network of 9/11 terrorists [18] to identify the critical nodes. Each square (Fig. 2) represents a terrorist and each link represents communications between two terrorists. There are a total of 63 nodes and they form a connected network. Note that for coordinated attacks, there cannot be disconnected components. None of the 63 nodes is fully connected (i.e., n_f = 0) and the network is not a chain (i.e., LGD < n − 1 = 62); this reflects the structure of many criminal organizations and is designed for practical purposes such as secrecy. Figures 3, 4 and 5 show the results of applying Algorithm B to the 9/11 terrorist network. Detailed results are included in Tables 4, 5 and 6 in the Appendix. These results validate the three propositions. All three performance metrics are between zero and one. Figure 5 also shows that NLCO ≤ c, 0 < c < 1, is nonhereditary because the NLCO does not decrease monotonically and sometimes increases. Figures 3 and 4 indicate that both the NEGD and NEMS are close to zero after almost 30 nodes are removed or neutralized, which is approximately 50% of all terrorists. When NEGD is close to zero, most nodes are disconnected; when NEMS is close to zero, the diffusion speed is almost zero. The terrorist network cannot launch any large-scale attacks after about 30 nodes are removed using Algorithm B, which is an efficient, polynomial-time algorithm. Figure 5 shows that the NLCO reaches its minimum after about 30 nodes are removed and then increases slightly before it becomes zero. This indicates that the portion of nodes that form a connected group and may launch terrorist attacks is the smallest after 30 nodes are removed. Removing additional nodes does not further decrease the portion until almost all nodes are removed. Overall, the 9/11 terrorist network is destroyed or neutralized after about 30 critical nodes are removed using Algorithm B.

Comparative analysis of CNI algorithms
To further validate Algorithm B, three CNI algorithms, including degree-based node removal, betweenness-based node removal, and random node removal, are applied to the 9/11 terrorist network. These three algorithms have been widely used to identify critical nodes in complex systems. The degree-based node removal identifies critical nodes based on their degree [1,21]. The node degree, θ(v_i), is an indicator of v_i's connections with other nodes in a system. Nodes with a larger degree are more critical and are removed or neutralized first. The betweenness-based node removal identifies critical nodes based on their betweenness, which measures the frequency with which a node falls on the shortest paths connecting pairs of other nodes [17]. The betweenness of v_i may be written as Σ_{j<k; j,k≠i} g_{jk}(v_i)/g_{jk}, where g_{jk} is the number of shortest paths between v_j and v_k and g_{jk}(v_i) is the number of those paths that pass through v_i. Betweenness indicates the potential of a node in controlling communications in a system. Nodes with a larger betweenness are more critical and are removed or neutralized first. The random node removal randomly selects a node as the most critical node and removes it from the system. The random node removal is expected to have the worst performance in minimizing the three performance metrics. Figures 6, 7, and 8 compare the performance of the four CNI algorithms, Algorithm B, degree-based node removal, betweenness-based node removal, and random node removal, which are applied to the 9/11 terrorist network. Figure 6 shows that Algorithm B decreases NEGD faster than degree-based node removal and random node removal. The betweenness-based node removal has smaller NEGD than Algorithm B when between 2 and 24 nodes are removed (see the Appendix). Starting from the 25th node, however, Algorithm B performs better than the betweenness-based node removal with smaller NEGD. Overall, Algorithm B has the best performance among all four CNI algorithms and consistently decreases NEGD. Figure 7 shows a trend similar to that in Fig. 6; Algorithm B performs the best in minimizing NEMS.
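The degree-based baseline is simple to sketch (names are illustrative; the betweenness-based variant has the same shape with betweenness in place of degree):

```python
def remove_node(adj, u):
    # Node-induced subsystem after deleting node u.
    return {x: {y for y in nbrs if y != u}
            for x, nbrs in adj.items() if x != u}

def degree_based_removal(adj, m):
    # Repeatedly neutralize the node with the largest degree;
    # ties are broken by node order.
    removed, current = [], adj
    for _ in range(m):
        if not current:
            break
        target = max(sorted(current), key=lambda u: len(current[u]))
        removed.append(target)
        current = remove_node(current, target)
    return removed

# A star: the hub 0 has degree 4 and is removed first.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
```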
Figure 8 shows that Algorithm B decreases NLCO faster than the random node removal. The degree-based node removal and betweenness-based node removal have smaller NLCO than Algorithm B at the beginning, but larger NLCO when only a few nodes are left in the system. Overall, Algorithm B should be used to minimize the three performance metrics, whereas degree-based node removal and betweenness-based node removal may be used to decrease NLCO more efficiently.
As discussed in Sect. "CNI algorithms and complexity", the computational complexity of Algorithm B is O(mn^4), where m is the number of critical nodes and n is the total number of nodes in a system. The computational complexity of computing the degree of a node is O(n); to compute the degrees of all n nodes, the complexity is O(n^2). The complexity of the degree-based node removal is therefore O(mn^4), which is the same as that of Algorithm B. Using Brandes' algorithm [6], calculating the betweenness of a node requires O(ns) time, where s denotes the number of edges in a system. Since there are at most n(n − 1)/2 edges, O(ns) is bounded by O(n^3). The betweenness-based node removal (see Table 6 in the Appendix) is the second best algorithm among the four CNI algorithms, but requires more computation time compared to Algorithm B. The complexity of the random node removal is O(mn) because the complexity of removing a randomly selected node at each step is O(n).

Conclusions and future research
Three new performance metrics, NEGD, NEMS, and NLCO, are designed to assess a system's ability to diffuse entities such as information, goods, or diseases. All three metrics are normalized and lie between zero and one; the higher their values, the more capable a system is of diffusing entities. Characteristics of the three metrics are analyzed for generalized systems. All three metrics are nontrivial; they are nonhereditary except in extreme cases (e.g., NEGD ≤ 1 or NEGD ≤ 0).
These three performance metrics may be used to identify critical nodes in complex systems. Three nonpolynomial-time algorithms (Sect. "Nonpolynomial-time CNI algorithms") use the three metrics to identify the most critical nodes (i.e., the global optimum). CNI is NP-complete if any of the three metrics is required to be less than or equal to zero. There might exist polynomial-time CNI algorithms if any of the three metrics is required to be less than or equal to a constant strictly between zero and one. In Sect. "Polynomial-time CNI algorithms", three polynomial-time algorithms are designed to identify critical nodes step by step (i.e., local optima). These three algorithms with local optima do not guarantee the identification of the global optimum, but their complexity is O(mn^4), which is polynomial, where m is the number of critical nodes to be identified and n is the number of nodes in the system (i.e., the system order; m ≤ n). These polynomial-time algorithms are compared to three other widely used CNI algorithms, including degree-based node removal, betweenness-based node removal, and random node removal. The polynomial-time algorithms developed in this article have the best performance.
CNI is important in controlling complex systems with limited resources. Future research may focus on four areas: (a) Study the relationship between the three performance metrics and determine whether they can be integrated or additional metrics are needed to assess desired properties of complex systems; (b) Apply the six algorithms to other real-world complex systems to further validate and compare their performance and complexity; (c) Design other exact or heuristic optimization algorithms to identify critical nodes. Since the three performance metrics are nontrivial but nonhereditary properties in most cases, there might exist exact optimization algorithms that belong to class P; and (d) Integrate DNA with the performance metrics and algorithms developed in this research, and apply them to systems whose properties, e.g., topology or structure, change over time.