Introduction

Vulnerabilities to natural disasters and terrorist violence require a better understanding of complex systems and new methods of protection and prevention [4]. The objective of this research is to apply informative performance metrics to identify critical nodes in complex systems such as smart grids, social networks, information systems, and criminal organizations. For example, to protect an electrical power grid, the most vulnerable transformers and stations, whose failures would cause large-scale blackouts (and which are therefore critical nodes), need to be provided with backup capacity or enhanced security. As another example, to ensure public safety and security, key members of a criminal or terrorist organization need to be neutralized (e.g., captured or isolated).

Ideally, we would like to protect all nodes in an electrical power grid or an information system. In reality, resources are limited: protecting all nodes in a large system is not affordable, and the time available to respond to a criminal activity or terrorist attack is short. It is therefore necessary to apply limited resources to the most critical nodes to maximize the effect of either protecting a system or destroying a criminal or terrorist organization. Previous research on critical node identification (CNI) suggested a few metrics, but did not indicate why and how they might be useful in practical problems. Some metrics and related CNI algorithms (e.g., [23]) developed earlier can only be applied to systems with special structures. In addition, previous research predominantly studied different CNI algorithms in order to maximize or minimize performance metrics. The characteristics of performance metrics were not studied or used to identify critical nodes.

This research investigates a set of informative performance metrics and applies them to identify critical nodes in complex systems. The main contributions of this research include: (a) development of three new system performance metrics that describe desired properties of complex systems; (b) design of algorithms that use the metrics to identify critical nodes; and (c) analysis of characteristics of the metrics and algorithm complexity.

Background

Nodes in a complex system represent machines, equipment, workstations, computers, generators, control units, operators, and other components each of which is modeled as a separate entity. Edges (links) between nodes represent the flow of entities including products, services, and information. Nodes are linked directly or indirectly. If a node j is linked to a node \(j^{\prime }\) directly, there is an edge between the two nodes. When two nodes j and \(j^{\prime }\) are linked indirectly, there is at least one path between j and \(j^{\prime }\) through other nodes so that entities can be diffused from j to \(j^{\prime }\) and/or from \(j^{\prime }\) to j. When two nodes are not linked, there is no path between the two nodes. Entities may flow along both directions between two nodes connected by an undirected edge [10, 11, 13]. Directed edges are also called arcs; entities may flow along only one direction between two nodes connected by an arc.

There have been many studies on CNI, and complex networks and complex systems in general [29]. Most recently, Shen and Smith [23] used dynamic programming algorithms to identify critical nodes in trees and series–parallel systems. The performance metrics of interest were: (a) the number of components, which was to be maximized; and (b) the largest component order (i.e., the number of nodes in a component), which was to be minimized. For systems which can be modeled as trees and series-parallel graphs, the complexity of the dynamic programming algorithms was at most \(O({n^5\log n})\). These algorithms may be extended to generalized connected systems, which are interpreted as k-hole systems. The complexity of the algorithms, however, increases exponentially as k increases. The algorithms are not applicable to systems with disconnected components.

Analysis of system vulnerability is related to CNI. Dinh et al. [16] used dynamic programming to identify critical nodes and links. The algorithms developed were approximation algorithms and the complexity was at most \(O({\log ^{1.5}n})\). The performance metric of interest was pairwise connectivity. The pairwise connectivity between two nodes is one if this pair is connected and zero otherwise. In an undirected system, a pair of nodes is connected if and only if there exists at least one path (in both directions) between the two nodes. The system pairwise connectivity is the sum of pairwise connectivity between any pair of nodes. For a given level of degradation in pairwise connectivity, it was shown that to find a minimum set of nodes or edges (called \(\beta \)-disruptor), whose removal causes the specific level of degradation, was an NP-complete problem for undirected systems.
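As an illustration of the metric described above, the following sketch computes the system pairwise connectivity of a small undirected system by summing \(\frac{k(k-1)}{2}\) over the orders k of its components. The function names and the edge-list representation (nodes labeled 0 to n-1) are assumptions made for this example and are not taken from [16].

```python
from collections import deque

def components(n, edges):
    """Return the connected components of an undirected graph on nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def pairwise_connectivity(n, edges):
    """Number of connected node pairs: k(k-1)/2 summed over component orders k."""
    return sum(len(c) * (len(c) - 1) // 2 for c in components(n, edges))

# A chain of five nodes, and the same chain after the middle node loses its edges
# (a simple stand-in for removing that node; an isolated node adds no connected pair).
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
broken = [(0, 1), (3, 4)]
print(pairwise_connectivity(5, chain), pairwise_connectivity(5, broken))
```

In this small example, disconnecting the middle node reduces the number of connected pairs from ten to two, which is the kind of degradation the \(\beta \)-disruptor problem quantifies.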

CNI was also studied in the context of system reliability [28]. A reliability metric, \(\Pi \), was used to describe the average reliability between every pair of nodes in a system. An evolutionary algorithm was used to identify critical components (nodes), the removal of which aimed to minimize both \(\Pi \) and the cost of incapacitating links (multiple-objective optimization). The evolutionary algorithm provided good-quality solutions; its Pareto fronts (efficient frontiers) were close to the real Pareto optimal solutions.

In the study of the Internet and World Wide Web [1, 7] and more generally complex systems such as airline routes, electric power grids, and disease propagation [8], the diameter of a system, which is the average length of the shortest paths between any two nodes in the system, was analyzed when nodes were removed according to the number of links they have; nodes with more links were removed first. Compared to removing randomly selected nodes, the link-based node removal increased the diameter faster, although the diameter does not always increase when a node is removed.

Earlier studies on CNI focused on general systems and algorithm complexity. For a fixed system property (i.e., a value or a range of values for a performance metric), a node-deletion problem aims to find the minimum number of nodes which must be deleted from a given system so that the result satisfies the property. It was shown that the node-deletion problem for a system property is NP-hard or NP-complete if the property is nontrivial and hereditary [19, 25]. A system property is nontrivial if it is true for infinitely many systems and false for infinitely many systems. A property is hereditary if, for any system satisfying the property, all node-induced subsystems also satisfy the property.

Results of recent studies on CNI validate the analysis of the node-deletion problem conducted more than 30 years ago. The largest component order in a system [23] is nontrivial and hereditary. Testing (certificate-checking) for the largest component order can be performed in polynomial time. For instance, a depth-first algorithm such as Tarjan’s algorithm may be used to identify the largest component order with the complexity of \(O({\text{ nodes }+\text{ edges }})\). A system with n nodes has at most \(\frac{n({n-1})}{2}\) edges. The certificate-checking is therefore bounded by \(O({n^2})\). When the largest component order is the performance metric of interest, identifying the most critical nodes, whose removal minimizes the largest component order, is NP-complete according to Lewis and Yannakakis [19].
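The linear-time certificate check can be sketched as follows. This is a minimal illustration that uses a plain iterative depth-first search rather than Tarjan’s algorithm itself; the function name and the adjacency-list representation are assumptions made for the example.

```python
def largest_component_order(n, edges):
    """Order of the largest component of an undirected graph on nodes 0..n-1.

    Every node and every edge is visited a constant number of times,
    so the running time is O(nodes + edges).
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, order = [start], 0
        while stack:                      # iterative depth-first search
            u = stack.pop()
            order += 1
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        best = max(best, order)
    return best

# Six nodes: a chain 0-1-2, an edge 3-4, and an isolated node 5.
print(largest_component_order(6, [(0, 1), (1, 2), (3, 4)]))
```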

Similarly, the system pairwise connectivity [16] is nontrivial and hereditary. Tarjan’s algorithm may be used to find the pairwise connectivity between any pair of nodes and then to calculate the system pairwise connectivity. The CNI problem is therefore NP-complete when the system pairwise connectivity is of concern. Not all system performance metrics are hereditary, however. The system diameter [1] is not hereditary. Deleting a node may decrease the diameter although it increases the diameter most of the time. The CNI problem might belong to class P if the objective is to maximize the system diameter.

Some other recent work is related to CNI. In physics and engineering, research was focused on developing mathematical models and analyzing system properties [2, 12, 22]. In social science, many measures including centrality [17], complement [14], and reciprocal [5] were developed to describe system properties. In industrial process monitoring and control where multiple processes form a complex manufacturing system, data-driven approaches were applied for fault prognostics and diagnostics [26, 27]. Borgatti [5] defined two types of problems to assess the importance of nodes. The Key Player Problem-Positive studied the extent to which a node is embedded in the system. The Key Player Problem-Negative studied the amount of reduction in cohesiveness of a system after elimination of a node. Dynamic Network Analysis (DNA) [4, 9, 20, 24] was developed to model and analyze complex systems. DNA was successfully applied to terrorist networks and used to identify critical nodes through simulation experiments.

In summary, previous research on CNI validated certain analytical results and alluded to their applications to practical problems, but failed to design metrics that are meaningful for practical problems. There are two types of CNI problems: the optimization problem and the recognition problem. In the optimization problem, given limited resources (i.e., a certain number of nodes need to be protected or removed), which nodes’ removal can minimize or maximize a performance metric? In the recognition problem, given a desired property (e.g., \(\ge \) the value of a performance metric or \(\le \) the value of a performance metric), what is the minimum number of nodes that need to be protected or removed to satisfy the property? These two types of problems are equivalent in terms of algorithm complexity. This research is focused on the optimization problem, namely, to identify the most critical nodes. What was missing in previous research, which is the focus of this study, lies in two areas:

  1. (a)

    Which properties of a complex system need to be measured? A set of performance metrics must be designed to assess the impact of removing a portion of nodes from a complex system. Previous research suggested a few metrics, but did not indicate why and how they might be useful in practical problems. In addition, some metrics and related CNI algorithms (e.g., [23]) can only be applied to systems with special structures. This research designs three metrics that indicate a complex system’s speed and scale of diffusing entities, which are useful in many real-world applications; and

  2. (b)

    What are the characteristics of performance metrics? Previous research predominantly studied different CNI algorithms in order to maximize or minimize performance metrics. The characteristics of performance metrics affect the efficiency and effectiveness of CNI algorithms. Moreover, performance metrics themselves can be used to identify critical nodes. This research identifies the values for each of the three new performance metrics and their corresponding network structures of a complex system, which provide important insight into how these metrics may be used to identify critical nodes.
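To make the optimization problem stated above concrete, the following sketch enumerates every set of k nodes and keeps the set whose removal minimizes a performance metric. The metric used here (largest component order) and the function names are illustrative assumptions, and the exhaustive search is exponential in k; it is not one of the algorithms developed in this research.

```python
from itertools import combinations

def largest_component_order(nodes, edges):
    """Largest component order of the subsystem induced by `nodes`."""
    nodes = set(nodes)
    adj = {v: [] for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:     # drop edges that touch removed nodes
            adj[u].append(v)
            adj[v].append(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, order = [start], 0
        while stack:
            u = stack.pop()
            order += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, order)
    return best

def most_critical_nodes(n, edges, k):
    """Try every k-subset of nodes 0..n-1; return (removed set, metric value)."""
    best_set, best_value = None, None
    for removed in combinations(range(n), k):
        remaining = set(range(n)) - set(removed)
        value = largest_component_order(remaining, edges)
        if best_value is None or value < best_value:
            best_set, best_value = removed, value
    return best_set, best_value

chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(most_critical_nodes(5, chain, 1))
```

For a chain of five nodes and k = 1, the search selects the middle node, whose removal leaves two components of order two.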

Problem definition

Let \(G(V, E)\) represent a system of n vertices (nodes) and s edges. n and s are the order and size of \(G(V, E)\), respectively. V is the set of all nodes in \(G( {V, E}):v_1, v_2,\ldots ,v_n\). E is the set of all edges in \(G( {V, E}): e_1, e_2,\ldots ,e_s\). A path from \(v_i \) to \(v_j \) is a sequence of nodes and edges that connects \(v_i \) and \(v_j \), along which entities may be diffused from \(v_i \) to \(v_j \). In an undirected \(G(V, E)\), every edge is bidirectional, or undirected; a path from \(v_i \) to \(v_j \) is also a path from \(v_j \) to \(v_i \). A directed \(G(V, E)\) has at least one directed edge (arc). An arc from \(v_i \) to \(v_j \) connects \(v_i \) and \(v_j \); entities may be diffused from \(v_i \) to \(v_j \), but not from \(v_j \) to \(v_i \). Since information is diffused in both directions in most social and information networks, this research focuses on CNI for undirected systems.

The ability of \(G(V, E)\) to diffuse entities such as information, goods, or diseases [15] is reflected in homogeneity, diffusion speed, and diffusion scale, all of which are performance metrics that describe the properties of \(G(V, E)\) [3]. In a system with high homogeneity (i.e., a homogeneous network), relationships between nodes are the same or similar. In a network with low homogeneity (i.e., a heterogeneous network), relationships between nodes differ substantially. Diffusion speed indicates how fast entities are diffused between nodes. Diffusion scale indicates how many nodes in \(G(V, E)\) diffuse (or receive) entities to (or from) other nodes. This research investigates using performance metrics to identify critical nodes in \(G(V, E)\); removal of critical nodes minimizes homogeneity, diffusion speed, and/or diffusion scale. When resources are limited, CNI helps control systems in order to, for example,

  1. (a)

    minimize the possibility and scale of criminal organizations or terrorist attacks by neutralizing the most critical criminals or terrorists;

  2. (b)

    maximize a computer network’s resilience to incidents (e.g., cyber-attacks) and accidents (e.g., random computer failures) by protecting the most critical nodes such as routers and servers; and

  3. (c)

    minimize disruptions in sensor or logistics networks by providing backup capacity to the most critical nodes.

Performance metrics

Homogeneity: normalized expected geodesic distance

Nodes \(v_i \) and \(v_j \) are neighbors if an edge connects \(v_i \) and \(v_j \). The geodesic distance, \(d_{v_i ,v_j } \), is the length of the shortest path(s) between \(v_i \) and \(v_j \). If \(v_i \) and \(v_j \) are neighbors, \(d_{v_i ,v_j } =1\). If there exists no path between \(v_i \) and \(v_j \), \(d_{v_i ,v_j } =0\). If \(v_i \) diffuses entities to other nodes in a system with a total of n nodes, the expected geodesic distance (\(\mathrm{EGD}\)) from \(v_i \) to other nodes is \(\frac{\mathop \sum \nolimits _{j=1,j\ne i}^n d_{v_i ,v_j } }{n-1}\). Further, if all nodes, \(v_1, v_2,\ldots , v_n\), have equal probability, \(\frac{1}{n}\), of being the source node that begins the diffusion of entities to other nodes, the \(\mathrm{EGD}\) of the network is \(\mathrm{EGD}=\frac{\mathop \sum \nolimits _{i=1}^{n-1} \mathop \sum \nolimits _{j=i+1}^n d_{v_i ,v_j } }{\frac{n(n-1)}{2}}\).

At each step of diffusion, a node diffuses entities to all its neighbors. The \(\mathrm{EGD}\) indicates the expected number of steps it takes to diffuse entities from the source node to other nodes. If diffusion time is proportional to the number of steps, the larger the \(\mathrm{EGD}\) is, the longer is the total diffusion time and the lower is the diffusion speed. Since 0\(\le d_{v_i ,v_j } \le n-1\), \(0\le \mathrm{EGD}\le n-1\). \(\mathrm{EGD}=1\) for a clique, which is a fully connected network (i.e., \(d_{v_i ,v_j } =1\) for \(\forall v_i , v_j )\). \(\mathrm{EGD}=0\) for a fully disconnected network without any edge (i.e., \(s=0)\). In a connected network, \(d_{v_i ,v_j } \ge 1\) for \(\forall v_i , v_j \); in a disconnected network, \(\exists v_i , v_j \) such that \(d_{v_i ,v_j } =0\). Figure 1 depicts two systems 1(a) and 1(b). For 1(a), \(\mathrm{EGD}=\frac{( {1+2+3+4})+( {1+2+3})+( {1+2})+1}{\frac{5( {5-1})}{2}}=2\). For 1(b), \(\mathrm{EGD}=\frac{( {1+2+3+1+2+3})+( {1+2+3+2+3})+( {1+2+3+3})+( {1+2+3})+( {1+2})+1}{\frac{7( {7-1})}{2}}=2\). Both systems have the same \(\mathrm{EGD}\) (i.e., on average, it takes the same amount of time (two steps) to diffuse entities).
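The \(\mathrm{EGD}\) calculation for the five-node chain of system 1(a) can be reproduced with a breadth-first search from every node. The sketch below is a minimal illustration; the function names and the node labels 0 to n-1 are assumptions, and unreachable pairs are assigned a distance of zero as defined above.

```python
from collections import deque

def geodesic_distances(n, edges):
    """All-pairs geodesic distances by BFS; pairs with no path keep distance 0."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            u, d = queue.popleft()
            dist[s][u] = d
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist

def egd(n, edges):
    """Expected geodesic distance: sum over unordered pairs divided by n(n-1)/2 (n > 1)."""
    dist = geodesic_distances(n, edges)
    total = sum(dist[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

chain5 = [(0, 1), (1, 2), (2, 3), (3, 4)]      # system 1(a)
clique4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(egd(5, chain5))   # 2.0
print(egd(4, clique4))  # 1.0
```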

Fig. 1 Two systems with the same \(\mathrm{EGD}\)

The \(\mathrm{EGD}\) indirectly measures a network’s diffusion speed and does not take into consideration the order of the network, n. For instance, entities can be diffused to a total of five nodes in 1(a) (\(n=5\)), whereas they can be diffused to a total of seven nodes in 1(b) (\(n=7\)). Moreover, 1(b) is more homogeneous than 1(a). The largest geodesic distance (\(\mathrm{LGD}\)) of 1(b) is three, whereas the \(\mathrm{LGD}\) of 1(a) is four (the length of the shortest path between nodes 1 and 5). System 1(a) is less homogeneous than 1(b) because there is a larger difference between the smallest geodesic distance and \(\mathrm{LGD}\) in 1(a) than in 1(b).

To accurately measure a system’s homogeneity, the normalized expected geodesic distance (\(\mathrm{NEGD}\)) is defined in Eq. (1) for \(n>1\). In essence, \(\mathrm{NEGD}=\frac{\mathrm{EGD}}{\mathrm{LGD}}\). \(\mathrm{NEGD}=0.500\) for 1(a) and \(\mathrm{NEGD}=0.667\) for 1(b). The larger the \(\mathrm{NEGD}\) is, the more homogeneous a system is. \(\mathrm{NEGD}=1\) for a clique. For a fully disconnected system, \(\mathrm{NEGD}\) is defined to be zero. Although such a system is homogeneous, it is the most desirable case when the goal is to destroy a system and the worst case when the goal is to protect a system. Let \(\theta ( {v_i })\) be the degree of \(v_i \), i.e., the number of edges that connect \(v_i \) and its neighbors. The \(\mathrm{NEGD}\) of \(G(V, E)\) is summarized in Proposition 1.

$$\begin{aligned} \mathrm{NEGD}=\frac{\mathop \sum \nolimits _{i=1}^{n-1} \mathop \sum \nolimits _{j=i+1}^n d_{v_i ,v_j } }{\frac{n(n-1)\mathrm{LGD}}{2}} \end{aligned}$$
(1)
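Eq. (1) can be evaluated directly from the pairwise geodesic distances. The following sketch is an illustration under the same conventions as the text (distance zero for unconnected pairs, \(\mathrm{NEGD}=0\) when there is no edge); the function name and the edge-list input are assumptions made for the example.

```python
from collections import deque

def negd(n, edges):
    """Normalized expected geodesic distance (Eq. 1) for nodes 0..n-1."""
    if n < 2:
        return 0.0
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    pair_distances = []
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:                       # BFS from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        # keep each unordered pair once; unreachable nodes count as distance 0
        pair_distances.extend(dist.get(t, 0) for t in range(s + 1, n))
    lgd = max(pair_distances)              # largest geodesic distance
    if lgd == 0:
        return 0.0                         # fully disconnected system, by definition
    return sum(pair_distances) / (len(pair_distances) * lgd)

chain5 = [(0, 1), (1, 2), (2, 3), (3, 4)]  # system 1(a)
clique4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(negd(5, chain5))   # 0.5
print(negd(4, clique4))  # 1.0
```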

Proposition 1

Suppose \(G(V, E)\) has a total of n nodes. Let \(n_f \) (\(0\le n_f \le n\)) be the number of fully connected nodes, i.e., nodes \(v_i \) with \(\theta ( {v_i })=n-1\). Table 1 gives the \(\mathrm{NEGD}\) of \(G(V, E)\). \(0\le \mathrm{NEGD}\le 1\). \(\mathrm{NEGD}\le c\), \(0\le c\le 1\), is a nontrivial property of \(G(V, E)\). \(\mathrm{NEGD}\le c\), \(c=0\) or \(c=1\), is a hereditary property of \(G(V, E)\). \(\mathrm{NEGD}\le c\), \(0<c<1\), is a nonhereditary property of \(G(V, E)\).

Table 1 \(\mathrm{NEGD}\) of G( VE) with total n nodes and \(n_f \) fully connected nodes

Proof of Proposition 1

When \(n=1\), \(\mathrm{NEGD}=0\) per definition.

When \(n=2\), \(\mathrm{NEGD}=0\) if there is no edge; \(\mathrm{NEGD}=1\) if there is one edge.

When \(n\ge 3\), let \(V_f \) represent a set of nodes, each of which is connected to other \(n-1\) nodes in G( VE). \(\theta ( {v_i })=n-1\) for \(\forall v_i \in V_f \). \(\theta ( {v_j })<n-1\) for \(\forall v_j \notin V_f \). \(V_f \) has \(n_f \) nodes; \(0\le n_f \le n\).

  1. (a)

    \(n_f =n. \quad G( {V, E})\) is a clique and \(\mathrm{NEGD}=1\);

  2. (b)

    \(n_f \ne n-1.\) If \(n_f =n-1\), there is only one node, say \(v_j \in V\), such that \(v_j \notin V_f \) (i.e., \(\theta ( {v_j })<n-1)\). Since all \(n-1\) nodes in \(V_f \) have the degree of \(n-1\), all \(n-1\) nodes in \(V_f \) must be connected to \(v_j \). This indicates that \(\theta ( {v_j })=n-1\). This is contradictory to the condition that \(v_j \notin V_f \). Therefore \(n_f \ne n-1\);

  3. (c)

    \(1\!\le \! n_f \!\le \! n\!-\!2.~ \mathrm{NEGD}\!\le \!\! \frac{\frac{n_f (n_f -1)}{2}+n_f ( {n-n_f })+2\frac{( {n-n_f })( {n-n_f -1})}{2}}{2\frac{n(n-1)}{2}}=1-\frac{n_f ( {2n-n_f -1})}{2n( {n-1})} =\frac{1}{2n( {n-1})}\Big [ ( {\frac{2n-1}{2}-n_f })^2+\frac{4n^2-4n-1}{4} \Big ].\) Since \(1\le n_f \le n-2<\frac{2n-1}{2}\), \(\mathrm{NEGD}\le \frac{n-1}{n}\) when \(n_f =1\). Since \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{n-1}{n}=1\), \(\mathrm{NEGD}<1\). \(\mathrm{NEGD}>\frac{\frac{n_f (n_f -1)}{2}+n_f ( {n-n_f })+\frac{( {n-n_f })( {n-n_f -1})}{2}}{2\frac{n(n-1)}{2}}=\frac{1}{2}\). \(\frac{1}{2}<\mathrm{NEGD}<1\); and

  4. (d)

    \(n_f =0.\) Three different situations can be analyzed: \(\mathrm{LGD}=n-1\), \(1\le \mathrm{LGD}\le n-2\), and \(\mathrm{LGD}=0\).

    1. (i)

      \(\mathrm{LGD}=n-1.\) If \(n=3\), then \(n_f =1\), which is contradictory to \(n_f =0\). Therefore \(n\ge 4\). G( VE) comprised a chain of n nodes. (An example is shown in Fig. 1a) \(\mathop \sum \nolimits _{i=1}^n \mathop \sum \nolimits _{j=i+1}^n d_{v_i ,v_j } = \quad \left[ {1+\cdots +( {n-1})} \right] +\left[ {1+\cdots +( {n-2})} \right] +\cdots +\left[ {1+2} \right] +1=\frac{n( {n-1})}{2}+\frac{( {n-1})( {n-2})}{2}+\cdots +\frac{3\times 2}{2}+\frac{2\times 1}{2}= \quad \frac{1}{2}\left[ ( {( {n-1})^2+( {n-2})^2+\cdots +2^2+1^2})\right. \left. +( {( {n-1})+( {n-2})+\cdots +2+1}) \right] =\frac{1}{2}\left[ {\frac{( {n-1})n( {2n-2+1})}{6}+\frac{n( {n-1})}{2}} \right] \!=\!\frac{( {n-1})n( {n+1})}{6}\). \(\mathrm{NEGD}\!=\!\frac{\mathop \sum \nolimits _{i=1}^n \mathop \sum \nolimits _{j=i+1}^n d_{v_i ,v_j } }{\frac{n(n-1)}{2}(n-1)}=\frac{\frac{( {n-1})n( {n+1})}{6}}{\frac{n(n-1)}{2}(n-1)}=\frac{n+1}{3(n-1)}=\frac{1}{3}+\frac{2}{3( {n-1})}\). \(\mathrm{NEGD}\) is strictly monotonically decreasing. When \(n=4\), \(\mathrm{NEGD}=\frac{5}{9}\). Since \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \left[ {\frac{1}{3}+\frac{2}{3( {n-1})}} \right] =\frac{1}{3}\), \(\frac{1}{3}<\mathrm{NEGD}\le \frac{5}{9}\);

    2. (ii)

      \(1\le \mathrm{LGD}\le n-2.\) There exists at least one chain of nodes, say \(V_1 \); the distance of the shortest path between the two end nodes of the chain is \(\mathrm{LGD}\). \(V_1 \) has \(\mathrm{LGD}+1\) nodes. \(\mathrm{NEGD}=\frac{\mathop \sum \nolimits _{i=1}^n \mathop \sum \nolimits _{j=i+1}^n d_{v_i ,v_j } }{\frac{n(n-1)}{2}\mathrm{LGD}}= \quad \frac{\frac{\mathrm{LGD}( {\mathrm{LGD}+1})( {\mathrm{LGD}+2})}{6}+\mathop \sum \nolimits _{i,j} d_{v_i ,v_j } }{\frac{n( {n-1})}{2}\mathrm{LGD}}\ge \frac{\frac{\mathrm{LGD}( {\mathrm{LGD}+1})( {\mathrm{LGD}+2})}{6}}{\frac{n( {n-1})}{2}\mathrm{LGD}}=\frac{( {\mathrm{LGD}+1})( {\mathrm{LGD}+2})}{3n( {n-1})}\ge \frac{2}{n( {n-1})}\). Since \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{2}{n( {n-1})}=0\), \(\mathrm{NEGD}\!>\!0\). Meantime, \(\mathrm{NEGD}\!=\!\frac{\mathop \sum \nolimits _{i=1}^n \mathop \sum \nolimits _{j=i+1}^n d_{v_i ,v_j } }{\frac{n(n-1)}{2}\mathrm{LGD}}=\frac{\frac{\mathrm{LGD}( {\mathrm{LGD}+1})( {\mathrm{LGD}+2})}{6}+\mathop \sum \nolimits _{i,j} d_{v_i ,v_j } }{\frac{n( {n-1})}{2}\mathrm{LGD}}<\frac{\frac{\mathrm{LGD}( {\mathrm{LGD}+1})( {\mathrm{LGD}+2})}{6}+\left[ {\frac{n( {n-1})}{2}-\frac{\mathrm{LGD}( {\mathrm{LGD}+1})}{2}} \right] \mathrm{LGD}}{\frac{n( {n-1})}{2}\mathrm{LGD}}=1-\frac{2( {\mathrm{LGD}+1})( {\mathrm{LGD}-1})}{3n( {n-1})}\le 1\); and

    3. (iii)

      \(\mathrm{LGD}=0. \quad \mathrm{NEGD}=0\) per definition.

The desired system property in the recognition problem is \(\mathrm{NEGD}\le c\), where \(0\le c\le 1\). This property is nontrivial since it is true for infinitely many systems and false for infinitely many systems for given c. \(\mathrm{NEGD}\le 1\) is hereditary since the maximum of \(\mathrm{NEGD}\) is one. \(\mathrm{NEGD}\le 0\) if and only if G( VE) does not have any edge; any subsystem of G( VE) does not have any edge. \(\mathrm{NEGD}\le 0\) is also hereditary. When \(0<c<1\), \(\mathrm{NEGD}\le c\) is nonhereditary. The proof is as follows.

Table 2 \(\mathrm{NEMS}\) of G( VE) with total n nodes and \(n_f \) fully connected nodes

For any value of c, \(0<c<1\), \(G(V, E)\) may be constructed such that it includes only one edge and \(\mathrm{NEGD}=\frac{2}{n(n-1)}\le c<1\), \(n\ge 3\). A subsystem of \(G(V, E)\), which has two nodes and one edge that connects the two nodes, may be produced by removing all nodes without any edge. For this subsystem, \(\mathrm{NEGD}=1>c\). \(\mathrm{NEGD}\le c\), \(0<c<1\), is therefore nonhereditary.

This concludes the proof of Proposition 1. \(\square \)
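The counterexample used in the last step of the proof can be checked numerically. The sketch below is an illustration only: it takes \(n=5\) and \(c=\frac{1}{2}\) as example values, uses exact rational arithmetic, and the function name is an assumption.

```python
from collections import deque
from fractions import Fraction

def negd(nodes, edges):
    """Exact NEGD (Eq. 1) of the subsystem induced by `nodes`."""
    nodes = list(nodes)
    n = len(nodes)
    adj = {v: [] for v in nodes}
    for u, v in edges:
        if u in adj and v in adj:          # keep only edges inside the node set
            adj[u].append(v)
            adj[v].append(u)
    total, lgd = 0, 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())        # every unordered pair is counted twice
        lgd = max(lgd, max(dist.values()))
    if n < 2 or lgd == 0:
        return Fraction(0)
    return Fraction(total // 2, (n * (n - 1) // 2) * lgd)

c = Fraction(1, 2)
edges = [(0, 1)]                           # a single edge among five nodes
whole = negd(range(5), edges)              # equals 2 / (n(n-1)) with n = 5
sub = negd([0, 1], edges)                  # subsystem left after removing isolated nodes
print(whole, sub, whole <= c < sub)
```

The whole system satisfies \(\mathrm{NEGD}\le c\) while its two-node induced subsystem does not, which is what makes the property nonhereditary for \(0<c<1\).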

Proposition 1 reveals the homogeneity of \(G(V, E)\) (\(n\ge 3\)):

  1. (a)

    If \(G(V, E)\) is a clique (i.e., \(n_f =n\)), \(\mathrm{NEGD}=1\). A clique is the most homogeneous among all system structures. \(\mathrm{NEGD}\) remains one regardless of which nodes are removed from \(G(V, E)\) (hereditary). All nodes in \(G(V, E)\) are equally critical;

  2. (b)

    If \(G(V, E)\) has at least one node that is fully connected but is not a clique (i.e., \(1\le n_f \le n-2\)), \(\mathrm{NEGD}\) is greater than 0.5 and less than one;

  3. (c)

    If \(G(V, E)\) does not have any node that is fully connected (i.e., \(n_f =0\)), but comprises a chain of n nodes (i.e., \(\mathrm{LGD}=n-1\)), \(\mathrm{NEGD}\) is described by a closed form, \(\mathrm{NEGD}=\frac{1}{3}+\frac{2}{3( {n-1})}\). \(\mathrm{NEGD}\) is strictly monotonically decreasing as n increases. As a chain becomes longer, it is less homogeneous;

  4. (d)

    If \(G(V, E)\) does not have any node that is fully connected (i.e., \(n_f =0\)), and does not comprise a chain of n nodes (i.e., \(1\le \mathrm{LGD}\le n-2\)), \(\mathrm{NEGD}\) is greater than zero and less than one. Most real-world networks follow this structure (\(n_f =0\) and \(1\le \mathrm{LGD}\le n-2\)) and \(\mathrm{NEGD}\) can be used to describe their homogeneity and help identify critical nodes;

  5. (e)

    If \(G(V, E)\) does not have any edge (i.e., \(\mathrm{LGD}=0\)), \(\mathrm{NEGD}=0\); and

  6. (f)

    The \(\mathrm{NEGD}\) of \(G(V, E)\) that has at least one fully connected node (\(\frac{1}{2}<\mathrm{NEGD}\le 1\)) is higher than the \(\mathrm{NEGD}\) of \(G(V, E)\) (\(n\ge 5\)) that comprises a chain (\(\frac{1}{3}<\mathrm{NEGD}\le \frac{1}{2}\)). Systems with at least one fully connected node are more homogeneous than those with the chain structure.
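The closed form for a chain and the bounds for systems with a fully connected node can be spot-checked on small examples. The sketch below is an illustration, not part of the proof: it uses a star (one hub connected to all other nodes, so \(n_f =1\)) as an example of a system with a fully connected node, and the helper names are assumptions.

```python
from collections import deque
from fractions import Fraction

def negd(n, edges):
    """Exact NEGD (Eq. 1) of an undirected graph on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total, lgd = 0, 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())        # every unordered pair is counted twice
        lgd = max(lgd, max(dist.values()))
    if n < 2 or lgd == 0:
        return Fraction(0)
    return Fraction(total // 2, (n * (n - 1) // 2) * lgd)

def chain(n):
    return [(i, i + 1) for i in range(n - 1)]

def star(n):
    return [(0, i) for i in range(1, n)]   # node 0 is the fully connected hub

for n in range(4, 11):
    # chain: NEGD = 1/3 + 2/(3(n-1)) = (n+1)/(3(n-1))
    assert negd(n, chain(n)) == Fraction(n + 1, 3 * (n - 1))
    # star: attains the bound (n-1)/n derived for n_f = 1, and lies in (1/2, 1)
    assert negd(n, star(n)) == Fraction(n - 1, n)
    assert Fraction(1, 2) < negd(n, star(n)) < 1
print("closed forms agree for n = 4..10")
```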

Diffusion speed: normalized expected minimum speed

G( VE) comprises components. A component is disconnected from all other components in G( VE); there are no edges that connect nodes in one component and nodes in another component. Nodes in the same component are connected to each other directly or indirectly. G( VE) has only one component if it is a clique. A fully disconnected G( VE) of n nodes comprised n components, each of which has one node. At each step of diffusion, a node \(v_i \) diffuses entities to all its neighbors. After one or more steps, entities are diffused to all nodes in the component to which \(v_i \) belongs. Let \(\mathrm{LGD}_{v_i } \) be the largest geodesic distance of the component to which \(v_i \) belongs. \(\mathrm{LGD}_{v_i } \) is the maximum number of steps required for all nodes in the component to receive entities diffused from a randomly selected node \(v_i \). Let \(n_{v_i } \) be the total number of nodes in the component to which \(v_i \) belongs; \(n_{v_i } \) is the order of the component. \(n_{v_i } -1\) is the total number of nodes that receive entities diffused from \(v_i \). \(\frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\) is the expected minimum number of nodes that receive entities diffused from \(v_i \) at each step. \(\frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\) is the minimum diffusion speed of the component to which \(v_i \) belongs. All nodes in the same component have the same minimum diffusion speed. \(\frac{\mathop \sum \nolimits _{i=1}^n \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }}{n}\) is the expected minimum speed, which indicates the expected minimum number of nodes in G( VE) that receive entities at each step of diffusion. The maximum value of \(\frac{\mathop \sum \nolimits _{i=1}^n \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }}{n}\) is \(n-1\), which is for a clique. To compare systems of different orders, the normalized expected minimum speed (\(\mathrm{NEMS})\) is defined in Eq. (2) for \(n>1\). 
\(\mathrm{NEMS}=0\) for a fully disconnected \(G(V, E)\) and \(\mathrm{NEMS}=1\) for a clique. The larger the \(\mathrm{NEMS}\) is, the higher is the diffusion speed of \(G(V, E)\). For the two systems in Fig. 1, \(\mathrm{NEMS}=0.250\) for 1(a) and \(\mathrm{NEMS}=0.333\) for 1(b); system 1(b) has a higher diffusion speed than 1(a).

$$\begin{aligned} \mathrm{NEMS}=\frac{\mathop \sum \nolimits _{i=1}^n \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }}{n(n-1)} \end{aligned}$$
(2)
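Eq. (2) can be evaluated component by component, since all nodes in a component share the same minimum diffusion speed. The following sketch is an illustration; the function names are assumptions, and a component with a single node is taken to contribute zero, consistent with \(\mathrm{NEMS}=0\) for a fully disconnected system.

```python
from collections import deque

def bfs_distances(adj, source):
    """Geodesic distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def nems(n, edges):
    """Normalized expected minimum speed (Eq. 2) for nodes 0..n-1."""
    if n < 2:
        return 0.0
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    assigned = set()
    total = 0.0
    for start in range(n):
        if start in assigned:
            continue
        members = list(bfs_distances(adj, start))   # the component containing start
        assigned.update(members)
        # LGD of the component: largest geodesic distance over its node pairs
        lgd = max(max(bfs_distances(adj, v).values()) for v in members)
        if lgd > 0:
            # each of the len(members) nodes contributes (order - 1) / LGD
            total += len(members) * (len(members) - 1) / lgd
    return total / (n * (n - 1))

chain5 = [(0, 1), (1, 2), (2, 3), (3, 4)]           # system 1(a)
clique4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(nems(5, chain5))   # 0.25
print(nems(4, clique4))  # 1.0
```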

Proposition 2

Suppose \(G(V, E)\) has a total of n nodes and \(n_f \) (\(0\le n_f \le n\)) fully connected nodes. Table 2 gives the \(\mathrm{NEMS}\) of \(G(V, E)\). \(0\le \mathrm{NEMS}\le 1\). \(\mathrm{NEMS}\le c\), \(0\le c\le 1\), is a nontrivial property of \(G(V, E)\). \(\mathrm{NEMS}\le c\), \(c=0\) or \(c=1\), is a hereditary property of \(G(V, E)\). \(\mathrm{NEMS}\le c\), \(0<c<1\), is a nonhereditary property of \(G(V, E)\).

Proof of Proposition 2

When \(n=1\), \(\mathrm{NEMS}=0\) per definition.

When \(n=2\), \(\mathrm{NEMS}=0\) if there is no edge; \(\mathrm{NEMS}=1\) if there is one edge.

When \(n\ge 3\),

  1. (a)

    \(n_f =n. \quad G( {V, E})\) is a clique and \(\mathrm{NEMS}=1\);

  2. (b)

    \(n_f \ne n-1\) (see proof in Proposition 1);

  3. (c)

    \(1\le n_f \le n-2. \quad \mathrm{NEMS}=\frac{\mathop \sum \nolimits _{i=1}^n \frac{n-1}{2}}{n(n-1)}=\frac{1}{2};\) and

  4. (d)

    \(n_f =0.\) Three different situations can be analyzed: \(\mathrm{LGD}=n-1\), \(1\le \mathrm{LGD}\le n-2\), and \(\mathrm{LGD}=0\).

    1. (i)

      \(\mathrm{LGD}=n-1. \quad n\ge 4\) (see proof in Proposition 1). G( VE) comprised a chain of n nodes. \(\mathrm{NEMS}=\frac{\mathop \sum \nolimits _{i=1}^n \frac{n-1}{n-1}}{n(n-1)}=\frac{1}{n-1}\). \(\mathrm{NEMS}\) is strictly monotonically decreasing. When \(n=4\), \(\mathrm{NEMS}=\frac{1}{3}\). Since \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{1}{n-1}=0\), \(0<\mathrm{NEMS}\le \frac{1}{3}\);

    2. (ii)

      \(1\le \mathrm{LGD}\le n-2.\) There exists at least one chain of nodes; the distance of the shortest path between the two end nodes of the chain is \(\mathrm{LGD}\). Suppose this chain of nodes belongs to a component \(V_1 \), which has \(n_1 \) nodes. \(n_1 \ge \mathrm{LGD}+1\). \(\mathrm{NEMS}=\frac{\mathop \sum \nolimits _{i=1}^n \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }}{n(n-1)}=\frac{\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }}{n(n-1)}\). The minimum value of \(\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\) is zero; each node \(v_i \), \(v_i \notin V_1 \), is disconnected from other nodes. \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}=\frac{n_1 ( {n_1 -1})}{\mathrm{LGD}}\ge \mathrm{LGD}+1\). Therefore, \(\mathrm{NEMS}\ge \frac{\mathrm{LGD}+1}{n(n-1)}\). When \(\mathrm{NEMS}=\frac{\mathrm{LGD}+1}{n(n-1)}\), G( VE) comprised \(n-\mathrm{LGD}\) components. One component has \(\mathrm{LGD}+1\) nodes, which form a chain; the other \(n-( {\mathrm{LGD}+1})\) components each have one node. When \(\mathrm{LGD}=1\), \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{\mathrm{LGD}+1}{n(n-1)}=\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{2}{n(n-1)}=0\). When \(\mathrm{LGD}=n-2\), \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{\mathrm{LGD}+1}{n(n-1)}=\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{n-1}{n(n-1)}=\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{1}{n}=0\). \(\mathrm{NEMS}>0\). To find the maximum of \(\mathrm{NEMS}\), note that \(\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\le \mathop \sum \nolimits _{V-V_1 } ( {n_{v_i } -1})\). When \(\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }=\mathop \sum \nolimits _{V-V_1 } ( {n_{v_i } -1})\), each node \(v_i \), \(v_i \notin V_1 \), belongs to a clique and \(\mathrm{LGD}_{v_i } =1\). 
Since there are total \(n-n_1 \) nodes \(v_i^\prime s\) such that \(v_i \notin V_1 \), \(\mathop \sum \nolimits _{V-V_1 } ( {n_{v_i } -1})=\mathop \sum \nolimits _{V-V_1 } n_{v_i } -( {n-n_1 })\le ( {n-n_1 })^2-( {n-n_1 })\). \(\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\le ( {n-n_1 })^2-( {n-n_1 })\). When \(\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }=( {n-n_1 })^2-( {n-n_1 })\), not only each node \(v_i \), \(v_i \notin V_1 \), belongs to a clique, but all \(v_i^\prime s\), \(v_i \notin V_1 \), belong to the same clique. \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+ \quad \mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\le \frac{n_1 ( {n_1 -1})}{\mathrm{LGD}}+( {n-n_1 })^2-( {n-n_1 })=\frac{n_1 \left[ {( {1+\mathrm{LGD}})n_1 -( {2\mathrm{LGD}\cdot n+1-\mathrm{LGD}})} \right] +\mathrm{LGD}( {n-1})n}{\mathrm{LGD}}\). If \(n_1 =n\), \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\le \frac{n( {n-1})}{\mathrm{LGD}}\); G( VE) has one component. If \(\mathrm{LGD}=1\), G( VE) becomes a clique, which is contradictory to the condition that \(n_f =0\). \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\le \frac{n( {n-1})}{\mathrm{LGD}}\le \frac{n( {n-1})}{2}\). If \(n=3\) and \(\mathrm{LGD}=2\), \(\mathrm{LGD}=n-1\), which is contradictory to the condition that \(\mathrm{LGD}\le n-2\). \(n\ge 4\) when \(\mathrm{LGD}=2\). G( VE) has a single component of n (\(n\ge 4)\) nodes with \(\mathrm{LGD}=2\) when \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }=\frac{n( {n-1})}{2}\). 
If \(n_1 <n\), \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\le n_1 ( n_1 -2n+1)+n( {n-1})+\frac{n_1 ( {n_1 -1})}{\mathrm{LGD}}\). When \(\mathrm{LGD}=1\), \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\le n( {n-1})-2n_1 ( n-n_1 )\); G( VE) comprised two cliques. To further maximize \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\), note that \(n( {n-1})-2n_1 ( {n-n_1 })\!=\!n( {n-1})\!+\!2( {n_1 \!-\frac{n}{2}})^2-\frac{n^2}{2}\). When \(n_1 =n-1\), \(n( {n-1})+2( {n_1 -\frac{n}{2}})^2-\frac{n^2}{2}\le ( {n-1})( {n-2})\). When \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }=( {n-1})( {n-2})\), G( VE) comprised two components: one is a clique with \(n-1\) nodes and the other has one node. \(\frac{n( {n-1})}{2}=( {n-1})( {n-2})\) when \(n=4\); \(\frac{n( {n-1})}{2}<( {n-1})( {n-2})\) when \(n>4\). Therefore, \(\mathop \sum \nolimits _{V_1 } \frac{n_1 -1}{\mathrm{LGD}}+\mathop \sum \nolimits _{V-V_1 } \frac{n_{v_i } -1}{\mathrm{LGD}_{v_i } }\le ( {n-1})( {n-2})\). \(\mathrm{NEMS}\le \frac{( {n-1})( {n-2})}{n( {n-1})}=\frac{n-2}{n}\). \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{n-2}{n}=1\). \(\mathrm{NEMS}<1\). In summary, \(\frac{\mathrm{LGD}+1}{n(n-1)}\le \mathrm{NEMS}\le \frac{n-2}{n}\). When \(\mathrm{NEMS}=\frac{\mathrm{LGD}+1}{n(n-1)}\), G( VE) comprised a chain of \(\mathrm{LGD}+1\) nodes, and \(n-( {\mathrm{LGD}+1})\) components each have one node. When \(\mathrm{NEMS}=\frac{n-2}{n}\), G( VE) comprised a clique of \(n-1\) nodes and another component of one node; and

    3. (iii)

      \(\mathrm{LGD}=0. \quad \mathrm{NEMS}=0\) per definition.

The desired system property in the recognition problem is \(\mathrm{NEMS}\le c\), where \(0\le c\le 1\). This property is nontrivial since it is true for infinitely many systems and false for infinitely many systems for given c. \(\mathrm{NEMS}\le 1\) is hereditary since the maximum of \(\mathrm{NEMS}\) is one. \(\mathrm{NEMS}\le 0\) if and only if G( VE) does not have any edge; any subsystem of G( VE) does not have any edge. \(\mathrm{NEMS}\le 0\) is also hereditary. When \(0<c<1\), \(\mathrm{NEMS}\le c\) is nonhereditary. The proof is as follows:

For any value of c, \(0<c<1\), G( VE) can be constructed such that it is a chain system and \(\mathrm{NEMS}=\frac{1}{n-1}\le c<1\), \(n\ge 4\). A subsystem of G( VE), which has two nodes and one edge that connects the two nodes, may be produced by removing nodes on either end of the chain until only two nodes are left. For this subsystem, \(\mathrm{NEMS}=1>c\). \(\mathrm{NEMS}\le c\), \(0<c<1\), is therefore nonhereditary.

This concludes the proof of Proposition 2. \(\square \)
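As a worked instance of the nonhereditary argument (the values \(n=5\) and \(c=0.3\) are illustrative choices, not taken from the analysis above), consider a chain of five nodes and the two-node subsystem obtained by removing end nodes:

$$\begin{aligned} \mathrm{NEMS}_{\mathrm{chain}}&=\frac{1}{n-1}=\frac{1}{4}=0.25\le c=0.3,\\ \mathrm{NEMS}_{\mathrm{subsystem}}&=\frac{\frac{2-1}{1}+\frac{2-1}{1}}{2( {2-1})}=1>c=0.3. \end{aligned}$$

The chain satisfies \(\mathrm{NEMS}\le c\) while one of its subsystems does not, which is exactly what nonhereditary means here.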

Table 3 \(\mathrm{NLCO}\) of G( VE) with total n nodes and \(n_f \) fully connected nodes

Proposition 2 reveals the diffusion speed of G( VE) (\(n\ge 3)\):

  1. (a)

    If G( VE) is a clique (i.e., \(n_f =n\)), \(\mathrm{NEMS}=1\). A clique has the highest diffusion speed among all system structures. Regardless of which source node diffuses entities, all other nodes in G( VE) receive the entities in one step. \(\mathrm{NEMS}\) remains one regardless of which nodes are removed from G( VE). All nodes in G( VE) are equally critical;

  2. (b)

    If G( VE) comprised a single component, \(\mathrm{NEMS}=\frac{1}{\mathrm{LGD}}\). When G( VE) has at least one node that is fully connected (i.e., \(1\le n_f \le n-2)\), G( VE) comprised a single component and \(\mathrm{LGD}=2\); \(\mathrm{NEMS}=0.5\). When G( VE) does not have any node that is fully connected (i.e., \(n_f =0)\), but comprised a chain of n nodes (i.e., \(\mathrm{LGD}=n-1)\), G( VE) comprised a single component and \(\mathrm{NEMS}=\frac{1}{n-1}\). In this case, \(\mathrm{NEMS}\) is strictly monotonically decreasing as n increases. As a chain becomes longer, its diffusion speed decreases;

  3. (c)

    If G( VE) does not have any node that is fully connected (i.e., \(n_f =0)\), and does not comprise a chain of n nodes (i.e., \(1\le \mathrm{LGD}\le n-2)\), \(\mathrm{NEMS}\) is greater than zero and less than one. \(\mathrm{NEMS}\) approaches zero as n increases when G( VE) comprised a chain of \(\mathrm{LGD}+1\) nodes, and \(n-( {\mathrm{LGD}+1})\) fully disconnected nodes. In this case, \(\mathrm{NEMS}=\frac{\mathrm{LGD}+1}{n(n-1)}\). Since \(\mathrm{LGD}\) is bounded by \(n-2\), \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{\mathrm{LGD}+1}{n(n-1)}=0\). This result is intuitively correct. Fully disconnected nodes have the lowest diffusion speed. For a component, a chain structure has the lowest diffusion speed. G( VE) has the lowest diffusion speed if it comprised a chain structure (with minimum component size \(\mathrm{LGD}+1)\) and other fully disconnected nodes. \(\mathrm{NEMS}\) approaches one as n increases when G( VE) comprised a clique of \(n-1\) nodes and a fully disconnected node. In this case, \(\mathrm{NEMS}=\frac{n-2}{n}\) and \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{n-2}{n}=1\). Since a clique has the highest diffusion speed and G( VE) does not have any fully connected node, G( VE) has the highest diffusion speed if it comprised a clique with the maximum order \(n-1\), and a single fully disconnected node;

  4. (d)

    If G( VE) does not have any edge (i.e., \(\mathrm{LGD}=0)\), \(\mathrm{NEMS}=0\); and

  5. (e)

    Compared to the \(\mathrm{NEMS}\) (\(0<\mathrm{NEMS}\le \frac{1}{3})\) of G( VE) that comprised a chain, the \(\mathrm{NEMS}\) (\(\mathrm{NEMS}=0.5\) or 1) of G( VE) that has at least one fully connected node is higher. When G( VE) represents the typical structure of most real-world systems (\(n_f =0\) and \(1\le \mathrm{LGD}\le n-2)\), \(0<\mathrm{NEMS}<1\) and \(\mathrm{NEMS}\) is a useful performance metric to describe how fast entities diffuse in G( VE).
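The values listed above can be checked numerically on small systems. The following is a minimal Python sketch, assuming (as in the proof of Proposition 2) that \(n_{v_i}\) is the order of the component containing \(v_i\), that \(\mathrm{LGD}_{v_i}\) is the largest geodesic distance within that component, and that isolated nodes contribute zero. The helper names `make_graph`, `bfs_distances`, and `nems` are hypothetical and introduced only for illustration.

```python
from collections import deque


def make_graph(nodes, edges):
    """Build an undirected adjacency map {node: set_of_neighbours}."""
    adj = {v: set() for v in nodes}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def nems(adj):
    """Sum (n_vi - 1) / LGD_vi over all nodes, divided by n(n - 1)."""
    n = len(adj)
    if n < 2:
        return 0.0  # assumption: a system with fewer than two nodes scores zero
    total = 0.0
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = set(bfs_distances(adj, start))
        seen |= component
        size = len(component)
        if size == 1:
            continue  # isolated node: contributes zero
        # Largest geodesic distance between any two nodes of this component.
        lgd = max(max(bfs_distances(adj, v).values()) for v in component)
        # Every node of the component contributes (size - 1) / lgd.
        total += size * (size - 1) / lgd
    return total / (n * (n - 1))


# Example systems: a clique, a chain, and a clique plus one isolated node.
clique4 = make_graph(range(4), [(i, j) for i in range(4) for j in range(i + 1, 4)])
chain4 = make_graph(range(4), [(0, 1), (1, 2), (2, 3)])
clique4_plus_isolated = make_graph(
    range(5), [(i, j) for i in range(4) for j in range(i + 1, 4)]
)
print(nems(clique4), nems(chain4), nems(clique4_plus_isolated))
```

Under these assumptions the clique corresponds to case (a), the chain to case (b) with \(\mathrm{NEMS}=\frac{1}{n-1}\), and the clique with one isolated node to the upper bound \(\frac{n-2}{n}\) in case (c).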

Diffusion scale: normalized largest component order

Suppose \(v_i \) is the source node that diffuses entities to other nodes in G( VE). Then \(n_{v_i } -1\) is the number of nodes that receive entities, where \(n_{v_i }\) is the order of the component that contains \(v_i \). Since any node \(v_i \in V\) might be the source node, \(\mathrm{max}( {n_{v_i } })-1\) is the maximum number of nodes that receive entities if there is only one source node. \(\mathrm{max}( {n_{v_i } })\) is the largest component order in G( VE). The normalized largest component order (\(\mathrm{NLCO}\); Eq. (3)) indicates the maximum scale of (a) terrorist or criminal activities by connected terrorists or criminals; and (b) networked computers, communication devices, and sensors.

$$\begin{aligned} \mathrm{NLCO}=\frac{\mathrm{max}( {n_{v_i } })-1}{n-1} \end{aligned}$$
(3)

Proposition 3

Suppose G( VE) has a total of n nodes and \(n_f \) (\(0\le n_f \le n\)) fully connected nodes. Table 3 depicts the \(\mathrm{NLCO}\) of G( VE). \(0\le \mathrm{NLCO}\le 1\). \(\mathrm{NLCO}\le c\), \(0\le c\le 1\), is a nontrivial property of G( VE). \(\mathrm{NLCO}\le c\), \(c=0\) or \(c=1\), is a hereditary property of G( VE). \(\mathrm{NLCO}\le c\), \(0<c<1\), is a nonhereditary property of G( VE).

Proof of Proposition 3

When \(n=1\), it is defined that \(\mathrm{NLCO}=0\).

When \(n=2\), \(\mathrm{NLCO}=\frac{\mathrm{max}( {n_{v_i } })-1}{n-1}=\frac{1-1}{2-1}=0\) if there is no edge; \(\mathrm{NLCO}=\frac{\mathrm{max}( {n_{v_i } })-1}{n-1}=\frac{2-1}{2-1}=1\) if there is one edge.

When \(n\ge 3\),

  1. (a)

    \(n_f =n\) or \(1\le n_f \le n-2\), G( VE) comprised a single component. \(\mathrm{NLCO}=\frac{\mathrm{max}( {n_{v_i } })-1}{n-1}=\frac{n-1}{n-1}=1\);

  2. (b)

    \(n_f \ne n-1\) (see proof in Proposition 1); and

  3. (c)

    \(n_f =0.\) Three different situations can be analyzed: \(\mathrm{LGD}=n-1\), \(1\le \mathrm{LGD}\le n-2\), and \(\mathrm{LGD}=0\).

    1. (i)

      \(\mathrm{LGD}=n-1. \quad n\ge 4\) (see proof in Proposition 1). G( VE) comprised a chain of n nodes. \(\mathrm{NLCO}=\frac{\mathrm{max}( {n_{v_i } })-1}{n-1}=\frac{n-1}{n-1}=1\);

    2. (ii)

      \(1\le \mathrm{LGD}\le n-2.\) The minimum largest component order, \(\mathrm{max}( {n_{v_i } })\), is \(\mathrm{LGD}+1\). \(\mathrm{NLCO}=\frac{\mathrm{max}( {n_{v_i } })-1}{n-1}\ge \frac{( {\mathrm{LGD}+1})-1}{n-1}=\frac{\mathrm{LGD}}{n-1}\ge \frac{1}{n-1}\). Hence \(\mathrm{NLCO}>0\), although the lower bound approaches zero as n increases because \(\mathop {\lim }\nolimits _{n\rightarrow \infty } \frac{1}{n-1}=0\). The maximum of \(\mathrm{max}( {n_{v_i } })\) is n, which is attained when G( VE) comprised a single component. Systems with \(n_f =0\) and \(1\le \mathrm{LGD}\le n-2\) may comprise a single component. For example, G( VE) may comprise a chain of \(n-1\) nodes and another node that is connected to one of the nodes on the chain other than the two end nodes. Therefore, \(\mathrm{NLCO}\le 1\); and

    3. (iii)

      \(\mathrm{LGD}=0. \quad \mathrm{NLCO}=\frac{\mathrm{max}( {n_{v_i } })-1}{n-1}=\frac{1-1}{n-1}=0.\)

The desired system property in the recognition problem is \(\mathrm{NLCO}\le c\), where \(0\le c\le 1\). This property is nontrivial since it is true for infinitely many networks and false for infinitely many networks for a given c. \(\mathrm{NLCO}\le 1\) is hereditary since the maximum of \(\mathrm{NLCO}\) is one. \(\mathrm{NLCO}\le 0\) if and only if G( VE) does not have any edge; any subsystem of G( VE) does not have any edge. \(\mathrm{NLCO}\le 0\) is also hereditary. When \(0<c<1\), \(\mathrm{NLCO}\le c\) is nonhereditary. The proof is as follows:

For any value of c, \(0<c<1\), G( VE) can be constructed such that it includes only one edge and \(\mathrm{NLCO}=\frac{1}{n-1}\le c<1\), \(n\ge 3\). A subsystem of G( VE), which has two nodes and one edge that connects the two nodes, may be produced by removing all nodes without any edge. For this subsystem, \(\mathrm{NLCO}=1>c\). \(\mathrm{NLCO}\le c\), \(0<c<1\), is therefore nonhereditary.

This concludes the proof of Proposition 3. \(\square \)

Proposition 3 reveals the diffusion scale of G( VE) (\(n\ge 3)\):

  1. (a)

    G( VE) comprised a single component if \(n_f >0\) or \(\mathrm{LGD}=n-1\). G( VE) may comprise a single component if \(n_f =0\) and \(\mathrm{LGD}<n-1\). When G( VE) comprised a single component, \(\mathrm{NLCO}=1\);

  2. (b)

    If G( VE) does not have any node that is fully connected (i.e., \(n_f =0)\), and does not comprise a chain of n nodes (i.e., \(1\le \mathrm{LGD}\le n-2)\), \(\mathrm{NLCO}\) is greater than zero and less than or equal to one. \(\mathrm{NLCO}\) approaches zero as n increases if the \(\mathrm{LGD}\) of G( VE) is a constant. For large systems, \(\mathrm{NLCO}\approx 0\) if \(\mathrm{LGD}\le c_1 \); \(c_1 \) is a constant and \(c_1 \ll n\); and

  3. (c)

    If G( VE) does not have any edge (i.e., \(\mathrm{LGD}=0)\), \(\mathrm{NLCO}=\frac{\mathrm{max}( {n_{v_i } })-1}{n-1}=\frac{1-1}{n-1}=0\).
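Eq. (3) and the nonhereditary construction used in the proof of Proposition 3 can be reproduced on a toy system. The sketch below is a minimal Python illustration; the function names `make_graph` and `nlco` and the five-node example are hypothetical, and returning zero for systems with fewer than two nodes extends the \(n=1\) convention stated in the proof.

```python
def make_graph(nodes, edges):
    """Build an undirected adjacency map {node: set_of_neighbours}."""
    adj = {v: set() for v in nodes}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def nlco(adj):
    """Normalized largest component order: (max(n_vi) - 1) / (n - 1)."""
    n = len(adj)
    if n < 2:
        return 0.0  # NLCO is defined as zero when n = 1
    largest = 0
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal collects the component that contains start.
        component = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        largest = max(largest, len(component))
    return (largest - 1) / (n - 1)


# A five-node system with a single edge, and the subsystem that keeps
# only the two nodes joined by that edge.
one_edge = make_graph(range(5), [(0, 1)])
subsystem = make_graph([0, 1], [(0, 1)])
print(nlco(one_edge), nlco(subsystem))
```

With \(c=0.5\), for instance, the five-node system satisfies \(\mathrm{NLCO}\le c\) but its two-node subsystem does not, mirroring the argument in the proof.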

CNI algorithms and complexity

All three performance metrics may be applied to identify critical nodes. For example, the scale and complexity of terrorist attacks are related to the size of terrorist groups; smaller groups are less likely to launch coordinated large-scale attacks. The \(\mathrm{NLCO}\) may be used to identify critical terrorists. For another example, to enhance information security, it is important to slow down the spread of viruses in a computer network. The \(\mathrm{NEMS}\) may be used to identify vulnerable computers. For engineered systems such as the Internet and wireless sensor networks, the significance of using the performance metrics is twofold. First, the performance metrics may be applied to compare different system designs. For instance, the \(\mathrm{NEMS}\) may be used to gauge the speed of information diffusion in a wireless sensor network, and help select the best network designs. Second, the performance metrics may be applied to identify critical nodes. In a computer network, for instance, the \(\mathrm{NEGD}\) may be used to identify critical computers whose removal results in a heterogeneous network, which is vulnerable to targeted attacks [1].

Propositions 1 through 3 summarize the performance of the three metrics over generalized systems, and enable the assessment and comparison of systems of different orders and sizes. Algorithms are needed to apply the three metrics to identify critical nodes.

Nonpolynomial-time CNI algorithms

Given that a maximum of m nodes may be removed from a system of n nodes (\(0<m\le n)\), Algorithm A uses the \(\mathrm{NEGD}\) to identify the most critical nodes (i.e., the global optimal solution) in the system.

figure a

The \(\mathrm{NEGD}\) in Algorithm A may be replaced with \(\mathrm{NEMS}\) or \(\mathrm{NLCO}\) to identify the most critical nodes. Since any performance metric must be calculated for \(\mathop \sum \nolimits _{k=1}^m \binom{n}{k}\) sets (e.g., \(2^n-1\) sets when \(m=n\)), which cannot be completed in polynomial time, the algorithms that identify the most critical nodes require nonpolynomial time. For certificate checking, suppose it is required that \(\mathrm{NEGD}\le c\), where c is a constant and \(c\ge 0\), and a maximum of m nodes may be removed from a system of n nodes (\(0<m\le n\)). It takes O(m) time to verify that k (\(0<k\le m\)) nodes are removed from the system. Calculating \(\mathrm{NEGD}\) requires \(O(n^3)\) time. The certificate checking of \(\mathrm{NEGD}\) is therefore in time \(O(n^3)\). When \(c\ge 1\), \(\mathrm{NEGD}\le c\) is always true since the maximum of \(\mathrm{NEGD}\) is one. CNI cannot be performed if the desired system property is \(\mathrm{NEGD}\le c\), \(c\ge 1\). When \(c=0\), \(\mathrm{NEGD}\le c\) is a nontrivial and hereditary property of G( VE) according to Proposition 1. CNI is therefore NP-complete if the desired system property is \(\mathrm{NEGD}\le 0\). When \(0<c<1\), \(\mathrm{NEGD}\le c\) is a nontrivial and nonhereditary property of G( VE). There might exist polynomial-time CNI algorithms if the desired system property is \(\mathrm{NEGD}\le c\), \(0<c<1\).

Similarly, calculating \(\mathrm{NEMS}\) or \(\mathrm{NLCO}\) requires \(O(n^3)\) time. The certificate checking of \(\mathrm{NEMS}\) or \(\mathrm{NLCO}\) is in time \(O(n^3)\). According to Propositions 2 and 3, both \(\mathrm{NEMS}\le 0\) and \(\mathrm{NLCO}\le 0\) are nontrivial and hereditary; both \(\mathrm{NEMS}\le c\) and \(\mathrm{NLCO}\le c\), \(0<c<1\), are nontrivial and nonhereditary. CNI is NP-complete if the desired system property is \(\mathrm{NEMS}\le 0\) or \(\mathrm{NLCO}\le 0\). There might exist polynomial-time CNI algorithms if the desired system property is \(\mathrm{NEMS}\le c\) or \(\mathrm{NLCO}\le c\), \(0<c<1\).
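The listing of Algorithm A is not reproduced in this text, so the following Python sketch is only an assumed reading of the exhaustive search described in this subsection: evaluate a metric on every subsystem obtained by removing between 1 and m nodes and keep the removal set with the smallest value. The \(\mathrm{NLCO}\) of Eq. (3) stands in for the metric, each subsystem is normalized by its own order (an assumption), ties keep the first set found, and the names `remove_nodes`, `nlco`, and `exhaustive_cni` are hypothetical.

```python
from itertools import combinations
from math import comb


def remove_nodes(adj, removed):
    """Return the subsystem left after deleting the given nodes."""
    removed = set(removed)
    return {v: nbrs - removed for v, nbrs in adj.items() if v not in removed}


def nlco(adj):
    """Normalized largest component order of {node: set_of_neighbours}."""
    n = len(adj)
    if n < 2:
        return 0.0
    largest, seen = 0, set()
    for start in adj:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        largest = max(largest, len(component))
    return (largest - 1) / (n - 1)


def exhaustive_cni(adj, m, metric):
    """Try every removal set of size 1..m; return (best set, value, sets tried)."""
    nodes = list(adj)
    best_set, best_value, evaluated = None, float("inf"), 0
    for k in range(1, m + 1):
        for subset in combinations(nodes, k):
            evaluated += 1
            value = metric(remove_nodes(adj, subset))
            if value < best_value:  # strict comparison keeps the first optimum
                best_set, best_value = subset, value
    return best_set, best_value, evaluated


# Star system: node 0 is linked to nodes 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
best_set, best_value, evaluated = exhaustive_cni(star, 2, nlco)
# The number of evaluated sets equals the binomial sum from the text.
print(best_set, best_value, evaluated == comb(5, 1) + comb(5, 2))
```

The `evaluated` counter makes the source of the nonpolynomial cost visible: it grows as the binomial sum above, independently of how cheaply each metric value is computed.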

Polynomial-time CNI algorithms

To reduce complexity, Algorithm B uses the \(\mathrm{NEGD}\) to identify critical nodes in a system step by step.

figure b

Algorithm B identifies the local optimal solution at each step and removes the node whose removal minimizes the \(\mathrm{NEGD}\). This algorithm does not guarantee the global optimum, which is achieved by the nonpolynomial-time CNI algorithms (e.g., Algorithm A). Two algorithms similar to Algorithm B may be written by replacing the \(\mathrm{NEGD}\) with \(\mathrm{NEMS}\) or \(\mathrm{NLCO}\). Calculating \(\mathrm{NEGD}\), \(\mathrm{NEMS}\), or \(\mathrm{NLCO}\) requires \(O(n^3)\) time. At each step, a performance metric is calculated for \(n+1-k\) different systems. Since there are m steps, the overall complexity of each of the three CNI algorithms with local optima is \(O( {mn^4})\).
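The listing of Algorithm B is likewise not reproduced here. The sketch below is an assumed Python reading of the stepwise procedure just described: at each of m steps, evaluate the metric on every subsystem obtained by removing one remaining node, then remove the node that gives the smallest value. The \(\mathrm{NLCO}\) of Eq. (3) is used as the metric, each subsystem is normalized by its own order, ties are broken by iteration order, and `remove_nodes`, `nlco`, and `greedy_cni` are hypothetical names; all of these are assumptions rather than details confirmed by the source.

```python
def remove_nodes(adj, removed):
    """Return the subsystem left after deleting the given nodes."""
    removed = set(removed)
    return {v: nbrs - removed for v, nbrs in adj.items() if v not in removed}


def nlco(adj):
    """Normalized largest component order of {node: set_of_neighbours}."""
    n = len(adj)
    if n < 2:
        return 0.0
    largest, seen = 0, set()
    for start in adj:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        largest = max(largest, len(component))
    return (largest - 1) / (n - 1)


def greedy_cni(adj, m, metric):
    """Remove up to m nodes one at a time, each time the locally best one."""
    current = {v: set(nbrs) for v, nbrs in adj.items()}
    removed, trace = [], []
    for _ in range(min(m, len(current))):
        best_node, best_value = None, float("inf")
        # Step k evaluates one candidate subsystem per remaining node.
        for v in current:
            value = metric(remove_nodes(current, [v]))
            if value < best_value:
                best_node, best_value = v, value
        current = remove_nodes(current, [best_node])
        removed.append(best_node)
        trace.append(best_value)
    return removed, trace


# Two hubs (0 and 4) joined by an edge; 0 has leaves 1-3, 4 has leaves 5-6.
system = {
    0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0},
    4: {0, 5, 6}, 5: {4}, 6: {4},
}
removed, trace = greedy_cni(system, 2, nlco)
print(removed, trace)
```

The returned `trace` records the metric after each removal, which is the kind of curve plotted against the number of removed nodes when the algorithm is applied to a real network.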

These three algorithms are applied to the network of 9/11 terrorists [18] to identify the critical nodes. Each square (Fig. 2) represents a terrorist and each link represents communications between two terrorists. There are 63 nodes in total and they form a connected network. Note that for coordinated attacks, there cannot be disconnected components. None of the 63 nodes is fully connected (i.e., \(n_f =0\)) and the network is not a chain (i.e., \(\mathrm{LGD}<n-1=62\)); this reflects the structure of many criminal organizations and is designed for practical purposes such as secrecy. Figures 3, 4 and 5 show the results of applying Algorithm B to the 9/11 terrorist network. Detailed results are included in Tables 4, 5 and 6 in the Appendix. These results validate the three propositions. All three performance metrics are between zero and one. Figure 5 also shows that \(\mathrm{NLCO}\le c\), \(0<c<1\), is nonhereditary because the \(\mathrm{NLCO}\) does not decrease monotonically and sometimes increases.

Figures 3 and 4 indicate that both the \(\mathrm{NEGD}\) and \(\mathrm{NEMS}\) are close to zero after almost 30 nodes are removed or neutralized, which is approximately 50% of all terrorists. When \(\mathrm{NEGD}\) is close to zero, most nodes are disconnected, whereas the diffusion speed is almost zero when \(\mathrm{NEMS}\) is close to zero. The terrorist network cannot launch any large-scale attacks after about 30 nodes are removed using Algorithm B, which is an efficient, polynomial-time algorithm. Figure 5 shows that the \(\mathrm{NLCO}\) reaches the minimum after about 30 nodes are removed and then increases slightly before it becomes zero. This indicates that the portion of nodes that form a connected group and may launch terrorist attacks is the smallest after 30 nodes are removed. Removing additional nodes does not further decrease the portion until almost all nodes are removed. Overall, the 9/11 terrorist network is destroyed or neutralized after about 30 critical nodes are removed using Algorithm B.

Fig. 2 September 11 terrorist network (adapted from [18])

Fig. 3 \(\mathrm{NEGD}\) of the 9/11 terrorist network using Algorithm B for CNI

Fig. 4 \(\mathrm{NEMS}\) of the 9/11 terrorist network using Algorithm B for CNI

Fig. 5 \(\mathrm{NLCO}\) of the 9/11 terrorist network using Algorithm B for CNI

Fig. 6 Comparison of \(\mathrm{NEGD}\) with different CNI algorithms (Table 4 in the Appendix)

Fig. 7 Comparison of \(\mathrm{NEMS}\) with different CNI algorithms (Table 5 in the Appendix)

Fig. 8 Comparison of \(\mathrm{NLCO}\) with different CNI algorithms (Table 6 in the Appendix)

Comparative analysis of CNI algorithms

To further validate Algorithm B, three other CNI algorithms (degree-based node removal, betweenness-based node removal, and random node removal) are applied to the 9/11 terrorist network. These three algorithms have been widely used to identify critical nodes in complex systems. The degree-based node removal identifies critical nodes based on their degree [1, 21]. The node degree, \(\theta ({v_i})\), is an indicator of \(v_i\)’s connections with other nodes in a system. Nodes with a larger degree are more critical and are removed or neutralized first. The betweenness-based node removal identifies critical nodes based on their betweenness, which measures the frequency with which a node falls on the shortest paths connecting pairs of other nodes [17]. Betweenness indicates the potential of a node in controlling communications in a system. Nodes with a larger betweenness are more critical and are removed or neutralized first. Equation (4) calculates the betweenness, \(\mathrm{Bet}( {v_k })\), of node \(v_k \), where \(b_{ij} ( {v_k })\) (Eq. (5)) is the proportion of the shortest paths connecting nodes \(v_i \) and \(v_j \) that contain node \(v_k \). \(g_{ij} \) in Eq. (5) is the total number of shortest paths connecting \(v_i \) and \(v_j \), and \(g_{ij} (v_k )\) is the number of shortest paths that connect \(v_i \) and \(v_j \) and contain \(v_k \).

$$\begin{aligned}&\mathrm{Bet}( {v_k })=\mathop \sum \limits _{i=1}^n \mathop \sum \limits _{j=i+1}^n b_{ij} ( {v_k })\end{aligned}$$
(4)
$$\begin{aligned}&b_{ij} ( {v_k })=\frac{1}{g_{ij} }\times g_{ij} ( {v_k }) \end{aligned}$$
(5)
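Eqs. (4) and (5) can be evaluated directly on small systems. The Python sketch below is a brute-force illustration, not Brandes' algorithm: it counts shortest paths with breadth-first search and uses the fact that \(g_{ij}(v_k)\) equals the number of shortest paths from \(v_i\) to \(v_k\) times the number from \(v_k\) to \(v_j\) whenever \(v_k\) lies on a shortest path between them. Following the phrase "pairs of other nodes", pairs in which \(v_k\) is an endpoint are skipped, which is an assumption about how Eq. (4) is meant to be read; the function names are hypothetical.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source: hop distance and number of shortest paths per node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]  # u is a predecessor of w
    return dist, count


def betweenness(adj):
    """Bet(v_k) = sum over pairs i < j of g_ij(v_k) / g_ij, per Eqs. (4)-(5)."""
    nodes = list(adj)
    info = {v: shortest_path_counts(adj, v) for v in nodes}
    bet = {v: 0.0 for v in nodes}
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            dist_i, count_i = info[i]
            if j not in dist_i:
                continue  # no path between i and j: the pair adds nothing
            dist_j, count_j = info[j]
            g_ij = count_i[j]
            for k in nodes:
                if k == i or k == j or k not in dist_i:
                    continue
                # k is on a shortest i-j path iff the distances add up.
                if dist_i[k] + dist_j[k] == dist_i[j]:
                    g_ij_k = count_i[k] * count_j[k]
                    bet[k] += g_ij_k / g_ij
    return bet


# Star system: node 0 is linked to nodes 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ranking = sorted(betweenness(star).items(), key=lambda item: -item[1])
print(ranking)
```

Sorting nodes by decreasing betweenness, as in the last lines, gives the removal order used by betweenness-based node removal if the values are not recomputed after each removal.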

The random node removal randomly selects a node as the most critical node and removes it from the system. The random node removal is expected to have the worst performance in minimizing the three performance metrics. Figures 6, 7, and 8 compare the performance of the four CNI algorithms (Algorithm B, degree-based node removal, betweenness-based node removal, and random node removal) applied to the 9/11 terrorist network. Figure 6 shows that Algorithm B decreases \(\mathrm{NEGD}\) faster than degree-based node removal and random node removal. The betweenness-based node removal has a smaller \(\mathrm{NEGD}\) than Algorithm B when between 2 and 24 nodes are removed (Table 4 in the Appendix). Starting from the 25th node, however, Algorithm B performs better than the betweenness-based node removal, with a smaller \(\mathrm{NEGD}\). Overall, Algorithm B has the best performance among all four CNI algorithms and consistently decreases \(\mathrm{NEGD}\).

Figure 7 shows a similar trend observed in Fig. 6. Algorithm B performs the best in minimizing \(\mathrm{NEMS}\). Figure 8 shows that Algorithm B decreases \(\mathrm{NLCO}\) faster than the random node removal. The degree-based node removal and betweenness-based node removal have smaller \(\mathrm{NLCO}\) than Algorithm B at the beginning, but larger \(\mathrm{NLCO}\) when there are a few nodes left in the system. Overall, Algorithm B should be used to minimize the three performance metrics, whereas degree-based node removal and betweenness-based node removal may be used to decrease \(\mathrm{NLCO}\) more efficiently.

As discussed in Sect. “CNI algorithms and complexity”, the computational complexity of Algorithm B is \(O({mn^4})\), where m is the number of critical nodes and n is the total number of nodes in a system. Computing the degree of a node takes O(n) time, so computing the degree of all n nodes takes \(O({n^2})\) time. Over m removal steps, the complexity of the degree-based node removal is therefore \(O(mn^2)\), which is lower than that of Algorithm B. Using Brandes’ Algorithm [6], calculating the betweenness of a node requires O(ns) time, where s denotes the number of edges in a system. Since there are at most \(\frac{n({n-1})}{2}\) edges, the complexity of calculating betweenness is \(O({n^3})\). The complexity of the betweenness-based node removal is \(O({mn^5})\). The betweenness-based node removal is the second best algorithm among the four CNI algorithms, but requires more computation time compared to Algorithm B. The complexity of the random node removal is O(mn) because the complexity of removing a randomly selected node at each step is O(n).

Conclusions and future research

Three new performance metrics, \(\mathrm{NEGD}\), \(\mathrm{NEMS}\), and \(\mathrm{NLCO}\), are designed to assess a system’s ability to diffuse entities such as information, goods, or diseases. All three metrics are normalized and are between zero and one. The higher their values, the more capable a system is of diffusing entities. Characteristics of the three metrics are analyzed for generalized systems. All three metrics are nontrivial; they are nonhereditary except for extreme cases (e.g., \(\mathrm{NEGD}\le 1\) or \(\mathrm{NEGD}\le 0\)).

These three performance metrics may be used to identify critical nodes in complex systems. Three nonpolynomial algorithms (Sect. “Nonpolynomial-time CNI algorithms”) use the three metrics to identify the most critical nodes (i.e., global optimum). CNI is NP-complete if any of the three metrics is required to be less than or equal to zero. There might exist polynomial-time CNI algorithms if any of the three metrics is required to be less than or equal to a constant that is between but exclusive of zero and one. In Sect. “Polynomial-time CNI algorithms”, three polynomial-time algorithms are designed to identify critical nodes step by step (i.e., local optima). These three algorithms with local optima do not guarantee the identification of the global optimum, but their algorithm complexity is \(O( {mn^4})\), which is in class P, where m is the number of critical nodes to be identified and n is the number of nodes in the system (i.e., system order; \(m\le n)\). These polynomial-time algorithms are compared to three other widely used CNI algorithms, including degree-based node removal, betweenness-based node removal, and random node removal. The polynomial-time algorithms developed in this article have the best performance.

CNI is important in controlling complex systems with limited resources. Future research may focus on four areas:

  1. (a)

    Study the relationship between the three performance metrics and determine whether they can be integrated or additional metrics are needed to assess desired properties of complex systems;

  2. (b)

    Apply the six algorithms to other real-world complex systems to further validate and compare their performance and complexity;

  3. (c)

    Design other exact or heuristic optimization algorithms to identify critical nodes. Since the three performance metrics are nontrivial but nonhereditary properties in most cases, there might exist exact optimization algorithms that belong to class P; and

  4. (d)

    Integrate the DNA and the performance metrics and algorithms developed in this research, and apply them to systems whose properties, e.g., topology or structure, change over time.