Identification and Quantification of Node Criticality through EWM–TOPSIS: A Study of Hong Kong’s MTR System

Public transport networks (PTNs) are critical in populous and rapidly densifying cities such as Hong Kong, Beijing, Shanghai, Mumbai, and Tokyo. Public transportation plays an indispensable role in urban resilience, and its network structure is integrated, complex, and dynamically changing. Consequently, identifying and quantifying node criticality in complex PTNs is of great practical significance for improving network robustness against damage. Although various node criticality criteria have been proposed to address this problem, few of them capture node criticality from multiple perspectives at once. Therefore, this paper presents an efficient and thorough ranking method that combines the entropy weight method (EWM) with the technique for order preference by similarity to an ideal solution (TOPSIS), named EWM–TOPSIS, to evaluate node criticality by taking into account various node features in complex networks. We then demonstrate it on the Mass Transit Railway (MTR) in Hong Kong by removing and recovering the top k critical nodes in descending order, comparing the effectiveness of degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), and the proposed EWM–TOPSIS method. Four evaluation indicators, namely the frequency of nodes with the same ranking (F), the global network efficiency (E), the size of the largest connected component (LCC), and the average path length (APL), are computed to compare the performance of the four methods and to measure network robustness under different designed attack and recovery strategies. The results demonstrate that the EWM–TOPSIS method has clearer advantages than the others, especially in the early stage of an attack.


Introduction
Public transport networks (PTNs), including buses, trolleybuses, trams or light rail, rapid transit (i.e., metro, subway, underground), and ferries, are the backbone of urbanization. Urbanization, especially in coastal regions, has exacerbated the vulnerability of PTNs to disruptions. Disruptions ranging from natural hazards such as super typhoons or flooding to man-made events such as accidents, terrorism, and social unrest can induce cascading failures in PTNs, and the cascading failure of highly connected and interrelated components can cause significant damage to the whole network. As a result, identifying the vulnerability of each PTN component to disruptive events has recently attracted considerable research interest.
Critical nodes affect the structure and function of complex networks more significantly than other nodes [1]; a node's criticality reflects both its control over structural connectivity and its contribution to the functional operability of the system. Consequently, identifying the small fraction of critical nodes is of considerable significance for analyzing network vulnerability and fragility under disruptive events [2][3][4][5]. Existing measures, including degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), eccentricity, PageRank, and subgraph centrality, have been widely used to measure node criticality in complex networks. For example, Yin et al. [6] addressed the importance evaluation problem by establishing betweenness-based indexes of stations, edges, and lines based on passenger flow, shortest paths, and k shortest paths. In empirical studies of Shanghai's subway system, Zhang et al. [7] showed that the system is more vulnerable to targeted attacks and that removing high-betweenness nodes causes greater damage to the whole network. Psaltoglou and Calle [8] provided a methodology for identifying critical nodes in urban PTNs over time by comparing degree centrality and betweenness centrality. Kanwar et al. [9] compared the Delhi Metro network (DMop) and its extension (DMext) under random attacks, targeted high-degree and high-betweenness node attacks, and edge-based attacks to study network vulnerability.
Although many studies have used a single criterion, such as degree centrality (DC), betweenness centrality (BC) [10], or closeness centrality (CC), to evaluate node criticality, each captures only one aspect of it. DC [11] is the number of neighbors connected to a node, which indicates its local influence on the network structure. However, as a simple measure, it only counts a node's neighbors and ignores how critical those neighbors are, so its ability to discriminate between nodes is limited; moreover, nodes with the same degree can play very different roles in a complex network [10]. BC only accounts for how often a node lies on the shortest paths between other node pairs [12], and CC breaks down when the network contains disconnected components. Both BC and CC are also computationally expensive, since they require shortest paths between all pairs of nodes.
Considering various node features, this paper presents a comprehensive method that can effectively identify critical nodes in PTNs. The method combines the entropy weight method (EWM) [13] and the technique for order preference by similarity to an ideal solution (TOPSIS), both of which are widely used in other research areas. EWM is a commonly used weighting method that uses entropy to characterize the amount of information each criterion provides in decision-making [14]. TOPSIS is a multicriteria decision-making approach proposed by Hwang and Yoon in 1981 and further developed by Yoon in 1987 and Hwang et al. in 1993 [15]. It is based on the concept that the best alternative should have the shortest geometric distance to the positive ideal solution (PIS) $A^+$ and the longest distance to the negative ideal solution (NIS) $A^-$. TOPSIS has been applied to identify critical elements in various fields, such as the coal industry [16], infrastructure resources [17], security risk [18], social networks [19], and public transportation systems [20]. However, classic TOPSIS assigns equal weights to all criteria and thus ignores the different roles the criteria play [21].
In this paper, EWM is used to calculate the weight of each criterion, removing the equal-weight limitation of TOPSIS. The combined method improves assessment accuracy and avoids the one-sidedness of relying on a single criterion. Structural data from the Mass Transit Railway (MTR) network in Hong Kong are used to illustrate the effectiveness of the proposed method. Four evaluation indicators, namely the frequency of nodes with the same ranking (F), global network efficiency (E), the size of the largest connected component (LCC), and the average path length (APL), are adopted to measure network robustness before and after removing or recovering the top k nodes ranked by the different methods. The results indicate the superiority of the proposed method.
The remainder of this paper is structured as follows. Section 2 describes the node properties, existing centrality criteria, and the proposed method, and presents an empirical analysis of the MTR network in Hong Kong. Section 3 quantifies the effectiveness of the proposed method by evaluating network robustness after node removal and network recovery speed under different designed sequences. Section 4 presents the discussion, conclusions, and further work.

Methods
Considering the problems of using a single criterion to evaluate the node criticality, this paper adopts a weighting method, namely EWM, and a multicriteria decision analysis method, namely TOPSIS, which incorporate different centrality measures for node criticality evaluation.

Centrality Measures
A network can be represented as a graph that contains nodes and edges with attributes [22]. A PTN can be denoted as an undirected weighted network $G = (V, E, W)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of stations and $E = \{e_{ij}\}$ ($i \neq j$; $i, j = 1, 2, \ldots, n$) is the set of connections between stations [23]. $W$ denotes the weight set, whose elements are the travel times between adjacent stations. In practice, passengers prefer to spend the minimum time traveling. Hence, in this paper, the shortest path length $d_{ij}$ is defined as the minimum travel time from $v_i$ to $v_j$:
$$d_{ij} = \min_{P \in \mathcal{P}_{ij}} \sum_{e \in P} w_e,$$
where $\mathcal{P}_{ij}$ is the set of paths between $v_i$ and $v_j$ and $w_e \in W$ is the travel time of edge $e$. $A = \{a_{ij}\}_{n \times n}$ is the adjacency matrix, where $a_{ij} = 1$ if there is a connection between node $i$ and node $j$, and 0 otherwise. The degree of node $i$ is $k_i = \sum_{j} a_{ij}$, and the degree centrality can be denoted as
$$DC_i = \frac{k_i}{n - 1}.$$
Generally speaking, a node's influence is significantly associated with its capacity and its surrounding neighbors [24]. Therefore, degree centrality (DC) is used to evaluate the ability of a node to transmit information to others: the greater the value, the more critical the node.
Urban Rail Transit (2021) 7(3):226-239
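The travel-time shortest paths and degree centrality defined above can be sketched in Python as follows. This is a minimal sketch, not the authors' MATLAB code: it assumes the network is stored as a dictionary mapping each station to its neighbours and the travel times to them, computes $d_{ij}$ with Dijkstra's algorithm, and uses the normalized form $DC_i = k_i/(n-1)$ (an assumption); the toy stations A to D and their travel times are hypothetical.

```python
import heapq
from typing import Dict, Hashable

# Hypothetical representation: station -> {neighbouring station: travel time}
Graph = Dict[Hashable, Dict[Hashable, float]]


def add_edge(g: Graph, u: Hashable, v: Hashable, travel_time: float) -> None:
    """Add an undirected edge e_ij weighted by its travel time."""
    g.setdefault(u, {})[v] = travel_time
    g.setdefault(v, {})[u] = travel_time


def shortest_travel_times(g: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Dijkstra: d_ij = minimum total travel time from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def degree_centrality(g: Graph) -> Dict[Hashable, float]:
    """DC_i = k_i / (n - 1), where k_i = sum_j a_ij is the number of adjacent stations."""
    n = len(g)
    return {i: len(neighbours) / (n - 1) for i, neighbours in g.items()}


# Toy network with hypothetical travel times (minutes)
g: Graph = {}
for u, v, t in [("A", "B", 2), ("B", "C", 3), ("A", "C", 6), ("C", "D", 1)]:
    add_edge(g, u, v, t)

print(shortest_travel_times(g, "A"))  # {'A': 0.0, 'B': 2.0, 'C': 5.0, 'D': 6.0}
print(degree_centrality(g)["C"])      # 1.0
```

Here the direct A to C link (6 minutes) loses to the transfer via B (5 minutes), which is exactly the minimum-travel-time behaviour the definition of $d_{ij}$ asks for.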
Betweenness centrality (BC) measures node or edge criticality in terms of structural centrality [25]. It reflects the ability of a node to control the flow through the whole network, such as passenger flow along shortest paths in PTNs. For an undirected network, it can be expressed as
$$B_i = \sum_{\substack{s < t \\ s, t \neq i}} \frac{\sigma_{st}(i)}{\sigma_{st}},$$
where $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(i)$ is the number of those paths that pass through node $i$. $B_i$ thus sums, over all node pairs, the fraction of shortest paths that pass through node $i$.
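A minimal sketch of how $B_i$ can be computed on the travel-time-weighted network uses Brandes' algorithm, a standard technique for betweenness; the source does not state which algorithm the authors used. The sketch assumes strictly positive travel times and counts each unordered pair $\{s, t\}$ once, matching the sum over $s < t$. The square test network is hypothetical.

```python
import heapq
from typing import Dict, Hashable, List

Graph = Dict[Hashable, Dict[Hashable, float]]  # station -> {neighbour: travel time}


def betweenness_centrality(g: Graph) -> Dict[Hashable, float]:
    """Brandes' algorithm: B_i = sum over pairs s < t (s, t != i) of sigma_st(i) / sigma_st."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        order: List[Hashable] = []                      # nodes in non-decreasing distance from s
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in g}
        sigma = dict.fromkeys(g, 0.0)                   # number of shortest s-v paths
        sigma[s] = 1.0
        settled: Dict[Hashable, float] = {}
        tentative = {s: 0.0}
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in settled:
                continue                                # stale heap entry
            settled[v] = d
            order.append(v)
            for w, travel_time in g[v].items():
                if w in settled:
                    continue
                nd = d + travel_time
                if nd < tentative.get(w, float("inf")):
                    tentative[w] = nd                   # strictly shorter path found
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == tentative[w]:
                    sigma[w] += sigma[v]                # another equally short path
                    preds[w].append(v)
        delta = dict.fromkeys(g, 0.0)                   # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each unordered pair was counted from both endpoints
    return {v: b / 2.0 for v, b in bc.items()}


square: Graph = {
    "A": {"B": 1, "D": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"C": 1, "A": 1},
}
print(betweenness_centrality(square))  # {'A': 0.5, 'B': 0.5, 'C': 0.5, 'D': 0.5}
```

In the four-station loop, the pair (A, C) has two equally short paths, one via B and one via D, so each intermediate station receives half of that pair's flow.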
In a connected PTN, the closeness centrality (CC) of a node is the reciprocal of the average shortest path length from that node to all other nodes [26]. It can be regarded as a measure of how long it takes passengers to travel from a given node to the other reachable nodes in the network [27]:
$$CC_i = \frac{n - 1}{\sum_{j \neq i} d_{ij}},$$
where $d_{ij}$ is the shortest path length (minimum travel time) between nodes $i$ and $j$, so that $1/CC_i$ is the average travel time from $v_i$ to the other stations. The greater the value, the more efficiently the node reaches the rest of the network. Although these criteria evaluate node criticality from different perspectives, each reflects only a single characteristic of the nodes. Hence, a unified criterion is needed to evaluate node criticality from diverse perspectives. To address this problem, this paper proposes a method that combines the centrality criteria above through EWM and TOPSIS to assess node criticality in undirected networks such as PTNs.
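Closeness centrality can be sketched on the same dictionary-based representation. This sketch assumes the normalized form $CC_i = (n-1)/\sum_{j \neq i} d_{ij}$ and raises an error when some station is unreachable, reflecting the point that CC fails on a disconnected network; the toy travel times are hypothetical.

```python
import heapq
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]  # station -> {neighbour: travel time}


def shortest_travel_times(g: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Dijkstra shortest travel times from source to every reachable node."""
    dist, heap = {source: 0.0}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def closeness_centrality(g: Graph) -> Dict[Hashable, float]:
    """CC_i = (n - 1) / sum_j d_ij; only defined when every node is reachable."""
    n = len(g)
    cc = {}
    for i in g:
        dist = shortest_travel_times(g, i)
        if len(dist) < n:
            raise ValueError(f"node {i!r} cannot reach every node; CC is undefined")
        cc[i] = (n - 1) / sum(dist.values())  # d_ii = 0 adds nothing to the sum
    return cc


g: Graph = {"A": {"B": 2, "C": 6}, "B": {"A": 2, "C": 3},
            "C": {"B": 3, "A": 6, "D": 1}, "D": {"C": 1}}
cc = closeness_centrality(g)
print(cc["C"])  # 0.3333333333333333 (average travel time of 3 minutes)
```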

Proposed Method
Step 1: Initialize the original matrix. Let $V = \{v_1, v_2, \ldots, v_n\}$ be the set of nodes of a graph-based network, and let $c_j$ ($j = 1, 2, 3$) be the three centrality measures DC, BC, and CC, respectively. Then $x_{ij} = v_i(c_j)$ ($i = 1, 2, \ldots, n$; $j = 1, 2, 3$) is the value of the $j$th centrality measure for the $i$th node, and the original matrix is
$$X = (x_{ij})_{n \times 3}.$$
To eliminate dimensional differences among the centrality measures, the original matrix is standardized. All three measures are benefit criteria, i.e., the higher the measure, the more important the node:
$$r_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}},$$
where $x_j^{\max} = \max_i x_{ij}$ and $x_j^{\min} = \min_i x_{ij}$. The standardized matrix is therefore $R = (r_{ij})_{n \times 3}$.
Step 2: Calculate the objective weight of each criterion. Following Shannon's entropy, the proportion of node $i$ under criterion $j$ is $p_{ij} = r_{ij} / \sum_{k=1}^{n} r_{kj}$. EWM assigns each criterion a weight according to how much information it provides. The entropy of criterion $j$ is
$$E_j = -K \sum_{i=1}^{n} p_{ij} \ln p_{ij},$$
where $K = 1/\ln n$ and, by convention, $p_{ij} \ln p_{ij} = 0$ when $p_{ij} = 0$. The weighting coefficient of the $j$th criterion is then
$$w_j = \frac{1 - E_j}{\sum_{k=1}^{3} (1 - E_k)}.$$
Step 3: Weight the standardized matrix. Multiplying each column of the standardized matrix by the corresponding weight yields the weighted matrix
$$Y = (y_{ij})_{n \times 3}, \quad y_{ij} = w_j r_{ij}.$$
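Steps 1 to 3 can be sketched in plain Python as follows. This is a minimal sketch, not the authors' MATLAB implementation: it assumes min-max scaling for the standardization step (the exact standardization formula is an assumption here), and it treats a criterion whose standardized values sum to zero as carrying no information (entropy 1). The matrix values in the usage example are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows: nodes v_i; columns: criteria c_j (DC, BC, CC)


def standardize(x: Matrix) -> Matrix:
    """Step 1: min-max scale each benefit criterion to [0, 1]."""
    n, m = len(x), len(x[0])
    r = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [x[i][j] for i in range(n)]
        lo, hi = min(col), max(col)
        for i in range(n):
            r[i][j] = (x[i][j] - lo) / (hi - lo) if hi > lo else 0.0
    return r


def entropy_weights(r: Matrix) -> List[float]:
    """Step 2: E_j = -K sum_i p_ij ln p_ij with K = 1/ln n; w_j = (1 - E_j) / sum_k (1 - E_k)."""
    n, m = len(r), len(r[0])
    k = 1.0 / math.log(n)
    entropies = []
    for j in range(m):
        total = sum(r[i][j] for i in range(n))
        if total == 0:
            entropies.append(1.0)  # constant criterion: no discriminating information
            continue
        e = 0.0
        for i in range(n):
            p = r[i][j] / total
            if p > 0:  # convention: p ln p = 0 when p = 0
                e -= k * p * math.log(p)
        entropies.append(e)
    divergence = [1.0 - e for e in entropies]
    return [d / sum(divergence) for d in divergence]


def weighted_matrix(r: Matrix, w: List[float]) -> Matrix:
    """Step 3: y_ij = w_j * r_ij."""
    return [[w[j] * row[j] for j in range(len(w))] for row in r]


x = [[4, 0.30, 0.12],   # hypothetical DC, BC, CC values for four nodes
     [2, 0.05, 0.10],
     [3, 0.20, 0.11],
     [1, 0.00, 0.08]]
r = standardize(x)
w = entropy_weights(r)
y = weighted_matrix(r, w)
print([round(wj, 3) for wj in w])
```

Criteria whose values are spread unevenly across nodes have lower entropy and therefore receive larger weights, which is the mechanism that replaces TOPSIS's equal weighting.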
Step 4: Calculate the distances to the ideal solutions. Based on the weighted criteria, the positive ideal solution (PIS) $Y^+$ and the negative ideal solution (NIS) $Y^-$ are
$$Y^+ = \{y_1^+, y_2^+, y_3^+\}, \quad y_j^+ = \max_i y_{ij},$$
$$Y^- = \{y_1^-, y_2^-, y_3^-\}, \quad y_j^- = \min_i y_{ij}.$$
The distances from each node to the PIS and the NIS are then
$$S_i^+ = \sqrt{\sum_{j=1}^{3} \left(y_{ij} - y_j^+\right)^2}, \quad S_i^- = \sqrt{\sum_{j=1}^{3} \left(y_{ij} - y_j^-\right)^2}.$$
Step 5: Calculate the relative closeness degree. The closer a node is to the PIS and the farther it is from the NIS, the more important it is. Accordingly, the relative closeness to the ideal solution measures a node's importance and criticality:
$$C_i = \frac{S_i^-}{S_i^+ + S_i^-}, \quad 0 \le C_i \le 1.$$
Step 6: Rank node importance. The vector of node importance is $Q = (C_1, C_2, \ldots, C_n)$, and the nodes are ranked by sorting $Q$ in descending order. Based on this analysis, the algorithm to rank node importance in a graph-based network is shown in Algorithm 1. All analyses are executed in MATLAB.
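Steps 4 to 6 can be sketched as follows, taking a weighted matrix $Y$ as input. This is an illustrative sketch of standard TOPSIS scoring (column-wise maxima as the PIS, minima as the NIS, Euclidean distances, and closeness $C_i = S_i^- / (S_i^+ + S_i^-)$), not the authors' code; the node labels and matrix values are hypothetical.

```python
import math
from typing import Hashable, List, Sequence, Tuple

Matrix = List[List[float]]  # weighted matrix Y: rows are nodes, columns are criteria


def topsis_closeness(y: Matrix) -> List[float]:
    """Steps 4-5: distances S_i^+, S_i^- to the PIS/NIS and C_i = S_i^- / (S_i^+ + S_i^-)."""
    m = len(y[0])
    pis = [max(row[j] for row in y) for j in range(m)]  # Y^+: best value of each criterion
    nis = [min(row[j] for row in y) for j in range(m)]  # Y^-: worst value of each criterion
    closeness = []
    for row in y:
        s_plus = math.sqrt(sum((row[j] - pis[j]) ** 2 for j in range(m)))
        s_minus = math.sqrt(sum((row[j] - nis[j]) ** 2 for j in range(m)))
        total = s_plus + s_minus
        # If all nodes are identical, PIS == NIS and closeness is undefined; use 0 here
        closeness.append(s_minus / total if total > 0 else 0.0)
    return closeness


def rank_nodes(nodes: Sequence[Hashable],
               closeness: Sequence[float]) -> List[Tuple[Hashable, float]]:
    """Step 6: sort Q = (C_1, ..., C_n) in descending order."""
    return sorted(zip(nodes, closeness), key=lambda pair: pair[1], reverse=True)


y = [[0.30, 0.20],   # hypothetical node "a": best on both criteria
     [0.00, 0.00],   # hypothetical node "b": worst on both
     [0.15, 0.10]]   # hypothetical node "c": exactly halfway
ranking = rank_nodes(["a", "b", "c"], topsis_closeness(y))
print([node for node, _ in ranking])  # ['a', 'c', 'b']
```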

Materials
To validate the feasibility and effectiveness of the proposed method, a case study is conducted on the MTR network in Hong Kong. The MTR, Hong Kong's rapid transit system, consists of 11 lines serving the urbanized areas of Hong Kong Island, Kowloon, and the New Territories. This paper computes the basic topological features of the Hong Kong MTR network, which contains 95 nodes and 102 edges. The number of nodes, number of edges, maximum degree ($D_{\max}$), minimum degree ($D_{\min}$), average degree ($D_{\mathrm{avg}}$), clustering coefficient ($C$), and average betweenness ($AB$) are presented in Table 1. Node betweenness here is the number of shortest paths passing through a node across the whole network, and AB, the average of this quantity over all nodes, indicates the average transport capacity of the nodes.
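Several of the basic features listed in Table 1, together with the degree distribution, follow directly from an edge list. The sketch below is illustrative only: it assumes every station appears in at least one edge, and the toy edge list is hypothetical rather than MTR data. Because each undirected edge adds 1 to the degrees of two nodes, $D_{\mathrm{avg}} = 2m/n$, so 95 nodes and 102 edges imply an average degree of about 2.15.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def degree_summary(edges: List[Tuple[Hashable, Hashable]]) -> Dict[str, object]:
    """Basic topological features of an undirected, simple network given as an edge list."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1  # each undirected edge raises the degree of both endpoints
        degree[v] += 1
    n, m = len(degree), len(edges)
    counts = Counter(degree.values())
    return {
        "nodes": n,
        "edges": m,
        "D_max": max(degree.values()),
        "D_min": min(degree.values()),
        "D_avg": 2 * m / n,
        # degree distribution P(k): share of nodes with degree k
        "P(k)": {k: c / n for k, c in sorted(counts.items())},
    }


toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]
print(degree_summary(toy_edges))
# {'nodes': 5, 'edges': 4, 'D_max': 3, 'D_min': 1, 'D_avg': 1.6, 'P(k)': {1: 0.6, 2: 0.2, 3: 0.2}}
print(round(2 * 102 / 95, 2))  # 2.15
```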
The degree distribution of the Hong Kong MTR network is shown in Fig. 2. More than 60% of the nodes have degree 2, whereas fewer than 10% have degree 4. The maximum degree is 4 and the minimum degree is 1. Particularly, underground pedestrian passages serve as …
Algorithm 1 (excerpt)
(iii) Based on each weight, the weighted decision matrix Y is generated.
(2) Calculate the positive ideal solution (PIS) $Y^+$ and the negative ideal solution (NIS) $Y^-$, and the distances $S_i^+$ and $S_i^-$ to $Y^+$ and $Y^-$ through Eqs. (10) and (11), respectively.
(3) Rank the node importance in descending order through Eq. (14), and obtain the results.

Quantify the Effectiveness of the Proposed Method
Using real and synthetic PTNs, we examine the effectiveness of the proposed method under random attacks, targeted attacks, and recovery. For targeted attacks, we remove nodes in descending order of the rankings produced by degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), and the proposed EWM-TOPSIS method, respectively. For network recovery, we compare the recovery speed achieved by sequences generated with the same methods after random attacks that remove 30% of the nodes. Random attacks serve as the benchmark. Four evaluation indicators, namely the frequency of nodes with the same ranking (F) in the initial ranking, the global network efficiency (E), the size of the largest connected component (LCC), and the average path length (APL), are then calculated to measure the performance of the proposed method.

Evaluation Indicators
Global efficiency (E). Global efficiency, developed by Latora and Marchiori [28], overcomes a limitation of the giant-component size as a robustness measure and remains well defined when the network becomes disconnected. It measures how efficiently a network performs before and after disruptions and is calculated through Eq. (16):
$$E = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{d_{ij}},$$
where $d_{ij}$ is the shortest path length between nodes $i$ and $j$ in $G$ and $n$ is the total number of nodes. If nodes $i$ and $j$ are disconnected, $d_{ij} = \infty$ and the pair contributes $1/d_{ij} = 0$. The higher the value of $E$, the more efficient the network. The ratio of network efficiency after and before node removal, $\lambda(k)$, is adopted to evaluate the effect of removing nodes on network connectivity:
$$\lambda(k) = \frac{E_k}{E_0},$$
where $E_0$ denotes the network efficiency before removing the top k critical nodes and $E_k$ the efficiency after removing the top k nodes at the kth time step. The faster $\lambda(k)$ decreases, the more critical the removed nodes and the more effective the ranking method.
The size of the largest connected component (LCC). In graph theory, a connected component is a subgraph in which any two nodes are joined by a path, and the largest connected component is the connected subgraph with the most nodes. When a fraction of the nodes is removed, the network splits into several connected components. The size of the LCC is used as a robustness indicator to evaluate network connectivity and to compare the proposed method with the other methods. The ratio of the LCC size after and before node removal is
$$S(k) = \frac{LCC_k}{LCC_0},$$
where $LCC_k$ denotes the size of the LCC after removing the top k nodes and $LCC_0$ the initial size of the LCC. The smaller $S(k)$, the better the ranking method.
Average path length (APL). The average path length is a structural robustness indicator that measures how interconnected a network is. It is the average shortest path length over all pairs of nodes in the weighted network; a smaller APL reflects stronger connectivity between the nodes. When a fraction of nodes in the network fails, the APL changes. It is calculated as
$$l = \frac{1}{n(n-1)} \sum_{i \neq j} d_{ij}.$$
The ratio of the average path length after and before removing the top k critical nodes is
$$L(k) = \frac{l_k}{l_0},$$
where $l_k$ is the APL after removing the top k critical nodes and $l_0$ is the original APL.
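The three connectivity indicators and their before/after ratios can be sketched as below. This is a minimal sketch under stated assumptions: $n$ is taken as the number of nodes remaining after removal, unreachable pairs contribute 0 to the efficiency sum, and, because $d_{ij} = \infty$ would make the average path length infinite once the network splits, the sketch averages $d_{ij}$ over connected pairs only (an assumption, not necessarily the authors' choice). The helper `remove_nodes` and the three-station test network are hypothetical.

```python
import heapq
from typing import Dict, Hashable, Iterable

Graph = Dict[Hashable, Dict[Hashable, float]]  # station -> {neighbour: travel time}


def _travel_times(g: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Dijkstra shortest travel times from source to every reachable node."""
    dist, heap = {source: 0.0}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def global_efficiency(g: Graph) -> float:
    """E = 1 / (n(n-1)) * sum_{i != j} 1/d_ij; unreachable pairs (d_ij = inf) add 0."""
    n = len(g)
    if n < 2:
        return 0.0
    total = sum(1.0 / d for s in g for t, d in _travel_times(g, s).items() if t != s)
    return total / (n * (n - 1))


def average_path_length(g: Graph) -> float:
    """Mean d_ij over ordered pairs i != j that are connected (assumption for split networks)."""
    lengths = [d for s in g for t, d in _travel_times(g, s).items() if t != s]
    return sum(lengths) / len(lengths) if lengths else 0.0


def largest_component_size(g: Graph) -> int:
    """Size of the largest connected component, found by iterative depth-first search."""
    seen, best = set(), 0
    for start in g:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in g[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def remove_nodes(g: Graph, nodes: Iterable[Hashable]) -> Graph:
    """Hypothetical helper: copy of g without the given nodes and their incident edges."""
    gone = set(nodes)
    return {u: {v: w for v, w in nbrs.items() if v not in gone}
            for u, nbrs in g.items() if u not in gone}


path: Graph = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}}
after = remove_nodes(path, ["B"])
eff_ratio = global_efficiency(after) / global_efficiency(path)            # E_k / E_0
lcc_ratio = largest_component_size(after) / largest_component_size(path)  # LCC_k / LCC_0
print(global_efficiency(path), average_path_length(path), eff_ratio, lcc_ratio)
```

Removing the middle station of a three-station line leaves two isolated stations, so the efficiency ratio drops to 0 and the LCC ratio to 1/3.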

Attack Strategies
Different attack strategies are introduced in this section. In this paper, we continuously attack the network by removing the nodes according to the attack protocols designed as follows: (1) The highest degree-based attacks (2) The highest betweenness-based attacks (3) The highest closeness-based attacks (4) The highest EWM-TOPSIS-based attacks (5) Random attacks serve as a benchmark, which refers to removing the nodes randomly, while 20 independent runs are conducted and the ensemble mean values for the LCC, APL, and E are plotted (6) Recovery speed after random attacks in the early stage.
This paper adopts a dynamic attack manner, because it is more damaging than a static attack based on the same ranking method [29]. After each attack, the topological features are recalculated, so that the most critical node in the current network is removed in each round. Nodes with the same ranking are removed in random order, and 20 independent runs are performed to increase statistical significance. Figure 3 shows the removal process in an undirected network with six nodes and the resulting change in network connectivity.
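The dynamic attack procedure can be sketched as follows. In this sketch, the ranking is recomputed on the current network before every removal, ties for the top score are broken at random, and the LCC-ratio curves of independent runs are averaged. The plug-in scorer is plain degree for brevity; DC, BC, CC, or EWM-TOPSIS scores could be passed instead. The function names, the star-shaped test network, and the 20-run default are illustrative assumptions mirroring the text.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]
Scorer = Callable[[Graph], Dict[Hashable, float]]  # node -> criticality score


def degree_scores(g: Graph) -> Dict[Hashable, float]:
    """Example scorer: number of adjacent nodes in the current network."""
    return {u: float(len(nbrs)) for u, nbrs in g.items()}


def lcc_size(g: Graph) -> int:
    """Size of the largest connected component (iterative DFS)."""
    seen, best = set(), 0
    for start in g:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in g[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def dynamic_attack(g: Graph, scorer: Scorer, rounds: int,
                   rng: random.Random) -> Tuple[List[Hashable], List[float]]:
    """Remove the currently most critical node each round; return removals and LCC ratios."""
    g = {u: dict(nbrs) for u, nbrs in g.items()}  # work on a copy
    lcc0 = lcc_size(g)
    removed, s_curve = [], []
    for _ in range(min(rounds, len(g))):
        scores = scorer(g)                        # recalculated after every attack
        best = max(scores.values())
        target = rng.choice([u for u, s in scores.items() if s == best])  # random tie-break
        for v in g[target]:
            del g[v][target]                      # incident edges fail with the node
        del g[target]
        removed.append(target)
        s_curve.append(lcc_size(g) / lcc0)
    return removed, s_curve


def mean_s_curve(g: Graph, scorer: Scorer, rounds: int, runs: int = 20,
                 seed: int = 0) -> List[float]:
    """Ensemble mean of the LCC-ratio curve over independent runs."""
    rng = random.Random(seed)
    curves = [dynamic_attack(g, scorer, rounds, rng)[1] for _ in range(runs)]
    return [sum(vals) / runs for vals in zip(*curves)]


star: Graph = {"H": {"a": 1, "b": 1, "c": 1, "d": 1},
               "a": {"H": 1}, "b": {"H": 1}, "c": {"H": 1}, "d": {"H": 1}}
removed, curve = dynamic_attack(star, degree_scores, rounds=2, rng=random.Random(1))
print(removed[0], curve)  # H [0.2, 0.2]
```

Recomputing scores each round is what distinguishes the dynamic manner from a static one, in which the whole removal order would be fixed from the intact network.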
Nodes in yellow and blue are selected for targeted removal. When a node is removed, the edges connecting it to its neighbors fail as well, and neighboring nodes of degree 1 become isolated. The dashed nodes and edges have been detached from the network and are therefore incapacitated, which changes the LCC, APL, and E.
A higher frequency of nodes with the same ranking makes it harder to rank node criticality effectively. According to the overall ranking results, the DC method yields the highest frequency of tied rankings, reaching 62.11%. Nodes ranked by DC fall into only four categories according to their degree; for example, the top eight nodes share the same ranking, each having four adjacent nodes. This is because DC only counts adjacent nodes from a local perspective and captures little structural information from a global perspective. In contrast, the BC and CC methods have much a … However, the frequency of nodes with the same ranking under the EWM-TOPSIS method is zero. Figure 4 compares the frequency of nodes with the same ranking for the four methods. With the lowest F, the EWM-TOPSIS method discriminates between nodes more finely than the other centrality measures. Although EWM-TOPSIS is computed from DC, BC, and CC, it ranks critical nodes better and with higher resolution than DC, BC, and CC themselves. These ranking results are used as the baseline to quantify network robustness under targeted attacks. Kowloon Tong, Tai Wai, Admiralty, Mei Foo, and Tsim Sha Tsui, the five top-ranked nodes under EWM-TOPSIS, stand out for their network connectivity and transport capacity. Further discussion is given in the next section.
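One way to compute F, reading it as the share of nodes whose score is shared with at least one other node, is sketched below. This reading is an assumption, since the text gives no formula; under it, the 62.11% reported for DC would correspond to 59 of the 95 stations. Scores are rounded before comparison so that floating-point noise does not hide genuine ties; the `decimals` parameter and the degree values are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable


def same_rank_frequency(scores: Dict[Hashable, float], decimals: int = 12) -> float:
    """F: fraction of nodes whose (rounded) score ties with at least one other node."""
    counts = Counter(round(s, decimals) for s in scores.values())
    tied = sum(c for c in counts.values() if c > 1)
    return tied / len(scores)


degrees = {"a": 4, "b": 4, "c": 3, "d": 2, "e": 2, "f": 1}  # hypothetical DC values
print(same_rank_frequency(degrees))  # 0.6666666666666666
print(round(100 * 59 / 95, 2))       # 62.11
```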

Connectivity Analysis
In this section, we examine the effectiveness of the proposed method in terms of network robustness and recovery through the connectivity measures λ(k), S(k), and L(k). The robustness of a complex network refers to its capability to maintain connectivity when a fraction of its nodes or edges fails. First, nodes are removed from the network in descending order according to attack protocols 1-5. Then, λ(k) is recalculated through formula (17) after each round of attack. The lower the value of λ(k), the greater the decrease in network efficiency, indicating higher ranking accuracy. Critical nodes are removed according to the ranking results in a dynamic manner, and the process continues until λ(k) reaches the threshold limit value (TLV). The resulting λ(k) values are plotted as a line graph in Fig. 5 to provide more in-depth insight into the quantitative change in network efficiency.
As can be seen from Fig. 5, λ(k) decreases more under EWM-TOPSIS than under the other methods, especially in the initial stage, which indicates its higher accuracy and efficiency. The EWM-TOPSIS method produces the most significant decline in global network efficiency: E declines from 0.5999 to 0.5269 when Kowloon Tong is removed, a 12.17% drop, and from 0.5999 to 0.2918 (a 51.36% drop) when Kowloon Tong, Tai Wai, and Admiralty are removed, whereas the corresponding drops for DC, BC, and CC are 34.92%, 51.36%, and 26.82%. BC thus performs at the same level as EWM-TOPSIS in the early stage, but EWM-TOPSIS is superior to the BC attack protocol in the later stages. When 69 nodes ranked by EWM-TOPSIS are removed, λ(k) falls below 0.01, which implies the most significant connectivity loss and network paralysis. Under DC, BC, and CC, network efficiency follows a similar downward trend. Figure 6 displays the decrease of the size of the largest connected component, S(k), under the four attack protocols. As can be seen from Fig. 6b, the LCC drops to less than half of its original size after removing only the top three nodes under the EWM-TOPSIS and BC attack protocols, whereas five and nine nodes must be removed to achieve the same effect with DC and CC, respectively. This suggests that CC is the least accurate and effective measure of node criticality. Since the EWM-TOPSIS attack strategy decreases S fastest when a small fraction of nodes is removed and remains highly effective when a large fraction is removed, the proposed EWM-TOPSIS method performs best at quantifying node criticality.
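The percentage drops quoted above are measured relative to the intact network's global efficiency of 0.5999. A quick arithmetic check, with the helper name chosen for illustration:

```python
def percent_drop(before: float, after: float) -> float:
    """Relative decrease in percent, i.e., 100 * (1 - after / before)."""
    return 100.0 * (before - after) / before


print(round(percent_drop(0.5999, 0.5269), 2))  # 12.17 (Kowloon Tong removed)
print(round(percent_drop(0.5999, 0.2918), 2))  # 51.36 (top three nodes removed)
```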
Furthermore, we explore how the average path length (APL) changes under the different attack strategies. As shown in Fig. 7, APL follows an "upward-downward" trend: removing the most critical nodes first causes the failure of edges that are critical to network connectivity, which lengthens the remaining shortest paths, and as more of the top k nodes are removed at the kth time step, the network breaks into small components and APL descends. The smaller the ratio of the average path length after and before removing the top k critical nodes (L), the more accurate the ranking method. The figure shows that the EWM-TOPSIS method proposed in this paper has the strongest effect on network fragmentation, so its effectiveness is further confirmed across different topological features.

Recovery Speed
To further examine the effectiveness of the proposed method, recovery sequences are generated by the different methods after random attacks that remove 30% of the nodes. Additionally, 20 random recovery sequences are generated as a baseline for comparison. Recovery speed is measured by the area between the Y-axis and the recovery curve, defined as the impact area (IA) [24]; a larger IA indicates a less efficient recovery strategy. As shown in Fig. 8, the recovery sequence generated by EWM-TOPSIS has an IA approximately 10% smaller than that of the DC-based sequence, which shows that EWM-TOPSIS helps the network recover from random attacks more efficiently and indicates its superiority for ranking node criticality. In real-world transportation systems, node criticality is intricately correlated with different centrality measures, and a single centrality measure alone cannot deliver the fastest restoration. In summary, the results indicate that the proposed EWM-TOPSIS method is better than degree centrality (DC), betweenness centrality (BC), and closeness centrality (CC) for examining network robustness and, hence, for evaluating node criticality under dynamic attacks; its advantages are most evident in the initial rounds of attack, where it has the highest application value. Overall, BC performs second-best in evaluating node criticality, whereas CC performs worst.
Fig. 5 The decline rate of global network efficiency (λ) with different attack protocols
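The impact area can be approximated numerically from a sampled recovery curve. The sketch below assumes the curve records a performance ratio (for example, the LCC or efficiency ratio) after each recovery step and measures IA as the trapezoidal area between the curve and full performance (1.0), so that slower recovery gives a larger IA, consistent with the text. This is one reading of the source's definition, and both curves are hypothetical.

```python
from typing import Sequence


def impact_area(performance: Sequence[float], step: float = 1.0, full: float = 1.0) -> float:
    """Trapezoidal area between full performance and the recovery curve (larger = slower)."""
    gaps = [full - p for p in performance]
    return sum((a + b) / 2.0 * step for a, b in zip(gaps, gaps[1:]))


fast = [0.2, 0.8, 1.0, 1.0]  # hypothetical curve: most connectivity restored early
slow = [0.2, 0.4, 0.7, 1.0]  # hypothetical curve: gradual restoration
print(impact_area(fast), impact_area(slow))
ratio = impact_area(fast) / impact_area(slow)  # < 1 means the first sequence recovers faster
```

A relative comparison such as "approximately 10% smaller IA" then corresponds to `1 - ratio` being about 0.1 for two recovery sequences.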

Discussion and Conclusions
Prior work has used a single criterion, for example, BC, CC, DC, or PageRank, to evaluate node criticality in complex networks. However, such criteria reflect only one aspect of node criticality. To eliminate this deficiency, we propose a node criticality identification method based on EWM and TOPSIS. Network robustness and recovery simulation experiments on real PTNs show that the proposed method synthesizes the advantages of three single criteria (DC, BC, and CC) and gives more accurate and effective assessment results. An empirical analysis of the MTR in Hong Kong verifies the effectiveness of the proposed method. Four evaluation indicators have been used to quantify the effectiveness of the ranking methods: the frequency of nodes with the same ranking (F), the global network efficiency (E), the size of the largest connected component (LCC), and the average path length (APL). Network connectivity decreases sharply after removing the top three critical nodes in descending order ranked by DC, BC, CC, and EWM-TOPSIS. Since the EWM-TOPSIS method yields the most significant decline in E, LCC, and APL after removing the top k critical nodes in a dynamic attack manner, it is superior to the other three centrality measures in accurately evaluating node criticality. Its superiority is most evident when a small fraction of nodes is removed. Under all ranking methods, the network approaches dysfunction when fewer than 10% of the critical nodes are removed. To further establish the application value of the proposed method, we compare the performance of station recovery sequences generated by the different methods after randomly removing 30% of the nodes. In this case, EWM-TOPSIS tends to generate the most efficient recovery sequences and restores the network to full functionality most effectively.
This paper has proposed a comprehensive method for evaluating node criticality. It takes a multicriteria approach and objectively weights each centrality criterion to overcome the deficiencies of using a single criterion, which can support the planning of highly efficient transport systems. However, some limitations are worth noting. The proposed method is designed for undirected networks, whereas many real PTNs are directed networks in which the connections between nodes are not bidirectional.
Additionally, although the proposed method performs well in evaluating node criticality and provides the necessary analysis results for assessing network topology, there is still room for improvement, including considering the impact of dynamic features (e.g., passenger flow). Future work will, therefore, consider the effect of passenger flow through different stations over time. Furthermore, more tests on real networks will be carried out to generalize the method to other fields.

Data availability
The raw data about the network structure and operation schedule of MTR in Hong Kong are extracted from http://www.mtr.com.hk/en/customer/services/system_map.html.
Code availability
The original code is written by the authors and therefore cannot be shared. The authors would be willing to provide the code to interested parties upon request.

Declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.