Traffic networks are critical infrastructure systems for community activities and economic growth. However, they are vulnerable to natural hazards such as earthquakes, floods, and storm surges. Rapid restoration of the traffic network is crucial for post-hazard recovery of community life, since the network’s performance directly influences the restoration of other community activities. Given a large number of damaged roads and limited recovery resources after a hazard, determining a fast and efficient repair sequence for restoring traffic network service is crucial yet challenging for decision makers.

The term ‘resilience’ is widely used in engineering to describe a system’s ability to withstand interruptions and recover. Rose [1] and Zhang et al. [2] gave a detailed definition of the resilience of traffic networks. Based on this definition, Fig. 1 shows a common illustration of the time-dependent system performance curve before and after a hazard [3,4,5,6]. Before the hazard, the system experiences routine deterioration and maintenance, which largely determine its ability to withstand the hazard. Once the hazard happens, a significant performance drop can be observed. Sequential recovery is conducted after the hazard, labeled the ‘recovery phase’ in Fig. 1. Previous studies have widely used this performance trajectory to quantify system resilience: the area under the trajectory, i.e., the accumulated system performance during the recovery phase (Eq. 1), is a widely accepted resilience index. Following the model of Rose [1] and Zhang et al. [2], many other resilience frameworks have been developed. For example, Renschler et al. proposed the PEOPLES resilience framework, which considers more dimensions of community resilience by integrating over both time and space [7]. Recently, Sharma, Tabandeh, and Gardoni proposed multiple mathematical formulations that borrow concepts from probability theory and can be used for various life-cycle trajectories with different trends and time spans [8]. A resilience quantification for interdependent infrastructure was also introduced [9]. A more detailed review of resilience is given by Koliou et al. [10].

Fig. 1. Illustration of the concept of resilience and maintenance strategy

$$RI={\int }_{{t}_{0}}^{{t}_{r}}p\left(t\right)dt$$

where RI is the resilience index, \({t}_{0}\) is the recovery start time, \({t}_{r}\) is the recovery finish time, and \(p\left(t\right)\) is the system performance at time t.
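The resilience index can be illustrated numerically. The sketch below assumes that system performance is recorded once per repair and stays constant until the next repair finishes (a piecewise-constant trajectory, which is an assumption made here for illustration), so the area under the trajectory becomes a sum of rectangles. The function name and the sample values are hypothetical.

```python
def resilience_index(times, performance):
    """Approximate RI as the area under p(t) between t_0 and t_r.

    times:       non-decreasing time stamps t_0 ... t_r (e.g. in days).
    performance: value of p(t) held on each interval [times[k], times[k+1]),
                 so it has one entry fewer than `times`.
    Assumes p(t) is piecewise constant between records (illustrative choice).
    """
    if len(times) != len(performance) + 1:
        raise ValueError("need exactly one performance value per interval")
    total = 0.0
    for k, p in enumerate(performance):
        dt = times[k + 1] - times[k]
        if dt < 0:
            raise ValueError("times must be non-decreasing")
        total += p * dt
    return total


# Hypothetical recovery: three repairs that take 2, 1 and 7 days.
ri = resilience_index([0, 2, 3, 10], [40.0, 55.0, 70.0])
```

Under this assumption, a sequence that raises performance early accumulates more area, which is why the ordering of repairs matters for the index.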

Building on research into road network performance and resilience quantification [11], finding optimal decisions in road network emergency management is critical for an efficient and resilient recovery process. Existing approaches fall into three main categories: component ranking-based methods, matheuristic optimization methods, and machine learning-based methods. In component ranking-based methods, the recovery sequence is determined by the importance value assigned to each failed component. For example, Tang et al. [12] repaired traffic monitoring sensors using ‘betweenness centrality’ to rank importance. In another example, Aydin et al. [13] used proximate resources, road hierarchy, and required time to rank each road segment. Although component ranking-based methods are computationally efficient, they are not resilience-oriented and cannot consider multiple types of information at the same time. Multiple factors, such as the locations of hospitals, shelters, or schools, should be considered in the resilience framework to promote social justice and equity. Hence, a component ranking-based method cannot be used when multiple factors enter the system performance evaluation. Matheuristic optimization methods, on the other hand, overcome this limitation by using global optimization algorithms such as the genetic algorithm (GA) or Monte Carlo simulation. For example, Zhang et al. successfully used the GA for road-bridge network recovery after a seismic hazard [14]. Mixed integer programming is another widely used method. Sharma, Tabandeh, and Gardoni proposed a multiscale optimization approach that can consider multi-infrastructure interdependence [9]. Although these methods can account for the influence of multiple factors on system resilience as long as the equations are well designed, they need massive sampling and significant computational time. A further challenge is that they must be solved for a specific, known damage situation, which cannot be obtained before the hazard happens, and they cannot utilize real-world post-hazard data. These two limitations make global optimization algorithms unsuitable for fast-response decision-making after a hazard.

In recent years, machine learning-based decision-making techniques have been emerging. For example, Zou and Chen used a deep ensemble-assisted active learning approach to schedule transportation network recovery while considering multiclass users’ travel behavior [15]. Nozhati used a dynamic programming method to find a near-optimal solution [16]. Deep reinforcement learning is seen as the most promising method. Although many studies have demonstrated its ability to tackle optimization problems with high-dimensional decision and state spaces [17,18,19,20], few have used it for emergency management, such as decisions in the recovery process [21]. Additionally, most studies applied deep reinforcement learning to a given situation, where the computation time is long due to the high computational complexity. There is an urgent need to shorten the computation time to achieve fast and smooth decisions in disaster management [22].

To overcome the trade-off between performance and computing efficiency in the above-mentioned road recovery methods, deep reinforcement learning (DRL) is utilized in this study. However, directly applying DRL with an artificial neural network to a traffic network is challenging due to its special graph structure. Hence, a graph convolutional neural network (GCN) based DRL method, named the GCN-DRL model, is proposed to determine the optimal restoration sequence of traffic networks. The benefits of the proposed decision-making framework include: 1) it can be customized with multiple factors such as the location of emergency stations, road damage levels, and different repair times; 2) it utilizes the road network graph structure in the computing process, which does not require manual network embedding; 3) it is a stepwise decision-making method, so the real-world damage situation can be used as new input to the framework even if the provided repair sequence is not strictly followed; and 4) it can provide a pre-trained model, which can be used for a fast response after a new hazard happens. The organization of this study is as follows. Section 2 describes the system performance metric for road networks. The method used in this study for road system performance is based on previous studies, but a customized road system performance model can easily be incorporated. Section 3 describes the novel GCN-DRL decision support model, the framework for model training, and the decision-making process based on the GCN-DRL model. Sections 4 and 5 illustrate the application of the proposed method in two road network case studies. Finally, Section 6 discusses the factors affecting the proposed model and summarizes the major conclusions.

System performance metric

As illustrated in Fig. 1 and Eq. (1), any resilience-informed decision-making requires a quantitative metric of system performance. The system performance metric (p) proposed by Zhang and Wang [23] is used in this study, although any time-dependent performance metric can be used in the proposed framework. The metric is briefly described in this section. The system performance of the road network is quantified by the weighted sum, over all intersections, of the average number of reliable independent pathways; that is, it combines the weight of each intersection with the average number of reliable independent paths through that intersection. The weight of each intersection is determined by its location. The average number of reliable independent pathways is determined by the independent paths, the traffic flows, and the reliability of each road segment (R). The road reliability indicates the road’s damage condition. There are two main differences between the applied metric and the original reference. First, a 1 km threshold is used when computing the weight of an intersection. Second, the traffic volume of each road is ignored. Traffic flow is an important component in resilience quantification [15, 24, 25]; however, for simplicity and because post-hazard traffic data are lacking, this parameter is ignored here. The proposed framework remains applicable, considering that the influence of traffic flow can be represented through the average number of reliable independent pathways of each node.

The weight of each intersection is determined by its distance to the nearest emergency response facility (Eq. 2). The original criterion is modified in this study to avoid an excessively large weight when the shortest distance is much smaller than 1 km (Eq. 3).

$${w}_{i}=\frac{{\Omega }_{i}}{{\sum }_{j=1}^{n}{\Omega }_{j}}$$



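$${\Omega }_{i}=\left\{\begin{array}{ll}\frac{1}{\mathrm{min}\left({{\varvec{D}}}_{{\varvec{i}}}\right)},& \mathrm{min}\left({{\varvec{D}}}_{{\varvec{i}}}\right)\ge 1\ \mathrm{km}\\ 1,& \mathrm{min}\left({{\varvec{D}}}_{{\varvec{i}}}\right)<1\ \mathrm{km}\end{array}\right.$$

where \({w}_{i}\) is the weight of intersection i; \({{\varvec{D}}}_{{\varvec{i}}}\) is the set of distances from intersection i to the pre-defined emergency response facilities; \({\Omega }_{i}\) is the reciprocal of the distance between node i and its nearest emergency response facility. When this distance is less than 1 kilometer, or the intersection itself is an emergency response facility, \({\Omega }_{i}\) equals 1.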
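The weighting rule described above (Eq. 2 together with the 1 km cap on \({\Omega }_{i}\)) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name and the sample distances are hypothetical.

```python
def intersection_weights(distance_sets_km):
    """Normalized intersection weights from facility distances.

    distance_sets_km: for each intersection i, the distances D_i (in km)
    to the emergency response facilities. Omega_i is the reciprocal of the
    shortest distance, capped at 1 when that distance is below 1 km
    (which also covers an intersection that is itself a facility).
    """
    omegas = []
    for distances in distance_sets_km:
        nearest = min(distances)
        omegas.append(1.0 if nearest < 1.0 else 1.0 / nearest)
    total = sum(omegas)
    # Eq. 2: divide each Omega_i by the sum so the weights add up to 1.
    return [omega / total for omega in omegas]


# Hypothetical network: a facility node, a node 2 km away, a node 4 km away.
weights = intersection_weights([[0.0, 3.0], [2.0, 5.0], [4.0, 4.5]])
```

The cap keeps an intersection located a few meters from a facility from dominating the normalized weights.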

At any given time t, the average number of reliable independent pathways of the intersection i is determined by Eq. 4.

$${r}_{i}=\frac{1}{n-1}\sum_{j=1, j\ne i}^{n}\sum\limits_{k=1}^{{K}_{\left(i,j\right)}}{v}_{k}\left(i,j\right){R}_{k}\left(i,j\right)$$

where \({R}_{k}\left(i,j\right)\) is the reliability of the kth independent path; \({v}_{k}\left(i,j\right)\) is the weight of kth independent path.

The average number of reliable independent pathways of an intersection is determined from the reliability and weight of the independent pathways between origin-destination (O-D) pairs. Mathematically, for the kth independent pathway between intersections i and j, the reliability \({R}_{k}\left(i,j\right)\) is given by Eq. 5.

$${R}_{k}\left(i,j\right)=\prod\limits_{\forall l\in {P}_{k}\left(i,j\right)}{R}_{l}$$

The weight of kth independent path through the intersection can be determined by Eq. 6:

$${v}_{k}\left(i,j\right)=\frac{{L}_{\mathrm{max}\left(i,j\right)}}{{L}_{{P}_{k\left(i,j\right)}}\cdot {\sum }_{k=1}^{K\left(i,j\right)}\left(\frac{{L}_{\mathrm{max}\left(i,j\right)}}{{L}_{{P}_{k\left(i,j\right)}}}\right)}\times K\left(i,j\right)$$

where l is a road segment on the kth independent path \({P}_{k}\left(i,j\right)\) and \({R}_{l}\) is its reliability after the hazard; \({K}_{\left(i,j\right)}\), written \(K\left(i,j\right)\) in Eq. 6, is the number of independent paths between nodes i and j; \({L}_{\mathrm{max}\left(i,j\right)}\) is the maximum length among these paths and \({L}_{{P}_{k\left(i,j\right)}}\) is the length of the kth path. \({R}_{k}\left(i,j\right)\) and \({v}_{k}\left(i,j\right)\) are the reliability and the weight of the kth independent path, as in Eq. 4.
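As a small worked sketch of Eqs. 5 and 6 for a single O-D pair, the two helper functions below compute the reliability of one path and the length-based weights of all independent paths. The function names and numbers are hypothetical, and the independent paths are assumed to be already enumerated.

```python
from math import prod


def path_reliability(segment_reliabilities):
    """Eq. 5: product of the reliabilities R_l of the segments on one path."""
    return prod(segment_reliabilities)


def path_weights(path_lengths):
    """Eq. 6: weights v_k of the K independent paths of one O-D pair.

    Each weight is proportional to L_max / L_k, and the weights are scaled
    so that they sum to K (shorter paths receive larger weights).
    """
    k = len(path_lengths)
    l_max = max(path_lengths)
    ratios = [l_max / length for length in path_lengths]
    ratio_sum = sum(ratios)
    return [k * ratio / ratio_sum for ratio in ratios]


# Hypothetical O-D pair with two independent paths of 1 km and 2 km.
r_path = path_reliability([0.9, 0.8])   # two damaged segments in series
v = path_weights([1.0, 2.0])
```

With these inputs the shorter path gets twice the weight of the longer one, and the two weights add up to K = 2.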

With the weight of each intersection and the average number of reliable independent paths through each intersection determined, the simplified system performance metric is given by Eq. 7:

$$p\left(t\right)=\left(\sum_{i=1}^{n}{w}_{i}{r}_{i} \right)\times 100\mathrm{\%}$$

where \(p\left(t\right)\) is the performance of the road network at time t, \({r}_{i}\) is the average number of reliable independent pathways of node i, and \({w}_{i}\) is the importance weight of node i.
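Eqs. 4 and 7 combine the path-level quantities into a single performance value. The sketch below assumes that the weight and reliability of every independent path have already been computed; the data layout (a dictionary keyed by destination) and all names are illustrative assumptions.

```python
def average_reliable_paths(paths_by_destination, n):
    """Eq. 4: r_i for one intersection i.

    paths_by_destination: maps each destination j != i to a list of
    (v_k, R_k) pairs, one per independent path between i and j.
    n: total number of intersections in the network.
    """
    total = 0.0
    for paths in paths_by_destination.values():
        total += sum(v * r for v, r in paths)
    return total / (n - 1)


def system_performance(weights, r_values):
    """Eq. 7: p(t) in percent as the weighted sum of r_i over intersections."""
    return 100.0 * sum(w * r for w, r in zip(weights, r_values))


# Hypothetical intersection 0 in a 3-node network.
r0 = average_reliable_paths(
    {1: [(1.0, 0.9)], 2: [(1.0, 0.5), (1.0, 0.7)]}, n=3
)
p_intact = system_performance([0.5, 0.3, 0.2], [1.0, 1.0, 1.0])
```

Because r_i counts paths, p(t) can exceed 100% in networks where intersections are connected by more than one reliable independent path on average.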

GCN-DRL decision-making framework for resilience road network restoration

The proposed decision-making framework, shown in Fig. 2, contains three main components: the GCN-DRL model, the training process, and the decision-making process. The GCN-DRL model combines a graph convolutional neural network (GCN) with an artificial neural network (ANN). It embeds the current state of the road network and outputs a ranking of the available decisions. The training process follows the conventional DRL training framework, a trial-and-error process during which the parameters of the GCN-DRL model are learned. The decision-making process uses the trained model to determine the repair sequence. The architecture of the GCN-DRL model, the training process, and the decision-making process are explained in the following sections.

Fig. 2. The proposed GCN-DRL model, training process, and decision-making process (m is the predefined training times)

GCN-DRL model architecture

DRL is an advanced machine learning technique that integrates reinforcement learning (RL) and deep learning. RL solves learning problems through trial-and-error search [26], while deep learning allows the DRL agent to make decisions from large unstructured input data without manual intervention. The objective of DRL is to train a ‘deep Q function’ that estimates the reward of each action. Decisions can then be made simply by selecting the action with the largest reward. It is well known that, for a global optimization problem, local optimization often leads to a suboptimal result because future influence is not considered. To overcome this limitation, the reward value that DRL assigns to each action considers both the instant reward and the future reward. Mathematically, the reward value of each action at a given state is determined by Eq. 8 [27].

$${Q}^{*}\left(s,a\right)={\mathbb{E}}\left[\underbrace{(1-\gamma) \cdot R}_{\substack{instant\\ reward}}+\gamma \cdot \underbrace{\max_{a^{\prime}} Q^* {(s^{\prime},a^{\prime})}}_{optimal \; future\;reward}\right]$$

where \({Q}^{*}\left(s,a\right)\) is the reward value of action a when the traffic network state is s; \({\mathbb{E}}[\bullet ]\) denotes the mathematical expectation; R denotes the instant reward, i.e., the instant improvement in system performance after taking action a; \(\max_{a^{\prime}} Q^* {(s^{\prime},a^{\prime})}\) denotes the optimal future reward, where \(s^{\prime}\) is the state of the traffic network in the next step and \(a^{\prime}\) ranges over the actions available in that state; \(\gamma\) is the return discount factor, where 1 means only the future reward is considered and 0 means only the instant reward of the action is considered.
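For a single observed transition, the quantity inside the expectation of Eq. 8 can be computed directly. The sketch below is illustrative only: the function name is hypothetical, and treating the future term as 0 when no action remains is an assumption of this sketch rather than something stated in the text.

```python
def q_target(instant_reward, next_q_values, gamma=0.5):
    """One-sample training target following Eq. 8.

    instant_reward: R, the performance gain of the action just taken.
    next_q_values:  estimated reward values of the actions available in the
                    next state s'. An empty list marks a terminal state
                    (nothing left to repair); the future term is then 0,
                    which is an assumption of this sketch.
    gamma:          discount factor; 0 keeps only the instant reward,
                    1 keeps only the future reward.
    """
    best_future = max(next_q_values) if next_q_values else 0.0
    return (1.0 - gamma) * instant_reward + gamma * best_future


# Hypothetical transition: gain of 4.0 now, best estimated future value 3.0.
target = q_target(4.0, [1.0, 3.0, 2.0], gamma=0.5)
```

Note that the future term uses the largest estimated value among the next actions, not the index of the action that attains it.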

Conventional DRL often uses an artificial neural network (ANN) as its ‘deep Q function’ to estimate the reward value of each action under a given system state. The action with the highest reward is then selected to achieve the globally optimal result (see Section 3.2 for the detailed training process). Figure 3 illustrates the traffic network state considered in this study, which is represented as a graph \(G=(V, E)\). The graph is built from the traffic network structure, and the attribute of each node is its average number of reliable independent pathways (Eq. 4). Because the influence of restoring a road segment propagates along the edges rather than spreading through Euclidean space, a machine learning algorithm that captures a similar pattern is preferred. Hence, the graph convolutional neural network (GCN) is applied as the key component of the proposed GCN-DRL model.

Fig. 3. An example of graph structure characterized by nodes and edges (\({r}_{i}\) is the averaged independent pathway numbers defined in Eq. 4)

The graph neural network is a state-of-the-art neural network that operates directly on graph-structured data. It has been applied in multiple domains with promising results, such as traffic networks, knowledge graphs, and recommendation systems [28]. Previous studies mainly used it for graph classification and prediction tasks, as reviewed by Zhou et al. [28]. Inspired by the conventional convolutional neural network (CNN), the GCN convolves node features along connected edges instead of within a Euclidean space. Hence, the convolution process closely resembles the way the influence of repairing a road segment propagates: both travel along the edges and affect the network nodes.

The detailed architecture of the proposed GCN-DRL model is illustrated in Fig. 2 and also described here. The created GCN-DRL model consists of two blocks (blue areas in Fig. 2), i.e., the GCN block and artificial neural network block. The GCN block transforms the node attribute from 1 dimension (number of independent pathways) to 128 dimensions, which means the output of the GCN block is a graph structure whose nodes have 128 dimensions. Two layers of GCN with sufficient neurons are used to ensure the node and network structure information can be sufficiently extracted during the convolution process. This process is similar to the normal CNN convolutional process except only the data of neighbors are convolved [28]. The graph convolution process is mathematically expressed in Eq. 9.

$${H}^{l+1}=\sigma \left({\widetilde{D}}^{-\frac{1}{2}}\widetilde{A}{\widetilde{D}}^{-\frac{1}{2}}{H}^{l}{W}^{l}\right)$$

where \({H}^{l}\) is the matrix of node features at the \({l}^{th}\) GCN layer, with \({H}^{0}=X\) for \(l=0\). X is the feature matrix of the graph with dimension \(N\times D\), where N is the number of nodes and D is the number of features of each node. \(\widetilde{A}=A+I\), where A describes the graph structure (the adjacency matrix is used in this study) and I is the identity matrix of the same size as A. \(\widetilde{D}\) is the diagonal node degree matrix of \(\widetilde{A}\). \(\sigma \left(\bullet \right)\) denotes the activation function; ReLU is used in this study. \({W}^{l}\) is the weight matrix of the \({l}^{th}\) layer.
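To make the propagation rule of Eq. 9 concrete, the following sketch applies one graph convolution to a three-node path graph using plain Python lists. It is a didactic stand-in for the library layers used in the study; the function name, the toy graph, and the weight values are hypothetical.

```python
def gcn_layer(adjacency, features, weight):
    """One graph convolution: ReLU(D~^-1/2 (A + I) D~^-1/2 H W)."""
    n = len(adjacency)
    # A~ = A + I adds a self-loop to every node.
    a_tilde = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # D~^-1/2 is built from the row sums (node degrees) of A~.
    inv_sqrt_deg = [sum(row) ** -0.5 for row in a_tilde]
    a_hat = [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j]
              for j in range(n)] for i in range(n)]

    def matmul(x, y):
        return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
                 for j in range(len(y[0]))] for i in range(len(x))]

    pre_activation = matmul(matmul(a_hat, features), weight)
    # ReLU activation, as used in the study.
    return [[max(0.0, value) for value in row] for row in pre_activation]


# Toy path graph 0 - 1 - 2 with one feature per node (e.g. r_i),
# mapped to two output features by a hypothetical weight matrix.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [2.0], [3.0]]
weight = [[1.0, -1.0]]
embeddings = gcn_layer(adjacency, features, weight)
```

Each output row mixes a node's own feature with those of its direct neighbors only, which mirrors how the effect of a repair spreads along connected road segments.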

To project the traffic network state into the action space, the convolved values of all nodes are averaged and fed into an ANN with two layers. The last layer (output layer) contains as many neurons as there are actions, so the final output values correspond to the reward values of the actions. Although not shown in the flow chart, a ReLU activation function is used between layers and before the output layer to enhance the nonlinear capacity of the neural network. The output of the ANN is the total reward (including instant and long-term rewards) of each action, from which the action with the highest reward can be selected.
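The readout described above (averaging the node embeddings, then a two-layer network with one output per action) might look like the sketch below. The layer sizes, weights, and function names are hypothetical, and leaving the final outputs linear is an assumption of this sketch.

```python
def mean_pool(node_embeddings):
    """Average the node embeddings into one graph-level vector."""
    n = len(node_embeddings)
    return [sum(column) / n for column in zip(*node_embeddings)]


def dense(vector, weight, bias, relu=True):
    """Fully connected layer; `weight` has shape (inputs x outputs)."""
    out = [sum(v * w for v, w in zip(vector, column)) + b
           for column, b in zip(zip(*weight), bias)]
    return [max(0.0, x) for x in out] if relu else out


def q_values(node_embeddings, hidden, output):
    """Graph vector -> hidden layer (ReLU) -> one value per repair action."""
    graph_vector = mean_pool(node_embeddings)
    h = dense(graph_vector, *hidden)
    # Final outputs are kept linear here (an assumption of this sketch).
    return dense(h, *output, relu=False)


# Toy example: 2 nodes with 2-dimensional embeddings and 3 possible actions.
hidden = ([[1.0, 0.0], [0.0, 1.0]], [0.0, -5.0])
output = ([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]], [0.0, 0.0, 0.5])
q = q_values([[1.0, 3.0], [3.0, 5.0]], hidden, output)
best_action = max(range(len(q)), key=lambda a: q[a])
```

Averaging over nodes makes the graph-level vector independent of node ordering, while the output size ties the model to a fixed set of road segments.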

GCN-DRL model training process

The parameters (weights and biases) of the neurons in the GCN-DRL model are initialized with random values before training. The commonly used DRL training framework is adopted to train the GCN-DRL model. The unique feature of the GCN-DRL model is that the ‘deep Q function’ is the proposed GCN-ANN model described in Section 3.1. The GCN-DRL framework can be trained as an AI agent that picks the actions giving the highest reward. When the reward is set to correspond to system resilience, this yields a sequence of decisions that leads to fast system recovery after a hazard. The detailed process for applying the GCN-DRL model to the post-hazard recovery of a traffic network is as follows.

  (1) The traffic network is initialized with a random damage scenario. Two parameters are also predefined at this stage: the reward discount factor \(\gamma\) in Eq. 8 and the total number of training episodes m, which determines the total number of trials of the GCN-DRL model. A larger number of training episodes may provide better decisions but also takes longer to compute.

  (2) The training process of the GCN-DRL model then begins. The state of the traffic network, represented by the network structure and each node’s average number of reliable independent pathways, is fed into the GCN-DRL model. The output of the GCN-DRL model is the estimated future reward of every repair decision, i.e., the future reward of repairing each road segment. At the beginning, the parameters of the GCN-DRL model are randomly initialized, so the future rewards of the decisions are also random values.

  (3) After the traffic network state is projected into the decision-reward space, the decision is made with the epsilon-greedy policy [29]: the road to repair is selected either by random sampling or by the GCN-DRL model. During training, the probability that the decision is made by the GCN-DRL model gradually increases to 100%.

  (4) After the decision is selected, the traffic network is updated by reopening the chosen road segment. Based on the decision and the updated traffic network state (Eqs. 2, 3, 4, 5, 6), two paths are followed to determine this action’s ‘instant reward’ and ‘future reward’, as shown in Fig. 2. The ‘instant reward’ is determined from the traffic network performance before and after the decision (orange path). The ‘future reward’ is the largest ‘future reward’ value obtained after feeding the updated traffic network state into the GCN-DRL model.

  (5) With the ‘instant reward’ and the ‘future reward’ of the next state determined, the real ‘future reward’ at the current state is computed by Eq. 8. This value is used to train the GCN-DRL model, tuning the parameters (weights and biases) of each neuron.

  (6) During the training process, a list of repaired road segments is recorded. At each time step, the future rewards of the decisions in this list are set to 0.

  (7) Repeating steps (2) to (5) until all road segments are recovered constitutes one episode.

  (8) Once an episode (one full recovery round) is finished, the value of the resilience index is calculated with Eq. 1.

  (9) Regenerate a random damage scenario and feed it to step (1).

  (10) Repeat steps (1) to (7) m times, where m is the predefined number of training episodes.
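The action selection in steps (3) and (6) can be sketched as an epsilon-greedy choice restricted to road segments that have not been repaired yet. This is a simplified illustration: repaired segments are excluded outright instead of having their values set to 0, and the linear exploration schedule is a hypothetical choice, since the text only states that the model's share of decisions gradually reaches 100%.

```python
import random


def select_repair(q_values, repaired, epsilon, rng=random):
    """Epsilon-greedy choice among road segments that still need repair.

    q_values: estimated reward per road segment (index = segment id).
    repaired: set of segment ids already repaired; they are never selected.
    epsilon:  probability of exploring with a uniformly random segment.
    """
    candidates = [a for a in range(len(q_values)) if a not in repaired]
    if not candidates:
        raise ValueError("all road segments are already repaired")
    if rng.random() < epsilon:
        return rng.choice(candidates)                      # explore
    return max(candidates, key=lambda a: q_values[a])      # exploit


def epsilon_schedule(episode, total_episodes):
    """Linearly decaying exploration rate (hypothetical schedule)."""
    return max(0.0, 1.0 - episode / total_episodes)


# With epsilon = 0 the model's best remaining segment is chosen.
choice = select_repair([0.1, 0.9, 0.5], repaired={1}, epsilon=0.0)
```

Passing a seeded `random.Random` instance as `rng` makes training runs reproducible.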

As can be seen, the action reward values estimated by the GCN-DRL model start from random values. As training continues, the reward values used to train the GCN-DRL model get closer to the true values, so the model is expected to make progressively better decisions. After the number of training episodes reaches the predefined number m, the trained GCN-DRL model can be saved for future decision-making. Although the proposed framework is intended for training a decision-making model that can handle any damage scenario, it can also be used to obtain the optimal decisions for a specific damage scenario. To do so, step (9) is replaced by reusing the same initial damage situation in step (1). Training a decision-making model for a specific damage scenario reasonably needs fewer training episodes than training one for arbitrary damage scenarios.

Several hyperparameters are involved in the proposed GCN-DRL model, including the number of graph convolutional layers, the number of neurons in each layer, the learning rate, the optimizer, and the reward discount rate (γ in Eq. 8). In this study, two graph convolutional layers (each containing 128 neurons) are used to give the model enough nonlinear capacity for network embedding. A higher number may increase the model’s nonlinear capacity but will also increase the computation time. A smaller number may reduce the model’s decision-making ability in a high-dimensional state-action space. A learning rate of 0.005 is used, because a larger number of training episodes with a smaller learning rate is preferred to achieve a smoother training process. The ‘Adam’ optimizer is selected. The reward discount value, γ, is set to 0.5 after multiple experiments. It was observed that simply setting it to 0 (only considering instant reward) or 1 (only considering future reward) does not achieve the highest final resilience index. Although these parameters are selected based on multiple experiments, other parameter search methods such as grid search and Bayesian optimization [30] can be considered in engineering applications. Moreover, ‘experience replay’ [31] is used in this study to achieve smooth and stable training. The key idea of experience replay is to train the deep Q agent on a random subset of multiple past trials and their corresponding Q values rather than only on the single most recent action. The agent (GCN-based DRL) is implemented with the Python Deep Graph Library [32] and the PyTorch library [33].
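The idea of experience replay can be illustrated with a small fixed-size buffer from which random mini-batches are drawn. This is a generic sketch, not the authors' code; the class name, capacity, and the stored tuple layout are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of past transitions for experience replay."""

    def __init__(self, capacity):
        # The oldest transitions are dropped once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Random subset of stored transitions (without replacement)."""
        size = min(batch_size, len(self.memory))
        return rng.sample(list(self.memory), size)

    def __len__(self):
        return len(self.memory)


# Store five hypothetical transitions in a buffer that keeps only three.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=0.0,
                next_state=step + 1, done=False)
batch = buffer.sample(2)
```

Sampling at random breaks the correlation between consecutive repair steps, which is what smooths the training signal.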

GCN-DRL for decision making

After the GCN-DRL model is built as described in Section 3.1 and comprehensively trained (Section 3.2), a pre-trained GCN-DRL model is available. Since the model is trained with random damage situations, it can be applied to any new damage scenario to obtain a repair sequence that gives fast system recovery. With the trained GCN-DRL model, the action with the maximum reward value among the possible actions is selected at each step. By repeatedly updating the traffic network state after each action, the final repair sequence is determined step by step.
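The stepwise decision process can be sketched as a loop that queries a value function, repairs the best remaining segment, and updates the state. In the sketch below, `q_function` and `apply_repair` are hypothetical stand-ins for the trained GCN-DRL model and the network update; the toy value function is only there to make the example runnable.

```python
def repair_sequence(initial_state, damaged, q_function, apply_repair):
    """Greedy rollout: repeatedly repair the segment with the highest value.

    q_function(state)         -> mapping from segment id to estimated reward.
    apply_repair(state, seg)  -> new state after segment `seg` is reopened.
    Both callables are placeholders for the trained model and the network
    update, which are not reproduced here.
    """
    state = initial_state
    remaining = set(damaged)
    sequence = []
    while remaining:
        scores = q_function(state)
        # sorted() makes tie-breaking deterministic (lowest id wins).
        best = max(sorted(remaining), key=lambda seg: scores[seg])
        state = apply_repair(state, best)
        remaining.discard(best)
        sequence.append(best)
    return sequence


# Toy stand-ins: the state maps segment id -> reliability, and the value of
# a repair is simply the reliability that would be regained.
toy_q = lambda state: {seg: 1.0 - rel for seg, rel in state.items()}
toy_repair = lambda state, seg: {**state, seg: 1.0}
order = repair_sequence({0: 0.9, 1: 0.2, 2: 0.5}, [0, 1, 2], toy_q, toy_repair)
```

Because the state is re-evaluated after every repair, an observed real-world state can be substituted at any step without restarting the procedure.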

Case study I: Road network recovery sequences post-earthquake

The application of the proposed GCN-DRL decision-making framework is first illustrated using part of the road network of Pomona, California. This city is about 60 kilometers from Reseda, Los Angeles, the epicenter of the 1994 Northridge earthquake. The selected road network contains 93 junctions connected by 136 road segments, as shown in Fig. 4. The city road network is extracted using the Python library OSMnx [34]. The training process and performance of the proposed GCN-DRL model are first illustrated by improving the repair decisions for a specific road network damage situation. Then a universal GCN-DRL model is trained with randomly generated damage situations. Two other methods for road network repair decisions, the genetic algorithm method and the centrality-based repair prioritization method, are compared with it on the same damage situations. The following assumptions are made for recovery from the earthquake hazard. Similar strategies have been widely used in previous studies [35, 36].

  1) The road segments are intact before the earthquake, with an assigned reliability of 1.0. The reliability of a road segment is reduced according to the extent of damage.

  2) Due to constraints such as budget, manpower, and other resources, only one road segment is assumed to be repaired at each time step. Multiple repair teams are possible in practice; they could also follow this repair sequence.

  3) After the repair is completed, the reliability of the repaired road segment is restored to 1.0.

  4) The repair time (in days) of each damaged road segment depends on its reliability. In this study, the repair time from FEMA is adopted [37]. For road segments whose reliability is below 0.2, the repair time is assumed to be 7 days. For road segments whose reliability is above 0.8, the repair time is assumed to be 1 day. For all others, the repair time is assumed to be 2 days. This repair time can be modified when more information is available.

  5) The road network performance index is computed and recorded after each damaged road segment is repaired.

  6) The road network restoration process continues until all damaged road segments are fixed and their reliability values are restored to 1.0.
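The repair-time rule in the assumptions above maps directly to a small lookup function. The function name is hypothetical; the thresholds and durations are the ones stated in the assumptions.

```python
def repair_time_days(reliability):
    """Repair duration (days) of a damaged road segment from its reliability."""
    if reliability < 0.2:
        return 7   # heavily damaged segment
    if reliability > 0.8:
        return 1   # lightly damaged segment
    return 2       # everything in between


# Total duration of repairing three hypothetical segments one after another.
total_days = sum(repair_time_days(r) for r in [0.1, 0.5, 0.95])
```

Since only one segment is repaired at a time, the total recovery duration is the sum of the individual repair times regardless of their order; the order only changes how quickly performance is regained.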

Fig. 4. Overview of the portion of the road network of Pomona, California that is analyzed

Initial damage situation under earthquake hazard

Seismic fragility curves are used for road system resilience assessment [38]. The fragility curve developed by HAZUS [39] is utilized to estimate road reliability after the earthquake. According to HAZUS, the probability that a road section exceeds a given damage state can be modeled as a cumulative lognormal distribution function, as shown in Eq. 10. Hence the reliability can be determined by Eq. 11.

$${P}_{f}\left(S\right)=\Phi \left[\frac{1}{\beta }\mathrm{ln}\left(\frac{S}{\mu }\right)\right]$$

where \({P}_{f}\left(S\right)\) is the probability of exceeding the damage state at intensity measure value S; \(\Phi\) is the standard normal cumulative distribution function; \(\beta\) is the lognormal standard deviation and \(\mu\) is the median value of the seismic intensity measure.

In this study, the considered road segments are mainly urban roads with two traffic lanes and the ‘moderate damage level’ is considered. According to Argyroudis [40], the post-earthquake permanent ground deformation (PGD) is a widely used intensity measure for pavement damage assessment and the parameters (\(\beta\) and \(\mu\)) are set as 0.7 and 0.30 meters respectively. The corresponding fragility curve is shown in Fig. 5.
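The fragility curve of Eq. 10 with the stated parameters can be evaluated using only the standard library, since the standard normal CDF can be written with the error function. The reliability function below takes reliability as the complement of the exceedance probability; Eq. 11 is not reproduced in this text, so that relation is an assumption of this sketch, and the function names are hypothetical.

```python
from math import erf, log, sqrt


def exceedance_probability(pgd, median=0.30, beta=0.7):
    """Eq. 10: P_f(S) = Phi(ln(S / mu) / beta) for a PGD value S in meters."""
    if pgd <= 0.0:
        return 0.0  # no ground deformation, no damage
    z = log(pgd / median) / beta
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def road_reliability(pgd, median=0.30, beta=0.7):
    """Reliability assumed to be 1 - P_f (Eq. 11 is not shown in the text)."""
    return 1.0 - exceedance_probability(pgd, median, beta)


# At the median PGD of 0.30 m the exceedance probability is one half.
p_median = exceedance_probability(0.30)
r_strong = road_reliability(1.2)
```

Under this assumed relation, larger PGD values give lower reliability, which is how a sampled PGD value would translate into a damage level for each road segment.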

Fig. 5. Seismic fragility curve of a pavement at moderate damage state [40]

GCN-DRL training for specific damage situation

The seismic fragility curve in Fig. 5 is applied to generate the post-earthquake damage conditions of the road network. To demonstrate the universality and robustness of the proposed decision-making framework, a random PGD value drawn from a uniform distribution (0 to 1.2) is assigned to each road segment. Figure 6 shows a specific damage situation from one random sample, in which the reliability of road segments varies from 0.1 to 0.95. In practical applications, the reliability estimate of each road segment can be improved by applying Eqs. 10 and 11 to known PGD values or by on-site inspection. The ‘emergency response facilities’ for post-hazard recovery are marked by red points; there are five of them.

Fig. 6. Post-earthquake road segment reliability map (locations of emergency response facilities are shown as red points)

The GCN-DRL model is first trained for this specific initial damage situation with the training framework illustrated in Fig. 2. The total number of training episodes, m, is set to 500. The reward discount factor \(\gamma\) is set to 0.5, which means the model weighs the instant reward and the future reward equally. The instant reward is defined as the improvement in road network system performance from each repair action; mathematically, it is calculated by Eq. 12. The future reward is estimated by the GCN-DRL model and gradually converges as training proceeds.


where \({R}_{t}\) is the instant reward of the action taken at time t, \(p\) is the system performance defined in Eq. 7, and \({T}_{t}\) is the repair time from assumption (4).

Figure 7 shows the training process of the proposed GCN-DRL training framework. The total training time is about 27 hours on a desktop with an Intel i7 processor and an Nvidia 2070 GPU. The system resilience index of each recovery round is recorded (as shown in Fig. 2) and plotted. For better visualization, the resilience indexes recorded during training are smoothed with the Savitzky-Golay filter [41]. At the initial training stage, the resilience index of each trial is small and fluctuates strongly. The smoothed curve shows a steady increase of the resilience index as training continues: the final resilience index of the repair sequence increased from around 150 to over 300 after 1,500 training rounds. The gradually increasing curve demonstrates that the GCN-DRL model finds progressively better repair sequences during training.
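For readers unfamiliar with this smoothing step, a minimal dependency-free sketch follows. It uses the classic Savitzky-Golay weights for a 5-point window and a quadratic fit; the window length and polynomial order used for Fig. 7 are not stated, so these settings are assumptions, as is the choice to leave the two samples at each end unsmoothed.

```python
# Classic Savitzky-Golay weights: 5-point window, quadratic polynomial fit.
SG5_WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(values):
    """Smooth a sequence; the two samples at each end are left unchanged."""
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return smoothed


# A noisy, slowly rising series standing in for per-round resilience indexes.
raw = [150.0, 180.0, 140.0, 200.0, 170.0, 230.0, 210.0, 260.0]
print(savgol5(raw))
```

A useful property of this filter is that it reproduces any quadratic trend exactly at interior points, so it suppresses round-to-round noise without flattening a genuine upward trend.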

Fig. 7

The change of resilience index based on the repair decisions by the GCN-DRL model during the training process

Determine repairing sequence for a specific damage situation

To evaluate the performance of the GCN-DRL-based decision framework, two other decision-making strategies are used as baselines: a repair strategy based on the genetic algorithm and a repair strategy based on ranking the betweenness centrality [12]. These two strategies are chosen because they are the most common and convenient ways of determining a network recovery sequence, although many other repair decision strategies have been proposed in previous studies. The selected comparison methods are briefly described below.

Repair strategy based on genetic algorithm

The genetic algorithm is a well-developed method for global optimization. A conventional genetic algorithm for combinatorial optimization problems is utilized, with the ‘OX’ crossover method adopted as described by Moscato and Pablo [42]. A total of 7,500 trials are used to obtain the final solution, i.e., a population of 10 evolved over 750 generations. Hence, the total number of trials of the genetic algorithm is five times that of the GCN-DRL model.
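The order crossover can be sketched as follows for repair sequences encoded as permutations of road identifiers. This is the common textbook form of OX (copy a slice from the first parent, then fill the remaining positions in the order the genes appear in the second parent, starting after the slice and wrapping around); the variant in [42] may differ in detail, and the explicit cut points `a` and `b` are an illustrative interface.

```python
def order_crossover(parent1, parent2, a, b):
    """Order crossover (OX) for two permutations with cut points a < b."""
    n = len(parent1)
    child = [None] * n
    # Step 1: the slice [a, b) is inherited unchanged from parent1.
    child[a:b] = parent1[a:b]
    used = set(parent1[a:b])
    # Step 2: read parent2 from position b onward (wrapping around) and
    # keep only the genes that are not already in the child.
    donors = [parent2[(b + k) % n] for k in range(n)
              if parent2[(b + k) % n] not in used]
    # Step 3: fill the empty positions in the same wrapped order.
    slots = [(b + k) % n for k in range(n) if child[(b + k) % n] is None]
    for pos, gene in zip(slots, donors):
        child[pos] = gene
    return child


p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [2, 4, 6, 8, 7, 5, 3, 1]
print(order_crossover(p1, p2, 2, 5))  # [8, 7, 3, 4, 5, 1, 2, 6]
```

Because the child is always a valid permutation, every road is repaired exactly once and no repair step is needed after crossover.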

Repair strategy based on betweenness centrality

The betweenness centrality [43] is utilized to set the repair priority of the damaged road segments. The betweenness centrality of a road segment measures how often the segment lies on the shortest paths between all pairs of nodes. The higher the betweenness centrality of a road segment, the more important it is for the network connectivity. Mathematically, the betweenness centrality of a network edge can be expressed as Eq. 13 [44].

$$B{C}_{e}=\sum\limits_{i,j\in V}\frac{ \varepsilon \left(i,j|e\right)}{\varepsilon \left(i,j\right)}$$

where \(V\) is the set of all nodes, \(\varepsilon (i,j)\) is the number of shortest paths between nodes i and j, and \(\varepsilon (i,j|e)\) is the number of these paths that pass through road segment e.

To effectively obtain the betweenness centrality value for each road segment, Borgatti’s algorithm is adopted [45]. The final betweenness centrality map of the road segments for this case study is shown in Fig. 8.
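For illustration, Eq. 13 can also be evaluated directly on a small network. The sketch below is a brute-force evaluation, not the efficient algorithm of [45]. It assumes an undirected graph without self-loops, measures path length in hops rather than road length, and sums over unordered node pairs; all of these are simplifying assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(adj):
    """Edge betweenness (Eq. 13) summed over unordered node pairs."""
    info = {s: bfs_counts(adj, s) for s in adj}
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    score = {e: 0.0 for e in edges}
    for i, j in combinations(adj, 2):
        dist_i, sigma_i = info[i]
        dist_j, sigma_j = info[j]
        if j not in dist_i:
            continue  # a disconnected pair contributes nothing
        for e in edges:
            u, v = tuple(e)
            for a, b in ((u, v), (v, u)):
                if a not in dist_i or b not in dist_j:
                    continue
                # Edge a->b lies on a shortest i-j path when distances add up.
                if dist_i[a] + 1 + dist_j[b] == dist_i[j]:
                    score[e] += sigma_i[a] * sigma_j[b] / sigma_i[j]
    return score


# Four intersections on a ring road: every segment is equally important.
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(edge_betweenness(ring))
```

Sorting the damaged segments by this score in descending order yields the betweenness-based repair sequence.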

Fig. 8

The map of betweenness centrality values of the road segments

The final performance of the three repair decision strategies (i.e., GCN-DRL, genetic algorithm, and betweenness centrality) is compared in two major aspects: the efficiency of road network performance restoration and the computational efficiency. The restoration efficiency is measured by the final resilience index (RI) and by the time required to reach certain levels of system performance. The time-dependent road network system performance curves obtained with the repair decisions of the three strategies are shown in Fig. 9. The recovery processes determined by the genetic algorithm-based repair strategy are indicated by the shaded area, whose upper and lower boundaries correspond to the best and worst repair sequences obtained, respectively.

Fig. 9

Comparison of the development of post-earthquake system performance by repairing sequences determined by different decision methods, (GA is short for genetic algorithm, GCN is the proposed method, and BC is short for betweenness centrality)

A system recovery curve with a larger area under it has a higher resilience index and therefore corresponds to a better repair strategy. Among the three methods compared, the repair sequence from the proposed GCN-DRL model significantly outperforms the other two. The repair sequence prioritized by betweenness centrality only slightly underperforms the best repair solution found by the genetic algorithm, which is a global optimization method. Figure 9 also marks the time each strategy requires for the system to recover 80% of its original performance. The best repair sequence from the genetic algorithm takes about 580 days to reach 80% recovery, while the worst takes as long as 630 days. The repair sequence based on the betweenness centrality needs about 680 days to recover 80% of system performance, which is between the best and worst solution of genetic algorithm solutions. The repair sequence from the GCN-DRL model needs only about 548 days to reach 80% system performance, which significantly outperforms the other two decision strategies. This faster recovery translates into higher system resilience.
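The two comparison metrics can be computed from a recorded recovery trajectory as sketched below. The sketch assumes that performance is normalized so that 1.0 is the pre-hazard level, integrates with the trapezoidal rule, and reports the first recorded time at which the target is met; the integration scheme actually used for the resilience index may differ, and the sample trajectory is invented for illustration.

```python
def resilience_index(times, performance):
    """Area under the performance curve by the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        area += 0.5 * (performance[k] + performance[k - 1]) * dt
    return area


def time_to_recover(times, performance, target=0.8):
    """First recorded time at which performance reaches target, else None."""
    for t, p in zip(times, performance):
        if p >= target:
            return t
    return None


# Illustrative trajectory: time in days, performance relative to pre-hazard.
days = [0, 10, 20]
perf = [0.5, 0.75, 1.0]
print(resilience_index(days, perf))  # 15.0
print(time_to_recover(days, perf))   # 20
```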

Performance of pre-trained GCN-DRL model in providing repair decisions on new damage situations

The previous comparison showed that the GCN-DRL decision support framework achieves faster system performance recovery than alternative approaches such as the genetic algorithm and betweenness centrality prioritized repair. However, its computational efficiency is relatively low because of the model training process. For the betweenness centrality-based strategy, the number of iterations needed to determine the repair sequence is \(1\times M\). The number of iterations required for GCN-DRL model training and for the genetic algorithm is \(N\times M\), where \(N\) is the number of trials and \(M\) is the number of damaged roads. Consequently, the computational time increases significantly, which is a common criticism of similar global optimization methods. The time-consuming training process can limit the ability of the model to respond quickly when a hazard happens. However, unlike the genetic algorithm, the proposed framework allows a universal GCN-DRL model to be trained before the hazard happens via the procedures illustrated in Fig. 2. With a pre-trained GCN-DRL model, a close-to-optimal road network repair sequence can be obtained quickly for any new hazard damage situation without additional training. This strategy significantly reduces the computational time needed to deploy the GCN-DRL model for emergency response.

Analyses are conducted to illustrate the ability of the pre-trained GCN-DRL model to identify a resilient repair sequence under new damage situations. A GCN-DRL model is first trained with data from different initial road damage situations (Fig. 2). Each initialization is conducted by assigning a randomly generated PGD to each road segment and then computing its reliability from the fragility curve (Fig. 5). The total number of training steps is increased to 10,000 because of the significantly larger state space. The major computational effort goes into network performance evaluation and neural network training. Correspondingly, the total training time required is around 12 days on a desktop computer without GPU acceleration (this time may vary with the configuration of the computer). The pre-trained GCN-DRL model is then applied to four new damage scenarios, shown in Fig. 10.

Fig. 10

Four new damage scenarios

The parameters of the pre-trained GCN-DRL model are saved and then loaded to handle the new damage scenarios. The repair sequence is obtained simply by applying the loaded GCN-DRL model to the initial damage conditions of the road network, with no further training. The other two repair decision-making methods are also used to obtain repair sequences. The system recovery trajectories for each damage scenario, based on the repair sequences from the different decision strategies, are compared in Fig. 11. The recovery process based on the pre-trained GCN-DRL model significantly outperforms the other two decision-making methods in all of these damage scenarios. The area under the recovery curve of the GCN-DRL model is much larger, which indicates a more resilient recovery process. The GCN-DRL model also takes less time than the other two methods to reach 80% recovery of the road network performance.
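Applying a trained model without further training amounts to a simple rollout, which can be sketched as below. The sketch assumes a greedy rule in which the model scores every road that is still damaged and the highest-scoring road is repaired next; `score_fn` is a hypothetical stand-in for the loaded GCN-DRL model, whose real interface is not described here.

```python
def greedy_repair_sequence(damaged_roads, score_fn):
    """Repeatedly repair the damaged road with the highest score.

    score_fn(remaining, road) is a hypothetical placeholder for the value
    that a trained model assigns to repairing `road` while the roads in
    `remaining` are still damaged.
    """
    remaining = set(damaged_roads)
    sequence = []
    while remaining:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda road: score_fn(remaining, road))
        sequence.append(best)
        remaining.remove(best)
    return sequence


# Toy scores standing in for model outputs.
toy_scores = {"r1": 1.0, "r2": 3.0, "r3": 2.0}
print(greedy_repair_sequence(toy_scores, lambda rem, road: toy_scores[road]))
# ['r2', 'r3', 'r1']
```

The loop runs once per damaged road, so the cost at deployment time grows with the number of damaged roads rather than with the number of training trials.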

Fig. 11

Comparison of system recovery under different new damage scenarios based on repairing sequence determined by different decision methods (i.e., pre-trained GCN-DRL model, genetic algorithm, betweenness centrality)

Figure 12 summarizes the final system resilience index obtained with the different repair decision models and the corresponding computational time. With the pre-trained model, the repair sequence from the GCN-DRL model achieves the highest system resilience index at a low computational time. The repair sequence based on betweenness prioritization uses the least computational time, but its system recovery performance is also the worst. The genetic algorithm took around 14 hours to finish computing for each damage scenario.

Fig. 12

Comparison of the system recovery based on repair decisions by different methods: (a) the final SRI; (b) the computational time

Case study II: Rapid decisions for flood hazard

The road network recovery after a flood hazard [46] is analyzed to further assess the capability of the GCN-DRL model to determine the repair sequence under a different hazard. The road network used in this case study is part of University Heights, Cleveland, Ohio, USA. It contains 95 intersections connected by 141 road segments, as shown in Fig. 13.

Fig. 13

Examined road network of University Heights, Cleveland, OH for case study II

The impacts of flooding on road network simulation

The capacity of the road sections is compromised by the flood. To evaluate its impact on road operation, the flooding condition of each road is required. Different flood diffusion models, such as the HEC-RAS, ISIS, and MIKE models, have been proposed to predict the flood conditions at different locations [47]. In this study, the susceptible-impacted-susceptible (SIS) network diffusion model is used to generate flooding scenarios along different road sections [48]. Two primary parameters dominate the flood diffusion process in the SIS model: the average transition probability (\(\alpha\)) and the recovery probability (\(\gamma\)). The parameter \(\alpha\) describes the probability that a node falls into the ‘flooded’ class if one of its neighbors is flooded. The parameter \(\gamma\) describes the probability that a node recovers from ‘flooded’ to ‘normal’. The same parameter values, \(\alpha = 0.02\) and \(\gamma = 0.013\), as proposed by Bahrulla [48] are used to analyze the impacts of the flood on road sections. For cities with localized flood monitoring data, these two parameters can be further calibrated using the data-analysis methods discussed in the original paper by Bahrulla [48]. Regarding the impact of flood inundation, each road segment is treated as either ‘completely shut down’ or ‘completely open’ to traffic. Hence the reliability of each road segment during the flood is set as either 0 or 1. This differs from the continuous reliability values assigned to road sections after earthquakes based on their extent of damage.
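A minimal sketch of an SIS-type diffusion on a network is given below to show how \(\alpha\) and \(\gamma\) drive the process. It assumes synchronous time steps and that each flooded neighbor independently floods a normal node with probability \(\alpha\); the update rule in [48] may differ, and the mapping from flooded nodes to closed road sections is not modeled. The function names and the toy network are illustrative.

```python
import random


def sis_step(adj, flooded, alpha, gamma, rng):
    """One synchronous update of the set of flooded nodes."""
    nxt = set()
    for node in adj:
        if node in flooded:
            # A flooded node stays flooded unless it recovers (prob. gamma).
            if rng.random() >= gamma:
                nxt.add(node)
        else:
            k = sum(1 for nb in adj[node] if nb in flooded)
            # Assumption: each flooded neighbor acts independently, so the
            # node floods with probability 1 - (1 - alpha)^k.
            if rng.random() < 1.0 - (1.0 - alpha) ** k:
                nxt.add(node)
    return nxt


def simulate_sis(adj, seeds, alpha=0.02, gamma=0.013, steps=100, seed=None):
    """Run the diffusion from the initially flooded nodes in `seeds`."""
    rng = random.Random(seed)
    flooded = set(seeds)
    for _ in range(steps):
        flooded = sis_step(adj, flooded, alpha, gamma, rng)
    return flooded


# Four intersections in a line, flooding initialized at one end.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sorted(simulate_sis(line, seeds=[0], seed=42)))
```

Running the simulation with different random seeds and initial flooded nodes produces different flooding scenarios from the same parameter pair.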

The effects of flood on road network performance and resilience index

The road network performance and resilience for the road network under flood hazards are measured using the same quantification methods described in Section 2. Since the reliability of each road can only have the binary status of 0 or 1, the node performance can be simplified by the use of Eq. 14 and the system performance can be simplified as Eq. 15

$${r}_{i}=\frac{1}{n-1}\sum\limits_{j=1, j\ne i}^{n}\sum \limits_{k=1}^{{K}_{\left(i,j\right)}}{w}_{k}\left(i,j\right){R}_{k}\left(i,j\right) =\frac{1}{n-1}\sum\limits_{j=1,j\ne i}^{n}K\left(i,j\right)$$

where n is the number of nodes in the network. \(K(i,j)\) is the number of independent paths between node i and node j. \({r}_{i}(t)\) is the average number of reliable independent pathways of node i at time step t.
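The simplified node performance can be computed by counting paths on the network of open roads, as sketched below. The sketch interprets ‘independent paths’ as edge-disjoint paths, which is an assumption, and counts them with a unit-capacity augmenting-path search. The adjacency list is expected to contain only road segments that are open to traffic and to list every segment in both directions.

```python
from collections import deque


def count_edge_disjoint_paths(adj, s, t):
    """Number of edge-disjoint s-t paths in an undirected graph."""
    # Each undirected road is modeled as two unit-capacity arcs.
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path is left
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def node_performance(adj, i):
    """Average number of independent paths from node i to all other nodes."""
    others = [j for j in adj if j != i]
    return sum(count_edge_disjoint_paths(adj, i, j) for j in others) / len(others)


# Ring of four intersections: two independent routes between any pair.
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(node_performance(ring, "a"))  # 2.0
```

Removing a flooded segment from the adjacency list and recomputing shows how a single closure lowers the performance of the nodes that lose an alternative route.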

Training a universal GCN-DRL model for road network recovery decisions

As the damage state of each road can only be 0 or 1, the initial situation can be modeled by assuming that all road segments are ‘damaged’. The training process can then approximately cover any new damage situation arising in real-world conditions. The number of training episodes is set as 4,000. The state of the road network is represented by the node performance values \({r}_{i}\) given in Eq. 14 together with the road network structure. The same reward function (Eq. 12) and reward discount factor \(\gamma\) (0.5) are used in this case study.

The training process of the GCN-DRL model is shown in Fig. 14. As mentioned, one episode corresponds to one round of the complete road network recovery process. The computation is performed on a Windows desktop with 16 GB of RAM, an Intel Core i7 processor, and an Nvidia 2060 GPU. The total training process took 43 hours and 32 minutes. The mean resilience index over the first 300 episodes is only 36.25, while the last 300 episodes average around 44. The variance of the resilience index during the learning process is relatively large because of the enormous action and state spaces that result from assuming all road segments are initially ‘damaged’. The internal weights and biases of the pre-trained GCN-DRL model are saved for subsequent analyses.

Fig. 14

The variations of system resilience index during the training process of the GCN-DRL model under the assumption that all road sections are inundated by flood initially

The performance of pre-trained GCN-DRL under new flooding situations

Four new flooding situations along the road network are simulated using the flood diffusion model, with flooding initialized randomly. The resulting flood-inundated road sections are shown in Fig. 15. The number of submerged road segments varies between 12 and 34 across the flood scenarios.

Fig. 15

Four different flood situations simulated by the SIS network diffusion model (road sections shown in red are submerged)

Three decision-making methods are used to obtain the recovery sequence, i.e., prioritize where the pump should be deployed to open the road sections to traffic.

  1) The pre-trained GCN-DRL model, which loads the ‘training experience’ to solve new flooding situations, is named the universal GCN-DRL model.

  2) The GCN-DRL model that is trained from scratch for each specific damage scenario is named the flood-specific GCN-DRL model.

  3) The betweenness centrality-based prioritization method. This method is used as a comparison benchmark since a strong relationship exists between edge betweenness centrality and network connectivity. A previous study has also demonstrated the superiority of centrality-based recovery when applied to planar networks [49].

The road network resilience index values obtained with the best road section recovery sequences from these three decision strategies are shown in Fig. 16a. The results show that the three decision methods (i.e., the universal GCN-DRL model, the flood-specific GCN-DRL model, and the betweenness centrality prioritized model) achieve similar system resilience indexes. This is reasonable, since the betweenness centrality is highly correlated with the graph connectivity of the road network and no other operational parameters are considered in this case study. Therefore, the recovery sequence based on betweenness centrality ranking approximates the optimal solution. Among the three methods, the flood-specific GCN-DRL model slightly outperforms the other two under the first and third flood situations. However, the flood-specific GCN-DRL method requires 2 to 3 hours to train the model, which is much longer than the time needed to use the universal GCN-DRL model. By contrast, the pre-trained universal GCN-DRL model and the graph-theory method based on the betweenness centrality take only around 9 seconds to obtain the final solution, and both lead to recovery sequences with high system resilience index values. The result demonstrates that the pre-trained GCN-DRL model can be deployed for rapid emergency decisions after a hazard.

Fig. 16

Comparison of the performance of post-flood road network recovery decisions by three different methods: (a) SRI values of the road network; (b) the computational time


In this article, a novel GCN-DRL model is developed to determine the optimal recovery sequence of road networks subjected to different types of natural hazards. The proposed decision-making framework allows the GCN-DRL model to be trained before the hazard happens by letting the model freely explore different damage scenarios. The performance of the decision support model is evaluated by applying it to the post-hazard recovery of two testbed road networks subjected to earthquake and flood, respectively. The results from both case studies demonstrate that the GCN-DRL model can be trained on randomly generated damage scenarios before the hazard happens and can then be applied immediately to determine the road network recovery sequence for rapid, resilient post-hazard response. The model possesses several distinctive features. First, as a resilience-informed global optimization method, it achieves a better decision sequence than ranking repairs by betweenness centrality, especially when multiple factors are considered in the network performance evaluation. It also shows higher computing efficiency than the genetic algorithm. Second, the graph reading ability of the proposed method lets it use the road network structure directly without any manual embedding. Lastly, the decision-making time can be significantly reduced by using a pre-trained machine learning model, which can be trained on a supercomputer before a hazard happens.

The GCN-DRL model provides a novel decision-support tool to assist emergency management decision-makers. While the model is demonstrated on small road networks, it can be extended to larger networks by using more capable computing configurations, such as multiple graphics processing units (GPUs).