A problem-specific parallel Pareto local search for the reactive decision support of a special RCPSP extension

The disaster information collection mission should be executed after a disaster occurs to provide details for the decision-makers. During the execution of the information collection mission, disruptions may occur and prevent the resources used for information collection from completing the mission as planned. Under such circumstances, it is difficult for decision-makers to make reactive resource scheduling plans that optimize the mission's execution time, quality, and cost at the same time. This article focuses on designing a reactive decision support algorithm for disaster information collection resource scheduling, which aims to provide multiple high-quality scheduling plans for decision-makers to choose from. The problem studied in this article is modeled as an extension of the Resource-Constrained Project Scheduling Problem (RCPSP). First, the basic problem formulation for a normal schedule and two disruption recovery models are presented. Second, a novel framework of a parallel Pareto local search based on decomposition is designed to repair the schedule within the time limit. Third, two solution acceptance criteria based on constraint handling and negative correlation are specially designed to maintain a high-quality, diverse population. The experiments show that the proposed method outperforms the other competitors with respect to Inverted Generational Distance, Spacing, and Hypervolume, which means that the proposed method can help decision-makers make better decisions.


Introduction
Disasters have caused huge losses to human beings. Many scholars have studied different disaster relief issues [1][2][3][4] to reduce losses and save lives. At present, decision-makers can make better decisions through decision support technology in many fields, such as ship trajectory cleansing and prediction [5], maintenance strategy making [6], bank telemarketing sales prediction [7], vehicle routing [8], and smart grid management [9]. The disaster information collection resource scheduling decision support module plays a vital role in modern disaster relief command and control systems; it aims to present high-quality resource allocation plans to decision-makers. A disaster information collection mission consists of several tasks with precedence relations. The precedence relations require that a task cannot start until all of its predecessor tasks are finished. Because there are many hidden dangers after a disaster, such as fire and chemical leakage, the precedence relations are built based on the geographic accessibility of each task to ensure the safety of the information collection agents. Each information collection task needs some skill provided by an information collection agent. An agent represents a team with fewer than 5 people, which is the smallest unit for resource scheduling. The team can own vehicles, laser radars, UAVs, ranging instruments, and other equipment. The information collection in this article includes three types of tasks: urban area information collection, woodland area information collection, and terrain information collection. Because each type of task corresponds to a different working mode of the agents and uses a different sensor combination, this article models the ability of agents to perform tasks as skills 1-3. Each information collection agent has a pre-defined quality value and a cost value corresponding to each task. An example of an information collection mission in a disaster relief scenario is presented in Fig. 1. During the execution of the information collection mission, some unpredictable events may occur and prevent the information collection mission from being processed according to the original schedule. Since the information collection agents are performing tasks in the area of interest when the disruption occurs, according to the requirements of disaster relief decision support systems, the reactive scheduling time needs to be controlled within 60 s. The reactive scheduling of disaster information collection repairs the original schedule, which helps the decision-maker answer the following questions: (1) each task's start time; (2) the information collection agents assigned to each task; (3) the skill each agent should use in each task. Three objectives are optimized at the same time: minimizing the makespan, minimizing the cost, and maximizing the quality of the information collection mission.
The disaster information collection resource scheduling problem is modeled as an extension of the resource-constrained project scheduling problem (RCPSP). RCPSP focuses on building a resource allocation result for a commercial project of activities, which are constrained by a limited resource supply and the precedence relation network [10]. The information collection mission corresponds to the "project" in RCPSP, the "task" corresponds to the "activity" in RCPSP, and the agent is very similar to the "multi-skill resource" in RCPSP. The processing time of each activity is assumed to be fixed in RCPSP, but the information collection task's processing time is not fixed; it is a non-linear function of the resources allocated to that task. For example, either one agent or two agents can finish an information collection task, but the task completion time for two agents is shorter. In RCPSP, the resource transfer time between activities is usually set to 0. However, the tasks' locations differ in our problem, so the resource transfer time must be considered. These new features increase the difficulty of solving the problem in this article, and using a traditional RCPSP algorithm to solve it is time-consuming and ineffective. Some related works are addressed in this part. Traditional algorithms for solving RCPSP can be divided into three categories: exact algorithms, heuristic algorithms, and meta-heuristic algorithms. Some articles [11][12][13] used exact methods to solve RCPSP and obtain the optimal solution. However, the exact algorithm is time-consuming and does not fit the reactive scheduling problem in this article.
Some articles [14][15][16] developed heuristic methods to solve RCPSP. Although heuristic algorithms are fast, they cannot deal with multi-objective optimization cases.
Since the problem in this article is a multi-objective reactive robust optimization problem, the related research progress of meta-heuristic algorithms for solving RCPSP and its extensions is introduced below. Lambrechts et al. [17] focused on the uncertainty in resource availabilities subject to unforeseen breakdowns; they built a robust schedule model that meets the project deadline and minimizes schedule instability, and proposed proactive strategies to solve the problem. Chen and Zhang [18] developed a two-stage model to obtain a proactive and reactive schedule for resource-constrained project scheduling problems under uncertainty; a modified tabu search was employed to ensure scheduling process execution in the reactive phase. Davari and Demeulemeester [19] proposed selection-based reactions and a class of buffer-based reactions to deal with uncertainty and disruptions in the resource-constrained project scheduling problem. Chakrabortty et al. [20] focused on finding a robust initial schedule that can protect itself from possible future disruptions or resource breakdowns, and a variable neighborhood search-based heuristic algorithm was proposed to solve the problem. The above articles focused on finding a robust initial schedule, but they are not targeted at reactive scheduling scenarios.
Some articles focused on the single-objective RCPSP and its extensions. Deblaere et al. [21] addressed the reactive multi-mode RCPSP; they proposed and evaluated several dedicated exact reactive scheduling procedures as well as a tabu search heuristic for repairing a disrupted schedule under the assumption that no activity can start before its baseline starting time. Ning et al. [22] constructed the schedule adjustment cost determined by project reactive scheduling to manage disruptions caused by random activity durations; a tabu simulated annealing and a variable neighborhood tabu search were developed to solve the problem. Adamu et al. [23] proposed a model called hybrid-RCPSP to solve the reactive project scheduling problem. Davari and Demeulemeester [24] studied the reactive resource-constrained project scheduling problem, in which a solution is a proactive and reactive policy combining a baseline schedule and a set of required reactions. Davari and Demeulemeester [25] addressed proactive and reactive project scheduling with stochastic durations, and a dynamic programming method was proposed over different classes of proactive and reactive policies. Wang et al. [26] studied reactive strategies in the multi-project scheduling problem, and a dual population genetic algorithm was designed to solve it. The above articles can deal with reactive scheduling cases, but they all address single-objective optimization problems and assume that the task processing time is fixed. Zheng et al. [27] addressed the proactive and reactive resource-constrained project problem in which activity durations are stochastic variables, and two reactive scheduling models were proposed to repair the baseline schedules after disruptions. However, their approach can only deal with single-objective optimization problems. Some articles focused on the multi-objective RCPSP and its extensions. Bagherinejad et al.
[28] focused on the multi-mode multi-objective RCPSP and proposed a hybrid ant colony and genetic algorithm to solve the problem. Yeganeh and Zegordi [29] presented a multi-objective optimization approach for constructing resilient project schedules under resource constraints to cope with uncertain activity durations. Li et al. [30] proposed a multi-objective discrete Jaya algorithm to solve the multi-skill multi-objective RCPSP. Zhu et al. [31] presented an efficient decomposition-based multi-objective genetic programming hyper-heuristic algorithm to solve the multi-skill RCPSP with the objectives of minimizing the project's makespan and the total resource assignment cost at the same time. Hosseinian and Baradaran [32] considered the transfer time in the multi-skill RCPSP, built a model to optimize the project's makespan and cost simultaneously, and then proposed a multi-objective multi-agent optimization algorithm to obtain feasible schedules. The above articles dealt with the multi-objective RCPSP, but the algorithms are not optimized for reactive scheduling, and their calculation time is relatively long. Some literature focused on designing algorithm strategies. Chand et al. [33] focused on the resource-constrained project scheduling problem with resource unavailability and disruptions, and a genetic programming hyper-heuristic that can automatically evolve priority heuristics was proposed to solve the problem. Chakrabortty et al. [34] addressed an event-based reactive approach to the reactive resource-constrained project scheduling problem, and an enhanced iterated greedy approach was also proposed to solve the large-scale problem. RCPSP and its extensions can also be applied in command and control systems [35], nuclear laboratory research planning [36], new product development [37], etc. Table 1 shows the differences and gaps between the existing research and the research in this article.
Overall, to solve the problem addressed in this article, the following characteristics should be considered: (1) multiple objectives for reactive scheduling; (2) precedence relations between tasks; (3) multi-skill resources (agents); (4) resource transfer times; (5) the processing time of a task being a non-linear function of the resources allocated to it.

The remaining symbols of Table 2 are:

–  The finish time of task j in the initial and the repaired schedule
–  The start time of task j in the initial and the repaired schedule
L_j  The skills required by task j
–  The set of tasks that can use resource k
–  The area size in task j that needs skill l to perform
R_j  The set of agents that can be used in task j
RA_lj  The set of agents allocated to task j to perform skill l
ρ_l(t)  The total consumption of skill l at a given time t within the initial schedule
–  The total consumption of skill l at a given time t within the repaired schedule
MR_j  The maximum number of agents that can be used in task j
Dt  The time point at which the disruption happens
–  The time cost for transferring agents from task i to task j
UB  The maximum makespan of the information collection mission
–  The preparation time of the agents allocated to task j
u_kl  The amount of skill l that agent k can provide
–  The cost for agent k to use skill l per time step
–  The quality contribution of agent k using skill l per time step
s_j  The actual start time of task j
s_jt (decision variable)  Equals 1 if task j is started at time t, 0 otherwise
x_jklt (decision variable)  Equals 1 if agent k is allocated to task j to perform skill l at time t, 0 otherwise

Problem formulation
Three objectives are optimized at the same time in this article: (1) minimizing the information collection mission's makespan; (2) minimizing the cost caused by the agents performing information collection tasks; (3) maximizing the mission's quality. A task-on-node graph G = (V, E) is adopted to represent the precedence relations, in which V denotes the set of information collection tasks and E denotes the precedence relations between tasks. Some assumptions [4,35] are made in this article:

• Task 0 denotes the dummy start task, and task N + 1 denotes the dummy end task.
• Preemption is allowed when a disruption occurs.
• Each agent can contribute only one of the skills it masters within a task.
• The cost and quality are pre-defined for each agent corresponding to each task.
• A task can be started only after all the allocated agents have been transferred to its starting point.

Notations
The notations are shown in Table 2.

Basic formulation for normal schedules
Based on the model presented in [2,3,38], the formulation for the normal information collection scheduling problem without disruptions is given as follows. Equation (1) minimizes the makespan of the information collection mission. Equation (2) minimizes the cost. Equation (3) maximizes the mission's quality; to keep the optimization direction consistent, this article minimizes the reciprocal of the mission's quality. Constraints (4) define how the processing time of an information collection task is calculated for a given resource allocation result. Constraints (5) ensure that the denominator of Eq. (3) is not equal to zero. Constraints (6) limit the maximum number of agents allocated to a given task. Constraints (7) restrict each task in the information collection mission to being started exactly once. Constraints (8) ensure that once an agent is allocated to a task, it performs only one type of skill in that task. Constraints (9) ensure that the transfer times and precedence relations are respected in the resource allocation. Constraints (10)-(11) restrict each agent to performing at most one task at a time. Constraints (12) ensure that if an agent is to be reassigned to another task, the agent's current position must be a direct or indirect predecessor of that task. Constraints (13) ensure that each task's processing time does not exceed its maximum processing time under the resource allocation result. Constraints (14) express the relationship between the decision variables z and x, which prevents an agent from being assigned from a predecessor task to a successor task. Constraints (15)-(17) define the domain of each decision variable.
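Since the three objective functions are only described in prose here, the following minimal Python sketch shows one way they could be evaluated for a decoded schedule. The data layout (`schedule`, `cost_rate`, `quality_rate`) is an illustrative assumption, not the paper's notation; `cost_rate` and `quality_rate` play the roles of the per-time-step values c_kl and q_kl.

```python
def evaluate(schedule, cost_rate, quality_rate):
    """Return (makespan, cost, 1/quality) for a decoded schedule.

    schedule: task id -> {'start': int, 'finish': int,
                          'assignments': [(agent, skill), ...]}
    cost_rate, quality_rate: (agent, skill) -> per-time-step value.
    """
    makespan = max(task["finish"] for task in schedule.values())
    cost = 0.0
    quality = 0.0
    for task in schedule.values():
        duration = task["finish"] - task["start"]
        for agent, skill in task["assignments"]:
            cost += cost_rate[(agent, skill)] * duration
            quality += quality_rate[(agent, skill)] * duration
    # The paper minimizes the reciprocal of quality so that all three
    # objectives are minimized consistently.
    return makespan, cost, 1.0 / quality
```

Returning the reciprocal of the accumulated quality mirrors the article's trick of making all three optimization directions "minimize".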

Disruption recovery model
Disruptions can be caused by the breakdown of agents, incorrect estimations of environment parameters, or dangerous situations discovered during the mission. Disruptions can deteriorate the optimization objectives. Two types of disruption recovery conditions are considered. The symbols are defined in Table 2.

Preempt-repeat condition
In the preempt-repeat condition, the affected tasks must be processed again from the very beginning. The reactive scheduling is executed immediately after the disruption occurs.

Precedence relations:
The precedence relation is the same as Eq. ( 9).

Start time constraints:
Assume the rescheduling starts immediately after the disruption. The incomplete or affected tasks must be finished after the disruption.

Skill requirement:
The size of the task area and the agents' availability may change after the disruption. The skill requirement must be satisfied in the repaired schedule.

Preempt-resume condition
In the preempt-resume condition, an affected task resumes from the portion of work it had left before the disruption.

Precedence relations:
The precedence relation is the same as Eq. ( 9).

Start time constraints:
For a task whose start time is earlier than the disruption, constraints (20) apply. A task whose start time and finish time are later than the disruption should be started twice, as shown in constraints (21); s_j is the actual start time of task j.

Skill requirement:
The constraints are the same as Eq. (19) for a task whose start time is earlier than the disruption. For a task whose start time is earlier than the disruption and whose finish time is later, if the skill requirement increases, more agents should be allocated to the task, as in constraints (22).

Parallel pareto local search algorithm
The reactive scheduling of the information collection mission is time-critical (less than 60 s). Although a multi-objective meta-heuristic algorithm has stronger global search ability, given a high-quality initial population in advance and a time limit, it does not always obtain better solutions than a local search method. The approximated Pareto front obtained before the disruption serves as the initial population in this article. A parallel Pareto local search framework is designed to deal with the reactive scheduling of the information collection mission under the two types of disruptions.

Solution representation
A task vector and a resource matrix are combined to represent a solution. The task vector is denoted as π = {π_1, π_2, ..., π_N}; each element in the task vector takes a priority value from 0 to 1, which indicates the priority of the corresponding task. A task vector is decoded into a task list that meets the precedence constraints as follows: the elements are rearranged in ascending order of priority while the precedence constraints are respected, the position of each element represents the index of a task, and a feasible task list is obtained. The resource matrix is denoted as M, which decides the resource allocation result. M is a K × Σ_{j∈V} |L_j| matrix, and each value in M lies between 0 and 1. The row index of M denotes the resource index, and the column index denotes the skill slots required by the tasks in the order of the task list. Assume task i's position in the task list is ip; then the column following the first Σ_{j∈{π_1,...,π_{ip−1}}} |L_j| columns of M represents the agents assigned to task i to perform the first skill in L_i. Given an element of M, if its value is greater than 0.5, the agent corresponding to the row index is assigned to perform the skill and task represented by the column index. Following the above procedure, a feasible resource allocation result is obtained. To show the decoding process more intuitively, an information collection mission is presented in Fig. 2 and its decoding process is shown in Fig. 3.
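The decoding scheme above can be sketched as follows; the function names and data layout are illustrative assumptions. Lower priority values are decoded earlier, and a resource-matrix entry greater than 0.5 assigns the agent of that row to the task/skill slot of that column.

```python
def decode_task_list(priorities, preds):
    """Decode a priority vector into a precedence-feasible task list: among
    the tasks whose predecessors are all already scheduled, repeatedly pick
    the one with the smallest priority value."""
    remaining = set(priorities)
    done, order = set(), []
    while remaining:
        eligible = [t for t in remaining if preds[t] <= done]
        nxt = min(eligible, key=lambda t: priorities[t])
        order.append(nxt)
        done.add(nxt)
        remaining.remove(nxt)
    return order

def decode_assignments(M, threshold=0.5):
    """Threshold the resource matrix M (rows = agents, columns = task/skill
    slots): an entry greater than the threshold assigns that agent."""
    return [[k for k, v in enumerate(col) if v > threshold]
            for col in zip(*M)]
```

For example, with priorities {1: 0.9, 2: 0.1, 3: 0.5} and task 2 requiring task 1 as a predecessor, task 3 is decoded first despite task 2's lower priority, because task 2 is not yet eligible.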

Decomposition method
The Tchebycheff scalar objective function is selected because it can handle cases in which the shape of the Pareto front is not convex. W weight vectors λ^1, ..., λ^W are defined, and each vector corresponds to a parallel process. The elements of each weight vector λ^w = (λ^w_1, ..., λ^w_m) sum to 1, with λ^w_i ≥ 0, w ∈ W. Given a positive integer H, the weight vectors and the subregion definition for each parallel process w are computed in the same way as in [39]. Define z* = (z_1, ..., z_m) as the Nadir point. Using the Tchebycheff approach, the objective function for process w is defined as

g^{te}(x | λ^w, z*) = max_{1≤i≤m} λ^w_i |f_i(x) − z*_i|.

Framework of the proposed method
The framework of the proposed parallel Pareto local search (PPLS) is shown in Algorithm 1. W processes run in parallel. In each process, A^w is initialized with the solutions that lie both in the Pareto front obtained before the disruption (A^0) and in subregion λ^w. Mn is the number of individuals each parallel process maintains.

Individual local search
Given an existing solution x, a Gaussian mutation operator is adopted to conduct the local search, where x_i denotes the i-th element of x and N(0, σ_i) denotes a Gaussian random variable with zero mean and standard deviation σ_i. If x_i > 1, x_i is set to 1; if x_i < 0, x_i is set to 0. In the beginning, every σ_i is initialized to the same value. The value of σ_i is then adapted based on the 1/5 success rule proposed in [40], where nc is the number of changed individuals in a process and β is a fixed parameter.
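The mutation step and one illustrative reading of the 1/5 success rule can be sketched as below; the value of β and the exact adaptation schedule are assumptions, not the paper's exact parameters.

```python
import random

def gaussian_local_search(x, sigma):
    """One local-search move: add N(0, sigma_i) noise to each element of the
    solution vector and clip the result back into [0, 1]."""
    return [min(1.0, max(0.0, xi + random.gauss(0.0, si)))
            for xi, si in zip(x, sigma)]

def adapt_sigma(sigma, nc, n_trials, beta=0.85):
    """1/5 success rule sketch: shrink the step sizes when fewer than one
    fifth of the trials improved the solution (nc successes out of n_trials),
    enlarge them when more did. beta in (0, 1) is an assumed value."""
    ratio = nc / n_trials
    if ratio < 0.2:
        return [s * beta for s in sigma]
    if ratio > 0.2:
        return [s / beta for s in sigma]
    return list(sigma)
```

The rule keeps the search aggressive while moves keep succeeding and contracts the neighborhood once improvements become rare, which suits the tight 60 s budget.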

Acceptance criterion based on constraint handling
A problem-specific constraint-handling method is proposed in this part. For the preempt-repeat condition, given an individual x and using the decoding process designed in Sect. 3.1, the violation of the constraints can be computed; the violation for the preempt-resume condition is computed analogously. In each parallel process, three different conditions are considered: (1) there is no feasible solution in the current population Pop; in this case, the constraint violation C(x) is taken as the quantity to be minimized.
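One common way to realize such a constraint-handling acceptance rule is the feasibility-first comparison sketched below. This is a sketch under assumptions: the paper's exact three-condition logic may differ, and `violation` and `scalar_obj` stand in for the violation measure C(x) and the Tchebycheff scalar objective.

```python
def accept(candidate, incumbent, violation, scalar_obj):
    """Feasibility-first acceptance: if both solutions are feasible, compare
    the scalar objective; if exactly one is feasible, prefer it; if both are
    infeasible, prefer the smaller total constraint violation C(x)."""
    cv, ci = violation(candidate), violation(incumbent)
    if cv == 0 and ci == 0:
        return scalar_obj(candidate) < scalar_obj(incumbent)
    if cv == 0 or ci == 0:
        return cv == 0
    return cv < ci
```

This ordering lets a process that currently holds no feasible solution still make progress by driving C(x) toward zero before optimizing the objectives.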

Acceptance criterion based on negative correlation
For the i-th parallel process, the average of the solutions it maintains is denoted as x̄_i. The Bhattacharyya distance is used to express the negative correlation between different processes. The Bhattacharyya distance between parallel processes p_i and p_j is defined as

D_B(p_i, p_j) = (1/8)(x̄_i − x̄_j)^T Σ^{−1} (x̄_i − x̄_j) + (1/2) ln( det Σ / √(det Σ_i · det Σ_j) ),

where Σ_i = σ_i² I, I is the identity matrix, and Σ = (Σ_i + Σ_j)/2. The main idea behind the criterion based on negative correlation is that, for a parallel process, the selected solution should be of high quality and should lead to a distribution that is distant from those of the other parallel processes. The former can be represented by Eq. (32); the latter can be represented by a correlation measure derived from the Bhattacharyya distance.

Second, the SP of each algorithm is compared. PPLS obtains only one best result in the preempt-repeat condition and none in the preempt-resume condition. An algorithm with a higher quality Pareto front usually does not have competitive performance on SP, so the performance of PPLS is acceptable. Besides, none of the comparison algorithms has a clear advantage over the others.
Third, with respect to HV, PPLS obtains 5 best results in the preempt-repeat condition and 7 in the preempt-resume condition. MOTLA finds 3 best results in the preempt-repeat condition, which is the closest to PPLS. EMOIS finds 3 best results in the preempt-resume condition. MOIWO has the worst performance, and PPLS has the best performance.
Fourth, an experiment is conducted to show the effectiveness of the proposed acceptance criterion based on negative correlation. PPLS-WN denotes the proposed PPLS without this acceptance criterion. HV and SP are calculated under the preempt-repeat and preempt-resume conditions, and the results are presented in Tables 11 and 12. The results clearly show that the acceptance criterion based on negative correlation increases the diversity of the obtained Pareto front without reducing its quality.
Furthermore, to better illustrate the search process of PPLS, five moments are presented on instance-3 and instance-7, as shown in Fig. 5.
In summary, the proposed method generally outperforms the comparison algorithms. The reasons can be summarized as follows:

• The framework of PPLS is designed specifically to fit the characteristics of the information collection mission reactive scheduling problem.
• The design of the solution representation and decoding scheme contributes to reducing the search space.
• For the information collection mission reactive scheduling problem with a short time limit, an algorithm with stronger local search ability tends to have better performance, and the importance of global search ability is relatively weak.

Conclusion
This article focuses on providing decision support for decision-makers when a disruption prevents the disaster information collection mission from completing the work as planned. When a disruption occurs, it is very difficult for decision-makers to make high-quality reactive decisions. The disaster information collection resource scheduling problem is modeled as an extension of the resource-constrained project scheduling problem (RCPSP). The mathematical model of the disaster reactive decision support problem with two recovery models is given. A novel framework of a parallel Pareto local search based on decomposition is specially designed to provide reactive decisions for the decision-makers within the time limit. Two solution acceptance criteria based on constraint handling and negative correlation are also proposed to maintain a high-quality, diverse population.
The experiments have been conducted, and the results show that the proposed method outperforms the other competitors. As part of our future research plan, we aim to develop new algorithms for solving RCPSPs involving complex practical issues, such as the prediction of dynamic disruptions and resource uncertainty. Although the proposed algorithm is efficient, it is not a real-time algorithm, and the user experience of real-time decision support systems is significantly better than that of non-real-time systems. Using deep reinforcement learning to train a policy network that represents the reactive policy when a disruption happens is a possible way to realize real-time decision support.
Although the training process might be time-consuming, it can be viewed as a preparation before the use of the decision support system.Once the policy network is obtained, the time cost of the reactive scheduling process will be in milliseconds.Game theory can also be introduced in the deep reinforcement learning methods to explore the interesting interaction between decision-making and the environment's feedback.
Signal and Communication Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China 4 Pengcheng Laboratory, Shenzhen 518055, China

Fig. 1 Example of an information collection mission

Fig. 2 Example of a small information collection mission scheduling problem

Fig. 3 Illustration of the solution representation and decoding process

Table 1 Differences between the existing research and this article

Table 2 The symbols

i, j  Index of tasks, i, j = 0, 1, 2, ..., N, N + 1
l  Index of skill type, l = 1, 2, ..., LN
k  Index of information collection agents, k = 1, 2, ..., K
t  Index of the time step
N  The number of non-dummy tasks
m  The number of objectives
V = {0, ..., i, ..., j, ..., N + 1}  Task set; task 0 and task N + 1 are the dummy start and end tasks, respectively
V*  Set of incomplete tasks after the disruption occurs
R = {1, ..., k, ..., K}  Agent (resource) set
K  The number of agents that can be used
R*  Agent (resource) set after the disruption occurs
L = {1, ..., l, ..., LN}  Set of information collection skills
P_j, PI_j  The indirect and direct predecessors of task j
S_j, SI_j  The indirect and direct successors of task j

Table 3 Instance parameters

Table 5 Performance comparison of IGD in the preempt-repeat condition

Table 6 Performance comparison of IGD in the preempt-resume condition

Table 9 Performance comparison of HV in the preempt-repeat condition

Table 12 The effectiveness of acceptance criterion 2 with respect to HV

The comparison results are shown in Tables 5, 6, 7, 8, 9, and 10 and Fig. 4. The Wilcoxon rank-sum test at the 5% significance level is used to assess the difference between each comparison algorithm and the algorithm designed in this article. T denotes the rank sum of the comparison algorithm's results. According to the rank-sum table, P(82 < T < 128) = 0.05; that is, T > 128 means that the comparison algorithm is worse than the proposed algorithm, T < 82 means that the comparison algorithm is better, and 82 < T < 128 means that the comparison algorithm is similar to the proposed algorithm. The symbols †, , and ≈ denote that the performance of the proposed algorithm is better than, worse than, and similar to that of the comparison algorithm, respectively. The standard deviation is shown in parentheses, and the data in bold are the best values found on the corresponding test instance. First, the performance on IGD is discussed. PPLS significantly outperforms the other algorithms, obtaining 7 best results in the preempt-repeat condition and 6 in the preempt-resume condition. It is clear that PPLS obtains the best Pareto front. MOTLA has the second-best performance in general.
Then, normalization is conducted by requiring Corr(x) + Corr(x') = 1. Using the above definitions, the details of acceptance criterion 2 are presented in Algorithm 3.
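The Bhattacharyya distance used by acceptance criterion 2, specialized to the isotropic Gaussians N(x̄_i, σ_i² I) described in the text, can be sketched as follows; the function name and scalar-σ interface are illustrative assumptions.

```python
import math

def bhattacharyya(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between N(mu1, sigma1^2 I) and N(mu2, sigma2^2 I).

    Specializes the general Gaussian formula
      D = 1/8 (mu1-mu2)^T S^-1 (mu1-mu2) + 1/2 ln(det S / sqrt(det S1 det S2)),
    with S = (S1 + S2)/2, to isotropic covariances S_i = sigma_i^2 I.
    """
    d = len(mu1)
    s = (sigma1 ** 2 + sigma2 ** 2) / 2.0          # scalar variance of S
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    term1 = sq_dist / (8.0 * s)
    # det S = s^d and det S_i = sigma_i^(2d) for isotropic covariances
    term2 = 0.5 * math.log(s ** d / math.sqrt(sigma1 ** (2 * d) * sigma2 ** (2 * d)))
    return term1 + term2
```

The distance is zero for identical distributions and grows both when the mean solutions of two processes drift apart and when their step-size scales diverge, which is exactly the separation the negative-correlation criterion rewards.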