Machine learning and optimization for production rescheduling in Industry 4.0

Along with the fourth industrial revolution, tools from the fields of optimization, the Internet of Things, data science, and artificial intelligence are creating new opportunities in production management. Since manufacturing processes are stochastic and rescheduling decisions must be made under uncertainty, deciding whether a rescheduling is worthwhile remains a complicated task, which in practice is often addressed on a greedy basis. To find a tradeoff between rescheduling frequency and the growing accumulation of delays, we propose a rescheduling framework that integrates machine learning (ML) techniques and optimization algorithms. To demonstrate its effectiveness, we first model a flexible job-shop scheduling problem with sequence-dependent setup and limited dual resources (FJSP) inspired by an industrial application. Then, we solve the scheduling problem through a hybrid metaheuristic approach. Next, we train an ML classification model to identify rescheduling patterns. Finally, we compare its rescheduling performance with that of periodic rescheduling approaches. The simulation results show that integrating these techniques can provide a good compromise between rescheduling frequency and scheduling delays. The main contributions of the work are the formalization of the FJSP, the development of ad hoc solution methods, and the proposal and validation of an innovative ML- and optimization-based framework for supporting rescheduling decisions.


Introduction
The fourth industrial revolution, or Industry 4.0 (I4.0) for short, allows decision-makers to obtain real-time information from various plant components and allows machines to communicate with each other. I4.0 can, therefore, be viewed as the application of the Internet of Things (IoT) to industrial production (IIoT). accumulation of delays in production, the unexpected arrival of urgent orders, machine faults, or absence of the operator. To enforce a rescheduling, it is necessary to compute a new schedule that balances the possible time savings against the effort needed to implement the changes. Although frequent rescheduling helps manage unexpected disturbances, it requires additional working time for reorganization and affects the stability of shop flows. Conversely, rescheduling too rarely fails to counter the growing accumulation of delays. Given an optimization technique to create a new schedule, determining the best rescheduling time remains the main problem. If the company receives unforeseen but urgent requests or a machine fails, this decision is easy to make (i.e., rescheduling should be carried out as soon as possible). In general, rescheduling is required within a manufacturing process when unexpected events arise that make the current schedule infeasible. However, in a continuous and complex production setting, deciding quickly and effectively whether to reschedule is not a trivial task. The problem is so complicated that, in real settings, many factories simply reschedule periodically.
With a view to improving the rescheduling strategy, our paper proposes a new rescheduling framework that combines metaheuristic optimization algorithms and machine learning (ML) techniques. The proposed approach provides empirical evidence of efficiency and effectiveness on the production problems of some Italian companies, within the industrial project Plastic and Rubber 4.0 (P&R4.0), a project aimed at being the Italian response to I4.0 for companies in the plastic and rubber processing field. It is essential to highlight that the goal of the paper is to describe the integration between ML and optimization and to show a comprehensive proof-of-concept methodology, not necessarily to exploit its full potential (which can be achieved only by tailoring the method to the studied setting). For this reason, neither the chosen metaheuristics nor the ML algorithms are the most advanced ones; they were selected among mature and popular methods that have shown excellent performance in the past. This choice also suggests that better performance could be obtained by adopting more advanced and tailored algorithms.
The paper is organized as follows. Section 2 reviews the literature on scheduling and rescheduling, highlighting their relationship with I4.0. Section 3 introduces the methodology of the integrated framework. Section 4 presents the scheduling problem and the mathematical model. Section 5 displays the adopted optimization approach. Section 6 demonstrates the creation of classification models. The results of the numerical experiments are shown in Section 7. Finally, we conclude and outline future research lines in Section 8.

Literature review
Scheduling is the process of assigning tasks to resources or allocating resources to perform tasks over time. This work focuses on a variation of the job-shop problem (JSP) [13]. Extensive research on JSP methods, including heuristic principles, classical optimization, and artificial intelligence (AI), is reported in [14]. [15] points out that priority rules and dispatching rules are probably the most frequently used heuristic policies embedded in metaheuristic methods for scheduling problems.
Scheduling problems arise in many manufacturing and service industries, each with its own particularities. In plastic and rubber molding-related fields, the fabrication of injection molds supports other companies that use injected components either as semi-finished or as final products [16]. Their scheduling often involves complex production systems [17], owing in general to a large number of products, unrelated parallel machines, and sequence-dependent setup times. Such characteristics often match those of the job-shop scheduling problem [18]. Each order, or group of aggregated orders, can be seen as a job. For each job, there is a set of ordered activities, and each activity requires the exclusive use of a resource. Although a well-designed schedule is critical to delivering products on time, studies on scheduling problems in the plastic and rubber field are still limited. In [19], the authors develop mathematical models for the job-shop scheduling problem with sequence-dependent setup times and solve them through several search methods. In [16], a case on plastic injection molds is studied, and a flexible job-shop scheduling problem is addressed with Petri nets (PN) and genetic algorithms (GA). PN provides a formal representation of the complex system, and GA creates a near-optimal schedule that minimizes the total weighted tardiness based on the structure provided by PN. The work also provides a clear explanation of the characteristics of plastic injection molds. In [20], the authors describe why the production process in a Belgian rubber company is a job-shop scheduling problem and solve it through a hybrid shifting bottleneck procedure with a tabu search algorithm. Finally, in [21], a flexible JSP is transformed into a game, which is solved through game theory (GT) approaches. All the jobs and the manufacturer are players trying to maximize their profits. Moreover, each job also aims at minimizing its tardiness, while the manufacturer also wants to minimize the makespan of all the jobs.
In this paper, we formalize the scheduling problem in P&R4.0, namely the flexible job-shop scheduling problem with sequence-dependent setup time and limited dual resources (FJSP), where the dual resources are general-purpose machines and setup workers. We solve the problem with one of the possible approaches: hybrid metaheuristics.
Although a shortage of setup workers is a common phenomenon in factories, most researchers do not consider their availability. For example, [22] addresses both machine selection and operation sequencing with sequence-dependent setups, without mentioning workers. The papers [23] and [24] share the same limitation. In [25], setup workers are viewed as a critical resource in a single-stage production system composed of a set of unrelated parallel machines. In [26], the authors consider setups in a dynamic environment. Specifically, the work deals with a scheduling problem involving a wide variety of products, and implicit clustering is employed to avoid building an impractically large setup matrix. However, the availability of setup workers is not mentioned. Another study [27] considers the flexibility of both workers and machines as well as the precedence between operations, but does not consider setups. To our knowledge, FJSP has not been formalized in the literature, and no specific heuristics have been proposed to solve it, either in the static setting or for rescheduling in the dynamic setting. Consequently, providing an integrated rescheduling framework is another contribution of the present paper.
As mentioned above, the I4.0 revolution also gives scheduling optimization the opportunity to develop new tools. In [28], the authors present an I4.0 survey on the application of optimal control to scheduling in production and supply chains, concentrating on the deterministic maximum principle. They not only derive major contributions, application areas, limitations, and recommendations for future research and applications, but also explain control models in industrial engineering and production management. In [13], the author reviews several JSP-related optimization problems applied in I4.0, showing that one of the most important and active research fields is the application of distributed optimization algorithms. In particular, multiagent-based systems have proven to be very effective in several settings (see, e.g., [29]). Moreover, the technique is capable of generating effective schedules for both dynamic and static problem sets, as in [30]. Nonetheless, neither article addresses the problem of deciding the right rescheduling time. The above discussion reveals a gap in the literature when both I4.0 opportunities and rescheduling problems are considered. In the rest of the section, we focus on papers that consider the rescheduling problem. The work [31] is the first to describe a general model for providing schedules using JSP and GA; this algorithm is evaluated under different workload situations in a dynamic environment. In [32], the authors are the first to provide well-defined concepts for most rescheduling production systems and to identify a framework for understanding rescheduling approaches, policies, and methods. After that, more rescheduling-related papers appeared. [33] and [34] provide critical rescheduling analyses. The former concerns a broad set of operations for railway rescheduling.
Even though different algorithms are presented, most of them are problem-specific and cannot be generalized to the context of the smart industry. Instead, by focusing on solutions involving integration among industries and real application cases, [34] presents a systematic literature review of studies on production rescheduling. Their paper mainly deals with the choice of the rescheduling heuristic rather than the decision on rescheduling timing. This gap is common both in the rescheduling literature and in the small branch of the literature dealing with rescheduling-specific ML applications. For example, [35] presents an algorithm that uses Q-learning principles to change train schedules on a single-track railway, and in [36] the authors develop an artificial cognition control system to acquire rescheduling knowledge in the form of decision rules. Another work [37] proposes a two-stage teaching-learning-based optimization approach, which avoids considerable modifications to ensure robust and stable schedules after a machine breaks down unexpectedly.
In terms of scheduling and rescheduling frameworks, [38] introduces a general rescheduling framework to address issues arising from the dynamic nature of production scheduling for a classical JSP. The proposed solution consists of a solver that assumes deterministic and static data and a controller that handles uncertainty and triggers a new solution from the solver if the scheduling performance drops below a certain threshold. Our research is close to their approach in capturing the complex properties of rescheduling. However, while in their method the output of the decision-making controller depends on when relevant information is gathered, they do not propose integrating real-time data analysis. A similar work [39] considers scheduling optimization and rescheduling under I4.0 and introduces a new decision-making scheme that uses Tolerance Scheduling (identifying scenarios in which a given schedule remains acceptable) to mitigate rescheduling changes in the dynamic environment. They propose to start by defining the disruption events and designing tolerances for parameters, and then to incorporate the scheme into a Cyber Physical Production System (CPPS, [40]), which decides to reschedule only when the objective function value is significantly affected. However, the paper lacks a complete example or case study, which casts doubt on the feasibility of the approach in terms of implementation complexity and computation time. In our approach, we validate the rescheduling framework through a scheduling problem and numerical experiments. In [41], the authors propose an event-driven JSP rescheduling mechanism under machine capacity constraints. The event-driven rescheduling strategy achieved better performance than a periodic rescheduling approach. While our work and [41] both compare against the periodic approach and show superior performance, they differ in their rescheduling goals and actions.
Concerning goals, in [41] rescheduling is done as soon as a dynamic event occurs, with the aim of minimizing the objective values. Our work aims to balance large deviations in the objective value against the effort spent on implementing rescheduling. Concerning actions, in [41] rescheduling is applied after a disturbance occurs. Instead, our work combines real-time monitoring and prevention, depending on the type of disturbance. While rescheduling must be done for some unexpected events (e.g., the arrival of urgent orders), other disturbances that may occur more frequently (e.g., accumulated processing time variations) can be detected through real-time monitoring and combined with ML to make rescheduling decisions.
In [42], the authors propose a decision-making model based on minority game (MG) theory to organize and manage the resources and services provided by the autonomous participants of a cloud manufacturing system with private information. In the game, product classes are the agents competing for a group of workstations. Each agent chooses the workstation with maximum availability. Class i allocated to workstation j wins if the workload of workstation j is below a given limit. The game stops and returns the allocation when either all the agents win or the number of rounds reaches a limit. In particular, if a machine fails, MG reallocates the product classes, adding the recovery time to the processing time. In the proposed model, each agent selects resources based on the best agents' scores rather than on the best allocation of the workstations. The simulation results and the low computational complexity show that MG is adequate for solving the resource allocation problem in a system of shared resources. However, the performance comparison is based only on the workload of the workstations; for other objectives (e.g., minimization of makespan, maximum or total tardiness), the performance is not guaranteed. Also, it reschedules after a machine failure without incorporating early failure detection or preventive rescheduling in response to other disturbances. A similar limitation occurs in other GT-based approaches, such as [43]. Also, in [44], a GT-based approach for the self-optimization and learning of modular production units is presented. In the proposed method, each control parameter serves as a player. To avoid long training times and large data set requirements, appropriate parameters are defined from the basic control level (BCL) to be learned by learning agents. In the learning algorithm, the optimal actions of each player have to be inferred from interaction with the environment. However, the experiments focus on energy optimization.
In production scheduling applications, the ability to deliver customer orders on time is of primary importance, and the applicability of this approach to time-related objectives is yet to be validated. Besides, GT has several limitations, including the fact that each player must know the cost functions of the other players and that it is hard to choose an outcome when several Nash equilibria exist [45].
To fill the gaps in existing rescheduling frameworks, we deepen the integration of ML and optimization under I4.0 and present our methodology in detail in the next section.

Methodology
In this section, we present the general integration methodology and the specific one implemented in our case study. We remind the reader that the goal of the paper is not to implement the most advanced scheduling and rescheduling strategies or the most advanced ML techniques. Instead, we concentrate on the methodology for integrating various existing methods, which creates new possibilities in the area of rescheduling. Figure 1 displays the proposed general methodology as a sequential workflow. The main steps are:
1. Analyze and classify the scheduling problem. The problem may range from a classical JSP to a complex problem such as the one proposed in Section 4.
2. Develop techniques for the specific problem, including:
(a) Optimization algorithms for scheduling and rescheduling. They can be based on either mathematical programming algorithms or any appropriate metaheuristics. Usually, the scheduling algorithm should be used as a strategic plan, while the rescheduling algorithm should be used as a tactical adaptation of the original schedule. Thus, the rescheduling optimization method has to be extremely fast.
(b) The ML classification model for identifying rescheduling patterns. To enable the automation of rescheduling, meaning that the system knows when to reschedule with little or no human intervention, we use ML classification algorithms that learn from historical data and output the rescheduling decision. An implementation example is described in detail later (see Fig. 7, Section 6). The ML strategy can rely on automatic feature extraction or more sophisticated methods.
3. Periodically generate a new schedule from a group of production orders with the predefined optimization algorithm. Each production schedule is then executed with interconnected systems and real-time monitoring. The real-time data are periodically sent to data analytics algorithms, which translate them into features that the classification model can recognize. The model then sends its output to the rescheduling controller, suggesting whether to reschedule. If the recommendation is to reschedule, the rescheduling operation is carried out; otherwise, the current schedule is maintained.
4. Record the data as feedback to update the classification model. Since this is a post-processing step executed only after the schedule is completed, it is represented in the figure by a dashed line.
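The monitoring-and-decision cycle of steps 3 and 4 can be sketched as a simple loop. This is a minimal Python sketch, not the authors' implementation: `read_sensors`, `extract_features`, `classify`, and `reschedule` are hypothetical placeholders for the plant data pipeline, the trained classifier, and the fast rescheduling optimizer, and the check is assumed to run once per monitoring period.

```python
from typing import Callable, List, Sequence, Tuple


def run_rescheduling_loop(
    schedule: list,
    n_periods: int,
    read_sensors: Callable[[int], object],
    extract_features: Callable[[object], Sequence[float]],
    classify: Callable[[Sequence[float]], bool],
    reschedule: Callable[[list, int], list],
) -> Tuple[list, List[Tuple[Sequence[float], bool]]]:
    """Monitor an executing schedule and reschedule when the classifier says so.

    All callables are hypothetical placeholders supplied by the caller.
    """
    history = []  # (features, decision) pairs kept as feedback for retraining (step 4)
    for t in range(n_periods):
        raw = read_sensors(t)              # real-time data from the shop floor
        features = extract_features(raw)   # translate raw data into classifier features
        decision = classify(features)      # True means "reschedule now"
        if decision:
            schedule = reschedule(schedule, t)  # trigger the fast rescheduling optimizer
        history.append((features, decision))
    return schedule, history
```

For instance, with a toy classifier that fires when the accumulated delay exceeds 3 time units, a delay trace of [0, 1, 5, 2] triggers exactly one rescheduling, at period 2; the recorded `history` is what step 4 would feed back into retraining.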
To illustrate the general methodology of the automated rescheduling framework, we implement the following steps for a case study within the P&R4.0 project:
1. Formalize the production scheduling problem.
2. Develop the solution approach for the scheduling and rescheduling problem.
3. Derive features and algorithms for creating a classification model.
4. To demonstrate the potential effectiveness, run the following numerical experiments:
(a) Implement and test a heuristic approach capable of finding good schedules in a reasonable amount of time;
(b) Create data to simulate the information provided by I4.0 technologies;
(c) Train an ML model to learn the rescheduling patterns for deciding when to reschedule (i.e., when to trigger the heuristic to obtain a new schedule and then update the production schedule);
(d) Compare, on the same scenarios, the performance of the proposed rescheduling framework with that of the commonly used periodic rescheduling, which does not rely on ML or real-time data analysis.

Problem definition and modeling phase
The optimization problem under consideration is the flexible job-shop problem with sequence-dependent setup time and limited dual resources (FJSP). With respect to the conventional JSP, our FJSP introduces:
- The flexibility in selecting machines, as there may be more than one machine capable of performing the same operation;
- The limited resources of setup workers and machines;
- The sequence-dependent setup, which involves both a machine and a setup worker.
The key assumptions of the model are:
- No preemption is allowed, and operations of different jobs are independent;
- Each machine and each worker can work on only one operation at a time;
- All jobs, machines, and workers are known at the start.
The following sets are considered: done, each operation belonging to a specific job; Because each job is a predefined set of operations with a fixed precedence relationship, we define a directed graph whose set of nodes is $O$ and whose set of arcs $E$ enforces the precedence relationships among operations of the same job (see, e.g., [46]). More specifically, an arc from operation $\tilde{o}$ to operation $o$ means that operation $\tilde{o}$ must be performed before operation $o$.
The following parameters are also specified:
- $T^{c\tilde{c}}_{mt}$ is the setup time needed to change from configuration $c$ to configuration $\tilde{c}$ on machine $m$ at time $t$;
- $T_{om}$ is the processing time of operation $o$ on machine $m$;
- $L_t$ is the number of setup workers available at time $t$.
Let us consider the following decision variables:
- $C_{\max}$ is the value of the total makespan for the set of jobs;
- $C_o$ is the completion time of operation $o$;
- $y_{omt}$ is a binary variable taking value 1 iff operation $o$ is processed on machine $m$ at time $t$;
- $s_{omt}$ is a binary variable taking value 1 iff operation $o$ starts to be processed on machine $m$ at time $t$;
- $z^{c}_{mt}$ is a binary variable taking value 1 iff machine $m$ is in configuration $c$ at time $t$;
- $w^{c\tilde{c}}_{mt}$ is a binary variable taking value 1 iff machine $m$ changes from configuration $c$ to configuration $\tilde{c}$ at time $t$.
Then, a mixed-integer linear programming (MILP) formulation of the FJSP is given by Eqs. 1-21. The objective function Eq. 1 minimizes the maximum production makespan. Constraints Eq. 2 ensure the correctness of the makespan value by defining it as the maximum of all completion times. Constraints Eq. 3 impose that each operation must be performed, while Eq. 4 enforces that each operation must start in a single time step on only one machine. Constraints Eq. 5 ensure that each operation is processed for the required amount of time. In addition, constraints Eq. 6 impose that an operation cannot be executed unless it starts. Constraints Eqs. 7 and 8 enforce the precedence relations between operations, while constraints Eq. 9 enforce that each machine must have a configuration. Constraints Eq. 10 prohibit a machine from taking a configuration that is not in the set of configurations it can achieve. Constraints Eq. 11 limit the number of configuration changes that can be made in a given time step. Furthermore, constraints Eqs. 12-17 add the relations between the variables. In particular, constraints Eq. 12 impose that an operation cannot start if the machine is not in the correct configuration, constraints Eq. 13 enforce that the completion time of an operation must be greater than the latest time at which that operation is processed on the assigned machine, and constraints Eq. 14 impose that, while a machine performs an operation, no other operation can start on it. For variables $w$ and $z$, logical consistency is defined by constraints Eqs. 15 and 16. Constraints Eq. 17 impose that no operation can start on a machine while that machine is changing its configuration. Finally, constraints Eqs. 19-21 define the binary domains of the variables.

Scheduling optimization phase
Problem Eqs. 1-21 is very difficult to solve, even for small instances. Its number of variables is of the order of $2\max\{|O||M||T|, |M||T||C|^2\}$. Thus, even for relatively small instances (e.g., $|O| = 100$, $|M| = 7$, $|T| = 30$, and $|C| = 3$), exact solvers cannot solve the problem in a reasonable amount of time. Since real applications need efficient scheduling procedures that do not compromise the overall makespan, we adopt a hybrid algorithm (HA) to calculate the initial schedule. HA combines the genetic algorithm (GA) and tabu search (TS), as discussed in [47].
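To make the size estimate concrete, the sketch below counts the variables of the time-indexed model directly from the decision variables listed in the problem definition, assuming (as a simplification) that every index ranges over its full set, with no pruning of ineligible machine or configuration combinations. The function name is illustrative.

```python
def fjsp_variable_count(n_ops: int, n_machines: int, n_steps: int, n_configs: int) -> dict:
    """Count variables of the time-indexed FJSP model, assuming every index
    ranges over its full set (an upper bound if eligibility is pruned)."""
    y = n_ops * n_machines * n_steps            # y_omt: o processed on m at t
    s = n_ops * n_machines * n_steps            # s_omt: o starts on m at t
    z = n_configs * n_machines * n_steps        # z^c_mt: m in configuration c at t
    w = n_configs ** 2 * n_machines * n_steps   # w^{c c~}_mt: m switches c -> c~ at t
    return {
        "binary": y + s + z + w,
        "continuous": n_ops + 1,                # C_o for every operation, plus C_max
        "dominant_term": 2 * max(n_ops * n_machines * n_steps,
                                 n_machines * n_steps * n_configs ** 2),
    }


counts = fjsp_variable_count(100, 7, 30, 3)
print(counts)  # {'binary': 44520, 'continuous': 101, 'dominant_term': 42000}
```

For the example instance, the $y$ and $s$ variables alone contribute 42,000 binaries, which is exactly the leading term $2\max\{\cdot\}$; the configuration variables add only 2,520 more.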
Note that many other hybrid GA approaches exist for flexible job-shop scheduling. For example, in [48], the authors integrate GA with simulated annealing (SA). SA is introduced to overcome the premature convergence of GA, similarly to the introduction of TS in our HA. However, since the focus of the paper is not to find the best scheduling algorithm, comparing different hybrid algorithms is out of scope.
The flow chart (Fig. 2) describes the HA procedure: GA first provides a set of initial solutions as a population and then selects solutions for crossover and mutation. TS performs a local search on each of the new solutions, and GA then uses the solutions improved by TS to start a new evolution. By omitting the TS steps, this hybrid framework reduces to a traditional GA. Similarly, by setting the population size to one and omitting the genetic operators, it reduces to a traditional TS. While HA is not new, in the following sections we give interested readers a clear view of how we adapt HA to solve FJSP.
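The GA/TS interplay in Fig. 2 can be summarized as a generic loop in which a local search (TS in HA) is applied to every new offspring before it rejoins the population. This is a simplified sketch under stated assumptions: parents are drawn uniformly and survivors are chosen by truncation for brevity, whereas the paper uses tournament and roulette-wheel selectors; all function and parameter names are illustrative. `rng.sample` needs a population of at least two, so the pure-TS special case mentioned above would bypass this loop.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def hybrid_ga_ts(
    init: Callable[[random.Random], Chromosome],
    fitness: Callable[[Chromosome], float],
    crossover: Callable[[Chromosome, Chromosome, random.Random], Tuple[Chromosome, Chromosome]],
    mutate: Callable[[Chromosome, random.Random], Chromosome],
    local_search: Callable[[Chromosome], Chromosome],
    pop_size: int,
    n_generations: int,
    seed: int = 0,
) -> Chromosome:
    """Generic GA loop with local search on every offspring (pop_size >= 2)."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]  # random initial chromosomes
    for _ in range(n_generations):
        offspring: List[Chromosome] = []
        while len(offspring) < pop_size:
            p1, p2 = rng.sample(population, 2)          # simplified: uniform parent choice
            for child in crossover(p1, p2, rng):
                # local search (TS in HA) refines each mutated child before it rejoins GA
                offspring.append(local_search(mutate(child, rng)))
        # simplified survivor selection: keep the pop_size lowest-makespan chromosomes
        population = sorted(population + offspring, key=fitness)[:pop_size]
    return min(population, key=fitness)
```

Passing an identity function as `local_search` recovers a plain GA, mirroring the reduction described in the text.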

Encoding and decoding
In GA, it is essential to ensure that all solutions (i.e., chromosomes) generated during the evolutionary process are feasible. In this section, we describe both the encoding and the decoding.
The job-based representation is used to encode an individual. A chromosome is an array of genes $[j_1, j_2, \dots, j_{|O|}]$, where each gene is the job number of an operation. More precisely, the $i$th occurrence of job number $j$ represents the $i$th operation of job $j$.
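The occurrence-based reading of a job-based chromosome can be sketched as follows (a minimal illustration; the function name is hypothetical). Because the $k$th occurrence of job $j$ always denotes its $k$th operation, any reordering of the genes preserves the within-job precedence, which is why this representation keeps chromosomes consistent with the arcs in $E$.

```python
from collections import Counter
from typing import List, Tuple


def chromosome_to_operations(chromosome: List[int]) -> List[Tuple[int, int]]:
    """Map each gene (a job number) to (job, k): the k-th occurrence of job j
    denotes the k-th operation of job j (1-based)."""
    seen: Counter = Counter()
    operations = []
    for job in chromosome:
        seen[job] += 1
        operations.append((job, seen[job]))
    return operations


print(chromosome_to_operations([1, 2, 1, 3, 2]))
# [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2)]
```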
In the decoding phase, machines are assigned and the start and end times of the operations are computed. Since more than one machine may be available for each operation, we use a modified greedy strategy that randomly selects one of the two earliest available machines. Subsequently, the availability of both the machine and a setup worker is considered to calculate the start and end times of each operation. The objective is to minimize the overall completion time, so the makespan is used as the fitness: the smaller the fitness, the better the solution.
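A minimal decoder in the spirit of the description above is sketched below. Several details are assumptions rather than the paper's specification: the instance format is hypothetical, machines are ranked by earliest availability ignoring setup time, setup workers form a single pool of identical workers of constant size (the model's $L_t$ may vary over time), and a setup may begin as soon as both the machine and a worker are free.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical instance format: jobs[j] is the ordered list of operations of job j;
# each operation is (processing time per eligible machine, required configuration).
Operation = Tuple[Dict[str, int], str]


def decode(chromosome: List[int], jobs: Dict[int, List[Operation]],
           initial_config: Dict[str, str], setup_time: Callable[[str, str], int],
           n_workers: int, rng: random.Random):
    next_op = {j: 0 for j in jobs}          # index of the next operation of each job
    job_ready = {j: 0 for j in jobs}        # finish time of each job's last scheduled operation
    machine_free = {m: 0 for m in initial_config}
    config = dict(initial_config)           # current configuration of each machine
    worker_free = [0] * n_workers           # time at which each setup worker becomes free
    schedule = []
    for j in chromosome:
        k = next_op[j]
        next_op[j] += 1
        times, required = jobs[j][k]
        # modified greedy rule: pick randomly between the two earliest-available eligible machines
        ranked = sorted(times, key=lambda m: max(job_ready[j], machine_free[m]))
        m = rng.choice(ranked[:2])
        machine_ready = machine_free[m]
        if config[m] != required:
            # a sequence-dependent setup needs the machine and the earliest-free setup worker
            w = min(range(n_workers), key=lambda i: worker_free[i])
            setup_start = max(machine_free[m], worker_free[w])
            machine_ready = setup_start + setup_time(config[m], required)
            worker_free[w] = machine_ready
            config[m] = required
        start = max(job_ready[j], machine_ready)
        end = start + times[m]
        machine_free[m] = end
        job_ready[j] = end
        schedule.append((j, k + 1, m, start, end))
    makespan = max((entry[-1] for entry in schedule), default=0)  # fitness: smaller is better
    return schedule, makespan
```

In a three-operation toy instance where the second operation of job 1 needs machine M2 reconfigured by a single setup worker, the setup runs during [0, 1] while job 1's first operation is still being processed on M1, so the makespan is 5.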
For example, in Fig. 3, we assume that each machine is in configuration 4, there is only one setup worker, and operations are labeled $j$-$o$, where $j \in J$ and $o \in O$. All machines and workers are available from $t = 0$. The directed graph in Fig. 4 indicates the precedence relationships. X and Y are two dummy nodes denoting the source and the sink, respectively, so that each job has a path from X to Y. The set of directed arcs specifies the ordered pairs of operations. Figure 5 (left side) shows an example chromosome, and the middle part reports a possible machine assignment for the operations. Notably, for an operation $o$, the earliest starting time on each machine is calculated based on the finishing time of its predecessors and the available time of the setup worker. Finally, in Fig. 5 (right side), we update the graph by adding the arcs (dotted lines) that model the operation precedence on the same machine.
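The graph construction of Figs. 4 and 5 can be illustrated by building the job arcs (with dummy source X and sink Y) plus the machine-order arcs, and then reading the makespan as the longest path to Y. This sketch assumes operations are labeled as `"j-o"` strings and ignores setup times and setup-worker availability, so with setups its result is only a lower bound on the makespan for the given machine orders; it also detects the cycle that an inconsistent machine order would create. Function names are illustrative.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Arc = Tuple[str, str]


def build_graph(job_ops: Dict[int, List[str]], machine_seqs: Dict[str, List[str]]) -> List[Arc]:
    arcs: List[Arc] = []
    for ops in job_ops.values():
        arcs.append(("X", ops[0]))        # source -> first operation of the job
        arcs.extend(zip(ops, ops[1:]))    # precedence arcs E within the job
        arcs.append((ops[-1], "Y"))       # last operation -> sink
    for seq in machine_seqs.values():
        arcs.extend(zip(seq, seq[1:]))    # machine-order arcs (dotted lines in Fig. 5)
    return arcs


def longest_path_to_sink(arcs: List[Arc], duration: Dict[str, int]) -> int:
    """Earliest start of sink Y (the makespan), ignoring setups and worker limits."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for a, b in arcs:
        succ[a].append(b)
        indeg[b] += 1
        nodes.update((a, b))
    start = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:                          # process nodes in Kahn's topological order
        n = queue.popleft()
        visited += 1
        finish = start[n] + duration.get(n, 0)
        for nxt in succ[n]:
            start[nxt] = max(start[nxt], finish)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle: machine order conflicts with job order")
    return start["Y"]
```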

Genetic operators
The initial population consists of chromosomes with random operation sequences (for the operations linked by directed arcs). Genetic operators (selection, crossover, and mutation) are used to create new solutions. To ensure feasibility, any solution found to be infeasible is corrected.
In each generation, we use a tournament selector (selecting the best individual from random samples drawn with replacement) to select survivors and a roulette wheel selector (selecting in proportion to fitness) to select individuals for creating offspring.
A crossover operator acts on two parent strings at a time and produces offspring by recombining the characteristics of both parents. We use two-point crossover, which randomly chooses two points in the parents and swaps the segments between them.
The left of Fig. 6 shows a crossover example that generates two feasible children. On the right, an infeasible solution to the same problem shown in Fig. 3 illustrates how infeasibility is fixed.
Fig. 3 Example of a toy scheduling instance
A swap mutation strategy is used to avoid spending more time managing feasibility than exploring better solutions. The procedure is straightforward: two positions in the recombined chromosome are chosen at random, and the genes in those positions are swapped to obtain new offspring. The newborns are feasible for any swap, since genes are represented by job numbers.
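The crossover, repair, and mutation steps can be sketched for the job-based encoding as below (illustrative names). The repair rule shown, which replaces surplus occurrences of a job, scanning left to right, with the jobs that lost occurrences, is one common choice and an assumption here; the paper illustrates its own repair only through Fig. 6.

```python
import random
from collections import Counter
from typing import List, Tuple


def two_point_crossover(p1: List[int], p2: List[int],
                        rng: random.Random) -> Tuple[List[int], List[int]]:
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))  # two distinct cut points
    c1 = p1[:a] + p2[a:b] + p1[b:]                     # swap the middle segments
    c2 = p2[:a] + p1[a:b] + p2[b:]
    return c1, c2


def repair(child: List[int], ops_per_job: Counter) -> List[int]:
    """Replace surplus occurrences of a job with jobs that are missing occurrences."""
    have = Counter(child)
    missing = [j for j, need in ops_per_job.items() for _ in range(max(0, need - have[j]))]
    seen: Counter = Counter()
    repaired = []
    for gene in child:
        seen[gene] += 1
        repaired.append(missing.pop(0) if seen[gene] > ops_per_job[gene] else gene)
    return repaired


def swap_mutation(chromosome: List[int], rng: random.Random) -> List[int]:
    c = list(chromosome)
    i, j = rng.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]  # job counts are unchanged, so the result stays feasible
    return c
```

For example, `repair([1, 1, 1, 2, 3, 3], Counter({1: 2, 2: 2, 3: 2}))` turns the third 1 into the missing 2, giving `[1, 1, 2, 2, 3, 3]`.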

Tabu Search
The main components of TS are the move, the neighborhood structure, the tabu list, and the aspiration criteria.
A neighborhood structure maps a solution to a set of neighbors (a neighbor is a solution slightly different from the original). [49] proposes the first effective neighborhood structure for JSP, which reverses the order of two successive operations on the same machine. A move is a modification of a solution that yields a neighbor; such a reversal is one type of move. This paper adopts a swap strategy that exchanges any two operations of different jobs. In line with the meaning of "tabu" (forbidden), the tabu list is a memory structure recording recent moves to avoid cycling among solutions. In our work, the positions of the two operations in a move are recorded as an element of the tabu list. The list is cyclic with a fixed capacity: when a new element must be inserted and the capacity has been reached, the oldest element is removed. As elaborated by [50], TS excels at avoiding getting stuck in local optima thanks to the tabu list. However, the length of the tabu list must be chosen to balance intensification (exploring the best neighbors) and diversification (disallowing moves marked as tabu).
Despite its advantages, the tabu list may also prevent some solutions reachable only through a tabu move from being visited. To mitigate this risk, we adopt the widely used aspiration criterion: a tabu move is accepted if it yields a better solution than the best found so far.
TS records the best solution found, encodes it as a chromosome, and returns it to the GA population. Although running TS for more iterations is likely to yield better solutions, the running time of TS must be traded off against that of GA within HA.
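The TS components described above (swap moves between operations of different jobs, a fixed-capacity cyclic tabu list of swapped positions, and the aspiration criterion) can be combined as in this sketch. It is an illustrative implementation, not the authors' code: it evaluates the full swap neighborhood, which costs $O(|O|^2)$ fitness evaluations per iteration, and it returns the best chromosome found so that it can be handed back to the GA population.

```python
from collections import deque
from typing import Callable, List, Tuple


def tabu_search(start: List[int], fitness: Callable[[List[int]], float],
                n_iter: int, tabu_size: int) -> Tuple[List[int], float]:
    current = list(start)
    best, best_fit = list(current), fitness(current)
    tabu = deque(maxlen=tabu_size)         # cyclic list: oldest move dropped when full
    for _ in range(n_iter):
        candidates = []
        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                if current[i] == current[j]:
                    continue               # only swap operations of different jobs
                neighbor = current[:]
                neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
                f = fitness(neighbor)
                if (i, j) in tabu and f >= best_fit:
                    continue               # tabu, and the aspiration criterion is not met
                candidates.append((f, i, j, neighbor))
        if not candidates:
            break
        f, i, j, current = min(candidates, key=lambda c: c[0])  # best admissible neighbor
        tabu.append((i, j))                # record the swapped positions
        if f < best_fit:
            best, best_fit = list(current), f
    return best, best_fit
```

With a toy fitness that counts out-of-order pairs, starting from [3, 1, 2] the search reaches [1, 2, 3] in two moves and returns it with fitness 0.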

Machine learning-based classification phase
To overcome the difficulty of making the rescheduling decision, we create a classification model that returns a rescheduling suggestion given the topology-related information and the current state of the production system. We highlight that the techniques presented in this section are only one possible choice of classification methods; we select them because they are easy to implement, well known, and robust.
The approach is useful for the following reasons. First, the classifier returns its result in a short time, which satisfies the time requirements of dynamic production. Second, it requires little computing power. Third, because it answers quickly, it can be run at high frequency and is therefore responsive. Finally, the proposed methodology reveals which characteristics of the plant play an active role in determining the need for rescheduling. The plant can then be modified to improve its robustness, reduce bottlenecks, and so on.
Consider a set of scenarios $\Theta = \{1, 2, \ldots, \theta_{max}\}$. In each scenario $\theta \in \Theta$, the jobs that the plant has to fulfill, the related operations, and the number of machines change. In the following, we use the notation $u(\theta) = (u_1, \ldots, u_T)$ to indicate the schedule of the plant in scenario $\theta \in \Theta$. Given two different schedules $u(\theta)$ and $v(\theta)$, if the production follows $u(\theta)$ in $[0, t]$ and $v(\theta)$ in $[t+1, T]$, we denote the concatenation of the two schedules by $\langle u(\theta), v(\theta) \rangle_t$. Given a schedule $u(\theta)$ and a scenario $\theta \in \Theta$, we denote the resulting makespan by $F(u, \theta)$. For each $\theta \in \Theta$, we define the processing time variations $\delta_{11}(\theta), \ldots, \delta_{|M|1}(\theta), \ldots, \delta_{1|T|}(\theta), \ldots, \delta_{|M||T|}(\theta)$. The interpretation of $\delta_{mt}(\theta)$ is as follows: if an operation $o$ is planned to take $T_{om}$ time units on machine $m$, then under scenario $\theta$ it lasts $T_{om}(1 + \delta_{mt}(\theta))$. These variations can be positive (under-estimated processing time) or negative (over-estimated processing time). The random variables $\delta_{mt}(\theta)$ are independent across machines and time steps, and independent of the schedule [51]. In addition, we assume that their expected value is zero. This is not a restrictive hypothesis. If the decision-maker discovers that some processes last longer than expected on average, the planned processing times are updated accordingly, and the variations again have zero mean.
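The concatenation $\langle u, v \rangle_t$ and the perturbed processing time $T_{om}(1 + \delta_{mt}(\theta))$ defined above can be made concrete with a small sketch. It assumes a hypothetical representation in which a schedule is a list of per-time-step decisions, with element `k` holding the decision of time step `k + 1`. The function names are illustrative only.

```python
from typing import List, Sequence, TypeVar

S = TypeVar("S")


def concatenate(u: Sequence[S], v: Sequence[S], t: int) -> List[S]:
    """<u, v>_t: follow u up to time step t and v from time step t + 1 onwards."""
    if len(u) != len(v):
        raise ValueError("both schedules must cover the same horizon T")
    return list(u[:t]) + list(v[t:])


def realized_duration(nominal: float, delta: float) -> float:
    """Processing time T_om * (1 + delta_mt(theta)) of an operation under a scenario."""
    return nominal * (1.0 + delta)


plan = ["a1", "a2", "a3", "a4", "a5"]
new_plan = ["b1", "b2", "b3", "b4", "b5"]
print(concatenate(plan, new_plan, 2))  # ['a1', 'a2', 'b3', 'b4', 'b5']
print(realized_duration(10.0, -0.1))   # 10 time units planned, 10% faster in practice
```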
To compute the dataset for training the classifier, we first compute the actual schedule $u(\theta)$ in each scenario $\theta$, i.e., the schedule to be followed by the plant, using the heuristics described in Section 5. Then, we run the rescheduling procedure to obtain a new schedule $u(\theta|t)$ for each time step $t$ of scenario $\theta$. The new schedule is computed with the heuristic proposed in Section 5.3, taking the current schedule $u(\theta)$ as the starting solution, over the following optimization model:
$$\min \; \lambda C_{max} + (1 - \lambda) N_{var} \qquad (22)$$
subject to Eqs. 2-21. In the modified objective function Eq. 22, $0 < \lambda < 1$ sets the relative importance of the original $C_{max}$ and of $N_{var}$, the $l_1$ norm of the difference between the actual schedule and the rescheduled solution. The latter term limits the number of scheduling changes while the makespan is minimized. Given schedule $u(\theta|t)$ and threshold $b$, if
$$F(u(\theta), \theta) - F(\langle u(\theta), u(\theta|t) \rangle_t, \theta) > b \, F(u(\theta), \theta) \qquad (23)$$
then it is better to reschedule (label 1), and $u(\theta)$ is updated with $\langle u(\theta), u(\theta|t) \rangle_t$. Otherwise, it is better not to reschedule (label 0). In Eq. 23, $b$ is a relative decrease of the makespan, which we set to 5%. This value accounts for the time spent reorganizing production and avoids rescheduling when only a small improvement can be achieved. The threshold can be adjusted to each production manager's actual requirements. For example, a higher threshold can be set for productions that are difficult to reschedule frequently. A similar idea to Eq. 23, called the inertia factor, can be found in [39].
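The weighted rescheduling objective and the labeling rule described above can be sketched as two small functions. The sketch makes several assumptions. The objective takes the convex-combination form $\lambda C_{max} + (1-\lambda) N_{var}$. The threshold test uses a strict inequality on the relative makespan decrease. The schedules are compared through hypothetical numeric decision vectors `x_new` and `x_current`. None of these names comes from the original implementation.

```python
from typing import Sequence


def rescheduling_objective(c_max: float,
                           x_new: Sequence[float],
                           x_current: Sequence[float],
                           lam: float) -> float:
    """lambda * C_max + (1 - lambda) * N_var, with N_var the l1 distance of the decisions."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda must lie in (0, 1)")
    n_var = sum(abs(a - b) for a, b in zip(x_new, x_current))
    return lam * c_max + (1.0 - lam) * n_var


def rescheduling_label(makespan_current: float,
                       makespan_rescheduled: float,
                       b: float = 0.05) -> int:
    """1 (reschedule) if the makespan drops by more than a fraction b, else 0."""
    relative_gain = (makespan_current - makespan_rescheduled) / makespan_current
    return 1 if relative_gain > b else 0


print(rescheduling_label(100.0, 94.0))  # 1: a 6% gain exceeds the 5% threshold
print(rescheduling_label(100.0, 97.0))  # 0: a 3% gain does not
```

Raising `b` reproduces the suggestion in the text: productions that are expensive to reorganize simply demand a larger gain before a state is labeled "reschedule".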
Using the procedure above, we obtain a set of plant states, one for each pair $(t, \theta) \in T \times \Theta$, and associate each of them with a label (reschedule or do not reschedule). From this dataset, we extract a set of features for each state. The features draw on processing time variations (PTV), the planned schedule, and plant information, including the resources available for each operation and the customer orders to be managed.
We also assume that all simulations have the same time horizon $T$. This assumption is not restrictive, as $T$ can always be taken as the longest simulation end time.
Following the approach of [52] and [53], we do not use automatic feature extraction. Instead, we exploit the experience of the involved company to define the features. These include the time step $t$ and, for each considered operation $i$, the quantities $OPT_i$ and $PTV_i$ and the ratio $\rho_i$. Considering all operations can lead to over-fitting, because the number of operations is higher than the number of scenarios. We therefore use the ratio as a measure and consider only the operations with a high ratio, i.e., those with more flexibility in changing machines. We denote the number of considered operations by OP Num. The performance of the classifiers as a function of OP Num is evaluated in Section 7. When OP Num is 2, the feature set is $\{t, OPT_1, PTV_1, \rho_1, OPT_2, PTV_2, \rho_2\}$. Fig. 7 shows an example of ML inputs and outputs with OP Num equal to 10. The input is the same for all the algorithms compared (negative PTV values mean that the processing time is shorter than planned, and vice versa). The output is the rescheduling decision (1 means reschedule, 0 means do not reschedule) produced by each ML algorithm.
Fig. 6 Example of crossover and infeasibility handling. The red dashed lines represent crossover points
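The fixed-length feature vector described above can be assembled as in the following sketch. The per-operation quantities are passed in under the hypothetical keys `opt`, `ptv`, and `rho`, because their exact definitions are not restated in this section. Zero-padding when a scenario has fewer than OP Num operations is also an assumption, added so that every plant state yields a vector of the same length.

```python
from typing import Dict, List, Sequence


def state_features(t: int,
                   operations: Sequence[Dict[str, float]],
                   op_num: int) -> List[float]:
    """Feature vector [t, OPT_1, PTV_1, rho_1, ..., OPT_k, PTV_k, rho_k].

    Operations are ranked by decreasing ratio rho. If fewer than op_num
    operations exist, the remaining slots are zero-padded (an assumption).
    """
    ranked = sorted(operations, key=lambda op: op["rho"], reverse=True)[:op_num]
    features: List[float] = [float(t)]
    for op in ranked:
        features += [op["opt"], op["ptv"], op["rho"]]
    features += [0.0] * (3 * (op_num - len(ranked)))  # keep a fixed length
    return features


ops = [
    {"opt": 12.0, "ptv": -0.05, "rho": 0.4},
    {"opt": 8.0, "ptv": 0.10, "rho": 0.9},
    {"opt": 5.0, "ptv": 0.00, "rho": 0.2},
]
print(state_features(7, ops, op_num=2))
# [7.0, 8.0, 0.1, 0.9, 12.0, -0.05, 0.4]
```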
After computing the features above, we divide the resulting dataset into a training set (70%) and a test set (30%). We consider three commonly used classification algorithms: the Random Forest Classifier, a decision tree induction method; Support Vector Machines; and Multilayer Perceptrons, a class of neural networks:
1. Random Forest Classifier (RFC): an ensemble of decision tree classifiers that vote for the most popular class [54]. RFC is easy to parametrize, not sensitive to over-fitting, and provides ancillary information such as variable importance [55]. However, a large dataset can lead to high memory consumption [56].
2. Support Vector Machine (SVM): input vectors are mapped to a high-dimensional feature space, in which a linear decision surface is constructed [57]. It does not require any parameter tuning, since it can find good parameter settings automatically [58]. It delivers a unique solution because the underlying optimization problem is convex. However, while its non-parametric nature is convenient, its results lack transparency [59].
3. Multilayer Perceptron (MLP): a type of neural network loosely inspired by the human brain [60]. It is a system of interconnected neurons, or nodes, representing a nonlinear mapping between an input vector and an output vector. The algorithm works well for simple problems, but for difficult problems many iterations are needed for training to converge [61]. One benefit is that it requires neither prior assumptions about the distribution of the training data nor a decision on the relative importance of the input measurements. However, choosing the number of layers and the number of nodes in each layer is not trivial, and there is no single method for doing it [62].
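A 70/30 split and the three classifiers listed above can be set up with Scikit-learn as in the following sketch. The data are synthetic placeholders generated with `make_classification`: 31 columns, matching $t$ plus three features for each of OP Num = 10 operations. The hyperparameters (number of trees, hidden layer size, feature standardization for SVM and MLP) are assumptions, not the calibrated values of this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the labeled plant states (label 1 = reschedule).
X, y = make_classification(n_samples=600, n_features=31, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)  # 70% / 30% split

models = {
    "RFC": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVM and MLP are scale-sensitive, so the features are standardized first.
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                       random_state=0)),
}

test_auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # estimated probability of label 1
    test_auc[name] = roc_auc_score(y_test, scores)
    print(f"{name}: test AUC = {test_auc[name]:.3f}")
```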
We decided not to use more advanced techniques such as deep learning [63], convolutional neural networks [64], or clustering [65]. Our aim in this work is to provide a proof of concept of the integration framework, starting from simple but widely used techniques. Moreover, the selected approaches provide the user with insights into the features considered. For model validation, we use the cross-validation (CV) technique, which helps address over-fitting issues [66]. It is well known that using the same data for training an algorithm and for evaluating its performance leads to over-optimistic results [67].
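The cross-validation step mentioned above can be sketched with Scikit-learn's `cross_val_score`. The number of folds is not stated in the text, so the 5 stratified folds used here are an assumption, and the data are again a synthetic placeholder. Each sample is scored only by a model that did not see it during training, which is what guards against the over-optimistic estimates cited above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic placeholder data; in the framework, X would hold the state features.
X, y = make_classification(n_samples=600, n_features=31, n_informative=8,
                           random_state=1)

# 5 stratified folds: each sample is used for validation exactly once,
# always by a model trained on the other four folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_auc = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                           X, y, cv=cv, scoring="roc_auc")
print(f"CV AUC: {fold_auc.mean():.3f} +/- {fold_auc.std():.3f}")
```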

Numerical experiments
In this section, we present the instance generation procedure and the implementation details. We then discuss the experimental results, first for the optimization and ML techniques separately and then for their integration in the rescheduling process.

Instance generation
The problem instances were created with a general method that constructs all the sets, operators, and parameters described in Section 4. Throughout this section, we use this method to model a company's factory. The plant consists of two product lines, one for molded rubber and the other for plastic items. In this paper, only the rubber line, which is made up of 16 machines, is considered. All jobs consist of successive operations, i.e., no two operations of the same job can be carried out in parallel. Moreover, only one worker can perform setup operations. Therefore, any new setup required by an operation must wait until the ongoing setup has been completed.

Fig. 7 An example of ML inputs and outputs
The empirical distribution of the PTV used is shown in Fig. 8.
The maximum increment is 20% of the planned processing time, while the maximum reduction is 15%.
In the training phase, we start from 23 scenarios. In each scenario, the number of operations ranges from 2 to 41, the processing times from 3 to 25 time units, and the number of available machines from 1 to 14.
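Training scenarios within the ranges above can be generated along the lines of the following sketch. The actual PTV distribution is the empirical one of Fig. 8. As a stand-in, the sketch uses a triangular distribution on $[-0.15, 0.20]$ with mode $-0.05$, whose mean $(-0.15 - 0.05 + 0.20)/3 = 0$ respects both the stated bounds and the zero-mean assumption on $\delta$. Drawing one variation per operation rather than per machine and time step, and treating the number of available machines as a single value per scenario, are further simplifying assumptions. `sample_scenario` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in PTV distribution: Triangular(-0.15, -0.05, 0.20) has mean zero,
# a maximum reduction of 15% and a maximum increment of 20%.
PTV_LEFT, PTV_MODE, PTV_RIGHT = -0.15, -0.05, 0.20


def sample_scenario(rng: np.random.Generator) -> dict:
    """Draw one hypothetical training scenario within the ranges given in the text."""
    n_ops = int(rng.integers(2, 42))                # 2..41 operations
    nominal = rng.integers(3, 26, size=n_ops)       # 3..25 time units each
    n_machines = int(rng.integers(1, 15))           # 1..14 available machines
    delta = rng.triangular(PTV_LEFT, PTV_MODE, PTV_RIGHT, size=n_ops)
    return {
        "n_machines": n_machines,
        "nominal": nominal,
        "realized": nominal * (1.0 + delta),        # T_om * (1 + delta)
    }


scenarios = [sample_scenario(rng) for _ in range(23)]
print(len(scenarios), scenarios[0]["n_machines"])
```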

Implementation details
As described in Section 5, the first schedule is obtained by HA, which combines GA and TS. The GA part is built on the open-source programming library Jenetics [68], and the TS part is implemented with the open-source programming library OpenTS [69]. After calibration, the population size was set to 200 for both GA-only and HA. In HA, a two-point crossover with probability 0.86 and a swap mutation with probability 0.3 are used. In GA-only, the two-point crossover and swap mutation probabilities are 0.76 and 0.115, respectively. The tabu list length was calibrated to 30 for the TS-only approach and to 20 for HA. In HA, TS runs 50 iterations for each individual as its stopping criterion.
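The HA variation operators listed above can be illustrated with the following sketch. The sketch assumes an operation-based encoding, in which a chromosome is a sequence of job indices and each index is repeated once per operation of that job. Under this encoding, a plain two-point crossover can over- or under-represent some jobs. The hypothetical `repair` function shows one common way to restore feasibility, and it is not necessarily the scheme illustrated in Fig. 6. The mutation probability is applied once per offspring for simplicity, which may differ from how Jenetics interprets alterer probabilities.

```python
import random
from collections import Counter
from typing import List, Sequence


def repair(child: List[int], required: Counter) -> List[int]:
    """Restore an operation-based chromosome: job j must appear required[j] times."""
    counts = Counter(child)
    missing = [job for job, need in required.items()
               for _ in range(max(0, need - counts[job]))]
    seen: Counter = Counter()
    repaired = []
    for gene in child:
        seen[gene] += 1
        if seen[gene] > required[gene]:
            repaired.append(missing.pop(0))  # excess occurrence -> a missing job
        else:
            repaired.append(gene)
    return repaired


def two_point_crossover(p1: Sequence[int], p2: Sequence[int],
                        rng: random.Random) -> List[int]:
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))  # two distinct cut points
    child = list(p1[:a]) + list(p2[a:b]) + list(p1[b:])
    return repair(child, Counter(p1))


def swap_mutation(chrom: Sequence[int], p_mut: float,
                  rng: random.Random) -> List[int]:
    child = list(chrom)
    if rng.random() < p_mut:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child


def make_offspring(p1: Sequence[int], p2: Sequence[int], rng: random.Random,
                   p_cross: float = 0.86, p_mut: float = 0.3) -> List[int]:
    """Defaults are the HA probabilities reported in the text."""
    child = two_point_crossover(p1, p2, rng) if rng.random() < p_cross else list(p1)
    return swap_mutation(child, p_mut, rng)


rng = random.Random(42)
parent1 = [0, 1, 2, 0, 1, 2, 0]
parent2 = [2, 2, 1, 0, 0, 1, 0]
print(make_offspring(parent1, parent2, rng))
```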
The ML procedure has been implemented with the Scikit-learn package [70] in Python 3.6. The numerical experiments were run on a machine with an Intel(R) Core(TM) i5 CPU @ 2.3 GHz and 8 GB of RAM, running macOS v10.14.3. The MILP solver used in the numerical experiments is GUROBI Optimizer v8.1.0 (build v8.1.0rc1).
The experimental results are presented in the next three subsections. In Section 7.3, we compare the performance of the heuristics implemented for solving Eqs. 1-21. In Section 7.4, we analyze the characteristics of the classification problem. In Section 7.5, we evaluate the performance of the rescheduling framework.

Comparing the results of heuristics against the exact solver
In Table 1, each row reports the gaps (%) of the heuristics with respect to the solutions provided by the GUROBI exact solver (ES). All reported computation times are in seconds. The results include two types of comparisons:
- the makespan differences under the same running time;
- the deviations of the heuristic makespans from the best values found by ES.
Since the running time increases with the difficulty of the problem, we consider four time intervals to cover a wide range of difficulty levels. Owing to the limitations of ES, the makespans of the instances tested in Table 1 fall within the range of 17-120 time units.
The columns |M| and |O| give the number of machines and operations in each instance, respectively. Instances with the same numbers of machines and operations are nevertheless distinct, because other parameters differ (for example, the duration of each operation and the precedence relations). For the comparison under the same running time, GA gap, TS gap, and HA gap report the gaps (%) of GA, TS, and HA, respectively, with respect to the solutions supplied by ES. Moreover, best T indicates the running time Gurobi needs to find the optimal value of each instance. The columns GA bgap, TS bgap, and HA bgap show the differences of GA, TS, and HA, respectively, from the optimal values found by ES. A dash (-) indicates that, for the given instance, ES exceeded the memory of the computer, so no gap can be computed.
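The gap columns and their summary rows can be computed as in the following sketch. The gap formula is not stated explicitly, so the sketch assumes the usual relative gap with respect to ES, $100\,(C_H - C_{ES})/C_{ES}$. Under this definition, a negative value means the heuristic found a shorter makespan than ES. It also assumes the sample standard deviation for the summary row. The makespan values are hypothetical, and `None` plays the role of the dash.

```python
import statistics
from typing import List, Optional, Tuple


def gap_percent(heuristic: float, exact: float) -> float:
    """Relative makespan gap (%); negative means the heuristic beat the exact solver."""
    return 100.0 * (heuristic - exact) / exact


def summarize_gaps(rows: List[Tuple[float, Optional[float]]]) -> Tuple[float, float]:
    """Mean and sample std of the gaps, skipping instances where ES ran out of memory."""
    gaps = [gap_percent(h, e) for h, e in rows if e is not None]
    return statistics.mean(gaps), statistics.stdev(gaps)


# Hypothetical (heuristic, ES) makespans; None marks an instance ES could not solve.
rows = [(95.0, 100.0), (110.0, 100.0), (48.0, None)]
print(summarize_gaps(rows))  # mean 2.5, std about 10.61
```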
Under the same running time, only one row (1 out of 20 instances) contains positive values in the first three gap columns. This shows that, in most cases, all three heuristics perform significantly better than ES. The statistics in the last two rows (averages and standard deviations) support the effectiveness of the heuristics. When given the running time shown in the seventh column, ES slightly outperformed the heuristics. HA remained slightly ahead of the other two heuristics.

Comparing the results of the heuristics
To compare the three heuristics at larger scales, we consider instances with longer operation periods (20-50) and larger makespans.
HA shows good performance, with non-negative values appearing more frequently in Table 2, owing to its combination of exploration and exploitation strategies. Consequently, HA is chosen to compute the initial schedule. TS achieved similar results of slightly lower quality than HA. Its neighborhood exploration tends to find solutions similar to the starting one, and its quality is satisfactory, so TS is chosen for rescheduling.

ML-based classification analysis
This section evaluates the proposed approach through the results of three classification algorithms. As the performance estimator, we adopt the area under the receiver operating characteristic curve (AUC), since it exhibits more desirable properties than overall accuracy [71,72]. An AUC of 0.5 corresponds to a useless test, no better than random guessing, while an AUC of 1 corresponds to a test that discriminates perfectly.
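One way to read the AUC values reported above is as the probability that a randomly chosen "reschedule" state receives a higher score than a randomly chosen "do not reschedule" state, with ties counting one half. The sketch below computes AUC directly from this pairwise definition and compares it with Scikit-learn's `roc_auc_score` on a small hypothetical example. `auc_by_pairs` is an illustrative helper, not part of any library.

```python
from itertools import product

from sklearn.metrics import roc_auc_score


def auc_by_pairs(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1]            # 1 = reschedule
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auc_by_pairs(y_true, scores))    # 8 of 9 pairs ranked correctly
print(roc_auc_score(y_true, scores))   # same value
```

A classifier that assigns the same score to every state ties on all pairs and gets exactly 0.5, the "useless test" end of the scale.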
The test compares the performance of SVM, RFC, and MLP as a function of the number of operations considered. On the x-axis, OP Num indicates the number of collected operations, ranging from 1 to 10. The outcome is averaged over 10 random seeds, since different seeds lead to different fitting behaviors of RFC and MLP, which may cause different scores. Fig. 9 presents the average AUC values. As shown, RFC stayed far ahead of the other two methods. For SVM, the AUC score first grew but declined after OP Num reached 8. For MLP, instead, the score kept decreasing, with some exceptions at intermediate values. For RFC, the AUC values kept improving as OP Num increased. Thus, considering the operations with the ten highest ratios, RFC achieved the best AUC score of 0.81. For this reason, RFC with this setting is used in the following subsection.
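The seed-averaged experiment above can be sketched for RFC as follows. The data are synthetic placeholders. The sketch assumes that column 0 holds $t$ and that the following columns are the ten (OPT, PTV, ρ) triplets ordered by decreasing ratio, so that keeping the first $1 + 3 \cdot$ OP Num columns reproduces the feature subsets. The number of trees is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder: column 0 plays the role of t, followed by ten
# (OPT, PTV, rho) triplets assumed to be ordered by decreasing ratio.
X, y = make_classification(n_samples=600, n_features=31, n_informative=8,
                           shuffle=False, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

mean_auc = {}
for op_num in range(1, 11):
    cols = slice(0, 1 + 3 * op_num)  # t plus the first op_num triplets
    aucs = []
    for seed in range(10):  # 10 random seeds, as in the text
        rfc = RandomForestClassifier(n_estimators=50, random_state=seed)
        rfc.fit(X_tr[:, cols], y_tr)
        aucs.append(roc_auc_score(y_te, rfc.predict_proba(X_te[:, cols])[:, 1]))
    mean_auc[op_num] = float(np.mean(aucs))

best = max(mean_auc, key=mean_auc.get)
print(f"best OP Num = {best}, mean AUC = {mean_auc[best]:.3f}")
```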

Rescheduling performance
The following subsections present the workflow of the rescheduling framework, its computational results in terms of makespan improvements and remaining makespan, and a detailed analysis of two examples.

Rescheduling simulation process
The pseudo-code in Algorithm 1 describes the workflow of the simulation carried out in the rescheduling framework, in which ML and optimization techniques are integrated.
Rescheduling at fixed time intervals is often used in practice because its rules are simple. In particular, several companies with three work shifts per day tend to reschedule every 8 h (i.e., at the start of each work shift). Under this policy, when I4.0 technologies are not integrated, the company gives workers their tasks at the start of the shift, and workers stick to the plan until the end of the shift. By contrast, some companies adopting I4.0 provide workers with wearable devices that enable quick and efficient communication. These devices deliver real-time information that allows the work to be updated without a real reorganization cost. Therefore, no penalty is added when rescheduling is implemented.

Rescheduling computational results
Given a time interval of 2 time units and $\theta_{max} = 15$, we tested the periodic approach with rescheduling every 1, 2, 4, 7, and 10 time intervals (P-1, P-2, P-4, P-7, and P-10, respectively) and the ML rescheduling policy outlined in Section 6. In each scenario, the ML and periodic approaches are subjected to the same oscillation values. Following the procedure presented in Algorithm 1, statistics were collected at each time interval until the originally planned finishing time was reached. For example, if a schedule is estimated to be completed in 100 time units, the simulation of production uses 100 time units as its time horizon.
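The comparison between periodic and ML-driven rescheduling can be pictured with a minimal simulation loop. Algorithm 1 itself is not reproduced here, so this is a hypothetical sketch. The policy is checked at the end of every interval of 2 time units, and the optimizer hook, represented by a placeholder callable, runs only when the policy says so. P-k reschedules every k intervals. The ML policy reschedules when a fitted classifier with a Scikit-learn-style `predict` method returns label 1 for the current state. `featurize` is a hypothetical function that maps a time step to a feature vector.

```python
from typing import Callable, List, Sequence

Policy = Callable[[int], bool]


def periodic_policy(k: int, interval: int = 2) -> Policy:
    """P-k: reschedule at the end of every k-th interval (every k * interval time units)."""
    return lambda t: (t // interval) % k == 0


def ml_policy(classifier, featurize: Callable[[int], Sequence[float]]) -> Policy:
    """Reschedule when a fitted classifier predicts label 1 for the current state."""
    return lambda t: int(classifier.predict([featurize(t)])[0]) == 1


def simulate(horizon: int, interval: int, policy: Policy,
             reschedule: Callable[[int], None]) -> List[int]:
    """Check the policy at each interval; call the optimizer only when it says so."""
    triggered = []
    for t in range(interval, horizon + 1, interval):
        if policy(t):
            reschedule(t)  # e.g., TS from the current schedule (hypothetical hook)
            triggered.append(t)
    return triggered


print(simulate(20, 2, periodic_policy(4), lambda t: None))  # [8, 16]
```

Both policies share the same loop, so the only difference between the compared approaches is the `policy` callable.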
All time-related values are normalized and expressed as percentages.

Comparison of makespan improvements
The statistics for each approach are shown in Table 3. The column Approach indicates the rescheduling mechanism, and N gives the number of reschedulings. The column avgI gives the average makespan improvement, i.e., the average saving of production time, calculated by averaging the improvements over all rescheduling occurrences. Finally, stdI is the standard deviation of the makespan improvements. More precisely, defining $n(\theta)$ as the number of times rescheduling is performed in scenario $\theta$, the total number of reschedulings over all scenarios is
$$N = \sum_{\theta \in \Theta} n(\theta) \qquad (24)$$
As shown, P-1 achieved a negligible average improvement with the highest rescheduling frequency. Assuming that a single time unit is 1 h, this strategy is equivalent to rescheduling every 2 h. This is not practical for real-world implementation, because many operations can last longer than 2 h.
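The Table 3 statistics can be computed from the improvements recorded in each scenario, as in the following sketch. It assumes that avgI and stdI are pooled over all rescheduling occurrences, as the text describes. It also assumes the population standard deviation, because the exact estimator is not stated. The input values are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def rescheduling_stats(improvements: Dict[int, List[float]]) -> Tuple[int, float, float]:
    """Return (N, avgI, stdI) pooled over every rescheduling in every scenario.

    improvements[theta] lists the makespan improvement (%) of each rescheduling
    performed in scenario theta, so n(theta) = len(improvements[theta]).
    """
    n_total = sum(len(v) for v in improvements.values())  # N = sum over theta of n(theta)
    pooled = [x for v in improvements.values() for x in v]
    if not pooled:
        return 0, 0.0, 0.0
    return n_total, statistics.mean(pooled), statistics.pstdev(pooled)


print(rescheduling_stats({1: [2.0, 4.0], 2: [6.0], 3: []}))  # N = 3, avgI = 4.0
```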
On the contrary, P-10 rescheduled just 24 times but achieved the highest average makespan gain among the periodic approaches (16.00). Rescheduling less frequently leaves more room for improvement at each rescheduling. However, the large standard deviation of P-10 indicates that its improvement values are spread over a rather wide range.
Overall, ML performed best in terms of both average improvement and standard deviation while rescheduling only a few times.
Comparison of remaining makespan
To stabilize production, it is essential to manage unexpected events in a dynamic environment. Generally speaking, the fewer reschedulings, the better, even with modern technologies, because any communication can fail for various reasons (workers may miss messages, misunderstand them, lose time understanding them, and so on).
To investigate the differences between the makespans achieved by each periodic approach and by the ML approach, we define $C_\theta$ as the remaining makespan of scenario $\theta \in \Theta$ under periodic rescheduling, measured at the last time step (time steps are measured until the planned finish time). $C^{ML}_\theta$ is the corresponding value under the ML approach, and their difference $D_\theta$ is calculated as in Eq. 25:
$$D_\theta = C_\theta - C^{ML}_\theta \qquad (25)$$
Applying this definition to each scenario, we compute the average makespan difference avgD and the standard deviation stdD over all scenarios. The results are shown in Table 4. All values in Table 4 except those in the row of P-7 are positive. This indicates that most periodic approaches had a larger remaining makespan than the ML approach, so the periodic schedules probably finished later than the ML ones. On average, P-1 and P-7 come closest to the makespan of ML. However, as stated before, P-1 is not a suitable approach in practice, because its frequent rescheduling leads to unstable production and potential waste of resources. The standard deviations were large because the tested instances were quite diverse in the number of operations, the number of machines, and the processing times.
Although P-4 rescheduled more frequently than P-10, it gained no advantage in reducing the makespan. Hence, frequent rescheduling was not actually necessary in every scenario. P-2 rescheduled more often than P-4, but in general it failed to reschedule at the most "profitable" times. Among the periodic approaches, P-7 came closest to the ML approach.
Comparison of the makespan at each time step
Finally, we examine the makespan differences not only at the end of the time horizon but also throughout it. At each time step, we compute the makespan difference using ML as the benchmark.
Similarly to $D_\theta$, the difference is now computed at each time step. Given a scenario $\theta \in \Theta$ and the set of time steps $T$, $C_{t\theta}$ denotes the remaining makespan at time step $t \in T$, and $C^{ML}_{t\theta}$ the corresponding value for ML. $D^*_\theta$ collects the makespan differences obtained by comparing each approach with ML at the same $t$, as in Eq. 26:
$$D^*_\theta = \left( C_{t\theta} - C^{ML}_{t\theta} \right)_{t \in T} \qquad (26)$$
After computing $D^*_\theta$, we calculated the averages and standard deviations in the conventional way. The results are listed in Table 5. On average, all periodic approaches had a larger makespan than ML, which supports the effectiveness of ML during ongoing production. Considering Tables 3, 4, and 5 together, P-7 had a larger average makespan than ML over all time steps (Table 5), but it still offered a good tradeoff between rescheduling frequency and schedule delays. P-4 performed reasonably well in all respects, which matches the fact that it is widely used in factories: if a time unit is 1 h, P-4 reschedules every 8 h, i.e., once per work shift. Comparing the ML and periodic approaches shows that ML recommended rescheduling less frequently and at the right time. As a result, ML achieved satisfactory outcomes both in saving overall production time and in rescheduling effectiveness, avoiding wasted resources in managing machine and worker changes.
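The differences behind Tables 4 and 5 can be computed as in the following sketch. It assumes that the remaining makespans are stored per scenario, either as one end-of-horizon value or as one value per time step. The per-step differences are pooled over all time steps and scenarios before averaging, and the population standard deviation is used. Both choices are assumptions, since the text only says that conventional methods were used. All numbers are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def end_of_horizon_diffs(c_periodic: Dict[int, float],
                         c_ml: Dict[int, float]) -> List[float]:
    """D_theta = C_theta - C_theta^ML; positive means the periodic policy ends later."""
    return [c_periodic[th] - c_ml[th] for th in c_periodic]


def per_step_diffs(c_periodic: Dict[int, List[float]],
                   c_ml: Dict[int, List[float]]) -> List[float]:
    """C_{t,theta} - C^ML_{t,theta} for every time step t of every scenario theta."""
    return [cp - cm
            for th in c_periodic
            for cp, cm in zip(c_periodic[th], c_ml[th])]


def mean_std(values: List[float]) -> Tuple[float, float]:
    return statistics.mean(values), statistics.pstdev(values)


# Hypothetical normalized remaining makespans (%) for two scenarios.
d = end_of_horizon_diffs({1: 12.0, 2: 5.0}, {1: 10.0, 2: 6.0})
print(d, mean_std(d))  # [2.0, -1.0] (0.5, 1.5)
```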

Detailed analysis of two examples
Fig. 10 shows the makespan trend of two scenarios selected for detailed analysis, and Table 6 reports the corresponding rescheduling frequency of each approach.
The left panel of Fig. 10 shows that ML suggested rescheduling twice, at around time steps 25 and 65. In general, all lines fluctuated strongly from time step 18 to 69, which might result from the random oscillations added to the schedule. ML outperformed all other approaches except P-7. In the right panel, the P-7, P-10, and ML lines overlap. Without any rescheduling, ML thus matched the best makespan. This suggests that rescheduling is not necessary for every disturbance and that periodic approaches are too rigid to adapt to the actual need for rescheduling.
The planning problem is NP-hard, so metaheuristic algorithms need adequate running time. In a continuous manufacturing process, the production status changes over time. With an ML approach, a rescheduling decision can be made within seconds, and a new schedule is searched for only when the decision is to reschedule. ML thus shows the potential to make better rescheduling decisions, both because of its adaptability and because of the time it saves.

Conclusions and future research
In this paper, we have proposed a new framework for coping with rescheduling in the context of I4.0. This work is a preliminary approach to using ML and optimization together in the rescheduling field, assuming the availability of real-time data analysis. We demonstrated the potential of integrating these techniques through computational experiments. Notably, despite the simplicity of the techniques used in the framework, we were able to achieve good results. The main results of the paper are, therefore, the definition of a first set of features that led to a good classifier and the general methodology above. A further contribution is the formalization of the FJSP through a mathematical programming model. The case study concerns plastic and rubber manufacturing, but the proposed framework can also be tailored to other industries (such as printed circuit board, semiconductor, and metal manufacturing), which often face the problem of making rescheduling decisions. In that case, dedicated features should be derived for the new problem. We also believe that the approach can be effectively adapted outside manufacturing, for example, to personnel scheduling in hospitals, where daily fluctuations in emergencies, patient population, and levels of care occur frequently. In [73], for instance, the list of available nurses includes floaters, who are assigned to specific units in need, and casuals, who have no employment contract and are typically called at the last minute. Satisfying patients' demand in time while avoiding an excessive workload for nurses remains a major challenge. In this particular case, our rescheduling framework may help balance the service. The average waiting time of patients, the number of patients, and the number of available nurses could be interesting features to include in the ML approach.
Several future developments on this topic could be considered:
- improving the performance of the heuristics by reducing the number of machines capable of carrying out each task [74];
- expanding the simulation by differentiating the stochastic oscillations according to different disturbance factors, because the current PTV distribution is too general and may result in rapid increases in processing time;
- using more advanced machine learning algorithms and defining an enlarged set of features, including deeper knowledge of the scheduling bottleneck and of the properties of the graph G. In particular, we are interested in applying graph theory and neural networks to scheduling and rescheduling patterns [75].
There is also a need for more detailed guidelines on the data analysis methodology.
Finally, the work has shown that the performance of a heuristic can be learned by a machine learning technique. This general aspect could be applied in several other contexts and opens several general research lines. One example is using ML techniques not only to decide when to apply a heuristic but also to guide and calibrate it in the ongoing setting.