Benders decomposition for the mixed no-idle permutation flowshop scheduling problem

The mixed no-idle flowshop scheduling problem arises in modern industries including integrated circuits, ceramic frit and steel production, among others, and where some machines are not allowed to remain idle between jobs. This paper describes an exact algorithm that uses Benders decomposition with a simple yet effective enhancement mechanism that entails the generation of additional cuts by using a referenced local search to help speed up convergence. Using only a single additional optimality cut at each iteration, and combined with combinatorial cuts, the algorithm can optimally solve instances with up to 500 jobs and 15 machines that are otherwise not within the reach of off-the-shelf optimization software, and can easily surpass ad-hoc existing metaheuristics. To the best of the authors’ knowledge, the algorithm described here is the only exact method for solving the mixed no-idle permutation flowshop scheduling problem.


Introduction
The permutation flowshop scheduling problem (PFSP) is concerned with sequencing a set N of n jobs on a set M of m machines in a sequential manner. Each job j has a known, fixed, nonnegative amount of processing time on machine i denoted by p i j (i = 1, . . . , m; j = 1, . . . , n). At any point in time, each job can be processed by at most one machine and each machine can process at most one job. When a machine starts processing a job, it must complete that job without interruption, as no preemption is allowed. The sequence of jobs to be processed is the same for each machine, implying that there are n! possible solutions of the problem. One extension of the PFSP is the no-idle permutation flowshop scheduling problem (NPFSP), where machines should run continuously from the time that they start the first job until they complete the last job, i.e., idle times are not allowed at any machine in between the processing of consecutive jobs. The problem arises in production environments where setup times or machine operating costs are significant, so it is not cost-effective to let the machines sit idle at any point during the production run (Pan and Ruiz 2014), such as foundry production (Saadani et al. 2003), fiber glass processing (Kalczynski and Kamburowski 2005), production of integrated circuits and in the steel industry (Pan and Ruiz 2014). In other real-life cases, and as pointed out by Ruiz et al. (2009), there might be technological constraints impeding idleness at machines such as high-temperature frit kilns, for example.
The first study on the NPFSP was by Adiri and Pohoryles (1982), defined on two machines with the objective of minimizing the sum of completion times. A more popular objective has been to minimize the total makespan, namely the sum of the completion times of the last job processed on each machine, for which the first study was contributed by Vachajitpan (1982), where mathematical models and branchand-bound methods for solving small-scale instances were presented. One exact method using branch-and-bound was described by Baptiste and Hguny (1997). Comprehensive reviews on the problem can be found in Ruiz and Maroto (2005) and Goncharov and Sevastyanov (2009), which indicated an abundance of heuristic algorithms described to solve the problem, including discrete differential evolution and particle swarm optimization (Pan and Wang 2008a, b), iterated greedy search (Ruiz et al. 2009), variable neighborhood search (Tasgetiren et al. 2013) and memetic algorithms (Shao et al. 2017). To date, however, no effective exact approach has been proposed for NPFSP, and the attempts that exist can solve instances with only more than a handful of jobs (Pan and Ruiz 2014).
The mixed no-idle permutation flowshop scheduling problem (MNPFSP) arises as a more general case of the NPFSP when some machines are allowed to be idle, and others not. Examples can be found in the production of integrated circuits and ceramic frit, as well as in the steel industry (Pan and Ruiz 2014). In ceramic frit production, for example, only the central fusing kiln has the no-idle constraint. As an extension of the PFSP, which is known to be NP-hard for three or more machines (see, e.g., Röck 1984), the MNPFSP is also NP-hard in the strong sense (Pan and Ruiz 2014). The only study on this problem that minimizes makespan is that of Pan and Ruiz (2014), which describes a mixed integer programming model for the problem, an effective iterated greedy (IG) algorithm and enhancements to accelerate the calculation of insertions used within the local search. Computational results showed that, in comparison with the existing methods, the IG algorithm was able to identify solutions for the NPFSP instances that were 61% better, on average, with respect to the makespan. However, no exact method, to our knowledge, has been proposed to solve the MNPFSP, which is the aim of this paper. In particular, we contribute to the literature by describing an application of Benders decomposition that is enhanced with a referenced local search (RLS) that is used to generate additional cuts to accelerate the convergence of the algorithm. We propose and test three cut generating strategies and use combinatorial cuts to discard solutions already evaluated. The algorithms described in this paper are all exact, the performance of which we computationally assess on a test bed of literature instances, and compare with the commercial optimizer CPLEX and its automated Benders decomposition algorithm.
The remainder of this paper is structured as follows. Section 2 formally defines the problem and presents a formulation. The proposed algorithm is described in detail in Sect. 3. Computational results are presented in Sect. 4, followed by conclusions and future research in Sect. 5.

The mixed no-idle permutation flowshop scheduling problem
We denote by π ( j) the job occupying position j in a given permutation π . The PFSP requires the condition c i,π ( j) ≥ c i,π ( j−1) + p i,π ( j) to hold for any two consecutive jobs in any permutation, where c i, j is the completion time of task j at machine i. A key difference between the NPFSP and the PFSP is that the former forbids any idle time between any two consecutive tasks, thus transforming the previous inequality into an equality as c i,π ( j) = c i,π ( j−1) + p i,π ( j) . The mixed no-idle problem generalizes two problems in which there exists a subset M ⊆ M of m 'no-idle' machines, and it is only for those machines in M \ M that any idle running is allowed. Apart from these key differences, the common PFSP assumptions hold (Baker 1974): (1) Jobs are independent from each other and are available for their processing from time 0; (2) machines are always available (no breakdowns); (3) machines can process only one task at any given time; (4) jobs can be processed by only one machine at all times; (5) tasks must be processed without interruptions once started and until their completion (no preemptions allowed); (6) setup times are either sequence-independent and can be directly included in the processing times, or are considered negligible and therefore ignored; and finally (7) there is an infinite in-process buffer capacity between any two machines in the shop.
In the remainder of the paper, we make use of the integer programming formulation below of the problem described in Pan and Ruiz (2014), provided here for the sake of completeness. In this formulation, a binary variable x jk takes the value 1 if job j is in position k of the sequence, and 0 otherwise. Continuous variables c i,k denote the completion time of job k = 1, . . . , n on machine i = 1, . . . , m. Minimize subject to n j=1 x jk = 1 k = 1, . . . , n x jk ∈ {0, 1} j, k = 1, . . . , n.
Objective function (1) minimizes the makespan among all jobs. In the PFSP and also for the MNPFSP, the makespan corresponds to the completion time of the last job in the permutation at the last machine of the shop (c m,n ). With constraints (2) and (3), we enforce that any job occupies one position in the permutation and also that each position at any permutation is occupied by one job. Constraints (4) control the completion time of the first job in the permutation. Constraints (5) enforce that the completion times of jobs on the second and subsequent machines are larger than the completion times of the preceding tasks of the same job on previous machines plus their processing time, effectively avoiding overlaps of the tasks of the same job. The key characteristic of the MNPFSP is shown in constraints (6) and (7). Constraint (6) forbids idle time at 'no-idle' machines by making the completion time of a job equal to the completion time of the previous job in the sequence plus its processing time. Inequality (7) is the usual PFSP constraint that forbids overlaps of jobs on the same machine and, at the same time, allows for idle time.

Benders decomposition
In this section, we first describe an application of the traditional Benders decomposition followed by the proposed cut generation strategy using a local search algorithm.

Application of Benders decomposition
Benders decomposition (Benders 1962) is a cutting plane algorithm to solve a Benders reformulation of a given model that enables it to be decomposed into two simpler formulations, namely the master problem and the subproblem. The master problem contains only a subset of the variables and of the constraints of the original model. The subproblem is the original model in which the master problem variables are fixed, and the solution of which yields either an optimality or a feasibility cut for the master problem (Costa et al. 2012). Benders reformulation is typically solved using a delayed constraint generation algorithm that iterates between the master and the subproblem, until an optimal solution is identified.
Let M(c, x) denote the formulation (1)-(9) where x = {x jk | j, k = 1, . . . , n} and c = {c ik |k = 1, . . . , n; i = 1, . . . , m} are the vectors of the decision variables. Let us suppose that variables x have been fixed as x =x ∈ X = {x|x satisfies (2), (3), (9)}. The resulting formulation, shown by M(c,x), consists only of the variables c, the constraints of which are assigned the dual variables α and β corresponding to constraints (4) and (5), respectively, and γ corresponding to constraints (6) and (7), respectively. The dual D (α, β, γ,x) of M(c,x) formulation is given by the following: The procedure to calculate the makespan we use here is the one proposed in Pan and Ruiz (2014), which calculates the start and completion times of the jobs in the order in which they appear in a given permutation, with the makespan being equal to the completion of the job in the last position. The procedure runs in O(nm) time and implies that M(c,x) always admits a feasible solution for a givenx ∈ X . This, in turn, means that D(α, β, γ,x) is always feasible for a givenx ∈ X , and for an optimal solution (α,β,γ ) of the dual problem, one obtains the following Benders optimality cuts: where z is a lower bound on the optimal solution value of Using this result, we are now ready to present the following reformulation of M(c, x), referred to as the master problem, constructed using the set P D of extreme points of the polytope defined by the dual problem D (α, β, γ,x), and shown as MP(P D ) below: As the MP includes a large number of optimality cuts, it can be solved using a cutting plane algorithm in practice, normally starting with MP(∅) with no optimality cuts (26), and generating the cuts on an as-needed basis. The algorithm stops after solving a certain MP(P), where P ⊆ P D .
To help speed up the convergence of the algorithm, we also use the following combinatorial Benders cuts using any solutionx ∈ X in the master problem: which can be used to cut solutionx off from the set of feasible solutions to M(c, x). In particular, after the addition of constraint (27), any solution obtained by the formulation will differ from solutionx with respect to the position of at least two jobs. This follows from the fact that changing the position of one job in a given sequence implicitly requires the change of position of another job in the sequence.

Referenced local search algorithm
Another ingredient of our algorithm is a referenced local search (RLS), proposed by Pan et al. (2008), which is initialized with a seed permutation π ref obtained by a good constructive heuristic. In this paper, the reference permutation π ref is taken from the master problem solution. Let C max (π ) denote the makespan of permutation π . The RLS procedure first finds the referenced job, which is determined by using the index i in the RLS procedure, in the current permutation π . The referenced job is removed from π and inserted into all possible positions of π . If this operation results in a permutation π * with a lower C max (π * ) as compared to C max (π ), then the current permutation is replaced with π * and the value of the counter controlling the number of runs in RLS is set to 1. Otherwise, the value of the counter is increased by one. This procedure is repeated until the value of the counter reaches the number of jobs in the problem. RLS keeps a list S with the best solutions found. The pseudocode of RLS is given in Algorithm 1. Note that the accelerated makespan calculations and speedups proposed in Pan and Ruiz (2014) are employed in the proposed RLS local search.

Cut generation using referenced local search
The choice of various ingredients used within a Benders decomposition can have a significant effect on the performance of the algorithm. These range from model selection (Magnanti and Wong 1981), cut improvement (e.g., Papadakos 2008; Saharidis and Ierapetritou 2013) and strengthening the master problem, through the addition of pre-generated valid inequalities (e.g., Cordeau et al. 2006). However, the choice of the integer solutions obtained from the master problem and the improvement thereof has received much less attention. In particular, the question around the effect of the quality of a feasible solution and the strength of the corresponding cut subsequently added to the master problem on the convergence of the Benders decomposition algorithm has not yet been fully investigated. Costa et al. (2012), for example, suggest the use of intermediate solutions within Benders decomposition, obtained either during the course of the branch-and-cut algorithm or through local search, and report encouraging results for the fixed-charge network design problem. In this paper, we seek to explore this question further and propose a Benders decomposition algorithm that embeds the bespoke RLS into the algorithm for solving the MNPFSP. The aim is to enhance the performance of the algorithm by generating extra cuts within the algorithm induced by the heuristic solutions. This section explains the details of the algorithm and describes and tests various strategies for an effective implementation.
The enhanced Benders decomposition is an iterative algorithm that generates optimality cuts (26) at each iteration on the basis of an optimal MP solution x * and uses x * as an input to the RLS to generate a user-defined number σ of neighbor solutions, each of which induces an additional optimality cut inserted into the master problem. The reason behind the choice of the RLS, instead of other heuristics and metaheuristics for the problem, is the simplicity of its implementation and the lack of any special input parameters. The RLS has been shown to work effectively for solving the PFSP (Pan et al. 2008), the NPFSP (Deng and Gu 2012) and the MNPFSP (Pan and Ruiz 2014). We propose three different strategies for generating the additional cuts, which are described below. Let S = {x 1 , x 2 , . . . , x |S| } be the set of solutions generated by RLS, and let v(x) denote the objective function value of a feasible solution x.

Computational results
This section presents a computational study to assess the performance of the Benders decomposition algorithm proposed in this paper. The algorithm and its variants are coded in Visual C++, using CPLEX 12.7.1 as the solver. We used an Intel Core i5-2450M computer with a 2.5 GHz CPU and 4 GB of memory. The tests were conducted on two sets of instances available at http://soa.iti.es/rruiz that were originally proposed in Ruiz et al. (2009)

Algorithm 2 Benders decomposition
Input: number σ of cuts to add at each iteration, cut generation strategy, allowable optimality gap ≥ 0 1: Let S be the set of all neighbor solutions of x 0 obtained by the RLS. 9: Let V ← x i where x i is one of the i = 1, 2, . . . , σ solutions obtained according to the cut generation strategy used (elite, highly elite or random) 10: for each solution x i ∈ V do 11: Solve D(α, β, γ, x i ) and let (α, β, γ ) i denote the resulting solution 12: Generate a combinatorial cut using solution x 0 and add to MP(P). 18: end for 19: end while Output: Best solution x * and value v(x * ) Ruiz (2014). The experiments were conducted in four sets, which will be explained below along with the results.

Performance of standard Benders decomposition implementations
In the first stage, we first compare our (deliberately) naive implementation of Benders decomposition (shown by BD) described in Sect. 3.1, the branch-and-cut algorithm of CPLEX 12.7.1 (shown by CPLEX) to solve the formulation of the problem shown in Sect. 2 and the automated Benders decomposition available within the software (shown by ABD). The aim is to show the performance of standard Benders decomposition implementations on the MNPFSP. For this purpose, we generate relatively small-size instances from the instance I_3_500_50_1 with 500 jobs and 50 machines available at the above website. A smaller instance with ten jobs and three machines, for example, is generated by using the first ten jobs and three machines within the larger instance. A total of 27 small-scale instances are formed with n ∈ {10, 15, 20, 25, 30, 35, 40, 45, 50} jobs and m ∈ {3, 6, 9} machines. A computational time limit of 7200 CPU seconds was imposed on the total run time of each algorithm. The results are given in Table 1, where the first three columns show the instance number as the identifier, number n of jobs and number m of machines. Then, for each of the three methods compared, we provide the value of the best integer solution (BI) identified within the time limit, the final optimality gap (GAP), in percent, the overall solution time (ST), in seconds, and the total number of iterations (NI) for ABD and BD. As can be seen from Table 1, BD was able to solve 23 out of 27 instances to optimality, while CPLEX and ABD are able to solve all of the instances to optimality within the given time limit. In terms of the time to solve to optimality, CPLEX is the fastest in all instances except for instances 16, 22 and 23, for which ABD shows a better performance. BD is quicker than the other two for instances 4, 7 and 10. These results suggest that a standard implementation of Benders decomposition without any enhancements is highly ineffective and does not seem to provide encouraging results for the MNPFSP.

Effectiveness of the cut generation strategies
The second stage of the experiments numerically evaluates the cut generation strategies, and the effect of the number of additional cuts σ added at each iteration of the algorithm. We denote by σ = σ + 1 the total number of cuts added at each iteration, where the '+ 1' indicates the original optimality cut from the solution of the master problem, and test σ ∈ {2, 5, 10, 25, 50} in the experiments, resulting in a total of 15 combinations applied to the 27 instances described above. A computational time limit of 7200 CPU seconds also was imposed.
The results of the experiments are reported in Tables 2  and 3 for the elite and the highly elite strategies, respectively, denoted by the notation Eσ and HEσ which also indicates the number of additional optimality cuts introduced. We do not report the results associated with the random strategy, as it consistently produced very poor quality solutions in all cases.
The findings indicate that the elite strategy has found the optimum solutions for all instances with all settings of σ , with the exception of instances 24 and 27 for E2, instances 18, 24, 26 and 27 for E5 and instances 18, 23 and 26 for E10. The highly elite strategy, on the other hand, has found all the optimum solutions for all values of σ , as can be seen from Table 3. Furthermore, the variant HE2 solves 19 (out of 27) instances to optimality quicker than the other two cut generation strategies. Comparing HE2 with the results of CPLEX and ABD reported in Table 1, we observe that it yields the best performance on 11 out of the 27 instances and is able to solve instances to optimality for which BD has failed to do so within the time limit.

Results on larger-scale instances
The third and last phase of the computational study compares the algorithms on medium-and large-scale instances. The former set includes a further 30 instances, generated in the same manner as the small-size instances by choosing n to range from 50 to 500 in increments of 50, and m as either 5, 10 and 15 from within the main instance I_3_500_50_1.
Having established the superiority of HE2 over other variants of the Benders decomposition algorithm in the previous section, we now employ two further enhancements to this algorithm. The first is the addition of combinatorial cuts (27) within the algorithm, shown by HE2, and the second is a further strengthening of the optimality cuts using the Paretooptimal cut generation scheme described by Magnanti and Wong (1981), indicated by HE2 + PO. These two variants have also been tested by deactivating the combinatorial cuts, indicated using the same names but with the suffix '-CC.' It should be noted with the addition of cardinality cuts, the optimality gaps are no longer valid, reasons for which we do not report here. The tests comparing the four variants have been conducted on a limited set of instances with n chosen as either 50, 100 or 150, and m equal to 5, 10 or 15. The results are reported in Table 4.
The results in Table 4 show that the variants of the algorithm, namely HE2 and HE2 + PO that use combinatorial  To further compare HE2 and HE2 + PO along with the CPLEX solver and the ABD, we provide further results on instances with up to 500 jobs in Table 5. These results are indicative of the superior performance of the two variants of the algorithm over CPLEX and ABD, in terms of both the computational solution time and solution quality. In particular, with the exception of one instance, one of the two variants always yields the best performance. More remarkably, while neither CPLEX nor ABD was able to identify a feasible solution for instances (n = 150, m = 10), (n = 300, m = 5), (n = 350, m = 10), (n = 500, m = 5) and (n = 500, m = 5) within the allowed time limit, at least one of HE2 or HE2 + PO solved these instances to optimality.

Comparison with a state-of-the-art heuristic
The last set of experiments reported in this section are conducted to compare the algorithms we propose in this paper with a state-of-the-art heuristic described for the problem, namely the iterated greedy algorithm (IGA) of Pan and Ruiz (2014), in terms of the value of the solutions identified. These tests use the same set of instances shown in Table 5. In their study, Pan and Ruiz (2014) run the IGA under a time limit of n × (m/2) × ρ milliseconds, where ρ ∈ {10, 20, 30, 60, 90}. In our experiments, we run the IGA five times, using the largest value for ρ = 90 in setting the time limit for each of the five runs. The results are shown in Table 6, where the column titled BI corresponds to the best value found for the Bold values indicate the best solution value corresponding instance, and the column titled Method indicates which of the four algorithms of Table 5 was able to identify the best value. Columns five to seven report the minimum, maximum and the average solution values for each instance, respectively, across the five runs of the IGA, and the standard deviation. For reasons of fairness, we also run the IGA once for each instance, under the same time limit of 7200 s used for the exact algorithms, and report the value of the best solution identified in the last column of the table.
As the results in Table 6 indicate, the exact algorithms proposed in this paper, in particular HE2 and HE2 + PO, are highly competitive with the IGA for instances with n ≤ 250, and are able to identify the same solution values in most cases. However, the exact algorithms often yield better quality solutions for larger instances in comparison, irrespective of whether the IGA is run under a 7200 second time limit or shorter. In particular, for instances with (n = 350, m = 15), (n = 400, m = 10), (n = 400, m = 15), (n = 450, m = 10), (n = 450, m = 15), (n = 500, m = 10) and (n = 500, m = 15), the exact algorithms yield better quality solutions under the same time limit.

Conclusions and future research
This paper presented an enhancement to the traditional Benders decomposition by generating cuts using a local search algorithm for the mixed no-idle permutation flowshop scheduling problem. The latter algorithm was used as an 'oracle' to generate high-quality solutions, which in turn were used to construct additional cuts used within the iterative algorithm. Our findings indicate that such a strategy can substantially improve the performance of the algorithm, even with only a single additional cut added at each iteration, and that the quality of the solution used to generate the additional cut is of paramount importance. In particular, it is only high-quality solutions that help to improve the convergence; randomly generated solutions worsen the efficiency of the algorithm. Our algorithm also makes use of combinatorial cuts that eliminate feasible solutions from the search space, which leads to the solution of instances of up to 500 jobs and five machines that otherwise are not solved with a commercial solver. The results encourage the use of such a strategy on other types of problems, assuming that an 'oracle' is available to generate high-quality solutions within short computational times.