A characterization of optimal multiprocessor schedules and new dominance rules

The paper on hand approaches the classical makespan minimization problem on identical parallel machines from a rather theoretical point of view. Using an approach similar to the idea behind inverse optimization, we identify a general structural pattern of optimal multiprocessor schedules. We also show how to derive new dominance rules from the characteristics of optimal solutions. Results of our computational study attest to the efficacy of the new rules. They are particularly useful in limiting the search space when each machine processes only a few jobs on average.


Introduction
The present paper is concerned with the multiprocessor scheduling problem. Given a set $M = \{M_1, \dots, M_m\}$ of $m \ge 2$ identical parallel machines and a set $J = \{J_1, \dots, J_n\}$ of $n > m$ independent jobs with positive processing times $p_1, p_2, \dots, p_n$, the objective is to assign the jobs to the machines so that the latest machine completion time (also called makespan) $C_{\max} = \max\{C_1, \dots, C_m\}$ is minimized, where $C_i$ is the sum of the processing times of all jobs assigned to $M_i$. Preemption is not allowed. Using the three-field notation of Graham et al. (1979), this problem is abbreviated as $P||C_{\max}$. In the literature, $P||C_{\max}$ is also known as the makespan minimization problem on identical parallel machines.
The NP-hard problem $P||C_{\max}$ (see Garey and Johnson 1979) is one of the most basic and fundamental problems in scheduling theory. It has received, and still receives, a lot of attention from both the academic world and practitioners. The large body of literature that has evolved over the years contains papers on approximation algorithms, (meta-)heuristics, exact solution procedures, and lower bounding techniques.
From the numerous publications on (meta-)heuristic algorithms within the last two decades, we selected the following few ones to outline the broad range of near-optimal solution approaches. Alvim and Ribeiro (2004) exploited the "dual" relation between P||C max and the bin packing problem (BPP). They proposed a hybrid improvement heuristic that consists of construction, redistribution, and improvement phases. In the latter phase, tabu search is applied. Frangioni et al. (2004) proposed new neighborhood operators for local search algorithms that perform multiple exchanges of jobs among machines. Dell'Amico et al. (2008) presented an effective meta-heuristic algorithm based on the scatter search paradigm. Kashan and Karimi (2009) presented a discrete particle swarm optimization algorithm and a hybrid version, that makes use of an efficient local search algorithm to further improve on the makespan. Paletta and Vocaturo (2011) developed a composite heuristic. In the construction phase, families of partial solutions are combined until a feasible solution is generated. The construction phase is followed by an improvement phase. Local search techniques are used to improve on the initial solution. Davidović et al. (2012) applied a bee colony optimization approach. In the same year, Chen et al. (2012) proposed a dynamic harmony search algorithm and a hybrid version, that additionally performs a variable neighborhood search based local search. Among the most recently published meta-heuristic algorithms are the grouping evolutionary strategy of Kashan et al. (2018) and an improved cuckoo search of Laha and Gupta (2018). Only recently, Della Croce et al. (2019) and Della Croce and Scatamacchia (2018) have revisited the famous longest processing time (LPT) rule of Graham (1969).
A few approaches towards the exact solution of $P||C_{\max}$ have also been published. Dell'Amico and Martello (1995) implemented a depth-first branch-and-bound algorithm. They also derived tight lower bounds from the relationship between $P||C_{\max}$ and BPP. Mokotoff (2004) designed a cutting plane algorithm. Dell'Amico et al. (2008) proposed a specialized binary search and a branch-and-price scheme. Haouari and Jemmali (2008) suggested a new symmetry-breaking branching scheme and lifting procedures to tighten lower bounds. Lenté et al. (2013) derived a new exponential-time algorithm from their extension of the Sort and Search method. Mnich and Wiese (2015) presented the first fixed-parameter algorithm for $P||C_{\max}$. Recently, Mrad and Souayah (2018) proposed an arc-flow formulation.
Despite the large number of publications, only little is known about the structure of optimal solutions. This might be due to the fact that $P||C_{\max}$ itself has very little structure compared to other NP-hard optimization problems. To the best of our knowledge, only Dell'Amico and Martello (1995) addressed this issue in passing by developing upper and lower bounds on the number of jobs per machine. These bounds are then used to derive lower bounds on the optimal makespan.
To close this gap, we aim at identifying general characteristics of optimal multiprocessor schedules. Using an approach that is to some extent related to the concept of inverse optimization, we show that a schedule has to have a specific characteristic in order to be (uniquely) optimal. This allows us to restrict the solution space effectively during the search for an optimal schedule. In one of our earlier papers (Walter et al. 2017), we have already applied this approach successfully to the exact solution of the "dual" problem P||C min , i.e., the problem of maximizing the minimum machine completion time. As the said paper overtook the present one during the review process, it does not contain any proofs of the underlying mathematical theory. However, we think that it is important to provide the formal results as well. This makes it easier for future researchers to transfer them to other combinatorial optimization problems such as the bin packing problem. The mathematical groundwork, therefore, constitutes the main contribution of this paper.
We identify and prove a general characteristic of optimal multiprocessor schedules and translate it into new dominance rules. Although this paper focuses on theoretical foundations, we implemented these rules in order to determine their benefit in a computational study. Used within a rather simple depth-first search, the rules produced promising results: they are quite effective in eliminating dominated (partial) solutions when each machine processes only a few jobs on average (i.e., $2 < n/m < 4$). Those instances are known to be typically more difficult to solve than large-sized instances with multiple jobs per machine (cf. the computational results published in Dell'Amico and Martello (1995), Dell'Amico et al. (2008) and Haouari and Jemmali (2008)). With increasing $n/m$, bounding arguments often become tighter (cf., e.g., Haouari et al. 2006) and this helps to verify optimality more quickly.
Our paper is divided into a theoretical part (Sects. 2, 3) and a practical part (Sects. 4, 5). The theoretical part represents the main contribution. Here, we undertake a thorough investigation of the solution space and identify a general characteristic of optimal multiprocessor schedules. We then translate our findings into new dominance rules and discuss prerequisites for their application within a tree search. In the second part, we describe the elements of the implemented branch-and-bound algorithm and address the efficient implementation of the new dominance rules. We then analyze the results of our experimental study and assess the benefit of the new rules. Finally, Sect. 6 concludes the paper and describes future research directions.

A theoretical study of the solution space
In this section we provide a profound theoretical study of the underlying solution space. Using an approach similar to the idea behind inverse optimization (cf., e.g., Ahuja and Orlin 2001), we aim at the identification of a general characteristic of optimal multiprocessor schedules. In the remainder of the paper we presuppose the jobs to be labeled so that $p_1 \ge p_2 \ge \dots \ge p_n$.
With these restrictions we eliminate symmetric solutions that result from a simple renumbering of the machines. We, therefore, call the remaining solutions non-permuted schedules. It is readily verified that the number of non-permuted schedules is approximately equal to $m^n/m!$. Throughout this paper we use the aforementioned symmetry-breaking solution representation. For brevity, we often omit the adjunct "non-permuted" when we speak of schedules.
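The count of non-permuted schedules can be checked by enumeration for small instances. The following is a minimal sketch and not the authors' implementation. It assumes (this is our assumption, since the representation is not spelled out above) that a schedule is a string $(s_1, \dots, s_n)$ of machine indices in which job 1 is on machine 1 and a new machine index may only be opened in increasing order; the function name `non_permuted_schedules` is hypothetical.

```python
from math import factorial


def non_permuted_schedules(n, m):
    """Yield all strings (s_1, ..., s_n) with s_1 = 1 and
    s_j <= max(s_1, ..., s_{j-1}) + 1 that use at most m machines.

    Assumption: this is the symmetry-breaking representation; it keeps
    exactly one schedule per class of machine renumberings.
    """
    def extend(prefix, used):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # Either reuse one of the `used` machines or open machine used + 1.
        for machine in range(1, min(used + 1, m) + 1):
            prefix.append(machine)
            yield from extend(prefix, max(used, machine))
            prefix.pop()

    yield from extend([], 0)


if __name__ == "__main__":
    n, m = 4, 3
    count = sum(1 for _ in non_permuted_schedules(n, m))
    # Compare the exact count with the approximation m^n / m!.
    print(count, m**n / factorial(m))
```

For $n = 4$ and $m = 3$ the enumeration yields 14 schedules, compared with $m^n/m! = 13.5$.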

Methodological approach: the concept of potential optimality and the path conditions
Our study originates from the question whether there exists a general pattern that characterizes non-optimal solutions no matter what selection of processing times is given. The existence of such a characteristic would allow us to limit the solution space to those schedules that do not have this characteristic and therefore have the potential to become (uniquely) optimal. We call them potentially (unique) optimal schedules. Our main goal is to identify such a characteristic and to find a preferably small set of schedules that contains at least one optimal solution for any feasible input data. If this succeeds, it suffices to search this set for an optimal schedule. To achieve this goal we apply an approach that is related to the concept of inverse optimization (see Ahuja and Orlin 2001; Heuberger 2004). In inverse optimization one aims at determining unknown exact values of (some) adjustable parameters, such as processing times, within given boundaries so that a pre-specified solution becomes optimal. Until now, this concept has been applied to scheduling problems only by very few researchers (e.g., Shakhlevich 2009, 2011; Koulamas 2005). Slightly deviating from the basic idea of inverse optimization, we consider arbitrary schedules and ask whether we can select $n$ feasible processing times so that the given schedule becomes uniquely makespan-optimal. If no such set of processing times exists, we can eliminate this schedule from the solution space.
Before we start with the characterization of potentially optimal solutions, we introduce a new way of illustrating schedules. Usually, Gantt charts are used to display which machine performs which job and when processing starts and ends. However, as our methodological approach mainly builds on the number of jobs on each machine rather than on processing times, we propose to illustrate a schedule $S$ by $\binom{m}{2}$ paths $P_S^{(i_1,i_2)}$ ($1 \le i_1 < i_2 \le m$), one for each pair of machines. Simply put, $P_S^{(i_1,i_2)}$ is a string of length $n + 1$ where the $j$-th entry ($j = 1, \dots, n$) represents the difference between the number of jobs on machine $i_1$ and $i_2$ after the $j$ longest jobs have been assigned according to schedule $S$. We set $P_S^{(i_1,i_2)}(0) = 0$ for each path to represent initially empty machines. Example 2.1 shows how we depict paths.
The difference $P_S^{(i_1,i_2)}(j) - P_S^{(i_1,i_2)}(j-1)$ between any two successive entries can take only one of the three values 1, −1, or 0 for each path. A difference equal to 1 means that job $j$ is assigned to machine $i_1$ (illustrated by an upward line), a difference equal to −1 means that job $j$ is assigned to machine $i_2$ (illustrated by a downward line), and a difference equal to 0 means that $j$ is assigned neither to machine $i_1$ nor to $i_2$ but to one of the other $m − 2$ machines (illustrated by a horizontal line).
As will become apparent in the next two subsections (cf. Theorems 2.3 and 2.5), schedules in the set $S(n, m)$ ($n > m \ge 2$) are central to the concept of potentially unique optimal $P||C_{\max}$-solutions. We define this set as follows:

$$S(n, m) = \Big\{ S : \text{for all } 1 \le i_1 < i_2 \le m \text{ either } \text{"}P_S^{(i_1,i_2)}(j) < 0 \text{ for at least one } j \in \{1, \dots, n\}\text{"} \text{ or } \text{"}P_S^{(i_1,i_2)}(j) = 1 \text{ for } j = j_1, \dots, j_2 \ (0 < j_1 \le j_2 < n) \text{ and } P_S^{(i_1,i_2)}(j) = 0 \text{ for } j = 0, \dots, j_1 - 1, j_2 + 1, \dots, n\text{"} \Big\}. \quad (1)$$

The set $S(n, m)$ contains all schedules $S$ that feature two characteristics: (i) each machine processes at least one job and (ii) each path has at least one negative entry when the total number of jobs on the two corresponding machines is greater than 2. We say that schedules in $S(n, m)$ satisfy the path conditions or, equivalently, each of the $\binom{m}{2}$ paths satisfies the path condition. Returning to Example 2.1, schedule $S = (1, 2, 3, 2)$ is obviously not in $S(4, 3)$ as the (2, 3)-path does not satisfy the path condition, whereas the other two paths satisfy the path condition.
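The paths and the membership test for $S(n, m)$ can be sketched in a few lines of Python. This is an illustrative sketch under the assumption that a schedule is given as a tuple of machine indices, one per job in order of non-increasing processing time; the function names `path`, `satisfies_path_condition` and `in_S` are ours and do not appear in the paper.

```python
from itertools import combinations


def path(schedule, i1, i2):
    """Return [P(0), P(1), ..., P(n)] for the machine pair (i1, i2)."""
    entries = [0]  # P(0) = 0 represents the initially empty machines
    for machine in schedule:
        step = 1 if machine == i1 else -1 if machine == i2 else 0
        entries.append(entries[-1] + step)
    return entries


def satisfies_path_condition(schedule, i1, i2):
    """Check the two alternatives in definition (1) for one pair i1 < i2."""
    if min(path(schedule, i1, i2)) < 0:
        return True  # first alternative: at least one negative entry
    # Second alternative: the path is 1 on j1..j2 and 0 elsewhere, i.e.
    # each machine holds exactly one job (with i1's job coming first,
    # which is automatic here because no entry is negative).
    return schedule.count(i1) == 1 and schedule.count(i2) == 1


def in_S(schedule, m):
    """True if all m-choose-2 paths satisfy the path condition."""
    return all(satisfies_path_condition(schedule, a, b)
               for a, b in combinations(range(1, m + 1), 2))


if __name__ == "__main__":
    S = (1, 2, 3, 2)
    for a, b in combinations(range(1, 4), 2):
        print((a, b), path(S, a, b), satisfies_path_condition(S, a, b))
    print(in_S(S, 3))
```

For $S = (1, 2, 3, 2)$ the (2, 3)-path is $(0, 0, 1, 0, 1)$, which has no negative entry although machines 2 and 3 together process three jobs, so the schedule is rejected.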
The shares are quite small as can be seen from Table 1. In particular, when $m \in \{5, 6, 7\}$ the number of schedules in $S(n, m)$ is smaller by some orders of magnitude than the number of non-permuted schedules. However, recalling that the total number of non-permuted schedules is approximately equal to $m^n/m!$, $S(n, m)$ may contain a great number of schedules despite small shares.

Potentially optimal schedules on two machines
We start with the case of two identical parallel machines and prove the following theorem.

Theorem 2.3 Let S be a schedule that is not in S(n, 2). Then, S is not a potentially unique makespan-optimal schedule.
Proof Consider a schedule $S \notin S(n, 2)$ and let $J_1(S) = \{a_1, \dots, a_r\}$ and $J_2(S) = \{b_1, \dots, b_s\}$ denote the sets of jobs (to be more accurate: their indices) that are assigned to machine 1 and 2, respectively. Without loss of generality, we assume that $a_1 < a_2 < \dots < a_r$ and $b_1 < \dots < b_s$. Since $S \notin S(n, 2)$, the number of jobs on machine 1 must be at least as large as the number of jobs on machine 2, i.e., $r \ge s$ (with $r + s = n$). Moreover, for each $k \in \{1, \dots, s\}$, the processing time of the $k$-th longest job on machine 1 is at least as large as the processing time of the $k$-th longest job on machine 2, i.e., $p_{a_k} \ge p_{b_k}$, which is equivalent to $a_k < b_k$. Thus, machine 1 runs at least as long as machine 2. The completion time of the last job on machine 1 gives the makespan of schedule $S$, i.e., $C_{\max}(S) = \sum_{k=1}^{r} p_{a_k}$. In order to prove that $S$ is not a potentially unique makespan-optimal solution, we construct a schedule $\bar S$ that is not "longer" than $S$, i.e., $C_{\max}(\bar S) \le C_{\max}(S)$, no matter what problem instance is given. We distinguish two cases depending on the number of jobs on machine 2 in $S$.

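1. $s < 2$. Let $\bar S$ be the schedule that is obtained when job $a_2$ is shifted from machine 1 to machine 2 in $S$. Obviously, this cannot increase the makespan.
2. $s \ge 2$. Now, let $\bar S$ be the schedule that is obtained when the jobs $a_s$ and $b_s$ are swapped in $S$, i.e., $a_s$ is processed on machine 2 and $b_s$ on machine 1 in $\bar S$. Then, the makespan of $\bar S$ is $C_{\max}(\bar S) = \max\{\sum_{k=1, k \ne s}^{r} p_{a_k} + p_{b_s},\ \sum_{k=1}^{s-1} p_{b_k} + p_{a_s}\}$. If the maximum is attained by the first term, then $C_{\max}(\bar S) \le \sum_{k=1}^{r} p_{a_k} = C_{\max}(S)$ because $p_{b_s} \le p_{a_s}$. In the other case, i.e., $C_{\max}(\bar S) = \sum_{k=1}^{s-1} p_{b_k} + p_{a_s}$, we can conclude $C_{\max}(\bar S) \le \sum_{k=1}^{s} p_{a_k} \le C_{\max}(S)$ because $p_{b_k} \le p_{a_k}$ for $k = 1, \dots, s - 1$.
This completes the proof of the theorem as in either case a schedule $\bar S$ exists that is not longer than $S$.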
We remark that the schedule $\bar S$ itself is not required to be an element of $S(n, 2)$. However, it is readily verified that we can convert any schedule $S \notin S(n, 2)$ into a schedule $S' \ne S$ that belongs to $S(n, 2)$ by an iterative application of the shifting operation (as in Case 1 of the proof of Theorem 2.3) and/or swapping operation (as in Case 2). We call the entire process path conversion (on two machines). Clearly, the path conversion does not increase the makespan. Example 2.4 illustrates the procedure.
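The path conversion on two machines can be sketched as a simple loop that applies the shift of Case 1 or the swap of Case 2 until the path has a negative entry. The sketch below is illustrative only; it assumes $n \ge 3$, job 1 on machine 1, a schedule given as a tuple of 1s and 2s, and processing times listed in non-increasing order. The function names are ours.

```python
from itertools import product


def has_negative_entry(schedule):
    """Path condition on two machines with n >= 3: some P(j) < 0."""
    diff = 0
    for machine in schedule:
        diff += 1 if machine == 1 else -1
        if diff < 0:
            return True
    return False


def makespan(schedule, p):
    """Maximum load of the two machines under processing times p."""
    loads = {1: 0, 2: 0}
    for machine, t in zip(schedule, p):
        loads[machine] += t
    return max(loads.values())


def path_conversion(schedule):
    """Shift (s < 2) or swap (s >= 2) until the path condition holds."""
    current = list(schedule)
    while not has_negative_entry(current):
        on1 = [j for j, mach in enumerate(current) if mach == 1]
        on2 = [j for j, mach in enumerate(current) if mach == 2]
        s = len(on2)
        if s < 2:
            current[on1[1]] = 2        # Case 1: shift job a_2 to machine 2
        else:
            current[on1[s - 1]] = 2    # Case 2: swap a_s ...
            current[on2[s - 1]] = 1    # ... with b_s
    return tuple(current)


if __name__ == "__main__":
    p = [9, 7, 6, 5, 4]  # hypothetical processing times, non-increasing
    for rest in product((1, 2), repeat=len(p) - 1):
        before = (1,) + rest
        after = path_conversion(before)
        print(before, makespan(before, p), "->", after, makespan(after, p))
```

For instance, $(1, 1, 1, 2)$ is first turned into $(1, 2, 1, 2)$ by a shift and then into $(1, 2, 2, 1)$ by a swap.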
In view of Sect. 2.4 it is useful to record two important properties of the swapping operation (recall that swaps imply $s \ge 2$ jobs on machine 2):
• The $s − 1$ longest jobs on machine 1 are not affected by any swap.
• Let $\bar S$ denote the schedule that is obtained when one swap is performed on a schedule $S \notin S(n, 2)$. Then, the entries in the path $P_{\bar S}$ can be computed as follows:

$$P_{\bar S}(j) = \begin{cases} P_S(j) & \text{if } j = 0, \dots, a_s - 1, \\ P_S(j) - 2 & \text{if } j = a_s, \dots, b_s - 1, \\ P_S(j) & \text{if } j = b_s, \dots, n. \end{cases}$$

[Table 3: Schedule $\bar S$. Table 4: Schedule $S'$.]
Thus, the path conversion takes exactly $(P_S(2s − 1) + 1)/2$ swaps. The first negative entry in the path of the resulting schedule $S'$ occurs at position $2s − 1$.
To sum up, we can say that S(n, 2) contains at least one optimal solution for every selection of n feasible processing times. Consequently, when searching for an optimal solution, it is not necessary to consider schedules that are not in S(n, 2), i.e., schedules that do not satisfy the path condition. In a preliminary experimental study we found for every schedule S in S(n, 2) (n ≤ 25) a selection of n processing times so that S is the unique optimal solution. This implies that further reductions in the solution space appear to be only realizable when processing times are explicitly taken into account.

Potentially optimal schedules on three or more machines
Using our findings from the previous subsection we now address potentially optimal schedules on more than two identical parallel machines.
Theorem 2.5 Let $S$ be a schedule that is not in $S(n, m)$ ($m \ge 3$). Then, $S$ is not a potentially unique makespan-optimal schedule. Moreover, any schedule $S \notin S(n, m)$ can be converted into a schedule that belongs to $S(n, m)$ by a successive application of the path conversion.
We will prove this theorem with the help of the following two Lemmata 2.6 and 2.7.
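Lemma 2.6 Let $S$ be a schedule whose $(i_1, i_2)$-path satisfies the path condition while its $(i_1, i_3)$-path does not satisfy it. Then, the $(i_1, i_2)$-path still satisfies the path condition after application of the path conversion to $P_S^{(i_1,i_3)}$.
Proof We consider the application of the path conversion to $P_S^{(i_1,i_3)}$. Each single shift and swap also affects the entries in the $(i_1, i_2)$-path. We start with the case of a shift. Assume that job $k$ is shifted from machine $i_1$ to $i_3$. Then, the entries in the $(i_1, i_2)$-path change as follows:

$$P_{\bar S}^{(i_1,i_2)}(j) = \begin{cases} P_S^{(i_1,i_2)}(j) & \text{if } j = 0, \dots, k - 1, \\ P_S^{(i_1,i_2)}(j) - 1 & \text{if } j = k, \dots, n. \end{cases}$$

Now consider the case of a swap. Assume that job $k$ on machine $i_1$ is swapped with job $l$ on machine $i_3$. Recall from Sect. 2.3 that $k < l$. This leads to the following entries in the $(i_1, i_2)$-path:

$$P_{\bar S}^{(i_1,i_2)}(j) = \begin{cases} P_S^{(i_1,i_2)}(j) & \text{if } j = 0, \dots, k - 1, \\ P_S^{(i_1,i_2)}(j) - 1 & \text{if } j = k, \dots, l - 1, \\ P_S^{(i_1,i_2)}(j) & \text{if } j = l, \dots, n, \end{cases}$$

with $S$ and $\bar S$ denoting the schedule before and after the current step is performed, respectively. As can be seen from the two formulas, after each single step of the conversion we have $P_{\bar S}^{(i_1,i_2)}(j) \le P_S^{(i_1,i_2)}(j)$ for all positions $j$. Hence, it is impossible that the $(i_1, i_2)$-path does not contain a negative entry anymore after the conversion of the $(i_1, i_3)$-path is completed.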
Lemma 2.7 Let $S$ be a schedule whose $(i_1, i_2)$-path and $(i_1, i_3)$-path satisfy the path condition while its $(i_2, i_3)$-path does not satisfy it. Then, the $(i_1, i_2)$-path and the $(i_1, i_3)$-path still satisfy the path condition after application of the path conversion to $P_S^{(i_2,i_3)}$.
Proof Assume that the numbers of jobs on the three machines $i_1$, $i_2$, and $i_3$ are $r$, $s$, and $q$ in schedule $S$, respectively. We let $b_k$ ($k = 1, \dots, s$) denote the job with the $k$-th smallest index on machine $i_2$ and $c_q$ denote the job with the largest index on machine $i_3$. We use $b_k$ and $c_q$ also as the index of the corresponding job.
We distinguish two cases for $q$.
1. $q = 1$. Due to the assumptions on the three considered paths, $q = 1$ implies $r = 1$ and $s > 1$. Then, according to Sect. 2.3, the conversion of the $(i_2, i_3)$-path takes at most two steps (one shift and at most one subsequent swap). After completing the conversion of the $(i_2, i_3)$-path, the number of jobs on machine $i_2$ is still greater than or equal to one and the number of jobs on machine $i_3$ equals two. The number of jobs on $i_1$ remains unchanged. Thus, the $(i_1, i_2)$-path and the $(i_1, i_3)$-path still satisfy the path condition.
2. $q > 1$. Let $j_1$ and $j_2$ denote the position of the first negative entry in $P_S^{(i_1,i_2)}$ and $P_S^{(i_1,i_3)}$, respectively. Both entries exist because each of the two paths is supposed to satisfy the path condition. Now we consider the path conversion of $P_S^{(i_2,i_3)}$. First, recall from Sect. 2.3 that this conversion does not affect the $q − 1$ longest jobs $b_1, \dots, b_{q−1}$ on machine $i_2$. Furthermore, note that the first step of the conversion consists in swapping job $b_q$ on $i_2$ with job $c_q$ on $i_3$, i.e., no previous shift is performed, which means that the number of jobs on each machine remains unchanged. We distinguish two subcases depending on the relation between $j_1$ and $b_q$.
First, note that this subcase implies $j_2 = c_q$. Hence, there are exactly $q − 1$ jobs on machine $i_1$ whose index is not greater than $c_q − 1$ in schedule $S$. Since the path conversion of $P_S^{(i_2,i_3)}$ leaves the jobs $b_1, \dots, b_{q−1}$ unchanged but swaps at least the jobs $b_q$ and $c_q$, there are at least $q$ jobs on machine $i_2$ whose index is not greater than $c_q$ in the resulting schedule $\bar S$. Thus, we have $P_{\bar S}^{(i_1,i_2)}(c_q) \le (q - 1) - q = -1 < 0$. It remains to show that the $(i_1, i_3)$-path of the resulting schedule $\bar S$ also still satisfies the path condition. This is readily done because (i) jobs on machine $i_1$ were not affected by the conversion of the $(i_2, i_3)$-path and (ii) some of the "downward lines" in the resulting $(i_1, i_3)$-path occur earlier than in the initial $(i_1, i_3)$-path. Hence, in either subcase we have $P_{\bar S}^{(i_1,i_3)}(j) \le P_S^{(i_1,i_3)}(j)$ for all positions $j$.
Proof of Theorem 2.5 The proof of the first part of the theorem is straightforward. Since $S$ is not in $S(n, m)$, there exists at least one path that does not satisfy the path condition. Let the $(i_1, i_2)$-path be such a path. Application of the path conversion to $P_S^{(i_1,i_2)}$ neither increases the maximum completion time of the two machines $i_1$ and $i_2$ (cf. Sect. 2.3) nor involves any jobs on the other $m − 2$ machines. Thus, the makespan of the resulting schedule cannot be greater than the makespan of $S$. This proves that $S$ cannot be a potentially unique makespan-optimal schedule.
We prove the second part of the theorem with the help of the two Lemmata 2.6 and 2.7. First, we consider the paths (1, 2), (1, 3), . . . , (1, m) one by one. If any of these does not satisfy the path condition, we apply the path conversion. According to Lemma 2.6, each of the $m − 1$ paths $(1, i)$ ($i = 2, \dots, m$) then satisfies the path condition. However, a renumbering of the machines may now be required in order to restore the representation as a non-permuted schedule. In the second round, we consider the paths (2, 3), (2, 4), . . . , (2, m) one by one. If any of these does not satisfy the path condition, we apply the path conversion. According to Lemmata 2.6 and 2.7, each of the $2m − 3$ paths $(1, i)$ ($i = 2, \dots, m$) and $(2, i)$ ($i = 3, \dots, m$) then satisfies the path condition. Again, a renumbering of the machines may be required. We repeat this iterative process until we finally arrive at the $(m − 1, m)$-path. This shows that we can convert any schedule $S \notin S(n, m)$ into a schedule that belongs to $S(n, m)$ by a successive application of the path conversion.
To sum up the results of Sects. 2.3 and 2.4, we can say that for every $m \ge 2$ and $n > m$ the set $S(n, m)$ always contains at least one optimal solution no matter what processing times are given. Hence, when searching for an optimal solution to a given sequence of processing times it is not necessary to consider schedules that are not in $S(n, m)$. Those schedules can be excluded from the solution space since there always exists at least one optimal solution that satisfies the path conditions. In view of the fact that $P||C_{\max}$ has only very little problem-inherent structure, we did not quite expect such a universal result. However, we also remark that we can select processing times in such a way that not every optimal solution satisfies the path conditions. An obvious example is the case of identical processing times where $p_1 = \dots = p_n$. Then, any schedule with either $\lfloor n/m \rfloor$ or $\lceil n/m \rceil$ jobs on each of the $m$ machines is makespan-optimal. However, not every such schedule is in $S(n, m)$, e.g., the schedule $S = (1, 2, \dots, m, 1, 2, \dots, m, \dots)$.
As before, we conducted a small experimental study for $m = 3$ and $n \le 12$. Again, we were able to find processing times for every $S \in S(n, m)$ so that $S$ is the unique optimal solution. Although we do not have a mathematical proof yet, we strongly conjecture that processing times exist for each $S \in S(n, m)$ so that $S$ is the unique optimal solution. This would imply that $S(n, m)$ cannot be further reduced without explicitly taking into account the processing times. However, a watertight proof remains a challenging task for future research. Keeping in mind that the concept of potential optimality does not require knowledge about the actual processing times, we feel that there might be some room to tighten our universal results when specific classes of processing times (e.g., depending on the ratio of the longest to the shortest processing time) are considered.
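For small instances, the statement that $S(n, m)$ always contains an optimal schedule can be checked by brute force. The following sketch is not part of the paper's experiments; it assumes the string representation of non-permuted schedules with machine indices opened in increasing order, and all function names are hypothetical.

```python
from itertools import combinations


def schedules(n, m):
    """Enumerate non-permuted schedules (assumed representation:
    s_1 = 1 and s_j <= max(s_1..s_{j-1}) + 1, at most m machines)."""
    def extend(prefix, used):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for machine in range(1, min(used + 1, m) + 1):
            yield from extend(prefix + [machine], max(used, machine))
    yield from extend([], 0)


def in_S(schedule, m):
    """True if every pair of machines satisfies the path condition."""
    for i1, i2 in combinations(range(1, m + 1), 2):
        diff, negative = 0, False
        for machine in schedule:
            diff += (machine == i1) - (machine == i2)
            negative = negative or diff < 0
        one_each = schedule.count(i1) == 1 and schedule.count(i2) == 1
        if not (negative or one_each):
            return False
    return True


def makespan(schedule, p, m):
    loads = [0] * m
    for machine, t in zip(schedule, p):
        loads[machine - 1] += t
    return max(loads)


def best_makespan(p, m, restrict_to_S):
    """Minimum makespan over all schedules, or only over S(n, m)."""
    p = sorted(p, reverse=True)  # jobs labeled by non-increasing time
    return min(makespan(s, p, m) for s in schedules(len(p), m)
               if not restrict_to_S or in_S(s, m))


if __name__ == "__main__":
    p, m = [7, 6, 5, 4, 3, 2, 1], 3  # hypothetical instance
    print(best_makespan(p, m, False), best_makespan(p, m, True))
```

Both values coincide on such instances, while the schedule $(1, 2, 1, 2)$ for four identical jobs on two machines is optimal but not contained in $S(4, 2)$.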

New dominance rules derived from the path conditions
From our study of the solution space (see Sect. 2) we have learned that there always exists an optimal solution that satisfies the path conditions. We will now use this result to derive and formulate new existential property-based dominance rules for $P||C_{\max}$ (cf. Jouglet and Carlier 2011, for an overview on different formulations and types of dominance rules in combinatorial optimization). These rules will then be integrated into an exact solution procedure to guide the search towards schedules in $S(n, m)$ (see Sect. 4).
We now describe the rationale behind the new rules. Given a partial solution we want to decide whether or not it is possible to complete this solution in such a way that the path conditions are satisfied. Basically, this can be done by counting for each machine separately the minimum number of jobs that still have to be assigned until the path conditions are satisfied. Obviously, the counting strongly depends on which jobs have already been assigned and which jobs still have to be assigned, i.e., assumptions on the order in which the jobs are selected for assignment are required. To derive effective rules, that preferably allow for an early decision whether or not the path conditions can be satisfied, we assume the jobs to be successively assigned in order of non-increasing processing times. This is a common job selection principle-not only in a job-oriented branching scheme but also in construction heuristics such as the well-known LPT-rule (cf. Graham 1969). However, it is important to keep in mind that neither the validity of the theoretical results derived in Sect. 2 nor their translation into dominance rules presupposes this specific job selection principle. At the end of this section we will sketch how to derive special dominance rules for other job selection principles or a machine-oriented branching scheme.

Counting the minimum number of required jobs
In order to count the minimum number of required jobs we identify all machine-pairs that currently do not satisfy the path condition and determine the minimum number of required jobs on these machines. Let us consider a partial solution $\bar S_k$ in which the $k < n$ longest jobs have already been assigned. For each pair of machines $(i_1, i_2)$ with $1 \le i_1 < i_2 \le m$ we introduce a dummy variable $\delta_{\bar S_k}^{(i_1,i_2)} \in \{0, 1\}$ that indicates whether or not the corresponding path condition is currently fulfilled ($\delta_{\bar S_k}^{(i_1,i_2)} = 1$ if the answer is yes, $\delta_{\bar S_k}^{(i_1,i_2)} = 0$ if the answer is no). In line with the definition of the set $S(n, m)$ in (1), a pair $(i_1, i_2)$ currently satisfies the path condition if either the corresponding partial path already has at least one negative entry or each of the two machines processes exactly one of the first $k$ jobs. At this point it is important to note that in the former case, the path condition remains satisfied no matter how the remaining $n − k$ jobs are assigned, whereas in the latter case, the $(i_1, i_2)$-path might not satisfy the path condition after the assignment of the remaining jobs (e.g., when $i_1$ receives another job but $i_2$ does not).
Obviously, only the pairs $(i_1, i_2)$ with $\delta_{\bar S_k}^{(i_1,i_2)} = 0$ have to be considered in the calculation of the minimum number of required jobs. Depending on the current number of jobs on machine $i_1$, we distinguish two cases:
1. $i_1$ processes at most one of the first $k$ jobs. In this case it is sufficient to assign one job to $i_2$ in order to satisfy the path condition.
2. $i_1$ processes at least two of the first $k$ jobs. In this case at least $P_{\bar S_k}^{(i_1,i_2)}(k) + 1$ jobs still have to be assigned to $i_2$ in order to obtain a negative entry in the $(i_1, i_2)$-path. Recall from Sect. 2.2 that $P_{\bar S_k}^{(i_1,i_2)}(k)$ gives the difference between the current number of jobs on $i_1$ and $i_2$ in the partial schedule $\bar S_k$.
As each pair $(i_1, i_2)$ has to satisfy the path condition,

$$v_{i_2}(k) = \max_{i_1 < i_2:\ \delta_{\bar S_k}^{(i_1,i_2)} = 0} P_{\bar S_k}^{(i_1,i_2)}(k) + \alpha_{i_2}(k) \quad (2)$$

gives the minimum number of jobs that still have to be assigned to machine $i_2$. If $\delta_{\bar S_k}^{(i_1,i_2)} = 1$ for all $i_1 = 1, \dots, i_2 − 1$, we set $v_{i_2}(k) = 0$. The additional $\alpha_{i_2}(k)$-term in Eq. (2) corresponds to the aforementioned case differentiation. More precisely, if Case 1 holds for all machines $i_1 = 1, \dots, i_2 − 1$, then $\alpha_{i_2}(k) = 0$. Otherwise, if at least one of the machines $1, \dots, i_2 − 1$ processes more than one job (cf. Case 2), then $\alpha_{i_2}(k) = 1$. It is readily verified that by assigning the next $v_m(k)$ jobs to machine $m$, the following $v_{m−1}(k)$ jobs to machine $m − 1$ and so on until machine 2 finally receives its $v_2(k)$ required jobs, all $\binom{m}{2}$ path conditions are satisfied, i.e., $\sum_{i_2=2}^{m} v_{i_2}(k)$ is the minimum number of required jobs. If this number exceeds the number of remaining jobs, i.e., $\sum_{i_2=2}^{m} v_{i_2}(k) > n - k$, the current partial solution cannot be completed in such a way that the resulting schedule belongs to $S(n, m)$. We remark that this first dominance rule is a very basic one. It does not require any explicit information on the (current) objective function value. In what follows, we derive two makespan-specific dominance rules from the results of Sect. 2. Afterwards, Example 3.1 illustrates the benefit of each of our new rules.
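The counting rule can be sketched as follows. This is a minimal illustration under our reading of the rule (maximum path entry over all violated pairs plus the $\alpha$-term); the function names and the list-based return value are assumptions, not the authors' implementation.

```python
def path(partial, i1, i2):
    """Entries P(0..k) of the (i1, i2)-path of a partial schedule."""
    entries = [0]
    for machine in partial:
        entries.append(entries[-1] + (machine == i1) - (machine == i2))
    return entries


def required_jobs(partial, m):
    """Return [v_1(k), ..., v_m(k)] for the k jobs assigned so far."""
    counts = [partial.count(i) for i in range(1, m + 1)]
    v = [0] * m
    for i2 in range(2, m + 1):
        largest, alpha, violated = 0, 0, False
        for i1 in range(1, i2):
            entries = path(partial, i1, i2)
            # delta = 1: negative entry, or exactly one job on each machine
            delta = min(entries) < 0 or (
                counts[i1 - 1] == 1 and counts[i2 - 1] == 1)
            if delta:
                continue
            violated = True
            largest = max(largest, entries[-1])
            if counts[i1 - 1] >= 2:   # Case 2 applies to this pair
                alpha = 1
        if violated:
            v[i2 - 1] = largest + alpha
    return v


def rule_1_fathoms(partial, m, n):
    """True if more jobs are required than remain unassigned."""
    return sum(required_jobs(partial, m)) > n - len(partial)


if __name__ == "__main__":
    partial = (1, 2, 3, 4, 5, 4)      # partial schedule of Example 3.1
    print(required_jobs(partial, 5), rule_1_fathoms(partial, 5, 11))
```

For the partial schedule $(1, 2, 3, 4, 5, 4)$ with $m = 5$ and $n = 11$, only the (4, 5)-path is violated, which gives $v_5(6) = 2$ and no fathoming by this rule alone.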

Increasing the minimum number of required jobs
Given the $v_i(k)$-values for $i = 2, \dots, m$ as determined in Sect. 3.1, we now present a procedure that checks whether some of these values can be increased by one. Requiring an upper bound $U$ on the optimal makespan, the procedure determines the minimum number $m'$ of machines that have to process at least two jobs so that the makespan of the corresponding schedule does not exceed $U$. To determine $m'$, we successively consider the ratios $q_i = (P - i \cdot U)/(m - i)$ for $i = 0, 1, \dots, m - 1$, where $P = \sum_{j=1}^{n} p_j$. Starting with $i = 0$, $q_0$ represents the average machine completion time. If $q_0 > p_1$, at least one machine has to process more than one job. Assuming that the completion time of this machine equals $U$, $q_1$ represents the minimum average load of the remaining $m − 1$ machines. If $q_1 > p_1$, one of these $m − 1$ machines also has to process at least two jobs. We continue this process by considering $q_2$ and so on. The process stops as soon as $q_i \le p_1$ and we obtain $m' = i$.
Instead of using the aforementioned iterative procedure, we can also determine $m'$ analytically. It is readily verified that $m' = \lceil (P - m p_1)/(U - p_1) \rceil$ provided that $P > m p_1$ and $U > p_1$. If $P \le m p_1$ or $U = p_1$, we set $m' := 0$.
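Both ways of computing $m'$ can be compared with a short sketch. The upper bound is taken from the LPT rule here, which is one possible choice; integer arithmetic is used to avoid rounding issues. The function names are ours, and the data are those of Example 3.1.

```python
import heapq


def lpt_makespan(p, m):
    """Makespan of the LPT rule: longest job first, least loaded machine."""
    loads = [0] * m
    heapq.heapify(loads)
    for t in sorted(p, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)


def m_prime_iterative(p, m, U):
    """Smallest i with q_i = (P - i*U) / (m - i) <= p_1."""
    P, p1 = sum(p), max(p)
    i = 0
    # q_i > p_1 is tested without division: P - i*U > p_1 * (m - i)
    while i < m and P - i * U > p1 * (m - i):
        i += 1
    return i


def m_prime_closed_form(p, m, U):
    """Ceiling of (P - m*p_1) / (U - p_1), or 0 in the degenerate cases."""
    P, p1 = sum(p), max(p)
    if P <= m * p1 or U == p1:
        return 0
    return -(-(P - m * p1) // (U - p1))   # integer ceiling division


if __name__ == "__main__":
    p = [187, 162, 140, 127, 119, 108, 101, 71, 62, 50, 25]
    U = lpt_makespan(p, 5)
    print(U, m_prime_iterative(p, 5, U), m_prime_closed_form(p, 5, U))
```

With these data the sketch reproduces $U = 237$ and $m' = 5$.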
To decide whether some of the $v_i(k)$-values can be increased, we take a look at those machines to which currently at most one job is assigned. Let $i'(k)$ denote the smallest index of all machines to which currently at least two jobs are assigned. If no such machine exists, we can increase $v_i(k)$ by one for $i = m − m' + 1, \dots, m$. In the other case, i.e., $i'(k) \le m$, each machine $i > i'(k)$ also has to process at least two jobs in order to satisfy the path conditions. These are $m − i'(k) + 1$ machines (including machine $i'(k)$). If $m' > m − i'(k) + 1$, then the machines $m − m' + 1, \dots, i'(k) − 1$ also have to process at least two jobs, which means that we can increase $v_i(k)$ by one for $i = m − m' + 1, \dots, i'(k) − 1$.
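The increase step translates almost literally into code. The sketch below follows the two cases of the paragraph above; the argument `counts` (current number of jobs per machine) and the function name are our own choices.

```python
def increase_required_jobs(v, counts, m_prime):
    """Return a copy of v with the increases described in Sect. 3.2.

    v        -- list [v_1(k), ..., v_m(k)]
    counts   -- current number of jobs on machines 1..m
    m_prime  -- minimum number of machines that need at least two jobs
    """
    m = len(v)
    v = list(v)
    busy = [i for i in range(1, m + 1) if counts[i - 1] >= 2]
    if not busy:
        # No machine holds two jobs yet: the last m' machines need one more.
        targets = range(m - m_prime + 1, m + 1)
    else:
        first = busy[0]                      # i'(k)
        if m_prime > m - first + 1:
            targets = range(m - m_prime + 1, first)
        else:
            targets = range(0)               # nothing to increase
    for i in targets:
        v[i - 1] += 1
    return v


if __name__ == "__main__":
    # Data of Example 3.1 after k = 6 jobs: machine 4 holds two jobs.
    print(increase_required_jobs([0, 0, 0, 0, 2], [1, 1, 1, 2, 1], 5))
```

For Example 3.1 this raises $v_1(6)$, $v_2(6)$ and $v_3(6)$ from 0 to 1.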

Incorporating the processing times
After having determined all $v_i(k)$-values, we now also take the processing times into account. Our intention is to decide whether it is possible to assign the required number of jobs $\sum_{i=1}^{m} v_i(k)$ to the machines in such a way that no machine runs longer than $U − 1$. As this problem is NP-hard in the strong sense (proof by reduction from 3-Partition, cf. Garey and Johnson 1979), we solve a relaxed version instead. The relaxation concerns the restriction that each job has to be assigned exactly once, i.e., we now allow jobs to be assigned more than once.
Assume that 1 ≤ r ≤ m machines still require at least one job and let I = {i 1 , i 2 , . . . , i r } be the corresponding set of machines, i.e., v i (k) > 0 for all i ∈ I . We then determine for each i ∈ I the longest job j i that can be assigned to machine i in combination with the v i (k) − 1 shortest jobs so that i finishes not later than U − 1.
More formally, where C k i is the current completion time of machine i after the first k jobs have already been assigned to the machines. Note that an assignment of a job j ∈ {k +1, . . . , j i −1} to machine i cannot improve on U and will therefore not lead to a new incumbent solution that satisfies the path conditions. The same holds true for the case that p n−l is already exceeding U − 1. Let π denote a permutation of the machines in I that sorts the corresponding jobs j i (i ∈ I ) in non-increasing order of their indices. Obviously, in case that n − j π(1) + 1 < v π(1) (k), the current solution cannot be completed in such a way that both the path conditions are satisfied and the makespan is less than U . In the other case, i.e., n − j π(1) + 1 ≥ v π(1) (k), we go on and check whether n − j π(2) + 1 is smaller than v π(1) (k)+v π(2) (k). If this is the case, the partial solution can be fathomed using the same argument as before. Otherwise, we repeat this iterative process and consider the next machines according to π one by one, i.e., we check for (k) and so on. In case that one of the inequalities n − j π(r ) + 1 < r b=1 v π(b) (k) (r = 1, . . . , r ) is fulfilled, the current partial solution cannot lead to a new incumbent solution that satisfies the path conditions. Example 3.1 We consider m = 5 machines and n = 11 jobs with processing times (187,162,140,127,119,108,101,71,62,50,25). Application of the well-known LPT-rule yields an upper bound value of U = 237 on the optimal makespan. Given the partial scheduleS = (1, 2, 3, 4, 5, 4), i.e., the longest k = 6 jobs have already been assigned, Table 5 provides the entries of all paths at position 6. The superscript indicates that the corresponding path does currently not satisfy the path condition. According to Sect. 3.1, we readily obtain v i (6) = 0 for i = 1, . . . , 4 and v 5 (6) = 1 + α 5 (6) = 1 + 1 = 2. However, as n − k = 5 > 2 = 5 i=1 v i (6), we cannot fathom the current partial solution. 
Next, we try to increase the $v_i(6)$-values by application of the procedure as described in Sect. 3.2. We get $m = \lceil (1152 - 5 \cdot 187)/(237 - 187) \rceil = 5$ and i (6) = 4 and can increase the $v_i(6)$-values by one for i = 1, 2, 3. Nevertheless, S cannot be fathomed as the minimum number of required jobs is still not greater than the number of unassigned jobs ($\sum_{i=1}^{5} v_i(6) = 5 \not> n - k = 5$). Finally, we take the processing times of the five unassigned jobs into account as suggested in Sect. 3.3. We have I = {1, 2, 3, 5}, $j_1 = 11$, and $j_2 = j_3 = j_5 = 8$. As $n - j_5 + 1 = 4 < 5 = \sum_{i \in I} v_i(6)$ (at iteration number 4 of the above-mentioned procedure), S cannot be completed in such a way that both the resulting makespan is less than U and the path conditions are satisfied, i.e., we can fathom S after all.
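The processing-time check of Sect. 3.3 can be illustrated with a short Python sketch. It is only a reading of the verbal description above, not the authors' implementation: the function names, the 0-based machine indices, and the convention of setting $j_i = n + 1$ when no job fits on machine i are assumptions made for illustration. The $v_i(k)$-values are taken as given input.

```python
def longest_feasible_job(p, k, load, v, U):
    """Return the 1-based index j_i of the longest unassigned job that fits on a
    machine with current load `load` together with the v - 1 shortest jobs, so
    that the machine finishes no later than U - 1.
    `p` is sorted in non-increasing order; jobs 1..k are already assigned.
    Assumption of this sketch: return n + 1 if no unassigned job fits."""
    n = len(p)
    tail = sum(p[n - (v - 1):])  # the v - 1 shortest jobs (empty sum if v == 1)
    for j in range(k + 1, n + 1):
        if load + p[j - 1] + tail <= U - 1:
            return j
    return n + 1


def can_fathom(p, k, loads, v, U):
    """Iterative test of Sect. 3.3. Returns (fathom, j), where j maps each
    machine that still requires jobs (v[i] > 0) to its index j_i."""
    n = len(p)
    machines = [i for i in range(len(loads)) if v[i] > 0]
    j = {i: longest_feasible_job(p, k, loads[i], v[i], U) for i in machines}
    # pi: machines sorted by non-increasing job index j_i
    order = sorted(machines, key=lambda i: -j[i])
    required = 0
    for i in order:
        required += v[i]
        # only jobs j_i, ..., n are available for the machines considered so far
        if n - j[i] + 1 < required:
            return True, j
    return False, j


# Data of Example 3.1 (machines are 0-based here, so machine 5 is index 4).
p = [187, 162, 140, 127, 119, 108, 101, 71, 62, 50, 25]
loads = [187, 162, 140, 127 + 108, 119]   # after assigning the k = 6 longest jobs
fathom, j = can_fathom(p, k=6, loads=loads, v=[1, 1, 1, 0, 2], U=237)
```

With the data of Example 3.1 the sketch yields the job indices 11, 8, 8, 8 for machines 1, 2, 3, 5 and triggers the fathoming test at the fourth machine, in line with the example.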

Outlook
In this final subsection of the theoretical part we briefly show that the translation of the new structural characteristics into dominance rules is not restricted to a specific job selection rule. Even if we relax the assumption that jobs are selected in non-increasing order of their processing times, the results of Sect. 2 can still be applied to evaluate partial solutions with respect to the satisfiability of the path conditions. We clarify this by means of two examples.
Example 3.2 Let n = 8, m = 2 and consider the partial solution S = (1, x, 1, 2, 2, x, 2, 1), i.e., jobs 2 and 6 still have to be assigned. In order to satisfy the path condition, at least one of the remaining two jobs has to be assigned to machine 2.

Example 3.3
Let n = 8, m = 2 and consider the partial solution S = (1, x, 1, 2, 1, x, x, 1), i.e., jobs 2, 6, and 7 still have to be assigned. This time, the path condition can only be satisfied when all three remaining jobs are assigned to machine 2.
The previous two examples reveal that there is a lot of potential in translating the theoretical results of Sect. 2 into methods that evaluate partial solutions and restrict the remaining job assignments when other job selection rules are applied within a job-oriented branching scheme or even when a machine-oriented branching strategy is used. The formulation of general rules for different branching schemes appears to be a challenging but valuable task for future research.

A simple branch-and-bound algorithm
We implemented a basic branch-and-bound algorithm in order to determine the effectiveness of the new (path-related) dominance rules in a computational study. Our procedure performs a depth-first search similar to the one in Dell'Amico and Martello (1995). At each level of the branching tree, the job with the longest processing time amongst all unassigned jobs is chosen. More specifically, let $U^*$ denote the makespan of the currently best known solution. At level k, the current node generates at most m child nodes by assigning job k to those machines $M_i$ that fulfill $C_i^{k-1} + p_k < U^*$. The corresponding machines are selected according to increasing current completion times $C_i^{k-1}$.
Note that selecting the job with the longest remaining processing time at each level of the tree is necessary for the application of the dominance rules derived in Sects. 3.1-3.3, whereas the depth-first nature of our search is not a prerequisite. One can also implement a breadth-first or minimum lower bound strategy instead.
To avoid complete enumerations we also implemented a few lower and upper bounding procedures from the literature (see Sect. 4.1). Details on the application of the new dominance rules are provided in Sect. 4.2. However, it is beyond the scope of this paper to design a state-of-the-art algorithm for P||C max . This would require the implementation of even more sophisticated branching and bounding techniques than the ones described here.
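The branching scheme described at the beginning of this section can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the implemented algorithm: it uses only the trivial bound max{⌈Σ p_j / m⌉, p_1} for early termination, takes a greedy LPT schedule as the initial incumbent, and omits the dominance rules and all other bounding procedures. The function name is hypothetical.

```python
def branch_and_bound(p, m):
    """Depth-first branch-and-bound for P||Cmax (sketch).
    At level k the k-th longest job is assigned to every machine i with
    load_i + p_k < U*, machines being tried by increasing current load."""
    p = sorted(p, reverse=True)
    n = len(p)
    lower = max(-(-sum(p) // m), p[0])  # trivial lower bound (integer ceiling)

    # Initial incumbent U*: greedy list scheduling on the sorted jobs (LPT).
    loads = [0] * m
    for x in p:
        loads[loads.index(min(loads))] += x
    best = max(loads)

    loads = [0] * m

    def dfs(k):
        nonlocal best
        if best == lower:          # incumbent is provably optimal
            return
        if k == n:                 # complete schedule reached
            makespan = max(loads)
            if makespan < best:
                best = makespan
            return
        for i in sorted(range(m), key=lambda i: loads[i]):
            if loads[i] + p[k] < best:
                loads[i] += p[k]
                dfs(k + 1)
                loads[i] -= p[k]

    dfs(0)
    return best


opt_a = branch_and_bound([5, 4, 3, 3, 2], 2)
opt_b = branch_and_bound([3, 3, 2, 2, 2], 2)
```

In the second call the LPT incumbent has makespan 7, and the search improves it to 6 by placing both jobs of length 3 on the same machine.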

Implemented lower and upper bounding procedures
To guide the search and to assess the quality of partial solutions, we implemented some lower and upper bound arguments. Concerning lower bounds, we apply two procedures of Dell'Amico and Martello (1995). The first one, $L_{TV} = \max\{\lceil \sum_{j=1}^{n} p_j / m \rceil, p_1, p_m + p_{m+1}\}$, is an immediate bound obtained from simple relaxations of P||C max. The second one, $L_{DM} = \max\{C + 1 : \exists\, p \le C/2 \text{ for which } B_\alpha(C, p) > m \text{ or } B_\beta(C, p) > m\}$, exploits the connection between P||C max and the bin packing problem (BPP). At its core, $L_{DM}$ consists of two sophisticated lower bounds $B_\alpha(C, p)$ and $B_\beta(C, p)$ for BPP. For any further details we refer to Dell'Amico and Martello (1995).
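As a small illustration, the first bound can be computed as follows. The function name `l_tv` is chosen here for illustration only; the sketch assumes integer processing times and n > m, as in the problem definition.

```python
def l_tv(p, m):
    """Trivial lower bound: max of the rounded-up average machine load, the
    longest job, and the sum of the m-th and (m+1)-th longest jobs."""
    p = sorted(p, reverse=True)
    avg_load = -(-sum(p) // m)        # integer ceiling of sum(p) / m
    return max(avg_load, p[0], p[m - 1] + p[m])


# Instance of Example 3.1: the total processing time is 1152 and m = 5.
bound = l_tv([187, 162, 140, 127, 119, 108, 101, 71, 62, 50, 25], 5)
```

For the instance of Example 3.1 the three terms are 231, 187, and 227, so the bound is 231, compared with the LPT upper bound of 237.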
To enhance lower bounds we implemented a lifting procedure of Haouari et al. (2006). Roughly speaking, this procedure determines lower bounds for specific partial instances that are also valid for the entire instance. We denote the lifted version of a bound L by $\tilde{L}$. Finally, we implemented a procedure of Haouari and Jemmali (2008). This procedure tries to tighten a lower bound L by solving a specific subset-sum problem (SSP). It checks whether a subset of J exists so that the corresponding processing times sum up exactly to a known lower bound value L. If no such subset exists, then the smallest realizable sum of processing times greater than L constitutes an improved lower bound. We denote the tightened bound by $L_{SSP}$.
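The idea behind the subset-sum tightening can be sketched as follows. This is a generic bitset-based subset-sum check written for illustration; it is not claimed to be the specific procedure of Haouari and Jemmali (2008), and the function name is hypothetical. The tightening is valid because the makespan of any schedule equals the load of some machine, i.e., a realizable subset sum that is at least L.

```python
def tighten_by_subset_sum(p, L):
    """Return the smallest subset sum of the processing times in p that is
    greater than or equal to L, or None if no such sum exists."""
    reachable = 1                      # bit s is set iff some subset sums to s
    for x in p:
        reachable |= reachable << x
    for s in range(L, sum(p) + 1):
        if (reachable >> s) & 1:
            return s
    return None


tight_a = tighten_by_subset_sum([6, 6, 6], 7)     # realizable sums: 0, 6, 12, 18
tight_b = tighten_by_subset_sum([10, 7, 4], 12)   # realizable sums: 0, 4, 7, 10, 11, 14, 17, 21
```

If L itself is realizable, the function returns L and the bound is left unchanged.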
Concerning upper bounds, we implemented three procedures: the well-known LPT rule (cf. Graham 1969), the Multifit algorithm (cf. Coffman et al. 1978), and a multistart local search improvement heuristic (cf. Haouari et al. 2006). The latter procedure iteratively solves specific P2||C max instances. We denote the three corresponding upper bounds by $U_{LPT}$, $U_{MF}$, and $U_{LS}$, respectively. For further details we refer to the literature.
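For reference, the LPT rule assigns the jobs in non-increasing order of processing times, each to a currently least loaded machine. The following Python sketch (function name and return format are illustrative choices) computes $U_{LPT}$ together with the assignment.

```python
import heapq


def lpt(p, m):
    """Longest processing time rule. Returns (makespan, assignment), where
    assignment maps each 0-based job index to a 0-based machine index."""
    heap = [(0, i) for i in range(m)]          # (current load, machine)
    heapq.heapify(heap)
    assignment = {}
    for j in sorted(range(len(p)), key=lambda j: -p[j]):
        load, i = heapq.heappop(heap)          # least loaded machine
        assignment[j] = i
        heapq.heappush(heap, (load + p[j], i))
    return max(load for load, _ in heap), assignment


# Instance of Example 3.1, for which the text reports U = 237.
u_lpt, schedule = lpt([187, 162, 140, 127, 119, 108, 101, 71, 62, 50, 25], 5)
```

On the instance of Example 3.1 the sketch reproduces the upper bound U = 237 stated there.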
To obtain global bounds we applied the above-mentioned bounding procedures at the root node in the following order. At first, we compute $L_{TV}$ and $U_{LPT}$. In case $L_{TV} = U_{LPT}$, an optimal solution is obtained. Otherwise, we determine $L_{DM}$. If $L_{DM} < U_{LPT}$, we compute $U_{MF}$ and if $L_{DM} < U_{MF}$, we additionally determine $U_{LS}$. If there is still a gap between $L_{DM}$ and $U_{LS}$, the lifted bound $\tilde{L}_{DM}$ is computed. $L_{SSP}$ is only determined in case that $\tilde{L}_{DM} < U_{LS}$. To obtain a local bound and to save computation time, we only compute $L_{DM}$ at each branched node of the search tree.

Dominance rules
If we cannot fathom a current partial solution after application of the bounding procedures, we check whether the path conditions are already satisfied. If they are not yet satisfied, we make use of our new path-related dominance rules. To allow for an efficient application, it is advisable to store not only the current position of each path but also the information whether or not the path (currently) satisfies the path condition. Using simple data structures such as two-dimensional arrays, an update of the relevant information consumes $O(m^2)$ time at each generated node.
Provided that all this information is available, it takes O(m) time to determine the minimum number of required jobs (cf. Sect. 3.1). The same asymptotic run time is required for the attempt to increase the number of required jobs according to Sect. 3.2. Finally, incorporating the processing times as explained in Sect. 3.3 can be realized in O(mn) time. If none of the new rules confirms that the current solution can be fathomed, we branch the corresponding node.
Bearing in mind that there might be some optimal solutions that do not satisfy the path conditions (cf. end of Sect. 2.4), it does not seem to be useful to apply the new dominance rules at deep levels of the branching tree. Indeed, our preliminary tests indicated that their benefit decreases when they are applied to almost complete solutions. We therefore do not apply them when the number of remaining jobs is smaller than 0.3n.

Computational study
This section reports the results of our computational study and discusses the benefits of the new (path-related) dominance rules. To appropriately assess their benefit we implemented the branch-and-bound algorithm of Sect. 4 and a variant thereof. We label them BB Paths and BB NoPaths , respectively. Both algorithms are identical except that our new dominance rules are only applied in BB Paths but not in BB NoPaths .

Setup of the tests
Following the existing literature we considered different combinations of m and n and different distributions of processing times to generate our test instances. Specifically, we chose m ∈ {3, 5, 10, 15, 20} and n = ⌈km⌉ with k ∈ {2, 2.25, 2.5, 2.75, 3, 3.5, 4, 5}. Processing times are randomly drawn from five different distributions (see Table 6) as proposed in Dell'Amico and Martello (1995).
For each parameter setting (Class, m, n), we successively generated instances until five of them were not solved to proven optimality at the root node by application of the global bounds. In other words, we tested our two branch-and-bound algorithms only on those instances which require branching in order to find an optimal solution or to verify optimality. We also recorded the total number of instances (column "Inst" in Table 8) that had to be generated. Thus, "Inst" serves as an indicator for the difficulty of finding optimal solutions or verifying optimality by means of upper and lower bounds at the root node. Since the likelihood of being solved at the root node rapidly increases with increasing ratios of n to m (cf., e.g., Haouari et al. 2006, and column "Inst" in Table 8), we concentrate on those cases where n/m ≤ 5. To avoid trivial instances we omitted the settings (3, m, 2m) for all m and (Class, 3, 6) for all five classes. Hence, our data set contains a total of 955 instances. We applied both BB Paths and BB NoPaths to each of them.

Table 6 (excerpt) Class 4: cut-off normal distribution with μ = 100 and σ = 20; Class 5: cut-off normal distribution with μ = 100 and σ = 50

Table 7 lists our three main performance criteria. To allow for a fair and meaningful comparison, the "Nodes"-criterion considers only those instances that have been solved by both algorithms within a prespecified time limit, whereas "Time" averages over all instances. Additionally, we recorded how often one algorithm returned a better solution than the other one. In case that no optimal solution was found or optimality could not be verified, we also determined the average and maximum relative deviation between the returned objective function value and the global lower bound.
We implemented our algorithms in Java (version 7.2). The computational tests were performed on a personal computer with an Intel Core i7-2600 processor (3.4 GHz), 8 GB RAM, and Windows 7 Professional SP1 (64 bit). The maximal computation time was set to 600 s per instance for each of our two algorithms. BB Paths and BB NoPaths were run as single processes/threads.

Table 8 contains the results of our experiments on the effectiveness of the new dominance rules. For reasons of comprehensibility, we abstain from providing the results for each individual setting of the 5 × 39 parameter combinations (Class, m, n). Instead, we average the results over the 25 (20) instances per (m, n)-pair and provide the influence of the processing time classes in a compact way in a separate table (see Table 10).

Experimental results on the effectiveness of the new rules
Starting with the "US"-criterion, i.e., the number of unsolved instances, it can be seen that both algorithms show the same performance for the majority of the investigated parameter combinations. However, there are seven (m, n)-pairs [(15, 30), (15, 34), (15, 38), (15, 42), (20, 40), (20, 45), and (20, 50)] where BB Paths finds significantly more optimal solutions than BB NoPaths : 132 compared to 90, i.e., 42 optimal solutions more. Note that all these (m, n)-pairs satisfy 2 ≤ n/m < 3. In total, 371 out of the 955 instances remained unsolved after application of BB NoPaths , whereas only 329 instances remained unsolved when BB Paths was applied. In case of unsolved instances, relative deviations from the global lower bound are fairly small (0.55% on average and a maximum of 4.19%). Solving the small-sized instances with m ≤ 5 machines did not pose a problem to our algorithms. None of the corresponding 370 instances remained unsolved. In contrast, almost all of the large-sized instances with m ≥ 15 machines and n ≥ 3m jobs remained unsolved within the time limit. However, it is also worth noting that the solution returned by BB Paths is at least as good as the BB NoPaths solution for each of the 955 instances. In particular, BB Paths is superior to BB NoPaths in verifying optimality.
The superior performance of BB Paths over BB NoPaths becomes even more obvious when we take a look at the two other criteria "Time" and "Nodes". BB Paths does not only find more optimal solutions; the new dominance rules also help to identify optimality more quickly (overall average of 215 s vs. 242 s) and to considerably reduce the number of generated branch-and-bound nodes (overall average of about 6.2 million vs. 9.4 million). While BB Paths usually generates far fewer nodes than BB NoPaths , distinctly shorter computation times can only be realized when m ≥ 10. For smaller values of m, average computation times of the two variants are almost identical. However, for very few (m, n)-pairs [e.g., (5, 20) and (5, 25)], BB NoPaths is even slightly faster than BB Paths despite generating more nodes. Thus, the additional time required for application of the new rules could not always be compensated for by smaller search trees.

Table 9 summarizes the results depending on the ratio of n to m. The results reveal that the new dominance rules are particularly effective in limiting the search space when n/m ranges between 2 and 3. When solving instances of the two smallest investigated ratios, BB Paths requires only about 15.8% of the average computation time and generates only about 6.5% of the decision nodes of BB NoPaths . For larger ratios, the effect diminishes as more and more solutions exist that satisfy the path conditions. In particular, it becomes more difficult for the new rules to prune partial solutions at early levels of the decision tree and, thus, to restrict the search space effectively since the number of possible ways to satisfy the path conditions increases with increasing n/m. However, the entries in the "Inst"-column of Table 8 immediately reveal that the larger the ratio of n to m, the more often instances can already be solved at the root node without requiring any branching effort at all.
Almost all of the generated instances (249,816 out of 251,948, i.e., 99.15%) belong to the group of instances with n/m ∈ [4, 5]. We, therefore, did not consider any larger ratios in our tests.

Table 10 summarizes the results depending on the processing time classes. As can be seen, the new rules achieve the greatest improvements in terms of "US" (up to 26% fewer unsolved instances) and "Time" (savings of up to 23%) for the processing time classes 3 and 4. These two classes have in common that the processing times of the jobs do not vary widely, i.e., the range of values is rather small. In particular, the ratio $p_1 / p_n$ of the longest to the shortest processing time is small. Smaller ratios seem to be beneficial for the dominance rule of Sect. 3.3. The greatest improvements in terms of "Nodes" are realized when processing times are drawn according to Classes 2 and 4 (savings of up to 58%).

Conclusions
The present paper addressed the fundamental makespan minimization problem on identical parallel machines from a theoretical point of view. Using an approach similar to the idea behind inverse optimization, we identified and proved general characteristics of optimal schedules. These new structural insights were then translated into dominance rules to restrict the solution space during the search for an optimal schedule. Although focusing on the theoretical foundation and mathematical groundwork, we implemented the new dominance rules in a depth-first branch-and-bound algorithm in order to determine their effectiveness. In our computational study the new rules proved to be very useful. Depending on the ratio of n to m, they did not only help to find more optimal solutions but also to identify them more quickly.

Based on the output of our experiments we believe that it is worthwhile to pursue and develop the concept of potential optimality. Firstly, there might be some room to tighten our results either by considering specific classes of problem instances or by taking the job processing times explicitly into account. Although this appears to be a technically challenging task, it might not only result in a further restriction of the set of potentially optimal solutions but also allow for tighter $v_i$-values. As our new dominance rules largely depend on the $v_i$-values and the $v_i$-values themselves depend on the theoretical results on structural patterns of optimal schedules, we can even expect tighter versions of our dominance rules. Secondly, it is useful to develop our first ideas on deriving dominance rules for job selection rules other than LPT. This way, the new structural insights can also be used in branching schemes other than the one implemented here.
Thirdly, it would be interesting to integrate our findings into other exact solution approaches, such as column generation or dynamic programming, or to define efficient neighborhoods for local search procedures based on the path conditions. Last but not least, it appears promising to tackle similarly structured optimization problems by our methodological approach.