Scheduling on parallel identical machines with late work criterion: Offline and online cases

In the paper, we consider the problem of scheduling jobs on parallel identical machines with the late work criterion and a common due date, both offline and online cases. Since the late work criterion has not been studied in the online mode so far, the analysis of the online problem is preceded by the analysis of the offline problem, whose complexity status has not been formally stated in the literature yet. Namely, for the offline mode, we prove that the two-machine problem is binary NP-hard, and the general case is unary NP-hard. In the online mode we assume that jobs arrive in the system one by one, i.e., we consider the online over list model. We give an algorithm with a competitive ratio being a function of the number of machines, and we prove the optimality of this approach for two identical machines.


Introduction
Time constraints, which can be used to determine feasibility conditions as well as to evaluate the quality of feasible solutions, play an important role in real-world scheduling problems. In scheduling theory, time constraints might be modeled with due dates or deadlines, and the quality of schedules is estimated with reference to these parameters, leading to such criteria as lateness (e.g., McMahon and Florian 1975), tardiness (e.g., Emmons 1969), or the number of tardy jobs (e.g., Moore 1968).
The late work criterion is one of the less explored objective functions based on due dates. It was first proposed by Blazewicz (1984). Soon a number of research groups focused on this performance measure, obtaining a set of interesting results. The single-machine case was considered mainly by Potts and Van Wassenhove (1991), Hochbaum and Shamir (1990), Hariri et al. (1995), Kovalyov et al. (1994), Kethley and Alidaee (2002), and, more recently, by Lin and Hsu (2005), Zhang and Wang (2005), as well as by Ren et al. (2009). The parallel machines environment was studied by Blazewicz (1984), Blazewicz and Finke (1987), and Leung (2004), while the dedicated machines environment was investigated mainly by Blazewicz et al. (2004a, 2005a, 2007) and by Leung (2004), as well as by Lin et al. (2006). The state of the art for late work scheduling was presented by Leung (2004) in the context of imprecise computations and then by Sterna (2011). The latter survey shows that the majority of results obtained for the late work criterion so far concern single-machine or shop systems, and that little attention has been paid to the parallel machines environment. Besides, all results presented in the literature concern the offline version of the mentioned scheduling problems, and no paper is devoted to the online case.
In this work, we consider the problem of scheduling jobs on parallel identical machines with the late work criterion and a common due date, in both the offline and online versions. For the offline case, we prove that the problem for two machines ($P2|d_j = d|Y$) is binary NP-hard, while for an arbitrary number of machines ($P|d_j = d|Y$), it is unary NP-hard. In the online model studied in this paper, jobs appear in the system one by one: when the previous job is scheduled, the next one may arrive. Since there is an input sequence of jobs, this model is called online "over list" in the literature. For the online version of the analyzed scheduling problem ($P|d_j = d, \text{online over list}|Y$), we propose an algorithm with competitive ratio $\frac{\sqrt{2m^2 - 2m + 1} - 1}{m - 1}$, where $m$ is the number of machines. Then, we prove the optimality of this method for two identical machines. More precisely, we show that when $m = 2$, the competitive ratio, equal to $\sqrt{5} - 1$, is identical with the lower bound of the problem, so the proposed algorithm is optimal for $P2|d_j = d, \text{online over list}|Y$. Moreover, when $m \to \infty$, the competitive ratio converges to $\sqrt{2}$, which is constant. The rest of the paper is organized as follows: Section 2 presents the formal definition of the considered problem and provides some information on the related work. Section 3 shows that the offline problem is NP-hard. The online case is investigated in Sect. 4, where an online algorithm with a constant competitive ratio is proposed together with the proof of its optimality for two machines. Some conclusions and future research directions are given in Sect. 5.

Problem definition and related work
The problem of scheduling jobs on parallel identical machines with the late work criterion can be defined as follows: There are given $n$ jobs $J = \{J_1, \ldots, J_n\}$ and $m$ identical machines $M = \{M_1, \ldots, M_m\}$. Each job $J_j$ ($j = 1, \ldots, n$) is described by its processing time $p_j$, due date $d_j$ (representing the preferred completion time for this job), and optionally weight $w_j$ (representing the relative importance of this job). In this paper, we consider a common due date for all the jobs (i.e., $d_j = d$ for $j = 1, \ldots, n$), and we assume that jobs have the same unit weight (i.e., $w_j = 1$ for $j = 1, \ldots, n$). To solve the problem, it is necessary to schedule all jobs (i.e., to assign jobs to machines and to determine their sequence on particular machines), such that all jobs are executed without preemption and each machine executes at most one job at a time (i.e., for each job $J_j$ its feasible completion time $C_j$ is determined). The schedule should minimize the total late work $Y$.
The late work for job $J_j$ is equal to the length of the late part of this job (if any), i.e., $Y_j = \min\{p_j, \max\{0, C_j - d_j\}\}$ (cf. Fig. 1 for the common due date case). The total late work is equal to the sum of the late parts of all jobs in the system, i.e., $Y = \sum_{j=1}^{n} Y_j = \sum_{j=1}^{n} \min\{p_j, \max\{0, C_j - d_j\}\}$. In general, for different weights the total weighted late work criterion $Y_w$ might be considered, i.e., $Y_w = \sum_{j=1}^{n} w_j Y_j$.
In the offline version, $P|d_j = d|Y$ [according to the three-field classification scheme (Graham et al. 1979)], the set of all jobs to be scheduled is known in advance. Moreover, we assume that all jobs are available at time zero (i.e., job release times are equal to zero, $r_j = 0$ for $j = 1, \ldots, n$).
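The definition of $Y_j$ and $Y$ can be illustrated with a minimal Python sketch. The function names `late_work` and `total_late_work` and the sample instance are illustrative assumptions, not part of the paper.

```python
def late_work(p, C, d):
    """Late work of one job: Y_j = min{p_j, max{0, C_j - d}}."""
    return min(p, max(0, C - d))


def total_late_work(jobs, d):
    """Total late work Y for (p_j, C_j) pairs sharing the common due date d."""
    return sum(late_work(p, C, d) for p, C in jobs)


# One machine, three jobs processed back to back, common due date d = 5.
jobs = [(3, 3), (4, 7), (2, 9)]   # (processing time, completion time)
print(total_late_work(jobs, 5))   # 0 + 2 + 2 = 4
```

The second job is only partially late (2 of its 4 units fall after the due date), while the third job is fully late.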
In the online version, $P|d_j = d, \text{online over list}|Y$, jobs arrive in the system "over list." This means that the set of all jobs is unknown in advance, and the next job might appear in the system only after scheduling the previous one.
Late work scheduling problems model many practical situations (Sterna 2011). For example, jobs may represent customer orders in internet shops (cf. Wojciechowski and Musial 2010; Blazewicz and Musial 2011; Blazewicz et al. 2014) or tasks in production systems (cf. Sterna 2007b), which have to be executed on identical machines working in parallel, e.g., by workers having the same qualifications or on identical stages in a flexible manufacturing system, within a given time, e.g., before a shipping term or within a planning horizon. The late work models the amount of work not executed on time, which determines, e.g., the loss of income caused by not executing parts of jobs on time, the fine which has to be paid to customers in case of delay, or just the amount of work which has to be scheduled in the following interval, because it was not completed in the assumed planning horizon. Jobs may also model pieces of information (Blazewicz 1984) which have to be collected by sensing devices before a given deadline. Minimizing late work corresponds to minimizing information loss, directly influencing the efficiency of control algorithms. Other applications arise in agriculture, where stretches of land have to be harvested before a given time resulting from the vegetation cycle (Potts and Van Wassenhove 1991). In such a case, late work models perished goods not harvested on time. Depending on the real-world conditions, i.e., whether the situation is static (all jobs are known) or dynamic (jobs arrive one after the other), the offline or online version of the model should be applied.
The late work parameter, $Y_j$, was first introduced by Blazewicz (1984), who called it information loss, referring to a possible application of this performance measure in control systems. The phrase late work was proposed by Potts and Van Wassenhove (1991), who denoted this parameter as $V_j$, which appears in the literature alongside $Y_j$. The relation between late work and other performance measures (such as makespan $C_{\max}$, maximum lateness $L_{\max}$, mean (weighted) tardiness $D/D_w$, earliness $E/E_w$, and flow time $F/F_w$, as well as the (weighted) number of tardy jobs $U/U_w$) was determined by Blazewicz et al. (2000), who showed that problems with the late work criterion are at least as difficult as the analogous problems with the maximum lateness performance measure.
The majority of results for the late work criterion have been obtained for single-machine problems. For example, Potts and Van Wassenhove (1991) proposed a polynomial time algorithm with the complexity $O(n \log n)$ to solve the problem $1|pmtn|Y$ and showed that its non-preemptive case is NP-hard. Hariri et al. (1995) proposed an algorithm which solves $1|pmtn|Y_w$ with the same time complexity $O(n \log n)$. Then, a fully polynomial time approximation scheme was proposed for $1||Y$, which is an example of a DP-benevolent problem as defined by Woeginger (2000). For the case with job release times ($1|r_j, pmtn|Y$), Lin and Hsu (2005) proposed a polynomial time algorithm, also with the complexity $O(n \log n)$.
There is also a series of results concerning dedicated machines problems. Blazewicz et al. proved the binary NP-hardness of open, flow and job shop problems: $O2|d_j = d|Y_w$ (Blazewicz et al. 2004a), $F2|d_j = d|Y_w$ (Blazewicz et al. 2005a), and $J2|d_j = d, n_j \le 2|Y_w$ (Blazewicz et al. 2007). They gave NP-hardness proofs and three dynamic programming methods with pseudo-polynomial complexity $O(nd^3)$, $O(n^2 d^4)$, and $O(n^3 d^{11})$, respectively. Besides this research group, Leung (2004) proved that when jobs have two distinct due dates, problems $O2|d_j \in \{d_1, d_2\}|Y$ and $F2|d_j \in \{d_1, d_2\}|Y$ are NP-hard. Using a similar approach, Lin et al. (2006) proved that $F2|d_j = d|Y$ is NP-hard. In addition to these theoretical results, computational experiments with metaheuristic approaches for shop systems were reported in the literature (cf., e.g., Blazewicz et al. 2005b, c; Pesch and Sterna 2009).
Although the late work criterion was originally proposed for parallel machines, only a few results have been obtained for this machine environment so far. To the best of our knowledge, only three papers focus directly on this subject. Blazewicz (1984) and Blazewicz and Finke (1987) investigated problems $P||Y$, $P|r_j, pmtn|Y_w$ and $Q|r_j, pmtn|Y_w$. Then, Leung (2004) gave some results for the unweighted cases $P|r_j, pmtn|Y$ and $Q|r_j, pmtn|Y$. The non-preemptive problem mentioned above is NP-hard, while the preemptive ones are polynomially solvable.

Offline problem $P|d_j = d|Y$
The complexity status of the offline scheduling problem $P2|d_j = d|Y$ has not been determined so far, despite the fact that it might be quite easily predicted. Based on the literature, we know that the problem with an arbitrary number of machines and arbitrary due dates, $P||Y$, is NP-hard (Blazewicz 1984). Actually, already the single-machine problem, $1||Y$, is binary NP-hard (Potts and Van Wassenhove 1991). In consequence, the two-machine problem with arbitrary due dates, $P2||Y$, has to be at least binary NP-hard. By contrast, the single-machine problem with a common due date, $1|d_j = d|Y$, is polynomially solvable, since any schedule without idle time is optimal (Potts and Van Wassenhove 1991). There is no known result concerning the complexity of the two-machine non-preemptive problem with a common due date. On the other hand, it is known that late work scheduling problems are at least as difficult as the problems of minimizing the makespan (Blazewicz et al. 2000). We know that the makespan minimization problem on two identical parallel machines, $P2||C_{\max}$, is binary NP-hard (Garey and Johnson 1979). But this result determines only the complexity of $P2||Y$, not that of the problem $P2|d_j = d|Y$. Nevertheless, there is a very close relation between $P2||C_{\max}$ and $P2|d_j = d|Y$.
Theorem 1 In the offline case, any optimal solution for $P2||C_{\max}$ is optimal for $P2|d_j = d|Y$.
Proof Consider a schedule which is optimal for problem $P2||C_{\max}$. Denote the makespan on $M_i$ as $C_i$ ($i = 1, 2$). Then the optimal schedule length is equal to $C^*_{\max} = \max\{C_1, C_2\}$. Let us assume without loss of generality that $C_1 = C^*_{\max}$; therefore, $C_2 \le C_1$.
Case 1 If $d \ge C_1$, then all jobs in the schedule are early, the total late work equals zero ($Y = 0$), and the schedule is optimal for $P2|d_j = d|Y$.
Case 2 If $d < C_1$ and $d \ge C_2$, then $Y = C_1 - d = C^*_{\max} - d$, and it cannot be smaller, since $C^*_{\max}$ is minimal; therefore, the schedule is optimal for $P2|d_j = d|Y$.
Case 3 If $d < C_2$, then there are late jobs on both machines and the total late work equals $Y = (C_1 - d) + (C_2 - d) = \sum_{j=1}^{n} p_j - 2d$. Since jobs are executed without idle time and machines are busy in the whole interval $[0, d]$, the amount of early work cannot be increased; thus, the late work cannot be decreased, so the schedule is optimal for $P2|d_j = d|Y$.
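Theorem 1 can be sanity-checked on small instances by enumerating all assignments of jobs to two machines. The sketch below is only an illustrative brute-force check with hypothetical helper names; it assumes schedules without idle time, so the late work on a machine is the part of its load beyond $d$.

```python
from itertools import product


def loads(p, assignment):
    """Machine loads for a 0/1 assignment of jobs to two machines."""
    load = [0, 0]
    for p_j, machine in zip(p, assignment):
        load[machine] += p_j
    return load


def makespan(p, assignment):
    return max(loads(p, assignment))


def late_work_total(p, assignment, d):
    # Without idle time, the late work on a machine is its load beyond d.
    return sum(max(0, load - d) for load in loads(p, assignment))


def makespan_optima_minimise_late_work(p, d):
    """Check Theorem 1 on one instance by enumerating all 2^n assignments."""
    assignments = list(product((0, 1), repeat=len(p)))
    best_cmax = min(makespan(p, a) for a in assignments)
    best_y = min(late_work_total(p, a, d) for a in assignments)
    return all(late_work_total(p, a, d) == best_y
               for a in assignments if makespan(p, a) == best_cmax)
```

The enumeration is exponential in the number of jobs, so it is meant only for checking the statement on toy inputs, not for solving the problem.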
Theorem 1 shows that using a method of solving the problem with the makespan criterion, which is NP-hard, we can also solve the problem with the late work criterion and a common due date. But it does not mean that the solution for the late work criterion cannot be found in polynomial time.
The NP-hardness of problem $P2|d_j = d|Y$ results from its similarity to the Partition Problem, which is NP-complete (Garey and Johnson 1979).

Theorem 2 The problem $P2|d_j = d|Y$ is NP-hard.
Proof The decision counterpart of problem $P2|d_j = d|Y$ obviously belongs to NP, since its solution can be verified in polynomial time.
Consider the NP-complete Partition Problem, formulated as follows: There is given a set $A = \{a_1, \ldots, a_n\}$ of $n$ elements, where element $a_j$ has size $s_j$. The question is whether there exists a subset $A' \subseteq A$ such that $\sum_{a_j \in A'} s_j = \sum_{a_j \in A \setminus A'} s_j$.
Consider the scheduling problem with two identical parallel machines and a common due date, $P2|d_j = d|Y$, where $n$ jobs $J_j$ ($1 \le j \le n$) from set $J$ correspond to the $n$ elements $a_j$ from $A$, i.e., $p_j = s_j$, and $d = \frac{1}{2} \sum_{a_j \in A} s_j = \frac{1}{2} \sum_{j=1}^{n} p_j$. The question is whether there exists a solution for $P2|d_j = d|Y$ with the total late work equal to zero ($Y = 0$).
Both problems are equivalent. If the Partition Problem has a solution, then we can execute the jobs corresponding to $A \setminus A'$ on $M_1$ and those corresponding to $A'$ on $M_2$. The schedule length on both machines is equal to $\frac{1}{2} \sum_{a_j \in A} s_j = \frac{1}{2} \sum_{j=1}^{n} p_j$, and all jobs are early, leading to zero total late work.
On the other hand, if there is a schedule with zero late work, then all jobs finish not later than at the common due date $d$. Since $d = \frac{1}{2} \sum_{j=1}^{n} p_j$, the jobs have to be executed on both machines without idle times for exactly $d$ time units. The division of jobs between machines determines the solution of the Partition Problem.
Moreover, problem $P2|d_j = d|Y$ is NP-hard only in the binary (ordinary) sense, because it can be solved in pseudo-polynomial time by a simple dynamic programming method. Assuming that $f(j, A, B)$ denotes the total late work for $j$ jobs scheduled on the machines, where fully early jobs are executed for at most $A$ and $B$ units on $M_1$ and $M_2$, respectively, the optimal total late work is equal to $f(n, d, d)$. The criterion value can be determined by a recurrence function with zero initial conditions, which takes the better of two formulas: the first one represents schedules with job $J_j$ assigned to machine $M_1$, while the second one represents schedules with $J_j$ executed on $M_2$. Since jobs are scheduled without idle time on both machines and their sequence is not important from the criterion value point of view, the solution process is reduced to assigning jobs to machines or, in other words, to packing jobs before the due date. Because the recurrence function has to be determined for $n$ jobs and $0 \le A \le d$, $0 \le B \le d$, the method runs in pseudo-polynomial time $O(nd^2)$.
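The dynamic programming idea can be sketched in Python as follows. The exact recurrence is not reproduced in this text, so the two branches below are one plausible reconstruction that is consistent with the description of $f(j, A, B)$ (remaining early capacity on each machine, zero late work when no jobs are left); the function name and the integer-data assumption are illustrative.

```python
from functools import lru_cache


def min_late_work_two_machines(p, d):
    """Minimum total late work for P2|d_j = d|Y with integer data (a sketch)."""

    @lru_cache(maxsize=None)
    def f(j, a, b):
        # j jobs still to place; a and b are the early capacities left
        # before d on M1 and M2. With no jobs left there is no late work.
        if j == 0:
            return 0
        p_j = p[j - 1]
        # Branch 1: job j goes to M1; whatever exceeds capacity a is late.
        on_m1 = f(j - 1, max(0, a - p_j), b) + max(0, p_j - a)
        # Branch 2: job j goes to M2; whatever exceeds capacity b is late.
        on_m2 = f(j - 1, a, max(0, b - p_j)) + max(0, p_j - b)
        return min(on_m1, on_m2)

    return f(len(p), d, d)


print(min_late_work_two_machines((3, 3, 2, 2, 2), 6))  # a perfect partition exists
```

The memoized state space has at most $n(d+1)^2$ entries, which matches the $O(nd^2)$ bound stated above.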
Taking into account the fact that the two-machine case, $P2|d_j = d|Y$, is NP-hard, its generalization for an arbitrary number of machines, $P|d_j = d|Y$, is also NP-hard (Garey and Johnson 1979).
Actually, using reasoning analogous to that in Theorem 2, it is easy to show that problem $P|d_j = d|Y$ is unary NP-hard due to the transformation from the 3-Partition Problem. The general idea of the proof is as follows.
In the unary NP-complete 3-Partition Problem (Garey and Johnson 1979), there is given a set $A = \{a_1, \ldots, a_{3n}\}$ of $3n$ elements of size $s_j$, such that $\sum_{a_j \in A} s_j = nB$, where $\frac{1}{4}B < s_j < \frac{1}{2}B$ for each $j = 1, \ldots, 3n$. The question is whether there exists a partition of $A$ into $A_1, \ldots, A_n$ such that $\sum_{a_j \in A_i} s_j = B$ for each $i = 1, \ldots, n$. Constructing a schedule with $3n$ jobs, corresponding to elements $a_j$, on $n$ machines with zero total late work with regard to the common due date $d = B$ is equivalent to solving the 3-Partition Problem.
Theorems 1 and 2 show the similarity of problems $P2|d_j = d|Y$ and $P2||C_{\max}$. We know that solutions minimizing makespan are optimal from the total late work point of view and that, unless P = NP, no schedule minimizing the total late work can be constructed in polynomial time. Moreover, the dynamic programming method proposed above shows the similarity of the considered scheduling problem to the Knapsack Problem or the Bin Packing Problem, because we try to pack as many time units of jobs before a common due date as possible.
Although problem $P2|d_j = d|Y$ is closely related to these classical combinatorial problems, determining its computational complexity, and consequently the complexity of $P|d_j = d|Y$, was necessary to start research on online versions of these cases, because the efficiency of online algorithms is evaluated based on the comparison of online and offline solutions.

Online problem $P|d_j = d, \text{online over list}|Y$
In the literature (cf., e.g., Tan and Zhang 2013), two basic models of online scheduling are discussed: (1) the online "over list" model and (2) the online "over time" model. In online "over list" scheduling, it is assumed that jobs come into the system one after another, i.e., the information on the next job becomes available, without any delay, after the previous one has been processed. In online "over time" scheduling, each job has its release time, and the information on this job becomes known only after this time.
Within the reported research, we focus on the first branch of online scheduling. All the problem parameters defined for the offline case in Sect. 3 apply to the online case, too.
In particular, each job $J_j$ has processing time $p_j$, and all jobs should preferably be executed before the common due date $d$. Our goal is to schedule all jobs on $m$ identical machines, so that the total late work $Y$ is minimized.
Since the set of jobs is unknown in advance, a schedule has to be constructed by an online algorithm. Obviously, due to incomplete knowledge about the whole set of jobs, an online schedule might be worse than an optimal offline schedule, determined under perfect knowledge about all problem parameters.
To estimate the quality of online scheduling algorithms, we often use the competitive ratio, a classic measure which shows how close an online solution is to an offline optimal solution (cf., e.g., Borodin and El-Yaniv 1998; Fiat and Woeginger 1998). For example, in a scheduling problem with an objective function which is minimized, for an input $\pi$ and an online algorithm $A$, where $C^A_{\max}(\pi)$ denotes the criterion value produced by $A$, and $C^*_{\max}(\pi)$ denotes the optimal solution value in an offline model, the competitive ratio of $A$ is the infimum $r$ such that for any input, $C^A_{\max}(\pi) \le r \cdot C^*_{\max}(\pi)$. We call $A$ an $r$-competitive algorithm.
Moreover, an online problem has a lower bound ρ, if no online scheduling algorithm has a competitive ratio strictly smaller than ρ.
An online scheduling algorithm is called optimal, if its competitive ratio is equal to this lower bound.
However, as we see from the following lemma, we cannot use the competitive ratio to estimate the quality of online scheduling algorithms minimizing late work directly.

Lemma 3 There is no constant lower bound of competitive ratio for $P|d_j = d, \text{online over list}|Y$.
Proof Denote the total late work produced by an online algorithm $A$ as $Y^A$ and the optimal total late work for an offline solution as $Y^*$. Let $m = 2$ and $d = 2 - \varepsilon$, where $0 < \varepsilon < 1$. Assume that the first two jobs which arrive in the system have unit processing times $p_1 = p_2 = 1$.
Case 1 Assume that these two jobs have been assigned by the online algorithm to the same machine. Then the job sequence ends and no more jobs arrive in the system. Hence, we get $Y^A = 2 - d = \varepsilon$. But in the optimal offline solution, we have $Y^* = 0$, by assigning the jobs to different machines.
Case 2 Now assume that the online algorithm has assigned these two jobs to different machines; then the third job with $p_3 = 2$ comes. We must assign it to one of the machines, and we have $Y^A = 3 - d = 1 + \varepsilon$. But in the optimal offline solution, we have $Y^* = (2 - d) + (2 - d) = 2\varepsilon$, by assigning the first two jobs to one machine and the third one to the other. Then $\frac{Y^A}{Y^*} = \frac{1 + \varepsilon}{2\varepsilon} = \frac{1}{2} + \frac{1}{2\varepsilon} \to \infty$ when $\varepsilon \to 0$.
From Lemma 3 we see that the competitive ratio calculated based on the total late work is useless for investigating online problems, since it might not be well defined (as in Case 1) or it might be unbounded (as in Case 2), and another approach should be applied. Actually, to determine the competitive ratio for online algorithms for $P|d_j = d, \text{online over list}|Y$, we can use the concept of early work ($X$), which is complementary to late work (Blazewicz et al. 2005a). Early work $X_j$ denotes the part of job $J_j$ executed before (instead of after) the due date, and any algorithm solving the problem should maximize the total early work ($X = \sum_{j=1}^{n} X_j$). For the offline problem both concepts are fully equivalent, but for the online case only early work allows us to investigate the competitive ratio. As shown above, the total late work could be zero in an optimal offline solution, and, in this case, we could not use the competitive ratio to estimate the quality of an online solution (we cannot use zero as a denominator). By contrast, early work is always positive (except for the trivial cases with $d = 0$ or $n = 0$).
Hence, we denote by $X^A$ the total early work of a solution constructed by the online algorithm $A$ and by $X^*$ the criterion value of an optimal offline solution. Then the competitive ratio for the early work scheduling problem can be defined as the infimum $r$ such that $\frac{X^*}{X^A} \le r$ for any input.
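The difference between the two ratios can be seen on the Case 2 instance from the proof of Lemma 3 (jobs 1, 1, 2 on two machines with $d = 2 - \varepsilon$). The following sketch uses exact rational arithmetic; the function name is an illustrative assumption, and early work is computed as the total processing time minus the late work.

```python
from fractions import Fraction


def lemma3_case2(eps):
    """Late- and early-work ratios for jobs 1, 1, 2 with d = 2 - eps, m = 2."""
    d = 2 - eps
    total = 4                               # 1 + 1 + 2
    online_loads = (3, 1)                   # unit jobs split, long job appended
    offline_loads = (2, 2)                  # unit jobs together, long job alone
    y_online = sum(max(0, load - d) for load in online_loads)
    y_offline = sum(max(0, load - d) for load in offline_loads)
    x_online = total - y_online             # early work = total work - late work
    x_offline = total - y_offline
    return y_online / y_offline, x_offline / x_online


for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    late_ratio, early_ratio = lemma3_case2(eps)
    print(float(late_ratio), float(early_ratio))
```

As $\varepsilon$ shrinks, the late-work ratio grows without bound, while the early-work ratio approaches $4/3$ on this particular instance.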

Algorithm for $P|d_j = d, \text{online over list}|Y$
To solve the considered problem, we propose an online algorithm called $EFF_m$ (Extended First Fit for $m$ machines), with the competitive ratio $\frac{\sqrt{2m^2 - 2m + 1} - 1}{m - 1}$. We use the following notation:
- $L^i_j$: the load on machine $M_i$ ($i = 1, \ldots, m$) after job $J_j$ ($j = 1, \ldots, n$) has been assigned (i.e., the current makespan on $M_i$),
- $Sum$: the total size of all jobs, $Sum = \sum_{j=1}^{n} p_j$,
- $X^{EFF_m}$: the total early work of the online solution constructed by $EFF_m$,
- $X^*$: the optimal early work of the offline solution (note that $X^* \le \min\{Sum, md\}$ from the definition of early work),
- $r_m = \frac{\sqrt{2m^2 - 2m + 1} - 1}{m - 1}$: the desired competitive ratio ($m \ge 2$).
The online algorithm $EFF_m$ assigns a new job to the first suitable machine, or to the machine with the minimum load if there is no suitable one. A machine is suitable if, after assigning the new job, its load does not exceed the threshold $r_m d$.
Algorithm $EFF_m$
1. Set $t = 1$, $L^i_0 = 0$ for $i = 1, \ldots, m$.
2. When job $J_t$ comes, assign it to the first machine which fits it without violating the ratio (First Fit), i.e., to machine $M_i$ with $i = \min\{k : L^k_{t-1} + p_t \le r_m d\}$, taking $i = m + 1$ if no such machine exists.
3. If $i > m$, then assign $J_t$ to the machine with the minimum load.
4. If there is another job in the input sequence, set $t = t + 1$ and go to Step 2; otherwise stop.
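The steps of $EFF_m$ can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, ties in the minimum-load step are broken by the lowest machine index (the text does not specify a rule), and the total early work is computed assuming no idle time on any machine.

```python
import math


def eff_ratio(m):
    """r_m = (sqrt(2m^2 - 2m + 1) - 1) / (m - 1), defined for m >= 2."""
    return (math.sqrt(2 * m * m - 2 * m + 1) - 1) / (m - 1)


def eff_m(jobs, m, d):
    """Assign jobs one by one in list order; return the final machine loads."""
    threshold = eff_ratio(m) * d
    loads = [0.0] * m
    for p in jobs:
        # First Fit: first machine whose load stays within r_m * d.
        target = next((i for i in range(m) if loads[i] + p <= threshold), None)
        if target is None:
            # No suitable machine: fall back to a least-loaded one
            # (lowest index on ties, an assumption of this sketch).
            target = min(range(m), key=loads.__getitem__)
        loads[target] += p
    return loads


def early_work(loads, d):
    """Total early work when machines run without idle time."""
    return sum(min(load, d) for load in loads)


loads = eff_m([8, 8, 8], m=2, d=10)
print(loads, early_work(loads, 10))
```

In the usage example the third job fits on neither machine (the threshold is about 12.36), so it goes to a least-loaded machine and part of it becomes late.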

Theorem 4 The competitive ratio of Algorithm $EFF_m$ is $r_m = \frac{\sqrt{2m^2 - 2m + 1} - 1}{m - 1}$.
Proof For the sake of simplicity, we assume that the machines are numbered in the non-increasing order of their loads when Algorithm $EFF_m$ stops. Consider the time after the last job $J_n$ has been assigned to a machine. [...] $= r_m$. Otherwise $A = 0$, which means that there is a "big" job with processing time $p > r_m d$. Exclude temporarily this job and the machine on which it has been scheduled from the analysis. We get a new job sequence $J'$ and a new schedule for it. Let $X'$ be the total early work of this new schedule constructed by $EFF_m$ and let $X'^*$ be the optimal offline early work for this instance. Using the reasoning presented before, we can show that $\frac{X'^*}{X'} \le r_m$. Then we have $\frac{X^*}{X^{EFF_m}} = \frac{X'^* + d}{X' + d} \le r_m$, since $r_m \ge 1$.

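Lemma 5 The competitive ratio of Algorithm $EFF_m$, i.e., $r_m = \frac{\sqrt{2m^2 - 2m + 1} - 1}{m - 1}$, is an increasing function of $m$.
Proof To prove this lemma, it is sufficient to prove that $r_{m+1} > r_m$, i.e., $\frac{\sqrt{2m^2 + 2m + 1} - 1}{m} > \frac{\sqrt{2m^2 - 2m + 1} - 1}{m - 1}$. This inequality holds for every $m \ge 2$; thus, the lemma holds.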
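One way to verify the monotonicity of $r_m$ is sketched below. This is a possible argument and not necessarily the authors' derivation; the auxiliary function $g$ is introduced here only for illustration.

```latex
\begin{align*}
r_m &= \frac{\sqrt{2m^2-2m+1}-1}{m-1}
     = \frac{2m}{\sqrt{2m^2-2m+1}+1}
     = \frac{2}{g(1/m)}, \qquad g(x) = \sqrt{(1-x)^2+1} + x,\\
g'(x) &= 1 - \frac{1-x}{\sqrt{(1-x)^2+1}} > 0 \quad \text{for } 0 < x \le \tfrac12 .
\end{align*}
```

The second equality multiplies numerator and denominator by $\sqrt{2m^2-2m+1}+1$ and uses $(2m^2-2m+1) - 1 = 2m(m-1)$. Since $g$ is increasing in $x$, the value $g(1/m)$ decreases as $m$ grows, so $r_m = 2/g(1/m)$ increases with $m$; moreover, $g(0) = \sqrt{2}$ gives the limit $2/\sqrt{2} = \sqrt{2}$.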
For $m \to \infty$, the competitive ratio of the proposed method converges to $r_\infty = \sqrt{2} \approx 1.414$. Hence, taking Lemma 5 into account, we can state that the competitive ratio of Algorithm $EFF_m$ is bounded by the constant $\sqrt{2}$.

Lower bound of $P2|d_j = d, \text{online over list}|Y$
Now, we show that the lower bound of problem $P2|d_j = d, \text{online over list}|Y$ is equal to $\sqrt{5} - 1 \approx 1.236$.
Theorem 6 For $P2|d_j = d, \text{online over list}|Y$, no online algorithm has a competitive ratio strictly less than $\sqrt{5} - 1$.
Proof Let $d = \frac{1 + \sqrt{5}}{2}$ and assume that the first two jobs appearing in the system have unit processing times. There are only two possible ways of scheduling them.
Case 1 If these jobs have been assigned to the same machine, then the input job sequence ends. We have $X^A = d$, while $X^* = 2$, and $\frac{X^*}{X^A} = \frac{2}{d} = \sqrt{5} - 1$.
Case 2 If these two jobs have been assigned to different machines, then the last job with processing time equal to 2 comes into the system. We have $X^A = 1 + d$ and $X^* = 2d$; then $\frac{X^*}{X^A} = \frac{2d}{1 + d} = \frac{1 + \sqrt{5}}{1 + \frac{1 + \sqrt{5}}{2}} = \sqrt{5} - 1$.
Taking into account the fact that for two machines Algorithm $EFF_m$ has the competitive ratio $r_2$, which is equal to the lower bound presented above, $EFF_2$ is an optimal online algorithm.
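The two cases in the proof of Theorem 6 can be checked numerically. The snippet below is a small illustrative check in floating point; the helper `early_work` is a hypothetical name and assumes machines run without idle time, so the early work on a machine is $\min\{\text{load}, d\}$.

```python
import math

d = (1 + math.sqrt(5)) / 2          # due date used in the proof
bound = math.sqrt(5) - 1            # claimed lower bound, about 1.236


def early_work(loads, d):
    """Total early work for given machine loads (no idle time assumed)."""
    return sum(min(load, d) for load in loads)


# Case 1: online puts both unit jobs on one machine, then the sequence ends;
# offline splits them across the two machines.
case1 = early_work((1, 1), d) / early_work((2, 0), d)
# Case 2: online splits the unit jobs, then a job of length 2 arrives;
# offline pairs the unit jobs and runs the long job alone.
case2 = early_work((2, 2), d) / early_work((3, 1), d)

print(case1, case2, bound)
```

Both ratios coincide with $\sqrt{5} - 1$ up to rounding, which is why this particular choice of $d$ leaves the online algorithm no better option in either case.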

Conclusions
The scheduling problems with the late work criterion have been investigated for nearly 30 years, but most of the literature focused on single-machine and shop environments. In this paper we returned to the parallel machines environment, for which this performance measure was originally proposed. We investigated, for the first time, the online model, initiating studies on online versions of late work minimization problems.
Namely, for the offline case, we established the computational complexity of the problem with a common due date and an arbitrary number of machines, showing its unary NP-hardness, and we proved binary NP-hardness of the two-machine case. The research on the offline problems might be continued, as for other intractable problems, by exploring the structure of their optimal offline solutions (cf., e.g., Sterna 2007a) or by designing heuristic/metaheuristic approaches (cf., e.g., Blazewicz et al. 2004b, 2008). For the online case, a constant competitive ratio algorithm was given for an arbitrary number of machines, and it was shown to be optimal for two identical machines. Since the online environment has not been taken into account in the context of late work minimization so far, the scope for future research is wide. For the parallel machines case with a common due date, one can take into account, for example, semi-online problems allowing some job rearrangements (cf., e.g., Tan and Yu 2008; Chen et al. 2011) or assuming the existence of buffers (cf., e.g., Englert et al. 2008; Lan et al. 2012).
A natural next step would also be to investigate, in the online mode, other problems whose offline complexity status has already been determined.