Column generation for minimizing total completion time in a parallel-batching environment

This paper deals with the 1|p-batch, sj ≤ b|∑Cj scheduling problem, where jobs are scheduled in batches on a single machine in order to minimize the total completion time. A size is given for each job, such that the total size of each batch cannot exceed a fixed capacity b. A graph-based model is proposed for computing a very effective lower bound based on linear programming; the model, which has an exponential number of variables, is solved by column generation and embedded into both a heuristic price-and-branch algorithm and an exact branch-and-price algorithm. The same model handles parallel-machine problems like Pm|p-batch, sj ≤ b|∑Cj very efficiently. Computational results show that the new lower bound strongly dominates the bounds currently available in the literature, and that the proposed heuristic algorithm achieves high-quality solutions on large problems in reasonable computation time. For the single-machine case, the exact branch-and-price algorithm solves all the tested instances with 30 jobs and a good share of the 40-job instances.


Introduction
In manufacturing system management, capacity is a key factor in matching supply with demand, i.e., in having a system able to produce what is needed to satisfy customer demand.
Several factors negatively impact system capacity. The most studied are those related to system balancing and to part batching when setup times are present, since severe bottlenecks and/or small batches can greatly reduce system capacity, leading to the inability of the manufacturing system to respond to market demand in a timely fashion (Cachon and Terwiesch, 2012).
Batches induced by setup times are called serial batches and, although they are very important in manufacturing systems, they are not the only type of batch present on the shop floor. Transfer batches and parallel batches can also be found in manufacturing systems, the former being related to the capacity of the material handling resources and the latter, like serial batches, to the capacity of the machines.
Although both serial and parallel batches are related to and affect machine capacity, their nature is very different. Serial batches are due to the presence of setup times, while parallel batches stem from the ability of machines to accommodate and process several jobs at the same time. Parallel batches are less studied than serial and transfer batches because they are less frequent; however, they are no less important.
Specifically, parallel batches can be found in many manufacturing processes where heating operations are necessary, such as mould manufacturing (Liu et al., 2016) and the semiconductor industry (Mönch et al., 2012), or where there are sterilization phases (Ozturk et al., 2012), to cite just a few examples.
In all these cases, operations take quite a long time, and the machines are usually batch machines that can accommodate several parts and process them simultaneously, precisely in order to share the long processing time among all the parts processed at the same time. Each part has an individual size, and batch machines (e.g., batch ovens for heat treatments or autoclaves for sterilization operations) have a limited capacity; therefore, the number of parts that can fit in a single batch is limited.
Due to the limited capacity of the batch machine, and hence to the limited number of parts that can be accommodated in it, when several jobs have to be processed on the batch machine they have to be partitioned into several batches. Once the batches have been created, their processing has to be scheduled on the machine, and this decision is obviously intertwined with batch creation. Moreover, the two decisions (i.e., how to create batches and how to sequence them on the batch machines) strictly depend on the objective the shop floor manager aims at (e.g., minimizing the number of tardy jobs, minimizing the maximum delay, reducing the total flow time, maximizing machine utilization, etc.).
In this paper, the parallel batching problem described above is considered. Specifically, given a set of jobs all available at the same time, the problem of how to partition them into batches and how to sequence the batches on the machines is addressed, with the objective of minimizing the total completion time.
With respect to the current literature, the problem addressed in this paper is the same as in Rafiee Parsa et al. (2016), with the main difference that it is extended to the parallel-machine case. Following the three-field notation of Graham et al. (1979), the problems dealt with in this paper can be referred to as 1|p-batch, sj ≤ C|∑Cj and Pm|p-batch, sj ≤ C|∑Cj, and a column generation algorithm has been developed for their solution.
A fundamental work in the field of parallel batch processor scheduling is that of Uzsoy (1994), where a single batch processing machine problem is studied with regard to the makespan and total flow time criteria. In particular, non-identical job sizes are taken into account and complexity results are provided.
A large part of the literature on parallel batching is devoted to the minimization of the makespan criterion (e.g., Damodaran et al. (2006), Dupont and Dhaenens-Flipo (2002) and Rafiee Parsa et al. (2010)), while the total flow time problem has been less studied (Jolai Ghazvini and Dupont (1998) and Rafiee Parsa et al. (2016)).
The work in Jolai Ghazvini and Dupont (1998) and a modified version of the genetic algorithm presented in Damodaran et al. (2006) have been used as benchmark procedures for the hybrid max-min ant system presented in Rafiee Parsa et al. (2016). Different objective functions dealing with tardiness and lateness have also been addressed (e.g., Wang (2011), Malapert et al. (2012) and Cabo et al. (2015)).
The other most recent works on single- and parallel-machine parallel batching problems are Beldar and Costa (2018), Jia et al. (2018) and Ozturk et al. (2017). In Ozturk et al. (2017), the authors address a problem with unit-size jobs and a maximum completion time objective. In Beldar and Costa (2018) and Jia et al. (2018), instead, the total completion time criterion is considered, although Jia et al. (2018) consider the weighted version with equal processing times for all jobs, while in Beldar and Costa (2018) only the single-machine case is tackled, with constraints on the cardinality and size of the job batches.
The remainder of the paper is structured as follows. The column generation approach for the 1|p-batch, sj ≤ C|∑Cj problem is developed in Section 2, while Section 3 presents the extension of the approach to the parallel-machine case, with special attention to the case of identical parallel machines. Computational results are reported in Section 4. Section 5 concludes the paper, discussing directions for future research.

Single machine models
In the remainder of the paper, the following notation is used. N = {1, 2, . . ., n} denotes the set of jobs to be scheduled; for each job j ∈ N, its processing time pj and its size sj, both integers, are given. The machine has a given integer capacity denoted by C. When a subset of jobs is packed in a batch B, pB = max{pj : j ∈ B} denotes the batch processing time. Every batch B is required to satisfy ∑_{j∈B} sj ≤ C. The machine processes the jobs in a batch sequence S = (B1, B2, . . ., Bt), where each job j in the k-th batch Bk shares the batch completion time: Cj = C_{Bk} = ∑_{l=1}^{k} p_{Bl} for all j ∈ Bk. The 1|p-batch, sj ≤ C|∑Cj problem calls for forming the batches and sequencing them in order to minimize f(S) = ∑_{j∈N} Cj.
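To make the notation concrete, the following sketch (illustrative code, not from the paper; all names are ours) evaluates f(S) for a small batch sequence:

```python
# Total completion time of a batch sequence, following the notation above:
# each job in batch B_k completes at C_{B_k} = p_{B_1} + ... + p_{B_k},
# where p_B is the maximum processing time in the batch.

def total_completion_time(sequence, p):
    """sequence: list of batches (each a list of job ids); p: job -> processing time."""
    f, elapsed = 0, 0
    for batch in sequence:
        elapsed += max(p[j] for j in batch)   # batch processing time p_B
        f += elapsed * len(batch)             # every job in the batch shares C_B
    return f

p = {1: 8, 2: 6, 3: 5, 4: 2}                  # hypothetical processing times
S = [[1, 2], [3, 4]]                          # two batches: {1, 2} then {3, 4}
print(total_completion_time(S, p))            # (8 * 2) + ((8 + 5) * 2) = 42
```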
The problem is known to be NP-hard (Uzsoy, 1994). A MILP model for this problem, as given by Rafiee Parsa et al. (2016), is the following, where variable xij = 1 iff job i is scheduled in the j-th batch. Variables Cj and ci represent the completion times of the j-th batch and of job i, respectively. Variable Pj represents the processing time of the j-th batch.
The total completion time is expressed by (1). Constraint set (2) ensures that each job is assigned to exactly one batch and, since the jobs assigned to a batch cannot exceed the batch capacity, constraint set (3) has to be defined. Constraint set (4) represents the fact that the processing time of a batch is the maximum processing time of all the contained jobs. The completion time of the first batch is simply its processing time, since it is the first to be processed by the machine, as stated in constraint (5). Constraint set (6), instead, ensures that the completion time of every other batch is evaluated as the sum of its processing time and the completion time of the preceding batch. Constraint set (7) specifies that the completion time of a job must be the completion time of the corresponding batch (the constant M must be very large).
Model (1)-(9) is known to be very weak. A state-of-the-art solver like CPLEX can waste hours on 15-job instances, with optimality gaps at the root branching node of 100%.
For this reason, an alternative model is considered, where a batch sequence is represented as a path on a graph. Let B = {B ⊆ N : ∑_{j∈B} sj ≤ C} be the set of all possible batches. Define a (multi)graph G(V, A) with vertex set V = {1, 2, . . ., n + 1} and arc set A = {(i, k, B) : B ∈ B, k − i = |B|, 1 ≤ i < k ≤ n + 1}. Each arc in A is a triple (i, k, B) with head k, tail i and an associated batch B with (k − i) jobs; it represents batch B scheduled in a batch sequence such that exactly n − i + 1 jobs are scheduled from batch B up to the end of the sequence. For each arc (i, k, B) a cost is defined as c_ikB = (n − i + 1) pB.
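The arc set just described can be enumerated naively as follows (an illustrative sketch, practical only for tiny n; function and variable names are ours, not the paper's):

```python
from itertools import combinations

def build_arcs(p, s, C):
    """Enumerate arcs (i, k, B) with cost (n - i + 1) * p_B.
    p, s: dicts job -> processing time / size; C: machine capacity."""
    n = len(p)
    jobs = sorted(p)
    arcs = []
    for size in range(1, n + 1):
        for B in combinations(jobs, size):
            if sum(s[j] for j in B) > C:      # batch must fit the capacity
                continue
            pB = max(p[j] for j in B)
            for i in range(1, n + 2 - size):  # tail node i; head is k = i + |B|
                k = i + size
                arcs.append((i, k, B, (n - i + 1) * pB))
    return arcs

p = {1: 8, 2: 6, 3: 5}
s = {1: 2, 2: 2, 3: 3}
arcs = build_arcs(p, s, C=4)                  # e.g. contains (1, 2, (1,), 24)
```

A real implementation would never materialize the exponentially many arcs; that is exactly what the column generation scheme below avoids.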
Property 1 highlights the relationship between feasible batch sequences and paths in the graph defined above.
Property 1. Each feasible batch sequence S corresponds to a path P in G(V, A) from 1 to n + 1 such that the set of jobs N is partitioned over the arcs in P, and f(S) = ∑{c_ikB : (i, k, B) ∈ P}.
Proof. Refer to Figure 1 to fix ideas. A batch sequence S = (B1, B2, . . ., Bt) is easily seen to be mapped (and vice-versa) onto a path where B1, . . ., Bt form a partition of N, i1 = 1, kt = n + 1, kℓ = iℓ + |Bℓ| for ℓ = 1, . . ., t, and kℓ = iℓ+1 for ℓ = 1, . . ., t − 1. For an arc (iq, kq, Bq) in position q on P, as claimed above, the number of jobs scheduled from Bq to Bt is n − iq + 1. The objective function for the batch sequence is f(S) = ∑_{q=1}^{t} |Bq| ∑_{l=1}^{q} p_{Bl}; adding by columns, each p_{Bq} is counted ∑_{l≥q} |Bl| = n − iq + 1 times, so f(S) = ∑_{q=1}^{t} (n − iq + 1) p_{Bq} = ∑{c_ikB : (i, k, B) ∈ P}. A very large model for the 1|p-batch, sj ≤ C|∑Cj problem includes features of a shortest path/minimum cost flow as well as partition constraints, as follows. Let aB be the incidence vector of job set B, with aB,j = 1 if j ∈ B and 0 otherwise, and let 1 = (1, 1, . . ., 1). Variable x_ikB = 1 iff arc (i, k, B) is on the path (i.e., batch B is scheduled after i − 1 jobs); constraints (11) are flow conservation constraints that ensure that a unit flow is sent from node 1 to node n + 1, therefore capturing the shortest path problem. Constraints (12) enforce the requirement that the job set is exactly partitioned over the arcs selected in the path.
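From the description just given, model (10)-(13) can plausibly be written out as follows (a reconstruction; the exact statement in the original may differ in minor details):

```latex
\begin{align}
\min\ & \sum_{(i,k,B)\in A} c_{ikB}\, x_{ikB} \tag{10}\\
\text{s.t.}\ & \sum_{(k,l,B)\in A} x_{klB} - \sum_{(i,k,B)\in A} x_{ikB}
  = \begin{cases} 1 & \text{if } k = 1\\ 0 & \text{if } 2 \le k \le n\\ -1 & \text{if } k = n+1 \end{cases}
  \qquad \forall k \in V \tag{11}\\
& \sum_{(i,k,B)\in A} a_{B,j}\, x_{ikB} = 1 \qquad \forall j \in N \tag{12}\\
& x_{ikB} \in \{0,1\} \qquad \forall (i,k,B) \in A \tag{13}
\end{align}
```

Here (10) minimizes the total positional cost, (11) routes one unit of flow from node 1 to node n + 1, and (12) partitions the job set over the selected arcs.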
2.1. Column generation. The continuous relaxation of (10)-(13), where the integrality constraints (13) are relaxed to 0 ≤ x_ikB ≤ 1, is solved by means of a column generation procedure. Model (10)-(13′) is the master problem; a restricted master problem is defined over a subset A′ ⊂ A of arcs. The dual constraints of (10)-(13′) associated with the arc variables read ∑_{j∈B} vj + (ui − uk) ≤ (n − i + 1)pB for all (i, k, B) ∈ A (15). Solving the restricted master problem leads to a basic feasible solution for the master problem and dual variables/simplex multipliers u1, . . ., un+1 for constraints (11) and v1, . . ., vn for constraints (12).
Pricing the arcs (i, k, B) ∈ A corresponds to finding the most violated dual constraints (15). The strategy developed in this paper is to price the arcs separately for each pair of indices i < k, therefore determining minimum (possibly negative) reduced costs min{(n − i + 1)pB − ∑_{j∈B} vj − (ui − uk) : (i, k, B) ∈ A} (16). Assume the jobs are indexed in Longest Processing Time (LPT) order, so that p1 ≥ p2 ≥ . . . ≥ pn. Finding the batch B that minimizes (16) for each given pair of indices i < k and each batch processing time pB can be done by exploiting the dynamic programming state space of a family of cardinality-constrained knapsack problems where items correspond to jobs. Define, for r = 1, . . ., n, gr(τ, ℓ) = max{∑_{j=r}^{n} vj yj : ∑_{j=r}^{n} sj yj ≤ τ, ∑_{j=r}^{n} yj = ℓ, yj ∈ {0, 1}}, where gr(τ, ℓ) is the optimal value of a knapsack with profits vj and sizes sj, limited to items/jobs r, r + 1, . . ., n, with total size ≤ τ and cardinality ℓ. Variable yj is set to 1, i.e., yj = 1, iff item/job j is included in the solution.
Optimal values for gr(τ, ℓ) can be recursively computed as gr(τ, ℓ) = max{g_{r+1}(τ, ℓ), vr + g_{r+1}(τ − sr, ℓ − 1)}, where the second term applies only if sr ≤ τ, with boundary conditions gr(τ, 0) = 0 for all r and τ ≥ 0, and g_{n+1}(τ, ℓ) = −∞ for ℓ ≥ 1. The corresponding optimal job sets are denoted by Br(τ, ℓ); such sets can be retrieved by backtracking.
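A minimal sketch of this recursion (illustrative code, not the paper's implementation), returning both g_r(τ, ℓ) and an optimal job set B_r(τ, ℓ):

```python
NEG_INF = float("-inf")

def knapsack_g(r, tau, ell, v, s, n):
    """g_r(tau, ell): max sum of profits v_j over jobs j in {r..n}
    with total size <= tau and exactly ell jobs chosen.
    Returns (value, chosen job list) or (-inf, None) if infeasible."""
    if ell == 0:
        return 0.0, []
    if r > n:
        return NEG_INF, None                  # cannot reach cardinality ell
    # option 1: skip job r
    best, best_set = knapsack_g(r + 1, tau, ell, v, s, n)
    # option 2: take job r, if it fits
    if s[r] <= tau:
        val, chosen = knapsack_g(r + 1, tau - s[r], ell - 1, v, s, n)
        if val > NEG_INF and v[r] + val > best:
            best, best_set = v[r] + val, [r] + chosen
    return best, best_set

v = {1: 5.0, 2: 4.0, 3: 3.0}                  # hypothetical dual values v_j
s = {1: 3, 2: 2, 3: 2}                        # job sizes
g, B = knapsack_g(1, 4, 2, v, s, n=3)
print(g, B)                                   # 7.0 [2, 3]
```

The plain recursion is exponential; the memoization discussed below is what keeps the state space within O(n²C).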
Property 2. Let L = {1} ∪ {j > 1 : pj < p_{j−1}}. For any given pair of indices i < k, an arc with minimum reduced cost (i, k, B*) is one of the arcs (i, k, Br(C, k − i)) with r ∈ L (17). Proof. Every arc (i, k, B) can be shown to have a reduced cost not less than that of some arc in (17). Let c̄_ikB = (n − i + 1)pB − ∑_{j∈B} vj − (ui − uk) be the reduced cost of an arc (i, k, B). Recall that |B| = k − i, and that the jobs are numbered in non-increasing order of processing times. Choose r as the smallest job index such that pr = pB. Note that B ⊆ {r, r + 1, . . ., n} and r ∈ L. Consider knapsack gr(C, k − i) and the associated optimal subset Br = Br(C, k − i). The batch B is a feasible solution for knapsack gr(C, k − i), hence ∑_{j∈B} vj ≤ gr(C, k − i); also, because of the choice of r, p_{Br} ≤ pr = pB. Thus, c̄_{ikBr} ≤ c̄_ikB. All the relevant arcs with minimum reduced cost can be generated by the procedure reported in Algorithm 1 (NewCols).
Algorithm 1 (NewCols) works as follows: sort and renumber the jobs so that p1 ≥ p2 ≥ . . . ≥ pn; then, for each cardinality ℓ = 1, . . ., n, scan the indices r ∈ L in increasing order (starting from r = 1 and moving, at each step, to r := min{j : pj < pr}, stopping when no such index exists), retrieving gr(C, ℓ) and B = Br(C, ℓ) and adding to the output set H the arcs (i, i + ℓ, B) with negative reduced cost; finally, return H. The calling procedure then updates A′ ← A′ ∪ H. The size of the state space required for the pricing is bounded by O(n²C), while the pricing procedure can have two bottlenecks: (a) the O(n³) effort due to the three nested loops on ℓ, r and the index pairs; the inner scan can be executed n times in the worst case; (b) filling the state space gr(τ, ℓ), which requires at most O(n²C) arithmetic operations.
A memoized dynamic programming table is used, so that the execution of the top-down recursion for computing an entry gr(τ, ℓ) is deferred until the first time the value is queried. Then, the value is kept in storage and accessed in O(1) time if it is queried again. Because of these two possible bottlenecks, the running time of NewCols is bounded from above by O(max(n³, n²C)).
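The memoization scheme can be sketched as follows (hypothetical code, using a dictionary as the lazily filled table; entries are computed only when first queried):

```python
def make_g(v, s, n):
    """Return a memoized function g(r, tau, ell); each entry is computed
    on first query (top-down) and then served from the table in O(1)."""
    memo = {}

    def g(r, tau, ell):
        if ell == 0:
            return 0.0
        if r > n:
            return float("-inf")
        key = (r, tau, ell)
        if key not in memo:                   # deferred computation
            best = g(r + 1, tau, ell)         # skip job r
            if s[r] <= tau:                   # take job r if it fits
                best = max(best, v[r] + g(r + 1, tau - s[r], ell - 1))
            memo[key] = best
        return memo[key]

    return g

v = {1: 5.0, 2: 4.0, 3: 3.0}                  # hypothetical dual values
s = {1: 3, 2: 2, 3: 2}
g = make_g(v, s, n=3)
print(g(1, 4, 2))                             # 7.0
```

Since only states actually reached by the recursion are stored, the table never exceeds the O(n²C) bound but is often much smaller in practice.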
2.2. Heuristic procedure. The column generation described in the previous section is used to solve the continuous relaxation of the master problem; once the relaxed optimum has been found, the resulting restricted master problem is taken, the variables are set to binary type, and the resulting MILP is solved using CPLEX in order to get a heuristic solution for the master. This is often called "price and branch", as opposed to the exact branch-and-price approach.
In order to generate the initial column set, the jobs are sorted in shortest processing time (SPT) order and all the possible arcs with feasible batches made of SPT-consecutive jobs are generated (Algorithm 2, InitCols).
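A possible reading of this initialization (our sketch, not the paper's Algorithm 2) is to generate all capacity-feasible batches of SPT-consecutive jobs:

```python
def init_batches(p, s, C):
    """All feasible batches of SPT-consecutive jobs.
    p, s: dicts job -> processing time / size; C: machine capacity."""
    jobs = sorted(p, key=lambda j: p[j])      # SPT order
    batches = []
    for a in range(len(jobs)):
        size = 0
        for b in range(a, len(jobs)):
            size += s[jobs[b]]
            if size > C:                      # extending the run only adds size
                break
            batches.append(jobs[a:b + 1])
    return batches

p = {1: 2, 2: 5, 3: 6}
s = {1: 3, 2: 2, 3: 2}
print(init_batches(p, s, C=4))                # [[1], [2], [2, 3], [3]]
```

Each such batch of cardinality ℓ then yields the arcs (i, i + ℓ, B) added to the initial restricted master problem.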
The complete heuristic procedure is sketched in Algorithm 3.

Parallel machine models
Model (10)-(13) is readily extended to parallel-machine cases. Consider the fairly general Rm|p-batch, sj ≤ C|∑Cj problem with m parallel unrelated machines. Let p_jh be the processing time of job j on machine h. A special type of arc with an empty batch is added to the graph developed for the single-machine case, using the arc set A∅ = {(1, k, ∅) : k = 2, . . ., n + 1}. Arcs (i, k, B) ∈ A are given machine-dependent costs c^h_ikB = p_Bh (n − i + 1), with p_Bh = max{p_jh : j ∈ B}. Empty arcs (1, k, ∅) are given costs c^h_1k∅ = 0, for k = 2, . . ., n + 1 and h = 1, . . ., m. Empty arcs are all added to the restricted master problem from the beginning, so that they do not need to be considered in the dynamic programming pricing procedure. A feasible solution is made of m batch sequences processed by the m machines. Such batch sequences correspond to m paths (one path for each machine) from 1 to n + 1, over the arcs of which the set of jobs is exactly partitioned. Such paths will have an empty arc as their first arc. Note that if (i, k, B) is on the h-th path, then n − i + 1 jobs will be scheduled from B to the end of the h-th batch sequence. Property 1 is easily extended to the multi-machine case. Figure 2 reports a sketch of the proof with m = 2. The empty arcs act as placeholders.
Model (10)-(13) can be extended to the parallel-machine case using multi-commodity flow features.
Here x^h_ikB = 1 iff batch B is on the h-th path. Flow conservation constraints (19) require that one unit of each commodity is routed from node 1 to node n + 1. Constraints (20) enforce the exact partition of the whole job set across the arcs belonging to the m paths.
The reduced cost is then separately minimized for each combination of index pair i < k and machine h, searching for arcs (i, k, B) with minimum reduced cost. This requires calling NewCols m times, once per machine, since the LPT ordering on each machine is different, and so is the state space gr(τ, ℓ). Hence, the running time for pricing rises to O(m max(n³, n²C)).
A somewhat better situation arises in the case of identical parallel machines, with problem Pm|p-batch, sj ≤ C|∑Cj. Since each job j has the same processing time pj on every machine, the state space gr(τ, ℓ) used for pricing is common to all the machines, and a slightly modified version of NewCols can do the entire pricing, still keeping the running time within O(max(n³, n²C)). The procedure is reported in Algorithm 4 (pricing procedure for identical parallel machines). The key observation is that the pB and ∑_{j∈B} vj components of the reduced costs c^h_ikB are machine-independent, whereas only the largest difference ∆u_ik = max_h{(u^h_i − u^h_k)} is strictly needed in order to compute minimum reduced costs. Such largest differences are precomputed in time O(mn²) on line 3 of Algorithm 4. For any (i, k, B), let r be the smallest index such that pr = pB, and let Br = Br(C, k − i); then, similarly to what is proved in Property 2, the arc (i, k, Br) has a reduced cost not larger than that of (i, k, B) on every machine. Finally, note that taking into account different capacities for each machine, or even different job sizes on each machine, simply requires specializing the knapsack family used. Details are omitted for the sake of conciseness.
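The machine-independent part of this pricing can be sketched as follows (illustrative code with a hypothetical data layout): only ∆u_ik = max_h (u^h_i − u^h_k) is needed per pair i < k, so the m-machine pricing collapses back to a single pass over the common state space.

```python
def max_dual_differences(u):
    """u[h][i]: dual of the flow constraint at node i for machine h
    (hypothetical 0-based layout).  Returns delta[(i, k)] =
    max_h (u[h][i] - u[h][k]), computed once in O(m n^2) before pricing."""
    n1 = len(u[0])                            # number of nodes
    delta = {}
    for i in range(n1):
        for k in range(i + 1, n1):
            delta[(i, k)] = max(uh[i] - uh[k] for uh in u)
    return delta

u = [[4.0, 2.0, 1.0],                         # machine 1 duals at the nodes
     [3.0, 2.5, 0.0]]                         # machine 2 duals
d = max_dual_differences(u)
print(d[(0, 2)])                              # max(4 - 1, 3 - 0) = 3.0
```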

Computational results
The price-and-branch heuristics have been tested on randomly generated single-machine and parallel identical machine instances. For generating job data, the same approach as in Uzsoy (1994) and Rafiee Parsa et al. (2016) has been used. Specifically, all the job processing times are drawn from a uniform distribution pj ∈ [1, 100], while job sizes sj are drawn from four possible uniform distributions, labeled σ ∈ {σ1, σ2, σ3, σ4}. In both Uzsoy (1994) and Rafiee Parsa et al. (2016) the machine capacity is fixed at C = 10. Since the pricing procedure has a pseudopolynomial running time, instances with C = 30 and C = 50 have also been generated in order to assess how the procedure behaves with a larger capacity. Single-machine instances have been generated with n ranging from 20 to 100 jobs and with all four σ size distributions. For each n, σ and C combination, 10 random instances have been generated. With the same job data, the corresponding instances of the parallel-machine problem Pm|p-batch, sj ≤ C|∑Cj have been solved for m = 2, 3, 5 identical machines. For the parallel-machine case, only the C = 10 instances have been used.
Both the column-generation-based lower bound CG-LB and the quality of the heuristic solution CG-UB have been evaluated. As far as the quality of the lower bound is concerned, the continuous relaxation of model (1)-(9) is not a realistic competitor, zero being the typical value found by CPLEX at the root branching node. A more meaningful comparison can be performed against the combinatorial lower bound proposed by Uzsoy (1994). Such a bound is based on a relaxation of 1|p-batch, sj ≤ C|∑Cj to a preemptive problem on C parallel machines (refer to Uzsoy (1994) for details). This lower bound is referred to as PR in the following.
As far as the evaluation of CG-UB is concerned, it was difficult to compare the obtained results with the known literature, as neither the test instances nor the computer codes used by Uzsoy (1994) and Rafiee Parsa et al. (2016) have been made available. Hence, some comparisons have been made with the results of Rafiee Parsa et al. (2016), using instances of the same type; for this reason, the comparison has to be taken with some care. On the other hand, when CPLEX is fed with model (1)-(9) and given some time, its internal heuristics do generate a number of heuristic solutions, although it has no chance of certifying optimality. Hence, CPLEX has been run on some sets of instances in order to get heuristic solutions, with a time limit of 300 seconds.
The times required to compute CG-LB and CG-UB are reported separately. The gap between CG-UB and CG-LB is evaluated as their percentage difference. All the tests ran in a Linux environment on an Intel Core i7-6500U CPU @ 2.50GHz; the algorithms were coded in C++, and CPLEX 12.8, called directly from the C++ environment through the CPLEX callable libraries, has been used to solve the relaxed and mixed-integer programs.
4.1. Single machine. Tables 1, 2 and 3 show the results over an increasing number of jobs with batch capacity C = 10, 30 and 50, respectively; CG-UB was computed using CPLEX with a time limit of 60 seconds. Values are shown as the average over each 10-instance group for the time, and as the average, maximum (worst) and minimum (best) over each 10-instance group for the gap. Column opt reports the number of instances (out of 10) for which the solution can be certified to be optimal, i.e., for which CG-UB = CG-LB. The comparison between the CG-LB value and Uzsoy's PR lower bound is also reported, computing the average, maximum and minimum of the ratio CG-LB/PR over each 10-instance group.
In Table 1, it can be seen that, with C = 10, the computation of CG-LB is fast, with average CPU times of less than 1 second in almost all cases (i.e., with any number of jobs). The σ4 instances are the most time-demanding, with the only average computation time above 1 second. This is due to the fact that a larger set of columns is usually generated on such instances. The computation of CG-UB is, as expected, the heaviest part of the procedure, with larger CPU times. However, the CPLEX time limit is reached only in the cases n = 80, 100 with σ = σ4. Again, the σ4 instances were the most CPU-time-demanding, because of the larger set of columns to be handled. The certified solution quality was very good, with an average optimality gap usually below 1.5%, and only one case (n = 80, σ = σ4) above 5%.
From Table 1, it can easily be seen that CG-LB performs much better than PR in every combination, ranging from an average 9% gain when n = 100 and σ = σ4 to an average 29% gain when n = 20 and σ = σ4. These values also suggest that PR performs better for large n; in fact, when a high number of batches is required in the feasible solutions, the usually weak parallel-machine relaxation of PR becomes slightly stronger.
From Table 2, it can be noticed that the CPU times for CG-LB increase; this is expected, since a larger number of possible batches is generated with an increased capacity. The larger restricted master problems obviously also affect the computation of CG-UB, which reaches the time limit in all the cases for n = 80, 100. The average optimality gaps worsen, but the worst increase is not found on the σ4 instances; instead, it affects the σ1 instances more heavily, especially for large n.
Overall, increasing the capacity also increases the distance between the two lower bounds CG-LB and PR; CG-LB performs better in every combination, ranging from an average 10% gain when C = 30, n = 100 and σ = σ3 to an average 81% gain when C = 30, n = 20 and σ = σ4. This is reasonable, since PR is based on a preemptive relaxation to C parallel machines, and allowing jobs to be split over more machines weakens the relaxation.
Table 3 shows the results of the tests with capacity C = 50, which confirm the impact of C. The instances belonging to class σ4 are still the most computationally demanding, both for the lower bound and for the heuristic solution. Instances with σ = σ1 are the worst in terms of solution quality (with the exception of the small 20-job instances) but, curiously, the gap decreases on the n = 80 instances when passing from C = 30 to C = 50. The worst average gap is reached with n = 100 and σ = σ1 (15.66%). Also, PR worsens considerably with respect to CG-LB.

Table 4. Comparison between HMMAS and CG-UB algorithms

An attempt to compare our upper bound CG-UB to the hybrid max-min ant system (HMMAS) developed by Rafiee Parsa et al. (2016) has been made by generating the random instances in the same way they did, and by evaluating the execution times with extreme care, considering also the different operational environment, as neither their algorithm nor their instances were available. Moreover, for the comparison, only capacity C = 10 was used, since in Rafiee Parsa et al. (2016) only that value was considered.
As can be seen in Table 4, the results show that the performance of CG-UB, evaluated against Uzsoy's lower bound PR, seems to be very similar to that of HMMAS. It is not possible to explicitly compare with the results of Rafiee Parsa et al. (2016) but, since the results appear to be very close, it can be speculated that the two algorithms would give similar upper bounds when run on the same instance set. With the same care, it can be observed that the CPU times of CG-UB seem to be much smaller than those of HMMAS, even taking into account the different processors. The notable exception is the large σ4 instances, which generate large restricted master problems.
It must be stressed, however, that, as the instances of Rafiee Parsa et al. (2016) were not available, the optimality gap of their results against a strong lower bound is unknown. Thus, even if the results seem to suggest that the upper bounds are comparable, the quality of the two algorithms cannot be directly benchmarked.
Finally, the quality of CG-UB has been compared to the quality of the heuristic solution reached by CPLEX (CPLEX-UB) after 300 seconds of computation using model (1)-(9). The CPLEX optimality gap is most of the time well above 90%, because its own lower bound is zero or almost zero. However, using the proposed stronger lower bound, a more realistic optimality gap can be computed for CPLEX as (CPLEX-UB − CG-LB)/CPLEX-UB; the gap for CG-UB is recomputed accordingly as (CG-UB − CG-LB)/CG-UB. Instances with n = 20, 40, 60, 80 and C = 10 have been tested. CPLEX ran for the full 300 seconds on all the instances, without proving optimality for any of them. CG-UB ran with the same 60-second time limit as in Table 1. Basically, except for the small n = 20 instances, the CPLEX solution is consistently worse than CG-UB.

4.2. Parallel machines.
With the same job data, the Pm|p-batch, sj ≤ C|∑Cj problem has been solved with m = 2, 3 and 5 machines. The testing has been limited to the case C = 10. The time limit for the branch-and-bound phase of the heuristic was raised to 180 seconds. The results are reported in Tables 6, 7 and 8. Apparently, increasing the number of machines has a very mild impact on the CPU time for computing the lower bound. The growth of the computational cost is much higher for the branch-and-bound phase, but with a certain variability over the four classes of instances, with classes σ1 and σ4 exhibiting the largest growth. Again, class σ4 hit the time limit on all the instances. The quality of the solution, as measured by the percentage gap, does not suffer seriously, except for the case n = 100, m = 5, class σ4. The worst average of 14.09% is caused by one single instance with a very large gap of 81.62%; if a larger but still acceptable time limit of 300 seconds is allowed, the average gap for this class drops to 4.63% (max gap 12.56%).
Uzsoy's bound PR is easily extended to the parallel-machine case by allowing a relaxation to mC parallel machines. Tables 6, 7 and 8 also compare CG-LB with this extension of PR. The ratio between the two bounds is apparently unaffected by the growth of m.

Final remarks
In this paper, column generation techniques for solving the 1|p-batch, sj ≤ C|∑Cj problem have been explored, generalizing such techniques to problems with parallel machines. The exponential-size model (10)-(13), handled by means of column generation, allows finding, to the authors' knowledge, the tightest known lower bound for 1|p-batch, sj ≤ C|∑Cj. Embedded in a simple price-and-branch approach, it achieves high-quality solutions for instances of up to 100 jobs, with certified optimality gaps. The model relies on Property 1 in order to express the linear objective function by means of "positional" coefficients. Property 2 is crucial in order to develop an efficient pricing procedure. The approach is flexible enough to be extended to problems with parallel machines with very limited effort, while it is probably not so simple to extend it to weighted ∑wjCj objectives. Having a column-generation-based lower bound naturally leads to searching for an exact branch-and-price approach; some problems still have to be tackled in this direction. Applying the classical branching scheme of Foster and Ryan (1976) makes the pricing procedure useless at most branching nodes, since the knapsack-like problems must take the branching disjunctions into account.
Preliminary experiments have been run with a branching scheme that allows keeping the pricing problem structure. Similarly to what is known to happen with other ∑Cj problems, a narrow optimality gap is obtained at the root node, but closing that gap is quite difficult. In the preliminary experiments, this seems to happen because of the large number of equivalent optimal solutions that can potentially be generated by playing on the different equivalent packings of jobs into batches. This leads to a large number of optimal branches that cannot be fathomed by bounding, at least not in the early stages of the search. Some strong dominance criterion, suitable for breaking or pruning such equivalences, might then be needed before a branch-and-price approach is usable for this type of parallel batching problem. This is the subject of ongoing research.

Table 1. Results for CG-UB and CG-LB with C = 10

Table 2. Results for CG-UB and CG-LB with C = 30

Table 3. Results for CG-UB and CG-LB with C = 50

Table 5. Comparison between CPLEX-UB (300 secs) and CG-UB.
The comparison is reported in Table 5, again in terms of average, worst and best gap. The column #win counts the number of instances out of ten for which each algorithm achieves the best solution. In case of a draw, a "win" is counted for both, so the two #win columns can sum to more than 10.

Table 6. Results for CG-UB and CG-LB with C = 10 and 2 parallel machines

Table 7. Results for CG-UB and CG-LB with C = 10 and 3 parallel machines

Table 8. Results for CG-UB and CG-LB with C = 10 and 5 parallel machines