Empowering the configuration-IP: new PTAS results for scheduling with setup times

Integer linear programs of configurations, or configuration IPs, are a classical tool in the design of algorithms for scheduling and packing problems where a set of items has to be placed in multiple target locations. Herein, a configuration describes a possible placement on one of the target locations, and the IP is used to choose suitable configurations covering the items. We give an augmented IP formulation, which we call the module configuration IP. It can be described within the framework of n-fold integer programming and, therefore, be solved efficiently. As an application, we consider scheduling problems with setup times in which a set of jobs has to be scheduled on a set of identical machines with the objective of minimizing the makespan. For instance, we investigate the case that jobs can be split and scheduled on multiple machines. However, before a part of a job can be processed, an uninterrupted setup depending on the job has to be paid. For both of the variants that jobs can be executed in parallel or not, we obtain an efficient polynomial time approximation scheme (EPTAS) of running time f(1/ε)·poly(|I|). Previously, only constant factor approximations of 5/3 and 4/3+ε, respectively, were known. Furthermore, we present an EPTAS for a problem where classes of (non-splittable) jobs are given, and a setup has to be paid for each class of jobs being executed on one machine.


Introduction
In this paper, we present an augmented formulation of the classical integer linear program of configurations (configuration IP) and demonstrate its use in the design of efficient polynomial time approximation schemes for scheduling problems with setup times. Configuration IPs are widely used in the context of scheduling or packing problems in which items have to be distributed to multiple target locations. The configurations describe possible placements on a single location, and the integer linear program (IP) is used to choose a proper selection covering all items. Two fundamental problems, for which configuration IPs have prominently been used, are bin packing and minimum makespan scheduling on identical parallel machines, or machine scheduling for short. For bin packing, the configuration IP was introduced as early as 1961 by Gilmore and Gomory [13], and the recent results for both problems typically use configuration IPs as a core technique, see, e.g., [14,19]. In the present work, we consider scheduling problems and therefore introduce the configuration IP in more detail using the example of machine scheduling.

Configuration IP for Machine Scheduling
In the problem of machine scheduling, a set $J$ of $n$ jobs is given together with processing times $p_j$ for each job $j$ and a number $m$ of identical machines. The objective is to find a schedule $\sigma : J \to [m]$ such that the makespan is minimized, that is, the latest finishing time of any job $C_{\max}(\sigma) = \max_{i \in [m]} \sum_{j \in \sigma^{-1}(i)} p_j$. For a given makespan bound, the configurations may be defined as multiplicity vectors indexed by the occurring processing times such that the overall length of the chosen processing times does not violate the bound. The configuration IP is then given by variables $x_C$ for each configuration $C$; constraints ensuring that there is a machine for each configuration, i.e., $\sum_{C} x_C = m$; and further constraints due to which the jobs are covered, i.e., $\sum_{C} C_p x_C = |\{j \in J \mid p_j = p\}|$ for each processing time $p$. In combination with certain simplification techniques, this type of IP is often used in the design of polynomial time approximation schemes (PTAS). A PTAS is a procedure that, for any fixed accuracy parameter $\varepsilon > 0$, returns a solution with approximation guarantee $(1+\varepsilon)$, that is, a solution whose objective value lies within a factor of $(1+\varepsilon)$ of the optimum. In the context of machine scheduling, the aforementioned simplification techniques can be used to guess the target makespan $T$ of the given instance; to upper bound the cardinality of the set of processing times $P$ by a constant (depending on $1/\varepsilon$); and to lower bound the processing times in size such that they are within a constant factor of the makespan $T$ (see, e.g., [4,19]). Hence, only a constant number of configurations is needed, which leads to an integer program with a constant number of variables. Integer programs of that kind can be efficiently solved using the classical algorithm by Lenstra and Kannan [22,27], yielding a PTAS for machine scheduling. Here, the error of $(1+\varepsilon)$ in the quality of the solution is due to the simplification steps, and the scheme has a running time of the form $f(1/\varepsilon) \cdot \mathrm{poly}(|I|)$, where $|I|$ denotes the input size and $f$ some computable function. A PTAS with this property is called efficient (EPTAS). Note that for a regular PTAS a running time of the form $|I|^{f(1/\varepsilon)}$ is allowed. It is well-known that machine scheduling is strongly NP-hard, and therefore it admits no optimal polynomial time algorithm, unless P=NP. Moreover, a so-called fully polynomial PTAS (FPTAS), which is an EPTAS with a polynomial function $f$, cannot be hoped for either.

Machine Scheduling with Classes
The configuration IP is used in a wide variety of approximation schemes for machine scheduling problems [4,19]. However, for scheduling problems where the jobs have to meet some additional requirements, such as class dependencies, the approach often ceases to work. A problem emerging, in this case, is that the additional requirements have to be represented in the configurations, resulting in a super-constant number of variables in the IP. We elaborate on this using a concrete example: Consider the variant of machine scheduling in which the jobs are partitioned into $K$ setup classes. For each job $j$, a class $k_j$ is given; and for each class $k$, a setup time $s_k$ has to be paid on a machine if a job belonging to that class is scheduled on it, i.e., $C_{\max}(\sigma) = \max_{i \in [m]} \big( \sum_{j \in \sigma^{-1}(i)} p_j + \sum_{k \in \{k_j \mid j \in \sigma^{-1}(i)\}} s_k \big)$. With some effort, simplification steps similar to the ones for machine scheduling can be applied. In the course of this, the setup times as well can be suitably bounded in number and guaranteed to be sufficiently big (see [20]). However, it is not obvious how the configuration IP should be extended without losing the property that it can be solved efficiently. For instance, extending the configurations with multiplicities of setup times creates a need to encode class information into the configurations or to introduce other class dependent variables. This leads to a super-constant number of variables and constraints.
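To make the objective of the setup class variant concrete, the following minimal Python sketch evaluates the makespan of a given assignment. The function name, the list-based encoding of the instance, and the numbers are assumptions chosen for illustration only.

```python
from collections import defaultdict

def makespan_setup_classes(assignment, p, job_class, s):
    """Makespan of a schedule in the setup class model.

    assignment[j] is the machine of job j, p[j] its processing time,
    job_class[j] its setup class, and s[k] the setup time of class k.
    """
    load = defaultdict(float)
    classes_on_machine = defaultdict(set)
    for j, machine in enumerate(assignment):
        load[machine] += p[j]
        classes_on_machine[machine].add(job_class[j])
    # Each class present on a machine pays its setup time exactly once there.
    for machine, classes in classes_on_machine.items():
        load[machine] += sum(s[k] for k in classes)
    return max(load.values(), default=0.0)

# Four jobs, two classes, two machines (made-up numbers).
p = [3, 2, 4, 1]
job_class = [0, 0, 1, 1]
s = [1, 2]
grouped = makespan_setup_classes([0, 0, 1, 1], p, job_class, s)  # classes kept together
mixed = makespan_setup_classes([0, 1, 0, 1], p, job_class, s)    # classes mixed
print(grouped, mixed)
```

Mixing classes on a machine pays every involved setup there, which is why the second assignment is worse even though the processing load is spread similarly.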

Module Configuration IP
Our approach to deal with the class dependencies of the jobs is to cover the job classes with so-called modules and cover the modules in turn with configurations in an augmented IP, called the module configuration IP (MCIP). In the setup class model, for instance, the modules may be defined as combinations of setup times and multiplicity vectors of processing times, and the configurations, in turn, as multiplicity vectors of module sizes. The number of both the modules and the configurations will typically be bounded by a constant. To cover the classes by modules, each class is provided with its own set of modules, that is, there are variables for each pair of class and module. Since the number of classes is part of the input, the number of variables in the resulting MCIP is super-constant, and therefore the algorithm by Lenstra and Kannan [22,27] is not the proper tool for solving the MCIP. However, the MCIP has a certain simple structure: The mentioned variables are partitioned into uniform classes each corresponding to the set of modules, and for each class, the modules serve essentially the same purpose, that is, they cover the jobs of the class. Utilizing these properties, we can formulate the MCIP in the framework of n-fold integer programs, a class of IPs whose variables and constraints fulfill certain uniformity requirements. In 2013, Hemmecke et al. [15] presented the first fixed parameter tractable (FPT) algorithm for n-fold IPs, that is, an algorithm with a running time $f(k) \cdot \mathrm{poly}(|I|)$ where $k$ is some parameter (or a sequence of parameters) depending on the instance. In the MCIP the corresponding parameters can be properly bounded, which enables the present result. For a more detailed description of n-fold IPs and the MCIP, the reader is referred to Sects. 2 and 3, respectively. In Fig. 1, the basic idea of the MCIP is visualized.
Using the MCIP, we are able to formulate an EPTAS for machine scheduling in the setup class model described above. Before, only a regular PTAS with running time $n m^{O(1/\varepsilon^5)}$ was known [20]. To the best of our knowledge, this is the first use of n-fold integer programming in the context of approximation algorithms.

Results and Methodology
To show the conceptual power of the MCIP, we utilize it for two more problems: The splittable and the preemptive setup model of machine scheduling. In both variants, for each job $j$, a setup time $s_j$ is given. Each job may be partitioned into multiple parts that can be assigned to different machines, but before any part of the job can be processed the setup time has to be paid. In the splittable model, job parts belonging to the same job can be processed in parallel, and therefore it suffices to find a partition of the jobs and an assignment of the job parts to machines. This is not the case for the preemptive model, in which additionally a starting time for each job part has to be found, and two parts of the same job may not be processed in parallel. In 1999, Schuurman and Woeginger [33] presented a polynomial time algorithm for the preemptive model with approximation guarantee $4/3 + \varepsilon$, and for the splittable case, a guarantee of $5/3$ was achieved by Chen et al. [6]. These are the best known approximation guarantees for the problems at hand. We show that solutions arbitrarily close to the optimum can be found in polynomial time:

Theorem 1 There is an efficient PTAS with running time $2^{f(1/\varepsilon)}\,\mathrm{poly}(|I|)$ for minimum makespan scheduling on identical parallel machines in the setup-class model, as well as in the preemptive and splittable setup models.
More precisely, we get a running time (1) in the splittable, and $2^{2^{O(1/\varepsilon \log 1/\varepsilon)}} n^{1+o(1)} m$ in the preemptive model. Note that all three problems are strongly NP-hard, due to trivial reductions from machine scheduling, and hence FPTAS results cannot be hoped for.
Summing up, the main achievement of this work is the development of the module configuration IP and its application in the design of approximation schemes. Up to now, EPTAS or even PTAS results seemed out of reach for the considered problems, and for the preemptive model, we provide the first improvement in 20 years. The simplification techniques developed for the splittable and preemptive model in order to employ the MCIP are original and in the latter case quite sophisticated and therefore interesting by themselves. Furthermore, we expect the MCIP to be applicable to other packing and scheduling problems as well, in particular for variants of machine scheduling and bin packing with additional class dependent constraints. On a more conceptual level, we have presented a first demonstration of the potential of n-fold integer programming in the theory of approximation algorithms and hope to inspire further studies in this direction.
We conclude this paragraph with a more detailed overview of our results and their presentation. For all three EPTAS results, we employ the classical dual approximation framework by Hochbaum and Shmoys [16] to get a guess of the makespan T . This approach is introduced in Sect. 2 together with n-fold IPs and formal definitions of the problems. In the following section, we develop the module configuration IP and argue that it is indeed an n-fold IP. The EPTAS results follow the same basic approach described above for machine scheduling: We find a schedule for a simplified instance via the MCIP and transform it into a schedule for the original one. The simplification steps typically include rounding of the processing and setup times using standard techniques, as well as the removal of certain jobs which later can be reinserted via carefully selected greedy procedures. For the splittable and preemptive model, we additionally have to prove that schedules with a certain simple structure exist, and in the preemptive model, the MCIP has to be extended. In Sect. 4 the basic versions of the EPTAS are presented, and in Sect. 5 some improvements of the running time for the splittable and the setup class model are discussed.

Related work
For an overview on n-fold IPs and their applications, we refer to the following works [12,28,31]. The first FPT algorithm for n-fold IPs was presented by Hemmecke et al. [15] in 2013, and it has a running time with a cubic dependence in n. In 2018, Eisenbrand et al. [11] and independently Koutecký et al. [26] developed algorithms with running times with near quadratic dependence in n and improved dependencies in the parameters. Then, in 2019, a near linear dependence in n was achieved by Jansen et al. [21] as well as Eisenbrand et al. [12]. Finally, in 2021, Cslovjecsek et al. [10] further improved and parallelized the result. For an overview on recent results on n-fold IPs and related topics we refer to [12].
There have been recent applications of n-fold integer programming to scheduling problems in the context of parameterized algorithms: Knop and Koutecký [23] showed, among other things, that the problem of makespan minimization on unrelated parallel machines where the processing times are dependent on both jobs and machines is fixed-parameter tractable with respect to the maximum processing time and the number of distinct machine types. This was generalized to the parameters maximum processing time and rank of the processing time matrix by Chen et al. [7]. Furthermore, Knop et al. [25] provided an improved algorithm for a special type of n-fold IPs, yielding improved running times for several applications of n-fold IPs including results for scheduling problems. In a recent result [24], published after the present work, the configuration IP is strongly generalized. The resulting problem is modeled as an n-fold IP and shown to capture several allocation problems.
There is extensive literature concerning scheduling problems with setup times. We highlight a few closely related results and otherwise refer to the surveys [1-3]. In the following, we use the term α-approximation as an abbreviation for polynomial time algorithms with approximation guarantee α. The setup class model was first considered by Mäcker et al. [29] in the special case that all classes have the same setup time. They designed a 2-approximation and additionally a $(3/2 + \varepsilon)$-approximation for the case that the overall length of the jobs from each class is bounded. Jansen and Land [20] presented a simple 3-approximation with linear running time, a $(2 + \varepsilon)$-approximation, and the aforementioned PTAS for the general setup class model. As indicated before, Chen et al. [6] developed a 5/3-approximation for the splittable model. A generalization of this, in which both setup and processing times are job and machine dependent, has been considered by Correa et al. [8]. They achieve a $(1+\varphi)$-approximation where $\varphi$ denotes the golden ratio, using a newly designed linear programming formulation. Moreover, there are recent results concerning machine scheduling in the splittable model considering the sum of (weighted) completion times as the objective function, e.g., [9,32]. For the preemptive model, a PTAS for the special case that all jobs have the same setup time has been developed by Schuurman and Woeginger [33]. The mentioned $(4/3 + \varepsilon)$-approximation for the general case [33] follows the same approach. Furthermore, a combination of the setup class and the preemptive model has been considered in which the jobs are scheduled preemptively, but the setup times are class dependent. Monma and Potts [30] presented, among other things, a $(2 - 1/(\lfloor m/2 \rfloor + 1))$-approximation for this model, and later Chen [5] achieved improvements for some special cases.

Preliminaries
In the following, we establish some concepts and notations, formally define the considered problems, and outline the dual approximation approach by Hochbaum and Shmoys [16], as well as n-fold integer programs.
For any integer $n$, we denote the set $\{1, \dots, n\}$ by $[n]$; we write $\log(\cdot)$ for the logarithm with base 2; and we will usually assume that some instance $I$ of the problem considered in the respective context is given together with an accuracy parameter $\varepsilon \in (0, 1)$ such that $1/\varepsilon$ is an integer. Furthermore, for any two sets $X$, $Y$, we write $Y^X$ for the set of functions $f : X \to Y$. If $X$ is finite, we say that $Y$ is indexed by $X$ and sometimes denote the function value of $f$ for the argument $x \in X$ by $f_x$.

Problems
For all three of the considered problems, a set $J$ of $n$ jobs with processing times $p_j \in \mathbb{Q}_{>0}$ for each job $j \in J$ and a number of machines $m$ is given. In the preemptive and the splittable model, the input additionally includes a setup time $s_j \in \mathbb{Q}_{>0}$ for each job $j \in J$; while in the setup class model, it includes a number $K$ of setup classes, a setup class $k_j \in [K]$ for each job $j \in J$, as well as setup times $s_k \in \mathbb{Q}_{>0}$ for each class $k \in [K]$.

We take a closer look at the definition of a schedule in the preemptive model. The jobs may be split. Therefore, partition sizes $\kappa : J \to \mathbb{Z}_{>0}$, together with processing time fractions $\lambda_j : [\kappa(j)] \to (0, 1]$ such that $\sum_{k \in [\kappa(j)]} \lambda_j(k) = 1$ have to be found, meaning that job $j$ is split into $\kappa(j)$ many parts and the $k$-th part for $k \in [\kappa(j)]$ has processing time $\lambda_j(k) p_j$. This given, we define $J' = \{(j, k) \mid j \in J, k \in [\kappa(j)]\}$ to be the set of job parts. Now, an assignment $\sigma : J' \to [m]$ along with starting times $\xi : J' \to \mathbb{Q}_{>0}$ has to be determined such that any two job parts assigned to the same machine or belonging to the same job do not overlap. More precisely, we have to assure that for each two distinct job parts $(j, k), (j', k') \in J'$ with $\sigma(j, k) = \sigma(j', k')$ or $j = j'$, we have $\xi(j, k) + s_j + \lambda_j(k) p_j \le \xi(j', k')$ or vice versa. A schedule is given by $(\kappa, \lambda, \sigma, \xi)$, and the makespan can be defined as $C_{\max} = \max_{(j,k) \in J'} (\xi(j, k) + s_j + \lambda_j(k) p_j)$. Note that the variant of the problem in which overlap between a job part and setup of the same job is allowed is equivalent to the one presented above. This was pointed out by Schuurman and Woeginger [33] and can be seen with a simple swapping argument.
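The feasibility conditions of the preemptive model can be checked mechanically. The following Python sketch does so for a schedule given as a flat list of job parts; the tuple encoding, the function name, and the example numbers are assumptions made for illustration, and the sketch admits starting time 0 for convenience.

```python
from fractions import Fraction
from itertools import combinations

def check_preemptive_schedule(parts, p, s, m):
    """Return the makespan if the schedule is feasible, else None.

    parts is a list of tuples (job, fraction, machine, start): one tuple per
    job part (j, k) with lambda_j(k) = fraction, sigma(j, k) = machine and
    xi(j, k) = start. p[j] and s[j] are processing and setup times.
    """
    total = {}
    for job, frac, machine, start in parts:
        if not (0 < frac <= 1) or not (0 <= machine < m) or start < 0:
            return None
        total[job] = total.get(job, 0) + frac
    # Every job has to be split into fractions that sum to exactly 1.
    if set(total) != set(range(len(p))) or any(v != 1 for v in total.values()):
        return None

    def end(part):
        job, frac, _, start = part
        return start + s[job] + frac * p[job]

    # Parts on the same machine or of the same job must not overlap.
    for a, b in combinations(parts, 2):
        if a[2] == b[2] or a[0] == b[0]:
            if not (end(a) <= b[3] or end(b) <= a[3]):
                return None
    return max(end(part) for part in parts)

p, s = [4, 2], [1, 1]
half = Fraction(1, 2)
good = [(0, half, 0, 0), (0, half, 1, 3), (1, Fraction(1), 1, 0)]
bad = [(0, half, 0, 0), (0, half, 1, 2), (1, Fraction(1), 1, 0)]
print(check_preemptive_schedule(good, p, s, 2))
print(check_preemptive_schedule(bad, p, s, 2))
```

Exact rational arithmetic is used so that the test on the fractions summing to 1 is not affected by rounding.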
In the splittable model, it is not necessary to determine starting times for the job parts because, given the assignment $\sigma$, the job parts assigned to each machine can be scheduled as soon as possible in arbitrary order without gaps. Hence, in this case, the output is of the form $(\kappa, \lambda, \sigma)$, and the makespan can be defined as $C_{\max} = \max_{i \in [m]} \sum_{(j,k) \in \sigma^{-1}(i)} (s_j + \lambda_j(k) p_j)$.

Lastly, in the setup class model, the jobs are not split, and the jobs assigned to each machine can be scheduled in batches comprised of the jobs of the same class assigned to the machine without overlaps and gaps. The output is therefore just an assignment $\sigma : J \to [m]$, and the makespan is given by $C_{\max} = \max_{i \in [m]} \big( \sum_{j \in \sigma^{-1}(i)} p_j + \sum_{k \in \{k_j \mid j \in \sigma^{-1}(i)\}} s_k \big)$.

Note that in the preemptive and the setup class model, we can assume that the number of machines is bounded by the number of jobs: If there are more machines than jobs, placing each job on a private machine yields an optimal schedule in both models, and the remaining machines can be ignored. This, however, is not the case in the splittable model, which causes a minor problem in the following.

Dual Approximation
All of the presented algorithms follow the dual approximation framework introduced by Hochbaum and Shmoys [16]: Instead of solving the minimization version of a problem directly, it suffices to find a procedure that for a given bound $T$ on the objective value either correctly reports that there is no solution with value $T$, or returns a solution with value at most $(1 + a\varepsilon)T$ for some constant $a$. If we have some initial upper bound $B$ for the optimal makespan OPT with $B \le b\,\mathrm{OPT}$ for some $b$, we can define a PTAS by trying different values $T$ from the interval $[B/b, B]$ in a binary search fashion, and find a value $T^* \le (1+O(\varepsilon))\mathrm{OPT}$ after $O(\log(b/\varepsilon))$ iterations. Note that for all of the considered problems, constant approximation algorithms are known, and the sum of all processing and setup times is a trivial $m$-approximation. Hence, we always assume that a target makespan $T$ is given. Furthermore, we assume that the setup times and in the preemptive and setup class cases also the processing times are bounded by $T$ because otherwise we can reject $T$ immediately.
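The binary search of the dual approximation framework can be sketched as follows. This is a minimal illustration under stated assumptions: `decide` is a hypothetical stand-in for the decision procedure described above, and the toy oracle at the end simply knows the optimum.

```python
def dual_approximation(decide, B, b, eps):
    """Binary search for a makespan guess T* <= (1 + eps) * OPT.

    decide(T) must return None only if no solution with value T exists and
    otherwise a solution (of value at most (1 + a*eps) * T).
    B is an upper bound on OPT with B <= b * OPT, so OPT lies in [B / b, B].
    """
    lo, hi = B / b, B          # invariant: lo <= OPT and decide(hi) succeeded
    best = decide(hi)
    if best is None:
        raise ValueError("B is not an upper bound on the optimum")
    while hi > (1 + eps) * lo:
        T = (lo + hi) / 2
        solution = decide(T)
        if solution is None:
            lo = T             # rejected: OPT > T
        else:
            hi, best = T, solution
    return hi, best

# Toy oracle (assumption): accepts exactly the guesses T >= OPT.
OPT = 7.3
decide = lambda T: T if T >= OPT else None
T_star, solution = dual_approximation(decide, B=20.0, b=4.0, eps=0.1)
print(T_star)
```

The interval initially has length less than $B$ and the loop stops once its length is at most $\varepsilon \cdot B/b$, so the number of halving steps is $O(\log(b/\varepsilon))$.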

n-fold Integer Programs
We briefly define n-fold integer programs (IPs) following the notation of [15] and [23], and state the main algorithmic result needed in the following. Let $n, r, s, t \in \mathbb{Z}_{>0}$ be integers and $A$ be an integer $((r + ns) \times nt)$-matrix of the following form:

$$A = \begin{pmatrix} A_1 & A_1 & \cdots & A_1 \\ A_2 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_2 \end{pmatrix}$$

The matrix $A$ is the so-called n-fold product of the bimatrix $\binom{A_1}{A_2}$, with $A_1$ an $r \times t$ and $A_2$ an $s \times t$ matrix. Furthermore, let $w, \ell, u \in \mathbb{Z}^{nt}$ and $b \in \mathbb{Z}^{r+ns}$. Then the n-fold integer programming problem is given by:

$$\min \{ w x \mid A x = b,\ \ell \le x \le u,\ x \in \mathbb{Z}^{nt} \}$$

We set $\Delta$ to be the maximum absolute value occurring in $A$. There are several algorithms for solving n-fold IPs. We use the most recent result by Cslovjecsek et al. [10]. The variables $x$ can naturally be partitioned into bricks $x^{(q)}$ of dimension $t$ for each $q \in [n]$ such that $x = (x^{(1)}, \dots, x^{(n)})$. Furthermore, we denote the constraints corresponding to $A_1$ as globally uniform and the ones corresponding to $A_2$ as locally uniform. Hence, $r$ is the number of globally and $s$ the number of locally uniform constraints (ignoring their n-fold duplication), $t$ the brick size and $n$ the brick number.
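The block structure of an n-fold matrix can be made explicit with a few lines of Python. The following sketch builds the n-fold product from two small blocks using plain nested lists; the function name and the example blocks are assumptions for illustration.

```python
def n_fold_matrix(A1, A2, n):
    """Build the n-fold product of the bimatrix (A1 over A2) as nested lists."""
    t = len(A1[0])
    assert all(len(row) == t for row in A1 + A2), "A1 and A2 need t columns"
    # Globally uniform rows: A1 repeated for every brick.
    rows = [list(row) * n for row in A1]
    # Locally uniform rows: A2 on the diagonal, zeros elsewhere.
    for q in range(n):
        for row in A2:
            rows.append([0] * (q * t) + list(row) + [0] * ((n - 1 - q) * t))
    return rows

A1 = [[1, 1]]          # r = 1, t = 2
A2 = [[1, -1]]         # s = 1
A = n_fold_matrix(A1, A2, 3)
for row in A:
    print(row)
```

The result has $r + ns$ rows and $nt$ columns, with the first $r$ rows coupling all bricks and each later group of $s$ rows touching a single brick.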

Module configuration IP
In this section, we state the configuration IP for machine scheduling; introduce a basic version of the module configuration IP (MCIP) that is already sufficiently general to work for both the splittable and setup class model; and lastly show that the configuration IP can be expressed by the MCIP in multiple ways. Before that, however, we formally introduce the concept of configurations.
Given a set of objects $A$, such as jobs, a configuration $C$ of these objects is a vector of multiplicities indexed by the objects, i.e., $C \in \mathbb{Z}_{\ge 0}^{A}$. For given sizes $\Lambda(a)$ of the objects $a \in A$, the size $\Lambda(C)$ of a configuration $C$ is defined as $\sum_{a \in A} C_a \Lambda(a)$. Moreover, for a given upper bound $B$, we define $\mathcal{C}_A(B)$ to be the set of configurations of $A$ that are bounded in size by $B$, that is, $\mathcal{C}_A(B) = \{C \in \mathbb{Z}_{\ge 0}^{A} \mid \Lambda(C) \le B\}$.

Configuration IP
We provide a recollection of the configuration IP for machine scheduling. Let $P$ be the set of distinct processing times for some instance $I$ with multiplicities $n_p$ for each $p \in P$, meaning, $I$ includes exactly $n_p$ jobs with processing time $p$. The size $\Lambda(p)$ of a processing time $p$ is the processing time itself, that is, $\Lambda(p) = p$. Furthermore, let $T$ be a guess of the optimal makespan. The configuration IP for $I$ and $T$ is given by variables $x_C \ge 0$ for each $C \in \mathcal{C}_P(T)$ and the following constraints:

$$\sum_{C \in \mathcal{C}_P(T)} x_C = m \tag{1}$$

$$\sum_{C \in \mathcal{C}_P(T)} C_p x_C = n_p \quad \forall p \in P \tag{2}$$

Due to constraint (1), exactly one configuration is chosen for each machine; while (2) ensures that the correct number of jobs or job sizes is covered.
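For very small instances, the configuration IP can be explored by brute force. The sketch below enumerates all configurations bounded by $T$ and then searches for $m$ configurations that cover the multiplicities exactly. It is an exponential-time illustration only, not the solution method used in this work, and all names are hypothetical.

```python
from itertools import combinations_with_replacement

def configurations(sizes, bound):
    """All multiplicity vectors over `sizes` with total size at most `bound`."""
    if not sizes:
        return [()]
    first, rest = sizes[0], sizes[1:]
    result = []
    for count in range(int(bound // first) + 1):
        for tail in configurations(rest, bound - count * first):
            result.append((count,) + tail)
    return result

def solve_configuration_ip(P, n_p, m, T):
    """Brute force: pick one configuration per machine covering n_p exactly."""
    for choice in combinations_with_replacement(configurations(P, T), m):
        covered = [sum(C[i] for C in choice) for i in range(len(P))]
        if covered == list(n_p):
            return choice
    return None

P, n_p, m = [2, 3], [3, 1], 2          # three jobs of length 2, one of length 3
print(len(configurations(P, 6)))
print(solve_configuration_ip(P, n_p, m, 6))
print(solve_configuration_ip(P, n_p, m, 4))
```

Choosing a multiset of exactly $m$ configurations corresponds to constraint (1), and the comparison with `n_p` to constraint (2); the all-zero configuration models an empty machine.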

Module Configuration IP
Let $\mathcal{B}$ be a set of basic objects (e.g., jobs or setup classes) and let there be $D$ integer values $B_1, \dots, B_D$ for each basic object $B \in \mathcal{B}$ (e.g., processing time or numbers of different kinds of jobs). Our approach is to cover the basic objects with so-called modules and, in turn, cover the modules with configurations. Depending on the context, modules correspond to batches of jobs or job piece sizes together with a setup time and can also encompass additional information like a starting time. Let $\mathcal{M}$ be a set of such modules. In order to cover the basic objects, each module $M \in \mathcal{M}$ also has $D$ integer values $M_1, \dots, M_D$. Furthermore, each module $M$ has a size $\Lambda(M)$ and a set of eligible basic objects $\mathcal{B}(M)$. The latter is needed because not all modules are compatible with all basic objects, e.g., because they do not have the right setup times. The configurations are used to cover the modules; however, it typically does not matter which module exactly is covered, but rather which size the module has. Let $H$ be the set of distinct module sizes, i.e., $H = \{\Lambda(M) \mid M \in \mathcal{M}\}$, and for each module size $h \in H$ let $\mathcal{M}(h)$ be the set of modules with size $h$. We consider the set $\mathcal{C}$ of configurations of module sizes which are bounded in size by a guess of the makespan $T$, i.e., $\mathcal{C} = \mathcal{C}_H(T)$. In the preemptive case, configurations need to additionally encompass information about starting times of modules, and therefore the definition of configurations will be slightly more complicated in that case.
Since we want to choose configurations for each machine, we have variables $x_C$ for each $C \in \mathcal{C}$ and constraints corresponding to (1). Furthermore, we choose modules with variables $y_M$ for each $M \in \mathcal{M}$, and because we want to cover the chosen modules with configurations, we have some analogue of constraint (2), say $\sum_{C \in \mathcal{C}} C_h x_C = \sum_{M \in \mathcal{M}(h)} y_M$ for each module size $h \in H$. However, while a module $M$ may be suitable to cover multiple basic objects, each instance of $M$ should only be used for one of them. Hence, it makes sense to introduce the variables $y_M$ for each basic object, and this is where n-fold IPs come into play. The variables stated so far form a brick of the variables of the n-fold IP, and there is one brick for each basic object, that is, we have, for each $B \in \mathcal{B}$, variables $x_C^{(B)}$ for each $C \in \mathcal{C}$ and $y_M^{(B)}$ for each $M \in \mathcal{M}$. The upper bounds of the variables $y_M^{(B)}$ are set to zero if $B$ is not eligible for $M$; and we set the lower bounds of all variables to zero. Sensible upper bounds for the remaining variables will typically be clear from context. Besides that, the module configuration integer program MCIP (for $\mathcal{B}$, $\mathcal{M}$ and $\mathcal{C}$) is given by:

$$\sum_{B \in \mathcal{B}} \sum_{C \in \mathcal{C}} x_C^{(B)} = m \tag{3}$$

$$\sum_{B \in \mathcal{B}} \sum_{C \in \mathcal{C}} C_h x_C^{(B)} = \sum_{B \in \mathcal{B}} \sum_{M \in \mathcal{M}(h)} y_M^{(B)} \quad \forall h \in H \tag{4}$$

$$\sum_{M \in \mathcal{M}} M_d\, y_M^{(B)} = B_d \quad \forall B \in \mathcal{B},\ d \in [D] \tag{5}$$

It is easy to see that the constraints (3) and (4) are globally uniform. They are the mentioned adaptations of (1) and (2). The constraint (5), on the other hand, is locally uniform and ensures that the basic objects are covered. Note that, while the duplication of the configuration variables does not carry meaning, it also does not upset the model: Consider the modified MCIP that is given by not duplicating the configuration variables. A solution $(\tilde{x}, \tilde{y})$ for this IP gives a solution $(x, y)$ for the MCIP by fixing some basic object $B^*$, setting $x_C^{(B^*)} = \tilde{x}_C$ for each configuration $C$, setting the remaining configuration variables to 0, and copying the remaining variables. Conversely, a solution $(x, y)$ for the MCIP gives a solution $(\tilde{x}, \tilde{y})$ for the modified version by setting $\tilde{x}_C = \sum_{B \in \mathcal{B}} x_C^{(B)}$ for each configuration $C$. Summarizing we get:

Observation 1
The MCIP is an n-fold IP with brick size $t = |\mathcal{M}| + |\mathcal{C}|$, brick number $n = |\mathcal{B}|$, $r = |H| + 1$ globally uniform and $s = D$ locally uniform constraints.
Moreover, in all of the considered applications, we will minimize the overall size of the configurations, i.e., $\sum_{B \in \mathcal{B}} \sum_{C \in \mathcal{C}} \Lambda(C)\, x_C^{(B)}$. This will be required because in the simplification steps of our algorithms some jobs are removed and have to be reinserted later, and we therefore have to make sure that no space is wasted.

First Example
We conclude the section by pointing out several different ways to replace the classical configuration IP for machine scheduling with the MCIP, thereby giving some intuition for the model. The first possibility is to consider the jobs as the basic objects and their processing times as their single value (B = J , D = 1); the modules are the processing times (M = P), and a job is eligible for a module, if its processing time matches; and the configurations are all the configurations bounded in size by T . Another option is to choose the processing times as basic objects keeping all the other definitions essentially like before. Lastly, we could consider the whole set of jobs or the whole set of processing times as a single basic object with D = |P| different values. In this case, we can define the set of modules as the set of configurations of processing times bounded by T .
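The first of these options can be written out for a toy instance. The sketch below builds the two blocks of one brick (columns ordered as all configuration variables followed by all module variables) and checks a hand-made solution against the globally and locally uniform constraints. The helper names, the column order, and the instance are assumptions for illustration.

```python
def configurations(sizes, bound):
    """All multiplicity vectors over `sizes` with total size at most `bound`."""
    if not sizes:
        return [()]
    out = []
    for count in range(int(bound // sizes[0]) + 1):
        for tail in configurations(sizes[1:], bound - count * sizes[0]):
            out.append((count,) + tail)
    return out

def mcip_blocks(module_sizes, module_values, configs, H):
    """Blocks (A1, A2) of one brick; columns are all x_C followed by all y_M."""
    n_c, n_m = len(configs), len(module_sizes)
    A1 = [[1] * n_c + [0] * n_m]                    # one machine per configuration
    for i, h in enumerate(H):                       # modules of size h are covered
        A1.append([C[i] for C in configs]
                  + [-1 if size == h else 0 for size in module_sizes])
    D = len(module_values[0])
    A2 = [[0] * n_c + [M[d] for M in module_values] for d in range(D)]
    return A1, A2

def dot(row, x):
    return sum(a * b for a, b in zip(row, x))

# Jobs are the basic objects, modules are the processing times (D = 1).
jobs, m, T = [2, 3, 2], 2, 5
module_sizes = [2, 3]
module_values = [(2,), (3,)]
H = sorted(set(module_sizes))
configs = configurations(H, T)
A1, A2 = mcip_blocks(module_sizes, module_values, configs, H)

# Hand-made solution: machines get configurations (1, 1) and (1, 0), both
# stored in the brick of job 0; every job picks the module matching its length.
bricks = []
for j, p_j in enumerate(jobs):
    x = [0] * len(configs)
    if j == 0:
        x[configs.index((1, 1))] = 1
        x[configs.index((1, 0))] = 1
    y = [1 if size == p_j else 0 for size in module_sizes]
    bricks.append(x + y)

global_lhs = [sum(dot(row, brick) for brick in bricks) for row in A1]
local_lhs = [[dot(row, brick) for row in A2] for brick in bricks]
print(global_lhs, local_lhs)
```

In this encoding the globally uniform right-hand side is $(m, 0, \dots, 0)$, since the module covering constraint is moved to one side, and the locally uniform right-hand side of a brick is the value vector of its basic object.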

EPTAS results
In this section, we present approximation schemes for each of the three considered problems. Each of the results follows the same approach: The instance is carefully simplified, a schedule for the simplified instance is found using the MCIP, and this schedule is transformed into a schedule for the original instance. The presentation of the result is also similar for each problem: We first discuss how the instance can be sensibly simplified and how a schedule for the simplified instance can be transformed into a schedule for the original one. Next, we discuss how a schedule for the simplified instance can be found using the MCIP, and, lastly, we summarize and analyze the taken steps.
For the sake of clarity, we have given rather formal definitions for the problems at hand in Sect. 2. In the following, however, we will use the terms in a more intuitive fashion for the most part, and we will, for instance, often take a geometric rather than a temporal view on schedules and talk about the length or the space taken up by jobs and setups on machines rather than time. In particular, given a schedule for an instance of any one of the three problems together with an upper bound for the makespan T , the free space with respect to T on a machine is defined as the summed up lengths of time intervals between 0 and T in which the machine is idle. The free space (with respect to T ) is the summed up free space of all the machines. For bounds T and L for the makespan and the free space, respectively, we say that a schedule is a (T , L)-schedule if its makespan is at most T and the free space with respect to T is at least L.
When transforming the instance, we will increase or decrease processing and setup times and fill in or remove extra jobs. Consider a $(T', L')$-schedule where $T'$ and $L'$ denote some arbitrary makespan or free space bounds. If we fill in extra jobs or increase processing or setup times, but can bound the increase on each machine by some bound $b$, we end up with a $(T' + b, L')$-schedule for the transformed instance. In particular, we have the same bound for the free space because we properly increased the makespan bound. If, on the other hand, jobs are removed or setup times decreased, we obviously still have a $(T', L')$-schedule for the transformed instance. This will be used frequently in the following.
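The bookkeeping behind this observation is simple enough to check numerically. The following sketch computes the free space from machine loads; it assumes, for illustration only, that every machine works without gaps from time 0, so its idle time within the bound is determined by its load.

```python
def free_space(loads, T):
    """Summed idle time in [0, T] over all machines, given machine loads.

    Assumes each machine processes without gaps from time 0, so that its idle
    time within [0, T] is max(0, T - load).
    """
    return sum(max(0, T - load) for load in loads)

def is_TL_schedule(loads, T, L):
    """True if the makespan is at most T and the free space is at least L."""
    return max(loads) <= T and free_space(loads, T) >= L

loads = [8, 6, 10]
print(free_space(loads, 10), is_TL_schedule(loads, 10, 5))

# Increase the load of every machine by at most b = 1 and raise the bound to T + b.
b = 1
increased = [9, 7, 11]
print(free_space(increased, 10 + b), is_TL_schedule(increased, 10 + b, 5))
```

Raising the makespan bound by $b$ adds $b$ to the idle time of every machine that stays below the old bound, which compensates for a load increase of at most $b$ on that machine.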

Setup class model
We start with the setup class model. In this case, we can essentially reuse the simplification steps that were developed by Jansen and Land [20] for their PTAS. The main difference between the two procedures is that we solve the simplified instance via the MCIP, while they used a dynamic program. For the sake of self-containment, we include our own simplification steps, but remark that they are strongly inspired by [20]. In Sect. 5, we present a more elaborate rounding procedure resulting in an improved running time.

Table 1 Overview on the job classifications

Simplification of the Instance
In the following, we distinguish big setup jobs $j$ belonging to classes $k$ with setup times $s_k \ge \varepsilon^3 T$ and small setup jobs with $s_k < \varepsilon^3 T$. We denote the corresponding subsets of jobs by $J^{\mathrm{bst}}$ and $J^{\mathrm{sst}}$, respectively. Furthermore, we call a job tiny or small, if its processing time is smaller than $\varepsilon^4 T$ or $\varepsilon T$, respectively, and big or large otherwise. For any given set of jobs $J$, we denote the subset of tiny jobs from $J$ with $J_{\mathrm{tiny}}$ and the small, big, and large jobs analogously (see Table 1 for an overview). We simplify the instance in four steps, aiming for an instance that exclusively includes big jobs with big setup times and additionally only a constant number of distinct processing and setup times. For technical reasons, we assume $\varepsilon \le 1/2$.
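The classification can be summarized in a small Python helper. The label strings and the function name are assumptions for illustration; note that under the thresholds above every tiny job is also small and every large job is also big.

```python
def classify(p_j, s_k, T, eps):
    """Labels of a job with processing time p_j whose class has setup time s_k."""
    labels = {"bst" if s_k >= eps**3 * T else "sst"}   # big or small setup time
    labels.add("tiny" if p_j < eps**4 * T else "big")
    labels.add("small" if p_j < eps * T else "large")
    return labels

# With T = 16 and eps = 1/2 the thresholds are 2 (setup), 1 (tiny) and 8 (small).
print(classify(0.5, 3, 16, 0.5))
print(classify(4, 1, 16, 0.5))
print(classify(8, 2, 16, 0.5))
```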
We proceed with the first simplification step. Let $I_1$ be the instance given by the job set $J \setminus J^{\mathrm{sst}}_{\mathrm{small}}$ and $Q$ the set of setup classes completely contained in $J^{\mathrm{sst}}_{\mathrm{small}}$, i.e., $Q = \{k \mid \forall j \in J : k_j = k \Rightarrow j \in J^{\mathrm{sst}}_{\mathrm{small}}\}$. An obvious lower bound on the space taken up by the jobs from $J^{\mathrm{sst}}_{\mathrm{small}}$ in any schedule is given by $L = \sum_{j \in J^{\mathrm{sst}}_{\mathrm{small}}} p_j + \sum_{k \in Q} s_k$. Note that the instance $I_1$ may include a reduced number $K'$ of setup classes.

Lemma 1 A schedule for $I$ with makespan $T$ induces a $(T, L)$-schedule for $I_1$, that is, a schedule with makespan $T$ and free space at least $L$; and any $(T, L)$-schedule for $I_1$ can be transformed into a schedule for $I$ with makespan at most $(1+\varepsilon)T + \varepsilon T + 2\varepsilon^3 T$.
Proof The first claim is obvious, and we therefore assume that we have a $(T, L)$-schedule for $I_1$. We group the jobs from $J^{\mathrm{sst}}_{\mathrm{small}}$ by setup classes and first consider the groups with summed-up processing time at most $\varepsilon^2 T$. For each of these groups, we check whether the respective setup class contains a large job. If this is the case, we schedule the complete group on a machine on which such a large job is already scheduled, using up free space if possible. Since the large jobs have a length of at least $\varepsilon T$, there are at most $T/(\varepsilon T)$ many large jobs on each machine, and therefore the schedule on the respective machine has length at most $(1+\varepsilon)T$, or there is free space with respect to $T$ left. If, on the other hand, the respective class does not contain a large job and is therefore fully contained in $J^{\mathrm{sst}}_{\mathrm{small}}$, we create a container including the whole class and its setup time. Note that the overall length of the container is at most $(\varepsilon^2 + \varepsilon^3)T \le \varepsilon T$ (using $\varepsilon \le 1/2$). Next, we create a sequence containing the containers and the remaining jobs ordered by setup class. We insert the items from this sequence greedily into the remaining free space in a next-fit fashion, exceeding $T$ on each machine by at most one item from the sequence and thereby creating an error of at most $\varepsilon T$. This can be done because we had a free space of at least $L$, and the inserted objects had an overall length of at most $L$. To make the resulting schedule feasible, we have to insert some setup times. However, because the overall length of the jobs from each class in need of a setup is at least $\varepsilon^2 T$, and the sequence was ordered by classes, there are at most $T/(\varepsilon^2 T) + 2$ distinct classes without a setup time on each machine. Inserting the missing setup times will therefore increase the makespan by at most $(T/(\varepsilon^2 T) + 2)\,\varepsilon^3 T = \varepsilon T + 2\varepsilon^3 T$.
Next, we deal with the remaining (large) jobs with small setup times $j \in J^{\mathrm{sst}}_{\mathrm{large}}$.
Let $I_2$ be the instance we get by increasing the setup times of the classes with small setup times to $\varepsilon^3 T$. We denote the setup time of class $k \in [K']$ for $I_2$ by $s'_k$. Note that there are no small setup jobs in $I_2$.
Proof The first claim is true because in a schedule with makespan at most $T$ there can be at most $T/(\varepsilon T)$ many large jobs on any machine, and the second claim is obvious.
Let $I_3$ be the instance we get by replacing the jobs from $J^{\mathrm{bst}}_{\mathrm{tiny}}$ with placeholders of size $\varepsilon^4 T$. More precisely, we remove $J^{\mathrm{bst}}_{\mathrm{tiny}}$, and, for each class $k \in [K']$, we introduce $\lceil (\sum_{j \in J^{\mathrm{bst}}_{\mathrm{tiny}},\, k_j = k} p_j)/(\varepsilon^4 T) \rceil$ many jobs with processing time $\varepsilon^4 T$ and class $k$. We denote the job set of $I_3$ by $J'$ and the processing time of a job $j \in J'$ by $p'_j$. Note that $I_3$ exclusively contains big jobs with big setup times.
Proof Note that for any $(T', L')$-schedule for $I_2$ or $I_3$, there are at most $T'/(\varepsilon^3 T)$ many distinct big setup classes scheduled on any machine. Hence, when considering such a schedule for $I_2$, we can remove the tiny jobs belonging to $J^{\mathrm{bst}}_{\mathrm{tiny}}$ from the machines and instead fill in the placeholders, such that each machine receives, for each class, at most as much length from that class as was removed, rounded up to the next multiple of $\varepsilon^4 T$. All placeholders can be placed like this, and the makespan is increased by at most $(T'/(\varepsilon^3 T))\,\varepsilon^4 T = \varepsilon T'$. If, on the other hand, we consider such a schedule for $I_3$, we can remove the placeholders and instead fill in the respective tiny jobs, again overfilling by at most one job. This yields a $((1+\varepsilon)T', L')$-schedule for $I_2$ with the same argument.
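The construction of $I_3$ can be sketched as follows. The function name `replace_tiny_jobs` and the representation of jobs as `(processing_time, class)` pairs are assumptions made for this illustration; the input is expected to contain only jobs with big setup times, as is the case for $I_2$.

```python
from collections import defaultdict
from fractions import Fraction
from math import ceil


def replace_tiny_jobs(jobs, setup, eps, T):
    """Replace tiny jobs with big setup times by placeholders of size eps^4 * T.

    jobs:  list of (processing_time, class) pairs (hypothetical representation)
    setup: dict mapping each class to its setup time
    """
    unit = eps**4 * T
    tiny_load = defaultdict(Fraction)  # summed tiny processing time per class
    result = []
    for p, k in jobs:
        if setup[k] >= eps**3 * T and p < unit:
            tiny_load[k] += p          # tiny job with big setup time: removed
        else:
            result.append((p, k))      # every other job is kept unchanged
    for k, load in tiny_load.items():
        # ceil(load / unit) placeholders of length unit cover the removed load
        result.extend([(unit, k)] * ceil(load / unit))
    return result
```

The number of placeholders per class is rounded up, so the total placeholder length of a class exceeds the removed length by less than $\varepsilon^4 T$.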
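Lastly, we perform both a geometric and an arithmetic rounding step for the processing and setup times. The geometric rounding is needed to suitably bound the number of distinct processing and setup times, and due to the arithmetic rounding, we will be able to guarantee integral coefficients in the IP. More precisely, we set $\tilde p_j = (1+\varepsilon)^{\lceil \log_{1+\varepsilon} p'_j/(\varepsilon^4 T) \rceil}\,\varepsilon^4 T$ and $\bar p_j = \lceil \tilde p_j/(\varepsilon^5 T) \rceil\,\varepsilon^5 T$ for each $j \in J'$, as well as, analogously, $\tilde s_k = (1+\varepsilon)^{\lceil \log_{1+\varepsilon} s'_k/(\varepsilon^3 T) \rceil}\,\varepsilon^3 T$ and $\bar s_k = \lceil \tilde s_k/(\varepsilon^5 T) \rceil\,\varepsilon^5 T$ for each setup class $k \in [K']$. The resulting instance is called $I_4$.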
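The two rounding steps for a processing time can be sketched as below. The function names are hypothetical. Instead of evaluating the logarithm numerically, the sketch multiplies by $(1+\varepsilon)$ until the value is reached, which gives the same result for values of at least $\varepsilon^4 T$ and stays exact with `Fraction` inputs.

```python
from fractions import Fraction
from math import ceil


def round_geometric(value, base, eps):
    """Smallest base * (1 + eps)^e with integer e >= 0 that is >= value.

    Assumes value >= base, which holds for big jobs (base = eps^4 * T).
    """
    rounded = base
    while rounded < value:
        rounded *= 1 + eps
    return rounded


def round_arithmetic(value, unit):
    """Round value up to the next multiple of unit."""
    return ceil(value / unit) * unit


def round_processing_time(p, eps, T):
    """Geometric rounding w.r.t. eps^4 * T, then arithmetic w.r.t. eps^5 * T."""
    return round_arithmetic(round_geometric(p, eps**4 * T, eps), eps**5 * T)
```

For example, with $\varepsilon = 1/2$ and $T = 32$, a processing time of 5 is first rounded to $27/4$ and then to 7, which is within the bound $(1+\varepsilon)p + \varepsilon^5 T$.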

Lemma 4 A $(T', L')$-schedule for $I_3$ induces a $((1+3\varepsilon)T', L')$-schedule for $I_4$, and any $(T', L')$-schedule for $I_4$ can be turned into a $(T', L')$-schedule for $I_3$.
Proof For the first claim, we first stretch a given schedule by $(1+\varepsilon)$. This enables us to use the processing and setup times due to the geometric rounding step. Now, using the ones due to the second step increases the schedule by at most $2\varepsilon T'$, because there were at most $T'/(\varepsilon^4 T)$ many big jobs on any machine to begin with. The second claim is obvious.
Based on the rounding steps, we define two makespan bounds $\bar T$ and $\breve T$: Let $\bar T$ be the makespan bound that is obtained from $T$ by the application of the Lemmata 1-4 in sequence. We will find a $(\bar T, L)$-schedule for $I_4$ utilizing the MCIP and afterward apply the Lemmata 1-4 backwards to get a schedule with makespan $\breve T$. Let $P$ and $S$ be the sets of distinct occurring processing and setup times for instance $I_4$. Because of the rounding, the minimum and maximum lengths of the setup and processing times, and $\varepsilon < 1$, we can bound $|P|$ and $|S|$ by $O(1/\varepsilon \cdot \log 1/\varepsilon)$. For the sake of readability, we state the resulting constraints of the MCIP with adapted notation and without duplication of the configuration variables:

Utilization of the MCIP
Note that the coefficients are all integral, and this includes those of the objective function, i.e., $\sum_{C} \Lambda(C)\,x_C$, because of the scaling step.

Lemma 5 With the above definitions, there is a $(\bar T, L)$-schedule for $I_4$ if and only if the MCIP has a solution with objective value at most $m\bar T - L$.
Proof Let there be a $(\bar T, L)$-schedule for $I_4$. Then the schedule on a given machine corresponds to a distinct configuration $C$ that can be determined by counting, for each possible module size $h$, the batches of jobs from the same class whose length together with the setup time adds up to an overall length of $h$. Note that the length of this configuration is equal to the used-up space on that machine. We fix an arbitrary setup class $k$ and set the variables $x^{(k)}_C$ accordingly (and $x^{(k')}_C = 0$ for $k' \ne k$ and $C \in \mathcal{C}$). By this setting, we get an objective value of at most $m\bar T - L$ because there was at least $L$ free space in the schedule. For each class $k$ and module $M$, we count the number of machines on which there are exactly $M_p$ jobs with processing time $p$ from class $k$ for each $p \in P$ and set $y^{(k)}_M$ accordingly. It is easy to see that the constraints are satisfied by these definitions.
Given a solution $(x, y)$ of the MCIP, we define a corresponding schedule: Because of (6), we can match the machines to configurations such that each machine is matched to exactly one configuration. If machine $i$ is matched to $C$, for each module size $h$, we create $C_h$ slots of length $h$ on $i$. Next, we divide the setup classes into batches. For each class $k$ and module $M$, we create $y^{(k)}_M$ batches of jobs from class $k$ with $M_p$ jobs with processing time $p$ for each $p \in P$ and place the batch together with the corresponding setup time into a fitting slot on some machine. Because of (8) and (7), all jobs can be placed by this process. Note that the used space equals the overall size of the configurations, and we therefore have free space of at least $L$.
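The first direction of the proof of Lemma 5 reads a configuration off the schedule of a single machine. A minimal sketch of this step, assuming a hypothetical representation of a machine as a dictionary from setup classes to the processing times of the jobs scheduled there (one batch per class):

```python
from collections import Counter


def configuration_of(machine, setup):
    """Multiset of module sizes h on one machine.

    machine: dict mapping a class to the list of processing times of its jobs
             on this machine (hypothetical representation)
    setup:   dict mapping each class to its setup time
    Each non-empty batch contributes one module of size setup + processing.
    """
    return Counter(setup[k] + sum(ps) for k, ps in machine.items() if ps)


def configuration_size(config):
    """Overall size of a configuration, i.e., the used space on the machine."""
    return sum(h * count for h, count in config.items())
```

In this sketch, the size of the derived configuration equals the load of the machine, which is the quantity summed up by the objective function.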
Result Using the above results, we can formulate and analyze the following procedure: By Theorem 2 and some arithmetic, the MCIP can be solved in time: When building the actual schedule, we iterate through the jobs and machines as indicated in the proof of Lemma 5, yielding the following: The algorithm for the setup class model finds a schedule with makespan $(1 + O(\varepsilon))T$ or correctly determines that there is no schedule with makespan $T$ in

Splittable model
The approximation scheme for the splittable model presented in this section is probably the easiest one discussed in this work. There is, however, one problem concerning this procedure: Its running time is polynomial in the number of machines, which might be exponential in the input size in this case, since the input only includes the number of machines $m$ with encoding length $O(\log m)$. For the other two problems, we can assume that we have at most as many machines as jobs (see Sect. 2), and hence this is not an issue. But in the splittable case, we may have many more machines than jobs. In Sect. 5, we show how this problem can be overcome and further improve the running time.
Simplification of the Instance In this context, the set of big setup jobs $J^{\mathrm{bst}}$ is given by the jobs with setup times at least $\varepsilon T$, and the small setup jobs $J^{\mathrm{sst}}$ are all the others. Let $L = \sum_{j \in J^{\mathrm{sst}}} (s_j + p_j)$. Because every job has to be scheduled and every setup has to be paid at least once, $L$ is a lower bound on the summed-up space due to small setup jobs in any schedule. Let $I_1$ be the instance that we get by removing all the small setup jobs from the given instance $I$.

Lemma 6 A schedule with makespan $T$ for $I$ induces a $(T, L)$-schedule for $I_1$; and any $(T', L)$-schedule for $I_1$ can be transformed into a schedule for $I$ with makespan at most $T' + \varepsilon T$.
Proof The first claim is obvious. Hence, consider a sequence consisting of the jobs from $J^{\mathrm{sst}}$ together with their setup times, where the setup time of a job is the direct predecessor of the job. We insert the setup times and jobs from this sequence greedily into the schedule in a next-fit fashion: Given a machine, we keep inserting the items from the sequence onto the machine at the end of the schedule until the taken-up space reaches $T'$. If the current item does not fit exactly, we cut it such that the used space on the machine is exactly $T'$. Then we continue with the next machine (without the insertion of an additional setup time). We can place the whole sequence like this without exceeding the makespan $T'$, because we have free space of at least $L$, which is the summed-up length of the items in the sequence. Next, we remove each setup time that was placed only partly on a machine together with those that were placed at the end of the schedule. Furthermore, we insert a fitting setup time for the jobs that were scheduled without one, which can happen only once for each machine. This yields a feasible schedule whose makespan is increased by at most $\varepsilon T$.
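The greedy step in the proof of Lemma 6 can be illustrated with the following sketch. It only covers the next-fit insertion with cutting; the subsequent repair (removing cut or trailing setups and inserting one missing setup per machine) is not implemented. The function name and the representation of items as `(label, length)` pairs are assumptions.

```python
def next_fit_split(free, items):
    """Fill the free space of the machines in order, cutting items if needed.

    free:  list with the free space of each machine (makespan bound minus load)
    items: sequence of (label, length) pairs, e.g. setups followed by their jobs
    Returns, per machine, the list of (label, piece_length) pairs placed there.
    """
    placed = [[] for _ in free]
    i = 0
    room = free[0] if free else 0
    for label, length in items:
        rest = length
        while rest > 0:
            # move on to the next machine once the current one is full
            while i < len(free) and room == 0:
                i += 1
                room = free[i] if i < len(free) else 0
            if i == len(free):
                raise ValueError("total free space is smaller than total length")
            part = min(rest, room)  # cut the item at the machine boundary
            placed[i].append((label, part))
            rest -= part
            room -= part
    return placed
```

For free spaces `[3, 2, 4]` and the sequence `s1` (1), `j1` (3), `s2` (1), `j2` (2), job `j1` is cut after two units, and the setup `s2` ends up last on the second machine, which is one of the setups the repair step would remove.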
Next, we round up the processing and setup times of $I_1$ to the next multiple of $\varepsilon^2 T$, that is, for each job $j \in J$, we set $\bar p_j = \lceil p_j/(\varepsilon^2 T) \rceil\,\varepsilon^2 T$ and $\bar s_j = \lceil s_j/(\varepsilon^2 T) \rceil\,\varepsilon^2 T$. We call the resulting instance $I_2$ and denote its job set by $J'$.
Proof Consider a $(T, L)$-schedule for $I_1$. There are at most $1/\varepsilon$ jobs scheduled on each machine since each setup time has a length of at least $\varepsilon T$. On each machine, we extend each occurring setup time and the processing time of each occurring job part by at most $\varepsilon^2 T$ to round it to a multiple of $\varepsilon^2 T$. This step extends the makespan by at most $2\varepsilon T$. Since now each job part is a multiple of $\varepsilon^2 T$, the total processing time of the job is a multiple of $\varepsilon^2 T$ too. However, its overall length might be greater than its rounded processing time, and we simply discard some processing time in this case. The second claim is obvious.
Based on the two Lemmata, we define two makespan bounds $\bar T = (1 + 2\varepsilon)T$ and $\breve T = \bar T + \varepsilon T = (1 + 3\varepsilon)T$. We will use the MCIP to find a $(\bar T, L)$-schedule for $I_2$ in which the length of each job part is a multiple of $\varepsilon^2 T$. Using the two Lemmata, this will yield a schedule with makespan at most $\breve T$ for the original instance $I$. We state the constraints of the MCIP for the above definitions with adapted notation and without duplication of the configuration variables:

Utilization of the MCIP
Note that we additionally minimize the summed-up size of the configurations via the objective function $\sum_{C} \Lambda(C)\,x_C$.
Proof Given such a schedule for $I_2$, the schedule on each machine corresponds to exactly one configuration $C$ that can be derived by counting the job pieces and setup times with the same summed-up length $h$ and setting $C_h$ accordingly. This yields the values for the $x$ variables. The size of the configuration $C$ is equal to the used space on the respective machine. Hence, the objective value is bounded by $m\bar T - L$. Furthermore, for each job $j$ and job part length $q$, we count the number of times a piece of $j$ with length $q$ is scheduled and set $y^{(j)}_{(q, s_j)}$ accordingly. It is easy to see that the constraints are satisfied. Now, let $(x, y)$ be a solution to the MCIP with objective value at most $m\bar T - L$. We use the solution to construct a schedule: For each configuration $C$, we reserve $x_C$ machines. On each of these machines, we create $C_h$ slots of length $h$ for each module size $h \in H$. Note that because of (9), there is the exact right number of machines for this. Next, consider each job $j$ and possible job part length $q$, create $y^{(j)}_{(q, s_j)}$ split pieces of length $q$, and place them together with a setup of $s_j$ into a slot of length $s_j + q$ on any machine. Because of (11), the entire job is split up by this, and because of (10), there are enough slots for all the job pieces. Note that the used space in the created schedule is equal to the objective value of $(x, y)$, and therefore there is at least $L$ free space.
Result Summing up, we can find a schedule of length at most $(1 + 3\varepsilon)T$ or correctly determine that there is no schedule of length $T$ with the following procedure:
Algorithm 2
1. Generate the modified instance $I_2$:
- Remove the small setup jobs.
- Round the setup and processing times of the remaining jobs.
$1/\varepsilon^2$). Hence, the MCIP can be solved in time:
While the first step of the procedure is obviously dominated by the above, this is not the case for the remaining ones. In particular, building the schedule from the IP solution has linear costs in both $n$ and $m$ if the procedure described in the proof of Lemma 8 is realized in a straightforward fashion. Note that the number of machines $m$ could be exponential in the number of jobs, and therefore the described procedure is a PTAS only for the special case of $m = \mathrm{poly}(n)$. However, this limitation can be overcome with a little extra effort, as we discuss in Sect. 5.

Preemptive model
In the preemptive model, we have to actually consider the timeline of the schedule on each machine, instead of just the assignment of the jobs or job pieces, and this causes some difficulties. For instance, we will have to argue that it suffices to look for a schedule with few possible starting points, and we will have to introduce additional constraints in the IP in order to ensure that pieces of the same job do not overlap. Our first step in dealing with this extra difficulty is to introduce some concepts and notation: For a given schedule with a makespan bound $T$, we call a job piece together with its setup a block, and we call the schedule $X$-layered, for some value $X$, if each block starts at a multiple of $X$. Corresponding to this, we call the time in the schedule between two directly succeeding multiples of $X$ a layer and the corresponding time on a single machine a slot. We number the layers bottom to top and identify them with their number, that is, the set of layers is given by $\{\ell \in \mathbb{Z}_{>0} \mid (\ell - 1)X \le T\}$. Note that in an $X$-layered schedule there is at most one block in each slot, and for each layer there can be at most one block of each job present. Furthermore, we slightly alter the definition of free space for $X$-layered schedules: We solely count the space from slots that are completely free. If in such a schedule for each job there is at most one slot occupied by this job but not fully filled, we additionally call the schedule layer-compliant.
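The notions of layers, slots, and the altered free space can be made concrete for a single machine as follows. This is a sketch under assumptions: blocks are given as hypothetical `(start, length)` pairs, $T$ is taken to be a multiple of $X$, and only the slots of the layers $1, \dots, T/X$ are counted.

```python
from fractions import Fraction
from math import ceil


def is_layered(blocks, X):
    """True if every block (start, length) starts at a multiple of X."""
    return all(Fraction(start) % X == 0 for start, _ in blocks)


def layers_used(start, length, X):
    """Layers (numbered from 1) touched by a block; layer l covers [(l-1)X, lX)."""
    first = Fraction(start) // X + 1
    last = ceil(Fraction(start + length) / X)
    return list(range(first, last + 1))


def free_space(blocks, X, T):
    """Free space of one machine: only completely free slots are counted."""
    used = set()
    for start, length in blocks:
        used.update(layers_used(start, length, X))
    total_layers = Fraction(T) // X
    free_slots = [l for l in range(1, total_layers + 1) if l not in used]
    return len(free_slots) * X
```

With $X = 2$, $T = 10$ and blocks `(0, 3)` and `(4, 2)`, the machine is idle for 5 time units, but only the two completely free slots count, giving a free space of 4.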

Simplification of the instance
In the preemptive model, we distinguish big, medium and small setup jobs using two parameters $\delta$ and $\mu$: The big setup jobs $J^{\mathrm{bst}}$ are those with setup time at least $\delta T$, the small setup jobs $J^{\mathrm{sst}}$ have a setup time smaller than $\mu T$, and the medium setup jobs $J^{\mathrm{mst}}$ are the ones in between. We set $\mu = \varepsilon^2 \delta$ and choose $\delta \in \{\varepsilon^1, \dots, \varepsilon^{2/\varepsilon^2}\}$ such that the summed-up processing time together with the summed-up setup time of the medium setup jobs is upper bounded by $m\varepsilon T$, i.e., $\sum_{j \in J^{\mathrm{mst}}} (s_j + p_j) \le m\varepsilon T$. If there is a schedule with makespan $T$, such a choice is possible because of the pigeonhole principle and because the setup time of each job has to occur at least once in any schedule. Similar arguments are widely used, e.g., in the context of geometric packing algorithms. Furthermore, we distinguish the jobs by processing times, calling those with processing time at least $\varepsilon T$ big and the others small. For a given set of jobs $J$, we call the subsets of big or small jobs $J_{\mathrm{big}}$ or $J_{\mathrm{small}}$, respectively. An overview of the job classification is provided in Table 2. We perform three simplification steps, aiming for an instance in which the small and medium setup jobs are big; small setup jobs have setup time 0; and for which an $\varepsilon\delta T$-layered, layer-compliant schedule exists. The rationale behind the above approach will only become clear step by step in the following, and we kindly ask the reader to be patient. In particular, we moved a particularly complicated proof to the end of this part.
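The pigeonhole choice of $\delta$ can be sketched as a simple search. The candidate set used here is an assumption made for the illustration: only odd exponents $\delta = \varepsilon^1, \varepsilon^3, \varepsilon^5, \dots$ up to a caller-supplied `max_exponent` are tried, so that the intervals $[\varepsilon^2\delta T, \delta T)$ of medium setup times are pairwise disjoint. The function name and the representation of jobs as `(setup, processing)` pairs are hypothetical as well.

```python
from fractions import Fraction


def choose_delta(jobs, eps, T, m, max_exponent):
    """Return the first candidate delta whose medium setup jobs are light.

    jobs: list of (setup_time, processing_time) pairs
    A job is a medium setup job for delta if mu*T <= s < delta*T, where
    mu = eps^2 * delta. Returns None if no candidate satisfies the bound.
    """
    for e in range(1, max_exponent + 1, 2):   # assumed candidates: odd exponents
        delta = eps**e
        mu = eps**2 * delta
        load = sum(s + p for s, p in jobs if mu * T <= s < delta * T)
        if load <= m * eps * T:
            return delta
    return None
```

If the search returns `None`, the sketch mirrors the first step of the procedure in the preemptive case, where the lack of a suitable class of medium setup jobs is reported as the absence of a schedule with makespan $T$.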
Let $I_1$ be the instance we get by removing the small jobs with medium setup times $J^{\mathrm{mst}}_{\mathrm{small}}$ from the given instance $I$.

Lemma 9
If there is a schedule with makespan at most $T$ for $I$, then there is also such a schedule for $I_1$; and if there is a schedule with makespan at most $T'$ for $I_1$, then there is a schedule with makespan at most $T' + (\varepsilon + \delta)T$ for $I$.

Proof
The first claim is obvious. For the second, we create a sequence containing the jobs from $J^{\mathrm{mst}}_{\mathrm{small}}$, each directly preceded by its setup time. Recall that the overall length of the objects in this sequence is at most $m\varepsilon T$, and the length of each job is bounded by $\varepsilon T$. We greedily insert the objects from the sequence, considering each machine in turn. On the current machine, we start at time $T' + \delta T$ and keep inserting until $T' + \delta T + \varepsilon T$ is reached. If the current object is a setup time, we discard it and continue with the next machine and object. If, on the other hand, it is a job, we split it such that the remaining space on the current machine can be perfectly filled. We can place all objects like this; however, the first job part placed on a machine might be missing a setup. We can insert the missing setups because they have length at most $\delta T$ and between time $T'$ and $T' + \delta T$ there is free space.
Next, we consider the jobs with small setup times: Let $I_2$ be the instance we get by removing the small jobs with small setup times $J^{\mathrm{sst}}_{\mathrm{small}}$ and setting the setup time of the big jobs with small setup times to zero, i.e., $\bar s_j = 0$ for each $j \in J^{\mathrm{sst}}_{\mathrm{big}}$. Note that in the resulting instance each small job has a big setup time. Furthermore, let $L := \sum_{j \in J^{\mathrm{sst}}_{\mathrm{small}}} (p_j + s_j)$. Then $L$ is an obvious lower bound for the space taken up by the jobs from $J^{\mathrm{sst}}_{\mathrm{small}}$ in any schedule.

Lemma 10
If there is a schedule with makespan at most $T$ for $I_1$, then there is also a $(T, L)$-schedule for $I_2$; and if there is a $\gamma T$-layered $(T', L)$-schedule for $I_2$ with $T'$ a multiple of $\gamma T$, then there is also a schedule with makespan at most $(1 + \gamma^{-1}\mu)T' + (\mu + \varepsilon)T$ for $I_1$.
Proof The first claim is obvious, and for the second consider a $\gamma T$-layered $(T', L)$-schedule for $I_2$. We create a sequence that contains the jobs of $J^{\mathrm{sst}}_{\mathrm{small}}$ and their setups such that each job is directly preceded by its setup. Remember that the remaining space in partly filled slots is not counted as free space. Hence, since the overall length of the objects in the sequence is $L$, there is enough space in the free slots of the schedule to place them. We do so in a greedy fashion, guaranteeing that each job is placed on exactly one machine: We insert the objects from the sequence into the free slots, considering each machine in turn, starting on the current machine from the beginning of the schedule, and moving on towards its end. If an object cannot be fully placed into the current slot, there are two cases: It could be a job or a setup. In the former case, we cut it and continue placing it in the next slot, or, if the current slot was the last one, we place the rest at the end of the schedule. In the latter case, we discard the setup and continue with the next slot and object. The resulting schedule is increased by at most $\varepsilon T$, which is caused by the last job placed on a machine.
To get a proper schedule for $I_1$, we have to insert some setup times, namely for the big jobs with small setup times and for the jobs that were cut in the greedy procedure. We do so by inserting a time window of length $\mu T$ at each multiple of $\gamma T$ and at the end of the original schedule on each machine. By this, the schedule is increased by at most $\gamma^{-1}\mu T' + \mu T$. Since all the setups in question are small and the job parts in need of a setup start at multiples of $\gamma T$ or at the end, we can insert the missing setups. Note that blocks that span over multiple layers are cut by the inserted time windows. This, however, can easily be repaired by moving the cut pieces properly down.
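We continue by rounding the medium and big setup times and all the processing times. In particular, we round the processing times and the big setup times up to the next multiple of $\varepsilon\delta T$ and the medium setup times to the next multiple of $\varepsilon\mu T$, i.e., $\bar p_j = \lceil p_j/(\varepsilon\delta T) \rceil\,\varepsilon\delta T$ for each job $j$, $\bar s_j = \lceil s_j/(\varepsilon\delta T) \rceil\,\varepsilon\delta T$ for each big setup job $j \in J^{\mathrm{bst}}$, and $\bar s_j = \lceil s_j/(\varepsilon\mu T) \rceil\,\varepsilon\mu T$ for each medium setup job $j \in J^{\mathrm{mst}}_{\mathrm{big}}$.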
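The rounding of a single remaining job, combined with the earlier step that sets small setup times to zero, can be sketched as follows. The function name is hypothetical, and the sketch assumes that the job has survived the removal steps, i.e., that a job with a medium or small setup time is big.

```python
from fractions import Fraction
from math import ceil


def round_preemptive(s, p, eps, delta, T):
    """Return the rounded (setup, processing) times of one remaining job.

    Processing times and big setup times are rounded up to multiples of
    eps*delta*T, medium setup times to multiples of eps*mu*T with
    mu = eps^2 * delta, and small setup times are set to zero.
    """
    mu = eps**2 * delta
    coarse = eps * delta * T
    fine = eps * mu * T
    p_rounded = ceil(p / coarse) * coarse
    if s >= delta * T:            # big setup time
        s_rounded = ceil(s / coarse) * coarse
    elif s >= mu * T:             # medium setup time
        s_rounded = ceil(s / fine) * fine
    else:                         # small setup time
        s_rounded = 0
    return s_rounded, p_rounded
```

With $\varepsilon = \delta = 1/2$ and $T = 16$, the coarse grid is 4 and the fine grid is 1, so a job with setup time $5/2$ and processing time 9 is rounded to setup time 3 and processing time 12.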

Lemma 11
If there is a $(T, L)$-schedule for $I_2$, then there is also an $\varepsilon\delta T$-layered, layer-compliant $((1 + 3\varepsilon)T, L)$-schedule for $I_3$; and if there is a $\gamma T$-layered $(T, L)$-schedule for $I_3$, then there is also such a schedule for $I_2$.
While the second claim is easy to see, the proof of the first is rather elaborate and unfortunately a bit tedious. Hence, since we believe Lemma 11 to be fairly plausible by itself, we postpone its proof to the end of the section and proceed discussing its use.
For the big and small setup jobs, both processing and setup times are multiples of $\varepsilon\delta T$. Therefore, the length of each of their blocks in an $\varepsilon\delta T$-layered, layer-compliant schedule is a multiple of $\varepsilon\delta T$. For a medium setup job, on the other hand, we know that the overall length of its blocks has the form $x\varepsilon\delta T + y\varepsilon\mu T$, with non-negative integers $x$ and $y$. In particular, it is a multiple of $\varepsilon\mu T$ because $\varepsilon\delta T = (1/\varepsilon^2)\,\varepsilon\mu T$. In an $\varepsilon\delta T$-layered, layer-compliant schedule, for each medium setup job the length of all but at most one block is a multiple of $\varepsilon\delta T$ and therefore a multiple of $\varepsilon\mu T$. If both the overall length and the lengths of all but one block are multiples of $\varepsilon\mu T$, this is also true for the one remaining block. Hence, we will use the MCIP not to find an $\varepsilon\delta T$-layered, layer-compliant schedule in particular, but an $\varepsilon\delta T$-layered one with block sizes as described above and maximum free space.
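Based on the simplification steps, we define two makespan bounds $\bar T$ and $\breve T$: Let $\bar T$ be the makespan bound we get by the application of the Lemmata 9-11, i.e., $\bar T = (1 + 3\varepsilon)T$. We will use the MCIP to find an $\varepsilon\delta T$-layered $(\bar T, L)$-schedule for $I_3$ and apply the Lemmata 9-11 backwards to get a schedule for $I$ with makespan at most $\breve T = (1 + (\varepsilon\delta)^{-1}\mu)\bar T + (\mu + \varepsilon)T + (\varepsilon + \delta)T$.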

Utilization of the MCIP
Similar to the splittable case, the basic objects are the (big) jobs, i.e., $\mathcal{B} = J_{\mathrm{big}}$, and their single value is their processing time ($D = 1$). The modules, on the other hand, are more complicated, because they additionally need to encode which layers are exactly used and, in case of the medium setup jobs, to which degree the last layer is filled. For the latter, we introduce buffers, representing the unused space in the last layer, and define modules as tuples $(\ell, q, s, b)$ of starting layer, job piece size, setup time, and buffer size. Concerning the small setup modules, note that the small setup jobs have a setup time of 0 and therefore may be covered slot by slot. We establish $\varepsilon\mu T = 1$ via scaling to ensure integral values. A big, medium or small setup job is eligible for a module if the module is also a big, medium or small setup module, respectively, and the setup times fit.
We have to avoid that two modules $M_1, M_2$ whose corresponding time intervals overlap are used to cover the same job or in the same configuration. Such an overlap occurs if there is some layer used by both of them. Hence, for each layer $\ell \in \Xi$, where $\Xi$ denotes the set of layers, we set $\mathcal{M}_\ell \subseteq \mathcal{M}$ to be the set of modules that use layer $\ell$. Furthermore, we partition the modules into a set $\Gamma$ of groups by size and starting layer. The size of a group $G \in \Gamma$ is the size of a module from $G$, i.e., $\Lambda(G) = \Lambda(M)$ for $M \in G$. Unlike before, we consider configurations of module groups rather than module sizes. More precisely, the set of configurations $\mathcal{C}$ is given by the configurations of groups such that for each layer at most one group using this layer is chosen, i.e., $\mathcal{C} = \{C \in \mathbb{Z}^{\Gamma}_{\ge 0} \mid \forall \ell \in \Xi : \sum_{G \subseteq \mathcal{M}_\ell} C_G \le 1\}$. With this definition, we prevent overlap conflicts on the machines. Note that unlike in the cases considered so far, the size of a configuration does not correspond to a makespan in the schedule, but to used space, and the makespan bound is realized in the definition of the modules instead of in the definition of the configurations. To also avoid conflicts for the jobs, we extend the basic MCIP with additional locally uniform constraints. In particular, the constraints of the extended MCIP for the above definitions with adapted notation and without duplication of the configuration variables are given by:
Like in the first two cases, we minimize the summed-up size of the configurations via the objective function $\sum_{C} \Lambda(C)\,x_C$. Note that in this case the size of a configuration does not have to equal its height. It is easy to see that the last constraint is indeed locally uniform. However, since we have an inequality instead of an equality, we have to introduce $|\Xi|$ slack variables in each brick, yielding:

Observation 2
The MCIP extended like above is an n-fold IP with brick-size $t = |\mathcal{M}| + |\mathcal{C}| + |\Xi|$, brick number $n = |J|$, $r = |\Gamma| + 1$ globally uniform and $s = D + |\Xi|$ locally uniform constraints.
Proof We first consider such a schedule for $I_3$. For each machine, we can derive a configuration that is given by the starting layers of the blocks together with the summed-up length of the slots the respective block is scheduled in. The size of the configuration $C$ is equal to the used space on the respective machine. Hence, we can fix some arbitrary job $j$ and set $x^{(j)}_C$ to the number of machines corresponding to $C$ (and $x^{(j')}_C = 0$ for $j' \ne j$). Keeping in mind that in an $\varepsilon\delta T$-layered schedule the free space is given by the free slots, the above definition yields an objective value bounded by $m\bar T - L$ because there was free space of at least $L$. Next, we consider the module variables for each job $j$ in turn: If $j$ is a small setup job, we set $y^{(j)}_{(\ell, \varepsilon\delta T, 0, 0)}$ to 1 if $j$ occurs in layer $\ell$ and to 0 otherwise. Now, let $j$ be a big setup job. For each of its blocks, we set $y^{(j)}_{(\ell, z - s_j, s_j, 0)} = 1$, where $\ell$ is the starting layer and $z$ the length of the block. The remaining variables are set to 0. Lastly, let $j$ be a medium setup job. For each of its blocks, we set $y^{(j)}_{(\ell, z - s_j, s_j, b)} = 1$, where $\ell$ is the starting layer of the block, $z$ its length and $b = \lceil z/(\varepsilon\delta T) \rceil\,\varepsilon\delta T - z$. Again, the remaining variables are set to 0. It is easy to verify that all constraints are satisfied by this solution.
If, on the other hand, we have a solution $(x, y)$ to the MCIP with objective value at most $m\bar T - L$, we reserve $\sum_j x^{(j)}_C$ machines for each configuration $C$. There are enough machines to do this because of (12). On each of these machines, we reserve space: For each $G \in \Gamma$, we create an allocated space of length $\Lambda(G)$ starting from the starting layer of $G$ if $C_G = 1$. Let $j$ be a job and $\ell$ be a layer. If $j$ has a small setup time, the variable $y^{(j)}_{(\ell, \varepsilon\delta T, 0, 0)}$ may have the value 0 or 1. In the latter case, we create a piece of length $\varepsilon\delta T$ and place it into an allocated space of length $\varepsilon\delta T$ in layer $\ell$. If, on the other hand, $j$ is a big or medium setup job, we consider each possible job part length $q \in Q^{\mathrm{bst}}$ or $q \in Q^{\mathrm{mst}}$, respectively, create $y^{(j)}_{(\ell, q, s_j, 0)}$ or $y^{(j)}_{(\ell, q, s_j, b)}$ (with $b = \lceil (q + s_j)/(\varepsilon\delta T) \rceil\,\varepsilon\delta T - (q + s_j)$) pieces of length $q$, and place them together with their setup time into fitting allocated spaces starting in layer $\ell$. Because of (14), the entire job is split up by this, and because of (13), there are enough allocated spaces for all the job pieces. The makespan bound is ensured by the definition of the modules, and overlaps are avoided due to the definition of the configurations and (15). Furthermore, the used slots have an overall length equal to the objective value of $(x, y)$, and therefore there is at least $L$ free space.
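The overlap condition in the definition of the configurations (at most one chosen group per layer) can be checked as in the following sketch. It rests on an assumption that is not stated explicitly in the text above: a group with starting layer $\ell$ and size $\Lambda$ is taken to occupy the $\lceil \Lambda / X \rceil$ consecutive layers beginning with $\ell$, where $X = \varepsilon\delta T$ is the layer height. The function name and the representation of groups as `(start_layer, size)` pairs are hypothetical.

```python
from fractions import Fraction
from math import ceil


def is_configuration(groups, X):
    """Check that no layer is used by more than one chosen group.

    groups: iterable of (start_layer, size) pairs, one entry per chosen group
    X:      layer height
    """
    seen = set()
    for start_layer, size in groups:
        height = ceil(Fraction(size) / X)   # assumed number of occupied layers
        layers = set(range(start_layer, start_layer + height))
        if seen & layers:                   # some layer would be used twice
            return False
        seen |= layers
    return True
```

Choosing the same group twice is rejected as well, since both copies use the same layers.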

Result
Summing up the above considerations, we get:

Algorithm 3
1. Determine a suitable class of medium setup jobs. If there is no such class, report that there is no schedule with makespan $T$ and terminate the procedure.
2. Generate the modified instance $I_3$:
- Remove the small jobs with medium setup times.
- Remove the small jobs with small setup times, and decrease the setup time of big jobs with small setup time to 0.
- Round the big processing times, as well as the medium, and the big setup times.
3. Build and solve the MCIP for $I_3$.
4. If the MCIP is infeasible, or the objective value greater than $m\bar T - L$, report that $I$ has no solution with makespan $T$.
5. Otherwise build the $\varepsilon\delta T$-layered schedule with a makespan of at most $\bar T$ and a free space of at least $L$ for $I_3$.
6. Transform the schedule into a schedule for $I$ with makespan at most $\breve T$:
- Use the prerounding processing and setup times.
- Insert the small jobs with small setup times into the free slots and insert the setup times of the big jobs with small setup times.
- Insert the small jobs with medium setup times.

Proof of Lemma 11
We divide the proof into three steps, which can be summarized as follows:
1. We transform a $(T, L)$-schedule for $I_2$ into a $((1 + 3\varepsilon)T, L)$-schedule for $I_3$ in which the big setup jobs are already properly placed inside the layers.
2. We construct a flow network with integer capacities and a maximum flow based on the placement of the remaining jobs in the layers.
3. Using flow integrality and careful repacking, we transform the schedule into an $\varepsilon\delta T$-layered, layer-compliant schedule.
More precisely, the above transformation steps will produce an $\varepsilon\delta T$-layered, layer-compliant $((1 + 3\varepsilon)T, L)$-schedule with the additional properties that too much processing time may be inserted for some jobs or setup times are produced that are not followed by the corresponding job pieces. Note that this does not cause any problems: We can simply remove the extra setups and processing time pieces. For the medium setup jobs, this results in a placement with at most one used slot that is not fully filled, as required in a layer-compliant schedule.
Step 1 Remember that a block is a job piece together with its setup time placed in a given schedule. Consider a $(T, L)$-schedule for $I_2$ and suppose that for each block in the schedule there is a container perfectly encompassing it. Now, we stretch the entire schedule by a factor of $(1 + 3\varepsilon)$, and in this process we stretch and move the containers correspondingly. The blocks are not stretched but moved in order to stay in their container, and we assume that they are positioned at the bottom, that is, at the beginning of the container. Note that we could move each block inside its respective container without creating conflicts with other blocks belonging to the same job. In the following, we use the extra space to modify the schedule. Similar techniques are widely used in the context of geometric packing algorithms. Let $j$ be a big setup job. In each container containing a block belonging to $j$, there is a free space of at least $3\varepsilon\delta T$ because the setup time of $j$ is at least $\delta T$ and therefore the container had at least that length before the stretching. Hence, we have enough space to perform the following two steps. We move the block up by at most $\varepsilon\delta T$ such that it starts at a multiple of $\varepsilon\delta T$. Next, we enlarge the setup time and the processing time by at most $\varepsilon\delta T$ such that both are multiples of $\varepsilon\delta T$. Now the setup time is equal to the rounded setup time, while the processing time might be bigger because we performed this step for each piece of the job. We outline the procedure in Fig. 2.
We continue with the small setup jobs. These jobs are big, and therefore for each of them there is a total free space of at least $3\varepsilon^2 T$ in the containers belonging to the respective job, which is more than enough to enlarge some of the pieces such that their overall length matches the rounded processing time.
Lastly, we consider the medium setup jobs. These jobs are big as well and we could apply the same argument as above, but we need to be a little more careful in order to additionally realize the rounding of the setup times and an additional technical step that we need in the following. Fix a medium setup job $j$ and a container filled with a block belonging to $j$. Since the setup time has a length of at least $\mu T$, the part of the container filled with it was increased by at least $3\varepsilon\mu T$. Hence, we can enlarge the setup time to the rounded setup time without using up space in the container that was created due to the processing time part. We do this for all blocks belonging to medium setup jobs. The extra space in the containers of a medium setup job due to the processing time parts is still at least $3\varepsilon^2 T \ge 3\varepsilon\delta T$. For each medium setup job $j$, we spend at most $\varepsilon\delta T$ of this space to enlarge its processing time to its rounded size and again at most $\varepsilon\delta T$ to create a little bit of extra processing time in the containers belonging to $j$. The size of this extra processing time is bounded by $\varepsilon\delta T$ and chosen in such a way that the overall length of all blocks belonging to $j$ in the schedule is also a multiple of $\varepsilon\delta T$. Because of the rounding, the length of the added extra processing time for each $j$ is a multiple of $\varepsilon\mu T$. The purpose of the extra processing time is to ensure integrality in the flow network, which is constructed in the next step.
Note that the free space that was available in the original schedule was not used in the above steps; in fact, it was even increased by the stretching. Hence, we have created a $((1 + 3\varepsilon)T, L)$-schedule for $I_3$ (or a slightly modified version thereof), and the big setup jobs are already well-behaved with respect to the $\varepsilon\delta T$-layers, that is, they start at multiples of $\varepsilon\delta T$ and fully fill the slots they are scheduled in.
Step 2 Note that for each job $j$ and layer $\ell \in \Xi$, the overall length $q_{j,\ell}$ of job and setup pieces belonging to $j$ and placed in $\ell$ is bounded by $\varepsilon\delta T$. We say that $j$ is fully, or partially, or not scheduled in layer $\ell$ if $q_{j,\ell} = \varepsilon\delta T$, or $q_{j,\ell} \in (0, \varepsilon\delta T)$, or $q_{j,\ell} = 0$, respectively. Let $X_j$ be the set of layers in which $j$ is scheduled partially and $Y_\ell$ the set of (medium or small setup) jobs partially scheduled in $\ell$. Then $a_j = \sum_{\ell \in X_j} q_{j,\ell}$ is a multiple of $\varepsilon\delta T$, and we set $n_j = a_j/(\varepsilon\delta T)$. Furthermore, let $b_\ell = \sum_{j \in Y_\ell} q_{j,\ell}$ and $k_\ell = \lceil b_\ell/(\varepsilon\delta T) \rceil$.
Our flow network has the following structure: There is a node $v_j$ for each medium or small setup job $j$, a node $u_\ell$ for each layer $\ell$, as well as a source $\alpha$ and a sink $\omega$. The source node is connected to the job nodes via edges $(\alpha, v_j)$ with capacity $n_j$, and the layer nodes are connected to the sink via edges $(u_\ell, \omega)$ with capacity $k_\ell$. Lastly, there are edges $(v_j, u_\ell)$ between job and layer nodes with capacity 1 if $j$ is partially scheduled in layer $\ell$ and 0 otherwise. In Fig. 3, a sketch of the network is given.
The schedule can be used to define a flow $f$ with value $\sum_j n_j$ in the network by setting $f(\alpha, v_j) = n_j$, $f(u_\ell, \omega) = b_\ell/(\varepsilon\delta T)$, and $f(v_j, u_\ell) = q_{j,\ell}/(\varepsilon\delta T)$. It is easy to verify that $f$ is a maximum flow, since it saturates every edge leaving the source, and because all capacities in the flow network are integral, we can find another maximum flow $f'$ with integral values.
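The construction of the network and the computation of an integral maximum flow can be illustrated by the following sketch. It is written under several assumptions that are not part of the text: the function name `integral_flow`, the input format (a dictionary mapping pairs of job and layer to the length placed there), and the choice of a simple breadth-first augmenting path method are illustrative only, and `grid` stands for $\varepsilon\delta T$.

```python
from collections import defaultdict, deque
from fractions import Fraction
from math import ceil


def integral_flow(q, grid):
    """q maps (job, layer) to the length of the job placed in that layer.

    Returns the (job, layer) pairs carrying one unit of an integral
    maximum flow, together with the value of that flow.
    """
    partial = {jl: v for jl, v in q.items() if 0 < v < grid}
    a = defaultdict(Fraction)  # a_j: partially scheduled length per job
    b = defaultdict(Fraction)  # b_l: partially scheduled length per layer
    for (j, l), v in partial.items():
        a[j] += v
        b[l] += v
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    for j, v in a.items():
        assert v % grid == 0, "a_j is assumed to be a multiple of the grid"
        cap["src"][("job", j)] = int(v / grid)       # capacity n_j
    for l, v in b.items():
        cap[("layer", l)]["sink"] = ceil(v / grid)   # capacity k_l
    for (j, l) in partial:
        cap[("job", j)][("layer", l)] = 1

    value = 0
    while True:
        # breadth-first search for an augmenting path in the residual graph
        parent = {"src": None}
        queue = deque(["src"])
        while queue and "sink" not in parent:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            break
        path, v = [], "sink"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        # capacities are integers, so every augmentation is integral
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        value += push
    assigned = {(j, l) for (j, l) in partial
                if cap[("job", j)][("layer", l)] == 0}
    return assigned, value
```

Because every augmentation pushes an integer amount, the resulting flow is integral, which is the property used in Step 3; any other maximum flow algorithm with this property would serve equally well.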
Step 3 We start by introducing some notation and a basic operation for the transformation of the schedule: Given two machines $i$ and $i'$ and a time $t$, a machine swap between $i$ and $i'$ at moment $t$ produces a schedule in which everything that was scheduled on $i$ from $t$ on is now scheduled on $i'$ and vice versa. If on both machines there is either nothing scheduled at $t$, or blocks are starting or ending at $t$, the resulting schedule is still feasible. Moreover, if there is a block starting at $t$ on one of the machines and another one belonging to the same job ending at $t$ on the other, we can merge the two blocks and transform the setup time of the first into processing time. We assume in the following that we always merge if this is possible when performing a machine swap. Remember that by definition blocks belonging to the same job cannot overlap. However, if there was overlap, it could be eliminated using machine swaps [33].
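A machine swap can be illustrated by the following sketch. The representation of a schedule as a dictionary from machines to lists of blocks `(job, start, end)` and the function name `machine_swap` are assumptions made for this example; blocks are assumed to have positive length, and the merging of blocks of the same job is omitted.

```python
def machine_swap(schedule, i1, i2, t):
    """Exchange everything scheduled from time t on between machines i1 and i2.

    schedule maps a machine to a list of blocks (job, start, end).
    The swap is refused if a block on either machine strictly contains t,
    because the result would then not be a feasible schedule.
    """
    for i in (i1, i2):
        if any(start < t < end for _, start, end in schedule[i]):
            raise ValueError(f"a block on machine {i} crosses time {t}")
    early = {i: [blk for blk in schedule[i] if blk[2] <= t] for i in (i1, i2)}
    late = {i: [blk for blk in schedule[i] if blk[1] >= t] for i in (i1, i2)}
    result = dict(schedule)
    result[i1] = early[i1] + late[i2]
    result[i2] = early[i2] + late[i1]
    return result
```

In the usage below, the two pieces of job `a` end up adjacent on machine 0, which is exactly the situation in which the two blocks could subsequently be merged.

```python
before = {0: [("a", 0, 2), ("b", 2, 5)], 1: [("c", 0, 2), ("a", 2, 4)]}
after = machine_swap(before, 0, 1, 2)
# after[0] == [("a", 0, 2), ("a", 2, 4)]
```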
If a given slot only contains pieces of jobs that are partially scheduled in the layer, we call the slot usable. Furthermore, we say that a job $j$ is flow assigned to layer $\ell$ if $f'(v_j, u_\ell) = 1$. In the following, we will iterate through the layers, create as many usable slots as possible, reserve them for flow assigned jobs, and fill them with processing and setup time of the corresponding jobs later on. To do so, we have to distinguish different types of blocks belonging to jobs that are partially placed in a given layer: inner blocks, which lie completely inside the layer and touch at most one of its borders; upper cross-over blocks, which start inside the layer and end above it; and lower cross-over blocks, which start below the layer and end inside it. When manipulating the schedule layer by layer, the cross-over blocks obviously can cause problems. To deal with this, we will need additional concepts: A repair piece for a given block is a piece of setup time of length less than $\varepsilon\delta T$, with the property that the block and the repair piece together make up exactly one setup of the respective job. Hence, if a repair piece is given for a block, the block is comprised completely of setup time. Moreover, we say that a slot reserved for a job $j$ has a dedicated setup if there is a block of $j$ including a full setup starting or ending inside the slot.
In the following, we give a detailed description of the transformation procedure followed by a high-level summary. The procedure runs through two phases. In the first phase, the layers are transformed one after another from bottom to top. After a layer is transformed, the following invariants will always hold:
1. A scheduled block either includes a full setup or has a repair piece. In the latter case, it was an upper cross-over block in a previous iteration.
2. Reserved slots that are not full have a dedicated setup.
Note that the invariants are trivially fulfilled in the beginning. During the first phase, we remove some job and setup parts from the schedule that are reinserted into the reserved slots in the second phase. Let $\ell \in \Xi$ denote the current layer.
Fig. 4 The rectangles represent blocks, the hatched parts the setup times, and the dashed lines layer borders. The push and cut step is performed on two blocks. For one of the two, a repair piece is created.
In the first step, our goal is to ensure that jobs that are fully scheduled in $\ell$ occupy exactly one slot, thereby creating as many usable slots as possible. Let $j$ be a job that is fully scheduled in layer $\ell$. If there is a block belonging to $j$ and ending inside the layer at time $t$, there is another block belonging to $j$ and starting at $t$ because $j$ is fully scheduled in $\ell$ and there are no overlaps. Hence, we can perform a machine swap at time $t$ between the two machines the blocks are scheduled on. We do so for each job fully scheduled in the layer and each corresponding pair of blocks. After this step, there are at least $k_\ell$ usable slots and at most $k_\ell$ flow assigned jobs in layer $\ell$.
Next, we consider upper cross-over blocks of jobs that are partially scheduled in the layer but are not flow assigned to it. These are the blocks that cause the most problems, and we perform a so-called push and cut step (see Fig. 4) for each of them: If $q$ is the length of the part of the block lying in $\ell$, we cut away the upper part of the block of length $q$ and move the remainder up by $q$. If the piece we cut away contains some setup time, we create a repair piece for the block out of this setup time. The processing time part of the piece, on the other hand, is removed. Note that this step preserves the first invariant. The repair piece is needed in the case that the job corresponding to the respective block is flow assigned to the layer in which the block ends.
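The push and cut step for a single block can be sketched as follows. The sketch assumes a block that consists of its setup time followed by its processing time and that does not yet have a repair piece; the function name `push_and_cut` and the returned tuple layout are hypothetical.

```python
def push_and_cut(start, setup, proc, layer_top):
    """Push and cut for an upper cross-over block (illustrative sketch).

    The block occupies [start, start + setup + proc) with
    start < layer_top < start + setup + proc.
    Returns (new_start, new_setup, new_proc, repair_piece, removed_proc).
    """
    end = start + setup + proc
    if not start < layer_top < end:
        raise ValueError("not an upper cross-over block of this layer")
    q = layer_top - start          # length of the part lying in the layer
    cut_proc = min(q, proc)        # the upper part is processing time first
    cut_setup = q - cut_proc       # any remainder of the cut is setup time
    # the remaining block is moved up by q, so it now starts at the layer border
    return layer_top, setup - cut_setup, proc - cut_proc, cut_setup, cut_proc
```

In the example call `push_and_cut(3, 2, 1, 5)`, the cut has length 2 and removes the whole processing time plus one unit of setup, so the remaining block consists of setup time only and, together with its repair piece of length 1, makes up exactly one setup.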
We now remove all inner blocks from the layer as well as the parts of the upper and lower cross-over blocks that lie in the layer. After this, all usable slots are completely free. Note, however, that the first invariant might be breached by this.
Next, we arbitrarily reserve usable slots for jobs flow assigned to the layer. For this, note that due to the definition of the flow network, there are at most $k_\ell$ jobs flow assigned to the layer and there are at least as many usable slots, as noted above. This step might breach the second invariant as well. Using machine swaps at the upper and lower border of the layer, we then ensure that the upper and lower cross-over blocks of the jobs flow assigned to the layer lie on the same machine as the reserved slot. Note that for each job there can be at most one upper and at most one lower cross-over block in the layer.
To restore the invariants, we perform the following repair steps for each job $j$ flow assigned to the layer:
Case 1 If there is an upper cross-over block for $j$ or a lower cross-over block without a repair piece, we reinsert the removed part (or parts) at the end or beginning of the slot, respectively. This provides a dedicated setup for the job, and furthermore the first invariant once again holds for the respective cross-over blocks.
Case 2 If there is neither an upper nor a lower cross-over block for $j$, there is an inner block belonging to $j$. This has to be the case because otherwise the capacity in the flow network between $v_j$ and $u_\ell$ is 0, and $j$ could not have been flow assigned to $\ell$. Moreover, this inner block contains a full setup, and we can place it in the beginning of the slot, thus providing the dedicated setup. The invariants are both restored.
Case 3 The last possibility is that there is no upper cross-over block but a lower cross-over block with a repair piece. In this case, the removed part of the block is fully comprised of setup and we reinsert it in the beginning of the reserved slot. Furthermore, we insert as much setup of the repair piece as possible. If the repair piece is not used up, we now consider the remainder as the new repair piece of the block. Hence, the first invariant holds, and since the slot is full in this case, the second one holds as well. If, on the other hand, the full repair piece is inserted, we thereby provide a dedicated setup for the slot and the block once again contains a full setup. In this case, the block does not have a repair piece anymore.
After the first phase is finished, we have to deal with the removed pieces in the second one. The overall length of the reserved slots for a job $j$ equals the overall length $a_j$ of its setup and job pieces from layers in which $j$ was partially scheduled. Since we did not create or destroy any job piece, we can place the removed pieces corresponding to job $j$ into the remaining free space of the slots reserved for $j$, and we do so after transforming them completely into processing time. Because of the second invariant, there is a dedicated setup in each slot; however, it may be positioned directly above the newly inserted processing time. This can be fixed by switching the processing time with the top part of the respective setup time. Furthermore, there may be some blocks that still have a repair piece. We may remove these blocks together with their repair pieces.
Lastly, all remaining usable slots are completely free at the end of this procedure, and since the others are full, they have an overall size of at least L. We conclude the proof of Lemma 11 with an overview of the transformation procedure.

Algorithm 4
Phase 1: For each layer $\ell \in \Xi$, considered bottom to top, perform the following steps:
1. Use machine swaps to ensure that jobs fully scheduled in $\ell$ occupy exactly one slot.
2. For each upper cross-over block of a job partially scheduled but not flow assigned to $\ell$, perform a push and cut step.
3. Remove inner blocks and parts of cross-over blocks that lie in $\ell$.
4. Reserve usable slots for jobs flow assigned to the layer.
5. Use machine swaps to ensure that cross-over blocks of flow assigned jobs lie on the same machine as the reserved slot.
6. For each job $j$ flow assigned to the layer, perform one of the repair steps.
Phase 2:
1. Transform all removed pieces into processing time and insert the removed pieces into the reserved slots.
2. If processing time has been inserted ahead of the dedicated setup of the slot, reschedule properly.
3. Remove blocks that still have a repair piece.

Improvements of the running time
In this section, we revisit the splittable and the setup class model. For the former, we address the dependence of the running time on the number of machines $m$, and for both, we present an improved rounding procedure yielding a better running time.

Splittable model: machine dependence
In the splittable model, the number of machines $m$ may be super-polynomial in the input size because it is not bounded by the number of jobs $n$. Hence, we need to be careful already when defining the schedule in order to get a polynomially bounded output. We say a machine is composite if it contains more than one job, and we say it is plain if it contains at most one job. For a schedule with makespan $T$, we call a machine trivial if it is plain and has load $T$ or if it is empty, and nontrivial otherwise. We say a schedule with makespan $T$ is simple if the number of nontrivial machines is bounded by $\binom{n}{2}$.

Lemma 13
If there is a schedule with makespan $T$ for $I$, there is also a simple schedule with makespan at most $T$.
Proof Let there be a schedule with makespan $T$ for $I$. For the first step, let us assume there are more than $\binom{n}{2}$ composite machines. In this case, there exist two distinct machines $i_1$ and $i_2$ and two distinct jobs $j_1$ and $j_2$ such that both machines contain parts of both jobs, since there are at most $\binom{n}{2}$ different pairs of jobs. For $x \in \{j_1, j_2\}$ and $y \in \{i_1, i_2\}$, let $t(x, y)$ be the processing time combined with the setup time of job $x$ on machine $y$. W.l.o.g., let $t(j_1, i_1)$ be the smallest of the four values. We swap this job part and its setup time with some of the processing time of job $j_2$ on machine $i_2$. If the processing time of $j_2$ on $i_2$ is smaller than $t(j_1, i_1)$, there is no processing time of $j_2$ on $i_2$ left and we can discard the corresponding setup time. Afterwards, the makespan has not increased and at least one machine processes one job less. We can repeat this step iteratively until there are at most $\binom{n}{2}$ machines containing more than one job.
In the second step, we shift processing time from the composite machines to the plain ones. We do this for each job until it is either not contained on a composite machine or each plain machine containing this job has load $T$. If the job is no longer contained on a composite machine, we shift the processing time of the job such that all except one of the machines containing this job have load $T$. Since this job does not appear on any composite machine, the number of such machines can in this case be bounded by $\binom{n-1}{2}$ by repeating the first step. Therefore, the number of nontrivial machines is bounded by $\binom{n-i}{2} + i \le \binom{n}{2}$ for some $i \in \{0, \dots, n\}$.
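The pigeonhole argument in the first step of the proof can be made concrete with a short sketch that searches for two machines sharing a pair of jobs. The function name `shared_pair` and the input format (a dictionary from machines to the set of jobs that have a part on them) are assumptions made for this illustration.

```python
from itertools import combinations


def shared_pair(machines):
    """Find two distinct machines that both contain parts of the same two jobs.

    machines maps a machine to the set of jobs with a part on it.
    Returns (machine_1, machine_2, (job_1, job_2)) or None if no pair of
    jobs occurs together on two different machines.
    """
    seen = {}  # pair of jobs -> first machine on which the pair was found
    for machine, jobs in machines.items():
        for pair in combinations(sorted(jobs), 2):
            if pair in seen:
                return seen[pair], machine, pair
            seen[pair] = machine
    return None
```

With $n = 3$ jobs there are only three pairs of jobs, so four composite machines necessarily yield a result, whereas three composite machines with pairwise different job pairs do not.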
For a simple schedule, a polynomial representation of the solution is possible: For each job, we state the number of trivial machines containing this job, or we fix a first and last trivial machine belonging to this job. This enables a polynomial encoding length of the output, given that the remaining parts of the jobs are not fragmented into too many parts, which can be guaranteed using the results of Sect. 4.
To guarantee that the MCIP finds a simple solution, we need to modify it a little. We have to ensure that nontrivial configurations are not used too often. Let $\mathcal{C}' \subseteq \mathcal{C}$ be the set of nontrivial configurations, i.e., the set of configurations containing more than one module or one module with size smaller than $T$. We add the following globally uniform constraint to the MCIP, where $x_C$ denotes the number of times configuration $C$ is used: $\sum_{C \in \mathcal{C}'} x_C \le \binom{n}{2}$. Since this is an inequality, we have to introduce a slack variable, increasing the brick size by one. However, this does not change the running time.
For each job in $J^{\mathrm{bst}}$, the number of modules with maximum size denotes how many trivial machines it uses. The other modules can be mapped to the nontrivial configurations and the jobs can be mapped to the modules.
We still have to schedule the jobs in $J^{\mathrm{sst}}$. We do this as described in the proof of Lemma 6. We fill the nontrivial machines greedily step by step, starting with the jobs having the smallest processing time. When these machines are filled, there are some completely empty machines left. Now, we estimate how many machines can be completely filled with the current job $j$. This can be done in $O(1)$ by dividing the remaining processing time by $T - s_j$. The remaining part is scheduled on the next free machine. This machine is filled up with the next job, and again the number of machines that can be filled completely with the rest of this new job is determined. These steps are iterated until all jobs in $J^{\mathrm{sst}}$ are scheduled. This greedy procedure needs at most $O(|J^{\mathrm{bst}}|(|J^{\mathrm{bst}}| - 1) + |J^{\mathrm{sst}}|) = O(n^2)$ operations. Therefore, we can avoid the dependence on the number of machines at the cost of a quadratic dependence on $n$ in the running time.
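The way a compact output avoids any dependence on $m$ can be sketched as follows. The sketch is a simplified variant and not the procedure from the proof of Lemma 6: it only treats the completely empty machines, never exceeds the makespan $T$, and assumes that every setup time is smaller than $T$. The function name `fill_empty_machines` and the output format are hypothetical.

```python
def fill_empty_machines(jobs, T):
    """Greedy filling of empty machines with constant work per job.

    jobs is a list of (name, setup, processing) with setup < T.
    Returns (plan, used), where plan holds one entry
    (name, first_piece, full_machines, last_piece) per job and used is the
    number of machines that receive load.
    """
    plan = []
    used = 0   # machines that are already closed
    load = 0   # load of the currently open machine (0 means none is open)
    for name, s, p in jobs:
        first = 0
        if load > 0:
            # put as much of the job as fits onto the open machine
            first = max(0, min(p, T - load - s))
            if first > 0:
                load += s + first
                p -= first
            if p > 0:
                used += 1   # the job continues elsewhere: close the machine
                load = 0
        # number of machines filled completely by this job, found by division
        full, rest = divmod(p, T - s)
        used += full
        if rest > 0:
            load = s + rest  # open a new machine for the last piece
        plan.append((name, first, full, rest))
    if load > 0:
        used += 1
    return plan, used
```

Each job contributes a single entry to the plan regardless of how many machines it fills, so the length of the output depends on the number of jobs only.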

Improved rounding procedures
To improve the running time in the splittable and setup class model, we reduce the number of module sizes via a geometric and an arithmetic rounding step. In both cases, the additional steps are performed after all the other simplification steps. The basic idea is to include setup times together with their corresponding job pieces or batches of jobs, respectively, into containers with suitably rounded sizes and to model these containers using the modules. The containers have to be at least as big as the objects they contain, and the load on a machine is given by the total size of the containers on the machine. Let $H^*$ be a set of container sizes. Then an $H^*$-structured schedule is a schedule in which each setup time together with its corresponding job piece or batch of jobs is packed in a container with the smallest size $h \in H^*$ such that the total size of the setup and the job piece or batch of jobs is upper bounded by $h$.

Splittable Model
Consider the instance $I_2$ for the splittable model described in Sect. 4.2. In this instance, each setup and processing time is a multiple of $\varepsilon^2 T$ and we are interested in a schedule of length $(1 + 2\varepsilon)T$. For each multiple $h$ of $\varepsilon^2 T$, let $\tilde{h} = (1 + \varepsilon)^{\lceil \log_{1+\varepsilon}(h/(\varepsilon^2 T)) \rceil} \varepsilon^2 T$ and $\bar{h} = \lceil \tilde{h}/(\varepsilon^2 T) \rceil \varepsilon^2 T$, and let $\bar{H} = \{\bar{h} \mid h \in \varepsilon^2 T\, \mathbb{Z}_{\ge 1},\ h \le (1 + 2\varepsilon)^2 T\}$. Note that $|\bar{H}| \in O(1/\varepsilon \log 1/\varepsilon)$.
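The two rounding steps can be tried out on small instances with the following sketch. The function names `rounded_size` and `container_sizes` are hypothetical, and the exponent $\lceil \log_{1+\varepsilon}(\cdot) \rceil$ is found by repeated exact multiplication with `Fraction` values instead of a floating point logarithm, which is an implementation choice made to avoid rounding errors.

```python
from fractions import Fraction
from math import ceil


def rounded_size(h, eps, T):
    """Round a multiple h of eps^2 * T geometrically and then arithmetically."""
    unit = eps * eps * T
    ratio = Fraction(h) / unit
    geometric = Fraction(1)
    while geometric < ratio:
        geometric *= 1 + eps       # smallest power of (1 + eps) >= ratio
    # rounding up to the next multiple of unit is the arithmetic step
    return ceil(geometric) * unit


def container_sizes(eps, T):
    """All rounded sizes of multiples of eps^2 * T up to (1 + 2 eps)^2 * T."""
    unit = eps * eps * T
    limit = (1 + 2 * eps) ** 2 * T
    sizes = set()
    h = unit
    while h <= limit:
        sizes.add(rounded_size(h, eps, T))
        h += unit
    return sizes
```

For instance, `container_sizes(Fraction(1, 2), 1)` contains only 7 sizes, although there are 16 multiples of $\varepsilon^2 T = 1/4$ in the considered range.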

Lemma 14
If there is a $((1 + 2\varepsilon)T, L')$-schedule for $I_2$ in which the length of each job part is a multiple of $\varepsilon^2 T$, there is also an $\bar{H}$-structured $((1 + 2\varepsilon)^2 T, L')$-schedule for $I_2$ with the same property.
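Proof Consider such a schedule for $I_2$ and a pair of setup time $s$ and job piece $q$ scheduled on some machine. Let $h = s + q$. Stretching the schedule by $(1 + 2\varepsilon)$ creates enough space to place the pair into a container of size $\bar{h}$, because $\tilde{h} \le (1 + \varepsilon)h$ and $\bar{h} \le \tilde{h} + \varepsilon^2 T \le \tilde{h} + \varepsilon h$, since $h \ge s \ge \varepsilon T$. Together, this yields $\bar{h} \le (1 + 2\varepsilon)h$.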
To implement this lemma into the procedure, the two processing time bounds of the MCIP both have to be properly increased. Modeling an $\bar{H}$-structured schedule can be done quite naturally: We simply redefine the size of a module $M = (s, q) \in \mathcal{M}$ to be $\overline{s + q}$. With this definition, we have $|H| = |\bar{H}| = O(1/\varepsilon \log 1/\varepsilon)$, yielding an improved running time for solving the MCIP of: [...] Combining this with the results above and the considerations in Sect. 4.2 yields the running time claimed below Theorem 1.

Setup Class Model
In the setup class model, an analogous approach also yields a reduced set of module sizes, that is, $|H| = O(1/\varepsilon \log 1/\varepsilon)$. Therefore, the MCIP can be solved in time $2^{O(1/\varepsilon^3 (\log 1/\varepsilon)^4)} K^{1+o(1)}$. Hence, we get the running time claimed beneath Theorem 1.

Conclusion
We presented a more advanced version of the classical configuration IP, showed that it can be solved efficiently using algorithms for n-fold IPs, and developed techniques to employ the new IP for the formulation of efficient polynomial time approximation schemes for three scheduling problems with setup times for which no such algorithms were known before. For further research, the immediate questions are whether improved running times for the considered problems, in particular for the preemptive model, can be achieved; whether the MCIP can be solved more efficiently; and to which other problems it can reasonably be applied. From a broader perspective, it would be interesting to further study the potential of new algorithmic approaches in integer programming for approximation and, on the other hand, to investigate the respective techniques themselves.