Server Cloud Scheduling

Consider a set of jobs connected to a directed acyclic task graph with a fixed source and sink. The edges of this graph model precedence constraints, and the jobs have to be scheduled with respect to those. We introduce the Server Cloud Scheduling problem, in which the jobs have to be processed either on a single local machine or on one of many cloud machines. Both the source and the sink have to be scheduled on the local machine. For each job, processing times both on the server and in the cloud are given. Furthermore, for each edge in the task graph, a communication delay is included in the input and has to be taken into account if one of the two jobs is scheduled on the server and the other in the cloud. The server processes jobs sequentially, whereas the cloud can serve as many as needed in parallel, but induces costs. We consider both makespan and cost minimization. The main results are an FPTAS with respect to the makespan objective for a fairly general case and strong hardness for the case with unit processing times and delays.


Introduction
Scheduling with precedence constraints with the goal of makespan minimization is widely considered a fundamental problem. It was already studied in the 1960s by Graham [1] and receives a lot of research attention to this day (see, e.g., [2, 3, 4]). One problem variant that has received particular attention recently is the variant with communication delays (e.g., [4, 5, 6]). Another, more contemporary topic concerns scheduling using external resources like, for instance, machines from the cloud, and several models in this context have been considered of late (e.g., [7, 8, 9]). In this paper, we introduce and study a model closely connected to both settings, where jobs with precedence constraints may either be processed on a single server machine or on one of many cloud machines. Here, communication delays may occur only if the computational setting is changed. The server and cloud machines may behave heterogeneously, i.e., jobs may have different processing times on the server and in the cloud, and scheduling in the cloud incurs costs proportional to the computational load performed in this context. Both makespan and cost minimization are considered. We believe that the present model provides a useful link between scheduling with precedence constraints and communication delays on the one hand and cloud scheduling on the other. There is a shorter published conference version [10] of this paper; Section 3, Section 7 and Section 8 are new content exclusive to this version.

Problem
We consider a scheduling problem SCS in which a task graph G = (J, E) has to be scheduled on a combination of a local machine (server) and an unlimited number of remote machines (cloud). The task graph is a directed acyclic graph with exactly one source S ∈ J and exactly one sink T ∈ J. Each job j ∈ J has a processing time p_s(j) on the server and p_c(j) in the cloud. We set p_s(S) = p_s(T) = 0 and p_c(S) = p_c(T) = ∞. For every other job, the values of p_s and p_c can be arbitrary in ℕ_0, meaning that the server and the cloud are unrelated machines in our default model. An edge e = (i, j) denotes precedence, i.e., job i has to be fully processed before job j can start. Furthermore, an edge e = (i, j) has a communication delay c(i, j) ∈ ℕ_0, which means that after job i has finished, j has to wait an additional c(i, j) time steps before it can start if i and j are not both scheduled on the same type of machine (server or cloud).
A schedule π is given as a tuple (J^s, J^c, C). The sets J^s and J^c form a partition of J, i.e., J^s ∩ J^c = ∅ and J^s ∪ J^c = J, and denote the jobs that are processed on the server and in the cloud in π, respectively. Lastly, C : J → ℕ_0 maps jobs to their completion times.
We introduce some notation before we formally define the validity of a schedule. Let p^π(j) be equal to p_s(j) if j ∈ J^s, and equal to p_c(j) if j ∈ J^c. The value p^π(j) denotes the actual processing time of job j in π. Let E* := {(i, j) ∈ E | (i ∈ J^s ∧ j ∈ J^c) ∨ (i ∈ J^c ∧ j ∈ J^s)} be the set of edges between jobs in different computational contexts (server or cloud). Intuitively, for all edges in E* we have to take the communication delays into consideration, whereas for all edges in E \ E* we only care about the precedence.
We call a schedule π valid if and only if the following conditions are met:
(a) There is always at most one job processing on the server:
∀ i ∈ J^s ∀ j ∈ J^s \ {i} : (C(i) ≤ C(j) − p^π(j)) ∨ (C(i) − p^π(i) ≥ C(j))
(b) Tasks are not started before the preceding tasks have been finished and the required communication is done:
∀ (i, j) ∈ E \ E* : C(i) ≤ C(j) − p^π(j)
∀ (i, j) ∈ E* : C(i) + c(i, j) ≤ C(j) − p^π(j)
The makespan (mspan) of a schedule is given by the completion time of the sink, C(T). The cost (cost) of a schedule is given by the time it spends processing tasks in the cloud: Σ_{i ∈ J^c} p^π(i). Note that by requiring p_s(S) = p_s(T) = 0 and p_c(S) = p_c(T) = ∞, we assume every schedule to start and end on the server. This is done only for convenience, as it defines a clear start and end state for each schedule.
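The validity conditions and the two objective values can be checked mechanically. The following is a minimal sketch, assuming a dictionary-based encoding of the instance; the function name `evaluate_schedule` and all parameter names are hypothetical and not part of the paper.

```python
import math

def evaluate_schedule(jobs, edges, p_s, p_c, delay, on_server, completion, sink):
    """Return (makespan, cost) if the schedule is valid, otherwise None.

    jobs: iterable of job ids; edges: list of (i, j) precedence pairs;
    p_s, p_c: dicts of processing times; delay: dict mapping edges to c(i, j);
    on_server: set of jobs placed on the server (all others are in the cloud);
    completion: dict mapping jobs to completion times C(j).
    """
    jobs = list(jobs)

    def p(j):  # actual processing time of j in this schedule
        return p_s[j] if j in on_server else p_c[j]

    def start(j):
        return completion[j] - p(j)

    # (a) at most one job is processed on the server at any time
    server_jobs = [j for j in jobs if j in on_server]
    for a in server_jobs:
        for b in server_jobs:
            if a != b and not (completion[a] <= start(b) or start(a) >= completion[b]):
                return None

    # (b) precedence, plus the delay on edges that cross between server and cloud
    for (i, j) in edges:
        crossing = (i in on_server) != (j in on_server)
        wait = delay[(i, j)] if crossing else 0
        if completion[i] + wait > start(j):
            return None

    cost = sum(p_c[j] for j in jobs if j not in on_server)
    return completion[sink], cost

# Illustrative instance: S -> a -> T with unit delays on both edges.
INF = math.inf
jobs = ["S", "a", "T"]
edges = [("S", "a"), ("a", "T")]
p_s = {"S": 0, "a": 3, "T": 0}
p_c = {"S": INF, "a": 2, "T": INF}
delay = {("S", "a"): 1, ("a", "T"): 1}
all_server = evaluate_schedule(jobs, edges, p_s, p_c, delay,
                               {"S", "a", "T"}, {"S": 0, "a": 3, "T": 3}, "T")
a_in_cloud = evaluate_schedule(jobs, edges, p_s, p_c, delay,
                               {"S", "T"}, {"S": 0, "a": 3, "T": 4}, "T")
too_early = evaluate_schedule(jobs, edges, p_s, p_c, delay,
                              {"S", "T"}, {"S": 0, "a": 2, "T": 4}, "T")
```

In this small instance, moving job a to the cloud trades a cost of 2 for a makespan that is longer by one time step, because both delays have to be paid.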
Naturally, two different optimization problems arise from the definition. First, given a deadline d, find a schedule with lowest cost and mspan = C(T) ≤ d. Second, given a cost budget b, find a schedule with smallest makespan and cost = Σ_{i ∈ J^c} p^π(i) ≤ b. In both cases, the deadline d and the budget b, respectively, are hard constraints. The natural decision variant is: given both d and b, find a schedule that adheres to both, if one exists.
Remark 1 Instances of SCS might admit schedules with a makespan (and therefore cost) of 0. We can check for those in polynomial time: First, remove all edges with communication delay 0; the remaining graph decomposes into a set K of connected components. There is a schedule with makespan 0 if and only if for every component k ∈ K we have (∀ j ∈ k : p_s(j) = 0) ∨ (∀ j ∈ k : p_c(j) = 0). For the rest of the paper we assume that our algorithms check this beforehand and are only interested in schedules with mspan > 0.
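The check from Remark 1 can be sketched with a union-find structure over the edges that have positive delay. This is an illustrative sketch only; the function name `has_zero_makespan_schedule` and the dictionary encoding of the instance are assumptions.

```python
import math

def has_zero_makespan_schedule(jobs, edges, p_s, p_c, delay):
    """Check the condition of Remark 1 on components joined by positive-delay edges."""
    parent = {j: j for j in jobs}

    def find(x):
        # union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Edges with delay 0 are dropped; the remaining edges tie their
    # endpoints to the same computational context.
    for (i, j) in edges:
        if delay[(i, j)] > 0:
            parent[find(i)] = find(j)

    components = {}
    for j in jobs:
        components.setdefault(find(j), []).append(j)

    return all(
        all(p_s[j] == 0 for j in comp) or all(p_c[j] == 0 for j in comp)
        for comp in components.values()
    )

INF = math.inf
jobs = ["S", "a", "T"]
edges = [("S", "a"), ("a", "T")]
p_s = {"S": 0, "a": 3, "T": 0}
p_c = {"S": INF, "a": 0, "T": INF}
free_delays = {("S", "a"): 0, ("a", "T"): 0}
paid_delays = {("S", "a"): 1, ("a", "T"): 0}
```

With `free_delays`, job a can run in the cloud at no time cost; with `paid_delays`, job a is tied to the source, which must run on the server, where a needs 3 time units.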

Results
We start by establishing (weak) NP-hardness already for the case without communication delays and very simple task graphs. More precisely, we consider the case in which the task graph forms one chain starting with the source and ending with the sink, and the case in which the graph is fully parallel, i.e., each job j ∈ J \ {S, T} is only preceded by the source and succeeded by the sink. On the other hand, we establish FPTAS results for both the chain and the fully parallel case with arbitrary communication delays and with respect to both objective functions. Furthermore, we present a 2-approximation for the case without delays and identical server and cloud machines (p_c = p_s) but arbitrary task graph and the makespan objective, and show that the respective algorithm can also be used to solve the problem optimally with respect to both objectives in the case of unit processing times. These results are all relatively simple and are discussed in Section 2. In Section 3 we generalize the previous two task graph models (chain and fully parallel) into one, called extended chain graphs. We present a (2 + ε)-approximation for the budget restrained makespan minimization for this class of task graphs. Furthermore, we discuss some small assumptions on the problem instance which allow us to achieve FPTAS results instead. We end the section by giving a reduction from the strongly NP-hard 1 | r_j | Σ w_j U_j problem [11]. In Section 4 we aim to generalize the previous FPTAS results regarding the makespan as much as possible. We are able to show that an FPTAS can be achieved as long as the maximum cardinality source and sink dividing cut ψ is constant. Intuitively, this parameter upper bounds the number of edges that have to be considered together in a dynamic program, and in many relevant problem variants it can be bounded or replaced by the longest anti-chain length. We provide a formal definition in Section 4. Next, we turn our attention to strong NP-hardness results in Section 5.
We are able to show that a classical reduction due to Lenstra and Rinnooy Kan [12] can be adapted to prove NP-hardness already for the variant of SCS without communication delays and processing times equal to one or two. The case of unit processing times without communication delays can trivially be solved in polynomial time, and hence we are interested in the case with unit processing times and communication delays. We design an intricate reduction to show that this very basic case is NP-hard as well. Note that in this setting the server and cloud machines are implicitly identical. Furthermore, we are able to show that a slight variation of this reduction implies that no constant approximation with respect to the cost objective can be achieved for the general problem. In Section 6, we consider approximation algorithms for the case with unit processing times and delays. We show that a relatively simple approach yields a (1 + ε)/(2ε)-approximation for ε ∈ (0, 1] regarding the cost objective if we allow a makespan of (1 + ε)d. In Section 7, we establish some natural generalizations of the model and sketch how those can be solved by slight adaptations of our algorithms for extended chain and constant ψ graphs. Lastly, in Section 8 we show how to give an α-approximation, for any chosen α > 0, of the Pareto front of a problem with a task graph with constant ψ, when we look at the problem as a multi-objective optimization problem. This means that for any point in the actual Pareto front, we give a nearby feasible point that is only worse by a factor of 1 + α in both dimensions. In Table 1 we give an overview of the important results.

Related Work
Probably the closest related model to the one considered in this paper was studied by Aba et al. [7]. In that paper, the input is very similar; however, in both computational settings an unbounded number of machines may be used and the goal is makespan minimization. The authors show NP-hardness on the one hand, and identify cases that can be solved in polynomial time on the other. In the conclusion of that paper, a model very similar to the one studied in this work is mentioned as an interesting research direction. For a detailed discussion of related models, we refer to the preprint version of the above work [7].
The present model is closely related to the classical problem of makespan minimization on parallel machines with precedence constraints, where a set of jobs with processing times, a precedence relation on the jobs (or a task graph), and a set of m machines are given. The goal is to assign the jobs to starting times and machines such that the precedence constraints are met and the last job finishes as soon as possible. In the 1960s, Graham [1] introduced the list scheduling heuristic for this problem and proved it to be a (2 − 1/m)-approximation. Interestingly, to date, this is essentially the best result for the general problem. On the other hand, Lenstra and Rinnooy Kan [12] showed that no better than a 4/3-approximation can be achieved for the problem with unit processing times, unless P = NP. More recently, there has been a series of exciting new results for this problem, starting with a paper by Svensson [13], who showed that no better than a 2-approximation can be hoped for assuming a variant of the unique games conjecture. Furthermore, Levey and Rothvoss [2] presented an approximation scheme with nearly quasi-polynomial running time for the variant with unit processing times and a constant number of machines, and Garg [3] improved the running time to quasi-polynomial shortly thereafter. These results utilize so-called LP hierarchies to strengthen linear programming relaxations of the problems. This basic approach has been further explored in a series of subsequent works (e.g., [4, 5, 6]), which in particular also investigate the problem variant where a communication delay is incurred for pairs of precedence-constrained jobs running on different machines. The latter problem variant is closely related to our setting as well.

Table 1 Hardness results
Problem variant                                   Result
fully parallel or chain task graph, c = 0         (weakly) NP-hard
extended chain task graph                         (strongly) NP-hard
∀j ∈ J : c(j) = 0, p_c(j), p_s(j) ∈ {1, 2}        (strongly) NP-hard
c = p_c = p_s = 1 (unit delays, unit sizes)       (strongly) NP-hard
general problem                                   no constant approximation w.r.t. cost

Lastly, there is at least a conceptual relationship to problems where jobs are to be executed in the cloud. For example, Saha [8] considered a problem in which cloud machines have to be rented in fixed time blocks in order to schedule a set of jobs with release dates and deadlines, minimizing the costs, which are proportional to the rented time blocks. Another example is a work by Mäcker et al. [9] in which machines of different types can be rented from the cloud and machine-dependent setup times have to be paid before they can be used. Jobs arrive in an online fashion and the goal is again cost minimization. Both papers reference further work in this context.

Preliminary Results - Chains and Fully Parallel
In this section we collect some results that can be considered low-hanging fruit and give a first overview of the complexity and approximability of our problem. In particular, we show weak NP-hardness already for cases with very simple task graphs and without communication delays. Furthermore, we discuss complementing FPTAS results and a 2-approximation for the case with identical cloud and server machines and without communication delays.

Hardness
We show that SCS is NP-hard even for two very simple types of task graphs and in the case where every communication delay is 0. For both reductions we use the decision variant of the problem: given both a deadline d and a budget b, find a schedule that satisfies both. Naturally, this shows the hardness of both the cost minimization and the makespan minimization problem.
We start by reducing the decision version of knapsack to SCS with a chain as its task graph. A knapsack instance consists of a capacity C, a value threshold V and a set of items {1, ..., n} with weights w_i and values v_i. The question is whether there exists a subset of items S such that Σ_{i∈S} w_i ≤ C and Σ_{i∈S} v_i ≥ V. We create the respective SCS instance as follows. For every item i ∈ {1, ..., n} create a task with p_s(i) = w_i + v_i and p_c(i) = v_i. Consider a task graph with those tasks as a chain (in arbitrary order), where each resulting edge (i, j) has c(i, j) = 0. We set the deadline to d = Σ_{1≤i≤n} v_i + C and the budget to b = Σ_{1≤i≤n} v_i − V.
It is left to show that there is a solution to the knapsack instance if and only if there is a feasible schedule for the transformed instance. Basically, we show that there is a one-to-one relation between schedules and knapsack solutions. Assume there is some feasible solution (subset of items S) for the knapsack problem with value V′. For each i ∈ S we put the respective task in J^s and the rest in J^c. Since the task graph is a chain, we can compute the minimal makespan of this partition: Σ_{1≤i≤n} v_i + Σ_{i∈S} w_i, which is at most d if and only if Σ_{i∈S} w_i ≤ C. The cost of the schedule equals Σ_{1≤i≤n} v_i − V′. Therefore, the cost of the schedule is at most b exactly when V′ ≥ V. It is easy to see that we can construct a knapsack solution from a schedule in a similar vein; therefore we conclude:
Theorem 1 The SCS problem is weakly NP-hard for chain graphs and without communication delays.
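The knapsack reduction can be sanity-checked on small instances by brute force. The sketch below builds the chain instance as described and compares both decision problems by enumeration; the function names are hypothetical, and the exhaustive search is only meant for tiny inputs.

```python
def knapsack_to_scs_chain(weights, values, capacity, threshold):
    """Map a knapsack decision instance to (p_s, p_c, d, b) of a zero-delay chain."""
    p_s = [w + v for w, v in zip(weights, values)]
    p_c = list(values)
    d = sum(values) + capacity
    b = sum(values) - threshold
    return p_s, p_c, d, b

def scs_chain_feasible(p_s, p_c, d, b):
    """Brute force over all server/cloud partitions of a chain without delays."""
    n = len(p_s)
    for mask in range(1 << n):  # bit i set: job i runs on the server
        on_server = [bool((mask >> i) & 1) for i in range(n)]
        makespan = sum(p_s[i] if on_server[i] else p_c[i] for i in range(n))
        cost = sum(p_c[i] for i in range(n) if not on_server[i])
        if makespan <= d and cost <= b:
            return True
    return False

def knapsack_feasible(weights, values, capacity, threshold):
    """Brute force over all item subsets."""
    n = len(weights)
    for mask in range(1 << n):
        chosen = [i for i in range(n) if (mask >> i) & 1]
        if (sum(weights[i] for i in chosen) <= capacity
                and sum(values[i] for i in chosen) >= threshold):
            return True
    return False

weights, values = [3, 4, 2], [4, 5, 3]
agree = all(
    knapsack_feasible(weights, values, C, V)
    == scs_chain_feasible(*knapsack_to_scs_chain(weights, values, C, V))
    for C in range(10) for V in range(14)
)
```

Because all delays are zero, the makespan of a chain is simply the sum of the chosen processing times, which is what the brute-force check relies on.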
Secondly we look at problems with fully parallel task graphs, which means that every job j besides S and T has exactly two edges: (S, j) and (j, T ).
Here we do a simple reduction from partition. Given a set S of natural numbers, the question is whether there is a partition into sets S_1 and S_2 such that Σ_{i∈S_1} i = Σ_{i∈S_2} i. For every element i in S we create a task j with p_s(j) = p_c(j) = i, and set d = b = (1/2) Σ_{i∈S} i. We arrange the tasks into a fully parallel task graph where each edge (i, j) has c(i, j) = 0. Consider a solution S_1, S_2 for the partition problem. We schedule every task related to an integer in S_1 on the server and every other task in the cloud. Since everything is fully parallel and there are no communication delays, we obtain a makespan of max{Σ_{i∈S_1} i, max_{i∈S_2} i} and a cost of Σ_{i∈S_2} i. This is a feasible solution for the scheduling problem if and only if Σ_{i∈S_1} i = Σ_{i∈S_2} i. Again, it is easy to see that an equivalent argument can be made for the other direction.

Theorem 2
The SCS problem is weakly NP-hard for fully parallel graphs and without communication delays.

Algorithms
In the following, we present complementing FPTAS results for the variants of SCS with fully parallel and chain task graphs. Furthermore, in both of the above reductions we had no communication delays, and in one of them the jobs had the same processing time on the server and in the cloud. Hence, we take a closer look at this case as well and present a simple 2-approximation even for arbitrary task graphs and with respect to the makespan objective.

Fully Parallel Case
We show that the variant of SCS with fully parallel task graph can be dealt with using straightforward applications of well-known results and techniques. In particular, we can design two simple dynamic programs for the search version of the problem that consider for each job the two possibilities of scheduling it in the cloud or on the server and compute for each possible budget or deadline the lowest makespan or cost, respectively, that can be achieved with the jobs considered so far. These dynamic programs can then be combined with suitable rounding procedures that reduce the number of considered states and with search procedures for approximate values of the optimal cost or makespan, respectively, yielding:
Theorem 3 There is an FPTAS for SCS with fully parallel task graph with respect to both the cost and the makespan objective.
Proof We start by designing the dynamic programs for the search version of the problem with budget b and deadline d. Without loss of generality, we assume J = {0, 1, ..., n, n + 1} with S = 0, T = n + 1 and set c(j) = c(S, j) + c(j, T).
For each deadline d′ ∈ {0, 1, ..., d} and j ∈ J, we want to compute the smallest cost C[j, d′] of all the schedules of the jobs 0, 1, ..., j adhering to the deadline d′ on the server (j = 0 denotes the trivial case that no job after the source has been scheduled). We initialize C[0, d′] = 0 for each d′. For all other jobs j we consider the two possibilities of scheduling it in the cloud or on the server. In particular, let […]
In the second dynamic program, we compute the smallest makespan M[j, b′] of all the schedules of the jobs 0, 1, ..., j adhering to the budget b′, for each budget b′ ∈ {0, 1, ..., b} and j ∈ J. Again, we set M[0, b′] = 0 for each b′ and consider the two possibilities of scheduling job j in the cloud or on the server. To that end, let […]
For both programs, we can use rounding and scaling approaches to trade the complexity dependence on d or b for a dependence on poly(n, 1/ε), incurring a loss of a factor (1 + O(ε)) in the makespan or cost, respectively, if a solution is found. This can then be combined with a suitable search procedure for approximate values of the optimal makespan or cost. For details, we refer to Section 4, where such techniques are used and described in more detail. In addition to the techniques mentioned there, the possibility of a cost zero solution has to be considered, which can easily be done in this case.
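One plausible form of the first dynamic program can be sketched as follows. The recurrence used here is an assumption reconstructed from the surrounding description (a server job consumes p_s(j) of the server time, a cloud job is allowed whenever c(j) + p_c(j) ≤ d and adds p_c(j) to the cost); it is not taken verbatim from the paper, and the function name is hypothetical.

```python
import math

def min_cost_fully_parallel(p_s, p_c, delay, d):
    """Smallest cost of a fully parallel instance that meets deadline d.

    p_s[k], p_c[k]: processing times of the k-th inner job (source and sink omitted);
    delay[k] = c(S, k) + c(k, T).
    """
    INF = math.inf
    # cost_at[t]: smallest cost for the jobs handled so far, server load at most t
    cost_at = [0] * (d + 1)
    for ps, pc, c in zip(p_s, p_c, delay):
        updated = [INF] * (d + 1)
        for t in range(d + 1):
            # job on the server: it consumes ps units of the server time t
            on_server = cost_at[t - ps] if ps <= t else INF
            # job in the cloud: both delays plus processing must fit into d
            in_cloud = cost_at[t] + pc if c + pc <= d else INF
            updated[t] = min(on_server, in_cloud)
        cost_at = updated
    return cost_at[d]

cost_tight = min_cost_fully_parallel([3, 4], [2, 5], [1, 1], 5)
cost_loose = min_cost_fully_parallel([3, 4], [2, 5], [1, 1], 7)
cost_impossible = min_cost_fully_parallel([3, 4], [2, 5], [1, 1], 2)
```

The table has n · (d + 1) entries, so the running time is pseudo-polynomial in d, which matches the role that rounding and scaling play in the proof.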

Chain Graph Case
We present FPTAS results for the variant of SCS with chain task graph. The basic approach is very similar to the fully parallel case.
Theorem 4 There is an FPTAS for SCS with chain task graph with respect to both the cost and the makespan objective.
Proof We again start by designing dynamic programs for the search version of the problem with budget b and deadline d. Without loss of generality, we assume J = {0, 1, ..., n + 1} with S = 0, T = n + 1, and j ∈ {0, 1, ..., n + 1} being the j-th job in the chain.
For each deadline d′ ∈ {0, 1, ..., d}, job j ∈ {0, 1, ..., n + 1}, and location loc ∈ {s, c} (referring to the server and cloud) we want to compute the smallest cost C[d′, j, loc] of all the schedules of the jobs 0, ..., j adhering to the deadline d′ and with job j being scheduled on loc. To that end, we set C[d′, 0, s] = 0, C[d′, 0, c] = ∞, and with slight abuse of notation use the convention C[z, j, loc] = ∞ for z < 0. Further values can be computed via the following recurrence relations: […] If C[d, n + 1, s] > b, we know that there is no feasible solution for the search version, and otherwise we can use backtracking starting from C[d, n + 1, s] to find one. The time and space complexity is polynomial in d and n.
In the second dynamic program, we compute the smallest makespan M[b′, j, loc] of all the schedules of the jobs 0, ..., j adhering to the budget b′ and with job j placed on location loc, for each b′ ∈ {0, 1, ..., b}, j ∈ {0, 1, ..., n + 1} and loc ∈ {s, c}. We set M[b′, 0, s] = 0, M[b′, 0, c] = ∞, use the convention M[z, j, loc] = ∞ for z < 0, and the recurrence relations: […] If M[b, n + 1, s] > d, we know that there is no feasible solution for the search version, and otherwise we can use backtracking starting from M[b, n + 1, s] to find one. The time and space complexity is polynomial in b and n.
As in the fully parallel case, we can use rounding and scaling approaches to trade the complexity dependence on d or b for a dependence on poly(n, 1/ε), incurring a loss of a factor (1 + O(ε)) in the makespan or cost, respectively, if a solution is found. This can then be combined with a suitable search procedure for approximate values of the optimal makespan or cost. For details, we refer to Section 4, where such techniques are used and described in more detail. In addition to the techniques mentioned there, the possibility of a cost zero solution has to be considered, which can easily be done in this case as well.
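A possible shape of the cost-minimizing dynamic program for chains is sketched below. The transitions are an assumption derived from the state description (time budget, job, location): staying in the same location costs only the processing time, while switching location additionally costs the delay of the connecting edge. The function name and the list-based encoding are hypothetical.

```python
import math

def min_cost_chain(p_s, p_c, delay, d):
    """Smallest cost of a chain schedule with makespan at most d.

    Jobs are indexed 0..n+1 along the chain; job 0 is the source and job n+1
    the sink (p_s = 0, p_c = inf for both). delay[j] = c(j-1, j); delay[0] is unused.
    """
    INF = math.inf

    def get(row, t):  # convention: negative time budgets are infeasible
        return row[t] if t >= 0 else INF

    # prev[loc][t]: smallest cost with the current job finished by time t on loc
    prev = {"s": [0] * (d + 1), "c": [INF] * (d + 1)}
    for j in range(1, len(p_s)):
        cur = {"s": [INF] * (d + 1), "c": [INF] * (d + 1)}
        for t in range(d + 1):
            # job j on the server: predecessor on the server, or in the cloud plus delay
            cur["s"][t] = min(get(prev["s"], t - p_s[j]),
                              get(prev["c"], t - p_s[j] - delay[j]))
            # job j in the cloud (skipped for jobs with infinite cloud time)
            if not math.isinf(p_c[j]):
                cur["c"][t] = p_c[j] + min(get(prev["c"], t - p_c[j]),
                                           get(prev["s"], t - p_c[j] - delay[j]))
        prev = cur
    return prev["s"][d]

INF = math.inf
p_s = [0, 3, 4, 0]
p_c = [INF, 2, 1, INF]
delay = [0, 1, 0, 2]  # c(S, a) = 1, c(a, b) = 0, c(b, T) = 2
```

For this example, a deadline of 7 allows everything to run on the server at cost 0, a deadline of 6 forces the second inner job into the cloud at cost 1, and a deadline of 5 is infeasible.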

The Extended Chain Model
As a first step towards more general models we introduce the extended chain model. The main idea here is to find a unifying generalization of the chain and fully parallel cases. Informally, one can imagine an extended chain as a chain graph where any number of edges were replaced with fully parallel graphs. After giving a formal definition of these graphs, we introduce a (2 + ε)-approximation for the budget restrained makespan minimization. That algorithm uses reductions to single machine weighted number of tardy jobs scheduling to solve some intermediate parts via known procedures. Therefore, we briefly discuss this problem here before actually giving our algorithm. We finish the constructive side by exploring some assumptions on problem instances that allow us to achieve FPTAS results with our approach. Lastly, we give a reduction to show that this problem is strongly NP-hard.

Single Machine Weighted Number of Tardy Jobs
As mentioned before, this section reduces some intermediate steps of the algorithm to the single machine weighted number of tardy jobs problem, for which we reuse an already established algorithm.
The single machine weighted number of tardy jobs (WNTJ) problem, or 1 || Σ w_j U_j in three-field notation [14], can be defined as follows: On a single machine, where only one job at a time can be processed, n jobs are to be scheduled. Each job has an integer processing time p_j, weight w_j and due date d_j. A job is called 'late' if its completion time satisfies C_j > d_j and 'early' if C_j ≤ d_j. The goal is to find a schedule which minimizes the sum of the weights of the tardy (late) jobs. Pseudo-polynomial dynamic programs with runtime in O(n min{Σ_j p_j, max_j d_j}) and O(n min{Σ_j p_j, Σ_j w_j, max_j d_j}), respectively, were given by Lawler and Moore [15] and later Sahni [16]. Denote the former by wTardyJobs. For a more comprehensive survey on this (and related) problems, we refer to [17].
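For illustration, a textbook-style dynamic program in the spirit of Lawler and Moore can be sketched as follows: jobs are considered in order of non-decreasing due date, and the table stores the largest weight of on-time jobs for each total processing time. The function name `w_tardy_jobs` is a hypothetical stand-in for wTardyJobs, and the sketch makes no claim to match the cited implementation in detail.

```python
def w_tardy_jobs(jobs):
    """Minimal total weight of tardy jobs on a single machine.

    jobs: list of (processing_time, due_date, weight) with non-negative integers.
    """
    jobs = sorted(jobs, key=lambda job: job[1])  # earliest due date first
    horizon = max([d for _, d, _ in jobs] + [0])
    NEG = float("-inf")
    # best[t]: largest weight of a set of on-time jobs with total processing time t
    best = [NEG] * (horizon + 1)
    best[0] = 0
    for p, d, w in jobs:
        # iterate t downwards so that each job is used at most once
        for t in range(min(d, horizon), p - 1, -1):
            if best[t - p] != NEG:
                best[t] = max(best[t], best[t - p] + w)
    total_weight = sum(w for _, _, w in jobs)
    return total_weight - max(best)

example = [(2, 2, 3), (1, 2, 2), (2, 3, 4)]
tardy_weight = w_tardy_jobs(example)
```

In the example, the second and third job can both finish on time (at times 1 and 3), so only the first job with weight 3 is tardy. The table has max_j d_j + 1 entries and is updated once per job, in line with the pseudo-polynomial bound stated above.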

Model
We give a constructive description of extended chain graphs. Let G = (J, E) with S ∈ J and T ∈ J be a chain graph. For any number of edges e = (j − 1, j) ∈ E we may remove the edge e and introduce a set of jobs J_j and, for every j′ ∈ J_j, two edges, namely (j − 1, j′) and (j′, j). The resulting graph G′ = (J′, E′) is an extended chain graph. We denote by N the total number of jobs (nodes) in the graph. Denote the SCS problem on extended chains by SCS_e. For an example we refer to Figure 1. Note that the introduced subgraphs are fully parallel graphs as described earlier, and consequently fully parallel graphs, as well as chain graphs, are special cases of extended chain graphs. This also directly implies that SCS_e is at least weakly NP-hard, as shown in Theorem 1 and Theorem 2.

A (2 + ε)-approximation for Makespan Minimization on the Extended Chain
Theorem 5 There is a (2 + ε)-approximation algorithm for the budget restrained makespan minimization problem on extended chains.
We design a pseudo-polynomial algorithm that, given a feasible makespan estimate T (T ≥ mspan_OPT), calculates a schedule with makespan at most min{2T, 2 mspan_OPT}. Otherwise (T < mspan_OPT), the algorithm calculates a schedule with makespan at most min{2T, 2 mspan_OPT} or no schedule at all. We can use a binary search to find T ≈ mspan_OPT, beginning with the trivial upper bound T = Σ_{j∈J′} p_s(j) ≥ mspan_OPT.
We first introduce notation that follows the constructive description of extended chains above. We assume J = {0, 1, ..., n + 1} with S = 0, T = n + 1, and j ∈ {1, ..., n} being the j-th job in the original chain. If there is a parallel subgraph between some jobs j − 1 and j, we denote the jobs in it by J_j = {0_j, 1_j, ..., m_j}.
We reuse the state description from Theorem 4, but this time we iteratively create all reachable states by going over the jobs {0, 1, ..., n + 1}. A state is a combination of a timestamp t ∈ {0, 1, ..., T}, a job j ∈ {0, 1, ..., n + 1}, and a location loc ∈ {s, c} (referring to server and cloud, respectively). The value of a state is the smallest cost of all the schedules of the jobs 0, 1, ..., j finishing processing during or before timestamp t, with j being scheduled on loc, denoted by [t, j, loc] = cost. Note that we have not mentioned the parallel subgraphs in the description above. We start with the trivial start state [0, 0 (= S), s] = 0. Let StateList_{j−1} be the list of states for some job j − 1 of the chain. We create StateList_j in the following way: First we create a set of state extensions Extensions_j, each of the form [∆t, loc_{j−1} → loc_j] = cost. Then we form every (fitting) combination of a state from StateList_{j−1} with an extension from Extensions_j, which forms StateList_j. Lastly, we cull all dominated states from StateList_j and continue with j + 1.
Calculate Extensions_j:
1. If there is no parallel subgraph between j − 1 and j, we can simply enumerate all state extensions:
(a) j − 1 on server, j on server: […]
2. Otherwise, there is a parallel subgraph between j − 1 and j with jobs J_j = {0_j, 1_j, ..., m_j}.
(a) j − 1 on server, j on server: Set ∆_max = min{Σ_{j′∈J_j} p_s(j′), T}. For every ∆_i in {0, ..., ∆_max}, do the following: Set J^s = ∅ and J^c = ∅. For every j′ ∈ J_j check: […] add j′ to J^s (j′ has to be put on the server). If Σ_{j′∈J^s} p_s(j′) > ∆_i, break and go to the next ∆_i. Create a WNTJ instance as follows: For every job j′ ∈ J_j \ (J^s ∪ J^c) create a job j′′ with processing time p_{j′′} = p_s(j′), due date d_{j′′} = ∆_i − Σ_{j′∈J^s} p_s(j′) and weight w_{j′′} = p_c(j′). Solve this instance with wTardyJobs and let V be the cost of the solution. Add [∆_i, s → s] = Σ_{j′∈J^c} p_c(j′) + V to Extensions_j. (Remark: This could also be solved as a knapsack problem, but we need WNTJ later either way.)
(b) j − 1 on server, j on cloud: Set ∆_max = min{Σ_{j′∈J_j} p_s(j′) + max_{j′∈J_j} c(j′, j), T}. For every ∆_i in {0, ..., ∆_max}, do the following: Set J^s = ∅ and J^c = ∅. For every j′ ∈ J_j check: […] add j′ to J^s (j′ has to be put on the server). Create a WNTJ instance as follows: For every job j′ ∈ J_j \ J^c create a job j′′ with processing time p_{j′′} = p_s(j′), due date d_{j′′} = ∆_i − c(j′, j) and weight […]
(c) j − 1 on cloud, j on server: This works analogously to the previous case. Simply replace each instance of c(j′, j) by c(j − 1, j′) and vice versa. Add the resulting extensions to Extensions_j. Note that for the reduction there is no computational difference between a common release date with different deadlines and different release dates with a common deadline.
(d) j − 1 on cloud, j on cloud: We 2-approximate the resulting extensions by precisely handling the communication to the server, but upper bounding the communication from the server. Repeat case 2b with the two following changes: For the checks before the problem conversion use c(j − 1, j′) + p_s(j′) + c(j′, j) and p_c(j′) instead of p_s(j′) + c(j′, j) and c(j − 1, j′) + p_c(j′), respectively. Let J^{s′} ⊆ J_j be the set of jobs actually put on the server in this step. We wait for the biggest communication delay to pass until we schedule the first job on the server. Note that ∆_i + max_{j′∈J^{s′}} c(j′, j) ≤ 2∆_i by construction.
For every pair of a state ([t, j − 1, loc] = cost) ∈ StateList_{j−1} and an extension ([∆t, loc_{j−1} → loc_j] = cost′) ∈ Extensions_j with loc = loc_{j−1}, add [t + ∆t, j, loc_j] = cost + cost′ to StateList_j. After that, for every triple t, j, loc that has multiple states in StateList_j keep only the state with the lowest cost. We can also discard states with cost > b or timestamp t > 2T. Repeat this process with j → j + 1 until we have computed StateList_{n+1}; then simply move through that list and select the state with the lowest timestamp t. If there is no such state, there exists no schedule with makespan smaller than or equal to T.
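The combination and culling step can be sketched as a single pass over all pairs of states and extensions. In this sketch, states of one job are stored in a dictionary keyed by (timestamp, location); the function name and the extension values used in the example are hypothetical and only serve to illustrate the mechanics.

```python
def extend_state_list(state_list, extensions, budget, time_limit):
    """Combine the states of job j-1 with the extensions of job j and cull.

    state_list: dict mapping (t, loc) -> cost for job j-1
    extensions: list of (delta_t, loc_from, loc_to, cost) tuples
    Returns the culled dict mapping (t, loc) -> cost for job j.
    """
    new_states = {}
    for (t, loc), cost in state_list.items():
        for delta_t, loc_from, loc_to, ext_cost in extensions:
            if loc != loc_from:  # the extension must start where the state ends
                continue
            new_t, new_cost = t + delta_t, cost + ext_cost
            if new_cost > budget or new_t > time_limit:
                continue  # discard states that violate the budget or time bound
            key = (new_t, loc_to)
            # culling: keep only the cheapest state per (timestamp, location)
            if new_cost < new_states.get(key, float("inf")):
                new_states[key] = new_cost
    return new_states

# Illustrative extensions for two directly connected jobs (assumed values).
start = {(0, "s"): 0}
ext_1 = [(3, "s", "s", 0), (3, "s", "c", 2), (4, "c", "s", 0), (2, "c", "c", 2)]
ext_2 = [(4, "s", "s", 0), (1, "s", "c", 1), (4, "c", "s", 0), (1, "c", "c", 1)]
states_1 = extend_state_list(start, ext_1, budget=5, time_limit=10)
states_2 = extend_state_list(states_1, ext_2, budget=5, time_limit=10)
```

In the second step, two different paths reach the state (7, "s") and two reach (4, "c"); culling keeps the cheaper one in each case, which is what prevents the state lists from growing exponentially.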
Lemma 1 Given a feasible T, the described procedure calculates a 2-approximation of the optimal makespan in time poly(N, T).
Proof We start by showing the approximation factor. Assume that we added […] That hypothetical algorithm would calculate a (possibly infeasible) solution with makespan mspan_hypoALG ≤ mspan_OPT, since step 2d underestimates the needed time, and everything else is calculated precisely. The actual algorithm has makespan mspan_ALG ≤ 2 mspan_hypoALG and therefore also mspan_ALG ≤ 2 mspan_OPT.
We show the runtime of the algorithm by bounding the time needed for each iteration of: 1. constructing the state extensions Extensions_j, 2. combining the extensions with the previous StateList_{j−1}, and 3. culling duplicates from the resulting StateList_j.
1. For directly connected jobs j − 1 and j we can trivially calculate the 4 options in constant time. Therefore, we are interested in the runtime of steps 2a, 2b, 2c and 2d for some parallel subgraph with jobs J_j. The steps get repeated for ∆_i in {0, ..., ∆_max}, where ∆_max ≤ T. The preprocessing in each iteration of all four steps needs time linear in the size of J_j. Using wTardyJobs in the steps needs time in […] we need time in poly(T, N) to calculate Extensions_j.
2. […] with […] · 2 (timestamp, job, location) different states (after the previous culling). We may simply brute-force all possible combinations from StateList_{j−1} × Extensions_j. Since both of these sets have at most poly(T, N) elements, the resulting set StateList_j also has polynomial size.
3. By culling states from StateList_j we reduce it back to size at most 2T · (n + 2) · 2. It should be obvious that we can identify duplicate states in polynomial time.
Note that we iterate the above steps for each job j ∈ {1, ..., n + 1}. Therefore, we have a polynomial number of repetitions of steps needing polynomial time. Note that we prevent exponential build-up in the state lists by culling duplicates after each iteration. Now we have to scale our instance such that our pseudo-polynomial algorithm runs in proper polynomial time. For that, we scale T and all p_c, p_s and c by N/(ε′T) and round down to the next integer. Then, we run our algorithm with the scaled values, but still use the unscaled p_c to calculate the value (cost) of states; as those calculations only factor logarithmically into the runtime, a p_c exponential in the input size is fine. The algorithm now needs time in poly(N, ⌊T · N/(ε′T)⌋) ≤ poly(N, 1/ε′) and finds a 2-approximation for the scaled instance (given a feasible T). After scaling back up, each job and communication delay might need up to ε′T/N additional time, delaying our whole schedule by at most 3N · ε′T/N ≤ 3ε′T. For ε = 3ε′ and T = mspan_OPT our resulting schedule has a makespan of mspan_ALG ≤ 2 mspan_OPT + εT = (2 + ε) mspan_OPT. Via a binary search we can find such a T by repeating our procedure at most log Σ_{j∈J′} p_s(j) times. This concludes the proof of Theorem 5.
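The scaling step can be illustrated with exact rational arithmetic. The sketch below divides every time value by ε′T/N and rounds down, so that the scaled estimate becomes ⌊N/ε′⌋ and each value loses less than ε′T/N; the function name `scale_instance` and the list and dictionary encoding are assumptions.

```python
import math
from fractions import Fraction

def scale_instance(T, p_s, p_c, delay, eps_prime):
    """Scale all time values by N / (eps' * T) and round down.

    p_s, p_c: lists of processing times (may contain math.inf);
    delay: dict mapping edges to communication delays;
    eps_prime: accuracy parameter, ideally given as a Fraction.
    Returns the scaled values and the length of one scaled time step.
    """
    N = len(p_s)
    unit = Fraction(eps_prime) * T / N  # original time covered by one scaled step

    def scale(x):
        return x if math.isinf(x) else math.floor(x / unit)

    scaled_T = scale(T)
    scaled_p_s = [scale(x) for x in p_s]
    scaled_p_c = [scale(x) for x in p_c]
    scaled_delay = {edge: scale(x) for edge, x in delay.items()}
    return scaled_T, scaled_p_s, scaled_p_c, scaled_delay, unit

INF = math.inf
result = scale_instance(
    100,
    [0, 30, 45, 0],
    [INF, 20, 10, INF],
    {(0, 1): 5, (1, 2): 0, (2, 3): 13},
    Fraction(1, 2),
)
```

As described in the text, the unscaled p_c values would still be used for the cost of a state; only the time dimension of the dynamic program is coarsened.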
Corollary 1 There is a polynomial-time algorithm for the deadline restrained cost minimization problem on extended chains that finds a schedule with at most optimal cost, but a makespan of at most $(2+\varepsilon)d$.

Cases with FPTAS
We reconsider the approximation result under three assumptions on the model, each of which allows us to improve the result. Looking back at Theorem 5, we built an algorithm that would be an FPTAS if it were not for case 2d, where we needed to double our time frame $\Delta_i$ to fit the unaccounted communication delay. In the following we only describe how to approach that case, since everything else can stay as it was.
First, we assume locally small delays in the parallel subgraphs, meaning that the smallest processing time in the subgraph is at least as big as the largest communication delay. More precisely, for every $J_e$ with $e = (j-1, j)$ it holds that $\min_{j' \in J_e} \min\{p_s(j'), p_c(j')\} \ge \max_{j' \in J_e} \max\{c(j-1, j'), c(j', j)\}$. In this case only the first job $j_\alpha$ and the last job $j_\omega$ to be processed on the server are actually affected by their communication delay, since all other delays fit in the time frame in which $j_\alpha$ and $j_\omega$ are processed. After the preprocessing of a given $\Delta_i$, for each pair of jobs $j_\alpha, j_\omega \in J_j \setminus J^c$ with $j_\alpha \ne j_\omega$ do the following: Assume $j_\alpha$ and $j_\omega$ are the first and last job to be processed on the server, respectively. Add $j_\alpha$ and $j_\omega$ to $J^s$. Now create the WNTJ instance as follows: For every job $j' \in J_j \setminus (J^s \cup J^c)$ create a job $j''$ with processing time $p_{j''} = p_s(j')$, deadline $d_{j''} = \Delta_i - (c(j-1, j_\alpha) + c(j_\omega, j)) - \sum_{j' \in J^s} p_s(j')$ and weight $w_{j''} = p_c(j')$. Solve this problem with wTardyJobs, let $V$ be the cost of the solution and note
Secondly, we assume a constant upper bound $c_{max}$ on the communication delays inside parallel subgraphs. More precisely, for every $J_e$ with $e = (j-1, j)$ and every $j' \in J_e$ it holds that $c_{max} \ge c(j-1, j')$ and $c_{max} \ge c(j', j)$.
Instead of brute-forcing only a first and a last job, we brute-force the first and last $c_{max}$ time steps. Trivially, jobs with $p_s = 0$ can be put on the server, and therefore there are at most $O(N^{c_{max}} \cdot N^{c_{max}})$ combinations we have to work through. The remaining part works analogously to the first case.
Lastly, we assume that each job produces some output that has to be sent to all of its direct successors in full, meaning that all outgoing communication delays of a job are equal. More precisely, for every $J_e$ with $e = (j-1, j)$ it holds that $\forall j', j'' \in J_e: c(j-1, j') = c(j-1, j'')$.
Here we can simply reuse the result from step 2b, but subtract $c(j-1, j')$ from the $\Delta_i$ used in the WNTJ problem. Since all $c(j-1, j')$ are equal, no job can be processed on the server in the first $c(j-1, j')$ time steps, and all jobs are available after those $c(j-1, j')$ time steps. All of these, in combination with the previously described scaling approach, lead to FPTAS results:
Theorem 6 There is an FPTAS for the budget restrained makespan minimization problem on extended chains, if at least one of the following holds for every parallel subgraph $J_e$ with $e = (j-1, j)$:
1. $\min_{j' \in J_e} \min\{p_s(j'), p_c(j')\} \ge \max_{j' \in J_e} \max\{c(j-1, j'), c(j', j)\}$
2. $c_{max} \ge c(j-1, j')$ and $c_{max} \ge c(j', j)$ for all $j' \in J_e$ and a constant $c_{max}$
3. $c(j-1, j') = c(j-1, j'')$ for all $j', j'' \in J_e$
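In the case of locally small delays, every job of the WNTJ instance receives the same deadline, so the instance can be read as a knapsack problem: the jobs that fit before the common deadline stay on the server, and the remaining jobs pay their cloud cost. The following is a minimal sketch under that observation. It is not the wTardyJobs routine referred to in the text; the function names, the tuple layout and the restriction to integer server times are assumptions made for illustration.

```python
def common_deadline(delta_i, c_in_alpha, c_out_omega, forced_server_times):
    """Deadline shared by all WNTJ jobs: the time frame minus the two
    unavoidable delays and the server time of the jobs already fixed there."""
    return delta_i - (c_in_alpha + c_out_omega) - sum(forced_server_times)


def min_tardy_weight(jobs, deadline):
    """jobs: list of (p_s, p_c) pairs with integer p_s; the weight of a job is p_c.

    Returns the minimum total weight of the jobs that miss the common
    deadline, i.e. the cloud cost of the jobs that do not fit on the server.
    """
    total = sum(w for _, w in jobs)
    if deadline < 0:
        # The fixed jobs already exceed the frame, so nothing else fits.
        return total
    # best[t] = maximum weight that can be kept on time using at most t time units
    best = [0] * (deadline + 1)
    for p, w in jobs:
        for t in range(deadline, p - 1, -1):
            best[t] = max(best[t], best[t - p] + w)
    return total - best[deadline]
```

With a time frame of 10, delays 2 and 1 and two fixed server jobs of length 2 and 1, the common deadline is 4; the jobs (2, 5) and (2, 3) then stay on the server and the job (3, 4) goes to the cloud.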

Strong NP-Hardness of Scheduling Extended Chains
As already noted, this problem is at least weakly NP-hard, which follows from Theorem 1 as well as Theorem 2. We show that it is actually strongly NP-hard by giving a reduction from the strongly NP-hard $1 \mid r_j \mid \sum w_j U_j$ problem [11]. As in Section 2.1 we use decision variants of the considered problems, which yields results for both deadline restrained cost minimization and budget restrained makespan minimization.

Theorem 7
The SCS e problem is strongly NP-hard.
Proof $1 \mid r_j \mid \sum w_j U_j$ is defined as follows: Given a set of jobs $J = \{1, \dots, n\}$, each with processing time $p_j$, release date $r_j$, deadline $d_j$ and weight $w_j$, schedule the jobs (without preemption) on a single machine such that the sum of weights of late jobs is at most a given $b$ ($\sum w_j U_j \le b$). A job $j$ is late ($U_j = 1$) if it finishes processing after $d_j$, and $U_j = 0$ otherwise.
Given an instance of $1 \mid r_j \mid \sum w_j U_j$, create the following decision version of $SCS^e$. Note that we abbreviate "an edge $(j, j')$ with communication delay $c(j, j') = k$" simply by "an edge $c(j, j') = k$" to keep this readable. As per definition, create $S$ and $T$ with $p_s(S) = p_s(T) = 0$ and $p_c(S) = p_c(T) = \infty$. Create jobs $j_{pre}$ and $j_{post}$ with $p_s(j_{pre}) = p_s(j_{post}) = \infty$ and $p_c(j_{pre}) = p_c(j_{post}) = 0$ and edges $c(S, j_{pre}) = 0$ and $c(j_{post}, T) = 0$. Set $w_{max} = \max_{j \in J} w_j$ and $d_{max} = \max_{j \in J} d_j$. For every $j \in J$ create a job $j'$ with $p_s(j') = p_j$, $p_c(j') = w_j$ and edges $c(j_{pre}, j') = r_j$, $c(j', j_{post}) = w_{max} + d_{max} - d_j$. Set the deadline to $d' = w_{max} + d_{max}$ and the budget to $b' = b$. Trivially, in all schedules $S$ and $T$ are scheduled on the server, and $j_{pre}$ and $j_{post}$ on the cloud. Note that none of these jobs contributes processing time to the resulting schedule. For better comprehension we give an example of the structure in Figure 2.
It remains to show that there is a schedule with $\sum w_j U_j \le b$ for the original $1 \mid r_j \mid \sum w_j U_j$ problem if and only if there is a schedule with cost $\le b'$ and makespan $\le d'$ for the $SCS^e$ problem.
Assume that there is a schedule with $\sum w_j U_j \le b$. We can partition the jobs into two sets $J_{early}$ and $J_{late}$, which contain all jobs that are on time or late, respectively. Place all jobs that correspond to a job from $J_{late}$ on the cloud and start them immediately. All of them finish before $d' = w_{max} + d_{max}$, since $w_{max} \ge p_c(j')$. Place all remaining jobs ($J_{early}$) on the server and let them start at the same time as in the original schedule. Since no job starts before its release date, no communication delay is violated in the new schedule. Since all jobs from $J_{early}$ end before their deadline, no communication delay hinders us from scheduling $j_{post}$ and $T$ at $d' = w_{max} + d_{max}$. The cost of that schedule is equal to the value of $\sum w_j U_j$ in the original schedule and therefore $\le b$. One can confirm that the other direction works analogously by keeping the schedule of the jobs on the server intact and simply processing all jobs from the cloud after that schedule in any order.
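The construction used in this proof can be written down mechanically. The sketch below builds the processing times, the delay-labelled edges, the deadline and the budget from a list of $(p_j, r_j, d_j, w_j)$ tuples. The function name, the job labels ("pre", "post", "j1", ...) and the dictionary layout are illustrative assumptions, not notation from the paper.

```python
import math

INF = math.inf


def reduce_to_scs_e(jobs, b):
    """jobs: list of (p, r, d, w) tuples of a 1 | r_j | sum w_j U_j instance.

    Returns (proc, delay, deadline, budget), where proc maps a job name to
    (p_s, p_c) and delay maps an edge to its communication delay.
    """
    w_max = max(w for _, _, _, w in jobs)
    d_max = max(d for _, _, d, _ in jobs)
    # S and T are forced onto the server, the pre/post jobs onto the cloud.
    proc = {"S": (0, INF), "T": (0, INF), "pre": (INF, 0), "post": (INF, 0)}
    delay = {("S", "pre"): 0, ("post", "T"): 0}
    for idx, (p, r, d, w) in enumerate(jobs, start=1):
        name = f"j{idx}"
        proc[name] = (p, w)                       # server time p_j, cloud time w_j
        delay[("pre", name)] = r                  # release date as incoming delay
        delay[(name, "post")] = w_max + d_max - d  # deadline as outgoing delay
    return proc, delay, w_max + d_max, b
```

The size of the output is linear in the number of jobs, and all numbers are sums of at most two input numbers, which is what a reduction preserving strong NP-hardness requires.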
With an argumentation similar to the reduction above, one can show that the $1 \mid r_j \mid \sum w_j U_j$ problem is embedded in step 2d of this chapter's algorithm. This leads to the observation that we might be able to use approximation results for $1 \mid r_j \mid \sum w_j U_j$ to improve our handling of that case. Sadly, to the best of our knowledge, no approximation algorithms with a provable approximation factor are known for this problem. There are, however, practical algorithms that have been tested empirically. The approaches used include mixed integer programming [18], genetic algorithms [19] and branch-and-bound algorithms [20]. For more information we again refer to [17].

Constant Cardinality Source and Sink Dividing Cut
We introduce the concept of a maximum cardinality source and sink dividing cut. For $G = (\mathcal{J}, E)$, let $J_S$ be a subset of jobs such that $J_S$ includes $S$ and there are no edges $(j, k)$ with $j \in \mathcal{J} \setminus J_S$ and $k \in J_S$. In other words, in a running schedule, $J_S$ and $\mathcal{J} \setminus J_S$ could represent the already processed jobs and the jobs still to be processed, respectively. Denote by $\mathcal{J}^G_S$ the set of all such sets $J_S$. We define $\psi := \max_{J_S \in \mathcal{J}^G_S} |\{(j, k) \in E : j \in J_S,\ k \in \mathcal{J} \setminus J_S\}|$, the maximum number of edges between any set $J_S$ and $\mathcal{J} \setminus J_S$ in $G$. In a series-parallel task graph, $\psi$ is equal to the maximum anti-chain size of the graph.
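To make the definition concrete, the following sketch computes the maximum number of edges leaving a predecessor-closed set that contains the source, by plain enumeration of all candidate sets. It is exponential in the number of jobs and only meant for checking small examples by hand; the function name and the edge-list representation are assumptions for illustration.

```python
from itertools import combinations


def max_dividing_cut(nodes, edges, source):
    """Return the largest number of edges going from a set J_S to its
    complement, over all sets J_S that contain the source and have no
    incoming edge from outside."""
    others = [v for v in nodes if v != source]
    best = 0
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            js = {source, *extra}
            # J_S is only admissible if no edge enters it from outside.
            if any(k in js and j not in js for j, k in edges):
                continue
            crossing = sum(1 for j, k in edges if j in js and k not in js)
            best = max(best, crossing)
    return best
```

For a source with three parallel successors that all lead to the sink, the function returns 3, which matches the maximum anti-chain size of that series-parallel graph.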
In this chapter we discuss how to solve or approximate SCS problems with a constant size $\psi$, but otherwise arbitrary task graphs. We first consider the deadline confined cost minimization; in Theorem 9 we show how to adapt this to the budget confined makespan minimization. We give a dynamic program to optimally solve instances of SCS with arbitrary task graphs. At first we will not confine the algorithm to polynomial time. Consider a given problem instance with $G = (\mathcal{J}, E)$, its source $S$ and sink $T$, processing times $p_s(j)$ and $p_c(j)$ for each $j \in \mathcal{J}$, communication delays $c(i, j)$ for each $(i, j) \in E$ and a deadline $d$.
We define intermediate states of a (running) schedule as the states of our dynamic program (see Section 4). Such a state contains two types of variables. First, we have two global variables: the timestamp $t$ and the number of time steps $f_s$ for which the server has been unused. In other words, the server has not finished processing a job since $t - f_s$. The second type is defined per open edge. An open edge is an edge $e = (j, k)$ where $j$ has already been processed, but $k$ has not. For each such edge add the variables $e = (j, k)$ (the edge itself), $loc_j \in \{s, c\}$ denoting whether $j$ was processed on the server ($s$) or the cloud ($c$), and $f_j$ denoting the number of time steps that have passed since $j$ finished processing. If a job $j$ is contained in multiple open edges, $loc_j$ and $f_j$ are still only included once. Write the state as $[t, f_s, e_1 = (j_1, k_1), loc_{j_1}, f_{j_1}, \dots, e_m = (j_m, k_m), loc_{j_m}, f_{j_m}]$, where $e_1, \dots, e_m$ denote all open edges. Note that there is information that we purposefully drop from a state: the completion time and location of every processed job without open edges, as those are no longer important for future decisions.
There might be multiple ways to reach a specific state, but we only care about the minimum possible cost to achieve that state, which is the value of the state. We iteratively calculate the value of every reachable state with $t = 0, 1, 2, \dots$. We start with the trivial state $[t = 0, f_s = 0, e_1, \dots, e_m, loc_S = s, f_S = 0] = 0$, where $e_1, \dots, e_m$ are all edges in $E$ of the form $e_i = (S, j)$. This state forms the beginning of our (sorted) state list. We keep this list sorted in ascending order of state values (costs) at all times. We exhaustively calculate every state that is reachable during a specific time step, given the set of states reachable during the previous time step. Intuitively, we try every possible way to "fill up" the still undefined time windows $f_s$ and $f_j$.
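As an illustration of the state definition, the sketch below encodes a state as an immutable record and keeps the cheapest known cost per state. The record layout, the helper names and the use of a dictionary are assumptions made for this example; the text itself works with a list sorted by cost, and the transition step of the dynamic program is not shown.

```python
from typing import NamedTuple


class State(NamedTuple):
    t: int                  # current timestamp
    f_s: int                # time steps since the server last finished a job
    open_edges: frozenset   # edges (j, k) with j processed and k not yet processed
    loc: tuple              # sorted pairs (j, 's' or 'c'), one per job with an open edge
    f: tuple                # sorted pairs (j, time steps since j finished)


def initial_state(edges, source="S"):
    """Trivial start state: only the source is processed, on the server, at t = 0."""
    open_edges = frozenset(e for e in edges if e[0] == source)
    return State(0, 0, open_edges, ((source, "s"),), ((source, 0),))


def record(values, state, cost):
    """Store cost as the value of state if it beats the best cost seen so far."""
    if cost < values.get(state, float("inf")):
        values[state] = cost
```

Because the record is hashable, two different partial schedules that agree on all listed variables collapse into one entry, which is exactly the information a state deliberately forgets.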
Finally, we give the actual dynamic program in Algorithm 1. After the dynamic program has finished, we iterate through the state list one last time and take the first state $[t = d, f_s]$. The value of that state is the minimum cost possible to schedule $G$ in time $d$. One can easily adapt this procedure to also yield such a schedule by keeping, per state, a list of all processed jobs containing their location and completion time.
Proof At any point there are at most $O(d \cdot (d \cdot n)^{\psi})$ states in the state list. For every $t$ we look at every state. Since we never insert a state in front of the state we are currently inspecting (costs can only increase), this traverses the list exactly once. For each of those states we calculate every possible successor, of which there are $O(\psi)$, and traverse the state list an additional time to correctly insert or update the state. We iterate from $t = 0$ to $d$ and therefore get a runtime of $O(d^{2\psi+3} \cdot n^{2\psi+1})$.
Algorithm 1 DPfGG: Dynamic Program for General Graphs

Rounding the Dynamic Program
We use a rounding approach on DPfGG to get a program that is polynomial in $n = |\mathcal{J}|$, given that $\psi$ is constant. We scale $d$, $c$, $p_c$ and $p_s$ by a factor $\varsigma := \frac{\varepsilon \cdot d}{2n}$. Denote by $\hat{d} := \lceil \frac{d}{\varsigma} \rceil \le \frac{2n}{\varepsilon} + 1$, $\hat{p}_s(j) := \lfloor \frac{p_s(j)}{\varsigma} \rfloor$, $\hat{p}_c(j) := \lfloor \frac{p_c(j)}{\varsigma} \rfloor$ and $\hat{c}(x) := \lfloor \frac{c(x)}{\varsigma} \rfloor$. Note that we round up $d$ but everything else down. We run the dynamic program with the rounded values, but still calculate the cost of a state with the original unscaled values.
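The rounding step can be sketched as follows. Exact rational arithmetic is used so that floors and ceilings are not affected by floating-point error. The function name and the dictionary-based instance layout are assumptions for illustration, and all input values are assumed to be finite.

```python
from fractions import Fraction
from math import ceil, floor


def round_instance(d, p_s, p_c, c, eps):
    """Scale an instance by sigma = eps * d / (2n), with n the number of jobs.

    p_s, p_c: dicts job -> processing time; c: dict edge -> delay.
    The deadline is rounded up, everything else is rounded down.
    """
    n = len(p_s)
    sigma = Fraction(eps) * d / (2 * n)

    def down(x):
        return floor(Fraction(x) / sigma)

    return {
        "sigma": sigma,
        "d": ceil(Fraction(d) / sigma),
        "p_s": {j: down(v) for j, v in p_s.items()},
        "p_c": {j: down(v) for j, v in p_c.items()},
        "c": {e: down(v) for e, v in c.items()},
    }
```

The unscaled $p_c$ values should be kept alongside the rounded instance, since the cost of a state is still computed from them.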
We transform the output $\pi'$ to the unscaled instance by trying to start every job $j$ at the same (scaled back up) point in time as in the scaled schedule. Since we rounded down, there might now be points in the schedule where a job $j$ cannot start at the time it is supposed to. This might be due to the server not being free, a parent node of $j$ that has not been fully processed, or an unfinished communication delay. We look at the first time this happens, call the mandatory delay on $j$ $\Delta$, and increase the start time of every remaining job by $\Delta$. We repeat this process until all jobs are scheduled. We introduce no new conflicts with this procedure, since we always move everything together as a block. Call this new schedule $\pi$.
Theorem 8 Assuming a constant number $\psi$, DPfGG combined with the scaling technique finds a schedule $\pi$ with at most optimal cost and a makespan $\le (1+\varepsilon) \cdot d$ in time $poly(n, \frac{1}{\varepsilon})$, for any $\varepsilon > 0$.
Proof We start by proving the runtime of our algorithm. We can scale the instance in polynomial time; this holds for both scaling down and scaling back up. The dynamic program now takes time in $O(\hat{d}^{2\psi+3} \cdot n^{2\psi+1})$, where $\hat{d} \le \frac{2n}{\varepsilon} + 1$. Since $\psi$ is constant, this results in a dynamic program runtime in $poly(n, \frac{1}{\varepsilon})$. In the end we transform the schedule as described above; for that we go through the schedule once and delay every job no more than $n$ times. Trivially, this can be done in polynomial time as well.
Secondly, we show that the makespan of $\pi$ is at most $(1+\varepsilon) \cdot d$. Every valid schedule for the unscaled problem is also valid in the scaled problem, meaning that there is no possible schedule we overlook due to the scaling. In the other direction this might not hold. First, while scaling everything down we rounded the deadline up. This means that, scaled back, we might actually work with a deadline of up to $d + \varsigma$. Secondly, we had to delay the start of jobs to make sure that we only start jobs when it is actually possible. In the worst case we delay the sink $T$ a total of $n-2$ times, once for every job other than $S$ and $T$. Each time we delay all remaining jobs, we can bound the respective $\Delta < 2 \cdot \varsigma$. This is due to the fact that each of the delaying options cannot delay by more than $\varsigma$ (as that is the maximum timespan not regarded in the scaled problem) and only a direct predecessor job and the communication from it needing longer can coincide to form a non-parallel delay. Taking both of these into account, a valid schedule for the scaled problem might use time up to $d + \varsigma + (n-2) \cdot 2\varsigma = d + (2n-3) \cdot \varsigma \le d + 2n \cdot \varsigma = (1+\varepsilon) \cdot d$.
Lastly, we take a look at the cost of $\pi$. While rounding, we did not change the calculation of a state's value, and since every valid schedule of the unscaled instance is still valid in the scaled instance, we can conclude that the cost of $\pi$ is smaller than or equal to that of an optimal solution of the original problem.
Theorem 9 DPfGG combined with the scaling technique and a binary search over the deadline yields an FPTAS for the cost budget makespan problem, for graphs with a constant number ψ.
Proof Theorem 8 can be adapted to solve this, assuming that we know a reasonable makespan estimate of an optimal solution to use in our scaling factor.During the algorithm discard any state with costs bigger than the budget and terminate when the first state [t, fs] is reached.The t gives us the makespan.
Using a makespan estimate that is too big will lead to a rounding error that is not bounded by $\varepsilon \cdot mspan_{OPT}$, while an estimate that is too small might not yield a solution. To solve this, we start with an estimate that is purposefully large. Let $d_{max} = \sum_{j \in \mathcal{J}} p_s(j)$ be the sum over all processing times on the server. There is always a schedule with cost 0 and makespan $d_{max}$. We run our algorithm with the scaling factor $\varsigma_0 := \frac{\varepsilon \cdot d_{max}}{4n}$. We iteratively repeat this process with scaling factor $\varsigma_i = \frac{1}{2^i} \varsigma_0$ for increasing $i$, starting with 1. At the same time we halve the original deadline estimate in each step, which leads to $\hat{d}$, and therefore the runtime, staying the same in each iteration. We end the process when the algorithm does not find a solution for the current $i$ and deadline estimate. This implies that there is no schedule with the wanted cost budget and a makespan smaller than or equal to $\frac{1}{2^i} d_{max}$ (in the unscaled instance), and therefore $\frac{1}{2^i} d_{max} < mspan_{OPT}$. We look at the result of the previous run $i-1$: The scaled result was optimal, therefore the unscaled version has a makespan of at most $mspan_{OPT} + 2n \cdot \varsigma_{i-1} = mspan_{OPT} + \varepsilon \cdot \frac{1}{2^i} d_{max} < (1+\varepsilon) \cdot mspan_{OPT}$.
It should be easy to infer from Lemma 2 that each iteration of this process has polynomial runtime. Combined with the fact that we iterate at most $\log d_{max}$ times, we get a runtime that is in $poly(n, \frac{1}{\varepsilon})$.
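The halving procedure described above can be sketched as a small driver loop. The callable `solve` is a hypothetical stand-in for one run of the scaled dynamic program: it is assumed to take a deadline estimate and a scaling factor and to return a result, or `None` if no schedule within the budget is found. The function name and the iteration cap are likewise assumptions for illustration.

```python
from math import ceil, log2


def halving_search(d_max, n, eps, solve):
    """Halve the deadline estimate and the scaling factor together until
    solve() fails, then return the result of the last successful run."""
    estimate = d_max
    sigma = eps * d_max / (4 * n)
    best = solve(estimate, sigma)  # d_max is always achievable at cost 0
    # At most about log(d_max) halvings are needed.
    for _ in range(max(1, ceil(log2(d_max)))):
        result = solve(estimate / 2, sigma / 2)
        if result is None:
            break
        best, estimate, sigma = result, estimate / 2, sigma / 2
    return best
```

Since the estimate and the scaling factor shrink by the same factor, their ratio, and with it the size of the rounded instance, is identical in every call.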

Remark 2
The results of this chapter work, as written, for a constant $\psi$. Note that for series-parallel digraphs this is equivalent to a constant anti-chain size. The algorithms can also be adapted to work on any graph with constant anti-chain size if the communication delays are bounded by some constant or are locally small. Delays are locally small if, for every $(j, k) \in E$, $c(j, k)$ is smaller than or equal to every $p_c(k')$, $p_s(k')$, $p_c(j')$ and $p_s(j')$, where $k'$ ranges over the direct successors of $j$ and $j'$ over the direct predecessors of $k$ [21].

Strong NP-Hardness
In this section, we consider more involved reductions than in Section 2 in order to gain a better understanding of the complexity of the problem. First, we show that a classical result due to Lenstra and Rinnooy Kan [12] can be adapted to prove that already the variant of SCS without communication delays and with processing times equal to one or two is NP-hard. This already implies strong NP-hardness. Remember that we showed in Section 2 that SCS without communication delays and with unit processing times can be solved in polynomial time. Hence, it seems natural to consider the problem variant with unit processing times and communication delays. We prove this problem to be NP-hard as well via an intricate reduction from 3SAT, which can be considered the main result of this section. Lastly, we show that the latter reduction can easily be modified to obtain a strong inapproximability result regarding the general variant of SCS and the cost objective.

No Delays and Two Sizes
We show strong hardness for the case without communication delays and $p_c(j), p_s(j) \in \{1, 2\}$ for each job $j$. The reduction is based on a classical result due to Lenstra and Rinnooy Kan [12]. Let $G = (V, E)$, $k$ be a clique instance with $|E| > \binom{k}{2}$, and let $n = |V|$ and $m = |E|$. We construct an instance of the cloud server problem in which the communication delays all equal zero and both the deadline and the cost bound are $2n + 3m$. There is one vertex job $J(v)$ for each node $v \in V$ and one edge job $J(e)$ for each edge $e \in E$, and $J(\{u, v\})$ is preceded by $J(u)$ and $J(v)$. The vertex jobs have size 1 and the edge jobs size 2, both on the server and on the cloud.
Furthermore, there is a dummy structure. First, there is a chain of $2n + 3m$ many jobs called the anchor chain. The $i$-th job of the anchor chain is denoted $A(i)$ for each $i \in \{0, \dots, 2n + 3m - 1\}$ and has size 1 on the cloud and size 2 on the server. Next, there are gap jobs, each of which has size 1 both on the server and the cloud. Let $k^* = \binom{k}{2}$ and let $v \prec w$ indicate that an edge from $v$ to $w$ is included in the task graph. There are four types of gap jobs, namely
Lastly, there are the source and the sink, which precede or succeed all of the above jobs, respectively.
Proof First note that in a schedule with deadline $2n + 3m + 1$ the anchor chain has to be scheduled completely on the cloud. If the schedule additionally satisfies the cost bound, all the other jobs have to be scheduled on the server. Furthermore, for the gap and anchor chain jobs there is only one possible time slot due to the deadline. In particular, $A(i)$ starts at time $i$, G(  4). The $m$ edge jobs have to be scheduled in the length 2 slots, and hence the vertex jobs have to be scheduled in the length 1 slots.
$\Longrightarrow$: Given a $k$-clique, we can position the $k$ clique vertices in the first $k$ length 1 slots, the corresponding $k^*$ edges in the first length 2 slots, the remaining vertex jobs in the remaining length 1 slots, and the remaining edge jobs in the remaining length 2 slots.
⇐= : Given a feasible schedule, the vertices corresponding to the first length 1 slots have to form a clique.This is the case, because there have to be k * edge jobs in the first length 2 slots and all of their predecessors are positioned in the first length 1 slots.This is only possible if these edges are the edges of a k-clique.

Hence, we have:
Theorem 10 The SCS problem with job sizes 1 and 2 and without communication delays is strongly NP-hard.
In the above reduction the server and the cloud machines are unrelated relative to each other due to different sizes of the anchor chain jobs.However, it is easy to see that the reduction can be modified to a uniform setting where the cloud machines have speed 2 and the server speed 1.If we allow communication delays, even identical machines can be achieved.

Unit Size and Unit Delay
We consider a unit time variant of our model in which all $p_c = p_s = 1$ and all $c = 1$. Note that this also implies that the server and the cloud are identical machines (the cloud still produces costs, while the server does not). As usual for reductions, we look at the decision variant of the problem: Is there a schedule with cost at most $b$ that adheres to the deadline $d$?

Theorem 11
The SCS 1 problem is strongly NP-hard.
We give a reduction $3SAT \le_p SCS^1$. Let $\varphi$ be any boolean formula in 3-CNF, denote the variables in $\varphi$ by $X = \{x_1, x_2, \dots, x_m\}$ and the clauses by $C^\varphi_1, C^\varphi_2, \dots$. Before we define the reduction function, we want to give an intuition and a few core ideas used in the reduction.
The main idea is that we ensure that nearly everything has to be processed on the cloud, there are only a few select jobs that can be handled by the server.For each variable there will be two jobs, of which one can be processed on the server, the selection will represent an assignment.For each clause there will be a job per literal in that clause, only one of which can be processed on the server, and only if the respective variable job is 'true'.Only if for each variable and for each clause one job is handled by the server the schedule will adhere to both the cost and the time limits.
A core technique of the reduction is the usage of an anchor chain.An anchor chain of length l consists of two chains of the same length l := d − 2, where we interlock the chains by inserting (a i , b i+1 ) and (b i , a i+1 ) for two parallel edges (a i , a i+1 ) and (b i , b i+1 ).The source S is connected to the two start nodes of the anchor chain, the two nodes at the end of the chain are connected to T .
Lemma 4 If the task graph of an $SCS^1$ problem contains an anchor chain, every valid schedule has to schedule all jobs of the chain except one of $a_1, b_1$ and one of $a_l, b_l$ on the cloud. For every job $a_i, b_i$ with $1 < i < l$, the time step in which it finishes processing on the cloud is $i + 1$ in every valid schedule.
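The interlocked structure of the anchor chain is easy to generate explicitly. The sketch below lists its edges for a given length; the node labels `("a", i)` and `("b", i)` and the function name are assumptions for illustration.

```python
def anchor_chain_edges(length, source="S", sink="T"):
    """Edges of an anchor chain: two parallel chains a_1..a_l and b_1..b_l
    with additional cross edges (a_i, b_{i+1}) and (b_i, a_{i+1})."""
    edges = [(source, ("a", 1)), (source, ("b", 1))]
    for i in range(1, length):
        edges += [
            (("a", i), ("a", i + 1)),
            (("b", i), ("b", i + 1)),
            (("a", i), ("b", i + 1)),  # cross edge
            (("b", i), ("a", i + 1)),  # cross edge
        ]
    edges += [(("a", length), sink), (("b", length), sink)]
    return edges
```

Every inner job has both jobs of the previous level as predecessors, so at most one job of a level pair could ever sit on the server without forcing a communication delay onto the next level.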

Finally we give the reduction function
We define $G$ by constructively giving which jobs and edges are created by $f$. Create an anchor chain of length $d - 2$; this will be used to limit parts of a schedule to certain time frames. Note that by Lemma 4 we know that every valid schedule of $G = (\mathcal{J}, E), d, b$ has every node pair of the anchor chain (besides the first and last) on the cloud at a specific fixed timestamp. More specifically, the completion times of $a_i$ and $a_{i+j}$ differ by exactly $j$ time units. For each variable $x_i \in X$ create two jobs $j_{x_i}$ and $j_{\bar{x}_i}$ and edges $(a_{1+i}, j_{x_i})$, $(a_{1+i}, j_{\bar{x}_i})$ and $(j_{x_i}, a_{5+i})$, $(j_{\bar{x}_i}, a_{5+i})$. For each clause $C^\varphi_p$ create a clause job $j_{C^\varphi_p}$ and edges $(a_{7+m+p}, j_{C^\varphi_p})$ and $(j_{C^\varphi_p}, a_{9+m+p})$. Let $L^p_1, L^p_2, L^p_3$ be the literals in $C^\varphi_p$. Create jobs $j_{L^p_1}, j_{L^p_2}, j_{L^p_3}$ and edges $(j_{L^p_1}, j_{C^\varphi_p})$, $(j_{L^p_2}, j_{C^\varphi_p})$, $(j_{L^p_3}, j_{C^\varphi_p})$ for these literals. Connect every literal job $j_{L^p_1}$ to the corresponding variable job $j_{x_i}$ or $j_{\bar{x}_i}$ by a chain of length $1 + (m - i) + p$. Also create an edge from $a_{3+i}$ to the start of the created chain and an edge from the end of the chain to $a_{6+m+p}$.
Fig. 6: Schematic representation of the variable and clause gadgets and their connection.
It remains to show that there is a schedule of length at most d with costs at most b in f (φ) = G, d, b if and only if there is a satisfying assignment for φ.
Lemma 5 In a deadline adhering schedule for $f(\varphi) = G, d, b$ every job in the anchor chain (except one at the front and one at the end), every job in the variable and clause literal connecting chains and every clause job has to be scheduled on the cloud.
Proof By Lemma 4 we already know that every node in the anchor chain except one of $a_1, b_1$ and one of $a_l, b_l$ has to be scheduled on the cloud. We also know that the jobs in the anchor chain have fixed time steps in which they have to be processed. We look at some chain and its connection to the anchor chain. The start of the chain of length $1 + (m - i) + p$ is connected to $a_{3+i}$, the end to $a_{6+m+p}$. Between the end of $a_{3+i}$ and the start of $a_{6+m+p}$ there are $6 + m + p - 1 - (3 + i) = 2 + m + p - i$ time steps. So with the processing time required to schedule all $1 + (m - i) + p$ jobs of the chain, there is only one free time step, but we would need at least 2 free time steps to cover the communication cost to and from the server. (Recall that both $a_{3+i}$ and $a_{6+m+p}$ have to be processed on the cloud.) The same simple argument fixes each clause job to a specific time step on the cloud.
Lemma 6 In a deadline adhering schedule for $f(\varphi) = G, d, b$ only one of $j_{x_i}$ and $j_{\bar{x}_i}$ can be processed on the server for every variable $x_i \in X$. The same is true for $j_{L^p_1}, j_{L^p_2}, j_{L^p_3}$ for every clause $C^\varphi_p$.
Proof $j_{x_i}$ and $j_{\bar{x}_i}$ are both fixed to the same time interval via the edges $(a_{1+i}, j_{x_i})$, $(a_{1+i}, j_{\bar{x}_i})$ and $(j_{x_i}, a_{5+i})$, $(j_{\bar{x}_i}, a_{5+i})$. Since $a_{1+i}$ and $a_{5+i}$ will be processed on the cloud and keeping communication delays in mind, only the middle of the three time steps in between can be used to schedule $j_{x_i}$ or $j_{\bar{x}_i}$ on the server. Since the server is only a single machine, only one of them can be processed on the server. Note that the other job can be scheduled a time step earlier, which we will use later. The argument for $j_{L^p_1}, j_{L^p_2}, j_{L^p_3}$ works analogously.
Lemma 7 There is a deadline adhering schedule for of $C^\varphi_p$ has to be processed in time step $9 + m + p$ (between $a_{7+m+p}$ and $a_{9+m+p}$). Therefore, $j_{L^p_1}$ has to be processed no later than $8 + m + p$ or $7 + m + p$ if it is processed on the cloud or server, respectively. Let $j_{x_i}$ be the variable job connected to $j_{L^p_1}$ via a connection chain. If $j_{x_i}$ is true (scheduled on the cloud), it can finish processing at time step $3 + i$, which does not delay the start of the connection chain (which is connected to $a_{3+i}$, finishing in time step $4 + i$). This means that the chain can finish in time step $4 + i + 1 + (m - i) + p = 5 + m + p$; the time step $6 + m + p$ can be used for communication, allowing $j_{L^p_1}$ to be processed by the server in $7 + m + p$. If $j_{x_i}$ is false (scheduled on the server), it finishes processing at time step $4 + i$, which, combined with the induced communication delay, delays the start of the chain by 1. Therefore, the chain only finishes in time step $6 + m + p$, and $j_{L^p_1}$ has to be processed on the cloud, since there is not enough time for the communication back and forth.
Trivially, the same argument holds true for $j_{L^p_2}$ and $j_{L^p_3}$.
It should be easy to see that the reduction function $f$ is computable in polynomial time. Combined with Lemma 7 this concludes the proof of our reduction $3SAT \le_p SCS^1$. The correctness of Theorem 11 follows directly from that.

The General Case
Adapting the previous reduction, we can show an even stronger result for the general case of SCS. Basically, we are able to degenerate the reduction output in such a way that a satisfying assignment results in a schedule with cost 0, while every other assignment (schedule) has costs of at least 1. This also means that there is no approximation algorithm for this problem with a fixed multiplicative performance guarantee, if P ≠ NP.
This reduction uses processing times and communication delays of 0, $\infty$ and values in between. Note that $\infty$ can simply be replaced by $d + 1$. To keep the following part readable we again abbreviate "an edge $(j, j')$ with communication delay $c(j, j') = k$" simply by "an edge $c(j, j') = k$". We follow the same general structure (an anchor chain as well as variable, clause and connection gadgets). The anchor chain now looks as follows: For every time step $i$ create two jobs $a_i$ and $a'_i$ with $p_s(a_i) = 0$, $p_c(a_i) = \infty$, $p_s(a'_i) = \infty$, $p_c(a'_i) = 0$ and an edge $c(a_i, a'_i) = 0$. These chain links are then connected by an edge $c(a'_i, a_{i+1}) = 1$. Finally, we create $c(S, a_1) = 1$ and $c(a_d, T) = 0$. It should be easy to see that every schedule will process $a_i$ and $a'_i$ in time step $i$ on the server and the cloud, respectively. This gives us anchors to the server and to the cloud for every time step, without inducing congestion or costs. Since the anchor jobs themselves have a processing time of 0, the "usable" time interval between some $a_i$ and $a_{i+1}$ is one full time step.
For each variable $x_i \in X$ create two jobs $j_{x_i}, j_{\bar{x}_i}$ with $p_s(j_{x_i}) = p_s(j_{\bar{x}_i}) = 1$ and $p_c(j_{x_i}) = p_c(j_{\bar{x}_i}) = 0$. Create edges $c(a_i, j_{x_i}) = 1$, $c(a_i, j_{\bar{x}_i}) = 1$ and $c(j_{x_i}, a_{i+1}) = 0$, $c(j_{\bar{x}_i}, a_{i+1}) = 0$. In short, only one of them can be processed on the server, the other on the cloud. Both will finish in time step $i + 1$; the one processed on the server is true. Therefore, processing both on the cloud is possible, but not helpful.
For each clause $C^\varphi_p$ create a clause job $j_{C^\varphi_p}$ with $p_s(j_{C^\varphi_p}) = \infty$, $p_c(j_{C^\varphi_p}) = 0$ and edges $c(a'_{5+m+3p}, j_{C^\varphi_p}) = \infty$ and $c(j_{C^\varphi_p}, a'_{6+m+3p}) = \infty$. This means that $j_{C^\varphi_p}$ has to finish processing by time step $6 + m + 3p$. Let $L^p_1, L^p_2, L^p_3$ be the literals in $C^\varphi_p$. Create jobs $j_{L^p_1}, j_{L^p_2}, j_{L^p_3}$, each with $p_c = p_s = 1$, and edges $c(j_{L^p_1}, j_{C^\varphi_p}) = 0$, $c(j_{L^p_2}, j_{C^\varphi_p}) = 0$, $c(j_{L^p_3}, j_{C^\varphi_p}) = 0$ for these literals. Create edges $c(a_{3+m+3p}, j_{L^p_1}) = 0$, $c(a_{3+m+3p}, j_{L^p_2}) = 0$ and $c(a_{3+m+3p}, j_{L^p_3}) = 0$, so that, in theory, all three of the literal jobs can be processed on the server, finishing in time steps $4 + m + 3p$, $5 + m + 3p$ and $6 + m + 3p$, respectively. Lastly, connect every literal job $j_{L^p_1}$ to the corresponding variable job $j_{x_i}$ (or $j_{\bar{x}_i}$) by an edge with communication delay $m - i + 3p + 3$. Since $j_{x_i}$ (or $j_{\bar{x}_i}$) finishes processing in time step $i + 1$, this means that $j_{L^p_1}$ can start no earlier than $m + 3p + 4$ (and therefore finish processing in $5 + m + 3p$) if $j_{x_i}$ (or $j_{\bar{x}_i}$) was processed on the cloud.
Recall here, that a variable job being scheduled on the server denotes that it is true.So only a literal job that evaluates to true, can be scheduled so that it finishes processing in time step 4 + m + 3p on the cloud.
It follows directly, that a schedule for this construction will have costs of 0 if and only if the assignment derived from the placement of the variable jobs fulfills every clause.
Theorem 12 There is no approximation algorithm for SCS that has a fixed performance guarantee, assuming that P ≠ NP.

Unit Size and Unit Delay -And no Delay
As the last step of this paper we explore simple algorithms on unit size instances with arbitrary task graphs. Recall that we proved these to be strongly NP-hard. We use resource augmentation and ask: given an $SCS^1$ problem instance with deadline $d$, find a schedule in polynomial time that has a makespan of at most $(1 + \varepsilon) \cdot d$ and approximates the optimal cost with regard to the actual deadline $d$.
If there is a chain of length $d$ or $d - 1$, that chain has to be scheduled on the server, since there is no time for the communication delay. For instances with a chain of length $d$ that is trivially optimal; for those with $d - 1$ we can check in polynomial time if any other job also fits on the server, again finding an optimal solution. From now on we assume that there is no chain of length more than $d - 2$.
First, construct a schedule that places every job on the cloud as early as possible. The resulting schedule from time step (ts) 1 to $(1 + \varepsilon) \cdot d$ looks as follows: one ts of communication, at most $d - 2$ ts of processing on the cloud, another ts for communication, followed by at least $\varepsilon d$ empty ts. Now pull (one of) the last job(s) processed on the cloud to the last empty ts and process it on the server instead. Repeat this process until the last job cannot be moved to the server anymore. Do the whole procedure again, but this time starting with the cloud schedule at the end of the schedule, and each time pulling the first job to the beginning. Keep the result with lower costs. Note that one can always fill the ts used solely for communicating from the server to the cloud by processing one job on the server that would otherwise be one of the first jobs processed on the cloud (the same holds for the other direction).
Theorem 13 The described algorithm yields a schedule with an approximation factor of (1 + ε)/(2ε) while having a makespan of at most (1 + ε) · d.
Proof Case n ≤ (1 + ε)d: The algorithm places all jobs on the server, the cost is 0 and therefore optimal.
Case (1 + ε)d < n < (1 + 2ε)d: Assume that the preliminary cloud-only schedule needs d − 2 ts on the cloud; if that is not the case, we stretch the schedule to that length. There are n jobs distributed onto d − 2 ts. Therefore, either from the front or from the end, there is an interval of length For ε ≥ 0.5 it holds that:
Case (1 + 2ε)d ≤ n: In this case we simply observe that our algorithm places at least εd many jobs on the server. For ε ≥ 1 it holds that:

No Delays and Identical Machines
We design a simple heuristic for the case in which the server and the cloud machines behave the same, that is, p_c(j) = p_s(j) for each job j (except for the source and sink), and the communication delays all equal zero. In this case, we may define the length of a chain in the task graph as the sum of the processing times of the jobs in the chain. The first step in the algorithm is to identify a longest chain in the task graph, which can be done in polynomial time. The jobs of the longest chain are scheduled on the server and the remaining jobs on the cloud, each as early as possible. Now, the makespan of the resulting schedule is the length of a longest chain, which is optimal (or better), and there are no idle times on the server. However, the schedule may not be feasible since the budget may be exceeded. Hence, we repeatedly do the following: If the budget is still exceeded, we pick a job scheduled on the cloud with maximal starting time and move it onto the server right before its first successor (which may be the sink). Some jobs on the server may be delayed by this, but we can do so without causing idle times. If all the processing times are equal, this procedure produces an optimal solution, and otherwise there may be an additive error of up to the maximal job size. Hence, we have:
Theorem 14 There is a 2-approximation for SCS without communication delays and with identical server and cloud machines.
It is easy to see that the analysis is tight by considering an instance with three jobs: one with size b, one with size b + ε, and one with size 2ε. The first job precedes the last one. Our algorithm will place everything on the server, while the first job is placed on the cloud in the optimal solution.
Note that we can take a similar approach to find a solution with respect to the cost objective by placing more and more jobs on the server as long as the deadline is still adhered to. However, an error of one job can result in an unbounded multiplicative error in the objective in this case. On the other hand, it is easy to see that in the case with unit processing times, there will be no error at all in both procedures, yielding:
Corollary 2 The variant of SCS without communication delays and with unit processing times can be solved in polynomial time with respect to both the makespan and the cost objective.
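The first step of the heuristic in this subsection, finding a longest chain and using it as the initial server load, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names, the dictionary-based graph encoding, and the labels "server" and "cloud" are hypothetical and not taken from the paper, and the subsequent repair step that moves jobs to the server while the budget is exceeded is not included.

```python
from collections import deque


def longest_chain(jobs, edges, p, sink):
    """Return (length, chain) for a longest chain ending in the sink.

    jobs: iterable of job names, edges: list of (i, j) precedence pairs,
    p: dict job -> non-negative processing time (same on server and cloud).
    """
    jobs = list(jobs)
    succ = {j: [] for j in jobs}
    indeg = {j: 0 for j in jobs}
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1

    best = {j: p[j] for j in jobs}   # length of a longest chain ending in j
    pred = {j: None for j in jobs}   # predecessor of j on that chain
    queue = deque(j for j in jobs if indeg[j] == 0)
    seen = 0
    while queue:                     # process jobs in topological order
        i = queue.popleft()
        seen += 1
        for j in succ[i]:
            if best[i] + p[j] >= best[j]:
                best[j], pred[j] = best[i] + p[j], i
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    if seen != len(jobs):
        raise ValueError("task graph is not acyclic")

    chain, j = [], sink
    while j is not None:             # walk back from the sink
        chain.append(j)
        j = pred[j]
    return best[sink], chain[::-1]


def initial_placement(jobs, edges, p, sink):
    """Longest chain on the server, every other job on the cloud."""
    _, chain = longest_chain(jobs, edges, p, sink)
    on_server = set(chain)
    return {j: ("server" if j in on_server else "cloud") for j in jobs}
```

Since every job precedes the sink in the model, a longest chain can be assumed to end there, which is why the sketch backtracks from the sink.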

Generalizations of Server Cloud Scheduling
In this section we introduce some generalizations of SCS. We consider different aspects, from multiple clouds and server machines to direction-specific delays. We sketch how to adapt our algorithms for SCS e and SCS ψ to cover those new generalizations.

Changes in the Definitions
We briefly define the changes to the model that we explore in this section.

Machine Model
So far we imagined a single server machine and one homogeneous cloud in our problem definition. Now, instead of a single server machine there can be any (constant) number of identical server machines: server = {s_1, ..., s_z}. Instead of one homogeneous cloud there can be any number of different cloud contexts: clouds = {c_1, ..., c_k}. Each cloud context still consists of an unlimited number of parallel machines.

Jobs
Jobs are still given as a task graph G = (J, E). A job j ∈ J has processing time p_s(j) on any server machine and processing time p_{c_i}(j) on a machine of cloud context c_i. For an edge e = (i, j) and machine contexts m_1, m_2 ∈ {s, c_1, ..., c_k}, the communication delay is c_{m_1⊲m_2}(i, j) ∈ N_0, which means that after job i has finished on a machine of type m_1, job j has to wait an additional c_{m_1⊲m_2}(i, j) time steps before it can start on a machine of type m_2. For m_1 = m_2 we set c_{m_1⊲m_2}(i, j) = 0. Note that this function does not need to be symmetric, i.e., c_{m_1⊲m_2}(i, j) and c_{m_2⊲m_1}(i, j) may be unequal.

Costs and Schedules
Previously we defined cost simply by "time spent on the cloud". When considering multiple clouds, that is not sensible anymore. A faster cloud will not be universally cheaper than a slower one. We define a cost function based on the cloud context and job, cost : J × clouds → N_0. A schedule still consists of C : J → N_0 (which maps jobs to their completion time), but instead of a partition we give a mapping function η : J → {s_1, ..., s_z} ∪ {c_1, ..., c_k}. Note that s_i refers to one specific server machine, while c_i refers to a cloud context, consisting of infinitely many machines.
We call a schedule π = (C, η) valid if and only if the following conditions are met:
a) There is always at most one job processing on each server.
b) Tasks are not started before the previous tasks have been finished and the required communication is done.
The makespan (mspan) of a schedule is still given by the completion time of the sink T: C(T). The cost (cost) of a schedule is given by Σ_{j ∈ jobs : η(j) ∈ clouds} cost(j, η(j)).
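To make the two validity conditions and the objective values concrete, the following sketch checks a given schedule (C, η) and evaluates its makespan and cost. It is one possible formalization and not the paper's notation: it assumes that a job j occupies the half-open interval from C(j) minus its processing time up to C(j), that server machines are named s1, s2, ... and clouds by names not starting with "s", and the function and parameter names are hypothetical.

```python
def context(machine):
    """Server machines s1, s2, ... share context 's'; a cloud is its own context."""
    return "s" if machine.startswith("s") else machine


def evaluate(jobs, edges, p, delay, cost, C, eta, sink):
    """Return (valid, makespan, total_cost) of the schedule (C, eta).

    p[j][m]: processing time of j in context m,
    delay[(i, j)][(m1, m2)]: direction-specific delay for m1 != m2,
    cost[(j, c)]: cost of running j on cloud context c.
    """
    def start(j):
        return C[j] - p[j][context(eta[j])]

    if any(start(j) < 0 for j in jobs):
        return False, None, None

    # a) at most one job at a time on each individual server machine
    busy = {}
    for j in jobs:
        if context(eta[j]) == "s":
            busy.setdefault(eta[j], []).append((start(j), C[j]))
    for intervals in busy.values():
        intervals.sort()
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start < prev_end:
                return False, None, None

    # b) a job starts only after each predecessor finished and the delay passed
    for i, j in edges:
        m1, m2 = context(eta[i]), context(eta[j])
        wait = 0 if m1 == m2 else delay[(i, j)][(m1, m2)]
        if C[i] + wait > start(j):
            return False, None, None

    total = sum(cost[(j, eta[j])] for j in jobs if context(eta[j]) != "s")
    return True, C[sink], total
```

Note that the delay lookup uses machine contexts, so moving between two different server machines incurs no delay, while moving between two different clouds may.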

Revisiting SCS e
We briefly sketch how to adapt the algorithm from Section 3 to incorporate the previously defined changes to the model. We will use the observations that multiple server machines only affect the scheduling of parallel parts and that we can always calculate an optimal cloud location for a job in a given situation (part of the schedule, time frame, and location of predecessor and successor).
Theorem 15 There is a (4 + ε)-approximation algorithm for the budget restrained makespan minimization problem on extended chains, even when there are z server machines, k different cloud contexts, the communication delays are directionally dependent on the machine context, and costs are given as an arbitrary cost function cost : J × clouds → N 0 .
Proof We adapt the pseudo-polynomial algorithm from Section 3. Given a feasible makespan estimate T (T ≥ mspan_OPT), that algorithm calculates a schedule with makespan of at most min{2T, 2·mspan_OPT}. The adapted version incorporates the changes to the model and calculates a schedule with makespan of at most min{4T + ε′, 4·mspan_OPT + ε′}. The only change in the state description is that loc ∈ {s, c_1, ..., c_k} instead of loc ∈ {s, c}.
As the state description is used for the chain parts of the extended chain, we do not differentiate the server machines here. The creation of the state extension list Extensions_j (each entry of the form [∆t, loc_{j−1} → loc_j] = cost) has the following changes:
• Instead of the four combinations s → s, s → c, c → s, c → c, we consider all combinations from {s, c_1, ..., c_k} × {s, c_1, ..., c_k}.
• Substitute the corresponding values, for example [p_c(j) + c(j − 1, j), s → c] = p_c(j) becomes [p_{c_i}(j) + c_{s⊲c_i}(j − 1, j), s → c_i] = cost(j, c_i).
• If there is a parallel subgraph between j − 1 and j we adapt the calculation in the following way:
  - Calculate ∆_max as before (the sum over all processing times on the server plus the biggest relevant in- and outgoing communication delays).
  - Iterate over ∆_i in {0, ..., ∆_max}:
    * As before, check for each job if it fits: (1) only on the servers, (2) not on the servers but on at least one cloud context, (3) on both, (4) on none. If at least one job falls into (4), break.
    * Calculate for each job j in (2) or (3) the cheapest fitting option to schedule that job on some available cloud in time frame ∆_i. Use that cost c_j for j for the remainder of the iteration.
    * Greedily put jobs in (1) onto server machines (1 to z) until the current server has load ≥ ∆_i, then proceed with the next machine and so on. If not all jobs in (1) can be placed this way, break, as there is not enough space to place jobs on the server that do not fit on the cloud in the given time frame.
    * Sort the jobs in (3) by their ratio of cost c_j to processing time on the server (highest to lowest cost per time). Continue by greedily placing those on the server machines as before. When all jobs in (3) are placed, or all server machines have load ≥ ∆_i, put all remaining jobs from (3) on their corresponding cheapest cloud context.
    * Put all jobs from (2) on their corresponding cheapest cloud context.
    * Insert time in the front and back corresponding to the biggest communication delay invoked by the (sub-)schedule for the parallel part.
The rest of the algorithm behaves as before. The changes to state extensions spanning a parallel subgraph calculate solutions that have at most optimal cost for a time frame of ∆_i, while using a time frame of 4∆_i. The factor 4 corresponds to: at most 2∆_i time for all in- and outgoing communication delays, since the communication delays have to fit into ∆_i to be considered, and at most 2∆_i time for our greedy packing of the server machines, since we can add a job of size ∆_i to a machine currently having load ∆_i − ε. It should be easy to see that the greedy packing of "highest cost jobs", with what is essentially resource augmentation of a multiple knapsack problem, gives at most optimal cost. Note that we could also utilize a PTAS for multiple knapsack here to stay in a time frame of 3∆_i, but we want to find a solution with optimal cost (or lower), to remain strictly budget adhering. It remains to simply use the same scaling technique used in Section 3 to get the (4 + ε)-approximation.
If the communication delays are constant, the result can be easily adapted to yield a (2 + ε)-approximation by getting rid of the added time for communication delays.
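The greedy packing of a parallel part for one fixed time frame ∆_i, as described in the proof above, could look roughly like the following sketch. It rests on simplifying assumptions that are not from the paper: each job is given as a pair of its server processing time and a dictionary mapping each cloud context to the total time it would need there (processing plus the relevant delays) and its cost, a job "fits" if the respective time is at most ∆_i, and the final padding for communication delays is left out. All names are hypothetical.

```python
def pack_parallel_part(jobs, z, delta):
    """Assign the jobs of one parallel part for the time frame delta.

    jobs: name -> (p_s, {cloud: (time_needed, cost)}), z: number of servers.
    Returns name -> machine ("s1", ..., or a cloud name), or None if this
    time frame has to be skipped.
    """
    only_server, only_cloud, both = [], [], []
    cheapest = {}                        # name -> (cloud, cost c_j)
    for name, (p_s, clouds) in jobs.items():
        fitting = {c: cost for c, (t, cost) in clouds.items() if t <= delta}
        if fitting:
            cloud = min(fitting, key=fitting.get)
            cheapest[name] = (cloud, fitting[cloud])
        if p_s <= delta and fitting:
            both.append(name)            # category (3)
        elif p_s <= delta:
            only_server.append(name)     # category (1)
        elif fitting:
            only_cloud.append(name)      # category (2)
        else:
            return None                  # category (4): fits nowhere

    def ratio(name):                     # cloud cost per unit of server time
        p_s = jobs[name][0]
        return cheapest[name][1] / p_s if p_s > 0 else float("inf")

    both.sort(key=ratio, reverse=True)

    assignment, machine, load = {}, 0, 0

    def place(name):
        nonlocal machine, load
        if machine >= z:
            return False
        assignment[name] = f"s{machine + 1}"
        load += jobs[name][0]
        if load >= delta:                # machine is full, move to the next
            machine, load = machine + 1, 0
        return True

    for name in only_server:
        if not place(name):
            return None                  # (1)-jobs do not fit on the servers
    for name in both:
        if not place(name):
            assignment[name] = cheapest[name][0]
    for name in only_cloud:
        assignment[name] = cheapest[name][0]
    return assignment
```

Because a machine only receives a further job while its load is below ∆_i and every placed job has size at most ∆_i, each server ends with load below 2∆_i, which matches the 2∆_i term in the analysis.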

Revisiting SCS ψ
In a similar vein to the previous subsection, we briefly sketch how to adapt the results from Section 4 to include most of the previously defined model generalizations. Naturally, we still require the maximum cardinality source and sink dividing cut to be bounded by a constant. In contrast to the previous result, we require the number of server machines to be a constant.
Theorem 16 There is an FPTAS for the budget restrained makespan minimization problem for graphs with a constant maximum cardinality source and sink dividing cut, even when there are a constant number of server machines, k different cloud contexts, the communication delays are directionally dependent on the machine context, and costs are given as an arbitrary cost function cost : J × clouds → N 0 .
Proof We make the following two changes to the state definition: We consider loc_j ∈ {s, c_1, ..., c_k} instead of loc_j ∈ {s, c}, and we track the unused time of every server machine individually, so instead of a single fs the state contains f_{s_1}, ..., f_{s_z}. The dynamic program needs only minor tweaks. When iterating through the jobs that are open (and of which all predecessors have been processed), use the server s_i with the smallest fitting f_{s_i} and set f_{s_i} = 0. Instead of checking if the job fits on "the cloud" we simply go through all clouds and add corresponding states for each fitting location. While calculating the value of a state, use the new cost function cost instead of p_c; while checking if a job fits, use the directional communication delays. After a full iteration increase each f_{s_i} by one (instead of only increasing the singular fs). It should be easy to see that these adaptations do not change the correctness of the algorithm. The runtime (after the rounding technique) naturally increases to poly(n^z, k, 1/ε), which is polynomial if and only if z (the number of server machines) is a constant.
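The bookkeeping of the per-server unused times can be illustrated with a small sketch. It assumes, as an interpretation rather than a statement from the paper, that a job fits on server s_i exactly when f_{s_i} is at least its server processing time; the function names and the tuple encoding of the state component are hypothetical.

```python
def place_on_server(free, p_s):
    """Pick the server with the smallest unused time that still fits the job.

    free: tuple (f_{s_1}, ..., f_{s_z}) of unused times, p_s: processing time.
    Returns (new_free, server_index) or None if the job fits on no server.
    """
    fitting = [i for i, f in enumerate(free) if f >= p_s]
    if not fitting:
        return None
    i = min(fitting, key=lambda idx: free[idx])
    return free[:i] + (0,) + free[i + 1:], i   # the chosen server is used up


def advance(free):
    """After a full iteration every server gains one unit of unused time."""
    return tuple(f + 1 for f in free)
```

Choosing the smallest fitting value keeps the servers with more unused time available for longer jobs considered later.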

Approximating the Pareto Front
The problem variants we describe and analyze in this paper are multi-criteria optimization problems. To simultaneously handle the two criteria cost and makespan, we either looked at decision variants "is there a schedule with makespan ≤ d and cost ≤ b" or we used one of them as a constraint and asked "given a budget of b, minimize the makespan" (or vice versa). Naturally, one might be interested in finding an assortment of different efficient solutions without giving a specific budget or deadline. A solution is called efficient, or Pareto optimal, if we cannot improve one of the criteria without worsening the other. The set of all Pareto optimal solutions is called the Pareto front. In the following, we will use the term point to refer to the makespan and cost of a feasible solution of a given SCS problem.
For our NP-hard problems, we will not be able to efficiently calculate the exact Pareto front, but we can find a set of points that is close to the optimum. In the literature, one can find slightly different definitions for such approximations. In [22], the authors scale each criterion to an interval from 0 to 1. A set of points is an α-approximation if for each point in the actual Pareto front, there is a point where each dimension is offset by at most an additional ±α. We follow the definition of Pareto front approximations given in [23] (adapted to our case with exactly 2 objectives):
Definition 1 A set of points S is an α-approximation of a Pareto front, if for each point p = (mspan_p, cost_p) of the Pareto front there is a point p′ = (mspan_{p′}, cost_{p′}) in S with mspan_{p′} ≤ (1 + α)mspan_p and cost_{p′} ≤ (1 + α)cost_p.
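Definition 1 can be checked mechanically for small point sets. The following sketch filters a list of (makespan, cost) points down to the non-dominated ones and tests whether a candidate set S is an α-approximation of a given front. The function names are hypothetical, and the sketch assumes that the exact front is available, which for the NP-hard variants is only realistic on toy instances.

```python
def pareto_front(points):
    """Keep the non-dominated (mspan, cost) points; both values are minimized."""
    front, best_cost = [], float("inf")
    for mspan, cost in sorted(set(points)):
        if cost < best_cost:          # strictly cheaper than anything faster
            front.append((mspan, cost))
            best_cost = cost
    return front


def is_alpha_approximation(S, front, alpha):
    """Definition 1: every front point is covered by some point of S."""
    return all(
        any(m2 <= (1 + alpha) * m and c2 <= (1 + alpha) * c for m2, c2 in S)
        for m, c in front
    )
```

Note that a front point with cost 0 can only be covered by a point of S that also has cost 0, since the guarantee is multiplicative.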
The dynamic programming algorithms established in this paper can be used to find such an approximation. We use the results from Section 4 to show how this is done, but note that a similar approach can be used for other results of this paper.
Intuitively, our dynamic programs calculate a collection of possible results but only report a single one, where the "best" is selected based on the current objective. Imagine that one of our deadline restrained algorithms with approximation factor (1 + ε) reports every non-dominated solution it finds instead. The result for d = 10 and ε = 0.1 could look like Figure 7. For every reported point (mspan, cost) we can infer that any schedule with the given cost has a makespan of at least mspan − ε · d, due to the approximation factor of the algorithm. Note that this gap is in relation to a given d, and therefore results with a smaller makespan are less precise. We will circumvent that by repeating the algorithm with smaller values for d.
Proof Given some SCS problem with constant ψ, run DPfGG (with the rounding approach) with d = Σ_{j∈J} p_s(j). Normally the algorithm would find the first state [d, fs] = cost. Now, instead let the algorithm find the first state [t, fs] = cost for every t ∈ (0.5d, d]. For each of those states calculate an upper bound on the makespan for the respective schedule in the unscaled instance. Following the argumentation in the proof for Theorem 8, we know that the makespan is ≤ tς + (n − 2)2ς = (t + 2n − 4)ς.
Report the point (mspan = (t + 2n − 4)ς, cost) and add it to S. After that full algorithm iteration, set d := 0.5d and repeat the process. Do this until d = 1. Finally, return the reported point set S.
We want to show that for every point p = (mspan_p, cost_p) of a Pareto front, there is a reported point p′ = (mspan_{p′}, cost_{p′}) with mspan_{p′} ≤ (1 + α)mspan_p and cost_{p′} ≤ (1 + α)cost_p. Given some point p = (mspan_p, cost_p), look at the iteration where 0.5d < mspan_p ≤ d. Since there is a feasible schedule with mspan_p and cost_p at some point during that iteration we found a feasible scaled schedule with t = ⌊ mspan p , cost p ′ ) with mspan_{p′} ≤ (1 + 2ε)mspan_p and cost_{p′} ≤ cost_p got reported. Setting ε = 0.5α and noting that we repeat the process no more than log(Σ_{j∈J} p_s(j)) times concludes the proof.

Future Work
We give a small overview of the future research directions that emerge from our work.
SCS e: If good approximations for 1 | r_j | Σ w_j U_j become established, the algorithm given in Section 3 for the extended chain could probably be improved. One could model the incoming communication delay with release dates and get an equivalent subproblem to solve, instead of the approximate subproblem currently used.
SCS: Section 5 gives a strong inapproximability result for the general case with regard to the cost function. For two easy cases (chain and fully parallel graphs) we could establish FPTAS results; for graphs with a constant ψ we have an algorithm that finds optimal solutions with a (1 + ε) deadline augmentation. Here one could explore whether there are FPTAS results for different assumptions, whether there are approximation algorithms without resource augmentation for constant ψ instances, and lastly whether there are approximation algorithms with resource augmentation for the general case. For the makespan function we already have an FPTAS for graphs with a constant ψ. It remains to explore approximation algorithms or inapproximability results for the general case of this problem.
SCS 1: We show strong NP-hardness even for this simplified problem. Since this is a special case of the general problem, all constructive results still hold; additionally, we were able to give a first simple algorithm for cost optimization in general graphs. Here it would be interesting to look into more involved approximation algorithms that give better performance guarantees, maybe without resource augmentation.
we know that there is no feasible solution for the search version, and otherwise we can use backtracking starting from M [n + 1, b] to find one.The time and space complexity is polynomial in b and n.

Fig. 1: An example extended chain with two parallel parts.

Fig. 2: Schematic example of the resulting SCS e problem for 5 jobs; squiggly arrows represent communication delays and model release dates and deadlines.

Fig. 3: Example state of a running schedule; open edges are orange, loc_{j_i} and f_{j_i} are kept for j_0, j_1 and j_2.

Fig. 4: The dummy structure for the reduction from the clique problem to a special case of SCS. Time flows from left to right, the anchor chain jobs are positioned on the cloud, and the gap jobs on the server.

Fig. 7: Reported solutions by our algorithm; filled circles and empty circles represent reported points and best possible solutions due to the approximation factor, respectively. The dotted region is infeasible, the striped region is feasible but dominated.

Table 1: An overview of the results of this paper.
we know that there is no feasible solution for the search version, and otherwise we can use backtracking starting from C[n + 1, d] to find one.The time and space complexity is polynomial in d and n.

initialize state list SL with start state (as defined above)
for all state ∈ SL do
    let J_state be the set of all jobs that are endpoints in open edges from state
Lemma 3 There is a k-clique if and only if there is a schedule with length and cost at most 2n + 3m.
* size 2 slots n − k size 1 slots m − k * size 2 slots
and only if there is a satisfying assignment for φ. The variable jobs processed on the cloud represent this satisfying assignment
Proof From Lemma 4, Lemma 5 and Lemma 6 we can infer that a schedule with costs of |J| − (2 + m + n) has two jobs of the anchor chain, one job for each pair of variable jobs and one job per clause on the server. Two jobs of the anchor chain can always be placed on the server, the choice of variable jobs is also free. It remains to show that we can schedule a literal job per clause on the server if and only if the respective clause is fulfilled by the assignment inferred by the variable jobs. The clause job j C φ p
εd ts are filled with jobs being processed on the server. With the one job we can process on the server during the communication ts we process d/2 + εd jobs on the server and have costs of n − (d/2 + εd). An optimal solution has costs of at least n − d.