A Fully PolynomialTime Approximation Scheme for Speed Scaling with a Sleep State
 139 Downloads
Abstract
We study classical deadlinebased preemptive scheduling of jobs in a computing environment equipped with both dynamic speed scaling and sleep state capabilities: Each job is specified by a release time, a deadline and a processing volume, and has to be scheduled on a single, speedscalable processor that is supplied with a sleep state. In the sleep state, the processor consumes no energy, but a constant wakeup cost is required to transition back to the active state. In contrast to speed scaling alone, the addition of a sleep state makes it sometimes beneficial to accelerate the processing of jobs in order to transition the processor to the sleep state for longer amounts of time and incur further energy savings. The goal is to output a feasible schedule that minimizes the energy consumption. Since the introduction of the problem by Irani et al. (ACM Trans Algorithms 3(4), 2007), its exact computational complexity has been repeatedly posed as an open question (see e.g. Albers and Antoniadis in ACM Trans Algorithms 10(2):9, 2014; Baptiste et al. in ACM Trans Algorithms 8(3):26, 2012; Irani and Pruhs in SIGACT News 36(2):63–76, 2005). The currently best known upper and lower bounds are a 4 / 3approximation algorithm and NPhardness due to Albers and Antoniadis (2014) and Kumar and Shannigrahi (CoRR, 2013. arXiv:1304.7373), respectively. We close the aforementioned gap between the upper and lower bound on the computational complexity of speed scaling with sleep state by presenting a fully polynomialtime approximation scheme for the problem. The scheme is based on a transformation to a nonpreemptive variant of the problem, and a discretization that exploits a carefully defined lexicographical ordering among schedules.
Keywords
Approximation algorithms Energy efficiency Polynomialtime approximation scheme1 Introduction
As energy efficiency in computing environments becomes more crucial, chip manufacturers are increasingly incorporating energysaving functionalities to their processors. One of the most common functionalities is dynamic speed scaling, where the processor can adjust its speed dynamically. A higher speed yields a higher performance, but this performance comes at the cost of more energy consumption. On the other hand, a lower speed results in better energyefficiency, but at the cost of performance degradation. In practice, it has been observed [7, 11] that the power consumption of the processor is approximately proportional to its speed cubed. However, even when the processor is idling, it consumes a nonnegligible amount of energy just for the sake of “being active” (for example because of leakage current). Due to this fact, additional energy can be saved by incorporating a sleep state to the processor. A processor in a sleep state consumes zero (or negligible) energy; however, there is an extra energy cost when it is transitioned back to the active state.
This article studies the offline problem of minimizing energy consumption of a processor which is equipped with both speed scaling and sleep state capabilities. This problem is called speed scaling with sleep state, first introduced by Irani et al. [18].
Let us state the problem more formally. The given processor has two states: the active state, during which it can execute jobs and consume some energy, and the sleep state, during which no jobs can be executed but also no energy is consumed. We assume that a wakeup operation, that is a transition from the sleep state to the active state, incurs a constant energy cost \(C>0\), whereas transitioning from the active state to the sleep state is free of charge. Further, as in [2, 18], the power required by the processor in the active state is an arbitrary convex and nondecreasing function P of its speed s. In accordance to all previous work in the area, we make the necessary assumptions that the power function P is fixed and therefore not part of the input.^{1} We also assume that \(P(0)>0\), since (i) as already mentioned, realworld processors are known to have leakage current and (ii) otherwise the sleep state would be redundant. Further motivation for considering arbitrary convex power functions for speed scaling can be found, for example, in [8].
The input is a set \(\mathcal {J}\) of n jobs. Each job j is associated with a release time \(r_j\), a deadline \(d_j\) and a processing volume \(v_j\). One can think of the processing volume as the number of CPU cycles that are required in order to completely process the job, so that if job j is processed at a speed of s, then \(v_j/s\) timeunits are required to complete the job. We call the interval \([r_j,d_j)\) the allowed interval of job j, and say that job j is active at time point t if and only if \(t\in [r_j,d_j)\).^{2} Furthermore, we may assume without loss of generality that \(\min _{j\in \mathcal {J}}r_j=0\), and that \(v_{min}:= \min _{j\in \mathcal {J}}v_j\) is normalized to 1 (if \(v_{min}\ne 1\) is the case, we can scale the instance by dividing the \(r_j\)’s, \(d_j\)’s, and \(v_j\)’s by \(v_{min}\), and using the power function \(P(s) \cdot v_{min}\) along with the original wakeup cost C). Further, let \(d_{max}:=\max _{j\in \mathcal {J}} d_j\) be the latest deadline of any job.
A schedule is defined as a mapping of every time point t to the state of the processor, its speed, and the job being processed at t (or null if there is no job running at t). Note that the processing speed is zero whenever the processor sleeps, and that a job can only be processed when the speed is strictly positive. A schedule is called feasible when the whole processing volume of every job j is completely processed in j’s allowed interval \([r_j,d_j)\). Preemption of jobs is allowed.
The energy consumption incurred by schedule \(\mathcal {S}\) while the processor is in the active state, is its power integrated over time, i.e. \(\int P(s(t)) dt\), where s(t) is the processing speed at time t, and the integral is taken over all time points in \([0,d_{max})\) during which the processor is active under \(\mathcal {S}\). Assume that \(\mathcal {S}\) performs k transitions from the sleep state to the active state. (We will assume throughout the paper that initially, prior to the first release time, as well as finally, after the last deadline, the processor is in the active state. However, our results can be easily adapted for the setting where the processor is initially and/or eventually in the sleep state). Then the total energy consumption of \(\mathcal {S}\) is \(E(\mathcal {S}) := \int P(s(t))dt +kC\), where again the integral is taken over all time points at which \(\mathcal {S}\) keeps the processor in the active state. We are seeking a feasible schedule that minimizes the total energy consumption.
Observe that, by Jensen’s inequality, and by the convexity of the power function, it is never beneficial to process a job with a varying speed. Irani et al. [18] observed the existence of a critical speed\(s_{crit}\), which is the most efficient speed for processing jobs. This critical speed is the smallest speed that minimizes the function P(s) / s. Note that, by the convexity of P(s), the only case where the critical speed \(s_{crit}\) is not well defined, is when P(s) / s is always decreasing. However, this would render the setting unrealistic, and furthermore make the algorithmic problem trivial, since it would be optimal to process every job at an infinite speed. We may therefore assume that this case does not occur. Further, it can be shown (see [18]) that for any \(s\ge s_{crit}\), the function P(s) / s is nondecreasing.
1.1 Previous Work
The theoretical model for dynamic speed scaling was introduced in a seminal paper by Yao, Demers and Shenker [21]. They developed a polynomial time algorithm called YDS, that outputs a minimumenergy schedule for this setting. Irani, Shukla and Gupta [18] initiated the algorithmic study of speed scaling combined with a sleep state. Such a setting motivates the socalled race to idle technique: one saves overall energy by accelerating some jobs in order to transition the processor to the sleep state for longer periods of time (see [4, 13, 14, 20] and references therein for more information regarding the race to idle technique). Irani et al. developed a 2approximation algorithm for speed scaling with sleep state, but the computational complexity of the scheduling problem has remained open. The first step towards attacking this open problem was made by Baptiste [9], who gave a polynomial time algorithm for the special case where the processor must execute all jobs at a fixed speed, and all jobs are of unit size. Baptiste’s algorithm is based on a clever dynamic programming formulation of the scheduling problem, and was later extended to (i) arbitrarilysized jobs in [10], and (ii) a multiprocessor setting in [12].
More recently, Albers and Antoniadis [2] improved the upper bound on the approximation ratio of the general problem, by developing a 4 / 3approximation algorithm. For the special case of agreeable deadlines and a power function of the form \(P(s) = s^\alpha + \beta \) (with constant \(\alpha >1\) and \(\beta >0\)), Bampis et al. [5] provided an exact polynomial time algorithm. With respect to the lower bound, [2] gave an NPhardness reduction from the partition problem. The reduction uses a particular power function that is based on the partition instance, i.e., it is considered that the power function is part of the input. The reduction of [2] was later refined by Kumar and Shannigrahi [19], to show that the problem is NPhard for any fixed, nondecreasing and strictly convex power function.
The online setting of the problem has also been studied. Irani et al. [18] gave a \((2^{2\alpha 2}\alpha ^\alpha + 2^{\alpha 1+2})\)competitive online algorithm. Han et al. [15] improved upon this result by developing an \((\alpha ^\alpha +2)\)competitive algorithm for the problem. Both of the above results assume a power function of the form \(P(s) = s^\alpha + \beta \), where \(\alpha >1\) and \(\beta >0\) are constants.
A more thorough discussion on the above scheduling problems can be found in the surveys [1, 17].
1.2 Our Contribution
We study the offline setting of speed scaling with sleep state. Since the introduction of the problem by Irani et al. [18], its exact computational complexity has been repeatedly posed as an open question (see e.g. [2, 10, 17]). The currently best known upper and lower bounds are a 4 / 3approximation algorithm and NPhardness due to [2] and [2, 19], respectively. In this paper, we settle the open question by presenting a fully polynomialtime approximation scheme.
At the core of our approach is a transformation of the original preemptive problem into a nonpreemptive scheduling problem of the same type. At first sight, this may seem counterintuitive, especially as Bampis et al. [6] showed that (for the problem of speed scaling alone), for the same instance, the ratio of an optimal preemptive solution against an optimal nonpreemptive solution can be very high. However, this does not apply in our case, as we consider the nonpreemptive problem on a modified instance, where each job is replaced by a polynomial number of pieces. Furthermore, in our analysis, we make use of a particular lexicographic ordering that does exploit the advantages of preemption.
In order to compute an optimal schedule for the modified instance via dynamic programming, we require a number of properties that pieces must satisfy in a valid schedule. The definition of these properties is based on a discretization of the time horizon by a polynomial number of time points. Roughly speaking, we focus on those schedules that start and end the processing of each piece at such time points, and satisfy a certain constraint on the processing order of the pieces. Proving that a nearoptimal schedule in this class of schedules exists is the most subtle part of our approach.
On one hand, the processing order constraint can be exploited by the DP; on the other hand, such a constraint is difficult to establish in an optimal schedule with the introduced indivisible volumes (since pieces of different jobs might have different volumes and cannot easily be interchanged). To get around this, we first ensure the right ordering in an optimal schedule for the preemptive setting, and then perform a series of transformations to a nonpreemptive schedule with the above properties. Each of these transformations increases the energy consumption only by a small factor, and maintains the correct ordering among the pieces.
We remark that Baptiste [9] used a dynamic program of similar structure for the special case of unitsized jobs and a fixedspeed processor equipped with a sleep state. His dynamic program is also based on a particular ordering of jobs, which, however, is not sufficient for our setting. Since we have pieces of different sizes, the swapping argument used in [9] fails.
In Sect. 2, we describe the YDS algorithm from [21] for the problem of speed scaling without a sleep state, and then show several properties that a schedule produced by YDS has for our problem of speed scaling with sleep state. We then, in Sect. 3, define a particular class of schedules that have a set of desirable properties, and show that there exists a schedule in this class, whose energy consumption is within a \((1+\epsilon )\)factor from optimal. Finally, in Sect. 4, we develop an algorithm based on a dynamic program, that outputs, in polynomial time, a schedule of minimal energy consumption among all the schedules of the aforementioned class.
2 Preliminaries
We remark that the speed used for the processing of jobs can never increase between two consecutive rounds, i.e., YDS schedules the jobs by order of nonincreasing speeds. Furthermore, all the jobs scheduled in each round i have their allowed intervals within \(I_i\).
Given any job instance \(\mathcal {J}\), let \(\text {FAST}(\mathcal {J})\) be the subset of \(\mathcal {J}\) that YDS processes at a speed greater than or equal to \(s_{crit}\), and let \(\text {SLOW}(\mathcal {J}):=\mathcal {J}\setminus \text {FAST}(\mathcal {J})\). The following lemma is an extension of a fact proven by Irani et al. [18].
Lemma 1
 1.
Every job in \(\text {FAST}(\mathcal {J})\) is processed according to YDS.
 2.
Every job \(k \in \text {SLOW}(\mathcal {J})\) is run at a uniform speed \(s_k< s_{crit}\), and the processor never (actively) runs at a speed less than \(s_k\) during \([r_k, d_k)\).
Proof
Clearly, in general a schedule produced by YDS only satisfies the first condition and is not a YDSextension.
By the preceding lemma, we may use YDS to schedule the jobs in \(\text {FAST}(\mathcal {J})\), and need to find a good schedule only for the remaining jobs (which are exactly \(\text {SLOW}(\mathcal {J})\)). To this end, we transform the input instance \(\mathcal {J}\) to an instance \(\mathcal {J}'\), in which the jobs \(\text {FAST}(\mathcal {J})\) are replaced by dummy jobs. This introduction of dummy jobs bears resemblance to the approach of [2]. We then show in Lemma 2, that any schedule for \(\mathcal {J}'\) with a certain property, can be transformed to a schedule for \(\mathcal {J}\) without any degradation in the approximation factor.

For every job \(j\in \text {SLOW}(\mathcal {J})\), if there exists an i such that \(r_j \in I_i\) (resp. \(d_j\in I_i\)), then we set \(r_{j} := z_i\) (resp. \(d_{j} := y_i\)), else we keep the job as it is.

For each \(I_i\), we replace all jobs \(j\in \text {FAST}(\mathcal {J})\) that are active in \(I_i\) by a single job \(j^d_i\) (d stands for “dummy”) with release time at \(y_i\), deadline at \(z_i\), and processing volume \(v_i^d\) equal to the total volume that \(S_{\text {YDS}}\) schedules in \(I_i\), i.e. \(v_i^d = \sum _{j \in B(I_i)}v_j\).
Proposition 1
\(\text {FAST}(\mathcal {J}') = \{j_i^d: 1\le i\le \ell \}\) and \(\text {SLOW}(\mathcal {J}') = \text {SLOW}(\mathcal {J})\).
Proof
Since \(\mathcal {J}' = \{j_i^d: 1\le i\le \ell \}\cup \text {SLOW}(\mathcal {J})\), and furthermore \(\text {SLOW}(\mathcal {J}')\) and \(\text {FAST}(\mathcal {J}')\) are disjoint sets, it suffices to show that (i) \(\text {FAST}(\mathcal {J}') \supseteq \{j_i^d: 1\le i\le \ell \}\) and that (ii) \(\text {SLOW}(\mathcal {J}')\supseteq \text {SLOW}(\mathcal {J})\).
For (i), we observe that no job \(j_i^d\) can be feasibly scheduled at a uniform speed less than \(s_{crit}\). As YDS uses a uniform speed for each job, these jobs must belong to \(\text {FAST}(\mathcal {J}')\).
For (ii), consider the execution of YDS on \(\mathcal {J}'\). More specifically, consider the first round when a job from \(\text {SLOW}(\mathcal {J})\) is scheduled. Let \(\mathcal {I}\) be the maximal density interval of this round, and let \(\mathcal {J}_S\) and \(\mathcal {J}_d\) be the sets of jobs from \(\text {SLOW}(\mathcal {J})\) and \(\{j_i^d: 1\le i\le \ell \}\), respectively, that are scheduled in this round (note that \(\mathcal {I}\) contains the allowed intervals of these jobs). As the speed used by YDS is nonincreasing from round to round, it suffices to show that \(dens(\mathcal {I})<s_{crit}\).
The following lemma suggests that for obtaining an FPTAS for instance \(\mathcal {J}\), it suffices to give an FPTAS for instance \(\mathcal {J}'\), as long as we schedule the jobs \(j_i^d\) exactly in their allowed intervals \(I_i\).
Lemma 2
Let \(S'\) be a schedule for input instance \(\mathcal {J}'\), that (i) processes each job \(j_i^d\) exactly in its allowed interval \(I_i\) (i.e. from \(y_i\) to \(z_i\)), and (ii) is a capproximation for \(\mathcal {J}'\). Then \(S'\) can be transformed in polynomial time into a schedule S that is a capproximation for input instance \(\mathcal {J}\).
Proof
Given such a schedule \(S'\), we leave the processing in the intervals \(T_1,\dots , T_m\) unchanged, and replace for each interval \(I_i\) the processing of job \(j_i^d\) by the original YDSschedule \(S_{\text {YDS}}\) during \(I_i\). It is easy to see that the resulting schedule S is a feasible schedule for \(\mathcal {J}\). We now argue about the approximation factor.
3 Discretizing the Problem
After the transformation in the previous section, we have an instance \(\mathcal {J}'\). In this section, we show that there exists a “discretized” schedule for \(\mathcal {J}'\), whose energy consumption is at most \(1+\epsilon \) times that of an optimal schedule for \(\mathcal {J}'\). In the next section, we will show how such a discretized schedule can be found by dynamic programming.
Before presenting formal definitions and technical details, we here first sketch the ideas behind our approach.
A major challenge of the original problem is that we need to deal with an infinite number of possible schedules. We overcome this intractability by “discretizing” the problem as follows: (1) we break each job in \(\text {SLOW}(\mathcal {J}')\) into smaller pieces, and (2) we create a set of time points and introduce the additional constraint that each piece of a job has to start and end at these time points. The number of the introduced time points and job pieces are both polynomial in the input size and \(1/\epsilon \), which substantially reduces the amount of guesswork we have to do in the dynamic program. The challenge is how to find such a discretization and argue that it does not increase the optimal energy consumption by too much.
3.1 Further Definitions and Notation
We first define the set W of time points. Given an error parameter \(\epsilon > 0\), let \(\delta := \min \{\frac{1}{4}, \frac{\epsilon }{4} \frac{P(s_{crit})}{P(2s_{crit})P(s_{crit})}\}\). Intuitively, \(\delta \) is defined in such a way that speeding up the processor by a factor \((1+ \delta )^3\) does not increase the power consumption by more than a factor \(1+\epsilon \) (see Lemma 5).
Let \(W' := \bigcup _{j \in \mathcal {J}'} \{r_j,d_j\}\), and consider the elements of \(W'\) in sorted order. Let \(t_i, 1\le i \le W'\) be the ith element of \(W'\) in this order. We call an interval \([t_i, t_{i+1})\) for \(1\le i \le W'1\) a zone, and observe that every zone is either equal to some interval \(I_i\) or contained in some interval \(T_i\).
Note that W is polynomial in the input size and \(1/\epsilon \).
Definition 1
We split each job \(j\in \text {SLOW}(\mathcal {J}')\) into \(4n^2\lceil 1/\delta \rceil \) equal sized pieces, and also consider each job \(j_i^d \in \text {FAST}(\mathcal {J}')\) as a single piece on its own. For every piece u of some job j, let \(job(u) := j\), \(r_u:=r_j\), \(d_u:=d_j\), and \(v_u:=v_j/(4n^2\lceil 1/\delta \rceil )\) if \(j\in \text {SLOW}(\mathcal {J}')\), and \(v_u:=v_j\) otherwise. Furthermore, let D denote the set of all pieces derived from all jobs in \(\mathcal {J}'\).
Note that \(D= \ell + \text {SLOW}(\mathcal {J}') \cdot 4n^2\lceil 1/\delta \rceil \) is polynomial in the input size and \(1/\epsilon \). We now define an ordering of the pieces in D.
Definition 2
We point out that any schedule for \(\mathcal {J}'\) can also be seen as a schedule for D, by implicitly assuming that the pieces of any fixed job are processed in the above order.
We are now ready to define the class of discretized schedules.
Definition 3
 (i)
Every piece is completely processed in a single zone, and without preemption.
 (ii)
The execution of every piece starts and ends at a time point from the set W.
 (iii)
For any time point t, such that in S a piece u ends at t, S schedules all pieces \(u'\succ u\) with \(d_{u'}\ge t\) after t.
Finally, we define a particular ordering over possible schedules, which will be useful in our analysis.
Definition 4
Observe that shifting the processing interval of any fraction of some job j to an earlier time point (without affecting the other processing times of j) decreases the value of \(q_j\).
3.2 Existence of a NearOptimal Discretized Schedule
In this section, we first show that there exists a YDSextension for \(\mathcal {J}'\) with certain nice properties (recall that a YDSextension is an optimal schedule satisfying the properties of Lemma 1). We then explain how such a YDSextension can be transformed into a wellordered discretized schedule, and prove that the speed of the latter, at all times, is at most \((1+\delta )^3\) times that of the former. This fact essentially guarantees the existence of a wellordered discretized schedule with energy consumption at most \(1+\epsilon \) that of an optimal schedule for \(\mathcal {J}'\). The transformation is depicted in Fig. 2.
Lemma 3
 1.
Every job \(j_i^d\) is scheduled exactly in its allowed interval \(I_i\).
 2.Every zone \([t_i,t_{i+1})\subseteq \mathcal {T}\) has the following two properties:
 (a)
There is at most one contiguous maximal processing interval within \([t_i,t_{i+1})\), and this interval either starts at \(t_i\) and/or ends at \(t_{i+1}\). We call this interval the block of zone \([t_i,t_{i+1})\).
 (b)
OPT uses a uniform speed of at most \(s_{crit}\) during this block.
 (a)
 3.
There exist no two jobs \(j' \succ j\), such that a portion of j is processed after some portion of \(j'\), and before \(d_{j'}\).
Proof
 1.
Since \(\text {FAST}(\mathcal {J}') = \{j_i^d: 1\le i\le \ell \}\) (by Proposition 1), and OPT is a YDSextension, it follows that each \(j_i^d\) is processed exactly in its allowed interval \(I_i\).
 2.
 (a)
Assume for the sake of contradiction that \([t_i,t_{i+1})\subseteq \mathcal {T}\) contains a number of maximal intervals \(N_1,N_2,\dots , N_\psi \) (ordered from left to right^{3}) during which jobs are being processed, with \(\psi \ge 2\). Let \(M_1,M_2,\dots , M_{\psi '}\) (again ordered from left to right) be the remaining maximal intervals in \([t_i,t_{i+1})\), so that \(N_1,\dots , N_\psi \) and \(M_1,\dots , M_{\psi '}\) partition the zone \([t_i,t_{i+1})\). Furthermore, note that for each \(i=1,\dots , \psi '\), the processor is either active but idle or asleep during the whole interval \(M_i\), since otherwise setting the processor asleep during the whole interval \(M_i\) would incur a strictly smaller energy consumption.
We modify the schedule by shifting the intervals \(N_i\), \(i = 2, \dots , \psi \) to the left, so that \(N_1, N_2, \dots , N_\psi \) now form a single contiguous processing interval. The intervals \(M_k\) lying to the right of \(N_1\) are moved further right and merge into a single (longer) interval \(M'\) during which no jobs are being processed. If the processor was active during each of these intervals \(M_k\), then we keep the processor active during the new interval \(M'\), else we transition it to the sleep state. We observe that the resulting schedule is still a YDSextension (note that its energy consumption is at most that of the initial schedule), but is lexicographically smaller.
For the second part of the statement, assume that there exists exactly one contiguous maximal processing interval \(N_1\) within \([t_i,t_{i+1})\), and that there exist two Mintervals, \(M_1\) and \(M_2\) before and after \(N_1\), respectively.
We consider two cases:
The processor is active just before \(t_i\), or the processor is asleep both just before \(t_i\) and just after \(t_{i+1}\): In this case we can shift \(N_1\) left by \(M_1\) time units, so that it starts at \(t_i\). Again, we keep the processor active during \([t_i+N_1, t_{i+1})\) only if it was active during both \(M_1\) and \(M_2\). As before, the resulting schedule remains a YDSextension, and is lexicographically smaller.

The processor is in the sleep state just before \(t_i\) but active just after \(t_{i+1}\): In this case we shift \(N_1\) by \(M_2\) time units to the right, so that its right endpoint becomes \(t_{i+1}\). During the new idle interval \([t_i,t_i+M_1+M_2)\) we set the processor asleep. Note that in this case the processor was asleep during \(M_1\). The schedule remains a YDSextension, but its energy consumption becomes strictly smaller: (i) either the processor was asleep during \(M_2\), in which case the resulting schedule uses the same energy while the processor is active but has one wakeup operation less, or (ii) the processor was active and idle during \(M_2\), in which case the resulting schedule saves the idle energy that was spent during \(M_2\).

 (b)
The statement follows directly from the second property of Lemma 1 and the fact that all jobs processed during \([t_i,t_{i+1})\) belong to \(\text {SLOW}(\mathcal {J}')\) and are active in the entire zone.
 (a)
 3.
Assume for the sake of contradiction that there exist two jobs \(j' \succ j\), such that a portion of j is processed during an interval \(Z = [\zeta _1,\zeta _2)\), \(\zeta _2 \le d_{j'}\), and some portion of \(j'\) is processed during an interval \(Z' = [\zeta _1',\zeta _2')\), with \(\zeta _2'\le \zeta _1\). We first observe that both jobs belong to \(\text {SLOW}(\mathcal {J}')\). This follows from the fact that both jobs are active during the whole interval \([\zeta _1',\zeta _2)\), and processed during parts of this interval, whereas any job \(j_i^d\) (which are the only jobs in \(\text {FAST}(\mathcal {J}')\)) is processed exactly in its entire interval \([y_i,z_i)\) (by statement 1 of the lemma).
By the second property of Lemma 1, both j and \(j'\) are processed at the same speed. We can now apply a swap argument. Let \(L:=\min \{Z,Z'\}\). Note that OPT schedules only \(j'\) during \([\zeta _2'L,\zeta _2')\) and only j during \([\zeta _2L,\zeta _2)\). Swap the part of the schedule OPT in \([\zeta _2'L,\zeta _2')\) with the schedule in the interval \([\zeta _2L, \zeta _2)\). Given the above observations, it can be easily verified that the resulting schedule (i) is feasible and remains a YDSextension, and (ii) is lexicographically smaller than OPT.
The next lemma shows how to transform the lexicographically minimal YDSextension for \(\mathcal {J}'\) of the previous lemma into a wellordered discretized schedule. This is the most crucial part of our approach. Roughly speaking, the transformation needs to guarantee that (1) in each zone, the volume of a job \(j \in \text {SLOW}(\mathcal {J}')\) processed is an integer multiple of \(v_j/(4n^2\lceil 1/\delta \rceil )\) (this is tantamount to making sure that each zone has integral job pieces to deal with), (2) the job pieces start and end at the time points in W, and (3) all the job pieces are processed in the “right order”. As we will show, the new schedule may run at a higher speed than the given lexicographically minimal YDSextension, but not by too much.
Lemma 4
Proof
Through a series of three transformations, we will transform OPT to a wellordered discretized schedule F, while upper bounding the increase in speed caused by each of these transformations. More specifically, we will transform OPT to a schedule \(F_1\) satisfying (i) and (iii) of Definition 3, then \(F_1\) to \(F_2\) where we slightly adapt the block lengths, and finally \(F_2\) to F which satisfies all three properties of Definition 3. Each of these transformations can increase the speed by at most a factor \((1+\delta )\) for any \(t\in \mathcal {T}\) and does not affect the speed in any interval \(I_i\).
Transformation 1
 (i)
For each job \(j\in \text {SLOW}(\mathcal {J}')\), an integer multiple of \(v_j/(4n^2\lceil 1/\delta \rceil )\) volume of job j is processed in each zone, and the processing order of jobs within each zone is determined by \(\prec \). Together with property 1 of Lemma 3, this implies that \(F_1\) (considered as a schedule for pieces) satisfies Definition 3(i).
 (ii)
The wellordered property of Definition 3 is satisfied.
 (iii)
For all \(t\in \mathcal {T}\) it holds that \(s_{F_1}(t)\le (1+\delta )s_{OPT}(t)\), and for every \(t\notin \mathcal {T}\) it holds that \(s_{F_1}(t)=s_{OPT}(t)\).
Note that by Lemma 3, every zone is either empty, filled exactly by a job \(j_i^d\), or contains a single block. For any job \(j\in \text {SLOW}(\mathcal {J}')\), and every zone \([t_i,t_{i+1})\), let \(V^i_j\) be the processing volume of job j that OPT schedules in zone \([t_i,t_{i+1})\). Since there can be at most 2n different zones, for every job j there exists some index h(j), such that \(V^{h(j)}_j\ge v_j/(2n)\).
Note that in the resulting schedule \(F_1\), a job may be processed at different speeds in different zones, but each zone uses only one constant speed level.
It is easy to see that \(F_1\) is a feasible schedule in which for each job \(j\in \text {SLOW}(\mathcal {J}')\), an integer multiple of \(v_j/(4n^2\lceil 1/\delta \rceil )\) volume of j is processed in each zone, and that \(\bar{\mathcal {V}}_j^i \le V_j^i\) for all \(i\ne h(j)\). Furthermore, if \(i = h(j)\), we have that \(\bar{\mathcal {V}}_j^i  V_j^i \le v_j/(2n\lceil 1/\delta \rceil )\), and \(V_j^i\ge v_j/(2n)\). It follows that \(\bar{\mathcal {V}}_j^i\le V_j^i+V_j^i/\lceil 1/\delta \rceil \le (1+\delta )V_j^i\) in this case, and therefore \(s_{F_1}(t) \le (1+\delta )s_{OPT}(t)\) for all \(t\in \mathcal {T}\). We note here, that for every job \(j_i^d\), and the corresponding interval \(I_i\), nothing changes during the transformation.
We finally show that \(F_1\) satisfies the wellordered property of Definition 3. Assume for the sake of contradiction that there exists a piece u ending at some t, and there exists a piece \(u'\succ u\) with \(d_{u'}\ge t\) that is scheduled before t. Recall that we can implicitly assume that the pieces of any fixed job are processed in the corresponding order \(\prec \). Therefore \(job(u') \succ job(u)\), by definition of the ordering \(\prec \) among pieces. Furthermore, if \([t_k,t_{k+1})\) and \([t_{k'},t_{k'+1})\) are the zones in which u and \(u'\), respectively, are scheduled, then \(k'<k\), as \(k' = k\) would contradict \(F_1\)’s processing order of jobs inside a zone. Also note that \(d_{u'} \ge t_{k+1}\), since \(t\in (t_k,t_{k+1}]\), and \((t_k,t_{k+1})\) does not contain any deadline. This contradicts property 3 of Lemma 3, as the original schedule OPT must have processed some volume of \(job(u')\) in \([t_{k'},t_{k'+1})\), and some volume of job(u) in \([t_{k},t_{k+1})\).
Transformation 2
(\(F_1\rightarrow F_2\)): In this transformation, we slightly modify the block lengths, as a preparation for Transformation 3. For every nonempty zone \([t_i,t_{i+1})\subseteq \mathcal {T}\), we increase the uniform speed of its block until it has a length of \((1+\delta )^j\frac{1}{4n^2s_{crit}(1+\delta ) \lceil 1/\delta \rceil }\) for some integer \(j\ge 0\), keeping one of its endpoints fixed at \(t_i\) or \(t_{i+1}\). Note that in \(F_1\), the block had length at least \(\frac{1}{4n^2s_{crit}(1+\delta ) \lceil 1/\delta \rceil }\), since it contained a volume of at least \(1/(4n^2\lceil 1/\delta \rceil )\), and the speed in this zone was at most \((1+\delta )s_{crit}\). The speedup needed for this modification is clearly at most \((1+\delta )\).
As this transformation does not change the processing order of any pieces nor the zone in which any piece is scheduled, it preserves the wellordered property of Definition 3.
Transformation 3
We now show that the speedup used in our transformation does not increase the energy consumption by more than a factor of \(1+\epsilon \). To this end, observe that for any \(t\in \mathcal {T}\), the speed of the schedule OPT in Lemma 4 is at most \(s_{crit}\), by Lemma 3(2). Furthermore, note that the final schedule F has speed zero whenever OPT has speed zero. This allows F to use exactly the same sleep phases as OPT (resulting in the same wakeup costs). It therefore suffices to prove the following lemma, in order to bound the increase in energy consumption.
Lemma 5
Proof
We summarize the major result of this section in the following lemma.
Lemma 6
There exists a wellordered discretized schedule with an energy consumption no more than \((1+\epsilon )\) times the optimal energy consumption for \(\mathcal {J}'\).
4 The Dynamic Program
In this section, we show how to use dynamic programming to find a wellordered discretized schedule with minimum energy consumption. In the following, we discuss only how to find the minimum energy consumption of this target schedule, as the actual schedule can be easily retrieved by proper bookkeeping in the dynamic programming process.
Recall that D is the set of all pieces and W the set of time points. Let \(u_1, u_2, \ldots , u_{D}\) be the pieces in D, and w.l.o.g. assume that \(u_1\prec u_2\prec \ldots \prec u_{D}\).
Definition 5
 1.
all pieces \(\{u\succeq u_k: \tau _1<d_u\le \tau _2\}\) are processed in the interval \([\tau _1, \tau _2)\), and
 2.
the machine is active right before \(\tau _1\) and right after \(\tau _2\).
In case that there is no such feasible schedule, let \(E_k(\tau _1, \tau _2) = \infty \).

Suppose that \(d_{u_k} \not \in (\tau _1, \tau _2]\). Then clearly \(E_k(\tau _1,\tau _2) = E_{k+1}(\tau _1,\tau _2)\).
 Suppose that \(d_{u_k} \in (\tau _1, \tau _2]\). By definition, piece \(u_k\) needs to be processed in the interval \([\tau _1,\tau _2)\). We need to guess its actual execution period \([b,e) \subseteq [\tau _1,\tau _2)\), and process the remaining pieces \(\{u\succeq u_{k+1}: \tau _1<d_u\le \tau _2\}\) in the two intervals \([\tau _1, b)\) and \([e,\tau _2)\). We first rule out some guesses of [b, e) that are bound to be wrong.

By Definition 3(i), in a discretized schedule, a piece has to be processed completely inside a zone \([t_i,t_{i+1})\) (recall that \(t_i \in W'\) are the release times and deadlines of the jobs). Therefore, in the right guess, the interior of [b, e) does not contain any release times or deadlines; more precisely, there is no time point \(t_i \in W'\) so that \(b< t_i < e\).

By Definition 3(iii), in a wellordered discretized schedule, if piece \(u_k\) ends at time point e, then all pieces \(u' \succ u_k\) with deadline \(d_{u'} \ge e\) are processed after\(u_k\). However, consider the guess [b, e), where \(e = d_{u'}\) for some \(u' \succ u_{k}\) (notice that the previous case does not rule out this possibility). Then \(u'\) cannot be processed anywhere in a wellordered schedule. Thus, such a guess [b, e) cannot be right.
By the preceding discussion, if the guess (b, e) is right, the two sets of pieces \(\{u\succeq u_{k+1}: \tau _1<d_u\le b\}\) and \(\{u\succeq u_{k+1}: e <d_u\le \tau _2 \}\), along with piece \(u_k\), comprise all pieces to be processed that are required by the definition of \(E_k(\tau _1, \tau _2)\). Clearly, the former set of pieces \(\{u\succeq u_{k+1}: \tau _1<d_u\le b\}\) has to be processed in the interval \([\tau _1,b)\); the latter set of pieces, in a wellordered schedule, must be processed in the interval \([e,\tau _2)\) if [b, e) is the correct guess for the execution of the piece \(u_k\).
We therefore have that if there exist \(b, e \in W\) with the properties stated under the minoperator, and \(E_k(\tau _1,\tau _2) = \infty \) otherwise. 
Theorem 1
There exists a fully polynomialtime approximation scheme (FPTAS) for speed scaling with sleep state. Its time complexity is \(O(\frac{n^{32}}{\epsilon ^{17}} ( \log \frac{n^2 d_{\max }}{\epsilon })^4)\).
Proof
Given an arbitrary instance \(\mathcal {J}\) for speed scaling with sleep state, we can transform it in polynomial time to an instance \(\mathcal {J}'\), as seen in Sect. 2. We then apply the dynamic programming algorithm that was described in this section to obtain a wellordered discretized schedule \(\mathcal {S'}\) of minimal energy consumption for instance \(\mathcal {J}'\). By Lemma 6, we have that \(\mathcal {S'}\) is a \((1+\epsilon )\)approximation for instance \(\mathcal {J}'\). Furthermore, note that every discretized schedule (and therefore also \(\mathcal {S'}\)) executes each job \(j_i^d\) exactly in its allowed interval \(I_i = [y_i, z_i)\). This holds because there are no time points from the interior of \(I_i\) included in W, and any discretized schedule must therefore choose to run \(j_i^d\) precisely from \(y_i \in W\) to \(z_i \in W\). Therefore, by Lemma 2, we can transform \(\mathcal {S'}\) to a schedule \(\mathcal {S}\) in polynomial time and obtain a \((1+\epsilon )\)approximation for \(\mathcal {J}\).
We next analyze the running time. The preprocessing of the instance and the YDS algorithm are easily dominated by the dynamic program. By construction in Sect. 3.1, \(W= O( \frac{n^7}{\epsilon ^4} \log \frac{n^2 d_{\max }}{\epsilon })\), \(W' = O(n)\) and \(D = O(\frac{n^3}{\epsilon })\). The total number of entries in the dynamic program is \(O(DW^2)\). For each single entry of \(E_{k}(\tau _1, \tau _2)\), we need to check \(O({W \atopwithdelims ()2})\) possibilities. For each possibility, we need O(n) time. In sum, this gives the running time of \(O(nDW^4)=O(\frac{n^{32}}{\epsilon ^{17}} ( \log \frac{n^2 d_{\max }}{\epsilon })^4)\). \(\square \)
Footnotes
 1.
It is assumed that we have access to an oracle which allows us to evaluate P(s) for any value of s and perform simple operations involving values returned by P. Similarily the critical speed (defined later) is a known parameter of P(s).
 2.
Unless stated differently, throughout the text an interval will always have the form \([\cdot ,\cdot )\).
 3.
For any two time points \(t_1<t_2\), we say that \(t_1\) is to the left of \(t_2\), and \(t_2\) is to the right of \(t_1\).
Notes
Acknowledgements
Open access funding provided by Max Planck Society.
References
 1.Albers, S.: Energyefficient algorithms. Commun. ACM 53(5), 86–96 (2010)CrossRefGoogle Scholar
 2.Albers, S., Antoniadis, A.: Race to idle: new algorithms for speed scaling with a sleep state. ACM Trans. Algorithms 10(2), 9 (2014)MathSciNetCrossRefzbMATHGoogle Scholar
 3.Antoniadis, A., Huang, C.C., Ott, S.: A fully polynomialtime approximation scheme for speed scaling with sleep state. In: Proceedings of the Seventeenth Annual ACMSIAM Symposium on Discrete Algorithms, SODA 2015 (2015)Google Scholar
 4.Bailis, P., Reddi, V.J., Gandhi, S., Brooks, D., Seltzer, M.I.: Dimetrodon: processorlevel preventive thermal management via idle cycle injection. In: DAC, pp. 89–94. ACM (2011)Google Scholar
 5.Bampis, E., Dürr, C., Kacem, F., Milis, I.: Speed scaling with power down scheduling for agreeable deadlines. Sustain. Comput.: Inf. Syst. 2(4), 184–189 (2012)Google Scholar
 6.Bampis, E., Kononov, A., Letsios, D., Lucarelli, G., Nemparis, I.: From preemptive to nonpreemptive speedscaling scheduling. In: COCOON, pp. 134–146. Springer, New York (2013)Google Scholar
 7.Bansal, N., Chan, H.L., Katz, D., Pruhs, K.: Improved bounds for speed scaling in devices obeying the cuberoot rule. Theory Comput. 8(1), 209–229 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
 8.Bansal, N., Chan, H.L., Pruhs, K.: Speed scaling with an arbitrary power function. ACM Trans. Algorithms 9(2), 18 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
 9.Baptiste, P.: Scheduling unit tasks to minimize the number of idle periods: a polynomial time algorithm for offline dynamic power management. In: SODA, pp. 364–367. ACM Press, New York (2006)Google Scholar
 10.Baptiste, P., Chrobak, M., Dürr, C.: Polynomialtime algorithms for minimum energy scheduling. ACM Trans. Algorithms 8(3), 26 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
 11.Brooks, D.M., Bose, P., Schuster, S.E., Jacobson, H., Kudva, P.N., Buyuktosunoglu, A., Wellman, J.D., Zyuban, V., Gupta, M., Cook, P.W.: Poweraware microarchitecture: design and modeling challenges for nextgeneration microprocessors. IEEE Micro 20(6), 26–44 (2000)CrossRefGoogle Scholar
 12.Demaine, E.D., Ghodsi, M., Hajiaghayi, M., SayediRoshkhar, A.S., Zadimoghaddam, M.: Scheduling to minimize gaps and power consumption. J. Sched. 16(2), 151–160 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
 13.Gandhi, A., HarcholBalter, M., Das, R., Lefurgy, C.: Optimal power allocation in server farms. In: SIGMETRICS/Performance, pp. 157–168. ACM, New York (2009)Google Scholar
 14.Garrett, M.: Powering down. ACM Queue 5(7), 16–21 (2007)CrossRefGoogle Scholar
 15.Han, X., Lam, T.W., Lee, L.K., To, I.K.K., Wong, P.W.H.: Deadline scheduling and power management for speed bounded processors. Theor. Comput. Sci. 411(40–42), 3587–3600 (2010)MathSciNetCrossRefzbMATHGoogle Scholar
 16.Huang, C., Ott, S.: New results for nonpreemptive speed scaling. In: MFCS, pp. 360–371. Springer, New York(2014)Google Scholar
 17.Irani, S., Pruhs, K.: Algorithmic problems in power management. SIGACT News 36(2), 63–76 (2005)CrossRefGoogle Scholar
 18.Irani, S., Shukla, S.K., Gupta, R.: Algorithms for power savings. ACM Trans. Algorithms 3(4) (2007)Google Scholar
 19.Kumar, G., Shannigrahi, S.: NPhardness of speed scaling with a sleep state. CoRR (2013). arXiv:1304.7373
 20.Raghavan, A., Emurian, L., Shao, L., Papaefthymiou, M.C., Pipe, K.P., Wenisch, T.F., Martin, M.M.K.: Utilizing dark silicon to save energy with computational sprinting. IEEE Micro 33(5), 20–28 (2013)CrossRefGoogle Scholar
 21.Yao, F.F., Demers, A.J., Shenker, S.: A scheduling model for reduced cpu energy. In: FOCS, pp. 374–382. IEEE Computer Society (1995)Google Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.