A survey of offline algorithms for energy minimization under deadline constraints
Abstract
Modern computers allow software to adjust power management settings like speed and sleep modes to decrease the power consumption, possibly at the price of decreased performance. The impact of these techniques mainly depends on the schedule of the tasks. This article surveys the underlying theoretical results on power management, as well as offline scheduling algorithms that aim at minimizing the energy consumption under real-time constraints.
Keywords
Scheduling, Algorithmic power management, Speed scaling, Sleep modes, Energy minimization
1 Introduction
Energy consumption is nowadays an important design constraint for computing systems (Zhuravlev et al. 2013). On the one hand, the computing power of embedded systems increases rapidly, whereas battery capacity does not grow at the same pace. On the other hand, for systems such as datacenters, energy consumption is an important cost factor.
To decrease the energy consumption of computing devices while still meeting performance constraints, power management techniques are often deployed (e.g., Irani and Pruhs 2005; Albers 2010). Herein, software is used to influence the energy consumption of computers. This software controls hardware parameters like the speed (speed scaling), or decides to transition devices to a low-power sleep mode when they are not used. Combined with such power management techniques, scheduling algorithms play a crucial role, since the underlying schedules have a critical impact on the efficiency of power management techniques. The collection of all of these techniques is often referred to with the generic term algorithmic power management (see, e.g., Pruhs 2011).
In this article, we discuss many algorithmic power management results. More precisely, we survey theoretical results on power management as well as offline algorithms for energy minimization under deadline constraints (called the “server problem”; Bunde 2006). Furthermore, we discuss both speed scaling and low-power sleep modes. Speed scaling is used to adapt the speed of a system, so that its power consumption is reduced. It can be hard to determine the optimal speeds, because these have to be chosen globally (all tasks have to be taken into account) instead of locally on a task-by-task basis.
An idle device can be put in a low-power sleep mode to reduce the energy consumption; however, energy is required to wake it up again. This poses a tradeoff between sleeping and remaining idle. Sleep modes can significantly reduce the energy consumption when the system has long idle periods. Because of this, scheduling algorithms are deployed to create schedules with many sufficiently long idle periods. Note that speed scaling and sleep modes are not mutually exclusive: sometimes it is better to use both in combination.
Note that in recent years, peak power minimization has also become an important topic of research (e.g., Lee et al. 2014; Manoj et al. 2013). We argue that many of the speed scaling algorithms that we survey also minimize the peak power of a system.
This article is organized as follows. In the next section, we briefly discuss some related surveys. After that, in Sect. 3, we introduce the modeling of speed scaling and sleep modes, together with the notation that is used throughout this survey. The latter is important, as different authors use different notations when describing their power management problems. Many are loosely based on the notation by Graham et al. (1977).
In Sect. 4, we present several (orthogonal) theoretical power management results, which form the foundation of many power management algorithms, and show how these results interact.
Section 5 surveys algorithms that minimize the energy consumption of single-processor systems with deadline constraints. The relation between similar problems is discussed and it is shown how the theoretical power management results from Sect. 4 are applied. This discussion is followed by a survey of multiprocessor power management problems, and algorithms for these problems in Sect. 6. In Sect. 7 several open problems are discussed, and Sect. 8 concludes this article with a discussion.
2 Related surveys
The recent article by Zhuravlev et al. (2013) surveys many energy-aware scheduling techniques. Many of the papers they survey are on thermal-aware scheduling and scheduling for asymmetric systems. In their survey, there is no emphasis on algorithms and their properties, which is the focus of this article.
Benini et al. (2000) give an important survey on sleep modes (DPM) that is mainly application oriented. They present a lot of background, and discuss implementation details, including a discussion on the Advanced Configuration and Power Interface (ACPI). The algorithms they discuss are intended for general operating systems, and depend on predictive schemes and stochastics. In contrast, this article focuses on (clairvoyant) offline algorithms for real-time systems. Moreover, this article discusses speed scaling and scheduling.
There are several articles that survey results from algorithmic power management. The very broad overview by Chen and Kuo (2007) discusses power-related scheduling techniques, but does not focus on algorithms. Irani and Pruhs (2005) and Albers (2010) present surveys that do focus on algorithms. The first survey (Irani and Pruhs 2005) contains a relatively small set of algorithms, while the second and more recent survey article (Albers 2010) discusses more algorithms. Although both surveys treat results from the entire spectrum of algorithmic power management, only a few offline algorithms for energy minimization under deadline constraints are discussed.
None of the surveys discussed in this section focuses on offline energy minimization under deadline constraints, nor do they treat many papers on this subject. Furthermore, to the best of our knowledge, the survey in this article is the first that links the many different theoretical concepts of algorithmic power management.
3 Modeling and notation
Many algorithmic power management papers have different modeling assumptions and there is no unique notation to describe both speed scaling and sleep mode problems. In this section, we structure the modeling assumptions and present a unifying notation for power management problems. Section 3.1 discusses the notation and models for tasks. Some practical aspects for speed scaling on a computer processor and a notation for these aspects are discussed in Sect. 3.2, while modeling of sleep modes is discussed in Sect. 3.3. Finally, a notation for algorithmic power management problems is presented in Sect. 3.4.
3.1 Task models
In general, a finite number (N) of tasks is considered, which we denote by \(T_1,\dots ,T_N\). These tasks are scheduled on M processors, where in many cases \(M=1\). Each task \(T_n\) has a workload \(w_n\). For speed scaling, a speed \(s_n\) at which task \(T_n\) is executed must be determined, which leads to an execution time \(e_n=\frac{w_n}{s_n}\). In some cases, the speed may be changed during a task, which requires an adaptation of the notation: the speed function \(s : \mathbb {R}_0^+ \rightarrow \mathbb {R}_0^+\), which gives the speed as a function of the time, is then used.
The available speeds are given by a set \(\mathcal {S}\), which is either an interval (\(\mathcal {S} = [s^{\min }, s^{\max }]\)) or a finite discrete set with K speeds (\(\mathcal {S}=\{\bar{s}_1,\dots ,\bar{s}_K\}\), where we assume w.l.o.g. that \(\bar{s}_1\le \dots \le \bar{s}_K\)). When a speed must be chosen from a continuous (discrete) set, we call this speed a continuous (discrete) speed, and refer to a problem with such restriction as a continuous (discrete) speed scaling problem.
Besides its workload, each task has an arrival time \(a_n\) and a deadline \(d_n\). The tasks have to be scheduled to meet these constraints, implying that the begin time \(b_n\) and completion time \(c_n\) have to be chosen so that \(a_n\le b_n\le c_n \le d_n\). If the tasks are scheduled without interruption, we furthermore have \(c_n=b_n+e_n\).
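The task model above can be sketched in a few lines of Python. This is only an illustration: the names `Task`, `execution_time`, and `is_feasible` are hypothetical and do not come from the surveyed papers, and the sketch assumes a single constant speed and execution without interruption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    workload: float  # w_n
    arrival: float   # a_n
    deadline: float  # d_n


def execution_time(task: Task, speed: float) -> float:
    """e_n = w_n / s_n for a constant speed s_n > 0."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return task.workload / speed


def is_feasible(task: Task, begin: float, speed: float) -> bool:
    """Check a_n <= b_n <= c_n <= d_n with c_n = b_n + e_n (no interruption)."""
    completion = begin + execution_time(task, speed)
    return task.arrival <= begin <= completion <= task.deadline


# A task with workload 10 that arrives at time 5 and must finish by time 10.
task = Task(workload=10, arrival=5, deadline=10)
print(is_feasible(task, begin=5, speed=2.0))  # True: completes exactly at time 10
print(is_feasible(task, begin=5, speed=1.0))  # False: would complete at time 15
```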
3.2 Processor models for speed scaling
An important objective used in the majority of papers that we survey is energy minimization of microprocessors. Hence, in the following we concentrate on speed scaling of microprocessors. Furthermore, we discuss some modeling assumptions that are not studied in the current algorithmic power management literature.
Microprocessors have a clock frequency, which represents the speed of the processor. For many systems the speed of the computer memory (and other peripherals) does not scale with the clock frequency of the processor because it is a separate device that does not necessarily use the same clock frequency. In other words, in most practical settings the speed of the overall system (and of tasks) does not scale linearly with the clock frequency of the microprocessor (Devadas and Aydin 2012). However, all algorithms that we survey assume that the speed does scale linearly with the clock frequency, and hence we also assume this throughout this article. Note that this assumption leads to an overestimation of the execution times of the tasks in case the clock frequency is decreased with respect to some reference clock frequency, which means that tasks finish earlier than is predicted using the models. Note also that for a multicore processor with only local memories (e.g., scratchpad memory) the speed does scale linearly with the processor clock frequency.
As a consequence of the above assumption, clock frequency and speed are synonyms, and therefore \(s_n\) and s(t) are used to denote the clock frequency. In this article, we mostly use the terms speed and speed scaling, instead of clock frequency and Dynamic Voltage and Frequency Scaling (DVFS), in line with the majority of papers on algorithmic power management.
For multicore processors, there are two main flavors of speed scaling, namely local speed scaling and global speed scaling. While local speed scaling changes the speed per individual core, global speed scaling makes these changes for the entire chip. For this reason, the optimal solutions to the local and global speed scaling problems are not interchangeable. Global speed scaling is the most commonly applied of these techniques, since it is cheaper to implement (March et al. 2011; Chaparro et al. 2007). Examples of modern processors and systems that use global speed scaling are the Intel Itanium, the PandaBoard (dual-core ARM Cortex A9), IBM Power7, and the NVIDIA Tegra 2 (Kalla et al. 2010; March et al. 2011; Kandhalu et al. 2011; Zhang et al. 2012).
Nowadays, most modern microprocessors are built using CMOS transistors. When the clock frequency of a CMOS processor is decreased, the voltage may be decreased as well. Dynamic voltage and frequency scaling (DVFS) (Weiser et al. 1996) is a power management technique that allows the clock frequency and voltage to be changed at runtime. Both the clock frequency and the voltage influence the power consumption of a processor and the energy consumption is obtained by integrating this power consumption over time.
Generally, we define the total power consumption (both static and dynamic) as a power function \(p : \mathbb {R}_0^+ \rightarrow \mathbb {R}_0^+\), which maps speed to power.
For microprocessors, the power function does not fully describe all energy that is used, since changing the clock frequency also has an energy and time overhead. The recent article by Park et al. (2013) shows that the time and energy overheads of DVFS are in the same order of magnitude as the overhead of context switching. For example, the transition delay overhead is at most \(62.68\,\upmu s\) on an Intel Core2 Duo E6850 (Park et al. 2013). Furthermore, most algorithms avoid changing the clock frequency often because of the convexity of the power function (see Sect. 4.1), hence the number of speed changes is relatively low. Because of these two reasons, we assume that the energy overhead of changing the clock frequency is negligible in case of DVFS.
Note that speed scaling is not restricted to microprocessors, but can also be used for flash memory (Lee and Kim 2010) and hard disks (Liu et al. 2004), and may even be relevant to applications outside of computer science.
3.3 Sleep modes
As already mentioned in the previous subsection, devices also consume power when they are idle. Several devices like microprocessors, hard disks, and communication devices (e.g., network interfaces) can switch to a sleep mode by powering (parts of) the device down to decrease the power when idle. For example, when a processor is transitioned to a sleep mode, the current state is stored, and the state is recovered when the processor is awakened. Another example is a hard-disk drive, which spins down when put into a sleep mode and spins up when it is awakened. These devices have in common that a cost in both latency and energy is associated with switching to a sleep mode and waking up. The energy consumption determines the breakeven time, which is the minimum length of an idle period that makes it worthwhile to transition to a sleep mode. It is commonly assumed that the breakeven time for a sleep mode is longer than the latency associated with switching to and from this sleep mode. It was shown empirically that algorithms that use this assumption still work well when the latency is taken into account (Irani et al. 2007).
Devices can even have multiple sleep modes, with different breakeven times, or there can be multiple devices within a system with different breakeven times. The energy consumption during an idle period is generally modeled as a piecewise concave function \(E^{\text {SL}}: \mathbb {R}_0^+ \rightarrow \mathbb {R}_0^+\) of the length of the idle period (Augustine et al. 2008; Gerards and Kuper 2013).
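The following sketch illustrates the breakeven time and the idle-energy function \(E^{\text {SL}}\) under a simplifying assumption that is not taken from the cited papers: each mode is described by a constant power and a fixed transition energy, and latency is ignored. The function names and the device parameters are hypothetical.

```python
def idle_energy(length, modes):
    """E^SL(length): energy of the cheapest mode for an idle period.

    modes: iterable of (power, transition_energy) pairs; remaining idle
    is represented by the pair (idle_power, 0.0).
    """
    return min(power * length + transition for power, transition in modes)


def breakeven_time(idle_power, sleep_power, transition_energy):
    """Idle length at which sleeping and remaining idle cost the same."""
    return transition_energy / (idle_power - sleep_power)


# Hypothetical device: idle power 2.0, one sleep mode with power 0.5
# and a transition energy of 6.0 per sleep/wake-up round trip.
modes = [(2.0, 0.0), (0.5, 6.0)]
print(breakeven_time(2.0, 0.5, 6.0))  # 4.0
print(idle_energy(2.0, modes))        # 4.0  (remaining idle is cheaper)
print(idle_energy(10.0, modes))       # 11.0 (sleeping is cheaper)
```

Because `idle_energy` is a minimum of affine functions of the idle length, it is concave and piecewise linear in this simplified model.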
3.4 Problem notation and qualification
To classify a wide variety of algorithmic power management problems, this section introduces a compact notation based on the three-field notation for scheduling problems that was introduced by Graham et al. (1977). The notation is similar to what is used in the algorithmic power management literature (e.g., Bampis et al. 2015), but avoids several ambiguities by making explicit which power management techniques are used. A problem is written as \(\mathfrak a \vert \mathfrak b \vert \mathfrak c\), where the three fields have the following meaning.

\(\mathfrak a\): The system field describes the architecture of the system. This includes the number of processors (or devices), whether speed scaling (ss) and/or sleep modes (sl) are used, and properties of the system with respect to speed scaling and/or sleep modes (see Table 1). The entries nonunif, disc, and global all imply speed scaling (ss) to keep the notation concise.

\(\mathfrak b\): The second field contains the task characteristics like arrival time, deadline, restrictions on the ordering of timing constraints of tasks (agree, prec, lami), and scheduling properties (migr, pmtn, prio, sched). E.g., when \(a_n\) occurs in this field, it means that tasks have arrival times, otherwise \(a_n{=}0\) (for all n) is implied. As we focus on energy minimization under deadline constraints, \(d_n\) always occurs in \(\mathfrak b\) and implies that deadlines must be met.

\(\mathfrak c\): The third field contains the scheduling objective. In the context of this article, the field \(\mathfrak c\) only contains “E” to denote that the energy should be minimized, but we maintain this field to preserve compliance with Graham’s notation.
Table 1 Notation for algorithmic power management problems
Field  Entry  Meaning
\(\mathfrak a\)  1  Single processor
\(\mathfrak a\)  \(P_M\)  M parallel processors
\(\mathfrak a\)  \(\mathrm {ss}\)  Speed scaling is supported
\(\mathfrak a\)  \(\mathrm {nonunif}\)  A nonuniform power function is used (\(\mathrm {ss}\) implied)
\(\mathfrak a\)  \(\mathrm {disc}\)  Discrete speed scaling is used (\(\mathrm {ss}\) implied)
\(\mathfrak a\)  \(\mathrm {global}\)  Global speed scaling is used (\(\mathrm {ss}\) implied)
\(\mathfrak a\)  \(\mathrm {sl}\)  Sleep modes are supported
\(\mathfrak b\)  \(a_n\)  Arrival time
\(\mathfrak b\)  \(a_n{=}a\)  Same arrival time a for all tasks
\(\mathfrak b\)  \(d_n\)  Deadline constraint
\(\mathfrak b\)  \(d_n{=}d\)  Same deadline constraint d for all tasks
\(\mathfrak b\)  \(w_n{=}w\)  All tasks have workload w
\(\mathfrak b\)  \(\mathrm {agree}\)  Agreeable deadlines (\(a_n \le a_m \Leftrightarrow d_n \le d_m\))
\(\mathfrak b\)  \(\mathrm {lami}\)  Laminar instances (\([a_i,d_i] \subset [a_j,d_j] \vee [a_j,d_j] \subset [a_i,d_i] \vee [a_i,d_i] \cap [a_j,d_j] = \emptyset \))
\(\mathfrak b\)  \(\mathrm {prec}\)  Tasks have precedence constraints
\(\mathfrak b\)  \(\mathrm {pmtn}\)  Preemptions are allowed
\(\mathfrak b\)  \(\mathrm {prio}\)  Tasks have a fixed priority
\(\mathfrak b\)  \(\mathrm {migr}\)  Task migration is allowed
\(\mathfrak b\)  \(\mathrm {sched}\)  A schedule is given
\(\mathfrak c\)  E  Minimize the energy consumption
4 Fundamental results
Over the years, many fundamental results on algorithmic power management have been obtained, which form the basis of many algorithms, or relate problems to each other, so that the solution to one problem can be used to find a solution to another problem. This section introduces these fundamental results and concepts in the area of algorithmic power management. One of the most important results is that for the single-processor case it is optimal to use a constant speed between begin and completion time of tasks due to the convexity of the power function (Sect. 4.1). Although this result only holds for convex power functions, using the idea presented in Sect. 4.2, it can also be used for the nonconvex situation as all power functions can be “made” convex. Convexity is not the only requirement for optimization: one has to be careful that the chosen speed for a task is not too low, because then static power may dominate (Sect. 4.3).
Whereas the above results are often presented in a continuous speed scaling context, in practice, discrete speed scaling is more often used. Many speed scaling problems (with a given schedule) can be formulated as a linear program (Sect. 4.4). Moreover, in the single-processor case it is straightforward to derive the solution to this discrete problem from the solution to the continuous case (Sect. 4.5).
For multiprocessor problems, it can be shown that in the optimal solution of several problems the power consumption remains constant over time. This fact is referred to as the power equality (Sect. 4.6). The problem wherein every task has a different power function (Sect. 4.7) is related to this multiprocessor problem. We present a simple transformation that transforms this problem with multiple power functions to the problem wherein all tasks have the same power function.
Finally, we briefly discuss that speed scaling problems wherein preemptions are allowed can sometimes be written as a flow problem (Sect. 4.8), and that when scheduling for sleep modes, it is often best to unbalance the length of idle periods (Sect. 4.9).
4.1 Constant speed
Whenever a single processor executes a single task using varying speeds, the energy consumption can be decreased by running it at the average speed. This even holds when the task is executed with interruptions (i.e., on times given by any set \(\mathcal {T}\)). This result holds for all convex power functions, where this property does not form a restriction as is discussed in Sect. 4.2. We formalize this result, which is a direct consequence of Jensen’s inequality (Irani et al. 2007), in the following theorem.
Theorem 1
Proof
Theorem 1 shows that for continuous speed scaling, there always exists a constant speed that is optimal for a single task on a single processor. Many papers (e.g., Huang and Wang 2009; Yao et al. 1995; Li et al. 2006) use the idea behind Theorem 1, and show that minimizing unnecessary speed fluctuations on a single processor is optimal also for situations with more than one task, i.e., \(N>1\). However, when there are arrival times, deadlines, etc., the optimal constant speed may change on these specific times, meaning that the optimal speed function is piecewise constant.
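As a sketch of the argument behind this result (the symbols \(|\mathcal {T}|\) for the total length of \(\mathcal {T}\) and \(\bar{s}\) for the average speed are introduced here only for illustration), suppose that a task with workload w is executed on the times in \(\mathcal {T}\) with speed function s, so that \(\int _{\mathcal {T}} s(t)\,\mathrm {d}t = w\). For a convex power function p, Jensen’s inequality gives

\[
\int _{\mathcal {T}} p\bigl (s(t)\bigr )\,\mathrm {d}t \;\ge \; |\mathcal {T}|\; p\!\left( \frac{1}{|\mathcal {T}|}\int _{\mathcal {T}} s(t)\,\mathrm {d}t\right) \;=\; |\mathcal {T}|\; p(\bar{s}), \qquad \bar{s} = \frac{w}{|\mathcal {T}|}.
\]

The left-hand side is the energy of the varying-speed execution, and the right-hand side is the energy of running at the constant speed \(\bar{s}\) on the same times, which completes the same workload w.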
4.2 Nonconvex power function
The previous section (and with it, a large part of the literature) assumes that the power function is convex, but for technical reasons this is not always the case. However, it is possible to circumvent this by not using the speeds of the regions where the function is not convex, since we can show that these speeds are not efficient. This process is first explained for discrete speed scaling.
Based on the above, we may assume that all speeds in \(\mathcal {S}\) are efficient speeds, thus Eq. (2) holds for all speeds (i.e., inefficient speeds are “discarded”), as is discussed by Hsu and Feng (2005). This illustrates that we can always assume without loss of generality that the power function is convex.
Bansal et al. (2013) state that a similar procedure can be followed for continuous speed scaling. Note that the static and dynamic power models from Sect. 3.2 are already convex.
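One way to discard inefficient discrete speeds can be sketched as follows. The sketch assumes that a speed is efficient exactly when its (speed, power) point lies on the lower convex hull of all such points; this is a common reading and not necessarily the precise condition used in the cited papers. The function name and the sample values are hypothetical.

```python
def efficient_speeds(points):
    """Return the (speed, power) pairs on the lower convex hull.

    points: iterable of (speed, power) pairs with distinct speeds.
    """
    hull = []
    for s, p in sorted(points):
        # Drop the last hull point while it lies on or above the chord
        # from the point before it to the new point.
        while len(hull) >= 2:
            (s1, p1), (s2, p2) = hull[-2], hull[-1]
            if (p2 - p1) * (s - s1) >= (p - p1) * (s2 - s1):
                hull.pop()
            else:
                break
        hull.append((s, p))
    return hull


# Speed 2 is inefficient here: alternating between speeds 1 and 3 gives
# average speed 2 at average power 5, which is below its power of 6.
print(efficient_speeds([(1, 1), (2, 6), (3, 9), (4, 20)]))
# [(1, 1), (3, 9), (4, 20)]
```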
4.3 Critical speed
With the presence of static power, convexity of the power function is not the only aspect which has to be taken into account when finding an optimal solution for some speed scaling problems.
In practice, processors consume static power (\({\gamma _2}>0\)), i.e., the power consumption at speed 0 is positive (\(p(0)>0\)). Unfortunately, most papers do not clearly define for which time period they take the static power into account. In this survey, we assume that the application begins at some given time \(t^B\), and the power consumption of the processor is accounted for until some time \(t^C\). Furthermore, we either assume that \(t^C=c_N\) (completion time of the last task) or \(t^C=d_N\) (deadline of the last task). For example, Yao et al. (1995) only assume that the power function is convex and do not mention static power. However, their result only holds when the static power cannot be influenced, i.e., when it is accounted for until the deadline of the last task and not only until the completion time of the last task. As static power cannot be influenced in this case, the situation where \(p(0)=0\) gives the same solution as the case where \(p(0)>0\). This scenario is mentioned by Irani et al. (2007).
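To illustrate why the speed should not be chosen too low when \(t^C=c_N\), consider the following sketch. It assumes a power function of the form \(p(s)={\gamma _1}s^{\alpha }+{\gamma _2}\) with \(\alpha >1\); this concrete form and the symbol \(s^{\text {crit}}\) are assumptions made for the illustration. Executing w work at a constant speed s takes \(\frac{w}{s}\) time, so the energy is

\[
E(s) = \frac{w}{s}\,p(s) = w\left( {\gamma _1}s^{\alpha -1} + \frac{{\gamma _2}}{s}\right) , \qquad
\frac{\mathrm {d}E}{\mathrm {d}s} = w\left( {\gamma _1}(\alpha -1)s^{\alpha -2} - \frac{{\gamma _2}}{s^{2}}\right) = 0
\;\Longleftrightarrow \; s^{\text {crit}} = \left( \frac{{\gamma _2}}{{\gamma _1}(\alpha -1)}\right) ^{1/\alpha }.
\]

Under these assumptions, running slower than \(s^{\text {crit}}\) increases the energy, because the additional static energy outweighs the savings in dynamic energy.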
4.4 Discrete speed scaling as a linear program
Besides static power, many processors have the restriction that only a small set of speeds is allowed (discrete speed scaling). Many discrete speed scaling problems with a given schedule can be formulated as a linear program, as we show in the following.
Constraints like arrival time, deadline, and precedence constraints can all be formulated as linear constraints. Therefore, many discrete speed scaling problems with a given schedule can be formulated as a linear program (Kwon and Kim 2005; Rountree et al. 2007) and, thus, can be solved in polynomial time.
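One possible formulation is sketched below for a single processor on which the tasks are executed without preemption in the given order \(T_1,\dots ,T_N\). It is not necessarily the formulation used in the cited papers, and the variable \(t_{n,k}\) (the time for which task \(T_n\) runs at speed \(\bar{s}_k\)) is introduced here for illustration.

\[
\begin{aligned}
\min \quad & \sum _{n=1}^{N}\sum _{k=1}^{K} p(\bar{s}_k)\,t_{n,k} \\
\text {s.t.}\quad & \sum _{k=1}^{K} \bar{s}_k\,t_{n,k} = w_n && n=1,\dots ,N,\\
& a_n \le b_n && n=1,\dots ,N,\\
& b_n + \sum _{k=1}^{K} t_{n,k} \le d_n && n=1,\dots ,N,\\
& b_n + \sum _{k=1}^{K} t_{n,k} \le b_{n+1} && n=1,\dots ,N-1,\\
& t_{n,k} \ge 0 && n=1,\dots ,N,\; k=1,\dots ,K.
\end{aligned}
\]

The objective and all constraints are linear in the variables \(t_{n,k}\) and \(b_n\), since the speeds \(\bar{s}_k\) and the powers \(p(\bar{s}_k)\) are constants.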
4.5 Relation between continuous and discrete speed scaling
Formulating discrete speed scaling problems as a linear program and solving it with linear programming software provides few insights. Instead, a tailored algorithm for finding the optimal speeds is desirable. Such algorithms are described in many papers (e.g., Yao et al. 1995; Pruhs et al. 2008; Huang and Wang 2009) for continuous speed scaling, while in practice most processors support only discrete speed scaling. Therefore, in the following, we investigate the relation between continuous speed scaling and discrete speed scaling.
When a single task is considered, the optimal speed s resulting from the continuous case can be used to determine the optimal speeds for the discrete case. When the speed s is not one of the available discrete speeds, using only the neighboring speeds \(\bar{s}_i \le s \le \bar{s}_{i+1}\) leads to an optimal solution. More precisely, the first part of the work is executed at speed \(\bar{s}_{i+1}\) and the remaining work is executed at speed \(\bar{s}_{i}\). These fractions of work are calculated so that the overall time remains the same. We refer to this as simulating continuous speed scaling.
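The fractions can be computed as in the following sketch, which assumes a single task and a continuous speed that lies within the range of the available discrete speeds. The function name is hypothetical. The time spent at the higher speed follows from requiring that both the total time and the total work stay the same.

```python
from bisect import bisect_left


def simulate_continuous_speed(workload, speed, discrete_speeds):
    """Split a task into (speed, duration) pieces using neighbouring speeds."""
    speeds = sorted(discrete_speeds)
    if not speeds[0] <= speed <= speeds[-1]:
        raise ValueError("speed outside the available range")
    duration = workload / speed
    i = bisect_left(speeds, speed)  # first index with speeds[i] >= speed
    if speeds[i] == speed:
        return [(speed, duration)]
    low, high = speeds[i - 1], speeds[i]
    # Solve t_high + t_low = duration and high*t_high + low*t_low = workload.
    t_high = duration * (speed - low) / (high - low)
    return [(high, t_high), (low, duration - t_high)]


# Workload 10 at continuous speed 1.5 with discrete speeds 1, 2 and 3:
# the task runs at speed 2 first and then at speed 1.
pieces = simulate_continuous_speed(10.0, 1.5, [1.0, 2.0, 3.0])
print(pieces)
```

In this example both pieces have the same duration, because 1.5 lies exactly halfway between the two neighboring speeds.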
4.6 Power equality
The previous sections mainly focused on the single-processor case. In the multiprocessor case with precedence constraints, new issues arise that are best illustrated with an example.
Example 1
Consider the three tasks from Fig. 1, each with w work, which are to be executed on a local speed scaling multiprocessor system. Task \(T_1\) has to be finished before tasks \(T_2\) and \(T_3\) can be executed, and the application as a whole has a global arrival time 0 and a global deadline d. An example of a naive speed assignment is \(s_1=s_2=s_3=\frac{2w}{d}\). Note that Theorem 1 cannot be used to argue that this assignment is optimal, because now multiple processors are active. In fact, this assignment is not optimal, since it can be improved by slightly increasing \(s_1\) so that task \(T_1\) consumes slightly more energy, while the two tasks \(T_2\) and \(T_3\) can decrease their energy consumption. The speed of task \(T_1\) should not be too high (discussed below), because then its energy consumption is no longer compensated by tasks \(T_2\) and \(T_3\).
This example illustrates that the optimal speeds depend on the amount of parallelism of the scheduled tasks. Pruhs et al. (2008) introduce the power equality for tasks with a common arrival time and deadline: in the optimal solution, the total power consumption remains constant over time. The speeds can then be calculated using this power and the number of tasks that are executed in parallel. For the concrete situation of Fig. 1, this means that \(p(s_1) = p(s_2) + p(s_3)\). This power equality generalizes Theorem 1.
Example 2
Consider again the task graph from Fig. 1 with the power function \(p(s)=s^3\), and assume that all the tasks have 10 work, and the global deadline is 40. A naive speed assignment uses the constant speed \(s_1=s_2=s_3=\frac{1}{2}\).
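The numbers of this example can be checked with a short sketch. It assumes that the power equality \(p(s_1)=p(s_2)+p(s_3)\) holds with \(s_2=s_3\) and that the deadline is met exactly; the function names are hypothetical.

```python
def fork_speeds(work, deadline, alpha=3.0, branches=2):
    """Speeds for one task followed by `branches` parallel tasks of equal work.

    Uses p(s) = s**alpha and the power equality p(s_1) = branches * p(s_2).
    """
    ratio = branches ** (1.0 / alpha)  # s_1 = ratio * s_2
    # work / s_1 + work / s_2 = deadline, with s_1 = ratio * s_2
    s_parallel = work * (1.0 + 1.0 / ratio) / deadline
    return ratio * s_parallel, s_parallel


def energy(work, speed, alpha=3.0):
    """Energy of `work` at constant speed: p(s) * work / s."""
    return work * speed ** (alpha - 1)


s1, s2 = fork_speeds(work=10.0, deadline=40.0)
naive = 3 * energy(10.0, 0.5)
optimal = energy(10.0, s1) + 2 * energy(10.0, s2)
print(f"{s1:.3f} {s2:.3f}")        # 0.565 0.448
print(f"{naive:.3f} {optimal:.3f}")  # 7.500 7.214
```

Under these assumptions, the first task runs slightly faster than \(\frac{1}{2}\) and the two parallel tasks run slightly slower, which lowers the total energy compared to the naive assignment.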
4.7 Nonuniform power
Most papers assume that uniform power is used (see Sect. 3.2), while in practice the parameter \({\gamma _1}\) of the power function is not constant (i.e., nonuniform) for all tasks (Kwon and Kim 2005), and a task specific factor \({\gamma _1}(n)\) for the dynamic power of task \(T_n\) is more appropriate. A similar situation occurs in the multicore situation with m active cores, where the dynamic power must be multiplied by m. This fact is used by several papers on multicore speed scaling (e.g., Gerards et al. 2015).
The newly obtained problem has uniform power, can be solved using classic algorithms, and the resulting solution can be transformed back to a solution to the problem with nonuniform power.
Table 2 Uniprocessor algorithmic power management problems
Section  Problem  Papers
General tasks (Sect. 5.1)  \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn} \vert E\)  
General tasks (Sect. 5.1)  \(1;\mathrm {disc} \vert a_n;d_n;\mathrm {pmtn} \vert E\)  
General tasks (Sect. 5.1)  \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {prio} \vert E\)  Quan and Hu (2003)
General tasks (Sect. 5.1)  \(1;\mathrm {ss} \vert a_n;d_n \vert E\)  Antoniadis and Huang (2013), Bampis et al. (2015), Huang and Ott (2014), Bampis et al. (2014a), Cohen-Addad et al. (2015), Bampis et al. (2014b)
General tasks (Sect. 5.1)  \(1;\mathrm {ss} \vert a_n;d_n;w_n=1 \vert E\)  Huang and Ott (2014)
General tasks (Sect. 5.1)  \(1;\mathrm {ss};\mathrm {nonunif} \vert a_n;d_n;\mathrm {pmtn} \vert E\)  Kwon and Kim (2005)
General tasks (Sect. 5.1)  \(1;\mathrm {ss};\mathrm {nonunif};\mathrm {disc} \vert a_n;d_n \vert E\)  Kwon and Kim (2005)
General tasks (Sect. 5.1)  \(1;\mathrm {sl} \vert a_n;d_n;\mathrm {pmtn} \vert E\)  Baptiste et al. (2012)
General tasks (Sect. 5.1)  \(1;\mathrm {ss};\mathrm {sl} \vert a_n;d_n;\mathrm {pmtn} \vert E\)  Irani et al. (2007), Albers and Antoniadis (2014), Antoniadis et al. (2015)
Agreeable deadlines (Sect. 5.2)  \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {agree} \vert E\)  
Agreeable deadlines (Sect. 5.2)  \(1;\mathrm {sl} \vert a_n;d_n;\mathrm {agree} \vert E\)  Angel et al. (2014)
Agreeable deadlines (Sect. 5.2)  \(1;\mathrm {sl};\mathrm {ss} \vert a_n;d_n;\mathrm {agree} \vert E\)  Bampis et al. (2012a)
Laminar instances (Sect. 5.3)  \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {lami} \vert E\)  Li et al. (2006)
Laminar instances (Sect. 5.3)  \(1;\mathrm {ss} \vert a_n;d_n=d;\mathrm {pmtn} \vert E\)  Li et al. (2006)
Laminar instances (Sect. 5.3)  \(1;\mathrm {ss} \vert a_n=a;d_n;\mathrm {pmtn} \vert E\)  Li et al. (2006)
Laminar instances (Sect. 5.3)  \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {lami} \vert E\)  Huang and Ott (2014)
4.8 Flow problems
Several power management problems can be reduced to (convex) flow problems. However, as these formulations as flow problems depend on the concrete algorithmic power management problem, we do not discuss this technique in more detail. We refer the interested readers to three papers, namely Bampis et al. (2012b), Albers et al. (2011), and Angel et al. (2012b), which use such techniques to solve the problem \(P_M;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {migr} \vert E\). In Sect. 6.1 these papers are briefly discussed.
4.9 Sleep modes
5 Uniprocessor problems
The previous section introduced many general concepts that can be applied to a variety of power management problems. This section surveys concrete algorithms for uniprocessor power management problems (see Table 2 for an overview), and relates these algorithms (when applicable) to the results that were presented in the previous section.
Recall that for each task \(T_n\) we have a workload \(w_n\), an arrival time \(a_n\), and a deadline \(d_n\) before which the task has to finish. In the case of speed scaling, a speed \(s_n\) is to be determined, leading to an execution time \(e_n\). We use \(b_n\) and \(c_n\) to denote the begin and completion time of task \(T_n\), respectively.
The problems in this section are grouped depending on restrictions on the ordering of the timing constraints of tasks. For all problems discussed in this section, the problem consists of finding a schedule together with speeds and/or sleep decisions. First, the problems without any restrictions on the timing constraints are discussed in Sect. 5.1. Several variants of this problem are solved by algorithms with a relatively high polynomial time complexity, or are NP-hard. Second, in Sect. 5.2, the simpler case of problems with agreeable deadlines is discussed. For many variants of this problem, algorithms with a quadratic time complexity are known. Third, laminar problems are discussed in Sect. 5.3.
5.1 General tasks
In this section, we discuss general tasks, i.e., tasks that have arbitrary arrival times and deadlines. The first variant that we consider allows preemptions of tasks (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn} \vert E\)). According to Albers et al. (2011), this is the most extensively studied speed scaling problem in the algorithm-oriented literature. Yao et al. (1995) present the well-known YDS algorithm (named after the authors) to solve this problem. This algorithm is often used as a subroutine by other algorithms, and in complexity proofs.
The considered problem involves both scheduling and speed scaling. However, if we have specified the speed to use over the complete time horizon, or if we have specified the speed of each task, we can find a corresponding feasible schedule (if one exists for this speed assignment) by always executing the available task with the earliest deadline (Yao et al. 1995). The basic idea of the YDS algorithm is to avoid unnecessary speed changes (see Sect. 4.1). Its solution has the property that no speed can be lowered to decrease the energy consumption without violating a deadline.
Table 3 Tasks for Example 3
Task  Arrival time  Deadline  Workload 

\(T_1\)  0  30  30 
\(T_2\)  5  10  10 
\(T_3\)  15  55  10 
\(T_4\)  25  35  10 
Example 3
(YDS algorithm) Consider the tasks from Table 3 of which the arrival times and deadlines are depicted in Fig. 2a. The YDS algorithm first determines the critical interval, which is \(I_{2,2}\) in the first iteration of the algorithm (see Table 4). Since the density of this interval is \(g(I_{2,2})=2\), task \(T_2\) is assigned the speed \(s_2=2\). Next, the interval \(I_{2,2}\) is removed, and the arrival times and deadlines of the other tasks are adapted accordingly (see Fig. 2b).
Table 4 Interval densities for Example 3
Interval  \(g(I_{i,j})\), iteration 1  \(g(I_{i,j})\), iteration 2  \(g(I_{i,j})\), iteration 3
\(I_{1,1}\)  \(\frac{40}{30} \approx 1.333\)  \(\frac{30}{25} = 1.2\)  -
\(I_{1,2}\)  \(\frac{10}{10} = 1\)  -  -
\(I_{1,3}\)  \(\frac{60}{55} \approx 1.091\)  \(\frac{50}{50} = 1\)  -
\(I_{1,4}\)  \(\frac{50}{35} \approx 1.429\)  \(\frac{40}{30} \approx 1.333\)  -
\(I_{2,1}\)  \(\frac{10}{25} = 0.4\)  -  -
\(I_{2,2}\)  \(\frac{10}{5} = 2\)  -  -
\(I_{2,3}\)  \(\frac{30}{50} = 0.6\)  -  -
\(I_{2,4}\)  \(\frac{20}{30} \approx 0.667\)  -  -
\(I_{3,1}\)  0  0  -
\(I_{3,2}\)  0  -  -
\(I_{3,3}\)  \(\frac{20}{40} = 0.5\)  \(\frac{20}{40} = 0.5\)  \(\frac{10}{20} = 0.5\)
\(I_{3,4}\)  \(\frac{10}{20} = 0.5\)  \(\frac{10}{20} = 0.5\)  -
\(I_{4,1}\)  0  0  -
\(I_{4,2}\)  0  -  -
\(I_{4,3}\)  \(\frac{10}{30} \approx 0.333\)  \(\frac{10}{30} \approx 0.333\)  -
\(I_{4,4}\)  \(\frac{10}{10} = 1\)  \(\frac{10}{10} = 1\)  -
A dash marks an interval that no longer exists in that iteration because one of its defining tasks has already been assigned a speed.
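The iterations of Example 3 can be reproduced with a small script. The following is a minimal, unoptimized sketch of the critical-interval idea, not the original implementation of Yao et al. (1995): it assumes that an interval's density is the total work of the tasks whose windows lie completely inside the interval, divided by the interval length, and that removing a critical interval simply compresses the time axis. The names `yds`, `squeeze` and `EXAMPLE_3` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction

# Tasks of Example 3: name -> (arrival, deadline, workload)
EXAMPLE_3 = {"T1": (0, 30, 30), "T2": (5, 10, 10),
             "T3": (15, 55, 10), "T4": (25, 35, 10)}

def yds(tasks):
    """Return a dict name -> speed using repeated critical-interval removal."""
    tasks = dict(tasks)
    speeds = {}
    while tasks:
        best = None  # (density, start, end, names of tasks inside)
        starts = sorted({a for a, _, _ in tasks.values()})
        ends = sorted({d for _, d, _ in tasks.values()})
        for t1 in starts:
            for t2 in ends:
                if t2 <= t1:
                    continue
                inside = [n for n, (a, d, _) in tasks.items()
                          if a >= t1 and d <= t2]
                work = sum(tasks[n][2] for n in inside)
                density = Fraction(work, t2 - t1)
                if best is None or density > best[0]:
                    best = (density, t1, t2, inside)
        density, t1, t2, inside = best
        # All tasks inside the critical interval run at its density.
        for n in inside:
            speeds[n] = density
            del tasks[n]
        length = t2 - t1

        def squeeze(t):
            # Cut [t1, t2] out of the time axis.
            if t <= t1:
                return t
            if t >= t2:
                return t - length
            return t1

        tasks = {n: (squeeze(a), squeeze(d), w)
                 for n, (a, d, w) in tasks.items()}
    return speeds

speeds = yds(EXAMPLE_3)
```

For the tasks of Table 3 this sketch selects critical intervals with densities 2, 4/3 and 1/2 in its three iterations, so that \(T_2\) runs at speed 2, \(T_1\) and \(T_4\) at speed 4/3, and \(T_3\) at speed 1/2.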
In a schedule created by this YDS algorithm, the processor is active from the arrival of the first task to the deadline of the last task (unless there are no tasks in some interval). Hence, because of static power, this algorithm is only optimal when it is assumed that the processor remains active until the last deadline (Irani et al. 2007). To the best of our knowledge, there is no optimal algorithm known for the situation where no static energy is consumed after the last executed task.
The original implementation of the YDS algorithm has a time complexity of \(O(N^3)\) (Li et al. 2006). As the original paper (Yao et al. 1995) does not contain a proof of optimality, several proofs of optimality have appeared in the literature afterwards. Bansal et al. (2007) use the Karush-Kuhn-Tucker (KKT) conditions (Boyd and Vandenberghe 2004) to prove optimality of YDS for the power function \(p(s)=s^\alpha \). Li et al. (2006) give a different proof, and present an efficient implementation of YDS with time complexity \(O(N^2 \log N)\). They also provide an \(O(K N \log N)\) algorithm for the variant with discrete speed scaling with K speeds (\(1;\mathrm {disc} \vert a_n;d_n;\mathrm {pmtn} \vert E\)). A recent technical report by Li et al. (2014) states that the continuous problem can be solved in \(O(N^2)\) and the discrete problem can be solved in \(O(N \log \max \{N,K\})\). An alternative method for obtaining the optimal speeds in the discrete case is by applying the YDS algorithm, and then simulating the obtained speeds as discussed in Sect. 4.5 (Kwon and Kim 2005; Hsu and Feng 2005).
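The simulation step mentioned last can be sketched as follows. This is an illustration under the assumption that "simulating" a continuous speed means splitting its execution time between the two adjacent discrete speeds so that the same amount of work is completed in the same time; the function name `simulate_speed` and its parameters are hypothetical.

```python
from bisect import bisect_left

def simulate_speed(s, duration, levels):
    """Split `duration` between the two discrete speeds adjacent to `s`.

    Returns a list of (speed, time) pairs whose total time is `duration`
    and whose total work equals s * duration.
    """
    levels = sorted(levels)
    if not levels[0] <= s <= levels[-1]:
        raise ValueError("speed outside the available discrete range")
    i = bisect_left(levels, s)      # first level that is >= s
    if levels[i] == s:
        return [(s, duration)]      # s is itself a discrete speed
    lo, hi = levels[i - 1], levels[i]
    # Time at the higher speed, chosen so that the total work is preserved.
    t_hi = duration * (s - lo) / (hi - lo)
    return [(lo, duration - t_hi), (hi, t_hi)]
```

For example, a speed of 1.5 held for 10 time units on a processor with speeds 1, 2 and 3 is replaced by 5 time units at speed 1 followed by 5 time units at speed 2, which completes the same 15 units of work.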
The YDS algorithm schedules tasks in EDF order. This implies that when tasks must be scheduled in a predefined order (e.g., based on priorities), the YDS algorithm cannot be used (Quan and Hu 2003). Yun and Kim (2003) show that the fixed priority variant of this problem (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {prio} \vert E\)) is NP-hard, and give an FPTAS for the problem.
There exist several other variations of the problem introduced by Yao et al. (1995). The variant that does not allow preemptions of tasks (\(1;\mathrm {ss} \vert a_n;d_n \vert E\)) is NP-hard (Antoniadis and Huang 2013). Bampis et al. (2015) designed an algorithm for this problem with the approximation ratio \((1+w^{\max }/w^{\min })^\alpha \), where \(w^{\max }\) and \(w^{\min }\) are, respectively, the upper and lower bounds on the work of tasks. Bampis et al. (2014b) use results from several papers (Huang and Ott 2014; Bampis et al. 2014a; Cohen-Addad et al. 2015) for this problem to design an algorithm with approximation ratio \((1+\epsilon )^\alpha \tilde{B}_\alpha \), where \(\tilde{B}_\alpha = \sum _{k=0}^\infty \frac{k^\alpha e^{-1}}{k!}\) is a generalization of the Bell numbers that works for fractional values of \(\alpha \). When all tasks have the same workload (\(1;\mathrm {ss} \vert a_n;d_n;w_n=1 \vert E\)), the problem can be solved in polynomial time (Huang and Ott 2014).
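To get a feeling for the size of the generalized Bell number in this ratio, the series can be evaluated numerically. The snippet below is a sketch that reads the summand as \(k^\alpha e^{-1}/k!\) and truncates the sum after a fixed number of terms; the function name and the `terms` parameter are illustrative assumptions.

```python
import math

def generalized_bell(alpha, terms=200):
    """Approximate sum_{k>=0} k**alpha * exp(-1) / k! by a truncated series."""
    total = 0.0
    # The k = 0 term vanishes for alpha > 0, so the loop starts at k = 1.
    for k in range(1, terms):
        # Work in log space to avoid overflow in k**alpha and k!.
        total += math.exp(alpha * math.log(k) - math.lgamma(k + 1) - 1.0)
    return total
```

For integer \(\alpha \) the value coincides with the ordinary Bell numbers, e.g., 2 for \(\alpha =2\) and 5 for \(\alpha =3\), which gives a quick sanity check of the truncation.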
Kwon and Kim (2005) study another variation, where the dynamic power consumption may differ per task (\(1;\mathrm {ss}; \mathrm {nonunif} \vert a_n;d_n;\mathrm {pmtn} \vert E\)). This is, for example, due to switched capacitances. They solve this problem using a substitution of variables (see Sect. 4.7). They formulate the discrete speed scaling variant of this problem (\(1;\mathrm {ss};\mathrm {nonunif};\mathrm {disc} \vert a_n;d_n;\mathrm {pmtn} \vert E\)) as a linear program (see Sect. 4.4).
The sleep mode counterpart of the YDS problem is \(1;\mathrm {sl} \vert a_n;d_n;\mathrm {pmtn} \vert E\). Baptiste et al. (2012) present an algorithm that is commonly referred to as BCD (named after the authors), that uses dynamic programming to solve the problem in \(O(N^4)\) time. Their algorithm is restricted to instances where processors have only a single sleep mode.
Other authors (Albers and Antoniadis 2014; Irani et al. 2007) study the combination of speed scaling and sleep modes, namely \(1;\mathrm {ss};\mathrm {sl} \vert a_n;d_n;\mathrm {pmtn} \vert E\), which is an NP-hard problem. The heuristic by Irani et al. (2007) is a 2-approximation and is relatively easy to implement. This heuristic uses YDS to determine the speeds, and whenever YDS determines a speed \(s_n{<}s^{\text {crit}}\), this speed is replaced by the speed \(s^{\text {crit}}\) (this is called an \(s^{\text {crit}}\)-schedule). These changes create idle time that can be used to put the processor into a sleep mode. As long as there are tasks available, they are consecutively executed, followed by an idle period of maximal length. This scheduling method is used to create relatively large idle periods. Albers and Antoniadis (2014) use a similar method, but with the cutoff speed \(s^*\) instead of \(s^{\text {crit}}\), where \(s^*\) is determined by solving \(\bar{p}(s^*)=\frac{4}{3}\bar{p}(s^{\text {crit}})\). Furthermore, they use BCD instead of the scheduling algorithm by Irani et al. (2007). This results in a 4/3-approximation, but has a higher time complexity (\(O(N^4)\)) because of the use of BCD. When the power function \(p(s)={\gamma _1}s^\alpha + {\gamma _2}\) is used (realistic for DVFS), the approximation ratio becomes 137/117 (\({<}1.171\)). Recently, Antoniadis et al. (2015) presented an FPTAS for this problem that is based on dynamic programming. In this dynamic programming approach, the time horizon is discretized by a polynomial number of intervals, where the number of intervals depends on the required approximation ratio.
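The speed-raising step of the heuristic by Irani et al. (2007) can be illustrated for the power function \(p(s)={\gamma _1}s^\alpha + {\gamma _2}\). The sketch below assumes that \(s^{\text {crit}}\) is the speed that minimizes the energy per unit of work, \(p(s)/s\), which for this power function gives \(s^{\text {crit}} = (\gamma _2/(\gamma _1(\alpha -1)))^{1/\alpha }\). The function names are hypothetical, and the placement of idle periods is not modeled.

```python
def critical_speed(gamma1, gamma2, alpha):
    """Minimizer of p(s)/s for p(s) = gamma1 * s**alpha + gamma2, alpha > 1."""
    return (gamma2 / (gamma1 * (alpha - 1))) ** (1.0 / alpha)

def scrit_schedule_speeds(yds_speeds, s_crit):
    """Raise every speed below s_crit to s_crit; keep faster speeds unchanged."""
    return {name: max(s, s_crit) for name, s in yds_speeds.items()}

# Illustrative values only: YDS speeds of Example 3 and p(s) = s**3 + 2.
s_crit = critical_speed(1.0, 2.0, 3.0)
raised = scrit_schedule_speeds(
    {"T1": 4 / 3, "T2": 2.0, "T3": 0.5, "T4": 4 / 3}, s_crit)
```

With these illustrative parameters only \(T_3\) is sped up, and the time it no longer occupies becomes available as idle time for a sleep period.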
5.2 Agreeable deadlines
In applications like multimedia and telecommunication, the arrival times and deadlines are usually in the same order (i.e., \(a_n < a_m \Leftrightarrow d_n \le d_m\)). Such applications are said to have agreeable deadlines. This special structure of the timing constraints makes the development of efficient speed scaling and sleep mode algorithms possible. One main reason for this is that we can assume w.l.o.g. that the tasks are scheduled in order of their timing constraints (i.e., deadlines) and that no preemption is used (for the latter, see, e.g., Bampis et al. 2015).
Speed scaling for systems with agreeable deadlines (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {agree} \vert E\)) is studied by many authors (e.g., Huang and Wang 2009; Wu et al. 2011). Huang and Wang (2009) present an algorithm that calculates the optimal speeds in quadratic time. Their algorithm first schedules the tasks using the same speed for all tasks. This speed is calculated so that all tasks are scheduled exactly within the time interval between the first arrival time and the last deadline without any idle time. Then, a task \(T_n\) with the largest violation of an arrival time or a deadline in this schedule is used to divide the set of tasks into two subsets: the tasks before and the tasks after the violation. For a deadline violation, the completion time of task \(T_n\) is fixed to \(d_n\), while for an arrival time violation the begin time of task \(T_n\) is fixed to \(a_n\). Then the procedure is recursively repeated for both subsets.
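The recursive splitting described above can be sketched in a few lines. This is an illustration that follows the textual description only, not the implementation of Huang and Wang (2009), and it makes no claim about matching their running time. It assumes that the tasks are given in order of deadline, that the instance is feasible, that workloads are positive, and that times are integers (exact fractions avoid rounding problems when detecting violations). All names are hypothetical.

```python
from fractions import Fraction

def agreeable_speeds(tasks):
    """tasks: list of (arrival, deadline, work), ordered by deadline.

    Returns one speed per task, obtained by recursive splitting at the
    largest arrival or deadline violation of a uniform-speed schedule.
    """
    speeds = [None] * len(tasks)

    def solve(lo, hi, t0, t1):
        # Tasks lo..hi-1 are to be executed back to back within [t0, t1].
        if lo >= hi:
            return
        total = sum(Fraction(w) for _, _, w in tasks[lo:hi])
        s = total / (t1 - t0)          # uniform speed filling the interval
        worst, split = 0, None
        t = t0
        for n in range(lo, hi):
            a, d, w = tasks[n]
            if a - t > worst:          # task n would start before it arrives
                worst, split = a - t, (n, a)
            t += w / s
            if t - d > worst:          # task n would miss its deadline
                worst, split = t - d, (n + 1, d)
        if split is None:              # no violation: uniform speed is kept
            for n in range(lo, hi):
                speeds[n] = s
            return
        n, cut = split
        solve(lo, n, t0, cut)          # tasks before the violation
        solve(n, hi, cut, t1)          # tasks after the violation

    solve(0, len(tasks), tasks[0][0], tasks[-1][1])
    return speeds
```

For the three tasks `(0, 4, 4)`, `(2, 6, 1)` and `(5, 12, 1)`, the uniform speed 1/2 makes the first task miss its deadline by 4 time units, so the schedule is split at time 4; a second split at time 6 then yields the speeds 1, 1/2 and 1/6.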
In a variant of this problem, the maximal rate of change of the speed is bounded from above by R (i.e., \(\max _t s'(t) \le R\), for some \(R > 0\)). For this problem Wu et al. (2011) present an algorithm, which finds the optimal solution in quadratic time.
Besides speed scaling with agreeable deadlines, the problem with sleep modes and the combination of speed scaling and sleep modes have also been studied in the literature. For the problem where the processor has a single sleep mode (\(1;\mathrm {sl} \vert a_n;d_n;\mathrm {agree} \vert E\)), the algorithm by Angel et al. (2012a) (see also Angel et al. 2014) can be applied to find an energy optimal schedule. The authors observe that there always exists an optimal solution in which every task \(T_n\) starts at either (i) \(a_n\), (ii) \(c_{n-1}\), or (iii) \(d_n-e_n\). Note that the options for the completion time \(c_{n-1}\) depend on the begin times of tasks \(T_1,\dots ,T_{n-1}\). By this, for each task \(T_k\) (tasks ordered in EDF order), there are O(k) possible begin times, leading to a quadratic time complexity. This result by Angel et al. (2012a) is extended by Bampis et al. (2012a), leading to a cubic time algorithm to find the optimal combination of speed scaling and sleep modes (\(1;\mathrm {sl};\mathrm {ss} \vert a_n;d_n;\mathrm {agree} \vert E\)).
5.3 Laminar instances
In this section, we study tasks with a nested structure, called laminar instances. A real-time system is a laminar instance whenever, for each pair of tasks, the permissible intervals (\([a_n,d_n]\) for task \(T_n\)) either do not overlap or one is completely contained within the other. In a graphical representation, a task \(T_i\) is drawn on top of task \(T_j\) when \([a_i,d_i] \subset [a_j,d_j]\), which creates layers of tasks and explains the term “laminar instances.” According to Li et al. (2006), these structures occur in recursive programs. Since the tasks can be arranged in a tree structure that expresses this recursion, laminar instances are also referred to as tree-structured tasks (Li et al. 2006). Li et al. (2006) give an efficient polynomial time algorithm to find the optimal speeds for laminar instances (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {lami} \vert E\)). The variant of this problem that does not allow preemptions (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {lami} \vert E\)) is NP-hard. Huang and Ott (2014) present a Quasi-Polynomial Time Approximation Scheme (QPTAS) for this problem.
Just as for the problem with agreeable deadlines, the restriction to laminar instances makes the problem easier to solve. In fact, the case where all deadlines or all arrival times are the same both has agreeable deadlines and is a laminar instance. For both of these cases, a linear time solution is available (Li et al. 2006).
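Both structural restrictions are easy to test for a given instance. The following sketch checks them directly from the definitions; it assumes that intervals which only touch at an end point count as non-overlapping and that tasks with equal arrival times may have different deadlines in an agreeable instance (the convention under which a common arrival time is agreeable). The function names are hypothetical.

```python
from itertools import combinations

def is_laminar(windows):
    """windows: list of (arrival, deadline). True if every pair of windows
    is either disjoint or nested."""
    for (a1, d1), (a2, d2) in combinations(windows, 2):
        disjoint = d1 <= a2 or d2 <= a1
        nested = (a1 <= a2 and d2 <= d1) or (a2 <= a1 and d1 <= d2)
        if not (disjoint or nested):
            return False
    return True

def is_agreeable(windows):
    """True if ordering the tasks by arrival also orders them by deadline."""
    ordered = sorted(windows)  # by arrival, ties broken by deadline
    return all(d1 <= d2 for (_, d1), (_, d2) in zip(ordered, ordered[1:]))
```

An instance with a common arrival time, such as the windows `(0, 5)`, `(0, 8)` and `(0, 12)`, passes both checks, whereas the tasks of Example 3 pass neither.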
6 Multiprocessor problems
Multiprocessor algorithmic power management problems
Section  Problem  Papers
General tasks (Sect. 6.1)  \(P_M;\mathrm {ss} \vert a_n=a;d_n=d \vert E\)  Albers et al. (2014), Pruhs et al. (2008), Chen et al. (2004)
General tasks (Sect. 6.1)  \(P_M;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {migr} \vert E\)  Bingham and Greenstreet (2008), Albers et al. (2011), Angel et al. (2012b), Bampis et al. (2012b)
General tasks (Sect. 6.1)  \(P_M;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn} \vert E\)  Albers et al. (2014)
General tasks (Sect. 6.1)  \(P_M;\mathrm {ss} \vert a_n;d_n \vert E\)  Cohen-Addad et al. (2015), Bampis et al. (2015), Bampis et al. (2014a)
Agreeable deadlines (Sect. 6.2)  \(P_M;\mathrm {ss} \vert a_n;d_n;w_n=1;\mathrm {agree} \vert E\)  Bampis et al. (2015)
Tasks with precedence constraints (Sect. 6.3)  \(P_M;\mathrm {ss} \vert a_n=a;d_n=d;\mathrm {prec} \vert E\)  Li (2012)
Tasks with precedence constraints (Sect. 6.3)  \(P_M;\mathrm {global} \vert d_n=d;\mathrm {prec} \vert E\)  Gerards et al. (2015)
Tasks with precedence constraints (Sect. 6.3)  \(P_M;\mathrm {global} \vert a_n;d_n;\mathrm {sched};\mathrm {prec} \vert E\)  Gerards et al. (2014)
6.1 General tasks
We first consider the variant of the problem where all tasks arrive at time 0, have a shared global deadline, and local speed scaling is used to minimize the total energy consumption (\(P_M;\mathrm {ss} \vert a_n=a;d_n=d \vert E\)). This problem is strongly NP-hard (Albers et al. 2014), since the 3-partition problem can be reduced to it. Pruhs et al. (2008) show that the problem of minimizing the makespan under an energy constraint can be formulated as the problem of minimizing the \(\ell _\alpha \) norm of the processor loads (where \(\alpha \) is the exponent in the dynamic power function, see Sect. 3.2). For the latter problem, a PTAS exists (Alon et al. 1997). In a similar fashion, a PTAS can also be derived for energy minimization under a global deadline constraint. Such a PTAS cannot exist (unless \(\mathcal {P}= \mathcal {NP}\)) if there is a maximum speed \(s^\text {max}\), i.e., \(s_n \le s^\text {max}\) for all n (Chen et al. 2004). Chen et al. (2004) study both the general tasks problem (\(P_M;\mathrm {ss} \vert a_n=0;d_n=d \vert E\)) and the variant with restricted speeds. For the first problem they provide an algorithm with a 1.13 approximation ratio, which also attains this ratio for the second problem under some additional restrictions. Furthermore, they present an algorithm that can solve both problems optimally when migrations are allowed.
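The link between the energy consumption and the \(\ell _\alpha \) norm can be made explicit with a short calculation. It is a sketch under the assumptions of a pure dynamic power function \(p(s)=s^\alpha \) with \(\alpha >1\), arrival time 0 and common deadline \(d\); the symbol \(L_m\), the total workload assigned to processor \(m\), is introduced here for illustration only. Since \(p\) is convex, each processor is best run at the constant speed \(L_m/d\) over the whole interval, so that

\[
E \;=\; \sum _{m=1}^{M} d \cdot \left( \frac{L_m}{d}\right) ^{\alpha } \;=\; d^{\,1-\alpha } \sum _{m=1}^{M} L_m^{\alpha } \;=\; d^{\,1-\alpha }\, \Vert L \Vert _\alpha ^{\alpha }.
\]

For a fixed deadline, minimizing the energy therefore amounts to finding the assignment of tasks to processors that minimizes the \(\ell _\alpha \) norm of the load vector \(L=(L_1,\dots ,L_M)\).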
There are several variations of the problem with arbitrary arrival times and deadlines considered in the literature. They differ depending on whether preemptions and migrations of tasks are allowed or not. The widely studied problem \(P_M;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {migr} \vert E\) uses the combination of local speed scaling and scheduling, where preemptions and migrations of tasks are allowed. This problem was first studied by Bingham and Greenstreet (2008), wherein the authors show that the problem is convex. They present an algorithm that is polynomial in the number of tasks, but according to the authors, the complexity is too high for practical applications. However, as they also discuss properties of the optimal solution, their paper is important when studying multiprocessor speed scaling with preemptions and migrations. Albers et al. (2011) present a more efficient polynomial time algorithm for the same problem. Their algorithm uses repeated maximum flow computations to minimize the energy consumption. A closely related approach by Angel et al. (2012b) also uses maximum flow computations to find the optimal solution in polynomial time. The resulting algorithm is more efficient than that of Albers et al. (2011) for the case that a reduced accuracy is allowed. Another approach to the same problem is discussed in the paper by Bampis et al. (2012b), wherein the optimal speeds are determined by solving a convex flow problem. In this approach, execution times correspond to amounts of flow, which have to be sent through the network. The algorithm that solves this problem has a time complexity that depends on the latest deadline. Although this dependency on the deadline is a drawback, the presented approach is straightforward and its concepts are interesting for future research in this direction.
Albers et al. (2014) study the variant of the problem where migrations are not allowed (\(P_M;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn} \vert E\)). They show that the problem is NP-hard, even for tasks with unit workload (for which a PTAS is given). The difficult part of this problem is the assignment of tasks to processors. If such an assignment is given, determining the optimal speeds and scheduling order is straightforward, since YDS can be used for the tasks on each individual processor. The heuristic by Albers et al. (2014) sorts the tasks in order of nondecreasing deadlines, and assigns the tasks in this order to the processor with the lowest amount of work assigned to it. This heuristic has an approximation ratio of \(2(2-\frac{1}{N})^\alpha \). A more general version of this problem that considers a weighted sum of the energy consumption and flow time as objective is studied by Greiner et al. (2014).
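The assignment step of this heuristic can be sketched as follows. The code only illustrates the rule as described in the text (sort by deadline, then give each task to the currently least-loaded processor); ties are broken by processor index, which is an assumption, and the speeds on each processor would still have to be computed afterwards, e.g., with YDS. The function name is hypothetical.

```python
import heapq

def assign_least_loaded(tasks, num_procs):
    """tasks: list of (arrival, deadline, work).

    Returns a list with one task list per processor.
    """
    heap = [(0, m) for m in range(num_procs)]   # (assigned work, processor)
    assignment = [[] for _ in range(num_procs)]
    for task in sorted(tasks, key=lambda t: t[1]):   # nondecreasing deadlines
        load, m = heapq.heappop(heap)                # least-loaded processor
        assignment[m].append(task)
        heapq.heappush(heap, (load + task[2], m))
    return assignment
```

Using a heap keeps the selection of the least-loaded processor logarithmic in the number of processors, so the sorting step dominates the running time of this sketch.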
In recent years, the problem that allows neither migration nor preemption (\(P_M;\mathrm {ss} \vert a_n;d_n \vert E\)) has attracted some attention (Cohen-Addad et al. 2015; Bampis et al. 2015). Bampis et al. (2014a) use results from this previous research to develop an algorithm with the approximation ratio \(\tilde{B}_\alpha \big ((1+\epsilon )(1+w^{\max }/w^{\min })\big )^\alpha \).
6.2 Agreeable deadlines
Just as for the uniprocessor problem with agreeable deadlines, in the multiprocessor case a solution to the preemptive problem with no migration can be transformed to a nonpreemptive solution with no migration with the same costs (Bampis et al. 2015).
Albers et al. (2014) present an optimal algorithm for the multiprocessor agreeable deadline problem where tasks have unit workload (\(P_M;\mathrm {ss} \vert a_n;d_n;w_n{=}1;\mathrm {agree} \vert E\)). This algorithm sorts the tasks in order of nondecreasing deadlines, assigns them to the processors using round robin scheduling and applies an algorithm that solves \(1;\mathrm {ss} \vert a_n;d_n;w_n=1;\mathrm {agree} \vert E\) (e.g., YDS) to the task sets for each individual processor. For tasks with an arbitrary workload they give an \(\alpha ^\alpha 2^{4\alpha }\)-approximation algorithm.
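The round robin assignment for unit-workload tasks is short enough to write down directly. The sketch below covers only the assignment step and assumes tasks are given as (arrival, deadline) pairs; the per-processor speeds would be computed afterwards by a uniprocessor algorithm such as YDS. The function name is hypothetical.

```python
def round_robin_assignment(tasks, num_procs):
    """tasks: list of (arrival, deadline) for unit-workload tasks.

    Task k in deadline order is assigned to processor k mod num_procs.
    """
    ordered = sorted(tasks, key=lambda t: t[1])   # nondecreasing deadlines
    return [ordered[m::num_procs] for m in range(num_procs)]
```

With five tasks and two processors, the first, third and fifth task in deadline order are placed on the first processor and the remaining two on the second.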
6.3 Tasks with precedence constraints
According to the survey by Chen and Kuo (2007), “... energy-efficient scheduling for jobs with precedence constraints with theoretical analysis is still missed in multiprocessor systems.” Only a few papers have studied speed scaling of tasks with precedence constraints, and to the best of our knowledge no papers have studied the sleep mode variant of this problem. Since the local speed scaling problem (\(P_M;\mathrm {ss} \vert a_n=a;d_n=d \vert E\)) from Sect. 6.1 is already NP-hard, the variant with precedence constraints (\(P_M;\mathrm {ss} \vert a_n=a;d_n=d;\mathrm {prec} \vert E\)) is also NP-hard.
Li (2012) studies the latter problem, and shows that under specific conditions the optimal solution to this problem becomes straightforward to approximate, namely for graphs with precedence constraints that have more parallelism than processors (called wide task graphs). Due to the amount of parallelism, the tasks are easy to schedule and using a single speed for the entire application gives near-optimal results.
The global speed scaling variant of this problem (\(P_M;\mathrm {global} \vert a_n=a;d_n=d;\mathrm {prec} \vert E\)) is also NP-hard, and was studied by Gerards et al. (2015). This problem consists of both scheduling and speed scaling. However, the second step is easy to solve, since the concept of power equality (see Sect. 4.6) can be applied to find the optimal speeds. Gerards et al. (2015) give a scheduling criterion that, together with optimal speeds, leads to a minimal energy consumption. Furthermore, they show how well existing scheduling algorithms perform at approximating the energy consumption.
A closely related problem that also assumes global speed scaling is \(P_M;\mathrm {global} \vert a_n;d_n;\mathrm {sched};\mathrm {prec} \vert E\), where tasks have individual arrival times and deadlines, and a schedule of the tasks is already given. Gerards et al. (2014) give a method that finds the optimal speeds by combining the results on nonuniform power (Sect. 4.7) and the power equality (Sect. 4.6). The given schedule is subdivided into pieces, whereby a piece is a chunk of workload with a constant number of active cores, during which no tasks start or complete. Using the results on nonuniform power and the power equality, these pieces are transformed in such a way that a uniprocessor problem with agreeable deadlines \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {agree} \vert E\) is achieved, which can be solved in quadratic time (see Sect. 5.2). This solution can be transformed back to obtain the optimal solution of the original problem.
7 Open problems
This section discusses some open problems related to speed scaling. The first problem (Sect. 7.1) is about the relation between continuous and discrete speed scaling for a multiprocessor system. This problem was already solved for single-processor systems. The second problem (Sect. 7.2) is about speed scaling of tasks with precedence constraints on a local speed scaling system. Even for a given schedule, this problem may be hard.
7.1 Multiprocessor discrete speed scaling
Discrete speed scaling for a single processor is often considered a simpler problem than continuous speed scaling. There is an \(O(N^2 \log N)\)-time algorithm for the frequently studied problem \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn} \vert E\), while there is an \(O(K N \log N)\)-time algorithm for the discrete speed scaling variant of this problem with K speeds (in practice, \(K \ll N\)). Furthermore, a solution to a continuous speed scaling problem can be converted to the discrete speed scaling variant in \(O(N \log K)\) time by simulating the continuous speeds (Sect. 4.5). To the best of our knowledge, there are no papers that relate optimal continuous and discrete speed scaling for multiprocessor systems, or that solve discrete multiprocessor speed scaling problems algorithmically. Only in the simple case where tasks have no precedence constraints and local speed scaling is used can the techniques from single-processor speed scaling be applied to individual processors. More research on discrete speed scaling for multiprocessor systems, and on the relation between continuous and discrete speed scaling on such systems, is desirable.
7.2 Local speed scaling for tasks with precedence constraints
Local speed scaling for tasks with precedence constraints is an unsolved and important problem. Even the case where the tasks have been scheduled (i.e., tasks have been assigned to processors, and per processor a sequence of the assigned tasks is given) and only speeds need to be determined (\(P_M \vert a_n=a;d_n=d;\mathrm {prec};\mathrm {sched} \vert E\)) is currently unsolved. The power equality (discussed in Sect. 4.6) can be used as a first step toward solving the problem.
The following example illustrates why this problem may be difficult.
Example 4
Consider the power function \(p(s)=s^3\) for a three-processor system with local speed scaling. The tasks have precedence constraints as given in Fig. 3a. All tasks share the common deadline \(d=1\). Two cases are distinguished:
(a) Task \(T_2\) finishes before task \(T_3\), or at the same time.
In the discussion below, we may assume that the edge “a” between tasks \(T_2\) and \(T_4\) does not exist, as (with the given assumption) it does not influence the optimal solution. In the optimal solution, we have \(e_2+e_7=e_3+e_4=e_3+e_5\) (same execution time for tasks, avoiding gaps in the schedule), otherwise the energy consumption can be decreased by decreasing the speed of a task that is next to a gap in the schedule. These relations can be used to determine the speeds of these tasks. Using the power equality, the relation between the speeds \(s_3\), \(s_4\), and \(s_5\) can be determined. It can also be used to relate speeds \(s_1\), \(s_2\), and \(s_3\). Now enough information is available to find the optimal speeds.
(b) Task \(T_2\) finishes after task \(T_3\).
In the discussion below, we may assume that the edge “b” between tasks \(T_3\) and \(T_4\) does not exist, as (with the given assumption) it does not influence the optimal solution. In the optimal solution we have that \(e_2+e_7=e_2+e_4=e_3+e_5\). Again, using the convexity of the power function and using the power equality, the optimal speeds can be determined.
This example indicates that solving the overall continuous problem depends on a number of discrete cases. These cases are specified by whether some task finishes before or after some other task. As it is unclear how many of these decision points may occur, and if there is an efficient (polynomial time) algorithm to make these decisions, the above example suggests that the local speed scaling problem with a given schedule of tasks with precedence constraints may be difficult.
8 Discussion
Algorithmic power management can be used to significantly reduce the energy consumption of computing devices. Combined with such power management techniques, scheduling algorithms play a crucial role, since the underlying schedules have a critical impact on the efficiency of power management techniques. This survey discusses a great variety of such scheduling algorithms that reduce the energy consumption of real-time systems by either decreasing the speed (speed scaling), or by turning devices off (sleep modes). We also argued that many of these speed scaling algorithms minimize the peak power consumption, although they are designed to minimize the energy consumption. Furthermore, we pointed out that many power management algorithms rely on the same theoretical concepts. Therefore, we did not only survey algorithms, but also the fundamental ideas behind these algorithms.
As many papers on algorithmic power management do not consider several important architectural details, there is a gap between theory and practice. Therefore, in this survey we gave a short overview of some of these aspects, and how they can be modeled or treated. An example of such an aspect is nonuniform power, which is rarely mentioned in the theoretical literature.
Another important aspect missing in the theoretical literature is the interaction between global and local speed scaling (“voltage and frequency islands”). These hybrids of local and global speed scaling, and multiprocessor discrete speed scaling are, in our view, the major theoretical challenges that need to be addressed in the near future.
Acknowledgments
This work is supported through NWO Project EASY.
References
Albers, S. (2010). Energy-efficient algorithms. Communications of the ACM, 53(5), 86–96. doi: 10.1145/1735223.1735245.
Albers, S., & Antoniadis, A. (2014). Race to idle: New algorithms for speed scaling with a sleep state. ACM Transactions on Algorithms, 10(2), 9:1–9:31. doi: 10.1145/2556953.
Albers, S., Antoniadis, A., & Greiner, G. (2011). On multiprocessor speed scaling with migration: Extended abstract. In: Proceedings of the 23rd ACM symposium on parallelism in algorithms and architectures, ACM, New York, NY, USA, SPAA ’11 (pp. 279–288). doi: 10.1145/1989493.1989539.
Albers, S., Müller, F., & Schmelzer, S. (2014). Speed scaling on parallel processors. Algorithmica, 68(2), 404–425. doi: 10.1007/s00453-012-9678-7.
Alon, N., Azar, Y., Woeginger, G.J., & Yadid, T. (1997). Approximation schemes for scheduling. In: Proceedings of the 8th annual ACM-SIAM symposium on discrete algorithms, Society for Industrial and Applied Mathematics, SODA ’97 (pp. 493–500). Philadelphia, PA. http://dl.acm.org/citation.cfm?id=314161.314371.
Angel, E., Bampis, E., & Chau, V. (2012a). Low complexity scheduling algorithm minimizing the energy for tasks with agreeable deadlines. In: D. Fernández-Baca (Ed.) LATIN 2012: Theoretical informatics. Lecture Notes in Computer Science, vol. 7256 (pp. 13–24). Springer, Berlin. doi: 10.1007/978-3-642-29344-3_2.
Angel, E., Bampis, E., Kacem, F., & Letsios, D. (2012b). Speed scaling on parallel processors with migration. In: C. Kaklamanis, T. Papatheodorou, & P. Spirakis (Eds.) Euro-Par 2012 parallel processing. Lecture Notes in Computer Science, vol. 7484 (pp. 128–140). Springer, Berlin. doi: 10.1007/978-3-642-32820-6_15.
Angel, E., Bampis, E., & Chau, V. (2014). Low complexity scheduling algorithms minimizing the energy for tasks with agreeable deadlines. Discrete Applied Mathematics, 175, 1–10. doi: 10.1016/j.dam.2014.05.023.
Antoniadis, A., & Huang, C. C. (2013). Non-preemptive speed scaling. Journal of Scheduling, 16(4), 385–394. doi: 10.1007/s10951-013-0312-6.
Antoniadis, A., Huang, C.C., & Ott, S. (2015). A fully polynomial-time approximation scheme for speed scaling with sleep state. In: Proceedings of the 26th annual ACM-SIAM symposium on discrete algorithms, SIAM, SODA ’15 (pp. 1102–1113). http://dl.acm.org/citation.cfm?id=2722129.2722203.
Augustine, J., Irani, S., & Swamy, C. (2008). Optimal power-down strategies. SIAM Journal on Computing, 37(5), 1499–1516. doi: 10.1137/05063787X.
Bampis, E., Dürr, C., Kacem, F., & Milis, I. (2012a). Speed scaling with power down scheduling for agreeable deadlines. Sustainable Computing: Informatics and Systems, 2(4), 184–189. doi: 10.1016/j.suscom.2012.10.003.
Bampis, E., Letsios, D., & Lucarelli, G. (2012b). Green scheduling, flows and matchings. In: K.M. Chao, T.S. Hsu, & D.T. Lee (Eds.) Algorithms and computation. Lecture Notes in Computer Science, vol. 7676 (pp. 106–115). Springer, Berlin. doi: 10.1007/978-3-642-35261-4_14.
Bampis, E., Kononov, A., Letsios, D., Lucarelli, G., & Sviridenko, M. (2014a). Energy efficient scheduling and routing via randomized rounding (pp. 1–27). arXiv:1403.4991.
Bampis, E., Letsios, D., & Lucarelli, G. (2014b). Speed-scaling with no preemptions. In: H.K. Ahn & C.S. Shin (Eds.) Algorithms and computation. Lecture Notes in Computer Science, vol. 8889 (pp. 259–269). Springer, Berlin. doi: 10.1007/978-3-319-13075-0_21.
Bampis, E., Kononov, A., Letsios, D., Lucarelli, G., & Nemparis, I. (2015). From preemptive to non-preemptive speed-scaling scheduling. Discrete Applied Mathematics, 181, 11–20. doi: 10.1016/j.dam.2014.10.007.
Bansal, N., Kimbrel, T., & Pruhs, K. (2007). Speed scaling to manage energy and temperature. Journal of the ACM, 54(1), 3:1–3:39. doi: 10.1145/1206035.1206038.
Bansal, N., Chan, H. L., & Pruhs, K. (2013). Speed scaling with an arbitrary power function. ACM Transactions on Algorithms, 9(2), 18:1–18:14. doi: 10.1145/2438645.2438650.
Baptiste, P., Chrobak, M., & Dürr, C. (2012). Polynomial-time algorithms for minimum energy scheduling. ACM Transactions on Algorithms, 8(3), 26:1–26:29. doi: 10.1145/2229163.2229170.
Benini, L., Bogliolo, A., & De Micheli, G. (2000). A survey of design techniques for system-level dynamic power management. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 8(3), 299–316. doi: 10.1109/92.845896.
Bingham, B.D., & Greenstreet, M.R. (2008). Energy optimal scheduling on multiprocessors with migration. In: International symposium on parallel and distributed processing with applications, ISPA ’08 (pp. 153–161). doi: 10.1109/ISPA.2008.128.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. New York: Cambridge University Press.
Bunde, D.P. (2006). Power-aware scheduling for makespan and flow. In: Proceedings of the 18th annual ACM symposium on parallelism in algorithms and architectures, SPAA ’06 (pp. 190–196). ACM, New York. doi: 10.1145/1148109.1148140.
 Chaparro, P., González, J., Magklis, G., Qiong, C., & González, A. (2007). Understanding the thermal implications of multicore architectures. IEEE Transactions on Parallel and Distributed Systems, 18(8), 1055–1065. doi: 10.1109/TPDS.2007.1092.CrossRefGoogle Scholar
 Chen, J.J., & Kuo, C.F. (2007). Energyefficient scheduling for realtime systems on dynamic voltage scaling (DVS) platforms. In: 13th IEEE international conference on embedded and realtime computing systems and applications, RTCSA 2007 (pp. 28–38). doi: 10.1109/RTCSA.2007.37.
 Chen, J.J., Hsu, H.R., Chuang, K.H., Yang, C.L., Pang, A.C., & Kuo, T.W. (2004). Multiprocessor energyefficient scheduling with task migration considerations. In: Proceedings of 16th Euromicro conference on realtime systems, ECRTS 2004 (pp. 101–108). doi: 10.1109/EMRTS.2004.1311011.
 Cho, S., & Melhem, R. G. (2010). On the interplay of parallelization, program performance, and energy consumption. IEEE Transactions on Parallel and Distributed Systems, 21(3), 342–353. doi: 10.1109/TPDS.2009.41.CrossRefGoogle Scholar
 CohenAddad, V., Li, Z., Mathieu, C., & Milis, I. (2015). Energyefficient algorithms for nonpreemptive speedscaling. In: E. Bampis, & O. Svensson (Eds.) Approximation and online algorithms. Lecture Notes in Computer Science, vol. 8952 (pp. 107–118). Springer, Berlin. doi: 10.1007/9783319182636_10.
 Devadas, V., & Aydin, H. (2012). On the interplay of voltage/frequency scaling and device power management for framebased realtime embedded applications. IEEE Transactions on Computers, 61(1), 31–44. doi: 10.1109/TC.2010.248.
 Gerards, M., Hurink, J., Holzenspies, P., Kuper, J., & Smit, G. (2014). Analytic clock frequency selection for global DVFS. In: 22nd Euromicro international conference on parallel, distributed and networkbased processing (PDP) (pp. 512–519). doi: 10.1109/PDP.2014.103.
 Gerards, M. E. T., & Kuper, J. (2013). Optimal DPM and DVFS for framebased realtime systems. ACM Transactions on Architecture and Code Optimization, 9(4), 41:1–41:23. doi: 10.1145/2400682.2400700.
 Gerards, M. E. T., Hurink, J. L., & Kuper, J. (2015). On the interplay between global DVFS and scheduling tasks with precedence constraints. IEEE Transactions on Computers, 64(6), 1742–1754. doi: 10.1109/TC.2014.2345410.
 Graham, R. L., Lawler, E. L., Lenstra, J. K., & Rinnooy Kan, A. H. G. (1977). Optimization and approximation in deterministic sequencing and scheduling: A survey. Annals of Discrete Mathematics, 5, 287–326. doi: 10.1016/S01675060(08)70356X.
 Greiner, G., Nonner, T., & Souza, A. (2014). The bell is ringing in speedscaled multiprocessor scheduling. Theory of Computing Systems, 54(1), 24–44. doi: 10.1007/s0022401394779.
 Hsu, C.H., & Feng, W.C. (2005). When discreteness meets continuity: Energyoptimal DVS scheduling revisited. Tech. Rep. LAUR 053104, Los Alamos National Laboratory, http://sss.cs.vt.edu/pubs/tr053104.pdf.
 Huang, C.C., & Ott, S. (2014). New results for nonpreemptive speed scaling. In: E. Csuhaj-Varjú, M. Dietzfelbinger, & Z. Ésik (Eds.) Mathematical foundations of computer science 2014. Lecture Notes in Computer Science, vol. 8635 (pp. 360–371). Springer, Berlin. doi: 10.1007/9783662444658_31.
 Huang, W., & Wang, Y. (2009). An optimal speed control scheme supported by media servers for lowpower multimedia applications. Multimedia Systems, 15(2), 113–124. doi: 10.1007/s0053000901535.
 Irani, S., & Pruhs, K. R. (2005). Algorithmic problems in power management. SIGACT News, 36(2), 63–76. doi: 10.1145/1067309.1067324.
 Irani, S., Shukla, S., & Gupta, R. (2007). Algorithms for power savings. ACM Transactions on Algorithms, 3(4), 41:1–41:23. doi: 10.1145/1290672.1290678.
 Ishihara, T., & Yasuura, H. (1998). Voltage scheduling problem for dynamically variable voltage processors. In: Proceedings of the 1998 international symposium on low power electronics and design, ISLPED ’98 (pp. 197–202). ACM, New York. doi: 10.1145/280756.280894.
 Jejurikar, R., Pereira, C., & Gupta, R. (2004). Leakage aware dynamic voltage scaling for realtime embedded systems. In: 41st Proceedings of design automation conference, DAC ’04 (pp. 275–280). ACM, New York. doi: 10.1145/996566.996650.
 Kalla, R., Sinharoy, B., Starke, W. J., & Floyd, M. (2010). Power 7: IBM’s nextgeneration server processor. IEEE Micro, 30(2), 7–15. doi: 10.1109/MM.2010.38.
 Kandhalu, A., Kim, J., Lakshmanan, K., & Rajkumar, R.R. (2011). Energyaware partitioned fixedpriority scheduling for chip multiprocessors. In: 17th international conference on embedded and realtime computing systems and applications, vol. 1, (pp. 93–102). IEEE Computer Society, Los Alamitos. doi: 10.1109/RTCSA.2011.75.
 Kwon, W. C., & Kim, T. (2005). Optimal voltage allocation techniques for dynamically variable voltage processors. ACM Transactions on Embedded Computing Systems, 4(1), 211–230. doi: 10.1145/1053271.1053280.
 Lee, J., Yun, B., & Shin, K. G. (2014). Reducing peak power consumption in multicore systems without violating realtime constraints. IEEE Transactions on Parallel and Distributed Systems, 25(4), 1024–1033. doi: 10.1109/TPDS.2013.131.
 Lee, S., & Kim, J. (2010). Using dynamic voltage scaling for energyefficient flashbased storage devices. In: SoC Design Conference (ISOCC), 2010 International (pp. 63–66). doi: 10.1109/SOCDC.2010.5682971.
 Li, K. (2012). Scheduling precedence constrained tasks with reduced processor energy on multiprocessor computers. IEEE Transactions on Computers, 61(12), 1668–1681. doi: 10.1109/TC.2012.120.
 Li, M., Liu, B., & Yao, F. (2006a). Minenergy voltage allocation for treestructured tasks. Journal of Combinatorial Optimization, 11(3), 305–319. doi: 10.1007/s1087800679106.
 Li, M., Yao, A. C., & Yao, F. F. (2006b). Discrete and continuous minenergy schedules for variable voltage processors. Proceedings of the National Academy of Sciences of the United States of America, 103(11), 3983–3987. doi: 10.1073/pnas.0510886103.
 Li, M., Yao, F. F., & Yuan, H. (2014). An \(O(n^2)\) algorithm for computing optimal continuous voltage schedules (pp. 1–12). arXiv:1408.5995.
 Liu, X., Shenoy, P., & Gong, W. (2004). A time seriesbased approach for power management in mobile processors and disks. In: Proceedings of the 14th international workshop on Network and operating systems support for digital audio and video—NOSSDAV ’04 (pp. 74–79). doi: 10.1145/1005847.1005864.
 Manoj, P.D.S., Wang, K., & Yu, H. (2013). Peak power reduction and workload balancing by spacetime multiplexing based demandsupply matching for 3d thousandcore microprocessor. In: Proceedings of the 50th annual design automation conference, DAC ’13 (pp. 175:1–175:6). ACM, New York. doi: 10.1145/2463209.2488950.
 March, J. L., Sahuquillo, J., Hassan, H., Petit, S., & Duato, J. (2011). A new energyaware dynamic task set partitioning algorithm for soft and hard embedded realtime systems. The Computer Journal, 54(8), 1282–1294. doi: 10.1093/comjnl/bxr008.
 Park, S., Park, J., Shin, D., Wang, Y., Xie, Q., Pedram, M., et al. (2013). Accurate modeling of the delay and energy overhead of dynamic voltage and frequency scaling in modern microprocessors. IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems, 32(5), 695–708. doi: 10.1109/TCAD.2012.2235126.
 Pruhs, K. (2011). Green computing algorithmics. In: IEEE 52nd annual symposium on foundations of computer science (FOCS) (pp. 3–4). doi: 10.1109/FOCS.2011.44.
 Pruhs, K., van Stee, R., & Uthaisombut, P. (2008). Speed scaling of tasks with precedence constraints. Theory of Computing Systems, 43(1), 67–80. doi: 10.1007/s0022400790701.
 Quan, G., & Hu, X. S. (2003). Minimal energy fixedpriority scheduling for variable voltage processors. IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems, 22(8), 1062–1071. doi: 10.1109/TCAD.2003.814948.
 Rountree, B., Lowenthal, D.K., Funk, S., Freeh, V.W., de Supinski, B.R., & Schulz, M. (2007). Bounding energy consumption in largescale MPI programs. In: Proceedings of the 2007 ACM/IEEE conference on supercomputing—SC ’07. ACM, New York (pp. 49:1–49:9). doi: 10.1145/1362622.1362688.
 Weiser, M., Welch, B., Demers, A., & Shenker, S. (1996). Scheduling for reduced CPU energy. In: T. Imielinski & H.F. Korth (Eds.) Mobile computing. The Kluwer International Series in Engineering and Computer Science, vol. 353 (pp. 449–471). Springer, New York. doi: 10.1007/9780585296036_17.
 Wu, W., Li, M., & Chen, E. (2011). Minenergy scheduling for aligned jobs in accelerate model. Theoretical Computer Science, 412(12–14), 1122–1139. doi: 10.1016/j.tcs.2010.12.013.
 Yao, F., Demers, A., & Shenker, S. (1995). A scheduling model for reduced CPU energy. In: Proceedings of IEEE 36th annual foundations of computer science (pp. 374–382). doi: 10.1109/SFCS.1995.492493.
 Yun, H. S., & Kim, J. (2003). On energyoptimal voltage scheduling for fixedpriority hard realtime systems. ACM Transactions on Embedded Computing Systems, 2(3), 393–430. doi: 10.1145/860176.860183.
 Zhang, D., Guo, D., Chen, F., Wu, F., Wu, T., Cao, T., et al. (2012). TLplanebased multicore energyefficient realtime scheduling algorithm for sporadic tasks. ACM Transactions on Architecture and Code Optimization, 8(4), 47:1–47:20. doi: 10.1145/2086696.2086726.
 Zhuravlev, S., Saez, J. C., Blagodurov, S., Fedorova, A., & Prieto, M. (2013). Survey of energycognizant scheduling techniques. IEEE Transactions on Parallel and Distributed Systems, 24(7), 1447–1464. doi: 10.1109/TPDS.2012.20.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.