1 Introduction

Energy consumption is an important design constraint for computing systems (Zhuravlev et al. 2013). On the one hand, the computing power of embedded systems increases rapidly, whereas battery capacity does not grow at the same pace. On the other hand, for systems such as datacenters, energy consumption is an important cost factor.

To decrease the energy consumption of computing devices while still meeting performance constraints, power management techniques are often deployed (e.g., Irani and Pruhs 2005; Albers 2010). Herein, software is used to influence the energy consumption of computers. This software controls hardware parameters such as the speed (speed scaling), or decides to transition devices to a low-power sleep mode when they are not used. Combined with such power management techniques, scheduling algorithms play a crucial role, since the underlying schedules have a critical impact on the efficiency of power management techniques. The collection of all of these techniques is often referred to with the generic term algorithmic power management (see, e.g., Pruhs 2011).

In this article, we discuss many algorithmic power management results. More precisely, we survey theoretical results on power management as well as offline algorithms for energy minimization under deadline constraints (called the “server problem”; Bunde 2006). Furthermore, we discuss both speed scaling and low-power sleep modes. Speed scaling is used to adapt the speed of a system, so that its power consumption is reduced. It can be hard to determine the optimal speeds, because these have to be chosen globally—all tasks have to be taken into account—instead of chosen locally on a task-by-task basis.

An idle device can be put in a low-power sleep mode to reduce the energy consumption; however, energy is required to wake it up again. This poses a trade-off between sleeping and remaining idle. Sleep modes can significantly reduce the energy consumption when the system has long idle periods. Because of this, scheduling algorithms are deployed to create schedules with many and sufficiently long idle periods. Note that speed scaling and sleep modes are not mutually exclusive: sometimes it is better to use both in combination.

Note that in recent years, peak power minimization has also become an important topic of research (e.g., Lee et al. 2014; Manoj et al. 2013). We argue that many of the speed scaling algorithms that we survey also minimize the peak power of a system.

This article is organized as follows. In the next section, we briefly discuss some related surveys. After that, in Sect. 3, we introduce the modeling of speed scaling and sleep modes, together with the notation that is used throughout this survey. The latter is important, as different authors use different notations when describing their power management problems. Many are loosely based on the notation by Graham et al. (1977).

In Sect. 4, we present several (orthogonal) theoretical power management results, which form the foundation of many power management algorithms, and show how these results interact.

Section 5 surveys algorithms that minimize the energy consumption of single-processor systems with deadline constraints. The relation between similar problems is discussed and it is shown how the theoretical power management results from Sect. 4 are applied. This discussion is followed by a survey of multiprocessor power management problems, and algorithms for these problems in Sect. 6. In Sect. 7 several open problems are discussed, and Sect. 8 concludes this article with a discussion.

2 Related surveys

The recent article by Zhuravlev et al. (2013) surveys many energy-aware scheduling techniques. Many of the papers they survey are on thermal-aware scheduling and scheduling for asymmetric systems. In their survey, there is no emphasis on algorithms and their properties, which is the focus of this article.

Benini et al. (2000) give an important survey on sleep modes (dynamic power management, DPM) that is mainly application oriented. They present a lot of background and discuss implementation details, including the Advanced Configuration and Power Interface (ACPI). The algorithms they discuss are intended for general operating systems, and depend on predictive schemes and stochastic models. In contrast, this article focuses on (clairvoyant) offline algorithms for real-time systems. Moreover, this article also discusses speed scaling and scheduling.

There are several articles that survey results from algorithmic power management. The very broad overview by Chen and Kuo (2007) discusses power-related scheduling techniques, but does not focus on algorithms. Irani and Pruhs (2005) and Albers (2010) present surveys that do focus on algorithms. The first survey (Irani and Pruhs 2005) contains a relatively small set of algorithms, while the second and more recent survey article (Albers 2010) discusses more algorithms. Although both surveys treat results from the entire spectrum of algorithmic power management, only a few offline algorithms for energy minimization under deadline constraints are discussed.

None of the surveys discussed in this section focuses on offline energy minimization under deadline constraints, nor do they treat many papers on this subject. Furthermore, to the best of our knowledge, the survey in this article is the first that links the many different theoretical concepts of algorithmic power management.

3 Modeling and notation

Many algorithmic power management papers have different modeling assumptions and there is no unique notation to describe both speed scaling and sleep mode problems. In this section, we structure the modeling assumptions and present a unifying notation for power management problems. Section 3.1 discusses the notation and models for tasks. Some practical aspects for speed scaling on a computer processor and a notation for these aspects are discussed in Sect. 3.2, while modeling of sleep modes is discussed in Sect. 3.3. Finally, a notation for algorithmic power management problems is presented in Sect. 3.4.

3.1 Task models

In general, a finite number (N) of tasks is considered, which we denote by \(T_1,\dots ,T_N\). These tasks are scheduled on M processors, where in many cases \(M=1\). Each task \(T_n\) has a workload \(w_n\). For speed scaling, a speed \(s_n\) at which task \(T_n\) is executed must be determined, which leads to an execution time \(e_n=\frac{w_n}{s_n}\). In some cases, the speed may be changed during a task, which requires an adaptation of the notation. In that case, the speed function \(s : \mathbb {R}_0^+ \rightarrow \mathbb {R}_0^+\), which gives the speed as a function of time, is used.

The available speeds are given by a set \(\mathcal {S}\), which is either an interval (\(\mathcal {S} = [s^{\min }, s^{\max }]\)) or a finite discrete set with K speeds (\(\mathcal {S}=\{\bar{s}_1,\dots ,\bar{s}_K\}\), where we assume w.l.o.g. that \(\bar{s}_1\le \dots \le \bar{s}_K\)). When a speed must be chosen from a continuous (discrete) set, we call this speed a continuous (discrete) speed, and refer to a problem with such restriction as a continuous (discrete) speed scaling problem.

Besides its workload, each task has an arrival time \(a_n\) and a deadline \(d_n\). The tasks have to be scheduled to meet these constraints, implying that the begin time \(b_n\) and completion time \(c_n\) have to be chosen so that \(a_n\le b_n\le c_n \le d_n\). If the tasks are scheduled without interruption, we furthermore have \(c_n=b_n+e_n\).
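The task model above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `Task` and the helper names `execution_time` and `is_feasible` are hypothetical and do not come from the surveyed literature; a single constant speed per task and uninterrupted execution are assumed.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One task T_n; the fields mirror the symbols of Sect. 3.1."""
    w: float  # workload w_n
    a: float  # arrival time a_n
    d: float  # deadline d_n


def execution_time(task: Task, s: float) -> float:
    """Execution time e_n = w_n / s_n at a constant speed s_n > 0."""
    if s <= 0:
        raise ValueError("speed must be positive")
    return task.w / s


def is_feasible(task: Task, b: float, s: float) -> bool:
    """Check a_n <= b_n <= c_n <= d_n, assuming uninterrupted execution
    so that the completion time is c_n = b_n + e_n."""
    c = b + execution_time(task, s)
    return task.a <= b <= c <= task.d
```

For instance, a task with 10 work, arrival time 0, and deadline 40 that starts at time 0 with speed 1/2 completes at time 20 and is therefore feasible.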

3.2 Processor models for speed scaling

An important objective used in the majority of papers that we survey is energy minimization of microprocessors. Hence, in the following we concentrate on speed scaling of microprocessors. Furthermore, we discuss some modeling assumptions that are not studied in the current algorithmic power management literature.

Microprocessors have a clock frequency, which represents the speed of the processor. For many systems, the speed of the computer memory (and other peripherals) does not scale with the clock frequency of the processor, because memory is a separate device that does not necessarily use the same clock frequency. In other words, in most practical settings the speed of the overall system (and of tasks) does not scale linearly with the clock frequency of the microprocessor (Devadas and Aydin 2012). However, all algorithms that we survey assume that the speed does scale linearly with the clock frequency, and hence we also assume this throughout this article. Note that this assumption leads to an overestimation of the execution times of the tasks when the clock frequency is decreased with respect to some reference clock frequency, which means that tasks finish earlier than the models predict. For a multicore processor with only local memories (e.g., scratchpad memory), the speed does scale linearly with the processor clock frequency.

As a consequence of the above assumption, clock frequency and speed are synonyms, and therefore \(s_n\) and s(t) are used to denote the clock frequency. In this article, we mostly use the terms speed and speed scaling, instead of clock frequency and Dynamic Voltage and Frequency Scaling (DVFS), in line with the majority of papers on algorithmic power management.

For multicore processors, there are two main flavors of speed scaling, namely local speed scaling and global speed scaling. While local speed scaling changes the speed per individual core, global speed scaling makes these changes for the entire chip. For this reason, the optimal solutions to the local and global speed scaling problems are not interchangeable. Global speed scaling is the most commonly applied of these techniques, since it is cheaper to implement (March et al. 2011; Chaparro et al. 2007). Examples of modern processors and systems that use global speed scaling are the Intel Itanium, the PandaBoard (dual-core ARM Cortex A9), IBM Power7, and the NVIDIA Tegra 2 (Kalla et al. 2010; March et al. 2011; Kandhalu et al. 2011; Zhang et al. 2012).

Most modern microprocessors are built using CMOS transistors. When the clock frequency of a CMOS processor is decreased, the voltage may be decreased as well. Dynamic voltage and frequency scaling (DVFS) (Weiser et al. 1996) is a power management technique that allows the clock frequency and voltage to be changed at run-time. Both the clock frequency and the voltage influence the power consumption of a processor, and the energy consumption is obtained by integrating this power consumption over time.

In general, there are two major sources of power consumption, namely dynamic power consumption and static power consumption. Dynamic power is consumed due to activities of the processor, i.e., due to transitions of logic gates. A CMOS transistor charges and discharges (parasitic) capacitances when it switches between logical zero and logical one. The dynamic power can be calculated by \( A C V_\mathrm {dd}^2s, \) where \(V_\mathrm {dd}\) is the supply voltage, s is the clock frequency (i.e., speed), C is the switched capacitance, and A is the activity factor, the average number of transitions per second (Ishihara and Yasuura 1998). For a given clock frequency, the minimal supply voltage is bounded, and many papers (implicitly) assume that this minimal voltage is used, i.e., they use the simplified relation \(V_\mathrm {dd} = \beta s\) for some constant \(\beta >0\) (e.g., Yao et al. 1995; Huang and Wang 2009). This gives the dynamic power model

$$\begin{aligned} p^\mathrm {dyn}(s) = {\gamma _1}s^\alpha , \end{aligned}$$
(1)

where \(\alpha \) is a system-dependent constant (usually, \(\alpha \approx 3\)) and \({\gamma _1}= A C \beta ^{\alpha -1}\) contains both the average activity factor and switched capacitance. Most papers assume that \({\gamma _1}\) is constant for the entire application. Some papers use a separate constant \({\gamma _1}(n)\) for each task (referred to as nonuniform loads by Kwon and Kim (2005), or as nonuniform power), because the activity may deviate for different types of tasks. This makes the power function (to some extent) nonuniform, but throughout this article we assume \({\gamma _1}\) is constant. On the one hand this is done to keep the notation simple, and on the other hand we assume that when the power function is nonuniform, the theory that we present in Sect. 4.7 can be applied.
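As a worked instance of Eq. (1), substituting the simplified relation \(V_\mathrm {dd} = \beta s\) into the dynamic power expression \(A C V_\mathrm {dd}^2 s\) gives the cubic case. The step below is only a sketch under exactly that linear voltage assumption; other voltage-frequency relations lead to other values of \(\alpha \).

```latex
\begin{aligned}
p^{\mathrm{dyn}}(s)
  &= A C V_{\mathrm{dd}}^{2}\, s
   = A C (\beta s)^{2}\, s
   = A C \beta^{2}\, s^{3},
\end{aligned}
% hence \alpha = 3 and \gamma_1 = A C \beta^{2} = A C \beta^{\alpha - 1}.
```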

Static power is the power that is consumed independently of the activity of the transistors, and, thereby, it is independent of the clock frequency. However, there are two different definitions of static power that are used in the literature. The first definition of static power, popular in algorithmic papers (e.g., Cho and Melhem 2010), takes static power as a constant function (i.e., independent of the clock frequency), and is given by

$$\begin{aligned} p^\mathrm {static}(s) = {\gamma _2}, \end{aligned}$$

where \({\gamma _2}\) is a system dependent constant. The second definition—often used in computer architecture papers—uses the voltage to express the static power. Although it is physically modeled using an exponential equation, the following linear approximation with system dependent constants \({\gamma _2}\) and \({\gamma _3}\) is popular (Park et al. 2013):

$$\begin{aligned} p^\mathrm {static}(V_\mathrm {dd}) = {\gamma _2}+ \frac{{\gamma _3}}{\beta } V_\mathrm {dd}, \end{aligned}$$

and the relation between the voltage and the clock frequency (\(V_\mathrm {dd} = \beta s\)) gives

$$\begin{aligned} p^\mathrm {static}(s) = {\gamma _2}+ {\gamma _3}s. \end{aligned}$$

Note that this relation makes the static power, which does not depend directly on the clock frequency, indirectly dependent on it through the supply voltage. The resulting static energy for w work executed at speed s is \({\gamma _2}\frac{w}{s} + {\gamma _3}w\), when it is assumed that static power is consumed until all work is completed (see the discussion in Sect. 4.3). The term \({\gamma _3}w\) does not depend on s; as a consequence, the constant \({\gamma _3}\) does not influence the choice of the optimal clock frequency in the case of energy minimization, which is the focus of this article. Thus, we can assume without loss of generality that \({\gamma _3}=0\) and use \(p^\mathrm {static}(s) = {\gamma _2}\) to model the static power. Since both static power models lead to the same optimal solution, it is irrelevant for optimization which of the two is used.
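The argument can be made explicit with a short derivation. It is a sketch that assumes the total power is the sum of the dynamic model of Eq. (1) and the linear static model, and that static power is accounted for until the work completes.

```latex
\begin{aligned}
E(s) &= \left( \gamma_1 s^{\alpha} + \gamma_2 + \gamma_3 s \right) \frac{w}{s}
      = \gamma_1 w\, s^{\alpha-1} + \gamma_2 \frac{w}{s} + \gamma_3 w, \\
\frac{\mathrm{d}E}{\mathrm{d}s}
     &= (\alpha - 1)\, \gamma_1 w\, s^{\alpha-2} - \gamma_2 \frac{w}{s^{2}} .
\end{aligned}
```

The derivative does not contain \(\gamma_3\), so the energy-minimizing speed is the same for every value of \(\gamma_3\); only the value of the minimum shifts by the constant \(\gamma_3 w\).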

Generally, we define the total power consumption (both static and dynamic) as a power function \(p : \mathbb {R}_0^+ \rightarrow \mathbb {R}_0^+\), which maps speed to power.

For microprocessors, the power function does not fully describe all energy that is used, since changing the clock frequency also has an energy and time overhead. The recent article by Park et al. (2013) shows that the time and energy overheads of DVFS are in the same order of magnitude as the overhead of context switching. For example, the transition delay overhead is at most \(62.68\,\upmu s\) on an Intel Core2 Duo E6850 (Park et al. 2013). Furthermore, most algorithms avoid changing the clock frequency often because of the convexity of the power function (see Sect. 4.1), hence the number of speed changes is relatively low. Because of these two reasons, we assume that the energy overhead of changing the clock frequency is negligible in case of DVFS.

Note that speed scaling is not restricted to microprocessors: it can also be used for flash memory (Lee and Kim 2010) and hard disks (Liu et al. 2004), and may even be relevant to applications outside of computer science.

3.3 Sleep modes

As already mentioned in the previous subsection, devices also consume power when they are idle. Several devices, such as microprocessors, hard disks, and communication devices (e.g., network interfaces), can switch to a sleep mode by powering (parts of) the device down to decrease the power when idle. For example, when a processor is transitioned to a sleep mode, the current state is stored, and the state is recovered when the processor is awakened. Another example is a hard disk drive, which spins down when put to sleep and spins up when it is awakened. These devices have in common that a cost in both latency and energy is associated with switching to a sleep mode and waking up. The energy cost determines the break-even time, which is the minimum length of an idle period that makes it worthwhile to transition to a sleep mode. It is commonly assumed that the break-even time for a sleep mode is longer than the latency associated with switching to and from this sleep mode. It was shown empirically that algorithms that use this assumption still work well when the latency is taken into account (Irani et al. 2007).

Devices can even have multiple sleep modes, with different break-even times, or there can be multiple devices within a system with different break-even times. The energy consumption during an idle period is generally modeled as a piecewise concave function \(E^{\text {SL}}: \mathbb {R}_0^+ \rightarrow \mathbb {R}_0^+\) of the length of the idle period (Augustine et al. 2008; Gerards and Kuper 2013).
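The break-even time and the idle energy function \(E^{\text {SL}}\) can be illustrated with a small sketch. The model used here is an assumption for illustration only and is not prescribed by the text: each mode has a constant power while the device resides in it and a fixed transition energy for entering and leaving it, and latency is ignored. The names `break_even_time` and `idle_energy` are hypothetical.

```python
from typing import Sequence, Tuple

# (power while in the mode, energy to enter and leave the mode)
Mode = Tuple[float, float]


def break_even_time(p_idle: float, p_sleep: float, e_transition: float) -> float:
    """Idle-period length at which sleeping costs as much as staying idle:
    p_idle * t = e_transition + p_sleep * t."""
    if p_sleep >= p_idle:
        raise ValueError("the sleep mode must consume less power than idling")
    return e_transition / (p_idle - p_sleep)


def idle_energy(t: float, modes: Sequence[Mode]) -> float:
    """Energy of an idle period of length t when the cheapest mode is chosen.
    As a minimum of linear functions of t, this is concave in t."""
    return min(e_tr + p * t for p, e_tr in modes)
```

With an idle mode `(2.0, 0.0)` and one sleep mode `(0.5, 3.0)`, staying idle is cheaper for idle periods shorter than the break-even time of 2, and sleeping is cheaper for longer ones.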

3.4 Problem notation and qualification

To classify a wide variety of algorithmic power management problems, this section introduces a compact notation based on the three-field notation for scheduling problems by Graham et al. (1977). The notation is similar to what is used in the algorithmic power management literature (e.g., Bampis et al. 2015), but avoids several ambiguities by making explicit which power management techniques are used.

We specify a general power management problem by three fields \(\mathfrak a| \mathfrak b| \mathfrak c\), where \(\mathfrak a\) denotes the system properties, \(\mathfrak b\) describes the tasks and their constraints, and \(\mathfrak c\) is the objective for optimization. The fields with their possible entries and their meaning are given in Table 1. A brief discussion of this notation follows below.

  • \(\mathfrak a\): The system field describes the architecture of the system. This includes the number of processors (or devices), whether speed scaling (ss) and/or sleep modes (sl) are used, and properties of the system with respect to speed scaling and/or sleep modes (see Table 1). The entries nonunif, disc, and global all imply speed scaling (ss) to keep the notation concise.

  • \(\mathfrak b\): The second field contains the task characteristics like arrival time, deadline, restrictions on the ordering of timing constraints of tasks (agree, prec, lami), and scheduling properties (migr, pmtn, prio, sched). E.g., when \(a_n\) occurs in this field, it means that tasks have arrival times, otherwise \(a_n{=}0\) (for all n) is implied. As we focus on energy minimization under deadline constraints, \(d_n\) always occurs in \(\mathfrak b\) and implies that deadlines must be met.

  • \(\mathfrak c\): The third field contains the scheduling objective. In the context of this article, the field \(\mathfrak c\) only contains “E” to denote that the energy should be minimized, but we maintain this field to preserve compliance with Graham’s notation.

Table 1 Notation for algorithmic power management problems

4 Fundamental results

Over the years, many fundamental results on algorithmic power management have been obtained. These results form the basis of many algorithms, or relate problems to each other, so that the solution to one problem can be used to find a solution to another problem. This section introduces these fundamental results and concepts in the area of algorithmic power management. One of the most important results is that for the single-processor case it is optimal to use a constant speed between the begin and completion time of tasks, due to the convexity of the power function (Sect. 4.1). Although this result only holds for convex power functions, using the idea presented in Sect. 4.2, it can also be used for the nonconvex situation, as all power functions can be “made” convex. Convexity is not the only requirement for optimization: one has to be careful that the chosen speed for a task is not too low, because then static power may dominate (Sect. 4.3).

Whereas the above results are often presented in a continuous speed scaling context, in practice, discrete speed scaling is more often used. Many discrete speed scaling problems (with a given schedule) can be formulated as a linear program (Sect. 4.4). Moreover, in the single-processor case it is straightforward to derive the solution to the discrete problem from the solution to the continuous case (Sect. 4.5).

For multiprocessor problems, it can be shown that in the optimal solution of several problems the power consumption remains constant over time. This fact is referred to as the power equality (Sect. 4.6). The problem wherein every task has a different power function (Sect. 4.7) is related to this multiprocessor problem. We present a simple transformation that transforms this problem with multiple power functions to the problem wherein all tasks have the same power function.

Finally, we briefly discuss that speed scaling problems wherein preemptions are not allowed can sometimes be written as a flow problem (Sect. 4.8), and that when scheduling for sleep modes, it is often best to unbalance the length of idle periods (Sect. 4.9).

4.1 Constant speed

Whenever a single processor executes a single task using varying speeds, the energy consumption can be decreased by running it at the average speed. This even holds when the task is executed with interruptions (i.e., at times given by any set \(\mathcal {T}\)). The result holds for all convex power functions; as discussed in Sect. 4.2, convexity is not a restriction. We formalize this result, which is a direct consequence of Jensen’s inequality (Irani et al. 2007), in the following theorem.

Theorem 1

Given a task with w work, which is executed at the times given by the set \(\mathcal {T}\) (i.e., \(w=\int _\mathcal {T} s(\tau ) \mathrm{d}\tau \)) on a processor with a convex power function p, let \(e=\int _\mathcal {T} 1\, \mathrm{d}\tau \) denote the total execution time. Then the following inequality holds:

$$\begin{aligned} p\left( \frac{w}{e}\right) e \le \int _\mathcal {T} p(s(\tau ))\mathrm{d}\tau . \end{aligned}$$

Proof

The integral form of Jensen’s inequality states:

$$\begin{aligned} p\left( \frac{1}{\int _\mathcal {T} 1 d\tau } \int _\mathcal {T} s(\tau ) \mathrm{d}\tau \right) \le \frac{1}{\int _\mathcal {T} 1 d\tau } \int _\mathcal {T} p(s(\tau ))\mathrm{d}\tau . \end{aligned}$$

Multiplying this inequality by \(\int _\mathcal {T} 1\, \mathrm{d}\tau = e\) and using \(w=\int _\mathcal {T} s(\tau ) \mathrm{d}\tau \) directly leads to the result of the theorem. \(\square \)

Theorem 1 shows that for continuous speed scaling, there always exists a constant speed that is optimal for a single task on a single processor. Many papers (e.g., Huang and Wang 2009; Yao et al. 1995; Li et al. 2006) use the idea behind Theorem 1, and show that minimizing unnecessary speed fluctuations on a single processor is optimal also for situations with more than one task, i.e., \(N>1\). However, when there are arrival times, deadlines, etc., the optimal constant speed may change on these specific times, meaning that the optimal speed function is piecewise constant.
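Theorem 1 can be checked numerically for a piecewise constant speed profile. The sketch below is illustrative only; the function names are hypothetical, and the profile is given as a list of speeds with the duration for which each speed is used.

```python
from typing import Callable, Sequence


def varying_speed_energy(speeds: Sequence[float],
                         durations: Sequence[float],
                         p: Callable[[float], float]) -> float:
    """Energy of a piecewise constant speed profile: sum of power * duration."""
    return sum(p(s) * dt for s, dt in zip(speeds, durations))


def constant_speed_energy(speeds: Sequence[float],
                          durations: Sequence[float],
                          p: Callable[[float], float]) -> float:
    """Energy p(w / e) * e when the same work w is executed in the same
    total time e at the average speed w / e."""
    w = sum(s * dt for s, dt in zip(speeds, durations))
    e = sum(durations)
    return p(w / e) * e
```

For \(p(s)=s^3\) and a profile that runs at speed 1 and then at speed 3, each for one time unit, the varying profile consumes 28 units of energy, whereas the average speed 2 consumes 16.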

4.2 Nonconvex power function

The previous section (and with it, a large part of the literature) assumes that the power function is convex, but for technical reasons this is not always the case. However, it is possible to circumvent this by not using the speeds of the regions where the function is not convex, since we can show that these speeds are not efficient. This process is first explained for discrete speed scaling.

Assume three given speeds \(\bar{s}_i < \bar{s}_j < \bar{s}_k\) (let \(\bar{s}_j = \lambda \bar{s}_i + (1-\lambda )\bar{s}_k\) for some \(\lambda \in (0,1)\)) and w work, where

$$\begin{aligned} p(\bar{s}_j)w \le p(\bar{s}_i)\lambda w + p(\bar{s}_k)(1-\lambda )w, \end{aligned}$$
(2)

does not hold. This implies that executing the work at speed \(\bar{s}_j\) costs more energy than running at \(\bar{s}_i\) for a fraction \(\lambda \) of the execution time and at \(\bar{s}_k\) for the remaining fraction, which completes the same work in the same time. In this case, we call \(\bar{s}_j\) an inefficient speed, as it is never beneficial to use this speed.

Based on the above, we may assume that all speeds in \(\mathcal {S}\) are efficient speeds, thus Eq. (2) holds for all speeds (i.e., inefficient speeds are “discarded”), as is discussed by Hsu and Feng (2005). This illustrates that we can always assume without loss of generality that the power function is convex.

Bansal et al. (2013) state that a similar procedure can be followed for continuous speed scaling. Note, that the static and dynamic power models from Sect. 3.2 are already convex.
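For the discrete case, discarding inefficient speeds amounts to keeping the lower convex hull of the points \((\bar{s}_k, p(\bar{s}_k))\). The sketch below uses a standard monotone-chain scan for this purpose; this is one possible realization, not necessarily the procedure of Hsu and Feng (2005). The function name `efficient_speeds` is hypothetical and distinct speeds are assumed.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (speed, power at that speed)


def efficient_speeds(points: List[Point]) -> List[Point]:
    """Keep only the speeds on the lower convex hull of the (speed, power) points."""
    hull: List[Point] = []
    for s, p in sorted(points):
        # Drop the last kept speed while it lies strictly above the chord
        # between its predecessor and the new speed, i.e., while Eq. (2) fails.
        while len(hull) >= 2:
            (s0, p0), (s1, p1) = hull[-2], hull[-1]
            cross = (s1 - s0) * (p - p0) - (p1 - p0) * (s - s0)
            if cross < 0:
                hull.pop()
            else:
                break
        hull.append((s, p))
    return hull
```

A speed that lies exactly on the chord satisfies Eq. (2) with equality and is therefore kept.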

4.3 Critical speed

In the presence of static power, convexity of the power function is not the only aspect that has to be taken into account when finding an optimal solution for some speed scaling problems.

In practice, processors consume static power (\({\gamma _2}>0\)), i.e., the power consumption at speed 0 is strictly positive (\(p(0)>0\)). Unfortunately, most papers do not clearly define for which time period they take the static power into account. In this survey, we assume that the application begins at some given time \(t^B\), and the power consumption of the processor is accounted for until some time \(t^C\). Furthermore, we either assume that \(t^C=c_N\) (completion time of the last task) or \(t^C=d_N\) (deadline of the last task). For example, Yao et al. (1995) only assume that the power function is convex and do not mention static power. However, their result only holds when the static power cannot be influenced, i.e., when it is accounted for until the deadline of the last task and not only until the completion time of the last task. Since static power cannot be influenced in this case, the situation where \(p(0)=0\) gives the same solution as the case where \(p(0)>0\). This scenario is mentioned by Irani et al. (2007).

For the other scenario, where the static power is active until the last task has finished, not only the power function should be studied, but also the energy-per-work function:

$$\begin{aligned} \bar{p}(s) = \frac{p(s)}{s}. \end{aligned}$$

This function gives the energy consumption per unit of work (instead of per unit of time), has a global minimizer \(s^{\text {crit}}\) (called the critical speed by Jejurikar et al. 2004), and is increasing on \(s\ge s^{\text {crit}}\) (Irani et al. 2007). Executing work at a speed below \(s^{\text {crit}}\) requires more energy per unit of work and also takes longer. Hence, increasing such speeds to \(s^{\text {crit}}\) reduces both the schedule length and the energy consumption.
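For the specific power function \(p(s)={\gamma _1}s^\alpha +{\gamma _2}\) from Sect. 3.2, the critical speed has a closed form, obtained by setting the derivative of \(\bar{p}(s)={\gamma _1}s^{\alpha -1}+{\gamma _2}/s\) to zero. The sketch below assumes this power function with \(\alpha >1\); the closed form is derived here for illustration and the function names are hypothetical.

```python
def energy_per_work(s: float, gamma1: float, gamma2: float, alpha: float) -> float:
    """Energy-per-work function p(s) / s for p(s) = gamma1 * s**alpha + gamma2."""
    return (gamma1 * s ** alpha + gamma2) / s


def critical_speed(gamma1: float, gamma2: float, alpha: float) -> float:
    """Minimizer of p(s) / s: solving (alpha - 1) * gamma1 * s**(alpha - 2)
    = gamma2 / s**2 gives s**alpha = gamma2 / ((alpha - 1) * gamma1)."""
    if alpha <= 1 or gamma1 <= 0 or gamma2 < 0:
        raise ValueError("requires alpha > 1, gamma1 > 0, and gamma2 >= 0")
    return (gamma2 / ((alpha - 1) * gamma1)) ** (1.0 / alpha)
```

For example, with \({\gamma _1}=1\), \({\gamma _2}=2\), and \(\alpha =3\), the critical speed is 1, and speeds slightly below or above it cost more energy per unit of work.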

4.4 Discrete speed scaling as a linear program

Besides static power, many processors have the restriction that only a small set of speeds is allowed (discrete speed scaling). Many discrete speed scaling problems with a given schedule can be formulated as a linear program, as we show in the following.

When discrete speed scaling is considered with K discrete speeds, the decision to be made is the amount of work of task \(T_n\) that is executed at speed \(\bar{s}_k\). If we denote this amount by \(w_{n,k}\) (i.e., \(\sum _{k=1}^K w_{n,k}=w_n\)), the total energy consumption of all tasks together is given by

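$$\begin{aligned} \sum _{n=1}^N \sum _{k=1}^K \bar{p}(\bar{s}_k)w_{n,k}, \end{aligned}$$

where \(\bar{p}(\bar{s}_k)=\frac{p(\bar{s}_k)}{\bar{s}_k}\) is the energy per unit of work at speed \(\bar{s}_k\) (see Sect. 4.3). This is a linear function of the decision variables \(w_{n,k}\). These variables, together with the begin times of the tasks, form the decision variables of the linear program.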

Constraints like arrival time, deadline, and precedence constraints can all be formulated as linear constraints. Therefore, many discrete speed scaling problems with a given schedule can be formulated as a linear program (Kwon and Kim 2005; Rountree et al. 2007) and, thus, can be solved in polynomial time.
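As an illustration, one such linear program is sketched below. It assumes a single processor on which the tasks are executed without preemption in the given order \(T_1,\dots ,T_N\); this concrete setting and the exact form of the constraints are assumptions made for the sketch, not a formulation taken from the cited papers. The objective uses that \(w_{n,k}\) work at speed \(\bar{s}_k\) takes \(w_{n,k}/\bar{s}_k\) time.

```latex
\begin{aligned}
\min_{w_{n,k},\, b_n} \quad
  & \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{p(\bar{s}_k)}{\bar{s}_k}\, w_{n,k} \\
\text{s.t.} \quad
  & \sum_{k=1}^{K} w_{n,k} = w_n, && n = 1,\dots,N, \\
  & b_n \ge a_n, && n = 1,\dots,N, \\
  & b_n + \sum_{k=1}^{K} \frac{w_{n,k}}{\bar{s}_k} \le d_n, && n = 1,\dots,N, \\
  & b_n + \sum_{k=1}^{K} \frac{w_{n,k}}{\bar{s}_k} \le b_{n+1}, && n = 1,\dots,N-1, \\
  & w_{n,k} \ge 0, && n = 1,\dots,N,\; k = 1,\dots,K.
\end{aligned}
```

All constraints are linear in \(w_{n,k}\) and \(b_n\) because the speeds \(\bar{s}_k\) are constants.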

4.5 Relation between continuous and discrete speed scaling

Formulating discrete speed scaling problems as a linear program and solving it with linear programming software provides few insights. Instead, a tailored algorithm for finding the optimal speeds is desirable. Such algorithms are described in many papers (e.g., Yao et al. 1995; Pruhs et al. 2008; Huang and Wang 2009) for continuous speed scaling, while in practice most processors support only discrete speed scaling. Therefore, in the following, we investigate the relation between continuous speed scaling and discrete speed scaling.

When a single task is considered, the optimal speed s resulting from the continuous case can be used to determine the optimal speeds for the discrete case. When the speed s is not one of the available discrete speeds, using only the neighboring speeds \(\bar{s}_i \le s \le \bar{s}_{i+1}\) leads to an optimal solution. More precisely, the first part of the work is executed at speed \(\bar{s}_{i+1}\) and the remaining work is executed at speed \(\bar{s}_{i}\). These fractions of work are calculated so that the overall time remains the same. We refer to this as simulating continuous speed scaling.

The above-described simulating process has been proven to be optimal for the execution of a single task, and can be extended to multiple tasks. For multiple tasks, many continuous speed scaling algorithms only require that the power function is convex. Given a set of discrete speeds, we can fill the intervals between these speeds by taking the weighted average speed of a task using two neighboring speeds. This leads to a power function that gives as power for a given speed the weighted average power of the two used speeds (this function is called the average power function). Kwon and Kim (2005) and Hsu and Feng (2005) have proven that this average power function is a convex piecewise linear function. Hence, any continuous speed scaling algorithm that assumes only convexity can be used to find the optimal average speeds, after which the discrete assignment can be determined using simulation.
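The split of the work over the two neighboring speeds follows from two conditions: the two parts add up to the total work, and the total execution time equals that of the continuous solution. A minimal sketch, with hypothetical function and parameter names, is:

```python
from typing import Tuple


def split_work(w: float, s: float, s_lo: float, s_hi: float) -> Tuple[float, float]:
    """Return (w_hi, w_lo): the work executed at s_hi and at s_lo so that
    w_hi + w_lo = w and w_hi / s_hi + w_lo / s_lo = w / s."""
    if not (0 < s_lo <= s <= s_hi):
        raise ValueError("requires 0 < s_lo <= s <= s_hi")
    if s_lo == s_hi:
        return w, 0.0  # s is itself an available speed
    e = w / s                               # execution time of the continuous solution
    t_hi = e * (s - s_lo) / (s_hi - s_lo)   # time spent at the higher speed
    w_hi = s_hi * t_hi
    return w_hi, w - w_hi
```

For 10 work at the continuous speed 1.5 with available speeds 1 and 2, two thirds of the work is executed at speed 2 and one third at speed 1, and the total time remains 10/1.5.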

Fig. 1 Task graph

4.6 Power equality

The previous sections mainly focused on the single-processor case. In the multiprocessor case with precedence constraints, new issues arise that are best illustrated with an example.

Example 1

Consider the three tasks from Fig. 1, each with w work, which are to be executed on a local speed scaling multiprocessor system. Task \(T_1\) has to be finished before tasks \(T_2\) and \(T_3\) can be executed, and the application as a whole has a global arrival time 0 and a global deadline d. An example of a naive speed assignment is \(s_1=s_2=s_3=\frac{2w}{d}\). Note that Theorem 1 cannot be used to argue that this assignment is optimal, because now multiple processors are active. In fact, this assignment is not optimal, since it can be improved by slightly increasing \(s_1\) so that task \(T_1\) consumes slightly more energy, while the two tasks \(T_2\) and \(T_3\) can decrease their energy consumption. The speed of task \(T_1\) should not be too high (discussed below), because then its energy consumption is no longer compensated by tasks \(T_2\) and \(T_3\).

This example illustrates that the optimal speeds depend on the amount of parallelism of the scheduled tasks. Pruhs et al. (2008) introduce the power equality for tasks with a common arrival time and deadline: in the optimal solution, the total power consumption remains constant over time. The speeds can then be calculated from this constant power and the number of tasks that are executed in parallel. For the concrete situation of Fig. 1, this means that \(p(s_1) = p(s_2) + p(s_3)\). This power equality generalizes Theorem 1.

Example 2

Consider again the task graph from Fig. 1 with the power function \(p(s)=s^3\), and assume that all the tasks have 10 work, and the global deadline is 40. A naive speed assignment uses the constant speed \(s_1=s_2=s_3=\frac{1}{2}\).

Since tasks \(T_2\) and \(T_3\) complete simultaneously in an optimal solution, we get \(s_2=s_3\). Due to the power equality, for the optimal solution it holds that

$$\begin{aligned} p(s_1) = p(s_2)+p(s_3)=2p(s_2). \end{aligned}$$

Using \(p(s)=s^3\) and some elementary algebra gives \(s_1 = \root 3 \of {2}s_2\). Furthermore, the energy consumption is minimized when the deadline is met exactly, i.e., when \(\frac{w_1}{s_1} + \frac{w_2}{s_2}=40\). Substituting \(w_1=w_2=10\) and \(s_2 = s_1/\root 3 \of {2}\) gives \(s_1 = \frac{1+\root 3 \of {2}}{4}\).
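
The calculation of Example 2 can be checked numerically. The following Python sketch assumes the task graph of Fig. 1 (task \(T_1\) followed by \(T_2\) and \(T_3\) in parallel), equal work for all tasks, and the power function \(p(s)=s^\alpha\); the function names are hypothetical.

```python
def fork_speeds(work, deadline, alpha=3.0):
    """Speeds (s1, s2) that satisfy the power equality p(s1) = 2 p(s2)
    and meet the deadline exactly, for p(s) = s**alpha."""
    ratio = 2.0 ** (1.0 / alpha)  # s1 = ratio * s2
    # work / s1 + work / s2 = deadline, with s2 = s1 / ratio
    s1 = work * (1.0 + ratio) / deadline
    s2 = s1 / ratio
    return s1, s2

def energy(speeds, work, alpha=3.0):
    """Sum over tasks of p(s) * (work / s) = work * s**(alpha - 1)."""
    return sum(work * s ** (alpha - 1.0) for s in speeds)

s1, s2 = fork_speeds(work=10.0, deadline=40.0)
naive = energy([0.5, 0.5, 0.5], work=10.0)
optimal = energy([s1, s2, s2], work=10.0)
print(s1, s2, naive, optimal)
```

In this sketch, the energy of the assignment based on the power equality is lower than that of the naive constant-speed assignment, in line with Example 1.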

4.7 Nonuniform power

Most papers assume that uniform power is used (see Sect. 3.2), while in practice the parameter \({\gamma _1}\) of the power function is not the same for all tasks (i.e., it is nonuniform) (Kwon and Kim 2005), and a task specific factor \({\gamma _1}(n)\) for the dynamic power of task \(T_n\) is more appropriate. A similar situation occurs in the multicore case with m active cores, where the dynamic power must be multiplied by m. This fact is used by several papers on multicore speed scaling (e.g., Gerards et al. 2015).

The dynamic energy consumption for N tasks with nonuniform power functions is given by (see Eq. (1) and Sect. 3.2)

$$\begin{aligned} E = \sum _{n=1}^N {\gamma _1}(n) s_n^\alpha \frac{w_n}{s_n}. \end{aligned}$$
(3)

Table 2 Uniprocessor algorithmic power management problems

Fortunately, there is an elegant transformation due to Kwon and Kim (2005) that can reduce this expression to one with a constant power parameter \({\gamma _1}\). Using the substitution of variables \(\mathring{w}_n = \root \alpha \of {{\gamma _1}(n)}w_n\) and \(\mathring{s}_n = \root \alpha \of {{\gamma _1}(n)} s_n\), (3) becomes

$$\begin{aligned} E = \sum _{n=1}^N \mathring{s}_n^\alpha \frac{\mathring{w}_n}{\mathring{s}_n}. \end{aligned}$$
(4)

This corresponds to an instance where the execution time of task \(T_n\) becomes \(\frac{\mathring{w}_n}{\mathring{s}_n}\), and \({\gamma _1}=1\) for all tasks, i.e., \({\gamma _1}\) disappears from the costs.

The newly obtained problem has uniform power, can be solved using classic algorithms, and the resulting solution can be transformed back to a solution to the problem with nonuniform power.
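
The substitution of variables can be illustrated with a short Python sketch. The function names and the example values are hypothetical; each task is represented as a pair \(({\gamma _1}(n), w_n)\).

```python
def to_uniform(tasks, speeds, alpha):
    """Apply the substitution: scale work and speed of task n by the
    alpha-th root of gamma_1(n). Returns (scaled_work, scaled_speeds)."""
    roots = [gamma ** (1.0 / alpha) for gamma, _ in tasks]
    scaled_work = [r * w for r, (_, w) in zip(roots, tasks)]
    scaled_speeds = [r * s for r, s in zip(roots, speeds)]
    return scaled_work, scaled_speeds

def to_nonuniform(tasks, scaled_speeds, alpha):
    """Transform speeds of the uniform instance back to the original one."""
    return [s / gamma ** (1.0 / alpha)
            for (gamma, _), s in zip(tasks, scaled_speeds)]

def energy_nonuniform(tasks, speeds, alpha):
    """Energy according to Eq. (3)."""
    return sum(g * s ** alpha * (w / s) for (g, w), s in zip(tasks, speeds))

def energy_uniform(scaled_work, scaled_speeds, alpha):
    """Energy according to Eq. (4)."""
    return sum(s ** alpha * (w / s)
               for w, s in zip(scaled_work, scaled_speeds))

# Hypothetical instance: two tasks given as (gamma_1(n), w_n).
tasks = [(8.0, 2.0), (1.0, 3.0)]
speeds = [1.0, 2.0]
scaled_work, scaled_speeds = to_uniform(tasks, speeds, alpha=3.0)
```

Both energy functions yield the same value for corresponding speed assignments, and the execution time of every task is unchanged because work and speed are scaled by the same factor.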

4.8 Flow problems

Several power management problems can be reduced to (convex) flow problems. However, as these formulations as flow problems depend on the concrete algorithmic power management problem, we do not discuss this technique in more detail. We refer the interested readers to three papers, namely Bampis et al. (2012b), Albers et al. (2011), and Angel et al. (2012b), which use such techniques to solve the problem \(P_M;ss \vert a_n;d_n;\mathrm {pmtn};\mathrm {migr} \vert E\). In Sect. 6.1 these papers are briefly discussed.

4.9 Sleep modes

A device can have multiple sleep modes that can be used to decrease the power consumption when the device is idle. A deeper sleep mode requires less power, but the transition costs are higher. As already mentioned, it only becomes worthwhile to use a sleep mode when the idle period is longer than the break-even time of this sleep mode. Furthermore, for the case that in any idle period the best possible sleep mode is used (i.e., that with the lowest total energy consumption), we can derive an important property of the sleep mode problem. This property is based on the following two properties of the energy consumption function \(E^{\text {SL}}(\tau )\): the function \(E^{\text {SL}}(\tau )\) is an increasing and concave function and \(E^{\text {SL}}(0)=0\). Because of these properties, for \(0 \le \delta \le x \le y\) it holds that (Gerards and Kuper 2013)

$$\begin{aligned} E^{\text {SL}}(x-\delta ) + E^{\text {SL}}(y+\delta ) \le E^{\text {SL}}(x) + E^{\text {SL}}(y). \end{aligned}$$
(5)

This means that, for any two idle periods of length x and y (\(x \le y\)), the energy consumption does not increase when a certain amount \(\delta \) of the shorter period is shifted to the longer idle period. This implies that a schedule that “unbalances” the lengths of idle periods does not increase, and may reduce, the energy consumption.
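
Inequality (5) can be illustrated with a small Python sketch. It assumes that every mode is described by a constant power and a fixed transition energy, and that the idle (non-sleeping) state is the mode with transition energy 0; the function name `idle_energy` and the mode values are hypothetical.

```python
def idle_energy(tau, modes):
    """Energy of an idle period of length tau when the cheapest mode is used.

    `modes` is a list of (power, transition_energy) pairs. As a minimum of
    nondecreasing affine functions, the result is concave in tau, and it
    is 0 for tau = 0 because the idle state has transition energy 0.
    """
    return min(power * tau + transition for power, transition in modes)

# Hypothetical modes: idle state, shallow sleep mode, deep sleep mode.
modes = [(4.0, 0.0), (1.0, 6.0), (0.0, 20.0)]

x, y, delta = 3.0, 5.0, 2.0
balanced = idle_energy(x, modes) + idle_energy(y, modes)
unbalanced = idle_energy(x - delta, modes) + idle_energy(y + delta, modes)
print(balanced, unbalanced)
```

For these values, shifting two time units from the shorter to the longer idle period lowers the total idle energy.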

5 Uniprocessor problems

The previous section introduced many general concepts that can be applied to a variety of power management problems. This section surveys concrete algorithms for uniprocessor power management problems (see Table 2 for an overview), and relates these algorithms (when applicable) to the results that were presented in the previous section.

Recall that for each task \(T_n\) we have a workload \(w_n\), an arrival time \(a_n\), and a deadline \(d_n\) before which the task has to finish. In the case of speed scaling, a speed \(s_n\) is to be determined, leading to an execution time \(e_n\). We use \(b_n\) and \(c_n\) to denote the begin and completion time of task \(T_n\), respectively.

The problems in this section are grouped depending on restrictions on the ordering of the timing constraints of tasks. For all problems discussed in this section, the problem consists of finding a schedule together with speeds and/or sleep decisions. First, the problems without any restrictions on the timing constraints are discussed in Sect. 5.1. Several variants of this problem are solved by algorithms with a relatively high-polynomial time complexity, or are NP-hard. Second, in Sect. 5.2, the simpler case of problems with agreeable deadlines is discussed. For many variants of this problem, algorithms with a quadratic time complexity are known. Third, laminar problems are discussed in Sect. 5.3.

5.1 General tasks

In this section, we discuss general tasks, i.e., tasks that have arbitrary arrival times and deadlines. The first variant that we consider allows preemptions of tasks (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn} \vert E\)). According to Albers et al. (2011), this is the most extensively studied speed scaling problem in the algorithm-oriented literature. Yao et al. (1995) present the well-known YDS algorithm (named after the authors) to solve this problem. This algorithm is often used as a subroutine by other algorithms, and in complexity proofs.

The considered problem involves both scheduling and speed scaling. However, if we have specified the speed to use over the complete time horizon, or if we have specified the speed of each task, we can find a corresponding feasible schedule (if one exists for this speed assignment) by always scheduling the available task with the earliest deadline (Yao et al. 1995). The basic idea of the YDS algorithm is to avoid unnecessary speed changes (see Sect. 4.1). The resulting optimal solution has the property that its speeds cannot be lowered to decrease the energy consumption without violating deadlines.

More precisely, the YDS algorithm works with time intervals of the form \(I_{i,j} = [a_i,d_j]\), where \(a_i < d_j\). The density of such an interval is defined as

$$\begin{aligned} g(I_{i,j}) = \frac{\sum _{n \in T_{i,j}} w_n}{d_j-a_i}, \end{aligned}$$

where \(T_{i,j} := \{ T_n \mid [a_n, d_n] \subseteq I_{i,j} \}\) is the set of all tasks that have to be scheduled completely within the interval \(I_{i,j}\). The density determines the minimal average speed that has to be used to execute the tasks from \(T_{i,j}\) completely within this interval. The YDS algorithm takes a so-called critical interval, i.e., an interval \(I_{i,j}\) with the highest density, and assigns this density as speed to all tasks from \(T_{i,j}\) and to the interval \(I_{i,j}\). The algorithm then creates a new subproblem by removing these tasks from the task set and by removing the interval \(I_{i,j}\) from the time axis; the arrival times and deadlines of the other tasks are adjusted to take the unavailability of the processor during this time interval into account. Besides leading to an optimal solution, YDS by construction also avoids unnecessary speed fluctuations and minimizes the peak power.

Table 3 Tasks for Example 3

Fig. 2 Arrival times, deadlines, and optimal solution for Example 3. (a) Iteration 1. (b) Iteration 2. (c) Iteration 3. (d) Optimal solution

Example 3

(YDS algorithm) Consider the tasks from Table 3 of which the arrival times and deadlines are depicted in Fig. 2a. The YDS algorithm first determines the critical interval, which is \(I_{2,2}\) in the first iteration of the algorithm (see Table 4). Since the density of this interval is \(g(I_{2,2})=2\), task \(T_2\) is assigned the speed \(s_2=2\). Next, the interval \(I_{2,2}\) is removed, and the arrival times and deadlines of the other tasks are adapted accordingly (see Fig. 2b).

In the second iteration, interval \(I_{1,4}\) yields the critical density \(g(I_{1,4})=\frac{4}{3}\) (see Table 4), which is assigned as speed to tasks \(T_1\) and \(T_4\) (i.e., \(s_1=s_4=\frac{4}{3}\)). After removing these tasks, only task \(T_3\) remains in the last iteration (see Fig. 2c), which is assigned the speed \(s_3=\frac{1}{2}\). A preemptive Earliest Deadline First (EDF) schedule with the aforementioned speeds ensures that the deadlines are met and the energy consumption is minimized.
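
The iterations of YDS can be sketched in Python as follows. This is a straightforward, unoptimized sketch and not the efficient implementation from the literature; the function name `yds` is hypothetical. Since the values of Table 3 are not reproduced here, the instance at the end is a hypothetical one and not the instance of Example 3.

```python
from fractions import Fraction

def yds(tasks):
    """Return a dict task -> speed for tasks given as
    name -> (arrival, deadline, work), following the YDS iterations."""
    tasks = {k: (Fraction(a), Fraction(d), Fraction(w))
             for k, (a, d, w) in tasks.items()}
    speeds = {}
    while tasks:
        best = None
        starts = {a for a, _, _ in tasks.values()}
        ends = {d for _, d, _ in tasks.values()}
        # Search for a critical interval: an interval of maximal density.
        for t1 in starts:
            for t2 in ends:
                if t2 <= t1:
                    continue
                inside = [k for k, (a, d, _) in tasks.items()
                          if t1 <= a and d <= t2]
                if not inside:
                    continue
                density = sum(tasks[k][2] for k in inside) / (t2 - t1)
                if best is None or density > best[0]:
                    best = (density, t1, t2, inside)
        density, t1, t2, inside = best
        for k in inside:
            speeds[k] = density
            del tasks[k]
        # Remove the critical interval from the time axis and adjust the
        # arrival times and deadlines of the remaining tasks.
        length = t2 - t1

        def shift(t):
            if t <= t1:
                return t
            return t - length if t >= t2 else t1

        tasks = {k: (shift(a), shift(d), w) for k, (a, d, w) in tasks.items()}
    return speeds

# Hypothetical instance: name -> (arrival time, deadline, work).
example = {"A": (0, 10, 2), "B": (2, 4, 4), "C": (0, 6, 4)}
print(yds(example))
```

For this hypothetical instance, the critical intervals are found in the order \([2,4]\) (task B), then the compressed interval \([0,4]\) (task C), after which only task A remains.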

Table 4 Interval densities for Example 3

In a schedule created by this YDS algorithm, the processor is active from the arrival of the first task to the deadline of the last task (unless there are no tasks in some interval). Hence, because of static power, this algorithm is only optimal when it is assumed that the processor remains active until the last deadline (Irani et al. 2007). To the best of our knowledge, there is no optimal algorithm known for the situation where no static energy is consumed after the last executed task.

The original implementation of the YDS algorithm has a time complexity of \(O(N^3)\) (Li et al. 2006). As the original paper (Yao et al. 1995) does not contain a proof of optimality, several proofs of optimality have appeared in the literature afterwards. Bansal et al. (2007) use the Karush–Kuhn–Tucker (KKT) conditions (Boyd and Vandenberghe 2004) to prove optimality of YDS for the power function \(p(s)=s^\alpha \). Li et al. (2006) give a different proof, and present an efficient implementation of YDS with time complexity \(O(N^2 \log N)\). They also provide an \(O(K N \log N)\) algorithm for the variant with discrete speed scaling with K speeds (\(1;\mathrm {disc} \vert a_n;d_n;\mathrm {pmtn} \vert E\)). A recent technical report by Li et al. (2014) states that the continuous problem can be solved in \(O(N^2)\) and the discrete problem can be solved in \(O(N \log \max \{N,K\})\). An alternative method for obtaining the optimal speeds in the discrete case is by applying the YDS algorithm, and then simulating the obtained speeds as discussed in Sect. 4.5 (Kwon and Kim 2005; Hsu and Feng 2005).

The YDS algorithm schedules tasks in EDF order. This implies that when tasks must be scheduled in a predefined order (e.g., based on priorities), the YDS algorithm cannot be used (Quan and Hu 2003). Yun and Kim (2003) show that the fixed priority variant of this problem (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {prio} \vert E\)) is NP-hard, and give an FPTAS for the problem.

There exist several other variations of the problem introduced by Yao et al. (1995). The variant that does not allow preemptions of tasks (\(1;\mathrm {ss} \vert a_n;d_n \vert E\)) is NP-hard (Antoniadis and Huang 2013). Bampis et al. (2015) designed an algorithm for this problem with the approximation ratio \((1+w^{\max }/w^{\min })^\alpha \), where \(w^{\max }\) and \(w^{\min }\) are, respectively, the upper and lower bounds on the work of tasks. Bampis et al. (2014b) use results from several papers (Huang and Ott 2014; Bampis et al. 2014a; Cohen-Addad et al. 2015) for this problem to design an algorithm with approximation ratio \((1+\epsilon )^\alpha \tilde{B}_\alpha \), where \(\tilde{B}_\alpha = \sum _{k=0}^\infty \frac{k^\alpha e^{-1}}{k!}\) is a generalization of the Bell numbers that works for fractional values of \(\alpha \). When all tasks have the same workload (\(1;\mathrm {ss} \vert a_n;d_n;w_n=1 \vert E\)), the problem can be solved in polynomial time (Huang and Ott 2014).
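
To get a feeling for the size of the factor \(\tilde{B}_\alpha \), the series can be evaluated numerically. The following Python sketch truncates the series after a fixed number of terms, which is an assumption made for illustration; the function name `generalized_bell` is hypothetical.

```python
import math

def generalized_bell(alpha, terms=60):
    """Truncated series sum_{k>=1} k**alpha * exp(-1) / k!.

    The k = 0 term vanishes for alpha > 0 and is therefore skipped.
    """
    return sum(k ** alpha / math.factorial(k) for k in range(1, terms)) / math.e

for alpha in (2, 2.5, 3):
    print(alpha, generalized_bell(alpha))
```

For integer values of \(\alpha \), the series reproduces the ordinary Bell numbers, e.g., 2 for \(\alpha =2\) and 5 for \(\alpha =3\).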

Kwon and Kim (2005) study another variation, where the dynamic power consumption may differ per task (\(1;\mathrm {ss}; \mathrm {nonunif} \vert a_n;d_n;\mathrm {pmtn} \vert E\)). Such differences occur, for example, when the switched capacitance differs per task. They solve this problem using a substitution of variables (see Sect. 4.7). They formulate the discrete speed scaling variant of this problem (\(1;\mathrm {ss};\mathrm {nonunif};\mathrm {disc} \vert a_n;d_n;\mathrm {pmtn} \vert E\)) as a linear program (see Sect. 4.4).

The sleep mode counterpart of the YDS problem is \(1;\mathrm {sl} \vert a_n;d_n;\mathrm {pmtn} \vert E\). Baptiste et al. (2012) present an algorithm that is commonly referred to as BCD (named after the authors), that uses dynamic programming to solve the problem in \(O(N^4)\) time. Their algorithm is restricted to instances where processors have only a single sleep mode.

Other authors (Albers and Antoniadis 2014; Irani et al. 2007) study the combination of speed scaling and sleep modes, namely \(1;\mathrm {ss};\mathrm {sl} \vert a_n;d_n;\mathrm {pmtn} \vert E\), which is an NP-hard problem. The heuristic by Irani et al. (2007) is a 2-approximation and is relatively easy to implement. This heuristic uses YDS to determine the speeds, and whenever YDS determines a speed \(s_n{<}s^{\text {crit}}\), this speed is replaced by the speed \(s^{\text {crit}}\) (this is called an \(s^{\text {crit}}\)-schedule). These changes create idle time that can be used to put the processor into a sleep mode. As long as there are tasks available, they are consecutively executed, followed by an idle period of maximal length. This scheduling method is used to create relatively large idle periods. Albers and Antoniadis (2014) use a similar method, but with the cut-off speed \(s^*\) instead of \(s^{\text {crit}}\), where \(s^*\) is determined by solving \(\bar{p}(s^*)=\frac{4}{3}\bar{p}(s^{\text {crit}})\). Furthermore, they use BCD instead of the scheduling algorithm by Irani et al. (2007). This results in a 4/3-approximation, but has a higher time complexity (\(O(N^4)\)) because of the use of BCD. When the power function \(p(s)={\gamma _1}s^\alpha + {\gamma _2}\) is used (realistic for DVFS), the approximation ratio becomes 137/117 (\({<}1.171\)). Recently, Antoniadis et al. (2015) presented an FPTAS for this problem that is based on dynamic programming. In this dynamic programming approach, the time horizon is discretized by a polynomial number of intervals, where the number of intervals depends on the required approximation ratio.
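
The speed replacement step of an \(s^{\text {crit}}\)-schedule can be sketched as follows. The Python sketch assumes that the critical speed is the speed that minimizes the energy per unit of work \(p(s)/s\), and that the power function is \(p(s)={\gamma _1}s^\alpha + {\gamma _2}\); the function names and the example values are hypothetical, and the subsequent scheduling of tasks and idle periods is not covered.

```python
def critical_speed(gamma1, gamma2, alpha):
    """Minimizer of p(s) / s for p(s) = gamma1 * s**alpha + gamma2.

    Setting the derivative of gamma1 * s**(alpha - 1) + gamma2 / s to zero
    gives s**alpha = gamma2 / (gamma1 * (alpha - 1)).
    """
    return (gamma2 / (gamma1 * (alpha - 1.0))) ** (1.0 / alpha)

def scrit_speeds(yds_speeds, s_crit):
    """Replace every speed below the critical speed by the critical speed."""
    return {task: max(speed, s_crit) for task, speed in yds_speeds.items()}

# Hypothetical values: p(s) = s**3 + 2 and speeds as produced by YDS.
s_crit = critical_speed(gamma1=1.0, gamma2=2.0, alpha=3.0)
print(scrit_speeds({"A": 0.5, "B": 2.0}, s_crit))
```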

5.2 Agreeable deadlines

In applications like multimedia and telecommunication, the arrival times and deadlines are usually in the same order (i.e., \(a_n < a_m \Leftrightarrow d_n \le d_m\)). Such applications are said to have agreeable deadlines. This special structure of the timing constraints makes the development of efficient speed scaling and sleep mode algorithms possible. One main reason for this is that we can assume w.l.o.g. that the tasks are scheduled in order of their timing constraints (i.e., deadlines) and that no preemption is used (for the latter, see e.g., Bampis et al. 2015).
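
Whether an instance has agreeable deadlines can be tested with a few lines of Python. The sketch below uses one possible reading of the definition, namely that no task arrives strictly earlier than another task while having a strictly later deadline; ties are accepted. The function name `is_agreeable` is hypothetical.

```python
def is_agreeable(tasks):
    """tasks: list of (arrival, deadline) pairs.

    After sorting by arrival time (ties broken by deadline), the deadlines
    must be nondecreasing.
    """
    ordered = sorted(tasks)
    return all(d1 <= d2 for (_, d1), (_, d2) in zip(ordered, ordered[1:]))

print(is_agreeable([(0, 4), (1, 5), (3, 6)]))  # same order of arrivals and deadlines
print(is_agreeable([(0, 10), (2, 4)]))         # second task is nested in the first
```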

Speed scaling for systems with agreeable deadlines (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {agree} \vert E\)) is studied by many authors (e.g., Huang and Wang 2009; Wu et al. 2011). Huang and Wang (2009) present an algorithm that calculates the optimal speeds in quadratic time. Their algorithm first schedules the tasks using the same speed for all tasks. This speed is chosen such that all tasks are scheduled exactly within the time interval between the first arrival time and the last deadline without any idle time. Then, a task \(T_n\) with the largest violation of an arrival time or a deadline in this schedule is used to divide the set of tasks into two subsets: the tasks before and the tasks after the violation. For a deadline violation, the completion time of task \(T_n\) is fixed to \(d_n\), while for an arrival time violation the begin time of task \(T_n\) is fixed to \(a_n\). Then the procedure is recursively repeated for both subsets.

In a variant of this problem, the maximal rate of change of the speed is bounded from above by R (i.e., \(\max _t |s'(t)| \le R\), for some \(R > 0\)). For this problem Wu et al. (2011) present an algorithm, which finds the optimal solution in quadratic time.

Besides the problem with agreeable deadlines and speed scaling, the problem with sleep modes and the problem with a combination of speed scaling and sleep modes are also studied in the literature. For the problem where the processor has a single sleep mode (\(1;\mathrm {sl} \vert a_n;d_n;\mathrm {agree} \vert E\)), the algorithm by Angel et al. (2012a) (see also Angel et al. 2014) can be applied to find an energy optimal schedule. The authors observe that there always exists an optimal solution in which every task \(T_n\) starts at either (i) \(a_n\), (ii) \(c_{n-1}\), or (iii) \(d_n-e_n\). Note that the options for the completion time \(c_{n-1}\) depend on the begin times of tasks \(T_1,\dots ,T_{n-1}\). Consequently, for each task \(T_k\) (tasks ordered in EDF order), there are O(k) possible begin times, leading to a quadratic time complexity. This result by Angel et al. (2012a) is extended by Bampis et al. (2012a), leading to a cubic time algorithm to find the optimal combination of speed scaling and sleep modes (\(1;\mathrm {sl};\mathrm {ss} \vert a_n;d_n;\mathrm {agree} \vert E\)).

5.3 Laminar instances

In this section, we study tasks with a nested structure, called laminar instances. A real-time system is a laminar instance whenever, for each pair of tasks, the permissible intervals (\([a_n,d_n]\) for task \(T_n\)) do not overlap, or one is completely contained within the other. In a graphical representation, a task \(T_i\) is drawn on top of task \(T_j\) when \([a_i,d_i] \subset [a_j,d_j]\), which creates layers of tasks and explains the term “laminar instances.” According to Li et al. (2006) these structures occur in recursive programs. Since the tasks can be arranged in a tree structure that expresses this recursion, laminar instances are also referred to as tree-structured tasks (Li et al. 2006). Li et al. (2006) give an efficient polynomial time algorithm to find the optimal speeds for laminar instances (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {lami} \vert E\)). The variant of this problem that does not allow preemptions (\(1;\mathrm {ss} \vert a_n;d_n;\mathrm {lami} \vert E\)) is NP-hard. Huang and Ott (2014) present a Quasi-Polynomial Time Approximation Scheme (QPTAS) for this problem.

Just as for the problem with agreeable deadlines, the restriction to laminar instances makes the problem easier to solve. In fact, an instance in which all deadlines or all arrival times are the same both has agreeable deadlines and is a laminar instance. For both of these special cases, a linear time solution is available (Li et al. 2006).
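
The definition of a laminar instance translates directly into a pairwise test. In the following Python sketch, intervals that only touch at an endpoint are treated as non-overlapping, which is an assumption; the function name `is_laminar` is hypothetical.

```python
from itertools import combinations

def is_laminar(tasks):
    """tasks: list of (arrival, deadline) pairs.

    Every pair of permissible intervals must be either disjoint or nested.
    """
    for (a1, d1), (a2, d2) in combinations(tasks, 2):
        disjoint = d1 <= a2 or d2 <= a1
        nested = (a1 <= a2 and d2 <= d1) or (a2 <= a1 and d1 <= d2)
        if not (disjoint or nested):
            return False
    return True

print(is_laminar([(0, 10), (2, 4), (5, 9), (6, 8)]))  # nested layers
print(is_laminar([(0, 4), (2, 6)]))                   # partial overlap
print(is_laminar([(0, 5), (1, 5), (3, 5)]))           # common deadline
```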

6 Multiprocessor problems

This section discusses multiprocessor algorithmic power management problems (see Table 5 for an overview). The problems in this section consist of finding a multiprocessor schedule together with speeds and/or sleep decisions. General tasks (i.e., tasks without special restrictions on arrival times and deadlines) are discussed in Sect. 6.1. Algorithms for tasks with agreeable deadlines are discussed in Sect. 6.2, followed by a discussion of tasks with precedence constraints in Sect. 6.3.

Table 5 Multiprocessor algorithmic power management problems

6.1 General tasks

We first consider the variant of the problem, where all tasks arrive at time 0, have a shared global deadline, and local speed scaling is used to minimize the total energy consumption (\(P_M;\mathrm {ss} \vert a_n=a;d_n=d \vert E\)). This problem is strongly NP-hard (Albers et al. 2014), since the 3-partition problem can be reduced to it. Pruhs et al. (2008) show that the problem of minimizing the makespan under an energy constraint can be formulated as the problem of minimizing the \(\ell _\alpha \) norm of the processor loads (where \(\alpha \) is the exponent in the dynamic power function, see Sect. 3.2). For the latter problem, a PTAS exists (Alon et al. 1997). In a similar fashion, a PTAS can also be derived for energy minimization under a global deadline constraint. Such a PTAS cannot exist (unless \(\mathcal {P}= \mathcal {NP}\)) if there is a maximum speed \(s^\text {max}\), i.e., \(s_n \le s^\text {max}\) for all n (Chen et al. 2004). Chen et al. (2004) study both the general tasks problem (\(P_M;\mathrm {ss} \vert a_n=0;d_n=d \vert E\)) and the variant with restricted speeds. For the first problem they provide an algorithm with a 1.13 approximation ratio, which also attains this ratio for the second problem under some additional restrictions. Furthermore, they present an algorithm that can solve both problems optimally when migrations are allowed.
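
The relation between the energy consumption and the \(\ell _\alpha \) norm of the processor loads can be illustrated as follows. The Python sketch assumes the power function \(p(s)=s^\alpha \) without static power, and assumes that each processor executes its load at a constant speed over the whole window between the common arrival time and the common deadline; the function names are hypothetical.

```python
def energy_common_window(loads, window, alpha):
    """Energy when processor m runs at the constant speed loads[m] / window
    for the whole window, with p(s) = s**alpha."""
    return sum((load / window) ** alpha * window for load in loads)

def l_alpha_norm(loads, alpha):
    """The l_alpha norm of the vector of processor loads."""
    return sum(load ** alpha for load in loads) ** (1.0 / alpha)

# Hypothetical loads on two processors, window of length 2, alpha = 3.
loads, window, alpha = [3.0, 1.0], 2.0, 3.0
print(energy_common_window(loads, window, alpha))
print(l_alpha_norm(loads, alpha) ** alpha / window ** (alpha - 1))
```

Under these assumptions, the energy equals the \(\alpha \)-th power of the norm divided by the constant \(d^{\alpha -1}\), so an assignment of tasks to processors that minimizes the norm also minimizes the energy.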

There are several variations of the problem with arbitrary arrival times and deadlines considered in the literature. They differ depending on whether preemptions and migrations of tasks are allowed or not. The widely studied problem \(P_M;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn};\mathrm {migr} \vert E\) uses the combination of local speed scaling and scheduling, where preemptions and migrations of tasks are allowed. This problem was first studied by Bingham and Greenstreet (2008), wherein the authors show that the problem is convex. They present an algorithm that is polynomial in the number of tasks, but according to the authors, the complexity is too high for practical applications. However, as they also discuss properties of the optimal solution, their paper is important when studying multiprocessor speed scaling with preemptions and migrations. Albers et al. (2011) present a more efficient polynomial time algorithm for the same problem. Their algorithm uses repeated maximum flow computations to minimize the energy consumption. A closely related approach by Angel et al. (2012b) also uses maximum flow computations to find the optimal solution in polynomial time. The resulting algorithm is more efficient than that of Albers et al. (2011) for the case that a reduced accuracy is allowed. Another approach to the same problem is discussed in the paper by Bampis et al. (2012b), wherein the optimal speeds are determined by solving a convex flow problem. In this approach, execution times correspond to amounts of flow, which have to be sent through the network. The algorithm that solves this problem has a time complexity that depends on the latest deadline. Although this dependency on the deadline is a drawback, the presented approach is straightforward and its concepts are interesting for future research in this direction.

Albers et al. (2014) study the variant of the problem where migrations are not allowed (\(P_M;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn} \vert E\)). They show that the problem is NP-hard, even for tasks with unit workload (for which a PTAS is given). The difficult part of this problem is the assignment of tasks to processors. If such an assignment is given, determining the optimal speeds and scheduling order is straightforward, since YDS can be used for the tasks on each individual processor. The heuristic by Albers et al. (2014) sorts the tasks in order of nondecreasing deadlines, and assigns the tasks in this order to the processor with the lowest amount of work assigned to it. This heuristic has an approximation ratio of \(2(2-\frac{1}{N})^\alpha \). A more general version of this problem that considers a weighted sum of the energy consumption and flow time as objective is studied by Greiner et al. (2014).
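
The assignment step of this heuristic, as described above, can be sketched in Python. The sketch only covers the assignment of tasks to processors; the speeds on each processor would subsequently be determined with YDS. The function name, the tie-breaking rule (lowest processor index), and the example tasks are assumptions.

```python
import heapq

def assign_least_loaded(tasks, num_processors):
    """tasks: list of (name, deadline, work) triples.

    Tasks are considered in order of nondecreasing deadline, and each task
    is assigned to the processor with the least work assigned so far.
    """
    heap = [(0, m) for m in range(num_processors)]  # (assigned work, processor)
    assignment = {m: [] for m in range(num_processors)}
    for name, _, work in sorted(tasks, key=lambda t: t[1]):
        load, m = heapq.heappop(heap)
        assignment[m].append(name)
        heapq.heappush(heap, (load + work, m))
    return assignment

# Hypothetical tasks: (name, deadline, work).
tasks = [("A", 5, 4), ("B", 3, 1), ("C", 4, 2), ("D", 6, 1)]
print(assign_least_loaded(tasks, 2))
```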

In recent years, the problem that allows neither migration nor preemption (\(P_M;\mathrm {ss} \vert a_n;d_n \vert E\)) has attracted some attention (Cohen-Addad et al. 2015; Bampis et al. 2015). Bampis et al. (2014a) use results from this previous research to develop an algorithm with the approximation ratio \(\tilde{B}_\alpha \big ((1+\epsilon )(1+w^{\max }/w^{\min })\big )^\alpha \).

6.2 Agreeable deadlines

Just as for the uniprocessor problem with agreeable deadlines, in the multiprocessor case a solution to the preemptive problem with no migration can be transformed to a nonpreemptive solution with no migration with the same costs (Bampis et al. 2015).

Albers et al. (2014) present an optimal algorithm for the multiprocessor agreeable deadline problem where tasks have unit workload (\(P_M;\mathrm {ss} \vert a_n;d_n;w_n{=}1;\mathrm {agree} \vert E\)). This algorithm sorts the tasks in order of nondecreasing deadlines, assigns them to the processors using round robin scheduling and applies an algorithm that solves \(1;\mathrm {ss} \vert a_n;d_n;w_n=1;\mathrm {agree} \vert E\) (e.g., YDS) to the task sets for each individual processor. For tasks with an arbitrary workload they give an \(\alpha ^\alpha 2^{4\alpha }\)-approximation algorithm.
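
The round robin assignment used for unit-workload tasks can be sketched as follows. The Python sketch only covers the assignment step; the function name `round_robin` and the example tasks are hypothetical.

```python
def round_robin(tasks, num_processors):
    """tasks: list of (name, deadline) pairs with unit workload.

    The i-th task in order of nondecreasing deadline is assigned to
    processor i mod num_processors.
    """
    ordered = sorted(tasks, key=lambda t: t[1])
    assignment = [[] for _ in range(num_processors)]
    for i, (name, _) in enumerate(ordered):
        assignment[i % num_processors].append(name)
    return assignment

# Hypothetical tasks: (name, deadline).
print(round_robin([("A", 5), ("B", 3), ("C", 4), ("D", 6), ("E", 7)], 2))
```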

6.3 Tasks with precedence constraints

According to the survey by Chen and Kuo (2007), “... energy-efficient scheduling for jobs with precedence constraints with theoretical analysis is still missed in multiprocessor systems.” Only a few papers have studied speed scaling of tasks with precedence constraints, and to the best of our knowledge no papers have studied the sleep mode variant of this problem. Since the local speed scaling problem (\(P_M;\mathrm {ss} \vert a_n=a;d_n=d \vert E\)) from Sect. 6.1 is already NP-hard, the variant with precedence constraints (\(P_M;\mathrm {ss} \vert a_n=a;d_n=d;\mathrm {prec} \vert E\)) is also NP-hard.

Li (2012) studies the latter problem, and shows that under specific conditions the optimal solution to this problem becomes straightforward to approximate, namely for graphs with precedence constraints that have more parallelism than processors (called wide task graphs). Due to the amount of parallelism, the tasks are easy to schedule and using a single speed for the entire application gives near-optimal results.

The global speed scaling variant of this problem (\(P_M;\mathrm {global} \vert a_n=a;d_n=d;\mathrm {prec} \vert E\)) is also NP-hard, and was studied by Gerards et al. (2015). This problem consists of both scheduling and speed scaling. However, the second step is easy to solve, since the concept of power equality (see Sect. 4.6) can be applied to find the optimal speeds. Gerards et al. (2015) give a scheduling criterion that—together with optimal speeds—leads to a minimal energy consumption. Furthermore, they show how well existing scheduling algorithms perform at approximating the energy consumption.

A closely related problem that also assumes global speed scaling is \(P_M;\mathrm {global} \vert a_n;d_n;\mathrm {sched};\mathrm {prec} \vert E\), where tasks have individual arrival times and deadlines, and a schedule of the tasks is already given. Gerards et al. (2014) give a method that finds the optimal speeds by combining the results on nonuniform power (Sect. 4.7) and the power equality (Sect. 4.6). The given schedule is subdivided into pieces, whereby a piece is a chunk of workload with a constant number of active cores, during which no tasks start or complete. Using the results on nonuniform power and the power equality, these pieces are transformed in such a way that a uniprocessor problem with agreeable deadlines \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {agree} \vert E\) is achieved, which can be solved in quadratic time (see Sect. 5.2). This solution can be transformed back to obtain the optimal solution of the original problem.

7 Open problems

This section discusses some open problems related to speed scaling. The first problem (Sect. 7.1) is about the relation between continuous and discrete speed scaling for a multiprocessor system. This problem was already solved for single-processor systems. The second problem is about speed scaling of tasks with precedence constraints on a local speed scaling system. Even for a given schedule, this problem may be hard.

7.1 Multiprocessor discrete speed scaling

Discrete speed scaling for a single processor is often considered a simpler problem than continuous speed scaling. There is an \(O(N^2 \log N)\)-time algorithm for the frequently studied problem \(1;\mathrm {ss} \vert a_n;d_n;\mathrm {pmtn} \vert E\), while there is an \(O(K N \log N)\)-time algorithm for the discrete speed scaling variant of this problem with K speeds (in practice, \(K \ll N\)). Furthermore, a solution to a continuous speed scaling problem can be converted to the discrete speed scaling variant in \(O(N \log K)\) time by simulating the continuous speeds (Sect. 4.5). To the best of our knowledge, there are no papers that relate optimal continuous and discrete speed scaling for multiprocessor systems, or that solve discrete multiprocessor speed scaling problems algorithmically. Only in the simple case where tasks have no precedence constraints and local speed scaling is used can the techniques from single-processor speed scaling be applied to individual processors. More research on discrete speed scaling for multiprocessor systems, and on the relation between continuous and discrete speed scaling on such systems, is desirable.

7.2 Local speed scaling for tasks with precedence constraints

Local speed scaling for tasks with precedence constraints is an unsolved and important problem. Even the case where the tasks have been scheduled (i.e., tasks have been assigned to processors, and per processor a sequence of the assigned tasks is given) and only speeds need to be determined (\(P_M \vert a_n=a;d_n=d;\mathrm {prec};\mathrm {sched} \vert E\)) is currently unsolved. The power equality (discussed in Sect. 4.6) can be used as a first step toward solving the problem.

The following example illustrates why this problem may be difficult.

Example 4

Consider the power function \(p(s)=s^3\) for a three-processor system with local speed scaling. The tasks have precedence constraints as given in Fig. 3a. All tasks share the common deadline \(d=1\).

We keep the work of the tasks variable in this example, to demonstrate the influence of the work on the solution. The schedule (with some arbitrarily chosen workload) is given in Fig. 3b. Note that the position of the gaps in the schedule will change when the workload changes. The optimization problem is: for a given schedule, determine the optimal speed assignment that minimizes the energy consumption, respects precedence constraints, and meets the deadline.

Fig. 3 Precedence constraints and schedule for Example 4. (a) Tasks with precedence constraints. (b) Schedule

Due to convexity of the power function, in the optimal solution it must hold that \(s_1=s_6\). To ease the discussion, we consider two situations:

(a) Task \(T_2\) finishes before task \(T_3\), or at the same time.

In the discussion below, we may assume that the edge “a” between task \(T_2\) and \(T_4\) does not exist, as (with the given assumption) it does not influence the optimal solution. In the optimal solution, we have \(e_2+e_7=e_3+e_4=e_3+e_5\) (same execution time for tasks, avoiding gaps in the schedule), otherwise the energy consumption can be decreased by decreasing the speed of a task that is next to a gap in the schedule. These relations can be used to determine the speeds of these tasks. Using the power equality, the relation between the speeds \(s_3\), \(s_4\), and \(s_5\) can be determined. It can also be used to relate speeds \(s_1\), \(s_2\), and \(s_3\). Now enough information is available to find the optimal speeds.

(b) Task \(T_2\) finishes after task \(T_3\).

In the discussion below, we may assume that the edge “b” between tasks \(T_3\) and \(T_4\) does not exist, as (with the given assumption) it does not influence the optimal solution. In the optimal solution we have that \(e_2+e_7=e_2+e_4=e_3+e_5\). Again, using the convexity of the power function and using the power equality, the optimal speeds can be determined.

One way to find the optimal speeds is to calculate the energy consumption for both situations and select the one with the lower cost.

This example indicates that solving the overall continuous problem depends on a number of discrete cases. These cases are specified by whether some task finishes before or after some other task. As it is unclear how many of these decision points may occur, and whether there is an efficient (polynomial-time) algorithm to make these decisions, the above example suggests that the local speed scaling problem with a given schedule of tasks with precedence constraints may be difficult.

8 Discussion

Algorithmic power management can be used to significantly reduce the energy consumption of computing devices. Combined with such power management techniques, scheduling algorithms play a crucial role, since the underlying schedules have a critical impact on the efficiency of power management techniques. This survey discussed a great variety of such scheduling algorithms that reduce the energy consumption of real-time systems by either decreasing the speed (speed scaling) or turning devices off (sleep modes). We also argued that many of these speed scaling algorithms minimize the peak power consumption, although they are designed to minimize the energy consumption. Furthermore, we pointed out that many power management algorithms rely on the same theoretical concepts. Therefore, we surveyed not only the algorithms but also the fundamental ideas behind them.

As many papers on algorithmic power management do not consider several important architectural details, there is a gap between theory and practice. Therefore, in this survey we gave a short overview of some of these aspects, and how they can be modeled or treated. An example of such an aspect is nonuniform power, which is rarely mentioned in the theoretical literature.

Another important aspect missing in the theoretical literature is the interaction between global and local speed scaling (“voltage and frequency islands”). These hybrids of local and global speed scaling, together with multiprocessor discrete speed scaling, are in our view the major theoretical challenges that need to be addressed in the near future.