
Approximation algorithms for energy-efficient scheduling of parallel jobs


Abstract

In this paper, we consider scheduling on homogeneous speed-scalable processors, where the objective is to minimize the total energy consumption. While most previous work has studied single-processor jobs, we focus on rigid parallel jobs, which may use more than one processor simultaneously. Each job is specified by a release date, a deadline, a processing volume and the number of required processors. First, we develop constant-factor approximation algorithms for such interesting cases as agreeable jobs without migration and preemptive instances. Next, we propose a configuration linear program, which allows us to obtain an “almost exact” solution for the preemptive setting. Finally, in the case of non-preemptive agreeable jobs with unit-work operations, we present a 3-approximation algorithm that generalizes the known exact algorithm for single-processor jobs.


Notes

  1. A previous version of the constant-factor approximation algorithm and an analogous “almost exact” method were published in Kononov and Kovalenko (2016, 2017) for migratory rigid jobs with a uniform partitioning of the work among processors.

References

  • Albers, S., Antoniadis, A., & Greiner, G. (2015). On multi-processor speed scaling with migration. Journal of Computer and System Sciences, 81, 1194–1209.


  • Albers, S., Bampis, E., Letsios, D., Lucarelli, G., & Stotz, R. (2017). Scheduling on power-heterogeneous processors. Information and Computation, 257, 22–33.


  • Albers, S., Müller, F., & Schmelzer, S. (2014). Speed scaling on parallel processors. Algorithmica, 68(2), 404–425.


  • Angel, E., Bampis, E., Kacem, F., & Letsios, D. (2019). Speed scaling on parallel processors with migration. Journal of Combinatorial Optimization, 37(4), 1266–1282.


  • Antoniadis, A., & Huang, C. C. (2013). Non-preemptive speed scaling. Journal of Scheduling, 16(4), 385–394.


  • Bampis, E., Kononov, A., Letsios, D., Lucarelli, G., & Nemparis, I. (2015). From preemptive to non-preemptive speed-scaling scheduling. Discrete Applied Mathematics, 181, 11–20.


  • Bampis, E., Kononov, A., Letsios, D., Lucarelli, G., & Sviridenko, M. (2018). Energy efficient scheduling and routing via randomized rounding. Journal of Scheduling, 21(1), 35–51.


  • Bampis, E., Letsios, D., & Lucarelli, G. (2015). Green scheduling, flows and matchings. Theoretical Computer Science, 579, 126–136.


  • Bingham, B. D., & Greenstreet, M. R. (2008). Energy optimal scheduling on multiprocessors with migration. In International symposium on parallel and distributed processing with applications, ISPA 2008 (pp. 153–161).

  • Brodtkorb, A. R., Dyken, C., Hagen, T. R., Hjelmervik, J. M., & Storaasli, O. O. (2010). State-of-the-art in heterogeneous computing. Scientific Programming, 18, 1–33.


  • Chen, J., Hsu, H., Chuang, K., Yang, C., Pang, A., & Kuo, T. (2004). Multiprocessor energy-efficient scheduling with task migration considerations. In 16th euromicro conference on real-time systems, ECRTS (pp. 101–108). IEEE.

  • Cohen-Addad, V., Li, Z., Mathieu, C., & Milis, I. (2015). Energy-efficient algorithms for non-preemptive speed-scaling. In International workshop on approximation and online algorithms, WAOA 2014. LNCS (Vol. 8952, pp. 107–118). Springer, Berlin.

  • Drozdowski, M. (2009). Scheduling for parallel processing. London: Springer.


  • Gerards, M. E. T., Hurink, J. L., & Hölzenspies, P. K. F. (2016). A survey of offline algorithms for energy minimization under deadline constraints. Journal of Scheduling, 19, 3–19.


  • Greiner, G., Nonner, T., & Souza, A. (2014). The bell is ringing in speed-scaled multiprocessor scheduling. Theory of Computing Systems, 54(1), 24–44.


  • Grötschel, M., Lovász, L., & Schrijver, A. (1993). Geometric algorithms and combinatorial optimization (2nd corrected ed.). Berlin: Springer.


  • Gupta, A., Im, S., Krishnaswamy, R., Moseley, B., & Pruhs, K. (2012). Scheduling heterogeneous processors isn’t as easy as you think. In Twenty-third annual ACM-SIAM symposium on discrete algorithms (pp. 1242–1253).

  • Gupta, A., Krishnaswamy, R., & Pruhs, K. (2010a). Nonclairvoyantly scheduling power-heterogeneous processors. In Proceedings of the international green computing conference (pp. 165–173).

  • Gupta, A., Krishnaswamy, R., & Pruhs, K. (2010b). Scalably scheduling power-heterogeneous processors. In Proceedings of the international colloquium on automata, languages, and programming (pp. 312–323).

  • Huang, W., & Wang, Y. (2009). An optimal speed control scheme supported by media servers for low-power multimedia applications. Multimedia Systems, 15(2), 113–124.


  • Jansen, K., & Porkolab, L. (2000). Preemptive parallel task scheduling in \(o(n)+poly(m)\) time. In D. T. Lee, S.H. Teng (Eds.), Proceedings of ISAAC 2000. LNCS (Vol. 1969, pp. 398–409).

  • Johannes, B. (2006). Scheduling parallel jobs to minimize the makespan. Journal of Scheduling, 9, 433–452.


  • Karzanov, A. (1974). Determining the maximal flow in a network by the method of preflows. Soviet Mathematics Doklady, 15(2), 434–437.


  • Kononov, A., & Kovalenko, Y. (2016). On speed scaling scheduling of parallel jobs with preemption. In International conference on discrete optimization and operations research, DOOR-2016. LNCS (Vol. 9869, pp. 309–321). Springer, Berlin.

  • Kononov, A., & Kovalenko, Y. (2017). An approximation algorithm for preemptive speed scaling scheduling of parallel jobs with migration. In International conference on learning and intelligent optimization, LION 2017. LNCS (Vol. 10556, pp. 351–357). Springer, Berlin.

  • Li, M., Yao, F., & Yuan, H. (2017). An \(O(n^2)\) algorithm for computing optimal continuous voltage schedules. In Theory and applications of models of computation, TAMC 2017. LNCS (Vol. 10185, pp. 389–400). Springer, Berlin.

  • Naroska, E., & Schwiegelshohn, U. (2002). On an on-line scheduling problem for parallel jobs. Information Processing Letters, 81, 297–304.


  • Shioura, A., Shakhlevich, N., & Strusevich, V. (2017). Machine speed scaling by adapting methods for convex optimization with submodular constraints. INFORMS Journal on Computing, 29(4), 724–736.


  • Wu, W., Li, M., & Chen, E. (2011). Min-energy scheduling for aligned jobs in accelerate model. Theoretical Computer Science, 412(12–14), 1122–1139.


  • Yao, F., Demers, A., & Shenker, S. (1995). A scheduling model for reduced CPU energy. In 36th annual symposium on foundations of computer science, FOCS 1995 (pp. 374–382). IEEE.


Acknowledgements

A. Kononov was supported by Program no. I.5.1 of Fundamental Research of the Siberian Branch of the Russian Academy of Sciences (project no. 0314-2019-0014). Yu. Kovalenko was supported by Program no. I.5.1 of Fundamental Research of the Siberian Branch of the Russian Academy of Sciences (project no. 0314-2019-0019).

Author information


Corresponding author

Correspondence to Yulia Kovalenko.


Appendix

The Appendix contains an exact method for computing the lower bound on the energy consumption and approximation algorithms for particular cases of the speed-scaling problem when rigid jobs have a common release date and/or a common deadline.

Lower bound model

Fig. 4: Example of graph \(G=(V,A)\)

Using the approach from Shioura et al. (2017), we present an exact algorithm for finding a max-flow in the bipartite network \(G=(V,A)\) proposed in Sect. 2.1 (see Fig. 4) that minimizes the total cost

$$\begin{aligned} \sum _{j=1}^n {\frac{x(s,j)}{size_j} \sum _{l=1}^{size_j}\left( \frac{W_{jl} size_j}{x(s,j)} \right) ^{\alpha }}. \end{aligned}$$
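In code, evaluating this cost for a given flow is straightforward. The following minimal Python sketch assumes an illustrative data layout (jobs as \((size_j,[W_{j1},\dots ])\) pairs, flow values \(x(s,j)\) as a list); it is not the authors' implementation.

```python
# Minimal sketch: total cost of a flow x under the formula above.
# Illustrative layout: jobs[j] = (size_j, [W_j1, ..., W_j,size_j]).

def total_cost(jobs, x, alpha):
    """x[j] = x(s, j) > 0 for each job j; alpha > 1."""
    cost = 0.0
    for (size, works), xj in zip(jobs, x):
        cost += (xj / size) * sum((w * size / xj) ** alpha for w in works)
    return cost
```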

Note that our network differs from the network in Shioura et al. (2017) only in the capacities of the arcs from job nodes to interval nodes (\(\mu (j,I_k)=\Delta _k\) in Shioura et al. 2017). First, we reformulate the problem in terms of submodular optimization. We define a polymatroid polyhedron

$$\begin{aligned} Q=\left\{ q\in \mathbb {R}^n:\ \text {there exists a feasible } s\text {-}t \text { flow } x \text { in } G=(V,A) \text { with } x(s,j)=q_j \text { for } j=1,\dots ,n \right\} . \end{aligned}$$

Let \(q(X):=\sum _{j\in X}q_j\) be the total duration of all jobs from X when they are executed on one processor, and let the polymatroid rank function \(\varphi :2^{\mathcal {J}}\rightarrow \mathbb {R}\) be given by

$$\begin{aligned} \varphi (X)=\max \left\{ \sum _{j\in X} y(s,j):\ y \text { is a feasible flow in } G\right\} . \end{aligned}$$

Then all possible max-flows can be characterized as the base polyhedron \(B(\varphi )=\{ q\in Q_{(+)}(\varphi ):\ q(\mathcal {J})=\varphi (\mathcal {J})\}\) of the polymatroid polyhedron \(Q_{(+)}(\varphi )=\{ q\in \mathbb {R}^n_+:\ q(X)\le \varphi (X),\ X\in 2^{\mathcal {J}}\}\).

For \(X\subseteq \mathcal {J}\) and \(h=1,\dots ,\gamma \), we denote by

$$\begin{aligned} \eta (X,h):=\sum \limits _{j\in X:\ I_h\in \Gamma (j)} size_j \end{aligned}$$

the number of processors that can be utilized by jobs from X in interval \(I_h\). Then the value \(\varphi (X)\) is explicitly given as

$$\begin{aligned} \varphi (X)=\sum _{h=1}^{\gamma } \min \{m,\eta (X,h)\}\cdot \Delta _h \end{aligned}$$

and specifies the total duration of all time intervals available for processing the jobs of set X.
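Since \(\varphi \) is given explicitly, it can be evaluated directly. A minimal Python sketch under an illustrative data layout (interval lengths \(\Delta _h\) and the sets \(\Gamma (j)\) stored as plain dicts):

```python
# Minimal sketch: evaluate the rank function phi(X) from its explicit formula.

def phi(X, sizes, gamma, deltas, m):
    """X: iterable of job ids; sizes[j] = size_j; gamma[j] = set of intervals
    I_h available to job j; deltas[h] = Delta_h; m = number of processors."""
    total = 0.0
    for h, delta_h in deltas.items():
        eta = sum(sizes[j] for j in X if h in gamma[j])  # eta(X, h)
        total += min(m, eta) * delta_h                   # min{m, eta(X,h)} * Delta_h
    return total

# Example: two jobs of size 2 on m = 3 processors and two intervals.
sizes = {1: 2, 2: 2}
gamma = {1: {1, 2}, 2: {2}}
deltas = {1: 1.0, 2: 0.5}
print(phi({1, 2}, sizes, gamma, deltas, m=3))  # min(3,2)*1.0 + min(3,4)*0.5 = 3.5
```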

Therefore, in terms of submodular optimization, the considered problem can be reformulated as

$$\begin{aligned}&\sum _{j=1}^n \left( \frac{q_j}{size_j} \right) ^{1-\alpha } \left( \sum _{l=1}^{size_j} W_{jl}^{\alpha }\right) \rightarrow \min ,\nonumber \\&\quad q\in B(\varphi ). \end{aligned}$$
(14)

To solve the problem (14), we use the decomposition algorithm from Shioura et al. (2017). The scheme is presented in Algorithm 2. Note that the subproblems to be solved in Steps 5 and 6 have a common structure. Hence, the original problem is solved recursively. The number of subproblems generated by the decomposition algorithm is O(n), and therefore Steps 1, 2 and 3 are performed O(n) times. Hence, the overall running time of the decomposition algorithm is \(O(nT_{123}(n))\), where \(T_{123}(n)\) is the time complexity of Steps 1, 2 and 3.

Step 1 can be implemented in O(n) time, using the necessary condition

$$\begin{aligned} \frac{\partial \left( \sum \limits _{i=1}^n \left( \frac{q_i}{size_i} \right) ^{1-\alpha } \left( \sum \limits _{l=1}^{size_i} W_{il}^{\alpha }\right) \right) }{\partial q_j}=\lambda ,\ j\in \mathcal {J}. \end{aligned}$$
(15)
Algorithm 2: The decomposition scheme

Condition (15) implies that \(q_j\) is proportional to \(\left( size_j^{\alpha -1}\sum \nolimits _{l=1}^{size_j}W_{jl}^{\alpha }\right) ^{1/\alpha }\), and (16) is the base polyhedron equality \(q(\mathcal {J})=\varphi (\mathcal {J})\). Combining (15) with (16), we obtain

$$\begin{aligned} q_j= \frac{\varphi (\mathcal {J})\left( size_j^{\alpha -1} \sum \limits _{l=1}^{size_j}(W_{jl})^{\alpha }\right) ^{1/\alpha } }{\sum \limits _{i\in \mathcal {J}} \left( size_i^{\alpha -1} \sum \limits _{l=1}^{size_i}(W_{il})^{\alpha }\right) ^{1/\alpha } },\ j\in \mathcal {J}. \end{aligned}$$
(17)
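The closed form (17) is equally direct to compute. A minimal Python sketch, with \(\varphi (\mathcal {J})\) assumed precomputed (e.g., by the `phi` sketch above); names are illustrative:

```python
# Minimal sketch of Step 1 via the closed form (17).

def step1_flows(jobs, phi_J, alpha):
    """jobs: list of (size_j, [W_j1, ...]) pairs; phi_J = phi(J); alpha > 1."""
    weights = [
        (size ** (alpha - 1) * sum(w ** alpha for w in works)) ** (1.0 / alpha)
        for size, works in jobs
    ]
    denom = sum(weights)
    return [phi_J * wj / denom for wj in weights]  # q_j by (17); sums to phi_J

q = step1_flows([(2, [3.0, 1.0]), (1, [4.0])], phi_J=3.5, alpha=3.0)
```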

The problem of Step 2 is reduced to the max-flow problem in the network \(G_b\) obtained from network \(G=(V,A)\) by replacing the capacities \(\mu (s,j)=+\infty \) with b(j) for all \(j\in \mathcal {J}\). For a max-flow \(x^*\) in \(G_b\), an optimal solution to the problem of Step 2 is given by \(c_j=x^*(s,j),\ j\in \mathcal {J}\). At Step 3, a set \(Y^*\) is constructed from a minimum s-t cut \((S^*,T^*)\) in \(G_b\) as \(Y^*=S^*\), since \(c(S^*)=\varphi (S^*)\) and \(c_j=b_j\) for \(j\in T^*\). The results for Steps 2 and 3 immediately follow from Lemma 3 in Shioura et al. (2017). A max-flow in \(G_b\) can be found in \(O(n^3)\) time by the algorithm of Karzanov (1974), and a minimum s-t cut in \(G_b\) is computed in \(O(n^2)\) operations. Therefore, the time complexity of the decomposition algorithm is \(O(n^4)\).
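Steps 2 and 3 thus amount to one max-flow/min-cut computation on \(G_b\). The following hedged Python sketch uses networkx; the arc capacities (\(size_j\Delta _h\) on job-interval arcs and \(m\Delta _h\) on interval-sink arcs) are inferred from the definition of \(\varphi \) rather than copied verbatim from Sect. 2.1, so details may differ.

```python
# Hedged sketch of Steps 2-3 on the network G_b; capacities inferred
# from the definition of phi, data layout illustrative.
import networkx as nx

def steps_2_and_3(b, sizes, gamma, deltas, m):
    """b[j]: upper bound from Step 1; sizes[j] = size_j; gamma[j]: intervals
    available to job j; deltas[h] = Delta_h; m: number of processors.
    Returns (c, y_star): truncated flow values and the tight job set Y*."""
    G = nx.DiGraph()
    for j, bj in b.items():
        G.add_edge('s', ('job', j), capacity=bj)          # mu(s, j) = b(j)
        for h in gamma[j]:
            G.add_edge(('job', j), ('int', h),
                       capacity=sizes[j] * deltas[h])     # job -> interval
    for h, dh in deltas.items():
        G.add_edge(('int', h), 't', capacity=m * dh)      # interval -> sink
    _, flow = nx.maximum_flow(G, 's', 't')
    c = {j: flow['s'][('job', j)] for j in b}             # c_j = x*(s, j)
    _, (S, _) = nx.minimum_cut(G, 's', 't')
    y_star = {j for j in b if ('job', j) in S}            # job nodes on the source side
    return c, y_star
```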

Common release date and/or deadline

Let us assume that all jobs have a common release date r and/or a common deadline d. We present strongly polynomial-time algorithms achieving constant-factor approximation guarantees for non-migratory cases. Our algorithms consist of two stages. At the first stage, we obtain a lower bound on the minimum energy consumption and calculate intermediate execution times of jobs. Then, at the second stage, we determine the final speeds of jobs and schedule them.

1.1 Common release date and deadline

Now we consider the non-preemptive case of the problem where all jobs arrive at time \(r=0\) and have a common deadline d.

1.1.1 The first stage

A lower bound on the objective function can be found in \(O(n^4+n\max \nolimits _{j\in \mathcal {J}}size_j)\) time using the method presented in Sect. 2.1. Here, we propose a more efficient approach for the instances under consideration.

We construct the following convex problem with \(e_j\) being the temporary duration of job \(j\in \mathcal {J}\):

$$\begin{aligned}&\sum _{j=1}^n e_j \sum _{l=1}^{size_j} \left( \frac{W_{jl}}{e_j}\right) ^{\alpha }\rightarrow \min , \end{aligned}$$
(18)
$$\begin{aligned}&\sum _{j=1}^n size_j e_j=md. \end{aligned}$$
(19)

Constraint (19) bounds the total processor usage in the interval [0, d). The energy consumption is expressed by the objective (18). This problem is solved using the Lagrangian method. Define the Lagrangian function \(L(e_j,\lambda )\) as

$$\begin{aligned} \sum _{j=1}^n e_j \sum _{l=1}^{size_j} \left( \frac{W_{jl}}{e_j}\right) ^{\alpha } + \lambda \left( \sum _{j=1}^n size_j e_j-md \right) . \end{aligned}$$
(20)

The necessary and sufficient conditions for an optimal solution allow us to find the temporary durations

$$\begin{aligned} e_j=\frac{mdB_j}{\sum _{j'=1}^n size_{j'}B_{j'}}, j\in \mathcal {J}, \end{aligned}$$
(21)

where \(B_j=\left( \sum \limits _{l=1}^{size_j} \frac{ W_{jl}^{\alpha }(\alpha -1) }{ size_j }\right) ^{\frac{1}{\alpha }}\).
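For completeness, (21) follows from the stationarity of (20) by routine calculus:

$$\begin{aligned} \frac{\partial L}{\partial e_j}=(1-\alpha )\,e_j^{-\alpha }\sum _{l=1}^{size_j} W_{jl}^{\alpha }+\lambda \, size_j=0 \quad \Longrightarrow \quad e_j=\left( \frac{(\alpha -1)\sum \nolimits _{l=1}^{size_j} W_{jl}^{\alpha }}{\lambda \, size_j}\right) ^{1/\alpha }=\frac{B_j}{\lambda ^{1/\alpha }}, \end{aligned}$$

and substituting this expression into (19) yields \(\lambda ^{1/\alpha }=\sum _{j'=1}^n size_{j'}B_{j'}/(md)\), which gives exactly (21).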

Note that \(e_j\) may be greater than d for some jobs. In order to avoid such situations, we propose the following procedure.

Let \(m'\) denote the current number of unoccupied processors and \(\mathcal {J}'\) be the set of currently considered jobs. Initially, \(\mathcal {J}':=\mathcal {J}\) and \(m':=m\).

We enumerate the jobs one by one in order of non-increasing values \(B_j\). If the current job i has value \(B_i\ge \frac{\sum _{j\in \mathcal {J}'} B_j size_j}{m'}\), then we assign duration \(p_i:=d\) for this job, and set \(\mathcal {J}':=\mathcal {J}'\setminus \{i\}\) and \(m':=m'-size_i\). We then go to the next job. Otherwise, all jobs \({l\in \mathcal {J}'}\) satisfy the inequality \({B_l<\frac{\sum _{j\in \mathcal {J}'} B_j size_j}{m'}}\), and we assign durations \(p_l:=\frac{B_l m' d}{\sum _{j\in \mathcal {J}'} B_j size_j}\) for them.
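A minimal Python sketch of this procedure (names are illustrative; \(B_j\) is the coefficient from (21)):

```python
# Minimal sketch of the capping procedure: scan jobs in non-increasing
# order of B_j; a job whose B_j reaches the average load per remaining
# processor gets the full duration d.

def cap_durations(B, sizes, m, d):
    """Returns durations p with p[j] <= d and sum_j p[j]*sizes[j] = m*d."""
    order = sorted(range(len(B)), key=lambda j: -B[j])  # non-increasing B_j
    total = sum(B[j] * sizes[j] for j in order)         # sum over current J'
    m_prime = m
    p = [0.0] * len(B)
    for pos, i in enumerate(order):
        if B[i] * m_prime >= total:      # B_i >= (sum_{J'} B_j size_j) / m'
            p[i] = d                     # job i occupies the whole horizon
            total -= B[i] * sizes[i]
            m_prime -= sizes[i]
        else:                            # all remaining jobs are below the bar
            for j in order[pos:]:
                p[j] = B[j] * m_prime * d / total
            break
    return p
```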

The time complexity of our procedure is \(O(n\log n+n\max \nolimits _{j\in \mathcal {J}}size_j)\). It guarantees that \(\sum _{j\in \mathcal {J}}p_j size_j= md\), \(p_j \le d\) and gives the lower bound on the objective function equal to \(\sum _{j\in \mathcal {J}} p_j \sum _{l=1}^{size_j} \left( \frac{W_{jl}}{p_j}\right) ^{\alpha }\). At the second stage, we use the “non-preemptive list scheduling” algorithm (Naroska and Schwiegelshohn 2002) to construct a feasible schedule.

1.1.2 The second stage

Whenever a subset of processors falls idle, the non-preemptive list scheduling algorithm schedules a job that does not require more processors than are available, until all jobs in \(\mathcal {J}\) are assigned. The time complexity of the algorithm is \(O(n^2)\).
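A hedged, event-driven Python sketch of this algorithm (illustrative names; it assumes \(size_j\le m\) for all jobs and greedily starts any pending job that fits):

```python
# Hedged sketch of non-preemptive list scheduling (Naroska and
# Schwiegelshohn 2002): whenever processors fall idle, start any
# pending job that still fits.
import heapq

def list_schedule(p, sizes, m):
    """p[j]: duration of job j; sizes[j] <= m: processors required.
    Returns start times of a non-preemptive greedy schedule."""
    pending = list(range(len(p)))        # list order; any priority rule works
    starts = [0.0] * len(p)
    free, t = m, 0.0
    finishing = []                       # min-heap of (finish_time, size)
    while pending:
        for j in list(pending):          # start every pending job that fits at t
            if sizes[j] <= free:
                starts[j] = t
                heapq.heappush(finishing, (t + p[j], sizes[j]))
                free -= sizes[j]
                pending.remove(j)
        if pending:                      # advance to the next job completion
            t, s = heapq.heappop(finishing)
            free += s
            while finishing and finishing[0][0] == t:
                free += heapq.heappop(finishing)[1]
    return starts
```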

We claim that the length of the constructed schedule is at most \({\left( 2-\frac{1}{m}\right) d}\). (The proof is similar to the proof of Lemma 2 in Sect. 2.4.) By increasing the speed of each job operation by a factor of \(\left( 2-\frac{1}{m}\right) \), we obtain a schedule of length at most d. The total energy consumption increases by a factor of \(\left( 2-\frac{1}{m}\right) ^{\alpha -1}\) in comparison with the lower bound. As a result, we have

Theorem 6

A \(\left( 2-\frac{1}{m}\right) ^{\alpha -1}\)-approximate schedule can be found in \(O(n^2+n\max \nolimits _{j\in \mathcal {J}}size_j)\) time for speed-scaling scheduling problems \({P|size_j,r_j=r,d_j=d|E}\) and \({P|size_j,pmtn*,r_j=r,d_j=d|E}\).

1.2 Common release date or deadline

Here, we study the preemptive problem without migration where all jobs are released at time \(r=0\), but have individual deadlines. Auxiliary durations \(e_j\) of jobs and a lower bound on the objective function are computed in \(O(n^4+n\max \nolimits _{j\in \mathcal {J}}size_j)\) time using the min-cost max-flow model presented in Sect. 2.1.

Then we use the preemptive earliest deadline list scheduling algorithm to construct an approximate solution. Jobs are scheduled in order of non-decreasing deadlines as follows. If \(size_i> \left\lceil \frac{m}{2} \right\rceil \), then job i is assigned at the end of the current schedule. Otherwise, we start job i at the earliest time instant when \(size_i\) processors are idle and process it for \(e_i\) time units, ignoring the intervals of jobs with \(size_j> \left\lceil \frac{m}{2} \right\rceil \). The time complexity of the algorithm is \(O(n^2)\).
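One plausible reading of this algorithm, sketched in Python under illustrative assumptions: “big” jobs (\(size_j> \left\lceil \frac{m}{2} \right\rceil \)) are appended at the end one after another, while “small” jobs are placed on a virtual timeline from which big-job intervals are excised (so small jobs are preempted whenever a big job runs). The feasibility test below is deliberately conservative.

```python
# Hedged sketch of preemptive earliest-deadline list scheduling;
# small jobs live on a virtual time axis that skips big-job intervals.
import math

def edf_list_schedule(jobs, m):
    """jobs: list of (e_j, size_j, d_j); returns (small_starts, big_order)."""
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][2])  # EDF order
    threshold = math.ceil(m / 2)
    big_order, small_starts = [], {}
    placed = []                                # (start, end, size) on virtual axis
    for j in order:
        e, size, _ = jobs[j]
        if size > threshold:
            big_order.append(j)                # runs after the current schedule
            continue
        t = 0.0
        while True:
            overlap = [iv for iv in placed if iv[0] < t + e and iv[1] > t]
            # Conservative test: sum the sizes of *all* jobs overlapping
            # [t, t+e), which may start job j later than strictly necessary.
            if sum(iv[2] for iv in overlap) + size <= m:
                break
            t = min(iv[1] for iv in overlap)   # jump to the next interval end
        small_starts[j] = t
        placed.append((t, t + e, size))
    return small_starts, big_order
```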

We claim that the completion time \(C_j\) of each job j in the constructed schedule is at most \({\left( 3-\varphi _m\right) d_j}\) (see Lemma 4). Hence, increasing the speeds by a factor of \({\left( 3-\varphi _m\right) }\) yields a feasible schedule. The total energy consumption increases by a factor of \(\left( 3-\varphi _m\right) ^{\alpha -1}\).
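For concreteness, this factor is trivial to evaluate; a small illustrative helper, with \(\varphi _m\) as recalled before Lemma 4 below:

```python
# Illustrative helper: phi_m and the factor (3 - phi_m)^(alpha - 1).
def phi_m(m):
    return 4.0 / (m + 1) if m % 2 == 1 else 6.0 / (m + 2)

def approximation_factor(m, alpha):
    return (3.0 - phi_m(m)) ** (alpha - 1)

print(approximation_factor(4, 3))  # phi_4 = 1, factor = (3 - 1)**2 = 4.0
```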

Obviously, by interchanging release dates and deadlines, the presented algorithm can also handle the case of jobs with individual release dates but a common deadline. As a result, we have

Theorem 7

A \(\left( 3-\varphi _m\right) ^{\alpha -1}\)-approximate schedule can be found in \(O(n^4 +n\max \limits _{j\in \mathcal {J}}size_j)\) time for speed-scaling scheduling problems \(P|size_j,pmtn*,r_j=r,d_j|E\) and \(P|size_j,pmtn*,r_j,d_j=d|E\).

Recall that

$$\begin{aligned} \varphi _m = {\left\{ \begin{array}{ll} \frac{4}{m+1} &{} \text {if { m} is odd,} \\ \frac{6}{m+2} &{} \text {if { m} is even,} \end{array}\right. } \end{aligned}$$

We now prove the following lemma.

Lemma 4

Given m processors and a set of jobs \(\mathcal {J}\) with deadlines \(d_i,\) processing times \(e_i\le d_i\) and sizes \(size_i\), where \(\sum _{j\in \mathcal {J}_i}e_j size_j\le md_i\) for each \(i\in \mathcal {J}\) with \(\mathcal {J}_i=\{j\in \mathcal {J}:\ d_j\le d_i\}\), the completion time \(C_i\) is at most \({\left( 3-\varphi _m\right) d_i}\) for each job \(i\in \mathcal {J}\) in the schedule S constructed by the preemptive earliest deadline list scheduling algorithm.

Proof

Consider an arbitrary deadline \(d_i\), and let job i have the maximum completion time \(C_i\) in schedule S among all jobs with deadline equal to \(d_i\).

Note that \(C_j\le C_i\) for all jobs \(j\in \mathcal {J}_i\). Let \(S_i\) denote the part of schedule S which contains only jobs from subset \(\mathcal {J}_i\) and occupies interval \([0,C_i)\). We will show that \(C_i\le {\left( 3-\varphi _m\right) d_i}\).

If at least \(\left\lceil \frac{m+1}{2} \right\rceil \) processors are used at every time instant in subschedule \(S_i\), we have

$$\begin{aligned} d_i\ge \frac{1}{m}\sum _{j\in \mathcal {J}_i}e_j size_j \ge \left\lceil \frac{m+1}{2} \right\rceil \frac{C_i}{m} \ge \frac{C_i}{2-\frac{1}{m}}. \end{aligned}$$

Otherwise, assume that l is the last job in subschedule \(S_i\) that requires \(size_l \le \left\lceil \frac{m}{2} \right\rceil \) processors. It is easy to see that all time slots in the intervals \([0,C_l-e_l)\) and \([C_l,C_i)\) use at least \(\left\lceil \frac{m+1}{2} \right\rceil \) processors, and at least \(size_l\) processors are utilized in the interval \([C_l-e_l,C_l)\). Therefore, the total load of all processors in subschedule \(S_i\) satisfies

$$\begin{aligned} \left\lceil \frac{m+1}{2} \right\rceil \left( C_i-e_l\right) +size_le_l\le \sum _{j\in \mathcal {J}_i}e_jsize_j\le d_im. \end{aligned}$$

If \(e_l\ge \frac{C_i}{3-\varphi _m}\), then \(C_i \le \left( 3-\varphi _m\right) d_l \le \left( 3-\varphi _m\right) d_i\).

Otherwise, for \(size_l\ge 1\) we have

$$\begin{aligned} md_i&\ge \left\lceil \frac{m+1}{2} \right\rceil C_i - e_l\left( \left\lceil \frac{m+1}{2} \right\rceil -size_l\right) \\&\ge \left\lceil \frac{m+1}{2} \right\rceil C_i - \left\lceil \frac{m-1}{2} \right\rceil e_l\\&\ge C_i \left( \left\lceil \frac{m+1}{2} \right\rceil - \left\lceil \frac{m-1}{2} \right\rceil \frac{1}{3-\varphi _m} \right) =\frac{mC_i}{3-\varphi _m}. \end{aligned}$$

\(\square \)


Cite this article

Kononov, A., Kovalenko, Y. Approximation algorithms for energy-efficient scheduling of parallel jobs. J Sched 23, 693–709 (2020). https://doi.org/10.1007/s10951-020-00653-8
