Models and algorithms for energy-efficient scheduling with immediate start of jobs

We study a scheduling model with speed scaling for machines and the immediate start requirement for jobs. Speed scaling improves the system performance, but incurs the energy cost. The immediate start condition implies that each job should be started exactly at its release time. Such a condition is typical for modern Cloud computing systems with abundant resources. We consider two cost functions, one that represents the quality of service and the other that corresponds to the cost of running. We demonstrate that the basic scheduling model to minimize the aggregated cost function with $n$ jobs is solvable in $O(n\log n)$ time in the single-machine case and in $O(n^{2}m)$ time in the case of $m$ parallel machines. We also address additional features, e.g., the cost of job rejection or the cost of initiating a machine. In the case of a single machine, we present algorithms for minimizing one of the cost functions subject to an upper bound on the value of the other, as well as for finding a Pareto-optimal solution.


Introduction
In this paper, we study scheduling models that address two important aspects of modern computing systems: machine speed scaling for time and energy optimization and the requirement to start jobs immediately at the time they are submitted to the system. The first aspect, speed scaling, has been the subject of intensive research since the 1990s, see Yao et al. (1995), and has become particularly important recently, with the increased attention to energy-saving demands; see the surveys by Albers (2009, 2010a), Jing et al. (2013) and Gerards et al. (2016). It reflects the ability of modern computing systems to change their clock speeds through the technique known as Dynamic Voltage and Frequency Scaling (DVFS). The higher the speed, the better the performance from the users' perspective, but the energy usage and other computation costs increase. The goal is to select the right speed value from the full spectrum of speeds to achieve a desired trade-off between performance and energy. DVFS techniques have been successfully applied in Cloud data centers to reduce the energy usage; see, e.g., Von Laszewski et al. (2009), Wu et al. (2014), Do Lago et al. (2011).
The second aspect, the immediate start condition, is motivated by the advancements of modern Cloud computing systems, and it is widely accepted by practitioners. This feature is not typical for the traditional scheduling research dealing with scenarios arising from manufacturing. In such systems, jobs compete for limited resources. They often have to wait until resources become available, and job starting times can be delayed if the system is busy.
In modern computing systems (Clouds and data centers), processing units are no longer scarce resources but, quite the opposite, abundant ones; see, e.g., Kushida et al. (2015). Clouds give the illusion of infinite computing resources available on demand. Cloud providers agree with customers on a service-level agreement (SLA) and sell computing services to customers as utilities. Special mechanisms allow Cloud providers to ensure that the actual demand for resources is met at practically any point in time (Armbrust et al. 2009, 2010; Jennings and Stadler 2015). In the modern competitive market, Cloud providers achieve high availability of resources, promise their customers instant access to resources and allow customers to monitor how that promise is kept (Aceto et al. 2013). These features are unprecedented in the history of IT and have now become a standard.
The infrastructure for Cloud computing systems is provided by data centers. Data centers execute a large number of computing processes (which we call jobs from now on). In order to guarantee on-demand access, the execution of a job needs to be started immediately upon its submission to the system. Customers who have to wait for their jobs to start become dissatisfied with the service and are likely to change the provider next time (Armbrust et al. 2010). It is therefore in the interest of providers to start job execution as soon as jobs are submitted to the system. This phenomenon is our motivation for what we call the immediate start condition.
The optimization criteria are typically of two types: those related to the system performance and the quality-of-service (QoS) provision, and those related to the operational cost of the processing system. The criteria of the first type may represent the mean flow times of jobs, total (or mean) tardiness or a more general function $F$ defined as the sum of penalty functions of job completion times. The second objective $G$ is the sum of operational costs for using individual resources, each of which depends on the time the resource is used. It can be linear, to model the monetary resource usage cost, or convex, to model the energy consumption cost.
Observing the immediate start condition is one of the key priorities for resource providers, and it is usually included in the QoS protocols. The case study presented by Garg et al. (2013) characterizes a possible waste of execution time due to the resource unavailability. This waste is estimated to be as low as 0.5% for customers of Amazon EC2 and 0.1% for Windows Azure customers. The Rackspace web hosting company guarantees 100% network availability and 99.95-99.99% platform availability; see Rackspace (2015); in reality Rackspace often achieves 100% availability of its resources (Garg et al. 2013). Nowadays, special software is being developed in order to strengthen service-level agreements (SLAs) for customers by fixing the maximum response time, which can be as small as a few seconds (Iqbal et al. 2009). Due to strong competition in the area, customers choose providers who are prepared to demonstrate that handling the submitted jobs is their top priority.
The immediate start requirement is not a seemingly strong assumption, but a fact of today's life. It is widely accepted in distributed computing, but generally overlooked by the scheduling community, where the traditional perception of limited resources and acceptable delayed starting times remains.
In this paper, we initiate the study of the immediate start off-line scheduling models, assuming that accurate job characteristics can be available in advance through historical analysis, prediction techniques or a combination of both; see, e.g., Moreno et al. (2014) for off-line scenarios in Cloud computing. To satisfy the immediate start condition, we recommend a policy of changing the processing speeds, so that a certain measure of the schedule quality and the cost of speeds (normally understood as energy) are both taken into consideration. The owners of submitted tasks and the providers of processing facilities both want an early completion of tasks, as recorded in the SLAs, and this in practice leads to a no-preemption requirement; see Tian and Zhao (2015). We understand that the models that we address in this paper are rather ideal and simple, but we see our work as a necessary step that should be made before more advanced and practically relevant models are investigated.
In the remainder of this section, we provide a formal definition of the model under study and discuss the relevant literature.

Definitions and notation
Formally, in the models under consideration, we are given a set of jobs $N = \{1, 2, \ldots, n\}$. A job $j \in N$ can be understood as a computational task characterized by its volume or work $\gamma_j$, measured in millions of instructions to be performed on a computing device. Each job $j \in N$ is associated with a release date $r_j$ before which it is not available. For completeness, assume that $r_{n+1} = +\infty$.
It is also possible that job $j$ is given a due date $d_j$, before which it is desired to complete that job, and/or a deadline $\bar{d}_j$. The due dates $d_j$ are seen as "soft", i.e., it is possible to violate them, and usually a certain penalty is associated with such a violation. On the other hand, the deadlines $\bar{d}_j$ are "hard", i.e., in any feasible schedule job $j$ must be completed by time $\bar{d}_j$. If for a job $j \in N$ no deadline is given, we assume that $\bar{d}_j = +\infty$. Additionally, job $j$ can be given weight $w_j$, which indicates its relative importance.
Each job is processed without preemption either on a single machine or on one of $m$ parallel machines. For processing job $j \in N$, the corresponding machine has to be assigned speed $s_j$, measured in millions of instructions performed per time unit, so that the actual processing time of job $j$ is defined by
$$p_j = \frac{\gamma_j}{s_j}. \tag{1}$$
The main feature of the models that we study in this paper is the requirement of the immediate start or dispatch of each job, i.e., job $j \in N$ must start its processing immediately upon its arrival at time $r_j$. Without loss of generality, we assume that all release dates are distinct and the jobs are numbered in increasing order of their release dates, i.e.,
$$r_1 < r_2 < \cdots < r_n. \tag{2}$$
For a fixed schedule, let $C_j$ denote the completion time of job $j \in N$. Provided that job $j \in N$ is processed at speed $s_j$, we have that
$$C_j = r_j + p_j,$$
where the actual processing time $p_j$ is defined by (1).
We associate each job $j \in N$ with two types of penalties: (i) a traditional scheduling cost function $f_j(C_j)$, where $f_j$ is a non-decreasing function, so that $f_j(C_j)$ represents the penalty for completing job $j$ at time $C_j$; the total cost is then
$$F = \sum_{j=1}^{n} f_j(C_j);$$
(ii) the speed cost function $g_j(s_j)$, which is often interpreted as the energy that is required for running job $j$ for one time unit at speed $s_j$; the operational cost is then
$$G = \sum_{j=1}^{n} p_j\, g_j(s_j) = \sum_{j=1}^{n} \frac{\gamma_j}{s_j}\, g_j(s_j).$$
Notice that our model is formulated for a homogeneous distributed system: all physical machines have the same speed and energy characteristics, and the cost functions $g_j$ are independent of machines.
It is widely accepted in the literature on power-aware scheduling that energy is proportional to the cube of speed; see, e.g., Brooks et al. (2000) and Pruhs et al. (2008). This is why in most illustrative examples presented in the remainder of this paper we assume that
$$g_j(s_j) = \beta_j s_j^3, \tag{3}$$
$$G = \sum_{j=1}^{n} \beta_j \gamma_j s_j^2. \tag{4}$$
Depending on the type of the objective function, in this paper we address several different models with immediate start:
- Problem +: it is required to find a feasible schedule that minimizes the aggregated cost $F + G$;
- Problem 1: it is required to find a feasible schedule that minimizes one of the cost functions subject to an upper bound on the other function, e.g., to minimize total energy $G$ subject to an upper bound on the value of $F$;
- Problem 2: it is required to find feasible schedules that simultaneously minimize two cost components, e.g., to find the Pareto-optimal solutions for the problem of minimizing total cost $F$ and total energy $G$.
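The definitions above can be illustrated with a short sketch. It assumes the weighted completion time penalty $f_j(C_j) = w_j C_j$ and the cubic speed cost $g_j(s_j) = \beta_j s_j^3$; the names `Job`, `evaluate` and `feasible_on_one_machine` are hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    release: float       # r_j
    work: float          # gamma_j (millions of instructions)
    weight: float = 1.0  # w_j
    beta: float = 1.0    # beta_j, coefficient of the cubic speed cost


def evaluate(jobs: List[Job], speeds: List[float]) -> Tuple[float, float]:
    """Return (F, G) for jobs started at their release dates.

    Assumes f_j(C_j) = w_j * C_j and g_j(s) = beta_j * s**3.
    """
    F = 0.0
    G = 0.0
    for job, s in zip(jobs, speeds):
        p = job.work / s                 # actual processing time, p_j = gamma_j / s_j
        completion = job.release + p     # C_j = r_j + p_j
        F += job.weight * completion
        G += p * job.beta * s ** 3       # energy per time unit times duration
    return F, G


def feasible_on_one_machine(jobs: List[Job], speeds: List[float]) -> bool:
    """Check that each job finishes before the next release (jobs sorted by r_j)."""
    completions = [job.release + job.work / s for job, s in zip(jobs, speeds)]
    return all(c <= nxt.release for c, nxt in zip(completions, jobs[1:]))
```

For instance, two unit-weight jobs with $(r_j, \gamma_j) = (0, 4)$ and $(3, 2)$ run at speeds 2 and 1 complete at times 2 and 5, so the first job releases the machine before the second one arrives.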

Related work
Both features, machine speed scaling and the immediate start condition, have a long history of study. However, so far they have been considered separately and in different contexts.
One point of difference is related to preemption, the ability to interrupt and resume job processing at any time. This feature is typically accepted in speed scaling research in order to avoid intractable cases, while it is forbidden in the immediate start model on a single machine and on parallel identical machines. Notice that a preemptive version with immediate start would require an additional condition of immediate migration and restart, which makes preemption redundant. In what follows, we provide further details about the two streams of research.
The speed scaling research stems from the seminal paper by Yao et al. (1995), who developed an $O(n^3)$-time algorithm for preemptive scheduling of $n$ jobs on a single machine within the time windows $[r_j, \bar{d}_j]$ given for each job $j \in N$. Note that in that paper time windows are treated in the traditional sense, without the immediate start requirement. Subsequent papers by Li et al. (2006, 2014), Albers et al. (2011, 2014) and Angel et al. (2012) proposed improved algorithms for the single-machine problem and extended this line of research to the multi-machine model. The running times of the current fastest algorithms are $O(n^2)$ and $O(n^4)$ for the single-machine and parallel-machine cases, respectively; see Shioura et al. (2015).
Speed scaling problems which involve not only the speed cost function $G$, but also a scheduling cost function $F = \sum_{j=1}^{n} f_j(C_j)$ have been under study since the paper by Pruhs et al. (2008). The two most popular functions are the total completion time $F_1 = \sum_{j=1}^{n} C_j$ and the total rejection cost $F_2 = \sum_{j=1}^{n} w_j\, \mathrm{sgn}(\max\{C_j - \bar{d}_j, 0\})$, where $w_j$ is the cost incurred if job $j$ cannot be processed before its deadline and therefore is rejected. Without the immediate start condition, the tractable cases of problems + and 1 with objectives $F_1$ and $F_2$ are very limited.
In the case of function $F_1$, the version of problem + with equal release dates is solvable in $O(n^2 m^2 (n + \log m))$ time, where $m$ is the number of machines; see Bampis et al. (2015). Notice that preemptions are redundant in that model. If jobs are available at arbitrary release times $r_j$, then problem 1 is NP-hard even if there is only one machine and preemption is allowed; see Barcelo (2015). For problems with arbitrary release dates and equal-work jobs, allowing preemption makes no difference to an optimal solution, and due to the nonlinear nature of the problem an optimal value of the objective can be found within a chosen accuracy $\varepsilon$. For example, for problem 1 on a single machine an algorithm by Pruhs et al. (2008) takes $O(n^2 \log \frac{G}{\varepsilon})$ time, where $G$ is the upper bound on the speed cost function (energy), while for problem + on parallel machines an algorithm by Albers and Fujiwara (2007) requires $O(n^3 \log \frac{1}{\varepsilon})$ time. The difficulties associated with arbitrary-length jobs are discussed by Pruhs et al. (2008), Bunde (2009), Barcelo et al. (2013). For the problem of preemptive scheduling on a single discretely controllable machine, Antoniadis et al. (2014) provide an algorithm with time complexity $O(n^4 k)$, where $k$ is the number of possible speed values of the processor.
In the speed scaling research, the problems of minimizing the total rejection cost $F_2$ are typically studied as those of maximizing the throughput, defined as the number of jobs that can be processed by their deadlines. Polynomial-time algorithms are known only for special cases, where various conditions are imposed, in addition to the assumption that all jobs have equal weights $w_j$. Notice that the strict assumptions of those models make preemption redundant. The single-machine problem 1 with $w_j = 1$ for all $j \in N$ is solvable in $O(n^4 \log n \log \sum \gamma_j)$ time and in $O(n^6 \log n \log \sum \gamma_j)$ time, depending on whether the jobs are available simultaneously ($r_j = 0$ for all $j \in N$) or not; in the latter case, it is further required that release dates and deadlines are agreeable, see Angel et al. (2013). The parallel-machine problem 1 with the jobs of equal size and equal weight ( additionally release dates and deadlines are agreeable, see Angel et al. (2016).
Research on speed scaling problems extends to the design of approximation algorithms and the study of their online versions. Without providing a comprehensive list of results of this type, we refer an interested reader to the survey papers by Albers (2009, 2010a, b) and Bampis (2016).
As far as the immediate start condition is concerned, the most relevant problems studied in the literature fall into the category of interval scheduling. In such models, each job is characterized by time intervals where it can be processed (Kovalyov et al. 2007). One of the most well-studied versions of interval scheduling assumes that there is only one interval $[r_j, \bar{d}_j]$ per job. In interval scheduling, there is no freedom in selecting job starting times or in using preemption: every job $j \in N$ should start precisely at a given time $r_j$ and complete at a given deadline $\bar{d}_j$. There is also no control over machine speeds, which are fixed and cannot be changed. The decision making consists in (i) selecting a subset of jobs that can be processed within their time intervals and (ii) assigning them to the machines for processing without preemption. The two typical objectives are the job rejection cost, which is defined similarly to the function $F_2$, and the machine usage cost defined typically as the (weighted) number of machines which are selected to process the jobs. Note that unlike the operational cost function $G$ used in our model, the machine usage cost in interval scheduling does not take into account the actual time of using a machine.
Within the broad range of interval scheduling results (see the survey papers by Kolen et al. (2007) and Kovalyov et al. (2007)), those relevant to our study deal with identical parallel machines or uniform machines. In the case of identical parallel machines, the fastest algorithms for minimizing the job rejection cost have time complexity $O(n \log n)$ if all jobs have equal weights (Carlisle and Lloyd 1995) and $O(mn \log n)$ if job weights are allowed to be different (Bouzina and Emmons 1996); the fastest algorithm for minimizing the machine usage cost is of time complexity $O(n \log n)$ if machine weights are equal (Gupta et al. 1979).
The version of the problem with uniform machines is less studied. For uniform machines, both problems, with job rejection cost and machine usage cost, are strongly NP-hard; see Nakajima et al. (1982) and Bekki and Azizoglu (2008). Polynomial-time algorithms, all of time complexity $O(n \log n)$, are known for the problem of minimizing the machine usage cost, if there are only two types of machines, slow and fast (Nakajima et al. 1982), and for the problem of minimizing the job rejection cost, in one of the following two cases: if all jobs are available simultaneously and have equal weights, or if all jobs have equal volume and there are only two processing machines (Bekki and Azizoglu 2008).
One more problem related to our study is a relaxed version of interval scheduling, where the jobs are allowed to start at any time after their release dates $r_j$, but they are required to complete exactly at their deadlines $\bar{d}_j$. Such a problem can be considered as a counterpart of our problem, where the jobs are required to start at release dates $r_j$, but they are allowed to complete at any time before deadlines $\bar{d}_j$.
For the model with fixed job completion times, the scheduling cost function $F = \sum f_j(C_j)$ can be only of type $F_2$ representing the job rejection cost, since for any accepted job $j$, $C_j = \bar{d}_j$ and therefore there is no scope for optimizing a function $f_j(C_j)$. Prior study focuses on the model with identical parallel machines, where machine speeds are equal and cannot be changed. Algorithms of time complexities $O(n \log n)$ and $O(n^2 m)$ are proposed by Lann and Mosheiov (2003) and Hiraishi et al. (2002) for the case of equal-weight jobs and for the general case, respectively. The latter result is generalized further to the case of controllable processing times (Leyvand et al. 2010), where a job consuming $x_j$ amount of resources gets a compressed processing time given by an arbitrary decreasing function $p_j(x_j)$; the associated resource consumption cost is linear,
$$G^{\mathrm{contr}} = \sum_{j=1}^{n} \beta_j^{\mathrm{contr}} x_j.$$
The two typical examples of $p_j$ are $p_j(x_j) = \bar{p}_j - a_j x_j$ and $p_j(x_j) = (\theta_j / x_j)^k$, where $\bar{p}_j$ and $\theta_j$ are given job-related parameters, while $k$ is a positive constant; see Shabtay and Steiner (2007). The second model is linked to the power-aware model: if $k = 1/2$, $\theta_j = \gamma_j^3$ and $x_j = \gamma_j s_j^2$, then we get $G^{\mathrm{contr}} = G$ assuming a cubic power function (compare $p_j(x_j) = \gamma_j / s_j$ with (1) and $G = \sum_{j=1}^{n} \beta_j \gamma_j s_j^2$ with (4)). Notice that, as observed above, the research with fixed completion times is limited to only one type of scheduling objective: job rejection cost.
As demonstrated in Leyvand et al. (2010), the counterpart of problem + with fixed completion times can be solved in $O(mn^2)$ time, while the counterparts of problems 1 and 2 are NP-hard. For discrete versions of NP-hard problems, Leyvand et al. (2010) develop algorithms of time complexity $O(mn^{m+1} X_{\max})$, where $X_{\max}$ is the maximum resource usage cost, $X_{\max} = \sum_{j=1}^{n} \beta_j^{\mathrm{contr}} \max\{x_j\}$, assuming that resource amounts $x_j$ are allowed to take only discrete values from a given range.
We study the most general versions of problems +, 1 and 2 with arbitrary functions $f_j(C_j)$, reflecting diverse needs of customer-oriented quality-of-service provisioning in distributed systems. Problem + is solvable in $O(n)$ time on a single machine once the jobs are numbered in accordance with (2) (Sect. 2), and in $O(n^2 m)$ time on $m$ parallel machines (Sect. 3). Problem 1 of minimizing energy $G$ on a single machine subject to an upper bound on the total flow time is handled in Sect. 4; we formulate it as a nonlinear resource allocation problem with continuous variables and explain how it can be solved in $O(n \log n)$ time. In Sect. 5, we present a method, also of time complexity $O(n \log n)$, for finding Pareto-optimal solutions for problem 2, in which the functions $F$ and $G$ have to be simultaneously minimized on a single machine. Conclusions are presented in Sect. 6.

Problem + on a single machine
In this section, we consider the problem of minimizing the sum of the performance cost function $F$ and total energy $G$ on a single machine, provided that each job $j \in N$ starts immediately at time $r_j$.
It is clear that in the single-machine case, in order to guarantee the immediate start of job $j+1$, each job $j$, $1 \le j \le n-1$, must be completed no later than time $r_{j+1}$. Taking into account the deadlines $\bar{d}_j$, we conclude that in a feasible schedule each job $j \in N$ must be completed by its imposed deadline $D_j$ given by
$$D_j = \min\{\bar{d}_j, r_{j+1}\}, \quad j \in N.$$
Recall that $r_{n+1} = +\infty$.
Due to the immediate start condition, the actual processing time $p_j$ of job $j \in N$ should not exceed
$$u_j = D_j - r_j.$$
In order to minimize the sum of total cost $F$ and total energy $G$, we need to solve a problem, which in terms of the decision variables $p_j$, $j \in N$, can be formulated as
$$\text{Minimize } \sum_{j \in N} \left( f_j(r_j + p_j) + p_j\, g_j\!\left(\frac{\gamma_j}{p_j}\right) \right) \quad \text{subject to } 0 < p_j \le u_j, \ j \in N. \tag{5}$$
Due to the separable structure of the objective in (5), the optimal processing times can be found independently for each job $j \in N$ by solving the following $n$ problems with a single decision variable $p$:
$$\text{Minimize } Z_j = f_j(r_j + p) + p\, g_j\!\left(\frac{\gamma_j}{p}\right) \quad \text{subject to } 0 < p \le u_j. \tag{6}$$
For problem (6), let $Z_j^*$ denote the smallest value of the objective function $Z_j$, and $p_j^*$ be the value of $p$ that minimizes $Z_j$. In the schedule that minimizes $F + G$, the jobs are processed in the order of their numbering, and the actual processing time of job $j \in N$ is equal to $p_j^*$. For most practically relevant cases, we may assume that for each $j \in N$ problem (6) can be solved in constant time. Under this assumption, we obtain the following statement.
Theorem 1 The problem + of minimizing the sum of total cost $F$ and total energy $G$ on a single machine is solvable in $O(n)$ time, provided that the jobs are numbered in accordance with (2) and for each $j \in N$ problem (6) can be solved in constant time.
Below we present several illustrations, taking two popular scheduling performance measures and, as agreed in Sect. 1, a cubic speed cost function (3). Notice that for the latter function, $p\, g_j(\gamma_j / p) = \beta_j \gamma_j^3 / p^2$.
For job $j \in N$, suppose that $f_j(C_j) = w_j C_j$, i.e., $F$ represents the weighted sum of the completion times. Then, problem (6) can be written as
$$\text{Minimize } Z_j = w_j p + w_j r_j + \frac{\beta_j \gamma_j^3}{p^2} \quad \text{subject to } 0 < p \le u_j,$$
so that $p_j^* = \min\left\{\gamma_j \left(2\beta_j / w_j\right)^{1/3}, u_j\right\}$.
For another illustration, assume that job $j \in N$ is given a "soft" due date $d_j$, but no "hard" deadline $\bar{d}_j$, i.e., $D_j = r_{j+1}$, $1 \le j \le n-1$. Suppose that $f_j(C_j) = w_j \max\{C_j - d_j, 0\}$, i.e., $F$ represents total weighted tardiness.
If for a job $j \in N$ the inequality $r_{j+1} \le d_j$ holds, then job $j$ will be completed before its due date and problem (6) can be written as
$$\text{Minimize } Z_j = \frac{\beta_j \gamma_j^3}{p^2} \quad \text{subject to } 0 < p \le r_{j+1} - r_j,$$
so that $p_j^* = r_{j+1} - r_j$. Otherwise, i.e., if $r_{j+1} > d_j$, in order to solve problem (6), we need to solve two problems:
$$\text{Minimize } \frac{\beta_j \gamma_j^3}{p^2} \quad \text{subject to } 0 < p \le d_j - r_j,$$
which corresponds to an early completion of job $j$ so that no tardiness occurs, and
$$\text{Minimize } w_j (r_j + p - d_j) + \frac{\beta_j \gamma_j^3}{p^2} \quad \text{subject to } d_j - r_j \le p \le r_{j+1} - r_j,$$
where job $j$ completes after its due date. For job $j \in N$, the optimal actual processing time $p_j^*$ is equal to the value of $p$ that delivers the lowest value of the objective function in these two problems.
In the presented examples, which can be extended to most traditionally used objective functions, the actual processing time $p_j^*$ of each job is essentially written in closed form, which justifies our assumption that each problem (6) can be solved in constant time.
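A minimal sketch of the single-machine procedure for the two illustrations is given here, assuming the cubic speed cost (so the energy term is $\beta_j \gamma_j^3 / p^2$) and jobs already sorted by release date. The stationary point $\gamma_j (2\beta_j / w_j)^{1/3}$ comes from setting the derivative of $w_j p + \beta_j \gamma_j^3 / p^2$ to zero; function and parameter names are hypothetical.

```python
import math


def solve_weighted_completion(r, gamma, w, beta, hard_deadline=None):
    """Minimize sum of w_j*C_j + beta_j*gamma_j**3/p_j**2 on one machine.

    Jobs must be sorted by release date. Returns (processing times, total cost).
    """
    n = len(r)
    times, total = [], 0.0
    for j in range(n):
        next_release = r[j + 1] if j + 1 < n else math.inf
        deadline = hard_deadline[j] if hard_deadline is not None else math.inf
        u = min(deadline, next_release) - r[j]          # upper bound on p_j
        p_free = gamma[j] * (2 * beta[j] / w[j]) ** (1 / 3)  # unconstrained optimum
        p = min(p_free, u)
        times.append(p)
        total += w[j] * (r[j] + p) + beta[j] * gamma[j] ** 3 / p ** 2
    return times, total


def tardiness_processing_time(r_j, r_next, gamma_j, w_j, beta_j, due):
    """Optimal p for f_j = w_j*max(C_j - due, 0), with no hard deadline."""
    u = r_next - r_j
    if u <= due - r_j:
        return u            # job is never tardy, so run as slowly as allowed
    # Energy alone is decreasing in p, so the early-completion optimum is
    # p = due - r_j, which is also feasible for the late-completion problem;
    # clamping the stationary point to [due - r_j, u] therefore covers both.
    p_free = gamma_j * (2 * beta_j / w_j) ** (1 / 3)
    return min(max(p_free, due - r_j), u)
```

The loop in `solve_weighted_completion` performs a constant amount of work per job, in line with the linear running time stated in Theorem 1.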

Problem + on parallel machines
In this section, we study the problem of finding an immediate start schedule for processing the jobs of set $N$ on $m$ parallel machines. Similarly to Leyvand et al. (2010), we reduce our problem to a minimum-cost flow problem in a special network (Fig. 1).
Introduce a bipartite network $H = (V, T)$. The node set $V = \{s, t\} \cup N_A \cup N_B$ consists of a source $s$, a sink $t$, and two sets $N_A = \{A_1, A_2, \ldots, A_n\}$ and $N_B = \{B_1, B_2, \ldots, B_n\}$, where nodes $A_j$ and $B_j$ are associated with job $j \in N$. The set $T$ of arcs is defined as $T = T_A \cup T_{AB} \cup T_{BA} \cup T_B$, where
$$T_A = \{(s, A_j) : j \in N\}, \quad T_{AB} = \{(A_j, B_j) : j \in N\},$$
$$T_{BA} = \{(B_j, A_k) : 1 \le j < k \le n\}, \quad T_B = \{(B_j, t) : j \in N\}.$$
Each arc $(q, q') \in T$ is associated with capacity $\mu(q, q')$ and cost $c(q, q')$. Recall that a feasible flow $x : T \to \mathbb{R}$ satisfies the capacity constraint
$$0 \le x(q, q') \le \mu(q, q'), \quad (q, q') \in T, \tag{7}$$
i.e., the flow on an arc cannot be larger than its capacity, and the flow balance constraint
$$\sum_{q' : (q', q) \in T} x(q', q) = \sum_{q' : (q, q') \in T} x(q, q') \tag{8}$$
for $q \in V \setminus \{s, t\}$, i.e., for each node $q$ other than the source and the sink the flow that enters the node must be equal to the flow that leaves the node. The value of a flow $x$ is equal to the total flow on the arcs that leave the source (or, equivalently, enter the sink):
$$\sum_{(s, q) \in T} x(s, q) = \sum_{(q, t) \in T} x(q, t).$$
For network $H$, let us set all arc capacities to 1. By appropriately defining the costs on the arcs of network $H$, we reduce the original problem of minimizing the objective function $F + G$ on $m$ parallel machines to finding the minimum-cost flow of value $m$ in $H$.
Suppose a flow of value $m$ in network $H$ is found. Since the network is acyclic, the arcs with a flow equal to 1 will form $m$ paths from $s$ to $t$, and the order of arcs of set $T_{BA}$ in each path defines the sequence of jobs on a machine. A path starts with an arc $(s, A_j)$, proceeds with pairs of arcs of the form $(A_j, B_j), (B_j, A_k)$, and concludes with the final pair $(A_\ell, B_\ell), (B_\ell, t)$. An arc $(s, A_j)$ implies that job $j$ is the first on some machine. A pair $(A_j, B_j), (B_j, A_k)$ corresponds to scheduling two jobs, $j$ and $k$, one after another on the same machine, while a pair $(A_\ell, B_\ell), (B_\ell, t)$ corresponds to assigning job $\ell$ as the last job on a machine.
The arc costs reflect the selected sequence of jobs on a machine. If a job $j \in N$ has no "hard" deadline, define $\bar{d}_j = +\infty$. For the final pair of the chain $(A_\ell, B_\ell), (B_\ell, t)$, the cost of scheduling job $\ell$ as the last job on a machine is equal to the contribution of job $\ell \in N$ to the objective function. It can be found as the optimal value $Z_\ell^*$ for the problem (6) with $j = \ell$ and $u_\ell = \bar{d}_\ell - r_\ell$. Thus, for each $j \in N$, we compute the value $Z_j^*$ and assign this value as a cost of the arc $(B_j, t)$.
If a pair of arcs $(A_j, B_j), (B_j, A_k)$ with $r_j < r_k$ belongs to a certain path from $s$ to $t$, then job $j$ is sequenced on some machine immediately before job $k$, and therefore must complete before time $\min\{\bar{d}_j, r_k\}$. The cost associated with the job sequence $(j, k)$ is equal to the smallest value $Z_{j,k}^*$ of the objective function $Z_{j,k}$ for the problem
$$\text{Minimize } Z_{j,k} = f_j(r_j + p) + p\, g_j\!\left(\frac{\gamma_j}{p}\right) \quad \text{subject to } 0 < p \le \min\{\bar{d}_j - r_j, r_k - r_j\}, \tag{9}$$
which can be seen as problem (6), where $u_j = \min\{\bar{d}_j - r_j, r_k - r_j\}$. For each pair $(j, k)$ where $1 \le j < k \le n$, we compute the value $Z_{j,k}^*$ and assign this value as a cost of the arc $(B_j, A_k)$.
For each arc $(A_j, B_j) \in T_{AB}$, the cost is set equal to $-M$, where $M$ is a large positive number. This guarantees that every arc $(A_j, B_j) \in T_{AB}$ receives a flow of 1, so that each job $j \in N$ will be scheduled. If we ignore the costs of the arcs $(A_j, B_j) \in T_{AB}$, the total cost of the found flow is equal to the optimal value of the function $F + G$.
Thus, if one of the paths from $s$ to $t$ visits the sequence of nodes $(s, A_{j_1}, B_{j_1}, A_{j_2}, B_{j_2}, \ldots, A_{j_y}, B_{j_y}, t)$, then in the associated schedule on some machine the sequence of jobs $(j_1, j_2, \ldots, j_y)$ is processed. The actual processing time $p_{j_i}^*$ of job $j_i$, $1 \le i \le y-1$, is equal to the value of $p$ that delivers the smallest value $Z_{j_i, j_{i+1}}^*$, while for the last job $j_y$ the actual processing time $p_{j_y}^*$ is defined by the value of $p$ that delivers the smallest value $Z_{j_y}^*$.
As in Sect. 2, we may assume that determining the cost of each arc of network $H$ takes constant time, so that all the costs will be found in $O(n^2)$ time. The required flow can be found in $O(n^2 m)$ time by applying the successive shortest path algorithm, similar to the Ford-Fulkerson algorithm; see Ahuja et al. (1993).

Theorem 2 The problem + of minimizing the sum of total cost $F$ and total energy $G$ on $m$ parallel machines is solvable in $O(n^2 m)$ time by finding the minimum-cost flow of value $m$ in network $H$, provided that the cost of each arc of $H$ can be computed in constant time.
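The network reduction can be sketched in code as follows. This is an illustration under stated assumptions rather than the implementation behind Theorem 2: it assumes $f_j(C_j) = w_j C_j$, the cubic speed cost, no hard deadlines and $m \le n$, and it finds each shortest path with a plain Bellman-Ford pass, which is simpler but slower than the $O(n^2 m)$ bound. All function names are hypothetical.

```python
import math


def arc_cost(r, gamma, w, beta, j, u):
    """Optimal value of problem (6)/(9) for f_j = w_j*C_j and cubic energy."""
    p = min(gamma[j] * (2 * beta[j] / w[j]) ** (1 / 3), u)
    return w[j] * (r[j] + p) + beta[j] * gamma[j] ** 3 / p ** 2


def min_cost_flow(num_nodes, arcs, s, t, value):
    """Successive shortest paths on unit-capacity arcs (tail, head, cap, cost)."""
    edges = []                                # [head, residual capacity, cost]
    adj = [[] for _ in range(num_nodes)]
    for tail, head, cap, cost in arcs:        # edge i and i ^ 1 are mutual reverses
        adj[tail].append(len(edges)); edges.append([head, cap, cost])
        adj[head].append(len(edges)); edges.append([tail, 0, -cost])
    total = 0.0
    for _unit in range(value):
        dist = [math.inf] * num_nodes
        pred = [-1] * num_nodes
        dist[s] = 0.0
        for _pass in range(num_nodes - 1):    # Bellman-Ford handles negative costs
            changed = False
            for v in range(num_nodes):
                if dist[v] == math.inf:
                    continue
                for e in adj[v]:
                    head, cap, cost = edges[e]
                    if cap > 0 and dist[v] + cost < dist[head] - 1e-12:
                        dist[head], pred[head], changed = dist[v] + cost, e, True
            if not changed:
                break
        if dist[t] == math.inf:
            raise ValueError("no flow of the requested value exists")
        v = t
        while v != s:                         # push one unit along the path
            e = pred[v]
            edges[e][1] -= 1
            edges[e ^ 1][1] += 1
            v = edges[e ^ 1][0]
        total += dist[t]
    return total


def solve_parallel(r, gamma, w, beta, m, big_m=1e6):
    """Optimal F + G on m machines; jobs sorted by release date."""
    n = len(r)
    s, t = 0, 1
    A = lambda j: 2 + j
    B = lambda j: 2 + n + j
    arcs = []
    for j in range(n):
        arcs.append((s, A(j), 1, 0.0))
        arcs.append((A(j), B(j), 1, -big_m))  # forces every job to be scheduled
        arcs.append((B(j), t, 1, arc_cost(r, gamma, w, beta, j, math.inf)))
        for k in range(j + 1, n):
            arcs.append((B(j), A(k), 1, arc_cost(r, gamma, w, beta, j, r[k] - r[j])))
    return min_cost_flow(2 + 2 * n, arcs, s, t, m) + n * big_m
```

The constant `big_m` plays the role of $M$ and is added back at the end, so the returned value is the cost of the flow with the arcs $(A_j, B_j)$ ignored. It has to dominate the arc costs; the default is an arbitrary choice for small examples.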
The described approach can be extended to the problem of determining the optimal number of parallel machines to be used. This aspect is particularly important in modern computing systems, as there are overheads related to the initialization of virtual machines in Clouds, and overheads for activating the machines which are in the sleep mode.
Suppose that using $v$ parallel machines incurs cost $\sigma_v$, $1 \le v \le m$, and we are interested in minimizing $F + G$ plus additionally the cost $\sigma_v$ of all used machines. This can be done by solving the sequence of flow problems in network $H$, trying flow values 1, then 2, etc. up to an upper bound $m$ on the machine number. For each tried value of $v$, $1 \le v \le m$, the function $F + G + \sigma_v$ is evaluated and the best option is taken. The running time for solving the resulting problem remains $O(n^2 m)$, since the successive shortest path algorithm for finding the min-cost flow of value $m$ will iteratively find the min-cost flows with all required intermediate values $1, 2, \ldots, m-1$.

Theorem 3 The problem + of minimizing the sum of total cost $F$, total energy $G$ and the cost $\sigma_v$ for using $v \le m$ machines, where $v$ is a decision variable, is solvable in $O(n^2 m)$ time, under the assumptions of Theorem 2.
A drawback of the model with the aggregated objective function is that it schedules all arrived jobs. In the case of a rather short interval available for processing a job, this can only be achieved if a very high speed is applied, which may be unacceptably expensive. It may appear to be beneficial not to accept certain jobs and to pay an agreed rejection fee.
Suppose that the cost of rejecting job $j \in N$ is $\delta_j$. Let $N_A$ be the set of accepted jobs, while $N_R = N \setminus N_A$ be the set of rejected jobs. If we want to minimize the sum of the performance function for the accepted jobs, total energy used and total rejection penalty, we need to solve the problem with the objective function
$$\sum_{j \in N_A} \left( f_j(C_j) + p_j\, g_j\!\left(\frac{\gamma_j}{p_j}\right) \right) + \sum_{j \in N_R} \delta_j,$$
which is equivalent (up to the additive constant $\sum_{j \in N} \delta_j$) to
$$\sum_{j \in N_A} \left( f_j(C_j) + p_j\, g_j\!\left(\frac{\gamma_j}{p_j}\right) - \delta_j \right).$$
In the network model, we replace the cost on an arc $(A_j, B_j) \in T_{AB}$ by $-\delta_j$, keeping the cost of an arc $(B_j, A_k) \in T_{BA}$ the same as in the basic model. Recall that the latter cost is found by solving problem (9). Since in an optimal solution fewer than $m$ machines may be used, we add an extra arc $(s, t)$ of capacity $m$ and zero cost.
For example, suppose that the minimum-cost flow of value , ≤ m, in the modified network is found, and one of the paths from s to t visits the sequence of nodes (s, A j 1 , B j 1 , A j 2 , B j 2 , . . ., A j y , B j y , t).Then, the sequence of accepted jobs j 1 , j 2 , . . ., j y is processed on some machine, and the contribution of job j i is equal to the cost of the arc that leaves node B j i , found by solving problem (9), plus the cost − δ j i of the arc that enters node B j i , 1 ≤ i ≤ y.The described adjustments do not change the time complexity of the approach.
Theorem 4
The problem $\Pi_+$ in which it is required to determine the set $N_R$ of rejected jobs to minimize the sum of total cost $F$, total energy $G$ and the cost $\sum_{j \in N_R} \delta_j$ is solvable in $O(n^2 m)$ time, under the assumptions of Theorem 2.
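The equivalence "up to an additive constant" that underlies the rejection model can be checked on a toy instance. The sketch below is an illustration only: it assumes a hypothetical callable `schedule_cost` that returns the scheduling cost (performance plus energy) of a set of accepted jobs, and it enumerates all subsets by brute force instead of solving a flow problem. In the example the cost is taken as a sum of made-up per-job values, which is a simplification, since in the network model the arc costs depend on consecutive jobs.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterator

CostFn = Callable[[FrozenSet[int]], float]


def all_subsets(jobs) -> Iterator[FrozenSet[int]]:
    """Enumerate every candidate set of accepted jobs (exponential, toy use only)."""
    ordered = sorted(jobs)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            yield frozenset(subset)


def objective_with_penalties(accepted: FrozenSet[int],
                             delta: Dict[int, float],
                             schedule_cost: CostFn) -> float:
    """Cost of the accepted jobs plus rejection fees of the rejected jobs."""
    rejected = set(delta) - accepted
    return schedule_cost(accepted) + sum(delta[j] for j in rejected)


def shifted_objective(accepted: FrozenSet[int],
                      delta: Dict[int, float],
                      schedule_cost: CostFn) -> float:
    """Same objective after subtracting the constant sum of all delta_j."""
    return schedule_cost(accepted) - sum(delta[j] for j in accepted)


def best_accepted_set(objective, delta: Dict[int, float],
                      schedule_cost: CostFn) -> FrozenSet[int]:
    """Brute-force minimizer of the given objective over all accepted sets."""
    return min(all_subsets(delta),
               key=lambda a: objective(a, delta, schedule_cost))
```

Both objectives differ by the same constant for every accepted set, so they are minimized by the same set; this is why assigning cost $-\delta_j$ to the arc of job $j$ is sufficient.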

Problem $\Pi_1$ on a single machine
In this section, we consider the problem of minimizing total energy $G$ subject to a constraint on total cost $F$ on a single machine. The presented solution approach is based on Karush-Kuhn-Tucker (KKT) reasoning in relation to the associated Lagrange function. This approach works for a wide range of functions $G$ and $F$; however, below for simplicity it is presented for the case that $F = \sum_{j \in N} (C_j - r_j)$, i.e., $F$ represents total flow time. Moreover, a natural interpretation of the obtained results occurs if for each $j \in N$ the energy function $g_j$ is polynomial, strictly convex, decreasing in $p_j$ and job-independent, e.g., satisfies (3) with $\beta_j = 1$.
Due to the immediate start condition, we see that $C_j - r_j = p_j$. Let $P$ be a given upper bound on $F = \sum_{j \in N} p_j$, and let $u_j$ be an upper bound on the actual processing time $p_j$, defined as in Sect. 2. Denote $G_j(p_j) = p_j g_j(\gamma_j / p_j)$, $j \in N$.
Then, the problem we study in this section can be formulated as
$$\min \sum_{j \in N} G_j(p_j) \quad \text{subject to} \quad \sum_{j \in N} p_j \le P, \quad 0 < p_j \le u_j, \ j \in N. \tag{10}$$
Such a problem can be classified as a nonlinear resource allocation problem with continuous decision variables; see the survey by Patriksson (2008). Note that we can limit our consideration to the case of $P < \sum_{j \in N} u_j$; otherwise, in an optimal solution $p_j = u_j$ for all $j \in N$.
The KKT conditions guarantee that there exists a value $\lambda^*$ such that $Q(\lambda^*) = 0$. Such a multiplier $\lambda^*$ and vector $p(\lambda^*)$ deliver the minimum to the Lagrangian function, so that vector $p(\lambda^*)$ is a solution to problem (10), i.e., defines the optimal values of the actual processing times.
Differentiating the functions $G_j(p_j)$, denote
$$\lambda_j = -G_j'(u_j), \quad j \in N.$$
For a polynomial energy function, e.g., $G_j(p_j) = \gamma_j^3 / p_j^2$, $j \in N$, the values $\lambda_j$ admit a natural interpretation. Indeed, $\lambda_j = -G_j'(u_j) = 2\gamma_j^3 / u_j^3 = 2 s_j^3$, where $s_j = \gamma_j / u_j$, i.e., if for a job $j$ the actual processing time is equal to $u_j$, then this job is processed at speed $\sqrt[3]{\lambda_j / 2}$.
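Let $\pi = (\pi(1), \pi(2), \ldots, \pi(n))$ be a permutation of the numbers $1, 2, \ldots, n$ such that
$$\lambda_{\pi(1)} \ge \lambda_{\pi(2)} \ge \cdots \ge \lambda_{\pi(n)}.$$
For a $k$, $1 \le k \le n$, in accordance with the KKT reasoning for the resource allocation problem (Patriksson 2008), define
$$p_{\pi(j)}(\lambda_{\pi(k)}) = \begin{cases} u_{\pi(j)}, & 1 \le j \le k, \\ p^0_{\pi(j)}, & k + 1 \le j \le n, \end{cases} \tag{11}$$
where the values $p^0_{\pi(j)}$ for $k + 1 \le j \le n$ are solutions of the system of equations
$$-G'_{\pi(j)}(p_{\pi(j)}) = \lambda_{\pi(k)}, \quad k + 1 \le j \le n. \tag{12}$$
By applying binary search with respect to $k$, we find a value $k^*$ such that either $Q(\lambda_{\pi(k^*)}) = 0$, or the values $Q(\lambda_{\pi(k^*)})$ and $Q(\lambda_{\pi(k^*+1)})$ have opposite signs. In the former case, we define $\lambda^* = \lambda_{\pi(k^*)}$ and $p(\lambda^*) = p(\lambda_{\pi(k^*)})$; otherwise, solve the system of equations
$$-G'_{\pi(j)}(p_{\pi(j)}) = \lambda, \quad k^* + 1 \le j \le n; \qquad \sum_{j=1}^{k^*} u_{\pi(j)} + \sum_{j=k^*+1}^{n} p_{\pi(j)} = P. \tag{13}$$
Having solved the latter system, we determine $\lambda^*$ and the values $p^*_{\pi(j)}$, $k^* + 1 \le j \le n$. The components of the solution vector $p(\lambda^*)$ are defined by
$$p_{\pi(j)}(\lambda^*) = \begin{cases} u_{\pi(j)}, & 1 \le j \le k^*, \\ p^*_{\pi(j)}, & k^* + 1 \le j \le n. \end{cases} \tag{14}$$
The search for the value $k^*$ takes at most $\log n$ iterations, and system (12) has to be solved in each iteration. Additionally, system (13) has to be solved at most once. If energy functions are cubic, we may assume that solving systems (12) and (13) requires time that is linear with respect to the number of decision variables. Indeed, the solution to (12) is given by
$$p^0_{\pi(j)} = \gamma_{\pi(j)} \sqrt[3]{2 / \lambda_{\pi(k)}}, \quad k + 1 \le j \le n.$$
The solution to (13) is given by
$$\lambda^* = 2 \left( \frac{\sum_{j=k^*+1}^{n} \gamma_{\pi(j)}}{P - \sum_{j=1}^{k^*} u_{\pi(j)}} \right)^{3}, \qquad p^*_{\pi(j)} = \gamma_{\pi(j)} \sqrt[3]{2 / \lambda^*}, \quad k^* + 1 \le j \le n.$$
Thus, we have proved the following statement.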

Theorem 5
The problem $\Pi_1$ of minimizing total energy $G$ on a single machine, subject to the bounded total flow time $F \le P$, reduces to the nonlinear resource allocation problem and can be solved in $O(n \log n)$ time, provided that energy functions $g_j$ are polynomial, strictly convex, decreasing in $p_j$ and job-independent.
The following remark is useful for justifying the solution method for the bicriteria problem, presented in the next section. Simultaneous equations (13) imply that in an optimal solution for each job $\pi(j)$, $1 \le j \le k^*$, the equality $p_{\pi(j)}(\lambda^*) = u_{\pi(j)}$ holds, i.e., each of these jobs fully uses the interval $[r_{\pi(j)}, r_{\pi(j)} + u_{\pi(j)}]$ available for its processing. The processing speed of each job $\pi(j)$, $k^* + 1 \le j \le n$, is equal to $\gamma_{\pi(j)} / p^*_{\pi(j)} = \sqrt[3]{\lambda^* / 2}$, so that all jobs $\pi(j)$, $k^* + 1 \le j \le n$, are processed at the same speed $\sqrt[3]{\lambda^* / 2}$ and none of these jobs fully uses the available interval. Moreover, since $\lambda_{\pi(1)} \ge \lambda_{\pi(2)} \ge \cdots \ge \lambda_{\pi(k^*)} > \lambda^*$, we conclude that the common speed at which each job $\pi(j)$, $k^* + 1 \le j \le n$, is processed is less than the processing speed of the jobs $\pi(j)$, $1 \le j \le k^*$.
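The structure described in this section (the jobs with the largest required speeds are fixed at their upper bounds, while the remaining jobs share a common speed) can be sketched in code for the cubic case $G_j(p_j) = \gamma_j^3 / p_j^2$. The sketch below is an illustration under that assumption, not the authors' implementation: the function names are hypothetical, and instead of solving a system of equations in each iteration it uses prefix sums, so that each step of the binary search over the number of fixed jobs takes constant time after the initial sort.

```python
from typing import List, Sequence


def min_energy_processing_times(gamma: Sequence[float],
                                u: Sequence[float],
                                P: float) -> List[float]:
    """Minimize sum(gamma_j**3 / p_j**2) s.t. sum(p_j) <= P, 0 < p_j <= u_j.

    Assumes cubic energy, gamma_j > 0 and u_j > 0 (illustrative setting).
    """
    n = len(gamma)
    if n == 0 or len(u) != n:
        raise ValueError("gamma and u must be non-empty and of equal length")
    if P <= 0:
        raise ValueError("P must be positive")
    if P >= sum(u):
        return [float(x) for x in u]  # the bound is not restrictive

    # s_j = gamma_j / u_j: speed needed to finish job j in exactly u_j.
    speed = [gamma[j] / u[j] for j in range(n)]
    order = sorted(range(n), key=lambda j: speed[j], reverse=True)

    prefix_u = [0.0] * (n + 1)       # total time of the first k jobs at u_j
    for pos, j in enumerate(order):
        prefix_u[pos + 1] = prefix_u[pos] + u[j]
    suffix_gamma = [0.0] * (n + 1)   # total work of jobs from position k on
    for pos in range(n - 1, -1, -1):
        suffix_gamma[pos] = suffix_gamma[pos + 1] + gamma[order[pos]]

    def total_time(k: int) -> float:
        # First k jobs at their upper bounds, the rest at speed of job order[k].
        return prefix_u[k] + suffix_gamma[k] / speed[order[k]]

    # Binary search for the smallest k with total_time(k) >= P;
    # total_time is non-decreasing in k and total_time(n - 1) = sum(u) > P.
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if total_time(mid) >= P:
            hi = mid
        else:
            lo = mid + 1
    k_star = lo

    common_speed = suffix_gamma[k_star] / (P - prefix_u[k_star])
    p = [0.0] * n
    for pos, j in enumerate(order):
        p[j] = float(u[j]) if pos < k_star else gamma[j] / common_speed
    return p


def total_energy(gamma: Sequence[float], p: Sequence[float]) -> float:
    """Cubic energy of a vector of actual processing times."""
    return sum(g ** 3 / t ** 2 for g, t in zip(gamma, p))
```

For instance, with `gamma = [4, 2, 2]`, `u = [1, 2, 4]` and `P = 4`, the job with the largest required speed stays at its upper bound and the other two jobs share the remaining time at the common speed 4/3, which gives processing times close to `[1, 1.5, 1.5]`.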

Problem $\Pi_2$ on a single machine
In this section, we describe an approach to solving the bicriteria problem, in which it is required to simultaneously minimize total cost $F$ and total energy $G$ on a single machine. Recall that a schedule $S$ is called Pareto-optimal if there exists no schedule $S'$ such that $F(S') \le F(S)$ and $G(S') \le G(S)$, where at least one of these inequalities is strict.
Although the outlined approach can be extended to deal with rather general cost functions, below we present it for $F = \sum_{j=1}^{n} (C_j - r_j)$ and $G = \sum_{j=1}^{n} p_j g_j(\gamma_j / p_j)$. The solution of the problem of finding the Pareto optimum is given in the space of variables $F$ and $G$ by (i) a sequence of break-points $F_0, F_1, F_2, \ldots, F_\nu$ of the variable $F$ and (ii) an explicit formula that expresses variable $G$ as a function of variable $F \in (F_k, F_{k+1}]$ for all $k = 0, 1, \ldots, \nu - 1$. As we show below, $\nu = n$.
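In line with the reasoning presented in Sect. 4, compute
$$s_j = \gamma_j / u_j, \quad j \in N.$$
The value $s_j$ represents the speed at which job $j \in N$ has to be processed to get the actual processing time $u_j$. Determine a permutation $\pi = (\pi(1), \pi(2), \ldots, \pi(n))$ of the numbers $1, 2, \ldots, n$ such that
$$s_{\pi(1)} \ge s_{\pi(2)} \ge \cdots \ge s_{\pi(n)}.$$
For completeness, define $s_{\pi(0)} = +\infty$. Denote $\Gamma = \sum_{j \in N} \gamma_j$. Introduce $F_0 = 0$ and
$$F_k = \sum_{j=1}^{k} u_{\pi(j)} + \frac{1}{s_{\pi(k)}} \sum_{j=k+1}^{n} \gamma_{\pi(j)}, \quad 1 \le k \le n. \tag{15}$$
Theorem 6
For the bicriteria problem $\Pi_2$ of minimizing total flow time $F$ and total energy $G$ on a single machine, the values $F_k$, $0 \le k \le n$, defined by (15) correspond to the break-points of the variable $F$, and the variable $G$ can be expressed as
$$G = \sum_{j=1}^{k} G_{\pi(j)}(u_{\pi(j)}) + \sum_{j=k+1}^{n} G_{\pi(j)}\!\left(\frac{\gamma_{\pi(j)}}{s}\right), \quad s = \frac{\sum_{j=k+1}^{n} \gamma_{\pi(j)}}{F - \sum_{j=1}^{k} u_{\pi(j)}}, \quad F \in (F_k, F_{k+1}], \ 0 \le k \le n - 1. \tag{16}$$
Proof The fact that the values $F_k$, $0 \le k \le \nu$, are indeed break-points and that $\nu = n$ follows from the structure of an optimal solution of the problem of minimizing total energy $G$ subject to an upper bound on the sum of actual processing times; see (11) and (14) from Sect. 4. For $F \in (F_k, F_{k+1}]$, considering the jobs in accordance with the permutation $\pi$, the actual processing times of the first $k$ jobs are fixed to their upper bounds, while the actual processing times of the remaining jobs are obtained by running these jobs at a common speed $s$ that decreases starting from $s_{\pi(k)}$.
The next break-point $F_{k+1}$ occurs when $s$ becomes equal to $s_{\pi(k+1)}$. Note that break-points $F_k$ and $F_{k+1}$ coincide if $s_{\pi(k)} = s_{\pi(k+1)}$, but we count them separately so that indeed $\nu = n$. The last break-point $F_n$ corresponds to the situation that the actual processing time of job $\pi(n)$ is equal to its largest possible value $u_{\pi(n)}$.
Consider the interval $(F_0, F_1]$. In this interval, the jobs are run with a speed $s \ge s_{\pi(1)}$, so that for $F \in (F_0, F_1]$, $F = \sum_{j \in N} p_j = \sum_{j \in N} \gamma_j / s = \Gamma / s$. We deduce that
$$G = \sum_{j \in N} G_j\!\left(\frac{\gamma_j}{s}\right), \quad s = \frac{\Gamma}{F},$$
which complies with (16) for $k = 0$. Now consider the next interval $(F_1, F_2]$. It follows that $F \in (F_1, F_2]$ can be written as
$$F = u_{\pi(1)} + \frac{\Gamma - \gamma_{\pi(1)}}{s}$$
as a function of speed $s$, where $s$ decreases from $s_{\pi(1)}$ to $s_{\pi(2)}$, so that
$$s = \frac{\Gamma - \gamma_{\pi(1)}}{F - u_{\pi(1)}}$$
and
$$G = G_{\pi(1)}(u_{\pi(1)}) + \sum_{j=2}^{n} G_{\pi(j)}\!\left(\frac{\gamma_{\pi(j)}}{s}\right),$$
as in (16) for $k = 1$. Consider an interval $(F_k, F_{k+1}]$ for some $k$, $0 \le k \le n - 1$. It follows that $F \in (F_k, F_{k+1}]$ can be written as
$$F = \sum_{j=1}^{k} u_{\pi(j)} + \frac{1}{s} \sum_{j=k+1}^{n} \gamma_{\pi(j)},$$
where $s$ decreases from $s_{\pi(k)}$ to $s_{\pi(k+1)}$, so that $G$ is given by (16). Computing $G$ for all values of $k$, $0 \le k \le n - 1$, takes $O(n \log n)$ time. This proves the theorem.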
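The construction used in the proof can be illustrated by a short sketch for the cubic case $G_j(p_j) = \gamma_j^3 / p_j^2$, which is an assumption made here for concreteness. The break-points are obtained by fixing jobs at their upper bounds one by one in non-increasing order of $s_j = \gamma_j / u_j$, and the energy for a given $F$ is computed from the interval that contains it. The function names `pareto_breakpoints` and `energy_on_front` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def pareto_breakpoints(gamma: Sequence[float],
                       u: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Return the job order and the break-points F_0 = 0, F_1, ..., F_n."""
    speed = [g / ub for g, ub in zip(gamma, u)]
    order = sorted(range(len(gamma)), key=lambda j: speed[j], reverse=True)
    breakpoints = [0.0]
    pinned_time = 0.0                    # time of jobs already fixed at u_j
    remaining_gamma = float(sum(gamma))  # work of jobs not yet fixed
    for j in order:
        # Common speed has dropped to s_j: job j reaches its upper bound.
        breakpoints.append(pinned_time + remaining_gamma / speed[j])
        pinned_time += u[j]
        remaining_gamma -= gamma[j]
    return order, breakpoints


def energy_on_front(F: float,
                    gamma: Sequence[float],
                    u: Sequence[float]) -> float:
    """Minimum cubic energy for total flow time F (clamped at F_n)."""
    if F <= 0:
        raise ValueError("F must be positive")
    order, breakpoints = pareto_breakpoints(gamma, u)
    F = min(F, breakpoints[-1])  # beyond F_n no further saving is possible
    k = bisect_left(breakpoints, F) - 1  # F lies in (F_k, F_{k+1}]
    pinned, free = order[:k], order[k:]
    pinned_energy = sum(gamma[j] ** 3 / u[j] ** 2 for j in pinned)
    free_gamma = sum(gamma[j] for j in free)
    free_time = F - sum(u[j] for j in pinned)
    # Free jobs run at common speed free_gamma / free_time.
    return pinned_energy + free_gamma ** 3 / free_time ** 2
```

For `gamma = [4, 2, 2]` and `u = [1, 2, 4]`, the break-points are 0, 2, 5 and 7; for $F = 2$ all jobs run at speed 4 and the energy is $8^3 / 2^2 = 128$.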

Conclusions
In this paper, we address several versions of the scheduling model that combines a well-established feature of speed scaling and a requirement of immediate job starting times, which is typical for modern Cloud computing systems. Both objectives are of the min-sum type, one depending on the job completion times, and another one on the machine usage cost. We show that the single-machine model with $n$ jobs can be solved in $O(n \log n)$ time for two single criterion versions of our problem, $\Pi_+$ and $\Pi_1$, or for the most general bicriteria version $\Pi_2$. The single criterion version $\Pi_+$ of the multi-machine model with $n$ jobs and $m$ machines is solvable in $O(n^2 m)$ time.
The presented results for immediate start models can be naturally generalized to handle problems that combine a max-type scheduling objective $F_{\max} = \max_{j \in N} f_j(C_j)$ and the energy component $G$. For example, for $f_j(C_j) = C_j$ or $f_j(C_j) = C_j - d_j$ the objective $F_{\max}$ becomes either the makespan $C_{\max}$ or the maximum lateness $L_{\max}$, respectively.
- For problem $\Pi_1^{\max}$ (minimizing energy $G$ subject to an upper bound $F$ on the value of $F_{\max}$), define deadlines induced by a given value of $F$, eliminate $F_{\max}$ from consideration by setting $f_j(C_j) = 0$, $j \in N$, and solve problem $\Pi_+$ to minimize $G + 0$ using the techniques from Sects. 2, 3.
- As far as problem $\Pi_+^{\max}$ is concerned, function $F_{\max}$ is convex in $p_j$ for the most popular min-max scheduling objectives, such as $F_{\max} \in \{C_{\max}, L_{\max}\}$. Since the energy component $G$ is also convex in $p_j$, it follows that the objective $F_{\max} + G$ is convex and its minimum can be found by a numerical method of convex optimization.
To summarize, our study can be considered as the first attempt to explore fundamental properties of the new model with the immediate start condition. Future research may further elaborate the applied aspects of our study: the basic system model can be enhanced to address a range of issues typical for modern Cloud computing systems, such as heterogeneous physical machines having different speed characteristics and energy usage functions, the introduction of virtual machines with possible allocation of several virtual machines to one physical machine, and the possibility of migrating virtual machines and associated tasks. Our study may also serve as a basis for the development of online algorithms for problems with the immediate start condition. Notice that the online versions of the traditional models of power-aware scheduling, without immediate start, are proposed by Albers and Fujiwara (2007), Bansal et al. (2010), Chan et al. (2013), Lam et al. (2008, 2012).