Hierarchical genetic-based grid scheduling with energy optimization
DOI: 10.1007/s10586-012-0226-7
- Cite this article as:
- Kołodziej, J., Khan, S.U., Wang, L. et al. Cluster Comput (2013) 16: 591. doi:10.1007/s10586-012-0226-7
Abstract
Optimizing power and energy consumption is an important concern in the design of modern and future computing and communication systems. Various techniques and high-performance technologies have been investigated and developed for the efficient management of such systems. All these technologies should provide good performance and cope with increased workload demand in dynamic environments such as Computational Grids (CGs), clusters and clouds.
In this paper we approach the independent batch scheduling in CG as a bi-objective minimization problem with makespan and energy consumption as the scheduling criteria. We use the Dynamic Voltage Scaling (DVS) methodology for scaling and possible reduction of cumulative power energy utilized by the system resources. We develop two implementations of Hierarchical Genetic Strategy-based grid scheduler (Green-HGS-Sched) with elitist and struggle replacement mechanisms. The proposed algorithms were empirically evaluated versus single-population Genetic Algorithms (GAs) and Island GA models for four CG size scenarios in static and dynamic modes. The simulation results show that proposed scheduling methodologies fairly reduce the energy usage and can be easily adapted to the dynamically changing grid states and various scheduling scenarios.
Keywords
Genetic algorithm, Hierarchical genetic strategy, Computational grid, Scheduling, Dynamic voltage, Frequency scaling
1 Introduction
Grid computing has emerged as a wide-area distributed platform for solving large-scale problems in science, engineering, etc. A Computational Grid (CG) combines many computing resources into a network for the execution of computational tasks. The resources are distributed across multiple organizations and administrative domains, each with its own access and usage policies and local schedulers. Task scheduling and the effective management of resources in such systems remain complex problems and therefore demand sophisticated tools for analyzing the performance of algorithms before applying them to real systems.
The main issues related to energy efficiency have been introduced by the large scales of enterprise computing environments and data centers. Due to the importance of power and energy consumption in modern-day and future computing and communication systems various techniques and recent technologies have been investigated and developed. However, these solutions are mainly related to an optimization of the system thermodynamics [37]. It requires profiles of hardware energy consumption and application energy consumption, and the correlation between workload distribution and the energy consumption of power and cooling [12].
While CGs have been widely promoted as a cheap alternative to supercomputers, a significant disproportion between resource availability and resource provisioning may be observed in the system [21]. Therefore, current efforts in grid computing research focus on the design of new effective grid schedulers that can simultaneously optimize the key grid objectives, such as makespan, flowtime, resource utilization and cumulative energy utilization [11]. Energy-efficient scheduling in CGs becomes a complex endeavor due to the multiple constraints, various optimization criteria and different priorities of the resource owners. Various types of information and data processed in the large-scale dynamic grid environment may be incomplete, imprecise, fragmentary and overloading, which complicates the assignment scores and the availability of resources, and may increase the amount of energy used in the system [43]. Heuristic approaches have shown great potential to solve many demanding, real-world decision and optimization problems in uncertain large-scale environments and seem to be an effective means for designing energy-aware grid schedulers in CGs [27, 28].
The main objective of this work is to define an effective genetic-based batch scheduler that can be easily implemented in the dynamic grid environment and enables an energy-aware allocation of the grid resources. We address an Independent Batch Scheduling problem in CGs, where tasks are processed in a batch mode and there are no dependencies among them. This scheduling scenario is very useful in illustrating many realistic grid approaches[ref]. We define two main scheduling criteria, which are optimized in hierarchical mode, namely makespan as the privileged criterion and average energy consumption. We use a Dynamic Voltage Scaling (DVS) methodology for reducing the cumulative power energy utilized by the system resources. Based on the results of our preliminary study on the effectiveness of mono-population genetic-based schedulers in energy-aware scheduling in grids [26, 29], we developed two implementations of the hierarchical Green-HGS-Sched genetic scheduler and evaluated them empirically in two “energetic” scheduling modes in static and dynamic grid scenarios. The performance of these hierarchical schedulers has been measured using the makespan and relative energy consumption improvement rate metrics. The effectiveness of the implementations of Green-HGS-Sched was compared with the results achieved by four single-population Genetic Algorithms (GAs) and Island GA scheduler [46]. All schedulers have been integrated with the grid simulator.
The remainder of this paper is structured as follows. Related work is discussed in Sect. 2 and the addressed scheduling problem is specified in Sect. 3. The generic energy model is defined in Sect. 5 together with the main scheduling scenarios and criteria. The Green-HGS-Sched framework and genetic operators are presented in Sect. 7. Section 8 presents the results of a simple empirical analysis of the effectiveness of hierarchical, island and mono-population schedulers. The paper is summarized in Sect. 9.
2 Related work
Static power management methodologies usually work at the hardware level. In such systems, the physical computational devices can be replaced by low-power battery machines or nano-processors, and the system workload can be effectively distributed. This optimizes the energy used for computing the applications, storage and data transfer by reducing the number of idle devices and the idle periods of active processors. The major projects based on static power management include Green-Destiny [45], FAWN [3] and Gordon [10].
Dynamic Voltage and Frequency Scaling (DVFS) has recently become a key dynamic power management methodology supporting energy-efficient scheduling in grids and large-scale data systems [30, 49]. In most DVFS approaches, scheduling has been defined as a classical or dynamic load balancing problem. Khan and Ahmad [21] have successfully used the game theory paradigm for the optimization of the system performance and energy consumption. Several research works have used similar models and approaches to address various research problems related to large-scale computing systems, such as energy proportionality [17, 20], memory-aware computations, data-intensive computations, energy-efficient scheduling, and grid scheduling [22, 39]. Many interesting examples of recently developed static and dynamic power and energy management techniques in distributed computing environments are presented in the surveys [5, 40, 43, 44].
Although a significant volume of research has addressed energy-effective scheduling and resource allocation in large-scale computing systems, only a small family of energy-aware genetic-based grid and cloud schedulers has been developed. Most of those approaches need an implementation of specially designed genetic operators, such as partially matching or cycle crossover and swap or rebalancing mutation mechanisms, primarily designed for solving complex combinatorial optimization problems [25]. The energy consumed by the system is usually just one of the components of a multi-objective fitness function.
In [41] and [42] Shen et al. present a shadow price technique for improving the genetic operations in standard GA used as a scheduler in computational cloud. The “shadow price” for a pair task-machine is defined as an average energy consumption per instruction for the processor that can operate at different voltage levels. Then the classical move and swap mutation operations are used for an optimal mapping of tasks to machines. The fitness function for such GA scheduler is expressed as a total energy consumption.
Kessaci et al. in [19] present two versions of multi-objective parallel Genetic Algorithm (MOPGA) hybridized with energy-conscious scheduling heuristics (ECS). The GA engine is based on the concepts of island GA and multi-start GA models. The authors consider parallel applications represented by a directed acyclic graph (DAG), which are mapped onto multi-processors machines. The voltage and frequencies of the processors are scaled up at 16 discrete levels and genes in GA chromosomes are defined by the task-processor labels and processor voltage. The objective function is composed of two criteria: privileged makespan and total energy consumption in the system. The reduction of the energy utilization achieved in the experimental analysis is about 47.4 %.
The solution presented in [19] is dedicated to general computing and embedded systems. An application of such methodology in computational cloud is demonstrated by Mezmaz et al. in [34]. The energy conservation rate in cloud system is very similar to the results obtained in the general case.
Another hybrid GA approach is presented by Miao et al. in [35]. The authors propose a multi-objective genetic algorithm which is hybridized with simulated annealing for the improvement of the local solutions.
3 Independent batch scheduling problem in computational grids
Due to the high parametrization, sheer size and dynamics of the grid system, scheduling problems in grids may be considered in fact as a family of NP-complete optimization problems [13]. Depending on the requirements of the grid users, the complexity of the problem can be determined by the number of objectives to be optimized, the type of the environment (static or dynamic), task processing (immediate or batch), task interrelations (independence or dependency), grid resource management (centralized, decentralized and hierarchical), and many others. To achieve the desired performance of the system, both users’ conditions and grid environment information must be “embedded” into the scheduling mechanism [1], [25].
In this paper we address an Independent Batch Scheduling problem in CGs. In this problem, it is assumed that tasks are grouped into batches and can be executed independently in static or dynamic grid environments. The scheduling attributes needed for the specification of this problem are highlighted in Fig. 2 as the dark blue text boxes. The generic independent batch scheduling model is effective in massive parallel processing of the applications that require a large amount of data. Therefore, there are many realistic scenarios, including banking systems, virtual campuses, health systems, bio-informatics applications, and many others, where independent batch scheduling is successfully applied. However, even under the independent nature of tasks and the batch processing, the problem is computationally hard to solve.
- (i) Get the information on available resources;
- (ii) Get the information on pending tasks;
- (iii) Get the information on data hosts where data files for tasks completion are required;
- (iv) Prepare a batch of tasks and compute a schedule for that batch on available machines and data hosts;
- (v) Allocate tasks;
- (vi) Monitor (failed tasks are re-scheduled).
To our best knowledge there is no standard notation for classification of the scheduling problems in CGs. A simple extension of conventional Graham’s [16] and Brucker’s [8] classifications of scheduling problems has been proposed by Fibich et al. [14]. The characteristics of the resource-constrained project scheduling problem [9] and resource-constrained machine scheduling [6, 7] may be helpful in the specification and formal description of the grid resources.
Rm—according to the Graham’s notation, it means that the tasks are mapped into the (parallel) resources of various speed^{1};
b—means that the task processing mode is “batch mode”;
indep—denotes “independency” as the task interrelation;
(stat,dyn)—means that we will consider both static and dynamics grid scheduling modes;
h—means that the scheduling objectives are optimized in hierarchical mode;
C_{max}—denotes a makespan as the privileged scheduling objective;
E_{I}(E_{batch})—denotes total energy consumption as the second scheduling criterion (E_{I} or E_{batch} is selected depending on the scheduling scenario (see Sect. 6.4)).
Most of these parameters will be explained in Sects. 4, 5 and 6.
4 Expected time to compute (ETC) matrix model
In order to estimate the execution times of tasks on machines we used the Expected Time to Compute (ETC) matrix model [2] adapted to independent batch scheduling. It is assumed in this model that each task can only be executed on one grid node in each batch and that no preemption is allowed within tasks or resources. In the case of machine failures, the tasks are re-scheduled in the next batch; however, the scheduling processes for different batches are independent. It is also assumed that when a machine processes its tasks, there is no priority distinction between the tasks assigned in the previous batches and those assigned in the current batch. Finally, no machine can remain idle, and all tasks assigned to a machine must be activated.
n—is the number of tasks in a batch;
m—is the number of machines available in the system for an execution of a given batch of tasks;
N={t_{1},…,t_{n}}—denotes the set of tasks in a batch;
M={x_{1},…,x_{m}}—denotes the set of available machines for the task batch;
N_{l}={1,…,n}—is the set of tasks’ labels;
M_{l}={1,…,m}—is the set of machines’ labels.
- (a) Task j:
wl_{j}—load parameter expressed in Millions of Instructions (MI)– we denote by WL=[wl_{1},…,wl_{n}] a workload vector for all tasks in the batch;
- (b) Machine i:
cc_{i}—computing capacity parameter expressed in Millions of Instructions Per Second (MIPS), we denote by CC=[cc_{1},…,cc_{m}] a computing capacity vector;
ready_{i}—ready time of i, which expresses the time needed for the reloading of the machine i after finishing the last assigned task, a ready times vector for all machines is denoted by ready_times=[ready_{1},…,ready_{m}].
In this generic model there is no detailed specification of the types of tasks and machines. The tasks can be considered as monolithic applications or large-scale metatasks with no dependencies among the components. The workloads of tasks can be estimated based on the specifications provided by the users, on historical data, or it can be generated based on the system predictions [18]. As machines we usually define the multiprocessors or parallel machines (see Rm parameter in the notation) or even small local area networks or computational clusters.
All ETC[j][i] parameters are defined as the elements of an ETC matrix, ETC=[ETC[j][i]]_{n×m}, which is the main structure in ETC model.
In simulation analysis, wl_{j} and cc_{i} parameters are usually generated by using the Gamma probability distribution (or the standard Gauss distribution) [32] in order to express the heterogeneities of tasks and machines in the grid system (see Sect. 8.1).
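The construction of the ETC matrix can be sketched in a few lines. The sketch assumes the common convention ETC[j][i] = wl_{j}/cc_{i} (the defining formula is not reproduced in the text above) and draws both vectors from Gaussian distributions; the function name `generate_etc`, the seed and the default distribution parameters are illustrative choices, not part of the original model.

```python
import random

def generate_etc(n_tasks, n_machines, seed=0,
                 wl_mean=250_000_000, wl_std=43_750_000,
                 cc_mean=5000, cc_std=875):
    """Build a workload vector, a capacity vector and an ETC matrix.

    Assumption: ETC[j][i] = wl[j] / cc[i] (workload divided by capacity).
    """
    rng = random.Random(seed)
    # Clamp samples so that a rare negative Gaussian draw cannot occur.
    wl = [max(1.0, rng.gauss(wl_mean, wl_std)) for _ in range(n_tasks)]
    cc = [max(1.0, rng.gauss(cc_mean, cc_std)) for _ in range(n_machines)]
    # Rows are tasks (j), columns are machines (i), as in ETC[j][i].
    etc = [[wl[j] / cc[i] for i in range(n_machines)] for j in range(n_tasks)]
    return wl, cc, etc

wl, cc, etc = generate_etc(n_tasks=8, n_machines=3)
print(len(etc), len(etc[0]))
```

Larger standard deviations in either distribution would model a more heterogeneous set of tasks or machines.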
5 Energy model
Table 1 DVFS levels for three machine classes
Level | Class I volt. | Class I rel. freq. | Class II volt. | Class II rel. freq. | Class III volt. | Class III rel. freq.
---|---|---|---|---|---|---
0 | 1.5 | 1.0 | 2.2 | 1.0 | 1.75 | 1.0
1 | 1.4 | 0.9 | 1.9 | 0.85 | 1.4 | 0.8
2 | 1.3 | 0.8 | 1.6 | 0.65 | 1.2 | 0.6
3 | 1.2 | 0.7 | 1.3 | 0.50 | 0.9 | 0.4
4 | 1.1 | 0.6 | 1.0 | 0.35 | |
5 | 1.0 | 0.5 | | | |
6 | 0.9 | 0.4 | | | |
For lower supply voltage, the operation frequency of the machine decreases, which means that \(f_{s_{l}}(i)\) coefficients are within the range of [0, 1]. We assume in this work that the supply voltage is constant during the calculation (execution) of each task, but may be different for different tasks.
It can be observed from (5) that the inverses of the relative frequency coefficients approximately estimate the increase in the completion times of tasks on machines. This is a consequence of the previous assumption that the completion times of tasks are inversely proportional to the frequencies of machines.
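The effect of a DVFS level on a single task can be illustrated as follows. The sketch uses the Class I column of the DVFS table and two assumptions that are not taken from the text: the completion time is divided by the relative frequency, and power is proportional to the square of the voltage times the frequency (the usual CMOS dynamic-power approximation, with all constants dropped). The names `scaled_time` and `relative_energy` are hypothetical.

```python
# Class I DVFS levels: (supply voltage, relative frequency), index = level.
CLASS_I = [(1.5, 1.0), (1.4, 0.9), (1.3, 0.8), (1.2, 0.7),
           (1.1, 0.6), (1.0, 0.5), (0.9, 0.4)]

def scaled_time(etc_ji, level, levels=CLASS_I):
    """Completion time of a task when the machine runs at a given level."""
    _, rel_freq = levels[level]
    return etc_ji / rel_freq

def relative_energy(etc_ji, level, levels=CLASS_I):
    """Energy in arbitrary units, assuming power ~ voltage^2 * frequency."""
    volt, rel_freq = levels[level]
    return volt ** 2 * rel_freq * scaled_time(etc_ji, level, levels)

for level in (0, 5):
    print(level, scaled_time(100.0, level), relative_energy(100.0, level))
```

Under these assumptions a lower level lengthens the task but lowers its energy, which is the trade-off the hierarchical optimization has to balance against the makespan.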
6 Scheduling scenarios and objectives
The DVFS-based energy model for the grid system defined in the previous section is now used to specify two scheduling scenarios and to define the scheduling criteria.
6.1 Scheduling representation
The solutions of the scheduling problem addressed in this paper (schedules) can be encoded as the permutation strings (with and without repetitions) of task and machine labels. We consider in this paper two different encoding methods of schedules, namely direct representation and permutation-based representation.
The cardinality of \(\mathcal{S}\) is m^{n}.
The cardinality of \(\mathcal{S}_{(1)}\) is n!.
In this representation some additional information about the numbers of tasks assigned to each machine is required. Therefore, we defined a vector v=[v_{1},…,v_{m}]^{T} of the size m, where v_{i} denotes the number of tasks assigned to the machine i.
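The relation between the two encodings can be sketched as a pair of conversion routines. The sketch assumes that a direct schedule stores, for each task label, the label of its machine, and that a permutation-based schedule lists task labels grouped by machine together with the vector v of per-machine task counts; ordering tasks by ascending label inside each machine is an arbitrary choice made here for illustration.

```python
def direct_to_permutation(schedule, m):
    """schedule[j-1] is the machine label (1..m) assigned to task j."""
    perm, v = [], []
    for machine in range(1, m + 1):
        tasks = [j + 1 for j, x in enumerate(schedule) if x == machine]
        perm.extend(tasks)      # tasks of this machine, in label order
        v.append(len(tasks))    # v_i = number of tasks on machine i
    return perm, v

def permutation_to_direct(perm, v):
    """Inverse mapping: rebuild the direct schedule from (perm, v)."""
    schedule = [0] * len(perm)
    pos = 0
    for machine, count in enumerate(v, start=1):
        for task in perm[pos:pos + count]:
            schedule[task - 1] = machine
        pos += count
    return schedule

perm, v = direct_to_permutation([2, 1, 2, 3, 1], m=3)
print(perm, v)
```

The vector v is what makes the permutation string decodable: without it, the boundaries between machines in the string would be lost.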
6.2 Scheduling scenarios
The problem of scheduling tasks in CG is multi-objective in its general setting as the quality of the solutions can be measured under several criteria.
Two basic models are utilized in multi-objective optimization: hierarchical and simultaneous modes. In the simultaneous mode (s) all objective functions are optimized simultaneously while in the hierarchical (h) case, the objectives are sorted a priori according to their importance in the model. The process starts by optimizing the most important criterion. When further improvements are impossible, the second criterion is optimized under the restriction of keeping unchanged (or improving) the optimal values of the first one. In this paper we define the scheduling problem as a discrete 2-step hierarchical global optimization procedure, where: (i) a makespan is considered as a dominant criterion, and it is minimized in the first step, and (ii) in the second step a total energy consumption is minimized with the assumption that the makespan value does not increase.
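The two-step hierarchical procedure can be expressed as two acceptance rules. This is a minimal sketch under the assumption that each schedule is summarized by a (makespan, energy) pair; the function names and the tolerance `eps` are hypothetical.

```python
def accept_step1(candidate, incumbent, eps=1e-9):
    """Step 1: only the makespan matters; lower is better."""
    return candidate[0] < incumbent[0] - eps

def accept_step2(candidate, incumbent, eps=1e-9):
    """Step 2: minimize energy while the makespan must not increase."""
    makespan_kept = candidate[0] <= incumbent[0] + eps
    energy_improved = candidate[1] < incumbent[1] - eps
    return makespan_kept and energy_improved

best = (10.0, 5.0)                       # (makespan, energy)
print(accept_step2((10.0, 3.0), best))   # same makespan, less energy
print(accept_step2((11.0, 1.0), best))   # less energy, but longer makespan
```

The second rule never trades makespan for energy, which is what distinguishes the hierarchical mode from a simultaneous (for example, weighted-sum) formulation.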
- 1. I: Max-Min Mode, in which each machine works at the maximal DVFS level during the computations and turns into the idle mode after the execution of all tasks assigned to this machine;
- 2. II: Modular Power Supply Mode, in which each machine may work at different DVFS levels during the task executions and then may turn into the idle mode.
The first mode seems to be the most effective in the case of low-power devices or services defined as “machines” (resources) in the system. No modification of the standard scheduling procedures and standard scheduling objectives, such as makespan, flowtime, tardiness, etc., is necessary. The second mode may be a good candidate for a testbed architecture for future-generation grid systems. The optimal power supply levels can be specified for each of the current devices (machines), which can be replaced in the future by next-generation low-power devices to keep (or improve) the energy consumption at the optimal level.
In the following two subsections we define the procedures of calculation of the makespan and total energy consumed in the system in these two above mentioned scenarios.
6.3 Makespan optimization
For the machine with the maximal completion time (makespan) the idle factor is zero.
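A small sketch shows how the makespan and the idle times follow from a direct schedule. It assumes that the completion time of a machine is its ready time plus the ETC values of the tasks assigned to it, and it takes the idle time of a machine to be the gap between the makespan and that completion time; both are assumptions made for illustration, with 0-based machine indices and hypothetical function names.

```python
def completion_times(schedule, etc, ready):
    """schedule[j] is the 0-based machine index of task j."""
    completion = list(ready)                 # start from the ready times
    for j, machine in enumerate(schedule):
        completion[machine] += etc[j][machine]
    return completion

def makespan_and_idle(schedule, etc, ready):
    completion = completion_times(schedule, etc, ready)
    makespan = max(completion)
    # The machine that defines the makespan has zero idle time.
    idle = [makespan - c for c in completion]
    return makespan, idle

etc = [[2.0, 4.0], [3.0, 6.0], [1.0, 2.0]]   # 3 tasks, 2 machines
print(makespan_and_idle([0, 1, 0], etc, ready=[1.0, 0.0]))
```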
6.4 Energy optimization
The second step of the scheduling optimization procedure is the minimization of the total energy consumed in CG for scheduling a given batch of tasks. We assume the minimal power supply for each machine in the idle mode and maximal power and voltage supply in reloading process.
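For the Max-Min Mode, the stated assumptions suggest a simple accounting, sketched below. This is not the energy formula of the paper: it assumes that power is proportional to voltage squared times relative frequency, that a machine spends its whole completion time (reloading plus execution) at the top DVFS level and the rest of the makespan at the lowest level, and that the constant `gamma` collects all machine-specific factors. All names are hypothetical.

```python
def energy_max_min(completion, v_max, f_max, v_min, f_min, gamma=1.0):
    """Total energy (arbitrary units) for one batch in Max-Min Mode.

    completion[i] is the completion time of machine i, ready time included.
    """
    makespan = max(completion)
    total = 0.0
    for c in completion:
        busy = c                  # reloading and execution at the top level
        idle = makespan - c       # remaining time at the lowest level
        total += gamma * (v_max ** 2 * f_max * busy +
                          v_min ** 2 * f_min * idle)
    return total

# Class I levels: top level (1.5 V, 1.0), lowest level (0.9 V, 0.4).
print(energy_max_min([4.0, 6.0], 1.5, 1.0, 0.9, 0.4))
```

In the Modular Power Supply Mode the busy term would instead be summed task by task, each with its own voltage and frequency pair.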
7 Green-HGS-Sched: energy-aware hierarchical genetic scheduler
An exploration of the search space in grid scheduling is very complex, mainly because of the sheer size of the solution space and the system dynamics. The search space is determined by the permutations of tasks or machines’ labels, but the lengths of these permutation strings may vary as the numbers of tasks and/or machines can change over time. Additional probability distributions should be then specified for an estimation of the system states in considered time intervals.
In this paper we adapted to green grid scheduling the Hierarchic Genetic Scheduler (Green-HGS-Sched) developed in [25] as an effective multi-level hierarchic alternative to single-population genetic-based schedulers. The main aim of Green-HGS-Sched is a comprehensive exploration of the scheduling landscape by the execution of many dependent evolutionary processes. This scheduler can be modeled as a multi-level decision tree. The search process starts by activating a scheduler with the lowest possible accuracy of search, which is interpreted as the “core” in the tree model. This process is responsible for the management of the whole search process and for the detection of promising partial solutions. More accurate processes are activated in the neighborhoods of those partial solutions to prevent the premature convergence of the scheduler and to possibly improve the best solutions found in the system. The activation of these processes does not significantly increase the complexity of the hierarchic scheduler, for three main reasons: (i) unlike hybrid strategies, where the components are usually composed of various meta-heuristics and local search methods, we use the same general framework for the algorithms working at all levels of the tree; (ii) the tree extension is steered by the specialized operations responsible for the deactivation of the ineffective processes and by the effectiveness of the search in the core of the tree; (iii) finally, the synchronization of the search is provided ‘horizontally’ at each level of the tree, so there is no need to refer to the parental nodes, which enables an easy adaptation to the actual system state. For all these reasons Green-HGS-Sched differs significantly from the existing hierarchical, hybrid and branching schedulers applied to various grid scheduling problems and classical job-shop problems (see e.g. [8]).
Each branch of the tree is created by an active genetic algorithm designed for solving the scheduling problems in CGs. The accuracy of search in Green-HGS-Sched branches is defined by the degree parameter with lowest value 0 set for the core of the system.
e∈ℕ defines the global metaepoch counter;
t∈{1,…,M},M∈ℕ and M is the maximal degree of the branches;
r is the number of branches of the same degree.
In the case of the conditional sprouting of the new branches of the degree t+1 from the parental branch of the degree t the keys are calculated for the best individual in the parental branch and individuals in all populations in all active branches of the degree t+1. If there is any individual in the higher degree branches, for which the key matches the key of the best adapted individual in the parental branch, then the value of BC is 1 and no branch of the degree t+1 is sprouted.
In the case of the comparison of the branches of the same degree t, all branches in which there exist individuals with identical keys have to be reduced and a single joint branch is created (the value of BC is 1). The individuals in this branch are selected from the “youngest” (in the sense of the population evolution) populations in all reduced branches.
It has been shown in [47] and [25] that the hash technique can significantly reduce (by 50–70 %) the execution time of genetic algorithms in which the similarity of solutions must be detected.
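The conditional-sprouting test described above can be sketched with a set of keys. The key function used in the paper is not reproduced here, so the sketch substitutes the built-in hash of the direct-representation tuple, which only detects identical schedules; `schedule_key` and `branch_comparison` are hypothetical names.

```python
def schedule_key(schedule):
    """Stand-in key: hash of the schedule as an immutable tuple."""
    return hash(tuple(schedule))

def branch_comparison(best_parent, higher_degree_populations):
    """Return 1 if the best parental individual is already represented in
    some active branch of degree t+1 (no new branch is sprouted), else 0."""
    target = schedule_key(best_parent)
    keys = {schedule_key(individual)
            for population in higher_degree_populations
            for individual in population}
    return 1 if target in keys else 0

branches = [[[1, 1, 0], [0, 1, 0]], [[2, 2, 1]]]
print(branch_comparison([0, 1, 0], branches))
```

Storing keys in a set makes each lookup a constant-time operation on average, instead of an element-by-element comparison against every individual.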
7.1 Genetic Engine in Green-HGS-Sched
We apply the direct representation of the schedules in the base populations P^{t} and P^{t+1}, and permutation representation in \(P_{c}^{t}\) and \(P_{m}^{t}\) populations to implement the crossover and mutation operators. An initial population is generated randomly. Based on our previous results of implementation of genetic-based meta-heuristics to green scheduling in grids we use the following configuration of genetic operations in the main loop of the Algorithm 1: (i) Linear Ranking as selection scheme, (ii) Cycle Crossover (CX) operator and (iii) Move mutation method [36].
In Cycle Crossover (CX) each task in a chromosome must occupy the same position, so that only interchanges between alleles (positions) can be made. First, a cycle of alleles is identified. The crossover operator leaves the cycles unchanged, while the remaining segments of the parental strings are exchanged. In Move mutation, a task is moved from one machine to another. Although the task can be appropriately chosen, this mutation strategy tends to unbalance the number of tasks per machine.
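Both operators can be sketched briefly. The crossover below is the textbook single-cycle variant of CX on permutations (the cycle starting at the first position is kept, the remaining positions are exchanged); the mutation is shown on the direct representation for brevity, with 0-based machine indices. The paper's operators may differ in detail, and the function names are hypothetical.

```python
import random

def cycle_crossover(p1, p2):
    """Keep the cycle through position 0, swap all other positions."""
    position_in_p1 = {gene: k for k, gene in enumerate(p1)}
    cycle, k = set(), 0
    while k not in cycle:
        cycle.add(k)
        k = position_in_p1[p2[k]]   # follow the allele of p2 back into p1
    n = len(p1)
    c1 = [p1[k] if k in cycle else p2[k] for k in range(n)]
    c2 = [p2[k] if k in cycle else p1[k] for k in range(n)]
    return c1, c2

def move_mutation(schedule, m, rng=random):
    """Move one randomly chosen task to a different machine (0..m-1)."""
    child = list(schedule)
    if m < 2 or not child:
        return child
    j = rng.randrange(len(child))
    child[j] = rng.choice([i for i in range(m) if i != child[j]])
    return child

print(cycle_crossover([1, 2, 3, 4, 5, 6, 7, 8], [8, 5, 2, 1, 3, 6, 4, 7]))
```

Because every position of a child takes its allele from one of the parents at the same position, both children remain valid permutations.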
The possibly high computational cost of the struggle strategy may be reduced by implementing a hash technique, as proposed in the previous section for the BC operator. Using the struggle replacement mechanism in genetic grid schedulers allows fine tuning of the scheduler so that it “converges” to a good solution within the available time (for instance, the scheduler’s activation interval) [47].
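A minimal sketch of struggle replacement follows, under the usual reading of the strategy: an offspring competes with the most similar individual in the population and replaces it only if it is fitter. Similarity is measured here by the Hamming distance between direct-representation schedules, which is an assumption; the fitness is minimized, and all names are hypothetical.

```python
def hamming(a, b):
    """Number of tasks assigned to different machines in two schedules."""
    return sum(x != y for x, y in zip(a, b))

def struggle_replace(population, fitness, offspring, offspring_fitness):
    """Replace the individual most similar to the offspring if the offspring
    has a lower (better) fitness. Both lists are modified in place."""
    nearest = min(range(len(population)),
                  key=lambda k: hamming(population[k], offspring))
    if offspring_fitness < fitness[nearest]:
        population[nearest] = list(offspring)
        fitness[nearest] = offspring_fitness
        return True
    return False

population = [[0, 0, 0], [1, 1, 1]]
fitness = [5.0, 3.0]
print(struggle_replace(population, fitness, [1, 1, 0], 2.5), population)
```

The linear scan for the nearest individual is the costly part of the strategy, which is where a hash-based similarity key can save time.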
8 Empirical evaluation of the hierarchical genetic schedulers
In this section we present the results of the empirical evaluation of two implementations of Green-HGS-Sched, namely HGS-Elit and HGS-St, with Elitist Generational and Struggle replacement mechanisms (respectively) in the branches. We compare the efficiency of the hierarchical schedulers with the results achieved by single-population GAs and Island Models with the same configuration of genetic operators and parameters. The experiments were conducted in two “energetic” scenarios, namely Max-Min Mode and Modular Power Supply Mode, defined in Sect. 6.2. For simulating various grid size scenarios in static and dynamic modes we used the Energy-aware HyperSim-G grid simulator introduced in [26]. The main idea of the simulator is presented in Sect. 8.1. The empirical results are analyzed in Sect. 8.3.
8.1 Energy-aware HyperSim-G grid simulator
workload vector of tasks;
computing capacities of machines;
prior loads of machines;
machine categories specification parameters (number of classes, maximal computational capacity value, computational capacity ranges interval for each class, machine operational speed parameter for each class, etc.);
DVFS levels matrix for machine categories; and
the ETC matrix.
The input data is needed to generate the scheduling event, which is passed to the selected scheduler in order to compute the optimal schedule(s). Finally, the scheduler sends the optimal schedule(s) back to the simulator, which allocates the resources and simulates the computation process.
The instances produced by the simulator for our experiments are divided into static and dynamic grid scheduling benchmarks. In the static case, the number of tasks and the number of machines remain constant during the simulation, while in the dynamic case, both parameters may vary over time. In both static and dynamic cases four Grid size scenarios are considered: (a) small (32 hosts/512 tasks), (b) medium (64 hosts/1024 tasks), (c) large (128 hosts/2048 tasks), and (d) very large (256 hosts/4096 tasks).
Number of hosts: Number of resources in grid;
MIPS: A probability distribution specified for modeling the computing capacity of resources;
Total tasks: Number of tasks in a given batch;
Workload: A probability distribution used for modeling the workload of tasks;
Host selection: Selection policy of resources (the value all means that all resources of the system are selected for scheduling purposes);
Task selection: Selection policy of tasks (the value all means that all tasks in the system must be scheduled);
Number of runs: Number of simulations done with the same parameters, reported results are then averaged over this number.
Values of key parameters of the grid simulator in static and dynamic cases
Parameter | Small | Medium | Large | Very large
---|---|---|---|---
Static case | | | |
Nb. of hosts | 32 | 64 | 128 | 256
Resource cap. (in MHz CPU) | N(5000,875) | | |
Total nb. of tasks | 512 | 1024 | 2048 | 4096
Workload of tasks | N(250000000,43750000) | | |
Dynamic case | | | |
Init. hosts | 32 | 64 | 128 | 256
Max. hosts | 37 | 70 | 135 | 264
Min. hosts | 27 | 58 | 121 | 248
Resource cap. (in MHz CPU) | N(5000,875) | | |
Add host | N(625000,93750) | N(562500,84375) | N(500000,75000) | N(437500,65625)
Delete host | N(625000,93750) | | |
Total tasks | 512 | 1024 | 2048 | 4096
Init. tasks | 384 | 768 | 1536 | 3072
Workload | N(250000000,43750000) | | |
Interarrival | E(7812.5) | E(3906.25) | E(1953.125) | E(976.5625)
In the dynamic case we have to specify the minimal and maximal values for numbers of tasks and machines in the system. The resources can be dropped or added to grid with the frequencies defined by the Gaussian distributions (add host and delete host parameters). New tasks may arrive in the system with frequency parameter denoted by interarrival, until a total tasks value is reached. An Activation parameter establishes the activation policy according to an exponential distribution. The already scheduled tasks that have not been executed yet will be rescheduled if reschedule is true.
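The arrival of tasks in the dynamic case can be illustrated with a short sketch. It assumes that the interarrival parameter is the mean of an exponential distribution (read here as 7812.5 for the Small instance) and that the initial tasks are all present at time zero; neither detail is stated in the text, and `arrival_times` is a hypothetical helper rather than part of the simulator.

```python
import random

def arrival_times(total_tasks, init_tasks, mean_interarrival, seed=0):
    """Arrival time of every task until total_tasks is reached."""
    rng = random.Random(seed)
    times = [0.0] * init_tasks          # tasks available from the start
    t = 0.0
    while len(times) < total_tasks:
        # expovariate takes the rate, i.e. the inverse of the mean
        t += rng.expovariate(1.0 / mean_interarrival)
        times.append(t)
    return times

times = arrival_times(total_tasks=512, init_tasks=384,
                      mean_interarrival=7812.5)
print(len(times), times[384] > 0.0)
```

Host additions and removals could be sampled in the same way from the Gaussian parameters of the add host and delete host rows.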
We consider 16 DVFS levels for three “energetic” resource classes: Class I, Class II and Class III presented in Table 1.
8.2 Scheduling meta-heuristics and performance measures
Six GA-based grid schedulers evaluated in the experimental analysis
Scheduler | Type of algorithm | Replacement method |
---|---|---|
GA-Elit | Single-population GA | Elitist Generational |
GA-St | Single-population GA | Struggle |
IGA-Elit | Island GA | Elitist Generational |
IGA-St | Island GA | Struggle |
HGS-Elit | Green-HGS-Sched | Elitist Generational |
HGS-St | Green-HGS-Sched | Struggle |
The aforementioned methodologies differ in the implementation of the replacement mechanism in the main genetic framework. We used Elitist Generational replacement in xxx-Elit algorithms and Struggle procedure in xxx-St algorithms. Both single-population GAs, namely GA-Elit and GA-St, are implemented as the main genetic mechanism in IGA-Elit, HGS-Elit, IGA-St and HGS-St respectively.
GA setting for static and dynamic benchmarks
Parameter | GA-Elit | GA-St |
---|---|---|
Evolution steps | 5∗m | 20∗m |
Pop. size \((\mathit{pop\_size})\) | ⌈(log_{2}(m))^{2}−log_{2}(m)⌉ | 4∗(log_{2}(m)−1) |
Intermediate pop. | \(\mathit{pop\_size}-2\) | \((\mathit{pop\_size})/3\) |
Cross probab. | 1.0 | 1.0 |
Mutation probab. | 0.2 | |
\(\mathit{max\_time\_to\_spend}\) | 30 s (static)/45 s (dynamic) |
Green-HGS-Sched settings for static and dynamic benchmarks
Parameter | |
---|---|
\(\mathit{period\_of\_metaepoch}\) | 20∗n |
\(\mathit{nb\_of\_metaepochs}\) | 10 |
Degrees of branches (t) | 0 and 1 |
Population size in the core | 3∗(⌈4∗(log_{2}n−1)/(11.8)⌉) |
Population size in the sprouted branches \((b\_\mathit{pop\_size})\) | (⌈(4∗(log_{2}n−1))/(11.8)⌉) |
Intermediate pop. in the core | \(\mathit{abs}((r\_\mathit{pop\_size})/3)\) |
Intermediate pop. in the sprouted branch | \(\mathit{abs}((b\_\mathit{pop\_size})/3)\) |
Cross probab. | 0.9 |
Mutation probab. in core | 0.4 |
Mutation probab. in the sprouted branches | 0.2 |
\(\mathit{max\_time\_to\_spend}\) | 40 s (static)/70 s (dynamic) |
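As a worked illustration of the Green-HGS-Sched size formulas, take \(n = 512\), a value chosen here only because it gives round numbers. The population size in a sprouted branch is then

$$ b\_\mathit{pop\_size} = \left\lceil \frac{4\,(\log_{2} 512 - 1)}{11.8} \right\rceil = \left\lceil \frac{32}{11.8} \right\rceil = \lceil 2.71\ldots \rceil = 3, $$

and the population size in the core is three times larger:

$$ r\_\mathit{pop\_size} = 3 \cdot 3 = 9. $$

The intermediate populations follow as \(9/3 = 3\) individuals in the core and \(3/3 = 1\) individual in a sprouted branch, and one metaepoch lasts \(20 \cdot 512 = 10240\) steps. The hierarchical scheduler therefore works with much smaller populations than the single-population GAs, which is consistent with its role of running several cooperating processes within the same time limit.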
Configuration of IGA algorithm
Parameter | |
---|---|
\(it_{d}\) | \(20 \cdot n\) |
mig | 5 % |
Number of islands (demes) | 10 |
Cross probab. | 1.0 |
Mutation probab. | 0.2 |
\(\mathit{max\_time\_to\_spend}\) | 40 s (static)/70 s (dynamic) |
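The migration step of an island model with the mig rate listed above can be sketched as follows. The configuration does not specify the topology or the replacement policy, so the ring topology, the choice of the best individuals as emigrants, and the replacement of the worst residents are all assumptions made for illustration. Fitness is minimized.

```python
import math


def migrate_ring(islands, fitness, rate=0.05):
    """Send the best `rate` share of each island to the next island in a ring."""
    # Pick emigrants first so that all islands exchange simultaneously.
    emigrants = []
    for island in islands:
        k = max(1, math.ceil(rate * len(island)))   # at least one migrant
        emigrants.append(sorted(island, key=fitness)[:k])

    result = []
    for i, island in enumerate(islands):
        incoming = emigrants[i - 1]                  # index -1 wraps to the last island
        keep = max(0, len(island) - len(incoming))
        survivors = sorted(island, key=fitness)[:keep]   # drop the worst residents
        result.append(survivors + incoming)
    return result
```

In a full scheduler this function would be called after each island has evolved independently for a fixed number of generations, which is presumably the role of the \(it_{d}\) parameter.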
- minimal makespan, defined as follows:
$$ \mathit{makespan} = \min \{\mathit{Makespan}_I, \mathit{Makespan}_{\mathit{II}}\} \tag{32} $$
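A minimal sketch of how this measure could be computed is given below. It assumes that \(\mathit{Makespan}_I\) and \(\mathit{Makespan}_{\mathit{II}}\) are the makespans obtained in the two energetic scheduling modes, that a schedule is a list mapping each task index to a machine index, and that execution times come from a hypothetical expected-time-to-compute matrix `etc`; none of these names are taken from the simulator itself.

```python
def makespan(schedule, etc, ready=None):
    """Largest machine completion time for a task-to-machine assignment.

    schedule[j] is the machine assigned to task j and etc[j][i] is the
    time task j needs on machine i (both are illustrative structures).
    """
    n_machines = len(etc[0])
    completion = list(ready) if ready is not None else [0.0] * n_machines
    for task, machine in enumerate(schedule):
        completion[machine] += etc[task][machine]
    return max(completion)


def minimal_makespan(makespan_mode_1, makespan_mode_2):
    """Equation (32): the smaller of the two mode makespans."""
    return min(makespan_mode_1, makespan_mode_2)
```

For instance, with three tasks on two machines, two candidate schedules can be compared by calling `makespan` on each and passing both values to `minimal_makespan`.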
8.3 Results
Both implementations of Green-HGS-Sched achieved the best results in all instances except the Large grid in the static case and the Small and Large instances in the dynamic case, in which they lose to the IGA model. A simple comparison of the impact of the replacement method on the algorithms’ performance, carried out for all pairs of xxx-Elit and xxx-St schedulers, shows that Struggle replacement is much more effective than the Elitist Generational method for the single-population GA and IGA schedulers. This confirms the results of our preliminary study on the effectiveness of single-population genetic schedulers in CGs presented in [26]. For Green-HGS-Sched the situation is completely different: in most cases the effectiveness of the two hierarchical implementations is at a comparable level, with a slight advantage for the elitist technique in the dynamic case. This suggests that the most important factor in Green-HGS-Sched is the fast exploration, by the core of the system, of probably wider regions of the search space than in the GA and IGA implementations. The core can activate more accurate processes in the neighborhoods of partial solutions that remain undetected by the other schedulers, which makes Green-HGS-Sched very effective in exploring new regions of the optimization domain and in escaping the basins of attraction of local solutions. The complexity of the hierarchical system is in fact not a drawback of the scheduler, because the execution time constraints for HGS and IGA are exactly the same. The ranges of the achieved makespan values for all considered meta-heuristics are not greater than 30–45 % of the mean makespan values, which means that the stability of all schedulers is acceptable in all cases.
The distributions of the makespan results are asymmetric: in most of the static instances the skewness is positive for GA and IGA and negative for Green-HGS-Sched, whereas in the dynamic grids it is negative for almost all schedulers. It means that reducing the average makespan in the dynamic case is much harder than in the static case (the mean values are closer to the third quartile than to the first one), which confirms the complexity of the problem in realistic dynamic grid scenarios.
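The skewness and quartile statistics referred to above can be computed as in the following sketch. It uses the moment-based (Fisher-Pearson) coefficient of skewness, which is an assumption because the text does not state which estimator was used; the function names are hypothetical, and any input data used with them here would be illustrative rather than experimental results.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def summarize(values):
    """Mean, quartiles and skewness of a list of makespan values."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "skewness": sample_skewness(values),
    }
```

A negative skewness indicates a long tail towards small values, so most runs cluster near the upper end of the range; a positive skewness indicates the opposite.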
The results of the energy optimization differ significantly from the makespan results. In this case each of the IGA-Elit and GA-Elit algorithms outperforms the remaining schedulers in four instances. Green-HGS-Sched is not as good at energy optimization as at makespan minimization. It means that it works quite well in the Min-Max scenario, so no additional DVS modules are necessary here. The range of the average saving rate values is 10 %–35 % for most of the schedulers, which is rather high. Finally, it can be observed that the skewness of the distribution of the results is positive or neutral for the worst “energy optimizers” and negative for the best ones.
9 Conclusions
In this paper we addressed the problem of optimizing the energy utilized in CGs in independent batch scheduling. Our energy management model is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique adapted to the dynamic grid environment. We formalized the grid scheduling problem as a bi-objective optimization task with makespan and average energy consumption as the main objectives.
To solve the addressed grid scheduling problem, we developed two implementations of an energy-efficient Hierarchical Grid Scheduler, Green-HGS-Sched, and evaluated them experimentally in two ‘energetic’ scheduling modes, in static and dynamic grid scenarios, using makespan and the relative energy consumption improvement rate as the performance measures. Their effectiveness was compared with the results achieved by four single-population Genetic Algorithm (GA) and Island GA schedulers. To carry out the experiments, we integrated all energy-aware schedulers within a grid simulator. The simulation results confirmed the effectiveness of the proposed schedulers in reducing the energy consumed by the whole system and in dynamic load balancing of the resources in grid clusters, which is sufficient to maintain the desired quality levels.
Our model is general in its implementation and can be easily adapted to a particular scenario and a realistic grid infrastructure, such as a large-scale banking system or a highly distributed data system. First, we do not consider any special architectures for grid resources, which means that these characteristics can be specified separately and integrated with the system simulator. The term “task” can also be used for monolithic applications, metatasks or parallel applications represented by Directed Acyclic Graphs. The schedulers are integrated with the main grid simulator as separate modules, and therefore they can be easily modified, extended and hybridized with other algorithms. Finally, we simulate the dynamics of a realistic grid system, in which the availability of the resources and the number of tasks may vary over time.
Acknowledgements
Dr. Kołodziej’s and Dr. Byrski’s research presented here was partially supported by “Biologically inspired mechanisms in planning and management of dynamic environments” grant of Polish National Science Center, No. N N516 500039.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.