1 Introduction

Grid computing has emerged as a wide-area distributed platform for solving large-scale problems in science, engineering, and other domains. A Computational Grid (CG) combines many computing resources into a network for the execution of computational tasks. The resources are distributed across multiple organizations and administrative domains, each with its own access and usage policies and local schedulers. Task scheduling and the effective management of resources in such systems remain complex problems and therefore demand sophisticated tools for analyzing the performance of scheduling algorithms before applying them to real systems.

The main issues related to energy efficiency have been brought to the fore by the large scale of enterprise computing environments and data centers. Owing to the importance of power and energy consumption in present and future computing and communication systems, various techniques and technologies have been investigated and developed. However, these solutions are mainly related to an optimization of the system thermodynamics [37]. This requires profiles of hardware and application energy consumption, and the correlation between workload distribution and the energy consumed by power delivery and cooling [12].

While CGs have been widely promoted as a cheap alternative to supercomputers, a significant disproportion between resource availability and resource provisioning may be observed in such systems [21]. Therefore, current efforts in grid computing research focus on the design of new, effective grid schedulers that can simultaneously optimize the key grid objectives, such as makespan, flowtime, resource utilization and cumulative energy consumption [11]. Energy-efficient scheduling in CGs is a complex endeavor due to multiple constraints, various optimization criteria and the different priorities of the resource owners. The information and data processed in a large-scale dynamic grid environment may be incomplete, imprecise, fragmentary or overwhelming, which complicates the assignment decisions and the assessment of resource availability, and may increase the amount of energy used in the system [43]. Heuristic approaches have shown great potential in solving many demanding, real-world decision and optimization problems in uncertain large-scale environments, and they seem to be an effective means of designing energy-aware grid schedulers in CGs [27, 28].

The main objective of this work is to define an effective genetic-based batch scheduler that can be easily implemented in a dynamic grid environment and enables an energy-aware allocation of grid resources. We address the Independent Batch Scheduling problem in CGs, where tasks are processed in batch mode and there are no dependencies among them. This scheduling scenario illustrates many realistic grid applications [ref]. We define two main scheduling criteria, which are optimized in hierarchical mode, namely makespan as the privileged criterion and average energy consumption. We use the Dynamic Voltage Scaling (DVS) methodology for reducing the cumulative energy utilized by the system resources. Based on the results of our preliminary study on the effectiveness of mono-population genetic-based schedulers in energy-aware grid scheduling [26, 29], we developed two implementations of the hierarchical Green-HGS-Sched genetic scheduler and provide an empirical evaluation in two “energetic” scheduling modes in static and dynamic grid scenarios. The performance of these hierarchical schedulers has been measured using the makespan and relative energy consumption improvement rate metrics. The effectiveness of the Green-HGS-Sched implementations was compared with the results achieved by four single-population Genetic Algorithm (GA) and Island GA schedulers [46]. All schedulers have been integrated with the grid simulator.

The remainder of this paper is structured as follows. Related work is discussed in Sect. 2, and the addressed scheduling problem is specified in Sect. 3. The ETC matrix model is presented in Sect. 4 and the generic energy model is defined in Sect. 5; the main scheduling scenarios and criteria follow in Sect. 6. The Green-HGS-Sched framework and genetic operators are presented in Sect. 7. Section 8 presents the results of an empirical analysis of the effectiveness of the hierarchical, island and mono-population schedulers. The paper is summarized in Sect. 9.

2 Related work

Numerous interesting research projects have recently been carried out in the area of energy-aware resource management in modern large-scale distributed computing systems. Based on the taxonomy defined for cloud computing in [15], the power and energy management methodologies in distributed computing environments can be classified into two main categories, namely static energy management (SEM) methods and dynamic energy management (DEM) techniques, as presented in Fig. 1 (see also [28]).

Fig. 1
figure 1

Taxonomy of energy and power management techniques in large-scale distributed computing environments [28]

The static management methodologies usually operate at the hardware level. In such systems, the physical computational devices can be replaced by low-power battery machines or nano-processors, and the system workload can be effectively distributed. This allows the energy utilized for computing applications, storage and data transfer to be optimized by reducing the number of idle devices and the idle periods of active processors. The major projects based on static power management include the Green Destiny [45], FAWN [3] and Gordon [10] projects.

The Dynamic Voltage and Frequency Scaling (DVFS) method has recently become a key dynamic power management methodology supporting energy-efficient scheduling in grids and large-scale data systems [30, 49]. In most DVFS approaches the scheduling has been defined as a classical or dynamic load balancing problem. Khan and Ahmad [21] have successfully used the game theory paradigm for the optimization of the system performance and energy consumption. Several research works have used similar models and approaches to address various problems related to large-scale computing systems, such as energy proportionality [17, 20], memory-aware computations, data-intensive computations, and energy-efficient grid scheduling [22, 39]. Many interesting examples of recently developed static and dynamic power and energy management techniques in distributed computing environments are presented in the surveys [5, 40, 43, 44].

Although a significant volume of research has been devoted to energy-efficient scheduling and resource allocation in large-scale computing systems, only a rather small family of energy-aware genetic-based grid and cloud schedulers has been developed so far. Most of these approaches require the implementation of specially designed genetic operators, such as partially matched or cycle crossover and swap or rebalancing mutation mechanisms, primarily designed for solving complex combinatorial optimization problems [25]. The energy consumed by the system is usually just one of the components of a multi-objective fitness function.

In [41] and [42] Shen et al. present a shadow price technique for improving the genetic operations in a standard GA used as a scheduler in a computational cloud. The “shadow price” for a task-machine pair is defined as the average energy consumption per instruction for a processor that can operate at different voltage levels. The classical move and swap mutation operations are then used for an optimal mapping of tasks to machines. The fitness function of this GA scheduler is expressed as the total energy consumption.

Kessaci et al. in [19] present two versions of a multi-objective parallel Genetic Algorithm (MOPGA) hybridized with energy-conscious scheduling heuristics (ECS). The GA engine is based on the concepts of the island GA and multi-start GA models. The authors consider parallel applications represented by a directed acyclic graph (DAG), which are mapped onto multi-processor machines. The voltage and frequency of the processors are scaled at 16 discrete levels, and genes in the GA chromosomes are defined by the task-processor labels and the processor voltage. The objective function is composed of two criteria: the privileged makespan and the total energy consumption in the system. The reduction of the energy utilization achieved in the experimental analysis is about 47.4 %.

The solution presented in [19] is dedicated to general computing and embedded systems. An application of such methodology in computational cloud is demonstrated by Mezmaz et al. in [34]. The energy conservation rate in cloud system is very similar to the results obtained in the general case.

Another hybrid GA approach is presented by Miao et al. in [35]. The authors propose a multi-objective genetic algorithm which is hybridized with simulated annealing for the improvement of the local solutions.

3 Independent batch scheduling problem in computational grids

Due to the high parametrization, sheer size and dynamics of the grid system, scheduling problems in grids may in fact be considered a family of NP-complete optimization problems [13]. Depending on the requirements of the grid users, the complexity of the problem is determined by the number of objectives to be optimized, the type of environment (static or dynamic), task processing (immediate or batch), task interrelations (independence or dependency), grid resource management (centralized, decentralized or hierarchical), and many other factors. To achieve the desired performance of the system, both the users’ conditions and grid environment information must be “embedded” into the scheduling mechanism [1], [25].

The main attributes of grid scheduling are presented in Fig. 2.

Fig. 2
figure 2

Main scheduling attributes in CGs

In this paper we address the Independent Batch Scheduling problem in CGs. In this problem, it is assumed that tasks are grouped into batches and can be executed independently in static or dynamic grid environments. The scheduling attributes needed for the specification of this problem are highlighted in Fig. 2 as dark blue text boxes. The generic independent batch scheduling model is effective in the massively parallel processing of applications that require large amounts of data. Therefore, there are many realistic scenarios, including banking systems, virtual campuses, health systems, bio-informatics applications, and many others, where independent batch scheduling is successfully applied. However, even with independent tasks and batch processing, the problem is computationally hard to solve.

The independent batch scheduling procedure can be realized in the following six steps:

  (i) Get the information on available resources;

  (ii) Get the information on pending tasks;

  (iii) Get the information on data hosts holding the data files required for task completion;

  (iv) Prepare a batch of tasks and compute a schedule for that batch on the available machines and data hosts;

  (v) Allocate tasks;

  (vi) Monitor (failed tasks are re-scheduled).

In Fig. 3 we present a simple graphical flowchart of the batch scheduling phases.

Fig. 3
figure 3

Main phases of the batch scheduler in CGs
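
As an illustration of the six steps listed above, the following minimal sketch (Python) outlines one round of such a batch scheduling loop; all callables passed to it are hypothetical placeholders for the grid middleware interface, not part of the actual simulator.

```python
def batch_scheduling_round(get_resources, get_pending_tasks, get_data_hosts,
                           compute_schedule, allocate, monitor):
    """One round of the independent batch scheduling loop, steps (i)-(vi)."""
    machines = get_resources()                      # (i)   available resources
    tasks = get_pending_tasks()                     # (ii)  pending, independent tasks
    data_hosts = get_data_hosts(tasks)              # (iii) hosts holding the required files
    if not tasks:
        return []
    schedule = compute_schedule(tasks, machines, data_hosts)   # (iv) batch schedule
    allocate(schedule)                              # (v)   dispatch tasks to machines
    failed = monitor(schedule)                      # (vi)  watch execution
    return failed                                   # failed tasks join the next batch
```

Failed tasks returned by one round would simply be merged into the task pool of the next batch, as indicated in step (vi).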

To the best of our knowledge, there is no standard notation for the classification of scheduling problems in CGs. A simple extension of the conventional Graham [16] and Brucker [8] classifications of scheduling problems has been proposed by Fibich et al. [14]. The characteristics of the resource-constrained project scheduling problem [9] and resource-constrained machine scheduling [6, 7] may be helpful in the specification and formal description of grid resources.

Based on the methodology presented in [14] and [23], and the main scheduling attributes presented in Fig. 2, the instance of the independent batch grid scheduling problem considered in this paper can be denoted by using the following expression:

$$ Rm \bigl[\bigl\{ b, \mathit{indep}, (\mathit{stat}, \mathit{dyn}), h\bigr\}\bigr] \bigl(C_{max}, E_{I}(E_{II})\bigr) $$
(1)

where:

  • Rm—according to Graham’s notation, the tasks are mapped onto (parallel) resources of various speeds;

  • b—means that the task processing mode is “batch mode”;

  • indep—denotes “independency” as the task interrelation;

  • (stat,dyn)—means that we consider both static and dynamic grid scheduling modes;

  • h—means that the scheduling objectives are optimized in hierarchical mode;

  • C_max—denotes the makespan as the privileged scheduling objective;

  • E_I (E_II)—denotes the total energy consumption as the second scheduling criterion (E_I or E_II = E_batch is selected depending on the scheduling scenario; see Sect. 6.4).

Most of these parameters will be explained in Sects. 4, 5 and 6.

4 Expected time to compute (ETC) matrix model

In order to estimate the execution times of tasks on machines we use the Expected Time to Compute (ETC) matrix model [2] adapted to independent batch scheduling. It is assumed in this model that each task can be executed on only one grid node in each batch and that no preemption of tasks or resources is allowed. In the case of machine failures, the affected tasks are re-scheduled in the next batch; however, the scheduling of tasks in different batches are independent processes. It is also assumed that when a machine processes its tasks, there is no priority distinction between the tasks assigned in previous batches and those assigned in the current batch. Finally, no machine may remain idle while tasks assigned to it are pending, and all tasks assigned to a machine must be executed.

The following notation for tasks and machines will be used throughout the paper [26, 27]:

  • n—is the number of tasks in a batch;

  • m—is the number of machines available in the system for an execution of a given batch of tasks;

  • N={t_1,…,t_n}—denotes the set of tasks in a batch;

  • M={x_1,…,x_m}—denotes the set of machines available for the task batch;

  • N_l={1,…,n}—is the set of task labels;

  • M_l={1,…,m}—is the set of machine labels.

The tasks and machines in the grid systems are characterized by the following general parameters:

  1. (a)

    Task j:

    • wl_j—load parameter expressed in Millions of Instructions (MI); we denote by WL=[wl_1,…,wl_n] the workload vector of all tasks in the batch;

  2. (b)

    Machine i:

    • cc_i—computing capacity parameter expressed in Millions of Instructions Per Second (MIPS); we denote by CC=[cc_1,…,cc_m] the computing capacity vector;

    • ready_i—the ready time of machine i, which expresses the time needed for reloading machine i after finishing the last assigned task; the ready times vector for all machines is denoted by ready_times=[ready_1,…,ready_m].

In this generic model there is no detailed specification of the types of tasks and machines. The tasks can be considered monolithic applications or large-scale metatasks with no dependencies among their components. The workloads of tasks can be estimated based on specifications provided by the users or on historical data, or they can be generated based on system predictions [18]. As machines we usually consider multiprocessor or parallel machines (see the Rm parameter in the notation), or even small local area networks or computational clusters.

For each task-machine pair, the coordinates of the WL and CC vectors can be used to estimate the time needed to compute task j on machine i. These times are denoted by ETC[j][i] (i ∈ M_l, j ∈ N_l) and are calculated in the following way:

$$ \mathit{ETC}[j][i]=\frac{wl_{j}}{cc_{i}}. $$
(2)

All ETC[j][i] parameters are defined as the elements of an ETC matrix, ETC=[ETC[j][i]]_{n×m}, which is the main structure in the ETC model.

In simulation analysis, the wl_j and cc_i parameters are usually generated using the Gamma probability distribution (or the standard Gaussian distribution) [32] in order to express the heterogeneity of tasks and machines in the grid system (see Sect. 8.1).
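
For illustration, the following sketch (Python/NumPy) builds the WL and CC vectors and the ETC matrix of (2) with Gamma-distributed parameters; the shape and scale values are arbitrary examples, not the settings used in the experiments.

```python
import numpy as np

def generate_etc(n_tasks, n_machines, wl_shape=2.0, wl_scale=1.25e5,
                 cc_shape=2.0, cc_scale=500.0, seed=0):
    """Return the workload vector WL (MI), the computing capacity vector CC (MIPS)
    and the n x m ETC matrix with ETC[j][i] = wl_j / cc_i."""
    rng = np.random.default_rng(seed)
    wl = rng.gamma(wl_shape, wl_scale, size=n_tasks)       # task workloads (MI)
    cc = rng.gamma(cc_shape, cc_scale, size=n_machines)    # machine capacities (MIPS)
    etc = wl[:, None] / cc[None, :]                        # expected times to compute
    return wl, cc, etc

wl, cc, etc = generate_etc(n_tasks=512, n_machines=32)
print(etc.shape)   # (512, 32)
```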

5 Energy model

The energy model presented in this paper is based on the power consumption model for complementary metal-oxide semiconductor (CMOS) logic circuits [4]. In this model, the capacitive power P_ij consumed by machine i for computing task j is calculated in the following way:

$$ P_{ij}= A \cdot C \cdot v^2 \cdot f, $$
(3)

where A is the number of switches per clock cycle, C is the total capacitance load, v is the supply voltage and f is the machine’s frequency. It is assumed that the operating frequency of each machine is approximately proportional to its processing speed (see [33]). Decreasing the supply voltage and frequency reduces the energy consumed by the machine.

We assume that each machine in the grid system is equipped with a Dynamic Voltage and Frequency Scaling (DVFS) module [31], which allows the modulation of the supply voltage and operating frequency of this machine. In Table 1 we present the parameters for 16 DVFS levels, which specify three “energetic” categories of grid machines in our system (see also [34]).

Table 1 DVFS levels for three machine classes

For each machine i (i=1,…,m), its “energetic” class is denoted by s_i and is characterized by the following column meta-vector Vr_(i) of DVFS levels with different (reduced) values of the supply voltage and the corresponding relative machine frequency parameters:

$$ Vr_{(i)}= \bigl[(v_{s_0}(i), f_{s_0}(i));\ldots; (v_{s_{l(\max)}}(i), f_{s_{l(\max)}}(i))\bigr]^T $$
(4)

For a lower supply voltage, the operating frequency of the machine decreases, which means that the \(f_{s_{l}}(i)\) coefficients are within the range [0, 1]. We assume in this work that the supply voltage is constant during the calculation (execution) of each task, but may differ between tasks.

Decreasing the machine frequency and supply voltage leads to increased computational times of the tasks executed on that machine. For a given task-machine pair (j,i), the time of completing task j on machine i at the various DVFS levels specified for the class s_i can be defined as the coordinates of a vector \(\widehat{\mathit{ETC}}[j][i]\), calculated by using the following formula:

$$ \widehat{\mathit{ETC}}[j][i]= \biggl[\frac{1}{f_{s_{0}}(i)}\cdot \mathit{ETC}[j][i],\ \ldots,\ \frac{1}{f_{s_{l(\max)}}(i)}\cdot \mathit{ETC}[j][i]\biggr]^{T} $$
(5)

where l(max) denotes the number of DVFS levels in the class s_i, ETC[j][i] is the task execution time calculated according to (2), and \(\{f_{s_{0}}(i),\ldots, f_{s_{l(\max)}}(i)\}\) are the relative frequencies of machine i specified for the class s_i at the s_0,…,s_l(max) DVFS levels.

It can be observed from (5) that the inverses of the relative frequency coefficients approximately estimate the increase in the completion times of tasks on machines. This is a consequence of the previous assumption about the inverse proportion between the completion times of tasks and the frequencies of machines.
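
For illustration, the following minimal sketch (Python/NumPy; the relative frequency values are arbitrary examples, not the Table 1 settings) builds the DVFS-extended entries by scaling the nominal ETC values with the inverse relative frequencies:

```python
import numpy as np

def dvfs_extended_etc(etc, rel_freq):
    """Extend an n x m ETC matrix into an n x m x L meta-matrix where
    ETC_hat[j, i, l] = ETC[j, i] / f_{s_l}(i); rel_freq[i, l] in (0, 1] is the
    relative frequency of machine i at DVFS level s_l."""
    return etc[:, :, None] / rel_freq[None, :, :]

# toy example: 2 tasks, 2 machines, 3 DVFS levels per machine
etc = np.array([[10.0, 20.0], [30.0, 15.0]])
rel_freq = np.array([[1.0, 0.8, 0.5], [1.0, 0.7, 0.4]])
print(dvfs_extended_etc(etc, rel_freq)[0, 1])   # times of task 0 on machine 1 at each level
```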

The ETC matrix can be easily adapted to the energy-aware scheduling model. In such a case an ETC meta-matrix is defined based on the standard ETC matrix, where each ETC[j][i] element is replaced by the corresponding \(\widehat{\mathit{ETC}}[j][i]\) vector (for each pair (j,i)), that is to say:

$$ \widehat{\mathit{ETC}}=\bigl[ \widehat{\mathit{ETC}}[j][i][s_l]\bigr]_{n\times m\times s_{l(\max)}} $$
(6)

where \(\widehat{\mathit{ETC}}[j][i][s_{l}]\) is the approximate completion time of task j on machine i at the level s_l.

Based on (6) and (3) we can express the energy consumed for completing task j on machine i at the level s_l as the product of the number of switches per clock cycle, the total capacitance load, the frequency and the squared voltage at a given level s_l, and the estimated completion time, that is to say:

$$ E_{ji}= \gamma \cdot (f_{s_l}(i))_{j} \cdot f \cdot \bigl[(v_{s_l}(i))_{j}\bigr]^2 \cdot \widehat{\mathit{ETC}}[j][i][s_l] $$
(7)

where γ=AC is a constant parameter for a given machine; \((v_{s_{l}}(i))_{j}\) is the supply voltage value of the class s_i of machine i at the level s_l used for computing task j; \((f_{s_{l}}(i))_{j}\) is the corresponding relative frequency of machine i.

Based on (6), (7) and (5) the computational times for each possible pair (j,i) at the level s l can be calculated as follows:

$$ \widehat{\mathit{ETC}}[j][i][s_{l}]= \frac{1}{f_{s_{l}}(i)}\cdot \mathit{ETC}[j][i] $$
(8)

The cumulative energy consumed by the machine i for the completion of all tasks from the batch that are assigned to this machine, is defined in the following way:

(9)

where T(i) is the set of tasks assigned to machine i, ready_i is the ready time of machine i, Idle[i] denotes the idle time of machine i and L_i denotes the subset of DVFS levels used for the tasks assigned to machine i. We ignore all additional machine frequency transition overheads, which usually take a negligible amount of time (e.g., 10 ms–150 ms, see [38]) and do not burden the overall ETC model with an active ‘energetic’ module.

Finally, we can complete our formal description of the DVFS-based energy model adapted to independent batch scheduling in grids with a definition of the average cumulative energy utilized by the grid system for the completion of all tasks in the batch:

$$ E_{\mathrm{batch}} = \frac{\sum_{i=1 }^m E_i}{ m} $$
(10)
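
As a simple illustration of Eqs. (7) and (10), the sketch below (Python/NumPy) computes the energy of a single task-machine assignment at a chosen DVFS level and the average cumulative energy over the machines; the reload and idle contributions of the per-machine totals E_i are omitted here for brevity, and all numeric values are illustrative assumptions.

```python
import numpy as np

def task_energy(gamma, rel_freq, freq, voltage, etc_hat):
    """Energy of one task on one machine at a chosen DVFS level, following (7):
    E_ji = gamma * f_rel * f * v^2 * ETC_hat (gamma = A*C is machine-specific)."""
    return gamma * rel_freq * freq * voltage ** 2 * etc_hat

def average_batch_energy(per_machine_energy):
    """Average cumulative energy over the m machines, as in (10)."""
    return float(np.mean(per_machine_energy))

# illustrative values only
e = task_energy(gamma=1e-9, rel_freq=0.8, freq=1.0e9, voltage=1.2, etc_hat=12.5)
print(e, average_batch_energy([e, 2 * e, 0.5 * e]))
```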

6 Scheduling scenarios and objectives

The DVFS-based energy model for the grid system defined in the previous section will now be used for the specification of two scheduling scenarios and for the definition of the scheduling criteria.

6.1 Scheduling representation

The solutions of the scheduling problem addressed in this paper (schedules) can be encoded as the permutation strings (with and without repetitions) of task and machine labels. We consider in this paper two different encoding methods of schedules, namely direct representation and permutation-based representation.

In the direct representation, schedules are elements of the set of all permutations of length n, with repetitions, over the set of machine labels M_l. We denote this set by \(\mathcal{S}\). Formally, each schedule \(S \in \mathcal{S}\) is encoded by the following vector:

$$ S = [i_1,\ldots, i_n]^T $$
(11)

where i_j ∈ M_l is the label of the machine on which task j is computed.

The cardinality of \(\mathcal{S}\) is m^n.

The direct representation of the schedules can be easily transformed into a permutation-based representation, in which, for each machine, the sequence of tasks assigned to that machine is specified. The tasks in each sequence are sorted (in increasing order) with respect to their completion times. Thereafter, all of the task sequences are concatenated into a vector u, which is in fact a permutation, without repetitions, of the task labels. Formally, in this case the codes of schedules are elements of the set \(\mathcal{S}_{(1)}\) of all permutations of length n, without repetitions, over the set of task labels N_l, and are defined as the following vectors:

$$ u = [u_1,\ldots, u_{n}]^T $$
(12)

where u_i ∈ N_l, i=1,…,n.

The cardinality of \(\mathcal{S}_{(1)}\) is n!.

In this representation some additional information about the number of tasks assigned to each machine is required. Therefore, we define a vector v=[v_1,…,v_m]^T of size m, where v_i denotes the number of tasks assigned to machine i.
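
A minimal sketch of the conversion from the direct to the permutation-based representation (Python; 0-based labels, with the ETC values used as a proxy for the completion-time ordering) could look as follows:

```python
def direct_to_permutation(schedule, etc):
    """schedule[j] is the machine assigned to task j (direct representation).
    Returns the permutation vector u (task labels grouped per machine, sorted by
    increasing expected computation time) and the vector v of tasks per machine."""
    m = len(etc[0])
    per_machine = [[] for _ in range(m)]
    for task, machine in enumerate(schedule):
        per_machine[machine].append(task)
    u, v = [], []
    for i, tasks in enumerate(per_machine):
        tasks.sort(key=lambda j: etc[j][i])
        u.extend(tasks)
        v.append(len(tasks))
    return u, v

# toy example: 4 tasks, 2 machines
u, v = direct_to_permutation([0, 1, 0, 1], [[4.0, 6.0], [2.0, 1.0], [1.0, 3.0], [5.0, 2.5]])
print(u, v)   # [2, 0, 1, 3] [2, 2]
```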

6.2 Scheduling scenarios

The problem of scheduling tasks in CG is multi-objective in its general setting as the quality of the solutions can be measured under several criteria.

Two basic models are used in multi-objective optimization: the hierarchical and the simultaneous mode. In the simultaneous mode (s) all objective functions are optimized simultaneously, while in the hierarchical (h) case the objectives are sorted a priori according to their importance in the model. The process starts by optimizing the most important criterion. When further improvements are impossible, the second criterion is optimized under the restriction of keeping unchanged (or improving) the optimal value of the first one. In this paper we define the scheduling problem as a discrete 2-step hierarchical global optimization procedure, where (i) the makespan is considered the dominant criterion and is minimized in the first step, and (ii) in the second step the total energy consumption is minimized under the assumption that the makespan value does not increase.
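
As a small illustration, the following sketch (Python) compares two candidate schedules in this hierarchical mode, with the makespan as the dominant criterion and the energy used only as a tie-breaker that must not degrade the makespan; the tolerance value is an arbitrary assumption.

```python
def hierarchic_better(candidate, incumbent, eps=1e-9):
    """Return True if `candidate` dominates `incumbent` in the hierarchical mode.
    Both arguments are (makespan, energy) pairs; a strictly smaller makespan
    always wins, and with an (approximately) equal makespan a smaller energy wins."""
    c_mk, c_en = candidate
    b_mk, b_en = incumbent
    if c_mk < b_mk - eps:
        return True
    return abs(c_mk - b_mk) <= eps and c_en < b_en

print(hierarchic_better((100.0, 5.0e6), (102.0, 4.0e6)))   # True: better makespan
print(hierarchic_better((100.0, 3.5e6), (100.0, 4.0e6)))   # True: same makespan, less energy
```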

One of the main objectives of our work is to compare the results of scheduling in grids in the following two scenarios:

  1. I—Max-Min Mode, in which each machine works at the maximal DVFS level during the computations and switches to the idle mode after the execution of all tasks assigned to this machine;

  2. II—Modular Power Supply Mode, in which each machine may work at different DVFS levels during the task executions and then may switch to the idle mode.

The first mode seems to be the most effective when low-power devices or services are defined as the “machines” (resources) in the system. No modification of the standard scheduling procedures and standard scheduling objectives, such as makespan, flowtime or tardiness, is necessary. The second mode may be a good candidate for a testbed architecture for future-generation grid systems. Optimal power supply levels can be specified for the current devices (machines), which may in the future be replaced by next-generation low-power devices in order to keep (or improve) the energy consumption at the optimal level.

In the following two subsections we define the procedures for calculating the makespan and the total energy consumed in the system in the two scenarios described above.

6.3 Makespan optimization

Makespan is expressed as a finishing time of the latest task in the batch. That is to say:

$$ C_{\max} = \min_{S\in \mathit{Schedules}} \max_{j\in N}C_{j} $$
(13)

where C_j denotes the time needed for finalizing task j.

Using the ETC matrix model, the makespan can be defined in terms of the completion times of the machines. The time of finishing the last task is specified as the maximal completion time among the machines available for the batch of tasks. Let us denote by completion[i] the completion time of machine i, which is the cumulative time necessary for reloading machine i after finalizing the previously assigned tasks and for completing the tasks currently assigned to the machine. In Max-Min Mode this completion time is defined as follows:

$$ \mathit{completion}_I[i] = \mathit{ready}_i + \sum_{j\in T(i)} \mathit{ETC}[j][i] $$
(14)

The makespan in this mode is defined in the following way:

$$ (C_{\max})_I = \max_{i=1}^{m} \mathit{completion}_I[i] $$
(15)

The idle time for machine i working in Max-Min Mode can be calculated as the difference between the makespan and completion_I[i], that is to say:

$$ \mathit{Idle}_I[i] = (C_{\max})_I - \mathit{completion}_I[i] $$
(16)

For the machine with the maximal completion time (the makespan), the idle factor is zero.
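
A short sketch of Eqs. (14)–(16) (Python/NumPy; 0-based machine labels) is given below:

```python
import numpy as np

def max_min_mode_metrics(schedule, etc, ready):
    """Completion times (14), makespan (15) and idle times (16) in Max-Min Mode.
    schedule[j] is the machine assigned to task j; ready is the vector ready_i."""
    completion = np.asarray(ready, dtype=float).copy()
    for task, machine in enumerate(schedule):
        completion[machine] += etc[task][machine]
    makespan = completion.max()
    idle = makespan - completion
    return completion, makespan, idle

completion, makespan, idle = max_min_mode_metrics(
    [0, 1, 0], [[4.0, 6.0], [2.0, 1.0], [1.0, 3.0]], ready=[0.5, 0.0])
print(completion, makespan, idle)   # completion=[5.5, 1.0], makespan=5.5, idle=[0.0, 4.5]
```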

In order to define the makespan in Modular Power Supply Mode, we must specify the actual DVFS level s_l for each machine. The formulas for computing the completion time, makespan and idle time are then defined in the following way:

$$ \mathit{completion}_{\mathit{II}}[i] = \mathit{ready}_i + \sum_{j\in T(i)} \widehat{\mathit{ETC}}[j][i][s_l] $$
(17)
$$ (C_{\max})_{\mathit{II}} = \max_{i=1}^{m} \mathit{completion}_{\mathit{II}}[i] $$
(18)
$$ \mathit{Idle}_{\mathit{II}}[i] = (C_{\max})_{\mathit{II}} - \mathit{completion}_{\mathit{II}}[i] $$
(19)

6.4 Energy optimization

The second step of the scheduling optimization procedure is the minimization of the total energy consumed in the CG for scheduling a given batch of tasks. We assume the minimal power supply for each machine in the idle mode and the maximal power and voltage supply during the reloading process.

The average energy consumed in the system in Max-Min Mode is defined as follows:

(20)

In Modular Power Supply Mode the average cumulative energy is given by the (10), that is to say:

$$ E_{\mathit{II}} = E_{\mathrm{batch}} = \frac{\sum_{i=1}^m E_i}{m} $$
(21)

where

(22)

In both cases E_I and E_II are minimized subject to the following constraint:

$$ \sum_{l\in L_i} \biggl[\frac{1}{f_{s_l}(i)}\cdot \mathit{ETC}[j][i]\biggr]\leq C_{\max};\quad \forall i\in \{1,\ldots, m \} $$
(23)

where L_i denotes the subset of DVFS levels specified for the tasks assigned to machine i.

7 Green-HGS-Sched: energy-aware hierarchical genetic scheduler

The exploration of the search space in grid scheduling is very complex, mainly because of the sheer size of the solution space and the system dynamics. The search space is determined by the permutations of task or machine labels, but the lengths of these permutation strings may vary as the numbers of tasks and/or machines change over time. Additional probability distributions should then be specified for an estimation of the system states in the considered time intervals.

In this paper we adapt to green grid scheduling the Hierarchic Genetic Scheduler (Green-HGS-Sched) developed in [25] as an effective multi-level hierarchic alternative to single-population genetic-based schedulers. The main aim of Green-HGS-Sched is a comprehensive exploration of the scheduling landscape through the execution of many dependent evolutionary processes. This scheduler can be modeled as a multi-level decision tree. The search process starts by activating a scheduler with the lowest possible accuracy of search, which is interpreted as the “core” of the tree model. This process is responsible for the management of the whole search and for the detection of promising partial solutions. More accurate processes are activated in the neighborhoods of those partial solutions in order to prevent the premature convergence of the scheduler and to possibly improve the best solutions found in the system. The activation of these processes does not significantly increase the complexity of the hierarchic scheduler, for three main reasons: (i) unlike hybrid strategies, whose components are usually composed of various meta-heuristics and local search methods, we use the same general framework for the algorithms working at all levels of the tree; (ii) the tree extension is steered by specialized operations responsible for the deactivation of ineffective processes and by the effectiveness of the search in the core of the tree; (iii) finally, the synchronization of the search is provided ‘horizontally’ at each level of the tree, so there is no need to refer to the parental nodes, which enables an easy adaptation to the actual system state. For all these reasons Green-HGS-Sched differs significantly from the existing hierarchical, hybrid and branching schedulers applied to various grid scheduling problems and classical job-shop problems (see e.g. [8]).

We present in Fig. 4 an example of 3-level Green-HGS-Sched tree structure [27].

Fig. 4
figure 4

3 levels of Green-HGS-Sched tree structure [27]

Each branch of the tree is created by an active genetic algorithm designed for solving the scheduling problem in CGs. The accuracy of search in the Green-HGS-Sched branches is defined by the degree parameter, with the lowest value 0 assigned to the core of the system.

We denote by \(P^{e}_{ (r,t) }\) a population evolving in the branch of degree t, where:

  • e∈ℕ defines the global metaepoch counter;

  • t∈{1,…,M}, where M∈ℕ is the maximal degree of the branches;

  • r is the number of branches of the same degree.

The hierarchical structure of the scheduler is updated periodically after the execution of a k-generation evolutionary process in each active branch. We call such a process a k-periodic metaepoch M_k (k∈ℕ) and define it in the following way:

$$ M_k \bigl(P^e_{(r,t)}\bigr) = \bigl( P^{e+1}_{(r,t)}, \widehat{s} \bigr) $$
(24)

where \(\widehat{s}\) is the best adapted individual found during the metaepoch in the population \(P^{e}_{(r,t)}\), \(t\in \{1,\ldots, M\}\), \(M\in \mathbb{N}\).

New branches of the higher degree can be created in neighborhoods of the best adapted individuals found in each active branch by using a Sprouting Operation (SO) defined as follows:

$$ \mathit{SO}\bigl(P^{e}_{(r,t)}\bigr) = \bigl(P^{e}_{(r,t)}, P^{0}_{(r', t+1)}\bigr) $$
(25)

where \(P^{e}_{ (r,t) }\) is a parental branch, and \(P^{0}_{(r', t+1)}\) denotes the initial population of a new branch of degree t+1. The individuals in this population are selected from an S_t-neighborhood (1 ≤ S_t ≤ n) of the best adapted individual \(\widehat{s}\) in the parental population \(P^{e}_{(r,t)}\). This neighborhood is created by all possible permutations or reassignments of tasks in the (n−S_t)-length suffix of \(\widehat{s}\). The S_t-length prefix of a given schedule S is generated by using the following operator:

$$ A_{(S_t)}(S) = \tilde{S},\qquad |\tilde{S}|=S_t,\quad S_t\leq n $$
(26)

where \(|\tilde{S}|\) denotes the length of the prefix in the permutation sequence which encodes the schedule S. The values of the S_t parameter may differ in branches of different degrees. In this paper we assume that these parameters are calculated in the following way:

$$ S_t = (suf)^t\cdot n $$
(27)

where suf∈[0,1] is a global strategy parameter called a neighborhood parameter and t is the branch degree.

The sprouting operation is conditionally activated depending on the outcome of a Branch Comparison (BC) binary operator applied to a parental branch and all of its directly sprouted branches. It is used for the detection of ‘similarity’ between the resulting populations in each parental-sprouted pair of branches. Formally, the BC:Q→{0,1} operator is defined by the following formula:

(28)

where Q={(X,Y,S_t)} and X, Y are the populations in branches of degrees t and t+1, respectively. This operator is activated after the execution of at least two metaepochs in the core. The outcome of the BC operator is 1 if the parental branch and its “descendant” (sprouted) branch operate in a similar region of the optimization landscape. In such a case another metaepoch is executed in the parental branch without creating any new process. This technique is crucial for an effective management of the algorithm structure, as it prevents the activation of many similar processes in the same local region, which would usually increase significantly the complexity of the whole strategy.

The implementation of the BC operator may be very complex. In our early implementations of Green-HGS-Sched we achieved very good results in the minimization of the makespan and other scheduling criteria (see [24]), but the execution time of the scheduler was quite long in the more complex grid scenarios. In order to reduce the execution time of the BC procedure, we introduced a hash table with a task-resource allocation key, denoted by K, that supports the identification of populations operating in similar regions of the search space. The value of this key is calculated as the sum of the absolute differences between each position and its predecessor in the S_t-length suffix of the direct representation of the schedule vector (reading the suffix in a circular way). The hash function f_hash is defined as follows:

(29)

where K min and K max correspond respectively to the smallest and the largest value of K in the population, and N is the population size.
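
For illustration, the sketch below (Python) computes the allocation key K as described above; the bucket mapping is only an assumed min-max normalization of K into N bins, standing in for the exact form of f_hash.

```python
def allocation_key(schedule, s_t):
    """Key K: sum of absolute differences between each position of the S_t-length
    suffix of the direct representation and its predecessor, read circularly."""
    suffix = schedule[-s_t:]
    return sum(abs(suffix[k] - suffix[k - 1]) for k in range(len(suffix)))

def hash_bucket(k, k_min, k_max, pop_size):
    """Assumed bucket index: min-max normalization of K into pop_size bins."""
    if k_max == k_min:
        return 0
    return min(int((k - k_min) / (k_max - k_min) * pop_size), pop_size - 1)

keys = [allocation_key(s, s_t=4) for s in ([3, 0, 2, 1, 1, 0], [3, 0, 2, 3, 1, 0])]
print(keys, [hash_bucket(k, min(keys), max(keys), pop_size=10) for k in keys])   # [4, 6] [0, 9]
```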

In the case of the conditional sprouting of new branches of degree t+1 from a parental branch of degree t, the keys are calculated for the best individual in the parental branch and for the individuals of all populations in all active branches of degree t+1. If there is any individual in the higher-degree branches whose key matches the key of the best adapted individual in the parental branch, then the value of BC is 1 and no new branch of degree t+1 is sprouted.

In the case of a comparison of branches of the same degree t, all branches in which individuals with identical keys exist are reduced and a single joint branch is created (the value of BC is 1). The individuals in this branch are selected from the “youngest” (in the sense of the population evolution) populations of all reduced branches.

It has been shown in [47] and [25] that the hash technique can significantly reduce (by 50–70 %) the execution time of genetic algorithms in which an indication of the similarity of solutions is necessary.

7.1 Genetic Engine in Green-HGS-Sched

The main genetic engine in Green-HGS-Sched branches is defined in Alg. 1. It is based on the framework of the classical genetic algorithms used in the combinatorial optimization [36].

Algorithm 1
figure 5

A template of the genetic engine for six genetic-based grid schedulers

We apply the direct representation of the schedules in the base populations P^t and P^{t+1}, and the permutation representation in the \(P_{c}^{t}\) and \(P_{m}^{t}\) populations in order to implement the crossover and mutation operators. The initial population is generated randomly. Based on our previous results of applying genetic-based meta-heuristics to green scheduling in grids, we use the following configuration of genetic operations in the main loop of Algorithm 1: (i) Linear Ranking as the selection scheme, (ii) the Cycle Crossover (CX) operator and (iii) the Move mutation method [36].

In Cycle Crossover (CX) each task must keep the position (allele) it occupies in one of the parents, so that only interchanges between positions are made. First, a cycle of alleles is identified. The crossover operator leaves the cycle unchanged, while the remaining segments of the parental strings are exchanged. The main idea of Move mutation is that a task is moved from one machine to another. Although the task can be chosen appropriately, this mutation strategy tends to unbalance the number of tasks per machine.
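
A minimal sketch of both operators (Python; CX on the permutation representation without repetitions, Move on the direct representation) could look as follows:

```python
import random

def cycle_crossover(p1, p2):
    """Cycle Crossover: the positions of the identified cycle keep the alleles of
    the first parent, all remaining positions are taken from the second parent
    (and vice versa for the second child)."""
    pos_in_p1 = {val: idx for idx, val in enumerate(p1)}
    cycle, i = set(), 0
    while i not in cycle:
        cycle.add(i)
        i = pos_in_p1[p2[i]]
    child1 = [p1[i] if i in cycle else p2[i] for i in range(len(p1))]
    child2 = [p2[i] if i in cycle else p1[i] for i in range(len(p1))]
    return child1, child2

def move_mutation(schedule, n_machines, rng=random):
    """Move mutation on the direct representation: a randomly chosen task is
    reassigned to a different, randomly chosen machine."""
    mutated = list(schedule)
    j = rng.randrange(len(mutated))
    mutated[j] = rng.choice([i for i in range(n_machines) if i != mutated[j]])
    return mutated

print(cycle_crossover([1, 2, 3, 4, 5], [3, 4, 1, 2, 5]))   # ([1, 4, 3, 2, 5], [3, 2, 1, 4, 5])
```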

We consider two alternative replacement mechanisms for the generation of the base population for a new GA loop, namely the Elitist Generational and Struggle strategies. In the Elitist Generational method the “Elite” of the best solutions contains just 2 individuals. The main drawback of such methods is that they may lead to premature convergence to some solution and to the stagnation of the population. The Struggle mechanism can be an effective tool for avoiding a too fast convergence of the scheduler to local optima. In this method, the new generation of individuals is created by replacing a part of the population with the most similar individuals, provided that this replacement improves (minimizes) the fitness value. The definition of the struggle procedure requires the specification of an appropriate similarity measure, which indicates the degree of similarity between two GA chromosomes. In this work we use the Mahalanobis distance [32] for measuring the distance between schedules according to the following formula:

$$ \mathit{sim}_e(S_1; S_2) = \sqrt{\sum_{j=1}^{n} \frac{(S_1[j]-S_2[j])^2}{\sigma_P^2}} $$
(30)

where σ_P is the standard deviation of S_1[j] over the population P.

The possibly high computational cost of the struggle strategy may be reduced by implementing a hash technique, as proposed in the previous section for the BC operator. Using the struggle replacement mechanism in genetic grid schedulers allows a fine tuning of the scheduler so that it “converges” to a good solution within the available time (for instance, the scheduler’s activation time interval) [47].
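
A small sketch of the struggle replacement (Python), using the distance of Eq. (30) with a single pooled standard deviation for simplicity, is given below:

```python
import math

def schedule_distance(s1, s2, sigma):
    """Distance of Eq. (30) between two direct-encoded schedules; `sigma` is a
    pooled standard deviation over the population (a simplifying assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)) / sigma ** 2)

def struggle_replace(population, fitness, child, child_fitness, sigma):
    """Replace the individual most similar to `child`, but only if the child has
    a better (smaller) fitness value; otherwise the population stays unchanged."""
    idx = min(range(len(population)),
              key=lambda k: schedule_distance(population[k], child, sigma))
    if child_fitness < fitness[idx]:
        population[idx], fitness[idx] = list(child), child_fitness

population = [[0, 1, 1], [1, 1, 0], [0, 0, 1]]
fitness = [10.0, 12.0, 9.0]
struggle_replace(population, fitness, [1, 1, 1], 8.0, sigma=0.5)
print(population, fitness)
```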

8 Empirical evaluation of the hierarchical genetic schedulers

In this section we present the results of the empirical evaluation of two implementations of Green-HGS-Sched, namely HGS-Elit and HGS-St, with the Elitist Generational and Struggle replacement mechanisms (respectively) in the branches. We compare the efficiency of the hierarchical schedulers with the results achieved by single-population GAs and Island Models with the same configuration of genetic operators and parameters. The experiments were conducted in the two “energetic” scenarios, namely Max-Min Mode and Modular Power Supply Mode, defined in Sect. 6.2. For simulating various grid size scenarios in static and dynamic modes we used the energy-aware HyperSim-G grid simulator introduced in [26]. The main idea of the simulator is presented in Sect. 8.1. The empirical results are analyzed in Sect. 8.3.

8.1 Energy-aware HyperSim-G grid simulator

The energy-aware HyperSim-G simulator is an extension of the HyperSim-G software [48] dedicated to modeling realistic CG systems in various energetic scenarios. In energy-aware scheduling the instance contains the following input data:

  • workload vector of tasks;

  • computing capacities of machines;

  • prior loads of machines;

  • machine categories specification parameters (number of classes, maximal computational capacity value, computational capacity ranges interval for each class, machine operational speed parameter for each class, etc.);

  • DVFS levels matrix for machine categories; and

  • the ETC matrix.

The input data are needed for the generation of the scheduling event, which is passed on to the selected scheduler in order to compute the optimal schedule(-s). Finally, the scheduler sends the optimal schedule(-s) back to the simulator, which allocates the resources and simulates the computation process.

The instances produced by the simulator for our experiments are divided into static and dynamic grid scheduling benchmarks. In the static case, the number of tasks and the number of machines remain constant during the simulation, while in the dynamic case, both parameters may vary over time. In both static and dynamic cases four Grid size scenarios are considered: (a) small (32 hosts/512 tasks), (b) medium (64 hosts/1024 tasks), (c) large (128 hosts/2048 tasks), and (d) very large (256 hosts/4096 tasks).

The simulator is highly parameterized in order to reflect the realistic grid scenarios. The main parameters are defined as follows:

  • Number of hosts: Number of resources in grid;

  • MIPS: A probability distribution specified for modeling the computing capacity of resources;

  • Total tasks: Number of tasks in a given batch;

  • Workload: A probability distribution used for modeling the workload of tasks;

  • Host selection: Selection policy of resources (all means that all resources of the system are selected for scheduling purposes);

  • Task selection: Selection policy of tasks (all means that all tasks in the system must be scheduled);

  • Number of runs: Number of simulations run with the same parameters; the reported results are averaged over this number.

In Table 2 we present the key input parameters of the simulator in the static and dynamic cases. We use the notation N(a,b) and E(c,d) for the Gaussian and exponential probability distributions. In the experiments we used similar settings for our simulator to those in our previous work [26, 29]. These parameters were tuned to illustrate the typical grid size scenarios for conventional grid scheduling in [48].

Table 2 Values of key parameters of the grid simulator in static and dynamic cases

In the dynamic case we have to specify the minimal and maximal values for the numbers of tasks and machines in the system. Resources can be dropped from or added to the grid with frequencies defined by Gaussian distributions (the add host and delete host parameters). New tasks may arrive in the system with the frequency parameter denoted by interarrival, until the total tasks value is reached. The Activation parameter establishes the activation policy according to an exponential distribution. Already scheduled tasks that have not been executed yet are rescheduled if reschedule is true.

We consider 16 DVFS levels for the three “energetic” resource classes, Class I, Class II and Class III, presented in Table 1.

8.2 Scheduling meta-heuristics and performance measures

In the experiments we consider the six genetic-based meta-heuristic grid schedulers defined in Table 3.

Table 3 Six GA-based grid schedulers evaluated in the experimental analysis

The aforementioned methodologies differ in the implementation of the replacement mechanism in the main genetic framework. We used the Elitist Generational replacement in the xxx-Elit algorithms and the Struggle procedure in the xxx-St algorithms. The single-population GAs, namely GA-Elit and GA-St, are implemented as the main genetic mechanisms in IGA-Elit and HGS-Elit, and in IGA-St and HGS-St, respectively.

The Island Genetic Algorithm (IGA) [46] is a well-known parallel GA technique. An initial (possibly large) population is divided into several sub-populations (islands or demes), for which single-population GAs with identical configurations of parameters and operators are activated (one separate algorithm for each deme). After a fixed number of iterations (denoted by it_d) the migration procedure is activated, which enables a partial exchange (usually according to the standard ring topology) of individuals among the islands. The relative amount of migrating individuals, denoted by mig, is a global parameter of the algorithm called the migration rate. It is calculated in the following way:

$$ \mathit{mig} = \frac{m_{\mathrm{deme}}}{\mathrm{deme}}\times 100\ \%, $$
(31)

where deme is the size of the sub-population in IGA and m deme is the number of migrating individuals in each deme.
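
A minimal sketch of such a ring-topology migration step (Python; the replace-the-worst acceptance policy on the target island and the minimized fitness function are assumptions for illustration) is shown below:

```python
def ring_migration(islands, fitness_fn, m_deme):
    """Each island sends copies of its m_deme best individuals (smallest fitness)
    to the next island on the ring, where they replace the worst individuals."""
    migrants = [sorted(deme, key=fitness_fn)[:m_deme] for deme in islands]
    for k in range(len(islands)):
        target = islands[(k + 1) % len(islands)]
        target.sort(key=fitness_fn, reverse=True)           # worst individuals first
        target[:m_deme] = [list(ind) for ind in migrants[k]]
    return islands

# two toy demes of two individuals each, fitness = sum of the chromosome
print(ring_migration([[[0, 1], [1, 1]], [[1, 0], [0, 0]]], fitness_fn=sum, m_deme=1))
```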

In Tables 4, 5 and 6 we present the configurations of the key parameters for both implementations of the single-population GA, IGA and Green-HGS-Sched meta-heuristics, respectively. The sizes of the initial and intermediate populations in IGA depend on the implementation of the genetic engine in the islands and are the same as for the single-population GA-Elit and GA-St algorithms. Similarly to the parametrization of the grid simulator, we based these settings on the configuration of the conventional implementation of the HGS-Sched scheduler and the single-population genetic grid schedulers presented in [26, 29] and [47], where a detailed tuning process has been provided.

Table 4 GA setting for static and dynamic benchmarks
Table 5 Green-HGS-Sched settings for static and dynamic benchmarks
Table 6 Configuration of IGA algorithm

The relative performance of all six schedulers is measured using the following two metrics:

  • minimal makespan defined as follows:

    $$ \mathit{makespan} = \min \{\mathit{Makespan}_I, \mathit{Makespan}_{\mathit{II}}\} $$
    (32)
  • a relative energy consumption improvement rate expressed as follows:

    $$ \mathrm{Im}(E)= \frac{E_I-E_{\mathrm{batch}}}{E_{\mathrm{batch}}}\times 100\ \%, $$
    (33)

    where E batch and E I are defined in Eqs. (10) and (20) respectively.

8.3 Results

Each experiment was repeated 30 times under the same configuration of operators and parameters. In Figs. 5 and 6 we present the box-plots of the makespan values for the six considered schedulers (confidence level 95 %). The makespan is measured and expressed in arbitrary time units, the same as those defined in the ETC matrix model (see Sect. 4).

Fig. 5
figure 6

The box-plot of the results for makespan in static case

Fig. 6
figure 7

The box-plot of the results for makespan in dynamic case

Both implementations of Green-HGS-Sched achieved the best results in all instances except the Large grid in the static case and the Small and Large instances in the dynamic case, in which they lose to the IGA model. A simple comparison of the impact of the replacement method on the algorithms’ performance, provided for all pairs of xxx-Elit and xxx-St schedulers, shows that Struggle replacement is much more effective than the Elitist Generational method in the case of the single-population GA and IGA schedulers. This confirms the results of our preliminary study on the effectiveness of single-population genetic schedulers in CGs presented in [26]. For Green-HGS-Sched the situation is completely different. In most cases the effectiveness of both hierarchical implementations is at a comparable level, with a slight advantage of the elitist technique in the dynamic case. This means that in Green-HGS-Sched the most important factor is the fast exploration by the core of the system of presumably wider regions of the search space than in the case of the GA and IGA implementations. The core can activate more accurate processes in the neighborhoods of partial solutions which remain undetected by the other schedulers, which makes Green-HGS-Sched very effective in the exploration of new regions of the optimization domain and in escaping the basins of attraction of local solutions. The complexity of the hierarchic system is in fact not a drawback of the scheduler, because the execution time constraints for HGS and IGA are exactly the same. The ranges of the achieved makespan values for all considered meta-heuristics are not greater than 30–45 % of the mean makespan values, which means that the stability of all schedulers in all cases is acceptable. The distributions of the makespan results are asymmetric: in the static case the skewness is positive for GA and IGA and negative for Green-HGS-Sched in most instances, whereas it is negative in the dynamic grids for almost all schedulers. This means that the reduction of the average makespan in the dynamic case is much harder than in the static case (the mean values are closer to the third quartile than to the first one), which confirms the complexity of the problem in realistic dynamic grid scenarios.

The box-plots for the energy saving rates Im(E) are presented in Figs. 7 and 8.

Fig. 7
figure 8

The box-plot of the results for relative energy saving rate in static case (in %)

Fig. 8
figure 9

The box-plot of the results for relative energy saving rate in dynamic case (in %)

The results of the energy optimization differ significantly compared with the makespan results. In this case each of the IGA-Elit and GA-Elit algorithms outperforms the remaining schedulers in four instances. Green-HGS-Sched is not as good at energy optimization as at makespan minimization. This means that it works quite well in the Max-Min scenario, so no additional DVFS modules are necessary here. The range of the average saving rate values is 10 %–35 % for most of the schedulers, which is rather high. Finally, it can be observed that the skewness of the distribution of the results is positive or neutral for the worst “energy optimizers” and negative for the best ones.

9 Conclusions

In this paper we addressed the problem of optimizing the energy utilized in CGs in independent batch scheduling. Our energy management model is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique adapted to the dynamic grid environment. We formalized the grid scheduling problem as a bi-objective optimization task with the makespan and average energy consumption as the main objectives.

For solving the addressed grid scheduling problem, we developed two implementations of the energy-efficient Hierarchical Genetic Scheduler Green-HGS-Sched and provided their experimental evaluation in two ‘energetic’ scheduling modes in static and dynamic grid scenarios, under the makespan and relative energy consumption improvement rate metrics. Their effectiveness was compared with the results achieved by four single-population Genetic Algorithm (GA) and Island GA schedulers. To perform the experiments, we integrated all energy-aware schedulers within a grid simulator. The simulation results confirmed the effectiveness of the proposed schedulers in the reduction of the energy consumed by the whole system and in the dynamic load balancing of the resources in grid clusters, which is sufficient to maintain the desired quality level(-s).

Our model is general in its implementation and can be easily adapted to a particular scenario and realistic grid infrastructure, such as a large-scale banking system or a highly distributed data system. First, we do not consider any special architecture for the grid resources, which means that these characteristics can be specified separately and integrated with the system simulator. The term “task” can also be used for monolithic applications, metatasks or parallel applications represented by Directed Acyclic Graphs. The schedulers are integrated with the main grid simulator as separate modules, and therefore they can be easily modified, extended and hybridized with other algorithms. Finally, we simulate the dynamics of a realistic grid system, in which the availability of the resources and the number of tasks may vary over time.