Introduction

Cloud computing is a popular computing model that provides scalable, on-demand services to users over the internet. According to Cisco [1], by the year 2021, 94% of computing workloads were expected to be processed by cloud computing systems. Owing to the increased popularity of cloud services, different communities and enterprises prefer the cloud environment to deploy scientific workflow applications; workflow scheduling maps appropriate computational resources to workflow tasks so that execution completes while satisfying user-defined constraints. Although cloud computing has great benefits such as minimizing operational costs, it suffers from a high energy consumption problem. As the number of computational resources in the cloud environment increases, energy consumption has become an essential concern [2]. For example, the power consumption of Tianhe-2, a heterogeneous computing system at the National Supercomputer Center in Guangzhou, China, and at the time the world's fastest supercomputer, was 17,808 kW on July 13, 2015 [3]. For economic development, the National Supercomputer Center concentrates on power efficiency rather than speed, since supercomputers consume a great deal of power even though they deliver high-performance solutions. Further, Moore's law [4] states that the transistor count in integrated circuits doubles roughly every 2 years, which leads to a rise in the energy consumption of data centers. These observations show that achieving a desirable green computing environment is an important issue in cloud computing.

Further, increased energy usage of a cloud system negatively impacts the system's reliability, produces carbon footprints and raises the operating cost significantly. Moreover, inefficient utilization of computing resources increases the energy consumed by workflow task execution [5]. To tackle the energy consumption problem, different hardware technologies [6] exist, such as energy-efficient monitors, multiple CPU cores, low-power microprocessors, and cache memory. Moreover, the DVFS technique is widely used by researchers to lessen energy consumption by scaling down the voltage/frequency during execution [7]. However, research [8] shows that dynamically scaling the chip voltage increases the transient failure rate of resources, which affects the reliability of application execution. Additionally, keeping applications executing on high-performance systems for long durations raises the temperature of cloud computing systems, which further diminishes reliability and availability. Thus, energy and reliability are conflicting factors.

In the recent past, high-performance resources have attracted a number of researchers to the workflow scheduling problem in cloud environments. Many authors developed workflow scheduling algorithms, namely the constrained earliest finish time (CEFT) algorithm [9], the heterogeneous earliest finish time (HEFT) algorithm [10] and the critical path on a processor (CPOP) algorithm [10], considering the single objective of makespan/execution time only. Moreover, considerable research effort has gone into workflow scheduling that optimizes system reliability and makespan. For example, in [11], the authors proposed the Reliable-HEFT (RHEFT) algorithm to address reliability maximization along with makespan minimization. The authors of [12] proposed a reliability dynamic level scheduling (RDLS) algorithm concentrating on the reliability of the computing resources while minimizing the execution time. In [13], the authors proposed a hierarchical reliability-driven scheduling (HRDS) heuristic to address reliability in a grid environment. Further, in [14, 15], the authors considered reliability and energy-efficiency parameters for workflow scheduling, but in the area of embedded systems. In [16], the authors developed scheduling heuristics for workflow applications that use the DVFS mechanism to lessen energy consumption in homogeneous cluster computing. However, jointly optimizing reliability and energy efficiency in a heterogeneous cloud environment remains largely unexplored.

Since energy consumption and system reliability conflict with each other, workflow scheduling in cloud computing is a multi-objective problem. In multi-objective problems, no single solution is good for all the considered objectives. Thus, many recent studies in multi-objective optimization [17, 18] provide a Pareto-optimal solution set rather than a single best solution; the Pareto-optimality concept yields multiple non-dominated solutions. In [19], the authors used the well-known meta-heuristic PSO with non-dominated sorting (NSPSO) to obtain the Pareto set of optimal solutions. Further, in [20], the authors proposed a discrete PSO using an ε-fuzzy dominance mechanism that quantifies the relative fitness of non-dominated solutions to achieve a better trade-off front. However, generating the Pareto set with a meta-heuristic is very time-consuming, and it is not easy for a decision maker to select one solution from a set of multiple solutions. Thus, in this work we apply a heuristic technique instead of a meta-heuristic to find the single best solution using ε-fuzzy dominance with very small computation time.

Our work contributes to reliability-aware, energy-efficient scheduling in heterogeneous cloud environments. The above-mentioned works focus on optimizing either energy consumption or reliability; to the best of our knowledge, very little research has focused on their joint optimization. In contrast, we propose an algorithm that maximizes system reliability while jointly minimizing energy consumption for workflow applications under precedence constraints. We design a new heuristic scheduling algorithm named ε-fuzzy dominance based reliable green workflow scheduling (FDRGS). The algorithm works in two phases. First, a rank is assigned to every task of the workflow, and tasks with the highest rank receive the highest priority. Second, the best processor for every task is selected by considering the two objectives, i.e. energy consumption and reliability of task execution. ε-Fuzzy dominance sorting is applied to obtain the best trade-off solutions based on their fuzzy dominance values; this procedure finds a single best-compromise solution from the relative fitness of the trade-off solutions with little overhead. Further, an appropriate frequency level of the processor is chosen via DVFS to lower energy consumption and maximize the reliability of the application tasks.

The primary contributions of the paper are as follows:

  • We consider the workflow scheduling problem of simultaneously optimizing energy consumption and reliability for executing workflow applications in a cloud environment.

  • We propose the ε-fuzzy dominance based reliable green workflow scheduling (FDRGS) algorithm for the cloud environment. The proposed algorithm obtains a single best solution by taking into account the ε-fuzzy dominance values (relative fitness) of trade-off solutions, with small computation overhead achieved by pruning the multiple trade-off solutions.

  • Further, to minimize energy consumption, we apply a DVFS technique that lowers the frequency/voltage of the processor to save energy. Moreover, the fault rate of a computing system is affected by the operating frequency/voltage level: operating the system at minimum frequency does not imply that the fault rate is low, since the fault arrival rate depends on the operating frequency of the computing nodes. Hence, our objective is to select the appropriate operating frequency level that achieves maximum reliability and minimum energy consumption.

  • Extensive simulation analysis is carried out to evaluate the proposed FDRGS algorithm. We compare our algorithm with the energy-conscious scheduling heuristic (ECS) [21], reliability HEFT (RHEFT) [11] and HEFT [10]. Further, a statistical test has been conducted to validate the results. The results show that the proposed FDRGS algorithm outperforms the compared algorithms.

The remainder of the paper is organized as follows. The next section reviews related work, and the following section describes the models used in this work: the system model, scheduling model, energy model and reliability model. The subsequent sections present the proposed algorithm and the simulation evaluation of the FDRGS approach. The last section provides the conclusion and future work.

Related work

Recently, numerous research works [11, 12, 22,23,24,25,26,27,28,29] in the cloud environment have been proposed to achieve efficient performance, energy consumption and reliability. The authors of [11] proposed a task scheduling algorithm for heterogeneous systems to optimize makespan and reliability. The authors of [12] developed a bi-objective scheduling algorithm considering reliability and execution time in heterogeneous systems. In [30], a scheduling algorithm was proposed that considers the two objectives of time and reliability simultaneously using a weighted-sum approach. In [31], an analytical resource allocation model was proposed to achieve system reliability under an application constraint and a resource constraint: the application constraint takes care of task precedence, and the resource constraint covers storage and memory usage. In [32], the authors proposed a reliable scheduling solution for DAG applications under timing constraints with the aim of maximizing reliability. In [33], the authors proposed a reliability-driven architecture for task scheduling based on the optimal reliable communication path; for assigning priority to tasks, they considered the additional parameter of reliability overhead and proposed a duplication-based scheduling technique to achieve high reliability for precedence-constrained tasks. Further, in [34], the authors proposed a scheduling algorithm based on a genetic optimization technique to optimize the makespan and reliability of tasks without considering energy consumption. The authors of [35] introduced a reliability model and load-balancing technique using Colored Petri Nets (CPN). The authors of [36] proposed a heuristic that minimizes storage and network resource usage while guaranteeing service reliability. The above-mentioned works focus on optimizing reliability along with execution time in heterogeneous systems.

Along with the reliability of cloud computing resources, efficient energy consumption has become a big challenge for the research community in the cloud environment, not only for reducing budgets but also for increasing the reliability of the system. Cloud data centers use a great deal of power and cause serious, irreversible damage to the global environment through carbon dioxide emissions. Many scheduling algorithms [15, 37] have been proposed with the objective of energy optimization at the resource or cluster level [38]. The authors of [39] proposed an energy-efficient approach for scheduling workflow applications in a DVFS-enabled cluster by utilizing the slack time of non-critical jobs. However, in cloud systems, energy optimization together with reliability still needs to be explored.

Further, there exist a few energy-efficient scheduling techniques for sustainable computing that also address other important objectives such as minimizing execution cost and completion time. In heterogeneous systems, the authors of [40] concentrated on minimizing the completion time of DAG applications while satisfying an energy constraint, whereas the authors of [41] focused on reducing energy consumption while satisfying the deadline constraint of a service level agreement (SLA) by reclaiming slack time. The authors of [25] proposed two algorithms considering time and energy parameters for task scheduling, then combined them into a multi-resource scheduling algorithm to show the trade-off between energy and performance; they used a probability parameter α to control the cloud system's energy and performance, changing its value dynamically according to the existing workload to achieve a better trade-off. In [22], the authors proposed a task scheduling heuristic, ECOTS, that considers server power efficiency and resource requirements to optimize energy consumption with minimal performance cost in a cloud environment; ECOTS provides power efficiency under different workloads. In [23], the authors presented a cost and energy efficient scheduling (CEAS) algorithm to reduce the workflow cost and minimize energy consumption under a deadline constraint. In [24], the authors proposed an energy-aware task scheduling (EATS) algorithm that minimizes the energy consumption of a workflow application using the DVFS technique; EATS finishes task execution within the deadline to meet the QoS constraint. In the above-mentioned algorithms, researchers treat one objective as primary while taking the other as a constraint, thereby handling the conflicting objectives and providing a unique solution.

On the other hand, some scheduling algorithms consider multiple objectives simultaneously and provide a Pareto solution set instead of a single best solution. The authors of [42] proposed a multi-objective list scheduling (MOLS) algorithm for scientific workflows that takes into account energy consumption, makespan, cost, and reliability. For dominating solutions, they maximize the distance from the user-defined constraint on each objective; MOLS then selects, from the Pareto set, the solution that dominates (converges to) these constraints. The authors of [43] proposed multi-objective HEFT (MOHEFT), an extension of the HEFT algorithm that considers makespan and energy efficiency and provides a Pareto solution set of multiple solutions.

From the above-mentioned works, it can be observed that most researchers considered either energy consumption or reliability together with the execution performance of the application. Considering reliability and energy consumption simultaneously therefore opens new opportunities in cloud systems. Hence, in this paper we propose an energy-efficient scheduling algorithm, ε-fuzzy dominance based reliable green workflow scheduling (FDRGS), that uses the ε-fuzzy dominance procedure to find the best solution, ensuring maximization of reliability and minimization of energy consumption simultaneously. The proposed algorithm is well suited to exploring the trade-off between the two conflicting objectives.

Mathematical formulation

This section introduces the cloud computing model, workflow model, task scheduling model, reliability model and energy model considered in this work. Table 1 lists some important notations used throughout the paper.

Table 1 Some important notations used throughout the paper

Cloud computing model

Cloud infrastructure consists of a number of DVFS-enabled physical machines, each supporting a maximum of L frequency levels (f1, f2,…, fL) and the corresponding voltages. The physical machines (PMs) are further virtualized into different virtual machines (VMs) using a hypervisor.

Figure 1 shows the cloud computing framework for the proposed FDRGS algorithm. A cloud user sends a request to the Cloud Manager with a workflow application graph as input. After the request is approved, the Cloud Manager forwards the workflow application to the Scheduler, whose main purpose is to generate the best scheduling solution for the user's application. The Reliability Predictor monitors each resource constantly and evaluates its state by analyzing its log; on the basis of this history log information, it predicts the failure characteristics of each physical machine in order to provide reliable resource allocation for task execution. Further, the DVFS Manager provides the Scheduler with the energy-saving characteristics of the resources and allows a process to execute at a selected voltage level. Finally, the Scheduler generates an efficient schedule and passes it to the Cloud Manager, which allocates tasks to the physical machines as per the scheduling sequence.

Fig. 1 Cloud computing framework for FDRGS algorithm

Workflow model

In this paper, a deterministic workflow model is considered, represented as a directed acyclic graph (DAG). The application consists of a bag of tasks with dependency/precedence constraints. A workflow application is represented by a DAG G = {T, ET, E, CT}, where

  • T is the vertex set representing the n tasks of graph G. Each task tj ∈ T can execute on any available computing resource. pred(tj) denotes the immediate predecessors of task tj and succ(tj) its immediate successors. The entry task (tentry) is a task with pred(tj) = ∅ and the exit task (texit) is a task with succ(tj) = ∅. The weight w(tj) of task tj denotes its computation cost, i.e. the number of instructions to be executed for that task (in MI).

  • ET(tj) denotes the execution time of task tj on a physical machine. Because of heterogeneity, a task has a different execution time on each physical machine.

  • E represents the edges of the application graph, where ej,k = (tj, tk) ∈ E is the dependency edge from task tj to task tk. The weight w(ej,k) assigned to edge ej,k signifies the data transferred (in MB) from task tj to tk.

  • The communication time/cost of edge ej,k is represented by CT(tj, tk) and depends on the amount of data transferred from task tj to tk when they execute on different PMs. The communication time is zero (CT(tj, tk) = 0) when the two tasks are assigned to the same physical machine.

Workflow scheduling model

We consider the problem of workflow scheduling in a cloud environment with the aim of reducing energy consumption and maximizing reliability for the workflow application. We first define the following parameters: communication time (CT(tj,tk)), earliest start time (EST(tj)) and earliest finish time (EFT(tj)). The communication time CT(tj,tk) between two precedence-constrained tasks is determined by the data w(ej,k) communicated from task tj to tk and the communication bandwidth \({\text{BW}}({\text{PM}}_{r} ,{\text{PM}}_{s} )\) (in Mbps) between the chosen physical machines PMr and PMs. So,

$$ CT(t_{j} ,t_{k} ) = \left\{ {\begin{array}{*{20}c} {\frac{{w\left( {e_{j,k} } \right)}}{{{\text{BW}}({\text{PM}}_{r} ,{\text{PM}}_{s} )}},} & {{\text{PM}}_{r} \ne {\text{PM}}_{s} } \\ {0,} & {{\text{PM}}_{r} = {\text{PM}}_{s} } \\ \end{array} } \right\} $$
(1)

The execution time \({\text{ET}}(t_{j} )\) of task \(t_{j}\) on any PM at the maximum frequency level (fr,L) is:

$$ {\text{ET}}(t_{j} ) = \frac{{w(t_{j} ) \cdot {\text{CPI}}}}{{f_{r,L} }}, $$
(2)

where \(f_{r,L}\) denotes the maximum CPU frequency level during the task's execution.

The estimated start time \({\text{EST}}(t_{j} )\) of a task is the time at which the task can start its execution. Due to the heterogeneity of the PMs, the execution time differs on every PM. \({\text{EST}}(t_{j} )\) is the maximum, over all predecessors of tj, of the predecessor's finish time on its assigned physical machine plus the communication time \({\text{CT}}(t_{p} ,t_{j} )\) from that predecessor to the current task:

$$ {\text{EST}}(t_{j} ) = \left\{ {\begin{array}{*{20}c} 0 & {t_{j} \, = \, t_{{{\text{entry}}}} } \\ {\begin{array}{*{20}c} {\max } \\ {t_{p} \in {\text{pred}}(t_{j} )} \\ \end{array} \left\{ {{\text{EST}}(t_{p} ) + {\text{ET}}(t_{p} ) + {\text{CT}}(t_{p} ,t_{j} )} \right\}} & {{\text{otherwise}}} \\ \end{array} } \right\}. $$
(3)

The estimated finish time (EFT(tj)) of a task tj is the time at which the task finishes its execution on a PM. It is the sum of the estimated start time \({\text{EST}}(t_{j} )\) and the execution time \({\text{ET}}(t_{j} )\):

$$ {\text{EFT}}(t_{j} ) = {\text{EST}}(t_{j} ) + {\text{ET}}(t_{j} ). $$
(4)
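As a concrete illustration, the timing recursion of Eqs. (1)–(4) can be sketched in a few lines of Python. The DAG, machine assignment, bandwidth (100 Mbps) and execution times below are invented example values, not the paper's experimental setup.

```python
def comm_time(data_mb, bw_mbps, pm_src, pm_dst):
    """Eq. (1): communication time; zero on the same physical machine."""
    return 0.0 if pm_src == pm_dst else data_mb / bw_mbps

def schedule_times(tasks, edges, assign, et, bw=100.0):
    """Eqs. (3)-(4): EST/EFT computed in topological order.

    tasks:  topologically ordered task ids
    edges:  {(tp, tj): data transferred in MB}
    assign: task -> PM id
    et:     task -> execution time (Eq. (2))
    """
    est, eft = {}, {}
    for tj in tasks:
        preds = [tp for (tp, t) in edges if t == tj]
        est[tj] = max((eft[tp] + comm_time(edges[(tp, tj)], bw,
                                           assign[tp], assign[tj])
                       for tp in preds), default=0.0)  # 0 for the entry task
        eft[tj] = est[tj] + et[tj]                     # Eq. (4)
    return est, eft

# Small three-task DAG: t1 -> t2, t1 -> t3, t2 -> t3
edges = {("t1", "t2"): 200.0, ("t1", "t3"): 100.0, ("t2", "t3"): 50.0}
assign = {"t1": "PM1", "t2": "PM2", "t3": "PM1"}
et = {"t1": 4.0, "t2": 3.0, "t3": 2.0}
est, eft = schedule_times(["t1", "t2", "t3"], edges, assign, et)
```

Note how the t1 → t3 edge contributes no communication time because both tasks sit on PM1, while the cross-machine edges do.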

Power model

The power model mathematically characterizes the energy consumption of a cloud computing environment. Power is consumed primarily by memory, CPU and networks; of these, the CPU consumes the largest share of energy, which is of two types: static and dynamic. Thus, the system power [44] is calculated as follows:

$$ P = P^{s} + k(P^{{{\text{ind}}}} + P^{d} ), $$
(5)

where Ps represents the static power consumption and Pind and Pd represent the frequency-independent and frequency-dependent dynamic power consumption, respectively. Pind does not vary with the system's operating frequency and can be reduced by keeping the system in standby state. The indicator k represents the system state: k = 1 means the system is in active mode and k = 0 means the system is in sleep (snooze) mode.

Static power is consumed during the idle state of the system; it keeps the memory in doze mode and the clock in a ready state, and can be kept low by turning systems off. The major contributor to power consumption is dynamic power. Therefore, we mainly concentrate on minimizing dynamic power using a technique such as DVFS, in which the voltage and frequency of the processor are dynamically scaled at run time.

Further, dynamic power consumption (Pd) is calculated as follows:

$$ P^{d} = C^{{{\text{eff}}}} v_{dd}^{2} f_{l} , $$
(6)

where Ceff represents the effective load capacitance and vdd the operating voltage, with \(v_{dd} \propto f_{l}\). When the workflow application executes at lower frequency levels, the supply voltage is reduced linearly. Thus, Pd becomes:

$$ P^{d} = C^{{{\text{eff}}}} f_{l}^{3} . $$
(7)

Energy is the product of power and time, \(\xi = P \times T\). Thus, the energy consumption \(\xi (t_{j} ,f_{r,l} )\) of a particular task tj executing on the rth selected physical machine operating at frequency (fl) is calculated as follows:

$$ \begin{aligned}\xi (t_{j} ,f_{r,l} ) &= (P^{{{\text{ind}}_{j} }} + C^{{{\text{eff}}}} f_{r,l}^{3} )\frac{{{\text{ET}}(t_{j} )}}{{f_{r,l} }} \\ &= P^{{{\text{ind}}_{j} }} \frac{{{\text{ET}}(t_{j} )}}{{f_{{_{r,l} }} }} + C^{{{\text{eff}}}} {\text{ET}}(t_{j} )f_{r,l}^{2} . \end{aligned}$$
(8)

To reduce energy consumption, the energy-efficient frequency \(f_{ee}\) [15] is considered; it is obtained by differentiating Eq. (8) with respect to f and setting the derivative to zero:

$$ f_{ee} = \sqrt[3]{{P^{{{\text{ind}}}} /2C^{{{\text{eff}}}} }}. $$
(9)

Finally, the total energy consumption \(\xi_{{{\text{tot}}}}\) of the cloud system for the workflow application tasks is the sum of the energy consumption of the individual tasks:

$$ \xi_{{{\text{tot}}}} = \sum\limits_{j = 1}^{n} {\xi (t_{j} ,f_{r,l} )} \quad {\text{Objective 1}}. $$
(10)
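The per-task energy of Eq. (8), its minimizing frequency of Eq. (9) and the total of Eq. (10) can be sketched as follows. The constants Pind and Ceff below are assumed example values, and frequencies are normalized so that the maximum level equals 1.

```python
def task_energy(p_ind, c_eff, et, f):
    """Eq. (8): xi = P_ind * ET / f + C_eff * ET * f^2."""
    return p_ind * et / f + c_eff * et * f ** 2

def energy_efficient_freq(p_ind, c_eff):
    """Eq. (9): f_ee = (P_ind / (2 * C_eff))^(1/3), the minimizer of Eq. (8)."""
    return (p_ind / (2.0 * c_eff)) ** (1.0 / 3.0)

def total_energy(task_params):
    """Eq. (10), Objective 1: sum of per-task energies."""
    return sum(task_energy(*p) for p in task_params)

# Assumed constants: P_ind = 0.1, C_eff = 0.8, ET = 10 time units
f_ee = energy_efficient_freq(0.1, 0.8)        # ~0.397 for these constants
e_at_fee = task_energy(0.1, 0.8, 10.0, f_ee)
e_at_max = task_energy(0.1, 0.8, 10.0, 1.0)   # running flat out wastes energy
```

Scaling below f_ee also wastes energy, because the frequency-independent term Pind·ET/f grows as execution stretches out; this is why Eq. (9) matters rather than simply running at the lowest frequency.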

Reliability model

During application execution, different types of faults, both permanent and transient, occur for many reasons, such as hardware malfunction, interference from external devices, software bugs and exposure of devices to extreme temperatures; they are very difficult to avoid. However, transient faults occur most frequently during task execution and can be handled, so in the current study we concentrate on transient faults only. A Poisson distribution is convenient for modeling transient faults with arrival rate λ [8, 45]. The arrival rate of transient faults depends on the corresponding voltage (vdd) and operating frequency \(f_{r,l}\): as \(f_{r,l}\) decreases, the fault rate increases exponentially, which matters for energy conservation. Considering the exponential model for voltage scaling [15], the fault rate of the processor increases as the supply voltage decreases. At the minimum frequency (fmin), the fault rate is maximal, i.e. \(\lambda_{\max } = \lambda_{0} 10^{d}\), where the positive constant d signifies the fault-rate dependency. When the system executes at frequency fl, the fault rate is:

$$ \lambda (f_{r,l} ) = \lambda_{0} 10^{{\frac{{d(1 - f_{r,l} )}}{{1 - f_{\min } }}}} , $$
(11)

where \(\lambda_{0}\) is the average fault rate at the maximum frequency fL.

Further, the reliability of a task is the probability of executing it completely without failure. Since transient faults follow a Poisson distribution [15, 46], the reliability of a task tj with execution time ET(tj) is:

$$ {\text{Reliability}}(t_{j} ,f_{r,l} ) = {\text{e}}^{{ - \lambda (f_{r,l} )\frac{{{\text{ET}}(t_{j} )}}{{f_{r,l} }}}} . $$
(12)

Finally, the total system reliability, i.e. the probability of executing all n tasks of the directed acyclic graph G without failure, is:

$$ {\text{Reliability}}_{{{\text{system}}}} (G) = \prod\limits_{j = 1}^{n} {{\text{Reliability}}_{{t_{j} }} (f_{r,l} )} . $$
(13)

To maximize reliability, we must minimize the exponent in Eq. (12), denoted the reliability exponential factor \({\text{RF}}(t_{j} ,f_{r,l} )\):

$$ {\text{RF}}(t_{j} ,f_{r,l} ) = \lambda (f_{r,l} )\frac{{{\text{ET}}(t_{j} )}}{{f_{r,l} }}. $$
(14)

Thus, reliability maximization is formulated as the minimization of the total reliability exponential factor of a schedule S:

$$ {\text{minimize}} ({\text{Reliability}}\;{\text{exponential}}\;{\text{factor}}(S)) = \sum\limits_{j = 1}^{n} {{\text{RF}}(t_{j} ,f_{r,l} )} \quad {\text{Objective}}\;{2}. $$
(15)
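The fault-rate and reliability relations of Eqs. (11)–(15) translate directly into code. The values of λ0, d and fmin below are assumed purely for illustration.

```python
import math

def fault_rate(f, lam0=1e-6, d=3.0, f_min=0.4):
    """Eq. (11): transient-fault rate grows exponentially as f is scaled down."""
    return lam0 * 10.0 ** (d * (1.0 - f) / (1.0 - f_min))

def task_reliability(et, f):
    """Eq. (12): probability of fault-free execution of one task."""
    return math.exp(-fault_rate(f) * et / f)

def reliability_factor(et, f):
    """Eq. (14): the exponent whose sum over tasks is minimized (Objective 2)."""
    return fault_rate(f) * et / f

def system_reliability(assignments):
    """Eq. (13): product of per-task reliabilities; assignments = [(ET, f), ...]."""
    r = 1.0
    for et, f in assignments:
        r *= task_reliability(et, f)
    return r
```

At f = 1 (maximum frequency) the rate is λ0, and at f = fmin it is λ0·10^d, matching the λmax expression above; lowering the frequency both raises λ and stretches the execution time ET/f, so reliability drops on two counts.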

Problem statement

In this paper, we address the problem of efficiently scheduling the tasks of a workflow application onto a dynamic set of cloud computing resources with the aim of jointly optimizing two conflicting objectives: lessening the energy consumption of the system and enhancing its reliability.

Proposed algorithm

This section presents the proposed ε-fuzzy dominance based reliable green workflow scheduling (FDRGS) algorithm for the cloud environment. Our workflow scheduling problem is multi-objective (MOP): we jointly optimize the two conflicting objectives of energy consumption and reliability. The algorithm works in two phases. First, a rank is assigned to all tasks of the workflow application, and the tasks with the highest rank receive the highest priority. Second, the most appropriate processor is selected for every task. A fuzzy dominance sorting mechanism is used to find the best trade-off solution based on the fuzzy dominance values; the proposed algorithm obtains a single best-compromise solution from the relative fitness of the trade-off solutions with little overhead. Further, the DVFS approach is applied to optimize energy consumption along with task reliability by choosing an appropriate frequency level of the processor.

Fuzzy dominance sort

Normally, an MOP has no single solution that is best in terms of all the conflicting objectives, so the Pareto-optimality concept is used to obtain multiple solutions. In a Pareto-optimal solution set, every non-dominated solution is considered equally good, because the concept of non-dominance does not quantify by how much one solution dominates another. Therefore, a metric called ε-fuzzy dominance [47] was introduced that quantifies the relative fitness of solutions and provides a crisper definition of dominance, leading to faster convergence towards the optimal solution set.

Let the MOP contain M objective functions \(g_{k}(s)\), k = 1, 2,…, M, and let \(\sigma \subset K^{\partial }\), where \(\partial\) is the dimension of the solution space, be the set of all feasible solutions.

Some important definitions of fuzzy dominance are as follows:

  • Fuzzy k-dominance of a solution: there exists an increasing function \(\mu_{{{\text{dom}}}}^{k} (.)\) whose values lie between 0 and 1. A solution \(x \in \sigma\) k-dominates \(y \in \sigma\) iff \(g_{k} (x) \prec g_{k} (y)\). When this relation \((x \succ_{k}^{F} y)\) holds, the degree of fuzzy dominance is represented as

    $$ \mu_{{{\text{dom}}}}^{k} (g_{k} (y) - g_{k} (x)) = \mu_{{{\text{dom}}}}^{k} (x \succ_{k}^{F} y). $$
    (16)

    The membership function \(\mu_{{{\text{dom}}}}^{k}\) used to compute fuzzy k-dominance takes the value zero for negative arguments [47]; in other words, \(\mu_{{{\text{dom}}}}^{k} (g_{k} (y) - g_{k} (x))\) is zero if \(g_{k} (y) \le g_{k} (x)\). ε-Fuzzy k-dominance relaxes this by using a trapezoidal membership function, which allows non-zero values for arguments slightly below zero. Let \(\Delta_{k}\) signify the maximum value of \(g_{k} (y) - g_{k} (x)\); the membership value reaches its maximum of 1 when \(g_{k} (y) - g_{k} (x) \ge \Delta_{k} - \in\). The membership function \(\mu_{{{\text{dom}}}}^{k} (g_{k} (y) - g_{k} (x))\) is defined as follows:

$$ \mu_{{{\text{dom}}}}^{k} (g_{k} (y) - g_{k} (x)) = \left\{ {\begin{array}{*{20}c} 0 & {g_{k} (y) - g_{k} (x) \le - \in } \\ {\frac{{g_{k} (y) - g_{k} (x) + \in }}{{\Delta_{k} }}} & { - \in < g_{k} (y) - g_{k} (x) < \Delta_{k} - \in } \\ 1 & {g_{k} (y) - g_{k} (x) \ge \Delta_{k} - \in } \\ \end{array} } \right\}. $$
(17)
  • ε-Fuzzy dominance of a solution: a solution \(x \in \sigma\) fuzzy-dominates y iff \(x \succ_{k}^{F} y\) holds for every k \(\in\) {1, 2,…, M}. The fuzzy intersection (t-norm) operator is used to calculate the overall degree of fuzzy dominance:

    $$ \mu_{{{\text{dom}}}} (x \succ^{F} y) = \bigcap\limits_{k = 1}^{M} {\mu_{{{\text{dom}}}}^{k} (x \succ_{k}^{F} y)} . $$
    (18)
  • ε-Fuzzy dominance in the solution space: given a solution set S, a solution y ∈ S is fuzzy-dominated if it is dominated by other solutions x ∈ S. Therefore, the fuzzy union (t-conorm) operator is used to aggregate the fuzzy dominance values:

    $$ \mu_{{{\text{dom}}}} (S \succ^{F} y) = \bigcup\limits_{x \in S} {\mu_{{{\text{dom}}}} (x \succ^{F} y)} . $$
    (19)

After the degrees of fuzzy dominance are calculated, every solution carries a single value signifying by how much it is dominated by the other solutions in the set; a smaller fuzzy dominance value indicates a better solution. Hence, after each iteration, this procedure helps to pick out the best solution according to the fuzzy dominance value.
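A minimal sketch of the three definitions above (Eqs. (17)–(19)), assuming minimization objectives, min as the t-norm and max as the t-conorm; the objective vectors, Δk and ε values below are invented for illustration.

```python
def mu_k(diff, delta_k, eps):
    """Eq. (17): trapezoidal membership on diff = g_k(y) - g_k(x)."""
    if diff <= -eps:
        return 0.0
    if diff >= delta_k - eps:
        return 1.0
    return (diff + eps) / delta_k

def fuzzy_dominance(x, y, deltas, eps):
    """Eq. (18): degree to which solution x fuzzy-dominates solution y."""
    return min(mu_k(gy - gx, dk, eps) for gx, gy, dk in zip(x, y, deltas))

def dominance_by_set(sols, y, deltas, eps):
    """Eq. (19): degree to which the rest of the set fuzzy-dominates y."""
    return max(fuzzy_dominance(x, y, deltas, eps) for x in sols if x is not y)

# Three trade-off points as (energy, reliability-factor) pairs plus one
# clearly dominated point; a lower dominance_by_set value = a better solution.
sols = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
deltas, eps = (3.0, 3.0), 0.1
scores = [dominance_by_set(sols, y, deltas, eps) for y in sols]
```

The three non-dominated points receive a dominance value of 0, while the point (3.0, 3.0), beaten on both objectives by (2.0, 2.0), receives a strictly positive value and would be pruned first.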

Task priority

Before assigning tasks to processors, we determine an order of tasks that satisfies the dependency constraints among them. The concept of rank is used to generate a topological ordering of tasks; the upward rank values are computed recursively as follows:

$$ {\text{rank}}(t_{j} ) = \overline{{w(t_{j} )}} + \mathop {\max }\limits_{{t_{s} \in {\text{succ (t}}_{{\text{j}}} {) }}} (\overline{{{\text{CT}}(t_{j} ,t_{s} )}} + {\text{rank}}(t_{s} )), $$
(20)

where succ(tj) represents the immediate successors of task tj. The rank value of the exit task texit is:

$$ {\text{rank}}(t_{{{\text{exit}}}} ) = \overline{{w(t_{{{\text{exit}}}} )}} . $$
(21)

A task with a greater rank value has a higher priority; the task scheduling sequence is therefore generated in decreasing order of the rank values.
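The recursive upward-rank computation of Eqs. (20)–(21) can be sketched as follows. The toy DAG, average execution times and average communication times below are illustrative values, not taken from the paper:

```python
def upward_ranks(avg_w, avg_ct, succ):
    """Eq. (20): rank(t) = avg_w(t) + max over successors s of (avg_ct(t, s) + rank(s));
    Eq. (21): exit tasks (no successors) get rank(t) = avg_w(t)."""
    memo = {}
    def rank(t):
        if t not in memo:
            kids = succ.get(t, [])
            memo[t] = avg_w[t] + (max(avg_ct[t, s] + rank(s) for s in kids) if kids else 0.0)
        return memo[t]
    return {t: rank(t) for t in avg_w}

# Toy DAG: t1 -> {t2, t3}, t2 -> t4, t3 -> t4
avg_w = {"t1": 2.0, "t2": 3.0, "t3": 4.0, "t4": 1.0}
avg_ct = {("t1", "t2"): 1.0, ("t1", "t3"): 2.0, ("t2", "t4"): 1.0, ("t3", "t4"): 1.0}
succ = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"]}

ranks = upward_ranks(avg_w, avg_ct, succ)
order = sorted(ranks, key=ranks.get, reverse=True)  # scheduling sequence
```

Sorting by decreasing rank yields t1, t3, t2, t4 for this toy DAG, which is a valid topological order respecting the dependency constraints.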

Proposed FDRGS Algorithm

figure a

The proposed FDRGS algorithm is depicted in Algorithm 1. First, the average execution time \(\overline{{{\text{ET}}(t_{j} )}}\) is calculated in step 2, and step 5 calculates the average communication time \(\overline{{{\text{CT}}(t_{j} ,t_{k} )}}\) for all tasks. Step 8 calculates the upward rank of all tasks, starting from texit, and sorts the tasks in decreasing order of rank value. Step 10 calculates the EST and EFT for all tasks in DAG G. Step 13 calculates the energy-efficient frequency for each physical machine PMr, and step 14 calculates the reliability index for every task. In step 16, the algorithm initializes a scheduling solution sequence SS, used to store all possible solutions, to empty. Step 21 iterates over all tasks sorted by rank. The main idea is to lengthen each solution in SS by assigning the next task to each of the m possible resources: new solutions are generated by iterating over all resources in step 25, and the new intermediate schedules are added to the set SS′ in step 26. After obtaining all the solutions, fuzzy dominance values are calculated (as per Algorithm 2); if two fuzzy dominance values are the same, indicating that both solutions are equally good, the perimeter assignment function (Algorithm 3) is used in step 30 to obtain a uniform spread of solutions. Finally, in step 31 we select the best K solutions from SS′ into SS, representing the best trade-off solution sequences.

figure b

Algorithm 2 calculates the fuzzy dominance values for the solutions in SS. Step 2 creates a temporary variable α, which is used for storing the fuzzy dominance values of the solution set (step 13). Steps 3–12 calculate the value of fuzzy k-dominance using Eq. (17), which is then used to compute fuzzy dominance by a solution as per Eq. (18); fuzzy dominance in the solution space is calculated as per Eq. (19).

If multiple solutions have the same fuzzy dominance value, the perimeter assignment function (Algorithm 3) is used to find the spread of solutions, which represents the perimeter of the largest M-dimensional hypercube around each solution in the solution space. Boundary solutions are assigned a value of infinity. Solutions with a high perimeter value are preferred, as this maintains diversity and indicates sparse regions of the scheduling solution space. The value of the perimeter \(p(SS_{l} )\) is

$$ p(SS_{l} ) = \sum\limits_{k = 1}^{M} {\frac{{g_{k} (a) - g_{k} (b)}}{{\max (g_{k} ) - \min (g_{k} )}}} , $$
(22)

where a and b are the adjacent solutions having the same value of fuzzy dominance, \(g_{k} (a)\) and \(g_{k} (b)\) are their kth objective values, and \(\max (g_{k} )\) and \(\min (g_{k} )\) are the largest and smallest values of the kth objective function over the solution sequence SS.

figure c

Algorithm 3 calculates the perimeter value of all solutions corresponding to the M objectives. Step 3 initializes all perimeter values to zero. The algorithm sorts all solutions by objective value gk in step 6 and sets the first and last perimeter values to infinity in step 7. Step 9 then calculates the values of all remaining solutions using Eq. (22).
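The perimeter assignment can be sketched compactly as below, following the steps of Algorithm 3 and Eq. (22). Function names and the per-objective normalization are our assumptions where the text is silent:

```python
import math

def perimeter_assignment(objs):
    """objs: list of M-tuples of objective values, one tuple per solution.
    Returns one perimeter value per solution; boundary solutions get infinity."""
    n, m = len(objs), len(objs[0])
    p = [0.0] * n  # step 3: initialize all perimeter values to zero
    for k in range(m):
        order = sorted(range(n), key=lambda i: objs[i][k])  # step 6: sort by g_k
        p[order[0]] = p[order[-1]] = math.inf               # step 7: boundaries -> infinity
        span = objs[order[-1]][k] - objs[order[0]][k] or 1.0
        for j in range(1, n - 1):                           # step 9: Eq. (22)
            i = order[j]
            if not math.isinf(p[i]):
                a, b = objs[order[j + 1]][k], objs[order[j - 1]][k]
                p[i] += (a - b) / span
    return p
```

Larger perimeter values mark sparser regions of the objective space, so ties in fuzzy dominance are broken in favour of the solution with the larger perimeter, preserving diversity.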

Illustrative example

In this section, we consider an example to illustrate the proposed FDRGS algorithm and explain the process of generating and selecting scheduling sequences. We consider three tasks (t1, t2, t3) run on two physical machines (PM1 and PM2). First, for task t1, two solutions are generated and stored in a temporary solution set S: S1 = {(t1, PM1)}, S2 = {(t1, PM2)}. These two solutions are sorted using fuzzy dominance sort (Algorithm 2). When task t1 is assigned and the next task t2 is ready, four possible solutions are generated, of which two are selected from S and stored in the scheduling set SS. The generated solutions are: S1 = {(t1, PM1) (t2, PM1)}, S2 = {(t1, PM1) (t2, PM2)}, S3 = {(t1, PM2) (t2, PM1)}, S4 = {(t1, PM2) (t2, PM2)}. Fuzzy dominance sort is applied and the first K solutions, i.e. S2 and S3, are selected. When tasks t1 and t2 are assigned and t3 is ready to execute, eight possible solutions are generated and stored in the set SS: S1 = {(t1, PM1) (t2, PM1) (t3, PM1)}, S2 = {(t1, PM1) (t2, PM1) (t3, PM2)}, S3 = {(t1, PM2) (t2, PM1) (t3, PM1)}, S4 = {(t1, PM2) (t2, PM1) (t3, PM2)}, S5 = {(t1, PM1) (t2, PM2) (t3, PM2)}, S6 = {(t1, PM1) (t2, PM2) (t3, PM1)}, S7 = {(t1, PM2) (t2, PM1) (t3, PM1)}, S8 = {(t1, PM2) (t2, PM2) (t3, PM1)}. Fuzzy dominance sort is applied; if two or more solutions have the same fuzzy dominance value, the tie is broken with the help of the perimeter assignment function (Algorithm 3). Finally, two solutions S5 and S7 are selected, i.e. SS = {S5, S7} = {{(t1, PM1) (t2, PM2) (t3, PM2)}, {(t1, PM2) (t2, PM1) (t3, PM1)}}. For simplicity and clarity, we omit the details of the fuzzy dominance sort and focus mainly on the process of generating the scheduling sequence by the proposed FDRGS algorithm.
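The growth of candidate schedules in the example above can be reproduced with a short enumeration. The fuzzy dominance sort and perimeter tie-break are deliberately left out (the comment marks where the selection step would go), so this only shows how partial schedules are extended task by task:

```python
tasks = ["t1", "t2", "t3"]
machines = ["PM1", "PM2"]

SS = [[]]   # start from the single empty partial schedule
sizes = []
for t in tasks:
    # extend every partial schedule with every possible machine assignment
    SS = [s + [(t, pm)] for s in SS for pm in machines]
    sizes.append(len(SS))
    # In FDRGS, only the best K schedules would survive here, chosen by
    # fuzzy dominance sort (Algorithm 2) with perimeter tie-breaking (Algorithm 3).

print(sizes)  # [2, 4, 8] -- the candidate counts stated in the example
```

Without pruning, the candidate set doubles per task (2, 4, 8); pruning to the best K after each step is what keeps the search tractable for larger workflows.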

Simulation results and analysis

In order to evaluate the proposed FDRGS algorithm, we conducted simulation experiments on the CloudSim toolkit [48], considering different DAG applications such as Gaussian elimination (GE) [49] and fast Fourier transform (FFT) [50] that represent real-world workflow applications. We considered different matrix sizes to vary the number of tasks. We compared our proposed FDRGS algorithm with state-of-the-art scheduling algorithms such as HEFT [10], energy-conscious scheduling (ECS) [21] and reliability-aware HEFT (RHEFT) [11]. The HEFT algorithm chooses tasks according to their rank and assigns each task to the processor with the minimum EFT, minimizing the makespan only. To minimize energy consumption, the ECS algorithm uses the concept of relative superiority (RS) to choose a processor for the ready task at a suitable voltage/frequency; the processor with the largest RS value is chosen. Reliability-aware HEFT (RHEFT) aims to increase system reliability by considering the failure rate of the processors when mapping the tasks (Table 2). An extensive simulation was performed on real-world task graphs by varying the number of available processors/physical machines, the number of tasks, and the computation-to-communication ratio (CCR) values.

Table 2 Parameter values

Performance metrics

The proposed FDRGS algorithm’s performance is measured by considering following metrics:

  • Reliability: in a workflow application, reliability is the probability that a task executes successfully on its assigned processor. System reliability is obtained by multiplying the reliabilities of all tasks, as in Eq. (15).

  • Energy consumption: the total energy consumption (\(\xi_{{{\text{tot}}}}\)) by the processors during task execution is the summation of the energy consumed by all tasks, as in Eq. (10).

  • Runtime: the time taken by the scheduling algorithm to generate the schedule for all tasks, as shown in Table 5.

Workflow structure

Generally, a workflow application is modelled as a DAG in which tasks are connected by precedence relations/dependencies. To simulate workflow applications (DAGs), we used two sets of real-life problems, i.e. GE and FFT task graphs. Gaussian elimination and fast Fourier transform have been widely used to evaluate scheduling performance in cloud computing systems; they are briefly introduced as follows.

Gaussian elimination (GE) task graph

The Gaussian elimination algorithm is used in mathematics to solve systems of linear equations, compute the rank of a matrix, and find the inverse of an invertible square matrix. Gaussian elimination is a parallel application with precedence constraints. Assume that \(\tau\) is the matrix size, which characterizes the GE graph. The total number of tasks is \(\frac{{\tau^{2} + \tau - 2}}{2}\), where \(\tau\) ≥ 2. Figure 2 depicts a GE graph with matrix size 5. The structure of the Gaussian elimination application graph is fixed, so we do not need to specify the shape parameter and number of tasks explicitly. However, we varied the matrix size and CCR values in GE. In order to consider different sizes of the workflow, different values of the matrix size are considered. Table 3 shows the values for the GE task graphs.
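The task-count formula can be checked directly. The specific matrix sizes in the sketch below are our own inference from the formula (they yield the task counts used later in the experiments) and are not stated explicitly in the text:

```python
def ge_task_count(tau):
    # total tasks in a GE task graph of matrix size tau (tau >= 2)
    assert tau >= 2
    return (tau * tau + tau - 2) // 2

# tau = 5 reproduces the 14-task graph of Fig. 2 per the formula;
# tau = 4, 6, 8 and 11 give 9, 20, 35 and 65 tasks, respectively.
counts = {tau: ge_task_count(tau) for tau in (4, 5, 6, 8, 11)}
```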

Fig. 2
figure 2

GE graph with matrix size 5

Table 3 Values for GE task graphs

Fast Fourier transform (FFT) task graph

FFT computation is used in different types of applications in mathematics, engineering and science. The algorithm calculates the discrete Fourier transform and its inverse. Figure 3 shows an FFT graph with matrix size \(\tau\) = 4. Table 4 shows the parameter values considered for the FFT task graphs. The total number of tasks is calculated as:

$$ \left| {{\text{Task}}_{{{\text{tot}}}} } \right| = (2 \times \tau - 1) + \tau \times \log_{2} \tau . $$
(23)
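Eq. (23) can likewise be evaluated directly. This is a small sketch; \(\tau\) is assumed to be a power of two so that log2 τ is an integer:

```python
import math

def fft_task_count(tau):
    # Eq. (23): |Task_tot| = (2*tau - 1) + tau * log2(tau)
    return (2 * tau - 1) + tau * int(math.log2(tau))

# for tau = 4, the formula gives (2*4 - 1) + 4*2 = 15 tasks
```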
Fig. 3
figure 3

FFT application with \(\tau\) = 4

Table 4 Parameter values for the FFT task graph

Results and analysis

In this section, we evaluate the proposed FDRGS algorithm on workflows represented by GE task graphs, followed by FFT task graphs.

Performance at different sizes of task graphs

First, we varied the number of tasks from 9 to 65 (9, 20, 35 and 65) by considering different matrix sizes in GE to estimate the performance of the FDRGS algorithm. The results obtained for energy consumption and reliability are presented in Figs. 4 and 5, respectively. Figure 4 shows that our proposed FDRGS algorithm consumes less energy than the other scheduling algorithms: FDRGS provides energy savings of 15–27% with respect to ECS, 25–31% with respect to HEFT, and 39–45% with respect to RHEFT. The HEFT algorithm is a popular list-based scheduling algorithm that considers only the makespan factor. In the RHEFT algorithm, reliability and makespan are the prime concerns; tasks are scheduled on the most reliable processors without paying attention to energy consumption, which is why its energy consumption is high. In ECS, the RS factor contributes towards minimizing makespan and energy consumption, so the overall gain in energy savings is smaller. Further, we observed that as the number of tasks increases, the energy savings also increase, owing to the presence of dependency constraints where DVFS can be applied to further reduce energy consumption.

Fig. 4
figure 4

Energy consumption with different number of tasks for Gaussian elimination graph

Fig. 5
figure 5

Reliability with varying number of tasks for Gaussian elimination graph

From Fig. 5, we observe that our proposed FDRGS algorithm performs better than the other algorithms in terms of reliability. ECS applies DVFS to minimize energy consumption without considering the fact that, with voltage scaling, the reliability of a resource decreases exponentially; thus, it provides very low reliability. In our proposed FDRGS, by contrast, we select an appropriate frequency above the energy-efficient frequency (fee) for each resource, so that reliability is not compromised while energy consumption is kept low. Reliability is lowest for ECS because it focuses on makespan and energy consumption only.

Performance at different number of processors

Next, we considered different numbers of processors (4, 8, 16 and 32) for a workflow application with 65 tasks to estimate the performance of the FDRGS algorithm (Figs. 6 and 7). Figure 6 shows that the energy savings increase as the number of processors increases. For the energy consumption objective, our FDRGS algorithm reduces energy relative to ECS by 11–21%, RHEFT by 26–38% and HEFT by 14–25%. This is because, as the number of processors increases, fewer tasks are allotted to each processor; consequently, the opportunities to scale the voltage appropriately increase, and the proposed FDRGS algorithm minimizes energy consumption accordingly.

Fig. 6
figure 6

Energy consumption with different number of processors for Gaussian elimination graph

Fig. 7
figure 7

System’s reliability with different number of processors for Gaussian elimination graph

Next, the reliability of the FDRGS algorithm is better than that of the other considered algorithms. With a large number of processors available, the probability of choosing good resources increases.

Performance at different values of CCR

Figures 8 and 9 present the FDRGS performance when varying the CCR value, considering 65 tasks and 10 processors. In the ECS algorithm, as the CCR value increases, the energy savings decrease because the reduction of communication energy is not considered. With our proposed FDRGS algorithm, for computation-intensive graphs (CCR < 1), the percentage of energy savings ranges from 20 to 12% with regard to ECS, 28–21% with regard to HEFT, and 32–28% with regard to RHEFT. Further, for communication-intensive graphs, the energy savings are 10% with regard to ECS, 20% with regard to HEFT, and 26% with regard to RHEFT.

Fig. 8
figure 8

Energy consumption with different value of CCR for Gaussian elimination Graph

Fig. 9
figure 9

System’s reliability with different value of CCR for Gaussian elimination Graph

Next, we conducted simulation experiments for the FFT application graph, varying the number of tasks as 9, 14, 27 and 65. The simulation results for energy consumption and reliability when varying the number of application tasks, the number of computing processors and the CCR value are shown in Figs. 10 and 11. The results signify that the proposed FDRGS algorithm gives superior results in terms of energy consumption and reliability for a real-world problem, i.e. the FFT task graph.

Fig. 10
figure 10

(i) Energy consumption for FFT task graphs at varied tasks, (ii) energy consumption for FFT task graphs at varied number of processor, (iii) energy consumption for FFT task graphs by varying CCR

Fig. 11
figure 11

(i) Reliability with varying number of tasks for FFT task graph, (ii) reliability for FFT at different number of processor, (iii) reliability at different CCR for FFT task graph

Further, Table 5 shows the runtime of our proposed FDRGS algorithm and the comparative algorithms ECS [21], HEFT [10] and reliability-aware HEFT (RHEFT) [11] for executing different numbers of tasks on FFT and GE task graphs with 8 resources and CCR equal to 1. The results show that the runtime of the FDRGS algorithm increases as the number of tasks increases, because a larger number of tasks requires more time to generate the optimal scheduling solution; the runtime of the proposed FDRGS algorithm thus varies with the size of the application, i.e. the number of tasks. However, our FDRGS algorithm takes less runtime than the other comparative algorithms and is computationally efficient. This is due to the pruning of multiple trade-off solutions using fuzzy dominance sorting.

Table 5 Runtime of four algorithms (our proposed algorithm FDRGS and comparative algorithms ECS [21], HEFT [10] and reliability-aware HEFT (RHEFT) [11])

Statistical analysis

The statistical significance of the results corresponding to the proposed FDRGS algorithm is determined by the coefficient of variation (CoV). This is a statistical measure of the dispersion of data about the mean value: it expresses the standard deviation as a proportion of the mean, and is calculated as follows:

$$ {\text{CoV}} = \frac{{{\text{SD}}}}{{{\text{Mean}}}} \times 100, $$

where SD represents standard deviation.
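The CoV computation is straightforward; the sketch below uses the population standard deviation, which is an assumption since the text does not specify sample versus population SD:

```python
from statistics import mean, pstdev

def coefficient_of_variation(samples):
    # CoV = SD / mean * 100 (population SD assumed)
    return pstdev(samples) / mean(samples) * 100.0

# e.g. measurements 2, 4, 4, 4, 5, 5, 7, 9 have mean 5 and SD 2, so CoV = 40%
```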

To guarantee statistical correctness, we ran our proposed FDRGS algorithm and averaged the results over 15–20 runs. The CoV of average energy consumption and average reliability has been studied with regard to different numbers of tasks and numbers of processors for Gaussian elimination, as shown in Tables 6, 7, 8 and 9. The smaller CoV values of the proposed FDRGS algorithm signify that it is more efficient and stable than the comparative algorithms.

Table 6 Statistical analysis for energy consumption for varied tasks for GE task graph
Table 7 Statistical analysis for reliability for varying number of tasks for GE Task graph
Table 8 Statistical analysis for energy consumption for varying number of processor for GE task graph
Table 9 Statistical analysis for reliability for varying number of processor for GE task graph

A similar trend is followed by the fast Fourier transform workflow task graphs. The CoV of average energy consumption and average reliability with respect to different numbers of tasks and numbers of processors is shown in Tables 10, 11, 12 and 13.

Table 10 Statistical analysis for energy consumption for varying number of tasks for FFT task graph
Table 11 Statistical analysis for energy consumption for varying number of processor for FFT task graph
Table 12 Statistical analysis for reliability for varying number of tasks for FFT task graph
Table 13 Statistical analysis for reliability for varying number of processor for FFT task graph

Conclusion

Due to the increasing demand for computation, energy consumption in data centers is rising; its reduction is therefore a primary concern for data centers. Recently, the need to minimize energy consumption while maximizing system reliability has become an important research topic. In this paper, we reduce the system's energy consumption using the DVFS technique while maintaining the reliability of task execution. We proposed an ε-fuzzy dominance based reliable green workflow scheduling (FDRGS) algorithm, which jointly optimizes the application's reliability and energy consumption using the ε-fuzzy dominance mechanism. The proposed FDRGS algorithm addresses energy consumption and reliability well in the presence of transient faults, and helps to find a single best-compromise solution by taking the relative fitness of the trade-off solutions into account with little computation time. The performance of the proposed FDRGS algorithm was evaluated against other algorithms, namely ECS, HEFT and RHEFT, on real-world application workflows such as GE and FFT graphs. We validated our results with a well-known statistical measure, namely the coefficient of variation. The simulation results clearly show that the proposed FDRGS algorithm outperforms the compared algorithms in terms of energy consumption and reliability, providing energy savings of 15–45%. The proposed algorithm will be helpful for researchers and engineers in establishing reliable and energy-efficient cloud applications. In the future, we will extend our algorithm by considering other important objectives such as cost and makespan for workflow applications. The deployment of our work in a multi-cloud or real cloud environment will also be a future step.