1 Introduction

An intelligent grid is expected to manage its resources so that task requirements are met while the common objective is pursued. Self-configuration of a computer grid means that new computer nodes are automatically configured by software agents and then integrated into the grid. The whole process resembles the “plug-and-play” mechanism of some operating systems. In this case, however, configuration agents establish connectivity and download some configuration parameters. When a new computer node is added to the middleware layer and powered on, it is immediately identified and registered by configuration agents.

A base node operates according to several configuration parameters that define aspects of data communication and power consumption. These parameters can be tuned to change grid behavior on the basis of administrator observations. Another way is to delegate this competence to optimization agents, which find the configuration best adjusted to the workload and resource usage. One of the most commonly used criteria of grid behavior is its reliability, which should be maximized. The main difficulty is that this problem is NP-hard, so finding an optimal configuration for hundreds of nodes is impractical.

In the presented model, we propose optimization agents based on harmony search that find a suboptimal configuration of fault-tolerant grids processing big data. A fault-tolerant grid deals with failures of its nodes and software; each node has some duplicated servers associated with it [38]. One node is the primary, and the associated nodes are dedicated to backup [18]. Tasks are performed by the primary and backup servers concurrently. Another grid model is based on the assumption that there are no dedicated fault-tolerant nodes. Instead, a grid node cooperates with other nodes that act as its backups. If a node fails, all tasks allocated to it are re-allocated to one of its backups. Some resource management algorithms take into account the failure/repair rates and the fault-tolerance overheads. These algorithms can improve grid performance considerably, but the quality of the configuration and the time needed to find it are still open issues [20, 42].

In this paper, an outlook of harmony search metaheuristics is given in Sect. 2. Specific aspects of big data (BD) are presented in Sect. 3, and the Map-Reduce model for BD processing is studied in Sect. 4. Then, intelligent agents based on harmony search that improve the fault-tolerance measure are described in Sect. 5. Finally, some outcomes from numerical experiments are interpreted in Sect. 6.

2 Outlook of Harmony Search Metaheuristics

Harmony search can support self-configuration of fault-tolerant grids. The harmony search (HS) metaheuristic models phenomena related to playing musical instruments [41]. An optimization process can be compared to jazz musicians selecting the best sound while improvising. Similarly, a conductor of an orchestra searches for the best harmony of several instruments, or a composer creates the best melody for different musical lines [1]. The HS concept was suggested by Geem [15, 45]. Figure 1 shows a diagram of the basic version of the HS metaheuristic [2].

Fig. 1. A diagram of the harmony search algorithm [29]

The HS algorithm determines a solution of a single-criterion optimization problem with continuous decision variables, which can be formulated as follows [4]:

$$ \min_{x \in \varvec{X}} f(x), $$
(1)

where:

f(x) – the value of the objective function f for a solution \( x \in \varvec{X} \), where \( f:\varvec{R}^{J_{max}} \to \varvec{R} \);

x – a vector of decision variables, \( x = [x_{1}, \ldots, x_{j}, \ldots, x_{J_{max}}]^{T} \), with \( l_{j} \le x_{j} \le u_{j} \) for \( j = \overline{1,J_{max}} \);

\( J_{max} \) – the number of decision variables;

X – the set of admissible solutions.

The lower limit vector is \( l = [l_{1}, \ldots, l_{j}, \ldots, l_{J_{max}}]^{T} \) and the upper limit vector is \( u = [u_{1}, \ldots, u_{j}, \ldots, u_{J_{max}}]^{T} \), where \( l_{j} \in \varvec{R},\; u_{j} \in \varvec{R},\; l_{j} \le u_{j} \) for \( j = \overline{1,J_{max}} \). The harmony memory HM (Fig. 1) is initialized after the following parameters have been set:

  • HMS - Harmony Memory Size 24;

  • HMCR - Harmony Memory Considering Rate, the probability that the value of a decision variable during improvisation (constructing a solution) is drawn from the memory HM; a uniform distribution is assumed for the draw;

  • PAR - Pitch Adjusting Rate, the probability that the value of a decision variable drawn from the memory HM is additionally modified (pitch-adjusted);

  • NGmax - Number of Generations (Improvisations);

  • BW - Bandwidth of Generations, the width of the interval used to modify the value of a decision variable that is randomly selected from the memory; the new value is obtained by adding a value from the range [−BW, BW].

The memory HM stores HMS randomly generated solutions with Jmax coordinates together with the corresponding fitness function values fitness(x). If constraints are imposed on the solution, its fitness is reduced by an appropriate penalty whenever a constraint is violated. The fitness of each solution can be increased by a constant chosen so that the accepted values are non-negative. The basic version of the harmony search algorithm has been modified repeatedly to suit particular optimization problems [22].
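
The parameters listed above can be put together in a minimal sketch of the commonly described basic HS loop for problem (1). This is an illustration under stated assumptions rather than the implementation used in this work: the function name `harmony_search`, the default parameter values, the clipping of new values to \( [l_{j}, u_{j}] \), and the test function `sphere` are all hypothetical choices.

```python
import random

def harmony_search(f, lower, upper, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   ng_max=5000, seed=0):
    """Minimise f over the box [lower, upper] with a basic harmony search.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    j_max = len(lower)
    # Initialise the harmony memory HM with HMS random solutions.
    memory = [[rng.uniform(lower[j], upper[j]) for j in range(j_max)]
              for _ in range(hms)]
    costs = [f(x) for x in memory]
    for _ in range(ng_max):  # NGmax improvisations
        new = []
        for j in range(j_max):
            if rng.random() < hmcr:
                # Memory consideration: take x_j from a uniformly drawn harmony.
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment: add a value from [-BW, BW].
                    value += rng.uniform(-bw, bw)
            else:
                # Random selection from the whole admissible range.
                value = rng.uniform(lower[j], upper[j])
            # Assumption: out-of-range values are clipped to [l_j, u_j].
            new.append(min(max(value, lower[j]), upper[j]))
        cost = f(new)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(hms), key=costs.__getitem__)
        if cost < costs[worst]:
            memory[worst], costs[worst] = new, cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]

def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_cost = harmony_search(sphere, [-5.0] * 3, [5.0] * 3)
```

For a maximized criterion such as reliability, the same loop can be used by passing the negated criterion as `f`.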

3 Intelligent Agent Architecture for Big Data

Big data (abbreviated BD) is related to databases with petabyte capacities (\( 10^{15} \) B). Ten terabytes is a large capacity for a financial transaction system, but it is too small to test a web search engine. BD is difficult to handle with relational database management systems such as DB2, INGRES, Oracle, Sybase or SQL Server. Big data requires on the order of a hundred thousand processors for data processing, as provided by supercomputers [36], grids [11] or clouds [8]. In particular, cloud architectures are preferred for BD processing because of commercial data centers holding expensive information.

Tasks developed as SQL-like queries to BD are massively parallel because a short query execution time is required. For instance, a query on multi-terabyte datasets in the BigQuery service of Google Cloud is performed within a few seconds. BigQuery is a scalable cloud service offered like IaaS (Infrastructure as a Service). Furthermore, this RESTful web service enables interactive analysis in cooperation with Google Storage [16]. The most important tasks are related to analytics, capture, search, sharing, storage, and visualization. Moreover, BD mining tasks can be used to make predictions, and descriptive statistics tasks can be developed for business intelligence [23].

BD can be characterized by the 4Vs model: high volume, extraordinary velocity, great variety of data, and veracity. Data can be captured via the Internet of Things from different sensors and devices such as smartphones, tablets, microphones, cameras, computers, radars, satellites and radio-telescopes. Moreover, data can be captured from social networks. The storage capacity of a single volume can reach many petabytes, which constitutes high volume [26]. MongoDB is one of the promising solutions for BD because this NoSQL database supports data stored on different nodes. MongoDB can cooperate with a massively parallel cluster with many CPUs, GPUs, RAM units and disks [27]. A crucial problem with BD is reading from a storage system to obtain a rapid answer to a complex query that is divided into parallel operations acting on diverse data. Big data can be spread over partitions that run on separate nodes with their own table spaces, logs, and configurations. In that case, a query is performed on all partitions concurrently [35, 44].

In an experimental grid called Comcute, two kinds of intelligent agents have been considered to implement the middleware layer [13]. This grid is dedicated to parallel processing by means of volunteer computing. Agents for data management send data from source databases to distribution agents. Then, distribution agents cooperate with web computers to calculate results and return them to the management agents. Both types of agents can autonomously move from one host to another to improve the quality of grid resource usage. Moreover, other agents based on harmony search have been introduced to optimize big data processing with respect to fault-tolerance aspects. These harmony search schedulers can cooperate with distributors and managers by giving them information about the optimal workload in the grid [9, 10].

The lambda architecture is developed for real-time BD analysis [26]. The batch layer of this architecture supports offline data processing by the MapReduce framework [39]. This layer produces batch views of data, which can be exposed to external applications (Fig. 2). The serving layer offers prepared views to clients. The speed layer is responsible for real-time processing of data streams. It analyses data that has not yet been processed by the batch layer. The speed layer produces real-time views that can be coupled with batch views to create a complete representation of the extracted knowledge [19].

Fig. 2. Multi-agent real-time processing utilizing lambda architecture [37]

The lambda architecture can be defined in terms of a heterogeneous multiagent system [37]. An implementation requires integration of a few components: one for batch processing, another one for serving views, a different solution for real-time stream analysis and a component that merges real-time views with batch views [43].

The use of a multiagent environment provides a common way of exchanging information between different components and a common execution model [40]. The differences between individual components of the lambda architecture lead to inherently heterogeneous realizations, so the ability to handle diversity in agent systems is another motivation for this approach [22].
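
The coupling of batch views with real-time views described above can be sketched in a few lines. This is only an illustration under simple assumptions: the views are per-key event counts, and the names `build_view`, `merged_view` and the sample events are hypothetical and do not come from the Comcute implementation.

```python
from collections import Counter

def build_view(events):
    """Aggregate events into a view that counts occurrences per key."""
    return Counter(event["key"] for event in events)

def merged_view(batch, realtime):
    """Combine a batch view with a real-time view into one complete view."""
    return batch + realtime

# Events already processed offline by the batch layer (illustrative data).
processed = [{"key": "grid"}, {"key": "grid"}, {"key": "cloud"}]
# Events that arrived later and were seen only by the speed layer.
recent = [{"key": "grid"}, {"key": "agent"}]

batch = build_view(processed)     # batch view
realtime = build_view(recent)     # real-time view
complete = merged_view(batch, realtime)
```

Counts are a convenient case because the two views merge by simple addition; other aggregates would need their own merge rule.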

4 Map-Reduce Model for Fault-Tolerant Grid

Grid and volunteer computing systems differ from supercomputing systems because inexpensive commodity hardware is widely deployed, which helps scalability. However, it also brings a large number of hardware failures. Moreover, many machines restart frequently to update their systems, which causes many software failures [14]. Similarly, MapReduce, the popular cloud computing model, also has to cope with failures [5, 7].

When a job consists of thousands of tasks, the probability that a few tasks fail is very high. Several fault-tolerant applications can be executed on the platform and can use the result despite some failed tasks. To support such fault-tolerant computing, an open source implementation of MapReduce can be applied. Hadoop already provides an interface by which a job can tolerate a given percentage of failed tasks. It was observed that optimizing the availability of an individual task is not an effective approach for ensuring high availability of these multi-task jobs [30].

However, Hadoop implicitly assumes that the nodes are homogeneous, which does not hold in practice. This motivates an optimal multi-task allocation scheme for heterogeneous environments that can tolerate a given percentage of failed tasks [28]. In this case, the responsibility of the reduce function is to sum the values of each key [34].
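
A job that tolerates a given fraction of failed tasks, with a reduce function that sums the values of each key, can be sketched as follows. This is a single-process illustration and not Hadoop code: the functions `map_task`, `reduce_task`, `run_job`, the parameter `max_failed_fraction`, and the use of `None` to simulate a failed task are all assumptions made for the example.

```python
from collections import defaultdict

def map_task(chunk):
    """Emit (word, 1) pairs; a chunk of None stands in for a failed task."""
    if chunk is None:
        raise RuntimeError("simulated task failure")
    return [(word, 1) for word in chunk.split()]

def reduce_task(key, values):
    """Sum the values collected for one key."""
    return key, sum(values)

def run_job(chunks, max_failed_fraction=0.25):
    """Run all map tasks, skipping failures up to the tolerated fraction."""
    groups = defaultdict(list)
    failed = 0
    for chunk in chunks:
        try:
            pairs = map_task(chunk)
        except RuntimeError:
            failed += 1          # the job continues without this task
            continue
        for key, value in pairs:
            groups[key].append(value)
    if failed > max_failed_fraction * len(chunks):
        raise RuntimeError("too many failed map tasks")
    return dict(reduce_task(key, values) for key, values in groups.items())

# One of four map tasks fails, which is within the tolerated 25 %.
result = run_job(["big data grid", "grid agent", None, "big grid"])
```

The result is computed from the surviving tasks only, so it is approximate whenever a task has failed.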

MapReduce is applied to several problems such as large-scale machine learning for Google News. Moreover, it is used to extract data for reports of popular queries and to extract geographical locations from a large corpus of web pages for localized search. In 2004, Google replaced the indexing system that produces data for its web search service with a system based on MapReduce. The new indexing system takes input documents that have been retrieved by a crawling system and stored as a set of files, and then processes them by five to ten MapReduce operations. It is easier to operate because problems such as machine failures, slow machines and networking hiccups are resolved automatically [17, 25].

5 Harmony Search Agents for Local Grid Self-configuration

Intelligent agents can optimize grid resource management for tasks related to big data queries. An agent based on the harmony search metaheuristic (AHS) can reconfigure a local part of a grid. The whole grid is divided into zones, and an AHS is assigned to its grid zone to support self-optimization of the system. The main part of the AHS is a multi-objective scheduler for tasks from the middleware layer. This scheduler optimizes the probability that all tasks meet their deadlines and the grid reliability [12]. We assume that each computer and each link between computers fail independently with exponential rates. It is preferred to allocate modules to computers on which failures are least likely to occur during the execution of task modules [3]. The rationale for this assumption is that repair and recovery times are largely implementation-dependent. Moreover, repair and recovery routines usually introduce time overheads that are too high for on-line use in time-critical applications [6].

The overhead execution time of the task \( T_{v} \) on the computer \( \pi_{j} \in \Pi = \{ \pi_{1}, \ldots, \pi_{j}, \ldots, \pi_{J} \} \) is represented by \( t_{vj} \). Let the computer \( \pi_{j} \) fail independently according to an exponential distribution with rate \( \lambda_{j} \). Computers can be allocated to nodes, and tasks can be assigned to these nodes, in order to maximize the reliability function R, as below [21]:

$$ R(x) = \prod_{v = 1}^{V} \prod_{i = 1}^{I} \prod_{j = 1}^{J} \exp ( - \lambda_{j} t_{vj} x_{vi}^{m} x_{ij}^{\pi } ), $$
(2)

where

$$ x_{ij}^{\pi } = \begin{cases} 1 & \text{if } \pi_{j} \text{ is assigned to } w_{i}, \\ 0 & \text{in the other case,} \end{cases} $$
$$ x_{vi}^{m} = \begin{cases} 1 & \text{if task } T_{v} \text{ is assigned to } w_{i}, \\ 0 & \text{in the other case,} \end{cases} $$
$$ (x^{m}, x^{\pi }) = [x_{11}^{m}, \ldots, x_{1I}^{m}, \ldots, x_{vi}^{m}, \ldots, x_{VI}^{m}, x_{11}^{\pi }, \ldots, x_{1J}^{\pi }, \ldots, x_{ij}^{\pi }, \ldots, x_{I1}^{\pi }, \ldots, x_{Ij}^{\pi }, \ldots, x_{IJ}^{\pi }]^{T}. $$
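
The reliability function (2) can be evaluated directly from the binary allocation matrices. The sketch below is an illustration only: the function name `reliability`, the nested-list representation of \( x^{m} \) and \( x^{\pi} \), and the execution times in the example are assumptions, while the failure rates follow the two-computer example with 0.001 and 0.002 per time unit.

```python
import math

def reliability(lam, t, x_m, x_pi):
    """Evaluate R(x) from Eq. (2).

    lam[j]     failure rate of computer j
    t[v][j]    execution time of task v on computer j
    x_m[v][i]  1 if task v is assigned to node i, else 0
    x_pi[i][j] 1 if computer j is assigned to node i, else 0
    """
    exponent = 0.0
    for v in range(len(t)):
        for i in range(len(x_pi)):
            for j in range(len(lam)):
                exponent += lam[j] * t[v][j] * x_m[v][i] * x_pi[i][j]
    # A product of exponentials equals the exponential of the summed exponents.
    return math.exp(-exponent)

lam = [0.001, 0.002]                  # failure rates per time unit
t = [[10, 20], [30, 15], [5, 5]]      # illustrative execution times
x_m = [[1, 0], [0, 1], [1, 0]]        # tasks 1 and 3 on node 1, task 2 on node 2
x_pi = [[1, 0], [0, 1]]               # computer 1 on node 1, computer 2 on node 2

r = reliability(lam, t, x_m, x_pi)    # exp(-(0.01 + 0.03 + 0.005))
```

Summing the exponents first avoids multiplying many factors close to one.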

Figure 3 shows the relation between the system reliability measure R and the time of system use for the chosen two-computer system with \( \lambda_{1} = 0.001 \) [TU\(^{-1}\)] and \( \lambda_{2} = 0.002 \) [TU\(^{-1}\)].

Fig. 3. The time-dependent reliability of a two-computer system

6 Task Scheduling Algorithm

Let the distributed application \( A_{n} \) start running after \( \lambda_{n} \) and complete before \( \delta_{n} \) [31]. Figure 4 shows an example of the task flow graph for two applications. Task \( m_{2} \) is performed with the probability q in the sub-graph denoted as OR (Fig. 4), and task \( m_{3} \) with the probability (1−q). A task may be performed at most \( L_{max} \) times in the sub-graph denoted as Loop, and each repetition of this module is performed with the probability p. If the sub-graph OR appears, the task flow graph is split into instances in order to schedule tasks. There are \( 2L_{max} \) instances of the task graph from Fig. 4. The instance in which task \( m_{2} \) appears and task \( m_{5} \) runs k times occurs with the probability:

Fig. 4. A flow graph for two applications

$$ p_{i} = q\left( {1 - p} \right)p^{k - 1} $$
(3)
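
As a quick arithmetic check of Eq. (3), consider purely illustrative values q = 0.5, p = 0.2 and k = 2, which are assumed here and not taken from the experiments:

$$ p_{i} = 0.5 \cdot (1 - 0.2) \cdot 0.2^{2 - 1} = 0.08. $$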

An allocation of modules to computers \( (x^{m}, x^{\pi }) \) makes it possible to schedule tasks for each computer. The task completion times \( (C_{1}, \ldots, C_{v}, \ldots, C_{V}) \) can be calculated for a scheduled allocation x of modules to computers. Let \( d_{v} \) represent the completion deadline of the vth task. If \( C_{v} \le d_{v} \), then the time constraint is satisfied, which can be written as \( \xi (d_{v} - C_{v}) = 1 \). The state of the deadline constraints for the ith instance of the flow graph with the set of tasks \( M_{i} \) is determined as below [32]:

$$ S_{i} = \prod_{m_{v} \in M_{i}} \xi (d_{v} - C_{v}(x)). $$
(4)

The probability that all tasks meet their deadlines for K instances of the flow graph is calculated as follows [33]:

$$ P_{D}(x) = \sum_{i = 1}^{K} p_{i} \prod_{m_{v} \in M_{i}} \xi (d_{v} - C_{v}(x)). $$
(5)
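
Equations (4) and (5) can be evaluated with a short sketch once the completion times are known. It assumes that \( \xi \) is a unit step function, equal to 1 for a non-negative argument and 0 otherwise, which the text implies but does not state in full. The function names, the dictionary-based data layout, and all numeric values below are hypothetical.

```python
def step(value):
    """xi: 1 if the argument is non-negative, else 0 (assumed unit step)."""
    return 1 if value >= 0 else 0

def instance_state(tasks, deadlines, completions):
    """S_i from Eq. (4): 1 only if every task of the instance is on time."""
    state = 1
    for v in tasks:
        state *= step(deadlines[v] - completions[v])
    return state

def deadline_probability(instances, deadlines, completions):
    """P_D from Eq. (5): instance probabilities weighted by their states."""
    return sum(p * instance_state(tasks, deadlines, completions)
               for p, tasks in instances)

# Illustrative data: two instances of a flow graph with an OR sub-graph.
deadlines = {"m1": 10, "m2": 20, "m3": 15}
completions = {"m1": 8, "m2": 18, "m3": 17}   # m3 misses its deadline
instances = [(0.6, ["m1", "m2"]), (0.4, ["m1", "m3"])]

p_d = deadline_probability(instances, deadlines, completions)
```

In this example only the first instance meets all deadlines, so the result equals its probability, 0.6.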

Figure 5 shows an example of a compromise configuration of a middleware zone in the Comcute grid. It was found by the agent based on harmony search for its area, which consists of 14 modules divided between 2 computers.

Fig. 5. A compromise configuration in the Comcute grid: (a) criterion space (b) a solution

7 Concluding Remarks and Future Work

Intelligent agents in the grid middleware can significantly support the efficiency of fault-tolerant self-configuration in grids. Agents based on harmony search can find suboptimal solutions to the NP-hard multi-objective optimization problem of grid resource usage and thus improve the level of fault tolerance.

Our future work will focus on testing other AI algorithms for finding fault-tolerant configurations. Moreover, quantum-inspired algorithms can support big data processing, too [7].