1 Introduction

In the past two decades, cloud computing has emerged as an enabling technology and has been increasingly adopted in many areas, including business, science, and engineering, because of its inherent scalability, flexibility, and cost-effectiveness [1]. Cloud computing currently provides dynamic services such as applications, data, hardware resources, and various IT services over the Internet. The reliability and performance of cloud services depend on various factors, including load balancing [2, 3] and job dispatching [4, 5]. Job dispatching is performed on the basis of different parameters in order to increase overall cloud performance. A job may involve entering data, processing, accessing software, or storage functions. The data center classifies jobs according to the service-level agreement and the requested services. Each job is then assigned to one of the available servers; in turn, the servers perform the requested job, and a response is transmitted back to the user. To meet these requirements, current cloud data centers need to support large-scale global coverage [6], low cost [7, 8], low latency [9, 10], a certain level of security [11, 12], and application availability for their customers.

Scalability is one of the major advantages brought by the cloud paradigm. However, job dispatching among heterogeneous clouds remains an important issue. Efficient use of computing resources to minimize execution time requires a job dispatching algorithm that can appropriately determine the assignment of jobs. Given this variety of jobs and servers, task scheduling mechanisms designed for a single system may not be effective in a distributed environment [13–15]. Furthermore, mobile devices [16] are becoming one of the major sources of cloud workload, as offloading saves energy at the mobile device itself and avoids moving bulky data over wireless networks between the mobile devices and data repositories.

It should be noted, however, that job dispatching itself introduces system overhead [17, 18]. This overhead includes the time required by computing hosts/nodes to update their system load information in real time, the communication cost of sharing that load information to make a decision, and the cost of transmitting the job itself. The conditions under which job dispatching pays off are therefore worth investigating, and how dispatchers operate should be considered carefully. To handle different users' job requests, a job dispatcher/controller must monitor large-scale, heterogeneous cloud systems at low cost (overhead).

Although a parallelization strategy enables scalability [19], a good load balancing scheme is necessary to achieve good performance. In this paper, we introduce an Inter Cloud Manager (ICM) job dispatching algorithm that operates in both small-scale (centralized) and large-scale (decentralized) environments. We compare its performance with three state-of-the-art load balancing algorithms. In our results, ICM achieves superior average response time under congestion, which indicates that the proposed ICM provides scalability based on clustering and decision-making. In addition, these experimental results can be used to provision an appropriate number of cloud server resources under varying system loads. The rest of this paper is organized as follows. A literature review on decentralized load balancing algorithms in clouds and related technologies is presented in Section 2. In Section 3, we detail the proposed ICM. Experiments and measurement results are provided in Section 4. Finally, we conclude the paper in Section 5.

2 Related works

Load balancing has been an attractive research issue since the emergence of distributed systems. Load balancing algorithms can be classified into subcategories from various perspectives. From one point of view, they can be divided into centralized and decentralized algorithms: when the load balancer resides at a master node, the policy is called centralized, while when the load balancer resides at all the nodes under consideration, it is called distributed (decentralized). In this section, we mainly discuss the best-known decentralized algorithms for the large-scale cloud.

In [20], an Ant Colony Optimization technique that improves upon the work in [21] was suggested. Both algorithms use the behavior of ants to gather information about the cloud hosts in order to assign a task to a specific host. However, the algorithm in [21] suffers from an ant synchronization problem, which the authors of [20] address by adding a "suicide" feature to the ants. The Ant Colony algorithm has many advantages over static algorithms, including fast decision-making, no single point of failure (SPOF), and low complexity.

In [22], a MapReduce-based entity resolution approach was discussed. It consists of two main tasks: map and reduce. Since several map tasks can read and process entities in parallel, the approach adds one more load balancing level between the map and reduce tasks in order to decrease the load on the reduce tasks. The job dispatching in this middle stage splits only the large tasks into smaller ones, which are then sent to the reduce tasks based on their availability.

In [23], a dual-direction FTP download algorithm (DDFTP), applicable over the Internet and in cloud and grid environments, was proposed. This technique exploits the availability of replicated FTP servers to shorten file download times through concurrent downloads of file blocks. The algorithm reduces the network communication needed between clients and hosts and therefore reduces network overhead. Whereas most distributed algorithms take quite a long time for their decision-making process, DDFTP has low complexity and dispatches jobs very quickly.

The algorithm proposed in [24] performs load balancing for Internet distributed services (IDS) that are spread all over the world, and a middleware implementing the protocol is described. IDS also uses a heuristic that helps web servers endure overloads: it reduces service response times by redirecting requests to the closest remote servers without overloading them. IDS is a complex algorithm and generates a large amount of network overhead, but its decision-making process is fast.

In [25], a dynamic load balancing algorithm called load balancing min-min (LBMM) is presented, based on a three-level framework. LBMM helps utilize resources efficiently and enhances work efficiency. However, LBMM itself is highly complex and generates a large number of additional dummy packets. Furthermore, the distributed LBMM algorithm takes quite a long time for its decision-making process (algorithm speed).

The paper [26] proposed a new content-aware load balancing policy named Workload and Client Aware Policy (WCAP). It applies a technique that specifies the unique and special property (USP) of both the requests and the computing nodes. Based on the USP, the scheduler decides which node is best suited to process a given request. The strategy is implemented in a decentralized manner, at the cost of high overhead. By using content information to narrow down the search, the technique improves search performance and hence the overall performance of the system. It also helps reduce the idle time of the computing nodes, improving their utilization.

In [27], a load balancing algorithm for the cloud environment inspired by honey bee foraging behavior (HFA) was proposed. The algorithm is derived from how honey bees find their food: among the classes of bees, forager bees scout for food sources and advertise them with a waggle dance. Analogously, the servers are grouped into virtual servers (VS), and each VS calculates its profit, which plays the role of the bees' waggle dance.

3 Inter Cloud Manager

In this section, we introduce and describe an Inter Cloud Manager (ICM) that is designed for the large-scale cloud. ICM consists of two main parts: clustering and decision-making.

3.1 Clustering

Data centers for cloud computing continue to grow in terms of both hardware resources and traffic volume, making cloud operation and management increasingly complex. In this scenario, accurate and fine-grained monitoring [28, 29] is required to operate these platforms efficiently and to manage their growing complexity. Furthermore, to meet demand and provide satisfactory QoS [30–33], individual monitoring mechanisms are needed, which can lead to the collection and processing of a large amount of runtime data. To monitor the clouds continuously, we use a "Hello" packet that collects the system load and the end-to-end delay from the client to the host. A Hello packet is sent periodically on each network interface to discover and test connections among neighbors. Hello packets are broadcast to enable dynamic discovery of routers and host servers. The field structure of the Hello packet body is shown in Table 1 and Fig. 1. In particular, the max. hop count is a key parameter in our measurements because it determines the boundary of a host's neighborhood.

Table 1 Description of Hello packet fields

Fig. 1 Hello packet format
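As a concrete illustration, the following Python sketch models a Hello broadcast. The field names and the JSON-over-UDP encoding are our assumptions for illustration, not the exact format of Table 1.

```python
import json
import socket
import time
from dataclasses import dataclass, asdict

# Hypothetical Hello packet layout; field names are assumptions based on
# the prose above, not the paper's exact Table 1 format.
@dataclass
class HelloPacket:
    sender_id: str      # identifier of the broadcasting host
    timestamp: float    # send time, used later to estimate round-trip delay
    max_hop_count: int  # bounds the neighborhood explored (key parameter)
    hop_count: int = 0  # incremented by each host that forwards the packet

def broadcast_hello(sender_id: str, max_hops: int = 3, port: int = 9999) -> None:
    """Broadcast one Hello packet on the local subnet."""
    pkt = HelloPacket(sender_id, time.time(), max_hops)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(json.dumps(asdict(pkt)).encode(), ("<broadcast>", port))

# Called at every Hello interval (10-30 s in the experiments of Section 4).
```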

The ACK packet is sent by the receiving server (destination) back to the sending server (source). Information about every interconnected server is thus periodically refreshed: each Hello packet elicits an ACK packet from the destination server. When the source receives the ACK packet, it computes and stores network link information (hop count, delay, loss) and system load information (job execution time, memory usage, number of waiting jobs) in its neighbor table. Each cloud host can then form its own "cluster" based on this table. The details of the ACK packet are described in Table 2 and Fig. 2.

Table 2 Description of ACK packet fields

Fig. 2 ACK packet format
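The neighbor-table update on ACK receipt might look as follows. This is a minimal sketch: the ACK field names mirror the quantities listed above rather than the paper's exact Table 2 format.

```python
import time

# Per-host neighbor table built from Hello/ACK exchanges. The `ack` dict
# keys are assumptions mirroring the link/load quantities described above.
class NeighborTable:
    def __init__(self) -> None:
        self.neighbors: dict[str, dict] = {}  # host_id -> link/load state

    def update_from_ack(self, ack: dict) -> None:
        """Store link and load information reported by a destination host."""
        rtt = time.time() - ack["hello_timestamp"]  # Hello -> ACK round trip
        self.neighbors[ack["host_id"]] = {
            "hop_count": ack["hop_count"],
            "delay": rtt / 2.0,                    # rough one-way delay
            "loss_rate": ack["loss_rate"],
            "avg_exec_time": ack["avg_exec_time"],  # job execution time
            "mem_usage": ack["mem_usage"],          # memory usage
            "waiting_jobs": ack["waiting_jobs"],    # queue length
        }
```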

3.2 Decision-making

Decision-making is the core function of ICM: it dispatches job requests from a client to the most suitable cloud location while maintaining efficiency. Algorithm 1 shows the details of ICM's decision-making process, which is triggered when the number of jobs in a system's waiting queue exceeds 5. Decision-making follows two steps.

The first step is selecting one request from the job list that ICM holds. All job requests are saved in a linked list, each entry containing the time the job request was first entered and a basic description. One reason to use a linked list is that it limits memory waste when insertions and removals of data take place constantly. ICM processes job requests in order from head to tail, i.e., we use a first-in-first-out (FIFO) job selection process. We explain the details of the experimental system settings in the following section.
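A minimal sketch of the FIFO job list is shown below. We use Python's collections.deque, which offers the same O(1) head removal and tail insertion that motivates the linked list; the job record fields follow the description above.

```python
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class JobRequest:
    description: str  # basic description of the job
    entered_at: float = field(default_factory=time.time)  # first entry time

# deque gives O(1) popleft/append, matching the linked-list rationale above
# (little memory churn under constant insertion and removal).
job_list: deque[JobRequest] = deque()

def enqueue(job: JobRequest) -> None:
    job_list.append(job)        # insert at tail

def next_job() -> JobRequest:
    return job_list.popleft()   # remove from head: first in, first out
```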

The second step is dispatching the job originally placed by the client, making sure it is assigned to the most suitable cloud host. ICM uses a "best-fit" approach that dispatches a job to the host with the shortest average response time (ART) among the operating clouds. First, ICM checks the ART values: if all hosts exceed the ART threshold, ICM holds the job until some host's ART drops below the threshold. Second, ICM checks whether any host is idle: if a host has no waiting jobs, ICM sends the job to that host. Finally, ICM sends the job to the host with the minimum ART.
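The three checks can be summarized in the following sketch, assuming each host record carries its number of waiting jobs and that art(h) returns the host's expected average response time (computed as described next). The 180 s default corresponds to the 3-min ART threshold used in Section 4.

```python
from typing import Callable, Iterable, Optional

def dispatch(hosts: Iterable[dict],
             art: Callable[[dict], float],
             art_threshold: float = 180.0) -> Optional[dict]:
    """Best-fit dispatch following the three checks above.
    Returns the chosen host, or None to hold the job."""
    hosts = list(hosts)
    # Check 1: if every host exceeds the ART threshold, hold the job.
    if all(art(h) > art_threshold for h in hosts):
        return None
    # Check 2: prefer an idle host (one with no waiting jobs).
    for h in hosts:
        if h["waiting_jobs"] == 0:
            return h
    # Check 3: otherwise choose the host with the minimum ART.
    return min(hosts, key=art)
```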

As discussed above, ICM uses history-based decision-making: the network link and system load information are obtained from the Hello/ACK packets. ART is calculated as the sum of the expected network delay (ENT) and the expected job transfer time (EJTT) for the cloud host. ENT is calculated as (avg. network delay × num. of packets for the current job) / (1 − loss rate); from a network's point of view, a job (application) consists of a number of data packets, and in our case the avg. network delay includes processing, queuing, transmission, and propagation delays. EJTT is calculated as avg. job execution time × (num. of waiting jobs + 1). The average job execution time is updated as shown in expression (1), where $W_{\text{new}}$ is the current average job execution time, $W_{\text{old}}$ is the previous value, and $W_{\text{last}}$ is the last job's execution time. The weight $a$ ($0<a<1$) is worth noting: it is the fraction of system memory used by the last job on the cloud host. For instance, if the latest job used 30 % of system memory, then $a$ is simply 0.3.

$$ W_{\text{new}} \leftarrow W_{\text{old}}\,(1-a) + W_{\text{last}}\times a \qquad (1) $$
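Putting these definitions together, ART and the update in expression (1) can be computed as follows; variable names mirror the quantities defined above, and the neighbor record `n` follows the neighbor-table sketch in Section 3.1.

```python
def expected_network_time(avg_delay: float, num_packets: int,
                          loss_rate: float) -> float:
    # ENT = (avg. network delay x num. of packets for the job) / (1 - loss rate)
    return (avg_delay * num_packets) / (1.0 - loss_rate)

def expected_job_transfer_time(avg_exec_time: float, waiting_jobs: int) -> float:
    # EJTT = avg. job execution time x (num. of waiting jobs + 1)
    return avg_exec_time * (waiting_jobs + 1)

def average_response_time(n: dict, num_packets: int) -> float:
    """ART = ENT + EJTT for one neighbor-table entry `n`."""
    return (expected_network_time(n["delay"], num_packets, n["loss_rate"])
            + expected_job_transfer_time(n["avg_exec_time"], n["waiting_jobs"]))

def update_avg_exec_time(w_old: float, w_last: float, a: float) -> float:
    # Expression (1): W_new <- W_old*(1 - a) + W_last*a, where a (0 < a < 1)
    # is the memory fraction used by the last job (e.g., a = 0.3 for 30 %).
    return w_old * (1.0 - a) + w_last * a
```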

4 Experimental results

We implemented the experimental environment on a cloud test-bed with different types of server machines. We used a total of 15 servers: 5 host servers have Intel Xeon 2.4 GHz dual-core CPUs with 0.5 GB of RAM, 5 have Xeon 3.0 GHz dual-core CPUs with 2.0 GB of RAM, and the remaining 5 hosts have Xeon 2.8 GHz dual-core CPUs with 1.0 GB of RAM. The servers are physically distributed and run Linux (Ubuntu) with CPU throttling enabled under the ondemand governor, which dynamically adjusts core frequencies depending on load. In all experiments, we averaged measurements over long periods and observed the job response time while changing the job arrival rate.

4.1 System settings and job processing

As shown in Fig. 3, the system consists of four client mobile devices and 15 cloud host servers. We use a Google reference mobile phone as the client machine. Each client sends 32 × 32 matrix inversion jobs over WiFi; the 32 × 32 matrices are randomly generated by the client's job generator, and the client sends just a fraction of job requests, together with the matrix data, to the destination. The ICM module (controller) is located inside every server and follows the rules described above. The controller plays an important role not only in interconnecting the clients with the cloud hosts but also in dispatching jobs to the most suitable cloud host. The controller uses one of four algorithms for dispatching a job: the proposed ICM, Ant Colony, HFA, or WCAP.

Fig. 3 System settings
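For reference, here is a sketch of the client-side job generation under our reading of the setup: a random 32 × 32 matrix is produced and shipped off for inversion. The use of NumPy and the singularity re-draw are our assumptions; the paper does not specify the client implementation.

```python
import numpy as np

def generate_job(size: int = 32, seed: int | None = None) -> np.ndarray:
    """Client side: generate a random (almost surely invertible) matrix."""
    rng = np.random.default_rng(seed)
    m = rng.random((size, size))
    while abs(np.linalg.det(m)) < 1e-12:  # re-draw if (rarely) singular
        m = rng.random((size, size))
    return m

def execute_job(matrix: np.ndarray) -> np.ndarray:
    """Host side: the requested computation is a matrix inversion."""
    return np.linalg.inv(matrix)
```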

A job is executed when the request arrives from the controller. After receiving this request, the host runs the binary code and sends the result back to the client over the network. Each host server has different hardware resources (e.g., CPU, memory, storage) and network propagation delays (10–5000 ms). In addition, each server processes jobs using a first-come-first-served (FCFS) scheduling policy, the simplest and most commonly used single-waiting-queue discipline. The following steps establish a network connection among the client, controller, and host, and form the basis of our cluster-based job dispatching (a sketch of the controller's role follows the list).

1. The client attempts to connect with the job on the controller (SYN).

2. The controller accepts the connection and, after deciding which host should receive the connection, changes the destination IP (and possibly the port) to match the job of the selected host (note that the source IP of the client is not touched).

3. The host accepts the connection and responds back to the original source, the client, via its default route, the controller (SYN/ACK).

4. The controller intercepts the return packet from the host, changes the source IP (and possibly the port) to match the controller's IP and port, and forwards the packet back to the client.

5. The client receives the return packet, believing that it came from the controller, and establishes a network session (ACK).

6. The controller receives a job request packet from the client and forwards the packet to the appropriate host (decision-making).
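The sketch below approximates the controller's role in these steps. The paper describes NAT-style rewriting of IP headers; a userspace TCP relay reproduces the observable behavior (the client only ever talks to the controller) without touching raw packets, so this is an approximation rather than the actual implementation. Here, choose_host stands in for the decision-making of Section 3.2.

```python
import socket
import threading

def relay(client_sock: socket.socket, host_addr: tuple[str, int]) -> None:
    """Splice a client connection to the selected host (steps 2-3)."""
    host_sock = socket.create_connection(host_addr)

    def pipe(src: socket.socket, dst: socket.socket) -> None:
        # Shuttle bytes until either side closes (steps 4-6).
        while chunk := src.recv(4096):
            dst.sendall(chunk)

    threading.Thread(target=pipe, args=(client_sock, host_sock)).start()
    threading.Thread(target=pipe, args=(host_sock, client_sock)).start()

def controller_loop(listen_port: int, choose_host) -> None:
    """Accept client connections (step 1) and dispatch each to a host."""
    srv = socket.socket()
    srv.bind(("", listen_port))
    srv.listen()
    while True:
        client, _ = srv.accept()          # step 1 (SYN)
        relay(client, choose_host())      # decision-making picks (ip, port)
```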

4.2 Measurement results

We evaluate the performance of the proposed algorithm on a cloud test-bed and compare it with three decentralized algorithms (Ant Colony [20], WCAP [26], and HFA [27]). We fixed the ART threshold at 3 min for every measurement: once every host reaches that value, ICM no longer dispatches jobs to the cloud hosts. We also fixed the dispatching trigger point of each host at 5, meaning that when the number of waiting jobs in a system exceeds 5, ICM starts forwarding the remaining jobs to its neighbors.

The results for each algorithm are shown in Figs. 4, 5, and 6; the reported values were obtained by averaging the measurements. We varied the job arrival rate λ from 0.1 to 0.9 and took 30 measurements at each λ point. For instance, if λ is 0.5, all four clients send job requests at 0.5 requests per second in a probabilistic way. Figure 4 highlights the relative performance of the four algorithms while the max. hop count parameter is varied from 2 to 4; in this measurement, we fixed the Hello time interval at 10 s. In each case, ICM clearly outperforms the other load balancing algorithms: its curves stay smooth while the others rise rapidly at high traffic rates. Interestingly, max. hop count = 3 gives the best performance; if the max. hop count of the Hello packet is larger than 3, system performance degrades.
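One way to realize this probabilistic sending rate is a Poisson process with rate λ, i.e., exponentially distributed inter-arrival gaps. The sketch below is our assumption of how such a client-side generator could look; the paper does not detail it.

```python
import random
import time

def send_jobs(send_one_job, lam: float, duration_s: float) -> None:
    """Emit jobs as a Poisson process with rate `lam` requests/second."""
    end = time.time() + duration_s
    while time.time() < end:
        time.sleep(random.expovariate(lam))  # exponential inter-arrival gap
        send_one_job()

# Usage: send_jobs(lambda: client.send(generate_job()), lam=0.5, duration_s=600)
```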

Fig. 4 Measured response time while changing max. hop count

Fig. 5 Measured response time while changing Hello interval

Fig. 6 Control message overhead while changing Hello interval

In Fig. 5, we varied the Hello interval parameter from 10 to 30 s and fixed max. hop count = 3. As the Hello time interval increases, ICM's performance slowly decreases. All four algorithms show stable performance when the traffic rate λ is lower than 0.5, but three of them reach system saturation when the traffic rate exceeds 0.6, while ICM still works well until λ reaches 0.7.

The four algorithms impose vastly different amounts of overhead, as shown in Fig. 6, where we changed only the Hello time interval parameter (10–30 s). WCAP generates the least control message overhead. The proposed ICM generates the most additional messages at a Hello time interval of 10 s, but when the Hello interval is larger than 20 s, its overhead falls below that of the Ant Colony algorithm. Ant Colony uses the ants' behavior to collect information about cloud nodes, but it can easily cause network overhead due to the large number of dispatched ants.

5 Conclusions

In this paper, we have proposed a decentralized job dispatching algorithm designed for the large-scale cloud environment. The proposed ICM uses additional Hello packets to observe and collect data. Comparative experiments were carried out to compare the performance of ICM, Ant Colony, WCAP, and HFA while increasing the job sending rate. In the evaluation, ICM achieved a better average response time than the other three algorithms. Using these experimental results, we can estimate the expected saturation point of cloud systems. In our experiments, however, the client sent only a fraction of the computation job request to the destination. In a real environment [34–36], congestion can occur at any intermediate node, often due to resource limitations, while data packets are being transmitted from the client to the destination; congestion leads to high packet loss, long delays, and wasted resource utilization time. Therefore, in our next work, we will use communication jobs such as video streaming and VoIP rather than computation jobs.