1 Introduction

The surge in IoT device usage has led to the emergence of cloud computing as a significant research focus. It offers a variety of services across many application areas with a high degree of flexibility and scalability. The rapid growth of information and communication technologies (ICT) has led to the integration of big data with the IoT, revolutionizing cloud services. Within this transformative framework, cloud computing is pivotal in enabling efficient and scalable solutions for managing big data. Numerous cloud service providers enable organizations to obtain the software, storage, and hardware facilities needed to accomplish their goals at a much lower cost. Customers subscribe to the services they require under the cloud computing paradigm and sign a service level agreement (SLA) with the cloud vendor, outlining the quality of service (QoS) and conditions of service provision. Table 1 presents the service control that the various cloud service models offer to end-users. Load balancing is a method that distributes tasks among virtual machines (VMs) using a Virtual Machine Manager (VMM). It assists in handling different types of workloads, such as CPU, network, and memory demands (Buyya 2018; Mishra and Majhi 2020). The cloud computing infrastructure faces three significant challenges: virtualization, distributed frameworks, and load balancing. The load-balancing problem is defined as the allocation of workloads among the processing modules. In a multi-node environment, it is quite probable that certain nodes will experience excessive workload while others remain idle. Load imbalance is a harmful event for cloud service providers (CSPs), as it diminishes the dependability and effectiveness of computing services while also putting at risk the quality of service (QoS) guaranteed in the service level agreement (SLA) between the customer and the cloud service provider (Oduwole et al. 2022). Verma et al. (2024) introduced a load-balancing methodology, utilizing genetic algorithms (GA), to improve the quality of the telemedicine industry by efficiently adapting to changing workloads and network conditions at the fog level. This adaptability can enhance patient care and provide scalability for future healthcare systems. Walia et al. (2023) cover several emerging technologies in their survey, including Software-Defined Networking (SDN), Blockchain, Digital Twins, Industrial IoT (IIoT), 5G, Serverless computing, and quantum computing. These technologies can be incorporated with the current fog/edge-of-things models for improved analysis and can provide business intelligence for IoT platforms. Adaptive resource management strategies are necessary for efficient scheduling and decision-offloading due to the infrastructural efficiency of these computing paradigms.

Table 1 Service control offered to end-users by the various cloud service models

1.1 Need for load balancing, factors affecting and associated challenges

Intelligent Computing Resource Management (ICRM) is rapidly evolving to meet the increasing needs of businesses and sectors, driven by the proliferation of Internet-based technologies, cloud computing, and cyber-physical systems. With the rise of information-intensive applications, artificial intelligence, cloud computing, and IoT, intelligent computing monitoring and resource allocation have become crucial (Biswas et al. 2024). Cloud data centers are built to handle hundreds of workloads and therefore typically need to be optimized; otherwise, they can suffer from low resource utilization and energy waste. The goals of load balancing include reduced job execution times, optimal resource utilization, and high system throughput. Load balancing reduces the overall resource waiting time and avoids resource overload (Apat et al. 2023). In terms of equilibrium load distribution, load balancing between virtual machines (VMs) is an NP-hard problem. Its difficulty stems from two elements: the huge solution space and the requirement for polynomial-bounded computation. In a cloud computing environment, the load can be characterized as under-loaded, overloaded, or balanced. Identifying overloaded and under-loaded nodes and then redistributing the load across them is critical to load balancing (Santhanakrishnan and Valarmathi 2022). Alongside these technological advances, a sequence of challenges has emerged, including storage capacity, high processing speed, low latency, fast transmission, load balancing, efficient routing, and cost efficiency. Load balancing is a crucial optimisation procedure in cloud computing, and achieving this objective depends on dynamic resource allocation. Some factors that affect load balancing in cloud computing are as follows:

  • Workload patterns: Varying workloads, unpredictable traffic patterns, and heterogeneous applications may affect the efficiency of the cloud system.

  • Geographical distribution: Cloud data centres are generally located in remote areas, which contributes to transmission delays. Fog computing and edge computing are therefore required to reduce these delays, but the limited resources of fog and edge devices must be managed efficiently.

  • Cost and budget constraints: Cost considerations have a significant impact on load-balancing strategies, which frequently aim to use less expensive resources or minimize idle assets.

  • Application dynamics and monitoring: The dynamic nature of applications necessitates the elasticity and scalability of cloud services. In addition, inadequate monitoring makes it challenging to balance the load.

  • SLAs and breaches: The services offered by cloud service providers determine the risk of SLA violations. It is necessary to maintain quality without compromising other factors such as throughput, makespan, energy consumption, and cost.

  • Virtual Machine (VM) migrations: An increase in the number of VM migrations leads to a decrease in service quality. While VM migration can be beneficial to some extent, frequent migrations increase time complexity: transferring a VM's state, including copying its memory pages to the destination host, is time-consuming.

  • Resource availability: Insufficient resources, such as CPU, memory, or bandwidth, limit the load balancing efficiency.

  • Energy consumption: Energy consumption is a critical factor in data centers. Load balancing is necessary to reduce it by migrating VMs from overloaded hosts to underloaded ones.

Other factors like fault tolerance, predictive analytics, network latency, and data security also affect load balancing in a cloud system. We have divided the technologies reviewed through this SLR into five categories: conventional/traditional, heuristic, meta-heuristic, ML-centric, and hybrid. Traditional approaches to cloud computing resource allocation and load balancing are time-consuming, unable to yield fast results, and frequently trapped in local optima (Mousavi et al. 2018). In dynamic cloud systems, where resource requirements are estimated at runtime, static load balancing algorithms might not be successful. Dynamic load balancing algorithms, like Equally Spread Current Execution (ESCE) and the Throttled mechanism, analyse resource requirements and usage during runtime, yet they may incur extra cost and overhead. Traditional algorithms often struggle to scale with the size and complexity of problems. Several articles explore traditional task scheduling algorithms, including Min-min, First Come First Serve (FCFS), and Shortest Job First (SJF). These algorithms are not used often due to their slow processing and time-consuming behaviour. To overcome the issues of conventional methods, heuristic approaches entered the research area. Kumar and Sharma (2018) propose a resource provisioning and de-provisioning algorithm that outperforms FCFS, SJF, and Min-min in terms of makespan time and task acceptance ratio. However, task priority is poorly considered, highlighting a limitation in task allocation strategies. Heuristic algorithms demonstrate remarkable scalability. They are highly suitable for handling large-scale optimisation challenges in various industries, including manufacturing, banking, and logistics, due to their efficiency in locating approximate solutions, even in enormous search spaces (Mishra and Majhi 2020). Kumar et al. (2018) presented another heuristic method named ‘Dynamic Load Balancing Algorithm with Elasticity’, showcasing reduced makespan time and an increased task completion ratio. Dubey et al. (2018) introduced a Modified Heterogeneous Earliest Finish Time (HEFT) algorithm, demonstrating improved server workload distribution to reduce makespan time. While promising, both studies lack comprehensive performance evaluations and address only a limited set of other Quality of Service (QoS) metrics, such as response time and cost efficiency. Hung et al. (2019) proposed an Improved Max–min algorithm, achieving the lowest completion times and optimal response times; it outperformed the conventional RR, Max–min, and Min-min algorithms.
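To make the contrast concrete, the sketch below shows a Throttled-style dynamic allocator in Python: the balancer keeps an availability index over VMs and hands each incoming request to the first free VM. The class name, VM identifiers, and queueing behaviour are illustrative assumptions made for this review, not the implementation of any cited study.

```python
# Minimal sketch of a Throttled-style dynamic load balancer.
# Assumption: each VM is either available or busy; the balancer keeps an
# index table and returns the first available VM for an incoming request.

class ThrottledBalancer:
    def __init__(self, vm_ids):
        # Availability index: True means the VM can accept a new task.
        self.available = {vm: True for vm in vm_ids}

    def allocate(self):
        """Return the first available VM, or None if all are busy (request is queued)."""
        for vm, free in self.available.items():
            if free:
                self.available[vm] = False
                return vm
        return None  # caller queues the request until a VM is released

    def release(self, vm):
        """Mark a VM as available again once its task completes."""
        self.available[vm] = True


balancer = ThrottledBalancer(["vm-0", "vm-1", "vm-2"])
first = balancer.allocate()   # -> "vm-0"
second = balancer.allocate()  # -> "vm-1"
balancer.release(first)       # "vm-0" can now serve the next request
```

A static scheme such as RR would instead assign requests in a fixed cyclic order regardless of the observed VM state, which is why it can underperform when resource demands only become known at runtime.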

The development of meta-heuristic algorithms aimed to address the shortcomings of heuristic algorithms, which typically produce approximate rather than ideal solutions. Hybrid techniques have gained traction in recent years, combining heuristic, traditional, and machine-learning approaches. Mousavi et al. (2018) propose a hybrid technique combining Teaching Learning-Based Optimization (TLBO) and Grey Wolf Optimization (GWO), achieving maximized throughput without falling into local optima. Similarly, Behera and Sobhanayak (2024) propose a hybrid GWO-GA algorithm, outperforming GWO, GA (Rekha and Dakshayini 2019), and PSO in terms of makespan, cost, and energy consumption. Further, we have also discussed the cloud and fog architecture and its working principles in the upcoming sections.

1.2 Motivation for the study

The Industrial Internet of Things (IIoT) has experienced significant advancement and implementation due to the rapid progress and use of artificial intelligence techniques. In Industry 5.0, the hyper-automation process involves the deployment of intelligent devices connected to the Industrial Internet of Things (IIoT), cloud computing, smart robots, agile software, and embedded components. These systems can leverage the Industry 5.0 concept, which generates massive amounts of data for hyper-automated communication across cloud computing, digital transformation, human sectors, intelligent robots, and industrial production. Big data management requires cloud and fog technology (Souri et al. 2024). Similarly, telemedicine, facilitated by fog computing, has revolutionized the healthcare industry by providing remote access to medical treatments. However, ensuring minimal latency and effective resource utilization is essential for providing high-quality healthcare (Verma et al. 2024). Big data in the industrial sector is crucial for predictive maintenance, enabling informed decisions and enhancing task allocation in Industry 4.0, thus necessitating a proficient resource management system (Teoh et al. 2023). The growing demand for load balancing across the many industries that use cloud/fog services motivated us to compose this evaluation of the escalating need for resource management technologies. This review’s core contribution is to provide insights into innovative algorithms, their weaknesses and strengths, details of the datasets used, simulation tools, research gaps, and future research directions.

1.3 Objectives of the SLR

Based on a detailed review of the selected studies, this SLR pursues the following objectives:

  • To systematically identify and categorise the different load balancing and task scheduling algorithms used in cloud computing.

  • To address fundamental research questions, such as the effectiveness of different algorithmic approaches, simulation tools, metrics evaluation, etc.

  • To analyse trends and patterns in the literature, such as the prevalence of Meta-heuristic, Hybrid, and ML-centric approaches, and identify any shifts or emerging paradigms in algorithm design.

  • To conduct a comparative analysis of the different algorithm categories, identifying strengths, weaknesses, research limitations and trade-offs between them.

  • To lay the groundwork for future technological advancements by identifying areas where further research and development are needed.

1.4 Research contributions of the SLR

Through this SLR, we have attempted to contribute the following insights, which are based on authentic, selected study material:

  • We have examined selected articles to identify the research patterns and technological advancements related to resource load balancing in cloud computing. We have devised research questions and attempted to ascertain their solutions.

  • Using this SLR, we presented a taxonomy of algorithms that provide solutions to the chosen problem.

  • We provided an in-depth examination of the limitations and advantages of different strategies, along with a thorough comparative study of the techniques discussed in Table 5, Table 7, and Table 8.

  • We have discussed the performance metrics related to load balancing and task scheduling in the cloud system. We have also explored the simulation tools that the authors in this field prefer.

  • We have tabulated some benchmarked datasets (Table 6) utilized by various authors to achieve several performance metrics.

  • Finally, we compiled the research gaps and potential areas for future research.

The paper is structured in nine sections, as shown in Fig. 1.

Fig. 1
figure 1

Various sections and subsections of the SLR

2 Methodology of the systematic literature review

This section lays out the components of a systematic literature review, including the search criteria, review methodology, and research questions. This process involves defining research questions or objectives, identifying relevant databases and sources, and systematically searching and screening for eligible studies. The search term constitutes a string encompassing all essential keywords in the research questions and their corresponding synonyms.

2.1 Search criteria and quality assessment

The keywords utilized to form the search strings are “load balancing”, “task scheduling”, “cloud computing”, and “machine learning”. To extract relevant papers, the following advanced search query was used in the Scopus database:

figure a

The various computer science publication libraries were manually searched. The SLR search was conducted using the Scopus database, IEEE Computer Society, ResearchGate, Science Direct, Springer, and ACM Digital Archive.

A total of 550 papers were found initially using the above-mentioned advanced query. Then we applied the Inclusion–exclusion criteria provided in Table 2. Approximately 122 papers were excluded based on having zero citations or requiring purchase to access. We have incorporated cross-referenced studies to obtain a more comprehensive and quality analysis. We manually chose 35 cross-references from the extracted set that strictly adhered to the search criteria to encompass a broader range of reliable studies. A comprehensive selection of 96 papers was finalised, comprising 63 research articles exclusively considered for the technological survey.

Table 2 Inclusion–exclusion criteria for filtering the relevant articles for the SLR

2.2 Inclusion–exclusion criteria

The criterion for accepting or rejecting a research paper for the study is explained in Table 2 below.

Data extraction has been performed to capture key information from each study, such as design, methods or techniques, research limitations, future scope, tools, evaluation metrics, and other significant findings. This captured information was then synthesized and analyzed through a systematic and structured approach and placed in a tabular format to provide insights and draw conclusions about the research questions.

2.3 Research questions

This study aims to search for answers to the following research issues by investigating, comprehending, and evaluating the methods, models, and algorithms utilized to achieve task scheduling and load balancing.

  1. What are the current load balancing and task scheduling techniques commonly used in cloud computing environments?

  2. What are the key factors influencing the performance of load-balancing mechanisms in cloud computing?

  3. Which evaluation metrics are predominantly utilized for assessing the efficacy of load-balancing techniques in cloud computing environments?

  4. Which categories of algorithms are used more in the recent research trend in the cloud computing environment for solving load balancing issues?

  5. Which simulation software tools have garnered prominence in recent scholarly analyses within the domain of cloud computing research?

  6. What insights do the future perspectives within the reviewed literature offer in terms of potential avenues for exploration and advancement within the field?

The next section explores the working principle and architecture of cloud computing, which incorporates the fog and IoT application layers.

3 Cloud-fog architecture and relevant frameworks

Cloud-fog architecture extends a centralized cloud infrastructure by broadening the scope of cloud computing functionalities towards the network’s edge. It leverages fog computing, an intermediate layer between cloud servers and end devices, to enable real-time processing, data storage, and analytics closer to the data source. Fog nodes, deployed at the network edge, act as mediators linking end devices and the cloud, thus reducing latency and bandwidth consumption. These nodes can be physical or virtual entities, such as routers, switches, gateways, or even edge servers.

3.1 Working principles

The working principles of cloud-fog architecture involve collaboration between cloud servers, fog nodes, and end devices, creating a distributed computing environment. An end device initiates a request, which first passes through the nearest fog node. The fog node performs initial processing, filtering, and aggregation of the data before sending a subset of it to the cloud for further analysis or storage. By offloading some processing tasks to the fog nodes, cloud-fog architecture reduces the burden on the cloud, improves response times, and enhances the overall system performance. During task execution, dynamic cloud load balancing techniques assign tasks to virtual machines and adjust the load on these machines based on the system’s conditions (Tawfeeg et al. 2022). Alatoun et al. (2022) presented an EEIoMT framework for executing critical tasks in the shortest time in smart medical services while balancing energy consumption with other tasks. The authors have utilized ECG sensors for health monitoring at home. Similarly, Swarna Priya et al. (2020) have proposed an energy-efficient framework known as the ‘EECloudIoE framework’ for retrieving information from the IoE cloud network. The authors have adopted the ‘Wind Driven Optimization algorithm’ to form clusters of sensor nodes in the IoE network. Then, the Firefly algorithm is utilized to select the ‘cluster head’ (CH) for each cluster. Sensor nodes in sensor networks are also used to track physical events across widely dispersed geographic locations. These nodes assist in gathering crucial data from these sites over extended periods; however, they suffer from low battery power. Therefore, it is essential to implement energy-efficient systems using wireless sensor networks to collect this data. Still, cloud computing has some limitations, such as the geographical location of cloud data centers, network connectivity with end nodes, weather conditions, etc. To overcome these issues, fog computing emerged as a solution. Fog computing acts as an arbitrator between end devices and cloud computing, providing storage, networking, and computation services closer to edge devices. The introduction of edge computing has brought about the emergence of various computing paradigms, such as Mobile Edge Computing (MEC) and Mobile Cloud Computing (MCC). MEC primarily emphasizes a 2- or 3-tier application in the network and mobile devices equipped with contemporary cellular base stations. It improves the efficiency of networks by optimizing content distribution and facilitating the creation of applications (Sabireen and Neelanarayanan 2021). Figure 2 shows how the cloud, fog, and IoT layers work in collaboration.

Fig. 2
figure 2

The fog extends the cloud closer to the devices producing data (Swarna Priya et al. 2020; Vergara et al. 2023)
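The request flow described above (end device → fog node → cloud) can be sketched as a simple filter-and-aggregate rule at the fog layer. The reading format, the alert threshold, the aggregation window, and the forward_to_cloud stub below are illustrative assumptions for this review rather than a prescribed protocol.

```python
# Sketch of fog-node preprocessing: filter and aggregate sensor readings locally,
# forwarding only a reduced summary to the cloud (assumed message format).

def forward_to_cloud(summary):
    # Stand-in for a real cloud API call.
    print("to cloud:", summary)

def fog_node_process(readings, alert_threshold=100.0, window=10):
    """Aggregate raw readings at the fog layer; escalate anomalies immediately."""
    buffer = []
    for value in readings:
        if value >= alert_threshold:
            # Latency-critical event: bypass aggregation and notify the cloud now.
            forward_to_cloud({"type": "alert", "value": value})
            continue
        buffer.append(value)
        if len(buffer) == window:
            # Send only an aggregate, reducing bandwidth toward the cloud.
            forward_to_cloud({"type": "summary",
                              "mean": sum(buffer) / len(buffer),
                              "count": len(buffer)})
            buffer.clear()
    return buffer  # readings still waiting for the next summary

fog_node_process([42.0, 55.5, 120.3, 61.2], alert_threshold=100.0, window=2)
```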

3.2 Cloud computing layer

Cloud computing facilitates virtualization technology, which combines distributed and parallel processing. Using centralized data centers, it transfers computations from on-premises infrastructure to off-premises facilities. It has become an advanced technology within the swiftly expanding realm of computing paradigms owing to two principles: (1) ‘Dynamic Provisioning’ and (2) ‘Virtualization Technology’ (Tripathy et al. 2023). Dynamic provisioning is a fundamental concept in the realm of cloud computing. It refers to the automated process of allocating and adjusting computing resources to meet the changing needs of cloud-based applications and services. Virtual network embedding is essential to load balancing in cloud computing as it ensures that virtual network requests are mapped onto physical resources in an effective and balanced manner. By effectively embedding virtual networks onto physical machines, load-balancing algorithms can divide network traffic and workload evenly across the network infrastructure, preventing any single resource from becoming overloaded. Virtual network embedding may be utilized with load-balancing strategies like least connections, weighted round-robin, and round-robin to maximize resource usage and network performance (Apat et al. 2023; Santhanakrishnan and Valarmathi 2022).
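As a minimal illustration of one of the strategies named above, the sketch below implements a naive weighted round-robin selector over physical machines; the machine names and weights are hypothetical, and production balancers typically use a smoother interleaving than this simple expansion.

```python
# Sketch of weighted round-robin: servers with higher weights receive
# proportionally more requests. Server names and weights are illustrative.
import itertools

def weighted_round_robin(servers):
    """servers: dict mapping server name -> integer weight."""
    # Expand each server according to its weight, then cycle forever.
    expanded = [name for name, w in servers.items() for _ in range(w)]
    return itertools.cycle(expanded)

selector = weighted_round_robin({"pm-a": 3, "pm-b": 1})
assignments = [next(selector) for _ in range(8)]
# -> ['pm-a', 'pm-a', 'pm-a', 'pm-b', 'pm-a', 'pm-a', 'pm-a', 'pm-b']
print(assignments)
```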

3.3 Fog computing layer

Cisco researchers first used the term fog computing in 2012 to address the shortcomings of cloud computing. To offer fast and reliable services to mobile consumers, fog computing enhances their experience by introducing a middle fog layer between consumers and the cloud. It is an improvement over cloud-based networking and computing services. The architecture of fog computing consists of fog servers acting as fog devices or fog nodes deployed in the proximity of IoT devices to provide resources for different applications. As a promising concept, fog computing introduces a decentralized architecture that enhances data processing capabilities at the network’s edge (Goel and Tiwari 2023). However, the limited resources in the fog computing model undoubtedly make it difficult to support several services for these Internet of Things applications. A prompt choice must be made regarding load balancing and application placement in the fog layer due to the diverse and ever-changing nature of application requests from IoT devices. Therefore, it is crucial to allocate resources optimally to maintain service continuity for end customers (Vergara et al. 2023). Unlike cloud computing, fog relies on distributed computing with devices near clients that have good computing capacity, and on diverse organizations for global connectivity. Mahmoud et al. (2018) introduced a new fog-enabled cloud IoT model after observing that cloud IoT is not the best option in situations where energy usage and latency are important considerations, such as the healthcare sector, where patients need to be monitored in real time without delay. The energy allocation method used to load jobs onto a fog device serves as the foundation for the entire concept. Table 3 presents a comparison between the features of the cloud and fog computing paradigms.

Table 3 Comparison of features of the cloud computing and fog computing paradigms (Swarna Priya et al. 2020; Goel and Tiwari 2023; Vergara et al. 2023)

3.4 IoT applications layer

Cloud-fog architecture finds applications in various domains, including IoT, healthcare (Alatoun et al. 2022), transportation, smart cities, and industrial automation (Dogo et al. 2019). Healthcare providers can leverage fog nodes for real-time patient monitoring, while industrial automation systems can benefit from edge analytics for predictive maintenance. Telemedicine, smart agriculture, and Industry 4.0 and 5.0 are other areas that employ IoT applications. Edge computing and cloud computing have given rise to additional computing paradigms such as mobile edge computing (MEC) and mobile cloud computing (MCC). MEC primarily emphasizes a network architecture that includes a 2- or 3-tier application and mobile devices equipped with modern wireless base stations. It improves network efficiency as well as the dissemination of application content (Sabireen and Neelanarayanan 2021).

4 Literature review on load balancing (LB) and task scheduling

We have curated a representative collection of 63 research articles for a technology review. The literature review covers the period from 2014 to 2024. The main target of LB is to spread the workload over the available assets and optimize the overall turnaround time. Before 2014, traditional methods such as FCFS, SJF, Min-min, Max–min, RR, etc., were recognized for their poor processing speeds and time-consuming job scheduling and load balancing. Konjaang et al. (2018) examine the difficulties associated with the conventional Max–Min algorithm and propose the Expa-Max–Min method as a possible solution. The algorithm prioritizes cloudlets with the longest and shortest execution times to schedule them efficiently. The workload can be divided into memory capacity issues, CPU load, and network load. In the meantime, load balancing techniques, with virtual machine management (VMM), are employed in cloud computing to distribute the load among virtual machines (Velpula et al. 2022). Hung et al. (2019) introduced an enhanced max–min algorithm called MMSIA. The objective of the MMSIA algorithm is to improve the completion time in cloud computing by utilizing machine learning to cluster requests and optimize the utilization of virtual machines. The system allocates big requests to virtual machines (VMs) with the lowest utilization percentage, improving processing efficiency. The approach integrates supervised learning into the Max–Min scheduling algorithm to enhance clustering efficiency. Kumar et al. (2018) state that the updated HEFT algorithm creates a Directed Acyclic Graph (DAG) for all jobs submitted to the cloud. It also assigns computation costs and communication edges across processing resources.
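The classic Max–Min idea underlying Expa-Max–Min and MMSIA can be sketched as follows: among the pending tasks, repeatedly pick the one whose best (minimum) completion time is largest and assign it to the VM that finishes it earliest. The execution-time matrix below is invented for illustration, and the sketch omits the clustering and learning extensions of the cited methods.

```python
# Sketch of the classic Max-Min heuristic.
# exec_time[i][j] = estimated execution time of task i on VM j (illustrative values).

def max_min_schedule(exec_time, num_vms):
    ready = {vm: 0.0 for vm in range(num_vms)}      # current load per VM
    pending = set(range(len(exec_time)))
    schedule = {}
    while pending:
        # For each pending task, find its best VM (minimum completion time).
        best = {t: min((ready[v] + exec_time[t][v], v) for v in range(num_vms))
                for t in pending}
        # Max-Min rule: among those minima, pick the task with the LARGEST completion time.
        task = max(pending, key=lambda t: best[t][0])
        finish, vm = best[task]
        schedule[task] = vm
        ready[vm] = finish
        pending.remove(task)
    return schedule, max(ready.values())  # assignment and resulting makespan

exec_time = [[4.0, 6.0], [8.0, 5.0], [3.0, 7.0]]
print(max_min_schedule(exec_time, num_vms=2))   # -> ({1: 1, 0: 0, 2: 0}, 7.0)
```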

The ordering of tasks is determined by their execution priority, which considers the average time it takes to complete each task on all processors and the costs associated with communication between predecessor tasks. Subsequently, the tasks are organized in a list according to decreasing priority and assigned to processors based on the shortest execution time. In the same way, Seth and Singh (2019) propose the Dynamic Heterogeneous Shortest Job First (DHSJF) model as a solution for work scheduling in cloud computing systems with varying capabilities. The algorithm entails the establishment of a heterogeneous cloud computing environment, the dynamic generation of cloudlet lists, and the analysis of workload and resource heterogeneity to minimize the makespan. The DHSJF algorithm efficiently schedules dynamic requests to various resources, resulting in optimized utilization of resources. This method overcomes the limitations of the conventional Shortest Job First (SJF) method. A task scheduling process is shown graphically in Fig. 3.

Fig. 3
figure 3

Working of task scheduling in cloud computing
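A simplified sketch of the priority-ordered list scheduling described above follows: tasks are ranked by average execution time across processors (a stand-in for the HEFT upward rank) and assigned, highest rank first, to the processor giving the earliest finish time. Communication costs between predecessor tasks are deliberately omitted, and the matrix values are illustrative.

```python
# Simplified list scheduling in the spirit of HEFT: rank tasks by average
# execution time across processors, then assign each (highest rank first)
# to the processor giving the earliest finish. DAG communication costs are
# omitted for brevity; values are illustrative.

def list_schedule(exec_time):
    num_procs = len(exec_time[0])
    # Priority = average execution time over all processors (a proxy for upward rank).
    rank = {t: sum(row) / num_procs for t, row in enumerate(exec_time)}
    order = sorted(rank, key=rank.get, reverse=True)

    finish = [0.0] * num_procs   # time at which each processor becomes free
    placement = {}
    for task in order:
        # Earliest-finish-time rule: try every processor, keep the best.
        proc = min(range(num_procs), key=lambda p: finish[p] + exec_time[task][p])
        finish[proc] += exec_time[task][proc]
        placement[task] = proc
    return placement, max(finish)

exec_time = [[5.0, 7.0], [9.0, 4.0], [2.0, 6.0]]
print(list_schedule(exec_time))   # -> ({1: 1, 0: 0, 2: 0}, 7.0)
```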

Another technique that many authors increasingly employ is GWO. The GWO technique correlates the duties of grey wolves with viable solutions for distributing jobs or equalizing workloads inside a network or computing system. The Alpha wolves lead the pack, representing the most optimal solution achieved up to this point. The Alpha receives assistance in decision-making and problem-solving from the Beta and Delta wolves, who represent the second and third most optimal alternatives, respectively. The omega wolves, who stand for the remaining solutions, are inspired by the top three wolves. The algorithm represents the exploration and exploitation stages in pursuing the optimal solution through a repetitive process of encircling, hunting, and attacking the target. In 2020, Farrag et al. (2020) published a work that examines the application of the Ant-Lion optimizer (ALO) and Grey wolf optimizer (GWO) in job scheduling for Cloud Computing. The objective of ALO and GWO is to optimize the makespan of tasks in cloud systems by effectively dividing the workload. Although ALO and GWO surpass the Firefly Algorithm (FFA) in minimizing makespan, their performance relative to PSO varies depending on the specific conditions. Reddy et al. (2022) introduced the AVS-PGWO-RDA scheme, which utilizes Probabilistic Grey Wolf optimization (PGWO) in the load balancer unit to find the ideal fitness value for selecting user tasks and allocating resources for tasks with lower complexity and time consumption. The AVS approach is employed to cluster related workloads, and the RDA-based scheduler ultimately assigns these clusters to suitable virtual machines (VMs) in the cloud environment. Similarly, Janakiraman and Priya (2023) introduced the Hybrid Grey Wolf and Improved Particle Swarm Optimization Algorithm with Adaptive Inertial Weight-based multi-dimensional Learning Strategy (HGWIPSOA). This algorithm combines the Grey Wolf Optimization Algorithm (GWOA) with Particle Swarm Optimization (PSO) to efficiently assign tasks to Virtual Machines (VMs) and improve the accuracy and speed of task scheduling and resource allocation in cloud environments. The suggested system effectively tackles the limitations of previous LB approaches by preventing premature convergence and enhancing global search capability. As a result, it provides several benefits, including improved throughput, reduced makespan, reduced degree of imbalance, decreased latency, and reduced execution time. The combination of GWO with GA, as demonstrated by Behera and Sobhanayak (2024), yields superior results. It provides faster convergence and minimum makespan in large task scheduling scenarios.
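A compact numerical sketch of the GWO search loop described above is given below for a continuous encoding of a task-to-VM assignment; the fitness function (makespan over a random execution-time matrix), the rounding-based decoding, and all parameter values are assumptions made for illustration, not those of the cited schedulers.

```python
# Minimal GWO sketch: wolves are continuous position vectors that encode a
# task-to-VM mapping (each position value is rounded into a VM index).
import random

NUM_TASKS, NUM_VMS, WOLVES, ITERS = 8, 3, 10, 50
exec_time = [[random.uniform(1, 10) for _ in range(NUM_VMS)] for _ in range(NUM_TASKS)]

def fitness(pos):
    load = [0.0] * NUM_VMS
    for task, x in enumerate(pos):
        vm = min(NUM_VMS - 1, max(0, int(round(x))))
        load[vm] += exec_time[task][vm]
    return max(load)  # makespan: lower is better

wolves = [[random.uniform(0, NUM_VMS - 1) for _ in range(NUM_TASKS)] for _ in range(WOLVES)]
for it in range(ITERS):
    wolves.sort(key=fitness)
    alpha, beta, delta = wolves[0], wolves[1], wolves[2]   # three best solutions
    a = 2 - 2 * it / ITERS                                 # exploration -> exploitation
    for w in wolves[3:]:                                   # omegas follow the leaders
        for d in range(NUM_TASKS):
            new = 0.0
            for leader in (alpha, beta, delta):
                A = a * (2 * random.random() - 1)
                C = 2 * random.random()
                new += leader[d] - A * abs(C * leader[d] - w[d])  # encircling step
            w[d] = min(NUM_VMS - 1, max(0.0, new / 3))

print("best makespan:", round(fitness(min(wolves, key=fitness)), 2))
```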

From 2014 onwards, metaheuristic and hybrid-metaheuristic algorithms were used to address cloud computing optimization and load-balancing challenges. Zhan et al. (2014) suggested a load-aware genetic algorithm called LAGA, a modified version of the genetic algorithm (GA). LAGA employs the TLB model to optimize makespan and load balance, establishing a new fitness function to find suitable schedules that maintain makespan while preserving load balance. Rekha and Dakshayini (2019) introduced a task allocation method for cloud environments that utilizes a Genetic Algorithm. The purpose of this strategy is to minimize job completion time and enhance overall performance. The algorithm considers multiple objectives, such as energy consumption and quick responses, to make the best decisions regarding resource allocation. The evaluation findings exhibit superior throughput using the proposed approach, indicating its efficacy in task allocation decision-making. In 2023, Mishra and Majhi (2023) proposed a hybrid meta-heuristic technique called GAYA, which combines the Genetic Algorithm (GA) and the JAYA algorithm. The purpose of this technique is to efficiently schedule dynamically independent biological data. The GAYA algorithm showcases improved exploitation and exploration abilities, rendering it a highly viable solution for scheduling dynamic medical data in cloud-based systems. Brahmam and Vijay Anand (2024) developed a model called VMMISD, in which they combined a Genetic Algorithm (GA) with Ant Colony Optimization (ACO) for resource allocation. The system also utilizes combined optimization techniques, iterative security protocols, and deep learning algorithms to enhance the efficiency of load balancing during virtual machine migrations. The model employs K-means clustering, Fuzzy Logic, Long Short-Term Memory (LSTM) networks, and Graph Networks to anticipate workloads, make decisions, and measure the affinity between virtual machines (VMs) and physical machines. Behera and Sobhanayak (2024) also proposed a hybrid approach that combines the Grey Wolf Optimizer (GWO) and Genetic Algorithm (GA). The hybrid GWO-GA algorithm effectively reduces makespan, energy consumption, and computing costs, surpassing conventional algorithms in performance. It exhibits accelerated convergence in extensive scheduling problems, offering an edge over earlier techniques.
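The GA workflow common to LAGA, GAYA, and the other GA-based schedulers above (selection, crossover, and mutation over an assignment chromosome) can be sketched as follows; the encoding, the makespan fitness, and all parameter values are illustrative and are not taken from the cited implementations.

```python
# Sketch of a genetic algorithm for task-to-VM assignment.
# Chromosome = list where gene i is the VM index of task i; fitness = makespan.
import random

NUM_TASKS, NUM_VMS, POP, GENS = 10, 3, 20, 100
exec_time = [[random.uniform(1, 10) for _ in range(NUM_VMS)] for _ in range(NUM_TASKS)]

def makespan(chrom):
    load = [0.0] * NUM_VMS
    for task, vm in enumerate(chrom):
        load[vm] += exec_time[task][vm]
    return max(load)

def tournament(pop):
    return min(random.sample(pop, 3), key=makespan)   # selection: best of 3

population = [[random.randrange(NUM_VMS) for _ in range(NUM_TASKS)] for _ in range(POP)]
for _ in range(GENS):
    next_gen = [min(population, key=makespan)]        # elitism: keep the best
    while len(next_gen) < POP:
        p1, p2 = tournament(population), tournament(population)
        cut = random.randrange(1, NUM_TASKS)          # one-point crossover
        child = p1[:cut] + p2[cut:]
        if random.random() < 0.1:                     # mutation: reassign one task
            child[random.randrange(NUM_TASKS)] = random.randrange(NUM_VMS)
        next_gen.append(child)
    population = next_gen

print("best makespan:", round(makespan(min(population, key=makespan)), 2))
```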

The combination of autoscaling and reinforcement learning (RL) has garnered significant attention in recent years due to its ability to allocate resources proactively (Joshi et al. 2024). Deep reinforcement learning (DRL) is a promising technique that automates the process of predicting workloads. DRL can make immediate decisions on resource allocation based on real-time monitoring of the system’s workload and performance parameters to effectively fulfil the system’s present demands. Ran et al. (2019) introduced a task-scheduling strategy based on deep reinforcement learning (DRL). The working of the DRL-based load balancer is shown in Fig. 4. This method assigns tasks to various virtual machines (VMs) in a dynamic manner, resulting in a decrease in average response time and ensuring load balancing. The technique is examined on a tower server with specific configurations and software tools. It showcases its efficacy in balancing load across virtual machines (VMs) while adhering to service level agreement (SLA) limits. The approach employs deep reinforcement learning (DRL) and deep deterministic policy gradient (DDPG) networks to create optimal scheduling decisions by learning directly from experience without prior knowledge. In addition, Jyoti and Shrimali (2020) employed DRL in their research and proposed a technique that combines Multi-agent ‘Deep Reinforcement Learning-Dynamic Resource Allocation’ (MADRL-DRA) in the Local User Agent (LUA) with a Dynamic Optimal Load-Aware Service Broker (DOLASB) in the Global User Agent (GUA) to improve quality of service (QoS) metrics by allocating resources dynamically. The method demonstrates enhanced performance in terms of execution time, waiting time, energy efficiency, throughput, resource utilization, and makespan when compared to traditional approaches. Tong et al. (2021) present a new technique for task scheduling using deep reinforcement learning (DRL) that aims to reduce the imbalance of virtual machine (VM) load and the rate of job rejection while also considering service-level agreement limitations. The proposed DDMTS method exhibits stability and outperforms other algorithms in effectively balancing the Degree of Imbalance (DI) and minimizing the job rejection rate. The precise configurations of state, action, and reward in the DDMTS algorithm are essential for its efficacy in resolving task scheduling difficulties using the DQN algorithm.

Fig. 4
figure 4

Working of load balancer in cloud computing
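For a DRL-based scheduler of the kind shown in Fig. 4, the key design choices are the state, action, and reward definitions. The definitions below are a hedged illustration rather than the formulations of the cited papers: the state is assumed to be the VM load vector plus the size of the arriving task, the action is the index of the chosen VM, and the reward penalises load imbalance and a response-time proxy.

```python
# Illustrative state/action/reward definitions for a DRL-based task scheduler.
# These are assumptions made for demonstration, not the cited works' formulations.

def make_state(vm_loads, task_size):
    # State: normalised VM loads plus the size of the arriving task.
    total = sum(vm_loads) or 1.0
    return tuple(l / total for l in vm_loads) + (task_size,)

def step(vm_loads, task_size, action):
    """Apply an action (chosen VM index) and return the next loads and the reward."""
    next_loads = list(vm_loads)
    next_loads[action] += task_size
    imbalance = max(next_loads) - min(next_loads)      # degree of imbalance
    response = next_loads[action]                       # waiting + execution proxy
    reward = -(imbalance + 0.1 * response)              # higher is better
    return next_loads, reward

loads, r = step([3.0, 5.0, 1.0], task_size=2.0, action=2)
print(loads, round(r, 2))   # -> [3.0, 5.0, 3.0] -2.3
```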

Double Deep Q-learning has also been employed to address load-balancing concerns. Swarup et al. (2021) introduced a method utilizing Deep Reinforcement Learning (DRL) to address job scheduling in cloud computing. Their approach employs a Clipped Double Deep Q-learning algorithm to minimize computational costs while adhering to resource and deadline constraints. The algorithm employs target network and experience replay techniques to maximize its objective function. The algorithm balances exploration and exploitation by using the ε-greedy policy. This policy establishes the approach for selecting actions by considering the trade-off between exploration and exploitation: the system chooses actions randomly for exploration or based on Q-values for exploitation, thus maintaining a balance between attempting new alternatives and utilizing existing ones. In the same way, Kruekaew et al. employ Q-learning to optimize job scheduling and resource utilization. The suggested method, Multi-Objective ABCQ (MOABCQ), integrates the Artificial Bee Colony Algorithm with Q-learning to optimize task scheduling, resource utilization, and load balancing in cloud environments. MOABCQ exhibited superior throughput and a higher Average Resource Utilization Ratio (ARUR) than alternative algorithms, with Q-learning enhancing the efficiency of the ABC algorithm. Figure 5 presents the hybridisation trend of various techniques observed in the literature review.
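Two of the ingredients mentioned above, ε-greedy action selection and the clipped double-Q target, can be illustrated in tabular form as below; this is a simplification of the deep-network version described in the cited work, with all numbers invented for the example.

```python
# Sketch of epsilon-greedy action selection and a clipped double-Q target.
# Tabular simplification of the deep version; all values are illustrative.
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99):
    """Bootstrap with the smaller of two estimates to curb Q-value over-estimation."""
    a_star = max(range(len(q1_next)), key=lambda a: q1_next[a])
    return reward + gamma * min(q1_next[a_star], q2_next[a_star])

action = epsilon_greedy([0.2, 0.7, 0.1])
target = clipped_double_q_target(reward=-1.5, q1_next=[0.4, 0.9], q2_next=[0.5, 0.6])
print(action, round(target, 3))   # e.g. 1 -0.906
```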

Fig. 5
figure 5

Hybridization trend of some techniques as observed in SLR

Furthermore, the swarm-based technique known as Particle Swarm Optimisation (PSO) is increasingly being adopted by researchers to address challenges related to load balancing in cloud computing. Using PSO, combined with other prominent methods, leads to an ideal solution through extensive investigation and exploration of the search space. Panwar et al. (2019) introduced a TOPSIS-PSO method designed for non-preemptive task scheduling in cloud systems. The approach tackles task scheduling challenges by employing the TOPSIS method to evaluate tasks according to execution time, transmission time, and cost. Subsequently, optimisation is performed using PSO. The proposed method optimises the makespan, execution time, transmission time, and cost metrics. In 2020, Agarwal et al. (2020) introduced a Mutation-based Particle Swarm Optimisation (PSO) algorithm to tackle issues such as premature convergence, decreased convergence speed, and being trapped in local optima. The suggested method seeks to minimise performance characteristics such as makespan time and enhance the fitness function in cloud computing. In 2021, Negi et al. (2021) introduced a hybrid load-balancing algorithm in cloud computing called CMODLB. This technique combines machine learning and soft computing techniques. The method employs artificial neural networks, fuzzy logic, and clustering techniques to distribute the workload evenly. The system utilises Bayesian optimisation-based augmented K-means for virtual machine clustering and the TOPSIS-PSO method for work scheduling. VM migration decisions are determined with an interval type-2 fuzzy logic system that relies on load conditions. Although these algorithms demonstrated strong performance, they do not consider the specific type of content used by users. Adil et al. (2022) found that knowledge about the type of content in tasks can significantly enhance scheduling efficiency and reduce the workload on virtual machines (VMs). The PSO-CALBA system categorises user tasks into several content types, such as video, audio, image, and text, using a Support Vector Machine (SVM) classifier. The categorisation begins by selecting file fragments, which are tasks that consist of diverse file fragments of different content types. The initial classification stage involves utilising the Radial Basis Function (RBF) kernel approach to analyse high-dimensional data, which is a significant challenge. Pradhan et al. (2022) provided a solution for the issue of handling complicated and high-dimensional data in a cloud setting. To address this challenge, they utilised deep reinforcement learning (DRL) and parallel particle swarm optimisation (PSO). The proposed technique synergistically integrates Particle Swarm Optimisation (PSO) and Deep Reinforcement Learning (DRL) to optimise rewards by minimising both makespan time and energy consumption while ensuring high accuracy and fast execution. The algorithm iteratively enhances accuracy, demonstrating superior performance in dynamic environments, and can handle various tasks in cloud environments. Jena et al. (2022) found that the QMPSO algorithm successfully distributes the workload evenly among virtual machines, resulting in improved makespan, throughput, and energy utilisation, and reduced task waiting time. The performance of the hybridisation of modified Particle Swarm Optimisation (MPSO) and improved Q-learning in QMPSO is enhanced by modifying the velocity based on the best action generated through Q-learning.
The technique employs dynamic resource allocation to distribute tasks among virtual machines (VMs) with varying priorities. This approach aims to minimise task waiting time and maximise VM throughput. This strategy is highly efficient for independent tasks.
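The PSO variants above share the same core velocity and position update; the sketch below applies it to a continuous task-to-VM encoding with a linearly descending inertia weight in the spirit of LDAIW. The encoding, fitness, and coefficient values are illustrative assumptions rather than the cited configurations.

```python
# Minimal PSO sketch for task-to-VM assignment with a linearly descending
# inertia weight. Encoding, fitness, and coefficients are illustrative.
import random

NUM_TASKS, NUM_VMS, PARTICLES, ITERS = 8, 3, 15, 60
exec_time = [[random.uniform(1, 10) for _ in range(NUM_VMS)] for _ in range(NUM_TASKS)]

def makespan(pos):
    load = [0.0] * NUM_VMS
    for t, x in enumerate(pos):
        vm = min(NUM_VMS - 1, max(0, int(round(x))))
        load[vm] += exec_time[t][vm]
    return max(load)

pos = [[random.uniform(0, NUM_VMS - 1) for _ in range(NUM_TASKS)] for _ in range(PARTICLES)]
vel = [[0.0] * NUM_TASKS for _ in range(PARTICLES)]
pbest = [p[:] for p in pos]
gbest = min(pbest, key=makespan)[:]

for it in range(ITERS):
    w = 0.9 - (0.9 - 0.4) * it / ITERS        # inertia weight descends linearly
    for i in range(PARTICLES):
        for d in range(NUM_TASKS):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (w * vel[i][d]
                         + 2.0 * r1 * (pbest[i][d] - pos[i][d])    # cognitive pull
                         + 2.0 * r2 * (gbest[d] - pos[i][d]))      # social pull
            pos[i][d] = min(NUM_VMS - 1, max(0.0, pos[i][d] + vel[i][d]))
        if makespan(pos[i]) < makespan(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=makespan)[:]

print("best makespan:", round(makespan(gbest), 2))
```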

Load balancing poses a significant challenge in Fog computing due to limited resources. Talaat et al. (2022) introduced a method called Effective Dynamic Load Balancing (EDLB) that utilises Convolutional Neural Networks (CNN) and Multi-Objective Particle Swarm Optimisation (MPSO) to optimise resource allocation in fog computing environments to maximise resource utilisation. The EDLB system comprises three primary modules: the Fog Resource Monitor (FRM), the CNN-based Classifier (CBC), and the Optimised Dynamic Scheduler (ODS). The FRM system monitors the utilisation of server resources, while the CBC system classifies fog servers. Additionally, the ODS system allocates incoming tasks to the most appropriate server, reducing response time and enhancing resource utilisation. This strategy effectively decreases response time. Comparably, Nabi et al. (2022) presented an Adaptive Particle Swarm Optimisation (PSO)-Based Task Scheduling Approach for Cloud Computing, explicitly emphasising achieving load balance and optimisation. The solution incorporates a technique called Linearly Descending and Adaptive Inertia Weight (LDAIW) to improve the efficiency of job scheduling. The methodology employs a population-based scheduling system that draws inspiration from swarm intelligence. In this technique, particles represent solutions, and their updates are determined by factors such as inertia weight, personal best, and global best. The method can reduce task execution time, increase throughput, and better balance local and global search.

Table 4 maps the sections and subsections of this SLR to the research questions they address. Table 5 gives an overview of the strengths and weaknesses of the state-of-the-art techniques, and a comparative analysis of these methods on publicly benchmarked datasets is presented in Table 6.

Table 4 The Table highlights sections and subsections to answer the research questions
Table 5 Some state-of-the-art load balancing algorithms with weakness and strength features

4.1 Some essential load balancing metrics

It is evident that meticulous monitoring and analysis of metrics enhance resource utilization, minimize downtime, and ensure a seamless user experience, ultimately boosting overall system reliability and scalability. Several metrics employed for assessing the balance of loads in the cloud are illustrated in Fig. 6.

  • Throughput: In cloud load balancing, throughput refers to the rate at which a cloud infrastructure can process and serve data or requests. Specifically, it represents the amount of work accomplished within a given time frame, reflecting the efficiency of the system’s ability to handle concurrent user demands. High throughput ensures that data or requests can be processed quickly and reliably, minimising latency and optimising resource utilisation. Throughput (tp) can be calculated by using the mathematical formula given in Eq. (1) below:

    $$ tp = \sum\limits_{n = 1}^{j} {ExT_{n}} $$
    (1)

    where j is the number of tasks and ExTn is the execution time of the nth task.

  • Makespan: Makespan denotes the overall duration needed to finish a specific set of tasks or jobs within a cloud computing environment. Minimum makespan represents the efficiency and performance of the system in handling and processing tasks. It can be calculated with the help of the following formula:

    $$ Makespan = \max \left( {ExT_{j} \mid j = 1,\;2,\;3, \ldots } \right) $$
    (2)

    In Eq. (2), ExTj is the execution time of the jth virtual machine. A robust and efficient load balancing algorithm has a minimum makespan time.

  • Response time: Response time is the interval between the moment a user makes a request and the moment the cloud infrastructure delivers a response. Minimizing response time is crucial to providing a seamless user experience and ensuring optimal performance.

  • Reliability: It indicates the system’s ability to effectively handle failures, prevent downtime, and maintain continuous service availability. A reliable load balancer detects and mitigates failures promptly, ensures seamless failover mechanisms, and provides continuous and reliable service to users even in the event of disruptions or high-load conditions.

  • Migration time: Migration time refers to the duration required to transfer workloads or applications from one server or data center to another within the cloud infrastructure. It encompasses the process of migrating virtual machines, containers, or services to optimize resource allocation and handle changes in demand.

  • Bandwidth: It represents the capacity or available channel for data communication. It also refers to the maximum data capacity that may be transferred across a network connection within a specific period. Adequate bandwidth is essential for efficient load balancing, as it ensures the smooth and timely flow of data between servers and clients.

  • Resource utilization: It refers to the efficient allocation and management of computing resources within a cloud infrastructure to meet the demands of varying workloads. It involves optimizing the utilization of servers, storage, network bandwidth, and other resources to maximize performance and minimize waste. It can be measured with the help of a mathematical formula, as given in Eq. (3):

    $$ ResU\left( {VM_{k} } \right) = CT_{jk} ~/~Makespan $$
    (3)

    In Eq. (3), ResU is the resource utilization of the kth virtual machine (VM); CTjk is the completion time of the jth job on the kth VM. A short computational example of these metrics follows this list.

Fig. 6
figure 6

Classification of load balancing algorithms

  • Energy consumption: It can be defined as the ability of a cloud infrastructure to optimize its power consumption while maintaining optimal performance. Load balancing reduces energy consumption by dynamically allocating computing resources and powering down underutilized servers during low-demand periods. By minimizing power usage, cloud load balancing systems contribute to reducing carbon footprints, operational costs, and environmental impact while ensuring sustainable and eco-friendly operations in cloud computing environments.

  • Fault tolerance: The ability of a system to continue functioning uninterrupted in the presence of failures or errors. It involves designing load-balancing algorithms and mechanisms that can withstand and recover from various faults, such as server failures, network outages, or traffic spikes (Tawfeeg et al. 2022).
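As a worked example of the metrics above, the snippet below computes makespan (Eq. 2), per-VM resource utilization (Eq. 3), and one common formulation of the degree of imbalance for a hypothetical task-to-VM assignment; the numbers, VM names, and the DI formula used here are illustrative assumptions.

```python
# Worked example of load-balancing metrics for a hypothetical task-to-VM assignment.
# completion[vm] = list of completion (execution) times of jobs placed on that VM.

completion = {"vm-0": [4.0, 3.0], "vm-1": [9.0], "vm-2": [2.0, 2.5]}

vm_finish = {vm: sum(times) for vm, times in completion.items()}  # busy time per VM
makespan = max(vm_finish.values())                                # Eq. (2)

# Resource utilization per VM (Eq. 3): completion time on the VM / makespan.
utilization = {vm: t / makespan for vm, t in vm_finish.items()}

# Degree of imbalance (one common definition): spread of per-VM finish times
# relative to their mean.
mean_finish = sum(vm_finish.values()) / len(vm_finish)
degree_of_imbalance = (max(vm_finish.values()) - min(vm_finish.values())) / mean_finish

print(f"makespan={makespan}, utilization={utilization}, DI={degree_of_imbalance:.2f}")
```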

4.2 Taxonomy of load balancing algorithms and challenges associated with them

Mishra and Majhi (2020) have categorized the load balancing algorithms into four broad classes: Traditional, Heuristic, Meta-heuristic, and Hybrid. The authors have also explained the subcategories of meta-heuristic and hybrid algorithms based on their nature. Tawfeeg et al. (2022) have discussed three main categories of load-balancing algorithms, namely static, dynamic, and hybrid. Tripathy et al. (2023) mentioned in their review that the load-balancing algorithm based on their environment is generally classified into three main classes: static, dynamic, and nature-inspired. In this systematic review paper, we have tried to include the maximum range of algorithms by covering all the categories and sub-categories. Figure 6 represents all categories of load-balancing algorithms (Table 6).

  • Traditional Algorithms: Traditional algorithms are mainly classified into preemptive and non-preemptive. Preemptive means forcefully stopping an ongoing execution to serve a higher-priority task; after the higher-priority job completes, the preempted job is resumed. The priority of a task can be internal or external. Traditional algorithms commonly employed for load balancing include Round Robin (RR), Weighted Round Robin, Least Connection, and Weighted Least Connection. Round Robin assigns requests cyclically to each server, ensuring an equal distribution. Weighted Round Robin provides scalability by considering server weights and allocating a proportionate number of requests to each server based on its capabilities and performance (Praditha et al. 2023). The Least Connection (LC) algorithm assigns requests to the server with the fewest active connections, promoting load distribution efficiency. The Weighted Least Connection (WLC) algorithm enhances LC by considering server weights: it assigns requests to servers with the least active connections, scaling the distribution based on server capabilities (a small selection sketch follows this list). Preemptive scheduling algorithms include round-robin and priority-based scheduling; non-preemptive algorithms include Shortest Job First (SJF) and First Come First Serve (FCFS).

  • Heuristic-based Algorithms: Heuristic algorithms are problem-solving techniques that rely on practical rules, intuition, and experience rather than precise mathematical models. These are used to find approximate solutions in a reasonable amount of time. The heuristic algorithms aim to distribute workload efficiently among cloud and fog nodes. Compared to hybrid and meta-heuristic algorithms, heuristic algorithms are relatively straightforward and have reduced computational complexity. They often provide reasonable solutions but lack guarantees of optimality. There are two types of heuristic techniques: static and dynamic. When a task’s estimated completion time is known, the static heuristic is used. When tasks arrive dynamically, a dynamic heuristic can be applied. Algorithms like Min-min, Max-min (Mao et al. 2014), RASA, Modified Heterogeneous Earliest Finish Time (HEFT) (Dubey et al. 2018), Improved Max-min (Hung et al. 2019) and DHSJF (Seth and Singh 2019) are the prominent examples of the heuristic category.

  • Meta-heuristic based algorithms: Meta-heuristic algorithms are good at finding a global solution without falling into local optima. A meta-heuristic algorithm is a problem-solving technique that guides the search process by iteratively refining potential solutions. It is used to find approximate solutions for complex optimization problems, especially in cloud computing, where traditional algorithms often struggle due to the inherent complexity and dynamic nature of the environment. A particular meta-heuristic algorithm that has proven effective in cloud computing is the Genetic Algorithm (GA) (Rekha and Dakshayini 2019). GA mimics the process of natural selection, evolving a population of solutions to find strong candidates. By employing genetic operators like selection, crossover, and mutation, GA explores the solution space intelligently, adapting to changing conditions and providing near-optimal solutions for resource allocation, task scheduling, and load balancing in cloud computing environments. Other examples from the reviewed literature are GWO (Reddy et al. 2022), ACO (Dhaya and Kanthavel 2022), TBSLB-PSO (Ramezani et al. 2014), TOPSIS-PSO (Konjaang et al. 2018), and Modified BAT (Latchoumi and Parthiban 2022). When two meta-heuristic methods are combined, the new method is a hybrid meta-heuristic; an example is Ant Colony Optimization with Particle Swarm (ACOPS) (Cho et al. 2015).

  • Hybrid based algorithms: Hybrid algorithms integrate the advantages of centralized and distributed load-balancing algorithms to achieve better performance and scalability. This approach leverages the centralized component to monitor and collect real-time information about the system’s state, workload, and resource availability (Geetha et al. 2024). Simultaneously, it incorporates distributed load-balancing techniques to efficiently divide the workload among fog nodes. This hybrid approach enhances the overall load-balancing efficiency, reduces network congestion, and improves the system’s response time. By dynamically adapting to changing workload patterns and resource availability, the hybrid algorithm ensures optimal resource utilization and enhances user satisfaction. A hybrid method that combines the Genetic Algorithm (GA) and the Grey Wolf Optimization Algorithm (GWO) is proposed by Behera and Sobhanayak (2024). The hybrid GWO-GA algorithm minimizes cost, energy usage, and makespan. Similarly, other examples from the literature review are GAYA (Mishra and Majhi 2023), VMMSID (Brahmam and Vijay Anand 2024), DTSO-TS (Ledmi et al. 2024), etc.

  • ML-Centric algorithms: These algorithms combine machine learning capabilities with existing algorithms to automate their operation. This is one of the latest approaches in the research area and has proven effective for real-time scenarios. To address the challenges of load balancing, researchers have been increasingly focusing on machine-learning-centric algorithms. ML-based algorithms offer promising results in load balancing by dynamically allocating tasks based on workload characteristics and resource availability. These algorithms leverage ML techniques such as reinforcement learning, deep learning, and clustering to intelligently predict and allocate the workload across cloud-fog computing environments. ML-centric algorithms deliver improved performance, reduced response time, and enhanced resource utilization by continuously learning from historical data and adapting to changing conditions. Furthermore, these algorithms also consider energy consumption and network traffic factors, ensuring a holistic load-balancing approach (Muchori and Peter 2022). Examples of ML-centric algorithms from the reviewed literature are DRL (Ran et al. 2019), MADRL-DRA (Jyoti and Shrimali 2020), TS-DT (Mahmoud et al. 2022), FF-NWRDLB (Prabhakara et al. 2023), etc.
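The least-connection rules referenced in the first bullet of this list can be sketched as follows: plain LC picks the server with the fewest active connections, while WLC picks the smallest ratio of active connections to weight. Server names, weights, and connection counts below are hypothetical.

```python
# Sketch of Least Connection (LC) and Weighted Least Connection (WLC) selection.
# Server names, weights, and connection counts are illustrative.

servers = {"s1": {"weight": 4, "active": 12},
           "s2": {"weight": 1, "active": 2},
           "s3": {"weight": 2, "active": 5}}

def least_connection(servers):
    return min(servers, key=lambda s: servers[s]["active"])

def weighted_least_connection(servers):
    # WLC scales active connections by capacity: pick the smallest active/weight ratio.
    return min(servers, key=lambda s: servers[s]["active"] / servers[s]["weight"])

print(least_connection(servers))            # -> 's2' (fewest active connections)
print(weighted_least_connection(servers))   # -> 's2' (2/1=2.0 < 5/2=2.5 < 12/4=3.0)
```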

Table 6 A comparative analysis of state-of-the-art methods on publicly benchmarked datasets

Table 7 provides a comprehensive overview of recent load balancing and task scheduling algorithms, presenting information on the proposed technology, the technologies compared against, research limitations, results, tools used, and potential future directions. Additionally, Table 8 outlines the evaluation metrics, the advantages and disadvantages of the technologies reviewed, and the objectives of each study.

Table 7 Comprehensive study on load balancing and task scheduling techniques in cloud computing
Table 8 Detailed literature review on advantages and disadvantages of various studies

5 Application areas of load balancing in cloud and fog computing

There are various application areas where load balancing is crucial. The healthcare sector is one area where efficient resource utilization and load balancing are highly desirable. According to Mahmoud et al. (2018), fog computing integrated with IoT-based healthcare architecture improves latency, energy consumption, mobility, and Quality of Service, enabling efficient healthcare services regardless of location. Fog-enabled Cloud-of-Things (CoT) system models with energy-aware allocation strategies result in more energy-efficient operations, which are crucial for healthcare applications sensitive to delays and energy consumption. Yong et al. (2016) propose a dynamic load balancing approach using SDN technology in a cloud data center, enabling real-time monitoring of service node flow and load state, as well as global resource assignment for uneven system load distribution. Dogo et al. (2019) introduced a mist computing system for better connectivity and resource utilization in smart cities and industries. According to the authors, mist computing enables smart cities to intelligently adapt to dynamic events and changes, enhancing urban operations. Mist computing is more suitable for realizing smart city solutions where streets adapt to different conditions, promoting energy conservation and efficient operations. Similarly, Sharif et al. (2023) presented a paper that discusses the rapid growth of IoT devices and applications, emphasizing the need for efficient task scheduling and resource allocation in edge computing for health surveillance systems. The proposed Priority-based Task Scheduling and Resource Allocation (PTS-RA) mechanism aims to manage emergency conditions efficiently, meeting the requirements of latency-sensitive tasks with reduced bandwidth cost. On the same track, Aqeel et al. (2023) proposed a CHROA model that can be utilized for energy-efficient and intelligent load balancing in cloud-enabled IoT environments, particularly in healthcare, where real-time applications generate large volumes of data. Sah Tyagi et al. (2021) presented a neural network-based resource allocation model for an energy-efficient WSN-based smart Agri-IoT framework. The model improves dynamic clustering and optimizes cluster size. The approach combines BPNN (Backpropagation Neural Network), APSO (Adaptive Particle Swarm Optimization), and BNN (Binary Neural Network) to accomplish the effective allocation of agricultural resources. This integration showcases notable progress in cooperative networking and overall optimization of resources. In the same manner, Dhaya and Kanthavel (2022) emphasize the importance of energy efficiency in agriculture and the challenges in resource allocation, and introduce a novel ‘Naive Multi-Phase Resource Allocation Algorithm’ to enhance energy efficiency and optimize agricultural resources effectively in a dynamic environment. In this way, there are several application areas where load balancing and resource scheduling are crucial. In the future, transportation, Industry 4.0 and 5.0, IoT network systems, smart cities, smart agriculture, and healthcare systems will be hotspots for research on load balancing. The following are the areas where resource allocation and utilization are critical and where cloud service utilization is highest:

  1. Telemedicine (Verma et al. 2024)

  2. Industry 4.0 and Industry 5.0 (Teoh et al. 2023)

  3. Healthcare system (Talaat et al. 2022)

  4. Agriculture (Agri-IoT) (Dhaya and Kanthavel 2022; Sah Tyagi et al. 2021)

  5. Real-time monitoring services (Yong et al. 2016)

  6. Smart cities (Alam 2021)

  7. Digital twinning (Zhou et al. 2022; Adibi et al. 2024)

  8. Smart business and analytics (Nag et al. 2022)

  9. E-commerce (Sugan and Isaac Sajan 2024)

6 Research queries and inferences

After the detailed literature review, the answers to the research questions were inferred from the examined material itself, without bias and without adding the authors' own views. The inferences drawn are presented below in the form of answers to each research question.

Q1. What load balancing and task scheduling techniques are commonly used in cloud computing environments?

This SLR divides the current techniques into five categories: traditional, heuristic, meta-heuristic, ML-centric, and hybrid. We employed the content analysis method to determine the category of each technique used in the literature study, as shown in Table 7. The review indicates that hybrid, meta-heuristic, and ML-centric algorithms are the techniques researchers most frequently choose for solving load-balancing problems in cloud computing systems. The percentage-wise utilization of the various techniques is depicted in Fig. 7. In the future, ML/DL-based load-balancing algorithms are expected to be a hotspot for researchers, as there is an emerging trend of hybridising ML-centric approaches with existing ones.

Fig. 7 Percentage-wise utilisation of various categories of load balancing algorithms from 2014 to 2024 based on SLR

Q2. What are the key factors influencing the performance of load-balancing mechanisms in cloud computing?

The performance of load balancing in the cloud is influenced by several aspects, including the availability of resources such as CPU, memory, storage, and network bandwidth, the nature of the workload, network latency, the load balancer algorithm, and the health of the server as well as fault detection and tolerance. The selection of the load balancing algorithm can significantly influence performance, as different algorithms vary in complexity and efficiency, affecting how resources are distributed. In cases of server overload or issues, the load balancer must be able to identify these problems and redirect traffic to other servers to maintain optimal performance.
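To illustrate how such factors can be combined in practice, the sketch below shows one simple, hypothetical way a load balancer might score candidate VMs on CPU, memory, and latency while skipping unhealthy nodes. The class, weights, and values are our own illustration under assumed semantics, not a mechanism taken from any reviewed study.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical snapshot of a VM/server state as seen by a load balancer.
record NodeState(String id, double cpuUtil, double memUtil,
                 double latencyMs, boolean healthy) {}

public class WeightedNodeSelector {

    // Illustrative weights; in practice these would be tuned or learned.
    private static final double W_CPU = 0.4, W_MEM = 0.3, W_LAT = 0.3;

    // Lower score means less loaded and therefore preferred for the next task.
    static double score(NodeState n) {
        double normalizedLatency = Math.min(n.latencyMs() / 100.0, 1.0);
        return W_CPU * n.cpuUtil() + W_MEM * n.memUtil() + W_LAT * normalizedLatency;
    }

    // Pick the healthy node with the lowest combined load score;
    // unhealthy nodes are skipped, mimicking fault detection and redirection.
    static NodeState select(List<NodeState> nodes) {
        return nodes.stream()
                .filter(NodeState::healthy)
                .min(Comparator.comparingDouble(WeightedNodeSelector::score))
                .orElseThrow(() -> new IllegalStateException("no healthy node available"));
    }

    public static void main(String[] args) {
        List<NodeState> nodes = List.of(
                new NodeState("vm-1", 0.80, 0.60, 12.0, true),
                new NodeState("vm-2", 0.35, 0.40, 20.0, true),
                new NodeState("vm-3", 0.10, 0.20, 5.0, false)); // unhealthy, ignored
        System.out.println("Next task goes to: " + select(nodes).id());
    }
}
```

The scoring function is exactly the point where the algorithm families discussed in this review differ: heuristic, meta-heuristic, and ML-centric policies all amount to different ways of computing or learning this decision.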

Q3. Which evaluation metrics are predominantly utilized for assessing the efficacy of load-balancing techniques in cloud computing environments?

The utilization trend of various metrics over the period 2014–2024 is shown graphically in Fig. 8. We have employed the frequency analysis method to determine the year-wise utilization of each performance metric. Table 8 provides an in-depth analysis of the performance metrics attained in every study. The year-wise categorization of each metric is shown in Table 9. The metrics most frequently used to gauge load balancing in cloud computing environments are Makespan, resource utilization (RU), Degree of Imbalance (DI), cost efficiency, throughput, and execution time. Evaluation metrics like fault tolerance, QoS, reliability and migration rate require additional attention without compromising other factors. The row named ‘other’ in Table 9 includes parameters like convergence speed, network longevity, fitness function, packet loss ratio, success rate, task scheduling efficiency, scalability, clustering phase duration, standard deviation of load, accuracy, precision and time complexity.
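For reference, the most frequently used of these metrics have widely adopted formulations. The expressions below reflect definitions commonly used in the load-balancing literature (individual studies may differ in detail), where CT_j is the completion time of VM j and m is the number of VMs:

```latex
\text{Makespan} = \max_{1 \le j \le m} CT_j, \qquad
DI = \frac{T_{\max} - T_{\min}}{T_{\mathrm{avg}}}, \qquad
RU_{\mathrm{avg}} = \frac{\sum_{j=1}^{m} CT_j}{m \cdot \text{Makespan}}
```

Here T_max, T_min, and T_avg denote the maximum, minimum, and average completion times across the VMs. A smaller DI indicates a more even load distribution, and an average resource utilization close to 1 indicates that the makespan is determined by genuinely busy VMs rather than idle ones.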

Fig. 8 The analysis of performance metrics used in load balancing based on SLR

Table 9 Comprehensive metrics evaluation of the reviewed literature

Q4. Which categories of algorithms have been used more in recent research trends in the cloud computing environment for solving load-balancing issues?

Figure 9 indicates that researchers prefer hybrid algorithms for addressing load balancing and task scheduling problems in cloud computing. This preference arises because hybrid algorithms combine the functionalities of several algorithms, resulting in precise, multi-objective solutions to task scheduling and load-balancing challenges. Around 2014, heuristic approaches were the common choice, but meta-heuristic approaches later replaced them. By 2022, the hybrid approach had become the dominant method. Interestingly, many of these hybrid techniques combine machine learning with other optimization methods.

Fig. 9 Year-wise utilisation trend of various techniques used in load balancing

Q5. Which simulation software tools have garnered prominence in recent scholarly analyses within the domain of cloud computing research?

Figure 10 shows that 51% of the reviewed studies use the CloudSim tool for simulation, followed by Python with 11%. We employed the frequency analysis method to quantify and compare the use of different simulation tools across the studies. CloudSim is thus the first choice of researchers and has seen increasing use over the last few years. It allows users to model and simulate cloud computing infrastructure, resource provisioning policies, and application scheduling algorithms. CloudSim is an external framework, available for download, that can be imported into development environments and build tools such as Eclipse, NetBeans IDE, and Maven. For example, Vergara et al. (2023) integrated the CloudSim toolkit with NetBeans IDE 8.2 on Windows 10 to simulate the cloud computing environment.
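For readers unfamiliar with the toolkit, the sketch below shows how a minimal CloudSim 3.x scenario is typically assembled: a datacenter with one host, a broker, one VM, and one cloudlet (task). The constructor signatures follow the CloudSim 3.x API, but all numeric parameters are illustrative placeholders; real experiments use much larger host, VM, and cloudlet lists and plug in custom allocation and scheduling policies.

```java
import java.util.*;
import org.cloudbus.cloudsim.*;
import org.cloudbus.cloudsim.core.CloudSim;
import org.cloudbus.cloudsim.provisioners.*;

public class MinimalCloudSimScenario {
    public static void main(String[] args) throws Exception {
        CloudSim.init(1, Calendar.getInstance(), false);   // 1 cloud user, no trace events

        // One host with a single 1000-MIPS processing element (illustrative values).
        List<Pe> peList = List.of(new Pe(0, new PeProvisionerSimple(1000)));
        List<Host> hostList = new ArrayList<>();
        hostList.add(new Host(0, new RamProvisionerSimple(2048),
                new BwProvisionerSimple(10_000), 1_000_000, peList,
                new VmSchedulerTimeShared(peList)));

        DatacenterCharacteristics ch = new DatacenterCharacteristics(
                "x86", "Linux", "Xen", hostList, 10.0, 3.0, 0.05, 0.001, 0.0);
        new Datacenter("Datacenter_0", ch, new VmAllocationPolicySimple(hostList),
                new LinkedList<Storage>(), 0);

        DatacenterBroker broker = new DatacenterBroker("Broker_0");

        // One VM and one cloudlet (task) submitted to the broker.
        Vm vm = new Vm(0, broker.getId(), 500, 1, 512, 1000, 10_000, "Xen",
                new CloudletSchedulerTimeShared());
        UtilizationModel full = new UtilizationModelFull();
        Cloudlet task = new Cloudlet(0, 40_000, 1, 300, 300, full, full, full);
        task.setUserId(broker.getId());

        broker.submitVmList(List.of(vm));
        broker.submitCloudletList(List.of(task));

        CloudSim.startSimulation();
        CloudSim.stopSimulation();

        // Report where each cloudlet ran and how long it took.
        List<Cloudlet> finished = broker.getCloudletReceivedList();
        for (Cloudlet c : finished) {
            System.out.printf("Cloudlet %d finished on VM %d in %.2f s%n",
                    c.getCloudletId(), c.getVmId(), c.getActualCPUTime());
        }
    }
}
```

Custom load-balancing or scheduling strategies are usually evaluated by replacing the broker's cloudlet-to-VM mapping or the VM allocation policy while keeping the rest of this scaffolding unchanged.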

Fig. 10 Analysis of simulation tools based on SLR

Q6. What insights do the future perspectives within the reviewed literature offer in terms of potential avenues for exploration and advancement within the field?

Based on this review, the future directions of the field focus on developing more advanced algorithms that harness the potential of machine learning and deep learning, enabling better energy efficiency and overall system performance in cloud computing environments. Real-time monitoring and automation of systems using AI approaches are also hot topics for future research. The future scopes recorded during the literature review are listed in Table 7.

All the responses in this study are deduced and documented from the above literature review. They are drawn impartially from the reviewed studies rather than from the authors' own opinions.

8 Statistical analysis

The SLR includes a bibliographic analysis to understand the development and current state of research in this area; examining how scholarly material is distributed can reveal both dominant patterns and gaps in the literature. We used the Scopus academic database to collect records matching the keywords “load balancing and task scheduling in cloud computing using machine learning”, which returned 129 items. The analysis centres on this dataset, illustrating the distribution of the published documents across the main subject areas and offering insight into the current priorities and interests of the academic community.

These publications are distributed across various subjects, providing insights into the interdisciplinary nature of this field, as shown in Fig. 11.

Fig. 11 Subject-wise analysis of publications from 2014 to 2024 related to used keywords

9 Discussion

Our extensive literature study has uncovered valuable insights and emerging trends crucial for advancing cloud computing technology. This discussion summarizes the research findings, answering the initial research questions and drawing conclusions from a thorough examination of the studies selected between 2014 and 2024.

9.1 Research gaps

Most research efforts concentrate on a single aspect of load balancing; many systems are limited to either data-center or network load balancing, and there is an urgent need for approaches that address multiple aspects together. The main gaps identified are as follows:

  1. Load balancing itself can become a single point of failure. Furthermore, most of the research concentrates on only a limited number of performance parameters, such as Makespan, throughput, and completion time; the Degree of Imbalance (DI) is a crucial parameter that deserves more attention.

  2. There is a significant need to enhance quality measures such as QoS (Quality of Service), fault tolerance, network delay, VM (Virtual Machine) migration and risk assessment.

  3. Fog and edge computing need to be integrated with the cloud to mitigate the requirement for massive amounts of data transfer. This will improve the flexibility and usefulness of cloud computing in multiple sectors.

  4. The power conservation mechanism has not been given much thought by researchers; there is a shortage of innovative thinking on power conservation in the context of load balancing.

  5. Geographical barriers introduce network delay and data transmission delay. Cutting-edge technologies are needed to overcome these distance- and delay-related issues (Muchori and Peter 2022).

  6. Virtual machine (VM) migration is also a challenge that strongly impacts the efficacy of cloud services. There is a dire need for techniques that reduce the number of VM migrations.

  7. Despite the advancements, applying machine learning algorithms in cloud computing remains complicated. The intricacy of these algorithms, combined with the requirement for extensive training data, presents substantial obstacles. The dynamic nature of cloud environments requires constant learning and adjustment of these models, which raises questions about their ability to handle large-scale operations and remain viable in the long term.

9.2 Integration of machine learning for enhanced load balancing and task scheduling

One key insight from this analysis is a growing reliance on machine learning methods to enhance load balancing and task scheduling processes. Although somewhat successful, conventional algorithms generally struggle in dynamic cloud systems where data and workload patterns continuously change. Due to their capacity to acquire knowledge and adjust accordingly, machine learning algorithms have demonstrated potential in forecasting workload patterns, enabling the implementation of more effective resource allocation strategies. This enhances efficiency and substantially decreases execution time and energy consumption, aligning with the objectives of achieving optimal resource utilisation and high system throughput (Janakiraman and Priya 2023; Edward Gerald et al. 2023).
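As a minimal illustration of this idea (our own sketch, not a method from any particular reviewed study), the code below uses simple exponential smoothing as a stand-in for the ML/DL workload-prediction models discussed above: the forecast of the next interval's CPU demand is compared with an overload threshold so that tasks can be redirected, or a migration triggered, before the overload actually occurs.

```java
// Minimal illustration of predictive load balancing: an exponentially
// smoothed forecast of per-VM CPU demand is compared against an overload
// threshold so that scheduling decisions can be made proactively.
public class SmoothedLoadForecaster {

    private final double alpha;      // smoothing factor in (0, 1]
    private Double forecast = null;  // current one-step-ahead forecast

    SmoothedLoadForecaster(double alpha) { this.alpha = alpha; }

    // Update the forecast with the latest observed utilization (0.0 to 1.0).
    double update(double observedUtil) {
        forecast = (forecast == null)
                ? observedUtil
                : alpha * observedUtil + (1 - alpha) * forecast;
        return forecast;
    }

    boolean likelyOverloaded(double threshold) {
        return forecast != null && forecast > threshold;
    }

    public static void main(String[] args) {
        SmoothedLoadForecaster vm1 = new SmoothedLoadForecaster(0.5);
        double[] observedCpu = {0.35, 0.42, 0.55, 0.70, 0.82}; // synthetic trace
        for (double u : observedCpu) {
            double f = vm1.update(u);
            System.out.printf("observed=%.2f forecast=%.2f overloaded=%b%n",
                    u, f, vm1.likelyOverloaded(0.75));
        }
    }
}
```

In the reviewed literature this forecasting step is typically performed by far richer models, such as neural networks or reinforcement learning agents, but the underlying control loop (predict, compare with a threshold, act proactively) is the same.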

9.3 Future directions

The future of cloud computing rests on advancing auto-adaptive systems capable of handling load balancing and task scheduling independently, without human involvement. Fusing artificial intelligence (AI) with cloud computing can create systems that provide unparalleled efficiency and reliability. The efficiency of cloud services could be improved significantly by developing lightweight machine learning models that require minimal training data and can adapt quickly to changing conditions. Moreover, investigating unsupervised learning algorithms could remove the requirement for large labelled datasets, making such approaches more practical. These are the most frequently observed future scopes based on this SLR:

  • Deployment of deep learning (DL) and machine learning (ML) techniques to predict load patterns: The predictive analysis of workload patterns can prevent resource underutilization or overloading. We can also use ML to reduce energy consumption and predict faults in cloud computing (Reddy et al. 2022; Mishra and Majhi 2023; Agarwal et al. 2020; Negi et al. 2021; Latchoumi and Parthiban 2022; Shuaib, et al. 2023).

  • Development of fault tolerance techniques integrated with load balancing: Only a small number of studies examine concerns such as fault tolerance and security alongside load balancing in cloud computing services, and they rarely elaborate on the connection between the two (Behera and Sobhanayak 2024; Tawfeeg et al. 2022; Brahmam and Vijay Anand 2024).

  • To extend the existing techniques for data security and privacy by incorporating blockchain technology with cloud computing (Edward Gerald et al. 2023; Saba et al. 2023; Li et al. 2020).

  • Achieving additional QoS metrics such as scalability, elasticity, and applicability across broader domains is another direction in which existing research can be extended (Adil et al. 2022; Talaat et al. 2022; Sultana et al. 2024).

  • Most of the researchers have focused on the energy consumption aspect. Future research should aim to achieve energy efficiency, as energy is set to become one of the scarcest resources (Rekha and Dakshayini 2019; Farrag et al. 2020; Panwar et al. 2019; Mahmoud et al. 2022; Asghari and Sohrabi 2021).

  • Cost-effectiveness and real-time load balancing are prominent research areas. Most researchers plan to extend their work to real-time analytics and dynamic cloud networks (Kumar and Sharma 2018; Ni et al. 2021).

  • Response delay is a crucial factor in real-time applications, and real-time analytics in complex and dynamic environments is a hotspot for researchers. Healthcare systems, telemedicine, and real-time monitoring or surveillance services are examples of delay-sensitive applications (Verma et al. 2024; Pradhan et al. 2022; Nabi et al. 2022; Shahakar et al. 2023).

  • Dynamic reallocation of dependent tasks is another scope for future research, and task priority-based scheduling optimizes cloud performance (Ran et al. 2019; Jena et al. 2022; Prabhakara et al. 2023).

  • Fog and edge computing architectures have limited resources, so optimal resource scheduling is essential. Many authors have discussed resource scheduling in fog and edge computing as a potential future area of study (Swarup et al. 2021; Kruekaew and Kimpan 2022).

This SLR records the future research scopes mentioned above, and Table 7 provides detailed information.

10 Conclusion

The study of the computational cloud is vast and comes with numerous challenges. Cloud computing allows end users to access computational resources on demand, which has led to widespread use of cloud services and made them an essential part of various businesses, notably online shopping sites. This increased usage has put more strain on cloud resources such as hardware, software, and network devices; consequently, load-balancing solutions are needed for efficient utilization of these resources.

This SLR categorizes the surveyed technologies into five classes: conventional/traditional, heuristic, meta-heuristic, ML-centric, and hybrid. Traditional approaches are slow, struggle to scale with problem size and complexity, and often become stuck in local optima. Heuristic algorithms, which demonstrate remarkable scalability, are suitable for large-scale optimization challenges in industries such as manufacturing, banking, and logistics, but they often produce approximate rather than optimal answers; meta-heuristic algorithms emerged to address these drawbacks. In recent years, hybrid strategies, which combine heuristic, conventional, and machine-learning approaches, have become increasingly popular. These strategies aim to exploit the advantages of several algorithms to overcome individual limitations and improve performance.

This systematic literature review of efficient load balancing and task scheduling in cloud computing environments has provided insights into the available algorithms, their limitations, evaluation metrics, challenges, simulation tools, and potential future directions. The analysis demonstrates that the current trend is the use of ML-centric and hybrid algorithms to address load balancing and job/task scheduling effectively, and the findings indicate a growing interest among researchers in incorporating ML/DL approaches. Our study also explained the fundamental structure of cloud computing and its operational principles, provided an impartial examination of evaluation metrics and simulation tools, and answered the research questions that formed the basis of this review with well-supported inferences drawn from the gathered material. This systematic review is intended as a foundational resource for future work in this domain and offers valuable information to researchers and practitioners involved in load balancing in cloud computing architectures. This SLR does not delve into security and privacy considerations related to load balancing; these are retained as topics for our future investigation. Table 10 provides the abbreviations used in this SLR.

Table 10 Abbreviations used in SLR