Introduction

Wireless Sensor Networks (WSNs) have emerged as a vital technology for real-time monitoring and data collection across diverse environmental and physical domains. These networks consist of numerous small, resource-constrained sensor nodes responsible for sensing, processing, and transmitting data to gateway nodes through multi-hop wireless communication. A microprocessor unit within each sensor node processes the sensed information. As such, WSNs find applications in earthquake detection, smart cities, environmental monitoring, battlefield surveillance, and many more.

The design and deployment of sensor nodes in WSNs are critical factors that significantly impact the network’s energy efficiency and overall performance. The limited resources of the sensor nodes, such as computing power, memory, and battery life, present critical challenges in achieving optimal deployment strategies. Efficient deployment algorithms are necessary to maximize the network's lifetime and ensure essential connectivity and coverage.

Deterministic and random sensor placement are two commonly used deployment approaches in WSNs. While deterministic placement techniques aim to achieve maximal coverage and connectivity with minimal sensor deployment, they may not be suitable for large-scale areas or hostile environments. Random sensor placement, on the other hand, is widely used but may not provide optimal coverage and connectivity. An efficient deployment model supports maximizing the coverage of heterogeneous wireless sensor networks [1] and can also enable enhanced coverage in homogeneous wireless sensor networks spanning large regions and hostile environments [2]. Deployment remains a major challenge in WSNs since it influences the energy expended by the sensors across the whole system and the overall network performance.

In this study, we propose an Intelligent Satin Bower Bird Optimizer (ISBO) based node deployment approach with reinforcement learning, ISBO-RL, for WSNs. The primary objective of ISBO-RL is to determine the optimal sensor placement to achieve improved coverage and connectivity. We have introduced the concept of reinforcement learning along with the ISBO algorithm, enabling the sensor nodes to adapt their deployment based on the network’s performance and changing environmental conditions.

The main contributions of this research are the application of the Satin Bower Bird algorithm to sensor deployment and the novel integration of reinforcement learning to enhance the deployment strategy. We evaluate the performance of ISBO-RL on various simulation platforms and compare it with existing techniques such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Simulated Annealing (SA), showcasing its superiority in terms of coverage and connectivity. The experimental results validate the effectiveness of the proposed approach and demonstrate its potential for improving WSN performance in various real-world scenarios.

Some of the existing research works relating to the proposed methodologies are briefly discussed in Sect. “Existing Approaches”. The proposed model, along with the ISBO-RL algorithms and mathematical concepts, is described in Sect. “Proposed Model”. Simulation experiments and comparisons of ISBO-RL with other techniques are explained in Sect. “Performance Validation”, with the concluding summary in Sect. “Summary”.

Existing Approaches

This section delves into various existing techniques for node deployment, offering a comprehensive analysis of their strengths and limitations. However, one noticeable gap in these approaches is the absence of reinforcement learning, which limits their adaptability and robustness in addressing the challenges of optimal sensor placement.

Farsi et al. [3] address node deployment in wireless sensors by categorizing various coverage techniques into classical deployment, meta-heuristic methods, and self-scheduling strategies. The paper compares these techniques in terms of coverage, connectivity, power consumption, and other performance metrics.

Priyadarshi et al. [4] categorize coverage techniques for Wireless Sensor Networks (WSNs) into four main categories: computational geometry-based, force-based, grid-based, and meta-heuristic-based methods. Their aim is to compare these techniques, weighing their advantages and disadvantages and focusing on coverage, practical deployment challenges, sensing models, research issues, performance metrics, and WSN simulator comparisons, while also addressing ongoing research challenges.

ZainEldin et al. [5] introduce an Improved Dynamic Deployment Technique based on a Genetic Algorithm (IDDTGA) to increase the coverage area with the minimum number of nodes and to minimize overlapping areas among neighboring nodes. Two-point crossover is employed with the parameter-length encoding.

Yan et al. [6] proposed a growth-ring-style, uneven node-depth-adjustment self-deployment optimization algorithm to improve the reliability and coverage of underwater wireless sensor networks (UWSNs) while resolving the energy hole problem. The approach comprehensively presents the construction of a tree model and a method for wide-ranging optimization, focusing on increasing coverage and energy balance. Song et al. [7] introduce a new sensor placement system that depends on evidence theory and caters for three-dimensional UWSNs.

Xiang et al. [8] studied a hybrid cuckoo search (CS) scheme for optimizing node placement. The method identifies dominant locations for mobile nodes that reduce the average movement distance and the number of mobile nodes.

Li and Liu [9] proposed a network coverage model based on evidence theory. The movement direction of the wireless sensors is evaluated, and the sensors move toward regions with lower perception probability.

Liu et al. [10] established a mathematical model for covering the target region using WSN nodes. The node placement optimization problem is then converted into the problem of detecting maximal values. Lastly, the Ant Lion Optimizer (ALO) model is utilized to obtain an optimal node placement solution.

Mohar et al. [11] propose node placement based on the bat algorithm (BA) for improved coverage. Each bat computes sensor placements separately. The algorithm places sensors at grid points to eliminate residual sensors, minimizing the load on residual nodes by replacing the single sensor at a grid point. Simulations show an improved sensor coverage rate and illustrate the effect of bat algorithm parameters such as frequency, sensing range, loudness, and number of grid points.

Ou and Qin [12] propose a WSN coverage optimization method based on an improved grey wolf optimizer with multiple strategies (IGWO-MS) to tackle the problems of high node deployment cost and insufficient effective coverage in WSNs.

To verify the performance of IGWO-MS in WSN coverage optimization, their work rasterizes the coverage area of the WSN into multiple grids of identical size and mutual symmetry, thereby transforming the node coverage problem into a single-objective optimization problem.

Mao et al. [13] proposed a collaborative perception model for node deployment based on the 0–1 and exponential perception models. Sensor node deployment is formulated as a three-dimensional node deployment problem. Lastly, the method is applied in a tobacco storage setting, and the scheme obtained by the suggested algorithm is compared with those obtained by similar deployment algorithms.

Hashim et al. [14] proposed an improved deployment strategy modeled on an artificial bee colony. By optimizing the network settings and limiting the total number of deployed relays, the algorithm ensures a longer network lifetime. Simulations confirm that the suggested approach works well in several scenarios with varying levels of problem complexity.

Yang et al. [15] introduce an approach for efficient sensor network deployment, specifically in networks with mobile sensors, to achieve balanced coverage. They present a centralized solution using the Hungarian algorithm and propose a localized scan-based movement-assisted sensor deployment method (SMART), which employs scanning and dimension-exchange techniques to move sensors and achieve balance. They also extend SMART to address communication holes in sensor networks and validate their approach through extensive simulations.

Chelbi et al. [16] present a novel method that combines particle swarm optimization with iterated local search (PSO-ILS) to achieve the best coverage and connectivity rate while requiring the fewest nodes. On one side, PSO-ILS is used to install the fewest sensor nodes necessary to maintain target coverage. On the other, an optimal position determination (OPD) method is developed to determine the best candidate positions that PSO-ILS can use to locate the fewest relay nodes needed for complete connectivity.

While these existing approaches contribute valuable insights to the field of node deployment, their omission of reinforcement learning hinders their ability to offer adaptable, context-aware, and optimized solutions for optimal sensor placement in Wireless Sensor Networks.

Proposed Model

The ISBO-RL technique aims to optimally place the sensor nodes to accomplish maximum coverage and connectivity in a WSN. In this proposed model, the ISBO algorithm is combined with reinforcement learning to enable the sensor nodes to adjust their deployment in response to network performance and changing environmental circumstances. The methodology is presented in algorithmic form, with the steps involved in determining the best bowerbird positions, which lead to the placement of sensor nodes and relay nodes. ISBO-RL is explained with pseudocode for each step in Sect. “ISBO Algorithm with Reinforcement Learning (ISBO-RL)”, and the nomenclature expanding and explaining each variable is given in Sect. “Nomenclature”.

ISBO Algorithm with Reinforcement Learning (ISBO-RL)

Initialization

  • Initialize the population size of bowers i.e., total bowers (NB).

  • Initialize the iteration step size (α).

  • Initialize the probability of mutation (P).

  • Initialize the variance among threshold limits (Z).

  • Calculate the portion of the space width using Eq. 1, taking some percentage Z of the search-space width:

    $$\sigma = Z \times \left( var_{\max} - var_{\min} \right)$$
    (1)

where σ is the portion of the space width at which bower positioning is made,

varmax is the maximum value allocated for positioning the next bowerbird,

varmin is the minimum value allocated for positioning the next bowerbird,

Z is the percentage value of the (varmax − varmin) width.

  • Generate the bowerbird population.

  • Compute bower fitness and determine the dominant bower as an initialization.
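As a concrete illustration of these initialization steps, here is a minimal Python sketch; the population size, search bounds, Z value, and toy fitness function are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def initialize_isbo(nb=30, dim=2, var_min=0.0, var_max=100.0, z=0.02):
    """Illustrative ISBO initialization. sigma = Z * (var_max - var_min), Eq. 1."""
    sigma = z * (var_max - var_min)              # portion of the space width (Eq. 1)
    # Generate the bowerbird population uniformly inside the search space.
    bowers = np.random.uniform(var_min, var_max, size=(nb, dim))
    return bowers, sigma

def fitness(bower):
    """Placeholder fitness; the paper's objectives are coverage and connectivity."""
    return -np.sum((bower - 50.0) ** 2)          # toy objective for illustration

bowers, sigma = initialize_isbo()
fits = np.array([fitness(b) for b in bowers])
elite = bowers[np.argmax(fits)]                  # dominant bower at initialization
```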

Reinforcement Learning Setup

Before the usage of Reinforcement learning and the specific details of our implementation in the proposed methodology, a brief understanding of the concepts of reinforcement learning (RL) is discussed below.

In RL, an agent interacts with an environment, taking actions and receiving feedback in the form of rewards or penalties. Based on this feedback, the agent learns to adjust its behavior to maximize its long-term rewards. This learning process involves several key components and is discussed below.

  • Agent: In the proposed model, each agent represents an individual bowerbird attempting to optimize its position and, in turn, the sensor placement.

  • Environment: The environment represents the sensor deployment area with its specific characteristics and constraints.

  • States: Each state captures the current positions of all deployed sensors and relevant environmental factors.

  • Actions: Actions correspond to possible modifications to the sensor placement, such as moving individual sensors or adjusting their locations.

  • Rewards: The reward function defines the desired outcome, typically maximizing network coverage and connectivity while minimizing energy consumption.

  • Q-table: A data structure storing estimated Q-values (future expected reward) for each combination of state and action.

Specific to the proposed approach, the Q-table initialization and learning parameters are as follows.

  • Initialize a Q-table: Here the Q-table is used to store and update the estimated Q-values (expected future rewards) for each bowerbird and its corresponding sensor placement configuration (state, action) pair.

  • Learning rate (α_RL) and discount factor (γ): These parameters control the balance between exploration of new solutions and exploitation of learned knowledge.

By effectively utilizing these RL components, our ISBO-RL approach aims to learn and adapt the sensor deployment strategy, ultimately achieving improved network performance in real-world scenarios such as the typical WSN shown in Fig. 1. The proposed ISBO-RL Q-table initialization and computation steps are as follows; a minimal code sketch follows the list.

  • Initialize a Q-table to store Q-values for each bower and its corresponding elements.

  • Initialize learning rate (α_RL) and discount factor (γ) for Q-learning.
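The sketch below shows one way these two steps could look in Python. The encoding of states and actions as hashable keys, and the α_RL and γ values, are assumptions for illustration; the update rule is standard one-step Q-learning.

```python
from collections import defaultdict

# Q-table over (state, action) pairs; states and actions are assumed to be
# hashable encodings of a sensor-placement configuration and a modification to it.
Q = defaultdict(float)

ALPHA_RL = 0.1   # learning rate (illustrative value)
GAMMA = 0.9      # discount factor (illustrative value)

def q_update(state, action, reward, next_state, actions):
    """One-step Q-learning update:
    Q(S,a) <- Q(S,a) + alpha_RL * [R + gamma * max_a' Q(S',a') - Q(S,a)]."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA_RL * (reward + GAMMA * best_next - Q[(state, action)])
```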

Fig. 1 Overview of a typical Wireless Sensor Network

Loop Computation Algorithm

The algorithm below gives the complete steps, the processes involved, and the overall methodology of ISBO-RL; the nomenclature, variables, and their descriptions are explained in Sect. “Nomenclature”.

figure a: ISBO-RL loop computation algorithm (pseudocode figure, not reproduced here)
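Since the pseudocode figure is not reproduced, the sketch below reconstructs one plausible ISBO-RL loop in Python, combining roulette-wheel selection of a guide bower, attraction toward the elite, Gaussian mutation, and a per-bower Q-value update. The specific move rule, reward definition (fitness change), and parameter values are assumptions, not the authors' exact algorithm; `bowers`, `fitness`, and `sigma` are as produced by the earlier initialization sketch.

```python
import numpy as np

def isbo_rl_loop(bowers, fitness, sigma, n_iter=100, p_mut=0.05,
                 alpha_rl=0.1, gamma=0.9):
    """Assumed reconstruction of the ISBO-RL loop computation."""
    nb = len(bowers)
    q = np.zeros(nb)                                   # one Q-value per bower (simplified)
    elite = bowers[np.argmax([fitness(b) for b in bowers])].copy()
    for _ in range(n_iter):
        fits = np.array([fitness(b) for b in bowers])
        fits = fits - fits.min() + 1e-9                # shift so probabilities are valid
        probs = fits / fits.sum()                      # selection probability (cf. Eq. 2)
        for i in range(nb):
            old_fit = fitness(bowers[i])
            j = np.random.choice(nb, p=probs)          # roulette selection of a guide bower
            target = (bowers[j] + elite) / 2.0
            bowers[i] += np.random.rand() * (target - bowers[i])
            if np.random.rand() < p_mut:               # Gaussian mutation (cf. Eqs. 5-6)
                bowers[i] += np.random.normal(0.0, sigma, size=bowers[i].shape)
            reward = fitness(bowers[i]) - old_fit      # immediate reward (cf. Eq. 4)
            q[i] += alpha_rl * (reward + gamma * q.max() - q[i])  # Q-learning update
        elite = bowers[np.argmax([fitness(b) for b in bowers])].copy()
    return elite
```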

Nomenclature

Some of the terminologies and variables used to discuss the algorithms and techniques are discussed below.

  • In Eq. 2,

\({\text{Prob}}_{i}\) is the estimated probability of selecting bower \(i\),

\(fit_{i}\) is the fitness of the \(i^{th}\) bower element,

\(\sum\limits_{n = 1}^{NB} {fit_{n} }\) is the summation of fitness over all NB bowers.

  • In Eq. 3,

\(fit_{i}\) is the fitness of the \(i^{th}\) bower element,

\(Q\text{-}value(x_{i})\) is the Q-value of the \(i^{th}\) bower element.

  • In Eq. 4,

\(S\) is the current state (current bower element)

\(a\) is the action modification to the bower element

\(R\) is the immediate reward (the fitness value obtained by modifying the element)

\(S^{\prime}\) is the next state (modified bower element).

  • In Eq. 5,

\(x_{ik}^{new}\) is the updated location of the \(i^{th}\) bower’s element

\(x_{ik}^{old}\) is the old location of the \(i^{th}\) bower’s element

\(\sigma^{2}\) is the variance within the mutation procedure

\(N\) is the normal (Gaussian) distribution used in the mutation procedure.

  • In Eq. 6,

\(N\) is the normal (Gaussian) distribution used in the mutation procedure

\(x_{ik}^{old}\) is the old location of the \(i^{th}\) bower’s element

\(\sigma^{2}\) is the variance within the mutation procedure

\(\sigma\) is a proportion of space width.
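The referenced equations are not reproduced in this section; from the nomenclature above, their standard forms would be as follows. This is a reconstruction consistent with the variable definitions, not necessarily the authors' exact notation (the way Eq. 3 combines fitness and Q-value, in particular, is assumed). Eq. 2 is the usual SBO selection probability:

$$Prob_{i} = \frac{fit_{i}}{\sum\nolimits_{n=1}^{NB} fit_{n}}$$

Eq. 3 presumably augments the bower fitness with its learned Q-value, e.g. \(fit^{\prime}_{i} = fit_{i} + Q\text{-}value(x_{i})\) (assumed form). Eq. 4 would be the standard one-step Q-learning update:

$$Q(S,a) \leftarrow Q(S,a) + \alpha_{RL}\left[ R + \gamma \max_{a^{\prime}} Q(S^{\prime},a^{\prime}) - Q(S,a) \right]$$

Eqs. 5 and 6 would describe the Gaussian mutation of a bower element:

$$x_{ik}^{new} \sim N\left( x_{ik}^{old}, \sigma^{2} \right), \quad \sigma = Z \times \left( var_{\max} - var_{\min} \right)$$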

Multi-Objective Optimization Problems

Multi-objective optimization (MOO) involves optimizing multiple conflicting objectives simultaneously. In the context of Wireless Sensor Networks (WSNs), where resource allocation, coverage, connectivity, and energy efficiency are often conflicting objectives, addressing these challenges becomes inherently multi-objective. MOO aims to find a set of solutions, known as the Pareto front, that represents a trade-off between different objectives. Key MOO concepts include Pareto dominance, Pareto-optimal solutions, and the weighted-sum method [17].
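For concreteness, Pareto dominance and front extraction for two maximization objectives (e.g., coverage and connectivity) can be sketched as follows; the numeric example values are illustrative only.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b (all objectives maximized):
    a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the non-dominated solutions (the Pareto front)."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Example with (coverage, connectivity) pairs:
front = pareto_front([(0.9, 0.6), (0.8, 0.8), (0.7, 0.7)])
# -> [(0.9, 0.6), (0.8, 0.8)]; (0.7, 0.7) is dominated by (0.8, 0.8)
```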

ISBO-RL Technique

The Intelligent Satin Bower Bird Optimizer with Reinforcement Learning (ISBO-RL) technique combines the unique capabilities of the Satin Bower Bird algorithm with reinforcement learning to address the complexities of multi-objective optimization in WSNs. It utilizes a hybrid approach for solving the node deployment issue, integrating particle swarm optimization (PSO) with an adapted Iterated Local Search (ILS) technique.

The proposed methodology optimizes several competing objectives at the same time, i.e., it is a multi-objective optimization. In the context of Wireless Sensor Networks (WSNs), tackling these difficulties is intrinsically multi-objective, as resource allocation, coverage, connectivity, and energy efficiency are frequently competing objectives. The methodology balances these goals and seeks a set of solutions, the Pareto front, that represents a trade-off between the different objectives.

Reinforcement learning enables each bower within the population to learn and adapt its behavior based on the feedback received from the environment. This adaptive learning capability allows a bower to dynamically adjust its search strategy, improving convergence speed toward optimal or near-optimal solutions, with decentralized decision making based on its local perception and the feedback received from its actions. The interactive learning process balances exploring new areas against focusing on promising ones, which in turn yields a more robust and scalable solution process that can adapt to complex and dynamic environments.
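The paper does not state its action-selection rule; a common way to realize this exploration/exploitation balance is an ε-greedy policy, sketched below as an assumption.

```python
import random

def epsilon_greedy(q_values, actions, epsilon=0.1):
    """With probability epsilon, pick a random action (explore new areas);
    otherwise pick the action with the highest Q-value (exploit promising areas)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values.get(a, 0.0))
```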

The following steps outline the process:

Initialization: Generation of Early Population

An early population is a group of particles created randomly. The dimension of each particle equals the number of possible locations L in the network, where Pi = {Xi1, Xi2,..., XiL} represents the ith particle of the population, in which each component Xid maps the status of the corresponding location Sd by Eq. 7.

$$X_{i,d} = \left\{ \begin{array}{ll} 1, & \text{if location } S_{d} \text{ is selected} \\ 0, & \text{otherwise} \end{array} \right.$$
(7)
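A sketch of this binary encoding follows; the initial selection probability is an assumed parameter.

```python
import numpy as np

def init_population(num_particles, L, select_prob=0.5):
    """Each particle P_i is a binary vector of length L (Eq. 7):
    X_id = 1 if candidate location S_d is selected, 0 otherwise."""
    return (np.random.rand(num_particles, L) < select_prob).astype(int)

population = init_population(num_particles=20, L=300)
```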

Tuning of Fitness Function: Multi-Objective

Here all the particles are evaluated based on the objective function. The primary goal is to maximize the coverage rate of the network using Eq. 8.

$$f_{1} = \frac{\sum_{i=1}^{N} cov(T_{i})}{L}$$
(8)

where cov(Ti) equals 1 if target Ti is covered and 0 otherwise.

The secondary goal is to minimise the number of selected locations. The fraction of selected locations is given by Eq. 9:

$$f^{\prime}_{2} = \frac{s}{L}$$
(9)

Here, ‘s’ is the number of selected locations and ‘L’ is the number of possible locations. Since \(f^{\prime}_{2}\) is to be minimized, it is converted into a term to be maximized:

$${f}_{2} = 1 - f^{\prime}_{2}$$
(10)

The objectives are then combined into a single weighted multi-objective fitness function:

$$F = \alpha_{1} {f}_{1} + \alpha_{2} {f}_{2}$$
(11)

subject to \(\alpha_{1} + \alpha_{2} = 1\) and \(0 \le \alpha_{i} \le 1\) for \(i = 1, 2\).
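As an illustration, the combined fitness of Eqs. 8-11 could be computed as below. The weight values alpha1 and alpha2 are assumptions, and the normalization of Eq. 8 follows the text as written.

```python
import numpy as np

def multi_objective_fitness(particle, coverage, alpha1=0.7, alpha2=0.3):
    """Weighted-sum fitness combining Eqs. 8-11.
    particle : 0/1 vector over the L candidate locations (Eq. 7)
    coverage : 0/1 values cov(T_i) for the N targets under this placement
    alpha1, alpha2 : assumed weights with alpha1 + alpha2 = 1 (Eq. 11)."""
    L = len(particle)
    f1 = np.sum(coverage) / L            # coverage rate, normalized by L as in Eq. 8
    f2 = 1.0 - np.sum(particle) / L      # f2 = 1 - f'2 with f'2 = s/L (Eqs. 9-10)
    return alpha1 * f1 + alpha2 * f2     # combined objective (Eq. 11)
```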

Adapted ILS

The adapted ILS removes redundancy among the deployed nodes without affecting coverage in any other way. It is utilized to improve the efficiency of the PSO method: redundant nodes are eliminated from the current solution without compromising the coverage.
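A minimal sketch of this redundancy-elimination step is given below, assuming a precomputed 0/1 coverage matrix `covers[d, i]` indicating whether candidate location d covers target i (this representation is an assumption for illustration).

```python
import numpy as np

def remove_redundant(particle, covers):
    """Adapted-ILS-style pruning: drop a selected node whenever the remaining
    nodes still cover every target. particle is a 0/1 vector of length L;
    covers is an (L, N) 0/1 matrix (assumed representation)."""
    particle = particle.copy()
    covered = lambda p: np.all(p @ covers >= 1)   # every target covered at least once
    for d in np.flatnonzero(particle):
        particle[d] = 0
        if not covered(particle):                 # removing d broke coverage: restore it
            particle[d] = 1
    return particle
```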

Upgrade

After the PSO-ILS method terminates, the Gbest particle represents the set of selected locations for deploying sensors [16].

Performance Validation

Experimental Simulation

A platform for experimentation has been created and the ISBO-RL technique is simulated. It is assumed that the fitness of each bowerbird element depends on the state of the element.

The simulation platform uses parameters representing the WSN (network-related parameters) and ISBO-RL (bowerbird distribution parameters); these are given in Tables 1 and 2.

Table 1 Wireless sensor network simulation parameters
Table 2 ISBO-RL simulation parameters

Figure 2 shows the progression of the maximum fitness value over iterations. It helps visualise how the fitness of the best bowerbird element evolves during the simulation.

Fig. 2 Line Plot of Fitness History
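A figure of this kind can be produced with a short matplotlib sketch; `fitness_history`, the list of per-iteration maxima, is an assumed output of the simulation loop.

```python
import matplotlib.pyplot as plt

def plot_fitness_history(fitness_history):
    """Line plot of the best fitness per iteration (cf. Fig. 2)."""
    plt.plot(fitness_history)
    plt.xlabel("Iteration")
    plt.ylabel("Maximum fitness")
    plt.title("Fitness History")
    plt.show()
```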

Figure 3 is a scatter plot illustrating the relationship between two dimensions of the bowerbird elements, where color indicates the fitness value.

Fig. 3 Scatter Plot of Bowerbird Elements

Figure 4 depicts the relationship among three dimensions of the bowerbird elements, with color indicating fitness.

Fig. 4 3D Scatter Plot of Bowerbird Elements

Figure 5 shows the Pareto front plot, where maximizing coverage and maximizing connectivity are the two objectives.

Fig. 5 Pareto front plot for connectivity and coverage

Comparing ISBO-RL with Other Techniques

The proposed ISBO-RL technique is simulated, and the results are investigated under varying numbers of target points and potential positions.

Table 3 and Fig. 6 depict the performance validation of the ISBO-RL technique in terms of selected positions (SP) out of 300. The figure shows that the ISBO-RL technique attains effective outcomes with lower SP across distinct numbers of target points. For instance, with 40 target points, the ISBO-RL technique obtains an SP of 16, whereas the PSO, Differential Evolution (DE), improved GA, and PSO-ILS techniques obtain SPs of 24, 23, 21, and 17 respectively. With 60 target points, the ISBO-RL approach attains an SP of 17, whereas the PSO, DE, improved GA, and PSO-ILS techniques reach SPs of 26, 25, 22, and 19 respectively. With 80 target points, the ISBO-RL technique obtains an SP of 17, whereas the PSO, DE, improved GA, and PSO-ILS techniques achieve SPs of 28, 26, 24, and 21 respectively.

Table 3 Result analysis of ISBO-RL model in terms of SP out of 300
Fig. 6 SP Analysis of ISBO-RL model out of 300

Table 4 and Fig. 7 showcase the performance validation of the ISBO-RL technique in terms of the number of placed nodes (NPN) for 75 targets. The figure illustrates that the ISBO-RL technique achieves effective outcomes with lower NPN across various numbers of potential positions. For instance, with 150 potential positions, the ISBO-RL algorithm obtains an NPN of 19, whereas the PSO, DE, improved GA, and PSO-ILS approaches obtain NPNs of 25, 25, 22, and 20 respectively. With 250 potential positions, the ISBO-RL technique achieves an NPN of 19, whereas the PSO, DE, improved GA, and PSO-ILS methods reach NPNs of 26, 26, 23, and 20 respectively.

Table 4 NPN Analysis of ISBO-RL model for 75 targets
Fig. 7 NPN Analysis of ISBO-RL model for 75 targets

Furthermore, with 350 potential positions, the ISBO-RL algorithm obtains an NPN of 19, whereas the PSO, DE, improved GA, and PSO-ILS techniques obtain NPNs of 30, 28, 24, and 20 respectively.

Table 5 and Fig. 8 showcase the performance validation of the ISBO-RL technique with respect to the number of deployed sensor nodes (NDN). The figure shows that the ISBO-RL technique obtains effective outcomes with lower NDN across distinct numbers of target points. For instance, with (1, 1) values, the ISBO-RL technique obtains an NDN of 17, whereas the PSO, DE, improved GA, and PSO-ILS techniques attain NDNs of 23, 23, 21, and 19 respectively. With (2, 1) values, the ISBO-RL scheme obtains an NDN of 25, whereas the PSO, DE, improved GA, and PSO-ILS techniques obtain NDNs of 38, 39, 35, and 31 respectively. Eventually, with (2, 2) values, the ISBO-RL technique obtains an NDN of 29, whereas the PSO, DE, improved GA, and PSO-ILS systems reach NDNs of 44, 44, 38, and 36 respectively.

Table 5 NDN Analysis of ISBO-RL model with existing techniques
Fig. 8 NDN Analysis of ISBO-RL model with existing approaches

Table 6 and Fig. 9 portray the performance validation of the ISBO-RL method in terms of the number of relay nodes required (NRNR). The figure shows that the ISBO-RL algorithm attains effective outcomes with the least NRNR under different numbers of positions. For instance, with 150 positions, the ISBO-RL technique achieves an NRNR of 7, whereas the PSO, GA, and PSO-ILS techniques obtain NRNRs of 12, 11, and 9 respectively. With 250 positions, the ISBO-RL approach obtains an NRNR of 8, whereas the PSO, GA, and PSO-ILS systems reach NRNRs of 13, 12, and 9 respectively. Finally, with 300 positions, the ISBO-RL technique obtains an NRNR of 8, whereas the PSO, GA, and PSO-ILS methods obtain NRNRs of 15, 13, and 9 respectively.

Table 6 Result analysis of ISBO-RL model with respect to NRNR
Fig. 9 NRNR Analysis of ISBO-RL model with count of positions

Summary

In this research, we present a novel approach called ISBO-RL, which leverages reinforcement learning to enhance node deployment in Wireless Sensor Networks (WSNs). The primary goal of ISBO-RL is to strategically position sensor nodes to achieve optimal coverage and connectivity within the network. To this end, an Optimal Placement Framework (OPF) is developed, enabling the accurate placement of candidate sensors.

Through a comprehensive series of simulation analyses, the effectiveness of the ISBO-RL technique is rigorously validated. The method demonstrates significant improvements over existing strategies across a range of evaluation metrics and key parameters. It particularly excels in minimizing the number of relay nodes required to guarantee complete connectivity among all nodes within the network.

Looking ahead, there is a potential to expand the proposed approach by incorporating node localization methods. These methods can further refine the placement of sensors in WSNs, contributing to an even more optimized and efficient network deployment. Hence, this research contributes to advancing the field of WSNs by introducing a robust and intelligent approach that addresses critical challenges in node placement and network connectivity.

In summary, the research introduces ISBO-RL, a novel approach that utilizes Reinforcement Learning to optimize node deployment in Wireless Sensor Networks (WSNs), achieving improved coverage, connectivity, and efficiency compared to existing strategies, and with the potential for further enhancements through the integration of node localization methods.