1 Introduction

The rapid growth of Internet of Things (IoT) applications has created a need for highly energy-efficient communication standards. Owing to the widespread adoption of Bluetooth-enabled devices and its low power consumption, Bluetooth Low Energy (BLE) has in recent years been applied to various IoT applications, such as lost-and-found tags, indoor localization, contact detection, and push promotions in shopping malls [1,2,3,4,5]. However, most BLE devices must run on a small coin-cell battery for long periods, so further reducing their power consumption is very important. Furthermore, the limited computational resources of IoT devices make it difficult to apply algorithms with high computational and memory requirements, such as deep reinforcement learning; the algorithms these devices run must be lightweight.

Wireless communication in BLE can be divided into two phases: advertising and Generic ATTribute (GATT) communication [6]. In advertising, devices called peripherals, such as sensor devices and beacons, broadcast packets so that a central device, such as a smartphone or PC, can discover their presence. In GATT communication, the central and peripherals exchange data after a connection has been established. Even when GATT communication is the goal, advertising is required to initiate the connection. During advertising, the peripheral periodically broadcasts advertising packets to be discovered by the central. In standard BLE operation, the same packet is sent on all three advertising channels so that it arrives reliably even in the presence of interference from other systems. While this increases reliability, every copy that is not received wastes transmit energy.

In Ref. [7], power consumption was measured on Nordic's nRF52, showing that the current consumption during advertising is much higher than during standby sleep. In addition, [8] shows that power consumption can potentially be reduced by not using channels on which interference occurs. Therefore, efficiently estimating the channel state and choosing which channels to use is important for further reducing power consumption without compromising reliability.

Several studies have addressed the performance of the BLE Neighbor Discovery Process (NDP) [9]. These studies can be broadly categorized into those that propose analytical models of advertising performance and optimize its parameters, those that propose backoff strategies, and those that propose firmware to minimize power consumption. However, few of them consider interference from other wireless systems, and even fewer consider adaptability to changes in the environment.

There are ongoing efforts to use artificial intelligence to improve the efficiency of wireless communications [10]. However, computationally heavy methods such as neural networks are unsuitable for applications in which minimizing complexity is important. In our previous works, we introduced a parameter selection technique for low-power wide-area networks (LPWANs) [11,12,13,14,15]. This method adapts to the communication environment and aims to maximize the frame success rate (FSR). It does so by formulating parameter selection in LPWANs as a Multi-Armed Bandit (MAB) problem, a classic model of decision-making under uncertainty. The goal of the MAB problem is to maximize the total reward obtained from multiple slot machines with unknown reward probabilities within a limited number of plays [16]. It is a decision-making problem that balances "exploration" to find the best machine against "exploitation" of the information already gathered to play the machine that appears best. Algorithms for solving the MAB problem are lightweight, making them suitable for IoT devices with limited computational resources that must make selections autonomously and in a decentralized manner.

In this paper, we design a reward model for the MAB problem that takes power consumption into account, in order to handle the trade-off between power consumption and reliability. The proposed method is evaluated by simulation under various communication environments. Furthermore, we implement the proposed method on actual devices and conduct experiments in an environment that mimics real communication conditions.

The rest of this paper is organized as follows. Section 2 describes BLE advertising. Section 3 reviews related work. Section 4 presents an overview of channel masking and optimal parameter selection. Section 5 proposes and designs a channel and advertising interval selection method. Section 6 presents simulation results under various communication environments. Section 7 describes the implementation and experimental results on actual BLE devices. Section 8 concludes the paper.

2 Description of BLE advertising

BLE wireless communication operates in the 2.4 GHz band, which is divided into 40 channels of 2 MHz each. The advertising channels CH 37, CH 38, and CH 39 are used for advertising, while the data channels CH 0–36 are used for GATT communication. To mitigate interference from Wi-Fi and other devices that share the frequency band with BLE, GATT communication employs frequency hopping, while advertising sends an identical packet on three different channels. This section details advertising, which is used both to discover and connect to neighboring devices and to broadcast periodic packets. Figure 1 illustrates the BLE connection establishment procedure.

During an advertising event, a peripheral broadcasts an advertising packet (ADV_PDU) on the three advertising channels. Upon receiving an ADV_PDU, the central can request additional information or initiate a connection, depending on the advertising packet type. To request additional information, the central sends a scan request (SCAN_REQ), and the peripheral that receives it replies with a scan response (SCAN_RSP) containing the additional information. To establish a connection, the central sends a connection request (CONNECT_REQ) and initiates one-to-one communication on the data channels.

Fig. 1 BLE connection establishment procedure

Fig. 2 Detailed peripheral and central behavior

Figure 2 shows the detailed behavior of the peripheral and the central. For each advertising event, the time between the starts of two consecutive advertising events is given by

$$T_{\textrm{advEvent}} = \textrm{advInterval} + \textrm{advDelay}. \tag{1}$$

The advertising interval (\(\textrm{advInterval}\)) can be set to an integer multiple of 0.625 ms in the range 20 ms to 10.24 s. The advertising delay (\(\textrm{advDelay}\)) is a random value in the range 0 ms to 10 ms, drawn anew for each advertising event.
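A minimal Python sketch of Eq. (1) follows. It assumes advDelay is drawn uniformly from [0, 10] ms (the text only requires a random value in that range); the function name `adv_event_starts` is hypothetical and not part of any BLE stack API.

```python
import random

ADV_UNIT_MS = 0.625  # advInterval must be an integer multiple of this

def adv_event_starts(adv_interval_ms, n_events, rng=None):
    """Start times (ms) of consecutive advertising events per Eq. (1).

    T_advEvent = advInterval + advDelay, with advDelay drawn uniformly
    from [0, 10] ms for every event (the uniform draw is an assumption).
    """
    units = adv_interval_ms / ADV_UNIT_MS
    if not 20.0 <= adv_interval_ms <= 10240.0 or abs(units - round(units)) > 1e-9:
        raise ValueError("advInterval must be a multiple of 0.625 ms in [20 ms, 10.24 s]")
    rng = rng or random.Random()
    t, starts = 0.0, []
    for _ in range(n_events):
        starts.append(t)
        t += adv_interval_ms + rng.uniform(0.0, 10.0)  # one T_advEvent
    return starts
```

Because advDelay is redrawn per event, consecutive start times are spaced between advInterval and advInterval + 10 ms rather than exactly advInterval apart.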

The central scans the three advertising channels sequentially. Both the scan window, during which the central listens on one channel, and the scan interval, the period at which scanning moves on to the next channel, can be set to at most 10.24 s. For the central to receive an ADV_PDU, it must be scanning on the same channel at the moment the packet is transmitted. Therefore, the advertising and scan intervals are usually chosen to be relatively prime to each other, so that the two schedules do not lock into a synchronized pattern in which the central repeatedly misses the advertisements.
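To illustrate why the two intervals are chosen to be relatively prime, the hypothetical helper below counts how many distinct offsets within the scan cycle successive advertising events visit when advDelay is ignored. It assumes both periods are expressed in 0.625 ms units and treats the scan cycle as a single period `scan_cycle_ms`; when the two share a common factor, events keep revisiting the same few offsets.

```python
from math import gcd

UNIT_MS = 0.625  # express both periods in 0.625 ms slots so they are integers

def distinct_phases(adv_interval_ms, scan_cycle_ms):
    """How many distinct offsets inside the scan cycle successive advertising
    events land on, ignoring advDelay. A result of 1 means every event hits
    the same offset, so a miss can repeat indefinitely."""
    a = round(adv_interval_ms / UNIT_MS)
    s = round(scan_cycle_ms / UNIT_MS)
    # Offsets k*a mod s form a subgroup of Z_s with s / gcd(a, s) elements
    return s // gcd(a, s)
```

For example, a 100 ms advertising interval against a 100 ms scan cycle always lands on a single offset, whereas against a 100.625 ms cycle (161 slots, coprime with 160) it visits all 161 offsets.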

3 Related works

There have been many studies on BLE parameter selection and low power consumption, which can be broadly classified into three categories.

The first category comprises performance analysis models of the BLE NDP that are used to derive optimal parameters [5]. Many of these are based on the Chinese Remainder Theorem (CRT), and NDP models that consider average latency and power consumption have been proposed, suggesting a relationship between the scan interval and the advertising interval [9, 17,18,19]. Analytical models based on probabilistic approaches have also been proposed; for instance, [20] introduces an analytical model linking the advertising interval, scan interval, discovery latency, and number of advertisers, and proposes a method for selecting the optimal advertising interval. [21] focused on the fact that BLE communicates on multiple channels, calculated the latency expected for given parameters, and proposed guidelines for parameter settings.

The second category explores backoff strategies. [22] minimized collisions and improved latency and power consumption by adjusting the backoff window. [23] proposes firmware that initiates scans efficiently by inserting additional information into the advertising packet, improving the success rate and latency.

The third category focuses on minimizing power consumption by adjusting the advertising interval. [24] proposes beacon firmware that maximizes battery life by detecting the presence of a user and adjusting the advertising interval accordingly. This method is mainly designed for environments with few BLE devices and no sources of interference. [25] proposes a packet scheduling method that minimizes power consumption by dynamically adjusting the advertising interval.

Other studies have also been conducted. [26] found that scanning on actual devices does not behave ideally and compared performance using up to 14 devices. [27] proposes a method that dynamically adjusts the scan interval and scan window based on network conditions to achieve optimal latency. [28] proposes a method that optimizes the sensing interval, and thereby reduces power consumption, by learning sensor values in BLE applications that involve sensing. [7] measured the power consumption of Nordic's nRF52 and showed that the current consumption during advertising is much higher than during standby sleep. For example, at a transmit power of 0 dBm, the average power consumption during transmission is 33.2 mW, whereas that during sleep is 0.0058 mW, a ratio of roughly 5700, indicating that a much longer battery life can be achieved by using a longer advertising interval.

Several studies on the power consumption of BLE devices have been presented, but they all focus on advertising interval optimization and scheduling adjustment, and few consider interference. Furthermore, most BLE research relies on simulation and theoretical analysis, and few studies have demonstrated practicality on actual devices; among the studies above, only [7, 21, 24,25,26, 28] include evaluations on actual equipment.

In contrast to existing works, our proposed method incorporates channel masking in addition to advertising interval selection. Channel masking enables power optimization in both the time and frequency domains, resulting in greater power savings. Moreover, we implement the proposed method on actual devices and demonstrate the power reduction experimentally as well as by simulation.

4 Optimal parameter selection

4.1 Reducing power consumption by channel masking

Previous research has achieved low power consumption by adjusting the advertising interval. To reduce power consumption further, we introduce channel masking, in which only a subset of the three standard advertising channels is used. To assess the actual power consumption with channel masking, we measured it on the Taiyo Yuden EBSHSNZWZ module with Nordic Semiconductor's BLE SoC nRF52832. Figure 3 shows the current consumption waveforms for transmission on one channel and on three channels.

Fig. 3 Current waveform

In the waveforms, the current rises briefly when the MPU (Micro Processing Unit) wakes up; then, for each transmission channel, a rise lasting about \(500\,\upmu \hbox {s}\) corresponds to the transmission of the ADV_PDU, followed by a short increase while listening for a SCAN_REQ. The charge consumed per advertising event was \(22.99\,\upmu \hbox {C}\) with three channels, \(16.72\,\upmu \hbox {C}\) with two, and \(10.3475\,\upmu \hbox {C}\) with one. Thus, changing from the default of three channels to two channels or to a single channel reduces the charge per event to about 73% and 45% of the three-channel value, respectively, i.e., savings of roughly 27% and 55%.
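The percentages above follow directly from the measured charges. The sketch below also converts charge per event into an average current for a given advertising interval; it is a rough model that assumes sleep current is negligible and that advDelay averages 5 ms, and `mean_current_ua` is an illustrative helper, not a measured quantity.

```python
# Measured charge per advertising event (uC) from Sect. 4.1, keyed by channel count
CHARGE_PER_EVENT_UC = {3: 22.99, 2: 16.72, 1: 10.3475}

def relative_charge(n_channels):
    """Charge per event relative to the default three-channel setting."""
    return CHARGE_PER_EVENT_UC[n_channels] / CHARGE_PER_EVENT_UC[3]

def mean_current_ua(n_channels, adv_interval_ms, mean_adv_delay_ms=5.0):
    """Rough average advertising current (uA) = charge per event / mean event period.

    Ignores sleep current; the 5 ms mean advDelay assumes a uniform 0-10 ms draw.
    """
    period_s = (adv_interval_ms + mean_adv_delay_ms) / 1000.0
    return CHARGE_PER_EVENT_UC[n_channels] / period_s  # uC per s == uA
```

For instance, `mean_current_ua(1, 1995.0)` gives about 5.17 µA versus about 22.99 µA for `mean_current_ua(3, 995.0)`, showing how channel masking and a longer interval compound.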

4.2 Effects of parameter selection

We propose a scheme for reducing BLE power consumption that optimizes the advertising interval and the advertising channels, both of which strongly affect reliability and power consumption. There is a trade-off between the two: if reliability is overemphasized, sending many packets increases power consumption and shortens the lifetime of battery-powered devices. Therefore, it is important to reduce power consumption while still meeting the reliability requirements of the target application.

BLE uses the 2.4 GHz ISM band, which it shares with other systems such as Wi-Fi and ZigBee. Consequently, it is common for only one of the three advertising channels to be congested, making communication on that channel difficult. The proposed method therefore improves the trade-off between power consumption and reliability by transmitting on a reduced set of channels instead of always using the three standard channels (CH 37, 38, 39). By also adjusting the advertising interval, we expect the behavior illustrated in Fig. 4:

  • When interference is weak:

    Reduce the number of channels to transmit and increase the advertising interval to reduce power consumption and extend battery life.

  • When there is an interference source on a particular channel:

    Avoid transmitting on collision-prone channels to reduce power consumption.

  • When all channels have heavy interference and communication is difficult:

    Use three channels and reduce the advertising interval to meet the application requirements, even if the power consumption is higher.

Fig. 4 Examples of expected behavior for various interference

5 Proposed method

5.1 Decision making algorithm for BLE advertising

To select the channels and advertising interval optimally, we introduce a decision-making algorithm based on the MAB problem. As shown in Fig. 5, each combination of channels and advertising interval is mapped to a slot machine (arm) in the MAB, and the reception of a SCAN_REQ packet serves as the reward. Because our objective is to achieve low power consumption while maintaining reliability, we also take into account the power consumption of each choice, assigning each choice a reward that reflects the power consumed by its transmissions.

With three advertising channels, there are \(2^3 - 1 = 7\) channel patterns: {37}, {38}, {39}, {37, 38}, {38, 39}, {37, 39}, and {37, 38, 39}. With n candidate advertising intervals, there are therefore 7n parameter sets to choose from.
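The arm set can be enumerated mechanically. The Python sketch below builds the 7n (channel pattern, advertising interval) pairs; the function names are hypothetical, and the candidate interval values are left to the caller because they are a design parameter.

```python
from itertools import combinations, product

ADV_CHANNELS = (37, 38, 39)

def channel_patterns(channels=ADV_CHANNELS):
    """All non-empty subsets of the advertising channels (2^3 - 1 = 7)."""
    return [c for r in range(1, len(channels) + 1) for c in combinations(channels, r)]

def build_arms(adv_intervals_ms):
    """One MAB arm per (channel pattern, advertising interval) pair: 7n arms."""
    return list(product(channel_patterns(), adv_intervals_ms))
```

With three candidate intervals, for example, `build_arms` returns 21 arms.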

Fig. 5 Comparison of the MAB and channel and advertising interval selection problem

5.2 Reward design in proposed decision making algorithm

In this subsection, we design two reward models that take into account the power consumption of the selected choice. The reward probability \(p_k\) is defined as the total reward estimate \(Q_k\) divided by the number of trials \(n_k\). We also introduce a parameter, the learning interval \(T_w\): the proposed algorithm keeps the same choice throughout each learning interval \(T_w\).

  1. Reward model 1

    In model 1, on each successful reception the inverse of the power-consumption ratio \(E_k\) of the selected choice is added to the total reward \(Q_k\). The reward is updated as follows:

    $$Q_k(t) = Q_k(t-1) + \Delta Q_k(t), \tag{2}$$
    $$\Delta Q_k(t) = \begin{cases} \frac{1}{E_k} & \text{if SCAN\_REQ received}, \\ 0 & \text{otherwise}. \end{cases} \tag{3}$$

  2. Reward model 2

    In model 2, the reward takes values of 0 or 1, and the power-consumption ratio \(E_k\) of the selected choice is added to the number of trials \(n_k\):

    $$n_k(t) = n_k(t-1) + \Delta n_k(t), \tag{4}$$
    $$\Delta n_k(t) = E_k, \tag{5}$$
    $$\Delta Q_k(t) = \begin{cases} 1 & \text{if SCAN\_REQ received}, \\ 0 & \text{otherwise}. \end{cases} \tag{6}$$
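Both update rules can be written in a few lines. In this sketch, `E_k` is treated as a positive, dimensionless power-consumption ratio for arm k, and, as an assumption not stated in the text, model 1 increments \(n_k\) by one per trial; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ArmStats:
    q: float = 0.0  # total reward estimate Q_k
    n: float = 0.0  # number of trials n_k (energy-weighted in model 2)

    @property
    def p(self):
        """Reward probability p_k = Q_k / n_k; 0 before the first trial."""
        return self.q / self.n if self.n > 0 else 0.0

def update_model1(arm, received, e_k):
    """Model 1 (Eqs. 2-3): add 1/E_k to Q_k on success.
    Assumption: n_k grows by one per trial (not stated in the text)."""
    arm.q += 1.0 / e_k if received else 0.0
    arm.n += 1.0

def update_model2(arm, received, e_k):
    """Model 2 (Eqs. 4-6): binary reward, n_k grows by E_k per trial."""
    arm.q += 1.0 if received else 0.0
    arm.n += e_k
```

Under these assumptions, and with \(E_k\) fixed per arm, both models yield the same estimate \(p_k = (\text{successes}/\text{trials})/E_k\); they differ only in \(n_k\), which matters for algorithms whose exploration term depends on \(n_k\), such as UCB1-tuned. This is consistent with the identical \(\epsilon\)-greedy results for the two models reported in Sect. 6.2.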

The proposed method employs MAB algorithms for decision making. In this paper, we use the following two MAB algorithms.

  • \(\epsilon\)-greedy [29]

    \(\epsilon\)-greedy is a simple algorithm in which the parameter \(\epsilon\) controls the trade-off between exploration and exploitation. It selects the arm with the highest reward probability with probability \(1-\epsilon\) and selects an arm at random with probability \(\epsilon\).

    $$p_k(t) = \frac{Q_k(t)}{n_k(t)}, \tag{7}$$
    $$k^* = \begin{cases} \mathop{\mathrm{arg\,max}}_k p_k(t) & \text{with probability } 1-\epsilon, \\ \text{random} & \text{with probability } \epsilon. \end{cases} \tag{8}$$
  • UCB1-tuned [30]

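    The upper confidence bound (UCB) 1-tuned algorithm (UCB1-tuned, abbreviated UCB1t below) adds to each arm's reward probability a confidence bound defined by the number of trials on that arm and the variance of its reward. At each decision \(t\), the arm \(k^*\) is selected by the following equations.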

    $$k^* = \mathop{\mathrm{arg\,max}}_k \left( p_k(t) + \sqrt{\frac{\log N(t)}{n_k(t)} \min\left(\frac{1}{4}, V_k(t)\right)} \right), \tag{9}$$
    $$V_k(t) = \sigma^2_k(t) + \sqrt{\frac{2\log N(t)}{n_k(t)}}, \tag{10}$$

    where \(\sigma^2_k(t)\) is the variance of the rewards obtained from arm \(k\) and \(N(t)\) is the total number of trials over all arms up to decision \(t\).
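The two selection rules in Eqs. (8) to (10) can be sketched as follows. The caller supplies the per-arm statistics; as assumptions, untried arms (\(n_k = 0\)) are played first, a case Eq. (9) leaves undefined, and \(N(t)\) is passed in as `total_trials`. Function names are hypothetical.

```python
import math
import random

def select_epsilon_greedy(p, epsilon, rng):
    """Eq. (8): exploit argmax_k p_k with prob. 1 - epsilon, else explore uniformly."""
    if rng.random() < epsilon:
        return rng.randrange(len(p))
    return max(range(len(p)), key=lambda k: p[k])

def select_ucb1_tuned(p, n, var, total_trials):
    """Eqs. (9)-(10). p, n, var hold p_k(t), n_k(t) and sigma_k^2(t) per arm;
    total_trials plays the role of N(t) and must be >= 1."""
    for k, n_k in enumerate(n):
        if n_k == 0:
            return k  # assumption: try every arm once before using the bound
    log_n = math.log(total_trials)

    def index(k):
        v_k = var[k] + math.sqrt(2.0 * log_n / n[k])            # Eq. (10)
        return p[k] + math.sqrt(log_n / n[k] * min(0.25, v_k))  # Eq. (9)

    return max(range(len(p)), key=index)
```

When all arms have equal trial counts and variances, the bonus term is identical for every arm, so UCB1-tuned reduces to picking the arm with the highest \(p_k\).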

6 Simulation evaluation

In this section, we conducted simulations with one peripheral and one central, taking into account interference from other systems. The simulations were implemented in Python, and the distances between devices were intentionally set to induce mutual interference. Table 1 shows the simulation parameters. It is recommended that the scan and advertising intervals be relatively prime to prevent consecutive advertising failures. As a benchmark, we used standard BLE, which always uses all three channels and a fixed advertising interval.

Table 1 Simulation Parameters

6.1 Varying intensity of interference

In this subsection, we examine whether the proposed method achieves the parameter selection behavior described in Sect. 4.2 by varying the intensity of interference. The intensity of interference is expressed as the Interference Duty Ratio; for example, a ratio of 0.2 means a 20% chance of failure even when the central is scanning the same channel at the moment of advertising. The Interference Duty Ratio is set independently for each channel, reflecting the independence of actual channels; that is, we assume that an interferer such as Wi-Fi does not span two advertising channels.
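To show how independent per-channel duty ratios combine for a given channel pattern, the sketch below uses a deliberately simplified, hypothetical reception model that is not the paper's simulator: the central listens on one advertising channel chosen uniformly at random, and the event succeeds if that channel is in the transmitted set and the packet escapes interference on it.

```python
ADV_CHANNELS = (37, 38, 39)

def event_success_prob(channel_set, duty):
    """Hypothetical per-event success probability.

    The central is assumed to listen on one advertising channel chosen
    uniformly at random; the event succeeds if that channel is in
    channel_set and interference (probability duty[c]) does not hit it.
    """
    return sum(1.0 - duty[c] for c in channel_set) / len(ADV_CHANNELS)
```

With duty ratios {0.8, 0.2, 0.2}, transmitting only on CH 38 succeeds with probability 0.8/3 ≈ 0.27 per event under this model, versus 0.6 for all three channels; the reward models weigh such differences against the energy cost of each pattern.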

Fig. 6 Time series of selected arm

We first examine which arms are selected when interference is biased toward specific channels. Figure 6 shows example time series of the selected arms for UCB1-tuned model 1 and \(\epsilon\)-greedy model 1, each with its optimal learning interval, in a biased-interference scenario with an Interference Duty Ratio of {0.8, 0.2, 0.2}. Most selections use the longest advertising interval on a single channel, avoiding the heavily interfered CH 37. Owing to the design of each algorithm, UCB1t tends to alternate among arms with the same interference status, whereas \(\epsilon\)-greedy tends to select the same arm repeatedly.

Next, we observed the advertising success ratio (ASR) and power consumption while gradually changing the Interference Duty Ratio. The ASR is the percentage of advertising events successfully received within the last 60 s, evaluated at each advertising event. We assumed two interference scenarios: one in which all three channels are equally interfered, and one in which the interference is biased toward particular channels.
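The ASR definition can be made concrete as a sliding-window ratio. In this sketch, the window is taken as the half-open interval (t − 60 s, t] ending at each advertising event, which is our assumption about the boundary; `asr_series` is a hypothetical helper and the event times must be sorted.

```python
from bisect import bisect_right
from itertools import accumulate

def asr_series(event_times_s, received, window_s=60.0):
    """ASR at each advertising event: share of events in (t - window, t] received."""
    cum = [0, *accumulate(int(r) for r in received)]  # prefix sums of successes
    out = []
    for i, t in enumerate(event_times_s):
        j = bisect_right(event_times_s, t - window_s)  # first event inside the window
        out.append((cum[i + 1] - cum[j]) / (i + 1 - j))
    return out
```

Prefix sums plus binary search keep each evaluation logarithmic in the number of events, which matters for multi-hour simulation traces.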

Fig. 7 The relationship between the Interference Duty Ratio, advertising success ratio (ASR), and power consumption

Figure 7 shows the relationship among the Interference Duty Ratio, ASR, and power consumption. For simplicity, only learning intervals of 0 s and 30 s are shown. Because the conventional method does not adapt its parameters, its ASR decreases as the Interference Duty Ratio increases while its power consumption stays constant. The proposed method consumes significantly less power than the conventional method at low Interference Duty Ratios, and its power consumption increases as the Interference Duty Ratio increases, which is consistent with the expected behavior. Figure 7a,c shows that the ASR of the proposed method decreases more slowly than that of the conventional method as the Interference Duty Ratio increases, particularly under biased interference. Performance and power consumption also depend on the learning interval, as discussed in Sect. 6.2.

6.2 Learning interval of proposed method

In this subsection, we discuss the impact of the learning interval, a parameter of the proposed method. We assumed two interference scenarios, one with an equal load on all three channels and one with a biased load, with Interference Duty Ratios of {0.2, 0.2, 0.2} and {0.2, 0.2, 0.8}, respectively.

Fig. 8 The relationship between the learning interval, mean latency, and power consumption

Figure 8 shows the relationship among the learning interval, mean latency, and power consumption. The mean latency is defined as the average interval between consecutive received packets; shorter mean latency and lower power consumption are preferable. \(\epsilon\)-greedy shows exactly the same performance with reward models 1 and 2. Power consumption is lowest for UCB1t model 2, followed by \(\epsilon\)-greedy and then UCB1t model 1, whereas mean latency follows the opposite order.
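Under this definition, mean latency is the mean of the gaps between successive received packets, as in the short hypothetical helper below (reception times in seconds, sorted).

```python
def mean_latency_s(rx_times_s):
    """Average interval between consecutive successfully received packets."""
    if len(rx_times_s) < 2:
        raise ValueError("need at least two received packets")
    gaps = [b - a for a, b in zip(rx_times_s, rx_times_s[1:])]
    return sum(gaps) / len(gaps)
```

Note that the mean of the gaps telescopes to (last − first)/(count − 1), so only the span and the number of receptions matter.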

The learning interval governs the trade-off between mean latency and power consumption. In all interference scenarios, the mean latency is smallest and the power consumption largest when the learning interval is around 8 s. As the learning interval increases up to about 31 s, the mean latency tends to increase and the power consumption to decrease. Because performance depends on the learning interval, the best operating point can be obtained by choosing the learning interval according to the requirements of the BLE application, as discussed in Sect. 6.3.

6.3 Performance of the proposed scheme for the trade-off between latency and power consumption

In this subsection, we evaluate the best achievable performance of the proposed method. As shown in the previous subsection, performance varies with the learning interval because of the trade-off between mean latency and power consumption. For many BLE applications, what matters is receiving the advertising message within a certain time. The best parameter setting of the proposed scheme therefore depends on the performance requirements of the application. For example, a lost-and-found tag detects a forgotten item when the tag and the smartphone have been unable to communicate for a certain time (e.g., 30 s).

We consider three interference scenarios: equal interference on all three channels, biased interference on a specific channel, and dynamic interference. For the two static scenarios, the Interference Duty Ratio was set to {0.2, 0.2, 0.2} and {0.2, 0.2, 0.8}, respectively. For dynamic interference, it was set to {0.2, 0.8, 0.8} for the first hour, {0.8, 0.2, 0.8} for the second hour, and {0.8, 0.8, 0.2} for the last hour. Figure 9 shows the energy consumption for each required mean latency: dots indicate the minimum achievable mean latency, and lines show the power consumption achievable by choosing a learning interval that satisfies the required mean latency.
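Selecting the learning interval for a given requirement amounts to a constrained minimum over the simulated operating points. The sketch assumes each candidate is a tuple (\(T_w\), mean latency in s, charge in mC) produced by simulation; the function is hypothetical and returns `None` when no learning interval meets the requirement.

```python
def best_learning_interval(candidates, required_latency_s):
    """Lowest-charge candidate (T_w, mean_latency_s, charge_mC) whose mean
    latency meets the requirement, or None if none is feasible."""
    feasible = [c for c in candidates if c[1] <= required_latency_s]
    return min(feasible, key=lambda c: c[2], default=None)
```

Because longer learning intervals trade latency for energy, a looser requirement admits longer intervals and therefore lower charge.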

Fig. 9 The amount of power consumption during the learning interval that satisfies required mean latency

As examples, we consider two applications that require a packet to arrive every 10 s and every 20 s, respectively. In an environment with equally weak interference, the conventional method consumes 48 mC for both applications, whereas the proposed method reduces the consumption to about 36 mC and 29 mC, respectively. Similarly, with biased interference, consumption is reduced to about 39 mC and 29 mC, respectively. With dynamic interference, the conventional method consumes about 60 mC and 48 mC, respectively, whereas the proposed method consumes about 59 mC and 30 mC. The proposed method is thus particularly useful for applications that tolerate long packet arrival intervals, and power consumption can be reduced by up to 40% by setting the learning interval appropriately.
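The percentages can be checked from the values quoted above; the short script below recomputes each reduction relative to the conventional method. The largest reductions, about 39.6%, occur in the 20 s cases where consumption falls from 48 mC to 29 mC, which is the basis of the "up to 40%" figure.

```python
# Charge (mC) quoted in the text: (conventional, proposed) per scenario and requirement
QUOTED_MC = {
    ("equal", "10 s"): (48, 36), ("equal", "20 s"): (48, 29),
    ("biased", "10 s"): (48, 39), ("biased", "20 s"): (48, 29),
    ("dynamic", "10 s"): (60, 59), ("dynamic", "20 s"): (48, 30),
}

def reduction_pct(conventional, proposed):
    """Relative saving of the proposed method, in percent."""
    return 100.0 * (conventional - proposed) / conventional

for (scenario, requirement), (conv, prop) in QUOTED_MC.items():
    print(f"{scenario:8s} {requirement}: {reduction_pct(conv, prop):5.1f}% saved")
```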

Fig. 10 The amount of power consumption and ASR at the learning interval that satisfies required mean latency

Figure 10 shows the power consumption and ASR under the low-power settings. As shown in Fig. 10 (a) and (b), all algorithms achieve a 40% reduction in power consumption while keeping the ASR above 99% in both static interference scenarios. When the interference changes dynamically, however, UCB1-tuned and \(\epsilon\)-greedy maintain ASRs of 97% and 94%, respectively (Fig. 10 (c)), slightly lower than in the static scenarios. This degradation is attributed to packet loss incurred while the algorithm relearns the optimal choice after the environment changes. Because UCB1-tuned uses variance and confidence bounds, it explores more options than \(\epsilon\)-greedy, as shown in Fig. 6, suggesting that it adapts faster when the communication environment changes. Although these ASR degradations are acceptable for practical use, improving the ASR under dynamic interference remains future work.

7 Implementation and experimental evaluation

7.1 Implementation on an actual BLE device

We implemented the proposed channel and advertising interval selection algorithm on the Taiyo Yuden EBSHSNZWZ module with Nordic Semiconductor's BLE SoC nRF52832. The peripheral (EBSHSNZWZ) was powered by a Rohde & Schwarz NGM 202, a stabilized power supply with current measurement capability. Power consumption values were calculated by subtracting the idle-state portion in order to isolate the wireless communication portion. On the central side, a Taiyo Yuden EBSKDNZWB with an nRF52840 collected the received advertising packets and logged them on an external laptop. A USRP N210 served as the interference source. The experimental environment, shown in Fig. 11, was set up inside an electromagnetic anechoic box.

Fig. 11 Experimental environment

Table 2 Experimental parameters

The experiments assumed interference from systems using the same 2.4 GHz band as BLE. The experimental parameters are listed in Table 2. CH 37, CH 38, and CH 39, used for BLE advertising, have center frequencies of 2402 MHz, 2426 MHz, and 2480 MHz, respectively. In the experiment with interference, a jamming signal was transmitted on CH 39: white Gaussian noise at 10 dB with a bandwidth of 5 MHz, centered on the channel's center frequency.
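The center frequencies above follow the standard BLE channel-index mapping, in which the advertising channels sit at the two band edges and in a gap between data channels. A sketch of that mapping, written from the Core Specification's channel table (the function name is hypothetical):

```python
def ble_channel_freq_mhz(channel_index):
    """Center frequency (MHz) of a BLE channel index 0-39.

    Advertising channels 37/38/39 sit at 2402/2426/2480 MHz; data channels
    0-10 map to 2404-2424 MHz and 11-36 to 2428-2478 MHz in 2 MHz steps.
    """
    advertising = {37: 2402, 38: 2426, 39: 2480}
    if channel_index in advertising:
        return advertising[channel_index]
    if 0 <= channel_index <= 10:
        return 2404 + 2 * channel_index
    if 11 <= channel_index <= 36:
        return 2406 + 2 * channel_index  # skips 2426 MHz, used by CH 38
    raise ValueError(f"invalid BLE channel index: {channel_index}")
```

With this mapping, a 5 MHz-wide jammer centered on 2480 MHz lies far from CH 37 and CH 38, so among the advertising channels only CH 39 is affected.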

7.2 Experimental results

In this subsection, we conducted experiments with a single peripheral and a single central, considering interference from other systems. We assumed two scenarios: one without interference and one with interference from another system on a specific channel. For comparison, we used the conventional method, which always uses all three channels with a constant advertising interval.

Fig. 12 The amount of power consumption and ASR at the learning interval that satisfies required mean latency

Fig. 13 Time series of energy consumption

Many BLE applications do not require every ADV_PDU to arrive, only that one arrives at regular intervals. As an example of such an application, we consider the case in which a packet should arrive once every 20 s. Figure 12 shows the minimum power consumption, and the corresponding ASR, that satisfies the required mean latency. Without interference, the conventional method with a 5122.5 ms advertising interval consumed 64 mC, UCB1t consumed 42 mC, and \(\epsilon\)-greedy consumed 48 mC. With interference, the conventional method consumed 60 mC, UCB1t 42 mC, and \(\epsilon\)-greedy 38 mC. In both environments, the proposed method consumed significantly less than the conventional method, and both algorithms maintained an ASR of almost 100%, confirming that packets were still transmitted and received successfully. Figure 13 shows the time series of energy consumption of the conventional method, UCB1t, and \(\epsilon\)-greedy. The proposed method consumes more energy during the initial exploration, but its consumption subsequently drops below that of the conventional method, so the savings grow the longer the system operates.

8 Conclusion

In this paper, we proposed a method for selecting channels and advertising intervals based on decision-making algorithms. Simulations under various environments demonstrate that the proposed method achieves low power consumption while satisfying application requirements. In addition, implementing the method on a real device confirmed that it is lightweight and achieves low power consumption. Because the proposed method uses only the existing BLE protocol, it can be applied without compromising the security, robustness, or compatibility of BLE.

In future work, we will apply the proposed method to actual applications and evaluate its practicality in various environments, including those with larger channel fluctuations and interference. We will also refine the reward model to improve exploration and thus the ASR. In particular, we will design rewards based on Tug-of-War dynamics, which is considered an efficient search algorithm [31].