Online Power Control and Optimization for Energy Harvesting Communication System Based on State of Charge

In this paper, the online power control problem for an energy harvesting wireless communication system with a finite-capacity battery is addressed, where the channel state and the energy harvesting rate are both unknown. A low-complexity algorithm based on online convex optimization is proposed to guarantee the energy availability of the energy harvesting node and to maximize the average long-term throughput. The proposed algorithm restricts the maximum transmission power using the state of charge, and allocates transmission power based on historical information. In addition, an energy availability constraint is derived by rigorous theoretical analysis to guarantee the optimization of the average long-term throughput. Simulations demonstrate the effectiveness of the algorithm without assuming any probability distribution of the energy arrival or the channel coefficients. The proposed algorithm outperforms its counterparts under different energy harvesting rates.


Introduction
Energy harvesting (EH) technologies extend the lifetime of a node by collecting ambient energy (solar, vibration, RF, etc.) from the environment [1]. Systems with EH nodes are considered to provide extremely cost-effective and fully passive solutions for the Internet of Things (IoT) [2], especially in places where conventional power sources are not accessible. Although the operation, efficiency and maturity of energy harvesting systems vary, supporting long-term node availability is indispensable in most real scenarios. Within the operating time, the power supply of an EH node should be no less than its power consumption [3]. However, the harvested energy is usually not continuous and is sometimes limited due to the intermittent nature of energy sources. Nodes powered by harvested energy therefore need energy storage and a power management algorithm to provide an uninterrupted power supply [4].
The efficiency of EH technologies is important for node availability, while optimal power control plays a critical role in improving overall performance, especially in EH-based wireless communication systems [5]. Computing and communication are the main sources of power consumption during node operation. In most cases, communication consumes more power than other tasks, such as sensing and computing. Furthermore, its consumption varies widely with the dynamic and unpredictable wireless channel. As a result, power control for an EH-based transmitter, which harvests energy from the ambient environment and uses it to transmit information to remote wireless receivers, is a big challenge [6]. The unknown wireless channel and the intermittent nature of ambient energy make it challenging to ensure node continuity and to maximize communication throughput [7][8][9].
In practical scenarios, the characteristics of the energy arrival rate and the coefficients of the wireless channel are hard to predict. Learning-theoretic approaches improve power control strategies by identifying the model of the energy arrival rate and the channel distribution [10,11]. However, the computational complexity increases sharply with the number of states of the energy arrival rate and the wireless channel, which is unacceptable for a low-cost transmitter. Online convex optimization (OCO) [12] is a low-complexity way to allocate power effectively by projecting the power vector onto a feasible set determined by the constraints. However, the battery may be exhausted if the average harvested energy is below the maximum of the feasible set. OCO with stochastic constraints [13] was subsequently analyzed in depth, and a new algorithm that guarantees node continuity was proposed, which requires a costly large-capacity battery [14].
In order to remove the constraint on battery capacity, this paper proposes an improved gradient descent algorithm based on OCO for the power control problem. The algorithm adjusts the size of the feasible set according to the state of charge (SOC), which takes values in [0, 1]. When the SOC is close to 1, the feasible set is almost full size, so that up to the maximum power can be allocated. Otherwise, the feasible set is a fractional subset of the full-size set, and extra energy is stored for future use when the harvested energy exceeds the maximum of the current fractional subset. By rigorous theoretical analysis, the conditions for the energy availability guarantee and the lower bound of the average long-term throughput are explicitly given. Furthermore, the throughput and the battery state are analyzed in simulations, which support the theoretical results.
The benefits of the proposed algorithm are as follows:
• First, compared with learning-theoretic approaches [11], the proposed algorithm has low complexity and relies only on the most recent historical information.
• Second, node continuity, i.e., the energy availability guarantee, is fulfilled without the complex battery capacity setting that is indispensable in the algorithm of [14].
• Third, the proposed algorithm is scalable and can easily be deployed in scenarios with one transmitter and many receivers.
The paper is organized as follows. The second section discusses related work. The third section gives the system model and formulates the problem. The fourth section proposes the new algorithm. The fifth section analyzes its performance, including the average long-term throughput and the energy availability guarantee. The sixth section presents and discusses simulations that verify these properties. Finally, the conclusion is given.

Related Work
The power control, or energy allocation, problem in EH-based wireless systems has been widely researched. Existing research can be divided into two groups based on whether information about the energy arrival and the channel state is assumed to be available at the transmitter.
In the offline optimization group [15], the EH transmitter has almost ideal knowledge of the future energy arrival amount or a perfect prediction of the channel state. Solar-based EH systems work well under such an assumption, since the amount of harvested energy is almost predictable. In [16], the authors model the energy arrival rate and the channel distribution as Markov processes with known transition probabilities, and optimize the power control by dynamic programming. The transmitter allocates more energy for transmission when the wireless channel is predicted to be good, and less energy when it is predicted to be bad. When the wireless channel or the energy arrival rate is only partially known, the power control strategies are implemented based on a prediction of the unknown part, such as work on an unknown channel distribution or on an unknown energy arrival rate. In [17], the authors focus on hybrid energy storage, and battery imperfections are discussed in [18]. In [19], the authors consider a multihop EH communication system and cover all possible harvesting profiles, including continuous and discrete cases. Overall, the solutions of offline strategies give an upper bound on the optimized throughput.
In the online optimization group [8,10], the EH transmitter is assumed to know the statistics of the underlying EH arrival process or to have causal information about its realizations. In this case, the EH arrival process is represented by an approximate model, and online methods decide on the energy allocation based on predefined models. The model could be a Markov decision process (MDP) or a regression model based on statistical data [20]. In practical situations, the future channel state is unknown, and the learning-theoretic approach is suitable for such unpredictable cases [21]. Here, the transmitter learns the optimal energy allocation policy by performing actions and observing their rewards. Learning-based algorithms gain rewards by minimizing the gap between the online rewards and the offline optimal throughput. In [11], the authors modeled the energy arrival and the channel state as individual MDPs with unknown transition probabilities. After a period of learning, the expected average throughput converged to a lower bound. Recently, a deep reinforcement learning algorithm was implemented in a point-to-point EH wireless communication system where prior information about the distributions of the energy arrival process and the channel coefficients is not available [22].
For most EH-based wireless systems, the computational capability of the node is limited. As a result, the computational complexity of the power control algorithm must be considered. In most existing algorithms, the computational load of value iteration or policy iteration increases sharply with the number of quantized states and/or actions [15]. Online convex optimization opens up a new way of optimizing energy allocation and long-term throughput. The online descent gradient (ODG) algorithm, a traditional OCO algorithm, achieves acceptable regret on the average long-term throughput in the energy-unlimited case [12]. However, energy continuity is ignored. For the case of limited energy capacity, the authors in [13,14] proposed an updated ODG version. By subtracting a vector, their algorithm helps restrict the total allocated energy. The related analysis gives a lower bound on performance under the assumption of a very large battery capacity.

System Model
In this paper, a point-to-point EH communication system is considered. The general configuration is similar to that in [14]. There are n sub-channels between the transmitter and the receiver. At the beginning of time slot t, the EH transmitter allocates energy with the vector p[t] = [p_1[t], …, p_n[t]], where p_i[t] is the energy allocated to sub-channel i in time slot t. The maximal transmission power of each time slot is denoted by P_max. The feasible set ℙ of energy allocation is defined as (1).
The battery capacity is denoted by E_max, and the harvested energy in time slot t is e[t], which is known only at the end of time slot t. Let E[t] denote the available energy at the beginning of time slot t. The states of all n sub-channels in time slot t are collected in a vector s[t] = [s_1[t], …, s_n[t]]. The corresponding channel capacity of sub-channel i is log(1 + p_i[t] s_i[t]). The system model is shown in Fig. 1, where the EH transmitter chooses p[t + 1] at the beginning of time slot t + 1. We define the reward U_t(p[t]; s[t]) as the sum throughput of all n sub-channels in (3).
The throughput U_t(p; s) obtained with power p at channel state s is a non-negative, nondecreasing, and concave utility function. U_t(p; s) is obtained at the end of time slot t by calculating (3).
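As an illustration of the system model, the following minimal Python sketch evaluates the sum throughput and one battery update. It assumes a per-sub-channel capacity of log(1 + p_i s_i) and a battery that is drained by the allocated energy, charged by e[t], and clipped at E_max; the function names and all numerical values are hypothetical and are not taken from the paper.

```python
import math

def sum_throughput(p, s):
    """Sum throughput over all sub-channels, assuming log(1 + p_i * s_i) each."""
    return sum(math.log(1.0 + pi * si) for pi, si in zip(p, s))

def battery_update(E, p, e, E_max):
    """Assumed battery dynamics: spend sum(p), store harvested e, clip at E_max."""
    return min(max(E - sum(p), 0.0) + e, E_max)

# Example with two sub-channels (all numbers are illustrative only).
p = [1.0, 3.0]          # allocated energy per sub-channel
s = [1.0, 0.5]          # channel states
U = sum_throughput(p, s)
E_next = battery_update(E=6.0, p=p, e=5.0, E_max=10.0)
```

The utility in this sketch is non-negative, nondecreasing and concave in p, matching the properties required of U_t(p; s).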

Proposed Algorithm
In traditional OCO, the allocated power of the next time slot p[t + 1] is the projection onto a fixed feasible set of the power vector obtained by the ODG method. The feasible set is restricted by E_max. Since the utility is nondecreasing in the allocated power, p[t + 1] would reach E_max without any further restriction; a sketch in two-dimensional space is shown in Fig. 2a. In the energy-limited case, the energy availability condition (5) must be satisfied, that is, the energy allocation should be no more than the available energy.
In [14], the authors subtract an additional vector from the power vector, so that the new power vector is moved into the interior of the feasible set; a sketch is shown in Fig. 2b. The resulting allocation satisfies (5) with the buffer provided by a large-capacity battery. Inspired by the fractional allocation policy in [15], this paper instead adjusts the size of the feasible set to ensure (5). The new algorithm obtains the SOC, which takes values in [0, 1], at the beginning of time slot t + 1, and restricts the feasible set of power control via the SOC. The power control strategy is then carried out within the new feasible set. A sketch of the idea of the proposed algorithm is shown in Fig. 2c.
The algorithm is described as follows.
Step 1: Obtain the current SOC q[t + 1].
Step 2: Restrict the maximum transmission power according to q[t + 1], which defines the restricted feasible set ℙ_q, where p_i ≥ 0, ∀i ∈ {1, 2, …, n}.
Step 3: Power control: p[t + 1] is obtained by a gradient update of p[t] followed by a projection, where Proj_ℙq{⋅} represents the projection onto the feasible set ℙ_q and ∇_p U_t(p; s) represents a gradient of the function U_t(p; s) at the point p = p[t].
As shown in Fig. 1 and the algorithm description, the SOC value (0 ≤ q[t + 1] ≤ 1) is available when deciding the energy allocation p[t + 1], because the SOC depends on E[t + 1] only. In Step 2, the feasible set of power control is restricted by the SOC, which changes dynamically with the harvested and allocated energy. In Step 3, the projection is performed onto the restricted feasible set.
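The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the SOC is taken as q = E/E_max, the restricted set is taken as {p : p_i ≥ 0, Σ p_i ≤ q·P_max}, the utility is taken as Σ log(1 + p_i s_i), and the step size `step` and all function names are hypothetical.

```python
def project_capped_simplex(v, cap):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) <= cap}."""
    if cap <= 0.0:
        return [0.0 for _ in v]
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= cap:
        return clipped
    # Otherwise the projection lies on the face sum(p) == cap: find the
    # shift theta with sum(max(v_i - theta, 0)) == cap (sort-based method).
    running, theta = 0.0, 0.0
    for k, x in enumerate(sorted(v, reverse=True), start=1):
        running += x
        candidate = (running - cap) / k
        if x - candidate > 0.0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def soc_power_step(p, s, E_next, E_max, P_max, step):
    """One update p[t] -> p[t+1] of the SOC-restricted projected gradient step."""
    q = E_next / E_max                      # Step 1: SOC (assumed E / E_max)
    cap = q * P_max                         # Step 2: restricted maximum power
    # Gradient of the assumed utility sum(log(1 + p_i * s_i)) at p.
    grad = [si / (1.0 + pi * si) for pi, si in zip(p, s)]
    ascent = [pi + step * gi for pi, gi in zip(p, grad)]
    return project_capped_simplex(ascent, cap)   # Step 3: projection

# Illustrative call: the battery holds 3 of 10 units, so at most 3 is allocated.
p_next = soc_power_step(p=[2.0, 2.0], s=[1.0, 1.0],
                        E_next=3.0, E_max=10.0, P_max=10.0, step=0.5)
```

Because the cap shrinks with the SOC, the sketch stores surplus energy whenever the gradient step would otherwise spend more than q·P_max.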

Algorithm Analysis
When applying OCO to the energy allocation problem in EH-based wireless systems, the energy availability guarantee (5) should be fulfilled. The regret of the actions should then be analysed in order to provide a lower bound on performance. In the following, the power control policy of the proposed algorithm is analysed in these two aspects.

Energy Availability Guarantee
To implement the power control decision of the proposed algorithm, the energy availability constraint (5) must be satisfied in time slot t + 1.

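Theorem 1 (Energy guarantee) If P_max ≤ E_max, then the energy availability is guaranteed.
Proof Based on the definition of Proj{⋅} and the algorithm design, the total power allocated in time slot t + 1 is bounded by the restricted maximum set in Step 2. If P_max ≤ E_max, this bound does not exceed the available energy E[t + 1], so (5) holds. ◻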
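The argument of Theorem 1 can be sketched as a single chain of inequalities. The sketch assumes that the SOC is q[t + 1] = E[t + 1]/E_max, that the restricted feasible set caps the total power at q[t + 1]·P_max, and that (5) requires the total allocated energy not to exceed E[t + 1]; these forms are inferred from the surrounding text rather than quoted from it.

```latex
\sum_{i=1}^{n} p_i[t+1]
  \;\le\; q[t+1]\,P_{\max}
  \;=\; \frac{E[t+1]}{E_{\max}}\,P_{\max}
  \;\le\; E[t+1]
  \qquad \text{whenever } P_{\max} \le E_{\max}.
```

The first inequality holds because the output of the projection lies in the restricted feasible set; the last one uses P_max/E_max ≤ 1.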

Lower Bound for Long-Term Average Expected Throughput
Let D denote an upper bound on the diameter of ℙ, that is, ‖x − y‖ ≤ D, ∀x, y ∈ ℙ. Let G denote an upper bound on the gradient of U_t(p; s), that is, ‖∇_p U_t(p; s)‖ ≤ G, ∀p ∈ ℙ, t ∈ {1, 2, …, T}. Then we give the main result of this paper.

Theorem 2 (Main Result) If the long-term average throughput is defined as (4), then under the proposed algorithm, when T → ∞, a lower bound on the average long-term throughput can be given relative to the throughput achieved by p*, where p* = arg max_{p∈ℙ} ∑_{t=1}^{T} U_t(p; s) in the energy-unlimited situation.
Before proving Theorem 2, we need to introduce three lemmas.
Proof The result follows from the definition of the projection. ◻
Lemma 3 [12] Let c_1, c_2, …, c_T : ℙ → ℝ be an arbitrary sequence of convex differentiable functions. Then the regret bound of the online gradient method in [12] applies.
Below we give the proof of our main result.

Proof of Theorem 2 Let the energy vector p[t] be the output of the proposed algorithm. Based on Lemma 1, we obtain a bound for each time slot; then, for all p, p_z ∈ ℙ_q and at time slot t, inequality (10) holds. Summing (10) from t = 1 to t = T and considering Lemma 2, we obtain (11). Here p_z and p are the outputs of Zinkevich's policy [12] and the proposed policy, respectively. Considering Lemma 3, we get (12). Combining (11) and (12) yields the bound of Theorem 2. ◻
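For reference, the regret bound of online gradient descent in [12], which Lemma 3 appears to invoke, has the following standard form for convex cost functions c_t with step size η_t = t^{-1/2}. Whether the paper uses exactly this step size and these constants is an assumption; for throughput maximization one would take c_t = −U_t.

```latex
\sum_{t=1}^{T} c_t\bigl(p[t]\bigr) \;-\; \min_{p \in \mathbb{P}} \sum_{t=1}^{T} c_t(p)
  \;\le\; \frac{D^{2}\sqrt{T}}{2} \;+\; \Bigl(\sqrt{T} - \tfrac{1}{2}\Bigr)\,G^{2}
```

Dividing both sides by T shows that the average regret vanishes at rate O(1/√T) as T → ∞, which is the mechanism behind a long-term average throughput bound of the kind stated in Theorem 2.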

Simulations
In this section, the properties of the proposed power control algorithm for EH-based wireless systems are analyzed. The EH transmitter has two sub-channels, modeled as two independent Rayleigh fading channels. The maximum transmitting power is set to 10. Comparisons of the average long-term throughput between the proposed algorithm and the algorithm in [14] are shown under different setups. The simulation settings are listed in Table 1.
Energy Availability Guarantee
Figure 3 shows that the energy availability constraint (5) always holds as time goes on; it is therefore clear that the energy availability is guaranteed as long as the inequality E_max ≥ P_max is satisfied.
Fig. 6 Battery state comparison (legend: E(e(t)) = 4 and E(e(t)) = 2)
The minimal required battery capacity is P_max, which is 10 in the simulation. In contrast, the minimal required battery capacity in [14] exceeds 100, which is much larger than 10. A small required battery capacity helps reduce the overall node cost.
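The energy availability check can be reproduced in spirit with a short simulation. The sketch below is a single-channel simplification (so the projection reduces to clipping) and rests on several assumptions that are not from the paper: an exponentially distributed channel power gain for Rayleigh fading, harvested energy drawn uniformly from [0, 2·mean], a step size of 1/√t, and an initially full battery.

```python
import math
import random

def simulate(T=2000, E_max=10.0, P_max=10.0, mean_harvest=4.0, seed=0):
    """Count slots where the allocated power exceeds the stored energy."""
    rng = random.Random(seed)
    E = E_max            # battery level at the start of the slot (assumed full)
    p = 0.0              # power allocated for the current slot
    violations = 0
    total = 0.0
    for t in range(1, T + 1):
        if p > E + 1e-9:                 # energy availability check
            violations += 1
        s = rng.expovariate(1.0)         # assumed channel power gain
        e = rng.uniform(0.0, 2.0 * mean_harvest)   # assumed harvest model
        total += math.log(1.0 + p * s)   # throughput earned in this slot
        grad = s / (1.0 + p * s)         # gradient of log(1 + p * s) in p
        E = min(max(E - p, 0.0) + e, E_max)        # battery update
        q = E / E_max                    # SOC seen before the next decision
        step = 1.0 / math.sqrt(t)        # assumed diminishing step size
        p = min(max(p + step * grad, 0.0), q * P_max)  # clip to [0, q*P_max]
    return violations, total / T

violations, avg_throughput = simulate()
```

With E_max ≥ P_max the violation counter stays at zero in this sketch, in line with Theorem 1; the small tolerance in the check only absorbs floating-point rounding.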
Average Long-Term Throughput Analysis
Figure 4 shows the average long-term throughput trajectories of the proposed algorithm for different E(e[t]), where E(e[t]) is the mean value of e[t]. The performance depends on the energy harvesting rate: when the average value of e[t] increases, the average throughput also increases. The sum throughput of all sub-channels is a concave function of the allocated power.
In the following, comparisons of the average long-term throughput and the battery state are shown in Figs. 5 and 6, respectively. An average harvest rate E(e[t]) of 4 is an example of a low energy harvesting rate, while 2 is an example of an extremely low one. In both figures, the red solid and dotted lines are the results of the proposed algorithm, and the blue lines are the results of the compared algorithm in [14]. Regardless of the average harvest rate, the throughput of both algorithms converges over time. In Fig. 5, the solid lines indicate that the proposed algorithm outperforms its counterpart, while the dotted lines show similar performance. Overall, the proposed algorithm utilizes the harvested energy better.
To analyze the throughput performance in more depth, the battery state is shown. In Fig. 6, the SOC of the proposed algorithm oscillates within a narrow range, while the battery level of the other algorithm fluctuates over a wider range. The proposed algorithm uses the SOC as a negative-feedback input, which reduces the dynamics of the SOC. In the compared algorithm, the allocated powers depend on the subtracted vector, which is mostly affected by the channel distribution. As a result, the blue solid and dotted lines show a similar fluctuation level when the channel distribution is the same. The SOC dynamics of the proposed algorithm are more conducive to extending battery lifetime than those of the other algorithm, since deep charge-discharge cycles may reduce battery lifetime. Furthermore, the average SOC of the proposed algorithm is below that of the other algorithm, which is the main reason for its higher throughput.

Conclusions
This paper discusses a power control problem in an EH-based wireless communication system with the aim of maximizing throughput. In our setup, the wireless transmitter has no future information about the channel state or the energy arrival rate, which is a reasonable assumption in practice. The paper proposes a simple online algorithm that fulfils the energy availability guarantee and, in simulations, achieves a higher average long-term throughput than the compared algorithm in [14]. The required battery capacity is small, which suits low-cost wireless sensor nodes. The battery states are smooth in simulations, which benefits battery lifetime. Furthermore, the analysis model is scalable and also applies to scenarios in which one transmitter sends information to many receivers.