Abstract
The core issue in network selection is choosing the optimal network from the available network access points (NAPs) of heterogeneous wireless networks (HWN). Many previous works evaluate the networks in an ideal environment, i.e., they generally assume that the network state information (NSI) is known and static. However, because traffic load and radio channel conditions vary, the NSI can be dynamic and even unavailable to the user in a realistic HWN environment, so most existing network selection algorithms cannot work effectively. Learning-based algorithms can cope with uncertain and dynamic NSI, but they commonly need sufficient samples of each option, which results in an unbearable handoff cost. Therefore, this chapter formulates the network selection problem as a multi-armed bandit (MAB) problem and designs two reinforcement learning (RL)-based network selection algorithms with special consideration given to reducing the network handoff cost. We prove that the proposed algorithms achieve the optimal order, i.e., logarithmic-order regret, with limited network handoff cost. Simulation results indicate that, compared with existing algorithms, the two algorithms significantly reduce the network handoff cost while also improving the transmission performance.
References
Fernandes S, Karmouch A (2012) Vertical mobility management architectures in wireless networks: a comprehensive survey and future directions. IEEE Commun Surv Tutor 14(1):45–63
Niyato D, Hossain E (2009) Dynamics of network selection in heterogeneous wireless networks: an evolutionary game approach. IEEE Trans Veh Technol 58(4):2008–2017
Tabrizi H, Farhadi G, Cioffi J (2011) A learning-based network selection method in heterogeneous wireless systems. In: IEEE global telecommunications conference (GLOBECOM)
Zhang Y, Yuan Y, Zhou J et al (2009) A weighted bipartite graph based network selection scheme for multi-flows in heterogeneous wireless network. In: IEEE global telecommunications conference (GLOBECOM)
Stevens-Navarro E, Lin Y, Wong VWS (2008) An MDP-based vertical handoff decision algorithm for heterogeneous wireless networks. IEEE Trans Veh Technol 57(2):1243–1254
Stevens-Navarro E, Wong VWS (2008) A constrained MDP-based vertical handoff decision algorithm for 4G wireless networks. In: IEEE international conference on communications (ICC)
Wu T, Jing H, Yu X et al (2008) Cost-aware handover decision algorithm for cooperative cellular relaying networks. In: IEEE vehicular technology conference (VTC)
Wang L, Binet D (2009) Best permutation: a novel network selection scheme in heterogeneous wireless networks. In: International conference on wireless communications and mobile computing (IWCMC)
Hou J, O’Brien DC (2006) Vertical handover decision making algorithm using fuzzy logic for the integrated radio-and-OW system. IEEE Trans Wirel Commun 5(1):176–185
Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv Appl Math 6:4–22
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47:235–256
Agrawal R, Teneketzis D, Anantharam V (1988) Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching. IEEE Trans Autom Control 33(10):899–906
Chen L, Iellamo S, Coupechoux M (2011) Opportunistic spectrum access with channel switching cost for cognitive radio networks. In: IEEE international conference on communications (ICC)
Du Z, Wu Q, Yang P (2016) Learning with handoff cost constraint for network selection in heterogeneous wireless network. Wirel Commun Mob Comput 16(4):441–458
Zhao T, Liu Q, Chen CW (2017) QoE in video transmission: a user experience-driven strategy. IEEE Commun Surv Tutor 19(1):285–302
Quoc-Thinh N, Agoulmine N, Cherkaoui EH et al (2013) Multicriteria optimization of access selection to improve the quality of experience in heterogeneous wireless access networks. IEEE Trans Veh Technol 62(4):1785–1800
ITU-T (2003) One-way transmission time. Rec. G.114
Appendix
1.1 Appendix 1: Proof of Theorem 2.1
Proof
Given the slot index t, the block number in the block UCB1-based algorithm is \(K=\left\lceil \frac{t}{m} \right\rceil \), and \({{\gamma }_{n}}\left( K \right) \) is the number of blocks in which network n is selected among the first K blocks.
Note that \(\frac{{{{\hat{r}}}_{n}}\left( k \right) }{m}\) is the sample mean of \({{r}_{n}}\). Based on the Chernoff–Hoeffding bound [11], we obtain
The handoff cost bound is
where \(\Delta_n = \mu_{n^*} - \mu_n\), \(c'_{n^*} = \max_{k} c_{k,n^*}\) is the maximal handoff cost for a handoff to the optimal network \(n^*\), and \(K = \left\lceil \frac{t}{m} \right\rceil\) is the block number for t. Note that this bound corresponds to the worst case, in which any two suboptimal network selections are separated by one optimal network selection.
The expected regret can be bounded as
This completes the proof. \(\square \)
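The quantities used in this proof can be explored numerically. The following is a minimal sketch, not the chapter's algorithm as specified: it assumes Bernoulli per-slot rewards, a standard UCB1 index from [11] computed over blocks rather than slots, and it rounds the horizon up to whole blocks. The function name `block_ucb1` and all parameter names are hypothetical.

```python
import math
import random


def block_ucb1(means, m, horizon, seed=0):
    """Simulate a block-based UCB1 policy (illustrative assumptions only).

    means   : assumed per-slot mean reward of each network, each in [0, 1]
    m       : block length in slots; the chosen network is held for m slots
    horizon : number of slots t (rounded up to whole blocks here)
    Returns (gamma, handoffs, regret), where gamma[n] is the number of
    blocks in which network n was selected.
    """
    rng = random.Random(seed)
    n_nets = len(means)
    num_blocks = math.ceil(horizon / m)   # K = ceil(t / m)
    gamma = [0] * n_nets                  # gamma_n(K)
    mean_est = [0.0] * n_nets             # running mean of block_reward / m
    best = max(means)
    handoffs = 0
    regret = 0.0
    prev = None

    for k in range(1, num_blocks + 1):
        if k <= n_nets:
            choice = k - 1                # initial round: one block per network
        else:
            # Assumed index: sample mean plus the UCB1 exploration term,
            # with blocks playing the role of plays.
            choice = max(
                range(n_nets),
                key=lambda n: mean_est[n] + math.sqrt(2 * math.log(k) / gamma[n]),
            )
        if prev is not None and choice != prev:
            handoffs += 1                 # a network change between blocks
        prev = choice

        # Sum of m Bernoulli slot rewards collected during this block.
        block_reward = sum(1.0 for _ in range(m) if rng.random() < means[choice])
        gamma[choice] += 1
        mean_est[choice] += (block_reward / m - mean_est[choice]) / gamma[choice]
        regret += m * (best - means[choice])   # expected loss of this block

    return gamma, handoffs, regret


if __name__ == "__main__":
    gamma, handoffs, regret = block_ucb1([0.2, 0.5, 0.9], m=10, horizon=5000)
    print(gamma, handoffs, regret)
```

In this sketch a handoff can only occur at a block boundary, so the number of handoffs never exceeds twice the number of suboptimal blocks. That is the worst case described in the proof, where every suboptimal selection is both entered from and followed by the optimal network.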
1.2 Appendix 2: Proof of Theorem 2.2
Proof
Similarly, the handoff cost and regret for the UCB2-based algorithm are as follows:
where \(c'_n = \max_{k \in N} c_{k,n}\) is the maximal handoff cost when the selected network is n. We bound \(H_{UCB2}(t)\) and \(R_{UCB2}(t)\) by bounding \(E\left[ \gamma_n \right]\), \(n \ne n^*\), which is the expected number of blocks in which network n is selected. For any \(t \ge \max_{\mu_n < \mu_{n^*}} \frac{1}{2\Delta_n^2}\), where \(\Delta_n = \mu_{n^*} - \mu_n\), let \(\tilde{\gamma}_n\) denote the largest integer satisfying
According to the definition of \(\nu(\tau)\), \(\nu(\gamma_n) \le t\) for all n; then
Therefore,
According to [11], the event “network n has started its \(\gamma \)-th block” implies that there exists \(i\ge 0\) such that at least one of the following two events holds:
Thus,
In the following, we bound \(E\left[ \gamma_n \right]\) by bounding its components. Since \(\nu\left( \tilde{\gamma}_n - 1 \right) \le \frac{(1 + 4\alpha) \log\left( 2et\Delta_n^2 \right)}{2\Delta_n^2}\) and \((1 + \alpha)^{\tilde{\gamma}_n - 1} \le \left\lceil (1 + \alpha)^{\tilde{\gamma}_n - 1} \right\rceil = \nu\left( \tilde{\gamma}_n - 1 \right)\), then,
Denote
According to [11],
In addition, \(t \ge \mathop {\max }\limits _{{\mu _n} < {\mu _{{n^*}}}} \frac{1}{{2\Delta _n^2}}\), thus,
Finally, using the Chernoff–Hoeffding bound,
From [11],
Then,
Combining (2.22), (2.24), and (2.25), we get
which completes the proof. \(\square \)
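The role of \(\nu(\gamma) = \left\lceil (1+\alpha)^{\gamma} \right\rceil\) in keeping the number of blocks small can be illustrated with a short sketch. It assumes the epoch-length rule of the UCB2 policy in [11], where a network that has completed \(\gamma\) epochs is next held for \(\nu(\gamma+1) - \nu(\gamma)\) slots. That rule is taken from [11], not from the text shown here, and the function names `nu` and `epoch_lengths` are hypothetical.

```python
import math


def nu(gamma, alpha):
    """nu(gamma) = ceil((1 + alpha) ** gamma), the function used in the proof."""
    return math.ceil((1 + alpha) ** gamma)


def epoch_lengths(alpha, total_slots):
    """Successive epoch lengths on one network until total_slots are covered.

    Assumes epoch number gamma lasts nu(gamma + 1) - nu(gamma) slots, so
    after gamma epochs the network has been used nu(gamma) - nu(0) slots.
    For small alpha, some early entries can be zero.
    """
    lengths = []
    gamma = 0
    used = 0
    while used < total_slots:
        length = nu(gamma + 1, alpha) - nu(gamma, alpha)
        lengths.append(length)
        used += length
        gamma += 1
    return lengths


if __name__ == "__main__":
    # Epoch lengths double when alpha = 1.
    print(epoch_lengths(alpha=1.0, total_slots=1000))
```

With \(\alpha = 1\), covering 1000 slots on a single network takes only 10 epochs. The number of epochs, and with it the number of handoffs into that network, therefore grows logarithmically in the number of slots spent there.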
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Du, Z., Jiang, B., Wu, Q., Xu, Y., Xu, K. (2020). Learning the Optimal Network with Handoff Constraint: MAB RL Based Network Selection. In: Towards User-Centric Intelligent Network Selection in 5G Heterogeneous Wireless Networks. Springer, Singapore. https://doi.org/10.1007/978-981-15-1120-2_2
DOI: https://doi.org/10.1007/978-981-15-1120-2_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1119-6
Online ISBN: 978-981-15-1120-2
eBook Packages: Engineering, Engineering (R0)