Learning the Optimal Network with Handoff Constraint: MAB RL Based Network Selection

Du, Zhiyong; Jiang, Bin; Wu, Qihui; Xu, Yuhua; Xu, Kun

doi:10.1007/978-981-15-1120-2_2


Abstract

The core issue of network selection is to select the optimal network from the available network access points (NAPs) of heterogeneous wireless networks (HWNs). Many previous works evaluate the networks in an ideal environment, i.e., they generally assume that the network state information (NSI) is known and static. However, because traffic load and radio channel conditions vary, the NSI can be dynamic and even unavailable to the user in a realistic HWN environment, so most existing network selection algorithms cannot work effectively. Learning-based algorithms can cope with uncertain and dynamic NSI, but they commonly need sufficient samples of each option, which results in unbearable handoff cost. Therefore, this chapter formulates the network selection problem as a multi-armed bandit (MAB) problem and designs two reinforcement learning (RL) based network selection algorithms with special consideration given to reducing network handoff cost. We prove that the proposed algorithms achieve order-optimal performance, i.e., logarithmic-order regret, with limited network handoff cost. Simulation results indicate that, compared with existing algorithms, the two algorithms significantly reduce the network handoff cost while also improving transmission performance.


References

[1] Fernandes S, Karmouch A (2012) Vertical mobility management architectures in wireless networks: a comprehensive survey and future directions. IEEE Commun Surv Tutor 14(1):45–63
[2] Niyato D, Hossain E (2009) Dynamics of network selection in heterogeneous wireless networks: an evolutionary game approach. IEEE Trans Veh Technol 58(4):2008–2017
[3] Tabrizi H, Farhadi G, Cioffi J (2011) A learning-based network selection method in heterogeneous wireless systems. In: IEEE global telecommunications conference (GLOBECOM)
[4] Zhang Y, Yuan Y, Zhou J et al (2009) A weighted bipartite graph based network selection scheme for multi-flows in heterogeneous wireless network. In: IEEE global telecommunications conference (GLOBECOM)
[5] Stevens-Navarro E, Lin Y, Wong VWS (2008) An MDP-based vertical handoff decision algorithm for heterogeneous wireless networks. IEEE Trans Veh Technol 57(2):2008–2017
[6] Stevens-Navarro E, Wong VWS (2008) A constrained MDP-based vertical handoff decision algorithm for 4G wireless networks. In: IEEE international conference on communications (ICC)
[7] Wu T, Jing H, Yu X et al (2008) Cost-aware handover decision algorithm for cooperative cellular relaying networks. In: IEEE vehicle technology conference (VTC)
[8] Wang L, Binet D (2009) Best permutation: a novel network selection scheme in heterogeneous wireless networks. In: International conference on wireless communications and mobile computing (IWCMC)
[9] Hou J, O’Brien DC (2006) Vertical handover decision making algorithm using fuzzy logic for the integrated radio-and-OW system. IEEE Trans Wirel Commun 5(1):176–185
[10] Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv Appl Math 6:4–22
[11] Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47:235–256
[12] Agrawal R, Teneketzis D, Anantharam V (1988) Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Trans Autom Control 33(10):899–906
[13] Chen L, Iellamo S, Coupechoux M (2011) Opportunistic spectrum access with channel switching cost for cognitive radio networks. In: IEEE international conference on communications (ICC)
[14] Du Z, Wu Q, Yang P (2016) Learning with handoff cost constraint for network selection in heterogeneous wireless network. Wirel Commun Mob Comput 16(4):441–458
[15] Zhao T, Liu Q, Chen CW (2017) QoE in video transmission: a user experience-driven strategy. IEEE Commun Surv Tutor 19(1):285–302
[16] Quoc-Thinh N, Agoulmine N, Cherkaoui EH et al (2016) Multicriteria optimization of access selection to improve the quality of experience in heterogeneous wireless access networks. IEEE Trans Veh Technol 62(4):1785–1800
[17] ITU-T (2003) One-way transmission time. Rec. G.114

Author information

Authors and Affiliations

National University of Defense Technology, Changsha, Hunan, China
Zhiyong Du, Bin Jiang & Kun Xu
Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
Qihui Wu
Army Engineering University of PLA, Nanjing, China
Yuhua Xu


Corresponding author

Correspondence to Zhiyong Du.

Appendix

1.1 Appendix 1: Proof of Theorem 2.1

Proof

Given the slot index t, the block number in the block UCB1 based algorithm is $K=\left\lceil \frac{t}{m} \right\rceil $. Let ${{\gamma }_{n}}\left( K \right) $ denote the number of blocks in which network n is selected among the first K blocks. Then, for an arbitrary positive integer l,

$$\begin{array}{l} {\gamma _n}\left( K \right) = 1 + \sum \limits _{k = N + 1}^K {I\left\{ {\delta \left( {km} \right) = n} \right\} } \\ \le l + \sum \limits _{k = N + 1}^K {I\left\{ {\delta \left( {km} \right) = n,{\gamma _n}\left( {k - 1} \right) \ge l} \right\} } \\ \le l + \sum \limits _{k = N + 1}^K {I\left\{ {{{\hat{r}}_{{n^*}}}\left( k \right) + \sqrt{\frac{{2m\log k}}{{{\gamma _{{n^*}}}\left( k \right) }}} } \right. } \le \\ \quad \left. {{{\hat{r}}_n}\left( k \right) + \sqrt{\frac{{2m\log k}}{{{\gamma _n}\left( k \right) }}} ,{\gamma _n}\left( {k - 1} \right) \ge l} \right\} \\ = l + \sum \limits _{k = N + 1}^K {I\left\{ {\frac{{{{\hat{r}}_{{n^*}}}\left( k \right) }}{m} + \sqrt{\frac{{2\log k}}{{m{\gamma _{{n^*}}}\left( k \right) }}} } \right. } \le \\ \quad \left. {\frac{{{{\hat{r}}_n}\left( k \right) }}{m} + \sqrt{\frac{{2\log k}}{{m{\gamma _n}\left( k \right) }}} ,{\gamma _n}\left( {k - 1} \right) \ge l} \right\} \end{array}$$

Note that $\frac{{{{\hat{r}}}_{n}}\left( k \right) }{m}$ is the sample mean of ${{r}_{n}}$. Based on the Chernoff–Hoeffding bound [11], we get

$$\begin{aligned} E\left[ {{\gamma }_{n}}\left( K \right) \right]&\le \frac{8\log K}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3} \nonumber \\&\le \frac{8\log \left( \frac{t+m}{m} \right) }{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3} \nonumber \\&=\frac{8\log \left( t+m \right) -8\log m}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3} \nonumber \\&<\frac{8\log t}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3} \end{aligned}$$

(2.16)
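The quantity ${\gamma _n}\left( K \right)$ bounded in (2.16) can be observed in a small simulation. The following is a minimal sketch, not the chapter's implementation: it assumes Bernoulli rewards in $[0,1]$, uses the normalised index from the last line of the derivation above (sample mean plus $\sqrt{2\log k/(m{\gamma _n})}$), and the function names `block_ucb1` and `gamma_bound` as well as the network means are hypothetical.

```python
import math
import random


def block_ucb1(means, m, t, seed=0):
    """Run a block UCB1 sketch for t slots with block length m.

    Returns the number of blocks in which each network was selected.
    Rewards are assumed Bernoulli with the given (hypothetical) means.
    """
    rng = random.Random(seed)
    n_nets = len(means)
    num_blocks = math.ceil(t / m)          # K = ceil(t / m)
    blocks = [0] * n_nets                  # gamma_n: blocks spent on network n
    reward_sum = [0.0] * n_nets            # total per-slot reward seen on network n

    for k in range(1, num_blocks + 1):
        if k <= n_nets:
            choice = k - 1                 # initialisation: one block per network
        else:
            # Normalised index: sample mean + sqrt(2 log k / (m * gamma_n)).
            def index(n):
                mean = reward_sum[n] / (m * blocks[n])
                return mean + math.sqrt(2.0 * math.log(k) / (m * blocks[n]))
            choice = max(range(n_nets), key=index)
        # Stay on the chosen network for the whole block (no handoff inside it).
        for _ in range(m):
            reward_sum[choice] += 1.0 if rng.random() < means[choice] else 0.0
        blocks[choice] += 1
    return blocks


def gamma_bound(t, m, delta):
    """Right-hand side of (2.16) for a suboptimal network with gap delta."""
    return 8.0 * math.log(t) / (m * delta ** 2) + 1.0 + math.pi ** 2 / 3.0


blocks = block_ucb1(means=[0.9, 0.2], m=5, t=5000)
print("blocks per network:", blocks)
print("bound on E[gamma_n(K)]:", gamma_bound(t=5000, m=5, delta=0.7))
```

A single run only gives one sample of ${\gamma _n}\left( K \right)$, whereas (2.16) bounds its expectation, so averaging over several seeds is the fairer comparison.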

The handoff cost bound is

$$\begin{aligned}&E\left[ {H_{block}{{\left( t \right) }}} \right] \le E\left[ {\sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {{{c'}_n}{\gamma _n}\left( K \right) } } \right] \quad + {{c'}_{{n^*}}}E\left[ {\sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {{\gamma _n}\left( K \right) } } \right] \nonumber \\&= \sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {{{c'}_n}} E\left[ {{\gamma _n}\left( K \right) } \right] + {{c'}_{{n^*}}}\sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {E\left[ {{\gamma _n}\left( K \right) } \right] } \nonumber \\&\le \sum \limits _{n:{\mu _n} < {\mu _{{n^*}}}} {\left( {{{c'}_{{n^*}}} + {{c'}_n}} \right) \left[ \frac{8\log t}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3}\right] } \end{aligned}$$

(2.17)

where ${\Delta _n} = {\mu _{{n^*}}} - {\mu _n}$, ${c'_{{n^*}}} = \mathop {\max }\limits _k {} {c_{k,{n^*}}}$ is the maximal handoff cost for a handoff to the optimal network ${{n}^{*}}$ (the maximum is taken over the source network $k$), and $K = \left\lceil {\frac{t}{m}} \right\rceil $ is the block number for t. The bound corresponds to the worst case, in which any two suboptimal network selections are separated by one optimal network selection, so that every suboptimal block incurs one handoff to the suboptimal network and one handoff back to ${{n}^{*}}$.
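To get a feel for how the bound (2.17) scales with the block length, it can be evaluated directly. This is only an illustrative sketch: the function name `handoff_cost_bound`, the mean rewards, and the cost values are assumptions, not figures from the chapter.

```python
import math


def handoff_cost_bound(t, m, mu, c_max):
    """Evaluate the right-hand side of (2.17).

    mu[n]    : mean reward of network n (hypothetical values)
    c_max[n] : c'_n, the largest cost of a handoff into network n
    """
    best = max(range(len(mu)), key=lambda n: mu[n])
    total = 0.0
    for n in range(len(mu)):
        if mu[n] >= mu[best]:
            continue                       # only suboptimal networks contribute
        delta = mu[best] - mu[n]
        blocks = 8.0 * math.log(t) / (m * delta ** 2) + 1.0 + math.pi ** 2 / 3.0
        # Worst case: each suboptimal block costs one handoff in and one back.
        total += (c_max[best] + c_max[n]) * blocks
    return total


for m in (1, 5, 20):
    bound = handoff_cost_bound(t=10000, m=m, mu=[0.9, 0.6, 0.2], c_max=[1.0, 1.0, 2.0])
    print(f"m={m:2d}  handoff cost bound = {bound:.1f}")
```

Only the logarithmic term shrinks with $m$; the constant $1+\frac{\pi^2}{3}$ per suboptimal network is unaffected by the block length.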

The expected regret can be bounded as

$$\begin{aligned} E\left[ {{R_{block}}\left( t \right) } \right]&\le \sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {m{\Delta _n}E\left[ {{\gamma _n}\left( K \right) } \right] } + \phi E\left[ {{H_{block}}\left( t \right) } \right] \nonumber \\&\le \sum \limits _{n:{\mu _n} < {\mu _{{n^*}}}} {\left( {m{\Delta _n} + \phi {{c'}_{{n^*}}} + \phi {{c'}_n}} \right) \left[ {\frac{8\log t}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3}} \right] } \end{aligned}$$

(2.18)
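As a sketch of the intermediate step, the regret bound can be read as the sampling regret plus $\phi$ times the handoff cost, with the second line of (2.17) substituted before (2.16) is applied:

$$\begin{aligned} E\left[ R_{block}\left( t \right) \right]&\le \sum \limits _{n:\mu _n< \mu _{n^*}} m\Delta _n E\left[ \gamma _n\left( K \right) \right] + \phi \sum \limits _{n:\mu _n< \mu _{n^*}} \left( c'_{n^*} + c'_n \right) E\left[ \gamma _n\left( K \right) \right] \\&= \sum \limits _{n:\mu _n< \mu _{n^*}} \left( m\Delta _n + \phi c'_{n^*} + \phi c'_n \right) E\left[ \gamma _n\left( K \right) \right] \end{aligned}$$

Since $m\Delta _n \cdot \frac{8\log t}{m\Delta _n^2} = \frac{8\log t}{\Delta _n}$, the logarithmic part of the sampling regret does not depend on $m$, the logarithmic part of the handoff term decreases as $1/m$, and the constant part $m\Delta _n\left( 1+\frac{\pi ^2}{3} \right)$ grows linearly in $m$.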

This completes the proof. $\square $

1.2 Appendix 2: Proof of Theorem 2.2

Proof

Similarly, the handoff cost and regret for the UCB2-based algorithm are bounded as follows:

$$\begin{aligned} E\left[ {{H}_{UCB2}}\left( t \right) \right]&\le E\left[ \sum \limits _{n:{{\mu }_{n}}<{{\mu }_{{{n}^{*}}}}}{{{{{c}'}}_{n}}{{\gamma }_{n}}} \right] +{{{{c}'}}_{{{n}^{*}}}}E\left[ \sum \limits _{n:{{\mu }_{n}}<{{\mu }_{{{n}^{*}}}}}{{{\gamma }_{n}}} \right] \nonumber \\&=\sum \limits _{n:{{\mu }_{n}}<{{\mu }_{{{n}^{*}}}}}{\left( {{{{c}'}}_{n}}+{{{{c}'}}_{{{n}^{*}}}} \right) }E\left[ {{\gamma }_{n}} \right] \end{aligned}$$

(2.19)

$$\begin{aligned} E\left[ {{R_{UCB2}}\left( t \right) } \right] \le \sum \limits _{n:{\mu _n} < {\mu _{{n^*}}}} {\left( {{\mu ^*} -{\mu _n}} \right) E\left[ {{\gamma _n}} \right] } + \phi E\left[ {{H_{UCB2}}\left( t \right) } \right] , \end{aligned}$$

(2.20)

where ${c'_n} = \mathop {\max }\limits _{k \in N} {c_{k,n}}$ is the maximal handoff cost when the selected network is n. We bound ${{H}_{UCB2}}\left( t \right) $ and ${{R}_{UCB2}}\left( t \right) $ by finding the bound for $E\left[ {{\gamma _n}} \right] ,n \ne {n^*}$, which is the expected number of blocks in which network n is selected. For any $t \ge \mathop {\max }\limits _{{\mu _n} < {\mu _{{n^*}}}} \frac{1}{{2\Delta _n^2}}$, ${\Delta _n} = {\mu _{{n^*}}} - {\mu _n}$, denote ${{\tilde{\gamma }}_{n}}$ as the largest integer satisfying

$$\begin{aligned} \nu \left( {{{\tilde{\gamma }}_n} - 1} \right) \le \frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}} \end{aligned}$$

By the definition of $\nu \left( \tau \right) $, $\nu \left( {{\gamma }_{n}} \right) \le t$ for all $n$, and hence

$$\begin{aligned} {\gamma _n} \le \frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}. \end{aligned}$$

Therefore,

$$\begin{aligned} {{\gamma }_{n}}&\le 1+\sum \limits _{\gamma>1}^{\frac{\log t}{\log \left( 1+\alpha \right) }}{I\{\text {network }n\text { has started }\gamma \text {th block}\}} \nonumber \\&\le {{{\tilde{\gamma }}}_{n}}+\sum \limits _{\gamma >{{{\tilde{\gamma }}}_{n}}}^{\frac{\log t}{\log \left( 1+\alpha \right) }}{I\{\text {network }n\text { has started }\gamma \text {th block}\}}. \end{aligned}$$

(2.21)
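The cap $\frac{\log t}{\log \left( 1+\alpha \right) }$ on the number of blocks used as the upper summation limit in (2.21) can be checked numerically. The sketch below assumes $\nu \left( \tau \right) = \left\lceil \left( 1+\alpha \right) ^{\tau } \right\rceil $, the form used later in this proof; the helper names `nu` and `max_epochs` are hypothetical.

```python
import math


def nu(tau, alpha):
    """Block length function, assumed to be nu(tau) = ceil((1 + alpha) ** tau)."""
    return math.ceil((1.0 + alpha) ** tau)


def max_epochs(t, alpha):
    """Largest gamma with nu(gamma) <= t, found by direct search (needs t >= 1)."""
    gamma = 0
    while nu(gamma + 1, alpha) <= t:
        gamma += 1
    return gamma


for t in (10, 1000, 100000):
    for alpha in (0.1, 0.5, 1.0):
        cap = math.log(t) / math.log(1.0 + alpha)
        print(f"t={t:6d} alpha={alpha:.1f}  largest gamma={max_epochs(t, alpha):3d}  cap={cap:.2f}")
```

A smaller $\alpha$ allows more, shorter blocks, which is where the $\frac{1}{\log \left( 1+\alpha \right) }$ factor in the final bound comes from.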

According to [11], the event “network n has started $\gamma $-th block” implies $\exists i\ge 0$ such that at least one of the following two events holds:

$$\begin{aligned} {\hat{r}_{n,\nu \left( {\gamma - 1} \right) }} + {a_{t,\gamma - 1}} \ge {\mu ^*} - \frac{{\alpha {\Delta _n}}}{2} \end{aligned}$$

$$\begin{aligned} {\hat{r}^*}_{\nu \left( i \right) } + {a_{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) ,i}} \le {\mu ^*} - \frac{{\alpha {\Delta _n}}}{2} \end{aligned}$$

Thus,

$$\begin{array}{l} E\left[ {{\gamma _n}} \right] \le {{\tilde{\gamma }}_n} + \sum \limits _{\gamma> {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {P\{ {{\hat{r}}_{n,\nu \left( {\gamma - 1} \right) }}+\! {a_{t,\gamma - 1}} \ge {\mu _{{n^*}}} - \frac{{\alpha {\Delta _n}}}{2}\} } \\ + \sum \limits _{\gamma > {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\sum \limits _{i \ge 0} {P\{ {{\hat{r}}^*}_{\nu \left( i \right) } + {a_{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) ,i}} \le {\mu _{{n^*}}} - \frac{{\alpha {\Delta _n}}}{2}\} } } \end{array}$$

In the following, we bound $E\left[ {{\gamma }_{n}} \right] $ by bounding its components. Since $\nu \left( {{{\tilde{\gamma }}_n} - 1} \right) \le \frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}}$ and ${\left( {1 + \alpha } \right) ^{{{\tilde{\gamma }}_n} - 1}} \le \left\lceil {{{\left( {1 + \alpha } \right) }^{{{\tilde{\gamma }}_n} - 1}}} \right\rceil = \nu \left( {{{\tilde{\gamma }}_n} - 1} \right) $, we have

$$\begin{aligned} {\tilde{\gamma }_n} \le 1 + \frac{1}{{\log \left( {1 + \alpha } \right) }}\log \left[ {\frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}}} \right] . \end{aligned}$$

(2.22)
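As a numerical illustration of (2.22), ${\tilde{\gamma }_n}$ can be found by direct search and compared with the closed-form bound. This is a sketch under the assumption $\nu \left( \tau \right) = \left\lceil \left( 1+\alpha \right) ^{\tau } \right\rceil $; the function names and the parameter values are hypothetical.

```python
import math


def nu(tau, alpha):
    """Block length function, assumed to be nu(tau) = ceil((1 + alpha) ** tau)."""
    return math.ceil((1.0 + alpha) ** tau)


def threshold(t, alpha, delta):
    """(1 + 4 alpha) log(2 e t delta^2) / (2 delta^2), the right-hand side defining gamma-tilde."""
    return (1 + 4 * alpha) * math.log(2 * math.e * t * delta ** 2) / (2 * delta ** 2)


def gamma_tilde(t, alpha, delta):
    """Largest integer g with nu(g - 1) <= threshold.

    Assumes threshold >= 1 = nu(0), so that g = 1 already qualifies.
    """
    limit = threshold(t, alpha, delta)
    g = 1
    while nu(g, alpha) <= limit:           # does g + 1 still satisfy the condition?
        g += 1
    return g


def gamma_tilde_bound(t, alpha, delta):
    """Closed-form upper bound (2.22)."""
    return 1 + math.log(threshold(t, alpha, delta)) / math.log(1 + alpha)


t, alpha, delta = 10000, 0.5, 0.2
print(gamma_tilde(t, alpha, delta), gamma_tilde_bound(t, alpha, delta))
```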

Denote

$$\begin{aligned} A\left( n \right) = P\{ {\hat{r}_{n,\nu \left( {\gamma - 1} \right) }} + {a_{t,\gamma - 1}} \ge {\mu _{{n^*}}} - \frac{{\alpha {\Delta _n}}}{2}\}, \end{aligned}$$

$$\begin{aligned} B = \sum \limits _{\gamma > {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\sum \limits _{i \ge 0} {P\{ {{\hat{r}}^*}_{\nu \left( i \right) } + {a_{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) ,i}} \le {\mu _{{n^*}}} - \frac{{\alpha {\Delta _n}}}{2}\} } } \end{aligned}$$

According to [11],

$$\begin{aligned} A\left( n \right)&=P\{{{{\hat{r}}}_{n,\nu \left( \gamma -1 \right) }}+{{a}_{t,\gamma -1}}\ge {{\mu }_{n}}+{{\Delta }_{n}}-\frac{\alpha {{\Delta }_{n}}}{2}\} \nonumber \\&\le \exp \left\{ -\nu \left( \gamma -1 \right) \Delta _{n}^{2}{{\alpha }^{2}}/2 \right\} \end{aligned}$$

(2.23)

In addition, since $t \ge \mathop {\max }\limits _{{\mu _n} < {\mu _{{n^*}}}} \frac{1}{{2\Delta _n^2}}$ implies $\log \left( {2et\Delta _n^2} \right) \ge 1$, for $\gamma > {\tilde{\gamma }_n}$ we have

$$\begin{aligned} \nu \left( {\gamma - 1} \right)&> \frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}}\nonumber \\&\ge \frac{{\left( {1 + 4\alpha } \right) }}{{2\Delta _n^2}}. \end{aligned}$$

(2.24)
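A sketch of how (2.23) and (2.24) combine, which appears to be the source of the first exponential in the final bound: substituting the lower bound on $\nu \left( \gamma -1 \right) $ into the exponent gives, for every $\gamma > {\tilde{\gamma }_n}$,

$$\begin{aligned} A\left( n \right)&\le \exp \left\{ -\frac{\nu \left( \gamma -1 \right) \Delta _n^2\alpha ^2}{2} \right\} < \exp \left\{ -\frac{\left( 1+4\alpha \right) }{2\Delta _n^2}\cdot \frac{\Delta _n^2\alpha ^2}{2} \right\} \\&= \exp \left\{ -\frac{\alpha ^2\left( 1+4\alpha \right) }{4} \right\} \le \exp \left\{ -\frac{\alpha ^2\left( 1+\alpha \right) }{4} \right\} . \end{aligned}$$

Summing this over at most $\frac{\log t}{\log \left( 1+\alpha \right) }$ values of $\gamma $ bounds the first sum in the expression for $E\left[ {\gamma _n} \right] $.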

Finally, using the Chernoff–Hoeffding bound,

$$\begin{array}{l} B = \sum \limits _{\gamma> {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\sum \limits _{i \ge 0} {P\{ {{\hat{r}}^*}_{\nu \left( i \right) } + {a_{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) ,i}} \le {\mu ^*} - \frac{{\alpha {\Delta _n}}}{2}\} } } \\ \le \sum \limits _{\gamma> {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\sum \limits _{i \ge 0} {\exp \left\{ { - \frac{{\nu \left( i \right) {\alpha ^2}\Delta _n^2}}{2} - \left( {1 + \alpha } \right) \log \left[ {e\frac{{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) }}{{\nu \left( i \right) }}} \right] } \right\} } } \\ \le {\sum \limits _{i \ge 0} {\nu \left( i \right) \exp \left\{ { - \frac{{\nu \left( i \right) {\alpha ^2}\Delta _n^2}}{2}} \right\} \sum \limits _{\gamma > {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\exp \left( { - 1 - \alpha } \right) } \left[ {\frac{{\nu \left( i \right) }}{{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) }}} \right] } ^{1 + \alpha }}\\ \le \sum \limits _{i \ge 0} {\nu \left( i \right) \exp \left\{ { - \frac{{\nu \left( i \right) {\alpha ^2}\Delta _n^2}}{2}} \right\} \exp \left( { - 1 - \alpha } \right) \frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}}. \end{array}$$

From [11],

$$\begin{aligned} \sum \limits _{i \ge 0} {\nu \left( i \right) \exp \left\{ { - \frac{{\nu \left( i \right) {\alpha ^2}\Delta _n^2}}{2}} \right\} } < 1 + \frac{{11\left( {1 + \alpha } \right) }}{{5{\alpha ^2}\Delta _n^2\log \left( {1 + \alpha } \right) }} \end{aligned}$$

Then,

$$\begin{aligned} B \le \left[ {1 + \frac{{11\left( {1 + \alpha } \right) }}{{5{\alpha ^2}\Delta _n^2\log \left( {1 + \alpha } \right) }}} \right] \frac{{{e^{ - \left( {1 + \alpha } \right) }}\log t}}{{\log \left( {1 + \alpha } \right) }}. \end{aligned}$$

(2.25)
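The series bound quoted from [11] and the resulting bound (2.25) can be evaluated numerically for concrete parameters. The following is an illustrative sketch only, assuming $\nu \left( i \right) = \left\lceil \left( 1+\alpha \right) ^{i} \right\rceil $; the function names and the values of $\alpha $, $\Delta _n$ and $t$ are hypothetical.

```python
import math


def nu(i, alpha):
    """Block length function, assumed to be nu(i) = ceil((1 + alpha) ** i)."""
    return math.ceil((1.0 + alpha) ** i)


def series(alpha, delta, tol=1e-15):
    """Numerically sum nu(i) * exp(-nu(i) * alpha^2 * delta^2 / 2) over i >= 0."""
    rate = alpha ** 2 * delta ** 2 / 2.0
    total, i = 0.0, 0
    while True:
        term = nu(i, alpha) * math.exp(-nu(i, alpha) * rate)
        total += term
        # Stop once the terms are past their peak (nu * rate > 1) and negligible.
        if nu(i, alpha) * rate > 1.0 and term < tol:
            return total
        i += 1


def series_bound(alpha, delta):
    """Closed-form bound on the series quoted from [11]."""
    return 1 + 11 * (1 + alpha) / (5 * alpha ** 2 * delta ** 2 * math.log(1 + alpha))


def b_bound(t, alpha, delta):
    """Right-hand side of (2.25)."""
    return series_bound(alpha, delta) * math.exp(-(1 + alpha)) * math.log(t) / math.log(1 + alpha)


alpha, delta = 0.5, 0.5
print("series      :", series(alpha, delta))
print("series bound:", series_bound(alpha, delta))
print("bound (2.25):", b_bound(10000, alpha, delta))
```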

Combining (2.22), (2.23), (2.24), and (2.25), we get

$$\begin{array}{l} E\left[ {{\gamma _n}} \right] \le 1 + \frac{1}{{\log \left( {1 + \alpha } \right) }}\log \left[ {\frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}}} \right] + \\ \frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}\left\{ {\exp \left[ { - \frac{{{\alpha ^2}\left( {1 + \alpha } \right) }}{4}} \right] + \exp \left( { - 1 - \alpha } \right) + \frac{{11\left( {1 + \alpha } \right) \exp \left( { - 1 - \alpha } \right) }}{{5{\alpha ^2}\Delta _n^2\log \left( {1 + \alpha } \right) }}} \right\} \end{array},$$

which completes the proof. $\square $
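For orientation, the kind of UCB2-style selection rule analysed above can be sketched as follows. This is not the chapter's algorithm listing: it follows the UCB2 policy of [11], assumes the confidence term ${a_{t,r}} = \sqrt{\frac{\left( 1+\alpha \right) \log \left( et/\nu \left( r \right) \right) }{2\nu \left( r \right) }}$ and $\nu \left( r \right) = \left\lceil \left( 1+\alpha \right) ^{r} \right\rceil $, uses Bernoulli rewards, and counts a handoff whenever the selected network changes. All names and parameter values are hypothetical.

```python
import math
import random


def nu(r, alpha):
    """Block length function, assumed to be nu(r) = ceil((1 + alpha) ** r)."""
    return math.ceil((1.0 + alpha) ** r)


def ucb2_select(means, t, alpha=0.5, seed=0):
    """UCB2-style network selection; returns (slots per network, handoff count)."""
    rng = random.Random(seed)
    n_nets = len(means)
    plays = [0] * n_nets
    reward_sum = [0.0] * n_nets
    epochs = [0] * n_nets                  # r_n: blocks completed on network n
    handoffs, current, slot = 0, None, 0

    def use(n, count):
        """Stay on network n for up to `count` slots."""
        nonlocal handoffs, current, slot
        for _ in range(count):
            if slot >= t:
                return
            if current is not None and current != n:
                handoffs += 1              # a handoff occurs only when the network changes
            current = n
            reward_sum[n] += 1.0 if rng.random() < means[n] else 0.0
            plays[n] += 1
            slot += 1

    for n in range(n_nets):                # initialisation: try every network once
        use(n, 1)

    while slot < t:
        def index(n):
            tau = nu(epochs[n], alpha)
            bonus = math.sqrt((1 + alpha) * math.log(math.e * slot / tau) / (2 * tau))
            return reward_sum[n] / plays[n] + bonus
        best = max(range(n_nets), key=index)
        # Block length grows geometrically with the number of blocks already spent on `best`.
        use(best, nu(epochs[best] + 1, alpha) - nu(epochs[best], alpha))
        epochs[best] += 1
    return plays, handoffs


plays, handoffs = ucb2_select(means=[0.9, 0.2], t=5000)
print("slots per network:", plays, "handoffs:", handoffs)
```

Because the block length grows with each revisit, the number of handoffs grows only with the number of blocks rather than with the number of slots spent on a network.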


About this chapter

Cite this chapter

Du, Z., Jiang, B., Wu, Q., Xu, Y., Xu, K. (2020). Learning the Optimal Network with Handoff Constraint: MAB RL Based Network Selection. In: Towards User-Centric Intelligent Network Selection in 5G Heterogeneous Wireless Networks. Springer, Singapore. https://doi.org/10.1007/978-981-15-1120-2_2


DOI: https://doi.org/10.1007/978-981-15-1120-2_2
Published: 07 November 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1119-6
Online ISBN: 978-981-15-1120-2
eBook Packages: Engineering, Engineering (R0)
