Learning the Optimal Network with Handoff Constraint: MAB RL Based Network Selection

Towards User-Centric Intelligent Network Selection in 5G Heterogeneous Wireless Networks

Abstract

The core issue of network selection is choosing the optimal network among the available network access points (NAPs) of heterogeneous wireless networks (HWN). Many previous works evaluate networks in an ideal environment: they generally assume that the network state information (NSI) is known and static. However, because traffic load and radio channel conditions vary, the NSI can be dynamic or even unavailable to the user in a realistic HWN environment, so most existing network selection algorithms cannot work effectively. Learning-based algorithms can cope with uncertain and dynamic NSI, but they commonly need sufficient samples of each option, which results in an unbearable handoff cost. Therefore, this chapter formulates the network selection problem as a multi-armed bandit (MAB) problem and designs two RL-based network selection algorithms with special consideration for reducing the network handoff cost. We prove that the proposed algorithms achieve order-optimal performance, i.e., logarithmic-order regret with limited network handoff cost. Simulation results indicate that, compared with existing algorithms, the two algorithms significantly reduce the network handoff cost while improving transmission performance.

References

  1. Fernandes S, Karmouch A (2012) Vertical mobility management architectures in wireless networks: a comprehensive survey and future directions. IEEE Commun Surv Tutor 14(1):45–63

  2. Niyato D, Hossain E (2009) Dynamics of network selection in heterogeneous wireless networks: an evolutionary game approach. IEEE Trans Veh Technol 58(4):2008–2017

  3. Tabrizi H, Farhadi G, Cioffi J (2011) A learning-based network selection method in heterogeneous wireless systems. In: IEEE global telecommunications conference (GLOBECOM)

  4. Zhang Y, Yuan Y, Zhou J et al (2009) A weighted bipartite graph based network selection scheme for multi-flows in heterogeneous wireless network. In: IEEE global telecommunications conference (GLOBECOM)

  5. Stevens-Navarro E, Lin Y, Wong VWS (2008) An MDP-based vertical handoff decision algorithm for heterogeneous wireless networks. IEEE Trans Veh Technol 57(2):1243–1254

  6. Stevens-Navarro E, Wong VWS (2008) A constrained MDP-based vertical handoff decision algorithm for 4G wireless networks. In: IEEE international conference on communications (ICC)

  7. Wu T, Jing H, Yu X et al (2008) Cost-aware handover decision algorithm for cooperative cellular relaying networks. In: IEEE vehicular technology conference (VTC)

  8. Wang L, Binet D (2009) Best permutation: a novel network selection scheme in heterogeneous wireless networks. In: International conference on wireless communications and mobile computing (IWCMC)

  9. Hou J, O’Brien DC (2006) Vertical handover decision making algorithm using fuzzy logic for the integrated radio-and-OW system. IEEE Trans Wirel Commun 5(1):176–185

  10. Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv Appl Math 6:4–22

  11. Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47:235–256

  12. Agrawal R, Teneketzis D, Anantharam V (1988) Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Trans Autom Control 33(10):899–906

  13. Chen L, Iellamo S, Coupechoux M (2011) Opportunistic spectrum access with channel switching cost for cognitive radio networks. In: IEEE international conference on communications (ICC)

  14. Du Z, Wu Q, Yang P (2016) Learning with handoff cost constraint for network selection in heterogeneous wireless network. Wirel Commun Mob Comput 16(4):441–458

  15. Zhao T, Liu Q, Chen CW (2017) QoE in video transmission: a user experience-driven strategy. IEEE Commun Surv Tutor 19(1):285–302

  16. Quoc-Thinh N, Agoulmine N, Cherkaoui EH et al (2013) Multicriteria optimization of access selection to improve the quality of experience in heterogeneous wireless access networks. IEEE Trans Veh Technol 62(4):1785–1800

  17. ITU-T (2003) One-way transmission time. Rec. G.114

Author information

Corresponding author

Correspondence to Zhiyong Du.

Appendix

1.1 Appendix 1: Proof of Theorem 2.1

Proof

Given the slot index t, the block number in the block UCB1 based algorithm is \(K=\left\lceil \frac{t}{m} \right\rceil \), with m slots per block. Let \({{\gamma }_{n}}\left( K \right) \) denote the number of blocks in which network n is selected during the first K blocks. Then, for any positive integer l,

$$\begin{array}{l} {\gamma _n}\left( K \right) = 1 + \sum \limits _{k = N + 1}^K {I\left\{ {\delta \left( {km} \right) = n} \right\} } \\ \le l + \sum \limits _{k = N + 1}^K {I\left\{ {\delta \left( {km} \right) = n,{\gamma _n}\left( {k - 1} \right) \ge l} \right\} } \\ \le l + \sum \limits _{k = N + 1}^K {I\left\{ {{{\hat{r}}_{{n^*}}}\left( k \right) + \sqrt{\frac{{2m\log k}}{{{\gamma _{{n^*}}}\left( k \right) }}} } \right. } \le \\ \quad \left. {{{\hat{r}}_n}\left( k \right) + \sqrt{\frac{{2m\log k}}{{{\gamma _n}\left( k \right) }}} ,{\gamma _n}\left( {k - 1} \right) \ge l} \right\} \\ = l + \sum \limits _{k = N + 1}^K {I\left\{ {\frac{{{{\hat{r}}_{{n^*}}}\left( k \right) }}{m} + \sqrt{\frac{{2\log k}}{{m{\gamma _{{n^*}}}\left( k \right) }}} } \right. } \le \\ \quad \left. {\frac{{{{\hat{r}}_n}\left( k \right) }}{m} + \sqrt{\frac{{2\log k}}{{m{\gamma _n}\left( k \right) }}} ,{\gamma _n}\left( {k - 1} \right) \ge l} \right\} \end{array}$$

Note that \(\frac{{{{\hat{r}}}_{n}}\left( k \right) }{m}\) is the sample mean of \({{r}_{n}}\). Based on the Chernoff–Hoeffding bound [11], we obtain

$$\begin{aligned} E\left[ {{\gamma }_{n}}\left( K \right) \right]&\le \frac{8\log K}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3} \nonumber \\&\le \frac{8\log \left( \frac{t+m}{m} \right) }{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3} \nonumber \\&=\frac{8\log \left( t+m \right) -8\log m}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3} \nonumber \\&<\frac{8\log t}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3} \end{aligned}$$
(2.16)
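The last two steps of (2.16) are stated without comment. A short check, assuming a block length \(m \ge 2\) and a slot index \(t > 2\) (conditions that the text does not state explicitly): the second line uses \(K=\left\lceil \frac{t}{m} \right\rceil \le \frac{t+m}{m}\), and the final strict inequality follows from

$$\begin{aligned} \log \left( \frac{t+m}{m} \right)< \log t \iff t+m< mt \iff t > \frac{m}{m-1}, \end{aligned}$$

where \(\frac{m}{m-1} \le 2\) for every \(m \ge 2\). For \(m = 1\) this step does not go through, since \(\log \left( t+1 \right) > \log t\).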

The handoff cost bound is

$$\begin{aligned}&E\left[ {H_{block}{{\left( t \right) }}} \right] \le E\left[ {\sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {{{c'}_n}{\gamma _n}\left( K \right) } } \right] \quad + {{c'}_{{n^*}}}E\left[ {\sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {{\gamma _n}\left( K \right) } } \right] \nonumber \\&= \sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {{{c'}_n}} E\left[ {{\gamma _n}\left( K \right) } \right] + {{c'}_{{n^*}}}\sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {E\left[ {{\gamma _n}\left( K \right) } \right] } \nonumber \\&\le \sum \limits _{n:{\mu _n} < {\mu _{{n^*}}}} {\left( {{{c'}_{{n^*}}} + {{c'}_n}} \right) \left[ \frac{8\log t}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3}\right] } \end{aligned}$$
(2.17)

where \({\Delta _n} = {\mu _{{n^*}}} - {\mu _n}\), \({c'_{{n^*}}} = \mathop {\max }\limits _{k} {c_{k,{n^*}}}\) is the maximal handoff cost for a handoff to the optimal network \({{n}^{*}}\), and \(K = \left\lceil {\frac{t}{m}} \right\rceil \) is the block number for t. The bound corresponds to the worst case, in which any two suboptimal network selections are separated by one selection of the optimal network, so that every suboptimal block incurs one handoff into network n and one handoff back to \({{n}^{*}}\).
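To get a feel for how the block length m enters (2.17), the right-hand side can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function name `handoff_cost_bound`, the list `c_max` (holding the maximal handoff costs \(c'_n\)), and the example values are all hypothetical.

```python
import math


def handoff_cost_bound(t, m, means, c_max):
    """Evaluate the right-hand side of the handoff cost bound (2.17).

    t     : slot index
    m     : block length in slots
    means : mean rewards mu_n, one per network (hypothetical values)
    c_max : c_max[n] is the maximal handoff cost c'_n into network n
    """
    best = max(range(len(means)), key=lambda n: means[n])
    total = 0.0
    for n, mu in enumerate(means):
        if mu < means[best]:                      # sum over suboptimal networks only
            delta = means[best] - mu              # gap Delta_n
            blocks = 8 * math.log(t) / (m * delta ** 2) + 1 + math.pi ** 2 / 3
            total += (c_max[best] + c_max[n]) * blocks
    return total


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(m, handoff_cost_bound(10_000, m, [0.2, 0.5, 0.9], [1.0, 1.0, 1.0]))
```

Only the logarithmic term shrinks with m; the constant \(1+\frac{\pi ^2}{3}\) per suboptimal network remains, so the bound does not vanish as m grows.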

The expected regret can be bounded as

$$\begin{aligned} E\left[ {{R_{block}}\left( t \right) } \right]&\le \sum \limits _{n:{\mu _n}< {\mu _{{n^*}}}} {m{\Delta _n}E\left[ {{\gamma _n}\left( K \right) } \right] } + \phi E\left[ {{H_{block}}\left( t \right) } \right] \nonumber \\&\le \sum \limits _{n:{\mu _n} < {\mu _{{n^*}}}} {\left( {m{\Delta _n} + \phi {{c'}_{{n^*}}} + \phi {{c'}_n}} \right) \left[ {\frac{8\log t}{m\Delta _{n}^{2}}+1+\frac{{{\pi }^{2}}}{3}} \right] } \end{aligned}$$
(2.18)

This completes the proof.   \(\square \)
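The quantities in this proof can be observed in a small simulation. The sketch below is an illustration under stated assumptions, not the chapter's algorithm listing: it assumes Bernoulli rewards, one initial block per network, and the index \(\frac{\hat{r}_n(k)}{m} + \sqrt{\frac{2\log k}{m\gamma _n}}\) taken from the proof above. The function name `block_ucb1` and all parameter values are hypothetical.

```python
import math
import random


def block_ucb1(means, m, horizon, seed=0):
    """Simulate a block-based UCB1 policy on Bernoulli networks.

    means   : hypothetical mean rewards mu_n, one per network
    m       : block length in slots (the network is fixed within a block)
    horizon : total number of slots t
    Returns (blocks_per_network, number_of_handoffs).
    """
    rng = random.Random(seed)
    n_nets = len(means)
    num_blocks = math.ceil(horizon / m)      # K = ceil(t / m)
    gamma = [0] * n_nets                     # gamma_n: blocks spent on network n
    reward_sum = [0.0] * n_nets              # total reward collected on network n
    handoffs = 0
    previous = None

    for k in range(1, num_blocks + 1):
        if k <= n_nets:
            choice = k - 1                   # initialisation: one block per network
        else:
            def index(n):
                # Per-slot sample mean plus the exploration bonus from the proof.
                mean = reward_sum[n] / (m * gamma[n])
                return mean + math.sqrt(2 * math.log(k) / (m * gamma[n]))
            choice = max(range(n_nets), key=index)

        if previous is not None and choice != previous:
            handoffs += 1                    # handoffs can only occur at block boundaries
        previous = choice

        # Stay on the chosen network for the whole block.
        for _ in range(m):
            reward_sum[choice] += 1.0 if rng.random() < means[choice] else 0.0
        gamma[choice] += 1

    return gamma, handoffs


if __name__ == "__main__":
    gamma, handoffs = block_ucb1([0.2, 0.5, 0.9], m=5, horizon=5000, seed=1)
    print("blocks per network:", gamma, "handoffs:", handoffs)
```

Every handoff either enters or leaves a suboptimal block, so the handoff count never exceeds twice the number of suboptimal blocks, which is the worst case used in (2.17).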

1.2 Appendix 2: Proof of Theorem 2.2

Proof

Similarly, the handoff cost and regret of the UCB2 based algorithm satisfy

$$\begin{aligned} E\left[ {{H}_{UCB2}}\left( t \right) \right]&\le E\left[ \sum \limits _{n:{{\mu }_{n}}<{{\mu }_{{{n}^{*}}}}}{{{{{c}'}}_{n}}{{\gamma }_{n}}} \right] +{{{{c}'}}_{{{n}^{*}}}}E\left[ \sum \limits _{n:{{\mu }_{n}}<{{\mu }_{{{n}^{*}}}}}{{{\gamma }_{n}}} \right] \nonumber \\&=\sum \limits _{n:{{\mu }_{n}}<{{\mu }_{{{n}^{*}}}}}{\left( {{{{c}'}}_{n}}+{{{{c}'}}_{{{n}^{*}}}} \right) }E\left[ {{\gamma }_{n}} \right] \end{aligned}$$
(2.19)
$$\begin{aligned} E\left[ {{R_{UCB2}}\left( t \right) } \right] \le \sum \limits _{n:{\mu _n} < {\mu _{{n^*}}}} {\left( {{\mu ^*} -{\mu _n}} \right) E\left[ {{\gamma _n}} \right] } + \phi E\left[ {{H_{UCB2}}\left( t \right) } \right] , \end{aligned}$$
(2.20)

where \({c'_n} = \mathop {\max }\limits _{k \in N} {c_{k,n}}\) is the maximal handoff cost when the selected network is n. We bound \({{H}_{UCB2}}\left( t \right) \) and \({{R}_{UCB2}}\left( t \right) \) by finding the bound for \(E\left[ {{\gamma _n}} \right] ,n \ne {n^*}\), which is the expected number of blocks in which network n is selected. For any \(t \ge \mathop {\max }\limits _{{\mu _n} < {\mu _{{n^*}}}} \frac{1}{{2\Delta _n^2}}\), \({\Delta _n} = {\mu _{{n^*}}} - {\mu _n}\), denote \({{\tilde{\gamma }}_{n}}\) as the largest integer satisfying

$$\begin{aligned} \nu \left( {{{\tilde{\gamma }}_n} - 1} \right) \le \frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}} \end{aligned}$$

By the definition of \(\nu \left( \tau \right) \), we have \(\nu \left( {{\gamma }_{n}} \right) \le t\) for all n, which gives

$$\begin{aligned} {\gamma _n} \le \frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}. \end{aligned}$$
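This step can be checked directly, assuming \(\nu \left( \tau \right) = \left\lceil {{\left( 1+\alpha \right) }^{\tau }} \right\rceil \), which is the form used later in this proof:

$$\begin{aligned} {{\left( 1+\alpha \right) }^{{{\gamma }_{n}}}} \le \left\lceil {{\left( 1+\alpha \right) }^{{{\gamma }_{n}}}} \right\rceil = \nu \left( {{\gamma }_{n}} \right) \le t \quad \Longrightarrow \quad {{\gamma }_{n}}\log \left( 1+\alpha \right) \le \log t. \end{aligned}$$

Dividing by \(\log \left( 1+\alpha \right) > 0\) yields the bound on \({{\gamma }_{n}}\) stated above.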

Therefore,

$$\begin{aligned} {{\gamma }_{n}}&\le 1+\sum \limits _{\gamma>1}^{\frac{\log t}{\log \left( 1+\alpha \right) }}{I\{\text {network }n\text { has started }\gamma \text {th block}\}} \nonumber \\&\le {{{\tilde{\gamma }}}_{n}}+\sum \limits _{\gamma >{{{\tilde{\gamma }}}_{n}}}^{\frac{\log t}{\log \left( 1+\alpha \right) }}{I\{\text {network }n\text { has started }\gamma \text {th block}\}}. \end{aligned}$$
(2.21)

According to [11], the event “network n has started \(\gamma \)-th block” implies \(\exists i\ge 0\) such that at least one of the following two events holds:

$$\begin{aligned} {\hat{r}_{n,\nu \left( {\gamma - 1} \right) }} + {a_{t,\gamma - 1}} \ge {\mu ^*} - \frac{{\alpha {\Delta _n}}}{2} \end{aligned}$$
$$\begin{aligned} {\hat{r}^*}_{\nu \left( i \right) } + {a_{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) ,i}} \le {\mu ^*} - \frac{{\alpha {\Delta _n}}}{2} \end{aligned}$$

Thus,

$$\begin{array}{l} E\left[ {{\gamma _n}} \right] \le {{\tilde{\gamma }}_n} + \sum \limits _{\gamma> {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {P\{ {{\hat{r}}_{n,\nu \left( {\gamma - 1} \right) }}+\! {a_{t,\gamma - 1}} \ge {\mu _{{n^*}}} - \frac{{\alpha {\Delta _n}}}{2}\} } \\ + \sum \limits _{\gamma > {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\sum \limits _{i \ge 0} {P\{ {{\hat{r}}^*}_{\nu \left( i \right) } + {a_{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) ,i}} \le {\mu _{{n^*}}} - \frac{{\alpha {\Delta _n}}}{2}\} } } \end{array}$$

In the following, we bound \(E\left[ {{\gamma }_{n}} \right] \) by bounding its components. Since \(\nu \left( {{{\tilde{\gamma }}_n} - 1} \right) \le \frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}}\) and \({\left( {1 + \alpha } \right) ^{{{\tilde{\gamma }}_n} - 1}} \le \left\lceil {{{\left( {1 + \alpha } \right) }^{{{\tilde{\gamma }}_n} - 1}}} \right\rceil = \nu \left( {{{\tilde{\gamma }}_n} - 1} \right) \), we have

$$\begin{aligned} {\tilde{\gamma }_n} \le 1 + \frac{1}{{\log \left( {1 + \alpha } \right) }}\log \left[ {\frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}}} \right] . \end{aligned}$$
(2.22)

Denote

$$\begin{aligned} A\left( n \right) = P\{ {\hat{r}_{n,\nu \left( {\gamma - 1} \right) }} + {a_{t,\gamma - 1}} \ge {\mu _{{n^*}}} - \frac{{\alpha {\Delta _n}}}{2}\}, \end{aligned}$$
$$\begin{aligned} B = \sum \limits _{\gamma > {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\sum \limits _{i \ge 0} {P\{ {{\hat{r}}^*}_{\nu \left( i \right) } + {a_{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) ,i}} \le {\mu _{{n^*}}} - \frac{{\alpha {\Delta _n}}}{2}\} } } \end{aligned}$$

According to [11],

$$\begin{aligned} A\left( n \right)&=P\{{{{\hat{r}}}_{n,\nu \left( \gamma -1 \right) }}+{{a}_{t,\gamma -1}}\ge {{\mu }_{n}}+{{\Delta }_{n}}-\frac{\alpha {{\Delta }_{n}}}{2}\} \nonumber \\&\le \exp \left\{ -\nu \left( \gamma -1 \right) \Delta _{n}^{2}{{\alpha }^{2}}/2 \right\} \end{aligned}$$
(2.23)

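In addition, for \(\gamma > {{\tilde{\gamma }}_{n}}\) the definition of \({{\tilde{\gamma }}_{n}}\) gives the first inequality below, and \(t \ge \mathop {\max }\limits _{{\mu _n} < {\mu _{{n^*}}}} \frac{1}{{2\Delta _n^2}}\) implies \(\log \left( {2et\Delta _n^2} \right) \ge 1\), which gives the second. Thus,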

$$\begin{aligned} \nu \left( {\gamma - 1} \right)&> \frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}}\nonumber \\&\ge \frac{{\left( {1 + 4\alpha } \right) }}{{2\Delta _n^2}}. \end{aligned}$$
(2.24)
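How (2.23) and (2.24) combine is left implicit. One way to fill in the step, as a sketch that uses only \(\alpha > 0\):

$$\begin{aligned} A\left( n \right) \le \exp \left\{ -\frac{\nu \left( \gamma -1 \right) \Delta _n^2{{\alpha }^{2}}}{2} \right\}< \exp \left\{ -\frac{\left( 1+4\alpha \right) {{\alpha }^{2}}}{4} \right\} \le \exp \left\{ -\frac{\left( 1+\alpha \right) {{\alpha }^{2}}}{4} \right\} . \end{aligned}$$

Summing over the at most \(\frac{\log t}{\log \left( 1+\alpha \right) }\) values of \(\gamma \) then presumably accounts for the first exponential term inside the braces of the final bound on \(E\left[ {{\gamma }_{n}} \right] \).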

Finally, using the Chernoff–Hoeffding bound,

$$\begin{array}{l} B = \sum \limits _{\gamma> {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\sum \limits _{i \ge 0} {P\{ {{\hat{r}}^*}_{\nu \left( i \right) } + {a_{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) ,i}} \le {\mu ^*} - \frac{{\alpha {\Delta _n}}}{2}\} } } \\ \le \sum \limits _{\gamma> {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\sum \limits _{i \ge 0} {\exp \left\{ { - \frac{{\nu \left( i \right) {\alpha ^2}\Delta _n^2}}{2} - \left( {1 + \alpha } \right) \log \left[ {e\frac{{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) }}{{\nu \left( i \right) }}} \right] } \right\} } } \\ \le {\sum \limits _{i \ge 0} {\nu \left( i \right) \exp \left\{ { - \frac{{\nu \left( i \right) {\alpha ^2}\Delta _n^2}}{2}} \right\} \sum \limits _{\gamma > {{\tilde{\gamma }}_n}}^{\frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}} {\exp \left( { - 1 - \alpha } \right) } \left[ {\frac{{\nu \left( i \right) }}{{\nu \left( {\gamma - 1} \right) + \nu \left( i \right) }}} \right] } ^{1 + \alpha }}\\ \le \sum \limits _{i \ge 0} {\nu \left( i \right) \exp \left\{ { - \frac{{\nu \left( i \right) {\alpha ^2}\Delta _n^2}}{2}} \right\} \exp \left( { - 1 - \alpha } \right) \frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}}. \end{array}$$

From [11],

$$\begin{aligned} \sum \limits _{i \ge 0} {\nu \left( i \right) \exp \left\{ { - \frac{{\nu \left( i \right) {\alpha ^2}\Delta _n^2}}{2}} \right\} } < 1 + \frac{{11\left( {1 + \alpha } \right) }}{{5{\alpha ^2}\Delta _n^2\log \left( {1 + \alpha } \right) }}. \end{aligned}$$

Then,

$$\begin{aligned} B \le \left[ {1 + \frac{{11\left( {1 + \alpha } \right) }}{{5{\alpha ^2}\Delta _n^2\log \left( {1 + \alpha } \right) }}} \right] \frac{{{e^{ - \left( {1 + \alpha } \right) }}\log t}}{{\log \left( {1 + \alpha } \right) }}. \end{aligned}$$
(2.25)

Combining (2.22)–(2.25), we get

$$\begin{array}{l} E\left[ {{\gamma _n}} \right] \le 1 + \frac{1}{{\log \left( {1 + \alpha } \right) }}\log \left[ {\frac{{\left( {1 + 4\alpha } \right) \log \left( {2et\Delta _n^2} \right) }}{{2\Delta _n^2}}} \right] + \\ \frac{{\log t}}{{\log \left( {1 + \alpha } \right) }}\left\{ {\exp \left[ { - \frac{{{\alpha ^2}\left( {1 + \alpha } \right) }}{4}} \right] + \exp \left( { - 1 - \alpha } \right) + \frac{{11\left( {1 + \alpha } \right) \exp \left( { - 1 - \alpha } \right) }}{{5{\alpha ^2}\Delta _n^2\log \left( {1 + \alpha } \right) }}} \right\} \end{array},$$

which completes the proof.   \(\square \)
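The final bound is easier to interpret when evaluated numerically. The following is a minimal sketch under stated assumptions: \(\nu \left( \tau \right) = \left\lceil {{\left( 1+\alpha \right) }^{\tau }} \right\rceil \) as used in the proof, and hypothetical function names `nu` and `expected_blocks_bound` with example parameter values chosen only for illustration.

```python
import math


def nu(tau, alpha):
    """Epoch-length function nu(tau) = ceil((1 + alpha) ** tau), as used in the proof."""
    return math.ceil((1 + alpha) ** tau)


def expected_blocks_bound(t, delta, alpha):
    """Evaluate the right-hand side of the final bound on E[gamma_n].

    t     : slot index, required to satisfy t >= 1 / (2 * delta**2)
    delta : gap Delta_n between the optimal and the n-th mean reward
    alpha : UCB2 parameter, alpha > 0
    """
    if t < 1 / (2 * delta ** 2):
        raise ValueError("the bound is stated for t >= 1 / (2 * delta^2)")
    log1a = math.log(1 + alpha)
    # First part: the bound (2.22) on gamma_tilde_n.
    inner = (1 + 4 * alpha) * math.log(2 * math.e * t * delta ** 2) / (2 * delta ** 2)
    first = 1 + math.log(inner) / log1a
    # Second part: the three terms in braces, multiplied by log t / log(1 + alpha).
    braces = (math.exp(-alpha ** 2 * (1 + alpha) / 4)
              + math.exp(-1 - alpha)
              + 11 * (1 + alpha) * math.exp(-1 - alpha)
              / (5 * alpha ** 2 * delta ** 2 * log1a))
    return first + math.log(t) / log1a * braces


if __name__ == "__main__":
    for t in (10 ** 3, 10 ** 4, 10 ** 5):
        print(t, expected_blocks_bound(t, delta=0.5, alpha=0.5))
```

Because every term depends on t only through \(\log t\) or \(\log \log t\), the evaluated bound grows logarithmically in t, which is what the handoff cost and regret bounds (2.19) and (2.20) inherit.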

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Du, Z., Jiang, B., Wu, Q., Xu, Y., Xu, K. (2020). Learning the Optimal Network with Handoff Constraint: MAB RL Based Network Selection. In: Towards User-Centric Intelligent Network Selection in 5G Heterogeneous Wireless Networks. Springer, Singapore. https://doi.org/10.1007/978-981-15-1120-2_2

  • DOI: https://doi.org/10.1007/978-981-15-1120-2_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1119-6

  • Online ISBN: 978-981-15-1120-2

  • eBook Packages: Engineering, Engineering (R0)
