Meeting Dynamic User Demand with Handoff Cost Awareness: MDP RL Based Network Handoff

  • Zhiyong Du
  • Bin Jiang
  • Qihui Wu
  • Yuhua Xu
  • Kun Xu


This chapter focuses on network selection under dynamic user demand. The evolution of traffic types implies that user demand varies over time, and each demand type may favor a different optimal network. Because network handoff incurs a cost, learning the optimal matching between user demand and networks must trade off QoE reward against handoff cost over the long-term learning process. We formulate this problem as a Markov decision process (MDP), in which the handoff cost is quantified as a loss in QoE reward. By exploiting prior information about the problem structure, we propose two efficient Q-learning-based algorithms. Simulation results indicate that the proposed algorithms effectively solve the online matching between dynamic user demand and networks with handoff cost awareness.
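The formulation above can be illustrated with a minimal tabular Q-learning sketch. All specifics here are illustrative assumptions, not the chapter's actual model: two hypothetical demand types, three candidate networks, an invented QoE table, and a fixed handoff cost subtracted from the reward whenever the agent switches networks. The state is (current demand, current network) and the action is the network to use next; demand evolves as a simple two-state Markov chain.

```python
import random

# Hypothetical setup: 2 demand types, 3 candidate networks.
# QoE values and the handoff cost are illustrative placeholders.
QOE = {0: [0.9, 0.4, 0.5],   # under demand 0, network 0 is best
       1: [0.3, 0.8, 0.5]}   # under demand 1, network 1 is best
HANDOFF_COST = 0.2           # QoE loss charged when switching networks
STAY_PROB = 0.9              # prob. that the demand type stays the same

def q_learning(steps=20000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    # State = (demand, current network); one Q-value per candidate network.
    Q = {(d, n): [0.0] * 3 for d in (0, 1) for n in range(3)}
    demand, net = 0, 0
    for _ in range(steps):
        s = (demand, net)
        # Epsilon-greedy action selection over candidate networks.
        if rng.random() < eps:
            a = rng.randrange(3)
        else:
            a = max(range(3), key=lambda k: Q[s][k])
        # Reward: QoE of the chosen network, minus handoff cost on a switch.
        r = QOE[demand][a] - (HANDOFF_COST if a != net else 0.0)
        # Demand evolves as a two-state Markov chain; the network follows a.
        demand = demand if rng.random() < STAY_PROB else 1 - demand
        net = a
        s2 = (demand, net)
        # Standard Q-learning update.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    return Q

Q = q_learning()
# Learned greedy choice when already on the demand-optimal network.
print(max(range(3), key=lambda k: Q[(0, 0)][k]))
print(max(range(3), key=lambda k: Q[(1, 1)][k]))
```

With these toy numbers the learned policy stays on the demand-optimal network once there, and pays the handoff cost only when the demand type changes and the QoE gain on the new network outweighs the switching loss.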



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Zhiyong Du (1)
  • Bin Jiang (1)
  • Qihui Wu (2)
  • Yuhua Xu (3)
  • Kun Xu (1)
  1. National University of Defense Technology, Changsha, China
  2. Nanjing University of Aeronautics and Astronautics, Nanjing, China
  3. Army Engineering University of PLA, Nanjing, China
