Meeting Dynamic User Demand with Handoff Cost Awareness: MDP-RL-Based Network Handoff
This chapter addresses network selection under dynamic user demand. As traffic types evolve, user demands change over time, so the optimal network may differ from one demand to another. Because each network handoff incurs a cost, learning the optimal matching between user demands and networks must trade off QoE reward against handoff cost over the long-term learning process. We formulate this problem as a Markov decision process (MDP) in which the handoff cost is quantified as a loss in QoE reward. By exploiting prior information about the problem structure, we propose two efficient Q-learning-based algorithms. Simulation results show that the proposed algorithms effectively solve the online matching of dynamic user demand to networks with handoff cost awareness.