Wireless Networks, Volume 25, Issue 8, pp 5057–5068

Actor-critic deep learning for efficient user association and bandwidth allocation in dense mobile networks with green base stations

  • Quang Vinh Do
  • Insoo Koo
Article

Abstract

In this paper, we introduce an efficient user-association and bandwidth-allocation scheme based on an actor-critic deep learning framework for downlink data transmission in dense mobile networks. In such networks, small cells are densely deployed within a single macrocell and share the same spectrum band with the macrocell. The small-cell base stations are also called green base stations, since they are powered solely by solar-energy harvesters. We therefore propose an actor-critic deep learning (ACDL) algorithm that maximizes long-term network performance while adhering to constraints on harvested energy and spectrum sharing. To this end, the agent of the ACDL algorithm seeks an optimal user-association and bandwidth-allocation policy by interacting with the network environment. We first formulate the optimization problem as a Markov decision process, through which the agent learns the evolution of the environment via trial-and-error experience. We then use deep neural networks to model the policy function and the value function in the actor and the critic of the agent, respectively. The actor selects an action based on the output of the policy network, while the critic uses the output of the value network to help the actor evaluate the action taken. Numerical results demonstrate that the proposed algorithm can enhance network performance in the long run.
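To make the actor-critic structure described above concrete, the sketch below shows a minimal one-step actor-critic update in PyTorch. It is only an illustration, not the authors' ACDL implementation: the state and action dimensions (STATE_DIM, N_ACTIONS), layer sizes, learning rates, and the dummy environment interaction are hypothetical placeholders, and the paper's harvested-energy and spectrum-sharing constraints are omitted.

    # Minimal actor-critic sketch (PyTorch). NOT the paper's ACDL algorithm:
    # all dimensions and hyperparameters below are hypothetical placeholders.
    import torch
    import torch.nn as nn
    import torch.optim as optim

    STATE_DIM = 8    # hypothetical: e.g., battery levels and channel states
    N_ACTIONS = 16   # hypothetical: discretized association/bandwidth choices
    GAMMA = 0.99     # discount factor for long-term network performance

    class Actor(nn.Module):
        """Policy network: maps a state to a probability over actions."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM, 64), nn.ReLU(),
                nn.Linear(64, N_ACTIONS), nn.Softmax(dim=-1))

        def forward(self, s):
            return self.net(s)

    class Critic(nn.Module):
        """Value network: estimates the state value V(s)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM, 64), nn.ReLU(),
                nn.Linear(64, 1))

        def forward(self, s):
            return self.net(s)

    actor, critic = Actor(), Critic()
    opt_actor = optim.Adam(actor.parameters(), lr=1e-3)
    opt_critic = optim.Adam(critic.parameters(), lr=1e-3)

    def update(state, action, reward, next_state):
        """One actor-critic step: the critic's TD error scores the actor's action."""
        value = critic(state)
        with torch.no_grad():
            td_target = reward + GAMMA * critic(next_state)
        td_error = td_target - value              # advantage estimate

        # Critic update: regress V(state) toward the one-step TD target.
        critic_loss = td_error.pow(2).mean()
        opt_critic.zero_grad()
        critic_loss.backward()
        opt_critic.step()

        # Actor update: policy gradient weighted by the critic's evaluation.
        log_prob = torch.log(actor(state)[action])
        actor_loss = -(td_error.detach() * log_prob).mean()
        opt_actor.zero_grad()
        actor_loss.backward()
        opt_actor.step()

    # Hypothetical one-step interaction with a dummy environment:
    s = torch.randn(STATE_DIM)                    # observed network state
    a = torch.multinomial(actor(s), 1).item()     # actor samples an action
    r, s_next = 1.0, torch.randn(STATE_DIM)       # dummy reward and next state
    update(s, a, r, s_next)

The critic's temporal-difference error plays the evaluation role the abstract describes: a positive error makes the just-taken action more likely under the policy, and a negative one makes it less likely.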

Keywords

Actor-critic · Deep learning · Energy harvesting · Resource allocation

Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) under Grant NRF-2018R1A2B6001714.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Electrical Engineering, University of Ulsan, Ulsan, Republic of Korea
