Abstract
In this chapter, we introduce a reinforcement learning (RL)-based approach to jointly optimize cell load balance and network throughput as a potential AI/ML-based use case for sixth-generation (6G) cellular systems, where inter-cell handover and massive MIMO antenna tilting are configured as the RL policy to learn. Our rationale for using RL is to circumvent the challenges of analytically modeling user mobility and network dynamics. We integrate vector rewards into multiple value networks and conduct RL actions via a separate policy network. We name this method Pareto deterministic policy gradients (PDPG). It is an actor-critic, model-free, deterministic-policy algorithm that handles the coupled objectives with two merits: (1) it solves the optimization by leveraging the degrees of freedom of the vector reward rather than relying on a handcrafted scalar reward; (2) cross-validation over multiple policies can be significantly reduced. To be self-contained, an ideal static optimization-based brute-force search solver is included as a benchmark. The comparison shows that the RL approach performs as well as this ideal strategy, even though the former is constrained to limited environment observations and a lower action frequency, whereas the latter has full access to the user mobility.
A Comparison: A Static Formulation
We now consider a static formulation of the joint optimization on load balancing and throughput maximization. To do so, we directly drop the expectation and time constraint in (11) and set \(\gamma = 0\). Therefore, we have the following formulation:
Comparing the above formulation to (11), we note the following differences and practical limitations:
- Solving this static problem requires perfect knowledge of all \({p_{n,k}(b_n(t))}\) at every sample time, which imposes a large user-feedback overhead.
- Due to the integer constraints, the computational complexity is very high. Moreover, the BSs must solve for \({\boldsymbol I}(t)\) and \({\boldsymbol b}(t)\) in every time slot.
- The formulation treats user association as a variable. However, the user association cannot be directly translated into the CIO values of A3 events, so it is not compatible with the handover operations of current cellular systems.
Therefore, we use the above formulation only as an evaluation baseline for our RL algorithm in the simulation. Ideally, the static formulation yields the optimal solution at every time instant and can thus serve as an upper bound for the RL algorithm.
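To make the combinatorial blow-up concrete, an exhaustive search over user associations must score all \(N^K\) candidate assignments for \(N\) BSs and \(K\) users. The toy sketch below is our illustration, not the chapter's code: the `utility` callback is a hypothetical stand-in for the weighted load-balance/throughput objective.

```python
from itertools import product

def brute_force_association(rates, utility):
    # Toy exhaustive search over all N**K user-association vectors.
    # rates[n][k]: rate of user k when served by BS n (assumed layout);
    # `utility` is a hypothetical stand-in for the weighted
    # load-balance/throughput objective, not the chapter's exact one.
    n_bs, n_ue = len(rates), len(rates[0])
    best, best_val = None, float("-inf")
    for assoc in product(range(n_bs), repeat=n_ue):  # N**K candidates
        val = utility(assoc, rates)
        if val > best_val:
            best, best_val = assoc, val
    return best
```

Even for N = 5 BSs and K = 30 users this loop would visit \(5^{30} \approx 10^{21}\) candidates, which is why the relaxed heuristic solvers are needed.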
Algorithm 1 Relaxed brute-force for a small λ
A.1 Heuristic Brute-Force Solvers
Note that (19) can be equivalently written as
where \(\phi \) is a parameter that avoids trivial solutions, such as all users being disconnected from the BSs (in which case all cell loads are zero). Since we do not have an analytical expression for the objective, brute-force search is our primary approach. However, the number of user-association combinations is prohibitively large. To narrow the search region, we consider a heuristic approach to approximate
for a given \({\boldsymbol b}(t)\). We determine the user association in a round-robin manner: each BS is allocated an equal number of associated users, where the users assigned to each BS are chosen by ranking the user rates. Intuitively, this can be seen as a heuristic way to average the throughput \(R_n({\boldsymbol I}(t), {\boldsymbol b}(t))\) over all cells. The procedure is summarized in Algorithm 1.
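A minimal sketch of this round-robin pass, under the assumption that per-user rates for every BS are available as a matrix (variable names and layout are ours, not the chapter's):

```python
import numpy as np

def round_robin_association(rates):
    # Round-robin heuristic (a sketch of Algorithm 1's inner pass):
    # BSs take turns picking their best-ranked remaining user, so each
    # BS ends up with an (almost) equal number of associated users.
    # rates[n, k]: rate of user k if served by BS n (assumed layout).
    n_bs, n_ue = rates.shape
    assoc = np.full(n_ue, -1, dtype=int)
    unassigned = set(range(n_ue))
    n = 0
    while unassigned:
        k = max(unassigned, key=lambda u: rates[n, u])  # best user for BS n
        assoc[k] = n
        unassigned.remove(k)
        n = (n + 1) % n_bs  # next BS picks
    return assoc
```

With 2 BSs and 4 users, each BS is assigned exactly 2 users, which is the equal-load allocation the heuristic targets.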
Algorithm 2 Relaxed brute-force for a fair λ
Alternatively, the inner loop for user association assignment can be considered as solving
A heuristic approach is to assign each user to the BS with the maximum transmission power. This association strategy may break the cell load balance, but it avoids failed user links. We therefore combine the two heuristic association strategies to obtain Algorithm 2. Specifically, the two strategies are mixed through a random binary decision whose threshold is proportional to the weight ratio between the two objectives, i.e., \({\lambda \over {1+\lambda }}\).
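The mixing step can be sketched as follows, where the max-power branch is taken with probability \(\lambda/(1+\lambda)\). This is a hedged illustration with assumed names and array layouts, not the chapter's implementation:

```python
import numpy as np

def mixed_association(rates, powers, lam, rng=None):
    # Mixed heuristic (a sketch of Algorithm 2; names are assumptions).
    # With probability lam / (1 + lam) every user is assigned to the BS
    # with maximum transmission power (protects weak links); otherwise a
    # load-balancing pass lets BSs take turns picking their best user.
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < lam / (1.0 + lam):
        return np.argmax(powers, axis=0)  # max-power association
    # load-balancing branch: round-robin over BSs by user-rate ranking
    n_bs, n_ue = rates.shape
    assoc = np.full(n_ue, -1, dtype=int)
    unassigned = set(range(n_ue))
    n = 0
    while unassigned:
        k = max(unassigned, key=lambda u: rates[n, u])
        assoc[k] = n
        unassigned.remove(k)
        n = (n + 1) % n_bs
    return assoc
```

A large λ (throughput weighted lightly relative to load balance, or vice versa depending on the objective's weighting convention) pushes the decision toward the max-power branch; λ = 0 always takes the load-balancing branch.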
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Zhou, Z., Xin, Y., Chen, H., Zhang, C., Liu, L., Yang, K. (2024). Pareto Deterministic Policy Gradients and Its Application in 6G Networks. In: Lin, X., Zhang, J., Liu, Y., Kim, J. (eds) Fundamentals of 6G Communications and Networking. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-37920-8_23
DOI: https://doi.org/10.1007/978-3-031-37920-8_23
Print ISBN: 978-3-031-37919-2
Online ISBN: 978-3-031-37920-8