
Pareto Deterministic Policy Gradients and Its Application in 6G Networks

Chapter in: Fundamentals of 6G Communications and Networking

Part of the book series: Signals and Communication Technology (SCT)


Abstract

In this chapter, we introduce a reinforcement learning (RL)-based approach to jointly optimize cell load balance and network throughput as a potential AI/ML-based use case for sixth-generation (6G) cellular systems, where inter-cell handover and massive MIMO antenna tilting are configured as the RL policy to learn. Our rationale for using RL is to circumvent the challenges of analytically modeling user mobility and network dynamics. We integrate vector rewards into multiple value networks and take RL actions via a separate policy network. We name this method Pareto deterministic policy gradients (PDPG). It is an actor-critic, model-free, deterministic policy algorithm that handles the coupled objectives with two merits: (1) it solves the optimization by leveraging the degrees of freedom of the vector reward, as opposed to choosing a handcrafted scalar reward; and (2) cross-validation over multiple policies can be significantly reduced. To be self-contained, an ideal static optimization-based brute-force search solver is included as the benchmark method. The comparison shows that the RL approach performs as well as this ideal strategy, even though the RL agent is constrained to limited environment observations and a lower action frequency, whereas the benchmark has full access to the user mobility.
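The core architectural idea of the abstract, one value network (critic) per reward component feeding a single deterministic actor, can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under assumed state/action dimensions and an unweighted sum of the critics in the actor update; it omits the target networks, replay buffer, and exploration noise of a full DDPG-style agent and is not the chapter's exact PDPG implementation.

```python
# Minimal multi-critic / single-actor sketch in the spirit of PDPG (PyTorch).
# Dimensions, layer sizes, and the unweighted combination of critics are
# illustrative assumptions, not the chapter's exact algorithm.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, N_OBJ = 32, 8, 2   # assumed sizes (e.g., per-cell loads; CIO/tilt actions)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

actor = mlp(STATE_DIM, ACTION_DIM)                                # deterministic policy mu(s)
critics = [mlp(STATE_DIM + ACTION_DIM, 1) for _ in range(N_OBJ)]  # one value network per reward component
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def update(s, a, r_vec, s_next, gamma=0.99):
    """One gradient step on a batch of transitions with an N_OBJ-dimensional vector reward."""
    with torch.no_grad():
        a_next = actor(s_next)
        targets = [r_vec[:, i:i + 1] + gamma * critics[i](torch.cat([s_next, a_next], dim=-1))
                   for i in range(N_OBJ)]
    # Critic updates: each value network regresses its own reward component.
    for i in range(N_OBJ):
        loss = ((critics[i](torch.cat([s, a], dim=-1)) - targets[i]) ** 2).mean()
        critic_opts[i].zero_grad(); loss.backward(); critic_opts[i].step()
    # Actor update: ascend the combined critic values (unweighted sum here).
    a_pi = actor(s)
    actor_loss = -sum(c(torch.cat([s, a_pi], dim=-1)).mean() for c in critics)
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Example usage with a random batch of 64 transitions:
B = 64
update(torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM),
       torch.randn(B, N_OBJ), torch.randn(B, STATE_DIM))
```

Keeping one critic per objective preserves the vector reward, so different trade-offs between load balance and throughput can be explored without retraining against a newly handcrafted scalar reward.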



Author information

Correspondence to Zhou Zhou.

A Comparison: A Static Formulation

We now consider a static formulation of the joint optimization of load balancing and throughput maximization. To do so, we directly drop the expectation and the time constraint in (11) and set \(\gamma = 0\), which gives the following formulation:

$$\displaystyle \begin{aligned} {} \begin{aligned} &\max_{{\boldsymbol I}(t), {\boldsymbol b}(t) } \sum_n R_n(t) + \sum_n F_n(t)\\ &s.t. \quad \boldsymbol{I}(t)=\{I_{n, k}(t): I_{n,k}(t) \in \{0, 1\}, n \in N, k \in K\}\\ &\qquad {\boldsymbol b}(t) = \{b_{n}(t):b_{n}(t) \in \{\theta_0, \theta_1, \cdots, \theta_{M-1} \}, n \in N\}.\\ \end{aligned} \end{aligned} $$
(19)

Comparing this formulation to (11), we note the following differences and practical limitations:

  • Solving the static problem requires perfect knowledge of all \({p_{n,k}(b_n(t))}\) at every sampling time, which imposes a large user-feedback overhead.

  • Due to the integer constraints, the complexity is very high. Moreover, the BSs must solve for \({\boldsymbol I}(t)\) and \({\boldsymbol b}(t)\) in every time slot.

  • The formulation treats the user association as an optimization variable. However, the user association cannot be directly translated into the CIO values of A3 events, so it is not compatible with the handover operations of current cellular systems.

Therefore, we use the above formulation only to evaluate our RL algorithm in the simulations. Ideally, the static formulation yields the optimal solution at every time instant and can thus serve as an upper bound for the RL algorithm.

Algorithm 1 Relaxed brute-force for a small λ

A.1 Heuristic Brute-Force Solvers

Note that (19) can be equivalently written as

$$\displaystyle \begin{aligned} {} \begin{aligned} &\max_{{\boldsymbol I}(t), {\boldsymbol b}(t) } \sum_n F_n({\boldsymbol I}(t), {\boldsymbol b}(t))\\ &s.t. \quad \boldsymbol{I}(t)=\{I_{n, k}(t): I_{n,k}(t) \in \{0, 1\}, n \in N, k \in K\}\\ &\qquad {\boldsymbol b}(t) = \{b_{n}(t):b_{n}(t) \in \{\theta_0, \theta_1, \cdots, \theta_{M-1} \}, n \in N\}\\ &\qquad R_n({\boldsymbol I}(t), {\boldsymbol b}(t)) > \phi, n \in N \end{aligned} \end{aligned} $$
(20)

where \(\phi \) is a parameter introduced to avoid trivial solutions, such as all users being disconnected from the BSs (in which case all cell loads are zero). Since we do not have an analytical expression for the objective, brute-force search is our primary approach. However, the number of user association combinations is prohibitively large. To narrow the search region, we consider a heuristic approach to approximate

$$\displaystyle \begin{aligned} \max_{{\boldsymbol I}(t) } \sum_n F_n({\boldsymbol I}(t), {\boldsymbol b}(t)) \end{aligned} $$
(21)

for a given \({\boldsymbol b}(t)\). We determine the user association in a round-robin manner: each BS is allocated an equal number of users, and the users associated with each BS are chosen based on a ranking of their rates. Intuitively, this is a heuristic way to balance the throughput \(R_n({\boldsymbol I}(t), {\boldsymbol b}(t))\) across cells. The procedure is summarized in Algorithm 1.
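As a rough illustration of this round-robin rule, the sketch below assigns users given an assumed rate matrix rate[n, k] (the achievable rate of user k at BS n under the current tilt vector \({\boldsymbol b}(t)\)); the function name and the exact ordering and tie-breaking are illustrative and do not reproduce Algorithm 1 verbatim.

```python
# Equal-share association heuristic: each BS receives roughly K/N users,
# picking its best remaining user by rate in round-robin order over the BSs.
import numpy as np

def equal_share_association(rate: np.ndarray) -> np.ndarray:
    """Return I[n, k] in {0, 1} assigning each user to exactly one BS."""
    N, K = rate.shape
    I = np.zeros((N, K), dtype=int)
    unassigned = set(range(K))
    quota = int(np.ceil(K / N))          # equal share of users per BS
    for _ in range(quota):
        for n in range(N):               # round-robin over BSs
            if not unassigned:
                break
            # BS n takes its highest-rate user among those still unassigned
            k = max(unassigned, key=lambda u: rate[n, u])
            I[n, k] = 1
            unassigned.remove(k)
    return I
```

An outer loop would then enumerate candidate tilt vectors \({\boldsymbol b}(t)\), apply this association for each candidate, and keep the best-scoring pair, which is the "relaxed" brute-force search over tilts only.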

Algorithm 2 Relaxed brute-force for a fair λ

Alternatively, the inner loop for the user association assignment can be viewed as solving

$$\displaystyle \begin{aligned} \max_{{\boldsymbol I}(t) } \sum_n R_n({\boldsymbol I}(t), {\boldsymbol b}(t)) \end{aligned} $$
(22)

A heuristic approach is to assign each user to the BS with the maximum transmission power. This association strategy may break the cell load balance but avoids link failures. We can therefore combine the two heuristic association strategies to obtain Algorithm 2. In particular, the two strategies are mixed through a random binary decision whose threshold is proportional to the weight ratio between the two objectives, i.e., \({\lambda \over {1+\lambda }}\).
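One possible reading of this mixing step is sketched below. Here power[n, k] is assumed to be the power user k receives from BS n, the coin is flipped independently per user, and users drawn with probability \(\lambda/(1+\lambda)\) follow the strongest-BS rule while the rest are handled by equal_share_association() from the previous sketch; the per-user granularity and which rule gets which side of the coin are illustrative assumptions rather than the chapter's exact specification.

```python
# Mixed association: strongest-BS rule vs. equal-share rule, chosen per user
# by a biased coin with threshold lambda / (1 + lambda).
import numpy as np

def mixed_association(rate, power, lam, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    N, K = rate.shape
    I = np.zeros((N, K), dtype=int)
    p_strongest = lam / (1.0 + lam)                  # decision threshold from the weight ratio
    use_strongest = rng.random(K) < p_strongest      # one coin flip per user
    for k in np.flatnonzero(use_strongest):          # throughput-oriented rule
        I[np.argmax(power[:, k]), k] = 1
    balanced = np.flatnonzero(~use_strongest)        # load-balance-oriented rule
    if balanced.size:
        I[:, balanced] = equal_share_association(rate[:, balanced])
    return I

# Example: 4 BSs, 20 users, lambda = 0.5
rng = np.random.default_rng(0)
rate = rng.random((4, 20)); power = rng.random((4, 20))
I = mixed_association(rate, power, lam=0.5, rng=rng)
```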


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Zhou, Z., Xin, Y., Chen, H., Zhang, C., Liu, L., Yang, K. (2024). Pareto Deterministic Policy Gradients and Its Application in 6G Networks. In: Lin, X., Zhang, J., Liu, Y., Kim, J. (eds) Fundamentals of 6G Communications and Networking. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-37920-8_23


  • DOI: https://doi.org/10.1007/978-3-031-37920-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37919-2

  • Online ISBN: 978-3-031-37920-8

  • eBook Packages: Engineering, Engineering (R0)
