Abstract
Machine learning (ML) is increasingly used across wireless communication system design. Among ML methods, reinforcement learning (RL)-based approaches have attracted considerable research attention because they interact with the environment directly and learn efficiently from the collected experience. In this paper, we propose a novel and efficient RL-based multi-beam combining scheme for future millimeter-wave (mmWave) three-dimensional (3D) multi-input multi-output (MIMO) communication systems. The proposed scheme requires neither perfect channel state information (CSI) nor precise user location information, both of which are generally difficult to obtain in practice, and it addresses the crucial challenge of computational complexity incurred by the extremely large state and action spaces associated with multiple users, multiple paths, and multiple 3D beams. In particular, a self-attention deep deterministic policy gradient (DDPG)-based beam selection and combination framework is proposed to adaptively learn the 3D beamforming pattern without CSI. We aim to maximize the sum-rate of the mmWave 3D-MIMO system by optimizing the serving beam set and the corresponding combining weights for each user. To this end, a transformer is incorporated into the DDPG to capture global information across the input elements and to identify the signal directions precisely, which leads to a near-optimal beamformer design. Simulation results verify the superiority of the proposed self-attention DDPG over conventional ML-based beamforming schemes in terms of sum-rate under various scenarios.
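To make the stated objective concrete, the following is a minimal sketch of how a sum-rate could be evaluated for a given beam set and combining weights. It is not the paper's implementation. It assumes single-antenna users, a uniform linear array with a DFT beam codebook (the paper considers 3D beams), and channels that are known for evaluation purposes only. The function names `dft_codebook`, `combine_beams`, and `sum_rate` are hypothetical.

```python
import numpy as np

def dft_codebook(n_ant, n_beams):
    """Illustrative DFT codebook: each column is one beam with unit norm."""
    k = np.arange(n_ant)[:, None]      # antenna index
    b = np.arange(n_beams)[None, :]    # beam index
    return np.exp(-2j * np.pi * k * b / n_beams) / np.sqrt(n_ant)

def combine_beams(codebook, beam_idx, weights):
    """Weighted sum of the selected beams, normalised to unit power."""
    f = codebook[:, beam_idx] @ np.asarray(weights, dtype=complex)
    return f / np.linalg.norm(f)

def sum_rate(H, F, noise_power):
    """Sum over users of log2(1 + SINR).

    H: (users, antennas) effective channel rows.
    F: (antennas, users) beamformers, column u serves user u.
    """
    G = np.abs(H @ F) ** 2                  # G[u, v]: power at user u from beam v
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal   # power leaked from other users' beams
    return float(np.sum(np.log2(1.0 + signal / (interference + noise_power))))

# Example: two users, each served by a weighted pair of beams (assumed values).
rng = np.random.default_rng(0)
cb = dft_codebook(n_ant=8, n_beams=8)
H = (rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8))) / np.sqrt(2)
F = np.stack([combine_beams(cb, [0, 1], [0.8, 0.6]),
              combine_beams(cb, [4, 5], [0.6, 0.8])], axis=1)
rate = sum_rate(H, F, noise_power=0.1)
```

In an RL formulation of this kind, the beam indices and weights would be the agent's action and a quantity such as `rate` would be the reward; how the paper defines its state, action, and reward is not specified in the abstract.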
Acknowledgements
This work was supported in part by the National Key R&D Program of China (Grant Nos. 2020YFB1807101, 2018YFB1801104) and the National Natural Science Foundation of China (Grant Nos. 61725104, U20A20158, 61922071).
Cite this article
Huang, Y., Zhang, Z., Che, J. et al. Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems. Sci. China Inf. Sci. 66, 162304 (2023). https://doi.org/10.1007/s11432-022-3542-6
DOI: https://doi.org/10.1007/s11432-022-3542-6