Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems

Huang, Yingzhi; Zhang, Zhaoyang; Che, Jingze; Yang, Zhaohui; Yang, Qianqian; Wong, Kai-Kit

doi:10.1007/s11432-022-3542-6

Research Paper
Published: 22 May 2023

Volume 66, article number 162304, (2023)

Science China Information Sciences

Yingzhi Huang¹, Zhaoyang Zhang¹, Jingze Che¹, Zhaohui Yang¹, Qianqian Yang¹ & Kai-Kit Wong²

Abstract

Machine learning (ML) is increasingly used across wireless communication system design. Within it, reinforcement learning (RL)-based approaches have attracted considerable research attention because they interact with the environment directly and learn efficiently from the collected experience. In this paper, we propose a novel and efficient RL-based multi-beam combining scheme for future millimeter-wave (mmWave) three-dimensional (3D) multi-input multi-output (MIMO) communication systems. The proposed scheme requires neither perfect channel state information (CSI) nor precise user location information, both of which are generally difficult to obtain in practice. It also addresses the computational complexity incurred by the extremely large state and action spaces associated with multiple users, multiple paths, and multiple 3D beams. In particular, a self-attention deep deterministic policy gradient (DDPG)-based beam selection and combination framework is proposed to adaptively learn the 3D beamforming pattern without CSI. We aim to maximize the sum-rate of the mmWave 3D-MIMO system by optimizing the serving beam set and the corresponding combining weights for each user. To this end, a transformer is incorporated into the DDPG to obtain global information about the input elements and capture the signal directions precisely, which leads to a near-optimal beamformer design. Simulation results verify that the proposed self-attention DDPG outperforms conventional ML-based beamforming schemes in terms of sum-rate under various scenarios.
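To make the optimization target concrete, the following is a minimal sketch of how a sum-rate could be evaluated for a given choice of serving beam sets and combining weights. It is not the authors' system model, which is not given in the abstract. It assumes a downlink with single-antenna users, a uniform linear array with a DFT beam codebook (a 3D-MIMO system would use a planar array), and unit-power beamformers. The function names `dft_beam`, `combine_beams`, and `sum_rate` and all numerical values are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft_beam(n_antennas, index):
    """Unit-norm DFT steering vector for a uniform linear array (assumed codebook)."""
    scale = 1.0 / math.sqrt(n_antennas)
    return [scale * cmath.exp(2j * math.pi * index * n / n_antennas)
            for n in range(n_antennas)]


def combine_beams(codebook, beam_set, weights):
    """Weighted sum of the selected codebook beams, normalised to unit power."""
    n = len(codebook[0])
    f = [sum(w * codebook[b][i] for b, w in zip(beam_set, weights))
         for i in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in f))
    return [x / norm for x in f]


def inner(h, f):
    """Hermitian inner product h^H f."""
    return sum(hi.conjugate() * fi for hi, fi in zip(h, f))


def sum_rate(channels, beamformers, noise_power):
    """Sum over users of log2(1 + SINR), treating other users' beams as interference."""
    total = 0.0
    for k, h in enumerate(channels):
        signal = abs(inner(h, beamformers[k])) ** 2
        interference = sum(abs(inner(h, f)) ** 2
                           for j, f in enumerate(beamformers) if j != k)
        total += math.log2(1.0 + signal / (interference + noise_power))
    return total


# Toy scenario (all values hypothetical): 8 antennas, 2 users.
codebook = [dft_beam(8, i) for i in range(8)]
# User 0 has two equal-gain paths aligned with beams 1 and 2; user 1 has one path on beam 5.
h0 = [a + b for a, b in zip(codebook[1], codebook[2])]
h1 = codebook[5]
channels = [h0, h1]

# Serving user 0 with a single beam versus combining two beams with equal weights.
single = [combine_beams(codebook, [1], [1.0]),
          combine_beams(codebook, [5], [1.0])]
multi = [combine_beams(codebook, [1, 2], [1.0, 1.0]),
         combine_beams(codebook, [5], [1.0])]

rate_single = sum_rate(channels, single, noise_power=1.0)
rate_multi = sum_rate(channels, multi, noise_power=1.0)
print(rate_single, rate_multi)
```

In this toy case, combining the two beams that match user 0's paths gives a higher sum-rate than serving that user with one beam. In the RL setting described in the abstract, a function of this kind would play the role of the reward, while the beam sets and weights would come from the learned policy rather than being set by hand.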

Acknowledgements

This work was supported in part by the National Key R&D Program of China (Grant Nos. 2020YFB1807101, 2018YFB1801104) and the National Natural Science Foundation of China (Grant Nos. 61725104, U20A20158, 61922071).

Author information

Authors and Affiliations

College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
Yingzhi Huang, Zhaoyang Zhang, Jingze Che, Zhaohui Yang & Qianqian Yang
Department of Electronic and Electrical Engineering, University College London, London, WC1E 6BT, UK
Kai-Kit Wong

Corresponding author

Correspondence to Zhaoyang Zhang.

About this article

Cite this article

Huang, Y., Zhang, Z., Che, J. et al. Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems. Sci. China Inf. Sci. 66, 162304 (2023). https://doi.org/10.1007/s11432-022-3542-6

Received: 20 February 2022
Revised: 19 May 2022
Accepted: 01 July 2022
Published: 22 May 2023
DOI: https://doi.org/10.1007/s11432-022-3542-6
