Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems

  • Research Paper
Science China Information Sciences

Abstract

Machine learning (ML) has been empowering all aspects of wireless communication system design. Among ML techniques, reinforcement learning (RL)-based approaches have attracted considerable research attention because they interact with the environment directly and learn efficiently from the collected experience. In this paper, we propose a novel and efficient RL-based multi-beam combining scheme for future millimeter-wave (mmWave) three-dimensional (3D) multi-input multi-output (MIMO) communication systems. The proposed scheme requires neither perfect channel state information (CSI) nor precise user location information, both of which are generally difficult to obtain in practice, and it effectively addresses the crucial challenge of computational complexity incurred by the extremely large state and action spaces associated with multiple users, multiple paths, and multiple 3D beams. In particular, we propose a self-attention deep deterministic policy gradient (DDPG)-based beam selection and combining framework that adaptively learns the 3D beamforming pattern without CSI. We aim to maximize the sum-rate of the mmWave 3D-MIMO system by optimizing the serving beam set and the corresponding combining weights for each user. To this end, a transformer is incorporated into the DDPG to capture global information across the input elements and resolve the signal directions precisely, which leads to a near-optimal beamformer design. Simulation results verify that the proposed self-attention DDPG outperforms conventional ML-based beamforming schemes in terms of sum-rate under various scenarios.
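
The abstract describes the pipeline only at a high level. As a rough illustration, the following NumPy sketch shows one forward pass of a self-attention actor that turns beam-sweep measurements into a serving-beam set and combining weights, and then scores the result with a sum-rate reward. Everything here is an assumption, not the authors' design: the 4 x 8 planar array with a Kronecker DFT codebook, the few-path random channel, per-beam input tokens built from received beam power (so no full CSI enters the actor), a single attention head with random untrained weights, top-L beam selection, real nonnegative combining weights, and independent processing per user. In a full DDPG setup, a critic network would estimate the value of each action and the actor would be trained through the critic's gradient; the hard top-L selection would also need a differentiable relaxation (for example Gumbel-softmax) to be trainable end to end.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return z / np.sum(z, axis=axis, keepdims=True)

def dft_matrix(n):
    # Unitary n x n DFT matrix; each column is one 1D beam direction.
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def upa_codebook(nv, nh):
    # Assumed 3D codebook for an nv x nh planar array: Kronecker product of
    # vertical and horizontal DFT beams (orthonormal columns).
    return np.kron(dft_matrix(nv), dft_matrix(nh))

def steering(n, u):
    # Half-wavelength linear-array response for spatial frequency u in [-1, 1).
    return np.exp(1j * np.pi * np.arange(n) * u) / np.sqrt(n)

def random_channel(nv, nh, n_paths, rng):
    # Hypothetical few-path mmWave channel with random complex path gains.
    h = np.zeros(nv * nh, dtype=complex)
    for _ in range(n_paths):
        g = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        h += g * np.kron(steering(nv, rng.uniform(-1, 1)), steering(nh, rng.uniform(-1, 1)))
    return h

def beam_features(h, codebook, nv, nh):
    # One token per beam from a beam sweep: standardized log-power plus grid position.
    p_db = 10 * np.log10(np.abs(codebook.conj().T @ h) ** 2 + 1e-12)
    p_db = (p_db - p_db.mean()) / (p_db.std() + 1e-9)
    iv, ih = np.divmod(np.arange(nv * nh), nh)
    return np.stack([p_db, iv / nv, ih / nh], axis=1)  # shape (B, 3)

class SelfAttentionActor:
    # Hypothetical single-head self-attention actor with random, untrained weights.
    def __init__(self, d_in, d_model, rng):
        self.w_in = rng.standard_normal((d_in, d_model)) / np.sqrt(d_in)
        self.wq, self.wk, self.wv = (
            rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3))
        self.w_out = rng.standard_normal(d_model) / np.sqrt(d_model)

    def act(self, tokens, n_select, noise_std=0.0, rng=None):
        x = tokens @ self.w_in                                  # (B, d_model)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))           # (B, B), rows sum to 1
        scores = (x + attn @ v) @ self.w_out                    # residual, one score per beam
        if noise_std > 0:
            # Additive Gaussian exploration noise, a common choice with DDPG.
            scores = scores + noise_std * rng.standard_normal(scores.shape)
        chosen = np.argsort(scores)[-n_select:]                 # serving-beam set (top-L)
        alpha = softmax(scores[chosen])                         # nonnegative combining weights
        return chosen, alpha, attn

def combine(codebook, chosen, alpha):
    # Per-user beamformer: weighted sum of the selected beams, unit transmit power.
    w = codebook[:, chosen] @ alpha
    return w / np.linalg.norm(w)

def sum_rate(H, W, noise_power):
    # g[k, j] = |h_k^H w_j|^2; SINR_k = g[k, k] / (sum_{j != k} g[k, j] + noise_power).
    g = np.abs(H.conj() @ W) ** 2
    signal = np.diag(g)
    interference = g.sum(axis=1) - signal
    return float(np.sum(np.log2(1 + signal / (interference + noise_power))))

rng = np.random.default_rng(0)
nv, nh, n_users, n_select = 4, 8, 3, 2
F = upa_codebook(nv, nh)
H = np.stack([random_channel(nv, nh, 3, rng) for _ in range(n_users)])
actor = SelfAttentionActor(d_in=3, d_model=16, rng=rng)
W = np.zeros((nv * nh, n_users), dtype=complex)
for k in range(n_users):
    chosen, alpha, attn = actor.act(beam_features(H[k], F, nv, nh), n_select,
                                    noise_std=0.1, rng=rng)
    W[:, k] = combine(F, chosen, alpha)
reward = sum_rate(H, W, noise_power=1e-2)
print(f"sum-rate reward: {reward:.3f} bit/s/Hz")
```

Because the weights are random, the printed sum-rate reflects a single untrained policy only; the point of the sketch is the data flow from beam measurements through self-attention to the action (beam set plus weights) and the sum-rate reward that an RL agent would maximize.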

Acknowledgements

This work was supported in part by National Key R&D Program of China (Grant Nos. 2020YFB1807101, 2018YFB1801104) and National Natural Science Foundation of China (Grant Nos. 61725104, U20A20158, 61922071).

Author information

Corresponding author

Correspondence to Zhaoyang Zhang.

About this article

Cite this article

Huang, Y., Zhang, Z., Che, J. et al. Self-attention reinforcement learning for multi-beam combining in mmWave 3D-MIMO systems. Sci. China Inf. Sci. 66, 162304 (2023). https://doi.org/10.1007/s11432-022-3542-6

  • DOI: https://doi.org/10.1007/s11432-022-3542-6