
D2D Resource Allocation Based on Reinforcement Learning and QoS

Mobile Networks and Applications

Abstract

Device-to-device (D2D) communication is intended to improve the overall performance of fifth-generation (5G) wireless networks by lowering latency and increasing data rates and system capacity. System capacity can be improved further by letting D2D user equipments (DUEs) reuse the resources of cellular user equipments (CUEs), provided that this reuse does not cause harmful interference to the CUEs. A D2D resource allocation scheme should allow each CUE to be allocated a variable number of resource blocks (RBs) and allow each RB to be reused by more than one DUE. In this study, the Multi-Player Multi-Armed Bandit (MPMAB) reinforcement learning framework is employed to model this problem, establishing a preference matrix that supports greedy resource allocation. A fair resource allocation scheme is then proposed and shown to achieve fairness, prevent the waste of resources, and alleviate starvation. Moreover, the scheme performs better when the number of D2D pairs is relatively small.
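The idea of learning a preference matrix with a bandit can be illustrated with a small simulation. This is a minimal sketch, not the authors' MPMAB algorithm. It makes several assumptions: each DUE is treated as a player and each RB as an arm, each preference entry is a UCB1 index (empirical mean reward plus an exploration bonus), and rewards are Bernoulli successes drawn from made-up probabilities `true_means`. Interference between DUEs that pick the same RB is deliberately ignored, so each player learns independently. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def ucb_preference_matrix(counts: List[List[int]], sums: Matrix, t: int) -> Matrix:
    """UCB1 preference matrix: entry [d][r] scores RB r for DUE d at round t.

    counts[d][r] is how often DUE d has used RB r; sums[d][r] is the total
    reward it observed there. Untried pairs score +inf so each is tried once.
    """
    pref: Matrix = []
    for d in range(len(counts)):
        row = []
        for r in range(len(counts[d])):
            n = counts[d][r]
            if n == 0:
                row.append(math.inf)
            else:
                mean = sums[d][r] / n
                row.append(mean + math.sqrt(2.0 * math.log(t) / n))
        pref.append(row)
    return pref


def learn_preferences(true_means: Matrix, rounds: int = 2000,
                      seed: int = 0) -> Tuple[List[List[int]], Matrix]:
    """Each round, every DUE plays the RB with the highest UCB1 index.

    true_means[d][r] is a hypothetical success probability (reward 1 or 0).
    DUEs do not interfere with each other here (simplifying assumption).
    Returns the per-pair play counts and the empirical mean rewards.
    """
    rng = random.Random(seed)
    num_due, num_rb = len(true_means), len(true_means[0])
    counts = [[0] * num_rb for _ in range(num_due)]
    sums = [[0.0] * num_rb for _ in range(num_due)]
    for t in range(1, rounds + 1):
        pref = ucb_preference_matrix(counts, sums, t)
        for d in range(num_due):
            # Greedy choice on the current preference row of DUE d.
            r = max(range(num_rb), key=lambda j: pref[d][j])
            reward = 1.0 if rng.random() < true_means[d][r] else 0.0
            counts[d][r] += 1
            sums[d][r] += reward
    means = [[sums[d][r] / counts[d][r] if counts[d][r] else 0.0
              for r in range(num_rb)] for d in range(num_due)]
    return counts, means


if __name__ == "__main__":
    # Hypothetical success probabilities: 3 DUEs x 3 RBs.
    true_means = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.2], [0.2, 0.1, 0.85]]
    counts, means = learn_preferences(true_means)
    for d, row in enumerate(counts):
        print(f"DUE {d}: plays per RB {row}, "
              f"estimated means {[round(m, 2) for m in means[d]]}")
```

After 2,000 rounds, each DUE plays its best RB most often, and the empirical means are most accurate on the RBs it plays frequently. A full MPMAB formulation would also have to model the interference that arises when several DUEs reuse one RB.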


Figures: Fig. 1–17; Algorithms 1–3 (not reproduced)


Data Availability

Data supporting this study are openly available from https://drive.google.com/drive/folders/1GNvIUmLMD1CsgaDsDdPvaic80I7IHNp4?usp=sharing.


Acknowledgements

This research was supported by the Ministry of Science and Technology of Taiwan under grants No. 108-2221-E-197-009 and No. 108-2221-E-197-011.

Author information


Correspondence to Jieh-Ren Chang.


Appendix Pseudocode of MPMAB_GRA scheme

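The MPMAB_GRA pseudocode is not available in this text. As a rough illustration of greedy allocation from a preference matrix under the constraints named in the abstract (a variable number of RBs per user and RB reuse by more than one DUE), the sketch below compares two passes. The first is a plain greedy pass. The second is a round-based variant that serves the DUE holding the fewest RBs first, which is one way to counter starvation. This is a hypothetical stand-in, not the authors' scheme. The names `greedy_allocate`, `fair_allocate`, `demand`, and `max_reuse` are assumptions, as are the fewest-RBs-first ordering and the use of Jain's fairness index as the comparison metric.

```python
from typing import List

Matrix = List[List[float]]


def greedy_allocate(pref: Matrix, demand: List[int], max_reuse: int) -> List[List[int]]:
    """Assign (DUE, RB) pairs in descending order of preference.

    pref[d][r]: preference of DUE d for RB r (e.g. learned bandit estimates).
    demand[d]: RBs wanted by DUE d; max_reuse: max DUEs allowed on one RB.
    """
    num_due, num_rb = len(pref), len(pref[0])
    pairs = sorted(((pref[d][r], d, r) for d in range(num_due)
                    for r in range(num_rb)), reverse=True)
    alloc: List[List[int]] = [[] for _ in range(num_due)]
    users = [0] * num_rb  # how many DUEs currently reuse each RB
    for _, d, r in pairs:
        if len(alloc[d]) < demand[d] and users[r] < max_reuse:
            alloc[d].append(r)
            users[r] += 1
    return alloc


def fair_allocate(pref: Matrix, demand: List[int], max_reuse: int) -> List[List[int]]:
    """Round-based variant: each round, unsatisfied DUEs (fewest RBs first)
    take one best remaining RB each, until no DUE can make progress."""
    num_due, num_rb = len(pref), len(pref[0])
    alloc: List[List[int]] = [[] for _ in range(num_due)]
    users = [0] * num_rb
    progress = True
    while progress:
        progress = False
        for d in sorted(range(num_due), key=lambda k: len(alloc[k])):
            if len(alloc[d]) >= demand[d]:
                continue
            options = [r for r in range(num_rb)
                       if users[r] < max_reuse and r not in alloc[d]]
            if not options:
                continue
            r = max(options, key=lambda j: pref[d][j])
            alloc[d].append(r)
            users[r] += 1
            progress = True
    return alloc


def utilities(pref: Matrix, alloc: List[List[int]]) -> List[float]:
    """Total preference value collected by each DUE."""
    return [sum(pref[d][r] for r in alloc[d]) for d in range(len(alloc))]


def jain_index(x: List[float]) -> float:
    """Jain's fairness index (sum x)^2 / (n * sum x^2); all-zero counts as equal."""
    sq = sum(v * v for v in x)
    return (sum(x) ** 2) / (len(x) * sq) if sq > 0 else 1.0


if __name__ == "__main__":
    pref = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4], [0.3, 0.2, 0.1]]
    demand, max_reuse = [2, 2, 2], 2  # each RB may be shared by two DUEs
    for name, fn in (("greedy", greedy_allocate), ("fair", fair_allocate)):
        alloc = fn(pref, demand, max_reuse)
        print(name, alloc, round(jain_index(utilities(pref, alloc)), 3))
```

With this example matrix, plain greedy gives DUE 2 only one of its two requested RBs and leaves one reuse slot on RB 2 unused. The round-based variant satisfies every demand, and its Jain index is higher (about 0.75 versus 0.68). With `max_reuse=1` and a demand of three RBs each, plain greedy hands all RBs to DUE 0 and starves the others.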

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kuo, FC., Wang, HC., Tseng, CC. et al. D2D Resource Allocation Based on Reinforcement Learning and QoS. Mobile Netw Appl 28, 1076–1095 (2023). https://doi.org/10.1007/s11036-023-02145-3


  • DOI: https://doi.org/10.1007/s11036-023-02145-3
