Abstract
Device-to-device (D2D) communication is designed to improve the overall performance of fifth-generation (5G) wireless networks, including latency, data rate, and system capacity. System capacity can be improved further by letting D2D user equipments (DUEs) reuse the resources of cellular user equipments (CUEs), provided the reuse does not cause harmful interference to the CUEs. A D2D resource allocation scheme should allow one CUE to be allocated a variable number of resource blocks (RBs) and allow each RB to be reused by more than one DUE. In this study, the Multi-Player Multi-Armed Bandit (MPMAB) reinforcement learning scheme is employed to model this problem by establishing a preference matrix that facilitates greedy resource allocation. A fair resource allocation scheme is then proposed and shown to achieve fairness, prevent waste of resources, and alleviate starvation. Moreover, this scheme performs better when the number of D2D pairs is not too large.
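The idea of a preference matrix driving greedy allocation can be illustrated with a minimal sketch. This is not the MPMAB_GRA scheme of the article: the function names, the exponentially weighted update, the per-RB reuse limit `max_reuse`, and the use of Jain's index as the fairness measure are all assumptions chosen for illustration. Each row of `pref` is a DUE (player) and each column is an RB (arm).

```python
def update_preference(pref, due, rb, reward, alpha=0.1):
    """Blend a newly observed reward into one preference entry.

    The exponentially weighted update and `alpha` are illustrative
    assumptions, not taken from the article.
    """
    pref[due][rb] = (1 - alpha) * pref[due][rb] + alpha * reward


def greedy_allocate(pref, max_reuse=2):
    """Give each DUE one RB, visiting (DUE, RB) pairs by descending preference.

    `max_reuse` (hypothetical) caps how many DUEs may share one RB.
    Returns a dict mapping DUE index to RB index.
    """
    n_due, n_rb = len(pref), len(pref[0])
    pairs = sorted(
        ((pref[d][r], d, r) for d in range(n_due) for r in range(n_rb)),
        key=lambda t: (-t[0], t[1], t[2]),  # ties broken by lower indices
    )
    load = [0] * n_rb
    assignment = {}
    for _, d, r in pairs:
        if d in assignment or load[r] >= max_reuse:
            continue  # DUE already served, or RB reuse limit reached
        assignment[d] = r
        load[r] += 1
    return assignment


def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    squares = sum(v * v for v in values)
    if squares == 0:
        return 0.0
    total = sum(values)
    return total * total / (len(values) * squares)


if __name__ == "__main__":
    pref = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.6]]
    print(greedy_allocate(pref, max_reuse=2))
    print(jain_index([1.0, 1.0, 1.0, 1.0]))
```

In this toy input all three DUEs prefer RB 0, so the reuse limit pushes the third DUE to RB 1. A purely greedy rule like this can leave low-preference DUEs with poor resources, which is the kind of starvation a fairness-aware scheme is meant to reduce.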
Data Availability
Data supporting this study are openly available from https://drive.google.com/drive/folders/1GNvIUmLMD1CsgaDsDdPvaic80I7IHNp4?usp=sharing.
Acknowledgements
This research was supported by the Ministry of Science and Technology of Taiwan under grant Nos. 108-2221-E-197-009 and 108-2221-E-197-011.
Appendix Pseudocode of MPMAB_GRA scheme
About this article
Cite this article
Kuo, FC., Wang, HC., Tseng, CC. et al. D2D Resource Allocation Based on Reinforcement Learning and QoS. Mobile Netw Appl 28, 1076–1095 (2023). https://doi.org/10.1007/s11036-023-02145-3
DOI: https://doi.org/10.1007/s11036-023-02145-3