Abstract
This work proposes an adaptive resource allocation algorithm based on reinforcement learning for multiuser, multicarrier communication systems over multipath millimeter-wave channels. A Markovian model that combines the queueing states of the buffers with the channel states is introduced to describe the Cyclic Prefix-Orthogonal Frequency Division Multiplexing (CP-OFDM) communication system, and novel utility functions for a Q-learning algorithm built on this model are proposed and evaluated. The performance of the resulting adaptive resource allocation scheme is verified via computational simulations driven by real traffic traces. The results show that the proposed scheduling algorithm improves overall system performance, increasing throughput and reducing packet loss with the proposed reward functions, and increasing energy efficiency with one of them, when compared to algorithms from the literature. The simulations confirm that the proposed reward functions, combined with the Markov model, make user scheduling and resource sharing in millimeter-wave CP-OFDM networks more efficient than traditional Q-learning-based algorithms.
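For context, the tabular Q-learning update that such schedulers build on can be sketched as follows. The state pairing (quantized queue level, finite-state channel level), the two-action set, the reward weights, and the toy dynamics below are all simplified illustrative assumptions, not the paper's actual formulation.

```python
import random

# Illustrative tabular Q-learning for per-TTI link scheduling.
# A state pairs a quantized buffer occupancy with a finite-state
# Markov channel level; the agent decides whether to use the
# resource block. All constants and dynamics here are assumptions.

N_QUEUE, N_CHANNEL = 4, 3        # quantized queue / channel levels
IDLE, TRANSMIT = 0, 1            # actions for the resource block
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table over (queue level, channel level, action)
Q = {(q, c, a): 0.0
     for q in range(N_QUEUE)
     for c in range(N_CHANNEL)
     for a in (IDLE, TRANSMIT)}

def reward(state, action):
    """Hypothetical reward: served bits grow with channel quality,
    transmitting costs energy, and backlog (a packet-loss proxy)
    is penalized."""
    q, c = state
    if action == TRANSMIT:
        return (c + 1) * min(q, 1) - 0.5
    return -0.2 * q

def step(state, action, rng):
    """Toy dynamics: transmission drains one packet, arrivals are
    Bernoulli, and the channel level takes a bounded random walk
    (a stand-in for a finite-state Markov channel)."""
    q, c = state
    if action == TRANSMIT:
        q = max(q - 1, 0)
    if rng.random() < 0.5:
        q = min(q + 1, N_QUEUE - 1)
    c = min(max(c + rng.choice((-1, 0, 1)), 0), N_CHANNEL - 1)
    return (q, c)

def train(episodes=300, horizon=60, seed=1):
    """Epsilon-greedy Q-learning over the toy environment."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state = (rng.randrange(N_QUEUE), rng.randrange(N_CHANNEL))
        for _ in range(horizon):
            if rng.random() < EPSILON:
                action = rng.choice((IDLE, TRANSMIT))
            else:
                action = max((IDLE, TRANSMIT),
                             key=lambda a: Q[state + (a,)])
            r = reward(state, action)
            nxt = step(state, action, rng)
            target = r + GAMMA * max(Q[nxt + (a,)]
                                     for a in (IDLE, TRANSMIT))
            Q[state + (action,)] += ALPHA * (target - Q[state + (action,)])
            state = nxt

train()  # high-queue, good-channel states come to favor TRANSMIT
```

Under these assumptions the learned policy transmits when the buffer is backlogged and the channel is good, and idles otherwise; the paper's scheme extends this idea to multiple users with a Markovian model of the CP-OFDM system and its own reward designs.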
Abbreviations
- CP-OFDM: Cyclic Prefix-Orthogonal Frequency Division Multiplexing
- BER: Bit error rate
- TDL: Tapped delay line
- QoS: Quality of service
- TDM: Time division multiplexing
- IoT: Internet of Things
- TTI: Transmission time interval
- LTE: Long-Term Evolution
- RB: Resource block
- MDP: Markov decision process
- MIMO: Multiple-input multiple-output
- MLE: Maximum likelihood estimation
- RL: Reinforcement learning
- MBRL: Model-based reinforcement learning
- AWGN: Additive white Gaussian noise
- SNR: Signal-to-noise ratio
- QAM: Quadrature amplitude modulation
Acknowledgments
This work was carried out with the support of the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001.
Funding
The authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript. The traffic trace data used can be found at: https://mawi.wide.ad.jp/mawi/samplepoint-F/2019/.
Ethics declarations
Conflicts of interest
The authors have no competing interests relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Carneiro, D.P.Q., Cardoso, A.A. & Vieira, F.H.T. Adaptive resource allocation in 5G CP-OFDM systems using Markovian model-based reinforcement learning algorithm. Neural Comput & Applic 35, 9421–9435 (2023). https://doi.org/10.1007/s00521-023-08406-2