
Adaptive resource allocation in 5G CP-OFDM systems using Markovian model-based reinforcement learning algorithm

  • Special Issue: Latin American Computational Intelligence
Neural Computing and Applications

Abstract

In this work, an adaptive resource allocation algorithm based on reinforcement learning is proposed for multicarrier communication systems with multiple users and multipath channels under millimeter-wave propagation. An adaptive Markovian model, whose states combine the queueing states of the buffers with the channel states, is proposed to describe the Cyclic Prefix Orthogonal Frequency Division Multiplexing (CP-OFDM) communication system. Novel utility functions for the Markovian model-based Q-learning algorithm are introduced and evaluated. The performance of the proposed adaptive resource allocation scheme is verified via computational simulations using real traffic traces. The results show that the proposed resource scheduling algorithm improves overall system performance relative to algorithms from the literature: throughput increases and packet loss decreases under all of the proposed reward functions, and energy efficiency increases under one of them. The simulations confirm that the proposed reward functions, in conjunction with the Markov model, make user scheduling and resource sharing in millimeter-wave CP-OFDM networks more efficient than traditional Q-learning-based algorithms.
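To make the abstract's ingredients concrete, the sketch below pairs a quantized buffer-queue level with a finite-state channel level to form the Markov state, estimates the transition model online by maximum likelihood counting (the MLE listed among the abbreviations), and backs a Q-table up through the learned model. It is a minimal illustration under assumed state sizes, hyperparameters, and a stand-in reward; it is not the authors' formulation, and the reward shape in particular only gestures at the paper's proposed utility functions.

```python
import numpy as np

# Minimal sketch (assumed, not the paper's code) of Markovian model-based
# Q-learning for scheduling: the state pairs a quantized buffer-queue level
# with a finite-state channel level, the action picks the user served next,
# and the transition model is an MLE estimate built from observed counts.

N_QUEUE, N_CHANNEL, N_USERS = 4, 4, 8      # quantization levels (assumed)
N_STATES = N_QUEUE * N_CHANNEL
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1          # step size, discount, exploration

Q = np.zeros((N_STATES, N_USERS))          # action = which user gets the RB
counts = np.ones((N_STATES, N_USERS, N_STATES))  # Laplace-smoothed MLE counts

def state_index(queue_level, channel_level):
    """Flatten the (queue state, channel state) pair into one table index."""
    return queue_level * N_CHANNEL + channel_level

def reward(bits_sent, packets_lost, w_loss=0.5):
    """Assumed reward shape: throughput minus a weighted packet-loss penalty."""
    return bits_sent - w_loss * packets_lost

def choose_user(s, rng):
    """Epsilon-greedy choice of the user scheduled in the next TTI."""
    return rng.integers(N_USERS) if rng.random() < EPS else int(np.argmax(Q[s]))

def observe(s, a, r, s_next):
    """Model-based update: refresh the MLE transition estimate, then back up
    Q through the expected next-state value under the learned model."""
    counts[s, a, s_next] += 1
    p = counts[s, a] / counts[s, a].sum()  # estimated P(s' | s, a)
    Q[s, a] += ALPHA * (r + GAMMA * p @ Q.max(axis=1) - Q[s, a])
```

In a full scheduler, observe() would run once per TTI with the measured queue and channel quantizations, and the action set would cover the assignment of every resource block rather than a single user; both simplifications are made only to keep the sketch short.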


Abbreviations

CP-OFDM:

Cyclic Prefix Orthogonal Frequency Division Multiplexing

BER:

Bit error rate

TDL:

Tapped delay line

QoS:

Quality of service

TDM:

Time division multiplexing

IoT:

Internet of things

TTI:

Transmission time interval

LTE:

Long-term evolution

RB:

Resource block

MDP:

Markov decision process

MIMO:

Multiple input multiple output

MLE:

Maximum likelihood estimation

RL:

Reinforcement learning

MBRL:

Model-based reinforcement learning

AWGN:

Additive white Gaussian noise

SNR:

Signal-to-noise ratio

QAM:

Quadrature amplitude modulation


Acknowledgments

This work was carried out with the support of the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001.

Funding

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article. The traffic trace data used can be found at: https://mawi.wide.ad.jp/mawi/samplepoint-F/2019/.

Author information


Corresponding author

Correspondence to Daniel P. Q. Carneiro.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Carneiro, D.P.Q., Cardoso, A.A. & Vieira, F.H.T. Adaptive resource allocation in 5G CP-OFDM systems using Markovian model-based reinforcement learning algorithm. Neural Comput & Applic 35, 9421–9435 (2023). https://doi.org/10.1007/s00521-023-08406-2

