
Reinforcement Based User Scheduling for Cellular Communications


Part of the Lecture Notes in Computer Science book series (LNCS, volume 13301)


Scheduling is one of the most influential factors in the performance of cellular deployments such as 4G and 5G, and one of the most challenging resource allocation tasks performed by the base station. It must handle two important performance metrics, throughput and fairness, which fundamentally conflict: maximizing one may come at the expense of the other. On the one hand, maximizing throughput, the goal of many communication networks, requires allocating resources to users with better channel conditions. On the other hand, fairness requires allocating some resources to users with poor channel conditions. A prevalent scheduling scheme maximizes the proportional fairness criterion, which balances the two metrics with minimal compromise. Proportional fairness based schedulers commonly take a greedy approach, in which each resource block is allocated to the user that maximizes the proportional fairness criterion. However, users can typically tolerate some delay, especially if it improves their performance.
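The greedy proportional fairness rule mentioned above can be sketched in a few lines: in every slot, serve the user with the largest ratio of instantaneous rate \(R_k(t)\) to average throughput \(T_k(t)\). This is a minimal sketch, assuming one user per slot and an exponentially weighted moving average for \(T_k(t)\) with a hypothetical smoothing factor `alpha`; the function name and parameters are illustrative, not taken from the chapter.

```python
from typing import List, Sequence, Tuple


def pf_greedy_schedule(
    rates: Sequence[Sequence[float]],
    alpha: float = 0.05,
    t_init: float = 1e-3,
) -> Tuple[List[int], List[float]]:
    """Greedy proportional-fair (PF) scheduling, one user per slot.

    rates[t][k] is the instantaneous achievable rate R_k(t) of user k in slot t.
    alpha is the (hypothetical) smoothing factor of the exponentially weighted
    average throughput T_k(t); t_init is a small positive start value so the
    PF ratio is defined in the first slot.
    """
    num_users = len(rates[0])
    avg = [t_init] * num_users  # T_k(t)
    chosen: List[int] = []
    for slot_rates in rates:
        # Serve the user with the largest PF metric R_k(t) / T_k(t).
        k = max(range(num_users), key=lambda u: slot_rates[u] / avg[u])
        chosen.append(k)
        # Update every user's average: the served user adds its rate, others add 0.
        avg = [
            (1 - alpha) * avg[u] + alpha * (slot_rates[u] if u == k else 0.0)
            for u in range(num_users)
        ]
    return chosen, avg


# Usage: a strong user (rate 10) and a weak user (rate 1) with static channels.
chosen, avg = pf_greedy_schedule([[10.0, 1.0]] * 200)
print(chosen[:4])                     # [0, 1, 0, 1]
print(chosen.count(1) / len(chosen))  # roughly half of the slots go to the weak user
```

Even though the strong user's rate is ten times higher, the ratio \(R_k(t)/T_k(t)\) pulls the weak user back into service, which is the fairness half of the trade-off.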

Motivated by this observation, we suggest a reinforcement-based proportional-fair scheduler for cellular networks. The suggested scheduler incorporates users' current channel estimates together with predicted future channel estimates in the resource allocation process, in order to maximize the proportional fairness criterion over predefined periodic time epochs. We developed a reinforcement learning tool that learns the users' channel fluctuations and selects the best user at each time slot, in order to achieve the best fairness-throughput trade-off over multiple time slots. We demonstrate through simulations that such a scheduler outperforms the standard proportional fairness scheduler. We further implemented the suggested scheme on a live 4G base station, also known as an eNodeB, and showed similar gains.
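Why looking ahead within an epoch can help is easiest to see on a toy case. The sketch below is not the chapter's reinforcement learning scheduler: as a deliberately simple stand-in, it assumes the future rates of a tiny epoch are known exactly and brute-forces every schedule to maximize the epoch criterion \(\sum_k \log S_k\), where \(S_k\) is user \(k\)'s total throughput in the epoch. It compares this with a hypothetical slot-by-slot greedy baseline; all function names are illustrative.

```python
import itertools
import math
from typing import List, Sequence


def epoch_totals(schedule: Sequence[int], rates: Sequence[Sequence[float]]) -> List[float]:
    """Total throughput S_k each user receives over the epoch under `schedule`."""
    totals = [0.0] * len(rates[0])
    for t, k in enumerate(schedule):
        totals[k] += rates[t][k]
    return totals


def pf_utility(totals: Sequence[float]) -> float:
    """Epoch PF criterion sum_k log(S_k); -inf if some user is starved."""
    if any(s <= 0.0 for s in totals):
        return float("-inf")
    return sum(math.log(s) for s in totals)


def myopic_schedule(rates: Sequence[Sequence[float]], eps: float = 1e-9) -> List[int]:
    """Slot-by-slot greedy baseline: pick argmax R_k(t) / (S_k so far + eps)."""
    totals = [0.0] * len(rates[0])
    schedule: List[int] = []
    for slot_rates in rates:
        k = max(range(len(totals)), key=lambda u: slot_rates[u] / (totals[u] + eps))
        schedule.append(k)
        totals[k] += slot_rates[k]
    return schedule


def planned_schedule(rates: Sequence[Sequence[float]]) -> List[int]:
    """Exhaustive search over all K**T schedules, assuming rates are known exactly."""
    num_users = len(rates[0])
    candidates = itertools.product(range(num_users), repeat=len(rates))
    best = max(candidates, key=lambda s: pf_utility(epoch_totals(s, rates)))
    return list(best)


# Usage: user 1 is about to see a much better channel in slot 1.
rates = [[1.0, 2.0], [1.0, 10.0]]
print(myopic_schedule(rates), planned_schedule(rates))  # [1, 0] [0, 1]
```

The planned schedule makes user 1 wait one slot for its good channel, lifting the criterion from \(\log 2\) to \(\log 10\). Exhaustive search grows as \(K^T\) and needs perfect predictions, which is one reason a learned policy is attractive in practice.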

  • DOI: 10.1007/978-3-031-07689-3_15
  • Chapter length: 19 pages


  1. Asadi, A., Mancuso, V.: A survey on opportunistic scheduling in wireless communications. IEEE Commun. Surv. Tutor. 15(4), 1671–1688 (2013)

  2. Bang, H.J., Ekman, T., Gesbert, D.: Channel predictive proportional fair scheduling. IEEE Trans. Wirel. Commun. 7(2), 482–487 (2008)

  3. Capozzi, F., Piro, G., Grieco, L.A., Boggia, G., Camarda, P.: Downlink packet scheduling in LTE cellular networks: key design issues and a survey. IEEE Commun. Surv. Tutor. 15(2), 678–700 (2012)

  4. Chung, S.T., Goldsmith, A.J.: Degrees of freedom in adaptive modulation: a unified view. IEEE Trans. Commun. 49(9), 1561–1571 (2001)

  5. Donthi, S.N., Mehta, N.B.: An accurate model for EESM and its application to analysis of CQI feedback schemes and scheduling in LTE. IEEE Trans. Wirel. Commun. 10(10), 3436–3448 (2011)

  6. Duran, A., Toril, M., Ruiz, F., Mendo, A.: Self-optimization algorithm for outer loop link adaptation in LTE. IEEE Commun. Lett. 19(11), 2005–2008 (2015)

  7. Elliott, E.O.: Estimates of error rates for codes on burst-noise channels. Bell Syst. Tech. J. 42(5), 1977–1997 (1963)

  8. Gilbert, E.N.: Capacity of a burst-noise channel. Bell Syst. Tech. J. 39(5), 1253–1265 (1960)

  9. Shi, H., Venkatesha Prasad, R., Onur, E., Niemegeers, I.G.M.M.: Fairness in wireless networks: issues, measures and challenges. IEEE Commun. Surv. Tutor. 16(1), 5–24 (2013)

  10. Kelly, F.: Charging and rate control for elastic traffic. Eur. Trans. Telecommun. 8(1), 33–37 (1997)

  11. Morales-Jiménez, D., Sánchez, J.J., Gómez, G., Aguayo-Torres, M.C., Entrambasaguas, J.T.: Imperfect adaptation in next generation OFDMA cellular systems (2009)

  12. Ouyang, W., Eryilmaz, A., Shroff, N.B.: Downlink scheduling over Markovian fading channels. IEEE/ACM Trans. Netw. 24(3), 1801–1812 (2015)

  13. Piazza, D., Milstein, L.B.: Multiuser diversity-mobility tradeoff: modeling and performance analysis of a proportional fair scheduling. In: Global Telecommunications Conference, 2002 (GLOBECOM'02), vol. 1, pp. 906–910. IEEE (2002)

  14. Sesia, S., Toufik, I., Baker, M.: LTE-the UMTS Long Term Evolution: From Theory to Practice. Wiley (2011)

  15. Shmuel, O., Cohen, A., Gurewitz, O.: Performance analysis of opportunistic distributed scheduling in multi-user systems. IEEE Trans. Commun. 66(10), 4637–4652 (2018)

  16. Tokic, M., Palm, G.: Value-difference based exploration: adaptive control between epsilon-greedy and softmax. In: Bach, J., Edelkamp, S. (eds.) KI 2011. LNCS (LNAI), vol. 7006, pp. 335–346. Springer, Heidelberg (2011)

  17. Tsai, T.-Y., Chung, Y.-L., Tsai, Z.: Introduction to packet scheduling algorithms for communication networks. Sciyo (2010)

  18. Viswanath, P., Tse, D.N.C., Laroia, R.: Opportunistic beamforming using dumb antennas. In: Proceedings IEEE International Symposium on Information Theory, p. 449. IEEE (2002)


Author information

Corresponding authors

Correspondence to Nimrod Gradus, Asaf Cohen, Erez Biton or Omer Gurwitz.


8 Appendix

8.1 LTE Basic Terms

In this study we generally follow the conventional frequency division duplex (FDD) cellular resource units, in which time is slotted into frames and each frame is divided into constant 1 ms intervals called sub-frames. Each sub-frame is divided into physical resource blocks, which we simply refer to as resource blocks. Each resource block spans a bandwidth and a time duration; e.g., in LTE each resource block comprises 12 sub-carriers in the frequency domain and 14 OFDM symbols (with the normal cyclic prefix) in the time domain.
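The resource-block dimensions above can be checked with a few lines of arithmetic. This sketch assumes LTE's 15 kHz sub-carrier spacing and the normal cyclic prefix, and uses the standard LTE mapping from channel bandwidth to the number of resource blocks; the constant and function names are hypothetical. Note that the 168 resource elements counted per block include those used for reference signals and control, so not all of them carry user data.

```python
SUBCARRIER_SPACING_KHZ = 15  # LTE sub-carrier spacing
SUBCARRIERS_PER_RB = 12      # sub-carriers per resource block
SYMBOLS_PER_SUBFRAME = 14    # OFDM symbols per 1 ms sub-frame, normal cyclic prefix

# Standard LTE channel bandwidths (MHz) -> number of resource blocks.
RBS_PER_BANDWIDTH = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}


def rb_bandwidth_khz() -> int:
    """Bandwidth of one resource block: 12 sub-carriers x 15 kHz."""
    return SUBCARRIERS_PER_RB * SUBCARRIER_SPACING_KHZ


def resource_elements_per_rb() -> int:
    """Resource elements (sub-carrier x symbol) in one RB over a 1 ms sub-frame."""
    return SUBCARRIERS_PER_RB * SYMBOLS_PER_SUBFRAME


def occupied_bandwidth_mhz(channel_mhz: float) -> float:
    """Bandwidth actually covered by RBs; the remainder of the channel is guard band."""
    return RBS_PER_BANDWIDTH[channel_mhz] * rb_bandwidth_khz() / 1000


print(rb_bandwidth_khz(), resource_elements_per_rb(), occupied_bandwidth_mhz(10))
# 180 168 9.0
```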

8.2 Downlink Link Adaptation (DLLA)

As mentioned earlier, opportunistic scheduling schemes such as proportional fairness take the users' channel quality reports into account to make better scheduling decisions. In particular, note that in the algorithm presented above, in order for the scheduler to select the user according to \(k^{*} = \underset{k}{\arg \max } \frac{R_k(t)}{T_k(t)}\), where \(R_k(t)\) is user \(k\)'s instantaneous rate and \(T_k(t)\) its average throughput, it needs to know the instantaneous rates of all users. In wireless networks, the users' channel states are obtained via reports indicating the rates each user can support for transmission. Furthermore, each practical system supports only a finite set of rates. Link adaptation is the mechanism by which the users' transmission code rates and modulation schemes are selected based on the channel conditions.
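To make the "finite set of rates" point concrete, the sketch below maps a measured SNR to the highest rate in a table whose SNR threshold it meets. The table values and names are hypothetical and purely illustrative; they are not taken from the chapter or from a 3GPP table. Quantized rates of this kind are what a proportional fairness scheduler would use as \(R_k(t)\).

```python
import bisect
from typing import List, Tuple

# Hypothetical rate table: (minimum SNR in dB, supported rate in bit/s/Hz).
# Illustrative values only, NOT from the paper or any standard.
RATE_TABLE: List[Tuple[float, float]] = [
    (-5.0, 0.15), (0.0, 0.6), (5.0, 1.5), (10.0, 2.7), (15.0, 3.9), (20.0, 5.1),
]
THRESHOLDS = [th for th, _ in RATE_TABLE]


def supported_rate(snr_db: float) -> float:
    """Highest table rate whose SNR threshold is <= snr_db (0 if none is met)."""
    idx = bisect.bisect_right(THRESHOLDS, snr_db) - 1
    if idx < 0:
        return 0.0  # below the lowest threshold: nothing can be sent reliably
    return RATE_TABLE[idx][1]


print([supported_rate(s) for s in (-10.0, 7.0, 10.0, 30.0)])  # [0.0, 1.5, 2.7, 5.1]
```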

In this section, we briefly explain the concepts and processes of the DLLA used in the simulations and experiments for RL-based scheduling. Since both our simulations and our experiments follow a typical LTE DLLA, the following subsections provide a technical description of the DLLA we used. The description follows common terminology and accepted acronyms, and is therefore somewhat cumbersome.

The DLLA process is a crucial part of current wireless communication systems. It increases the data rate that can be reliably transmitted [4] and has been adopted as a core feature in cellular standards such as LTE. The role of link adaptation (LA) in the medium access control (MAC) layer of the base station (BS) is to suggest to the scheduler an appropriate modulation and coding scheme (MCS) for the next transmissions to a given user equipment (UE), so as to keep the block error rate (BLER) below a target. The MCS choice is driven by the channel quality indicator (CQI) reports that the UE sends to the BS [14]. The BS then uses a pre-calculated table to map the CQI to a transport block size index (ITBS), an integer ranging from 0 to 26, which determines the size of the transport block (TB) transmitted to the UE. The TB size also depends on the number of physical resource blocks (PRBs) allocated to the UE. In LTE, radio resources are allocated in the time/frequency domain: time is slotted into 1 ms intervals corresponding to 14 OFDM symbols, and in frequency the total bandwidth is divided into 180 kHz sub-channels, each consisting of twelve consecutive, equally spaced OFDM sub-carriers. A time/frequency resource spanning one 1 ms time slot (14 OFDM symbols) and twelve consecutive sub-carriers is called a physical resource block (PRB), or simply RB, and is the smallest radio resource unit that can be assigned to a user for transmission. Since the sub-channel size is fixed, the number of RBs depends on the system bandwidth configuration, and it is the scheduler's decision how to divide the total number of RBs among the UEs scheduled in each time slot. The ITBS, together with the number of RBs allocated to the UE, is mapped to the TB size.
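The CQI to ITBS to TB-size chain described above can be sketched as two table lookups plus an RB allocation. Everything numeric here is hypothetical: the CQI-to-ITBS table simply spreads CQI 1 to 15 linearly over ITBS 0 to 26, `bits_per_prb` is a crude linear stand-in for the TBS table in 3GPP TS 36.213, and the equal RB split is only one possible scheduler decision. The sketch shows the structure of the computation, not real TB sizes.

```python
from typing import Dict, List

# Hypothetical CQI -> ITBS table: CQI 1..15 spread linearly over ITBS 0..26.
# Real deployments use a vendor- or standard-derived table.
CQI_TO_ITBS: Dict[int, int] = {cqi: round((cqi - 1) * 26 / 14) for cqi in range(1, 16)}


def bits_per_prb(itbs: int) -> int:
    """Hypothetical linear stand-in for the TBS lookup table (bits per PRB)."""
    return 16 + 26 * itbs


def transport_block_bits(cqi: int, n_prb: int) -> int:
    """CQI -> ITBS -> TB size; the TB size scales with the number of allocated PRBs."""
    itbs = CQI_TO_ITBS[cqi]
    return n_prb * bits_per_prb(itbs)


def split_rbs(total_rbs: int, num_ues: int) -> List[int]:
    """Hypothetical equal split of a time slot's RBs among the scheduled UEs."""
    base, extra = divmod(total_rbs, num_ues)
    return [base + (1 if i < extra else 0) for i in range(num_ues)]


# Usage: 50 RBs (10 MHz) shared by three UEs, the first one reporting CQI 8.
alloc = split_rbs(50, 3)
print(alloc, transport_block_bits(8, alloc[0]))  # [17, 17, 16] 6018
```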

The CQI, reported by the UE on a per transmission time interval (TTI) basis, conveys how good or bad the downlink channel is. How the UE computes the CQI is specific to the chipset vendor; it is derived from the UE's measurement of the reference signals transmitted by the BS. The reference signal received power (RSRP) measured by the UE is then used to calculate a link quality metric (LQM), which quantifies the quality of the downlink and is used to determine the CQI. The LQM most commonly used in LTE is the exponential effective SNR mapping (EESM) [5]. The process of selecting the most suitable MCS based on the link quality measurements is called inner loop link adaptation (ILLA) [6].
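EESM compresses the per-sub-carrier SNRs \(\gamma_i\) of a wideband channel into a single effective SNR, \(\gamma_{\mathrm{eff}} = -\beta \ln\big(\frac{1}{N}\sum_{i=1}^{N} e^{-\gamma_i/\beta}\big)\), where \(\beta\) is a calibration factor normally tuned per MCS. The sketch below computes it with a log-sum-exp shift so that high SNRs do not underflow; the \(\beta\) values in the usage line are hypothetical, and how a given chipset turns \(\gamma_{\mathrm{eff}}\) into a CQI is vendor specific and not shown.

```python
import math
from typing import Sequence


def eesm_effective_snr_db(subcarrier_snr_db: Sequence[float], beta: float) -> float:
    """EESM effective SNR in dB from per-sub-carrier SNRs in dB."""
    snr_lin = [10.0 ** (s / 10.0) for s in subcarrier_snr_db]
    exps = [-s / beta for s in snr_lin]
    # log( (1/N) * sum exp(x_i) ) computed stably via a max shift (log-sum-exp).
    m = max(exps)
    log_mean = m + math.log(sum(math.exp(e - m) for e in exps)) - math.log(len(exps))
    snr_eff_lin = -beta * log_mean
    return 10.0 * math.log10(snr_eff_lin)


# Usage: a frequency-selective channel with one faded and one strong sub-carrier.
print(eesm_effective_snr_db([0.0, 20.0], beta=5.0))  # about 6.5 dB
```

The result sits far below the roughly 17 dB arithmetic mean of the two linear SNRs, reflecting that the weak sub-carriers dominate the error rate of a coded block.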

Due to errors in the UE's CQI measurements, the delay in the reporting process, and deviations from the assumed channel conditions (e.g., multi-path environment, UE speed) [11], a compensation process called outer loop link adaptation (OLLA) is needed. OLLA corrects the ITBS based on the hybrid automatic repeat request (HARQ) feedback as follows: the ITBS mapped from the UE's CQI report, denoted ITBS(CQI), is adjusted by a margin, \(ITBS_{margin}\), that is updated for each positive or negative acknowledgment (ACK/NACK) received from the UE. When an ACK is received, \(ITBS_{margin}\) is decreased by \(\varDelta _{down}\), and when a NACK is received, it is increased by \(\varDelta _{up}\). The ratio \(\frac{\varDelta _{down}}{\varDelta _{up}}\) is set by the target BLER, \(BLER_{T}\) (in percent), to which OLLA is designed to converge. Requiring the expected change of the margin per transmission to be zero, \((1 - BLER_{T}/100)\,\varDelta _{down} = (BLER_{T}/100)\,\varDelta _{up}\), gives

$$\begin{aligned} \frac{\varDelta _{down}}{\varDelta _{up}} = \frac{BLER_{T}}{100-BLER_{T}} \end{aligned}$$

Intuitively, if \(BLER_{T}\) is set to \(10\%\), then \(\varDelta _{down}/\varDelta _{up} = 10/90 = 1/9\), and OLLA steers the link so that about \(90\%\) of the downlink transmissions are received successfully. The OLLA process is thus formulated as

$$\begin{aligned} ITBS = ITBS(CQI) - ITBS_{margin} \end{aligned}$$
$$\begin{aligned} ITBS_{margin} = {\left\{ \begin{array}{ll} ITBS_{margin} - \varDelta _{down} &{} \text {if ACK}\\ ITBS_{margin} + \varDelta _{up} &{} \text {if NACK}\\ \end{array}\right. } \end{aligned}$$
Fig. 10. DLLA block diagram
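The OLLA update rule can be written as a small controller. This is a minimal sketch, assuming a hypothetical step size \(\varDelta _{up} = 1\) ITBS unit and clipping the corrected ITBS to the 0 to 26 range of the LTE ITBS table; practical implementations often also bound the margin itself, which is not shown. The class and function names are illustrative.

```python
from dataclasses import dataclass


def delta_down_from_target(delta_up: float, bler_target_pct: float) -> float:
    """Delta_down / Delta_up = BLER_T / (100 - BLER_T), with BLER_T in percent."""
    return delta_up * bler_target_pct / (100.0 - bler_target_pct)


@dataclass
class Olla:
    delta_up: float = 1.0          # hypothetical NACK step (ITBS units)
    bler_target_pct: float = 10.0  # BLER_T
    margin: float = 0.0            # ITBS_margin
    itbs_min: int = 0              # assumed valid ITBS range
    itbs_max: int = 26

    @property
    def delta_down(self) -> float:
        return delta_down_from_target(self.delta_up, self.bler_target_pct)

    def on_feedback(self, ack: bool) -> None:
        """Update the margin from one HARQ ACK/NACK."""
        if ack:
            self.margin -= self.delta_down  # success: transmit more aggressively
        else:
            self.margin += self.delta_up    # failure: back off

    def itbs(self, itbs_from_cqi: int) -> int:
        """ITBS = ITBS(CQI) - ITBS_margin, rounded and clipped to the valid range."""
        value = round(itbs_from_cqi - self.margin)
        return max(self.itbs_min, min(self.itbs_max, value))


# Usage: one NACK followed by nine ACKs, i.e. exactly a 10% block error rate.
olla = Olla()
olla.on_feedback(False)
print(olla.itbs(13))  # 12
for _ in range(9):
    olla.on_feedback(True)
print(olla.itbs(13))  # 13
```

With \(BLER_{T} = 10\%\), nine ACKs exactly cancel one NACK, so the margin returns to its starting value when the observed error rate matches the target.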


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Gradus, N., Cohen, A., Biton, E., Gurwitz, O. (2022). Reinforcement Based User Scheduling for Cellular Communications. In: Dolev, S., Katz, J., Meisels, A. (eds) Cyber Security, Cryptology, and Machine Learning. CSCML 2022. Lecture Notes in Computer Science, vol 13301. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-07688-6

  • Online ISBN: 978-3-031-07689-3

  • eBook Packages: Computer Science, Computer Science (R0)