
DRL Empowered On-policy and Off-policy ABR for 5G Mobile Ultra-HD Video Delivery

Research · Published in Mobile Networks and Applications

Abstract

Fifth-generation (5G) and beyond-5G networks support high-throughput ultra-high definition (UHD) video applications. This paper examines the use of dynamic adaptive streaming over HTTP (DASH) to deliver UHD videos from servers to 5G-capable devices. Because wireless network conditions vary dynamically, providing a high quality of experience (QoE) for UHD video delivery is particularly challenging. Adaptive bit rate (ABR) algorithms are therefore used to adapt the video bit rate to the network conditions. Several ABR algorithms have been developed to improve QoE, but the majority rely on predetermined rules and thus do not generalize to a broad variety of network conditions. Recent research has shown that ABR algorithms powered by deep reinforcement learning (DRL), in particular the vanilla asynchronous advantage actor-critic (A3C) method, are more effective at generalizing to different network conditions. However, they have limitations, such as a lag between the behavior and target policies, sample inefficiency, and sensitivity to the environment's randomness. In this paper, we propose the design and implementation of two DRL-empowered ABR algorithms: (i) on-policy proximal policy optimization adaptive bit rate (PPO-ABR), and (ii) off-policy soft actor-critic adaptive bit rate (SAC-ABR). We evaluate the proposed algorithms using 5G traces from the Lumos 5G dataset and show that, by exploiting specific properties of on-policy and off-policy methods, they perform much better than vanilla A3C for different variations of QoE metrics.
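For readers unfamiliar with the objectives referred to above, the sketch below illustrates two of the building blocks the abstract mentions: a per-chunk QoE reward of the form widely used in DASH ABR studies (bitrate utility minus rebuffering and quality-switch penalties) and the clipped surrogate loss that defines PPO. This is a minimal, self-contained sketch under common ABR/DRL assumptions, not the authors' exact implementation; the penalty weights, clipping coefficient, and function names are illustrative placeholders.

```python
import numpy as np

def qoe_reward(bitrate, last_bitrate, rebuffer_time,
               rebuf_penalty=4.3, smooth_penalty=1.0):
    """Per-chunk QoE reward commonly used in DASH ABR work:
    bitrate utility minus rebuffering and quality-switch penalties.
    The weights here are illustrative, not the paper's exact values."""
    return (bitrate
            - rebuf_penalty * rebuffer_time
            - smooth_penalty * abs(bitrate - last_bitrate))

def ppo_clip_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Clipped surrogate objective of PPO, negated for minimization:
    L = E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)],
    where r_t = pi_new(a|s) / pi_old(a|s)."""
    ratio = np.exp(new_logp - old_logp)       # importance ratio r_t
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage: reward for one 4.3 Mbps chunk following a 2.8 Mbps chunk
# with 0.5 s of rebuffering, then a PPO loss on a dummy two-sample batch.
print(qoe_reward(bitrate=4.3, last_bitrate=2.8, rebuffer_time=0.5))
print(ppo_clip_loss(new_logp=np.array([-1.1, -0.7]),
                    old_logp=np.array([-1.0, -0.9]),
                    advantage=np.array([0.5, -0.2])))
```

The clipping term is what keeps the updated policy close to the behavior policy, which is the on-policy property the abstract contrasts with SAC's off-policy, entropy-regularized updates.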



Data Availability

The data supporting the findings of this study are available upon reasonable request.


Funding

This work has been supported by the TCS Foundation, India, under the TCS Research Scholar Program, 2019-2023.

Author information

Authors and Affiliations

Authors

Contributions

Mandan Naresh: Conceptualization, Methodology, Validation, Writing - original draft. Paresh Saxena: Conceptualization, Writing - original draft, Writing - review and editing, Supervision. Manik Gupta: Conceptualization, Writing - original draft, Writing - review and editing, Supervision.

Corresponding author

Correspondence to Naresh Mandan.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mandan, N., Saxena, P. & Gupta, M. DRL Empowered On-policy and Off-policy ABR for 5G Mobile Ultra-HD Video Delivery. Mobile Netw Appl (2024). https://doi.org/10.1007/s11036-024-02311-1


  • DOI: https://doi.org/10.1007/s11036-024-02311-1
