
DRL Empowered On-policy and Off-policy ABR for 5G Mobile Ultra-HD Video Delivery

Research · Published in Mobile Networks and Applications

Abstract

Fifth-generation (5G) and beyond-5G networks support high-throughput ultra-high definition (UHD) video applications. This paper examines the use of dynamic adaptive streaming over HTTP (DASH) to deliver UHD videos from servers to 5G-capable devices. Because wireless network conditions vary dynamically, providing a high quality of experience (QoE) for UHD video delivery is particularly challenging. Adaptive bit rate (ABR) algorithms are therefore used to adapt the video bit rate to the network conditions. Several ABR algorithms have been developed to improve QoE, but the majority rely on predetermined rules and thus do not generalize to a broad variety of network conditions. Recent research has shown that ABR algorithms powered by deep reinforcement learning (DRL), in particular the vanilla asynchronous advantage actor-critic (A3C) method, are more effective at generalizing to different network conditions. However, they have limitations, such as a lag between the behavior and target policies, sample inefficiency, and sensitivity to the environment's randomness. In this paper, we propose the design and implementation of two DRL-empowered ABR algorithms: (i) on-policy proximal policy optimization adaptive bit rate (PPO-ABR), and (ii) off-policy soft actor-critic adaptive bit rate (SAC-ABR). We evaluate the proposed algorithms using 5G traces from the Lumos 5G dataset and show that, by exploiting specific properties of on-policy and off-policy methods, they perform much better than vanilla A3C for different variations of QoE metrics.
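For readers unfamiliar with the objectives referred to above, the sketch below illustrates two of the building blocks the abstract mentions: a per-chunk QoE reward of the form widely used in DASH ABR studies (bitrate utility minus rebuffering and quality-switch penalties) and the clipped surrogate loss that defines PPO. This is a minimal, self-contained sketch under common ABR/DRL assumptions, not the authors' exact implementation; the penalty weights, clipping coefficient, and function names are illustrative placeholders.

```python
import numpy as np

def qoe_reward(bitrate, last_bitrate, rebuffer_time,
               rebuf_penalty=4.3, smooth_penalty=1.0):
    """Per-chunk QoE reward commonly used in DASH ABR work:
    bitrate utility minus rebuffering and quality-switch penalties.
    The weights here are illustrative, not the paper's exact values."""
    return (bitrate
            - rebuf_penalty * rebuffer_time
            - smooth_penalty * abs(bitrate - last_bitrate))

def ppo_clip_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Clipped surrogate objective of PPO, negated for minimization:
    L = E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)],
    where r_t = pi_new(a|s) / pi_old(a|s)."""
    ratio = np.exp(new_logp - old_logp)       # importance ratio r_t
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage: reward for one 4.3 Mbps chunk following a 2.8 Mbps chunk
# with 0.5 s of rebuffering, then a PPO loss on a dummy two-sample batch.
print(qoe_reward(bitrate=4.3, last_bitrate=2.8, rebuffer_time=0.5))
print(ppo_clip_loss(new_logp=np.array([-1.1, -0.7]),
                    old_logp=np.array([-1.0, -0.9]),
                    advantage=np.array([0.5, -0.2])))
```

The clipping term is what keeps the updated policy close to the behavior policy, which is the on-policy property the abstract contrasts with SAC's off-policy, entropy-regularized updates.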



Data Availability

The data supporting the findings of this study are available upon reasonable request.


Funding

This work has been supported by the TCS Foundation, India, under the TCS Research Scholar Program, 2019-2023.

Author information

Authors and Affiliations

Authors

Contributions

Mandan Naresh: Conceptualization, Methodology, Validation, Writing - original draft. Paresh Saxena: Conceptualization, Writing - original draft, Writing - review and editing, Supervision. Manik Gupta: Conceptualization, Writing - original draft, Writing - review and editing, Supervision.

Corresponding author

Correspondence to Naresh Mandan.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mandan, N., Saxena, P. & Gupta, M. DRL Empowered On-policy and Off-policy ABR for 5G Mobile Ultra-HD Video Delivery. Mobile Netw Appl (2024). https://doi.org/10.1007/s11036-024-02311-1


  • DOI: https://doi.org/10.1007/s11036-024-02311-1
