Abstract
Fifth generation (5G) and beyond-5G networks support high-throughput ultra-high-definition (UHD) video applications. This paper examines the use of dynamic adaptive streaming over HTTP (DASH) to deliver UHD video from servers to 5G-capable devices. Because wireless network conditions vary over time, providing a high quality of experience (QoE) for UHD video delivery is particularly challenging. Adaptive bit rate (ABR) algorithms address this by adapting the video bit rate to the network conditions. Several ABR algorithms have been developed to improve QoE, but most rely on predetermined rules and therefore do not generalize to a broad variety of network conditions. Recent research has shown that ABR algorithms based on deep reinforcement learning (DRL), specifically the vanilla asynchronous advantage actor-critic (A3C) method, generalize better to different network conditions. However, these algorithms have limitations, such as a lag between behavior and target policies, sample inefficiency, and sensitivity to the environment's randomness. In this paper, we propose the design and implementation of two DRL-empowered ABR algorithms: (i) on-policy proximal policy optimization adaptive bit rate (PPO-ABR), and (ii) off-policy soft actor-critic adaptive bit rate (SAC-ABR). We evaluate the proposed algorithms using 5G traces from the Lumos 5G dataset and show that, by exploiting specific properties of on-policy and off-policy methods, they perform considerably better than vanilla A3C across different variations of QoE metrics.
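As an illustration of how an on-policy method can limit the lag between behavior and target policies, the following is a minimal sketch of the standard clipped surrogate objective from the PPO literature. It is not the PPO-ABR implementation: the function name, the argument names, and the clipping value of 0.2 are assumptions chosen for the example, and the abstract does not state the loss or hyperparameters actually used.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    new_logp:   log-probabilities of the taken actions under the policy being updated
    old_logp:   log-probabilities of the same actions under the policy that collected the data
    advantages: advantage estimates for those actions
    clip_eps:   hypothetical clipping range; 0.2 is a common default, not a value from the paper
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to move the ratio outside the range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Example with made-up numbers: a bit rate choice whose probability doubled
# under the updated policy contributes at most (1 + clip_eps) * advantage.
example = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
```

In an ABR setting, each action would correspond to the bit rate selected for the next video chunk. Because the objective stops rewarding changes once the ratio leaves the clipping range, the updated policy stays close to the policy that generated the samples.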
Data Availability
The data supporting the findings of this study are available upon reasonable request.
Funding
This work has been supported by TCS Foundation, India under the TCS research scholar program, 2019-2023.
Author information
Contributions
Mandan Naresh: Conceptualization, Methodology, Validation, Writing - original draft. Paresh Saxena: Conceptualization, Writing - original draft, Writing - review and editing, Supervision. Manik Gupta: Conceptualization, Writing - original draft, Writing - review and editing, Supervision.
Ethics declarations
Competing Interests
The authors declare that they have no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mandan, N., Saxena, P. & Gupta, M. DRL Empowered On-policy and Off-policy ABR for 5G Mobile Ultra-HD Video Delivery. Mobile Netw Appl (2024). https://doi.org/10.1007/s11036-024-02311-1
DOI: https://doi.org/10.1007/s11036-024-02311-1