Abstract
Live video streaming events have become a mainstay of viewer communication in large international enterprises. Because viewers are distributed worldwide, the main challenge lies in scheduling the optimal event time so as to improve both viewer engagement and adoption. In this paper we present a multi-task deep reinforcement learning model that selects the time of a live video streaming event, aiming to optimize viewer engagement and adoption simultaneously. We treat engagement and adoption as independent tasks and formulate a unified loss function to learn a common policy. In addition, we account for the fact that each task may contribute differently to the agent's training strategy. To determine each task's contribution to the agent's training, we design a Transformer architecture over the state-action transitions of each task. We evaluate the proposed model on four real-world datasets, generated by the live video streaming events of four large enterprises from January 2019 to March 2021. Our experiments demonstrate the effectiveness of the proposed model compared with several state-of-the-art strategies. For reproducibility, our evaluation datasets and implementation are publicly available at https://github.com/stefanosantaris/merlin.
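The task-weighting idea in the abstract can be sketched as follows: attend over each task's state-action transitions, pool them into one vector per task, and softmax the resulting scores into weights over the per-task losses. This is a minimal single-head NumPy sketch under illustrative assumptions (dimensions, the pooling step, and the scoring vector `w_score` are placeholders, not the paper's actual architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # Single-head scaled dot-product self-attention over one task's
    # sequence of state-action transition embeddings X: (seq_len, d).
    d = X.shape[-1]
    A = softmax(X @ X.T / np.sqrt(d), axis=-1)
    return A @ X

def unified_loss(task_transitions, task_losses, w_score):
    # Attend over each task's transitions, mean-pool to one vector per
    # task, score it, and softmax the scores into task weights. The
    # unified loss is then the weighted sum of the per-task RL losses.
    pooled = np.stack([self_attention(T).mean(axis=0)
                       for T in task_transitions])      # (n_tasks, d)
    weights = softmax(pooled @ w_score)                 # (n_tasks,)
    return float(weights @ np.asarray(task_losses))
```

Because the weights are a softmax, the unified loss is always a convex combination of the per-task losses, so neither task can be silently dropped from training.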
Keywords
- Multi-task learning
- Reinforcement learning
- Live video streaming
Notes
- 1.
We consider only the l previous events to capture the most recent viewers' behavior. As we will demonstrate in Sect. 4, considering large values of l does not necessarily improve the model's performance.
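The sliding window of the l most recent events described in the note above amounts to a fixed-size buffer; a minimal sketch (the event identifiers and l = 3 are placeholder assumptions):

```python
from collections import deque

L_WINDOW = 3                            # placeholder value of l
recent_events = deque(maxlen=L_WINDOW)  # oldest event is dropped automatically
for event_id in range(1, 6):            # five events arriving in time order
    recent_events.append(event_id)

print(list(recent_events))              # -> [3, 4, 5]
```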
© 2021 Springer Nature Switzerland AG
Cite this paper
Antaris, S., Rafailidis, D., Arriaza, R. (2021). Multi-task Learning for User Engagement and Adoption in Live Video Streaming Events. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science(), vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_29
DOI: https://doi.org/10.1007/978-3-030-86517-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86516-0
Online ISBN: 978-3-030-86517-7
