Abstract
The live video comment generation task aims to automatically generate real-time viewer comments on videos, as real viewers do. Much as a search engine offers query suggestions, this task can help viewers find the comments they want to post by suggesting generated candidates. Previous works ignore the interactivity and diversity of comments and can only generate generic, popular comments. In this paper, we incorporate the post time of comments to model real-time comment interactions. We also take video type labels into account to handle the diversity of comments and to generate more relevant and informative ones. To this end, we propose a pre-training-based encoder-decoder joint model, PLVCG. It consists of a bidirectional encoder that jointly encodes context comments and visual frames, and an auto-regressive decoder that generates real-time comments and classifies the video type. We evaluate our model on a large-scale real-world live comment dataset. The experimental results show that our model significantly outperforms the state of the art on both the live video comment ranking and generation tasks.
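Since the abstract only names the components, the following is a minimal PyTorch sketch of how such an encoder-decoder could be wired together. The class name PLVCGSketch, the post-time bucketing scheme, the frame-feature dimension, and all hyperparameters are illustrative assumptions, and positional encodings are omitted for brevity; this is not the authors' implementation.

```python
import torch
import torch.nn as nn


class PLVCGSketch(nn.Module):
    """Sketch of the architecture described in the abstract: a bidirectional
    encoder over context comments (with post-time embeddings) and visual
    frames, an auto-regressive decoder for comment generation, and a
    video-type classification head. All sizes here are assumptions."""

    def __init__(self, vocab_size=30000, d_model=512, n_types=16,
                 n_time_buckets=64, n_heads=8, n_layers=6, frame_dim=2048):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Bucketed post time of each context comment (hypothetical scheme).
        self.time_emb = nn.Embedding(n_time_buckets, d_model)
        # Project pre-extracted frame features (e.g. from a CNN) to d_model.
        self.frame_proj = nn.Linear(frame_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)   # next-token logits
        self.type_head = nn.Linear(d_model, n_types)    # video-type logits

    def forward(self, ctx_tokens, ctx_time, frames, tgt_tokens):
        # Jointly encode context comments (text + post time) and frames.
        ctx = self.tok_emb(ctx_tokens) + self.time_emb(ctx_time)
        memory = self.encoder(torch.cat([ctx, self.frame_proj(frames)], dim=1))
        # Decode the target comment auto-regressively with a causal mask.
        tgt = self.tok_emb(tgt_tokens)
        causal = torch.triu(
            torch.full((tgt.size(1), tgt.size(1)), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        # Pool the encoder memory for the auxiliary video-type classifier.
        return self.lm_head(out), self.type_head(memory.mean(dim=1))


# Smoke test with random inputs: batch of 2, 20 context tokens, 8 frames,
# 10 target tokens.
model = PLVCGSketch()
lm_logits, type_logits = model(
    torch.randint(0, 30000, (2, 20)),   # context comment tokens
    torch.randint(0, 64, (2, 20)),      # post-time bucket per token
    torch.randn(2, 8, 2048),            # frame features
    torch.randint(0, 30000, (2, 10)))   # target comment tokens
print(lm_logits.shape, type_logits.shape)  # (2, 10, 30000) (2, 16)
```

Adding the time embedding to the token embedding, rather than concatenating it, is one plausible reading of "incorporate post time"; the auxiliary type head sharing the encoder memory mirrors the joint generation-and-classification objective the abstract describes.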
Cite this paper
Zeng, Z., Gao, N., Xue, C., Tu, C. (2021). PLVCG: A Pretraining Based Model for Live Video Comment Generation. In: Karlapalem, K., et al. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol. 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_55