Abstract
Unlike traditional videos, a micro-video is a short video shared on social platforms. As user-generated content, micro-videos have stronger social attributes than ordinary videos. Research on micro-video analysis has been conducted in both industry and academia and covers venue classification, tag prediction, popularity prediction, action prediction, click prediction, and recommendation. In this paper, we first review the studies on these tasks, grouped into micro-video classification, prediction, and recommendation. We then present an overview of the methods, features, datasets, and evaluation metrics used in these studies. Finally, we analyze the challenges of micro-video analysis. Because research on micro-video analysis is still limited, some aspects, such as micro-video classification, cannot yet be summarized comprehensively. We believe that this survey will help researchers and practitioners who are interested in micro-video analysis.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (62176141, 62176139, 61876098), Major Basic Research Project of Natural Science Foundation of Shandong Province (ZR2021ZD15), Taishan Scholar Project of Shandong Province (tsqn202103088), Shandong Provincial Natural Science Foundation for Distinguished Young Scholars (ZR2021JQ26), Natural Science Foundation of Shandong Province (ZR2021QF119, ZR2022MF272) and special funds for distinguished professors of Shandong Jianzhu University.
Ethics declarations
Not applicable.
About this article
Cite this article
Guo, J., Gong, R., Ma, Y. et al. A survey of micro-video analysis. Multimed Tools Appl 83, 32191–32212 (2024). https://doi.org/10.1007/s11042-023-16691-1
DOI: https://doi.org/10.1007/s11042-023-16691-1