A survey of micro-video analysis

Guo, Jie; Gong, Rui; Ma, Yuling; Liu, Meng; Xi, Xiaoming; Nie, Xiushan; Yin, Yilong

doi:10.1007/s11042-023-16691-1

Published: 20 September 2023

Volume 83, pages 32191–32212, (2024)

Multimedia Tools and Applications

Jie Guo¹, Rui Gong¹, Yuling Ma¹, Meng Liu¹, Xiaoming Xi¹, Xiushan Nie¹ & Yilong Yin²


Abstract

As opposed to traditional video, a micro-video is a short video that is spread on social platforms. As user-generated content, micro-videos have stronger social attributes than ordinary videos. Research on micro-video analysis has been conducted in both industry and academia and includes venue classification, tag prediction, popularity prediction, action prediction, click prediction, and recommendation. In this paper, we first review the studies on these tasks in terms of micro-video classification, prediction, and recommendation. Thereafter, we present an overview of the methods, features, datasets, and evaluation metrics relating to these studies. Finally, we analyze the challenges of micro-video analysis. Because research on micro-video analysis remains limited, we cannot summarize some aspects of it, such as micro-video classification. We believe that this survey will help researchers and practitioners who are interested in micro-video analysis deepen their knowledge of the field.


Notes

www.acmmm16.wixsite.com/mm16


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (62176141, 62176139, 61876098), Major Basic Research Project of Natural Science Foundation of Shandong Province (ZR2021ZD15), Taishan Scholar Project of Shandong Province (tsqn202103088), Shandong Provincial Natural Science Foundation for Distinguished Young Scholars (ZR2021JQ26), Natural Science Foundation of Shandong Province (ZR2021QF119, ZR2022MF272) and special funds for distinguished professors of Shandong Jianzhu University.

Author information

Authors and Affiliations

Shandong Jianzhu University, Jinan, China
Jie Guo, Rui Gong, Yuling Ma, Meng Liu, Xiaoming Xi & Xiushan Nie
Shandong University, Jinan, China
Yilong Yin

Corresponding author

Correspondence to Xiushan Nie.

Ethics declarations

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, J., Gong, R., Ma, Y. et al. A survey of micro-video analysis. Multimed Tools Appl 83, 32191–32212 (2024). https://doi.org/10.1007/s11042-023-16691-1

Received: 10 October 2022
Revised: 17 May 2023
Accepted: 27 August 2023
Published: 20 September 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11042-023-16691-1
