
A survey of micro-video analysis

Multimedia Tools and Applications

Abstract

In contrast to a traditional video, a micro-video is a short video that spreads on social platforms. As user-generated content, micro-videos have stronger social attributes than ordinary videos. Research on micro-video analysis has been conducted in both industry and academia, and covers venue classification, tag prediction, popularity prediction, action prediction, click prediction, and recommendation. In this paper, we first review the studies on these tasks, grouped into micro-video classification, prediction, and recommendation. Thereafter, we present an overview of the methods, features, datasets, and evaluation metrics used in these studies. Finally, we analyze the challenges of micro-video analysis. Because research on micro-video analysis is still limited, we cannot summarize some aspects of it, such as micro-video classification. We believe that this survey will help researchers and practitioners who are interested in micro-video analysis.





Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (62176141, 62176139, 61876098), Major Basic Research Project of Natural Science Foundation of Shandong Province (ZR2021ZD15), Taishan Scholar Project of Shandong Province (tsqn202103088), Shandong Provincial Natural Science Foundation for Distinguished Young Scholars (ZR2021JQ26), Natural Science Foundation of Shandong Province (ZR2021QF119, ZR2022MF272) and special funds for distinguished professors of Shandong Jianzhu University.

Author information


Corresponding author

Correspondence to Xiushan Nie.

Ethics declarations

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Guo, J., Gong, R., Ma, Y. et al. A survey of micro-video analysis. Multimed Tools Appl 83, 32191–32212 (2024). https://doi.org/10.1007/s11042-023-16691-1



  • DOI: https://doi.org/10.1007/s11042-023-16691-1
