Video highlight extraction via content-aware deep transfer

  • Ke Niu
  • Han Wang

Abstract

In this paper, we focus on detecting highlights in online videos. Given the explosive growth of online videos, it is becoming increasingly important to single out those highlights for audiences instead of requiring them to browse every tedious part of a video. Ideally, the content of the extracted highlights should be consistent with both the topic of the video and the preferences of the individual viewer. To this end, this paper introduces a novel content-aware approach that formulates highlight detection in a transfer learning framework. Experimental results under this framework on three different types of videos show that our content-aware highlight extraction method is particularly useful for fetching content from online videos, e.g. showing an abstract of the entire video while focusing playback on the parts that match the user's queries.

Keywords

Video retrieval · Video summarization · Highlight detection · Convolutional network

Notes

Acknowledgments

The research was supported in part by the Natural Science Foundation of China (NSFC) under Grant No. 61703046.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Computer School, Beijing Information Science and Technology University, Beijing, China
  2. School of Information Science and Technology, Beijing Forestry University, Beijing, China
