Multimedia Tools and Applications, Volume 78, Issue 2, pp 1737–1755

Interactive video summarization with human intentions

  • Huaping Liu
  • Fuchun Sun
  • Xinyu Zhang
  • Bin Fang


Automatic video summarization is a typical cognitive-inspired task that attempts to select a small set of the most representative images or video clips from a video sequence, and it is vital for enabling many tasks. In this work, we develop an interactive Non-negative Matrix Factorization (NMF) method for discovering representative action videos. The original video is first segmented evenly into short clips, and each clip is described with a bag-of-words model. A temporally consistent NMF model is then used for clustering and action segmentation. Because the clustering and segmentation results may not match the user's intention, the user-controlled operations MERGE and ADD are provided so that the user can adjust the results to meet expectations. The interactive NMF method can therefore generate personalized results. Experimental results on the public Weizmann dataset demonstrate that our approach provides satisfactory action discovery and segmentation results.
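The pipeline in the abstract (clip-level bag-of-words histograms, NMF clustering with temporal consistency, then user adjustment) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below substitutes a chain-graph regularizer in the style of graph-regularized NMF as one plausible way to encourage neighboring clips to share a cluster. The function names `temporal_nmf`, `to_segments`, and `merge_labels`, and the weight `lam`, are hypothetical, and `merge_labels` is only an assumed reading of the MERGE operation. ADD is not sketched because the abstract does not define it.

```python
import numpy as np


def temporal_nmf(X, k, lam=1.0, n_iter=200, seed=0, eps=1e-9):
    """Factor X (n_words x n_clips, non-negative) as U @ V.T.

    Assumption: temporal consistency is modeled with a chain graph that
    links each clip to its immediate neighbors (graph-regularized NMF
    style). This is not confirmed to be the paper's model.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps  # word-by-cluster basis
    V = rng.random((n, k)) + eps  # clip-by-cluster memberships

    # Chain-graph adjacency: clip t is connected to clips t-1 and t+1.
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    D = np.diag(W.sum(axis=1))

    # Multiplicative updates keep U and V non-negative.
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


def to_segments(labels):
    """Group consecutive clips with equal labels into (start, end, label)."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t - 1, int(labels[start])))
            start = t
    return segments


def merge_labels(labels, keep, drop):
    """Hypothetical MERGE: relabel every clip in cluster `drop` as `keep`."""
    labels = np.asarray(labels)
    return np.where(labels == drop, keep, labels)
```

In this sketch, each clip is assigned to the cluster with the largest membership via `V.argmax(axis=1)`, and `to_segments` turns that label sequence into action segments. A user who judges two discovered clusters to be the same action could call `merge_labels` and recompute the segments.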


Interactive action summarization; Video summarization; Human-machine interaction; Non-negative matrix factorization



This work was supported in part by the National Natural Science Foundation of China under Grants U1613212 and 61673238, in part by the Beijing Municipal Science and Technology Commission under Grant D171100005017002, and in part by the National High Technology Research and Development Program of China under Grant 2016YFB0100903.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Technology, Tsinghua University, BNRist, State Key Lab. of Intelligent Technology and Systems, Beijing, China
  2. State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing, China
