Greedy Salient Dictionary Learning for Activity Video Summarization

  • Ioannis Mademlis
  • Anastasios Tefas
  • Ioannis Pitas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11295)

Abstract

Automated video summarization is well-suited to the task of analysing human activity videos (e.g., from surveillance feeds), mainly as a pre-processing step, due to the large volume of such data and the small percentage of actually important video frames. Although key-frame extraction remains the most popular way to summarize such footage, its successful application to activity videos is obstructed by the lack of editing cuts and the heavy inter-frame visual redundancy. Salient dictionary learning, recently proposed for activity video key-frame extraction, models the problem as the identification of a small number of video frames that, simultaneously, can best reconstruct the entire video stream and are salient compared to the rest. In previous work, the reconstruction term was modelled as a Column Subset Selection Problem (CSSP) and a numerical, SVD-based algorithm was adapted for solving it, while video frame saliency, in the fastest algorithm proposed up to now, was also estimated using SVD. In this paper, the numerical CSSP method is replaced by a greedy, iterative one, properly adapted for salient dictionary learning, while the SVD-based saliency term is retained. As shown by an extensive empirical evaluation, the resulting approach significantly outperforms all competing key-frame extraction methods with regard to speed, without sacrificing summarization accuracy. Additionally, a computational complexity analysis of all salient dictionary learning and related methods is presented.
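
To make the idea of greedy column subset selection concrete, the following is a minimal Python/NumPy sketch. It is not the algorithm of the paper: it uses a generic greedy residual-reduction criterion, and the function name `greedy_key_frames`, the per-frame `saliency` vector (assumed to be precomputed elsewhere and scaled to [0, 1]) and the trade-off weight `alpha` are hypothetical choices made only for illustration.

```python
import numpy as np


def greedy_key_frames(X, k, saliency=None, alpha=0.5):
    """Pick k columns (frames) of X greedily.

    X        : (d, n) array, one descriptor column per video frame.
    saliency : optional length-n scores in [0, 1] (assumed given, hypothetical).
    alpha    : hypothetical weight between reconstruction gain and saliency.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    if saliency is None:
        saliency = np.zeros(n)
    saliency = np.asarray(saliency, dtype=float)

    E = X.copy()      # residual: the part of X not yet explained
    selected = []
    for _ in range(k):
        norms = np.sum(E * E, axis=0)     # squared norm of each residual column
        G = E.T @ E                       # Gram matrix of the residual
        # Drop in squared Frobenius error if column j were added:
        # ||E^T e_j||^2 / ||e_j||^2
        gain = np.sum(G * G, axis=0) / np.maximum(norms, 1e-12)
        gain[norms < 1e-12] = 0.0         # already fully explained columns
        top = gain.max()
        recon = gain / top if top > 0 else gain

        score = (1.0 - alpha) * recon + alpha * saliency
        if selected:
            score[selected] = -np.inf     # never pick the same frame twice
        j = int(np.argmax(score))
        selected.append(j)

        # Deflate: remove the direction of the chosen column from the residual.
        q = E[:, j]
        q_norm = np.linalg.norm(q)
        if q_norm > 1e-12:
            q = q / q_norm
            E = E - np.outer(q, q @ E)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((64, 200))   # synthetic stand-in for frame descriptors
    print(greedy_key_frames(frames, k=5, alpha=0.0))
```

Each iteration scores every frame without computing an SVD, which is the general appeal of a greedy scheme; recomputing the full Gram matrix per step, as done here for clarity, is a simplification and not an efficient implementation.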

Keywords

Key-frame extraction, Dictionary learning, Column Subset Selection Problem, Video summarization

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Ioannis Mademlis (1)
  • Anastasios Tefas (1)
  • Ioannis Pitas (1)
  1. Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
