INNS 2016: Advances in Big Data pp 18-28 | Cite as
Compact Video Description and Representation for Automated Summarization of Human Activities
Abstract
A compact framework is presented for the description and representation of videos depicting human activities, with the goal of enabling automated large-volume video summarization for semantically meaningful key-frame extraction. The framework is structured around the concept of per-frame visual word histograms, using the popular Bag-of-Features approach. Three existing image descriptors (histogram, FMoD, SURF) and a novel one (LMoD), as well as a component of an existing state-of-the-art activity descriptor (Dense Trajectories), are adapted into the proposed framework and quantitatively compared against each other, as well as against the most common video summarization descriptor (global image histogram), using a publicly available annotated dataset and the most prevalent video summarization method, i.e., frame clustering. In all cases, several image modalities are exploited (luminance, hue, edges, optical flow magnitude) in order to simultaneously capture information about the depicted shapes, colors, lighting, textures and motions. The quantitative evaluation results indicate that one of the proposed descriptors clearly outperforms the competing approaches in the context of the presented framework.
Keywords
Video summarization Video descriptionReferences
- 1.Evans, A., Agenjo, J., Blat, J.: Combined 2D and 3D web-based visualisation of on-set big media data. In: IEEE International Conference on Image Processing (ICIP), pp. 1120–1124 (2015)Google Scholar
- 2.Money, A.G., Agius, H.: Video summarization: a conceptual framework and survey of the state of the art. J. Vis. Commun. Representation 19(2), 121–143 (2008)CrossRefGoogle Scholar
- 3.Cahuina, E.J., Chavez, G.C.: A new method for static video summarization using local descriptors and video temporal segmentation. In: Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 226–233. IEEE (2013)Google Scholar
- 4.Hu, W., Xie, N., Li, L., Zeng, X., Maybank, S.: A survey on visual content-based video indexing and retrieval. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 41(6), 797–819 (2011)CrossRefGoogle Scholar
- 5.Zhuang, Y., Rui, Y., Huang, T., Mehrotra, S.: Adaptive key frame extraction using unsupervised clustering. In: International Conference on Image Processing (ICIP), pp. 866–870. IEEE (1998)Google Scholar
- 6.De Avilla, S.E.F., Lopes, A.P.B., Luz Jr., A.L., Araujo, A.A.: VSUMM: a mechanism designed to produce static video summaries and a novel evaluation method. Pattern Recogn. Lett. 32(1), 56–68 (2011)CrossRefGoogle Scholar
- 7.Wan, T., Qin, Z.: A new technique for summarizing video sequences through histogram evolution, pp. 1–5. IEEE (2010)Google Scholar
- 8.Mademlis, I., Nikolaidis, N., Pitas, I.: Stereoscopic video description for key-frame extraction in movie summarization. In: European Signal Processing Conference (EUSIPCO), pp. 819–823. IEEE (2015)Google Scholar
- 9.Li, J.: Video shot segmentation and key frame extraction based on SIFT feature. In: International Conference on Image Analysis and Signal Processing (IASP), pp. 1–8. IEEE (2012)Google Scholar
- 10.Lowe, D.G.: Object recognition from local scale-invariant features. In: International Conference on Computer Vision (ICCV), pp. 1150–1157. IEEE (1999)Google Scholar
- 11.Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008)CrossRefGoogle Scholar
- 12.Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: European Conference on Computer Vision (ECCV), pp. 1–2 (2004)Google Scholar
- 13.Tian, Z., Xue, J., Lan, X., Li, C., Zheng, N.: Key object-based static video summarization. In: ACM International Conference on Multimedia, pp. 1301–1304 (2011)Google Scholar
- 14.Cernekova, Z., Pitas, I., Nikou, C.: Information theory-based shot cut/fade detection and video summarization. IEEE Trans. Circuits Syst. Video Technol. 16(1), 82–91 (2006)CrossRefGoogle Scholar
- 15.Fu, W., Wang, J., Gui, L., Lu, H., Ma, S.: Online video synopsis of structured motion. Neurocomputing 135, 155–162 (2014)CrossRefGoogle Scholar
- 16.Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action recognition by Dense Trajectories. In: IEEE Conference on Computer Vision & Pattern Recognition (CVPR), pp. 3169–3176 (2011)Google Scholar
- 17.Mademlis, I., Iosifidis, A., Tefas, A., Nikolaidis, N., Pitas, I.: Exploiting stereoscopic disparity for augmenting human activity recognition performance. Multimedia Tools Appl. 75, 1–20 (2015)Google Scholar
- 18.Kourous, N., Iosifidis, A., Tefas, A., Nikolaidis, N., Pitas, I.: Video characterization based on activity clustering. In: International Conference on Electrical and Computer Engineering (ICECE), pp. 266–269. IEEE (2014)Google Scholar
- 19.Kim, H., Hilton, A.: Influence of colour and feature geometry on multi-modal 3D point clouds data registration. In: International Conference on 3D Vision (3DV), pp. 202–209 (2014)Google Scholar
- 20.Penatti, O., Valle, E., da Silva Torres, R.: Comparative study of global color and texture descriptors for Web image retrieval. J. Vis. Commun. Image Representation 23(2), 359–380 (2012)CrossRefGoogle Scholar
- 21.Arthur, D., Vassilvitskii, S.: K-Means++: the advantages of careful seeding. In: Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)Google Scholar
- 22.Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 363–370. Springer, Heidelberg (2003). doi: 10.1007/3-540-45103-X_50 CrossRefGoogle Scholar