
Classifying web videos using a global video descriptor

  • Original Paper
  • Published in Machine Vision and Applications

Abstract

Computing descriptors for videos is a crucial task in computer vision. In this paper, we propose a global video descriptor for the classification of videos. Our method bypasses the detection of interest points, the extraction of local video descriptors, and the quantization of descriptors into a code book; it represents each video sequence as a single feature vector. Our global descriptor is computed by applying a bank of 3-D spatio-temporal filters to the frequency spectrum of a video sequence; hence, it integrates information about both motion and scene structure. We tested our approach on three datasets, KTH (Schuldt et al., Proceedings of the 17th international conference on pattern recognition (ICPR’04), vol. 3, pp. 32–36, 2004), UCF50 (http://vision.eecs.ucf.edu/datasetsActions.html) and HMDB51 (Kuehne et al., HMDB: a large video database for human motion recognition, 2011), and obtained promising results which demonstrate the robustness and the discriminative power of our global video descriptor for classifying videos of various actions. In addition, the combination of our global descriptor with a local descriptor resulted in the highest classification accuracies on the UCF50 and HMDB51 datasets.
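To make the idea concrete, below is a minimal sketch (not the authors' implementation) of a global descriptor built from the 3-D frequency spectrum of a clip: the power spectrum is pooled through a small bank of spatio-temporal band-pass filters, and the pooled energies form a single fixed-length feature vector. The Gaussian band-pass filter shapes, the band centers, the grayscale input, and the L2 normalization are illustrative assumptions; the paper's actual filter bank and pooling differ in detail.

```python
import numpy as np


def frequency_grid(shape):
    """Normalized 3-D frequency coordinates (fy, fx, ft) for an (H, W, T) video."""
    fy = np.fft.fftfreq(shape[0])[:, None, None]
    fx = np.fft.fftfreq(shape[1])[None, :, None]
    ft = np.fft.fftfreq(shape[2])[None, None, :]
    return fy, fx, ft


def filter_bank(shape, n_spatial=4, n_temporal=3, bandwidth=0.08):
    """Illustrative bank of Gaussian band-pass filters over spatial radius and
    temporal frequency (an assumption; the paper's filter design differs)."""
    fy, fx, ft = frequency_grid(shape)
    r = np.sqrt(fy ** 2 + fx ** 2)                    # spatial frequency magnitude
    bank = []
    for r0 in np.linspace(0.05, 0.4, n_spatial):      # spatial frequency bands
        for t0 in np.linspace(0.0, 0.4, n_temporal):  # temporal bands (0 ~ static)
            bank.append(np.exp(-((r - r0) ** 2 + (np.abs(ft) - t0) ** 2)
                               / (2 * bandwidth ** 2)))
    return bank


def global_descriptor(video):
    """video: (H, W, T) grayscale array -> single L2-normalized feature vector."""
    video = (video - video.mean()) / (video.std() + 1e-8)  # remove DC / contrast bias
    power = np.abs(np.fft.fftn(video)) ** 2                # 3-D power spectrum
    feats = np.array([np.sum(f * power) for f in filter_bank(video.shape)])
    return feats / (np.linalg.norm(feats) + 1e-8)


if __name__ == "__main__":
    clip = np.random.rand(64, 64, 32)       # stand-in for a grayscale video clip
    print(global_descriptor(clip).shape)    # (12,) = n_spatial * n_temporal
```

Each clip yields a fixed-length vector regardless of its duration, so classification reduces to training an off-the-shelf multi-class SVM (e.g., LIBSVM [25]) on these vectors.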


References

  1. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings of the 17th international conference on pattern recognition (ICPR’04), vol. 3, pp. 32–36 (2004)

  2. UCF50 dataset. http://vision.eecs.ucf.edu/datasetsActions.html

  3. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: ICCV (2011)

  4. Poppe, R.: A survey on vision-based human action recognition. Image Vision Comput. 28, 976–990 (2010)


  5. Weinland, D., Ronfard, R., Boyer, E.: A survey of vision-based methods for action representation, segmentation and recognition. Comput. Vis. Image Underst. 115, 224–241 (2011)


  6. Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Computer Society conference on computer vision and pattern recognition (CVPR ’08) (2008)

  7. Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos in the wild. In: IEEE conference on computer vision and pattern recognition (CVPR ’09), pp. 1996–2003 (2009)

  8. Bobick, A., Davis, J.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23, 257–267 (2001)


  9. Yilmaz, A., Shah, M.: A differential geometric approach to representing the human actions. Comput. Vis. Image Underst. 109, 335–351 (2008)


  10. Black, M.: Explaining optical flow events with parameterized spatio-temporal models. In: IEEE Computer Society conference on computer vision and pattern recognition (CVPR ’99), vol. 1, pp. 326–332 (1999)

  11. Polana, R., Nelson, R.C.: Detection and recognition of periodic, non-rigid motion. Int. J. Comput. Vision 23, 261–282 (1997)


  12. Wu, S., Oreifej, O., Shah, M.: Action recognition in videos acquired by a moving camera using motion decomposition of lagrangian particle trajectories. In: IEEE international conference on computer vision (ICCV ’11), pp. 1419–1426 (2011)

  13. Wang, H., Klaser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: IEEE conference on computer vision and pattern recognition (CVPR ’11), pp. 3169–3176 (2011)

  14. Laptev, I.: On space-time interest points. Int. J. Comput. Vision 64, 107–123 (2005)


  15. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proceedings of fourth Alvey vision conference, pp. 147–151 (1988)

  16. Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: VS-PETS, pp. 65–72 (2005)

  17. Klaser, A., Marszalek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. In: BMVC (2008)


  18. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: ACM multimedia, pp. 357–360 (2007)

  19. Ikizler-Cinbis, N., Sclaroff, S.: Object, scene and actions: combining multiple features for human action recognition. In: Proceedings of the 11th European conference on computer vision (ECCV ’10), pp. 494–507 (2010)

  20. Oliva, A., Torralba, A.B., Guerin-Dugue, A., Herault, J.: Global semantic classification of scenes using power spectrum templates. In: The challenge of image retrieval, pp. 1–12 (1999)

  21. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vision 42, 145–175 (2001)


  22. Heeger, D.J.: Notes on motion estimation. http://white.stanford.edu/~heeger (1998)

  23. Maaten, L.V.D., Postma, E.O., Herik, H.J.V.D.: Dimensionality reduction: a comparative review (2008)

  24. Wang, H., Ullah, M.M., Klaser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: BMVC, p. 127 (2009)

  25. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)


  26. Jhuang, H., Serre, T., Wolf, L., Poggio, T.: A biologically inspired system for action recognition. In: IEEE 11th international conference on computer vision (ICCV’07), pp. 1–8 (2007)

  27. Gilbert, A., Illingworth, J., Bowden, R.: Action recognition using mined hierarchical compound features. IEEE Trans. Pattern Anal. Mach. Intell. 33, 883–897 (2011)



Acknowledgments

The research presented in this paper is supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior National Business Center, contract number D11PC20071. The US government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC or the US government.

Author information

Corresponding author

Correspondence to Berkan Solmaz.

Electronic supplementary material


Supplementary material (AVI 9508KB)


About this article

Cite this article

Solmaz, B., Assari, S.M. & Shah, M. Classifying web videos using a global video descriptor. Machine Vision and Applications 24, 1473–1485 (2013). https://doi.org/10.1007/s00138-012-0449-x
