
Language-Motivated Approaches to Action Recognition

Chapter in: Gesture Recognition

Abstract

We present language-motivated approaches to detecting, localizing and classifying activities and gestures in videos. In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses, motion patterns and classes of activities. This process is somewhat analogous to the method of detecting topics or categories from documents based on the word content of the documents, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot-learning approach.

Editors: Isabelle Guyon and Vassilis Athitsos.


Notes

  1. In the context of activity spotting, we use the term gestures instead of activities, solely for consistency with the terminology of the ChaLearn Gesture Challenge.

  2. An implementation is available at http://www.irisa.fr/vista/Equipe/People/Laptev/download.html#stip.

  3. States are modeled as multinomials because our input observables take discrete values.
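The filler-model idea behind the spotting framework can be illustrated in its simplest form. The sketch below deliberately replaces the temporal models with plain unigram multinomials (a simplification for illustration only): each sliding window of a discrete symbol stream is scored by the log-likelihood ratio of a hypothetical gesture model against a background (filler) model, and windows exceeding a threshold are flagged. All names and values here are invented for the example.

```python
import numpy as np

def spot_gesture(stream, p_gesture, p_filler, win, thresh):
    """Flag windows whose log-likelihood ratio favours the gesture model.

    stream    : sequence of discrete symbols (e.g. quantized motion words)
    p_gesture : (V,) multinomial over symbols under the gesture model
    p_filler  : (V,) multinomial over symbols under the background model
    win       : window length; thresh : detection threshold on the ratio
    """
    hits = []
    llr = np.log(p_gesture) - np.log(p_filler)       # per-symbol score
    for start in range(len(stream) - win + 1):
        score = sum(llr[s] for s in stream[start:start + win])
        if score > thresh:
            hits.append((start, score))              # candidate detection
    return hits
```

In the actual framework the two scores would come from temporal models rather than unigrams, but the detection rule, a likelihood ratio against a filler model compared to a threshold, is the same as in keyword spotting for speech.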


Acknowledgements

The authors wish to thank the associate editors and anonymous referees for all their advice about the structure, references, experimental illustration and interpretation of this manuscript. The work benefited significantly from our participation in the ChaLearn challenge as well as the accompanying workshops.

Author information

Correspondence to Manavender R. Malgireddy.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Malgireddy, M.R., Nwogu, I., Govindaraju, V. (2017). Language-Motivated Approaches to Action Recognition. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_5


  • DOI: https://doi.org/10.1007/978-3-319-57021-1_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57020-4

  • Online ISBN: 978-3-319-57021-1

  • eBook Packages: Computer Science (R0)
