Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification

Niebles, Juan Carlos; Chen, Chih-Wei; Fei-Fei, Li

doi:10.1007/978-3-642-15552-9_29

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6312)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Much recent research in human activity recognition has focused on recognizing simple repetitive actions (walking, running, waving) and punctual actions (sitting up, opening a door, hugging). However, many interesting human activities are characterized by a complex temporal composition of simple actions. Automatic recognition of such complex actions can benefit from a good understanding of their temporal structure. In this paper, we present a framework for modeling motion by exploiting the temporal structure of human activities. In our framework, we represent activities as temporal compositions of motion segments. We train a discriminative model that encodes a temporal decomposition of video sequences, together with an appearance model for each motion segment. In recognition, a query video is matched to the model according to the learned appearances and motion segment decomposition. Classification is based on the quality of the match between the motion segment classifiers and the temporal segments in the query sequence. To validate our approach, we introduce a new dataset of complex Olympic Sports activities. We show that our algorithm performs better than other state-of-the-art methods.
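
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`weights`, `anchor`, `length`, `penalty`), the averaged per-frame feature used as a stand-in for segment appearance, and the quadratic displacement penalty are all assumptions made for illustration. Each motion segment classifier is slid over the query video, its best placement is scored, and the class whose model matches best is returned.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def segment_score(weights, frame_features, start, length):
    """Appearance score of one segment classifier on frames [start, start + length).

    The window is summarised by its mean feature vector (an assumed stand-in
    for whatever appearance descriptor a real system would use).
    """
    window = frame_features[start:start + length]
    dim = len(weights)
    mean = [sum(f[d] for f in window) / len(window) for d in range(dim)]
    return dot(weights, mean)


def match_video(model, frame_features):
    """Match every motion segment of a model to its best temporal placement.

    Returns (total score, list of chosen start frames). Each segment pays a
    quadratic penalty (assumed form) for drifting away from its anchor.
    """
    n = len(frame_features)
    total, placements = 0.0, []
    for seg in model:
        length = max(1, round(seg["length"] * n))  # segment length in frames
        anchor = seg["anchor"] * n                 # preferred centre, in frames
        best_score, best_start = float("-inf"), 0
        for start in range(0, n - length + 1):
            centre = start + length / 2
            penalty = seg["penalty"] * ((centre - anchor) / n) ** 2
            score = segment_score(seg["weights"], frame_features, start, length) - penalty
            if score > best_score:
                best_score, best_start = score, start
        total += best_score
        placements.append(best_start)
    return total, placements


def classify(models, frame_features):
    """Pick the activity whose model achieves the highest matching score."""
    return max(models, key=lambda name: match_video(models[name], frame_features)[0])


# Toy query video: 5 frames of one motion type followed by 5 frames of another.
video = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5

# Two hypothetical activity models with the same segments in opposite order.
models = {
    "A": [
        {"weights": [1.0, 0.0], "anchor": 0.25, "length": 0.5, "penalty": 4.0},
        {"weights": [0.0, 1.0], "anchor": 0.75, "length": 0.5, "penalty": 4.0},
    ],
    "B": [
        {"weights": [0.0, 1.0], "anchor": 0.25, "length": 0.5, "penalty": 4.0},
        {"weights": [1.0, 0.0], "anchor": 0.75, "length": 0.5, "penalty": 4.0},
    ],
}

print(classify(models, video))
```

In this toy setup the two models share the same segment appearances and differ only in temporal order, so the sketch shows how the temporal arrangement alone can separate two activities.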

Author information

Authors and Affiliations

Stanford University, Stanford, CA, 94305, USA
Juan Carlos Niebles, Chih-Wei Chen & Li Fei-Fei
Princeton University, Princeton, NJ, 08544, USA
Juan Carlos Niebles
Universidad del Norte, Barranquilla, Colombia
Juan Carlos Niebles

Editor information

Editors and Affiliations

GRASP Laboratory, University of Pennsylvania, 3330 Walnut Street, 19104, Philadelphia, PA, USA
Kostas Daniilidis
School of Electrical and Computer Engineering, National Technical University of Athens, 15773, Athens, Greece
Petros Maragos
Department of Applied Mathematics, Ecole Centrale de Paris, Grande Voie des Vignes, 92295, Chatenay-Malabry, France
Nikos Paragios

Cite this paper

Niebles, J.C., Chen, CW., Fei-Fei, L. (2010). Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15552-9_29

DOI: https://doi.org/10.1007/978-3-642-15552-9_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15551-2
Online ISBN: 978-3-642-15552-9
eBook Packages: Computer Science, Computer Science (R0)
