Spatio-Temporal Phrases for Activity Recognition

Zhang, Yimeng; Liu, Xiaoming; Chang, Ming-Ching; Ge, Weina; Chen, Tsuhan

doi:10.1007/978-3-642-33712-3_51

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7574)

Included in the following conference series: European Conference on Computer Vision

Abstract

Local-feature-based approaches have become popular for activity recognition. A local feature captures the movement and appearance of a small region in a video and can therefore be ambiguous; for example, when the camera is far from the person, it cannot tell whether a movement comes from a hand or a foot. To better distinguish different types of activities, prior work has combined local features to encode the relationships among local movements. Because of computational limits, however, previous work builds combinations only from features that are neighbors in space and/or time. In this paper, we propose an approach that efficiently identifies both local and long-range motion interactions. Taking the “push” activity as an example, our approach can capture the combination of one person’s hand movement and another person’s foot response, even though the corresponding local features are far apart both spatially and temporally. The computational complexity is linear in the number of local features in a video. Extensive experiments show that, compared with a number of state-of-the-art methods, our approach is generically effective for recognizing a wide variety of activities, as well as activities spanning a long period.
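
The abstract does not spell out how long-range combinations can be found without enumerating feature pairs. One plausible route, sketched here purely as an illustration, is offset-space voting of the kind used for geometry-preserving visual phrases in image retrieval: features quantized to the same visual word in two videos vote for their quantized (Δx, Δy, Δt) offset, and features whose votes land in the same bin share a spatio-temporal layout no matter how far apart they are. The function name `count_shared_phrases`, the `(word_id, x, y, t)` feature tuples, and the bin sizes are assumptions made for this sketch, not details taken from the paper.

```python
from collections import Counter, defaultdict
from math import comb


def count_shared_phrases(video_a, video_b, k=2, bin_size=(10, 10, 5)):
    """Count length-k feature combinations that two videos share with the
    same spatio-temporal layout (hypothetical helper, not the paper's code).

    Each video is a list of (word_id, x, y, t) tuples, where word_id is the
    quantized descriptor of a local feature and (x, y, t) is its location.
    """
    # Index the second video's features by visual word for direct lookup.
    by_word = defaultdict(list)
    for word, x, y, t in video_b:
        by_word[word].append((x, y, t))

    # Every same-word match votes for its quantized (dx, dy, dt) offset.
    votes = Counter()
    for word, x, y, t in video_a:
        for bx, by, bt in by_word.get(word, ()):
            offset = (
                int((bx - x) // bin_size[0]),
                int((by - y) // bin_size[1]),
                int((bt - t) // bin_size[2]),
            )
            votes[offset] += 1

    # n matches in one offset bin keep their relative layout, so any k of
    # them form a shared combination: C(n, k) combinations per bin.
    return sum(comb(n, k) for n in votes.values())


# Toy "push" example: a hand feature (word 1) and a distant foot feature
# (word 2) appear in both videos with the same relative displacement.
video_a = [(1, 0, 0, 0), (2, 100, 50, 40)]
video_b = [(1, 5, 5, 2), (2, 105, 55, 42)]
print(count_shared_phrases(video_a, video_b, k=2))  # prints 1
```

The loop visits each same-word match once, so the cost of this sketch grows with the number of matches rather than with the number of possible feature combinations; the two features in the toy example are counted as one shared pair even though they are 100 pixels and 40 frames apart.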

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, Cornell University, USA
Yimeng Zhang & Tsuhan Chen
GE Global Research Center, 1 Research Circle, Niskayuna, NY, USA
Xiaoming Liu, Ming-Ching Chang & Weina Ge

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

Cite this paper

Zhang, Y., Liu, X., Chang, MC., Ge, W., Chen, T. (2012). Spatio-Temporal Phrases for Activity Recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33712-3_51

DOI: https://doi.org/10.1007/978-3-642-33712-3_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33711-6
Online ISBN: 978-3-642-33712-3
eBook Packages: Computer Science, Computer Science (R0)
