Action-Gons: Action Recognition with a Discriminative Dictionary of Structured Elements with Varying Granularity

Wang, Yuwang; Wang, Baoyuan; Yu, Yizhou; Dai, Qionghai; Tu, Zhuowen

doi:10.1007/978-3-319-16814-2_17

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9007)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

This paper presents “Action-Gons”, a mid-level representation for action recognition in videos. Actions in videos exhibit the regularity of human behavior as well as a large degree of variation. One key property that distinguishes actions from image scenes may be the amount of interaction among body parts, although scenes also exhibit structured patterns in 2D images. Here, we study high-order statistics of the interactions among regions of interest in actions and propose a mid-level representation for action recognition, inspired by the Julesz school of n-gon statistics. We propose a systematic learning process to build an over-complete dictionary of “Action-Gons”. We first extract motion clusters, termed action units, and then sequentially learn a pool of action-gons of different granularities that model different degrees of interaction among action units. We validate the discriminative power of the learned action-gons on three challenging video datasets and show clear advantages over existing methods.
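The abstract does not spell out how action-gons are learned, so the following is only a toy illustration of what order-n co-occurrence statistics over action units might look like, not the authors' procedure. It assumes each action unit has already been reduced to a cluster label and a space-time centroid; the function name `ngon_histogram`, the `radius` threshold, and the labels "A", "B", "C" are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def ngon_histogram(units, n, radius):
    """Count label tuples over groups of n units that are mutually close.

    units  : list of (label, x, y, t) tuples, one per hypothetical action unit
    n      : granularity (1 = single units, 2 = pairs, 3 = triples, ...)
    radius : maximum pairwise centroid distance for units to count as interacting
    """
    hist = Counter()
    for group in combinations(units, n):
        # A group qualifies only if every pair of its centroids is within radius.
        close = all(
            math.dist(a[1:], b[1:]) <= radius
            for a, b in combinations(group, 2)
        )
        if close:
            # Sort labels so the tuple does not depend on the order of the units.
            hist[tuple(sorted(u[0] for u in group))] += 1
    return hist


# Hypothetical toy input: three nearby units and one far away.
units = [("A", 0, 0, 0), ("B", 1, 0, 0), ("A", 0, 1, 0), ("C", 10, 10, 5)]

pairs = ngon_histogram(units, n=2, radius=1.5)
triples = ngon_histogram(units, n=3, radius=1.5)
```

Varying `n` gives histograms of different granularity over the same units, which is one simple way to read "different degrees of interaction"; a discriminative dictionary would additionally need a selection step that this sketch omits.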

This work was done when Yuwang Wang was an intern at Microsoft Research.

Notes

1. Code is available from: http://research.microsoft.com/en-us/downloads/dad6c31e-2c04-471f-b724-ded18bf70fe3.
2. Code is based on http://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html.

Acknowledgements

Zhuowen Tu is supported by NSF IIS-1216528 (IIS-1360566) and NSF award IIS-0844566 (IIS-1360568).

Author information

Authors and Affiliations

BBNC Lab, Department of Automation, THU, Beijing, China
Yuwang Wang & Qionghai Dai
Microsoft Research, Beijing, China
Baoyuan Wang
Department of Computer Science, HKU, Hong Kong, China
Yizhou Yu
Department of CogSci, UCSD, San Diego, USA
Zhuowen Tu

Corresponding author

Correspondence to Baoyuan Wang.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

About this paper

Cite this paper

Wang, Y., Wang, B., Yu, Y., Dai, Q., Tu, Z. (2015). Action-Gons: Action Recognition with a Discriminative Dictionary of Structured Elements with Varying Granularity. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_17

DOI: https://doi.org/10.1007/978-3-319-16814-2_17
Published: 17 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)