
Action-Gons: Action Recognition with a Discriminative Dictionary of Structured Elements with Varying Granularity

  • Conference paper
Computer Vision -- ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9007)

Abstract

This paper presents “Action-Gons”, a mid-level representation for action recognition in videos. Actions in videos exhibit a reasonable level of the regularity seen in human behavior, as well as a large degree of variation. One key property that distinguishes actions from image scenes may be the amount of interaction among body parts, although scenes also exhibit structured patterns in 2D images. We study high-order statistics of the interactions among regions of interest in actions and propose a mid-level representation for action recognition, inspired by the Julesz school of n-gon statistics. We propose a systematic learning process to build an over-complete dictionary of action-gons: we first extract motion clusters, which we call action units, and then sequentially learn a pool of action-gons at different granularities, modeling different degrees of interaction among action units. We validate the discriminative power of the learned action-gons on three challenging video datasets and show clear advantages over existing methods.

This work was done when Yuwang Wang was an intern at Microsoft Research.

Notes

  1. Code is available from: http://research.microsoft.com/en-us/downloads/dad6c31e-2c04-471f-b724-ded18bf70fe3.

  2. Code is based on http://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html.

Acknowledgements

Zhuowen Tu is supported by NSF IIS-1216528 (IIS-1360566) and NSF award IIS-0844566 (IIS-1360568).

Corresponding author

Correspondence to Baoyuan Wang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, Y., Wang, B., Yu, Y., Dai, Q., Tu, Z. (2015). Action-Gons: Action Recognition with a Discriminative Dictionary of Structured Elements with Varying Granularity. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_17

  • DOI: https://doi.org/10.1007/978-3-319-16814-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16813-5

  • Online ISBN: 978-3-319-16814-2

  • eBook Packages: Computer Science, Computer Science (R0)
