Visual Code-Sentences: A New Video Representation Based on Image Descriptor Sequences

  • Yusuke Mitarai
  • Masakazu Matsugu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7583)


We present a new descriptor-sequence model for action recognition that enhances discriminative power in the spatio-temporal context while remaining robust to background clutter and to inter- and intra-person variability in behavior. We extend the Dense Trajectories framework for activity recognition (Wang et al., 2011) and introduce a pool of dynamic Bayesian networks (e.g., multiple HMMs) with histogram descriptors, which serve as codebooks of composite action categories represented at the respective key points. Together, the codebooks bound to spatio-temporal interest points constitute an intermediate feature representation that serves as a basis for generic action categories. This representation scheme is intended to serve as visual code-sentences that subsume a rich vocabulary of basis action categories. Through extensive experiments on the KTH, UCF Sports, and Hollywood2 datasets, we demonstrate some improvements over state-of-the-art methods.
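The abstract does not give implementation details, so the following is only a minimal sketch of one way a pool of HMMs could act as a codebook over descriptor sequences. It assumes each trajectory has already been quantized into a sequence of discrete descriptor symbols, that each HMM in the pool is a plain discrete HMM scored with the forward algorithm, and that a video is summarized as a normalized histogram of best-matching HMM indices. The names `log_forward`, `random_hmm`, and `code_sentence_histogram` are hypothetical, and the randomly initialized parameters stand in for models that would be trained in practice.

```python
import numpy as np


def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete symbol sequence under one HMM (forward algorithm)."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Sum over previous states in log space, then emit the next symbol.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)


def random_hmm(rng, n_states, n_symbols):
    """Randomly initialized HMM in log space (placeholder for a trained model)."""
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)   # rows: from-state
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)  # rows: state
    return np.log(pi), np.log(A), np.log(B)


def code_sentence_histogram(trajectories, hmm_pool):
    """Assign each trajectory to its best-scoring HMM and histogram the assignments."""
    hist = np.zeros(len(hmm_pool))
    for obs in trajectories:
        scores = [log_forward(obs, *hmm) for hmm in hmm_pool]
        hist[int(np.argmax(scores))] += 1
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed sizes: 4 HMMs, 3 hidden states each, 8 descriptor symbols.
    pool = [random_hmm(rng, n_states=3, n_symbols=8) for _ in range(4)]
    # Ten toy "trajectories", each a sequence of 15 quantized descriptor symbols.
    trajectories = [rng.integers(0, 8, size=15) for _ in range(10)]
    print(code_sentence_histogram(trajectories, pool))
```

The resulting histogram could then be passed to any standard classifier. Whether the paper uses hard assignment, likelihood vectors, or another encoding is not stated in the abstract.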




References

  1. Aggarwal, J.K., Ryoo, M.S.: Human Activity Analysis: A Review. ACM Computing Surveys 43(16) (2011)
  2. Dalal, N., Triggs, B.: Histograms of Oriented Gradients for Human Detection. In: CVPR (2005)
  3. Dalal, N., Triggs, B., Schmid, C.: Human Detection Using Oriented Histograms of Flow and Appearance. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 428–441. Springer, Heidelberg (2006)
  4. Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior Recognition via Sparse Spatio-Temporal Features. In: VS-PETS (2005)
  5. Gaidon, A., Harchaoui, Z., Schmid, C.: A time series kernel for action recognition. In: BMVC (2011)
  6. Gilbert, A., Illingworth, J., Bowden, R.: Action Recognition using Mined Hierarchical Compound Features. TPAMI 33(5) (2009)
  7. Kläser, A., Marszałek, M., Laptev, I., Schmid, C.: Will person detection help bag-of-features action recognition. Technical Report, INRIA Grenoble - Rhone-Alpes (2010)
  8. Kovashka, A., Grauman, K.: Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition. In: CVPR (2010)
  9. Laptev, I., Lindeberg, T.: Space-time Interest Points. In: ICCV (2003)
  10. Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: CVPR (2008)
  11. Liu, J., Yang, Y., Shah, M.: Learning Semantic Visual Vocabularies Using Diffusion Distance. In: CVPR (2009)
  12. Loy, C.C., Xiang, T., Gong, S.: Detecting and Discriminating Behavioural Anomalies. Pattern Recognition 44 (2011)
  13. Marszałek, M., Laptev, I., Schmid, C.: Actions in Context. In: CVPR (2009)
  14. Matikainen, P., Hebert, M., Sukthankar, R.: Trajectons: Action Recognition Through the Motion Analysis of Tracked Features. In: ICCV Workshop on Video-Oriented Object and Event Classification (2009)
  15. Messing, R., Pal, C., Kautz, H.: Activity recognition using the velocity histories of tracked keypoints. In: ICCV (2009)
  16. Natarajan, P., Nevatia, R.: Coupled Hidden Semi Markov Models for Activity Recognition. In: WMVC (2007)
  17. Nguyen, N.T., Phung, D.Q., Venkatesh, S., Bui, H.H.: Learning and Detecting Activities from Movements Trajectories Using Hierarchical Hidden Markov Model. In: CVPR (2005)
  18. Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised Learning of Human Action Categories Using Spatial-temporal Words. In: BMVC (2006)
  19. Park, S., Aggarwal, J.K.: A hierarchical Bayesian network for event recognition of human actions and interactions. Multimedia Systems 10(2) (2004)
  20. Rodriguez, M., Ahmed, J., Shah, M.: Action MACH: A Spatio-temporal Maximum Average Correlation Height Filter for Action Recognition. In: CVPR (2008)
  21. Savarese, S., Pozo, A.D., Niebles, J.C., Fei-Fei, L.: Spatial-temporal correlations for unsupervised action classification. In: Motion and Video Computing (2008)
  22. Schüldt, C., Laptev, I., Caputo, B.: Recognizing Human Actions: A Local SVM Approach. In: ICPR (2004)
  23. Sun, J., Wu, X., Yan, S., Cheong, L.F., Chua, T.S., Li, J.: Hierarchical Spatio-Temporal Context Modeling for Action Recognition. In: CVPR (2009)
  24. Ullah, M.M., Parizi, S.N., Laptev, I.: Improving Bag-of-Features Action Recognition with Non-local Cues. In: BMVC (2010)
  25. Wang, H., Kläser, A., Schmid, C., Liu, C.: Action Recognition by Dense Trajectories. In: CVPR (2011)
  26. Wang, X., Ma, X., Grimson, W.E.L.: Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models. TPAMI 31(3) (2009)
  27. Xiang, T., Gong, S.: Video Behaviour Profiling for Anomaly Detection. TPAMI 30(5) (2008)
  28. Zeng, Z., Ji, Q.: Knowledge Based Activity Recognition with Dynamic Bayesian Network. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 532–546. Springer, Heidelberg (2010)
  29. Zhang, J., Gong, S.: Action categorization with modified hidden conditional random field. Pattern Recognition 42(1) (2010)
  30. Zhang, J., Marszałek, M., Lazebnik, S., Schmid, C.: Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. IJCV 73(2) (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yusuke Mitarai (1)
  • Masakazu Matsugu (1)
  1. Canon Inc., Digital System Technology Development Headquarters, Tokyo, Japan
