The Visual Computer, Volume 30, Issue 12, pp 1395–1404

Action recognition by hidden temporal models

  • Jianzhai Wu
  • Dewen Hu
  • Fanglin Chen
Original Article


We focus on recognizing human actions in uncontrolled videos that may contain complex temporal structures. The problem is difficult because of large intra-class variations in viewpoint, video length, motion pattern, and other factors. To address these difficulties, we propose a system that represents each action class by hidden temporal models. In this system, the crucial action event of each category is represented by a video segment that covers a fixed number of frames and can shift temporally within the sequence. To capture temporal structure, the video segment is described by a temporal pyramid model. To capture large intra-class variations, multiple models are combined with an OR operation to represent alternative structures. The model index and the start frame of the segment are both treated as hidden variables. We implement a learning procedure based on the latent SVM method. The proposed approach is tested on two difficult benchmarks: the Olympic Sports and HMDB51 data sets. The experimental results show that our system is comparable to state-of-the-art methods in the literature.
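The inference step implied by the abstract (choosing the hidden model index and the hidden start frame) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes per-frame feature histograms are already available, that a segment is scored by a linear dot product with a model's weight vector, and that the temporal pyramid sums histograms over 1, 2, 4, ... equal temporal cells. The names `temporal_pyramid` and `infer_hidden` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def temporal_pyramid(frames: Sequence[Vector], start: int, length: int,
                     levels: int) -> Vector:
    """Describe frames[start:start+length] by summed histograms over
    1, 2, 4, ... equal temporal cells, concatenated (assumed pyramid form)."""
    dim = len(frames[0])
    descriptor: Vector = []
    for level in range(levels):
        cells = 2 ** level
        for k in range(cells):
            lo = start + (k * length) // cells
            hi = start + ((k + 1) * length) // cells
            cell = [0.0] * dim
            for frame in frames[lo:hi]:
                for i, value in enumerate(frame):
                    cell[i] += value
            descriptor.extend(cell)
    return descriptor


def infer_hidden(frames: Sequence[Vector], models: Sequence[Vector],
                 length: int, levels: int) -> Tuple[float, int, int]:
    """Maximize the linear score over the hidden variables:
    model index (the OR over alternative models) and segment start frame.
    Returns (best_score, best_model_index, best_start_frame)."""
    best = (float("-inf"), -1, -1)
    for start in range(len(frames) - length + 1):
        phi = temporal_pyramid(frames, start, length, levels)
        for m, w in enumerate(models):
            score = sum(wi * xi for wi, xi in zip(w, phi))
            if score > best[0]:
                best = (score, m, start)
    return best


# Toy example: 6 frames with 2-bin histograms, a 4-frame segment, 2 pyramid levels.
frames = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
models = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],  # bin 0 in the first half, bin 1 in the second
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],  # the opposite temporal order
]
print(infer_hidden(frames, models, length=4, levels=2))  # (4.0, 0, 1)
```

In a latent SVM setting, a search of this kind would typically alternate with updating the weight vectors while the hidden variables are held fixed; the training loop is omitted here because the abstract does not specify its details.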


Human action recognition; Temporal pyramid model (TPM); Multi-model representation; Latent SVM



This work is supported by the National Basic Research Program of China (2013CB329401), the Natural Science Foundation of China (61375034, 61203263) and the NUDT Open Project of National Key Lab of High Performance Computing.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha, China
