Pose Filter Based Hidden-CRF Models for Activity Detection

  • Prithviraj Banerjee
  • Ram Nevatia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)


Detecting activities that involve a sequence of complex pose and motion changes in unsegmented videos is a challenging task. Common approaches use sequential graphical models to infer the human pose-state in every frame. We propose an alternative model based on detecting the key-poses in a video, in which only the temporal positions of a few key-poses are inferred. We also introduce a novel pose summarization algorithm to automatically discover the key-poses of an activity. We learn a detection filter for each key-pose and combine these filters with a bag-of-words root filter in an HCRF model, whose parameters are learned using latent-SVM optimization. We evaluate the detection performance of our model on unsegmented videos from four human action datasets, which include challenging crowded scenes with dynamic backgrounds, inter-person occlusions, multi-human interactions, and hard-to-detect daily-use objects.
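
To make the idea of inferring only the temporal positions of a few key-poses concrete, the following is a minimal sketch, not the model from the paper. It assumes that each key-pose filter has already produced a per-frame response, that the key-poses must occur in a fixed order at strictly increasing frames, and that the root filter contributes a single additive score. The function name `best_keypose_placement`, the `scores` layout, and the absence of any pairwise temporal term are all illustrative assumptions.

```python
from typing import List, Tuple


def best_keypose_placement(
    scores: List[List[float]], root_score: float = 0.0
) -> Tuple[float, List[int]]:
    """Place K ordered key-poses on T frames by dynamic programming.

    scores[k][t] is the (hypothetical) response of key-pose filter k at
    frame t. Returns the best total score and the chosen frame indices,
    which are strictly increasing. This is an assumed simplification:
    no pairwise temporal terms are modeled.
    """
    K, T = len(scores), len(scores[0])
    if K > T:
        raise ValueError("need at least as many frames as key-poses")
    neg_inf = float("-inf")

    # best[k][t]: best score for key-poses 0..k with key-pose k at frame t.
    best = [[neg_inf] * T for _ in range(K)]
    back = [[-1] * T for _ in range(K)]
    best[0] = list(scores[0])

    for k in range(1, K):
        run_max, run_arg = neg_inf, -1
        for t in range(1, T):
            # Running maximum over placements of key-pose k-1 before frame t.
            if best[k - 1][t - 1] > run_max:
                run_max, run_arg = best[k - 1][t - 1], t - 1
            if run_arg >= 0:
                best[k][t] = run_max + scores[k][t]
                back[k][t] = run_arg

    # Pick the best frame for the last key-pose, then trace back.
    t_end = max(range(T), key=lambda t: best[K - 1][t])
    frames = [t_end]
    for k in range(K - 1, 0, -1):
        frames.append(back[k][frames[-1]])
    frames.reverse()
    return best[K - 1][t_end] + root_score, frames


if __name__ == "__main__":
    # Toy responses for 3 key-pose filters over 5 frames (made-up numbers).
    toy = [
        [1.0, 5.0, 0.0, 0.0, 0.0],
        [0.0, 9.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 4.0, 1.0],
    ]
    print(best_keypose_placement(toy, root_score=0.5))
```

In this toy input the first filter responds most strongly at frame 1, but the ordering constraint makes it better to place it at frame 0 so that the second key-pose can take its strong response at frame 1. The running maximum keeps the search linear in the number of frames per key-pose, instead of scoring a pose-state in every frame.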


Keywords: Activity detection, Key-poses, CRFs, Latent-SVM



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Prithviraj Banerjee (1)
  • Ram Nevatia (1)
  1. University of Southern California, Los Angeles, USA
