Learning Action Primitives for Multi-level Video Event Understanding

  • Tian Lan
  • Lei Chen
  • Zhiwei Deng
  • Guang-Tong Zhou
  • Greg Mori
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8927)


Human action categories exhibit significant intra-class variation. Changes in viewpoint, human appearance, and the temporal evolution of an action confound recognition algorithms. To address this, we present an approach for discovering action primitives, that is, sub-categories of action classes, which allow us to model this intra-class variation. We learn action primitives and their interrelations in a multi-level spatio-temporal model for action recognition. Action primitives are discovered via a data-driven clustering approach that focuses on repeatable, discriminative sub-categories. We also learn higher-level interactions between action primitives and the actions of the people present in a scene. Empirical results demonstrate that these action primitives can be effectively localized, and that using them to model action classes improves action recognition performance on challenging datasets.


Keywords (added by machine, not by the authors): Gaussian Mixture Model, Video Frame, Action Recognition, Video Event, Temporal Segment



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Tian Lan (1)
  • Lei Chen (2)
  • Zhiwei Deng (2)
  • Guang-Tong Zhou (2)
  • Greg Mori (2)
  1. Stanford University, Stanford, USA
  2. Simon Fraser University, Burnaby, Canada
