Viewpoint Invariant Collective Activity Recognition with Relative Action Context

  • Takuhiro Kaneko
  • Masamichi Shimosaka
  • Shigeyuki Odashima
  • Rui Fukui
  • Tomomasa Sato
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7585)


This paper presents an approach to collective activity recognition. Collective activities are activities performed by multiple persons, such as queueing in a line and talking together. To recognize them, the action context (AC) descriptor [1] encodes the “apparent” relation between actions (e.g., a group crossing and facing “right”); however, this representation is sensitive to viewpoint changes. We instead propose a novel feature representation called the relative action context (RAC) descriptor, which encodes the “relative” relation (e.g., a group crossing and facing the “same” direction). This representation is viewpoint invariant and complementary to AC; hence we combine the two with a simplified combinational classifier. This paper also introduces two methods to improve performance. First, to make the contexts robust to various situations, we apply post-processing steps. Second, to reduce local classification failures, we regularize the classification using fully connected CRFs. Experimental results show that our method is applicable to various scenes and outperforms state-of-the-art methods.
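The viewpoint-invariance claim can be illustrated with a toy sketch (not the authors' actual descriptor, whose details are in the paper): if each person's facing direction is expressed relative to another group member rather than relative to the camera, a rotation of the viewpoint shifts every absolute angle by the same amount and cancels out.

```python
import numpy as np

def relative_directions(facings_deg):
    """Toy illustration of "relative" vs. "apparent" direction encoding.

    facings_deg: absolute facing directions (in degrees) of the people in a
    group, as seen from the camera. Returns each person's direction expressed
    relative to the first person, wrapped to [0, 360).
    """
    facings = np.asarray(facings_deg, dtype=float)
    return (facings - facings[0]) % 360.0

# A group crossing while facing roughly "right" in the image...
view_a = relative_directions([0, 5, 355])
# ...and the same group seen from a camera rotated by 90 degrees.
view_b = relative_directions([90, 95, 85])
# The relative encoding is identical under the viewpoint change.
assert np.allclose(view_a, view_b)
```

An apparent (camera-relative) encoding of the two views would differ in every component, which is exactly the sensitivity the RAC descriptor is designed to remove.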






  1. Lan, T., Wang, Y., Mori, G.: Retrieving actions in group contexts. In: International Workshop on Sign Gesture Activity (2010)
  2. Lan, T., Wang, Y., Yang, W., Mori, G.: Beyond actions: Discriminative models for contextual group activities. In: Adv. in NIPS 23 (2010)
  3. Lan, T., Sigal, L., Mori, G.: Social roles in hierarchical models for human activity recognition. In: CVPR (2012)
  4. Choi, W., Shahid, K., Savarese, S.: What are they doing?: Collective activity classification using spatio-temporal relationship among people. In: International Workshop on Visual Surveillance (2009)
  5. Choi, W., Shahid, K., Savarese, S.: Learning context for collective activity recognition. In: CVPR (2011)
  6. Amer, M.R., Todorovic, S.: A chains model for localizing participants of group activities in videos. In: ICCV (2011)
  7. Kaneko, T., Shimosaka, M., Odashima, S., Fukui, R., Sato, T.: Consistent collective activity recognition with fully connected CRFs. In: ICPR (to appear, 2012)
  8. Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In: ICCV (2009)
  9. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR (2009)
  10. Csurka, G., Dance, C.R., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: ECCV Workshop on Statistical Learning in Computer Vision (2004)
  11. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)
  12. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Adv. in NIPS 24 (2011)
  13. Zhang, Y., Chen, T.: Efficient inference for fully-connected CRFs with stationarity. In: CVPR (2012)
  14. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  15. Hatef, M., Duin, R.P., Matas, J.: On combining classifiers. PAMI 20, 226–239 (1998)
  16. Boykov, Y.Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: ICCV (2001)
  17. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9, 1871–1874 (2008)
  18. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: CVPR (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Takuhiro Kaneko (1)
  • Masamichi Shimosaka (1)
  • Shigeyuki Odashima (1)
  • Rui Fukui (1)
  • Tomomasa Sato (1)

  1. The University of Tokyo, Japan
