Collective Activity Localization with Contextual Spatial Pyramid

  • Shigeyuki Odashima
  • Masamichi Shimosaka
  • Takuhiro Kaneko
  • Rui Fukui
  • Tomomasa Sato
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7585)


In this paper, we propose an activity localization method that uses contextual information about the relationships between people. Activity localization is the task of determining “who participates in an activity group”, for example detecting people “walking in a group” or “talking in a group”. Contextual information has yielded promising results in previous activity recognition methods; however, that context has been limited to local information extracted from a single person or from the relationship between only two people. We propose a new context descriptor, the “contextual spatial pyramid model (CSPM)”, which represents the global relationships extracted from all activities in a single image. CSPM encodes relationships that are useful for activity localization, such as “facing each other”. Experimental results show that CSPM improves activity localization performance, demonstrating that CSPM provides strong contextual cues for activity recognition in complex scenes.
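The abstract does not spell out how CSPM is constructed. As a rough illustration only, here is a minimal sketch of a generic spatial-pyramid context histogram of the kind the name suggests. It is not the authors' CSPM. It assumes each person has a known position and facing direction. Neighbours are expressed in a frame centred on the target person and rotated so the target faces +u. Each neighbour is then counted per (pyramid cell, relative-orientation bin) at pyramid levels 0..`levels`, in the spirit of spatial pyramid matching. The names `Person`, `orientation_bin`, `pyramid_context` and the parameters `levels`, `num_orient`, `radius` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Person:
    x: float       # position of the person (hypothetical units, e.g. ground plane)
    y: float
    facing: float  # facing direction in radians, assumed to come from a pose/orientation classifier


def orientation_bin(rel_angle: float, num_orient: int) -> int:
    """Quantize a relative angle into num_orient bins centred on 0, 2*pi/num_orient, ..."""
    width = 2.0 * math.pi / num_orient
    return int((rel_angle % (2.0 * math.pi)) / width + 0.5) % num_orient


def pyramid_context(target: Person, people: Sequence[Person],
                    levels: int = 2, num_orient: int = 4,
                    radius: float = 1.0) -> List[float]:
    """Hypothetical spatial-pyramid context histogram around `target` (not the paper's CSPM)."""
    c, s = math.cos(target.facing), math.sin(target.facing)
    rel = []
    for p in people:
        if p is target:
            continue  # the target is not part of its own context
        dx, dy = p.x - target.x, p.y - target.y
        # Rotate by -facing so that "in front of the target" is always +u.
        u = (dx * c + dy * s) / radius
        v = (-dx * s + dy * c) / radius
        rel.append((u, v, orientation_bin(p.facing - target.facing, num_orient)))

    features: List[float] = []
    for level in range(levels + 1):
        cells = 2 ** level  # level l splits the [-1, 1)^2 window into 2^l x 2^l cells
        hist = [0.0] * (cells * cells * num_orient)
        for u, v, o in rel:
            if not (-1.0 <= u < 1.0 and -1.0 <= v < 1.0):
                continue  # neighbour lies outside the context window
            i = min(int((u + 1.0) / 2.0 * cells), cells - 1)
            j = min(int((v + 1.0) / 2.0 * cells), cells - 1)
            hist[(i * cells + j) * num_orient + o] += 1.0
        features.extend(hist)
    return features


a = Person(0.0, 0.0, 0.0)
b = Person(0.5, 0.0, math.pi)  # in front of a and facing back toward a
desc = pyramid_context(a, [a, b])
```

In this sketch, a “facing each other” configuration appears as counts in cells with u > 0 (in front of the target) and in orientation bin `num_orient // 2` (opposite facing direction). Rotating into the target's frame makes the histogram independent of the group's absolute heading. That is a design assumption of this illustration, not something the abstract states.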







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Shigeyuki Odashima, Masamichi Shimosaka, Takuhiro Kaneko, Rui Fukui, Tomomasa Sato: The University of Tokyo, Tokyo, Japan
