Activity Group Localization by Modeling the Relations among Participants

  • Lei Sun
  • Haizhou Ai
  • Shihong Lao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8689)


Beyond recognizing the actions of individuals, activity group localization aims to determine “who participates in each group” and “what activity the group performs”. In this paper, we propose a latent graphical model that groups participants while inferring each group’s activity by exploring the relations among them, thus simultaneously addressing the problems of group localization and activity recognition. Our key insight is to exploit the relational graph among the participants. Specifically, each group is represented as a tree with an activity label, while relations among groups are modeled as a fully connected graph. Inference on such a graph is reduced to an extended minimum spanning forest problem, which is cast into a max-margin framework. It therefore avoids the limitations of high-order hierarchical models and can be solved efficiently. Our model provides strong, discriminative contextual cues for activity recognition and better interprets scene information for localization. Experiments on three datasets demonstrate that our model achieves significant improvements in activity group localization and state-of-the-art performance on activity recognition.
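The idea of representing each group as a tree, so that grouping reduces to finding a spanning forest, can be illustrated with a plain minimum spanning forest. The Python sketch below is only an illustration under stated assumptions and is not the authors' extended formulation. It omits activity labels, the fully connected inter-group graph, and max-margin learning. It assumes hypothetical pairwise costs between participants (lower means more likely to be in the same group) and a hypothetical cut-off `max_cost`. The function name `spanning_forest_groups` is likewise hypothetical. Kruskal's algorithm adds cheap edges with a union-find structure and never adds an edge above the cut-off, so the result can split into several trees. Each tree is read off as one group.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, int, int]  # (cost, participant_i, participant_j); costs are hypothetical


def spanning_forest_groups(
    n: int, edges: List[Edge], max_cost: float
) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Group n participants with a thresholded minimum spanning forest (Kruskal).

    Illustrative sketch only: edges costlier than max_cost are never added,
    so each connected tree of the resulting forest is treated as one group.
    """
    parent = list(range(n))  # union-find: each participant starts in its own tree

    def find(x: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree_edges: List[Tuple[int, int]] = []
    for cost, i, j in sorted(edges):  # cheapest edges first
        if cost > max_cost:
            break  # every remaining edge is at least as costly, so stop
        ri, rj = find(i), find(j)
        if ri != rj:  # joining two different trees cannot create a cycle
            parent[ri] = rj
            tree_edges.append((i, j))

    # Collect participants by the root of their tree: one list per group.
    groups: Dict[int, List[int]] = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values()), tree_edges


if __name__ == "__main__":
    # Five participants with made-up pairwise costs.
    edges = [(0.2, 0, 1), (0.3, 1, 2), (0.9, 2, 3), (0.1, 3, 4), (0.8, 0, 4)]
    groups, tree_edges = spanning_forest_groups(5, edges, max_cost=0.5)
    print(groups)  # [[0, 1, 2], [3, 4]]
```

With `max_cost=0.5`, the two costly edges are rejected and the participants split into two trees. Raising the cut-off to `1.0` merges everyone into a single group. In a learned model, such costs and the cut-off would come from trained scores rather than being fixed by hand, as they are here.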


Keywords: Action recognition, group localization, graphical model



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Lei Sun (1)
  • Haizhou Ai (1)
  • Shihong Lao (2)

  1. Computer Science & Technology Department, Tsinghua University, Beijing, China
  2. OMRON Social Solutions Co. Ltd., Kusatsu, Japan
