HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos

  • Mohamed Rabie Amer
  • Peng Lei
  • Sinisa Todorovic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)

Abstract

This paper addresses the problem of recognizing and localizing coherent activities of a group of people, called collective activities, in video. Related work has argued that capturing long-range and higher-order dependencies among video features benefits robust recognition. To this end, we formulate a new deep model, called Hierarchical Random Field (HiRF). HiRF models only hierarchical dependencies between model variables, which effectively amounts to modeling higher-order temporal dependencies of video features. We specify an efficient inference algorithm for HiRF that, in each iteration, uses linear programming to estimate the latent variables. HiRF parameters are learned within the max-margin framework. Our evaluation on the benchmark New Collective Activity and Collective Activity datasets demonstrates that HiRF yields superior recognition and localization compared to the state of the art.

Keywords

Activity recognition, hierarchical graphical models

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Mohamed Rabie Amer (1)
  • Peng Lei (1)
  • Sinisa Todorovic (1)
  1. School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
