Combining Per-frame and Per-track Cues for Multi-person Action Recognition

  • Sameh Khamis
  • Vlad I. Morariu
  • Larry S. Davis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7572)


We propose a model that combines per-frame and per-track cues for action recognition. With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual’s action within a scene and the flow of that individual’s actions across a video sequence, inferring valid tracks in the process. Our motivation is that a discordant action is unlikely in a structured scene, at both the track level and the frame level (e.g., a person dancing in a crowd of joggers). Although sampling approaches could be used for inference in our model, we instead devise a global inference algorithm that decomposes the problem and solves the subproblems exactly and efficiently, recovering a globally optimal joint solution in several cases. Finally, we improve on state-of-the-art action recognition results on two publicly available datasets.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sameh Khamis (1)
  • Vlad I. Morariu (1)
  • Larry S. Davis (1)
  1. University of Maryland, College Park, USA
