Multimedia Tools and Applications, Volume 76, Issue 5, pp 7401–7420

Learning discriminative context models for concurrent collective activity recognition



Collective activity classification is the task of identifying activities in which multiple persons participate, and it often relies on context information such as person relationships and person interactions. Most existing approaches assume that all individuals in a single image share the same activity label. In many real-world scenarios, however, multiple activities co-exist and serve as context cues for each other. Based on this observation, this paper proposes a unified discriminative learning framework of multiple context models for concurrent collective activity recognition. First, both the intra-class and inter-class behaviour interactions among persons in a scenario are considered. Second, the scenario in which the activities happen provides additional context information for recognizing specific collective activities. Finally, the multiple context cues (intra-class, inter-class and global-context) are modeled jointly within a max-margin learning framework. A greedy forward search method is used to label the activities in the testing scenes. Experimental results demonstrate the superiority of our approach in activity recognition.
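The greedy forward search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring function, so the function name `greedy_forward_search`, the score tables `unary`, `pairwise` and `scene`, and the rule of labeling every person are all hypothetical assumptions chosen for illustration. The sketch starts with all persons unlabeled and repeatedly commits the single (person, activity) assignment that adds the most to the total score, given the labels already fixed.

```python
from typing import Dict, List, Optional, Tuple


def greedy_forward_search(
    unary: List[Dict[str, float]],
    pairwise: Dict[Tuple[str, str], float],
    scene: Dict[str, float],
) -> List[Optional[str]]:
    """Greedily assign one activity label per person (illustrative only).

    unary[i][a]      : assumed score of person i performing activity a
    pairwise[(a, b)] : assumed compatibility of labels a and b co-occurring
                       (a == b stands in for the intra-class cue,
                        a != b for the inter-class cue)
    scene[a]         : assumed global-context score of activity a in the scene
    """

    def pair_score(a: str, b: str) -> float:
        # Treat the compatibility table as symmetric.
        return pairwise.get((a, b), pairwise.get((b, a), 0.0))

    n = len(unary)
    labels: List[Optional[str]] = [None] * n

    # One person is labeled per iteration until nobody is left unlabeled.
    for _ in range(n):
        best_gain = float("-inf")
        best_move: Optional[Tuple[int, str]] = None
        for i in range(n):
            if labels[i] is not None:
                continue
            for activity, score in unary[i].items():
                # Gain = own evidence + scene context + context from
                # every person that has already been labeled.
                gain = score + scene.get(activity, 0.0)
                for j in range(n):
                    if labels[j] is not None:
                        gain += pair_score(activity, labels[j])
                if gain > best_gain:
                    best_gain, best_move = gain, (i, activity)
        if best_move is None:
            break  # a person with an empty score table cannot be labeled
        person, activity = best_move
        labels[person] = activity
    return labels


# Toy scene with two persons and made-up scores.
unary_scores = [
    {"talk": 2.0, "queue": 0.1},
    {"talk": 0.9, "queue": 1.0},
]
context = {("talk", "talk"): 0.5, ("queue", "talk"): -0.5}
with_context = greedy_forward_search(unary_scores, context, {})
without_context = greedy_forward_search(unary_scores, {}, {})
```

In this toy example the second person's own evidence slightly favours "queue", but the pairwise cue from the confidently labeled first person changes the decision to "talk". Without the pairwise table, each person simply receives the label with the highest individual score.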


Keywords: Activity classification; Context information; Max-margin learning



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
