Abstract
Human group activity recognition (GAR) has attracted significant attention from computer vision researchers because of its wide practical applications in security surveillance, social role understanding, and sports video analysis. In this paper, we give a comprehensive overview of advances in group activity recognition in videos over the past 20 years. First, we summarize and compare 11 GAR video datasets in this field. Second, we survey group activity recognition methods, covering both those based on handcrafted features and those based on deep learning networks. To clarify the pros and cons of these methods, we compare representative models from the past to the present. Finally, we outline several challenging issues and possible directions for future research. From this comprehensive literature review, readers can obtain an overview of progress in group activity recognition for future studies.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61976010 and 61802011), the Beijing Postdoctoral Research Foundation (No. ZZ2019-63), the Beijing Excellent Young Talent Cultivation Project (No. 20170000 20124G075) and the “Ri xin” Training Programme Foundation for the Talents by Beijing University of Technology.
Author information
Recommended by Associate Editor De Xu
Li-Fang Wu received the B.Eng. degree in radio technology and the M.Eng. degree in metal material and heat treatment from Beijing University of Technology (BJUT), China in 1991 and 1994, respectively, and the Ph.D. degree in pattern recognition and intelligent systems from BJUT in 2003. She is a professor with the Faculty of Information Technology, Beijing University of Technology, China. She has published over 100 refereed technical papers in international journals and conferences on image/video processing and pattern recognition. She is a senior member of the China Computer Federation.
Her research interests include image/video analysis and understanding, social media computing, intelligent 3D printing and face presentation attack detection.
Qi Wang received the B.Sc. degree in electronic information engineering from Beijing University of Technology, China in 2018. He is currently a master student in information and communication engineering at the College of Information and Communication Engineering, Beijing University of Technology, China.
His research interests include group activity recognition, computer vision and image processing.
Meng Jian received the B.Sc. degree in electronic information science and technology and the Ph.D. degree in pattern recognition and information system from Xidian University, China in 2010 and 2015, respectively. She is currently an associate professor with the Faculty of Information Technology, Beijing University of Technology, China. She was also a research scholar with the School of Computing, National University of Singapore, Singapore, from November 2018 to November 2019. She was awarded Beijing Excellent Young Talent in 2017 and named among the “Ri xin” Talents of Beijing University of Technology in 2018.
Her research interests include pattern recognition, image understanding and social media computing.
Yu Qiao received the Ph.D. degree in information system from the University of Electro-Communications, Japan in 2006. He is currently a professor and director of the Institute of Advanced Computing and Digital Engineering at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. He was a Japan Society for the Promotion of Science Fellow and a project assistant professor with The University of Tokyo, Japan from 2007 to 2010. He has authored over 180 articles in journals and conferences, including PAMI, IJCV, TIP, ICCV, CVPR, ECCV and AAAI, with an h-index of 52. He was a recipient of the Lu Jiaxi Young Researcher Award from the Chinese Academy of Sciences in 2012, and the first-class award on technological invention from the Guangdong provincial government in 2019. He was the winner of the video classification task in the ActivityNet Large Scale Activity Recognition Challenge 2016 and the first runner-up of the scene recognition task in the ImageNet Large Scale Visual Recognition Challenge 2015.
His research interests include computer vision, deep learning and intelligent robots.
Bo-Xuan Zhao received the B.Sc. degree in electronic information engineering from Beijing University of Technology, China in 2019, and is currently a master student in electronic and communications engineering at Beijing University of Technology, China.
His research interests include target tracking and motion trajectory description.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wu, LF., Wang, Q., Jian, M. et al. A Comprehensive Review of Group Activity Recognition in Videos. Int. J. Autom. Comput. 18, 334–350 (2021). https://doi.org/10.1007/s11633-020-1258-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11633-020-1258-8