Abstract
Human group activity recognition (GAR) has attracted significant attention from computer vision researchers because of its wide practical applications in security surveillance, social role understanding, and sports video analysis. In this paper, we give a comprehensive overview of advances in group activity recognition in videos over the past 20 years. First, we summarize and compare 11 GAR video datasets in this field. Second, we survey group activity recognition methods, covering both those based on handcrafted features and those based on deep learning networks. To clarify the pros and cons of these methods, we compare representative models from the past to the present. Finally, we outline several challenging issues and possible directions for future research. From this comprehensive literature review, readers can obtain an overview of progress in group activity recognition for future studies.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61976010 and 61802011), the Beijing Postdoctoral Research Foundation (No. ZZ2019-63), the Beijing Excellent Young Talent Cultivation Project (No. 20170000 20124G075) and the “Ri xin” Training Programme Foundation for the Talents by Beijing University of Technology.
Author information
Recommended by Associate Editor De Xu
Li-Fang Wu received the B.Eng. degree in radio technology and the M.Eng. degree in metal material and heat treatment from Beijing University of Technology (BJUT), China in 1991 and 1994, respectively, and the Ph.D. degree in pattern recognition and intelligent systems from BJUT in 2003. She is a professor with the Faculty of Information Technology, Beijing University of Technology, China. She has published over 100 refereed technical papers in international journals and conferences on image/video processing and pattern recognition. She is a senior member of the China Computer Federation.
Her research interests include image/video analysis and understanding, social media computing, intelligent 3D printing and face presentation attack detection.
Qi Wang received the B.Sc. degree in electronic information engineering from Beijing University of Technology, China in 2018. He is currently a master student in information and communication engineering at the College of Information and Communication Engineering, Beijing University of Technology, China.
His research interests include group activity recognition, computer vision and image processing.
Meng Jian received the B.Sc. degree in electronic information science and technology and the Ph.D. degree in pattern recognition and information system from Xidian University, China in 2010 and 2015, respectively. She is currently an associate professor with the Faculty of Information Technology, Beijing University of Technology, China. She was also a research scholar with the School of Computing, National University of Singapore, Singapore, from November 2018 to November 2019. She was awarded Beijing Excellent Young Talent in 2017 and named among the “Ri xin” Talents of Beijing University of Technology in 2018.
Her research interests include pattern recognition, image understanding and social media computing.
Yu Qiao received the Ph.D. degree in information system from the University of Electro-Communications, Japan in 2006. He is currently a professor and director of the Institute of Advanced Computing and Digital Engineering at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. He was a Japan Society for the Promotion of Science Fellow and a project assistant professor with The University of Tokyo, Japan from 2007 to 2010. He has authored over 180 articles in journals and conferences, including PAMI, IJCV, TIP, ICCV, CVPR, ECCV and AAAI, with an h-index of 52. He was a recipient of the Lu Jiaxi Young Researcher Award from the Chinese Academy of Sciences in 2012, and the first-class award on technological invention from the Guangdong provincial government in 2019. He was the winner of the video classification task in the ActivityNet Large Scale Activity Recognition Challenge 2016 and the first runner-up of the scene recognition task in the ImageNet Large Scale Visual Recognition Challenge 2015.
His research interests include computer vision, deep learning and intelligent robots.
Bo-Xuan Zhao received the B.Sc. degree in electronic information engineering from Beijing University of Technology, China in 2019, and is currently a master student in electronic and communications engineering at Beijing University of Technology, China.
His research interests include target tracking and motion trajectory description.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wu, LF., Wang, Q., Jian, M. et al. A Comprehensive Review of Group Activity Recognition in Videos. Int. J. Autom. Comput. 18, 334–350 (2021). https://doi.org/10.1007/s11633-020-1258-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11633-020-1258-8