Rethinking group activity recognition under the open set condition

Zhu, Liping; Wu, Silin; Chang, Xianxiang; Yang, Yixuan; Li, Xuan

doi:10.1007/s00371-024-03424-0

Research
Published: 13 May 2024

The Visual Computer (2024)

Abstract

In real-world scenarios, the recognition of unknown activities poses a significant challenge for group activity recognition. Existing methods primarily focus on closed sets, leaving the task of open set group activity recognition unexplored. In this paper, we introduce the concept of open set group activity recognition for the first time and propose a novel recognition framework to deal with it. To mitigate potential scene biases, keypoints extracted from groups are utilized as input. Our framework employs a two-stage approach: Evidence Aware Collection and Evidence Aware Decision, to address the challenge of insufficient evidence for rejecting unknown classes. Specifically, encoders are established at the individual, subgroup, and group scales to collect activity evidence among group members. By applying an attention mechanism, we focus on important evidence, resulting in a set of aggregated evidence. The uncertainty estimated from evidence is then used to effectively distinguish between known and unknown classes. Additionally, we perform open set splits on two publicly available group activity recognition datasets. Experimental results demonstrate that our method shows promising performance in open set group activity recognition while maintaining comparable performance under closed set conditions.
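The abstract does not state how uncertainty is derived from evidence, so the following is only a minimal sketch of one common formulation from evidential deep learning, not the authors' implementation. It assumes non-negative evidence is obtained from raw class scores with a softplus, treats `evidence + 1` as Dirichlet parameters, and takes the uncertainty mass as `u = K / S`, where `K` is the number of known classes and `S` is the Dirichlet strength. The function names and the rejection threshold of 0.5 are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus; maps a raw score to non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_uncertainty(logits):
    """Return (expected class probabilities, uncertainty mass) for one sample.

    Assumed formulation: alpha_k = evidence_k + 1, S = sum(alpha), u = K / S.
    """
    num_classes = len(logits)
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]   # Dirichlet parameters
    strength = sum(alpha)                 # Dirichlet strength S
    probs = [a / strength for a in alpha]
    uncertainty = num_classes / strength  # lies in (0, 1]; 1 means no evidence
    return probs, uncertainty


def predict_open_set(logits, threshold=0.5):
    """Reject as "unknown" when uncertainty exceeds a (hypothetical) threshold."""
    probs, u = evidential_uncertainty(logits)
    if u > threshold:
        return "unknown", u
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, u


if __name__ == "__main__":
    # Strong evidence for class 0: low uncertainty, accepted as a known class.
    print(predict_open_set([10.0, -10.0, -10.0, -10.0]))
    # Almost no evidence for any class: high uncertainty, rejected as unknown.
    print(predict_open_set([-10.0, -10.0, -10.0, -10.0]))
```

Under this sketch, a sample with little total evidence is rejected regardless of which class scores highest, which is the behaviour the abstract attributes to the Evidence Aware Decision stage. In practice the threshold would be chosen on held-out data.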

Data availability statement

The Volleyball dataset can be obtained from https://github.com/mostafa-saad/deep-activity-rec, while the CAD dataset is available at https://vhosts.eecs.umich.edu/vision//activity-dataset.html. Our code will be made available on request.

Acknowledgements

This work was supported by the National Key R&D Program of China (2022YFB4501600).

Author information

Authors and Affiliations

Beijing Key Laboratory of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing, 102249, China
Liping Zhu, Silin Wu, Xianxiang Chang, Yixuan Yang & Xuan Li

Contributions

LZ: Conceptualization, resources. SW: Methodology, writing—original draft preparation, visualization, writing—reviewing and editing. XC: Visualization, investigation. YY: Supervision. XL: Validation.

Corresponding author

Correspondence to Silin Wu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, L., Wu, S., Chang, X. et al. Rethinking group activity recognition under the open set condition. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03424-0

Accepted: 21 April 2024
Published: 13 May 2024
DOI: https://doi.org/10.1007/s00371-024-03424-0
