Rethinking group activity recognition under the open set condition

  • Research
  • The Visual Computer

Abstract

In real-world scenarios, the recognition of unknown activities poses a significant challenge for group activity recognition. Existing methods primarily focus on closed sets, leaving the task of open set group activity recognition unexplored. In this paper, we introduce the concept of open set group activity recognition for the first time and propose a novel recognition framework to deal with it. To mitigate potential scene biases, keypoints extracted from groups are utilized as input. Our framework employs a two-stage approach: Evidence Aware Collection and Evidence Aware Decision, to address the challenge of insufficient evidence for rejecting unknown classes. Specifically, encoders are established at the individual, subgroup, and group scales to collect activity evidence among group members. By applying an attention mechanism, we focus on important evidence, resulting in a set of aggregated evidence. The uncertainty estimated from evidence is then used to effectively distinguish between known and unknown classes. Additionally, we perform open set splits on two publicly available group activity recognition datasets. Experimental results demonstrate that our method shows promising performance in open set group activity recognition while maintaining comparable performance under closed set conditions.
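The abstract does not give the formula used to turn evidence into uncertainty. As a minimal sketch, assuming the standard evidential deep learning formulation based on subjective logic (non-negative per-class evidence, Dirichlet parameters equal to evidence plus one, uncertainty equal to the number of classes divided by the Dirichlet strength), rejecting an unknown class could look like the following. The function names, the rejection threshold of 0.5, and the label -1 for "unknown" are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def evidential_uncertainty(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative per-class evidence to belief masses and an uncertainty mass.

    Assumed formulation (subjective logic), not confirmed by the abstract:
    alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]       # Dirichlet parameters
    strength = sum(alpha)                     # Dirichlet strength S
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength      # belief masses and uncertainty sum to 1
    return belief, uncertainty


def classify_open_set(evidence: List[float], threshold: float = 0.5) -> Tuple[int, float]:
    """Return (label, uncertainty); label is -1 when the sample is rejected as unknown.

    The threshold is a hypothetical value; in practice it would be tuned on held-out data.
    """
    belief, uncertainty = evidential_uncertainty(evidence)
    if uncertainty > threshold:
        return -1, uncertainty
    best = max(range(len(belief)), key=belief.__getitem__)
    return best, uncertainty


if __name__ == "__main__":
    # Strong evidence for class 0: low uncertainty, accepted as a known class.
    print(classify_open_set([18.0, 1.0, 1.0]))
    # No evidence for any class: maximal uncertainty, rejected as unknown.
    print(classify_open_set([0.0, 0.0, 0.0, 0.0]))
```

Under this formulation, a sample that yields little evidence for every known class has uncertainty close to 1, which is what allows it to be rejected instead of being forced into the nearest known class.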

Data availability statement

The Volleyball dataset can be obtained from https://github.com/mostafa-saad/deep-activity-rec, while the CAD dataset is available at https://vhosts.eecs.umich.edu/vision//activity-dataset.html. Our code will be made available on request.

Acknowledgements

This work was supported by the National Key R&D Program of China (2022YFB4501600).

Author information

Contributions

LZ: Conceptualization, resources. SW: Methodology, writing—original draft preparation, visualization, writing—reviewing and editing. XC: Visualization, investigation. YY: Supervision. XL: Validation.

Corresponding author

Correspondence to Silin Wu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, L., Wu, S., Chang, X. et al. Rethinking group activity recognition under the open set condition. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03424-0

  • DOI: https://doi.org/10.1007/s00371-024-03424-0