Learning Multi-level Interaction Relations and Feature Representations for Group Activity Recognition

  • Conference paper
MultiMedia Modeling (MMM 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12572)

Abstract

Group activity recognition is a challenging task whose central difficulty is reasoning about complex interaction relations in multi-person scenes. Most existing approaches capture interaction relations and learn features of the group activity at either the individual level or the group level, and therefore overlook the multi-level structures and interaction relations of the group activity. To overcome this limitation, we propose a Multi-level Interaction Relation model (MIR) that flexibly and efficiently learns the multi-level structures of the group activity and captures its multi-level interaction relations. MIR employs graph pooling and unpooling networks to build multi-grained group relation graphs, and thus divides the group activity into multiple levels. Specifically, the Key Actor based Group Pooling layer (KeyPool) selects key persons in the activity to build a coarser-grained graph, while the Key Actor based Group Unpooling layer (KeyUnPool) reconstructs the finer-grained graph according to the corresponding KeyPool. Multiple KeyPool and KeyUnPool layers progressively build multi-grained graphs and learn the multi-level structures of the group activity. Graph convolutions performed on these multi-grained relation graphs then capture the multi-level interactions. In addition, graph readout (GR) layers are added to obtain multi-level spatio-temporal features of the group activity. Experimental results on two publicly available datasets demonstrate the effectiveness of KeyPool and KeyUnPool, and show that our model further improves the performance of group activity recognition.
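
The pooling and unpooling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how key actors are scored or how the finer-grained graph is restored, so this sketch assumes that each actor already has a scalar importance score and that unpooling fills non-key actors with zeros, in the manner of generic top-k graph pooling. The function names `key_pool` and `key_unpool` and all parameters are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def key_pool(adj: Matrix, feats: Matrix, scores: List[float], k: int
             ) -> Tuple[Matrix, Matrix, List[int]]:
    """Keep the k highest-scoring actors and the edges among them.

    `scores` stands in for a learned key-actor importance (an assumption).
    """
    # Rank actors by descending score, keep the top k, then sort the kept
    # indices so the pooled graph preserves the original actor ordering.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    keep = sorted(ranked[:k])
    # Coarser-grained graph: sub-adjacency and features of the kept actors.
    pooled_adj = [[adj[i][j] for j in keep] for i in keep]
    pooled_feats = [list(feats[i]) for i in keep]
    return pooled_adj, pooled_feats, keep


def key_unpool(pooled_feats: Matrix, keep: List[int], num_actors: int) -> Matrix:
    """Put pooled features back at their original positions.

    Actors dropped by the matching key_pool call receive zero vectors
    (an assumed fill strategy, not taken from the paper).
    """
    dim = len(pooled_feats[0])
    restored = [[0.0] * dim for _ in range(num_actors)]
    for row, idx in zip(pooled_feats, keep):
        restored[idx] = list(row)
    return restored


if __name__ == "__main__":
    # Toy scene with four actors; adjacency and features are made up.
    adj = [[0, 1, 1, 0],
           [1, 0, 1, 1],
           [1, 1, 0, 0],
           [0, 1, 0, 0]]
    feats = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 2.0]]
    scores = [0.1, 0.9, 0.4, 0.7]

    p_adj, p_feats, keep = key_pool(adj, feats, scores, k=2)
    print(keep, p_adj, p_feats)
    print(key_unpool(p_feats, keep, num_actors=len(adj)))
```

Returning the kept indices from `key_pool` is what lets a later `key_unpool` call restore the layout of its corresponding pooling step, mirroring the pairing of KeyPool and KeyUnPool layers that the abstract describes.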

Author information

Corresponding author

Correspondence to Lihua Lu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Lu, L., Lu, Y., Wang, S. (2021). Learning Multi-level Interaction Relations and Feature Representations for Group Activity Recognition. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_50

  • DOI: https://doi.org/10.1007/978-3-030-67832-6_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67831-9

  • Online ISBN: 978-3-030-67832-6

  • eBook Packages: Computer Science, Computer Science (R0)
