Panoramic Human Activity Recognition

Han, Ruize; Yan, Haomin; Li, Jiacheng; Wang, Songmiao; Feng, Wei; Wang, Song

doi:10.1007/978-3-031-19772-7_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13664)

Included in the following conference series:

European Conference on Computer Vision


Abstract

To obtain a more comprehensive understanding of activities in a crowded scene, in this paper we propose a new problem, panoramic human activity recognition (PAR), which aims to simultaneously recognize individual actions, social group activities, and global activities. This is a challenging yet practical problem in real-world applications. To tackle this problem, we develop a novel hierarchical graph neural network that progressively represents and models the multi-granular human activities and mutual social relations among a crowd of people. We further build a benchmark to evaluate the proposed method and other related methods. Experimental results verify the rationality of the proposed PAR problem, the effectiveness of our method, and the usefulness of the benchmark. We have released the source code and benchmark to the public to promote the study of this problem.
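The abstract describes three levels of output: per person, per social group, and per scene. The minimal Python sketch below shows how features could flow through such a hierarchy. It makes several assumptions that do not come from the paper. Each person has a fixed-length feature vector. The partition of people into social groups is already known. Relations are given as an explicit neighbour list. Every level uses plain mean pooling. The names `message_pass` and `hierarchical_pool` are hypothetical, and this is not the authors' graph network.

```python
from typing import Dict, Hashable, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def message_pass(features: Dict[int, Vector],
                 edges: Dict[int, List[int]]) -> Dict[int, Vector]:
    """One round of neighbour averaging (hypothetical stand-in for a GNN layer).

    Each person's new feature is the mean of its own feature and the
    features of its neighbours; people without neighbours keep their feature.
    """
    updated: Dict[int, Vector] = {}
    for person, feat in features.items():
        neighbours = [features[n] for n in edges.get(person, [])]
        updated[person] = mean([feat] + neighbours)
    return updated


def hierarchical_pool(features: Dict[int, Vector],
                      groups: Dict[Hashable, List[int]],
                      edges: Dict[int, List[int]]
                      ) -> Tuple[Dict[int, Vector], Dict[Hashable, Vector], Vector]:
    """Return person-, group-, and scene-level representations.

    Assumes `groups` is a known partition of the people; the scene-level
    vector averages group vectors, so every group counts equally.
    """
    person_level = message_pass(features, edges)
    group_level = {gid: mean([person_level[p] for p in members])
                   for gid, members in groups.items()}
    global_level = mean(list(group_level.values()))
    return person_level, group_level, global_level


# Toy scene (made-up values): persons 0 and 1 form group "A", person 2 is alone.
features = {0: [1.0, 0.0], 1: [3.0, 0.0], 2: [0.0, 4.0]}
groups = {"A": [0, 1], "B": [2]}
edges = {0: [1], 1: [0]}

person_level, group_level, global_level = hierarchical_pool(features, groups, edges)
print(person_level)   # {0: [2.0, 0.0], 1: [2.0, 0.0], 2: [0.0, 4.0]}
print(group_level)    # {'A': [2.0, 0.0], 'B': [0.0, 4.0]}
print(global_level)   # [1.0, 2.0]
```

In a full model, each level's representation would presumably feed its own classifier, for individual actions, group activities, and global activities respectively. Averaging groups rather than people at the top level is one simple design choice among several. It keeps a single large group from dominating the scene-level vector.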

H. Yan and J. Li—Equal Contribution.



Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grants U1803264, 62072334, and the Tianjin Research Innovation Project for Postgraduate Students under Grant 2021YJSB174.

Author information

Authors and Affiliations

Intelligence and Computing College, Tianjin University, Tianjin, China
Ruize Han, Haomin Yan, Jiacheng Li, Songmiao Wang & Wei Feng
University of South Carolina, Columbia, USA
Song Wang


Corresponding authors

Correspondence to Ruize Han, Wei Feng, or Song Wang.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material


Supplementary material 1 (pdf 1357 KB)


About this paper

Cite this paper

Han, R., Yan, H., Li, J., Wang, S., Feng, W., Wang, S. (2022). Panoramic Human Activity Recognition. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13664. Springer, Cham. https://doi.org/10.1007/978-3-031-19772-7_15


DOI: https://doi.org/10.1007/978-3-031-19772-7_15
Published: 28 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19771-0
Online ISBN: 978-3-031-19772-7
eBook Packages: Computer Science, Computer Science (R0)
