Self-supervised Social Relation Representation for Human Group Detection

Li, Jiacheng; Han, Ruize; Yan, Haomin; Qian, Zekun; Feng, Wei; Wang, Song

doi:10.1007/978-3-031-19833-5_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13695)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Human group detection, which splits a crowd of people into groups, is an important step in video-based analysis of human social activity. Its core is the representation and division of human social relations. In this paper, we propose a new two-stage multi-head framework for human group detection. In the first stage, we propose a human behavior simulator head that learns the social relation feature embedding; it is trained in a self-supervised manner by leveraging socially grounded multi-person behavior relationships. In the second stage, building on the social relation embedding, we develop a self-attention-inspired network for human group detection. Remarkable performance on two state-of-the-art large-scale benchmarks, PANDA and JRDB-Group, verifies the effectiveness of the proposed framework. Thanks to the self-supervised social relation embedding, our method provides promising results with very little labeled training data. We have released the source code to the public.
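To make the "embedding, then division" idea concrete, the following is a minimal sketch and not the authors' network. It assumes each person already has a social relation embedding, replaces the self-attention-inspired second stage with plain cosine affinity, and forms groups by thresholding and merging connected people. The function names, the threshold value, and the toy embeddings are all hypothetical.

```python
import numpy as np

def pairwise_affinity(emb):
    """Cosine similarity between per-person embeddings, shape (N, N).

    Assumption: a stand-in for a learned pairwise relation score.
    """
    emb = np.asarray(emb, dtype=float)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.maximum(norms, 1e-12)  # avoid division by zero
    return unit @ unit.T

def groups_from_affinity(aff, threshold=0.5):
    """Merge people whose affinity exceeds the threshold (union-find)."""
    n = aff.shape[0]
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if aff[i, j] > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy embeddings (hypothetical): two pairs and one isolated person.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0]]
result = groups_from_affinity(pairwise_affinity(toy))
print(result)
```

In this toy setting, people 0 and 1 fall into one group, 2 and 3 into another, and person 4 stays alone. A learned model would replace both the affinity function and the fixed threshold.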

J. Li and R. Han—Equal contribution.

Notes

1. If a subject is missing in a frame, we fill its slot with a blank (all-zero) feature vector.
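The zero-filling in this note can be sketched as follows. This is an illustration under assumptions, not the released code: per-frame features are assumed to arrive as dictionaries keyed by subject ID, and the function name `stack_with_blank` and the tensor layout (frames, subjects, feature dimension) are hypothetical.

```python
import numpy as np

def stack_with_blank(frames, subject_ids, dim):
    """Stack per-frame features into a (T, N, dim) array.

    A subject absent from a frame keeps an all-zero feature vector.
    """
    out = np.zeros((len(frames), len(subject_ids), dim))
    for t, frame in enumerate(frames):
        for k, sid in enumerate(subject_ids):
            if sid in frame:
                out[t, k] = frame[sid]
    return out

# Subject 2 is missing in the second frame (toy data).
frames = [{1: [1.0, 2.0], 2: [3.0, 4.0]}, {1: [5.0, 6.0]}]
features = stack_with_blank(frames, subject_ids=[1, 2], dim=2)
print(features.shape)
```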

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grants U1803264, 62072334, and the Tianjin Research Innovation Project for Postgraduate Students under Grant 2021YJSB174.

Author information

Authors and Affiliations

Intelligence and Computing College, Tianjin University, Tianjin, China
Jiacheng Li, Ruize Han, Haomin Yan, Zekun Qian & Wei Feng
University of South Carolina, Columbia, USA
Song Wang

Corresponding author

Correspondence to Ruize Han.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Li, J., Han, R., Yan, H., Qian, Z., Feng, W., Wang, S. (2022). Self-supervised Social Relation Representation for Human Group Detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13695. Springer, Cham. https://doi.org/10.1007/978-3-031-19833-5_9

DOI: https://doi.org/10.1007/978-3-031-19833-5_9
Published: 04 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19832-8
Online ISBN: 978-3-031-19833-5
eBook Packages: Computer Science, Computer Science (R0)
