Abstract
Automatically recognizing social relationships in videos gives intelligent systems the potential to better understand human behaviors and emotions. Most existing methods either infer social characters by detecting their interactions or predict each social relationship independently; they cannot learn all social relationships and characters jointly. In this paper, we propose a character and relationship joint learning (CRJL) framework to simultaneously infer all social relationships and character pairs involved in a video. First, the video context and the logical associations among relationships provide important cues for social scene understanding. To incorporate these cues into reasoning over social relationships and characters, we design a novel character and relationship reasoning graph (CRRG). Specifically, we model a relationship message-passing process on the graph to learn the logical constraints among relationships, and we introduce a graph attention mechanism to capture discriminative video semantic information. Second, localizing a social character pair via supervised learning is time-consuming, as it requires annotated video tracks. Instead, we propose a weak-label training strategy that uses only clip-level relationship labels. Experimental results on a public benchmark demonstrate the superiority of our method.
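The two graph operations the abstract names, message passing over relationship nodes and attention-weighted aggregation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the NumPy-only formulation, and all weight shapes below are illustrative assumptions about one propagation round on a small reasoning graph.

```python
import numpy as np

def attention_message_passing(node_feats, adj, w_msg, w_att):
    """One propagation step: each node aggregates its neighbors' transformed
    messages, weighted by an attention score over concatenated feature pairs.

    node_feats: (N, D) relationship-node features
    adj:        (N, N) binary adjacency of the reasoning graph
    w_msg:      (D, D) message transform
    w_att:      (2*D,) attention vector
    """
    n = node_feats.shape[0]
    msgs = node_feats @ w_msg                      # transformed messages
    # Raw attention logit for every (i, j) pair: a^T [h_i ; h_j]
    logits = np.array([[w_att @ np.concatenate([node_feats[i], node_feats[j]])
                        for j in range(n)] for i in range(n)])
    logits = np.where(adj > 0, logits, -1e9)       # mask non-edges
    att = np.exp(logits - logits.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)     # row-wise softmax
    return np.tanh(att @ msgs)                     # aggregated node update

# Toy example: 3 relationship nodes, feature dimension 4
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
out = attention_message_passing(feats, adj,
                                rng.normal(size=(4, 4)), rng.normal(size=(8,)))
print(out.shape)
```

In practice such a step would be stacked and trained end to end; the sketch only shows how masked softmax attention confines information flow to graph edges.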
Acknowledgements
This work was supported by the National Natural Science Foundation of China (grant no. 61972047) and the NSFC-General Technology Basic Research Joint Funds (grant no. U1936220).
Cite this article
Teng, Y., Song, C. & Wu, B. Toward jointly understanding social relationships and characters from videos. Appl Intell 52, 5633–5645 (2022). https://doi.org/10.1007/s10489-021-02738-z