Toward jointly understanding social relationships and characters from videos

Abstract

Automatically recognizing social relationships from videos gives intelligent systems great potential to better understand human behaviors and emotions. Most existing methods focus either on inferring social characters by detecting their interactions or on predicting each social relationship independently; they therefore cannot learn all social relationships and characters jointly. In this paper, we propose a character and relationship joint learning (CRJL) framework that simultaneously infers all social relationships and the character pairs involved in a video. First, the video context and the logical associations among relationships provide important cues for social scene understanding. To incorporate these cues into social relationship and character reasoning, we design a novel character and relationship reasoning graph (CRRG). Specifically, we model a relationship passing process on the graph to learn the logical constraints among relationships, and we introduce a graph attention mechanism to capture discriminative video semantic information. Second, localizing social character pairs via fully supervised learning is time-consuming because it requires annotated video tracks. Instead, we propose a weak label-based training strategy that uses only clip-level relationship labels. Experimental results on a public benchmark demonstrate the superiority of our method.
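
The abstract describes two technical ingredients: attention-weighted message passing over a character and relationship reasoning graph, and weak supervision from clip-level relationship labels instead of annotated character tracks. The short PyTorch sketch below is only meant to make these two ideas concrete; the names (PairGraphAttention, clip_level_loss), the pooling choice, and every modelling detail are illustrative assumptions and do not reproduce the authors' CRJL implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PairGraphAttention(nn.Module):
        """One hypothetical round of attention-weighted message passing
        among character-pair nodes of a reasoning graph."""
        def __init__(self, dim):
            super().__init__()
            self.query = nn.Linear(dim, dim)
            self.key = nn.Linear(dim, dim)
            self.value = nn.Linear(dim, dim)
            self.update = nn.Linear(2 * dim, dim)

        def forward(self, nodes, adj):
            # nodes: (N, dim) features of N character-pair nodes
            # adj:   (N, N) 0/1 float matrix; 1 where two pairs share a character
            adj = adj + torch.eye(adj.size(0), device=adj.device)  # keep self-loops
            q, k, v = self.query(nodes), self.key(nodes), self.value(nodes)
            scores = (q @ k.t()) / nodes.size(-1) ** 0.5           # pairwise attention scores
            scores = scores.masked_fill(adj == 0, float("-inf"))   # restrict to graph edges
            messages = torch.softmax(scores, dim=-1) @ v           # aggregate neighbour information
            return torch.relu(self.update(torch.cat([nodes, messages], dim=-1)))

    def clip_level_loss(pair_logits, clip_labels):
        # Weak supervision: only a multi-hot clip-level relationship label is
        # available, so pool per-pair predictions (max over pairs here) before
        # comparing them with the clip annotation.
        clip_logits, _ = pair_logits.max(dim=0)
        return F.binary_cross_entropy_with_logits(clip_logits, clip_labels)

    # Toy usage: 4 character-pair nodes, 16-d features, 8 relationship classes.
    layer = PairGraphAttention(dim=16)
    nodes = torch.randn(4, 16)
    adj = torch.tensor([[0., 1., 1., 0.],
                        [1., 0., 0., 1.],
                        [1., 0., 0., 1.],
                        [0., 1., 1., 0.]])
    logits = nn.Linear(16, 8)(layer(nodes, adj))     # per-pair relationship scores
    clip_labels = torch.zeros(8)
    clip_labels[[2, 5]] = 1.0                        # relationships observed in the clip
    loss = clip_level_loss(logits, clip_labels)

Max-pooling over pairs is just one plausible way to connect pair-level predictions to clip-level labels; any multiple-instance-style pooling would fit the same weak-label setup.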



Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant no. 61972047) and the NSFC-General Technology Basic Research Joint Funds (grant no. U1936220).

Author information

Corresponding author

Correspondence to Bin Wu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Teng, Y., Song, C. & Wu, B. Toward jointly understanding social relationships and characters from videos. Appl Intell 52, 5633–5645 (2022). https://doi.org/10.1007/s10489-021-02738-z
