Computational Visual Media

Volume 1, Issue 2, pp 129–141

Semantic movie summarization based on string of IE-RoleNets

  • Wen Qu
  • Yifei Zhang
  • Daling Wang
  • Shi Feng
  • Ge Yu (corresponding author)
Open Access
Research Article

Abstract

Roles, their emotions, and the interactions between them are three key elements for understanding the semantic content of movies. In this paper, we propose a novel movie summarization method that captures the semantic content of movies based on a string of IE-RoleNets. An IE-RoleNet (interaction and emotion RoleNet) models the emotions and interactions of the roles in a shot of the movie. The whole movie is represented as a string of IE-RoleNets. Summarizing a movie is thereby transformed into finding an optimal substring that meets a user-specified summarization ratio, and hierarchical substring mining is conducted to find this optimal substring of the whole movie. We conducted objective and subjective experiments on our method. The experimental results show that our method is able to capture the semantic content of movies.
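
The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' hierarchical substring mining: the class name IERoleNet, its fields, the scoring rule, and the greedy selection in summarize are all assumptions made for illustration. Each shot holds its roles' emotion labels and weighted role-to-role interactions, the movie is a list (string) of such shots, and shots are picked greedily by score per second until the duration allowed by the summarization ratio is used up. Shot durations are assumed to be positive.

```python
from dataclasses import dataclass, field


@dataclass
class IERoleNet:
    """One shot: roles with an emotion label, plus weighted pairwise interactions."""
    shot_id: int
    duration: float  # seconds, assumed > 0
    emotions: dict[str, str] = field(default_factory=dict)  # role -> emotion label
    interactions: dict[tuple[str, str], float] = field(default_factory=dict)  # (role, role) -> weight

    def score(self) -> float:
        # Hypothetical importance: total interaction weight plus one point
        # for every role whose emotion label is not "neutral".
        emotional = sum(1 for e in self.emotions.values() if e != "neutral")
        return sum(self.interactions.values()) + emotional


def summarize(movie: list[IERoleNet], ratio: float) -> list[IERoleNet]:
    """Greedily pick shots by score per second within the duration budget."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    budget = ratio * sum(net.duration for net in movie)
    ranked = sorted(movie, key=lambda n: n.score() / n.duration, reverse=True)
    chosen, used = [], 0.0
    for net in ranked:
        if used + net.duration <= budget:
            chosen.append(net)
            used += net.duration
    # Restore temporal order so the summary is again a string of shots.
    return sorted(chosen, key=lambda n: n.shot_id)


movie = [
    IERoleNet(0, 10.0, {"A": "neutral"}),
    IERoleNet(1, 10.0, {"A": "happy", "B": "angry"}, {("A", "B"): 2.0}),
    IERoleNet(2, 10.0, {"B": "sad"}),
    IERoleNet(3, 10.0),
]
summary_ids = [net.shot_id for net in summarize(movie, ratio=0.5)]
```

With a ratio of 0.5, the sketch keeps the two shots that carry emotion or interaction content and returns them in their original order. A faithful implementation would replace the greedy step with the hierarchical substring mining that the paper describes.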

Keywords

movie summarization, content analysis, movie understanding

Copyright information

© The Author(s) 2015

Authors and Affiliations

  • Wen Qu (1)
  • Yifei Zhang (1, 2)
  • Daling Wang (1, 2)
  • Shi Feng (1, 2)
  • Ge Yu (1, 2)
  1. School of Information Science and Engineering, Northeastern University, Shenyang, China
  2. Key Laboratory of Medical Image Computing (Northeastern University), Ministry of Education, Shenyang, China
