Multimedia emotion prediction using movie script and spectrogram

Kim, Jin-Su

doi:10.1007/s11042-020-08777-x

Published: 25 February 2020

Volume 80, pages 34535–34551 (2021)

Multimedia Tools and Applications

Jin-Su Kim ORCID: orcid.org/0000-0002-9800-7264¹

Abstract

This article proposes a multimedia emotion-prediction approach that combines movie scripts with spectrograms derived from speech. First, a variety of information is extracted from the textual dialogue in scripts for emotion prediction. In addition, spectrograms computed from the speech signal help to identify subtle cues for emotions that are difficult to predict from the script alone. Accent is useful here because it is an important means of expressing emotional states in speech. Together, these sources are used to analyze emotion words with similar tendencies, based on the emotion keywords in the scripts and on the spectrograms. Candidate emotion keywords are extracted from the text using morphological analysis, and representative emotion keywords are then selected through Word2Vec_ARSP. Emotion keywords and speech data from the last part of the dialogue are extracted and converted into images. This multimedia information serves as the input to a convolutional neural network (CNN). In this paper, we propose a multi-modal method that extracts and predicts emotions more efficiently by mixing and learning integrated multimedia information from the character’s speech and background sounds, as well as from dialogue that can directly express the emotional situation of the context. To improve the accuracy of emotion prediction using multimedia information in movies, we propose a system that uses a CNN for learning, testing, and prediction with this multi-modal method. The proposed multi-modal system uses the spectrogram to compensate for emotions that cannot be predicted from certain parts of the text. Prediction accuracy improves by 20.9% and 6.7% compared with using only text information and only voice information, respectively.
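
The abstract does not say how the spectrograms are computed. As a minimal sketch of the general idea, assuming a short-time Fourier transform with a Hann window, the pure-Python example below turns a signal into a time-frequency grid of magnitudes that could then be rendered as an image. The function names, frame length, hop size, sample rate, and synthetic test tone are illustrative assumptions, not details taken from the article.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Short-time Fourier magnitudes.

    Returns a list of frames (time axis); each frame is a list of
    magnitudes for frequency bins 0 .. frame_len // 2.
    """
    window = hann(frame_len)
    frames = []
    # Slide a window over the signal, advancing by `hop` samples each step.
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        # Direct DFT of the windowed chunk (slow, but dependency-free).
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical stand-in for a speech clip: a pure 1 kHz tone at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames,", len(spec[0]), "bins, peak near", peak_hz, "Hz")
```

In practice a library routine backed by a fast Fourier transform would replace the direct DFT loop; the loop is kept here only so the sketch runs without third-party packages. Each inner list is one column of the eventual image, with brightness given by magnitude (often on a log scale).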

Author information

Authors and Affiliations

Division of Ari Liberal Arts, Anyang University, 22 Samdeok-ro 37 beon-gil, Manan-gu, Anyang-si, Gyeonggi-Do, 14028, South Korea
Jin-Su Kim

Corresponding author

Correspondence to Jin-Su Kim.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kim, JS. Multimedia emotion prediction using movie script and spectrogram. Multimed Tools Appl 80, 34535–34551 (2021). https://doi.org/10.1007/s11042-020-08777-x

Received: 21 July 2019
Revised: 17 November 2019
Accepted: 17 February 2020
Published: 25 February 2020
Issue Date: November 2021
DOI: https://doi.org/10.1007/s11042-020-08777-x
