
Multi-level context extraction and attention-based contextual inter-modal fusion for multimodal sentiment analysis and emotion classification

  • Regular Paper
  • Published in: International Journal of Multimedia Information Retrieval

Abstract

Recent advancements in Internet technology and its associated services have led users to post large amounts of multimodal data to social media websites, online shopping portals, video repositories, and similar platforms. With this huge amount of multimodal content available, multimodal sentiment classification and affective computing have become heavily researched topics. Extracting context among neighboring utterances and weighing the importance of inter-modal utterances before multimodal fusion are the central research issues in this field. This article presents a novel approach to extract context at multiple levels and to model the importance of inter-modal utterances for sentiment and emotion classification. Experiments are conducted on two publicly available benchmark datasets: CMU-MOSI for sentiment analysis and IEMOCAP for emotion classification. By incorporating utterance-level contextual information and the importance of inter-modal utterances, the proposed model outperforms standard baselines by over 3% in classification accuracy.
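To make the two ideas in the abstract concrete, below is a minimal sketch of (1) utterance-level context extraction over neighboring utterances and (2) attention over modality pairs before fusion. The bidirectional GRU encoders, the soft-attention formulation, the feature dimensions, and all names are illustrative assumptions, not the authors' published architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextualAttentionFusion(nn.Module):
    """Hypothetical sketch: per-modality contextual encoding followed by
    attention-based inter-modal fusion. Layer choices are assumptions."""

    def __init__(self, dims, hidden=64):
        super().__init__()
        # One context encoder per modality (e.g., text, audio, video), so
        # each utterance representation also reflects its neighbors.
        self.encoders = nn.ModuleList(
            nn.GRU(d, hidden, batch_first=True, bidirectional=True) for d in dims
        )
        # Three modality pairs, each 2*hidden wide; binary sentiment head.
        self.classifier = nn.Linear(3 * 2 * hidden, 2)

    @staticmethod
    def pairwise_attention(a, b):
        # Soft attention between two modalities' contextual utterance
        # sequences: score every utterance of `a` against every utterance
        # of `b`, then reweight `b` accordingly.
        scores = torch.matmul(a, b.transpose(1, 2))   # (B, T, T)
        weights = F.softmax(scores, dim=-1)
        return torch.matmul(weights, b)               # (B, T, 2*hidden)

    def forward(self, text, audio, video):
        # Multi-level context: each modality's utterances are re-encoded
        # with a bidirectional GRU over the whole video.
        t, a, v = (enc(x)[0] for enc, x in zip(self.encoders, (text, audio, video)))
        # Attention-based contextual inter-modal fusion over modality pairs.
        fused = torch.cat([
            self.pairwise_attention(t, a),
            self.pairwise_attention(t, v),
            self.pairwise_attention(a, v),
        ], dim=-1)
        return self.classifier(fused)                 # (B, T, 2) per-utterance logits

# Usage on stand-in features (batch of 2 videos, 20 utterances each):
model = ContextualAttentionFusion(dims=(300, 74, 35))
logits = model(torch.randn(2, 20, 300), torch.randn(2, 20, 74), torch.randn(2, 20, 35))

The two-way output here matches binary sentiment on CMU-MOSI; for emotion classification on IEMOCAP the head's output size would change to the number of emotion classes.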




Author information

Correspondence to Mahesh G. Huddar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Huddar, M.G., Sannakki, S.S. & Rajpurohit, V.S. Multi-level context extraction and attention-based contextual inter-modal fusion for multimodal sentiment analysis and emotion classification. Int J Multimed Info Retr 9, 103–112 (2020). https://doi.org/10.1007/s13735-019-00185-8


