Multi-task Gated Contextual Cross-Modal Attention Framework for Sentiment and Emotion Analysis

  • Conference paper
Neural Information Processing (ICONIP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1142)

Abstract

Multi-modal sentiment and emotion analysis has emerged as a prominent field at the intersection of natural language processing, deep learning, machine learning, computer vision, and speech processing. A sentiment and emotion prediction model determines the attitude of a speaker or writer towards a discussion, debate, event, document, or topic. This attitude can be expressed in different ways, such as the words spoken, the energy and tone of delivery, and the accompanying facial expressions and gestures. Moreover, related and similar tasks generally depend on each other and are predicted better when solved through a joint framework. In this paper, we present a multi-task gated contextual cross-modal attention framework that jointly considers all three modalities (viz. text, acoustic, and visual) and multiple utterances for sentiment and emotion prediction. We evaluate our proposed approach on the CMU-MOSEI dataset for sentiment and emotion prediction. The evaluation results show that our proposed approach captures the correlation among the three modalities and improves over previous state-of-the-art models.
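To make the high-level description above more concrete, the following is a minimal PyTorch sketch of gated cross-modal attention combined with multi-task (sentiment and emotion) output heads. It is not the authors' implementation: the module names (GatedCrossModalAttention, MultiTaskFusionModel), the sigmoid-gate formulation, the GRU used for utterance-level context, and all dimensions are illustrative assumptions.

# Hypothetical sketch, not the paper's code: one modality attends over another,
# a sigmoid gate blends the attended features with the original query features,
# and two heads share the fused representation for multi-task prediction.
import torch
import torch.nn as nn

class GatedCrossModalAttention(nn.Module):
    """Cross-modal attention from a query modality over a context modality, with gating."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)  # assumed gate: sigmoid over [query; attended]

    def forward(self, query, context):
        attended, _ = self.attn(query, context, context)               # cross-modal attention
        g = torch.sigmoid(self.gate(torch.cat([query, attended], dim=-1)))
        return g * attended + (1 - g) * query                          # gated fusion

class MultiTaskFusionModel(nn.Module):
    """Toy multi-task model: shared gated cross-modal fusion, separate task heads."""
    def __init__(self, dim=128, n_emotions=6):
        super().__init__()
        self.txt_av = GatedCrossModalAttention(dim)        # text attends over acoustic + visual
        self.context = nn.GRU(dim, dim, batch_first=True)  # context across utterances (assumed)
        self.sentiment_head = nn.Linear(dim, 1)            # sentiment score
        self.emotion_head = nn.Linear(dim, n_emotions)     # multi-label emotions

    def forward(self, text, acoustic, visual):
        # each input: (batch, n_utterances, dim) utterance-level features
        audio_visual = torch.cat([acoustic, visual], dim=1)
        fused = self.txt_av(text, audio_visual)
        contextual, _ = self.context(fused)
        return self.sentiment_head(contextual), self.emotion_head(contextual)

# Random features standing in for CMU-MOSEI utterance embeddings.
t, a, v = (torch.randn(2, 10, 128) for _ in range(3))
sent, emo = MultiTaskFusionModel()(t, a, v)
print(sent.shape, emo.shape)  # torch.Size([2, 10, 1]) torch.Size([2, 10, 6])

The sketch keeps the three modalities separate until the attention step, which is the point of a cross-modal (rather than early-concatenation) design; the gate lets the model fall back to the original text features when the acoustic and visual context is uninformative.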

The first two authors contributed equally.

Notes

  1. https://github.com/A2Zadeh/CMU-MultimodalDataSDK.

Acknowledgement

Asif Ekbal acknowledges the Young Faculty Research Fellowship (YFRF), supported by the Visvesvaraya Ph.D. Scheme of MeitY, Government of India. The research reported here is also partially supported by "Skymap Global India Private Limited".

Author information

Corresponding author

Correspondence to Suyash Sangwan.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sangwan, S., Chauhan, D.S., Akhtar, M.S., Ekbal, A., Bhattacharyya, P. (2019). Multi-task Gated Contextual Cross-Modal Attention Framework for Sentiment and Emotion Analysis. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-030-36808-1_72

  • DOI: https://doi.org/10.1007/978-3-030-36808-1_72

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36807-4

  • Online ISBN: 978-3-030-36808-1

  • eBook Packages: Computer Science, Computer Science (R0)
