A Sentiment Similarity-Oriented Attention Model with Multi-task Learning for Text-Based Emotion Recognition

  • Conference paper
MultiMedia Modeling (MMM 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12572)

Abstract

Text-based emotion recognition is a major topic in emotion recognition in conversation, yet extracting effective emotional features remains a challenge. Previous studies use contextual semantics and emotion lexicons for affect modeling, but they ignore information that may be conveyed by the emotion labels themselves. To address this problem, we propose the sentiment similarity-oriented attention (SSOA) mechanism, which uses the semantics of emotion labels to guide the model’s attention when encoding the input conversations, so that emotion-related information is extracted from the sentences. We then use a convolutional neural network (CNN) to extract complex informative features. In addition, because discrete emotions are closely related to Valence, Arousal, and Dominance (VAD) in psychophysiology, we train the VAD regression and emotion classification tasks together using multi-task learning to extract more robust features. The proposed method outperforms the benchmarks by an absolute increase of over 3.65% in average F1 for the emotion classification task, and also outperforms previous strategies for the VAD regression task on the IEMOCAP database.
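The two ideas in the abstract, attention guided by emotion-label semantics and joint training of classification with VAD regression, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything concrete here is an assumption made for illustration: cosine similarity as the similarity measure, taking the maximum over labels, softmax normalization, cross-entropy plus mean squared error as the two losses, and the weighting parameter `alpha`. The function names are hypothetical and the CNN stage is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_similarity_attention(token_vecs, label_vecs):
    """Re-weight token vectors by their similarity to emotion-label vectors.

    Assumption: each token is scored by its highest cosine similarity to any
    label embedding, and the scores are normalized with a softmax.
    """
    scores = [max(cosine(t, l) for l in label_vecs) for t in token_vecs]
    weights = softmax(scores)
    attended = [[w * x for x in t] for w, t in zip(weights, token_vecs)]
    return weights, attended


def joint_loss(class_probs, true_class, vad_pred, vad_true, alpha=0.5):
    """Weighted sum of a classification loss and a VAD regression loss.

    Assumption: cross-entropy for the emotion class, mean squared error for
    the (valence, arousal, dominance) triple, mixed by a hypothetical `alpha`.
    """
    cross_entropy = -math.log(class_probs[true_class])
    mse = sum((p - t) ** 2 for p, t in zip(vad_pred, vad_true)) / len(vad_true)
    return alpha * cross_entropy + (1 - alpha) * mse
```

In this sketch a token whose embedding lies close to some emotion-label embedding receives a larger attention weight, and the re-weighted token vectors would then be passed to a feature extractor such as the CNN mentioned in the abstract.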

Y. Fu and L. Guo contributed equally to this work.

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2018YFB1305200, the National Natural Science Foundation of China under Grant 61771333 and the Tianjin Municipal Science and Technology Project under Grant 18ZXZNGX00330.

Author information

Corresponding author

Correspondence to Longbiao Wang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fu, Y., Guo, L., Wang, L., Liu, Z., Liu, J., Dang, J. (2021). A Sentiment Similarity-Oriented Attention Model with Multi-task Learning for Text-Based Emotion Recognition. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_23

  • DOI: https://doi.org/10.1007/978-3-030-67832-6_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67831-9

  • Online ISBN: 978-3-030-67832-6

  • eBook Packages: Computer Science, Computer Science (R0)
