Abstract
Text-based emotion recognition is a major topic in emotion recognition in conversation, and extracting effective emotional features remains a challenge. Previous studies model affect using contextual semantics and emotion lexicons, but they ignore the information conveyed by the emotion labels themselves. To address this problem, we propose the sentiment similarity-oriented attention (SSOA) mechanism, which uses the semantics of emotion labels to guide the model’s attention when encoding the input conversations, so that emotion-related information is extracted from the sentences. We then use a convolutional neural network (CNN) to extract complex, informative features. In addition, because discrete emotions are closely related to Valence, Arousal, and Dominance (VAD) in psychophysiology, we train the VAD regression and emotion classification tasks jointly through multi-task learning to extract more robust features. The proposed method outperforms the benchmarks by an absolute increase of over 3.65% in average F1 on the emotion classification task, and also outperforms previous strategies on the VAD regression task on the IEMOCAP database.
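The two ideas in the abstract, attention guided by emotion-label semantics and joint training of classification with VAD regression, can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule (maximum cosine similarity of a word to any label vector, followed by a softmax), the weighted-sum loss, the `vad_weight` parameter, and all toy vectors are assumptions made for illustration, and the CNN feature extractor is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_guided_attention(word_vecs, label_vecs):
    """Weight each word by its similarity to the emotion-label vectors.

    Assumption: a word's score is its maximum cosine similarity over all
    label vectors; the paper's exact scoring function may differ.
    """
    scores = [max(cosine(w, l) for l in label_vecs) for w in word_vecs]
    weights = softmax(scores)
    attended = [[a * x for x in w] for a, w in zip(weights, word_vecs)]
    return weights, attended

def multitask_loss(class_probs, true_class, vad_pred, vad_true, vad_weight=1.0):
    """Cross-entropy for the emotion class plus weighted MSE for VAD.

    Assumption: a plain weighted sum; vad_weight is a hypothetical knob.
    """
    ce = -math.log(class_probs[true_class])
    mse = sum((p - t) ** 2 for p, t in zip(vad_pred, vad_true)) / len(vad_true)
    return ce + vad_weight * mse

# Toy 2-d vectors, invented for illustration only.
words = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
labels = [[1.0, 0.1]]  # stand-in embedding for a single emotion label
weights, attended = label_guided_attention(words, labels)
loss = multitask_loss([0.7, 0.2, 0.1], 0, [0.5, 0.4, 0.6], [0.5, 0.5, 0.5])
```

In this toy setup the word closest to the label vector receives the largest attention weight, and the loss combines both objectives so that a shared encoder would be trained on both tasks at once.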
Y. Fu and L. Guo contributed equally to this work.
Acknowledgements
This work was supported in part by the National Key R&D Program of China under Grant 2018YFB1305200, the National Natural Science Foundation of China under Grant 61771333 and the Tianjin Municipal Science and Technology Project under Grant 18ZXZNGX00330.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fu, Y., Guo, L., Wang, L., Liu, Z., Liu, J., Dang, J. (2021). A Sentiment Similarity-Oriented Attention Model with Multi-task Learning for Text-Based Emotion Recognition. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_23
DOI: https://doi.org/10.1007/978-3-030-67832-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science, Computer Science (R0)