Abstract
Recently, many methods that exploit attention mechanism to discover the relevant local regions via visual attributes, have demonstrated promising performance in visual sentiment prediction. In these methods, accurate detection of visual attributes is of vital importance to identify the sentiment relevant regions, which is crucial for successful assessment of visual sentiment. However, existing work merely utilize basic strategies on convolutional neural network for visual attribute detection and fail to obtain satisfactory results due to the semantic gap between visual features and subjective attributes. Moreover, it is difficult for existing attention models to localize subtle sentiment relevant regions, especially when the performance of attribute detection is relatively poor. To address these problems, we first design a multi-task learning based approach for visual attribute detection. By augmenting the attributes with sentiments supervision, the semantic gap can be effectively reduced. We then develop a multi-attention model for jointly discovering and localizing multiple relevant local regions given predicted attributes. The classifier built on top of these regions achieves a significant improvement in visual sentiment prediction. Experimental results demonstrate the superiority of our method against previous approaches.
Similar content being viewed by others
References
Alameda-Pineda X, Ricci E, Yan Y, Sebe N (2016) Recognizing emotions from abstract paintings using non-linear matrix completion. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 5240–5248
Borth D, Chen T, Ji R, Chang SF (2013) SentiBank: large-scale ontology and classifiers for detecting sentiment and emotions in visual content. In: Proceedings of the 21st ACM international conference on multimedia, pp 459–460
Borth D, Ji R, Chen T, Breuel T, Chang SF (2013) Large-scale visual sentiment ontology and detectors using adjective noun pairs. In: Proceedings of the 21st ACM international conference on multimedia, pp 223–232
Campos V, Salvador A, Giro-i Nieto X, Jou B (2015) Diving deep into sentiment: understanding fine-tuned cnns for visual sentiment prediction. In: Proceedings of the 1st international workshop on affect sentiment in multimedia, pp 57–62
Campos V, Jou B, Giró-I-Nieto X (2017) From pixels to sentiment: fine-tuning cnns for visual sentiment prediction. Image Vis Comput 65:15–22
Chen T, Borth D, Darrell T, Chang S (2014) DeepSentiBank: visual sentiment concept classification with deep convolutional neural networks. CoRR arXiv:1410.8586
Chen YY, Chen T, Liu T, Liao HYM, Chang SF (2015) Assistive image comment robot—a novel mid-level concept-based representation. IEEE Trans Affect Comput 6(3):298–311
Einhauser W, Spain M, Perona P (2008) Objects predict fixations better than early saliency. J Vis 8(14):18.1–26
Escorcia V, Niebles JC, Ghanem B (2015) On the relationship between visual attributes and convolutional networks. IEEE conference on computer vision and pattern recognition, CVPR 2015, pp 1256–1264
Fan S, Ng T, Herberg JS, Koenig BL, Tan CYC, Wang R (2014) An automated estimator of image visual realism based on human cognition. In: 2014 IEEE conference on computer vision and pattern recognition, pp 4201–4208
Fan S, Jiang M, Shen Z, Koenig BL, Kankanhalli MS, Zhao Q (2017) The role of visual attention in sentiment prediction. In: Proceedings of the 2017 ACM on multimedia conference, MM 2017, pp 217–225
Gomes CFA, Brainerd CJ, Stein LM (2013) Effects of emotional valence and arousal on recollective and nonrecollective recall. J Exp Psychol Learn Mem Cognit 39(3):663–677
Gu X, Gu Y, Wu H (2017) Cascaded convolutional neural networks for aspect-based opinion summary. Neural Process Lett 46(2):1–14
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Joshi D, Datta R, Fedorovskaya E, Luong QT, Wang JZ, Jia L, Luo J (2011) Aesthetics and emotions in images. IEEE Signal Process Mag 28(5):94–115
Jou B, Chen T, Pappas N, Redi M, Topkara M, Chang S (2015) Visual affect around the world: a large-scale multilingual visual sentiment ontology. In: Proceedings of the 23rd annual ACM conference on multimedia conference, pp 159–168
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems—volume 1, Curran Associates Inc., pp 1097–1105
Lei P, Zhu S, Ngo CW (2015) Deep multimodal learning for affective analysis and retrieval. IEEE Trans Multimedia 17(11):2008–2020
Lin T, Maire M, Belongie SJ, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: 13th European conference on computer vision ECCV 2014, pp 740–755
Lu X, Suryanarayan P, Adams RB Jr, Li J, Newman MG, Wang JZ (2012) On shape and the computability of emotions. In: Proceedings of the 20th ACM international conference on multimedia, MM ’12, pp 229–238
Ma L, Lu Z, Shang L, Li H (2015) Multimodal convolutional neural networks for matching image and sentence. IEEE Int Conf Comput Vis 2015:2623–2631
Machajdik J, Hanbury A (2010) Affective image classification using features inspired by psychology and art theory. In: Proceedings of the 18th ACM international conference on multimedia, MM ’10, pp 83–92
Mnih V, Heess N, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: Annual conference on neural information processing systems 2014, pp 2204–2212
Peng K, Sadovnik A, Gallagher A, Chen T (2016) Where do emotions come from? Predicting the emotion stimuli map. In: 2016 IEEE international conference on image processing (ICIP), pp 614–618
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543
Ren S, He K, Girshick RB, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Annual conference on neural information processing systems 2015, pp 91–99
Ulrike R, Lila D, Radoslav P, Sonya D, Phelps EA (2011) Emotion enhances the subjective feeling of remembering, despite lower accuracy for contextual details. Emotion 11(3):553–562
Wu L, Qi M, Jian M, Zhang H (2019) Visual sentiment analysis by combining global and local information. Neural Process Lett. https://doi.org/10.1007/s11063-019-10027-7
Xu K, Ba J, Kiros R, Cho K, Courville AC, Salakhutdinov R, Zemel RS, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: Proceedings of the 32nd international conference on machine learning, ICML 2015, pp 2048–2057
Xun H, Shen C, Boix X, Qi Z (2015) SALICON: reducing the semantic gap in saliency prediction by adapting deep neural networks. In: 2015 IEEE international conference on computer vision (ICCV)
Yang J, She D, Lai YK, Rosin PL, Yang MH (2018) Weakly supervised coupled networks for visual sentiment analysis. In: CVPR
You Q, Luo J, Jin H, Yang J (2015) Joint visual-textual sentiment analysis with deep neural networks. In: Proceedings of the 23rd ACM international conference on multimedia, MM ’15, pp 1071–1074
You Q, Luo J, Jin H, Yang J (2015) Robust image sentiment analysis using progressively trained and domain transferred deep networks. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence, pp 381–388
You Q, Luo J, Jin H, Yang J (2016) Cross-modality consistent regression for joint visual-textual sentiment analysis of social multimedia. In: Proceedings of the ninth ACM international conference on web search and data mining, pp 13–22
You Q, Jin H, Luo J (2017) Visual sentiment analysis by attending on local image regions. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, pp 231–237
Yuan J, Mcdonough S, You Q, Luo J (2013) Sentribute: Image sentiment analysis from a mid-level perspective. In: Proceedings of the second international workshop on issues of sentiment discovery and opinion mining, WISDOM ’13, pp 10:1–10:8
Zhang S, Xu X, Pang Y, Han J (2019) Multi-layer attention based cnn for target-dependent sentiment classification. Neural Process Lett 3:1–15
Acknowledgements
This work was supported in part by the National Key R&D Program of China under Grant 2018YFB1003201, in part by the National Natural Science Foundation of China under Grant 61702114 and Grant 61672171, in part by the Guangdong Key R&D Project of China under Grants 2018B010107003, 2019B010121001 and in part by the Guangdong Natural Science foundation under Grant 2018B030311007.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wu, Z., Meng, M. & Wu, J. Visual Sentiment Prediction with Attribute Augmentation and Multi-attention Mechanism. Neural Process Lett 51, 2403–2416 (2020). https://doi.org/10.1007/s11063-020-10201-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11063-020-10201-2