Visual Sentiment Prediction with Attribute Augmentation and Multi-attention Mechanism

Wu, Zhuanghui; Meng, Min; Wu, Jigang

doi:10.1007/s11063-020-10201-2

Published: 14 February 2020

Volume 51, pages 2403–2416 (2020)

Neural Processing Letters

490 Accesses

Abstract

Recently, many methods that exploit attention mechanisms to discover relevant local regions via visual attributes have demonstrated promising performance in visual sentiment prediction. In these methods, accurate detection of visual attributes is vital for identifying the sentiment-relevant regions, which in turn is crucial for successfully assessing visual sentiment. However, existing work merely applies basic strategies based on convolutional neural networks (CNNs) to visual attribute detection and fails to obtain satisfactory results because of the semantic gap between visual features and subjective attributes. Moreover, it is difficult for existing attention models to localize subtle sentiment-relevant regions, especially when attribute detection performs relatively poorly. To address these problems, we first design a multi-task learning approach for visual attribute detection: by augmenting the attributes with sentiment supervision, the semantic gap can be effectively reduced. We then develop a multi-attention model that jointly discovers and localizes multiple relevant local regions given the predicted attributes. A classifier built on top of these regions achieves a significant improvement in visual sentiment prediction, and experimental results demonstrate the superiority of our method over previous approaches.
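The abstract does not spell out the architecture, so the following is only a minimal sketch of the two ideas it names, under our own assumptions: (i) attribute-guided multi-attention, modeled here as dot-product soft attention over region features with one attention map per attribute query, in the spirit of Xu et al. (2015); and (ii) multi-task supervision, modeled as the mean per-attribute binary cross-entropy plus a weighted sentiment cross-entropy. The function names (`attend`, `multi_attention`, `multitask_loss`), the weight `lam`, and the use of attribute vectors directly as attention queries are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(regions: Sequence[Vector], query: Vector) -> Vector:
    """Soft attention: weight each region feature by softmax(region . query)."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def multi_attention(regions: Sequence[Vector], attribute_queries: Sequence[Vector]) -> Vector:
    """One attention map per (hypothetical) attribute query; concatenate the pooled vectors."""
    pooled: Vector = []
    for q in attribute_queries:
        pooled.extend(attend(regions, q))
    return pooled


def bce(prob: float, target: float, eps: float = 1e-12) -> float:
    # Binary cross-entropy for one attribute, clamped to avoid log(0).
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def multitask_loss(attr_probs, attr_targets, sent_probs, sent_label, lam=1.0):
    # Mean attribute BCE plus lam times the sentiment cross-entropy
    # (an assumed weighting, not the paper's stated objective).
    attr_loss = sum(bce(p, t) for p, t in zip(attr_probs, attr_targets)) / len(attr_probs)
    sent_loss = -math.log(max(sent_probs[sent_label], 1e-12))
    return attr_loss + lam * sent_loss


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy region features
    queries = [[1.0, 0.0], [0.0, 1.0]]  # toy attribute queries
    print(multi_attention(regions, queries))
    print(multitask_loss([0.9, 0.2], [1, 0], [0.7, 0.3], sent_label=0, lam=0.5))
```

Concatenating one pooled vector per attribute is one simple way to let a downstream sentiment classifier draw on several distinct regions at once, which is the intuition behind using multiple attention maps rather than a single one.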

References

Alameda-Pineda X, Ricci E, Yan Y, Sebe N (2016) Recognizing emotions from abstract paintings using non-linear matrix completion. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 5240–5248
Borth D, Chen T, Ji R, Chang SF (2013) SentiBank: large-scale ontology and classifiers for detecting sentiment and emotions in visual content. In: Proceedings of the 21st ACM international conference on multimedia, pp 459–460
Borth D, Ji R, Chen T, Breuel T, Chang SF (2013) Large-scale visual sentiment ontology and detectors using adjective noun pairs. In: Proceedings of the 21st ACM international conference on multimedia, pp 223–232
Campos V, Salvador A, Giró-i-Nieto X, Jou B (2015) Diving deep into sentiment: understanding fine-tuned CNNs for visual sentiment prediction. In: Proceedings of the 1st international workshop on affect and sentiment in multimedia, pp 57–62
Campos V, Jou B, Giró-i-Nieto X (2017) From pixels to sentiment: fine-tuning CNNs for visual sentiment prediction. Image Vis Comput 65:15–22
Chen T, Borth D, Darrell T, Chang S (2014) DeepSentiBank: visual sentiment concept classification with deep convolutional neural networks. CoRR arXiv:1410.8586
Chen YY, Chen T, Liu T, Liao HYM, Chang SF (2015) Assistive image comment robot—a novel mid-level concept-based representation. IEEE Trans Affect Comput 6(3):298–311
Einhauser W, Spain M, Perona P (2008) Objects predict fixations better than early saliency. J Vis 8(14):18.1–26
Escorcia V, Niebles JC, Ghanem B (2015) On the relationship between visual attributes and convolutional networks. In: IEEE conference on computer vision and pattern recognition (CVPR 2015), pp 1256–1264
Fan S, Ng T, Herberg JS, Koenig BL, Tan CYC, Wang R (2014) An automated estimator of image visual realism based on human cognition. In: 2014 IEEE conference on computer vision and pattern recognition, pp 4201–4208
Fan S, Jiang M, Shen Z, Koenig BL, Kankanhalli MS, Zhao Q (2017) The role of visual attention in sentiment prediction. In: Proceedings of the 2017 ACM on multimedia conference, MM 2017, pp 217–225
Gomes CFA, Brainerd CJ, Stein LM (2013) Effects of emotional valence and arousal on recollective and nonrecollective recall. J Exp Psychol Learn Mem Cognit 39(3):663–677
Gu X, Gu Y, Wu H (2017) Cascaded convolutional neural networks for aspect-based opinion summary. Neural Process Lett 46(2):1–14
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Joshi D, Datta R, Fedorovskaya E, Luong QT, Wang JZ, Jia L, Luo J (2011) Aesthetics and emotions in images. IEEE Signal Process Mag 28(5):94–115
Jou B, Chen T, Pappas N, Redi M, Topkara M, Chang S (2015) Visual affect around the world: a large-scale multilingual visual sentiment ontology. In: Proceedings of the 23rd annual ACM conference on multimedia conference, pp 159–168
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems—volume 1, Curran Associates Inc., pp 1097–1105
Lei P, Zhu S, Ngo CW (2015) Deep multimodal learning for affective analysis and retrieval. IEEE Trans Multimedia 17(11):2008–2020
Lin T, Maire M, Belongie SJ, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: 13th European conference on computer vision (ECCV 2014), pp 740–755
Lu X, Suryanarayan P, Adams RB Jr, Li J, Newman MG, Wang JZ (2012) On shape and the computability of emotions. In: Proceedings of the 20th ACM international conference on multimedia, MM ’12, pp 229–238
Ma L, Lu Z, Shang L, Li H (2015) Multimodal convolutional neural networks for matching image and sentence. In: IEEE international conference on computer vision (ICCV 2015), pp 2623–2631
Machajdik J, Hanbury A (2010) Affective image classification using features inspired by psychology and art theory. In: Proceedings of the 18th ACM international conference on multimedia, MM ’10, pp 83–92
Mnih V, Heess N, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: Annual conference on neural information processing systems 2014, pp 2204–2212
Peng K, Sadovnik A, Gallagher A, Chen T (2016) Where do emotions come from? Predicting the emotion stimuli map. In: 2016 IEEE international conference on image processing (ICIP), pp 614–618
Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543
Ren S, He K, Girshick RB, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Annual conference on neural information processing systems 2015, pp 91–99
Rimmele U, Davachi L, Petrov R, Dougal S, Phelps EA (2011) Emotion enhances the subjective feeling of remembering, despite lower accuracy for contextual details. Emotion 11(3):553–562
Wu L, Qi M, Jian M, Zhang H (2019) Visual sentiment analysis by combining global and local information. Neural Process Lett. https://doi.org/10.1007/s11063-019-10027-7
Xu K, Ba J, Kiros R, Cho K, Courville AC, Salakhutdinov R, Zemel RS, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: Proceedings of the 32nd international conference on machine learning, ICML 2015, pp 2048–2057
Huang X, Shen C, Boix X, Zhao Q (2015) SALICON: reducing the semantic gap in saliency prediction by adapting deep neural networks. In: 2015 IEEE international conference on computer vision (ICCV)
Yang J, She D, Lai YK, Rosin PL, Yang MH (2018) Weakly supervised coupled networks for visual sentiment analysis. In: IEEE conference on computer vision and pattern recognition (CVPR)
You Q, Luo J, Jin H, Yang J (2015) Joint visual-textual sentiment analysis with deep neural networks. In: Proceedings of the 23rd ACM international conference on multimedia, MM ’15, pp 1071–1074
You Q, Luo J, Jin H, Yang J (2015) Robust image sentiment analysis using progressively trained and domain transferred deep networks. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence, pp 381–388
You Q, Luo J, Jin H, Yang J (2016) Cross-modality consistent regression for joint visual-textual sentiment analysis of social multimedia. In: Proceedings of the ninth ACM international conference on web search and data mining, pp 13–22
You Q, Jin H, Luo J (2017) Visual sentiment analysis by attending on local image regions. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, pp 231–237
Yuan J, McDonough S, You Q, Luo J (2013) Sentribute: image sentiment analysis from a mid-level perspective. In: Proceedings of the second international workshop on issues of sentiment discovery and opinion mining, WISDOM ’13, pp 10:1–10:8
Zhang S, Xu X, Pang Y, Han J (2019) Multi-layer attention based CNN for target-dependent sentiment classification. Neural Process Lett 3:1–15

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2018YFB1003201, in part by the National Natural Science Foundation of China under Grants 61702114 and 61672171, in part by the Guangdong Key R&D Project of China under Grants 2018B010107003 and 2019B010121001, and in part by the Guangdong Natural Science Foundation under Grant 2018B030311007.

Author information

Authors and Affiliations

School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
Zhuanghui Wu, Min Meng & Jigang Wu

Corresponding author

Correspondence to Min Meng.

About this article

Cite this article

Wu, Z., Meng, M. & Wu, J. Visual Sentiment Prediction with Attribute Augmentation and Multi-attention Mechanism. Neural Process Lett 51, 2403–2416 (2020). https://doi.org/10.1007/s11063-020-10201-2

Issue Date: June 2020
