Exploiting objective text description of images for visual sentiment analysis

Ortis, Alessandro; Farinella, Giovanni Maria; Torrisi, Giovanni; Battiato, Sebastiano

doi:10.1007/s11042-019-08312-7

Published: 07 January 2020

Volume 80, pages 22323–22346, (2021)

Multimedia Tools and Applications

Alessandro Ortis ORCID: orcid.org/0000-0003-3461-4679¹,
Giovanni Maria Farinella¹,
Giovanni Torrisi² &
…
Sebastiano Battiato¹

1021 Accesses
17 Citations

Abstract

This paper addresses the problem of Visual Sentiment Analysis, focusing on estimating the polarity of the sentiment evoked by an image. Starting from an embedding approach that exploits both visual and textual features, we attempt to boost the contribution of each input view. We propose to extract and employ an Objective Text description of images rather than the classic Subjective Text provided by users (i.e., title, tags and image description), which is extensively exploited in the state of the art to infer the sentiment associated with social images. Objective Text is obtained from the visual content of the images through recent deep learning architectures, which are used to classify objects and scenes and to perform image captioning. Objective Text features are then combined with visual features in an embedding space obtained with Canonical Correlation Analysis. The sentiment polarity is then inferred by a supervised Support Vector Machine. During the evaluation, we compared a large number of combinations of text and visual features, together with baselines obtained from state-of-the-art methods. Experiments performed on a representative dataset of 47235 labelled samples demonstrate that exploiting Objective Text helps to outperform the state of the art in sentiment polarity estimation.
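The pipeline outlined in the abstract (two feature views, a Canonical Correlation Analysis embedding, then a linear Support Vector Machine) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature matrices are synthetic stand-ins for visual and Objective Text features, the function names (`fit_cca`, `embed`, `train_linear_svm`, `run_demo`) are hypothetical, the SVM is trained with plain subgradient descent rather than LIBLINEAR, and concatenating the two projected views is an assumption made here for simplicity.

```python
import numpy as np


def fit_cca(X, Y, dim, reg=1e-3):
    """Fit regularized two-view CCA; return means, projections, correlations."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized within-view covariances and the cross-view covariance.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with Cholesky factors: T = Lx^-1 Cxy Ly^-T.
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Ly, np.linalg.solve(Lx, Cxy).T).T
    # Singular vectors of T give the canonical directions in whitened space.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :dim])
    Wy = np.linalg.solve(Ly.T, Vt[:dim].T)
    return mx, my, Wx, Wy, s[:dim]


def embed(X, Y, model):
    """Project both views into the CCA space and concatenate them (assumption)."""
    mx, my, Wx, Wy, _ = model
    return np.hstack([(X - mx) @ Wx, (Y - my) @ Wy])


def train_linear_svm(Z, y, lam=1e-2, lr=0.1, steps=500):
    """Linear SVM (L2-regularized hinge loss) via full-batch subgradient descent."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(steps):
        active = y * (Z @ w + b) < 1  # samples violating the margin
        grad_w = lam * w - (y[active, None] * Z[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def run_demo(seed=0):
    """Train on synthetic two-view data and return held-out polarity accuracy."""
    rng = np.random.default_rng(seed)
    n, latent_dim = 400, 2
    # A shared latent factor drives both views and the polarity label.
    latent = rng.normal(size=(n, latent_dim))
    visual = latent @ rng.normal(size=(latent_dim, 8)) + 0.1 * rng.normal(size=(n, 8))
    text = latent @ rng.normal(size=(latent_dim, 6)) + 0.1 * rng.normal(size=(n, 6))
    y = np.where(latent[:, 0] > 0, 1.0, -1.0)  # +1 positive, -1 negative
    train, test = slice(0, 300), slice(300, n)
    model = fit_cca(visual[train], text[train], dim=latent_dim)
    w, b = train_linear_svm(embed(visual[train], text[train], model), y[train])
    pred = np.sign(embed(visual[test], text[test], model) @ w + b)
    return float((pred == y[test]).mean())


print("held-out accuracy:", run_demo())
```

In a real setting, `visual` would hold CNN descriptors and `text` a vector encoding of the Objective Text (object labels, scene labels and captions). The CCA projection is fitted on the training split only, so the held-out samples do not influence the embedding.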

Notes

Our implementation exploits the MVSO English model provided by [23], which corresponds to the DeepSentiBank CNN fine-tuned to predict 4342 English Adjective Noun Pairs.
The code to repeat the performance evaluation is available at the URL: http://iplab.dmi.unict.it/sentimentembedding/
http://www.csie.ntu.edu.tw/~cjlin/liblinear/.

References

Ahmad K, Mekhalfi ML, Conci N, Melgani F, Natale FD (2018) Ensemble of deep models for event recognition. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 14(2):51
Baecchi C, Uricchio T, Bertini M, Del Bimbo A (2016) A multimodal feature learning approach for sentiment analysis of social network multimedia. Multimed Tools Appl 75(5):2507–2525
Battiato S, Farinella GM, Milotta FL, Ortis A, Addesso L, Casella A, D’Amico V, Torrisi G (2016) The social picture. In: Proceedings of the 2016 ACM on international conference on multimedia retrieval, pp 397–400. ACM
Battiato S, Moltisanti M, Ravì F, Bruna AR, Naccari F (2013) Aesthetic scoring of digital portraits for consumer applications. In: IS&T/SPIE electronic imaging, pp 866008–866008. International Society for Optics and Photonics
Borth D, Ji R, Chen T, Breuel T, Chang SF (2013) Large-scale visual sentiment ontology and detectors using adjective noun pairs. In: Proceedings of the 21st ACM international conference on multimedia, pp 223–232. ACM
Campos V, Jou B, Giró-i-Nieto X (2017) From pixels to sentiment: Fine-tuning CNNs for visual sentiment prediction. Image and Vision Computing 65:15–22. https://doi.org/10.1016/j.imavis.2017.01.011
Campos V, Salvador A, Giró-i-Nieto X, Jou B (2015) Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentiment prediction. In: Proceedings of the 1st international workshop on affect & sentiment in multimedia, ASM ’15. ACM, New York, pp 57–62. https://doi.org/10.1145/2813524.2813530
Chen T, Borth D, Darrell T, Chang SF (2014) DeepSentiBank: Visual sentiment concept classification with deep convolutional neural networks. arXiv:1410.8586
Cui P, Liu S, Zhu W (2017) General knowledge embedded image representation learning. IEEE Transactions on Multimedia
Datta R, Joshi D, Li J, Wang JZ (2006) Studying aesthetics in photographic images using a computational approach. In: European conference on computer vision, pp 288–301. Springer
Esuli A, Sebastiani F (2006) Sentiwordnet: A publicly available lexical resource for opinion mining. In: Proceedings of The European language resources association, vol 6, pp 417–422. Citeseer
Fu Y, Hospedales TM, Xiang T, Fu Z, Gong S (2014) Transductive multi-view embedding for zero-shot recognition and annotation. In: Proceedings of the European conference on computer vision, pp 584–599. Springer
Gong Y, Ke Q, Isard M, Lazebnik S (2014) A multi-view embedding space for modeling internet images, tags, and their semantics. Int J Comput Vis 106(2):210–233
Gong Y, Wang L, Hodosh M, Hockenmaier J, Lazebnik S (2014) Improving image-sentence embeddings using large weakly annotated photo collections. In: Proceedings of the European conference on computer vision, pp 529–545. Springer
Guillaumin M, Verbeek J, Schmid C (2010) Multimodal semi-supervised learning for image classification. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 902–909. IEEE
Hardoon DR, Szedmak S, Shawe-Taylor J (2004) Canonical correlation analysis: An overview with application to learning methods. Neural Comput 16(12):2639–2664
Huang F, Zhang X, Zhao Z, Xu J, Li Z (2019) Image–text sentiment analysis via deep multimodal attentive fusion. Knowl-Based Syst 167:26–37
Hung C, Lin HK (2013) Using objective words in sentiwordnet to improve sentiment classification for word of mouth. IEEE Intell Syst 28(2):47–54
Hwang SJ, Grauman K (2010) Accounting for the relative importance of objects in image retrieval. In: Proceedings of British machine vision conference, vol 1, 2
Hwang SJ, Grauman K (2012) Learning the relative importance of objects from tagged images for retrieval and cross-modal search. Int J Comput Vis 100(2):134–153
Itten J (1962) The art of color; the subjective experience and objective rationale of colour
Johnson J, Ballan L, Fei-Fei L (2015) Love thy neighbors: Image annotation by exploiting image metadata. In: Proceedings of the IEEE international conference on computer vision, pp 4624–4632
Jou B, Chen T, Pappas N, Redi M, Topkara M, Chang SF (2015) Visual affect around the world: A large-scale multilingual visual sentiment ontology. In: Proceedings of the 23rd ACM international conference on multimedia, pp 159–168. ACM
Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3128–3137
Katsurai M, Satoh S (2016) Image sentiment analysis using latent correlations among visual, textual, and sentiment views. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, pp 2837–2841. IEEE
Lei X, Qian X, Zhao G (2016) Rating prediction based on social sentiment from textual reviews. IEEE Trans Multimed 18(9):1910–1921
Li X, Uricchio T, Ballan L, Bertini M, Snoek CG, Bimbo AD (2016) Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval. ACM Comput Surveys (CSUR) 49(1):14
Machajdik J, Hanbury A (2010) Affective image classification using features inspired by psychology and art theory. In: Proceedings of the 18th ACM international conference on multimedia, pp 83–92. ACM
Thelwall M, Buckley K, Paltoglou G, Cai D, Kappas A (2010) Sentiment strength detection in short informal text. Journal of the Association for Information Science and Technology 61(12):2544–2558
Miller GA (1995) Wordnet: a lexical database for english. In: Communications of the ACM, vol 38, pp 39–41. ACM
Ortis A, Farinella GM, Torrisi G, Battiato S (2018) Visual sentiment analysis based on objective text description of images. In: 2018 International conference on content-based multimedia indexing (CBMI), pp 1–6. IEEE
Pang L, Zhu S, Ngo CW (2015) Deep multimodal learning for affective analysis and retrieval. IEEE Trans Multimed 17(11):2008–2020
Perronnin F, Sánchez J, Liu Y (2010) Large-scale image categorization with explicit data embedding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2297–2304
Qian S, Zhang T, Xu C, Shao J (2016) Multi-modal event topic model for social event analysis. IEEE Trans Multimed 18(2):233–246
Rahimi A, Recht B, et al. (2007) Random features for large-scale kernel machines. In: Proceedings of the neural information processing systems, vol 3, pp 5
Rasiwasia N, Costa Pereira J, Coviello E, Doyle G, Lanckriet GR, Levy R, Vasconcelos N (2010) A new approach to cross-modal multimedia retrieval. In: Proceedings of the 18th ACM international conference on multimedia, pp 251–260. ACM
Rudinac S, Larson M, Hanjalic A (2013) Learning crowdsourced user preferences for visual summarization of image collections. IEEE Trans Multimed 15(6):1231–1243
Siersdorfer S, Minack E, Deng F, Hare J (2010) Analyzing and predicting sentiment of images on the social web. In: Proceedings of the 18th ACM international conference on multimedia, pp 715–718. ACM
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Valdez P, Mehrabian A (1994) Effects of color on emotions. In: Journal of experimental psychology: General, vol. 123, p. 394. American Psychological Association
Wang G, Hoiem D, Forsyth D (2009) Building text features for object image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1367–1374
Wang Y, Wang S, Tang J, Liu H, Li B (2015) Unsupervised sentiment analysis for social media images. In: Proceedings of the 24th international joint conference on artificial intelligence, Buenos Aires, Argentina, pp 2378–2379
Xu C, Cetintas S, Lee K, Li L (2014) Visual sentiment prediction with deep convolutional neural networks. arXiv:1411.5731
Yang X, Zhang T, Xu C (2015) Cross-domain feature learning in multimedia. IEEE Trans Multimed 17(1):64–78
You Q, Cao L, Cong Y, Zhang X, Luo J (2015) A multifaceted approach to social multimedia-based prediction of elections. IEEE Trans Multimed 17(12):2271–2280
You Q, Luo J, Jin H, Yang J (2015) Robust image sentiment analysis using progressively trained and domain transferred deep networks. In: 29th AAAI conference on artificial intelligence
Yu FX, Cao L, Feris RS, Smith JR, Chang SF (2013) Designing category-level attributes for discriminative visual recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 771–778
Yuan J, Mcdonough S, You Q, Luo J (2013) Sentribute: image sentiment analysis from a mid-level perspective. In: Proceedings of the 2nd international workshop on issues of sentiment discovery and opinion mining. ACM
Yuan Z, Sang J, Xu C (2013) Tag-aware image classification via nested deep belief nets. In: 2013 IEEE international conference on multimedia and expo (ICME), pp 1–6. IEEE
Yuan Z, Sang J, Xu C, Liu Y (2014) A unified framework of latent feature learning in social media. IEEE Trans Multimed 16(6):1624–1635
Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. In: Advances in neural information processing systems, pp 487–495
Zhu X, Cao B, Xu S, Liu B, Cao J (2019) Joint visual-textual sentiment analysis based on cross-modality attention mechanism. In: International conference on multimedia modeling, pp 264–276. Springer

Acknowledgments

This work has been partially supported by Telecom Italia TIM - Joint Open Lab.

Author information

Authors and Affiliations

University of Catania, Viale A. Doria 6, Catania, 95125, Italy
Alessandro Ortis, Giovanni Maria Farinella & Sebastiano Battiato
JOL Catania - Telecom Italia, Viale A. Doria 6, Catania, 95125, Italy
Giovanni Torrisi

Corresponding author

Correspondence to Alessandro Ortis.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ortis, A., Farinella, G.M., Torrisi, G. et al. Exploiting objective text description of images for visual sentiment analysis. Multimed Tools Appl 80, 22323–22346 (2021). https://doi.org/10.1007/s11042-019-08312-7

Received: 23 July 2018
Revised: 18 July 2019
Accepted: 01 October 2019
Published: 07 January 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-019-08312-7
