Multilingual visual sentiment concept clustering and analysis

Abstract

Visual content is a rich medium that can communicate not only facts and events but also emotions and opinions. In some cases, visual content carries a universal affective bias (e.g., natural disasters or beautiful scenes). Often, however, matching the affect that a piece of visual media evokes in its recipient to the affect its author intended requires a deep understanding, and even a sharing, of cultural backgrounds. In this study, we propose a computational framework for clustering and analyzing multilingual visual affective concepts used in different languages, which enables us to pinpoint alignable differences (via similar concepts) and nonalignable differences (via unique concepts) across cultures. To do so, we crowdsource sentiment labels for the MVSO dataset, which contains 16K multilingual visual sentiment concepts and 7.3M images tagged with these concepts. We then represent these concepts in a distribution-based word vector space via (1) pivotal translation or (2) cross-lingual semantic alignment, and evaluate these representations on three cross-lingual tasks: affective concept retrieval, concept clustering, and sentiment prediction. The proposed clustering framework enables both quantitative and qualitative analysis of the large multilingual dataset. We also present a novel use case based on a subset of facial images and explore cultural insights about visual sentiment concepts in such portrait-focused images.
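As a rough illustration of the pivotal-translation idea, the following minimal sketch maps concepts from two languages to English phrases, represents each phrase as the average of its word vectors, and retrieves the closest concept by cosine similarity. The toy three-dimensional vectors, the translation tables, and the function names are assumptions made for this example; they are not the embeddings, translations, or code used in the study.

```python
import math

# Toy 3-d English word vectors (made-up values, for illustration only).
WORD_VECTORS = {
    "beautiful": [0.9, 0.1, 0.0],
    "happy": [0.8, 0.3, 0.1],
    "sad": [-0.7, 0.2, 0.1],
    "sky": [0.1, 0.9, 0.2],
    "face": [0.0, 0.2, 0.9],
}

def concept_vector(phrase):
    """Average the word vectors of an English (pivot) concept phrase."""
    vectors = [WORD_VECTORS[word] for word in phrase.split()]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_concept(query_pivot, candidates):
    """Return the native-language candidate whose pivot phrase is closest.

    `candidates` maps a native-language concept to its English pivot phrase.
    """
    query = concept_vector(query_pivot)
    return max(
        candidates,
        key=lambda native: cosine(query, concept_vector(candidates[native])),
    )

# Hypothetical pivot translations (a real system would use a translation service).
spanish = {"cara feliz": "happy face"}
french = {
    "beau ciel": "beautiful sky",
    "visage triste": "sad face",
    "beau visage": "beautiful face",
}

match = nearest_concept(spanish["cara feliz"], french)
```

Averaging word vectors is only one simple way to compose a phrase representation, and it is used here purely to keep the sketch short.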

Notes

  1. http://www.crowdflower.com.

  2. https://cloud.google.com/translate.

  3. We did not perform lemmatization or any other preprocessing step to preserve the original visual concept properties.

  4. http://corpora2.informatik.uni-leipzig.de/download.html.

  5. https://code.google.com/p/word2vec.

  6. http://webscope.sandbox.yahoo.com.

Author information

Corresponding author

Correspondence to Nikolaos Pappas.

Additional information

Nikolaos Pappas, Miriam Redi, Mercan Topkara, Hongyi Liu have contributed equally.

About this article

Cite this article

Pappas, N., Redi, M., Topkara, M. et al. Multilingual visual sentiment concept clustering and analysis. Int J Multimed Info Retr 6, 51–70 (2017). https://doi.org/10.1007/s13735-017-0120-4

Keywords

  • Multilingual
  • Language
  • Cultures
  • Cross-cultural
  • Emotion
  • Sentiment
  • Ontology
  • Concept detection
  • Social multimedia