Abstract
The explosive growth of multimedia has led many people to express their opinions through social media platforms such as Flickr and Facebook. As a result, social media has become a rich source of data for analyzing human emotions. Many earlier studies have sought to assess human emotions automatically, motivated by applications in education, advertisement, and entertainment. Recently, researchers have focused on visual content as a source of cues for evoked emotions; in the literature, this line of work is called visual sentiment analysis. Although many earlier studies on visual emotion analysis have achieved strong performance, most are limited to classification over a pre-determined set of emotion categories. In this paper, we aim to recognize emotion classes that do not exist in the training set. The proposed model is trained by mapping visual features to emotional semantic representations embedded by the BERT language model. Evaluated on a cross-domain affective dataset, the model achieves 66% accuracy in predicting unseen emotions not included in the training set.
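The zero-shot approach the abstract describes can be illustrated with a minimal sketch: learn a projection from visual features into a semantic label-embedding space, then classify an unseen emotion by nearest-neighbor search among unseen label embeddings. The sketch below is not the paper's implementation; the dimensions, the ridge-regression fit, and the random stand-ins for CNN features and BERT label embeddings are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: in the paper, image features come from a CNN backbone and
# label embeddings from BERT; here both are random vectors for illustration.
DIM_VIS, DIM_SEM = 512, 768
seen_labels = ["joy", "sadness", "anger"]
unseen_labels = ["awe", "fear"]
label_emb = {w: rng.normal(size=DIM_SEM) for w in seen_labels + unseen_labels}

# Linear projection W mapping visual features into the semantic space,
# fit by ridge regression on (visual feature, seen-label embedding) pairs.
X = rng.normal(size=(300, DIM_VIS))                          # visual features
Y = np.stack([label_emb[seen_labels[i % 3]] for i in range(300)])
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(DIM_VIS), X.T @ Y)

def predict_unseen(x):
    """Project a visual feature and return the most similar unseen emotion."""
    z = x @ W
    sims = {w: (z @ label_emb[w])
               / (np.linalg.norm(z) * np.linalg.norm(label_emb[w]))
            for w in unseen_labels}
    return max(sims, key=sims.get)

print(predict_unseen(rng.normal(size=DIM_VIS)))
```

At test time, no image of an unseen emotion class is needed during training; only that class's label embedding must be available, which is what makes the recognition zero-shot.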
Acknowledgment
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the High-Potential Individuals Global Training Program (2021-0-01549), supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kang, H., Hazarika, D., Kim, D., Kim, J. (2023). Zero-Shot Visual Emotion Recognition by Exploiting BERT. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 543. Springer, Cham. https://doi.org/10.1007/978-3-031-16078-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16077-6
Online ISBN: 978-3-031-16078-3
eBook Packages: Intelligent Technologies and Robotics (R0)