Abstract
The explosive growth of multimedia has led many people to express their opinions through social media platforms such as Flickr and Facebook. As a result, social media has become a rich source of data for analyzing human emotions. Many earlier studies have sought to assess human emotions automatically, motivated by applications in education, advertisement, and entertainment. Recently, researchers have focused on visual content as a source of cues for evoked emotions; in the literature, this line of work is called visual sentiment analysis. Although many earlier studies on visual emotion analysis have achieved strong performance, most are limited to classification over a pre-determined set of emotion categories. In this paper, we aim to recognize emotion classes that do not exist in the training set. The proposed model is trained by mapping visual features to emotional semantic representations embedded by the BERT language model. Evaluated on a cross-domain affective dataset, the model achieves 66% accuracy in predicting unseen emotions not included in the training set.
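The zero-shot approach the abstract describes can be illustrated with a minimal sketch: learn a projection from visual features into a semantic label-embedding space, then classify an unseen emotion by nearest-neighbor search among unseen label embeddings. The sketch below is not the paper's implementation; the dimensions, the ridge-regression fit, and the random stand-ins for CNN features and BERT label embeddings are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: in the paper, image features come from a CNN backbone and
# label embeddings from BERT; here both are random vectors for illustration.
DIM_VIS, DIM_SEM = 512, 768
seen_labels = ["joy", "sadness", "anger"]
unseen_labels = ["awe", "fear"]
label_emb = {w: rng.normal(size=DIM_SEM) for w in seen_labels + unseen_labels}

# Linear projection W mapping visual features into the semantic space,
# fit by ridge regression on (visual feature, seen-label embedding) pairs.
X = rng.normal(size=(300, DIM_VIS))                          # visual features
Y = np.stack([label_emb[seen_labels[i % 3]] for i in range(300)])
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(DIM_VIS), X.T @ Y)

def predict_unseen(x):
    """Project a visual feature and return the most similar unseen emotion."""
    z = x @ W
    sims = {w: (z @ label_emb[w])
               / (np.linalg.norm(z) * np.linalg.norm(label_emb[w]))
            for w in unseen_labels}
    return max(sims, key=sims.get)

print(predict_unseen(rng.normal(size=DIM_VIS)))
```

At test time, no image of an unseen emotion class is needed during training; only that class's label embedding must be available, which is what makes the recognition zero-shot.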
Acknowledgment
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the High-Potential Individuals Global Training Program (2021-0-01549), supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kang, H., Hazarika, D., Kim, D., Kim, J. (2023). Zero-Shot Visual Emotion Recognition by Exploiting BERT. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 543. Springer, Cham. https://doi.org/10.1007/978-3-031-16078-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16077-6
Online ISBN: 978-3-031-16078-3
eBook Packages: Intelligent Technologies and Robotics (R0)