
Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning

  • Conference paper
Neural Information Processing (ICONIP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9947)


Abstract

Building a robust system for predicting emotions from user-generated videos is challenging because of the videos' diverse content and the high-level abstraction of human emotions. Given the recent success of deep learning (e.g., convolutional neural networks, CNNs) in several visual recognition competitions, CNNs are a promising tool for tackling challenges in human cognitive processing, such as emotion prediction. The emotion wheel, a widely used emotion categorization in psychology, can guide the construction of a basic cognitive structure for CNN feature learning. In this work, we predict emotions from user-generated videos with the aid of emotion-wheel-guided CNN feature extractors. Experimental results show that the emotion-wheel-guided, CNN-learned features raise the average emotion prediction accuracy to 54.2%, outperforming the related state-of-the-art approaches.
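The pipeline the abstract describes (per-frame CNN features pooled into a single video descriptor, then classified against emotion-wheel categories) can be illustrated with a minimal sketch. This is not the paper's actual model: the `EMOTIONS` label set assumes Plutchik's eight basic emotions from the emotion wheel, and the frame features and classifier weights below are random stand-ins for learned CNN features and a trained classifier.

```python
import numpy as np

# Plutchik's emotion wheel: eight basic emotions (assumed label set,
# following the psychological categorization the paper builds on).
EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]

def video_emotion_scores(frame_features, W, b):
    """Average per-frame CNN features into one video descriptor,
    then score it with a linear classifier (softmax over emotions)."""
    video_feat = frame_features.mean(axis=0)   # (D,) pooled video descriptor
    logits = W @ video_feat + b                # (8,) one score per emotion
    exp = np.exp(logits - logits.max())        # numerically stable softmax
    return exp / exp.sum()

# Toy example: 10 frames with 16-dim features, random classifier weights.
rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 16))
W, b = rng.standard_normal((8, 16)), np.zeros(8)
probs = video_emotion_scores(frames, W, b)
print(EMOTIONS[int(probs.argmax())])
```

In the paper's setting, `frames` would come from a CNN feature extractor guided by the emotion-wheel categories rather than from random numbers; the sketch only shows how per-frame features are pooled and mapped to an emotion distribution.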



Author information

Correspondence to Ja-Ling Wu.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ho, C.-T., Lin, Y.-H., Wu, J.-L. (2016). Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9947. Springer, Cham. https://doi.org/10.1007/978-3-319-46687-3_1

  • DOI: https://doi.org/10.1007/978-3-319-46687-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46686-6

  • Online ISBN: 978-3-319-46687-3

  • eBook Packages: Computer Science, Computer Science (R0)
