
Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning

  • Conference paper
Neural Information Processing (ICONIP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9947)


Abstract

Building a robust system for predicting emotions from user-generated videos is challenging because of the videos' diverse content and the high-level abstraction of human emotions. Given the recent success of deep learning (e.g., convolutional neural networks, CNNs) in several visual recognition competitions, CNNs are a promising tool for tackling challenges in human cognitive processing, such as emotion prediction. The emotion wheel, a widely used emotion categorization in psychology, can guide the construction of a basic cognitive structure for CNN feature learning. In this work, we predict emotions from user-generated videos with the aid of emotion-wheel-guided CNN feature extractors. Experimental results show that the emotion-wheel-guided, CNN-learned features raise the average emotion prediction accuracy to 54.2%, outperforming the related state-of-the-art approaches.
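The pipeline the abstract describes (per-frame CNN features pooled into a single video descriptor, then classified against emotion-wheel categories) can be illustrated with a minimal sketch. This is not the paper's actual model: the `EMOTIONS` label set assumes Plutchik's eight basic emotions from the emotion wheel, and the frame features and classifier weights below are random stand-ins for learned CNN features and a trained classifier.

```python
import numpy as np

# Plutchik's emotion wheel: eight basic emotions (assumed label set,
# following the psychological categorization the paper builds on).
EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]

def video_emotion_scores(frame_features, W, b):
    """Average per-frame CNN features into one video descriptor,
    then score it with a linear classifier (softmax over emotions)."""
    video_feat = frame_features.mean(axis=0)   # (D,) pooled video descriptor
    logits = W @ video_feat + b                # (8,) one score per emotion
    exp = np.exp(logits - logits.max())        # numerically stable softmax
    return exp / exp.sum()

# Toy example: 10 frames with 16-dim features, random classifier weights.
rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 16))
W, b = rng.standard_normal((8, 16)), np.zeros(8)
probs = video_emotion_scores(frames, W, b)
print(EMOTIONS[int(probs.argmax())])
```

In the paper's setting, `frames` would come from a CNN feature extractor guided by the emotion-wheel categories rather than from random numbers; the sketch only shows how per-frame features are pooled and mapped to an emotion distribution.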



Author information

Correspondence to Ja-Ling Wu.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ho, C.-T., Lin, Y.-H., Wu, J.-L. (2016). Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9947. Springer, Cham. https://doi.org/10.1007/978-3-319-46687-3_1

  • DOI: https://doi.org/10.1007/978-3-319-46687-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46686-6

  • Online ISBN: 978-3-319-46687-3

  • eBook Packages: Computer Science, Computer Science (R0)
