The University of Passau Open Emotion Recognition System for the Multimodal Emotion Challenge

  • Conference paper
  • Pattern Recognition (CCPR 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)

Abstract

This paper presents the University of Passau’s approaches for the Multimodal Emotion Recognition Challenge 2016. For audio signals, we exploit Bag-of-Audio-Words techniques combined with Extreme Learning Machines and Hierarchical Extreme Learning Machines. For video signals, we use not only the information from the cropped face in each video frame, but also the broader contextual information from the entire frame. This information is extracted via two Convolutional Neural Networks pre-trained for face detection and object classification. Moreover, we extract facial action units, which reflect facial muscle movements and are known to be important for emotion recognition. Long Short-Term Memory Recurrent Neural Networks are deployed to exploit temporal information in the video representations. Average late fusion of the audio and video systems is applied to make predictions for multimodal emotion recognition. Experimental results on the challenge database demonstrate the effectiveness of the proposed systems compared with the baseline.
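
The abstract names several standard building blocks without showing them. As a purely illustrative aid, the minimal Python sketch below shows how Bag-of-Audio-Words quantisation, a basic Extreme Learning Machine, and average late fusion fit together. Every concrete choice in it (the codebook size, the hidden-layer width, the tanh activation, the regularisation constant, all function and class names) is an assumption made for the sketch, not a setting taken from the challenge system described in the paper.

    import numpy as np

    def boaw_histogram(lld_frames, codebook):
        """Bag-of-Audio-Words: assign each frame-level descriptor to its
        nearest codeword and return a normalised count histogram.
        lld_frames: (n_frames, n_feat); codebook: (n_words, n_feat)."""
        dists = np.linalg.norm(lld_frames[:, None, :] - codebook[None, :, :], axis=2)
        hist = np.bincount(dists.argmin(axis=1), minlength=len(codebook)).astype(float)
        return hist / max(hist.sum(), 1.0)

    class ELM:
        """Extreme Learning Machine: random, fixed input weights; only the
        output weights are solved in closed form (ridge-regularised)."""
        def __init__(self, n_hidden=1024, reg=1e-3, seed=0):
            self.n_hidden, self.reg = n_hidden, reg
            self.rng = np.random.default_rng(seed)

        def _hidden(self, X):
            return np.tanh(X @ self.W + self.b)

        def fit(self, X, Y):
            # Random projection to the hidden layer; never trained.
            self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
            self.b = self.rng.standard_normal(self.n_hidden)
            H = self._hidden(X)
            # Closed-form ridge-regularised least squares for the output weights.
            A = H.T @ H + self.reg * np.eye(self.n_hidden)
            self.beta = np.linalg.solve(A, H.T @ Y)  # Y: one-hot labels
            return self

        def predict(self, X):
            return self._hidden(X) @ self.beta  # per-class scores

    def average_late_fusion(audio_scores, video_scores):
        """Average late fusion: element-wise mean of per-class scores
        from the audio and video subsystems."""
        return 0.5 * (audio_scores + video_scores)

The closed-form solve is what makes ELM training cheap: only the output weights are learned, while the hidden layer remains a fixed random projection. The late-fusion step then needs nothing more than matching per-class score vectors from the two modalities.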

Acknowledgments

This work has been partially supported by the BMBF IKT2020 Grant under grant agreement No. 16SV7213 (EmotAsS), the European Community’s Seventh Framework Programme through the ERC Starting Grant No. 338164 (iHEARu), the EU’s Horizon 2020 Programme under grant agreement No. 688835 (DE-ENIGMA), and the European Union’s Horizon 2020 Programme through the Innovative Action No. 645094 (SEWA). It was further partially supported by research grants from the China Scholarship Council (CSC) awarded to Xinzhou Xu.

Author information

Corresponding author

Correspondence to Jun Deng.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Deng, J. et al. (2016). The University of Passau Open Emotion Recognition System for the Multimodal Emotion Challenge. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_54

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_54

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5

  • eBook Packages: Computer Science; Computer Science (R0)
