Journal on Multimodal User Interfaces, Volume 10, Issue 2, pp 151–162

Revisiting the EmotiW challenge: how wild is it really?

Classification of human emotions in movie snippets based on multiple features
  • Markus Kächele
  • Martin Schels
  • Sascha Meudt
  • Günther Palm
  • Friedhelm Schwenker
Original Paper

Abstract

The focus of this work is emotion recognition in the wild based on a multitude of different audio, visual, and meta features. To this end, a method is proposed to optimize multi-modal fusion architectures using evolutionary computing. Extensive uni- and multi-modal experiments show the discriminative power of each computed feature set and fusion architecture. Furthermore, we summarize the EmotiW 2013/2014 challenges, review the conclusions that have been drawn from them, and compare our results with the state of the art on this dataset.
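The abstract only names the evolutionary approach; the details of the fusion optimization are not given on this page. As a rough, purely illustrative sketch of the idea (not the authors' implementation), one can picture an evolutionary search over decision-level fusion weights for the per-modality classifier scores; all names, shapes, and the random toy data below are assumptions introduced for illustration.

    # Illustrative sketch only, not the method from the paper: a simple
    # (mu + lambda)-style evolutionary search over decision-level fusion
    # weights for three modality classifiers.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical per-modality class scores on a validation set:
    # shape (n_modalities, n_samples, n_classes), plus ground-truth labels.
    n_mod, n_samples, n_classes = 3, 200, 7
    scores = rng.random((n_mod, n_samples, n_classes))
    labels = rng.integers(0, n_classes, n_samples)

    def fitness(weights):
        """Validation accuracy of a weighted sum of modality scores."""
        fused = np.tensordot(weights, scores, axes=1)  # (n_samples, n_classes)
        return np.mean(fused.argmax(axis=1) == labels)

    # Evolutionary loop: mutate the population, pool parents and children,
    # and keep the best individuals according to the fitness function.
    pop = rng.random((20, n_mod))
    for generation in range(50):
        children = np.clip(pop + rng.normal(0.0, 0.1, pop.shape), 0.0, None)
        candidates = np.vstack([pop, children])
        ranked = sorted(candidates, key=fitness, reverse=True)
        pop = np.array(ranked[:20])

    best = pop[0] / pop[0].sum()  # normalised fusion weights of the best individual
    print("best fusion weights:", best, "accuracy:", fitness(pop[0]))

In practice the fitness would be evaluated on held-out validation data from real classifier outputs, and the search space could cover the fusion topology as well as the weights; the snippet above only sketches the weight-optimization case.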

Keywords

Affective computing · EmotiW · Information fusion


Copyright information

© OpenInterface Association 2016

Authors and Affiliations

  • Markus Kächele (1)
  • Martin Schels (1)
  • Sascha Meudt (1)
  • Günther Palm (1)
  • Friedhelm Schwenker (1)
  1. Institute of Neural Information Processing, Ulm University, Ulm, Germany