A Survey on Automatic Multimodal Emotion Recognition in the Wild

Part of the Intelligent Systems Reference Library book series (ISRL, volume 189)

Abstract

Affective computing has been an active area of research for the past two decades. One of the major components of affective computing is automatic emotion recognition. This chapter gives a detailed overview of different emotion recognition techniques and the predominantly used signal modalities. The discussion starts with the different emotion representations and their limitations. Given that affective computing is a data-driven research area, a thorough comparison of standard emotion-labelled databases is presented. Feature extraction and analysis techniques for emotion recognition are then presented according to the source of the data. Finally, applications of automatic emotion recognition are discussed, along with important current issues such as privacy and fairness.

Author information

Corresponding authors

Correspondence to Garima Sharma or Abhinav Dhall.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this chapter

Sharma, G., Dhall, A. (2021). A Survey on Automatic Multimodal Emotion Recognition in the Wild. In: Phillips-Wren, G., Esposito, A., Jain, L.C. (eds) Advances in Data Science: Methodologies and Applications. Intelligent Systems Reference Library, vol 189. Springer, Cham. https://doi.org/10.1007/978-3-030-51870-7_3
