Abstract
Classifying emotions from data sources such as text, images, videos, and speech has long been an inspiring research area for researchers from many disciplines. Automatic emotion detection from videos and images is among the most challenging of these tasks and has been studied with both supervised and unsupervised machine learning methods. Deep learning has also been employed, with models trained on facial and body features obtained from pose and landmark detectors and trackers. In this paper, facial and body features extracted with the OpenPose tool are used to detect the basic 6, 7, and 9 emotions from videos and images with a novel deep neural network framework that combines a Gaussian mixture model with CNN, LSTM, and Transformer blocks, yielding CNN-LSTM and CNN-Transformer models with and without Gaussian centers. Experiments conducted on two benchmark datasets, FABO and CK+, showed that the proposed Transformer model with 9 and 12 Gaussian centers, combined with a video generation approach, achieved close to 100% classification accuracy on the FABO dataset, outperforming the other DNN frameworks for emotion detection. It reported over 90% accuracy for most combinations of features on both datasets, making it a competitive framework for video emotion classification.
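The abstract describes augmenting OpenPose keypoint features with Gaussian mixture centers before feeding them to the networks. The paper's exact procedure is not given here, so the sketch below is only a minimal illustration of the idea: fit a small GMM to one frame's 2-D keypoints with a hand-rolled EM loop (isotropic, fixed variance) and append the flattened component means to the flattened keypoint vector. The function name, the EM simplifications, and the 25-point skeleton are assumptions, not the authors' implementation.

```python
import numpy as np

def gmm_centers(points, n_centers=9, n_iter=50, seed=0):
    """Estimate GMM component means for 2-D keypoints with a minimal
    EM loop (isotropic, fixed variance -- for illustration only)."""
    rng = np.random.default_rng(seed)
    # Initialise means at randomly chosen keypoints
    means = points[rng.choice(len(points), n_centers, replace=False)]
    var = points.var() + 1e-6              # shared, fixed variance
    weights = np.full(n_centers, 1.0 / n_centers)
    for _ in range(n_iter):
        # E-step: responsibilities under isotropic Gaussians
        d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        log_r = np.log(weights) - d2 / (2.0 * var)
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixture weights and component means
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / len(points)
        means = (resp.T @ points) / nk[:, None]
    return means

# Toy frame: 25 body keypoints (e.g. an OpenPose BODY_25 skeleton)
rng = np.random.default_rng(1)
keypoints = rng.random((25, 2))
centers = gmm_centers(keypoints, n_centers=9)
features = np.concatenate([keypoints.ravel(), centers.ravel()])
print(features.shape)  # (68,) = 25*2 keypoints + 9*2 centers
```

With 9 centers each frame gains 18 extra values; the same recipe with 12 centers would add 24, matching the two configurations reported in the abstract.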
Abbreviations
- FABO: The Bi-modal Face and Body Gesture Database
- CK+: The Extended Cohn-Kanade Dataset
- k-fold: k-fold cross-validation
- cnnf: CNN model trained with facial features only
- cnnb: CNN model trained with body features only
- cnnOutf: Predictions obtained from the cnnf model
- cnnOutb: Predictions obtained from the cnnb model
- lstmf: LSTM model trained with facial features only
- lstmb: LSTM model trained with body features only
- lstmOutf: Predictions obtained from the lstmf model
- lstmOutb: Predictions obtained from the lstmb model
- mm: The final DNN block used to join the face and body models
- transformerf: Transformer-based model trained with facial features
- transformerb: Transformer-based model trained with body features
- transformerOutf: Predictions obtained from the transformerf model
- transformerOutb: Predictions obtained from the transformerb model
- Dg: Dataset with Gaussian mixture centers added
- TopX Frames: The approach in which the 10 most informative frames are selected
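The TopX Frames abbreviation refers to selecting the 10 most informative frames of a clip. The selection criterion is not specified in this section, so the sketch below assumes one plausible proxy: rank frames by the total keypoint displacement relative to the previous frame and keep the top 10 in temporal order. Both the function name and the motion-based criterion are illustrative assumptions, not the paper's method.

```python
import numpy as np

def top_x_frames(video_keypoints, x=10):
    """Select the x frames with the largest total keypoint displacement
    relative to the previous frame (assumed informativeness proxy)."""
    video_keypoints = np.asarray(video_keypoints)  # (n_frames, n_points, 2)
    motion = np.zeros(len(video_keypoints))
    # Per-frame motion: summed Euclidean displacement of all keypoints
    motion[1:] = np.linalg.norm(
        np.diff(video_keypoints, axis=0), axis=-1).sum(axis=1)
    keep = np.sort(np.argsort(motion)[-x:])        # preserve temporal order
    return video_keypoints[keep]

# Toy clip: 30 frames of 25 keypoints each
rng = np.random.default_rng(7)
clip = rng.random((30, 25, 2))
selected = top_x_frames(clip, x=10)
print(selected.shape)  # (10, 25, 2)
```

Sorting the kept indices before slicing matters for the sequence models: LSTM and Transformer inputs should stay in chronological order even though ranking by motion does not produce them that way.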
Acknowledgements
The numerical calculations reported in this paper were performed, in full or in part, at TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA resources).
Ethics declarations
Conflict of interest
The authors have no conflict of interest to declare.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Karatay, B., Beştepe, D., Sailunaz, K. et al. CNN-Transformer based emotion classification from facial expressions and body gestures. Multimed Tools Appl 83, 23129–23171 (2024). https://doi.org/10.1007/s11042-023-16342-5