
A dynamic fusion of features from deep learning and the HOG-TOP algorithm for facial expression recognition

Multimedia Tools and Applications

Abstract

Facial expression recognition plays an essential role in surveillance video, anxiety treatment, expression analysis, gesture recognition, computer games, patient monitoring, operator fatigue detection, and robotics. It has therefore attracted growing attention over the years, yet it remains a difficult task because the expression of emotion can be influenced by many factors. Approaches based on Deep Convolutional Neural Networks (DCNN) and transfer learning have succeeded in recognizing emotion in video sequences, but they remain limited because it is difficult to model the spatio-temporal interactions between video frames or to identify the salient features that improve accuracy. In this article, we propose a facial expression recognition system that combines deep learning features with dynamic texture features. For the deep learning part, we use the VGG19 model to extract facial features, which feed LSTM (Long Short-Term Memory) cells that capture spatio-temporal information across frames. In parallel, the HOG-TOP (Histogram of Oriented Gradients from Three Orthogonal Planes) descriptor extracts dynamic textures from the video sequences to characterize changes in facial appearance. Finally, we combine the two representations with a Multimodal Compact Bilinear (MCB) pooling algorithm to produce a robust descriptor vector, and an SVM (Support Vector Machine) classifier predicts the emotion class. Experiments on the eNTERFACE'05 dataset show that the fusion method raises recognition accuracy by almost 1% (98.44%) over the baseline approach (97.75%). In summary, the proposed method achieves higher accuracy and more robust detection.
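The fusion step described above can be sketched in code. The following Python example is a minimal illustration under stated assumptions, not the authors' implementation: it approximates Multimodal Compact Bilinear pooling by count-sketching each feature vector and convolving the sketches in the Fourier domain (the standard MCB construction), then trains an SVM on the fused vectors. The feature dimensions, the sketch size d, the random stand-in features, and the linear kernel are all illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

class MCBFusion:
    """Multimodal Compact Bilinear pooling via count sketch + FFT.

    Approximates the outer product of two feature vectors in a
    d-dimensional space. The random hash functions are drawn once
    and reused for every sample, as MCB requires.
    """

    def __init__(self, dim1, dim2, d=1024, seed=0):
        rng = np.random.default_rng(seed)
        self.d = d
        # Random hash h: index -> bucket, and random signs s in {-1, +1}
        self.h1 = rng.integers(0, d, size=dim1)
        self.s1 = rng.choice([-1.0, 1.0], size=dim1)
        self.h2 = rng.integers(0, d, size=dim2)
        self.s2 = rng.choice([-1.0, 1.0], size=dim2)

    def _sketch(self, x, h, s):
        y = np.zeros(self.d)
        np.add.at(y, h, s * x)  # y[h[i]] += s[i] * x[i]
        return y

    def fuse(self, x1, x2):
        y1 = self._sketch(x1, self.h1, self.s1)
        y2 = self._sketch(x2, self.h2, self.s2)
        # Circular convolution of the two sketches equals the sketch of
        # their outer product; computed efficiently in the Fourier domain.
        return np.fft.irfft(np.fft.rfft(y1) * np.fft.rfft(y2), n=self.d)

# Toy usage with random stand-ins for the real features:
# deep_feats would come from VGG19+LSTM, tex_feats from HOG-TOP.
rng = np.random.default_rng(1)
n, dim_deep, dim_hog = 200, 512, 300
deep_feats = rng.standard_normal((n, dim_deep))
tex_feats = rng.standard_normal((n, dim_hog))
labels = rng.integers(0, 6, size=n)  # six basic emotion classes

mcb = MCBFusion(dim_deep, dim_hog, d=1024)
fused = np.stack([mcb.fuse(a, b) for a, b in zip(deep_feats, tex_feats)])

clf = SVC(kernel="linear").fit(fused, labels)
print("train accuracy:", clf.score(fused, labels))
```

Drawing the hash functions once in the constructor is the essential design point: every sample pair must be projected with the same sketch, otherwise the fused vectors are not comparable and the SVM cannot learn from them.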


Data availability

The datasets analyzed during the current study are available from the corresponding author on reasonable request.


Author information


Corresponding author

Correspondence to Hajar Chouhayebi.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chouhayebi, H., Mahraz, M.A., Riffi, J. et al. A dynamic fusion of features from deep learning and the HOG-TOP algorithm for facial expression recognition. Multimed Tools Appl 83, 32993–33017 (2024). https://doi.org/10.1007/s11042-023-16779-8

