Hierarchical scale convolutional neural network for facial expression recognition

  • Research Article
  • Cognitive Neurodynamics

Abstract

Recognition of facial expressions plays an important role in understanding human behavior, classroom assessment, customer feedback, education, business, and many other human-machine interaction applications. Some researchers have observed that using features corresponding to different scales can improve recognition accuracy, but a systematic study of how to utilize scale information is lacking. In this work, we propose a hierarchical scale convolutional neural network (HSNet) for facial expression recognition, which systematically enhances the information extracted at the kernel, network, and knowledge scales. First, inspired by the observation that a facial expression can be defined by facial action units of different sizes, and by the power of sparsity, we propose dilation Inception blocks to enhance kernel-scale information extraction. Second, to supervise relatively shallow layers so that they learn more discriminative features from feature maps of different sizes, we propose a feature-guided auxiliary learning approach that uses high-level semantic features to guide the learning of the shallow layers. Last, since human cognitive ability can be progressively improved by learned knowledge, we mimic this ability through knowledge transfer learning from related tasks. Extensive experiments on lab-controlled, synthesized, and in-the-wild databases show that the proposed method substantially boosts performance and achieves state-of-the-art accuracy on most databases. Ablation studies confirm the effectiveness of the modules in the proposed method.
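
The kernel-scale idea can be illustrated with a minimal sketch. The abstract does not give the layout of the dilation Inception blocks, so the following is not the authors' implementation: it is a one-dimensional, pure-Python toy in which the function names (`effective_kernel_size`, `dilated_conv1d`, `dilation_inception_1d`), the shared 3-tap kernel, and the dilation rates (1, 2, 3) are all assumptions chosen for illustration. It shows only the general principle that parallel branches with different dilation rates cover different spatial extents with the same number of weights.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float],
                   kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded ('same' length) 1D dilated cross-correlation.

    The kernel taps are spaced `dilation` samples apart, so a larger
    dilation looks at a wider neighbourhood with the same weights.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - half) * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def dilation_inception_1d(signal: Sequence[float],
                          kernel: Sequence[float],
                          dilations: Sequence[int] = (1, 2, 3)) -> List[List[float]]:
    """Hypothetical multi-branch block: one output channel per dilation rate.

    Stacking the branch outputs plays the role of the channel
    concatenation used in Inception-style blocks.
    """
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    kernel = [1.0, 1.0, 1.0]
    for d, branch in zip((1, 2, 3), dilation_inception_1d(impulse, kernel)):
        print(f"dilation={d} span={effective_kernel_size(3, d)} output={branch}")
```

With a unit impulse as input, each branch responds at positions spaced by its dilation rate, which makes the growing span (3, 5, and 7 samples for a 3-tap kernel) directly visible. A two-dimensional convolutional version would follow the same pattern, with learned kernels per branch.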


Acknowledgements

This work was supported by the Hong Kong Innovation and Technology Commission, and the City University of Hong Kong (Project 9610460).

Author information

Corresponding author

Correspondence to Xinqi Fan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fan, X., Jiang, M., Shahid, A.R. et al. Hierarchical scale convolutional neural network for facial expression recognition. Cogn Neurodyn 16, 847–858 (2022). https://doi.org/10.1007/s11571-021-09761-3

  • DOI: https://doi.org/10.1007/s11571-021-09761-3
