Abstract
Recognition of facial expressions plays an important role in understanding human behavior and in classroom assessment, customer feedback, education, business, and many other human-machine interaction applications. Some researchers have found that using features at different scales can improve recognition accuracy, but there has been no systematic study of how to exploit scale information. In this work, we propose a hierarchical scale convolutional neural network (HSNet) for facial expression recognition, which systematically enhances the information extracted at the kernel, network, and knowledge scales. First, motivated by the observation that facial expressions can be defined by facial action units of different sizes, and by the power of sparsity, we propose dilation Inception blocks to enhance kernel-scale information extraction. Second, to supervise relatively shallow layers so that they learn more discriminative features from feature maps of different sizes, we propose a feature-guided auxiliary learning approach that uses high-level semantic features to guide the learning of the shallow layers. Last, since human cognitive ability improves progressively with learned knowledge, we mimic this ability through knowledge transfer learning from related tasks. Extensive experiments on lab-controlled, synthesized, and in-the-wild databases show that the proposed method substantially boosts performance and achieves state-of-the-art accuracy on most databases. Ablation studies demonstrate the effectiveness of the modules in the proposed method.
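The abstract does not say how the dilation Inception blocks are configured, so the following is only a toy 1D sketch of the general idea they build on: several parallel branches (Inception-style) that apply the same small kernel at different dilation rates (as in dilated convolutions, "Multi-scale context aggregation by dilated convolutions", Yu and Koltun 2016), with the branch outputs stacked as channels. A k-tap kernel with dilation d samples the input sparsely and covers k + (k - 1)(d - 1) input positions, so each branch sees a different-sized region without extra weights. The function names, the dilation rates (1, 2, 3), and the shared all-ones kernel are illustrative assumptions, not HSNet's actual settings.

```python
from typing import Dict, List, Sequence


def effective_kernel_size(k: int, d: int) -> int:
    """Input span covered by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """Zero-padded 'same' 1D dilated convolution (cross-correlation, as in CNN libraries)."""
    k = len(w)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    # Pad so that the output has the same length as the input.
    pad = (effective_kernel_size(k, d) - 1) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    # Tap j of the kernel reads the input d * j positions after the window start.
    return [sum(w[j] * padded[i + j * d] for j in range(k)) for i in range(len(x))]


def dilated_inception_block(
    x: Sequence[float], kernels_by_rate: Dict[int, Sequence[float]]
) -> List[List[float]]:
    """Hypothetical multi-branch block: one dilated branch per rate, outputs stacked as channels."""
    return [dilated_conv1d(x, w, d) for d, w in sorted(kernels_by_rate.items())]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    ones = [1.0, 1.0, 1.0]  # the same 3-tap kernel in every branch (assumption)
    block = dilated_inception_block(impulse, {1: ones, 2: ones, 3: ones})
    for rate, channel in zip((1, 2, 3), block):
        print(rate, effective_kernel_size(3, rate), channel)
```

Feeding a single impulse through the block shows the effect: the rate-1 branch responds at positions 3 to 5, the rate-2 branch at positions 2, 4 and 6, and the rate-3 branch at positions 1, 4 and 7, so identical three-weight kernels cover spans of 3, 5 and 7 samples. In a real network the branches would be 2D convolutions with learned weights (for example `torch.nn.Conv2d` with its `dilation` argument); this stdlib-only version just makes the sampling pattern visible.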
Acknowledgements
This work was supported by the Hong Kong Innovation and Technology Commission, and the City University of Hong Kong (Project 9610460).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Fan, X., Jiang, M., Shahid, A.R. et al. Hierarchical scale convolutional neural network for facial expression recognition. Cogn Neurodyn 16, 847–858 (2022). https://doi.org/10.1007/s11571-021-09761-3
DOI: https://doi.org/10.1007/s11571-021-09761-3