Abstract
As networks grow deeper to extract high-level abstract features, the contribution of shallow features to the target task inevitably diminishes. To address this issue and to support current research in facial expression recognition (FER), we propose NA-Resnet, a network that increases the decision weight of shallow and middle feature maps through a neighbor block (Nei Block) and focuses on the regions most important for feature extraction through an optimized attention module (OAM). Our work has three merits. First, to the best of our knowledge, NA-Resnet is the first network that directly uses shallow features to assist image classification. Second, the proposed OAM is embedded in each layer of the network, so it can extract the critical information appropriate to the current stage. Third, on Fer2013 our model achieves the best performance among single, relatively lightweight networks, without relying on a network ensemble. Extensive experiments show that our model surpasses the state-of-the-art performance of any single network on Fer2013. In particular, NA-Resnet achieves an accuracy of 74.59% on Fer2013 and, under 10-fold cross-validation on CK+, an average accuracy of 96.06% with a standard deviation of 2.9%.
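The CK+ figure above is an average over 10 folds together with a standard deviation. The following is a minimal sketch of how such a summary is typically computed, not the authors' evaluation code. The function names (`k_fold_indices`, `summarize`, `cross_validate`) and the `evaluate_fold` callback are hypothetical. The random sample-level split and the use of the sample standard deviation are assumptions, since the abstract states neither.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds.

    Assumption: a plain random split; the abstract does not say how folds were built.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) of the per-fold accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


def cross_validate(n_samples, evaluate_fold, k=10, seed=0):
    """Run k-fold cross-validation.

    evaluate_fold(train_idx, test_idx) is a hypothetical callback that trains a
    model on train_idx and returns its accuracy (in percent) on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Every fold except the held-out one forms the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        accuracies.append(evaluate_fold(train_idx, test_idx))
    return summarize(accuracies)
```

Each sample is held out exactly once, so the reported mean reflects performance on the whole dataset, while the standard deviation indicates how much the result varies from fold to fold.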
Data availability
All data generated or analyzed during this study are included in this published article. The datasets used or analyzed during the current study are available from their official websites or from the corresponding author on reasonable request.
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grant 62267007 and by the Gansu Provincial Department of Education Higher Education Industry Support Plan Project under Grant 2022CYZC-16.
Code availability
The code and pre-trained model used or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Qi, Y., Zhou, C. & Chen, Y. NA-Resnet: neighbor block and optimized attention module for global-local feature extraction in facial expression recognition. Multimed Tools Appl 82, 16375–16393 (2023). https://doi.org/10.1007/s11042-022-14191-2
DOI: https://doi.org/10.1007/s11042-022-14191-2