
NA-Resnet: neighbor block and optimized attention module for global-local feature extraction in facial expression recognition

Published in: Multimedia Tools and Applications

Abstract

As deep networks grow deeper to extract high-level abstract features, the contribution of shallow features to the target task inevitably diminishes. To address this issue and provide new technical support for research in facial expression recognition (FER), we propose NA-Resnet, a network that increases the decision weight of the shallow and middle feature mappings through a neighbor block (Nei Block) and concentrates on the regions critical for extracting the necessary features through an optimized attention module (OAM). Our work has several merits. First, to the best of our knowledge, NA-Resnet is the first network that directly uses shallow features to assist image classification. Second, the proposed OAM is embedded into each layer of the network and precisely extracts the critical information appropriate to the current stage. Third, our model achieves the best performance on Fer2013 among single, relatively lightweight networks without a network ensemble. Extensive experiments show that our model achieves state-of-the-art performance among single networks on Fer2013. In particular, NA-Resnet reaches 74.59% accuracy on Fer2013 and an average accuracy of 96.06% with a standard deviation of 2.9% under 10-fold cross-validation on Ck+.
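The OAM described above belongs to the family of channel-attention mechanisms. As an illustration of the underlying squeeze-excite-reweight pattern only (not the authors' exact module, whose details appear in the full paper), the following is a minimal plain-Python sketch; the function `channel_attention` and its fixed per-channel `weights` are hypothetical stand-ins for a learned excitation step:

```python
import math

def channel_attention(feature_maps, weights):
    """Minimal squeeze-and-excitation-style channel attention (illustrative only).

    feature_maps: list of 2D channel maps (lists of lists of floats)
    weights: one fixed scalar per channel standing in for a learned excitation layer
    """
    # Squeeze: global average pooling collapses each channel to one descriptor.
    descriptors = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation: gate each channel with a sigmoid of its weighted descriptor.
    gates = [1.0 / (1.0 + math.exp(-w * d)) for w, d in zip(weights, descriptors)]
    # Reweight: scale every value in a channel by that channel's gate.
    return [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_maps)
    ]

# Two 2x2 channels; the second has a larger activation and so gets a gate closer to 1.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[4.0, 4.0], [4.0, 4.0]]]
out = channel_attention(maps, [2.0, 1.0])
```

In a trained network the gating would be produced by a small learned sub-network rather than fixed scalars, but the squeeze/excite/reweight flow is the same idea the OAM refines for stage-appropriate feature selection.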



Data availability

All the data generated or analyzed during this study are included in this published article. The datasets used or analyzed during the current study are available from the official website or from the corresponding author on reasonable request.


Acknowledgments

This research was supported by the National Natural Science Foundation of China under Grant 62267007 and by the Gansu Provincial Department of Education Higher Education Industry Support Plan Project under Grant 2022CYZC-16.

Code availability

The code and pre-training model used or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Chenyang Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qi, Y., Zhou, C. & Chen, Y. NA-Resnet: neighbor block and optimized attention module for global-local feature extraction in facial expression recognition. Multimed Tools Appl 82, 16375–16393 (2023). https://doi.org/10.1007/s11042-022-14191-2
