
MPCSAN: multi-head parallel channel-spatial attention network for facial expression recognition in the wild

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Facial expression recognition (FER) in the wild is a highly challenging computer-vision task owing to subtle inter-expression differences, pose variation, occlusion, label bias, and other uncontrollable factors. CNN-based deep networks are susceptible to these factors and often fail to extract highly discriminative features from the key regions of an expression; moreover, most methods that learn in a single feature space cannot fully capture the core regions of interest. Both shortcomings hinder the handling of intra-class variability and inter-class similarity among expressions and ultimately degrade recognition performance. We therefore propose an effective multi-head parallel channel-spatial attention network (MPCSAN) for FER in the wild, which consists of a feature aggregation network (FAN), a multi-head parallel attention network (MPAN), and an expression forecasting network (EFN). First, the lightweight FAN extracts basic expression features while optimizing the intra-class and inter-class feature distributions. Then, MPAN forms multiple attention subspaces through a multi-head parallel channel-spatial attention fusion design and attends to more accurate and comprehensive expression regions of interest by minimizing duplicated attention during subspace fusion. Finally, EFN performs the final expression classification under label-softening optimization, which further improves robustness to label bias. Our method is evaluated on the three most widely used in-the-wild expression datasets (RAF-DB, FERPlus, and AffectNet). Extensive experiments show that it outperforms several state-of-the-art methods, achieving accuracies of 90.16% on RAF-DB, 89.91% on FERPlus, and 61.58% on AffectNet. Evaluations on occlusion and pose-variation subsets and a cross-dataset assessment further demonstrate its strong overall performance.
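This page does not detail MPAN's internals, but the idea of channel and spatial attention computed in parallel per head, fused across heads, and combined with label softening in EFN can be sketched in NumPy. Every function name, weight shape, the SE-style channel branch, and the head-averaging fusion below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w_down, w_up):
    # x: (C, H, W). Squeeze spatial dims, then excite channels (SE-style).
    squeezed = x.mean(axis=(1, 2))               # (C,)
    hidden = np.maximum(w_down @ squeezed, 0.0)  # ReLU bottleneck, (C//r,)
    return sigmoid(w_up @ hidden)                # (C,) gate, each in (0, 1)

def spatial_attention(x, alpha, beta):
    # Collapse channels with mean and max, combine into an HxW gate.
    return sigmoid(alpha * x.mean(axis=0) + beta * x.max(axis=0))

def parallel_head(x, w_down, w_up, alpha, beta):
    # One head: channel and spatial branches applied to the same input
    # in parallel, then fused by addition.
    ca = channel_attention(x, w_down, w_up)[:, None, None]  # (C, 1, 1)
    sa = spatial_attention(x, alpha, beta)[None, :, :]      # (1, H, W)
    return x * ca + x * sa

def multi_head_parallel_attention(x, heads):
    # Average the head outputs; a simple stand-in for the paper's fusion
    # that minimizes duplicated attention across subspaces.
    return np.mean([parallel_head(x, *h) for h in heads], axis=0)

def soften_labels(onehot, eps=0.1):
    # Label softening: move eps of the probability mass from the hard
    # label to a uniform distribution over the k classes.
    k = onehot.shape[-1]
    return onehot * (1.0 - eps) + eps / k
```

With four heads on an 8-channel feature map, the fused output keeps the input's shape, while `soften_labels` turns a one-hot target such as `[1, 0, 0, 0]` into a smoothed distribution that still sums to one.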


Data availability

The data that support the findings of this study are available at http://www.whdeng.cn/RAF/model1.html, https://github.com/Microsoft/FERPlus, http://mohammadmahoor.com/affectnet/, https://github.com/kaiwang960112/Challenge-condition-FER-dataset and http://www.jeffcohn.net/Resources/ with corresponding permission.


Acknowledgements

This research is supported by the National Science Foundation of China under Grants 61966035 and U1803261, and by the Autonomous Region Science and Technology Department International Cooperation Project under Grant 2020E01023.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yurong Qian.

Ethics declarations

Conflict of interest

The authors declare that this work, "MPCSAN: Multi-head Parallel Channel-Spatial Attention Network for Facial Expression Recognition in the Wild," is original and that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gong, W., Qian, Y. & Fan, Y. MPCSAN: multi-head parallel channel-spatial attention network for facial expression recognition in the wild. Neural Comput & Applic 35, 6529–6543 (2023). https://doi.org/10.1007/s00521-022-08040-4

