
Feature fusion of multi-granularity and multi-scale for facial expression recognition

Original article, published in The Visual Computer.

Abstract

Although great progress has been made in facial expression recognition (FER), it still faces challenges such as occlusion and pose variation in real-world scenarios. To address these issues, we propose a simple yet effective multi-granularity and multi-scale feature fusion network (MM-Net) that achieves robust expression recognition without manually extracting local patches or designing complex sub-networks. Specifically, a puzzle generator divides the image into local regions of different granularities, which are then randomly shuffled and reassembled into a new input image. By feeding the facial puzzles in order from fine-grained to coarse-grained, the network progressively mines local fine-grained information, coarse-grained information, and global information. In addition, considering the subtle inter-class variation among different expressions, we adopt a multi-scale feature fusion strategy in the shallow feature extraction module to obtain global features rich in detail, capturing the subtle differences between facial expression images. Extensive experiments on three in-the-wild FER benchmarks demonstrate the superiority of the proposed MM-Net over state-of-the-art methods.
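The puzzle-generator step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `jigsaw_generator`, the fixed random seed, and the 3 → 2 → 1 granularity schedule are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def jigsaw_generator(image, n):
    """Split an HxWxC image into an n x n grid of patches,
    shuffle the patches, and reassemble them into a new image."""
    h, w = image.shape[:2]
    ph, pw = h // n, w // n
    # Collect the n*n patches in row-major order.
    patches = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(n) for j in range(n)]
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    order = rng.permutation(len(patches))
    # Stitch the shuffled patches back into an image of the same size.
    rows = [np.concatenate([patches[order[i * n + j]] for j in range(n)], axis=1)
            for i in range(n)]
    return np.concatenate(rows, axis=0)

# Progressive schedule: fine-grained first, ending with the intact image (n = 1).
image = np.arange(6 * 6 * 3, dtype=np.float32).reshape(6, 6, 3)
for n in (3, 2, 1):
    shuffled = jigsaw_generator(image, n)
    assert shuffled.shape == image.shape
```

Note that `n = 1` yields a single patch, so the final stage feeds the unshuffled image and the network sees global structure; finer grids destroy global layout and force the shallow stages to rely on local cues.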


Data Availability

The datasets analyzed during the current study are publicly available: RAF-DB at http://www.whdeng.cn/raf/model1.html, FERPlus at https://github.com/Microsoft/FERPlus, AffectNet at http://mohammadmahoor.com/affectnet/, and FED-RO at https://doi.org/10.1109/TIP.2018.2886767.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62106054), the Science and Technology Project of Guangxi (Grant No. 2018GXNSFAA281351), and the Research Projects of Guangxi Normal University (Natural Sciences) (Grant No. 2021JC012).

Author information


Corresponding author

Correspondence to Shuxiang Song.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xia, H., Lu, L. & Song, S. Feature fusion of multi-granularity and multi-scale for facial expression recognition. Vis Comput 40, 2035–2047 (2024). https://doi.org/10.1007/s00371-023-02900-3

