
Feature fusion of multi-granularity and multi-scale for facial expression recognition

Original article, published in The Visual Computer.

Abstract

Although great progress has been made in facial expression recognition (FER), it still faces challenges such as occlusion and pose variation in real-world scenarios. To address these issues, we propose a simple yet effective multi-granularity and multi-scale feature fusion network (MM-Net) that achieves robust expression recognition without manually extracting local patches or designing complex sub-networks. Specifically, a puzzle generator divides the image into local regions of different granularities, which are then randomly shuffled and reassembled into a new input image. By feeding the facial puzzles in order from fine-grained to coarse-grained, the network progressively mines local fine-grained information, coarse-grained information, and global information. In addition, considering the subtle inter-class variation among different expressions, we adopt a multi-scale feature fusion strategy in the shallow feature extraction module to obtain global features rich in detail, capturing the subtle differences between facial expression images. Extensive experiments on three in-the-wild FER benchmarks demonstrate the superiority of the proposed MM-Net over state-of-the-art methods.
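The puzzle-generator step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `jigsaw_generator`, the fixed random seed, and the 3 → 2 → 1 granularity schedule are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def jigsaw_generator(image, n):
    """Split an HxWxC image into an n x n grid of patches,
    shuffle the patches, and reassemble them into a new image."""
    h, w = image.shape[:2]
    ph, pw = h // n, w // n
    # Collect the n*n patches in row-major order.
    patches = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(n) for j in range(n)]
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    order = rng.permutation(len(patches))
    # Stitch the shuffled patches back into an image of the same size.
    rows = [np.concatenate([patches[order[i * n + j]] for j in range(n)], axis=1)
            for i in range(n)]
    return np.concatenate(rows, axis=0)

# Progressive schedule: fine-grained first, ending with the intact image (n = 1).
image = np.arange(6 * 6 * 3, dtype=np.float32).reshape(6, 6, 3)
for n in (3, 2, 1):
    shuffled = jigsaw_generator(image, n)
    assert shuffled.shape == image.shape
```

Note that `n = 1` yields a single patch, so the final stage feeds the unshuffled image and the network sees global structure; finer grids destroy global layout and force the shallow stages to rely on local cues.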


Data Availability

The datasets analyzed during the current study are publicly available: RAF-DB at http://www.whdeng.cn/raf/model1.html, FERPlus at https://github.com/Microsoft/FERPlus, AffectNet at http://mohammadmahoor.com/affectnet/, and FED-RO at https://doi.org/10.1109/TIP.2018.2886767.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62106054), the Science and Technology Project of Guangxi (Grant No. 2018GXNSFAA281351), and the Research Projects of Guangxi Normal University (Natural Sciences) (Grant No. 2021JC012).

Author information


Corresponding author

Correspondence to Shuxiang Song.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xia, H., Lu, L. & Song, S. Feature fusion of multi-granularity and multi-scale for facial expression recognition. Vis Comput 40, 2035–2047 (2024). https://doi.org/10.1007/s00371-023-02900-3

