Abstract
Pose-invariant facial expression recognition (FER) is a popular research direction in computer vision, but pose variation usually changes facial appearance significantly, making recognition results unstable across viewpoints. In this paper, a novel deep learning module, the soft thresholding squeeze-and-excitation (ST-SE) block, is proposed to extract salient channel-wise features for pose-invariant FER. To better adapt to facial images under different poses, a global average pooling (GAP) operation is adopted to compute the average value of each channel of the feature map. To enhance the representational power of the network, a Squeeze-and-Excitation (SE) block is embedded into the nonlinear transformation layer to filter out redundant feature information. To further shrink insignificant features, the absolute value of the GAP output is multiplied by the SE scaling coefficients to compute a threshold suited to the current view. The developed ST-SE block is inserted into ResNet50 to evaluate recognition performance. Extensive experiments are carried out on four multi-pose datasets, i.e., BU-3DFE, Multi-PIE, Pose-RAF-DB and Pose-AffectNet, and the influences of different environments, poses and intensities on expression recognition are analyzed in detail. The experimental results demonstrate the feasibility and effectiveness of the proposed method.
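The abstract's pipeline (GAP over each channel, SE gating, a channel-wise threshold formed by multiplying the absolute GAP value with the SE coefficient, then soft thresholding) can be sketched as a PyTorch module. This is a minimal illustrative sketch assuming a standard SE bottleneck with a reduction ratio of 16; the layer names, reduction ratio, and exact placement inside ResNet50 are assumptions, not taken from the authors' implementation.

```python
import torch
import torch.nn as nn


class STSEBlock(nn.Module):
    """Illustrative soft-thresholding squeeze-and-excitation (ST-SE) block."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: global average pooling to one value per channel
        self.gap = nn.AdaptiveAvgPool2d(1)
        # Excitation: two FC layers producing per-channel weights in (0, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, max(channels // reduction, 1)),
            nn.ReLU(inplace=True),
            nn.Linear(max(channels // reduction, 1), channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Average of absolute feature values per channel (|GAP|)
        avg_abs = self.gap(x.abs()).view(b, c)
        # SE scaling coefficients for each channel
        scale = self.fc(avg_abs)
        # Channel-wise threshold: |GAP| multiplied by the SE coefficient
        tau = (avg_abs * scale).view(b, c, 1, 1)
        # Soft thresholding: shrink feature magnitudes toward zero by tau
        return torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)
```

Because the threshold is learned per channel and per sample, features from strongly rotated faces can be shrunk by a different amount than frontal ones, which is the intuition behind a view-adaptive threshold.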
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 31872399) and the Advantage Discipline Construction Project (PAPD, No. 6-2018) of Jiangsu University.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, C., Liu, X., Chen, C. et al. Soft thresholding squeeze-and-excitation network for pose-invariant facial expression recognition. Vis Comput 39, 2637–2652 (2023). https://doi.org/10.1007/s00371-022-02483-5