
IncepTR: micro-expression recognition integrating Inception-CBAM and vision transformer

  • Special Issue Paper
  • Published in Multimedia Systems

Abstract

Micro-expressions (MEs) are instantaneous, subtle facial movements that convey crucial emotional information. However, traditional neural networks have difficulty accurately capturing the delicate features of MEs because of the limited amount of available data. To address this issue, a dual-branch attention network for ME recognition, called IncepTR, is proposed, which captures attention-aware local and global representations. The network takes optical flow features as input and performs feature extraction with a dual-branch design. First, an Inception model equipped with the Convolutional Block Attention Module (CBAM) attention mechanism extracts multi-scale local features. Second, a Vision Transformer (ViT) captures subtle motion features and robustly models global relationships among multiple local patches. Additionally, to enrich the relationships among different local patches in the ViT, Multi-head Self-Attention Dropping (MSAD) is introduced to randomly drop an attention map, effectively preventing overfitting to specific regions. Finally, the two types of features are combined through similarity comparison and feature fusion to learn effective ME representations. This combination forces the model to capture the most discriminative multi-scale local and global features while reducing the influence of affect-irrelevant features. Extensive experiments show that the proposed IncepTR achieves a UF1 of 0.753 and a UAR of 0.746 on the composite dataset MEGC2019-CD, performing better than or competitively with existing state-of-the-art methods for ME recognition.
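To make the MSAD idea concrete, the following is a minimal NumPy sketch (not the authors' implementation; the function name, the identity per-head projections, and the drop probability are all assumptions for illustration). With some probability, one randomly chosen head's attention map is zeroed during a multi-head self-attention pass, so the remaining heads must carry the representation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def msa_with_dropping(x, num_heads=4, drop_prob=0.5, rng=None):
    """Multi-head self-attention over token matrix x of shape (n, d).

    With probability drop_prob, one randomly chosen head's attention
    map is zeroed (the MSAD-style dropping); identity Q/K/V projections
    are used here purely to keep the sketch short.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, d = x.shape
    dh = d // num_heads
    drop_head = rng.integers(num_heads) if rng.random() < drop_prob else -1
    out = np.zeros_like(x)
    for h in range(num_heads):
        xh = x[:, h * dh:(h + 1) * dh]            # per-head slice of the tokens
        attn = softmax(xh @ xh.T / np.sqrt(dh))   # scaled dot-product attention map
        if h == drop_head:
            attn = np.zeros_like(attn)            # drop this head's attention map
        out[:, h * dh:(h + 1) * dh] = attn @ xh
    return out
```

In a real ViT branch the dropping would act on learned Q/K/V projections and be active only during training; the sketch only shows the mechanism of removing one head's attention map at random.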


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

  1. Our code is available at https://github.com/HaoliangZhou/IncepTR.


Acknowledgements

This work was supported in part by grants from the National Natural Science Foundation of China (62276118, 61772244) and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX22_3853).

Author information

Corresponding author

Correspondence to Shucheng Huang.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, H., Huang, S. & Xu, Y. IncepTR: micro-expression recognition integrating Inception-CBAM and vision transformer. Multimedia Systems 29, 3863–3876 (2023). https://doi.org/10.1007/s00530-023-01164-0
