
IncepTR: micro-expression recognition integrating Inception-CBAM and vision transformer

  • Special Issue Paper
  • Published in Multimedia Systems

Abstract

Micro-expressions (MEs) are instantaneous, subtle facial movements that convey crucial emotional information. However, traditional neural networks have difficulty accurately capturing the delicate features of MEs because of the limited amount of available data. To address this issue, a dual-branch attention network for ME recognition, called IncepTR, is proposed, which captures attention-aware local and global representations. The network takes optical flow features as input and performs feature extraction with a dual-branch design. First, an Inception model equipped with the Convolutional Block Attention Module (CBAM) attention mechanism extracts multi-scale local features. Second, a Vision Transformer (ViT) captures subtle motion features and robustly models global relationships among multiple local patches. Additionally, to enrich the relationships among different local patches in the ViT, Multi-head Self-Attention Dropping (MSAD) is introduced to randomly drop an attention map, effectively preventing overfitting to specific regions. Finally, the two types of features are combined through similarity comparison and feature fusion to learn effective ME representations. This combination forces the model to capture the most discriminative multi-scale local and global features while reducing the influence of affect-irrelevant features. Extensive experiments show that the proposed IncepTR achieves a UF1 of 0.753 and a UAR of 0.746 on the composite dataset MEGC2019-CD, performing better than or competitively with existing state-of-the-art methods for ME recognition.
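To make the MSAD idea concrete, the following is a minimal NumPy sketch (not the authors' implementation; the function name, the identity per-head projections, and the drop probability are all assumptions for illustration). With some probability, one randomly chosen head's attention map is zeroed during a multi-head self-attention pass, so the remaining heads must carry the representation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def msa_with_dropping(x, num_heads=4, drop_prob=0.5, rng=None):
    """Multi-head self-attention over token matrix x of shape (n, d).

    With probability drop_prob, one randomly chosen head's attention
    map is zeroed (the MSAD-style dropping); identity Q/K/V projections
    are used here purely to keep the sketch short.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, d = x.shape
    dh = d // num_heads
    drop_head = rng.integers(num_heads) if rng.random() < drop_prob else -1
    out = np.zeros_like(x)
    for h in range(num_heads):
        xh = x[:, h * dh:(h + 1) * dh]            # per-head slice of the tokens
        attn = softmax(xh @ xh.T / np.sqrt(dh))   # scaled dot-product attention map
        if h == drop_head:
            attn = np.zeros_like(attn)            # drop this head's attention map
        out[:, h * dh:(h + 1) * dh] = attn @ xh
    return out
```

In a real ViT branch the dropping would act on learned Q/K/V projections and be active only during training; the sketch only shows the mechanism of removing one head's attention map at random.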


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

  1. Our code is available at https://github.com/HaoliangZhou/IncepTR.


Acknowledgements

This work was supported in part by grants from the National Natural Science Foundation of China (62276118, 61772244) and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX22_3853).

Author information

Corresponding author

Correspondence to Shucheng Huang.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, H., Huang, S. & Xu, Y. IncepTR: micro-expression recognition integrating Inception-CBAM and vision transformer. Multimedia Systems 29, 3863–3876 (2023). https://doi.org/10.1007/s00530-023-01164-0
