Abstract
Micro-expressions (MEs) are instantaneous, subtle facial movements that convey crucial emotional information. However, traditional neural networks struggle to capture the delicate features of MEs accurately because the available data are limited. To address this issue, a dual-branch attention network for ME recognition, called IncepTR, is proposed to capture attention-aware local and global representations. The network takes optical flow features as input and extracts features with two branches. First, an Inception model equipped with the Convolutional Block Attention Module (CBAM) is used for multi-scale local feature extraction. Second, a Vision Transformer (ViT) is employed to capture subtle motion features and robustly model global relationships among multiple local patches. Additionally, to enrich the relationships between different local patches in ViT, Multi-head Self-Attention Dropping (MSAD) is introduced to randomly drop an attention map, which helps prevent overfitting to specific regions. Finally, the two types of features are used to learn ME representations through similarity comparison and feature fusion. With this combination, the model is forced to capture the most discriminative multi-scale local and global features while reducing the influence of affect-irrelevant features. Extensive experiments show that the proposed IncepTR achieves a UF1 of 0.753 and a UAR of 0.746 on the composite dataset MEGC2019-CD, demonstrating better or competitive performance compared with existing state-of-the-art methods for ME recognition.
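The two reported metrics can be illustrated with a minimal sketch. It assumes the usual definitions: UF1 is the unweighted (macro) average of per-class F1 scores, and UAR is the unweighted average of per-class recall, so every class counts equally regardless of its sample count. The function name `uf1_uar` and the toy labels are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict


def uf1_uar(y_true, y_pred):
    """Return (UF1, UAR) for two equal-length label sequences.

    Hypothetical helper: each class present in y_true contributes
    equally to both averages, whatever its number of samples.
    """
    tp = defaultdict(int)  # true positives per class
    fp = defaultdict(int)  # false positives per class
    fn = defaultdict(int)  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    classes = sorted(set(y_true))
    f1_scores, recalls = [], []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
        # tp + fn is the class support, non-zero because c occurs in y_true
        recalls.append(tp[c] / (tp[c] + fn[c]))

    uf1 = sum(f1_scores) / len(classes)
    uar = sum(recalls) / len(classes)
    return uf1, uar


# Toy predictions with three illustrative class names.
y_true = ["neg", "neg", "neg", "neg", "pos", "pos", "sur", "sur"]
y_pred = ["neg", "neg", "neg", "pos", "pos", "pos", "sur", "neg"]
uf1, uar = uf1_uar(y_true, y_pred)
print(f"UF1={uf1:.3f}  UAR={uar:.3f}")
```

Because both scores average over classes rather than samples, a model that ignores a minority class is penalized even when its overall accuracy is high, which is why these metrics suit small, imbalanced ME datasets.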
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Notes
Our code is available at https://github.com/HaoliangZhou/IncepTR.
Acknowledgements
This work was supported in part by grants from the National Natural Science Foundation of China (62276118, 61772244) and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX22_3853).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Zhou, H., Huang, S. & Xu, Y. Inceptr: micro-expression recognition integrating inception-CBAM and vision transformer. Multimedia Systems 29, 3863–3876 (2023). https://doi.org/10.1007/s00530-023-01164-0
DOI: https://doi.org/10.1007/s00530-023-01164-0