3D convolution network and Siamese-attention mechanism for expression recognition

Zhang, Yi-Feng; Xia, Tian; Liu, Yuan

doi:10.1007/s11042-019-07860-2

Published: 25 June 2019

Volume 78, pages 30355–30371, (2019)

Multimedia Tools and Applications

Abstract

Research on expression recognition focuses mostly on the common subject-independent task, while cross-database evaluation is rare and lacks a universal protocol. The key challenge for both tasks is to extract features that effectively describe the pattern of an expression. In this paper, we present a variable-length 3D convolution network that is able to output variable-length features. Additionally, we propose a Siamese 3D convolution network that utilizes the "neutral, intermediate, peak" frames from another subject to provide attention weights for the extracted features. Furthermore, we propose a method to extract fixed-length landmark features from an expression sequence as an auxiliary to the convolution network. Finally, we suggest a universal protocol for cross-database evaluation. Experiments on both the subject-independent task and cross-database evaluation show that our network not only achieves better overall performance than previous methods, but also has better generalization ability due to the attention mechanism.
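
The abstract does not give the formulation of the attention step, so the following is only a minimal sketch of the general idea: features of variable length are weighted by their similarity to reference features and summed into a fixed-length vector. Everything here is an assumption made for illustration, not the authors' method: the function names (`softmax`, `attention_pool`), the use of a dot product as the similarity, and the choice to average the three reference vectors into one query. In the paper the features would come from the 3D convolution branches; here they are plain lists of numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, reference):
    """Collapse T feature vectors (T may vary) into one vector.

    features:  list of T vectors, each of dimension D.
    reference: list of reference vectors of dimension D, standing in for
               the "neutral, intermediate, peak" features (hypothetical).
    """
    dim = len(features[0])
    # Assumption: the query is the mean of the reference vectors.
    query = [sum(r[d] for r in reference) / len(reference) for d in range(dim)]
    # Assumption: similarity is a plain dot product.
    scores = [sum(f[d] * query[d] for d in range(dim)) for f in features]
    weights = softmax(scores)
    # Weighted sum over time gives a fixed-length output.
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


reference = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
short_seq = [[0.1, 0.2], [0.9, 0.8]]
long_seq = [[0.0, 0.1], [0.2, 0.3], [0.5, 0.5], [0.8, 0.9], [1.0, 1.0]]

short_pooled, short_weights = attention_pool(short_seq, reference)
long_pooled, long_weights = attention_pool(long_seq, reference)
```

Both sequences produce an output of the same dimension even though their lengths differ, which is the property a downstream classifier would need.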

Acknowledgments

This work was supported in part by the Natural Science Foundation of Jiangsu Province under Grant BK20151102, in part by the Ministry of Education Key Laboratory of Machine Perception, Peking University under Grant K-2016-03, in part by the Open Project Program of the Ministry of Education Key Laboratory of Underwater Acoustic Signal Processing, Southeast University under Grant UASP1502, and in part by the Natural Science Foundation of China under Grants 61673108 and 61802058.

Author information

Authors and Affiliations

School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
Yi-Feng Zhang, Tian Xia & Yuan Liu
Nanjing Institute of Communications Technologies, Southeast University, Nanjing, 211100, China
Yi-Feng Zhang
State Key Lab. for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Yi-Feng Zhang

Corresponding author

Correspondence to Tian Xia.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, YF., Xia, T. & Liu, Y. 3D convolution network and Siamese-attention mechanism for expression recognition. Multimed Tools Appl 78, 30355–30371 (2019). https://doi.org/10.1007/s11042-019-07860-2

Received: 19 November 2018
Revised: 03 May 2019
Accepted: 05 June 2019
Published: 25 June 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s11042-019-07860-2
