Abstract
Research on expression recognition focuses mainly on the common subject-independent task, while cross-database evaluation is rare and lacks a universal protocol. The key challenge in both tasks is to extract features that effectively describe the pattern of an expression. In this paper, we present a variable-length 3D convolution network that outputs variable-length features. Additionally, we propose a Siamese 3D convolution network that uses the “neutral, intermediate, peak” frames from another subject to provide attention weights for the extracted features. Furthermore, we propose a method to extract fixed-length landmark features from an expression sequence as auxiliary features for the convolution network. Finally, we recommend a universal protocol for cross-database evaluation. Experiments on both the subject-independent task and cross-database evaluation show that our network not only achieves comprehensively better performance than previous methods but also generalizes better, owing to the attention mechanism.
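The abstract does not specify how the attention weights are computed or how the fixed-length landmark features are built. The sketch below illustrates two generic building blocks under stated assumptions; it is not the authors' implementation. `attention_pool` uses scaled dot-product attention to collapse a variable-length sequence of feature vectors into a single vector, scoring each vector against a reference embedding that hypothetically stands for the Siamese branch's encoding of another subject's neutral, intermediate and peak frames. `keyframe_landmark_feature` builds a landmark descriptor whose length does not depend on the number of frames, assuming each sequence starts at a neutral frame and ends at the peak frame. All function names, the dot-product scoring, and the first/middle/last key-frame choice are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Point = Tuple[float, float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   reference: Sequence[float]) -> Vector:
    """Collapse a variable-length sequence of feature vectors into one vector.

    Each vector is scored by a scaled dot product with `reference`
    (hypothetical stand-in for an embedding of another subject's neutral,
    intermediate and peak frames). The softmax of the scores weights a sum
    over the sequence, so the output has len(reference) entries no matter
    how many feature vectors come in.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(reference)
    scale = 1.0 / math.sqrt(dim)
    scores = [scale * sum(f_d * r_d for f_d, r_d in zip(f, reference))
              for f in features]
    weights = softmax(scores)
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


def keyframe_landmark_feature(sequence: Sequence[Sequence[Point]]) -> Vector:
    """Fixed-length landmark descriptor for a variable-length sequence.

    Assumption: frame 0 is neutral and the last frame is the peak; the
    middle frame stands in for the intermediate state. Returns the (dx, dy)
    displacement of every landmark in the intermediate and peak frames
    relative to the neutral frame, i.e. 4 * n_landmarks values.
    """
    if not sequence:
        raise ValueError("empty sequence")
    neutral = sequence[0]
    intermediate = sequence[len(sequence) // 2]
    peak = sequence[-1]
    out: Vector = []
    for frame in (intermediate, peak):
        for (x0, y0), (x, y) in zip(neutral, frame):
            out.extend([x - x0, y - y0])
    return out


# Toy data: 2 landmarks, the first one drifting along y over time.
long_seq = [[(0.0, float(t)), (1.0, 0.0)] for t in range(5)]
short_seq = [[(0.0, float(t)), (1.0, 0.0)] for t in range(3)]
print(keyframe_landmark_feature(long_seq))   # [0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0]
print(keyframe_landmark_feature(short_seq))  # [0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]

# Three segment features of dimension 2 pooled into one 2-D vector.
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
                        reference=[1.0, 0.0])
print(len(pooled))  # 2
```

Softmax weighting makes the pooled output a convex combination of the segment features, which is one simple way to map sequences of any length to the same dimension; a learned scoring function would be an equally plausible alternative.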
Acknowledgments
This work was supported in part by the Natural Science Foundation of Jiangsu Province under Grant BK20151102, in part by the Ministry of Education Key Laboratory of Machine Perception, Peking University, under Grant K-2016-03, in part by the Open Project Program of the Ministry of Education Key Laboratory of Underwater Acoustic Signal Processing, Southeast University, under Grant UASP1502, and in part by the Natural Science Foundation of China under Grants 61673108 and 61802058.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, YF., Xia, T. & Liu, Y. 3D convolution network and Siamese-attention mechanism for expression recognition. Multimed Tools Appl 78, 30355–30371 (2019). https://doi.org/10.1007/s11042-019-07860-2