3D convolution network and Siamese-attention mechanism for expression recognition

Multimedia Tools and Applications

Abstract

Research on expression recognition has focused mainly on the common subject-independent task. Cross-database evaluation is rare and lacks a universal protocol. The key challenge in both tasks is to extract features that effectively describe expression patterns. In this paper, we present a variable-length 3D convolution network that can output variable-length features. We also propose a Siamese 3D convolution network that uses the “neutral, intermediate, peak” frames from another subject to provide attention weights for the extracted features. In addition, we propose a method that extracts fixed-length landmark features from an expression sequence as an auxiliary input to the convolution network. Finally, we recommend a universal protocol for cross-database evaluation. Experiments on both the subject-independent task and cross-database evaluation show that our network achieves better overall performance than previous methods. They also show that, owing to the attention mechanism, it generalizes better.
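The abstract does not specify how the attention weights are computed or how the landmark features are made fixed-length. The sketch below is a rough, non-authoritative illustration only. It assumes plain dot-product scoring followed by a softmax, which is a standard attention formulation. It also assumes linear interpolation of landmark frames to a fixed frame count. The function names `siamese_attention_pool` and `resample_sequence`, the max-over-references scoring and the interpolation scheme are all hypothetical and are not taken from the paper. The 3D convolution backbone that would produce the feature vectors is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def _softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def siamese_attention_pool(
    features: Sequence[Vector], references: Sequence[Vector]
) -> Tuple[Vector, List[float]]:
    """Pool a variable-length feature sequence into one fixed-size vector.

    ASSUMPTION (hypothetical, not from the paper): `references` are embeddings
    of another subject's neutral, intermediate and peak frames produced by the
    same shared-weight branch. Each time step is scored by its best
    dot-product match against them.
    """
    if not features or not references:
        raise ValueError("need at least one feature vector and one reference vector")
    scores = [max(_dot(f, r) for r in references) for f in features]
    weights = _softmax(scores)  # attention weights, sum to 1
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def resample_sequence(frames: Sequence[Vector], length: int) -> List[Vector]:
    """Linearly interpolate a sequence of landmark frames to exactly `length` frames.

    ASSUMPTION (hypothetical): each frame is a flattened [x1, y1, x2, y2, ...]
    landmark vector. Linear interpolation is only one possible way to obtain
    fixed-length landmark features.
    """
    if length < 2 or len(frames) < 2:
        raise ValueError("need length >= 2 and at least two input frames")
    last = len(frames) - 1
    out = []
    for k in range(length):
        t = k * last / (length - 1)  # position on the original time axis
        i = min(int(t), last - 1)    # index of the left neighbouring frame
        a = t - i                    # interpolation weight in [0, 1]
        out.append([(1 - a) * x + a * y for x, y in zip(frames[i], frames[i + 1])])
    return out


# Example: two time steps scored against a single reference embedding.
pooled, weights = siamese_attention_pool([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
print([round(w, 3) for w in weights])  # [0.731, 0.269]
print(resample_sequence([[0.0, 0.0], [2.0, 4.0]], 3))  # [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
```

The pooled vector has the same dimension however many time steps the input contains. A sketch like this therefore shows one way variable-length features could feed a fixed-size classifier. Likewise, flattening `length` resampled frames of K landmarks gives a vector of constant size `length * 2K`.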

Figs. 1–7 (figure images not reproduced)

Acknowledgments

This work was supported in part by the Natural Science Foundation of Jiangsu Province under Grant BK20151102, in part by the Ministry of Education Key Laboratory of Machine Perception, Peking University, under Grant K-2016-03, in part by the Open Project Program of the Ministry of Education Key Laboratory of Underwater Acoustic Signal Processing, Southeast University, under Grant UASP1502, and in part by the Natural Science Foundation of China under Grants 61673108 and 61802058.

Author information

Corresponding author

Correspondence to Tian Xia.

About this article

Cite this article

Zhang, YF., Xia, T. & Liu, Y. 3D convolution network and Siamese-attention mechanism for expression recognition. Multimed Tools Appl 78, 30355–30371 (2019). https://doi.org/10.1007/s11042-019-07860-2

  • DOI: https://doi.org/10.1007/s11042-019-07860-2
