Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment

  • Zhiwen Shao
  • Zhilei Liu
  • Jianfei Cai
  • Lizhuang Ma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11217)

Abstract

Facial action unit (AU) detection and face alignment are two highly correlated tasks, since facial landmarks provide precise AU locations that facilitate the extraction of meaningful local features for AU detection. Most existing AU detection works treat face alignment as a preprocessing step and handle the two tasks independently. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned first, and high-level features from face alignment are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module that refines the attention map of each AU adaptively. Finally, the assembled local features are integrated with the face alignment features and global features for AU detection. Experiments on the BP4D and DISFA benchmarks demonstrate that our framework significantly outperforms state-of-the-art methods for AU detection.
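
To make the described pipeline concrete, the following PyTorch sketch wires together the components named in the abstract: multi-scale shared features, a face alignment branch whose high-level features feed AU detection, per-AU attention maps, and a classifier over the assembled local, global, and alignment features. This is a minimal illustration under assumptions, not the authors' actual architecture: all module names (e.g. JointAUAlignNet), layer sizes, and the way attention maps are predicted are hypothetical, since the abstract does not specify how the attention maps are initialized from landmarks or refined, nor the training losses.

```python
import torch
import torch.nn as nn


class JointAUAlignNet(nn.Module):
    """Illustrative sketch of joint AU detection and face alignment."""

    def __init__(self, num_aus=12, num_landmarks=49, feat_dim=64):
        super().__init__()
        # Multi-scale shared features: parallel convolutions with different
        # receptive fields, concatenated along the channel dimension.
        self.shared = nn.ModuleList([
            nn.Conv2d(3, feat_dim, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        shared_dim = 3 * feat_dim
        # Face alignment branch; its high-level features are reused below.
        self.align_feat = nn.Sequential(
            nn.Conv2d(shared_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.align_head = nn.Linear(feat_dim * 16, num_landmarks * 2)
        # Adaptive attention: one spatial attention map per AU. (In the paper
        # these maps would be tied to landmark-defined AU regions and refined;
        # here, as an assumption, they are predicted from the shared features.)
        self.attention = nn.Sequential(
            nn.Conv2d(shared_dim, num_aus, 3, padding=1), nn.Sigmoid(),
        )
        # Global features pooled over the whole face.
        self.global_feat = nn.Sequential(
            nn.Conv2d(shared_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # AU classifier over concatenated local + global + alignment features.
        fused_dim = num_aus * shared_dim + feat_dim + feat_dim * 16
        self.au_head = nn.Linear(fused_dim, num_aus)

    def forward(self, x):
        s = torch.cat([branch(x) for branch in self.shared], dim=1)  # (B,C,H,W)
        f_align = self.align_feat(s)
        landmarks = self.align_head(f_align)            # (B, 2 * num_landmarks)
        attn = self.attention(s)                        # (B, A, H, W)
        # Attention-weighted pooling: one local feature vector per AU.
        w = attn.unsqueeze(2)                           # (B, A, 1, H, W)
        local = (w * s.unsqueeze(1)).sum(dim=(-2, -1))  # (B, A, C)
        local = local / (w.sum(dim=(-2, -1)) + 1e-6)    # normalize per map
        f_global = self.global_feat(s)
        fused = torch.cat([local.flatten(1), f_global, f_align], dim=1)
        return self.au_head(fused), landmarks, attn


# Example: two 44x44 face crops (input size is an arbitrary choice here).
model = JointAUAlignNet()
au_logits, landmarks, attn = model(torch.randn(2, 3, 44, 44))
```

In the full framework one would expect the predicted landmarks to define each AU's initial attention region, which the adaptive module then refines, with both heads trained jointly (e.g., a multi-label AU loss plus a landmark regression loss); those specifics are not recoverable from the abstract alone.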

Keywords

Joint learning · Facial AU detection · Face alignment · Adaptive attention learning

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61503277 and No. 61472245), the Science and Technology Commission of Shanghai Municipality Program (No. 16511101300), and Data Science & Artificial Intelligence Research Centre@NTU (DSAIR) and SINGTEL-NTU Cognitive & Artificial Intelligence Joint Lab (SCALE@NTU).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. College of Intelligence and Computing, Tianjin University, Tianjin, China
  3. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
  4. School of Computer Science and Software Engineering, East China Normal University, Shanghai, China
