
Stacking multiple cues for facial action unit detection

Original article · The Visual Computer

Abstract

In this study, we develop a deep learning-based stacking scheme to detect facial action units (AUs) in video data. Given a sequence of video frames, it combines multiple cues extracted by AU detectors operating at the frame, segment, and transition levels. The frame-based detector takes a single frame and determines the presence of an AU from static face features. The segment-based detector examines subsequences of various lengths in the neighborhood of a frame to decide whether that frame belongs to an AU segment. The transition-based detector analyzes fixed-size subsequences to find transitions from neutral faces containing no AUs to emotional faces, or vice versa. The frame subsequences in the segment and transition detectors are represented by motion history images, which model the temporal changes in faces. Each detector employs a separate convolutional neural network, and their outputs are fed into a meta-classifier that learns how to combine them. Combining multiple cues at different levels within a framework built entirely of deep networks improves detection performance, both by locating subtle AUs and by tracking small movements of the facial muscles. Performance analysis shows that the proposed approach significantly outperforms state-of-the-art methods on the CK+, DISFA, and BP4D databases.
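
The motion history image (MHI) representation used by the segment- and transition-level detectors follows the classic recurrence of Davis and Bobick: pixels where motion is observed in the current frame are set to the window length tau, and all other pixels decay linearly toward zero. Below is a minimal NumPy sketch of that recurrence; the difference threshold `delta` and the final normalization are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def motion_history_image(frames, tau, delta=30):
    """Compute a Motion History Image over a list of equally sized
    grayscale frames (uint8 arrays).

    tau:   temporal window length; a pixel's intensity decays linearly
           over tau steps after the last motion observed there.
    delta: per-pixel difference threshold deciding what counts as motion
           (an illustrative assumption, not the authors' setting).
    """
    h, w = frames[0].shape
    mhi = np.zeros((h, w), dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        # Cast before subtracting to avoid uint8 wraparound.
        moving = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) >= delta
        mhi[moving] = tau                                # refresh moving pixels
        mhi[~moving] = np.maximum(mhi[~moving] - 1, 0)   # linear decay elsewhere
    return mhi / tau  # normalize to [0, 1] for use as a CNN input
```

The normalized MHI can then be fed to the corresponding CNN as a single-channel image.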
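
The stacking step can likewise be sketched: the per-frame scores produced by the three base detectors are concatenated into a meta-feature vector, and a meta-classifier is trained on those vectors. The small MLP below is a stand-in for the paper's deep meta-classifier; the function name and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def fit_stacking_classifier(frame_scores, segment_scores,
                            transition_scores, labels):
    """Fit a small neural meta-classifier on the stacked per-frame
    scores of the three base detectors for one AU.

    All score arrays are 1-D, aligned per frame, with values in [0, 1];
    labels are the binary AU annotations for the same frames.
    """
    # Each frame becomes a 3-dimensional meta-feature vector.
    X = np.column_stack([frame_scores, segment_scores, transition_scores])
    meta = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
    meta.fit(X, labels)
    return meta

# At test time, the stacked AU probability for each frame would be:
# meta.predict_proba(np.column_stack([f, s, t]))[:, 1]
```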


Notes

  1. https://github.com/st20210104/Stacking-Multiple-Cues-For-Facial-Action-Unit-Detection.git.


Funding

This work was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under Grant No. 115E310.

Author information


Corresponding author

Correspondence to Simge Akay.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Akay, S., Arica, N. Stacking multiple cues for facial action unit detection. Vis Comput 38, 4235–4250 (2022). https://doi.org/10.1007/s00371-021-02291-3

