Deep Learning for Facial Action Unit Detection Under Large Head Poses

  • Zoltán Tősér
  • László A. Jeni
  • András Lőrincz
  • Jeffrey F. Cohn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9915)


Facial expression communicates emotion, intention, and physical state, and regulates interpersonal behavior. Automated face analysis (AFA) for the detection, synthesis, and understanding of facial expression is a vital focus of basic research, with applications in behavioral science, mental and physical health and treatment, marketing, and human-robot interaction, among other domains. In previous work, facial action unit (AU) detection degraded seriously when head orientation exceeded \(15^{\circ }\) to \(20^{\circ }\). To achieve reliable AU detection over a wider range of head poses, we used 3D information to augment video data and a deep learning approach to feature selection and AU detection. Source videos were from the BP4D database (n = 41) and the FERA test set of BP4D-extended (n = 20). Both consist of naturally occurring facial expressions in response to a variety of emotion inductions. In the augmented video, pose ranged between \(-18^{\circ }\) and \(90^{\circ }\) for yaw and between \(-54^{\circ }\) and \(54^{\circ }\) for pitch. The results for action unit detection exceeded the state of the art, with as much as a 10% increase in \(F_1\) measures.


Keywords: Deep learning, Facial action unit detection, Pose dependence



This work was supported in part by US National Institutes of Health grant MH096951 to the University of Pittsburgh and by US National Science Foundation grants CNS-1205664 and CNS-1205195 to the University of Pittsburgh and the University of Binghamton. Neither agency was involved in the planning or writing of the work.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Zoltán Tősér (1)
  • László A. Jeni (2)
  • András Lőrincz (1)
  • Jeffrey F. Cohn (2, 3)

  1. Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary
  2. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
  3. Department of Psychology, The University of Pittsburgh, Pittsburgh, USA
