
Deep Multi-task Learning to Recognise Subtle Facial Expressions of Mental States

  • Guosheng Hu (email author)
  • Li Liu
  • Yang Yuan
  • Zehao Yu
  • Yang Hua
  • Zhihong Zhang
  • Fumin Shen
  • Ling Shao
  • Timothy Hospedales
  • Neil Robertson
  • Yongxin Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11216)

Abstract

Facial expression recognition is a topical task. However, very little research investigates subtle expression recognition, which is important for mental activity analysis, deception detection, etc. We address subtle expression recognition through convolutional neural networks (CNNs) by developing multi-task learning (MTL) methods to effectively leverage a side task: facial landmark detection. Existing MTL methods follow a design pattern of shared bottom CNN layers and task-specific top layers. However, the sharing architecture is usually heuristically chosen, as it is difficult to decide which layers should be shared. Our approach is composed of (1) a novel MTL framework that automatically learns which layers to share through optimisation under tensor trace norm regularisation and (2) an invariant representation learning approach that allows the CNN to leverage tasks defined on disjoint datasets without suffering from dataset distribution shift. To advance subtle expression recognition, we contribute a Large-scale Subtle Emotions and Mental States in the Wild database (LSEMSW). LSEMSW includes a variety of cognitive states as well as basic emotions. It contains 176K images, manually annotated with 13 emotions, and thus provides the first subtle expression dataset large enough for training deep CNNs. Evaluations on LSEMSW and 300-W (landmark) databases show the effectiveness of the proposed methods. In addition, we investigate transferring knowledge learned from the LSEMSW database to traditional (non-subtle) expression recognition. We achieve very competitive performance on the Oulu-Casia NIR&Vis and CK+ databases via transfer learning.
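The trace-norm-regularised sharing in contribution (1) can be illustrated with a small sketch. The penalty below is one common definition of a tensor trace norm: the sum of nuclear norms of the tensor's mode-n unfoldings, applied to the stack of corresponding per-task layer weights. The function name and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def tensor_trace_norm(W):
    """Sum of nuclear norms of every mode-n unfolding of tensor W.

    Stacking the weight matrices of corresponding layers from T tasks
    gives a tensor of shape (T, d_in, d_out); penalising its trace norm
    pushes the stack towards low rank, i.e. towards parameter sharing
    across tasks, letting the optimiser decide how much each layer shares.
    """
    W = np.asarray(W, dtype=float)
    total = 0.0
    for mode in range(W.ndim):
        # Mode-n unfolding: bring axis `mode` to the front, flatten the rest.
        unfolding = np.moveaxis(W, mode, 0).reshape(W.shape[mode], -1)
        total += np.linalg.svd(unfolding, compute_uv=False).sum()
    return total

# Rank-1 sanity check: each unfolding of an all-ones (2, 2, 2) tensor is a
# 2 x 4 rank-1 matrix whose only nonzero singular value equals its
# Frobenius norm sqrt(8), so the penalty is 3 * sqrt(8).
penalty = tensor_trace_norm(np.ones((2, 2, 2)))
assert abs(penalty - 3 * np.sqrt(8.0)) < 1e-9
```

In training, a weighted version of this penalty would be added to the sum of the per-task losses; because the nuclear norm is a convex surrogate for matrix rank, minimising it drives the per-task weight matrices towards a shared low-dimensional subspace wherever that does not hurt the task losses.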

Supplementary material

Supplementary material 1: 474200_1_En_7_MOESM1_ESM.pdf (PDF, 5.6 MB)

Supplementary material 2 (MP4, 16.3 MB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Guosheng Hu (1, 2), email author
  • Li Liu (3)
  • Yang Yuan (1)
  • Zehao Yu (4)
  • Yang Hua (2)
  • Zhihong Zhang (4)
  • Fumin Shen (5)
  • Ling Shao (3)
  • Timothy Hospedales (6)
  • Neil Robertson (1, 2)
  • Yongxin Yang (6, 7)

  1. Anyvision, Belfast, UK
  2. ECIT, Queen's University Belfast, Belfast, UK
  3. Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
  4. Software Department, Xiamen University, Xiamen, China
  5. University of Electronic Science and Technology of China, Chengdu, China
  6. School of Informatics, University of Edinburgh, Edinburgh, UK
  7. Yang's Accounting Consultancy Ltd., London, UK