From Facial Expression Recognition to Interpersonal Relation Prediction

  • Zhanpeng Zhang
  • Ping Luo
  • Chen Change Loy
  • Xiaoou Tang

Abstract

Interpersonal relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people. We investigate whether such fine-grained and high-level relation traits can be characterized and quantified from face images in the wild. We address this challenging problem by first studying a deep network architecture for robust recognition of facial expressions. Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that learns from rich auxiliary attributes such as gender, age, and head pose, beyond facial expression data alone. While conventional supervised training requires datasets with complete labels (e.g., all samples must be labeled with gender, age, and expression), we show that this requirement can be relaxed via a novel attribute propagation method. The approach further allows us to leverage the inherent correspondences between heterogeneous attribute sources despite the disparate distributions of different datasets. With this network we demonstrate state-of-the-art results on existing facial expression recognition benchmarks. To predict interpersonal relation, we use the expression recognition network as the branches of a Siamese model. Extensive experiments show that our model is capable of mining the mutual context of faces for accurate fine-grained interpersonal relation prediction.
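
As a rough illustration of the two-stage design described above, the following PyTorch sketch builds a multitask expression network with one head per attribute and reuses it as the weight-shared branch of a Siamese relation model. All layer sizes, attribute heads, and names here are illustrative assumptions, not the paper's actual implementation (which also includes the attribute propagation scheme for incomplete labels, omitted here).

```python
# A minimal sketch of the pipeline in the abstract. Architecture details
# are assumptions for illustration, not the paper's implementation.
import torch
import torch.nn as nn

class ExpressionNet(nn.Module):
    """Multitask network: a shared trunk with one head per attribute
    (expression plus auxiliary attributes such as gender, age, pose)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.trunk = nn.Sequential(  # shared convolutional trunk
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # One classifier head per attribute (class counts are assumptions).
        self.heads = nn.ModuleDict({
            "expression": nn.Linear(feat_dim, 7),  # e.g. 7 basic expressions
            "gender":     nn.Linear(feat_dim, 2),
            "age":        nn.Linear(feat_dim, 5),  # coarse age groups
            "pose":       nn.Linear(feat_dim, 3),
        })

    def forward(self, x):
        feat = self.trunk(x)
        return feat, {name: head(feat) for name, head in self.heads.items()}

class RelationNet(nn.Module):
    """Siamese model: the expression network is shared across the two
    face crops; concatenated features predict pairwise relation traits."""
    def __init__(self, branch, feat_dim=256, num_traits=8):
        super().__init__()
        self.branch = branch  # weight-shared expression branch
        self.relation_head = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_traits),  # one logit per relation trait
        )

    def forward(self, face_a, face_b):
        feat_a, _ = self.branch(face_a)
        feat_b, _ = self.branch(face_b)
        return self.relation_head(torch.cat([feat_a, feat_b], dim=1))

# Usage: two aligned face crops from the same image.
model = RelationNet(ExpressionNet())
logits = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```

The key design point carried over from the paper is weight sharing: both face crops pass through the same expression branch, so features learned from expression and auxiliary attribute supervision directly inform the pairwise relation prediction.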

Keywords

Facial expression recognition · Interpersonal relation · Deep convolutional network

Notes

Acknowledgements

This work is supported by SenseTime Group Limited and the General Research Fund sponsored by the Research Grants Council of the Hong Kong SAR (CUHK 14241716, 14224316, and 14209217).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  • Zhanpeng Zhang (1)
  • Ping Luo (2)
  • Chen Change Loy (2)
  • Xiaoou Tang (2)
  1. SenseTime Group Limited, Shatin, China
  2. Department of Information Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
