From Facial Expression Recognition to Interpersonal Relation Prediction

Published in: International Journal of Computer Vision

Abstract

Interpersonal relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people. We investigate whether such fine-grained and high-level relation traits can be characterized and quantified from face images in the wild. We address this challenging problem by first studying a deep network architecture for robust recognition of facial expressions. Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is capable of learning from rich auxiliary attributes such as gender, age, and head pose, beyond just facial expression data. While conventional supervised training requires datasets with complete labels (e.g., all samples must be labeled with gender, age, and expression), we show that this requirement can be relaxed via a novel attribute propagation method. The approach further allows us to leverage the inherent correspondences between heterogeneous attribute sources despite the disparate distributions of different datasets. With the network, we demonstrate state-of-the-art results on existing facial expression recognition benchmarks. To predict interpersonal relations, we use the expression recognition network as the branches of a Siamese model. Extensive experiments show that our model is capable of mining the mutual context of faces for accurate fine-grained interpersonal relation prediction.
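
To make the pipeline the abstract describes concrete, here is a minimal sketch of the overall architecture: a multitask expression backbone with auxiliary attribute heads, shared by the two branches of a Siamese relation model. The sketch uses PyTorch purely by way of illustration; the framework, layer sizes, head dimensions, and all names (ExpressionBackbone, SiameseRelationNet, num_traits) are our assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class ExpressionBackbone(nn.Module):
    """Shared CNN trunk with an expression head plus auxiliary
    attribute heads (gender, age, head pose). Sizes are illustrative."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.expression = nn.Linear(64, 7)  # e.g. 7 basic expressions
        self.gender = nn.Linear(64, 2)      # auxiliary attribute heads
        self.age = nn.Linear(64, 1)
        self.pose = nn.Linear(64, 3)        # e.g. yaw / pitch / roll

    def forward(self, x):
        f = self.features(x)
        heads = {
            "expression": self.expression(f),
            "gender": self.gender(f),
            "age": self.age(f),
            "pose": self.pose(f),
        }
        return f, heads


class SiameseRelationNet(nn.Module):
    """Two weight-sharing expression branches; the concatenated face
    features feed a head that scores pairwise relation traits."""

    def __init__(self, backbone: ExpressionBackbone, num_traits: int = 8):
        super().__init__()
        self.backbone = backbone  # one module used twice => shared weights
        self.relation = nn.Sequential(
            nn.Linear(64 * 2, 64), nn.ReLU(),
            nn.Linear(64, num_traits),  # e.g. warmth, friendliness, dominance, ...
        )

    def forward(self, face_a, face_b):
        feat_a, _ = self.backbone(face_a)
        feat_b, _ = self.backbone(face_b)
        return self.relation(torch.cat([feat_a, feat_b], dim=1))


if __name__ == "__main__":
    model = SiameseRelationNet(ExpressionBackbone())
    a = torch.randn(2, 3, 48, 48)  # a batch of two face-crop pairs
    b = torch.randn(2, 3, 48, 48)
    print(model(a, b).shape)  # torch.Size([2, 8])
```

During training, a sample missing an auxiliary label (say, age) would simply be masked out of that head's loss in this sketch; the attribute propagation method proposed in the paper handles such incomplete and heterogeneous labeling more systematically than this masking baseline.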

Notes

  1. Although we did not study the integration of face and body cues, body posture and hand gesture information, when available, can naturally be used as additional input channels for our deep models.

  2. Both the ExpW and interpersonal relation datasets are available at http://mmlab.ie.cuhk.edu.hk/projects/socialrelation/index.html.

Acknowledgements

This work is supported by SenseTime Group Limited and the General Research Fund sponsored by the Research Grants Council of the Hong Kong SAR (CUHK 14241716, 14224316, and 14209217).

Author information

Correspondence to Chen Change Loy.

Additional information

Communicated by Koichi Kise.

About this article

Cite this article

Zhang, Z., Luo, P., Loy, C.C. et al. From Facial Expression Recognition to Interpersonal Relation Prediction. Int J Comput Vis 126, 550–569 (2018). https://doi.org/10.1007/s11263-017-1055-1
