Understanding the different categories of facial expressions plays a central role in designing computational models of human perception and affective state. Authoritative studies have revealed that facial expressions in daily life often convey multiple or co-occurring mental states. However, for lack of valid datasets, most previous studies remain restricted to basic emotions with a single label. In this paper, we present a novel multi-label facial expression database, RAF-ML, along with a new deep learning algorithm, to address this problem. Specifically, we crowdsourced 1.2 million annotations from 315 participants to identify the multi-label expressions collected from social networks, and then designed an EM algorithm to filter out unreliable labels. To the best of our knowledge, RAF-ML is the first in-the-wild database to provide crowdsourced annotations for multi-label expressions. To handle the ambiguity and continuity of blended expressions, we propose a new deep manifold learning network, called Deep Bi-Manifold CNN, which learns discriminative features for multi-label expressions by jointly preserving the local affinity of deep features and the manifold structure of emotion labels. Furthermore, a deep domain adaptation method is leveraged to transfer the deep manifold features learned from RAF-ML to other expression databases with different imaging conditions and cultures. Extensive experiments on RAF-ML and other diverse databases (JAFFE, CK+, SFEW and MMI) show that the deep manifold feature is not only superior for multi-label expression recognition in the wild, but also captures elemental and generic components that are effective for a wide range of expression recognition tasks.
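The EM-based filtering of unreliable crowdsourced labels can be sketched roughly as follows. This is an illustrative Dawid–Skene-style aggregation that assumes independent annotators and treats each emotion label as a binary vote; the function name `em_label_aggregation` and its parameters are hypothetical and do not reproduce the exact procedure used for RAF-ML:

```python
import numpy as np

def em_label_aggregation(votes, n_iter=20, thresh=0.5):
    """EM-style aggregation of crowdsourced binary label votes.

    votes: array of shape (n_annotators, n_items) with entries in {0, 1},
    and np.nan where an annotator did not label an item.
    Returns the posterior probability that each item carries the label,
    plus a per-annotator reliability estimate.
    """
    votes = np.asarray(votes, dtype=float)
    n_annot, n_items = votes.shape
    seen = ~np.isnan(votes)

    # Initialization: simple (nan-aware) majority vote as the consensus.
    post = np.nanmean(votes, axis=0)
    reliab = np.full(n_annot, 0.8)

    for _ in range(n_iter):
        # M-step: reliability = agreement rate with the current consensus.
        hard = (post >= thresh).astype(float)
        agree = np.where(seen, votes == hard, np.nan)
        reliab = np.clip(np.nanmean(agree, axis=1), 1e-3, 1 - 1e-3)

        # E-step: posterior via Bayes' rule, annotators assumed independent.
        log_pos = np.zeros(n_items)
        log_neg = np.zeros(n_items)
        for a in range(n_annot):
            m = seen[a]
            v = votes[a, m]
            log_pos[m] += v * np.log(reliab[a]) + (1 - v) * np.log(1 - reliab[a])
            log_neg[m] += (1 - v) * np.log(reliab[a]) + v * np.log(1 - reliab[a])
        post = 1.0 / (1.0 + np.exp(log_neg - log_pos))
    return post, reliab
```

Votes from annotators whose estimated reliability stays near chance contribute little to the posterior, which is the mechanism by which unreliable labels are filtered out.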
As stated in Ekman et al. (2013), “if the stimulus does contain an emotion blend, and the investigator allows only a single choice which does not contain blend terms, low levels of agreement may result, since some of the observers may choose a term for one of the blend components, some for another.”
It is reasonable that RAF-ML contains no images with five or six labels: people can hardly perceive both negative and positive valence (for example, joy and anger) simultaneously in a still image, since these occupy different regions of the valence–activation space (Cowie et al. 2001; Russell and Barrett 1999).
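This observation suggests a simple consistency check that could be applied when cleaning crowdsourced multi-label annotations: reject label sets that mix labels of opposite valence. The valence assignment below is an illustrative convention (surprise treated as valence-neutral), not the rule used to build RAF-ML:

```python
# Illustrative valence signs for the six basic emotions.
VALENCE = {
    "happiness": +1, "surprise": 0,
    "sadness": -1, "anger": -1, "disgust": -1, "fear": -1,
}

def is_valence_consistent(labels):
    """Return True if the label set does not mix positive and negative valence."""
    signs = {VALENCE[l] for l in labels if VALENCE.get(l, 0) != 0}
    return len(signs) <= 1
```

Under this convention, {happiness, surprise} passes while {happiness, anger} would be flagged as an implausible blend for a still image.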
Anitha, C., Venkatesha, M., & Adiga, B. S. (2010). A survey on facial expression databases. International Journal of Engineering Science and Technology, 2(10), 5158–5174.
Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. Software Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chang, Y., Hu, C., & Turk, M. (2004). Probabilistic expression analysis on manifolds. In Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on (Vol. 2, pp. II–II). IEEE.
Chen, J., Liu, X., Tu, P., & Aragones, A. (2013). Learning person-specific models for facial expression and action unit recognition. Pattern Recognition Letters, 34(15), 1964–1970.
Chu, W. S., De la Torre, F., & Cohn, J. F. (2013). Selective transfer machine for personalized facial action unit detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3515–3522).
Cour, T., Sapp, B., & Taskar, B. (2011). Learning from partial labels. Journal of Machine Learning Research, 12(May), 1501–1536.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., et al. (2001). Emotion recognition in human–computer interaction. IEEE Signal Processing Magazine, 18(1), 32–80.
Csurka, G. (2017). Domain adaptation for visual applications: A comprehensive survey. CoRR, arXiv:1702.05374.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR (Vol. 1, pp. 886–893). IEEE.
Dhall, A., Goecke, R., & Gedeon, T. (2015a). Automatic group happiness intensity analysis. IEEE Transactions on Affective Computing, 6(1), 13–26.
Dhall, A., Ramana Murthy, O., Goecke, R., Joshi, J., & Gedeon, T. (2015b). Video and image based emotion recognition challenges in the wild: Emotiw 2015. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 423–426). ACM.
Ding, X., Chu, W. S., De la Torre, F., Cohn, J. F., & Wang, Q. (2013). Facial action unit event detection by cascade of tasks. In Proceedings of the IEEE international conference on computer vision (pp. 2400–2407).
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2014). Decaf: A deep convolutional activation feature for generic visual recognition. In ICML (pp. 647–655).
Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15), E1454–E1462.
Ekman, P., & Friesen, W. V. (2003). Unmasking the face: A guide to recognizing emotions from facial clues. Cambridge, MA: Malor Books.
Ekman, P., Friesen, W. V., & Ellsworth, P. (2013). Emotion in the human face: Guidelines for research and an integration of findings. Amsterdam: Elsevier.
Ekman, P., & Rosenberg, E. L. (1997). What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). Oxford: Oxford University Press.
Ekman, P., & Scherer, K. (1984). Expression and the nature of emotion. Approaches to Emotion, 3, 19–344.
Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015a). Discriminative shared gaussian processes for multiview and view-invariant facial expression recognition. IEEE Transactions on Image Processing, 24(1), 189–204.
Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015b). Multi-conditional latent variable model for joint facial action unit detection. In Proceedings of the IEEE international conference on computer vision (pp. 3792–3800).
Fabian Benitez-Quiroz, C., Srinivasan, R., & Martinez, A. M. (2016). Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5562–5570).
Fürnkranz, J., Hüllermeier, E., Mencía, E. L., & Brinker, K. (2008). Multilabel classification via calibrated label ranking. Machine Learning, 73(2), 133–153.
Gao, B. B., Xing, C., Xie, C. W., Wu, J., & Geng, X. (2017). Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing, 26(6), 2825–2838.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012a). A kernel two-sample test. Journal of Machine Learning Research, 13(Mar), 723–773.
Gretton, A., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., Fukumizu, K., & Sriperumbudur, B. K. (2012b). Optimal kernel choice for large-scale two-sample tests. In Advances in neural information processing systems (pp. 1205–1213).
Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In Computer vision and pattern recognition, 2006 IEEE computer society conference on (Vol. 2, pp. 1735–1742). IEEE.
Hassin, R. R., Aviezer, H., & Bentin, S. (2013). Inherently ambiguous: Facial expressions of emotions, in context. Emotion Review, 5(1), 60–65.
He, X., & Niyogi, P. (2004). Locality preserving projections. In Advances in neural information processing systems (pp. 153–160).
Hou, P., Geng, X., & Zhang, M. L. (2016). Multi-label manifold learning. In AAAI (pp. 1680–1686).
Huang, S. J., Zhou, Z. H., & Zhou, Z. (2012). Multi-label learning by exploiting label correlations locally. In AAAI (pp. 949–955).
Megvii Inc. (2013). Face++ research toolkit. www.faceplusplus.com.
Izard, C. E. (1972). Anxiety: A variable combination of interacting fundamental emotions. In Anxiety: Current trends in theory and research (Vol. 1, pp. 55–106).
Izard, C. E. (2013). Human emotions. Berlin: Springer.
Jack, R. E., Garrod, O. G., Yu, H., Caldara, R., & Schyns, P. G. (2012). Facial expressions of emotion are not culturally universal. Proceedings of the National Academy of Sciences, 109(19), 7241–7244.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on multimedia (pp. 675–678). ACM.
Jung, H., Lee, S., Yim, J., Park, S., & Kim, J. (2015). Joint fine-tuning in deep neural networks for facial expression recognition. In Proceedings of the IEEE international conference on computer vision (pp. 2983–2991).
Kim, B. K., Roh, J., Dong, S. Y., & Lee, S. Y. (2016). Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. Journal on Multimodal User Interfaces, 1–17.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
Li, S., & Deng, W. (2018). Deep facial expression recognition: A survey. CoRR, arXiv:1804.08348.
Li, S., & Deng, W. (2019). Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Transactions on Image Processing, 28(1), 356–370.
Li, S., Deng, W., & Du, J. (2017). Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2584–2593). IEEE.
Liu, M., Li, S., Shan, S., & Chen, X. (2013). Au-aware deep networks for facial expression recognition. In Automatic face and gesture recognition (FG), 2013 10th IEEE international conference and workshops on (pp. 1–6). IEEE.
Liu, M., Li, S., Shan, S., Wang, R., & Chen, X. (2014a). Deeply learning deformable facial action parts model for dynamic expression analysis. In Asian conference on computer vision (pp. 143–157). Berlin: Springer.
Liu, M., Shan, S., Wang, R., & Chen, X. (2014b). Learning expression lets on spatio-temporal manifold for dynamic facial expression recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1749–1756).
Liu, M., Shan, S., Wang, R., & Chen, X. (2016). Learning expressionlets via universal manifold model for dynamic facial expression recognition. IEEE Transactions on Image Processing, 25(12), 5920–5932.
Long, M., Cao, Y., Wang, J., & Jordan, M. (2015). Learning transferable features with deep adaptation networks. In International conference on machine learning (pp. 97–105).
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended Cohn–Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In CVPRW (pp. 94–101). IEEE.
Lv, Y., Feng, Z., & Xu, C. (2014). Facial expression recognition via deep learning. In Smart computing (SMARTCOMP), 2014 international conference on (pp. 303–308). IEEE.
Lyons, M., Akamatsu, S., Kamachi, M., & Gyoba, J. (1998). Coding facial expressions with Gabor wavelets. In Automatic face and gesture recognition, 1998. Proceedings. Third IEEE international conference on (pp. 200–205). IEEE.
Miao, Y. Q., Araujo, R., & Kamel, M. S. (2012) Cross-domain facial expression recognition using supervised kernel mean matching. In Machine learning and applications (ICMLA), 2012 11th international conference on, IEEE (Vol. 2, pp. 326–332).
Mollahosseini, A., Chan, D., & Mahoor, M. H. (2016). Going deeper in facial expression recognition using deep neural networks. In 2016 IEEE Winter conference on applications of computer vision (WACV) (pp. 1–10). IEEE.
Ng, H. W., Nguyen, V. D., Vonikakis, V., & Winkler, S. (2015). Deep learning for emotion recognition on small datasets using transfer learning. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 443–449). ACM.
Nummenmaa, T. (1988). The recognition of pure and blended facial expressions of emotion from still photographs. Scandinavian Journal of Psychology, 29(1), 33–47.
Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.
Pantic, M., & Rothkrantz, L. J. M. (2000). Automatic analysis of facial expressions: The state of the art. IEEE Transactions on pattern analysis and machine intelligence, 22(12), 1424–1445.
Patel, V. M., Gopalan, R., Li, R., & Chellappa, R. (2015). Visual domain adaptation: A survey of recent advances. IEEE Signal Processing Magazine, 32(3), 53–69.
Plutchik, R. (1991). The emotions. Lanham: University Press of America.
Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76(5), 805.
Sariyanidi, E., Gunes, H., & Cavallaro, A. (2015). Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(6), 1113–1133.
Sariyanidi, E., Gunes, H., & Cavallaro, A. (2017). Learning bases of activity for facial expression recognition. IEEE Transactions on Image Processing, 26(4), 1965–1978. https://doi.org/10.1109/TIP.2017.2662237.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition. (pp. 815–823).
Sharif Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 806–813).
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, arXiv:1409.1556.
Tomkins, S. S. (1963). Affect imagery consciousness: Volume II: The negative affects (Vol. 2). Berlin: Springer.
Tsoumakas, G., & Katakis, I. (2006). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.
Tsoumakas, G., & Vlahavas, I. (2007). Random k-labelsets: An ensemble method for multilabel classification. In Machine learning: ECML 2007 (pp. 406–417). Berlin: Springer.
Valstar, M., & Pantic, M. (2010). Induced disgust, happiness and surprise: An addition to the MMI facial expression database. In Proceedings of the 3rd international workshop on EMOTION (satellite of LREC): Corpora for research on emotion and affect (p. 65).
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In CVPR (Vol. 1, pp. I-511–I-518). IEEE.
Wang, S., Liu, Z., Wang, J., Wang, Z., Li, Y., Chen, X., et al. (2014). Exploiting multi-expression dependences for implicit multi-emotion video tagging. Image and Vision Computing, 32(10), 682–691.
Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In European conference on computer vision. Berlin: Springer (pp. 499–515).
Whitehill, J., Wu, T. F., Bergsma, J., Movellan, J. R., & Ruvolo, P. L. (2009). Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Advances in neural information processing systems (pp. 2035–2043).
Xing, C., Geng, X., & Xue, H. (2016). Logistic boosting regression for label distribution learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4489–4497).
Xiong, X., & De la Torre, F. (2013). Supervised descent method and its applications to face alignment. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 532–539).
Yan, H., Ang, M. H., & Poo, A. N. (2011). Cross-dataset facial expression recognition. In Robotics and automation (ICRA), 2011 IEEE international conference on (pp. 5985–5990). IEEE.
Yin, L., Wei, X., Sun, Y., Wang, J., & Rosato, M. J. (2006). A 3d facial expression database for facial behavior research. In Automatic face and gesture recognition, 2006. FGR 2006. 7th international conference on (pp. 211–216). IEEE.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (pp. 3320–3328).
Yu, Z., & Zhang, C. (2015). Image based static facial expression recognition with multiple deep network learning. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 435–442). ACM.
Zen, G., Porzi, L., Sangineto, E., Ricci, E., & Sebe, N. (2016). Learning personalized models for facial expression analysis and gesture recognition. IEEE Transactions on Multimedia, 18(4), 775–788.
Zeng, J., Chu, W. S., De la Torre, F., Cohn, J. F., & Xiong, Z. (2015). Confidence preserving machine for facial action unit detection. In Proceedings of the IEEE international conference on computer vision (pp. 3622–3630).
Zhang, M. L., & Wu, L. (2015). Lift: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1), 107–120.
Zhang, M. L., & Yu, F. (2015). Solving the partial label learning problem: An instance-based approach. In IJCAI (pp. 4048–4054).
Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.
Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837.
Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2018). From facial expression recognition to interpersonal relation prediction. International Journal of Computer Vision, 126(5), 550–569.
Zhao, K., Zhang, H., Ma, Z., Song, Y. Z., & Guo, J. (2015). Multi-label learning with prior knowledge for facial expression analysis. Neurocomputing, 157, 280–289.
Zhong, L., Liu, Q., Yang, P., Liu, B., Huang, J., & Metaxas, D. N. (2012). Learning active facial patches for expression analysis. In Computer vision and pattern recognition (CVPR), 2012 IEEE conference on (pp. 2562–2569). IEEE.
Zhou, Y., Xue, H., & Geng, X. (2015). Emotion distribution recognition from facial expressions. In Proceedings of the 23rd ACM international conference on multimedia (pp. 1247–1250). ACM.
Zhu, R., Sang, G., & Zhao, Q. (2016). Discriminative feature adaptation for cross-domain facial expression recognition. In Biometrics (ICB), 2016 international conference on (pp. 1–7). IEEE.
Zong, Y., Huang, X., Zheng, W., Cui, Z., & Zhao, G. (2017). Learning a target sample re-generator for cross-database micro-expression recognition. In Proceedings of the 2017 ACM on multimedia conference (pp. 872–880). ACM.
Funding was provided by the National Natural Science Foundation of China (Grant Nos. 61573068 and 61471048) and the Beijing Nova Program (Grant No. Z161100004916088).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Communicated by Rama Chellappa, Xiaoming Liu, Tae-Kyun Kim, Fernando De la Torre, Chen Change Loy.
Cite this article
Li, S., Deng, W. Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning. Int J Comput Vis 127, 884–906 (2019). https://doi.org/10.1007/s11263-018-1131-1
- Facial expression recognition
- Deep feature learning
- Multi-label classification
- Crowdsourced database in-the-wild