Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning

Abstract

Comprehending different categories of facial expressions plays an important role in the design of computational models that analyze human perceptual and affective states. Authoritative studies have revealed that facial expressions in daily life often convey multiple or co-occurring mental states. However, due to the lack of valid datasets, most previous studies are still restricted to basic emotions with a single label. In this paper, we present a novel multi-label facial expression database, RAF-ML, along with a new deep learning algorithm, to address this problem. Specifically, a crowdsourced annotation of 1.2 million labels from 315 participants was carried out to identify multi-label expressions in images collected from social networks, and an EM algorithm was then designed to filter out unreliable labels. To the best of our knowledge, RAF-ML is the first in-the-wild database that provides crowdsourced cognition for multi-label expressions. Focusing on the ambiguity and continuity of blended expressions, we propose a new deep manifold learning network, called Deep Bi-Manifold CNN, that learns discriminative features for multi-label expressions by jointly preserving the local affinity of deep features and the manifold structure of emotion labels. Furthermore, a deep domain adaptation method is leveraged to extend the deep manifold features learned from RAF-ML to other expression databases with various imaging conditions and cultures. Extensive experiments on RAF-ML and other diverse databases (JAFFE, CK+, SFEW and MMI) show that the deep manifold features are not only superior for multi-label expression recognition in the wild, but also capture elemental and generic components that are effective for a wide range of expression recognition tasks.
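The abstract mentions an EM algorithm that filters unreliable crowdsourced labels but does not spell it out. The sketch below assumes a simple "one-coin" annotator model in the spirit of Whitehill et al. (2009), which the reference list cites: each annotator is assumed to report the presence or absence of an emotion with a fixed, item-independent accuracy, and EM alternates between estimating each label's posterior and each annotator's accuracy. It is an illustration under those assumptions, not the authors' formulation; the function names, the 0.8 initial accuracy and the 0.5 acceptance threshold are hypothetical.

```python
import math
from collections import defaultdict


def _sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def em_label_reliability(votes, n_iter=50, prior=0.5, init_acc=0.8):
    """One-coin EM over binary votes (hypothetical sketch).

    votes: list of (item, annotator, vote) with vote in {0, 1}; an item is
    one (image, emotion) pair. Returns (posterior, accuracy): posterior[item]
    is P(emotion present) and accuracy[annotator] is the estimated
    probability that the annotator's vote matches the true label.
    """
    items = {i for i, _, _ in votes}
    annotators = {a for _, a, _ in votes}
    acc = {a: init_acc for a in annotators}
    post = {}
    for _ in range(n_iter):
        # E-step: sum the log-likelihood ratios of all votes on each item
        log_odds = {i: math.log(prior / (1 - prior)) for i in items}
        for i, a, v in votes:
            p = min(max(acc[a], 1e-6), 1 - 1e-6)  # clamp to avoid log(0)
            llr = math.log(p / (1 - p))
            log_odds[i] += llr if v == 1 else -llr
        post = {i: _sigmoid(lo) for i, lo in log_odds.items()}
        # M-step: expected fraction of each annotator's votes that are correct
        agree, count = defaultdict(float), defaultdict(int)
        for i, a, v in votes:
            agree[a] += post[i] if v == 1 else 1.0 - post[i]
            count[a] += 1
        acc = {a: agree[a] / count[a] for a in annotators}
    return post, acc


def accepted_labels(post, threshold=0.5):
    # keep only the (image, emotion) pairs whose posterior clears the threshold
    return sorted(i for i, p in post.items() if p >= threshold)


# Toy usage: three consistent annotators and one who always disagrees.
truth = {("img1", "happy"): 1, ("img1", "sad"): 0,
         ("img2", "fear"): 1, ("img2", "anger"): 0, ("img3", "surprise"): 1}
votes = []
for item, y in truth.items():
    votes += [(item, a, y) for a in ("r1", "r2", "r3")]
    votes.append((item, "s1", 1 - y))
post, acc = em_label_reliability(votes)
print(accepted_labels(post))
print(acc)
```

Starting every annotator at the same above-chance accuracy lets the majority break the symmetry, after which the consistently contradicting annotator is driven toward zero accuracy and effectively ignored.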
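The abstract states that Deep Bi-Manifold CNN jointly preserves the local affinity of deep features and the manifold structure of emotion labels, without giving the loss itself. The sketch below is a hypothetical multi-label variant of a locality-preserving penalty in the spirit of Li et al. (2017) and He and Niyogi (2004): each sample's candidate neighbors are the samples sharing at least one emotion with it (label manifold), the k of those nearest in feature space are kept (feature locality), and squared feature distances are weighted by the cosine similarity of the multi-hot label vectors. The names, the cosine weighting and the label order are assumptions, and plain Python lists stand in for the mini-batch tensors a real network would backpropagate through.

```python
import math


def cosine(u, v):
    # cosine similarity of two multi-hot label vectors (0.0 if either is empty)
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def sq_dist(u, v):
    # squared Euclidean distance between two deep feature vectors
    return sum((a - b) ** 2 for a, b in zip(u, v))


def bi_manifold_locality_loss(features, labels, k=2):
    """Label-weighted locality penalty over one mini-batch (hypothetical)."""
    n = len(features)
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        # label manifold: candidates share at least one emotion with sample i
        cands = [j for j in range(n)
                 if j != i and cosine(labels[i], labels[j]) > 0]
        # feature locality: keep the k candidates closest in feature space
        cands.sort(key=lambda j, i=i: sq_dist(features[i], features[j]))
        for j in cands[:k]:
            w = cosine(labels[i], labels[j])
            total += w * sq_dist(features[i], features[j])
            weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0


# Hypothetical label order: [happy, surprise, sad]
labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
compact = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
scattered = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]]
print(bi_manifold_locality_loss(compact, labels, k=1))
print(bi_manifold_locality_loss(scattered, labels, k=1))
```

Restricting neighbors to label-sharing samples matters: if neighbors were chosen by feature distance alone, a batch whose same-emotion samples are far apart could score zero, because only unrelated samples would be pulled together with zero weight.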
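The abstract does not name the deep domain adaptation method. Because the reference list cites Gretton et al. (2012a, 2012b) and Long et al. (2015), one plausible choice, assumed here, is a maximum mean discrepancy (MMD) penalty between source (RAF-ML) and target-database features, added to the supervised loss as in deep adaptation networks. The sketch computes the biased estimate MMD² = mean k(s, s′) + mean k(t, t′) − 2 · mean k(s, t) with a sum of Gaussian kernels; the bandwidths are arbitrary, and this is not necessarily the adaptation used in the paper.

```python
import math


def multi_gaussian_kernel(x, y, sigmas=(1.0, 2.0, 4.0)):
    # sum of Gaussian RBF kernels over several bandwidths (values are arbitrary)
    d = sum((a - b) ** 2 for a, b in zip(x, y))
    return sum(math.exp(-d / (2.0 * s * s)) for s in sigmas)


def mmd2(source, target, sigmas=(1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD between two feature samples."""
    def mean_k(A, B):
        # average kernel value over all cross pairs of A and B
        return sum(multi_gaussian_kernel(a, b, sigmas)
                   for a in A for b in B) / (len(A) * len(B))
    return (mean_k(source, source) + mean_k(target, target)
            - 2.0 * mean_k(source, target))


src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = [[x + 0.1, y] for x, y in src]
far = [[x + 3.0, y + 3.0] for x, y in src]
print(mmd2(src, src), mmd2(src, near), mmd2(src, far))
```

During training, a weighted `mmd2` term between mini-batch features from the two databases would be minimized together with the multi-label loss, pulling the two feature distributions together; the weight would be one more hyperparameter to tune.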

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Notes

  1. http://www.whdeng.cn/RAF/model2.html.

  2. As Ekman et al. (2013) state, “if the stimulus does contain an emotion blend, and the investigator allows only a single choice which does not contain blend terms, low levels of agreement may result, since some of the observers may choose a term for one of the blend components, some for another.”

  3. The compound emotions among these 4908 images, along with the other basic emotions, have been presented in RAF-DB (Li et al. 2017; Li and Deng 2019).

  4. It is reasonable that RAF-ML contains no images with five or six labels, since people can hardly perceive negative and positive valence (for example, joyful and angry) simultaneously in a still image, as these emotions occupy different regions of the valence-activation space (Cowie et al. 2001; Russell and Barrett 1999).

References

  1. Anitha, C., Venkatesha, M., & Adiga, B. S. (2010). A survey on facial expression databases. International Journal of Engineering Science and Technology, 2(10), 5158–5174.

  2. Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. Software Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

  3. Chang, Y., Hu, C., & Turk, M. (2004). Probabilistic expression analysis on manifolds. In Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on (Vol. 2, pp. II–II). IEEE.

  4. Chen, J., Liu, X., Tu, P., & Aragones, A. (2013). Learning person-specific models for facial expression and action unit recognition. Pattern Recognition Letters, 34(15), 1964–1970.

  5. Chu, W. S., De la Torre, F., & Cohn, J. F. (2013). Selective transfer machine for personalized facial action unit detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3515–3522).

  6. Cour, T., Sapp, B., & Taskar, B. (2011). Learning from partial labels. Journal of Machine Learning Research, 12(May), 1501–1536.

  7. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., et al. (2001). Emotion recognition in human–computer interaction. IEEE Signal Processing Magazine, 18(1), 32–80.

  8. Csurka, G. (2017). Domain adaptation for visual applications: A comprehensive survey. CoRR, arXiv:1702.05374.

  9. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR (Vol. 1, pp. 886–893). IEEE.

  10. Dhall, A., Goecke, R., & Gedeon, T. (2015a). Automatic group happiness intensity analysis. IEEE Transactions on Affective Computing, 6(1), 13–26.

  11. Dhall, A., Ramana Murthy, O., Goecke, R., Joshi, J., & Gedeon, T. (2015b). Video and image based emotion recognition challenges in the wild: EmotiW 2015. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 423–426). ACM.

  12. Ding, X., Chu, W. S., De la Torre, F., Cohn, J. F., & Wang, Q. (2013). Facial action unit event detection by cascade of tasks. In Proceedings of the IEEE international conference on computer vision (pp. 2400–2407).

  13. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML (pp. 647–655).

  14. Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15), E1454–E1462.

  15. Ekman, P., & Friesen, W. V. (2003). Unmasking the face: A guide to recognizing emotions from facial clues. Cambridge, MA: Malor Books.

  16. Ekman, P., Friesen, W. V., & Ellsworth, P. (2013). Emotion in the human face: Guidelines for research and an integration of findings. Amsterdam: Elsevier.

  17. Ekman, P., & Rosenberg, E. L. (1997). What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). Oxford: Oxford University Press.

  18. Ekman, P., & Scherer, K. (1984). Expression and the nature of emotion. Approaches to Emotion, 3, 19–344.

  19. Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015a). Discriminative shared Gaussian processes for multiview and view-invariant facial expression recognition. IEEE Transactions on Image Processing, 24(1), 189–204.

  20. Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015b). Multi-conditional latent variable model for joint facial action unit detection. In Proceedings of the IEEE international conference on computer vision (pp. 3792–3800).

  21. Fabian Benitez-Quiroz, C., Srinivasan, R., & Martinez, A. M. (2016). EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5562–5570).

  22. Fürnkranz, J., Hüllermeier, E., Mencía, E. L., & Brinker, K. (2008). Multilabel classification via calibrated label ranking. Machine Learning, 73(2), 133–153.

  23. Gao, B. B., Xing, C., Xie, C. W., Wu, J., & Geng, X. (2017). Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing, 26(6), 2825–2838.

  24. Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012a). A kernel two-sample test. Journal of Machine Learning Research, 13(Mar), 723–773.

  25. Gretton, A., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., Fukumizu, K., & Sriperumbudur, B. K. (2012b). Optimal kernel choice for large-scale two-sample tests. In Advances in neural information processing systems (pp. 1205–1213).

  26. Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In Computer vision and pattern recognition, 2006 IEEE computer society conference on (Vol. 2, pp. 1735–1742). IEEE.

  27. Hassin, R. R., Aviezer, H., & Bentin, S. (2013). Inherently ambiguous: Facial expressions of emotions, in context. Emotion Review, 5(1), 60–65.

  28. He, X., & Niyogi, P. (2004). Locality preserving projections. In Advances in neural information processing systems (pp. 153–160).

  29. Hou, P., Geng, X., & Zhang, M. L. (2016). Multi-label manifold learning. In AAAI (pp. 1680–1686).

  30. Huang, S. J., & Zhou, Z. H. (2012). Multi-label learning by exploiting label correlations locally. In AAAI (pp. 949–955).

  31. Megvii Inc. (2013). Face++ research toolkit. www.faceplusplus.com.

  32. Izard, C. E. (1972). Anxiety: A variable combination of interacting fundamental emotions. In Anxiety: Current trends in theory and research (Vol. 1, pp. 55–106).

  33. Izard, C. E. (2013). Human emotions. Berlin: Springer.

  34. Jack, R. E., Garrod, O. G., Yu, H., Caldara, R., & Schyns, P. G. (2012). Facial expressions of emotion are not culturally universal. Proceedings of the National Academy of Sciences, 109(19), 7241–7244.

  35. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on multimedia (pp. 675–678). ACM.

  36. Jung, H., Lee, S., Yim, J., Park, S., & Kim, J. (2015). Joint fine-tuning in deep neural networks for facial expression recognition. In Proceedings of the IEEE international conference on computer vision (pp. 2983–2991).

  37. Kim, B. K., Roh, J., Dong, S. Y., & Lee, S. Y. (2016). Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. Journal on Multimodal User Interfaces, 1–17.

  38. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  39. Li, S., & Deng, W. (2018). Deep facial expression recognition: A survey. CoRR, arXiv:1804.08348.

  40. Li, S., & Deng, W. (2019). Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Transactions on Image Processing, 28(1), 356–370.

  41. Li, S., Deng, W., & Du, J. (2017). Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2584–2593). IEEE.

  42. Liu, M., Li, S., Shan, S., & Chen, X. (2013). AU-aware deep networks for facial expression recognition. In Automatic face and gesture recognition (FG), 2013 10th IEEE international conference and workshops on (pp. 1–6). IEEE.

  43. Liu, M., Li, S., Shan, S., Wang, R., & Chen, X. (2014a). Deeply learning deformable facial action parts model for dynamic expression analysis. In Asian conference on computer vision (pp. 143–157). Berlin: Springer.

  44. Liu, M., Shan, S., Wang, R., & Chen, X. (2014b). Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1749–1756).

  45. Liu, M., Shan, S., Wang, R., & Chen, X. (2016). Learning expressionlets via universal manifold model for dynamic facial expression recognition. IEEE Transactions on Image Processing, 25(12), 5920–5932.

  46. Long, M., Cao, Y., Wang, J., & Jordan, M. (2015). Learning transferable features with deep adaptation networks. In International conference on machine learning (pp. 97–105).

  47. Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended Cohn–Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In CVPRW (pp. 94–101). IEEE.

  48. Lv, Y., Feng, Z., & Xu, C. (2014). Facial expression recognition via deep learning. In Smart computing (SMARTCOMP), 2014 international conference on (pp. 303–308). IEEE.

  49. Lyons, M., Akamatsu, S., Kamachi, M., & Gyoba, J. (1998). Coding facial expressions with Gabor wavelets. In Automatic face and gesture recognition, 1998. Proceedings. Third IEEE international conference on (pp. 200–205). IEEE.

  50. Miao, Y. Q., Araujo, R., & Kamel, M. S. (2012). Cross-domain facial expression recognition using supervised kernel mean matching. In Machine learning and applications (ICMLA), 2012 11th international conference on (Vol. 2, pp. 326–332). IEEE.

  51. Mollahosseini, A., Chan, D., & Mahoor, M. H. (2016). Going deeper in facial expression recognition using deep neural networks. In 2016 IEEE Winter conference on applications of computer vision (WACV) (pp. 1–10). IEEE.

  52. Ng, H. W., Nguyen, V. D., Vonikakis, V., & Winkler, S. (2015). Deep learning for emotion recognition on small datasets using transfer learning. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 443–449). ACM.

  53. Nummenmaa, T. (1988). The recognition of pure and blended facial expressions of emotion from still photographs. Scandinavian Journal of Psychology, 29(1), 33–47.

  54. Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.

  55. Pantic, M., & Rothkrantz, L. J. M. (2000). Automatic analysis of facial expressions: The state of the art. IEEE Transactions on pattern analysis and machine intelligence, 22(12), 1424–1445.

  56. Patel, V. M., Gopalan, R., Li, R., & Chellappa, R. (2015). Visual domain adaptation: A survey of recent advances. IEEE Signal Processing Magazine, 32(3), 53–69.

  57. Plutchik, R. (1991). The emotions. Lanham: University Press of America.

  58. Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76(5), 805.

  59. Sariyanidi, E., Gunes, H., & Cavallaro, A. (2015). Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(6), 1113–1133.

  60. Sariyanidi, E., Gunes, H., & Cavallaro, A. (2017). Learning bases of activity for facial expression recognition. IEEE Transactions on Image Processing, 26(4), 1965–1978. https://doi.org/10.1109/TIP.2017.2662237.

  61. Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815–823).

  62. Sharif Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 806–813).

  63. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, arXiv:1409.1556.

  64. Tomkins, S. S. (1963). Affect imagery consciousness: Volume II: The negative affects (Vol. 2). Berlin: Springer.

  65. Tsoumakas, G., & Katakis, I. (2006). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.

  66. Tsoumakas, G., & Vlahavas, I. (2007). Random k-labelsets: An ensemble method for multilabel classification. In Machine learning: ECML 2007 (pp. 406–417). Berlin: Springer.

  67. Valstar, M., & Pantic, M. (2010). Induced disgust, happiness and surprise: An addition to the MMI facial expression database. In Proceedings of the 3rd international workshop on EMOTION (satellite of LREC): Corpora for research on emotion and affect (p. 65).

  68. Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In CVPR (Vol. 1, pp. 1–511). IEEE.

  69. Wang, S., Liu, Z., Wang, J., Wang, Z., Li, Y., Chen, X., et al. (2014). Exploiting multi-expression dependences for implicit multi-emotion video tagging. Image and Vision Computing, 32(10), 682–691.

  70. Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In European conference on computer vision (pp. 499–515). Berlin: Springer.

  71. Whitehill, J., Wu, T. F., Bergsma, J., Movellan, J. R., & Ruvolo, P. L. (2009). Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Advances in neural information processing systems (pp. 2035–2043).

  72. Xing, C., Geng, X., & Xue, H. (2016). Logistic boosting regression for label distribution learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4489–4497).

  73. Xiong, X., & De la Torre, F. (2013). Supervised descent method and its applications to face alignment. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 532–539).

  74. Yan, H., Ang, M. H., & Poo, A. N. (2011). Cross-dataset facial expression recognition. In Robotics and automation (ICRA), 2011 IEEE international conference on (pp. 5985–5990). IEEE.

  75. Yin, L., Wei, X., Sun, Y., Wang, J., & Rosato, M. J. (2006). A 3D facial expression database for facial behavior research. In Automatic face and gesture recognition, 2006. FGR 2006. 7th international conference on (pp. 211–216). IEEE.

  76. Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (pp. 3320–3328).

  77. Yu, Z., & Zhang, C. (2015). Image based static facial expression recognition with multiple deep network learning. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 435–442). ACM.

  78. Zen, G., Porzi, L., Sangineto, E., Ricci, E., & Sebe, N. (2016). Learning personalized models for facial expression analysis and gesture recognition. IEEE Transactions on Multimedia, 18(4), 775–788.

  79. Zeng, J., Chu, W. S., De la Torre, F., Cohn, J. F., & Xiong, Z. (2015). Confidence preserving machine for facial action unit detection. In Proceedings of the IEEE international conference on computer vision (pp. 3622–3630).

  80. Zhang, M. L., & Wu, L. (2015). LIFT: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1), 107–120.

  81. Zhang, M. L., & Yu, F. (2015). Solving the partial label learning problem: An instance-based approach. In IJCAI (pp. 4048–4054).

  82. Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.

  83. Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837.

  84. Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2018). From facial expression recognition to interpersonal relation prediction. International Journal of Computer Vision, 126(5), 550–569.

  85. Zhao, K., Zhang, H., Ma, Z., Song, Y. Z., & Guo, J. (2015). Multi-label learning with prior knowledge for facial expression analysis. Neurocomputing, 157, 280–289.

  86. Zhong, L., Liu, Q., Yang, P., Liu, B., Huang, J., & Metaxas, D. N. (2012). Learning active facial patches for expression analysis. In Computer vision and pattern recognition (CVPR), 2012 IEEE conference on (pp. 2562–2569). IEEE.

  87. Zhou, Y., Xue, H., & Geng, X. (2015). Emotion distribution recognition from facial expressions. In Proceedings of the 23rd ACM international conference on multimedia (pp. 1247–1250). ACM.

  88. Zhu, R., Sang, G., & Zhao, Q. (2016). Discriminative feature adaptation for cross-domain facial expression recognition. In Biometrics (ICB), 2016 international conference on (pp. 1–7). IEEE.

  89. Zong, Y., Huang, X., Zheng, W., Cui, Z., & Zhao, G. (2017). Learning a target sample re-generator for cross-database micro-expression recognition. In Proceedings of the 2017 ACM on multimedia conference (pp. 872–880). ACM.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61573068 and 61471048) and the Beijing Nova Program (Grant No. Z161100004916088).

Author information

Corresponding author

Correspondence to Weihong Deng.

Additional information

Communicated by Rama Chellappa, Xiaoming Liu, Tae-Kyun Kim, Fernando De la Torre, Chen Change Loy.

Electronic supplementary material

Supplementary material 1 (pdf 118 KB)

About this article

Cite this article

Li, S., Deng, W. Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning. Int J Comput Vis 127, 884–906 (2019). https://doi.org/10.1007/s11263-018-1131-1

Keywords

  • Facial expression recognition
  • Deep feature learning
  • Multi-label classification
  • Crowdsourced database in-the-wild