
Expression Analysis Based on Face Regions in Real-world Conditions

  • Zheng Lian
  • Ya Li
  • Jian-Hua Tao
  • Jian Huang
  • Ming-Yue Niu
Research Article

Abstract

Facial emotion recognition is an essential aspect of human-machine interaction. Past research on facial emotion recognition has largely focused on laboratory environments, but real-world conditions pose many challenges, e.g., illumination changes, large pose variations and partial or full occlusions. These challenges leave different face areas with different degrees of sharpness and completeness. Inspired by this fact, we focus on the reliability of predictions generated by different <emotion, region> pairs. For example, if only the mouth area is available and the emotion classifier predicts happiness, how should the trustworthiness of that prediction be judged? This problem can be converted into measuring the contribution of different face areas to different emotions. In this paper, we divide the whole face into six areas: the nose area, mouth area, eyes area, nose-to-mouth area, nose-to-eyes area and mouth-to-eyes area. To obtain more convincing results, our experiments are conducted on three databases: facial expression recognition+ (FER+), the real-world affective faces database (RAF-DB) and the expression in-the-wild (ExpW) dataset. Through analysis of the classification accuracy, the confusion matrix and the class activation map (CAM), we establish convincing results. To sum up, the contributions of this paper are twofold: 1) we visualize the face areas that emotion recognition models attend to; 2) we experimentally analyze the contribution of different face areas to different emotions in real-world conditions. Our findings can be combined with findings in psychology to promote the understanding of emotional expressions.
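As an illustration of the region-splitting step described above, the following is a minimal sketch (not the authors' implementation) of how the six face areas could be cropped from facial landmarks, assuming dlib's standard 68-point landmark model; the landmark groupings, crop margins and file paths here are assumptions, and the paper's exact region boundaries may differ.

```python
# Minimal sketch (not the authors' code): cropping the six face regions
# named in the abstract, using dlib's 68-point facial landmark model.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Standard pre-trained model shipped with dlib (path is an assumption).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Landmark index groups in the 68-point scheme; the abstract does not give
# the exact region boundaries, so these groupings are assumptions.
PARTS = {
    "eyes": list(range(36, 48)),   # both eyes
    "nose": list(range(27, 36)),
    "mouth": list(range(48, 68)),
}

def crop_region(img, landmarks, part_names, margin=10):
    """Bounding-box crop around the union of the given landmark groups."""
    pts = np.array([(landmarks.part(i).x, landmarks.part(i).y)
                    for name in part_names for i in PARTS[name]])
    (x0, y0), (x1, y1) = pts.min(axis=0) - margin, pts.max(axis=0) + margin
    h, w = img.shape[:2]
    return img[max(y0, 0):min(y1, h), max(x0, 0):min(x1, w)]

img = cv2.imread("face.jpg")                      # hypothetical input image
faces = detector(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
landmarks = predictor(img, faces[0])

# The three single areas plus the three pairwise unions studied in the paper.
REGIONS = {
    "eyes": ["eyes"], "nose": ["nose"], "mouth": ["mouth"],
    "nose_to_mouth": ["nose", "mouth"],
    "nose_to_eyes": ["nose", "eyes"],
    "mouth_to_eyes": ["mouth", "eyes"],
}
crops = {r: crop_region(img, landmarks, parts) for r, parts in REGIONS.items()}
```

Each crop could then be passed through the same emotion classifier, so that per-region accuracies, confusion matrices and class activation maps can be compared across the different <emotion, region> pairs.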

Keywords

Facial emotion analysis, face areas, class activation map, confusion matrix, concerned area



Acknowledgments

This work is supported by the National Key Research & Development Plan of China (No. 2017YFB1002804), the National Natural Science Foundation of China (Nos. 61425017, 61773379, 61332017, 61603390 and 61771472), and the Major Program of the National Social Science Fund of China (No. 13&ZD189).


Copyright information

© Institute of Automation, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (CAS), Beijing, China
  2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  3. CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China
