Expression Analysis Based on Face Regions in Real-world Conditions

Abstract

Facial emotion recognition is an essential aspect of human-machine interaction. Past research on facial emotion recognition has focused on laboratory environments. In real-world conditions, however, recognition faces many challenges, such as illumination changes, large pose variations, and partial or full occlusions. These challenges leave different face areas with different degrees of sharpness and completeness. Inspired by this fact, we focus on the authenticity of predictions generated by different <emotion, region> pairs. For example, if only the mouth area is available and the emotion classifier predicts happiness, how should the authenticity of that prediction be judged? This problem can be recast as measuring the contribution of different face areas to different emotions. In this paper, we divide the whole face into six areas: nose, mouth, eyes, nose to mouth, nose to eyes, and mouth to eyes. To obtain more convincing results, we conduct experiments on three databases: facial expression recognition plus (FER+), the real-world affective faces database (RAF-DB), and the expression in-the-wild (ExpW) dataset. We draw our conclusions from an analysis of the classification accuracy, the confusion matrix, and the class activation map (CAM). The contributions of this paper are twofold: 1) we visualize the concerned areas of human faces in emotion recognition; 2) we analyze the contribution of different face areas to different emotions in real-world conditions through experimental analysis. Our findings can be combined with findings in psychology to promote the understanding of emotional expressions.
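
As a rough illustration of the kind of per-region analysis the abstract describes, the following sketch computes a confusion matrix and per-emotion recall for classifiers restricted to different face areas. It is a minimal example under stated assumptions, not the authors' pipeline: the three-class label set, the toy predictions, and the helper names `confusion_matrix` and `per_class_recall` are all hypothetical.

```python
import numpy as np

# Hypothetical label set for illustration only; not taken from the paper.
EMOTIONS = ["anger", "happiness", "sadness"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def per_class_recall(cm):
    """Fraction of each true class predicted correctly (0 if a class is absent)."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.zeros(len(cm)), where=totals > 0)


# Toy ground truth and toy predictions from classifiers that each see
# only one face area (made-up numbers, not experimental results).
y_true = [0, 0, 1, 1, 2, 2]
preds_by_region = {
    "mouth": [0, 2, 1, 1, 2, 0],
    "eyes": [0, 0, 1, 2, 2, 2],
}

recall_by_region = {}
for region, y_pred in preds_by_region.items():
    cm = confusion_matrix(y_true, y_pred, len(EMOTIONS))
    recall_by_region[region] = per_class_recall(cm)
    print(region, dict(zip(EMOTIONS, recall_by_region[region].round(2))))
```

Comparing the recall of one emotion across regions gives a simple view of how much each <emotion, region> pair can be trusted; in this toy data, the mouth-only predictions recover happiness more reliably than the eyes-only predictions do.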

Acknowledgments

This work is supported by the National Key Research & Development Plan of China (No. 2017YFB1002804), the National Natural Science Foundation of China (Nos. 61425017, 61773379, 61332017, 61603390 and 61771472), and the Major Program for the National Social Science Fund of China (No. 13&ZD189).

Author information

Corresponding author

Correspondence to Ya Li.

Additional information

Zheng Lian received the B. Eng. degree in telecommunication from Beijing University of Posts and Telecommunications, China in 2016. He is a Ph. D. degree candidate in pattern recognition and intelligent system at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China.

His research interests include affective computing, deep learning and multimodal emotion recognition.

Ya Li received the B. Eng. degree in automation from University of Science and Technology of China (USTC), China in 2007, and the Ph. D. degree in pattern recognition and intelligent system from NLPR, Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2012. She is currently an associate professor at CASIA, China. She has published more than 50 papers in related journals and conferences, such as Speech Communication, International Conference on Acoustics, Speech and Signal Processing (ICASSP), INTERSPEECH, and International Conference on Affective Computing and Intelligent Interaction (ACII). She won the Second Prize of the Beijing Science and Technology Award in 2014 and the Best Student Paper award at Interspeech 2016.

Her interests include affective computing and human-computer interaction.

Jian-Hua Tao received the Ph. D. degree in computer science from Tsinghua University, China in 2001. He is a winner of the National Science Fund for Distinguished Young Scholars and the deputy director of NLPR, CASIA, China. He has directed many national projects, including projects of the “863” program and the National Natural Science Foundation of China. He has published more than eighty papers in journals and proceedings, including IEEE Transactions on ASLP, ICASSP, and INTERSPEECH. He also serves as a steering committee member for IEEE Transactions on Affective Computing and as a chair or program committee member for major conferences, including International Conference on Pattern Recognition (ICPR) and Interspeech.

His research interests include speech synthesis, affective computing and pattern recognition.

Jian Huang received the B. Eng. degree in automation from Wuhan University, China in 2015. He is a Ph. D. degree candidate in pattern recognition and intelligent system at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China. He has published papers at Interspeech and ICASSP.

His research interests include affective computing, deep learning and multimodal emotion recognition.

Ming-Yue Niu received the M. Sc. degree in information and computing science from Department of Applied Mathematics, Northwestern Polytechnical University (NWPU), China in 2017. Currently, he is a Ph. D. degree candidate in pattern recognition and intelligent system at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), China.

His research interests include affective computing and human-computer interaction.

Cite this article

Lian, Z., Li, Y., Tao, J. et al. Expression Analysis Based on Face Regions in Real-world Conditions. Int. J. Autom. Comput. 17, 96–107 (2020). https://doi.org/10.1007/s11633-019-1176-9

Keywords

  • Facial emotion analysis
  • Face areas
  • Class activation map
  • Confusion matrix
  • Concerned area