
Expression Analysis Based on Face Regions in Real-world Conditions

  • Research Article

International Journal of Automation and Computing

Abstract

Facial emotion recognition is an essential aspect of human-machine interaction. Past research on facial emotion recognition has focused on laboratory environments; in real-world conditions, however, it faces many challenges, e.g., illumination changes, large pose variations and partial or full occlusions. These challenges leave different face areas with different degrees of sharpness and completeness. Inspired by this fact, we focus on the authenticity of predictions generated by different <emotion, region> pairs. For example, if only the mouth area is available and the emotion classifier predicts happiness, how should the authenticity of that prediction be judged? This question can be recast as the contribution of different face areas to different emotions. In this paper, we divide the whole face into six areas: the nose, mouth, eyes, nose-to-mouth, nose-to-eyes and mouth-to-eyes areas. To obtain more convincing results, our experiments are conducted on three different databases: facial expression recognition+ (FER+), the real-world affective faces database (RAF-DB) and the expression in-the-wild (ExpW) dataset. Through analysis of the classification accuracy, the confusion matrix and the class activation map (CAM), we draw convincing conclusions. To sum up, the contributions of this paper are twofold: 1) we visualize the face areas that the recognition model attends to in emotion recognition; 2) we analyze, through experiments, the contribution of different face areas to different emotions in real-world conditions. Our findings can be combined with findings in psychology to deepen the understanding of emotional expressions.
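To make the region scheme concrete, below is a minimal sketch of how the six areas could be cropped from detected facial landmarks. It assumes the standard 68-point layout (as produced by, e.g., dlib's shape predictor); the index ranges are the conventional iBUG groupings, while the padding factor and axis-aligned boxes are illustrative assumptions, not the authors' exact cropping rules.

```python
# Illustrative sketch: deriving the six face regions from 68-point
# facial landmarks. Index groupings follow the standard iBUG/dlib
# layout; the padding and box construction are assumptions.
import numpy as np

EYES = list(range(36, 48))    # left + right eye landmarks
NOSE = list(range(27, 36))    # nose bridge + nostrils
MOUTH = list(range(48, 68))   # outer + inner lip contours

def bbox(points, pad=0.15):
    """Axis-aligned box around a landmark subset, padded by a
    fraction of the subset's width/height."""
    pts = np.asarray(points, dtype=float)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    dx, dy = (x1 - x0) * pad, (y1 - y0) * pad
    return (int(x0 - dx), int(y0 - dy), int(x1 + dx), int(y1 + dy))

def union(a, b):
    """Smallest box covering two regions, e.g., 'nose to mouth'."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def six_regions(landmarks):
    """landmarks: (68, 2) array of (x, y) points for one aligned face.
    Returns bounding boxes for the six regions analyzed in the paper."""
    eyes = bbox(landmarks[EYES])
    nose = bbox(landmarks[NOSE])
    mouth = bbox(landmarks[MOUTH])
    return {
        "eyes": eyes,
        "nose": nose,
        "mouth": mouth,
        "nose_to_mouth": union(nose, mouth),
        "nose_to_eyes": union(nose, eyes),
        "mouth_to_eyes": union(mouth, eyes),
    }
```

Each box can then be cropped from the aligned face image and passed to the emotion classifier, so that per-region accuracies, confusion matrices and CAMs can be compared across the six <emotion, region> pairs.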



Acknowledgments

This work is supported by the National Key Research & Development Plan of China (No. 2017YFB1002804), the National Natural Science Foundation of China (Nos. 61425017, 61773379, 61332017, 61603390 and 61771472), and the Major Program for the National Social Science Fund of China (No. 13&ZD189).

Author information


Corresponding author

Correspondence to Ya Li.

Additional information

Zheng Lian received the B. Eng. degree in telecommunication from Beijing University of Posts and Telecommunications, China in 2016. He is a Ph. D. degree candidate in pattern recognition and intelligent system at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China.

His research interests include affective computing, deep learning and multimodal emotion recognition.

Ya Li received the B. Eng. degree in automation from the University of Science and Technology of China (USTC), China in 2007, and the Ph. D. degree in pattern recognition and intelligent system from NLPR, Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2012. She is currently an associate professor at CASIA, China. She has published more than 50 papers in related journals and conferences, such as Speech Communication, the International Conference on Acoustics, Speech and Signal Processing (ICASSP), INTERSPEECH, and the International Conference on Affective Computing and Intelligent Interaction (ACII). She won the Second Prize of the Beijing Science and Technology Award in 2014 and the Best Student Paper Award at INTERSPEECH 2016.

Her interests include affective computing and human-computer interaction.

Jian-Hua Tao received the Ph. D. degree in computer science from Tsinghua University, China in 2001. He is a winner of the National Science Fund for Distinguished Young Scholars and the deputy director of NLPR, CASIA, China. He has directed many national projects, including "863" projects and projects of the National Natural Science Foundation of China. He has published more than eighty papers in journals and conference proceedings, including IEEE Transactions on Audio, Speech, and Language Processing, ICASSP and INTERSPEECH. He also serves as a steering committee member for IEEE Transactions on Affective Computing, and as chair or program committee member for major conferences, including the International Conference on Pattern Recognition (ICPR), INTERSPEECH, etc.

His research interests include speech synthesis, affective computing and pattern recognition.

Jian Huang received the B. Eng. degree in automation from Wuhan University, China in 2015. He is a Ph. D. degree candidate in pattern recognition and intelligent system at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China. He has published papers at INTERSPEECH and ICASSP.

His research interests include affective computing, deep learning and multimodal emotion recognition.

Ming-Yue Niu received the M. Sc. degree in information and computing science from the Department of Applied Mathematics, Northwestern Polytechnical University (NWPU), China in 2017. Currently, he is a Ph. D. degree candidate in pattern recognition and intelligent system at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), China.

His research interests include affective computing and human-computer interaction.


About this article


Cite this article

Lian, Z., Li, Y., Tao, JH. et al. Expression Analysis Based on Face Regions in Real-world Conditions. Int. J. Autom. Comput. 17, 96–107 (2020). https://doi.org/10.1007/s11633-019-1176-9

