Abstract
This paper proposes a computationally efficient method for learning features robust to image variations for facial expression recognition (FER). The proposed method minimizes the feature difference between an image under a given variation and a corresponding target image captured under the best conditions for FER (i.e., a frontal face with uniform illumination). This is achieved by regularizing the objective function during training, where a Siamese network is employed. At the test stage, the learned network parameters are transferred to a single convolutional neural network (CNN), with which features robust to image variations can be obtained. Experiments were conducted on the Multi-PIE dataset to evaluate the proposed method under a large number of variations, including pose and illumination. The results show that the proposed method improves FER performance under these variations without requiring extra computational complexity at test time.
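The training objective described above can be sketched as a classification loss on the varied image plus a penalty on the distance between the features of the two Siamese branches, which share weights. The following is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the single tanh layer stands in for the CNN feature extractor, and the weighting factor `lam` and all variable names are hypothetical.

```python
import numpy as np

def extract_features(x, W):
    """Shared feature extractor (one tanh layer as a stand-in for a CNN)."""
    return np.tanh(x @ W)

def siamese_loss(x_var, x_target, y_onehot, W, V, lam=0.5):
    """Classification loss on the varied image plus a feature-difference
    regularizer that pulls its features toward those of the target image
    (frontal face, uniform illumination). lam balances the two terms."""
    f_var = extract_features(x_var, W)
    f_tgt = extract_features(x_target, W)  # same weights W: Siamese branches
    logits = f_var @ V                     # classifier on the varied branch
    p = np.exp(logits - logits.max())      # numerically stable softmax
    p /= p.sum()
    cls_loss = -np.log(p[y_onehot.argmax()] + 1e-12)
    reg = np.sum((f_var - f_tgt) ** 2)     # feature-difference penalty
    return cls_loss + lam * reg

# Usage: with identical inputs the penalty vanishes and only the
# classification loss remains; at test time only extract_features
# (and the classifier) is needed, so inference cost is that of one CNN.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W = rng.normal(size=(8, 4))
V = rng.normal(size=(4, 6))
y = np.eye(6)[2]
loss = siamese_loss(x, x + 1.0, y, W, V)
```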
Keywords
- Deep learning
- Convolutional neural networks
- Siamese network
- Facial expression recognition
Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2015R1A2A2A01005724).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Baddar, W.J., Kim, D.H., Ro, Y.M. (2017). Learning Features Robust to Image Variations with Siamese Networks for Facial Expression Recognition. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_16
Print ISBN: 978-3-319-51810-7
Online ISBN: 978-3-319-51811-4