Facial Expression Recognition Based on Multi-scale CNNs
This paper proposes a new method for facial expression recognition, called multi-scale CNNs. The model consists of several sub-CNNs that take input images at different scales, so each sub-CNN benefits from its own input scale when learning its parameters. After training all sub-CNNs separately, we predict the facial expression of an image by extracting features from the last fully connected layer of each sub-CNN, averaging these features across scales, and mapping the averaged features to the final classification probabilities. Multi-scale CNNs classify facial expressions more accurately than any single-scale sub-CNN. On the Facial Expression Recognition 2013 (FER2013) database, multi-scale CNNs achieved an accuracy of 71.80% on the test set, which is comparable to other state-of-the-art methods.
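The fusion step described above can be sketched in a few lines. In this toy sketch, each trained sub-CNN is replaced by a hypothetical stand-in (a resize followed by a fixed random projection standing in for the last fully connected layer); the feature dimension, the scale set (32, 40, 48), and the classifier weights are all illustrative assumptions, not values from the paper. What matters is the structure: per-scale features are extracted, averaged, and mapped to class probabilities.

```python
import numpy as np

N_CLASSES = 7   # FER2013 defines seven expression categories
FEAT_DIM = 16   # hypothetical width of each sub-CNN's last FC layer

def resize(img, size):
    """Nearest-neighbour resize to size x size (stand-in for real preprocessing)."""
    idx = (np.arange(size) * img.shape[0] / size).astype(int)
    return img[np.ix_(idx, idx)]

def sub_cnn_features(img, scale):
    """Stand-in for one trained sub-CNN: resize the image to this sub-CNN's
    input scale, then project the pixels to a FEAT_DIM feature vector
    (mimicking the output of the last fully connected layer)."""
    x = resize(img, scale).ravel()
    rng = np.random.default_rng(scale)  # fixed weights per scale
    W = rng.normal(size=(FEAT_DIM, x.size)) / np.sqrt(x.size)
    return W @ x

def predict(img, scales=(32, 40, 48)):
    """Extract features at every scale, average them, and map the averaged
    feature vector to class probabilities with a shared linear classifier."""
    feats = np.stack([sub_cnn_features(img, s) for s in scales])
    avg = feats.mean(axis=0)                 # average features across scales
    rng = np.random.default_rng(0)
    W_cls = rng.normal(size=(N_CLASSES, FEAT_DIM))
    z = W_cls @ avg
    e = np.exp(z - z.max())
    return e / e.sum()                       # softmax over the 7 classes

probs = predict(np.random.default_rng(1).random((48, 48)))
```

A real implementation would train each sub-CNN on its own scale of the input and learn the final mapping; the averaging of last-layer features before classification is the part this sketch reproduces.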
Keywords: Facial expression recognition, Multi-scale CNNs, CNN, Deep learning, Pattern recognition
This work was supported by the National Key Research and Development Plan (Grant No. 2016YFC0801002), the Chinese National Natural Science Foundation Projects #61473291, #61572501, #61502491, #61572536, the Science and Technology Development Fund of Macau (No. 019/2014/A1), the NVIDIA GPU donation program, and AuthenMetric R&D Funds.