Improved curriculum learning using SSM for facial expression recognition

  • Xiaoqian Liu
  • Fengyu Zhou
Original Article


Facial expression recognition is an important research topic in pattern recognition, but model generalization remains a challenge. In this paper, we apply curriculum learning to the training stage of facial expression recognition and propose a novel curriculum design method. The system first uses unsupervised density–distance clustering to determine the cluster center of each category. The dataset is then divided into three subsets of increasing complexity according to the distance from each sample to its cluster center in feature space. We develop a multistage training process in which a main model is trained while harder samples are progressively added to the training set. To address the main model's poor recognition accuracy for anger, fear and sadness, a self-selection mechanism (SSM) is introduced at the test stage to make a further judgment on the main model's result. Experimental results indicate that the proposed model achieves a recognition accuracy of 72.11% on FER-2013 and 98.18% on CK+ for 7-class facial expressions, outperforming other state-of-the-art methods.
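The curriculum design described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the features of a single expression category are already extracted into a NumPy array, picks the cluster center with the density-peak heuristic of Rodriguez and Laio (largest product of local density and distance to a denser point), and splits the samples into three equal-sized subsets by distance to that center. The function names, the cutoff quantile and the equal-sized split are all assumptions made for illustration.

```python
import numpy as np


def density_peak_center(features, cutoff_quantile=0.6):
    """Return (index of the density-peak center, pairwise distance matrix).

    `cutoff_quantile` is a hypothetical parameter used to pick the cutoff
    distance d_c; it is not taken from the paper.
    """
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(features)
    # Cutoff distance d_c: a quantile of all pairwise distances (assumption).
    d_c = np.quantile(dist[np.triu_indices(n, k=1)], cutoff_quantile)
    # Local density rho_i: number of other samples closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # delta_i: distance to the nearest sample of higher density; the densest
    # samples get their largest distance to any sample instead.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return int(np.argmax(rho * delta)), dist


def split_by_complexity(features, n_subsets=3):
    """Split sample indices into subsets ordered from easy to hard."""
    center, dist = density_peak_center(features)
    order = np.argsort(dist[center])  # closest to the center first
    return [np.sort(part) for part in np.array_split(order, n_subsets)]


def curriculum_stages(subsets):
    """Cumulative training sets: easy, easy + medium, easy + medium + hard."""
    stages, seen = [], np.array([], dtype=int)
    for subset in subsets:
        seen = np.concatenate([seen, subset])
        stages.append(seen.copy())
    return stages


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(30, 4))  # stand-in for one category's features
    for stage, idx in enumerate(curriculum_stages(split_by_complexity(feats)), 1):
        print(f"stage {stage}: train on {len(idx)} samples")
```

In a full pipeline this split would be computed per category, and the main model would be trained on each stage's index set in turn, so that every stage extends the previous training set with harder samples.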


Curriculum learning, Density–distance clustering, Facial expression, Recognition



This study was funded by the Key Program of Scientific and Technological Innovation of Shandong Province (Grant No. 2017CXGC0926), the Key Research and Development Program of Shandong Province (Grant No. 2017GGX30133), the National Key Research and Development Program of China (Grant No. 2017YFB1302400), and the National Natural Science Foundation of China (Grant No. 61773242).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Control Science and Engineering, Shandong University, Jinan, China
