Semi-supervised learning for facial expression-based emotion recognition in the continuous domain


Emotion recognition is an important technique for effective interaction between humans and artificial intelligence (AI) systems. Facial expression-based methods have long been actively studied, and they now achieve high recognition performance thanks to recent advances in deep learning. However, the image sequences in the datasets used in conventional emotion recognition studies are usually short and often consist of intentionally posed expressions. In addition, annotating emotional labels in the continuous domain makes dataset construction costly. To overcome these problems, this paper proposes an emotion recognition method based on semi-supervised learning that uses an appropriate amount of unlabeled data alongside the labeled data, while minimizing the amount of labeled data, which is costly to obtain. The proposed method is built on a CNN-LSTM regressor that regresses arousal and valence in the continuous domain. In addition, through experiments on the well-known MAHNOB-HCI and AFEW-VA datasets, we present scenarios and design criteria under which semi-supervised learning can be applied effectively to emotion recognition tasks.
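The abstract does not state the exact training objective, so the following is only a minimal sketch of one common way to combine labeled and unlabeled data for valence-arousal regression: a supervised mean-squared error on the labeled clips plus a consistency term that asks two predictions of the same clip to agree. The function names, the `consistency_weight` parameter, and the idea of a separate "teacher" prediction are assumptions made for illustration, not the authors' formulation.

```python
from typing import Optional, Sequence, Tuple

# A prediction or label is a (valence, arousal) pair in the continuous domain.
Pair = Tuple[float, float]


def mse(pred: Pair, target: Pair) -> float:
    """Mean squared error between two (valence, arousal) pairs."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def semi_supervised_loss(
    student_preds: Sequence[Pair],
    teacher_preds: Sequence[Pair],
    labels: Sequence[Optional[Pair]],
    consistency_weight: float = 1.0,
) -> float:
    """Hypothetical combined loss (an assumption, not the paper's objective).

    labels[i] is None when clip i is unlabeled. The supervised term is
    averaged over labeled clips only; the consistency term is averaged
    over all clips, labeled or not.
    """
    supervised = [
        mse(s, y) for s, y in zip(student_preds, labels) if y is not None
    ]
    consistency = [mse(s, t) for s, t in zip(student_preds, teacher_preds)]

    sup_term = sum(supervised) / len(supervised) if supervised else 0.0
    con_term = sum(consistency) / len(consistency) if consistency else 0.0
    return sup_term + consistency_weight * con_term


# Two clips: the first is labeled, the second is not.
student = [(0.5, 0.0), (0.0, 0.0)]
teacher = [(0.5, 0.0), (0.0, 1.0)]
labels = [(1.0, 0.0), None]
loss = semi_supervised_loss(student, teacher, labels, consistency_weight=1.0)
```

In this sketch the unlabeled clip contributes only through the consistency term, which is how the unlabeled data can influence training without any arousal or valence annotation. In practice the predictions would come from a regressor such as the CNN-LSTM described above rather than from hand-written tuples.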





This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University)] and by the Industrial Technology Innovation Program through the Ministry of Trade, Industry, and Energy (MI, Korea) [Development of Human-Friendly Human-Robot Interaction Technologies Using Human Internal Emotional States] under Grant 10073154.

Author information



Corresponding author

Correspondence to Byung Cheol Song.


Cite this article

Choi, D.Y., Song, B.C. Semi-supervised learning for facial expression-based emotion recognition in the continuous domain. Multimed Tools Appl 79, 28169–28187 (2020).


  • Emotion recognition
  • Semi-supervised learning
  • Convolutional neural network
  • Long short-term memory