Abstract
Facial expression recognition (FER) is a popular research topic in computer vision. Most deep-learning FER methods achieve satisfactory results when trained and evaluated on a single dataset, but applying them to a new dataset incurs the additional cost of labeling the new data. Cross-dataset FER also suffers from difficulties such as data discrepancy and expression ambiguity. To address these issues, we propose an Unsupervised Self-Training Similarity Transfer (USTST) method for cross-domain FER. A Cross-Swin-Transformer (CST) module is designed to extract features and assign greater attention weight to similar regions of the source-domain and target-domain images. Self-Training Resampling (STR) and Knowledge Transfer (KT) modules are then constructed to improve the confidence of the model's predictions on the target domain. We also design an ambiguity suppression loss and a cross-domain loss to improve the model's ability to discriminate expressions while transferring knowledge across domains. Experimental results with RAF-DB as the source domain and CK+, JAFFE, SFEW, FER2013 and ExpW as the target domains show that our approach achieves much higher performance than state-of-the-art cross-domain FER methods, while requiring no labels from the new datasets.
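Self-training, in its common generic form, uses a source-trained model to predict labels for unlabeled target images and keeps only the predictions whose confidence exceeds a threshold as pseudo-labels for further training. The plain-Python sketch below illustrates that generic pseudo-labeling step only; it is not the STR or KT module, whose resampling and transfer details are not given in this abstract. The function names, the 0.9 threshold and the example logits over seven expression classes are illustrative assumptions.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(
    target_logits: List[List[float]], threshold: float = 0.9
) -> List[Tuple[int, int, float]]:
    """Return (sample_index, pseudo_label, confidence) for confident target samples.

    Hypothetical helper showing generic confidence-thresholded self-training;
    the default threshold is an assumption, not a value from the USTST paper.
    """
    selected = []
    for i, logits in enumerate(target_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)  # argmax class
        conf = probs[label]
        if conf >= threshold:  # keep only confident predictions
            selected.append((i, label, conf))
    return selected


# Hypothetical logits for three unlabeled target images over 7 expression classes.
logits = [
    [5.0, 0.1, 0.2, 0.0, 0.3, 0.1, 0.0],  # confident: class 0 is kept
    [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0],  # ambiguous: probabilities spread, rejected
    [0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0],  # confident: class 5 is kept
]
print([(i, y) for i, y, _ in select_pseudo_labels(logits)])  # → [(0, 0), (2, 5)]
```

Raising the threshold typically yields fewer but more reliable pseudo-labels; ambiguous expressions, whose predicted probabilities are spread across several classes, are the first to be filtered out.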
Data Availability Statements
The data that support the findings of this study are openly available in public repositories:
• RAF-DB: http://whdeng.cn/RAF/model1.html/data-set
• CK+: http://www.jeffcohn.net/Resources/
• JAFFE: https://zenodo.org/record/3451524.ZGrmn-3ZByUk
• SFEW: https://cs.anu.edu.au/few/emotiw2015.html
• FER2013: https://www.kaggle.com/datasets/msam-bare/fer2013
• ExpW: http://mmlab.ie.cuhk.edu.hk/projects/social-relation/index.html
Change history
07 December 2023
A Correction to this paper has been published: https://doi.org/10.1007/s11042-023-17794-5
References
Mijwil MM (2022) Has the future started? The current growth of artificial intelligence, machine learning, and deep learning. Iraqi J Comput Sci Math. Corpus ID: 249688145
Zhuang F, Qi Z, Duan K et al (2020) A comprehensive survey on transfer learning. Proc IEEE 109(1):43–76
Liang J, Hu D, Wang Y et al (2021) Source data-absent unsupervised domain adaptation through hypothesis transfer and labeling transfer. IEEE Trans Pattern Anal Mach Intell 44(11):8602–8617
Li S, Deng W (2020) A deeper look at facial expression dataset bias. IEEE Trans Affect Comput 13(2):881–893
Xie Y, Chen T, Pu T et al (2020) Adversarial graph representation adaptation for cross-domain facial expression recognition. In: Proceedings of the 28th ACM International Conference on Multimedia, pp 1255–1264
Yang F, Xie W, Zhong T et al (2022) Augmented feature representation with parallel convolution for cross-domain facial expression recognition. In: Biometric Recognition: 16th Chinese Conference, CCBR 2022, Beijing, China, November 11–13, 2022, Proceedings. Springer Nature Switzerland, Cham, pp 297–306
Xie Y, Gao Y, Lin J et al (2022) Learning consistent global-local representation for cross-domain facial expression recognition. 26th International conference on pattern recognition (ICPR). IEEE, pp 2489–2495
Xu T, Chen W, Wang P et al (2021) Cdtrans: cross-domain transformer for unsupervised domain adaptation. arXiv preprint arXiv:2109.06165
Ganin Y, Ustinova E, Ajakan H et al (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17(1):2096–2030
Pan SJ, Tsang IW, Kwok JT et al (2010) Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 22(2):199–210
Lucey P, Cohn JF, Kanade T et al (2010) The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, pp 94–101
Lyons M, Akamatsu S, Kamachi M, Gyoba J (1998) Coding facial expressions with Gabor wavelets. In: Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, pp 200–205. https://doi.org/10.1109/AFGR.1998.670949
Lyons MJ (2021) “Excavating AI” Re-excavated: debunking a fallacious account of the JAFFE dataset. arXiv:2107.13998
Dhall A, Goecke R, Lucey S, Gedeon T (2011) Static facial expression analysis in tough conditions: data, evaluation protocol and benchmark. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, pp 2106–2112. https://doi.org/10.1109/ICCVW.2011.6130508
Goodfellow IJ, Erhan D, Carrier PL et al (2013) Challenges in representation learning: a report on three machine learning contests. In: International Conference on Neural Information Processing. Springer, Berlin, Heidelberg, pp 117–124
Zhang Z, Luo P, Loy C-C, Tang X (2015) Learning social relation traits from face images. In: 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, pp 3631–3639. https://doi.org/10.1109/ICCV.2015.414
Li S, Deng W, Du J (2017) Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. Proc IEEE Conf Comput Vis Pattern Recognit 2017:2852–2861
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Mohan K, Seal A, Krejcar O, Yazidi A (2021) Facial expression recognition using local gravitational force descriptor-based deep convolution neural networks. IEEE Trans Instrum Meas 70:1–12
Wang K, Peng X, Yang J, Meng D, Qiao Y (2020) Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans Image Process 29:4057–4069
She J, Hu Y, Shi H, Wang J, Shen Q, Mei T (2021) Dive into ambiguity: latent distribution mining and pairwise uncertainty estimation for facial expression recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6248–6257
Ruan D, Yan Y, Lai S, Chai Z, Shen C, Wang H (2021) Feature decomposition and reconstruction learning for effective facial expression recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7656–7665
Zhang X, Zhang F, Xu C (2022) Joint expression synthesis and representation learning for facial expression recognition. IEEE Trans Circ Syst Video Technol 32(3):1681–1695
Long M, Cao Z, Wang J et al (2018) Conditional adversarial domain adaptation. Adv Neural Inf Process Syst 31
Xu R, Li G, Yang J et al (2019) Larger norm more transferable: an adaptive feature norm approach for unsupervised domain adaptation. Proc IEEE/CVF Int Conf Comput Vis 2019:1426–1435
Lee C-Y, Batra T, Baig MH et al (2019) Sliced wasserstein discrepancy for unsupervised domain adaptation. Proc IEEE/CVF Conf Comput Vis Pattern Recognit 2019:10285–10295
Li S, Deng W (2018) Deep emotion transfer network for cross-database facial expression recognition. In: 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, pp 3092–3099
Chen T, Pu T, Wu H, Xie Y, Liu L, Lin L (2021) Cross-domain facial expression recognition: a unified evaluation benchmark and adversarial graph learning. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3131222
Ji Y, Hu Y, Yang Y et al (2021) Region attention enhanced unsupervised cross-domain facial emotion recognition. IEEE Trans Knowl Data Eng
Li Y, Zhang Z, Chen B et al (2022) Deep margin-sensitive representation learning for cross-domain facial expression recognition. IEEE Trans Multimedia
Peng X, Gu Y, Zhang P (2022) AU-guided unsupervised domain-adaptive facial expression recognition. Appl Sci 12(9):4366
Xu X, Zheng W, Zong Y et al (2022) Sample self-revised network for cross-dataset facial expression recognition. International joint conference on neural networks (IJCNN). IEEE, pp 1–8
Cubuk ED, Zoph B, Shlens J et al (2020) RandAugment: practical automated data augmentation with a reduced search space. Proc IEEE/CVF Conf Comput Vis Pattern Recognit Workshops 2020:702–703
Dosovitskiy A, Beyer L, Kolesnikov A et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Liu Z, Lin Y, Cao Y et al (2021) Swin transformer: hierarchical vision transformer using shifted windows. Proc IEEE/CVF Int Conf Comput Vis 2021:10012–10022
Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. Proc IEEE Conf Comput Vis Pattern Recognit 2015:815–823
Kiran A, Qureshi SA, Khan A, Mahmood S, Idrees M, Saeed A, Assam M, Refaai MRA, Mohamed A (2022) Reverse image search using deep unsupervised generative learning and deep convolutional neural network. Appl Sci 12(10):4943
Paszke A, Gross S, Massa F et al (2019) PyTorch: an imperative style, high-performance deep learning library. Adv Neural Inf Process Syst 32
Zhang K, Zhang Z, Li Z et al (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. Proc IEEE Conf Comput Vis Pattern Recognit 2016:770–778
Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11)
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 62071384 and 62371399, the Key Research and Development Project of Shaanxi Province under Grant 2023-YBGY-239, and the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2023-JC-YB-531.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the original article contained errors in references 12–16. The article has been corrected.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, Z., Wei, B., Liu, J. et al. USTST: unsupervised self-training similarity transfer for cross-domain facial expression recognition. Multimed Tools Appl 83, 41703–41723 (2024). https://doi.org/10.1007/s11042-023-17317-2
DOI: https://doi.org/10.1007/s11042-023-17317-2