Two-Stage Adaptation for Cross-Corpus Multimodal Emotion Recognition

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14303)

Abstract

The development of multimodal emotion recognition is severely limited by costly and time-consuming annotation. In this paper, we focus on the multimodal emotion recognition task in the cross-corpus setting, where a model trained on a labeled source corpus must be adapted to an unlabeled target corpus. Inspired by recent advances in pre-trained models, we adopt a multimodal emotion pre-trained model to provide a stronger representation-learning foundation for our task. However, applying a pre-trained model to this cross-corpus downstream task exposes two domain gaps: the scenario gap between the pre-training and downstream corpora, and the distribution gap between the source and target downstream sets. To bridge these two gaps, we propose a two-stage adaptation method. Specifically, we first adapt the pre-trained model to the task-related scenario through task-adaptive pre-training. We then fine-tune the model with a cluster-based loss that aligns the distributions of the two downstream sets in a class-conditional manner. Additionally, we propose a ranking-based pseudo-label filtering strategy to obtain more balanced, higher-quality samples from the target set for computing the cluster-based loss. We conduct extensive experiments on two emotion datasets, IEMOCAP and MSP-IMPROV, and the results demonstrate the effectiveness of the proposed two-stage adaptation method and pseudo-label filtering strategy in cross-corpus settings.
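To make the second stage more concrete, the sketch below illustrates in PyTorch one plausible form of the two components named in the abstract: a class-conditional cluster-based alignment loss between the source and target downstream sets, and ranking-based pseudo-label filtering that keeps the top-ranked target samples per class. This is a minimal sketch based only on the abstract; the function names, the mean-squared centroid distance, and the per-class budget `per_class_k` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cluster_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo, num_classes):
    """Class-conditional alignment (assumed form): pull the per-class
    feature centroids of the source and target sets toward each other."""
    loss = src_feats.new_zeros(())
    matched = 0
    for c in range(num_classes):
        s = src_feats[src_labels == c]
        t = tgt_feats[tgt_pseudo == c]
        if len(s) > 0 and len(t) > 0:
            # Distance between the class-c centroids of the two sets.
            loss = loss + F.mse_loss(s.mean(dim=0), t.mean(dim=0))
            matched += 1
    return loss / max(matched, 1)

def ranking_based_filter(tgt_logits, per_class_k):
    """Ranking-based pseudo-label filtering (assumed form): rank target
    samples by confidence within each predicted class and keep the top-k,
    yielding a more balanced, higher-quality pseudo-labeled subset."""
    probs = tgt_logits.softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)  # per-sample confidence and pseudo-label
    keep = torch.zeros_like(pseudo, dtype=torch.bool)
    for c in pseudo.unique():
        idx = (pseudo == c).nonzero(as_tuple=True)[0]
        ranked = idx[conf[idx].argsort(descending=True)]
        keep[ranked[:per_class_k]] = True  # per-class top-k keeps classes balanced
    return keep, pseudo
```

Under these assumptions, stage-two fine-tuning would apply `ranking_based_filter` to the model's target-set logits and add `cluster_alignment_loss`, weighted by a trade-off hyperparameter, to the supervised loss on the source set.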



Acknowledgements

This work was partially supported by the National Key R&D Program of China (No. 2020AAA0108600) and the National Natural Science Foundation of China (No. 62072462).

Author information

Corresponding author

Correspondence to Qin Jin.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, Z., Zhao, J., Jin, Q. (2023). Two-Stage Adaptation for Cross-Corpus Multimodal Emotion Recognition. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science (LNAI), vol. 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_34

  • DOI: https://doi.org/10.1007/978-3-031-44696-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44695-5

  • Online ISBN: 978-3-031-44696-2

  • eBook Packages: Computer Science, Computer Science (R0)
