
Semantic fusion of facial expressions and textual opinions from different datasets for learning-centered emotion recognition


Abstract

Learning-centered emotions play a significant role in the cognitive processes involved in learning. For this reason, it is important that virtual learning environments consider both the cognitive and affective states of the student. Artificial intelligence methods such as facial expression recognition and sentiment analysis have proven to be excellent alternatives for the automatic recognition of emotions. However, learning-centered emotion datasets and opinion-based sentiment datasets commonly contain a single modality, and single modalities cannot effectively represent the complex emotions that occur in real life. This work presents three fusion methods applied to three image-based and text-based datasets for learning-centered emotion recognition. Using conventional deep learning architectures, the three new multimodal datasets showed promising results when compared with similar architectures trained on unimodal data. One of the methods (an embedding-based representation) improved results by 4% compared with single-modality hyperparameter optimization. The main objective of this study is to assess the viability of the semantic fusion of multimodal learning-centered emotion data from different datasets for intelligent tutoring system applications.
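The abstract describes an embedding-based (feature-level) fusion of a facial-expression branch and an opinion-text branch. As a rough illustration only, the sketch below shows one way such a fusion classifier could be wired up; the CNN layout, embedding sizes, input resolution, and the four learning-centered emotion classes (e.g., engaged, bored, frustrated, confused) are assumptions made for the example, not the architecture reported in the paper.

```python
# Minimal sketch of embedding-based fusion of a face image and an opinion text.
# All sizes, class labels, and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class EmbeddingFusionClassifier(nn.Module):
    def __init__(self, vocab_size=10000, text_dim=128, num_classes=4):
        super().__init__()
        # Image branch: small CNN over 48x48 grayscale face crops (assumed size).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 256), nn.ReLU(),
        )
        # Text branch: mean-pooled word embeddings of the student's opinion.
        self.embed = nn.EmbeddingBag(vocab_size, text_dim, mode="mean")
        self.text_fc = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU())
        # Fusion: concatenate the two 256-d embeddings and classify.
        self.classifier = nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, num_classes),
        )

    def forward(self, face, token_ids, offsets):
        img_emb = self.cnn(face)                                 # (B, 256)
        txt_emb = self.text_fc(self.embed(token_ids, offsets))   # (B, 256)
        fused = torch.cat([img_emb, txt_emb], dim=1)             # (B, 512)
        return self.classifier(fused)

# Example forward pass with dummy data (two face/opinion pairs).
model = EmbeddingFusionClassifier()
faces = torch.randn(2, 1, 48, 48)
tokens = torch.tensor([1, 5, 9, 3, 7])
offsets = torch.tensor([0, 3])          # opinions: tokens[0:3] and tokens[3:]
logits = model(faces, tokens, offsets)  # shape (2, 4)
```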


Data Availability

The datasets generated and/or analyzed during the current study are not publicly available due to the privacy of the subjects in the datasets, but are available from the corresponding author on reasonable request.


Funding

The work described in this paper was fully supported by a scholarship from CONACYT (Consejo Nacional de Ciencia y Tecnología) in México.

Author information


Corresponding author

Correspondence to Ramón Zatarain-Cabada.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cárdenas-López, H.M., Zatarain-Cabada, R., Barrón-Estrada, M.L. et al. Semantic fusion of facial expressions and textual opinions from different datasets for learning-centered emotion recognition. Soft Comput 27, 17357–17367 (2023). https://doi.org/10.1007/s00500-023-08076-1



  • DOI: https://doi.org/10.1007/s00500-023-08076-1
