Abstract
To address the limited information and poor recognition rates of single-mode emotion recognition, a multimodal emotion recognition method based on a deep belief network is proposed. First, speech and expression signals are preprocessed and features are extracted to obtain high-level features for each single-mode signal. Then, the high-level speech and expression features are fused using a bimodal deep belief network (BDBN), which yields multimodal fusion features for classification and removes redundant information between modes. Finally, the multimodal fusion features are classified with LIBSVM to produce the final emotion recognition result. The proposed model is evaluated experimentally on the Friends data set. The results show that the multimodal fusion features achieve the best recognition accuracy, 90.89%, and that the unweighted recognition accuracy of the proposed model is 86.17%, which is higher than that of the comparison methods and indicates research value and practical applicability.
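The three stages named in the abstract (single-mode features, bimodal fusion, LIBSVM classification) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic features in place of the Friends data, one scikit-learn `BernoulliRBM` per modality plus a joint RBM as a simplified stand-in for the BDBN, and `sklearn.svm.SVC` (which is built on LIBSVM) as the classifier. All layer sizes, the helper name `fuse_bimodal`, and the reading of "unweighted accuracy" as the mean of per-class recalls are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for the high-level single-mode features (assumed shapes).
n_samples, n_classes = 300, 3
labels = rng.integers(0, n_classes, size=n_samples)
speech = rng.normal(size=(n_samples, 40)) + 0.8 * labels[:, None]
face = rng.normal(size=(n_samples, 60)) + 0.8 * labels[:, None]

# Simple hold-out split: first 200 samples for training, the rest for testing.
is_train = np.arange(n_samples) < 200


def fuse_bimodal(speech_tr, face_tr, speech_te, face_te, seed=0):
    """Hypothetical helper: one RBM per modality, then a joint RBM on top."""
    hidden_tr, hidden_te = [], []
    for i, (x_tr, x_te) in enumerate([(speech_tr, speech_te), (face_tr, face_te)]):
        # BernoulliRBM expects inputs in [0, 1]; the scaler is fit on training data only.
        scaler = MinMaxScaler().fit(x_tr)
        rbm = BernoulliRBM(n_components=32, learning_rate=0.05,
                           n_iter=20, random_state=seed + i)
        hidden_tr.append(rbm.fit_transform(scaler.transform(x_tr)))
        hidden_te.append(rbm.transform(np.clip(scaler.transform(x_te), 0.0, 1.0)))
    # The joint layer sees both modalities' hidden activations side by side.
    joint = BernoulliRBM(n_components=24, learning_rate=0.05,
                         n_iter=20, random_state=seed + 2)
    fused_tr = joint.fit_transform(np.hstack(hidden_tr))
    fused_te = joint.transform(np.hstack(hidden_te))
    return fused_tr, fused_te


fused_train, fused_test = fuse_bimodal(
    speech[is_train], face[is_train], speech[~is_train], face[~is_train]
)

# SVC wraps LIBSVM; the RBF kernel is an assumed choice.
classifier = SVC(kernel="rbf", C=1.0, gamma="scale")
classifier.fit(fused_train, labels[is_train])
predicted = classifier.predict(fused_test)

# Unweighted accuracy, assumed here to mean the average of per-class recalls.
unweighted_accuracy = recall_score(labels[~is_train], predicted, average="macro")
print(f"unweighted accuracy on synthetic data: {unweighted_accuracy:.3f}")
```

The RBMs here are trained without supervision and no fine-tuning step is included, so the printed score describes only this toy setup and is not comparable to the figures reported in the abstract.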
Acknowledgments
This work was supported in part by the Natural Science Foundation of Shandong Province of China under Grant ZR2016AM30, in part by the Social Science Planning Research Project of Shandong Province under Grant 18CLYJ50, in part by the Shandong Soft Science Research Program under Grant 2018RKB01144, and in part by the Project of Shandong Province Higher Educational Science and Technology Program under Grant J15LN15.
About this article
Cite this article
Liu, D., Chen, L., Wang, Z. et al. Speech Expression Multimodal Emotion Recognition Based on Deep Belief Network. J Grid Computing 19, 22 (2021). https://doi.org/10.1007/s10723-021-09564-0
DOI: https://doi.org/10.1007/s10723-021-09564-0