Speech Expression Multimodal Emotion Recognition Based on Deep Belief Network


To address the limited information and low recognition rates of single-modality emotion recognition, a multimodal emotion recognition method based on a deep belief network is proposed. First, speech and expression signals are preprocessed and features are extracted to obtain high-level features for each modality. Then, the high-level speech and expression features are fused using a bimodal deep belief network (BDBN), which yields multimodal fusion features for classification and removes redundant information between modalities. Finally, the multimodal fusion features are classified with LIBSVM to produce the final emotion recognition result. The proposed model is evaluated experimentally on the Friends data set. The results show that the multimodal fusion features achieve the best recognition accuracy, 90.89%, and that the proposed model reaches an unweighted recognition accuracy of 86.17%, outperforming the comparison methods, which indicates its research value and practical applicability.
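The abstract reports both a recognition accuracy and an unweighted recognition accuracy without defining the latter. The sketch below illustrates one common reading from the speech emotion recognition literature, in which unweighted accuracy is the mean of per-class recalls, so that rare emotion classes count as much as frequent ones. This definition is an assumption rather than something stated in the abstract, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified samples / all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition): each class counts equally."""
    totals = Counter(y_true)                                   # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["neutral"] * 8 + ["anger"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "anger"]

wa = weighted_accuracy(y_true, y_pred)    # 9 of 10 samples correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 1.0 and 0.5
print(f"weighted: {wa:.2f}, unweighted: {ua:.2f}")
```

On imbalanced data the two figures can diverge noticeably, which may explain why an unweighted accuracy is reported separately from the best overall accuracy.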

This work was supported in part by the Natural Science Foundation of Shandong Province of China under Grant ZR2016AM30, in part by the Social Science Planning Research Project of Shandong Province under Grant 18CLYJ50, in part by the Shandong Soft Science Research Program under Grant 2018RKB01144, and in part by the Project of Shandong Province Higher Educational Science and Technology Program under Grant J15LN15.

  • Bimodal deep belief network
  • Speech signal
  • Expression signal
  • Multimodal emotion recognition