Hierarchical fusion of visual and physiological signals for emotion recognition


Emotion recognition is an attractive and essential topic in image and signal processing. In this paper, we propose a multi-level fusion method that combines visual information and physiological signals for emotion recognition. For visual information, we propose a serial fusion of two-stage features to enhance the representation of facial expression in a video sequence: we integrate the Neural Aggregation Network with the feature maps of a Convolutional Neural Network to reinforce the emotionally vital frames. For physiological signals, we propose a parallel fusion scheme to widen the band of the annotation of the electroencephalogram (EEG) signals: we extract the frequency feature with Linear-Frequency Cepstral Coefficients (LFCC) and enhance it with the signal complexity, measured by Sample Entropy (SampEn). In the classification stage, we perform both feature-level and decision-level fusion of the visual and physiological information. Experimental results validate the effectiveness of the proposed multi-level, multi-modal feature representation method.
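To make the signal-complexity feature mentioned above concrete, the following is a minimal sketch of Sample Entropy for a one-dimensional signal. It is not the implementation used in this paper: the function name `sample_entropy`, the embedding dimension `m=2`, the tolerance of 0.2 times the standard deviation, and the use of `<=` in the distance test are all assumptions chosen for illustration.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample Entropy of a 1-D signal (illustrative sketch, assumed parameters).

    m        : embedding dimension (assumed default, not taken from the paper)
    r_factor : tolerance as a fraction of the signal's standard deviation
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * x.std()

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev (max-norm) distance between every pair of templates.
        diff = np.abs(templates[:, None, :] - templates[None, :, :])
        dist = diff.max(axis=2)
        # Count ordered pairs within tolerance, excluding self-matches.
        return int(np.sum(dist <= r)) - templates.shape[0]

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        # Undefined when no matches exist; reported here as infinity.
        return float("inf")
    return float(-np.log(a / b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(300)
    regular = np.sin(2 * np.pi * t / 25)   # highly predictable signal
    irregular = rng.standard_normal(300)   # white noise
    print("sine :", sample_entropy(regular))
    print("noise:", sample_entropy(irregular))
```

In this sketch a regular signal such as a sine wave yields a lower value than white noise, which is the sense in which SampEn measures signal complexity. The pairwise distance matrix makes the memory cost quadratic in the signal length, so long EEG recordings would need to be processed in windows.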



  • Emotion recognition
  • Facial expression
  • Electroencephalogram