Two-Level Fusion to Improve Emotion Classification in Spoken Dialogue Systems
This paper proposes a technique to enhance emotion classification in spoken dialogue systems by means of two fusion modules. The first combines the emotion predictions generated by a set of classifiers, each of which deals with a different kind of information about the sentence uttered by the user. To do this, the module employs several fusion methods, each producing its own prediction of the user's emotional state. These predictions are the input to the second fusion module, where they are combined to deduce the user's emotional state. Experiments have been carried out considering two emotion categories ('Non-negative' and 'Negative') and classifiers that deal with prosodic, acoustic, lexical and dialogue-act information. The results show that the first fusion module significantly increases the classification rates of a baseline and of the classifiers working separately, as has been observed previously in the literature. The novelty of the technique is the inclusion of the second fusion module, which improves the classification rate by 2.25% absolute.
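The two-level architecture can be sketched as follows. This is a minimal illustrative example, not the paper's actual method: the base-classifier scores, the three fusion methods (mean, max, majority vote) and the 0.5 decision threshold are all assumptions chosen for clarity.

```python
# Hypothetical sketch of two-level fusion for binary emotion classification.
# Base classifiers (prosodic, acoustic, lexical, dialogue-act) each output a
# probability that the utterance is 'Negative'.

def first_fusion(probs):
    """First module: apply several fusion methods to the base-classifier
    probabilities; each method yields its own label prediction."""
    mean_pred = "Negative" if sum(probs) / len(probs) > 0.5 else "Non-negative"
    max_pred = "Negative" if max(probs) > 0.5 else "Non-negative"
    votes = sum(p > 0.5 for p in probs)
    vote_pred = "Negative" if votes > len(probs) / 2 else "Non-negative"
    return [mean_pred, max_pred, vote_pred]

def second_fusion(predictions):
    """Second module: majority vote over the first-module predictions
    to deduce the final emotional state."""
    negatives = sum(p == "Negative" for p in predictions)
    return "Negative" if negatives > len(predictions) / 2 else "Non-negative"

# Example: hypothetical scores from the four base classifiers.
probs = [0.7, 0.6, 0.4, 0.8]
label = second_fusion(first_fusion(probs))
print(label)  # prints "Negative"
```

The second module need not be a majority vote; any combiner over the first-level predictions (e.g. a weighted vote or a trained meta-classifier) fits the same two-stage structure.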
Keywords: Emotion Recognition · Fusion Method · Emotion Category · Speech Database · Voice Sample