Adaptive On-Line Neural Network Retraining for Real Life Multimodal Emotion Recognition
Emotions play a major role in human-to-human communication, enabling people to express themselves beyond the verbal domain. In recent years, important advances have been made in unimodal speech and video emotion analysis, where facial expression information and prosodic audio features are treated independently. However, there is a clear need to combine the two modalities in a naturalistic context, where adaptation to specific human characteristics and expressivity is required and where a single modality alone cannot provide satisfactory evidence. This paper proposes appropriate neural network classifiers for multimodal emotion analysis within an adaptive framework, which is able to activate retraining of each modality whenever deterioration of the respective performance is detected. Results are presented based on the IST HUMAINE NoE naturalistic database; both facial expression information and prosodic audio features are extracted from the same data, and feature-based emotion analysis is performed through the proposed adaptive neural network methodology.
Keywords: Emotion Recognition; Network Weight; Gradient Projection Method; Neural Network Classifier; Weight Increment
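The retraining idea sketched in the abstract can be illustrated with a minimal, self-contained example: a per-modality classifier that monitors its accuracy on recent labelled samples and, when deterioration is detected, applies small corrective weight increments computed from those samples. The class and method names, the accuracy-threshold deterioration test, and the perceptron-style update below are illustrative assumptions, not the paper's exact formulation (which constrains the weight increments via a gradient projection method).

```python
import numpy as np


class AdaptiveModalityClassifier:
    """Sketch of adaptive retraining for one modality (face or prosody).

    Assumptions (not from the paper): a linear decision function, a fixed
    accuracy threshold as the deterioration criterion, and plain
    perceptron-style weight increments in place of constrained retraining.
    """

    def __init__(self, n_features, threshold=0.7, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.threshold = threshold  # minimum acceptable recent accuracy
        self.lr = lr                # step size for weight increments

    def predict(self, X):
        # Binary decision from the current linear model.
        return (X @ self.w + self.b > 0).astype(int)

    def needs_retraining(self, X, y):
        # Deterioration is detected when accuracy on recent labelled
        # samples drops below the threshold.
        return np.mean(self.predict(X) == y) < self.threshold

    def retrain(self, X, y, epochs=50):
        # Small weight increments driven only by the new data, standing in
        # for the paper's constrained (gradient-projection) retraining.
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                err = yi - self.predict(xi[None, :])[0]
                self.w += self.lr * err * xi
                self.b += self.lr * err
```

In use, the framework would run `needs_retraining` on incoming annotated samples from each modality and call `retrain` only for the modality whose performance has deteriorated, leaving the other classifier untouched.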