Abstract
In naturalistic behaviour, the affective state of a person changes at a rate much slower than the typical rate at which video or audio is recorded (e.g. 25 fps for video). Hence, there is a high probability that consecutive recorded instants of expression represent the same affective content. In this paper, a multi-stage automatic affective expression recognition system is proposed which uses Hidden Markov Models (HMMs) to take this temporal relationship into account and finalize the classification process. The hidden states of the HMMs are associated with the levels of the affective dimensions, converting the classification problem into a best-path search in the HMM. The system was tested on the audio data of the Audio/Visual Emotion Challenge (AVEC) datasets, where it performed significantly better than both a one-stage classification system that ignores this temporal relationship and the baseline provided by the Challenge. Due to the generality of the approach, this system could be applied to other types of affective modalities.
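The final decoding stage described above can be sketched as a standard Viterbi search, where each hidden state is one quantized affect level, per-frame classifier scores act as emission probabilities, and strong self-transitions encode the assumption that affect changes slowly relative to the frame rate. The two-level state space, the transition values, and the example scores below are illustrative assumptions, not the paper's actual configuration.

```python
# Hedged sketch: Viterbi decoding over per-frame affect-level scores.
# The state set, transition bias, and example emission scores are
# illustrative assumptions, not the paper's actual parameters.

def viterbi(emissions, trans, init):
    """Return the most likely state sequence given per-frame emission
    probabilities, a transition matrix, and initial state probabilities."""
    n_states = len(init)
    # delta[s] = probability of the best path ending in state s so far
    delta = [init[s] * emissions[0][s] for s in range(n_states)]
    backptrs = []  # backpointers for the backtrace
    for obs in emissions[1:]:
        psi, new_delta = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: delta[p] * trans[p][s])
            psi.append(best_prev)
            new_delta.append(delta[best_prev] * trans[best_prev][s] * obs[s])
        backptrs.append(psi)
        delta = new_delta
    # Backtrace from the best final state
    path = [max(range(n_states), key=lambda s: delta[s])]
    for psi in reversed(backptrs):
        path.append(psi[path[-1]])
    path.reverse()
    return path

# Two affect levels (e.g. low/high arousal); self-transitions dominate
# because affect changes slowly relative to the frame rate.
trans = [[0.95, 0.05],
         [0.05, 0.95]]
init = [0.5, 0.5]
# Noisy per-frame scores from a first-stage classifier (illustrative):
# frame 2 is a spurious "high" spike that the HMM should smooth away.
emissions = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.7, 0.3],
             [0.9, 0.1], [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
print(viterbi(emissions, trans, init))  # → [0, 0, 0, 0, 0, 1, 1, 1]
```

A frame-wise argmax over these scores would yield [0, 0, 1, 0, 0, 1, 1, 1]; the temporal model corrects the isolated spike at frame 2 while still following the genuine level change at frame 5, which is the effect the multi-stage design relies on.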
The original version of this chapter was revised. An Erratum for this chapter can be found at http://dx.doi.org/10.1007/978-3-642-24571-8_75
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Meng, H., Bianchi-Berthouze, N. (2011). Naturalistic Affective Expression Classification by a Multi-stage Approach Based on Hidden Markov Models. In: D’Mello, S., Graesser, A., Schuller, B., Martin, JC. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24571-8_49
DOI: https://doi.org/10.1007/978-3-642-24571-8_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24570-1
Online ISBN: 978-3-642-24571-8