Abstract
Processing of recorded or real-time signals, feature extraction, and recognition are concepts of utmost importance to an affect-aware system, since they allow machines to model human behavior based on theory and to interpret it based on observation. This chapter discusses feature extraction and recognition from unimodal features in the case of speech, facial expressions and gestures, and physiological signals, and elaborates on attention, fusion, dynamics, and adaptation in different multimodal settings.
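As a rough illustration of the decision-level (late) fusion the chapter elaborates on, the following minimal Python sketch combines the outputs of unimodal classifiers into a single multimodal decision. All emotion labels, posterior values, and reliability weights below are hypothetical placeholders, not values or methods taken from the chapter.

```python
# Decision-level (late) fusion sketch: combine per-modality class
# posteriors with reliability weights, then renormalize.
# All labels, weights, and posteriors are hypothetical.

EMOTIONS = ["anger", "joy", "neutral", "sadness"]

# Hypothetical posteriors produced by unimodal classifiers.
posteriors = {
    "speech":  [0.10, 0.55, 0.25, 0.10],
    "face":    [0.05, 0.70, 0.15, 0.10],
    "gesture": [0.20, 0.40, 0.30, 0.10],
}

# Hypothetical reliability weights, e.g. reflecting channel noise.
weights = {"speech": 0.4, "face": 0.4, "gesture": 0.2}

def fuse(posteriors, weights):
    """Weighted sum of per-modality posteriors, renormalized to sum to 1."""
    fused = [0.0] * len(EMOTIONS)
    for modality, probs in posteriors.items():
        for i, p in enumerate(probs):
            fused[i] += weights[modality] * p
    total = sum(fused)
    return [f / total for f in fused]

fused = fuse(posteriors, weights)
best_prob, best_label = max(zip(fused, EMOTIONS))
print(best_label, round(best_prob, 3))  # highest-probability emotion
```

A feature-level (early) fusion scheme would instead concatenate the unimodal feature vectors before classification; the weighting step here stands in for the adaptation and reliability estimation the chapter discusses.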
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Karpouzis, K. (2011). Editorial: “Signals to Signs” – Feature Extraction, Recognition, and Multimodal Fusion. In: Cowie, R., Pelachaud, C., Petta, P. (eds) Emotion-Oriented Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15184-2_5
DOI: https://doi.org/10.1007/978-3-642-15184-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15183-5
Online ISBN: 978-3-642-15184-2
eBook Packages: Computer Science (R0)