Abstract
Emotion is a research area that has received much attention during the last 10 years, in the context of speech synthesis and image understanding as well as automatic speech recognition, interactive dialogue systems and wearable computing. There are promising studies on the emotional behaviour of people, mainly based on human observations. Only a few are based on automatic machine detection, owing to the lack of Information Technology and Engineering (ITE) techniques that enable a deeper, large-scale, noninvasive analysis and evaluation of people’s emotional behaviour and provide tools and support to help them overcome social barriers. The present paper reports a study on extracting and associating emotional meta-features to support the development of emotionally rich man–machine interfaces (interactive dialogue systems and intelligent avatars).
Acknowledgments
This work has been partially supported by an AIIS Inc. grant and by the European projects COST 2102 “Cross Modal Analysis of Verbal and Nonverbal Communication” (http://cost2102.cs.stir.ac.uk/) and COST ISCH TD0904 “TIMELY: Time in MEntal activitY” (http://w3.cost.eu/index.php?id=233&action_number=TD0904).
Cite this article
Bourbakis, N., Esposito, A. & Kavraki, D. Extracting and Associating Meta-features for Understanding People’s Emotional Behaviour: Face and Speech. Cogn Comput 3, 436–448 (2011). https://doi.org/10.1007/s12559-010-9072-1
DOI: https://doi.org/10.1007/s12559-010-9072-1